Information-theoretic measures for nonlinear causality detection: application to social media sentiment and cryptocurrency prices

Information transfer between time series is calculated using the asymmetric information-theoretic measure known as transfer entropy. Geweke’s autoregressive formulation of Granger causality is used to compute linear transfer entropy, and Schreiber’s general, non-parametric, information-theoretic formulation is used to quantify nonlinear transfer entropy. We first validate these measures against synthetic data. Then we apply these measures to detect statistical causality between social sentiment changes and cryptocurrency returns. We validate results by performing permutation tests by shuffling the time series, and calculate the Z-score. We also investigate different approaches for partitioning in non-parametric density estimation which can improve the significance. Using these techniques on sentiment and price data over a 48-month period to August 2018, for four major cryptocurrencies, namely bitcoin (BTC), ripple (XRP), litecoin (LTC) and ethereum (ETH), we detect significant information transfer, on hourly timescales, with greater net information transfer from sentiment to price for XRP and LTC, and instead from price to sentiment for BTC and ETH. We report the scale of nonlinear statistical causality to be an order of magnitude larger than the linear case.


Comments to the Author(s)
The paper is an original exercise in employing the information-theoretic concept of entropy as a means to study the flow of economic information in cryptocurrency markets. This allows common studies of Granger causality to be extended to non-linear dependencies. The application to sentiment data and cryptocurrency prices is meant to show that the nonlinearity is necessary to capture the information flow.
First some small points that absolutely need fixing:
- You repeat in many places that sentiment causes /price/. This would be both statistically (nonstationarity) as well as economically (arbitrage) extremely worrying. Now, you do what should be done (take log differences, and say so in §2 of Sec 5), but the terms should be straightened out in the entire paper: you study the effects of changes in sentiment on returns! (and vice versa)
- The "vice versa" is natural in the context of Granger causality and transfer entropy, but your liberal use of "causality" as synonymous with those will likely offend many theorists. It would certainly have offended Hume, your first reference. All this can still be fine if you declare this "equivalence" for the scope of your paper; however, when you (implicitly) dismiss other meanings of causality (including the standard and intuitive one), this goes too far: shuffling will destroy, in expectation, the time dependence you measure (and thus I have no objections to your method), but it will certainly not "hence [destroy] any causality" (p 6; just measure mass and volume of a liter of water every day under identical circumstances to conclude that there is no causal relationship between those 2 from the 2 constant lines observed).
- You also over-reach in your statement on p 11 claiming to provide evidence of sub-hourly dependence patterns. You have indication of a stronger pattern as short as one-hour lags; there is no reason to conclude about frequencies you cannot study.
Less technical points:
- While you link well to the literature on sentiment analysis in crypto markets, you do not mention at all that your problem is at the heart of a large financial literature on price discovery.
- This would have also led you to the obvious question: do you find "causality" between returns of different coins?
- The paper would also profit from at least some discussion of why you believe the standard approach to handling non-linearities, namely transforming the driver process, is insufficient.
- In relation to this, the fact that you find mostly no linear but non-linear dependence begs the question of what the relation looks like. You should be able to estimate the relationship, ideally provide a graph.
- This is not a purely statistical point; actually the standard pricing models in financial economics will explain returns in factor models where the factors are non-linear functions, potentially of past prices! (momentum)
- The other elephant in the room which goes unmentioned is liquidity. You should have trading volume data from cryptocompare and could thus address this easily with at least the Amihud measure.
- Consequently, this raises the point of alternative drivers. Given that you have done your placebo tests on generated data, why not show what causality you can detect from LTC sentiment changes on BTC returns, etc?
- Finally a marginal point which may be subjective, but I do not buy the argument that we should expect more effect of sentiment in crypto markets due to their limited role as media of exchange. Common stock is much harder to exchange, and many more unsophisticated investors are investing in stock markets. I do not claim that the speculative motive is smaller; I just don't believe your chain of argument to be convincing in this regard. And you do not even need it in the least: sentiment analysis is a strong (if controversial) field in financial economics outside and inside of cryptocurrency markets; your question is valid and important independent of your beliefs about how cryptos "should" be priced.
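Two quantities named in the comments above, log differences (returns) and the Amihud measure of illiquidity, can be sketched as follows. This is a minimal illustration under stated assumptions: volumes are taken to be in quote-currency units, log returns are used in place of simple returns, and the function names are hypothetical rather than part of any reviewed code.

```python
import numpy as np

def log_returns(prices):
    """Log differences of a price series: r_t = ln(P_t) - ln(P_{t-1})."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))

def amihud_illiquidity(prices, volumes):
    """Average of |return| / traded volume over the sample.

    Assumption: volumes[t] is the currency volume traded in the period
    ending at prices[t]; periods with zero volume are skipped.
    """
    r = log_returns(prices)
    v = np.asarray(volumes, dtype=float)[1:]  # align volumes with returns
    mask = v > 0
    return np.mean(np.abs(r[mask]) / v[mask])

# Small usage example with made-up numbers.
example_prices = [100.0, 110.0, 121.0]
example_volumes = [5.0, 10.0, 20.0]
illiq = amihud_illiquidity(example_prices, example_volumes)
```

A higher value indicates that a given traded volume moves the price more, i.e. a less liquid market over that sample.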

Review form: Reviewer 2
Is the manuscript scientifically sound in its present form? Yes

Are the interpretations and conclusions justified by the results? Yes
Is the language acceptable? Yes

Do you have any ethical concerns with this paper? No
Have you any concerns about statistical analyses in this paper? No

Recommendation?
Accept with minor revision (please list in comments)

Comments to the Author(s)
The manuscript presents a study of non-linear causality detection between social sentiment and prices. The detection is done by using nonlinear transfer entropy.
The topic is timely and the authors develop it properly. The results are interesting for the interdisciplinary research community interested in quantitative modeling of financial market dynamics.
The paper can be published as it is; however, the authors might consider amending it according to the following minor comments:
1) The authors consider prices of four cryptocurrencies. In several other studies of causality, asset returns have been considered. Why did they choose prices? Would a study done with returns produce the same or similar results?
2) The authors are using 24-month windows with a one-month step. Most of the results of figures 4-7 are therefore highly correlated at successive time records. To better understand the dynamics of the nonlinear causality it would be interesting to see how the results change when the overlap between time windows decreases.
3) The recent literature on sentiment signals extracted from tweets highlights the ever-growing role of bots in the dynamics of Twitter. Did the authors perform any pre-processing to evaluate the presence or absence of bot actions in the construction of their sentiment data series? Are bots present? Do they have any role?
4) Trading of cryptocurrency and Twitter activity are geographically constrained (although markets for cryptocurrencies are active 24 hours a day). Are intraday patterns due to geographical activity of Asia, Europe, and Americas playing a role in the determination of transfer entropy?
5) Several of the references are appropriate and informative. However, other studies dealing with causality detection in financial markets are missing from the reference list. Authors might find a nice literature review in the paper Sandoval, L., 2014. Structure of a global network of financial companies based on transfer entropy. Entropy, 16(8), pp.4443-4482 and select from it the literature they consider appropriate.
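The windowing point in comment 2 can be made concrete with a short sketch. It assumes observations indexed in months over the 48-month sample mentioned in the abstract. The helper names are hypothetical and only illustrate how the step size controls the overlap between successive windows.

```python
def rolling_windows(n_obs, window, step):
    """Yield (start, stop) index pairs for fixed-length windows advanced by `step`."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, n_obs - window + 1, step):
        yield start, start + window

def overlap_fraction(window, step):
    """Fraction of each window shared with the next one."""
    return max(0.0, 1.0 - step / window)

# 24-month windows over 48 months: one-month step versus non-overlapping windows.
monthly_step = list(rolling_windows(48, 24, 1))
disjoint = list(rolling_windows(48, 24, 24))
shared = overlap_fraction(24, 1)
```

With a one-month step, successive 24-month windows share 23 of their 24 months, which is why neighbouring estimates are strongly correlated; a larger step trades fewer estimates for more independent ones.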

Decision letter (RSOS-200863.R0)
We hope you are keeping well at this difficult and unusual time. We continue to value your support of the journal in these challenging circumstances. If Royal Society Open Science can assist you at all, please don't hesitate to let us know at the email address below.

Dear Professor Aste
On behalf of the Editors, I am pleased to inform you that your Manuscript RSOS-200863 entitled "Information-theoretic measures for non-linear causality detection: application to social media sentiment and cryptocurrency prices" has been accepted for publication in Royal Society Open Science subject to minor revision in accordance with the referee suggestions. Please find the referees' comments at the end of this email.
The reviewers and handling editors have recommended publication, but also suggest some minor revisions to your manuscript. Therefore, I invite you to respond to the comments and revise your manuscript.
• Ethics statement If your study uses humans or animals please include details of the ethical approval received, including the name of the committee that granted approval. For human studies please also detail whether informed consent was obtained. For field studies on animals please include details of all permissions, licences and/or approvals granted to carry out the fieldwork.
• Data accessibility It is a condition of publication that all supporting data are made available either as supplementary information or preferably in a suitable permanent repository. The data accessibility section should state where the article's supporting data can be accessed. This section should also include details, where possible of where to access other relevant research materials such as statistical tools, protocols, software etc can be accessed. If the data has been deposited in an external repository this section should list the database, accession number and link to the DOI for all data from the article that has been made publicly available. Data sets that have been deposited in an external repository and have a DOI should also be appropriately cited in the manuscript and included in the reference list.
If you wish to submit your supporting data or code to Dryad (http://datadryad.org/), or modify your current submission to dryad, please use the following link: http://datadryad.org/submit?journalID=RSOS&manu=RSOS-200863
• Competing interests Please declare any financial or non-financial competing interests, or state that you have no competing interests.
• Authors' contributions All submissions, other than those with a single author, must include an Authors' Contributions section which individually lists the specific contribution of each author. The list of Authors should meet all of the following criteria; 1) substantial contributions to conception and design, or acquisition of data, or analysis and interpretation of data; 2) drafting the article or revising it critically for important intellectual content; and 3) final approval of the version to be published.
All contributors who do not meet all of these criteria should be included in the acknowledgements.
We suggest the following format: AB carried out the molecular lab work, participated in data analysis, carried out sequence alignments, participated in the design of the study and drafted the manuscript; CD carried out the statistical analyses; EF collected field data; GH conceived of the study, designed the study, coordinated the study and helped draft the manuscript. All authors gave final approval for publication.
• Acknowledgements Please acknowledge anyone who contributed to the study but did not meet the authorship criteria.
• Funding statement Please list the source of funding for each author.
Please ensure you have prepared your revision in accordance with the guidance at https://royalsociety.org/journals/authors/author-guidelines/ --please note that we cannot publish your manuscript without the end statements. We have included a screenshot example of the end statements for reference. If you feel that a given heading is not relevant to your paper, please nevertheless include the heading and explicitly state that it is not relevant to your work.
Because the schedule for publication is very tight, it is a condition of publication that you submit the revised version of your manuscript before 29-Jul-2020. Please note that the revision deadline will expire at 00.00am on this date. If you do not think you will be able to meet this date please let me know immediately.
To revise your manuscript, log into https://mc.manuscriptcentral.com/rsos and enter your Author Centre, where you will find your manuscript title listed under "Manuscripts with Decisions". Under "Actions," click on "Create a Revision." You will be unable to make your revisions on the originally submitted version of the manuscript. Instead, revise your manuscript and upload a new version through your Author Centre.
When submitting your revised manuscript, you will be able to respond to the comments made by the referees and upload a file "Response to Referees" in "Section 6 -File Upload". You can use this to document any changes you make to the original manuscript. In order to expedite the processing of the revised manuscript, please be as specific as possible in your response to the referees. We strongly recommend uploading two versions of your revised manuscript: 1) Identifying all the changes that have been made (for instance, in coloured highlight, in bold text, or tracked changes); 2) A 'clean' version of the new manuscript that incorporates the changes made, but does not highlight them.
When uploading your revised files please make sure that you have:
1) A text file of the manuscript (tex, txt, rtf, docx or doc), references, tables (including captions) and figure captions. Do not upload a PDF as your "Main Document";
2) A separate electronic file of each figure (EPS or print-quality PDF preferred (either format should be produced directly from original creation package), or original software format);
3) Included a 100 word media summary of your paper when requested at submission. Please ensure you have entered correct contact details (email, institution and telephone) in your user account;
4) Included the raw data to support the claims made in your paper. You can either include your data as electronic supplementary material or upload to a repository and include the relevant doi within your manuscript. Make sure it is clear in your data accessibility statement how the data can be accessed;
5) All supplementary materials accompanying an accepted article will be treated as in their final form. Note that the Royal Society will neither edit nor typeset supplementary material and it will be hosted as provided. Please ensure that the supplementary material includes the paper details where possible (authors, article title, journal name).
Supplementary files will be published alongside the paper on the journal website and posted on the online figshare repository (https://rs.figshare.com/). The heading and legend provided for each supplementary file during the submission process will be used to create the figshare page, so please ensure these are accurate and informative so that your files can be found in searches. Files on figshare will be made available approximately one week before the accompanying article so that the supplementary material can be attributed a unique DOI.
Please note that Royal Society Open Science charge article processing charges for all new submissions that are accepted for publication. Charges will also apply to papers transferred to Royal Society Open Science from other Royal Society Publishing journals, as well as papers submitted as part of our collaboration with the Royal Society of Chemistry (https://royalsocietypublishing.org/rsos/chemistry).
If your manuscript is newly submitted and subsequently accepted for publication, you will be asked to pay the article processing charge, unless you request a waiver and this is approved by Royal Society Publishing. You can find out more about the charges at https://royalsocietypublishing.org/rsos/charges. Should you have any queries, please contact openscience@royalsociety.org.
Once again, thank you for submitting your manuscript to Royal Society Open Science and I look forward to receiving your revision. If you have any questions at all, please do not hesitate to get in touch.
Both reviewers (one recommended) appreciate the contribution and agree that the paper should be accepted with minor revisions. The reviews are quite detailed. The code should be made available, as well as data (see comment from 1st reviewer).

See Appendix A.
Decision letter (RSOS-200863.R1)
We hope you are keeping well at this difficult and unusual time. We continue to value your support of the journal in these challenging circumstances. If Royal Society Open Science can assist you at all, please don't hesitate to let us know at the email address below.

Dear Professor Aste,
It is a pleasure to accept your manuscript entitled "Information-theoretic measures for non-linear causality detection: application to social media sentiment and cryptocurrency prices" in its current form for publication in Royal Society Open Science.
You can expect to receive a proof of your article in the near future. Please contact the editorial office (openscience_proofs@royalsociety.org) and the production office (openscience@royalsociety.org) to let us know if you are likely to be away from e-mail contact --if you are going to be away, please nominate a co-author (if available) to manage the proofing process, and ensure they are copied into your email to the journal.
Due to rapid publication and an extremely tight schedule, if comments are not received, your paper may experience a delay in publication.
Please see the Royal Society Publishing guidance on how you may share your accepted author manuscript at https://royalsociety.org/journals/ethics-policies/media-embargo/.

Manuscript Resubmission
Dear Editor, please find enclosed the manuscript entitled "Information-theoretic measures for non-linear causality detection: application to social media sentiment and cryptocurrency prices" that we submit for publication as a regular article on Open Science.
We are thankful to the reviewers for their careful reading of the manuscript and for their comments, which helped to greatly improve its quality.
We have revised the whole manuscript following their suggestions and criticisms. We accepted a large part of their suggestions and we modified the manuscript accordingly. Table 1 reports the reviewers' comments and our modifications. We did not accept some of the minor suggestions and this is motivated in Table 2. A version of the manuscript with changes in blue is submitted as well to help verify the modifications to the manuscript.
We hope that the present version is judged acceptable for publication in your journal.
With kind regards,
Tomaso Aste and Keskin Zac

Appendix A

Changes in response to reviewer's comments
Please find the list of changes that have been made to the manuscript in grateful response to the very helpful feedback from reviewers.
We split these up into required fixes and other comments.

First points "that absolutely need fixing":

Reviewer 1: You repeat in many places that sentiment causes /price/. This would be both statistically (non-stationarity) as well as economically (arbitrage) extremely worrying. Now, you do what should be done (take log differences, and say so in §2 of Sec 5), but the terms should be straightened out in the entire paper: you study the effects of changes in sentiment on returns! (and vice versa)
Response: We acknowledge this was an area which required clarification, in particular at the top of p3 where we introduce the Granger causality test, and so we have clarified this throughout. In some cases it is reasonable from context that, if changes in sentiment cause changes in price, then price is in some way driven by sentiment.
Location: Throughout manuscript

Reviewer 1: The "vice versa" is natural in the context of Granger causality and transfer entropy, but your liberal use of "causality" as synonymous with those will likely offend many theorists. It would certainly have offended Hume, your first reference. All this can still be fine if you declare this "equivalence" for the scope of your paper; however, when you (implicitly) dismiss other meanings of causality (including the standard and intuitive one), this goes too far: shuffling will destroy, in expectation, the time dependence you measure (and thus I have no objections to your method), but it will certainly not "hence [destroy] any causality"
Response: We acknowledge that it was not specified that/where, when using the word causality, we are (usually) referring only to the statistical causality interpretation of Wiener / Granger. We have now replaced references to causality, to clarify where we mean statistical/G-causality.
Location: Throughout manuscript

Reviewer 1: You also over-reach in your statement on p 11 claiming to provide evidence of sub-hourly dependence patterns. You have indication of a stronger pattern as short as one-hour lags; there is no reason to conclude about frequencies you cannot study.
Response: Yes, acknowledged; the evidence is clear for hourly dependences and this is insufficient for concluding about higher-frequency relationships, whilst it is sufficient for demonstrating that there is information transfer at hourly timescales. We have removed these extraneous claims.