Inferring temporal dynamics from cross-sectional data using Langevin dynamics

Cross-sectional studies are widely prevalent since they are more feasible to conduct compared with longitudinal studies. However, cross-sectional data lack the temporal information required to study the evolution of the underlying dynamics. This temporal information is essential to develop predictive computational models, which is the first step towards causal modelling. We propose a method for inferring computational models from cross-sectional data using Langevin dynamics. This method can be applied to any system where the data-points are influenced by equal forces and are in (local) equilibrium. The inferred model will be valid for the time span during which this set of forces remains unchanged. The result is a set of stochastic differential equations that capture the temporal dynamics, by assuming that groups of data-points are subject to the same free energy landscape and amount of noise. This is a ‘baseline’ method that initiates the development of computational models and can be iteratively enhanced through the inclusion of domain expert knowledge as demonstrated in our results. Our method shows significant predictive power when compared against two population-based longitudinal datasets. The proposed method can facilitate the use of cross-sectional datasets to obtain an initial estimate of the underlying dynamics of the respective systems.
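The idea summarized in the abstract can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation: it assumes the cross-sectional samples are drawn from an equilibrium distribution p(x) proportional to exp(-U(x)/D), fixes the noise strength D = 1 (an arbitrary choice, which is why no physical time scale is recovered), and uses hypothetical helper names `estimate_drift` and `simulate`.

```python
import numpy as np

def estimate_drift(samples, bins=30):
    """Return bin centres and the drift f = -dU/dx, with U = -ln p (D = 1 assumed)."""
    density, edges = np.histogram(samples, bins=bins, density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    occupied = density > 0                      # empty bins would give U = infinity
    centres, density = centres[occupied], density[occupied]
    potential = -np.log(density)                # pseudo-potential up to a constant
    drift = -np.gradient(potential, centres)    # finite-difference derivative
    return centres, drift

def simulate(x0, centres, drift, dt=0.01, steps=1000, noise=1.0, rng=None):
    """Euler-Maruyama integration of dx = f(x) dt + sqrt(2 D) dW."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(steps + 1)
    x[0] = x0
    for i in range(steps):
        f = np.interp(x[i], centres, drift)     # drift is held constant outside the data range
        x[i + 1] = x[i] + f * dt + np.sqrt(2.0 * noise * dt) * rng.standard_normal()
    return x

# Synthetic "cross-sectional" data: a single-attractor landscape (standard normal).
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=5000)
centres, drift = estimate_drift(samples)
path = simulate(2.0, centres, drift, rng=rng)
print(path[:5])
```

For standard-normal data the true potential is U(x) = x^2/2, so the estimated drift should be close to -x near the centre of the distribution; the simulated trajectory is expressed in arbitrary time units.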


Comments to the Author(s)
In this paper, the authors propose a method to extract a model from a cross-sectional study, thus alleviating the need for longitudinal studies, which are known to be costly in both time and resources.
Their model relies on the hypothesis that the current system is the result of a diffusion process in a particular energy landscape and that it has reached its equilibrium distribution. From the data, they may thus infer this energy landscape and make predictions based on the diffusion of a point in this landscape.
While this method is well known and validated in physics, I have strong doubts about its validity in the physiological domain. In particular, a strong hypothesis that is not mentioned is that the distribution does not depend (or depends only slightly) on other variables or factors. However, the BMI might be highly correlated with diet, activity, gut microbiome, etc. in a way that dominates the diffusion.
Moreover, it is very unclear to me what the value is of the predictions that may be drawn from this kind of model, as no time scale, no mechanism and no units may be inferred by the method (a point rightly mentioned by the authors). Typically, what could possibly be the equivalent of page 7 on the BMI example? The presentation of the validation procedure should also be lengthened, as I still have difficulty understanding what exactly is covered by the rescaled A. In particular, I would like to see the mathematical description of A^{average} and U_{CI}. Another of my concerns is the small sample size for some of the data points: while you tested your procedure with 5000 points, you apply it in the case of 39 points, which may give strong undersampling effects. A word about that should be added to the methods description (typically by evaluating this parameter in the toy model of figure 1).
In figure 3, is it normal that the four panels look exactly the same? And by the way, at the end of section 3 you state: « We observe from Fig. 3 that this relative number is between 1 and -1 for most of the BMI bins. » This is not surprising for a quantity that is rescaled to be between 1 and -1! I guess that you want to emphasize that this value is usually not near 0, but the formulation is very strange.
As I still have doubts about the method, I would recommend that the authors make their validation procedure more detailed and clear, in particular by giving the mathematical description of all the mentioned quantities.

Decision letter (RSOS-202147.R0)
We hope you are keeping well at this difficult and unusual time. We continue to value your support of the journal in these challenging circumstances. If Royal Society Open Science can assist you at all, please don't hesitate to let us know at the email address below.

Dear Dr Quax
The Editors assigned to your paper RSOS-202147 "Inferring temporal dynamics from cross-sectional data using Langevin dynamics" have made a decision based on their reading of the paper and any comments received from reviewers.
Regrettably, in view of the reports received, the manuscript has been rejected in its current form. However, a new manuscript may be submitted which takes into consideration these comments.
We invite you to respond to the comments supplied below and prepare a resubmission of your manuscript. Below the referees' and Editors' comments (where applicable) we provide additional requirements. We provide guidance below to help you prepare your revision.
Please note that resubmitting your manuscript does not guarantee eventual acceptance, and we do not generally allow multiple rounds of revision and resubmission, so we urge you to make every effort to fully address all of the comments at this stage. If deemed necessary by the Editors, your manuscript will be sent back to one or more of the original reviewers for assessment. If the original reviewers are not available, we may invite new reviewers.
Please resubmit your revised manuscript and required files (see below) no later than 11-Oct-2021. Note: the ScholarOne system will 'lock' if resubmission is attempted on or after this deadline. If you do not think you will be able to meet this deadline, please contact the editorial office immediately.
Please note article processing charges apply to papers accepted for publication in Royal Society Open Science (https://royalsocietypublishing.org/rsos/charges). Charges will also apply to papers transferred to the journal from other Royal Society Publishing journals, as well as papers submitted as part of our collaboration with the Royal Society of Chemistry (https://royalsocietypublishing.org/rsos/chemistry). Fee waivers are available but must be requested when you submit your manuscript (https://royalsocietypublishing.org/rsos/waivers).
Comments to the Author:
Dear authors
I regret to inform you that your manuscript is rejected due to several weaknesses mentioned in the attached peer-review report. We hope you will find them useful and if you can address them you are welcomed to resubmit a new version of your paper.
Best regards

===PREPARING YOUR MANUSCRIPT===
Your revised paper should include the changes requested by the referees and Editors of your manuscript. You should provide two versions of this manuscript and both versions must be provided in an editable format: one version identifying all the changes that have been made (for instance, in coloured highlight, in bold text, or tracked changes); a 'clean' version of the new manuscript that incorporates the changes made, but does not highlight them. This version will be used for typesetting if your manuscript is accepted.
Please ensure that any equations included in the paper are editable text and not embedded images.
Please ensure that you include an acknowledgements section before your reference list/bibliography. This should acknowledge anyone who assisted with your work, but does not qualify as an author per the guidelines at https://royalsociety.org/journals/ethics-policies/openness/.
While not essential, it will speed up the preparation of your manuscript proof if accepted if you format your references/bibliography in Vancouver style (please see https://royalsociety.org/journals/authors/author-guidelines/#formatting). You should include DOIs for as many of the references as possible.
If you have been asked to revise the written English in your submission as a condition of publication, you must do so, and you are expected to provide evidence that you have received language editing support. The journal would prefer that you use a professional language editing service and provide a certificate of editing, but a signed letter from a colleague who is a native speaker of English is acceptable. Note the journal has arranged a number of discounts for authors using professional language editing services (https://royalsociety.org/journals/authors/benefits/language-editing/).

===PREPARING YOUR REVISION IN SCHOLARONE===
To revise your manuscript, log into https://mc.manuscriptcentral.com/rsos and enter your Author Centre (this may be accessed by clicking on "Author" in the dark toolbar at the top of the page, just below the journal name). You will find your manuscript listed under "Manuscripts with Decisions". Under "Actions", click on "Create a Revision".
Attach your point-by-point response to referees and Editors at Step 1 'View and respond to decision letter'. This document should be uploaded in an editable file type (.doc or .docx are preferred). This is essential.
Please ensure that you include a summary of your paper at Step 2 'Type, Title, & Abstract'. This should be no more than 100 words to explain to a non-scientific audience the key findings of your research. This will be included in a weekly highlights email circulated by the Royal Society press office to national UK, international, and scientific news outlets to promote your work.

At Step 3 'File upload' you should include the following files:
--Your revised manuscript in editable file format (.doc, .docx, or .tex preferred). You should upload two versions: 1) One version identifying all the changes that have been made (for instance, in coloured highlight, in bold text, or tracked changes); 2) A 'clean' version of the new manuscript that incorporates the changes made, but does not highlight them.
--If you are requesting a discretionary waiver for the article processing charge, the waiver form must be included at this step.
--If you are providing image files for potential cover images, please upload these at this step, and inform the editorial office you have done so. You must hold the copyright to any image provided.
--A copy of your point-by-point response to referees and Editors. This will expedite the preparation of your proof.

At Step 6 'Details & comments', you should review and respond to the queries on the electronic submission form. In particular, we would ask that you do the following:
--Ensure that your data access statement meets the requirements at https://royalsociety.org/journals/authors/author-guidelines/#data. You should ensure that you cite the dataset in your reference list. If you have deposited data etc in the Dryad repository, please include both the 'For publication' link and 'For review' link at this stage.
--If you are requesting an article processing charge waiver, you must select the relevant waiver option (if requesting a discretionary waiver, the form should have been uploaded at Step 3 'File upload' above).
--If you have uploaded ESM files, please ensure you follow the guidance at https://royalsociety.org/journals/authors/author-guidelines/#supplementary-material to include a suitable title and informative caption. An example of appropriate titling and captioning may be found at https://figshare.com/articles/Table_S2_from_Is_there_a_trade-off_between_peak_performance_and_performance_breadth_across_temperatures_for_aerobic_scope_in_teleost_fishes_/3843624.

At Step 7 'Review & submit', you must view the PDF proof of the manuscript before you will be able to submit the revision. Note: if any parts of the electronic submission form have not been completed, these will be noted by red message boxes.

Author's Response to Decision Letter for (RSOS-202147.R0)
See Appendix A.

Comments to the Author(s)
In this paper the authors propose a method to derive temporal information (longitudinal study) from a measurement at a single point in time (cross-sectional study). To do so, they hypothesize that the system under study is a kind of physical system at equilibrium and derive the approximate "energy" landscape from the distribution of the data-points. They then need some minimal hypotheses to determine a possible force field that may produce this landscape and use these forces to predict the evolution of the initial system.
The authors have done great work in responding to my previous remarks, in particular in emphasizing the hypotheses upon which their method is built and giving the framework in which it will stay valid. They also discuss at length how this "zero knowledge" model may be improved and manipulated by field experts to draw information from this kind of result, giving a clearer picture of the applicability of the method.

Decision letter (RSOS-211374.R0)
We hope you are keeping well at this difficult and unusual time. We continue to value your support of the journal in these challenging circumstances. If Royal Society Open Science can assist you at all, please don't hesitate to let us know at the email address below.
Dear Dr Quax, I am pleased to inform you that your manuscript entitled "Inferring temporal dynamics from cross-sectional data using Langevin dynamics" is now accepted for publication in Royal Society Open Science.
Please ensure that you send to the editorial office an editable version of your accepted manuscript, and individual files for each figure and table included in your manuscript. You can send these in a zip folder if more convenient. Failure to provide these files may delay the processing of your proof. You may disregard this request if you have already provided these files to the editorial office.
If you have not already done so, please remember to make any data sets or code libraries 'live' prior to publication, and update any links as needed when you receive a proof to check (for instance, from a private 'for review' URL to a publicly accessible 'for publication' URL). It is good practice to also add data sets, code and other digital materials to your reference list.
You can expect to receive a proof of your article in the near future. Please contact the editorial office (openscience@royalsociety.org) and the production office (openscience_proofs@royalsociety.org) to let us know if you are likely to be away from e-mail contact. If you are going to be away, please nominate a co-author (if available) to manage the proofing process, and ensure they are copied into your email to the journal. Due to rapid publication and an extremely tight schedule, if comments are not received, your paper may experience a delay in publication.
Please see the Royal Society Publishing guidance on how you may share your accepted author manuscript at https://royalsociety.org/journals/ethics-policies/media-embargo/. After publication, some additional ways to effectively promote your article can also be found here https://royalsociety.org/blog/2020/07/promoting-your-latest-paper-and-tracking-your-results/.
On behalf of the Editors of Royal Society Open Science, thank you for your support of the journal and we look forward to your continued contributions to Royal Society Open Science. It is my pleasure to accept as is the resubmission of your paper which correctly took into account the criticisms of the reviewers. Thank you for your contribution to RSOS.

We appreciate the time and effort that you and the reviewers dedicated in reviewing our manuscript and providing valuable comments. This has substantially improved the quality and clarity of the manuscript. Below we give a comprehensive overview of the issues raised by the reviewer, our response and how we addressed the raised issues. The authors welcome further constructive comments if any.
Yours sincerely, on behalf of the authors,

Feedback & Response
We would like to thank the reviewer for the accurate and fair feedback. The reviewer appreciates the novelty of the idea and shows understanding of the goal of the paper.
In the revised manuscript, we have focussed on addressing the following main issues raised in the reviewer's comments. First, a detailed account of all the assumptions made in this method has been provided. Second, a detailed discussion of the goal of this method, the systems to which it is applicable, and its usefulness has been included. Third, mathematical descriptions of all quantities have been provided. In addition, we explain how this method is applicable at the individual level and the population level. We have also included additional theoretical tests and statistical analyses to address the reviewer's concerns.
We have summarized the issues raised by the reviewer in the table below. The authors' remark and the changes introduced in the manuscript are indicated in their respective columns.

Reviewer's remark: A strong hypothesis that the distribution does not depend (or depends only slightly) on other variables or factors is not mentioned.
Authors' remark: We apologize for this omission. We have now provided a detailed account of all the assumptions in the revised manuscript. We have also better explained how this method applies to the individual level and the population level.
Introduced changes: We have added a discussion of the assumptions in paragraphs 4-8 of section 1: Introduction (pages 2-3). We have also summarized the assumptions in paragraph 2 of section 4: Discussion (page 15).

Reviewer's remark: What is the value of the prediction that may be drawn from this kind of model as no time scale, no mechanism and no units may be inferred by the method?
Authors' remark: The proposed method can be a useful tool to get an initial estimate of the underlying dynamics of the system when only cross-sectional data is available. Later with the inclusion of domain expert knowledge, the inferred 'baseline' model can be extended into a causal model and the timescale of the model predictions can be estimated.
Introduced changes: The details of systems to which this method is applicable are included in paragraph 9 of section 1: Introduction (pages 3-4). The goal of the method is discussed in paragraphs 10-11 of section 1: Introduction (page 4). The usefulness of the method is summarized in the last paragraph of section 4: Discussion (page 16).

Reviewer's remark: What could be the equivalent of page 7 on the BMI example?
Authors' remark: In the revised manuscript, we have provided a theoretical test where the free energy landscape is changed by adding a term, which can be considered as analogous to an intervention, and then discussed potential comparison measures between the pre-intervention and post-intervention cases. This theoretical test has been performed for both the two-attractor and single-attractor landscapes.
Introduced changes: We have added a discussion of the theoretical tests regarding interventions from paragraph 5 to the last paragraph of section 2(b): Numerical algorithm (pages 7-9) and in Figure 2.

Reviewer's remark: Mathematical descriptions of the mentioned quantities in the validation procedure are not provided.
Authors' remark: We apologize for the omission. We have now included detailed mathematical descriptions of all quantities used for comparing estimates of the temporal dynamics obtained by our method against longitudinal datasets.
Introduced changes: We have added the mathematical expressions of the following quantities in section 2(c): Comparison with longitudinal dataset (pages 9-11): We have also included Figure 3 to explain the comparison of the prediction accuracy of our model to random choice.

Reviewer's remark: Performance of the method on a small dataset is not provided.
Authors' remark: Thank you for bringing this to our attention. We have improved the prior version by providing performance evaluations of the method on a small dataset.
Introduced changes: We have added the comparison of performance results of our model on a large dataset (5000 data-points) and small datasets (40 data-points) in paragraph 5 of section 3: Results (page 14) and Figure 6.
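
The undersampling concern behind the last remark can be illustrated with a small numerical sketch. It is not the comparison reported in the manuscript: it assumes synthetic standard-normal data (true potential U(x) = x^2/2 up to a constant), a histogram estimate of U = -ln p with an arbitrarily chosen binning, and a hypothetical helper `potential_rmse`. Only the sample sizes (5000 and 40 data-points) are taken from the text above.

```python
import numpy as np

def potential_rmse(n_samples, rng, bins=10, span=2.0):
    """RMS error of U = -ln p estimated from n_samples standard-normal draws."""
    samples = rng.standard_normal(n_samples)
    density, edges = np.histogram(samples, bins=bins, range=(-span, span), density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    occupied = density > 0                        # skip empty bins (U undefined there)
    estimated = -np.log(density[occupied])
    true = 0.5 * centres[occupied] ** 2
    # Potentials are only defined up to an additive constant, so compare centred curves.
    residual = (estimated - estimated.mean()) - (true - true.mean())
    return np.sqrt(np.mean(residual ** 2))

rng = np.random.default_rng(1)
repeats = 200
rmse_large = np.mean([potential_rmse(5000, rng) for _ in range(repeats)])
rmse_small = np.mean([potential_rmse(40, rng) for _ in range(repeats)])
print(f"mean RMSE with 5000 points: {rmse_large:.3f}")
print(f"mean RMSE with   40 points: {rmse_small:.3f}")
```

Under these assumptions the landscape estimated from 40 points is markedly noisier than the one estimated from 5000 points, which is the effect the reviewer asked the authors to quantify.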