Peer-to-peer loan acceptance and default prediction with artificial intelligence

Logistic regression (LR) and support vector machine algorithms, together with linear and nonlinear deep neural networks (DNNs), are applied to lending data in order to replicate lender acceptance of loans and predict the likelihood of default of issued loans. A two-phase model is proposed; the first phase predicts loan rejection, while the second predicts default risk for approved loans. LR was found to be the best performer for the first phase, with a test-set macro recall score of 77.4%. DNNs were applied to the second phase only, where they achieved the best performance, with a test-set recall score of 72% for defaults. This shows that artificial intelligence can improve current credit risk models, reducing the default risk of issued loans by as much as 70%. The models were also applied to loans taken out for small businesses alone. The first phase of the model performs significantly better when trained on the whole dataset. In contrast, the second phase performs significantly better when trained on the small business subset. This suggests a potential discrepancy between how these loans are screened and how they should be analysed in terms of default prediction.
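
The two-phase structure described in the abstract can be sketched in a few lines. The sketch below is only an illustration of the idea on synthetic data, with made-up features and a plain scikit-learn logistic regression for both phases; it is not the authors' actual pipeline or dataset.

```python
# Two-phase sketch: phase 1 mimics the platform's accept/reject decision;
# phase 2 predicts default only on the loans that phase 1 would accept.
# All data below is synthetic and illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))  # three hypothetical applicant features

# Synthetic ground truth: acceptance driven mainly by feature 1,
# default driven by features 0 and 1 plus noise.
accepted = (X[:, 1] + 0.5 * rng.normal(size=n) > 0).astype(int)
default = (X[:, 0] - X[:, 1] + rng.normal(size=n) > 0.5).astype(int)

# Phase 1: replicate the lender's accept/reject decision.
phase1 = LogisticRegression().fit(X, accepted)

# Phase 2: default prediction, trained only on loans phase 1 accepts.
mask = phase1.predict(X) == 1
phase2 = LogisticRegression(class_weight="balanced").fit(X[mask], default[mask])

print(phase1.score(X, accepted), phase2.score(X[mask], default[mask]))
```

In a real application the second phase would, as in the paper, be evaluated on held-out issued loans rather than on the training sample shown here.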


Comments to the Author(s)
A list of acronyms would be most useful, with the constituent words appearing in brackets next to each acronym.
Owing to its central role in the performed research, P2P should be defined at the very beginning of the manuscript.
Artificial neural networks belong to the computational (rather than artificial) intelligence paradigm.
The "significant step forward to applying Big Data and Artificial Intelligence techniques to P2P" is not made clear.
The review of the literature is not always pertinent to the main theme of the submission.
It is not clear why the two datasets have been used concurrently rather than independently. It is also not obvious how or why the results obtained can be transferred to other datasets.
Carrying on with the previous point: How uniform are the datasets over the kind of loan requested? Their discrepancies over parameters as well as decision type should be clearly stated. Would it be to advantage to treat each category independently (in a "one against the others" fashion)?
Since imputation is below 10%, both the full and the reduced datasets could be used for training and testing, with the results compared and important derivations made.
Along the same lines, how uniform is the data over the kind of loan requested over the two datasets? Would it be to advantage to treat (consider, in terms of training/testing and results) each dataset independently, thus simplifying the problem as well as the implemented methods?
At present Section 2.b is quite descriptive, with the implemented procedure not being adequately detailed to allow its duplication by the interested reader.
Why have the specific ANN architectures been selected? Different neural network training criteria should be investigated, including:
• pertinent architectures as well as nodes per layer;
• earlier termination of the training stage;
• cross-validation on the dataset (e.g. five-fold, 10-fold and/or leave-one-out cross-validation, with the folds created either at random or following the ordering of the patterns: for instance, for five-fold cross-validation, the 1st, 6th, 11th, … pattern belonging to the first fold, the 2nd, 7th, 12th, … to the second fold, and so forth up to the fifth fold, so that each fold contains the same number of patterns extending over the entire problem space), instead of just a "split between training and validation sets".
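
The interleaved fold construction the reviewer describes (pattern i assigned to fold i mod k, so each fold spans the whole ordered problem space) can be implemented directly. A minimal sketch, not taken from the manuscript:

```python
# Interleaved k-fold assignment: pattern i goes to fold i mod k, so fold
# sizes differ by at most one and every fold covers the full ordering.
def interleaved_folds(n_patterns, k):
    folds = [[] for _ in range(k)]
    for i in range(n_patterns):
        folds[i % k].append(i)  # 1st, 6th, 11th, ... pattern -> fold 0, etc.
    return folds

folds = interleaved_folds(12, 5)
print(folds)  # fold 0 holds patterns 0, 5, 10; fold 1 holds 1, 6, 11; ...
```

Random folds would instead be produced by shuffling the indices before assignment; comparing the two schemes is exactly the kind of check the reviewer asks for.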
The authors should ensure that all methodologies, metrics, data handling techniques etc. are accompanied by the corresponding primary references.
Section 3 contains information that should be moved to the previous section.
If logistic regression is, indeed, as successful as stated in section 3.a(iii), there is no need for more complicated (especially non-parametric) methodologies, which can add redundant and distorting detail to the problem methodology/solution and are not directly/easily (or even at all) expressed in a direct/parametric fashion.
The authors should ensure that the dataset is stationary; in case this is not so, alternative methodologies and/or on-line (re)training should be implemented.
Class imbalance is not optimal for ANN training; the appropriate measures should be taken in order to avoid training (and, thus, also) testing bias.
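
One standard measure against such imbalance is to reweight the loss so that each class contributes equally, via the "balanced" heuristic w_c = n_samples / (n_classes * n_c) (this is the formula behind scikit-learn's class_weight="balanced"). A small illustrative sketch with made-up counts:

```python
# Balanced class weights: rarer classes get proportionally larger weights,
# counteracting training bias from class imbalance.
import numpy as np

def balanced_class_weights(y):
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)  # n / (k * n_c)
    return dict(zip(classes.tolist(), weights.tolist()))

y = np.array([0] * 90 + [1] * 10)  # 9:1 imbalance, e.g. non-default vs default
print(balanced_class_weights(y))   # {0: 0.55..., 1: 5.0}
```

Oversampling the minority class or undersampling the majority class are alternative measures with the same goal.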
For this kind of problem, on-line training would be advisable so as to ensure the appropriateness/capability of the ANN to handle the changing (in time) data characteristics.
There is a considerable distance between linear and deep neural networks. Why has not an alternative (in-between) ANN architecture also been tested?
In all cases, it should be ensured that the number of ANN free parameters (weights and biases) is smaller than the number of training patterns. The authors should check for overfitting in the DNNs used, especially as the datasets are small (in relation to the size of the ANN).
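
The free-parameter count the reviewer refers to is easy to compute for a fully connected network: each layer contributes n_in × n_out weights plus n_out biases. A sketch with illustrative layer sizes (not the paper's architecture):

```python
# Count ANN free parameters (weights + biases) for a fully connected
# network, to compare against the number of training patterns.
def count_free_parameters(layer_sizes):
    return sum(n_in * n_out + n_out  # weights plus biases per layer
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# e.g. a 20-input network with hidden layers of 10 and 5 units:
p = count_free_parameters([20, 10, 5, 1])
print(p)  # (20*10 + 10) + (10*5 + 5) + (5*1 + 1) = 271
```

If this count approaches or exceeds the number of training patterns, overfitting checks (held-out performance, regularisation, early stopping) become essential.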
What distinguishes test from recall patterns? Has no validation set been used? Testing can, by no means, be implemented on data that has been used for training.
At times it is not clear whether the aim is to mimic human decision making or to optimise/maximise recall as well as prediction.
The choice of features for the first stage should be fully justified.
If properly constructed and trained, the ANNs should be able to accurately learn the training patterns, as well as predict the test data.

20-Dec-2019
Dear Mr Turiel,

The editors assigned to your paper ("P2P Loan acceptance and default prediction with Artificial Intelligence") have now received comments from reviewers. We would like you to revise your paper in accordance with the referee and Associate Editor suggestions which can be found below (not including confidential reports to the Editor). Please note this decision does not guarantee eventual acceptance.
Please submit a copy of your revised paper before 12-Jan-2020. Please note that the revision deadline will expire at 00.00am on this date. If we do not hear from you within this time then it will be assumed that the paper has been withdrawn. In exceptional circumstances, extensions may be possible if agreed with the Editorial Office in advance. We do not allow multiple rounds of revision so we urge you to make every effort to fully address all of the comments at this stage. If deemed necessary by the Editors, your manuscript will be sent back to one or more of the original reviewers for assessment. If the original reviewers are not available, we may invite new reviewers.
To revise your manuscript, log into http://mc.manuscriptcentral.com/rsos and enter your Author Centre, where you will find your manuscript title listed under "Manuscripts with Decisions." Under "Actions," click on "Create a Revision." Your manuscript number has been appended to denote a revision. Revise your manuscript and upload a new version through your Author Centre.
When submitting your revised manuscript, you must respond to the comments made by the referees and upload a file "Response to Referees" in "Section 6 -File Upload". Please use this to document how you have responded to the comments, and the adjustments you have made. In order to expedite the processing of the revised manuscript, please be as specific as possible in your response.
In addition to addressing all of the reviewers' and editor's comments please also ensure that your revised manuscript contains the following sections as appropriate before the reference list: • Ethics statement (if applicable) If your study uses humans or animals please include details of the ethical approval received, including the name of the committee that granted approval. For human studies please also detail whether informed consent was obtained. For field studies on animals please include details of all permissions, licences and/or approvals granted to carry out the fieldwork.
• Data accessibility It is a condition of publication that all supporting data are made available either as supplementary information or preferably in a suitable permanent repository. The data accessibility section should state where the article's supporting data can be accessed. This section should also include details, where possible, of where other relevant research materials such as statistical tools, protocols, software etc. can be accessed. If the data have been deposited in an external repository this section should list the database, accession number and link to the DOI for all data from the article that have been made publicly available. Data sets that have been deposited in an external repository and have a DOI should also be appropriately cited in the manuscript and included in the reference list.
If you wish to submit your supporting data or code to Dryad (http://datadryad.org/), or modify your current submission to dryad, please use the following link: http://datadryad.org/submit?journalID=RSOS&manu=RSOS-191649 • Competing interests Please declare any financial or non-financial competing interests, or state that you have no competing interests.
• Authors' contributions All submissions, other than those with a single author, must include an Authors' Contributions section which individually lists the specific contribution of each author. The list of Authors should meet all of the following criteria; 1) substantial contributions to conception and design, or acquisition of data, or analysis and interpretation of data; 2) drafting the article or revising it critically for important intellectual content; and 3) final approval of the version to be published.
All contributors who do not meet all of these criteria should be included in the acknowledgements.
We suggest the following format: AB carried out the molecular lab work, participated in data analysis, carried out sequence alignments, participated in the design of the study and drafted the manuscript; CD carried out the statistical analyses; EF collected field data; GH conceived of the study, designed the study, coordinated the study and helped draft the manuscript. All authors gave final approval for publication.
• Acknowledgements Please acknowledge anyone who contributed to the study but did not meet the authorship criteria.
• Funding statement Please list the source of funding for each author.
Once again, thank you for submitting your manuscript to Royal Society Open Science and I look forward to receiving your revision. If you have any questions at all, please do not hesitate to get in touch.
Kind regards,
Royal Society Open Science Editorial Office
Royal Society Open Science
openscience@royalsociety.org
on behalf of Prof Marta Kwiatkowska (Subject Editor)
openscience@royalsociety.org

Associate Editor's comments:
The two reviewers have a number of suggestions and queries that should improve your manuscript - we would urge you to take their recommendations seriously, and be sure to not only include the requested updates, but provide an explanation of what changes have been made, or - even more importantly - if you choose not to include a change, explain why not. We'll look forward to receiving the revision in due course. Here are three points that shall be addressed to achieve at least basic model validation.

1. Authors do a lot of manual tuning of the obtained models. There is not enough information about what the evaluation scheme was. I would expect that one evaluation of the data would be on an out-of-time sample: an external sample never seen by the model and never used in the hyperparameter tuning.
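
An out-of-time split of the kind the editor requests simply holds out the most recent loans, so the test sample post-dates everything used for fitting and tuning. A minimal sketch on synthetic data (dates and features are stand-ins, not the paper's variables):

```python
# Out-of-time evaluation: sort by issue date, fit on the earliest loans,
# hold out the latest loans as an external test sample.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
years = rng.integers(2010, 2018, size=n)  # hypothetical loan issue year
X = rng.normal(size=(n, 4))
y = (X[:, 0] > 0).astype(int)

order = np.argsort(years, kind="stable")  # chronological order
cut = int(0.8 * n)
train_idx, oot_idx = order[:cut], order[cut:]  # latest 20% is out-of-time
X_train, y_train = X[train_idx], y[train_idx]
X_oot, y_oot = X[oot_idx], y[oot_idx]
```

All hyperparameter tuning would then be confined to the training portion, with X_oot, y_oot touched only once for the final evaluation.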

2. It is not clear whether a nested CV was performed or not. The grid search needs its own validation data, external to the out-of-sample validation data. The validation shall be described in greater detail.
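
Nested cross-validation separates the data used to choose hyperparameters from the data used to estimate performance: an inner grid search tunes, an outer loop scores on folds the search never saw. A sketch with scikit-learn on synthetic data (the grid and data are illustrative only):

```python
# Nested CV: GridSearchCV tunes C on inner folds; cross_val_score then
# evaluates the whole tuned procedure on outer folds it never touched.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

inner = GridSearchCV(LogisticRegression(max_iter=500),
                     param_grid={"C": [0.01, 0.1, 1.0]},
                     cv=3)                          # inner folds: tuning only
outer_scores = cross_val_score(inner, X, y, cv=5)  # outer folds: evaluation only
print(outer_scores.mean())
```

The outer scores are then an (approximately) unbiased estimate of the tuned model's performance, which a single train/validation split cannot provide.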

3. Reporting AUC or recall is far from enough. The literature is full of examples of overtrained black boxes that are not validated. Authors shall do a deeper post-hoc analysis of the trained models with tools like Partial Dependence Profiles (https://pbiecek.github.io/PM_VEE/partialDependenceProfiles.html) and permutation feature importance (https://pbiecek.github.io/PM_VEE/featureImportance.html). There are many packages for R and Python to do this validation.
All proposed models (logistic regression, DNN and SVM) shall be X-rayed with these tools, as it is very easy to overfit and create an unfair model.
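
The permutation feature importance the editor mentions can be sketched in a few lines: shuffle one feature at a time and measure the drop in the model's score. A real analysis would use the R/Python packages linked above (e.g. scikit-learn's sklearn.inspection.permutation_importance); the version below is a hand-rolled illustration on synthetic data where only the first feature actually matters.

```python
# Manual permutation importance: break the link between one feature and
# the target by shuffling that column, then measure the accuracy drop.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 3))
y = (X[:, 0] > 0).astype(int)  # only feature 0 carries signal

model = LogisticRegression().fit(X, y)
base = model.score(X, y)

importance = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])  # destroy feature j's information
    importance.append(base - model.score(Xp, y))

print(importance)  # large drop for feature 0, near zero for the others
```

A fair model should show large importances only for features that are legitimately predictive; spuriously large importances are a warning sign of overfitting.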

Reviewer: 2
Comments to the Author(s)
A list of acronyms would be most useful, with the constituent words appearing in brackets next to each acronym.
Owing to its central role in the performed research, P2P should be defined at the very beginning of the manuscript.
Artificial neural networks belong to the computational (rather than artificial) intelligence paradigm.
The "significant step forward to applying Big Data and Artificial Intelligence techniques to P2P" is not made clear.
The review of the literature is not always pertinent to the main theme of the submission.
It is not clear why the two datasets have been used concurrently rather than independently. It is also not obvious how or why the results obtained can be transferred to other datasets.
Carrying on with the previous point: How uniform are the datasets over the kind of loan requested? Their discrepancies over parameters as well as decision type should be clearly stated. Would it be to advantage to treat each category independently (in a "one against the others" fashion)?
Since imputation is below 10%, both the full and the reduced datasets could be used for training and testing, with the results compared and important derivations made.
Along the same lines, how uniform is the data over the kind of loan requested over the two datasets? Would it be to advantage to treat (consider, in terms of training/testing and results) each dataset independently, thus simplifying the problem as well as the implemented methods?
At present Section 2.b is quite descriptive, with the implemented procedure not being adequately detailed to allow its duplication by the interested reader.
Why have the specific ANN architectures been selected? Different neural network training criteria should be investigated, including:
• pertinent architectures as well as nodes per layer;
• earlier termination of the training stage;
• cross-validation on the dataset (e.g. five-fold, 10-fold and/or leave-one-out cross-validation, with the folds created either at random or following the ordering of the patterns: for instance, for five-fold cross-validation, the 1st, 6th, 11th, … pattern belonging to the first fold, the 2nd, 7th, 12th, … to the second fold, and so forth up to the fifth fold, so that each fold contains the same number of patterns extending over the entire problem space), instead of just a "split between training and validation sets".
The authors should ensure that all methodologies, metrics, data handling techniques etc. are accompanied by the corresponding primary references.
Section 3 contains information that should be moved to the previous section.
If logistic regression is, indeed, as successful as stated in section 3.a(iii), there is no need for more complicated (especially non-parametric) methodologies, which can add redundant and distorting detail to the problem methodology/solution and are not directly/easily (or even at all) expressed in a direct/parametric fashion.
The authors should ensure that the dataset is stationary; in case this is not so, alternative methodologies and/or on-line (re)training should be implemented.
Class imbalance is not optimal for ANN training; the appropriate measures should be taken in order to avoid training (and, thus, also) testing bias.
For this kind of problem, on-line training would be advisable so as to ensure the appropriateness/capability of the ANN to handle the changing (in time) data characteristics.
There is a considerable distance between linear and deep neural networks. Why has not an alternative (in-between) ANN architecture also been tested?
In all cases, it should be ensured that the number of ANN free parameters (weights and biases) is smaller than the number of training patterns. The authors should check for overfitting in the DNNs used, especially as the datasets are small (in relation to the size of the ANN).
What distinguishes test from recall patterns? Has no validation set been used? Testing can, by no means, be implemented on data that has been used for training.
At times it is not clear whether the aim is to mimic human decision making or to optimise/maximise recall as well as prediction.
The choice of features for the first stage should be fully justified.
If properly constructed and trained, the ANNs should be able to accurately learn the training patterns, as well as predict the test data.

Author's Response to Decision Letter for (RSOS-191649.R0)
See Appendix A.

Comments to the Author(s)
The manuscript has been improved to a satisfactory degree.

Decision letter (RSOS-191649.R1)
We hope you are keeping well at this difficult and unusual time. We continue to value your support of the journal in these challenging circumstances. If Royal Society Open Science can assist you at all, please don't hesitate to let us know at the email address below.

Dear Mr Turiel:
On behalf of the Editors, I am pleased to inform you that your Manuscript RSOS-191649.R1 entitled "P2P Loan acceptance and default prediction with Artificial Intelligence" has been accepted for publication in Royal Society Open Science subject to minor revision in accordance with the referee suggestions. Please find the referees' comments at the end of this email.
The reviewers and Subject Editor have recommended publication, but also suggest some minor revisions to your manuscript. Therefore, I invite you to respond to the comments and revise your manuscript.
• Ethics statement If your study uses humans or animals please include details of the ethical approval received, including the name of the committee that granted approval. For human studies please also detail whether informed consent was obtained. For field studies on animals please include details of all permissions, licences and/or approvals granted to carry out the fieldwork.
• Data accessibility It is a condition of publication that all supporting data are made available either as supplementary information or preferably in a suitable permanent repository. The data accessibility section should state where the article's supporting data can be accessed. This section should also include details, where possible, of where other relevant research materials such as statistical tools, protocols, software etc. can be accessed. If the data have been deposited in an external repository this section should list the database, accession number and link to the DOI for all data from the article that have been made publicly available. Data sets that have been deposited in an external repository and have a DOI should also be appropriately cited in the manuscript and included in the reference list.
If you wish to submit your supporting data or code to Dryad (http://datadryad.org/), or modify your current submission to dryad, please use the following link: http://datadryad.org/submit?journalID=RSOS&manu=RSOS-191649.R1 • Competing interests Please declare any financial or non-financial competing interests, or state that you have no competing interests.
• Authors' contributions All submissions, other than those with a single author, must include an Authors' Contributions section which individually lists the specific contribution of each author. The list of Authors should meet all of the following criteria; 1) substantial contributions to conception and design, or acquisition of data, or analysis and interpretation of data; 2) drafting the article or revising it critically for important intellectual content; and 3) final approval of the version to be published.
All contributors who do not meet all of these criteria should be included in the acknowledgements.
We suggest the following format: AB carried out the molecular lab work, participated in data analysis, carried out sequence alignments, participated in the design of the study and drafted the manuscript; CD carried out the statistical analyses; EF collected field data; GH conceived of the study, designed the study, coordinated the study and helped draft the manuscript. All authors gave final approval for publication.
• Acknowledgements Please acknowledge anyone who contributed to the study but did not meet the authorship criteria.
• Funding statement Please list the source of funding for each author.
Please note that we cannot publish your manuscript without these end statements included. We have included a screenshot example of the end statements for reference. If you feel that a given heading is not relevant to your paper, please nevertheless include the heading and explicitly state that it is not relevant to your work.
Because the schedule for publication is very tight, it is a condition of publication that you submit the revised version of your manuscript before 14-May-2020. Please note that the revision deadline will expire at 00.00am on this date. If you do not think you will be able to meet this date please let me know immediately.
To revise your manuscript, log into https://mc.manuscriptcentral.com/rsos and enter your Author Centre, where you will find your manuscript title listed under "Manuscripts with Decisions". Under "Actions," click on "Create a Revision." You will be unable to make your revisions on the originally submitted version of the manuscript. Instead, revise your manuscript and upload a new version through your Author Centre.
When submitting your revised manuscript, you will be able to respond to the comments made by the referees and upload a file "Response to Referees" in "Section 6 -File Upload". You can use this to document any changes you make to the original manuscript. In order to expedite the processing of the revised manuscript, please be as specific as possible in your response to the referees.
When uploading your revised files please make sure that you have:
1) A text file of the manuscript (tex, txt, rtf, docx or doc), references, tables (including captions) and figure captions. Do not upload a PDF as your "Main Document".
2) A separate electronic file of each figure (EPS or print-quality PDF preferred (either format should be produced directly from original creation package), or original software format).
3) Included a 100 word media summary of your paper when requested at submission. Please ensure you have entered correct contact details (email, institution and telephone) in your user account.
4) Included the raw data to support the claims made in your paper. You can either include your data as electronic supplementary material or upload to a repository and include the relevant doi within your manuscript.
5) All supplementary materials accompanying an accepted article will be treated as in their final form. Note that the Royal Society will neither edit nor typeset supplementary material and it will be hosted as provided. Please ensure that the supplementary material includes the paper details where possible (authors, article title, journal name).
Supplementary files will be published alongside the paper on the journal website and posted on the online figshare repository (https://figshare.com). The heading and legend provided for each supplementary file during the submission process will be used to create the figshare page, so please ensure these are accurate and informative so that your files can be found in searches. Files on figshare will be made available approximately one week before the accompanying article so that the supplementary material can be attributed a unique DOI.
Once again, thank you for submitting your manuscript to Royal Society Open Science and I look forward to receiving your revision. If you have any questions at all, please do not hesitate to get in touch.
Kind regards,
Andrew Dunn
Royal Society Open Science Editorial Office
openscience@royalsociety.org
on behalf of Prof Marta Kwiatkowska (Subject Editor)

Associate Editor Comments to Author:
It appears a couple of minor queries are left to address, but otherwise the paper is ready for acceptance. Please provide a final revision incorporating these remaining changes.
Reviewer comments to Author:
Reviewer: 2
Comments to the Author(s)
The manuscript has been improved to a satisfactory degree.

Decision letter (RSOS-191649.R2)

We hope you are keeping well at this difficult and unusual time. We continue to value your support of the journal in these challenging circumstances. If Royal Society Open Science can assist you at all, please don't hesitate to let us know at the email address below.

Dear Mr Turiel,
It is a pleasure to accept your manuscript entitled "P2P Loan acceptance and default prediction with Artificial Intelligence" in its current form for publication in Royal Society Open Science.
Please ensure that you send to the editorial office an editable version of your accepted manuscript, and individual files for each figure and table included in your manuscript. You can send these in a zip folder if more convenient. Failure to provide these files may delay the processing of your proof. You may disregard this request if you have already provided these files to the editorial office.
You can expect to receive a proof of your article in the near future. Please contact the editorial office (openscience_proofs@royalsociety.org) and the production office (openscience@royalsociety.org) to let us know if you are likely to be away from e-mail contact --if you are going to be away, please nominate a co-author (if available) to manage the proofing process, and ensure they are copied into your email to the journal.
Due to rapid publication and an extremely tight schedule, if comments are not received, your paper may experience a delay in publication. Royal Society Open Science operates under a continuous publication model. Your article will be published straight into the next open issue and this will be the final version of the paper. As such, it can be cited immediately by other researchers. As the issue version of your paper will be the only version to be published I would advise you to check your proofs thoroughly as changes cannot be made once the paper is published.
Please see the Royal Society Publishing guidance on how you may share your accepted author manuscript at https://royalsociety.org/journals/ethics-policies/media-embargo/.

As most balancing techniques are not optimal and there is no straightforward way to adjust the chosen loss, we downsample the data (i.e. the over-represented non-default class). We tried oversampling, but this caused overfitting to the repeated data points. We have now added this discussion to Section 2.b.ii. Future work may try oversampling with bootstrapping to make the examples less similar.
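The downsampling described above can be sketched as follows. This is a minimal illustration only, assuming a pandas DataFrame with a binary `default` label column; the function and column names are ours, not taken from the paper's code:

```python
import pandas as pd

def downsample_majority(df, label="default", seed=0):
    """Randomly drop rows of the over-represented class so that
    both classes appear equally often in the training data."""
    counts = df[label].value_counts()
    n_min = counts.min()
    parts = [
        df[df[label] == cls].sample(n=n_min, random_state=seed)
        for cls in counts.index
    ]
    # Shuffle so classes are interleaved rather than blocked
    return pd.concat(parts).sample(frac=1.0, random_state=seed).reset_index(drop=True)
```

Oversampling the minority class instead (repeating default rows) is what the response reports as causing overfitting; bootstrapped resampling is the suggested future direction.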
*For this kind of problem, online training would be advisable so as to ensure the ANN can handle data characteristics that change over time.
This may again be related to a lack of clarity about the dataset. We do not expect non-stationarities in this type of data; if they appear in the default rates, this should not matter for a well-trained classifier. In terms of online learning and re-training, we do not yet have enough data or a long enough period, but as the number of loans is growing super-linearly (Figure 1), regular re-training should in any case give more importance to recent loans. We are hence unable to notice the difference yet. Due to imbalance (as many loans take time to default), we need a one-year gap between the training data and the date at which we would run the model.
There is a considerable distance between linear and deep neural networks. Why has an alternative (in-between) ANN architecture not also been tested?
Other architectures are currently being tested and will be the subject of future work. The current work already analyses four families of models; delving into more would distract from the focus of the paper, which is a first deep learning application to P2P lending and its discussion.
In all cases, it should be ensured that the number of ANN free parameters (weights and biases) is smaller than the number of training patterns. The authors should check for overfitting in the DNNs used, especially as the datasets are small (in relation to the size of the ANN).

This is checked for in the explainability analysis of Section 3.b, where we open up the model and analyse the features. As a side point, the number of training parameters is on the order of 10^2-10^3, while the data is on the order of 10^4-10^5 points.
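The reviewer's sanity check can be made concrete: for a fully connected network, the free-parameter count is the sum of the weight matrices plus the bias vectors. A small helper illustrates this; the layer sizes in the example are ours, not the paper's actual architecture:

```python
def dnn_param_count(layer_sizes):
    """Weights + biases of a fully connected feed-forward network.

    layer_sizes: e.g. [20, 16, 8, 1] for a 20-feature input,
    two hidden layers and a single output unit.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
```

For instance, `dnn_param_count([20, 16, 8, 1])` gives 481 parameters, comfortably below the 10^4-10^5 training patterns cited in the response.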
What distinguishes test from recall patterns? Has no validation set been used? Testing can, by no means, be implemented on data that has been used for training.
As explained above, we use cross-validation for hyperparameter tuning and an out-of-sample test set to show results for the selected models. We have now outlined this in more detail in Sections 2.b.ii and 3.a.v.
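The workflow described in this response (hyperparameter tuning by cross-validation on the training split, final reporting once on a held-out test set) can be sketched with scikit-learn. The synthetic data and parameter grid below are illustrative stand-ins, not the paper's actual features or settings:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the loan features and default labels
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out the test set *before* any tuning takes place
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Cross-validation on the training split only, for hyperparameter selection
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring="recall_macro",
    cv=5,
)
search.fit(X_tr, y_tr)

# Report once, on data never seen during training or tuning
test_recall_macro = search.score(X_te, y_te)
```

Keeping the test split outside the cross-validation loop is what makes the reported score genuinely out-of-sample.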
At times it is not clear whether the aim is to mimic human decision making or to optimise/maximise recall as well as prediction.
We have again clarified this by inserting the scheme in the Methods Section 2.b and better defining the difference between the first and second phases. The first phase aims to mimic human acceptance/rejection decisions, as these are its target labels. The second phase aims to predict default risk based on factual default data.
The choice of features for the first stage should be fully justified.
We wish to clarify that there was very little choice involved, as no features are excluded except geographic ones, which are categorical and bear no intrinsic meaning. These should be used, encoded with related information, in future work.
If properly constructed and trained, the ANNs should be able to accurately learn the training patterns, as well as predict the test data.

This is indeed shown, as the model correctly interprets the features (explainability, Section 3.b) and the selected models achieve high scores on out-of-sample test data in Section 3.a.v.

Response to minor revisions
We thank the reviewers for their previous and current comments, which have greatly contributed to improving our work.
Reviewer comments to Author:
Reviewer: 2
Comments to the Author(s)
The manuscript has been improved to a satisfactory degree.
OK (no need for changes)

Reviewer: 1
Comments to the Author(s)
Minor things:
- Instead of 'Partial Dependency' it should be 'Partial Dependence'.
This has been modified throughout the paper.
- The OX axes in Figures 4 and 5 should be improved.
These have been improved (see figures).
- Figures 6 and 7 should be complemented with line charts that show how the conditional average probability depends on the values of the variables.
These have been added in what are now Figures 7, 9 and 11.
- Links to the code should be in Chapter 6 and not in the references.
Done (see Chapter 6).
- References are very incoherent; they need to be made more consistent. In particular, remove all links to the DOI.
Links have been removed throughout the references where not strictly necessary, and the reference format has been changed to improve readability.
- The small inscriptions in Figure 2 are unreadable. It is worth enlarging them.
These have been enlarged and are now visible.