Double attention recurrent convolution neural network for answer selection

Answer selection is one of the key steps in many question answering (QA) applications. In this paper, a new deep model with two kinds of attention is proposed for answer selection: the double attention recurrent convolution neural network (DARCNN). Double attention means self-attention and cross-attention. The design of this model was inspired by the Transformer from the domain of machine translation. Self-attention can directly calculate dependencies between words regardless of distance. However, self-attention does not distinguish a word's surrounding words from more distant ones. Thus, we design a decay self-attention that prioritizes local words in a sentence. In addition, cross-attention is established to achieve interaction between the question and the candidate answer. From the outputs of self-attention and decay self-attention, we obtain two kinds of interactive information via cross-attention. Finally, the feature vectors of the question and answer are combined by elementwise multiplication, and a multilayer perceptron is used to predict the matching score. Experimental results on four QA datasets covering Chinese and English show that DARCNN performs better than other answer selection models, demonstrating the effectiveness of self-attention, decay self-attention and cross-attention in answer selection tasks.
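To make the mechanics above concrete, here is a minimal numpy sketch of plain versus decay self-attention. The abstract does not specify the paper's decay function or projection matrices, so the linear distance penalty, the identity Q/K/V projections, and the helper names (attend, linear_decay_mask) are illustrative assumptions only.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(Q, K, V, mask=None):
    """Scaled dot-product attention; an optional mask is added to the logits."""
    logits = Q @ K.T / np.sqrt(Q.shape[-1])
    if mask is not None:
        logits = logits + mask
    return softmax(logits) @ V  # new contextualized representations

def linear_decay_mask(n, rate=0.1):
    """Assumed decay mask: penalize word pairs by their distance so that
    nearby words dominate; the paper's actual decay function may differ."""
    pos = np.arange(n)
    return -rate * np.abs(pos[:, None] - pos[None, :])

# A 6-word sentence with 8-dimensional word vectors.
X = np.random.randn(6, 8)
plain = attend(X, X, X)                             # dependencies at any distance
local = attend(X, X, X, linear_decay_mask(len(X)))  # biased toward neighbours
# Cross-attention between a question Hq and an answer Ha reuses the same
# primitive, e.g. attend(Hq, Ha, Ha), letting each question word gather
# answer-side context.
```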

Decision letter (RSOS-191517.R0)

27-Feb-2020

Dear Dr Wei,

The editors assigned to your paper ("Double attention recurrent convolution neural network for answer selection") have now received comments from reviewers. We would like you to revise your paper in accordance with the referee and Associate Editor suggestions, which can be found below (not including confidential reports to the Editor). Please note this decision does not guarantee eventual acceptance.
Please submit a copy of your revised paper before 21-Mar-2020. Please note that the revision deadline will expire at 00.00am on this date. If we do not hear from you within this time then it will be assumed that the paper has been withdrawn. In exceptional circumstances, extensions may be possible if agreed with the Editorial Office in advance. We do not allow multiple rounds of revision so we urge you to make every effort to fully address all of the comments at this stage. If deemed necessary by the Editors, your manuscript will be sent back to one or more of the original reviewers for assessment. If the original reviewers are not available, we may invite new reviewers.
To revise your manuscript, log into http://mc.manuscriptcentral.com/rsos and enter your Author Centre, where you will find your manuscript title listed under "Manuscripts with Decisions." Under "Actions," click on "Create a Revision." Your manuscript number has been appended to denote a revision. Revise your manuscript and upload a new version through your Author Centre.
When submitting your revised manuscript, you must respond to the comments made by the referees and upload a file "Response to Referees" in "Section 6 - File Upload". Please use this to document how you have responded to the comments, and the adjustments you have made. In order to expedite the processing of the revised manuscript, please be as specific as possible in your response.
In addition to addressing all of the reviewers' and editor's comments, please also ensure that your revised manuscript contains the following sections as appropriate before the reference list:

• Ethics statement (if applicable)
If your study uses humans or animals please include details of the ethical approval received, including the name of the committee that granted approval. For human studies please also detail whether informed consent was obtained. For field studies on animals please include details of all permissions, licences and/or approvals granted to carry out the fieldwork.

• Data accessibility
It is a condition of publication that all supporting data are made available either as supplementary information or, preferably, in a suitable permanent repository. The data accessibility section should state where the article's supporting data can be accessed. This section should also include details, where possible, of where other relevant research materials such as statistical tools, protocols and software can be accessed. If the data have been deposited in an external repository this section should list the database, accession number and link to the DOI for all data from the article that have been made publicly available. Data sets that have been deposited in an external repository and have a DOI should also be appropriately cited in the manuscript and included in the reference list.

If you wish to submit your supporting data or code to Dryad (http://datadryad.org/), or modify your current submission to Dryad, please use the following link: http://datadryad.org/submit?journalID=RSOS&manu=RSOS-191517

• Competing interests
Please declare any financial or non-financial competing interests, or state that you have no competing interests.

• Authors' contributions
All submissions, other than those with a single author, must include an Authors' Contributions section which individually lists the specific contribution of each author. The list of authors should meet all of the following criteria: 1) substantial contributions to conception and design, or acquisition of data, or analysis and interpretation of data; 2) drafting the article or revising it critically for important intellectual content; and 3) final approval of the version to be published.
All contributors who do not meet all of these criteria should be included in the acknowledgements.
We suggest the following format: AB carried out the molecular lab work, participated in data analysis, carried out sequence alignments, participated in the design of the study and drafted the manuscript; CD carried out the statistical analyses; EF collected field data; GH conceived of the study, designed the study, coordinated the study and helped draft the manuscript. All authors gave final approval for publication.
• Acknowledgements
Please acknowledge anyone who contributed to the study but did not meet the authorship criteria.

• Funding statement
Please list the source of funding for each author.
Once again, thank you for submitting your manuscript to Royal Society Open Science and I look forward to receiving your revision. If you have any questions at all, please do not hesitate to get in touch.

Associate Editor's comments:

The reviewers have read your manuscript and recommend various revisions that will make it stronger. While revising your paper, please answer the technical questions of reviewer 1 and also make sure to address the rewrites reviewer 2 recommends. Also, in the experimental section, it would be good to see further analysis, e.g., on the decay matrix component and possibly the addition of a BERT-based baseline.
Reviewers' Comments to Author:

Reviewer: 1

Comments to the Author(s)

Technical concerns:

[1] Page 5, lines 49-50: this statement is likely to be faulty; the multiplication by V results in a new contextualized representation matrix for the inputs, instead of "attention weight", which, in different contexts, refers to either the logit scores before softmax or the distribution after softmax.
[2] Eq. (10): Does w here denote a vector? If so, how is Q*K^T in Eq. (10) supposed to produce a vector? If not, you should not take w_i as a scalar in Eq. (11). In Eq. (14), you add w to your decay matrix; does that mean w is a matrix? These notations need to be fixed or further clarified.
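For concreteness, the shapes at issue in these two comments can be written out as follows. This is not the manuscript's code: the dimensions and the zero decay matrix are placeholders, and the equation numbers in the comments refer to the manuscript under review.

```python
import numpy as np

n, d = 5, 16                          # sequence length, hidden size (arbitrary)
Q, K, V = (np.random.randn(n, d) for _ in range(3))

w = Q @ K.T / np.sqrt(d)              # Eq. (10): w is an n x n matrix of logits
w_i = w[0]                            # a row of w is a vector, not a scalar
D = np.zeros((n, n))                  # placeholder for the paper's decay matrix
masked = w + D                        # Eq. (14): elementwise sum of two matrices

e = np.exp(masked - masked.max(-1, keepdims=True))
weights = e / e.sum(-1, keepdims=True)  # attention distribution (post-softmax)
context = weights @ V                 # multiplying by V yields contextualized
                                      # representations, not "attention weights"
```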
General concerns:

It's good to see the authors did an ablation study for all components in the framework. However, I still have the following concerns:

[1] When you say "wo/ BiLSTM/attention/CNN", did you use the same amount of parameters? For instance, if you want to prove the effects of decay attention, the best practice would be replacing it with another self-attention, i.e., just removing the decay mask. Completely removing this attention leads to a non-negligible loss in the parameter size, which, most likely, hurts overall model capacity.
[2] The comparison with BiLSTM is an interesting part to me, since the function of your decay attention seems similar to BiLSTM, i.e., putting emphasis on surrounding words. In order to demonstrate the point "self-attention cannot obtain the position information and word order information on the sequence", I suggest the authors increase the layers of self-attention/decay attention when trying to remove BiLSTM.
[3] This paper argues that self-attention is not enough for capturing positional information. I generally agree with this point. However, in the original paper, together with the proposal of self-attention, "position embeddings" were adopted to encode the positional prior of a sequence. I was disappointed to find its reference/comparison missing from this paper. Since the "decay matrix" proposed here is supposed to be one of the key contributions of this work, I would like to see a deeper analysis of this component, including the comparison to positional embeddings and the fairer ablation study settings mentioned above.

Reviewer: 2
Comments to the Author(s)

This manuscript presents a neural architecture for answer selection in an information retrieval system. The neural architecture consists of several components, including BiLSTM, self- and cross-attention, CNN, and MLP layers. The proposed architecture outperforms some selected baseline methods.
While the manuscript presents an interesting problem and addresses relevant challenges, I believe the paper is not well-presented, and many choices of the authors are not motivated. Below I provide more details about the limitations of the paper.
While the authors declined to discuss the related work in detail (Section 3 is done very superficially), they have spent a lot of effort describing well-known theories and components of neural models (e.g., Sections 4.2, 4.3, etc.). Also, while providing details about these neural components, I didn't find the provided motivation for employing them convincing. Therefore, it is a question whether the choice of components is well-motivated, which in turn affects the quality of the contribution of the manuscript.
Furthermore, important baselines are missing from the experiments. For instance, I am curious to know how their system would compare against a BERT-based model. Also, the depth of the discussions and analyses is not convincing enough. Some recent collections for question answering are also missing (e.g., ANTIQUE [1]). I recommend that the authors also redo their experiments on this dataset.
Overall, I believe that this manuscript has the potential of making a good publication but requires extensive revision, including but not limited to improving the presentation (i.e., related work, model description, results and analysis). More recent baselines should be added (i.e., BERT-based models). More recent datasets that are more appropriate for testing neural models should also be added to the experiments (i.e., ANTIQUE).

[1] https://ciir.cs.umass.edu/downloads/Antique/

Decision letter (RSOS-191517.R1)

On behalf of the Editors, I am pleased to inform you that your Manuscript RSOS-191517.R1 entitled "Double attention recurrent convolution neural network for answer selection" has been accepted for publication in Royal Society Open Science subject to minor revision in accordance with the referee suggestions. Please find the referees' comments at the end of this email.
The reviewers and Subject Editor have recommended publication, but also suggest some minor revisions to your manuscript. Therefore, I invite you to respond to the comments and revise your manuscript.
• Ethics statement
If your study uses humans or animals please include details of the ethical approval received, including the name of the committee that granted approval. For human studies please also detail whether informed consent was obtained. For field studies on animals please include details of all permissions, licences and/or approvals granted to carry out the fieldwork.

• Data accessibility
It is a condition of publication that all supporting data are made available either as supplementary information or, preferably, in a suitable permanent repository. The data accessibility section should state where the article's supporting data can be accessed. This section should also include details, where possible, of where other relevant research materials such as statistical tools, protocols and software can be accessed. If the data have been deposited in an external repository this section should list the database, accession number and link to the DOI for all data from the article that have been made publicly available. Data sets that have been deposited in an external repository and have a DOI should also be appropriately cited in the manuscript and included in the reference list.

If you wish to submit your supporting data or code to Dryad (http://datadryad.org/), or modify your current submission to Dryad, please use the following link: http://datadryad.org/submit?journalID=RSOS&manu=RSOS-191517.R1

• Competing interests
Please declare any financial or non-financial competing interests, or state that you have no competing interests.

• Authors' contributions
All submissions, other than those with a single author, must include an Authors' Contributions section which individually lists the specific contribution of each author. The list of authors should meet all of the following criteria: 1) substantial contributions to conception and design, or acquisition of data, or analysis and interpretation of data; 2) drafting the article or revising it critically for important intellectual content; and 3) final approval of the version to be published.
All contributors who do not meet all of these criteria should be included in the acknowledgements.
We suggest the following format: AB carried out the molecular lab work, participated in data analysis, carried out sequence alignments, participated in the design of the study and drafted the manuscript; CD carried out the statistical analyses; EF collected field data; GH conceived of the study, designed the study, coordinated the study and helped draft the manuscript. All authors gave final approval for publication.
• Acknowledgements
Please acknowledge anyone who contributed to the study but did not meet the authorship criteria.

• Funding statement
Please list the source of funding for each author.
Please note that we cannot publish your manuscript without these end statements included. We have included a screenshot example of the end statements for reference. If you feel that a given heading is not relevant to your paper, please nevertheless include the heading and explicitly state that it is not relevant to your work.
Because the schedule for publication is very tight, it is a condition of publication that you submit the revised version of your manuscript before @@author due date will be populated when the email is sent@@. Please note that the revision deadline will expire at 00.00am on this date. If you do not think you will be able to meet this date please let me know immediately.
To revise your manuscript, log into https://mc.manuscriptcentral.com/rsos and enter your Author Centre, where you will find your manuscript title listed under "Manuscripts with Decisions". Under "Actions," click on "Create a Revision." You will be unable to make your revisions on the originally submitted version of the manuscript. Instead, revise your manuscript and upload a new version through your Author Centre.
When submitting your revised manuscript, you will be able to respond to the comments made by the referees and upload a file "Response to Referees" in "Section 6 - File Upload". You can use this to document any changes you make to the original manuscript. In order to expedite the processing of the revised manuscript, please be as specific as possible in your response to the referees.
When uploading your revised files please make sure that you have:

1) A text file of the manuscript (tex, txt, rtf, docx or doc), references, tables (including captions) and figure captions. Do not upload a PDF as your "Main Document".

2) A separate electronic file of each figure (EPS or print-quality PDF preferred (either format should be produced directly from original creation package), or original software format).

3) Included a 100 word media summary of your paper when requested at submission. Please ensure you have entered correct contact details (email, institution and telephone) in your user account.

4) Included the raw data to support the claims made in your paper. You can either include your data as electronic supplementary material or upload to a repository and include the relevant doi within your manuscript.

5) All supplementary materials accompanying an accepted article will be treated as in their final form. Note that the Royal Society will neither edit nor typeset supplementary material and it will be hosted as provided. Please ensure that the supplementary material includes the paper details where possible (authors, article title, journal name).
Supplementary files will be published alongside the paper on the journal website and posted on the online figshare repository (https://figshare.com). The heading and legend provided for each supplementary file during the submission process will be used to create the figshare page, so please ensure these are accurate and informative so that your files can be found in searches. Files on figshare will be made available approximately one week before the accompanying article so that the supplementary material can be attributed a unique DOI.

We recommend that you ask a native speaker of English or solicit the support of a language editing service (https://royalsociety.org/journals/authors/language-polishing/) prior to resubmitting the manuscript. Note also the typo in 'Simaese Architecture' (should be Siamese).

See Appendix B.
Decision letter (RSOS-191517.R2)

We hope you are keeping well at this difficult and unusual time. We continue to value your support of the journal in these challenging circumstances. If Royal Society Open Science can assist you at all, please don't hesitate to let us know at the email address below.

Dear Dr Wei,
It is a pleasure to accept your manuscript entitled "Double attention recurrent convolution neural network for answer selection" in its current form for publication in Royal Society Open Science.
You can expect to receive a proof of your article in the near future. Please contact the editorial office (openscience_proofs@royalsociety.org) and the production office (openscience@royalsociety.org) to let us know if you are likely to be away from e-mail contact; if you are going to be away, please nominate a co-author (if available) to manage the proofing process, and ensure they are copied into your email to the journal. Due to rapid publication and an extremely tight schedule, if comments are not received, your paper may experience a delay in publication.

Royal Society Open Science operates under a continuous publication model. Your article will be published straight into the next open issue and this will be the final version of the paper. As such, it can be cited immediately by other researchers. As the issue version of your paper will be the only version to be published, I would advise you to check your proofs thoroughly as changes cannot be made once the paper is published.
Please see the Royal Society Publishing guidance on how you may share your accepted author manuscript at https://royalsociety.org/journals/ethics-policies/media-embargo/.

Dear Editors and Reviewers,
We have carefully read all the comments, and the corresponding revisions have been carried out. We believe this will make the paper more professional and rigorous. We would like to express our gratitude to the editor and reviewers for their efforts. We marked the changes in red (responses to comments from reviewer 1) and green (responses to comments from reviewer 2) in the revised manuscript. We hope it meets with approval.
The point-to-point responses are as follows:

Associate Editor's comments:
Comments to the Author:

Dear Author, the reviewers have read your manuscript and recommend various revisions that will make it stronger. While revising your paper, please answer the technical questions of reviewer 1 and also make sure to address the rewrites reviewer 2 recommends. Also, in the experimental section, it would be good to see further analysis, e.g., on the decay matrix component and possibly the addition of a BERT-based baseline.

Response:
We have revised the manuscript according to the opinions of the two reviewers and answered the reviewers' questions. To make the article more rigorous, we added some experiments. In the revised version we added an ablation experiment that removes only the decay matrix; this does not affect the capacity of the model, and in this experiment MAP and MRR each decreased by 2.5%. It can be seen that the decay mask has a positive effect on the model, and there is a clear difference between self-attention and decay self-attention. Following the reviewers' suggestions, we then added a BERT-based model and the ANTIQUE dataset to the experiments. Compared with BERT, DARCNN is 1 to 2 percentage points lower on the three datasets, but its advantage is that its model capacity is much smaller than BERT's.
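A minimal sketch of the parameter-matched ablation described in this response, assuming a linear distance decay standing in for the paper's mask: the mask is additive and has no trainable parameters, so toggling it off changes behaviour but not model capacity.

```python
import numpy as np

def self_attention(X, use_decay_mask, rate=0.1):
    """One module serves both ablation arms: the decay mask is parameter-free,
    so removing it changes behaviour but not the parameter count."""
    n, d = X.shape
    logits = X @ X.T / np.sqrt(d)
    if use_decay_mask:
        pos = np.arange(n)
        logits = logits - rate * np.abs(pos[:, None] - pos[None, :])  # assumed decay
    e = np.exp(logits - logits.max(-1, keepdims=True))
    return (e / e.sum(-1, keepdims=True)) @ X

X = np.random.randn(6, 8)
full = self_attention(X, use_decay_mask=True)      # decay self-attention
ablated = self_attention(X, use_decay_mask=False)  # plain self-attention
```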

Reviewer: 1
Comments to the Author(s)

Technical concerns:

[1] Page 5, lines 49-50: this statement is likely to be faulty; the multiplication by V results in a new contextualized representation matrix for the inputs, instead of "attention weight", which, in different contexts, refers to either the logit scores before softmax or the distribution after softmax.

Response:
In the original manuscript, the expression "multiply the matrix V to get a feature vector of the attention weight" was not accurate. Following the reviewer's suggestion, we have changed it to "multiply the matrix V to get a new contextualized representation matrix". Thank you very much for your suggestion.
[2] Eq. (10): Does w here denote a vector? If so, how is Q*K^T in Eq. (10) supposed to produce a vector? If not, you should not take w_i as a scalar in Eq. (11). In Eq. (14), you add w to your decay matrix; does that mean w is a matrix? These notations need to be fixed or further clarified.
General concerns:

It's good to see the authors did an ablation study for all components in the framework. However, I still have the following concerns:

[1] When you say "wo/ BiLSTM/attention/CNN", did you use the same amount of parameters? For instance, if you want to prove the effects of decay attention, the best practice would be replacing it with another self-attention, i.e., just removing the decay mask. Completely removing this attention leads to a non-negligible loss in the parameter size, which, most likely, hurts overall model capacity.
Response:

Following the reviewer's comment, and in order to better demonstrate the role of decay self-attention in the model, we removed only the decay mask in the ablation experiment for comparison. In this experiment, MAP and MRR each decreased by 2.5%. Thank you.
[2] The comparison with BiLSTM is an interesting part to me, since the function of your decay attention seems similar to BiLSTM, i.e., putting emphasis on surrounding words. In order to demonstrate the point "self-attention cannot obtain the position information and word order information on the sequence", I suggest the authors increase the layers of self-attention/decay attention when trying to remove BiLSTM.

Response:

The point "self-attention cannot obtain the position information and word order information on the sequence" follows from other literature and from the calculation principle of self-attention. In the ablation experiment removing BiLSTM, MAP and MRR decreased by 6.7% and 7.0%, respectively. In the revised version, we used positional embedding to replace BiLSTM, and MAP and MRR decreased by only 3.5% and 3.7%. Therefore, we believe that BiLSTM captures positional and word-order relationships, while self-attention does not.
[3] This paper argues that self-attention is not enough for capturing positional information. I generally agree with this point. However, in the original paper, together with the proposal of self-attention, "position embeddings" were adopted to encode the positional prior of a sequence. I was disappointed to find its reference/comparison missing from this paper. Since the "decay matrix" proposed here is supposed to be one of the key contributions of this work, I would like to see a deeper analysis of this component, including the comparison to positional embeddings and the fairer ablation study settings mentioned above.

Response:
BiLSTM already takes location information and word order information into account. Following the reviewer's comment, we compared against positional embedding: in the ablation experiment removing BiLSTM, we replaced BiLSTM with positional embedding, and MAP and MRR decreased by only 3.5% and 3.7%. Therefore, we believe that BiLSTM accounts for position and word order when generating the new representation matrix, which makes up for this deficiency of self-attention. Thank you.
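The response does not state which positional embedding variant replaced BiLSTM in this ablation; the fixed sinusoidal encoding from the original Transformer paper, which reviewer 1 refers to, is one natural choice and is sketched below.

```python
import numpy as np

def sinusoidal_positional_embedding(seq_len, d_model):
    """Fixed sinusoidal encodings from 'Attention Is All You Need'; whether
    the authors used this variant or learned embeddings is not stated."""
    pos = np.arange(seq_len)[:, None]                     # (seq_len, 1)
    i = np.arange(d_model)[None, :]                       # (1, d_model)
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                 # even dimensions
    pe[:, 1::2] = np.cos(angles[:, 1::2])                 # odd dimensions
    return pe

# Ablation variant: add positions to the word embeddings instead of encoding
# word order with a BiLSTM, then feed the sum to the attention layers.
X = np.random.randn(6, 8)                                 # word embeddings
X_with_pos = X + sinusoidal_positional_embedding(6, 8)
```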

Reviewer: 2
Comments to the Author(s)

This manuscript presents a neural architecture for answer selection in an information retrieval system. The neural architecture consists of several components, including BiLSTM, self- and cross-attention, CNN, and MLP layers. The proposed architecture outperforms some selected baseline methods.
While the manuscript presents an interesting problem and addresses relevant challenges, I believe the paper is not well-presented, and many choices of the authors are not motivated. Below I provide more details about the limitations of the paper.
While the authors declined to discuss the related work in detail (Section 3 is done very superficially), they have spent a lot of effort describing well-known theories and components of neural models (e.g., Sections 4.2, 4.3, etc.). Also, while providing details about these neural components, I didn't find the provided motivation for employing them convincing. Therefore, it is a question whether the choice of components is well-motivated, which in turn affects the quality of the contribution of the manuscript.
Furthermore, important baselines are missing from the experiments. For instance, I am curious to know how their system would compare against a BERT-based model. Also, the depth of the discussions and analyses is not convincing enough. Some recent collections for question answering are also missing (e.g., ANTIQUE [1]). I recommend that the authors also redo their experiments on this dataset.
Overall, I believe that this manuscript has the potential of making a good publication but requires extensive revision, including but not limited to improving the presentation (i.e., related work, model description, results and analysis). More recent baselines should be added (i.e., BERT-based models). More recent datasets that are more appropriate for testing neural models should also be added to the experiments (i.e., ANTIQUE).
Response:

First, the motivation for designing this model is to improve existing self-attention and make the model perform better in answer selection. Second, the flexibility of self-attention as a feature extractor makes it easy to improve: in this article, we propose adding a decay mask to self-attention, called decay self-attention, and we use cross-attention to merge the two kinds of self-attention. To further demonstrate the role of these two types of self-attention in the model, in the revised version we added an ablation experiment that removes only the decay matrix. This does not affect the capacity of the model, and in this experiment MAP and MRR each decreased by 2.5%. It can be seen that the decay mask has a positive effect on the model, and there is a clear difference between self-attention and decay self-attention. BiLSTM is also an important component of the model. In the revised version, we added an ablation experiment that replaces BiLSTM with positional embedding. From this comparison, we believe that BiLSTM makes up for the positional and word-order information that self-attention fails to capture.
Following the reviewers' suggestions, we then added a BERT-based model and the ANTIQUE dataset to the experiments. Compared with BERT, DARCNN is 1 to 2 percentage points lower on the three datasets, but its advantage is that its model capacity is much smaller than BERT's. Thank you very much for your suggestion.

Dear Editors and Reviewers,
We have carefully read all the comments, and the corresponding revisions have been carried out. We believe this will make the paper more professional and rigorous. We would like to express our gratitude to the editor and reviewers for their efforts. We hope it meets with approval.
The point-to-point responses are as follows:

Editor Comments to Author:

We are marking this as 'accept with minor' to allow for improvements in English. Note also the typo in 'Simaese Architecture' (should be Siamese).
As you have been requested to edit the written English, you must provide proof that you have done so: acceptable proof includes a certificate of language-editing from a language editing service or a signed letter from a native speaker of English. If you do not provide this proof, your manuscript may be returned to you. For information about language editing services endorsed by the Royal Society, please follow the link below: https://royalsociety.org/journals/authors/language-polishing/