Assortative mating frames establishment in a young island bird population

Successful island colonizations are key events to understand range dynamic processes, but studying a young population right after it reaches establishment is a rare opportunity in natural systems. The genetic structure of a recently established population may offer unique insights into its colonization history and demographic processes that are important for a successful colonization. Here, we studied the population genetics of a recently established island population of Eurasian blackbirds (Aves: Turdus merula) located on the island of Heligoland in the German North Sea. Using microsatellites, we genotyped the majority of the island population, including the nestlings, over a 4-year period between 2004 and 2007. We also genotyped high numbers of migrants on stopover and mainland individuals, as they are potential founders of the island population. We identified two genetic clusters that comply with the migrating and mainland birds. While most of the island birds belong to the mainland cluster, some breeding individuals and a low fraction of the offspring belong to the genetic cluster found in migrating individuals with almost no admixture between the two, pointing to assortative mating acting on the island population. We did not find any evidence for founder events and detected deviations from the Hardy–Weinberg equilibrium that disappeared in cohorts of older age that coincide with a lower number of siblings in older cohorts. The observed genetic patterns unravel a complex colonization history to which migratory and mainland birds have contributed and which is characterized by assortative mating. Further research will be directed towards habitat selection and phenotypic differences as potential drivers of assortative mating in this island population.

I have reviewed the paper "Assortative mating frames establishment in a young island bird population." The manuscript is well written and the methods are nicely explained. I am also impressed by the amount of work; genotyping no less than 630 individuals. However, I do have some comments and suggestions regarding the methods applied to analyze the genetic data.
First, the STRUCTURE analysis is based on all loci and all individuals. Using this complete dataset can potentially bias the results. Several loci deviate from Hardy-Weinberg equilibrium while STRUCTURE assumes that loci within populations are in Hardy-Weinberg equilibrium (see introduction of STRUCTURE manual). Next, the analysis of related individuals can cause estimation biases because of shared variation that consequently affects the ancestry analysis (see for example Porras-Hurtado et al. 2013 Frontiers in Genetics). Given that the authors have sampled almost the entire island population, the dataset likely contains several related individuals. I would suggest removing these individuals and rerunning the analyses. You could remove related individuals with ML-related (Kalinowski et al. 2006 Molecular Ecology Notes). Removing related individuals might also solve the issues with Hardy-Weinberg equilibrium. Alternatively, the analyses can be done with the software package CLUSTER_DIST which does not make Hardy-Weinberg assumptions and can deal with related individuals (Rodriguez-Ramilo et al. 2014 Genetics Research).
During the quantification of population differentiation (using Fst and Dest), the samples are divided over six groups. Why did you not pool all the island samples (resident island [>20 obs], resident island [<20 obs] and nestlings)?
Based on the observed heterozygosity values, the authors conclude that there are no signs of a founder effect. Although I agree with this reasoning, this statement can be tested statistically. One could for example use the software BOTTLENECK, which checks if there has been a significant decrease in allelic richness (Cornuet & Luikart 1997 Genetics).
The supplementary material contains simulations of admixture to test particular colonization scenarios. These analyses culminated in the following sentence: "neither scenario resulted in a comparable pattern than the clearly distinct distribution of genotypes in island offspring observed." Hence, I do not see the point of including these simulations in the manuscript. Normally, one would simulate several scenarios and compare the output with the actual data using a goodness-of-fit test. The scenario that most closely matches the data gives some insights into the possible history of the population. I would suggest that the authors either remove the simulations from the paper or perform a more detailed analysis with statistical validation.
Specific comments
Line 18: replace at with on.
Line 18: Throughout the manuscript, blackbirds is sometimes written with capital letter and sometimes with small letter. Please check for consistency.
Line 19: What genetic markers did you use? Mention the microsatellites here.
Line 21-23: Restructure this sentence. I would suggest: We also genotyped high numbers of

Decision letter (RSOS-190050.R0)

12-Mar-2019
Dear Dr Engler, The editors assigned to your paper ("Assortative mating frames establishment in a young island bird population") have now received comments from reviewers. We would like you to revise your paper in accordance with the referee and Associate Editor suggestions which can be found below (not including confidential reports to the Editor). Please note this decision does not guarantee eventual acceptance.
Please submit a copy of your revised paper before 04-Apr-2019. Please note that the revision deadline will expire at 00.00am on this date. If we do not hear from you within this time then it will be assumed that the paper has been withdrawn. In exceptional circumstances, extensions may be possible if agreed with the Editorial Office in advance. We do not allow multiple rounds of revision so we urge you to make every effort to fully address all of the comments at this stage. If deemed necessary by the Editors, your manuscript will be sent back to one or more of the original reviewers for assessment. If the original reviewers are not available, we may invite new reviewers.
To revise your manuscript, log into http://mc.manuscriptcentral.com/rsos and enter your Author Centre, where you will find your manuscript title listed under "Manuscripts with Decisions." Under "Actions," click on "Create a Revision." Your manuscript number has been appended to denote a revision. Revise your manuscript and upload a new version through your Author Centre.
When submitting your revised manuscript, you must respond to the comments made by the referees and upload a file "Response to Referees" in "Section 6 -File Upload". Please use this to document how you have responded to the comments, and the adjustments you have made. In order to expedite the processing of the revised manuscript, please be as specific as possible in your response.
In addition to addressing all of the reviewers' and editor's comments please also ensure that your revised manuscript contains the following sections as appropriate before the reference list:
• Ethics statement (if applicable) If your study uses humans or animals please include details of the ethical approval received, including the name of the committee that granted approval. For human studies please also detail whether informed consent was obtained. For field studies on animals please include details of all permissions, licences and/or approvals granted to carry out the fieldwork.
• Data accessibility It is a condition of publication that all supporting data are made available either as supplementary information or preferably in a suitable permanent repository. The data accessibility section should state where the article's supporting data can be accessed. This section should also include details, where possible, of where other relevant research materials such as statistical tools, protocols, software etc. can be accessed. If the data have been deposited in an external repository this section should list the database, accession number and link to the DOI for all data from the article that have been made publicly available. Data sets that have been deposited in an external repository and have a DOI should also be appropriately cited in the manuscript and included in the reference list.
If you wish to submit your supporting data or code to Dryad (http://datadryad.org/), or modify your current submission to dryad, please use the following link: http://datadryad.org/submit?journalID=RSOS&manu=RSOS-190050
• Competing interests Please declare any financial or non-financial competing interests, or state that you have no competing interests.
• Authors' contributions All submissions, other than those with a single author, must include an Authors' Contributions section which individually lists the specific contribution of each author. The list of Authors should meet all of the following criteria; 1) substantial contributions to conception and design, or acquisition of data, or analysis and interpretation of data; 2) drafting the article or revising it critically for important intellectual content; and 3) final approval of the version to be published.
All contributors who do not meet all of these criteria should be included in the acknowledgements.
We suggest the following format: AB carried out the molecular lab work, participated in data analysis, carried out sequence alignments, participated in the design of the study and drafted the manuscript; CD carried out the statistical analyses; EF collected field data; GH conceived of the study, designed the study, coordinated the study and helped draft the manuscript. All authors gave final approval for publication.
• Acknowledgements Please acknowledge anyone who contributed to the study but did not meet the authorship criteria.
Your paper has been reviewed by two referees who see strong merit in the study, but Referee 1 suggests some re-analysis, which seems sensible to me. I therefore am recommending that you at least try these analyses to see if they change the results and report this in your revision.

Comments to Author:
Specific comments
Line 18: replace at with on.
Line 18: Throughout the manuscript, blackbirds is sometimes written with capital letter and sometimes with small letter. Please check for consistency.
Line 19: What genetic markers did you use? Mention the microsatellites here.
Line 21-23: Restructure this sentence. I would suggest: We also genotyped high numbers of migrants on stopover and nearby mainland individuals, as they are potential founders of the island population.
Line 27: typo: found should be find
Line 28: What do you mean with vanished? I would use another word here or rephrase it.
Line 42: replace by with with.
Line 52: replace run with go
Line 53: replace the entire with all
Line 57-58: What do you mean with "divergent entities intervene"?
Line 65: replace holding with comprising
Line 70-71: Rephrase. I would suggest: In this regard, the system offers an interesting setting as the possible source for colonizing the island are birds from …"
Line 76: This island population is the most isolated population of Eurasian blackbirds in Central Europe. Do you have a reference for that?
Line 84: Mention that you used microsatellites here.
Line 103-105: Move this sentence to the beginning of the section. It's important to know how many markers you used from the beginning.
Line 104: typo: polymorphic
Line 143: It is not clear from the text how you performed the population assignment test. I guess you compared the STRUCTURE output with the origin of the samples.
Line 175-177: Can you really distinguish between recurrent migration and several propagules that intermixed later on?
Line 186: replace was with were
Line 187: I would not use the word "genotype" here. Cluster would be a better term (see also

Do you have any ethical concerns with this paper? No
Have you any concerns about statistical analyses in this paper? No

Recommendation?
Accept with minor revision (please list in comments)

Comments to the Author(s)
The authors have nicely addressed all my concerns. They performed extra analyses to show that the deviations from Hardy-Weinberg equilibrium did not affect the STRUCTURE analyses. And they added a statistical procedure to the simulations. Although I would have liked a statistical test of a potential bottleneck, I understand that this was not feasible with the present data set. I think this manuscript is almost ready for publication. I did, however, find a few minor mistakes in the text (see below). Notably, the results section switches between past and present tense. This can easily be corrected.
On behalf of the Editors, I am pleased to inform you that your Manuscript RSOS-190050.R1 entitled "Assortative mating frames establishment in a young island bird population" has been accepted for publication in Royal Society Open Science subject to minor revision in accordance with the referee suggestions. Please find the referees' comments at the end of this email.

Minor comments
The reviewers and Subject Editor have recommended publication, but also suggest some minor revisions to your manuscript. Therefore, I invite you to respond to the comments and revise your manuscript.
• Ethics statement If your study uses humans or animals please include details of the ethical approval received, including the name of the committee that granted approval. For human studies please also detail whether informed consent was obtained. For field studies on animals please include details of all permissions, licences and/or approvals granted to carry out the fieldwork.
• Data accessibility It is a condition of publication that all supporting data are made available either as supplementary information or preferably in a suitable permanent repository. The data accessibility section should state where the article's supporting data can be accessed. This section should also include details, where possible, of where other relevant research materials such as statistical tools, protocols, software etc. can be accessed. If the data has been deposited in an external repository this section should list the database, accession number and link to the DOI for all data from the article that has been made publicly available. Data sets that have been deposited in an external repository and have a DOI should also be appropriately cited in the manuscript and included in the reference list.
If you wish to submit your supporting data or code to Dryad (http://datadryad.org/), or modify your current submission to dryad, please use the following link: http://datadryad.org/submit?journalID=RSOS&manu=RSOS-190050.R1
• Competing interests Please declare any financial or non-financial competing interests, or state that you have no competing interests.
• Authors' contributions All submissions, other than those with a single author, must include an Authors' Contributions section which individually lists the specific contribution of each author. The list of Authors should meet all of the following criteria; 1) substantial contributions to conception and design, or acquisition of data, or analysis and interpretation of data; 2) drafting the article or revising it critically for important intellectual content; and 3) final approval of the version to be published.
All contributors who do not meet all of these criteria should be included in the acknowledgements.
We suggest the following format: AB carried out the molecular lab work, participated in data analysis, carried out sequence alignments, participated in the design of the study and drafted the manuscript; CD carried out the statistical analyses; EF collected field data; GH conceived of the study, designed the study, coordinated the study and helped draft the manuscript. All authors gave final approval for publication.
• Acknowledgements Please acknowledge anyone who contributed to the study but did not meet the authorship criteria.
• Funding statement Please list the source of funding for each author.
Please note that we cannot publish your manuscript without these end statements included. We have included a screenshot example of the end statements for reference. If you feel that a given heading is not relevant to your paper, please nevertheless include the heading and explicitly state that it is not relevant to your work.
Because the schedule for publication is very tight, it is a condition of publication that you submit the revised version of your manuscript before 22-Jun-2019. Please note that the revision deadline will expire at 00.00am on this date. If you do not think you will be able to meet this date please let me know immediately.
To revise your manuscript, log into https://mc.manuscriptcentral.com/rsos and enter your Author Centre, where you will find your manuscript title listed under "Manuscripts with Decisions". Under "Actions," click on "Create a Revision." You will be unable to make your revisions on the originally submitted version of the manuscript. Instead, revise your manuscript and upload a new version through your Author Centre.
When submitting your revised manuscript, you will be able to respond to the comments made by the referees and upload a file "Response to Referees" in "Section 6 -File Upload". You can use this to document any changes you make to the original manuscript. In order to expedite the processing of the revised manuscript, please be as specific as possible in your response to the referees.
When uploading your revised files please make sure that you have:
1) A text file of the manuscript (tex, txt, rtf, docx or doc), references, tables (including captions) and figure captions. Do not upload a PDF as your "Main Document".
2) A separate electronic file of each figure (EPS or print-quality PDF preferred (either format should be produced directly from original creation package), or original software format)
3) Included a 100 word media summary of your paper when requested at submission. Please ensure you have entered correct contact details (email, institution and telephone) in your user account
4) Included the raw data to support the claims made in your paper. You can either include your data as electronic supplementary material or upload to a repository and include the relevant doi within your manuscript
5) All supplementary materials accompanying an accepted article will be treated as in their final form. Note that the Royal Society will neither edit nor typeset supplementary material and it will be hosted as provided. Please ensure that the supplementary material includes the paper details where possible (authors, article title, journal name).
Supplementary files will be published alongside the paper on the journal website and posted on the online figshare repository (https://figshare.com). The heading and legend provided for each supplementary file during the submission process will be used to create the figshare page, so please ensure these are accurate and informative so that your files can be found in searches. Files on figshare will be made available approximately one week before the accompanying article so that the supplementary material can be attributed a unique DOI.
Once again, thank you for submitting your manuscript to Royal Society Open Science and I look forward to receiving your revision. If you have any questions at all, please do not hesitate to get in touch.
Comments to the Author(s)
The authors have nicely addressed all my concerns. They performed extra analyses to show that the deviations from Hardy-Weinberg equilibrium did not affect the STRUCTURE analyses. And they added a statistical procedure to the simulations. Although I would have liked a statistical test of a potential bottleneck, I understand that this was not feasible with the present data set. I think this manuscript is almost ready for publication. I did, however, find a few minor mistakes in the text (see below). Notably, the results section switches between past and present tense. This can easily be corrected.
Dear Dr Engler, I am pleased to inform you that your manuscript entitled "Assortative mating frames establishment in a young island bird population" is now accepted for publication in Royal Society Open Science.
You can expect to receive a proof of your article in the near future. Please contact the editorial office (openscience_proofs@royalsociety.org and openscience@royalsociety.org) to let us know if you are likely to be away from e-mail contact. Due to rapid publication and an extremely tight schedule, if comments are not received, your paper may experience a delay in publication.
Royal Society Open Science operates under a continuous publication model (http://bit.ly/cpFAQ). Your article will be published straight into the next open issue and this will be the final version of the paper. As such, it can be cited immediately by other researchers. As the issue version of your paper will be the only version to be published I would advise you to check your proofs thoroughly as changes cannot be made once the paper is published.

Associate Editor's comments (Professor Michael Bruford):
Your paper has been reviewed by two referees who see strong merit in the study, but Referee 1 suggests some re-analysis, which seems sensible to me. I therefore am recommending that you at least try these analyses to see if they change the results and report this in your revision.
Dear Editor, We have now prepared a revision based on the reviewers' feedback. To ease their job of assessing our changes, you will find the revision in track-change mode. In the following you will find a detailed response to all issues raised. We are confident that our revision accounts for all comments in a detailed and appropriate way, and we look forward to your decision.
Kind regards, Jan Engler et al.

Comments to Author:
Reviewers' Comments to Author:
Reviewer: 1
Comments to the Author(s)
Dear editor, I have reviewed the paper "Assortative mating frames establishment in a young island bird population." The manuscript is well written and the methods are nicely explained. I am also impressed by the amount of work; genotyping no less than 630 individuals. However, I do have some comments and suggestions regarding the methods applied to analyze the genetic data.
Response: We would like to thank the reviewer for providing a highly valuable and constructive review of our work. We think this critical evaluation substantially improved the work. We provide detailed responses following each paragraph.
First, the STRUCTURE analysis is based on all loci and all individuals. Using this complete dataset can potentially bias the results. Several loci deviate from Hardy-Weinberg equilibrium while STRUCTURE assumes that loci within populations are in Hardy-Weinberg equilibrium (see introduction of STRUCTURE manual). Next, the analysis of related individuals can cause estimation biases because of shared variation that consequently affects the ancestry analysis (see for example Porras-Hurtado et al. 2013 Frontiers in Genetics). Given that the authors have sampled almost the entire island population, the dataset likely contains several related individuals. I would suggest removing these individuals and rerunning the analyses. You could remove related individuals with ML-related (Kalinowski et al. 2006 Molecular Ecology Notes). Removing related individuals might also solve the issues with Hardy-Weinberg equilibrium. Alternatively, the analyses can be done with the software package CLUSTER_DIST which does not make Hardy-Weinberg assumptions and can deal with related individuals (Rodriguez-Ramilo et al. 2014 Genetics Research).
Response: We thank the reviewer for pointing out this crucial aspect. We were aware of the potential bias introduced by closely related individuals. This was partly the reason why we separated individuals with a high record history (i.e. the "residents island >20") from those with a low record history and from nestlings. Regarding STRUCTURE, however, we had not done a separate run including only individuals from groups showing no HWE deviations (namely migrants, mainland birds and resident >20 individuals). We have now done so. As you will see in the figure below, the differences between the reduced dataset and the full dataset are only marginal and are strongest for the very few individuals with very uncertain assignments. Therefore, we are confident that using the full dataset, although it includes siblings, does not affect the results in a dramatic way, so that the conclusions drawn remain unaffected.
We have now added more information on why we split the island residents into subgroups and have also added our reduced STRUCTURE analysis for full transparency to the reader. While we certainly could also add this figure to the appendix, we decided against it because we deem it unnecessary, as it does not contribute to the main study question. We keep the reporting of all resident subgroups including all nestlings, as we wanted to know the assignment of each individual and hence the fraction to which these clusters are represented at each population level.
During the quantification of population differentiation (using Fst and Dest), the samples are divided over six groups. Why did you not pool all the island samples (resident island [>20 obs], resident island [<20 obs] and nestlings)?
Response: As we already explained in the method section (which is now expanded and clarified), we wanted to ensure that we did not wrongly assign birds as "resident" that were color banded but left the island (i.e. migrants or vagrants) or that regularly return to it for unknown reasons apart from reproduction. This separation also reduced the number of closely related individuals, as shown in the HWE results in Table 2. We therefore applied this division to all analyses. Hence, comparing the population differentiation (and assignment) should be safest when using the "residents island >20" birds. Indeed, when comparing the three different cohorts of island birds with migrants or mainland populations, the general conclusions remain the same (i.e. island birds differ more from migrants than from mainland populations). Yet, there is slight variation between residents with <20 observations, nestlings and residents with >20 observations in both differentiation and assignment to other populations, which is worth further investigation. Pooling all samples could, for all the reasons explained above, affect the general outcome or at least add more variation, which we can better handle (and explain) using the separated cohorts.
Based on the observed heterozygosity values, the authors conclude that there are no signs of a founder effect. Although I agree with this reasoning, this statement can be tested statistically. One could for example use the software BOTTLENECK, which checks if there has been a significant decrease in allelic richness (Cornuet & Luikart 1997 Genetics).
Response: We initially thought of using BOTTLENECK or an M-ratio test to statistically test for the presence or absence of a founder effect. However, the utility of these methods is restricted, and we see a high chance of erroneous results because of a lack of power in both the number of individuals (focusing on unrelated island birds) and the number of microsatellites used here (see Peery et al. 2012 and Hoban et al. 2013 for an extensive discussion). Because of this, we decided to restrict the investigation of a potential founder effect to a discussion based on the observed heterozygosities and to refrain from explicit (and testable) hypotheses on this topic.
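For readers unfamiliar with the quantities discussed in this exchange, the following minimal sketch shows how observed and expected heterozygosity can be computed for a single microsatellite locus. It is an illustration only, not the authors' analysis: the function name `heterozygosity`, the example genotypes and the choice of Nei's unbiased estimator for expected heterozygosity are all assumptions made here.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    Expected heterozygosity uses Nei's unbiased estimator (an assumption of
    this sketch; other estimators exist).
    """
    n = len(genotypes)
    if n < 2:
        raise ValueError("need at least two genotyped individuals")
    # Observed: fraction of individuals carrying two different alleles.
    h_obs = sum(a != b for a, b in genotypes) / n
    # Expected: 1 - sum of squared allele frequencies, with small-sample correction.
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n
    sum_p2 = sum((c / total) ** 2 for c in counts.values())
    h_exp = (total / (total - 1)) * (1.0 - sum_p2)
    return h_obs, h_exp


# Hypothetical genotypes (allele sizes in base pairs), invented for illustration.
example = [(152, 156), (152, 152), (156, 160), (152, 160), (156, 156), (152, 156)]
h_obs, h_exp = heterozygosity(example)
print(f"observed = {h_obs:.3f}, expected = {h_exp:.3f}")
```

A marked deficit of observed relative to expected heterozygosity across loci is one of the patterns that can arise from sampling close relatives or from population substructure, which is why the two values are usually reported side by side.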
The supplementary material contains simulations of admixture to test particular colonization scenarios. These analyses culminated in the following sentence: "neither scenario resulted in a comparable pattern than the clearly distinct distribution of genotypes in island offspring observed." Hence, I do not see the point of including these simulations in the manuscript. Normally, one would simulate several scenarios and compare the output with the actual data using a goodness-of-fit test. The scenario that most closely matches the data gives some insights into the possible history of the population. I would suggest that the authors either remove the simulations from the paper or perform a more detailed analysis with statistical validation.
Response: We agree that we should have added a statistical procedure to quantify the outcome of the simulation in addition to a descriptive explanation. Hence, we added a comparison based on the unpaired mean difference of assignment probabilities between the island nestlings and each scenario. For each comparison, a bootstrap confidence interval based on 5000 iterations is calculated. For the revision, we also redrew Figure S1 in order to better illustrate the differences and their significance. The simulation is of high importance to prove that the absence of admixture in island nestlings is not a matter of random mating of existing genotypes stemming from different sources.
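The comparison described in this response can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' implementation: it uses a plain percentile bootstrap (the response does not say which interval type was used), and the function name `bootstrap_mean_diff`, the seed and the assignment probabilities are invented for the example.

```python
import random
from statistics import mean


def bootstrap_mean_diff(group_a, group_b, n_boot=5000, alpha=0.05, seed=1):
    """Unpaired mean difference (a - b) with a percentile bootstrap interval.

    Each group is resampled with replacement independently, which is what
    makes the comparison unpaired. Returns (observed_diff, ci_low, ci_high).
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    diffs = []
    for _ in range(n_boot):
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(mean(resample_a) - mean(resample_b))
    diffs.sort()
    # Percentile interval: cut alpha/2 of the resampled differences off each tail.
    low = diffs[int((alpha / 2) * n_boot)]
    high = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return observed, low, high


# Hypothetical assignment probabilities to one cluster, invented for illustration:
# nestlings are assigned almost entirely to one cluster or the other, whereas a
# simulated random-mating scenario yields intermediate (admixed) values.
nestlings = [0.97, 0.95, 0.99, 0.03, 0.98, 0.96, 0.02, 0.99]
simulated = [0.55, 0.48, 0.62, 0.51, 0.44, 0.58, 0.49, 0.53]

observed, ci_low, ci_high = bootstrap_mean_diff(nestlings, simulated)
print(f"mean difference = {observed:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

An interval that excludes zero would indicate that the simulated scenario does not reproduce the mean assignment probability seen in the nestlings. Note that a difference in means alone does not capture the bimodal shape of the observed distribution, so a figure such as the redrawn Figure S1 remains useful alongside it.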

Specific comments
Line 18: replace at with on. Response: changed
Line 18: Throughout the manuscript, blackbirds is sometimes written with capital letter and sometimes with small letter. Please check for consistency. Response: unified
Line 19: What genetic markers did you use? Mention the microsatellites here. Response: changed
Line 21-23: Restructure this sentence. I would suggest: We also genotyped high numbers of migrants on stopover and nearby mainland individuals, as they are potential founders of the island population. Response: Thank you, we used your suggestion.
Line 27: typo: found should be find. Response: changed