Forensic characteristics and population construction of two major minorities from southwest China revealed by a novel 37 Y-STR loci system

Y-chromosome short tandem repeats (Y-STRs) have become important supplementary evidence in forensic science. The Y-chromosome STR haplotype reference database (YHRD) now contains abundant Y-STR haplotype data from all over the world, but haplotype data for the Guizhou Miao and Tujia populations are scarce. Hence, genetic polymorphisms of 37 Y-STRs were investigated in 452 unrelated males (206 Miao and 246 Tujia) residing in Guizhou Province. A total of 206 and 242 unique haplotypes were obtained, with the highest diversity values of 0.9665 and 0.9470, respectively. The heatmap, multidimensional scaling (MDS), unweighted pair-group method with arithmetic means (UPGMA) tree and principal component analysis (PCA), all based on the genetic distances (Rst) between our studied populations and 26 other populations, indicated that population structure follows continental boundaries. The Guizhou Miao and Guizhou Tujia populations are closely related to East Asian populations, especially those that are geographically close, share a similar history or belong to the same language family.


Introduction
Y-chromosome short tandem repeats (Y-STRs), which are male-specific and paternally inherited as haplotypes, have been widely used in human evolution [1], genealogical research [2], population structure [3], forensic personal identification and paternity testing [4]. Reference databases are therefore indispensable for estimating population-specific haplotype frequencies [5,6]. The largest freely accessible online database, the Y-chromosome STR haplotype reference database (YHRD, https://yhrd.org, release 62), contains abundant Y-STR haplotype data from diverse populations and ethnic groups all over the world. However, few Y-STR population data have been reported for the Guizhou Miao and Guizhou Tujia populations. The AGCU Y37 PCR amplification kit uses six-colour fluorescence, which can greatly improve the polymorphic information content and individual discrimination (http://www.agcu.cn/page8?article_id=25&brd=1).
China, located in East Asia, is the world's most populous country. It had a population of more than 1.4 billion in 2020 and legally recognizes 56 distinct ethnic groups (https://en.jinzhao.wiki/wiki/China#cite_note-23). Together, these ethnic groups make up the Zhonghua Minzu. Miao and Tujia are the fourth and eighth largest minorities in China, constituting approximately 0.7% and 0.6% of the total population, respectively (http://www.stats.gov.cn/tjsj/pcsj/rkpc/6rp/indexch.htm). Miao, also named Hmong, is an international ethnic group that originated in China. At present, the Miao mainly live in southern China (Guizhou, Hunan, Sichuan, Yunnan and so on), and some have migrated out of China into Southeast Asia (Thailand, Laos and so on) (https://en.jinzhao.wiki/wiki/Miao_people). According to the sixth population census in 2010 (http://www.stats.gov.cn/tjsj/pcsj/rkpc/6rp/indexch.htm), the Miao are the largest ethnic minority in Guizhou Province, with a population of 3.13 million. The Miao people have their own language (Hmong), which belongs to the Hmong-Mien language group of the Sino-Tibetan language family. In the early twentieth century, the Miao people created their own writing system. Tujia is an ethnic group with a long history in China. Nowadays, the Tujia mainly reside in Guizhou Province, Hunan Province, Hubei Province and Chongqing Municipality, which border the Wuling mountains. Based on the sixth population census in 2010 (http://www.stats.gov.cn/tjsj/pcsj/rkpc/6rp/indexch.htm), about 1.02 million Tujia people live in Guizhou Province. The Tujia speak the Tujia language but have no script. The Tujia language is one of the Tibeto-Burman languages of the Sino-Tibetan language family (https://en.wikipedia.org/wiki/Tujia_people).
In the present study, 37 Y-STRs were typed to obtain population data for the two populations. Moreover, the Y-chromosomal characteristics of the AGCU Y37 PCR amplification kit were evaluated, and the population structure relative to other populations from China and abroad was analysed. The geographical locations of the two populations are shown in electronic supplementary material, figure S1.

Population samples and ethical statement
Blood samples were obtained from 452 unrelated healthy male individuals who resided in Guizhou Province (southwest China). Among them, 206 Miao individuals live in Kaili City, eastern Guizhou, while the other 246 Tujia individuals live in Tongren City, northeastern Guizhou. The inclusion criteria were as follows: (i) the volunteers were male; (ii) their ancestors were Miao or Tujia; (iii) their families had lived in Guizhou Province for more than three generations. All volunteers provided informed consent under the approval of the Ethics Committee of the Zunyi Medical University (KLLY-2019-080).

Data analysis
Haplotype and allele frequencies were calculated by direct counting. Three multi-copy loci (DYS527a/b, DYS385a/b and DYF387S1) were treated as allelic combinations. Genetic diversity (GD) and haplotype diversity (HD) were computed using Nei's formula [8]: $n(1 - \sum p_i^2)/(n - 1)$, where $n$ is the sample size and $p_i$ denotes the frequency of the $i$th allele or haplotype. Match probability (MP) was calculated as the sum of the squared frequencies, $\sum p_i^2$. Discrimination capacity (DC) was the ratio of the number of different haplotypes to the sample size. Twenty-eight populations from China and abroad were compared using the analysis of molecular variance (AMOVA) [9] and multidimensional scaling (MDS) [10] tools on the YHRD website, based on the same 27 Y-STRs (Yfiler Plus dataset). Pairwise genetic distances (Rst) were visualized in R v. 3.3 using the heatmap package. The unweighted pair-group method with arithmetic means (UPGMA) tree and principal component analysis (PCA) were produced using Mega v. 7.0 [11] and SPSS v. 26.0 [12], respectively.
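The forensic parameters defined above can be illustrated with a minimal Python sketch. It assumes each haplotype is represented as a tuple of allele strings, with a multi-copy locus written as one combined string such as "11-14"; the function name `forensic_parameters` and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter


def forensic_parameters(haplotypes):
    """Return (HD, MP, DC) for a list of haplotypes by direct counting.

    HD = n * (1 - sum(p_i^2)) / (n - 1)   (Nei's formula)
    MP = sum(p_i^2)
    DC = number of different haplotypes / n
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two samples are needed")
    counts = Counter(haplotypes)  # direct counting of each distinct haplotype
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    hd = n * (1 - sum_p2) / (n - 1)
    mp = sum_p2
    dc = len(counts) / n
    return hd, mp, dc


# Hypothetical toy data: three loci per haplotype; the middle entry stands in
# for a multi-copy locus treated as a single allelic combination.
toy = [
    ("15", "11-14", "29"),
    ("15", "11-14", "29"),
    ("16", "12-13", "30"),
    ("14", "11-15", "28"),
]
hd, mp, dc = forensic_parameters(toy)
print(f"HD={hd:.4f}  MP={mp:.4f}  DC={dc:.4f}")
```

The same function gives GD for a single locus if it is passed a list of alleles instead of full haplotypes, since Nei's formula is identical in both cases.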

Comparison between Chinese and foreigners
To better understand the paternal genetic relationships between our studied populations and others, data for 26 different populations from China and abroad were obtained from YHRD. Pairwise genetic distances (Rst) calculated using AMOVA are shown in electronic supplementary material, table S5. The minimum genetic distance was between Yanbian Korean and South Korea Korean (0.0000), while the maximum was between Chamdo Tibetan and Laos Laotian (0.4223). Of our studied populations, Guizhou Miao (0.0049) and Tujia (0.0041) were both genetically closest to Hunan Miao and farthest from Chamdo Tibetan (0.2432 for Miao and 0.2266 for Tujia).

[Figure: population labels include Hunan/China (Dong), Guangxi/China (Han), Heilongjiang/China (Han), Ningxia/China (Hui), Qinghai/China (Hui), Xinjiang/China (Kazakh) and Yanbian/China (Korean); the remaining labels are not recoverable.]

royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 210447

[...] Italian, Kazakhstan Kazakh and Xinjiang Uighur). PC3 (6.49%) can distinguish the Tibeto-Burman-speaking populations (Guizhou Tujia, Hubei Tujia, Sichuan Tibetan and Chamdo Tibetan) from the other populations.
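Pairwise Rst distances such as those reported in this section are the input to a UPGMA tree. The following is a minimal sketch of the UPGMA procedure in pure Python, not the Mega implementation used in the study. Four of the distances are the values quoted in the text; the Guizhou Miao/Guizhou Tujia and Hunan Miao/Chamdo Tibetan distances are hypothetical placeholders, because those values are not given here. The function name `upgma` is also an assumption.

```python
from itertools import combinations


def upgma(labels, dist):
    """Cluster `labels` by UPGMA.

    `dist` maps frozenset({a, b}) to the distance between labels a and b.
    Returns a nested tuple (left, right, height) describing the tree.
    """
    # Active clusters: integer id -> (subtree, number of leaves).
    clusters = {i: (label, 1) for i, label in enumerate(labels)}
    d = {
        frozenset((i, j)): dist[frozenset((labels[i], labels[j]))]
        for i, j in combinations(range(len(labels)), 2)
    }
    next_id = len(labels)
    while len(clusters) > 1:
        # Find the closest pair of active clusters.
        pair = min((frozenset(p) for p in combinations(clusters, 2)),
                   key=d.__getitem__)
        a, b = sorted(pair)
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        # Distance to the merged cluster is the size-weighted mean.
        for c in clusters:
            d[frozenset((next_id, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b, d[pair] / 2), size_a + size_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


labels = ["Guizhou Miao", "Guizhou Tujia", "Hunan Miao", "Chamdo Tibetan"]
rst = {
    frozenset(("Guizhou Miao", "Hunan Miao")): 0.0049,       # from the text
    frozenset(("Guizhou Tujia", "Hunan Miao")): 0.0041,      # from the text
    frozenset(("Guizhou Miao", "Chamdo Tibetan")): 0.2432,   # from the text
    frozenset(("Guizhou Tujia", "Chamdo Tibetan")): 0.2266,  # from the text
    frozenset(("Guizhou Miao", "Guizhou Tujia")): 0.0100,    # HYPOTHETICAL placeholder
    frozenset(("Hunan Miao", "Chamdo Tibetan")): 0.2300,     # HYPOTHETICAL placeholder
}
tree = upgma(labels, rst)
print(tree)
```

With these inputs, the two Miao populations and Guizhou Tujia merge first and Chamdo Tibetan joins last. Because two of the six distances are placeholders, the branch heights are illustrative only.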

Discussion
In this study, a novel 37 Y-STR loci system was tested in the Miao and Tujia populations residing in Guizhou Province. GD values of the 37 Y-STRs were all higher than 0.5 with the exception of DYS391, DYS437, DYS438 and DYS645. Three multi-copy loci (DYS527a/b, DYS385a/b and DYF387S1) showed higher gene diversity than the single-copy loci (GD > 0.9). Additionally, the AGCU Y37 PCR amplification kit includes all of the Y-STRs in previously developed commercial forensic kits (such as Yfiler and Yfiler Plus) and adds another seven low-medium mutation loci (DYS444, DYS447, DYS527, DYS557, DYS593, DYS596 and DYS645). Additional low-medium mutation loci increase individual discrimination and play a vital role in forensic family research. The additional Y-STRs included in the AGCU Y37 significantly increase the HD and DC and decrease the MP. The results showed that the 37 Y-STRs are highly polymorphic and informative. Thus, the AGCU Y37 amplification kit can be used in forensic practice as a supplement to autosomal STRs.
Furthermore, 27 Y-STRs included in the AGCU Y37 PCR amplification kit were used to explain the population genetic relationships. In the present study, we used several bioinformatics methods (Rst genetic distance, heatmap, UPGMA, MDS and PCA) to reconstruct the relationships between Guizhou Miao, Guizhou Tujia and diverse ethnic groups from nine major language families in 11 countries (Sinitic: Han, Hui; Tai-Kadai: Bouyei, Dong, Laotian, Thai; Tibeto-Burman: Tibetan, Tujia; Hmong-Mien: Miao, Yao; Turkic: Uighur, Kazakh; Mongolic: Mongolian, Daur; Indo-European: Danish, Indian, Italian; Semito-Hamitic: Bahraini, Arab; Independent: Korean, Japanese). These methods are commonly employed for population comparisons in forensic medicine [13-17]. The results showed that there were significant genetic differences between populations belonging to different regions and languages. Guizhou Miao and Tujia populations were both far from the Tibetans. In accordance with Zhang's research [18], the Tibetans differed from other East Asian populations because of the high-altitude adaptation genes (EPAS1 and EGLN1). Except for the Tibetans, the East Asians were close to the Central Asians and the Southeast Asians. For the studied populations, Guizhou Miao and Guizhou Tujia had close relationships with Heilongjiang Han and Hunan Yao. The Rst genetic distances showed that Guizhou Miao and Tujia were closest to Hunan Miao (0.0049 for Guizhou Miao and 0.0041 for Guizhou Tujia) and Heilongjiang Han (0.006 for Guizhou Miao and 0.014 for Guizhou Tujia). In the MDS plot, although Guizhou Tujia and Heilongjiang Han straddled two quadrants, the two populations formed a cluster with Guizhou Miao and Hunan Miao. The consistent clustering of these four populations across distinct methods might be explained by geographical location and gene flow.
First of all, Guizhou Tujia had a close relationship with the geographically close Guizhou Miao and Guizhou Han populations, with which they have a long history of living together and intermarrying. Secondly, the Guizhou Miao and Hunan Miao are of the same ethnicity and live in adjacent provinces. Thirdly, most Han populations dispersed across mainland China remain closely related to one another, so the Guizhou Tujia also showed a genetic affinity with Heilongjiang Han. Notably, Guangxi Han was far from the studied populations, which is consistent with previous studies [16,17]. Moreover, some differences were found between northern and southern Chinese populations. The southern populations clustered tightly, whereas the northern populations were more scattered. Additionally, our studied populations were close to the Sinitic-speaking populations. This further suggests that substantial gene exchange between the studied populations and Sinitic-speaking populations occurred relatively recently. The results of the four analyses were almost identical. The differences observed between populations were probably caused by geographical, cultural, historical and linguistic factors. The same or similar ethnolinguistic and geographical clustering observed in our study has also been reported using other genetic markers, such as autosomal STRs and X-chromosomal STRs (X-STRs). For example, in our previous phylogenetic analyses of 15 autosomal STR loci [19] and 19 X-STRs [13] using the same statistical methods, Guizhou Tujia were closely related to geographically close populations and the Han populations, but distant from the Tibeto-Burman-speaking populations.
Overall, our findings based on the 37 Y-STRs demonstrate that Guizhou Miao and Guizhou Tujia are genetically similar to geographically and linguistically close populations, which accords with the geographical and linguistic patterns found using autosomal STRs and X-STRs.

Conclusion
We report for the first time the forensic parameters of 37 Y-STR loci from Miao and Tujia male individuals residing in Guizhou Province. This study followed the recommendations of the International Society for Forensic Genetics (ISFG) on the use of Y-STRs in forensic analysis [20]. All haplotype data were submitted to the YHRD and received accession numbers. The data showed that the loci are highly polymorphic and informative in the Guizhou Miao and Tujia populations. Additionally, the data provide support and supplementary reference material for forensic applications and population structure studies.
Ethics. All of the volunteers had been adequately informed and signed the informed consent before sample collection.
This study was approved by the Ethics Committee of the Zunyi Medical University (KLLY-2019-080). The procedures used in this study adhere to the tenets of the Declaration of Helsinki.
Data accessibility. All data are publicly available. Y-chromosomal data genotyped for this study have been submitted to the open access Y-STR Haplotype Reference Database (YHRD, www.yhrd.org) and are available under accession nos. YA004671 and YA004672. Additionally, the datasets supporting this article have been uploaded as the electronic supplementary material.