Relationship between rice farming and polygenic scores potentially linked to agriculture in China

Following its domestication in the lower Yangtze River valley 9400 years ago, rice farming spread throughout China and changed lifestyle patterns among Neolithic populations. Here, we report evidence that the advent of rice domestication and cultivation may have shaped humans not only culturally but also genetically. Leveraging recent findings from molecular genetics, we construct a number of polygenic scores (PGSs) of physiological and behavioural traits and examine their associations with rice cultivation in a sample of 4101 individuals recently recruited from mainland China. A total of nine polygenic traits and genotypes are investigated: PGSs of height, body mass index, depression, time discounting, reproduction, educational attainment and risk preference, and the genotypes ADH1B rs1229984 and ALDH2 rs671. Two-stage least-squares estimates of the effect of the county-level percentage of cultivated land devoted to paddy rice on the PGS of age at first birth (b = −0.029, p = 0.021) and on ALDH2 rs671 (b = 0.182, p < 0.001) are both statistically significant and robust to a wide range of potential confounds and alternative explanations. These findings suggest that rice farming may have influenced human evolution in relatively recent human history.


Introduction
Agriculture was one of the most critical transitions in human history. It fundamentally changed the way people live [1]. Yet, not all farming is the same. Rice farming is particularly important because it was the foundation of some of the world's largest civilizations. Over half of the world's population (53%) lives in societies with significant legacies of rice farming [2]. Rice is also important because it differed so radically from other common staple crops [3]. Although rice is not the only grain crop that humans have relied on, it differs in important ways from other major staple cereals such as wheat, barley and millet. Pre-modern paddy rice required about twice as many labour hours as wheat and millet [3][4][5][6]. Furthermore, because it grows best in standing water, rice farmers often had to manage shared irrigation systems. Those shared systems often forced farmers to coordinate their water use and sometimes even to flood and drain their fields at the same time [3,6]. Thus, labour and irrigation made rice farmers more dependent on each other.
There is some observational evidence that rice cultures differ from nearby cultures that farm other crops. For example, Davidson [7] found that people in rice-farming societies had a particularly strong work ethic. Talhelm et al. [8] reported that people from rice-farming areas are more interdependent than people from wheat-farming areas. In Starbucks cafés, people in rice areas of China were less likely to be alone and less likely to move a chair out of their way. This kind of fitting into the environment is more common in interdependent societies [9]. Around the globe, societies with a history of rice farming have less 'relational mobility' [10]. Low-mobility societies have more secure, long-term relationships, but less flexibility and fewer opportunities to meet new people. However, this literature has only analysed the relationship between rice farming and phenotypic traits. We know very little about the mechanisms through which rice farming leads to behaviours. This is particularly puzzling given the evidence of rice-wheat differences among middle-class customers in Starbucks, people who have presumably never farmed rice or wheat in their lives. One possibility is that genes play a role in carrying on behavioural differences between people from rice cultures and non-rice cultures. In this study, we leverage a unique dataset recently collected from counties across China and recent developments in molecular genetics to test whether pre-modern rice farming is associated with modern polygenic traits.

Three reasons rice-farming genetic selection may be plausible
Although the idea that rice-related behavioural differences could be partly genetic may seem controversial, three lines of findings support the possibility. First, human evolution is not limited to the distant past. Recent research in evolutionary biology has found evidence that natural selection is still operating in contemporary humans [11][12][13]. To name a few examples, researchers have found evidence for natural selection on height, waist-to-hip ratio, skin colour, spleen size and infant head circumference within the last few thousand years [14][15][16]. Moreover, researchers have demonstrated that individual genotyping data can be used to directly measure the action of selection [13,17].
Second, there is accumulating and converging biological and genetic evidence that the transition from hunter-gatherer to agricultural societies exerted selective pressures on human evolution [16][18][19][20][21]. A well-known example is the genetic adaptation that allows adults to digest lactose from milk. Studies have found that this adaptation arose after some human groups domesticated dairy animals 8000 years ago [19]. Researchers have found compelling evidence that cattle domestication and dairying actively selected for lactose-tolerance variants among some Neolithic humans but not others [18][19][20][21]. There are further examples of genetic changes linked to the agricultural revolution. Mathieson et al. [21] analysed the ancient DNA of Europeans who lived between 6500 BC and 300 BC. They found evidence of genetic changes in height, digestion and the immune system that were probably adaptations to settled agricultural life. Other researchers have documented archaeological evidence that early humans' bone density decreased after they adopted agriculture [22,23]. These findings raise the possibility that rice farming, a distinctive agricultural practice that has been around for thousands of years, might have nudged genetic variation in certain traits.
The third line of thought is the accumulating evidence linking behavioural and personality traits to genes [24,25]. Much of this evidence has come from genome-wide association studies (GWAS). Scientists have identified genetic factors associated with reproductive preferences [26], risk preferences [27], time discounting [28] and educational attainment [29]. Given that genes influence a wide range of behaviours, it is plausible to hypothesize that they contribute, even if only in a small way, to differences between rice-farming cultures and wheat-farming cultures.

royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 210382

China as a natural test case
The current study builds on existing GWAS evidence and leverages a unique Chinese dataset to test for selection by rice domestication and cultivation. China presents a strong test case for the theory for three reasons. First, China has a long history of rice cultivation, giving genetic selection enough time to play out [30]. Second, China spans a large geographical area, with millions of people in traditionally rice-farming areas and millions in non-rice-farming areas. This provides enough statistical power to test the theory robustly. Third, despite its large population and landmass, China is relatively unified in terms of politics, religion and language, especially compared with other areas of similar population size, such as sub-Saharan Africa or the Indian subcontinent. That makes it easier, though still not easy, to limit confounds in ethnicity, national government and language.

Trait candidates
Out of all the possible behaviours and genes that rice might have affected, where should we start? We begin with a few plausible candidate phenotypes that might be connected to rice. The advent of agriculture reshaped diet, patterns of labour, population density and settled living [20,23]. Because this is an initial exploration of the theory, we test a broad range of physiological and behavioural traits. Nevertheless, we constrain the set to phenotypes that meet three criteria: (i) they might plausibly be linked to diet and subsistence style, (ii) links between genes and the phenotype have been empirically verified multiple times, and (iii) together they give broad coverage of physiological differences and behavioural traits [26][28][29][30][31][32][33]. This resulted in genetic variants for bodily dimensions (height and BMI), mental health (risk of depression), alcohol metabolism capacity (alcohol dehydrogenase 1B or ADH1B rs1229984, and aldehyde dehydrogenase 2 or ALDH2 rs671), economic preferences (time, risk and reproductive preferences) and socioeconomic outcome (educational attainment).
We began by constructing several polygenic scores (PGSs) to measure genetic propensities for these complex traits. The sample consisted of 4101 adult participants from WeGene who consented to participate in research (see Methods). For each individual, we summed the trait-associated alleles, weighting each by its GWAS effect size. These PGSs aggregate the effects of hundreds or thousands of trait-associated DNA variants identified in GWAS, and can be used to predict propensities towards certain traits and outcomes [13,17]. Table 1 presents summary statistics of key variables in the sample.

The measure of rice cultivation
The main explanatory variable is the percentage of cultivated land per county devoted to paddy rice (328 counties in 30 provinces; figure 1). We use the earliest county-level rice data we could find, which for most provinces date from around the year 2000. These recent statistics correlate highly with rice data from a more limited dataset covering 1914 to 1918 (r = 0.95, p < 0.001) [37]. Thus, the recent statistics seem to represent historical rice farming adequately. For simplicity, we describe non-rice-farming regions as 'wheat-farming' regions. This is a simplification because non-rice regions also traditionally grew other dryland crops such as millet and barley [3,35]. On the whole, however, rice is negatively correlated with wheat in China (province level: r = −0.69, p < 0.001).

Alternative theories and individual control variables
Nevertheless, differences that appear to be due to rice and wheat might actually be driven by a wide range of other regional differences. For example, rice grows at southern latitudes, and researchers have reported that genetic differences in China follow a latitudinal gradient from north to south. Rice is also correlated with temperature. Around the world, temperature is correlated, albeit weakly, with both height and BMI, a pattern known as Bergmann's rule.
To counteract this problem, we run multivariate regressions testing the effect of rice on polygenic traits while accounting for an extensive range of additional factors. First, we take into account participants' age and the size of the place they grew up in (rural/town/city). Then, based on their birthplace, we control for regional characteristics that might confound the relationship between rice and genes: latitude, longitude, temperature, GDP, distance to coast, contact with herding cultures, regional education, history of rebellion and length of rivers (table 2 lays out all regional variables, sources and theories). Because China includes different ethnic populations, such as Mongolians and Tibetans, we also take into account people's ethnic make-up. To account for population stratification, we adjust the model for 42 ancestral components for each individual. These components are estimated directly from each individual's genotyping data using the ADMIXTURE program. They include (from most to least frequent): Northern Han, Southern Han, Mongolian, Japanese, Naxi/Yi, Dai, Gaoshan, Kinh, Korean, She, Tibetan, Tungus, Ashkenazi, Balkan, Bantusa, Bengali, Cambodian, Egyptian, English, Eskimo, Finnish/Russian, French, Hungarian, Iranian, Kyrgyz, Lahu, Mala, Mayan, Mbuti, Miao/Yao, Papuan, Pima, Sardinian, Saudi, Sindhi, Somali, Spanish, Thai, Uygur, Uzbek, Yakut and Yoruba.

Table 3A reports parameter estimates from separate multivariate regressions of each polygenic trait on the percentage of paddy rice. The benchmark model (Model 1) includes only paddy rice, age, the size of the place where participants grew up (rural, town or city), the 42 ancestral composition variables and latitude/longitude. This model finds that paddy rice is significantly associated with four of the nine traits: the PGSs of time discounting, age at first birth and educational attainment, and the ALDH2 rs671 genotype.
We then add regional control variables one at a time: average temperature (Model 2), distance to coast (Model 3), GDP (Model 4), contact with herding cultures (Model 5) and history of rebellion (Model 6). In almost all models, rice significantly predicts ALDH2 rs671 and the PGS for age at first birth. These results are consistent with the idea that selection from rice farming has been operating on genetic variants associated with these two traits. By contrast, the PGSs or genotypes of the other traits were not robustly associated with rice farming. After controlling for other regional differences, rice was not significantly linked to educational attainment, time discounting or ADH1B rs1229984.¹
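Footnote 1 mentions a D/AP (Dubey/Armitage-Parmar) correction for testing nine correlated outcomes. The Python sketch below illustrates how such an adjustment can be computed. It assumes NumPy and a precomputed correlation matrix of the outcomes. The function name `dap_adjust` and the example p-values and correlations are hypothetical, and this is not the authors' code.

```python
import numpy as np

def dap_adjust(pvals, corr):
    """Dubey/Armitage-Parmar (D/AP) adjusted p-values for N correlated outcomes.

    pvals: length-N sequence of unadjusted p-values, one per outcome.
    corr:  N x N correlation matrix of the outcome variables (e.g. the nine scores).
    """
    p = np.asarray(pvals, dtype=float)
    r = np.asarray(corr, dtype=float)
    n_out = p.size
    if r.shape != (n_out, n_out):
        raise ValueError("corr must be an N x N matrix matching pvals")
    # Mean correlation of outcome n with the other N - 1 outcomes (diagonal excluded).
    r_bar = (r.sum(axis=0) - np.diag(r)) / (n_out - 1)
    # Effective number of independent tests for each outcome: N ** (1 - mean correlation).
    m = n_out ** (1.0 - r_bar)
    return 1.0 - (1.0 - p) ** m

# Hypothetical example: three outcomes with modest pairwise correlations.
example_p = [0.021, 0.001, 0.30]
example_corr = [[1.0, 0.2, 0.1],
                [0.2, 1.0, 0.3],
                [0.1, 0.3, 1.0]]
print(dap_adjust(example_p, example_corr))
```

The procedure interpolates between two extremes. With uncorrelated outcomes (identity matrix), it reduces to the Šidák correction, 1 − (1 − p)^N. With perfectly correlated outcomes, it leaves each p-value unchanged.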

Testing reverse causality
A potential threat to this analysis is reverse causality. Perhaps people whose genetic traits were better suited to rice farming chose to grow rice. For example, sensitivity to social norms may help solve the free-rider problem in collective irrigation systems [3]. If so, areas where people were already more sensitive to social norms may have been more likely to start farming rice. In this way, genes would (in a sense) cause rice farming, rather than rice farming selecting for genes.
To test for reverse causality, we exploit exogenous variation in the determinants of regional differences in rice farming. Our natural instrumental variable (IV) is environmental suitability for wetland rice, as modelled in the Food and Agriculture Organization of the United Nations' Global Agroecological Zones database [8]. We then use two-stage least-squares (2SLS) models to estimate the causal impact of rice farming on polygenic traits, with the environmental suitability variable and its quadratic form as instruments. Table 3B presents 2SLS estimates of the causal influence of rice farming on polygenic traits, using the full set of control variables from Model 7. We first assess the instruments using the first-stage F-test and the overidentification test. The first-stage F-statistic is 84.74, far above the conventional threshold of 10 for weak instruments, which indicates that the instruments are strong. The p-values of the Sargan statistics all exceed 0.1, so the overidentifying restrictions are not rejected. This is consistent with, though it cannot prove, the validity of the environmental suitability variable and its quadratic form as instruments for the percentage of paddy rice fields.
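The 2SLS logic and the two diagnostics above can be sketched in Python with NumPy: the first-stage F-test on the excluded instruments and the Sargan overidentification test. This is a simplified illustration resting on several assumptions:

- Simulated data stand in for the survey.
- A constant column replaces the full Model 7 controls.
- The F-statistic uses the homoskedastic formula.
- The Sargan p-value uses the closed form for one degree of freedom (two instruments, one endogenous regressor).

The names `tsls`, `suit`, `rice` and `pgs` are hypothetical, and this is not the study's estimation code.

```python
import math
import numpy as np

def ols(y, X):
    """Least-squares coefficients of y on X (X must already include a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def tsls(y, x, Z_excl, W):
    """2SLS with one endogenous regressor x, excluded instruments Z_excl and controls W.

    Returns (lambda_hat, first-stage F on excluded instruments, Sargan statistic, Sargan p-value).
    """
    n = len(y)
    k = Z_excl.shape[1]                      # number of excluded instruments
    Z = np.column_stack([Z_excl, W])         # full instrument set
    # First stage: regress the endogenous rice share on instruments and controls.
    x_hat = Z @ ols(x, Z)
    rss_full = np.sum((x - x_hat) ** 2)
    rss_restricted = np.sum((x - W @ ols(x, W)) ** 2)
    f_stat = ((rss_restricted - rss_full) / k) / (rss_full / (n - Z.shape[1]))
    # Second stage: regress the outcome on fitted x and the same controls.
    b = ols(y, np.column_stack([x_hat, W]))
    # Structural residuals use the observed x, not its fitted values.
    u = y - np.column_stack([x, W]) @ b
    # Sargan test: n * R^2 from regressing the residuals on all instruments;
    # chi-squared with k - 1 degrees of freedom under valid overidentifying restrictions.
    u_fit = Z @ ols(u, Z)
    r2 = 1.0 - np.sum((u - u_fit) ** 2) / np.sum((u - u.mean()) ** 2)
    sargan = max(n * r2, 0.0)
    # Closed-form chi-squared(1) tail probability; only valid when k - 1 == 1.
    p_sargan = math.erfc(math.sqrt(sargan / 2.0)) if k == 2 else float("nan")
    return b[0], f_stat, sargan, p_sargan

# Simulated (hypothetical) data: an unobserved confounder drives both rice share and the score.
rng = np.random.default_rng(0)
n = 4000
suit = rng.uniform(0.0, 1.0, n)
confound = rng.normal(size=n)
rice = 0.1 + 0.6 * suit + 0.3 * suit**2 + 0.1 * confound + rng.normal(0.0, 0.1, n)
pgs = 0.5 * rice + 0.3 * confound + rng.normal(0.0, 1.0, n)

lam, f_stat, sargan, p_sargan = tsls(pgs, rice, np.column_stack([suit, suit**2]), np.ones((n, 1)))
print(f"2SLS lambda = {lam:.3f}, first-stage F = {f_stat:.1f}, Sargan p = {p_sargan:.3f}")
```

In this simulation, the confounder raises both the rice share and the score, so plain OLS would overstate the effect. The instrument only shifts rice, which lets 2SLS recover the true coefficient (0.5) up to sampling noise. The sketch returns point estimates only; valid 2SLS standard errors need the structural residuals rather than a naive second-stage OLS fit.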
Compared with the earlier rice farming results (table 3A), the results using environmental rice suitability remain robust for the ALDH2 rs671 genotype (aldehyde dehydrogenase deficiency) and the PGS for age at first birth. Hence, our 2SLS estimates suggest that selective pressures from paddy rice farming favoured individuals with lower alcohol tolerance and people with polygenic scores for having children at an earlier age. They also suggest that reverse causality is unlikely to be driving the results.²

Additional robustness checks
We then extend our analysis with a number of checks to test whether the findings are robust. First, we examine historical disease rates to test the pathogen prevalence theory. This theory argues that, in areas with more communicable diseases, humans developed behaviours that helped protect them against disease [39]. For example, researchers have found that areas with higher historical rates of disease have higher fertility rates, lower birth weights and higher collectivism [39]. Because diseases are so clearly linked to life and death, pathogens seem like a plausible place to look for factors that could influence genetic selection.
To measure historical pathogen prevalence, we use province-level rates of epidemic diseases in the Ming and Qing dynasties (AD 1368-1911) [38]. These data do not cover three outlying provinces (Qinghai, Xinjiang and Inner Mongolia), leaving a total of 3956 participants. Pathogen rates did not predict genetic differences (table 4A).
Second, we test whether farming in general (as opposed to rice farming in particular) can explain genetic differences. To measure farming, we use the percentage of cultivated land per province. The multivariate regression results show that farming does not predict genetic differences, except for genetic variants associated with educational attainment (table 4B). Why might farming density select for educational attainment? One possibility is selection on specific brain functions or non-cognitive traits that are correlated with the genetic components of educational attainment [13,29]. Another explanation comes from research on population density and 'life history' strategies [45]. Life-history research has found that people in densely populated places tend to shift from risky 'live fast, take chances' strategies to long-term investment strategies. Examples include focusing time on fewer relationship partners and investing more in education. Areas with denser farming would, in general, have had higher population densities. They would presumably also have had more of the institutions that go along with density, such as governments and schools.³

¹ The fact that we consider nine different outcomes raises concerns about multiple hypothesis testing (MHT) and false positives. Failure to account for MHT can lead to a substantial risk of false discoveries in econometric analysis [41]. To address this issue, we implement a common p-value correction method, the D/AP procedure, and evaluate adjusted p-values for robustness [42]. For N correlated outcome variables, the adjusted p-value for the nth outcome is $p_{n,\text{adjusted}} = 1 - (1 - p_n)^{M_n}$. Here $M_n = N^{1 - \bar{r}_n}$ and $\bar{r}_n = (N - 1)^{-1} \sum_{j \neq n} r_{jn}$, where $r_{jn}$ is the correlation coefficient between the jth and nth outcome variables. Results after accounting for MHT are generally consistent with the main findings presented in table 3 and are available upon request from the corresponding author.

² To relax the assumption of strict exogeneity of the instrumental variables, we also use the plausibly exogenous approach, which builds on the inference strategy for plausibly exogenous instruments [43,44]. The plausibly exogenous estimation results suggest that violations of the exclusion restriction may not be a critical issue in the current study. Results are available upon request from the corresponding author.

Table 4. Testing alternative theories. This table shows parameter estimates (with p-values) for two alternative theories: historical pathogen prevalence (A) and the proportion of land that is cultivated, which tests farming in general, as opposed to rice farming in particular (B). All models include age, urbanization, 42 ethnic ancestry variables, latitude, longitude, temperature, distance to coast, GDP, history of herding and history of rebellion.

Why would rice select for these genes?
Overall, the data suggest that rice cultivation exerted selective pressures in favour of earlier reproduction and alcohol intolerance. This leaves the question of why rice farming might have selected for these specific traits. Although we cannot prove any particular theory from our data, we offer initial hypotheses based on the ecology of rice.

Rice and childbirth
Why might rice farming have selected for earlier childbirth? Three features of rice areas are consistent with this idea.

(1) Rice is far more labour-intensive than wheat and other dryland crops [3][4][5]. Researchers calculated how much labour a husband and wife would need to grow enough rice to eat and to barter for necessities like clothing and tools [46]. They concluded that the labour demands were so high that a couple relying on their own labour alone could not farm a plot of rice large enough to support the family. Children provided that extra labour. Children worked on farms all over the world, but their labour may have been particularly critical for rice farmers. This is consistent with one anthropologist's observation that rice farmers in China preferred to meet peak labour demands by enlisting family and extended family, rather than neighbours or wage labourers [46]. Thus, the rice-farming environment may have favoured people who had children at a younger age and therefore had more offspring, and more family labour, over a lifetime.

(2) Rice farming is more productive than crops like wheat and millet. Historically, paddy rice produced three to five times as much output per acre as wheat [5]. The relative abundance of food might have encouraged earlier reproduction in rice-farming communities [47].

(3) There is also evidence that China's rice areas historically had shorter life expectancy than wheat areas [35]. People in rice areas had less diverse diets, with deficiencies in nutrients such as vitamin B and iron. They also lived in denser populations in an era without sanitation infrastructure [35]. One historian argued that the high productivity of rice might paradoxically have made catastrophic collapses more likely [35].
Because rice was so productive, it would have supported a larger population, which would have incentivized people to turn more land into rice fields. China's impressive rice terraces (rice fields cut into steep mountainsides) hint at this conversion of even marginal land into rice land. Although this would have raised overall productivity, it would have made the local ecosystem more susceptible to periodic collapses from drought or crop diseases [35]. The shortened lifespan and risk of disaster could plausibly have increased pressure to have children at younger ages. A similar pattern has been observed in domesticated animals, which usually have shorter lifespans but reproduce at earlier ages and more frequently.

Rice and alcohol
Why might rice have selected for genotypes of aldehyde dehydrogenase deficiency? This genotype leads to excessive accumulation of acetaldehyde in the liver and to the alcohol-flushing reaction, sometimes called the 'Asian flush' [30]. Grain was a common ingredient used to make alcohol. Some researchers have theorized that rice farmers had earlier access to the surplus grain used to make alcohols such as rice wine, perhaps as early as 9000 years ago [11,30]. 'Agriculture and the making of fermented beverages go together. Most hunter-gatherer populations do not have the means, know-how, or resources' to make alcohol [11]. With more alcohol available, rice areas may have had more experience with its consequences, such as alcoholism, child neglect, altered innate immune modulation and tumour development [48]. Thus, rice areas may have been exposed to alcohol-related selective pressures for longer than other populations [11,30].

³ We also perform a series of robustness checks using proxies of historical rice farming. These include the province-level percentage of cultivated land devoted to paddy rice in 1934 and the province-level paddy rice yield per mu (亩, a traditional Chinese unit of area) in the Qing dynasty. They also include the province-level total number of paddy rice varieties recorded by the administrations of the Ming (AD 1368-1644) and Qing (AD 1636-1912) dynasties. These additional results are consistent with the original findings and are available upon request from the corresponding author.
We offer initial theories for why rice might have selected for these genes, but these are only early steps, because direct archaeological evidence (such as ancient DNA) is still lacking. As historians and anthropologists uncover more about the history of rice farming, we can refine our understanding of how it might have selected for particular genes. Furthermore, these results await replication in other samples from China and in rice-farming populations around the world, from East Asia to India and West Africa.

Potential implications for modern society
PGSs are not destiny. The relationship between genes and behaviour can differ substantially across environments, including within the same country in different eras. Thus, it is important to treat the implications of genetic differences with caution.
However, these differences may be a starting point for researchers investigating regional differences within China or between China and other countries. One simple implication is that the aldehyde dehydrogenase difference in southern China could lead to less problematic drinking, and perhaps less drinking overall. This hypothesis would be relatively easy to test.
The genetic scores linked to earlier childbirth would predict higher fertility or an earlier first birth in rice-farming parts of China. However, birth rates have fallen across China over the last century, and modern contraception may weaken the link between gene-related fertility and actual childbirth. Still, rates of unintended pregnancy might, for example, be higher in rice-farming parts of China.

Discussion
In sum, genetic data from over 4000 people across China provide evidence that genetic variants associated with earlier reproduction and the alcohol flush response were more common among people from areas with more historical rice farming. Rice farming was also negatively associated with PGSs for educational attainment, although this relationship became marginal after controlling for the history of herding.
The effect of rice remained robust after controlling for individual demographic characteristics, ethnic make-up, a range of regional characteristics and potential self-selection into rice farming. Moreover, the large number of counties increases statistical power and allows for greater control over confounding factors in the analysis. The results of this study suggest that a major cultural transition in human history had small but detectable effects on genes.
Researchers used to believe that evolution worked so slowly that meaningful changes were unlikely to have happened in the last 10 000 years of human history. But more recently, researchers have concluded that 'evolutionary change typically occurs much faster than people used to think'. There is also evidence that human evolution actually sped up in the last 40 000 years [49]. If rice domestication selected for particular genes, it would fit with this emerging picture of relatively recent human evolution.
We should note several limitations in our data that point to possible future improvements.

(i) The current study is based on a sample of 4101 observations, which may not provide enough statistical power to detect small effects.

(ii) The GWAS summary statistics used to construct the PGSs in this study were mostly based on samples of European ancestry. This may introduce a Eurocentric bias and limit the predictive power of the constructed PGSs [14].⁴

(iii) Identifying regional ancestry through place of birth is imperfect. This method may misidentify people whose recent ancestors moved long distances.

(iv) We analysed genetic differences but not phenotypes or actual behaviour, and genetic propensities are not destiny.

(v) We do not have DNA samples from historical periods (e.g. ancient DNA). If future researchers gain access to such samples, they could test directly for, or rule out, reverse causality.
It is worth remembering that environment is not destiny, either. It would be overly simplistic to expect the exact same pattern of results everywhere people grow rice. There is ample evidence that the same type of environment does not always lead to the same culture. As one small example, farmers in different cultures dealt with peak labour demands in rice differently. Chinese farmers preferred to trade labour with family members, whereas West African rice farmers sometimes relied on groups of youths who moved from farm to farm. Rice presents common challenges, but cultures' solutions to those challenges (and the genetic selection pressures that come with them) may differ.

⁴ In a recent study, Duncan et al. [50] reported that polygenic score performance in East Asian samples may be reasonably close to that in European samples (95% versus 100%). We expect the generalizability of polygenic scores to populations of non-European ancestry to be an active area of methods development in the near future.
Finally, the finding of rice-wheat genetic differences offers a clue to a puzzle of modernization. As fewer and fewer people farm in China, why do rice-wheat differences persist in modern China? Studies have found rice-wheat differences among people who do not farm [8,9]. Genetic differences present one possible mechanism, though surely not the only one, through which historical differences in subsistence style live on in the present day.

Data and research design
Through WeGene, we conducted an online survey of its customer base. WeGene is a leading private genetic testing company based in Shenzhen that provides direct-to-consumer genetic testing and personalized healthcare information. After providing informed consent, approximately 4700 participants took our online survey between July 2018 and October 2019. The survey collected information on participants' demographic and socioeconomic characteristics, such as gender, birth year, birthplace and parental birthplaces. We excluded individuals who did not complete enough questions or were under the age of 16 at the time of the survey, which left a total of 4101 observations from the original sample. An important feature of our dataset is that all respondents were genotyped on a WeGene custom genotyping array (Illumina). Imputation and quality control were performed using PLINK (1.90 Beta), SHAPEIT (v. 2.17) and IMPUTE2 (v. 2.3.1). For each individual, we obtained a total of 10 670 107 SNPs, which we then used to construct PGSs for a number of behavioural and psychological traits.
We built PGSs (also called 'polygenic risk scores', 'genetic risk scores' or 'genome-wide scores') for all 4101 individuals. We used effect estimates from recently published GWASs on height, BMI, depression, time discounting, age at first birth, educational attainment and risk preference [30][31][32][33][34][35][36][37]. For each respondent i, we multiplied the imputed dosage of each allele j ($\text{SNP}_{j,i}$) by the effect size reported in the corresponding GWAS ($\beta_j$) and summed over alleles:

$$\text{PGS}_i = \sum_{j} \beta_j \, \text{SNP}_{j,i}$$

We then normalized all PGSs to range between 0 and 1.
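The weighted sum described above, followed by the 0-1 normalization, can be sketched with NumPy. Two points are our assumptions rather than details given in the text. First, the normalization is taken to be min-max rescaling across the sample. Second, the dosage matrix is assumed to be already aligned so that each column counts the same effect allele that the GWAS reports. The function names and the toy dosages and effect sizes are hypothetical.

```python
import numpy as np

def polygenic_score(dosages, betas):
    """Weighted sum PGS_i = sum_j beta_j * SNP_{j,i}.

    dosages: (n_individuals, n_snps) imputed effect-allele dosages, each in [0, 2].
    betas:   (n_snps,) GWAS effect sizes for the same effect alleles.
    """
    return np.asarray(dosages, dtype=float) @ np.asarray(betas, dtype=float)

def rescale_01(scores):
    """Min-max rescaling of raw scores to [0, 1] (assumed form of the normalization)."""
    s = np.asarray(scores, dtype=float)
    return (s - s.min()) / (s.max() - s.min())

# Toy data (hypothetical): three individuals, three SNPs.
dosages = np.array([[0.0, 1.0, 2.0],
                    [2.0, 2.0, 0.0],
                    [1.0, 0.0, 1.0]])
betas = np.array([0.10, -0.20, 0.05])
raw = polygenic_score(dosages, betas)
pgs01 = rescale_01(raw)
print(raw, pgs01)
```

In practice, only SNPs present in both the imputed data and the GWAS summary statistics would enter the sum. Min-max rescaling also makes each score's range depend on the most extreme individuals in the sample.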
Two exceptions are ADH1B rs1229984 and ALDH2 rs671. ADH1B encodes alcohol dehydrogenase 1B, and ALDH2 encodes aldehyde dehydrogenase 2. Each of these traits is measured by a single SNP (rs1229984 for ADH1B and rs671 for ALDH2) rather than by a score over multiple SNPs. Therefore, we directly measure ADH1B rs1229984 and ALDH2 rs671 as the number of effect alleles, resulting in three possible values (0, 1 and 2).
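For the two single-SNP measures, the coding amounts to counting copies of the effect allele in each genotype call. The minimal Python sketch below assumes genotypes arrive as two-letter calls on a consistent strand (for example 'GA'). The function name and the example effect allele 'A' are hypothetical, and the sketch does not restate which allele the study treated as the effect allele.

```python
VALID_BASES = set("ACGT")

def count_effect_alleles(genotype: str, effect_allele: str) -> int:
    """Return 0, 1 or 2: copies of effect_allele in a diploid call such as 'GA'."""
    call = genotype.strip().upper()
    allele = effect_allele.strip().upper()
    if len(call) != 2 or not set(call) <= VALID_BASES:
        raise ValueError(f"expected a two-base genotype call, got {genotype!r}")
    if allele not in VALID_BASES:
        raise ValueError(f"expected a single base as effect allele, got {effect_allele!r}")
    return call.count(allele)

# Hypothetical calls at one SNP, counting 'A' as the effect allele.
codes = [count_effect_alleles(g, "A") for g in ["GG", "GA", "AA"]]
print(codes)
```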

Statistical analysis
We estimate the following multivariate regression model using ordinary least squares (OLS), where i denotes individuals and j denotes county of birth:

$$\text{PGS}_{ij} = \alpha + \lambda\, \text{pt\_rice}_{ij} + \beta X_{ij} + \varepsilon_{ij} \qquad (4.1)$$

The dependent variable, $\text{PGS}_{ij}$, is one of the nine PGSs or genotypes of respondent i, born in county j: height, BMI, ADH1B, ALDH2, depression, time discounting, age at first birth, educational attainment and risk preference. $\text{pt\_rice}_{ij}$ is the proportion of rice farming in county j, where i was born, and λ is the coefficient of primary interest. $X_{ij}$ contains a set of control variables, including individual characteristics, ancestral compositions and measures of county/province differences, depending on the specific model described in the main text.
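The OLS model described above regresses a PGS on the county rice share plus controls. The NumPy sketch below shows the mechanics on simulated data and rests on several assumptions:

- Age and three made-up ancestry proportions stand in for the full control set.
- Standard errors are classical (homoskedastic); the text does not state which variance estimator the study used.
- Ancestry proportions sum to 1 for each person, so one component is dropped to keep the design matrix from being collinear with the constant.

All variable names and coefficients are hypothetical.

```python
import numpy as np

def ols_with_se(y, X):
    """OLS coefficients with classical (homoskedastic) standard errors."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(1)
n = 4000
pt_rice = rng.uniform(0.0, 1.0, n)                 # hypothetical county rice share
age = rng.integers(16, 70, n).astype(float)        # hypothetical respondent age
ancestry = rng.dirichlet([5.0, 3.0, 1.0], size=n)  # hypothetical proportions; rows sum to 1
pgs = (0.1 * pt_rice + 0.002 * age
       + ancestry @ np.array([0.0, 0.05, -0.05])
       + rng.normal(0.0, 0.2, n))

# Constant, rice share, age, and all but the first ancestry component.
X = np.column_stack([np.ones(n), pt_rice, age, ancestry[:, 1:]])
beta, se = ols_with_se(pgs, X)
lam, lam_se = beta[1], se[1]
print(f"lambda = {lam:.3f} (SE {lam_se:.3f})")
```

Because respondents born in the same county share the same rice value, an applied analysis would often cluster standard errors by county. The classical errors here are only for illustrating the mechanics.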
To mitigate concerns about the endogeneity of rice farming, we estimate two-stage least-squares (2SLS) models using the following equations. First stage:

$$\text{pt\_rice}_{ij} = \delta_1\, \text{rice\_suitability}_{ij} + \delta_2\, \text{rice\_suitability}_{ij}^{2} + \mu X_{ij} + \varphi_i \qquad (4.2)$$

In the first stage (equation (4.2)), the endogenous variable of county-level rice farming percentage ($\text{pt\_rice}_{ij}$) is regressed on instruments of the environmental suitability for rice ($\text{rice\_suitability}_{ij}$) and its