How the zebra got its stripes: a problem with too many solutions

The adaptive significance of zebra stripes has thus far eluded understanding. Many explanations have been suggested, including social cohesion, thermoregulation, predation evasion and avoidance of biting flies. Identifying the associations between phenotypic and environmental factors is essential for testing these hypotheses and substantiating existing experimental evidence. Plains zebra striping pattern varies regionally, from heavy black and white striping over the entire body in some areas to reduced stripe coverage with thinner and lighter stripes in others. We examined how well 29 environmental variables predict the variation in stripe characteristics of plains zebra across their range in Africa. In contrast to recent findings, we found no evidence that striping may have evolved to escape predators or avoid biting flies. Instead, we found that temperature successfully predicts a substantial amount of the stripe pattern variation observed in plains zebra. As this association between striping and temperature may be indicative of multiple biological processes, we suggest that the selective agents driving zebra striping are probably multifarious and complex.


Summary
The adaptive significance of zebra stripes has thus far eluded understanding. Many explanations have been suggested, including social cohesion, thermoregulation, predation evasion and avoidance of biting flies. Identifying the associations between phenotypic and environmental factors is essential for testing these hypotheses and substantiating existing experimental evidence. Plains zebra striping pattern varies regionally, from heavy black and white striping over the entire body in some areas to reduced stripe coverage with thinner and lighter stripes in others. We examined how well 29 environmental variables predict the variation in stripe characteristics of plains zebra across their range in Africa. In contrast to recent findings, we found no evidence that striping may have evolved to escape predators or avoid biting flies. Instead, we found that temperature successfully predicts a substantial amount of the stripe pattern variation observed in plains zebra. As this association between striping and temperature may be indicative of multiple biological processes, we suggest that the selective agents driving zebra striping are probably multifarious and complex.

Introduction
Coloration and patterning are important adaptive characteristics in many taxa [1][2][3]  environmentally correlated gradients in pigmentation that are strongly suggestive of adaptation to local environments [4][5][6]. Zebra stripes are among the most striking mammalian coat patterns, yet whether they are adaptive has not been established nor have the drivers of natural selection been pinpointed. The clinal modification in striping pattern that plains zebra (Equus quagga) exhibit (figure 2; electronic supplementary material, figure S4) suggests that environmental factors may create selective pressures that play a role in determining stripe patterns. Within this single species, individuals run the gamut in terms of striping patterns: some have strong striping over their entire body, while others have few to no stripes on the legs, and faint shadow stripes interspersed with the primary stripes along the torso. Individuals of the extinct quagga subspecies from South Africa had the least amount of striping, with stripes limited to the head, neck and torso. The quagga is thought to have diverged quite recently from other plains zebra and may have undergone stripe loss relatively rapidly, possibly associated with a more open, drier environment [7]. Variation in striping patterns has also figured prominently in subspecific classifications [8,9]. Distance and large barriers such as the Zambezi River may play a role in the observed variation [8,10], but the lack of genetic structuring observed among populations that exhibit divergent stripe phenotypes [11] suggests that stripe variation may represent adaptive variation rather than simply resulting from drift in isolated populations. That the degree of striping has a genetic basis is clear from a recent heritability study conducted in captive plains zebra [12].
There are a number of adaptive hypotheses for the existence of striping in zebras [13], including predation evasion [14,15], thermoregulation [14], social cohesion [16] and avoidance of biting flies [17]. In this paper we put the predation, thermoregulation and biting fly hypotheses to a spatially explicit empirical test by modelling how variation in plains zebra stripe pattern is associated with variation in environmental variables. For example, zebra are a common, possibly preferred, prey of lion [18,19]. Computer simulations with human subjects support the notion that strong black and white patterns make it difficult for predators to capture their prey [20]. Computer simulations have also shown that such patterns influence perceived size [21], speed [22] and trajectory (much like the optical illusions created by the spinning spokes of a wheel or the rotating stripes of a barber's pole) [23]. Each of these ideas depend on a strong pattern in order to produce the corresponding effect. So if zebra stripes function as an optical illusion that reduces the success of predation efforts, we would expect to see bolder, black and white stripes in areas where encounters with lions are more likely.
Striping has been shown experimentally to prevent both glossinid (tsetse) and tabanid flies from landing on surfaces [17,24,25]. For tabanids this phenomenon has been explained as reduced polarotactic attraction caused by the interspersion of weakly polarizing white stripes between the strongly polarizing and attractive black stripes [24]. However, this explanation does not work for other biting flies, such as tsetse flies, which are not polarotactic [26]. More likely, stripes may break up the silhouette of the body against a strong background [17] or create an optical illusion that confuses flies [23]. Biting flies tend to feed low on the body, especially around legs where the zebra's skin is the thinnest [24,27,28]. Thus, we expect that stripe characteristics of the legs, or perhaps even the belly, will correlate with tsetse fly prevalence if tsetse flies have played a role in the evolution of zebra stripes.
The hypothesis that stripes help zebra thermoregulate has not been tested empirically. This hypothesis is based on the idea that black and white stripes would heat up differentially, thus causing differential airflow between black and white stripes and creating eddies of air that would have a cooling effect [14]. This mechanism should work most effectively on strong, contrasting stripes, so we would predict good coverage with bold black and white striping to occur in areas in which zebra are regularly exposed to higher temperatures.
In this paper, we make the implicit assumption that striping is adaptive as we attempt to identify potential drivers of natural selection. We make an assumption typically made in studies of the adaptive significance of traits [29], that variation in current striping patterns can be explained by current environmental conditions and that the relationship between stripes and environment has remained stable over time, although the specifics of the distributions of habitats and, consequently, stripe patterns may have changed. We conduct spatially explicit modelling of plains zebra striping using a set of environmental layers that allow us to test the hypotheses that stripe characteristics are associated with predation, biting flies or temperature. In addition to these specific hypotheses about stripe thickness and stripe saturation as they relate to predicted lion, tsetse fly and temperature distributions, we also investigate available climatic and remotely sensed variables such as precipitation, surface moisture and vegetation characteristics to investigate whether these environmental predictors help explain striping patterns.

Results
To analyse the relationship between stripe phenotype and environment, we quantified stripe characteristics at 16 sites across the plains zebra range (electronic supplementary material, figure S1). One of the 16 sites is represented by a single photograph of the extinct quagga subspecies, which is the only quagga specimen with precise locality data such that environmental data could be assigned to it [30]. Stripe characteristics were quantified for forelegs, hind legs, torso and belly (electronic supplementary material, figure S2), and included stripe number, thickness, length and colour saturation. As a single stripe can be described by thickness, length and colour saturation, we multiplied standardized values of these traits in order to describe the overall quality of a stripe, which we call stripe definition.
We ran a random forest model [31][32][33][34][35] as implemented in the package randomForest [36,37] in R [38] using a set of 19 predictive variables. Environmental variables successfully explained at least 30% and as much as 63% of the variance for 12 of 18 striping characteristics: foreleg stripe number, thickness, saturation and overall definition, hind leg stripe thickness and definition, torso stripe number, length, thickness, saturation and definition, and belly stripe number (table 1). The most consistently important variables were isothermality (BIO3) and mean temperature of the coldest quarter (BIO11). Maximum annual vegetation as measured by the Normalized Difference Vegetation Index (NDVIMAX) and precipitation of the wettest month (BIO13) were also important for some characteristics. Estimated tsetse fly and lion distributions, by contrast, consistently failed to predict stripe pattern variation.
A good model has the capacity to predict characteristics for sites that were not included in the original model. We therefore tested our random forest models by quantifying stripe characteristics at eight additional sites (electronic supplementary material, figure S1) using photographs posted on the photo sharing website Flickr, or sent to us in response to requests placed there and on the University of California, Los Angeles' (UCLA) Center for Tropical Research website. We extracted the values of the best environmental predictor variables for the test sites, used these to predict the values for stripe characteristics across those sites, and regressed the predictions onto the observed values of stripe characteristics. The validity of our model estimates was well supported for stripe length, thickness, saturation and overall definition on the foreleg, thickness and definition on the hind leg, and number of stripes and definition on the torso. Our models successfully predicted the values of these eight stripe characteristics across the eight test sites (table 1). Models were also relatively successful at predicting hind leg stripe length and saturation at the test sites (table 1), while other characteristics, particularly those of the belly, were not accurately predicted for new sites. The random forest model and correlations between observed and predicted are shown in figure 1 for hind leg stripe thickness and torso stripe definition (as the quagga point is based on a single individual we reran models and predictions without that point included. Results were concordant with our previous models, electronic supplementary material, table S2).
We predicted stripe characteristics across the historical plains zebra range [10] by first extracting the values of the best predictor variables for 50 000 random points across Africa using ARCGIS (Environmental Systems Research Institute, Redlands, CA). We then used a random forest model to predict values of stripe characteristics for these 50 000 points (interpolated using ordinary kriging and limited to the plains zebra known range). Figure 2 shows the predicted distribution of stripe variation for hind leg stripe thickness and torso stripe definition. Both maps show that temperature accurately predicts the general pattern of zebra striping. Interestingly, the region of Africa known to have harboured E. quagga and Equus burchelli [10], the zebra subspecies typically having no stripes on the legs, is well delineated in the map of hind leg stripe thickness, in spite of the model containing only one data point in which zebra had no leg stripes. In addition, predictions of torso stripe definition correctly identify regions at central latitudes in eastern Africa in which zebra have full length and fully saturated but thinner stripes on the torso than zebra farther north. Bivariate plots (electronic supplementary material figure S4) show the relationship between the two traits shown in figures 1 and 2 and the environmental variables identified as important by our models.

Discussion
We found that environment, particularly temperature, was a significant predictor of zebra stripe patterns across their entire range in Africa. This was supported by the large amount of variation explained by the models containing these variables, as well as the successful predictions of striping characteristics at new sites. The stripe characteristics that were most readily explained by environmental variation   were stripe thickness and definition on forelegs and hind legs, and the number of stripes and definition on the torso, suggesting that these traits are the most likely targets of selection. This correlation with temperature may be explained by more than one causal mechanism, as discussed below, and will require further investigation. Our finding that the two environmental variables most closely associated with variation in striping were both temperature variables lends support to the hypothesis that striping may be related to thermoregulation. One would expect that mechanisms to adapt to temperature would be most relevant on the torso, and the stripe characteristics of the torso are well explained by isothermality (BIO3) and mean temperature of the coldest quarter (BIO11). Given the hypothesis that stripes give rise to differential air currents that produce a cooling effect [14], intense black stripes would be expected to create more of a differential relative to white stripes, and stripe saturation is greatest in the tropics where animals experience sustained high temperatures. There have been no published direct tests of the thermoregulation hypothesis, however preliminary observations using a non-contact infrared digital thermometer gun show that zebra maintain a significantly lower surface body temperature (29.2 • C vs 32.5 • C) than nearby, similar sized herbivores grazing under the same conditions [39], and observations of differences in shade seeking behaviour between thin striped Grevy's zebras and thick striped plains zebras suggest stripe thickness could play a role in thermoregulation (D. I. Rubenstein 1980, personal observation).
The association between temperature and striping on the legs is not easily explained as a mechanism for thermoregulation. It may simply be a result of genetic correlation as stripe characteristics on the legs and torso are highly correlated (electronic supplementary material, table S1), or it may be a response to a different mechanism, such as avoiding fly bites. Tsetse flies and other biting flies can negatively impact animals in a number of ways: both directly, through loss of time spent foraging and energetic expenditures that lead to weight loss [40][41][42], and indirectly, through the transmission of disease [43]. The possibility that biting flies could be a selective agent favouring striping is supported by experimental (a) Importance scores for each environmental variable used as input to random forest algorithm models for hind leg stripe thickness and torso stripe definition. Variables with higher mean square error (calculated as the average increase in squared residuals when the variable is permuted) are more important. Variables having an importance score greater than the absolute value of the lowest negative scoring variable (solid vertical line) are potentially important and informative [30]. Variables shown with a black circle are those that remained important as the model was refined. (b) Correlations between observed and predicted values for hind leg stripe thickness and torso stripe definition.
evidence that tsetses and tabanids avoid striped surfaces [24]. However, we found no relationship between tsetse flies and variation in striping across populations, which suggests the explanation for striping in zebra is more complex than simply the avoidance of biting flies. While predicted tsetse fly distributions did not explain stripe variation, it is highly likely that the strongest selective effect of biting flies is not the bites themselves, but the diseases they may transmit. Tsetse flies, for instance, do not ubiquitously carry trypanosomes [44], partly because the ability of trypanosome infections to develop in the flies is temperature dependent, with even a small temperature difference of 3 • C significantly reducing infection rates [45]. Thus, the distribution of the trypanosomes and diseases carried by tsetse flies and other flies may be quite different from the distribution of the flies themselves and could have more relevance to variation in stripe pattern. We suggest that temperature may influence trypanosome prevalence in tsetse flies and as a consequence help explain variation in striping. A recent paper [46] compared the number of stripes on different body parts of the seven extant equid species and their subspecies. The authors found significant correlations between either tabanid or tsetse flies and various aspects of striping. However, several points should be considered when interpreting these results. Estimates of tabanid fly prevalence were not based on species distributions modelled using tabanid location data, but simply on ranges of temperature and humidity that the authors estimated to  populations. Hind leg stripe thickness is best predicted by BIO3 and BIO11. Torso stripe definition is best predicted by BIO3, BIO11 and BIO13.
be favourable for tabanids. This both renders the reliability of the estimated tabanid ranges questionable and begs the question of whether certain ranges of temperature and/or humidity could themselves help explain observed striping patterns. The variables used represent a broad overlap between mean values of environmental variables and the species or subspecies ranges, and the use of standard statistical tests with a small number of observations meant that only variables passing individual bivariate tests for significance were added to their model. Thus, their analysis ignores the substantial variation in environmental variables that occurs within each species and subspecies range and does not allow a direct test of multiple hypotheses within the same model. Finally, their work found that the number of belly stripes in a species was positively correlated with predicted probabilities of tsetse flies. However, only one species of equid, the plains zebra, actually has both belly stripes and significant range overlap with tsetse flies, suggesting this relationship requires more rigorous analysis.
It is not particularly surprising that probabilities of lion occurrence failed to predict zebra stripe pattern, as lion are nearly ubiquitous throughout the range of plains zebra and the predicted probability of occurrence does not vary greatly (electronic supplementary material, figure S3). The predicted probability of lion in the region inhabited by the minimally striped quagga is approximately average relative to our other locations. In addition, if striping were a deterrent to lion predation, we might expect lion predation on zebra to be lower in proportion to their abundance than it is on other mammals. Instead, predation on zebra by lion has been found to be in excess of their relative abundance [18,19]. One factor that could make it difficult to detect a true effect of striping on predation using our dataset is a potential interaction between the animal pattern and the background pattern in creating camouflage or predator confusion [20,[47][48][49]. Quantification of the backgrounds provided by the habitats inhabited by zebra would be useful.
One notable aspect of both the training and test datasets is the paucity of sites from southernmost Africa. This is due to the fact that zebra were extirpated from most of South Africa in the late 1800s resulting in the loss of the quagga and burchelli subspecies, the two with the least striping and with typically no striping on the legs. The only GIS referenced phenotype we have been able to obtain for either of these subspecies is one quagga specimen from the Naturhistorisches Museum Basel, which was collected from a location between Shilow and Whittlesea in South Africa [30]. The ability of this single point to help delineate the region in which zebra with no stripes on the legs used to occur is surprising and gives even greater credence to the importance of temperature in explaining the adaptive significance of striping.
Our study has shown a strong correlation between temperature and striping across the plains zebra's range. The relationship between temperature and stripe pattern is clear and, along with the lack of genetic structuring among populations [11], suggests an adaptive explanative for stripe variation. We find no clear support for the hypotheses that biting flies or predators have driven stripe evolution, however, the reduction of striping in areas with seasonally low temperatures could be a response to a reduction in the infection rate of tsetse flies with trypanosomes, a reduction in the need for convective cooling, or both. We expect the function of striping will prove to be complex given the multifarious effects of striping shown through experimentation to date, and the influence of not only temperature variables, but also the additional, albeit smaller influence, of precipitation and NDVI.
Much additional work is needed to elucidate the true functionality of striping in zebra. Our work shows a correlation with temperature, but the cause of this correlation remains unknown. More realistic experimental studies are needed to investigate whether stripes function at a distance to help zebra evade detection, and whether they function at close range to create optical illusions resulting in sufficient confusion to decrease the likelihood of being bitten or predated. For instance, experimental evidence is mixed as to whether the visual signal of striping may be overcome by the attractiveness of odours such as CO 2 and ammonia [17,50] that tsetse flies use to locate animals, and direct evidence from live animals in the field is completely lacking. Although zebra blood is not typically found in surveys of tsetse fly blood meals [51,52] and screenings of wild mammals rarely identify zebra with trypanosomes [53], the same is true for many mammal species that lack stripes. Furthermore, trypanosomiasis has been found in zebra [54]. Therefore, one cannot reasonably conclude that the lack of parasite data in zebra is related to the characteristic of striping. While stripes clearly create confusion in the constrained environment of a computer screen, this same phenomenon may not occur on a larger scale under normal conditions. Larger scale experiments using live animals or a virtual system are clearly warranted. Experiments also need to be conducted to investigate the potential role of temperature. Whether stripes function in thermoregulation, how environmental factors influence tsetse trypanosome loads and likelihood of infection if bitten, all need to be studied. Finally, additional modelling, incorporating genetic data, additional sites, and new variables suggested by experimental data, should be run in order to clarify the pattern of variation and its causes.

Phenotype datasets
A minimum of eight zebra per site was measured for the training dataset (with the exception of the single data point for the quagga), and five or more for the test dataset. Zebra were photographed in profile with the entire body and legs (to fetlock) visible. The bulk of the photographs were taken by the authors and a few were donated by trusted individuals (other researchers or professional tour guides). As a convention, we treated the dark regions of the pelage as stripes. For each zebra, four stripe characteristics were quantified on each leg (L) and the torso (T), while three were quantified on the belly (B) (electronic supplementary material, figure S2). We quantified the number of stripes (LTB), length of each stripe (LT only, horizontally across the leg and vertically across the torso), stripe thickness (LTB) and stripe saturation (LTB). Stripe thickness was measured in pixels down the midline of the legs and torso and along the belly, and standardized by dividing by the pixel length of the body part. Stripe thickness and body part lengths were measured using IMAGEJ software [55]. For each stripe, however small or faint, length was estimated categorically as less than or equal to 25, 50, 75 or 100%, and saturation was estimated in categories from 0 to 7, with 0 being no stripe and 7 being black. All measurements were performed by one of three researchers. At least two zebra at each site were measured by a minimum of two researchers in order to estimate repeatability, and two photos at different exposures were taken of a subset of animals to determine the effect of exposure on estimation of colour saturation. Finally, we examined the possibility that allometry might be a concern using data on skull lengths from [8]. We found no significant correlations between stripe characteristics and exposure or size. Additional measurement details and repeatability results can be found in the electronic supplementary material.

Environmental variables
Variables input into random forest models included 19 temperature and precipitation variables from WorldClim [56], (http://www.worldclim.org/), and five remote-sensing variables quantifying the concentration of green leaf vegetation (MODIS-derived NDVI [57]), which are typically considered a measure of greenness [58] and productivity [59] (see electronic supplementary material, table S3). We also used the MODIS-derived vegetation continuous field (VCF) product as a measure of the percentage of tree canopy cover [60]. Monthly composites (mean and standard deviation) of global Quick Scatterometer (QSCAT) [61] are indicative of surface moisture, leaf water content and other seasonal attributes. We used the Food and Agriculture Organization's Programme Against African Trypanosomiasis data on predicted probability of tsetse fly occurrence [62]. The genus Glossina (tsetse flies) is divided into three groups, which comprise several species each. The Morsitans group is the most common tsetse group predicted to co-occur with plains zebra and is often the only one predicted to occur at a site, but the other two groups, Palpalis and Fuscus, are also predicted to be present in parts of the range. As multiple species of tsetse flies are capable of carrying all three species of trypanosomes that infect animals [63,64], we used a composite of the three groups as our predictor variable. We modelled predicted probabilities for lion presence using MAXENT [65] (electronic supplementary material, figure S3). Lion presence data for 116 locations was culled from various sources including the American Museum of Natural History (http://www.amnh.org/our-research/vertebrate-zoology/mammalogy/database), the National Museum of Natural History (http://vertebrates.si.edu/mammals/mammals_databases.html), the literature [19,[66][67][68][69][70][71][72][73][74][75] and one personal communication (Kimura D, ranger, Lake Mburo National Park, 1980). The tsetse fly and lion data are both predicted probabilities of occurrence based on habitat suitability; we assume that zebras are more likely to encounter and be impacted by tsetse flies and lions in those regions where the probability of occurrence of these species are highest. Reduction from 29 to 19 variables was accomplished by removing all but one of each set of covarying predictors (correlation of greater than 0.9).
Zebra are very mobile and likely to experience a range of conditions within their area of movement thus variables derived from 1 km resolution layers were deemed to be too fine grained. We therefore averaged the value of each variable across a 25 km radius around the central point of the protected area within which they were photographed.