Negative biotic interactions drive predictions of distributions for species from a grassland community

Understanding the factors that determine species' geographical distributions is important for addressing a wide range of biological questions, including where species will be able to maintain populations following environmental change. New methods for modelling species distributions include the effects of biotic interactions alongside more commonly used abiotic variables such as temperature and precipitation; however, it is not clear which types of interspecific relationship contribute to shaping species distributions and should therefore be prioritized in models. Even if some interactions are known to be influential at local spatial scales, there is no guarantee they will have similar impacts at macroecological scales. Here we apply a novel method based on information theory to determine which types of interspecific relationship drive species distributions. Our results show that negative biotic interactions such as competition have the greatest effect on model predictions for species from a California grassland community. This knowledge will help focus data collection and improve model predictions for identifying at-risk species. Furthermore, our methodological approach is applicable to any kind of species distribution model that can be specified with and without interspecific relationships.


Species distribution models for the California grassland community
We implemented species distribution models (SDMs) in R using the BIOMOD platform (Thuiller et al. 2009). The species pool contained 53 plant and 1 grasshopper species (Table S1) found at the Angelo Coast Range Reserve (39 • 44 17.7 N, 123 • 37 48.4 W), a protected and relatively undisturbed area of 7,660 acres in northern California. We downloaded presence records for these species from the Global Biodiversity Information Facility (http://GBIF.org), and, for each species, we classified as "present" any 800m × 800m grid cell in the western USA (32 • N to 49.1 • N and 125 • W to 115 • W) that contained one or more presence records.
For both prior and posterior community distribution matrices, we used the Maxent method (Elith et al. 2011) to estimate species' responses to bioclimate variables. Following Fithian & Hastie (2013), consider predicting the distribution of an individual species across a geographical domain D at locations z ∈ D. Associated with each location is a vector of measured bioclimate variables or features x(z). A presence-only data set consists of n 1 locations of species sightings z k ∈ D for k = 1, 2, . . . , n 1 , and n 0 "background" locations z k for k = n 1 + 1, 2, . . . , n 1 + n 0 ; and let x k = x(z k ) be the features for location k. The suitability of a location for the species is given by N (z k ) ∼ Poisson(λ(z k )); where the intensity is log-linear in features x(z): Notice that α only multiplies λ(z) by a constant, so β completely determines the distribution of habitat suitability across the geographical domain: p λ (z) = e βx(z) D e βx(z) dz . Computing the maximum entropy solution is equivalent to computing the maximum likelihood for Eqn (S1) given sighting data. There are two steps: i) Obtain the maximum likelihood estimate densitŷ p λ (z), which requires findingβ for which the density is "nearly geographically uniform" (i.e., entropy is maximised) subject to constraints that make it resemble the sighting data; and ii) Multiplyp λ (z) by n 1 , which requires findingα such thatλ(z) = n 1pλ (z). In practice, we used Maxent's logistic output in BIOMOD to obtain prior habitat suitability values (ranging from 0 to 1) at each location.
We considered seven bioclimate variables: maximum temperature of the warmest month; minimum temperature of the coldest month; annual precipitation; precipitation of the driest quarter; mean temperature of the wettest quarter; temperature seasonality (standard deviation × 100); and precipitation seasonality (coefficient of variation). The climate variables were derived from an ensemble of five atmosphere-ocean general circulation models for the year 2010, and are freely available online (Pearson et al. 2014 Table S2. For the posterior community distribution matrix, we modelled the effects of biotic interactions, shared habitat suitability, and other interspecific relationships on range predictions for 14 focal species using a method that also uses Bayesian networks (BNs; Staniczenko et al. 2017). The California grassland BN contains a total of 52 conditional dependencies, representing 6 positive and 9 negative biotic interactions, 32 positive shared habitat suitability relationships, and 2 positive and 3 negative uncategorised interspecific relationships; these interspecific relationships were classified according to a series of transplant and competitorremoval experiments (Thomsen et al. 2006;, consumer addition and removal experiments , and long-term monitoring studies (Sullivan et al. 2016). In practice, conditional dependencies modify prior habitat suitability values to produce posterior habitat suitability values that also include the effects of interspecific relationships; as with our prior habitat suitability values, posterior habitat suitability values also range from 0 to 1. Because prior habitat suitability values for non-focal species were, by design, unchanged by the California grassland BN, we only considered the 14 focal species in subsequent analysis.
Given a set of prior habitat suitability values for the 14 focal species and a corresponding set of posterior habitat suitability values, we converted habitat suitability values for each species in each matrix to a binary range using one of two threshold rules. For example, with the maxSens threshold, we considered each species in turn and identified the habitat suitability value that would result in all recorded presences (species sightings) being included in its estimated range; that is, all habitat suitability values above the threshold value were assigned "1" and all habitat suitability values below the threshold value were assigned "0". A similar process was used for the maxSSS threshold, but a different, typically larger threshold habitat suitability value was identified for each species and each matrix. The resulting prior and posterior community distribution matrices describe binary range predictions for multiple species (columns) at distinct locations (rows), and are suitable for data compression analysis to determine which interspecific relationships drive predictions of species' distributions.

Measuring data compression using Bayesian networks and total length
In data compression analysis, we represent interspecific relationships by conditional dependencies in a BN model. Whereas in the above section we used BNs to modify prior habitat suitability values to generate posterior habitat suitability values, we now use BNs to assess the strength of similarity or difference between range predictions in a community distribution matrix. For example, consider the BN model in Fig. 1 of the main text, which represents three species and one positive and one negative interspecific relationship by three nodes and one positive and one negative conditional dependency: A → C and B → C. Rather than prescribing a rule that increases the habitat suitability value of C if A is likely to be present at a particular location and decreasing the habitat suitability value of C if B is likely to be present at a particular location, in this application we use the BN as a compression model to test if the predicted ranges of species A and B contain meaningful information about the predicted range of species C.
In this example, we do not need to consider state tables (also known as conditional probability tables) for species A and B for data compression analysis because their binary ranges are not affected by interspecific relationships-in SDMs, at least-and are therefore the same in both prior and posterior community distribution matrices. The state table for species C is where there is a distinct probability for each unique state of A and B. We would expect p C,2 to have the highest value given the assumption of a positive conditional dependency between A and C, p C,1 to have the lowest value given the assumption of a negative conditional dependency between B and C, and intermediate values for p C,0 and p C,3 (whether p C,0 ≥ p C,3 or p C,0 ≤ p C,3 would be determined by how A and B combine to affect C).
Maximum likelihood probabilities are calculated from data, in this case community distribution matrices, and for data compression analysis their relative values are less important than the requirement that each state has a distinct probability (the is equivalent to the "FULL" model described in Staniczenko et al. 2017). This requirement ensures that total lengths, which are calculated from maximum likelihood probabilities, reflect not only the effects of each interspecific relationship on range predictions, but also the effects of each combination of interspecific relationships on range predictions.
Total length can be calculated as the log 2 -transformed version of normalized maximum likelihood (NML; Rissanen 2001;Myung et al. 2006;Staniczenko et al. 2014). In this study, NML quantifies how well a BN model representing interspecific relationships explains a particular community distribution matrix compared to how well the model explains all possible community distribution matrices with the same number of species and locations. A good model explains only the focal community distribution matrix well and all other matrices poorly, resulting in high NML and short total length. By contrast, an overly simplistic model is likely to explain the focal matrix no better than it does any other matrix, resulting in low NML and long total length. And although an overly complex model is likely to explain the focal matrix well, it will also explain all other matrices well due to its greater inherent flexibility, still resulting in low NML and short total length. For BN model M and community distribution matrix B, NML can be written as whereL is a likelihood function and p B is a vector of maximum likelihood probabilities from state tables associated with M . In Eqn (S2), conventional likelihood is normalised over all matrices B with the same number of rows and columns as the original community distribution matrix B.
For binary matrices, each element can be considered the outcome of a Bernoulli trial and so the maximum likelihood estimate of state table probability p l iŝ where U l is the number of ones (locations within a predicted range) and Z l the number of zeros (locations outside a predicted range) in B that are associated with p l . For example, consider the state table for species C, and, in particular, the case in which the predicted ranges of species A and B overlap, which has an associated probability p C,3 . Imagine that a corresponding community distribution matrix has 20 rows (locations) and the predicted ranges of A and B overlap at 6 locations. If the predicted range of C includes 4 of these 6 locations thenp C,3 = 4/(4 + 2) = 2/3. Similar calculations involving sections of the other 14 rows (e.g., locations in which the predicted ranges of A and B do not overlap forp C,0 ) would give maximum likelihood estimates for the three remaining state table probabilities.
The maximum likelihood function for binary matrices iŝ where the product is taken over the complete set of state table probabilities in p B .
Parametric complexity has a surprisingly simple form when model parameters are conditionally independent (Staniczenko et al. 2014) 1 . However, calculating parametric complexity is less straightforward with BN models due to conditional dependencies between random variables (Grünwald 2007). One solution is factorised NML (fNML), which has been shown to provide a good approximation to NML (Silander et al. 2008(Silander et al. , 2009 with the contribution for each state table probability p l written as which can be computed quickly (Staniczenko et al. 2014).
Because absolute changes in total length can be unwieldy when a community distribution matrix contains a large number of locations, it is clearer to present values scaled by the number of bits required to transmit an uncompressed version of the matrix. With binary matrices, this reference total length is simply the number of matrix rows (locations) multiplied by the number of matrix columns (species).
Implications of the requirement that Bayesian networks are acyclic for modelling interspecific relationships in species distribution models The requirement that a Bayesian network is acyclic restricts how the effect of an interspecific relationship can be modelled in our approach, particularly when the effects of an interaction could be symmetric (e.g., mutualistic or competitive interactions). In the case of competition, the main consideration is that one species must be identified as the stronger of the two competitors, with the stronger competitor influencing the occurrence probability (or habitat suitability value) of the weaker competitor. Once the direction of influence has been defined, a high occurrence probability (or habitat suitability value) for the stronger competitor is modelled as reducing the occurrence probability (or habitat suitability value) for the weaker competitor compared to its environment-only probability. Clearly, this imposed asymmetry is more realistic when interaction strengths are unequal between species and less realistic when strengths are more equal. Finally, it is worth noting that modelling an interaction in this way has no effect on the stronger competitor-due to the uni-directionality of a conditional dependency, only the estimated range of the weaker competitor is considered in data compression analysis.
Implications of the requirement that Bayesian networks are acyclic for comparing the effect of interspecific relationships on community distributions We tested the sensitivity of our results to the directionality of interspecific relationships in compression models by repeating analyses with a new set of Bayesian networks in which the sign of interspecific relationships is maintained from corresponding original versions but the direction of influence is reversed. We considered the same prior and posterior community distribution matrices with a new set of eight compression models: ALL reverse, SHS BI reverse, SHS BI+ reverse, SHS BI-reverse, SHS reverse, BI reverse, BI+ reverse, BI-reverse.
It is first important to note that it is not informative to compare raw values of ∆ M and ∆% M between the set of original compression models and the set of corresponding "reverse" models. This is because different numbers of species are potentially affected and so the scaling constant is different between the two sets of models, i.e., 13 focal species are assessed for compression with ALL (1 of the 14 focal species has no conditional dependencies), in comparison to 7 focal species and 23 non-focal species with ALL reverse. However, it is informative to compare differences in model rankings between the set of original compression models and the set of "reverse" compression models.
Rankings are similar between the two sets of compression models (Table S3). With the maxSSS threshold, the ranking for ∆ M among "reverse" Bayesian network models is exactly the same as with original models; rankings are similar for ∆% M , with BI-and BI-reverse top-ranked in their respective model sets. With the maxSens threshold, rankings are similar for ∆ M , and BI-and BI-reverse are again top-ranked in their respective model sets for ∆% M . These findings show that our approach, results, and conclusions are insensitive to directionality in Bayesian network compression models, especially when using the maxSSS threshold and when considering model rankings for ∆% M . (Of course, directionality is important when modelling interspecific relationships in species distribution models, such that range predictions are modified for the intended species.)

Results for alternative Bayesian networks for modifying habitat suitability values in species distribution models
We repeated our workflow with two alternative Bayesian networks for modifying habitat suitability values in SDMs (see Fig. 1 in main text): (i) only shared habitat suitability relationships (32 positive conditional dependencies) and (ii) only biotic interactions (6 positive and 9 negative conditional dependencies). With only shared habitat suitability relationships in SDMs, we would expect, of our eight compression models (ALL, SHS BI, SHS BI+, SHS BI-, SHS, BI, BI+, BI-; see Fig. 2 in main text), BI, BI+, and BI-to perform worse than compression models that include SHS because those interactions are not included in the SDMs. By contrast, with only biotic interactions in SDMs, we would expect BI, BI+, and BI-to perform better than compression models that include SHS.
We have already established that modelling 52 interspecific relationships between species from the California grassland community using a Bayesian network improves range predictions for 13 focal species, with model performance measured as increases in AUC compared to when no interspecific relationships are modelled in SDMs (Staniczenko et al. 2017 ; Fig. S1).
Model performance is still good for the two alternative Bayesian networks with only shared habitat suitability relationships (Fig. S2) and only biotic interactions (Fig. S3), although understandably fewer predicted ranges are modified.
With only shared habitat suitability relationships in SDMs, results for the maxSens threshold show that the SHS compression model moves from being lowly ranked (when all 52 interspecific relationships are modelled) to top-ranked while BI, BI+, and BI-move from highly ranked to bottom-ranked, as expected (for both ∆ M and ∆% M ; Table S4). Similarly, results for the maxSSS threshold show that all compression models that include SHS rank higher than BI, BI+, and BI-.
With only biotic interactions in SDMs, results for the maxSens threshold show that BI, BI+, and BI-remain top-ranked (compared to when all 52 interspecific relationships are modelled) and the order remains BI-then BI then BI+ (for both ∆ M and ∆% M ). Results for maxSSS and ∆ M show that BI, BI+, and BI-move from bottom-ranked to top-ranked.
Interestingly, although results for maxSSS and ∆% M show that BI and BI+ move from lowly ranked to highly ranked, as expected, BI-moves from top-ranked to bottom-ranked.
The finding that BI-drops in ranking when modelling only biotic interactions in SDMs suggests that modelling negative biotic interactions is most important when the two species involved in the interaction are likely to occur together in the absence of other species that may influence the presence of the focal species. One can understand this interpretation as follows. With our current implementation of modelling interspecific relationships in SDMs, the effect of negative biotic interactions is very strong in the absence of positive shared habitat suitability relationships. That is, modelled negative biotic interactions reduce habitat suitability values (HSVs) of affected species more often and by greater amounts than they otherwise would if all interspecific relationships were to be modelled. Because HSVs are systematically lower when mainly negative biotic interactions are modelled in SDMs, maxSSS "overcorrects" when transforming HSVs to a binary presence-absence range, and in doing so dilutes the effect of modelling negative biotic interactions in SDMs. This dilution effect explains why BI-does not lead to as much compression when modelling only biotic interactions in SDMs compared to modelling all interspecific relationships. (Note that maxSens does not appear to result in this dilution effect.) To illustrate this dilution effect, consider a focal species A that is negatively influenced by a biotic interaction with species B and positively influenced by a shared habitat suitability relationship (or some other positive interspecific relationship) with species C. First consider the case in which both negative and positive interspecific relationships are modelled in the SDM for species A. When all three species are likely to occur together based on an environment-only SDM-i.e., all three species have high prior HSVs at a particular location-then the effects of the positive and negative interspecific relationship are modelled as cancelling one another out. When only species A and B are likely to occur together, the negative biotic interaction has a large effect on the posterior HSV of species A, which is picked up by the BI-compression model. By contrast, consider the case in which the positive interspecific relationship between species A and C is not modelled in the SDM for species A.
The negative biotic interaction between species A and B "incorrectly" reduces the posterior HSV of species A when all three species are likely to occur together. Subsequently, maxSSS "overcorrects" and identifies a threshold HSV for species A that is very different from both the threshold value for prior HSVs and posterior HSVs when all interspecific relationships are modelled in the SDM for species A. This overcorrection leads to a dilution of the effect of negative biotic interactions on the predicted range of species A when only biotic interactions are modelled in SDMs, and therefore gives rise to the result that BI-performs less well as a compression model.     Figure S1: Increases and decreases in AUC scores for the 14 focal species when including all interspecific relationships in species distribution models (SDMs). For each focal species, we calculated the difference in the AUC score when using posteriors (output of an SDM with interspecific relationships) and when using priors (output of an SDM without interspecific relationships) for the same partition of data into training and test sets. Paired differences in AUC were calculated for 20 random partitions for each focal species. Box plots show interquartile range, median, minimum and maximum values of paired differences in AUC. If the difference in AUC is consistently greater than zero, then the Bayesian network is valuable and including the particular interspecific relationships in SDMs improves estimates of species geographical ranges.  Figure S2: Increases and decreases in AUC scores for the 14 focal species when including only shared habitat suitability relationships in species distribution models (SDMs). For each focal species, we calculated the difference in the AUC score when using posteriors (output of an SDM with interspecific relationships) and when using priors (output of an SDM without interspecific relationships) for the same partition of data into training and test sets. Paired differences in AUC were calculated for 20 random partitions for each focal species. Box plots show interquartile range, median, minimum and maximum values of paired differences in AUC. If the difference in AUC is consistently greater than zero, then the Bayesian network is valuable and including the particular interspecific relationships in SDMs improves estimates of species geographical ranges.  Figure S3: Increases and decreases in AUC scores for the 14 focal species when including only biotic interactions in species distribution models (SDMs). For each focal species, we calculated the difference in the AUC score when using posteriors (output of an SDM with interspecific relationships) and when using priors (output of an SDM without interspecific relationships) for the same partition of data into training and test sets. Paired differences in AUC were calculated for 20 random partitions for each focal species. Box plots show interquartile range, median, minimum and maximum values of paired differences in AUC. If the difference in AUC is consistently greater than zero, then the Bayesian network is valuable and including the particular interspecific relationships in SDMs improves estimates of species geographical ranges.