Common species link global ecosystems to climate change: dynamical evidence in the planktonic fossil record

Common species shape the world around us, and changes in their commonness signify large-scale shifts in ecosystem structure and function. However, our understanding of long-term ecosystem response to environmental forcing in the deep past is centred on species richness, neglecting the disproportional impact of common species. Here, we use common and widespread species of planktonic foraminifera in deep-sea sediments to track changes in observed global occupancy (proportion of sampled sites at which a species is present and observed) through the turbulent climatic history of the last 65 Myr. Our approach is sensitive to relative changes in global abundance of the species set and robust to factors that bias richness estimators. Using three independent methods for detecting causality, we show that the observed global occupancy of planktonic foraminifera has been dynamically coupled to past oceanographic changes captured in deep-ocean temperature reconstructions. The causal inference does not imply a direct mechanism, but is consistent with an indirect, time-delayed causal linkage. Given the strong quantitative evidence that a dynamical coupling exists, we hypothesize that mixotrophy (symbiont hosting) may be an ecological factor linking the global abundance of planktonic foraminifera to long-term climate changes via the relative extent of oligotrophic oceans.


Supplementary Text Time series analysis
We used two non-parametric and one model-based time series analysis method to test for causal directionality. For non-parametric time-lag analyses, missing SCOR values were linearly interpolated, and the DOT data averaged in the SCOR time bins. Furthermore, to assess significance with surrogate data, both records were detrended using a second-order polynomial to satisfy a stationarity criterion (Kwiatkowski et al., 1992), then normalized to zero mean and unit standard deviation. For consistency, the model-based analysis was also performed on the pre-processed data. However, the detrending may remove components of the variation relevant to uncovering the parameters of underlying processes. We therefore repeated the model-based analysis on untransformed data.

Convergent cross mapping (CCM)
If a time series X is a component of a dynamical system, we can construct a delay-coordinate embedding of the state space of the system into an m-dimensional real space, by constructing the vectors where x(t i ) is the scalar value of the time series X at time t i (Takens, 1981). That is, the vectors in E X are in one-to-one correspondence with the states of the whole system. If X and Y are coupled variables of the same dynamical system (i.e. they are causally connected), this correspondence is also true for time series Y , and therefore E X and E Y are in one-to-one correspondence with each other. CCM uses this result to predict scalar values of Y from the coordinate-delay embedding of X and vice versa.
The CCM algorithm (Sugihara et al., 2012) locates, for each scalar point P i in the prediction set (subset of time series Y ), the contemporaneous state vector L i in the library set (subset of state vectors in E X ). Next, it finds L i 's closest neighbors and estimates a value for the predictee P * i using simplex projection (Sugihara and May, 1990). CCM skill is determined by the correlation (Pearson's ρ) between estimated P * i and actual values of P i . With increasing library size, CCM skill is expected to increase and converge if the variables are causally related. The notation "X xmap Y " refers to estimating y(t i ) from corresponding lagged-coordinate state versions of x(t i ), which is interpreted as "Y is causally influencing X".
We performed CCM using the rEDM package (Ye et al., 2015a), with embedding dimension m = 2, delay time step τ = 1, and using the entire time series as both prediction and library sets. We used leave-one-out cross validation (i.e. the predictee P i itself and points in a time radius of E around P i were excluded from the libraries, such that no points sharing coordinates with P i were used in the predictions Ye et al., 2015a). To distinguish unidirectional forcing from bidirectional causality, we calculated CCM for different time lags of the original time series. If there is a discernable delay between cause and effect, then CCM is expected to peak for negative time lags in the direction(s) of true causality (past predicts future). If true causality is unidirectional but with synchrony (inducing two-way predictability), then any CCM skill in the non-causal direction is expected to peak for positive lags (Ye et al., 2015b). Lagged CCM analysis of SCOR and DOT is reported as median CCM skill and 95% ranges at each lag for 300 iterated samples of library size 150. CCM is considered significant if it exceeds the 95th percentile of 300 amplitude-adjusted Fourier transform (AAFT) surrogate time series (Theiler et al., 1992).

Information transfer (IT)
If two processes X and Y are independent, then knowing the past l states of Y has no influence on the state transition probability of X, from x n to x n+1 , beyond knowing the past k states of X alone: Transfer entropy (Schreiber, 2000) quantifies the information lost when assuming independence by means of a Kullback-Leibler divergence: which yields a non-symmetric measure of information flow, equivalent to Granger causality for linear, Gaussian systems (Barnett et al., 2009).
We implement transfer entropy in the modified IT form (Verdes, 2005;Hannisdal, 2011) and introduce time lags analogously to the lagged CCM. If there is a causal delay, then IT is expected to peak for negative time lags in the direction(s) of true causality (past → future). If true causality is unidirectional but with synchrony, then any IT in the non-causal direction is expected to peak for positive lags. IT varies as a function of the coarse-graining resolution, integrated in a single IT value using the area under the resulting curve (Verdes, 2005). Lagged IT analysis of SCOR and DOT is reported as median IT and 95% ranges for 100 draws of 100 random samples at each lag. IT is considered significant if it exceeds the 95th percentile of 5,000 AAFT surrogates.

Stochastic Differential Equations (SDE)
Linear SDEs can be used to distinguish between correlation and Granger causality, including uni-and bidirectional causation, as well as hidden (unmeasured) processes that expand the space of possible connections (Reitan et al., 2012;Liow et al., 2015). A basic SDE is which describes a mean-reverting Ornstein-Uhlenbeck process (OUP) X with systematic (dt) and stochastic (dB) terms, expectation µ X , standard deviation s X = σ X / √ 2α X , and half-life t 1/2,X = log(2)/α X . If α X = 0, then dX describes a Wiener process (WP, or random walk). A process Y 1 may have a hidden process (or layer) Y 2 folded into its systematic part, such that Y 1 fluctuates with a lagged response to the OUP Y 2 , and if there is a causal connection, e.g. from Y 2 to X, we can write where β Y 2 →X describes the connection strength.
We first modeled SCOR and DOT separately with up to three layers (two hidden), each layer being WP or OUP. We excluded one-layered WP, which does not accept incoming causal links. We also excluded between-layer feedback loops, which were numerically intractable. All model parameters were assigned normal priors, with 95% prior ranges of µ i ∈ (−1.96, 1.96), σ i ∈ (0.01, 1.0), t 1/2,i ∈ (0.1, 50)Myr, where i denotes the layer, β ∈ (−2, 2) for causal connections, and ρ ∈ (−0.96, 0.96) for correlations. We used MCMC-based importance sampling to estimate Bayesian model probabilities (Reitan et al., 2012). SDE analysis of SCOR and DOT used 1/2 Myr time bins, with SCOR calculated on the total original NSB species list (done prior to taxonomic re-mapping). However, CCM and IT results did not change with bin size or species re-mapping, and SDE conclusions should also be robust. The best SCOR model was a one-layer OUP, while the best DOT model was three-layered with a lowermost WP, yielding 15 connection models, including the null model of no relationship (figure S6). We allowed for two-way causality because SCOR and DOT both derive from deep-sea carbonate sediments. We assigned 50% prior probability to the null, and distributed 50% evenly among the 14 connection models.
The posterior probability of the null (P = 0.11) yields a Bayes factor favoring a connection of B = 7.9, which is substantial evidence (Jeffreys, 1961). In the two best models (figure S6) the upper DOT process (DOT1) affects SCOR positively while SCOR affects DOT1 or DOT2 negatively, but DOT processes react very slowly to changes in SCOR (table S2). Because SCOR changes more rapidly, the DOT response will be smoothed out, hence DOT affects SCOR much more strongly per time unit than vice versa. In the third best model (P = 0.18; figure S6) DOT1 affects SCOR unidirectionally, hence the evidence for feedback is weak (B = 2.6). SDE analysis thus supports at least one connection (B = 7.9); causality rather than correlation, given a connection (B = 12.5); and DOT influencing SCOR, given a connection (B = 15.4). Although the DOT-SCOR causality is indirect, SDE shows a direct connection, because data on causal intermediates are lacking. SDE analysis on untransformed data (uDOT and uSCOR, both having three-layer models with a lowermost WP) yielded very strong evidence (B = 73) for a connection, supporting feedback between the top layers uDOT1 and uSCOR1 (table S2). However, we found strong support for cyclical behaviour (1.5 Myr period), consistent with internal, between-layer feedback in uDOT.
Reitan, T., T. Schweder, and J. Henderiks (2012). Phenotypic evolution studied by layered stochastic differential equations. The Annals of Applied Statistics 6 (4), 1531-1551.  Figure S1. The relationship between global abundance and global occupancy is noisy, but positive. (a) Global abundance-occupancy relationship for modern planktonic foraminifera in the TARA Oceans data set (ref. 4). The data are metabarcodes (V9-18S rDNA sequences), their presence/absence across 334 sample localities, and the total abundance (number of reads) of each barcode. Barcodes were collapsed into "species" (OTUs, or "cid" identifiers). Here we include only those 102 species that have a planktonic taxon attribution showing >90% identity with a reference sequence, thus excluding 66 benthic, 8 tychopelagic, and 72 unresolved species (R. Morard, personal communication, Dec 19, 2016). The summed total abundance for each species is plotted against the number of sample localities in which it was present (sample occupancy). The 20 most widespread species account for more than 94% of the global abundance (or more than 84% if we exclude the most abundant species, marked in red). (b) Variability in the abundance-occupancy relationship in a Poseidon simulation. Parameter settings correspond to one of the scenarios in the random-loss experiment (figure S5), with RAD variability = 4 and a constant proportion of 50% of species randomly removed in each time step. This scenario matches the upper right-hand corner of the panels in figure S5d. The abundance-occupancy relationship is from a single time step (in which the spatial site sampling proportion is ∼16%), across ten model iterations. Like in the TARA Oceans data (a; see also ref. 4), there is greater variability above the trend line than below the trend line (abundant species are not always widespread, but widespread species are unlikely to be rare). (c) Random removal of species in each time step causes short-term volatility in SCOR, which is most severe in periods of high total abundance (depending on richness and RAD). Despite this volatility, SCOR (blue, encompassing 300 model iterations) is able to capture the relative changes in global abundance (red) with meaningful accuracy (time series are normalized to mean zero and unit standard deviation), and outperforms richness estimators under these conditions (figure S5d). Figure S2.   Figure S3. Subsampling methods track variability in the shape of the rank-abundance distribution. (a) Same experiment as in figure 1d, but measuring the coefficient of determination (R 2 ) between each estimator and the RAD shape parameter σ. Note that σ is a white noise process, and its variability is entirely random with respect to true richness and abundance. For the subsampling richness estimators, a high R 2 and blue colour here means that the estimator is effectively random with respect to the quantity it is trying to estimate. The sensitivity of subsampling methods to RAD shape thus exists regardless of how one might choose to measure sample evenness. (b) Distribution of sampled evenness across all Poseidon experiments. Shaded histogram represents the model runs testing the sensitivity to variability in the proportion of species randomly removed, and variability in RAD shape parameter σ ( figure S5). Un-shaded histogram (note transparency in overlap) represents the model runs testing the sensitivity to variability in the proportion of sites sampled, and variability in σ  Figure S4. Sampling-standardized richness can be reproduced by the sum of raw richness and evenness. Raw sampled richness (S) and evenness (Pielou's J) of Cenozoic coccolithophores (a) and planktonic foraminifera (b) species from the NSB database. In both groups, S generally increases with the number of boreholes representing the spatial sampling, while J decreases, as expected if improved sampling enhances the detection of rare species (figure 1b,c). The sum of raw S and raw J superimposed on shareholder quorum subsampling (SQS) estimates of richness for coccolithophores (c) and foraminifera (d), all normalized to zero mean and unit standard deviation. SQS was calculated with a quorum level of 0.4 (higher quorum levels give nearly identical results but are less complete for the older part of the record). Ma, million years before present; Paleoc., Paleocene; Oligoc., Oligocene; P, Pliocene; Pt, Pleistocene. However, a proportion of the species is randomly removed in each time step, causing volatility in occurrences (red). This random loss could represent dissolution or other processes, such as taxonomic preferences in sample processing, that are random with respect to abundance. No variability in the proportion lost means that 50% are randomly selected and removed in each time step. In this example, variability = 0.4, meaning that between 30% and 70% of species are lost. (c) Even with a constant original RAD shape, the random loss of species (and variability in the proportion lost) generates volatility in sampled evenness (this example is an extreme case, see figure S3b). (d), Sensitivity to variability in RAD shape and in the proportion of species lost. Values are median goodness-of-fit (R 2 ) of 50 model runs, comparing SCOR to true abundance, and richness estimates to true richness. Figure S6. Schematic of all possible connection models between SCOR and DOT in linear SDE analysis. SCOR is modeled as a single-layer OUP, and DOT as a three-layer model with a WP as the lowermost layer (DOT3). Note that SCOR cannot drive DOT3 because a WP does not accept incoming causal links.

Supplementary Figures
Percentage values represent posterior model probabilities. Solid arrows represent casual connections pointing from driver to response (relative strength of influence not indicated here). Dotted lines represent correlative relationships. See Supplementary Text for details. Figure S7. Testing for systematic offset between age models. A subset of 25 sites used in the NSB database planktonic foraminifera SCOR calculation were also used in the DOT reconstruction by Cramer et al. (ref. 23). To facilitate the comparison of the two age models at each site, we first discretized the overlapping sediment depth interval into depth bins, then calculated the median age for the samples in the depth bin relative to the total age range midpoint for each bin. If the age models were systematically offset, then the locations of these two binned age distributions would be shifted relative to each other. The paired differences between these two distributions give the offsets between the age models of the two databases. Although there are substantial differences between the age models, the age distributions are quite similar and symmetric, and the median offset is only 0.007 Ma. To test whether or not the two age distributions are significantly different, we performed a non-parametric Wilcoxon signed-rank test for paired differences. The null hypothesis of this test is that the difference between the pairs (i.e. the offset between the age models) follows a symmetric distribution around zero. With a test p-value of 0.41, we cannot reject the null hypothesis, suggesting that the age models are not systematically offset. Figure S8. The effect of variable global coverage of sampled sites on SCOR. Like in all the other Poseidon simulations, species richness and total abundance vary independently (e.g. figure 1a). However, here we use only 500 spatial grid cells, which are assigned unique latitude-longitude coordinates to mimic the real NSB data. Hence, the total numbers of species and individuals have also been scaled down (e.g. true richness here fluctuates between 100 and 200 species), in order to limit the number of time steps in which a species is present at all sampled sites. We use 195 time steps to match the Cenozoic NSB data at 1/3 Myr resolution. Instead of randomly subsampling cells, we here sample according to the latitude-longitude coordinates of the real sampled sites in the NSB planktonic foraminifera data through the Cenozoic. Because the number and global coverage of sites decrease notably with age, SCOR shows much greater short-term volatility in the older part of the record (blue, encompassing 300 model iterations). This increased volatility is also reflected in the expanding confidence bounds on the real NSB planktonic foraminifera SCOR (figure 2b). Despite this reduced precision, however, SCOR is not systematically biased with respect to the long-term pattern of relative changes in true global abundance (red). Time series are normalized to mean zero and unit standard deviation. Tables   Table S1. Taxonomic re-mapping within the most common NSB species.

Supplementary
NSB species Re-mapped species