## Abstract

Species richness is an essential biodiversity variable indicative of ecosystem states and rates of invasion, speciation and extinction both contemporarily and in fossil records. However, limited sampling effort and spatial aggregation of organisms mean that biodiversity surveys rarely observe every species in the survey area. Here we present a non-parametric, asymptotic and bias-minimized richness estimator, *Ω* by modelling how spatial abundance characteristics affect observation of species richness. Improved asymptotic estimators are critical when both absolute richness and difference detection are important. We conduct simulation tests and applied *Ω* to a tree census and a seaweed survey. *Ω* consistently outperforms other estimators in balancing bias, precision and difference detection accuracy. However, small difference detection is poor with any asymptotic estimator. An R-package, *Richness*, performs the proposed richness estimations along with other asymptotic estimators and bootstrapped precisions. Our results explain how natural and observer-induced variations affect species observation, how these factors can be used to correct observed richness using the estimator *Ω* on a variety of data, and why further improvements are critical for biodiversity assessments.

This article is part of the theme issue ‘Detecting and attributing the causes of biodiversity change: needs, gaps and solutions’.

### 1. Introduction

A central objective of biodiversity science is to understand how many species there are and how the number and kinds of species change across space and time [1]. Not only is species richness important for today's ecosystem and human health assessments [2–6], it is essential for inferring palaeoecological patterns and mass extinction events [7]. Estimating missing species or classes from limited observations is a fundamental problem in ecology [8–10], genomics [11], cryptography, machine learning and linguistics [12]. These examples show richness estimation has broad applications, yet current estimators have limited accuracy.

In biodiversity and conservation sciences, absolute richness estimation and detection of richness differences (spatial or temporal) are each challenging in their own ways [13–15] because of data limitation and statistical problems [16–21]. Observed species richness and detection of differences across communities or time are prone to data bias [22] and undocumented errors resulting from varying survey and sampling effort, design and observer skill [14]. These problems are more acute for species richness than for other biodiversity metrics [23], in part because richness weighs all species equally regardless of rarity. These data and statistical problems, along with other theoretical reasons, have led researchers to consider other biodiversity metrics focusing on abundance [23], community turnover [24], evenness-weighted diversity (Roswell *et al*. [25]) and interaction network (Chiu *et al*. [26]). However, richness remains a primary metric that is accessible to scientists and the public, is the main determinant of ecosystem functioning and stability in ecological theory [1], and is a measure of speciation and extinction [17]. Furthermore, knowing the number of missing species is critical to assessing how much sampling effort is required to accurately measure other biodiversity metrics (such as species interactions). Data-intensive methods such as multi-species occupancy models can estimate richness of targeted species groups, but they require repeated field surveys and are sensitive to models of abiotic and biotic drivers [27,28]. For many general applications and meta-analyses, it is desirable to have an asymptotic (finite projection at infinite sampling effort) richness estimator that can both correct bias (uncover true absolute richness) and accurately detect spatial and temporal differences across communities.

Commonly, biodiversity observations include individual counts identified to the species level. One obvious way to improve biodiversity observation is to increase sampling effort [29], but this is costly and not possible for some taxa, habitats and historical samples. Insufficient sampling effort can increase bias, reduce estimation precision (repeatability) and undermine efforts to detect change over time, but a robust, unbiased estimator can recover missing observations to some extent. The most robust asymptotic richness estimator that uses abundance data remains the Chao1 method [8,30]. Chao1 and related estimators are based on Turing's sampling theory that relates the number of unobserved classes (codes or species) to observed rarity [30,31]. These methods are expected to be inaccurate with spatial heterogeneity and heterogeneous observation probabilities [32]. Other estimators rely on tuning parameters, but these methods can be unstable and are still under development [33]. There is a need to develop new statistical methods that fully use information in common biodiversity survey design datasets, without having to separately estimate additional quantities that may be harder to obtain than richness itself [27].

Here we outline four technical problems that an ideal asymptotic estimator should solve. The first two are human-induced biases to species observation: the fraction of individuals observed and the number of patches (sites, transects or quadrats) sampled are expected to influence observed richness. The next two are natural properties of biological distributions that can also influence observation: abundance and patch occupancy (with effects foreshadowed in figure 1). These variables will be precisely defined, and we use them to mechanistically derive observation probabilities and a set of what we will call *Ω* bias-corrected asymptotic richness estimators. This approach departs from both Turing's reliance on rarity, which is prone to noisy observations, and from earlier parametric extrapolations that have weak justifications [30]. Using the derivations, we highlight how means and variances of the variables affect richness observations. Next, we evaluate the performance of three proposed *Ω*-estimators relative to six other established asymptotic estimators (Chao1, Gamma-Poisson, Chao2, abundance-based coverage estimator (ACE), Jackknife1-abundance, Jackknife2-incidence [8,30,34–37]) using simulations designed to challenge each estimator. We use six performance measures to evaluate bias (under- or over-estimation tendency), precision (repeatability of estimates) and accuracy (error magnitude and spatial/temporal trend detection). We additionally devise a bootstrapping procedure that accounts for the spatial and species dependence to quantify precision from a single survey. To test our method with empirical applications, we use two multi-year, multi-patch datasets: the Barro Colorado Island tree census [38], which probably reflected true richness and a negative temporal trend, and the British Columbia intertidal seaweed survey [39], which probably was incomplete with an unclear temporal trend. Even when true richness is unknown, spatial subsampling and local downsampling experiments on the real datasets allow us to explore estimators' consistency across real conditions. Finally, we provide Matlab codes [40] and an R-package*, Richness*, which generate a suite of asymptotic richness estimates (https://github.com/EWTekwa/Richness).

### 2. Methods

The general reader aiming to understand the performance and application of the proposed estimator may skip the following *Richness estimator derivation* and *Simulation set-up* subsections on first reading, and instead focus on evaluation and application sections that follow. The technical subsections will be of greatest interest to statisticians, but would also aid the general reader in understanding why our proposal works. The data and code used in this paper are available on a Zenodo repository at [40]. The R-package *Richness* is available at https://github.com/EWTekwa/Richness, which contains the main function *EstimateRichness* that takes in spatial abundance data and returns point estimates and bootstraps (see next section) for observed richness, Chao1, Gamma-Poisson, Chao2, ACE, Jackknife-abundance, Jackknife-incidence, *Ω*, *Ω** _{T}* and

*Ω*

_{0}.

#### (a) Richness estimator derivation

Here we define variables that can be measured from a spatially partitioned abundance dataset, with the aim of arriving at a bias-corrected richness estimator. Our strategy involves the following steps (figure 1). We define ‘patch’ *x* as the spatial unit where individuals can be counted, and a community consists of a collection of *N* patches (including those sampled and unsampled). We model how both spatial heterogeneities across patches and observation errors within patches affect the probability that a species is observed at least once in the community. We then use spatial abundance statistics of observed species to estimate true richness, which is the observed richness divided by some estimated observation probability.

Each species *i* has a true mean density *n _{i}*, which is the total number of individuals in the community divided by the true number of patches

*N*, or equivalently the individuals per patch

*x*across all true patches (

*n*=

_{i}*E*[

_{x}*n*]). The fraction of individuals observed at

_{x}*x*,

*m*, is also understood as the individual catchability of species

_{ix}*i*[35]. Thus,

*m*, simplified as (

_{ix}n_{ix}*mn*)

*, is the local ‘observable abundance’. The set of species*

_{ix}*i*'s observable abundances averaged across space, (

*mn*)

*, will serve as the first (non-spatial) random state variable in our subsequently developed richness estimator.*

_{i}In contrast with the above community scale, let us descend to the local patch scale where observation occurs as a function of within-patch density (*n _{ix}*). Specifically, local observation probability (

*D*

_{x}_{|}

_{nix >}_{0}) is a function of

*n*within occupied patches (

_{ix}*n*

_{ix}_{|}

_{nix >}_{0}). The mean within-patch density of occupied patches

*n*

_{i}_{|}

_{nix >}_{0}, is ${\sum}_{x|{n}_{ix}\u27e90}^{N{P}_{i}\hspace{0.17em}}{n}_{ix}/(N{P}_{i})$ where

*p*is occupancy, or the fraction of patches where species

_{i}*i*is present. On the other hand, mean density

*n*is ${\sum}_{x}^{N}{n}_{ix}/N$. Since in unocccupied patches

_{i}*n*= 0,

_{ix}*n*can be written as ${\sum}_{x|{n}_{ix}\u27e90}^{N{P}_{i}}{n}_{ix}/N$. Thus,

_{i}*n*

_{i}_{|}

_{ni>}_{0}

*= n*/

_{i}*p*. When a fraction

_{i}*m*of individuals are observed, on average there are (

*mn*)

*/*

_{i}*p*observable individuals in occupied patches. Using this mean, the local observation process can be described by a non-spatial Poisson sampling process (sampling with replacement). This local Poisson sampling process should not be confused with a Poisson spatial distribution across patches, which was not assumed. Sampling with replacement is likely in non-destructive field surveys when individuals move and their identities are not tracked; on the other hand, sampling without replacement approaches Poisson sampling when

_{i}*m*is small. When

*m*approaches 1, observation probability for sampling without replacement will be underestimated by Poisson (overestimates richness as will be seen by inspecting equation (2.4)), but correction hinges on knowing

*m*[41], which we assume to be unavailable. We write the local observation probability within occupied patches as 1 minus the rate of failure to observe:

Now we zoom back out to the community level and consider spatial processes across multiple patches that may be occupied or not. The observation probability of species *i* per random patch *x* is the occupancy times the conditional local observation probability (equation (2.1)) when the patch is occupied:

The observation probability of species *i* within the community is 1 minus the cumulative probability of failure to observe, the latter being the probability of failure to observe within patch raised to the power *k _{i}* (our fourth state variable) if patches are randomly picked with replacement. The result does not differ from picking without replacement when

*k*is much smaller than the true number of patches. Otherwise, sampling with replacement reduces the chance of observing the same unoccupied patch, so in this situation assuming sampling without replacement underestimates observation probability (overestimates richness). The sampling with replacement observation probability across the community (

_{i}*D*) is

_{i}If patches are not randomly picked, then *k _{i}* should be the effective sample size. As an extreme example, when only one patch is repeatedly sampled at the same time in exactly the same way, effectively

*k*= 1. In practice however, even sampling the same patch repeatedly would generate different individual and species counts when catchability

_{i}*m*is below 1 and/or individuals move between patches during the finite time separating samples. Ignoring the non-independence of spatial samples would inflate

*D*(underestimates richness), but defining what is meant by effective sample size will require nuanced considerations of how the spatial sampling design relates to the community's spatial distribution. We will assume

_{i}*k*samples are independent and revisit sample non-independence in the Discussion.

_{i}The observed richness *S* can be written without additional assumptions as the true richness *Ω* multiplied by the expectation of *D _{i}* across truly present species

*i*as a function of,

**which includes**

*Φ**R*state variables

*ϕ*. In our observation process model,

_{r}*R*= 3 and

*ϕ*to

_{1}*ϕ*are (

_{3}*mn*)

*,*

_{i}*P*and

_{i}*k*:

_{i}By rearranging equation (2.4), we define *Ω* (equation (2.5)) as our proposed ‘exact’ bias-corrected richness estimator using the average observation probabilities based on state variables $\hat{\phantom{\rule{-1.8pt}{0ex}}\mathbf{\Phi}}$ measured from the set of observed (rather than truly present) species (see figure 1 for numerical example). All quantities with a $\hat{\mathrm{hat}}$ denote observed rather than true values. The estimator *Ω* contains survivorship bias [42]—a type of sampling bias stemming from the difference between $\hat{\phantom{\rule{-1.8pt}{0ex}}\mathbf{\Phi}}$ and ** Φ**, because $\hat{\phantom{\rule{-1.8pt}{0ex}}\mathbf{\Phi}}$ is only measurable for observed species. For intuition, observed species (those that ‘survived’ sampling) will probably have higher abundances than the average of truly present species because rare species tend to be missed, so the mean observed abundance is higher than the true mean abundance of all species:

An alternative is to write ${E}_{i}[{D}_{i}(\mathbf{\Phi})]$ in equation (2.4) in terms of a second-order Taylor expansion (with *r* and *q* being variable indices), which provides theoretical insights into how means and variances of different statistics affect the observation process. The Taylor estimator *Ω _{T}* is

*c*,

_{1}*c*and

_{2}*c*.

_{3}*k*is assumed to have no variance since spatial surveys commonly aim to discover all possible species at the same time, hence there is not a second-order correction term for

*k*(but can be easily included otherwise). Higher moments are increasingly harder to estimate from a limited number of observed species and may destabilize the estimator. We chose to truncate

*Ω*at the second statistical moments, but we can even ignore the second moments. The resulting estimator,

_{T}*Ω*

_{0}, keeps only the first term of the denominator in equation (2.6):

The estimators *Ω*, *Ω _{T}* and

*Ω*

_{0}have idealized counterparts that eliminate the survivorship biases in the observed means, variances and covariances of

*mn*and

*P*. This is achieved by taking out $\hat{\mathrm{hat}}$ in equations (2.5)–(2.7); let us call the resulting corrected estimators

*Ω*,

_{C}*Ω*and

_{TC}*Ω*

_{0}

*. These corrected estimators can only be used in simulations where the state variables are known. Note we kept*

_{C}*S*as the observed richness, which clearly contains survivorship bias, but the biases in the correction terms were eliminated. Thus, the performance of these corrected estimators represents how much

*Ω*,

*Ω*and

_{T}*Ω*

_{0}can improve in the future.

#### (b) Quantifying precision by bootstrapping

From a single survey, we produce approximations for the standard deviation (s.d.) interval and the coefficients of variation (CV = variance/mean) of each estimator from bootstraps. These quantify the precision, uncertainty or repeatability of estimates (small s.d. and CV mean high precision). We used non-parametric bootstrapping [43] because the distribution of possible estimates for a single community is *a priori* unknown.

A key consideration in our implementation of bootstrapping is that individual occurrences are not independent both within species and patches because of the intraspecific tendency to cluster [44,45]. By contrast, common bootstrapping procedures for richness estimates assumed that sampled individuals or sample sites are independent (as did the estimators themselves) [8], an assumption that is also common in rarefaction and extrapolation techniques that control for sampling effort or coverage when comparing communities [14] but can lead to erroneous uncertainty estimates. To account for spatial and species dependencies of sampled individuals, we follow a simple block bootstrapping procedure [46]. For each bootstrap, we first randomly sampled with replacement *k* transects (electronic supplementary material, figure S4). Then, from this spatially randomized set, we randomly sampled with replacement as many species as the point estimator predicts from the original dataset (e.g. *Ω* or Chao1). All species were drawn from the observed set of *S* species, but the bootstrap pseudocommunity now consists of a potentially larger number of pseudospecies (see [8]). We used this bootstrap sample size instead of the observed *S*, because we know *S* underestimates the true community richness and bootstrapping is intended to recreate sampling variations from that community. s.d. and CV were computed from the bootstrap estimates. These measures allow us to measure an estimator's precision independently from bias.

#### (c) Simulation set-up

To test and compare the performance of asymptotic richness estimators, we simulated spatially partitioned communities under four scenarios that present different challenges that arise from imperfect local observation and spatial heterogeneities. For each simulation scenario, we created 2000 unique communities. Each community has 100 patches so that it is possible to sample a small fraction, and 1 to 100 species are present, which represents a common range for many focal taxa (such as what was observed in the seaweed dataset used later). Parameter values vary between communities within a scenario but converge on scenario mean values in abundance *E*[*n*] and occupancy *E*[*P*], which simulate natural variations in spatial biodiversity patterns that we may want to discern. Further, each community was sampled at different efforts, converging on mean spatial sampling effort (patches sampled *E*[*k*]) and fraction of individuals observed *E*[*m*]. Thus, each of the 2000 communities represent a unique set of biological and sampling conditions.

Specifically, each simulated community's mean fraction of individuals observed $\stackrel{~}{m}$ was drawn from *E*[*m*] + (*U*(0,1)−0.5)*E*[*m*](1−*E*[*m*]) where *U* is a uniform random value. Each species in a community in turn had a *m _{i}* value drawn from the same formula above but with E[

*m*] replaced by $\stackrel{~}{m}$. The community mean occupancy $\stackrel{~}{P}$ was drawn from

*E*[

*P*] + (

*U*(0,1)−0.5)

*E*[

*P*](1−

*E*[

*P*]), and each species has a

*p*value drawn from the same formula but with E[

_{i}*P*] replaced by $\stackrel{~}{P}$. Each community's mean abundance $\stackrel{~}{n}$ was drawn from 2E[

*n*]

*U*(0,1) and species

*n*was obtained from the following steps. First, a number $\underset{i}{\overset{\prime}{n}}$ was drawn from a lognormal distribution [47] with a mean of ln($\stackrel{~}{n}$)−0.5 an s.d. of 1, all divided by $\stackrel{~}{P}$. Then, all species with $\underset{i}{\overset{\prime}{n}}$ less than 1 was assigned the value of 1, and lastly ${n}_{i}=\underset{i}{\overset{\prime}{n}}\left(\sum {{n}^{\prime}}_{i}+{\sum}_{{{n}^{\prime}}_{i}<1}{{n}^{\prime}}_{i}-{\sum}_{{{n}^{\prime}}_{i}<1}1\right)/\sum \underset{i}{\overset{\prime}{n}}$ to ensure both the desired $\stackrel{~}{n}$ and $\stackrel{~}{P}$ are achieved by the following individual assignments (electronic supplementary material, figure S1

_{i}*e*). We assigned individuals to each patch by drawing from a Poisson distribution with the mean being

*n*multiplied by 1 if drawn from

_{i}*U*(0,1)<

*P*and 0 otherwise. Finally, subsampling biodiversity surveys were simulated by randomly picking

_{i}*k*∊ patches in each community (without replacement). Within each patch and for each species, individuals were observed by drawing from a Poisson distribution (sample with replacement) with the mean being (

*mn*)

*.*

_{i}#### (d) Performance evaluation

We desire asymptotic richness estimators to be precise (repeatable), unbiased and accurate, but these components may exhibit trade-offs given limited information [48]. Thus, the performance of an estimator cannot be summarized by a single measure; we present six measures to assess whether certain estimators rank higher in one aspect and lower in others.

A least-squares linear regression corrected for heteroscedasticity [49] (*hac* function in Matlab R2020a) was performed on the estimated richness versus true richness within each estimator to address richness-dependent bias and precision. Measure no. 1 is the regression slope. A slope less than 1 indicates that more diverse communities are more severely underestimated than less diverse communities, which is a prevalent bias among richness estimators (sometimes expressed as 1 minus slope [48]). We report the s.d. in the regression slope (measure no. 2) as a measure of the estimator's precision around a bias. A low s.d. indicates the estimator is precise (consistent) across communities and true richness. In addition, *R ^{2*}* was computed for the 1 : 1 line, which describes the deviation from true richness (measure no. 3). A small

*R*indicates either underestimates or overestimates, with large errors capable of generating negative values (worse than simple average).

^{2*}*R*is an overall measure of the point estimate's accuracy.

^{2*}*R*contrasts from the

^{2*}*R*of the true : estimate linear regression, the latter having been used to measure precision [48] but we believe the measure is confounded with bias. Even if an estimator always returns the same richness value for a given true richness (high precision), bias would reduce

^{2}*R*below 1, thus we chose to use slope and s.d. in slope as our measures of bias and precision. Richness difference detections (correct sign of difference) were checked between 2000 randomly picked community pairs that were different in richness by at most 2, 10 or 20 but were not identical (measure nos. 4–6). Detection accuracy is essential to infer spatial or temporal trends, which are major objectives of biodiversity studies. Detection accuracies above 0.5 is better than a coin-flip, since the random outcomes are whether community A has a higher or lower richness than community B, with equal probabilities of each being true across the 2000 comparisons.

^{2}For each of the six performance measures above, we computed an estimator's performance deviation from the mean across 13 estimators, then divided by the s.d. to obtain a score. Performances for measure nos. 4–6 are just as defined above. For measure no. 1, performance was 1-|slope-1|; for measure no. 2, performance was 1−*R ^{2*}*, ensuring that all large performance scores are better. The scores were averaged across the six metrics, with 1 and −1 corresponding to performances that are one s.d. better or worse than the average estimator.

To visualize estimations, we computed the mean and s.d. in estimates that occur in a rolling window of 10 centred around each true richness. The s.d. bounds are plotted around the means. We then compared these bounds with the same rolling window averages of within-community bootstrapped s.d. estimates across true richness. The s.d. bounds estimated from 50 bootstraps of a single community should ideally estimate the s.d. across multiple communities with the same true richness. Thus, when the two sets of bounds agree, the bootstrapping procedure is supported. These bounds are also linked to the slope s.d. introduced above as measures of precision: imprecise estimates should lead to a high slope s.d. independent of bias.

We observed from simulations that *Ω** _{T}* and

*Ω*

*'s estimated observation probabilities can occasionally be very small or negative because*

_{TC}*mn*variance in particular can be numerically unstable (overestimated) from limited sample size (number of observed species). We set a threshold of 10% estimated observation probability below which

*Ω*

*and*

_{T}*Ω*

*switched to*

_{TC}*Ω*

_{0}, which cannot be negative and is numerically stable.

#### (e) Empirical datasets

The first dataset that we will use to test estimators is a high-quality Barro Colorado Island tree census from Panama [38], which exhibits a negative temporal trend that is probably true [50]. Starting in 1982, every stem was identified by species every 5 years, for a total of eight census. Over 300 species were observed across 1250 quadrats fully covering a rectangle over 50 hectares.

The second dataset is an extensive seaweed diversity survey collected annually from 2012 to 2019 on the coast of British Columbia, Canada. The dataset features relatively high standards in sampling effort and taxonomic identification for the marine realm [39] but is not a census. The survey tracked seaweed species and their abundance (estimated as per cent cover). Nine transects were established. Each year, 10 quadrats were randomly selected on each transect and surveyed for per cent cover. Between 73 and 82 species were observed each year. Transects probably represent independent spatial samples of an underlying community, whereas quadrats were less spatially independent, so we aggregated quadrat data within transects for this analysis. A 10 × 10 grid was overlaid on the 0.5 m × 0.5 m quadrat, and any organism filling each square of this grid added 1% to the species’ total cover within the quadrat [51]. A species that was partially present in only one square of the grid was assigned a per cent cover of 0.5 (or trace). Since 0.5 is the lowest recorded cover and is akin to one individual count being the minimum, we divided all per cent cover data by 0.5 to obtain abundance.

Abundance within the context of sampling has no real biological meaning beyond being discrete; the abundance unit does not have to coincide with the individual organism, a concept that we note is open to biological debates and subject to evolution [52,53], but has no consequences for richness estimation. The only important feature of count data is that the observable abundance *mn*, not the biological *n*, approximates a Poisson process; *p* is not affected by a change of unit and is thus insensitive to the individual definition. The other consideration for converting per cent cover to observed local abundance (*mn*) is that it is capped and thus not quite Poisson, but in practice recording abundances beyond this cap will probably not affect the estimated observation probability (e.g. failure to observe, exp(-*mn/p*), is 4 × 10^{−44} for *mn/p* = 100 in equation (2.1)).

In addition to obtaining point estimates of richness on both full datasets, we subsampled transects and quadrats in different combinations and produced replicated subsample estimates. For each subsampling replicate for the Barro Colorado Island tree census, we randomly picked a subset of quadrats (patches) without replacement and sampled these same quadrats every year. For each subsampling replicate for the British Columbia seaweed survey, we randomly picked transects and quadrats (without replacement) every year. Quadrat counts were summed to obtain transect data, so transects were treated as patches here. Each spatial effort level also received a complementary local downsampling experiment in which only 10% (*m* = 0.1) of individuals in each selected transect was observed. We call this locally imperfect observation experiment ‘downsampling’ as opposed to ‘subsampling’ to distinguish it from spatial subsampling. For each experiment, we replicated the subsampling 40 times since different transects and quadrats could be randomly picked. An ideal estimator would predict the same true richness regardless of spatial subsampling and local downsampling.

The results were presented through five metrics. First, the point estimates for Chao1, Chao2, *Ω* and observed richness were plotted against subsample experiments in two groups, which are the experiments without and with local downsampling. Second, the CV were plotted to measure precision (see *Quantifying precision by bootstrapping* section). CV was obtained from either 2000 bootstraps of the full dataset, or 50 bootstraps in each of 40 subsampled and downsampled experiments that were then averaged across experiments. Third, we plotted the subsampled and downsampled richness estimates as the portion of richness estimates obtained from the full datasets (% of full estimates), which create a multi-dimensional rarefaction (with 100% indicating asymptote is reached). These relative measures show how close are the estimators to asymptotes with lowered efforts. Fourth, we performed linear regressions of point estimates on year or census (time) and plotted the slopes as the detected temporal trends. Fifth, we plotted the *p*-value of the slope being different from the null of no trends, which should approach zero if a true trend exists (likely for Barro Colorado Island trees) and larger if no trends exist (likely for British Columbia seaweeds).

For each full dataset, we plotted the s.d. from bootstraps in each year. For subsampling experiments, we plotted the s.d. across subsamples in each year. Both precision measures represent observation variations expected from repeatedly sampling the true community.

### 3. Results

#### (a) Theoretical insights

Here we describe how sampled patches (*k*) and means and variances in observable abundance (*mn*) and occupancy (*p*) affect species observation and induce bias to the observed richness. First, we found that the effects of means of these variables (across species) were captured by *D*_{i} in the richness estimator *Ω*_{0} (equation (2.7)), which is the single-species observation probability (equation (2.3)) computed at variables’ mean values (as if all species in the community share the same characteristics). Equations (2.2) and (2.3) show that observation probability increases with means of *k, n*, *m*, and *p*, results that can also be visualized in figure 1.

Across species variance effects (of *mn* and *p*) depend on the signs of the second-order partial derivatives according to the Taylor expansion in the estimator *Ω** _{T}* (equation (2.6)). The analytical expressions are long but always negative (see numerical outcomes in figure 2

*c,d*). Therefore, variances always decrease the overall observation probability. The effect of the covariance between

*mn*and

*p*can be either positive or negative. Precise numerical effects are outputs of the function

*estimateRichness*in the R-package

*Richness*when applied to user's spatial abundance data.

These results clarify two main ways in which non-richness biodiversity components induce bias to observed richness. Occupancy is related to spatial turnover and dissimilarity, yet its mean value affects species observation (equation (2.3)). Variance in observable abundance, which is probably inversely proportional to community evenness (assuming *m* is not anti-correlated with *n*), reduces observed richness even though true richness should not be affected by evenness. These effects illustrate the need for biodiversity metrics to be bias-corrected if they are to represent their definitions.

#### (b) Simulation tests

We tested the proposed estimators (*Ω*: equation (2.5), *Ω** _{T}*: equation (2.6),

*Ω*

_{0}: equation (2.7)) on a simulated dataset of species abundance. We compared these to other leading estimators including: the unbiased Chao1 [8,30], the Gamma-Poisson correction of Chao1 (GP) [34], Chao2 [35], ACE [36], Jackknife for abundance data (JK

_{a}) and Jackknife for incidence data (JK

_{i}) [37]. We also tested the idealized versions

*Ω*

*,*

_{C}*Ω*

*and*

_{TC}*Ω*

_{0C}that are only operational in simulations where the true spatial statistics are known for all species to establish theoretical limits of

*Ω*-type estimators. For incidence-based estimators including Chao2 and Jackknife2, abundance within patches were converted to presence/absence. The six performance statistics in figure 3 are described in the Methods:

*Performance evaluation*section. The electronic supplementary material, figure S3

*a,b*correspond to scenarios I and II in figure 3, and figure 2

*a,b*correspond to scenarios III and IV. These figures show s.d. bounds for observed richness, Chao1 and

*Ω*across communities sharing similar true richness. The electronic supplementary material, figure S2 shows that s.d. bounds from bootstrapping within each community are similar to the across-community s.d. bounds across scenarios, which supports the bootstrapping procedure for quantifying precision.

The simulations show that our proposed *Ω* and *Ω*_{0} estimators were the only non-idealized estimators that performed above average across all imperfect observation and spatial heterogeneity scenarios according to the overall score that summarizes bias (true : estimated richness slope), precision (slope s.d.), accuracy (*R*^{2*}) and difference detection accuracy (detection; figure 3). *Ω* performed better than *Ω*_{0} in all but scenario I where true occupancy was one for all species. *Ω*'s bias was lower than other abundance-based estimators but higher than incidence-based estimators in some scenarios, although these incidence-based estimators also had the lowest precision accompanied by low overall accuracy and/or low detection ability (figure 3). All estimators performed better than no corrections (observed richness) except for Jackknife estimators in some scenarios (figure 3*a*,*b* overall scores). *Ω* and *Ω*_{0} had higher precision when compared to observed richness except under the poorest conditions (figure 3*d*). Under the poorest scenario (figure 2*b*), only *Ω*-type estimators were better than coin flips in detecting richness differences that were less than two (figure 3*d*), although these small differences were challenging for any estimator (even idealized ones) to detect given the small spatial sampling effort (*k*).

We found that *Ω** _{T}* and the idealized

*Ω*

*outperformed existing estimators when spatial heterogeneity was present (figure 3*

_{TC}*b*–

*d*). However, they were less stable than

*Ω*and

*Ω*

_{0}, particularly in terms of accuracy and performance variation across scenarios, indicating that the variances and covariances in the Taylor expansion are numerically difficult to estimate. The proposed

*Ω*and

*Ω*

_{0}gained 2–7% in detection accuracy compared to using observed richness across scenarios. The idealized version of

*Ω*(

*Ω*

*, only operational in simulations) outperformed all others by all measures and scenarios, further improving difference detection accuracy (up to 17% higher than observed) while eliminating bias, increasing precision, and achieving 90% overall accuracy even under the poorest conditions (figure 3).*

_{C}The room for improvement to *Ω*-type estimators comes from survivorship bias in observed *mn* and *p*. This bias can be seen in the differences between the *Ω** _{T}* and

*Ω*

*correction factors in the imperfect detection versus spatial heterogeneity scenarios (figure 3*

_{TC}*a*,

*b*; electronic supplementary material, figure S3

*c,d*), which are also relevant for

*Ω*because the Taylor expansion version is an approximation of

*Ω*. Under imperfect local observation, observed mean

*mn*was biased upwards and variance downwards because rare species were missed (electronic supplementary material, figure S1

*a*). The upward bias in mean

*mn*overestimated observation probability while downward bias in variance had the opposite effect according to the effects derived in the

*Theoretical insights*section. Mean observed

*p*was biased downwards and variance was biased upwards (electronic supplementary material, figure S1

*b*). The downward bias from mean

*p*and upward bias from variance in

*p*led to underestimates in observation probability (upward bias, electronic supplementary material, figure S3

*a,c*). For the scenario with spatial heterogeneity, the biases in variance of

*mn*and mean of

*p*were the opposite of those caused by imperfect local observation (electronic supplementary material, figures S1

*d*and S3

*b,d*). The covariance correction factor was relatively small and similar in

*Ω*

*and*

_{T}*Ω*

*(figure 2*

_{TC}*c,d*). The opposing effects of biases across variables and sources partly cancelled each other, reducing net bias in the

*Ω*and

*Ω*

*richness estimates to various degrees (figure 2). In comparison, other estimators depended only on either abundance or spatial occupancy biases and therefore cannot cancel each other [8]. When the variables (inaccessible in real datasets) were unbiased, overall bias was reduced or eliminated (*

_{T}*Ω*

*in figure 3). Consistent directions in the survivorship biases associated with some biotic and abiotic sources suggest that corrections are possible to some extent; we identify these as future research objectives. The above insights also apply to*

_{C}*Ω*

_{0}but only through the opposing biases in means of

*mn*and

*p*under imperfect observation, and their aligning biases under spatial heterogeneity. The consequence is that when little (biological and observational) spatial heterogeneity is present,

*Ω*

_{0}outperforms

*Ω*(figure 3

*a*); otherwise

*Ω*is superior. We will focus on applying

*Ω*to datasets.

#### (c) Barro Colorado island tree census

We first computed the observed richness (all species recorded), Chao1, Chao2 and the proposed *Ω* point estimates as well as bootstrapped s.d. bounds for each year using the full census data (electronic supplementary material, figure S5*a*). Observed spatial statistics are shown in the electronic supplementary material, figure S6. All estimators predict richness close to the observed richness (figure 4*a*), suggesting the census was accurate. All estimators showed similar bootstrapped precision as measured by the CV (figure 4*b*). Based on the observed data, a negative trend of four species lost per census (per 5 years) was observed (figure 4*d*; average *p* = 2 × 10^{−5} from bootstrap least-squares linear regression). This trend is probably reliable owing to the high-quality census. All estimators agreed with a negative trend with low *p*-values.

Next we examined how the estimators perform when only 12 or 3 quadrats out of 12 50 were subsampled through the census years. Across 40 replications for each of the spatial subsampling experiments, the observed richness and *Ω* estimates at 12 sampled quadrats continued to indicate strong evidence for a negative trend (figure 4*d,e*), while Chao1 and Chao2 showed less confidence in such a trend (figure 4*e*). When spatial subsampling was coupled with local downsampling of individuals observed, precision decreased much more for Chao1 and Chao2 compared to *Ω* (figure 4*b*), particularly for Chao2 (electronic supplementary material, figure S5*f*). Low precisions probably contributed to the decreased ability to detect a temporal trend (if the trend from the full dataset is true). *Ω*'s correction was probably suppressed by the non-independence of quadrat samples (effective *k _{i}* in equation (2.3) is probably smaller than the number of quadrats sampled). In the downsampling experiments where

*m =*10% of individuals were recorded, observed richness continued to detect a temporal trend (figure 4

*d,e*), and

*Ω*came much closer than Chao1 and Chao2 to detecting the same trend at 12 and 3 sampled quadrats (

*p*= 0.06, 0.07 in figure 4

*e*).

Overall, *Ω* exhibited similar downward bias compared to other estimators (including observed richness) under all subsampling and downsampling, but was less biased than observed richness (figure 4*a,b*). The results are consistent with the simulation analyses, which revealed *Ω* is most likely to detect richness differences, and Chao2 should be the least precise (figure 3).

#### (d) British Columbia seaweed survey

In this marine seaweed diversity dataset, all estimates based on the full dataset indicated that richness is higher than observed (figure 4*f*; electronic supplementary material, figure S7). *Ω*_{,} Chao1 and Chao2 indicated that about 4, 12 or 17 species remained unobserved, respectively. Chao2 estimates were particularly uneven and imprecise across years (electronic supplementary material, figure S7; figure 4*g*). Observed spatial abundance statistics are shown in the electronic supplementary material, figure S8. The observed richness across years indicated a weak signal of temporal increase but was inconclusive (*p* = 0.17), while *Ω* indicated slightly more support for a positive trend (*p =* 0.13*;* figure 4*i,j*). Chao1 and Chao2 also indicated positive temporal trends but with much higher *p*-values. Given these statistics, the true temporal trend was probably weak but possibly positive (figure 4*i*). When the full spatial sample was locally downsampled (*m =* 0.1), Chao2 was closest to the observed richness followed by *Ω* (figure 4*f*), and all estimators and observed richness indicated no temporal trends (figure 4*j*).

Under the spatial subsampling experiment of nine transects with two quadrats each and either with or without local downsampling, Chao2 was the closest to the observed richness in the full dataset (presumably least biased; figure 4*f*) but the least precise (figure 4*g*). All estimators showed no evidence of temporal trends (figure 4*j*).

Overall, in this dataset *Ω* was the most precise and less biased than Chao1 under subsampling and downsampling relative to the observed richness in the full dataset (figure 4*f,h*), while Chao2 was the least biased but had the lowest precision. These results are consistent with the simulation analyses, with Chao2 and *Ω* being less biased than Chao1, but Chao2 being the least precise (figure 3).

### 4. Discussion

We derived a new set of *Ω* richness estimators (*Ω*: equation (2.5), *Ω** _{T}*: equation (2.6),

*Ω*

_{0}: equation (2.7)) and used simulations and real applications to show they approach true richness and detect richness differences when the data are imperfect, spatially incomplete, and collected in different ways across surveys. Although the new estimators are still affected by sample design and effort [9], they have a novel theoretical basis, perform better in multiple aspects and have good potentials to further improve when compared to previous approaches [31]. Our work also shows that under realistic ecological conditions and commonly achieved sampling effort, no existing estimator can reliably estimate true richness and detect small richness differences across space or time when data are poor or spatially heterogeneous. These results come from considering both ecological and methodological uncertainties to a greater degree than previous works. They suggest that many previously detected biodiversity trends or lack thereof may have come from methodological and data deficiencies, and that investments in statistical research and sample design are critical for biodiversity monitoring [8–10]. Overall we found

*Ω*to be a robust, general-purpose richness estimator.

The proposed *Ω* richness estimators for species richness achieve a superior balance of low bias, precision and accuracy in detecting richness differences compared to abundance-based asymptotic estimators including Chao1 [8], GP [34], ACE [36] and Jackknife1 [30] as shown by simulated communities containing imperfect local observations and spatial heterogeneities. *Ω* accounts for the mean and variance effects of patches sampled (sample size), fractions of individuals observed within each species (catchability), abundance (rarity), clustering (spatial variance) and occupancy, all readily computed from spatial abundance data and without the need to independently assess survey effectiveness. *Ω* harnesses information already present in common surveys, thus it is no more data-hungry than other methods we assessed. The proposed method can be used to study spatial and temporal trends [22] as well as their relationships with other environmental and biological covariates, without the confounding effects of these covariates on richness estimates, such as in some multi-species occupancy models [27,28].

*Ω* was the most accurate estimator across simulated scenarios. There remains a downward bias (increasing underestimation when true richness increases) to all richness estimators including *Ω*, with the exception of the incidence-based Chao2 [35] and Jackknife2 [30], which had the minimal bias when looking across all replicates despite ignoring abundances (they only relied on spatial occupancy). Incidence-based estimators were built to account for spatial and species catchability variations [32,35]. However, these estimators were the least precise and often the least accurate, meaning repeated observations will result in vastly different estimates. If absolute richness is the sole aim, these incidence-based estimators coupled with many independently repeated (data-hungry) surveys are recommended. The trade-off, in addition to imprecision, is the inability to resolve small differences between communities under conditions that mix imperfect local observation with spatial heterogeneity. By contrast, across the different scenarios that simulated datasets represent, *Ω* detected trends correctly more often than without correction (i.e. observed richness) and any other estimators under any spatially heterogeneous conditions while maintaining the lowest bias among abundance-based estimators. Results from empirical applications to the Barro Colorado Island tree census [54] and the British Columbia seaweed survey [39] largely agreed with the above conclusions. Although uncorrected observed richness tends to suggest empirical trends more often than estimators, observed richness does not consider the uncertainty from the observation process and thus may be over-confident in trend detection and direction as simulation results suggested. Finally, an improved bootstrapping procedure that accounts for spatial and species dependence of individual observations in a single survey showed that *Ω* was more precise than other estimators, especially relative to Chao2. High precision and low bias by themselves are not sufficient but necessary features for an accurate estimator. We recommend obtaining a variety of estimators when assessing community diversity and detecting change, with *Ω* being the best abundance-based estimator and overall best at detecting richness differences, *Ω*_{0} being overall best under low spatial heterogeneity conditions, and Chao2 being the least biased but requiring more replicated observations.

*Ω*-type estimators contain survivorship bias [42] because observed species tend to have higher abundance and higher occupancy than unobserved species. If survivorship bias can be corrected (as revealed by the Taylor expansion version *Ω** _{T}*)—and this may very well be possible by studying how spatial and abundance statistics change with subsampling and downsampling experiments—the proposed approach would come very close to being the perfect estimator (see the idealized

*Ω*

*). In addition, non-independence of spatial samples (*

_{C}*k*), such as when sampled patches are close together in space or time but counted as independent samples, adds downward bias to all

*Ω*estimators by inflating inferred observation probabilities. This problem is apparent for the Barro Colorado Island census where subsampling led to insufficient correction by

*Ω*, where

*k*probably overestimated the number of independent spatial samples. The non-independence effect appears to outweigh the upward bias from

*Ω*'s assumption of sampling individuals and patches with replacement when the actual sampling is without replacement—also a unique feature of the census. One way to address non-independence is relying on biological knowledge to demarcate independent patches, as we did for the British Columbia dataset by assuming that transects, not quadrats, are independent, or assuming that sets of quadrats belong to independent habitat types, for example from detailed analyses on Barro Colorado Island [54]. Alternatively, deriving an effective sample size may help correct downward biases from sample non-independence. Finally, insights from occupancy models that estimate observation probability from repeated sampling [29], when available, may be incorporated into

*Ω*estimators in the future to harness sample dependence.

The detection of richness change has previously been addressed using rarefaction and extrapolation techniques that control for sampling effort [13,14], as opposed to uncontrolled asymptotic estimators (including *Ω*). Future research may apply rarefaction and extrapolation techniques to *Ω* and evaluate these performances in detection along with other estimators. Additionally, proper confidence bounds will help quantify detection uncertainty, which remains an outstanding issue for all richness comparison methods [23] even though our bootstrapping captures methodological uncertainties. Regardless, asymptotic estimators have the advantage of producing absolute rather than relative richness, which may be an indispensable biodiversity assessment criterion [4].

Our method is amenable to extensions. For example, spatial incidence data where abundance counts are unavailable can be easily handled. With local observable abundance *mn* fixed to a sufficiently high value for observed species (set to greater than 10 when preparing the data), the observation probability per occupied patch (equation (2.1)) is effectively 1, meaning that the only effect on *Ω* is equating the observation probability per patch *D _{ix}* with occupancy

*p*(equation (2.2)). This simple extension illustrates the potential for the current approach to handle a variety of data qualities including surveys designed for biodiversity assessment, unstructured observational data and mixed data meta-analyses. We leave it for future research to quantify and improve the breadth and performance of

_{i}*Ω*-type richness estimators across different data types.

We have offered an improved asymptotic richness estimator that already performs well in balancing bias, precision and accuracy, supporting *Ω* as today's standard. On the other hand, we identified many more open questions and improvement pathways that reveal the inherent difficulties in estimating richness and detecting change [16–21]. Ultimately, any richness estimator will be sensitive to whether the places and species we choose to observe represent the underlying community [55]. Thus, biological and field-methodological considerations along with estimation methods are indispensable for estimating richness and detecting changes.

### Data accessibility

All code and data are provided on Zenodo [40], and an R-package *Richness* is available on Github at https://github.com/EWTekwa/Richness.

The data are provided in the electronic supplementary material [56].

### Authors' contributions

E.W.T.: conceptualization, formal analysis, methodology, software, validation, visualization, writing—original draft, writing—review and editing; M.A.W.: conceptualization, data curation, methodology, software, writing—review and editing; P.T.M.: conceptualization, data curation, methodology, writing—review and editing; M.I.O.: conceptualization, methodology, writing—review and editing.

All authors gave final approval for publication and agreed to be held accountable for the work performed therein.

### Conflict of interest declaration

We declare we have no competing interests.

### Funding

E.W.T. and M.W. were funded by the Tula Foundation. E.W.T. was additionally funded by a Mitacs Accelerate Internship. P.M. was funded by a NSERC Discovery Grant (grant no. RGPIN 2019-06240).

## Acknowledgement

We thank the Hakai Biodiversity Synthesis group for discussions.

### References

- 1.
Loreau M 2021 Biodiversity as insurance: from concept to measurement and application.**Biol. Rev.**, 2 333-2354. (doi:10.1111/brv.12756) Crossref, Web of Science, Google Scholar**96** - 2.
Mora C, Tittensor DP, Adl S, Simpson AGB, Worm B . 2011 How many species are there on Earth and in the Ocean?**PLoS Biol.**, e1001127. (doi:10.1371/journal.pbio.1001127) Crossref, PubMed, Web of Science, Google Scholar**9** - 3.
Costello MJ, May RM, Stork NE . 2013 Can we name Earth's species before they go extinct?**Science**, 413-416. (doi:10.1126/science.1230318) Crossref, PubMed, Web of Science, Google Scholar**339** - 4.
Pereira HM 2013 Essential biodiversity variables.**Science**, 277-278. (doi:10.1126/science.1229931) Crossref, PubMed, Web of Science, Google Scholar**339** - 5.
Gopalakrishnan V 2018 Gut microbiome modulates response to anti–PD-1 immunotherapy in melanoma patients.**Science**, 97-103. (doi:10.1126/science.aan4236) Crossref, PubMed, Web of Science, Google Scholar**359** - 6.
Dasgupta P, Levin S . 2023 Economic factors underlying biodiversity loss.**Phil. Trans. R. Soc. B**, 20220197. (doi:10.1098/rstb.2022.0197) Link, Web of Science, Google Scholar**378** - 7.
Jackson JBC, Johnson KG . 2001 Measuring past biodiversity.**Science**, 2 401-2404. (doi:10.1126/science.1063789) Crossref, Web of Science, Google Scholar**293** - 8.
Chao A . 1984 Nonparametric estimation of the number of classes in a population.**Scandinavian J. Stat.**, 265-270. Web of Science, Google Scholar**11** - 9.
- 10.
Pielou EC . 1966 The measurement of diversity in different types of biological collections.**J. Theor. Biol.**, 131-144. (doi:10.1016/0022-5193(66)90013-0) Crossref, Web of Science, Google Scholar**13** - 11.
Wang JP . 2010 Estimating species richness by a Poisson-compound gamma model.**Biometrika**, 727-740. (doi:10.1093/biomet/asq026) Crossref, PubMed, Web of Science, Google Scholar**97** - 12.
Orlitsky A, Santhanam NP, Zhang J . 2003 Always good turing: asymptotically optimal probability estimation.**Science**, 427-431. (doi:10.1126/science.1088284) Crossref, PubMed, Web of Science, Google Scholar**302** - 13.
Gotelli NJ, Colwell RK . 2001 Quantifying biodiversity: procedures and pitfalls in the measurement and comparison of species richness.**Ecol. Lett.**, 379-391. (doi:10.1046/j.1461-0248.2001.00230.x) Crossref, Web of Science, Google Scholar**4** - 14.
Chao A, Jost L . 2012 Coverage-based rarefaction and extrapolation: standardizing samples by completeness rather than size.**Ecology**, 2 533-2547. (doi:10.1890/11-1952.1) Crossref, Web of Science, Google Scholar**93** - 15.
Gonzalez A, Chase J, O'Connor M . 2023 A framework for the detection and attribution of biodiversity change**Phil. Trans. R. Soc. B**, 20220182. (doi:10.1098/rstb.2022.0182) Link, Google Scholar**378** - 16.
Elahi R, O'Connor MI, Byrnes JEK, Dunic J, Eriksson BK, Hensel MJS, Kearns PJ . 2015 Recent trends in local-scale marine biodiversity reflect community structure and human impacts.**Curr. Biol.**, 1 938-1943. (doi:10.1016/j.cub.2015.05.030) Crossref, PubMed, Web of Science, Google Scholar**25** - 17.
Barnosky AD 2011 Has the Earth's sixth mass extinction already arrived?**Nature**, 51-57. (doi:10.1038/nature09678) Crossref, PubMed, Web of Science, Google Scholar**471** - 18.
Kolbert E . 2014**The sixth extinction: an unnatural history**, 1st edn. New York, NY: Henry Holt and Company. Google Scholar - 19.
Dornelas M, Gotelli NJ, McGill B, Shimadzu H, Moyes F, Sievers C, Magurran AE . 2014 Assemblage time series reveal biodiversity change but not systematic loss.**Science**, 296-299. (doi:10.1126/science.1248484) Crossref, PubMed, Web of Science, Google Scholar**344** - 20.
Chase JM 2019 Species richness change across spatial scales.**Oikos**, 1 079-1091. (doi:10.1111/oik.05968) Crossref, Web of Science, Google Scholar**128** - 21.
Loreau M, Cardinale BJ, Isbell F, Newbold T, O'Connor MI, de Mazancourt C . 2022 Do not downplay biodiversity loss.**Nature**, E27-E28. (doi:10.1038/s41586-021-04179-7) Crossref, PubMed, Web of Science, Google Scholar**601** - 22.
Gonzalez A 2016 Estimating local biodiversity change: a critique of papers claiming no net loss of local diversity.**Ecology**, 1 949-1960. (doi:10.1890/15-1759.1) Crossref, Web of Science, Google Scholar**97** - 23.
Roswell M, Dushoff J, Winfree R . 2021 A conceptual guide to measuring species diversity.**Oikos**, 321-338. (doi:10.1111/oik.07202) Crossref, Web of Science, Google Scholar**130** - 24.
Hillebrand H 2018 Biodiversity change is uncoupled from species richness trends: consequences for conservation and monitoring.**J. Appl. Ecol.**, 169-184. (doi:10.1111/1365-2664.12959) Crossref, Web of Science, Google Scholar**55** - 25.
Roswell M, Harrison T, Genung M . 2023 Biodiversity-ecosystem function relationships change in sign and magnitude across the Hill diversity spectrum**Phil. Trans. R. Soc. B**, 20220186. (doi:10.1098/rstb.2022.0186) Google Scholar**378** - 26.
Chiu CH, Chao A, Vogel S, Kriegel P, Thorn S . 2023 Quantifying and estimating ecological network diversity based on incomplete sampling data.**Phil. Trans. R. Soc. B**, 20220183. (doi:10.1098/rstb.2022.0183) Link, Google Scholar**378** - 27.
Welsh AH, Lindenmayer DB, Donnelly CF . 2013 Fitting and interpreting occupancy models.**PLoS ONE**, e52015. (doi:10.1371/journal.pone.0052015) Crossref, PubMed, Web of Science, Google Scholar**8** - 28.
Devarajan K, Morelli TL, Tenan S . 2020 Multi-species occupancy models: review, roadmap, and recommendations.**Ecography**, 1612-1624. (doi:10.1111/ecog.04957) Crossref, Web of Science, Google Scholar**43** - 29.
Tingley MW, Nadeau CP, Sandor ME . 2020 Multi-species occupancy models as robust estimators of community richness.**Methods Ecol. Evol.**, 633-642. (doi:10.1111/2041-210X.13378) Crossref, Web of Science, Google Scholar**11** - 30.
Gotelli NJ, Colwell RK . 2011 Estimating species richness. In**Biological diversity: frontiers in measurement and assessment**(edsMagurran AE, McGill BJ ), pp. 39-54. Oxford, UK: Oxford University Press. Google Scholar - 31.
Good IJ . 1953 The population frequencies of species and the estimation of population parameters.**Biometrika**, 237-264. (doi:10.1093/biomet/40.3-4.237) Crossref, Web of Science, Google Scholar**40** - 32.
Chao A, Colwell RK . 2017 Thirty years of progeny from Chao's inequality: estimating and comparing richness with incidence data and incomplete sampling. SORT.**Statist. Operat. Res. Trans.**3-54. (doi:10.2436/20.8080.02.49) Google Scholar - 33.
Bunge J, Willis A, Walsh F . 2014 Estimating the number of species in microbial diversity studies.**Annu. Rev. Stat. Appl.**, 427-445. (doi:10.1146/annurev-statistics-022513-115654) Crossref, Web of Science, Google Scholar**1** - 34.
Chiu CH . 2023 A more reliable species richness estimator based on the Gamma–Poisson model.**PeerJ**, e14540. (doi:10.7717/peerj.14540) Crossref, PubMed, Google Scholar**11** - 35.
Chao A . 1987 Estimating the population size for capture-recapture data with unequal catchability.**Biometrics**, 783-791. (doi:10.2307/2531532) Crossref, PubMed, Web of Science, Google Scholar**43** - 36.
Chao A, Lee SM . 1992 Estimating the number of classes via sample coverage.**J. Am. Stat. Assoc.**, 210-217. (doi:10.1080/01621459.1992.10475194) Crossref, Web of Science, Google Scholar**87** - 37.
Burnham KP, Overton WS . 1979 Robust estimation of population size when capture probabilities vary among animals.**Ecology**, 927-936. (doi:10.2307/1936861) Crossref, Web of Science, Google Scholar**60** - 38.
Condit R, Pérez R, Aguilar S, Lao S, Foster R, Hubbell S . 2019 Complete data from the Barro Colorado 50-ha plot: 423617 trees, 35 years.**Dryad**(doi:10.15146/5xcp-0d46) Google Scholar - 39.
Whalen MA, Starko S, Lindstrom SC, Martone PT . 2023 Heatwave restructures marine intertidal communities across a stress gradient.**Ecology**. (doi:10.1002/ecy.4027) Crossref, PubMed, Web of Science, Google Scholar - 40.
Tekwa EW, Whalen MA, Martone PT, O'Connor MI . 2023Code and data for: Theory and application of an improved species richness estimator .**Zenodo**. (doi:10.5281/zenodo.7860480) Google Scholar - 41.
Chao A, Lin CW . 2012 Nonparametric lower bounds for species richness and shared species richness under sampling without replacement.**Biometrics**, 912-921. (doi:10.1111/j.1541-0420.2011.01739.x) Crossref, PubMed, Web of Science, Google Scholar**68** - 42.
Brown SJ, Goetzmann W, Ibbotson RG, Ross SA . 1992 Survivorship bias in performance studies.**Rev. Financ. Stud.**, 553-580. (doi:10.1093/rsfs/5.4.553) Crossref, Web of Science, Google Scholar**5** - 43.
Efron B, Tibshirani R . 1986 Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy.**Statist. Sci.**, 54-75. (doi:10.1214/ss/1177013815) Crossref, Google Scholar**1** - 44.
Taylor LR . 1961 Aggregation, variance and the mean.**Nature**, 732-735. (doi:10.1038/189732a0) Crossref, Web of Science, Google Scholar**189** - 45.
Lloyd M . 1967 ‘Mean crowding’.**J. Anim. Ecol.**, 1. (doi:10.2307/3012) Crossref, Web of Science, Google Scholar**36** - 46.
Hall P . 1985 Resampling a coverage pattern.**Stochastic Process. Appl.**, 231-246. (doi:10.1016/0304-4149(85)90212-1) Crossref, Web of Science, Google Scholar**20** - 47.
Loreau M . 2010**The challenges of biodiversity science**. Oldendorf/Luhe, Germany: International Ecology Institute. See http://er.uwpress.org/cgi/doi/10.3368/er.29.3.311. Google Scholar - 48.
Brose U, Martinez ND, Williams RJ . 2003 Estimating species richness: sensitivity to sample coverage and insensitivity to spatial patterns.**Ecology**, 2 364-2377. (doi:10.1890/02-0558) Crossref, Web of Science, Google Scholar**84** - 49.
Andrews DWK . 1991 Heteroskedasticity and autocorrelation consistent covariance matrix estimation.**Econometrica**, 817. (doi:10.2307/2938229) Crossref, Web of Science, Google Scholar**59** - 50.
Cid J, Lambert B, Leroi AM . 2022 The influence of selection, drift and immigration on the diversity of a tropical tree community.**bioRxiv**. (doi:10.1101/2022.06.27.497655) Google Scholar - 51.
Dethier M, Graham E, Cohen S, Tear L . 1993 Visual versus random-point percent cover estimations: ‘objective’ is not always better.**Mar. Ecol. Prog. Ser.**, 93-100. (doi:10.3354/meps096093) Crossref, Web of Science, Google Scholar**96** - 52.
Heyward AJ, Collins JD . 1985 Fragmentation in*Montipora ramosa*: the genet and ramet concept applied to a reef coral.**Coral Reefs**, 35-40. (doi:10.1007/BF00302202) Crossref, Web of Science, Google Scholar**4** - 53.
Tekwa EW, Gonzalez A, Loreau M . 2015 Local densities connect spatial ecology to game, multilevel selection and inclusive fitness theories of cooperation.**J. Theor. Biol.**, 414-425. (doi:10.1016/j.jtbi.2015.06.016) Crossref, PubMed, Web of Science, Google Scholar**380** - 54.
Harms KE, Condit R, Hubbell SP, Foster RB . 2001 Habitat associations of trees and shrubs in a 50-ha neotropical forest plot:*habitat associations of trees and shrubs*.**J. Ecol.**, 947-959. (doi:10.1111/j.1365-2745.2001.00615.x) Crossref, Web of Science, Google Scholar**89** - 55.
Purvis A, Hector A . 2000 Getting the measure of biodiversity.**Nature**, 212-219. (doi:10.1038/35012221) Crossref, PubMed, Web of Science, Google Scholar**405** - 56.
Tekwa EW, Whalen MA, Martone PT, O'Connor MI . 2023 Theory and application of an improved species richness estimator. Figshare. (doi:10.6084/m9.figshare.c.6619716) Google Scholar