How many dinosaur species were there? Fossil bias and true richness estimated using a Poisson sampling model (TRiPS)

The fossil record is a rich source of information about biological diversity in the past. However, the fossil record is not only incomplete but has inherent biases due to geological, physical, chemical and biological factors. Our knowledge of past life is also biased because of differences in academic and amateur interests and sampling efforts. As a result, not all individuals or species that lived in the past are equally likely to be discovered at any point in time or space. To reconstruct temporal dynamics of diversity using the fossil record, biased sampling must be explicitly taken into account. Here, we introduce an approach that utilizes the variation in the number of times each species is observed in the fossil record to estimate both sampling bias and true richness. We term our technique TRiPS (True Richness estimated using a Poisson Sampling model) and explore its robustness to violation of its assumptions via simulations. We then venture to estimate sampling bias and absolute species richness of dinosaurs in the geological stages of the Mesozoic. Using TRiPS, we estimate that 1936 (1543-2468) species of dinosaurs roamed the Earth during the Mesozoic. We also present improved estimates of species richness trajectories of the three major dinosaur clades; the sauropodomorphs, ornithischians and theropods, casting doubt on the Jurassic-Cretaceous extinction event and demonstrating that all dinosaur groups are subject to considerable sampling bias throughout the Mesozoic.


Introduction 33
One of the main goals of palaeobiology is to reconstruct diversity using information from the fossil record. 34 While the patterns of diversity in space and through time are interesting in themselves, understanding the 35 dynamics of taxon richness is also the first step in elucidating the biotic and abiotic forces that shape the 36 spatial and temporal variation in taxon diversity. In other words, we need an accurate picture of patterns 37 of past diversity to understand processes that operate on long time scales. As in all study systems where 38 data samples in themselves cannot be assumed to represent a complete picture of the underlying 39 population, richness studies based on the fossil record must consider the incompleteness of the fossil 40

record. 41
Not all organisms enter the fossil record or have the same potential of doing so. Once created, a fossil 42 record (a physical record of the existence of organisms that were alive in the past) is subject to eternal 43 loss through physical processes such as erosion. Whether or not a fossilized organism can be found is 44 also affected by variability in outcrop accessibility. Last but not least, sampling intensity encompassing 45 factors including academic/commercial interest, geographic location and sampling design also influence 46 information from the fossil record we have access to. While some of these factors contribute to noise in 47 our inference of historical patterns and processes, and thus only cloud biological signals, others may 48 cause systematic bias so as to yield misleading results if the data are interpreted at face value or with 49 inappropriate methods. 50 Several classes of approaches for estimating richness using an incomplete fossil record have been 51 developed. These might be loosely grouped into subsampling approaches, phylogenetic corrections and 52 residual approaches. It is not our purpose to give a full overview of the approaches available, which have 53 variously been reviewed elsewhere (see e.g. 1,2), but we briefly describe these in order to clarify why we 54 have developed a new approach here. Subsampling approaches, including rarefaction (reviewed in 1) and 55 Shareholder Quorum Subsampling (SQS; 3,4), attempt to standardize temporal (or spatial) samples so as 56 to achieve comparable relative richness across samples. Phylogenetic approaches use phylogenetic 57 hypotheses of the clade in question to infer ghost lineages unobserved in the fossil record but that must 58 have existed as implied by the given phylogenetic hypothesis (5). These ghost lineages are thus assumed 59 to give a minimum estimate of the number of lineages we have failed to observe in the fossil record. The 60 residual approach (6-9) assumes that a given proxy for sampling (e.g. outcrop area or number of fossil 61 bearing collections) captures the biases that might influence our observations and uses the proxy to model 62 where p binom,t is the binomial probability calculated from the estimated sampling rate (eq 4) using 150 maximum likelihood estimates (eq 3) and N t is the observed number of species in the time interval. Thus 151 the value N true that maximizes eq 5 is the maximum likelihood estimate of the true richness where N t 152 species were observed. 153 To quantify the uncertainty surrounding the estimate of the sampling rate and the true species richness we 154 utilize the relationship between the χ 2 distribution and log likelihood profiles (see e.g. 29). For the 155 confidence bounds on the maximum likelihood estimate we find the range of values for λ that satisfy 156 the inequality 157 2 6log 6 : | , , ; − log 6 | , , ;; < = > ?
158 where = > ? is the upper quantile function of the χ 2 distribution with one degree of freedom. Similarly 159 the upper and lower confidence bounds for the estimated true richness , -./ is found using the lower and 160 upper confidence bounds on the sampling probability (p binom,t ). 161 TRiPS thus yields maximum likelihood estimates and confidence intervals of true species richness for a 162 given time interval by estimating a sampling rate (observations per species per Ma). This sampling rate 163 can be transformed into a time interval specific sampling probability (probability of fossil detection per 164 species) and thereby appropriately take the duration of the time interval into account. In other words, we 165 do not need to conform the data to equal durations as commonly done (22,34). The sampling rates 166 estimated from TRiPS are thus directly comparable across geological intervals of unequal durations. Note 167 that while we have described TRiPS using species observations it can also be directly applied to genera or 168 groups of taxa defined in other ways. In fact, any grouping of taxa thought to exhibit similar sampling 169 rates might be combined, whether or not they actually are taxonomic clades. 170

Simulations using a birth-death-fossilization process 171
To evaluate our method's applicability and power we performed a large number of continuous time birth-172 death (BD) simulations, coupled with a fossilization scheme, which we interpret as sampling. In a classic 173 BD process a lineage either gives rise to a new species or goes extinct at a certain rate; our fossilization 174 scheme adds a third potential event: that of a lineage leaving a fossil. We are thus simulating a 'fossil record' given a set of parameters controlling the dynamics and sampling of the simulated clade, and then 176 using TRiPS to estimate the true number of species in these simulations 177 In case of no changes in species richness within a given time interval and identical sampling rates for all 178 lineages, TRiPS will consistently recover true richness. Using simulations, we explicitly investigate the 179 robustness of our approach to violations of TRiPS's two main assumptions, 1) equal sampling rates (per 180 lineage per Ma) for all species in the clade in question and 2) and negligible species turnover within a 181 time interval (i.e. all lineages span the entire interval in which they are observed). We then use the results 182 of our simulations to aid interpretation of our estimates based on dinosaur records (see below). 183 Our birth-death-fossilize model has six parameters; speciation and extinction rates (in per species per time respectively. Mean sampling rates ranged from 0.001 to 1.5 fossils per species per time unit as drawn 196 from a uniform distribution. Variation in sampling rate among species ranged from 0 to 0.3, also drawn 197 from a uniform distribution; setting this parameter to 0.3, for example, gives each lineage a sampling rate 198 ranging from 0.7 to 1.3 times the clade mean (this is equivalent to twice the coefficient of variation). Each 199 species thus has its own unique sampling rate drawn from a uniform distribution with means and variance 200 differing across simulations, explicitly violating the TRiPS assumption that lineages have the same 201 sampling rate. We ran 100 000 simulations. 202 We evaluated TRiPS's ability to infer sampling rates and true species richness by 1) tabulating the 203 number of simulations in which the true species richness was inside the predicted confidence interval 204 (which we call success rate), 2) estimating the bias in the maximum likelihood prediction of species 205 richness, the mean scaled error @AB = ( ∑ 234 DE FG 234 ( , and 3) Pearson's product-moment correlation 206 (ρ) between true and estimated richness in the simulations. We compiled the same statistics for estimated 207 sampling rates and sampling probabilities. Together these three metrics inform us of the precision, bias 208 and relative accuracy of our methodology. 209

Estimating dinosaur sampling bias and species richness 210
For each stage in the Mesozoic we estimate the sampling rate (i.e. the bias) and the species richness for 211 each of the four clades using TRiPS. Combined with confidence intervals, these estimates allow for sound 212 statistical comparisons of both bias and richness between geological stages. Additionally, to give a crude 213 estimate of the total richness across the Mesozoic, we calculate a Mesozoic sampling probability as a 214 mean of binomial probabilities, weighted by estimated richness in each stage. Mesozoic mean sampling 215 rates are calculated similarly. Using mean Mesozoic sampling probability and the number of unique 216 observed species we could also estimate the total richness of the dinosaur clades throughout the Mesozoic. 217

Implementation 218
All data analysis was performed in R (36). Code necessary for the analysis, combined with scripts to 219 directly download relevant (and thus updated) data from Paleobiology Database in addition to simpler R 220 scripts showing how other datasets can be analysed using TRiPS is available on the authors' website and 221 Dryad. Output from our simulations and functions and scripts to rerun simulations are also 222 deposited,.[AUTHOR COMMENT: Dryad will not accept data before acceptance of manuscript]. TRiPS based stage-specific sampling rates and probabilities for dinosaurs do not monotonically increase 232 through the Mesozoic, but exhibit a combination of high and low sampling regimes (figure 1). This 233 observation runs counter to the commonly held belief that younger geological strata exhibit a higher level 234 of fossil sampling (e.g. 21,23). Sampling probabilities are particularly high during the  and   relatively few sampling events per lineage per million years, but the probability of a species being 240 sampled, given that it was extant, is quite high (>0.8 for all groups, see figure 1). In general, the relative 241 changes in sampling dynamics are similar for our genus level analyses although sampling rates and 242 probabilities are naturally higher for genera (ESM figures S5 and S6). 243 The three clades have notably different sampling estimates from stage to stage, with binomial sampling 244 probabilities spanning from about 0.1 to almost 1. Theropods show higher sampling rates relative to 245 ornithischians and sauropodomorphs in the Triassic (Carnian, 237 -227 Ma, and Norian) but comparably 246 lower rates in parts of the Early Cretaceous (Valanginian,through Aptian,. 247 This runs counter to earlier conclusions that richness trajectories of Theropoda and Ornithischia seems to Ornithischia, Sauropodomorpha and Theropoda, respectively. However, there is substantial remaining 261 sampling bias not captured by DBCs (since R 2 < 0.51 for all). Using DBCs to remove bias in richness 262 estimation of dinosaurs (e.g. 38) will add a substantial amount of error in correcting diversity curves. 263

Dinosaur richness in the Mesozoic 264
The whole Mesozoic is estimated to have seen 1936 (1543 -2468, table 1) dinosaur species. A Mesozoic 265 mean sampling probability at the genus level (again, weighted by estimated genus richness from each 266 stage) yields estimates of total dinosaur genera richness at 1536 (1255 -1929). Total species richness for 267 the subclades is estimated to be 508 (409 -668) for Ornithischia, 513 (307 -983) for Sauropodomorpha 268 and1115 (780 -1653) for Theropoda (table 1). 269 The species richness estimates from TRiPS shares dynamics with those painted by both the raw counts of 270 species and range-through species richness using the same dataset ( figure 2). However, only in about half 271 the stages are the range-through estimates within the confidence interval of TRiPS estimates. Genus 272 richness dynamics are similar to species dynamics (ESM figure S6) and indicate that for at least this 273 dataset using these analyses, genus level estimates can be a proxy for species estimates, corroborating 274 Jablonski and Finarelli's findings (39). While genus richness estimates are lower, they are similar to 275 species estimates, unsurprisingly, given there are few dinosaur species per genera (approximately 1.15 276 identified species per genera in our data). 277 Estimated species richness shows an increase in the late Triassic, with a peak in the Rhaetian, though with 278 a considerably wide confidence interval (figure 2, ESM table S2). This indicates that the initial 279 diversification of dinosaur continued for up to 40 Ma, in contrast to the raw counts which peak in the 280

Norian. Comparing stages within the Early Jurassic shows that the elevated raw richness in the 281
Sinemurian for all clades is not corroborated by our estimates; TRiPS estimate richness for this stage well 282 within the confidence intervals for richness in the Hettangian, Ma) and the 283 Toarcian (182.7 -174.1 Ma). In the Mid and Late Jurassic, also contrary to the raw species counts 284 indicating a richness peak in the Tithonian, TRiPS estimates that all clades exhibit elevated richness in the 285 Oxfordian due to the relatively low sampling rate in this interval for all clades. Finally, the trough in raw 286 richness in the Coniacian (89.8 -86.3 Ma) -Santonian with a peak in the two final stages of the 287 Cretaceous for all clades seems to be a sampling artefact; while there is still a signal of reduced richness 288 in the Coniacian for all dinosaurs, this is most likely driven by the ornithischians; the Coniacian-289 Santonian trough is not particularly strong for sauropodomorphs and theropods when inspecting the 290 confidence intervals for these periods. 291

Simulation results 292
Most simulated species do not span the whole interval in which they were sampled, unless speciation and 293 extinction rates are both zero. Using TRiPS to estimate the true richness is in such cases (the majority of 294 our simulations) is thus a test of the degree to which our approach is robust to deviations from the 295 assumption that an observed lineage spans the whole interval in question. Across the whole parameter 296 space simulated (see above) TRiPS yielded confidence intervals including the true richness in 36% of 297 simulations and true sampling rates were inside the confidence interval in 39% of simulations (see ESM 298 for details). Though these are low absolute success rates, the correlations between true and estimated 299 richness is still high (ρ = 0.99) and the mean scaled error is also low (MSE<0.07). This indicates that the 300 maximum likelihood estimate of richness correlates very strongly with the true richness (indicating 301 robustness in terms of relative richness) and that the richness estimate is, on average, only 7% smaller 302 than the true richness. Unsurprisingly, the success rate and MSE of TRiPS are also highly dependent on 303 all parameter settings, except the degree of variation in sampling rate between lineages in a single 304 simulation (see ESM). In cases where individual lineages differ in sampling rates, TRiPS effectively 305 estimates the group mean, and there is no first order effect on bias, precision or correlation of estimates 306 when increasing the variability of sampling rates between lineages. The duration of the interval directly 307 scales the potential for changes in species richness; the longer the duration, the more time for speciation 308 and extinction within the time interval, leading to larger deviations from our assumption of lineages 309 spanning the entire interval, and a higher failure rate of capturing the true richness within the confidence 310 interval predicted by TRiPS. This underscores that TRiPS will perform better when applied to empirical 311 data with narrower time intervals. Secondly, the sampling rate itself affect the success rate of TRiPS; 312 TRiPS works best where sampling rates are relatively low, and even so when speciation and extinction 313 rates are > 0 (figure 3). We note, however, that the majority of our empirical dinosaur estimates are in 314 regions of parameter space in which TRiPS has a relatively high success rate ( figure 3). 315

Discussion 316
Fossil observation data are readily available in public databases, such as the PaleoDB. Yet, estimating 317 taxon richness using such databases is not trivial as fossilization, outcrop exposure and modern day 318 sampling and data compilation are heterogeneous processes. Unlike approaches such as subsampling 319 (1,34,40) and bias-corrected residual analysis (8,9) our approach, TRiPS, estimates true rather than 320 relative richness by utilizing information on sampling which is inherent in PaleoDB. In addition, unlike 321 the residual approach, we do not make presuppositions that an external time series can be used to correct 322 for sampling. This is important because such external time series (e.g. amount of outcrop, sea level) may 323 constitute a factor driving both richness and sampling as postulated by the Common Cause Hypothesis 324 (10,11) or be an effect of a third factor. The residual approach can also suffer from redundancy in the data 325 used (41,42), which would lead to erroneously corrected richness estimates. 326 In TRiPS, we tackle bias in the fossil record directly by estimating rates of sampling. This also allows us 327 to disentangle sampling and richness dynamics such that tests of links between potential drivers can be 328 done on sampling and richness independently (see also 43). An advantage of TRiPS is that our treatment 329 of sampling allows sampling probabilities to be directly comparable between intervals of unequal 330 duration (but see below). One assumption we do explicitly make which cannot be true most of the time, is 331 that a species detected in a given time interval is extant during that whole time interval. This is because 332 most species are unlikely to become extinct exactly at the late boundary of a time interval or originate 333 exactly at the early boundary of a time interval. While other methods for estimating richness also assume 334 that turnover is clumped at interval boundaries (see e.g. 40, p. 74), we explicitly examined the robustness 335 of our estimates to the violation of this crucial assumption. 336 Comparing our empirical sampling rate estimates with simulations that violated key assumptions of 337 TRiPS (figure 3), we find most of our empirical estimates fall within parameter ranges in which we are 338 able to retrieve true richness estimates reliably. This is with the caveat that the simulated speciation and 339 extinction rates are realistic for dinosaurs. 340 However, we note other caveats to the estimates from TRiPS. First, although the ability to estimate true 341 richness is relatively robust to deviations from our assumptions under our simulations, TRiPS does give 342 biased estimates of both sampling and richness when there are within-bin dynamics. When applying 343 TRiPS to cases with non-zero speciation and extinction rates and increasingly wide time intervals, our 344 'data' will consist more and more of observations of lineages that did not span the entire interval. This 345 generates a bias in our estimated sampling rates such that these estimates are lower than true sampling 346 rates (see ESM figure S2), and more so for longer durations and higher levels of speciation and extinction 347 rates. On the other hand, transforming this rate into a binomial probability (eq 4), assuming all lineages 348 lasts longer than they actually do, leads to an overestimation of the binomial sampling probability when 349 compared to the fraction of species actually observed in our simulations (see ESM figure S3). These two 350 opposing biases in our methodology in sum lead to a slight negative bias in the reconstructed richness (on 351 average 7% below true richness in our simulations). So, sampling rates and richness are slightly 352 underestimated, but sampling probabilities are overestimated. 353 The estimated richness is thus best treated as minimum richness estimates, particularly for intervals in 354 which there is reason to believe that within interval changes in true richness have been substantial, such as 355 in long geological stages. Second, with longer intervals (which gives more time for in-bin dynamics) and 356 higher sampling rates TRiPS fails more often in terms of capturing true richness inside confidence 357 intervals (figure 3) due to high precision combined with bias. Note, however, that the maximum 358 likelihood estimate of true richness itself is still not too far off target if speciation and extinction rates are 359 not too high (see ESM figure S1). On the other hand, one benefit of our explicit approach is that it is 360 straightforward to simulate a birth-death-fossilization process to check if the empirical estimates of 361 sampling rates and richness can be considered robust to violation of the assumption of negligible turnover 362 within an interval (see below). It is also worth highlighting that other approaches for reconstructing past 363 richness also fall victim to deviations from constant richness (see e.g. 40), even though such violations 364 have not been explicitly examined in published simulations, as far as we know. 365 The obvious solution to these caveats is to apply TRiPS only for temporally well-resolved data, and to 366 strive for better and more accurate dating of fossils. Other potential avenues for improvement lie in 367 development of the methodology itself. For instance, extending TRiPS to estimate sampling rates in a 368 hierarchical manner, where instead of estimating a single sampling rate, one could estimate a distribution 369 of sampling rates within a time interval, or link sampling rates for subclades in a hierarchical fashion, 370 possibly including more phylogenetic information. Other directions could be to include estimates of 371 lineage durations from independent sources instead of assuming that all lineages span the interval in 372 which they are observed. 373 In summary our simulations shows that TRiPS is prone to errors if its assumptions are substantially 374 violated. Even in applications where we have reason to believe there are high levels of species turnover, 375 estimates of richness will still be a reasonable approximation of the true richness, although the associated 376 confidence intervals should be treated with caution; they are in general too narrow for such cases. In 377 applying TRiPS on shorter intervals, i.e. when there is more reason to believe that lineages sampled 378 actually do span most of the interval, TRiPS confidence intervals will in most cases encompass the true 379 richness. With the pros and cons of TRiPS in mind, we can now compare our dinosaur estimates to 380 previous studies. 381 Dodson (19) estimated the total number of dinosaur genera to be 900-1200 for the whole Mesozoic, with 382 about 100 genera at any one geological stage. Our estimated genus richness of 1536 (1255 -1929) are 383 more in line with Wang and Dodson's (18) estimates, which inferred that the entire Mesozoic saw 1844 384 dinosaur genera. For the final stages of the Cretaceous Wang and Dodson (18)  It is currently accepted that dinosaurs did not rapidly diversify when they appeared around the start of the 388 Late Triassic. Rather, sauropodomorphs diversified during the final part of the Triassic, while 389 ornithischians and theropods increased in richness in the early Jurassic (21,38). While this pattern is in 390 part corroborated by our analysis for sauropodomorphs, a species richness of 80 for the whole dinosaur 391 clade is estimated for the last stage in the Triassic, which is relatively high compared with the rest of the 392 Mesozoic. Sampling rates for ornithischians cannot be estimated with confidence for any interval in the 393 Triassic. Sauropodomorphs exhibit rather high levels of both observed and estimated species richness 394 already in the Norian (228 -208.5 Ma), and our estimate of sauropodomorph species richness during the  Mesozoic richness was calculated using the Mesozoic mean sampling probability and the total number of 693 species in each clade. For stage specific sampling estimates see figure 1 and ESM.