Molecular clocks indicate turnover and diversification of modern coleoid cephalopods during the Mesozoic Marine Revolution

Coleoid cephalopod molluscs comprise squid, cuttlefish and octopuses, and represent nearly the entire diversity of modern cephalopods. Sophisticated adaptations such as the use of colour for camouflage and communication, jet propulsion and the ink sac highlight the unique nature of the group. Despite these striking adaptations, there are clear parallels in ecology between coleoids and bony fishes. The coleoid fossil record is limited, however, hindering confident analysis of the tempo and pattern of their evolution. Here we use a molecular dataset (180 genes, approx. 36 000 amino acids) of 26 cephalopod species to explore the phylogeny and timing of cephalopod evolution. We show that crown cephalopods diverged in the Silurian–Devonian, while crown coleoids had origins in the latest Palaeozoic. While the deep-sea vampire squid and dumbo octopuses have ancient origins extending to the Early Mesozoic Era, 242 ± 38 Ma, incirrate octopuses and the decabrachian coleoids (10-armed squid) diversified in the Jurassic Period. These divergence estimates highlight the modern diversity of coleoid cephalopods emerging in the Mesozoic Marine Revolution, a period that also witnessed the radiation of most ray-finned fish groups in addition to several other marine vertebrates. This suggests that that the origin of modern cephalopod biodiversity was contingent on ecological competition with marine vertebrates.


(b) Phylogenetic inference
The superalignment was analysed using the Markov Chain Monte Carlo (MCMC) sampler PhyloBayes MPI v1.5a [6] . The mixture model CAT + GTR + Γ 4 was applied, being the most appropriate to deal with across-site heterogeneities, while minimising long-branch biases. Two independent Monte Carlo chains were run, a burn-in of 25% of the Markov chains were discarded. These chains converged, with the maximum difference in the bipartitions of the chains < 0.1, as reported by bpcomp program in the PhyloBayes package. A further test of convergence was carried out using tracecomp (also under PhyloBayes), with effective sample sizes being > 50, and relative differences dropping below 0.1 for all parameters as compared between the independent chains. The maximum likelihood software RAxML MPI v8.1.15 [7] was applied to the same dataset as used in Bayesian inference, applying LG + I + Γ 4 . 1000 pseudoreplicates were run.

(c) Molecular divergence time inference
Phylobayes 3.3f was used to infer molecular divergence times using CAT + GTR, soft-bounds of 0.05, and a Yule-process birth-death model. A Bayes Factor analysis of the fit of three alternative models was performed (CIR [8] , log-normal [9] , and uncorrelated gamma [10] ), with CIR showing a marginally better model-fit. Of these models, CIR was applied due to its ability to model rate change along branches and between taxa, while avoiding over-relaxation of divergence time inference.
The topology was fixed to that inferred by PhyloBayes MPI v1.5a, the root constrained to the bifurcation between the uncontroversial monophyletic assemblages of annelids and molluscs (cephalopoda + bivalvia + gastropoda + scaphopoda); the bivalves and gastropods, plus annelids were considered a balanced outgroup (with comparable taxonomic sampling and phylogenetic crown spread). A prior was applied to the root of 565 ± 10 Ma, representing the root of lophotrochozoa [11] . This prior was tested as being appropriate by chains being run without data, to confirm that the samples were being drawn from a distribution that includes the prior. The root age of the prior run was 552 ± 8 Ma, supporting the prior as appropriate.
Eleven fossil calibration points were applied to the analysis, as shown in table (SI  table 1). Two independent MCMC chains were run for each model, with convergence being determined through tracecomp , with effective sample sizes > 50, and relative differences < 1 for all parameters as compared between independent chains. The discarded burn-in was 25% of the chain length.

Full phylogeny with outgroups
Supplementary figure 1: Molecular phylogeny of 26 cephalopod species, plus outgroups. 180 genes, concatenated as 36,156 aligned amino acid positions with 26% missing data, modelled under CAT + GTR + Γ. Numbers at nodes represent posterior probabilities.

Data acquisiton
Genomic data for Architeuthis dux , and transcriptomic data for Onychoteuthis banksi and Sthenoteuthis banksi were generated and assembled by the Sequencing Centre of the University of Copenhagen using the Qiagen RNeasy extraction protocol, and the TruSeq RNA Kit v2 RNA isolation, cDNA synthesis, ligation and PCR-amplification protocol. Quality control was carried out using a Bioanalyzer 2100, Agilent Technologies. Transcriptomic sequence data from Bathypolypus arcticus, Grimpoteuthis glacialis, Lolliguncula brevis and Spirula spirula were generated at the University of Bristol Life Sciences Sequencing Facility. For these species, the Trizol extraction protocol was applied, with sequencing carried out on the Illumina HiSeq platform, 100 base pair read length, paired end reading. Transcriptome assembly was carried out in Trinity version 2.0.3 [12] under default parameters and using Trimmomatic (default parameters, as part of the Trinity package) for quality control. See supplementary information for accession numbers of published sequence data.

Orthology assesment
Gene trees were assessed for orthology applying for following protocol. Maximum likelihood phylogenies were inferred for each gene using PhyML [13] version 3, modelling under LG [14] and accepting the best tree of either SPR or NJ. Sequences producing long branches were removed from the alignments, with a long branch considered to be more than 2 times the standard deviation of the average away from the average branch length for the gene in question (script available at github.com/jairly/MoSuMa_tools/). 17 genes were considered to have low orthology confidence (due to unresolved gene trees) across the taxonomic sample, and discarded, leaving 180 gene alignments. Ambiguously aligned positions were removed from the gene alignments by GBlocks v0.91b [4] ( b2 = 70%, b3 = 10, b4 = 5, b5 = half).
To provide alternative topological inference, maximum likelihood approaches were also employed. PartitionFinder [15] and IQTree [16] were used to test model fit under maximum likelihood, with both returning the substitution model of Le and Gascuel [14] , with a gamma distribution of rates and a proportion of invariant sites as having best model fit.

Fossil calibrations
12 fossil calibrations were applied to the molecular clock analyses. Letters in supplementary table 2 below refer to the nodes labelled a-l in supplementary figure 2. The root itself was constrained to 565 ± 10 Ma, as evidenced by Kimberella of the Douoshanto [11,17] . The root prior was tested as appropriate by running the analysis without data, returning a posterior distribution on the root of mean 565 ± 9.8 Ma, supporting the priors as appropriate.

Clock model cross-validation
Cross-validation was carried out in accordance with guidelines in the PhyloBayes manual, comparing the autocorrelated CIR model against the uncorrelated UGAM model. With CIR against UGAM, Bayes factor returned as 40.87 ± 165.7; a positive number is support for the fit of CIR.

Alternative clock models and calibration schemes
Analyses were run to assess the impact of internal versus external calibrations on divergence time estimations. See supplementary table 1 for calibrations classified as "internal" and "external". Note that "internal" refers to application of both internal and external calibrations.