Phylogenomics-guided discovery of a novel conserved cassette of short linear motifs in BubR1 essential for the spindle checkpoint

The spindle assembly checkpoint (SAC) maintains genomic integrity by preventing progression of mitotic cell division until all chromosomes are stably attached to spindle microtubules. The SAC critically relies on the paralogues Bub1 and BubR1/Mad3, which integrate kinetochore–spindle attachment status with generation of the anaphase inhibitory complex MCC. We previously reported on the widespread occurrences of independent gene duplications of an ancestral ‘MadBub’ gene in eukaryotic evolution and the striking parallel subfunctionalization that led to loss of kinase function in BubR1/Mad3-like paralogues. Here, we present an elaborate subfunctionalization analysis of the Bub1/BubR1 gene family and perform de novo sequence discovery in a comparative phylogenomics framework to trace the distribution of ancestral sequence features to extant paralogues throughout the eukaryotic tree of life. We show that known ancestral sequence features are consistently retained in the same functional paralogue: GLEBS/CMI/CDII/kinase in the Bub1-like and KEN1/KEN2/D-Box in the BubR1/Mad3-like. The recently described ABBA motif can be found in either or both paralogues. We however discovered two additional ABBA motifs that flank KEN2. This cassette of ABBA1-KEN2-ABBA2 forms a strictly conserved module in all ancestral and BubR1/Mad3-like proteins, suggestive of a specific and crucial SAC function. Indeed, deletion of the ABBA motifs in human BUBR1 abrogates the SAC and affects APC/C–Cdc20 interactions. Our detailed comparative genomics analyses thus enabled discovery of a conserved cassette of motifs essential for the SAC and show how this approach can be used to uncover hitherto unrecognized functional protein features.


Introduction
Chromosome segregation during cell divisions in animals and fungi is monitored by a cell cycle checkpoint known as the spindle assembly checkpoint (SAC) [1-3]. The SAC couples absence of stable attachments between kinetochores and spindle microtubules to inhibition of anaphase by assembling a four-subunit inhibitor of the anaphase-promoting complex (APC/C), known as the MCC [4-6]. The molecular pathway that senses lack of attachment and produces the MCC relies on two related proteins known as Bub1 and BubR1/Mad3 [2]. Bub1 is a serine/threonine kinase that localizes to kinetochores and promotes recruitment of MCC subunits and of factors that stimulate its assembly [7-9]. These events are largely independent of Bub1 kinase activity, however, which instead is essential for the correction process of attachment errors [7,10,11]. BubR1/Mad3 is one of the MCC subunits, responsible for directly preventing APC/C activity and anaphase onset [6,12,13]. It does so by contacting multiple molecules of the APC/C co-activator Cdc20, preventing APC/C substrate access and binding of the E2 enzyme UbcH10 [5,6,14,15]. The BubR1/Mad3-Cdc20 contacts occur via various short linear motifs (SLiMs) known as ABBA, KEN and D-box [6,9,14,16-20]. Like Bub1, BubR1 also impacts on the attachment error-correction process via a KARD motif that recruits the PP2A-B56 phosphatase [21-23]. This may not however be a universal feature of BubR1/Mad3-like proteins, because many lack a KARD-like motif.
Bub1 and BubR1/Mad3 are paralogues. We previously showed they originated by similar but independent gene duplications from an ancestral MadBub gene in many lineages, and that the two resulting gene copies then subfunctionalized in remarkably comparable ways [24]. An ancestral N-terminal KEN motif (KEN1: essential for the SAC) and an ancestral C-terminal kinase domain (essential for attachment error-correction) were retained in only one of the paralogous genes in a mutually exclusive manner in virtually all lineages (i.e. one gene retained KEN but lost kinase, while the other retained kinase but lost KEN). One exception to this 'rule' is the vertebrates, where both paralogues have a kinase-like domain. The kinase domain of human BUBR1, however, lacks enzymatic activity (i.e. is a pseudokinase) and instead confers stability onto the BUBR1 protein [24].
The similar subfunctionalization of Bub1 and BubR1/Mad3-like paralogues was inferred from analysis of two domains (TPR and kinase) and one motif (KEN1). We set out to analyse whether any additional features specifically segregated to Bub1- or BubR1/Mad3-like proteins after duplications by designing an unbiased feature discovery pipeline and tracing feature evolution. The pipeline extracted all known and various previously unrecognized conserved motifs from Bub1/BubR1 family gene members. Two of these are novel ABBA motifs that flank KEN2 specifically in BubR1/Mad3-like proteins; we show that this highly conserved ABBA-KEN2-ABBA cassette is crucial for the SAC in human cells.

Results
[...] figure S1) corroborated the 10 independent duplications previously described [24] and allowed for a more precise determination of the age of the duplications. Strikingly, we found evidence for a number of additional independent duplications: three duplications in stramenopile species of the SAR super group (Albuginaceae (#10 in figure 1b), Ectocarpus siliculosus (#11) and Aureococcus anophagefferens (#12)) and one at the base of basidiomycete fungi (pucciniomycetes (#4)). The BUBR1 paralogue in teleost fish underwent a duplication and fission event, of which the C-terminus product was retained only in the lineage leading to zebrafish (Danio rerio (#7)). Lastly, through addition of recently sequenced genomes we could specify a duplication around the time plants started to colonize land (bryophytes (#13)) and an independent duplication in the ancestor of higher plants (tracheophytes (#14)), followed by a duplication in the ancestor of the flowering plants (magnoliophytes (#15)). These gave rise to three MadBub homologues, signifying additional subfunctionalization of the paralogues in the plant model organism Arabidopsis thaliana. The striking parallel subfunctionalization we originally identified thus recurs in lineages whose genome sequences have since been elucidated.

De novo discovery, phylogenetic distribution and fate after duplication of functional motifs in the MadBub gene family
Previous analyses revealed a recurrent pattern of mutually exclusive retention of an N-terminal KEN-box and a C-terminal kinase domain after duplication of an ancestral MadBub gene [24,25]. These patterns suggested the hypothesis of paralogue subfunctionalization towards either inhibition of the APC/C in the cytosol (retaining the KEN-box) or attachment-error correction at the kinetochore (retaining the kinase domain). Given the extensive sequence divergence of MadBub homologues and the range of different known functional elements, we reasoned that a comprehensive analysis of MadBub gene duplicates would provide opportunities for the discovery of novel and co-evolving ancestral features. For clarity, we refer to the Bub1-like paralogue (C-terminal kinase domain) as BUB and the BubR1/Mad3-like paralogue (N-terminal KEN-box) as MAD throughout the rest of this paper.
To capture conserved ancestral features of diverse eukaryotic MadBub homologues, we constructed a sensitive de novo motif and domain discovery pipeline (ConFeaX: conserved feature extraction) similar to our previous approach used to characterize KNL1 evolution [26]. In short, the MEME algorithm [27] was used to search for significantly similar gapless amino acid motifs, and extended motifs were aligned by MAFFT [28]. Alignments were modelled using HMMER [29] and sensitive profile HMM searches were iterated and specifically optimized using permissive E-values/bit scores until convergence (Material and methods and figure 1a). Owing to the degenerate nature of the detected SLiMs, we manually scrutinized the results for incorrectly identified features and supplemented known motif instances, when applicable. We preferred ConFeaX over other de novo motif discovery methods [30,31], as it does not rely on high-quality full-length alignment of protein sequences and allows detection of repeated or dynamic non-syntenic conserved features (which is a common feature for SLiMs). It is therefore better tuned to finding conserved features over long evolutionary distances in general and specifically in this case, where recurrent duplication and subfunctionalization hamper conventional multiple sequence alignment based analysis.
rsob.royalsocietypublishing.org Open Biol. 6: 160315
ConFeaX identified known functional motifs and domains and in some cases extended their definition: KEN1 [32], KEN2 [19], GLEBS [33], KARD [21-23], CMI (also known as CDI [7]), D-box [19], CDII (a co-activator domain of BUB1 [7,34]) and the recently discovered ABBA motif (termed ABBA3, see §2.3) [9,16,18,20] (figure 1a; electronic supplementary material, table SII and sequence file 2). The TPR and the kinase domain were annotated using profile searches of previously established models [24] and excluded from de novo sequence searches. KEN1 and KEN2 could be discriminated by differentially conserved residues surrounding the core KEN-box (figure 1a). Those surrounding KEN1 are involved in the formation of the helix-turn-helix motif that positions BubR1/Mad3 towards Cdc20 [6], while two pseudo-symmetrically conserved tryptophan residues with unknown function specifically defined KEN2. Furthermore, we found that the third position of the canonical ABBA motif is often occupied by a proline residue and the first position in ascomycetes (fungi) is often substituted for a polar amino acid (KRN) (figure 1a), signifying potential lineage-specific changes in Cdc20-ABBA interactions. Last, we also discovered a novel motif predominantly associated with the MAD paralogue in basidiomycetes, plants, amoeba and stramenopiles but not metazoa, which we termed MAD-associated motif (MadaM) (figure 1a).
Projection of the conserved ancestral features onto the MadBub gene phylogeny provided a highly detailed overview of MadBub motif evolution (figure 1b; electronic supplementary material, figure S1b). We found that the core functional motifs and domains (TPR, KEN1, KEN2, ABBA, D-box, GLEBS, MadaM, CMI, CDII and kinase) are present throughout the eukaryotic tree of life, representing the core features that were probably part of the SAC signalling network in the last eukaryotic common ancestor (LECA). Of note are lineages (nematodes, flatworms (Schistosoma mansoni), dinoflagellates (Symbiodinium minutum) and early branching fungi (microsporidia and Conidiobolus coronatus)) for which multiple features were either lost or considerably divergent (electronic supplementary material, figure S1b). Especially interesting is Caenorhabditis elegans, in which both KEN boxes and the GLEBS domain appear to have degenerated (ceMAD = san-1) and the CMI motif is lost (ceBUB = bub-1), indicating extensive rewiring or a less essential role of the SAC in nematode species, as has been suggested recently [35,36].
Our motif discovery analyses revealed the Cdc20/Cdh1-interacting ABBA motif to be much more abundant than the single instances that were previously reported for BUBR1 and BUB1 in humans [9,16,18]. We observed three different contexts for the ABBA motifs (figure 1b; electronic supplementary material, figure S1b): (i) in repeat arrays (e.g. MAD of Physcomitrella patens, basidiomycetes and stramenopiles), (ii) in the vicinity of CMI (many instances) and/or D-box/KEN (e.g. human) and (iii) as two highly conserved ABBA motifs flanking KEN2 (virtually all species). Because of the positional conservation of the latter, we have termed these ABBA1 and ABBA2. Any additional ABBA motifs were pooled in the category 'ABBA-other'.
To track the fate of the features discovered using ConFeaX, we quantified their co-presences and co-absences, as a proxy for coevolution, by calculating the Pearson correlation coefficient (r) for the profiles of each domain/motif pair of 16 duplicated MadBub homologues (figure 1b) [37]. Subsequent average clustering of the Pearson distance (d = 1 - r) revealed two sets of co-segregating and anti-correlated conserved features (figure 2a,b) consistent with our hypothesis that MadBub gene duplication caused parallel subfunctionalization of features towards the kinetochore (mainly BUB) and the cytosol (MAD) [24]. GLEBS, CDI, ABBA-other, KARD, CDII and the kinase domain formed a coherent cluster of features with bona fide function at the kinetochore. For a detailed discussion on several intriguing observations regarding presence/absence of these motifs in several eukaryotic lineages, and what this may mean for BUB/MAD and SAC function in these lineages, see the electronic supplementary material, Discussion. A second cluster contained known motifs that bind and interact with (multiple) CDC20 molecules, including KEN1, KEN2 and (to a lesser extent) the D-box. Our newly discovered ABBA motifs that flank KEN2 were tightly associated with KEN2 and KEN1 (figure 2). As such, the ABBA1-KEN2-ABBA2 cassette (figure 3a) co-segregated with MAD function during subfunctionalization of MadBub gene duplicates. Although the D-box often co-occurs with the KEN-ABBA cluster, this motif was occasionally lost (e.g. archaeplastids, Schizosaccharomyces pombe and A. anophagefferens). Finally, MadaM co-segregated with the Cdc20-interacting motifs (figure 2a), suggesting a MAD-specific role for this newly discovered motif (possibly in MCC function and/or Cdc20-binding) in species harbouring it such as plants, basidiomycetes and stramenopiles.

The conserved ABBA1-KEN2-ABBA2 cassette is essential for SAC signalling in human cells
The strong correlation of the ABBA1-KEN2-ABBA2 cassette with KEN1 and the D-box prompted us to examine the role of these motifs in BUBR1-dependent SAC signalling in human cells. We therefore generated stable isogenic HeLa-FlpIn cell lines expressing doxycycline-inducible versions of LAP-tagged BUBR1 [38]. These included: ΔABBA1, ΔABBA2, ΔABBA1 + 2, alanine-substitutions of the two KEN2-flanking tryptophans (W1-A, W2-A and W1/2-A), KEN1-AAA, KEN2-AAA, ΔABBA3 and ΔD-box (figure 3a-c). The SAC was severely compromised in cells depleted of endogenous BUBR1 by RNAi, as measured by inability to maintain mitotic arrest upon treatment with S-trityl-L-cysteine (STLC) [39] (median (m) = 50 min from nuclear envelope breakdown to mitotic exit, compared with control (m > 500 min)) (figure 3d,e). SAC proficiency was restored by expression of siRNA-resistant LAP-BUBR1 (m > 500 min). As shown previously [19,40,41], mutants of KEN1, KEN2 and the D-box strongly affected the SAC. Importantly, BUBR1 lacking ABBA1 or ABBA2 or both, or either of the two tryptophans, could not rescue the SAC (figure 3e). We observed a consistently stronger phenotype for the mutated motifs on the N-terminal side of KEN2 (ΔABBA1 (m = 65 min) and W1-A (m = 165 min)) compared with those on the C-terminal side (ΔABBA2 (m = 200 min) and W2-A (m = 260 min)). The double ABBA (1/2) and tryptophan (1/2) mutants were however further compromised (m = 50 and 110 min, respectively), suggesting non-redundant functions. As expected from the interaction of ABBA motifs with the WD40 domain of CDC20 [14,18], BUBR1 lacking ABBA1 and/or ABBA2 was less efficient in binding APC/C-Cdc20 in mitotic human cells, to a similar extent as mutations in KEN1 (figure 3f). In our hands, the ABBA1 and ABBA2 mutants were strongly deficient in SAC signalling and APC/C-Cdc20 binding while the previously described ABBA motif (ABBA3) was not (figure 3d,e).
Previous studies suggested that ABBA3 might play a role in SAC silencing [16,42], which raises the possibility that ABBA3 may somehow counteract binding of ABBA1 and/or ABBA2 to CDC20. In conclusion, the ABBA1-KEN2-ABBA2 cassette in BUBR1 is essential for APC/C inhibition by the SAC. We here discovered a symmetric cassette of SLiMs containing two Cdc20-binding ABBA motifs and KEN2. This cassette strongly co-occurs with KEN1 in MAD-like and MadBub proteins throughout eukaryotic evolution and has important contributions to the SAC in human cells. Our co-precipitation experiments along with the known roles for ABBA-like motifs and KEN2 and their recent modelling into the MCC-APC/C structure [14,15] strongly suggest that the ABBA1-W-KEN2-W-ABBA2 cassette interacts with one or multiple Cdc20 molecules. Together with KEN1, these interactions probably regulate affinity of MCC for APC/C or its positioning once bound to APC/C. The constellation of interactions between two Cdc20 molecules (Cdc20 MCC and Cdc20 APC/C ) and the various Cdc20-binding motifs in one molecule of BUBR1 (3× ABBA, 2× KEN and a D-box) is not immediately clear, and will have to await detailed atomic insights. One suggestion that arises from our study is that the ABBA3 motif that is modelled into the APC/C-MCC structure by Alfieri et al. [14] might well be the ABBA2 motif. The symmetric arrangement of the cassette may be significant in this regard, as is the observation that (despite a highly conserved WD40 structure of Cdc20) the length of spacing between the ABBA motifs and KEN2 is highly variable between species. A more detailed understanding of SAC function may be aided by ConFeaX-driven discovery of lineage-specific conserved features in the MadBub family when more genome sequences become available, as well as of features in other SAC protein families.

Phylogenomic analysis
We performed iterated sensitive homology searches with jackhmmer [43] (based on the TPR, kinase, CMI, GLEBS and KEN boxes) using a permissive E-value and bitscore cut-off to include diverged homologues on UniProt release 2016_08 and Ensembl Genomes 32 (http://www.ebi.ac.uk/Tools/hmmer/search/jackhmmer). Incompletely predicted genes were searched against whole genome shotgun contigs (wgs, http://www.ncbi.nlm.nih.gov/genbank/wgs) using tblastn. Significant hits were manually predicted using AUGUSTUS [44] and GENSCAN [45]. In total, we used 152 MadBub homologues (electronic supplementary material, sequence file 1). The TPR domains of 148 sequences were aligned using MAFFT-LINSI [28]; only columns with 80% occupancy were considered for further analysis. Phylogenetic analysis of the resulting multiple sequence alignment was performed using RAxML [46] (electronic supplementary material, figure 1a). Model selection was performed using ProtTest [47] (Akaike information criterion): LG + G was chosen as the evolutionary model.
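The column-occupancy filter mentioned above can be illustrated with a minimal sketch. It assumes that '80% occupancy' means at least 80% non-gap characters per column; the function name `filter_columns` and the toy alignment are hypothetical and are not taken from the authors' pipeline.

```python
def filter_columns(alignment, min_occupancy=0.8, gap_chars="-."):
    """Keep alignment columns whose fraction of non-gap characters
    is at least `min_occupancy` (assumed reading of '80% occupancy')."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    keep = [
        col
        for col in range(length)
        if sum(row[col] not in gap_chars for row in alignment) / n_seqs
        >= min_occupancy
    ]
    return ["".join(row[col] for col in keep) for row in alignment]


# Hypothetical toy alignment: the third column is 20% occupied and is dropped.
toy_alignment = ["MK-LE", "MKALE", "M--LE", "MK-L-", "MK-LE"]
filtered = filter_columns(toy_alignment)
print(filtered)  # ['MKLE', 'MKLE', 'M-LE', 'MKL-', 'MKLE']
```

Columns that sit exactly at the threshold (here 4 of 5 sequences) are retained under this reading.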

Conserved feature extraction and subfunctionalization analysis
ConFeaX starts with a probabilistic search for short conserved regions (max. 50) using the MEME algorithm (option: any number of repeats) [27]. Significant motif hits are extended on both sides by five residues to compensate for the strict treatment of alignment information by the MEME algorithm. Next, MAFFT-LINSI [28] introduces gaps and the alignments are modelled using the HMMER package [29] and used to search for hits that are missed by the MEME algorithm. Subsequent alignment and HMM searches were iterated until convergence.
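The steps above can be sketched as follows. This is an illustrative outline only, not the authors' implementation: the helper names (`extend_hit`, `iterate_until_stable`, `meme_command`, and so on) are hypothetical, the reading of 'max. 50' as a maximum motif width is an assumption, and the E-value default is a placeholder. The command lists use standard MEME, MAFFT and HMMER options but are only constructed here, not executed.

```python
def extend_hit(seq, start, end, flank=5):
    """Widen the motif hit seq[start:end] by `flank` residues on each
    side, clipped to the sequence boundaries."""
    left = max(0, start - flank)
    right = min(len(seq), end + flank)
    return left, right, seq[left:right]


def iterate_until_stable(seed_hits, search, max_rounds=20):
    """Repeat a search step until the set of hits stops growing.
    `search` stands in for an align / build-HMM / search round."""
    hits = frozenset(seed_hits)
    for _ in range(max_rounds):
        updated = hits | frozenset(search(hits))
        if updated == hits:  # convergence: no new hits found
            break
        hits = updated
    return hits


def meme_command(fasta, outdir, max_width=50):
    # '-mod anr' = any number of repeats; width cap is an assumed reading.
    return ["meme", fasta, "-protein", "-mod", "anr",
            "-maxw", str(max_width), "-oc", outdir]


def mafft_linsi_command(fasta):
    # L-INS-i corresponds to --localpair with --maxiterate 1000.
    return ["mafft", "--localpair", "--maxiterate", "1000", fasta]


def hmmer_commands(alignment, hmm, database, table, evalue=10.0):
    # `evalue` is a placeholder for a permissive cut-off.
    return [["hmmbuild", hmm, alignment],
            ["hmmsearch", "-E", str(evalue), "--tblout", table, hmm, database]]


# Toy usage with a made-up sequence and a made-up search step.
window = extend_hit("ABCDEFGHIJKLMNOP", 6, 9)
stable = iterate_until_stable({0}, lambda hits: {h + 1 for h in hits if h < 3})
print(window, sorted(stable))
```

Passing the search step in as a function keeps the convergence logic separate from the external tools, so the loop can be exercised without MEME, MAFFT or HMMER installed.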
For SLiMs with few conserved positions, specific optimization of the alignments and HMM models using permissive E-values/bit scores was needed (e.g. ABBA motif and D-box). Sequence logos were obtained using weblogo2 [48]. Subsequently, from each of the conserved features, a phylogenetic profile was derived (present is '1' and absent is '0') for all duplicated MadBub sequences as presented in figure 1. For all possible pairs, we determined the correlation using Pearson correlation coefficient [37]. Average clustering based on Pearson distances (d = 1 - r) was used to indicate subfunctionalization.
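As an illustration of the profile comparison, the sketch below computes Pearson distances between binary presence/absence profiles and clusters them. It assumes that 'average clustering' refers to average-linkage (UPGMA-style) agglomerative clustering; the four profiles are invented toy data, not the profiles of figure 1, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient r of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("constant profile: correlation is undefined")
    return cov / (sd_x * sd_y)


def average_linkage(profiles):
    """Agglomerative clustering on d = 1 - r with average linkage.
    Returns the merges as (cluster_a, cluster_b, distance) tuples."""
    dist = {
        frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }

    def cluster_distance(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Invented toy profiles over six paralogues (1 = present, 0 = absent).
toy_profiles = {
    "KEN1":   [1, 0, 1, 0, 1, 0],
    "KEN2":   [1, 0, 1, 0, 1, 0],
    "kinase": [0, 1, 0, 1, 0, 1],
    "GLEBS":  [0, 1, 0, 1, 1, 1],
}
merges = average_linkage(toy_profiles)
for left, right, distance in merges:
    print(left, right, round(distance, 3))
```

With these toy profiles, the two KEN features merge first (d close to 0), the kinase and GLEBS features merge next, and the two groups join last at a distance above 1, which is the signature of anti-correlated feature sets.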

Live cell imaging
For live cell imaging experiments, the stable HeLa-FlpIn-TRex cells were transfected with 40 nM siRNA (start and at 24 h). After 24 h, the medium was supplemented with thymidine (2.5 mM) and doxycycline (2 mg ml⁻¹) for 24 h to arrest cells in early S-phase and to induce expression of the stably integrated construct, respectively. After 48 h, cells were released for 3 h and arrested in prometaphase of the mitotic cell cycle (after approximately 8-10 h) by the addition of the Eg5 inhibitor S-trityl-L-cysteine (STLC, 20 mM). HeLa cells were imaged (DIC) in a heated chamber (37°C, 5% CO2) using a CFI S Plan Fluor ELWD 20x/NA 0.45 dry objective on a Nikon Ti-Eclipse wide field microscope controlled by NIS software (Nikon). Images were acquired using an Andor Zyla 4.2 sCMOS camera and processed using NIS software (Nikon) and ImageJ.