Spatial separation of the cyanogenic β-glucosidase ZfBGD2 and cyanogenic glucosides in the haemolymph of Zygaena larvae facilitates cyanide release

Low molecular weight compounds are typically used by insects and plants for defence against predators. They are often stored as inactive β-glucosides and kept separate from activating β-glucosidases. When the two components are mixed, the β-glucosides are hydrolysed, releasing toxic aglucones. Cyanogenic plants contain cyanogenic glucosides and release hydrogen cyanide via such a well-characterized two-component system. Some arthropods are also cyanogenic, but comparatively little is known about their system. Here, we identify a specific β-glucosidase (ZfBGD2) involved in cyanogenesis in larvae of Zygaena filipendulae (Lepidoptera, Zygaenidae), and analyse the spatial organization of cyanide release in this specialized insect. Transcriptomic and proteomic profiling revealed high levels of ZfBGD2 mRNA and protein in haemocytes. Heterologous expression in insect cells showed that ZfBGD2 hydrolyses linamarin and lotaustralin, the two cyanogenic glucosides present in Z. filipendulae. Linamarin and lotaustralin, as well as cyanide release, were found exclusively in the haemoplasma. Phylogenetic analyses revealed that ZfBGD2 clusters with other insect β-glucosidases; correspondingly, hydrolysis of cyanogenic glucosides by a specific β-glucosidase evolved convergently in insects and plants. The spatial separation of the β-glucosidase ZfBGD2 and its cyanogenic substrates within the haemolymph provides the basis for cyanide release in Z. filipendulae. This spatial separation is similar to the compartmentalization of the two components in cyanogenic plant species, and illustrates one similarity in cyanide-based defence between these two kingdoms of life.

Chemical defence mediated by β-glucosidases is an important driver of the herbivore-plant arms race, and is excellently illustrated by the phenomenon of cyanogenesis [12][13][14][15][16]: the release of hydrogen cyanide (HCN) from cyanogenic glucosides (CNglcs) catalysed by β-glucosidase activity [17]. After cleavage of the glucosyl moiety from a CNglc, the corresponding α-hydroxynitrile dissociates into HCN and a keto compound, either spontaneously at pH values above 6 or via the action of an α-hydroxynitrile lyase (electronic supplementary material, figure S1). HCN is an acute toxin for eukaryotic organisms because it inhibits cytochrome c oxidase, the terminal enzyme of the mitochondrial respiratory pathway [18]. In plants, spatial compartmentalization of the two components, i.e. CNglcs and β-glucosidase, at the tissue or cellular level ensures that HCN is only released after tissue disruption, e.g. by herbivore attack [3,9]. Arthropods may also rely on HCN for defence [17,19]. In arthropods, cyanogenesis may proceed either in specialized tissues morphologically separated from the rest of the body, such as defence secretions [20][21][22], or in tissues largely connected to the rest of the body, such as gut and haemolymph. In the latter case, a prerequisite for cyanogenesis is the immediate enzymatic detoxification of HCN by β-cyanoalanine synthase [23][24][25]. This ability probably enabled some insects to exploit HCN for their own benefit, as reported for the burnet moth Zygaena filipendulae (Lepidoptera, Zygaenidae), which releases HCN for defence, development and mating communication [26,27]. The CNglcs in Z. filipendulae, linamarin and lotaustralin, are derived from biosynthesis or sequestration, depending on their content in the food plant Lotus corniculatus, which is polymorphic with respect to CNglc levels [7,8,28]. Furthermore, significant amounts of HCN in Z. filipendulae are released from crude larval haemolymph, whereas other tissues, such as the cuticular cavities containing defence droplets, do not release HCN per se [29]. This makes a haemolymph-based β-glucosidase with activity against CNglcs likely, and accordingly, Zygaena trifolii larvae have been shown to harbour cyanogenic β-glucosidase activity in the haemolymph [5]. It is still unknown which gene encodes the putative cyanogenic β-glucosidase and whether this gene evolved convergently or divergently relative to cyanogenic plant β-glucosidases. Furthermore, determining the localization of the β-glucosidase and its substrates would reveal whether the two components are compartmentalized in Zygaena insects.
Here we identify and characterize a cyanogenic β-glucosidase from larvae of the specialist insect Z. filipendulae and elucidate the enzyme's spatial occurrence in comparison to its substrates, showing that cyanogenesis proceeds in the haemoplasma.
Haemolymph was collected from punctured prolegs of ice-chilled larvae and separated into haemoplasma (supernatant) and haemocytes (pellet) by centrifugation for 10 min at 3000g and 4°C. The haemocytes were re-suspended in 60 mM citrate buffer pH 6. Haemocyte viability was analysed by staining with 2 mM fluorescein diacetate (FDA) and inspection under a fluorescence microscope (Leica DMR). Defence droplets were collected by stroking the larva with a pipette tip and used immediately in assays, before the secretion hardened [29]. All other tissues, such as head, gut, integument, Malpighian tubules and fat body, were obtained by dissection of ice-chilled penultimate-instar larvae, followed by washing in 0.9% NaCl to exclude haemolymph contamination. RNA was extracted using the RNAqueous®-Micro Kit (Ambion); cDNA was synthesized using SuperScript™ III Reverse Transcriptase (Invitrogen™).

Identification of β-glucosidase and glucocerebrosidase genes
Candidate genes from the β-glucosidase (GH1) and glucocerebrosidase (GH30) gene families were identified in three different Z. filipendulae transcriptome datasets [30,31] by BLAST searches (BLASTx and BLASTn with default parameter settings), using selected insect GH1 and GH30 protein sequences as queries (accession numbers displayed in figure 4). Gene candidates from Z. filipendulae were then used as queries in BLAST searches against the Z. filipendulae transcriptomes to ensure exhaustive searches. Approximately 23 β-glucosidases (GH1) and 6 glucocerebrosidases (GH30) were found in all the Z. filipendulae transcriptomes. The exact number remains to be determined, since many of the sequences were partial and may belong to the same transcript. Only four of the sequences (ZfGBA1, ZfBGD1, ZfBGD2, ZfBGD3) were full length in the 454 transcriptome [30]; these were therefore deemed highly expressed and used for heterologous expression. By combining all three transcriptomes, it was possible to manually assemble 11 β-glucosidases and 3 glucocerebrosidases to full length, which were subsequently used for phylogenetic analyses.
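The two-step search strategy (insect GH1/GH30 queries first, then re-using the Z. filipendulae hits as new queries) can be sketched as a fixed-point expansion over BLAST tabular output. This is a minimal illustration, not the authors' pipeline. It assumes hits were exported in BLAST+ tabular format (`-outfmt 6`, 12 default columns, E-value in column 11). The E-value cut-off `max_evalue` and the contig names in the example are hypothetical, since the text only states that default search parameters were used.

```python
from collections import deque


def parse_outfmt6(lines):
    """Parse BLAST+ tabular (-outfmt 6) lines into {query: [(subject, evalue), ...]}.

    Default column order: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore (E-value is index 10).
    """
    hits = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        cols = line.rstrip("\n").split("\t")
        hits.setdefault(cols[0], []).append((cols[1], float(cols[10])))
    return hits


def expand_candidates(seed_queries, hits, max_evalue=1e-10):
    """Collect all sequences reachable from the seeds via hits at or below max_evalue.

    Each newly found subject is queued as a query in its own right, mimicking
    the re-search of Z. filipendulae candidates against the transcriptomes.
    max_evalue is a hypothetical threshold chosen for illustration.
    """
    found = set()
    queue = deque(seed_queries)
    queried = set(seed_queries)
    while queue:
        query = queue.popleft()
        for subject, evalue in hits.get(query, []):
            if evalue > max_evalue or subject in found:
                continue
            found.add(subject)
            if subject not in queried:
                queried.add(subject)
                queue.append(subject)
    return found


# Hypothetical tabular hits: insect query -> contig_A -> contig_B -> contig_C (weak hit)
rows = [
    "insectGH1\tcontig_A\t52.1\t480\t220\t3\t1\t480\t5\t484\t1e-120\t420",
    "contig_A\tcontig_B\t47.3\t470\t240\t5\t1\t470\t3\t472\t2e-90\t330",
    "contig_B\tcontig_C\t30.0\t120\t80\t2\t1\t120\t1\t120\t0.003\t40",
]
print(sorted(expand_candidates(["insectGH1"], parse_outfmt6(rows))))  # ['contig_A', 'contig_B']
```

The breadth-first queue makes the search terminate once no new sequence passes the cut-off. This is the computational counterpart of repeating searches "to ensure exhaustive searches".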

Molecular cloning and heterologous expression
Since ZfBGD1 has already been characterized [29], the remaining three candidate genes, ZfGBA1, ZfBGD2 and ZfBGD3, were selected for heterologous expression in Sf9 cells. PCR amplification of the open reading frames was carried out as in [29]; primers are listed in electronic supplementary material, table S1. The products were Sanger-sequenced (European Nucleotide Archive accession numbers: LT635663 for ZfBGD2, LT635664 for ZfBGD3, LT635665 for ZfGBA1), cloned into the XmaI and NotI restriction sites of an insect cell expression vector (pAcGP67A, BD Pharmingen), and subsequently mixed with BaculoGold DNA™ (BD Pharmingen) to transfect Sf9 insect cells (Life Technologies), which were cultivated in Grace's medium (Gibco®, Life Technologies). Human UDP-N-acetyl-α-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase 2 (GalNAc-T2) was expressed as a control [32]. The candidate genes ZfBGD2 and ZfGBA1 were expressed with and without (*) a predicted native N-terminal targeting signal peptide. Sf9 cells were harvested by centrifugation for 2 min at 3000g and 4°C and re-suspended in 60 mM citrate buffer pH 6. Cell viability after centrifugation was confirmed by FDA staining and inspection of intact cells under a fluorescence microscope (electronic supplementary material, figure S2). Expression of β-glucosidases was analysed by SDS-PAGE and western blot (electronic supplementary material, file S1 and figure S3). For further details of expression, see [33].

Quantitative real-time PCR
Quantitative real-time PCR (qRT-PCR) was carried out to analyse the mRNA levels of the three β-glucosidase candidate genes (ZfGBA1, ZfBGD2, ZfBGD3) in different tissues. Data were quantified relative to the mRNA levels of the reference gene RNA polymerase II 140 kDa subunit (RpII140-RA, GenBank accession number KJ192329) [28] using the 2^−ΔCt method [35]. Reactions were run on a CFX96 Touch™ Real-Time PCR Detection System (Bio-Rad) using Brilliant III Ultra-Fast SYBR® Green QPCR Master Mix (Agilent Technologies) with cDNA as template. Two technical replicates were analysed for each of five (crude haemolymph, haemocytes, haemoplasma) or three (head, gut, integument, Malpighian tubules, fat body) biological replicates. Technical replicates differing by more than 0.5 Ct were repeated. Distilled water as template served as negative control. Primers are listed in electronic supplementary material, table S1.
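The relative quantification and the replicate check described above amount to a few lines of arithmetic: relative expression = 2^−ΔCt, with ΔCt = Ct(target) − Ct(reference). The sketch below assumes that ΔCt is computed from the mean Ct of the two technical replicates of the target gene minus that of the reference gene RpII140 in the same sample. The Ct values in the example are invented for illustration.

```python
from statistics import mean

MAX_TECHNICAL_SPREAD = 0.5  # Ct difference above which technical replicates are repeated


def mean_ct(technical_replicates):
    """Mean Ct of technical replicates; raise if they disagree by more than 0.5 Ct."""
    spread = max(technical_replicates) - min(technical_replicates)
    if spread > MAX_TECHNICAL_SPREAD:
        raise ValueError(f"technical replicates differ by {spread:.2f} Ct; repeat the reaction")
    return mean(technical_replicates)


def relative_expression(target_cts, reference_cts):
    """2^-ΔCt with ΔCt = Ct(target) - Ct(reference) for one biological replicate."""
    delta_ct = mean_ct(target_cts) - mean_ct(reference_cts)
    return 2 ** (-delta_ct)


# Hypothetical Ct values for one biological replicate
print(relative_expression([24.0, 24.5], [25.25, 25.25]))  # ΔCt = -1.0, so 2.0
```

A value above 1 means the target transcript is more abundant than the reference transcript in that sample. Tissue means are then taken across the biological replicates.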

Proteomics
To identify proteins in haemocytes and haemoplasma, protein extracts from both fractions of three different larvae were tryptically digested and desalted. Peptides were separated on a UPLCM system from Waters with a 120 min gradient, followed by analysis in a Q-Exactive™ HF Hybrid Quadrupole-Orbitrap mass spectrometer. The gradient used buffer A (0.1% formic acid in H2O) and buffer B (0.1% formic acid in ACN; Biosolve UPLC quality). The proportion of buffer A was: 97% from 0 to 14 min (loop online); 97% from 14 to 15 min (loop offline); 97% to 70% from 15 to 75 min; 70% to 60% from 75 to 90 min; 60% to 10% from 90 to 94 min; 10% from 94 to 104 min; 10% to 97% from 104 to 105 min; and 97% from 105 to 120 min. The HPLC flow rate was 400 nl min−1, and the column temperature was 50°C. Samples were run on a 200 mm (length) × 75 µm (ID) reversed-phase CSH (charged surface hybrid) column (Waters) with a particle size of 1.7 µm. Spectra were recorded in positive ionization mode by applying a voltage of 2.4 kV to the emitter, and measured in the mass range m/z 300–1650 at a resolution of 120 000, with an ion time of 100 ms and a target value of 1 × 10^6. MS/MS spectra of the 15 most intense precursors (top 15) were acquired at a resolution of 15 000, using an ion time of 100 ms and a target value of 1 × 10^5 ions. Peptides with a detected charge of 2+ to 8+ were selected for MS/MS acquisition with an isolation width of m/z 1.4 and a normalized collision energy (NCE) of 27. Fragmented peptides were dynamically excluded from further MS/MS analysis for 30 s. The search was carried out in Mascot Daemon version 2.5.1 against an in-house Z. filipendulae transcriptomic database with 29,395 unique entries. The search settings were: enzyme trypsin with 1 missed cleavage allowed, peptide tolerance ±10 ppm and charge 2+ and 3+. MS/MS tolerance was set to 0.5 Da. Carbamidomethyl was set as fixed modification, and oxidation as variable modification.
Peptides were accepted as identified with a peptide score of at least 30, and proteins were accepted as identified with at least 2 peptides at a false discovery rate of 0.1%. For a rough semi-quantitative evaluation, we used the Mascot protein score. For the proteins of interest, sequence coverage and the reliability of the score were verified manually.
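The peptide and protein acceptance criteria above can be expressed as a simple filter over peptide-level search results. This sketch treats the peptide score of 30 as a minimum. It assumes the results are available as (protein, peptide sequence, peptide score) tuples, for example exported from Mascot. It counts distinct peptides per protein and does not reimplement the 0.1% false discovery rate control, which is applied separately during the search. Protein and peptide identifiers in the example are hypothetical.

```python
def accepted_proteins(peptide_hits, min_peptide_score=30, min_peptides=2):
    """Return proteins supported by >= min_peptides distinct peptides scoring >= min_peptide_score.

    peptide_hits: iterable of (protein_id, peptide_sequence, peptide_score).
    """
    peptides_per_protein = {}
    for protein_id, peptide, score in peptide_hits:
        if score >= min_peptide_score:
            peptides_per_protein.setdefault(protein_id, set()).add(peptide)
    return {p for p, peptides in peptides_per_protein.items() if len(peptides) >= min_peptides}


# Hypothetical peptide-level results
hits = [
    ("protein_1", "LVNEVTEFAK", 55.2),
    ("protein_1", "YLYEEIAR", 41.0),
    ("protein_2", "AEFVEVTK", 48.7),
    ("protein_2", "QTALVELLK", 22.4),  # below the score threshold, ignored
]
print(accepted_proteins(hits))  # {'protein_1'}
```

Counting distinct sequences rather than spectra prevents one repeatedly sequenced peptide from satisfying the two-peptide rule on its own.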

Liquid chromatography-mass spectrometry
Liquid chromatography-mass spectrometry (LC-MS) analysis was carried out using re-suspended haemocytes (1 µl), haemoplasma (0.1 µl) and crude haemolymph (0.1 µl). Samples were added to ice-cold 55% methanol (containing 0.1% formic acid and 0.044 mM amygdalin as internal standard), which disrupted the haemocytes. Analytical LC-MS was carried out as described in [8]. Mass spectral data were analysed with the native data analysis software.
Quantification of each compound was based on extracted ion chromatogram (EIC) peak areas compared to calibration curves of linamarin, lotaustralin and amygdalin.
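Quantification against a calibration curve can be sketched as fitting a least-squares line to the standards and then inverting that line for each sample. The version below is an assumption, not the authors' exact procedure. It normalizes each analyte EIC peak area to the peak area of the amygdalin internal standard before fitting. All concentrations and peak areas in the example are invented.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(analyte_area, internal_std_area, slope, intercept):
    """Invert the calibration line for one sample using the area ratio analyte / internal standard."""
    ratio = analyte_area / internal_std_area
    return (ratio - intercept) / slope


# Hypothetical linamarin standards (concentration in µM) and their EIC area ratios to amygdalin
standards_conc = [0.0, 10.0, 20.0, 40.0]
standards_ratio = [0.0, 0.5, 1.0, 2.0]
slope, intercept = fit_line(standards_conc, standards_ratio)
print(round(quantify(300.0, 400.0, slope, intercept), 3))  # 15.0
```

Normalizing to the internal standard compensates for run-to-run differences in injection and ionization. The back-calculated concentration still has to be corrected for the sample volume (0.1 µl or 1 µl) and the extraction dilution.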

Phylogenetic analysis of β-glucosidases and glucocerebrosidases
Candidate genes were aligned in MEGA7 [36] using MUSCLE [37,38] with default settings and refined manually (electronic supplementary material, file S2). Representative gene sequences from the moths Amyelois transitella, Bombyx mori and Plutella xylostella, as well as the butterflies Danaus plexippus and Papilio polytes, were downloaded from GenBank following BLAST searches (BLASTx and BLASTn with default parameter settings) with ZfGBA1, ZfBGD2 and ZfBGD3 as query sequences. The honeybee Apis mellifera was used as outgroup in the glucocerebrosidase tree, and LjBGD2 from Lotus japonicus and linamarase from clover Trifolium repens [39] were used as outgroups in the β-glucosidase tree. Additionally, recently characterized β-glucosidases from the chrysomelid beetles Chrysomela lapponica, Phaedon cochleariae and Phyllotreta striolata were added to the β-glucosidase tree, and the only characterized insect glucocerebrosidase, DmGba1b from Drosophila melanogaster, was added to the glucocerebrosidase tree. Phylogenetic trees were generated in MEGA7 from protein sequence alignments using the maximum-likelihood method with a JTT model and a discrete Gamma distribution. This model was chosen using the MEGA7 model test, in which the model with the lowest BIC (Bayesian information criterion) score is considered to describe the observed substitution pattern best. To examine whether candidate genes had been under positive (ω > 1) or negative (ω < 1) selection, ratios of nonsynonymous to synonymous substitution rates (ω = dN/dS) were calculated for codon-based nucleotide alignments with the program codeml from the PAML package, v. 4.1 [40,41]. Different maximum-likelihood models of codon substitution were tested to account for variable selection pressures among amino acid sites. All site models (NSsites) were tested with the setting model = 0. Branch models were also tested (model = 1:b). The remaining settings were default.
The different models with different classes of ω were compared with likelihood ratio tests [42] to detect if specific regions of the genes had been under positive selection [43].
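Both model choice by BIC and the likelihood ratio tests between nested codon models reduce to short formulas. BIC = k ln n − 2 ln L, where k is the number of free parameters, n the number of observations and L the maximized likelihood. The LRT statistic is 2ΔlnL, compared with a χ² distribution whose degrees of freedom equal the difference in free parameters. The sketch below is illustrative only: MEGA7 and codeml compute these values internally, and all log-likelihoods and parameter counts in the example are hypothetical. The χ² tail probability is computed in closed form, which is exact only for even degrees of freedom (e.g. df = 2 for the M7 versus M8 comparison).

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate a better model."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2ΔlnL, p-value) for nested models, e.g. M7 (null) vs M8 (alternative)."""
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return statistic, chi2_sf_even_df(statistic, df)


# Hypothetical log-likelihoods from two codeml site-model runs on one alignment
stat, p = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-995.0, df=2)
print(stat, round(p, 4))  # 10.0 0.0067

# Hypothetical substitution-model fits: name -> (lnL, number of free parameters)
fits = {"JTT+G": (-5200.0, 30), "WAG+G": (-5210.0, 30), "JTT+G+I": (-5199.0, 31)}
n_sites = 500
best = min(fits, key=lambda name: bic(fits[name][0], fits[name][1], n_sites))
print(best)  # JTT+G
```

In the example, the extra parameter of JTT+G+I improves lnL by only 1 unit, which is less than the BIC penalty of ln 500 ≈ 6.2, so the simpler JTT+G model is preferred.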

Investigation of proposed positively selected residues
The ZfBGD2 protein sequence was BLAST searched against the Protein Data Bank (PDB) to identify templates suitable for homology modelling. The two best hits, a β-glycosidase from Spodoptera frugiperda (5CG0_A, E-value = 0.0) and a β-glucosidase from Neotermes koshunensis (3AHZ_A, E-value = 9.4 × 10^−127), were selected based on sequence identity, resolution of the structure and quality of the structural determination. The sequences were aligned pairwise using the MUSCLE algorithm. Each alignment and its template structure were used by MODELLER [44] to create a homology model.
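Template selection by sequence identity relies on a percent-identity value computed from a pairwise alignment. Tools differ in how they treat gaps. The sketch below uses one common convention: identical positions divided by aligned positions in which neither sequence has a gap. This is an assumption, not necessarily the definition used by BLAST or MUSCLE. The sequence fragments are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        identical += a == b
    return 100.0 * identical / compared if compared else 0.0


print(percent_identity("MKT-LVW", "MRTALVW"))  # 5 of 6 gap-free columns identical
```

Excluding gapped columns gives higher values than dividing by the full alignment length, so identity values are only comparable when they were computed under the same convention.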

ZfBGD2 hydrolyses linamarin and lotaustralin resulting in cyanide release
To identify the enzyme responsible for hydrolysing CNglcs in Z. filipendulae, the β-glucosidases with the highest relative expression levels according to the transcriptome sequencing, ZfBGD2, ZfBGD3 and ZfGBA1, were chosen for heterologous expression in Sf9 insect cells. ZfGBA1 and ZfBGD2 were predicted to contain a native signal peptide and were expressed both with and without (*) this peptide, to ensure that the intrinsic N-terminal leader signal sequence of the expression vector would not be compromised. However, the recombinant proteins did not accumulate outside the Sf9 cells despite being fused to the vector's leader signal sequence. Consequently, the expressed enzymes were retained in Sf9 cells, which were harvested by mild centrifugation; their viability was confirmed by FDA staining (electronic supplementary material, figure S2). Heterologously expressed ZfBGD2 and ZfGBA1 hydrolysed the generic substrate MUglc (approximately 400 nmol released MU for ZfGBA1, 3000 for ZfBGD2 and 100 for the GalNAc-T2 control). ZfBGD2 and ZfGBA1 also hydrolysed linamarin and lotaustralin (figure 1a,b). Surprisingly, ZfBGD2 hydrolysed the aromatic CNglc prunasin up to fivefold better than linamarin (figure 1b). When the expressed candidates were incubated with boiled haemolymph (containing CNglcs but lacking β-glucosidase activity), only ZfBGD2 gave twice as much HCN as boiled haemolymph alone (figure 1c). Defence droplets containing CNglcs and a non-cyanogenic β-glucosidase [29] did not induce a significantly higher HCN emission upon […]. Additionally, CNglc-containing integument (taken from the underside of the larvae, thus lacking cuticular cavities and haemolymph) released HCN after tissue homogenization in buffer. However, no HCN release was detected from intact integument, implying that ZfGBA1 is not in contact with linamarin and lotaustralin in this tissue.
ZfBGD3 generally showed low expression in the tissues analysed, with the highest expression in the integument (mean 2^−ΔCt value of 7 ± 0.7). A total of 428 different proteins were identified in the two haemolymph fractions, of which 115 were found only in the haemoplasma, 219 only in the haemocytes, and 94 in both fractions (electronic supplementary material, file S3). ZfBGD2 was detected only in haemocytes (coverage 25%), whereas ZfGBA1 was found in both fractions (coverage 64% in haemoplasma and 33% in haemocytes). ZfBGD3 was found only in haemoplasma (coverage 8%). Since both transcripts and protein of ZfBGD2 are found in haemocytes but not in the haemoplasma, the protein is likely both produced and stored in haemocytes. In contrast, ZfGBA1 seems to be expressed and produced in the fat body and integument, after which at least a fraction of the enzyme is transported into the haemolymph.
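The protein counts reported for the two haemolymph fractions form a partition of the identified proteins into three disjoint groups: haemoplasma only, haemocytes only, and both. The group sizes must therefore add up to the total (115 + 219 + 94 = 428). A minimal sketch of this bookkeeping with Python sets follows. Apart from the three enzymes, whose fraction assignments match those reported above, the protein identifiers are hypothetical.

```python
def partition_fractions(haemoplasma_ids, haemocyte_ids):
    """Split identified proteins into fraction-specific and shared sets."""
    plasma, cells = set(haemoplasma_ids), set(haemocyte_ids)
    return {
        "haemoplasma_only": plasma - cells,
        "haemocytes_only": cells - plasma,
        "both": plasma & cells,
        "total": plasma | cells,
    }


groups = partition_fractions(
    ["ZfGBA1", "ZfBGD3", "prot_a"],            # hypothetical haemoplasma identifications
    ["ZfGBA1", "ZfBGD2", "prot_b", "prot_c"],  # hypothetical haemocyte identifications
)
print({name: len(ids) for name, ids in groups.items()})
# {'haemoplasma_only': 2, 'haemocytes_only': 3, 'both': 1, 'total': 6}
```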

Linamarin, lotaustralin and cyanide are present in the haemoplasma
To elucidate whether cyanogenic β-glucosidases and CNglcs are spatially separated within Z. filipendulae haemolymph, crude haemolymph, haemoplasma and haemocytes were analysed by LC-MS. Linamarin and lotaustralin were found in the haemoplasma in the same amount and ratio as in crude haemolymph (figure 3b), but not detected in the haemocytes. HCN emission was detected from haemoplasma and crude haemolymph at equally high levels, but not detected from haemocytes (figure 3c). Finally, generic β-glucosidase activity as monitored by the hydrolysis of MUglc was mainly present in haemoplasma and crude haemolymph as well, whereas only minor generic β-glucosidase activity was detected in haemocytes (figure 3d).

ZfBGD2 evolved independently in insects compared to plants
Sequences belonging to GH families 1 and 30 were extracted from three Z. filipendulae transcriptomes, and each family was subjected to a phylogenetic analysis with representative genes from lepidopteran species. ZfBGD2 has close homologues in all butterflies and moths examined (figure 4) and clearly evolved independently in insects and plants, because the cyanogenic β-glucosidases from the plants L. japonicus and T. repens do not cluster with the insect sequences. The few functionally characterized insect GH1 β-glucosidases involved in chemical defence, a myrosinase (β-thioglucosidase) from P. striolata [10] and two β-glucosidases from Chrysomelina leaf beetles [11], formed their own cluster in the tree (45–50% identical to the other insect β-glucosidases) and were not closely related to any of the highly expressed β-glucosidases in Z. filipendulae (46–48% identical).
ZfBGD3 was the best hit when the Z. filipendulae transcriptomes were BLAST searched with these enzymes as queries, although ZfBGD2 was among the top five hits (E-values 6 × 10^−89, 2 × 10^−83 and 4 × 10^−92, respectively). ZfGBA1 from Z. filipendulae has no close homologues in other lepidopterans and is apparently absent from plants (http://www.cazy.org/GH30_eukaryota.html). It has recently been duplicated in Z. filipendulae, as shown by the closely related gene Zf C21542; this duplication could indicate a neofunctionalization event.
To examine whether the candidate genes had been subject to selection, we calculated the ratio of nonsynonymous (dN) to synonymous (dS) substitution rates (ω = dN/dS) for full-length sequences of β-glucosidases and glucocerebrosidases from Z. filipendulae. In our analyses, average ω values ranged from 0.00 to 0.53 (table 1) […] with the best fit to the glucocerebrosidase sequences, and model 8 (10 classes of ω values with 10% of the sequence fitted to each value, β distribution, and one extra class of ω above 1) provided the best fit for the β-glucosidases. Both models have many classes of ω values, which for the glucocerebrosidases are all below 1. This indicates that some segments of the glucocerebrosidase sequences are under stronger negative (purifying) selection than others, but no positively selected sites were detected. For the β-glucosidases, 1.6% of the sequence was found to be likely under positive selection, although only one significant amino acid was identified in the empirical Bayes analysis (alignment position 36). Positions 17, 23, 40, 42, 574 and 575 were also identified, but with non-significant posterior probabilities. No significant positive selection was found on any of the branches leading to the candidate genes (data not shown), so the positive selection found for the β-glucosidases does not appear to be restricted to any specific clade in the phylogenetic tree.
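The interpretation of ω and of per-site empirical Bayes results follows a simple decision rule. ω < 1 indicates purifying selection, ω ≈ 1 neutrality and ω > 1 positive selection. A site is reported as significant when its posterior probability of belonging to the ω > 1 class reaches a chosen cut-off. The sketch below makes several assumptions that the text does not state: a 0.95 significance cut-off (a common convention) and a 0.5 floor for listing non-significant candidates. It uses invented posterior probabilities at hypothetical alignment positions and does not parse real codeml output.

```python
def selection_regime(omega, tolerance=1e-9):
    """Classify a dN/dS ratio as purifying, neutral or positive selection."""
    if omega < 1.0 - tolerance:
        return "purifying"
    if omega > 1.0 + tolerance:
        return "positive"
    return "neutral"


def positively_selected_sites(posteriors, cutoff=0.95, floor=0.5):
    """Split alignment positions into significant and non-significant candidates.

    posteriors: {alignment_position: P(omega > 1)} from an empirical Bayes analysis.
    cutoff and floor are assumed thresholds for illustration.
    """
    significant = sorted(pos for pos, prob in posteriors.items() if prob >= cutoff)
    candidates = sorted(pos for pos, prob in posteriors.items() if floor <= prob < cutoff)
    return significant, candidates


# Hypothetical posterior probabilities at hypothetical alignment positions
print(positively_selected_sites({5: 0.97, 12: 0.71, 80: 0.58, 150: 0.10}))  # ([5], [12, 80])
print(selection_regime(0.53))  # purifying
```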

Investigation of proposed positively selected residues
To analyse the importance of the proposed positively selected residues, a structural model of ZfBGD2 was produced by homology modelling. Two sequences (5CG0_A and 3AHZ_A) were selected as templates, and their structures were used to create homology models. The spatial positions of residues 23, 36, 40 and 42, hypothesized to be under positive selection, were examined in the models. All of them reside in the N-terminus of the protein, which is predicted to be a signal peptide, suggesting that these residues are unlikely to be important for protein function. This is supported by the fact that both templates retain their function when lacking their approximately 20 N-terminal residues, which were confirmed to be signal peptides. Unfortunately, this truncation also made it impossible to model the proposed signal peptide region of ZfBGD2. Since ZfBGD2 is shorter than the alignment consensus sequence, positions 17, 574 and 575 are not present in ZfBGD2, so these residues could not be investigated further.
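Whether an alignment position, such as 17, 574 or 575, can be examined in the ZfBGD2 model depends on whether ZfBGD2 has a residue in that alignment column. Mapping alignment columns to residue numbers of the ungapped sequence makes this explicit. The short aligned fragment below is hypothetical.

```python
def column_to_residue(aligned_seq, gap="-"):
    """Map 1-based alignment columns to 1-based residue numbers; gap columns map to None."""
    mapping = {}
    residue = 0
    for column, symbol in enumerate(aligned_seq, start=1):
        if symbol == gap:
            mapping[column] = None  # the sequence has no residue in this column
        else:
            residue += 1
            mapping[column] = residue
    return mapping


# Hypothetical aligned fragment with an internal two-column gap and a missing C-terminus
aligned = "MA--KLVE--"
positions = column_to_residue(aligned)
print([positions[c] for c in (2, 3, 5, 9, 10)])  # [2, None, 3, None, None]
```

Columns that map to None, like terminal columns beyond the end of a shorter sequence, cannot be located in a structural model of that sequence.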

Discussion
Insect β-glucosidases are often studied for their ability to hydrolyse cellulose [45][46][47][48] or plant defence compounds [10,11,16,49–51], after isolation from digestive tissues such as salivary glands or gut [52,53]. Gut β-glucosidases from insects may also elicit indirect plant defences based on volatiles [54], or detoxify plant diterpene glycosides [55]. These studies underpin the general role of insect β-glucosidases in herbivore-plant interactions. Here, we elucidate another role of an insect β-glucosidase, namely involvement in cyanogenesis. Larvae of Z. filipendulae were used as a model system to understand how and where in the insect body the CNglcs linamarin and lotaustralin are enzymatically hydrolysed to release HCN. The haemoplasma was identified as the body compartment that releases HCN per se. The spatial separation of the cyanogenic β-glucosidase ZfBGD2 (present in haemocytes) and its cyanogenic substrates (present in haemoplasma) elucidated here provides the basis for active cyanogenesis in the Z. filipendulae haemoplasma. We hypothesize that ZfBGD2 is released from haemocytes into the haemoplasma and at the same time rendered active, perhaps by loss of the putative inactivating signal peptide or by binding to another protein or other factor. Proteomic profiling did not detect ZfBGD2 in the haemoplasma (electronic supplementary material, file S3), but this does not exclude a low amount of active enzyme there. Since HCN emission is restricted to the haemoplasma, direct cytotoxic effects on haemocytes are avoided. Moreover, Zygaena insects are generally resistant to HCN [56], mainly owing to β-cyanoalanine synthase activity, which converts HCN and the amino acid cysteine into β-cyanoalanine [17,29]. Recently, β-cyanoalanine synthases involved in HCN detoxification were identified in other arthropods coping with HCN, such as the spider mite Tetranychus urticae [12] and the butterfly Pieris rapae [24]. The Z. filipendulae transcriptomes have also revealed a copy of this gene, although it has not yet been characterized.
Although there are many cyanide-releasing arthropods, including species from Chelicerata, Diplopoda, Chilopoda, Coleoptera, Heteroptera and Lepidoptera [17,19], ZfBGD2 is the first cyanogenic β-glucosidase characterized from an arthropod. It hydrolyses the two aliphatic CNglcs linamarin and lotaustralin, resulting in HCN release, but also the aromatic non-physiological CNglc prunasin. Prunasin is not naturally present in the food plant of Z. filipendulae larvae, but it can be sequestered by them [57], and may be ingested from food plants by other species of Zygaenidae (Chiharu Koshio 2016, personal communication). Only a few insect GH1 β-glucosidases involved in chemical defence have been functionally characterized: a myrosinase from the striped flea beetle P. striolata that hydrolyses sequestered aliphatic glucosinolates [10], and a β-glucosidase from glands of juvenile Chrysomelina leaf beetles that hydrolyses, e.g., sequestered salicin or de novo produced 8-hydroxygeraniol-β-D-glucoside [11]. None of the examined β-glucosidases from Z. filipendulae were closely related to any of these functionally characterized β-glucosidases (figure 4).
Similar to Z. filipendulae larvae, cyanogenic plants store CNglcs and their specific β-glucosidases in separate compartments, such as the vacuole versus the apoplast or chloroplast [3]. This ensures that toxic HCN is only released after tissue damage, when a cyanogenic β-glucosidase comes into contact with its substrates [9]. However, HCN release per se from the haemoplasma of Z. filipendulae differs from that in cyanogenic plants, since no obvious tissue damage to the larvae is necessary for HCN formation. This is consistent with the finding that Z. filipendulae larvae constantly release HCN [26,53]. Given that the total haemocyte concentration may vary in response to biotic stress, as shown for a polyphagous caterpillar [58], nutritional or mechanical stress in Z. filipendulae may similarly increase the amount of ZfBGD2 released from haemocytes, enabling a higher turnover of CNglcs. Further examples of compartmentalization between a glucosylated defence compound and its activating glucosidase in insects are found in the aphid specialists Brevicoryne brassicae and Lipaphis erysimi. They store an endogenous myrosinase in crystalline microbodies and sequester glucosinolates from their Brassica host plants into their haemolymph [6,59]. Disruption of this compartmentalization results in the release of biologically active isothiocyanates. P. striolata also produces its own myrosinase [10], but it is still unclear how these beetles avoid hydrolysis of stored glucosinolates [10]; a spatial separation similar to that in Z. filipendulae can therefore be envisioned.
Since CNglcs in Z. filipendulae are derived from either sequestration or biosynthesis, ZfBGD2 likely has a key role in regulating their overall levels [60]. Accordingly, ZfBGD2 was found in transcriptomes from Z. filipendulae larvae regardless of whether they were biosynthesizing or sequestering CNglcs. Interestingly, ZfBGD2 expression is 1.4 times higher in sequestering than in biosynthesizing larvae, as seen by comparing their transcriptomes [31]. This difference is not statistically significant, but could indicate that the turnover of CNglcs is slightly higher in sequestering larvae owing to the need to maintain the correct ratio of linamarin and lotaustralin regardless of the ratio in the ingested material [59].
Cuticular cavities harbouring defence droplets in Z. filipendulae larvae do not release HCN [29], although they contain even higher concentrations of CNglcs than crude haemolymph (approximately 25 µg µl−1 versus approximately 11 µg µl−1, respectively [60]). This is because the only β-glucosidase present in the defence droplets (ZfBGD1) cannot hydrolyse linamarin and lotaustralin [29]. However, if a larva is severely injured, CNglcs in the droplets will come into contact with exuding haemolymph and thus mix with the ZfBGD2 enzyme identified in this study. This interplay leads to increased HCN release (57 versus 31 nmol HCN h−1 µl−1 from haemolymph alone [29]), which is supported by the finding here that significantly more HCN is released from haemolymph spiked with recombinant ZfBGD2 than from haemolymph alone (figure 1c).
When the glucosyl moiety of linamarin or lotaustralin is cleaved by ZfBGD2, the resulting α-hydroxynitrile can dissociate spontaneously into HCN and a ketone, owing to the pH of 6.3 in the haemolymph of Z. filipendulae [29]. Below pH 6, an α-hydroxynitrile lyase would be needed for HCN release. Since α-hydroxynitrile lyase activity has been found in crude haemolymph of Z. trifolii [61], such an enzyme could perhaps accelerate cyanogenesis in the haemolymph. Since more HCN is released when ZfBGD2 is combined with crude haemolymph (figure 1c), the enzyme might act in sequence with another protein or other component, not present in Sf9 cells or media, to achieve a higher turnover. Future studies will reveal whether an α-hydroxynitrile lyase or other factors are needed for faster HCN release in Z. filipendulae.
Although ZfBGD2 is clearly the primary cyanogenic β-glucosidase in Z. filipendulae, ZfGBA1 also has activity against linamarin and lotaustralin. This candidate is expressed only in the integument and fat body of Z. filipendulae larvae, but the enzyme can be found in haemocytes and haemoplasma. Both integument and fat body contain CNglcs, but HCN was only released from the integument after homogenization, which would mix ZfGBA1 with linamarin and lotaustralin. This indicates that ZfGBA1 is not present in the same subcompartment as the CNglcs in this tissue. Therefore, the primary function of this glucocerebrosidase is perhaps not cyanogenesis but another, as yet unknown function, which is consistent with the negative (purifying) selection acting to maintain the gene. A study in D. melanogaster indicates that the glucocerebrosidase gene DmGba1b plays a role in the metabolism of protein aggregates in the insect brain [62]. A similar function could be envisioned for ZfGBA1 or its homologues in Z. filipendulae, although they are only 34–38% identical to DmGba1b (figure 4). The activity of ZfGBA1 against CNglcs found in this study is the first such activity assigned to any glucocerebrosidase, and gives an indication of the capabilities of this type of enzyme in insects.
Our phylogenetic analyses of ZfBGD2 show that cyanogenic β-glucosidases evolved convergently in plants and insects. This is similar to the evolution of the CNglc biosynthetic pathway in Z. filipendulae and plants [7], and suggests that there are few ways to biosynthesize or hydrolyse CNglcs, so that enzymes from specific families have to be recruited for these pathways. Since the positive selection found in the β-glucosidases is restricted to the putative signal peptide at the N-terminus of the proteins, it can be assumed that the function of activating CNglcs is ancient and has been maintained by negative (purifying) selection. Biosynthesis of linamarin and lotaustralin has also been hypothesized to be an old trait, at least within Zygaenidae, and most likely also in the common ancestor of butterflies and moths [15,63]. Accordingly, any bouts of positive selection upon recruitment of genes into the pathway have likely long since been masked by a long period of purifying selection that kept the sequence intact once the optimal sequence had been obtained [30]. The hydrolysis of CNglcs by a specific β-glucosidase in Z. filipendulae probably evolved at the same early time point as biosynthesis, or even earlier, since a capacity for CNglc turnover is necessary for the insect to utilize these compounds.

Conclusion
A cyanogenic β-glucosidase was shown to play a pivotal role in cyanogenesis in a specialized lepidopteran. Spatial separation of ZfBGD2 and its two cyanogenic substrates within the haemolymph (haemocytes versus haemoplasma) enables Z. filipendulae larvae to maintain a constant turnover of CNglcs and thus HCN formation. This compartmentalization of the CNglc/β-glucosidase system is similar to the situation in cyanogenic plants [64]. To our knowledge, ZfBGD2 is the first characterized arthropod β-glucosidase involved in CNglc catabolism, and it has evolved convergently with cyanogenic plant β-glucosidases.