Profiling of the full-length transcriptome in abdominal aortic aneurysm using nanopore-based direct RNA sequencing

Abdominal aortic aneurysm (AAA) is a common and serious disease with a high mortality rate, but its genetic determinants have not been fully identified. In this feasibility study, we aimed to elucidate the transcriptome profile of AAA and further reveal its molecular mechanisms using the Oxford Nanopore Technologies (ONT) MinION platform. Overall, 9574 novel transcripts and 781 genes were identified by comparing the non-redundant transcripts of all samples with known reference genome annotations. We characterized alternative splicing events, alternative polyadenylation events and simple sequence repeat (SSR) loci based on full-length transcriptome data, which should help to further understand the genome annotation and gene structure of AAA. Moreover, by determining the comprehensive expression profile of lncRNAs in AAA, we showed that ONT methods are suitable for the identification of lncRNAs. Differentially expressed transcript (DET) analysis showed that a total of 7044 transcripts were differentially expressed between the two groups, of which 4278 were upregulated and 2766 were downregulated. In the KEGG analysis, 4071 annotated DETs were involved in human diseases, organismal systems and environmental information processing. These pilot findings might provide novel insights into the pathogenesis of AAA and new ideas for optimizing the personalized treatment of AAA, and they merit further investigation.

HX, 0000-0002-9298-6262
[...] studies (GWAS) [7]. In addition, emerging evidence from GWAS has shown that genetic variants are strongly associated with a number of cardiovascular diseases and traits, including AAA, coronary artery disease and myocardial infarction, as well as vascular remodelling, blood pressure, and triglyceride, cholesterol and LDL metabolism [8,9].
Previous studies have shown that genetic components account for approximately 70% of total AAA susceptibility [10], suggesting genetic factors play a vital role in aetiology. However, the genetic determinants of AAA have not yet been fully determined.
In the present paper, the transcriptome of AAA was characterized using the Oxford Nanopore Technologies (ONT) MinION platform, and its possible molecular mechanism was further explored. By analysing the transcriptome data, we attempted to reveal the vital transcripts and pathways implicated in AAA. The study design is presented in figure 1. These results will help to provide critical insights into the pathogenesis of AAA for future searches for therapeutic targets.

Specimen gathering
AAA samples were obtained from five patients undergoing open surgical treatment, and peripheral blood samples from another five healthy subjects were collected as the control group (t group) at the Affiliated Hospital of Qingdao University between January 2019 and January 2020.

RNA extraction, cDNA library preparation and nanopore sequencing
Total RNA was isolated from the samples using TRIzol reagent (Invitrogen, Carlsbad, CA, USA). One microgram of total RNA was used for cDNA library construction with the cDNA-PCR sequencing kit (SQK-PCS109, Oxford Nanopore Technologies, Oxford, UK). Finally, the cDNA libraries were loaded onto FLO-MIN109 flow cells and run on the PromethION platform (Biomarker Technology Company, Beijing, China). All procedures were conducted according to the manufacturer's protocols.

Removal of redundant transcripts and identification of fusion transcripts
Minimap2 was used to align the consensus sequences to the reference genome. The mapped reads were then collapsed with the cDNA Cupcake package, using min-coverage = 85% and min-identity = 90%. Differences at the 5′ end were not taken into account when collapsing redundant transcripts. A single transcript was considered a fusion candidate if it met the following criteria: (i) a minimum distance of 10 kb between the loci, (ii) a total coverage of ≥ 95%, (iii) a minimum coverage of 5% and ≥ 1 bp at each locus, and (iv) mapping to two or more loci.
royalsocietypublishing.org/journal/rsob Open Biol. 12: 210172
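The four fusion criteria listed above can be sketched as a simple filter. This is only an illustrative sketch and not the cDNA Cupcake implementation: the `Locus` record, its field names and the function `is_fusion_candidate` are hypothetical, and treating loci on different chromosomes as automatically satisfying the distance criterion is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Locus:
    """One genomic locus to which part of a transcript aligns (hypothetical record)."""
    chrom: str
    start: int        # start of the aligned block on the genome
    end: int          # end of the aligned block on the genome
    query_cov: float  # fraction of the transcript covered by this locus (0 to 1)
    aligned_bp: int   # number of transcript bases aligned at this locus


def is_fusion_candidate(loci: List[Locus],
                        min_distance: int = 10_000,
                        min_total_cov: float = 0.95,
                        min_locus_cov: float = 0.05,
                        min_locus_bp: int = 1) -> bool:
    # (iv) the transcript must map to two or more loci
    if len(loci) < 2:
        return False
    # (ii) the loci together must cover at least 95% of the transcript
    if sum(locus.query_cov for locus in loci) < min_total_cov:
        return False
    # (iii) every locus must cover at least 5% of the transcript and at least 1 bp
    for locus in loci:
        if locus.query_cov < min_locus_cov or locus.aligned_bp < min_locus_bp:
            return False
    # (i) loci on the same chromosome must be at least 10 kb apart;
    # loci on different chromosomes are assumed to pass (assumption)
    for i, a in enumerate(loci):
        for b in loci[i + 1:]:
            if a.chrom == b.chrom:
                gap = max(a.start, b.start) - min(a.end, b.end)
                if gap < min_distance:
                    return False
    return True
```

For example, a transcript split 60%/38% between two loci on the same chromosome that lie about 20 kb apart would pass this filter, whereas the same split between loci 1 kb apart would not.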

Structural analysis of transcripts
Gffcompare was used to compare the transcripts with the annotated transcripts of the known reference. AStalavista and TransDecoder were employed to identify alternative splicing (AS) events, such as intron retention (IR), exon skipping (ES), alternative donor site (AD), alternative acceptor site (AA) and mutually exclusive exon (MEE), and alternative polyadenylation (APA) events, respectively. Coding sequences (CDSs) and simple sequence repeats (SSRs) were analysed with TransDecoder and MISA, respectively.

LncRNA analysis
LncRNA candidates were screened from the transcripts using a minimum length of 200 nucleotides and a minimum of two exons. The candidates were then further distinguished from coding transcripts by combining four computational methods: the coding potential assessment tool (CPAT), protein family (Pfam), the coding-noncoding index (CNCI) and the coding potential calculator (CPC).
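The screening logic described above can be illustrated with a short sketch. It assumes that "combined" means a transcript is kept only when all four tools agree that it is non-coding (an intersection); the text does not state this explicitly. The function name and the input dictionary are hypothetical, and the tools themselves are not run here.

```python
REQUIRED_TOOLS = ("CPAT", "Pfam", "CNCI", "CPC")


def is_lncrna_candidate(length_nt: int, exon_count: int, predicted_coding: dict) -> bool:
    """Return True if a transcript passes the length, exon and coding-potential filters.

    predicted_coding maps each tool name to True when that tool calls the
    transcript protein-coding (hypothetical input format).
    """
    # Structural filters: at least 200 nt long and at least two exons
    if length_nt < 200 or exon_count < 2:
        return False
    # Assumption: keep the transcript only if every tool calls it non-coding
    return all(predicted_coding[tool] is False for tool in REQUIRED_TOOLS)
```

Under this assumption, a 950 nt, three-exon transcript called non-coding by all four tools is retained, while one flagged as coding by any single tool is discarded.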

Quantification and differential expression analysis of gene/transcript expression
The full-length transcriptome sequences and the published reference transcripts were used for sequence alignment. Reads with a mapping quality greater than 5 were retained for quantification. Expression levels were measured as reads mapped per gene/transcript per 10 000 reads. For specimens with biological replicates, the DESeq R package (1.18.0) was used for differential expression analysis between the two conditions/groups. To control the false discovery rate (FDR), the resulting p-values were adjusted using Benjamini and Hochberg's approach. Genes identified by DESeq with FDR < 0.05 and fold change ≥ 2 were considered differentially expressed.
For specimens without biological replicates, read counts for each sequenced library were adjusted with the edgeR package prior to differential gene expression analysis. The EBSeq R package (1.6.0) was used for differential expression analysis between two samples, with the posterior probability of being differentially expressed (PPDE) used for FDR adjustment. FDR < 0.05 and fold change ≥ 2 were set as the thresholds for significant differential expression.
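The normalization, p-value adjustment and thresholding steps described above can be sketched in a few lines. This is a minimal illustration and does not reproduce the statistical models of DESeq or EBSeq: raw p-values and log2 fold changes are taken as given inputs, all function names are hypothetical, and applying the fold-change threshold symmetrically (|log2 fold change| ≥ 1) to label up- and downregulated transcripts is an assumption.

```python
import math


def per_10k(counts: dict) -> dict:
    """Scale raw read counts to reads per 10 000 reads."""
    total = sum(counts.values())
    return {name: n * 10_000 / total for name, n in counts.items()}


def benjamini_hochberg(pvalues: list) -> list:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def call_dets(ids, pvalues, log2_fold_changes, fdr_cut=0.05, fold_cut=2.0):
    """Label transcripts passing FDR < fdr_cut and fold change >= fold_cut."""
    fdr = benjamini_hochberg(list(pvalues))
    min_abs_log2 = math.log2(fold_cut)
    calls = {}
    for name, q, lfc in zip(ids, fdr, log2_fold_changes):
        if q < fdr_cut and abs(lfc) >= min_abs_log2:
            calls[name] = "up" if lfc > 0 else "down"
    return calls
```

With toy inputs such as four transcripts with p-values 0.01, 0.04, 0.03 and 0.005, the adjusted values are 0.02, 0.04, 0.04 and 0.02, so all four pass the FDR threshold and only the fold-change filter decides which are reported.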

Protein-protein interaction
Based on the results of the differential expression analysis, the predicted protein-protein interactions (PPI) of the differentially expressed transcripts (DETs) were obtained by BLAST alignment of the DET sequences against the genome of a related species whose protein interactions are available in the STRING database (http://stringdb.org/). Cytoscape was then used to visualize the PPI network of these DETs.

ONT sequencing overview
Overall, we constructed 10 transcriptome libraries in total (b1-b5 of the AAA group and t1-t5 of the control group), and 2.71 GB of clean data was output in each sample. After discarding the low-quality and short reads, 19 680 639 and 16 012 762 clean reads were obtained from the AAA and control group, with a mean N50 of 768 and 1241.8 bp, and the average read length of 820.8 and 1125 bp, respectively (electronic supplementary material, table S1 and figure S1). Additionally, the quality (Q) score distribution of the above 10 groups of raw reads was distributed between Q6 and Q18, with Q12 and Q13 accounting for the highest percentage (electronic supplementary material, figure S2). 13 977 197 and 10 088 571 clean reads were generated after removing rRNA, among which 88.07% and 84.18% were identified to be full length, respectively (electronic supplementary material table S2 and figure S3). Then, 1 to 24 fusion transcripts were obtained from each sample (electronic supplementary material, file S1). In total, 9574 novel transcripts and 781 genes were identified through comparing and analysing the redundant-removed transcripts of all samples with known reference genome annotation (electronic supplementary material, files S2 and S3).
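For readers unfamiliar with the read-length statistics reported above, the following sketch shows how N50 and mean read length are conventionally computed from a list of read lengths. The function names and the toy lengths in the usage note are hypothetical and are not taken from the study data.

```python
def n50(lengths: list) -> int:
    """Length L such that reads of length >= L contain at least half of all bases."""
    if not lengths:
        raise ValueError("at least one read length is required")
    total = sum(lengths)
    running = 0
    # Accumulate bases from the longest read downwards
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def mean_length(lengths: list) -> float:
    """Arithmetic mean of the read lengths."""
    return sum(lengths) / len(lengths)
```

For the toy lengths 2, 3, 4, 5, 6, 7, 8, 9 and 10 (54 bases in total), the three longest reads already hold 27 bases, so the N50 is 8 and the mean length is 6.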

Characterization of alternative splicing, alternative polyadenylation and SSR
Within the ONT data, a total of 13 427 AS events were detected and grouped into five classes, including 7339 ES events, 1976 alternative 3′ sites (Alt. 3′), 1827 IR events, 1730 alternative 5′ sites (Alt. 5′) and 555 mutually exclusive exon events (figure 2a,b). DETs with different AS events were of further concern (figure 2c). The identified APA of each sample is shown in the electronic supplementary material, figure S4, and the motif analysis of 50 bp sequences upstream of the polyA site of all transcripts is shown in electronic supplementary material, figure S5. In addition, transcripts longer than 500 bp were selected for SSR analysis by MISA. The result showed that a total of 38 844 SSRs were detected from ONT data (electronic supplementary material, file S4). SSR loci repeat units were 1 to 6 bases, of which the most frequent SSRs identified were p1 (21 458), followed by p3 (7700), p2 (4981), p4 (731) and p5 (182); few p6 (69) were discovered. Additionally, compound SSR and compound SSR with overlapping positions were 3558 and 165, respectively (figure 2d).
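As a quick consistency check on the counts reported above, the per-class numbers can be summed and compared with the stated totals. The sketch below uses only figures given in the text; the dictionary keys and variable names are illustrative, and counting the two compound SSR classes as separate, non-overlapping categories is an assumption that the matching total supports.

```python
# AS event counts by class, as reported in the text
as_events = {"ES": 7339, "Alt. 3'": 1976, "IR": 1827, "Alt. 5'": 1730, "MEE": 555}

# SSR counts by repeat-unit length (p1 to p6) plus the two compound classes
ssr_counts = {
    "p1": 21458, "p2": 4981, "p3": 7700, "p4": 731, "p5": 182, "p6": 69,
    "compound": 3558, "compound_overlapping": 165,
}

as_total = sum(as_events.values())    # equals the reported 13 427
ssr_total = sum(ssr_counts.values())  # equals the reported 38 844

# Share of each class, e.g. exon skipping among all AS events
es_share = as_events["ES"] / as_total
p1_share = ssr_counts["p1"] / ssr_total

print(f"AS total: {as_total}, ES share: {es_share:.1%}")
print(f"SSR total: {ssr_total}, p1 share: {p1_share:.1%}")
```

Both sums reproduce the totals given in the text, with exon skipping accounting for a little over half of the AS events and mononucleotide repeats for a little over half of the SSRs.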

Dynamic expression of transcripts in abdominal aortic aneurysm
To obtain annotation information for the transcripts, the novel transcript sequences were aligned to databases such as NR, COG, KOG, Swissprot, GO, Pfam and KEGG. Functional annotation was conducted on the novel transcripts (electronic supplementary material, file S5), and the quantity of transcripts annotated in each database is listed in electronic supplementary material, [...] 4b). In general, the expression of transcripts is temporally and spatially specific. DETs were defined as those with significantly different expression levels under two different conditions. DET analysis showed that 7044 transcripts in total were differentially expressed, of which 4278 were upregulated and 2766 were downregulated. The volcano plot of differential expression is shown in figure 5a. The overall distribution of expression levels and fold changes of transcripts in the two groups can be clearly seen in the MA plot (figure 5b). Additionally, hierarchical clustering analysis of the DETs showed significant differences in their expression profiles (figure 5c).

Functional annotation of differentially expressed transcripts in abdominal aortic aneurysm
Database function annotation of the DETs was then performed, and the number of annotated transcripts is listed in electronic supplementary material, table S5. GO enrichment analysis of the DETs (figure 6a) revealed a number of highly enriched biological processes, such as single-organism process, cellular process, metabolic process and biological regulation. Enrichment for the cellular components cell, cell part, organelle and membrane was also observed. In the molecular function category, the target genes were mainly involved in the regulation of binding, catalytic activity, molecular function regulator and signal transducer activity. In the KEGG analysis, organismal systems, human diseases and environmental information [...]
[Figure residue: sample labels b1-b5 and t1-t5; GO term axis labels]
[...] gamma R-mediated phagocytosis, natural killer cell-mediated cytotoxicity and intestinal immune network for IgA production were the top three pathways.
A total of 1381 DETs were classified as belonging to environmental information processing, in which the PI3K-Akt, NF-kappa B and calcium signalling pathways were the top three. Then, to further clarify the molecular function of DETs from AAA, they were allocated to COG classification and separated into 26 specified categories (figure 7a). The results revealed that the top hits include 'protein turnover, posttranslational modification and chaperones', 'signal transduction mechanisms', and 'ribosomal structure, translation and biogenesis' in both groups. Additionally, the Cytoscape visualization of the DETs protein interaction network is shown in figure 7b.

Discussion
AAA is one of the most common causes of death and disability in cardiovascular disease, particularly in the elderly population, and imposes an exorbitantly high financial burden on society. Apart from a small percentage of incidental findings through ultrasound-based screening programmes, clinical diagnosis is usually made at an advanced stage [11,12]. The risk of developing AAA is now considered to result from a combination of personal lifestyle, environmental factors, genetic factors, and some physiological parameters or disease conditions, such as tobacco smoking history, increased age, male sex, cholesterol level, obesity, trauma, acute or chronic infection, connective tissue or inflammatory diseases, and family history [2,13-16]. Importantly, genetic components account for approximately 70% of the total susceptibility to AAA according to some estimates, suggesting that genetic factors may play a key part in aetiology. Interestingly, several studies have also reported a strong link between aneurysm rupture and a family history of AAA [17,18]. Therefore, identifying the genetic foundations of AAA will provide insights into the pathogenesis of the disease, and ultimately guide early surveillance, diagnosis, intervention and clinical decision-making.
Recently, a growing body of evidence has shown an association between AAA and several microbial infections, and there is increasing interest in this line of aetiologic investigation [19,20]. In this study, we identified 9574 novel transcripts and 781 genes from 10 libraries by third-generation nanopore-based RNA sequencing combined with emerging genomic technologies, detected the dynamic expression of transcripts, and further performed functional annotation and GO enrichment analysis for DETs. In addition, KEGG analysis showed that infection-related pathways within the human diseases category (such as Staphylococcus aureus, tuberculosis and Epstein-Barr virus infection) were highly represented among the annotated DETs of AAA samples. Matsui & Hatta [21] have reported a case of AAA in a patient with dialysis-related methicillin-resistant Staphylococcus aureus bacteraemia. Pathologically, Staphylococcus aureus may attach to the damaged intima by producing dextran, thereby invading the highly calcified arterial wall and causing mycotic aneurysm during bacteraemia. Previous studies have shown that the tubercle bacillus can infect the aortic wall and cause AAA. Although this is a particularly rare complication of tuberculosis, such an aneurysm usually ruptures easily and causes serious clinical events [22,23]. Infection with or reactivation of Epstein-Barr virus can lead to a variety of lymphoproliferative diseases and other less frequent clinical complications, including haematologic malignancies, coronary artery aneurysm, etc. [24]. Several case reports suggest that Epstein-Barr virus infection may be related to coronary artery aneurysm and abdominal aortic lesions [25-27]. Unfortunately, the pathophysiologic mechanisms are still unclear and need further exploration.
AS and APA of RNAs are two conventional mechanisms for producing isoform diversity, leading to the production of different proteins necessary to maintain biological traits and function [28,29]. It is reported that more than 90% of human multi-exon genes undergo AS and almost 20% of genes have APA sites in introns [30,31]. In a previous aortic aneurysm study, mRNA expression and AS analysis of the identified proteins revealed different fingerprints between the bicuspid and tricuspid groups in dilated and non-dilated aortic tissue, implying that AS may play a key role in the formation of aortic aneurysm in patients with bicuspid and tricuspid aortic valves [32]. Additionally, Martin et al. [33] have demonstrated that decreased soluble guanylyl cyclase (sGC) activity in aortic aneurysms was associated with increased expression of abnormal sGC splicing variants, suggesting that AS contributes to diminished sGC function in vascular dysfunction. Similarly, APA is a ubiquitous mechanism in eukaryotic cells and is crucial for diverse cellular processes, such as mRNA metabolism, cell proliferation and differentiation, protein localization and diversification, and, more commonly, gene regulation [34]. Importantly, some studies have revealed that it plays a fundamental role in the development of human diseases. For example, in the failing heart, the 3′-end formation of numerous mRNAs is changed, corresponding to the decrease of poly(A)-binding protein nuclear-1 expression [35]. APA contributes to cardiomyocyte hypertrophy by changing the expression of hypertrophy genes [36]. This evidence suggests that specific APA events may participate in the development of cardiovascular disease. Regrettably, there is almost no relevant research on APA in AAA.
In our study, AS, APA and SSR events were initially identified in two groups, and future studies that address the pathophysiological consequences of these events are needed to evaluate their role in the pathogenesis of AAA and whether manipulation of these changes can be considered a therapeutic option for AAA.
LncRNAs are considered to be engaged in numerous vital biological processes as crucial regulators of gene expression [37]. Third-generation nanopore RNA sequencing is helpful for identifying the genetic structure of lncRNAs. Previous studies have shown that several essential lncRNAs may be involved in regulating the progression of AAA [38]. In the current study, we determined the comprehensive expression profile of lncRNAs in the two groups and showed that ONT methods are suitable for the identification of lncRNAs.
In summary, third-generation nanopore-based RNA sequencing was used to explore the regulatory mechanisms of AAA. In particular, this study represents an initial comprehensive analysis of AS, APA and SSR events in AAA. These findings may provide novel insights into the pathogenesis of AAA, and future research should address the pathophysiological consequences of these changes in order to assess their role in the pathogenesis of AAA and whether manipulating them can be considered as a treatment option for AAA.
Ethics. This study was authorized and supervised by the Ethics Committees of the Affiliated Hospital of Qingdao University, and each participant gave written informed consent.