Biochemical analysis of nucleosome targeting by Tn5 transposase

Tn5 transposase is a bacterial enzyme that integrates a DNA fragment into genomic DNA, and it is used as a tool for detecting nucleosome-free regions of genomic DNA in eukaryotes. However, how Tn5 transposase targets DNA in chromatin has remained unclear. In the present study, we reconstituted well-positioned 601 dinucleosomes, in which two nucleosomes are connected by a linker DNA, and mapped the sites at which Tn5 transposase integrates DNA into these dinucleosomes in vitro. We found that Tn5 transposase preferentially targets sites near the entry–exit DNA regions within the nucleosome. Tn5 transposase cleaved the dinucleosome without a linker DNA only minimally, indicating that the linker DNA between two nucleosomes is important for Tn5 transposase activity. In the presence of a 30-base-pair linker DNA, Tn5 transposase targets the middle of the linker DNA in addition to the entry–exit sites of the nucleosome. Intriguingly, this targeting pattern is conserved in a dinucleosome substrate whose DNA sequence differs from the 601 sequence. Therefore, the Tn5-targeting preference in nucleosomal templates reported here provides important information for interpreting Tn5 transposase-based genomics methods, such as ATAC-seq.


Introduction
Chromatin is the eukaryotic nuclear architecture by which genomic DNA is highly compacted and accommodated within a nucleus [1]. In chromatin, the core histones H2A, H2B, H3 and H4 form the histone octamer, and approximately 150 base-pairs of DNA are bound to the histone octamer surface. The DNA is wrapped left-handedly around the histone octamer, forming the nucleosome [1-3]. Nucleosomes are connected by linker DNA segments in chromatin. In the nucleus, linker DNA lengths are not uniform; they vary with genomic locus and cell type and are determined by the translational positions of the nucleosomes [4-10]. The DNA directly bound to the histone surface within the nucleosome is usually inaccessible to DNA-binding proteins that regulate transcription, replication, repair and recombination [10-13]. The histone-free linker DNA regions therefore become the target sites for these DNA-binding proteins [14,15]. Consequently, nucleosome positioning is an important regulatory element for genomic DNA compaction and regulation.
To probe nucleosome positioning in cells, nuclease hypersensitivity assays are commonly used; in these assays, DNA regions without nucleosomes (nucleosome-free regions, NFRs) and the linker DNA regions between nucleosomes are preferentially digested. Deoxyribonuclease I (DNase I) is an endonuclease that preferentially cleaves double-stranded DNA and is employed to detect nucleosome-free DNA regions in the genome [16,17]. Micrococcal nuclease is an endo-exonuclease that preferentially digests DNA stretches without nucleosomes, such as linker DNAs, and is used to analyse nucleosome occupancy and positioning in cells [10,16,17]. These nucleosome-mapping methods are combined with high-throughput DNA sequencing and are commonly used for chromatin analysis, but they usually require multiple steps and/or large numbers of cells [16,17].
The assay for transposase-accessible chromatin using sequencing (ATAC-seq) was developed to detect NFRs and linker DNA regions with a simple procedure and small amounts of input material; it relies on the activity of Tn5 transposase followed by high-throughput DNA sequencing [18]. Tn5 transposase technology also enables mapping of the distributions of histone modifications and DNA-binding proteins in 100-1000 cells by the chromatin integration labelling method [19] and in as few as 60 cells by the CUT&Tag method [20]. In these methods, the genomic DNA sequences associated with the target molecules are tagged with adaptor DNA sequences by Tn5 transposase-mediated integration.
Bacterial Tn5 transposase promotes the transposition of the Tn5 transposon [21]. The Tn5 transposon contains two inverted repeats, IS50L and IS50R, and each IS50 repeat is flanked by two different 19-base-pair Tn5 transposase recognition sequences: the outside end and the inside end, which are located at the Tn5 transposon-host genomic DNA boundary and within the Tn5 transposon, respectively. Tn5 transposase is encoded in IS50R and catalyses Tn5 transposition in bacterial cells. The first step of the transposition reaction is the formation of a synaptic complex containing a Tn5 transposase dimer bound to the two outside end sequences of the Tn5 transposon. Tn5 transposase excises the transposon from the flanking genomic DNA via hairpin formation at the transposon DNA ends and releases the synaptic complex, leaving a 3'-OH group at each blunt end of the excised transposon DNA fragment [22]. These 3'-OH groups attack the phosphodiester bonds of the target DNA, integrating the transposon into the target site within the synaptic complex [23,24]. In the integration step, Tn5 transposase cleaves the target DNA in a staggered manner, leaving nine-nucleotide 5' overhangs, and the resulting nine-base-pair gaps flanking the transposon are filled by cellular DNA polymerase [25]. In vitro, Tn5 transposase efficiently catalyses transposition using synthetic oligonucleotide duplexes (adaptor DNAs) in the presence of Mg2+ ions [26,27].
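The staggered cleavage and gap filling described above can be illustrated with a short Python sketch. It is a simplified model, assuming an idealized 9-nt stagger and tracking only the top strand; the target sequence, the cut index `pos`, the placeholder insert string and the function names `target_site_duplication` and `tn5_insert` are hypothetical and not taken from any library.

```python
# Sketch of Tn5 staggered strand transfer and the 9-bp target-site duplication.
# Assumptions: idealized 9-nt stagger, top strand only, hypothetical sequences.

TN5_STAGGER = 9  # nt; from the nine-nucleotide 5' overhangs described in the text


def target_site_duplication(target: str, pos: int, stagger: int = TN5_STAGGER) -> str:
    """Return the target segment that ends up on both sides of the insert.

    `pos` is the 0-based index of the first base to the right of the
    top-strand cut; the bottom-strand cut lies `stagger` bp further right.
    """
    if not 0 < pos <= len(target) - stagger:
        raise ValueError("cut site too close to the end of the target")
    return target[pos:pos + stagger]


def tn5_insert(target: str, pos: int, insert: str, stagger: int = TN5_STAGGER) -> str:
    """Top-strand sequence after strand transfer and gap filling.

    The left target half carries a 9-nt 5' overhang on the bottom strand;
    filling the opposite gap copies target[pos:pos + stagger] onto the top
    strand, so that segment appears on both sides of the insert.
    """
    dup = target_site_duplication(target, pos, stagger)
    return target[:pos] + dup + insert + target[pos:]


if __name__ == "__main__":
    t = "AAAACCCCGGGGTTTTAAAA"
    print(tn5_insert(t, 4, "NNNNN"))  # AAAACCCCGGGGTNNNNNCCCCGGGGTTTTAAAA
```

With separate adaptor DNAs in vitro, the two target halves are not rejoined: each half is released carrying an adaptor on one strand and a 9-nt gap. In Tn5-based library protocols, an initial polymerase extension step (such as the 72°C incubation at the start of the PCR described in the Methods) is commonly used to fill such gaps.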
The ATAC-seq method has been applied to map nucleosome positioning and relaxed 'active' chromatin regions, which correlate with DNase I hypersensitive sites, in diverse cell types and developmental stages [17,18,28-32]. However, it has remained unclear how Tn5 transposase selects adaptor DNA integration sites in chromatin substrates. In the present study, we reconstituted dinucleosomes with various linker DNA lengths and mapped the Tn5 transposase-mediated adaptor DNA integration sites in these model chromatin substrates.

Tn5 transposase integrates DNA fragments at a specific site in the nucleosomal DNA
To identify the sites targeted by Tn5 transposase in chromatin, we performed in vitro Tn5 transposase assays with nucleosomal DNA substrates. Purified Tn5 transposase was incubated with short oligonucleotide duplexes (adaptor DNAs), and the Tn5 transposase complexed with adaptor DNAs (Tn5-DNA complex) was purified by gel filtration chromatography (electronic supplementary material, figure S1A-C). The adaptor DNA integration reaction by Tn5 transposase was conducted with the reconstituted dinucleosome as the target substrate (figure 1a). The dinucleosome was reconstituted on the 601 sequence, which forms a nucleosome at a single, defined position (figure 1b). In the dinucleosome, two nucleosomes were connected by a 15-base-pair linker DNA (figure 1b, upper panel). In this experimental system, the reaction products can be detected as DNA fragments, because Tn5 transposase cleaves the target DNA and integrates the adaptor DNAs at the cleavage site (figure 1a). The DNA fragments were detected by non-denaturing polyacrylamide gel electrophoresis (PAGE).
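Because each integration event splits the template in two, the expected band sizes follow from simple arithmetic. The Python sketch below assumes one integration per molecule, ignores the 9-nt stagger, and treats the length of the attached adaptor (`adaptor_len`) as a free, hypothetical parameter; the template length and cut positions in the example are illustrative, not measured sites, and `predicted_fragments` is a name introduced here.

```python
# Predicted products of single Tn5 integration events on a linear template.
# Assumptions: one integration per molecule, 9-nt stagger ignored,
# hypothetical adaptor length and cut positions.


def predicted_fragments(template_len: int, cut_sites, adaptor_len: int = 0):
    """Return (left, right) fragment lengths in bp for each cut site.

    A cut before 0-based position p splits the template into p bp and
    template_len - p bp; each piece carries one adaptor.
    """
    products = []
    for p in cut_sites:
        if not 0 < p < template_len:
            raise ValueError(f"cut site {p} lies outside the template")
        products.append((p + adaptor_len, template_len - p + adaptor_len))
    return products


if __name__ == "__main__":
    # One preferred site -> one pair of bands; many sites -> a ladder of bands.
    print(predicted_fragments(300, [150]))
    print(predicted_fragments(300, [100, 200], adaptor_len=19))
```

On non-denaturing gels, 601-containing DNA may not migrate strictly according to its length, so such predictions are only a guide for assigning bands.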
In the presence of the naked DNA substrate, the Tn5-DNA complex generated multiple DNA fragments, representing integration of the adaptor DNAs at different sites of the target DNA (figure 1c, lanes 2-6). In contrast, with the dinucleosome substrate, the Tn5-DNA complex appeared to cleave a single site, producing two DNA fragments (figure 1c). These data suggest that the adaptor DNAs were integrated at a specific site (figure 1c, lanes 8-12). Short minor products were observed when the reactions were conducted for longer times (figure 1c, lanes 9-12). These minor products may result from the periodicity of Tn5 integration in chromatin, as previously reported [31].
royalsocietypublishing.org/journal/rsob Open Biol. 9: 190116
are located in the middle of the linker DNA region (figure 2c; electronic supplementary material, table S1). This additional cleavage site was not observed as a major site for the naked DNA template (electronic supplementary material, table S1), suggesting that dinucleosome formation with the 30-base-pair linker DNA may dictate the cleavage site in the linker DNA. These findings suggest that the middle of the linker DNA region, which may not directly contact the histone surface, can additionally be cleaved by Tn5 transposase when the linker DNA length is extended to around 30 base-pairs.

Effect of the dinucleosomal DNA sequence on the integration reaction by Tn5 transposase
To test whether the DNA sequence affects the integration site in the dinucleosome, we conducted the integration reaction by the Tn5-DNA complex using dinucleosome substrates containing the 603 sequence, which differs from the 601 sequence (electronic supplementary material, figure S3A). Two 603 dinucleosome substrates, with 15-base-pair and 30-base-pair linker DNAs, were prepared (figure 3a; electronic supplementary material, figure S3B and C). In the Tn5 transposase integration assay with naked DNAs, several discrete bands were detected for the naked 603 DNA substrates, and the band patterns differed from those of the naked 601 DNA substrates (figure 3b, lanes 3,7,11,15; electronic supplementary material, table S1). This indicates that the Tn5-DNA complex may exhibit a DNA sequence preference in the integration reaction, consistent with previous reports [18,33,34]. However, in both the 601 and 603 experiments, the specific bands observed in the integration reactions with the dinucleosome substrates differed from those obtained with the naked DNA substrates (figure 3b). Therefore, the sequence preference of the Tn5-DNA complex may not be a determinant of the specific integration site in the dinucleosome. Intriguingly, the band patterns of the 601 dinucleosomes (15-base-pair and 30-base-pair linker DNAs) were quite similar to those of the 603 dinucleosomes (figure 3b, lanes 5,13 and 9,17). These results suggest that, in the dinucleosome substrates, the DNA targeting sites of the Tn5-DNA complex may be dictated by the local DNA environment created by nucleosome formation, rather than by DNA sequence preference.
Figure 2 (partial caption). After deproteinization, the DNA fragments were analysed by non-denaturing PAGE with SYBR Gold staining. The bands marked with * are annealing products of DNA fragments containing partially single-stranded DNA. Black, white and grey arrowheads indicate the longer, middle and shorter DNA fragments produced by the Tn5 transposase reaction, respectively. The substrate DNAs exhibited unusual migration profiles that do not correspond to the DNA length, probably owing to the structural nature of the 601 sequence. The DNA sequences of these template DNAs were confirmed by direct sequencing. (c) Profile of the Tn5 transposase cleavage sites for the dinucleosome containing the 30-base-pair linker DNA. The DNA fragments tagged in the Tn5 transposase assay were analysed by massively parallel paired-end sequencing, and the 5' ends of the fragments were mapped onto the substrate DNA sequence, as described in the Methods. The two major cleavage sites are shown in the schematic.
It should be noted that, in the 601 dinucleosome substrate, the Tn5 transposase targeting site is restricted to the proximal nucleosome (figures 1d and 2c). In contrast, in the 603 dinucleosome, the proximal and distal nucleosomes were equally targeted by Tn5 transposase (figure 4a,b). The region of the 601 sequence containing the preferential Tn5 transposase targeting site reportedly binds the histone surface with lower affinity than the rest of the sequence [35,36]. Therefore, these results suggest that Tn5 transposase may preferentially target nucleosomal DNA regions that transiently detach from the histone surface.
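The proximal-versus-distal comparison above can be summarized by a single number computed from a cleavage-site profile. The Python sketch below is one hypothetical way to do this, not a metric reported in this study: it assumes a mapping from substrate position to the number of mapped 5' ends and two half-open coordinate windows around the proximal and distal nucleosome entry-exit regions, whose boundaries are placeholders.

```python
# Hypothetical asymmetry metric for Tn5 cleavage between two nucleosomes.
# Assumptions: `counts` maps 0-based position -> number of mapped 5' ends;
# window boundaries are placeholders chosen by the analyst.


def proximal_fraction(counts, proximal_window, distal_window):
    """Fraction of window-restricted cleavage events in the proximal window.

    Windows are half-open (start, end) ranges. Values near 1 mean cleavage is
    restricted to the proximal nucleosome; values near 0.5 mean both
    nucleosomes are targeted about equally.
    """
    def in_window(pos, window):
        return window[0] <= pos < window[1]

    prox = sum(n for pos, n in counts.items() if in_window(pos, proximal_window))
    dist = sum(n for pos, n in counts.items() if in_window(pos, distal_window))
    if prox + dist == 0:
        raise ValueError("no cleavage events in either window")
    return prox / (prox + dist)


if __name__ == "__main__":
    profile = {10: 8, 12: 2, 300: 10}  # illustrative counts, not real data
    print(proximal_fraction(profile, (0, 20), (290, 310)))
```

Under this metric, the 601 pattern described in the text would be expected to score close to 1 and the 603 pattern close to 0.5.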

Discussion
In the ATAC-seq method, Tn5 transposase has been employed to insert an adaptor DNA fragment in the nucleosome-free DNA regions and linker DNA regions between nucleosomes. The ATAC-seq signals are obtained as the integration sites of the adaptor DNA, and provide the nucleosome locations in chromatin with small amounts of input materials [18,31,37]. Therefore, the ATAC-seq method has been widely used for chromatin analysis. However, the specific location of the Tn5 transposase targeting site in nucleosomal DNA has not been clarified yet.
In the present study, we performed an in vitro Tn5 transposase assay with reconstituted dinucleosomes and found that Tn5 transposase preferentially integrates adaptor DNAs near the entry-exit sites of the nucleosomal DNA. One DNA site cleaved by Tn5 transposase is located within the histone-DNA contact region of the nucleosome (figures 1d, 2c and 4). This Tn5 transposase target site did not depend on the linker DNA length (figure 2b). It should be noted that the 5'-140 site, the major cleavage site in the dinucleosome substrates with the 601 sequence, was also observed as a preferential Tn5 transposase cleavage site in the 601 naked DNA substrates (electronic supplementary material, table S1). This sequence preference of Tn5 transposase was not observed when the 603 sequence was used for the dinucleosome and naked DNA substrates (figures 3 and 4; electronic supplementary material, table S1). Therefore, sequence preference may not be a major cause of the specific nucleosome targeting by Tn5 transposase. Why does Tn5 transposase prefer to target the entry-exit sites of the nucleosomal DNA? One plausible explanation is 'nucleosomal DNA breathing' [38]: in the nucleosome, the entry-exit DNA regions spontaneously detach from and re-attach to the histone surface [38]. Tn5 transposase may cleave the nucleosomal DNA when the target site is stochastically detached from the histone surface by breathing, which transiently renders the entry-exit DNA region accessible to the enzyme.
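One common way to formalize this breathing idea, which was not tested in the present study, is the two-state site-exposure model. Here a target site in nucleosomal DNA is assumed to interconvert between a wrapped (inaccessible) and an unwrapped (accessible) state with equilibrium constant K_eq, Tn5 transposase is assumed to act only on the unwrapped state, and unwrapping is assumed to equilibrate faster than cleavage. The apparent cleavage rate k_obs of a nucleosomal site relative to the rate k_naked for the same site on naked DNA is then the fraction of time the site is exposed:

```latex
k_{\mathrm{obs}} = k_{\mathrm{naked}} \, \frac{K_{\mathrm{eq}}}{1 + K_{\mathrm{eq}}}
\;\approx\; k_{\mathrm{naked}} \, K_{\mathrm{eq}} \qquad (K_{\mathrm{eq}} \ll 1)
```

Because K_eq would be expected to be largest for sites near the DNA ends, and to differ between the two ends of an asymmetrically bound sequence such as 601, this model would favour cleavage at the entry-exit regions and on the more weakly bound side of the nucleosome.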
In the 601 dinucleosome, the integration reaction by Tn5 transposase occurred in the proximal nucleosome but rarely in the distal nucleosome (figures 1d and 2c). This result is consistent with reports that the 601 sequence undergoes asymmetric nucleosomal DNA breathing owing to differences in DNA flexibility between its two halves [35,36,38]. Interestingly, Tn5 transposase attacks both the proximal and distal nucleosomes when the 603 sequence is employed (figure 4). The nucleosome containing the 603 sequence may provide Tn5 transposase attack sites at both nucleosomal DNA ends, because both ends may bind the histones with similar affinity.
As discussed above, Tn5 transposase preferentially targeted the entry-exit sites of the nucleosomal DNA when the linker DNA length ranged from 5 to 25 base-pairs (figure 2b). However, we also found that, in the dinucleosome containing the 30-base-pair linker DNA, Tn5 transposase additionally targeted the middle of the linker DNA (figures 2b and 3b). Therefore, ATAC-seq may specifically detect the nucleosome entry-exit sites when the linker length is shorter than 30 base-pairs. In cells, the linker DNA length varies among species and genomic regions. Short 5-base-pair linker DNAs exist in mouse embryonic stem cells [9] and in the yeasts Saccharomyces cerevisiae [7] and Schizosaccharomyces pombe [8]. In S. cerevisiae, the median linker DNA length is reportedly 23 base-pairs [7], and the average linker DNA length around the transcription start sites of genes is 18 base-pairs [5]. Schep et al. [31] analysed nucleosome positioning in the S. cerevisiae genome using an ATAC-seq-based approach and found that the most abundant DNA fragment size for nucleosome mapping by Tn5 transposase was 143 base-pairs, which is shorter than the DNA length associated with the nucleosome (145-147 base-pairs). This is consistent with our finding that Tn5 transposase cleaves at the entry-exit regions within the nucleosomal DNA. In human primary CD4+ T cells, the average linker DNA lengths have been estimated at approximately 30 base-pairs in regions carrying the epigenetic marks of active promoters and enhancers, and 58 base-pairs in regions carrying heterochromatin marks [6]. In mouse embryonic stem cells, the linker DNA length frequencies reportedly peak at 35 base-pairs and 45 base-pairs [6]. In these cases, Tn5 transposase may attack both the nucleosomal entry-exit and linker DNA regions.
These new findings provide important information for interpreting ATAC-seq results in cells and/or genomic loci with different linker lengths.
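The linker-length dependence discussed above can be turned into a rough expectation of where Tn5 signal should fall on a dinucleosome. The Python sketch below is an illustrative model only. It assumes a 147-bp nucleosome core, approximates the entry-exit sites by the nucleosome boundaries, adds a linker-midpoint site only when the linker reaches a hypothetical 30-bp threshold (only 5-25 bp and 30 bp linkers were tested here), and ignores the proximal/distal asymmetry seen with the 601 sequence. The class and function names and the `flank_left` parameter are introduced for illustration.

```python
# Illustrative expectation of Tn5 target classes on a dinucleosome layout.
# Assumptions: 147-bp cores, entry-exit sites approximated by nucleosome
# boundaries, hypothetical 30-bp threshold for linker-middle targeting.
from dataclasses import dataclass

CORE_BP = 147  # assumed nucleosome core length (145-147 bp in the text)


@dataclass
class Dinucleosome:
    flank_left: int  # hypothetical free DNA before nucleosome 1 (bp)
    linker: int      # linker DNA between the two nucleosomes (bp)

    def nucleosome_spans(self):
        """Half-open (start, end) coordinates of the two nucleosome cores."""
        n1 = (self.flank_left, self.flank_left + CORE_BP)
        n2_start = n1[1] + self.linker
        return n1, (n2_start, n2_start + CORE_BP)


def predicted_targets(d: Dinucleosome, linker_threshold: int = 30):
    """Return (label, position) pairs sorted by position."""
    if d.linker == 0:
        return []  # without a linker, cleavage was minimal in vitro
    (s1, e1), (s2, e2) = d.nucleosome_spans()
    targets = [("entry-exit", s) for s in (s1, e1, s2, e2)]
    if d.linker >= linker_threshold:
        targets.append(("linker-middle", e1 + d.linker // 2))
    return sorted(targets, key=lambda t: t[1])


if __name__ == "__main__":
    for linker in (0, 15, 30):
        print(linker, predicted_targets(Dinucleosome(flank_left=0, linker=linker)))
```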
The prototype foamy virus integrase reportedly integrates viral DNA efficiently into nucleosomal DNA [39]. A cryo-electron microscopy structure of the integrase bound to a nucleosome revealed that the integrase specifically targets the nucleosomal DNA at a position located 3.5 helical turns away from the nucleosomal dyad [39]. In contrast to the foamy virus integrase, Tn5 transposase requires a linker DNA between two nucleosomes for adaptor DNA integration and preferentially targets the nucleosomal DNA near the entry-exit site. To clarify the mechanisms by which Tn5 transposase targets nucleosomal DNA, structural studies of the nucleosome complexed with Tn5 transposase are awaited.

Preparation of dinucleosomes
Human histones were prepared as described previously [40]. Briefly, H2A, H2B, H3 and H4, each cloned into the pET15b vector, were expressed as His6-tagged proteins in Escherichia coli and purified by Ni-NTA agarose column chromatography (Qiagen), followed by thrombin (Wako) treatment and Mono S column chromatography (GE Healthcare). The histone octamer was reconstituted from purified H2A, H2B, H3 and H4 and purified by size-exclusion chromatography (Superdex 200 16/60, GE Healthcare), as described previously [40]. DNA fragments containing the 601 or 603 sequences [41] were cloned into the pGEM-T Easy vector (Promega), and the plasmids were prepared from E. coli cells. The fragments were excised by EcoRV digestion and purified by polyethylene glycol precipitation and anion-exchange column chromatography. The DNA fragments were mixed with the histone octamer in 10 mM Tris-HCl (pH 7.5) buffer containing 2 M KCl, 1 mM EDTA and 1 mM dithiothreitol, and the dinucleosomes were reconstituted by the salt-dialysis method, continuously decreasing the KCl concentration to 250 mM [42]. The sequences of these DNA fragments are shown in electronic supplementary material, figure S4. The resulting dinucleosomes were fractionated by PAGE using a Prep Cell apparatus (Bio-Rad), as described previously [42]. The dinucleosome concentrations were estimated from the absorbance at 260 nm.
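Estimating dinucleosome concentration from the absorbance at 260 nm amounts to estimating the DNA concentration. A minimal Python sketch follows, assuming the conventional approximation of 50 µg/ml double-stranded DNA per A260 unit (1 cm path length), an average mass of 650 g/mol per base pair, and a negligible histone contribution at 260 nm. The DNA lengths in the example are hypothetical; the actual sequences are given in figure S4.

```python
# A260-based estimate of DNA (and hence dinucleosome) concentration.
# Assumptions: 50 ug/ml dsDNA per A260 unit (1 cm path), 650 g/mol per bp,
# histone absorbance at 260 nm neglected; DNA lengths are hypothetical.

DSDNA_UG_PER_ML_PER_A260 = 50.0
MW_PER_BP = 650.0  # g/mol per base pair (approximate average)


def dsdna_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """DNA mass concentration (ug/ml) of the undiluted sample."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def particle_molarity_nM(a260: float, dna_length_bp: int, dilution_factor: float = 1.0) -> float:
    """Molar concentration (nM) of DNA molecules, i.e. of dinucleosomes
    if each molecule carries one dinucleosome."""
    grams_per_litre = dsdna_ug_per_ml(a260, dilution_factor) * 1e-3  # 1 ug/ml = 1e-3 g/L
    mol_per_litre = grams_per_litre / (dna_length_bp * MW_PER_BP)
    return mol_per_litre * 1e9


if __name__ == "__main__":
    print(round(particle_molarity_nM(0.5, 350), 1))
```

For example, a hypothetical 350-bp dinucleosome DNA with A260 = 0.5 corresponds to 25 µg/ml DNA, or about 110 nM dinucleosomes.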
The supernatant was diluted with 83 ml of the 20 mM HEPES-KOH buffer (pH 7.2) described above, and the precipitate was removed by centrifugation. The resulting supernatant was loaded onto a 17 ml chitin resin (NEB) column, which was washed with the same buffer (20 column volumes), followed by an incubation in 20 mM HEPES-KOH buffer (pH 7.2) containing 100 mM dithiothreitol, 0. figure S1A).

Sequence analysis
The integration reaction was performed with the Tn5-adaptor DNA complex containing Tn5MErev/Tn5ME-A. The DNA or dinucleosome substrate (0.01 µg µl−1 DNA, from a 0.05 µg µl−1 stock solution) was incubated with the Tn5-adaptor DNA complex (601 dinucleosome, 601 naked DNA and 603 dinucleosome: 0.5 µM, from the 5 µM stock solution described above; 603 naked DNA: 0.25 µM, from a 2.5 µM stock solution) in 10 mM N-Tris(hydroxymethyl)methyl-3-aminopropanesulfonic acid-KOH buffer (pH 8.5) containing 1 mM MgCl2, at 37°C for 15 min (601 dinucleosome, 601 naked DNA and 603 dinucleosome) or 30 min (603 naked DNA). The resulting DNA fragments were extracted as described above and checked by non-denaturing PAGE. The samples were electrophoresed on a TAE-agarose gel, and DNA fragments of approximately 100-300 base-pairs were purified with a Wizard SV Gel and PCR Clean-Up System (Promega). The purified DNA fragments, which carry the adaptor DNA as a sequence tag from the transposase assay, were analysed by massively parallel sequencing. The DNA fragments (10 ng) were ligated to the annealed Tn5MEDS-B oligonucleotides (Tn5MErev/Tn5ME-B, 2 µM) [43] at the blunt end at 16°C for 30 min, using a TaKaRa DNA ligation kit. The ligated DNA fragments were purified with AMPure XP beads (Beckman Coulter) to remove fragments shorter than 150 base-pairs. PCR amplification was performed using the Ad1 (5'-AATGATACGGCGACCACCGAGATCTACACTCGTCGGCAGCGTCAGATGTG-3') and Ad2 (5'-CAAGCAGAAGACGGCATACGAGAT[8mer_index]GTCTCGTGGGCTCGGAGATGT-3') primers under the following conditions: 72°C for 3 min and 95°C for 30 s, followed by seven or eight cycles of 98°C for 10 s, 63°C for 30 s and 72°C for 1 min, with a final extension at 72°C for 5 min, using a Life-ECO thermal cycler (Hangzhou Bioer Technology Co. Ltd., China).
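When several libraries are multiplexed, each Ad2 primer receives its own 8-nt index in place of the `[8mer_index]` placeholder. The Python sketch below builds Ad2 primers from the template sequence given above and reports the minimum pairwise Hamming distance within an index set. The example indices are hypothetical, and the common rule of thumb of keeping indices at least three mismatches apart is a general design consideration, not something stated in this study.

```python
# Build Ad2 primers from the template above and check index separation.
# Example indices are hypothetical; the distance check is a generic design aid.
import re
from itertools import combinations

AD1 = "AATGATACGGCGACCACCGAGATCTACACTCGTCGGCAGCGTCAGATGTG"
AD2_TEMPLATE = "CAAGCAGAAGACGGCATACGAGAT[8mer_index]GTCTCGTGGGCTCGGAGATGT"


def build_ad2(index: str) -> str:
    """Insert an 8-nt index into the Ad2 template (5'->3')."""
    index = index.upper()
    if not re.fullmatch(r"[ACGT]{8}", index):
        raise ValueError(f"index {index!r} is not an 8-nt A/C/G/T sequence")
    return AD2_TEMPLATE.replace("[8mer_index]", index)


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length indices."""
    if len(a) != len(b):
        raise ValueError("indices must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(indices) -> int:
    """Smallest Hamming distance between any two indices (needs >= 2 indices)."""
    return min(hamming(a, b) for a, b in combinations(indices, 2))


if __name__ == "__main__":
    print(AD1, build_ad2("ACGTACGT"), sep="\n")
    print(min_pairwise_distance(["ACGTACGT", "TGCATGCA", "GATCGATC"]))
```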
The amplified library was purified with a Qiagen MinElute Cleanup kit. Library fragments of 260-320 base-pairs were then size-selected by E-Gel 2% SizeSelect electrophoresis (Invitrogen). Sequencing was performed as 2 × 101 base-pair paired-end reads on an Illumina MiSeq system.

Sequencing data analysis
Adaptor sequences were trimmed using Trim Galore (v. 0.5.0) with the options --paired --nextera. The paired-end reads were merged into single-end reads using FLASH (v. 1.2.11) with the options -m 15 -M 101, and merged reads 101-180 base-pairs in length were analysed. The reads were mapped to the 601 and 603 DNA sequences using Bowtie (v. 1.2.2) with the options -v 0 -m 1, which retain exact matches and discard reads that align to multiple locations.
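The cleavage-site profiles were obtained by mapping the 5' ends of the fragments onto the substrate sequence. A minimal Python sketch of that counting step follows, assuming Bowtie 1's default tab-separated output (read name, strand, reference name, 0-based leftmost offset, read sequence reverse-complemented for '-' hits, qualities, ...). The reference names in any input are hypothetical, no ATAC-seq-style +4/-5 shift is applied, and the length filter mirrors the 101-180 base-pair window stated above. Because the Tn5 complex carried only the Tn5ME-A adaptor and Tn5MEDS-B was ligated to the blunt substrate end, read 1, and hence the merged read, presumably starts at the Tn5 integration site.

```python
# Count 5'-end positions of merged reads from Bowtie 1 default output.
# Assumptions: default Bowtie 1 columns; no +4/-5 Tn5 offset correction;
# length window follows the 101-180 bp merged reads described in the text.
from collections import Counter, defaultdict


def five_prime_ends(bowtie_lines, min_len=101, max_len=180):
    """Return {(reference, strand): Counter(position -> count)}.

    For '+' hits the 5' end is the leftmost offset; for '-' hits it is the
    rightmost aligned base, offset + len(sequence) - 1 (0-based).
    """
    counts = defaultdict(Counter)
    for line in bowtie_lines:
        fields = line.rstrip("\r\n").split("\t")
        strand, ref, offset, seq = fields[1], fields[2], int(fields[3]), fields[4]
        if not min_len <= len(seq) <= max_len:
            continue  # outside the analysed merged-read length range
        five_prime = offset if strand == "+" else offset + len(seq) - 1
        counts[(ref, strand)][five_prime] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python profile.py < merged.bowtie
    for (ref, strand), ctr in sorted(five_prime_ends(sys.stdin).items()):
        for pos, n in sorted(ctr.items()):
            print(ref, strand, pos, n, sep="\t")
```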
Data accessibility. Supplementary figures are available as electronic supplementary material. The deep sequencing data in this study are publicly available at accession no. GEO: GSE130322.