Molecular and cytogenetic analyses in Geranium macrorrhizum L. wild Italian plants

Geranium macrorrhizum L. is a herbaceous species native to southern Europe and was introduced in central Europe and North America. It is also widely distributed in Italy. In this study, molecular and cytogenetic analyses were carried out on 22 wild plants, collected in central and southern Italy, compared with five cultivated plants, with the main purpose to identify those living near the Marmore waterfalls in central Italy, recently described as the new species Geranium lucarinii. Four barcoding markers (rbcL, matK, trnH-psbA intergenic spacer and internal transcribed spacer region) were sequenced and their variability among the plants was evaluated. Chromosome numbers were determined and 45S rDNA was physically mapped by fluorescence in situ hybridization. Moreover, genomic affinity between wild and cultivated plants was evaluated by genomic in situ hybridization. The results of this study supported that all the plants belong to G. macrorrhizum, including the Marmore population. Barcoding analyses showed a close similarity among the wild plants, and a differentiation, although not significant, between the wild plants on one hand and the cultivated plants on the other. Integrated studies focusing on morphological, genetic and ecological characterization of a larger number of wild populations would allow us to know the extent of the variability within the species.


Introduction
Accurate species delimitation is fundamental to biology. It has implications not only for a reliable evaluation of biodiversity but also for the use of the organisms at many levels, even for their conservation [1,2]. It is now widely accepted that alpha taxonomy, based on morphological characters, could not [...] of the plants studied and to characterize their chromosome complement by means of fluorescence in situ hybridization (FISH) of 45S rDNA. Finally, cross-genomic in situ hybridization (GISH) experiments were carried out to assess the genomic affinity between Marmore plants and the cultivated ones. This method is commonly applied to reveal genomic similarity between closely related species based on the homology of the repetitive DNA sequences ([34] and references therein).

Plant material
Plants were collected in field inspections in central and southern Italy, at Marmore waterfalls and Felitto (Campania Region), respectively, by Prof. R. Venanzoni and Prof. R. P. Wagensommer in 2019 and 2020. Ten plants per population were sampled. As the species is rhizomatous, care was taken to collect plants at a suitable distance from each other. The plants from Felitto had been described as G. macrorrhizum [35,36]. Two plants of G. macrorrhizum, previously collected at the National Park of Abruzzo, Latium and Molise (NPALM; central Italy) and then transferred to the Botanical Garden of Camerino University (central Italy), were obtained by R. Venanzoni from this institution. Other G. macrorrhizum plants were obtained from botanical or public gardens (table 1 and figure 1). All the plants were cultivated ex situ at the Department of Chemistry, Biology and Biotechnology of Perugia University and used for molecular and cytogenetic analyses.

DNA extraction, amplification and sequencing
Total genomic DNA was isolated from fresh leaves using the DNeasy Plant Mini kit (Qiagen, Germany) according to the manufacturer's instructions.
Three plastid markers (rbcL, matK and trnH-psbA intergenic spacer) and the nuclear ITS region (ITS1-5.8S-ITS2) were amplified in a 25 µl reaction volume containing 20 ng of DNA template, 1 µl of each primer (10 pmol µl⁻¹) and 0.5 units of MyTaq HS polymerase (Bioline). Amplifications were performed on a 2720 thermal cycler (Applied Biosystems, Foster City, CA, USA). The primer pairs and cycling conditions are listed in the electronic supplementary material, table S1. Two matK primer pairs were tested. The first, 390F + 1326R [37], failed to amplify. The second, 3F + 1R KIM [38], produced multiple sequences. In order to obtain a single amplicon, the 1R KIM sequence was modified according to the complementary region of the G. macrorrhizum matK sequences found in the GenBank database. Amplified products were purified using the ExoSAP-IT® Express reagent (Thermo Fisher Scientific Inc.). Sequencing in both directions was performed by the Eurofins Genomics service (Germany). Primers used for sequencing were the same as those used for amplification. Electropherogram quality was visually inspected. Sequences were manually edited and aligned using the ClustalW algorithm implemented in BioEdit 7.1.7 [39] with the default values. The sequences were compared with those available in GenBank (cf. electronic supplementary material, table S2) through a BLASTn search [40].
Newly determined sequences were deposited in GenBank (accession numbers OK299101 for rbcL, OM417815 for trnH-psbA, from OR656480 to OR656483 for ITS, OR668227 for matK). Only differing sequences for each locus have been deposited.
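The adaptation of a reverse primer to a reference sequence, as described above for 1R KIM, relies on taking the reverse complement of the target region on the reference strand. The following is only a minimal sketch of that step, not the procedure actually followed in this study; the function name and the example region are hypothetical and do not correspond to any real primer or matK sequence.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical target region read 5'->3' on the reference strand;
# a reverse primer annealing there is its reverse complement.
target_region = "AAGT"
reverse_primer = reverse_complement(target_region)
```

IUPAC ambiguity codes are left untranslated in this sketch, so a real reference region containing them would need additional handling.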

Phylogenetic analyses
The identification of variable and parsimony informative sites was carried out using MEGA 11 software [41]. MUSCLE [42] was used to align the sequences with the outgroup ones in MEGA 11. Genetic relationships among samples were inferred using both neighbour-joining (NJ) [43] and maximum likelihood (ML) methods. In NJ analyses, the genetic distances were computed using the Kimura 2-parameter (K2P) substitution model [44] for each locus and were given as units of the number of base substitutions per site. All ambiguous positions were removed for each sequence pair. Bootstrap analysis was done using 1000 replicates [45]. For ML analyses, the Tamura-Nei model was used [46]. Initial trees for the heuristic search were obtained automatically by applying NJ and BioNJ algorithms to a matrix of pairwise distances and then selecting the topology with superior log likelihood value. Branch lengths measure the number of substitutions per site. Bootstrap analysis was done using 1000 replicates. Three Geranium species whose ITS sequence showed the highest [...]. NJ and ML trees were constructed both for each marker and for concatenated markers.
royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 11: 240035
In concatenated trees, sequences of G. macrorrhizum and outgroup species were obtained by the sum of three marker sequences probably deriving from different individuals.Genetic relationships among plants were also investigated through a median-joining network of haplotypes obtained by analysing the concatenated sequences.The network was constructed with the Network software v.10.2 (www.fluxus-engineering.com) by using the reduced median algorithm (ρ = 2).The term haplotype used here indicates the list of mutations found in the examined sequences in each sample, arbitrarily numbered for the analyses.
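For readers unfamiliar with the K2P model used for the NJ distances, the calculation can be sketched as follows. This is an illustrative implementation of the standard formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions among compared sites; it is not the MEGA 11 code, and the function name is an assumption. Gaps and ambiguous bases are skipped pair by pair, mirroring the pairwise removal of ambiguous positions described above.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Pairwise deletion: ignore gaps and ambiguous bases.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G or C<->T is a transition; anything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument not positive).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

With a single transition in ten compared sites (P = 0.1, Q = 0), the function returns -0.5 ln(0.8), slightly more than the uncorrected proportion of 0.1 because the model corrects for multiple substitutions at the same site.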

Cytogenetic analyses
Root apices were treated with ice-cold water for 24 h at 4°C, then transferred to 0.02 M 8-OH-quinoline (Sigma) for 5 h at room temperature and fixed in ethanol-acetic acid 3 : 1 (v/v). Fixed root tips were washed in an aqueous solution of 6 mM sodium citrate plus 4 mM citric acid, digested with a mixture of 10% pectinase (Sigma), 8% cellulase (Calbiochem) and 2% macerozyme (Serva) in citrate buffer pH 4.6 for 1 h at 37°C and squashed under a coverslip in a drop of 60% acetic acid.
After removing coverslips by the solid CO₂ method, slides were air-dried and used for FISH or GISH experiments. FISH was performed as described in Mascagni et al. [47]. The wheat probe pTa71, containing 18S-5.8S-26S rDNA [48], was used. The DNA of nuclei and chromosomes was denatured in a thermal cycler for 6 min at 70°C and the preparations were incubated overnight at 37°C with 2 ng µl⁻¹ of heat-denatured DNA probe which had been labelled with digoxigenin-11-dUTP (Roche) or biotin-16-dUTP (Roche) by nick translation. Detection of the digoxigenin or biotin at the hybridization sites was carried out using anti-digoxigenin conjugated with fluorescein isothiocyanate (FITC; Roche) or streptavidin conjugated with Cy3 (Cyanine 3; Sigma), respectively. The preparations were then counterstained with a 2% 4′,6-diamidino-2-phenylindole (DAPI) solution in McIlvaine buffer pH 7, mounted in antifade solution (AFI; Citifluor) and analysed with a fluorescence microscope (DMRB, Leica). Images were captured with a digital camera (ILCE-7, Sony) and optimized using Adobe Photoshop 5.0. The same slides were used for chromosome counting. At least 10 DAPI-stained metaphases per plant were analysed.
For GISH experiments (self-GISH and cross-GISH), total genomic DNAs extracted from leaves were used as probes after labelling with biotin-11-dUTP by nick translation following the producer's protocol (BioNick Labeling System, Invitrogen). The GISH procedure was similar to the FISH protocol with the exception of the probe concentration, which was 5 ng µl⁻¹. These experiments were replicated three times.

Results
Sequences of different lengths were obtained for each marker (table 2). The rbcL sequences used for subsequent analyses were longer than the gene fragment, 599 bp in length, considered as the barcode region by Hollingsworth et al. [8]. The matK locus proved difficult to amplify. The most commonly used primer combinations failed to amplify; therefore, a specific reverse primer was designed and used in combination with primer 3F KIM (electronic supplementary material, table S1). In this case too, the primer pair produced sequences differing slightly in length from the gene barcode region [8]. BLASTn analysis showed that all the rbcL and matK sequences, from either wild or cultivated plants, were identical or nearly identical to those of G. macrorrhizum. Indeed, the identity percentage was in the range 99.86-100.00 and 99.87-100.00 for rbcL and matK sequences, respectively (electronic supplementary material, table S2). The range was slightly wider for ITS sequences (98.55-100.00; electronic supplementary material, table S2). Regarding the intergenic spacer trnH-psbA, sequences 339-346 bp in length were obtained. They were monomorphic in nucleotide composition and showed the highest identity percentage (95%) with the Geranium maderense Yeo sequence (electronic supplementary material, table S2). Their alignment with the plastome of G. maderense showed that the intergenic spacer trnH-psbA in the analysed plants is 288 bp long and has a G + C content of 36.1%.
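As an illustration of how a G + C content such as the one reported for the trnH-psbA spacer is obtained, a minimal sketch is given below. The function name is an assumption and the test sequence is synthetic, not the spacer sequence itself. For a 288 bp fragment, a G + C content of 36.1% would correspond to about 104 G or C bases (104/288 ≈ 0.361).

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(b in "GC" for b in bases)
    return 100.0 * gc / len(bases)


# Synthetic 288-base example with 104 G/C bases (not the real spacer).
synthetic = "G" * 104 + "A" * 184
synthetic_gc = round(gc_content(synthetic), 1)
```

Gaps and ambiguity codes are excluded from both numerator and denominator here; other tools may treat them differently.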

Phylogenetic analyses
The network analysis, constructed with the reduced median method, was applied to analyse the genetic relationships among the plants. For this purpose, sequences were aligned and trimmed for each marker, and concatenated sequences were used. Six haplotypes were found, four within the wild plants and two within the cultivated ones (figure 2a). Population 4 (Felitto, Campania) was homogeneous, as was population 5 (Marmore). The two plants from NPALM (plants 2 and 3) showed unique haplotypes, as did the cultivated plant 10. All the other cultivated samples shared the same haplotype. Within rbcL sequences, only a single nucleotide polymorphism (SNP) was found, consisting of a transversion A/C (nucleotide position 472 in the electronic supplementary material, table S3) which distinguishes the plants from Marmore and NPALM from all the others. Only one SNP, a transition C/T (nucleotide position 149 in the electronic supplementary material, table S3), was also observed among matK sequences, distinguishing all the wild plants from the cultivated ones. Instead, 13 polymorphic sites were observed among ITS sequences. All of them were located in the sequenced portions of the intergenic spacers ITS1 and ITS2. Four polymorphic sites distinguished cultivated plants from the wild ones. Several polymorphic sites were heteroplasmic nucleotides in the G. macrorrhizum sequence available in GenBank (DQ525073) and only in those of cultivated plants.
Heteroplasmy was shown at two sites in wild population 4 and one site in plant 2 from NPALM (electronic supplementary material, table S3).
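The way haplotypes and polymorphic sites are derived from aligned, concatenated sequences can be sketched as follows. This is a simplified illustration, not the Network software workflow: the function names and the toy alignment are hypothetical, haplotypes are numbered arbitrarily in order of first appearance, and sites with ambiguity codes (such as those described above) are treated as plain characters rather than resolved.

```python
from collections import defaultdict


def group_haplotypes(sequences: dict[str, str]) -> dict[str, list[str]]:
    """Group samples sharing an identical aligned sequence."""
    groups: dict[str, list[str]] = defaultdict(list)
    for sample, seq in sequences.items():
        groups[seq.upper()].append(sample)
    # Arbitrary labels H1, H2, ... in order of first appearance.
    return {f"H{i}": samples
            for i, samples in enumerate(groups.values(), start=1)}


def variable_sites(sequences: dict[str, str]) -> list[int]:
    """1-based alignment positions where samples differ."""
    seqs = [s.upper() for s in sequences.values()]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    return [pos for pos, column in enumerate(zip(*seqs), start=1)
            if len(set(column)) > 1]


# Toy alignment (hypothetical data, four sites only).
toy = {"plant_a": "ACGT", "plant_b": "ACGT", "plant_c": "ACTT"}
toy_haplotypes = group_haplotypes(toy)
toy_sites = variable_sites(toy)
```

In the toy alignment, two samples share one haplotype and the third differs at position 3, giving two haplotypes and one variable site.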
Bearing in mind the need to clarify the taxonomical placement of the Marmore population, the genetic relationships among plants were further investigated by NJ and ML methods, including sequences of G. macrorrhizum and outgroup species retrieved from GenBank. Some characteristics of the trimmed aligned fragments are shown in table 2. The only polymorphism detected among rbcL sequences (see above) was responsible for a weakly supported but clear differentiation among wild plants (electronic supplementary material, figure S1). Instead, all the wild plants were included in the main branch of the matK tree, harbouring a weakly supported sub-cluster including the cultivated ones and two samples of G. macrorrhizum from GenBank (electronic supplementary material, figure S2). The concatenated rbcL + matK tree, based on a total of 1479 bp, showed a highly supported main branch harbouring all the analysed plants. The cultivated plants closely clustered with G. macrorrhizum samples, whereas a further differentiation emerged among the wild ones (electronic supplementary material, figure S3). The ITS tree harboured two main clades, the first including all the wild plants and the second comprising the cultivated plants (electronic supplementary material, figure S4). Some variability can be observed within each cluster. The two plants from NPALM formed a sub-cluster, whereas population 4 was slightly differentiated from population 5. Among the cultivated plants, sample 10 turned out to be more similar to G. dalmaticum than to G. macrorrhizum. The concatenated rbcL + matK + ITS tree, for a total of 2101 bp (figure 2b), included only G. robertianum as an outgroup because the sequences of all three markers were available in GenBank only for this species. The tree highlighted the differences between the two groups of cultivated and wild plants, already observed in figure 2a. The wild plants were in turn grouped into three sub-clusters corresponding to their geographical provenance. ML trees showed the same topology as the NJ trees; the ML concatenated tree is reported in the electronic supplementary material, figure S5.

Cytogenetic analysis
Chromosome counts on the DAPI-stained metaphases showed that the somatic chromosome number in all the wild plants, including those from Marmore, was 2n = 92, whereas in the cultivated plants it was 2n = 46, with the exception of plant 10, showing 2n = 92. Owing to the small chromosome size, it was difficult to arrange the karyotype. In order to establish at least the number of chromosome pairs carrying ribosomal DNA, FISH was carried out using pTa71 as a probe. Eight hybridization signals related to 45S rDNA were counted on metaphase plates of cultivated plants (figure 3a,b), whereas a maximum of 16 signals were observed on metaphase plates of wild plants, comprising those collected at Marmore waterfalls (figure 3c,d).
To evaluate the genome affinity between Marmore plants and G. macrorrhizum, GISH experiments were carried out by probing the genomic DNA of Marmore plants on chromosomes of cultivated plants and vice versa (figure 4). Preliminary experiments in which the labelled DNA of Marmore plants or cultivated plants was hybridized to its own chromosomes (self-GISH) were performed to better evaluate, by comparison, the results of cross-GISH. Thus, after self-GISH, fluorescent signals, although of different intensity, were observed on each chromosome of the complement in both wild and cultivated plants (figure 4a,b). Low-intensity signals were easily recognized at the centromeric and pericentromeric regions, showing a hybridization pattern typical of satellite DNA. The same hybridization pattern was observed after cross-GISH (figure 4c,d).

Discussion
In this study, DNA barcoding was applied to test the identity of plants living near the Marmore waterfalls in central Italy, considered a new species, G. lucarinii [29]. The plants are morphologically so similar to G. macrorrhizum [29] that recently G. lucarinii has been considered its synonym [23].
All the sequences of markers rbcL, matK and ITS showed identity percentages equal or close to 100% with those of G. macrorrhizum. The greatest variability was observed among ITS sequences, as expected for a biparentally inherited marker. The minimum value of the range (98.55%) was higher than the identity percentage (97.76%) between ITS of G. macrorrhizum DQ525073 and that of the closely related species G. dalmaticum DQ525072, confirming that our ITS sequences correspond to G. macrorrhizum.
The trnH-psbA intergenic spacer did not contribute to species identification because the GenBank database lacked a reference sequence for G. macrorrhizum. However, it proved useful nonetheless. Indeed, no trnH-psbA sequence variation was observed among the plants analysed here. This sequence monomorphism, unusual for the marker, supports the view that all the analysed plants belong to the same species.
The distribution of genetic variability in barcoding sequences suggested some differentiation within the plants studied (figure 2). The cultivated plants closely clustered with G. macrorrhizum, whereas wild plants were clearly grouped into three sub-clusters corresponding to their geographical origin. The plants from the NPALM were more closely related to the Marmore population than to population 4 from Campania (Felitto). Interestingly, the two samples from the National Park are the same plants used to morphologically compare plants from Marmore, later considered a new species [29]. Despite this clustering, it is clear that genetic variation between cultivated and wild plants does not support the existence of two different species.
This finding is also confirmed by the cytogenetic analyses. Two cytotypes, diploid and tetraploid, were detected in this study. Ninety-two chromosomes, corresponding to the tetraploid level, were counted in Marmore plants, as well as in the other wild plants studied, versus the 46 chromosomes counted in almost all the cultivated plants, with the exception of plant 10. However, the different chromosome numbers cannot be considered a discriminating factor, because the existence of diploid and tetraploid plants with n = 23 has long been known in G. macrorrhizum [15–21]. Recently, plants with a genome size corresponding to the hexaploid level were found in Croatia [49]. The occurrence of different ploidy levels in the same species, owing to endopolyploidy, is not exclusive to G. macrorrhizum. Rather, it is common to many taxa of the Geranium genus [50].
Our FISH analyses confirmed that wild plants have a doubled chromosome number compared with the cultivated ones (figure 3). The number of 45S rDNA signals was in agreement with the ploidy level, unlike what occurs in many species in which a reduction in the number of ribosomal DNA sites per monoploid genome is observed following polyploidization [51]. FISH also showed that the number of chromosome pairs carrying ribosomal DNA is higher than that previously observed in karyotype analyses carried out in different G. macrorrhizum Bulgarian populations with 2n = 46 [18]. Indeed, eight hybridization signals, corresponding to four chromosome pairs, were observed in our cultivated plants with 46 chromosomes, whereas only two or three nucleolar chromosome pairs were found by Petrova & Stanimirova [18].
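The agreement between rDNA signal number and ploidy level can be checked with simple arithmetic, sketched below. The function names are hypothetical, and the base chromosome number x = 23 is taken here as an assumption consistent with the diploid (2n = 46) and tetraploid (2n = 92) counts reported in this study.

```python
def ploidy_level(chromosome_number: int, base_number: int = 23) -> int:
    """Ploidy level inferred from the somatic chromosome number."""
    if chromosome_number % base_number:
        raise ValueError("chromosome number is not a multiple of the base number")
    return chromosome_number // base_number


def rdna_sites_per_monoploid_genome(signals: int, chromosome_number: int,
                                    base_number: int = 23) -> float:
    """Number of 45S rDNA FISH signals per monoploid genome."""
    return signals / ploidy_level(chromosome_number, base_number)


# Counts reported in this study: 8 signals at 2n = 46, up to 16 at 2n = 92.
diploid_sites = rdna_sites_per_monoploid_genome(8, 46)
tetraploid_sites = rdna_sites_per_monoploid_genome(16, 92)
```

Both cytotypes give four sites per monoploid genome, which is what is meant by the signal number being in agreement with the ploidy level; a lower value in the tetraploid would instead indicate loss of rDNA sites after polyploidization.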
Genomic affinity between Marmore plants and G. macrorrhizum was cytologically investigated by GISH (figure 4). Repetitive DNA sequences (satellite DNA) are mainly involved in the hybridization reaction. The method provides a powerful tool to study their distribution pattern along chromosomes, especially in species for which there is a lack of genome information [52,53]. Since most satellite DNA sequences are fast evolving in structure, redundancy and localization even within the same species [54,55], their detection through GISH could give information about the relationship between related species. The comparison of hybridization patterns after self-GISH and cross-GISH in our material showed the homology of the repeated sequences between Marmore plants and the cultivated ones.
Thus, at present, our molecular and cytogenetic data support the presence of only the species G. macrorrhizum L. in central and southern Italy, confirming that the name G. lucarinii Venanzoni & Wagens. becomes a synonym. Evidently, differences in the few morphological features discriminating the two species have been considered by Aedo [23] as a part of the morphological variability characterizing G. macrorrhizum. Further traits distinguishing G. lucarinii from G. macrorrhizum were the flowering period and the habitat, in terms of vegetation context and altitudinal range of distribution [29]. The latter (190-250 m a.s.l.) partly overlaps with that of G. macrorrhizum (50-2800 m a.s.l.). In addition, one of the southernmost Italian stations of G. macrorrhizum, Felitto (population 4 in this study), is also located at a low altitude (200-290 m a.s.l. [35,36]). Differences in flowering period and morphology observed between Marmore plants and G. macrorrhizum could be owing to adaptation to environmental factors. For example, leaf traits previously used to delimit species in Rhodiola sect. Trifida were then found to be strongly influenced by climatic variables related to rainfall [2]. A comparative climatic niche analysis could help to clarify this issue.
This study is, to our knowledge, the first report on molecular and cytogenetic characterization of G. macrorrhizum Italian populations. The topology of concatenated trees (figure 2b; electronic supplementary material, figure S5) suggests that G. macrorrhizum wild populations in central and southern Italy form a genetically fairly homogeneous group, well separated from the cultivated plants. The origin of the cultivated plants studied here is not well known, just as there is not enough information on the status, whether cultivated or wild, of the G. macrorrhizum plants whose sequences were retrieved from GenBank. Beyond this, it is significant that the cultivated plants cluster together and with known G. macrorrhizum, whether cultivated or not, whereas the wild plants form a distinct cluster. Work is in progress to deepen morphological and genetic studies, extending them to a greater number of wild populations, to estimate the degree of variability within the species. For the same purpose, the role played by the geographical distribution of the populations, their spatial isolation and consequent gene flow, as well as ecological specialization, will be evaluated. Such an integrative approach is fundamental to define different aspects of the speciation process and to delimit evolutionarily distinct lineages [3,4].

Figure 1. Sampling location of wild (stars) and cultivated (grey circles) plants in Italy. Plant 9 was collected in Warsaw (Poland), here indicated in a separate box. The provenance of wild plants is highlighted with different colours: green for Marmore waterfalls, orange for NPALM and yellow for Felitto (see table 1 for details).

Figure 2. Reduced median network (a) of wild and cultivated plants based on their concatenated sequences (ITS + rbcL + matK). Circles represent the haplotypes and are proportional to the observed frequency in the analysed plants (electronic supplementary material, table S3), while colours (as in figure 1) indicate their geographical provenance. NPALM, National Park of Abruzzo, Latium and Molise. NJ tree (b) encompassing the concatenated sequences (ITS + rbcL + matK) from this study and those available from GenBank, obtained by concatenating ITS, rbcL and matK sequences from G. macrorrhizum (DQ525073, KP963387 and KY687134, respectively) and G. robertianum (DQ525071, KP963378 and KY687141, respectively). Plant numbering as in table 1. Bootstrap values are indicated next to the branches.

Table 1. List of the plants studied, their status and geographical provenance.

[...] Geranium lasiopus Boiss. & Heldr. KX421242, and Geranium glaberrimum Boiss. & Heldr. KX421239). Two species for which both rbcL and matK sequences from the same origin were available were chosen as outgroups (Geranium lucidum L. MK542503 and JN896161, and Geranium robertianum L. KP963378 and KY687141). Moreover, all the rbcL and matK G. macrorrhizum sequences available in GenBank were aligned to obtain those dendrograms (see accession numbers in dendrograms).

Table 2. Range of the amplicon sequences for each marker and some characteristics of the fragments aligned for NJ and ML trees.