Development and characterization of SSR markers for Sanguinaria canadensis based on genome skimming

Premise: Polymorphic nuclear simple sequence repeat (nSSR) markers were developed for Sanguinaria canadensis (Papaveraceae), a spring ephemeral native to eastern North America.
Methods and Results: Based on genome skimming data of S. canadensis, a total of 240 nSSR primer pairs were designed for 80 loci from the assembled nuclear contigs. Of these primer pairs, 19 were selected for initial validation in four populations (80 individuals). The number of alleles per locus ranged from one to 21; the levels of observed and expected heterozygosity per locus ranged from 0.000 to 1.000 and from 0.000 to 0.847, respectively. Transferability of the loci was tested in the related species Eomecon chionantha, in which all 19 loci amplified.
Conclusions: The developed nSSR markers revealed polymorphism in the four studied populations and may contribute to investigations of the genetic diversity of S. canadensis and E. chionantha.


Sanguinaria canadensis L. (bloodroot; Papaveraceae) is the only species of Sanguinaria L., a genus endemic to eastern North America. Bloodroot is a polymorphic species in regard to the morphology of its petals and stems, which vary greatly among individuals. Despite this polymorphism, most botanists consider the different phenotypes to represent variation within a single species; the variation is clearly continuous, exhibiting intermediate forms, and having no obvious correlation with either geography or habitat (Kiger, 1997). Bloodroot is a spring ephemeral herb commonly occurring in temperate deciduous forests, ranging from southern Ontario to eastern Texas in the west, and from northern Florida to New England in the east. Spring ephemerals exhibit a common strategy strongly associated with temperate deciduous forests that allows understory herbs to take advantage of the high levels of sunlight in spring reaching the forest floor prior to formation of a canopy by woody plants (Archibold, 1995). Understanding the population distribution and genetic structure of S. canadensis may not only shed light on the evolutionary history of the species, but also provide insights into the formation and evolution of the North American temperate deciduous forest biome.
Simple sequence repeats (SSRs) have been shown to be highly useful genetic markers for assessing genetic diversity and characterizing population genetic structure (Emanuelli et al., 2013; Lind and Gailing, 2013; Chen et al., 2017; Liu et al., 2018). However, no SSR markers have been developed for S. canadensis. In this study, we developed 19 polymorphic nuclear SSR (nSSR) markers for S. canadensis using genome skimming and applied these markers to characterize the genetic diversity and population structure of four natural populations. Furthermore, we tested their cross-amplification in the related species Eomecon chionantha Hance. Our results suggest that these nSSR markers will be valuable for future studies on population genetics and phylogeography of S. canadensis across its entire geographic range.

METHODS AND RESULTS
Two individuals of S. canadensis were selected for genome skimming analysis. Fresh leaves of these individuals (XGC1 and XGC2; Appendix 1) were sampled from the field and dried with silica gel. Total genomic DNA was extracted using Plant DNAzol Reagent (Thermo Fisher Scientific, Waltham, Massachusetts, USA) following the manufacturer's protocol. The high-molecular-weight DNA was sheared using a Covaris LE220 Focused-ultrasonicator (Covaris, Woburn, Massachusetts, USA), then the library was prepared using a NEBNext DNA Library Prep Master Mix Set for Illumina (New England Biolabs, Ipswich, Massachusetts, USA), followed by a size selection of 300-350 bp using the Agencourt AMPure XP system (Beckman Coulter, Shanghai, China). Finally, sequencing was conducted using the Illumina HiSeq 2500 (Illumina, San Diego, California, USA) at the Beijing Genomics Institute (Shenzhen, China), with 150-bp paired-end sequencing. The raw reads were filtered and then assembled de novo into contigs using the CLC Genome Workbench version 4.0.6 (QIAGEN, Hilden, Germany), following Liu et al. (2017). The raw reads were deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (BioProject ID: PRJNA512069 for XGC1, PRJNA512066 for XGC2).
Contigs that mapped to the plastome of Coreanomecon hylomeconoides Nakai (NC_031446) or the mitochondrial genome of Nelumbo nucifera Gaertn. (NC_030753) were removed, so that only nuclear contigs remained; BLAST 2.2.26 (Altschul et al., 1990) was used for this mapping. The remaining nuclear contigs were analyzed using CandiSSR (Xia et al., 2016) to identify candidate polymorphic nSSRs by comparing the contigs between the two S. canadensis individuals. For the CandiSSR analysis, default parameters were used except that the flanking sequence length was set to 50. For each SSR locus, primers were automatically designed in the pipeline via the Primer3 package (Koressaar and Remm, 2007), which generated three primer pairs for each of the 80 loci. The 240 SSR primer pairs (Table 1, Appendix S1) were then evaluated using OLIGO version 6.67 (Molecular Biology Insights, Cascade, Colorado, USA); primers that had ΔG values <4.5 kcal/mol and no annealing problems were selected.
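To make the SSR-detection step concrete, the following is a minimal sketch of how perfect repeats and their 50-bp flanking sequences could be located in a contig. It is not CandiSSR's implementation: the function name `find_ssrs` and the repeat-count thresholds in `MIN_REPEATS` are assumptions chosen for illustration, and the comparison between the two individuals is not shown.

```python
import re

# Hypothetical thresholds (motif length -> minimum repeat count); not CandiSSR's defaults.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}

def find_ssrs(seq, min_repeats=MIN_REPEATS, flank=50):
    """Return perfect SSRs in seq together with their flanking sequences."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT", "GGG").
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            start, end = m.span()
            hits.append({
                "motif": motif,
                "repeats": (end - start) // k,
                "start": start,
                "end": end,
                "left_flank": seq[max(0, start - flank):start],
                "right_flank": seq[end:end + flank],
            })
    return hits
```

For example, `find_ssrs("G" * 60 + "AT" * 8 + "C" * 60)` reports a single AT locus with eight repeats; the flanking sequences are what a primer-design tool would then take as input.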
In total, 19 primer pairs were selected for testing in 80 individuals from four natural populations (20 individuals each) of S. canadensis (Appendix 1; Tables 1, 2). DNA was extracted from the leaf tissues as described above. Two rounds of PCR amplification were performed using a Tsingke PCR kit (Tsingke Biotech Company, Beijing, China). For the first round, the 15-μL reaction mixture contained approximately 5 ng of DNA, 7.5 μL of 2× PCR Master Mix, 5 pM of forward primer (synthesized with an 18-bp M13 tail [5′-TGTAAAACGACGGCCAGT-3′] at the 5′ end), and 5 pM of reverse primer. The PCR thermal profile involved an initial denaturation at 95°C for 5 min; followed by 35 cycles of denaturation at 95°C for 30 s, annealing at the optimal temperature (Table 1) for 30 s, and extension at 72°C for 30 s; and a final extension at 72°C for 10 min. For the second round, the 30-μL reaction mixture contained approximately 3 μL of the first-round product, 15 μL of 2× PCR Master Mix, 100 μM of fluorescent-labeled (FAM, ROX, HEX, or TAMRA) universal M13 primer, and 10 pM of reverse primer. The PCR reactions were performed with an initial denaturation at 94°C for 2 min; followed by 38 cycles of 94°C for 60 s, 56°C for 45 s, and 72°C for 1 min; and a final 10-min extension step at 72°C. The first round of PCR added an adapter to the fragments, which were then fluorescent-labeled in the second round of PCR. The final amplification products of four different loci (with four different fluorescent labels) for each plant sample were pooled and genotyped on an ABI 3730xl DNA Analyzer (Applied Biosystems, Foster City, California, USA) with GeneScan 500 LIZ Size Standard (Applied Biosystems) as the internal reference. Allele identification was performed using GeneMarker version 2.2.0 (SoftGenetics, State College, Pennsylvania, USA) with default parameters.
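The tailing scheme described above can be illustrated with a short sketch: the M13 tail is prepended to the locus-specific forward primer, so the labeled product is 18 bp longer than the amplicon defined by the untailed primers. The helper names and the example primer below are hypothetical; only the M13 sequence is taken from the text.

```python
# 18-bp M13 tail quoted in the text (5'->3').
M13_TAIL = "TGTAAAACGACGGCCAGT"

def add_m13_tail(forward_primer: str) -> str:
    """Return the forward primer with the M13 tail attached at its 5' end."""
    return M13_TAIL + forward_primer.upper()

def labeled_product_size(amplicon_bp: int) -> int:
    """Expected size of the labeled product: untailed amplicon plus the tail."""
    return amplicon_bp + len(M13_TAIL)

# Hypothetical locus-specific forward primer, for illustration only.
example_forward = "acgtacgtacgtacgtacgt"
tailed = add_m13_tail(example_forward)
```

When allele sizes are compared with the repeat counts predicted from the contigs, this 18-bp offset needs to be taken into account.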
The presence of null alleles and their bias on genetic diversity were evaluated based on the expectation maximization method implemented in FreeNA (Chapuis and Estoup, 2006). FSTAT version 2.9.3 (Goudet, 1995) was used to test for Hardy-Weinberg equilibrium, and GENEPOP version 4.0.7 (Rousset, 2008) was used to test for linkage disequilibrium. The number of alleles, levels of observed and expected heterozygosity, and polymorphism information content (PIC) were calculated using CERVUS version 3.0.3 (Kalinowski et al., 2007).
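As an illustration of the per-locus summary statistics named above, the sketch below computes the number of alleles, observed heterozygosity, expected heterozygosity, and polymorphism information content (following the formula of Botstein et al., 1980) from diploid genotypes. It is an assumption-laden simplification rather than a reproduction of CERVUS: the function name `locus_stats` is hypothetical, expected heterozygosity is taken as 1 − Σp², and any sample-size correction applied by published software is omitted.

```python
from collections import Counter

def locus_stats(genotypes):
    """Summarize one locus from a list of (allele_a, allele_b) diploid genotypes.

    Individuals with missing data are assumed to have been removed beforehand.
    """
    n = len(genotypes)
    counts = Counter(allele for g in genotypes for allele in g)
    freqs = [c / (2 * n) for c in counts.values()]

    # Observed heterozygosity: proportion of heterozygous individuals.
    ho = sum(1 for a, b in genotypes if a != b) / n

    # Expected heterozygosity without sample-size correction.
    sum_p2 = sum(p * p for p in freqs)
    he = 1.0 - sum_p2

    # PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    pic = 1.0 - sum_p2 - cross

    return {"alleles": len(counts), "Ho": ho, "He": he, "PIC": pic}
```

For a locus with two equally frequent alleles, this gives He = 0.5 and PIC = 0.375; for a monomorphic locus, all three diversity values are zero.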
In total, we detected 112 alleles among 19 loci of 80 individuals. The number of alleles at each locus ranged from one to 21, and PIC values ranged from 0.000 to 0.808 (Table 2), suggesting moderate to high levels of polymorphism (Botstein et al., 1980). Levels of observed and expected heterozygosity for each locus varied from 0.000 to 1.000 and 0.000 to 0.847, respectively (Table 2). These data suggest the presence of abundant genetic diversity in the species. Five loci (SSR4, SSR18, SSR64, SSR80, and SSR90) were shown to have null alleles (Table 2; Chapuis and Estoup, 2006). Loci SSR18, SSR80, and SSR90 revealed significant deviation from Hardy-Weinberg equilibrium in populations NC and VA, population WI, and population MN (P < 0.05; Table 2), respectively. Significant linkage disequilibrium was only detected between loci SSR18 and SSR92. These data may indicate non-random mating, and possibly inbreeding, of the plants in these populations.
To verify their potential for heterologous amplification, all 19 SSRs were tested in one population (20 individuals) of E. chionantha (Appendix 1), the sister species of S. canadensis (Wang et al., 2009). The procedures of PCR amplification and genotyping were the same as described above. All 19 loci amplified in E. chionantha, with polymorphism detected in all loci except SSR68 (Table 3), suggesting their potential utility in future studies.

CONCLUSIONS
In this study, 19 highly polymorphic nSSR markers were successfully developed for S. canadensis. These markers can be applied to investigate population genetics and phylogeography of S. canadensis and its close relatives.