Isolation and characterization of SSR and EST‐SSR loci in Chamaecyparis formosensis (Cupressaceae)

Premise of the Study: Simple sequence repeat (SSR) and expressed sequence tag (EST)–SSR markers were developed as tools for marker‐assisted selection of Chamaecyparis formosensis and for the molecular differentiation of cypress species.
Methods and Results: Based on the SSR‐enriched genomic libraries and transcriptome data of C. formosensis, 300 primer pairs were selected for initial confirmation, of which 19 polymorphic SSR and eight polymorphic EST‐SSR loci were chosen after testing in 92 individuals. The number of alleles observed for these 27 loci ranged from one to 17. The levels of observed and expected heterozygosity ranged from 0.000 to 1.000 and from 0.000 to 0.903, respectively. Most markers also amplified in C. obtusa var. formosana.
Conclusions: The developed SSR and EST‐SSR sequences are the first reported markers specific to C. formosensis. These markers will be useful for individual identification of C. formosensis and to distinguish cypress species such as C. obtusa var. formosana.

Chamaecyparis formosensis Matsum., known as cypress, is a coniferous plant species in the Cupressaceae that is endemic to Taiwan. Although several simple sequence repeat (SSR) markers of Chamaecyparis Spach have been reported (Nakao et al., 2001; Matsumoto et al., 2006), our preliminary screening tests showed that these markers were not applicable to C. formosensis. In the present study, next-generation sequencing was used to develop two types of effective markers in C. formosensis: (1) SSR markers (codominant markers that are theoretically distributed throughout the genome) were developed from noncoding regions, and (2) expressed sequence tag (EST)-SSR markers (which are thought to be highly conserved in closely related species) were derived from functional sequences. Compared to SSR markers, EST-SSR markers demonstrate a higher level of transferability across related species (Varshney et al., 2005). Thus, EST-SSR markers are more suitable for the discrimination of species.
The logging of illegally sourced timber poses a great threat to biodiversity. To address this problem, scientists and forestry experts have been developing methods to identify individual trees (Dormontt et al., 2015). Tereba et al. (2017) reported SSR-based markers to identify and match logs to the stumps at a given locality. Lowe et al. (2010) also demonstrated that SSR markers allow log suppliers to validate the integrity of wood products within a supply chain. The markers developed in this study will be used not only for the individual identification of C. formosensis, but also as an identification tool for gathering evidence of illegal logging. We also tested the transferability of these markers in C. obtusa (Siebold & Zucc.) Endl. var. formosana (Hayata) Hayata to effectively distinguish C. formosensis from C. obtusa var. formosana, two taxa that are currently difficult to differentiate by phenotype.

Chiun-Jr Huang 1,2,3, Fang-Hua Chu 1,4, Shau-Chian Liu 5, Yu-Hsin Tseng 3, Yi-Shiang Huang 6, Li-Ting Ma 1, Chieh-Ting Wang 4, Ya Ting You 3, Shuo-Yu Hsu 1, Hsiang-Chih Hsieh 1, Chi-Tsong Chen 2, and Chi-Hsiang Chao 2,7
PRIMER NOTE
Genomic DNA was extracted from fresh leaves using the cetyltrimethylammonium bromide (CTAB) method (Doyle and Doyle, 1987) from three individuals (Chung 2434, Chung 2607, and Chung 2626) from two localities in Taiwan (Appendix 1). Development of the SSR markers from the DNA library followed the magnetic bead enrichment method of Glenn and Schable (2005), using the restriction enzymes AluI, XmnI, and HaeIII (New England Biolabs, Ipswich, Massachusetts, USA). The concentration and quality of SSR-enriched libraries were measured using a NanoDrop 2000 (Thermo Fisher Scientific, San Diego, California, USA) and Qubit 2.0 Fluorometer (Thermo Fisher Scientific), respectively, and then the DNA libraries were sequenced using the Illumina MiSeq System (2 × 300 bp paired-end; Illumina, San Diego, California, USA) at Tri-I Biotech (New Taipei City, Taiwan). A total of 13,653,074 raw reads were produced. The raw reads were quality-trimmed and merged using CLC Genomics Workbench version 7.5 (QIAGEN, Aarhus, Denmark). All sequence information has been uploaded to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRP145153). The contigs ranging from 80 to 530 bp in length were merged, and a total of 10,487,858 contigs were assembled. These contigs were screened using the Simple Sequence Repeat Identification Tool (SSRIT; Temnykh et al., 2001), and those containing at least five repeats of a di-, tri-, tetra-, penta-, or hexanucleotide motif were selected, resulting in a total of 305,556 SSR-containing sequences.
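The screening criterion described above (a di- to hexanucleotide motif repeated at least five times in tandem) can be illustrated with a short regular-expression sketch. This is not SSRIT and does not reproduce its output; the function names and the example sequence are hypothetical, and only perfect (uninterrupted) repeats are considered.

```python
import re

MIN_REPEATS = 5             # threshold stated in the text
MOTIF_LENGTHS = range(2, 7)  # di- to hexanucleotide motifs


def is_primitive(motif):
    """True if the motif is not itself a tandem repeat of a shorter unit."""
    n = len(motif)
    for d in range(1, n):
        if n % d == 0 and motif[:d] * (n // d) == motif:
            return False
    return True


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Illustrative only: imperfect or compound repeats are not detected.
    """
    seq = seq.upper()
    hits = []
    for motif_len in MOTIF_LENGTHS:
        # A motif followed by at least (min_repeats - 1) further copies.
        pattern = re.compile(
            r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1)
        )
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip e.g. "AGAG", which is the dinucleotide "AG" counted twice,
            # and "AA", which is a mononucleotide run.
            if not is_primitive(motif):
                continue
            repeats = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, repeats))
    return sorted(hits)


if __name__ == "__main__":
    # Made-up sequence containing (AG)6 and (ATC)5.
    example = "TTT" + "AG" * 6 + "CCGT" + "ATC" * 5 + "GG"
    for start, motif, repeats in find_ssrs(example):
        print(start, motif, repeats)
```

Filtering out non-primitive motifs avoids reporting the same dinucleotide tract a second time as a tetranucleotide repeat.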
To prepare the RNA library, RNA was extracted from fresh leaves of one individual (specimen C.T. Wang s.n.) using the CTAB method (Chang et al., 1993). The concentration and quality of total RNA were measured using the NanoDrop 2000 and Qubit 2.0 Fluorometer, respectively, and sequencing was performed via the Illumina HiSeq 2000 System (2 × 100 bp paired-end) by BGI Genomics (Shenzhen City, Guangdong Province, China). The adapter contamination and low-quality reads were removed by BGI Genomics. All sequence information has been deposited in the NCBI Sequence Read Archive (SRP145033). There were a total of 48,126,630 clean reads with 90 bp per read. Clean reads were assembled and merged into a single sequence 1,197,968 bp in length using Geneious version 10.2.3 (Biomatters Ltd., Auckland, New Zealand). pSTR Finder (Lee et al., 2015) was used to screen the EST-SSR sequences, and those containing at least five repeats of a di-, tri-, tetra-, penta-, or hexanucleotide motif were subsequently selected, yielding a total of 112 potential EST-SSR sequences.
Primers were designed for the potential SSR and EST-SSR sequences using Primer3 (Rozen and Skaletsky, 1999) with the following optimum primer conditions: length of 18 to 28 bp, annealing temperature of 45-60°C, and target product size of 80-300 bp. In total, 274 SSR primer pairs and 26 EST-SSR primer pairs were designed. To characterize the degree of polymorphism of each locus, 92 individuals from four populations (Appendix 1) were tested using the primer pairs. For this purpose, total genomic DNA was extracted from frozen leaves or wood samples using the Plant Genomic DNA Extraction Miniprep System Kit (Viogene, Taipei, Taiwan). PCR was conducted with a final volume of 20 μL containing approximately 2 ng of genomic DNA, 0.3 μL each of forward and reverse primer (10 μM), and 10 μL of Q-Amp 2× Screening Fire Taq Master Mix (Bio-Genesis Technologies, Taipei, Taiwan).
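As a quick check on the reaction mix just described, the final concentrations implied by the stated volumes can be worked out with the usual dilution relation (stock concentration × stock volume / final volume). The sketch below uses only the quantities given in the text; the function and variable names are illustrative. The text does not state how the remaining volume is divided between template and water, so that value is reported only as a combined remainder.

```python
FINAL_VOLUME_UL = 20.0   # final reaction volume from the text
PRIMER_STOCK_UM = 10.0   # primer stock concentration (μM)
PRIMER_VOLUME_UL = 0.3   # volume of each primer added
MASTER_MIX_STOCK_X = 2.0 # master mix supplied at 2×
MASTER_MIX_VOLUME_UL = 10.0
TEMPLATE_NG = 2.0        # approximate mass of genomic DNA per reaction


def final_concentration(stock_conc, stock_volume_ul, final_volume_ul):
    """Dilution relation: C_final = C_stock * V_stock / V_final."""
    return stock_conc * stock_volume_ul / final_volume_ul


# Each primer ends up at 0.15 μM in the reaction.
primer_final_um = final_concentration(
    PRIMER_STOCK_UM, PRIMER_VOLUME_UL, FINAL_VOLUME_UL
)

# The 2× master mix is diluted to 1×.
master_mix_final_x = final_concentration(
    MASTER_MIX_STOCK_X, MASTER_MIX_VOLUME_UL, FINAL_VOLUME_UL
)

# Template DNA at roughly 0.1 ng per μL of reaction.
template_ng_per_ul = TEMPLATE_NG / FINAL_VOLUME_UL

# Volume left for template solution plus water (split not given in the text).
remaining_ul = FINAL_VOLUME_UL - MASTER_MIX_VOLUME_UL - 2 * PRIMER_VOLUME_UL

if __name__ == "__main__":
    print(f"primer: {primer_final_um:.2f} uM each")
    print(f"master mix: {master_mix_final_x:.1f}x")
    print(f"template: {template_ng_per_ul:.2f} ng/uL")
    print(f"template + water: {remaining_ul:.1f} uL")
```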
The following PCR conditions were used: an initial denaturation of 95°C for 2 min; 30 cycles of 95°C for 45 s, a primer-specific annealing temperature (see [...] (Tables 1, 2) and confirmed to be polymorphic among the four tested populations (Table 3). All sequence information was combined and deposited at NCBI (BioProject PRJNA454510). The number of alleles per locus and levels of expected and observed heterozygosity were calculated using GenAlEx 6.503 (Peakall and Smouse, 2012). GENEPOP 4.2 (Raymond and Rousset, 1995) was used to test for Hardy-Weinberg equilibrium and linkage disequilibrium using exact tests. The total number of alleles ranged from one to 17 (Table 3). The levels of observed and expected heterozygosity ranged from 0.000 to 1.000 and from 0.000 to 0.903, with average values of 0.549 and 0.568, respectively. Significant deviations from Hardy-Weinberg equilibrium, in the form of heterozygote deficiency, were detected at 11 loci (Cred47, Cred231, Cred248, Cred249, Cred253, Cred260, Cred262, Cred264, Cred276, Cred277, and Cred280). Significant linkage disequilibrium (P < 0.001) was detected between Cred35 and Cred229, Cred281 and Cred297, Cred249 and Cred298, and Cred260 and Cred298. The putative functions of EST-SSR sequences were determined by BLASTX against the non-redundant GenBank database. Thirteen SSR and four EST-SSR loci were successfully amplified in C. obtusa var. formosana (Table 4).
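For readers unfamiliar with the per-locus statistics reported above, the following sketch shows how the number of alleles, observed heterozygosity (Ho), and expected heterozygosity (He) are commonly computed from diploid genotypes. It uses the standard definitions, with Ho as the proportion of heterozygous individuals and He = 1 − Σp_i², where p_i is the frequency of allele i. Whether this matches the exact estimator settings used in GenAlEx for this study is an assumption, and the allele sizes in the example are made up.

```python
from collections import Counter


def locus_stats(genotypes):
    """Summarize one locus from diploid genotypes.

    genotypes: list of (allele_a, allele_b) tuples, one per individual,
    with missing data already removed. Returns the number of alleles,
    observed heterozygosity (Ho), and expected heterozygosity (He).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")

    # Count every allele copy (two per individual).
    allele_counts = Counter(a for genotype in genotypes for a in genotype)
    total_copies = 2 * n

    # Ho: proportion of individuals carrying two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n

    # He: 1 minus the sum of squared allele frequencies.
    he = 1.0 - sum((c / total_copies) ** 2 for c in allele_counts.values())

    return {"n_alleles": len(allele_counts), "Ho": ho, "He": he}


if __name__ == "__main__":
    # Hypothetical fragment sizes (bp) for four individuals at one locus.
    example = [(150, 150), (150, 154), (154, 158), (150, 154)]
    print(locus_stats(example))
```

A monomorphic locus (a single allele) gives Ho = He = 0 under these definitions, which is consistent with the lower bounds of 0.000 reported for both measures.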

CONCLUSIONS
The 19 SSR and eight EST-SSR markers described in the present study are reported for the first time in C. formosensis. These endemic cypress-specific markers can be used not only for species identification, but also potentially to assist in the certification of legal timber trade and in studies of genetic diversity and population genetic structure in populations within Taiwan. Data from these types of studies will contribute to the conservation and management of C. formosensis, which is critically threatened by illegal logging.