Development of 10 single‐copy nuclear DNA markers for Euchresta horsfieldii (Fabaceae), a rare medicinal plant

Premise of the Study: Euchresta horsfieldii (Fabaceae) is a rare and endangered medicinal plant in Indonesia with restricted distribution. Single‐copy nuclear DNA (scnDNA) markers were developed for this species to facilitate further investigation of genetic diversity and population structure. Methods and Results: We performed RNA‐Seq and de novo assembly of the transcriptome. Ten primer sets were developed for E. horsfieldii, all of which also amplified in E. japonica and E. tubulosa. Conclusions: These scnDNA markers will be an important resource for the study of genetic diversity and population structure of E. horsfieldii and other species in the genus Euchresta.

package RNA-Seq by Expectation Maximization (RSEM; Li and Dewey, 2011), and only assembled genes with fragments per kilobase of transcript per million mapped reads (FPKM) values greater than 1 were selected for subsequent analysis. Coding regions within these unigenes were predicted by TransDecoder version 5.0.1 (https://github.com/TransDecoder). We performed Pfam and BLASTP searches of these protein-coding genes against UniProtKB/Swiss-Prot to predict their putative functions. Their ortholog groups were compared against Ricinus communis L., Arabidopsis thaliana (L.) Heynh., Oryza sativa L., and Physcomitrella patens (Hedw.) Bruch & Schimp. and identified using an online version of OrthoMCL-DB (Chen et al., 2006; http://orthomcl.org/orthomcl/). These ortholog groups were treated as putative single-copy genes (scnDNA). Approximately 21 million Illumina paired-end clean reads were generated (National Center for Biotechnology Information [NCBI] Sequence Read Archive [SRA] accession no. SRP149026). Clean reads were assembled into 61,796 unigenes with an N50 length of 2160 bp. Among these, 49,804 unigenes with FPKM greater than 1 were obtained, 27,405 protein-coding genes were predicted, and 1017 putative scnDNA were identified. We randomly selected 24 of these putative scnDNA for initial design of 72 PCR primers using Primer-BLAST (Ye et al., 2012).
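The two summary steps mentioned above (the N50 length of an assembly and the FPKM > 1 expression filter) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study; the function names `n50` and `filter_by_fpkm` and the toy input values are assumptions introduced only for illustration.

```python
def n50(lengths):
    """Return the N50 of a set of contig/unigene lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def filter_by_fpkm(fpkm_by_gene, threshold=1.0):
    """Keep only genes whose FPKM is strictly greater than the threshold."""
    return {gene: fpkm for gene, fpkm in fpkm_by_gene.items() if fpkm > threshold}


# Toy data (hypothetical, not from the study).
toy_lengths = [100, 200, 300, 400, 500]
toy_fpkm = {"gene1": 0.5, "gene2": 1.0, "gene3": 3.2}

print(n50(toy_lengths))
print(filter_by_fpkm(toy_fpkm))
```

The filter uses a strict inequality, matching the "greater than 1" wording in the text; a gene with FPKM exactly 1 is excluded.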
To validate the scnDNA markers, genomic DNA was extracted from two individuals each of E. horsfieldii (population BLBG) and E. japonica (populations SCB1, SCB2) (Appendix 1). Validation was done separately in these two species. DNA was extracted from approximately 15–20 mg of silica gel-dried leaf samples using the Plant Genomic DNA extraction kit (BioTeke, Beijing, China), following the manufacturer's instructions. DNA amplification was performed in a 20-μL reaction mixture containing 10 μL of 2× EasyTaq PCR SuperMix (TransGen Biotech Co.), 0.5 μL each of forward and reverse primer, 8.5 μL of ddH2O, and approximately 50 ng of template DNA. The PCR program was set as one cycle of 5 min at 95°C; 35 cycles of 30 s at 94°C, 90 s at 55°C, 60 s at 72°C; and a final extension of 10 min at 72°C. To check amplicon quality and quantity, each PCR product was electrophoresed for 15 min in a 1% agarose gel at 120 V. Amplicons with only one clear band were sequenced using an ABI 3730xl DNA Sequencer (Tsingke Biological Technology, Guangzhou, China). Ten primer pairs showed single clear bands and good electropherogram quality from Sanger sequencing. Sequence data were then read, trimmed, and exported to FASTA in Chromas version 2.6.2 (Technelysium, South Brisbane, Queensland, Australia; http://technelysium.com.au). FASTA sequences were aligned using the MUSCLE algorithm available in MEGA 7.0 (Kumar et al., 2016) and then formatted manually to PHYLIP file format as input for further analysis.
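The FASTA-to-PHYLIP step was done manually in the study. As an illustration only, the same conversion could be scripted roughly as follows. This sketch assumes an already aligned FASTA input and writes a relaxed sequential PHYLIP layout (names are not truncated to 10 characters); the function names `read_fasta` and `to_phylip` and the sample sequences are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Use the first whitespace-delimited token of the header as the name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            if name is None:
                raise ValueError("sequence data found before any FASTA header")
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_phylip(records):
    """Format aligned (name, sequence) records as relaxed sequential PHYLIP."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    width = max(len(name) for name, _ in records) + 2  # pad names into one column
    lines = [f"{len(records)} {lengths.pop()}"]
    lines += [f"{name.ljust(width)}{seq}" for name, seq in records]
    return "\n".join(lines) + "\n"


# Hypothetical two-sequence alignment.
aligned_fasta = ">ind1\nACGT\nAC\n>ind2\nACGTA-\n"
print(to_phylip(read_fasta(aligned_fasta)), end="")
```

The equal-length check guards against passing unaligned sequences, which a PHYLIP header (sequence count and alignment length) cannot represent.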
These 10 primer pairs were assessed for polymorphism in three populations of E. horsfieldii from Indonesia and one population of E. tubulosa from China, following the same protocol described above for marker validation. In total, we collected 38 wild individuals of E. horsfieldii from Indonesia and six individuals of E. tubulosa from China. Voucher specimens of E. horsfieldii were deposited in the Herbarium Hortus Botanicus Baliense (THBB), Bali Botanic Garden, Indonesian Institute of Sciences (LIPI), Bali, Indonesia (Appendix 1). PCR primer pairs and characteristics of the 10 newly developed scnDNA markers, GenBank accessions, and BLASTN hits are presented in Table 1. Genetic diversity measures of all samples derived from pairwise number of site differences, including nucleotide diversity (π), Watterson estimator (θw), and related measures, were calculated using DnaSP version 5.10 (Librado and Rozas, 2009) (Table 2). The average number of alleles was 5.9 (5 to 7), π was 5.03 × 10⁻³ (1.75 × 10⁻³ to 7.5 × 10⁻³), and θw was 4.01 × 10⁻³ (1.65 × 10⁻³ to 7.23 × 10⁻³). There was no significant Tajima's D; however, EhoScn04a and EhoScn16a showed significant negative Fay and Wu's H, indicating non-neutrality of these two loci. Most loci showed no linkage disequilibrium after 10,000 permutations in Arlequin version 3.5.2.2 (Excoffier et al., 2005), except EhoScg15b and EhoScg24b in population SCHU.
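For readers unfamiliar with the two diversity statistics reported above, the following Python sketch shows how per-site nucleotide diversity (π) and the Watterson estimator (θw) are defined for a small alignment. It is a simplified illustration, not a reimplementation of DnaSP: it assumes equal-length aligned sequences with no gaps or missing data, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns containing more than one distinct state (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """Per-site pi: mean number of pairwise differences divided by alignment length."""
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / len(pairs) / len(seqs[0])


def watterson_theta(seqs):
    """Per-site Watterson estimator: S / a1 / L, with a1 = sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    a1 = sum(1 / i for i in range(1, n))
    return segregating_sites(seqs) / a1 / len(seqs[0])


# Toy alignment (hypothetical): three sequences, four sites.
toy = ["AAAA", "AAAT", "AATT"]
print(segregating_sites(toy))
print(nucleotide_diversity(toy))
print(watterson_theta(toy))
```

Both estimators target the same population parameter under neutrality, which is why tests such as Tajima's D are built on the difference between them.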
For the population genetic analysis, the PHYLIP file format was first converted to STRUCTURE input using SPADS 1.0 (Dellicour and Mardulyn, 2014), followed by conversion to GENEPOP format using PGDSpider2 (Lischer and Excoffier, 2012). Allelic richness of each locus and population and number of private alleles (Appendix 2) were generated by PopGenReport version 3.0.0 (Adamack and Gruber, 2014

CONCLUSIONS
In this study, we demonstrated the application of RNA-Seq to develop scnDNA markers for E. horsfieldii, and these markers cross-amplified in E. japonica and E. tubulosa. These scnDNA markers will provide useful resources to study the population genetic diversity and population structure of these rare medicinal plants.

Note: A = number of alleles; h = haplotype diversity; k = average number of nucleotide differences; n = number of sequences used (four sequences of E. japonica, 12 sequences of E. tubulosa, and 80 sequences of E. horsfieldii); pn = proportion of polymorphic sites; π = nucleotide diversity; θw = Watterson estimator per site from S; S = variable sites. **Significant (α = 0.01).