Low‐copy nuclear markers in Isoëtes (Isoëtaceae) identified with transcriptomes

Premise of the Study: Few genetic markers provide phylogenetic information in closely related species of Isoëtes (Isoëtaceae). We describe the development of primers for several putative low‐copy nuclear markers to resolve the phylogeny of Isoëtes, particularly in the southeastern United States.
Methods and Results: We identified regions of interest in Isoëtes transcriptomes based on low‐copy genes in other plants. Primers were designed for these regions and tested with 16 taxa of Isoëtes and one species of Lycopodium. Parts of the pgiC, gapC, and IBR3 gene regions show phylogenetic signal within the North American and Mediterranean clades of Isoëtes.
Conclusions: Transcriptome data prove useful for identification and primer design of low‐copy genes. Three new markers show potential for inferring phylogenies in regional clades of Isoëtes, and possibly across the entire genus.

Isoëtes L. (Isoëtaceae, Lycopodiophyta) is a cosmopolitan genus of ca. 250 recognized species. These heterosporous lycophytes consist of a 2- to 3-lobed rootstock that bears linear, quill-like, microphyllous leaves or sporophylls. All microphylls have the potential to develop into sporophylls (Foster and Gifford, 1974). Mega- and microsporangia are produced at the base of sporophylls, in some species covered by a layer of tissue called a velum. Traditionally, spore ornamentation and velum coverage have been considered taxonomically important. Although species inhabit a variety of ecological niches, from obligate aquatic to ephemeral terrestrial habitats, their morphology is extremely conserved. Phylogenetic studies in closely related clades of Isoëtes have been limited by a dearth of morphological features and molecular markers. Hoot and Taylor (2001) identified the nuclear ribosomal internal transcribed spacer (ITS), an intron of a nuclear LEAFY homolog (LFY), and the plastid atpB-rbcL spacer region as informative markers in Isoëtes. However, although these markers and the plastid rbcL gene show utility in large-scale, global phylogenies, they generally lose resolution at the regional level (Rydin and Wikström, 2002; Hoot et al., 2006; Larsén and Rydin, 2016). LFY is more variable than the other three markers and is fairly informative in recently diverged species groups (Hoot et al., 2004). With only a single informative nuclear marker within groups such as the eastern North American clade, it is difficult to fully test phylogenetic hypotheses of reticulate evolution and incomplete lineage sorting.
Transcriptomes provide a valuable tool for marker selection and PCR primer design in the absence of a sequenced genome, as is the case in Isoëtes. Databases such as the 1000 Plants project (http://www.onekp.com; Matasci et al., 2014) contain transcriptomes across all major lineages of land plants, allowing identification of unique marker regions for a group of interest. Here we describe the use of transcriptome data to develop PCR primers for phylogenetically informative low-copy nuclear markers in Isoëtes.

METHODS AND RESULTS
Markers of interest were selected based on a literature search of reportedly low-copy nuclear markers in ferns and mosses (Table 1; Szövényi et al., 2006; Schuettpelz et al., 2008; Rothfels et al., 2013). Nucleotide sequences for these markers were obtained from the National Center for Biotechnology Information's (NCBI) GenBank (http://www.ncbi.nlm.nih.gov/genbank/; Clark et al., 2016) or TreeBASE (http://www.treebase.org; Sanderson et al., 1994). Using BLAST+ (Camacho et al., 2009), local BLAST databases were constructed from each Isoëtes transcriptome. The sequences of selected fern (Rothfels et al., 2013) and moss (Szövényi et al., 2006) low-copy nuclear markers were BLASTed against the transcriptome databases to identify those markers present as single-copy in Isoëtes. These single-copy marker regions were extracted from their respective transcriptomes and aligned with marker sequences from the literature using Geneious version 7 (Kearse et al., 2012). Primer sequences from the literature were modified to match the Isoëtes transcriptome sequences.
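The BLAST-based screening step can be illustrated with a minimal Python sketch that reads BLAST+ tabular output (-outfmt 6) and flags query markers that hit exactly one transcript. This is an assumption-laden illustration, not the procedure used in the study: the text does not state how single-copy status was scored, the e-value cutoff (max_evalue) is arbitrary, and the marker and transcript identifiers are hypothetical.

```python
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def hits_per_query(lines, max_evalue=1e-10):
    """Map each query marker to the set of transcripts it hits.

    max_evalue is an illustrative cutoff, not a value from the study.
    """
    hits = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.split("\t")))
        if float(row["evalue"]) <= max_evalue:
            hits[row["qseqid"]].add(row["sseqid"])
    return dict(hits)


def single_copy_candidates(lines, max_evalue=1e-10):
    """Return queries that hit exactly one distinct transcript."""
    per_query = hits_per_query(lines, max_evalue)
    return sorted(q for q, subjects in per_query.items() if len(subjects) == 1)


# Hypothetical rows: identifiers and scores are made up for illustration.
example = [
    "markerA\ttranscript_0001\t82.5\t400\t70\t0\t1\t400\t10\t409\t1e-80\t300",
    "markerA\ttranscript_0001\t80.0\t150\t30\t0\t450\t600\t500\t650\t1e-20\t110",
    "markerB\ttranscript_0002\t85.0\t300\t45\t0\t1\t300\t5\t304\t1e-60\t250",
    "markerB\ttranscript_0003\t84.0\t290\t46\t0\t1\t290\t7\t296\t1e-55\t240",
    "markerC\ttranscript_0004\t70.0\t50\t15\t0\t1\t50\t1\t50\t0.5\t30",
]
print(single_copy_candidates(example))
```

In this toy input, markerA has two local alignments to the same transcript and is retained, markerB hits two different transcripts and is rejected, and markerC falls below the cutoff. A single transcript hit is only a proxy for low copy number, because transcriptomes capture expressed genes only and assemblers may split or merge paralogs.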
Plants were collected from the field, and leaf tissue was desiccated with silica gel. Voucher specimens have been deposited at the Old Dominion University herbarium (ODU) and/or the U.S. National Herbarium (US). DNA was extracted from approximately 200 mg of dried tissue with the QIAGEN DNeasy Plant Mini Kit (QIAGEN Inc., Valencia, California, USA) or AutogenPrep 965 (Autogen Inc., Holliston, Massachusetts, USA) using standard protocols. Sixteen diploid taxa of Isoëtes and one species of Lycopodium L. (one individual per taxon) were selected from available DNAs to represent various levels of divergence (Appendix 1).
Markers were selected for Sanger sequencing if they produced a single band across all samples and had a maximum size of ~1000 bp. PCR products were treated with ExoSAP-IT PCR cleanup enzyme mix (Affymetrix Inc., Santa Clara, California, USA) before cycle sequencing with BigDye Terminator v3.1 (Thermo Fisher Scientific Inc.). The labeled sequencing fragments were read on an ABI 3130xl Genetic Analyzer (Thermo Fisher Scientific Inc.), and the resulting chromatograms were edited and analyzed using Geneious (Kearse et al., 2012).
Initial screening of primers showed that all amplify in at least some of the eastern North American taxa. Gel electrophoresis revealed that IBR3_1 and Transducin_2 are too long (~2000 bp) and that Transducin_1 has both short and long copies in some individuals (~500 bp and ~1000 bp), making these poor candidates for Sanger sequencing without molecular cloning or gel extraction. Although gapC_short readily amplified, it is contained within gapC_long, making sequencing of the shorter fragment redundant. pgiC, IBR3_2 (hereafter IBR3), and gapC_long (hereafter gapC) were selected for PCR and sequencing of the full set of taxa (Appendices 2, 3).
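The two screening criteria (a single band and a product of at most ~1000 bp) can be written as a simple filter. The sketch below is illustrative only: the function and variable names are hypothetical, and the band sizes are the approximate values reported above for the rejected primer pairs.

```python
MAX_AMPLICON_BP = 1000  # approximate size ceiling stated in the text


def suitable_for_direct_sanger(band_sizes_bp, max_bp=MAX_AMPLICON_BP):
    """True if the gel shows exactly one band and it is within the size limit."""
    return len(band_sizes_bp) == 1 and band_sizes_bp[0] <= max_bp


# Approximate band sizes (bp) reported for the rejected primer pairs.
observed = {
    "IBR3_1": [2000],
    "Transducin_2": [2000],
    "Transducin_1": [500, 1000],
}

for name, bands in observed.items():
    print(name, suitable_for_direct_sanger(bands))
```

All three primer pairs fail the filter: the first two on length, and Transducin_1 because it yields two bands.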

pgiC
This primer pair is rooted in exons 14 and 16, and amplifies across introns 14, 15, and exon 15 of this locus (Rothfels et al., 2013). The region amplified easily across all taxa of Isoëtes and Lycopodium clavatum L., and generated consistently high-quality sequence data. All sequences aligned well, with a total alignment length of 466 bp and pairwise identity of 83%. Excluding L. clavatum, alignment length decreases to 357 bp and pairwise identity increases to 89%. Sequence length between these species of Isoëtes ranges from 310 to 347 bp, with a mean of 324 bp (Table 2). This is approximately half the length of the same region in ferns tested by Rothfels et al. (2013).
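For readers unfamiliar with the pairwise identity statistic reported here, the following Python sketch computes a mean pairwise percent identity from an alignment. It is an approximation under stated assumptions: Geneious may treat gaps and ambiguity codes differently, and the toy sequences are invented rather than taken from the pgiC data.

```python
from itertools import combinations


def pairwise_identity(a, b, gap="-"):
    """Percent of identical columns between two aligned sequences.

    Columns where both sequences have a gap are skipped; a gap opposite
    a base counts as a mismatch (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = same = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            same += 1
    return 100.0 * same / compared if compared else 0.0


def mean_pairwise_identity(alignment):
    """Average percent identity over all pairs of aligned sequences."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


# Toy alignment; sequences are invented for illustration.
toy = ["ACGT-ACGTA", "ACGTTACGTA", "ACGA-ACGTT"]
print(round(mean_pairwise_identity(toy), 1))
```

Removing a divergent outgroup from such a calculation shortens the alignment (fewer gap columns) and raises the mean identity, which is the pattern reported above when L. clavatum is excluded.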

gapC
The gapC gene encodes cytosolic glyceraldehyde-3-phosphate dehydrogenase and is part of the GAPDH gene family (Strand et al., 1997; Wall, 2002; Szövényi et al., 2006). Primers designed by Szövényi et al. (2006) are rooted in exons 5 and 9 and amplify all exons and introns in between. However, given concern that the resulting marker in Isoëtes may be too long for Sanger sequencing, the primers designed for this study were rooted in exons 5 and 8, amplifying introns 5, 6, and 7 and exons 6 and 7.
This marker showed the least ability to routinely generate high-quality sequence data. Although not detected in any of the transcriptomes available, it is possible this results from off-target amplification of other members of the GAPDH gene family (i.e., gapCp or an unnamed gapC/gapCp relative) (Schuettpelz et al.,

IBR3
Unlike pgiC and gapC, this gene does not have an extensive history of use as a phylogenetic marker. The IBR3 gene is thought to encode an indole-3-butyric acid-specific peroxisomal enzyme related to acyl-CoA dehydrogenases (Zolman et al., 2007). Rothfels et al. (2013) showed it to be single-copy throughout selected fern lineages, and this also appears to be the case in Isoëtes. Primers for the IBR3 marker amplify most species of Isoëtes easily, with the exception of two members of the Mediterranean clade (I. histrix Bory & Durieu and I. nuttallii A. Braun ex Engelm.). Alignment of Isoëtes sequences is 700 bp long with 87% pairwise identity (Table 2).

CONCLUSIONS
Transcriptome mining is shown to be a useful tool for identification of putative low-copy markers for primer design. Despite having access to transcriptomes of just three species of Isoëtes in the North American clade, primers could be designed for regions that show phylogenetic signal across widely divergent clades in the genus, and potentially across all Lycopodiophyta. Although techniques such as target enrichment allow for generation of data sets orders of magnitude larger (Mandel et al., 2014), design of primers for Sanger sequencing is still more time- and cost-efficient in taxonomic groups for which just a few markers may be needed to infer well-resolved phylogenies.
Note: + = successful amplification or sequence quality >85%; - = no amplification or sequence quality <85%; NA = sequencing not attempted.