Finding the pond through the weeds: eDNA reveals underestimated diversity of pondweeds

Premise of the Study The detection of environmental DNA (eDNA) using high‐throughput sequencing has rapidly emerged as a method to detect organisms from environmental samples. However, eDNA studies of aquatic biomes have focused on surveillance of animal species with less emphasis on plants. Pondweeds are important bioindicators of freshwater ecosystems, although their diversity is underestimated due to difficulties in morphological identification and monitoring. Methods A protocol was developed to detect pondweeds in water samples using atpB‐rbcL and ITS2 markers. The water samples were collected from the Grand River within the rare Charitable Research Reserve, Ontario (RARE). Short fragments were amplified using primers targeting pondweeds, sequenced on an Ion Torrent Personal Genome Machine, and assigned to the taxonomy using a local DNA reference library and GenBank. Results We detected two species earlier documented at the experimental site during ecological surveys (Potamogeton crispus and Stuckenia pectinata) and three species new to the RARE checklist (P. foliosus, S. filiformis, and Zannichellia palustris). Discussion Our targeted approach to track the species composition of pondweeds in freshwater ecosystems revealed underestimation of their diversity. This result suggests that eDNA is an effective tool for monitoring plant diversity in aquatic habitats.

recovery strategy for these two species called for surveys to reconfirm their presence at previously reported locations and identify new occurrence sites within their distribution (Parks Canada, 2012;Environment Canada, 2015).
The morphological identification of pondweeds is often limited by phenology (during fruiting period) and microscopic traits (Fernald, 1932). Therefore, species-level identification of pondweeds is difficult for non-experts who often assist with field work. Additionally, aquatic habitats are often less accessible than terrestrial ones, with many plant species being completely submersed, and thus difficult to find. Overall, these factors lead to sporadic, incomplete records for aquatic macrophytes in ecological surveys and inventories (Wetzel, 1983). A targeted metagenomic approach for detection of this group of aquatic plants has the potential to overcome difficulties with their monitoring and identification during ecological surveys.
We explored the detection of pondweeds using eDNA in water samples collected along the Grand River, Ontario, Canada, within the rare Charitable Research Reserve (RARE). We further compared our results with the checklist of RARE that was generated using traditional methods of collecting and morphological identification. Finally, we tested how the markers from different genome compartments (plastid and nuclear), marker length, and primer specificity affect the taxonomic assignment of the eDNA fragments to the species of pondweeds.

Selecting eDNA markers and creating a DNA reference library
Using tissue and DNA available within the Centre for Biodiversity Genomics (CBG) archive, we selected 30 species of pondweeds recorded in Ontario: Potamogeton (26), Stuckenia (3), and Zannichellia (1) (Brouillet et al., 2010;Desmet and Brouillet, 2013;dx.doi.org/10.5883/DS-VASCAN). Species of known hybrid origin (e.g., P. ×ogdenii), defined in the literature as nothospecies (McNeill et al., 2012), were not included in the list. Each species was represented on average by three specimens with verified taxonomy (90 specimens) (dx.doi.org/10.5883/DS-POTAM) and used to generate the local pondweed reference library (DS-POTAM). DNA was extracted with the standard CBG protocols (Ivanova et al., 2008(Ivanova et al., , 2011. Two highly variable DNA regions were selected to design eDNA markers: the plastid-encoded intergenic spacer atpB-rbcL (Ito et al., 2014) and the nuclear-encoded ribosomal internal transcribed spacer ITS2 (China Plant BOL Group, 2011). The primers for these markers, specific for Potamogetonaceae, were designed to recover all species selected for the local reference library that generated amplicons suitable for eDNA detection.
The primers successfully amplifying pondweeds (Appendix S1, A: primers 6-9) were then used to build a complete reference DNA barcode library for 30 species of pondweeds recorded in Ontario. Amplicons were obtained using the protocols described in Appendix S1 (B and C), sequenced following standard procedures for the ABI 3730xl DNA Analyzer (Applied Biosystems, Foster City, California, USA), and uploaded to BOLD. Newly generated data sets included 71 atpB-rbcL sequences (mean length 450 bp) for 28 species and 88 ITS2 sequences (mean length 260 bp) (Appendix S2). The generated DNA barcode reference libraries for atpB-rbcL and ITS2 were used to design shorter markers for amplifying degraded DNA present in environmental samples (Table 1; Appendix S1, A: primers 10-14). Based on the reference libraries, two markers of different length for atpB-rbcL (117 bp and 184 bp) were used to explore the influence of amplicon length on species detection, along with a 157-bp region of ITS2 as eDNA markers.
To model species resolution for Ontario pondweeds for the eDNA markers, one individual per species was selected, as no intraspecific variation was observed within the generated alignments. Each of the three fragments were analyzed using distances calculated with the Tamura-Nei model in Geneious version 9.1.5 (Tamura and Nei, 1993) and approximated with a maximum likelihood tree using the Fast Tree algorithm (Price et al., 2010) (Figs. 1-3, Appendix S3). The groups of unresolved species were indicated in the local reference libraries as "complexes. "

Selection of the field site
The selected site is along the Grand River within RARE (43.3859, where two species of pondweeds (S. pectinata and P. crispus L.) were previously documented (Telfer et al., 2015). Herbarium vouchers were identified and deposited at the BIO Herbarium, University of Guelph (OAC; acronym used in accordance with Index Herbariorum [Thiers, 2017]). Tissue and DNA from the specimens are stored at the Center for Biodiversity Genomics (BIOUG24048-C12, BIOUG24048-D12); DNA barcodes are available on BOLD (dx.doi.org/10.5883/DS-POTAM).

Collecting water samples and eDNA extraction
Water samples were collected from three locations 200 m apart from each other, on 8 September 2016. Three replicates of 0.5 L of water were sampled in sterile plastic bottles from each location (Fig. 4). The samples of water were transported in a cooler to the laboratory and stored at 4°C overnight. Samples were filtered through

High-throughput sequencing strategy
Each of the nine DNA samples, one negative DNA control, and one negative PCR control were amplified in four replicates (Fig. 4). The first round of PCR was performed with primers 10-14 (Appendix S1). Prior to the second round of PCR, replicates were pooled and labeled with IonExpress MID tags (Thermo Fisher Scientific, Waltham, Massachusetts, USA) using fusion primers to produce barcoded amplicon libraries (Appendix S1, B-D). The fusion primers used in this study targeted the priming region with a tail containing P1 ISP binding adapter, key, and IonExpress MID tags for the reverse primers and the trP1 adapter for the forward primers (Appendix S1, D). All labeled products were pooled and the sequencing library was prepared with the Ion PGM Hi-Q OT2 400 Kit and the Ion PGM Hi-Q Sequencing Kit (Thermo Fisher Scientific) according to the manufacturer's instructions. All products were sequenced using an Ion 318 v2 chip on the Ion Torrent Personal Genome Machine (PGM; Thermo Fisher Scientific).

Bioinformatic workflow
The raw sequencing data were subject to two different bioinformatics pipelines (Fig. 4). The first one compared dereplicated sequences (FASTX Toolkit: http://hannonlab. cshl.edu/fastx_toolkit/) with the local reference libraries for pondweeds for atpB-rbcL (28 species) and ITS2 (30 species) using BLAST through QIIME (Caporaso et al., 2010). This analysis was performed to differentiate and detect species with minor distances (1-2 bp). The second approach included generation of operational taxonomic units (OTUs) with 99% identity using Uclust version 1.2.22q (Edgar, 2010). The OTUs were similarly analyzed using a pondweed reference library, and the results were compared with those obtained through the analysis of dereplicated sequences.
To indicate taxonomic affiliation with the non-target organisms (non-pondweeds), the OTUs were compared with a global reference library (GenBank). Both pipelines included trimming of the primer sequences using Cutadapt version 1.8.1 (Marcel, 2011) and filtering based on quality (QV20) and length (minimum 100 bp) with Sickle version 1.33 (Joshi and Fass, 2011). Additional filtering criteria were applied to exclude sequences with (1) less than 98% identity of a query sequence to a reference, (2) lower than two thirds overlap calculated based on the average expected length of the amplified DNA fragments (i.e., 77 bp for 117-bp fragment of atpB-rbcL, 121 bp for 184-bp fragment of atpB-rbcL, 104 bp for 157-bp fragment of ITS2), and (3) threshold more than 10 reads assigned to a taxon in the reference library.

Taxonomic assignment of dereplicated sequences and OTUs
Abundance of reads after filtering for three markers (atpB-rbcL-117, atpB-rbcL-184, and ITS2-157) varied between 200,000-500,000 for all locations (Fig. 5). The number of dereplicated sequences and OTUs associated with atpB-rbcL-117 and atpB-rbcL-184 was on average substantially lower (8000 dereplicated sequences, 110 OTUs) compared to ITS2-157 (70,000 dereplicated sequences, 8500 OTUs) (Appendix S4). By contrast, the proportion of reads assigned to the target DNA (species of pondweeds represented in the local DNA reference library) was on average the highest for atpB-rbcL-184 (85%), followed by atpB-rbcL-117 (59%), and the lowest for ITS2-157 (10%) (Fig. 5). The low number of hits to target species for ITS2-157 indicates substantial co-amplification of non-target DNA. Despite the lower specificity of ITS2-157, three species of pondweeds were detected at each location, whereas atpB-rbcL-117 and atpB-rbcL-184 on average detected two species at each of the locations (Fig. 5). Five detected taxa were resolved to species level using our custom libraries, supported with more than one marker and/or in multiple locations (P. crispus, P. foliosus Raf., S. filiformis (Pers.) Börner, S. pectinata, and Z. palustris L.) ( Table 2). BLAST results for both data sets (dereplicated sequences and OTUs) using a pondweed reference library showed overall consistency among all eDNA markers and across all locations. An exception was the discrimination of S. filiformis from S. pectinata by the dereplication data sets for atpB-rbcL-117 at all three locations. The minor distance between these two  species (two base pairs: Appendix S3, A) was masked by the clustering approach for tracking pondweed diversity, which grouped reads belonging to closely related species. Twelve dereplicated sequences with atpB-rbcL-184 were assigned to S. vaginata (Turcz.) Holub only in the third location.
Only OTUs were compared with the global reference library (GenBank) to summarize the higher-level taxonomy of nontarget (i.e., non-pondweed) sequences. The BLAST results recovered S. pectinata and Z. palustris in all locations for both atpB-rbcL and ITS2 eDNA markers with similar read depth ( Table 2). The presence of P. crispus was also confirmed by GenBank data for ITS. In contrast, the sequences assigned with the local reference library to P. foliosus (100% identity to MF694340) in GenBank were identical to the specimen identified as P. pusillus L. collected from Turkey (KX273110). OTUs for atpB-rbcL-117 and atpB-rbcL-184 assigned to P. foliosus (100% identity to MF694509) with the local library matched P. gayi A. Benn. (a tropical species from South America commonly cultivated in aquariums) in GenBank (98% identity to KT634258).
In addition to the species of pondweeds, the GenBank reference data for ITS indicated only one species of flowering plants in the second location (Panicum capillare L. [34 reads, 1 OTU], a common weed in North America) ( Table 2). The same database indicated presence of Charophyta (location 1), Cyanobacteria (locations 1 and 2), and Proteobacteria (all three locations).

DISCUSSION
ITS2-157 showed overall higher species resolution (57%) compared to both atpB-rbcL eDNA markers (43-46%) despite a lower proportion of reads matching the local reference library. The higher resolution of ITS2 relative to the atp-rbcL markers is a result of the better taxonomic resolution of pondweed species by ITS2. The complexes of unresolved species with ITS2-157 included 2-3 closely related species, whereas atpB-rbcL showed polytomy for the groups of 2-5 species (Figs. 1-3). In addition to the "linear-leaved lineage" (complex 2), which is known for being challenging to discriminate morphologically (Haynes and Hellquist, 2000;Lindqvist et al., 2006), atpB-rbcL failed to identify the "broad-leaved" species of pondweeds, although they have more distinct morphological characteristics (complexes 1, 4, and 5). However, the species-level resolution was complemented by two genome compartments (chloroplast and nuclear) resulting in discrimination of 77% of the pondweed species from Ontario. For example, the lack of resolution between S. vaginata and S. filiformis with ITS2 (complex 6) was redeemed with atpB-rbcL, which discriminated these two species. Conversely, ITS2 unambiguously resolved P. crispus and P. hillii (complexes 1 and 2 with atpB-rbcL). Five species of pondweeds detected in our experiment were resolved by at least one of the three eDNA markers (Figs. 1-3, species in red font with asterisk).
The three markers tested in this experiment (atpB-rbcL-117, atpB-rbcL-184, and ITS2-157) demonstrated suitability for detection of eDNA extracted from water samples. The performance of the plastid markers was highly specific, targeting the species from Potamogetonaceae almost exclusively, especially the longer atpB-rbcL fragment (85% average). Despite ITS2 showing a lower ratio of reads matching the reference library (10%), it detected a relatively broader range of species from this group, compared to atpB-rbcL markers (Table 2, Fig. 5). The high copy number per compartment, and the less permeable membrane of chloroplasts and mitochondria relative to the nucleus, suggest that organellar DNA persists longer in the environment than nuclear DNA (Nielsen et al., 2007;Barnes and Turner, 2016). The lower number of well-preserved copies of nuclear DNA relative to cpDNA, in combination with more conserved priming regions for ITS2, may explain the observed co-amplification of non-target taxa. However, only a small proportion of the non-target ITS2 reads (~0.06%) were identified using a BLAST algorithm as different algal and microbial communities in the analyzed locations. Between two chloroplast markers, the shortest one (mean 117 bp) succeeded in consistent detection of S. filiformis in all three locations, whereas the longest marker (mean 184 bp) failed to do so. We suggest this species likely occurred farther from the experimental site, and was therefore represented by more degraded eDNA, compared to S. pectinata. The use of longer markers may indicate the location of target populations more accurately because the extent of DNA degradation increases with distance from source specimens (Seymour et al., 2018). In addition to two species previously documented at RARE (S. pectinata and P. crispus), we detected three species (P. foliosus, S. filiformis, and Z. palustris) that are new to the experimental site and not included in the checklist for this nature reserve. Our findings are supported by the high abundance of sequences from more than one eDNA marker and reported at all locations. We predict that the presence of these new species will be confirmed during future ecological surveys. Although we detected S. vaginata, its presence at RARE is dubious because it was found by a single eDNA marker (atpB-rbcL-184), at one location (3), and by a small number of reads (12). Further tests are required to confirm this finding.
The differences in the taxonomic assignment of OTUs using GenBank and the local reference library in BOLD (DS-POTAM) are explained by two factors. First is the inconsistency between taxonomic treatments in European and North American literature . For example, two species (P. foliosus and P. pusillus) are recognized in the Flora of North America (FNA; Haynes and Hellquist, 2000), whereas only one of these species (P. pusillus) is recognized in Europe (Uotila, 2006). We hypothesize that the specimen collected in Turkey potentially represents a P. foliosus population (the sequences MF694340 and KX273110 are identical) that was identified as P. pusillus (Aykurt et al., 2017) by European taxonomic treatments. The reference specimens for both species in the pondweed reference library were identified by the authors of the treatment in FNA (CCDB-26261-D06, CCDB-26261-B09) and are differentiated by our eDNA markers (Appendix S3). It allows us to be confident that the sequences detected in our eDNA samples correspond with the taxa identified as P. foliosus in the FNA. The second reason is an incompleteness of the reference library in GenBank for the taxa and/or the marker. Specifically, the lack of atpB-rbcL sequences for the local species (P. foliosus) in GenBank resulted in the top hit match with a tropical species (P. gayi). The developed method can be applied for the effective detection of aquatic plant species and to improve knowledge of their distribution. For example, P. hillii (Hill's pondweed), a species of special concern in Ontario (COSEWIC, 2005), is known in the province only from several locations in Manitoulin Island, Bruce and Wellington counties, in the habitats associated with dolomitic limestone (Hellquist, 1984). It is possible that its range is wider than currently reported, as additional populations might occur in similar habitats along the Niagara Escarpment and the Precambrian contact line. Furthermore, a combination of the detected species of pondweeds can be used as a "fingerprint" for a biological community, or as an indicator of water quality, which is valuable to broader ecological and biomonitoring questions. Targeting a broader group of freshwater macrophytes through eDNA detection is an effective strategy for tracking underestimated plant diversity in aquatic habitats.
Our study demonstrated that eDNA can effectively detect different species of pondweeds in water samples, evaluate their potential under-estimated diversity in aquatic communities, and identify locations of rare and protected species. The use of both chloroplast and nuclear markers improves species resolution among the selected species of pondweeds and increases reliability of the results for the species with comparably lower abundance. Finally, we demonstrated that the results of eDNA detection strongly depend on the completeness and accuracy of the reference libraries.

ACKNOWLEDGMENTS
Funding for this study was provided by the Species at Risk Research Fund for Ontario (SARRFO); the Ministry of Research, Innovation, and Science; and the Canada First Research Excellence Fund. This work represents a contribution to the Food from Thought research program. We thank the rare Charitable Research Reserve (RARE) and Jenna Quinn for facilitating collection of water samples. We thank the two anonymous reviewers whose feedback improved the manuscript. We also extend our thanks to Natalia Ivanova for practical recommendations regarding planning of the experiment, Vasco Elbrecht for his suggestions about the analysis and illustrations, and Elizabeth Sears and Stephanie Pedersen for assisting in the field and laboratory work.

SUPPORTING INFORMATION
Additional Supporting Information (Appendices S1-S4) may be found online in the supporting information tab for this article.