Microsatellite marker development in the crop wild relative Linum bienne using genome skimming

PREMISE: Nuclear microsatellite markers were developed for Linum bienne, the sister species of the crop L. usitatissimum, to provide molecular genetic tools for the investigation of L. bienne genetic diversity and structure. METHODS AND RESULTS: Fifty microsatellite loci were identified in L. bienne by means of genome skimming, and 44 loci successfully amplified. Of these, 16 loci evenly spread across the L. usitatissimum reference nuclear genome were used for genotyping six L. bienne populations. Excluding one monomorphic locus, the number of alleles per locus ranged from two to 12. Four out of six populations harbored private alleles. The levels of expected and observed heterozygosity were 0.076 to 0.667 and 0.000 to 1.000, respectively. All 16 loci successfully cross-amplified in L. usitatissimum. CONCLUSIONS: The 16 microsatellite loci developed here can be used for population genetic studies in L. bienne, and 28 additional loci that successfully amplified are available for further testing.

pooled libraries (150 × 150 bp) were sequenced at Novogene (Beijing, China) in an Illumina HiSeq X lane (Illumina, San Diego, California, USA). Contigs generated by assembling raw reads with SPAdes version 3.13 (Bankevich et al., 2012) were mapped against a L. usitatissimum nuclear genome reference (GenBank IDs CP027619.1-CP027633.1) in BWA version 0.7.17 (Li and Durbin, 2009). The mapping contigs were then scanned for di-, tri-, and tetranucleotide repeat motifs with MSATCOMMANDER version 1.0.8 (Faircloth, 2008) using default settings to design primers. Contigs containing microsatellite loci were filtered in R version 3.5.2 (R Core Team, 2018) using a custom-made script. Loci with primers that met the following requirements were retained: pair penalty <1.7, left-right penalty <0.8, difference in melting temperature <2°C, primer distance from locus >20 bp, and pair product size between 89 and 301 bp. Polymorphic loci were then identified by BLASTing all contigs mapping to the L. usitatissimum reference genome for seven L. bienne individuals against the filtered contigs containing microsatellite loci, using BLAST version 2.2.31 (Altschul et al., 1990). Finally, 50 loci (Appendix 2) were left after filtering in R version 3.5.2 (R Core Team, 2018) based on BLAST output. Only microsatellite loci with the following features were retained: ≥4 repeats of the base motif, <5 mismatches between BLAST match and reference, and at least one individual per BLAST group differed from the reference in number of motif repeats. The code used for de novo assembly and selection of microsatellite loci is available in Appendix S1.
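The primer-retention thresholds listed above can be sketched as a simple filter. The authors used a custom R script (Appendix S1); the Python below is only an illustrative re-expression, not their code. The class, field names, and example values are hypothetical, and two readings are assumptions: that "left-right penalty <0.8" applies to each primer separately, and that the 89 to 301 bp product-size bounds are inclusive.

```python
from dataclasses import dataclass


@dataclass
class PrimerPair:
    # All field names are hypothetical; they are not MSATCOMMANDER column names.
    locus: str
    pair_penalty: float
    left_penalty: float
    right_penalty: float
    left_tm: float          # melting temperature of the left primer, degrees C
    right_tm: float         # melting temperature of the right primer, degrees C
    distance_to_locus: int  # bp between the nearest primer and the repeat
    product_size: int       # expected PCR product size, bp


def passes_primer_filter(p: PrimerPair) -> bool:
    """Apply the thresholds quoted in the text to one primer pair."""
    return (
        p.pair_penalty < 1.7
        # Assumption: the 0.8 threshold applies to each primer separately.
        and p.left_penalty < 0.8
        and p.right_penalty < 0.8
        and abs(p.left_tm - p.right_tm) < 2.0
        and p.distance_to_locus > 20
        # Assumption: the 89-301 bp bounds are inclusive.
        and 89 <= p.product_size <= 301
    )


# Invented example values, for illustration only.
candidates = [
    PrimerPair("locus_A", 1.2, 0.5, 0.6, 59.0, 60.0, 35, 210),
    PrimerPair("locus_B", 1.2, 0.5, 0.6, 59.0, 62.5, 35, 210),  # Tm gap too large
]
retained = [p.locus for p in candidates if passes_primer_filter(p)]
print(retained)
```

Keeping each threshold as its own condition makes it easy to relax a single criterion if too few loci survive filtering.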
For in vivo testing, DNA was extracted from seedlings of six L. bienne populations as well as other Linum species (Appendix 1). DNA extractions were performed with the ISOLATE II Plant DNA kit (Bioline, London, United Kingdom), using approximately 20 mg of dry leaf material and following the kit protocol with buffer PA1. The 50 loci were first amplified in seven individuals following the Taq DNA Polymerase Master Mix instructions (ThermoFisher Scientific, Waltham, Massachusetts, USA). The PCR program consisted of an initial denaturation of 2 min at 94°C; 35 cycles of 1 min at 94°C, 1 min at 56°C (annealing temperature [Ta]), and 2 min at 72°C; and a final extension step of 10 min at 72°C. For 12 out of 50 primer pairs, these conditions did not lead to amplification or produced multiple bands. When multiple bands were obtained, we tested the primers again by increasing Ta by 1°C. In situations where no initial amplification occurred, we decreased Ta by 1°C. In total, 44 loci amplified successfully at the end of this process (Appendix 2), with sizes as expected from MSATCOMMANDER output. To genotype all individuals, 16 loci were selected (Table 1) based on maximizing dispersion along the genome, the visual identification of polymorphisms on agarose gels, and avoiding the overlap of peaks during capillary electrophoresis by varying the PCR product sizes. PCR products were pooled in mixes of four loci, and reverse primers were tagged with four different fluorochromes (Table 1). PCR products were electrophoresed on an ABI PRISM 3700 DNA analyzer (Applied Biosystems, Foster City, California, USA), along with a GeneScan 500 LIZ fluorescent internal size standard. Transferability was also tested in three additional Linum species, including L. usitatissimum (Appendix 1), for the subset of 16 loci. Genotyping was conducted manually in Peak Scanner Software version 1.0 (Applied Biosystems). Genetic diversity analyses are presented in Table 2.
Allele number and observed heterozygosity (Ho) were estimated with the R package hierfstat version 0.04-22 (Goudet, 2005). Unbiased expected heterozygosity (Hs), departure from Hardy-Weinberg equilibrium (HWE), linkage disequilibrium, and number of private alleles were calculated using the R package poppr version 2.8.3 (Kamvar et al., 2014).
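To make the two heterozygosity statistics concrete, the following Python sketch computes them for a single diploid locus. It is not the hierfstat or poppr implementation: it assumes the common sample-size-corrected estimator n/(n − 1) × (1 − Σp²), where n is the number of sampled allele copies, and the package internals may differ in detail. Function names and genotype values are invented for illustration.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of scored individuals carrying two different alleles."""
    scored = [g for g in genotypes if g is not None]  # None = missing genotype
    return sum(a != b for a, b in scored) / len(scored)


def unbiased_expected_heterozygosity(genotypes):
    """Gene diversity with a small-sample correction (assumed estimator)."""
    alleles = [a for g in genotypes if g is not None for a in g]
    n = len(alleles)  # number of sampled allele copies (2 per diploid individual)
    counts = Counter(alleles)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_p2)


# Invented allele sizes (bp) for five individuals, one of them unscored.
example = [(150, 150), (150, 154), (154, 154), (150, 154), None]
ho = observed_heterozygosity(example)
hs = unbiased_expected_heterozygosity(example)
print(round(ho, 3), round(hs, 3))
```

A locus at which every individual shows the same two peaks would give an observed heterozygosity of 1.000 under this scoring, which is the pattern discussed in relation to paralogy.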
All 16 loci selected for genotyping amplified in L. bienne (1.23% missing data on average), but cross-amplification was successful only in L. usitatissimum. In L. bienne, locus ssr14.3 was monomorphic and therefore excluded from the analyses. Locus ssr2.2 included two different microsatellite regions that were then treated as independent loci (ssr2a.2 and ssr2b.2). The number of alleles per locus varied between two and 12 over all six L. bienne populations. All populations harbored one to three private alleles for one or more loci, except for populations VIL and IOW2. Depending on the population, 12 to 16 loci significantly deviated from HWE (P < 0.05). When loci were in HWE, it was mostly due to fixed alleles (Table 2). Ho ranged between 0.000 and 1.000, and Hs ranged between 0.000 and 0.773, across populations and loci. Linkage disequilibrium fluctuated between −0.336 and 1.000, with varying percentages of locus pairs in linkage disequilibrium within populations (between 9% and 54%, P < 0.05) (Appendix S2).
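As an illustration of what "private allele" means here, the sketch below flags alleles observed in exactly one population at a single locus. It is a minimal Python example, not the poppr routine used by the authors; the population labels and allele sizes are invented.

```python
from collections import Counter


def private_alleles(alleles_by_pop):
    """Return, per population, the alleles seen in that population only."""
    populations_with_allele = Counter()
    for alleles in alleles_by_pop.values():
        # Count each allele once per population, however many copies it has.
        populations_with_allele.update(set(alleles))
    return {
        pop: sorted(a for a in set(alleles) if populations_with_allele[a] == 1)
        for pop, alleles in alleles_by_pop.items()
    }


# Invented allele sizes (bp) at one locus in three hypothetical populations.
locus_alleles = {
    "pop_1": [150, 150, 154, 158],
    "pop_2": [150, 154, 154],
    "pop_3": [150, 162, 162],
}
private = private_alleles(locus_alleles)
print(private)
```

Repeating this per locus and summing the list lengths gives a per-population count of private alleles.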
The high Ho and frequent deviation from HWE (Table 2) might arise from fixed alleles on different paralogs produced by past polyploidization events in the genus Linum, which was also observed by Cloutier et al. (2012). If duplication is assumed when genotyping, consistency is essential while scoring loci showing a heterozygote fingerprint. Whether the latter is considered the result of homozygosity, heterozygosity, or a combination of both at the duplicated locus will affect estimates of allele frequencies.
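The effect of the scoring decision can be shown with a toy example. The Python sketch below takes the same two-peak fingerprint and scores it in two ways: as one heterozygous diploid locus, or as two paralogous copies each fixed for one allele. It covers only these two extremes, not the mixed case; the peak sizes are invented, and assigning the smaller peak to "copy 1" is an arbitrary assumption.

```python
def score_as_single_locus(peaks):
    """Treat the two peaks as the two alleles of one diploid locus."""
    return [tuple(sorted(p)) for p in peaks]


def score_as_fixed_paralogs(peaks):
    """Treat each peak as a homozygous genotype at its own paralog copy."""
    copy1 = [(min(p), min(p)) for p in peaks]  # smaller peak -> copy 1 (arbitrary)
    copy2 = [(max(p), max(p)) for p in peaks]
    return copy1, copy2


def heterozygote_fraction(genotypes):
    """Share of individuals scored with two different alleles."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


# Ten individuals that all show the same two peaks (invented sizes, bp).
peaks = [(150, 154)] * 10

single = score_as_single_locus(peaks)
copy1, copy2 = score_as_fixed_paralogs(peaks)

print(heterozygote_fraction(single))  # every individual scored heterozygous
print(heterozygote_fraction(copy1), heterozygote_fraction(copy2))  # both copies monomorphic
```

Under the first scoring each allele has a frequency of 0.5 at one locus; under the second, each copy carries a single fixed allele, so the same raw data yield very different diversity estimates.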

CONCLUSIONS
Microsatellite loci are ideal for providing fine-scale geographic and temporal information about population genetic processes such as relatedness. The set of loci developed here is distributed across the genome and will therefore be useful to distinguish between genome-wide processes caused by demography and locus-specific processes such as adaptation. However, putative paralogy needs investigation. The sequencing of different alleles and additional analysis of the genomic data set could serve to discriminate between paralog copies.

a The loci were obtained via genome skimming using the L. usitatissimum genome as reference; therefore, it was possible to identify a putative chromosome for each locus.
b The product sizes reported here are based on MSATCOMMANDER output, although the sizes were double-checked by looking at the agarose gels of the PCR products for all loci, where a ladder was added to assist the estimation of the products' approximate size.