Evolutionary studies require solid phylogenetic frameworks, but increased volumes of phylogenomic data have revealed incongruent topologies among gene trees in many organisms both between and within genomes. Some of these incongruences indicate polytomies that may remain impossible to resolve. Here we investigate the degree of gene-tree discordance in Solanum, one of the largest flowering plant genera that includes the cultivated potato, tomato, and eggplant, as well as 24 minor crop plants.
A densely sampled species-level phylogeny of Solanum is built using unpublished and publicly available Sanger sequences comprising 60% of all accepted species (742 spp.) and nine regions (ITS, waxy, and seven plastid markers). The robustness of this topology is tested by examining a full plastome dataset with 140 species and a nuclear target-capture dataset with 39 species of Solanum (Angiosperms353 probe set).
While the taxonomic framework of Solanum remained stable, gene tree conflicts and discordance between phylogenetic trees generated from the target-capture and plastome datasets were observed. The latter correspond to regions with short internodal branches, and network analysis and polytomy tests suggest the backbone is composed of three polytomies found at different evolutionary depths. The strongest area of discordance, near the crown node of Solanum, could potentially represent a hard polytomy.
We argue that incomplete lineage sorting due to rapid diversification is the most likely cause for these polytomies, and that embracing the uncertainty that underlies them is crucial to understand the evolution of large and rapidly radiating lineages.
Recent advances in high-throughput sequencing have provided larger molecular datasets, including entire genomes, for reconstructing evolutionary relationships (e.g., Ronco et al., 2021). Considerable progress has been made since the publication of the first molecular-based classification of orders and families of flowering plants (APG, 1998), with one of the most recent examples including a phylogenetic tree of the entire Viridiplantae based on transcriptome data from more than a thousand species (One Thousand Plant Transcriptomes Initiative, 2019). While large datasets have strengthened our understanding of evolutionary relationships and classifications across the Tree of Life, several of them have demonstrated repeated cases of persistent topological discordance across key nodes in birds (Suh et al., 2015; Suh, 2016), mammals (Morgan et al., 2013; Romiguier et al., 2013; Simion et al., 2017), amphibians (Hime et al., 2021), plants (Wickett et al., 2014; One Thousand Plant Transcriptomes Initiative, 2019), and fungi (Kuramae et al., 2006). Whereas previous expectations were that these “soft polytomies” would be resolved with the addition of more data, their persistence after the addition of more taxonomic and molecular data has led some authors to suggest that they actually represent “hard polytomies”, i.e., extremely rapid divergence events of three or more lineages at the same time or reticulate evolution due to species hybridization and/or introgression. In an era where obtaining genome-wide sampling of species for phylogenetic reconstruction has become mainstream, the question of whether persistent topological discordance can be resolved with more data or whether it reflects complex biological realities (Jeffroy et al., 2006; Philippe et al., 2011) is becoming increasingly common.
Discordance in phylogenetic signal can be due to three general classes of effects (Wendel and Doyle, 1998): (1) technical causes such as gene choice, sequencing error, model selection, or poor taxonomic sampling (Philippe et al., 2011, 2017); (2) organism-level processes such as rapid or convergent evolution, rapid diversification, incomplete lineage sorting (ILS), or horizontal gene transfer (Degnan and Rosenberg, 2009); and (3) gene and genome-level processes such as interlocus interactions and concerted evolution, intragenic recombination, use of paralogous genes for analysis, and/or non-independence of sites used for analysis. Together, these biological and non-biological processes can lead to conflicting phylogenetic signals between different loci in the genome and hinder the recovery of the evolutionary history of a group (Degnan and Rosenberg, 2009). Consequently, careful assessment of phylogenetic discordance across mitochondrial, plastid, and nuclear datasets is critical for understanding realistic evolutionary patterns in a group, as traditional statistical branch support measures fail to reflect topological variation of the gene trees underlying a species tree (Liu et al., 2009; Kumar et al., 2012).
Here we explore the presence of topological discordance in nuclear and plastome datasets of the large and economically important angiosperm genus Solanum L. (Solanaceae), which includes 1,228 accepted species and several major crops and their wild relatives, including potato, tomato and brinjal eggplant (aubergine), as well as at least 24 minor crop species (website: Solanaceaesource.org, accessed November 2020). Building a robust species-level phylogeny for Solanum has been challenging because of the sheer size of the genus, and because of persistent poorly resolved nodes along the phylogenetic backbone. Bohs (2005) published the first plastid phylogenetic analysis for Solanum and established a set of 12 highly supported clades based on her strategic sampling of 112 species (9% of the total species number in the genus), spanning morphological and geographic variation. As new studies have emerged with increased taxonomic and genetic sampling (e.g., Levin et al., 2006; Weese and Bohs, 2007; Stern et al., 2011; Särkinen et al., 2013; Tepe et al., 2016), the understanding of overall phylogenetic relationships within Solanum has evolved to recognise three main clades: (1) the Thelopodium clade containing three species sister to the rest of the genus; (2) Clade I containing c. 350 mostly herbaceous and non-spiny species (including the Tomato, Petota, and Basarthrum clades that contain the cultivated tomato, potato, and pepino, respectively); and (3) Clade II consisting of c. 900 predominantly spiny and shrubby species, including the cultivated brinjal eggplant (Table 1). The two latter clades are further resolved into 10 major and 43 minor clades (Table 1).
|Minor clade||Associated major clade (Särkinen et al., 2013)||New associated major clade (this study)||Species||Sampled species, supermatrix (%)||Sampled species, plastome (PL) (%)||Sampled species, target capture (TC) (%)|
|Thelopodium||Thelopodium||3||3 (100%)||1 (33%)||1 (33%)|
|African non-spiny||M Clade||VANAns||14||5 (36%)||1 (7%)||—|
|Normania||M Clade||VANAns||3||2 (67%)||1 (33%)||1 (33%)|
|Archaesolanum||M Clade||VANAns||8||8 (100%)||1 (13%)||1 (13%)|
|Valdiviense||M Clade||VANAns||1||1 (100%)||1 (100%)||1 (100%)|
|Dulcamaroid||M Clade||DulMo||45||25 (56%)||8 (18%)||1 (2%)|
|Morelloid||M Clade||DulMo||75||66 (88%)||15 (20%)||1 (1%)|
|Regmandra||Potato||Regmandra||12||6 (50%)||4 (33%)||1 (8%)|
|Pteroidea||Potato||10||10 (100%)||1 (10%)||—|
|Basarthrum||Potato||16||10 (56%)||3 (19%)||3 (19%)|
|Etuberosum||Potato||3||2 (67%)||2 (67%)||1 (33%)|
|Tomato||Potato||17||14 (82%)||8 (47%)||3 (18%)|
|Petota||Potato||113||61 (54%)||38 (34%)||2 (2%)|
|Clandestinum-Mapiriense||Clandestinum-Mapiriense||3||3 (100%)||1 (33%)||1 (33%)|
|Wendlandii-Allophyllum||Wendlandii-Allophyllum||10||7 (70%)||1 (10%)||1 (10%)|
|Nemorense||Nemorense||4||4 (100%)||1 (25%)||—|
|Pachyphylla||Cyphomandra||39||32 (82%)||1 (3%)||—|
|Cyphomandropsis||Cyphomandra||11||7 (64%)||1 (9%)||1 (9%)|
|Geminata||Geminata||150||68 (45%)||5 (3%)||1 (1%)|
|Reductum||Geminata||2||2 (100%)||1 (50%)||—|
|Brevantherum||Brevantherum||83||29 (35%)||3 (4%)||—|
|Gonatotrichum||Brevantherum||7||7 (100%)||1 (14%)||—|
|Inornatum||Brevantherum||5||2 (40%)||1 (20%)||—|
|Elaeagniifolium||Leptostemonum||5||5 (100%)||1 (20%)||1 (20%)|
|Micracantha||Leptostemonum||14||9 (64%)||1 (7%)||—|
|Torva||Leptostemonum||54||34 (63%)||5 (9%)||1 (2%)|
|Erythrotrichum||Leptostemonum||33||13 (39%)||1 (3%)||—|
|Thomasiifolium||Leptostemonum||9||4 (44%)||1 (11%)||—|
|Gardneri||Leptostemonum||10||8 (80%)||1 (10%)||—|
|Acanthophora||Leptostemonum||22||13 (59%)||1 (5%)||—|
|Sisymbriifolium||Leptostemonum||4||4 (100%)||1 (25%)||1 (25%)|
|Carolinense||Leptostemonum||11||8 (73%)||1 (9%)||—|
|Hieronymi||Leptostemonum||1||1 (100%)||1 (100%)||—|
|Eastern Hemisphere Spiny||Leptostemonum||332||197 (59%)||24 (7%)||16 (5%)|
|Crotonoides||Leptostemonum||3||2 (67%)||1 (33%)||—|
|Multispinum||Leptostemonum||1||1 (100%)||1 (100%)||—|
|TOTALS:||1228||746 (60%)||140 (11%)||39 (3%)|
Despite these advancements, phylogenetic relationships between many of the major clades of Solanum have remained poorly resolved, mainly due to limitations in taxon and molecular marker sampling. The most recent genus-wide phylogenetic study by Särkinen et al. (2013), based on seven markers (two nuclear and five plastid) and fewer than half (34%) of the species of Solanum, failed to resolve the relationships among major clades, especially within Clade II and the large component Leptostemonum clade, which includes the Old World spiny clade, comprising almost all spiny Solanum species that occur in the eastern hemisphere. To reduce colonial connotations associated with this name, we hereafter refer to this clade as the Eastern Hemisphere Spiny clade (EHS; Table 1).
To gain a better understanding of the evolutionary relationships of Solanum, we built a new Sanger supermatrix that included 60% of the species of the genus and compared the phylogenetic relationships obtained with the Sanger supermatrix with genus-wide plastid (PL) and nuclear target-capture (TC) phylogenomic datasets. We ask: (1) Does a significant increase in taxon sampling of the supermatrix dataset lead to significant changes in the circumscription of major and minor clades in Solanum? (2) Does increased gene sampling in both plastome and nuclear data resolve previously identified polytomies between major clades? (3) Is there evidence of discordance within and between genomic datasets? and (4) Are areas of high discordance in the Solanum phylogeny better represented by polytomies rather than bifurcating nodes? Comparison of the topologies from the different datasets, and results from discordance analyses, a filtered supertree network, and polytomy tests lead us to suggest that some of the soft polytomies of Solanum might be hard polytomies caused by rapid speciation and diversification coupled with ILS. We discuss the consequences that such an interpretation has for investigating the biogeography and morphological trait evolution across the economically important genus.
MATERIALS AND METHODS
A Sanger sequence supermatrix was generated including all available sequences from GenBank related to the genus Solanum for nine regions: (1) the nuclear ribosomal internal transcribed spacer (ITS); (2) low-copy nuclear region waxy (i.e., GBSSI); (3) two protein-coding plastid genes matK and ndhF; and (4) five non-coding plastid regions (ndhF-rpl32, psbA-trnH, rpl32-trnL, trnS-G, and trnT-L). Only vouchered and verified samples were utilized. All sequences were blasted against target regions in USEARCH version 11 (Edgar, 2010). Taxon names were checked against SolanaceaeSource synonymy (website: solanaceaesource.org, accessed November 2020) and duplicate sequences belonging to the same species were pruned out to retain a single individual per taxon. A total of 817 Sanger sequences were generated and added to the matrix, adding 129 previously unsampled species and new data for 257 species (Appendix S1). Final species sampling across major and minor clades of Solanum varied from 13 to 100%, with 742 species of Solanum (60% of the 1228 currently accepted species as of November 2020; Table 1). Four species of Jaltomata Schltdl. were used as an outgroup (Appendix S1).
To assess phylogenetic discordance within Solanum, a set of species was selected for the phylogenomic study to represent all 10 major and as many of the 43 minor clades of Solanum as possible (Table 1), as well as the outgroup Jaltomata. The final sampling included 151 samples for the plastome (PL) dataset (140 Solanum species; Table 1 and Appendix S2) and 40 samples for the target-capture (TC) dataset (39 Solanum species; Table 1 and Appendix S3). For the PL dataset, 86 samples were sequenced using low-coverage genome skimming, and the remaining samples were downloaded from GenBank (November 2019). For the TC dataset, 12 samples were sequenced as part of the Plant and Fungal Trees of Life project (Baker et al., 2021) using the Angiosperms353 bait set (Johnson et al., 2019). In addition, 17 sequences were added from an unpublished dataset provided by A. McDonnell and C. Martine. Sequences for the remaining 12 samples were extracted from the GenBank SRA archive using the SRA Toolkit 2.10.7 (website: https://github.com/ncbi/sra-tools; Appendix S3).
DNA extraction, library preparation and sequencing
Supermatrix Sanger sequencing
DNA extractions for Sanger sequencing were done using DNeasy plant mini extraction kits (Qiagen, Valencia, California, USA) or the FastDNA kit (MP Biomedicals, Irvine, California, USA). Amplification of waxy followed Levin et al. (2005) using two (waxyF with 1171R and 1058F with 2R) or four primer pairs (waxyF with Ex4R, Ex4F with 1171R, 1058F with 3′N, and 3F with 2R). trnT-L was amplified with primers a-d and c-f (Taberlet et al., 1991; Bohs and Olmstead, 2001; Bohs, 2004). ndhF amplification followed Bohs and Olmstead (1997), psbA-trnH followed Sang et al. (1997), matK followed Rosario et al. (2019), ITS and trnS-G followed Levin et al. (2006), and rpl32-trnL and ndhF-rpl32 followed Miller et al. (2009). Sequencing was carried out on ABI automated sequencers at the University of Utah DNA sequencing facility (Salt Lake City, Utah, USA), at the Natural History Museum (London, UK), and at Myleus Biotecnologia (Belo Horizonte, Brazil). Contigs were visually checked in Sequencher version 4.8 (GeneCodes, Ann Arbor, Michigan, USA) and Geneious Prime 2020.1.1 (website: https://www.geneious.com). The combined matrix was 10,908 bp long (Appendix S4). The two most densely sampled regions (trnT-L and ITS) included 84% and 82% of the sampled species, respectively; waxy (54%) and ITS (67%) loci had the most parsimony informative characters (Appendix S4).
PL and TC datasets
DNA for high-throughput sequencing was extracted using the low-salt CTAB method (Arseneau et al., 2017) and quantified on a Qubit fluorometer (Thermo Fisher Scientific, Waltham, Massachusetts, USA). Genome skimming was done at the Institute of Biotechnology, University of Helsinki (Finland). A paired-end genomic library was constructed using the Nextera DNA library preparation kit (Illumina, San Diego, California, USA). Fragment analysis was conducted with an Agilent Technologies (Santa Clara, California, USA) 2100 Bioanalyzer using a DNA 1000 chip. Sequencing was performed on an Illumina MiSeq platform from both ends with a read length of 150 bp. DNA extraction, quantification, and sequencing for TC followed Johnson et al. (2019). All PL and TC reads have been submitted to GenBank and the European Nucleotide Archive (Appendices S2 and S3).
Overview of methodological strategy
Ten phylogenetic analyses with different methodological strategies were compared across the supermatrix, PL and TC datasets, to test if the phylogenetic results were robust despite these different choices (e.g., Philippe et al., 2011, 2017; Saarela et al., 2018; Duvall et al., 2020). The Sanger supermatrix analyses based on Maximum Likelihood (ML) and Bayesian inference (BI) were used as a reference to compare results from the PL and TC species trees because the Sanger supermatrix had the most complete taxonomic sampling (Table 2). For the PL dataset, a total of four analyses were compared to test the effect of missing data and sampling on the resulting phylogenies, as well as the effect of different partitioning schemes in IQ-TREE2 (Table 2; Minh et al., 2020b). For the TC dataset, a total of four analyses were compared to test the effect of the phylogenetic method (ML vs. coalescent methods), missing data, and taxonomic sampling on the resulting phylogenies (Table 2). Full methods for all analyses are described below. All bioinformatic analyses were run either on the Toby-G1 server at the Royal Botanic Garden Edinburgh (Scotland, UK), or the Crop Diversity Server from the James Hutton Institute, in Dundee, Scotland, except for the supermatrix ML analysis.
|Dataset||Taxon and genomic sampling||Phylogenetic method||Partitioning scheme||Acronym|
|Supermatrix||746 taxa, 9 loci||ML: RAxML||—||Supermatrix ML|
|BI: BEAST2||—||Supermatrix BI|
|Plastome (PL)||151 taxa, full + partial plastomes||ML: IQ-TREE2||Unpartitioned||PL-151-UP|
|151 taxa, full + partial plastomes||ML: IQ-TREE2||Best-Partition scheme||PL-151-BP|
|125 taxa, full plastomes only||ML: IQ-TREE2||Unpartitioned||PL-125-UP|
|125 taxa, full plastomes only||ML: IQ-TREE2||Best-Partition scheme||PL-125-BP|
|Target capture (TC) (A353)||40 taxa, 338 exons||ML: IQ-TREE2||—||TC-min04-ML|
|40 taxa, 338 exons||Coalescent: ASTRAL-III||—||TC-min04-ASTRAL-III|
|40 taxa, 303 exons||ML: IQ-TREE2||—||TC-min20-ML|
|40 taxa, 303 exons||Coalescent: ASTRAL-III||—||TC-min20-ASTRAL-III|
Sequences were aligned in MAFFT version 7 (Katoh et al., 2005), manually checked, and optimised. Short multi-repeats and ambiguously aligned regions were excluded manually or with trimAl (-gappyout method; Capella-Gutiérrez et al., 2009). Both ML and BI analyses were run on individual loci, as well as on a combined plastid alignment (seven loci in total) to check for topological incongruences, rogue taxa, and misidentified sequences. Visual checks revealed a small number of clear mis-determinations and/or lab errors. A further 26 samples were removed based on high RogueNaRok scores (Aberer et al., 2013). Nuclear sequence data (ITS and waxy) were identified for all known polyploid species (63 species, Appendix S5), and subsequently examined to determine if there were any strong incongruences with the results from the plastid loci. As none were found (Appendices S6 and S7), sequences from these species were kept in the final supermatrix analysis.
Maximum likelihood (ML) and Bayesian inference (BI) analyses were run on all nine loci individually and on the combined plastid dataset (seven loci). ML analyses were run in RAxML-HPC version 8.2.12 (Stamatakis, 2014) on XSEDE on CIPRES Science Gateway version 3.3 (Miller et al., 2010), with 10 independent runs based on unique starting trees. The General Time Reversible (GTR) model with CAT (Tavaré, 1986; Stamatakis, 2006) was used for all partitions. A total of 1,000 non-parametric bootstraps were run; bootstrap support (BS) ≥ 95% was considered strong, 75 to 94% moderate, and 60 to 74% weak.
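The support categories above can be expressed as a small helper for annotating trees. This is only an illustrative sketch: the function name is hypothetical, and the "unsupported" label for values below 60% is an assumption, since the text defines no category for that range.

```python
def classify_bootstrap(bs: float) -> str:
    """Map a bootstrap percentage to the support category used in the text."""
    if not 0 <= bs <= 100:
        raise ValueError("bootstrap support must lie between 0 and 100")
    if bs >= 95:
        return "strong"
    if bs >= 75:
        return "moderate"
    if bs >= 60:
        return "weak"
    # Assumed label: the text does not name a category below 60%.
    return "unsupported"


if __name__ == "__main__":
    for value in (100, 94, 60, 42):
        print(value, classify_bootstrap(value))
```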
BI analyses were run using BEAST version 2.6.3 (Bouckaert et al., 2019), with two parallel runs sampling trees every 10,000 generations. ModelTest-NG (Darriba et al., 2020) was used to find the most suitable nucleotide substitution model for the individual loci and combined plastid loci; JC + G4 was specified for the ITS and trnS-G regions, GTR + G4 for the psbA-trnH, trnT-L, rpL32 and matK regions, and the GTR + I + G4 model for all other regions, as well as the combined plastid dataset and the full supermatrix dataset. For all analyses, an uncorrelated log-normal relaxed clock, a birth-death tree prior, and a normally distributed UCLD.mean prior (mean = 1, SD = 0.3) were specified. All runs were checked with Tracer version 1.7.1 (Rambaut et al., 2018) to ensure that adequate effective sample sizes were reached (ESS > 200). LogCombiner and TreeAnnotator were used to generate the final maximum clade credibility tree with a 15% burn-in. Posterior probability (PP) values ≥0.95 were considered strong, and values from 0.75 to 0.94 moderate to weak.
The ML Sanger supermatrix analysis was run on the concatenated matrix of all nine loci in RAxML, with the same settings as described above. The BI Sanger supermatrix analysis was partitioned among ITS, waxy, and the plastid regions. Modifications to the BI analysis included a monophyly constraint on Solanum and four parallel runs of 60 million generations each, with two chains, sampling trees every 10,000 generations. The ML best tree was used as a starting topology to speed up convergence of the chains.
Paired reads from genome skimming were cleaned using BBDuk from the BBTools suite (sourceforge.net/projects/bbmap/; ktrimright = t, k = 27, hdist = 1, edist = 0, qtrim = rl, trimq = 20, minlength = 36, trimbyoverlap = t, minoverlap = 24, and qin = 33). Sequence quality was checked with FastQC (Andrews, 2010) and MultiQC (Ewels et al., 2016). Plastome assembly was done using de novo assembly with Fast-Plast version 1.2.6 (website: https://github.com/mrmckain/Fast-Plast), and reference-guided assembly using GetOrganelle version 1.6.2.e (Jin et al., 2020) with the high-coverage plastome sequence of S. dulcamara L. (GenBank KY863443; Amiryousefi et al., 2018). For GetOrganelle, the following settings were used: -w 0.6, -R 20, and -k 85,95,105,127; for Fast-Plast, the Solanales Bowtie index was used for the assembly. Results from both methods were aligned in Geneious and visually checked to determine consistency. Assembly quality was assessed using the reads identified from the Bowtie step in the Fast-Plast analysis, which were mapped against the final recovered plastome sequence using BWA (Li and Durbin, 2010). Mean and standard deviation of coverage depth for each base pair were determined by examining the same files in Geneious. Assemblies were annotated using both Chlorobox GeSeq (Tillich et al., 2017) and the “Annotate from database” tool in Geneious using the reference plastid genome of S. dulcamara. Results were compared to ensure that start and stop codons for exon boundaries were congruent. Annotated plastomes were submitted to GenBank (Appendix S2). A total of 55 full plastomes were assembled with a mean length of 155,498 bp (max. 156,138 bp, min. 154,715 bp; Appendix S2) and a mean coverage of 158 (min. 22, max. 571; Appendix S2), along with 28 partial plastomes (45,398 to 154,598 bp) with a mean coverage of 29 (min. 4, max. 96; Appendix S2). All plastomes had a highly conserved quadripartite structure, with no loss, duplication, or expansion of gene families.
Plastomes from this study and those retrieved from GenBank were aligned in Geneious using MAFFT (Katoh et al., 2005), visually checked, and corrected. One copy of the inverted repeat (IRa) was removed prior to phylogenomic analyses, although 1,189 bp were kept at the beginning of the region to be able to extract the gene that spans the boundary between the small single copy (SSC) and IRa region. We then separated the plastome alignment into: (1) 79 protein-coding regions; (2) 15 introns; and (3) 73 intergenic regions. For each dataset, the ambiguously aligned regions and polyA repeats were removed, using visual checks for the exon and intron regions, and the strict mode of trimAl (Capella-Gutiérrez et al., 2009) for the intergenic regions (Appendix S8). Sequences shorter than 25% of the length of the aligned matrix for each region and columns containing >75% gaps were removed in trimAl (Capella-Gutiérrez et al., 2009) to avoid issues with long branch attraction following Gardner et al. (2021). Two pseudogenes (ycf1 and rps19) at the junction of IRa and the large single copy (LSC) region (Amiryousefi et al., 2018), and four intergenic regions with no parsimony informative characters, were excluded from the final analysis. All remaining locus alignments were concatenated for the final PL phylogenetic analyses.
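The two filtering thresholds described above (sequences shorter than 25% of the alignment length, columns with more than 75% gaps) can be illustrated with a minimal sketch. It is not the trimAl implementation: the function name, the dictionary input format, the use of "-" as the only gap character, and the order of the two steps are assumptions made for illustration.

```python
def filter_alignment(aln, min_len_frac=0.25, max_gap_frac=0.75):
    """Filter an alignment given as {name: aligned_sequence}.

    Step 1 drops sequences whose ungapped length is below `min_len_frac`
    of the alignment width. Step 2 drops columns in which more than
    `max_gap_frac` of the remaining sequences carry a gap.
    """
    if not aln:
        return {}
    width = len(next(iter(aln.values())))
    if any(len(seq) != width for seq in aln.values()):
        raise ValueError("all sequences must have the same aligned length")

    # Step 1: remove very short (mostly gap) sequences.
    kept = {
        name: seq
        for name, seq in aln.items()
        if sum(ch != "-" for ch in seq) >= min_len_frac * width
    }
    if not kept:
        return {}

    # Step 2: remove gap-rich columns, judged on the remaining sequences.
    n = len(kept)
    keep_cols = [
        i
        for i in range(width)
        if sum(seq[i] == "-" for seq in kept.values()) / n <= max_gap_frac
    ]
    return {name: "".join(seq[i] for i in keep_cols) for name, seq in kept.items()}


if __name__ == "__main__":
    toy = {
        "s1": "ACGTA",
        "s2": "ACGT-",
        "s3": "ACG--",
        "s4": "ACG--",
        "s5": "AC-T-",
        "s6": "A----",  # only 1 of 5 sites present: removed in step 1
    }
    for name, seq in filter_alignment(toy).items():
        print(name, seq)
```

In the toy input, the last column is a gap in four of the five retained sequences (80%), so it is removed along with the nearly empty sequence.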
To test for the effect of missing data, two datasets were compared: (1) a matrix with 151 taxa containing all 140 species selected for this study with higher proportion of missing data (147,278 bp long with the second IR removed); and (2) a matrix with 125 samples containing only complete plastid sequences (Appendices S2 and S8).
ML searches were run on all PL datasets in IQ-TREE2 (Minh et al., 2020b) with 1,000 non-parametric bootstraps. Optimal substitution models were determined using –TEST in IQ-TREE2 (Appendix S9). For both PL datasets, topologies from two different partitioning schemes were also compared (unpartitioned vs. best-fit partition scheme based on PartitionFinder; Lanfear et al., 2012) in IQ-TREE2, to test if accounting for variation in substitution rate amongst loci affected the phylogenetic results. BS values ≥95% were considered strong, 75 to 94% moderate, and 60 to 74% weak.
Trimmomatic (Bolger et al., 2014) was used to trim reads (TruSeq3-PE-simpleclip.fa:1:30:6, LEADING:30, TRAILING:30, SLIDINGWINDOW:4:30, MINLEN:36). Read quality was checked with FastQC (Andrews, 2010) and MultiQC (Ewels et al., 2016). Over-represented repeat sequences were removed with CutAdapt (Martin, 2011). HybPiper (Johnson et al., 2016) was used to produce reference-guided de novo assemblies using the reference provided by Johnson et al. (2019). Putative paralogs were identified using the HybPiper script “paralog_retriever.py”. Phylogenies were generated for all 45 loci for which paralog warnings were found using MAFFT (Katoh et al., 2005) and FastTree (Price et al., 2010). Five loci were deleted, and several taxa whose paralogs caused paraphyly of clades were excluded from 27 loci (one to seven taxa per locus). A single gene (g5299) presented a clear duplication event and was divided into two separate matrices for downstream analyses.
Default HybPiper settings were used for all but three samples (S. betaceum Cav., S. valdiviense Dunal, and S. etuberosum Lindl.), for which the coverage cutoff was reduced from eight to four to maximise recovery of target genes. One sample (S. terminale Forssk.) was excluded due to poor sequence quality. Only the exon dataset was analyzed in downstream phylogenomic analyses, because the transcriptome dataset showed large differences in the recovered flanking regions of target loci between samples, likely due to post-transcriptional splicing and editing of messenger RNA. The HybPiper script “fasta_merge.py” was used to concatenate all genes together and produce a partition file. In summary, an average of 289 genes per sample were recovered for the TC analysis (min 48, max 340) when the two samples with low numbers were excluded (S. betaceum and S. etuberosum, Appendix S3). Furthermore, to reduce the effect of missing data and long branch attraction, sequences shorter than 25% of the average length for the gene were eliminated. The number of loci retained in the min20 and min04 datasets was 310 and 348, respectively, with the final aligned length varying between 242,272 bp and 261,975 bp (Appendix S10).
The effect of missing data was tested by comparing two different sampling thresholds based on the minimum number of taxa in each of the target genes alignments (min20 vs. min04, i.e., a minimum of 20 taxa per gene and a minimum of four taxa per gene, respectively) using HybPiper (Johnson et al., 2016) to retrieve and filter the genes.
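The min20 and min04 thresholds amount to keeping only those gene alignments that contain at least a given number of taxa. The sketch below illustrates that rule only; the function name, the input format, and the gene and taxon labels in the usage example are hypothetical and are not taken from HybPiper.

```python
def filter_genes_by_taxa(gene_taxa, min_taxa):
    """Return the sorted names of genes sampled for at least `min_taxa` taxa.

    `gene_taxa` maps a gene name to the set of taxa present in its alignment.
    """
    if min_taxa < 1:
        raise ValueError("min_taxa must be at least 1")
    return sorted(gene for gene, taxa in gene_taxa.items() if len(taxa) >= min_taxa)


if __name__ == "__main__":
    # Hypothetical toy data: three genes with different taxon coverage.
    toy = {
        "geneA": {f"taxon{i}" for i in range(25)},
        "geneB": {f"taxon{i}" for i in range(6)},
        "geneC": {f"taxon{i}" for i in range(3)},
    }
    print("min20:", filter_genes_by_taxa(toy, 20))
    print("min04:", filter_genes_by_taxa(toy, 4))
```

A stricter threshold trades gene number for matrix completeness, which is why the two datasets differ in the number of loci retained.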
ML analyses were run on both TC datasets in IQ-TREE2 (Minh et al., 2020b) with partitioning between loci. In addition, IQ-TREE2 was used to generate individual ML trees for each locus, and the resulting phylogenetic trees were used for coalescent analyses with ASTRAL-III version 5.7.3 (Appendix S9; Zhang et al., 2018), where tree nodes with <10% BS values were collapsed using Newick Utilities version 1.5.0 (Junier and Zdobnov, 2010). Trees with excessively long branches were identified using phyx (Brown et al., 2017) by looking at tree lengths and root-to-tip variation (command “pxlstr”); seven gene trees with excessively long branches were identified and excluded for the min20 and ten for the min04 datasets, leading to a total of 303 and 338 gene trees being used for the respective coalescent analyses. Branch support was assessed using local PP support (Sayyari and Mirarab, 2016) calculated in ASTRAL-III, where PP values >0.95 were considered strong, 0.75 to 0.94 weak to moderate, and ≤0.74 as unsupported.
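Collapsing poorly supported nodes before a coalescent analysis turns each weak branch into a polytomy, so that it no longer contributes a resolved but unreliable quartet. The following minimal sketch shows the idea on a toy tree; it is not the Newick Utilities implementation, and the `Node` class and both function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    support: Optional[float] = None  # bootstrap (%) of the branch above this node
    children: List["Node"] = field(default_factory=list)


def collapse_low_support(node: Node, threshold: float = 10.0) -> Node:
    """Return a copy of the tree in which internal branches with support
    below `threshold` are dissolved into their parent (a polytomy)."""
    new_children: List[Node] = []
    for child in node.children:
        child = collapse_low_support(child, threshold)
        is_internal = bool(child.children)
        if is_internal and child.support is not None and child.support < threshold:
            new_children.extend(child.children)  # attach grandchildren directly
        else:
            new_children.append(child)
    return Node(node.name, node.support, new_children)


def to_newick(node: Node) -> str:
    """Serialise the tree (without branch lengths or trailing semicolon)."""
    if not node.children:
        return node.name
    inner = ",".join(to_newick(child) for child in node.children)
    label = "" if node.support is None else f"{node.support:g}"
    return f"({inner}){label}"


if __name__ == "__main__":
    tree = Node(children=[
        Node(support=5, children=[Node("A"), Node("B")]),
        Node(support=90, children=[Node("C"), Node("D")]),
    ])
    print(to_newick(tree) + ";")
    print(to_newick(collapse_low_support(tree)) + ";")
```

In the toy tree, the (A,B) branch with 5% support is dissolved, while the (C,D) branch with 90% support is kept.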
Comparison of resulting species trees
Topological congruence and discordances between all 10 topologies generated were assessed visually by generating graphical representations through custom R-scripts using the following packages: “ggtree” (Yu, 2020), “stringr” (Wickham and Wickham, 2019), “ape” (Paradis and Schliep, 2019), “ggplot2” (Villanueva and Chen, 2019) and “gridExtra” (Auguie, 2017). To facilitate comparisons, all trees were reduced to include the outgroup Jaltomata and nine taxa representing the following clades of Solanum, which were recovered in all analyses: Thelopodium, Regmandra, Potato, Morelloid (as a representative of both the Dulcamaroid and Morelloid clades), Archaesolanum, S. anomalostemon S.Knapp & M.Nee (species sister to Clade II), Acanthophora (minor clade of the Leptostemonum) and two representatives of the EHS clade (Table 1). The species sampled in the PL and TC datasets were identical for all except three minor clades, in which different closely related species were sequenced (Acanthophora: S. viarum Dunal/S. capsicoides All.; Morelloid: S. opacum A.Braun & C.D. Bouché/S. americanum Mill.).
Phylogenomic discordance was measured using gene concordance factors (gCF) and site concordance factors (sCF) calculated in IQ-TREE2 (Minh et al., 2020a). These metrics assess the proportion of gene trees that are concordant with different nodes along the phylogenetic tree and the number of informative sites supporting alternative topologies. Low gCF values can result from limited information (i.e., short branches) and/or genuine conflicting signal; low sCF values (~30%) indicate lack of phylogenetic information in loci (Minh et al., 2020a). The metrics were calculated using the TC-min20-ASTRAL-III topology (303 genes) and the unpartitioned PL IQ-TREE2 topology of 151 samples (PL-151-UP), with sampling reduced to 21 and 34 tips in the TC and PL topologies, respectively, retaining a single tip for each of the minor and major clades. An additional tip was retained for the EHS Clade to visualize the gCF and sCF for the crown node of that lineage.
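As an illustration of what a gene concordance factor measures, the sketch below computes the percentage of gene trees that contain a given clade. It is a deliberate simplification and not the IQ-TREE2 procedure: it assumes that every gene tree includes all taxa and is rooted on the same outgroup, so that each tree can be summarised as a set of clades, whereas IQ-TREE2 works on unrooted branches and counts only gene trees that are decisive for the branch. The function name and toy data are hypothetical.

```python
def gene_concordance_factor(clade, gene_trees):
    """Percentage of gene trees that contain `clade`.

    `clade` is an iterable of taxon names; `gene_trees` is a list in which
    each gene tree is represented by the set of its clades (frozensets).
    """
    if not gene_trees:
        raise ValueError("at least one gene tree is required")
    target = frozenset(clade)
    concordant = sum(target in clades for clades in gene_trees)
    return 100.0 * concordant / len(gene_trees)


if __name__ == "__main__":
    ab = frozenset({"A", "B"})
    ac = frozenset({"A", "C"})
    abc = frozenset({"A", "B", "C"})
    # Four hypothetical gene trees on taxa A, B, C, D (D as outgroup).
    toy_trees = [{ab, abc}, {ab, abc}, {ac, abc}, {ab, abc}]
    print(gene_concordance_factor({"A", "B"}, toy_trees))
    print(gene_concordance_factor({"A", "C"}, toy_trees))
```

A branch can therefore have high bootstrap support in a concatenated analysis and still show a low gCF if only a minority of individual gene trees recover it.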
Network analyses and polytomy tests
The presence of reticulate evolution and conflicting signals in gene trees in the TC dataset was explored by generating a filtered supertree network in SplitsTree 4 (Huson and Bryant, 2006) of the TC min20 dataset (303 genes) collapsing branches with <75% local PP support with a minimum number of trees set to 50% (151 trees). Polytomy tests were carried out in ASTRAL-III (Sayyari and Mirarab, 2018), using the ASTRAL-III topologies of the two datasets (min20 and min04). Gene trees were used to infer quartet frequencies for all branches to determine the presence of polytomies while accounting for ILS. The analysis was run twice to minimize gene tree error.
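The logic of a quartet-based polytomy test can be sketched as follows. Under the multispecies coalescent, a branch of zero length (a true polytomy) is expected to yield the three possible quartet topologies around it in equal proportions, so the null hypothesis can be examined with a chi-square test on the three quartet counts. The code below is a simplified illustration under that assumption, not the ASTRAL-III implementation, which derives the quartet frequencies and the effective number of gene trees internally; the function name and the counts in the usage example are hypothetical.

```python
import math


def polytomy_p_value(q1: float, q2: float, q3: float) -> float:
    """P-value for the null hypothesis that a branch is a polytomy.

    q1, q2, q3 are the numbers of gene trees supporting each of the three
    quartet topologies around the branch. Under the null, each is expected
    to equal one third of the total.
    """
    counts = (q1, q2, q3)
    if any(q < 0 for q in counts):
        raise ValueError("quartet counts must be non-negative")
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one gene tree is required")
    expected = total / 3.0
    chi2 = sum((q - expected) ** 2 / expected for q in counts)
    # Survival function of the chi-square distribution with 2 degrees of
    # freedom has the closed form exp(-x / 2).
    return math.exp(-chi2 / 2.0)


if __name__ == "__main__":
    # Nearly equal quartet support: the polytomy cannot be rejected.
    print(polytomy_p_value(105, 98, 97))
    # One topology dominates: the polytomy is rejected.
    print(polytomy_p_value(200, 50, 50))
```

A large p-value means that the gene trees do not allow the polytomy to be rejected; it does not by itself demonstrate that the polytomy is hard, because limited signal in the gene trees produces the same pattern.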
Congruent recovery of major clades
All three datasets, including the supermatrix and the two phylogenomic datasets (PL and TC), recovered previously recognized major clades in Solanum (Figures 1 and 2A, C). A few minor clades, concentrated in Clade II, were found to be polyphyletic in the supermatrix phylogeny, including the Clandestinum-Mapiriense, Sisymbriifolium, Wendlandii-Allophyllum and Cyphomandropsis minor clades (Appendices S11 and S12); comparison with PL and TC phylogenies is not possible, as only one species of each clade was sampled in these datasets. In Clade I, nearly all specimens of the Dulcamaroid clade formed a monophyletic group. The only exception concerned S. alphonsei Dunal, sampled here for the first time. In both the supermatrix and PL analyses, this species was sister to S. valdiviense of the Valdiviense clade, with maximum branch support in the PL analyses (Figure 2, Appendix S13).
Despite these minor novelties, all analyses recovered the Thelopodium clade as sister to the rest of Solanum (Figures 1 and 2; Appendices S11–S15). The Potato clade was strongly supported across all analyses (Figures 1 and 2; Appendices S11–S15), as was the Regmandra clade in supermatrix and PL analyses (only one sample in TC phylogenies). Furthermore, all analyses recovered a clade here referred to as DulMo that includes the Morelloid and Dulcamaroid clades (Figures 1 and 2; Appendices S11–S15). A new strongly supported clade, here referred to as VANAns clade and comprising the Valdiviense (including S. alphonsei, see below), Archaesolanum, Normania, and the African non-spiny clades, was found across all analyses (Figures 1 and 2; Appendices S11 to S15).
Clade II was supported as monophyletic across all topologies (Figures 1 and 2A, C), with maximum branch support in all 10 species trees (Appendices S11 to S15). While differences in sampling prevent thorough comparisons of relationships between clades within Clade II, no deep incongruences were detected amongst topologies obtained with the supermatrix, PL, and TC datasets (Figures 1 and 2A, C; Appendices S9–S15). Within Clade II, the large Leptostemonum clade (the spiny solanums) was strongly supported in all cases (Figures 1 and 2A, C; Appendices S11–S15).
Incongruent relationships amongst clades and impact of different analyses
Overall, we found that the choice of phylogenetic analysis, the amount of missing data, and taxon sampling had little impact on the relationships recovered amongst clades within each dataset. The BI and ML supermatrix analyses were identical in terms of composition and relationships of major clades (Figure 3B), as were the four PL species trees (Figure 3D, E). There were some differences amongst the topologies of the TC datasets, but these differences concerned branches that had little support (Figure 3A–C). Between the supermatrix, PL, and TC datasets, however, major incongruences between species trees were observed with respect to the relationships among the main clades identified in the section above (Figures 1, 3).
While the BI and ML supermatrix phylogenies supported the monophyly of the previously recognised Clade I that includes most non-spiny Solanum clades (Figure 1; Appendices S11 and S12), the PL and TC phylogenetic trees resolved clades associated with Clade I as a grade relative to Clade II (Figure 2A, C; Appendices S13–S15). This was due in large part to the unstable position of the Regmandra clade, which was subtended by a particularly short branch and resolved in different positions along the backbone in all three datasets (Figure 3). For example, the ML supermatrix analysis recovered the Regmandra clade as sister to the Potato clade with strong to moderate branch support (Figure 3B), although the BI supermatrix analysis could not resolve whether the Regmandra clade was sister to the DulMo + VANAns clade or to the Potato clade (Figure 3B, Appendix S12). In contrast, the PL analyses resolved Regmandra as sister to the M clade + Clade II, with either maximal or no branch support at all (Figure 3). The TC species trees resolved Regmandra as sister to the Potato clade, DulMo, and Clade II, with maximum support (Figure 3). While one of the TC ASTRAL-III analyses also recovered this topology with moderate support (local posterior probability 0.82, Figure 3), the other TC ASTRAL-III analysis resolved Regmandra as sister to the VANAns clade, but without any branch support (local PP 0.4, Figure 3).
The previously identified M clade, composed of the VANAns and DulMo clades, was not supported by all analyses (Figure 3). While all PL ML analyses recovered the M clade with maximum BS values (Figure 3), none of the TC analyses recovered it. Instead, they resolved the DulMo clade as sister to the Potato clade, with maximal BS or local PP support values (Figure 3). Furthermore, the VANAns clade was recovered as sister to the rest of Solanum (excluding the Thelopodium clade) with moderate support in the TC ML analyses. Placement of the VANAns clade in the TC ASTRAL-III analyses had low or no support, being resolved as either sister to DulMo or sister to the rest of Solanum, excluding the Thelopodium clade (Figure 3).
In addition, the position of the Potato clade within Solanum was incongruent between datasets, i.e., whereas it was resolved as sister to Regmandra in the supermatrix analysis, it was resolved as sister to the remaining Solanum in the PL dataset, and sister to the DulMo clade in all TC analyses (Figure 3), all with strong branch support. The phylogenomic datasets also showed incongruent positions for the Etuberosum clade within the larger Potato clade, where TC analyses resolved it as sister to the Petota clade with maximum local PP support in the ASTRAL-III analyses (Appendix S15); in the ML analyses, this position either had moderate BS values (76%) or was found to be nested within the Petota clade with no branch support (Appendix S14). In contrast, PL analyses placed the Etuberosum clade as sister to the Tomato clade with maximum branch support (Appendix S13).
Finally, the BI and ML supermatrix phylogenies resolved the morphologically unusual S. anomalostemon as sister to the rest of Clade II (BS 95%, PP 1.0; Figure 3, Appendices S11 and S12). This contrasts with results from previous analyses, which found it to be part of the Mapiriense clade (Särkinen et al., 2015). PL analyses supported S. anomalostemon + Brevantherum clade as sister to the rest of Clade II with high branch support (Appendix S13). Solanum anomalostemon was also found to be sister to Clade II in the TC analyses, although the Brevantherum clade was not included in those analyses, preventing a strict comparison (Figure 3). Two other taxa were found to represent single-species lineages: S. polygamum Vahl as sister to the Leptostemonum clade and S. euacanthum Phil. as sister to the EHS clade (Appendices S11 and S12). Within the Leptostemonum clade, the EHS clade was strongly supported in all analyses (Figures 1, 3). There were, however, some minor differences in species-level relationships for closely related species of the Eggplant clade and Anguivi Grade (viz. S. campylacanthum Hochst. ex A.Rich., S. melongena L., S. linnaeanum Hepper & P.-M.L.Jaeger, S. dasyphyllum Schum. & Thonn., and S. aethiopicum L.; Figures 1 and 2A, C; Appendices S11–S15).
Phylogenomic discordance was generally high across the PL and TC topologies, with gCF values >50% in only three nodes in the PL phylogeny (Solanum as a whole, S. chilense (Dunal) Reiche + S. lycopersicum L. or the Tomato clade, and S. hieronymi Kuntze + S. aridum Morong in the Leptostemonum clade; Figure 4). Elsewhere, along the backbone of the PL phylogeny, gCF fell to 39% and below (eight nodes with gCF values of 10% and below), with the lowest values found near nodes that varied the most amongst the different reconstructed species trees. This included the node subtending Regmandra (gCF 4%, sCF 38%; Figure 4), and that positioning Regmandra + DulMo + VANAns clade as sister to Clade II (gCF 2%, sCF 31%). Similarly, low gCF and uninformative sCF values around 33% were found across Clade II, including the node placing S. hieronymi + S. aridum as sister to the Elaeagnifolium + EHS minor clades (gCF 6%, sCF 36%; Figure 4), as well as the placement of the Erythrotrichum + Thomasiifolium clades within the large Leptostemonum clade (gCF 5%, sCF 23%; Figure 4).
Across the TC phylogeny, gCF and sCF values were slightly higher on average, with three nodes presenting values >50% for both metrics, i.e., one within the Petota clade (gCF 67%, sCF 69%; Figure 4), one at the base of the Leptostemonum clade (gCF 64%, sCF 72%; Figure 4), and another at the base of the EHS clade within Leptostemonum (gCF 58%, sCF 75%; Figure 4). Three nodes had low gCF values of 10% or less, with again some of the lowest values located near the base of the tree, including the relationship of Regmandra as sister to the VANAns clade (gCF 3%, sCF 39%; Figure 4), the placement of Potato as sister to the DulMo clade (gCF 10%, sCF 41%; Figure 4), and the relationship of the Potato + DulMo clades as sister to Clade II (gCF 4%, sCF 41%; Figure 4).
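The reason sCF values near 33% are read as uninformative is that each quartet of taxa around a branch has three possible resolutions, so sites carrying no signal are expected to split roughly evenly among them. The sketch below counts, for a single quartet, the sites supporting each resolution and reports the percentage supporting the first one. It is a simplified, hypothetical illustration (the function names and the four six-base sequences are invented); IQ-TREE2 averages such values over many quartets sampled around each branch.

```python
from typing import List, Optional


def quartet_site_support(a: str, b: str, c: str, d: str) -> List[int]:
    """Count decisive sites supporting ab|cd, ac|bd and ad|bc, in that order."""
    counts = [0, 0, 0]
    for w, x, y, z in zip(a, b, c, d):
        if any(base not in "ACGT" for base in (w, x, y, z)):
            continue  # skip gaps and ambiguous bases
        if w == x and y == z and w != y:
            counts[0] += 1  # pattern xxyy supports ab|cd
        elif w == y and x == z and w != x:
            counts[1] += 1  # pattern xyxy supports ac|bd
        elif w == z and x == y and w != x:
            counts[2] += 1  # pattern xyyx supports ad|bc
    return counts


def scf(counts: List[int]) -> Optional[float]:
    """Percentage of decisive sites supporting ab|cd, or None if there are none."""
    total = sum(counts)
    return 100.0 * counts[0] / total if total else None


# Hypothetical aligned sequences for four taxa.
SEQ_A = "AACGTA"
SEQ_B = "AATGTC"
SEQ_C = "GGCATA"
SEQ_D = "GGTACC"

support = quartet_site_support(SEQ_A, SEQ_B, SEQ_C, SEQ_D)
print(support, scf(support))
```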
Network analyses and polytomy tests
A high amount of reticulation/gene tree conflict was recovered between major clades of Solanum previously assigned to Clade I (e.g., Thelopodium, Regmandra, Potato, DulMo, VANAns), as well as with some lineages belonging to Clade II in the filtered supertree network using the TC data with 303 genes (min20; Figure 2B). The network clearly supported the monophyly of the Leptostemonum and the EHS clades (Figure 2B), corresponding to the nodes with high gCF and sCF values in the TC ASTRAL-III phylogeny (N1 and N2, Figure 4).
The polytomy tests carried out for the two TC ASTRAL-III datasets resulted in 10 nodes each for which the null hypothesis of branch lengths equal to zero could not be rejected, suggesting they should be collapsed into polytomies (Appendix S16); these nodes corresponded to the ones subtending the Regmandra, Leptostemonum and EHS clades, but were also located within the VANAns clade as well as within Clade II. Polytomies were also detected within the Petota clade, including at the base of the Tomato clade (min04 dataset, Appendix S16), and at the base of the Etuberosum + Petota + Tomato clade (min20 dataset, Appendix S16). Repeating the analysis by collapsing nodes with <75% local PP support led to the collapse of 12 to 13 nodes across the analyses, most of them affecting the same clades as in the previous runs, but also leading to the collapse of the crown node of Solanum. When nodes with <75% local PP support were collapsed, the effective number of gene trees was too low to carry out the test for two nodes subtending S. betaceum and S. anomalostemon, most likely because of the low number of genes recovered for S. betaceum (Appendix S3).
The results of the ten phylogenetic analyses conducted here provide an updated evolutionary framework for the large and economically important genus Solanum, demonstrating that the major and minor clades within the group are stable (with a few noteworthy exceptions, see below). However, the strong levels of nuclear and nuclear-plastome discordance uncovered in the PL and TC analyses, in combination with the network analysis and polytomy tests, suggest that there are polytomies present along the backbone of the phylogeny. We first discuss the stability of the clades within Solanum, and the discovery of a few novel minor clades. We then examine the nuclear-plastome discordance and polytomies recovered and explore the possible causes underlying these, and their implications for the study of biogeography and trait evolution.
Updated evolutionary framework for Solanum
The supermatrix phylogeny, despite being based on only nine loci, nearly doubles the species sampling, confirming the monophyly of most major and minor clades established in previous analyses (Särkinen et al., 2013) and the polyphyly of three minor clades (Pachyphylla, Cyphomandropsis, and Allophyllum, the latter including species of the Mapiriense-Clandestinum clade). It also reveals three new minor clades in Solanum comprising a single species each and confirms the placement of 129 previously unsampled species (e.g., S. alphonsei in the Valdiviense clade and S. graveolens Bunbury in the Cyphomandra clade; Appendices S11 and S12). Meanwhile, the phylogenomic analyses with increased gene sampling reveal a previously undetected major clade referred to as VANAns, comprising four minor clades (Valdiviense, Archaesolanum, Normania, and African non-spiny clades). Finally, our results did not support two previously resolved major clades due to nuclear-plastome discordance (Clade I and the M clade; Figure 2). Detailed molecular systematic studies with increased taxon and genetic sampling will be required to fully resolve the circumscription of all the major and minor clades recovered with diagnostic features, including the new ones identified here (Hilgenhof et al., unpublished manuscript).
Overall, our results establish that the taxonomic framework used in Solanum dividing the large genus into major and minor clades is robust, based on both phylogenomic datasets recovering the same major clades independent of methodological choices compared to the Sanger sequence supermatrix (e.g., Thelopodium, Regmandra, Potato, DulMo, VANAns, Clade II, Leptostemonum, and EHS clade). The major and minor clades currently used as informal infrageneric groups in Solanum were first established by Bohs (2005) based on a single locus of c. 2000 bp in length (ndhF). Our results demonstrate that larger species and gene sampling support the clades established earlier (e.g., Weese and Bohs, 2007; Särkinen et al., 2013). However, increased gene sampling provided by the two phylogenomic datasets does not help to resolve any of the polytomies along the backbone of Solanum close to the crown node and along the backbone of Clade II (Särkinen et al., 2013).
Nuclear and nuclear-plastome discordance
Our results reveal three regions of the Solanum phylogeny with gene discordance and low gCF and sCF values in the PL and TC datasets (Figure 4). These regions with nuclear discordance include: (1) the backbone of Solanum near the crown node of the genus where major clades previously identified as Clade I diverge (from here on referred to as Grade I); (2) the backbone of the large Leptostemonum clade; and (3) the backbone of the EHS clade within the Leptostemonum (Figures 2B and 3). Many of the branches within these regions are extremely short in both PL and TC phylogenomic datasets (Figures 1 and 2; Appendices S11–S15), and network analyses of the nuclear dataset reveal reticulation in one of them (Grade I, Figure 2B). Polytomy tests confirm that multiple nodes within all three regions should be collapsed in the TC dataset (Appendix S16) and support the recognition of these regions as polytomies. Hence, we refer to these three regions of the phylogeny as polytomies from here on.
Further exploration of the polytomies reveals nuclear-plastome discordance within Grade I, relating to the position and relationship between the Regmandra, Potato, DulMo and VANAns clades (Figures 3 and 4). No signal of nuclear-plastome discordance was detected in the other polytomies based on the species sampling presented here (Figures 3 and 4), but increased species sampling will be needed to confirm these results.
Altogether, our results indicate the presence of three polytomies which differ somewhat in nature. The deepest of these polytomies along the backbone of Solanum near the crown node shows high nuclear and nuclear-plastid discordance with reticulation evident even within the nuclear phylogenomic dataset (Figure 2B). This polytomy could be referred to as a hard polytomy because it will probably be difficult to resolve even with more genomic data, due to its deeper position in the phylogeny in terms of evolutionary depth and time, the presence of clear nuclear-plastome discordance, short branch lengths and evidence for reticulation within the nuclear phylogenomic dataset. In contrast, the other two polytomies along the backbone of Leptostemonum and the EHS clades are at shallower evolutionary depth and show nuclear discordance only without clear/widespread reticulation in the nuclear dataset (Figure 2B). These polytomies represent simpler cases and may turn out to be possible to resolve with more genomic data. In either case, to confirm whether the polytomies recovered here are truly “hard” or “soft”, denser taxon sampling and more genomic data will be required to carry out more rigorous tests concerning the cause of the gene discordance observed here.
What is causing genomic discordance in our dataset?
Finding genomic discordance in our phylogenomic datasets is unsurprising, given that it has also been found in many other phylogenomic studies in the Solanaceae, including Nicotiana (Dodsworth et al., 2020), the Capsiceae (Capsicum and relatives; Spalink et al., 2018), subtribe Iochrominae (Gates et al., 2018), Jaltomata (Wu et al., 2019), and two studies of Solanum involving the Tomato (Strickler et al., 2015; Pease et al., 2016) and Petota clades (Huang et al., 2019). ILS was shown to be responsible for the widespread discordance found in phylogenomic data in the diploid Tomato clade (Strickler et al., 2015; Pease et al., 2016), while hybridization and introgression have been argued to be behind genomic discordance in the Petota clade, which includes many polyploids (Huang et al., 2019).
Potential processes responsible for nuclear or nuclear-plastome discordance involve gene introgression, ILS, hybridization, and polyploidization; distinguishing between these remains difficult even with increased genomic sampling involving custom bait sets (Larridon et al., 2020; Koenen et al., 2021) or whole-genome sequences (Suh, 2016; Malinsky et al., 2018; Williams et al., 2021). Comparison of the nuclear and plastome topologies in our study does not indicate any obvious chloroplast capture events that could explain the observed nuclear-plastome discordance along the backbone of Solanum near the crown node. Furthermore, cytogenetic and chromosome studies show no evidence for genome duplication or polyploidy along the three polytomies discovered here, despite the three-fold increase in genome size between the distantly related potato (S. tuberosum L., Potato clade) and eggplant (S. melongena, Leptostemonum clade; Barchi et al., 2019). Chromosome counts indicate that the ancestor of Solanum was diploid, i.e., a large majority of Solanum species are reported to be diploid (>97% of the 506 species for which chromosome counts are available), and mapping of ploidy level across the phylogeny indicates that most of the lineages involved in the three polytomy regions identified here are diploid (Chiarini et al., 2018). Polyploidy has arisen independently within the Archaesolanum, Petota, Morelloid, Caroliniense, Elaeagnifolium, and EHS minor clades within the larger Leptostemonum clade (Chiarini et al., 2018), and hybridization/introgression has been argued to be the cause of the phylogenomic discordance found in the Petota clade (Huang et al., 2019). Gene duplication could explain the signal recovered here for the EHS clade but is unlikely to explain the discordance observed here. Save for one locus, our analyses did not detect the presence of paralogs in our nuclear dataset.
Currently, the most likely explanation for the discordance along the backbone of Solanum is ILS caused by rapid speciation. Two of the polytomies include the most species-rich (Table 1) and rapidly diversifying lineages of Solanum, the Leptostemonum and the EHS clades (Echeverría-Londoño et al., 2020), whose crown ages have been estimated at 8 to 11 and 4 to 6 million years (Myr), respectively (Särkinen et al., 2013). The backbone of Solanum near the crown node has been estimated to be almost twice as old as the Leptostemonum clade (13 to 17 Myr; Särkinen et al., 2013) yet shows a strong signal of nuclear-plastome discordance. While past studies have not detected any increased rates of diversification near the crown node of Solanum, detecting diversification rate shifts remains a challenge (Louca and Pennell, 2020), especially in older nodes. Hence, we cannot fully exclude the option that ILS and rapid speciation have taken place close to the crown node of the genus.
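The link between rapid speciation and ILS can be illustrated with the standard three-taxon result from coalescent theory: if the internal branch of the species tree lasts t generations in a diploid population of effective size Ne, the probability that a gene tree disagrees with the species tree is (2/3) exp(-t / (2 Ne)). The sketch below evaluates this expression; the function name and the parameter values are hypothetical and are not estimates for Solanum.

```python
import math


def discordance_probability(t_generations: float, ne: float) -> float:
    """Probability that a three-taxon gene tree differs from the species tree
    under incomplete lineage sorting alone (no gene flow, no estimation error)."""
    if t_generations < 0 or ne <= 0:
        raise ValueError("t_generations must be >= 0 and ne must be > 0")
    coalescent_units = t_generations / (2.0 * ne)  # diploid scaling
    return (2.0 / 3.0) * math.exp(-coalescent_units)


# Hypothetical internal branch durations (generations) for a fixed Ne.
NE = 100_000
for t in (0, 10_000, 100_000, 1_000_000):
    print(f"t = {t:>9,} generations: P(discordant) = {discordance_probability(t, NE):.3f}")
```

The probability approaches 2/3 as the internal branch shrinks toward zero, at which point the three possible topologies are equally likely, and it falls toward zero as the branch lengthens or the population size decreases.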
The presence of short internal branches is typical of ILS in lineages with large population sizes and high mutation rates (Schrempf and Szöllősi, 2020). This fits with the biology of Solanum in general, which is typically known to contain “weedy”, disturbance-loving pioneer species resilient to change. Many species are known to have large geographical ranges and ecological amplitude, including globally distributed weeds from the Leptostemonum, Brevantherum and Morelloid clades, such as S. elaeagnifolium Cav., S. caroliniense L., S. torvum Sw., S. erianthum D.Don, S. mauritianum Scop., S. americanum, and S. nigrum L. (Knapp et al., 2017, 2019; Cowie et al., 2018; Särkinen et al., 2018). Some of the weedy characteristics found in these species include the ability to improve fitness and defense traits in response to disturbance (Chavana et al., 2021), as well as having allelopathic properties which allow them to establish themselves to the detriment of native vegetation (Cowie et al., 2018). If such characteristics were present in ancestral Solanum, they could have promoted rapid speciation across the globe, followed by rapid morphological evolution and speciation within areas. The patterns observed here could possibly be the result of three major rapid speciation “pulses” across the evolutionary history of Solanum, involving lineages close to the crown node of Solanum, Leptostemonum, and the EHS clade. The idea of an ecologically opportunistic ancestor is supported by the tendency of many of the major clades near the crown node of Solanum to occupy periodically highly stressed and disturbed habitats, including flooded varzea forests occupied by the Thelopodium clade, hyper-arid deserts occupied by the Regmandra clade, and highly disturbed and dynamic open mid-elevation Andean montane habitats occupied by the DulMo clade, where landslides are among the most common areas where many of the species are found (Knapp, 2013; Särkinen et al., 2018; Knapp et al., 2019).
Future studies with larger datasets will be able to carry out additional tests, such as the impact of using phylogenetic models that take into consideration the heterogeneity of molecular sequence evolution (Williams et al., 2021), as well as different data types (Romiguier et al., 2013; Reddy et al., 2017). Future studies will need to untangle how introgression and ILS are potentially affecting the patterns of genomic discordance observed here at different phylogenetic depths (Meleshko et al., 2021). Additional information about recombination, chromosome structure, and genomic size and evolution of Solanum will also be useful to clearly define coalescence genes in phylogenomic datasets, fundamental units in coalescent analyses which are rarely examined (Springer and Gatesy, 2018). Currently, information about genome evolution in Solanum is lacking, as only 62 species (5% of Solanum) are recorded in the plant DNA C-value database (Pellicer and Leitch, 2020), and 86 species (7% of Solanum) have been studied with chromosome banding and/or FISH techniques (Chiarini et al., 2018). Information about genome size is missing for lineages such as the Thelopodium and Regmandra clades and for the majority of species not directly related to major commercial crops.
Implications for biogeographical and morphological studies in Solanum
The idea that well-supported and fully bifurcating phylogenies are a requisite for evolutionary studies is built on the premise that such trees are the accurate way of representing evolution. The shift in systematics from “tree”- to “bush”-like thinking, where polytomies and reticulate patterns of evolution are considered as acceptable or real (Poczai, 2013; Mallet et al., 2016; Edelman et al., 2019), comes from the accumulation of studies finding similar unresolvable phylogenetic nodes, despite using different large-scale genomic sampling strategies and various analytical methods (Suh, 2016). Given the difficulty of resolving short internal branches in phylogenies and the rapid evolution of major clades in Solanum, it will be important to adopt methods that incorporate polytomies and networks to conduct biogeographical and morphological studies (Than et al., 2008; Solís-Lemus et al., 2017; Wen et al., 2018; Olave and Meyer, 2020; Lutteropp et al., 2021 [Preprint]).
In terms of biogeography, our inability to resolve relationships amongst the major lineages in Solanum, especially along the backbone of Solanum near the crown node, has implications for understanding the ancestral environment of Solanum and its major lineages. Uncertainty amongst the relationships of major clades does not change the hypothesis that the genus probably originated in South America and spread multiple times to Africa, Asia, Australia, North America, and Europe (Olmstead and Palmer, 1997; Echeverría-Londoño et al., 2020). The polytomy near the crown node of Solanum does, however, cast uncertainty on the specific region and habitat/biome within the South American continent in which the major clades originated. For example, the sister relationship of Regmandra and the Potato clade inferred by the Sanger supermatrix analysis suggests that the wild ancestors of both potato and tomato evolved from an ancestor adapted to survive in lomas deserts from coastal South America (Bennett, 2008; Figure 1). Yet, both nuclear and plastome phylogenomic datasets suggest that the Potato clade is more closely related to the DulMo clade found to occur in tropical montane and subtropical biomes (Figure 3).
The hard polytomy along the backbone of Solanum also has important implications for evolutionary biologists interested in trait evolution. Standard methods of trait evolution relying on bifurcating trees may incorrectly infer how traits evolve (Hahn and Nakhleh, 2016). The discordance between traits, gene trees, and species trees has been defined as hemiplasy (Avise and Robinson, 2008), and studies have shown that depending on the level of ILS present in the data, hemiplasy can lead to different interpretations of convergent evolution of traits across phylogenetic trees (Mendes et al., 2016). While broad mapping of morphological traits on a species-level phylogeny can help gain a rough understanding of phenotypic variation across clades, careful study of gene tree topologies in relation to a trait of interest is essential to gain an exact understanding of its evolutionary origin.
Our findings reflect results from recently published studies showing rapid morphological innovation coinciding with areas of strong phylogenomic discordance in different plant and animal groups (Parins-Fukuchi et al., 2021): in Solanum, the signal of nuclear-plastome discordance corresponds to strong ecological diversification and morphological innovation across the major clades previously assigned to Clade I. The major clades involved in the nuclear-plastome discordance along Grade I show large differences in their ecology as well as morphology. Members of the Thelopodium, Regmandra, VANAns, Potato, and DulMo clades occupy a wide range of tropical, montane, and temperate habitats across South America, Africa, and Australia (Symon, 1994; Knapp, 2000; Bohs and Olmstead, 2001; Spooner et al., 2004, 2016, 2019; Bohs, 2005; Peralta et al., 2007; Bennett, 2008; Knapp, 2013; Knapp and Vorontsova, 2016; Tepe et al., 2016; Särkinen et al., 2018; Knapp et al., 2019). Morphology shows equally high polymorphism between these major clades across many traits, such as growth form, which ranges from single-stemmed wand-like shrubs (Thelopodium clade) and annual herbs (Regmandra, Potato, and Morelloid clade) to woody climbers and shrubs (VANAns clade) and herbaceous vines rooting along nodes (Potato clade). Similar patterns are observed in inflorescence position and branching, corolla shape, stamen dimorphism, and anther shape, showing the presence of high polymorphism in these clades, of which only some was retained in Clade II (Hilgenhof et al., unpublished manuscript). Testing the idea that this phenotypic diversity is linked to ecological diversification will require the construction of detailed morphological and ecological datasets to test if this pattern holds up in more formal and rigorous analyses.
We demonstrate the stability of the majority of the clades defined within Solanum and uncover significant nuclear and nuclear-plastome discordance amongst relationships of major clades in Solanum based on the first phylogenomic study of the genus with wide species sampling. Three major polytomies are identified in Solanum based on the short branch lengths, gene concordance factor results, and polytomy tests. Two of these polytomies correspond to the biggest and most quickly diversifying lineages within Solanum (Leptostemonum and EHS clades). The third polytomy along the backbone of Solanum near the crown node involves reticulation and strong nuclear-plastome discordance and highlights great uncertainty in the relationships between the Potato, DulMo, Regmandra, and VANAns clades. This region of nuclear-plastome discordance corresponds with high ecological and morphological innovation, and we argue that it is most likely due to ILS and rapid speciation based on current knowledge of genome evolution in Solanum. Future studies, even with full genome sequences and increased taxon sampling, might not be able to resolve the polytomy near the crown node of Solanum because of the pattern of high reticulation combined with short internodal branches and its older age. Data on genome size and chromosome structure of the earliest branching lineages in Solanum will be required to further explore the nature and causes of this hard polytomy. We argue that acknowledging and embracing polytomies and reticulation is crucial if we are to design research programs aimed at understanding the biology of large and rapidly radiating lineages, such as the large and economically important Solanum.
We thank Elliot Gardner for sharing scripts and advice on phylogenomic analyses with HybPiper, Royce Steeves for providing advice on DNA extraction for genome skimming, Felix Forest and Olivier Maurin for providing technical support and providing feedback on the manuscript, and João R. Stehmann, Thais Almeida, Paul Gonzáles, and Maria Baden who greatly contributed to fieldwork and sample acquisition. Finally, we would also like to thank the three reviewers, including Stacey Smith and William J. Baker, who provided constructive reviews and feedback that greatly improved the final version of this manuscript.
This work was supported by the Fonds de recherche du Québec en Nature et Technologies postdoctoral fellowship and a grant from the Department of Biological Sciences of the University of Moncton to E.G., the Sibbald Trust fellowship to R.H., the Ceiba Foundation to A.O., CNPq Conselho Nacional de Desenvolvimento Científico e Tecnológico awards 479921/2010-54 and 427198/2016-0 and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior CAPES/FAPESPA award 88881.159124/2017-01 to L.L.G., NSF through grant DEB-0316614 “PBI Solanum: a worldwide treatment” to S.K. and L.B., the Calleva Foundation & Sackler Trust (Plant and Fungal Trees of Life Project at Kew), the LUOMUS Trigger and Systematics Research Fund to P.P., the OECD CRP and Eötvös Research Grant (MAEÖ−00074-002/2021). Field sampling was supported by the Northern Territory Herbarium (Palmerston, Northern Territory, Australia), and the David Burpee Endowment at Bucknell University (Lewisburg, Pennsylvania, USA) and National Geographic Society Northern Europe Award GEFNE49-12 (Peru, TS). Peruvian specimens were collected and sequenced under the permission of Ministerio de Agricultura, Dirección General Forestal y de Fauna Silvestre (collection permits 084-2012-AG-DGFFSDGEFFS and 096-2017-SERFOR/DGGSPFFS, and genetic resource permit 008-2014-MINAGRI-DGFFS/DGEFFS).
E.G. designed and performed the analyses for the paper, with guidance from P.P., A.O., S.D., and T.S.; E.G. produced all figures, and wrote the manuscript, with major contributions from T.S., as well as P.P., S.D., S.K., and X.A. R.H. and T.S. helped in data gathering and analyses. All other authors contributed data to the main analyses. All authors read and contributed to the final version of the manuscript.
DATA AVAILABILITY STATEMENT
Raw sequence data generated in this study are deposited in various archives, including GenBank (website: https://www.ncbi.nlm.nih.gov/genbank/) and the European Nucleotide Archive (website: https://www.ebi.ac.uk/ena/browser/home); full accession numbers are provided in Appendices S1, S2, and S3. In addition, the 10 species trees generated for this study, as well as the alignments used for the different phylogenetic analyses, including the concatenated Sanger supermatrix, the plastome dataset, and the target capture datasets (min04 and min20) are available via Data Dryad, at the following link: https://datadryad.org/stash/dataset/doi:10.5061/dryad.2v6wwpzpt.
Additional supporting information can be found online in the Supporting Information section at the end of this article.
| File (size) | Description |
|---|---|
| ajb21827-sup-0001-Appendix_S1_Supermatrix_samples.xlsx (84.1 KB) | Appendix S1. Supermatrix sample information, including voucher details and GenBank numbers for the sequences used. |
| ajb21827-sup-0002-Appendix_S2_Plastome_samples.xlsx (24.2 KB) | Appendix S2. Plastome (PL) sample information, including voucher details and plastome assembly results. Total length and the lengths of the large single-copy region (LSC), the small single-copy region (SSC), and the two inverted repeat regions (IR1 and IR2) are shown; mean coverage per base pair and its standard deviation are also provided. |
| ajb21827-sup-0003-Appendix_S3_TargtCapture_samples_v2.xlsx (16.2 KB) | Appendix S3. Target-capture (TC) sample information, including voucher details and sequence recovery statistics. The following are shown: the number of reads (NumReads), the number of reads mapped to the targets (ReadsMapped), the percentage of reads on target (PctOnTarget), the number of genes with reads (GenesMapped), the number of genes with contigs (GenesWithContigs), the number of genes with sequences (GenesWithSeqs), the number of genes with sequences longer than 25%, 50%, 75%, and 150% of the target length (GenesAt25pct, GenesAt50pct, GenesAt75pct, and GenesAt150pct), and the number of genes with paralog warnings (ParalogWarnings). |
| ajb21827-sup-0004-Appendix_S4_Supermatrix_alignment_details.pdf (73.5 KB) | Appendix S4. Supermatrix alignment details for the nine regions selected for this study. The number of species sampled per region, cumulative percentage of species sampled per region, aligned length, and proportions of parsimony-informative characters (PI) and variable sites (VS) per region are indicated. Values are calculated with outgroups included and with ambiguous regions and repeats excluded. Bp = base pairs. |
| ajb21827-sup-0005-Appendix_S5_Polyploid_list.xlsx (15 KB) | Appendix S5. List of polyploid taxa in Solanum. |
| ajb21827-sup-0006-Appendix_S6_ML_Individual_loci_and_plastid_phylogenies.pdf (788.6 KB) | Appendix S6. ML results for each of the nine individual loci and the combined plastid loci. (A) ITS; (B) matK; (C) ndhF; (D) ndhF-rpL32; (E) psbA-trnH; (F) rpL32-trnL; (G) trnL-trnT; (H) trnS-trnG; (I) waxy; and (J) seven plastid loci. Nodes with bootstrap support equal to or above 95% are in cyan, and nodes with bootstrap support between 75% and 94% are in red. Tips indicate species names, followed by major and/or minor clade, as indicated in Table 1. |
| ajb21827-sup-0007-Appendix_S7_BI_Individual_loci_and_plastid_phylogenies.pdf (821.2 KB) | Appendix S7. BI results for each of the nine individual loci and the combined plastid loci. (A) ITS; (B) matK; (C) ndhF; (D) ndhF-rpL32; (E) psbA-trnH; (F) rpL32-trnL; (G) trnL-trnT; (H) trnS-trnG; (I) waxy; and (J) seven plastid loci. Nodes with posterior probability equal to or above 0.95 are in cyan, and nodes with posterior probabilities between 0.75 and 0.95 are in red. Tips indicate species names, followed by major and/or minor clade, as indicated in Table 1. |
| ajb21827-sup-0008-Appendix_S8_Plastome_alignment_statistics.pdf (71.8 KB) | Appendix S8. Plastome (PL) alignment statistics. Data show the number of sequences, trimming mode, the number of loci retained for coalescent analysis after checking for excessive gene tree branch lengths, alignment length, number of informative and constant sites, pairwise identity, average GC content, percentage of gaps, and average locus length for the exon, intron, and intergenic regions. |
| ajb21827-sup-0009-Appendix_S9_Model_Selection.xlsx (47.1 KB) | Appendix S9. Optimal substitution models used in ML analyses for the PL and TC datasets, determined using ModelFinder in IQ-TREE2. For each locus, the number of taxa, sites, informative sites, and invariable sites are indicated, as well as the model selected and the AICc score. Worksheet titles correspond to the following: PLUnpartitioned = Models selected for PL unpartitioned datasets, for 151 taxa and 125 taxa; PLBestPartScheme = Models selected for PL datasets analysed according to the best-partition scheme; TCPartitioned_Min4 = Models selected for loci of the TC dataset, with a minimum of 4 taxa per locus; TCPartitioned_Min20 = Models selected for loci of the TC dataset, with a minimum of 20 taxa per locus. |
| ajb21827-sup-0010-Appendix_S10_TargetCapture_Alignment_statistics.pdf (59.9 KB) | Appendix S10. Target-capture (TC) alignment statistics. Loci excluded refers to the number of loci excluded because of excessively long branch lengths, and loci retained is the final number of loci retained for both ML and coalescent analyses. Empty sequences inserted refers to the amount of missing data. Min = minimum; Bp = base pairs. |
| ajb21827-sup-0011-Appendix_S11_Supermatrix_ML_Phylogeny.pdf (132.2 KB) | Appendix S11. Detailed RAxML supermatrix phylogenetic tree with 746 taxa. Nodes with bootstrap support equal to or above 95% are in cyan, and nodes with bootstrap support between 75% and 94% are in red. Bootstrap support values for each node are indicated in italics. Tips indicate species names, followed by major and/or minor clade, as indicated in Table 1. |
| ajb21827-sup-0012-Appendix_S12_Supermatrix_Beast_Phylogeny.pdf (148.8 KB) | Appendix S12. Detailed Bayesian inference (BEAST) supermatrix phylogenetic tree with 746 taxa. Nodes with posterior probability equal to or above 0.95 are in cyan, and nodes with posterior probabilities between 0.75 and 0.95 are in red. Posterior probability values for each node are indicated in italics. Tips indicate species names, followed by major and/or minor clade, as indicated in Table 1. |
| ajb21827-sup-0013-Appendix_S13_Plastome_ML_phylogenies.pdf (271 KB) | Appendix S13. ML phylogenetic trees of plastome datasets. Nodes with bootstrap support equal to or above 95% are in cyan, and nodes with bootstrap support between 75% and 94% are in red. Tips indicate species names, followed by major and/or minor clade, as indicated in Table 1. (A) 151 taxa, all data, unpartitioned; (B) 125 taxa, all data, unpartitioned; (C) 151 taxa, all data, best partition scheme; (D) 125 taxa, all data, best partition scheme. |
| ajb21827-sup-0014-Appendix_S14_TargetCapture_ML_phylogenies.pdf (113.1 KB) | Appendix S14. ML phylogenetic trees of A353 target-capture datasets (IQ-TREE2). Nodes with bootstrap support equal to or above 95% are in cyan, and nodes with bootstrap support between 75% and 94% are in red. Tips indicate species names, followed by major and/or minor clade, as indicated in Table 1. (A) Filtering threshold of a minimum of 4 taxa per locus; (B) filtering threshold of a minimum of 20 taxa per locus. |
| ajb21827-sup-0015-Appendix_S15_TargetCapture_Coalescent_phylogenies.pdf (95.7 KB) | Appendix S15. Coalescent phylogenetic trees of A353 target-capture datasets (ASTRAL-III). Nodes with multi-locus local posterior probability support equal to or above 0.95 are in cyan, and nodes with support between 0.75 and 0.94 are in red. Tips indicate species names, followed by major and/or minor clade, as indicated in Table 1. (A) Filtering threshold of a minimum of 4 taxa per locus; (B) filtering threshold of a minimum of 20 taxa per locus. |
| ajb21827-sup-0016-Appendix_S16_Polytomy_tests.pdf (273.9 KB) | Appendix S16. Polytomy test results with ASTRAL-III. (A) Target-capture A353 species tree, ASTRAL-III, filtering threshold of a minimum of 4 taxa per locus, branches in gene trees with 10% or less branch support collapsed; (B) target-capture A353, ASTRAL-III, filtering threshold of a minimum of 4 taxa per locus, branches in gene trees with 75% or less branch support collapsed; (C) target-capture A353, ASTRAL-III, filtering threshold of a minimum of 20 taxa per locus, branches in gene trees with 10% or less branch support collapsed; (D) target-capture A353, ASTRAL-III, filtering threshold of a minimum of 20 taxa per locus, branches in gene trees with 75% or less branch support collapsed. |