The age and diversification of the angiosperms re-revisited†
The authors thank Steven Manchester for helpful discussion of angiosperm fossils and Alexei Drummond for advice on relaxed clock analyses and fossil prior constraints. The authors thank Sean Graham and two anonymous reviewers for feedback on earlier versions of this manuscript. This work was funded in part by the Office of Research and Sponsored Projects at the University of New Orleans. The authors also acknowledge the support of the Angiosperm Tree of Life Project (NSF EF-0431266).
Abstract
• Premise of the study: It has been 8 years since the last comprehensive analysis of divergence times across the angiosperms. Given recent methodological improvements in estimating divergence times, refined understanding of relationships among major angiosperm lineages, and the immense interest in using large angiosperm phylogenies to investigate questions in ecology and comparative biology, new estimates of the ages of the major clades are badly needed. Improved estimations of divergence times will concomitantly improve our understanding of both the evolutionary history of the angiosperms and the patterns and processes that have led to this highly diverse clade.
• Methods: We simultaneously estimated the age of the angiosperms and the divergence times of key angiosperm lineages, using 36 calibration points for 567 taxa and a “relaxed clock” methodology that does not assume any correlation between rates, thus allowing for lineage-specific rate heterogeneity.
• Key results: Based on the analysis for which we set fossils to fit lognormal priors, we obtained an estimated age of the angiosperms of 167–199 Ma and the following age estimates for major angiosperm clades: Mesangiospermae (139–156 Ma); Gunneridae (109–139 Ma); Rosidae (108–121 Ma); Asteridae (101–119 Ma).
• Conclusions: With the exception of the age of the angiosperms themselves, these age estimates are generally younger than other recent molecular estimates and very close to dates inferred from the fossil record. We also provide dates for all major angiosperm clades (including 45 orders and 335 families [208 stem group age only, 127 both stem and crown group ages], sensu APG III). Our analyses provide a new comprehensive source of reference dates for major angiosperm clades that we hope will be of broad utility.
Angiosperms are a diverse and species-rich clade of green plants, with the number of described species ranging from 250000 (96) or 260000 (83; see also 86; 72) to perhaps 400000 (Raven in 35). Flowering plants are also ecologically diverse and are found in every terrestrial habitat, as well as both fresh and saltwater habitats. Even before Darwin's famous correspondence to Hooker about the “abominable mystery” of the origin and early diversification of the angiosperms (see American Journal of Botany 96: 5–21), researchers have attempted to clarify relationships among members of this hyperdiverse clade. Over the past decade, plant systematists have made great strides in elucidating phylogenetic relationships among the major branches of the angiosperm tree of life (reviewed in 37; 79; 78, 36; see also 89; 53).
Attempts to estimate the age of the angiosperms and the timing of important divergences using molecular sequence data have seen much progress. Initial efforts yielded extremely variable estimates for the age of the angiosperms, ranging from ∼125 to >400 million years ago (Ma) (see 69; 81; 70; 4; 42; 75). However, most recent efforts to date the origin of the angiosperms using molecular data and improved dating methods have converged on more reasonable estimates between 180 and 140 Ma, predating the dates inferred from the fossil record by only 45 to 5 Myr (70; 4; but see 75, 42). A review of angiosperm dating studies has recently been published by 40.
Estimated ages for specific angiosperm clades using molecular data have generally been older than inferences from the fossil record (e.g., 94, compared with 42), but sometimes these discrepancies are small. For example, given the numerous diverse angiosperm fossils reported from as early as 115–125 Ma, the earliest angiosperms were almost certainly somewhat older than the oldest estimate based on the fossil record of 132 Ma. Conversely, molecular methods tend to overestimate ages (61), so refinement of dating approaches is needed to compensate for this bias. Attempts to estimate the timing of angiosperm origins have therefore relied on a variety of molecular data sets and estimation procedures.
To date, the most comprehensive divergence time analysis for the angiosperms, in terms of taxon sampling (560 angiosperm species plus seven outgroups, from 75% of the recognized flowering plant families; data set of 80) is that of 94, who estimated not only the age of the angiosperms, but also divergence times for all major clades. The resulting dates have subsequently been used as a temporal framework for many ecological studies (e.g., 74; 87; 20; 90), as well as “external,” or secondary, calibration points for subsequent divergence time analyses of groups that may lack reliable fossils (e.g., 58). 94 used nonparametric rate smoothing (NPRS) (65) to estimate divergence times across angiosperms using rbcL, atpB, and 18S rDNA sequence data and calibrated their tree using a single calibration point by fixing the split between Fagales and Cucurbitales at 84 Ma based on the fossils Protofagacea (29) and Antiquacupula (73), both from the Late Santonian. With this approach, they estimated an age for angiosperms of 179–158 Ma.
Although a landmark study, the analysis of 94 has several methodological shortcomings that were unavoidable at that time because the analytical software then available was limited: namely, reliance on a single calibration point and the use of NPRS. Like several other “relaxed clock” divergence time estimation methodologies (penalized likelihood [66, 68]; multidivtime [85; 38]), NPRS assumes an autocorrelation of rate changes, occurring at splitting events, where descendant branches inherit a rate similar to that of the parental branch. Although this model seems to fit evolutionary rates across many clades of angiosperms, rate changes within angiosperms are not strictly autocorrelated. For example, clear transitions in molecular rates have occurred when lineages shifted from a woody to an herbaceous habit (76; see also 6). This lineage-specific rate heterogeneity should be accounted for, when possible, in divergence time estimation. Although NPRS and penalized likelihood attempt to deal with such rate heterogeneity through the application of smoothing methods, they cannot adequately accommodate severe shifts in evolutionary rates and extreme rate heterogeneity. However, recent advances in analytical approaches make such accommodations possible. The advantages and disadvantages of various dating methods, including the ability to accommodate extreme lineage-specific rate heterogeneity, have recently been discussed elsewhere (64; 23).
Although age estimates for angiosperms have converged somewhat across different studies, new data (both molecular and fossil), along with new estimation methodologies, will help to refine our inferences of the age of angiosperm origins and diversification. Given the interest in using large angiosperm phylogenies to investigate questions concerning ecology and comparative biology (e.g., 20; 90; 95) and with the continual refinement in our inferences of relationships among major angiosperm lineages (e.g., 33; 52, 51; 78; 89), we feel that new estimates of the ages of the major angiosperm clades are warranted. Improved divergence time estimation will concomitantly improve our understanding of both the evolutionary history of the angiosperms and the processes that have generated such high diversity on this branch of the tree of life.
In this paper, we estimate the age of the angiosperms, as well as divergence times of several major angiosperm lineages (e.g., Mesangiospermae, Eudicotyledoneae, Rosidae, Asteridae), using the data set (80) used by 94 and 36 calibration points and age constraints (Table 1). We used the computer software BEAST (19), which implements a “relaxed clock” methodology that does not assume any correlation between rates (18), thus accounting for lineage-specific rate heterogeneity while estimating ages and phylogeny simultaneously. We also provide dates for all major angiosperm clades (including 45 orders and 335 families [208 stem group age only, 127 both stem and crown group ages], sensu APG III), as well as for all remaining splits found in the 560-angiosperm tree. We also compare our results with those of 94.
Fossil (Clade) | Minimum age (Ma) | MRCA | Reference(s) | Lognormal prior mean (SD) |
---|---|---|---|---|
Unnamed (Hamamelidaceae) | 84 | Daphniphyllum and Itea | 46, 42 | 1.5 (0.5) |
Unnamed (Laurales) | 108.8 | Idiosperma and Sassafras | 12 | 2.1 (0.5) |
Pandanus sp. (Pandanales) | 65 | Stemona and Barbacenia | 55 | 1.8 (0.5) |
Dicolpopollis malensianus (Arecales) | 65 | Phoenix and Metroxylon | 57 | 1.8 (0.5) |
Restio sp. (Poales) | 68.1 | Zea and Puya | 55 | 1.8 (0.5) |
Spirematospermum chandlerae (Zingiberales) | 83.5 | Musa and Zingiber | 24 | 1.8 (0.5) |
Retitricolpites microreticulatus (Gunneraceae) | 88.2 | Myrothamnus and Gunnera | 55 | 1.5 (0.5) |
Unnamed (Caryophyllales) | 83.5 | Rhabdodendron and Spinacia | 9 | 1.5 (0.5) |
Dillenites (Dilleniaceae) | 51.9 | Dillenia and Tetracera | 9 | 1.5 (0.5) |
Unnamed (Santalales) | 51.9 | Schoepfia and Santalum | 9 | 1.5 (0.5) |
Unnamed (Ericales) | 91.2 | Impatiens and Arbutus | Nixon and Crepet (1993) | 1.5 (0.5) |
Fraxinus wilcoxiana (Lamiales) | 44.3 | Olea and Pedicularis | 7 | 1.5 (0.5) |
Cantisolanum daturoides (Solanales) | 44.3 | Nolana and Schizanthus | 9 | 1.5 (0.5) |
Ilexpollenites sp. (Aquifoliaceae) | 85 | Ilex and Gonocaryum | 55 | 1.5 (0.5) |
Unnamed (Vitaceae) | 57.9 | Leea and Vitis | 9 | 1.5 (0.5) |
Esqueiria futabensis (Myrtales) | 88.2 | Epilobium and Qualea | 82 | 1.5 (0.5) |
Unnamed (Sapindales) | 65 | Citrus and Bursera | 39 | 1.5 (0.5) |
Unnamed (Fabales) | 59.9 | Pisum and Polygala | 28 | 1.5 (0.5) |
Unnamed (Cercidiphyllaceae) | 65 | Cercidiphyllum and Crassula | 42 | 1.5 (0.5) |
Divisestylus (Iteaceae) | 89.3 | Ribes and Itea | 30 | 1.5 (0.5) |
Ailanthus (Simaroubaceae/Rutaceae, Meliaceae) | 50 | Ailanthus and Swietenia | 11 | 1.5 (0.5) |
Burseraceae/Anacardiaceae | 50 | Bursera and Schinus | 10 | 1.5 (0.5) |
Parbombacaceoxylon (Malvales s.l.) | 65.5 | Thymea and Bombax | 92, 12 | 1.5 (0.5) |
Paleoclusia (Clusia/Hypericum) | 89 | Dicella and Mesua | 15 | 1.5 (0.5) |
Illiciospermum (Illiciales) | 89 | Illicium and Schisandra | 26 | 1.5 (0.5) |
Diplodipelta (Caprifoliaceae) | 36 | Valeriana and Dipsacus | 48 | 1.5 (0.5) |
Virginianthus (Calycanthaceae) | 98 | Calycanthus and Liriodendron | 25 | 1.5 (0.5) |
Hedyosmum sp. (Chloranthaceae) | 98 | Hedyosmum and Chloranthus | 16 | 1.5 (0.5) |
Perisyncolporites (Malpighiales) | 49 | Dicella and Malpighia | 34 | 1.5 (0.5) |
Pseudosalix (Malpighiales) | 48 | Idesia and Populus | 5 | 1.5 (0.5) |
Unnamed (Cornales) | 86 | Cornus and Nyssa | 13 | 1.5 (0.5) |
Platanocarpus brookensis (Proteales) | 98 | Platanus and Nelumbo | 14 | 1.5 (0.5) |
Unnamed (Buxaceae) | 98 | Didymeles and Buxus | 17 | 1.5 (0.5) |
Unnamed (Bignoniaceae) | 49.4 | Catalpa and Verbena | 91 | 1.5 (0.5) |
Unnamed (Bignoniaceae) | 35 | Catalpa and Campsis | 47 | 1.5 (0.5) |
MATERIALS AND METHODS
Estimation of phylogeny and relaxed-clock divergence times
Due to the lack of rate constancy among lineages (based on a likelihood ratio test [21]: P < 0.001 for all three data partitions: rbcL, 18S rDNA, atpB, as well as the concatenated data set), we estimated divergence times under a relaxed molecular clock. We used a Bayesian method (18) implemented in the program BEAST v.1.4.8 to estimate the phylogeny and divergence times simultaneously. We performed two analyses: in one, we estimated rates and ages from our sequences, modeling fossils as exponential priors, and in a second analysis, we set fossil priors to fit a lognormal distribution. In each case, we partitioned the data set by gene, estimating separate rates and rate-change parameters for each partition. Bayes factors, as calculated in the program Tracer v.1.3 (60), favored the uncorrelated lognormal (UCLN) model for rate change over the strict clock model (see 56, and references therein).
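The likelihood ratio test for rate constancy mentioned above can be sketched as follows. This is a minimal illustration, not the authors' script: the log-likelihood values are hypothetical, the degrees of freedom follow the usual convention of n − 2 for n taxa, and the p-value uses the Wilson-Hilferty normal approximation to the chi-square distribution so that only the Python standard library is needed.

```python
import math

def clock_lrt(lnl_clock, lnl_free, n_taxa):
    """Likelihood ratio test of a strict clock against unconstrained branch lengths.

    Returns (test statistic, degrees of freedom, approximate p-value).
    """
    stat = 2.0 * (lnl_free - lnl_clock)
    df = n_taxa - 2
    if stat <= 0.0:
        # The clock model fits at least as well: no evidence against it.
        return stat, df, 1.0
    # Wilson-Hilferty: (X/df)^(1/3) is approximately normal with
    # mean 1 - 2/(9 df) and variance 2/(9 df) when X ~ chi-square(df).
    var = 2.0 / (9.0 * df)
    z = ((stat / df) ** (1.0 / 3.0) - (1.0 - var)) / math.sqrt(var)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return stat, df, p_value

# Hypothetical log-likelihoods for a 567-taxon alignment (not values from this study).
stat, df, p = clock_lrt(lnl_clock=-152000.0, lnl_free=-150500.0, n_taxa=567)
print(f"2*delta lnL = {stat:.1f}, df = {df}, reject clock at 0.001: {p < 0.001}")
```

A very small p-value, as reported here for all three partitions, indicates that a single global rate is a poor description of the data and motivates a relaxed clock.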
We set the underlying model of molecular evolution to be GTR + I + Γ for each of the individual genes (78). We also used the UCLN model, which allows rates of molecular evolution to be uncorrelated across the tree. BEAST also allows uncertainty in the age of calibrations to be represented as prior distributions rather than as strict/fixed calibration points. We therefore constrained the minimum ages of several of the clades in the tree to prior probability distributions (see below). For each analysis, we initiated four independent Markov chain Monte Carlo (MCMC) analyses from starting trees with branch lengths that satisfied the priors on divergence times. A starting tree with branch lengths satisfying all fossil prior constraints was created with NPRS in the program r8s version 1.7 (68). For each MCMC analysis, we ran six independent chains for 100 million generations and assessed convergence and stationarity of each chain to the posterior distribution using Tracer v.1.3 (60) and by plotting time series of the log posterior probability of sampled parameter values. After stationarity was achieved, we sampled each chain every 1000 steps until an effective sample size (ESS) of more than 200 samples was obtained. If convergence between the independent chains was evident, we combined the samples from each run using the program LogCombiner v.1.4.7 (part of the BEAST distribution). We also explored the influence of our priors by performing an MCMC run without the data (18), which allows researchers to investigate the influence of the prior distribution and its contribution to the posterior. With the advent of Bayesian methods for phylogenetic inference, systematists have become aware that the posterior distribution is a product of both the prior distribution (imposed by the investigator) and the likelihood function (information coming from the data). Strong priors can override any signal from the data, just as vague, or uninformative, priors may be swamped by the signal in the data.
XML files for all analyses have been submitted as Appendices S25 and S26 (see Supplemental Data at http://www.amjbot.org/cgi/content/full/ajb.0900346/DC1).
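The ESS criterion used above can be illustrated with a simple autocorrelation-based estimator. The sketch below is an assumption-laden illustration: it truncates the autocorrelation sum at the first non-positive lag, which is one common convention and not necessarily the estimator implemented in Tracer, and the two traces are simulated rather than taken from our runs.

```python
import random

def effective_sample_size(trace):
    """Estimate ESS as n / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first lag whose autocorrelation is not positive.
    """
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

# Simulated traces (hypothetical): independent draws versus a sticky AR(1) chain.
rng = random.Random(1)
independent = [rng.gauss(0.0, 1.0) for _ in range(2000)]
sticky = [0.0]
for _ in range(1999):
    sticky.append(0.9 * sticky[-1] + rng.gauss(0.0, 1.0))

print("ESS, independent draws:", round(effective_sample_size(independent)))
print("ESS, autocorrelated chain:", round(effective_sample_size(sticky)))
```

A strongly autocorrelated chain yields far fewer effective samples than its nominal length, which is why sampling continued until the ESS threshold was passed rather than for a fixed number of steps.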
Fossil constraints
We treated all fossils as minimum age constraints (see Table 1), with the exception of the root node, which we set to a uniform distribution between 132 Ma (minimum age of angiosperms) and 350 Ma, the latter corresponding to the age of the most recent common ancestor (MRCA) of extant seed plants (63). The latter age is somewhat older than previous molecular estimates of the MRCA of seed plants (67; 42) and older than the Cordaitales (see 84), the oldest fossil member of the group found within the living lineages containing seed plants (41). We modeled all other fossil constraints as either an exponential distribution (32) with a mean of 1 and an offset (hard bound constraint) that equaled the minimum age of the fossil, or as a lognormal distribution with different means and standard deviations (see Table 1). The exponential distribution is similar to a lognormal distribution in that it has a long tail of diminishing probability toward older ages (32). We chose to use an exponential distribution to minimize the number of additional parameters being estimated from the data. We assigned the ages of the fossils to the nodes of the MRCAs of the crown groups (Table 1) by enforcing the monophyly of these clades. In all cases, the monophyly of these constrained clades was well supported by previous phylogenetic analyses. Recently, 75 used an uncorrelated lognormal relaxed clock to investigate the age of the angiosperms and other vascular plants using 33 fossil constraints, many of which we use in this analysis. 42 used 50 fossil constraints (49 minimum and one maximum) in their analyses of angiosperms, and we have used 12 of the same fossils here. However, in many cases, we avoided some of the youngest fossils used by 42 because an abundance of constraints near the tips can bias estimates for deeper nodes (3). Finally, we selected fossils we considered most reliable in terms of both age and identification based on our own recent investigations (e.g., 36; 89).
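To make the difference between the two calibration priors concrete, the sketch below computes the age below which 95% of the prior mass lies for an offset exponential and an offset lognormal prior. It assumes that the exponential mean of 1 is in Myr and that the mean and SD listed in Table 1 are parameters on the log scale; both are interpretive assumptions, and the function names are illustrative only.

```python
import math
from statistics import NormalDist

def exponential_upper_bound(offset, mean=1.0, prob=0.95):
    """Age below which `prob` of an offset exponential prior falls."""
    return offset - mean * math.log(1.0 - prob)

def lognormal_upper_bound(offset, log_mean, log_sd, prob=0.95):
    """Age below which `prob` of an offset lognormal prior falls.

    log_mean and log_sd are assumed to be on the log scale.
    """
    z = NormalDist().inv_cdf(prob)
    return offset + math.exp(log_mean + z * log_sd)

# Hamamelidaceae calibration from Table 1: minimum age 84 Ma, mean 1.5, SD 0.5.
exp_95 = exponential_upper_bound(84.0)
logn_95 = lognormal_upper_bound(84.0, 1.5, 0.5)
print(f"exponential prior: 95% of mass younger than {exp_95:.1f} Ma")
print(f"lognormal prior:   95% of mass younger than {logn_95:.1f} Ma")
```

Under these assumptions the exponential prior concentrates nearly all of its mass within about 3 Myr of the fossil age, whereas the lognormal prior extends roughly 10 Myr older, so the lognormal analysis leaves more room for node ages that predate the fossil.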
RESULTS
Model selection and parameter estimation
Across all analyses, we found a coefficient of variation greater than 1, further evidence that the molecular sequence data are not evolving in a clock-like fashion. Across all analyses, the covariance value was greater than zero, suggesting that we cannot reject autocorrelation in substitution rates between ancestral and descendant lineages in the tree. Although autocorrelation may hold across the entire tree, certain clades (e.g., Monocotyledoneae) exhibit exceptionally high rate heterogeneity. This warrants the use of methods that permit, but do not require, autocorrelated rates; such methods are appropriate when evolutionary rates shift sharply. Furthermore, sharp shifts in evolutionary rates are associated with changes in life history (76).
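The two summary statistics discussed above can be illustrated on a toy example. This sketch computes a coefficient of variation of branch rates and a covariance between parent and child branch rates in a simple, generic way; BEAST's own statistics may be defined differently in detail, and the branch names and rates below are hypothetical.

```python
from statistics import mean, pstdev

def coefficient_of_variation(rates):
    """Standard deviation of branch rates divided by their mean."""
    values = list(rates.values())
    return pstdev(values) / mean(values)

def parent_child_covariance(rates, parent):
    """Covariance between each branch's rate and the rate of its parent branch.

    `parent` maps a branch to its parent branch; branches attached to the
    root have no entry and are skipped.
    """
    pairs = [(rates[parent[b]], rates[b]) for b in rates if parent.get(b) in rates]
    mp = mean(p for p, _ in pairs)
    mc = mean(c for _, c in pairs)
    return sum((p - mp) * (c - mc) for p, c in pairs) / len(pairs)

# Hypothetical substitution rates on six branches (arbitrary units).
rates = {"A": 1.0, "B": 1.2, "C": 0.9, "D": 3.0, "E": 3.4, "F": 2.6}
parent = {"B": "A", "C": "A", "E": "D", "F": "D"}

print("coefficient of variation:", round(coefficient_of_variation(rates), 2))
print("parent-child covariance > 0:", parent_child_covariance(rates, parent) > 0)
```

A coefficient of variation near zero would indicate clock-like data, while a positive parent-child covariance indicates that fast branches tend to descend from fast branches.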
Phylogeny estimation
Results for the phylogenetic analyses are presented in Appendices S1–S12 (see online Supplemental Data) and are in strong agreement with previous analyses of these data using Bayesian methods (78). The resulting topology is also consistent with more recent publications investigating angiosperm phylogeny (53), with a few notable exceptions (e.g., Amborella sister to Nymphaeales). However, some of these differences are not well supported (see Appendices S1–S12).
Bayesian age estimation
Age estimates for some of the major angiosperm splits from the two separate BEAST analyses are summarized in Table 2, along with the range of ages for these same splits obtained by 94. Superrosids and superasterids are new clade names (see 53) corresponding, respectively, to (1) Rosidae, Vitaceae, and Saxifragales, and (2) Berberidopsidales, Santalales, Caryophyllales, and Asteridae (Dilleniaceae are not placed with strong support in either of these two major clades). Crown group ages for all major angiosperm clades (recognized as 45 orders and 335 families [208 stem group age only, 127 both stem and crown group ages], sensu APG III) are provided in Appendices S13–S14 (see online Supplemental Data). The BEAST analysis that treated fossil priors as lognormal distributions provided an older estimated age (167–199 Ma) for crown group angiosperms than that using an exponential distribution (141–154 Ma), as well as a larger variance around age estimates, especially at the base of the tree. Our results overlap those of 94 considerably (Table 2).
Node | Clade | Wikström et al. (Ma) | BEAST a (Ma) | BEAST b (Ma) |
---|---|---|---|---|
1 | Angiospermae | 158–179 | 147 (141–154) | 183 (167–199) |
2 | | 153–171 | 144 (138–150) | 173 (160–187) |
3 | Mesangiospermae | nc | 130 (123–138) | 146 (139–156) |
4 | | nc | 127 (119–135) | 140 (128–140) |
5 | Magnoliidae | 122–132 | 125 (121–130) | 122 (108–138) |
6 | | 127–134 | 120 (111–129) | 119 (100–138) |
7 | | 108–113 | 120 (11–129) | 118 (107–133) |
8 | | 140–155 | 136 (130–142) | 156 (146–168) |
9 | Eudicotyledoneae | 131–147 | 129 (123–134) | 130 (123–139) |
10 | | 130–144 | 126 (120–130) | 129 (116–143) |
11 | | 128–140 | 121 (113–130) | 125 (110–138) |
12 | | 124–137 | 121 (116–126) | 134 (120–145) |
13 | | 123–135 | 119 (109–129) | 127 (109–139) |
14 | Gunneridae | 116–127 | 119 (109–129) | 127 (109–139) |
15 | Pentapetalae | 114–124 | 117 (107–127) | 121 (111–124) |
16 | Superasterids | 104–111 | 117 (113–123) | 120 (112–131) |
17 | | 106–114 | 111 (104–117) | 121 (113–129) |
18 | | nc | 106 (100–113) | 114 (107–122) |
19 | Asteridae | 102–112 | 104 (98–111) | 110 (101–119) |
20 | | 114–125 | 99 (93–106) | 108 (99–116) |
21 | Core asterids | 107–117 | 93 (85–102) | 100 (92–109) |
22 | Superrosids | 111–121 | 117 (111–121) | 128 (120–135) |
23 | Rosidae | 108–117 | 101 (97–105) | 125 (118–132) |
24 | | 95–101 | 114 (109–119) | 116 (108–121) |
- Note: nc = Node not compatible with inferred tree
- a Based on 36 minimum age constraints treated as exponential distributions.
- b Based on 36 minimum age constraints treated as lognormal distributions.
DISCUSSION
Age and diversification of angiosperms
Whereas early molecular-based estimates for the age of the angiosperms yielded highly disparate results, recent estimates have converged in the narrow range of approximately 180–140 Ma (70; 4; 77). Our results from both analyses fall within or overlap this range (141–154 Ma when age constraints were treated as an exponential distribution and 167–199 Ma when treated as a lognormal distribution; Table 2). Importantly, this range is somewhat older than the early Cretaceous date of 132 Ma based on the known fossil record, suggesting a Lower Jurassic to Lower Cretaceous origin and initial diversification of crown group angiosperms. Recently, 75 and 42 obtained molecular-based estimates suggesting a Triassic origin of angiosperms. Hence, these molecular estimates raise the possibility that the oldest crown angiosperm fossils are still undiscovered.
Furthermore, molecular phylogenetic analyses suggest not just one or a few major radiations in the angiosperms, but layer upon layer of rapid radiation. For example, a series of recent studies, many based on complete plastid genome data sets, have inferred rapid radiations throughout the diversification of major groups of angiosperms, including the basal lineages of Mesangiospermae (33; 52), Pentapetalae (53), Rosidae (89), and Saxifragales (36), and the radiation of lineages of superasterids (53).
Despite convergence in age estimates for the origin of angiosperms among most recent studies (e.g., 70; 4; but see 42 and 75), new information on angiosperm phylogeny is continually improving our views on relationships, and revised topologies may influence the age estimates obtained across the tree. For example, the recent use of entire plastid genome sequence data has improved our understanding of the relationships among the major angiosperm clades (33; 52, 51). However, the topology of the Bayesian tree inferred here, although very similar overall to these recent results, differs in several respects. For example, complete plastid genome sequencing places Magnoliidae with Chloranthaceae (although with weak support), and this clade is sister to Monocotyledoneae and (Eudicotyledoneae + Ceratophyllaceae) (52). In contrast, the Bayesian tree used here has Monocotyledoneae weakly supported as sister to a trichotomy of Magnoliidae, Chloranthaceae, and Ceratophyllaceae + Eudicotyledoneae. Among Gunneridae, plastid genome sequencing reveals two major clades: superrosids and superasterids (53). The Bayesian tree is consistent with this more recent result. A shortcoming, however, of the 567-taxon Bayesian tree involves relationships within the Magnoliidae (Fig. 1). The Bayesian analysis did not recover relationships as they have been understood for several years: Canellales + Piperales and Laurales + Magnoliales (see 59; 1). In addition, in the Bayesian tree, Dilleniaceae are placed sister to Caryophyllales (as part of the superasterid clade), but plastid genome data suggest a relationship of Dilleniaceae to Rosidae, Vitaceae, and Saxifragales (53).
Comparison of methods of divergence time estimation
The ranges of the various age estimates obtained across major clades from the different methods compared here overlap considerably (see Table 2). The BEAST analysis that treated fossil priors as lognormal distributions provided the oldest age estimates across the tree (167–199 Ma for crown group angiosperms) and also exhibited the largest variance around the age estimates obtained, especially at the base of the tree. The two BEAST analyses that employed multiple (36) constraints yielded similar estimates. In nearly all cases, the 95% highest posterior density (HPD) intervals for node ages of the Bayesian analyses overlapped one another, as well as the ranges published by 94.
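The overlap comparison described above amounts to a simple interval check. The sketch below applies it to five named clades using the ranges given in Table 2; the dictionary layout and function names are illustrative choices, not part of the original analysis.

```python
def overlaps(a, b):
    """True if closed intervals a = (low, high) and b = (low, high) share any age."""
    return a[0] <= b[1] and b[0] <= a[1]

# Ranges in Ma from Table 2: (Wikström et al., BEAST exponential, BEAST lognormal).
table2 = {
    "Angiospermae": ((158, 179), (141, 154), (167, 199)),
    "Magnoliidae": ((122, 132), (121, 130), (108, 138)),
    "Eudicotyledoneae": ((131, 147), (123, 134), (123, 139)),
    "Gunneridae": ((116, 127), (109, 129), (109, 139)),
    "Asteridae": ((102, 112), (98, 111), (101, 119)),
}

def overlap_summary(table):
    """For each clade, report whether each BEAST interval overlaps the earlier range."""
    return {
        clade: {"exponential": overlaps(prev, expo), "lognormal": overlaps(prev, logn)}
        for clade, (prev, expo, logn) in table.items()
    }

for clade, result in overlap_summary(table2).items():
    print(clade, result)
```

For these five clades, the only non-overlapping comparison is the exponential-prior interval for crown Angiospermae, which is younger than the earlier range.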
Dealing with uncertainty
Since the concept of a molecular clock was first proposed (97), the idea of using macromolecules to date divergence times between taxa has been met with great enthusiasm and, subsequently, with great skepticism, given the uncertainty inherent in the stochastic nature of the molecular substitution process (see 31). As methodologies were refined, researchers began to estimate the associated error in the substitution process, usually with nonparametric bootstrap procedures (2). These methods deal with additional uncertainty in age estimates due to weakly supported nodes in the phylogeny (phylogenetic uncertainty), as well as the problems associated with assigning fossil dates to specific nodes in a phylogeny. Our BEAST results are usually more uncertain than those that use traditional nonparametric bootstrap error estimates (e.g., 94). This additional uncertainty in Bayesian analyses is due to at least two factors. First, a Bayesian analysis by its very nature explicitly incorporates uncertainty, such as that associated with fossil constraints: instead of using fossils as fixed calibration points, fossil constraints are modeled as distributions that cover a broader range of possible ages. Second, each of the multitude of additional parameters that are being estimated in such analyses contributes to the amount of uncertainty associated with age estimates. This is definitely the case when only a single fossil constraint (in addition to the constraint placed on the root node) was used to calibrate the tree (data not presented). However, as more prior constraints (i.e., fossils) are incorporated into the analysis, we hope that error bars around each node will become narrower, especially in areas of the tree in close proximity to where we placed fossil priors.
Recent advances now allow investigators to incorporate uncertainty in fossil ages into divergence time analyses (see 32). However, choosing among the various distributions to model this uncertainty remains problematic. It is rare that stratigraphic sampling for any group of organisms is sufficiently complete to get a good idea of maximum ages (i.e., lower bounds). Because of this, the use of lower stratigraphic bounds or distributions that have a hard lower bound (e.g., a uniform distribution) will be much harder to justify. And at this point, at least for angiosperms, we know of no rigorous statistical analysis that has tried to estimate how much of a gap we might expect between the first appearance of a fossil in the record and its potential age of origination (49, 12; 22; 88). Selecting among different parametric distributions without this type of information will be difficult. For our analyses, we used both an exponential distribution and a lognormal distribution as priors for fossil distributions. As noted by 32, the exponential distribution might be a good alternative to a lognormal in the face of inadequate paleontological information. Our results show that age estimates from analyses that specified an exponential distribution are generally younger than and have narrower 95% HPD than those resulting from use of a lognormal distribution.
Future directions
Truly massive phylogenetic analyses that include thousands of species have recently been conducted (e.g., 76; 27). The use of ever-larger trees from different data sources and more fossils with more sophisticated prior distributions affords exciting opportunities for divergence time estimation in the years ahead. We hope that this study will serve as a community resource to further our understanding of angiosperm evolution and diversification. These results provide calibration points for future studies of divergence times for groups that lack a fossil record as well as an ideal framework for investigating potential hypotheses concerning codiversification with other organisms (e.g., 71; 54; 62). However, it is also important to note the potential limitations of the age estimates provided here. Caveats include the number of fossil age constraints (which are not evenly distributed across the tree), as well as potential problems relating to the topology itself. In addition, 51 has raised the concern that partitioned Bayesian analyses may be susceptible to inferring the wrong tree topologies when MCMC runs get trapped in parameter space characterized by unrealistically long trees. Given the overall congruence between our topological results and previous analyses of these same data, we do not feel that this is much of a concern for our analyses. However, no theoretical study has looked at the behavior of partitioned Bayesian analyses and divergence time estimation.
Although 560 angiosperm exemplars were included, there is low taxonomic sampling at the tips of the tree, with most clades recognized as families represented by only one or two exemplars, and these exemplars do not always represent basal divergences in each clade. As a result, the ages for clades recognized as families and orders in the supplemental tables may be based on recent divergences within each clade and therefore may not necessarily be accurate estimates for the crown ages of these clades in their total circumscription. Despite these various concerns and possible limitations, this analysis joins other recent efforts in providing new insights into the diversification and the origin of the angiosperms (e.g., 42; 75). Our analyses provide the first new comprehensive source of reference dates for the major angiosperm clades since the widely cited study of 94; we hope these dates will be broadly used by the biological community.