A protocol for high‐throughput, untargeted forest community metabolomics using mass spectrometry molecular networks

Premise of the Study: We describe a field collection, sample processing, and ultra‐high‐performance liquid chromatography–tandem mass spectrometry (UHPLC‐MS/MS) instrumental and bioinformatics method developed for untargeted metabolomics of plant tissue and suitable for molecular networking applications.
Methods and Results: A total of 613 leaf samples from 204 tree species was collected in the field and analyzed using UHPLC‐MS/MS. Matching of molecular fragmentation spectra generated over 125,000 consensus spectra representing unique molecular structures, 26,410 of which were linked to at least one structurally similar compound.
Conclusions: Our workflow is able to generate molecular networks of hundreds of thousands of compounds representing broad classes of plant secondary chemistry and a wide range of molecular masses, from 100 to 2500 daltons, making possible large‐scale comparative metabolomics, as well as studies of chemical community ecology and macroevolution in plants.

Innovations in phylogenetics and phylogenomics are rapidly advancing our understanding of the tree of life, enabling the study of macroevolution at unprecedented scales. Despite these developments, the overwhelming diversity of plant secondary metabolites of unknown structure and the taxonomic rarity of any given compound have until recently remained obstacles to comparative metabolomics, the comparison of small-molecule metabolite profiles, at the large taxonomic scales necessary for the study of macroevolution and community ecology. However, recent advances in tandem mass spectrometry (MS/MS) bioinformatics enable the high-throughput comparison of the structures of unknown compounds (Wang et al., 2016), making possible comparative metabolomics at scales necessary for the study of chemical community ecology and macroevolution (Sedio, 2017).
The structural comparison of unknown molecules using MS/MS is possible because molecules with similar structures fragment into many of the same substructures. MS/MS spectra can be collected from complex mixtures directly, or with the added separation provided by ultra-high-performance liquid chromatography (UHPLC), making MS-based metabolomics scalable to data sets containing hundreds of samples and tens of thousands of unique molecules.

For the Special Issue: Methods for Exploring the Plant Tree of Life
Brian E. Sedio 1,2,4, Cristopher A. Boya P. 2,3, and Juan Camilo Rojas Echeverri 2

network approach for (a) visualization of structural relationships among unknown metabolites, (b) comparative metabolomics among plant species, and (c) identification of known compounds by searching public MS libraries (Wang et al., 2016). Here, we describe a protocol for sample collection, chemical extraction, UHPLC-MS/MS instrumental methods, and bioinformatics workflow for the generation of molecular networks for plant metabolomics. This protocol is simple to execute, broadly inclusive of plant secondary chemical variation, effective over a relatively wide range of variation in polarity and molecular mass, and scalable to sample sizes large enough to facilitate chemical community ecology in species-rich plant communities such as tropical forests.

Field collection
For community metabolomics of the forest plots at BCI and SERC, we collected young, unlignified leaves from saplings encountered in the shaded understory during the rainy season between June and August 2014. Leaves were placed on ice immediately in the forest and transferred to a −80°C freezer within 3 h of collection. See field collection protocol in Appendix 2.

Extraction and sample preparation
We homogenized 100 mg of frozen leaf tissue on liquid nitrogen in a ball mill (TissueLyser; QIAGEN, Hilden, Germany) and extracted the homogenate with 700 μL of 90% methanol : 10% water (pH 5) for 10 min. Methanol is an effective solvent for small molecules representing a wide range in polarity; mild acidity improves the extraction of most alkaloids. The solution was vortexed and centrifuged, and the supernatant was isolated. The extraction was repeated on the remaining sample, and the fractions were combined. Samples were diluted in identical extraction solvent and filtered using 4-mm syringe filters with a hydrophilic polytetrafluoroethylene (PTFE) membrane with a 0.20-μm pore size (Merck Millipore, Billerica, Massachusetts, USA) prior to analysis using UHPLC-MS/MS. See the chemical extraction protocol in Appendix 3.

Liquid chromatography instrument methods
Samples were analyzed using an Infinity 1290 UHPLC from Agilent Technologies (Santa Clara, California, USA) with a Kinetex C18 column that was 100 mm in length, 2.1 mm in internal diameter, with a 1.7-μm particle size (Phenomenex, Torrance, California, USA), and a flow rate of 0.5 mL/min at 25°C (no flow splitting was used prior to infusion into the mass spectrometer). To separate a complex  . We then sequentially tuned ion guide funnels and multipoles by modifying radio frequency (RF) stepping and transfer time until we were able to detect molecules ranging from 100 to 2500 m/z. Data-dependent collision energies were optimized to improve fragmentation quality and sensitivity. Mass spectra were acquired using a micrOTOF-QIII mass spectrometer from Bruker Daltonics by ESI in positive mode. The ESI source parameters were: end plate offset, 500 V; capillary voltage, 4500 V; nebulizer, 2.0 bar (nitrogen gas); dry gas, 9.0 L/min; and dry temperature, 200°C. The ion optics settings included: funnel 1 RF amplitude, 150 Vpp; funnel 2 RF amplitude, 300 Vpp; hexapole RF amplitude, 150 Vpp; in-source collision-induced dissociation (isCID) energy, 0.0 eV; quadrupole ion (transfer) energy, 10.0 eV; quadrupole low mass cut-off, 50.0 m/z; and pre-pulse storage, 10.0 μs. Data were acquired both for molecular ions (MS1) and fragment ions (MS2) in data-dependent fragmentation (auto MS/MS). For MS1 acquisition, three spectra were collected per second (3 Hz).
For MS2 acquisition, the rate of acquisition was slowed down for low-intensity molecular ions (20,000 counts) to 2 Hz in an attempt to increase the sensitivity for these ions and kept at 3 Hz for high-intensity molecular ions (1,000,000 counts); we employed a linear gradient in the rate of acquisition for species of intermediate ion intensity. In an attempt to increase sensitivity, we utilized the advanced stepping mode to preferentially transfer (through the collision cell) low-intensity precursor ions and different fragment ions, resulting in acquisition of an averaged mass spectrum with four different parameter combinations. Data-dependent fragmentation (auto MS/MS) was set to select a maximum of five precursor ions with intensities ≥6500 counts per fragmentation cycle of 3.0 s. A maximum of three spectra were collected for each precursor ion before placing it in an exclusion list for 1 min to allow collection of as many different ions per chromatographic peak as possible. The fragmentation energies used for two possible charge states (singly and doubly charged) are presented in Table 1. The optimized MS method provided a detection range of 100 to 2500 m/z. It should be noted that these parameter values are unique to the micrOTOF-QIII instrument. However, we suggest a similar approach to optimization for a wide mass range by using a calibration solution (e.g., ESI-Tunemix, G1969-85000; Agilent Technologies) to sequentially modify the MS settings until the desired m/z range is achieved. A chemically diverse biological extract consisting of a single sample (e.g., P. acuminata) or a pool of samples that are representative of molecular families of interest can be used to further tune the collision energies and confirm their suitability for the biological system to be analyzed.
Although we used static collision energies for discrete m/z ranges (Table 1), ramping or stepping the collision energy applied during collision-induced dissociation within each m/z range may further improve the quality of molecular fragmentation achieved over a range of masses, molecular ion stabilities, and chemical classes. To eliminate non-informative fragmentation spectra, we filtered spectral matches by requiring a minimum number of matched fragment ions in the downstream bioinformatics analyses (see Bioinformatics, below).
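The intensity-dependent MS2 acquisition rate and the precursor selection rule described above can be illustrated with a minimal Python sketch. This is not the instrument vendor's implementation: the function names, the assumption that the gradient is linear in raw ion counts, the choice to rank eligible precursors by intensity, and the m/z tolerance used for the exclusion list are all hypothetical and serve only to make the logic concrete.

```python
def ms2_acquisition_rate(intensity, low_counts=20_000, high_counts=1_000_000,
                         low_rate_hz=2.0, high_rate_hz=3.0):
    """Return an MS2 acquisition rate (Hz) for a precursor ion intensity.

    Assumption: the rate changes linearly with raw ion counts between the
    low-intensity and high-intensity thresholds given in the text.
    """
    if intensity <= low_counts:
        return low_rate_hz
    if intensity >= high_counts:
        return high_rate_hz
    fraction = (intensity - low_counts) / (high_counts - low_counts)
    return low_rate_hz + fraction * (high_rate_hz - low_rate_hz)


def select_precursors(peaks, excluded_mz, max_precursors=5,
                      min_intensity=6500, mz_tol=0.01):
    """Pick precursors for one fragmentation cycle.

    peaks: list of (mz, intensity) tuples from an MS1 scan.
    excluded_mz: m/z values currently on the exclusion list.
    Assumptions: the most intense eligible ions are chosen first, and
    mz_tol (Da) is a hypothetical tolerance for matching excluded ions.
    """
    eligible = [
        (mz, intensity)
        for mz, intensity in peaks
        if intensity >= min_intensity
        and not any(abs(mz - ex) <= mz_tol for ex in excluded_mz)
    ]
    eligible.sort(key=lambda peak: peak[1], reverse=True)
    return eligible[:max_precursors]
```

For example, under these assumptions an ion of 510,000 counts falls halfway between the two thresholds and would be acquired at 2.5 Hz.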

Bioinformatics
We generated a molecular network using the online workflow at GNPS (https://gnps.ucsd.edu/; Wang et al., 2016). First, we filtered the data by removing all MS/MS peaks within ±17 Da of the precursor m/z. We then window-filtered the MS/MS spectra by choosing only the top six peaks in each ±50 Da window throughout the spectrum. The data were then clustered with MS-Cluster (Frank et al., 2008) with a parent mass tolerance of 2.0 Da and an MS/MS fragment ion tolerance of 0.5 Da to generate consensus spectra representing putative unique molecular structures. Consensus spectra containing <2 spectra were discarded and the remaining spectra were networked. Edges were formed for spectral matches with cosine score ≥0.6 and ≥6 matched peaks. Edges were retained in the network only if both nodes linked by the edge were in each other's top 10 most similar nodes.
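The filtering and edge-formation rules above can be sketched in Python as follows. This is a simplified illustration, not the GNPS or MS-Cluster code: spectra are plain lists of (m/z, intensity) tuples, the cosine score uses greedy one-to-one peak matching without the precursor mass-shift handling that GNPS applies, the window filter is one interpretation of "top six peaks in each ±50 Da window", and all function names are hypothetical.

```python
import math


def remove_precursor_region(peaks, precursor_mz, exclusion_da=17.0):
    """Drop MS/MS peaks within +/- exclusion_da of the precursor m/z."""
    return [(mz, i) for mz, i in peaks if abs(mz - precursor_mz) > exclusion_da]


def window_filter(peaks, window_da=50.0, top_n=6):
    """Keep a peak only if it ranks among the top_n most intense peaks
    within +/- window_da of its own m/z (one reading of the window filter)."""
    kept = []
    for mz, intensity in peaks:
        stronger = sum(1 for m, i in peaks
                       if abs(m - mz) <= window_da and i > intensity)
        if stronger < top_n:
            kept.append((mz, intensity))
    return kept


def cosine_score(peaks_a, peaks_b, fragment_tol=0.5):
    """Return (cosine, matched_peaks) using greedy one-to-one matching."""
    norm_a = math.sqrt(sum(i * i for _, i in peaks_a))
    norm_b = math.sqrt(sum(i * i for _, i in peaks_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0, 0
    candidates = sorted(
        ((ia * ib, a, b)
         for a, (mza, ia) in enumerate(peaks_a)
         for b, (mzb, ib) in enumerate(peaks_b)
         if abs(mza - mzb) <= fragment_tol),
        reverse=True,
    )
    used_a, used_b, dot, matched = set(), set(), 0.0, 0
    for product, a, b in candidates:
        if a in used_a or b in used_b:
            continue
        used_a.add(a)
        used_b.add(b)
        dot += product
        matched += 1
    return dot / (norm_a * norm_b), matched


def build_edges(spectra, min_cosine=0.6, min_matched=6, top_k=10):
    """spectra: dict mapping node id -> peak list. Returns a set of edges.

    An edge is kept only if it passes both thresholds and each node is
    among the other's top_k highest-scoring neighbors. Assumption: the
    top_k ranking is taken over edges that already pass the thresholds.
    """
    ids = sorted(spectra)
    scores = {}
    for index, a in enumerate(ids):
        for b in ids[index + 1:]:
            cosine, matched = cosine_score(spectra[a], spectra[b])
            if cosine >= min_cosine and matched >= min_matched:
                scores[(a, b)] = cosine
    neighbors = {node: [] for node in ids}
    for (a, b), cosine in scores.items():
        neighbors[a].append((cosine, b))
        neighbors[b].append((cosine, a))
    top = {node: {other for _, other in sorted(pairs, reverse=True)[:top_k]}
           for node, pairs in neighbors.items()}
    return {(a, b) for (a, b) in scores if b in top[a] and a in top[b]}
```

In this sketch, each consensus spectrum would first pass through `remove_precursor_region` and `window_filter` before being handed to `build_edges`; the default thresholds mirror the values reported in the text.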
During network generation, spectra were compared to annotated spectra in public libraries through GNPS (Wang et al., 2016). We applied identical filter criteria to library spectra as to our input data and employed the GNPS analog library search method with a maximum mass shift of 100 Da. We retained matches to library spectra characterized by a cosine score ≥0.6 and with ≥6 matched peaks. The "group mapping" feature of GNPS allows one to track the origin of spectra, and hence, the plant species, tissue, or treatment in which a compound was detected. Network visualization software such as Cytoscape (www.cytoscape.org) can be used to generate publication-quality figures of molecular networks that illustrate attributes of the data such as molecular mass (Fig. 2A) and incidence in plant species, tissues, or treatments.
Recent developments in the bioinformatics pipeline for the assembly of molecular networks have improved upon the methods we describe above in several key respects (Olivon et al., 2017). Namely, the MS-Cluster algorithm (Frank et al., 2008) for grouping spectra into consensus spectra was originally designed for proteomics rather than small molecule metabolomics and therefore was not designed to consider differences in LC retention time that typically distinguish structural isomers with identical molecular masses. Olivon et al.

CONCLUSIONS
We have developed an effective untargeted plant metabolomics workflow for community metabolomics, including a protocol for tissue collection in the field, a chemically general extraction protocol that retains compounds from a broad spectrum of plant secondary chemistry and is appropriate for diverse taxa, a UHPLC-MS/MS instrumental method suitable for a wide range of polarities and molecular size classes, and a protocol for sharing and networking MS/MS data with the GNPS molecular networking platform (Wang et al., 2016). Because of its simplicity and generality, this workflow can be scaled for the collection of large and taxonomically and chemically diverse data sets, such as ecological communities or evolutionary lineages, thus facilitating the study of chemical community ecology and macroevolution (Sedio, 2017). Future efforts should test the robustness of this workflow for field collections in remote locations where the freezing of tissue may be unfeasible and in situ drying of tissue may be the preferred means of sample collection. In addition, alternative extraction solvents, LC column stationary phases, and

This setup is convenient if tissue for metabolomic analysis is intended for intraspecific or intra-individual comparative metabolomics, if tissue for metabolomic analysis is to be collected alongside tissue intended for RNA extraction, or if collections are to be made over multiple days at a site more than a few hours from a laboratory equipped with a −80°C freezer.
A. Field supplies: In addition to the supplies listed for Setup 1, also bring:

1. Do not allow the metal TissueLyser adapter plates to come into contact with liquid nitrogen. Expose each plastic tube rack to liquid nitrogen prior to fitting the tube rack into the adapter plate.
2. Stainless steel beads are reusable. Wash the beads with warm, soapy water, rinse thoroughly with distilled water to remove soap residue, and allow to air dry. If beads are to be used in nucleic acid extractions, incubate beads in 0.4 M HCl for 1 min at room temperature, rinse thoroughly with distilled water, and allow to air dry.
3. All centrifugation steps should be performed at room temperature.

E. Safety

The blank is a 2-mL Safe-Lock tube containing no leaf tissue. Apply all steps to the blank as if it were a leaf sample. Compounds found in blanks will be removed from downstream analyses.
2. Weigh 100 mg of frozen leaf tissue. Record the weight.
3. If the sample was collected directly into a 2-mL Safe-Lock tube, return the sample to the tube. If sample material was not collected directly into a 2-mL Safe-Lock tube, label a tube and place the sample into it.
4. Place a stainless steel bead into each tube. Screw on the screw cap and place the tube in the TissueLyser tube rack.
5. Repeat Steps 2 through 4 for a set of 12 or 24 samples.
6. Cool the TissueLyser tube rack in liquid nitrogen. If using dried or lyophilized tissue, the tubes do not need to be frozen in liquid nitrogen.
7. Fit each tube rack between the TissueLyser adapter plates and place them into the TissueLyser clamps as described in the TissueLyser User Manual. Tighten the clamps firmly by hand. Work quickly so that the plant material does not thaw.
8. Grind the samples for 2 min at 20 Hz.
9. Remove and disassemble the plates and racks, noting the orientation of the tube racks during the first round of homogenization. Ensure that each tube's screw cap is tightly closed.
This script will (1) apply lock-mass calibration using the reference signal selected in the method parameters (e.g., reserpine), (2) export line spectra in .mzXML format to the folder specified (e.g., "C:\Users\username\userfolder\"), and (3)

During the processing period, DataAnalysis will be busy and will occupy much of the computer's processing power. We recommend running the conversion overnight, as we usually did, and avoiding other computationally intensive processes while data conversion is taking place.
14. When the process is complete, the exported .mzXML files will be found in the assigned folder.
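After an unattended overnight conversion, it can be useful to confirm that every raw data set produced a non-empty .mzXML file before uploading to GNPS. The following Python sketch shows one way to do this. It is not part of the DataAnalysis script: the function name is hypothetical, and it assumes that each exported file shares its base name with the raw data set it came from (e.g., "sample1.d" exported as "sample1.mzXML").

```python
from pathlib import Path


def find_missing_exports(raw_names, export_dir, suffix=".mzXML"):
    """Return the raw data set names that lack a non-empty exported file.

    raw_names: names of the raw data sets that were converted.
    export_dir: folder assigned for the exported .mzXML files.
    Assumption: the export keeps the raw data set's base name.
    """
    export_dir = Path(export_dir)
    missing = []
    for name in raw_names:
        target = export_dir / (Path(name).stem + suffix)
        # A missing or zero-byte file suggests the conversion did not finish.
        if not target.is_file() or target.stat().st_size == 0:
            missing.append(name)
    return missing
```

Any names returned by this check would need to be re-exported before proceeding to the networking steps.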

APPENDIX 5. Global Natural Products Social (GNPS) Molecular Networking bioinformatics workflows using MS-Cluster.
The following protocol uses mass spectra in the construction of a molecular network using the GNPS Molecular Networking online platform
• If the spectra can be meaningfully organized into six groups, for example, if the spectra were derived from six plant species, then the spectra representing each group can be uploaded into the six Spectrum Files folders. Proceed to Step 6.
• If the network will contain more than six groups, add all user-supplied input mass spectra files to Spectrum Files G1.
• Select the files you want to use for Group mapping, followed by selecting the Group mapping buttons, and then select the files to use for Attribute mapping, followed by selecting the Attribute mapping buttons.
• A .txt Group mapping file must be provided by the user. This file must be custom-edited using a text editor (e.g., Notepad++ for Windows or TextWrangler/BBEdit for Mac).
• To create a Group mapping file use the following format:
GROUP_GroupName1=file1.mzXML;file2.mzXML
GROUP_GroupName2=file3.mzXML;file4.mzXML
Where "GroupName1" can be any user-defined group name (e.g., PsychotriaAcuminata, or psycac, or BCI), and "file1.mzXML," etc. are user-generated tandem mass spectrometry (MS/MS) spectra files. Each line in the Group mapping file must begin with the prefix "GROUP_" in all capital letters.
• Note: The downstream GNPS network analyses provided in Appendix 6, below, do not depend on groups defined in the Group mapping file, but rather assume that filenames include a six-character species code with which to identify spectra.
6. Select Finish Selection to return to the Network Workflow page.
7. To adjust parameters that govern the sensitivity of MS-Cluster (under-the-hood software that generates consensus MS/MS spectra), molecular networking, and spectral library searches, navigate to Basic Options.
• Under Basic Options → Precursor ion mass tolerance, set the parameter to a value from 0.0075 to 2.0 Da; lower values impose a stricter mass error tolerance (fewer ppm). The particular setting of this parameter depends on the mass accuracy of the mass spectrometer as well as the specific instrument method used to collect the MS/MS data. We recommend using a precursor ion mass tolerance value below 1 Da for data collected on a quadrupole time-of-flight (q-TOF) instrument.

• Under Basic Options → Fragment Ion Mass Tolerance, set the parameter to a value from 0.0075 to 2.0 Da. This value specifies within what range fragment ion m/z values will be considered equivalent. We recommend using values below 0.5 Da for data collected on a q-TOF instrument.
8. To adjust parameters governing the molecular network similarity matrix alignment and the formation of links between nodes, navigate to Advanced Network Options (see Fig. A5-3).
• To adjust the threshold of similarity that must occur between a pair of consensus MS/MS spectra, set the value for Minimum cosine score to a value between 0.5 and 0.99. The default value is 0.7. Lower values will increase the size of the clusters due to the clustering of less similar MS/MS spectra, and higher values will generate smaller clusters and leave more nodes unlinked. We recommend a value ≥0.6 for Minimum cosine score.
• To adjust the maximum number of links to other nodes permitted for any single node, set the value for Network TopK. The default value is 10. The edge between two nodes is kept only if both nodes are within each other's TopK most similar nodes. We use the default value.
• To adjust the minimum number of MS/MS spectra permitted to form a consensus spectrum, set the value for Minimum Cluster Size to a value ≥1. Make sure that Run MsCluster is activated (set to yes). MS-Cluster merges nearly identical MS/MS spectra into consensus spectra that represent structurally unique molecules (Frank et al., 2008). We use values from 1 to 2 depending on the number of replicates collected per sample.
• To modify the number of common fragment ions compared between two spectra, set the value for Minimum Matched Fragment Ions. The default value is 6. We use values from 3 to 6 depending on the molecular weight of molecules.
• To adjust the maximum number of nodes allowed in a single connected network, set the value for Maximum Connected Component Size (Beta); the default value is 100. We use the default parameter for small networks of one to 10 samples. For large networks (more than 10 samples), we allow