Transcriptome analysis of maca (Lepidium meyenii) root at different developmental stages

Premise of the Study Maca (Lepidium meyenii; Brassicaceae) has been cultivated by Andeans for thousands of years as a food source and has been used for medicinal purposes. However, little is known about the mechanism underlying material accumulation during plant growth. Methods RNA‐Seq technology was used to compare the transcriptome of black maca root at three developmental stages. Gene Ontology term enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were applied for the identification of pathways in which differentially expressed genes were significantly enriched. Results Trinity was used to de novo assemble the reads, and 120,664 unigenes were assembled. Of these, 71.53% of the unigenes were annotated based on BLAST. A total of 18,321 differentially expressed genes were observed. Gene Ontology term enrichment analysis found that the most highly represented pathway among the differentially expressed genes was for genes involved in starch and sucrose metabolism. We also found that genes involved in secondary metabolite biosynthesis, such as glucosinolate biosynthesis, were significantly enriched. Discussion The genes that were differentially expressed between developmental time points likely reflect both developmental pathways and responses to changes in the environment. As such, the transcriptome data in this study serve as a reference for subsequent mining of genes that are involved in the synthesis of important bioactive components in maca.

processes that are active or suppressed under certain conditions, such as metabolic or signaling pathways (Yang et al., 2015). In this study, we used genome-wide transcript profiling to predict the accumulation of compounds and metabolites during the development of maca root. Because the biosynthesis and accumulation of certain secondary metabolites are often stage-and tissue-specific (Salam et al., 2014;Yu et al., 2015), we analyzed several development stages of maca root, which we believe provides an important foundation for the identification of candidate genes acting in the mechanism underlying material accumulation during root growth.

MATERIALS AND METHODS
Black maca roots grown in Dahai, Huize County, Yunnan Province, China, were used in this study. From August 2012 to February 2013, we harvested 100 maca root specimens (from 100 individuals) every month to investigate the growth dynamics of root diameter and root fresh weight. The results showed that the growth curve of maca root was S-shaped (Appendix S1). Based on these findings, we divided the development of maca root into three stages: an early stage with slow growth (stage I), a middle stage with fast growth (stage II), and a late stage when growth slows and then stops (stage III). We then harvested maca roots in September 2015 (stage I), November 2015 (stage II), and January 2016 (stage III). For the RNA-Seq samples, we harvested 15 specimens (from 15 individuals) from each time point, which were randomly divided into three replicate groups, and the epidermis was quickly removed. The remaining tissue of each specimen was then cut into small pieces and stored in liquid nitrogen until later usage. Each group was considered as one biological repetition. Total RNA was isolated using TRIzol Reagent (Invitrogen, Carlsbad, California, USA) according to the manufacturer's instructions. DNase Ι (TaKaRa Bio Inc., Otsu, Shiga, Japan) was added to remove any genomic DNA contamination in the total RNA. Finally, the integrity and concentration of the RNA were measured using the Agilent 2100 Bioanalyzer system (Agilent Technologies, Santa Clara, California, USA).
We then isolated mRNA via oligo(dT) magnetic beads (Thermo Fisher Scientific, Waltham, Massachusetts, USA), followed by fragmentation using the Covaris M220 Focused-ultrasonicator (Covaris, Woburn, Massachusetts, USA), allowing for ~200 bp average length. The mRNA fragments were used as a template to generate first-strand cDNAs via random hexamer-primers, followed by second-strand cDNA synthesis. cDNA preparations were purified via the Agencourt AMPure XP kit (Beckman Coulter, Brea, California, USA). Adapters were ligated to the cDNA fragments after generating blunt-ends and 3′-addition of polyA tails. The fragments were then amplified by PCR. Strand-specific library preparation and paired-end sequencing (read length 125 nucleotides) were performed on the Illumina HiSeq 2500 platform (Illumina, San Diego, California, USA) by Berry Genomics (Beijing, China). Raw sequence reads were filtered using Trimmomatic version 0.30 (Bolger et al., 2014) to remove nonsense sequences including adapters, low-quality reads, reads containing poly-N, and other very short sequences. Clean reads were pooled to assemble a reference transcriptome with the Trinity software package (version 2.1.1) using the default parameters (Grabherr et al., 2011).
Unigenes were annotated via BLAST (Götz et al., 2008) by mining the following databases using an E-value cutoff of 10 −5 : the National Center for Biotechnology Information (NCBI) non-redundant protein sequence database (nr), the NCBI nonredundant nucleotide (nt) database, the Gene Ontology (GO) database, the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, the euKaryotic Orthologous Groups (KOG) database, the Pfam database (a large collection of protein families), and SwissProt. Gene expression levels are based on FPKM (fragments per kilobase of transcript per million mapped reads). The quantification of fold changes and analysis of differential gene expression was carried out with Cufflink (Trapnell et al., 2012). The software edgeR was used to analyze the differential expression genes (Robinson et al., 2010). Differentially expressed genes between two samples were defined as having a log 2 ratio ≥1 and a false discovery rate (FDR) ≤0.001. Term enrichment analyses were carried out for GO and KEGG terms.

De novo transcriptome sequencing of maca root
To characterize the transcriptome of maca root and to generate expression profiles, we collected samples representing the three developmental stages of the plant as defined above, and sequenced corresponding cDNA samples using the Illumina HiSeq 2500. More than 162 million raw reads were generated from all three developmental stages, amounting to ~23.8 Gbp of data. The raw reads were submitted to the NCBI Sequence Read Archive (BioProject ID PRJNA450202, SRA accession SRP139986). The Transcriptome Shotgun Assembly project has been deposited at GenBank (accession GGNI00000000), and the version described in this paper is the first version (GGNI01000000). The de novo assembled transcriptome had 182,270 transcripts. The average GC content of the maca root transcriptome was 39.7%. The longest transcript of each gene was used as a unigene, and de novo assembly of the filtered reads was assembled into 120,664 unigenes. Transcript and unigene length ranged from 200 to >5000 bp.
We screened genes for significant differential expression between the three defined root stages. There were 6975 differentially expressed genes identified between stage III and stage I, with 4804 genes that were up-regulated and 2171 genes that were http://www.wileyonlinelibrary.com/journal/AppsPlantSci © 2018 Shang et al.
down-regulated. Between stage III and stage II, we identified 12,508 differentially expressed genes, of which 6669 genes were up-regulated and 5839 genes were down-regulated. Finally, 12,682 differentially expressed genes were identified between stage III and stage I, with 7717 genes that were up-regulated and 4965 genes that were down-regulated. A comparison between the differentially expressed genes among the three stages is shown in Figure 4.

Analysis of differentially expressed genes
We performed term enrichment analyses for both GO and KEGG terms to characterize differentially expressed genes that may have potential roles in metabolic pathways. GO enrichment classification was carried out using the GOseq package (Guo et al., 2014). We considered GO terms to be significantly enriched when the corrected P value was <0.05. In summary, 9967 differentially expressed genes were classified into "biological process, " 4296 genes were classified into "cellular component, " and 4058 genes were classified into "molecular function. " The majority of genes involved in root growth and development were strongly enriched, such as cell proliferation (GO:0008283), cell wall organization (GO:0071555), and xyloglucan:xyloglucosyl transferase activity (GO:0016762). For term enrichment analysis of differentially expressed genes based on KEGG pathways, we used the KOBAS software package (Guo et al., 2014). We mapped differentially expressed genes to 288 KEGG pathways, and the most highly represented pathway was starch and  sucrose metabolism (Ko00500) (Fig. 5). Starch is the major carbohydrate of tuber and root crops (Hoover, 2001); sucrose and other sugars serve as substrates for starch production, and they also play a regulatory role in root development and various metabolic processes (Ravi et al., 2009). The larger part of maca root is formed by parenchyma, which is rich in starch and sugar (León, 1964). Dried maca root contains 59% carbohydrates and 8.5% fiber (Dini et al., 1994). The results from the term enrichment analysis suggest that during development maca root undergoes rapid cell division and proliferation followed by mass production of carbohydrates.
It is interesting to note that we also found that GO terms corresponding to stress responses, such as cold (GO:0009409) and wounding (GO:0009611), were significantly enriched. This may be caused by a gradual decrease in temperature starting in October, which explains the enrichment of coldresponse GO terms as the maca root adapts to increasingly colder temperatures. It is also worth noting that some genes involved in secondary metabolite biosynthesis were significantly enriched, as they were associated with the terms "positive regulation of flavonoid biosynthesis" (GO:0009963) and "glucosinolate biosynthetic process" (GO:0019761). Additionally, glucosinolate biosynthesis (PATHWAY: ko00966) was one of the top 20 KEGG pathway enrichment terms (Fig. 5). This result is consistent with the analysis of the maca genome (Zhang et al., 2016), which showed that gene families involved in secondary metabolite biosynthesis (i.e., glucosinolate biosynthesis) underwent expansion in this species. Glucosinolates and their derivatives have attracted much attention because of their biological activities. For example, benzyl isothiocyanate is a potent inhibitor of breast cancer, stomach cancer, and liver cancer (Wattenberg, 1981;Rosa et al., 2010).
Maca root has a comparatively high concentration of glucosinolates, and the glucosinolate content of fresh maca root is approximately 100 times higher than that in other cruciferous crops such as cauliflower and cabbage (Li et al., 2001). Studies in animals suggest that maca root has a range of pharmacological functions, such as enhancing fatigue resistance (Ikeuchi et al., 2009), anticancer activity (Večeřa et al., 2007), and reducing prostate hyperplasia (Gonzales et al., 2008). These reported pharmacological activities were attributed to the high glucosinolate concentrations present in maca root. Therefore, it is necessary to further study the metabolism of glucosinolate biosynthesis and to use genomics to identify some of the key genes involved in its regulation.

CONCLUSIONS
A total of 182,270 transcripts and 120,664 unigenes were identified from three developmental stages of black maca root using RNA-Seq and de novo assembly via the Trinity software package. A total of 18,321 genes were identified as differentially expressed in these three developmental stages. These genes fall into distinct molecular pathways and are closely related to regulating root development and secondary metabolite biosynthesis. To the best of our knowledge, this is the first study that investigated tissue-specific transcript profiles of black maca root during different developmental stages. Our results will provide a foundation for future investigations aimed at elucidating the molecular mechanisms of compound accumulation during maca root development.

DATA ACCESSIBILITY
The raw reads were submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (BioProject ID PRJNA450202, SRA accession SRP139986). The Transcriptome Shotgun Assembly project has been deposited at GenBank (accession GGNI00000000), and the version described in this paper is the first version (GGNI01000000).

SUPPORTING INFORMATION
Additional Supporting Information may be found online in the supporting information tab for this article.
APPENDIX S1. The growth dynamics of diameter and fresh weight of maca root.

FIGURE 5.
Scatter diagram of the top 20 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment terms based on differentially expressed genes. The rich factors indicate the ratio of the number of differentially expressed genes mapped to a certain pathway to the total number of genes mapped to this pathway. Greater rich factor means greater intensiveness. The Q value was calculated using the hypergeometric test through Bonferroni correction. Q value is corrected P value ranging from 0-1, and a smaller Q value signifies greater intensiveness. Gene number means number of differentially expressed genes mapped to a certain pathway.