key: cord-0939997-8kkhhdyx authors: Patel, Ankita A.; Shukla, Yogesh M.; Kumar, Sushil; Sakure, Amar A.; Parekh, Mithil J.; Zala, Harshvardhan N. title: Transcriptome analysis for molecular landscaping of genes controlling diterpene andrographolide biosynthesis in Andrographis paniculata (Burm. f.) Nees date: 2020-11-07 journal: 3 Biotech DOI: 10.1007/s13205-020-02511-y sha: c8f39a50a1ad8a69895556bf5460f9d0a56ecccc doc_id: 939997 cord_uid: 8kkhhdyx Kalmegh [Andrographis paniculata (Burm. f.) Nees.] is one of the essential medicinal plants due to an important terpenoid, i.e. andrographolide possesses immense therapeutic and pharmacological uses. The experiment was performed to elucidate the expression of candidate genes associated with andrographolide biosynthesis. Based on results obtained in chromatography for andrographolide content analysis of six genotypes, two contrast genotypes, i.e. IC-520361 (maximum andrographolide content—2.33%) and Anand Local (lowest andrographolide content—1.01%) were selected for the transcriptome analysis. A total of 1.04 Gb of raw data were produced using MiSeq Illumina platform, in which IC 520361 generated 645 million base pairs sequence along with 4,524,251 raw reads and Anand Local produced 419 million base pairs sequence along with 3,021,316 raw reads. The combined assembly of high quality reads generated for both the samples had 33,247,454 bp of total assembled bases and 38,292 of transcripts. The GC percent of assembled transcripts was 44.79%, an average read length was 800 bp and N(50) value was 1186 bp. Species-specific distribution using BLAST X (Nr), showed the highest Blast hits with Sesamum indicum. Out of 23,346 transcripts, 87% of transcripts annotated in UniProt KB (Universal Protein Resource KnowledgeBase) database and only 0.21% of transcripts were annotated in TAIR (The Arabidopsis Information Resources). Biological processes gene ontology classified based on Blast2GO showed, out of 6853 transcripts, 1370 of transcripts were represented by terpenoid biosynthetic pathway, which involved in secondary metabolite andrographolide biosynthesis. The heat map showed 1016 transcripts were differentially expressed between two kalmegh genotypes, in which nine important differentially expressed transcripts related to MEP (2C methyl-d-erythritol 4-phosphate) and MVA (Mevalonic acid) andrographolide biosynthesis pathways such as, geranyl diphosphate synthase small subunit, Isopentenyl-diphosphate delta-isomerase i-like, 4, 13-hydroxy-3-methylglutaryl-coenzyme a reductase etc. were upregulated in IC 520361 as compared to Anand Local, which were validated through RT-qPCR. The highest expression of gene 13-hydroxy-3-methylglutaryl-coenzyme a reductase (HMGR) was reported, which is responsible for accumulation of andrographolide in leaf. This comparative transcriptome analysis confirmed the expression level of genes were higher in accession IC 520361 as compare to Anand Local related to andrographolide biosynthesis pathways i.e. MEP and MVA. These up-regulated genes could be over-expressed to enhance the andrographolide content using genetic engineering of these metabolic pathways. It will also give an idea to the breeder for development of molecular markers for direct screening of the genotypes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1007/s13205-020-02511-y) contains supplementary material, which is available to authorized users. Andrographis paniculata (Burm. f.) Nees. of family Acanthaceae is the key medicinal plant of the Ayurveda system of medicine. It is popularly known as "kalmegh" in India (Gangopadhyay et al. 2014) . Out of 40 medicinally important species, 26 species of the Andrographis genus are found in India (Abhyankar and Reddy 2007) . Both India and Sri Lanka are considered as the centre of origin of kalmegh. Kalmegh is mainly established in tropical Asian regions due to humid and hot climate and cultivated in the rainy season (Niranjan et al. 2010) . The kalmegh exhibited an extensive range of medicinal, pharmacological and therapeutically applications. It is recommended as a blood purifier, antidiabetic, immune stimulant, anti-snake venom, anti-pyretic, anti-microbial, anti-fungal, anti-malarial, anti-typhoid, anti-HIV, anti-viral, anti-inflammatory, anti-fertility, anticarcinogenic, anti-bacterial, anti-parasitic, anti-thrombotic, analgesic and vermicidal (Kurian and Sankar 2007) . It was successfully used for arresting the spread of highly infectious disease 'global flu' epidemic and denoted as a wonder drug in 1919 (Mishra et al. 2007) . The medicinal properties of kalmegh are due to several phytochemical such as andrographolide, deoxyandrographolide, dideoxyandrographolide, dehydro-andrographolide, neo-andrographolide, andrographane etc. Among various active compounds, andrographolide is a diterpene lactone having vast therapeutic applications (Niranjan et al. 2010; Garg et al. 2015) . Andrographolide act as a potential inhibitor of the main protease of SARS-CoV-2 (Mpro) and docked successfully in the binding site of SARS-CoV-2 Mpro through in silico study therefore, further biochemical and cell-based assays make andrographolide promising compound for use against COVID-19 (Enmozhi et al. 2020) . The biosynthesis of andrographolide in kalmegh mainly routed through Mevalonic acid (MVA) and 2C methyl-d-erythritol 4-phosphate/1-deoxy-d-xylulose-5-phosphate (MEP/DXP) pathways (Srivastava and Akhila 2010; Jha et al. 2011; Garg et al. 2015) . The andrographolide content reaches a maximum at 80-120 days after plantation i.e. before the flowering stage. Its concentration reduces with the progress of flowering and maturity ). This indicates that before flowering stage is a perfect stage to harvest maximum andrographolide content for marketing. This stage is also ideal for chemical and molecular study of kalmegh to elucidate biosynthesis processes and pathway of andrographolide. A few studies were attempted to generate genomic resources for kalmegh, i.e. genetic variability studied using molecular markers such as, RAPD (random amplified polymorphic DNA), SSR (simple sequence repeats), AFLP (amplified fragment length polymorphism) and ISSR (inter simple sequence repeat) (Sharma et al. 2009; Wijarat et al. 2012; Minz et al. 2013; Gangopadhyay et al. 2014; Hiremath et al. 2020) . The comparative transcriptome of leaf and root studied by Garg et al. (2015) and they analyzed the biosynthesis pathway for three important Entlabdane related diterpene (Ent-LRD) specialized secondary metabolite, i.e. andrographolide, neoandrographolide and 14-deoxy-11, 12-didehydroandrographolide using Illumina HiSeq 2000 platform. Rama Reddy et al. (2015) evaluated transcriptome using Illumina MiSeq platform to predict biosynthetic pathway of sennosides from senna (Cassia angustifolia Vahl.) using young and mature leaf tissues. Ding et al. (2015) reported the complete chloroplast genome of Andrographis paniculata with genome size of 150 Kb. Cherukupalli et al. (2016) identified different transcripts of CYP450 from seven divergent clans by performing de novo leaf transcriptome using Illumina HiSeq 2000 platform in kalmegh. In silico structural/functional analysis of HMGR enzyme (ApHMGR) in Andrographis paniculata which catalyzes the first committed and rate-limiting step in MVA pathway, studied by Venkata Bindu et al. (2017) . Tong et al. (2018) revealed critical roles of hormones (gibberellin, abscisic acid and ethylene) metabolism and signal transduction during seed germination of Andrographis paniculata by RNA Seq analysis. Li et al. (2019) performed paired-end RNA-seq for de novo transcriptome construction of A. paniculata leaf with 2-years continuous cropping stress. They noted majority of genes involved in biosynthesis pathways were downregulated, which proved that continuous cropping led to a declined synthesis of active ingredient diterpenes. The current investigation was only focused on the most valuable secondary metabolite, andrographolide. The detailed analysis was carried out at a specific developmental stage i.e. 90 DAP (days after plantation) using specific tissue, leaf. The leaf contains more andrographolide content than other tissues because it is the primary site of andrographolide synthesis and at 90 DAP, the andrographolide content reached at maximum level (Pandey and Mandal 2010; Sharma et al. 2011) . This present study provided information about the expression level of genes related to andrographolide biosynthesis pathway was higher in accession IC 520361 as compare to Anand Local. These results were validated by differential gene expression analysis and qPCR. This information would be useful to increase the andrographolide content by genetic engineering tools. The seeds of four genotypes (IC 399125, IC 520361, IC 520365 and IC 520394) were procured from ICAR-National Bureau of Plant Genetic Resources (NBPGR), New Delhi, India while seeds of Anand Kalmegh-1 and Anand Local were procured from Medicinal and Aromatic Plant Research Station (MAPRS), Anand Agricultural University (AAU), Anand. The seeds were raised during mid-June in pots for 40 days followed by one-row transplantation of seedlings in field at plant spacing of 30 cm in the first week of July. For LC/MS analysis, the leaves were collected at 60, 90, and 120 DAP followed drying at room temperature for 10 days. The powder was prepared from dried leaves using the standard mixture and coarse particles were removed using mesh. The leaves powder of six genotypes were transferred into a 50 ml plastic tube and kept in a regular refrigerator. For RNA extraction of two contrast genotypes, leaf tissues of Anand Local and IC-520361 were collected from field at 90 DAP, washed with RO water and stored immediately into RNA Later solution for long term storage at -80 °C. The reference standard procured from Sigma Company was first diluted into methanol then further serially diluted in mobile phase containing water (35%) and acetonitrile (ACN) (65%). For Sample preparation, the dried leaves powder of six different genotypes collected at 60, 90 and 120 DAP (days after plantation) were weighed 0.02 gm (20 mg) and extracted into methanol for 60 min in sonicator in 15 ml plastic tube. The tubes were transferred to water bath at 60 °C for 1 h. Samples were filtered using sterile syringe of 0.22 µm nylon membrane. From diluted filtrates, 5 µl was taken and diluted into 1 ml of 65% ACN: 35% water (Shende et al. 2016) . The samples were stored at 4 °C in 2 ml glass vials till use. The percentage of andrographolide was calculated according to the following formula (Song et al. 2013; Shende et al. 2016) . Andrographolide (%) = (C × V × D)/10,000 × W; Where: C = Concentration (mg/L) of andrographolide in test solution; D = Dilution factor; V = Final make up volume (in ml) of the test solution and W = Weight (gm) of the sample used for the preparation of the test solution . The protocol of Chang et al. (1993) was deployed to isolate the total RNA from leaf tissues. The quantity of RNA was determined using NanoDrop spectrophotometer (Nan-oDrop 1000, USA) and Qubit Fluorometer® 3.0 (Thermo Fisher Scientific, USA). The RNA quality was evaluated with RNA-6000 Nanochip of Agilent 2100 Bioanalyzer. The integrity of total RNA was verified to calculate RIN (RNA integrity number) values. RNA with values between 1.9 and 2.1 at 260/280 ratio and RIN > 8.0 were selected for cDNA library preparation. TruSeq RNA sample preparation Kit v2 (Illumina, San Diego, California, USA) was used To generate paired-end cDNA sequencing libraries. The library was prepared using total RNA (1 μg) by following the manufacturer's protocol. To enhance the sequencing depth, Oligo-dT covered magnetic beads were used to purify poly-A attached mRNA by eliminating other RNA molecules. The fragmentation of enriched and purified mRNA was subjected using fragmentation buffer at 94 °C temperature in a pre-heated thermal cycler. The fragmented mRNA was reverse transcribed into the first-strand cDNA using random primers. In the presence of DNA polymerase I and RNase H, synthesis of the second-strand cDNA was performed. After end repairing and 3′ adenylation, adapters were ligated to the cDNA ends to make ready for hybridization to a single-read flow cell. Before library preparation, cDNA fragments were purified and enriched with PCR. The quality of the library was examined by DNA-1000 kit using Agilent 2100 Bioanalyzer (Agilent Technologies, California, USA). Each library was sequenced in the MiSeq platform (Illumina, San Diego, California, USA) by MiSeq Reagent Kit v2 (150 × 2 bp). The sequence data generated in present study have been deposited at NCBI in the Short/Sequence Read Archive database under the accession numbers: SRR12791806 and SRR12791807 with Bioproject ID: PRJNA667584. After assessing the read qualities in FastQC, Trimmomatic was used to eliminate the adapter/primer sequence and bad/low quality reads using default parameters (Bolger et al. 2014 ). The read filtering was performed with phred score > 30. Eventually, quality and clean reads were assembled into transcripts through Trinity (Haas et al. 2013 ) using default parameters. Short transcripts with > 90% identity was discarded, and transcripts were clustered in CD-HIT-EST v4.6.1 tool (Li and Godzik 2006) to remove the redundancy in assembly. For functional assignment of each unique transcript, contigs were BlastX with the non-redundant (nr) database at an e-value cut-off of < 1e −6 . Blast2GO was used for gene ontology (GO) mapping annotation of contigs (Conesa et al. 2005) to catalog the functions of the predicted coding DNA sequences (CDS). Sequences similarity search was also carried out against the KEGG (Kyoto Encyclopedia Gene and Genome) database for functional annotation of CDS. edgeR v2.14 was used to determine the differential gene expression, i.e. up/down regulation of transcripts. The logtransformed and normalized value of genes based on both Pearson uncentered correlation distance and complete linkage method, a heat map was generated. The differentially expressed CDs were represented in the form of volcano graph and heatmap by using normalized and transformed expression values, respectively. The primers were designed from differentially expressed transcripts/genes (in fasta format) in both genotypes related to andrographolide biosynthesis pathway using Batchprimer3 online software. The sequences used for RT-qPCR primer designing are listed in Supplementary Table S1 . The best primers were selected after analyzing using IDT oligo Analyzer 3.1 and synthesized from MWG Operon, Bangalore, India. The first strand of cDNA was synthesized using 2 μg of total RNA in 20 μl reaction volume using Prime Script™ first-strand cDNA synthesis kit (TAKARA company) for both the genotypes from total RNA. PCR reactions were performed in 96-well plates using CFX96™ Real-Time BioRad PCR with SYBR Green detection chemistry. Actin was used as an endogenous gene. The CFX Manager™ software 2.1 (BioRad) was used for expression analysis of genes following Livak and Schmittgen (2001) 2 (-ΔΔ CT) method. There are 28 species of genus Andrographis, which are distributed in the tropical Asian region, among them only a few species are medicinally important and A. paniculata is the most important and widespread species (Raina et al. 2013) . By DIVA-GIS analysis, Raina et al. (2013) had identified five promising germplasm (i.e. IC 520361, IC 520395, IC 399125, IC 369404 and IC 520394) with high andrographolide content. Out of five, four promising genotypes (viz, IC 520361, IC 399125, IC 520365 and IC 520394) and two local genotypes (i.e. Anand Local and Anand kalmegh 1) were used for LC/MS analysis. The LC/MS study was done at three different growth stages, i.e. 60, 90, and 120 DAP to quantify the level of andrographolide from kalmegh leaves according to the methodology developed by Shende et al. (2016) . The elution flow rate of andrographolide was 0.5 µl/min; the calibration curves for andrographolide was found linear with regression coefficient r = 0.9962. The overall limit of detection (LOD) and limit of quantification (LOQ) were 8 ng and 27 ng/ml, respectively. Based on LC/ MS results, it was observed that the level of andrographolide was higher at 90 DAP as compared to 60 and 120 DAP, the results of quantification of andrographolide content was in agreement with earlier literature Sharma and Sharma 2013; Raina et al. 2013 ). Among six genotypes, two genotypes (IC-520361 and Anand Local) were selected for RNA-seq, as IC 520361 having maximum andrographolide content and Anand Local having lowest andrographolide content in leaf tissue at all three stages of development viz, 60 DAP, 90 DAP, and 120 DAP (Table 1) . Transcriptome analysis is a strong and competent tool to explore the gene expression pattern (Kulkarni et al. 2016) . For transcriptome study, the good quality total RNA of both the samples along with RIN (RNA Integrity Number) value more than 8, converted into cDNA libraries for both genotypes. Normally, the range of RIN value is one to ten, in which one shows most degraded RNA profile whereas, ten shows most intact RNA profile (Schroeder et al. 2006) . The RIN values of both kalmegh genotypes were ranged in between eight to nine, which showed good quality for further downstream processes. Cherukupalli et al. (2016) used RNA with more than seven RIN value for kalmegh leaf transcriptome study. Using NGS (next generation sequencing) MiSeq Illumina Platform, 645 million base pairs sequence along with 4,524,251 raw reads for IC 520361 and 419 million base pairs sequence along with 3,021,316 raw reads in case of Anand Local were generated (Table 2) . To generate high-quality reads, pre-processing of raw reads was carried out using FastQC and Trimmomatic to remove the errors and artefacts including adaptor sequence and low quality sequences. Cherukupalli et al. (2016) used high quality reads having ≥ 20 phred score to build up transcriptome. According to standard manual of FastQC, the Q ≥ 30 shows 99.9% base accuracy. In this FastQC report, the overall phred quality score Q was > 30 for both the genotypes, which showed that per base sequence quality was good for both the genotypes. To date, Trinity assembler was most efficiently used for assembling the contigs especially for short-read data of Illumina platform (Soetaert et al. 2013; Jayakodi et al. 2014; Gao et al. 2015) . The combined assembly of high quality reads generated by Trinity for both the samples had 33,247,454 bp of total assembled bases and 41,553 of transcripts. The GC percent of assembled transcripts was 44.79%, an average read length was 800 bp and N 50 value was 1186 bp. The final assembly produced through CD-HIT was of 38,292 contigs (Table 3 ). The N 50 value 1186 bp was higher than N 50 value of previously reported transcriptome assembly such as, Garg et al. (2015) , Kalariya et al.(2018) , Rama Reddy et al. (2015) etc. which showing the better transcriptome assembly. A total of 38,292 transcripts (contigs) assembled by Trinity were subjected to functional annotation (Suppl Fig. 1 and Suppl Table 2 ). Based on the data of species-specific distribution, it can be concluded that the transcripts showed the highest Blast hits with Sesamum indicum (19,113) followed by Erythranthe guttata (3545) and Coffea canephora (449) (Fig. 1) . S. indicum and A. paniculata belong to same order Lamiales and share common ancestry which could have allotted kalmegh top Blast hit with Sesamum indicum (Cherukupalli et al. 2016) . The data also presented Blast hits with A. paniculata (54), which is comparatively very less as the negligible amount of genomic information of the Acanthaceae family species in the database (Garg et al. 2015 Fig. 2 ). The functionally annotated 23,346 transcripts were classified using Blast2GO into three main domains: biological processes gene ontology, cellular components gene ontology and molecular functions gene ontology (Suppl Fig. 3 and Suppl Table 3 ). In gene ontology, total transcripts of biological processes were 6853, cellular components were 6763, and molecular functions were 9730. In biological processes gene ontology (Fig. 2) , the maximum numbers of transcripts were represented by metabolic processes (1938), followed by the terpenoid biosynthetic process (1370). Other transcripts such as, carotenoid biosynthetic process (994), flavonoid biosynthetic process (273), lipid metabolic process (669), carbohydrate metabolic process (1172), tricarboxylic acid cycle (652), cellular amino acid metabolic process (734) etc. were observed. The andrographolide is chemically diterpene and this terpenoid synthesized from terpenoid backbone biosynthesis pathway (Martin and Demain 1980) . The previous reports of Rama Reddy et al. (2015) and Soetaert et al. (2013) also predicted the major role of terpenoid backbone biosynthesis pathway in specialized secondary metabolite biosynthesis. Therefore, biological processes gene ontology annotation in this study supported the biosynthesis of secondary metabolite andrographolide through terpenoid biosynthetic process in kalmegh. In cellular component gene ontology (Fig. 3) , the maximum number of transcripts associated with plastid (1451) followed by, peroxisome (1176) and then cytoplasm (1149). The other important transcripts were also annotated such as, vacuole (1113), membrane (1102), mitochondrion (903), nucleus (849) etc. The biosynthesis of secondary metabolite andrographolide occurs in plastid and cytoplasm by following plastidial MEP pathway and cytosolic MVA pathway, respectively (Jha et al. 2011; Tholl and Lee 2011; Soetaert et al. 2013; Garg et al. 2015) . As per the cellular component gene ontology in present transcriptome analysis, the both plastidial MEP pathway and cytosolic MVA pathway are functional for the biosynthesis of andrographolide in Andrographis paniculata and the storage of andrographolide, the secondary metabolite occurs in vacuole. In molecular (Fig. 4) , the highest transcripts were attributed to structural molecule activity (3160), followed by binding (1471). The functional annotation also reported transcripts related to biosynthesis of secondary metabolites such as 4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase activity (1370), 1-deoxy-d-xylulose-5-phosphate synthase activity (1355), Ent-kaurene synthase activity (65), hydroxymethylglutaryl-CoA reductase-NADPH activity (750), isopentenyl-diphosphate delta-isomerase activity (458), terpene synthase activity (234) (Garg et al. 2015 ; Rama Reddy et al2015). Therefore, the molecular function gene ontology also aided to analyse the transcripts associated with biosynthesis of andrographolide in kalmegh. KEGG pathway analysis revealed that maximum annotated transcripts were ascribed to transferase followed by hydrolases and oxidoreductases class of enzymes (Fig. 5) . In response to secondary metabolite biosynthesis pathways MEP and MVA, transferases such as 2-C-methyl-d-erythritol-4-phosphate cytidylyltransferase, N-benzoyltransferases, glycosyltransferases, xylosyltransferases, Acetyl-CoA C-acetyl transferase etc., reductases such as hydroxymethylglutaryl-CoA reductase, 4-hydroxy-3-methylbut-2-enyl diphosphate reductase etc. and isomerases such as isopentenyl diphosphate isomerase, isopentenyl-diphosphate deltaisomerase etc. (Hao et al. 2011; Soetaert et al. 2013; Garg et al. 2015) were reported to be involved in medicinal plants such as Taxus, Senna and kalmegh. Based on above-reported enzymes transferases, reductases and isomerases, it can be concluded that these enzymes play an important role in andrographolide biosynthesis in Andrographis paniculata. Based on KEGG pathway analysis, a total of 111 metabolic pathways detected, out of which terpenoid backbone biosynthesis (140) and biosynthesis of antibiotics (138) are prominently present (Figs. 6 and 7; and Suppl Table 4 ). As the andrographolide is diterpene lactone and synthesized using terpenoid backbone biosynthesis (Martin and Demain 1980; Rama Reddy et al. 2015 and Soetaert et al. 2013) . The indirect role of starch and sucrose metabolism in regulating secondary metabolism was studied by Lloyd and Zakhleniuk (2004) by overexpressing the enzymes involved in primary metabolism. The other important pathways such as carotenoid biosynthesis and flavonoid biosynthesis pathways also involved in the biosynthesis of other secondary metabolites. Andrographolide is an important secondary metabolite of kalmegh. The present investigation was dedicated on identify genes/transcripts intricate in the andrographolide synthesis pathway. Andrographolide in A. paniculata is synthesized by MVA (Mevalonic acid) and MEP/DXP (2 C methyl-d-erythritol 4-phosphate/1-deoxy-d-xylulose-5-phosphate) pathways (Srivastava and Akhila 2010) . Jha et al. (2011) studied the differential expression of 3-hydroxy-3-methylglutaryl-coenzyme-A reductase in A. paniculata for andrographolide accumulation, which involves in the biosynthesis of andrographolide. In recent leaf transcriptome analysis, the presence of transcripts 3-hydroxy-3-methylglutaryl-coenzyme a reductase (HMGR) (TR25400|c0_g1_i1) and 13-hydroxy-3-methylglutaryl-coenzyme a reductase (TR25482|c0_g1_i1) indicated the accumulation of andrographolide in A. paniculata. In plants, two 5C isoprenoid building blocks of diverse terpene metabolites like dimethylallyl diphosphate (DMAPP) and isopentenyl diphosphate (IPP) synthesized by MEP and MVA pathways (Tholl and Lee 2011) . Monoterpenes, diterpenes and tetraterpenes synthesized from IPP and DMAPP, which derived from the MEP pathway and sesquiterpenes and triterpenes synthesized from IPP and DMAPP, which derived from the MVA pathway (Garg et al. 2015) . Most of the critical enzymes needed for biosynthesis of andrographolide using MVA and MEP pathways had identified in present investigation, during transcript function annotation (Supplementary Table S5 ) (Srivastava and Akhila 2010; Jha et al. 2011; Hao et al. 2011; Garg et al. 2015; Rama Reddy et al. 2015) . Some of the transcripts showed multiple isomers of particular enzymes during functional annotation, such as, two transcripts for 1-deoxyxylulose 5-phosphate synthase, gibberellin 2-beta-dioxygenase, 1-deoxy-d-xylulose 5-phosphate reductoisomerase, diphosphomevalonate decarboxylase, gibberellin oxidase and three transcripts for isopentenyl-diphosphate delta-isomerase, hydroxy-3-methylglutaryl-coenzyme a reductase, phosphomevalonate kinase, were observed. In case of cytochrome p450 and cytochrome p450 monooxygenase multiple isomers were reported. Based on previous reports (Srivastava and Akhila 2010; Tholl and Lee 2011; Jha et al. 2011; Garg et al. 2015; Sun et al. 2019 ) and KEGG pathway of terpenoid backbone biosynthesis (Fig. 7) , the tentative biosynthesis pathway of andrographolide in kalmegh was proposed (Fig. 8) . The MEP pathway in plastid provides 5C isoprenoid building blocks for the biosynthesis of chlorophyll, carotenoids, gibberellins, monoterpenes and diterpenes (Lichtenthaler, 1999) . Transcripts/enzymes involved in andrographolide biosynthesis were efficiently expressed in leaf tissue as compared to other tissues, i.e. Geranyl geranyl diphosphate synthase and Cytochrome P450s were reported in present investigation (Garg et al. 2015; Sun et al. 2019) . In functional annotation, 96 different isomers (cytochrome p families 70, 71, 73, 76, 78, 81, 82, 83, 84, 85, 86, 87, 90, 93, 94, 703, 704, 707, 711, 714, 716, 736, 749 , Monooxygenase etc.) of cytochrome p450 related to specialized terpene metabolism were identified (suppl Table S7 ). Cytochromes are membrane-bound hemoproteins play important role in an array of pathways in primary and secondary metabolism in plants. The cytochrome p450 enzyme converts Ent-diterpenol into andrographolide and deoxy-didehydroandrographolide secondary metabolites, whereas cytochrome p450 GT (glycosyl transferase) converts Ent-diterpenol into neoandrographolide (Garg et al. 2015; Sun et al. 2018) . In functional annotation of this leaf transcriptome, the cytochrome p450 GT was not reported may be because of its higher expression was reported in root tissue, not in leaf (Garg et al. 2015) . The expression of deoxy-didehydroandrographolide was highest in stem, whereas, very low in leaf tissue (Garg et al. 2015) and the highest andrographolide was detected in leaf as compared to other panchang (Pandey and Mandal 2010; Sharma et al. 2011) . Therefore, some other isomers of cytochrome p450 (cytochrome p450 98, cytochrome p450 706, etc.) highly expressed in stem might be involved in Fig. 8 Putative MEP and MVA biosynthesis pathway of andrographolide in kalmegh (Red color highlighted genes/enzymes expressed in these pathways based on RT qPCR analysis). DXS 1-Deoxy-d-xylulose-5-phosphate synthase, DXR 1-Deoxy-d-xylulose-5-phosphate reductoisomerase, CMK 4-Diphosphocytidyl-2-C-methyld-erythritol kinase, MDS 2-C-Methyl-d-erythritol 2,4-cyclodiphosphate synthase, HMDS 4-Hydroxy-3-methylbut-2-enyl-diphosphate synthase, IDI Isopentenyl-diphosphate delta-isomerase, GGPS Geranylgeranyl diphosphate synthase, HMGR Hydroxymethylglutaryl-CoA reductase, MK mevalonate kinase, PMK Phosphomevalonate kinase deoxy-didehydroandrographolide biosynthesis which were not detected in present leaf transcriptome study. The quantity of reads produced through RNA-seq is directly equivalent to that transcript's relative abundance (Parekh et al. 2018) . Differential gene expression profile between genotype consists of higher andrographolide content (IC 520361) and genotype consists of lower andrographolide content (Anand Local) was created using the edgeR software tool to study the transcript abundance differentially expressed between these two genotypes. Genotype Anand Local was considered as control as containing lower andrographolide content for analysis. A metrics for measuring transcript abundance is represented as Fragments per Kilobase per Million (FPKM), which decides the level of expression depends on the abundance in expected mapped fragments expressed. Most of the transcripts involved in andrographolide biosynthesis pathway were up-regulated which predicted based on LogFC value (Suppl Table S6 ). LogFC is the fold change ratio of Log 2 . FC value greater than zero were considered up-regulated whereas, FC value less than zero were considered down-regulated. Only two (based on previous reports and KEGG pathway analysis) transcript isomers of Cytochrome p450 were represented, but most of the transcript isomers of Cytochrome p450 were up-regulated in the present experiment. Based on the results obtained in differential gene expression analysis, the primers for quantitative real-time PCR were synthesized for the validation of these results. To document the differentially expressed transcripts, the effective numbers of counts were input to the edgeR, including fold change and statistical significance for two genotypes (Fig. 9) . The heat map showed 1016 transcripts were differentially expressed between two kalmegh samples (Suppl Fig. 4 ). Many transcripts were found differentially expressed related to the biosynthesis pathway of secondary metabolite andrographolide. Transcripts showed higher Log FC value means up-regulation in IC 520361 as compare to Anand Local were geranyl diphosphate synthase small subunit, gibberellin 2-beta-dioxygenase 8, 1-deoxy-d-xylulose-5-phosphate synthase, isopentenyl-diphosphate deltaisomerase i-like, 4-hydroxy-3-methylbut-2-enyl diphosphate synthase, 1-deoxy-d-xylulose 5-phosphate reductoisomerase, 3-hydroxy-3-methylglutaryl-coenzyme a reductase, 13-hydroxy-3-methylglutaryl-coenzyme a reductase, Entkaurene oxidase, Mevalonate kinase, Phosphomevalonate kinase (PMK), Ent-kaurene synthase chloroplast (KS) and Fig. 9 The MA [M (log ratios) and A (mean average] plot showing differential expression and volcano plot reporting false discovery rate between two kalmegh genotypes Cytochrome p450, whereas, transcripts showed lower Log FC value means down-regulation in IC 520361 as compare to Anand Local were 4-diphosphocytidyl-2-c-methyl-derythritol kinase chloroplastic (CMK), 2-c-methyl-d-erythritol -cyclodiphosphate synthase chloroplastic-like (MDS), Gibberellin 20 oxidase 1-d-like (GA20ox) and Gibberellin 3-oxidase (GA3ox) (Suppl Table S6 ). Quantitative PCR was performed to validate transcripts/ genes related to andrographolide biosynthesis pathway, which were differentially expressed in two different genotypes of kalmegh (IC 520361 and Anand Local) in differential gene expression analysis using edgeR software. The genes/transcripts used for primer designing are the key enzymes of andrographolide biosynthesis pathways. The transcripts, Ent-copalyl diphosphate synthase and Geranyl geranyl diphosphate synthase were not observed in this functional annotation; therefore ApU48901 and ApU8378 were used as reference transcripts from Garg et al. (2015) . Some other transcripts of key enzymes such as, 1-Deoxy-D-xylulose-5-phosphate synthase, 1-Deoxy-d-xylulose-5-phosphate reductoisomerase, Isopentenyl-diphosphate delta-isomerase and Hydroxymethylglutaryl-CoA reductase reported in isomers; so, to study all possible expression level, isomers of key enzymes from both (present and Garg et al. 2015) transcriptome were used. The other transcripts ApU8165, ApU46382, ApU9344 and ApU12883 were also taken from Garg et al. (2015) . Of 20 primers, ten primers (AP1 to AP10) were designed based on the analysis of differential gene expression study, the sequence of six primers (AP 11-16) was taken from Garg et al. (2015) and four primers for endogenous genes. Out of these, ten primers were selected during PCR screening, and actin was selected as an endogenous gene, which stability was good in kalmegh, reported by Garg et al. (2015) and Cherukupalli et al. (2016) . The gene expression data analysis was performed following Livak and Schmittgen (2001) 2 (-ΔΔCT) method ( Fig. 10 and Suppl Table S8 ). Anand Local was used as a control genotype as it consists of lower andrographolide content. Based on the Log2 fold change RQ 2 (-ΔΔ CT) results of quantitative PCR, it can be summarized that the expression level of candidate genes/transcripts related to andrographolide biosynthesis pathway such as, geranyl diphosphate synthase small subunit (TR776|c0_g1_i1), Gibberellin 2-beta-dioxygenase 8 (TR22713|c0_g1_i1), Isopentenyldiphosphate delta-isomerase i-like (TR24370|c0_g1_i1), 4-hydroxy-3-methylbut-2-enyl diphosphate synthase (TR18298|c0_g1_i1), 4-hydroxy-3-methylbut-2-enyl diphosphate synthase (TR25400|c0_g1_i1), 3-hydroxy-3-methylglutaryl-coenzyme a reductase (TR25482|c0_g1_ i1), 13-hydroxy-3-methylglutaryl-coenzyme a reductase (TR25482|c0_g1_i1), Ent-copalyl diphosphate synthase and geranylgeranyl diphosphate synthase were upregulated in IC 520361. Therefore, the expression level of these genes involved in andrographolide biosynthesis were relatively higher in IC 520361 as compare to Anand Local. The highest expression of gene 13-hydroxy-3-methylglutaryl-coenzyme a reductase (HMGR) was reported, which is responsible for accumulation of andrographolide in leaf (Jha et al. 2011) . The results obtained from real-time PCR showed similarity with differential gene expression analysis. These up-regulated genes of MEP and MVA pathways of andrographolide biosynthesis could be over-expressed to increase the andrographolide content to cope up with its higher economic value using genetic engineering of these metabolic pathways. The study revealed that the transcriptome analysis guided about the level of transcripts expression associated with andrographolide biosynthesis pathway. Candidate genes/ transcripts of MEP and MVA pathways of andrographolide biosynthesis such as, geranyl diphosphate synthase small subunit, gibberellin 2-beta-dioxygenase 8, isopentenyldiphosphate delta-isomerase i-like, 4-hydroxy-3-methylbut-2-enyl diphosphate synthase, 4-hydroxy-3-methylbut-2-enyl diphosphate synthase, 3-hydroxy-3-methylglutaryl-coenzyme a reductase, 13-hydroxy-3-methylglutaryl-coenzyme a reductase, Ent-copalyl diphosphate synthase and Geranylgeranyl diphosphate synthase showed relatively higher expression in accession IC 520361 as compare to accession Anand Local. The content of andrographolide could be enhanced to get higher market demand by over-expressing these genes (especially, 13-hydroxy-3-methylglutaryl-coenzyme a reductase) involved in MEP and MVA pathways using promising genotypes by genetic modifications. The molecular markers related to andrographolide biosynthesis pathway could be developed based on said leaf transcriptome analysis, which would be useful to breeders for direct screening of different uncharacterized genotypes of kalmegh. Rapid micropropagation via axillary bud proliferation of Adhatoda vasica Nees from nodal segments Trimmomatic: a flexible trimmer for Illumina sequence data A simple and efficient method for isolating RNA from pine trees De novo assembly of leaf transcriptome in the medicinal plant Andrographis paniculata Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research The complete chloroplast genome sequence of the medicinal plant Andrographis paniculata Andrographolide as a potential inhibitor of SARS-CoV-2 main protease: an in silico approach RAPD based genetic variability assessment in Andrographis paniculata (Burm f.) Wall. ex Nees (Kalmegh): an important hepatoprotective herb Transcriptome sequencing of Codonopsis pilosula and identification of candidate genes involved in polysaccharide biosynthesis Andrographis paniculata transcriptome provides molecular insights into tissue-specific accumulation of medicinal diterpenes De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis The first insight into the tissue specific Taxus transcriptome via illumina second generation sequencing Sundaresan V (2020) Morphometric, chemotypic, and molecular diversity studies in Andrographis paniculata Transcriptome profiling and comparative analysis of Panax ginseng adventitious roots Differential expression of 3-hydroxy-3-methylglutaryl-coenzyme A reductase of Andrographis paniculata in andrographolide accumulation De novo transcriptome analysis deciphered polyoxypregnane glycoside biosynthesis pathway in Gymnema sylvestre De novo transcriptome sequencing to dissect candidate genes associated with pearl millet-downy mildew Kalmegh. In: Peter KV (ed) Horticulture science series: medicinal plants Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences Transcriptome profiling reveals metabolic alteration in Andrographis paniculata in response to continuous cropping The plant 1-deoxy-D-xylulose 5 phosphate pathway for biosynthesis of isoprenoids Analysis of relative gene expression data using real-time quantitative PCR and the 2 -∆∆CT Method Responses of primary and secondary metabolism to sugar accumulation revealed by microarray expression analysis of the Arabidopsis mutant, pho3 Control of antibiotic biosynthesis Genetic variability among Andrographis paniculata in Chhattisgarh region assessed by RAPD markers Andrographis paniculata (Kalmegh), a review Biological activities of Kalmegh (Andrographis paniculata Nees.) and its active principles-a review Variation in morphological characteristics and andrographolide content in Andrographis paniculata (Burm. f.) Nees. of central India Transcriptomic profiling of developing fiber in levant cotton Andrographis paniculata (Burm. f.) Wall. ex Nees (kalmegh), a traditional hepatoprotective drug from India Next generation sequencing and transcriptome analysis predicts biosynthetic pathway of sennosides from senna (Cassia angustifolia Vahl.), a non-model plant with potent laxative properties The RIN: an RNA integrity number for assigning integrity values to RNA measurements Identification, purification and quantification of andrographolide from Andrographis paniculata Burm. f. Nees by HPTLC at different stages of life cycle of crop Assessment of intraspecific variability at morphological, molecular and biochemical level of Andrographis paniculata (Kalmegh) Quantitative HPLC analysis of andrographolide in Andrographis paniculata at two different stages of life cycle of plant Development and validation of highly sensitive LC-MS/MS method for andrographolide quantification and confirmation Differential transcriptome analysis of glandular and filamentous trichomes in Artemisia annua Biosynthesis of andrographolide in Andrographis paniculata The genome of the medicinal plant Andrographis paniculata provides insight into the biosynthesis of the bioactive diterpenoid neoandrographolide Terpene specialized metabolism in Arabidopsis thaliana RNA-sequencing analysis reveals critical roles of hormone metabolism and signaling transduction in seed germination of Andrographis paniculata Comparative protein profile studies and in silico structural/functional analysis of HMGR (ApHMGR) in Andrographis paniculata Genetic evaluation of Andrographis paniculata (Burm. f.) Nees. revealed by SSR, AFLP and RAPD markers Acknowledgments Authors would like to thank Anand Agricultural University for providing laboratory facility during experiment and ICAR-NBPGR, New Delhi, India for providing the seed material.Author contributions YMS and SK conceived of the experiment. AAP carried out field experiment. AAS performed biochemical analysis. AAP and MJP performed RNA sequencing. MJP, HNZ, AAP and SK analyzed and interpreted the data. AAP, HNZ, SK, MJP and YSM prepared the manuscript. AAP, HNZ and SK revised the manuscript draft. Conflict of interest None of the authors has any financial or personal relationships that could inappropriately influence or bias the content of the research paper. The authors declare that they have no conflict of interest in the publication.