key: cord-0280398-xlabkpgx authors: Taheri, Sima; How, Teo Chee; Heslop-Harrison, John S.; Schwarzacher, Trude; Seong, Tan Yew; Yee, Wee Wei; Khalid, Norzulaani; Biswas, Manosh Kumar; Mutha, Naresh V R; Mohd-Yusuf, Yusmin; Gan, Han Ming; Harikrishna, Jennifer Ann title: Genome assembly and analysis of the flavonoid and phenylpropanoid biosynthetic pathways in Fingerroot ginger (Boesenbergia rotunda) date: 2022-05-11 journal: bioRxiv DOI: 10.1101/2022.05.11.491478 sha: c184e2b7e0b9cb77e7d45935159959295fb391d1 doc_id: 280398 cord_uid: xlabkpgx Boesenbergia rotunda (Zingiberaceae), is a high-value culinary and ethno-medicinal plant of Southeast Asia. The rhizomes of this herb have high flavanone and chalcone content. Here we report genome analysis of B. rotunda together with a complete genome sequence as a hybrid assembly. B. rotunda has an estimated genome size of 2.4 Gb which was assembled as 27,491 contigs with N50 size of 12.386 Mb. The highly heterozygous genome encodes 71,072 protein-coding genes and has 72% repeat content, with class I TEs occupying ∼67% of the assembled genome. Fluorescence In Situ Hybridization of the 18 chromosome pairs at metaphase showed six sites of 45S rDNA and two sites of 5S rDNA. SSR analysis identified 238,441 gSSRs and 4,604 EST-SSRs with 49 SSR markers common among related species. Genome-wide methylation percentages ranged from 73% CpG, 36% CHG and 34% CHH in leaf to 53% CpG, 18% CHG and 25% CHH in embryogenic callus. Panduratin A biosynthetic unigenes were most highly expressed in watery callus. B rotunda has a relatively large genome with high heterozygosity and TE content. This assembly and data (PRJNA71294) comprise a source for further research on the functional genomics of B. rotunda, the evolution of the ginger plant family and the potential genetic selection or improvement of gingers. expression in the different samples i.e., in vitro leaf (IVL), embryogenic callus (EC), and non-271 embryogenic calli (dry callus (DC) and watery callus (WC)) using ex vitro leaf (EVL) samples 272 as the comparator (Fig. 5 , Table S6 ). The first enzyme in the phenylpropanoid pathway is 273 phenylalanine ammonia-lyase (PAL) which converts phenylalanine to cinnamic acid. PAL below the threshold of FPKM without any differential expression in studied samples. 300 301 DNA methylation analysis using bisulfite sequencing 302 Genome wide methylation percentages determined from bisulfite sequence data from leaf and 303 four tissue cultured samples, were higher in all methylated cytosine contexts for samples from 304 EVL (CpG 73.2%, CHG 36.2%, CHH 33.7%) and IVL (CpG71.3%, CHG 35.4%, CHH 33.5%). 305 The lowest levels were for EC (CpG 53.4%, CHG 18.5%, CHH 25.3%), followed by WC (CpG 306 63.8%, CHG 21.9%, CHH 28.1%) and DC (CpG 68.4%, CHG 25.9%, CHH 28.6%). We also 307 evaluated DNA methylation levels of three groups of genes (30 genes in total) including DNA 308 methyltransferase-related genes across the genome of B. rotunda (Fig. 6a-d) . In general, CHH 309 methylation levels were higher than methylation levels in CpG and CHG context and out of 30 310 genes, 22 genes (73.3%) showed low methylation levels (<0.1) in CHG and CHH cytosine 311 contexts whereas only 30% of the genes showed low methylation levels in the CpG context. 312 Cytosine methylation of methylation-related genes in all cytosine contexts (CpG, CHG & CHH) 313 was the highest for DRM2 and followed by MET1, and CMT3 ( Fig. 6a-d) . Among somatic 314 embryogenesis-related genes, WOX gene was heavily methylated in CpG and CHG contexts 315 compared to other somatic embryogenesis-related genes (LEC2, BBM, SERK) ( Fig. 6a-d) . For 316 pathway-related genes, LPOs methylated more in all studied samples compared to other genes. 317 Correlation between gene expression levels and DNA methylation levels of genes related 319 to methylation, somatic embryogenesis and secondary metabolite pathway 320 From gene expression analysis, we observed that the expression level of DNA 321 methyltransferase genes, MET 1, CMT 3, and DRM2 was higher in callus than leaf samples 322 and was highest in embryogenic callus (EC) for all three genes and lowest in in vitro leaf (IVL) 323 ( Fig. 7a ). DRM2 showed the lowest level of expression and the highest level of DNA 324 methylation. DNA methylation levels of these genes at CpG, CHG, CHH cytosine contexts 325 were the highest in DC and WC with similar and lower methylation levels in the embryogenic 326 callus and leaf samples. Overall, expression of methylation-related genes was higher in samples 327 EC, DC, and WC but other than for CMT3, which showed an inverse relationship between 328 expression level and methylation levels, there was no clear correlation between level of DNA 329 methylation and level of gene expression (Fig. 7b) . Similarly, while there were different 330 expression patterns for the four somatic embryogenesis-related genes SERK, BBM, LEC2, and 331 WOX between different leaf and callus samples (Fig. 7c) , the DNA methylation level of each 332 gene across the different leaf and callus samples was largely unchanged (Fig. 7d) . A 333 comparison of 23 of B. rotunda genes involved in flavonoid and phenylpropanoid pathways 334 showed them to be expressed differentially in B. rotunda leaf and callus samples. Among them, 335 BGLU, CAD, CHS, LPO8, LPO9 and PAL were expressed more highly in callus than in leaf 336 samples (Fig. 7e) . The highest level of DNA methylation was observed for HCT, CCR, and 337 LPO2 genes in all studied samples and again, there was no general correlation between gene 338 expression levels and methylation levels for these samples (Fig. 7f) . 339 We present a genome assembly of Boesenbergia rotunda (2n=36) with an estimated genome 341 size of 2.4Gb. The genome of the plant we sequenced, when in cultivation a largely vegetatively 342 propagated species, shows an unusually high heterozygosity of 3.01%, suggesting that the 343 cultivar may be of hybrid origin or may have undergone whole genome duplication events. 344 This is also suggested based on the large number of unigenes in B. rotunda, notably more than 345 and this has also been correlated with evolution of genome size in angiosperms 76 . The large 377 genome size and high repeat content of B. rotunda with relatively low gene body cytosine 378 methylation levels of the genes selected for observation in the current study, fit well with this 379 model and it will be interesting to compare this with other Boesenbergia species in the future 380 when similar data becomes available. 381 As DNA methylation is dynamic, we saw variations in global DNA methylation levels in the 382 different samples. Unmethylated DNA has been shown to demarcate expressed genes 81 and so 383 to be able to examine this in the context of gene expression in B. rotunda and to add depth to 384 our genome data, we included deep sequencing of leaf and callus transcriptomes from B. 385 rotunda. There are several alternative tools for the de novo assembly of RNA-seq short reads 386 into a reference transcriptome and we compared analysis from four assemblers. The quality of 387 assembly was noticeably affected by both k-mer size and assembler tool, with Oases delivering 388 the highest N50 size and average contig length at k-mer 21 compared to at k-mer 24 or other 389 assemblers ( Figure S4 , Table 3 ), indicating more effective and accurate assembly. In 390 comparison to a previous transcriptome assembly of B. rotunda by SOAPdenovo-Trans de 391 novo assembler, our study obtained a longer N50 size (1,019) compared to an N50 value of 236 392 reported by 40 . An Oases assembly of genome sequence data from a Fern, Lygodium japonicum 393 was also found to give the best mean transcript length and N50 size when compared to 394 assemblies using Trinity and SOAPdenovo-Trans 82 . The BUSCO assessment of B. rotunda 395 transcriptome data also showed that Oases had higher numbers of complete and single copy 396 contigs and less fragmented contigs. Based on this, the transcriptome assembly using Oases 397 offered an improved resource for genome annotation and the gene expression study in B. quantitative qRT-PCR validation suggested that the higher level of expression of 420 methyltransferase-related genes and the lower CG, CHG and CHH sequence contexts in EC 421 samples was negatively correlated with the total methylation level of DNA methyltransferase-422 related genes 84 . We did observe a similar pattern for CMT3 in all five sample types in the 423 current study (Fig. 7) , however, no similar correlation between expression level and cytosine 424 methylation was observed in the current data for the other genes examined. The lack of 425 correlation between transcript expression and the respective gene body methylation from our 426 data may be due to the limitations of the current genome assembly such that the cis regions 427 could not be well annotated. In the future a higher resolution genome assembly for B. rotunda 428 would be useful to examine the methylation data from the current study. 429 Although only a minor portion of the B. rotunda genome at around 0.35%, microsatellites are 430 key elements in plant genomes. Among these, short sequence repeat microsatellites (SSRs) 431 have found wide utility as co-dominant markers useful in breeding and diversity studies 85,86 . 432 In this study, we identified genomic and EST-SSRs from B. rotunda, designing primers and 433 showing several to have transferability to Musa and Ensete genomes, mostly in silico analysis, 434 but with 14 tested in PCR experiments. Boesenbergia, Musa and Ensete are members of the 435 same plant family Zingiberales, and all have abundant AT-rich SSR sequences, however they 436 are not from the same genus, so are phylogenetically somewhat distanced as reflected in the 437 fairly low numbers with potential as markers across these species. Nevertheless, these newly 438 developed SSR markers enhance the genetic resources for B. rotunda as well as the plant family 439 Zingiberales and these markers could be utilized for genotyping, population structure analysis, 440 association studies, cultivar identification as well as any other breeding application of the 441 Boesenbergia spp. and low-quality nucleotides. PacBio reads were filtered to remove the short reads of less than 552 500bp or a quality score lower than 0.8, then error correction for the long reads done by 553 FALCON 96 , following the general principles proposed by 97 . We have tried to use several de 554 novo assemblers to construct the assembly with both Illumina and PacBio reads. Finally, we 555 chose the SMARTdenovo 98 . Corrected PacBio reads were assembled with SMARTdenovo 556 software (https://github.com/ruanjue/smartdenovo) to construct contigs. For PacBio data, 557 constructed contigs were subsequently polished by stand-alone consensus modules, 97 and 558 Pilon software 99 for Illumina PE reads. Polished contigs were used as input for scaffolding. 559 Scaffolds were constructed by SOAP scaffolding, SSPACE tool 100 with Illumina mate-pair 560 reads (2k-40k) with default parameters to extend the length of scaffolds for the raw assembly. 561 The gaps within scaffolds, consensus sequences generated from PacBio sub-reads were filled 562 using PBJelly2 101 . Finally, the scaffolds were corrected by Pilon 99 with Illumina PE reads to 563 correct the assembly errors and obtained final genome assembly. 564 565 Genome assembly quality assessment 566 The completeness of the assembly was tested by searching for 1440 core eukaryotic genes 567 using Benchmarking Universal Single-Copy Orthologs (BUSCO) (v2.0) 60 . To assess the 568 quality of the genome assembly, the Illumina paired-end 250bp read data was mapped to the 569 contig using BWA-MEM (version 0.7.15-r1142) 102 and adapter sequences. The remaining high-quality reads were assembled into contigs using 652 four commonly used short-read assemblers: Oases 129 ; TransABySS 130 ; SOAPdenovo-Trans 653 131 and Trinity 132 . Two different approaches were used to assemble the transcriptome. In the 654 first approach (best k-mer strategy) high-quality reads were assembled at different k-mer length 655 21-51 using Oases, TransABySS and SOAPdenovo-Trans whereas the assembly by Trinity 656 used default parameters (K-mer 25). The assemblies from each software in the first approach 657 were further used in the second approach (additive k-mer followed by TGICL) in order to 658 improve the transcriptome assemblies. A two-step strategy was employed for assembly in the 659 second approach in which the contigs generated from all the k-mers by each respective 660 assembler were merged and redundancy was removed using CD-HIT 133 . The remaining non-661 redundant contigs were assembled using TGICL clustering tool 134 with a maximum identity 662 of 90 and a minimum overlap length of 40. The completeness of the transcriptome assemblies 663 was measured using the BUSCO 60 software. Availability of data 728 Raw sequence data used for genome assembly, mRNA sequencing (RNA-Seq) and whole-729 Species diversity driven by morphological and ecological disparity: 738 a case study of comparative seed morphology and anatomy across a large monocot 739 order The number of known plants species in the world and 741 its annual increase The Flora of A dictionary of the economic products of the Malay Peninsula A preliminary checklist of the Zingiberaceae of Gingers of peninsular Malaysia and 749 Singapore (Natural History Nutritional compositions and phytochemical properties of the 751 edible flowers from selected Zingiberaceae found in Thailand Phytochemicals, antioxidant 754 properties and anticancer investigations of the different parts of several gingers species 755 (Boesenbergia rotunda, Boesenbergia pulchella var attenuata and Boesenbergia 756 armeniaca) Zingiberaceae in Flowering Plants· 758 Monocotyledons Nomenclatural changes in Zingiberaceae: 760 Caulokaempferia is a superfluous name for Monolophus and Jirawongsea is reduced to 761 Boesenbergia Boesenbergia rotunda: From ethnomedicine to drug discovery Based Complementary Altern Elicitation enhancement of 765 bioactive compound accumulation and antioxidant activity in shoot cultures of 766 Boesenbergia rotunda L Structural analysis of a novel antimutagenic compound, 4-768 hydroxypanduratin A, and the antimutagenic activity of flavonoids in a Thai spice fingerroot (Boesenbergia pandurata Schult.) against mutagenic heterocyclic amines Kaempferia pandurata): 772 isolation, crystal structure and synthesis of (±)-Boesenbergin A Flavanoids from Boesenbergia Rotunda (L.) Mansf: Chemistry, Bioactivity 775 and Accumulation Distribution of flavonoids and cyclohexenyl chalcone derivatives in 778 conventional propagated and in vitro-derived field-grown Boesenbergia rotunda Based Complementary Altern Existence of bioactive flavonoids in 781 rhizomes and plant cell cultures of'Boesenbergia rotund Determination of 784 Quercetin and Flavonol Synthase in Boesenbergia rotunda Rhizome Characterization of flavonoid derivatives from Boesenbergia 787 rotunda (L.) Anti-inflammatory cyclohexenyl chalcone derivatives in 789 Boesenbergia pandurata Bioactive 791 compounds of Boesenbergia sp. and their anti-inflammatory mechanism: A review Anticancer properties of 794 panduratin A isolated from Boesenbergia pandurata (Zingiberaceae) Panduratin A inhibits cell proliferation by inducing G0/G1 phase cell cycle 797 arrest and induces apoptosis in breast cancer cells 4-Hydroxypanduratin A and isopanduratin A inhibit tumor necrosis 799 factor α-stimulated gene expression and the nuclear factor κB-dependent signaling 800 pathway in human lung adenocarcinoma A549 cells Panduratins D-I, novel 803 secondary metabolites from rhizomes of Boesenbergia pandurata Cytotoxic Activity of Boesenbergia rotunda Extracts against 806 OrthoFinder: phylogenetic orthology inference for 900 comparative genomics STAG: species tree inference from all genes STRIDE: species tree root inference from gene duplication 904 events The haplotype-resolved reference genome of lemon (Citrus limon 906 L. Burm f.) Draft genome sequence of wild Prunus yedoensis reveals massive inter-908 specific hybridization between sympatric flowering cherries Phased diploid genome assembly with single-molecule real-time 911 sequencing Isomerase (CHI) Gene from Boesenbergia rotunda Enhancing 915 flavonoid production by promiscuous activity of prenyltransferase, BrPT2 from 916 Boesenbergia rotunda Whole genome sequencing of a banana wild relative Musa itinerans 918 provides insights into lineage-specific diversification of the Musa genus The banana (Musa acuminata) genome and the evolution of 921 monocotyledonous plants Haplotype-resolved genome assembly and allele-specific gene 923 expression in cultivated ginger In depth characterization of repetitive DNA in 23 plant genomes reveals 925 sources of genome size variation in the legume tribe Fabeae Differential genome size and repetitive DNA evolution in diploid 928 species of Melampodium sect Haplotype-resolved genome of diploid ginger (Zingiber officinale) and 931 its unique gingerol biosynthetic pathway The structure and evolution of angiosperm nuclear genomes Global DNA cytosine methylation 935 as an evolving trait: phylogenetic signal and correlated evolution with genome size in 936 angiosperms Transposable elements, epigenetics, and genome evolution Differential methylation of genes and repeats in land plants Genes and transposons are differentially methylated in plants, 942 but not in mammals Epigenetic natural variation in Arabidopsis thaliana Stable unmethylated DNA demarcates expressed genes and their cis-946 regulatory space in plant genomes De novo transcriptome assembly of a fern, Lygodium japonicum, and a 948 web resource database, Ljtrans DB Expression and DNA 950 methylation of SERK, BBM, LEC2 and WUS genes in in vitro cultures of Boesenbergia 951 rotunda (L.) Expression and DNA methylation of MET1, CMT3 and DRM2 during 953 in vitro culture of Boesenbergia rotunda (L.) Mansf Transcriptome wide SSR discovery cross-taxa transferability and 956 development of marker database for studying genetic diversity population structure of 957 Lilium species The landscape of microsatellites in the enset (Ensete ventricosum) 959 genome and web-based marker resource development An efficient protocol for total 961 DNA extraction from the members of order Zingiberales-suitable for diverse PCR 962 based downstream applications A reference-grade wild soybean genome A simple and efficient protocol for isolation of 965 functional RNA from plant tissues rich in secondary metabolites Practical in situ hybridization Cloning and characterization of ribosomal RNA genes 970 from wheat and barley The genome of Dendrobium officinale illuminates the biology of the 972 important traditional Chinese orchid herb Genome survey of Chinese fir (Cunninghamia lanceolata): Identification 974 of genomic SSRs and demonstration of their utility in genetic diversity analysis A fast, lock-free approach for efficient parallel counting 977 of occurrences of k-mers GenomeScope 2.0 and 979 Smudgeplot for reference-free profiling of polyploid genomes Assembly and diploid architecture of an individual human genome 982 via single-molecule technologies Nonhybrid, finished microbial genome assemblies from long-read 984 SMRT sequencing data A de novo assembler using long 986 noisy reads Pilon: an integrated tool for comprehensive microbial variant 988 detection and genome assembly improvement Scaffolding pre-990 assembled contigs using SSPACE Mind the gap: upgrading genomes with Pacific Biosciences RS 992 long-read sequencing technology Aligning sequence reads, clone sequences and assembly contigs with BWA-994 MEM Tandem repeats finder: a program to analyze DNA sequences. Nucleic 996 Acids Res Repbase Update, a database of repetitive 998 elements in eukaryotic genomes Graph-based genome 1000 alignment and genotyping with HISAT2 and HISAT-genotype AUGUSTUS: ab initio prediction of alternative transcripts Using native and syntenically 1005 mapped cDNA alignments to improve de novo gene finding Prediction of complete gene structures in human genomic DNA Gene finding in novel genomes Transcription of plant defence genes in response to UV 1011 light or fungal elicitor Integrated nr database in protein annotation system and its localization InterProScan 5: genome-scale protein function classification Gene ontology: tool for the unification of biology KEGG: kyoto encyclopedia of genes and genomes. Nucleic 1019 Acids Res The SWISS-PROT protein sequence database and its 1021 supplement TrEMBL in 2000 The COG database: an updated version includes eukaryotes tRNAscan-SE: a program for improved detection of transfer 1025 RNA genes in genomic sequence Rfam 13.0: shifting to a genome-centric resource for non-coding RNA 1027 families Infernal 1.1: 100-fold faster RNA homology searches MEGA X: molecular 1031 evolutionary genetics analysis across computing platforms UpSetR: an R package for the visualization 1034 of intersecting sets and their properties CAFE 5 models variation in 1036 evolutionary rates among gene families Estimating 1038 divergence times in large phylogenetic trees Genome sequencing and analysis of the model grass Brachypodium 1040 distachyon The automatic 1042 detection of homologous regions (ADHoRe) and its application to microcolinearity 1043 between Arabidopsis and rice A wrapper tool around Cutadapt and FastQC to consistently 1045 apply quality and adapter trimming to FastQ files 516 Bismark: a flexible aligner and methylation caller for 1048 A reference methylome database and analysis pipeline to facilitate 1050 integrative and comparative epigenomics Oases: robust de novo RNA-1052 seq assembly across the dynamic range of expression levels ABySS: a parallel assembler for short read sequence data SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-1057 Seq reads Trinity RNA-Seq assembler performance optimization Discovery Environment: bridging from the eXtreme to the campus and beyond 45 1061 Cd-hit: a fast program for clustering and comparing large sets of 1063 protein or nucleotide sequences TIGR Gene Indices clustering tools (TGICL): a software system for 1065 fast clustering of large EST datasets Blast2GO: a universal tool for annotation, visualization and analysis 1067 in functional genomics research High-throughput functional annotation and data mining with the 1069 Blast2GO suite The KEGG database in 'In silico' simulation of biological processes RSEM: accurate transcript quantification from RNA-Seq data 1073 with or without a reference genome edgeR: a Bioconductor package for 1075 differential expression analysis of digital gene expression data Exploration and exploitation of novel SSR markers for candidate 1078 transcription factor genes in Lilium species Primer3-new capabilities and interfaces Sequence mapping by electronic PCR Fluorescent in situ hybridization with clone pTa71 (45S rDNA 1096 of wheat), labelled with digoxigenin and detected with FITC (green) and clone pTa794 (5S 1097 rDNA of wheat) labelled with biotin and detected with Alexa 674 (shown in red). A-C: early 1098 metaphase showing 6 sites of 45S rDNA (arrows) of variable strength at ends of 3 pairs of 1099 chromosomes. In some cases, the rDNA is extended, and the satellite is separated from the 1100 main chromosomes shown enlarged in B and C. D: Two 5S rDNA sites (arrows) were detected 1101 on a chromosome pair not bearing 45S rDNA. The star indicates fusion of 2 or 3 45S rDNA 1102 sites. E and F: Chromosome preparation using fresh root tips from plants grown and analysed 1103 in two different laboratories Figure 2. (a-b) Frequency distribution of SSR motif; (c) transferability of genomic and 1107 transcript SSR markers in four relatives of B. rotunda Gene ontology (GO) classification of assembled unigenes of B. rotunda. Results 1110 are summarized in three main categories: biological process (BP), molecular function (MF), 1111 and cellular component (CC). The x-axis indicates the subgroups in GO annotation while the 1112 y-axis indicates the percentage of specific categories of genes in each main category Distribution of Eukaryotic Orthologous Groups (KOG) classification. A total of 18,767 1114 assembled unigenes were annotated and assigned to 25 functional categories. The vertical axis 1115 indicates subgroups in the KOG classification and the x-axis represents the number of genes in 1116 each main category Cross-genera phylogenetic analysis of B. rotunda and 13 other species; (b) UpSet 1119 plot showing unique and shared protein ortholog clusters of B. rotunda and 13 selected 1120 reference genomes. Connected dots represent the intersections of overlapping orthologs with 1121 the vertical black bars above showing the number of orthogroups in each intersection Scheme of the flavonoid and phenylpropanoid biosynthetic pathways in B. rotunda 1124 based on KEGG pathways. Genes encoding enzymes for each step are indicated as follows: 1125 CAD, cinnamyl alcohol dehydrogenase; and BGLU, Beta-glucosidase; CALDH, coniferyl-1126 aldehyde dehydrogenase 4CL, 4-coumarate-CoA ligase CCoAOMT, caffeoyl-CoA 3-O-1128 methyltransferase F3H F5H, ferulate 5-hydroxylase PAL, phenylalanine 1132 ammonia lyase. Beside each enzyme, four boxes shown (from left to right): In vitro leaf (IVL) Red boxes indicate relatively 1134 higher mRNA expression compared to the leaf sample with the highest levels in darker red Green boxes indicate relatively lower expression compared to the leaf sample. The colour box 1136 is based on log2FC values Figure 6. (a-d) Average methylation levels of DNA methyltransferase-related genes and genes 1140 involved in flavonoid and phenylpropanoid biosynthesis pathways in different samples of B. 1141 rotunda; ex-vitro leaf (EVL), in vitro leaf (IVL) Watery callus (WC). a) cytosine methylation; b) CpG methylation; c) CHG methylation; and 1143 d) for CHH methylation. CAD, cinnamyl alcohol dehydrogenase; and BGLU, Beta-1144 glucosidase; CALDH, coniferyl-aldehyde dehydrogenase 4CL, 4-coumarate-CoA ligase; CHS, chalcone synthase C3H, ρ-coumarate 3-hydroxylase CCR, 1147 cinnamoyl-CoA reductase; COMT, caffeic acid O-methyltransferase; 6'DCHS, 6′-1148 deoxychalcone synthase DFR, dihydroflavonol 4-reductase F3H CoA shikimate; LPO, 1150 Lactoperoxidase; PAL, phenylalanine ammonia lyase; WOX, Wuschel; LEC3, Leafy 1151 cotyledon 2; BBM, Baby boom; SERK, Somatic embryogenesis receptor-like kinase Expression level and total methylation level in all cytosine contexts of methylation-1156 related genes (a & b), somatic embryogenesis-related gene (c & d) and flavonoid and 1157 phenylpropanoid biosynthesis pathways-related genes (e & f) in ex vitro leaf (EVL), in vitro 1158 leaf (IVL), embryogenic callus (EC), dry callus (DC), and watery callus (WC). CAD, cinnamyl 1159 alcohol dehydrogenase; and BGLU CALDH, coniferyl-aldehyde 1160 dehydrogenase CoA ligase; CHS, 1161 chalcone synthase; CHI, chalcone isomerase; CCoAOMT, caffeoyl-CoA 3-O-1162 methyltransferase F3H F5H, ferulate 5-hydroxylase