key: cord-1036971-a4py7ttx authors: Mahajan, Shruti; Chakraborty, Abhisek; Sil, Titas; Sharma, Vineet K title: Genome sequencing and assembly of Tinospora cordifolia (Giloy) plant date: 2021-08-04 journal: bioRxiv DOI: 10.1101/2021.08.02.454741 sha: 12e067c4f4fd72eb292d845ce0c191ec22729637 doc_id: 1036971 cord_uid: a4py7ttx

During the ongoing COVID-19 pandemic, Tinospora cordifolia, also known as Giloy, gained immense popularity and use due to its immunity-boosting function and anti-viral properties. T. cordifolia is among the most important medicinal plants, with numerous therapeutic applications arising from its production of a diverse array of secondary metabolites. Therefore, to gain genomic insights into the medicinal properties of T. cordifolia, we carried out its first genome sequencing using 10x Genomics linked-read technology; the resulting draft genome assembly comprises 1.01 Gbp. This is also the first genome sequenced from the plant family Menispermaceae. We also performed the first genome size estimation for T. cordifolia, which yielded 1.13 Gbp. Deep sequencing of the leaf transcriptome was also performed, followed by transcriptomic analysis to gain insights into gene expression and function. The genome and transcriptome assemblies were used to construct the T. cordifolia gene set, which resulted in 19,474 coding gene sequences. Further, the phylogenetic position of T. cordifolia was determined by constructing a genome-wide phylogenetic tree using 35 other dicot species and one monocot species as an outgroup.

Tinospora cordifolia is a climbing shrub belonging to the Menispermaceae family, which includes more than 400 plant species of high therapeutic value [1, 2]. It perhaps originated in Africa in the Oligocene epoch (28.57 million years ago) and spread to Asia in the early Miocene epoch (21.54 million years ago) [3]. T.
cordifolia is found in tropical and sub-tropical regions including India, China, Sri Lanka, Bangladesh, Myanmar, Thailand, and Malaysia, and is also known as 'Giloy', 'Amrita', 'Guduchi', and 'heart-leaved moonseed' [2]. It is a perennial, deciduous, dioecious plant characterized by twining branches, a succulent stem with papery bark, alternately arranged heart-shaped leaves, aerial roots, and tiny greenish-yellow flowers borne in racemes [2, 4]. Being a climber, T. cordifolia needs a supporting plant such as Jatropha curcas (Jatropha), Azadirachta indica (Neem), or Moringa oleifera (Moringa) for its growth [4]. These co-occurring plants also play an important role in enhancing the production of various secondary metabolites of T. cordifolia [4, 5]. Previous reports also indicated the presence of endophytic fungi in the leaves and stem of this plant, but their ecological significance has yet to be studied [6, 7]. The plant produces secondary metabolites in response to stress conditions, and their concentration also varies with season and dioecy status [8]. High genetic diversity has been reported in T. cordifolia owing to its dioecious nature [9-11].

The chemical constituents of this plant have been broadly categorized as alkaloids (tinosporine, magnoflorine, berberine, etc.), terpenoids (tinosporide, furanolactone diterpene, cordifolioside, etc.), phenolics (lignans, flavonoids, phenylpropanoids, etc.), polysaccharides (glucose, xylose, rhamnose, etc.), steroids (giloinsterol, β-sitosterol, etc.), essential oils, and aliphatic compounds, along with a few other compounds such as giloin, tinosporidine, sinapic acid, tinosporone, and tinosporic acid, which are obtained from various parts of the plant [12, 13]. A terpene tinosporaside and an alkaloid berberine were found to be the most dominant compounds in T.
cordifolia and have been suggested for use as its chemical biomarkers [14, 15]. The bitter taste of T. cordifolia is due to the presence of tinosporic acid, tinosporol, giloin, giloinin, tinosporide, cordifolide, tinosporin, and a few other compounds [12]. One study reported that, of two Tinospora species (T. cordifolia and T. sinensis), T. cordifolia produces a three-fold higher concentration of berberine than T. sinensis, and thus the former is preferred in therapeutics [16].

The bioactive compounds found in T. cordifolia have known biological properties such as anti-pyretic, anti-diabetic, anti-inflammatory, anti-microbial, anti-allergic, anti-oxidant, anti-toxic, anti-arthritis, anti-osteoporotic, anti-HIV, anti-cancer, hepatoprotective, anti-malarial, and immunomodulatory activities [17, 18]. These properties make this species useful in the traditional treatment of several ailments including fevers, cough, diabetes, general debility, ear pains, jaundice, asthma, heart diseases, burning sensation, bone fracture, urinary problems, chronic diarrhoea, dysentery, [...] assembly is yet unavailable. A preliminary study reported the transcriptome (482 Mbp of data) of this species from leaf and stem tissues using 454 GS-FLX pyrosequencing [22]. A recent karyological study reported 2n=26 as the chromosome number of T. cordifolia, consistent with earlier studies [23-25]. Thus, to uncover the genomic basis of its medicinal properties and to further explore its therapeutic potential, we carried out the first genome sequencing and assembly of T. cordifolia using 10x Genomics linked reads. This is the first draft genome assembly of T. cordifolia and also the first genome sequenced so far from the medicinally important genus Tinospora and its family [26]. We also carried out comprehensive deep sequencing and assembly of the leaf transcriptome using Illumina reads.
The genome-wide phylogenetic analysis was also carried out for T. cordifolia, with other dicot species and a monocot species as an outgroup, to determine its phylogenetic position.

[...] implemented to assemble the haplotype-phased genome. The barcodes of linked reads were processed using Longranger basic v2.2.2 (https://support.10xgenomics.com/genome-exome/software/pipelines/latest/installation), and these processed reads were used by Tigmint v1.2.1 to rectify the mis-assemblies present in the Supernova-assembled genome [31]. AGOUTI v0.3.3 was used with quality-filtered transcriptome reads to perform the initial scaffolding [32]. To construct a more contiguous assembly, ARCS v1.1.1 was used with its default parameters to provide additional scaffolding [33]. Using a bloom filter-based method and k-mer value ranging from 30 to [...]

The de novo transcriptome assembly was carried out using RNA-Seq data generated in this study. Trimmomatic v.0.38 was used for processing of raw data reads, i.e., adapter removal and quality- [...]

The orthogroups comprising genes from all 37 species were retrieved from all the identified orthogroups. KinFin v1.0 was then used to increase the number of genes in one-to-one orthogroups by identifying and extracting fuzzy one-to-one orthogroups among these retrieved orthogroups [49]. In cases where multiple genes were present for a single species in any orthogroup, the longest gene among them was selected as representative.

[...] (Table 1 and Supplementary Tables S1-S2). The barcode sequences were removed from the raw reads, and high-quality reads were selected for further analysis. The de novo assembly was generated by Supernova assembler v.2.1.1 using 499.36 million raw reads [30], and the 'pseudohap2' style in Supernova mkoutput was implemented to assemble the haplotype-phased genome. Since the
Since the 184 genome size was not known for this plant, we performed the first genome size estimation for T. 185 cordifolia using SGA-preqc processed linked reads, and the genome size was estimated to be 1.13 186 Gbp. Considering this genome size, the sequenced genomic data amounts to 70.2x genome coverage. After scaffolding, mis-assemblies rectification, gap-filling and polishing T. cordifolia genome 188 assembly resulted in 1.01 Gbp assembly size (≥2,000 bp) as the final draft. The %GC of the assembled 189 genome was 35.12%, and a total of 56,342 scaffolds were obtained having the N50 of 50.2 Kbp (Table 190 1 and Supplementary Table S3 ). The BUSCO completeness was 78.9% in the final polished T. Table S4 ). The de novo transcriptome assembly was 192 constructed using Trinity v2.9.1 with strand-specific option and other default parameters using the 193 processed paired end reads. The Trinity software assembled 2,764,154 bp of de novo transcriptome 194 assembly, and a total of 8,208 transcripts were predicted in the transcriptome assembly. Ranunculales from all the other core eudicots. Order Ranunculales is considered as an early-diverging 263 eudicot order and is among a few other eudicot orders (collectively known as basal eudicots) that are 264 found to be sister lineage to the core eudicots [52, 64, 65] , which was also observed in the case of T. cordifolia that showed early divergence from all other dicot species (Figure 2) . Thus, the early 266 divergence could be the reason for its distinct position relative to the other eudicots and monocot 267 species. The correctness of this phylogeny is supported by the fact that the species belonging to the 268 same clade shared common nodes, e.g., Arabidopsis thaliana, Camelina sativa, Brassica napus, and 269 Arabis alpina shared the same clade because they belong to the same order Brassicales, and similarly properties of this plant, the availability of T. 
cordifolia genome will help in bridging the missing link between its genomic and medicinal properties and provide leads for exploring the genomic basis of these properties. It will also aid in various comparative genomic studies and will act as a reference for the future species sequenced from its genus and family. It will also help in the genome-wide phylogenetic assessments as well as evolutionary analyses on this species. The knowledge of mechanisms and pathways involved in production of its numerous medicinally important secondary metabolites will help in better exploitation of these pathways and resultant metabolites for medicinal purposes and therapeutic applications.

The authors declare no competing financial and non-financial interest.

The Menispermaceae family of plants and its action against infectious diseases: A review
Re-delimitation of Tinospora (Menispermaceae): Implications for character evolution and historical biogeography
Tinospora cordifolia: a multipurpose medicinal plant-A
Effect of edaphic factors on major secondary metabolites of Tinospora cordifolia and neem guduchi with respect to their immunomodulatory effect
In vivo and in vitro histological localization of endophytic fungi in Tinospora cordifolia (Willd.) Miers ex Hook F. & Thomas
Endophytic fungi of Tinospora cordifolia with anti-gout properties. 3 Biotech
Impact of seasons and dioecy on therapeutic phytoconstituents of Tinospora cordifolia, a Rasayana drug
Assessment of genetic diversity in medicinal climber of Tinospora cordifolia (Willd.) Miers (Menispermaceae) from Gujarat
Development of genomic simple sequence repeats (g-SSR) markers in Tinospora cordifolia and their application in diversity analyses
Diversity in a widely distributed dioecious medicinal plant, Tinospora cordifolia (Willd
Chemistry and pharmacology of Tinospora cordifolia. Natural product communications
The chemical constituents and diverse pharmacological importance of Tinospora cordifolia. Heliyon
TLC based analysis of allelopathic effects on tinosporoside contents in Tinospora cordifolia
Pharmacognostic evaluation of Tinospora cordifolia (Willd.) Miers and identification of biomarkers
HPLC estimation of berberine in Tinospora cordifolia and Tinospora sinensis. Indian journal of pharmaceutical sciences
Tinospora cordifolia (Willd.) Hook. f. and Thoms. (Guduchi)-validation of the Ayurvedic pharmacology through experimental and clinical studies
Tinospora cordifolia: One plant, many roles. Ancient science of life
Tinospora cordifolia (Guduchi), a reservoir plant for therapeutic applications: A Review
Role of Tinospora cordifolia in metabolic health disorders: An updated review
Beneficial role of Indian medicinal plants in COVID-19
De novo transcriptome sequencing facilitates genomic resource generation in Tinospora cordifolia. Functional & integrative genomics
Chromosome number in Tinospora
Studies on the Menispermaceae
Somatic and gametic chromosomal characterization with fluorescence banding of Giloy (Tinospora cordifolia): A berberine synthesizing important medicinal plant of India
Tinospora species: An overview of their modulating effects on the immune system
Evaluating methods for isolating total RNA and predicting the success of sequencing phylogenetically diverse plant transcriptomes
Two mini-preparation protocols to DNA extraction from plants with high polysaccharide and secondary metabolites
Efficient de novo assembly of large genomes using compressed data structures
Direct determination of diploid genome sequences
Tigmint: correcting assembly errors using linked reads from large molecules
AGOUTI: improving genome assembly and annotation using transcriptome data. GigaScience
ARCS: scaffolding genome drafts with linked reads
Sealer: a scalable gap-closing application for finishing draft genomes. BMC bioinformatics
Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement
BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs
Trimmomatic: a flexible trimmer for Illumina sequence data
De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis
RepeatModeler2 for automated genomic discovery of transposable element families
CD-HIT: accelerated for clustering the next-generation sequencing data
Genome annotation and curation using MAKER and MAKER-P. Current protocols in bioinformatics
Basic local alignment search tool
AUGUSTUS: ab initio prediction of alternative transcripts
Tandem repeats finder: a program to analyze DNA sequences. Nucleic acids research
miRBase: tools for microRNA genomics
tRNAscan-SE: searching for tRNA genes in genomic sequences
Ensembl plants: integrating tools for visualizing, mining, and analyzing plant genomics data, in Plant bioinformatics. 2016
OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome biology
MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular biology and evolution
RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies
Phylogeny of basal eudicots: Insights from non-coding and rapidly evolving DNA. Organisms Diversity & Evolution
Phylogeny and classification of Ranunculales: Evidence from four molecular loci and morphological data
Evaluation of secondary metabolites and antioxidant activity of ethanolic leaves extract of Tinospora cordifolia
Analgesic, Antiinflammatory Activity of Tinospora cordifolia (Guduchi) and
Network pharmacology-based assessment to elucidate the molecular mechanism of anti-diabetic action of Tinospora cordifolia
Comparative analysis of HPTLC, secondary metabolites and antioxidant activities of Tinospora cordifolia stem powders
Scientific validation of the medicinal efficacy of Tinospora cordifolia
Ayurveda botanicals in COVID-19 management: An in silico multi-target approach. Plos one
Immunomodulatory active compounds from Tinospora cordifolia
Berberine and Tinospora cordifolia exert a potential anticancer effect on colon cancer cells by acting on specific pathways. International journal of immunopathology and pharmacology
Anticonvulsant activity of berberine, an isoquinoline alkaloid in mice. Epilepsy & Behavior
A hybrid approach for de novo human genome sequence assembly and phasing. Nature methods
Angiosperm phylogeny inferred from 18S rDNA, rbcL, and atpB sequences
Phylogenomic Insights into Deep Phylogeny of Angiosperms Based on Broad Nuclear
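
Supplementary illustration. The two assembly statistics quoted in the Results, the scaffold N50 (50.2 Kbp) and the sequencing coverage (70.2x of an estimated 1.13 Gbp genome), follow standard definitions that can be sketched in a few lines of Python. This is a minimal sketch and not the authors' pipeline: the function names `n50` and `coverage` and the toy scaffold lengths are hypothetical, and the total of about 79.3 Gbp of sequenced bases is only what the two reported figures imply together (70.2 x 1.13 Gbp), not a value stated in the text.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def coverage(total_sequenced_bases, genome_size):
    """Return depth of coverage: sequenced bases per genome base."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return total_sequenced_bases / genome_size


# Toy scaffold lengths (hypothetical, for illustration only).
toy_scaffolds = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_scaffolds))  # 8

# Assumption: ~79.326 Gbp of sequenced bases, implied by the reported
# 70.2x coverage and 1.13 Gbp genome size rather than given directly.
print(round(coverage(79.326e9, 1.13e9), 1))  # 70.2
```

Sorting the lengths in descending order and accumulating until half of the total is reached is the usual way to compute N50; it makes clear why a few long scaffolds can dominate the statistic even when the assembly contains many short ones.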