key: cord-0845494-1mmqfp7g
authors: ul Qamar, Muhammad Tahir; Alqahtani, Safar M.; Alamri, Mubarak A.; Chen, Ling-Ling
title: Structural basis of SARS-CoV-2 3CLpro and anti-COVID-19 drug discovery from medicinal plants †
date: 2020-03-26
journal: J Pharm Anal
DOI: 10.1016/j.jpha.2020.03.009
sha: a5d41731f93410cad241d58545a8034123cca834
doc_id: 845494
cord_uid: 1mmqfp7g

Abstract

The recent outbreak of coronavirus disease 2019 (COVID-19) caused by SARS-CoV-2 in December 2019 raised global health concerns. The viral 3-chymotrypsin-like cysteine protease (3CLpro) enzyme controls coronavirus replication and is essential for its life cycle. 3CLpro is a proven drug discovery target in the case of severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV). Recent studies revealed that the genome sequence of SARS-CoV-2 is very similar to that of SARS-CoV. Therefore, herein, we analysed the 3CLpro sequence, constructed its 3D homology model, and screened it against a medicinal plant library containing 32,297 potential anti-viral phytochemicals/traditional Chinese medicinal compounds. Our analyses revealed that the top nine hits might serve as potential anti-SARS-CoV-2 lead molecules for further optimisation and drug development to combat COVID-19.

The first case of the novel coronavirus was reported on December 30, 2019, in Wuhan city, Hubei province, P.R. China [1]. Swift actions were taken by the Centre for Disease Control and Prevention (CDC), Chinese health authorities, and researchers. The World Health Organization (WHO) temporarily named this pathogen 2019 novel coronavirus (2019-nCoV) [2]. On January 10, 2020, the first whole-genome sequence of 2019-nCoV was released, which helped researchers to quickly identify the virus in patients using reverse-transcription polymerase chain reaction (RT-PCR) methods [3].
On January 21, the first article related to 2019-nCoV was published, which revealed that 2019-nCoV belongs to the beta-coronavirus group, sharing ancestry with bat coronavirus HKU9-1, similar to SARS-coronaviruses, and that, despite sequence diversity, its spike protein interacts strongly with the human ACE2 receptor [1]. On January 30,

Coronaviruses are single-stranded positive-sense RNA viruses that possess large viral RNA genomes [5]. Recent studies showed that SARS-CoV-2 has a similar genomic organization to other beta-coronaviruses, consisting of a 5'-untranslated region (UTR), a replicase complex (orf1ab) encoding non-structural proteins (nsps), a spike protein (S) gene, an envelope protein (E) gene, a membrane protein (M) gene, a nucleocapsid protein (N) gene, a 3'-UTR, and several unidentified non-structural open reading frames [3]. Although SARS-CoV-2 is classified into the beta-coronavirus group, it is distinct from MERS-CoV and SARS-CoV. Recent studies highlighted that SARS-CoV-2 genes share <80% nucleotide identity and 89.10% nucleotide similarity with SARS-CoV genes [6, 7]. Usually, beta-coronaviruses produce a ~800 kDa polypeptide upon translation of the genome. This polypeptide is proteolytically cleaved to generate various proteins. The proteolytic processing is mediated by papain-like protease (PLpro) and 3-chymotrypsin-like protease (3CLpro). The 3CLpro cleaves the polyprotein at 11 distinct sites to generate various non-structural proteins that are important for viral replication [8]. 3CLpro plays a critical role in the replication of virus particles and, unlike structural/accessory protein-encoding genes, it is located at the 3' end which exhibits excessive variability. Therefore, it is a potential target for screening anti-coronavirus inhibitors [9].
Structure-based activity analyses and high-throughput studies have identified potential inhibitors for SARS-CoV and MERS-CoV 3CLpro [10-12]. Medicinal plants, especially those employed in traditional Chinese medicine, have attracted significant attention because they include bioactive compounds that could be used to develop formal drugs against several diseases with no or minimal side-effects [13]. Therefore, the present study was conducted to obtain structural insight into the SARS-CoV-2 3CLpro and to discover potent anti-COVID-19 natural compounds.

Whole-genome sequences of all SARS-CoV-2 isolates available till January 31, 2020, were downloaded from the GISAID database (accession numbers and details are given in Table S1)

In order to identify similar sequences and key/conserved residues, and to infer phylogeny, multiple sequence alignment of SARS-CoV-2 3CLpro followed by phylogenetic tree analyses were performed using T-Coffee [15] and the alignment figure was generated using ESPript3

To probe the molecular architecture of SARS-CoV-2 3CLpro, comparative homology modelling was performed using Modeller v9.11 [17]. To select closely-related templates for modelling, PSI-BLAST was performed against all known structures in the protein databank (PDB) [18]. Chimera v1.8.1 [19] and PyMOL educational version [20] were used for initial quality estimation, energy minimisation, mutation analyses, and image processing.

Multiple sequence alignment results revealed that 3CLpro was conserved, with 100% identity among all SARS-CoV-2 genomes. Next, the SARS-CoV-2 3CLpro protein sequence was compared with its closest homologs (Bat-CoV, SARS-CoV, MERS-CoV, Human-CoV and Bovine-CoV). The results revealed that SARS-CoV-2 3CLpro clustered with bat SARS-like coronaviruses, sharing 99.02% sequence identity (Fig. 1A).
Furthermore, it shares 96.08%, 87.00%, 90.00% and 90.00% sequence identity with SARS-CoV, MERS-CoV, Human-CoV and Bovine-CoV homologs, respectively (Fig. 1B). These findings were consistent with initial studies reporting that SARS-CoV-2 is more similar to SARS-CoV than MERS-CoV, and shares a common ancestor with bat coronaviruses [1, 3, 31]. Analysis of physicochemical parameters revealed that the SARS-CoV-2 3CLpro polypeptide is 306 amino acids long with a molecular weight of 33,796.64 Da and a GRAVY score of -0.019, categorising the protein as a stable, hydrophilic molecule capable of establishing hydrogen bonds (Table 1).

considerably lower (Fig. S3). These results indicated that the 12 point mutations identified in the previous step may disrupt important hydrogen bonds and alter the receptor binding site, thereby affecting its ability to bind with the SARS-CoV inhibitor.

Therefore, it was essential to discover novel compounds that may inhibit SARS-CoV-2 3CLpro and serve as potential anti-COVID-19 drug compounds. We developed a library from our previously published studies that contains numerous natural compounds possessing potential anti-viral activities and screened it against the SARS-CoV-2 3CLpro homology model. Recent drug repurposing studies proposed a few drugs that target SARS-CoV-2 3CLpro and suggested that they could be used to treat COVID-19. Herein, we selected the best of these (Nelfinavir, Prulifloxacin and Colistin) from three different drug repurposing studies [36, 37] and docked them as controls in the present study (Fig. S4). Our analyses identified nine novel non-toxic, druggable natural compounds that are predicted to bind with the receptor binding site and catalytic dyad (Cys-145 and His-41) of SARS-CoV-2 3CLpro (Table 2; Fig. S5). ADMET profiling of the selected hits is given in Table S2.
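The sequence identity percentages and the GRAVY score reported above are both simple averages, and a short sketch can make their definitions concrete. The Python below is illustrative only and is not the authors' pipeline: the function names `gravy` and `percent_identity` are hypothetical, the test sequences are synthetic, and the identity denominator (columns where neither sequence has a gap) is an assumption, since tools differ on this point. The hydropathy values are the standard Kyte-Doolittle scale.

```python
# Kyte-Doolittle hydropathy index per residue (standard published scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    residues = sequence.upper()
    if not residues:
        raise ValueError("sequence is empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where neither row has a gap.

    Assumption: some tools divide by the full alignment length instead.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Synthetic toy alignment: 306 columns, 12 mismatches, no gaps.
toy_a = "A" * 306
toy_b = "A" * 294 + "G" * 12
print(round(percent_identity(toy_a, toy_b), 2))  # 96.08
```

Assuming a gapless alignment over all 306 residues, 12 substitutions leave 294 identical positions, and 294/306 rounds to 96.08%, which agrees with the identity reported for the SARS-CoV homolog. A negative `gravy` value, such as the -0.019 reported here, indicates a slightly hydrophilic protein on this scale.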
Among these screened phytochemicals,

To further investigate the molecular docking results, the top three phytochemical complexes,

In conclusion, our study revealed that 3CLpro is conserved in SARS-CoV-2. It is highly similar to bat SARS-like coronavirus 3CLpro, with some differences from other beta-coronaviruses. We predicted the 3D structure of the SARS-CoV-2 3CLpro enzyme, and the findings may help researchers working on COVID-19 drug discovery. Despite significant overall similarity with the SARS-CoV 3CLpro structure, the SARS-CoV-2 3CLpro substrate binding site had some key differences, which highlighted the need for rapid drug discovery to address the alarming

Postdoctoral research platform grant of Guangxi University. We also acknowledge all authors and laboratories mentioned in Table S1 for

Amaranthin binding mode with receptor binding site of SARS-CoV-2 3CLpro.

Table S1. Acknowledgement to the authors and laboratories sampling, analysing and submitting the genome sequences to the GISAID database.

Table S2. ADMET profiling listing absorption, metabolism and toxicity related drug-like parameters of all nine selected phytochemicals.

CoV-2 superimposed with the SARS-CoV 3CLpro structure. The SARS-CoV 3CLpro template is coloured cyan, the SARS-CoV-2 3CLpro structure is coloured grey, and all identified mutations are highlighted in red. (E) Docking of 5,7,3',4'-tetrahydroxy-2'-(3,3-dimethylallyl) isoflavone inside the receptor-binding site of SARS-CoV-2 3CLpro, showing hydrogen bonds with the catalytic dyad (Cys-145 and His-41). The 3CLpro structure is coloured dark blue, the 5,7,3',4'-tetrahydroxy-2'-(3,3-dimethylallyl) isoflavone is orange, and hydrogen bonds are coloured maroon.

Evolution of the novel coronavirus from the ongoing Wuhan outbreak and modeling of its spike protein for risk of human transmission
Cross-species transmission of the newly identified coronavirus 2019-nCoV
A novel coronavirus from patients with pneumonia in China
GISAID: Global initiative on sharing all influenza data - from vision to reality
Emerging coronaviruses: Genome structure, replication, and pathogenesis
Discovery of a novel coronavirus associated with the recent pneumonia outbreak in humans and its potential bat origin
A new coronavirus associated with human respiratory disease in China
Coronavirus main proteinase (3CLpro) structure: basis for design of anti-SARS drugs
Structures of the middle east respiratory syndrome coronavirus 3C-like protease reveal insights into substrate specificity
Design and synthesis of peptidomimetic severe acute respiratory syndrome chymotrypsin-like protease inhibitors
Identification, synthesis and evaluation of SARS-3C-like protease inhibitors
An overview of severe acute respiratory syndrome-coronavirus (SARS-CoV) 3CL protease inhibitors: peptidomimetics and small molecule chemotherapy
Computational screening of medicinal plant phytochemicals to discover potent pan-serotype inhibitors against dengue virus
ExPASy: The proteomics server for in-depth protein knowledge and analysis
T-Coffee: A novel method for fast and accurate multiple sequence alignment
ESPript: analysis of multiple sequence alignments in PostScript
Comparative protein structure modeling using Modeller
UCSF Chimera - a visualization system for exploratory research and analysis
Pymol: An open-source molecular graphics tool
MPD3: a useful medicinal plants database for drug designing
MAPS Database: medicinal plant activities, phytochemical and structural database
Medicinal chemistry and the molecular operating environment (MOE): application of QSAR and molecular docking to drug discovery
Epitope-based peptide vaccine design and target site depiction against middle east respiratory syndrome coronavirus: an immune-informatics study
Peptide vaccine against chikungunya virus: immuno-informatics combined with molecular docking approach
admetSAR: a comprehensive source and free tool for assessment of chemical ADMET properties
GROMACS: a message-passing parallel molecular dynamics implementation
PRODRG, a program for generating molecular topologies and unique molecular descriptors from coordinates of small molecules
Discovery of selective inhibitors for cyclic AMP response element-binding protein: a combined ligand and structure-based resources pipeline
A pneumonia outbreak associated with a new coronavirus of probable bat origin
BLAST: improvements for better sequence analysis
The PMDB Protein Model Database
The crystal structures of severe acute respiratory syndrome virus main protease and its complex with an inhibitor
Discovery of non-covalent inhibitors of the SARS main proteinase 3CLpro, Probe Reports from the NIH Molecular Libraries Program
Nelfinavir was predicted to be a potential inhibitor of 2019-nCov main protease by an integrative approach combining homology modelling, molecular docking and binding free energy calculation
Therapeutic drugs targeting 2019-nCoV main protease by high-throughput screening
Isoflavonoids and other compounds from Psorothamnus arborescens with antiprotozoal activities
Encyclopedia of traditional Chinese medicines

Atomic composition: Carbon 1499; Hydrogen 2318; Nitrogen 402; Oxygen 445
Amino acid composition: Arg 11 (3.6%); Asn 21 (6.9%); Asp 17 (5.6%); Cys 12 (3.9%); Glu 9; Ile 11 (3.6%); Lys 11 (3.6%); Ser 16; Trp 3 (1.0%)

Licoleafol, Glycyrrhiza
Amaranthin, Amaranthus
*3CLpro catalytic dyad (His-41 and Cys-145) residues are highlighted with bold font

SARS-CoV-2 3CLpro is conserved, shares 99.02% sequence identity with SARS-CoV 3CLpro, and carries 12 point mutations
Mutations disrupt important hydrogen bonds and alter the receptor binding site of SARS-CoV-2 3CLpro
Medicinal plant phytochemicals were identified as potential anti-COVID-19 druggable candidates