key: cord-0825998-wslnnjlk
authors: Ghosh, Kalyan; Amin, Sk.Abdul; Gayen, Shovanlal; Jha, Tarun
title: Chemical-informatics approach to COVID-19 drug discovery: Exploration of important fragments and data mining based prediction of some hits from natural origins as main protease (Mpro) inhibitors
date: 2020-08-05
journal: J Mol Struct
DOI: 10.1016/j.molstruc.2020.129026
sha: 1d2eeef5826900dcec80ace9bd8d8899bea4a365
doc_id: 825998
cord_uid: wslnnjlk

As the world struggles against the current global pandemic of novel coronavirus disease (COVID-19), it is challenging to drive drug discovery efforts towards broad-spectrum antiviral agents. There is therefore a need for strong and sustainable global collaboration, especially in the analysis and sharing of new and existing data, to bridge the knowledge gap. Our chemical-informatics based data analysis approach applies previous activity data of SARS-CoV main protease (Mpro) inhibitors to accelerate the search for SARS-CoV-2 Mpro inhibitors. The study design comprised three major aspects: (1) classification QSAR based data mining of diverse SARS-CoV Mpro inhibitors, (2) identification of favourable and/or unfavourable molecular features/fingerprints/substructures regulating the Mpro inhibitory properties, and (3) data mining based prediction to validate recently reported virtual hits of natural origin against the SARS-CoV-2 Mpro enzyme. Our structural and physico-chemical interpretation (SPCI) analysis suggested that heterocyclic nuclei such as diazole, furan and pyridine make a clear positive contribution, while thiophene, thiazole and pyrimidine may make a negative contribution to SARS-CoV Mpro inhibition. Several Monte Carlo optimization based QSAR models were developed, and the best model was used to screen natural product hits from recent publications. The resulting active molecules were analysed further by fragment analysis.
This approach sets the stage for fragment exploration and QSAR based screening of active molecules against the putative SARS-CoV-2 Mpro enzyme. We believe that future in vitro and in vivo studies will provide more perspectives on anti-SARS-CoV-2 agents.

• It is challenging to identify effective SARS-CoV-2 main protease inhibitors urgently.
• This study involves classification QSAR based data mining of diverse SARS-CoV Mpro inhibitors.
• Important molecular features regulating the Mpro inhibitory properties are identified.
• Prediction of recently reported virtual hits of natural origin is reported.

# Authors have equal contribution
*Corresponding authors: S. Gayen (shovanlal.gayen@gmail.com) and T. Jha (tjupharm@yahoo.com)

Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) has been spreading alarmingly, causing tremendous social and economic disruption [1-3]. This zoonotic infection has spread to 216 countries and territories [4, 5]. The genome of SARS-CoV-2 comprises ∼30,000 nucleotides with 10 open reading frames (ORFs). The 3′-terminal regions encode the viral structural proteins, including the spike (S), membrane (M), envelope (E) and nucleocapsid (N) proteins. The 5′-terminal ORF1ab, in contrast, encodes two viral replicase polyproteins, pp1a and pp1ab [1]. Sixteen nonstructural proteins (nsp1 to nsp16) arise upon proteolytic cleavage of pp1a and pp1ab. nsp5 (chymotrypsin-like protease, 3CLpro, also called main protease, Mpro) is essential for viral replication and maturation, which makes Mpro an attractive target for anti-SARS-CoV-2 drug discovery and development [7-12]. Molecular docking and target-based virtual screening studies on Mpro have moved at a much faster pace since the release of several crystal structures bound to covalent and non-covalent inhibitors.
Despite the availability of inhibitor-bound SARS-CoV-2 Mpro crystal structures and a large body of proteomic knowledge, the fragments/features that modulate the structure-activity relationships (SARs) are still not known. Ligand-based molecular modelling approaches are therefore necessary to gather knowledge about the favourable and/or unfavourable molecular features/fingerprints/substructures regulating the Mpro inhibitory properties. Quantitative structure-activity relationship (QSAR) study is an important ligand-based molecular modelling technique that captures the effect of structural and physicochemical features of ligands on biological activity [29, 30]. It also allows the biological activity of interest to be predicted for particular compounds. As the SARS-CoV-2 genome has over 80% identity to SARS-CoV (about 96% sequence similarity for their Mpro), previously reported SARS-CoV Mpro inhibitors have good prospects of being effective against SARS-CoV-2 as well. In this connection, we designed the current research, a part of our rational molecular modelling studies [3, 31-34], to cover three major aspects: (1) classification QSAR based data mining of diverse SARS-CoV Mpro inhibitors, (2) identification of favourable and/or unfavourable molecular features/fingerprints/substructures regulating the Mpro inhibitory properties, and (3) data mining based prediction to validate recently reported virtual hits of natural origin against the SARS-CoV-2 Mpro enzyme.

The fight against COVID-19 requires strong and sustainable global collaboration, especially in terms of data sharing, to bridge the knowledge gap [35]. A set of 113 compounds was retrieved from the data shared by Bobrowski and co-workers [35, 36]. We considered only compounds with a half-maximal inhibitory concentration (IC50) value and eliminated compounds reported with a binding affinity (Ki) value. Thus, 88 compounds were kept for the current modelling study (Table S1). Classification modelling helps to discriminate active from inactive molecules in terms of the investigated biological activity.
The 'activity threshold' for the current work was set to an IC50 of 10,000 nM. We performed structural and physico-chemical interpretation (SPCI) analysis [37, 38] and Monte Carlo based CORAL QSAR studies [39-42]. The Monte Carlo based CORAL QSAR study not only offers a graphical visualization of the critical fingerprints or fragments that enhance or decrease SARS-CoV Mpro inhibitory activity, but also allows external sets of compounds to be screened.

The SPCI analysis was used to identify and estimate the contributions of different fragments that are important for Mpro inhibition. Initially, descriptors were calculated with the SiRMS tool. These descriptors were then used for model development and validation [37, 38]. In the SPCI analysis, four classification-based QSAR models were generated using the following machine learning approaches: gradient boosting classification (GBC), random forest (RF), support vector machine (SVM) and k-nearest neighbour (kNN). These models were evaluated by statistical parameters such as balanced accuracy, sensitivity and specificity [38]. Additionally, all fragments with at most three attachment points were considered, and favoured fragments were counted with RDKit in combination with SMARTS patterns [38]. Finally, the overall contributions of the different fragments obtained from the four machine learning models are shown in median fragment contribution graphs generated with the rspci R software package [43].

The Monte Carlo optimization method was used to identify the structural fingerprints responsible for promoting or hindering activity [3]. The descriptors comprise general attributes and local SMILES attributes; the local SMILES attributes are denoted by S_k, SS_k and SSS_k [40-42].
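The local SMILES attributes S_k, SS_k and SSS_k can be illustrated with a small sketch. It assumes the usual reading of these symbols: S_k is one SMILES "atom" (a single character, or a group of characters that cannot be split, such as Cl or a bracketed atom), while SS_k and SSS_k are runs of two and three consecutive SMILES atoms. The tokenizer, the function names and the example molecule are hypothetical and simplified; this is not the parser used by the CORAL software, whose attribute notation and ordering rules differ.

```python
import re

# Simplified SMILES tokenizer (an assumption for illustration only):
# bracket atoms, the two-letter halogens Cl and Br, two-digit ring closures,
# then any single letter, digit or symbol.
_SMILES_ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|%\d{2}|[A-Za-z]|\d|[^A-Za-z\d\s]")


def smiles_atoms(smiles):
    """Split a SMILES string into indivisible tokens (the S_k attributes)."""
    return _SMILES_ATOM.findall(smiles)


def local_attributes(smiles):
    """Return (S_k, SS_k, SSS_k) as lists of single, paired and tripled tokens."""
    s_k = smiles_atoms(smiles)
    ss_k = ["".join(pair) for pair in zip(s_k, s_k[1:])]
    sss_k = ["".join(triple) for triple in zip(s_k, s_k[1:], s_k[2:])]
    return s_k, ss_k, sss_k


# Acetyl chloride, chosen only because it contains a two-letter atom (Cl).
s_k, ss_k, sss_k = local_attributes("CC(=O)Cl")
print(s_k)
print(ss_k)
print(sss_k)
```

In a Monte Carlo optimization of this kind, each such attribute would receive a correlation weight, and the weights would be summed into a descriptor for the molecule. That step is not shown here because the weighting equation is not available in this text.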
Further, graph-based descriptors from the GAO (graph of atomic orbitals), HSG (hydrogen-suppressed graph) and HFG (hydrogen-filled graph) were calculated by an equation that is not preserved in this text; its terms include 0ECk and 1ECk [3, 39]. Sensitivity, specificity, accuracy and the Matthews correlation coefficient (MCC) were calculated as measures of internal and external validation [3]. Lastly, the structural attributes responsible for promoting or hindering Mpro inhibitory activity were identified.

Compounds with a SARS-CoV Mpro IC50 value higher than the 'activity threshold' (IC50 = 10,000 nM) were classified as weaker Mpro inhibitors or inactives (0), and those with an IC50 value lower than the threshold were classified as promising Mpro inhibitors or actives (1). Thus, 27 molecules were identified as actives (1), while 61 compounds were classified as weaker SARS-CoV Mpro inhibitors (0) in the classification analysis (Table S1).

First, structural and physico-chemical interpretation (SPCI) analysis was performed [38]. The machine learning based models were used for a fragment/feature analysis to estimate the contributions of different fragments towards Mpro inhibition. This enables an extensive interpretation of the structural and physico-chemical properties responsible for SARS-CoV Mpro inhibitory activity. Monte Carlo optimization based QSAR modelling with SMILES and graph-based descriptors was also employed to support the fragment contributions [39]. The best Monte Carlo optimization based QSAR model was used to screen recently reported docking-based natural product hits. Together, the fragment/fingerprint analysis results justified the selection of the potential hits retrieved through this QSAR-derived prediction.
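The activity labelling and the validation statistics named above can be sketched as follows. The sketch assumes the usual convention that a lower IC50 means a more potent inhibitor, so a compound is labelled active (1) when its IC50 is below 10,000 nM. The function names, the strict comparison at the threshold boundary, and the toy IC50 values and predictions are illustrative assumptions; they are not taken from Table S1 or from the software used in the study.

```python
import math

ACTIVITY_THRESHOLD_NM = 10_000  # IC50 threshold stated in the text


def label_activity(ic50_nm, threshold_nm=ACTIVITY_THRESHOLD_NM):
    """Return 1 (active) if IC50 is below the threshold, else 0 (inactive).

    Treating IC50 == threshold as inactive is an assumption.
    """
    return 1 if ic50_nm < threshold_nm else 0


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy, balanced accuracy and MCC."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "mcc": mcc,
    }


# Toy data (hypothetical): IC50 values in nM and made-up model predictions.
toy_ic50_nm = [150, 9000, 12000, 50000, 800, 25000]
toy_labels = [label_activity(v) for v in toy_ic50_nm]
toy_predictions = [1, 0, 0, 0, 1, 1]

toy_metrics = classification_metrics(toy_labels, toy_predictions)
print(toy_labels)
print(toy_metrics)
```

Reporting MCC alongside accuracy is useful for a set such as this one (27 actives against 61 inactives), because accuracy alone can look favourable when a model simply favours the majority class.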
To construct an interpretable QSAR model, gradient boosting machine (GBM), random forest (RF), support vector machine (SVM) and k-nearest neighbour (kNN) models were built using structural and physico-chemical interpretation (SPCI) analysis (Table 1). The parameter settings used to develop the individual models (GBM, RF, SVM and kNN) are given in Table S2. A consensus model was also developed to reduce the bias of the individual models. The fragments obtained from the different models are depicted in Figure 1. These fragments were found to make different positive and negative contributions towards 3CLpro inhibitory activity.

The 88 compounds with diverse structural features were also used in the Monte Carlo optimization based classification QSAR analysis [3, 42]. Twenty-one models from three different splits were generated using SMILES and graph-based descriptors combined with different connectivity indices [39]. The overall statistical characteristics of the twenty-one models are given in Table 3. … in compounds 005, 009 and 018 (Figure 4). The optimization results also highlighted fragments similar to those obtained from our SPCI analysis. The fragments shared by the two analyses are highlighted in Table 4. Attributes such as .(...c...) and ++++O---B2== were also found in some less active Mpro inhibitors (039, 042, 087 and 067; Figure 4), but strongly negatively contributing groups in these compounds reduce their SARS-CoV Mpro inhibitory activity.

Since SARS-CoV-2 Mpro shares about 96% sequence similarity with SARS-CoV Mpro (while the genomes have over 80% identity), previously reported SARS-CoV Mpro inhibitors have good prospects of being effective against SARS-CoV-2 Mpro as well.
Thus, considering the high statistical significance of the best Monte Carlo optimization based QSAR model, we applied model M21 (SMILES and HSG with 1ECk) from split-3 to perform a QSAR-derived prediction for a library of natural product hits from recent publications [7-9, 13, 16-20, 23, 24, 26-28]. The natural product hits are listed in Table S4. After screening with model M21, 13 molecules of natural origin were predicted as actives (Table 5). Previous studies also reported that these hits potentially bind to active site amino acid residues of SARS-CoV-2 Mpro [8, 13, 16, 18, 24, 27, 28].

The molecular docking study performed by Das and co-workers suggested that rutin (also known as vitamin P) forms non-covalent interactions with the SARS-CoV-2 Mpro active site residues [13]. It forms hydrogen bonds with H41, L141, N142, E166, T190 and Q192. Moreover, π-sulphur and π-alkyl interactions were observed with C145 and P168, respectively. Hesperidin forms an amide-π stacked interaction with T45, π-alkyl interactions with M49 and C145, and hydrogen bonds with T24, T25, T45, S46 and C145 [13]. Both of these dietary polyphenols (rutin and hesperidin) have low systemic toxicity, which indicates promising potential for the treatment of COVID-19. Apart from hesperidin, another active constituent of Citrus aurantium, neohesperidin, was also predicted to be active by our QSAR model. Moreover, kouitchenside I and deacetylcentapicrin from plants of the genus Swertia were also predicted as actives. A pentacyclic triterpene, 22-hydroxyhopan-3-one from Cassia siamea (Fabaceae), showed a promising AutoDock Vina 4.2 binding affinity (-8.6 kcal/mol) against the Mpro of SARS-CoV-2 (PDB: 6LU7) [18]. 22-Hydroxyhopan-3-one forms a conventional hydrogen bond with K137 along with alkyl and π-alkyl interactions with L275, L287, L286 and Y239, respectively [18].
Oolonghomobisflavan-A is an important polymerized polyphenol present in tea. Semi-flexible docking with the CDOCKER utility of Discovery Studio suggested that oolonghomobisflavan-A forms two π-alkyl interactions (M165, H41) and one π-π T-shaped interaction (H41), as well as several hydrogen bonds with T25, N142, H163, E166, R188 and H164 [16]. Its binding free energy of -256.875 kJ/mol was better than that of the drug lopinavir (-250.585 kJ/mol) according to MM-PBSA calculations [16]. Quercetin 3-vicianoside interacts with the catalytic amino acid residues (PDB: 6LU7) by forming hydrogen bonds with L141, G143, S144, H163 and E166 and hydrophobic contacts with T25, H41, F140, N142, C145, H164, M165, D187, R188 and Q189 [9]. Myricitrin (plant source: Myrica cerifera) showed a docking score of -15.64 and a binding affinity of -22.13 kcal/mol [24]. It forms several hydrogen bonds and other interactions with amino acid residues T24, T25, T26, L27, H41, C44, S46, M49, L141, N142, G143, S144, C145, H163, E166 and Q189 [24]. Baicalin (plant source: S. baicalensis) is an experimentally validated antiviral agent against SARS-CoV [54] and SARS-CoV-2 [22]. Notably, baicalin exhibited an IC50 of 6.41 µM against SARS-CoV-2 Mpro along with a Kd of 11.50 µM [21]. In addition, the docking study of baicalin performed by Islam et al. showed one hydrophobic, one π-sulfur and six hydrogen bonding interactions with the catalytic residues of SARS-CoV-2 Mpro (AutoDock Vina score of -8.1 kcal/mol and GOLD score of 59.19) [8]. Cyanidin 3-glucoside exhibits numerous hydrogen bonding and hydrophobic interactions (AutoDock Vina score of -8.4 kcal/mol), one of the hydrophobic interactions being with the catalytic C145 [8]. These hits could be tested for their in vitro and in vivo inhibition potential against SARS-CoV-2 Mpro.
Further, the backbone structures of these molecules could be exploited to develop more potent Mpro inhibitors in future.

Quantitative structure-activity relationship (QSAR) study is an efficient technique that extracts crucial information from complex datasets. QSAR modelling captures the effect of structural and physicochemical features of compounds on the investigated biological activity and also allows predictions to be made for virtual libraries. In the current study, we developed multiple classification QSAR models with a diverse dataset of SARS-CoV Mpro inhibitors.

Shovanlal Gayen: Conceptualization, Writing-Reviewing and Editing. Tarun Jha: Writing-Reviewing and Editing, Supervision.

The authors have no conflict of interests.

Drug development and medicinal chemistry efforts toward SARS-coronavirus and Covid-19 therapeutics
Recent discovery and development of inhibitors targeting coronaviruses
Chemical-informatics approach to COVID-19 drug discovery: Monte Carlo based QSAR, virtual screening and molecular docking study of some in-house molecules as papain-like protease (PLpro) inhibitors
Andrographolide as a potential inhibitor of SARS-CoV-2 main protease: an in silico approach
A molecular modeling approach to identify effective antiviral phytochemicals against the main protease of SARS-CoV-2
In silico screening of natural compounds against COVID-19 by targeting Mpro and ACE2 using molecular docking
Discovery of potential multi-target-directed ligands by targeting host-specific SARS-CoV-2 structurally conserved main protease
Marine natural compounds as potents inhibitors against the main protease of SARS-CoV-2. A molecular dynamic study
Targeting SARS-CoV-2: a systematic drug repurposing approach to identify promising inhibitors against 3C-like proteinase and 2′-O-ribose methyltransferase
An investigation into the identification of potential inhibitors of SARS-CoV-2 main protease using molecular docking study
Ul-Haq, Identification of chymotrypsin-like protease inhibitors of SARS-CoV-2 via integrated computational approach
Using Integrated Computational Approaches to Identify Safe and Rapid Treatment for SARS-CoV-2
Identification of bioactive molecules from Tea plant as SARS-CoV-2 main protease inhibitors
Unravelling lead antiviral phytochemicals for the inhibition of SARS-CoV-2 Mpro enzyme through in silico approach
Potential Inhibitors of Coronavirus 3-Chymotrypsin-Like Protease (3CLpro): An in-silico screening of Alkaloids and Terpenoids from African medicinal plants
Understanding the binding affinity of noscapines with protease of SARS-CoV-2 for COVID-19 using MD simulations at different temperatures
Identifying potential treatments of COVID-19 from Traditional Chinese Medicine (TCM) by using a data-driven approach
Discovery of baicalin and baicalein as novel, natural product inhibitors of SARS-CoV-2 3CL protease in vitro
Scutellaria baicalensis extract and baicalein inhibit replication of SARS-CoV-2 and its 3C-like protease in vitro
Identification of potential molecules against COVID-19 main protease through structure-guided virtual screening approach
Structural basis of SARS-CoV-2 3CLpro and anti-COVID-19 drug discovery from medicinal plants
Peptide-like and small-molecule inhibitors against Covid-19
Identification of new anti-nCoV drug chemical compounds from Indian spices exploiting SARS-CoV-2 main protease as target
Analysis of therapeutic targets for SARS-CoV-2 and discovery of potential drugs by computational methods
Active constituents and mechanisms of Respiratory Detox Shot, a traditional Chinese medicine prescription, for COVID-19 control and prevention: network-molecular docking-LC-MSE analysis
QSAR-based virtual screening: Advances and applications in drug discovery
Artificial intelligence in chemistry and drug design
Structural insight into the viral 3C-like protease inhibitors: Comparative SAR/QSAR approaches
Design of aminopeptidase N inhibitors as anti-cancer agents
Exploration of histone deacetylase 8 inhibitors through classification QSAR study: Part II
Fight against novel coronavirus: A perspective of medicinal chemists
Computational models identify several FDA approved or experimental drugs as putative agents against SARS-CoV-2
Interpretation of quantitative structure-activity relationship models: past, present, and future
Structural and physico-chemical interpretation (SPCI) of QSAR models and its comparison with matched molecular pair analysis
A quasi-QSPR modelling for the photocatalytic decolourization rate constants and cellular viability (CV%) of nanoparticles by CORAL
QSAR as a random event: modeling of nanoparticles uptake in PaCa2 cancer cells
CORAL: building up QSAR models for the chromosome aberration test
Large-scale QSAR study of aromatase inhibitors using SMILES-based descriptors
The pharmacological potential of rutin
Benefits of hesperidin for cutaneous functions
Effects of oolonghomobisflavan A on oxidation of low-density lipoprotein
Chemistry and health beneficial effects of oolong tea and theasinensins
Chemical and biological research on herbal medicines rich in xanthones
The binding site for neohesperidin dihydrochalcone at the human sweet taste receptor
Plant phenylpropanoids as emerging anti-inflammatory agents
Myricitrin, a nitric oxide and protein kinase C inhibitor, exerts antipsychotic-like effects in animal models
Baicalin, the major component of traditional Chinese medicine Scutellaria baicalensis induces colon cancer cell apoptosis through inhibition of onco miRNAs
Cyanidin-3-glucoside, a natural product derived from blackberry, exhibits chemopreventive and chemotherapeutic activity
In vitro susceptibility of 10 clinical isolates of SARS coronavirus to selected antiviral compounds