key: cord-0792515-r7cpkvgn authors: Kadioglu, Onat; Saeed, Mohamed; Greten, Henry Johannes; Efferth, Thomas title: Identification of novel compounds against three targets of SARS CoV-2 coronavirus by combined virtual screening and supervised machine learning date: 2021-03-30 journal: Comput Biol Med DOI: 10.1016/j.compbiomed.2021.104359 sha: a58d620fd63169ed9167072f2af6e918f62726f3 doc_id: 792515 cord_uid: r7cpkvgn
Coronavirus disease 2019 (COVID-19) is a major threat worldwide due to its rapid spread. As yet, there are no established drugs or vaccines available. Speeding up drug discovery is urgently required. We applied a workflow of combined in silico methods (virtual drug screening, molecular docking and supervised machine learning algorithms) to identify novel drug candidates against COVID-19. We constructed chemical libraries consisting of FDA-approved drugs for drug repositioning and of natural compound datasets from literature mining and the ZINC database to select compounds interacting with SARS-CoV-2 target proteins (spike protein, nucleocapsid protein, and 2'-O-ribose methyltransferase). Supported by the supercomputer MOGON, candidate compounds were predicted as presumable SARS-CoV-2 inhibitors. Interestingly, several drugs approved against hepatitis C virus (HCV), another enveloped (+) ssRNA virus (paritaprevir, simeprevir and velpatasvir), as well as drugs against other transmissible diseases, cancer, or other diseases were identified as candidates against SARS-CoV-2. This result is supported by reports that anti-HCV compounds are also active against Middle East Respiratory Syndrome (MERS) coronavirus. The candidate compounds identified by us may help to speed up drug development against SARS-CoV-2.
In the Chinese city of Wuhan, Hubei province, several cases of a novel, SARS-like, severe pneumonia occurred in December 2019, as confirmed by the Chinese Center for Disease Control and Prevention and the China Office of the World Health Organization on December 31, 2019. Sequencing of the complete genome on January 13th, 2020 showed that it was a novel coronavirus (GenBank No. MN908947). The official name is SARS-CoV-2. The previous, preliminary names were 2019-nCoV or Wuhan virus. The disease caused by SARS-CoV-2 has been termed Coronavirus disease 2019 (COVID-19) [1], which has been declared a global pandemic by the World Health Organization (WHO). SARS-CoV-2 is an enveloped positive-sense single-stranded RNA (ssRNA) virus consisting of 29,903 nucleotides and two untranslated sequences of 254 and 229 nucleotides at the 5'- and 3'-ends, respectively (GenBank No. MN908947) [2]. The putative genes code for a surface spike glycoprotein, an envelope membrane glycoprotein, a nucleocapsid phosphoprotein, a replicase complex and five other proteins, which are comparable to those of SARS-CoV and other coronaviruses. Comparable to SARS-CoV, the novel SARS-CoV-2 enters human cells via binding of the viral spike protein to the human angiotensin-converting enzyme 2 (ACE2) [3, 4]. Some coronaviruses also express hemagglutinin esterase on the surface, which is a shorter spike-like protein. The primary infective hosts were assumed to have been traded as food at the Huanan Fish and Seafood market in Wuhan, since several of the very first patients worked at this market. High sequence similarities of SARS-CoV-2 to coronaviruses in the Malayan (Sunda) pangolin [5] and bats (Rhinolophus sinicus) [3, 6] suggest that the virus might have been transmitted from these animals to human hosts, although other hypotheses have also been put forward. Some coronaviruses (e.g. HCoV-229E, -NL63, -OC43, and -HKU1) usually cause respiratory infections and circulate worldwide in human populations [7].
Other coronavirus species (e.g. SARS-CoV, MERS-CoV, SARS-CoV-2) are rare and show higher mortality rates. In SARS-CoV-2 and MERS-CoV infections, more males than females are affected. Typical symptoms of SARS-CoV, SARS-CoV-2 and MERS-CoV infection include fever, dry cough, dyspnea, loss of taste, muscle pain and other symptoms [8]. As of March 23, 2021, more than 124 million people had been infected and more than 2.7 million deaths had occurred (https://www.worldometers.info/coronavirus/). As yet, there are no drugs or vaccines approved to treat or prevent SARS-CoV-2 infection. Some preliminary experiences with individual treatment attempts or animal experiments using antiviral drugs (e.g. remdesivir, lopinavir, ritonavir, oseltamivir) and also alternative approaches from traditional Chinese medicine have been reported [9] [10] [11] [12]. For instance, clinical trials are running, and some have already been published for remdesivir, lopinavir and ritonavir [13, 14]. The current clinical treatment is largely based on symptomatic therapies [11, 15]. Therefore, strategies for the rapid identification of drug candidates are urgently required. The concept of drug repurposing (or repositioning) came into the spotlight for several reasons [16]. As it became apparent that drugs approved for one disease may also exert activity in other indications, FDA-approved drugs (https://www.fda.gov/) became attractive as a source for new drug development. A considerable advantage of old drugs in terms of time and costs for drug development is that their toxicity profiles and pharmacokinetics in human beings are well known. As the number of newly FDA-approved drugs has been continuously decreasing during the past three decades, drug repurposing may speed up the marketing of new drugs. The dimension of drug development is, however, much broader in the sense that natural products (antibiotics, marine compounds, phytochemicals) represent a large chemical basis for drug development.
Natural products serve as chemical scaffolds for derivatization to obtain novel compounds with improved pharmacological features. Indeed, surveys of the National Cancer Institute, USA, repeatedly demonstrated that three quarters of drugs for all diseases worldwide during the past half century were in one way or another based on natural resources [17, 18]. Hence, chemical scaffolds from natural sources are indispensable for drug development. Another dimension has recently been added by combining virtual drug screening methods with machine learning approaches for the development of new drugs [19, 20], overcoming multidrug resistance [21], and applications in precision medicine to select drugs for individualized therapies [22, 23]. The aim of the present study was to identify candidate drugs using a combined approach of virtual drug screening, molecular docking and supervised machine learning techniques. For this purpose, we used a library of FDA-approved drugs to investigate their potential for repurposing as anti-SARS-CoV-2 drugs as well as two chemical libraries of natural products.
The flowchart of our in silico strategy to identify drug candidates against SARS-CoV-2 included the following steps:
(2) Construction of compound databases: (A) 1,577 FDA-approved drugs (taken from the ZINC database), (B) 39,442 natural products (taken from the ZINC database) and (C) 115 natural products (taken from the literature) were included in the study. Clinically established antiviral drugs were chosen as presumable positive controls and clinically established drugs without antiviral activity were taken as presumable negative controls. All compounds were prepared in three-dimensional sdf format.
(3) Virtual drug screening: All compounds were subjected to PyRx AutoDock VINA (blind docking mode) to generate ranking lists of compounds binding with high affinity to the three target proteins of SARS-CoV-2.
(4) Molecular docking: The top 100 compounds from chemical libraries (A), (B) and (C) were analyzed for their ability to bind to the relevant pharmacophores of the three targets (ACE2 interaction site of the spike protein, RNA-binding site of the nucleocapsid protein and catalytic site of 2'-O-ribose methyltransferase). Compounds with the best binding energies were then subjected to AutoDock VINA and AutoDock 4.2.6 (both in defined docking mode) to identify the amino acid residues involved in drug binding. 3D illustrations of drug-protein interactions were prepared using VMD.
(6) Identification of candidate compounds: Compounds with lowest binding energies of < -7 kcal/mol (from step 4) and probability values of R > 0.995 (from step 5) were proposed as candidate compounds with activity against SARS-CoV-2.
Three sets of compounds were considered for the virtual screening on three proteins (spike protein, nucleocapsid protein, and 2'-O-ribose methyltransferase): FDA-approved drugs (1,577 compounds), natural compounds from the ZINC database (39,442 compounds), and natural compounds with antiviral activity mined from the literature (115 compounds) [27] [28] [29] [30] [31]. Furthermore, antiviral drugs were selected as presumable positive control drugs (27 compounds), and non-cytotoxic antidiabetics, antidepressants, cardiovascular agents, non-steroidal anti-inflammatory drugs (NSAIDs) and proton pump inhibitors were selected as presumable negative control drugs (30 compounds) from the DrugBank database (https://www.drugbank.ca/). As described before, the threshold was set at -7 kcal/mol to consider the affinity of a chemical compound to its target protein as strong [32]. The positive control drugs revealed binding energies of ≤ -7 kcal/mol, while the negative control drugs bound with affinities of > -7 kcal/mol to the three targets (Table 1).
The test compounds were subjected to an automated and comprehensive molecular docking campaign using the AutoDock VINA algorithm in PyRx (blind docking mode) and the high-performance supercomputer MOGON (Johannes Gutenberg University, Mainz). After selecting compounds with strong interaction with the target proteins, further validation was performed with molecular docking. For this purpose, the Lamarckian algorithm of AutoDock VINA was chosen (defined docking mode), and the AutoDock 4.2.6 Lamarckian algorithm was used to analyze the docking poses and binding energies as described before [21, 33].
The positive control drug class was labeled as "1" and the negative control drug class as "0". The chemical descriptors were calculated with the Data Warrior software and selected in a similar manner as previously reported by us, using the SPSS software and considering the correlation of each descriptor with the class (0/1) [21]. After calculation of the 32 chemical descriptors, the correlation coefficients between descriptors and the correlation of each descriptor with the class (1/0; potential drug, yes or no) were determined using SPSS Statistics software version 23.0.0.3 (IBM Corp., Armonk, NY, USA). Descriptors whose correlation with the class was below 0.1 were omitted; only descriptors correlating with the class above 0.1 were selected for further processing. As a next step, descriptors having a pairwise correlation coefficient higher than 0.9 were excluded. By this strategy, relevant descriptors can be selected while limiting the risk of over-fitting. The selected descriptors meeting the criteria were as follows: H-acceptors, H-donors, total surface area, relative PSA, molecular complexity, rotatable bonds, ring closures, aromatic atoms, sp3 atoms, symmetric atoms, amides, and aromatic nitrogens.
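The two-stage descriptor filter described above can be sketched as follows. This is only an illustrative sketch in plain Python, not the SPSS procedure used in the study. Several points are assumptions: the function names `pearson` and `select_descriptors` are hypothetical, the absolute value of the correlation is compared with the thresholds, and when two descriptors correlate above 0.9 the one with the weaker class correlation is dropped (the text does not state which member of a pair was excluded). The descriptor names and values are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / sqrt(sxx * syy)


def select_descriptors(table, labels, min_class_corr=0.1, max_pair_corr=0.9):
    """Keep descriptors related to the class and not redundant with each other."""
    # Stage 1: drop descriptors whose correlation with the class is not above 0.1.
    relevance = {name: abs(pearson(values, labels)) for name, values in table.items()}
    kept = [name for name in table if relevance[name] > min_class_corr]
    # Stage 2 (assumed tie-break): visit the most class-correlated descriptors first
    # and skip any descriptor correlating above 0.9 with one already selected.
    kept.sort(key=lambda name: relevance[name], reverse=True)
    selected = []
    for name in kept:
        if all(abs(pearson(table[name], table[s])) <= max_pair_corr for s in selected):
            selected.append(name)
    return selected


# Made-up values for six control compounds (three of class 1, three of class 0).
labels = [1, 1, 1, 0, 0, 0]
table = {
    "desc_a": [5, 6, 7, 1, 2, 3],
    "desc_b": [5, 6, 8, 1, 2, 3],  # nearly redundant with desc_a
    "desc_c": [2, 1, 2, 1, 1, 0],
    "desc_d": [1, 2, 3, 1, 2, 3],  # unrelated to the class
}
selected = select_descriptors(table, labels)
print(selected)
```

In this toy table, `desc_d` is removed in the first stage and `desc_b` in the second, because it is almost a copy of `desc_a`.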
Leave-one-out random sampling was used to build the models. The correlation matrix approach is among the preferred feature selection techniques. By applying it, we could reduce the risk of overfitting and select only the relevant descriptors that are positively correlated with the target variable (potential drug (1/0) classification). To select the most suitable algorithm, we applied the Orange software (Ljubljana, Slovenia) (https://orange.biolab.si/). We tested 11 different algorithms and found that the neural network performed better than the other algorithms for the nucleocapsid protein and spike protein models, whereas naïve Bayes was the best algorithm for the 2'-O-ribose methyltransferase model. The performance parameters for each model are summarized in Table 2. The top 100 compounds based on lowest binding energy (LBE) from each virtual screening output on the three proteins were selected to evaluate their classes with our prediction models. The receiver operating characteristic (ROC) curves of 3 out of 11 algorithms are depicted in Figure 2. After establishing the prediction models for the spike protein, nucleocapsid protein, and 2'-O-ribose methyltransferase using the positive and negative control drugs (Table 1), we evaluated the therapeutic probability of the selected compounds against SARS-CoV-2. The compounds were ranked according to their binding energy (obtained from the AutoDock VINA-based virtual screening in blind docking mode).
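The leave-one-out scheme mentioned above can be illustrated with a short sketch: each control compound is held out once, a model is fitted on the remaining compounds, and the held-out compound is then predicted. The study used the neural network and naïve Bayes learners of the Orange software; the nearest-centroid classifier below is a deliberately simple stand-in chosen only to keep the example self-contained. The function names and the descriptor values are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, query):
    """Predict the class whose mean descriptor vector is closest to the query."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(column) / len(rows) for column in zip(*rows)]

    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # sorted() makes the choice deterministic if two centroids are equally close.
    return min(sorted(centroids), key=lambda lab: squared_distance(centroids[lab], query))


def leave_one_out_accuracy(x, y):
    """Hold out each sample once, train on the rest, and score the held-out sample."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_x, train_y, x[i]) == y[i]:
            correct += 1
    return correct / len(x)


# Made-up two-descriptor vectors for six control compounds (1 = positive control).
features = [[5, 2], [6, 1], [7, 2], [1, 1], [2, 1], [3, 0]]
classes = [1, 1, 1, 0, 0, 0]
accuracy = leave_one_out_accuracy(features, classes)
print(accuracy)
```

With only 27 positive and 30 negative control drugs, holding out one compound at a time uses nearly all available data for training in every round, which is a common reason to prefer this scheme for small datasets.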
We selected the top 10 compounds from each dataset for each protein model and applied a probability threshold of R > 0.995. These 10 compounds from each dataset were then subjected to two further molecular docking programs for verification. AutoDock VINA implemented in PyRx allowed rapid screening in the blind docking mode, i.e. the best docking pose on the entire target protein surface was investigated. As a next step, we applied two defined docking modes (AutoDock VINA and AutoDock 4.2.6). Those compounds which consistently passed the binding energy threshold of < -7 kcal/mol with all three programs (2 × AutoDock VINA and AutoDock 4.2.6) may be considered more suitable for further investigation than the other compounds (Tables 3-5). In parallel, these sets of 10 compounds each were subjected to supervised machine learning to gain insight into the drug-likeness of the compounds (ROC probability of being class "1" yielded from the prediction models). Eleven different algorithms available in the Orange software were tested for building the prediction models. The neural network algorithm was the best for the spike and nucleocapsid proteins, while naïve Bayes was superior for 2'-O-ribose methyltransferase. Figure 2 displays 3 out of the 11 tested algorithms for illustration. With these prediction models, the test compounds were evaluated, and excellent ROC probabilities were obtained (Tables 3-5), indicating that the test compounds fulfilled the criteria of drug-likeness defined by the 12 chemical parameters underlying the predictive models. Interestingly, among the drugs binding with high affinity to the spike protein were several drugs approved against another enveloped (+) ssRNA virus, the hepatitis C virus (HCV), i.e. paritaprevir, simeprevir and velpatasvir, indicating that these drugs may also be effective in treating COVID-19. Notably, some of the compounds shown in Tables 3-5 bound with high affinity not only to one target protein but also to another one.
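The consensus criterion stated above (binding energy below -7 kcal/mol in all three docking runs and a class "1" probability above 0.995) amounts to a simple filter, sketched here. The `Compound` record, the function `is_candidate`, and all compound names and numbers are hypothetical and do not reproduce values from Tables 3-5.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Compound:
    name: str
    # Binding energies in kcal/mol from the three runs: AutoDock VINA (blind),
    # AutoDock VINA (defined) and AutoDock 4.2.6 (defined).
    energies: Tuple[float, float, float]
    probability: float  # predicted probability of belonging to class "1"


def is_candidate(compound, energy_cutoff=-7.0, probability_cutoff=0.995):
    """True if every docking run and the prediction model pass their thresholds."""
    passes_docking = all(energy < energy_cutoff for energy in compound.energies)
    return passes_docking and compound.probability > probability_cutoff


# Made-up example records.
compounds = [
    Compound("compound_A", (-8.1, -7.9, -9.3), 0.999),
    Compound("compound_B", (-8.5, -6.8, -9.0), 0.999),  # one run above -7 kcal/mol
    Compound("compound_C", (-9.0, -8.2, -8.8), 0.990),  # probability too low
]
candidates = [c.name for c in compounds if is_candidate(c)]
print(candidates)
```

Requiring all three docking runs to agree is stricter than ranking by a single score and reflects the statement that consistently passing compounds are more suitable for follow-up.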
Among the FDA-approved drugs, ivermectin, nystatin, paritaprevir and simeprevir bound to the spike protein and nucleocapsid protein, while conivaptan, dihydroergotamine and ergotamine bound to the nucleocapsid protein and 2'-O-ribose methyltransferase. Among the natural products, crinine, ilexsaponin B2, procyanidin, punicalagin, strictinin, ZINC000027215482 and ZINC000252515584 bound to the spike protein and nucleocapsid protein; loniflavone, ilexsaponin B2, procyanidin and punicalagin bound to the spike protein and 2'-O-ribose methyltransferase; and ilexsaponin B2, procyanidin, punicalagin, tirucallin A, ZINC000253504770 and ZINC000253504766 bound to the nucleocapsid protein and 2'-O-ribose methyltransferase. These "two-in-one" compounds may be attractive for further drug development. Finally, the top compounds were identified as the conclusion from virtual screening, molecular docking and supervised machine learning. The target interactions were highest (1) with the spike protein for simeprevir, euphol and ZINC252515584, (2) with the nucleocapsid protein for paritaprevir, ilexsaponin B1 and ZINC27215482, and (3) with 2'-O-ribose methyltransferase for conivaptan, loniflavone and ZINC15675938. The protein-drug interactions are illustrated in the corresponding figures. The stability of the loniflavone docking pose on the spike receptor binding domain was assessed by MD simulation. As can be seen in Figure 6 and the Supplementary Video, loniflavone interacted stably with the protein. COVID-19 rapidly increased to an epidemic in China. Although still mostly restricted to the Hubei province, there is a reasonable threat that the disease may spread all over the world. With 219 countries and territories affected (status: March 23, 2021), it will be difficult to manage the outbreak without drugs and vaccines available. Therefore, there is an urgent requirement for drugs that inhibit SARS-CoV-2. We have selected three important viral proteins as targets for our combined virtual screening/machine learning approach, i.e.
spike protein, nucleocapsid protein, and 2'-O-ribose methyltransferase. Furthermore, our results from the drug repurposing approach using 1,577 FDA-approved drugs are generally in line with reports on other well-known drugs in the literature, e.g. the anti-malarial artemisinin and its derivatives are also active against viruses, other infectious diseases and cancer [55] [56] [57] [58]. Broad-spectrum activities have also been reported for other classes of pharmacological drugs [59], indicating that drug repurposing represents a fertile reservoir for developing drugs to fight COVID-19. During the past few years, molecular docking has been used for the identification of synthetic and natural drug candidates against targets of MERS-CoV and SARS-CoV such as chymotrypsin-like protease [60] [61] [62] [63], mRNA polymerases [64], and helicase [65]. To the best of our knowledge, we are the first to describe drug candidates against viral proteins of SARS-CoV-2 by a combined virtual screening/molecular docking/supervised machine learning in silico approach.
The World Health Organization. Coronavirus disease (COVID-2019) situation reports. 2020
A new coronavirus associated with human respiratory disease in China
Discovery of a novel coronavirus associated with the recent pneumonia outbreak in humans and its potential bat origin. bioRxiv
Structure analysis of the receptor binding of 2019-nCoV. Biochemical and Biophysical Research Communications
Viral metagenomics revealed Sendai virus and coronavirus infection of Malayan pangolins (Manis javanica). Viruses
Full-genome evolutionary analysis of the novel corona virus (2019-nCoV) rejects the hypothesis of emergence as a result of a recent recombination event
Hosts and sources of endemic human coronaviruses
Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan, China
Clinical characteristics and therapeutic procedure for four cases with 2019 novel coronavirus pneumonia receiving combined Chinese and Western medicine treatment
Prophylactic and therapeutic remdesivir (GS-5734) treatment in the rhesus macaque model of MERS-CoV infection
Critical care management of adults with community-acquired severe respiratory viral infection
Case of the index patient who caused tertiary transmission of COVID-19 infection in Korea: The application of lopinavir/ritonavir for the treatment of COVID-19 infected pneumonia monitored by quantitative RT-PCR
The Australasian COVID-19 Trial (ASCOT) to assess clinical outcomes in hospitalised patients with SARS-CoV-2 infection (COVID-19) treated with lopinavir/ritonavir and/or hydroxychloroquine compared to standard of care: A structured summary of a study protocol for a randomised controlled trial
Effect of remdesivir vs standard care on clinical status at 11 days in patients with moderate COVID-19: A randomized clinical trial
Practical recommendations for critical care and anesthesiology teams caring for novel coronavirus (2019-nCoV) patients
Drug repositioning: identifying and developing new uses for existing drugs
Natural products as sources of new drugs from 1981 to
Natural products as sources of new drugs over the 30 years from 1981 to 2010
Quantitative structure-activity relationship: promising advances in drug discovery platforms. Expert Opinion in Drug Discovery
Concepts of artificial intelligence for computer-assisted drug discovery
A machine learning-based prediction platform for P-glycoprotein modulators and its validation by molecular docking. Cells
Validating the validation: reanalyzing a large-scale comparison of deep learning and machine learning models for bioactivity prediction
Cancer Drug Response Profile scan (CDRscan): A deep learning model that predicts drug effectiveness from cancer genomic signature
Biochemical and structural insights into the mechanisms of SARS coronavirus RNA ribose 2'-O-methylation by nsp16/nsp10 protein complex
Characterization of a viral phosphoprotein binding site on the surface of the respiratory syncytial nucleoprotein
Cryo-EM structures of MERS-CoV and SARS-CoV spike glycoproteins reveal the dynamic receptor binding domains
Drug resistance of human immunodeficiency virus and overcoming it by natural products
Polyphenols: A diverse class of multi-target anti-HIV-1 agents
Anti-HIV activity of southern African plants: Current developments, phytochemistry and future research
Medicinal plants used in the treatment of human immunodeficiency virus
Virtual screening of Chinese herbs with Random Forest
Peptide aptamer identified by molecular docking targeting translationally controlled tumor protein in leukemia cells. Investigative New Drugs
Interactions of human P-glycoprotein transport substrates and inhibitors at the drug binding domain: Functional and molecular docking analyses
Complex interactions between phytochemicals. The multi-target therapeutic concept of phytotherapy
Antischistosomal activity of artemisinin derivatives in vivo and in patients
From ancient herb to modern drug: Artemisia annua and artemisinin for cancer therapy. Seminars in Cancer Biology
Beyond malaria: The inhibition of viruses by artemisinin-type compounds
The activity of Artemisia spp. and their constituents against Trypanosomiasis
Quinolines and quinolones as antibacterial, antifungal, anti-virulence, antiviral and anti-parasitic agents
Biflavonoids from Torreya nucifera displaying SARS-CoV 3CL(pro) inhibition
Flavonoid-mediated inhibition of SARS coronavirus 3C-like protease expressed in Pichia pastoris
Identification of novel drug scaffolds for inhibition of SARS-CoV 3-Chymotrypsin-like protease using virtual and high-throughput screenings
Potential broad spectrum inhibitors of the coronavirus 3CLpro: A virtual screening and structure-based drug design study
Quantitative structure-activity relationship and molecular docking revealed a potency of anti-hepatitis C virus drugs against human corona viruses
Altaher, Design, synthesis and molecular docking of novel triazole derivatives as potential CoV helicase inhibitors
The authors acknowledge the computing time granted on the supercomputer Mogon at Johannes Gutenberg University Mainz (hpc.uni-mainz.de). The authors declare no conflict of interest.