key: cord-0735430-rwueexs7 authors: Zhao, Jun; Ma, Qinhai; Zhang, Baoyue; Guo, Pengfei; Wang, Zhe; Liu, Yi; Meng, Minsi; Liu, Ailin; Yang, Zifeng; Du, Guanhua title: Exploration of SARS-CoV-2 3CL(pro) Inhibitors by Virtual Screening Methods, FRET Detection, and CPE Assay date: 2021-11-19 journal: J Chem Inf Model DOI: 10.1021/acs.jcim.1c01089 sha: cc1afec2e81c7008d2ee4c0e8e524dc1dc43bc40 doc_id: 735430 cord_uid: rwueexs7

COVID-19, caused by a novel coronavirus (SARS-CoV-2), has been spreading around the world since the end of 2019, and no specific drug has yet been developed. 3C-like protease (3CL(pro)) plays an important role in the replication of the novel coronavirus and is a promising target for the development of anticoronavirus drugs. In this paper, eight machine learning models for 3CL(pro) were constructed with naïve Bayesian (NB) and recursive partitioning (RP) algorithms on the basis of optimized two-dimensional (2D) molecular descriptors (MDs) combined with ECFP_4, ECFP_6, and MACCS molecular fingerprints. The optimal models were selected according to the results of 5-fold cross-validation, test set validation, and external test set validation. A total of 5766 natural compounds from our in-house natural product database were screened, of which 369 chemical components were predicted to be active by the optimal models and had EstPGood values greater than 0.6 in the NB (MD + ECFP_6) model. After ADMET analysis, 31 compounds were selected for biological activity determination by the fluorescence resonance energy transfer (FRET) method and the cytopathic effect (CPE) assay. The results indicated that (+)-shikonin, shikonin, scutellarein, and 5,3′,4′-trihydroxyflavone showed some inhibitory activity against SARS-CoV-2 3CL(pro), with half-maximal inhibitory concentration (IC(50)) values ranging from 4.38 to 87.76 μM.
In the CPE assay, 5,3′,4′-trihydroxyflavone showed an antiviral effect with an IC(50) value of 8.22 μM. The binding mode of 5,3′,4′-trihydroxyflavone with SARS-CoV-2 3CL(pro) was further investigated by CDOCKER analysis. In this study, 3CL(pro) prediction models were built with machine learning algorithms to predict active compounds, and the activity of potential inhibitors was determined by the FRET method and the CPE assay; the results provide useful information for the further discovery and development of drugs against the novel coronavirus.

At present, coronavirus disease 2019 (COVID-19), caused by the novel coronavirus SARS-CoV-2, is still circulating worldwide, and highly contagious variants have emerged. According to the World Health Organization (WHO), the number of confirmed COVID-19 cases worldwide had exceeded 247 million as of November 4, 2021, and the cumulative death toll had exceeded 5.0 million. 1 The rapid spread and increasing infectivity of the virus have accelerated interventions worldwide. Vaccines have been introduced to the market, and people in many countries have been vaccinated, 2 but their adverse reactions and duration of protection still require further clinical confirmation. Despite rapid progress in vaccine development, no specific therapeutic drug against this virus has been developed. The main drug treatment strategies include drug repositioning, broad-spectrum screening of antiviral drugs, and discovery of new targeted drugs. However, drugs that appeared active at an early stage, such as chloroquine and remdesivir, failed to significantly reduce clinical mortality in COVID-19 patients as clinical trials progressed. 3, 4 Therefore, screening all potential and available drugs aimed at effective SARS-CoV-2 targets is still necessary to control and alleviate the epidemic.
After entering the host cell, the novel coronavirus replicates and synthesizes large amounts of genetic material and related proteins; mature virus particles are then assembled in the cytoplasm and released from the cell. 5 3C-like protease (3CLpro), also known as Mpro, is an essential enzyme for coronavirus replication: it plays a crucial role in cleaving the viral polyproteins and may interfere with the host's innate antiviral immune response. Inhibiting 3CLpro activity can effectively interfere with coronavirus replication and proliferation. 6 Because 3CLpro is highly conserved across coronaviruses, drugs targeting 3CLpro can reduce mutation-mediated drug resistance and show broad-spectrum antiviral activity. 7 Finding or designing 3CLpro inhibitors is therefore a potential therapeutic strategy for COVID-19. In recent years, virtual screening, as a method of computer-aided drug design and high-throughput screening, has played an important role in drug discovery and development. The most common approaches are molecular docking, pharmacophore modeling, and machine learning. Compared with traditional screening, machine learning is simple and inexpensive and can greatly shorten research time. Several studies have already used virtual screening based on the 3CLpro structure to search for potential drugs against COVID-19. Early in the outbreak, the sequence of 3CLpro was analyzed to construct a 3D homology model, which was used to screen a medicinal plant database containing 32297 potential antiviral phytochemicals, and nine potential anti-SARS-CoV-2 compounds were identified. 8 Gyebi et al.
identified four potential nontoxic, druglike plant-derived 3CLpro inhibitors by screening 62 African plant-derived alkaloids and 100 terpenoids with molecular docking. 9 Although 168 virtual screening studies of 3CLpro have been reported, the accuracy of the screening models is limited, and most model predictions have not been verified experimentally. 10 In this paper, machine learning models for 3CLpro were first established with naive Bayesian (NB) and recursive partitioning (RP) algorithms and used to predict the activity of 5766 natural chemical components in the natural product database established by our laboratory. The predicted compounds were further filtered by ADMET analysis, and the activity of the selected compounds was then determined by the fluorescence resonance energy transfer (FRET) method and the cytopathic effect (CPE) assay. Finally, the mechanism of action of the potential inhibitors was analyzed by molecular docking. The overall workflow is shown in Figure 1. This work provides important information for the further discovery and development of drugs against the novel coronavirus.

2.1. Data Aggregation and Processing. Active ligands of 3CLpro were collected from the BindingDB database (http://www.bindingdb.org). After duplicate structures were removed, 149 active compounds (inhibitors) were obtained, and these active ligands were then used to generate inactive compounds (decoys) with the DUD-E database (http://dude.docking.org). The active and inactive compounds were randomly split at a ratio of 3:1 in DS 2018 (Discovery Studio version 2018, San Diego, CA) into a training set of 112 active and 337 inactive compounds and a test set of 37 active and 113 inactive compounds. 3CLpro inhibitors reported in the literature were collected to form an external test set of 40 active and 120 inactive compounds.
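The data set construction can be illustrated as a stratified random split. The paper states only that DS 2018 split the compounds at a 3:1 ratio; splitting actives and decoys separately, the fixed seed, and Python's `round` as the rounding rule are assumptions of this sketch, so for 450 decoys it yields 338/112 rather than the reported 337/113. Activity labels use the 1/−1 convention of the data sets.

```python
import random


def stratified_split(actives, decoys, test_fraction=0.25, seed=0):
    """Randomly split actives and decoys separately so that the training and
    test sets keep roughly the same active:decoy ratio (an assumption; the
    paper does not describe how DS 2018 performed the split).
    Labels: 1 = active, -1 = inactive."""
    rng = random.Random(seed)
    train, test = [], []
    for compounds, label in ((actives, 1), (decoys, -1)):
        shuffled = list(compounds)
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test += [(c, label) for c in shuffled[:n_test]]
        train += [(c, label) for c in shuffled[n_test:]]
    return train, test


# Placeholder identifiers standing in for the 149 actives and 450 decoys.
train, test = stratified_split([f"A{i}" for i in range(149)],
                               [f"D{i}" for i in range(450)])
```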
The labels 1 and −1 were used to denote active and inactive compounds, respectively, in all data sets. Hydrogen addition, deprotonation, and energy minimization were performed for all compounds before the molecular descriptors (MDs) were calculated.

2.2. Calculation and Optimization of Molecular Descriptors. Molecular descriptors (MDs) quantify properties such as molecular weight, number of atoms, lipid−water partition coefficient, and molecular polar surface area. In this study, 348 molecular descriptors were calculated for the training set compounds with DS 2018, comprising 8 AlogP descriptors, 35 molecular property descriptors, 43 topological descriptors, 7 surface area and volume descriptors, 92 molecular property count descriptors, and 163 E-state keys. Pearson correlation coefficients were calculated to quantify the correlation between each of the 348 descriptors and compound activity. First, a descriptor was removed if the frequency of its most common value exceeded 50%. Then, a descriptor was excluded if its Pearson correlation coefficient 11 with activity was less than 0.1. In addition, of any two descriptors whose mutual correlation coefficient exceeded 0.9, the one less correlated with activity was discarded. Finally, the retained descriptors were subjected to stepwise linear regression to select the descriptors used to construct the classification models.

2.3. Molecular Fingerprints. Molecular fingerprints characterize the structure of a compound by a set of molecular fragments.
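The three correlation-based filters of Section 2.2 can be sketched in plain Python. Assumptions: descriptor values are numeric lists aligned with the activity labels; the 50% rule is read as "the most frequent value occurs in more than half of the compounds"; the 0.1 threshold is applied to the absolute value of r; and the pairwise rule is applied greedily, visiting descriptors in order of decreasing |r| with activity. The final stepwise regression step is omitted.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def filter_descriptors(table, activity, max_mode_freq=0.5, min_r=0.1, max_inter_r=0.9):
    """table: dict mapping descriptor name -> list of values (one per compound)."""
    n = len(activity)
    # Step 1: drop near-constant descriptors (one value in > 50% of compounds).
    kept = {k: v for k, v in table.items()
            if Counter(v).most_common(1)[0][1] / n <= max_mode_freq}
    # Step 2: drop descriptors weakly correlated with activity (|r| < 0.1, assumed absolute).
    r_act = {k: pearson(v, activity) for k, v in kept.items()}
    kept = {k: v for k, v in kept.items() if abs(r_act[k]) >= min_r}
    # Step 3: of any pair with |r| > 0.9, keep the one more correlated with activity.
    selected = []
    for name in sorted(kept, key=lambda k: abs(r_act[k]), reverse=True):
        if all(abs(pearson(kept[name], kept[s])) <= max_inter_r for s in selected):
            selected.append(name)
    return selected
```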
In the present study, SciTegic extended-connectivity fingerprints (ECFPs) were used. To keep the size of the molecular fragments described by the fingerprint within an appropriate range, fingerprints with a diameter of 4 or 6 (ECFP_4 and ECFP_6) were calculated in DS 2018. 12 In addition, the MACCS fingerprint, based on 166 MDL structural keys, was calculated with PaDEL-Descriptor software. 13

2.4. Spatial Distribution Prediction of Compounds. The chemical space distribution of the compounds in the training and test sets strongly affects the predictive ability of a machine learning classification model. In general, when the training set covers a wider region of chemical space, the resulting model has higher prediction accuracy and better generalization; conversely, when the training set covers a narrow region, the applicability of the model is greatly limited. In this study, principal component analysis (PCA) and Tanimoto similarity analysis 14 were used to investigate the chemical space distribution of the compounds in all data sets.

2.5. Naïve Bayesian Classification Model and Recursive Partitioning Model. The NB and RP algorithms were used to build classification models that learn the relationship between molecular descriptors and activity and can then predict the activity of compounds whose activity is unknown. The NB algorithm is a probabilistic classifier based on Bayes' theorem, named after the British mathematician Thomas Bayes. 15 The NB model was built in DS 2018 to learn how to separate inhibitors from decoys based on the compound information in the training set. The RP algorithm classifies samples layer by layer according to a series of rules, mimicking the human learning process.
16 The result of the RP model can be displayed directly as a binary decision tree, so the RP model is also called a decision tree model; it was also built in DS 2018. The minimum number of samples per node, the maximum number of nodes per descriptor, and the maximum tree depth were set to 10, 20, and 20, respectively. Each model was built on the training set, and 5-fold cross-validation on the training set was performed during model building.

2.7.2. FRET Detection of SARS-CoV-2 3CLpro Activity. The amino acid sequence of the 3CLpro supplied in the 2019-nCoV Mpro/3CLpro inhibitor screening kit is identical to that of native novel coronavirus 3CLpro. 3CLpro activity was measured with this kit by the FRET method 19 (Figure 2). A fluorescence donor (Edans) and a fluorescence acceptor (quencher, Dabcyl) are attached to the two ends of the natural substrate of 2019-nCoV 3CLpro; when the substrate is cleaved and the two groups are separated, Edans fluorescence can be detected. The reaction was carried out in a black 96-well plate. First, 93 μL of 3CLpro assay reagent and 5 μL of compound were added successively to each sample well; in the model wells, DMSO replaced the compound, and the control wells received 93 μL of assay buffer and 5 μL of DMSO. The plate was shaken for 1 min to mix the reaction solution thoroughly, and 2 μL of substrate was then quickly added to each well and mixed. The plate was incubated at 37°C in the dark for 15−20 min. Fluorescence was measured with a multifunctional microplate reader (SpectraMax M5, Molecular Devices) at an excitation wavelength of 340 nm and an emission wavelength of 490 nm. The inhibition rate of each test compound was calculated by formula (5).

VeroE6 cells were seeded in 96-well plates (1 × 10⁴ cells/well) and cultured at 37°C in a humidified incubator with 5% CO2.
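Formula (5) for the FRET inhibition rate is not reproduced in this text. A form commonly used for end-point protease assays with this well layout, assumed here rather than taken from the kit, is inhibition (%) = (F_model − F_sample)/(F_model − F_control) × 100, where the model well holds enzyme and DMSO and the control well holds buffer and DMSO. The sketch also computes the selectivity index, SI = TC50/IC50, which the results report for 5,3′,4′-trihydroxyflavone (131.66 μM / 8.22 μM ≈ 16).

```python
def inhibition_rate(f_sample, f_model, f_control):
    """Percent inhibition of 3CLpro from end-point fluorescence (ASSUMED form of formula (5)).

    f_model:   enzyme + DMSO (uninhibited reaction, maximal cleavage)
    f_control: assay buffer + DMSO (no enzyme, background signal)
    f_sample:  enzyme + test compound
    """
    window = f_model - f_control
    if window == 0:
        raise ValueError("model and control wells give the same signal")
    return (f_model - f_sample) / window * 100.0


def selectivity_index(tc50, ic50):
    """Selectivity index: median toxic concentration over antiviral IC50 (same units)."""
    return tc50 / ic50


# Reported CPE values for 5,3',4'-trihydroxyflavone (μM)
print(round(selectivity_index(131.66, 8.22), 1))  # prints 16.0
```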
Cell control, solvent control, virus, and drug treatment groups were set up. After 24 h, cells were infected with SARS-CoV-2 at 100 TCID50 (50% tissue culture infective dose) for 2 h, washed, and cultured with different concentrations of compounds or with fresh culture medium for 3 days. CPE was observed under a light microscope. IC50 values (n = 3) were calculated by the Reed−Muench method and GraphPad Prism 7. All of the above experiments were carried out in a BSL-3 laboratory.

2.9. Molecular Docking. Molecular docking is commonly used in structure-based virtual screening to study possible binding modes between ligands and proteins. Based on the CHARMm force field, CDOCKER 20 in DS 2018 first samples ligand conformations randomly by molecular dynamics and then optimizes each pose in the receptor active site by simulated annealing to produce more accurate docking results. To ensure reliable docking, a crystal structure of a protein−ligand complex with a resolution better than 2.5 Å was used to set up the docking model. The crystal structure of SARS-CoV-2 3CLpro in complex with its active ligand N3 (resolution 2.16 Å) was downloaded from the Protein Data Bank (PDB ID: 6LU7) and prepared in DS 2018. The binding pocket was defined, and the ligand was then extracted from the SARS-CoV-2 3CLpro structure and redocked into the active site. The docked poses were compared with the original ligand conformation in the crystal structure, and the corresponding root-mean-square deviations (RMSDs) were calculated. Of the ten poses generated, more than half had RMSDs below 2 Å.
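The redocking check can be expressed compactly. The sketch below computes the RMSD between each docked pose and the crystal pose of N3 without superposition (redocked poses already share the receptor's coordinate frame) and applies the acceptance rule stated above, that more than half of the poses fall below 2 Å. It assumes the atoms are listed in the same order in every pose and ignores symmetry-equivalent atoms, which dedicated docking tools handle.

```python
from math import sqrt


def rmsd(pose, reference):
    """RMSD between two poses given as lists of (x, y, z) tuples in matching atom order.

    No superposition is performed: redocked poses and the crystal ligand
    share the receptor frame. Symmetry-equivalent atoms are not handled.
    """
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    total = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
                for (x1, y1, z1), (x2, y2, z2) in zip(pose, reference))
    return sqrt(total / len(pose))


def redocking_acceptable(poses, reference, cutoff=2.0):
    """True if more than half of the docked poses lie within `cutoff` (Å) of the crystal pose."""
    values = [rmsd(p, reference) for p in poses]
    return sum(v < cutoff for v in values) > len(values) / 2
```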
The docking method was therefore considered suitable for the studied system, and on this basis the compounds with potential anti-3CLpro activity were analyzed and verified.

3.2. Chemical Spatial Diversity Analysis. PCA of the compounds in the data sets was carried out on the 12 retained molecular descriptors, and the results are presented in Figure 3. The PC1 values of the compounds in the training, test, and external test sets ranged from −6 to 6, the PC2 values from −6 to 4, and the PC3 values from −5 to 4, indicating that the compounds in the three data sets spanned a wide region of chemical space and overlapped well. Tanimoto similarity analysis is another common method for evaluating the chemical space distribution of compounds in data sets: the smaller the Tanimoto similarity coefficient, the greater the diversity of the compounds. We calculated the Tanimoto similarity coefficients of the compounds in the training, test, and external test sets based on the ECFP_6 fingerprint. As shown in Table 1, the Tanimoto similarity coefficients of the three data sets were 0.105, 0.111, and 0.101, respectively, indicating good structural diversity in all three data sets.

3.3. Validation of Classification Models. Based on the NB and RP algorithms, eight classification models (NB-1−NB-4 and RP-1−RP-4) were built using the optimized 2D molecular descriptors alone or combined with ECFP_4, ECFP_6, and MACCS fingerprints. Table 2 shows the results of 5-fold cross-validation and test set validation. The NB-1 and RP-1 models, built only on the 12 DS_2D_MD descriptors, performed poorly: their MCC values were 0.595 and 0.758, respectively, in internal 5-fold cross-validation and 0.507 and 0.760, respectively, in test set validation.
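Two kinds of quantities used in this part of the results can be computed in a few lines. The sketch represents fingerprints as Python sets of on-bit indices (standing in for ECFP_6 features) and assumes that each value in Table 1 is a mean pairwise Tanimoto similarity, which the text does not state. For validation it assumes that Q is the overall accuracy (TP + TN)/N, computes MCC as (TP·TN − FP·FN)/√((TP + FP)(TP + FN)(TN + FP)(TN + FN)), and computes AUC as the probability that a randomly chosen active scores higher than a randomly chosen decoy, with ties counted as one half. Labels follow the 1/−1 convention of the data sets.

```python
from itertools import combinations
from math import sqrt


def tanimoto(a, b):
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0  # two empty prints: 0.0 by this sketch's convention


def mean_pairwise_tanimoto(fingerprints):
    """Average Tanimoto similarity over all unordered pairs (assumed data-set summary)."""
    pairs = list(combinations(fingerprints, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


def q_mcc(y_true, y_pred):
    """Q (assumed: overall accuracy) and Matthews correlation coefficient for 1/-1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t != 1 and p != 1 for t, p in pairs)
    fp = sum(t != 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p != 1 for t, p in pairs)
    q = (tp + tn) / len(pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return q, mcc


def auc(y_true, scores):
    """ROC AUC as the probability that an active outscores an inactive (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t != 1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```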
The classification models built by combining different molecular fingerprints with DS_2D_MD (NB-2−NB-4, RP-2−RP-4) performed better than the models built with DS_2D_MD alone (NB-1 and RP-1); that is, introducing molecular fingerprints greatly improved the predictive ability of the classification models. The NB models with the ECFP_4 (NB-2) and ECFP_6 (NB-3) fingerprints performed better: their MCC values were 0.953 and 0.988, respectively, in internal 5-fold cross-validation and both 0.946 in test set validation. The RP model with the MACCS fingerprint (RP-4) outperformed the RP models with ECFP_4 (RP-2) and ECFP_6 (RP-3) in internal 5-fold cross-validation, but its MCC value in test set validation was slightly lower than those of RP-2 and RP-3. To further investigate the predictive ability of the models, 40 compounds with potential 3CLpro inhibitory activity were collected from recently published literature and combined with 120 decoys to form an external test set. Because NB-2, NB-3, RP-2, RP-3, and RP-4 performed well in internal 5-fold cross-validation and test set validation, these models were further validated on the external test set; the results are shown in Table 3. In external test set validation, the NB models had higher Q and AUC values but lower MCC values, whereas the RP model with the MACCS fingerprint had higher Q, MCC, and AUC values. Considering the results of internal 5-fold cross-validation, test set validation, and external test set validation, five models (NB-2, NB-3, RP-2, RP-3, and RP-4) were used jointly to predict the natural product database of our laboratory.

3.4. … Analysis.
Introducing fingerprints into the NB model provides information on the dominant (favorable) and inferior (unfavorable) structural fragments that play a crucial part in active compounds. Fifteen dominant and fifteen inferior fragments were obtained by analyzing the Bayesian scores of structural fragments in the NB-3 (MD + ECFP_6) model, providing a reference for the rational design of 3CLpro inhibitors. As shown in Figure 4, most of the 15 dominant fragments contained amide bonds, whereas most of the 15 inferior fragments contained sulfonyl groups and negatively charged nitrogen atoms. This suggests that amide bonds favor the inhibition of 3CLpro, while sulfonyl groups and negatively charged nitrogen atoms are unfavorable.

3.5. Prediction Results for Compounds. A total of 5766 natural chemical components in our laboratory database were screened; 347 compounds were predicted to be active by the five models and had EstPGood values greater than 0.6 in the NB-3 (MD + ECFP_6) model. ADME analysis was then carried out to remove compounds meeting any of the following conditions: (1) solubility no more than 8, (2) predicted CYP2D6 inhibition, or (3) absorption level greater than or equal to 2. After this step, 202 compounds remained; the distribution of their ADME parameters is given in Figure 5. Toxicity prediction was then performed to eliminate compounds with a predicted toxicity probability greater than 0.7. Finally, 139 compounds were retained, and 31 compounds (Supporting Information Table S1) were selected for in vitro activity testing.

3.6. FRET Detection of SARS-CoV-2 3CLpro. With ebselen as a reference compound, the inhibitory activity of the 31 compounds against SARS-CoV-2 3CLpro was measured with the FRET assay.
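The ADMET funnel of Section 3.5 amounts to a sequence of threshold rules. The sketch below applies them in the stated order to hypothetical prediction records: the field names are illustrative rather than DS 2018 output columns, and the thresholds are copied from the text as written (including the solubility cutoff of 8) without checking them against the software's own scales.

```python
from dataclasses import dataclass


@dataclass
class AdmetPrediction:
    """Hypothetical container for the predicted properties of one compound."""
    name: str
    solubility: float       # removal if <= 8 (threshold as stated in the text)
    cyp2d6_inhibitor: bool  # removal if predicted to inhibit CYP2D6
    absorption_level: int   # removal if >= 2
    tox_probability: float  # removal if > 0.7


def fails_adme(p):
    """True if any of the three ADME removal conditions holds."""
    return p.solubility <= 8 or p.cyp2d6_inhibitor or p.absorption_level >= 2


def admet_funnel(predictions, max_tox=0.7):
    """Apply the ADME filter, then the toxicity filter; return both surviving lists."""
    after_adme = [p for p in predictions if not fails_adme(p)]
    after_tox = [p for p in after_adme if p.tox_probability <= max_tox]
    return after_adme, after_tox
```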
As shown in Table 4 and Figure 6, the measured IC50 of ebselen was 0.76 μM (Figure 6A), similar to the previously reported value (IC50 = 0.67 μM). Among the 31 compounds, (+)-shikonin and shikonin showed strong activity against SARS-CoV-2 3CLpro, with IC50 values of 4.38 and 4.50 μM, respectively. The IC50 value of … (Figure 6B−E).

3.7. … Cells. On the basis of the 3CLpro inhibition results, the active compounds were further tested for antiviral activity against SARS-CoV-2 at the cellular level, evaluated by the CPE of VeroE6 cells under viral infection. Because (+)-shikonin and shikonin have been reported not to inhibit SARS-CoV-2 replication, 21 the antiviral effects of scutellarein and 5,3′,4′-trihydroxyflavone were evaluated against SARS-CoV-2 in VeroE6 cells. In the CPE assay, 5,3′,4′-trihydroxyflavone showed some antiviral effect (Figure 6F, IC50 = 8.22 μM). Its median toxic concentration (TC50) in the absence of viral infection was 131.66 μM, giving a selectivity index (SI) of 16 (Table 4). The clear separation between the antiviral activity and the cytotoxicity of 5,3′,4′-trihydroxyflavone suggests that it may be a promising candidate for further research toward more potent 3CLpro inhibitors against SARS-CoV-2.

3.8. Verification of Molecular Docking. The binding modes of 5,3′,4′-trihydroxyflavone, scutellarein, and shikonin with SARS-CoV-2 3CLpro were further investigated with CDOCKER (Figure 7). The original ligand N3 of SARS-CoV-2 3CLpro could form seven hydrogen bonds with the residues Glu166, His163, Gly143, Thr190, Gln189, His164, and Phe140 and carbon−hydrogen bonds with Gln189, His164, Glu166, Met165, and His172.
In addition, its potential interactions included pi−alkyl interactions with Ala191 and Pro168 and alkyl interactions with Leu167, Met49, His41, and Met165. 5,3′,4′-Trihydroxyflavone could form hydrogen bonds similar to those of N3 with His163, Phe140, and Glu166, as well as an additional hydrogen bond with Ser144, a pi−alkyl interaction with Met165, and a pi−pi T-shaped interaction with His41. Scutellarein could form hydrogen bonds, pi−alkyl interactions, and a pi−pi T-shaped interaction with His163, Phe140, Glu166, Met165, and His41, similar to 5,3′,4′-trihydroxyflavone; however, scutellarein could also form a carbon−hydrogen bond with Arg188 and a pi−sulfur interaction with Cys145. Shikonin could form a hydrogen bond with His163, similar to N3, a carbon−hydrogen bond with Gln189, alkyl interactions with Met49, His41, and Met165, and a pi−sulfur interaction with Cys145.

To date, the spread of the novel coronavirus has disrupted normal life in many countries around the world and has placed a heavy burden on their economic development. Vaccines against the virus have been introduced to the market, and people in many countries have been vaccinated, but the adverse reactions and the duration of protection after vaccination still require further clinical confirmation. Although relevant drugs are under urgent development, no specific drug is yet on the market, so screening and identifying all potential and available drugs remain important for controlling and alleviating the epidemic. 3CLpro is an enzyme necessary for coronavirus replication that cleaves the viral polyproteins to produce nonstructural proteins and may also interfere with the host's innate antiviral immune response. 3CLpro is highly conserved across coronaviruses and has no homologous protein in humans.
Inhibiting this enzyme can effectively interfere with viral replication and proliferation and reduce mutation-mediated drug resistance. In this study, NB and RP algorithms were used to build classification models for 3CLpro. First, active and inactive compounds of 3CLpro were collected, and molecular descriptors were optimized by correlation analysis and stepwise linear regression. Then, eight classification models were built on the optimized molecular descriptors combined with ECFP_4, ECFP_6, and MACCS fingerprints, and the optimal models were selected according to the results of 5-fold cross-validation, test set validation, and external test set validation. When the natural product database collected in our previous work was screened, 139 chemical components were predicted to be active and had good ADMET parameters. Thirty-one compounds were further tested in vitro by the FRET method, among which (+)-shikonin, shikonin, scutellarein, and 5,3′,4′-trihydroxyflavone showed some inhibitory activity against SARS-CoV-2 3CLpro. In the CPE assay, 5,3′,4′-trihydroxyflavone showed an antiviral effect. The possible binding modes of 5,3′,4′-trihydroxyflavone, scutellarein, and shikonin with SARS-CoV-2 3CLpro were also analyzed with CDOCKER in DS 2018. Shikonin, a purple-red naphthoquinone natural pigment extracted from the root of the natural plant Zongfu, has anticancer, anti-inflammatory, and antibacterial activities and is mainly used to treat acute icteric or nonicteric hepatitis and chronic hepatitis. Shikonin has been reported to effectively inhibit SARS-CoV-2 3CLpro in a FRET assay, 22 which is consistent with our results.
Scutellarein, a flavonoid found mainly in Erigeron karvinskianus, has anti-inflammatory and analgesic effects and is used to dispel wind and dampness, among other uses. Studies have shown that it has some inhibitory activity against coronaviruses. 23 We further verified its inhibition of SARS-CoV-2 3CLpro by virtual screening and FRET analysis. However, shikonin and scutellarein did not inhibit SARS-CoV-2 in the CPE assay. No inhibitory activity of 5,3′,4′-trihydroxyflavone against SARS-CoV-2 3CLpro or SARS-CoV-2 had previously been reported; we found for the first time that 5,3′,4′-trihydroxyflavone inhibits SARS-CoV-2 3CLpro in FRET detection and SARS-CoV-2 in the CPE assay.

In summary, NB and RP virtual screening models were established for the first time to predict natural products active against 3CLpro. The inhibitory activity of 5,3′,4′-trihydroxyflavone against SARS-CoV-2 3CLpro in FRET detection and against SARS-CoV-2 in the CPE assay is reported here for the first time, and its binding mode with SARS-CoV-2 3CLpro was explained and verified by molecular docking. This study lays a foundation for further in vivo and clinical research and may speed up the discovery of new drugs against the novel coronavirus.

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.1c01089: information on the 31 compounds selected by the optimal models and ADMET analysis (…).

World Health Organization. WHO Coronavirus Disease (COVID-19) Dashboard
World Health Organization. Draft Landscape of COVID-19 Candidate Vaccines
World Health Organization.
WHO Director-General's Opening Remarks at the Media Briefing on COVID-1925
SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor
The COVID-19 Pandemic: A Comprehensive Review of Taxonomy, Genetics, Epidemiology, Diagnosis, Treatment, and Control
The crystal structures of severe acute respiratory syndrome virus main protease and its complex with an inhibitor
Structural basis of SARS-CoV-2 3CL(pro) and anti-COVID-19 drug discovery from medicinal plants
Potential inhibitors of coronavirus 3-chymotrypsin-like protease (3CL(pro)): an in silico screening of alkaloids and terpenoids from African medicinal plants
Strengths and Weaknesses of Docking Simulations in the SARS-CoV-2 Era: the Main Protease (Mpro) Case Study
Analytic posteriors for Pearson's correlation coefficient
Novel Scaffold FingerPrint (SFP): applications in scaffold hopping and scaffold-based selection of diverse compounds
PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints
Modeling Tanimoto Similarity Value Distributions and Predicting Search Results
Bayesian methods in virtual screening and chemical biology
Recursive Partitioning with Nonlinear Models of Change
Comparison of the predicted and observed secondary structure of T4 phage lysozyme
Reflection on modern methods: Revisiting the area under the ROC Curve
Highly adaptable and sensitive protease assay based on fluorescence resonance energy transfer
Flexible CDOCKER: Development and application of a pseudo-explicit structure-based docking method within CHARMM
Evaluation of SARS-CoV-2 3C-like protease inhibitors using self-assembled monolayer desorption ionization mass spectrometry
Structure of M(pro) from SARS-CoV-2 and discovery of its inhibitors
Roles of flavonoids against coronavirus infection

Jun Zhao and Qinhai Ma are co-first authors. The authors declare no competing financial interest.
The 5766 predicted compounds are derived from the natural product database of the Screening Center Laboratory of the Institute of Medicine, Chinese Academy of Medical Sciences, and are not publicly available. Other databases can be screened with our models. The data collection and model prediction procedures are described in the Methods section of this paper; the databases involved are BindingDB (http://www.bindingdb.org) and DUD-E (http://dude.docking.org). Discovery Studio version 2018 (BIOVIA) is commercial software. PaDEL-Descriptor software can be downloaded at http://padel.nus.edu.sg/software/padeldescriptor. This work was supported by the National Natural Science Foundation of China (81673480), the Drug Innovation Major Project (Nos. 2018ZX09711001-003-002 and 2018ZX09711001-012), and the CAMS Major Collaborative Innovation Fund for Major Frontier Research (2020-I2M-1-003).