key: cord-0948963-pkwthszq
authors: Liu, Qiaoming; Wan, Jun; Wang, Guohua
title: A survey on computational methods in discovering protein inhibitors of SARS-CoV-2
date: 2021-10-08
journal: Brief Bioinform
DOI: 10.1093/bib/bbab416
sha: 16c618e2738ae7e61c6307dfd24bd8c8e15d1371
doc_id: 948963
cord_uid: pkwthszq

The outbreak of acute respiratory disease in 2019, namely Coronavirus Disease-2019 (COVID-19), has become an unprecedented healthcare crisis. To mitigate the pandemic, extensive collective and multidisciplinary efforts have been made to facilitate the rapid discovery of protein inhibitors or drugs against COVID-19. Although many computational methods to predict protein inhibitors have been developed [1–5], few systematic reviews of these methods have been published. Here, we provide a comprehensive overview of the existing methods for discovering potential inhibitors of the virus that causes COVID-19, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). First, we briefly categorize and describe computational approaches according to the basic algorithms they involve. Then we review the related biological datasets used in such predictions. Finally, we discuss in detail current knowledge on SARS-CoV-2 inhibitors, with the latest findings and developments of computational methods for uncovering protein inhibitors against COVID-19.

Since the first case of the new coronavirus disease, Coronavirus Disease-2019 (COVID-19), was reported in Hubei Province, China in December 2019, approximately 18 months have passed, during which the local epidemic turned into a global pandemic. As of 8 June 2021, a total of about 174 million people had been infected, including over 3 870 000 deaths worldwide [6]. The pandemic has had devastating consequences not only for human lives but also for the global economy, including more than 8.5 trillion US dollars lost in 2020 and 2021 [7, 8]. Therefore, there is an urgent need to control the pandemic by accelerating the development or production of effective drugs against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). According to previous studies [9, 10], SARS-CoV-2 is a single-stranded enveloped RNA virus with a symmetrical nucleocapsid. The viral genome of SARS-CoV-2 is highly similar to those of SARS-CoV and MERS-CoV [11], whose outbreaks happened within the past two decades in China and Saudi Arabia, respectively. Hence, drugs or inhibitors designed for SARS-CoV and MERS-CoV have been considered for use against SARS-CoV-2 as well. For example, SARS-CoV enters target cells through the structural spike (S) protein, which binds to the angiotensin-converting enzyme 2 (ACE2) receptor [10]. The conservation of the spike protein in SARS-CoV-2 suggests that the same interaction between the spike protein and the ACE2 receptor is retained during infection. In addition, other proteins can be potential targets: those that play critical roles in viral genome replication and gene transcription, e.g. RNA-dependent RNA polymerase (RdRp), and those involved in cleavage and activation of the spike protein to enter the host cell and assist genome replication, e.g. type 2 transmembrane serine protease (TMPRSS2). These targets are similarly conserved among SARS-CoV, MERS-CoV and SARS-CoV-2.

Regardless of their unknown side effects, effective vaccines have been developed against SARS-CoV-2 infection, such as the BioNTech vaccine (from Germany), the Moderna vaccine (from the USA), the Sinopharm vaccine (from China) and the AstraZeneca vaccine (from Britain) [12]. Among them, messenger RNA (mRNA)-based vaccines are a relatively novel technology that remains to be further proven. Both available mRNA-based vaccines, Moderna and BioNTech, encode the spike protein of SARS-CoV-2, which binds to the ACE2 receptor. However, SARS-CoV-2 has mutated frequently during its evolution and transmission [13, 14], resulting in genetic variation in the population of circulating viral strains throughout the COVID-19 pandemic. As of June 2021, multiple major variants of SARS-CoV-2 had spread across the world, e.g. the Alpha variant (found in England), the Beta variant (found in South Africa) and the Delta variant (found in India) [15]. Variants of SARS-CoV-2 have different characteristics, so the efficacy of the existing vaccines against the mutated virus is uncertain [16]. Hence, the design of inhibitors or drugs for specific SARS-CoV-2 variants is still necessary and challenging. Figure 1 presents the timeline of major events related to the SARS-CoV-2 outbreak and vaccine development from 2020 until 30 June 2021.

Compared with the traditional drug/inhibitor design process, which is time-consuming and costly, computational methods can efficiently predict or identify potential molecules for disease treatment [17]. Thus, computer-aided approaches have great potential for rapidly designing drugs or vaccines for mutated SARS-CoV-2. In recent months, several small molecules have been identified as potential inhibitors targeting SARS-CoV-2, even though more experimental validation on the molecular targets is needed. Among the molecular targets of SARS-CoV-2, the main protease (Mpro), also called 3-chymotrypsin-like protease (3CLpro) [18], structural proteins (e.g. spike protein) and nonstructural proteins, such as RdRp and helicase [19], are highly conserved as well as essential to the viral life cycle. The structural information and functional roles of these major molecular targets of SARS-CoV-2 are summarized in Table S1 in the supplementary material.

We start the review with the diverse computational methods for drug and inhibitor design, followed by detailed descriptions and discussions of the findings on multiple enzymes as valid targets for potential inhibitors to treat coronavirus diseases.

Computer-Aided Drug Design (CADD) has emerged as an efficient way to uncover potential lead compounds and aid the development of possible drugs for a wide range of diseases, based on the knowledge collected in huge compound libraries [20]. Typically, CADD comprises three types of approaches: structure-based drug design (SBDD), ligand-based drug design (LBDD) and virtual screening (VS). Furthermore, machine learning-based drug design (MLDD) has been widely applied with the rapid development of computer science [21, 22]. Herein, we provide a brief summary of CADD approaches and related databases, as shown in Figure 2.

With the development of chemical biology and structural biology technology, the structural information of more and more drugs has been uncovered, providing essential elements for SBDD. Relying on the 3D structure of targets (proteins), determined for example by X-ray crystallography or NMR spectroscopy, SBDD predicts potential interactions by evaluating the binding strength between small-molecule compounds and targets of known structure. Molecular docking, a molecular modeling technique and the most basic method in SBDD, searches for the most suitable binding conformation of a small molecule in the binding pocket of the protein. At its core, molecular docking is a search algorithm in which the ligand conformation is repeatedly updated until it converges to the lowest energy. It can effectively identify the ligand molecules that match the spatial and electrostatic characteristics of the active sites of the target receptors. At present, molecular docking plays an increasingly important role in SBDD [23]. Some common molecular docking programs are listed in Table S2, including AutoDock [24], AutoDock Vina [25], AutoDockFR [26], ZDOCK [27], Glide [28], Flare [29], Induced Fit [30], MolDock [31] and M-ZDOCK [32].
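The search-and-score idea behind docking can be illustrated with a toy sketch. It is not how AutoDock, Glide or any of the listed programs work internally: the ligand is assumed rigid and only translated, the scoring function is a generic 12-6 pairwise term with an assumed ideal contact distance `r0`, and the search is a simple random hill climb. All function names are hypothetical.

```python
import math
import random

def score(ligand, pocket, r0=1.5):
    """Toy 12-6 pairwise energy between ligand and pocket points; lower is better.
    r0 is an assumed ideal contact distance, not a parameter of any real tool."""
    energy = 0.0
    for lig_atom in ligand:
        for pocket_atom in pocket:
            # Clamp very short distances so the repulsive term stays finite.
            r = max(math.dist(lig_atom, pocket_atom), 0.3)
            s = (r0 / r) ** 6
            energy += s * s - 2.0 * s
    return energy

def translate(ligand, shift):
    """Rigidly shift every ligand point by the same (dx, dy, dz)."""
    dx, dy, dz = shift
    return [(x + dx, y + dy, z + dz) for x, y, z in ligand]

def dock(ligand, pocket, steps=2000, step_size=0.5, seed=0):
    """Random hill climb over translations: keep a trial pose only if it lowers the energy."""
    rng = random.Random(seed)
    best_shift = (0.0, 0.0, 0.0)
    best_energy = score(ligand, pocket)
    for _ in range(steps):
        trial = tuple(c + rng.uniform(-step_size, step_size) for c in best_shift)
        energy = score(translate(ligand, trial), pocket)
        if energy < best_energy:
            best_shift, best_energy = trial, energy
    return best_shift, best_energy
```

Real docking programs additionally sample ligand rotations and torsions and use calibrated scoring functions; the sketch only shows the iterate-until-lower-energy loop described above.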

However, the 3D structures of some drug targets have not yet been resolved. For such cases, another approach, LBDD, was developed; it takes advantage of existing compounds with known biological activities and establishes the relationship between query molecules and these bioactive molecules. In general, LBDD first converts the molecular structures in a constructed database into numerical descriptors, e.g. molecular fragments, physicochemical properties, topology and pharmacophores, and then models the relationship between these descriptors and molecular activity using specific design models. New drug molecules can be predicted or designed with proper statistical methods, whereas their possible targets can be inferred from the bioactive molecules with high chemical similarity to the query. In addition to the two common LBDD methods, quantitative structure-activity relationship (QSAR) and pharmacophore modeling [33], other popular LBDD-based software and the corresponding information are listed in Table S3, including McQSAR [34], SYBYL-X [35], TOPS-MODE [36], LigandScout [37], PLIP [38], FindSite-metal [39] and CORAL [40].
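One widely used way to compare a query molecule with known bioactive molecules is the Tanimoto coefficient on binary fingerprints. The sketch below assumes each molecule has already been converted into a set of "on" fingerprint bit indices by some descriptor tool; the function names and the example data are hypothetical and not taken from any of the listed software.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def rank_by_similarity(query_bits, library):
    """Rank library molecules (name -> on-bit indices) by similarity to the query, best first."""
    scored = [(name, tanimoto(query_bits, bits)) for name, bits in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Made-up fingerprints for illustration only.
known_actives = {"mol_A": {1, 2, 3, 8}, "mol_B": {2, 3, 4}, "mol_C": {10, 11}}
print(rank_by_similarity({1, 2, 3}, known_actives))
```

The top-ranked known actives would then suggest likely targets or activities for the query molecule.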

VS is another technique, which uses high-performance computing to analyze large databases of compounds and identify potential drug candidates that bind well to targets of known structure [22]. There are two specific strategies for VS: receptor structure-based and ligand similarity-based. Although the detailed strategies differ, the following four steps are essential: (i) preparing the target protein and compound database; (ii) docking the molecules in the molecular library with the target one by one; (iii) obtaining a reasonable binding mode according to the scores of the binding modes between small molecule and target, then evaluating the binding strength; (iv) purchasing the top-ranked screened compounds for activity tests.
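Steps (ii) to (iv) above amount to a score-sort-select loop. The following minimal sketch assumes a scoring function is supplied by the caller (for example, a docking score where lower means stronger predicted binding); `virtual_screen` and its parameters are illustrative names, not the interface of any tool in this review.

```python
from typing import Callable, Dict, List, Tuple

def virtual_screen(
    library: Dict[str, object],
    score_fn: Callable[[object], float],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Score every compound, rank by score, and return the top_k candidates."""
    scored = []
    for name, compound in library.items():
        # Step (ii): evaluate each library compound against the target, one by one.
        scored.append((name, score_fn(compound)))
    # Step (iii): rank by predicted binding strength (assumed: lower score = stronger).
    scored.sort(key=lambda item: item[1])
    # Step (iv): shortlist the best-ranked compounds for purchase and activity tests.
    return scored[:top_k]
```

In practice `score_fn` would wrap a docking run or a similarity search; only the shortlisted compounds need to be bought and assayed.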

The whole VS process can be carried out on computers by indexing the structures of the compound molecules in the database, so real compounds need to be purchased and tested only after the selection. VS is therefore more convenient, cost-efficient and faster than experimental synthesis and testing. Table S4 lists some common VS tools with brief introductions, e.g. PyRx [41], LiSiCA [42], MTiOpenScreen [43], iScreen [44], DockThor [45], GOLD [46] and FlexX-Scan [47].

Machine learning (ML) is an advanced data analysis method in which a model improves automatically by learning from data and patterns [48]. ML technologies have been widely used in many fields, such as computer vision [49–51], natural language processing [52–55] and bioinformatics [56–60]. MLDD adopts various algorithms, such as recursive partitioning, support vector machine (SVM), k-nearest neighbors and neural networks [61–63], to investigate the activities of compounds against a target before clinical trials [64, 65]. For example, Holden et al. [66] applied the SVM classification algorithm to structure-activity relationship analysis to predict the inhibition of dihydrofolate reductase by pyrimidines. Meng et al. [67] proposed persistent spectral-based ML models for drug design, which consist of the persistent spectral graph, persistent spectral simplicial complex and persistent spectral hypergraph based on spectral theory. The integration of SARS-CoV-2-related studies with modern ML algorithms has now become a hot topic in drug repurposing [68–70].
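As a small illustration of one of the algorithms named above, the sketch below implements a k-nearest-neighbors activity classifier. It assumes compounds are represented as sets of fingerprint on-bit indices and uses Tanimoto similarity as the neighborhood measure; the representation, the function names and the labels are assumptions made for this example, not the setup of any cited study.

```python
from collections import Counter

def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def knn_predict(query_bits, training, k=3):
    """Predict a label by majority vote among the k most similar training compounds.
    training is a list of (fingerprint_bits, label) pairs."""
    neighbours = sorted(
        training, key=lambda item: tanimoto(query_bits, item[0]), reverse=True
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Made-up training data for illustration only.
training = [
    ({1, 2, 3}, "active"),
    ({1, 2, 4}, "active"),
    ({7, 8, 9}, "inactive"),
    ({8, 9, 10}, "inactive"),
    ({1, 3, 5}, "active"),
]
print(knn_predict({1, 2, 3, 4}, training))
```

An SVM or neural network would replace the vote with a learned decision function, but the input (descriptors) and output (predicted activity) play the same roles.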

Drug discovery and development have given rise to many databases containing essential information and knowledge on combinatorial chemistry synthesis, genomics/genetics and drugs or drug candidates. In turn, these databases enhance and improve CADD. Generally, publicly available datasets for the discovery of drugs or drug candidates can be classified into seven categories (Table S5): (i) chemical molecules with activities against biological assays (e.g. PubChem [71] and ChEMBL [72]); (ii) chemical features of drug compounds (e.g. DrugBank [73] and SuperDRUG2 [74]); (iii) drugs approved by the FDA (e.g. e-Drug3D [75]); (iv) drug targets with genetic and proteomic information (e.g. BindingDB [76] and BioGRID [77]); (v) metabolome or pathway-related information (e.g. HMDB [78] and SMPDB [79]); (vi) drug side effects (e.g. DrugMatrix [80] and SIDER [81]); (vii) clinical databases (e.g. AACT database [82] and PharmGKB [83]).

These databases provide a variety of knowledge about drug candidates, including physicochemical properties and molecular structures, in addition to diverse in vitro, in vivo and clinical data. For example, PubChem [71] is a database of chemical molecules maintained by the National Center for Biotechnology Information (NCBI). It now hosts three dynamically growing primary databases, including 111 million entries of compounds, 293 million entries of substances, and bioactivity results from 1.25 million high-throughput screening assays. Similar to PubChem, ChEMBL [72] is a publicly available database containing binding, functional and ADMET information for drug-like bioactive compounds. Currently, the database consists of 5.4 million bioactivity measurements for more than 1 million compounds and 5200 protein targets, which were manually abstracted from the primary published literature. Other databases focus on drug compounds. For example, DrugBank [73] combines drug data with information on drug targets and drug actions, and has been widely used in drug-target discovery, drug design, drug docking or screening, and drug interaction prediction. It collects approximately 4900 drug entries including 60% more FDA-approved small molecules and 10% more experimental biotech drugs. DrugBank has significantly simplified its infrastructure and text query searches in later updates. The e-Drug3D [75] is a 3D chemical structure database for drugs that provides several collections of drugs and commercial drug fragments. It currently contains 1519 annotated 3D structures of 1305 different FDA-approved drugs with molecular weight less than 2000. Meanwhile, databases of genetic and proteomic information provide another resource for drug design or discovery. As of September 2018, BioGRID [84] had recorded 1 598 688 biological interactions manually annotated from 55 809 publications for 71 species. BioGRID also accumulates details for over 700 000 post-translational modification sites. The recently updated BioGRID also annotates genome-wide CRISPR/Cas9-based screens with gene-phenotype and gene-gene relationships.

During the drug development phases, biological information on therapeutics and metabolism is important and valuable. For example, the HMDB [78], released in 2007, is now considered the standard metabolomic resource for human metabolic studies, including information about human metabolites, physiological concentrations, disease knowledge, chemistry associations, reference spectra and metabolic pathways. Side effects, i.e. adverse reactions to a drug, are a crucial research point in drug repurposing. DrugMatrix [80] was developed around drug toxicity and consists of the comprehensive results of thousands of highly controlled and standardized toxicological experiments. It focuses on toxicity research, with more than 200 compounds tested in vivo in rat tissues and 125 compounds tested in vitro in rat hepatocytes. Clinical data undoubtedly provide high-quality information supporting drug design or discovery. PharmGKB [83] is an open-access database of clinically relevant information, collecting approved drug labels, gene-drug interactions and relationships between genotype and phenotype. The corresponding detailed information can be found in Table S5.

The main protease (Mpro, also known as 3CLpro) is recognized as a key enzyme that plays a dominant role in mediating viral transcription and replication [85]. Since the binding pocket of this enzyme is highly conserved among all coronaviruses, like SARS-CoV, MERS-CoV and HCV, antiviral drugs targeting Mpro may be effective against SARS-CoV-2 as well [86]. Indeed, many recent studies have employed CADD to discover anti-SARS-CoV-2 agents against Mpro by different strategies, e.g. structure-based, ligand-based, VS or ML-based approaches (Figure 3).

For example, structure-based docking approaches have been used to predict inhibitory activity and aid drug design against SARS-CoV-2 Mpro [87]. Yu et al. [88] screened potential drugs by molecular docking to examine the effects of some common antiviral agents such as ribavirin, remdesivir, chloroquine and honeysuckle (a traditional Chinese medicine), as shown in Figure 4. Importantly, they found that luteolin, the main flavonoid in honeysuckle (Figure 3), had a high binding affinity to the same sites of the main protease of SARS-CoV-2 as the control molecule. Motonori Tsuji [4] performed structural refinement and energy calculations in the presence of peptidomimetic α-ketoamide inhibitors (PDB ID: 6Y2G, shown in Figure 5A). In that study, 28 bioactive compounds, including CHEMBL3236740, CHEMBL1447944 and others, were identified as effective anti-SARS-CoV-2 drug candidates (Figure 3). Singh et al. [89] identified several compounds, including glucogallin, mangiferin, N3, remdesivir and X77, with strong binding affinities for Mpro. Furthermore, their results suggest that phlorizin had the lowest binding free energy toward Mpro (Figure 4), followed by glucogallin and mangiferin.

However, long-range interactions have not been discussed as often as short-range interactions in the selection of candidate inhibitors. Sencanski et al. [90] used a protocol with both long-range and short-range interactions to select inhibitor candidates. They applied the informational spectrum method and molecular docking of small molecules to search the DrugBank database. Interestingly, 57 drugs were identified as potential SARS-CoV-2 Mpro inhibitors. Additionally, Tinospora crispa (Figure 3) was recognized as one potential COVID-19 Mpro inhibitor in another independent molecular docking study [91].

To rapidly discover lead compounds for clinical treatment, Jin et al. [86] investigated a mechanism-based inhibitor (N3) using CADD and the crystal structure of SARS-CoV-2 Mpro in complex with N3. They built a predictive model by integrating structure-based virtual screening and high-throughput screening, which assayed over 10 000 compounds as inhibitor candidates of Mpro. One of these compounds, ebselen, also showed potential antiviral activity in cell-based assays (Figures 3 and 4).

Besides small molecules, some researchers have made considerable efforts to find potential candidates against SARS-CoV-2 among natural products (NPs) [92, 93]. For example, Ibrahim et al. [94] screened the MolPort database with molecular docking techniques. The top 5000 natural-like products (NLPs), such as MolPort-000-708-794 and MolPort-044-179-844 (Figure 3), were chosen according to their docking scores. Based on molecular docking and molecular dynamics, they found that the most promising NPs shared the same binding mode with key amino acid residues including HIS164, HIS163 and GLU166. The findings of these studies are expected to provide insight into the field of COVID-19 drug discovery [95–102].

Some recent studies have shown the feasibility of employing VS in the design of inhibitors targeting Mpro. For example, Abel et al. [103] developed a VS method with both ligand- and structure-based approaches. The proposed VS was performed on two NP databases, Super Natural II [104] and Traditional Chinese Medicine [105]. Additionally, they used an integrated drug repurposing approach to identify potential inhibitors against SARS-CoV-2 Mpro. Some compounds, like naldemedine, SN00017653 and pseudostellarin C, were identified as potential inhibitors for the first time (Figure 3). Lee et al. [1] identified potential inhibitors against COVID-19 from the Korea Chemical Bank drug repurposing (KCB-DR) database [106]. The results suggest ceftaroline fosamil (Figure 4) and the hepatitis C virus (HCV) protease inhibitor telaprevir as potential inhibitors of Mpro.

Although some drugs, such as remdesivir, favipiravir or dexamethasone, are known to be beneficial for COVID-19 treatment, they have clinical limitations for different reasons. Hence, Nayak et al. [107] performed VS of a variety of US-FDA-approved drugs using computer-aided tools. The US-FDA-approved drug structures were selected from DrugBank. Among them, arbutin, terbutaline, barnidipine, tipiracil and aprepitant were identified as potential hits. Moreover, tipiracil and aprepitant bound to Mpro consistently, demonstrating potentially promising effects in pharmacologic treatments for COVID-19.

Structure-based VS is adopted to predict the best interaction between a ligand and a molecular target by means of a scoring function. For example, Kumar et al. [108] utilized structure-based VS to identify hit molecules binding with the highest affinity to Mpro. The results indicated that hydrogen bonding and hydrophobic interactions are the major contributing factors in the binding pocket of COVID-19 Mpro. In addition, Hage-Melim et al. [109] used VS approaches based on the structure of the enzyme and two compound libraries to identify apixaban as a potential drug for future treatment of COVID-19. Fischer et al. [110] used shape screening and two docking protocols relevant for pharmacokinetics to narrow down commercially available compounds, leading to the natural compounds (−)-taxifolin and rhamnetin as potential inhibitors of Mpro (Figure 3). These new findings may provide insight for the further understanding and discovery of inhibitor candidates targeting Mpro [5, 111–113].

The reliability and accuracy of the ligand-based CADD method have been demonstrated [114]. Han et al. [2] used ligand-protein docking and molecular dynamics simulation in an ab initio study to explore the binding mechanism and inhibitory ability of two types of drugs: (i) clinically approved drugs including chloroquine, hydroxychloroquine, remdesivir, ritonavir, beclabuvir, indinavir and favipiravir, and (ii) a designed α-ketoamide inhibitor (13b) (Figure 3). The results suggested that chloroquine had the strongest binding affinity for Mpro/3CLpro. Meanwhile, inhibitor 13b has a higher research priority for treating SARS-CoV-2 because of its improved inhibition efficiency. Eleftheriou et al. [115] noted that anticoagulant therapy has been proposed for the treatment of severe SARS-CoV-2-induced pneumonia and that, in particular, DPP-4 inhibitors may be more effective for SARS-CoV-2-infected diabetic patients.

The QSAR model, the classical ligand-based CADD method, has also been utilized in recent inhibitor design studies. For example, Ishola et al. [116] selected data on SARS coronavirus 3C-like protease (3CLpro) inhibitors from the ChEMBL database. They constructed a QSAR model from data with high correlations, yielding a statistically significant model. The analysis revealed that the 3CLpro-compound 21, 3CLpro-compound 22 and 3CLpro-compound 40 complexes (Figure 3) were more stable than the baseline complex (3CLpro-X77). Alves et al. [3] developed QSAR models of these inhibitors and then applied the models in VS of the drugs in DrugBank, conducting similarity searching and molecular docking in parallel. As a result, 42 compounds were identified as consensus computational hits. These hits were also reported independently in subsequent experimental screening studies (https://opendata.ncats.nih.gov/covid19/). Kumar et al. [117] developed a 2D-QSAR model based on multiple linear regression (MLR) with 3CLpro inhibitors. The proposed model clearly exhibited the structural features that enhanced the inhibitory activity against the 3CLpro enzyme. Additionally, the most and least active molecules were investigated using molecular docking tools to explore the molecular interactions involved in binding. Gogoi et al. [118] screened a library of 44 citrus flavonoids using molecular docking. The nontoxic compounds were further investigated with molecular dynamics simulation, and their activity (IC50 value) was predicted with a 3D-QSAR model. They suggested taxifolin (Figure 3) as a potential inhibitor of SARS-CoV-2 Mpro that can be further analyzed in subsequent experiments for the treatment of COVID-19. More literature is available on ligand-based CADD for designing inhibitor candidates targeting Mpro [119, 120].
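To make the MLR-based QSAR idea concrete, the sketch below fits activity as a linear function of numerical descriptors by ordinary least squares, solved through the normal equations in pure Python. It is a generic illustration, not the model of any cited study; the descriptor values and activities in the example are made up, and `fit_mlr` and `predict` are hypothetical names.

```python
def fit_mlr(descriptors, activities):
    """Ordinary least squares via the normal equations.
    Returns [intercept, b1, ..., bp] for activity = intercept + sum(b_i * descriptor_i)."""
    rows = [[1.0] + [float(v) for v in row] for row in descriptors]
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(rows, activities))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("descriptors are collinear; the model cannot be fitted")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]

def predict(coefficients, descriptor_row):
    """Predicted activity for one compound from its descriptor values."""
    return coefficients[0] + sum(c * d for c, d in zip(coefficients[1:], descriptor_row))

# Made-up descriptors (two per compound) and activities, for illustration only.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * a - 0.5 * b for a, b in X]
coef = fit_mlr(X, y)
print(predict(coef, [6, 2]))
```

The sign and size of each fitted coefficient indicate how the corresponding structural descriptor is associated with activity, which is what such models are used to read off. A real QSAR workflow would add descriptor selection and validation on held-out compounds.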

As ML techniques can make predictions based on previous knowledge and well-known patterns, some recent studies have contributed to the development of ML-based CADD methods targeting Mpro. For example, Huang et al. [121] developed a biological activity-based modeling (BABM) approach, by which the activity of a compound against a new target or in other assays can be predicted from its profiles across multiple well-defined assays. This model identified 311 compounds against SARS-CoV-2, 32% of which showed antiviral activity in a cell culture live virus assay. More importantly, the most potent compounds showed half-maximal inhibitory concentrations in the nanomolar range. Nayarisseri et al. [122] proposed a shape-based ML method, which generates the 3D shape-based pharmacophoric features of the seed compound. Furthermore, molecular docking was performed with optimized potential for liquid simulations (OPLS) algorithms to recognize high-affinity compounds targeting Mpro. The shape-based ML reported remdesivir, valrubicin, aprepitant and fulvestrant as the best therapeutic drugs (Figure 3) because they had the highest affinities for the target protein. They also found a novel compound, 'nCorv-EMBS', which is not yet included in public chemical databases (PubChem, ZINC or ChEMBL). The results of toxicity analysis suggested that nCorv-EMBS merits further research as a main protease inhibitor for COVID-19 [122].

Inspired by ensemble learning, Gimeno et al. [123] first applied molecular docking against the structure of Mpro using three popular tools: Glide [28], FRED [124] and AutoDock Vina [25]. Then, they proposed a hybrid ensemble approach to generate hypothetical binding modes relying on the three scoring functions. Seven possible SARS-CoV-2 Mpro inhibitors were predicted: perampanel, carprofen, celecoxib, alprazolam, trovafloxacin, sarafloxacin and ethyl biscoumacetate (Figure 3). Battisti et al. [125] also proposed an inhibitor prediction framework, which not only combines molecular dynamics simulations with molecular docking but also exploits both the feature information of pharmacophore modeling and the flexibility captured by molecular dynamics simulations. The proposed approach identified 10 compounds with high coronavirus inhibition potential.
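One simple way to combine several docking programs is rank averaging, sketched below. This is a generic consensus scheme shown for illustration; it is not claimed to be the hybrid procedure of the studies above. It assumes each program returns a score per compound where lower means better, and the function names and example scores are hypothetical.

```python
def ranks(scores):
    """Map compound name -> rank within one program (1 = best, i.e. lowest score)."""
    ordered = sorted(scores, key=scores.get)
    return {name: position + 1 for position, name in enumerate(ordered)}

def consensus(score_tables):
    """Average each compound's rank across programs; return (name, mean_rank), best first.
    Only compounds scored by every program are kept."""
    per_program = [ranks(table) for table in score_tables]
    common = set.intersection(*(set(table) for table in score_tables))
    mean_rank = {
        name: sum(r[name] for r in per_program) / len(per_program) for name in common
    }
    return sorted(mean_rank.items(), key=lambda item: (item[1], item[0]))

# Made-up scores from three hypothetical docking runs.
run_1 = {"A": -9.0, "B": -7.0, "C": -5.0}
run_2 = {"A": -6.0, "B": -8.0, "C": -4.0}
run_3 = {"A": -7.5, "B": -7.0, "C": -3.0}
print(consensus([run_1, run_2, run_3]))
```

Working on ranks rather than raw scores avoids having to put differently scaled scoring functions on a common footing.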

In addition to traditional data-driven ML modeling, some studies have used deep learning-based approaches to predict potential inhibitors of SARS-CoV-2 Mpro [126]. For example, Park et al. [127] identified some potential drugs against SARS-CoV-2 using a pretrained deep learning drug-target interaction model called Molecule Transformer-Drug Target Interaction. They found that atazanavir, remdesivir, efavirenz, ritonavir and dolutegravir were the chemical compounds showing inhibitory potency against SARS-CoV-2 3CLpro. Interestingly, they found that lopinavir, ritonavir and darunavir, which were designed to target viral proteinases, also bound to the replication complex components of SARS-CoV-2. Bung et al. [128] employed deep generative and predictive models to discover small molecules that inhibit Mpro. Transfer learning and reinforcement learning were applied to optimize the proposed deep learning model, which learned the chemical space around the protease inhibitors. Other features, including multiple physicochemical property filters and VS scores, were used for the final screening as well. Finally, they proposed 33 potential compounds for further synthesis and testing against SARS-CoV-2. Based on the structural model, Zhang et al. [129] applied a deep learning-based VS method to rank and identify protein-ligand interactions. A summary of drugs or inhibitors targeting SARS-CoV-2 Mpro/3CLpro can be found in Table S6.

SARS-CoV-2 contains four structural proteins, namely membrane protein (M), spike protein (S), envelope protein (E) and nucleocapsid protein (N), in addition to 16 nonstructural proteins (NSP1-16, as seen in the next section) [130]. Among them, the S protein mediates the entry of coronaviruses into host cells, making it an attractive antiviral target for COVID-19 treatment.

Computational approaches have been developed to predict potential SARS-CoV-2 inhibitors targeting the S protein. Previous studies demonstrated that ACE2, which is bound by the spike protein of SARS-CoV-2, is the key factor for SARS-CoV-2 to enter host cells (Figure 5B). Hence, ACE2 has become another common target of drug intervention. Wen et al. [131] investigated existing drugs according to their ability to block the binding of the S protein to ACE2. Considering the pathogenesis of SARS-CoV-2 from the perspective of S protein and ACE2 binding, they found some substances, including peptide P6, griffithsin, EK1 and extracts from traditional Chinese medicine, that act against SARS-CoV-2 by binding the ACE2 receptor or the S protein, or by inhibiting the host and virus. Faria et al. [132] also focused on molecules that can inhibit the interaction between the S protein and human ACE2. They discovered some molecules at the interaction sites: four molecules at Tyr-491(Spike)-Glu-37(ACE2) and one at Gly-488(Spike)-Lys-353(ACE2). Furthermore, they found that molecule 1629 and molecule 2542 had significant inhibitory effects on the Gly488-Lys353 and Tyr491-Glu37 sites, respectively, and suggested further laboratory tests on combinations of these molecules that could act at the two interaction sites simultaneously. Additionally, the human furin protease, which cleaves the S1-S2 domains involved in host cell entry, may become a third target. Cubuk et al. [133] docked five drug molecules, favipiravir, hydroxychloroquine, remdesivir, lopinavir and ritonavir, not only on the S protein and main protease but also on the human furin protease. The molecular docking results revealed that the human furin protease can be a potential target for SARS-CoV-2 treatment, whereas remdesivir, a nucleic acid derivative, can be used as a template for designing novel furin protease inhibitors to fight the disease. Taking advantage of DrugBank and PubChem, Unni et al. [134] identified bisoxatin (DB09219), a laxative drug, as a promising repurposable drug from which to develop a new chemical compound for inhibiting SARS-CoV-2 entry into the host, even though bisoxatin has been used to treat constipation and for bowel preparation. GR 127935 hydrochloride hydrate, GNF-5, RS504393 and eptifibatide acetate were found by Tomar et al. [135] to bind to the viral binding motifs of the ACE2 receptor. Table S6 presents a summary of drugs or inhibitors targeting the SARS-CoV-2 S protein and ACE2.

Many computational approaches have also focused on potential SARS-CoV-2 inhibitors targeting the M, N and E proteins, which are believed to be useful for further structure-based VS and other CADD-based drug and vaccine design. Dong et al. [136] searched for homologous templates of all structural proteins of SARS-CoV-2, including the S, E and N proteins. Banerjee et al. [137] identified small-molecule inhibitors targeting the M and E proteins of SARS-CoV-2 by integrating docking and simulation methods. They investigated compounds from an Indian medicinal plant source (Azadirachta indica, or Neem) and found 70 compounds against these two proteins. With molecular dynamics simulations, a few common compounds binding to both the M and E proteins were recognized as potentially inhibiting their functions. Table S6 lists drugs or inhibitors targeting SARS-CoV-2 proteins with essential information.

CADD against SARS-CoV-2: Targeting the nonstructure protein

SARS-CoV-2 nonstructural proteins can be potential targets for inhibiting SARS-CoV-2 as well. For example, RdRp, shown in Figure 5C, plays a crucial role in the coronavirus life cycle, particularly in replication of the viral genome, with the assistance of the nonstructural proteins NSP7 and NSP8 in a polymerase complex. It is therefore not surprising that RdRp has been recognized as an important coronavirus target for drug design. Since SARS-CoV-2 is highly similar to other SARS viruses, target-based VS and molecular docking of antiviral molecules developed against SARS suggested that the antiviral galidesivir holds promise against SARS-CoV-2 as well [138]. Quinupristin was identified as a candidate that can bind in the RNA tunnel of RdRp and block access from both sides, with the potential to prevent viral replication and RNA synthesis [139]. Wu et al. [140] systematically compared the proteins encoded by SARS-CoV-2 genes with those from other coronaviruses, then predicted and built 19 structures by homology modeling. Based on the ZINC drug database and their own NPs database, they found 78 antiviral drugs for SARS-CoV-2 that are currently on the market or undergoing clinical trials.
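Comparing viral proteins across coronaviruses, as in the homology modeling work described above, usually starts from the percent identity of a pairwise alignment. The following is a minimal sketch of that calculation, not the procedure of any cited study. It assumes the two sequences are already aligned and gapped with `-`; the function name `percent_identity` and the two short fragments are hypothetical and are not real viral sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments (illustrative only, not real viral sequences).
query = "MKT-AYIAKQR"
template = "MKTWAYLAKQR"
print(f"{percent_identity(query, template):.1f}% identity")
```

Identity can be defined over different denominators (full alignment length, shorter sequence length, ungapped columns). This sketch uses ungapped columns, so values may differ from those reported by alignment tools.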

Helicase is another viral replication enzyme, responsible for unwinding double-stranded DNA or RNA into two single-stranded nucleic acids during the coronavirus life cycle (Figure 5D). Some studies have also suggested drugs and NPs as potential SARS-CoV-2 helicase inhibitors. For example, one study reported that vapreotide and atazanavir, two approved drugs for treating AIDS-related diarrhea and HIV infection, respectively, significantly interrupt the activity of the SARS-CoV-2 helicase [141]. Mirza et al. [142] proposed an integrative VS and molecular dynamics simulation approach targeting the main protease, RdRp and helicase, and identified compounds that warrant in vitro testing to evaluate their efficacy.

Iftikhar et al. [143] focused on small molecules that specifically bind to three essential proteins (RdRp, 3CLpro and helicase). They found three FDA-approved drugs binding to 3CLpro, one drug-like molecule binding to RdRp, and two drug-like molecules specifically interacting with helicase.

Poly-ADP-ribose polymerase 1 (PARP1, shown in Figure 5E) is also critical for viral replication [144–146]. Ge et al. [147] developed a data-driven drug repositioning framework that combines ML and statistical analysis to explore potential drug candidates against SARS-CoV-2 by integrating large-scale data, including knowledge graphs and transcriptome data from the public domain and the literature. Based on this model, CVL218, a PARP1 inhibitor, was identified as a repurposed therapeutic agent for COVID-19.

The host serine protease TMPRSS2 has a pivotal role in the viral entry of SARS-CoV-2 (Figure 5F). Singh et al. [89] reported strong binding affinity between TMPRSS2 and several compounds, including glucogallin, mangiferin, N3, remdesivir and X77. Among the compounds studied, mangiferin showed the lowest binding free energy, followed by phlorizin and glucogallin.
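To see why a lower (more negative) binding free energy indicates tighter binding, the standard thermodynamic relation ΔG = RT ln(Kd) can be used to convert a free energy into a dissociation constant. The sketch below is a generic illustration, not the calculation performed by Singh et al.; the function name `kd_from_dg`, the default temperature of 298.15 K and the example energies are assumptions chosen for illustration. Docking scores only approximate true binding free energies, so the resulting Kd values should be read as rough orders of magnitude.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def kd_from_dg(dg_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Dissociation constant (mol/L, 1 M standard state) from dG = RT * ln(Kd)."""
    return math.exp(dg_kcal_per_mol / (R_KCAL * temperature_k))


# Hypothetical free energies (kcal/mol), not values from any cited study.
for dg in (-6.0, -8.0, -10.0):
    print(f"dG = {dg:6.1f} kcal/mol -> Kd ~ {kd_from_dg(dg):.2e} M")
```

At room temperature each additional 1.36 kcal/mol of favorable free energy corresponds to roughly a tenfold decrease in Kd, which is why candidate compounds are commonly ranked by ascending binding free energy.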

Additionally, further studies focused on other proteins in the viral replication/transcription process, such as the NSP15 protein (a member of the EndoU enzyme family) [148], C3 (complement component 3) [149] and N7-MTase (guanine-N7 methyltransferase) [150]. Glisoxepide and idarubicin, used to treat diabetes and leukemia, respectively, were identified as the stronger binders of the EndoU enzyme [151]. Table S6 summarizes the drugs or inhibitors targeting these SARS-CoV-2 macromolecules.

Since the outbreak of COVID-19, researchers around the world have put much effort into investigating vaccines and drugs against SARS-CoV-2. CADD and ML techniques have been employed in many studies to target SARS-CoV-2 macromolecules and are considered feasible options for speeding up drug design and discovery. This paper reviewed the theory and applications of these approaches, together with the specific databases used in these studies. We also examined newly reported inhibitors as potential interventions and treatments for COVID-19.

However, considering the variants of SARS-CoV-2, we still face major challenges in ensuring that the vaccines and drugs developed remain effective against different viral strains with specific mutations. It is known that structural variations at, or even close to, the binding sites can dramatically affect ligand binding properties. Gossen et al. [152] redefined the druggability of proteins as an integrated chemical space generated by the multiple conformations that binding sites adopt upon ligand binding. This process revealed the unique blueprint of SARS-CoV-2 Mpro and led to a pharmacophore definition based on the specific structure, which provides a strong foundation for rational drug design against SARS-CoV-2 Mpro. Ugurel et al. [153] analyzed 3458 SARS-CoV-2 genome sequences isolated from 58 countries. They found that the incidence of the C17747T and A17858G mutations in helicase (NSP13) was significantly higher than that of other mutations. However, four drugs (cangrelor, fludarabine, folic acid and polydatin) interrupted both the wild-type and the mutant SARS-CoV-2 helicase, suggesting that they may be the most potent candidates. We expect that our review can bring insight into identifying antiviral inhibitors and potential drug candidates against diverse SARS-CoV-2 variants.
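Mutation labels such as C17747T encode a reference base, a 1-based genome position and an alternate base, and the incidence of a mutation is the fraction of genomes that carry the alternate base at that position. The sketch below illustrates this bookkeeping on toy data; it is not the pipeline of Ugurel et al. It assumes every genome has already been aligned to reference coordinates, and the helper names `parse_mutation` and `mutation_incidence`, as well as the eight-base toy genomes, are hypothetical.

```python
import re

MUTATION_RE = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_mutation(label: str) -> tuple[str, int, str]:
    """Split a label such as 'C17747T' into (reference base, 1-based position, alternate base)."""
    match = MUTATION_RE.match(label.upper())
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def mutation_incidence(genomes: list[str], labels: list[str]) -> dict[str, float]:
    """Fraction of genomes carrying the alternate base of each mutation.

    Assumes all genomes share the reference coordinate system (pre-aligned).
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    incidence = {}
    for label in labels:
        _, pos, alt = parse_mutation(label)
        carriers = sum(
            1 for g in genomes if pos <= len(g) and g[pos - 1].upper() == alt
        )
        incidence[label] = carriers / len(genomes)
    return incidence


# Toy genomes against a hypothetical reference 'ACGTACGT' (not real viral data).
toy_genomes = ["ATGTACGT", "ATGTGCGT", "ACGTACGT", "ATGTACGT"]
print(mutation_incidence(toy_genomes, ["C2T", "A5G"]))
```

With real data the genomes would first need to be aligned to a reference sequence, and positions with ambiguous bases or missing coverage would have to be handled explicitly; both steps are omitted here.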

• Discovering potential inhibitors or drugs of SARS-CoV-2 is critical in mitigating the pandemic impact of COVID-19.

• We give a brief overview of existing computer-aided drug design methods and biological databases used in predicting drugs or inhibitors.

• We provide a systematic review of current knowledge and the latest findings from computational methods for discovering protein inhibitors of SARS-CoV-2.

Supplementary data are available online at https://academic.oup.com/bib.

A computational drug repurposing approach in identifying the cephalosporin antibiotic and anti-hepatitis C drug derivatives for COVID-19 treatment

Potential inhibitors for the novel coronavirus (SARS-CoV-2)

QSAR Modeling of SARS-CoV M(pro) inhibitors identifies sufugolix, cenicriviroc, proglumetacin, and other drugs as candidates for repurposing against SARS-CoV-2

Potential anti-SARS-CoV-2 drug candidates identified through virtual screening of the ChEMBL database for compounds that target the main coronavirus protease

Bioinformatic study to discover natural molecules with activity against COVID-19

COVID-19: molecular targets, drug repurposing and new avenues for drug discovery

Economic, social and political issues raised by the COVID-19 pandemic

The impact of Covid-19 pandemic on corporate social responsibility and marketing philosophy

Emerging therapeutic approaches to combat COVID-19: present status and future perspectives

Structural biology aids the research of new anti-COVID-19 drugs

Natural and nature-derived products targeting human coronaviruses

COVID-19 vaccines: where we stand and challenges ahead

Updated SARS-CoV-2 single nucleotide variants and mortality association

Genetic spectrum and distinct evolution patterns of SARS-CoV-2

Prospective mapping of viral mutations that escape antibodies used to treat COVID-19

Neutralization of variant under investigation B.1.617 with sera of BBV152 vaccinees

In silico screening of some naturally occurring bioactive compounds predicts potential inhibitors against SARS-COV-2 (COVID-19) protease

Potential compounds from several Indonesian plants to prevent SARS-CoV-2 infection: a mini-review of SARS-CoV-2 therapeutic targets

The potential of drug repositioning as a short-term strategy for the control and treatment of COVID-19 (SARS-CoV-2): a systematic review

Development of chemical inhibitors of the SARS coronavirus: viral helicase as a potential target

Comprehensive evaluation of ten docking programs on a diverse set of protein-ligand complexes: the prediction accuracy of sampling power and scoring power

Advancing computer-aided drug discovery (CADD) by big data and data-driven machine learning modeling

Structure-based drug repositioning: potential and limits

Automated docking of flexible ligands: applications of AutoDock

AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading

AutoDockFR: advances in protein-ligand docking with explicitly specified binding site flexibility

ZDOCK: an initial-stage protein-docking algorithm

Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy

FLARE: An integrated software package for friction and lubrication analysis of automotive engines-Part II: Experimental validation

Molecular recognition by induced fit: how fit is the concept?

MolDock: a new technique for high-accuracy molecular docking

M-ZDOCK: a grid-based approach for Cn symmetric multimer docking

Current advances in ligand-based target prediction

McQSAR: a multiconformational quantitative structure-activity relationship engine driven by genetic algorithms

Molecular modeling software packages (Version 2.0). TRIPOS Associates

A TOPS-MODE approach to predict permeability coefficients

LigandScout: 3-D pharmacophores derived from protein-bound ligands and their use as virtual screening filters

PLIP: fully automated protein-ligand interaction profiler

FINDSITE-metal: integrating evolutionary information and machine learning for structure-based metal-binding site prediction at the proteome level

CORAL software: prediction of carcinogenicity of drugs by means of the Monte Carlo method

Small-molecule library screening by docking with PyRx

LiSiCA: a software for ligand-based virtual screening and its application for the discovery of butyrylcholinesterase inhibitors

MTiOpenScreen: a web server for structure-based virtual screening

iScreen: world's first cloud-computing web server for virtual screening and de novo drug design based on TCM Database@Taiwan

DockThor 2.0: a free web server for protein-ligand virtual screening. XIX SBQT-Simpósio Brasileiro de Química Teórica

Docking of phosphonate and trehalose analog inhibitors into M. tuberculosis mycolyltransferase Ag85C: comparison of the two scoring fitness functions GoldScore and ChemScore, in the GOLD software

FlexX-scan: fast, structure-based virtual screening

Deep learning

Nonlinear subspace clustering via adaptive graph regularized autoencoder

TPNE: topology preserving network embedding

Fast sparse deep neural networks: theory and performance analysis

Principal component analysis

Learning the parts of objects by non-negative matrix factorization

Nonlinear dimensionality reduction by locally linear embedding

On spectral clustering: Analysis and an algorithm

Single cell RNA sequencing of human liver reveals distinct intrahepatic macrophage populations

SC3: consensus clustering of single-cell RNA-seq data

Deep learning: new computational modelling techniques for genomics

Clustering single-cell RNA-seq data with a model-based deep learning approach

Deep learning enables rapid identification of potent DDR1 kinase inhibitors

DrugClust: a machine learning approach for drugs side effects prediction

THPep: a machine learning-based approach for predicting tumor homing peptides

A recurrent neural network model to predict blood-brain barrier permeability

Machine learning techniques and drug design

Machine learning techniques applied to the drug design and discovery of new antivirals: a brief look over the past decade

Drug design by machine learning: support vector machines for pharmaceutical data analysis

Persistent spectral-based machine learning (PerSpect ML) for protein-ligand binding affinity prediction

Drug repurposing for COVID-19 using machine learning and mechanistic models of signal transduction circuits related to SARS-CoV-2 infection

A state-of-the-art survey on artificial intelligence to fight COVID-19

Drug repositioning by merging active subnetworks validated in cancer and COVID-19. medRxiv 2021

PubChem in 2021: new data content and improved web interfaces

ChEMBL: a large-scale bioactivity database for drug discovery

DrugBank: a knowledgebase for drugs, drug actions and drug targets

SuperDRUG2: a one stop resource for approved/marketed drugs

E-Drug3D: 3D structure collections dedicated to drug repurposing and fragment-based drug design

BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities

BioGRID: a general repository for interaction datasets

HMDB: the human metabolome database

SMPDB: the small molecule pathway database

An overview of National Toxicology Program's Toxicogenomic applications: DrugMatrix and ToxFX

The SIDER database of drugs and side effects

The database for aggregate analysis of ClinicalTrials.gov (AACT) and subsequent regrouping by clinical specialty

PharmGKB: the pharmacogenetics knowledge base

The BioGRID interaction database: 2019 update

Potential inhibitors against 2019-nCoV coronavirus M protease from clinically approved medicines

Structure of Mpro from SARS-CoV-2 and discovery of its inhibitors

Coronavirus disease 2019 drug discovery through molecular docking

Computational screening of antagonists against the SARS-CoV-2 (COVID-19) coronavirus by molecular docking

Protease inhibitory effect of natural polyphenolic compounds on SARS-CoV-2: an in silico study

Drug repurposing for candidate SARS-CoV-2 main protease inhibitors by a novel in silico method

Biochemical and computational approach of selected phytocompounds from Tinospora crispa in the management of COVID-19

Identification of phytochemical inhibitors against main protease of COVID-19 using molecular modeling approaches

A molecular modeling approach to identify effective antiviral phytochemicals against the main protease of SARS-CoV-2

Natural-like products as potential SARS-CoV-2 M(pro) inhibitors: in-silico drug discovery

Computational design of ACE2-based peptide inhibitors of SARS-CoV-2

Interactive molecular dynamics in virtual reality is an effective tool for flexible substrate and inhibitor docking to the SARS-CoV-2 main protease

An investigation into the identification of potential inhibitors of SARS-CoV-2 main protease using molecular docking study

Repurposing drugs against the main protease of SARS-CoV-2: mechanism-based insights supported by available laboratory and clinical data

Potential of NO donor furoxan as SARS-CoV-2 main protease (M(pro)) inhibitors: in silico analysis

Ligand-based approach for predicting drug targets and for virtual screening against COVID-19

Moroccan medicinal plants as inhibitors against SARS-CoV-2 main protease: computational investigations

Pharmacoinformatics and molecular dynamics simulation studies reveal potential covalent and FDA-approved inhibitors of SARS-CoV-2 main protease 3CL(pro)

Computational prediction of potential inhibitors of the main protease of SARS-CoV-2

Super natural II-a database of natural products

Traditional Chinese medicine database and application on the web

Diversity of compounds in Korea Chemical Bank

Targeting SARS-CoV-2 main protease: a computational drug repurposing study

Computational investigation of potential inhibitors of novel coronavirus 2019 through structure-based virtual screening, molecular dynamics and density functional theory studies

Virtual screening, ADME/Tox predictions and the drug repurposing concept for future use of old drugs against the COVID-19

Potential inhibitors for novel coronavirus protease identified by virtual screening of 606 million compounds

Virtual screening, molecular dynamics and structure-activity relationship studies to identify potent approved drugs for Covid-19 treatment

Putative inhibitors of SARS-CoV-2 main protease from a library of marine natural products: a virtual screening and molecular modeling study

Drug repurposing against SARS-CoV-2 using E-pharmacophore based virtual screening, molecular docking and molecular dynamics with main protease as the target

Exploring the SARS-CoV-2 proteome in the search of potential inhibitors via structure-based pharmacophore modeling/docking approach

In silico evaluation of the effectivity of approved protease inhibitors against the main protease of the novel SARS-CoV-2 virus

QSAR modeling and Pharmacoinformatics of SARS coronavirus 3C-like protease inhibitors

Development of a simple, interpretable and easily transferable QSAR model for quick screening antiviral databases in search of novel 3C-like protease (3CLpro) enzyme inhibitors against SARS-CoV diseases

Computational guided identification of a citrus flavonoid as potential inhibitor of SARS-CoV-2 main protease

Computational determination of potential inhibitors of SARS-CoV-2 main protease

Computational discovery of small drug-like compounds as potential inhibitors of SARS-CoV-2 main protease

Biological activity-based modeling identifies antiviral leads against SARS-CoV-2

Shape-based machine learning models for the potential novel COVID-19 protease inhibitors assisted by molecular dynamics simulation

Prediction of novel inhibitors of the main protease (M-pro) of SARS-CoV-2 through consensus docking and drug reposition

A computational approach to identify potential novel inhibitors against the coronavirus SARS-CoV-2

Potential COVID-2019 3C-like protease inhibitors designed using generative deep learning approaches

Predicting commercially available antiviral drugs that may act on the novel coronavirus (SARS-CoV-2) through a drug-target interaction deep learning model

De novo design of new chemical entities for SARS-CoV-2 using artificial intelligence

Deep learning based drug screening for novel coronavirus 2019-nCov

SARS-CoV-2: structure, biology, and structure-based therapeutics development

Drug screening and development from the affinity of S protein of new coronavirus with ACE2

Computational search for drug repurposing to identify potential inhibitors against SARS-COV-2 using molecular docking, QTAIM and IQA methods in viral spike protein-human ACE2 interface

Comparison of clinically approved molecules on SARS-CoV-2 drug target proteins: a molecular docking study

Identification of a repurposed drug as an inhibitor of spike protein of human coronavirus SARS-CoV-2 by computational methods

Identification of SARS-CoV-2 cell entry inhibitors by drug repurposing using in silico structure-based virtual screening approach

A guideline for homology modeling of the proteins from newly discovered betacoronavirus, 2019 novel coronavirus (2019-nCoV)

A computational prediction of SARS-CoV-2 structural protein inhibitors from Azadirachta indica (neem)

Analysis of SARS-CoV-2 RNA-dependent RNA polymerase as a potential therapeutic drug target using a computational approach

Potential RNA-dependent RNA polymerase inhibitors as prospective therapeutics against SARS-CoV-2

Analysis of therapeutic targets for SARS-CoV-2 and discovery of potential drugs by computational methods

State-of-the-art tools unveil potent drug targets amongst clinically approved drugs to inhibit helicase in SARS-CoV-2

Structural elucidation of SARS-CoV-2 vital proteins: computational methods reveal potential drug candidates against main protease, Nsp12 polymerase and Nsp13 helicase

Identification of potential inhibitors of three key enzymes of SARS-CoV2 using computational approach

Host- and strain-specific regulation of influenza virus polymerase activity by interacting cellular proteins

Herpes simplex virus 1 infection activates poly (ADP-ribose) polymerase and triggers the degradation of poly (ADP-ribose) glycohydrolase

Resolution of the cellular proteome of the nucleocapsid protein from a highly pathogenic isolate of porcine reproductive and respiratory syndrome virus identifies PARP-1 as a cellular target whose interaction is critical for virus biology

A data-driven drug repositioning framework discovered a potential therapeutic agent targeting COVID-19

Analysis of natural compounds against the activity of SARS-CoV-2 NSP15 protein towards an effective treatment against COVID-19: a theoretical and computational biology approach

Computational analysis of complement inhibitor compstatin using molecular dynamics

Structure-based virtual screening and molecular dynamics simulation of SARS-CoV-2 guanine-N7 methyltransferase (nsp14) for identifying antiviral inhibitors against COVID-19

Identification of potential inhibitors of SARS-COV-2 endoribonuclease (EndoU) from FDA approved drugs: a drug repurposing approach to find therapeutics for COVID-19

A blueprint for high affinity SARS-CoV-2 Mpro inhibitors from activity-based compound library screening guided by analysis of protein dynamics

Evaluation of the potency of FDA-approved drugs on wild type and mutant SARS-CoV-2 helicase (Nsp13)