key: cord-0869737-x2hyf554
authors: Tian, Xiongfei; Shen, Ling; Gao, Pengfei; Huang, Li; Liu, Guangyi; Zhou, Liqian; Peng, Lihong
title: Discovery of Potential Therapeutic Drugs for COVID-19 Through Logistic Matrix Factorization With Kernel Diffusion
date: 2022-02-28
journal: Front Microbiol
DOI: 10.3389/fmicb.2022.740382
sha: 676518aa3fb6303c39f96f0b0c0df29919fe66cc
doc_id: 869737
cord_uid: x2hyf554

Coronavirus disease 2019 (COVID-19) is rapidly spreading. Researchers around the world are dedicated to finding treatment clues for COVID-19. Drug repositioning, a rapid and cost-effective way of finding therapeutic options among available FDA-approved drugs, has been applied to drug discovery for COVID-19. In this study, we develop a novel drug repositioning method (VDA-KLMF) to prioritize possible anti-SARS-CoV-2 drugs by integrating virus sequences, drug chemical structures, known Virus-Drug Associations (VDAs), and Logistic Matrix Factorization with Kernel diffusion. First, Gaussian kernels of viruses and drugs are built based on known VDAs and nearest neighbors. Second, a sequence similarity kernel of viruses and a chemical structure similarity kernel of drugs are constructed based on biological features and an identity matrix. Third, the Gaussian kernel and the similarity kernel are diffused. Fourth, a logistic matrix factorization model with kernel diffusion is proposed to identify potential anti-SARS-CoV-2 drugs. Finally, molecular dockings between the inferred antiviral drugs and the junction of the SARS-CoV-2 spike protein-ACE2 interface are implemented to investigate their binding abilities. VDA-KLMF is compared with two state-of-the-art VDA prediction models (VDA-KATZ and VDA-RWR) and three classical association prediction methods (NGRHMDA, LRLSHMDA, and NRLMF) based on 5-fold cross validations on viruses, drugs, and VDAs on three datasets. It obtains the best recalls, AUCs, and AUPRs, significantly outperforming the other five methods under the three different cross validations. We observe that four chemical agents predicted on at least two datasets, that is, remdesivir, ribavirin, nitazoxanide, and emetine, may be clues for the treatment of COVID-19. The docking results suggest that the key residues K353 and G496 may affect the binding energies and dynamics between the inferred anti-SARS-CoV-2 chemical agents and the junction of the spike protein-ACE2 interface.
Integrating various biological data, Gaussian kernel, similarity kernel, and logistic matrix factorization with kernel diffusion, this work demonstrates that a few chemical agents may assist in drug discovery for COVID-19.

A novel coronavirus disease named COVID-19, caused by the coronavirus SARS-CoV-2, is spreading around the globe. As of 3 December 2021, more than 263 million confirmed cases of SARS-CoV-2 infection and 5.232 million confirmed deaths caused by SARS-CoV-2 had been reported (WHO, 2021). The rapid transmission of SARS-CoV-2 has become a severe threat to public health worldwide (Baker et al., 2020; Hopman et al., 2020; Khan M. T. et al., 2021). Although vaccines have been studied, there is an urgent need to find promising antiviral drugs for COVID-19 therapies (Mahdian et al., 2020; Saxena, 2020).

However, in such an urgent situation, it is almost impossible to research and develop a new drug for patients with COVID-19, since designing a new drug may take more than 10 years (Yang et al., 2020). An effective alternative may be to find possible therapeutic clues among Food and Drug Administration (FDA)-approved drugs, that is, drug repurposing (Yang et al., 2016; Chu et al., 2020; Masoudi-Sobhanzadeh, 2020; Zhang et al., 2020, 2021). Researchers worldwide have focused on repositioning FDA-approved drugs for COVID-19. Since these drugs have been tested for efficacy, safety, and toxicity in clinical trials, they can be rapidly applied as clinically available drugs against COVID-19 (Wu et al., 2020). Multiple repositioned drugs, such as antiviral drugs and host-targeting treatments, are or have been in clinical trials for COVID-19 (Tang et al., 2020). Computational methods for identifying potential options against COVID-19 can be categorized into structure-based virtual screening methods (Khan M. T. et al., 2021) and network-based methods (Dotolo et al., 2020).

To capture possible antiviral drugs against SARS-CoV-2, a large number of structure-based virtual screening methods have been applied. This type of method uses molecular docking and dynamics simulation techniques to measure binding capabilities between potential anti-COVID-19 drugs and targets. For example, Elfiky (2020) and Muralidharan et al. (2020) combined molecular docking and molecular dynamics simulation. Islam et al. (2021) integrated docking with two approaches, molecular dynamics simulation and in silico absorption, distribution, metabolism, excretion, and toxicity (ADMET) profiling. Kandeel et al. (2020) applied molecular docking, molecular dynamics simulation of the top 10 hits, and free energy calculation. Khan et al. (2020a) designed an integrated computational framework for key residue identification via an alanine scanning strategy and an extensive simulation, a cryo-EM structure for novel drug identification based on computational virtual screening and molecular docking (Khan et al., 2020b), a multi-step drug screening method to shortlist potential drugs, and a structural and biomolecular simulation technique for revealing the impact of specific mutations in the B.1.617 variant. Wang et al. (2020) detected the inhibitory effect of human defensin-5 against SARS-CoV-2 invasion by combining molecular dynamics simulation and statistical analysis. Elmezayen et al. (2020) used molecular docking for top-ranked compounds, molecular dynamics simulations, ADMET profile prediction, and free energy computation. Another study found a versatile antimicrobial peptide that can be used as an inhibitor of SARS-CoV-2 attachment based on dual mechanisms.

Network-based methodologies are widely applied to drug repositioning by integrating multiple data sources. In these methods, nodes denote drugs, diseases, or targets, while edges denote interactions or associations between nodes. Network-based methods comprise network-based clustering methods and network-based propagation methods (Messina et al., 2020; Sadegh et al., 2020). Network-based clustering methods find novel drug-target interactions or drug-disease associations by identifying biological modules (for example, drug-target, drug-disease, drug-drug) using clustering algorithms. Network-based propagation methods use network proximity and network propagation algorithms to model associations between drugs, targets, and COVID-19-related diseases. For example, Peng et al. (2020) and Zhou et al. (2020) used a bipartite local model and the KATZ measurement, respectively, to find potentially suitable drugs against COVID-19 and validated the predicted results by molecular docking and recent publications. Meng et al. (2021) proposed a similarity constrained probabilistic matrix factorization method to find new Virus-Drug Associations (VDAs). Fiscon et al. (2021) developed a searching off-label drug and network method to uncover interactions between targets and disease-specific proteins. Building on the above studies, a random walk with restart-based VDA prediction model (VDA-RWR) was developed to discover possible anti-SARS-CoV-2 drugs on three constructed VDA datasets. These methods effectively discovered possible antiviral drugs for the treatment of COVID-19.

In this study, we develop a novel VDA prediction method, VDA-KLMF, to find potential chemical agents for COVID-19. VDA-KLMF integrates virus sequences, drug chemical structures, known VDAs, Gaussian kernel, similarity kernel, and Logistic Matrix Factorization with Kernel diffusion. VDA-KLMF is compared with two state-of-the-art VDA prediction models [VDA-KATZ (Zhou et al., 2020) and VDA-RWR] and three classical association identification models [NGRHMDA, LRLSHMDA (Wang et al., 2017), and NRLMF] based on fivefold cross validations on viruses, drugs, and VDAs on three VDA datasets. Experimental results show that VDA-KLMF achieves the best recalls, AUCs, and AUPRs, significantly improving VDA identification performance. Four chemical agents (remdesivir, ribavirin, nitazoxanide, and emetine) predicted on at least two VDA datasets are inferred to be potential anti-COVID-19 drugs.

Molecular docking is an important drug discovery tool used to find the most appropriate intermolecular binding between a chemical agent and a target or between two proteins. It can effectively elucidate fundamental biochemical processes and characterize the activity of ligands binding target proteins (McConkey et al., 2002). In this manuscript, a molecular docking software, AutoDock (Morris et al., 2009), is used to measure the molecular activities of the four predicted antiviral small molecules at the junction of the SARS-CoV-2 spike (S) protein-angiotensin-converting enzyme 2 (ACE2) interface. The dockings show that the four drugs have higher binding energies with two key residues (K353 and G496).

Three VDA datasets were provided by Peng et al. (2021). Dataset 1 contains 96 VDAs from 11 viruses and 78 drugs. Dataset 2 contains 770 VDAs from 69 viruses and 128 drugs. Dataset 3 contains 407 VDAs from 34 viruses and 203 drugs. The virus sequences and drug chemical structures can be downloaded from the NCBI (Sayers et al., 2021) and DrugBank (Wishart et al., 2018) databases, respectively. The virus sequence similarity matrix $S_v$ and the drug chemical structure similarity matrix $S_d$ can be computed by MAFFT (Katoh et al., 2019) and RDKit (Landrum, 2021), respectively. The details are shown in Table 1.

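All virus-drug pairs in a dataset can be characterized as a matrix $Y$:

$$Y_{ij} = \begin{cases} 1, & \text{if virus } v_i \text{ is associated with drug } d_j \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

where $v_i$ and $d_j$ represent the $i$th virus and $j$th drug, respectively.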
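As a minimal sketch of how such a matrix might be assembled, the snippet below builds a binary association matrix from a list of known pairs. It assumes the usual convention of 1 for a known association and 0 otherwise; the function name `build_association_matrix` and the toy identifiers are hypothetical and are not taken from the three datasets.

```python
import numpy as np

def build_association_matrix(pairs, viruses, drugs):
    """Return a binary matrix Y with Y[i, j] = 1 for each known (virus, drug) pair."""
    v_index = {v: i for i, v in enumerate(viruses)}
    d_index = {d: j for j, d in enumerate(drugs)}
    Y = np.zeros((len(viruses), len(drugs)), dtype=int)
    for v, d in pairs:
        Y[v_index[v], d_index[d]] = 1
    return Y

# Toy identifiers, purely illustrative (not taken from the datasets).
viruses = ["virus_a", "virus_b", "virus_new"]
drugs = ["drug_x", "drug_y"]
pairs = [("virus_a", "drug_x"), ("virus_b", "drug_x"), ("virus_b", "drug_y")]
Y = build_association_matrix(pairs, viruses, drugs)
```

A row of zeros, such as the one for `virus_new`, corresponds to a virus with no known associated drug.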

Given the virus similarity matrix $S_v$, the drug similarity matrix $S_d$, and the VDA matrix $Y$, our task is to quantify the interplays between viruses and drugs, which can be divided into four scenarios: (1) known virus-known drug association, that is, a virus associates with at least one drug and a drug associates with at least one virus; (2) known virus-new drug association, that is, a virus interacts with at least one drug and a new drug does not interact with any virus; (3) new virus-known drug association, that is, a new virus does not associate with any drug and a drug interacts with at least one virus; (4) new virus-new drug association, that is, neither the virus nor the drug has any association information. Our goal is to design a novel model to boost VDA prediction performance. In particular, the model assigns an association probability to a virus-drug pair to measure the likelihood of interplay between the virus and the drug. The higher the probability, the more likely the virus and the drug are associated with each other. Figure 1 illustrates the flowchart of the VDA-KLMF model.

SARS-CoV-2 is a new single-stranded RNA virus and has no associated drugs. That is, the scenario of a new virus (for example, SARS-CoV-2) and a new drug may arise when a VDA dataset is split during cross validation. The nearest neighbor information of a virus/drug contributes to prioritizing VDAs related to that virus/drug. To find interacting drugs for a virus $v_i$, its Gaussian kernel is constructed as follows. First, its association profile is computed based on its nearest neighbor information by Eq. (2):

where $n_i$ represents the nearest neighbors of $v_i$, and $k$ is a hyperparameter denoting the number of nearest neighbors of $v_i$. Second, the computed association profile is normalized by Eq. (3):

Finally, the Gaussian kernel $K_v^{GIP}$ of viruses is calculated from the normalized association profiles by Eq. (4):

where $\sigma$ is the kernel bandwidth. The Gaussian kernel $K_d^{GIP}$ of drugs can be computed similarly.
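The three steps above can be sketched as follows. This is only an illustration under stated assumptions, not the exact form of Eqs. (2)-(4): the neighbor-based profile is taken to be a similarity-weighted mean over the $k$ most similar viruses, the normalization is taken to be unit Euclidean length per row, and the kernel is the common form $\exp(-\|p_i - p_j\|^2 / (2\sigma^2))$. All function names and the toy matrices are hypothetical.

```python
import numpy as np

def neighbor_profiles(Y, S, k):
    """Similarity-weighted mean of the association rows of the k nearest neighbors."""
    Y = np.asarray(Y, dtype=float)
    S = np.asarray(S, dtype=float)
    profiles = np.zeros_like(Y)
    for i in range(Y.shape[0]):
        sim = S[i].copy()
        sim[i] = -np.inf                    # a row is not its own neighbor
        nn = np.argsort(sim)[::-1][:k]      # indices of the k most similar rows
        w = S[i, nn]
        if w.sum() > 0:
            profiles[i] = w @ Y[nn] / w.sum()
    return profiles

def normalize_rows(P):
    """Scale each non-zero row to unit Euclidean length (assumed normalization)."""
    norms = np.linalg.norm(P, axis=1, keepdims=True)
    return np.divide(P, norms, out=np.zeros_like(P), where=norms > 0)

def gaussian_kernel(P, sigma=1.0):
    """K[i, j] = exp(-||p_i - p_j||^2 / (2 * sigma^2)) (assumed kernel form)."""
    sq = np.sum(P ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * P @ P.T, 0.0)
    return np.exp(-dist2 / (2.0 * sigma ** 2))

# Toy data: the third virus has no known drug, like a new virus.
Y = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 0]])
S_v = np.array([[1.0, 0.6, 0.8], [0.6, 1.0, 0.3], [0.8, 0.3, 1.0]])
profiles = neighbor_profiles(Y, S_v, k=2)
K_gip_v = gaussian_kernel(normalize_rows(profiles))
```

In this toy case the third virus receives a non-zero profile from its neighbors, which is what allows a kernel value to be computed for a virus without known associations.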

Sequence information of viruses and chemical structure information of drugs help VDA candidate screening. To comprehensively consider these data, the two original similarity matrices are transformed into two kernel matrices ($K_v^{sym}$ and $K_d^{sym}$). First, the virus similarity matrix $S_v$ is symmetrized into $S_v^{sym}$ by Eq. (5):

Then, the symmetrized matrix $S_v^{sym}$ is transformed into a positive semi-definite matrix $K_v^{sym}$ by Eq. (6):

where $I$ is an identity matrix, and $\varepsilon$ is a parameter. Similarly, $K_d^{sym}$ can be calculated.
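One common way to carry out these two steps is sketched below, assuming that symmetrization averages the matrix with its transpose and that the positive semi-definite transformation adds $\varepsilon I$. Both are assumptions about Eqs. (5) and (6); the helper names and the toy similarity matrix are hypothetical.

```python
import numpy as np

def symmetrize(S):
    """Average a similarity matrix with its transpose."""
    return (S + S.T) / 2.0

def shift_to_psd(S_sym, eps):
    """Add eps * I; the result is PSD once eps >= -lambda_min(S_sym)."""
    return S_sym + eps * np.eye(S_sym.shape[0])

# Toy, slightly asymmetric similarity matrix (illustrative values only).
S = np.array([[1.0, 1.0, 0.0],
              [0.8, 1.0, 0.8],
              [0.0, 1.0, 1.0]])
S_sym = symmetrize(S)
lam_min = np.linalg.eigvalsh(S_sym).min()   # negative here, so S_sym is not PSD
K_sym = shift_to_psd(S_sym, eps=0.3)
```

Adding $\varepsilon I$ shifts every eigenvalue up by $\varepsilon$, so in this sketch the smallest eigenvalue of `S_sym` indicates how large $\varepsilon$ has to be.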

For a virus, its Gaussian kernel only depicts the similarities between the virus and its $k$ nearest neighbors; the remaining information is discarded. To characterize virus features, inspired by a kernel technique proposed by Hao et al. (2016), we diffuse two different types of virus similarity into a final kernel matrix. First, the local virus similarity matrices are built from the Gaussian kernel and the similarity kernel by Eqs. (7) and (8), respectively:

Second, the global virus similarity matrices $G_v^{GIP}$ and $G_v^{seq}$ are produced by iterative updating with Eqs. (9) and (10).

where $G_v^{GIP}(h + 1)$ and $G_v^{seq}(h + 1)$ represent the global Gaussian kernel and similarity kernel matrices generated at the $h$-th iteration, respectively.

Finally, the virus similarity matrix $M_v$ is integrated by Eq. (11):

Similarly, the drug similarity matrix $M_d$ can be computed.
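To make the local/global/iterate/integrate pattern concrete, here is a hedged sketch that uses a similarity-network-fusion style cross-diffusion. It is a stand-in chosen for illustration and may differ from Eqs. (7)-(11): local matrices keep only each row's $k$ largest entries, each global matrix is updated through its own local matrix and the other kernel's global matrix, and the final matrix is a symmetrized average. All names and toy values are hypothetical.

```python
import numpy as np

def row_normalize(K):
    """Divide each row by its sum."""
    return K / K.sum(axis=1, keepdims=True)

def local_knn(K, k):
    """Keep each row's k largest similarities and renormalize (local structure)."""
    L = np.zeros_like(K, dtype=float)
    for i in range(K.shape[0]):
        nn = np.argsort(K[i])[::-1][:k]
        L[i, nn] = K[i, nn]
    return row_normalize(L)

def diffuse(K_a, K_b, k=2, n_iter=10):
    """Cross-diffuse two positive kernels and return one fused similarity matrix."""
    L_a, L_b = local_knn(K_a, k), local_knn(K_b, k)
    G_a, G_b = row_normalize(K_a), row_normalize(K_b)
    for _ in range(n_iter):
        # Each global matrix is refreshed from the other one via its own local graph.
        G_a_new = row_normalize(L_a @ G_b @ L_a.T)
        G_b_new = row_normalize(L_b @ G_a @ L_b.T)
        G_a, G_b = G_a_new, G_b_new
    M = (G_a + G_b) / 2.0
    return (M + M.T) / 2.0

# Toy positive symmetric kernels standing in for the Gaussian and sequence kernels.
K_gip = np.array([[1.0, 0.8, 0.2, 0.1],
                  [0.8, 1.0, 0.3, 0.2],
                  [0.2, 0.3, 1.0, 0.7],
                  [0.1, 0.2, 0.7, 1.0]])
K_seq = np.array([[1.0, 0.5, 0.4, 0.1],
                  [0.5, 1.0, 0.2, 0.3],
                  [0.4, 0.2, 1.0, 0.6],
                  [0.1, 0.3, 0.6, 1.0]])
M_v = diffuse(K_gip, K_seq, k=2, n_iter=10)
```

The sketch assumes strictly positive kernel entries so that every row sum stays non-zero during normalization.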

After the diffused virus similarity matrix $M_v$ and drug similarity matrix $M_d$ are computed, a Logistic Matrix Factorization model with Kernel diffusion (VDA-KLMF) is designed for VDA discovery. Viruses and drugs are first randomly mapped into two latent vector spaces $A \in \mathbb{R}^{m \times r}$ and $B \in \mathbb{R}^{n \times r}$ with dimension $r$. The association probability for each virus-drug pair can then be calculated by Eq. (12):

where $\alpha$, $\beta$, and $\gamma$ are smoothing coefficients that sum to 1, and $B^T$ denotes the transpose of $B$. Inspired by prior work and under the assumption that all samples are independent, the interplays between viruses and drugs can be rewritten by assigning each known VDA a confidence value $c$, as in Eq. (13):

where $P_{ij}$ denotes the association probability between the $i$-th virus and the $j$-th drug. Known VDAs are validated by wet experiments and are more reliable; therefore, $c$ is assigned a higher value. Assume that the two latent vectors follow the zero-mean spherical Gaussian distributions defined by Eqs. (14) and (15):

where $\sigma_v^2$ and $\sigma_d^2$ are two parameters controlling the variances of the Gaussian distributions, $a_i$ and $b_j$ refer to the latent variables of the $i$-th virus and the $j$-th drug, respectively, and $I$ is an identity matrix. Based on Bayesian inference, we can obtain the following distribution by Eq. (16):

The log of the posterior distribution can be represented as Eq. (17):

Here, $\|\cdot\|_2^2$ represents the squared spectral norm, and $\circ$ denotes the Hadamard product. Thus, the latent virus matrix $A$ and drug matrix $B$ can be generated by maximizing the objective function defined by Eq. (18):

where $\|\cdot\|_F^2$ represents the squared Frobenius norm. Following prior work, $A$ and $B$ can be solved by Eqs. (19) and (20):

$A$ and $B$ can then be updated based on the AdaGrad algorithm (Duchi et al., 2011).
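The following sketch shows a plain logistic matrix factorization fitted by AdaGrad gradient ascent. It is a simplification made for illustration: it omits the kernel terms and the smoothing coefficients $\alpha$, $\beta$, $\gamma$, uses a single regularization weight `lam`, and takes $P_{ij} = \sigma(a_i b_j^T)$ with known associations weighted by $c$. These choices, the function names, and the toy matrix are assumptions rather than the paper's Eqs. (12)-(20).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_posterior(Y, A, B, c, lam):
    """Weighted logistic log-likelihood minus an L2 penalty on A and B."""
    X = A @ B.T
    W = 1 + c * Y - Y                      # weight c on known pairs, 1 elsewhere
    ll = np.sum(c * Y * X - W * np.log1p(np.exp(X)))
    return ll - 0.5 * lam * (np.sum(A ** 2) + np.sum(B ** 2))

def fit_lmf(Y, r=2, c=5.0, lam=0.1, lr=0.1, n_iter=200, seed=0):
    """Alternate AdaGrad ascent steps on the latent matrices A and B."""
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    A = 0.1 * rng.standard_normal((m, r))
    B = 0.1 * rng.standard_normal((n, r))
    gA2, gB2 = np.zeros_like(A), np.zeros_like(B)   # accumulated squared gradients
    W = 1 + c * Y - Y
    for _ in range(n_iter):
        P = sigmoid(A @ B.T)
        gA = (c * Y - W * P) @ B - lam * A          # gradient of the objective w.r.t. A
        gA2 += gA ** 2
        A += lr * gA / (np.sqrt(gA2) + 1e-8)
        P = sigmoid(A @ B.T)
        gB = (c * Y - W * P).T @ A - lam * B        # gradient of the objective w.r.t. B
        gB2 += gB ** 2
        B += lr * gB / (np.sqrt(gB2) + 1e-8)
    return A, B

# Toy block-structured association matrix (illustrative only).
Y = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])
A, B = fit_lmf(Y)
P = sigmoid(A @ B.T)
```

AdaGrad divides each step by the root of the accumulated squared gradients, so coordinates with large past gradients take smaller steps; ranking the entries of `P` for a given row then gives candidate drugs for that virus.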

Molecular docking is utilized to measure the dynamics and binding energies between the predicted antiviral compounds against SARS-CoV-2 and the junction of the S protein-ACE2 interface. Similar to the molecular docking process provided by Peng et al. (2021), we first downloaded the structures of the S protein and ACE2 and the chemical structures of drugs from the RCSB Protein Data Bank (Rose et al., 2016) and the DrugBank database (Wishart et al., 2018), respectively. Second, solvent and organic compounds were removed and the receptor proteins were preprocessed with PyMOL (Schrodinger, 2010). Third, atoms from receptors were set to the AD4 type. Finally, AutoDock was applied to implement molecular docking. During docking, the predicted anti-COVID-19 drugs were used as ligands and the junction of the S protein-ACE2 interface was taken as the receptor. The binding pocket was set via AutoGrid4, the grid size was 126 × 126 × 126, and the Lamarckian genetic algorithm was selected as the search method. The detailed processes were set the same as those provided by Peng et al. (2021).

We perform experiments to evaluate the performance of the VDA-KLMF method. Given a VDA matrix $Y_{n \times m}$ between $n$ viruses and $m$ drugs, inspired by the Cross Validation (CV) scheme provided by Peng et al., three different 5-fold CVs, CV on viruses (CV1), CV on drugs (CV2), and CV on virus-drug pairs (CV3), are implemented. Under CV1, in each round, 80% of viruses are used to train VDA prediction models and the remaining 20% of viruses are used to test the performance of these models. Under CV2, in each round, 80% of drugs are used to train VDA prediction models and the remaining 20% of drugs are used to test their performance. Under CV3, in each round, 80% of VDAs are used to train VDA prediction models and the remaining 20% of VDAs are used to test their performance. The three CVs correspond to VDA prediction for a new virus, for a new drug, and based on known VDA data, respectively.
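The three splitting schemes can be sketched as below. This is a minimal illustration under assumptions: folds are formed by shuffling indices with a fixed seed, CV1 and CV2 hide whole rows and columns of the association matrix, and CV3 hides individual known associations only. The helper names `five_fold_indices` and `cv_test_cells` are hypothetical.

```python
import random

def five_fold_indices(n_items, n_folds=5, seed=0):
    """Shuffle item indices and deal them into n_folds groups."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]

def cv_test_cells(n_viruses, n_drugs, positives, mode, seed=0):
    """Yield, per fold, the set of (virus, drug) cells hidden from training."""
    if mode == "CV1":      # hold out whole viruses (rows)
        for fold in five_fold_indices(n_viruses, seed=seed):
            yield {(i, j) for i in fold for j in range(n_drugs)}
    elif mode == "CV2":    # hold out whole drugs (columns)
        for fold in five_fold_indices(n_drugs, seed=seed):
            yield {(i, j) for j in fold for i in range(n_viruses)}
    elif mode == "CV3":    # hold out individual known associations
        pos = sorted(positives)
        for fold in five_fold_indices(len(pos), seed=seed):
            yield {pos[p] for p in fold}
    else:
        raise ValueError(mode)

# Toy sizes and associations, purely illustrative.
positives = [(i, i % 4) for i in range(10)]
folds_cv1 = list(cv_test_cells(10, 4, positives, "CV1"))
folds_cv3 = list(cv_test_cells(10, 4, positives, "CV3"))
```

Under CV1 every held-out virus loses its entire row during training, which mimics the new-virus scenario.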

The number of iterations $h$ is set to 100. The confidence level $c$ of known VDAs, the number of neighbors $k$, and the weights $\lambda_A$ and $\lambda_B$ are set in the ranges of [3, 10], [1, 10], [1, 10], and [1, 10], respectively. We repeated the experiments 100 times and used a random search approach to select the optimal parameters. The optimal parameter combinations of VDA-KLMF and the other five VDA prediction methods (NGRHMDA, LRLSHMDA, NRLMF, VDA-KATZ, and VDA-RWR) are shown in Table 2.

Recall (sensitivity), specificity, precision, F1 score, AUC, and AUPR are used to assess the performance of the six VDA prediction approaches (VDA-KLMF, NGRHMDA, LRLSHMDA, NRLMF, VDA-KATZ, and VDA-RWR). Recall (sensitivity) indicates the ratio of correctly predicted positive VDAs to all known positive VDAs. Precision represents the ratio of correctly predicted positive VDAs to all predicted positive VDAs. Specificity denotes the ratio of correctly predicted negative VDAs to all known negative VDAs. The F1 score is the harmonic mean of recall and precision. The four evaluation metrics are defined as follows:

$$\text{Recall} = \frac{TP}{TP + FN}, \quad \text{Specificity} = \frac{TN}{TN + FP}, \quad \text{Precision} = \frac{TP}{TP + FP}, \quad \text{F1} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

where TP, FP, TN, and FN denote true positives, false positives, true negatives, and false negatives, respectively. AUC is the average area under the Receiver Operating Characteristic (ROC) curve.

Frontiers in Microbiology | www.frontiersin.org

The ROC curve plots the true positive rate as a function of the false positive rate as the threshold used to capture VDAs from the ranking varies. AUPR is the area under the Precision-Recall (PR) curve. The PR curve plots the precision, that is, the fraction of true positives among all predicted positive VDAs, at each recall value. AUPR provides a quantitative measurement of how well, on average, the inferred association probabilities of positive VDAs are separated from those of negative VDAs. Higher recall, specificity, precision, F1 score, AUC, and AUPR indicate better performance. AUC and AUPR are more important evaluation criteria than the other four metrics.
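For readers who want to check these two ranking metrics by hand, the sketch below computes AUC as the fraction of positive-negative pairs ranked correctly (ties count one half) and approximates AUPR by average precision, a step-wise estimate of the area under the PR curve. The function names and the toy scores are hypothetical, and the evaluation code used in the study may differ.

```python
def auc(scores, labels):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    """Mean of the precision values at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Toy predictions (illustrative): two positives among five candidate pairs.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 0, 0]
auc_value = auc(scores, labels)                       # 5 of 6 pairs ordered correctly
ap_value = average_precision(scores, labels)          # (1/1 + 2/3) / 2
auc_flipped = auc([-s for s in scores], labels)       # reversed ranking gives 1 - AUC
```

Reversing the ranking turns an AUC of $x$ into $1 - x$, which is a quick sanity check when a method reports an AUC below 0.5.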

VDA-KLMF is compared with NGRHMDA, LRLSHMDA (Wang et al., 2017), NRLMF, VDA-KATZ (Zhou et al., 2020), and VDA-RWR. The former three methods are representative association prediction approaches. NGRHMDA fused collaborative filtering and graph-based scoring. LRLSHMDA utilized a Laplacian regularized least squares classifier. NRLMF used a neighborhood regularized logistic matrix factorization model. The remaining two methods are state-of-the-art VDA prediction models, which used the KATZ measurement and random walk with restart, respectively, to prioritize anti-SARS-CoV-2 drugs. The experiments are repeated 20 times and the final performance is averaged over the 20 runs. The results are shown in Tables 3-5. The best performance obtained by the six VDA prediction methods on each dataset is denoted in bold in each column. Table 3 lists the performance of the six VDA identification models under CV1. It can be observed that VDA-KLMF achieves the best recall, AUC, and AUPR, significantly outperforming NGRHMDA, LRLSHMDA, NRLMF, VDA-KATZ, and VDA-RWR on datasets 2 and 3. On dataset 1, VDA-KLMF obtains slightly lower recall, specificity, AUC, and AUPR than NGRHMDA and VDA-RWR. However, on datasets 2 and 3, VDA-KLMF obtains much better performance than these two approaches. This may result from the small sample size of dataset 1. The results demonstrate that abundant data can boost the prediction performance of VDA inference algorithms.

More importantly, the performance achieved by the six VDA prediction models under CV1 is lower than under CV2 and CV3. The reason may be that the three datasets contain a completely unknown virus, SARS-CoV-2, which has no associated drugs and thus decreases the prediction ability of these algorithms. Even when a new virus has few or no known associated drugs, VDA-KLMF achieves the best AUCs of 0.8149 and 0.8224 and the best AUPRs of 0.3487 and 0.3431 on datasets 2 and 3, respectively. The result suggests that VDA-KLMF can be effectively applied to prioritize potential small molecules for a new virus, especially SARS-CoV-2.

Table 4 gives the performance of the six VDA identification algorithms on the three VDA datasets under CV2. VDA-KLMF achieves the best recall, F1 score, AUC, and AUPR on all three datasets, much better than the other five VDA techniques. For example, compared with NGRHMDA, LRLSHMDA, NRLMF, VDA-KATZ, and VDA-RWR, its AUC on dataset 1 is higher by margins that include 13.98, 12.38, 12.28, and 4.65%. Its AUC is higher by 7.23, 14.06, 8.92, 18.54, and 7.15% on dataset 2 and by 24.13, 17.17, 13.38, 23.45, and 10.17% on dataset 3, respectively. Its AUPR on dataset 1 is higher by margins that include 48.01, 24.18, 40.32, and 63.52%, and it is higher by 46.07, 41.00, 22.58, 40.88, and 47.14% on dataset 2 and by 49.03, 46.31, 22.65, 42.90, and 42.52% on dataset 3, respectively. The comparative results demonstrate the superior prediction ability of VDA-KLMF for identifying possible viruses associated with a new drug.

FIGURE 2 | The AUC values predicted by six VDA prediction methods (D denotes dataset, D1 denotes dataset 1, D2 denotes dataset 2, D3 denotes dataset 3).

Table 5 shows the recall, specificity, precision, F1 score, AUC, and AUPR computed by the six VDA prediction models on the three datasets under CV3. It can be seen that VDA-KLMF still obtains the best performance in terms of recall and AUC on the three datasets. Under CV3, NRLMF achieves the best precision and F1 score on all datasets and is the second-best method. In particular, compared to NRLMF, the recall obtained by VDA-KLMF is higher by 24.42, 26.90, and 27.41% on datasets 1-3, respectively. The AUCs calculated by VDA-KLMF are higher by 6.14, 4.22, and 2.89%, respectively. The AUPRs achieved by VDA-KLMF are higher by 11.20 and 4.31% on datasets 1 and 2, respectively. The results suggest that VDA-KLMF can effectively improve VDA prediction performance based on known VDAs.

Under CV1, NGRHMDA obtains AUCs of 0.7026, 0.4301, and 0.4058 on the three datasets, respectively. Under CV2, it obtains AUCs of 0.8329, 0.8017, and 0.6772, respectively. Under CV3, it obtains AUCs of 0.6459, 0.3011, and 0.2554, respectively. Thus, under CV1 and CV3, NGRHMDA yields AUCs smaller than 0.5 on datasets 2 and 3. If we re-draw the ROC curve, it will obtain AUCs larger than 0.5 on these two datasets under CV1 and CV3; however, its AUCs will then be smaller than 0.5 under CV2. Similarly, LRLSHMDA and VDA-KATZ yield AUCs smaller than 0.5 on the three datasets under CV1, and AUCs larger than 0.5 under CV2 and CV3. If we re-draw the ROC curve, the two methods will obtain AUCs larger than 0.5 under CV1 and AUCs smaller than 0.5 under CV2 and CV3. This may be caused by their poor generalization ability.

In addition, VDA-KLMF obtains a slightly lower specificity. However, specificity indicates the ratio of correctly predicted negative VDAs to all known negative VDAs. For anti-COVID-19 drug screening, what we need to capture are possible anti-COVID-19 drugs. Therefore, it is more important to find correctly predicted positive VDAs than correctly predicted negative VDAs. That is, sensitivity (recall) and precision are much more important than specificity. More importantly, in the majority of situations, VDA-KLMF obtains better AUCs and AUPRs, demonstrating its relatively strong VDA prediction performance. Figures 2 and 3 depict the AUC and AUPR values, respectively, calculated by the six VDA prediction algorithms on the three datasets under the three different CVs.

In the VDA-KLMF model, the logistic matrix factorization model with kernel diffusion integrates the Gaussian kernel and the biological similarity kernel, which includes the sequence similarity of viruses and the chemical structure similarity of drugs. The Gaussian kernel fully utilizes the nearest neighbor information of viruses and drugs. We investigated the VDA prediction performance of the logistic matrix factorization model when considering kernel diffusion with both the Gaussian kernel and the biological similarity kernel (VDA-KLMF) and when considering only biological similarity (VDA-LMFB). The results are shown in Figure 4. From Figure 4, we can see that kernel diffusion contributes to improving VDA identification ability.

In the VDA-KLMF model, viruses and drugs are randomly mapped into two latent vector spaces $A \in \mathbb{R}^{m \times r}$ and $B \in \mathbb{R}^{n \times r}$ with dimension $r$. To evaluate the effect of different $r$ values on the prediction performance, we compared the performance of VDA-KLMF under different settings. Table 6 illustrates the comparison results of VDA-KLMF on the three datasets under CV3. On dataset 1, we set $r$ in the range of [2, 30] with an interval of 1. The results show that VDA-KLMF obtains the best prediction ability when $r$ is set to 6. On datasets 2 and 3, we set $r$ in the range of [5, 100] with an interval of 5. The results suggest that VDA-KLMF obtains the best performance when $r$ is set to 45 and 35, respectively. Therefore, the dimension $r$ is set to 6, 45, and 35 on the three datasets, respectively.

We wanted to identify potential chemical agents for preventing COVID-19 after confirming the powerful prediction ability of VDA-KLMF. We prioritized the top 10 compounds associated with SARS-CoV-2 on the three datasets. The results are shown in Tables 7-9, respectively. Among the top 10 small molecules with the highest association rankings with SARS-CoV-2, the majority have been validated as anti-SARS-CoV-2 drugs by the current literature. The results in Tables 7-9 show that seven available anti-SARS-CoV-2 compounds are predicted on at least two datasets, that is, remdesivir, ribavirin, nitazoxanide, favipiravir, emetine, chloroquine, and mycophenolic acid.

Remdesivir is an adenosine triphosphate analogue. It has broad-spectrum antiviral activity and thus can be applied to the treatment of various diseases caused by the Arenaviridae, Flaviviridae, Filoviridae, Paramyxoviridae, Pneumoviridae, and Coronaviridae viral families (Malin et al., 2020). Remdesivir's action against the Coronaviridae family makes it a potential therapeutic strategy for COVID-19 (Gordon et al., 2020). On 19 November 2020, the drug in combination with baricitinib was authorized for the treatment of COVID-19 (Eastman et al., 2020; FDA, 2021).

Ribavirin is a synthetic guanosine nucleoside (American et al., 2017) . The small molecule can generate broad activity against a few RNA and DNA viruses by inhibiting the synthesis of viral mRNAs. It is widely applied to the treatment of hepatitis C and viral hemorrhagic fevers and might be effective in the early steps of viral hemorrhagic fevers (Myers et al., 2015; Wishart et al., 2018) .

Nitazoxanide is a broad anti-infective compound. The drug can markedly modulate the survival, growth, and proliferation of various intracellular and extracellular protozoa, helminths, viruses, anaerobic and microaerophilic bacteria (Shakya et al., 2017) . It can inhibit the replication of a few RNA and DNA viruses and has been investigated as a broad antiviral compound (Wishart et al., 2018) .

We conducted molecular dockings for the predicted antiviral drugs and the junction of the S protein-ACE2 interface. The binding energies between the predicted top 10 antiviral drugs on three datasets and the junction are shown in Table 10 .

From Table 10, we can observe that the identified top small molecules show higher binding energies with the junction, where nitazoxanide, mycophenolic acid, and zanamivir have the highest binding abilities. In addition, the key residues between the junction and the seven compounds predicted on at least two datasets are K68 and Q493 for remdesivir; R403, Q493, K353, and G496 for ribavirin; Q493 and S494 for nitazoxanide; K353 and G496 for favipiravir; T500 for emetine; H34 for chloroquine; and H34, K353, F390, and G496 for mycophenolic acid. The results suggest that K353 and G496 are possible key residues between anti-SARS-CoV-2 drugs and the junction of the S protein-ACE2 interface.

Molecular dockings between the four predicted possible antiviral drugs against COVID-19 (remdesivir, ribavirin, nitazoxanide, and emetine) and the junction are illustrated in Figure 5, where two docking graphs [(a) remdesivir and (b) ribavirin] were provided by Peng et al. (2020). The subfigure in each circle denotes the residues at the junction and their

Since the outbreak of COVID-19, we have conducted several studies for initially screening possible drugs for this highly contagious disease based on virus sequences, drug chemical structures, and observed VDAs from existing data resources. These studies include VDA-RLSBN, VDA-RWR, and the proposed VDA-KLMF method. VDA-RLSBN and VDA-RWR first utilized complete genomic sequences of viruses and chemical structures of drugs. Second, they developed computational models to detect underlying associations between SARS-CoV-2 and small molecules. Finally, they conducted molecular dockings between the predicted anti-COVID-19 drugs and two target proteins, the S protein and ACE2, to measure their binding ability. The two methods effectively captured possible antiviral drugs against COVID-19.

FIGURE 5 | Molecular dockings between the predicted four possible antiviral drugs against COVID-19 (remdesivir, ribavirin, nitazoxanide, and emetine) and the junction of the S protein-ACE2 interface. (A) remdesivir (Wang J. et al., 2021; Shen et al., 2022), (B) ribavirin (Wang J. et al., 2021; Shen et al., 2022), (C) nitazoxanide, and (D) emetine.

In particular, VDA-KLMF integrates drug chemical structures, virus sequences, known VDAs, Gaussian kernel, similarity kernel, and logistic matrix factorization with kernel diffusion. It is compared with two state-of-the-art VDA prediction models and three classical association inference methods. The experimental results illustrate that the proposed VDA-KLMF method obtains powerful prediction performance.

SARS-CoV-2 is a new virus, that is, an orphan node in a VDA network. It has no association with available drugs. To capture underlying FDA-approved drugs against SARS-CoV-2, VDA-KLMF computes sequence similarity between the virus and other viruses and obtains a similarity matrix with the elements in the range of (0,1). Based on sequence similarity kernel and Gaussian kernel, VDA-KLMF can predict association information for SARS-CoV-2 combining matrix factorization model with kernel diffusion. The results show that four small molecules, remdesivir, ribavirin, nitazoxanide, and emetine, have higher binding energies with the junction of the S protein-ACE2 interface.

VDA-KLMF achieves superior prediction performance. It has the following three characteristics. First, it effectively integrates various biological information, including global and local similarities of viruses and drugs. Second, the logistic matrix factorization model with kernel diffusion more accurately quantifies the interplays between viruses and drugs. Finally, two key residues (K353 and G496) are found and need further medical validation.

Compared to VDA-KLMF, VDA-RLSBN has the following four limitations: (i) Its prediction ability was validated on only one dataset, comprising 96 VDAs between 12 viruses and 78 drugs, which may lead to overfitting. (ii) It was evaluated only under CV3 and not under CVs on viruses and drugs, so its generalization ability was not investigated. (iii) It found 10 potential small molecules against COVID-19 among the 78 FDA-approved drugs in the constructed small dataset, so the pool of drugs available for screening treatment clues for patients infected with COVID-19 was relatively small. (iv) It implemented molecular dockings between the identified small molecules and the S protein and ACE2 as separate target proteins. In comparison, our proposed VDA-KLMF method uses three datasets and is evaluated under CVs on viruses, drugs, and VDAs. In this context, VDA-KLMF obtains better performance, thereby demonstrating its strong generalization ability. Moreover, VDA-KLMF screens possible anti-COVID-19 drugs that appear together in any two datasets, and the inferred results may be more reliable than those from a single dataset. Finally, VDA-KLMF conducts molecular dockings between the screened drugs and the junction of the S protein-ACE2 interface, which can more reasonably measure their binding abilities.
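The three cross-validation settings differ in what is hidden from the model. The sketch below shows one plausible way to build the test masks, assuming that CV on viruses hides whole rows of the association matrix, CV on drugs hides whole columns, and CV on VDAs hides individual entries. The function name, the mode labels, and the fold construction are assumptions; only the 12 x 78 shape is taken from the text.

```python
import numpy as np

def cv_masks(Y, mode, n_folds=5, seed=0):
    """Yield one boolean test mask per fold (True marks a hidden entry).

    mode="virus": hide whole rows; mode="drug": hide whole columns;
    mode="pair": hide individual virus-drug entries.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = Y.shape
    sizes = {"virus": n_rows, "drug": n_cols, "pair": n_rows * n_cols}
    if mode not in sizes:
        raise ValueError(f"unknown mode: {mode}")
    units = rng.permutation(sizes[mode])
    for fold in np.array_split(units, n_folds):
        mask = np.zeros((n_rows, n_cols), dtype=bool)
        if mode == "virus":
            mask[fold, :] = True
        elif mode == "drug":
            mask[:, fold] = True
        else:
            mask[np.unravel_index(fold, mask.shape)] = True
        yield mask

# Shape taken from the 12-virus, 78-drug dataset; the entries are placeholders.
Y = np.zeros((12, 78), dtype=int)
masks = {mode: list(cv_masks(Y, mode)) for mode in ("virus", "drug", "pair")}
```

For each fold, a model would be trained on `np.where(mask, 0, Y)` and scored on the entries where `mask` is true. Row-wise hiding mimics a new virus such as SARS-CoV-2, which is why evaluation on viruses speaks to generalization.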

Similar to VDA-KLMF, VDA-RWR was also evaluated under three CVs on three datasets. AUC and AUPR are more important evaluation metrics than recall, precision, specificity, and F1 score. VDA-KLMF significantly outperforms VDA-RWR on both metrics under these settings. The results illustrate that VDA-KLMF can more precisely screen potential drugs against COVID-19, and accurately prioritizing candidate small molecules during initial drug screening is vital to the treatment of COVID-19. More importantly, VDA-KLMF captures two candidate drugs (nitazoxanide and emetine) in addition to remdesivir and ribavirin, and thus provides more choices for the initial screening of available compounds against COVID-19.
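AUC and AUPR summarize a ranking over all decision thresholds, whereas recall, precision, specificity, and F1 score each depend on one chosen cutoff. The sketch below computes both with NumPy only. It assumes AUC as the pairwise ranking probability and AUPR estimated as average precision, which is one common estimator and not necessarily the one used in the paper; the function names and toy scores are hypothetical.

```python
import numpy as np

def auc_score(labels, scores):
    """ROC AUC: probability that a positive outranks a negative (ties count 1/2)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)

def aupr_score(labels, scores):
    """Area under the precision-recall curve, estimated as average precision."""
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = labels[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return precision_at_k[hits].mean()   # mean precision at each true positive

# Toy ranking (hypothetical): two known associations among five candidates.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
auc = auc_score(labels, scores)
aupr = aupr_score(labels, scores)
```

Because known VDAs are far fewer than unknown pairs, AUPR is usually the more demanding of the two, since it is not inflated by the large number of correctly ranked negatives.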

To better uncover potential therapeutic clues for COVID-19 and similar diseases produced by the evolving SARS-CoV-2, we plan the following future work. First, we will build a larger SARS-CoV-2-related database comprising drugs, diseases, and targets. Second, abundant biological data related to single-stranded RNA viruses should be integrated to more accurately depict the biological features of viruses and drugs. Finally, a more robust model, for example, a deep learning model, should be built to boost VDA identification performance. We anticipate that this work can contribute to the initial drug screening for the treatment of patients infected with COVID-19.

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

XT, LP, and LZ: conceptualization. XT and LP: methodology and writing-review and editing. XT and LH: software. XT, LS, PG, and GL: validation. LP and LZ: investigation, supervision, project administration, and funding acquisition. XT, LS, and GL: data curation. LP: writing-original draft preparation. LS: visualization. All authors have read and agreed to the published version of the manuscript. 

HCV Guidance. Available online at: http://hcvguidelines.org

Men's health: COVID-19 pandemic highlights need for overdue policy action

DTI-MLCD: predicting drug-target interactions using multi-label learning with community detection method

A review on drug repurposing applicable to COVID-19

Adaptive subgradient methods for online learning and stochastic optimization

Remdesivir: a review of its discovery and development leading to emergency use authorization for treatment of COVID-19

Anti-HCV, nucleotide inhibitors, repurposing against COVID-19

Drug repurposing for coronavirus (COVID-19): in silico screening of known drugs against coronavirus 3CL hydrolase and protease enzymes

Available online at

SAveRUNNER: a network-based algorithm for drug repurposing and its application to COVID-19

Remdesivir is a direct-acting antiviral that inhibits RNA-dependent RNA polymerase from severe acute respiratory syndrome coronavirus 2 with high potency

Improved prediction of drug-target interactions using regularized least squares integrating with kernel fusion technique

Managing COVID-19 in low- and middle-income countries

Prediction of microbe-disease association from the integration of neighbor and graph with collaborative recommendation model

A molecular modeling approach to identify effective antiviral phytochemicals against the main protease of SARS-CoV-2

Repurposing of FDA-approved antivirals, antibiotics, anthelmintics, antioxidants, and cell protectives against SARS-CoV-2 papain-like protease

MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization

Structural insights into the mechanism of RNA recognition by the N-terminal RNA-binding domain of the SARS-CoV-2 nucleocapsid phosphoprotein

Phylogenetic analysis and structural perspectives of RNA-dependent RNA-polymerase inhibition from SARs-CoV-2 with natural products

Preliminary structural data revealed that the SARS-CoV-2 B.1.617 Variant's RBD binds to ACE2 receptor stronger than the wild type to enhance the infectivity

Structures of SARS-CoV-2 RNA-binding proteins and therapeutic targets

Targeting the N-terminal domain of the RNA-binding protein of the SARS-CoV-2 with high affinity natural compounds to abrogate the protein-RNA interaction: a molecular dynamics study

RDKit: Open-Source Cheminformatics Software

Genomic variation, origin tracing, and vaccine development of SARS-CoV-2: A systematic review

Predicting lncRNA-miRNA interactions based on logistic matrix factorization with neighborhood regularized

A systematic study on drug-response associated genes using baseline gene expressions of the cancer cell line encyclopedia

Neighborhood regularized logistic matrix factorization for drug-target interaction prediction

Drug repurposing using computational methods to identify therapeutic options for COVID-19

Remdesivir against COVID-19 and other viral diseases

Computational-based drug repurposing methods in COVID-19

The performance of current methods in ligand-protein docking

Drug repositioning based on similarity constrained probabilistic matrix factorization: COVID-19 as a case study

COVID-19: viral-host interactome analyzed by network based-approach model to study pathogenesis of SARS-CoV-2 infection

AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility

Computational studies of drug repurposing and synergism of lopinavir, oseltamivir and ritonavir binding with SARS-CoV-2 Protease against COVID-19

An update on the management of chronic hepatitis C: 2015 consensus guidelines from the Canadian association for the study of the liver

Prioritizing antiviral drugs against SARS-CoV-2 by integrating viral complete genome sequences and drug chemical structures

Identifying effective antiviral drugs against SARS-CoV-2 by drug repositioning through virus-drug association prediction

The RCSB protein data bank: integrative view of protein, gene and 3D structural information

Exploring the SARS-CoV-2 virus-host-drug interactome for drug repurposing

Drug targets for COVID-19 therapeutics: ongoing global efforts

The PyMOL molecular graphics system

Update on nitazoxanide: a multifunctional chemotherapeutic agent

VDA-RWLRLS: an anti-SARS-CoV-2 drug prioritizing framework combining an unbalanced bi-random walk and Laplacian regularized least squares

Indicator regularized non-negative matrix factorization method-based drug repurposing for COVID-19

Human intestinal defensin 5 inhibits SARS-CoV-2 invasion by cloaking ACE2

Human cathelicidin inhibits SARS-CoV-2 infection: killing two birds with one stone

LRLSHMDA: Laplacian regularized least squares for human microbe-disease association prediction

Screening potential drugs for COVID-19 based on bound nuclear norm regularization

Available online at

DrugBank 5.0: a major update to the DrugBank database

Analysis of therapeutic targets for SARS-CoV-2 and discovery of potential drugs by computational methods

Discover the network underlying the connections between aging and age-related diseases

Human geroprotector discovery by targeting the converging subnetworks of aging and age-related diseases

Using network distance analysis to predict lncRNA-miRNA interactions

SPVec: a Word2vec-inspired feature representation method for drug-target interaction prediction

Probing antiviral drugs against SARS-CoV-2 through virus-drug association prediction based on the KATZ method

We sincerely thank the reviewers for their valuable comments. We would also like to thank all authors of the cited references.

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb.

Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.