key: cord-1049176-uu7uuj10
authors: Tang, Xianfang; Cai, Lijun; Meng, Yajie; Xu, JunLin; Lu, Changcheng; Yang, Jialiang
title: Indicator Regularized Non-Negative Matrix Factorization Method-Based Drug Repurposing for COVID-19
date: 2021-01-29
journal: Front Immunol
DOI: 10.3389/fimmu.2020.603615
sha: 20d1e6959c0504bd936527aa6da02e63a9ca7733
doc_id: 1049176
cord_uid: uu7uuj10

A novel coronavirus, named COVID-19, has become one of the most prevalent and severe infectious diseases in human history. Currently, there are only very few vaccines and therapeutic drugs against COVID-19, and their efficacies are yet to be tested. Drug repurposing aims to explore new applications of approved drugs, which can significantly reduce time and cost compared with de novo drug discovery. In this study, we built a virus-drug dataset, which included 34 viruses, 210 drugs, and 437 confirmed related virus-drug pairs from existing literature. In addition, we developed an Indicator Regularized non-negative Matrix Factorization (IRNMF) method, which introduces the indicator matrix and the Karush-Kuhn-Tucker condition into the non-negative matrix factorization algorithm. According to 5-fold cross-validation on the virus-drug dataset, the performance of IRNMF was better than that of the other methods, with an Area Under the receiver operating characteristic Curve (AUC) value of 0.8127. Additionally, we performed a case study on COVID-19 infection, and our results suggested that the IRNMF algorithm could prioritize unknown virus-drug associations.

Human coronaviruses (HCoVs) are a large family of enveloped, single-stranded, positive-sense RNA viruses belonging to the subfamily Orthocoronavirinae. The subfamily comprises α-coronavirus, β-coronavirus, γ-coronavirus, and δ-coronavirus (1). Commonly, the α-coronavirus and the β-coronavirus can only infect mammals, particularly humans (2).
In addition, the spread of severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV) in the last two decades has brought considerable risks to human life and caused substantial economic losses worldwide (3). At the end of 2019, another new pathogenic human coronavirus, named COVID-19, emerged; its long incubation period allowed the infection to spread rapidly to all regions of the world (4-7). According to media reports, the total number of people infected with COVID-19 globally has reached 60 million and the number of deaths has exceeded 1,500,000. Many studies have focused on the formation, treatment, and vaccines of COVID-19 (8-16). On October 22, 2020, Remdesivir became the first COVID-19 treatment officially approved by the U.S. Food and Drug Administration (FDA). However, high medical expenses prompt researchers to find alternative drugs with the same or similar efficacy as Remdesivir.

Drug repurposing aims to explore new treatment strategies based on approved drugs, outside the scope of their original medical indication. It has gained considerable attention because it can significantly reduce time and labor costs compared with de novo drug development (17-19). For instance, the anti-viral drug Ribavirin can be used to treat infections caused by Hepatitis C Virus (HCV), respiratory syncytial virus (RSV), and influenza B virus (IBV) (20). Lim et al. proposed that Ribavirin could be a potential drug for COVID-19 based on clinical treatment (21). Muralidharan et al. found, using sequential docking, that a combination of three drugs was more effective against COVID-19 than any single drug alone, with the binding energy increasing nearly to 100% (22).
This evidence shows that existing drugs can be repositioned and used as potential treatments for COVID-19. However, although research on repositioning drugs to treat COVID-19 is growing rapidly, so far only a few databases collate the potentially related drugs. The success of traditional computational methods of drug repurposing relies on protein targets, sequences, and other biological data. In contrast, novel computational methods, which focus on drug-target relationships using network approaches, depend on the development of high-throughput technologies. Increasing evidence from experimentally validated studies indicates that these novel computational methods are meaningful and useful. Using the protein-protein interaction (PPI) network, Zhou et al. combined the virus-related proteins and the drugs that target the corresponding proteins to construct a network-based method aimed at identifying potential drugs against COVID-19 (23). Elsewhere, Gysi et al. developed multiple network strategies to identify drugs with potential efficacy against COVID-19 and used three rank aggregation methods (Borda's count, the Dowdall method, and CRank) to assess the selected drugs and produce the final drug ranking (24). In addition, Wang et al. employed hierarchical virtual screening methods to analyze the structure of COVID-19 and used the MM-PBSA-WSAS method, which calculates the binding energy, to obtain prospective inhibitors of COVID-19 (25). Research to identify and develop more effective drugs to prevent and treat COVID-19 is therefore urgently needed (26). It is also imperative to use novel computational methods to accelerate large-scale testing of the associations between different drugs and COVID-19.
In our study, we built a novel virus-drug dataset and proposed the indicator regularized non-negative matrix factorization (IRNMF) method to predict potential drugs for COVID-19; to our knowledge, this is the first time such an algorithm has been applied in this area. FDA-approved anti-viral drugs against viruses such as α-coronavirus, β-coronavirus, influenza virus, and HIV were used to construct the virus-drug association dataset, which comprised 34 viruses, 210 drugs, and 437 confirmed related virus-drug pairs. We used this dataset to study the relationship between existing drugs and COVID-19. Several studies have shown that associations between miRNAs or lncRNAs and biological processes or diseases can be predicted effectively by the non-negative matrix factorization (NMF) method (27, 28). Based on this framework, we proposed the IRNMF method. Firstly, we calculated the drug and virus similarities from the molecular information of the drugs and the sequence information of the viruses. Secondly, we constructed a virus-drug interaction network based on the virus-drug associations, the drug similarity matrix, and the virus similarity matrix. Thirdly, we introduced the indicator matrix into the non-negative matrix factorization algorithm to constrain the final association matrix, from which the optimal drugs related to COVID-19 could be selected. Five-fold cross-validation (CV) was used to evaluate IRNMF, and its AUC value was 0.8127. This value was compared with those of other methods: NMF (0.7968), IMC (0.7221), CMF (0.6470), and RLSMDA (0.7384). The results indicated that the proposed method achieved the best performance in predicting virus-drug associations for the treatment of COVID-19.
Moreover, we analyzed the cases of COVID-19 and MERS-CoV infections, and our results suggested that the IRNMF method could efficiently prioritize unknown virus-drug associations for the treatment of COVID-19.

To detect possible drugs against COVID-19, this study proposed a novel method called IRNMF, which includes three steps. First, we calculated the molecular similarity of the drugs and the similarity of the viruses. Next, we constructed the virus-drug interaction network based on the virus-drug associations, the drug similarity, and the virus similarity. Lastly, we performed IRNMF to reveal the potential drugs against COVID-19. Figure 1 shows the framework of the IRNMF method.

Studying a variety of viruses and their corresponding drug targets can provide novel strategies for the development of treatments against COVID-19. For the virus collection, we preferred viruses that infect humans, such as coronaviruses, RNA viruses, DNA viruses, and HIV. For the drug collection, we included drugs originally developed against the related viruses and broad-spectrum anti-viral drugs with therapeutic effects against viruses. Based on these criteria, we constructed a virus-drug dataset that included 34 viruses, 210 drugs, and 437 confirmed virus-drug pairs, as illustrated in Table 1.

Individual virus particles of the same strain have genetic sequences that are very similar but not completely identical, and this resemblance can be measured by comparing the sequences of different viruses. Multiple Alignment using Fast Fourier Transform (MAFFT) is a multiple sequence alignment program used in molecular biological studies. MAFFT has been upgraded iteratively and has grown into a complete system that supports different biological analyses based on sequence similarity. The system offers different methods, such as progressive alignment, iterative refinement, and structural alignment for RNAs (29, 30).
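To make the idea of a sequence-based virus similarity concrete, the sketch below turns an existing multiple sequence alignment into a pairwise similarity table. It is only an illustration under stated assumptions: the text does not say how alignment output is converted into a similarity score, so the use of percent identity over ungapped columns, the function names `pairwise_identity` and `similarity_matrix`, and the toy sequences are all hypothetical.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Fraction of identical characters over columns where neither sequence has a gap.

    Assumption: percent identity stands in for the similarity score, which the
    text does not specify.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def similarity_matrix(aligned):
    """Build a symmetric similarity table from a dict of name -> aligned sequence."""
    names = list(aligned)
    sim = {(name, name): 1.0 for name in names}
    for p, q in combinations(names, 2):
        score = pairwise_identity(aligned[p], aligned[q])
        sim[(p, q)] = score
        sim[(q, p)] = score
    return sim


# Hypothetical, already-aligned toy sequences (not real viral genomes).
toy_alignment = {
    "virus_a": "ATG-CGTA",
    "virus_b": "ATGACGTT",
    "virus_c": "TTG-CGTA",
}
S_v = similarity_matrix(toy_alignment)
```

In practice the aligned sequences would come from the alignment program rather than being typed by hand, and a different distance or identity definition could be substituted without changing the rest of the pipeline.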
Here, we used the FFT-NS-1 algorithm, which is appropriate for medium-scale alignments, to calculate the similarities of the viruses.

Existing drug similarity measures are based on the overlap between virus-related drugs. As shown in Eq. (1), we adopted the Tanimoto coefficient (TC) to calculate the similarity between all drug pairs (31). The similarity was computed from the molecular structures of the drugs, which were downloaded from the DrugBank website.

$$D(A, B) = \frac{S_C}{S_A + S_B - S_C} \qquad (1)$$

Here, $S_A$ and $S_B$ are the values of the targets, genes, and structures related to drugs A and B, respectively; $S_C$ is the value of the part common to A and B; and $D(A, B)$ is the Tanimoto coefficient, which ranges from 0 to 1. The larger the value, the more similar the structures of the two drugs. The calculation was performed with Open Babel V2.3.1.

Several studies have focused on matrix computation in biological analysis (32-34). The NMF method decomposes the original matrix into a product of two non-negative matrices (35, 36). It can be used to factorize a virus-drug matrix $Y \in \mathbb{R}^{n \times m}$ into two matrices $U \in \mathbb{R}^{n \times k}$ and $V \in \mathbb{R}^{m \times k}$ with $k \ll \min(n, m)$, so that $Y \approx U V^{T}$.

Based on the NMF method, we introduced the indicator matrix into NMF to construct the IRNMF algorithm for repurposing virus-related drugs. The indicator matrix ensures that the product of $U$ and $V$ stays consistent with the original matrix. It contains only two values, 0 and 1: 0 means that there is no value at that position of the matrix, and 1 means that there is a value at that position. This helps to avoid noisy information effectively. In the objective function of IRNMF, $I$ represents the indicator matrix; $\alpha$, $\beta$, $\lambda_1$, and $\lambda_2$ are the regularization coefficients; $U_i$ and $V_j$ are the $i$th and $j$th rows of $U$ and $V$, respectively; and $S_{if}$ and $S_{jq}$ are entries in the $i$th row of $S_d$ and the $j$th row of $S_v$, the drug similarity and virus similarity matrices, respectively. $\|\cdot\|_F$ is the Frobenius norm and $Tr(\cdot)$ is the trace of a matrix. $Q_d = D_d - S_d$ and $Q_v = D_v - S_v$ are the Laplacian matrices of $S_d$ and $S_v$, where $D_d$ and $D_v$ are the diagonal matrices of $S_d$ and $S_v$, respectively (37).

In the corresponding Lagrange function (Eq. 4), $\delta = [\delta_{ik}]$ and $\epsilon = [\epsilon_{jk}]$ are the matrices of Lagrange multipliers. The partial derivatives of Eq. (4) with respect to $U$ and $V$ are then easy to obtain. We adopted the Karush-Kuhn-Tucker (KKT) conditions, $\delta_{ik} U_{ik} = 0$ and $\epsilon_{jk} V_{jk} = 0$, to solve the Lagrange optimization problem and obtain the regularized non-negative matrices $U$ and $V$, which represent the drug and virus matrices, respectively. With the update rules of Eq. (5), we obtain the virus-drug association matrix as $Y^{*} = U V^{T}$, which matches the stated dimensions of $U$ and $V$, and then select the optimal virus-related drugs based on $Y^{*}$.

We evaluated the performance of IRNMF and the other methods with 5-fold CV (38). The virus-drug associations were randomly divided into five subsets of equal size. In each experiment, four subsets were used to train the model and the remaining subset was used to evaluate it. After the process had been repeated five times, every virus-drug association score had been estimated once, and the scores were sorted. Subsequently, we set a threshold $s$. A pair whose association score was higher than $s$ was predicted as a positive sample, and a pair whose score was lower than $s$ was predicted as a negative sample. This study then adopted the receiver operating characteristic (ROC) curve to assess and compare the methods.
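The factorization step described above can be illustrated with a small NumPy sketch. It is not the authors' implementation and does not reproduce their Eqs. (3) to (5), which are not available in this text. It assumes a loss made of a masked Frobenius fit term, two Laplacian trace terms, and two ridge penalties, built only from the symbols the text defines (indicator matrix, $Q = D - S$, $\alpha$, $\beta$, $\lambda_1$, $\lambda_2$), and it uses standard multiplicative updates of the kind that follow from KKT conditions. The function names, the toy data, and the choice of which axis indexes drugs are hypothetical.

```python
import numpy as np


def objective(Y, I, U, V, S_row, S_col, alpha, beta, lam1, lam2):
    """Assumed loss: masked fit + Laplacian smoothness + ridge penalties."""
    Q_row = np.diag(S_row.sum(axis=1)) - S_row  # Laplacian Q = D - S
    Q_col = np.diag(S_col.sum(axis=1)) - S_col
    fit = np.linalg.norm(I * (Y - U @ V.T), "fro") ** 2
    smooth = alpha * np.trace(U.T @ Q_row @ U) + beta * np.trace(V.T @ Q_col @ V)
    ridge = lam1 * np.linalg.norm(U, "fro") ** 2 + lam2 * np.linalg.norm(V, "fro") ** 2
    return fit + smooth + ridge


def masked_graph_nmf(Y, I, S_row, S_col, k=18, alpha=0.8, beta=0.8,
                     lam1=0.1, lam2=0.1, n_iter=1500, eps=1e-9, seed=0):
    """Multiplicative updates for the assumed loss; I must be a 0/1 mask."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = rng.random((n, k))  # non-negative initialisation
    V = rng.random((m, k))
    D_row = np.diag(S_row.sum(axis=1))
    D_col = np.diag(S_col.sum(axis=1))
    IY = I * Y
    for _ in range(n_iter):
        R = I * (U @ V.T)  # masked reconstruction
        U *= (IY @ V + alpha * (S_row @ U)) / (
            R @ V + alpha * (D_row @ U) + lam1 * U + eps)
        R = I * (U @ V.T)
        V *= (IY.T @ U + beta * (S_col @ V)) / (
            R.T @ U + beta * (D_col @ V) + lam2 * V + eps)
    return U, V


# Hypothetical toy data: 4 row entities, 3 column entities.
Y = np.array([[1, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 1]], dtype=float)
I = np.ones_like(Y)
I[1, 2] = 0.0  # treat this entry as unobserved
S_row = np.eye(4)
S_row[0, 1] = S_row[1, 0] = 0.9
S_row[2, 3] = S_row[3, 2] = 0.9
S_col = np.eye(3)

U, V = masked_graph_nmf(Y, I, S_row, S_col, k=2, n_iter=200)
scores = U @ V.T  # predicted association scores, same shape as Y
```

The default arguments mirror the parameter values reported in the paper (rank 18, regularization coefficients 0.8 and 0.1, 1500 iterations); the toy call overrides the rank and iteration count only because the example matrix is tiny.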
The true positive rate (TPR) and the false positive rate (FPR) were computed as follows:

$$TPR = \frac{TP}{TP + FN}, \qquad FPR = \frac{FP}{FP + TN}$$

where TP and TN are the numbers of true positive and true negative samples predicted by the method, and FP and FN are the numbers of false positive and false negative samples predicted by the method. Lastly, the area under the ROC curve (AUC) was used to measure the performance of the different methods.

In this work, the parameters of the IRNMF method were set as follows. The rank $k$ of the factorization was set to 18. $\alpha$, $\beta$, $\lambda_1$, and $\lambda_2$ are the regularization coefficients; to simplify the method, we assumed that $\alpha = \beta = 0.8$ and $\lambda_1 = \lambda_2 = 0.1$. The number of iterations was set to 1500.

In this study, we adopted the IRNMF algorithm to select potential drugs against COVID-19. Here, the dataset did not contain associations between the candidate drugs and COVID-19; the related drugs were predicted from the associations among the other drugs and viruses. To confirm the performance of IRNMF, we compared it with four algorithms for virus-drug association prediction: non-negative matrix factorization (NMF), inductive matrix completion (IMC), collective matrix factorization (CMF), and RLSMDA. Notably, all of these methods are semi-supervised procedures. Luo et al. adopted the NMF method to predict small molecule-miRNA associations (28). Natarajan et al. proposed the IMC method to predict gene-disease associations (39). The IMC method combines multiple types of features to learn latent factors; it can be applied to novel associations and, unlike traditional matrix completion, needs only the existing association networks. Elsewhere, the CMF method was developed to predict drug-disease associations (40).
This method jointly factorizes multiple matrices derived from the original interaction matrices and represents the associations of different classes, which helps to understand the latent characteristics of the different relationships. Additionally, Chen et al. developed the RLSMDA algorithm to study the correlation between miRNAs and diseases (41). To obtain the likelihood of a relationship between miRNAs and diseases, this algorithm builds a continuous classification function using regularized least squares. Of note, the parameters of each comparison method were tuned to their optimal settings before the experiments were performed.

Here, the proposed IRNMF method was compared with the NMF, IMC, CMF, and RLSMDA methods through 5-fold CV. As illustrated in Figure 2, the ROC curve of IRNMF was above those of NMF, IMC, CMF, and RLSMDA in most of the experiments. The AUC values of IRNMF, NMF, IMC, CMF, and RLSMDA were 0.8127, 0.7968, 0.7221, 0.6470, and 0.7384, respectively. Relative to NMF, IMC, CMF, and RLSMDA, the AUC of IRNMF was higher by about 2%, 12.5%, 25.6%, and 10.1%, respectively. This suggests that IRNMF captured more information about the similarity of viruses and of drugs than the other methods; in particular, it showed a better prediction performance than NMF.

We also selected the most frequent potential drugs for COVID-19 across IRNMF and the other four algorithms. As shown in Table 2, Ribavirin, Nitazoxanide, Amantadine, N4-Hydroxycytidine, Chloroquine, and Mizoribine were predicted by all methods. Camostat, Niclosamide, Favipiravir, Zanamivir, Artesunate, Umifenovir, and Gemcitabine appeared 3-4 times. In particular, 13 of the top 16 drugs predicted by IRNMF were among these most frequent potential drugs, and the remaining top-16 drugs selected by IRNMF were predicted at least once by the other four methods.
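The evaluation protocol described above (random 5-fold split of the known pairs, a threshold swept over the sorted scores, TPR and FPR at each threshold, and the area under the resulting curve) can be sketched as follows. This is a minimal illustration, not the authors' evaluation script: the function names, the toy labels and scores, and the random seed are assumptions, while the AUC values in `reported` are the ones stated in the text.

```python
import numpy as np


def five_fold_indices(n_pairs, n_folds=5, seed=0):
    """Randomly split pair indices into folds of (nearly) equal size."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_pairs), n_folds)


def roc_auc(labels, scores):
    """ROC curve by sweeping a threshold over the scores, AUC by the trapezoid rule."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = labels.sum()
    n_neg = (~labels).sum()
    fpr, tpr = [0.0], [0.0]
    for s in np.unique(scores)[::-1]:  # thresholds from high to low
        pred = scores >= s
        tpr.append((pred & labels).sum() / n_pos)   # TP / (TP + FN)
        fpr.append((pred & ~labels).sum() / n_neg)  # FP / (FP + TN)
    fpr = np.array(fpr)
    tpr = np.array(tpr)
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    return fpr, tpr, auc


def relative_gain(new, base):
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


# Hypothetical toy labels and scores, only to exercise the functions.
fpr, tpr, auc = roc_auc([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.4, 0.2])

# 437 known virus-drug pairs, as stated in the text.
folds = five_fold_indices(437)

# AUC values reported in the text; gains are relative to each baseline.
reported = {"NMF": 0.7968, "IMC": 0.7221, "CMF": 0.6470, "RLSMDA": 0.7384}
gains = {name: round(relative_gain(0.8127, value), 1) for name, value in reported.items()}
```

Computing the gains this way reproduces the percentages quoted in the comparison, which indicates that they are relative rather than absolute differences in AUC.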
The result showed that the top-ranked predicted drugs were more important than the lower-ranked ones and that valuable drugs were more likely to be selected by the other methods as well. Therefore, we conclude that the IRNMF algorithm is helpful for the prediction of drugs against COVID-19.

TABLE 2 | The statistics based on the Top16 predicted drugs by the five methods. The bold drugs represent the predicted drugs by the IRNMF method. (Entries recovered from the table: Remdesivir, 3; 14, Gemcitabine, 3.)

To assess the impact of the parameters on the performance of IRNMF, this study assumed that the four regularization coefficients satisfied $\alpha = \beta$ and $\lambda_1 = \lambda_2$, and comparative experiments were performed by adjusting these two values. As shown in Figure 3, when the value of $\alpha$ and $\beta$ was in the range of 0.1-0.4, the performance of IRNMF improved as this value increased. However, when the value of $\alpha$ and $\beta$ was fixed, the performance of IRNMF weakened as the value of $\lambda$ increased. When the value of $\alpha$ and $\beta$ was in the range of 0.5-1.0, the AUC of IRNMF was stable between 0.8 and 0.81 and fluctuated only within a small range. Our results showed that IRNMF had a stable generalization performance.

In this study, we analyzed the top 16 predicted potential drugs against COVID-19, as illustrated in Figure 4. Drugs such as Ribavirin, N4-Hydroxycytidine (NHC), Remdesivir, and Favipiravir are nucleoside analogs that can bind to the RNA-dependent RNA polymerase (RdRp) enzyme, which is crucial during the life cycle of RNA viruses. Ribavirin is an anti-viral drug used to treat infections caused by influenza viruses and HCV. Hung et al. (42) reported, in a randomized controlled trial of 127 patients, that the combined use of interferon beta-1b, lopinavir-ritonavir, and Ribavirin was safer and more effective than a single medication for the treatment of COVID-19.
Elsewhere, N4-Hydroxycytidine (NHC) has been shown to treat infections caused by the influenza virus, Ebola, and SARS-CoV. Moreover, Sheahan et al. (43) noted that it could inhibit the replication of clinical isolates of COVID-19 in both Vero and Calu-3 cells. The study showed a dose-dependent reduction, with an IC50 of 0.3 μM and CC50 > 10 μM in Vero cells, an IC50 of 0.08 μM for virus titers, and an IC50 of 0.09 μM for viral genomic RNA. Remdesivir, which was developed for the treatment of the Ebola virus, has been of interest because it is metabolized to Remdesivir triphosphate, which competes for incorporation by RdRp and interferes with the viral RNA replication of COVID-19 (13 [...] (48).

Zanamivir is a neuraminidase inhibitor that disrupts viral exit from infected cells and has known anti-viral activity against influenza viruses. Hall et al. found evidence from in silico docking models that Zanamivir could be a 3CL protease inhibitor against COVID-19 infection (49). Clinical trials on Nitazoxanide, an anti-viral prodrug, showed that this drug exhibits in vitro activity against MERS-CoV and SARS-CoV infections, and therefore it could be used as a potential drug for the treatment of COVID-19 patients (50). Artemisinin is a potent anti-malarial drug. Sardar [...] (56). Therefore, scientists need to perform more experiments to validate this potential drug. Mizoribine is an immunosuppressive drug, and research shows that it could inhibit the replication of SARS-CoV and MERS-CoV; hence it could also be a potential drug against COVID-19. In addition, Chloroquine, an anti-malarial drug, is used in the treatment of several other diseases. Gao et al. found that chloroquine could interfere with general endocytic trafficking to inhibit both the replication of and the infection caused by COVID-19 (57). However, Mehra et al. proposed that chloroquine could not help patients recover when used alone or with a macrolide (58). Based on these findings, we do not recommend this drug as a clinical treatment option.

FIGURE 4 | The predicted Top16 potential drugs by IRNMF. The dark color indicates that the predicted drug has been validated. The light color indicates that the predicted drug has not been validated.

In conclusion, COVID-19 is becoming one of the most prevalent and infectious diseases in human history. Existing drugs can be repositioned and used as potential treatments for COVID-19. However, so far, only a few databases collate the drugs potentially related to the treatment of COVID-19. In this study, we constructed a virus-drug dataset that included 34 viruses, 210 drugs, and 437 confirmed virus-drug pairs. In addition, we developed the indicator regularized non-negative matrix factorization (IRNMF) algorithm to predict potential drugs against COVID-19. In the IRNMF model, the known virus-drug associations, the virus similarity network, and the drug similarity network were merged to calculate the prediction score of each virus-drug pair. According to 5-fold CV, the AUC value of IRNMF is 0.8127. Compared with NMF (0.7968), IMC (0.7221), CMF (0.6470), and RLSMDA (0.7384), the proposed method achieved better performance. The results suggest that the IRNMF algorithm can infer unknown virus-drug associations. However, IRNMF is limited by the scale of the virus-drug dataset, so the predicted potential drugs might not be totally accurate. Therefore, our future work is to enlarge the virus-drug dataset for COVID-19 using the corresponding drug and virus databases.

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://github.com/dukebai/IRNMF.
XT: Conceptualization, methodology, software, writing-original draft, visualization. JX, JY, LC: Conceptualization, methodology, visualization, supervision, project administration, funding acquisition. YM, CL: data curation, formal analysis, writing-review and editing. All authors contributed to the article and approved the submitted version.

Novel SARS-CoV-2 outbreak and COVID19 disease; a systemic review on the global pandemic
COVID19: an announced pandemic
Recent discovery and development of inhibitors targeting coronaviruses
Characteristics of COVID-19 infection in Beijing
The COVID-19 cytokine storm; what we know so far
The 2019 novel coronavirus disease (COVID-19) pandemic: A zoonotic prospective
SARS-CoV-2 causing pneumonia-associated respiratory disorder (COVID-19): diagnostic and proposed therapeutic options
Extensive Partnership, Collaboration, and Teamwork is Required to Stop the COVID-19 Outbreak
Immunoinformatics approach to understand molecular interaction between multi-epitopic regions of SARS-CoV-2 spike-protein with TLR4/MD-2 complex
COVID-19: Consider IL6 receptor antagonist for the therapy of cytokine storm syndrome in SARS-CoV-2 infected patients
Tocilizumab: A therapeutic option for the treatment of cytokine storm syndrome in COVID-19
Probable Molecular Mechanism of Remdesivir for the Treatment of COVID-19: Need to Know More
Cytokine storm in COVID-19: the current evidence and treatment strategies
Development of epitope-based peptide vaccine against novel coronavirus 2019 (SARS-COV-2): Immunoinformatics approach
A SARS-CoV-2 vaccine candidate: In-silico cloning and validation
Drug repositioning: identifying and developing new uses for existing drugs
A bibliometric review of drug repurposing
The broad-spectrum antiviral ribonucleoside ribavirin is an RNA virus mutagen
Antiviral drugs-: history and obstacles
Case of the index patient who caused tertiary transmission of COVID-19 infection in Korea: the application of lopinavir/ritonavir for the treatment of COVID-19 infected pneumonia monitored by quantitative RT-PCR
Computational studies of drug repurposing and synergism of lopinavir, oseltamivir and ritonavir binding with SARS-CoV-2 Protease against COVID-19
Network-based drug repurposing for novel coronavirus 2019-nCoV/SARS-CoV-2
Network medicine framework for identifying drug repurposing opportunities for covid-19
Fast identification of possible drug treatment of coronavirus disease-19 (COVID-19) through computational drug repurposing study
Boosting the arsenal against COVID-19 through computational drug repurposing
DMFMDA: Prediction of microbe-disease associations based on deep matrix factorization using Bayesian Personalized Ranking
Incorporating Clinical, Chemical and Biological Information for Predicting Small Molecule-microRNA Associations based on Non-negative Matrix Factorization
MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform
MAFFT multiple sequence alignment software version 7: improvements in performance and usability
A novel features ranking metric with application to scalable visual and bioinformatics data classification
A Machine Learning Method for Drug Combination Prediction
Identifying Potential miRNAs-Disease Associations with Probability Matrix Factorization
CMF-Impute: an accurate imputation tool for single-cell RNA-seq data
Predicting miRNA-disease association by integrating low-rank matrix completion with miRNA and disease similarity information
A novel computational method for the identification of potential miRNA-disease association based on symmetric non-negative matrix factorization and Kronecker regularized Least Square
A graph regularized non-negative matrix factorization method for identifying microRNA-disease associations
Degree-based similarity indexes for identifying potential miRNA-disease associations
Inductive matrix completion for predicting gene-disease associations
Predicting Drug-Disease Associations via Multi-Task Learning Based on Collective Matrix Factorization
Semi-supervised learning for potential human microRNA-disease associations inference
Triple combination of interferon beta-1b, lopinavir-ritonavir, and ribavirin in the treatment of patients admitted to hospital with COVID-19: an open-label, randomised, phase 2 trial
An orally bioavailable broad-spectrum antiviral inhibits SARS-CoV-2 in human airway epithelial cell cultures and multiple coronaviruses in mice
Delayed Initiation of Remdesivir in a COVID-19-Positive Patient
Experimental treatment with favipiravir for COVID-19: an open-label control study. Engineering (2020)
Amantadine Treatment for People with COVID-19
Favipiravir versus arbidol for COVID-19: a randomized clinical trial
Various Combination of Protease Inhibitors, Oseltamivir, Favipiravir, and Hydroxychloroquine for Treatment of COVID19: A Randomized Control Trial (THDMS-COVID19)
A search for medications to treat COVID-19 via in silico molecular docking models of the SARS-CoV-2 spike glycoprotein and 3CL protease
Potential role for nitazoxanide in treating SARS-CoV-2 infection
COVID-19 and Plasmodium vivax malaria co-infection. IDCases (2020) 21:e00879
Pyronaridine and artesunate are potential antiviral drugs against COVID-19 and influenza
Antiviral activity of berberine
Recognition of Natural Products as Potential Inhibitors of COVID-19 Main Protease (Mpro): In-Silico Evidences
Screening of FDA-approved drugs using a MERS-CoV clinical isolate from South Korea identifies potential therapeutic options for COVID-19
Camostat mesilate therapy for COVID-19
Breakthrough: Chloroquine phosphate has shown apparent efficacy in treatment of COVID-19 associated pneumonia in clinical studies
Hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a multinational registry analysis

We are grateful to the support of Hunan University and the Hunan Provincial Innovation Foundation for Postgraduate.