key: cord-1051489-6168uuii
authors: Ghosh, Nimisha; Saha, Indrajit; Sharma, Nikhil
title: Interactome of Human and SARS-CoV-2 Proteins to Identify Human Hub Proteins Associated with Comorbidities
date: 2021-10-06
journal: Comput Biol Med
DOI: 10.1016/j.compbiomed.2021.104889
sha: c7d37f904037fc59dc89e3d5c06456e488e76b77
doc_id: 1051489
cord_uid: 6168uuii

SARS-CoV-2 has a higher chance of progression in adults of any age with certain underlying health conditions or comorbidities such as cancer and neurological diseases, and in certain cases may even lead to death. Like other viruses, SARS-CoV-2 interacts with host proteins to pave its entry into host cells. Therefore, host-virus protein-protein interactions (PPIs) can be very useful for understanding the behaviour of SARS-CoV-2 and for designing effective antiviral drugs. In this regard, we have initially created a human-SARS-CoV-2 PPI database from existing works in the literature, which has resulted in 7085 unique PPIs. Subsequently, for each virus protein, we have identified at most 10 interacting human proteins with the highest degrees, viz. hub proteins. The identification of these hub proteins is important as they are connected to most of the other human proteins. Consequently, when they are affected, the potential diseases are triggered in the corresponding pathways, thereby leading to comorbidities. Furthermore, the biological significance of the identified hub proteins is shown using KEGG pathway and GO enrichment analysis. KEGG pathway analysis is also essential for identifying the pathways leading to comorbidities. Among others, SARS-CoV-2 proteins viz. NSP2, NSP5, Envelope and ORF10 interacting with human hub proteins like COX4I1, COX5A, COX5B, NDUFS1, CANX, HSP90AA1 and TP53 lead to comorbidities. Such comorbidities are Alzheimer, Parkinson, Huntington, HTLV-1 infection, prostate cancer and viral carcinogenesis.
Subsequently, possible repurposable drugs which target the human hub proteins are identified using the Enrichr tool and reported in this paper as well. Therefore, this work provides a consolidated study of human-SARS-CoV-2 protein interactions to understand the relationship between comorbidity and hub proteins, so that it may pave the way for the development of antiviral drugs.

SARS-CoV-2, the virus responsible for COVID-19, has disrupted our daily lives, and even after almost two years we are still struggling in our fight against the virus. Though it originated in China, in no time COVID-19 cases were reported from all around the globe. By September 2021, more than 229 million people had been affected by this virus, with more than 4 million deaths. The usual symptoms of COVID-19 range from common cough and cold, shortness of breath and fever to multiple organ failure, which may eventually lead to death. Since this is an RNA virus, it shows high mutations, and new strains of the virus are also in circulation right now. According to the WHO, the strains of the virus declared as variants of concern are Alpha or B.1.1.7, Beta or B.1.351, Gamma or P.1 and Delta or B.1.617.2 [1] [2] [3]. SARS-CoV-2 encompasses four structural proteins (spike glycoprotein, envelope, membrane glycoprotein and nucleocapsid), apart from non-structural proteins (NSP1-NSP16) and accessory proteins like ORF3a, ORF6, ORF7a, ORF7b, ORF8, ORF9b, ORF9c and ORF10 [4]. Viruses are incapable of living and reproducing outside a host body. Thus, they need to infiltrate a host for their survival. Protein-protein interaction (PPI) is one such way by which a virus invades a host cell [5], and SARS-CoV-2 is no exception. For SARS-CoV-2, bats are supposed to be the primary hosts and pangolins are identified as the possible intermediate hosts from which the virus was transmitted to humans, resulting in COVID-19 [6] [7] [8].
Furthermore, knowledge of virus invasion and pathogenesis of SARS-CoV-2 is very important for understanding the comorbidities in the human host. In this regard, the study of PPI is crucial and is helpful in drug repurposing and discovery as well. These facts have motivated us to conduct this research. Traditionally, PPI data are collected mainly through laboratory-based methods such as protein chips [9, 10], correlated mRNA expression profiles [11], TAP-tagging [12, 13], yeast two-hybrid [14, 15] and synthetic lethal analysis [16]. However, laboratory-based methods are mostly time-consuming and labour-intensive. Also, due to the voluminous nature of PPI data, there is a chance that PPI data generated by laboratory-based methods may not be complete [17]. Furthermore, small proteins are difficult to recognise in a laboratory set-up although they have important functional roles in many biological processes [18]. Moreover, it has been frequently observed that high numbers of false positives and false negatives occur in the results of laboratory-based methods [19] [20] [21]. To mitigate these problems, a large number of computational methods have been proposed in the literature to identify protein-protein interactions. In this regard, a very popular method to predict PPI is the link prediction model, in which proteins are considered to interact if they are similar [22]. However, the accuracy of such models is heavily dependent on the reliability of PPI networks, which may be affected by a huge number of false-negative and false-positive PPIs. Also, given the scale-free property of PPI networks [23, 24], some PPI networks are dense while others are mostly sparse (average degree of 7 or less [25]), and link prediction models are not very efficient for sparse networks. Thus, high-throughput technologies which consider biological information of proteins can be used to predict PPIs [26].
In [27], the authors have used bioinformatics and machine learning approaches to identify potential drug targets and pathways in COVID-19. In this regard, they have identified 1520 and 1733 differentially expressed genes (DEGs) from the GSE152418 and CRA002390 PBMC datasets and have considered a hub gene signature based on module membership (MMhub) statistics and PPI networks. Furthermore, they have demonstrated the classification performance of hub genes with more than 90% accuracy, thereby suggesting the potential of the hub genes as biomarkers. Gupta et al. [28] have also used machine learning for the prediction of new small-molecule modulators of PPI. In their work, they have concluded that Random Forest predicts general PPI modulators independent of PPI family with an AUC-ROC value > 0.9. They have also identified novel chemical scaffolds as inhibitors of the RBD:hACE2 PPI, which is involved in host cell entry of SARS-CoV-2.

Several public databases have been created for experimentally determined human-virus PPI data, and they fall mostly into two categories [29]. The first category consists of species-specific databases, each encompassing only one viral species. It includes the NCBI HIV-1 Human Interaction Database [30], HCVpro [31], DenHunt [32], DenvInt [33] and ZikaBase [34]. The second category comprises databases covering a wider range of virus species, such as Viruses.STRING [35], VirusMentha [36], PHISTO [37], VirHostNet [38] and HPIDB [39]. Mostly, these public databases are created by integrating other PPI databases using automatic integration tools like PSICQUIC [40], or they may be manually collected from other public databases as well. In order to contribute to the ongoing research pertaining to SARS-CoV-2 and PPI, in this work we have initially created a human-SARS-CoV-2 PPI database from existing works in the literature. In this regard, we have identified 7085 unique PPIs between the human and SARS-CoV-2 proteins.
This consolidated database is a novel contribution of our work. Furthermore, for each virus protein we have identified at most 10 human hub proteins, i.e. those with the highest degrees. These hub proteins are connected to most of the other human proteins. Consequently, if they are affected, the potential diseases in the pathways of most of the human proteins will be triggered as well, thereby leading to comorbidities. The biological significance of the identified human hub proteins is reported using KEGG, which is essential for identifying the corresponding pathways related to diseases or comorbidities. GO enrichment analysis is performed as well. As a consequence, it is identified that SARS-CoV-2 proteins viz. NSP2, NSP5, Envelope and ORF10 interacting with human hub proteins like COX4I1, COX5A, COX5B, NDUFS1, CANX, HSP90AA1 and TP53 can lead to comorbidities. Such comorbidities comprise Alzheimer, Parkinson, Huntington, HTLV-1 infection, prostate cancer and viral carcinogenesis. Moreover, drug repurposing, which is an effective strategy for discovering new uses of existing drugs, is a very practical alternative to de novo drug discovery and random clinical trials. Considering this, we have also reported possible repurposable drugs like Disodium Selenite, Desipramine, Clindamycin and Vorinostat targeting the human hub proteins. To summarise, we have prepared a human-SARS-CoV-2 PPI database by curating PPIs from different existing works in the literature, resulting in 7085 unique PPIs; identified human hub proteins using the resulting PPI networks; and finally identified a list of repurposable drugs for these human hub proteins as well as the comorbidity issues related to them. To the best of our knowledge, these consolidated ideas have not been addressed previously in any article. Therefore, this study addresses the gaps in the literature through the above-mentioned contributions.
It is to be noted that other works like [41, 42] have analysed drug repurposability and comorbidities by considering expression data, as opposed to our work, which directly considers PPI data for the above analysis.

In this section, the data preparation is elaborated first, followed by a discussion of the pipeline of the proposed work. For our work, we have initially prepared a consolidated human-SARS-CoV-2 PPI database taking into consideration the PPIs from [4], [5] and [43]. There are 332 PPIs in [4], whereas [5] has reported 6489 PPIs; the remaining PPIs are taken from Li et al. [43]. The pipeline of the work is shown in Figure 1. The distribution of the PPIs in the literature is shown in Figure 1(b). Thereafter, all the human proteins for a particular virus protein are given as input to the STRING database. The STRING database returns all the human-human protein interactions for those inputs and may include additional human proteins apart from the ones that are provided as inputs. It may also exclude some of the input human proteins in the process. Next, for each SARS-CoV-2 protein, at most 10 human proteins with the highest degrees, viz. hub proteins, are identified. It is important to note that, based on their association with an individual SARS-CoV-2 protein, there are two levels of human proteins, Level 1 and Level 2, as shown in Figure 1. Once the hub proteins are identified, their pathways are explored to understand the effects of these hub proteins on comorbidities, and their biological significance is demonstrated using KEGG pathway and GO enrichment analysis. KEGG pathway analysis is also important for identifying the pathways leading to comorbidities. Finally, potential repurposable drugs targeting the human hub proteins are identified to curb the effects of COVID-19.

The human proteins with the highest degrees are considered to be the hub proteins for each virus protein. The statistics of human proteins for each virus protein are reported in Table 1.
This table shows, for each SARS-CoV-2 protein: the number of unique human proteins directly interacting with it; the number of those directly interacting human proteins that are present in the human PPI network; the number of unique human hub proteins (out of the top 10) directly interacting with it; the number of unique human hub proteins (out of the top 10) indirectly interacting with it; and the number of unique human proteins, apart from the hub proteins, that are directly connected to the hub proteins. As mentioned earlier, not all human proteins directly interacting with the SARS-CoV-2 proteins may be part of the PPI network. This can be inferred from Table 1 as well. For example, for NSP1, 7 human proteins directly interact with the virus protein, while only 4 of them are present in the PPI network. The corresponding graph for the number of human proteins directly interacting with the SARS-CoV-2 proteins is shown in Figure 1(d). The sum of interactions, or the total degree, of the human proteins in the human PPI interactome with respect to each virus protein is shown in Figure 1(e). For example, NSP7 has a total of 53448 human PPI interactions. It can be seen from the figure that, out of the 29 virus proteins, 28 have corresponding human-human interaction networks, while NSP16 does not have any associated human-human protein interactions. Not all the identified human hub proteins interact directly with the SARS-CoV-2 proteins; some may be connected indirectly. For example, for NSP7, out of the 10 hub proteins, 9 interact directly with the SARS-CoV-2 protein, while 1 human hub protein interacts indirectly with the virus protein through some other human proteins.
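The degree-based hub selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `find_hubs`, the toy protein names and the alphabetical tie-breaking rule are all hypothetical, and the edge list stands in for the human-human interactions returned by STRING for one virus protein.

```python
from collections import defaultdict

def find_hubs(direct_partners, human_edges, top_k=10):
    """Rank human proteins by degree in a human-human PPI network.

    direct_partners: set of human proteins that interact directly with the
                     virus protein (Level 1 in the text).
    human_edges:     iterable of (protein_a, protein_b) pairs.
    Returns up to top_k tuples (protein, degree, is_direct).
    """
    # Treat edges as undirected, drop self-loops and repeated pairs.
    unique_edges = {frozenset(e) for e in human_edges if e[0] != e[1]}
    degree = defaultdict(int)
    for edge in unique_edges:
        for protein in edge:
            degree[protein] += 1
    # Assumption: ties are broken alphabetically; the source does not say how.
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(p, d, p in direct_partners) for p, d in ranked[:top_k]]

# Toy network with placeholder names (not real data).
hubs = find_hubs({"A", "B", "C"},
                 [("A", "B"), ("A", "C"), ("A", "X"), ("B", "A")])
for protein, deg, is_direct in hubs:
    print(protein, deg, "direct" if is_direct else "indirect")
```

Because only proteins that appear in the network receive a degree, a virus protein whose network contains fewer than `top_k` proteins yields fewer hubs, and a protein such as `X` that enters only through a human-human edge is flagged as indirectly connected.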
It is to be noted that SARS-CoV-2 proteins like NSP1, NSP2, NSP4, NSP6 and NSP14, which have 7, 15, 10, 4 and 10 interacting human proteins respectively, have 4, 9, 6, 2 and 2 hub proteins respectively, all fewer than 10. The details of the human hub proteins for each protein of SARS-CoV-2 are reported in Table 2. The table provides a list of the directly and indirectly connected hub proteins along with their respective degrees. For example, the directly connected hub proteins of NSP2 are NDUFS1, COX4I1, COX5A, COX5B, EIF4E2, FKBP15, GIGYF2 and MTCH2, with respective degrees of 4, 3, 3, 3, 1, 1, 1 and 1, while the indirectly connected hub protein is KIAA1033, which has a degree of 1. The human-SARS-CoV-2 PPI network with only the directly and indirectly connected human hub proteins is visualised in Figure 2, while Figure 3 shows the individual PPI networks for all the SARS-CoV-2 proteins. The networks are created using Cytoscape [46], which is an open-source platform. As there may be a lot of human proteins directly connected to the hub proteins (for example, the Envelope protein has 673 human proteins directly connected to hub proteins), for visualisation purposes only a handful of the human proteins from both Level 1 and Level 2 are shown for each SARS-CoV-2 protein, apart from all the hub proteins.

KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analysis reveals the potential diseases that can develop in humans due to SARS-CoV-2. Hub proteins are the ones which are connected to most of the other human proteins in the PPI network. Thus, instead of considering all the human proteins returned by the STRING database, only the hub proteins and those human proteins which are directly connected to the hub proteins are considered for the KEGG pathway analysis.
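The restriction of the KEGG input to hub proteins and their direct neighbours, followed by an over-representation test, could look like the sketch below. The source does not state which statistical test its enrichment tool uses; a one-sided hypergeometric test is assumed here because it is the common choice for pathway over-representation. The names `kegg_input_set` and `hypergeom_pvalue`, and all example values, are hypothetical.

```python
from math import comb

def kegg_input_set(hubs, human_edges):
    """Return the hub proteins plus every protein sharing an edge with a hub."""
    hubs = set(hubs)
    selected = set(hubs)
    for a, b in human_edges:
        if a in hubs:
            selected.add(b)
        if b in hubs:
            selected.add(a)
    return selected

def hypergeom_pvalue(overlap, query_size, pathway_size, background_size):
    """One-sided P(X >= overlap) for a pathway over-representation test.

    X counts pathway members among query_size proteins drawn without
    replacement from a background of background_size proteins, of which
    pathway_size belong to the pathway.
    """
    upper = min(query_size, pathway_size)
    total = comb(background_size, query_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Toy example with placeholder names (not real data).
query = kegg_input_set({"H"}, [("H", "N1"), ("N1", "N2"), ("N3", "H")])
print(sorted(query))
print(hypergeom_pvalue(overlap=2, query_size=len(query),
                       pathway_size=5, background_size=100))
```

The raw p-values produced this way would still need a multiple-testing correction before being compared with the 5% threshold used in the text.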
Table 3 ... [47, 48], which show that prolonged endoplasmic reticulum stress is responsible for the development and progression of many diseases like atherosclerosis, neurodegeneration, liver disease, type 2 diabetes and cancer. Moreover, TP53 targeted by ORF10 is enriched in the pathway hsa05203: Viral carcinogenesis (FDR-corrected p-value 1.70E-04). Other significant pathways are also found for the human proteins, with FDR-corrected p-values within 5% statistical significance.

Till now, no efficacious drug has been discovered to combat SARS-CoV-2. The traditional mechanism of drug development is expensive and time-consuming, thereby making drug repurposing a viable option for effective drug identification for COVID-19. In this regard, the human hub proteins corresponding to each SARS-CoV-2 protein can be considered good candidate targets for drug repurposing. Drugs that interact with the hub proteins are identified using DSigDB in the Enrichr tool. For each virus protein, the results for at most the top 5 drugs (if any), along with their DrugBank IDs as collected from DrugBank, their FDR-corrected p-values and the possible treatments, are reported in Table 4. As can be seen from Table 4, several drugs are identified which can be used for treating cancer. For example, Tanespimycin (FDR-corrected p-value 4.44E-03, DrugBank ID DB05134), which targets the human hub protein HSP90AA1 corresponding to the Envelope protein, is used for treating several types of cancer, solid tumors or chronic myelogenous leukemia. As previously discussed, HSP90AA1, which is targeted by the SARS-CoV-2 Envelope protein, triggers the PI3K-Akt signaling pathway, whose aberrant activation promotes the survival and growth of tumor cells in many human cancers. Other drugs like Phenethyl isothiocyanate, 4-Hydroxytamoxifen, Daunorubicin, Camptothecin, Vorinostat and Diindolylmethane are also used for the treatment of various types of cancer.
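The selection rule described above (keep at most the top 5 drugs whose FDR-corrected p-value is below 5%) can be sketched as follows. This is an illustration only: it assumes the Benjamini-Hochberg procedure for the FDR correction, which the source does not name, and the function names and drug labels are hypothetical placeholders rather than Enrichr output.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def top_drugs(drug_pvalues, alpha=0.05, top_n=5):
    """Keep drugs with adjusted p-value < alpha, best first, at most top_n."""
    names = list(drug_pvalues)
    adjusted = benjamini_hochberg([drug_pvalues[n] for n in names])
    hits = [(n, q) for n, q in zip(names, adjusted) if q < alpha]
    return sorted(hits, key=lambda item: item[1])[:top_n]

# Placeholder raw p-values for one virus protein (not real results).
raw = {"drug_1": 0.01, "drug_2": 0.04, "drug_3": 0.03, "drug_4": 0.9}
for name, q in top_drugs(raw):
    print(name, round(q, 4))
```

Under this rule a virus protein can end up with fewer than five drugs, or none at all, which matches the "at most top 5 drugs (if any)" wording.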
It is worth noting that identified drugs like Resveratrol, known for the treatment of high cholesterol, cancer and heart disease, and Niclosamide, used for treating tapeworm infection, are under trials for the treatment of COVID-19 [49, 50]. Please note that not all the hub proteins involved in the KEGG pathway analysis have corresponding drugs with FDR-corrected p-values less than 5%. Thus, only those hub proteins for which there are corresponding relevant drugs are reported in Table 4. For example, for NSP2, the hub proteins with corresponding KEGG pathways having FDR-corrected p-values less than 5% are NDUFS1, COX4I1, COX5A and COX5B, while the hub proteins with relevant drugs having FDR-corrected p-values less than 5% are NDUFS1, COX5A and COX5B. Figure 6 provides a glimpse of the common hub proteins and drugs among multiple SARS-CoV-2 proteins. For example, RPSA is a hub protein common to NSP3, NSP7 and Spike glycoprotein, and the corresponding drug that targets RPSA is Disodium Selenite. Please note that though RPSA is also targeted by ORF9b, as shown in Table 4, it is not shown in the figure for ORF9b, because the FDR-corrected p-value of Disodium Selenite is not less than 5% in this case and it is therefore not a relevant drug for RPSA in ORF9b. Other drugs like Desipramine, Clindamycin and Vorinostat, used as an antidepressant, as an antibiotic and for treating cutaneous T-cell lymphoma (CTCL) respectively, are also relevant drugs for the human hub proteins targeted by multiple SARS-CoV-2 proteins. Apart from the discussed hub proteins, it is to be noted that, as per https://cancer.sanger.ac.uk/cosmic/, other identified hub proteins like XPC in NSP4, RPN1 in NSP5, XPO1 in NSP8, NUP214 in NSP9, PABPC1 and PCBP1 in NSP12, PRKACA in NSP13, SRSF3 and FIP1L1 in ORF7a and CALR in ORF8 are also cancer-related human proteins.

Comorbidity in COVID-19 patients is one of the primary reasons for so many deaths around the globe.
In this work, we have studied human and SARS-CoV-2 protein-protein interactions to identify human hub proteins associated with comorbidities. In this regard, we have initially collected 7116 human-SARS-CoV-2 PPIs from different works in the literature, resulting in 7085 unique PPIs. This can be considered a novel and significant contribution of our work. Thereafter, we have considered at most the top 10 human hub proteins based on their degrees. Moreover, the biological significance of the identified human proteins is demonstrated using KEGG, which is essential for identifying the pathways related to diseases or comorbidities. GO enrichment analysis is performed as well. This work provides a consolidated study of human-SARS-CoV-2 protein interactions to understand the association between comorbidity and human hub proteins, and we hope it will also be helpful in drug repurposing and discovery. To summarise, we have prepared a human-SARS-CoV-2 PPI database by curating PPIs from different works in the literature, resulting in 7085 unique PPIs; identified human hub proteins using the resulting PPI networks; and identified a list of repurposable drugs for these human hub proteins as well as the comorbidity issues related to them.
Table 2 (fragment): NSP11 | TBCA | 10 | TBCD, TBCE, TUBA1A, TUBA4A, TUBB1, TUBB2A, TUBB2B, TUBB4A, TUBB4B | 10

... Formal analysis; Methodology, Coding; Visualization; Writing - original draft & editing. Indrajit Saha: Conceptualization; Data curation; Supervision; Funding acquisition; Formal analysis; Investigation; Methodology; Project administration; Resources; Validation; Visualization; Writing - review & editing. Nikhil Sharma: Methodology; Visualization; Writing - review & editing.

[1] Novel SARS-CoV-2 variants: the pandemics within the pandemic
[2] One year into the pandemic: Short-term evolution of SARS-CoV-2 and emergence of new lineages, Infection
[3] Reduced sensitivity of SARS-CoV-2 variant delta to antibody neutralization
[4] A SARS-CoV-2 protein interaction map reveals targets for drug repurposing
[5] SARS-CoV-2-human protein-protein interaction network
[6] The potential intermediate hosts for SARS-CoV-2
[7] The proximal origin of SARS-CoV-2
[8] The continuing search for the origins of SARS-CoV-2
[9] A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules
[10] Global analysis of protein activities using proteome chips
[11] Correlation between transcriptome and interactome mapping data from saccharomyces cerevisiae
[12] An efficient tandem affinity purification procedure for interaction proteomics in mammalian cells
[13] Systematic identification of protein complexes in saccharomyces cerevisiae by mass spectrometry
[14] The two-hybrid system: an assay for protein-protein interactions
[15] A comprehensive two-hybrid analysis to explore the yeast protein interactome
[16] Systematic genetic analysis with ordered arrays of yeast deletion mutants
[17] Protein-protein interaction detection: Methods and analysis
[18] Prediction of human-virus protein-protein interactions through a sequence embedding-based machine learning method
[19] Analysis of protein complexes using mass spectrometry
[20] Precision and recall estimates for two-hybrid screens
[21] New methodologies for measuring protein interactions in vivo and in vitro
[22] Network-based prediction of protein interactions
[23] HiSCF: leveraging higher-order structures for clustering analysis in biological networks
[24] Control principles for complex biological networks
[25] Assessing and predicting protein interactions by combining manifold embedding with multiple information integration
[26] A survey on computational models for predicting protein-protein interactions
[27] Bioinformatics and machine learning approach identifies potential drug targets and pathways in COVID-19
[28] SMMPPI: a machine learning-based approach for prediction of modulators of protein-protein interactions and its application for identification of novel inhibitors for RBD:hACE2 interactions in SARS-CoV-2
[29] Current status and future perspectives of computational studies on human-virus protein-protein interactions
[30] HIV-1, human interaction database: current status and new features
[31] HCVpro: hepatitis C virus protein interaction database
[32] Denhunt - a comprehensive database of the intricate network of dengue-human interactions
[33] Denvint: A database of protein-protein interactions between dengue virus and its hosts
[34] Zikabase: An integrated ZIKV-human interactome map database
[35] STRING: A virus-host protein-protein interaction database
[36] Virusmentha: a new resource for virus-host protein interactions
[37] PHISTO: pathogen-host interaction search tool
[38] VirHostNet 2.0: surfing on the web of virus/host molecular interactions data
[39] HPIDB 2.0: a curated database for host-pathogen interactions
[40] PSICQUIC and PSISCORE: accessing and scoring molecular interactions
[41] Comparative transcriptome analysis of sars-cov, mers-cov, and sars-cov-2 to identify potential pathways for drug repurposing
[42] Unraveling the molecular crosstalk between atherosclerosis and covid-19 comorbidity
[43] Virus-host interactome and proteomic survey reveal potential virulence factors influencing SARS-CoV-2 pathogenesis
[44] Enrichr: Interactive and collaborative HTML5 gene list enrichment analysis tool
[45] Enrichr: a comprehensive gene set enrichment analysis web server 2016 update
[46] Cytoscape 2.8: New features for data integration and network visualization
[47] The role for endoplasmic reticulum stress in diabetes mellitus
[48] Role of endoplasmic reticulum stress in metabolic disease and other disorders
[49] Resveratrol inhibits HCoV-229E and SARS-CoV-2 coronavirus replication in vitro
[50] Getting hands on a drug for Covid-19: Inhaled and Intranasal Niclosamide