key: cord-0814988-fqwb6kyc authors: Erola, Pau; Martin, Richard M.; Gaunt, Tom R. title: The network of SARS-CoV-2—cancer molecular interactions and pathways date: 2022-04-05 journal: bioRxiv DOI: 10.1101/2022.04.04.487020 sha: 60d0ae7ddab3b3f37db99d78d3eb0d2c01ea41bb doc_id: 814988 cord_uid: fqwb6kyc Background Relatively little is known about the long-term impacts of SARS-CoV-2 biology, including whether it increases the risk of cancer. This study aims to identify the molecular interactions between COVID-19 infections and cancer processes. Materials and Methods We integrated recent data on SARS-CoV-2 – host protein interactions, risk factors for critical illness, known oncogenes, tumor suppressor genes and cancer drivers in EpiGraphDB, a database of disease biology and epidemiology. We used these data to reconstruct the network of molecular links between SARS-CoV-2 infections and cancer processes in various tissues expressing the angiotensin-converting enzyme 2 (ACE2) receptor. We applied community detection algorithms and Gene Set Enrichment Analysis (GSEA) to identify cancer-relevant pathways that may be perturbed by SARS-CoV-2 infection. Results In lung tissue, the results showed that 4 oncogenes are potentially targeted by SARS-CoV-2, and 92 oncogenes interact with other human genes targeted by SARS-CoV-2. We found evidence of potential SARS-CoV-2 interactions with Wnt and hippo signaling pathways, telomere maintenance, DNA replication, protein ubiquitination and mRNA splicing. Some of these pathways were potentially affected in multiple tissues. Conclusions The long-term implications of SARS-CoV-2 infection are still unknown, but our results point to the potential impact of infection on pathways relevant to cancer affecting cell proliferation, development and survival, favoring DNA degradation, preventing the repair of damaging events and impeding the translation of RNA into working proteins. This highlights the need for further research to investigate whether such effects are transient or longer lasting. Our results are openly available in the EpiGraphDB platform at https://epigraphdb.org/covid-cancer and the repository https://github.com/MRCIEU/covid-cancer (https://doi.org/10.5281/zenodo.6391588). The coronavirus SARS-CoV-2 remains a major public health concern since it was first identified over 2 years ago, and a global scientific effort is focused on understanding its biology and how to minimize its clinical sequelae. One major cause of concern is the virus's unknown long-term effects [1] . It has long been known that viral infections have the potential to alter a cell's DNA, activating carcinogenic processes and preventing the immune system from eliminating damaged cells. There is, therefore, an urgent need to understand the long-term health consequences of SARS-CoV-2, including whether it increases the risk of cancer or accelerates its progression [2] . A key to addressing these questions is to develop our understanding of the shared biology of COVID-19 disease and cancer, with the aim of identifying molecular mechanisms that may link SARS-CoV-2 with cancer. Early work from Gordon et al. [3, 4] established a milestone in our understanding of the virus, describing viral bait genes and human target genes. Subsequently, Tutuncuoglu et al. [5] examined the structure and biological processes of this network of target genes, describing potentially undesirably perturbed candidate cancer genes at the host-virus interface. It has been hypothesized that SARS-CoV-2 may play a role in promoting cancer onset, for example by inhibiting tumor suppressor genes [6] , and its potential impact on cancer risk and treatment [2] . However, to the best of our knowledge, the downstream links between SARS-CoV2 and established cancer genes have not been quantitatively described. In this study, we hypothesize that these target genes may not be directly linked to cancer development themselves, but may interact with oncogenes, tumor suppressor genes and cancer drivers with the potential to influence oncogenic processes. To investigate these potential downstream effects, we analysed and visualized the network of interactions between the human target genes of SARS-CoV2 and cancer-related genes. To develop this "COVID-cancer" interaction network we used the EpiGraphDB database, an analytical bioinformatics' platform that supports population health data science, to link the recently mapped hostcoronavirus protein interactions in SARS-CoV-2 infections with existing knowledge of cancer biology and epidemiology. Using these data, we constructed an accessible network of plausible molecular interactions between viral targets, genetic risk factors, and known oncogenes, tumour suppressor genes and cancer drivers for a variety of cancer types. Our results are openly accessible through the EpiGraphDB platform at https://epigraphdb.org/cancer-covid. Coronaviruses target the respiratory system, and angiotensin-converting enzyme 2 (ACE2) produced by SARS-CoV-2 targets human cells expressing the ACE2 receptor [7] , including in lung tissue. Thus, many patients with SARS-CoV-2 suffer long-term symptoms due to damage to the walls and lining of the lungs. We reconstructed the molecular network for lung cancer risk, representing cancer genes and interacting SARS-CoV-2 target genes ( Figure 1 ). For the reconstructed network of other human cells expressing the ACE2 receptor see the interactive views at https://epigraphdb.org/cancer-covid or the data at https://github.com/MRCIEU/covid-cancer (https://doi.org/10.5281/zenodo.6391588). We found 93 human genes targeted by SARS-CoV-2 which were either oncogenic (n=4) or interacted with oncogenic genes (n=92). These genes were clustered to enrich the network visualization, where each cluster of genes was depicted as two columns, one for SARS-CoV2 target genes and one for cancer genes. We then searched for molecular pathways that may be perturbed by each gene. A total of 12 clusters were found and 8 of them showed significant enrichment for pathways. The top enrichments in lung tissue are in Table 1 , and the complete GSEA results for all tissues are in Supplementary File 1. Our results suggest SARS-CoV-2 host protein interactions could influence cancer risk through interactions with proteins in Wnt and hippo signaling pathways (clusters 4 and 11), two important pathways frequently linked to cancer due to their roles in cell proliferation, development and cell survival [8, 9] . Perturbations in telomere maintenance and DNA replication may affect the integrity of DNA, favoring its degradation and preventing the repair of damaging events like gene fusions. The enrichment in cluster 2 suggests this could be influenced by SARS-CoV-2 infection. In addition, several target proteins interact with the quasi-universal tumor suppressor TP53 [10] that regulates cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. SARS-CoV-2 infection may also impact on gene function through changes in the mRNA splicing process, impeding translation into working proteins, as depicted in cluster 7, and alterations in protein ubiquitination and the positive regulation of kinase activity as per the metabolic pathways enriched in cluster 3. In the latter, the oncogenes RET and ERBB2 could be affected by upstream gene targets. RET (receptor tyrosine-protein kinase) is involved in numerous signaling pathways including cell proliferation and both cell death and survival [11] . In addition, ERBB2 is a protein tyrosine kinase that is part of several cell surface receptor complexes and member of the epidermal growth factor (EGF) receptor family that prevents the phosphorylation of the tumor suppressor gene APC [12] . Furthermore, NOTCH4, a gene increasing the risk of critical illness in people with COVID-19 [13] , was linked to CCND1 and ERBB2, genes that impact on cancer metastasis and poor prognosis [12, 14, 15] . Cluster 1 shows enrichment for the cytokine-mediated signaling pathway, including genes IFNAR2 and TYK2. Both genes were identified as genetic risk factors for COVID-19 [13] and may interact with interleukin 6 (IL6), an important gene in the regulation of host defense during infections. Also, there was a significant enrichment of the regulation of DNA-binding transcription factors, including the oncogene MAPK1 that plays a vital role in the regulation of meiosis, mitosis and postmitotic functions in differentiated cells [16] . We compared the oncogenic genes that are SARS-CoV-2 targets, or interact with them, in 6 different tissues expressing the ACE2 receptor, including lung, testis, colon, kidney, stomach and small intestine. Figure 2A shows the overlap of these sets of genes using a Venn diagram. Lung is the tissue with the UOB Confidential & Sensitive most oncogenic genes (n=96) likely targeted and it shares 17 genes with colonic, 14 genes with gastric, and 6 genes with testicular, tissues. Of those, 13 genes are shared between lung, colonic and gastric, and 4 genes between lung, colonic and testicular, tissue. Figure 3A details the genes shared between tissues. The well-established oncogenic genes TP53, MYC, PTEN and RASSF1 were all present for kidney, stomach and small intestine. The oncogene KRAS is present in all tissue networks except testis and the tumor suppressor gene APC was shown to be targeted in lung, colon, small intestine and stomach. Looking at the molecular pathways ( Figure 2B and 3B) there were 12 significantly enriched pathways shared between lung and testis. Those included important cancer-related molecular processes such as DNA replication, telomere maintenance, and the hippo signaling pathway. However, it is worth noting that this enrichment was triggered by only two genes STK3 and STK4. STK3 and STK4 are key components of the Hippo signaling pathway, playing a pivotal role in tumor suppression by restricting proliferation and promoting apoptosis [17] . Moreover, the regulation of the canonical Wnt signaling pathway was enriched for lung, colon, small intestine, and stomach. Only testis showed enrichment for regulation of signal transduction by p53 class mediator. A detailed list of the genes and tissues in this comparative analysis between tissues can be found in Supplementary File 2. It is estimated that 12% of all human cancers are caused by viruses. Viral oncogenesis is a complex process where viruses are necessary but not sufficient for cancer development and cancers only appear after long-term persistent infections [18] . Ghavasieh et al. [19] compared the host-parasite proteinprotein interaction (PPI) network of SARS-CoV-2 against 92 known viruses finding similarities with respiratory viruses (Influenza A and Human Respiratory Syncytial virus), as would be expected, and also with other viruses such as HIV1, a risk factor for cancer due to its activation of chronic inflammation and the exhaustion of T cells. Tutuncuoglu et al. [5] engineered the map of SARS-CoV-2 proteins directly interacting with potential cancer genes with the aim of identifying if coronaviruses and cancer cells disturb the same host pathways. Building on this approach, and with the objective of triangulating evidence [13] , we extended the network of SARS-CoV-2 interacting proteins to include not just those which are oncogenic themselves (rarely the case), but also those cancer genes whose product interacts with target proteins (i.e. upstream or downstream of SARS-CoV-2 interacting proteins). Whilst the virus interacts with few oncogenes directly, some exceptions can be seen in Figure 1 for lung tissue. The tumor suppressor genes PTPN13, RALA and MIB1 and the oncogene MARK2 are direct targets of the virus and therefore more likely to be perturbed. The final functional impact is, however, not known. Our results showed that SARS-CoV-2 could potentially alter fundamental cancer hallmarks [21] . Hypothetically, an inhibited tumor suppressor such as p53 could have an oncogenic potential through mechanisms similar to HPV [6] . Nonetheless, well-known oncogenic viruses like HPV typically establish long-term infections, but there is no evidence of long-term viral latency of SARS-CoV-2 [6] . Exceptions to this are immunodeficient individuals, such as some cancer patients, who are at risk of rapid viral evolution and prolonged infection [23, 24] . Long-term infections in these patients represent a source of variation for the virus that aids intra-host and inter-host transmission [25] , and requires additional mitigation strategies such as self-isolation, immunization, combination monoclonal antibodies and antiviral agents. Similarly, the end of legal mitigation strategies may place some patients at risk of continuous reinfection and to a similar biological stress as in long-term viral infections. To conclude, our novel results have pointed out the potential impact of SARS-CoV-2 on oncogenic pathways, and provide new understanding of the biology of SARS-CoV-2 virus and its molecular connection to cancer. However, the long-term implications of SARS-CoV-2 infection are still unknown and further research is needed to comprehend the oncogenic risk of SARS-CoV-2 and its induced inflammatory response. In this project we used the knowledge graph EpiGraphDB [26] , available at https://epigraphdb.org, to extract the human interactome of protein-protein interactions (PPI) originally obtained from StringDB [27] . Only high confidence interactions (interaction score > 700) were included in the database. We extended the dataset with the SARS-CoV-2 human protein-protein interactions that were extracted from the seminal work of Gordon et al. [3, 4] . Only human target proteins were added to the network. We also included the genetic risk factors of critical illness in COVID-19 identified by [13] . A total of 9 single nucleotide variants (SNPs) were mapped to the nearest gene. Those were related to key mediators of inflammatory organ damage and other host antiviral defense mechanisms. Finally, a list of genes previously associated with tissue specific cancers was extracted from CancerMine (http://bionlp.bcgsc.ca/cancermine/) [29] , a database based on automated mining of the scientific literature to identify and classify cancer genes as oncogenes, tumor suppressor genes and cancer drivers for different cancer types. Note that these are non-exclusive categories, and each gene can have multiple roles. Only genes with more than 2 references in the literature for any cancer type were included to avoid low confidence results. When necessary, conversion of gene names to UniProt identifiers, or vice versa, was performed using BioMart [28] . Network approaches have the potential to provide a holistic representation of unrelated genes or proteins onto a large-scale network context. Here, we built accessible networks of plausible molecular interactions between viral targets, genetic risk factors, and known oncogenes, tumor suppressor genes and cancer drivers for different cancer types. It has been recently demonstrated that SARS-CoV-2 targets specific human cells that express the receptor ACE2 [7] . Hence, we reconstructed 6 different networks, one for each tissue where ACE2 is highly expressed: lung, testis, colon, kidney, stomach, and small intestine. Each one of these networks was constructed based on the SARS-CoV-2 human target proteins, tissue-specific cancer oncogenic genes that interact with these target proteins, and genetic risk factors of critical illness of COVID-19 that interact with any cancer gene. These networks were compiled as CSV files and loaded into the Cytoscape platform, version 3.8.2 [30] . The scripts to reproduce the network construction are available at https://github.com/MRCIEU/covidcancer (https://doi.org/10.5281/zenodo.6391588). To increase the statistical power in our protein-protein interaction networks we reduced the dimensionality of data using network modules and functional annotation. These techniques and their accompanying visualizations tackle the challenge of deriving relevant biological insights from long lists of genes and allow us to gain novel insights about the true biologically important effect, leading to a more intuitive interpretation of the data. We used an edge-betweenness algorithm [31] , as implemented in the Cytoscape plugin ReactomeFIPlugIn version 8.0.3 [32] , to partition the networks into modules of interacting proteins. Only modules with more than 4 nodes were allowed. Subsequently, we looked at the enrichment analysis for pathways and GO annotations for each subset of genes and annotated the modules, using the binomial test implemented in ReactomeFIPlugIn. Top enriched categories with a cutoff FDR<10 -4 were extracted for each module. All network diagrams were drawn with Cytoscape, and layouts were tuned manually to clearly display the modules and different target genes and cancer related genes. The comparison of targeted oncogenic genes and the pathway enrichment between different tissues were performed using Venn diagrams with the R package venn [33] . For a more detailed representation of similarities between tissues we converted the data into binary matrices for genes and pathways. These matrices were plotted with the R package ggcorrplot [34] , but only those genes and pathways shared between two or more tissues were depicted. R version 4.1.2 [35]. P.E. and T.R.G. wrote the manuscript. P.E. conceptualized the project. T.R.G. and P.E. supervised the project. P.E., T.R.G and R.M. acquired funding for the project. P.E. integrated the data and performed the data analysis. All authors contributed to the review and editing of the manuscript and provided comments. Long-term consequences of COVID-19: research needs. The Lancet Infectious Diseases The Impact of COVID-19 on Cancer Risk and Treatment A SARS-CoV-2 protein interaction map reveals targets for drug repurposing Comparative host-coronavirus protein interaction networks reveal pan-viral disease mechanisms The Landscape of Human Cancer Proteins Targeted by SARS-CoV-2 SARS-CoV-2 infection and cancer Evidence for and against a role of SARS-CoV-2 in cancer onset SARS-CoV-2 Receptor ACE2 Is an Interferon-Stimulated Gene in Human Airway Epithelial Cells and Is Detected in Specific Cell Subsets across Tissues Wnt signaling in cancer Control of proliferation and cancer growth by the hippo signaling pathway TP53 mutations in human cancers: functional selection and impact on cancer prognosis and outcomes RET tyrosine kinase signaling in development and cancer Overexpression of ErbB2 in cancer and ErbB2-targeting strategies Genetic mechanisms of critical illness in COVID-19 The role of CCND1 alterations during the progression of cutaneous malignant melanoma Prognostic significance of CCND1 (cyclin D1) overexpression in primary resected nonsmall-cell lung cancer The MAP kinase pathway is required for entry into mitosis and cell survival Mapping the STK4/Hippo signaling network in prostate cancer cell Human Viral Oncogenesis: A Cancer Hallmarks Analysis Multiscale statistical physics of the pan-viral interactome unravels the systemic nature of SARS-CoV-2 infections Triangulation in aetiological epidemiology Hallmarks of cancer: the next generation Sex-specific clinical characteristics and prognosis of coronavirus disease-19 infection in Wuhan, China: A retrospective study of 168 severe patients Public Library of Science SARS-CoV-2 Variants in Patients with Immunosuppression Chronic SARS-CoV-2 infection and viral evolution in a hypogammaglobulinaemic individual. medRxiv Recurrent SARS-CoV-2 Mutations in Immunodeficient Patients. medRxiv EpiGraphDB: a database and data mining platform for health data science STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic acids research BioMart -Biological queries made easy CancerMine: a literature-mined resource for drivers, oncogenes and tumor suppressors in cancer Network biology using Cytoscape from within R. F1000Research 2019 8:1774. F1000 Research Limited Community structure in social and biological networks A human functional protein interaction network and its application to cancer data analysis Draw Venn Diagrams 1 Visualization of a Correlation Matrix using "ggplot2 R: The R Project for Statistical Computing The authors declare no competing interests in relation to the work described.