key: cord-0770165-qk5owhg3
authors: Das, Asim Bikas
title: Lung Disease Network Reveals the Impact of Comorbidity on SARS-CoV-2 infection
date: 2020-05-13
journal: bioRxiv
DOI: 10.1101/2020.05.13.092577
sha: 5d5098848cce80723641592054718be95573a63b
doc_id: 770165
cord_uid: qk5owhg3

Higher mortality of COVID19 patients with comorbidity is a formidable challenge faced by the health care system. In response to the present crisis, understanding the molecular basis of comorbidity is essential to accelerate the development of potential drugs. To address this, we measured the genetic association between COVID19 and various lung disorders and observed a remarkable resemblance. A total of 141 lung disorders, directly or indirectly linked to COVID19, form a high-density disease-disease association network that shows a small-world property. The clustering of many lung diseases with COVID19 demonstrates the greater complexity and severity of SARS-CoV-2 infection. Furthermore, our results show that the functional protein-protein interaction modules involved in RNA and protein metabolism, which are substantially hijacked by SARS-CoV-2, are connected to several lung disorders. Therefore, we recommend targeting the components of these modules to inhibit viral growth and improve clinical conditions in comorbidity.

To understand the risk of COVID19 with comorbidity, we constructed and analyzed the disease-gene and disease-disease association maps of the STN. To construct a disease association map of the STN, we obtained disease-gene association data from the ORGANizer database [12]. A total of 184 lung diseases, 1957 genes, and 6039 disease-gene pairs were considered for further analysis (see Methods; supplementary table 2). To construct the disease-gene association map, we screened for the diseases that are associated with proteins (nodes) in the STN. A disease and a node are then connected if the node is associated with the disorder in the lungs.
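The linking rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the function names `build_disease_gene_map` and `degrees`, and the toy disease and gene identifiers, are hypothetical, and the real input would be the lung disease-gene pairs and the STN node list.

```python
from collections import defaultdict


def build_disease_gene_map(disease_gene_pairs, stn_nodes):
    """Link each disease to the STN nodes (genes/proteins) associated with it.

    Pairs whose gene is not a node in the STN are dropped, so a disease with
    no STN-associated gene does not appear in the map at all.
    """
    stn_nodes = set(stn_nodes)
    disease_to_genes = defaultdict(set)
    for disease, gene in disease_gene_pairs:
        if gene in stn_nodes:
            disease_to_genes[disease].add(gene)
    return dict(disease_to_genes)


def degrees(disease_to_genes):
    """Degree k of every disease and every gene in the bipartite map."""
    k = defaultdict(int)
    for disease, genes in disease_to_genes.items():
        k[disease] = len(genes)
        for gene in genes:
            k[gene] += 1
    return dict(k)


# Hypothetical toy input (not data from the study).
toy_pairs = [
    ("disease_1", "GENE_A"),
    ("disease_1", "GENE_B"),
    ("disease_2", "GENE_B"),
    ("disease_3", "GENE_X"),  # GENE_X is not an STN node, so this pair is dropped
]
toy_stn_nodes = {"GENE_A", "GENE_B", "GENE_C"}

toy_map = build_disease_gene_map(toy_pairs, toy_stn_nodes)
toy_k = degrees(toy_map)
print(toy_map)
print(toy_k)
```

Storing the map as a dictionary of gene sets keeps the later steps simple: shared genes between two diseases are a set intersection, and the degree of a disease is the size of its set.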
We observed that 618 genes/proteins, including 36 SARS-CoV-2 targets, are linked to a total of 146 disorders, including COVID19 (supplementary table 3). Similarly, a disorder in the LDGN is also connected to multiple genes, for instance ventricular septal defect (k = 142), respiratory insufficiency (k = 133), congestive heart failure (k = 95), apnea (k = 63), and hypothyroidism (k = 60) (Fig. 2b and Fig. S1). The disease-gene association pattern in the LDGN indicates a molecular connection between COVID19 and a wide range of lung disorders.

To comprehend the association between COVID19 and lung diseases, a disease-disease association network (DDAN) was constructed, in which two diseases were linked if they share at least one associated gene (Fig. 2d). The DDAN consists of a total of 141 diseases (nodes) and 1326 links, indicating high clustering between diseases. We observed 49 diseases (red nodes) in the DDAN that are directly connected to COVID19 (yellow node) (Fig. 3d). The Jaccard similarity coefficient was computed from the number of common genes to quantify the extent of molecular overlap between lung diseases and COVID19. Several diseases, such as respiratory insufficiency, congestive heart failure, respiratory failure, ventricular septal defect, mitral regurgitation, and hyperthyroidism, are closely associated with COVID19 (Fig. S2a and b). Thus, patients with these disorders are probably more vulnerable to COVID19 symptoms, or vice versa, because of overlapping molecular connections. We observed that the degree distribution of the DDAN does not follow the scale-free property (Fig. 2e). A network is said to be a small-world network if S > 1 [13]. Hence, the topology of the DDAN shows the small-world property, indicating that any two diseases in the DDAN have a high tendency to be interconnected, which results in overlapping pathogenesis between the diseases in the DDAN.
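As a rough illustration of the DDAN construction and the gene-based Jaccard similarity described above, the following Python sketch projects a disease-to-genes map onto a disease-disease network and ranks diseases by their overlap with a query disease. It assumes that "share one associated gene" means at least one shared gene; the function names and the toy gene sets are hypothetical and do not come from the study.

```python
from itertools import combinations


def build_ddan(disease_to_genes):
    """Link two diseases when they share at least one associated gene."""
    edges = set()
    for d1, d2 in combinations(sorted(disease_to_genes), 2):
        if disease_to_genes[d1] & disease_to_genes[d2]:
            edges.add((d1, d2))
    return edges


def jaccard(genes_a, genes_b):
    """Shared genes divided by genes associated with at least one disease."""
    union = genes_a | genes_b
    return len(genes_a & genes_b) / len(union) if union else 0.0


def rank_by_similarity(disease_to_genes, query):
    """Sort all other diseases by Jaccard similarity to the query disease."""
    scores = [
        (disease, jaccard(disease_to_genes[query], genes))
        for disease, genes in disease_to_genes.items()
        if disease != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy gene sets (not data from the study).
toy_disease_to_genes = {
    "COVID19": {"G1", "G2", "G3"},
    "disease_A": {"G2", "G3", "G4"},
    "disease_B": {"G3"},
    "disease_C": {"G9"},
}

toy_edges = build_ddan(toy_disease_to_genes)
toy_ranking = rank_by_similarity(toy_disease_to_genes, "COVID19")
print(sorted(toy_edges))
print(toy_ranking)
```

In this toy case, disease_A shares two of four genes with COVID19 (Jaccard 0.5), disease_B one of three, and disease_C none, so disease_C is not linked to COVID19 in the projected network.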
The molecular similarities between these lung disorders create a high-density comorbidity cluster and contribute to higher mortality in COVID19 patients. It is therefore both necessary and challenging to develop effective drugs to control the patient-specific risks of comorbidity in SARS-CoV-2 infection. However, it is difficult to select and prioritize targets for treatment because of the many overlapping molecular connections. Therefore, we propose to target host functional protein modules that are associated with different disorders and hijacked by SARS-CoV-2.

Modularity in a network refers to the pattern of connectedness in which nodes are grouped into highly connected subsets [14]. One of the key features of a protein interaction network is that the tightly connected proteins within a community are mostly involved in similar biological functions [15]. Similarly, genes involved in related diseases are shown to be highly connected; moreover, diseases linked to common genes result in the formation of disease modules and comorbidity [16]. We compared various community detection algorithms, i.e., fast-greedy, walktrap, louvain, leading eigenvector, and spinglass, to identify protein modules in the STN [17, 18]. Spinglass showed good partitioning, i.e., a higher modularity score than the other algorithms (see Methods and supplementary table 4). [...] system are associated with these modules (Fig. S3). Gysi et al. [20] predicted that the manifestation of SARS-CoV-2 in different human tissues could cause various disorders. Therefore, not only lung-related disorders but also comorbidity in various organs can be a potential threat for COVID19 patients. To strengthen this observation, the pattern of coexpression of genes in the functional modules was analyzed. Genes in the same functional module often show a high coexpression profile; therefore, we calculated Pearson correlation coefficients for pairs of genes using gene expression data of healthy lung tissue from TCGA.
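The pairwise coexpression calculation can be sketched as follows. This is a plain-Python stand-in, not the Hmisc-based R workflow used in the study; the function names, the toy expression values, and the choice of summarizing a module by the median of its positive coefficients are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def median_positive_coexpression(expression, genes):
    """Median of the positive pairwise correlations among the given genes.

    Returns None when no gene pair is positively correlated.
    """
    values = [
        pearson(expression[g1], expression[g2])
        for g1, g2 in combinations(genes, 2)
    ]
    positive = [r for r in values if r > 0]
    return median(positive) if positive else None


# Hypothetical expression values across four samples (not TCGA data).
toy_expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
}

r_ab = pearson(toy_expression["GENE_A"], toy_expression["GENE_B"])
r_ac = pearson(toy_expression["GENE_A"], toy_expression["GENE_C"])
toy_median = median_positive_coexpression(toy_expression, ["GENE_A", "GENE_B", "GENE_C"])
print(r_ab, r_ac, toy_median)
```

The same function can be applied to a randomly drawn gene set of equal size, which gives a baseline against which the module-level value can be judged.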
The median value of the positive correlation between the genes in all modules is significantly higher (p-value < 0.0001) than that of the random gene set (Fig. 3, fourth column). Therefore, these modules can be identified as coexpression modules that share core transcriptional programs in the lung, indicating that their perturbation can result in a similar disease phenotype.

We propose to target functional protein modules hijacked by SARS-CoV-2 through drug repositioning. There are two main reasons to target these modules. Firstly, the binding of a drug to its target in a module will prevent the replication of the virus. Secondly, as a module is linked to several lung diseases, targeting a module can reduce the severity of comorbidity. We identified 56 approved targets in the functional modules (red nodes in Fig. 4) from DrugBank [21]. Considering the complexity of COVID19, we also suggest using combination therapy to target multiple highly connected nodes simultaneously in the same or different functional modules (indicated by the arrow in Fig. 4), for example NTRK1 (k = 43) and IMPDH2 (k = 37) in module 1, as well as PLAT (k = 17) and COMT (k = 10) in module 2.

[...] Reads Per Kilobase Million) transformed data of adjacent healthy tissue of 59 lung adenocarcinoma patients were retrieved, and the Pearson correlation coefficient was computed to measure the coexpression levels using the Hmisc package in R. Average path length, transitivity, dyadicity, and the Jaccard similarity coefficient were measured using the igraph package in R. Average path length refers to the average length of the pairwise shortest paths from one set of nodes to another set of nodes, and transitivity (T) indicates the number of triangles in the graph relative to the total number of connected triples of nodes. Dyadicity (D) is the number of same-label edges divided by the expected number of same-label edges, and D > 1 indicates higher connectedness between nodes with the same label.
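The three topological measures defined above can be illustrated with a small pure-Python sketch. It is not the igraph code used in the study. Two assumptions are made here: path lengths are averaged over reachable node pairs only, and the expected number of same-label edges in the dyadicity is taken as the number of possible same-label pairs multiplied by the overall edge density of the graph. All function names and the toy graph are hypothetical.

```python
from collections import deque
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def transitivity(adj):
    """Closed connected triples divided by all connected triples."""
    closed, triples = 0, 0
    for nbs in adj.values():
        k = len(nbs)
        triples += k * (k - 1) // 2
        closed += sum(1 for u, v in combinations(nbs, 2) if v in adj[u])
    return closed / triples if triples else 0.0


def dyadicity(adj, labelled):
    """Observed same-label edges divided by the number expected from density."""
    labelled = set(labelled)
    n, n1 = len(adj), len(labelled)
    m = sum(len(nbs) for nbs in adj.values()) // 2
    m11 = sum(1 for u, v in combinations(labelled, 2) if v in adj.get(u, ()))
    density = m / (n * (n - 1) / 2)
    expected = density * n1 * (n1 - 1) / 2
    return m11 / expected if expected else 0.0


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
toy_adj = adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])

toy_apl = average_path_length(toy_adj)
toy_t = transitivity(toy_adj)
toy_d = dyadicity(toy_adj, {"a", "b", "c"})
print(toy_apl, toy_t, toy_d)
```

For the toy graph, three of the five connected triples are closed, the six node pairs have a total shortest-path length of eight, and the three labelled nodes are joined by three edges where two would be expected from the density, so the labelled nodes are more interconnected than expected (D > 1).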
The Jaccard similarity coefficient of two nodes is the number of their common neighbors divided by the number of nodes that are neighbors of at least one of the two nodes being considered. One thousand random networks of the same density were generated using the Erdős-Rényi random graph model. The random networks were compared with the original network by measuring the Z-score and p-value. The R packages tidyverse and stringr were used for data analysis, and graphs were plotted with ggplot2. Networks were visualized using Gephi. All statistical tests were performed using R.

Implications of COVID-19 for patients with pre-existing digestive diseases
Prevalence and impact of cardiovascular metabolic diseases on COVID-19 in China
Comorbidity and its impact on 1590 patients with Covid-19 in China: A Nationwide Analysis
Disease association of human tumor suppressor genes
The human disease network
Large-scale mining disease comorbidity relationships from post-market drug adverse events surveillance data
A SARS-CoV-2 protein interaction map reveals targets for drug repurposing
Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant
The TissueNet v.2 database: A quantitative view of protein-protein interactions across human tissues
Community of protein complexes impacts disease association
Gene ORGANizer: linking genes to the organs they affect
Small-World Brain Networks Revisited
The road to modularity
Comparison of module detection algorithms in protein networks and investigation of the biological meaning of predicted modules
Network medicine: a network-based approach to human disease
Statistical mechanics of community detection
Fast unfolding of communities in large networks
Topological and functional comparison of community detection algorithms in biological networks
Network Medicine Framework for Identifying Drug Repurposing Opportunities for COVID-19
DrugBank 5.0: a major update to the DrugBank database for 2018
Drug-target network
DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants
Statistical mechanics of community detection
Metascape provides a biologist-oriented resource for the analysis of systems-level datasets
GOSemSim: an R package for measuring semantic similarity among GO terms and gene products