key: cord-1050675-z7r1w4c1 authors: Oh, J. H.; Tannenbaum, A.; Deasy, J. title: Identification of biological correlates associated with respiratory failure in COVID-19 date: 2020-09-30 journal: nan DOI: 10.1101/2020.09.29.20204289 sha: 132c0f8a31561b2ae6037901f1ad99c0613c7143 doc_id: 1050675 cord_uid: z7r1w4c1 Background: Coronavirus disease 2019 (COVID-19) is a global public health concern. Recently, a genome-wide association study (GWAS) was performed with participants recruited from Italy and Spain by an international consortium group. Methods: Summary GWAS statistics for 1610 patients with COVID-19 respiratory failure and 2205 controls were downloaded. In the current study, we analyzed the summary statistics with the information of loci and p-values for 8,582,968 single-nucleotide polymorphisms (SNPs), using gene ontology analysis to determine the top biological processes implicated in respiratory failure in COVID-19 patients. Results: We considered the top 708 SNPs, using a p-value cutoff of 5x10-5, which were mapped to the nearest genes, leading to 144 unique genes. The list of genes was input into a curated database to conduct gene ontology and protein-protein interaction (PPI) analyses. The top-ranked biological processes were wound healing, epithelial structure maintenance, muscle system processes, and cardiac-relevant biological processes with a false discovery rate < 0.05. In the PPI analysis, the largest connected network consisted of 8 genes. Through literature search, 7 out of the 8 genes were found to be implicated in both pulmonary and cardiac diseases. Conclusion: Gene ontology and protein-protein interaction analyses identified cardio-pulmonary processes that may partially explain the risk of respiratory failure in COVID-19 patients. Coronavirus disease 2019 caused by a novel coronavirus (severe acute respiratory syndrome coronavirus 2, SARS-CoV-2) has resulted in a global pandemic with a rapidly developing global health and economic crisis [1] . Most people with are asymptomatic or experience only mild symptoms [2] . However, about 5% of patients infected with the coronavirus develop acute lung injury and acute respiratory distress syndrome, possibly leading to lethal lung damage and even death [3] . The most common reported comorbidities associated with poor outcomes in COVID-19 include hypertension, diabetes, cardiovascular disease, and chronic respiratory infections [4, 5] . However, the underlying molecular mechanisms in severe COVID-19 and their interplay with such comorbidities or clinical factors are poorly understood [6] . To identify putative biomarkers that can help better understand the molecular basis of COVID-19, Blanco-Melo et al. investigated the host transcriptional response to SARS-CoV-2 and other respiratory infections through in vitro, ex vivo, and in vivo experiments [1] . Bioinformatical approaches including gene ontology and protein-protein interaction (PPI) analyses were performed to identify key biological correlates. To investigate key genetic variants associated with respiratory failure in COVID-19 patients, a genome-wide association study (GWAS) was carried out on participants recruited from Italy and Spain [7] . In the current study, we performed an in-depth biological characterization including gene ontology and PPI analyses on summary statistics that resulted from the GWAS analysis in order to identify key biological correlates relevant to respiratory failure in COVID-19 patients. The GWAS conducted by an international consortium group involved 1980 patients with severe acute respiratory failure induced by COVID-19 at seven hospitals in Italy and Spain [7] . After quality control, the final case-control cohort included 835 patients and 1255 control participants from Italy and 775 patients and 950 control participants from Spain. After genotyping and imputation on genome build GRCh38, univariate analysis was performed for 8,582,968 single-nucleotide polymorphisms (SNPs). The resulting summary statistics including individual SNP positions and p-values were submitted to the European Bioinformatics Institute (www.ebi.ac.uk/gwas; accession numbers, GCST90000255 and GCST90000256) and are available from www.c19-genetics.eu. The GCST90000255 was a main analysis in which all the association statistics were corrected for the top 10 principal components (PCs), whereas in the additional analysis of GCST90000256, association statistics were corrected for the top 10 PCs, age, and sex. In the current study, we downloaded the summary statistics of GCST90000255 for further biological analysis. To further enrich gene ontology terms with more plausible SNPs likely relevant to acute respiratory failure in COVID-19, we employed a relaxed p-value of 5×10 -5 as a filtering threshold on the summary statistics. The SNPs with p-values < 5×10 -5 were mapped to nearest genes using a 50kb window on both upstream and downstream sides of each gene. The resulting list of genes was together fed into MetaCore software (Thompson Reuters, New York, NY) for gene ontology analysis. Further PPI analysis was performed to explore the largest connected network, assuming that interacting proteins in a biological network may have the same or similar molecular functions [8] [9] [10] . All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this this version posted September 30, 2020. . preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this this version posted September 30, 2020. . https://doi.org/10.1101/2020.09.29.20204289 doi: medRxiv preprint In our analysis, with a p-value threshold of 5×10 -5 applied to the summary statistics, 708 SNPs remained and a corresponding set of 144 unique genes in autosomes was found. The list of genes was fed into a MetaCore database. Table 1 shows the top 10 biological processes and corresponding genes that appear to be relevant to respiratory failure in COVID-19 patients, all with a false discovery rate (FDR) < 0.05. Wound healing, epithelial structure maintenance, muscle system process, and cardiac-relevant biological processes were top-ranked. Zhang [30] The largest connected PPI network in the list of 144 genes is shown in Figure 2 . The PPI network consisted of 8 genes: GATA4, ID2, MAFA, NOX4, PTBP1, SMAD3, TUBB1, and WWOX. We conducted a literature search in PubMed to investigate the potential associations between those 8 genes/proteins and pulmonary or cardiac diseases. Table 2 lists an overview of reported studies in terms of the associations. Interestingly, except All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this this version posted September 30, 2020. . https://doi.org/10.1101/2020.09.29.20204289 doi: medRxiv preprint for MAFA that is involved in insulin secretion, all 7 genes were found to be implicated in both pulmonary and cardiac diseases. Summary statistics from a GWAS for respiratory failure in COVID-19 patients was analyzed employing techniques from bioinformatics. To enrich biological discovery, a relaxed p-value of 5×10 -5 was adopted, which likely enabled the inclusion of much potential genomic information in the analysis and the identification of plausible biological correlates associated with pulmonary or cardiac symptoms. A list of SNPs filtered by the relaxed p-value threshold was mapped to genes. The resulting 144 genes were fed into a MetaCore database for gene ontology and PPI analyses. Gene ontology analysis identified wound healing, cardiac-related biological process, and muscle system process as key correlates. For PPI analysis, we attempted to find the largest connected network in the list of 144 genes, assuming that interacting proteins in a biological network tend to have the same or similar molecular functions. As a result, a network consisting of 8 genes was identified including GATA4, ID2, MAFA, NOX4, PTBP1, SMAD3, TUBB1, and WWOX. A literature search was conducted through PubMed to investigate whether there were previously reported results in terms of biological associations between these genes/proteins and respiratory or cardiac symptoms. Interestingly, we found that most of these genes are relevant to both respiratory and cardiac diseases. In what follows, we describe the role of these biomarkers in biology. All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this this version posted September 30, 2020. . https://doi.org/10.1101/2020.09.29.20204289 doi: medRxiv preprint A study reported that GATA4 plays a critical role as a transcription factor in normal pulmonary development [11] . GATA4 also has been found to be a human candidate gene relevant to congenital heart disease [12, 13] . Several studies showed that GATA4 is a key protein responsible for the development of the lung, heart, and diaphragm in mice [14] [15] [16] . Arwood et al. described a mechanism of pulmonary hypertension in heart failure with preserved ejection fraction (HFpEF) using transcriptome-wide RNA sequencing [17] . When comparing transcriptomics between patients with combined post-and pre-capillary pulmonary hypertension and those without pulmonary hypertension, six differentially expressed genes were identified. In a further replication test on an independent cohort, only ID2 was validated and in an animal study, ID2 expression was significantly upregulated in mice with HFpEF and pulmonary hypertension compared to control mice. Another study showed a functional role of ID2 as one of the culprit genes in both the arterial and the venous poles of the heart [18] . An increased expression of NOX4 and TGF-β was found to be correlated with the increased volume of airway smooth muscle mass and epithelial cells in small airways in the lung with chronic obstructive pulmonary disease (COPD) [19] . Another study reported that the upregulation of NOX4 in heart induced cardiac remodeling, suggesting its potential to reduce the severity of established heart failure [20] . Gauldie et al. demonstrated the interactions between inflammation, TGF-β activation, and SMAD3 signaling and pulmonary fibrosis and emphysema [21] . Huang et al. found that All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this this version posted September 30, 2020. . SMAD3 is a key mediator in chronic cardiovascular disease, and plays a critical role in hypertensive cardiac remodeling [22] . as key splicing factors in heart development and cardiovascular disease [26] . A study reported that loss of WWOX promoted cell proliferation in pulmonary artery smooth muscle cells and contributed to pulmonary vascular remodeling in pulmonary arterial hypertension [27] . Another study reported the vital implications of WWOX in atherosclerosis and cardiovascular diseases [28] . MAFA has not been found to be directly related to pulmonary or cardiac symptoms in the literature review. However, MAFA has been shown to be a key regulator that controls genes implicated in insulin secretion [29, 30] . A recent study indicated that a number of patients with COVID-19, who were comorbid with diabetes and several diabetes-related traits, were associated with increased ACE2 expression [31] . This suggests that ACE2 All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this this version posted September 30, 2020. . appears to be a potentially key molecular link between insulin resistance and COVID-19 severity [32] . This combined evidence indicated that lung disease is likely to be associated with cardiovascular risk. Further research should be warranted to identify the common biological processes between lung and heart diseases and the interplay between them. We further assessed various filtering thresholds. With a stricter p-value of 1×10 -5 , 390 This implies that the selection of an optimal threshold is critical to identify real biological correlates. Information informed by machine learning-based predictive modeling on GWAS data, which we employed in other studies [8, 9] , can help resolve the issue. We analyzed summary statistics from a GWAS where individual SNPs were tested for associations with respiratory failure in COVID-19 patients. Bioinformatic approaches with SNPs filtered using a relaxed p-value enabled the identification of plausible biological All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this this version posted September 30, 2020. . correlates that are likely to be relevant to pulmonary or cardiac symptoms. When genotyping data become available, a more in-depth analysis using machine learning and bioinformatics techniques should provide greater insights into the underlying mechanisms of respiratory failure in COVID-19 patients. Imbalanced Host Response to SARS-CoV-2 Drives Development of COVID-19 COVID-19: Role of neutrophil extracellular traps in acute lung injury A Systematic Review of Cases of Acute Respiratory Distress Syndrome in the Coronavirus Disease Exploring the clinical association between neurological symptoms and COVID-19 pandemic outbreak: a systematic review of current literature Coronavirus Disease Pandemic (COVID-19): Challenges and a Global Perspective T-cell hyperactivation and paralysis in severe COVID-19 infection revealed by single-cell analysis Genomewide Association Study of Severe Covid-19 with Respiratory Failure Computational methods using genome-wide association studies to predict radiotherapy complications and to identify correlative molecular processes Machine Learning on a Genome-wide Association Study to Predict Late Genitourinary Toxicity After Prostate Radiation Therapy Machine learning on genome-wide association studies to predict the risk of radiation-associated contralateral breast cancer in the WECARE Study Gata4 is necessary for normal pulmonary lobar development GATA4 mutations cause human congenital heart defects and reveal an interaction with TBX5 Spectrum of atrial septal defects associated with mutations of NKX2.5 and GATA4 transcription factors Impaired mesenchymal cell function in Gata4 mutant mice leads to diaphragmatic hernias and primary lung defects Gene regulatory networks in the evolution and development of the heart Retinoid receptors in the developing human lung Transcriptome-wide analysis associates ID2 expression with combined pre-and post-capillary pulmonary hypertension Expression of Id2 in the second heart field and cardiac defects in Id2 knock-out mice NOX4 expression and distal arteriolar remodeling correlate with pulmonary hypertension in COPD NADPH oxidase 4 induces cardiac fibrosis and hypertrophy through activating Akt/mTOR and NFκB signaling pathways Smad3 signaling involved in pulmonary fibrosis and emphysema Smad3 mediates cardiac inflammation and fibrosis in angiotensin II-induced hypertensive cardiac remodeling Identification of gene biomarkers for respiratory syncytial virus infection in a bronchial epithelial cell line The TUBB1 Q43P functional polymorphism reduces the risk of cardiovascular disease in men by modulating platelet function and structure Identification of MicroRNA-124 as a Major Regulator of Enhanced Endothelial Cell Glycolysis in Pulmonary Arterial Hypertension via PTBP1 (Polypyrimidine Tract Binding Protein) and Pyruvate Kinase M2 The Emerging Role of the RBM20 and PTBP1 Ribonucleoproteins in Heart Development and Cardiovascular Diseases Smooth Muscle Cell Loss of the Tumor Suppressor WWOX Contributes to the Development of Pulmonary Hypertension Modeling WWOX Loss of Function in vivo: What Have We Learned? Front Oncol MAFA controls genes implicated in insulin biosynthesis and secretion MafA is a key regulator of glucose-stimulated insulin secretion Exploring Diseases/Traits and Blood Proteins Causally Related to Expression of ACE2, the Putative Receptor of SARS-CoV-2: A Mendelian Randomization Analysis Highlights Tentative Relevance of Diabetes-Related Traits Coronavirus and Obesity: Could Insulin Resistance Mediate the Severity of Covid-19 Infection? Front Public Health This research was funded in part through the NIH/NCI Cancer Center Support Grant P30 CA008748 and R21 CA234752. The authors declare no conflicts of interest.