key: cord-0273317-ieex3n4d authors: Regan, J. A.; Abdulrahim, J.; Bihlmeyer, N.; Haynes, C.; Kwee, L. C.; Patel, M.; Shah, S. H. title: A Phenome Wide Association Study of Severe COVID-19 Genetic Risk Variants date: 2021-12-11 journal: nan DOI: 10.1101/2021.12.08.21267433 sha: c8d02fc4a6721b672cc7f24d35cbefbe75280407 doc_id: 273317 cord_uid: ieex3n4d

Background: Genetic loci associated with risk of severe COVID-19 infection have been identified, and individuals with complicated COVID-19 infections often have multiple comorbidities.

Objective: Identify known and unidentified comorbidities associated with genetic loci linked to risk of severe COVID-19 infection.

Methods: A Phenome Wide Association Study (PheWAS) was conducted in 247,448 unrelated, white individuals from the UK Biobank to test the association of 1,402 unique phenotypes with ten genome-wide significant severe-COVID risk single nucleotide polymorphisms (SNPs) identified from prior studies. A validation PheWAS was conducted in 2,247 white individuals from the CATHGEN study.

Results: Four of the ten tested genetic loci showed significant phenotypic associations in UK Biobank after FDR adjustment. Vascular dementia was significantly associated with rs72711165 near TMEM65 on 8q24.13 in individuals with the C risk allele (OR 5.66 [95% CI 2.21-11.85], q=0.049). We identified 40 novel phenotype associations with rs657152 on 9q34.2, coinciding with the ABO gene; individuals with the A COVID risk allele had higher odds of heart failure (OR 1.09 [95% CI 1.03-1.14], q=0.004), diabetes mellitus (OR 1.05 [95% CI 1.02-1.07], q=0.004) and hypercholesterolemia (OR 1.04 [95% CI 1.02-1.06], q=6.3x10^-5). Eight phenotypes were associated with rs1819040 near KANSL1 on 17q21.31 in individuals with the A risk allele, including atrial fibrillation and flutter (OR 1.07 [95% CI 1.04-1.10], q=0.0084) and pulmonary fibrosis (OR 0.80 [95% CI 0.71-0.89], q=0.035).
Ten novel phenotypic associations were identified with rs74956615 on 19p13.2 near the TYK2 gene, with individuals carrying the A COVID risk allele having lower odds of psoriatic arthropathy (OR 0.31 [95% CI 0.20-0.47], q=4.5x10^-5), rheumatoid arthritis (OR 0.83 [95% CI 0.64-0.83], p=1.4x10^-6) and thyrotoxicosis with or without goiter (OR 0.77 [95% CI 0.68-0.87], p=6.9x10^-5). Two associations for rs1819040 (KANSL1) and seven associations for rs74956615 (TYK2) validated in CATHGEN.

Conclusions: Using a broad PheWAS approach in a large discovery and validation cohort, we have identified novel phenotypic associations with risk alleles for severe COVID-19 infection. Interestingly, the ABO locus was associated with comorbidities that are also risk factors for severe COVID-19 infection, suggesting that this locus has pleiotropic effects and provides a potential mechanism for this association. The 19p13 locus was associated with lower risk of autoimmune disease. These findings may have broad implications for the importance of multiple comorbidities across both infectious and non-infectious diseases and may provide insight into the molecular function of the genes near these genetic risk loci.

All rights reserved.
No reuse allowed without permission. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This version posted December 11, 2021; https://doi.org/10.1101/2021.12.08.21267433 (medRxiv preprint).

Introduction

There is marked heterogeneity in the clinical manifestations of coronavirus disease 2019 (COVID-19), which is caused by infection with the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Symptoms can range from mild flu-like symptoms to severe respiratory failure requiring supplemental oxygen, intubation or intensive care unit (ICU) care, and multiple distinct cardiovascular complications have also been identified [1]. Demographic factors and existing clinical comorbidities are associated with severe COVID-19. For example, age, male gender, Black and South Asian ancestry, diabetes, obesity and chronic lung disease are associated with increased risk of COVID-19-related mortality [2].

The biologic underpinnings of the heterogeneity in increased risk with these clinical comorbidities are unclear; however, inherited genetic factors have been associated with severe COVID-19. In June 2020, using genome-wide association study (GWAS) analyses of common genetic variants, the Severe COVID GWAS Group first identified two single nucleotide polymorphisms (SNPs) with genome-wide significance for severe COVID infection using a meta-analysis of 1,610 participants with severe COVID-19 with respiratory failure and 2,205 healthy controls across seven hospitals in Italy and Spain [3]. The first SNP was rs11385942 at locus 3p21.31, with a signal spanning multiple genes including chemokine receptors: SLC6A20, LZTFL1, CCR9, FYCO1, CXCR6 and XCR1.
The second SNP was rs657152 at locus 9q34.2, coinciding with the ABO blood group locus, with analyses showing greater risk of severe COVID in type A blood carriers and protective effects in individuals with type O blood group. Further, a small study of 4 young male patients without chronic disease treated for severe COVID-19 requiring mechanical ventilation and ICU care analyzed rare variants and found loss-of-function variants in TLR7 on the X chromosome with associated impairment in type I and type II interferon (IFN) responses [4]. In December 2020, the Genetics Of Mortality In Critical Care (GenOMICC) genome-wide association study of 2,244 critically ill patients across 208 United Kingdom (UK) ICUs identified and replicated four additional genome-wide significant signals [5]. Most recently, in July 2021, the COVID-19 Host Genetics Initiative identified ten distinct genome-wide significant loci associated with severe COVID-19 and confirmed the findings of these earlier studies [6].

Given that greater comorbidities have also been observed in patients with severe COVID-19 infection, we aimed to identify associations between a wide range of comorbidities and these same genetic loci associated with severe COVID-19, with the goal of better understanding the potential genetic risk of severe COVID-19 mediated by these variants. Phenome-wide association study (PheWAS) has emerged as an unbiased approach to identify novel associations of previously identified, disease-associated genetic variants across many phenotypes.
One such PheWAS study has been conducted for the 3p21.31 locus [7]; however, additional phenotypic associations and broader implications of risk for the additional identified COVID-19 genetic loci have not yet been described.

Methods

The UK Biobank is a prospective population-based cohort with deep genetic and phenotypic data collected on 500,000 individuals 40-69 years old at recruitment (2006-2010) across the United Kingdom [8]. For this study, we selected a discovery cohort composed of 247,488 unrelated white British ancestry UK Biobank participants with both high-quality genotype and phenotype data available. Genetic data were obtained from either the UK Biobank Axiom array from Affymetrix or the UK BiLEVE Axiom array and are reported in Genome Reference Consortium Human Reference 37 (GRCh37). Imputation was performed by UK Biobank using a merged panel of the Haplotype Reference Consortium (HRC) panel, the UK10K panel and the 1000 Genomes Phase 3 panel. Phenotype data were derived from International Classification of Diseases (ICD) codes from primary care data, hospitalizations and death-related data in the UK Biobank. Data for this project were accessed through approved UK Biobank applications (Application ID: 48785 and 65043).

The validation cohort consisted of individuals from the CATHGEN study, a study of 9,334 sequential individuals who underwent cardiac catheterization at Duke University Medical Center (Durham, NC) between 2001 and 2010 [9]. The validation cohort comprised 2,247 self-reported white individuals from the CATHGEN study. Genotype data were obtained using the Illumina Human Omni1-Quad Infinium Bead Chip, imputed with Minimac4 using 1000G phase 3 reference panels, and are reported in GRCh37 [10]. Phenotype data were derived from electronic health record data from 2001-2020.
Both studies have human subjects research approval from the Duke Institutional Review Board (IRB), and all participants provided informed consent.

Ten single nucleotide polymorphisms (SNPs) were selected for use from prior studies of [...] Specifically, among the other fine-mapped SNPs for each locus, consideration was given to linkage disequilibrium, availability of the variant data in each cohort and minor allele frequency (MAF) for inclusion, with the goal of selecting only one SNP from each locus. The ten severe COVID-19 associated loci studied here are: rs11385942 (3p21.31 composed of a six gene cluster, risk allele GA, MAF=0.07), rs1886814 ( [...]

Outcomes were defined by the PheCODE scheme using ICD 9 and ICD 10 codes to identify phenotypes, where incident and prevalent cases were included for both studies [11, 12]. The R PheWAS package was used for all analyses [13]. All outcomes with <20 cases were excluded, resulting in 1,403 phenotypes assessed in UK Biobank and 1,002 phenotypes in CATHGEN. The full list of phenotypes can be found in Supplementary Table 1. Logistic regression was performed for disease outcomes, and all analyses were adjusted for age, genotyping array, gender and the first five genetic principal components. In the UK Biobank discovery cohort, significant associations were considered at an FDR adjusted p-value (q-value) and nominal associations were considered at p<0.05. In the CATHGEN cohort, nominal association at p<0.1 was considered for validation. All analyses were performed using R v4.0.2.
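As an illustration of the FDR adjustment used to define significance in the discovery cohort, the following is a minimal sketch and not the authors' code. The text does not state which FDR procedure was applied; the sketch assumes the Benjamini-Hochberg step-up procedure. The function name `bh_qvalues`, the example p-values and the q<0.05 cutoff are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order.

    Assumption: Benjamini-Hochberg is used here for illustration only;
    the source text says "FDR adjusted" without naming the procedure.
    """
    m = len(pvalues)
    if m == 0:
        return []
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, so that each q-value
    # is the minimum of p * m / rank over all ranks at or above its own.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical p-values for five phenotypes tested against one SNP.
example_p = [0.001, 0.008, 0.039, 0.041, 0.60]
example_q = bh_qvalues(example_p)

# Hypothetical cutoff of q < 0.05 (the threshold is not given in the text).
significant = [q < 0.05 for q in example_q]
```

In this sketch the third and fourth phenotypes are nominally associated (p<0.05) but do not pass the assumed q<0.05 cutoff, which mirrors the distinction the Methods draw between nominal and FDR-significant associations.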
Results

Four of the ten tested severe COVID-19-associated genetic risk loci showed significant phenotypic associations in the UK Biobank dataset after FDR adjustment. First, vascular dementia was significantly associated with rs72711165 (TMEM65) (OR 5.66 [95% CI 2.21-11.85], q=0.049), but did not validate in CATHGEN (Figure 1A, Supplemental [...] (Figure 1C, Supplemental Table 6). Only glaucoma validated in CATHGEN (p<0.1). Ten phenotypes were significantly associated with rs74956615 (TYK2) in UK Biobank after FDR adjustment, all with lower odds associated with the COVID-19 risk allele (Figure 1D, Supplemental [...] Table 9). One phenotype validated in CATHGEN for lower odds of immunity deficiency associated with the COVID-19 risk allele (p<0.05).

No phenotypes were significantly associated with rs2236757 in UK Biobank (Supplemental Table 10 [...]

Discussion

Using an unbiased PheWAS approach to clinical diagnoses in a large dataset of genetic and electronic health record data, we have identified novel phenotypic associations with the risk alleles from four of ten loci previously identified as associated with severe COVID-19 infection.
These associations could suggest that individuals carrying these genetic markers, known for their role in blood traits, host anti-viral response and inflammation, may have modified risk of cardiovascular disease, as well as auto-immune and inflammatory disorders including arthropathies and endocrinopathies, which in turn increases risk of severe [...] Alternatively, these genetic risk loci may have pleiotropic effects on these diseases and on COVID-19 related [...]

No prior phenotypic associations are published for the rs72711165 SNP near TMEM65. TMEM65 is a mitochondrial inner-membrane protein that may play a role in mitochondrial respiration and cardiac development and function. Mutations in TMEM65 have been described to cause mitochondrial myopathy and neurologic disease [14]. Direct mechanisms related to the association with vascular dementia identified here are unclear, but warrant further investigation.

Prior to being identified as a risk variant for severe COVID-19, rs657152 within the ABO blood group locus had been associated with hypercoagulable state, arterial embolism and thrombosis, and other disorders of the circulatory system [11]. We validated these previously reported associations for the rs657152 SNP and identified novel associations, including greater odds of heart failure, diabetes mellitus and hypercholesterolemia, and lower odds of gastrointestinal disorders including duodenal ulcer and duodenitis. Genetic predisposition for these cardiovascular and endocrine phenotypes may amplify the risk of adverse COVID-19 outcomes but may also have broader long-term health implications [15].
Taken together, these associations add support to risk factors contributing to a hypercoagulable state, as both the rs657152 risk allele and COVID-19 infection itself may increase the risk of thrombosis via multiple mechanisms [16].

Mutations in KANSL1 are known to cause neurodevelopmental delay disorders described within 17q21.31 deletion syndrome or Koolen-de Vries syndrome [17]. KANSL1 plays a role in histone acetylation, microtubule stabilization and mitochondrial respiration [18]. Here, we identified novel associations with the rs1819040 SNP near KANSL1, including greater odds of atrial fibrillation, hypothyroidism and glaucoma, and, interestingly, lower odds of postinflammatory pulmonary fibrosis. Biologic mechanisms linking these associations, as well as the risk of severe COVID-19, warrant further study.

We also found that rs74956615 near the TYK2 gene was associated with lower odds of psoriasis and related disorders, rheumatoid arthritis and thyrotoxicosis, as well as greater odds of tobacco use disorder. Adding strength to the results for rs74956615, these findings nominally validated in the CATHGEN cohort. [...] however, these studies have had mixed directions of effect. A recent systematic review and meta-analysis identified protective effects against autoimmune disease for five TYK2 SNPs and risk for systemic lupus erythematosus (SLE) associated with one [20]. Rare coding variants found to have protective effects have been associated with reductions in IL-23 and [...]
psoriasis associated with rs74956615, which may implicate a distinct impact of this allele on TYK2 gene function from what has been previously identified in prior GWAS analysis of psoriasis. Notably, previous investigators studying the protective impact of TYK2 variants on autoimmune disease did not identify pleiotropic effects via PheWAS analyses [22], and the associations of TYK2 and thyroid disease found in the present analyses have not been previously reported; however, the utilization of the UK Biobank cohort represents the largest analysis of TYK2 variants to date. Our study design does not allow for more detailed confirmation of whether the reported cases of hypothyroidism may have been autoimmune in etiology.

Though findings did not reach prespecified significance thresholds in the present analyses, the other [...]

References

1. The Variety of Cardiovascular Presentations of COVID-19
2. Factors associated with COVID-19-related death using OpenSAFELY
3. Genomewide Association Study of Severe Covid-19 with Respiratory Failure
4. Men With Severe COVID-19
5. Genetic mechanisms of critical illness in COVID-19
6. Initiative C-HG. Mapping the human genetic architecture of COVID-19
7. Altered blood cell traits underlie a major genetic locus of severe COVID-19
8. The UK Biobank resource with deep phenotyping and genomic data
9. A Guide for a Cardiovascular Genomics Biorepository: the CATHGEN Experience
10. Next-generation genotype imputation service and methods. Nat Genet
11. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data
12. Mapping ICD-10 and ICD-10-CM Codes to Phecodes: Workflow Development and Initial Evaluation
13. R PheWAS: data analysis and plotting tools for phenome-wide association studies in the R environment
14. A mutation in the TMEM65 gene results in mitochondrial myopathy with severe neurological manifestations
15. Prevalence and impact of cardiovascular metabolic diseases on COVID-19 in China
16. Thrombosis in COVID-19
17. Mutations in KANSL1 cause the 17q21.31 microdeletion syndrome phenotype
18. Imbalanced autophagy causes synaptic deficits in a human model for neurodevelopmental disorders
19. Clinical efficacy of launched JAK inhibitors in rheumatoid arthritis
20. Association of TYK2 polymorphisms with autoimmune diseases: A comprehensive and updated systematic review with meta-analysis
21. Identification of rare coding variants in TYK2 protective for rheumatoid arthritis in the Japanese population and their effects on cytokine signalling
22. TYK2 protein-coding variants protect against rheumatoid arthritis and autoimmunity, with no evidence of major pleiotropic effects on non-autoimmune complex traits