key: cord-0312936-aapxsi7y authors: Kousathanas, A.; Pairo-Castineira, E.; Rawlik, K.; Stuckey, A.; Odhams, C. A.; Walker, S.; Russell, C. D.; Malinauskas, T.; Millar, J.; Elliott, K. S.; Griffiths, F.; Oosthuyzen, W.; Morrice, K.; Keating, S.; Wang, B.; Rhodes, D.; Klaric, L.; Zechner, M.; Parkinson, N.; Bretherick, A. D.; Siddiq, A.; Goddard, P.; Donovan, S.; Maslove, D.; Nichol, A.; Semple, M. G.; Zainy, T.; Maleady-Crowe, F.; Todd, L.; Salehi, S.; Knight, J.; Elgar, G.; Chan, G.; Arumugam, P.; Fowler, T. A.; Rendon, A.; Shankar-Hari, M.; Summers, C.; Elliott, P.; Yang, J.; Wu, Y.; GenOMICC Investigators,; Investiga, 23andMe title: Whole genome sequencing identifies multiple loci for critical illness caused by COVID-19 date: 2021-09-02 journal: nan DOI: 10.1101/2021.09.02.21262965 sha: acaf4b332891c67d1dbee9d78aea562b71fe8b59 doc_id: 312936 cord_uid: aapxsi7y Critical illness in COVID-19 is caused by inflammatory lung injury, mediated by the host immune system. We and others have shown that host genetic variation influences the development of illness requiring critical care or hospitalisation following SARS-Co-V2 infection. The GenOMICC (Genetics of Mortality in Critical Care) study is designed to compare genetic variants in critically-ill cases with population controls in order to find underlying disease mechanisms. Here, we use whole genome sequencing and statistical fine mapping in 7,491 critically-ill cases compared with 48,400 population controls to discover and replicate 22 independent variants that significantly predispose to life-threatening COVID-19. We identified 15 new independent associations with severe COVID-19, including variants within genes involved in interferon signalling (IL10RB, PLSCR1), leucocyte differentiation (BCL11A), and blood type secretor status (FUT2). Using transcriptome-wide association and colocalisation to infer the effect of gene expression on disease severity, we find evidence implicating expression of multiple genes, including reduced expression of a membrane flippase (ATP11A), and increased mucin expression (MUC1), in severe disease. We show that comparison between critically-ill cases and population controls is highly efficient for genetic association analysis and enables detection of therapeutically-relevant mechanisms of disease. Therapeutic predictions arising from these findings require testing in clinical trials. : Lead variants from independent regions in the per-population GWAS and trans-ancestry meta-analysis. Variants and the reference and alternate allele are reported with hg38 build coordinates. Asterisk (*) indicates the risk allele. For each variant, we report the risk allele frequency in Europeans (RAF), the odds ratio and 95% confidence interval, and the association P -value. Consequence indicates the worst consequence predicted by VEP99, and Gene indicates the VEP99predicted gene, but not necessarily the causal mediator. Expression indicates genes where is evidence of gene expression affecting COVID-19 severity, found by TWAS and colocalisation analysis. individuals (critically-ill cases n = 7, 491, controls (100k) n = 46, 770, controls (mild COVID- Figure 1 : GWAS results for EUR ancestry group, and trans-ancestry meta-analysis. Manhattan plots are shown on the left and quantile-quantile (QQ) plots of observed versus expected P values are shown on the right, with genomic inflation (λ) displayed for each analysis. Highlighted results in blue in the Manhattan plots indicate variants that are LD-clumped (r 2 =0.1, P 2 =0.01, EUR LD) with the lead variants at each locus. Gene name annotation by Variant Effect Predictor (VEP) indicates genes impacted by the predicted consequence type of each lead variant. The red dashed line shows the Bonferroni-corrected P -value=2.2 × 10 −8 . . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted September 2, 2021. Table 2 : Replication in a combined data from external studies -combined meta-analysis of HGI freeze 6 B2 and 23andMe. Odds ratios and P -values are shown for variants in LD with the lead variant that were genotyped/imputed in both sources. Chromosome, reference and alternate allele correspond to the build hg38. An asterisk (*) next to the HGI and 23andme meta-analysis P -value indicates that the lead signal is replicated with P -value<0.002 with a concordant direction of effect. Citation lists the first publication of confirmed genome-wide associations with critical illness or (in brackets) any COVID-19 phenotype. Replication was performed using summary statistics generously shared by collaborators: data from 128 the COVID-19 Host Genetics Initiative (HGI) data freeze 6 were combined using meta-analysis 129 with data shared by 23andMe (Methods). Although the HGI programme included an analysis 130 intended to mirror the GenOMICC study (analysis "A2"), there are currently insufficient cases 131 from other sources available to attempt replication, so we used the broader hospitalised phenotype 132 (analysis "B2") for replication. We removed signals in the HGI data derived from GenOMICC cases 133 using mathematical subtraction (see Methods) to ensure independence. Using LD clumping to find 134 variants genotyped in both the discovery and replication studies, we required P < 0.002 (0.05/25) 135 and concordant direction of effect (Table 2) for replication. 136 We replicated 22 of the 25 significant associations identified in the population specific and/or 137 trans-ancestry GWAS. Two of the three loci not replicated correspond to rare alleles that may not be We inferred credible sets of variants using Bayesian fine-mapping with susieR 14 , by analysing the 143 6 . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted September 2, 2021. , but found no significant associations at a gene-wide significance level. All individual rare variants 163 included in the tests had P -values >10 −5 . 164 We then further examined the association with 13 genes involved in the regulation of type I and 165 III interferon immunity that were implicated in critical COVID-19 pneumonia 2 but, as with other 166 recent studies 10 , we did not find any significant gene burden test associations (tests for all genes 167 had P -value>0.05, Supplementary File AVTsuppinfo.xlsx). We also did not replicate the reported 168 association 10 for the toll-like receptor 7 (TLR7 ) gene. Transcriptome-wide association study 170 In order to infer the effect of genetically-determined variation in gene expression on disease sus-171 ceptibility, we performed a transcriptome-wide association study (TWAS) using gene expression 172 data (GTExv8) for two disease-relevant tissues, lung and whole blood. We found 14 genes with 173 significant association between predicted expression and critical COVID-19 in the lung and 6 in 174 whole blood analyses (Supplementary File: TWAS.xlsx). To increase statistical power using eQTLs 175 from multiple tissues, we performed a TWAS meta-analysis using all available tissues in GTExv8, 176 revealing 51 transcriptome-wide significant genes. Since TWAS uses a composite signal derived 177 from multiple eQTLs, we used colocalisation to find specific eQTLs in whole blood (eqtlGen and 178 GTExv8) and lung (GTExv8 18 ) which share the same signal with GWAS (EUR) associations. We 179 found 16 genes which significantly colocalise in at least one of the studied tissues, shown in Figure 2 . 180 We repeated the TWAS analysis using models of intron excision rate from GTExv8 to obtain splicing 181 TWAS. We found 40 signals in lung, affecting 16 genes and 20 signals in whole blood which affect 182 9 genes. In a meta-analysis of splicing TWAS using all GTExv8 tissues, we found 91 significant 183 introns in a total of 33 genes. Using GTExv8 lung and whole blood sqtls to find colocalising 184 7 . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted September 2, 2021. ; signals with splicing TWAS significant results, we found 11 genes with colocalising splicing signals is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted September 2, 2021. ; https://doi.org/10.1101/2021.09.02.21262965 doi: medRxiv preprint (rs28368148, replication P = 0.00243, significance threshold P < 0.002) may be due to limited power 204 in the replication cohort. The lead variant in TYK2 in whole genome sequencing is a well-studied 205 protein-coding variant with reduced phosphorylation activity, consistent with that reported recently, 4 206 but associated with significantly increased TYK2 expression (Figure 2 , Methods). Fine mapping 207 reveals a significant critical illness association with an independent missense variant in IL10RB, a 208 receptor for Type III (lambda) interferons (rs28368148,p.Trp164Cys, Table 1 ). Overall, variants 209 predicted to be associated with reduction in interferon signalling are associated with critical disease. . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted September 2, 2021. and Importin Glu107 is indicated with a dashed line. The risk variant is predicted to eliminate this bond, disrupting nuclear import, an essential step for effect on antiviral signalling 22 and neutrophil maturation. 25 (b) Regional detail showing fine-mapping to separate two adjacent independent signals. Top two panels: variants in linkage disequilibrium with the lead variants shown. The loci that are included in two independent credible sets are displayed with black outline circles. Bottom panel: locations of protein-coding genes, coloured by TWAS P -value. . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted September 2, 2021. ; https://doi.org/10.1101/2021.09.02.21262965 doi: medRxiv preprint FUT2 locus includes rs492602 (chr19:48703160:A:G) which is linked to a stop codon gain mutation 248 (chr19:48703417:G:A), leading to the well-described non-secretor phenotype in homozygotes. 40;41 249 We show that the stop-gain, non-secretor allele is protective against life-threatening COVID-19. 250 The protective variant in our study has been previously reported to protect against other viruses . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted September 2, 2021. ; https://doi.org/10.1101/2021.09.02.21262965 doi: medRxiv preprint We thank the patients and their loved ones who volunteered to contribute to this study at one of the 278 most difficult times in their lives, and the research staff in every intensive care unit who recruited 279 patients at personal risk during the most extreme conditions we have ever witnessed in UK hospitals. CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted September 2, 2021. ; https://doi.org/10.1101/2021.09.02.21262965 doi: medRxiv preprint Participants were recruited to the mild COVID-19 cohort on the basis of having experienced mild 350 (non-hospitalised) or asymptomatic COVID-19. Participants volunteered to take part in the study 351 via a microsite and were required to self-report the details of a positive COVID-19 test. Volunteers 352 were prioritised for genome sequencing based on demographic matching with the critical COVID-19 353 cohort considering self-reported ancestry, sex, age and location within the UK. We refer to this 354 cohort as the covid-mild cohort. Qubit and normalised to 50ng/µl before WGS or genotyping. For all three cohorts, DNA was extracted from whole-blood using standard protocols. Sequencing CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted September 2, 2021. ; https://doi.org/10.1101/2021.09.02.21262965 doi: medRxiv preprint 385 For the critical and mild COVID-19 cohorts, sequencing data alignment and variant calling was 386 performed with Genomics England pipeline 2.0 which uses the DRAGEN software (v3.2.22). Align-final filtered set of tested variants for each population, we LD-pruned in a window of 250Kb and r 2 = 0.8 with plink 1.9. We then computed the Bonferroni-corrected P -value threshold as 0.05 530 divided by the number of LD-pruned variants. The P -value thresholds that were used for declaring 531 statistical significance are given in Supplementary Table 3. 532 LD-clumping 533 We used plink1.9 to do clumping of variants that were genome-wide significant for each analysis with 534 P 1 set to per-population P -value from table X, P 2 = 0.01, clump distance 1500Mb and r 2 = 0.1. To find the set of independent variants in the per-population analyses, we performed a step-wise 537 conditional analysis with the GWAS summary statistics for each population using GTCA 1.9.3 538 -cojo-slct function. The parameters for the function were pval = 2.2 × 10 −8 , a distance of 10,000 kb 539 and a colinear threshold of 0.9 47 . 540 Fine-mapping 541 We performed fine-mapping for genome-wide significant signals using Rpackage SusieR v0.11.42 48 . For each genome-wide significant variant locus, we selected the variants 1.5 Mbp on each side and 543 computed the correlation matrix among them with plink v1.9. We then run the susieR summary-544 statistics based function susie_rss and provided the summary z-scores from SAIGE (i.e, effect size 545 divided by its standard error) and the correlation matrix computed with the same samples that 546 were used for the corresponding GWAS. We required coverage >0.95 for each identified credible set 547 and minimum and median correlation coefficients (purity) of r=0.1 and 0.5, respectively. 549 We annotated all variants included in each credible set identified by SusieR using VEP v99. We also 550 selected the worst consequence across transcripts using bcftools +split-vep -s worst. We also ranked 551 each variant within each credible set according to the predicted consequence and the ranking was 552 based on the table provided by Ensembl: https://www.ensembl.org/info/genome/variation/predicti 553 on/predicted_data.html. Trans-ancestry meta-analysis 555 We performed a meta-analysis across all ancestries using a inverse-variance weighted method and 556 control for population stratification for each separate analysis in the METAL software 13 . The 557 meta-analysed variants were filtered for variants with heterogeneity P -value p < 2.22 × 10 −8 and 558 variants that are not present in at least half of the individuals. We used the meta R package to plot 559 forest plots of the clumped trans-ancestry meta-analysis variants 49 . 561 In order to quantify the support for genome-wide significant signals from nearby variants in LD, we assessed the internal consistency of GWAS results of the lead variants and their surroundings. To this end, we compared observed z-scores at lead variants with the expected z-scores based on those 19 . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted September 2, 2021. ; observed at neighbouring variants. Specifically, we computed the observed z-score for a variant i as s i =β/σβ and, following the approach of 50 , the imputed z-score at a target variant t aŝ s t = Σ t,P (Σ P,P + λI) −1 s P where s P are the observed z-scores at a set P of predictor variants, Σ x,y is the empirical correlation 562 matrix of dosage coded genotypes computed on the GWAS sample between the variants in x and y, 563 and λ is a regularization parameter set to 10 −5 . The set P of predictor variants consisted of all 564 variants within 100 kb of the target variant with a genotype correlation with the target variant 565 greater than 0.25. Replication 567 We used the Host Genetic Initiative (HGI) GWAS meta-analysis round 6 hospitalised COVID vs 568 population (B2 analysis), including all genetic ancestries. In order to remove overlapping signals 569 we performed a mathematical subtraction of the GenOMICC GWAS of European genetic ancestry. The HGI data was downloaded from https://www.covid19hg.org/results/r6/. The subtraction was 571 performed using MetaSubtract package (version 1.60) for R (version 4.0.2) after removing variants 572 with the same genomic position and using the lambda.cohortswith genomic inflation calculated on 573 the GenOMICC summary statistics. Then, we calculated a trans-ancestry meta-analysis for the three 574 ancestries with summary statistics in 23andMe: African, Latino and European using variants that 575 passed the 23andMe ancestry QC, with imputation score > 0.6 and with maf > 0.005. And finally 576 we performed a final meta-analysis of 23andMe and HGI B2 without GenOMICC to create the final 577 replication set. Meta-analysis were performed using METAL 13 , with the inverse-variance weighting 578 method (STDERR mode) and genomic control ON. We considered that a hit was replicating if the 579 direction of effect in the GenOMICC-subtracted HGI summary statistics was the same as in our 580 GWAS, and the P -value was significant after Bonferroni correction for the number of attempted 581 replications (pval < 0.05/25). If the main hit was not present in the HGI-23andMe meta-analysis or 582 if the hit was not replicating we looked for replication in variants in high LD with the top variant 583 (r 2 > 0.9), which helped replicate two regions. Stratified analysis 585 We also performed sex-specific analysis (male and females separately) as well as analysis stratified 586 by age (i.e., participants <60 and >=60 years old) for each super-population set. To compare effect 587 of variants within groups for the age and sex stratified analysis we first adjusted the effect and error 588 of each variant for the standard deviation of the trait in each stratified group and then used the 589 following t-statistic, as in previous studies 51;52 where b 1 is the adjusted effect for group 1, b 2 is the adjusted effect for group 2, se 1 and se 2 are 592 the adjusted standard errors for group 1 and 2 respectively and r is the Spearman rank correlation 593 between groups across all genetic variants. HLA-DQB1, and HLA-DPB1 using the HIBAG package in R 19 . At time of writing, HLA types were also imputed for 82% of samples using HLA*LA 53 . Inferred HLA alleles between HIBAG and threshold being p12 = 10 − 5. eQTL signal and GWAS signals were deemed to colocalise if these 638 two criteria were met: (1) At P 12 = 5 × 10 − 5 the probability of colocalisation P P H4 > 0.5 and (2) At p12 = 10 − 5 the probability of independent signal (PPH3) was not the main hypothesis 640 (P P H3 < 0.5). These criteria were chosen to allow eQTLs with weaker P -values due to lack of 641 power in GTExv8, to be colocalised with the signal when the main hypothesis using small priors 642 was that there wasn't any signal in the eQTL data. . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted September 2, 2021. ; https://doi.org/10.1101/2021.09.02.21262965 doi: medRxiv preprint Genetic mechanisms of critical illness in Covid-19 Inborn errors of type I IFN immunity in patients with life-threatening COVID-648 19 Genomewide association study of severe covid-19 with respiratory failure COVID-19 Host Genetics Initiative. Mapping the human genetic architecture of COVID-19 Features of 20 133 UK patients in hospital with covid-19 using the 655 Characterisation Protocol: Prospective observational cohort study Tissue-Specific Immunopathology in Fatal COVID-19 Robust, reproducible clinical patterns in hospitalised patients with Dexamethasone in Hospitalized Patients with Covid-19 -Preliminary Report New susceptibility loci for severe COVID-19 by detailed GWAS analysis 664 in European populations Pan-ancestry exome-wide association analyses of COVID-19 outcomes 666 in 586,157 individuals Rare loss-of-function variants in type i ifn immunity genes are not associated 669 with severe covid-19 Efficiently controlling for case-control imbalance and sample relatedness in METAL: fast and efficient meta-analysis of genomewide 674 association scans A simple new approach to variable 676 selection in regression, with application to genetic fine mapping CADD: predicting 681 the deleteriousness of variants throughout the human genome Scalable generalized linear mixed model for region-based association tests in 684 large biobanks and cohorts The mutational constraint spectrum quantified from variation in 687 141,456 humans The GTEx Consortium atlas of genetic regulatory effects across human 690 tissues /6509/1318. Publisher: American Association for the Advancement of Science HIBAG -HLA genotype imputation with attribute bagging Open source clinical science for emerging infections. The Lancet Infectious 696 Diseases 14 Repurposed Antiviral Drugs for Covid-19 -Interim WHO Solidarity Trial Results Phospholipid scramblase 1 potentiates the antiviral activity of interferon Phospholipid scramblase 1 interacts with influenza a virus np, impairing its 702 nuclear import and thereby suppressing virus replication Phospholipid Scramblase 1 Contains a Nonclassical Nuclear Localization 704 Signal with Unique Binding Site in Importin A* Nuclear phospholipid scramblase 707 1 prolongs the mitotic expansion of granulocyte precursors during G-CSF-induced granulopoiesis Phospholipid scramblase: An update Bcl11a is essential for lymphoid development and negatively regulates p53. The 712 Plasmacytoid Dendritic Cells: Development, Regulation, and Function Hemokinin is a hematopoietic-specific 716 tachykinin that regulates b lymphopoiesis Hemokinin-1 activates the mapk pathway and enhances b cell proliferation 718 and antibody production Proinflammatory tachykinins that signal through the neurokinin 1 720 receptor promote survival of dendritic cells and potent cellular immunity Inflammatory profiles across the spectrum of disease reveal a distinct role 723 for GM-CSF in severe COVID-19 Gm-csf-based treatments 725 in covid-19: reconciling opposing therapeutic approaches Resequencing Study Confirms That Host Defense and Cell Senescence Gene 728 Variants Contribute to the Risk of Idiopathic Pulmonary Fibrosis Phospholipid flippase activities and substrate specificities of human type iv 732 p-type atpases localized to the plasma membrane Changes in membrane phospholipid distribution 735 during platelet activation Membrane asymmetry and blood coagulation. 737 New genetic signals for lung function highlight pathways and chronic obstructive 739 pulmonary disease associations across multiple ancestries Blood group type A secretors are associated with a higher risk of 741 COVID-19 cardiovascular disease complications Sequence and expression 743 of a candidate for the human secretor blood group alpha(1,2)fucosyltransferase gene (fut2) homozygosity for an enzyme-inactivating nonsense mutation commonly correlates with the 745 non-secretor phenotype A natural history of fut2 polymorphism in humans A fut2 gene common polymorphism determines resistance to 749 rotavirus a of the p[8] genotype Genome-wide association and hla region fine-mapping studies identify suscepti-751 bility loci for multiple common infections The landscape of host genetic factors involved in immune response to common 753 viral infections. medRxiv : the preprint server for health sciences Non-secretion of abo antigens predisposing to infection by neisseria 755 meningitidis and streptococcus pneumoniae PLINK: A Tool Set for Whole-Genome Association and Population-Based 757 Linkage Analyses Conditional and joint multiple-SNP analysis of GWAS summary statistics 760 identifies additional variants influencing complex traits A simple new approach to variable 763 selection in regression, with application to genetic fine mapping How to perform a meta-analysis with R: a practical 767 tutorial Fast and accurate imputation of summary statistics enhances evidence 770 of functional enrichment The influence of age and sex on genetic associations with adult body size 774 and shape: A large-scale genome-wide interaction study Sexual differences in genetic architecture in uk biobank HLA*LA-HLA typing from linearly projected graph alignments Exploring the phenotypic consequences of tissue specific gene expression 783 variation inferred from GWAS summary statistics A gene-based association method for mapping traits using reference 786 transcriptome data Integrating predicted transcriptome from multiple tissues improves 789 association detection Bayesian Test for Colocalisation between Pairs of Genetic Association 792 Unraveling the polygenic architecture of complex traits using blood eQTL 796 metaanalysis