key: cord-0990982-t2cx2ju5 authors: Zhou, J.; Sun, Y.; Huang, W.; Ye, K. title: Altered blood cell traits underlie a major genetic locus of severe COVID-19 date: 2020-09-10 journal: nan DOI: 10.1101/2020.09.09.20191700 sha: 08b19eb2dfa39958770d64288092dd462df5f8db doc_id: 990982 cord_uid: t2cx2ju5 Purpose: The genetic locus 3p21.31 has been associated with severe coronavirus disease 2019 (COVID-19), but the underlying pathophysiological mechanism is unknown. Methods: To identify intermediate traits of the COVID-19 risk variant, we performed a phenome-wide association study (PheWAS) of 923 phenotypes in 310,999 European individuals from UK Biobank. For candidate target genes, we examined associations between their expression and the polygenic scores (PGS) of 1,263 complex traits in a meta-analysis of 31,684 blood samples. Results: Our PheWAS identified, and replicated, associations between the COVID-19 risk variant and multiple blood cell traits, including monocyte count and percentage (p = 1.07e-8, 4.09e-13), eosinophil count and percentage (p = 5.73e-3, 2.20e-3), and neutrophil percentage (p = 3.23e-3). The PGS analysis revealed positive associations between the expression of candidate genes and genetically predicted counts of specific blood cells: CCR3 with eosinophils and basophils (p = 5.73e-21, 5.08e-19); CCR2 with monocytes (p = 2.40e-10); and CCR1 with monocytes and neutrophils (p = 1.78e-6, 7.17e-5). Conclusions: Multiple blood cell traits, especially monocyte, eosinophil, and neutrophil counts, are associated with the COVID-19 risk variant and with the expression of its candidate target genes, representing probable mechanistic links between the genetic locus 3p21.31 and severe COVID-19.
Coronavirus disease 2019 (COVID-19), caused by infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), affects individuals differently, with clinical manifestations ranging from asymptomatic infection to mild flu-like symptoms to severe respiratory failure [1, 2, 3]. Besides demographic factors and pre-existing conditions, genetic variation is partly responsible for these varying individual responses [4, 5, 6]. The first genome-wide association study (GWAS) of COVID-19 in patients with respiratory failure identified genetic associations at locus 3p21.31 [4], which were independently replicated by the COVID-19 Host Genetics Initiative [5]. The peak signal at this locus spans multiple chemokine receptor genes (e.g., CCR9, CXCR6, XCR1, and CCR1), and the risk variants are associated with the expression of CXCR6, CCR1, and SLC6A20 [4]. However, the underlying causal variant, the target gene, and the pathophysiological process remain unknown.

A phenome-wide association study (PheWAS) is an unbiased approach that evaluates the associations of a disease-associated genetic variant (e.g., a COVID-19 risk variant) with a wide range of phenotypes (i.e., the phenome). PheWAS may identify intermediate traits or biomarkers lying on the causal physiological route from the genetic variant to the disease of interest. It may also reveal unexpected comorbidities that indicate shared biological mechanisms [7, 8]. Similarly, expression quantitative trait locus (eQTL) analysis of a trait-associated genetic variant across the transcriptome can identify candidate causal genes located either close to (in cis) or remote from (in trans) the variant [9, 10]. From the perspective of a candidate gene, insights into its physiological pathways and downstream functional effects can be gained by examining associations of its expression level with phenotypes across the phenome, or even with genetically predicted phenotypes when measured ones are unavailable [10]. This project aimed to explore the mechanistic link between the genetic locus 3p21.31 and severe COVID-19.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. This version posted September 10, 2020. https://doi.org/10.1101/2020.09.09.20191700

UK Biobank is a large population-based prospective study that recruited more than 500,000 individuals aged 40-69 years between 2006 and 2010. It was approved by the North West Multi-Centre Research Ethics Committee, and informed consent was obtained. All participants underwent baseline measurements, donated biological samples, and provided access to their medical records [11]. Data for this project were accessed through an approved application to UK Biobank (Application ID: 48818).

Discovery analysis. This was performed with data from UK Biobank. Only participants fulfilling all of the following criteria were included: 1) Caucasian genetic ancestry; 2) inclusion in the genetic principal component analysis; 3) not an outlier for heterozygosity or missing genotype rate; 4) no sex chromosome aneuploidy; 5) self-reported sex matching genetic sex; 6) no high degree of genetic kinship; and 7) for pairs of relatives (kinship coefficient > 0.0884), the minimum number of participants was removed so that all remaining participants were unrelated. A total of 310,999 unrelated individuals passed this quality control and filtering procedure.
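Criterion 7 removes "a minimum number of participants" from related pairs, which is a graph problem: participants are nodes, pairs above the kinship cutoff are edges, and the goal is a small set of nodes covering every edge. The text does not say how that set was chosen, so the sketch below uses a greedy heuristic (repeatedly drop the participant involved in the most related pairs) as an assumption. The function name `prune_related` and the `(id_a, id_b, kinship)` input format are hypothetical.

```python
from collections import defaultdict

# Kinship cutoff quoted in the text.
KINSHIP_CUTOFF = 0.0884


def prune_related(pairs, cutoff=KINSHIP_CUTOFF):
    """Return a set of participant IDs to drop so that no related pair remains.

    pairs: iterable of (id_a, id_b, kinship) tuples (hypothetical input format).
    Greedy removal of the most-connected participant approximates a minimum
    vertex cover of the relatedness graph; it is not guaranteed to be optimal.
    """
    graph = defaultdict(set)
    for a, b, kinship in pairs:
        if kinship > cutoff:
            graph[a].add(b)
            graph[b].add(a)

    removed = set()
    while any(graph.values()):
        # Drop the participant in the most related pairs; ties broken by ID.
        worst = max(graph, key=lambda pid: (len(graph[pid]), str(pid)))
        for relative in graph.pop(worst):
            graph[relative].discard(worst)
        removed.add(worst)
    return removed


if __name__ == "__main__":
    example = [("A", "B", 0.25), ("A", "C", 0.12), ("D", "E", 0.05)]
    print(prune_related(example))  # {'A'}: removing A breaks both related pairs
```

A greedy cover is a common practical choice because finding the exact minimum vertex cover is computationally hard for large cohorts; other tools may prefer, for example, to keep the participant with better genotype quality.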
Three sets of phenotypes were examined in our PheWAS: binary disease outcomes, blood and urine biomarkers, and blood cell traits. Binary disease status was defined by mapping ICD-9/ICD-10 diagnosis codes in the hospital episode statistics to phecodes in the PheCODE grouping system [12]. A total of 858 phecodes with at least 200 cases were retained in our analysis (Supplementary Table 1). For continuous traits, our PheWAS included 34 blood and urine biochemistry markers and 31 blood cell traits (Supplementary Table 2) [11]. Statistical association analyses were performed with R scripts and the PheWAS package [13]. Logistic regression was used for binary disease outcomes and linear regression for continuous blood and urine biomarkers, adjusting for age, sex, genotyping array, assessment center, and the first 10 genetic principal components. We applied a Bonferroni correction for the total number of phenotypes tested, although this is a conservative approach because the phenotypes are not independent. Summary statistics for nominally significant associations between the COVID-19 risk variant and blood cell traits were also retrieved from GeneATLAS [14].

To identify genes whose expression levels are associated with the COVID-19 risk variant, we queried eQTL analysis results from GTEx and eQTLGen [10, 17]. The GTEx project studies tissue-specific gene expression and regulation in 54 non-diseased tissue sites from about 1,000 individuals [17]. The eQTLGen Consortium conducted an eQTL meta-analysis of 31,684 blood and peripheral blood mononuclear cell samples from 37 datasets [10].
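The Bonferroni cutoff used in the Results follows directly from the phenotype counts above: 858 phecodes, 34 biochemistry markers, and 31 blood cell traits give 923 tests. The short check below reproduces that arithmetic and applies the cutoff to a few p-values quoted in the text; it does not reproduce the covariate-adjusted regressions themselves, and the names are illustrative.

```python
# Phenotype counts reported in the Methods.
N_PHECODES = 858         # binary outcomes with at least 200 cases
N_BIOCHEMISTRY = 34      # blood and urine biochemistry markers
N_BLOOD_CELL_TRAITS = 31


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that controls the family-wise error rate at alpha."""
    return alpha / n_tests


n_tests = N_PHECODES + N_BIOCHEMISTRY + N_BLOOD_CELL_TRAITS
threshold = bonferroni_threshold(0.05, n_tests)

# A few p-values reported in the Results, checked against the cutoff.
reported = {
    "monocyte percentage": 4.09e-13,
    "monocyte count": 1.07e-8,
    "sialolithiasis": 4.76e-4,
    "eosinophil count": 5.73e-3,
}
passing = [name for name, p in reported.items() if p < threshold]

print(n_tests, f"{threshold:.3g}")  # 923 5.42e-05
print(passing)  # ['monocyte percentage', 'monocyte count']
```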
It also performed a PGS analysis to evaluate the associations between the expression levels of most genes and the polygenic scores of 1,263 traits [10]. The majority of samples in both studies are of European ancestry. In the PGS analysis, multiple PGS were calculated for each trait using different GWAS, sample ancestries, and p-value cutoffs (p = 0.01, 1×10^-3, 1×10^-4, 1×10^-5, and 5×10^-8). For blood cell traits, three previous GWAS were used, designated study 1 [18], study 2 [19], and study 3 [20]. Statistical significance was defined by a false discovery rate (FDR) < 0.05 [10].

The severe COVID-19 risk variant examined in this study is rs67959919 (G/A), whose risk allele A has an odds ratio (OR) of 2.07 (95% confidence interval (CI): 1.66-2.56, p = 4.69×10^-11) for severe COVID-19 after adjustment for genetic principal components, age, and sex [4]. It is in perfect linkage disequilibrium (LD, r^2 = 1) with the lead variant, rs11385942 (A/GA, OR = 2.11, 95% CI: 1.70-2.61, p = 9.46×10^-12), in European populations [21]. The lead variant is an insertion-deletion polymorphism and is absent from some existing datasets.
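The reported ORs, CIs, and p-values can be cross-checked with the standard Wald approximation, assuming (as is usual for logistic regression, but not stated in the text) that each 95% CI was built on the log scale as log(OR) ± 1.96 × SE. The helper name `wald_from_or_ci` is hypothetical.

```python
import math


def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate SE, Wald z and two-sided p-value from an OR and its 95% CI.

    Assumes the CI was computed on the log scale as log(OR) +/- z_crit * SE.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return se, z, p


# OR and 95% CI values quoted in the text for the proxy and lead variants.
results = {
    "rs67959919": wald_from_or_ci(2.07, 1.66, 2.56),
    "rs11385942": wald_from_or_ci(2.11, 1.70, 2.61),
}
for variant, (se, z, p) in results.items():
    print(f"{variant}: SE={se:.3f}, z={z:.2f}, p~{p:.1e}")
```

The back-calculated p-values land in the same order of magnitude as the reported ones (around 10^-11 and 10^-12); they are not identical because the published OR and CI bounds are rounded to two decimals.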
To identify phenotypes associated with rs67959919, we performed a PheWAS in a subset of 310,999 unrelated European individuals from UK Biobank after quality control and filtering (Supplementary Table 3). With the conservative Bonferroni correction for the total number of phenotypes tested (p < 5.42×10^-5), the severe COVID-19 risk variant was associated with monocyte percentage (p = 4.09×10^-13) and monocyte count (p = 1.07×10^-8). None of the binary disease outcomes or biomarkers passed this significance cutoff. The top three binary phenotypes were all related to the digestive system: sialolithiasis (p = 4.76×10^-4), periodontitis (p = 1.53×10^-3), and its subcategory, chronic periodontitis (p = 2.43×10^-3). At the nominal significance level (p < 0.05), associations were observed with additional blood cell traits (Table 1): eosinophil count and percentage, neutrophil count and percentage, mean corpuscular hemoglobin, and mean corpuscular hemoglobin concentration. In GeneATLAS, a large database of associations based …

Candidate target genes through which the COVID-19 risk variant may act by regulating gene expression can be identified with eQTL analysis. Based on cis-eQTL analysis in 54 tissues from GTEx [17], the COVID-19 risk variant is associated with the expression of CXCR6, SLC6A20, CCR1, CCR9, RP11-697K23.3, and LZTFL1 in a total of 9 tissues (Supplementary Table 5). Moreover, eQTLGen [10], a cis-eQTL meta-analysis of 31,684 blood samples, additionally identified FLT1P1, CCR3, SACM1L, CCR5, CCR2, and RP11-24F11.2 (Supplementary Table 6). Neither study identified any genes in trans-eQTL analysis.

… (Supplementary Table 7). Genetically predicted higher monocyte count is positively associated with the expression of CCR1 (p = 1.78×10^-6) and CCR2 (p = 2.40×10^-10).
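The PGS-expression associations above rest on two steps: scoring each individual with GWAS effect sizes at a given p-value cutoff, and testing whether that score tracks a gene's expression. The sketch below illustrates both under stated assumptions: SNPs are taken as already LD-clumped, missing SNPs count as dosage 0, and a plain Pearson correlation stands in for whatever per-cohort statistic and meta-analysis eQTLGen actually used. All inputs are made-up toy values, and the function names are hypothetical.

```python
import math

# p-value cutoffs listed in the Methods for the PGS analysis.
P_CUTOFFS = [0.01, 1e-3, 1e-4, 1e-5, 5e-8]


def polygenic_score(dosages, gwas, p_cutoff):
    """Weighted sum of effect-allele dosages over SNPs with GWAS p < p_cutoff.

    dosages: {snp_id: effect-allele dosage in [0, 2]} for one individual;
             missing SNPs are treated as dosage 0 (a simplification).
    gwas: {snp_id: (beta, p_value)}; assumed to be LD-clumped already.
    """
    return sum(beta * dosages.get(snp, 0.0)
               for snp, (beta, p) in gwas.items() if p < p_cutoff)


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up toy inputs for illustration only (not data from the study).
toy_gwas = {"snp1": (0.20, 1e-9), "snp2": (0.10, 5e-4), "snp3": (-0.05, 0.2)}
toy_dosages = [
    {"snp1": 0, "snp2": 0, "snp3": 2},
    {"snp1": 1, "snp2": 1, "snp3": 1},
    {"snp1": 2, "snp2": 1, "snp3": 0},
    {"snp1": 2, "snp2": 2, "snp3": 1},
]
toy_expression = [5.0, 5.8, 6.1, 6.6]  # e.g. normalized expression of one gene

for cutoff in P_CUTOFFS:
    scores = [polygenic_score(d, toy_gwas, cutoff) for d in toy_dosages]
    print(f"p < {cutoff:g}: r = {pearson_r(scores, toy_expression):.3f}")
```

Looping over several cutoffs mirrors the robustness check described in the text: an association that holds across cutoffs (and across GWAS sources) is less likely to be an artifact of one particular score.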
Genetically predicted … (Supplementary Table 7). These results suggest that the target gene of the COVID-19 risk variant may be involved in hematologic processes and may regulate blood cell counts. By integrating and reconciling association signals from the PheWAS, eQTL, and PGS analyses, we prioritized three candidate blood cell traits and their corresponding candidate genes (Fig. 3). First, the severe COVID-19 risk allele downregulates the expression of CCR1 and CCR2, subsequently reducing monocyte count. Second, the risk allele downregulates CCR3 expression and thereby reduces eosinophil count. Third, the risk allele downregulates CCR3 expression and relieves its inhibitory effect on neutrophil count.

With an unbiased phenome-wide scan approach, our study established two sets of relationships: 1) associations of the severe COVID-19 risk variant with blood cell traits; and 2) associations between the expression levels of candidate target genes and the PGS of blood cell traits. Integrating association signals across multiple analyses prioritizes three blood cell traits, the counts of monocytes, eosinophils, and neutrophils, and their candidate target genes, CCR1, CCR2, and CCR3.

However, others did not find a significant difference [35], and there is also a report of an expanded eosinophil percentage among the total viable CD45+ leukocyte population [22]. These changes in circulating blood cells are closely related to the infiltration and accumulation of lymphocytes, neutrophils, eosinophils, and inflammatory monocyte-macrophages in the lung and other organs, leading to neutrophil extracellular traps and cytokine release syndrome [23, 36, 37, 38]. Notably, an immunomonitoring study of COVID-19 patients from the acute to the recovery phase observed a gradual reduction of neutrophils and replenishment of basophils, eosinophils, and non-inflammatory monocytes [39].

Our PheWAS in UK Biobank for the severe COVID-19 risk variant, with replication in another dataset, revealed that the risk allele is associated with decreased monocyte count and percentage and decreased eosinophil count and percentage, but with increased neutrophil percentage. GeneATLAS reported even more significant associations for these relationships, probably owing to its different quality control procedures and larger sample size [14]. It also reported suggestive evidence of negative associations between the risk allele and basophil count and percentage (Supplementary Table 4). These directions of association are consistent with the blood cell count changes observed in COVID-19 patients, as discussed above. Of note, our associations were identified in samples from a generally healthy population, whereas the vast majority of existing studies measured blood cell counts at hospital admission or during hospitalization, which likely reflect immune responses to SARS-CoV-2 infection. Future studies are warranted to evaluate whether pre-infection differences in blood cell counts modulate the risk of developing severe COVID-19. Our PGS analysis of the potential target genes of the severe COVID-19 risk variant revealed associations with multiple blood cell counts.
It is important to stress that these associations are consistent across analyses with PGS calculated from different GWAS datasets, p-value cutoffs, and sample ancestries (Supplementary Table 7). Intersecting and reconciling association signals across the PheWAS, eQTL, and PGS analyses yielded multiple possible pathways for the severe COVID-19 risk allele. Strong and consistent evidence was found for the pathways through monocytes and eosinophils. In contrast, support for the role of neutrophils is weaker: while the association with neutrophil percentage was replicated, the association with neutrophil count was significant only in UK Biobank, and the negative association between CCR3 expression and the PGS of neutrophil count was only suggestive (p = 0.011). In addition to these three cell types, basophils may represent another candidate pathway: the risk allele downregulates CCR3 expression, which reduces its stimulatory effect on basophil count and thus lowers basophil numbers. Additional evidence for the potential importance of these candidate genes can be drawn from their cell-type-specific expression patterns (Supplementary Figure 1). CCR3 is expressed highly specifically in eosinophils and basophils, with only slight expression in neutrophils; CCR2 has high expression in basophils and medium expression in classical monocytes; and CCR1 has medium to high expression across all types of granulocytes and monocytes. Notably, this pathway prioritization used eQTL association signals from blood samples, but regulatory effects could differ across tissues [17]. Also, the eQTL analyses were based on generally healthy samples [10, 17].
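The "reconciling" step amounts to a sign-consistency check: if the risk allele lowers a gene's expression, and higher expression goes with a higher genetically predicted cell count, then the allele should lower that count, matching the PheWAS direction. The sketch below encodes only the directions stated in this paper as +1/-1 and keeps the gene-trait pairs whose predicted and observed signs agree. It ignores effect sizes and significance, collapses count and percentage into one trait label, and is an illustration of the reasoning rather than the authors' procedure; all names are hypothetical.

```python
# Direction of the risk allele's eQTL effect on each gene (-1 = lower expression).
EQTL_SIGN = {"CCR1": -1, "CCR2": -1, "CCR3": -1}

# Direction of the association between gene expression and a trait's PGS.
PGS_SIGN = {
    ("CCR1", "monocyte"): +1,
    ("CCR2", "monocyte"): +1,
    ("CCR1", "neutrophil"): +1,
    ("CCR3", "eosinophil"): +1,
    ("CCR3", "basophil"): +1,
    ("CCR3", "neutrophil"): -1,  # only suggestive (p = 0.011)
}

# Direction of the risk allele's PheWAS association with each trait
# (neutrophil: percentage; basophil: suggestive, from GeneATLAS).
PHEWAS_SIGN = {"monocyte": -1, "eosinophil": -1, "neutrophil": +1, "basophil": -1}


def consistent_pathways(eqtl, pgs, phewas):
    """Keep gene-trait pairs where allele -> expression -> trait matches the PheWAS sign."""
    kept = []
    for (gene, trait), pgs_sign in pgs.items():
        predicted = eqtl[gene] * pgs_sign  # sign of the allele's implied effect on the trait
        if trait in phewas and predicted == phewas[trait]:
            kept.append((gene, trait))
    return kept


print(consistent_pathways(EQTL_SIGN, PGS_SIGN, PHEWAS_SIGN))
# [('CCR1', 'monocyte'), ('CCR2', 'monocyte'), ('CCR3', 'eosinophil'),
#  ('CCR3', 'basophil'), ('CCR3', 'neutrophil')]
```

In this encoding the CCR1-neutrophil pair drops out: lower CCR1 expression would predict fewer neutrophils, opposite to the PheWAS direction, which agrees with the text assigning the neutrophil pathway to CCR3.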
The regulatory effects of the risk variant may differ during SARS-CoV-2 infection. Further studies are needed to examine its functional effects in patients and to identify the most relevant tissue. Nonetheless, our study prioritized hematologic processes as the downstream pathophysiology of a major genetic locus of severe COVID-19. The strengths of our study include the unbiased phenome-wide approach at two levels of analysis: the genetic variant (923 phenotypes) and gene expression (1,263 phenotypes).

In conclusion, our phenome-wide association study of the severe COVID-19 risk variant at locus 3p21.31 and its candidate target genes identified altered blood cell traits, especially counts of monocytes, eosinophils, and neutrophils, as probable mechanistic links between the genetic locus and severe COVID-19. These blood cell traits, together with their candidate acting genes, CCR1, CCR2, and CCR3, represent compelling and testable hypotheses that call for follow-up studies of their roles in COVID-19 pathogenesis.

The authors would like to thank the UK Biobank participants and administrators for data access. We also thank all other Ye lab members for stimulating discussions. KY is supported by the University of Georgia Research Foundation. Funding sources had no involvement in the conception, design, analysis, or presentation of this work. The authors declare no conflicts of interest.
… Table 7. Blood cell traits are categorized into three groups: platelets, red blood cells, and white blood cells. Association effects (Z-scores) are shown as a heatmap. Statistical significance is indicated with "*" (p < 0.05) or "**" (FDR < 0.05).
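The heatmap's two-level markers combine a nominal p-value with an FDR-adjusted value. The text says only that a false discovery rate approach was used, so the sketch below assumes the Benjamini-Hochberg procedure as one common way to obtain FDR-adjusted values; the original analysis may have used a different FDR estimate (for example, a permutation-based one). The p-values and function names are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significance_marker(p, q, alpha=0.05):
    """Caption convention: '**' if FDR < alpha, '*' if only p < alpha."""
    if q < alpha:
        return "**"
    if p < alpha:
        return "*"
    return ""


p_values = [0.001, 0.02, 0.04, 0.3]  # made-up example values
q_values = benjamini_hochberg(p_values)
print([significance_marker(p, q) for p, q in zip(p_values, q_values)])
# ['**', '**', '*', '']
```

The third value shows why two markers are useful: p = 0.04 is nominally significant, but after adjusting for four tests its FDR value (about 0.053) no longer clears 0.05.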
Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study
Prevalence of Asymptomatic SARS-CoV-2 Infection: A Narrative Review
Association of Cardiac Injury With Mortality in Hospitalized Patients With COVID-19 in Wuhan, China
Genomewide Association Study of Severe Covid-19 with Respiratory Failure
The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic
COVID-19 outcomes and the human genome
Unravelling the human genome-phenome
Unraveling the polygenic architecture of complex traits using blood eQTL metaanalysis. bioRxiv
The UK Biobank resource with deep phenotyping and genomic data
Mapping ICD-10 and ICD-10-CM Codes to Phecodes: Workflow Development and Initial Evaluation
R PheWAS: data analysis and plotting tools for phenome-wide association studies in the R environment
An atlas of genetic associations in UK Biobank
The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease
The MR-Base platform supports systematic causal
Pathogenic T-cells and inflammatory monocytes incite inflammatory storms in severe COVID-19 patients
Pathological inflammation in patients with COVID-19: a key role for monocytes and macrophages
Eosinopenia and COVID-19
Eosinopenia and elevated C-reactive protein facilitate triage of COVID-19 patients in fever clinic: a retrospective case-control study
The role of peripheral blood eosinophil counts in COVID-19 patients
Clinical characteristics of 140 patients infected with SARS-CoV-2 in Wuhan
Eosinophil count in severe coronavirus disease 2019
Targeting potential drivers of COVID-19: Neutrophil extracellular traps
How does SARS-CoV-2 cause COVID-19? Science