key: cord-279913-lgdmlies authors: Katz, D. H.; Tahir, U. A.; Ngo, D.; Benson, M. D.; Bick, A. G.; Pampana, A.; Gao, Y.; Keyes, M. J.; Correa, A.; Sinha, S.; Shen, D.; Yang, Q.; Robbins, J. M.; Chen, Z.-Z.; Cruz, D. E.; Peterson, B.; Natarajan, P.; Vasan, R. S.; Smith, G.; Wang, T. J.; Gerszten, R. E. title: Proteomic Profiling in Biracial Cohorts Implicates DC-SIGN as a Mediator of Genetic Risk in COVID-19 date: 2020-06-11 journal: medRxiv : the preprint server for health sciences DOI: 10.1101/2020.06.09.20125690 sha: doc_id: 279913 cord_uid: lgdmlies COVID-19 is one of the most consequential pandemics in the last century, yet the biological mechanisms that confer disease risk are incompletely understood. Further, heterogeneity in disease outcomes is influenced by race, though the relative contributions of structural/social and genetic factors remain unclear. Very recent unpublished work has identified two genetic risk loci that confer greater risk for respiratory failure in COVID-19: the ABO locus and the 3p21.31 locus. To understand how these loci might confer risk and whether this differs by race, we utilized proteomic profiling and genetic information from three cohorts including black and white participants to identify proteins influenced by these loci. We observed that variants in the ABO locus are associated with levels of CD209/DC-SIGN, a known binding protein for SARS-CoV and other viruses, as well as multiple inflammatory and thrombotic proteins, while the 3p21.31 locus is associated with levels of CXCL16, a known inflammatory chemokine. Thus, integration of genetic information and proteomic profiling in biracial cohorts highlights putative mechanisms for genetic risk in COVID-19 disease. SARS-CoV-2 infection displays a wide array of clinical manifestations and degrees of severity. 
While there is evidence that comorbidities, particularly cardiovascular and metabolic disease, are risk factors for disease severity and outcomes, 6 the underlying biologic mechanisms that cause some to develop life-threatening disease while others remain asymptomatic are not well understood. The association recently observed between the ABO locus on chromosome 9 and susceptibility to respiratory failure in COVID-19 is consistent with earlier work showing an association between blood type and COVID-19 disease, though the mechanism(s) by which this locus might confer susceptibility to respiratory failure is unknown. 7, 8 The association with the 3p21.31 region observed in the same recent study was particularly novel, but also of unclear significance. 3 Emerging proteomic technologies enable large-scale protein profiling in population-based studies. Leveraging available genetic data, investigators have identified the genetic architecture of the circulating proteome. [9] [10] [11] Conversely, combining this information can identify proteomic signatures associated with specific loci or disease variants. Here we used measurements of 1,305 circulating proteins on the SOMAscan™ platform and genetic data from 4,859 participants in three large population-based studies: the Jackson Heart Study (JHS), a cohort of black participants, as well as meta-analyzed data from two white cohorts, the Framingham Heart Study (FHS) and the Malmö Diet and Cancer Study (MDCS). We tested for associations between genetic variants at the ABO and 3p21.31 loci and protein levels in the three cohorts to identify possible mediators of disease. Participants in the JHS with proteomics had an average age of 56 years and were 61% female. They had multiple comorbidities including hypertension (56% on treatment), diabetes mellitus (24%), and obesity (mean BMI 32). Baseline characteristics in FHS/MDCS have been reported previously. 
9 In brief, participants in FHS/MDCS were of similar age to those in JHS but included fewer females (49-53%) and had a lower prevalence of treated hypertension, obesity, and diabetes mellitus. Table 1 shows the 56 proteins that associate with variants within 1 Mb of the transcription start site (TSS) of the ABO gene in either JHS or FHS/MDCS or both at a p-value < 5×10⁻⁸. Such variants are termed protein quantitative trait loci (pQTLs). Twenty-three proteins had significant genetic associations in both black and white subjects, while 15 were specific to JHS and 18 were specific to FHS/MDCS. As might be expected given the ABO region's known association with thrombosis, 12 proteins associated with variants in this locus across all cohorts included ADAMTS13, von Willebrand Factor (vWF), Tie-1, Angiopoietin-1 receptor, VEGFR-2 and VEGFR-3. Inflammatory mediators, including P-selectin and E-selectin, Immunoglobulin superfamily containing leucine-rich repeat protein 2, and FAM3D (which has known cytokine activity) were also observed. Strikingly, CD209 antigen/DC-SIGN, which is a known binding site for multiple viruses including SARS-CoV, and a theorized binding site for SARS-CoV-2, 13 showed a strong association in both white and black cohorts. The specific variant most strongly associated with DC-SIGN levels differed between JHS and FHS/MDCS. To demonstrate the specificity of the aptamer for DC-SIGN protein, we separately identified 47 variants within the gene encoding DC-SIGN protein (on chromosome 19) that associated with DC-SIGN protein levels in JHS at genome-wide significance (p < 5×10⁻⁸). Supplementary Table 1 summarizes data that similarly support aptamer specificity for all proteins described herein, using the presence of such variants at or near the gene coding for the target protein that also associate with measured protein levels (termed cis-pQTLs), mass spectrometry data, or immunoassay data. Some protein associations were observed only in one racial group or the other. 
Among JHS participants, pQTLs in the ABO locus were associated with multiple inflammatory proteins including Cytotoxic T-lymphocyte protein 4, Interleukin-13, Interleukin-6 receptor subunit beta, Immunoglobulin alpha Fc receptor, T-cell surface glycoprotein CD4, and Programmed cell death 1 ligand 2. No pQTLs were identified for these proteins in FHS/MDCS. On the other hand, in FHS/MDCS but not JHS, variants in the ABO locus were associated with multiple adhesion molecules including Intercellular adhesion molecule 2, Neural cell adhesion molecule L1, Intercellular adhesion molecule 5, and Junctional adhesion molecule B. FAM3D, and vWF were all associated with this variant in both cohorts. For each of these four proteins, the effect allele that conferred higher risk for respiratory failure, A, was associated with higher measured levels of protein.

pQTLs in the chr3:45800446-46135604 locus

Given the multiple genes spanned by the other susceptibility locus, on chromosome 3, we similarly looked for pQTLs in this locus. Table 3 shows the proteins associated with variants in this region. Two proteins were found to have pQTLs in this locus across all cohorts: C-X-C motif chemokine 16 (CXCL16) and Teratocarcinoma-derived growth factor 1 (TDGF1). The TDGF1 gene is near this locus, making this a cis-pQTL. Further, while not a true cis relationship, CXCL16 is the ligand for CXCR6, whose gene is within this locus. No proteins were significantly associated with the specific variant identified by Ellinghaus et al., rs11385942. Finally, we examined whether circulating levels of CXCL16 or DC-SIGN are associated with known risk factors for COVID-19 in JHS using unadjusted associations. While CXCL16 showed modest associations with age, sex, BMI, smoking, and renal function, DC-SIGN only showed a weak association with coronary disease (Table 4). 
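The locus-based lookups summarized in Tables 1 and 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Assoc` record, the `lead_pqtls` and `tss_window` helpers, and the toy rows are all hypothetical. Only the genome-wide significance threshold, the 1 Mb flank around the TSS, and the chr3:45800446-46135604 interval are taken from the text. For each protein, the sketch keeps significant variants that fall inside the window and reports the one with the lowest p-value.

```python
from dataclasses import dataclass

GENOME_WIDE_P = 5e-8  # significance threshold stated in the text


@dataclass(frozen=True)
class Assoc:
    """One variant-protein association (hypothetical record layout)."""
    protein: str
    snp: str
    chrom: str
    pos: int
    pvalue: float


def tss_window(tss, flank=1_000_000):
    """Return (start, end) of a window of +/- flank bp around a TSS."""
    return max(0, tss - flank), tss + flank


def lead_pqtls(assocs, chrom, start, end, p_threshold=GENOME_WIDE_P):
    """Map each protein to its lowest-p association inside [start, end]."""
    lead = {}
    for a in assocs:
        if a.chrom != chrom or not (start <= a.pos <= end):
            continue  # variant lies outside the locus of interest
        if a.pvalue >= p_threshold:
            continue  # not genome-wide significant
        best = lead.get(a.protein)
        if best is None or a.pvalue < best.pvalue:
            lead[a.protein] = a
    return lead


# Toy rows with made-up variant names and p-values, for illustration only.
toy_rows = [
    Assoc("protein_A", "variant_1", "3", 45_900_000, 1e-9),
    Assoc("protein_A", "variant_2", "3", 45_950_000, 1e-12),
    Assoc("protein_B", "variant_3", "3", 47_000_000, 1e-20),  # outside window
    Assoc("protein_C", "variant_4", "3", 46_000_000, 1e-3),   # not significant
]

# Interval on chromosome 3 as given in the text (hg38 coordinates).
chr3_hits = lead_pqtls(toy_rows, "3", 45_800_446, 46_135_604)
for protein, hit in chr3_hits.items():
    print(protein, hit.snp, hit.pvalue)
```

In practice the rows would come from each cohort's association output, and the ABO window would be built with `tss_window` from the gene's TSS coordinate, which is not given here.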
All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This version posted June 11, 2020. https://doi.org/10.1101/2020.06.09.20125690
The COVID-19 pandemic is a continually evolving public health crisis, and the biological mechanisms that confer the heterogeneous outcomes of infection remain unclear. Given the recently identified COVID-19 risk loci, our data identify several possible pathways by which these loci might confer risk in COVID-19. The ABO locus is highly pleiotropic in our pQTL data, being associated with the levels of 56 proteins across the black and white cohorts. This likely reflects, in part, its role as a glycosyltransferase, altering the overall structure of multiple glycoproteins. An association between ABO blood group and disease is emerging in COVID-19. At least one published study observed a higher proportion of type A blood among individuals hospitalized with COVID-19, 14 and unpublished data from China and the United States suggest that blood group A is associated with increased risk of acquiring COVID-19. In China, an increased proportion of Type A blood was observed among those with COVID-19 as compared with local controls. 8 Zietz and Tatonetti found similar results among patients from New York Presbyterian Hospital, and meta-analyzed their data with the data from China to confirm the association. 7 Most recently, the ABO locus was shown to be a risk locus for COVID-19 severity; in a meta-analysis of 1,610 cases of COVID-19 and respiratory failure and 2,205 COVID-19 cases without respiratory failure across a population of patients from Spain and Italy, Ellinghaus et al. observed multiple variants in the ABO gene locus that conferred increased risk. 3 They, and others, have speculated about the possible ways in which ABO might play a role in COVID-19. 
The thrombotic risk associated with this locus is one plausible element, and indeed we observe the well-known association of this locus with vWF levels. 12, 15 Others suggest that blood type associates with ACE2 levels. 16 Still others hypothesized that circulating anti-A antibodies in individuals with Type O or B blood might also confer some level of immunity to COVID-19. 14, 17 Our data, however, strongly suggest that ABO influences CD209 antigen/DC-SIGN, which is a known binding site for SARS-CoV. 4 DC-SIGN is expressed by dendritic cells, and it has been shown that the SARS-CoV spike protein utilizes this protein for cell entry, as do HIV and dengue virus. 4 Interestingly, DC-SIGN is thought to increase with age, and pre-print data suggest that it is also increased in smokers, associating it with two risk factors for COVID-19, 18 though we did not observe these associations in our JHS data. Additionally, we did not observe an association with ACE2 levels measured on the SOMAscan™ platform. The association between ABO and DC-SIGN is consistent with data from other GWAS studies of the human proteome, 10, 11 and we now show the association in a black population. The frequency of the risk allele is slightly higher in JHS, one potential (though likely modest) reason for racial differences in COVID-19. Taken together, these data suggest that the ABO locus may confer its risk in part by modulating DC-SIGN, a putative binding site for SARS-CoV-2 cell entry. Our data further suggest that the ABO locus may influence disease through pleiotropic effects, which may differ between black and white individuals. 
We show that ABO variants are associated with proteins involved in endothelial function and thrombosis, important complicating factors in COVID-19. 19 However, there were key differences between the cohorts. In JHS, levels of multiple inflammatory proteins were associated with ABO variants, whereas in FHS/MDCS, ABO variation was associated with proteins involved in cellular adhesion. These differences may mediate increased risk, perhaps for inflammatory complications of COVID-19, such as cytokine storm. 20 We also observed interesting associations at the other risk locus identified by Ellinghaus et al., at which the sentinel effect allele conferred higher risk in their analysis than the sentinel allele at ABO. 3 They note that the validity of the 3p21.31 locus is supported by the COVID-19 Host Genetics Consortium data, which also showed increased COVID-19 risk, albeit at a lower level of significance. 21 In our analysis, two key proteins emerged, CXCL16 and Teratocarcinoma-derived growth factor 1 (TDGF1). TDGF1 is a signaling protein, and has an unclear relationship to infection or propagation of respiratory disease. CXCL16, on the other hand, is the chemokine ligand for the CXCR6 receptor, whose coding gene is at the chromosome 3 locus of interest. This makes the CXCR6 gene a strong candidate for further investigation among the large number of genes at this locus. Indeed, CXCL16/CXCR6 has previously been implicated in LPS-induced acute lung injury and alveolar inflammation. 5, 22

Limitations

Our study has several important limitations. We have not studied the relationship between these proteins and genes in COVID-19 cases, thus confirming their role in pathogenesis requires further study. While the proteomic profiling discussed here is extensive, it does not cover the entire proteome, and important protein associations may be missed as a result. 
Further, while changes in aptamer binding are typically reflective of protein levels, it may be the case that ABO-mediated glycosylation alters aptamer binding, thus changing the measurement of the protein without a true change in protein levels. Nonetheless, our data demonstrate that ABO affects the identified proteins, motivating further investigation. Additionally, in the case of DC-SIGN (and CXCL16), we and others have found that variants at the cognate gene are associated with levels of the protein as measured by the aptamer (Supplementary Table 1). Finally, we acknowledge we do not observe an association between the lead risk SNP at 3p21.31 identified by Ellinghaus et al. and CXCL16. Formal colocalization in larger cohorts will be useful in clarifying the gene-protein relationships. We show here extensive proteomic profiling of genetic variation proposed to confer risk in COVID-19. We have shown in black and white cohorts that the ABO locus is associated with DC-SIGN, a putative binding site for SARS-CoV-2, suggesting ABO-mediated alteration of DC-SIGN may play a key role in disease pathogenesis. We further identify the CXCL16/CXCR6 pair as another potential disease mediator. Further study, specifically in COVID-19 infected patients, is needed to confirm these findings and determine whether levels or other modifications of these proteins could alter disease processes. The human study protocols were approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center, Boston University Medical Center, Lund University, and University of Mississippi Medical Center, and all participants provided written informed consent. 
The Jackson Heart Study is a community-based longitudinal cohort study, begun in 2000, of 5306 self-identified African Americans from the Jackson, Mississippi metropolitan statistical area, the design of which has been previously described. 23 Baseline characteristics were assessed at Visit 1 between 2000 and 2004. Included in the present study are 1813 individuals with proteomic profiling and whole genome sequencing. Resting blood pressure was measured while sitting by recording two measurements with a Hawksley random zero sphygmomanometer using one of four cuff sizes selected by measured arm circumference. Glomerular filtration rate was estimated using the CKD-EPI equation. 24 Prevalent coronary heart disease (CHD) at Visit 1 was determined as a composite of patient-reported angina, patient-reported myocardial infarction, and evidence of previous myocardial infarction on ECG. JHS plasma samples were collected at Visit 1 in EDTA tubes maintained in −70°C freezers. 25 Proteomic measurements were performed using SOMAscan™, a single-stranded DNA aptamer-based proteomics platform, which contained 1,305 aptamers. 26 Samples were run in three separate batches for cost efficiency. Batch 1 was run as a nested case-control study of incident coronary disease, excluding those with prevalent coronary heart disease at Visit 1. Batches 2 and 3 were a randomly selected sample of the remaining JHS participants. Each batch was divided into several plates containing a subset of the samples. The FHS Offspring study has been previously described. 
27, 28 Included in the present study are 1625 individuals with proteomic profiling and genotyping performed at Visit 5. Proteomic profiling in FHS was also performed on the SOMAscan™ platform. In FHS, plasma samples were collected in citrate-treated tubes, which were then centrifuged within 15 minutes at 2000 g for 10 minutes, and the supernatant plasma was aliquoted and stored at −80°C without freeze-thaw cycles until assayed. Profiling was completed in two batches. In batch 1, 1129 proteins (1.1k) were profiled in 695 individuals. As a result of platform enhancements that occurred in the interval between the first and second set of samples being run, batch 2 included an expanded panel of 1305 proteins (1.3k), which was assayed in 930 participants. The MDCS is a Swedish population-based, prospective, observational cohort recruited between 1991 and 1996. 29 Included in the present study are 1421 individuals with proteomics and genotyping. Proteomic profiling in MDCS was also performed on the SOMAscan™ 1.3k platform as above, except samples were collected in EDTA-treated tubes. All assays in all cohorts were performed using SOMAscan™ reagents according to the manufacturer's detailed protocol. 30 Whole genome sequencing (WGS) in JHS has been described previously. 31 33 for all SNPs passing the following criteria: call rate ≥ 97%, pHWE ≥ 1 × 10⁻⁶, Mishap P ≥ 1 × 10⁻⁹, Mendel errors ≤ 100, and MAF ≥ 1%. In the MDCS, genotyping was conducted using the Illumina Omni Express Exome BeadChip kit. 
Genotypes were called using Illumina GenomeStudio and imputation was performed to the same 1000 Genomes version as for FHS using IMPUTE (v2) for SNPs passing the following criteria: call rate ≥ 95%, pHWE ≥ 1 × 10⁻⁶, minor allele frequency ≥ 0.01. In JHS, because proteomic data varied by batch, measurements were first standardized to a set of control samples that were part of each plate. Because of the non-normal distribution of the resulting protein levels, age-, sex-, and batch-adjusted residuals were generated and inverse normalized. The association between these values and genetic variants was tested using linear mixed effects models adjusted for age, sex and the genetic relationship matrix to adjust for relatedness using the fastGWA model implemented in the GCTA software package. 34 Variants with a minor allele count less than 5 were excluded. To determine associations between clinical variables and DC-SIGN or CXCL16, we used Pearson correlations for continuous variables and logistic regression for dichotomous variables. All protein levels were log-transformed and scaled by batch to normalize the data and reduce batch effects.
Table 1. Proteins associated with any variant within 1 Mb of the transcription start site of the ABO gene in either Jackson Heart Study (JHS) or Framingham Heart Study (FHS)/Malmö Diet and Cancer Study (MDCS). 
The SNP with lowest p-value for association with that protein in that cohort is displayed for simplicity. P values of 0 are reported where the value is below the limit of the software.
Table 2. Proteins with association with rs657152 in either Jackson Heart Study (JHS) or Framingham Heart Study (FHS)/Malmö Diet and Cancer Study (MDCS) meta-analysis. All Beta estimates are for the presence of the A allele.
Table 3. Proteins associated with any variant within chr3:45800446-46135604 of hg38 in either Jackson Heart Study (JHS) or Framingham Heart Study (FHS)/Malmö Diet and Cancer Study (MDCS) meta-analysis. The SNP with lowest p-value for association with that protein in that cohort is displayed for simplicity. As GWAS in FHS/MDCS was performed using GRCh37 (hg19), the locus has been translated to chr3:45841939-46177097.
Racial and Ethnic Disparities in Population Level Covid-19 Mortality
Racial demographics and COVID-19 confirmed cases and deaths: a correlational analysis of 2886 US counties
The ABO blood group locus and a chromosome 3 gene cluster associate with SARS-CoV-2 respiratory failure in an Italian-Spanish genome-wide association analysis
pH-dependent entry of severe acute respiratory syndrome coronavirus is mediated by the spike glycoprotein and enhanced by dendritic cell transfer through DC-SIGN
CXCL16/CXCR6 is involved in LPS-induced acute lung injury via P38 signalling
Cardiovascular Considerations for Patients, Health Care Workers, and Health Systems During the COVID-19 Pandemic
Testing the association between blood type and COVID-19 infection, intubation, and death
Relationship between the ABO Blood Group and the COVID-19 Susceptibility
The Genetic Architecture of the Cardiovascular Risk Proteome
Genomic atlas of the human plasma proteome
ABO Blood Group and Risk of Thromboembolic and Arterial Disease
DC/L-SIGNs of Hope in the COVID-19 Pandemic
Association between ABO blood groups and risk of SARS-CoV-2 pneumonia
Relationship between ABO blood group and von Willebrand factor levels: from biology to clinical implications
ABO blood group predisposes to COVID-19 severity and cardiovascular diseases
COVID-19 and ABO blood group: another viewpoint
A Hint on the COVID-19 Risk: Population Disparities in Gene Expression of Three Receptors of SARS-CoV
Coagulation abnormalities and thrombosis in patients with COVID-19
COVID-19 cytokine storm: the interplay between inflammation and coagulation
The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic
Role for CXCR6 and its ligand CXCL16 in the pathogenesis of T-cell alveolitis in sarcoidosis
Toward resolution of cardiovascular health disparities in African Americans: design and methods of the Jackson Heart Study
A new equation to estimate glomerular filtration rate
Laboratory, reading center, and coordinating center data management methods in the Jackson Heart Study
Aptamer-based multiplexed proteomic technology for biomarker discovery
Aptamer-Based Proteomic Platform Identifies Novel Protein Predictors of Incident Heart Failure and Echocardiographic Traits
An Investigation of Coronary Heart Disease in Families: The Framingham Offspring Study
The Malmo Diet and Cancer Study. Design and feasibility
D-Dimer in African Americans
A genome-wide association study of pulmonary function measures in the Framingham Heart Study
MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes
A resource-efficient tool for mixed model association analysis of large-scale data
FHS-MDC exome array and GWAS meta-analysis
The Jackson Heart Study (JHS) is supported and conducted in collaboration with Jackson State University (HHSN268201800013I), Tougaloo College (HHSN268201800014I), the Mississippi State Department of Health (HHSN268201800015I/HHSN26800001) and the University of Mississippi Medical Center (HHSN268201800010I, HHSN268201800011I and HHSN268201800012I) contracts from the National Heart, Lung, and Blood Institute (NHLBI) and the National Institute for Minority Health and Health Disparities (NIMHD). The Framingham Heart Study (FHS) acknowledges the support of contracts NO1-HC-25195, HHSN268201500001I and 75N92019D00031 from the National Heart, Lung and Blood Institute and grant supplement R01 HL092577-06S1 for this research. Dr. Katz is supported by NHLBI Molecular data for the Trans-Omics in Precision Medicine (TOPMed) program was supported by the National Heart, Lung and Blood Institute (NHLBI). Genome sequencing for "NHLBI TOPMed: The Jackson Heart Study" (phs000964.v1.p1) was performed at the Northwest Genomics Center (HHSN268201100037C). Core support including centralized genomic read mapping and genotype calling, along with variant quality metrics and filtering were provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I). Core support including phenotype harmonization, data management, sample-identity QC, and general program coordination were provided by the TOPMed Data Coordinating Center (R01HL-120393; U01HL-120393; contract HHSN268201800001I). We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed. The authors wish to thank the staffs and participants of the JHS. The views expressed in this manuscript are those of the authors and do not necessarily represent the