key: cord-0592974-q3n5ve2k authors: Warren, Rene L; Birol, Inanc title: HLA predictions from the bronchoalveolar lavage fluid samples of five patients at the early stage of the Wuhan seafood market COVID-19 outbreak date: 2020-04-15 journal: nan DOI: nan sha: f871d3f53f3237cc7b8d250e1bf280726856f205 doc_id: 592974 cord_uid: q3n5ve2k
We are in the midst of a global viral pandemic, one with no cure and a high mortality rate. The Human Leukocyte Antigen (HLA) gene complex plays a critical role in host immunity. We predicted HLA class I and II alleles from the transcriptome sequencing data prepared from the bronchoalveolar lavage fluid samples of five patients at the early stage of the COVID-19 outbreak. We identified the HLA-I allele A*24:02 in four out of five patients, which is higher than the expected frequency (17.2%) in the South Han Chinese population. The difference is statistically significant, with a p-value less than $10^{-4}$. Our results may help provide future insights into disease susceptibility.
SARS-CoV-2 infections reached global pandemic proportions in early 2020, affecting over 2M people worldwide (as of this writing) and showing no signs of easing, except in a few jurisdictions where strict quarantine measures were implemented early on. The resulting coronavirus disease (COVID-19) has a relatively high (∼3.4%) mortality rate [1], a figure that varies widely between jurisdictions due to factors yet to be determined. Currently, no vaccines or effective treatments are available. Most current data analysis efforts are, understandably, focused on the virus itself, for the purposes of vaccine development and of tracking its evolution for diagnostics and infection monitoring. Curiously, it is estimated that as many as 18-30% or more of the population may be asymptomatic to SARS-CoV-2 infections [2] [3], while other affected individuals exhibit mild, severe, or critical symptoms of infection.
Thus, gaining insights into host susceptibility to the coronavirus is another important aspect that needs to be studied and understood. One would expect a link between host immunity genes and susceptibility or resistance to infection. The Human Leukocyte Antigen (HLA) gene complex includes two classes of such genes, which encode the Major Histocompatibility Complex (MHC). Proteins of the MHC present internally-derived (class I) or externally-derived (class II) antigenic determinants (epitopes) to T cells, which, upon recognition of the epitope complex, mount an immune response to defend against viral and bacterial infections. HLA genes are therefore a cornerstone of acquired immunity in humans. HLA alleles have also been shown to be factors in susceptibility or resistance to certain diseases, and their frequency and composition in human populations vary widely (http://allelefrequencies.net). A previous study found the HLA class I alleles HLA-B*46:01 and HLA-B*54:01 to be associated with the 2003 severe acute respiratory syndrome (SARS) coronavirus infections in Taiwan [4], a disease related to the current pandemic.
For over a decade, high-throughput transcriptome sequencing (RNA-Seq) has proven a worthy instrument for measuring changes in gene expression in human diseases and beyond [5]. Transcriptome analysis has the potential to reveal not only key genes that are modulated in response to infections, but also the HLA composition of affected individuals. A few years ago, our group developed an approach for mining high-throughput next-generation shotgun sequencing data for the purpose of HLA determination [6], which has since been applied in a broader clinical context [7]. Here, we report our initial observations based on transcriptome sequencing libraries prepared from the bronchoalveolar lavage fluid samples of five patients at the early stage of the Wuhan seafood market pneumonia coronavirus outbreak (see Methods).
We identified the HLA class I allele A*24:02 and the class II haplotype DPA1*02:02-DPB1*05:01 in four out of five individuals. Although HLA-A*24:02 is common in some populations, the observed prevalence (80%) is higher than the allele frequency in the Chinese population (17.2%), the presumed ethnicity of the patients in the reported study.
We downloaded MGISEQ-2000RS paired-end (150 bp) RNA-Seq reads from libraries prepared from the bronchoalveolar lavage fluid samples of five patients (https://www.ebi.ac.uk/ena/data/view/PRJNA605983; accessions SRX7730880-SRX7730884, denoted in the tables below as Patients 1-5, respectively). We note that these are metagenomics RNA samples and, although this is not explicitly stated, we think that they were prepared for the primary purpose of identifying and characterizing the novel coronavirus at the outbreak epicentre. On each dataset, we ran HLAminer [6] in targeted assembly mode with default values (v1.4; contig length ≥200 bp, sequence identity ≥99%, score ≥1000), predicting HLA class I (HLA-I) and class II (HLA-II) alleles. We report the 4-digit (HLA allele/protein) resolution when the top-scoring predictions are unambiguous; otherwise, the 2-digit (allele group) resolution is reported.
We predicted and compiled the likely HLA class I (Table 1) and II (Table 2) alleles. We observe the HLA-A*24:02 allele in four out of five (80%) patients (Table 1). HLA-A*24 is a common group of alleles in Southeast Asian populations, and the frequency of the HLA-A*24:02 allele is high, especially in indigenous Taiwanese populations, where it reaches as high as 86.3%. However, our understanding is that the five patients are from the Wuhan market area, and the associated A*24:02 frequency in the Chinese population is typically 17.2%, a value that is statistically significantly lower than our observed frequency of 80% ($p < 10^{-4}$, 1-sided z-test).
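The reported significance level can be reproduced approximately with a one-sample proportion z-test. The sketch below assumes that the test compares the observed proportion (4 of 5 patients) with the population frequency of 17.2%, using the null-hypothesis standard error; the text does not give the exact formula, so the function name and this formulation are illustrative assumptions rather than the authors' code.

```python
import math

def one_sided_proportion_z_test(successes, n, p0):
    """One-sample, upper-tailed z-test for a proportion (illustrative sketch).

    successes: number of individuals carrying the allele (assumed: 4)
    n:         number of individuals examined (assumed: 5)
    p0:        reference frequency under the null hypothesis (0.172 in the text)
    """
    p_hat = successes / n
    # Standard error computed under the null hypothesis.
    se = math.sqrt(p0 * (1.0 - p0) / n)
    z = (p_hat - p0) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value

z, p = one_sided_proportion_z_test(successes=4, n=5, p0=0.172)
print(f"z = {z:.2f}, one-sided p = {p:.1e}")
```

Under these assumptions the z statistic is about 3.7 and the one-sided p-value falls just below $10^{-4}$, in line with the value quoted above.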
Also of note are the HLA class II DPA1*02:02-DPB1*05:01 haplotype predicted in patients 1 to 4 (80%), and the alleles DQB1*03:01 in patients 1 and 4 and DRB4*01:01 in patients 2 and 3 (Table 2). HLA-A*24 has not previously been reported as a risk factor for SARS infection [9], but there are reports of disease association with HLA-A*24:02, notably with diabetes [10] [11] [12] [13], which is a reported potential risk factor in COVID-19 patients [14]. Both DPA1*02:02 and DPB1*05:01 occur at relatively high frequency (44.8% and 31.3%, respectively) [15], and associations of those HLA class II alleles with narcolepsy [16] and Graves' disease [15], both autoimmune disorders, have been reported in that population. Further, a genome-wide association study (GWAS) found a link between HLA-DPB1*05:01 and chronic hepatitis B in Asians, and it has been suggested that this risk allele may impact one's ability to clear viral infections [16] [17]. HLA typing also informs vaccine development: this knowledge would help prioritize SARS-CoV-2-derived epitopes predicted to be stable HLA binders [18] [19]. Future studies into host susceptibility and resistance to SARS-CoV-2 infections are sorely needed, as they may help us better manage and mitigate risks of infection. We chose to communicate our early findings in this domain to facilitate the rapid development of response strategies.
This work was supported by Genome BC and Genome Canada [281ANV]; and the National Institutes of Health [2R01HG007182-04A1]. The content of this paper is solely the responsibility of the authors, and does not necessarily represent the official views of the National Institutes of Health or other funding organizations.
The many estimates of the COVID-19 case fatality rate.
The Lancet
Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship
Estimation of the asymptomatic ratio of novel coronavirus infections (COVID-19)
Association of HLA class I with severe acute respiratory syndrome coronavirus infection
RNA-Seq: a revolutionary tool for transcriptomics
Derivation of HLA types from shotgun sequence datasets
Neo-antigens predicted by tumor genome meta-analysis correlate with increased patient survival
Complete genome characterisation of a novel coronavirus associated with severe human respiratory disease in Wuhan
Association between HLA gene polymorphism and the genetic susceptibility of SARS infection. Book: HLA and associated important diseases
Soluble HLA class I antigens in patients with type I diabetes and their family members
The HLA class I A locus affects susceptibility to type 1 diabetes
Combination of HLA-A24, -DQA1*03, and -DR9 contributes to acute-onset and early complete beta-cell destruction in type 1 diabetes: longitudinal study of residual beta-cell function
Circulating preproinsulin signal peptide-specific CD8 T cells restricted by the susceptibility molecule HLA-A24 are expanded at onset of type 1 diabetes and kill β-cells
Comorbidity and its impact on 1590 patients with Covid-19 in China: A Nationwide Analysis
Fine mapping MHC associations in Graves' disease and its clinical subtypes in Han Chinese
HLA-DPB1 and HLA class I confer risk of and protection from narcolepsy
A genome-wide association study identifies variants in the HLA-DP locus associated with chronic hepatitis B in Asians
Human leukocyte antigen susceptibility map for SARS-CoV-2.
medRxiv COVID-19 vaccine candidates: prediction and validation of 174 SARS-CoV-2 epitopes