key: cord-0721508-8kyvvqqq authors: Secolin, Rodrigo; de Araujo, Tânia K.; Gonsales, Marina C.; Rocha, Cristiane S.; Naslavsky, Michel; De Marco, Luiz; Bicalho, Maria A.C.; Vazquez, Vinicius L.; Zatz, Mayana; Silva, Wilson A.; Lopes-Cendes, Iscia title: Genetic variability in COVID-19-related genes in the Brazilian population date: 2020-12-06 journal: bioRxiv DOI: 10.1101/2020.12.04.411736 sha: 3d07c729e66d82950cd420a6840b6f2a254e7e6d doc_id: 721508 cord_uid: 8kyvvqqq
SARS-CoV-2 employs the angiotensin-converting enzyme 2 (ACE2) receptor and the transmembrane serine protease 2 (TMPRSS2) to infect human lung cells. Previous studies have suggested that host genetic variation in ACE2 and TMPRSS2 could contribute to differences in the rate of infection or in the severity of COVID-19. Recent studies also showed that variants in 15 genes related to type I interferon immunity to influenza virus could predispose to life-threatening COVID-19 pneumonia. Additional genes (SLC6A20, LZTFL1, CCR9, FYCO1, CXCR6, XCR1, IL6, CTSL, ABO, and FURIN) and HLA alleles have also been implicated in the response to SARS-CoV-2 infection. At the time of writing, Brazil had recorded the third-highest number of COVID-19 cases worldwide. We aimed to investigate the genetic variation in COVID-19-related genes in the Brazilian population. We analysed 27 candidate genes and HLA alleles in 954 admixed Brazilian exomes, using data from two public databases (http://www.bipmed.org and http://abraom.ib.usp.br/) and additional exomes from individuals born in southeastern Brazil, the region with the highest number of COVID-19 cases in the country. Variant allele frequencies were compared with those in the 1000 Genomes Project phase 3 (1KGP) and gnomAD databases. We found 395 non-synonymous variants; of these, 325 were also present in 1KGP and/or gnomAD.
Six of these variants had previously been reported as putatively influencing the rate of infection or the clinical prognosis of COVID-19. The remaining 70 variants were identified exclusively in the Brazilian sample, with a mean allele frequency of 0.0025. In silico prediction of the impact on protein function indicated that three of these rare variants were pathogenic. Furthermore, we identified HLA alleles at the DQB1 and DRB1 loci that were previously associated with the response to COVID-19. Our results revealed genetic variability shared with other populations, but also rare and ultra-rare variants found exclusively in the Brazilian population. These variants could contribute to differences in the rate of infection or in the response to SARS-CoV-2 infection and should be further investigated in patients with COVID-19.
Introduction
COVID-19, caused by the SARS-CoV-2 coronavirus, is currently a worldwide pandemic. To enter human lung cells, SARS-CoV-2 employs the spike protein, which is primed by the host serine protease TMPRSS2, followed by binding to the angiotensin-converting enzyme 2 (ACE2) receptor and by proteolysis and activation of membrane fusion within endosomes by cathepsin L (CTSL) 1-4 . A key feature of SARS-CoV-2 infection is the pre-activation of the spike protein by FURIN inside the host cell, which increases viral spread into lung cells and virulence 5 . The rapid SARS-CoV-2 infection leads to an exacerbated immune reaction, and a few studies have shown that elevated levels of IL-6, an essential mediator of the immune response, are associated with an increased inflammatory response, respiratory failure, a higher probability of intubation, clinical complications, and higher mortality in patients with COVID-19 [6] [7] [8] .
Additional studies found an enrichment of rare variants predicted to be loss-of-function in genes related to type I interferon (IFN) immunity to influenza virus among patients with life-threatening COVID-19 pneumonia (TLR3, TICAM1, TRIF, UNC93B1, TRAF3, TBK1, IRF3, NEMO, IKBKG, IFNAR1, IFNAR2, STAT1, [...]
More importantly, there were 70 variants exclusive to the Brazilian sample, including 11 variants in genes related to type I IFN immunity to influenza virus 9 , six in candidate genes for COVID-19 response identified by GWAS 11 , and five related to SARS-CoV-2 entry into lung cells and virus replication 2, 10 . These are rare or ultra-rare variants, with a mean allele frequency (AF) of 0.0025 (Supplementary Data 1). We also found ACE2 p.Arg219Cys once in the Belo Horizonte dataset and twice in the ABraOM database; ACE2 p.Leu731Phe once in the Barretos dataset and twice in the ABraOM database; and TMPRSS2 p.Val160Met in samples from all the Brazilian towns and both public databases (BIPMed and ABraOM), with an alternative allele frequency (AAF) ranging from 0.1333 in Belo Horizonte to 0.2931 in Campinas. Among the variants previously reported in genes influencing type I IFN immunity to influenza virus 9 , we found three in the ABraOM database (one TLR3 p.Pro554Ser, one IRF3 p.Asn146Lys, and one IRF7 p.Pro246Ser) (Supplementary Data 1 and 2).
In addition, we identified five variants (rs35044562, rs34326463, rs35508621, rs67959919, and rs35624553) previously described as part of the COVID-19 risk core haplotype inherited from Neanderthals 12 . These variants were present only in samples from Ribeirão Preto and the BIPMed dataset (rs34326463), Campinas (rs35044562 and rs35508621), and the ABraOM dataset (rs35044562, rs35508621, rs67959919, and rs35624553) (Table 1).
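The frequency comparisons above rest on simple allele counting. The Python sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: it computes an alternative allele frequency as the alternative allele count divided by the number of genotyped alleles, flags variants absent from reference panels (the role 1KGP and gnomAD play here), and bins frequencies into rare and ultra-rare classes. The `Variant` class, the function names, and the 1% and 0.1% cut-offs are hypothetical choices; this section does not state the study's thresholds.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Variant:
    name: str           # hypothetical label, e.g. "GENE p.Xaa123Yaa"
    alt_count: int      # number of alternative alleles observed in the sample
    allele_number: int  # number of genotyped alleles (2 per person for autosomal loci)


def allele_frequency(v: Variant) -> float:
    """Alternative allele frequency: alt allele count / total genotyped alleles."""
    if v.allele_number <= 0:
        raise ValueError("allele_number must be positive")
    return v.alt_count / v.allele_number


def rarity(af: float, rare: float = 0.01, ultra_rare: float = 0.001) -> str:
    """Bin a frequency; the default cut-offs are assumptions, not the study's."""
    if af < ultra_rare:
        return "ultra-rare"
    if af < rare:
        return "rare"
    return "common"


def exclusive_variants(local: set[str], reference: set[str]) -> set[str]:
    """Variant IDs seen locally but absent from the reference panels (e.g. 1KGP, gnomAD)."""
    return local - reference


if __name__ == "__main__":
    # Made-up counts for illustration only; 1,908 = 2 alleles x 954 exomes.
    sample = [Variant("var_A", 3, 1908), Variant("var_B", 1, 1908)]
    afs = [allele_frequency(v) for v in sample]
    for v, af in zip(sample, afs):
        print(v.name, round(af, 5), rarity(af))
    print("mean AF:", round(mean(afs), 5))
    print(exclusive_variants({"v1", "v2", "v3"}, {"v2", "v3", "v4"}))
```

Applied to the study's counts, the set difference between the 395 non-synonymous variants and the 325 also present in 1KGP and/or gnomAD leaves the 70 Brazilian-exclusive variants. The sketch assumes autosomal loci; X-linked genes such as ACE2 need a sex-aware allele number (two alleles per female, one per male), which it does not model.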
We identified seven variants that were predicted to affect protein function by all 12 algorithms used: p.Phe249Ser, p.Gly164Val, and p.Leu25Pro in the SLC6A20 gene; [...] Horizonte; and p.Asn414Ser in the BIPMed dataset).
We did not find any predicted deleterious variants in ACE2 and TMPRSS2 based on our 12-algorithm criterion. However, Hou et al. 2 [...]
We compared the frequencies of these HLA alleles in admixed Brazilians with those in the populations of the ten countries with the most COVID-19 cases and of the five countries least affected by the disease, including the United States, India, Russia, Colombia, Peru, Mexico, Spain, Argentina, South Africa, Japan, Australia, South Korea, Vietnam, and Taiwan. The frequencies of these alleles are given in Supplementary Data 3. We noticed that the HLA- [...]
Peptides from the SARS-CoV-2 proteome were predicted to be presented by a diverse set of HLA class I and class II alleles (Supplementary Table 2). The HLA proteins are predicted to bind with high affinity only a small proportion of all possible SARS-CoV-2-derived peptides (on average, 0.5% for HLA class I and 2% for HLA class II). We also found a small proportion of weak binders (on average, 1.5% for HLA class I and 8 [...]
We studied 27 human COVID-19-related genes and the HLA region in two public genomic databases of admixed Brazilians (BIPMed, www.bipmed.org 19 ; ABraOM, http://abraom.ib.usp.br/ 24 ) and in additional samples from individuals born in three different towns in southeastern Brazil. We reported the variants and HLA alleles found in these samples and compared them with those of worldwide populations. We also reported the variants constituting the COVID-19 risk core haplotype at locus 3p21.31, described as inherited from Neanderthals 12 .
Previous studies showed that the ACE2, TMPRSS2, CTSL, FURIN, and IL6 genes, as well as the HLA region, may be involved in SARS-CoV-2 infection 1-5,10 and in the immune response [6] [7] [8] 14, 23, 25 .
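Two of the in silico steps described in the results can be sketched in a few lines of Python, as an illustration under stated assumptions rather than a reproduction of the authors' workflow. `consensus_deleterious` encodes the reading that a variant counts as damaging only when all 12 predictors agree; `kmers` enumerates every overlapping peptide of a protein, which is one way to obtain "all possible SARS-CoV-2-derived peptides"; and `binder_proportions` bins predicted percentile ranks (%Rank, the score reported by predictors such as NetMHCpan) into strong binders, weak binders, and non-binders. All function names, the peptide lengths, and the %Rank cut-offs passed in are assumptions; the study's actual thresholds are not given in this section.

```python
from collections import Counter


def consensus_deleterious(calls: dict[str, bool], n_tools: int = 12) -> bool:
    """True only if all n_tools predictors returned a call and every call is 'damaging'."""
    return len(calls) == n_tools and all(calls.values())


def kmers(protein: str, k: int) -> list[str]:
    """Every overlapping peptide of length k (sliding window with step 1)."""
    return [protein[i:i + k] for i in range(len(protein) - k + 1)]


def classify(rank: float, strong: float, weak: float) -> str:
    """Bin a peptide by predicted %Rank; a lower rank means tighter predicted binding."""
    if rank < strong:
        return "strong"
    if rank < weak:
        return "weak"
    return "non-binder"


def binder_proportions(ranks: dict[str, float], strong: float, weak: float) -> dict[str, float]:
    """Fraction of peptides in each binding class, given a peptide -> %Rank mapping."""
    if not ranks:
        raise ValueError("no peptides to classify")
    counts = Counter(classify(r, strong, weak) for r in ranks.values())
    return {label: counts[label] / len(ranks) for label in ("strong", "weak", "non-binder")}


if __name__ == "__main__":
    # Arbitrary toy sequence and made-up %Rank values, for illustration only.
    # 9-mers are a common length for HLA class I ligands (an assumption here);
    # class II predictions typically use longer peptides, e.g. 15-mers.
    peptides = kmers("MFVFLVLLPLVSSQ", 9)
    fake_ranks = dict(zip(peptides, [0.2, 1.1, 3.0, 15.0, 40.0, 0.9]))
    # 0.5 and 2.0 are placeholder class I cut-offs; check the predictor's documentation.
    print(len(peptides), binder_proportions(fake_ranks, strong=0.5, weak=2.0))
    calls = {f"tool_{i}": True for i in range(12)}  # hypothetical predictor names
    print(consensus_deleterious(calls))
```

Requiring unanimity across predictors, as `consensus_deleterious` does, trades sensitivity for specificity: it may explain why no ACE2 or TMPRSS2 variant passed the criterion even though other work has proposed functional variants in these genes.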
Furthermore, variants at loci 3p21.31 and 9q34.2 (encompassing SLC6A20, LZTFL1, FYCO1, CXCR6, XCR1, CCR9, and ABO) have been associated with COVID-19 in Spanish and Italian patients 11 , and different variants were found to affect the predisposition to life-threatening illness in patients with COVID-19 of different ancestries 9 .
The analysis of genetic variability in candidate genes in specific populations can help identify individuals at higher risk of infection or severe disease through the construction of risk haplotypes, which can also point to therapeutic targets for more effective treatments and for the control of COVID-19 2,10 . Thus, in addition to investigating genetic variability in the 27 candidate genes, we extended our analysis to include HLA alleles, which influence the immunological response to many infectious agents [...] (updated on September 28th 2020; https://covid19.who.int/; https://coronavirus.jhu.edu/map.html). This [...] disease. Thus, it seems likely that different population-specific haplotypes are associated with an increased risk of severe disease in different populations.
In conclusion, we found rare variants in three COVID-19-related genes that are present only in the Brazilian dataset and are predicted to affect protein function.
References
Cell entry mechanisms of SARS-CoV-2. Proceedings of the [...]
[...] meta-analysis on 5,871 community-dwelling Brazilians.
Distribution of local ancestry and evidence of adaptation in admixed populations.
The Brazilian Initiative on Precision Medicine (BIPMed): fostering genomic data-sharing of underrepresented populations.
Human genetic susceptibility to infectious disease.
The mutational constraint spectrum quantified from variation in 141,456 humans.
A global reference for human genetic variation.
HLA alleles frequencies and susceptibility to COVID-19 in a group of 99 Italian patients. HLA.
Exomic variants of an elderly cohort of Brazilians in the ABraOM database.
Distribution of HLA allele frequencies in 82 Chinese individuals with coronavirus disease-2019 (COVID-19).
[...] populations in the US.
Disparities in SARS-CoV-2 positivity rates: Associations with race and ethnicity.
National disparities in COVID-19 outcomes between black and white Americans.
Ancestry variation and footprints of natural selection along the genome in Latin American populations.
Following the footprints of polymorphic inversions on SNP data: from detection to association tests.
Genomic insights into the ancestry and demographic history of South America.
The variant call format and VCFtools.
ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data.
[...] evolution of gene function, and other gene attributes, in the context of phylogenetic trees.
MutationTaster evaluates disease-causing potential of sequence alterations.
Improving the assessment of the outcome of non-synonymous SNVs with a consensus deleteriousness score.
Predicting the functional effect of amino acid substitutions and indels.
Predicting functional effect of human missense mutations using PolyPhen-2.
SIFT web server: predicting effects of amino acid substitutions on proteins.
Comprehensive statistical study of 452 BRCA1 missense substitutions with classification of eight recurrent substitutions as neutral.
Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows.
[...]0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data.
NetMHCIIpan-2.0 - Improved pan-specific HLA-DR predictions using a novel concurrent alignment and weight optimization training procedure.
Allele frequency net database (AFND) 2020 update: gold-standard data classification, open access genotype data and new query tools.
The authors declare no competing interests.
Author contributions
RS contributed to the study design, conceptualization, data acquisition, analysis, and writing of the paper; TKA contributed to HLA sequencing, analysis, in silico prediction analysis, and writing of the paper; MCG contributed to in silico prediction analysis and [...]