key: cord-0319162-khji80nm
authors: Pandey, Rudra Kumar; Srivastava, Anshika; Singh, Prajjval Pratap; Chaubey, Gyaneshwer
title: Genetic association of TMPRSS2 rs2070788 polymorphism with COVID-19 Case Fatality Rate among Indian populations
date: 2021-10-05
journal: bioRxiv
DOI: 10.1101/2021.10.04.463014
sha: 6802d88d53c1138e656da6d4d6501f34c8569a69
doc_id: 319162
cord_uid: khji80nm

SARS-CoV-2, the causative agent of COVID-19, an ongoing pandemic, engages the ACE2 receptor to enter the host cell through S protein priming by a serine protease, TMPRSS2. Variation in the TMPRSS2 gene may account for differences in disease susceptibility between populations. The haplotype-based genetic sharing and structure of TMPRSS2 among global populations have not been studied so far. Therefore, in the present work, we used this approach with a focus on South Asia to study the haplotypes and their sharing among various populations worldwide. We used next-generation sequencing data of 393 individuals and analysed the TMPRSS2 gene. Our analysis of genetic relatedness for this gene showed a closer affinity of South Asians with West Eurasian populations; therefore, host disease susceptibility and severity, particularly in the context of TMPRSS2, will be more akin to West Eurasians than to East Eurasians. This is in contrast to our prior study on the ACE2 gene, which showed that South Asian haplotypes have a strong affinity towards East Eurasians. Thus, ACE2 and TMPRSS2 show antagonistic patterns of genetic relatedness among South Asians. We also tested the SNP frequencies of this gene among various Indian state populations with respect to the case fatality rate. Interestingly, we found a significant positive association between the rs2070788 SNP (G allele) and the case fatality rate in India.
It has been shown that the GG genotype of rs2070788 tends to have higher TMPRSS2 expression in the lung than the AG and AA genotypes; thus, it might play a vital part in determining differential disease vulnerability. We trust that this information will be useful in underscoring the role of the TMPRSS2 variant in COVID-19 susceptibility, and that using it as a biomarker may help to predict populations at risk.

COVID-19, caused by SARS-CoV-2, a betacoronavirus, is an ongoing pandemic that has cost millions of lives worldwide. Along with ACE2 (angiotensin-converting enzyme 2), which acts as a receptor, TMPRSS2 (transmembrane protease, serine 2), a serine protease, is also involved in virus entry into the host cell through S protein priming (1,2). Along with SARS-CoV-2, the influenza virus and various human coronaviruses such as HCoV-229E, MERS-CoV, and SARS-CoV have been found to utilize this protein for cell entry (3). Serine proteases have been linked to a variety of physiological and pathological processes. Androgenic hormones were shown to upregulate TMPRSS2 in prostate cancer cells, while androgen-independent prostate cancer tissue was found to downregulate it (4). Northern blot analysis has revealed that in mice TMPRSS2 is mainly expressed in the kidney and prostate, whereas in humans it is largely expressed in the prostate, salivary gland, stomach and colon (5). TMPRSS2 is also expressed in the epithelia of the respiratory, urogenital and gastrointestinal tracts, according to in-situ hybridization investigations performed on mouse embryos and adult tissues (5).

The impact of the COVID-19 crisis is not uniform across ethnic groups. Patients from different ethnic backgrounds suffer disproportionately (6).
Discrepancies in infection and case fatality rates (CFR) could be due to multiple factors, e.g., differences in quarantine and social distancing policies, access to medical care, reliability and coverage of epidemiological data, and population age structure, given that mortality is greater among the elderly and those with comorbidities (7,8). However, many young and healthy people have also lost their lives due to rapid cytokine storms (9). It is important to note that these factors do not appear to account for all the disparities noticed among groups, and there are significant gaps that require the scientific community's attention to propose and test theories that will assist us in better understanding the disease etiology. This is even more important keeping in mind that the number of cases and deaths may be poorly reported in some populations; however, data from countries with strict standards for the collection and presentation of epidemiological data suggest that human variation in genetic makeup may account for differential susceptibility and severity in disease outcomes among different populations (10). There is evidence that supports the role of ACE2 gene variations in susceptibility to COVID-19 in Indian populations (11,12). However, little is known regarding the genetic structure of TMPRSS2 haplotypes among South Asian populations. A detailed analysis of the sequence data of the TMPRSS2 gene from world populations may unveil its haplotype sharing, which may help understand the role of TMPRSS2 in disease susceptibility globally. Given the relevance of the TMPRSS2 gene in the SARS-CoV-2 infection process, COVID-19 infection and severity patterns may be directly linked to elevated TMPRSS2 gene expression, resulting in varying disease susceptibility outcomes in various communities globally.
However, the role of TMPRSS2 polymorphisms in disease susceptibility in Indian populations is largely unexplored and needs to be examined. Therefore, in the current study, we analysed the haplotype structure of TMPRSS2, focusing on South Asia, and its genetic markers that could be responsible for changes in the gene's expression in lung tissue, and correlated them with epidemiological data on COVID-19 to test for any association among Indian populations.

The TMPRSS2 gene haplotype analysis for various world populations was done using NGS data from (13). PLINK 1.9 was used to extract sequences from the dataset for different populations (14). After excluding samples from Sahul and Africa, as well as relatives up to the second degree, a total of 393 samples and 795 SNPs were retained and used for further study (Supplementary Tables 1 and 2). The PLINK file was converted to FASTA (ped to IUPAC) by a customized script (15). DnaSP was used for phasing, Fst calculation, population-wise genetic distance calculation, and generation of the Network and Arlequin input files (16). MEGA X was used to construct an Fst-based neighbour-joining tree (17). Nei's genetic distance and the average pairwise distance were calculated with Arlequin 3.5 and plotted with R v3.1 (18,19). Network v5 and Network Publisher were employed to draw the median-joining network, while the total and prevalent haplotypes of the TMPRSS2 gene for each population were calculated using the XML file generated by Arlequin 3.5 (18,20).

For the association study, we searched the literature for TMPRSS2 variants reported in relation to COVID-19 susceptibility (4,21-41). A total of 5 SNPs (rs2070788, rs734056, rs12329760, rs2276205, and rs3787950) were observed in our data and subsequently studied in detail.
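The PED-to-IUPAC conversion mentioned above can be illustrated with a minimal sketch. This is not the customized script used in the study; it only assumes the standard PLINK .ped layout (six leading columns followed by two allele columns per SNP, with 0 as the missing-allele code) and the standard IUPAC ambiguity codes. The function names genotype_to_iupac and ped_line_to_fasta are hypothetical.

```python
# Minimal sketch (hypothetical helper names, not the study's script):
# collapse each unphased diploid genotype of a PLINK .ped line into one
# IUPAC character and emit a FASTA record per sample.

# Standard IUPAC codes for single bases and two-base ambiguities.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
}


def genotype_to_iupac(allele1, allele2):
    """Return the IUPAC character for one genotype; 'N' if missing or unknown."""
    if allele1 == "0" or allele2 == "0":  # assumed PLINK missing-allele code
        return "N"
    key = frozenset((allele1.upper(), allele2.upper()))
    return IUPAC.get(key, "N")


def ped_line_to_fasta(line):
    """Convert one whitespace-delimited .ped line into a FASTA record."""
    fields = line.split()
    sample_id = fields[1]   # second column: within-family individual ID
    alleles = fields[6:]    # allele columns start after six leading columns
    if len(alleles) % 2 != 0:
        raise ValueError("expected two allele columns per SNP")
    sequence = "".join(
        genotype_to_iupac(alleles[i], alleles[i + 1])
        for i in range(0, len(alleles), 2)
    )
    return f">{sample_id}\n{sequence}"


# Usage with a made-up sample carrying four SNPs.
print(ped_line_to_fasta("FAM1 IND1 0 0 1 -9 A G C C 0 0 T C"))
```

Heterozygous sites become ambiguity codes (for example A/G becomes R), so the resulting FASTA keeps unphased genotype information until a phasing step resolves it.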
Data from the Estonian Biocentre (42-

TMPRSS2 is a serine protease encoded in humans by the TMPRSS2 gene, which is located on chromosome 21q22.3 (50). This protein aids in virus entry into host cells, such

individuals. Haplotypes 48 and 75 were found to be more common in Europe, while haplotypes 98 and 260 were more common in Siberia. Haplotype 34 was frequent in Southeast Asia, followed by Central Asia (Supplementary Table 3A

Figure 1C and Figure 1A). Therefore, host susceptibility to SARS-CoV-2 with respect to the TMPRSS2 gene among South Asians is most likely to be similar to that of West Eurasians rather than East Eurasians. In contrast, our previous study on the ACE2 gene showed a strong affinity of South Asian haplotypes with East Eurasians (11,12). Thus, for South Asians, ACE2 and TMPRSS2 show antagonistic patterns of genetic relatedness. As a result, it is worth proposing that the South Asian population's susceptibility to SARS-CoV-2 will fall somewhere between that of West and East Eurasian populations, which is most likely the cause of the moderate susceptibility.

There has not been any association study so far on TMPRSS2 variants in relation to COVID-19 among Indian populations. Therefore, we calculated group-wise allele frequencies in Indian populations for all 5 SNPs (rs2070788, rs734056, rs12329760, rs2276205, and rs3787950) observed in our data. Linear regression analysis was carried out for the spatial frequency of these SNPs in India against COVID-19 CFR among various Indian states (Supplementary Tables 4A, 4B and 5). The regression analysis showed a significant positive correlation between the allele frequency of the rs2070788 SNP (G allele) and the case fatality rate (p < 0.05). Higher CFR was observed where the allele frequency is higher and vice versa (Figure 2A and B).

(Table 1).
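The regression of state-wise CFR on allele frequency described above can be sketched as follows. This is an illustrative outline only, not the authors' analysis pipeline: the helper names are hypothetical, CFR is taken as deaths per 100 confirmed cases, and the example values are made-up placeholders rather than data from the study.

```python
import math


def case_fatality_rate(deaths, confirmed_cases):
    """CFR in percent: deaths per 100 confirmed cases."""
    return 100.0 * deaths / confirmed_cases


def simple_linear_regression(x, y):
    """Least-squares fit of y on x.

    Returns (slope, intercept, pearson_r, t_statistic); the t statistic
    has n - 2 degrees of freedom.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    unexplained = 1.0 - r * r
    if unexplained <= 0.0:
        # Perfectly collinear points: the t statistic is unbounded.
        t_stat = math.copysign(math.inf, r)
    else:
        t_stat = r * math.sqrt((n - 2) / unexplained)
    return slope, intercept, r, t_stat


# Hypothetical placeholder values, NOT taken from the study:
# G allele frequency per group and the corresponding CFR in percent.
g_allele_freq = [0.20, 0.28, 0.35, 0.41, 0.47, 0.50]
cfr_percent = [0.6, 0.9, 1.1, 1.0, 1.5, 1.6]

slope, intercept, r, t_stat = simple_linear_regression(g_allele_freq, cfr_percent)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f} t={t_stat:.3f}")
```

A p-value for the slope would then be obtained by comparing the t statistic with a t distribution on n - 2 degrees of freedom, for example with a statistics package; that step is omitted here to keep the sketch dependency-free.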
TMPRSS2 expression in the lungs was reported to be higher in the rs2070788 GG genotype than in the AA and AG genotypes (52); thus, the G allele may contribute to severe consequences of SARS-CoV-2 infection in populations where its frequency is high. We found that the G allele frequency in India ranges from 20% to 50%, with a mean frequency of 39%, being lowest in Arunachal Pradesh and highest in Bihar. This is consistent with the observed data, which show that Arunachal Pradesh is among the states with the lowest CFR, while Bihar and other states are among those with higher CFR (Supplementary Table 4A and B). Thus, this may explain the disparity in the severity of the pandemic among various Indian states (Figure 2B).

Being an androgen-sensitive gene TMPRSS2 is known to mediate sex-related effects and

Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor
A pneumonia outbreak associated with a new coronavirus of probable bat origin
TMPRSS2: A potential target for treatment of influenza virus and coronavirus infections
The pivotal role of TMPRSS2 in coronavirus disease 2019 and prostate cancer
Expression of transmembrane serine protease TMPRSS2 in mouse and human tissues
COVID-19 and Racial/Ethnic Disparities
COVID-19 and comorbidities: Deleterious impact on infected patients
Comorbidity and its Impact on Patients with COVID-19
Attenuation of COVID-19-induced cytokine storm in a young male patient with severe respiratory and neurological symptoms
Genetic susceptibility of COVID-19: a systematic review of current evidence
Genetic Association of ACE2 rs2285666 Polymorphism With COVID-19 Spatial Distribution in India
Asian haplotypes of ACE2 share identity by descent with East Eurasian populations
Genomic analyses inform on migration events during the peopling of Eurasia
PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses
Visualising migration flow data with circular plots
DNA Sequence Polymorphism Analysis of Large Data Sets
Molecular Evolutionary Genetics Analysis across Computing Platforms
Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows
The R Project for Statistical Computing
Median-joining networks for inferring intraspecific phylogenies
Common variants at 21q22.3 locus influence MX1 and TMPRSS2 gene expression and susceptibility to severe COVID-19. iScience
ACE2 and TMPRSS2 variants and expression as candidates to sex and country differences in COVID-19 severity in Italy. Aging
Global Spread of SARS-CoV-2 Subtype with Spike Protein Mutation D614G is Shaped by Human Genomic Variations that Regulate Expression of TMPRSS2 and MX1 Genes
The Expression and Polymorphism of Entry Machinery for COVID-19 in Human: Juxtaposing Population Groups, Gender, and Different Tissues
New insights into genetic susceptibility of COVID-19: an ACE2 and TMPRSS2 polymorphism analysis
Genetic variants that influence SARS-CoV-2 receptor TMPRSS2 expression among population cohorts from multiple continents
Infectivity and Progression of COVID-19 Based on Selected Host Candidate Gene Variants
Ethnicity-dependent allele frequencies are correlated with COVID-19 case fatality rate [Internet]. Preprints
"Like sugar in milk": reconstructing the genetic history of the Parsi population
The Genetic Ancestry of Modern Indus Valley Populations from Northwest India
Estonian Biocentre Public Data
The genetic legacy of continental scale admixture in Indian Austroasiatic speakers. Sci Rep
A map of human genome variation from population-scale sequencing
SNV: understanding the evolutionary and medical implications of human single nucleotide variations in diverse populations
Pearson Correlation Coefficient
Haploview: analysis and visualization of LD and haplotype maps
Cloning of the TMPRSS2 Which Encodes a Novel Serine Protease with Transmembrane, LDLRA, and SRCR Domains and Maps to 21q22.3. Genomics
Structural analysis of experimental drugs binding to the SARS-CoV-2 target TMPRSS2
Pandemic A(H1N1) Influenza and A(H7N9) Influenza. J Infect Dis
Sex-mediated effects of ACE2 and TMPRSS2 on the incidence and severity of COVID-19; The need for genetic implementation
Androgenic hormones and the excess male mortality observed in COVID-19 patients: new convergent data
Male sex identified by global COVID-19 meta-analysis as a risk factor for death and ITU admission. Nat Commun