key: cord-0877561-gzy4yud3
authors: Russo, Roberta; Andolfo, Immacolata; Lasorsa, Vito Alessandro; Iolascon, Achille; Capasso, Mario
title: Genetic analysis of the novel SARS-CoV-2 host receptor TMPRSS2 in different populations
date: 2020-04-24
journal: bioRxiv
DOI: 10.1101/2020.04.23.057190
sha: 11a27e81a2b607cad6243f6231e7454642b97699
doc_id: 877561
cord_uid: gzy4yud3

Coronavirus disease 2019 (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). At the cellular level, infection begins when viral particles bind to the host cell-surface receptor angiotensin-converting enzyme 2 (ACE2). SARS-CoV-2 engages ACE2 as its entry receptor and uses the transmembrane serine protease 2 (TMPRSS2) for S protein priming. TMPRSS2 activity is essential for viral spread and pathogenesis in the infected host. Understanding how TMPRSS2 protein expression in the lung varies across the population could provide important insights into differential susceptibility to influenza and coronavirus infections. Here, we systematically analyzed coding-region variants in TMPRSS2, together with expression quantitative trait locus (eQTL) variants that may affect its expression, to compare the genomic characteristics of TMPRSS2 among different populations. Our findings suggest that lung-specific eQTL variants may confer different susceptibility or response to SARS-CoV-2 infection across populations under similar conditions. In particular, we found that the eQTL variant rs35074065 is associated with high expression of TMPRSS2 but with low expression of a splicing isoform of MX1, an interferon (IFN)-α/β-inducible gene. Carriers of this variant could therefore be more susceptible to viral infection or have a reduced cellular antiviral response.

In December 2019, a new infectious respiratory disease emerged in Wuhan, Hubei province, China [1-3]. It subsequently spread worldwide and became a pandemic.
The World Health Organization (WHO) has officially named the disease coronavirus disease 2019 (COVID-19), and the virus has been classified as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The mechanism of SARS-CoV-2 infection is not yet fully understood; the virus appears to have an affinity for cells located in the lower airways, where it replicates [4]. COVID-19 has a broad clinical spectrum in humans, ranging from mild malaise to death from sepsis or acute respiratory distress syndrome. At the cellular level, viral infections begin with the binding of viral particles to host cell-surface receptors [5, 6]. Receptor recognition is therefore an important determinant of the cell and tissue tropism of a virus. Human angiotensin-converting enzyme 2 (ACE2) was recently reported as an entry receptor for SARS-CoV-2 [3]. The spike (S) protein of coronaviruses facilitates viral entry into target cells: entry depends on binding of the surface unit, S1, of the S protein to a cellular receptor, which facilitates viral attachment to the surface of target cells. SARS-CoV-2 engages ACE2 as its entry receptor and uses the transmembrane serine protease 2 (TMPRSS2) for S protein priming [5]. TMPRSS2 activity is essential for viral spread and pathogenesis in the infected host [7-10]. As a host cell factor, TMPRSS2 is critical for the spread of several clinically relevant viruses, including influenza A viruses and coronaviruses [7, 10-16]. TMPRSS2 is a cell-surface protein expressed by epithelial cells of specific tissues, including those of the aerodigestive tract. Because it is dispensable for development and homeostasis, it constitutes an attractive drug target [13]. In this context, it is noteworthy that the serine protease inhibitor camostat mesylate, which blocks TMPRSS2 activity, has been approved in Japan for human use, although for an unrelated indication [10, 14].
Owing to the crucial role of TMPRSS2 in viral infection, we analyzed its genetic landscape in different populations to look for a possible genetic predisposition to SARS-CoV-2 infection. To systematically investigate candidate functional variants in TMPRSS2 and allele frequency (AF) differences among 17 populations of different ethnic origin, we analyzed all 1025 variants in the TMPRSS2 gene region downloaded from the gnomAD browser and annotated them with 34 pathogenicity prediction scores (Supplementary Table S1). The locus region comprises 496 non-coding and 520 coding variants. The AFs of all variants located in the coding region of TMPRSS2 in different large-scale genome databases are summarized in Supplementary Table S2. Forty-three loss-of-function (LoF) variants were annotated in gnomAD in the TMPRSS2 gene locus. As recently suggested [15], benign variants were classified using a combination of three algorithms (VEST3, REVEL, and RadialSVM), and pathogenic variants using three other algorithms (MutationTaster, M-CAP, and CADD). Overall, 26% (88/334) of non-synonymous variants were classified as pathogenic. These variants are distributed along the entire coding region of the gene (Figure 1a and Supplementary Table S3). We also used the GTEx database to investigate the distribution of expression quantitative trait loci (eQTLs) for TMPRSS2 (Supplementary Table S4). We found 203 unique, significant (FDR < 0.05) eQTL variants for TMPRSS2 in five tissues: 136 (66.9%) in lung, 56 (27.6%) in testis, 9 (4.4%) in prostate, 1 (0.5%) in ovary, and 1 (0.5%) in thyroid. TMPRSS2 is highly expressed in prostate, testis, stomach, transverse colon, pancreas, and tissues of the respiratory tract such as bronchus, pharyngeal mucosa, and lung. No difference in gene expression between males and females was observed in tissues that are not sex-specific (Supplementary Figure S1).
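The benign/pathogenic consensus described above can be sketched as a simple voting rule in which a label is assigned only when all three tools of a panel agree. This is a minimal sketch under stated assumptions, not the study's pipeline: the `VariantScores` container, the `classify` function, and every cutoff (VEST3 and REVEL below 0.5, RadialSVM below 0, M-CAP above 0.025, CADD phred of at least 20, MutationTaster "A"/"D" calls) are hypothetical choices, and the exact rule recommended in [15] may differ.

```python
from dataclasses import dataclass


# Hypothetical container for per-variant scores from the six tools named in the text.
@dataclass
class VariantScores:
    variant_id: str
    vest3: float
    revel: float
    radial_svm: float
    mutation_taster: str  # assumed prediction letter: "A"/"D" damaging, "N"/"P" not
    m_cap: float
    cadd_phred: float


# Illustrative cutoffs (assumptions, not the study's or reference [15]'s values).
VEST3_BENIGN_MAX = 0.5
REVEL_BENIGN_MAX = 0.5
RADIAL_SVM_BENIGN_MAX = 0.0
M_CAP_PATHOGENIC_MIN = 0.025
CADD_PATHOGENIC_MIN = 20.0


def classify(v: VariantScores) -> str:
    """Consensus rule: a label is assigned only when all three tools of a panel agree."""
    pathogenic_votes = [
        v.mutation_taster in {"A", "D"},
        v.m_cap > M_CAP_PATHOGENIC_MIN,
        v.cadd_phred >= CADD_PATHOGENIC_MIN,
    ]
    benign_votes = [
        v.vest3 < VEST3_BENIGN_MAX,
        v.revel < REVEL_BENIGN_MAX,
        v.radial_svm < RADIAL_SVM_BENIGN_MAX,
    ]
    if all(pathogenic_votes):  # the pathogenic panel is checked first
        return "pathogenic"
    if all(benign_votes):
        return "benign"
    return "uncertain"


# Toy variants with invented scores, for illustration only.
toy_variants = [
    VariantScores("toy_1", 0.9, 0.8, 0.5, "D", 0.3, 28.0),
    VariantScores("toy_2", 0.1, 0.05, -1.0, "N", 0.01, 5.0),
    VariantScores("toy_3", 0.7, 0.2, -0.5, "N", 0.1, 22.0),
]
print({v.variant_id: classify(v) for v in toy_variants})
# {'toy_1': 'pathogenic', 'toy_2': 'benign', 'toy_3': 'uncertain'}

# Fraction reported in the text: 88 of 334 non-synonymous variants.
print(f"{88 / 334:.0%}")  # 26%
```

Requiring unanimity within each panel keeps the "pathogenic" and "benign" labels conservative; any disagreement falls through to "uncertain".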
The AFs of the 136 lung eQTL variants were compared among populations, but no substantial differences in AF distribution were observed (Supplementary Figure S2). Nevertheless, the average AF of the 76 lung eQTL variants with a positive normalized effect size (NES) was higher in European populations (Finnish [FIN], 0.463; non-Finnish European [NFE], 0.541), whereas it was much lower in the East Asian (EAS) population (0.085) (Figure 1b and Supplementary Table S3). Interestingly, the top 25 variants (NES > 0.1) lie in a genomic region that includes both the TMPRSS2 and MX1 genes. In particular, the most significant eQTL variant, rs35074065, is located in the intergenic region between the two genes (distance = 2,379 bp from MX1; distance …).

Targeting TMPRSS2 expression and/or activity may be a promising strategy for interventions against COVID-19, given its crucial role in initiating SARS-CoV-2 and other respiratory viral infections [16]. Understanding how TMPRSS2 protein expression in the lung varies across the population could provide important insights into differential susceptibility to influenza and coronavirus infections. Immunohistochemical studies with limited sample sizes suggest that TMPRSS2 protein is more heavily expressed in bronchial epithelial cells than in surfactant-producing type II alveolar cells and alveolar macrophages, and that it is not expressed in the type I alveolar cells that form the respiratory surface [17]. A recent single-cell RNA-sequencing study reported that TMPRSS2 is expressed in both type 1 and type 2 alveolar cells [18]. Accordingly, our in silico analysis supported high TMPRSS2 gene expression in tissues of the respiratory tract, such as bronchus, pharyngeal mucosa, and lung. It is therefore also worth studying the genetic variants and eQTLs of this gene as possible causes of the variability in TMPRSS2 protein expression.
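The population comparison of lung eQTLs (average AF of positive-NES variants in FIN, NFE, and EAS) reduces to filtering variants by the sign of their NES and averaging the per-population AFs. A minimal sketch follows, assuming that each AF refers to the same allele whose effect the NES describes; the record layout, the `mean_af_by_population` helper, and all AF and NES values are hypothetical placeholders, not gnomAD or GTEx data.

```python
from statistics import mean

# Toy records: (variant_id, lung NES for TMPRSS2, {population: AF of the effect allele}).
# All numbers are hypothetical placeholders, not gnomAD/GTEx values.
lung_eqtls = [
    ("toy_a", 0.25, {"FIN": 0.50, "NFE": 0.60, "EAS": 0.10}),
    ("toy_b", 0.12, {"FIN": 0.40, "NFE": 0.50, "EAS": 0.06}),
    ("toy_c", -0.08, {"FIN": 0.20, "NFE": 0.10, "EAS": 0.70}),
]


def mean_af_by_population(records, populations, min_nes=0.0):
    """Average AF per population over variants whose NES is strictly above min_nes."""
    selected = [afs for _, nes, afs in records if nes > min_nes]
    if not selected:
        raise ValueError("no variants pass the NES filter")
    return {pop: mean(afs[pop] for afs in selected) for pop in populations}


averages = mean_af_by_population(lung_eqtls, ["FIN", "NFE", "EAS"])
print({pop: round(af, 3) for pop, af in averages.items()})
# {'FIN': 0.45, 'NFE': 0.55, 'EAS': 0.08}
```

Passing `min_nes=0.1` restricts the average to the stronger eQTLs, mirroring the NES > 0.1 threshold used for the top variants.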
For example, patients carrying single nucleotide polymorphisms associated with higher TMPRSS2 expression (rs2070788 and rs383510) were more susceptible to influenza A(H7N9) virus infection in two separate patient cohorts [19]. Our eQTL data showed that the EAS population has much lower AFs for the lung-specific eQTL variants associated with higher TMPRSS2 expression in lung, whereas European populations have higher AFs for the same variants. Interestingly, the top eQTL variants lie in a genomic region that includes not only TMPRSS2 but also MX1, which encodes a guanosine triphosphate (GTP)-metabolizing protein that participates in the cellular antiviral response. MX1 is an interferon (IFN)-α/β-inducible gene that is widely recognized as an influenza susceptibility gene [20]. Of note, downregulation of MX1 has been documented in patients who do not respond to interferon-based antiviral therapy for chronic hepatitis C virus infection [21]. Our data indicate that individuals carrying eQTL variants associated with high TMPRSS2 expression may also carry eQTL variants for MX1. In particular, we found that the eQTL variant rs35074065 is associated with high expression of TMPRSS2 but with low expression of an MX1 splicing isoform. These individuals could therefore be more susceptible to viral infection or have a reduced cellular antiviral response. Epidemiological studies across diverse countries, including China, Italy, and the United States, have shown that the incidence and severity of diagnosed COVID-19, as well as of other TMPRSS2-dependent viral infections such as influenza, may be higher in men than in women. Interestingly, we observed that TMPRSS2 is expressed at high levels in male-specific tissues, namely prostate and testis. In these tissues we also found a high number of eQTLs for TMPRSS2, whose AFs varied among populations, with the lowest frequencies in EAS individuals.
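Finding variants such as rs35074065, which are associated with higher TMPRSS2 expression but lower expression of an MX1 isoform, amounts to selecting eQTLs whose NES has opposite signs for the two features. The sketch below assumes a hypothetical table mapping each variant to its NES per gene or isoform; the `MX1_isoform` key, the `discordant_eqtls` helper, and every value are invented placeholders, and this is not the study's code.

```python
# Hypothetical eQTL table: variant -> {gene or isoform: NES}. Values are placeholders.
eqtl_nes = {
    "toy_v1": {"TMPRSS2": 0.30, "MX1_isoform": -0.20},
    "toy_v2": {"TMPRSS2": 0.15, "MX1_isoform": 0.10},
    "toy_v3": {"TMPRSS2": -0.05},
}


def discordant_eqtls(table, gene_up, gene_down):
    """Variants that raise expression of gene_up and lower expression of gene_down."""
    hits = []
    for variant, effects in table.items():
        up = effects.get(gene_up)
        down = effects.get(gene_down)
        # A variant qualifies only if it is an eQTL for both features, with opposite signs.
        if up is not None and down is not None and up > 0 and down < 0:
            hits.append(variant)
    return sorted(hits)


print(discordant_eqtls(eqtl_nes, "TMPRSS2", "MX1_isoform"))  # ['toy_v1']
```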
Another possible explanation for sex differences in mortality and morbidity could be the presence of the TMPRSS2:ERG fusion protein in prostate cancer, as well as the strong regulation of TMPRSS2 by androgens. Remarkably, at the mRNA level, constitutive expression of TMPRSS2 in lung tissue does not appear to differ between men and women [16]. Accordingly, TMPRSS2 gene expression data from the GTEx database do not show any difference between males and females. There is wide variation in mRNA expression levels within both sexes [16]. The low levels of androgens present in women may suffice to sustain TMPRSS2 expression. In addition, TMPRSS2 (and tumors carrying the TMPRSS2:ERG fusion protein) may be responsive to estrogen signaling [22, 23]. It is attractive to speculate that androgen receptor-inhibitory therapies might reduce susceptibility to COVID-19 pulmonary symptoms and mortality. In summary, we systematically analyzed coding-region variants in TMPRSS2 and the eQTL variants, which may affect gene expression, to compare the genomic characteristics of TMPRSS2 among different populations.

The variants in the TMPRSS2 gene region (chr21:42836478-42903043, 66.566 kb) were obtained from the gnomAD v2.1.1 database [24]. To analyze the distribution of eQTLs for TMPRSS2, we used data from the Genotype-Tissue Expression (GTEx) database (https://www.gtexportal.org/home/datasets).
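Selecting variants in the TMPRSS2 gene region amounts to an interval test on genomic coordinates; assuming 1-based, inclusive coordinates, the stated span of 66.566 kb follows from 42,903,043 - 42,836,478 + 1 = 66,566 bp. The helper names below (`parse_region`, `normalize_chrom`, `region_length`, `in_region`) are hypothetical, and the chromosome-name normalization assumes that variant records may label the chromosome either `21` or `chr21`.

```python
import re
from typing import Tuple


def parse_region(region: str) -> Tuple[str, int, int]:
    """Parse 'chrom:start-end' (1-based, inclusive; commas allowed) into parts."""
    match = re.fullmatch(r"(\w+):(\d+)-(\d+)", region.replace(",", ""))
    if match is None:
        raise ValueError(f"unrecognized region: {region!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("region start must not exceed its end")
    return chrom, start, end


def normalize_chrom(chrom: str) -> str:
    """Treat 'chr21' and '21' as the same chromosome (naming-convention assumption)."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def region_length(region: str) -> int:
    _, start, end = parse_region(region)
    return end - start + 1  # both endpoints are included


def in_region(chrom: str, pos: int, region: str) -> bool:
    r_chrom, start, end = parse_region(region)
    return normalize_chrom(chrom) == normalize_chrom(r_chrom) and start <= pos <= end


TMPRSS2_REGION = "chr21:42836478-42903043"
print(region_length(TMPRSS2_REGION))  # 66566 (bp), i.e. 66.566 kb
print(in_region("21", 42850000, TMPRSS2_REGION))  # True
```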
Annotation of TMPRSS2 variants and eQTLs was performed with ANNOVAR using the pathogenicity prediction tools described in Supplementary Table S1.

1. Clinical features of patients infected with 2019 novel coronavirus in Wuhan.
2. A Novel Coronavirus from Patients with Pneumonia in China.
3. A novel coronavirus outbreak of global health concern.
4. Update Related to the Current Outbreak of COVID-19.
5. SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor.
6. Efficient activation of the severe acute respiratory syndrome coronavirus spike protein by the transmembrane protease TMPRSS2.
7. TMPRSS2 Contributes to Virus Spread and Immunopathology in the Airways of Murine Models after Coronavirus Infection.
8. Wild-type human coronaviruses prefer cell-surface TMPRSS2 to endosomal cathepsins for cell entry.
9. Clinical Isolates of Human Coronavirus 229E Bypass the Endosome for Cell Entry.
10. Protease inhibitors targeting coronavirus and filovirus entry.
11. The spike protein of the emerging betacoronavirus EMC uses a novel coronavirus receptor for entry, can be activated by TMPRSS2, and is targeted by neutralizing antibodies.
12. Evidence that TMPRSS2 activates the severe acute respiratory syndrome coronavirus spike protein for membrane fusion and reduces viral control by the humoral immune response.
13. Phenotypic analysis of mice lacking the Tmprss2-encoded protease.
14. Simultaneous treatment of human bronchial epithelial cells with serine and cysteine protease inhibitors prevents severe acute respiratory syndrome coronavirus entry.
15. Evaluation of in Silico Algorithms for Use With ACMG/AMP Clinical Variant Interpretation Guidelines.
16. TMPRSS2 and COVID-19: Serendipity or opportunity for intervention?
17. Influenza and SARS-coronavirus activating proteases TMPRSS2 and HAT are expressed at multiple sites in human respiratory and gastrointestinal tracts.
18. SARS-CoV-2 Entry Genes Are Most Highly Expressed in Nasal Goblet and Ciliated Cells within Human Airways.
19. Identification of TMPRSS2 as a Susceptibility Gene for Severe Pandemic A(H1N1) Influenza and A(H7N9) Influenza.
20. Host genetics of severe influenza: from mouse Mx1 to human IRF7.
21. Elevated expression and polymorphisms of SOCS3 influence patient response to antiviral therapy in chronic hepatitis C.
22. The androgen-regulated protease TMPRSS2 activates a proteolytic cascade involving components of the tumor microenvironment and promotes prostate cancer metastasis.
23. Estrogen-dependent signaling in a molecularly distinct subclass of aggressive prostate cancer.
24. The mutational constraint spectrum quantified from variation in 141,456 humans.

The authors thank their colleagues Francesco Manna, Barbara Eleni Rosato, Annalaura Montella, and Roberta Marra for continuing their lab work with the same dedication as ever during this troubled period of the COVID-19 pandemic.