key: cord-1047599-1mjl7f8w authors: Sagkan, Rahşan Ilikci; Akin‐Bali, Dilara Fatma title: Structural variations and expression profiles of the SARS‐CoV‐2 host invasion genes in Lung cancer date: 2020-06-03 journal: J Med Virol DOI: 10.1002/jmv.26107 sha: 7f0e73e7262149913d1cd694648e5a038384fcc4 doc_id: 1047599 cord_uid: 1mjl7f8w

Recent days have seen growing evidence of cancer patients' susceptibility to severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) and of the effect of genomic differences in the virus entry genes in lung cancer. Gene expression and mutation patterns of the target genes, Angiotensin Converting Enzyme‐2 (ACE2), Transmembrane Serine Protease 2 (TMPRSS2), Basigin (CD147/BSG) and Paired Basic Amino Acid Cleaving Enzyme (FURIN/PCSK3), were evaluated in lung adenocarcinoma (LUAD) and lung squamous carcinoma (LUSC) using in silico analysis. In addition to gene expression and mutation patterns, correlation and survival analyses were performed between the expression levels of ACE2 and the other target genes. The total genetic anomaly carrying rate of the target genes (ACE2, TMPRSS2, CD147/BSG and FURIN/PCSK3) was 8.1%, and 21 mutations were detected, 7 of which had pathogenic features. p.H34N, which lies on the residues that bind the SARS‐CoV‐2 RBD, was found in our LUAD patient group. According to the gene expression analysis in the LUAD and LUSC patient groups, the TMPRSS2 level was statistically significantly decreased in the LUSC patient group compared to healthy controls, while the ACE2 level was found to be high in LUAD and LUSC. There were no meaningful differences in the expression of the CD147 and FURIN genes.
The challenge today is to assess genomic susceptibility to COVID‐19 in lung cancer, which requires detailed experimental laboratory studies in addition to in silico analyses, as a way of assessing the mechanism of novel virus invasion that can be used in the development of effective SARS‐CoV‐2 therapy. This article is protected by copyright. All rights reserved.

SARS-CoV-2 is an RNA virus whose genetic material is positive-sense, single-stranded RNA. The novel coronavirus SARS-CoV-2 causes coronavirus disease 2019 (COVID-19). Its replication and function are similar to those of other related coronaviruses that infect mammals, including humans. 1, 2 The molecular mechanism of interaction between the new coronavirus and mammalian host cells is not yet fully understood, and numerous research groups have recently focused on the COVID-19 pandemic. Two mechanisms of infection have been described. In the first, the main target of the SARS-CoV-2 spike (S) protein is ACE2, which acts as a receptor on mammalian host cells. [3] [4] [5] The serine protease TMPRSS2, another human protein, helps activate the coronavirus S protein so that the virus can enter the cell. [6] [7] [8] In addition, cleavage of the spike protein by FURIN/PCSK3 aids the interaction of SARS-CoV-2 with the ACE2 receptor. 9,10 FURIN/PCSK3 is one of the important proteases that facilitate viral invasion. The combination of binding and activation allows the virus to enter the host cell. 9, 10 In the second mechanism, the S protein binds to an alternative or additional molecule, CD147/BSG, a novel receptor glycoprotein of the immunoglobulin superfamily that mediates viral invasion. 11 Cancer patients are at risk of developing several infections, including COVID-19. 12 The lungs are the main target of novel coronaviruses. Lung cancer increases patients' susceptibility to infection.
8 ACE2 is a molecule commonly used by new viruses to enter host eukaryotic cells. 3, 4, 7, 9, 13 Transformed cells can change the surface molecules and signaling […] also used to study the prognostic relationship between target genes and lung cancer subtypes. This study aims to use computational tools to investigate the increased expression and mutations of the invasion proteins concerned, as well as their impact, which may be helpful for assessing possible COVID-19 susceptibility in patients with LUSC and LUAD. It also demonstrates the possible susceptibility of cancers to SARS-CoV-2 and the prognosis of COVID-19 patients with LUSC and LUAD.

The lung cancer data set was obtained from the TCGA database. Demographic, clinical and genetic data regarding the patient group are summarized in Table 1. The cBio Cancer Genomics Portal (http://cbioportal.org) is an open-access platform that hosts The Cancer Genome Atlas (TCGA) dataset, allows interactive exploration of multiple cancer genomic datasets, and provides access to data from more than 5,000 tumor samples from various previous cancer studies. TCGA is a collaboration between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) and is among the largest-scale cancer genomics projects. 18 LUAD and LUSC were chosen as the cancer types of interest on the web interface in order to examine mutations in the ACE2, TMPRSS2, CD147/BSG and FURIN/PCSK3 genes in the LUAD and LUSC patients presented in the portal. The selected TCGA data set comprised the genome sequencing data of 1097 LUAD and LUSC patients. We analyzed the mutation distribution across specific protein functional regions using the OncoPrint, Cancer Type Summary and Mutation tools of the interface.
These tools provide an overview of the genomic alterations in particular genes affecting individual samples. To determine the possible pathogenicity of the detected mutations in the ACE2, TMPRSS2, CD147/BSG and FURIN/PCSK3 genes, we used the scores provided by […] It includes evolutionary constraints, structural features and protein annotation information. The most important single characteristic for SNAP prediction is conservation in a family of related proteins, as reflected by PSIC scores. In the SNAP software, mutations with scores between -100 and 0 are considered neutral, and those with scores between 0 and 100 are considered to have an effect. Finally, the score given by the COSMIC 21 (https://cancer.sanger.ac.uk/cosmic) database was used to predict and verify the pathogenic effect of the detected mutations. Evolutionary conservation analysis was carried out to determine whether the detected mutations affect critical amino acid codons of the target proteins. The conservation of the mutated amino acids among different species was evaluated via the "Multiple sequence alignment" tool in the PolyPhen-2 software.

GEPIA (http://gepia.cancer-pku.cn/index.html) is an interactive web server for gene expression profiling and interactive analysis of normal and tumor tissue samples. 22 It draws on the TCGA and GTEx (Genotype-Tissue Expression) databases and offers customizable analyses such as differential tumor/normal gene expression, profiling by cancer type or pathological stage, survival analysis, similar gene detection, correlation analysis, and dimensionality reduction.
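As a minimal sketch of the SNAP score convention described above (scores from -100 to 0 read as neutral, scores from 0 to 100 read as having an effect), the labeling step could be written as follows. The function name, the variant names and the example scores are hypothetical and are not taken from the study; treating a score of exactly 0 as neutral is an assumption, since the text does not say which side the boundary belongs to.

```python
def classify_snap(score: float) -> str:
    """Label a SNAP score using the convention stated in the text.

    Assumption: a score of exactly 0 is treated as neutral, because the
    text lists 0 in both ranges without assigning it to either one.
    """
    if not -100 <= score <= 100:
        raise ValueError(f"SNAP score out of range: {score}")
    return "effect" if score > 0 else "neutral"


# Hypothetical scores for illustration only; these are not study results.
example_scores = {"variant_a": -35, "variant_b": 62, "variant_c": 0}
labels = {name: classify_snap(score) for name, score in example_scores.items()}
print(labels)
```

The sketch only reproduces the thresholding; the scores themselves would still come from the prediction tool.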
The gene expression profiles of the ACE2, TMPRSS2, CD147/BSG and FURIN/PCSK3 genes were analyzed as box plots created by the GEPIA database using data from 483 LUAD and 486 LUSC patients and healthy tissue samples obtained from the server. Furthermore, correlation analyses between the expression level of the ACE2 gene and those of the other target genes were performed using the software. The p-values were calculated automatically by the software in both analyses, and p < 0.05 was considered statistically significant. Survival analyses of the target genes (ACE2, TMPRSS2, CD147/BSG and FURIN/PCSK3) according to their expression levels were performed using the web interface.

All statistical analyses were carried out on the GEPIA database. Kaplan-Meier curves were generated for overall survival, and the low and high expression groups were compared using the log-rank test. The Pearson test was used for the correlation analyses in the online database. p < 0.05 was considered statistically significant.

In our study, genome sequencing data of 1097 LUAD and LUSC patients were selected from cBioPortal and analyzed in order to detect genetic changes in the ACE2, TMPRSS2, CD147/BSG and FURIN/PCSK3 genes in LUAD and LUSC patient samples. At least one genetic change (missense, nonsense, inframe mutation, gene amplification or deep deletion) in the target genes was found in 8.1% of the LUAD and LUSC patients who made up our study group. When the cancer types were considered separately, the frequency of carrying a genetic anomaly was 7.4% for LUAD and 9.8% for LUSC. A total of 21 mutations (17 missense, 2 nonsense, 1 splice site and 1 deletion) were detected in the 4 target genes. Detailed information about the detected mutations is given in Table- […] codon between these codons appears to be p.H34N, a missense mutation in the LUAD cancer group.
Apart from this, from a functional point of view, the p.X233_splice mutation seen in the LUSC patient group is likely to cause an anomaly in ACE2 gene expression, since it is located in a splice region that is 100% conserved between species in the evolutionary process.

The other target gene, TMPRSS2, encodes a type II transmembrane serine protease of 492 amino acids that is expressed on the cell surface and is thus ideally positioned to regulate cell-cell and cell-matrix interactions. 23 The binding of the S protein to ACE2 triggers a conformational change in the coronavirus S protein that allows proteolytic digestion by host cell proteases (TMPRSS2), after which the viral RNA enters the cell and infects it. In our study, a total of 2 mutations (1 missense, 1 nonsense), as well as gene amplification and deep deletion, were detected in the TMPRSS2 gene. The 2 nucleotide changes detected are in the transmembrane domain; in particular, the p.G19* mutation may cause a truncated protein to form, as the TMPRSS2 polypeptide is terminated early, at codon 19.

CD147 is another molecule used by SARS-CoV-2 to enter host cells. CD147/BSG is a transmembrane glycoprotein of the immunoglobulin superfamily that plays a role in tumor development, Plasmodium infestation and virus infection. 11 In our study, 1 missense mutation, p.F275V, was detected in the CD147/BSG gene in the LUAD patient group.

Recently, the SARS-CoV-2 S protein has been reported to contain four residues (Pro681, Arg682, Arg683 and Ala684) that form a potential cleavage site for the FURIN/PCSK3 protease. Therefore, the fourth target gene in our study is FURIN/PCSK3. It carries 9 nucleotide changes (7 missense, 1 deletion, 1 nonsense mutation) as well as gene amplification and deep deletion. All of the detected mutations except the p.L428V missense change were found in the LUAD patient group.
Also, the p.E112* mutation may cause the FURIN/PCSK3 polypeptide to terminate at codon 112, producing a truncated protein. The mutations detected in the target genes are shown schematically on the protein domains in Figure-2.

First, pathogenicity was analyzed with the PolyPhen-2 program for the missense mutations described in detail in the results of the mutation analysis. It was determined that 7 of 17 missense mutations given in Table-1 […] Damaging) because the pathogenic score is close to 1. However, in the second analysis, made using the SNAP program, 4 of the same missense mutations were identified as having an effect, with scores between 0 and 100 (Figure-3 A-D). In addition, the amino acid positions affected by the detected missense mutations were compared between different species using the "Multiple sequence alignment" option in the PolyPhen-2 program. This analysis showed that the p.R219P, p.I256M, p.L320F, p.D693N, p.Q183K, p.E230D, p.G277W, p.L420L, p.L428V and p.Y495Q mutations lie on evolutionarily critical, conserved amino acids. Figure 3 includes only the missense mutations of ACE2. All of the predicted pathogenic features and evolutionary conservation analyses conducted with the PolyPhen-2 software are presented in detail in Figure-3 (a-d).

The mRNA expression analysis was performed to determine whether the 483 LUAD and 486 LUSC patients differ from the healthy sample group in their ACE2, TMPRSS2, CD147/BSG and FURIN/PCSK3 gene expression profiles. According to the results of our analysis, ACE2 and CD147/BSG gene expression levels are higher in both cancer groups than in healthy samples, although the difference is not statistically significant.
However, TMPRSS2 gene expression was statistically significantly lower in our LUSC patient group than in the healthy sample group. There is no significant difference for FURIN/PCSK3 in either cancer group (Figure 4A-D). In addition, the relationship between the expression of TMPRSS2, CD147/BSG and FURIN/PCSK3 and the expression profile of ACE2 was evaluated separately for each gene by the Pearson correlation test, and no correlation was detected (Figure 5). According to the results of our survival analysis, overall survival in the LUAD cancer group is statistically significantly longer for patients with low TMPRSS2 expression than for those with high expression (Figure 6, p=0.04).

Since the first case was detected in December 2019, the new coronavirus causing COVID-19 has led to more than 3,500,000 infections and about 245,000 deaths worldwide. 24 Deaths caused by this pandemic have more often affected people with critical diseases, including cancer. Cancer patients vary in their susceptibility to infection and in their responses to pathogen invasion of host cells, which are regulated by gene expression and mutation development in the related genes. Gene expression and mutation profiles may vary between individuals depending on parameters such as therapy regimen, age and immune system condition. Cancer patients may be at a higher risk of COVID-19. Recent studies have shown that ACE2 and the cellular protease TMPRSS2 play a role in the entry of SARS-CoV-2 into lung cells. [6] [7] [8] 23, 25 Our study addresses a gap in the literature by providing a comparative evaluation of the mutation and gene expression profiles of the ACE2, TMPRSS2, CD147/BSG and FURIN/PCSK3 genes in LUAD and LUSC patient groups.
Studies of the genomic and functional expression variants of ACE2, TMPRSS2, CD147/BSG and FURIN/PCSK3 in relation to potential susceptibility and/or resistance to coronavirus infection are scarce in the literature. First, the mutation profiles of the target genes ACE2, TMPRSS2, CD147/BSG and FURIN/PCSK3 were analyzed extensively using the genome sequencing results of the LUAD and LUSC patients accessible in TCGA data sets. Our analysis showed that 8.1% of LUAD and LUSC patients had genetic abnormalities and that ACE2 was the most frequently mutated of the four target genes. However, when the cancer groups are evaluated separately, the frequency of carrying genetic anomalies is higher in the LUSC group. We detected mutations in sequences encoding important domains of the target genes in both types of cancer. In the LUSC patient group, the p.X233_splice site mutation in the ACE2 gene may prevent a functional transcript from being produced because it disrupts splicing, but this must be confirmed by wet laboratory studies. In particular, missense mutations in the ACE2 and FURIN/PCSK3 genes are likely to impair protein function because they affect critical amino acids that are conserved among species throughout evolution. In addition, the deep deletions in all four target genes are likely to be homozygous deletions, in which case gene expression may be impaired. Gene amplification detected in the ACE2 and FURIN/PCSK3 genes can cause uncontrolled and excessive gene expression. Hoffmann et al. reported that TMPRSS2 is required for the interaction of the SARS-CoV-2 S protein with the ACE2 receptor and for the entry and propagation of SARS-CoV-2 in the host cell.
The TMPRSS2 p.G19* nonsense mutation is a truncating mutation; we think that the premature stop codon will end TMPRSS2 protein synthesis before completion, yielding a deficient/immature enzyme and possibly disrupting protein function. In particular, genetic variants in ACE2 are thought to affect the interaction with the RBD of the SARS-CoV-2 spike protein. A total of 18 amino acids (Q24, T27, K31, H34, E37, D38, Y41, Q42, L45, L79, M82, Y83, N90, Q325, E329, N330, K353 and G354) on ACE2 are known to be binding sites for the RBD. Experimental and bioinformatics-based studies are being carried out on whether mutations/variants at these positions can change the binding affinity of SARS-CoV-2. In our study, the p.H34N mutation detected in the LUAD patient group lies on one of these RBD-binding residues.

There are several reasons why the lung appears to be the most vulnerable target organ for this virus. The first is that the large surface area of the lung makes it highly susceptible to inhaled viruses; the second is a biological factor. Zhao et al. showed that 83% of ACE2-expressing cells were alveolar epithelial type II cells (AECII), suggesting that these cells can serve as a reservoir for SARS-CoV-2 invasion. 26 ACE2 is also known to be expressed in many extrapulmonary tissues, including the heart, kidney, endothelium, and intestine. 2, 4, 28 In our study, according to the expression profile analysis of the ACE2, TMPRSS2, CD147/BSG and FURIN/PCSK3 genes in the LUAD and LUSC groups, the expression of TMPRSS2 was significantly lower in the LUSC patient group in particular, compared to the healthy group and the LUAD group. LUAD can provide the patient group with protective properties against SARS-CoV-2 invasion.
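To illustrate the residue check discussed above, the following sketch tests whether a missense change in ACE2 falls on one of the 18 RBD-binding residues listed in the text. It is only an illustration under stated assumptions: the function name and the regular expression are hypothetical, the pattern handles only simple single-residue missense notation such as p.H34N, and the second example variant is invented for contrast.

```python
import re

# The 18 ACE2 residues named in the text as binding sites for the RBD.
ACE2_RBD_CONTACTS = {
    "Q24", "T27", "K31", "H34", "E37", "D38", "Y41", "Q42", "L45",
    "L79", "M82", "Y83", "N90", "Q325", "E329", "N330", "K353", "G354",
}

# Simple missense notation only: p.<ref residue><position><alt residue>.
_MISSENSE = re.compile(r"^p\.([A-Z])(\d+)([A-Z])$")


def hits_rbd_contact(protein_change: str) -> bool:
    """Return True if a missense change replaces a listed contact residue."""
    match = _MISSENSE.match(protein_change)
    if match is None:
        # Nonsense, splice-site and other notations are not handled here.
        return False
    ref, position, _alt = match.groups()
    return f"{ref}{position}" in ACE2_RBD_CONTACTS


# p.H34N is reported in the text; p.A100T is a hypothetical variant.
for change in ("p.H34N", "p.A100T", "p.X233_splice"):
    print(change, hits_rbd_contact(change))
```

A position-only lookup would also work; matching the reference residue as well guards against a variant being annotated on a different protein or isoform.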
ACE2 and CD147/BSG show higher expression in both groups than in the healthy group, but the increase is not statistically significant. Some research groups have reported that ACE2 expression correlates positively with SARS-CoV-2 infection in experimental studies. 6, 7, 25, 29 The gene expression level of ACE2 may therefore indicate susceptibility to SARS-CoV-2 infection, with TMPRSS2 playing a supporting role. Furthermore, we found that low TMPRSS2 expression was associated with significantly longer overall survival in the LUAD cancer patients. Therefore, assessing downregulated TMPRSS2 expression alone may be useful for predicting prognosis and susceptibility to COVID-19 in this patient group. We concluded that increased expression of ACE2 and CD147/BSG and decreased […] Nonetheless, integrating expression and mutation patterns of virus invasion genes with defined correlation may prove highly informative, as highlighted by the successful application of in silico analysis to understand variation between lung cancer subtypes in their possible response to COVID-19.

The data used in our study were obtained from the public database of the TCGA Research Network: https://www.cancer.gov/tcga. We thank the TCGA and GEPIA databases for the availability of the data.

No potential conflict of interest was reported by the author(s).

No funding was received.

The data used in our study were obtained from the public database TCGA; therefore, ethical approval was not required.

The datasets generated and analyzed during the current study are available in the TCGA database (https://www.cancer.gov/tcga) and the cBio Cancer Genomics Portal (http://www.cbioportal.org/).
Coronavirus disease 2019 (COVID-19): current status and future perspectives
Emerging coronaviruses: Genome structure, replication, and pathogenesis
Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor
Angiotensin-converting enzyme 2 (ACE2) as a SARS-CoV-2 receptor: molecular mechanisms and potential therapeutic target
Structural, glycosylation and antigenic variation between 2019 novel coronavirus (2019-nCoV) and SARS coronavirus (SARS-CoV)
SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor
SARS-CoV-2 receptor ACE2 and TMPRSS2 are primarily expressed in bronchial transient secretory cells
Analysis of the susceptibility of lung cancer patients to SARS-CoV-2 infection. Mol Cancer
A review on the cleavage priming of the spike protein on coronavirus by angiotensin-converting enzyme-2 and furin
The spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-like cleavage site absent in CoV of the same clade
CD147 as a Target for COVID-19 Treatment: Suggested Effects of Azithromycin and Stem Cell Engagement
Cancer patients and research during COVID-19 pandemic: A systematic review of current evidence
Structural variations in human ACE2 may influence its binding with SARS-CoV-2 spike protein
Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein
Receptor and Regulator of the Renin-Angiotensin System
Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries
Global Epidemiology of Lung Cancer
The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data
Predicting functional effect of human missense mutations using PolyPhen-2
SNAP: predict effect of non-synonymous polymorphisms on function
COSMIC: the Catalogue Of Somatic Mutations In Cancer
GEPIA: a web server for cancer and normal gene expression profiling and interactive analyses
TMPRSS2, a serine protease expressed in the prostate on the apical surface of luminal epithelial cells and released into semen in prostasomes
Assessing ACE2 expression patterns in lung tissues in the pathogenesis of COVID-19
Single-cell RNA expression profiling of ACE2, the putative receptor of Wuhan
ACE2 inhibits breast cancer angiogenesis via suppressing the VEGFa