key: cord-0920050-kwsvxz36 authors: Facchiano, Antonio; Facchiano, Francesco; Facchiano, Angelo title: An investigation into the molecular basis of cancer comorbidities in coronavirus infection date: 2020-09-24 journal: FEBS Open Bio DOI: 10.1002/2211-5463.12984 sha: 74b1bced6c6008ad881fae19b1a1b728b66e9a3d doc_id: 920050 cord_uid: kwsvxz36
Comorbidities in COVID‐19 patients often worsen clinical conditions and may represent predictors of death. Here, the expression of 5 genes known to encode coronavirus receptors/interactors (ACE2, TMPRSS2, CLEC4M, DPP4 and TMPRSS11D) was investigated in normal and cancer tissues, and their molecular relationships with clinical comorbidities were investigated. Using expression data from the GENT2 database, we evaluated gene expression in all anatomical districts from 32 normal tissues in 3,902 individuals. Functional relationships with body districts were analyzed by Chilibot. We performed DisGeNet, GeneMania and DAVID analyses to identify human diseases associated with these genes. Transcriptomic expression levels were then analyzed in 31 cancer types and healthy controls from about 43,000 individuals, using the GEPIA2 and GENT2 databases. ROC analysis was performed, and the Area Under Curve (AUC) was used to discriminate healthy from cancer patients. Coronavirus receptors were found to be expressed in several body districts. Moreover, the 5 genes were found to be associated with acute respiratory syndrome, diabetes, cardiovascular diseases and cancer, i.e., the most frequent COVID‐19 comorbidities. Their expression levels were found to be significantly altered in cancer types including colon, kidney, liver, testis, thyroid and skin cancers (p < 0.0001); AUC > 0.80 suggests TMPRSS2, CLEC4M and DPP4 as relevant markers of kidney, liver, and thyroid cancer, respectively. The five coronavirus receptors are related to all main COVID‐19 comorbidities and three show significantly different expression in cancer vs control tissues. 
Further investigation into their role may help in monitoring other comorbidities as well as in the follow‐up of patients who have recovered from SARS‐CoV‐2 infection.
The COVID-19 pandemic is now affecting almost all countries. Patients in severe clinical condition are often affected by other pathologies, the most frequent being diabetes, coronary heart disease, cerebrovascular disease [1] and cancer [2]. The SARS-CoV-2 virus mainly affects lung tissue and the upper respiratory tract. Nevertheless, besides the lung and lung fluids, SARS-CoV-2 has also been found in distant districts such as feces (about 30% of cases) and blood (1% of cases) [3]. Further, acute kidney injury, proteinuria and hematuria have been found to be associated with death of COVID-19 patients [4], and other organs such as intestine, testis and kidney have been proposed as possible transmission routes [5]. Previous studies demonstrated a wide diffusion of other coronavirus strains (i.e., SARS-CoV) in almost the entire body [6]. In the present study we focused on the molecular bases possibly underlying the comorbidities observed in SARS-CoV-2 infection. We investigated genes involved as receptors or main interactors of SARS-CoV-2 and of similar coronaviruses responsible for SARS and MERS. Namely, ACE2, TMPRSS2, CLEC4M, DPP4 and TMPRSS11D were analyzed by assessing their RNA expression levels in different body districts and in different cancer types. ACE2 (Angiotensin converting enzyme 2) is a carboxypeptidase which converts angiotensin I to angiotensin 1-9 and angiotensin II to angiotensin 1-7. It is recognized as the receptor of the SARS-CoV and SARS-CoV-2 viruses [7]. TMPRSS2 (Transmembrane protease serine 2) is a serine protease up-regulated by androgen hormones; it is involved in the infection process of many viruses, including coronaviruses, acting on the spike proteins and on ACE2 and facilitating virus-cell membrane fusion [8, 9]. 
CLEC4M, DPP4 and TMPRSS11D are reported to be receptors or interactors of other coronaviruses; while their receptor activity for SARS-CoV-2 has not been shown to date, substantial evidence demonstrates their role in related coronaviruses. CLEC4M (C-type lectin domain family 4 member M) is a membrane protein involved as an attachment site for many viruses, including SARS; it is a receptor with pathogen recognition capability toward several parasites and viruses, and with cell adhesion properties. It is a known attachment receptor for Ebola virus, Hepatitis C virus, Human coronavirus 229E, SARS coronavirus and others [10]. DPP4 (Dipeptidyl peptidase 4) is a serine exopeptidase, corresponding to the T-cell activation antigen CD26. It is a glycoprotein membrane receptor involved in T-cell activation, with peptidase enzymatic activity. It is known as the MERS receptor [11]. TMPRSS11D (Transmembrane protease serine 11D) is a serine protease, active on ACE2 as well as on viral spike proteins. It cleaves and activates the spike glycoprotein of human coronavirus 229E (HCoV-229E), facilitating its entry into the cell [12, 13].
The transcriptomic expression levels of the 5 receptors/interactors of SARS-CoV-2 and other human coronaviruses were investigated in the human body districts. Namely, ACE2, TMPRSS2, CLEC4M, DPP4 and TMPRSS11D expression levels in normal tissues were derived from the GENT2 database, containing data from about 28,000 controls and cancer subjects (see Supplementary Table 1 for further details on tissues and sample sizes). Interestingly, the 5 coronavirus receptors were found to be expressed in almost every anatomical district. Expression in normal tissues was then analyzed in more detail in comparison with differential expression in cancer types, as reported below. Ubiquitous expression was confirmed by an additional analysis carried out on the Human Protein Atlas, showing both protein and RNA expression in all body districts (Supplementary Figure 1). 
The almost ubiquitous expression of coronavirus receptors led us to hypothesize that their biological action may affect many organs and tissues. To investigate this hypothesis a Chilibot analysis was carried out (www.chilibot.org). The presence of interactive relationships was analyzed by measuring the co-occurrence of the given keywords in the same sentence within manuscript abstracts. Table 1 shows that most anatomical districts share interactive relationships with the word "coronavirus". Districts such as colon, liver, testis, lung and kidney show the strongest relationships with coronavirus, as reported in Table 1, and in many cases also show the highest RNA expression levels (see Supplementary Figure 1).
Carcinogenesis and metastasis were found to be associated with these genes. Infections other than coronavirus were also found to be associated with these genes, namely influenza, HIV, HCV and trypanosomiasis infections. As a further investigation, we used a gene-enrichment approach, selecting for each of the 5 genes a list of the 20 most related genes, according to the methodology described in the Methods section. The five lists were combined and investigated with DAVID and the Genetic Association Database in order to detect gene-disease relationships. The combination of the 5 genes ACE2, TMPRSS2, CLEC4M, DPP4 and TMPRSS11D with the 20 genes most related to each of them (i.e., 5 + 100) shows a significant relation to disease classes, as reported in Figure 1. The disease class showing the best association is "IMMUNE", related to 45 genes of the list, with a highly significant P-value of 3.24E-07. Relevant associations were also found with the "REPRODUCTION", "AGING" and "CANCER" disease classes. The analysis was also performed for the 5 separate lists, and the results confirm the evidence from the DisGeNET analysis (see Supplementary Table 2, reporting the complete list of diseases associated with the 5 separate lists). 
Given the strong relationships found with different cancer types, we then focused our analysis on the expression level of these genes in a large number of different cancer types, by analyzing transcriptomic datasets. Table 3 shows the presence of significant differential expression in cancer vs normal controls in several cases. Namely, ACE2 has validated significant differential expression in colon, kidney, testicular and thyroid cancers; TMPRSS2 has validated significant differential expression in breast, colon, head and neck, kidney, lung, skin and uterus cancers; CLEC4M has validated significant differential expression in liver, lung and ovary cancers; DPP4 has validated significant differential expression in breast, kidney, blood, skin, stomach and thyroid cancers; TMPRSS11D has validated significant differential expression in lung cancer. Figure 2 shows the most relevant differential expression and the corresponding AUC according to ROC analysis for TMPRSS2 (Figure 2A), CLEC4M (Figure 2B) and DPP4 (Figure 2C). The AUC > 0.80 shown by TMPRSS2, CLEC4M and DPP4 suggests that these genes may act as effective molecular markers for kidney, liver and thyroid cancers, respectively. Combining information taken from the expression levels in normal tissues and from Table 3 (i.e., cancer types with a validated differential expression vs the corresponding normal tissues) led to an interesting observation, summarized in Figure 3: the normal tissues where these genes have the highest expression levels match the cancer types where these genes show a validated differential expression (red columns in Figure 3). This is evident for ACE2, TMPRSS2, CLEC4M and DPP4, as depicted in Figure 3.
The current study investigates the hypothesis that specific molecular bases may underlie the observed comorbidities of COVID-19, specifically involving coronavirus receptors/interactors. 
Molecular expression analyses and gene-disease association analyses were carried out to investigate the role coronavirus interactors may play in such comorbidities. We followed here a methodology based on gene expression analyses and validation, previously shown to be an effective approach in cancer marker investigations [14] [15] [16]. Five molecules known to be involved in coronavirus infection were investigated, namely ACE2, TMPRSS2, CLEC4M, DPP4 and TMPRSS11D. Additional molecules are being proposed to control virus entry [17]. Figure 3 and Supplementary Figure 1 show that the expression of these molecules is not limited to the body infection sites; rather, they appear to be almost ubiquitously expressed in all body districts, at both RNA and protein levels. This observation parallels the data reported in Table 1, indicating that coronavirus shares interactive relationships with many body districts, including small intestine, lung, heart, kidney, testis, ovary and breast, where the receptors are highly expressed. Furthermore, Table 2 and CLEC4M, DPP4 and TMPRSS11D. Many diseases reported in Table 2 and Figure 1 are frequent COVID-19 comorbidities, namely hypertension, diabetes, cardiovascular diseases, respiratory system disease, kidney diseases and cancers [18, 19, 2], suggesting that their occurrence in COVID-19 patients may be pathogenetically related to the molecules regulating virus entry. Kidney comorbidity as well as hypertension and diabetes mellitus have been suggested as death predictors in coronavirus patients [20, 21]. Given the high expression values of ACE2, TMPRSS2, CLEC4M, DPP4 and TMPRSS11D in the normal skin compartment indicated in Figure 3, and the extensive interactive relationships with skin tissue reported in Table 1, we may hypothesize comorbidity signs at the skin level in coronavirus patients; this has actually been confirmed in a very recent report highlighting dermatological manifestations in about 20% of COVID-19 patients [22]. 
We focused the current study on cancer comorbidity in COVID-19 patients, recently shown to reach a rate of up to 11% [23]. We highlight here the relevant association of TMPRSS2 with prostate cancer. TMPRSS2 is an androgen-regulated gene which helps coronavirus entry into cells. Several studies propose TMPRSS2, when fused with the ERG gene, as a prostate cancer marker. We therefore hypothesize that coronavirus infection, related at least in part to TMPRSS2 expression, might be associated with prostate cancer risk, at least to some extent. Further, the analyses carried out revealed that different cancer types, as well as carcinogenesis and cancer metastasis, are associated with these 5 genes; in our opinion this may explain, at least in part, why cancer is reported as one of the main comorbidities in coronavirus infection [28, 2], namely hematological malignancies, colorectal cancer and lung cancer [29]. We speculate that the frequent cancer co-occurrence in COVID-19 patients may not be a chance or merely age-related event; rather, it may be associated with specific expression patterns of SARS-CoV-2 receptors in kidney, prostate, testis, thyroid, skin and other organs. Surprisingly, combining data on gene expression in normal tissues (reported in Figure 3) with significant differential expression in cancer types (reported in Table 3) led us to observe that differential expression in cancers occurs mostly in body districts where these genes are highly expressed. This correspondence is highlighted as red bars in Figure 3: ACE2 is highly expressed in gallbladder, testis, kidney and colon normal tissues (1st, 2nd, 3rd and 4th in the rank) and consistently shows significant differential expression in cholangiocarcinoma and in testis, kidney and colon cancers, i.e., in the corresponding body districts. 
Similarly, TMPRSS2 shows the highest expression in colon, kidney and lung normal tissues (1st, 4th and 5th in the rank) and consistently shows significant differential expression in colon, kidney and lung cancers. Likewise, CLEC4M shows high expression levels in liver and ovary normal tissues (1st and 4th in the rank) and has differential expression in liver and ovary cancers. Finally, DPP4 is highly expressed in kidney (2nd in the rank) and has significant differential expression in kidney cancer. We found relevant and significant expression changes of 3 genes, namely TMPRSS2, CLEC4M and DPP4. In more detail, a significant reduction of TMPRSS2 and CLEC4M in kidney and liver cancers, respectively, and a significant increase of DPP4 in thyroid cancer were highlighted. The AUC was > 0.81, high enough to propose these genes as possible molecular markers to be further investigated.
According to a recent study [30], a significant association of Kawasaki disease with SARS-CoV-2 has been observed. Kawasaki disease has been proposed to be related to coronavirus infection, although this is still a debated issue [31], while a genetic association of Kawasaki disease with ACE gene polymorphism has been proven [32] and a role of ACE2 in vasculitis and in the control of endothelial wall physiology is known [33]. In addition, mice transgenic for human ACE2 show signs of vasculitis [34]. Furthermore, TMPRSS11D has been indicated as a possible target gene of miRNAs that are biomarkers of Kawasaki disease [35], further linking SARS-CoV-2 receptors to Kawasaki disease.
One additional observation should be highlighted: gender and age are known to play a key role as risk or protective factors in the most serious and lethal forms of COVID-19. In fact, COVID-19 epidemiology reveals that men and the elderly are far more seriously affected than women and the young [36]. 
Notably, the genes under investigation in this study appear to be strongly related to the endocrine axis (disease class named "REPRODUCTION" in Figure 1) and to aging (disease class named "AGING" in Figure 1). Thus, we hypothesize that the functional connections of coronavirus receptors with prostate cancer and with the "AGING" and "REPRODUCTION" disease classes may at least in part underlie the age and sex epidemiological features of COVID-19 patients. As a final note, the disease class that best associates with the molecular network of the 5 coronavirus genes is "IMMUNE" (Figure 1). It is not surprising that molecules related to virus entry associate with this class, but this finding underlines these molecules as potential triggers of both immune-related viral infections and other diseases, such as endocrine-related and cancer-related diseases.
According to the results of this study, we suggest that the tissue expression of these coronavirus receptors/interactors, as well as their association with specific diseases and their differential expression in cancer types, may represent, at least in part, the molecular basis of COVID-19 comorbidities. We propose that further investigation of these molecules may help in controlling COVID-19 comorbidities or may improve the follow-up of patients who have recovered from this infection. The possible occurrence of still unrecognized comorbidities is also suggested. A molecular approach somewhat comparable to the one proposed in the present study, although limited to the ACE2 and TMPRSS2 receptors, was published during the submission process of our study [37]. Given the wide tissue distribution of coronavirus receptors, as well as their association with different diseases and their highly significant differential expression in cancer types, we propose here for the first time that coronavirus receptors are molecularly related to the most frequent COVID-19 comorbidities, including cancers. 
The expression level of ACE2, TMPRSS2, CLEC4M, DPP4 and TMPRSS11D in 32 human normal tissues was derived from the GENT2 database [38]. Transcriptomic data of this database are Table 1 reports the number of healthy controls as well as the number of cancer patients investigated in the present study, taken from the GENT2 database [38].
Chilibot analysis [39] is publicly available at www.chilibot.net. It measures the co-occurrence of the chosen keywords in the same sentence within PubMed-indexed manuscripts, making it possible to distinguish interactive (stimulatory or inhibitory) from non-interactive relationships. The tool named "relationships between two lists" was used in the present study. The first list contained the word "coronavirus"; the second list contained the words denoting all body districts. The search was carried out on March 22nd, 2020; the "show only interactive relationships" filter was activated. The strength of the interactive relationships was measured as a function of the number of PubMed-indexed references supporting them. The default setup conditions were used, under which the analysis stops once 30 supporting abstracts have been found.
The association of the 5 genes with human diseases was investigated using different complementary approaches. The first analysis was carried out on the DisGeNET database, available at https://www.disgenet.org/, a large collection of genes involved in human diseases; it makes it possible to identify genes associated with human diseases and their comorbidities [40]. An additional analysis was carried out with the GeneMania tool (www.genemania.org) [41]. Each gene was analyzed individually to obtain a list of the 20 genes most related to it. GeneMania selects the related genes on the basis of protein-protein and protein-DNA interactions, common pathways, reactions, gene and protein expression data, protein domains and phenotypic screening profiles, using publicly available databases. 
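The sentence-level co-occurrence measure described earlier in this section for the Chilibot analysis can be illustrated with a minimal Python sketch. This is not Chilibot's implementation: the function names, the naive sentence splitter and the toy abstracts are all hypothetical, and the linguistic step that separates interactive from non-interactive relationships is omitted. Only the idea of counting abstracts in which two keywords share a sentence, and stopping at 30 supporting abstracts, is taken from the text.

```python
import re
from typing import Iterable, List


def split_sentences(abstract: str) -> List[str]:
    """Naive sentence splitter (assumption: ., ! or ? followed by whitespace)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", abstract) if s]


def cooccurrence_count(abstracts: Iterable[str], term_a: str, term_b: str,
                       cap: int = 30) -> int:
    """Count abstracts with at least one sentence containing both terms.

    Counting stops at `cap`, mirroring the 30-abstract limit mentioned in the text.
    """
    pat_a = re.compile(r"\b" + re.escape(term_a) + r"\b", re.IGNORECASE)
    pat_b = re.compile(r"\b" + re.escape(term_b) + r"\b", re.IGNORECASE)
    count = 0
    for abstract in abstracts:
        sentences = split_sentences(abstract)
        if any(pat_a.search(s) and pat_b.search(s) for s in sentences):
            count += 1
            if count >= cap:
                break
    return count


# Toy abstracts written for illustration only (not PubMed records).
toy_abstracts = [
    "Coronavirus RNA was detected in kidney tissue. Patients recovered.",
    "Coronavirus infection is common. Kidney function was normal.",
    "The kidney expresses receptors used by coronavirus strains.",
]

kidney_support = cooccurrence_count(toy_abstracts, "coronavirus", "kidney")
print(kidney_support)
```

In the second toy abstract the two keywords appear in different sentences, so it is not counted as supporting the relationship.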
The 5 lists (each composed of 1 + 20 genes) were analyzed individually and in combination by means of DAVID Bioinformatics Resources 6.8 (https://david.ncifcrf.gov) [42, 43], performing gene-annotation enrichment analysis of gene-disease associations. The Genetic Association Database [44] was used to find diseases and disease classes associated with each list of genes and with their combination.
Gene expression levels of the five genes were investigated in two public cancer-expression databases. Analyses were first carried out on the GEPIA2 database (http://gepia2.cancer-pku.cn/#index) [45]. Boxplot analysis was carried out with a P < 0.0001 significance cutoff in cancers vs TCGA and GTEx normal samples. Validation was carried out on the GENT2 database (http://gent2.appex.kr/gent2/) [38] with a P < 0.001 significance threshold. Data from about 49,000 healthy and cancer individuals were analyzed in more than 30 different cancer types. More details on the cancer types investigated and on the number of patients and controls present in GEPIA2 and GENT2 are reported in Supplementary Table 1. Data analyzed in the present study are all derived from anonymous public databases, with no ethical concerns. We warmly thank all individuals who contributed to these collections.
ROC analysis was carried out on the expression levels of the 5 genes in the cancer types and healthy controls available from the GENT2 database. The Area Under Curve (AUC) was computed with ROC (Receiver Operating Characteristics) analysis, one of the best-known methods to evaluate binary classification. In this case the two classes were "healthy controls" on one side and "kidney cancer", "liver cancer" or "thyroid cancer" on the other. AUC measures the ability of the classifier to effectively distinguish the two classes, ranging from 0.5 (corresponding to 50% ability, i.e., by chance) to 1 (corresponding to 100% ability to distinguish healthy controls from cancer individuals). 
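As an illustration of the AUC measure described above, the following minimal Python sketch computes AUC as the fraction of (cancer, control) pairs in which the cancer sample has the higher expression value, with ties counted as one half. This rank-based formulation is equivalent to the area under the empirical ROC curve, but it is not the GraphPad Prism procedure used in the study; the function name and the expression values are hypothetical and are not taken from GENT2.

```python
from typing import Sequence


def roc_auc(controls: Sequence[float], cases: Sequence[float]) -> float:
    """AUC as the probability that a random case scores above a random control.

    Ties contribute 0.5. Assumes higher values indicate the "case" class.
    """
    if len(controls) == 0 or len(cases) == 0:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in cases:
        for control in controls:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical expression values (arbitrary units), for illustration only.
healthy = [5.1, 6.0, 6.4, 7.2, 7.9]
tumor = [3.0, 3.8, 4.5, 5.6, 6.1]

# If the marker is assumed to be higher in cancer, the AUC falls below 0.5.
auc_if_higher_in_cancer = roc_auc(healthy, tumor)
# For a marker that is reduced in cancer, swap the roles of the two groups.
auc_if_lower_in_cancer = roc_auc(tumor, healthy)

print(auc_if_higher_in_cancer, auc_if_lower_in_cancer)
```

When there are no ties the two values sum to 1, so a marker that decreases in cancer (as reported here for TMPRSS2 and CLEC4M) discriminates just as well as one that increases, provided the direction is stated.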
AUC was calculated with GraphPad Prism (GraphPad Prism version 6.01 for Windows, GraphPad Software, La Jolla, California USA, www.graphpad.com); a significance threshold of P < 0.001 was used unless otherwise specified. Box-and-whisker plots were produced with Microsoft Excel.
The authors declare no conflict of interest.
" " corresponds to more than 30 references supporting the relationship. "" corresponds to a number between 15 and 30 references supporting the relationship. "" corresponds to fewer than 15 references supporting the relationship. "-" stands for no references. The "only interactive relationship" filter was active.
Strength of association is measured by the number of supporting references, according to the DisGeNET database (https://www.disgenet.org/). "" indicates 1 to 9 peer-reviewed studies reporting the association; "" indicates 10 or more peer-reviewed studies reporting the association.
Are patients with hypertension and diabetes mellitus at increased risk for COVID-19 infection?
COVID-19 and Italy: what next? Lancet
Caution should be exercised for the detection of SARS-CoV-2, especially in the elderly
Kidney disease is associated with in-hospital death of patients with COVID-19
Structure analysis of the receptor binding of 2019-nCoV
Organ distribution of severe acute respiratory syndrome (SARS) associated coronavirus (SARS-CoV) in SARS patients: implications for pathogenesis and virus transmission pathways
Analysis of angiotensin-converting enzyme 2 (ACE2) from different species sheds some light on cross-species receptor usage of a novel coronavirus 2019-nCoV
Wild-type human coronaviruses prefer cell-surface TMPRSS2 to endosomal cathepsins for cell entry
SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor. Cell
Polymorphisms in the C-type lectin genes cluster in chromosome 19 and predisposition to severe acute respiratory syndrome coronavirus (SARS-CoV) infection
Dipeptidyl peptidase 4 is a functional receptor for the emerging human coronavirus-EMC
Coronaviruses - drug discovery and therapeutic options
Accepted Article FEBS Open Bio (2020) © 2020 The Authors
WIPI1, BAG1, and PEX3 Autophagy-Related Genes Are Relevant Melanoma Markers
Ion Channel Expression in Human Melanoma Samples: In Silico Identification and Experimental Validation of Molecular Targets
Expression of genes related to lipid-handling may underlie the "obesity paradox" in melanoma: a public database-based approach (2020)
Characterization of spike glycoprotein of SARS-CoV-2 on virus entry and its immune cross-reactivity with SARS-CoV
Prevalence of comorbidities in the novel Wuhan coronavirus (COVID-19) infection: a systematic review and meta-analysis
Estimation of direct medical costs of middle east respiratory syndrome coronavirus infection: a single-center retrospective chart review study
Characteristics and outcome of viral pneumonia caused by influenza and Middle East respiratory syndrome-coronavirus infections: A 4-year experience from a tertiary care center
Clinical outcomes of current medical approaches for Middle East respiratory syndrome: A systematic review and meta-analysis
Cutaneous manifestations in COVID-19: a first perspective
Risk factors of critical & mortal COVID-19 cases: A systematic literature review and meta-analysis
How Comorbidities Affect COVID-19 Severity in the U
Do Patients with Cancer Have a Poorer Prognosis of COVID-19? An Experience in New York City
Middle East respiratory syndrome coronavirus (MERS-CoV) outbreak in South Korea, 2015: epidemiology, characteristics and public health implications
Outcome of Oncology Patients Infected With Coronavirus (2020)
An outbreak of severe Kawasaki-like disease at the Italian epicentre of the SARS-CoV-2 epidemic: an observational cohort study
Cause of Kawasaki syndrome uncertain again
Insertion/deletion polymorphism of angiotensin converting enzyme gene in Kawasaki disease
Hypertension, Thrombosis, Kidney Failure, and Diabetes: Is COVID-19 an Endothelial Disease? A Comprehensive Evaluation of Clinical and Basic Evidence
Mice transgenic for human angiotensin-converting enzyme 2 provide a model for SARS coronavirus infection
Serum exosomal miR-328, miR-575, miR-134 and miR-671-5p as potential biomarkers for the diagnosis of Kawasaki disease and the prediction of therapeutic outcomes of intravenous immunoglobulin therapy
Estimates of the severity of coronavirus disease 2019: a model-based analysis
Which cancer type has the highest risk of COVID-19 infection?
GENT2: an updated gene expression database for normal and tumor tissues
Content-rich biological network constructed by mining PubMed abstracts
DisGeNET: a comprehensive platform integrating
The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function
Systematic and integrative analysis of large gene lists using DAVID Bioinformatics Resources
Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists
The genetic association database
GEPIA2: an enhanced web server for large-scale expression profiling and interactive analysis
The technological support from the facility for Complex Protein Mixture (CPM) Analysis at ISS, Rome, Italy is kindly acknowledged. The research activity of Angelo Facchiano is partly supported by ELIXIR IT infrastructure.