key: cord-0911308-1ff6494i authors: Chetta, Massimiliano; Rosati, Alessandra; Tarsitano, Marina; Bukvic, Nenad title: A SARS-CoV-2 host infection model network based on genomic Human Transcription Factors (TFs) depletion date: 2020-09-19 journal: Heliyon DOI: 10.1016/j.heliyon.2020.e05010 sha: 7e4bee25f0b278a691a1082897c9a9ffac9ddf61 doc_id: 911308 cord_uid: 1ff6494i

In December 2019 a new beta-coronavirus was isolated and characterized by sequencing samples from pneumonia patients in Wuhan, Hubei Province, China. Coronaviruses are positive-sense RNA viruses widely distributed among different animal species and humans, in which they cause respiratory, enteric, hepatic and neurological symptoms. Six species of human coronavirus had previously been described: four (HCoV-229E, HCoV-OC43, HCoV-NL63 and HCoV-HKU1) that cause cold-like symptoms in immunocompetent or immunocompromised subjects, and two strains of zoonotic origin, sometimes fatal, that cause severe respiratory syndromes (SARS-CoV and MERS-CoV). SARS-CoV-2 is the emerging seventh member of the coronavirus family and is currently causing a global emergency. In silico analysis is a promising approach for understanding biological events in complex diseases and, given the serious threat to global health, it is extremely important to use bioinformatics methods able to study an emerging pathogen like SARS-CoV-2.
Herein, we report an in silico comparative analysis of the complete genomes of the SARS-CoV, MERS-CoV, HCoV-OC43 and SARS-CoV-2 strains, aimed at identifying specific conserved motifs in the viral genomic sequences that could bind, and therefore subtract, the host’s Transcription Factors (TFs). This subtraction would lead to a depletion, an effect comparable to haploinsufficiency (a dominant genetic condition in which a single copy of the wild-type allele at a locus, in heterozygous combination with a variant allele, is insufficient to produce the correct quantity of transcript and, therefore, of protein for a standard phenotypic expression). In this competitive scenario, virus versus host, the proposed in silico protocol identified the TFs, as well as the distribution of TFBSs (Transcription Factor Binding Sites) on the analyzed viral strains, that are potentially able to influence genes and pathways with relevant biological functions, suggesting that this approach could bring useful insights regarding SARS-CoV-2. According to our results, it is possible to hypothesize that TF-binding motifs could help explain the complex and heterogeneous clinical presentation of SARS-CoV-2 infection and, subsequently, predict possible interactions regarding metabolic pathways and drug or target relationships.

identified in the sequence of SARS-CoV-2, it remains quite similar to SARS-CoV (about 79%) and MERS-CoV (about 50%) [2]. As expected, the greatest number of mutations occur in the receptor-binding domain (RBD) and lead to the recognition of different molecules on the host cell surface as receptors. Human ACE2, expressed on the surface of cells of the respiratory and gastrointestinal tracts, has been identified as the main cellular receptor for SARS-CoV-2 [5].
The growing evidence of the high adaptability of coronaviruses to new host animals in ecological niches (mammals, including humans, and birds), together with their remarkable recombination frequency and high mutation rate, requires tools that can provide quick information on the evolution of these viruses. Furthermore, the complex interplay between viruses and the host transcription machinery, which viruses exploit for their own benefit together with viral RNA replication, seems of fundamental importance. New evidence has confirmed that many positive- and negative-sense single-stranded RNA viruses, whose primary replication site is cytoplasmic, are able to sequester nuclear proteins or TFs to facilitate the replication process and alter the function of host cells. One of the mechanisms by which viruses achieve this result is the disruption of nucleo-cytoplasmic trafficking, which induces a spatial redistribution of proteins from the nucleus to the cytoplasm. Several viruses (picornaviruses as well as VSV, vesicular stomatitis virus) use this kind of trafficking alteration to redistribute proteins to the cytoplasm, increase their interaction with the Internal Ribosome Entry Site (IRES) and facilitate translation of the viral polyprotein [6]. Some viruses are able to increase the degradation of host mRNA through specific viral proteins. These proteins drive endonucleolytic degradation of cytoplasmic mRNA, inducing widespread down-regulation of host gene expression. This mechanism, known as "host shutdown", allows viruses to rapidly reduce the gene expression of the host cell, to attenuate the immune response and to recover proteins needed for viral replication [8]. It is also well known that many viruses, including SARS coronavirus, alpha- and gammaherpesviruses and influenza A virus, accelerate host mRNA degradation through the use of viral proteins that trigger endonucleolytic cleavage of mRNAs in the cytoplasm.
This process, termed 'host shutoff', induces a widespread down-regulation of host gene expression and allows viruses to rapidly restrict cellular gene expression in order to blunt immune responses and liberate resources for viral replication [8]. Furthermore, it must be considered that, although human cells contain the same genome, they can differ in the expression of genes. Gene expression is known to be subject to multiple effects, both spatial and temporal, and does not depend exclusively on the information contained in the coding sequence of the DNA. It has recently been shown that viruses, by subtracting parts of the machinery of host cells, can induce epigenetic and epitranscriptomic modifications of gene expression. Finally, although all human cells in a given individual contain the same genome, they differ widely in gene expression and, hence, in their biological properties. It is therefore well established that gene expression is not determined simply by the sequence information encoded in genomic DNA, but rather is subject to multiple levels of control. Epigenetic and epitranscriptomic regulation of viral gene expression has only recently begun to emerge. As obligate intracellular parasites, viruses misappropriate parts of the host-cell machinery [9]. On the basis of these observations, we propose the hypothesis that viral infection could induce a subtraction of host Transcription Factors (TFs), leading to a depletion, an effect comparable to haploinsufficiency (a dominant genetic condition in which a single copy of the wild-type allele at a locus, in heterozygous combination with a variant allele, is insufficient to produce the correct quantity of transcript and, therefore, of protein for a standard phenotypic expression).
This means that RNA viruses could use this ability as an additional element of their strategy to usurp cellular gene expression and interfere with target cellular processes [7]. In this competitive scenario, virus versus host, the analysis was performed on the complete genomes of the SARS-CoV, MERS-CoV, HCoV-OC43 and SARS-CoV-2 strains, to identify the occurrence of specific conserved motifs capable of binding human TFs and subsequently predict their possible interplay.

The analysis pipeline was performed using different user-friendly bioinformatics tools, available online free of charge, and consists of four main steps:

1) Analysis of the complete genomes of the SARS-CoV, MERS-CoV, HCoV-OC43 and SARS-CoV-2 strains to discover conserved motifs in the RNA sequences. The analysis of the *.FASTA sequences was performed using MEME (Multiple EM for Motif Elicitation). By default, MEME chooses the width and number of occurrences of each motif automatically in order to minimize the 'E-value' of the motif, the probability of finding an equally well-conserved pattern in random sequences. Only motif widths between 6 and 50 are considered [10, 11].

2) All the obtained motifs were used as queries for Tomtom (http://meme-suite.org/doc/tomtom.html), another tool of the MEME Suite, which compares the newly identified motifs against a database of known motifs (i.e., JASPAR). JASPAR CORE is a curated, non-redundant, open-access collection of experimentally determined TF binding sites. Tomtom searched each query motif against the database of human target motifs (and their reverse complements when applicable), ranked the database motifs and produced an alignment for each significant match. The report for each query was a list of target motifs ranked by p-value; for each match, an E-value and a q-value were also produced.
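A q-value of this kind can be obtained from a set of match p-values with a Benjamini-Hochberg adjustment. The Python sketch below is a generic illustration of that adjustment, not Tomtom's own implementation; the function name `benjamini_hochberg` and the example p-values are hypothetical and chosen only for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input.

    Generic textbook adjustment (assumption: this is NOT Tomtom's code).
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that each q-value is the
    # minimum of p * m / rank over the current rank and all higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values for one query motif matched against five targets.
example_p = [0.001, 0.008, 0.039, 0.041, 0.6]
example_q = benjamini_hochberg(example_p)
for p, q in zip(example_p, example_q):
    print(f"p = {p:.3f}  q = {q:.5f}")
```

A match would then be retained when its q-value falls below a chosen threshold; the threshold itself is a choice of the analyst and is not specified here.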
The q-value is the minimal false discovery rate at which the observed similarity would be considered significant. Tomtom estimated q-values from all the match p-values using the Benjamini and Hochberg method. By default, significance was measured by the q-value of the match. A list of human TFs that contained the common conserved domain was obtained for all motif queries. All the human TFs are reported in Figure 1 [12, 13].

3) Comparison of all the TFs obtained from the MEME output, to select the TFs unique to the SARS-CoV-2 strain.

The original analytical workflow was developed to screen possible conserved motifs in SARS-CoV-2 able to subtract human genome TFs and subsequently modify the host-scale regulatory networks. The in silico analysis identified twenty unique TFs specific for SARS-CoV-2. Using STITCH analysis, fifteen of those TFs were connected in a gene regulatory knowledge

The approach adopted by our in silico analysis allowed us to predict that a reduction of some TFs directly related to diffuse parenchymal lung disease, beside well known roles in Activator of Transcription 4) associated with a reduction of IFN-γ immunity and macrophage activity susceptibility [27, 28, 29]. It is particularly interesting that the modulation of the described TFs could be traced to the highest degree of variability of IL1, IL6, IL10 and TNF-α related to the response against exogenous effects

Interestingly, GATA2 (GATA Binding Protein 2), a transcription factor with an essential role in the proliferation and differentiation of hematopoietic cells, as mentioned earlier, was identified among the TFs. An anomaly in GATA2 has recently been reported as the cause of familial syndromes with autosomal-dominant inheritance featuring severe monocytopenia, NK and B lymphopenia, and absence of dendritic cells.
GATA2 reduction also confers a selective advantage to EVI1 (Ecotropic Viral Integration Site 1) expression, which contributes to an acceleration of leukemogenesis [41, 42]. Another group of TFs seems to be involved in the possible development of human malignancies (see

is associated with a significant risk of aspirin-induced asthma [48]. Therefore, at least in the early stages of the illness, it may be prudent to avoid use of this non-steroidal anti-inflammatory drug (NSAID).

In conclusion, the extended knowledge of the TFs identified by this in silico approach, with apparently deep translational relevance, could provide insight into the molecular aspects of the infection process and the dynamics of changes in the host's cellular system, clarify the possible mechanism of the elevated pulmonary tropism and, finally, help pave the way to the best therapeutic strategies. Even though further in vitro and in vivo analyses will be necessary to support the functional evidence derived from the above observations of our in silico research, it seems reasonable to advise that all patients who have recovered from SARS-CoV-2 infection undergo long-term follow-up for tumor susceptibility, due to the possible oncogenic potential of SARS-CoV-2. The same attention should be paid to the potential pregnancy risks.

Genetic variation is the process, devoid of specific and predetermined purposes, which underlies the differentiation and, therefore, the evolution of viruses. The set of viral mutations that originate as a consequence of different molecular mechanisms allows the virus to adapt to the environment, which acts by selecting the most suitable characteristics [49, 50]. RNA viruses, with their high predisposition to replication errors mediated by RNA polymerase or reverse transcriptase, undergo a high degree of diversification [51].
The introduction of errors in the virus genome, generally around leading or consensus sequences, determines an extremely heterogeneous population structure which, in many cases, follows the concept of "quasispecies": the viral population within a host is composed of a distribution of different mutants in dynamic balance [52, 53]. This population is subject to continuous processes of genetic variation, competition and adaptation that favor subsequent selection. Surface proteins that bind to the cell receptor play the main role in this selection, even though non-structural proteins and regulatory regions of the viral genome can also influence the tropism of the virus for the host cell [54]. Considering this evidence, as well as public health emergencies such as COVID-19 (SARS-CoV-2), it is necessary to find new rapid analysis strategies, such as computational approaches, which, in addition to providing indications on the mechanisms of action of the virus, can also provide evolutionary models and possible models of transmission following the disturbance of the host's cellular network.

Coronaviruses: an overview of their replication and pathogenesis
Genome Composition and Divergence of the Novel Coronavirus (2019-nCoV) Originating in China
The proximal origin of SARS-CoV-2
Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
ACE2 receptor expression and severe acute respiratory syndrome coronavirus infection depend on differentiation of human airway epithelia
The interaction of animal cytoplasmic RNA viruses with the nucleus to facilitate replication
Changes in cellular mRNA stability, splicing, and polyadenylation through HuR protein sequestration by a cytoplasmic RNA virus
Changes in mRNA abundance drive shuttling of RNA binding proteins, linking cytoplasmic RNA degradation to transcription. Elife
Epigenetic and epitranscriptomic regulation of viral replication
The Battle of RNA Synthesis: Virus versus Host
MEME SUITE: tools for motif discovery and searching
DREME: motif discovery in transcription factor ChIP-seq data
Quantifying similarity between motifs
STITCH: interaction networks of chemicals and proteins
GATA2 deficiency in children and adults with severe pulmonary alveolar proteinosis and hematologic disorders
NFAT-regulated cytokine gene expression during tacrolimus therapy early after renal transplantation
Inhibitory effect and transcriptional impact of berberine and evodiamine on human white preadipocyte differentiation
Cooperative regulation of the interferon regulatory factor-1 tumor suppressor protein by core components of the molecular chaperone machinery
NFAT control of immune function: New Frontiers for an Abiding Trooper. F1000Res
CCAAT-enhancer-binding proteins (C/EBP) regulate the tissue specific activity of the CD11c integrin gene promoter through functional interactions with Sp1 proteins
Evaluation of TNF-α, IL-10 and IL-6 Cytokine Production and Their Correlation with Genotype Variants amongst Tuberculosis Patients and Their Household Contacts
Inflammaging and Anti-Inflammaging: The Role of Cytokines in Extreme Longevity
Expression of the human FSHD-linked DUX4 gene induces neurogenesis during differentiation of murine embryonic stem cells
A unique subset of low-risk Wilms tumors is characterized by loss of function of TRIM28 (KAP1), a gene critical in early renal development: A Children's Oncology Group study
Dynamic Pattern of HOXB9 Protein Localization during Oocyte Maturation and Early Embryonic Development in Mammals
The human pregnane X receptor: genomic structure and identification and functional characterization of natural allelic variants
The role of the GATA2 transcription factor in normal and malignant hematopoiesis
Constitutive expression of the AP-1 transcription factors c-jun, junD, junB, and c-fos and the marginal zone B-cell transcription factor Notch2 in splenic marginal zone lymphoma
Modeling the Function of TATA Box Binding Protein in Transcriptional Changes Induced by HIV-1 Tat in Innate Immune Cells and the Effect of Methamphetamine Exposure. Front Immunol
Affinity and competition for TBP are molecular determinants of gene expression noise
Transcriptome profiling reveals the complexity of pirfenidone effects in idiopathic pulmonary fibrosis
Sanjuán R, Domingo-Calap P. Mechanisms of viral mutation. PubMed Central PMCID: PMC5075021
Duffy S. Why are RNA virus mutation rates so damn high?
The quasispecies nature and biological implications of the hepatitis C virus
Viral quasispecies evolution. Microbiol Mol Biol Rev
Virus-Receptor Interactions: The Key to Cellular Invasion
The 2019 novel coronavirus disease (COVID-19) pandemic: A zoonotic prospective
Development of epitope-based peptide vaccine against novel coronavirus 2019 (SARS-COV-2): Immunoinformatics approach
SARS-CoV-2 causing pneumonia-associated respiratory disorder (COVID-19): diagnostic and proposed therapeutic options
Probable Molecular Mechanism of Remdesivir for the Treatment of COVID-19: Need to Know More
A SARS-CoV-2 protein interaction map reveals targets for drug repurposing