key: cord-0785906-mlhtbc3d
authors: Vahed, Majid; Vahed, Mohammad; Sweeney, Aaron; Shirazi, Farshad H; Mirsaeidi, Mehdi
title: Mutation in position of 32 (G>U) of S2M differentiate human SARS-CoV2 from Bat Coronavirus
date: 2020-09-08
journal: bioRxiv
DOI: 10.1101/2020.09.02.280529
sha: 454357d4346f513c6dbd41ad38caf9b0ad68c535
doc_id: 785906
cord_uid: mlhtbc3d

The new Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is a zoonotic pathogen that has rapidly mutated and become transmissible to humans. Little data exist on mutations in SARS-CoV-2 or on the impact of these polymorphisms on its transmission and viral load. In this study, the SARS-CoV-2 genomic sequence was analyzed to identify variants within the cis-regulatory RNA elements of its 3′UTR. A 43-nucleotide genetic element with a highly conserved stem-loop II-like motif (S2M) was identified. The analysis revealed a G>U mutation at position 32 and a G>U/A mutation at position 16 of the S2M sequence in human SARS-CoV-2 models. These polymorphisms appear to make the S2M secondary and tertiary structures of human SARS-CoV-2 models less stable than the S2M structures of bat/pangolin models. The resulting flexibility of the RNA structures could be one of the virus's mechanisms for escaping host defenses, or could facilitate its interaction with host proteins and enzymes. Although the S2M sequence may not be present in all human SARS-CoV-2 models, when present it is always highly conserved. It may therefore be a potential target for the development of vaccines and therapeutic agents.

The emergence of new viral pathogens is a danger to public health (1). Three emerging pathogenic coronaviruses (CoVs), Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV-1), Middle East Respiratory Syndrome Coronavirus (MERS-CoV), and the newly identified SARS-CoV-2, are zoonotic viruses that use bats as their natural reservoir.
They are then transmitted through intermediate hosts before eventually infecting humans. Host selection pressure on SARS-CoV-2 in human models will affect the long-term conservation of mutations that enhance its transmissibility (2). However, the determinants of strong trans-species evolution remain unknown owing to the difficulty of recognizing viral precursors and animal reservoirs (3). A 43-nucleotide genetic element with a highly conserved stem-loop II-like motif (S2M) has been reported in four families of positive-sense single-stranded RNA viruses: Astroviridae, Picornaviridae, Caliciviridae, and Coronaviridae (4). The significance of S2M sequences in viral strains remains to be determined; however, the element appears to be important for viral transmissibility and will likely be found in future emergent coronaviruses (5). Because S2M is a highly conserved component of the coronavirus genome and is absent from the human genome, it could become a potential target for antiviral drug therapy (6). This study analyzed coronavirus genomic sequences isolated from human, bat, and pangolin models to identify variants within the structured cis-regulatory RNA elements of the 3′UTR, including the S2M loop. The SARS-CoV-2 genome isolated from patients on a cruise ship in Japan (NCBI GenBank Accession Number LC528232.1) was used as the reference genome. Functional RNA motifs and elements were identified with RegRNA 2.0 (filtered to human settings) (7). The 43-nucleotide S2M motifs were aligned for bat/pangolin models (Fig. 1(a)) and for human SARS-CoV models (Fig. 1(b)). A G-to-U nucleotide substitution at position 32 and a G-to-U/A nucleotide substitution at position 16 (Fig. 1(b)) were found in human SARS-CoV-2.
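The position-specific comparison described above (G>U at S2M position 32, G>U/A at position 16) amounts to a column-by-column check of an aligned 43-nt window against a reference. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `substitutions_at` is hypothetical, both sequences are assumed to be already aligned to the same length without gaps, DNA `T` is read as RNA `U`, and the toy sequences are placeholders rather than real S2M sequences.

```python
from typing import Dict, List, Optional


def substitutions_at(
    reference: str, query: str, positions: List[int]
) -> Dict[int, Optional[str]]:
    """Report ref>alt substitutions at 1-based motif positions.

    Hypothetical helper: assumes both sequences are pre-aligned to the
    same length (e.g. the 43-nt S2M window). Returns None for a position
    where the query matches the reference.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to equal length")
    # Normalize DNA (T) to RNA (U) so GenBank records compare cleanly.
    ref = reference.upper().replace("T", "U")
    qry = query.upper().replace("T", "U")
    result: Dict[int, Optional[str]] = {}
    for pos in positions:
        r, q = ref[pos - 1], qry[pos - 1]
        result[pos] = None if r == q else f"{r}>{q}"
    return result


# Toy 43-nt sequences with G at positions 16 and 32 (NOT real S2M data).
bat_like = "A" * 15 + "G" + "A" * 15 + "G" + "A" * 11
human_like = bat_like[:15] + "U" + bat_like[16:31] + "T" + bat_like[32:]
print(substitutions_at(bat_like, human_like, [16, 32]))  # {16: 'G>U', 32: 'G>U'}
```

Reading `T` as `U` lets DNA-encoded database records be compared directly with RNA motif notation.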
This profile was used to search all viral sequences in GenBank, using different combinations of the 32 G>U and 16 G>U/A nucleotide substitutions that occur in a conserved region of the 3′UTR known as the S2M motif. RNA sequences were aligned using the 3′UTR (positions 29,543-29,903) as a query in a BLASTn search of the NCBI database for S2M motifs (8, 9). The 3′UTR stem-loop structures were determined with the RNAfold web server (http://rna.tbi.univie.ac.at/forna/). Additionally, sequence and structural information from astrovirus S2M motifs was integrated into the RNAfold design to analyze a potential zoonotic precursor of coronaviruses from a hypothetical infectious sequence (10). The S2M structure (PDB ID: 1XJR) was downloaded from the Protein Data Bank (http://www.rcsb.org/pdb) (6). All structures were visualized with PyMOL (11), and the analysis was performed as described in previous studies (12, 13). RNAComposer was used to model three-dimensional (3D) RNA structures (14, 15). Clustal Omega was used to build guide trees with the mBed algorithm, and ClustalW was used to perform multiple sequence alignments (16). The S2M RNA sequences were compared among members of the coronavirus family (bat and pangolin CoVs, SARS-CoV-1, and SARS-CoV-2) and of the astrovirus family (avian, porcine, bovine, chicken, turkey, ovine, mink, tiger, human, and mamastrovirus). The S2M motifs were screened with miRBase tools to identify potential miRNA binding sites (17). Only alignment positions with defined A/T/G/C residues in 95% of the genomes were considered for nucleotide substitutions (18). For graphical representation, sequence logos of the selected S2M motifs were constructed with WebLogo (19-21). A nucleotide BLAST search of these sequence motifs was performed on the NCBI portal.
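The 95% residue filter and the sequence logos can be illustrated together. The sketch below keeps only alignment columns in which at least 95% of sequences carry a defined A/C/G/U residue (one reading of the criterion taken from (18)) and reports each kept column's information content, 2 − H bits, the quantity a logo's stack height represents. Assumptions: gaps and ambiguity codes count as undefined, `T` is read as `U`, frequencies are computed over defined residues only, and the small-sample correction that WebLogo applies is omitted. The function name `informative_columns` is hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

DEFINED = set("ACGU")


def informative_columns(
    alignment: List[str], min_defined: float = 0.95
) -> List[Tuple[int, float]]:
    """Return (1-based column, information content in bits) for kept columns."""
    seqs = [s.upper().replace("T", "U") for s in alignment]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must be aligned to the same length")

    kept: List[Tuple[int, float]] = []
    for col in range(length):
        residues = [s[col] for s in seqs]
        defined = [r for r in residues if r in DEFINED]
        # Skip columns where gaps/ambiguity codes exceed the allowed share.
        if len(defined) / len(residues) < min_defined:
            continue
        n = len(defined)
        counts = Counter(defined)
        # Shannon entropy over the defined residues, in bits.
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        # Maximum for a 4-letter alphabet is log2(4) = 2 bits.
        kept.append((col + 1, 2.0 - entropy))
    return kept


toy_alignment = ["ACGU", "ACGA", "AC-U", "ACGU"]  # hypothetical, not S2M data
for column, bits in informative_columns(toy_alignment):
    print(column, round(bits, 3))
# prints: 1 2.0 / 2 2.0 / 4 1.189 (column 3 is dropped: only 75% defined)
```

A fully conserved column scores the maximum of 2 bits, which is why conserved S2M positions stand out as tall stacks in a logo.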
S2M-containing sequences were identified for the following sequence: [...] Position 32 of S2M is a critical point of the sequence: it is specific to human coronaviruses and variable in bat/pangolin coronaviruses (Fig. 1(a,c)). A second polymorphism appears in human SARS-CoV-2 models at position 16 (G>U/A) (Fig. 1(d)). An earlier polymorphism was found in SARS-CoV-1 at position 6 (C>U) (Fig. 1(b)). [...] secondary structures shows a non-uniform distribution (Fig. 3). The closer the two curves, the better defined the S2M structures in human and bat/pangolin coronaviruses. In human astroviruses, the centroid structures of the S2M RNA sequences (the structures with minimal base-pair distance) differ significantly from the MFE structures, whereas in other astroviruses they are similar (Fig. S3, S4). 3D structures of the S2M sequences of coronaviruses in human and bat/pangolin models were built from the secondary structures in dot-bracket format (Fig. 4). The mutations at positions 16 and 32 (G>U) affect the tertiary structures and consequently cause conformational changes. The human stem-loop structure is bent to the right compared with the stem-loop structure of the bat/pangolin coronavirus (Fig. 5). These conformational changes within the S2M loop may affect its binding to host proteins and enzymes. ClustalW multiple sequence alignment was used to align the secondary structures. The great majority of the sequences were able to fold into the canonical S2M stem-loop structure. Clusters of S2M sequences are highlighted. The clustering trees of coronavirus [...]

miRBase was used to screen for human miRNAs that could target S2M sequences. Particular attention was paid to miRNAs reported as components of antiviral miRNA-mediated defense [28]. This study identified two potential binding sites within the S2M sequences of bat/pangolin CoVs and SARS-CoV-1: hsa-miR-1304-3p and hsa-miR-1307-3p.
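The miRBase screen itself is not reproduced here, but the basic idea behind a miRNA binding-site search can be sketched with a canonical seed match: find target sites that are perfectly complementary to nucleotides 2-8 of the miRNA. This is a generic heuristic assumed for illustration, not necessarily the criterion miRBase applied. The helper names are hypothetical, and the miRNA and target strings are toy sequences, not hsa-miR-1304-3p, hsa-miR-1307-3p, or any real S2M.

```python
from typing import List

COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA string (T read as U).

    Non-ACGU characters (e.g. N) raise KeyError in this sketch.
    """
    rna = seq.upper().replace("T", "U")
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(
    mirna: str, target: str, seed_start: int = 2, seed_end: int = 8
) -> List[int]:
    """1-based start positions in `target` perfectly complementary to the seed."""
    mir = mirna.upper().replace("T", "U")
    # The target site pairs antiparallel with the seed, so search for the
    # reverse complement of miRNA nucleotides seed_start..seed_end.
    site = reverse_complement_rna(mir[seed_start - 1:seed_end])
    tgt = target.upper().replace("T", "U")
    return [
        i + 1
        for i in range(len(tgt) - len(site) + 1)
        if tgt[i:i + len(site)] == site
    ]


toy_mirna = "UACGGAUCCAAAAAAAAAAAAA"  # hypothetical; seed (nt 2-8) = ACGGAUC
toy_target = "AAAAGAUCCGUAAAA"         # hypothetical; contains site GAUCCGU
print(seed_sites(toy_mirna, toy_target))  # [5]
```

Real target-prediction tools add further criteria (3′ pairing, site context, conservation), so a seed hit on its own is only a candidate site.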
Only one potential binding site was found within the S2M of Australian and Iranian SARS-CoV-2 samples: hsa-miR-1307-3p (Fig. 7). In an analysis of all sequences available up to August 14th, 2020, the 16 G>U/A S2M motif mutation was identified at position 29742 of SARS-CoV-2-2020 in Iran, Australia, Taiwan, Sri Lanka, Bahrain, USA, Georgia, Bangladesh, Norway, Hong Kong, Germany and Turkey (Figs. 8 and 9). The current study found a consistent G>U mutation at position 32 of the 43-nucleotide S2M sequence of SARS-CoV-2. This mutation has not been found in any bat or pangolin CoV strain. We concluded that transmissibility from bats/pangolins to humans was related to this mutation. The [...] of SARS-CoV-2 may promote its viability and infectivity. It is likely that the S2M sequences of (+)ssRNA viruses are still active and will continue to affect the evolution of these viruses [5]. The 16 G>U/A and 32 G>U nucleotide changes in the S2M sequence of SARS-CoV-2 make it a target for only one human miRNA, hsa-miR-1307-3p, whereas the S2M sequences of bat/pangolin coronaviruses and SARS-CoV-1 are targetable by two human miRNAs, hsa-miR-1307-3p and hsa-miR-1304-3p. As such, only one human miRNA is capable of targeting the S2M sequence of SARS-CoV-2 and affecting its viral replication. This study examined S2M mutations that could affect the cis-regulatory elements of the SARS-CoV-2 genome. Although the evolutionary and functional origin of S2M has yet to be determined, its presence across even distantly related viruses suggests that the sequence is important for viral transmission. These findings provide insight into the significance of viral RNA structures and introduce S2M as a potential target for the development of vaccines and therapeutic agents.
The miRNA binding sites within the S2M of SARS-CoV-2.
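The genome coordinate reported for S2M position 16 (29742) can be used to place the other motif positions on the genome, provided the 43-nt window is contiguous and has no insertions or deletions relative to the genome used for numbering. Under that assumption, genome position = 29742 + (motif position − 16), so position 1 maps to 29727, position 32 to 29758, and position 43 to 29769. The text does not say which reference numbering the coordinate 29742 uses, so this is an illustrative sketch only; `motif_to_genome` and the constant names are hypothetical.

```python
S2M_LENGTH = 43
ANCHOR_MOTIF_POS = 16      # S2M position of the G>U/A variant (from the text)
ANCHOR_GENOME_POS = 29742  # genome coordinate reported for that variant


def motif_to_genome(motif_pos: int) -> int:
    """Map a 1-based S2M position to a 1-based genome coordinate.

    Assumes the S2M window is contiguous and ungapped relative to the
    genome numbering in which ANCHOR_GENOME_POS is expressed.
    """
    if not 1 <= motif_pos <= S2M_LENGTH:
        raise ValueError(f"S2M position must be in 1..{S2M_LENGTH}")
    return ANCHOR_GENOME_POS + (motif_pos - ANCHOR_MOTIF_POS)


for pos in (1, 16, 32, S2M_LENGTH):
    print(pos, motif_to_genome(pos))
# prints: 1 29727 / 16 29742 / 32 29758 / 43 29769
```

If a given genome carried an indel inside the window, the anchor-plus-offset rule would no longer hold and the mapping would have to come from the alignment itself.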
Social and environmental risk factors in the emergence of infectious diseases
SARS coronavirus pathogenesis: host innate immune responses and viral antagonism of interferon
Molecular constraints to interspecies transmission of viral pathogens
A mobile genetic element with unknown function found in distantly related viruses
Distribution and Evolutionary History of the Mobile Genetic Element s2m in Coronaviruses
The structure of a rigorously conserved RNA element within the SARS virus genome
An enhanced computational platform for investigating the roles of regulatory RNA and for identifying functional RNA motifs
Basic local alignment search tool
BLAST+: architecture and applications
ViennaRNA Package 2.0
Ligand docking and binding site analysis with PyMOL and Autodock/Vina
The initial stage of structural transformation of Abeta42 peptides from the human and mole rat in the presence of Fe(2+) and Fe(3+): Related to Alzheimer's disease
Simulation Study on Complex Conformations of Abeta42 Peptides on a GM1 Ganglioside-Containing Lipid Membrane
New functionality of RNAComposer: an application to shape the axis of miR160 precursor structure
Automated 3D structure composition for large RNAs
The EMBL-EBI search and sequence analysis tools APIs in 2019
miRBase: tools for microRNA genomics
Mutation landscape of SARS-CoV-2 reveals three mutually exclusive clusters of leading and trailing single nucleotide substitutions
WebLogo: a sequence logo generator
Sequence logos: a new way to display consensus sequences
A tool for detecting bipartite motifs by considering base interdependencies
Nextstrain: real-time tracking of pathogen evolution
disease and diplomacy: GISAID's innovative contribution to global health
Emergence of SARS-CoV-2 through Recombination and Strong Purifying Selection
Screening of feral and wood pigeons for viruses harbouring a conserved mobile viral element: characterization of novel Astroviruses and Picornaviruses

The authors would like to thank Nextstrain for providing a real-time snapshot of evolving SARS-CoV-2. The authors declare no conflict of interest.