key: cord-0686994-0qvuzqak
authors: Lopes, Elisson N.; Fonseca, Vagner; Frias, Diego; Tosta, Stephane; Salgado, Álvaro; Assunção Vialle, Ricardo; Paulo Eduardo, Toscano S.; Barreto, Fernanda K.; Ariston de Azevedo, Vasco; Guarino, Michele; Angeletti, Silvia; Ciccozzi, Massimo; Junior Alcantara, Luiz C.; Giovanetti, Marta
title: Betacoronaviruses genome analysis reveals evolution toward specific codons usage: Implications for SARS‐CoV‐2 mitigation strategies
date: 2021-05-24
journal: J Med Virol
DOI: 10.1002/jmv.27056
sha: 5eadb8952f31ccdac3946393c8c40bb9cb087d2e
doc_id: 686994
cord_uid: 0qvuzqak

Since the start of the coronavirus disease 2019 (COVID‐19) pandemic, the severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) has spread rapidly worldwide, becoming one of the major global public health issues of recent centuries. COVID‐19 vaccine rollouts are now under way, carrying the hope of herd immunity once a sufficient proportion of the population has been vaccinated or infected. However, the emergence of SARS‐CoV‐2 variants has raised concerns: as the virus is exposed to environmental selection pressures, it can mutate and evolve, generating variants that may possess enhanced virulence. Codon usage analysis is a strategy to elucidate the evolutionary pressure exerted on the viral genome by different hosts, a possible cause of the emergence of new variants. Therefore, to get a better picture of the SARS‐CoV‐2 codon bias, we first identified the relative codon usage rate of all Betacoronavirus lineages. Subsequently, we correlated putative cognate transfer ribonucleic acids (tRNAs) to reveal how those viruses adapt to hosts in relation to their preferred codon usage. Our analysis revealed seven preferred codons, located in three different open reading frames, which appear to be preferentially used by SARS‐CoV‐2.
In addition, the tRNA adaptation analysis indicates a broad strategy of competition between the virus and mammals as principal hosts, highlighting the importance of reinforcing genomic monitoring to promptly identify any adaptation of the virus to new potential hosts, which appears crucial to preventing and mitigating the pandemic.

Coronaviruses (CoVs) are organized into four genera: Alphacoronavirus and Betacoronavirus, whose natural hosts are bats and rodents, and Deltacoronavirus and Gammacoronavirus, which are more frequently found in avian species.2 After the emergence of severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS),1,3,4 the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the etiological agent of coronavirus disease 2019 (COVID-19), has caused the third major coronavirus outbreak in the last 20 years. COVID-19 may cause symptoms such as fever, cough, and fatigue, as well as severe complications that can lead to death.4 According to the World Health Organization (WHO), as of April 2021 more than 137 million people had been infected and more than 2.9 million had died worldwide.4

SARS-CoV-2 has a natural host, bats, and a secondary one, probably a mammal, which was key to originating the species-jumping mutation needed for human infection.3 Viruses with multiple host species, such as the coronaviruses, evolve to thrive under different host environments and available resources. The virus may therefore be better suited when its codons match its hosts' codon usage.3 The variants of concern (VOCs) appear to share a common feature: viral adaptation to the human host, resulting in variable effects on COVID-19 and complicating attempts to control the pandemic.10,11

We collected all fourteen (14) reference Betacoronavirus sequences from the National Center for Biotechnology Information (NCBI) GenBank. An in-house R script was used to split the sequences into open reading frames (ORFs).

Relative synonymous codon usage (RSCU) is a measure of nonuniform usage of synonymous codons in a sequence, and it has been found to have causes and implications in RNA viruses.12 Higher RSCU values indicate a stronger bias toward a codon at the expense of its synonymous codons. To calculate RSCU, the observed frequency of a codon is divided by the frequency expected if all synonymous codons for the same amino acid were used equally.

We used Euclidean distances to identify putative relationships between the virus sequences and hosts. The Euclidean distance matrix was built from the RSCU values, calculated as described above, of each host and virus, using the following equation:

$$ d(a, b) = \sqrt{\sum_{i} \left( \mathrm{RSCU}_{a,i} - \mathrm{RSCU}_{b,i} \right)^{2}} $$

where $a$ and $b$ are the two genomes being compared (virus or host) and $i$ runs over the codons.

The availability of tRNA was inferred from the hosts' genomes by counting the number of genes that encode each type of tRNA, taking into account the mechanism of tRNA sharing between synonymous codons ending with pyrimidines.14 We compared the hosts' tRNA distribution with the Betacoronavirus RSCU values and then calculated the ratio for each host and virus using the following formula: (Table S2).
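To make the two quantities described above concrete, the following is a minimal Python sketch of an RSCU calculation and of the Euclidean distance between two RSCU profiles. It is not the authors' in-house R script: the function names `rscu` and `euclidean_distance` are hypothetical, the standard genetic code (NCBI translation table 1) is assumed, the input is assumed to be a single in-frame coding sequence, and the sequences in the usage example are toy strings rather than real viral or host data.

```python
from collections import Counter, defaultdict
from math import sqrt

# Standard genetic code (NCBI translation table 1) in TCAG order; '*' = stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Group the 61 sense codons into synonymous families, one per amino acid.
FAMILIES = defaultdict(list)
for _codon, _aa in CODON_TABLE.items():
    if _aa != "*":
        FAMILIES[_aa].append(_codon)


def rscu(seq):
    """Return {codon: RSCU} for an in-frame coding sequence (assumption)."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for codons in FAMILIES.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            # Observed count divided by the count expected under equal
            # usage of all synonymous codons (total / family size).
            values[c] = counts[c] * len(codons) / total if total else 0.0
    return values


def euclidean_distance(rscu_a, rscu_b):
    """Euclidean distance between two RSCU profiles over shared codons."""
    shared = set(rscu_a) & set(rscu_b)
    return sqrt(sum((rscu_a[c] - rscu_b[c]) ** 2 for c in shared))


if __name__ == "__main__":
    toy_virus = "CTGCTGCTGCTC"  # toy sequence, not real data
    toy_host = "CTCCTCCTCCTG"   # toy sequence, not real data
    print(euclidean_distance(rscu(toy_virus), rscu(toy_host)))
```

In this sketch, amino acids absent from the sequence yield an RSCU of 0 for all their codons, and the single-codon families (Met, Trp) are kept; an actual analysis may prefer to exclude them, since their RSCU carries no information about bias.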
In addition, we created a matrix of RSCU values and calculated the Euclidean distance across all Betacoronaviruses; we then noticed a correlation between SARS-CoV-2 and SARS coronavirus (Table S3), which was expected given their similarity, as already described.2 After performing the Betacoronavirus codon analysis, we focused on SARS-CoV-2 and found that codons with the highest RSCU values (Table S2). We then compared SARS-CoV-2 RSCU values with those of mammalian hosts (Figure 1), searching for target codons that appear to be more important for the virus than for humans, to elucidate the adaptive mechanism. Figure 1 shows SARS-CoV-2 RSCU values close to 1 in red and human RSCU values close to 0 in blue. We found seven codons that appear to be used preferentially by the virus compared with the human host, located in three distinct ORFs: CCG, ACG, CTC in ORF10; GGC, TAC in E; and GAC in M.

We searched the hosts' tRNA pools for tRNAs corresponding to the Betacoronavirus codons with higher RSCU values; our goal was to find codons that appear to be more crucial to viral translation than to the hosts. Thus, correlating RSCU results for humans and SARS-CoV-2, we found four of the target codons to be tRNA abundant and three to be tRNA scarce (Table S4). We then used the translational adaptation index (TAI) to measure the adaptation scenario, which may explain how this newly emergent virus was able to adapt to its hosts in relation to their preferred codon usage (Table 1). These data compare the tRNA distribution with codon frequencies (for the TAI of all viruses, see Table S5). Our results point to a high degree of SARS-CoV-2 adaptation, with TAI values over 70% for all mammalian hosts tested, suggesting that other mammalian hosts might also be ideal environments.

Our work brought to light seven codons and three ORFs as preferred in the SARS-CoV-2 selection.
These codons present a nucleotide preference for A- and T-ending codons, which is in line with previous findings.13 Using translational adaptation models, we further inferred the capability of SARS-CoV-2 to survive in mammalian hosts, highlighting the importance of reinforcing genomic monitoring to promptly identify any adaptation of the virus to new potential hosts, which appears crucial to preventing and mitigating the pandemic.

Note: These data present the Euclidean distance between observed values and ideal values for the virus and each host. Abbreviations: SARS-CoV-2, severe acute respiratory syndrome coronavirus 2; TAI, translational adaptation index.

Additional Supporting Information may be found online in the supporting information tab for this article.

Genomic characterization of a novel SARS-CoV-2
Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan
Middle East respiratory syndrome coronavirus: another zoonotic betacoronavirus causing SARS-like disease
SARS-CoV-2 infection in farmed minks, the Netherlands
Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations
Sixteen novel lineages of SARS-CoV-2 in South Africa
Genomic and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus
Phylogenetic relationship of SARS-CoV-2 sequences from Amazonas with emerging Brazilian variants harboring mutations E484K and N501Y in the spike protein
Evolution and epidemic spread of SARS-CoV-2 in Brazil
SARS-CoV-2 B.1.1.7 and B.1.351 spike variants bind human ACE2 with increased affinity
Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes
Genomic and evolutionary comparison between SARS-CoV-2 and other human coronaviruses
Human retrovirus codon usage from tRNA point of view: Therapeutic insights
Human viruses have codon usage biases that match highly expressed proteins in the tissues they infect