key: cord-0914334-v49o44gy
authors: Dutta, Rupam; Buragohain, Lukumoni; Borah, Probodh
title: Analysis of codon usage of Severe Acute Respiratory Syndrome Corona Virus 2 (SARS-CoV-2) and its adaptability in dog
date: 2020-08-07
journal: Virus Res
DOI: 10.1016/j.virusres.2020.198113
sha: 045959b3f4c7e1f631f37cf5454636f1368c0aba
doc_id: 914334
cord_uid: v49o44gy

Severe acute respiratory syndrome corona virus 2 (SARS-CoV-2) is recognized as one of the life-threatening viruses causing the most destructive pandemic in this century. The genesis of this virus is still unknown. To elucidate its molecular evolution and regulation of gene expression, knowledge of codon usage is a prerequisite. In this study, an attempt was made to document the genome-wide codon usage profile and the various factors influencing the codon usage patterns of SARS-CoV-2 in human and dog. The SARS-CoV-2 genome showed a relative abundance of A and U nucleotides, and relative synonymous codon usage analysis revealed that the preferred synonymous codons mostly end with A/U. The analysis of ENc-GC3s, neutrality and parity rule 2 plots indicated that natural selection and other undefined factors dominate the overall codon usage bias in SARS-CoV-2, whereas the impact of mutation pressure is comparatively minor. The codon adaptation index and relative codon deoptimization index of SARS-CoV-2 indicated that human is a more favoured host for adaptation than dog. These results enhance our understanding of the factors involved in the evolution of the novel human SARS-CoV-2 and its adaptability in dog.

Coronaviruses belong to the family Coronaviridae and are the largest enveloped single-stranded RNA viruses, ranging from 26 to 31 kilobases in genome size (Lauber et al., 2012). These viruses infect a wide range of avian and mammalian species, and are responsible for enteric or respiratory infections (Woo et al., 2009). Human coronaviruses, viz.
severe acute respiratory syndrome-related coronavirus (SARS-CoV) and Middle-East respiratory syndrome coronavirus (MERS-CoV), emerged in 2002 and 2012, respectively (Zaki et al., 2012). Both viruses have a zoonotic origin, and the emergence of human infections associated with them has emphasized the need to control coronaviruses that cause disease in animals in close contact with humans (Kin et al., 2016).

A cluster of pneumonia cases of unknown origin was reported from Wuhan, the capital city of Hubei Province, China, in late December 2019. The cases were found to be linked with the Huanan Seafood Market, and the pathogen was thought to have a zoonotic origin (Andersen et al., 2020). The virus that caused the outbreak was identified as a novel, human-infecting coronavirus closely related to bat coronaviruses, pangolin coronaviruses, and SARS-CoV (Han et al., 2020; Perlman et al., 2020). Subsequently, the virus spread globally, causing a pathological condition termed coronavirus disease 2019 (COVID-19), and the pathogen was named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The World Health Organization (WHO) declared the outbreak a Public Health Emergency of International Concern on 30 January 2020 and recognized it as a pandemic on 11 March 2020. By June 2020, approximately 9.7 million cases of SARS-CoV-2 infection had been reported worldwide, with more than 491,960 deaths.

The SARS-CoV-2 genome contains 14 ORFs encoding 27 proteins. The orf1ab and orf1a genes encode the proteins Pp1ab and Pp1a, respectively. The Pp1ab protein contains 15 nsps (nsp1-nsp10 and nsp12-nsp16). The SARS-CoV-2 genome also encodes four structural proteins, namely, the spike (S), envelope (E), membrane (M) and nucleocapsid (N) proteins (Wu et al., 2020).
The S protein is the key protein that mediates attachment of the virus to the host target cell (Cavanagh, 1995); the E protein acts as an ion channel and facilitates virion assembly (Ruch and Machamer, 2012); the M and E proteins play a role in virus assembly and are involved in the biosynthesis of new virus particles (Neuman et al., 2011); and the N protein forms the nucleoprotein complex with the viral RNA (Risco et al., 1996). The 9th ORF of SARS-CoV-2 codes for the N protein and, in a different reading frame, for another unique accessory protein called ORF9b, whose function is not yet known. The N protein is a 46 kDa protein composed of 422 amino acids (Rota et al., 2003). It is a multifunctional protein with distinct functions such as enhancing transcription of the viral genome, associating with the M protein during virion assembly, and disrupting various activities of the host cell by inducing toxicity (McBride et al., 2014). It is also the most conserved and stable of the CoV structural proteins, whereas the S protein undergoes substantial changes during virus infection. The S glycoprotein harbours a furin cleavage site at the boundary between the S1 and S2 subunits, which is processed during biogenesis. Cleavage of the S protein activates it for membrane fusion via extensive irreversible conformational changes and thus initiates the binding of SARS-CoV-2 to the ACE2 receptor and entry into the host system (Walls et al., 2020).

Codon usage bias is an important measure of genome evolution. Factors that could influence the bias in codon usage include mutational pressure, natural selection, G+C content, protein secondary structure, and replication and selective transcription (Butt et al., 2014). Codon usage is a driving force in the evolution of viruses (Sewatanon et al., 2007).
Codon usage bias in RNA viruses is generally low, as in Zaire ebolavirus (Cristina et al., 2015), the N gene of Rabies virus (He et al., 2017) and Equine influenza virus (Kumar et al., 2016). However, the overall codon usage bias of Hepatitis A virus (HAV) is high (Zhang et al., 2011). Investigation of viral gene structure and composition at the codon or nucleotide level is essential to understand the mechanism of the virus-host relationship and the evolution of the virus (Hemert et al., 2016). Viruses that infect humans, but not those that infect other mammals or birds, show a strong resemblance to most mammalian and avian hosts in terms of both amino acid and codon preferences. In groups of viruses that infect humans or other mammals, the highest observed level of adaptation of viral proteins to host codon usage is for those proteins that appear abundantly in the virion. In contrast, proteins that are known to participate in host-specific recognition do not necessarily adapt to their respective hosts (Bahir et al., 2009). The redundancy of the genetic code gives evolution the opportunity to adjust the efficiency and accuracy of protein production while preserving the same amino acid sequence (Stoletzki et al., 2007). Similarity in codon usage pattern between viruses and their hosts may influence viral fitness, evasion of the host's immune system and evolution (Costafreda et al., 2014). Synonymous triplet codons are generally not used randomly, and the main forces that drive this departure from equal usage are natural selection and mutational biases (Musto et al., 2016). Therefore, the study of codon usage in viruses can reveal important information about virus evolution, regulation of gene expression and protein synthesis (Butt et al., 2014).
In addition, codon composition may also influence the robustness of translation and, in turn, the robustness of folding, which is critical to the capsid stability of hepatitis A viruses (Andrea et al., 2019). The aim of this study was to carry out a comprehensive analysis of codon usage and composition of the severe acute respiratory syndrome corona virus 2 (SARS-CoV-2) genome and to ascertain the possible evolutionary determinants of the biases found.

Complete genome sequences of SARS-CoV-2 were obtained from the Virus Resource at the National Centre for Biotechnological Information (https://www.ncbi.nlm.nih.gov/labs/virus); the sequences analysed included NC045512.2. The open reading frames (ORFs) of each genome were concatenated in the following order: ORF1ab + Spike + Envelope + Membrane + Nucleocapsid.

The nucleotide composition of SARS-CoV-2 was analysed at the third nucleotide position of the codons (A3%, G3%, C3% and U3%), and the overall nucleotide compositions AU%, AU3%, GC%, GC12 and GC3 were determined.

The RSCU value of a codon is the ratio of its observed frequency to its expected frequency given that all codons for a particular amino acid are used equally (Bera et al., 2017). The RSCU values were calculated using the method described by Kumar et al. (2016) with the following equation:

$$\mathrm{RSCU}_{ij} = \frac{g_{ij}}{\sum_{j=1}^{n_i} g_{ij}} \times n_i$$

where $g_{ij}$ is the observed number of the $j$th codon for the $i$th amino acid, which has $n_i$ kinds of synonymous codons. Codons with RSCU values < 1.0, = 1.0 and > 1.0 represent negative codon usage bias, no bias and positive codon usage bias, respectively.

The dinucleotide frequencies of SARS-CoV-2, which offer another way of establishing the relation with codon usage bias, were calculated as described by Kumar et al. (2016). The expected dinucleotide values were calculated assuming random association of bases from the observed frequencies of each base for every sequence. The ratio of the observed to the expected dinucleotide frequency is known as the odds ratio.
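As a rough illustration of the two calculations described above, the Python sketch below computes RSCU values for a coding sequence and observed/expected odds ratios for its dinucleotides. It is not the CodonW implementation used in this study: the function names `rscu` and `dinucleotide_odds` are hypothetical, the standard genetic code is assumed, and amino acids with a single codon (Met, Trp) are simply skipped.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code with codons enumerated in UCAG order ('*' marks stop codons).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for a coding sequence given in the RNA or DNA alphabet."""
    seq = cds.upper().replace("T", "U")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    # Group sense codons by the amino acid they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0 or len(family) == 1:
            continue  # amino acid absent, or no synonymous choice (Met, Trp)
        for c in family:
            # observed count divided by the count expected under equal usage
            values[c] = counts[c] * len(family) / total
    return values


def dinucleotide_odds(seq):
    """Return {dinucleotide: observed frequency / (f_x * f_y)} for one sequence."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    base_freq = {b: seq.count(b) / n for b in BASES}
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))
    return {
        x + y: (pairs[x + y] / (n - 1)) / (base_freq[x] * base_freq[y])
        for x in BASES
        for y in BASES
        if base_freq[x] > 0 and base_freq[y] > 0
    }
```

For the toy sequence UUUUUUUUC, `rscu` gives 4/3 for UUU and 2/3 for UUC, i.e. a positive and a negative bias for the two phenylalanine codons. `dinucleotide_odds` returns the odds ratio of every dinucleotide whose constituent bases occur in the sequence.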
The odds ratio was used to designate over-representation (> 1.23) or under-representation (< 0.78) of a dinucleotide in terms of relative abundance compared with a random association of mononucleotides.

The similarity index analysis was performed to assess the influence of host codon usage on the overall codon usage of the virus. Codon usage of the coding sequences of SARS-CoV-2 and its respective hosts (human and dog) was analysed, and the similarity index was calculated, using the method and formula of Zhou et al. (2013).

Wright (1990) introduced the concept of the effective number of codons (ENc) to quantify the bias in synonymous codon usage. ENc values range from 20 to 61. An ENc value of 20 indicates extreme codon usage bias in a gene, meaning that each amino acid is encoded by only one codon despite the availability of synonymous codons. In contrast, an ENc value of 61 indicates no bias in codon usage, meaning uniform use of all synonymous codons. Generally, for a genome or a gene, an ENc value below 35 indicates strongly biased codon usage. An ENc-plot was employed to determine whether the codon usage of SARS-CoV-2 (concatenated ORFs) is shaped mainly by mutational pressure or by selection pressure. The ENc-plot was generated by plotting the ENc values on the y-axis against the GC3s values on the x-axis (the frequency of either guanine or cytosine at the third codon position of the synonymous codons) (Wright, 1990). If the calculated ENc value lies on the expected curve, codon usage is constrained only by mutation bias, while ENc values below the expected curve indicate that other factors such as selection pressure have affected the codon usage bias.

A neutrality plot analysis was done to understand the effect of mutational bias and translation selection on codon usage.
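The expected curve of an ENc-plot is usually drawn from Wright's (1990) approximation ENc = 2 + s + 29/(s² + (1 − s)²), where s is the GC3s fraction. The Python sketch below evaluates that curve and the percentage by which an observed ENc falls short of it. The function names, and the assumption that the percentage reduction is taken relative to the expected value, are illustrative and not taken from this study.

```python
def expected_enc(gc3s):
    """Expected ENc under mutation bias alone, for GC3s given as a fraction in [0, 1]."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def enc_reduction_percent(observed_enc, gc3s):
    """Percentage by which the observed ENc lies below the expected curve (assumed definition)."""
    expected = expected_enc(gc3s)
    return 100.0 * (expected - observed_enc) / expected
```

At s = 0.5 the curve peaks at 60.5, and at s = 0 it falls to 31. A positive value from `enc_reduction_percent` corresponds to a point lying below the expected curve.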
The neutrality plot was constructed with GC12 on the y-axis and GC3 on the x-axis, where GC12 is the average GC content at the first and second codon positions and GC3 is the GC content at the third codon position. A regression line was fitted between GC12 and GC3. The slope of the regression line represents the impact of mutational force (Nasrullah et al., 2015).

To determine the parity rule 2 (PR2) bias, the AT bias [A3/(A3+T3)] was plotted as the ordinate and the GC bias [G3/(G3+C3)] as the abscissa (Wu et al., 2015).

The GRAVY (grand average of hydropathy) value is the sum of the hydropathy values of all amino acids in a sequence divided by the number of residues, and ranges from −2.0 to +2.0 (Kyte and Doolittle, 1982). Positive values indicate a hydrophobic protein, whereas negative values indicate a hydrophilic one. The AROMO value is the frequency of the aromatic amino acids, i.e. phenylalanine, tyrosine and tryptophan, in a given amino acid sequence.

The codon adaptation index (CAI) is a quantitative measure of how frequently a gene uses the codons preferred by highly expressed genes. It reflects the efficiency of translation and is often used to design nucleotide sequences that give the highest level of protein expression for vaccine production (Gustafsson et al., 2012). The value of CAI varies from 0.0 to 1.0; a higher value suggests a greater propensity for gene expression. Codons with higher RSCU values in the reference set yield CAI values closer to 1. In the present study, CAI values were calculated for SARS-CoV-2 using RSCU reference sets for human, bat, dog, cat, pig, horse and cattle.
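To make the CAI definition concrete, the following Python sketch computes it in the usual way, as the geometric mean of the relative adaptiveness of each codon, where relative adaptiveness is the codon's count in the reference set divided by the count of the most used synonymous codon (equivalent to RSCU divided by the maximum RSCU of that amino acid). This is a minimal sketch and not the CAIcal implementation used in this study: the function names, the toy reference table and the choice to skip codons missing from the reference set are all assumptions made for illustration.

```python
import math


def relative_adaptiveness(ref_counts, families):
    """Map each codon to its reference count divided by the top count in its synonymous family."""
    weights = {}
    for family in families:
        top = max(ref_counts.get(codon, 0) for codon in family)
        for codon in family:
            count = ref_counts.get(codon, 0)
            if count > 0:  # codons unseen in the reference set get no weight
                weights[codon] = count / top
    return weights


def cai(cds, weights):
    """Geometric mean of the weights of all scorable codons in a coding sequence."""
    seq = cds.upper().replace("T", "U")
    logs = [
        math.log(weights[seq[i:i + 3]])
        for i in range(0, len(seq) - 2, 3)
        if seq[i:i + 3] in weights
    ]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Toy reference table covering two amino acids only (illustrative numbers, not real host data).
FAMILIES = [["UUU", "UUC"], ["AAA", "AAG"]]
REF_COUNTS = {"UUU": 10, "UUC": 20, "AAA": 5, "AAG": 20}
W = relative_adaptiveness(REF_COUNTS, FAMILIES)
```

With this toy table, a sequence built only from the preferred codons (UUCAAG) scores 1.0, while replacing UUC with UUU lowers the score to the square root of 0.5. A real analysis would use complete synonymous families and a host codon usage table such as those described in the following paragraph.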
The synonymous codon usage data of human, dog, cat, pig, horse and cattle were retrieved from the codon usage database (http://www.kazusa.or.jp/codon/), whereas for bat, the sequence of Pteropus vampyrus (NW_011888782) was retrieved from NCBI (https://www.ncbi.nlm.nih.gov) and the reference codon usage table was prepared using the online program 'Countcodon' (available at: http://www.kazusa.or.jp/codon/countcodon.html).

The relative codon deoptimization index (RCDI) is used to compare the codon usage of genes with that of reference genomes. It gives an indication of the translation rate of viral genes in a host system. An RCDI value close to one indicates similar codon usage by the host and the pathogen, from which greater adaptation to the host can be predicted (Butt et al., 2016). In the present study, RCDI values of SARS-CoV-2 were calculated for human, bat, dog, cat, pig, horse and cattle.

The RSCU and AROMO values were estimated using the CODONW 1.4 program. GRAVY values were calculated using the online tool available at http://www.gravycalculator.de/. CAI and RCDI values were calculated using the online tool available at http://genomes.urv.es/CAIcal/ (Puigbo et al., 2008). tRNA gene data were obtained from the web-based Genomic tRNA Database (GtRNAdb).

The SARS-CoV-2 genome showed a relative abundance of A and U nucleotides compared with G and C nucleotides. The nucleotide compositions of SARS-CoV-2 genes were calculated in order to determine the compositional constraints of its genome (Supplementary Table S1). Of the four nucleotides, the mean percentage of U (32.02%) was the highest, followed by A (29.94%) and G (19.78%), while C (18.25%) showed the lowest mean value. At the third position of the synonymous codons, U3 (43.87%) was the most frequent, followed by A3 (28.13%) and C3 (15.36%), while G3 (12.63%) was the least frequent.
The mean AU and GC compositions were 61.96% and 38.03%, and the mean AU3 and GC3 compositions were 72% and 27.99%, respectively (Supplementary Table S1). Notably, the over-represented codons were A/U-ended and most of the under-represented codons were C/G-ended (Table 1).

Analysis of RSCU values of SARS-CoV-2 and its different hosts uncovered the codon preferences of SARS-CoV-2, human, dog, cat, pig, horse and cattle (Table 1). Comparison of the average RSCU of SARS-CoV-2 with that of its natural (human) and accidental (dog) hosts, along with other animal species, revealed that the codon preferences of SARS-CoV-2 and its hosts (natural, accidental and other) are not similar (Figure 1). The distinct codon preferences of SARS-CoV-2 and its hosts suggested that the virus does not compete with the host tRNA array.

UpU (8.00%) and ApA (7.48%) were the most abundant dinucleotides in the SARS-CoV-2 genome, with odds ratios of 1.04 and 1.07, respectively, while CpC (4.56) and CpG (4.75%) were the least abundant dinucleotides, with the lowest odds ratios. None of the CpG-containing codons was over-represented (RSCU ≤ 1.6) or preferred for its respective amino acid, while among the UpA-containing codons, only UUA (leucine) was over-represented in SARS-CoV-2. The relative abundances of the UpG (1.4) and CpA (1.28) dinucleotides also deviated markedly from the expected value, and these dinucleotides were over-represented compared with the others (Figure 2). None of the UpG-containing codons was over-represented (RSCU ≤ 1.6) or preferred for its respective amino acid. Among the CpA-containing codons, ACA (T) and CCA (P) were over-represented (RSCU ≥ 1.6).

The relative influence of mutational pressure and selection pressure on the codon usage of a gene or genome can be assessed by ENc-GC3s plot analysis.
The ENc-plot revealed that all the points of SARS-CoV-2 lie below the expected curve, indicating that natural selection is the major force in codon usage bias in SARS-CoV-2 sequences (Figure 3). However, the overall reduction of the estimated ENc values of all the SARS-CoV-2 genes considered, relative to the theoretical values, was 8.52%.

A neutrality plot analysis was done to decipher the degree of influence of natural selection and mutation pressure in shaping the codon usage bias in SARS-CoV-2 sequences. A weak positive correlation was observed between GC12 and GC3 (r = 0.3). The slope of the regression line was 0.1488, indicating that the relative influence of mutation pressure was 14.88% and the contribution of natural selection was 85.12% (Figure 4).

For parity analysis, we plotted A3/(A3 + T3) and G3/(G3 + C3) as ordinate and abscissa, respectively (Figure 5). The means of the AT bias [A3/(A3 + T3)] and GC bias [G3/(G3 + C3)] were 0.39 and 0.451, respectively. Because the purines A and G are in the numerators, a bias value below 0.5 suggests a preference for pyrimidine over purine (Zhang et al., 2018). Thus in SARS-CoV-2, T is preferred over A and C is preferred over G.

The frequency of tRNA genes in human cells was also considered: for a single codon, a variable number of isoacceptor tRNAs is present, and this number varies across organisms. Translation selection determines whether most codons preferred by SARS-CoV-2 are recognized by the most abundant isoacceptor tRNAs (Khandia et al., 2019). Of the 18 amino acids encoded by two or more synonymous codons, all except leucine, isoleucine, valine and proline used non-optimal codon-anticodon base pairs (Table 2).
The average codon adaptation index (CAI) value for all five genes was highest in bat (0.817) and human (0.698), followed by dog, cattle, horse, cat and pig, respectively (Figure 6). The cumulative effect of codon biases on gene expression was determined by the relative codon deoptimization index (RCDI).

Correlation among various parameters such as ENc, GC3, CAI, Laa, AROMO and GRAVY was also studied (Table 3). GC3 correlated positively with CAI, GRAVY and AROMO, and negatively with Laa and ENc. The correlation analysis among CAI, GRAVY and AROMO was done to determine the effect of GRAVY and AROMO (indicators of natural selection) on gene expressivity (indicated by CAI). However, no correlation was observed.

A similarity index (SiD) analysis was performed to ascertain the role of different hosts in shaping the codon usage pattern of SARS-CoV-2. The similarity indices revealed that human has more impact (0.117) than dog (0.05) on SARS-CoV-2 codon usage bias.

In the present investigation, we studied the codon bias and codon usage of SARS-CoV-2 by characterizing them with different parameters. The SARS-CoV-2 genomes were found to have a relative abundance of A and U nucleotides and a preference for A/U-ending codons over G/C-ending codons. Similar results were reported in previous studies on SARS-CoV-2 (Dilucca et al.; Tort et al., 2020). It was reported that the N gene of coronavirus has a higher AT% than GC%, with an effective number of codons ranging from 40.43 to 53.85, indicating a slight codon bias (Sheikha et al., 2019). Codon usage in RNA viruses is affected by the relative abundance of dinucleotides (Belalov and Lukashev, 2013). CpG depletion is considered to be a selective force that influences the frequency of codons that contain CpG.
The low relative abundance of CpG may be attributed to the recognition of unmethylated CpG-containing sequences as pathogen signatures by the host's innate defence system, and to methylation of cytosine residues (Li and Zhang, 2014). Six codons containing CpG (CCG, GCG, CGG, UCG, ACG and CGA) were found to be under-represented (RSCU < 0.6). This indicates that selection pressure significantly influences codon usage in SARS-CoV-2. Our observations were in agreement with those of previous studies on equine influenza virus (Kumar et al., 2016) and Nipah virus (Khandia et al., 2019). It was reported that TpA and UpA dinucleotides were also under-represented in the genomes of DNA and RNA viruses (Kumar et al., 2016). The higher susceptibility of UpA to cytoplasmic RNases helps to maintain mRNA turnover within the cell (Beutler et al., 1989). Our results suggested that dinucleotide composition plays a significant role in determining the codon usage patterns in the SARS-CoV-2 genome. This also suggests that selection pressure leading to low UpA frequencies is not directly involved in SARS-CoV-2 codon usage patterns; rather, these patterns are primarily regulated by compositional constraints, since the SARS-CoV-2 genome is rich in A and U nucleotides. This result is consistent with the earlier findings of Khandia et al. (2019), who reported that codon bias is primarily due to the direct effect of dinucleotide bias.

The average GC and GC3 contents of the SARS-CoV-2 genome were 38.03% and 27.99%, respectively. When codon usage is influenced only by the genome's GC3 content, the ENc values lie just above the predicted ENc curve, indicating mutational pressure (He et al., 2016). In the SARS-CoV-2 genome, the ENc values were below the predicted ENc curve, indicating the dominant role of selection pressure. A neutrality plot analysis was performed to determine the role of selection pressure.
The weak positive correlation between GC12 and GC3 and the near-zero slope of the regression line (y = 0.1488x + 38.889, R² = 0.3698) observed in the present study indicated that selection pressure was the dominant factor in shaping the codon usage pattern of SARS-CoV-2. It was also observed that the concatenated CDS of SARS-CoV-2 lay away from the regression line, which further suggested that selection pressure was the major force and mutational pressure the minor force influencing SARS-CoV-2 codon usage.

No association between ENc and GRAVY or between ENc and AROMO was found, suggesting that hydrophobicity and aromaticity do not affect codon usage bias. In addition, no association among CAI, GRAVY and AROMO was observed that would suggest an impact of GRAVY and AROMO on gene expression. The negative correlation between Laa and ENc indicated that the number of amino acids does not have any influence on codon usage bias, which might be due to the effect of natural selection on the synonymous codon usage pattern (Wei et al., 2014).

Analysis of the similarity index showed that the human genome has more effect on SARS-CoV-2 codon usage than that of dog. Similarity index analysis was previously reported for chikungunya virus and Zika virus (Butt et al., 2014; 2016). However, for Nipah virus, higher similarity indices were observed in dog and African green monkey than in the human host (Khandia et al., 2019). Evolutionary analysis suggested that SARS-CoV-2 has higher similarity to bat virus than to the human host (Nasrullah et al., 2015). Codon usage can be shaped by many different selection forces, including certain host factors. It was hypothesized that codon usage in SARS-CoV-2 may be directly correlated with the codon usage of its host (Ji et al., 2020).

Deoptimization analysis is conducted by contrasting the codon usage of a virus with that of its host.
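For readers who wish to see what such a comparison involves, the Python sketch below follows the RCDI definition as commonly stated for the RCDI/eRCDI server: for each codon, the ratio of its relative frequency within its synonymous family in the viral gene to the same relative frequency in the host, weighted by the codon's count in the gene and averaged over all scored codons. The function name, the arguments and the toy host table are hypothetical, and the study itself used the online tool rather than this code.

```python
from collections import Counter


def rcdi(cds, host_counts, families):
    """Relative codon deoptimization index of a coding sequence against a host codon table."""
    seq = cds.upper().replace("T", "U")
    gene_counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    weighted_sum = 0.0
    scored = 0
    for family in families:
        gene_total = sum(gene_counts[c] for c in family)
        host_total = sum(host_counts.get(c, 0) for c in family)
        for codon in family:
            n_i = gene_counts[codon]
            if n_i == 0:
                continue  # codon not used by the gene
            host_n = host_counts.get(codon, 0)
            if host_n == 0:
                raise ValueError("codon %s is absent from the host table" % codon)
            gene_freq = n_i / gene_total      # relative frequency in the gene
            host_freq = host_n / host_total   # relative frequency in the host
            weighted_sum += (gene_freq / host_freq) * n_i
            scored += n_i
    if scored == 0:
        raise ValueError("no scorable codons in sequence")
    return weighted_sum / scored
```

With a toy host that uses UUU and UUC equally, a gene using both codons equally gives an RCDI of 1.0, while a gene using only UUU gives 2.0, illustrating that values further above one reflect a greater departure from host codon usage.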
The RCDI values provide an insight into potential co-evolution of the virus and host genomes. A lower RCDI value indicates that a virus is more adapted to its host. In our study, human showed a lower mean RCDI value (1.61) than dog (1.753), indicating better adaptation of the virus to human than to dog. The lower the RCDI value, the higher the CAI value. A higher RCDI value may indicate gene expression during the latency period or maintenance of a low translation rate to achieve error-proof translation (Puigbo et al., 2010). The higher average CAI value for human than for dog observed in the present study indicated that dog is less susceptible to COVID-19 than human. However, cross-transmission of SARS-CoV-2 between human and dog has not yet been well understood. The present study was conducted to compare SARS-CoV-2 adaptation in human and dog hosts. The findings of this study may be useful to evaluate the potential of other animal species to serve as hosts of the virus. It also highlighted the emerging health hazards to humans from living in close contact with animals, which may serve as carriers of a pandemic virus like SARS-CoV-2 and as a potential source of infection.

SARS-CoV-2 is a recently identified emerging virus causing a serious public health emergency across the globe. There is an urgent need to develop an effective vaccine and to identify possible measures for its control. In this study, we compared humans and dogs as hosts of SARS-CoV-2 on the basis of codon usage patterns. Based on the CAI and RCDI values, SARS-CoV-2 sequences were found to be highly human-adapted. Knowledge of the codon usage pattern of a virus is helpful in optimizing the expression of its proteins. Information on enhanced protein expression would be useful in developing a suitable SARS-CoV-2 vaccine candidate by expressing it in various prokaryotic/eukaryotic systems.
Detailed information on codon usage may also be used to develop effective methods to reduce the synthesis of SARS-CoV-2 proteins during pathogen replication. Moreover, it may be useful to obtain analogous information for other viruses.

The work was carried out in the absence of any commercial or financial relationship which could be viewed as a possible conflict of interest.

Rupam Dutta and Lukumoni Buragohain substantially contributed to the conception, design, analysis, interpretation of data, checking and approving the final version of the manuscript. Probodh Borah helped in writing and finalization of the manuscript.

No human or animal samples were handled in the present study. All the sequences were downloaded from the viral database. Therefore, ethical committee approval is not required.

The proximal origin of SARS-CoV-2
Viral adaptation to host: a proteome-based analysis of codon usage and amino acid preferences
Compositional properties and codon usage of TP73 gene family
Causes and implications of codon usage bias in RNA viruses
Genetic and codon usage bias analyses of polymerase genes of equine influenza virus and its relation to evolution
Genome-wide analysis of codon usage and influencing factors in chikungunya viruses
Evolution of codon usage in Zika virus genomes is host and vector specific
The Coronavirus Surface Glycoprotein
Hepatitis A virus adaptation to cellular shutoff is driven by dynamic adjustments of codon usage and results in the selection of populations with altered capsids
Genome-wide analysis of codon usage bias in Ebolavirus
The Critical Role of Codon Composition on the Translation Efficiency Robustness of the Hepatitis A Virus Capsid
Engineering genes for predictable protein expression
Pangolins Harbor SARS-CoV-2-Related Coronaviruses
Analysis of codon usage patterns in Ginkgo biloba reveals codon usage tendency from A/U-ending to G/C-ending
Codon usage bias in the N gene of rabies virus
Cross-species Transmission of the Newly Identified Coronavirus 2019-nCoV
Analysis of Nipah Virus Codon Usage and Adaptation to Hosts
Comparative molecular epidemiology of two closely related coronaviruses, bovine coronavirus (BCoV) and human coronavirus OC43 (HCoV-OC43), reveals a different evolutionary pattern
Revelation of influencing factors in overall codon usage bias of equine influenza viruses
A simple method for displaying the hydropathic character of a protein
Mesoniviridae: a proposed new family in the order Nidovirales formed by a single species of mosquito-borne viruses
DNA methylation in mammals
The coronavirus nucleocapsid is a multifunctional protein
What we know and what we should know about codon usage
Genomic analysis of codon usage shows influence of mutation pressure, natural selection, and host features on Marburg virus evolution
A structural analysis of M protein in coronavirus assembly and morphology
Another Decade, Another Coronavirus
RCDI/eRCDI: a web-server to estimate codon usage deoptimization
CAIcal: a combined set of tools to assess codon usage adaptation
The transmissible gastroenteritis coronavirus contains a spherical core shell consisting of M and N proteins
The coronavirus E protein: assembly and beyond
Compositional bias and size of
A comprehensive analysis of genome composition and codon usage patterns of emerging coronaviruses
Nucleotide composition of the Zika virus RNA genome and its codon usage
Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein
Analysis of codon usage bias of mitochondrial genome in Bombyx mori and its relation to evolution
Codon usage bias and the evolution of Influenza A viruses. Codon usage biases of Influenza virus
Coronavirus diversity, phylogeny and interspecies jumping
The 'effective number of codons' used in a gene
Analysis of codon usage patterns in herbaceous paeony (Paeonia lactiflora Pall.) based on transcriptome data
Gene characteristics of the complete mitochondrial genomes of Paratoxodera polyacantha and Toxodera hauseri (Mantodea: Toxoderidae)
Analysis of synonymous codon usage in Hepatitis A virus
The distribution of synonymous codon choice in the translation initiation region of dengue virus

The authors acknowledge DBT, Govt of India for providing financial support to the host department.

No new data has been generated in our study. All the data were obtained from the Virus