key: cord-0733500-qybhe5hm
authors: Woo, Patrick C.Y.; Wong, Beatrice H.L.; Huang, Yi; Lau, Susanna K.P.; Yuen, Kwok-Yung
title: Cytosine deamination and selection of CpG suppressed clones are the two major independent biological forces that shape codon usage bias in coronaviruses
date: 2007-12-20
journal: Virology
DOI: 10.1016/j.virol.2007.08.010
sha: 520b2eaf8d2e9a3473534f56a1241c0d425b7547
doc_id: 733500
cord_uid: qybhe5hm

Using the complete genome sequences of 19 coronaviruses, we analyzed the codon usage bias, dinucleotide relative abundance and cytosine deamination in coronavirus genomes. Of the eight codons that contain CpG, six were markedly suppressed. The mean NNU/NNC ratio of the six amino acids using either NNC or NNU as codon is 3.262, suggesting cytosine deamination. Among the 16 dinucleotides, CpG was most markedly suppressed (mean relative abundance 0.509). No correlation was observed between CpG abundance and mean NNU/NNC ratio. Among the 19 coronaviruses, CoV-HKU1 showed the most extreme codon usage bias and an extremely high NNU/NNC ratio of 8.835. Cytosine deamination and selection of CpG suppressed clones by the immune system are the two major independent biochemical and biological selective forces that shape codon usage bias in coronavirus genomes. The underlying mechanism for the extreme codon usage bias, cytosine deamination and G + C content in CoV-HKU1 warrants further studies.

Codon usage bias is one of the most important indicators of the selective forces that shape genome evolution. In general, codon usage bias may be a result of mutation pressure and/or relative abundance of the corresponding acceptor tRNA molecules.
For human RNA viruses, it has been observed in one study that codon usage bias was related to mutation pressure, G + C content, segmented nature of the genome and the route of transmission of the virus (Jenkins and Holmes, 2003). In other studies, it has been suggested that mutation pressure may result in bias in dinucleotide usage, such as CpG suppression, in small eukaryotic viruses (Karlin et al., 1994; Shackelton et al., 2006). Other factors, such as cytosine deamination, which results in C → U changes, have also been proposed to be responsible for shaping the G + C contents and GC skews of RNA viruses. Recently, it has been observed that codon usage is an important driving force in the evolution of astroviruses and small DNA viruses (Sewatanon et al., 2007; van Hemert et al., 2007). Despite these fragmented observations, no study has integrated the various factors and successfully explained the basis for codon usage bias in viruses.

Coronaviruses are positive-sense, single-stranded RNA (ssRNA) viruses found in a wide range of animals, in which they can cause respiratory, enteric, hepatic and neurological diseases of varying severity. Coronavirus genomes are about 30 kb in size, the largest among RNA viruses. Based on genotypic and serological characterization, coronaviruses were divided into three distinct groups (Brian and Baric, 2005; Lai and Cavanagh, 1997; Ziebuhr, 2004). As a result of the low fidelity of the RNA-dependent RNA polymerases, the mutation rates of RNA virus genomes are high, on the order of 1 per 10,000 nucleotides replicated. Furthermore, the unique mechanism of viral replication has resulted in a high frequency of recombination in coronaviruses (Lai and Cavanagh, 1997; Woo et al., 2006b).
Their tendency for recombination and high mutation rates have made their genomes highly plastic, allowed them to adapt to new hosts and ecological niches, and given them the potential to be good candidates for causing pandemics. These factors have made the study of coronavirus evolution particularly important, both biologically and for practical purposes (Grigoriev, 2004; Gu et al., 2004; Yap et al., 2003). However, the relative importance of the various selective forces that shape the codon usage bias in coronaviruses and their underlying biological and biochemical basis are still poorly understood.

The recent severe acute respiratory syndrome (SARS) epidemic, the discovery of SARS coronavirus (SARS-CoV) and the identification of SARS-CoV-like viruses from Himalayan palm civets and a raccoon dog from wild live markets in China have boosted interest in the discovery of novel coronaviruses in both humans and animals (Marra et al., 2003; Peiris et al., 2003; Rota et al., 2003; Snijder et al., 2003; Woo et al., 2004). For human coronaviruses, in 2004, a novel group 1 human coronavirus, human coronavirus NL63 (HCoV-NL63), was reported (Fouchier et al., 2004; van der Hoek et al., 2004); and in 2005, we described the discovery, complete genome sequence and molecular diversity of another novel group 2 human coronavirus, coronavirus HKU1 (CoV-HKU1) (Lau et al., 2006; Woo et al., 2005a,b,c, 2006b).
As for animal coronaviruses, six group 1 (Poon et al., 2005; Tang et al., 2006; Woo et al., 2006a; Lau et al., 2007), six group 2, including bat SARS coronavirus, sable antelope coronavirus, giraffe coronavirus, and two new subgroups of group 2 coronaviruses (Li et al., 2005; Woo et al., 2006a, 2007), 11 group 3 (Cavanagh et al., 2002; East et al., 2004; Jonassen et al., 2005; Liu et al., 2005; Hasoksuz et al., 2007) coronaviruses, and two unclassified coronaviruses from Asian leopard cats and Chinese ferret badgers (Dong et al., 2007) have recently been described. The increase in the number of coronavirus species with complete genomes available, from 9 in 2003 to 19 in 2007, has provided a golden opportunity to study genome evolution in coronaviruses. In this study, we analyzed the codon usage bias, dinucleotide relative abundance and cytosine deamination in coronavirus genomes, and the codon usage bias in the hosts of the various coronaviruses. The relative importance of the various forces in shaping the codon usage bias in the various coronaviruses and the extreme codon usage bias and cytosine deamination in CoV-HKU1 are also discussed.

The mean (S.D.) effective number of codons (Nc) of the 19 coronaviruses is 45.448 (4.207) (Table 1). The codon usage fractions in the 19 coronavirus genomes are shown in Table 2. For all amino acids, the codon usage patterns of every individual coronavirus species are similar to the general codon usage patterns in coronaviruses. CoV-HKU1, HCoV-NL63, murine hepatitis virus (MHV) and bat coronavirus HKU5 (bat-CoV HKU5) are the four coronaviruses with a relatively large number of codons showing usage fractions outside the mean ± 2 S.D. usage fraction range of the corresponding codons, probably due to their relatively high (MHV and bat-CoV HKU5) or low (CoV-HKU1 and HCoV-NL63) G + C contents (Tables 1 and 2).
To study the possible effect of CpG suppression on codon usage bias, the usage fractions of the eight codons that contain CpG (CCG, GCG, UCG, ACG, CGC, CGG, CGU and CGA) were analyzed. Of these eight codons, six [CCG (mean 0.058), GCG (mean 0.060), UCG (mean 0.038), ACG (mean 0.070), CGG (mean 0.038) and CGA (mean 0.060)] were markedly suppressed. CGC is slightly suppressed (mean 0.122) whereas CGU is over-represented (mean 0.322). To study the possible effect of cytosine deamination on codon usage bias, codons of amino acids that can use C or U in the codons were analyzed. For all amino acids that only use either NNU or NNC as codon (asparagine, histidine, aspartic acid, tyrosine, cysteine and phenylalanine), all NNU are markedly over represented with usage fractions of more than 0.700, whereas the usage fractions of all NNC are less than 0.300. For amino acids that use NNU, NNC or other codons (threonine, isoleucine, proline, leucine, alanine, glycine, valine and serine), the usage fractions of all NNU are at least three times more than those of the corresponding NNC. For leucine, UUA (mean 0.223) is used much more frequently than CUA (mean 0.081), and UUG (mean 0.261) is used much more frequently than CUG (mean 0.072). To study the possible effect of A ↔ G transition on codon usage bias, codons of amino acids that can use A or G in the codons were analyzed. For amino acids that use either NNA or NNG as codons (lysine, glutamine and glutamic acid) and those that use NNA, NNG or other codons but excluding those codons with CpG (arginine, glycine and valine), the usage fractions of NNA are often higher than those of NNG, but the differences between the usage fractions of NNA and NNG are not as marked as those between the usage fractions of NNU and NNC. Among all the 19 coronaviruses, CoV-HKU1 showed the most extreme codon usage bias. CoV-HKU1 is the only coronavirus that showed Nc outside the mean ± 2 S.D. range. 
CoV-HKU1 also possessed the lowest G + C content, highest GC skew, lowest percentages of G and C and highest percentage of U among all coronavirus genomes (Table 1). For the six amino acids that only use either NNU or NNC as codon (asparagine, histidine, aspartic acid, tyrosine, cysteine and phenylalanine), amino acids that use NNU, NNC or other codons (threonine, isoleucine, proline, leucine, alanine, glycine, valine and serine), and for leucine, which uses UNN or CNN as codon, the average (S.D.) ratio of the usage fractions of the codons with U to those with C is 9.66 (2.49) (Table 2). For amino acids that use either NNA or NNG as codons (lysine, glutamine and glutamic acid) and those that use NNA, NNG or other codons but excluding those codons with CpG (arginine, glycine and valine), the average (S.D.) ratio of the usage fractions of the codons with A to those with G is 2.72 (0.57) (Table 2).

The codon usage fractions in the hosts of coronaviruses, including human, mouse, pig, cat and chicken, are shown in Table 3. To study the possible effect of CpG suppression on codon usage bias, the usage fractions of the eight codons that contain CpG (CCG, GCG, UCG, ACG, CGC, CGG, CGU and CGA) were analyzed. Among these eight codons, six (CCG, GCG, UCG, ACG, CGU and CGA) were suppressed, of which five were also suppressed in the coronavirus genomes. To study the possible effect of C ↔ U transition and A ↔ G transition on codon usage bias, codons of amino acids that can use C or U and those of amino acids that can use A or G in the codons were analyzed. No pattern of difference was observed between the use of NNU and NNC and between the use of NNA and NNG.

The relative abundance of the 16 dinucleotides in the 19 coronavirus genomes is shown in Table 4. Among the 16 dinucleotides, the relative abundance of CpG showed the most marked deviation from the "normal range" (mean ± S.D.
= 0.509 ± 0.063, 0.271 less than 0.78), with all 19 genomes showing CpG under-representation. In addition, the relative abundance of UpG and CpA also showed slight deviation from the "normal range" (mean ± S.D. = 1.331 ± 0.057 and 1.257 ± 0.070, respectively, both > 1.23), with all 19 and 13 genomes showing UpG and CpA over-representation, respectively.

The relationship between CpG suppression and cytosine deamination in the 19 coronavirus genomes is shown in Fig. 1. The mean (S.D.) NNU/NNC ratio in the six amino acids that only use either NNC or NNU as the codons is 3.262 (1.785) across the 19 coronavirus genomes. CoV-HKU1 showed an extremely high NNU/NNC ratio of 8.835. No significant correlation was observed between CpG abundance and mean NNU/NNC ratio in the 19 coronavirus genomes (r = −0.339, P = 0.156).

Table 3. Codon usage fractions in different hosts of coronaviruses. Codons with CpG are in red and codons of amino acids that use either NNC or NNU as the codon are in green.

Marked CpG suppression is observed in all coronavirus genomes. The discovery of Toll-like receptors (TLRs) that recognize pathogen-associated molecular patterns and the downstream molecular pathways was one of the biggest advances in the understanding of vertebrate innate immunity in recent years. Among the TLRs that recognize viral components, TLR3, 7, 8 and 9 detect viral nucleic acids (Bowie and Haga, 2005). It has been shown that TLR9 binds to CpG of double-stranded DNA and elicits the downstream inflammatory response, and administration of CpG oligodeoxynucleotides has been shown to protect mice from herpes simplex virus 2 infections (Ashkar et al., 2003; Lund et al., 2003).
Furthermore, it has been shown that CpG is under-represented in the genomes of small DNA viruses, which could be related to their evasion of the host immune systems (Karlin et al., 1994; Shackelton et al., 2006). Although CpG suppression was also observed in RNA viruses, no known TLR has been shown to recognize CpG of ssRNA. However, recently it has been shown that ssRNA can stimulate human CD14+CD11c+ monocytes to produce large amounts of interleukin 12, but this activation of monocytes by CpG oligoribonucleotides was not mediated through TLR3, 7, 8 or 9 (Sugiyama et al., 2005). The results suggested that CpG oligoribonucleotides may stimulate monocytes through a novel mechanism distinct from previously known immunostimulatory nucleic acids. In the present study, we showed that the mean CpG relative abundance in the coronavirus genomes is markedly suppressed (Table 4). This concurs with the results observed in a study on di- and trinucleotide frequencies in nine coronaviruses 10 years ago (Tobler and Ackermann, 1998). The most logical way to avoid CpG is to mutate it to either UpG or CpA. This is in line with the observation that these two dinucleotides are over-represented in the coronavirus genomes, but their deviations from the upper limit of the "normal range" are not as remarkable as that of CpG from the lower limit of the "normal range", as the CpG suppression pressure is shared between UpG and CpA over-representation. Interestingly, only CpG-containing codons in the context of purine-CpG (ACG and GCG), pyrimidine-CpG (UCG and CCG) and CpG-purine (CGA and CGG) are suppressed (Table 2), whereas CpG-pyrimidine codons (CGU and CGC) are not. However, when trinucleotide frequencies were analyzed in the 19 coronavirus genomes, all the eight trinucleotides with CpG were suppressed (Fig. 2).
This indicates that there is probably another force that has led to an increased use of CGU and CGC as codons for arginine, but this force does not act on trinucleotides over the whole genome in general. This force is probably unrelated to the relative abundance of the corresponding tRNA molecules in the hosts of the coronaviruses, as the pattern of bias in the hosts is not the same as that in the coronaviruses.

In addition to CpG suppression, marked cytosine deamination is also observed in all coronavirus genomes. Although deamination of cytosine has been recognized for a few decades as a significant source of spontaneous mutations (Duncan and Miller, 1980), DNA-cytosine deaminases, which are able to attack cytosines in single-stranded DNA, have only been discovered in the last few years (Bransteitter et al., 2003; Sohail et al., 2003). The discovery of the ability of the human cytidine deaminase APOBEC3G to edit human immunodeficiency virus DNA, and subsequently RNA as well, has allowed the speculation that APOBEC-mediated cytosine deamination may contribute to the sequence variation of RNA viruses that replicate without any DNA intermediates (Bishop et al., 2004). GC skew, which reflects cytosine deamination, has been studied in various coronaviruses, and it has been shown that the GC skews of coronavirus genomes become less pronounced in the one third of the genome that encodes the structural proteins (Grigoriev, 2004; Pyrc et al., 2004). In the present study, using the six amino acids that are only encoded by NNU or NNC, hence excluding most other pressures that may affect the relative abundance of cytosine and uracil, we showed that all these NNU and NNC had usage fractions of > 0.700 and < 0.300, respectively (Table 2). In fact, for all codons that encode the same amino acid and with either C or U in any position, the usage fraction of the codon that uses U is invariably higher than the one that uses C in all coronaviruses.
Furthermore, the percentage of C showed a strong inverse relationship with the percentage of U in coronavirus genomes (r = −0.902, P < 0.0001) (Fig. 3). All these suggest that cytosine deamination is an important biochemical force in shaping coronavirus evolution.

Table 4. Relative abundance of the 16 dinucleotides in the 19 coronavirus species with complete genomes available. Numbers > 1.23 and < 0.78 are shown in red and green, respectively.

Cytosine deamination and selection of CpG suppressed clones by the immune system are the two major independent biochemical and biological selective forces that shape codon usage bias in coronavirus genomes. Codon usage bias in coronaviruses is unrelated to the relative abundance of the corresponding tRNA molecules, as the patterns of bias in codon usage fractions in the hosts are not the same as those in the coronaviruses (Tables 2 and 3). Although others have tried to explain variations in codon usage in coronaviruses by compositional constraints (Gu et al., 2004), we think that codon usage bias and nucleotide composition of the coronavirus genomes, which are apparently related to each other, are both results of other biological and biochemical selective forces, rather than nucleotide composition being a cause of codon usage bias. On the other hand, most of the codon usage bias in the coronaviruses can be easily explained by CpG suppression and cytosine deamination (Table 2). For asparagine, isoleucine, histidine, aspartic acid, glycine, valine, tyrosine, cysteine and phenylalanine, NNU are used more frequently than NNC because of cytosine deamination. For lysine, glutamine and glutamic acid, NNA are used slightly more frequently than NNG because of cytosine deamination in the minus strand during RNA replication.
For threonine, ACG is suppressed because of CpG suppression and ACU is used more frequently than ACC because of cytosine deamination. For arginine, CGA and CGG are suppressed because of CpG suppression and CGU is used more frequently than CGC because of cytosine deamination. AGA is used more frequently than AGG and CGA is used more frequently than CGG because of cytosine deamination in the minus strand during RNA replication. For proline, CCG is suppressed because of CpG suppression and CCU is used more frequently than CCC because of cytosine deamination. For leucine, CUU is used more frequently than CUC, UUA is used more frequently than CUA, and UUG is used more frequently than CUG because of cytosine deamination. For alanine, GCG is suppressed because of CpG suppression and GCU is used more frequently than GCC because of cytosine deamination. For serine, UCG is suppressed because of CpG suppression and UCU is used more frequently than UCC while AGU is used more frequently than AGC because of cytosine deamination. In addition to showing that CpG suppression and cytosine deamination are probably the two most important biological/biochemical forces that shape codon usage bias, we also demonstrated that these two forces are independent (Fig. 1), although cytosine deamination and subsequent selection of CpG-suppressed clones by the immune system may be one of the mechanisms that has led to the resultant CpG suppression. Furthermore, we speculate that the species-specific number of CpG-containing codons may not simply be the result of mutation pressure to avoid CpG, but an equilibrium between the immune pressure and the number of CpG-containing codons required to serve biological functions such as maintaining RNA structure stability. Such an additional factor could explain the weak, non-significant correlation between the NNU/NNC ratio and CpG dinucleotide abundance.
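The comparison summarized in Fig. 1 can be sketched in a few lines of Python. This is an illustrative sketch only, not the SPSS analysis used in the study: the function names are hypothetical, the codon counts are assumed to be supplied as a plain dictionary, and the mean NNU/NNC ratio is read here as the average of the six per-amino-acid count ratios, which is one plausible interpretation of the definition.

```python
import math

# U-ending and C-ending codons of the six amino acids encoded only by NNU or NNC.
NNU_NNC_PAIRS = {
    "Asn": ("AAU", "AAC"),
    "His": ("CAU", "CAC"),
    "Asp": ("GAU", "GAC"),
    "Tyr": ("UAU", "UAC"),
    "Cys": ("UGU", "UGC"),
    "Phe": ("UUU", "UUC"),
}


def mean_nnu_nnc_ratio(codon_counts):
    """Average, over the six amino acids, of (NNU count) / (NNC count).

    Assumption: the ratio is taken per amino acid and then averaged.
    """
    ratios = []
    for amino_acid, (nnu, nnc) in NNU_NNC_PAIRS.items():
        c_count = codon_counts.get(nnc, 0)
        if c_count == 0:
            raise ValueError(f"no {nnc} codons observed for {amino_acid}")
        ratios.append(codon_counts.get(nnu, 0) / c_count)
    return sum(ratios) / len(ratios)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

With one CpG relative abundance and one mean NNU/NNC ratio per genome, `pearson_r` applied to the two lists of 19 values would give the kind of coefficient reported in the Results; the P value would need a separate significance test.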
The underlying mechanism for the extreme codon usage bias, cytosine deamination and G + C content in CoV-HKU1 is enigmatic. The contribution of cytosine deamination to genome evolution varies from very low to very high among the 19 coronavirus genomes. For bat-CoV HKU5, SARS-CoV and bat-SARS-CoV, the mean NNU/NNC ratios are less than 1.7 (Fig. 1). Codon usage bias in these coronaviruses is relatively mild (Nc of 53.23, 49.423 and 49.882, respectively; Table 1), and is mainly due to CpG suppression (Table 2). On the other hand, for CoV-HKU1, the mean NNU/NNC ratio is more than 8.8 (Fig. 1), which is likely a result of rapid cytosine deamination. Although the biochemical basis for this extreme cytosine deamination is not known, this is probably the explanation for the extremely strong codon usage bias in CoV-HKU1 (Nc of 35.671) and its G + C content of 32%, the lowest among all coronavirus genomes (Table 1).

One genome sequence of each of the 19 coronavirus species with complete genome sequence available was downloaded from the GenBank database (Table 1). The genomes of the hosts of the coronaviruses, including those of human, mouse, pig, cat and chicken, were also downloaded. Codon usage bias was calculated according to the method described by Wright (1990). Using this method, when only one codon is used for each amino acid, Nc for the virus would be 20, and when all codons are used equally, the Nc for the virus would be 61. The codon usage fraction of a particular codon in a genome is calculated as the ratio of the number of occurrences of that codon to the number of occurrences of the amino acid that the codon and its synonymous codons encode in the protein coding sequence of the genome.
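The two quantities defined above can be illustrated with a minimal Python sketch. It is not the software used in the study: the function names are hypothetical, the input is assumed to be a single in-frame coding sequence, and the Nc calculation follows the commonly cited form of Wright's formula, Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, with two simplifications (amino acids too sparse to estimate are dropped, and a missing degeneracy class raises an error rather than being interpolated).

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated in UCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_codons(cds):
    """Count sense codons in one in-frame coding sequence (DNA or RNA letters)."""
    seq = cds.upper().replace("T", "U")
    counts = Counter()
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        # Stop codons and codons with ambiguous bases are skipped.
        if CODON_TABLE.get(codon, "*") != "*":
            counts[codon] += 1
    return counts


def usage_fractions(counts):
    """Codon count divided by the count of all codons for the same amino acid."""
    per_amino_acid = Counter()
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {codon: n / per_amino_acid[CODON_TABLE[codon]] for codon, n in counts.items()}


def effective_number_of_codons(counts):
    """Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61."""
    synonymous = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            synonymous.setdefault(aa, []).append(counts.get(codon, 0))
    homozygosity = {}  # degeneracy class -> list of F values
    for aa, ns in synonymous.items():
        n = sum(ns)
        if len(ns) == 1 or n < 2:
            continue  # Met and Trp have no synonymous choice; n < 2 gives no estimate
        f = (n * sum((x / n) ** 2 for x in ns) - 1) / (n - 1)
        if f > 0:  # simplification: drop amino acids too sparse to estimate
            homozygosity.setdefault(len(ns), []).append(f)
    nc = 2.0  # contribution of Met and Trp
    for degeneracy, n_amino_acids in ((2, 9), (3, 1), (4, 5), (6, 3)):
        fs = homozygosity.get(degeneracy)
        if not fs:
            raise ValueError(f"no usable amino acids with {degeneracy} synonymous codons")
        nc += n_amino_acids / (sum(fs) / len(fs))
    return min(nc, 61.0)
```

For a whole genome, the counts from each protein coding sequence would be summed before calling `usage_fractions` or `effective_number_of_codons`. The two limiting cases stated above (20 and 61) are reproduced by sequences that use one codon per amino acid and all codons equally, respectively.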
The method for calculating codon usage bias accounting for background nucleotide composition (Nc′) (Novembre, 2002) was not used because it has been proposed to suffer from methodological problems, although those problems do not affect the conclusions drawn using Nc in this study (Fuglsang, 2006).

The relative abundance of the dinucleotides in the coronavirus genomes was assessed using the method described by Karlin and Burge (1995). The odds ratio $\rho_{xy} = f_{xy} / (f_x f_y)$, where $f_x$ denotes the frequency of the nucleotide X and $f_{xy}$ the frequency of the dinucleotide XY, was calculated for each dinucleotide. From data simulations and statistical theory, $\rho_{xy} \le 0.78$ (extreme under-representation) or $\rho_{xy} \ge 1.23$ (extreme over-representation) occurs for sufficiently long (≥ 20 kb) random sequences with probability at most 0.001 for virtually any base composition.

To study possible correlations between CpG suppression and cytosine deamination in coronaviruses, the relative abundance of CpG and the mean ratio of NNU to NNC in the six amino acids (asparagine, histidine, aspartic acid, tyrosine, cysteine and phenylalanine) that only use either NNC or NNU as the codons (NNU/NNC ratio, representing the contribution of cytosine deamination) were calculated for all 19 coronavirus genomes. Analysis of correlation between CpG relative abundance and NNU/NNC ratio was performed using Pearson's correlation (SPSS version 11.0).
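As a rough illustration of the dinucleotide odds ratio described above, the following Python sketch computes the ratio of observed dinucleotide frequency to the product of the two nucleotide frequencies for a single-stranded sequence and applies the 0.78 and 1.23 cut-offs. The function names are hypothetical, and the handling of ambiguous bases (pairs containing them are simply ignored) is an assumption rather than part of the published method.

```python
from collections import Counter

NUCLEOTIDES = "ACGU"


def dinucleotide_odds_ratios(genome):
    """Return {dinucleotide: f_xy / (f_x * f_y)} for one strand (DNA or RNA letters)."""
    seq = genome.upper().replace("T", "U")
    mono = Counter(b for b in seq if b in NUCLEOTIDES)
    # Only adjacent pairs in which both bases are unambiguous are counted.
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in NUCLEOTIDES and seq[i + 1] in NUCLEOTIDES
    )
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    if n_mono == 0 or n_di == 0:
        return {}
    rho = {}
    for x in NUCLEOTIDES:
        for y in NUCLEOTIDES:
            expected = (mono[x] / n_mono) * (mono[y] / n_mono)
            if expected > 0:  # undefined when a nucleotide is absent
                rho[x + y] = (di[x + y] / n_di) / expected
    return rho


def classify(rho_value):
    """Label a ratio using the 0.78 and 1.23 thresholds quoted in the text."""
    if rho_value <= 0.78:
        return "under-represented"
    if rho_value >= 1.23:
        return "over-represented"
    return "normal range"
```

Applied to the mean values reported in the Results, `classify(0.509)` labels CpG as under-represented and `classify(1.331)` labels UpG as over-represented. The thresholds are stated for sequences of at least 20 kb, so the labels are not meaningful for short test strings.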
Engineering the largest RNA virus genome as an infectious bacterial artificial chromosome
Local delivery of CpG oligodeoxynucleotides induces rapid changes in the genital mucosa and inhibits replication, but not entry, of herpes simplex virus type 2
APOBEC-mediated editing of viral RNA
Completion of the sequence of the genome of the coronavirus avian infectious bronchitis virus
The role of Toll-like receptors in the host response to viruses
Activation-induced cytidine deaminase deaminates deoxycytidine on single-stranded DNA but requires the action of RNase
Coronavirus genome structure and replication
Coronaviruses from pheasants (Phasianus colchicus) are genetically closely related to coronaviruses of domestic fowl (infectious bronchitis virus) and turkeys
Comparison of genomic and predicted amino acid sequences of respiratory and enteric bovine coronaviruses isolated from the same animal with fatal shipping pneumonia
Detection of a novel and highly divergent coronavirus from Asian leopard cats and Chinese ferret badgers in southern China
Mutagenic deamination of cytosine residues in DNA
Coronavirus infection of spotted hyenas in the Serengeti ecosystem
A previously undescribed coronavirus associated with respiratory disease in humans
Accounting for background nucleotide composition when measuring codon usage bias: brilliant idea, difficult in practice
Mutational patterns correlate with genome organization in SARS and other coronaviruses
Analysis of synonymous codon usage in SARS coronavirus and other virus in the Nidovirales
Isolation and characterization of viruses related to the SARS coronavirus from animals in southern China
Switching species tropism: an effective way to manipulate the feline coronavirus genome
Biologic, antigenic, and full-length genomic characterization of a bovine-like coronavirus isolated from a giraffe
The extent of codon usage bias in human RNA viruses and its evolutionary origin
Molecular identification and characterization of novel coronaviruses infecting graylag geese (Anser anser), feral pigeons (Columbia livia) and mallards (Anas platyrhynchos)
Dinucleotide relative abundance extremes: a genomic signature
Why is CpG suppressed in the genomes of virtually all small eukaryotic viruses but not in those of large eukaryotic viruses?
Completion of the porcine epidemic diarrhoea coronavirus (PEDV) genome sequence
The molecular biology of coronaviruses
Severe acute respiratory syndrome coronavirus-like virus in Chinese horseshoe bats
Coronavirus HKU1 and other coronavirus infections in Hong Kong
Complete genome sequence of bat coronavirus HKU2 from Chinese horseshoe bats revealed a much smaller spike gene with a different evolutionary lineage from the rest of the genome
Altered pathogenesis of a mutant of the murine coronavirus MHV-A59 is associated with a Q159L amino acid substitution in the spike protein
Bats are natural reservoirs of SARS-like coronaviruses
Isolation of avian infectious bronchitis coronavirus from domestic peafowl (Pavo cristatus) and teal (Anas)
Toll-like receptor 9 mediated recognition of herpes simplex virus-2 by plasmacytoid dendritic cells
Accounting for background nucleotide composition when measuring codon usage bias
Coronavirus as a possible cause of severe acute respiratory syndrome
Identification of a novel coronavirus in bats
Genome structure and transcriptional regulation of human coronavirus NL63
Compositional bias and size of genomes of human DNA viruses
Evolutionary basis of codon usage and nucleotide composition bias in vertebrate DNA viruses
Unique and conserved features of genome and proteome of SARS-coronavirus, an early split-off from the coronavirus group 2 lineage
Human activation-induced cytidine deaminase causes transcription-dependent, strand-biased C to U deaminations
CpG RNA: identification of novel single-stranded RNA that stimulates human CD14+CD11c+ monocytes
Prevalence and genetic diversity of coronaviruses in bats from
Infectious RNA transcribed in vitro from a cDNA copy of the human coronavirus genome cloned in vaccinia virus
Comparison of the di- and trinucleotide frequencies from the genomes of nine different coronaviruses
Identification of a new human coronavirus
Host-related nucleotide composition and codon usage as driving forces in the recent evolution of the Astroviridae
Complete genomic sequence of human coronavirus OC43: molecular clock analysis suggests a relatively recent zoonotic coronavirus transmission event
Evolutionary history of the closely related group 2 coronaviruses: porcine hemagglutinating encephalomyelitis virus, bovine coronavirus, and human coronavirus OC43
Relative rates of non-pneumonic SARS coronavirus infection and SARS coronavirus pneumonia
In silico analysis of ORF1ab in coronavirus HKU1 genome reveals a unique putative cleavage site of coronavirus HKU1 3C-like protease
Characterization and complete genome sequence of a novel coronavirus, coronavirus HKU1, from patients with pneumonia
Clinical and molecular epidemiological features of coronavirus HKU1-associated community-acquired pneumonia
Molecular diversity of coronaviruses in bats
Comparative analysis of 22 coronavirus HKU1 genomes reveals a novel genotype and evidence of natural recombination in coronavirus HKU1
Comparative analysis of 12 genomes of three novel group 2c and group 2d coronaviruses reveals unique group and subgroup features
The effective number of codons used in a gene
Relationship of SARS-CoV to other pathogenic RNA viruses explored by tetranucleotide usage profiling
Complete genomic sequences, a key residue in the spike protein and deletions in nonstructural protein 3b of US strains of the virulent and attenuated coronaviruses, transmissible gastroenteritis virus and porcine respiratory coronavirus
Molecular biology of severe acute respiratory syndrome coronavirus

We are grateful to the generous support of Mr.