key: cord-0766588-vvnhktca authors: Toshio, Kamahora; Soe, Lisa H.; Lai, Michael M.C. title: Sequence analysis of nucleocapsid gene and leader RNA of human coronavirus OC43 date: 1989-01-31 journal: Virus Research DOI: 10.1016/0168-1702(89)90048-8 sha: 8850566763d84cf4c4e33972a184e555992e8eb1 doc_id: 766588 cord_uid: vvnhktca

Abstract

The nucleotide sequence of the 3'-end of the genomic RNA of human coronavirus OC43 (HCV-OC43) was determined from cDNA clones of the intracellular virus-specific mRNAs. The nucleotide sequence and the predicted amino acid sequence of the main open reading frame (ORF), which represents the nucleocapsid (N) protein, were highly homologous to those of the Mebus strain of bovine coronavirus (BCV). This ORF predicts a protein of 448 amino acids. Additional smaller ORFs are also present in a different reading frame. We also determined, by primer extension, the leader sequence present at the 5'-end of HCV-OC43 mRNAs. This sequence is highly homologous to that of mouse hepatitis virus, particularly in the 3'-end of the leader sequence, which is postulated to be involved in the unique mechanism of leader-primed transcription. These data suggest that HCV-OC43 and BCV might have diverged from each other fairly recently and that the 3'-end of the leader sequence has significant functional roles.

necrotizing enterocolitis (Resta et al., 1985). Furthermore, coronaviruses have been isolated from the autopsied brains of multiple sclerosis patients (Burks et al., 1980), although the identities of these viruses have been questioned (Weiss, 1983) and the work has not been repeated. HCV can be divided into two antigenically distinct groups. One is represented by HCV-OC43, which shares antigenicity with coronaviruses of some other species, such as bovine coronavirus (BCV) and mouse hepatitis virus (MHV) (Macnaughton et al., 1981).
The other group is represented by HCV-229E, which shares antigenicity with porcine transmissible gastroenteritis virus (TGEV) and canine coronavirus (CCV) (Pedersen et al., 1978). The close relationship between HCV-OC43 and BCV in particular has been demonstrated by oligonucleotide fingerprinting of the genomic RNAs of the two viruses, which showed remarkable similarity (Lapps and Brian, 1985), and by immunoprecipitation of individual viral proteins with specific antisera (Hogue et al., 1984). These studies suggested that HCV-OC43 and BCV may have diverged fairly recently. Coronaviruses as a group contain a single-stranded, positive-sense RNA genome of 6 × 10⁶ to 8 × 10⁶ daltons (Lai and Stohlman, 1978; Lomniczi and Kennedy, 1977). In infected cells, the viral RNA genome is first transcribed into a full-length negative-strand RNA, which in turn is transcribed into positive-sense genomic RNA and six subgenomic mRNAs (Lai et al., 1981). These mRNAs form a 3'-coterminal nested set, in which the sequence of every mRNA is contained within the sequence of the next larger mRNA (Lai et al., 1981). Furthermore, each mRNA contains an approximately 70-nucleotide leader RNA derived from the 5'-end of the genomic RNA (Lai et al., 1982, 1983, 1984; Spaan et al., 1983). The leader RNA is incorporated into the mRNAs by a mechanism of "leader-primed transcription", in which the leader RNA is transcribed independently, dissociates from the template and then binds to the template at the downstream transcriptional start sites (Baric et al., 1983; Makino et al., 1986). This mechanism appears to involve recognition of a stretch of repeat sequence present both at the 3'-end of the leader sequence and at the transcriptional start sites of the template RNA (Shieh et al., 1987; Makino et al., 1988). Relatively little of the molecular biology of human coronaviruses has been studied.
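The nested-set arrangement and leader fusion described above can be illustrated with a toy model. This is a minimal sketch under loud assumptions: the sequences `LEADER`, `JUNCTION` and `GENOME` are hypothetical placeholders (not HCV-OC43 or MHV sequence), leader fusion is reduced to string concatenation at each occurrence of the junction repeat, and the model works directly on the positive-sense sequence rather than on the negative-strand template. It shows only that mRNAs generated this way share the same 5' leader and the same 3'-end, and that each mRNA body is contained in the next larger one.

```python
# Toy model of the 3'-coterminal nested set and leader fusion.
# All sequences are HYPOTHETICAL placeholders, not viral sequence data.
# Simplification: acts on the positive-sense sequence, not the negative strand.

JUNCTION = "UCUAAAC"              # hypothetical repeat shared by leader 3'-end and start sites
LEADER = "AUGGC" + JUNCTION       # hypothetical leader whose 3'-end carries the repeat
GENOME = LEADER + "GCGCGC" + JUNCTION + "AAAAAA" + JUNCTION + "CCCCCC"


def start_sites(genome, motif):
    """Positions immediately downstream of each occurrence of the junction motif."""
    sites = []
    i = genome.find(motif)
    while i != -1:
        sites.append(i + len(motif))
        i = genome.find(motif, i + 1)
    return sites


def leader_fused_mrnas(genome, leader, motif):
    """Join the leader (which already ends in the motif) to the body downstream of each site."""
    return [leader + genome[site:] for site in start_sites(genome, motif)]


def is_nested_set(mrnas, leader):
    """True if, after removing the shared leader, each body is a 3'-suffix of the next larger one."""
    bodies = sorted((m[len(leader):] for m in mrnas), key=len)
    return all(longer.endswith(shorter) for shorter, longer in zip(bodies, bodies[1:]))


mrnas = leader_fused_mrnas(GENOME, LEADER, JUNCTION)
for m in mrnas:
    print(len(m), m)
print("nested set:", is_nested_set(mrnas, LEADER))
```

Because the toy genome begins with the leader, the first product equals the whole genome and plays the role of the genomic-length RNA; the two shorter products stand in for subgenomic mRNAs.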
It has been shown that the sizes and species of the virion RNA and intracellular viral RNAs of at least one human coronavirus, 229E, are comparable to those of other coronaviruses (MacNaughton, 1978; Weiss, 1983). Also, both HCV-OC43 and HCV-229E contain at least three structural proteins: a nucleocapsid (N) protein of 52 kDa, an envelope peplomer (E2) protein of 190 kDa and a matrix (E1) protein of approximately 26 kDa (Schmidt and Kenny, 1982; Hogue et al., 1984). Several additional minor proteins, such as a glycoprotein of 140 kDa, have also been reported (Hogue et al., 1984; Hierholzer et al., 1972). In this report, we studied the sequences of both the 3'- and 5'-ends of the genomic RNA of HCV-OC43. We determined the sequence of the nucleocapsid gene, which shows very strong conservation between HCV-OC43 and BCV. We also found that the leader sequence of HCV-OC43, particularly at its 3'-end, is highly homologous to that of MHV. The HCV-OC43 strain (McIntosh et al., 1967) was originally obtained from Dr. Marion Cooney of the University of Washington, Seattle, and was propagated in a human rectal tumor (HRT) cell line (Tompkins et al., 1974). Virus harvested from the medium of infected cell cultures was purified according to published procedures (Makino et al., 1984), and viral RNA was extracted as described (Kamahora et al., 1979). The purified viral genomic RNA was initially used for cDNA cloning, with oligo(dT) as the primer for reverse transcription. Subsequent cDNA cloning followed essentially the procedures of Gubler and Hoffman (1983), as modified by Shieh et al. (1987). Several cDNA clones specific for HCV-OC43 were obtained; one of these, T16, has a 0.6-kilobase (kb) insert and hybridized to all of the virus-specific subgenomic mRNA species in Northern blot analysis (data not shown).
This result suggests that T16 represents a cDNA clone of the 3'-end of the genomic RNA: because the subgenomic mRNAs of coronaviruses have a 3'-coterminal nested-set structure (Stern and Kennedy, 1980; Lai et al., 1981), only a cDNA probe representing the 3'-end of the genomic RNA could hybridize to all of the subgenomic mRNA species. Additional cDNA clones were obtained using poly(A)-containing intracellular RNA from HCV-OC43-infected HRT cells as the template and oligo(dT) as the primer. The virus-specific clones were identified with nick-translated T16 as the probe, and the positive clones were further tested by Northern blot analysis of intracellular RNA. All of the selected cDNA clones hybridized specifically to HCV-OC43-specific mRNAs, giving patterns similar to those detected with T16 as the probe (data not shown). Thus, these cDNA clones represent sequences from the 3'-end of HCV-OC43 RNA, overlapping at least part of the 3'-most gene, which encodes the N protein. Several representative cDNA clones were sequenced; the sequencing strategy is shown in Fig. 1. Sequencing was carried out by Sanger's dideoxyribonucleotide chain-termination method (Sanger et al., 1977). In some cases, deoxyinosine triphosphate (Bankier and Barrell, 1983) was used in place of deoxyguanosine triphosphate to reduce GC compression. Clone A3 has a poly(A) tract of 19 nucleotides at its 3'-end and represents a nearly full-length cDNA clone of mRNA 7 (about 1700 bases). Clone M6 is 2.4 kb long, and its 3'-terminal 1.6 kb overlaps with clone A3. However, its 5'-terminal 0.8 kb is distinct from the corresponding region of A3. Thus, clone M6 probably represents a cDNA clone of mRNA 6. This interpretation is consistent with the presence in this clone of the consensus intergenic sequence, at the position upstream of the open reading frame (ORF) of mRNA 7 (see below).
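The reasoning used to place T16 at the 3'-end can be checked with a small model. In this hedged sketch, the gene sequences in `GENES` are hypothetical, each mRNA is modeled as the concatenation of all genes from its start to the 3'-end (the nested set, with the leader omitted), and hybridization is reduced to exact substring containment, a strong simplification of Northern blotting. Under these assumptions, a probe from the 3'-most gene detects every mRNA, whereas a probe from an upstream gene detects only the larger mRNAs.

```python
# Hypothetical gene sequences, ordered 5' -> 3'; the last stands in for the 3'-most (N) gene.
GENES = ["GAUCGAUCGA", "CCAGGCCAGG", "UUGACUUGAC", "AGCUAGCUAG"]

# Nested set (leader omitted): mRNA k contains gene k and every gene downstream of it.
MRNAS = ["".join(GENES[k:]) for k in range(len(GENES))]


def detected_by(probe, mrnas):
    """Indices of mRNAs that contain the probe sequence (exact match stands in for hybridization)."""
    return [k for k, mrna in enumerate(mrnas) if probe in mrna]


print(detected_by(GENES[-1], MRNAS))  # 3'-end probe: [0, 1, 2, 3]
print(detected_by(GENES[1], MRNAS))   # upstream-gene probe: [0, 1]
```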
The complete sequence of the 3'-terminal 1.7 kb of HCV-OC43 RNA, covering the entire nucleocapsid gene, was obtained from clones A3 and M6 and is shown in Fig. 2. The A3 and M6 clones are identical from nucleotide 69 (numbered according to the mRNA 7 sequence; see below) to the 3'-end. The entire sequence was translated in all three possible reading frames (Fig. 3). The largest ORF can code for a protein of 448 amino acids. This predicted protein is highly homologous to the corresponding ORF of the BCV Mebus strain (Lapps et al., 1987) (Fig. 2): the two proteins have the same predicted number of amino acids, and there are 44 nucleotide differences and 11 amino acid substitutions between the two strains. This protein is most likely the nucleocapsid protein of the virus. There are two additional, smaller ORFs in the first reading frame, which could potentially encode proteins of 60 and 108 amino acids, respectively. This arrangement differs from that of BCV RNA, in which the corresponding region has a single, larger continuous ORF capable of encoding a protein of 207 amino acids (Lapps et al., 1987). The functional significance of these ORFs is not clear. The 5'-ends of clones A3 and M6 are divergent, as shown by the two branched sequences in Fig. 2. At the point of divergence, both clones contain the sequence TCTAAAT (nucleotides 69-75 of A3), which is very similar to the postulated consensus leader RNA binding site of MHV (Shieh et al., 1987). In MHV, this sequence is present both at the 3'-end of the leader RNA sequence and in the intergenic regions of the genomic and subgenomic RNAs (Shieh et al., 1987). Thus, the 5'-ends of these two clones may represent the leader RNA sequence and part of the mRNA 6 coding sequence, respectively.
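Translating a sequence in three reading frames to locate ORFs, as done for Fig. 3, can be sketched as below. This is an illustrative, assumed implementation, not the authors' software: it scans only the three forward frames (numbered 0 to 2 here), opens an ORF at the first ATG after the previous stop, closes it at the next TAA, TAG or TGA, and ignores ORFs that run off the end of the sequence. The test sequence is a hypothetical toy, not HCV-OC43 sequence. For scale, a 448-codon ORF such as the one described above occupies 448 × 3 = 1344 nucleotides plus a 3-nucleotide stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_aa=1):
    """Return (frame, start, end, n_aa) for ATG-initiated ORFs in the three forward frames.

    start is the 0-based index of the ATG, end is the index just past the stop
    codon, and n_aa counts amino acids (the stop codon is excluded).
    ORFs without an in-frame stop codon before the end of seq are ignored.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA notation
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if n_aa >= min_aa:
                    orfs.append((frame, start, i + 3, n_aa))
                start = None
    return orfs


toy = "ATGGCCAAATAGCATGTTTTGA"  # hypothetical sequence
orfs = find_orfs(toy)
print(orfs)                                 # [(0, 0, 12, 3), (1, 13, 22, 2)]
print(max(orfs, key=lambda orf: orf[3]))    # largest ORF: (0, 0, 12, 3)
```

Opening each ORF at the first ATG after a stop gives the longest possible ORF for that stop codon; internal ATGs are treated as methionine codons rather than new starts.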
To determine the leader sequence of HCV-OC43 mRNA, we performed a primer extension study using a 5'-end ³²P-labeled oligodeoxyribonucleotide (5'-CATCCTTAAAATTTA-3'), complementary to nucleotides 71 to 85 (the underlined region in Fig. 2), as the primer and mRNA 7 as the template. The cDNA product extended with reverse transcriptase was sequenced by the method of Maxam and Gilbert (1980). The sequence determined by this method is identical to the 5'-end sequence of clone A3. Similar primer extension studies were also performed using the same synthetic intergenic-region primer and mRNA 6 as the template. The sequence determined from the primer-extended cDNA product was identical to that determined from clone M6 (the sequence preceded by asterisks in Fig. 2). Thus, the junction where the sequences of clones M6 and A3 diverge represents the leader-mRNA fusion site. The sequence analysis presented in this communication confirms previous reports of a close relationship between HCV-OC43 and BCV, as revealed by serological analysis (Pedersen et al., 1978; Gerna et al., 1981), immunoprecipitation of virion structural proteins (Hogue et al., 1984) and oligonucleotide fingerprinting of genomic RNA (Lapps and Brian, 1985). Of the 448 amino acids predicted for the N protein, only 11 differ between the two viruses, which represents 97.5% homology. This result suggests that these two viruses have only recently diverged from each other. Although the remaining sequences of their genomic RNAs have not been compared, they are likely to be highly homologous as well, since their genomic RNA fingerprints are extremely similar (Lapps and Brian, 1985). Interestingly, these two viruses have different target cell specificities in vitro and infect different animal species. Furthermore, HCV-OC43 causes mainly respiratory illness, whereas BCV mainly affects the gastrointestinal system.
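Two pieces of arithmetic in this paragraph can be checked directly. The sketch below assumes that the primer anneals to mRNA 7 with perfect Watson-Crick pairing along all 15 nucleotides and derives the mRNA 7 sequence (in DNA notation) that this would imply for nucleotides 71 to 85. That derived sequence is an inference from the stated primer, not a sequence reported in the text; as a consistency check, its first five bases should match nucleotides 71 to 75 of the TCTAAAT motif (nucleotides 69-75). The second function recomputes the N protein identity from 11 differences in 448 residues. The function names are illustrative only.

```python
# Complement table; U is accepted and paired with A (output is in DNA notation).
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq):
    """Reverse complement of a nucleotide sequence, returned in DNA notation."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def percent_identity(n_differences, length):
    """Percentage of identical positions, given the number of differing positions."""
    return 100.0 * (length - n_differences) / length


primer = "CATCCTTAAAATTTA"            # 5'->3', as given in the text
target = reverse_complement(primer)   # implied mRNA 7 nt 71-85, 5'->3', assuming perfect pairing
motif = "TCTAAAT"                     # nucleotides 69-75 of clone A3

print(target, len(target))            # TAAATTTTAAGGATG 15
print(target[:5] == motif[2:])        # nt 71-75 are motif positions 3-7: True
print(round(percent_identity(11, 448), 1))  # 97.5
```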
Thus, these two viruses may provide a useful tool for understanding the molecular basis of these biological properties of coronaviruses. An additional difference between the 3'-end sequences of the genomic RNAs of the two viruses is that BCV has a second ORF in a different reading frame from that of the N gene. This ORF has the potential to code for a protein of 207 amino acids (Lapps et al., 1987). In contrast, the corresponding ORF in HCV-OC43 RNA is interrupted by a termination codon, leaving two smaller ORFs potentially capable of coding for proteins of 60 and 108 amino acids, respectively. It is not clear whether these ORFs are functional; it is interesting to note, however, that in both BCV and HCV-OC43 RNA these ORFs have an optimal translation initiation signal according to Kozak's consensus sequence (Kozak, 1984, 1986). It will be interesting to determine whether such proteins are synthesized in HCV-OC43- or BCV-infected cells. Our data also show that the leader sequences of HCV-OC43 and MHV are well conserved (73%) (Fig. 4). The conservation is particularly notable in the 3'-half of the leader sequence, in which 26 of 27 nucleotides match (96% homology). This region has been shown to contain the site at which the leader RNA binds to the template RNA during transcription of coronavirus mRNAs (Shieh et al., 1987; Makino et al., 1988). In contrast, the 5'-end of the leader RNA is less well conserved (60% homology). This conservation pattern is consistent with a functional role for the 3'-end of the leader RNA in coronavirus RNA transcription (Shieh et al., 1987). Notably, only one UCUAA sequence (nucleotides 69-73) is shared between mRNA 6 and mRNA 7, whereas there are three UCUAA repeats in the leader region of mRNA 7 (Fig. 2). The UCUAA sequence has been implicated in leader RNA binding (Makino et al., 1988).
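The percentages and repeat counts in this paragraph rest on two simple operations: counting matched positions between aligned sequences and counting motif occurrences. The sketch below illustrates both under stated assumptions: the sequences are pre-aligned and of equal length (gaps are not handled), the example strings are hypothetical rather than the HCV-OC43 or MHV leaders shown in Fig. 4, and motif occurrences are counted even when they overlap. It also recomputes the 3'-half figure: 26 matches out of 27 positions is about 96.3%, which the text reports as 96%.

```python
def count_motif(seq, motif):
    """Count occurrences of motif in seq, including overlapping ones."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def aligned_identity(a, b):
    """Return (matches, percent identity) for two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, 100.0 * matches / len(a)


hypothetical_leader_3prime = "UCUAAUCUAAUCUAAAC"  # hypothetical, with three UCUAA copies
print(count_motif(hypothetical_leader_3prime, "UCUAA"))  # 3
print(aligned_identity("ACGUACGU", "ACGUACGA"))          # (7, 87.5)
print(round(100 * 26 / 27, 1))                           # 96.3
```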
The presence of multiple UCUAA repeats in the leader RNA predicts that mRNA 7 of HCV-OC43 may be heterogeneous in the number of UCUAA repeats, as has been shown for MHV mRNAs (Makino et al., 1988). When compared with avian infectious bronchitis virus (IBV) (Brown et al., 1986), which belongs to a separate antigenic group of coronaviruses, the leader sequence is less strikingly conserved (52%) (Fig. 4). Establishing the significance of this leader sequence conservation will require analysis of additional coronavirus strains. We have previously shown that different strains of MHV can freely exchange the leader RNA on their subgenomic mRNAs during mixed infection (Makino et al., 1986). Furthermore, different strains of MHV can undergo RNA recombination at very high frequency (Makino et al., 1986; Keck et al., 1988). Leader RNA reassortment and RNA recombination have not been demonstrated between coronaviruses of different species. Our finding that the leader RNAs of HCV-OC43 and MHV are highly homologous, particularly in the 3'-terminal region where the leader RNA binds to the template RNA, suggests that the leader RNAs of HCV-OC43 and MHV might be exchangeable during mixed infection. Furthermore, the sequence homology in the nucleocapsid gene, and possibly in other genes as well, suggests that these two viruses might be able to undergo interspecies RNA recombination. These possibilities are currently being investigated in our laboratory.

References

Lai, M.M.C., Patton, C.D. and Stohlman, S.A. (1982) Further characterization of mouse hepatitis virus mRNAs: presence of common 5'-end nucleotides. J. Virol. 41, 557-565.
Lai, M.M.C., Patton, C.D., Baric, R.S. and Stohlman, S.A. (1983) Presence of leader sequences in the mRNA of mouse hepatitis virus. J. Virol. 46, 1027-1033.
Lai, M.M.C., Baric, R.S., Brayton, P.R. and Stohlman, S.A. (1984)
Shotgun DNA sequencing.
Characterization of replicative intermediate RNA of mouse hepatitis virus: presence of leader RNA sequences on nascent chains.
Cloning and sequencing of 5'-terminal sequences from avian infectious bronchitis virus genomic RNA.
Two coronaviruses isolated from central nervous system tissue of two multiple sclerosis patients.
Antigenic and biological relationships between human coronavirus OC43 and neonatal calf diarrhea coronavirus.
A simple and very efficient method for generating cDNA libraries.
Protein composition of coronavirus OC43.
Antigenic relationships among proteins of bovine coronavirus, human respiratory coronavirus OC43, and mouse hepatitis coronavirus A59.
RNA specific for the transforming component of avian erythroblastosis virus strain R.
RNA recombination of murine coronaviruses: recombination between fusion-positive MHV-A59 and fusion-negative MHV-2.
Compilation and analysis of sequences upstream from the translational start site in eukaryotic mRNAs.
Bifunctional messenger RNAs in eukaryotes.
The RNA of mouse hepatitis virus.
Mouse hepatitis virus A59: messenger RNA structure and genetic localization of the sequence divergence from the hepatotropic strain MHV-3.

We thank Dr. Susan Baker for critical comments and for assistance in preparing figures. We also thank Carol Flores for typing the manuscript.