key: cord-0825395-88mgoxz6 authors: Luytjes, Willem; Bredenbeek, Peter J.; Noten, Ans F.H.; Horzinek, Marian C.; Spaan, Willy J.M. title: Sequence of mouse hepatitis virus A59 mRNA 2: Indications for RNA recombination between coronaviruses and influenza C virus date: 1988-10-31 journal: Virology DOI: 10.1016/0042-6822(88)90512-0 sha: 8151ddc2f3b7ce323d15f39b1d7030b1d7ef84fb doc_id: 825395 cord_uid: 88mgoxz6

Abstract

The nucleotide sequence of the unique region of coronavirus MHV-A59 mRNA 2 has been determined. Two open reading frames (ORFs) are predicted: ORF1 potentially encodes a protein of 261 amino acids; its amino acid sequence contains elements which indicate nucleotide binding properties. ORF2 predicts a protein of 413 amino acids; it lacks a translation initiation codon and is therefore probably a pseudogene. The amino acid sequence of ORF2 shares 30% homology with the HA1 hemagglutinin sequence of influenza C virus. A short stretch of nucleotides immediately upstream of ORF2 shares 83% homology with MHC class I nucleotide sequences. We discuss the possibility that both similarities are the result of recombination events and present a model for the acquisition and subsequent inactivation of ORF2; the model also applies to MHV-A59-related coronaviruses in which we expect ORF2 to be still functional.

Murine hepatitis virus (MHV) is the most widely studied member of the Coronaviridae. This family of enveloped, single-stranded RNA viruses causes considerable economic loss, since coronavirus infections can severely affect cattle, poultry, and pets. Human coronavirus OC43 causes the common cold in man. Murine coronaviruses are of particular interest because several strains can cause a (chronic) demyelinating disease in rats and mice. For this reason the pathogenesis of MHV infections is studied as an animal model for virus-induced demyelination. MHV-A59 virions contain an infectious RNA genome, about 30 kb in length, associated with a nucleocapsid protein (N).
Two membrane proteins have been identified: the transmembrane glycoprotein E1 and the large surface glycoprotein E2 (Siddell et al., 1982). The MHV-A59 genome is composed of seven different regions (A to G), separated by short, very similar junction sequences (Bredenbeek et al., 1987). The messenger RNAs synthesized during infection are 3'-coterminal, and each extends in the 5' direction to a different junction sequence. This results in a nested set of mRNAs, including the genome, in which each mRNA has a different "unique" region at its 5'-end (Leibowitz et al., 1981; Lai et al., 1981; Spaan et al., 1982). All mRNAs share a leader sequence of about 72 nucleotides (Spaan et al., 1983; Lai et al., 1984). In vitro translated MHV mRNAs encoding the structural proteins N, E1, and E2 and the 14.5K nonstructural protein are functionally monocistronic (Rottier et al., 1981; Siddell, 1983), and sequence analyses have shown that the coding regions are located at the 5'-end of these individual mRNAs (Siddell, 1987). There is one possible exception: sequence analysis of the 5'-end of mRNA 5 (region E) revealed two open reading frames (Skinner et al., 1985; Budzilowicz and Weiss, 1987). Whether both reading frames are used is not known. The coronaviruses studied to date show an identical order of the genes encoding the structural proteins: 5'-E2-E1-N-3'. These genes are highly homologous between coronaviruses. In contrast, the genes encoding the nonstructural proteins differ in structure and number, which is reflected in the number of subgenomic mRNAs synthesized by each coronavirus.
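The nested-set arrangement described above can be made concrete with a small data model. The sketch below is illustrative only: the names `NestedMRNA` and `nested_set` are hypothetical, it assumes 0-based genome coordinates, it ignores the shared leader of about 72 nucleotides, and it uses toy junction positions rather than real MHV-A59 coordinates. Each mRNA runs from its junction to the common 3' end, while its "unique" region runs only to the next junction downstream.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NestedMRNA:
    """One member of a 3'-coterminal nested set (hypothetical model; leader ignored)."""
    name: str
    start: int       # 0-based genome position of the junction defining the 5' end
    end: int         # exclusive 3' end, shared by every mRNA in the set
    unique_end: int  # exclusive end of the unique region (next junction downstream)

    @property
    def length(self) -> int:
        return self.end - self.start

    @property
    def unique_length(self) -> int:
        return self.unique_end - self.start


def nested_set(genome_length: int, junctions: dict[str, int]) -> list[NestedMRNA]:
    """Build the nested set from (hypothetical) junction positions.

    `junctions` maps an mRNA name to the position of its 5' junction;
    the genome-length RNA is the member that starts at position 0.
    """
    ordered = sorted(junctions.items(), key=lambda item: item[1])
    result = []
    for i, (name, start) in enumerate(ordered):
        # The unique region stops where the next (smaller) mRNA begins,
        # or at the shared 3' end for the smallest mRNA.
        unique_end = ordered[i + 1][1] if i + 1 < len(ordered) else genome_length
        result.append(NestedMRNA(name, start, genome_length, unique_end))
    return result


if __name__ == "__main__":
    # Toy coordinates only; these are not MHV-A59 junction positions.
    for m in nested_set(100, {"mRNA 1": 0, "mRNA 2": 60, "mRNA 3": 80}):
        print(m.name, m.length, m.unique_length)
```

Running it prints each toy mRNA's total and unique length; only the unique region distinguishes one mRNA from the next smaller member of the set, which is why sequencing "region B" amounts to sequencing the unique region of mRNA 2.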
In infectious bronchitis virus (IBV), feline infectious peritonitis virus (FIPV), and its close relative transmissible gastroenteritis virus (TGEV), which belong to antigenic clusters different from that of MHV, the largest subgenomic mRNA encodes the peplomer protein E2 (or S) (Binns et al., 1985; Niesters et al., 1986; Rasschaert and Laude, 1987; Jacobs et al., 1987). In contrast, an additional, larger RNA (mRNA 2) has been identified in MHV-infected cells (Weiss and Leibowitz, 1983). In vitro translation of this mRNA yields a 30K-35K protein (Leibowitz et al., 1982; Siddell, 1983). In MHV-JHM-infected cells, small amounts of a 30K protein can be detected (Siddell et al., 1981). However, the size of the unique region of mRNA 2, approximately 2 kb, indicates a larger coding capacity. To study the function of mRNA 2, we have cloned and sequenced region B of MHV-A59. Here we present its primary structure and show that it contains two open reading frames (ORFs). The predicted amino acid sequence of the second ORF is remarkably similar to the HA1 sequence of the hemagglutinin protein of influenza C virus. We discuss the possibility that this ORF has been acquired by a recombination event.

MATERIALS AND METHODS

cDNA synthesis and cloning

An MHV-A59-specific cDNA library was created using random primers on purified genomic RNA. Procedures were identical to those described previously (Luytjes et al., 1987). Full details will be presented elsewhere (P. J. Bredenbeek et al., manuscript in preparation). Recombinant cDNA clones were selected by hybridization (Meinkoth and Wahl, 1984) to oligonucleotide probes specific for the viral mRNAs (P. J. Bredenbeek et al., manuscript in preparation). Plasmid DNA from recombinant clones was prepared according to Birnboim and Doly (1979). Inserts were subcloned into M13 vectors (Messing, 1983).
Selection of M13 subclones specific for the unique region of mRNA 2 was performed by hybridizing phage supernatant to pentamer-primed probes (Feinberg and Vogelstein, 1983; Roberts and Wilson, 1985) derived from previously oligonucleotide-selected cDNA clones. Sequence analysis was essentially done according to Sanger et al. (1977). Computer assembly of sequence data was performed using the Staden program set (Staden, 1986). The predicted amino acid sequences were compared with the National Biomedical Research Foundation (NBRF) Protein Library (release 11) using the FASTP program created by Lipman and Pearson (1985). Additional analysis of similarities was carried out with the DIAGON program of Staden (1982).

We have recently constructed an almost complete random-primed cDNA library of the MHV-A59 genome. A set of oligonucleotides was synthesized based upon the sequence of previously obtained MHV-A59-specific cDNA clones, which had been mapped on the viral mRNAs (P. J. Bredenbeek, manuscript in preparation). OL 4 (specific for mRNA 1), OL 6 (mRNA 2), and OL 7 (mRNA 3; see Luytjes et al., 1987) were used to screen the cDNA library for clones covering region B. Two completely overlapping clones (30 and 96) and several clones with partial overlaps (4D, 35, F71, 95, and 918) were isolated. Clone 96 was digested with Sau3A and subsequently ligated into the BamHI site of M13mp9. The other selected cDNA clones were subcloned using restriction enzymes as indicated in Fig. 1. Each nucleotide of region B was determined on at least two different cDNA clones, and selected regions on three or more cDNA clones.

… of the unique region of mRNA 2

The 3'-end of region B has already been identified at the junction sequence 5'-UAAUCUAAAC-3', which separates it from the peplomer coding sequence (Luytjes et al., 1987). The only other potential junction sequence within the consensus sequence of the region B-specific cDNA clones was found at position -9589 (Fig. 1) from the start of the poly(A) tail of the genome: 5'-AAAUCUAUAC-3' (Fig. 2). Immediately upstream of this sequence an ORF terminates whose primary structure shows a high similarity to the 3'-terminal sequence of the unique region of IBV mRNA F (Boursnell et al., 1987, and data not shown). This strongly suggests that the junction sequence at position -9589 corresponds to the 5'-end of the unique region of mRNA 2.

The consensus nucleotide sequence of region B is 2176 residues long (Fig. 2). It contains two open reading frames. The first open reading frame (ORF1) starts 18 nucleotides downstream from the junction sequence and is 261 amino acids (aa) long. The second ORF (ORF2) starts 903 nucleotides downstream and is 413 aa long. It terminates 23 nucleotides upstream from the junction sequence that separates regions B and C (the peplomer gene). Between ORF1 and ORF2 lies a stretch of 92 nucleotides with several termination codons in each reading frame (see Fig. 2). In ORF1 three potential translation initiation codons can be found. The first AUG is in a strong context (Kozak, 1986) and is therefore most probably used. The coding capacity of ORF1 is 30K, which is in agreement with the products obtained after in vitro translation of mRNA 2. There are no membrane protein sequence characteristics, such as a signal sequence, a transmembrane anchor sequence, or potential N-glycosylation sites. DIAGON comparison (Staden, 1982) of the ORF1 amino acid sequence with available sequences of other coronaviruses did not reveal any similarities. A FASTP similarity search (Lipman and Pearson, 1985) of the NBRF protein library produced alignments to several proteins with nucleotide binding properties (data not shown). Recently, consensus sequence elements have been published for which an involvement in nucleotide binding is proposed (Dever et al., 1987; Fry et al., 1986). Three regions in the ORF1 sequence match these elements (Fig. 3).
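The reading-frame analysis described above (termination codons in every frame, an ORF without an AUG, and a coding capacity quoted in kilodaltons) can be sketched as follows. This is a hypothetical helper, not the Staden or FASTP software used by the authors: `find_orfs` scans the three forward frames for stop-to-stop stretches and reports the first in-frame AUG (or `None`, as for ORF2), and `approx_mass_kda` converts a residue count into an approximate mass using an assumed rule-of-thumb average of 110 Da per amino acid residue.

```python
from typing import List, NamedTuple, Optional

STOP_CODONS = {"UAA", "UAG", "UGA"}
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average residue mass (assumption)


class Orf(NamedTuple):
    frame: int                # 0, 1 or 2
    start: int                # 0-based start of the stop-free stretch
    end: int                  # exclusive; includes the terminating stop codon
    first_aug: Optional[int]  # first in-frame AUG, or None if the stretch has none

    @property
    def codons(self) -> int:
        """Sense codons in the stretch, excluding the stop codon."""
        return (self.end - self.start) // 3 - 1


def find_orfs(seq: str, min_codons: int = 30) -> List[Orf]:
    """Hypothetical stop-to-stop ORF scan on the given (sense) strand."""
    rna = seq.upper().replace("T", "U")
    orfs = []
    for frame in range(3):
        start = frame
        for i in range(frame, len(rna) - 2, 3):
            if rna[i:i + 3] in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    # First AUG in the same frame, if any (ORF2 would give None).
                    aug = next((j for j in range(start, i, 3)
                                if rna[j:j + 3] == "AUG"), None)
                    orfs.append(Orf(frame, start, i + 3, aug))
                start = i + 3  # the next stretch begins after this stop codon
        # A stretch running off the 3' end without a stop codon is not reported.
    return orfs


def approx_mass_kda(n_residues: int) -> float:
    """Very rough unglycosylated mass estimate from residue count."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0
```

With this rule of thumb, `approx_mass_kda(261)` gives about 28.7 kDa for ORF1 and `approx_mass_kda(413)` about 45.4 kDa for ORF2, in line with the 30K and 45K figures quoted in the text. The `min_codons` cutoff is an arbitrary filter for short chance stretches.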
ORF2 does not start with an AUG codon; the first potential initiation codon within ORF2 is found at position 110. Interestingly, in the region upstream of ORF2 an AUG codon (position 879) is found in a favorable context; it precedes a short reading frame that is separated from ORF2 by only one opal termination codon (Fig. 2). This short reading frame is 90% homologous (83% at the nucleotide level) to the N-terminus of the signal sequence of several MHC class I genes (Fig. 4; Schepart et al., 1986). There is no other significant similarity between class I sequences and any MHV sequence. The region overlapping the end of ORF1 and the beginning of ORF2 has been sequenced on three independent cDNA clones. The sequences are identical, excluding the possibility that the presence of the termination codon is a cloning or sequencing artifact. The sequence of ORF2 shows characteristics of a membrane protein: the C-terminal hydrophobic residues (underlined in Fig. 2) could provide a membrane anchor, and 10 potential N-glycosylation sites are present. The most remarkable aspect of the ORF2 sequence emerged from FASTP analysis of the NBRF protein library: the predicted amino acid sequence encoded by ORF2 shows 30% homology with the HA1 sequence of the hemagglutinin protein of influenza C virus (Nakada et al., 1984; Pfeiffer and Compans, 1984). The alignment presented in Fig. 5 shows that several regions are completely identical and that many conservative substitutions (Dayhoff et al., 1983) are present. We could not detect similarities between the predicted ORF2 amino acid sequence and other influenza C (or A or B) virus sequences, nor was there any similarity to available coronavirus sequences.

In this paper we present the primary structure of the unique region of MHV-A59 mRNA 2. Sequence analysis revealed two ORFs. ORF1 has a coding capacity of 30K. In vitro translation of mRNA 2 of MHV-JHM (Siddell, 1983) and MHV-A59 (Leibowitz et al., 1982) yielded a 30K protein.
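The count of 10 potential N-glycosylation sites in ORF2 rests on a sequon rule, and the usual one is Asn-X-Ser/Thr with X not Pro. A minimal regular-expression sketch of that rule follows; `n_glycosylation_sites` is a hypothetical name, it is an assumption that the authors used exactly this definition, and the sketch ignores refinements such as discounting sequons immediately followed by Pro.

```python
import re
from typing import List

# N-X-S/T with X != P; the zero-width lookahead lets overlapping sequons
# (e.g. "NNST") both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(protein: str) -> List[int]:
    """1-based positions of Asn residues that begin an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical peptide, not the ORF2 sequence.
    print(n_glycosylation_sites("MNGSANPTQNIT"))  # [2, 10]
```

Applied to a full ORF2 translation, the length of the returned list would be the count of potential sites. A sequon marks only a potential site; it does not show that the Asn is actually glycosylated.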
Small amounts of a 30K protein have also been detected in MHV-JHM-infected cells (Siddell et al., 1981). This suggests that this protein is encoded by ORF1 of mRNA 2. We assume that translation of ORF1 is initiated at the 5'-proximal AUG, since this codon is in a preferred context (Kozak, 1986). The presence in the ORF1 sequence of three consensus elements with possible nucleotide binding and phosphorylating properties (Dever et al., 1987; Fry et al., 1986) suggests a role for its product in virus replication or in phosphorylation of the nucleocapsid protein (Siddell et al., 1982). Experiments are in progress to establish whether the ORF1 product is essential for MHV, in view of the fact that an mRNA 2 is absent in cells infected with coronaviruses from other antigenic clusters. Unexpected was the presence of a second open reading frame, ORF2, located between ORF1 and the peplomer gene, lacking a translation initiation codon and showing a remarkable amino acid similarity to the HA1 sequence of influenza C virus. The percentage of identity is high enough to rule out convergent evolution (Dayhoff et al., 1983; Doolittle, 1981). We believe that this similarity is the result of a recombination between coronaviruses and influenza C virus. Recent studies have indicated that coronaviruses are indeed capable of recombination. Makino et al. (1986) described homologous recombination between coronaviruses in mixed infections; the stretch of 267 nucleotides that we have found in the MHV-A59 peplomer gene and that is absent in MHV-JHM (Luytjes et al., 1987) could indicate a nonhomologous recombination. A protein that can be assigned to ORF2 has never been detected in MHV-A59-infected cells (Siddell et al., 1982). Since nonfunctional reading frames of RNA viruses show a high rate of mutation (Holland et al., 1982), ORF2 must be either functional or the result of recent genetic changes.
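The three nucleotide-binding consensus elements matched in ORF1 (Fig. 3) can be searched for with simple patterns. The sketch below is an assumption-laden illustration: `find_binding_motifs` is a hypothetical name, and the patterns are the elements as they are commonly summarized from Dever et al. (1987), GXXXXGK, DXXG, and NKXD, with the first element extended by a Ser/Thr in a Walker-A-style form that this sketch assumes rather than takes from the paper. Check the exact definitions against those sources before relying on it. Short patterns such as DXXG match by chance, so a real analysis would also weigh the spacing between elements.

```python
import re
from typing import List, Tuple

# Assumed consensus patterns ("X" = any residue); verify against the sources.
MOTIFS = {
    "GXXXXGK(S/T)": re.compile(r"(?=(G.{4}GK[ST]))"),
    "DXXG": re.compile(r"(?=(D..G))"),
    "NKXD": re.compile(r"(?=(NK.D))"),
}


def find_binding_motifs(protein: str) -> List[Tuple[str, int, str]]:
    """Return (motif name, 1-based start, matched residues), sorted by position."""
    seq = protein.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        # The lookahead keeps overlapping matches; group(1) holds the residues.
        for m in pattern.finditer(seq):
            hits.append((name, m.start() + 1, m.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Hypothetical peptide containing one copy of each element.
    for hit in find_binding_motifs("AGAAAAGKSAADAAGAANKAD"):
        print(hit)
```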
In the first case, possible ways of translating ORF2 would be either internal initiation at AUG codons in suboptimal contexts (which is unlikely) or initiation at an upstream AUG codon at position -33 from the start of ORF2 followed by readthrough of the opal termination codon at position -3. Opal suppression has been reported for RNA viruses (Strauss et al., 1983; March et al., 1987) and can be an important feature of the viral translation strategy. Internal initiation combined with readthrough of an opal termination codon would probably lead to undetectable amounts of protein in infected cells. The number and location of termination codons in the region between ORF1 and ORF2 exclude the possibility of frameshifting. In the second case, ORF2 could have been acquired recently by recombination between MHV and influenza C virus. However, there is considerable evolutionary distance between the two viruses: the nucleotide sequences of ORF2 and the HA1 gene are not similar, and codon usage in the two reading frames is different (data not shown). Therefore, recombination must have taken place between ancestors of these viruses. This means that closely related coronaviruses should exist in which ORF2 is still expressed, and that ORF2 in MHV-A59 must have been inactivated recently by genetic changes. An ORF2 product would range in size from 45K (unglycosylated) to 65K (N-glycosylated), and several coronaviruses containing additional proteins in this range have been reported. MHV-JHM, which shares at least 87% homology with MHV-A59 in the nucleotide sequences from the peplomer gene down to the poly(A) tail (Luytjes et al., 1987), encodes one additional glycoprotein: gp65 (Siddell, 1982). Sequence data indicate that the corresponding gene must be located upstream of the peplomer protein gene. Taguchi et al. (1985, 1986) … and an additional mRNA 2a, intermediate in size between mRNA 2 and mRNA 3. Bovine coronavirus (BCV) shows a strong similarity to MHV-A59 in the nucleocapsid and matrix protein sequences (Lapps et al., 1987), and it contains an additional spike protein, E3, a hemagglutinin (King et al., 1985; Deregt et al., 1987). The size of the hemagglutinin monomer is 65K, and BCV also encodes an mRNA 2a (Keck et al., 1987). The data on these coronaviruses lead us to suggest that ORF2 in MHV-A59 corresponds to the reading frames … (Weiss and Leibowitz, 1983). This could have been the result of an accumulation of recent point mutations. However, the strong similarity at both the amino acid and the nucleotide levels between the region immediately upstream of the opal termination codon (in front of ORF2) and the 5'-end of the coding region of several MHC class I mRNAs indicates that the initiation codon of ORF2 and the junction sequence upstream were lost because of a recent nonhomologous recombination event with MHC mRNA. The suggested homology between ORF2 of MHV-A59 and the BCV E3 gene leads us to propose a model for the relation between several coronaviruses in the antigenic cluster of MHV. Human coronavirus OC43 is closely related to BCV (Lapps and Brian, 1985) and shows sequence similarity to MHV-A59 (Hogue et al., 1984; Weiss, 1983) … (Fig. 6). This model is supported by recent experiments performed in cooperation with Drs. R. Vlasak and P. Palese (Vlasak et al., 1988), which show that BCV and OC43 recognize the same receptor and possess the same esterase activity as has been reported for the influenza C virus hemagglutinin protein (Vlasak et al., 1987). It has been suggested that virus evolution is a modular event, in which viral genomes are the result of the assembly of a set of primitive genes (see Goldbach, 1987).

Fig. 5. … sequence. Identical residues are boxed; substitutions scoring 0 or positive according to Dayhoff et al. (1983) are indicated by colons. Dashes represent gaps which were inserted to maximize similarity. The sequence was taken from Nakada et al. (1986).
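The Fig. 5 alignment marks identical residues and shows gaps as dashes, and a percent-identity figure derived from such an alignment depends on the denominator chosen. The hypothetical helper `alignment_identity` below computes identity both over gap-free columns and over the full alignment length. Which convention underlies the 30% value reported for ORF2 versus HA1 is not stated, so treat either choice as an assumption. Scoring conservative substitutions, as the legend does with the Dayhoff matrix, would additionally require a substitution matrix, which this sketch omits.

```python
from typing import Dict


def alignment_identity(a: str, b: str, gap: str = "-") -> Dict[str, float]:
    """Percent identity of two pre-aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(a.upper(), b.upper()))
    # Columns where neither sequence carries a gap character.
    paired = [(x, y) for x, y in columns if x != gap and y != gap]
    identical = sum(1 for x, y in paired if x == y)
    return {
        "columns": len(columns),
        "gap_columns": len(columns) - len(paired),
        "identical": identical,
        # Identity relative to gap-free columns only.
        "pct_identity_paired": 100.0 * identical / len(paired) if paired else 0.0,
        # Identity relative to the whole alignment, gaps included.
        "pct_identity_alignment": 100.0 * identical / len(columns) if columns else 0.0,
    }


if __name__ == "__main__":
    # Toy alignment, not the ORF2/HA1 alignment of Fig. 5.
    print(alignment_identity("ACDE-FG", "ACDQHF-"))
```

For the toy alignment the two conventions give 80% and about 57%, which illustrates how much the reported figure can shift with the denominator.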
This mechanism could offer an alternative explanation for the relation between MHV and influenza C virus. However, the similarity with MHC RNA and the previously reported extra stretch of nucleotides in the A59 peplomer gene (Luytjes et al., 1987) indicate that coronaviruses are probably capable of nonhomologous recombination during replication. To date, nonhomologous recombination at the RNA level in animal RNA viruses has been reported only for defective interfering RNA (see King et al., 1987). Coronaviruses are the first example of nontumor RNA viruses able to take up genetic material from the host cell directly into their genome. This may be a strong force in generating strains with new host spectra and tissue tropisms and could have important implications for the prevention of coronavirus infections.

REFERENCES

Cloning and sequencing of the gene encoding the spike protein of the coronavirus IBV
A rapid alkaline extraction procedure for screening recombinant plasmid DNA
Completion of the sequence of the genome of the coronavirus avian infectious bronchitis virus
Sequences involved in the replication of coronaviruses
In vitro synthesis of two polypeptides from a nonstructural gene of coronavirus mouse hepatitis virus strain A59
Establishing homologies in protein sequences
cDNA cloning and sequence analysis of the gene encoding the peplomer protein of feline infectious peritonitis virus
Intracellular RNAs of feline infectious peritonitis coronavirus strain 79-1146
Structural proteins of bovine coronavirus and their intracellular processing
GTP-binding domain: Three consensus sequence elements with distinct spacing
Similar amino acid sequences: Chance or common ancestry?
DNA probes by random priming with Klenow synthesis
ATP-binding site of adenylate kinase: Mechanistic implications of its homology with ras-encoded p21, F1-ATPase, and other nucleotide-binding proteins
Genome similarities between plant and animal RNA viruses
Antigenic relationships among proteins of bovine coronavirus, human respiratory coronavirus OC43, and mouse hepatitis coronavirus A59
Rapid evolution of RNA genomes
The nucleotide sequence of the peplomer gene of porcine transmissible gastroenteritis virus (TGEV): Comparison with the sequence of the peplomer protein of feline infectious peritonitis virus (FIPV)
An outbreak of influenza C in a children's home
Temporal regulation of RNA synthesis of bovine coronavirus
Genetic recombination in RNA viruses
Bovine coronavirus hemagglutinin protein
Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes
Characterization of leader RNA sequences on the virion and mRNAs of mouse hepatitis virus, a cytoplasmic RNA virus
Mouse hepatitis virus A59: mRNA structure and genetic localization of the differences from hepatotropic strain MHV-3
Oligonucleotide fingerprints of antigenically related bovine coronavirus and human coronavirus OC43
Sequence analysis of the bovine coronavirus nucleocapsid and matrix protein genes
Cell-free translation of murine coronavirus RNA
The virus-specific intracellular RNA species of two murine coronaviruses: MHV-A59 and MHV-JHM
Rapid and sensitive protein similarity searches
Primary structure of the glycoprotein E2 of coronavirus MHV-A59 and identification of the trypsin cleavage site
High-frequency RNA recombination of murine coronaviruses
Antigenic relationship among the coronaviruses of man and between human and animal coronaviruses
Hybridization of nucleic acids immobilized on solid supports
New M13 vectors for cloning
Regulation of translation of viral mRNAs. In "The Molecular Basis of Viral Reproduction"
The peplomer protein sequence of the M41 strain of coronavirus IBV and its comparison with Beaudette strains
Structure of the influenza C glycoprotein gene as determined from cloned DNA
The predicted primary structure of the peplomer protein E2 of the porcine coronavirus transmissible gastroenteritis virus
DNA probes by random priming with Klenow synthesis
Translation of three mouse hepatitis virus strain A59 subgenomic RNAs in Xenopus laevis oocytes
DNA sequencing with chain terminating inhibitors
The nucleotide sequence and comparative analysis of the H-2DP class I H-2 gene
Coronavirus JHM: Tryptic peptide fingerprinting of virion proteins and intracellular polypeptides
Coronavirus JHM: Coding assignments of subgenomic mRNAs
The organization and expression of coronavirus genomes
Coronavirus JHM: Intracellular protein synthesis
The structure and replication of coronaviruses
Coronavirus MHV-JHM has a sequence arrangement which potentially allows translation of a second, downstream open reading frame
Coronavirus mRNA synthesis involves fusion of non-contiguous sequences
Isolation and identification of virus-specific mRNAs in cells infected with mouse hepatitis virus (MHV-A59)
Sequence relationships between the genome and the intracellular RNA species 1, 3, 6, and 7 of mouse hepatitis virus strain A59
An interactive graphics program for comparing and aligning nucleic acid and amino acid sequences
The current status and portability of our sequence handling software
Sequence coding for the alphavirus nonstructural proteins is interrupted by an opal termination codon
Characterization of a variant virus isolated from neural cell culture after infection of mouse coronavirus JHMV
Characterization of a variant virus selected in rat brains after infection by coronavirus mouse hepatitis virus JHM
The influenza C virus glycoprotein (HE) exhibits receptor-binding (hemagglutinin) and receptor-destroying (esterase) activities
Human and bovine coronaviruses recognize sialic acid containing receptors similar to those of influenza C viruses
The biology and pathogenesis of coronaviruses
Coronaviruses SD and SK share extensive nucleotide homology with murine coronavirus MHV-A59, more than that shared between human and murine coronaviruses
Characterization of murine coronavirus RNA by hybridization with virus-specific cDNA probes

The authors thank Dr. J. A. Lenstra for stimulating discussions, Drs. B. A. M. van der Zeijst and P. Rottier for critical reading of the manuscript, and Dr. A. Maagdenberg of the Duphar B. V. computer facility for setting up the computer programs. Part of the computer analyses was performed using the CAOS-CAMM system of the University of Nijmegen, The Netherlands. W.L. was supported by a grant from Duphar B. V., Weesp, The Netherlands. P.B. was supported by a grant from the Netherlands Foundation for Chemical Research (SON) with financial aid from the Netherlands Organization for the Advancement of Pure Research (ZWO).