key: cord-0719015-mj9ea464 authors: Lapps, William; Hogue, Brenda G.; Brian, David A. title: Sequence analysis of the bovine coronavirus nucleocapsid and matrix protein genes date: 1987-03-31 journal: Virology DOI: 10.1016/0042-6822(87)90312-6 sha: b61a737696c300f44b13e14b311fea3eb594245b doc_id: 719015 cord_uid: mj9ea464 Abstract The 3′ end of the 20-kb genome of the Mebus strain of bovine enteric coronavirus (BCV) was copied into cDNA and cloned into the PstI site of the pUC9 vector. Four clones from the 3′ end of the genome were sequenced either completely or in part to determine the sequence of the first 2451 bases. Within this sequence were identified, in order, a 3′-noncoding region of 291 bases, the gene for a 448-amino acid nucleocapsid protein (N) having a molecular weight of 49,379, and the gene for a 230-amino acid matrix protein (M) having a molecular weight of 26,376. A third large open reading frame is contained entirely within the N gene sequence but is positioned in a different reading frame; it potentially encodes a polypeptide of 207 amino acids having a molecular weight of 23,057. A higher degree of amino acid sequence homology was found between the M proteins of BCV and MHV (87%) than between the N proteins (70%). For the M proteins of BCV and MHV, notable differences were found at the amino terminus, the most probable site of O-glycosylation, where the sequence is N-Met-Ser-Ser-Val-Thr-Thr for BCV and N-Met-Ser-Ser-Thr-Thr for MHV. BCV apparently uses two of its six potential O-glycosylation sites. The bovine enteric coronavirus (BCV) is one cause of severe enteritis in calves and may be responsible for as much as one-quarter of all deaths due to this disease (House, 1978) . Vaccines produced from cell culture-attenuated strains of virus have failed to be completely protective. 
Before attempting to develop vaccines by recombinant DNA that may have improved usefulness, it is imperative that the genes and gene products responsible for inducing protective immunity be thoroughly characterized. Toward this end, and for the purpose of determining the function of individual proteins in coronavirus replication, we have begun to clone and sequence the BCV genome. BCV is known to possess a single-stranded, nonsegmented, polyadenylated RNA genome of approximately 20 kb (Guy and Brian, 1979; Lapps and Brian, 1985). The total number of genes encoded by the genome is not known, but presumably, because of its close antigenic relatedness to the well-characterized mouse hepatitis coronavirus (MHV), BCV will be similar to MHV in the number and arrangement of genes on its genome. One striking dissimilarity between BCV and MHV, however, is the possession by BCV of a fourth major structural protein, the 140-kDa hemagglutinin protein that comprises two disulfide-linked subunits of 65 kDa (Hogue et al., 1984; King and Brian, 1982; King et al., 1985). Questions therefore arise concerning not only its origin, function, and role in inducing protective immunity, but also the location of the hemagglutinin gene on the genome and the resulting divergence from the MHV genome structure. In this paper we describe experiments that begin to examine the BCV genome by cDNA cloning and DNA sequencing. Within the 2451-base sequence at the 3' end we find a gene map that parallels that for MHV. We report the primary structure for the N and M genes and their deduced amino acid sequences. Structural comparisons with other coronavirus N and M sequences are made and some conserved structural domains are identified.

The Mebus strain of bovine coronavirus (BCV) was plaque purified and grown on the human rectal tumor (HRT) cell line as previously described (Hogue et al., 1984; Lapps and Brian, 1985).
Confluent monolayers of cells grown in 150-cm2 flasks were infected with a multiplicity of approximately 1 PFU per cell. After 1.5 hr adsorption at 37°, inoculum was removed and 15 ml of the appropriate medium and radioisotope was added. Viral polypeptides were labeled by adding 400 µCi 3H-labeled essential amino acids (150-200 mCi/mg; ICN) per flask in medium containing 10% normal essential amino acid concentration and 2% fetal calf serum (Sterile Systems, Inc.). Viral glycoproteins were labeled by adding 400 µCi of [3H]glucosamine (5-15 Ci/mmol, ICN) per flask to medium containing 5% fetal calf serum. Virus was harvested and purified by isopycnic sedimentation in continuous sucrose gradients as previously described (Hogue et al., 1984; Lapps and Brian, 1985).

Polyacrylamide gel electrophoresis and immunoblotting

The discontinuous buffer gel system of Laemmli (1970) was used as previously described (Hogue et al., 1984). For examining intracellular proteins, whole cell lysates were prepared by sonication. Cells in 60-mm petri dishes were washed twice with cold phosphate-buffered saline (PBS), scraped into cold PBS, and pelleted by centrifugation at 2000 rpm. The cell pellet was suspended in 100 µl sterile distilled water, sonicated for 10 sec in a bath sonicator, and stored at -80°. For inhibitor studies, tunicamycin (Sigma) was used at a final concentration of 1.2 or 12 µM and monensin (Calbiochem) was used at a final concentration of 1.0 µM. Tunicamycin or monensin was added to cells immediately after virus adsorption and remained on the cells for 24 hr, at which time cell lysates were prepared. For electrophoresis, equal volumes of cell lysate and double-strength sample treatment buffer were mixed and heated at 100° for 5 min. Unit strength sample treatment buffer is 0.125 M Tris-HCl (pH 6.8)-4% sodium dodecyl sulfate-5 M urea. For immunoblotting, a modified method of Towbin et al. (1979) was used as previously described (Hogue et al., 1984). The preparation of rabbit antiserum against individual BCV proteins was previously described (Hogue et al., 1984). Monoclonal antiserum to human coronavirus OC43 M protein, which also recognizes BCV M protein, was a gift from J. Fleming, University of Southern California.

Purification of genomic RNA

Virus was purified from clarified supernatant fluids as described above. One-tenth of the virus preparation was labeled with [3H]uridine (400 Ci/mmol, ICN), 20 µCi/ml, in order to follow RNA purification. Viral RNA was extracted using the proteinase K-SDS method (Lapps and Brian, 1985) and phenol-chloroform-isoamyl alcohol extraction and was ethanol precipitated after adding sodium acetate. Because subgenomic RNA species are incorporated into BCV virions (Lapps and Brian, 1985), full-length genomic RNA to be used for cDNA cloning and for making probe for colony screening was selected by rate-zonal sedimentation on preformed 5-ml linear gradients of 15 to 30% sucrose (w/w) made up in TNE (10 mM Tris-HCl (pH 7.5), 100 mM NaCl, 1 mM EDTA)-0.1% SDS. RNA was dissolved in water and sedimented for 1.5 hr at 110,000 g, 25°. RNA sedimenting faster than mammalian 28 S ribosomal RNA was recovered by ethanol precipitation.

cDNA cloning of the 3' end of the BCV genome

BCV genomic RNA was cloned using a modified method of Gubler and Hoffman (1983). First-strand synthesis was carried out for 1 hr at 37° in a volume of 50 µl containing 50 mM Tris-HCl (pH 8.1), 148 mM KCl, 8 mM MgCl2, 1 mM DTT, 2 mM each of the four dNTPs, 10 µCi [32P]dCTP (3000 Ci/mmol, ICN), 15 units RNasin (Promega), 50 pmol oligo(dT12-18), 3 µg BCV RNA, and 20 U reverse transcriptase (Seikagaku), and the reaction was stopped by adding 2.5 µl 0.5 M EDTA. BCV RNA was heated to 100° for 5 min and quick cooled to 37° immediately before its addition to the reaction. Reaction products were extracted with phenol-chloroform-isoamyl alcohol and ethanol precipitated after adding ammonium acetate. Second-strand synthesis was carried out as described by Gubler and Hoffman in 100 µl containing 20 mM Tris-HCl (pH 7.5), 5 mM MgCl2, 10 mM (NH4)2SO4, 10 mM KCl, 0.15 mM β-NAD, 40 µM dNTPs, 8.5 U/ml Escherichia coli RNase H, 230 U/ml DNA polymerase I, 10 U/ml DNA ligase, and all of the product from the first-strand reaction. Free nucleotides were removed by three cycles of ethanol precipitation of the reaction product and the total quantity of product was estimated from the amount of radiolabeled first strand that remained. Double-stranded cDNA was homopolymer tailed essentially by the method of Roychoudhury and Wu (1980). The following were added to the dried DNA in order: 20 µl 10X cacodylate-Tris buffer (1.4 M K-cacodylate, 0.3 M Tris-HCl (pH 7.6)), 4 µl 5 mM DTT, 3 µl 10 mM dCTP, 162 µl H2O, 2 µl 100 mM CoCl2, 50 µCi [α-32P]dCTP (>3000 Ci/mmol, ICN) in 5 µl, and 16 units of TdT in 2 µl. The reaction was carried out at 37° for 1 min and then stopped by adding 10 µl 0.5 M EDTA and 2 µl 10% SDS, and the product was phenol-chloroform-isoamyl alcohol extracted and ethanol precipitated. This reaction assumed an average size of 1 kbp for the ds cDNA and was designed to give an average of 15 dCMP residues per 3' end of dsDNA. C-tailed ds cDNA was annealed to G-tailed, PstI-linearized pUC9 vector (PL Biochemicals) and E. coli strain JM103 was transformed by the method of Hanahan (1983) using a total concentration of DNA of less than 0.1 µg/ml.

Identification of large inserts containing 3'-specific BCV sequences

Cells containing recombinant plasmids were observed as white colonies on YT agar plates that contained 100 µg ampicillin/ml, 1 mM IPTG, and 0.004% X-gal, and were transferred to nitrocellulose (Millipore, HATF) and probed with random-primed cDNA copied from BCV genomic RNA (Maniatis et al., 1982).
32P-labeled, random-primed cDNA was synthesized as described above for the oligo(dT)-primed reaction except that dNTP concentrations were 2.5 µM each, 0.2 µg RNA was used, and oligo(dT) was replaced by 20 µg fragmented calf thymus DNA. Colonies yielding strong signals were analyzed for plasmid size and inserts of 1.0 to 4.1 kb (the largest) were further analyzed by Southern hybridization with 32P-labeled poly(dT) to detect poly(dA). 32P-labeled poly(dT) probe was prepared as described above for the oligo(dT)-primed reaction except that 50 pmol oligo(dT)·poly(rA) (PL Biochemicals) replaced the RNA. Alkali-treated [32P]poly(dT) probe was incubated for hybridization at 37° for 12 hr, then at 20° for 36 hr, and blots were washed in 2X SSC, 0.1% SDS at 20°.

Plasmids were purified by alkaline lysis and cesium chloride centrifugation as described by Maniatis et al. (1982) and restriction endonuclease mapping was done as described by Smith and Birnstiel (1976) using plasmids that were labeled at the SalI site within the multiple cloning linker region. Restriction fragments end-labeled with 32P were isolated and sequenced by the method of Maxam and Gilbert (1980). Many end-labeled fragments of less than 700 bases were first strand-separated before sequencing (James and Bradshaw, 1984). Sequences were analyzed with the aid of the program developed by Queen and Korn (1984) marketed as part of the Beckman Microgenie program, March 1986 version (Beckman Instruments, Inc.).

cDNA cloning and sequencing of four clones from the 3' end of the genome

Starting material for cDNA cloning was approximately 3 µg of rate-zonally purified genomic RNA obtained from 500 ml of tissue culture medium. An estimated 70 ng of ds cDNA was generated and from this 670 white colonies were obtained.
By colony screening, 89 colonies gave a strong signal to [32P]cDNA prepared from genomic RNA, and of these, 9 had inserts ranging from 1.2 to 4.1 kb as determined by agarose gel electrophoresis of linearized plasmids. The 9 clones were further analyzed to determine their restriction enzyme maps and poly(A) content. Only one of the clones, an insert of 1.2 kb identified as clone CB9, reacted strongly under hybridization conditions by Southern blotting to 32P-labeled oligo(dT). Three other clones identified as MN3 (2.1 kb), MA5 (2.8 kb), and MA7 (4.1 kb) were found to contain sequences that overlap with CB9 on the basis of hybridization and restriction endonuclease maps (data not shown). The orientation of all four clones in reference to the 20-kb virus genome and the restriction enzyme sites used for sequencing are illustrated in Fig. 1. Our orientation presumes polyadenylation at only the 3' end of the genome and this is based on the documented 3' polyadenylation site in the avian infectious bronchitis virus and mouse hepatitis virus genomes (Lai et al., 1981; Stern and Kennedy, 1980). The strategy used for sequencing is described in the legend to Fig. 1. Initially, clone MN3 was sequenced completely and was found to contain all of the N and part of the M sequences. Greater than 98% of the sequence containing the complete N gene was determined either by sequencing both strands of clone MN3 or by repeated sequencing of the same strand using different methods of end labeling. Some of the sequences were confirmed from subclones of MN3. To complete the sequencing of the M gene, clone MA5 was sequenced from its second DdeI site and parts of MA7 were sequenced as described in Fig. 1. The total sequence of the M gene was determined by sequencing both strands of DNA and by repeated sequencing of some fragments using different methods of end labeling.
The total nucleotide sequence of 2451 bases from the 3' end of the genome and the deduced amino acid sequences for the three largest open reading frames contained in this sequence are illustrated in Fig. 2. All possible translation products were deduced for both virus-sense RNA and virus complementary-sense RNA (Fig. 3) because of the precedent that some single-strand RNA virus genomes are of ambisense polarity (Auperin et al., 1984). RNA complementary to coronavirion-sense RNA could therefore theoretically function as mRNA.

Fig. 1. Sequencing strategy used to obtain BCV genomic sequences containing the N and M genes. (A) Strategy for obtaining the N gene sequence. Clone MN3 was sequenced completely and clones CB9 and MA5 were sequenced in part. The internal DdeI, PstI, Sau3AI, and XbaI sites derived by restriction endonuclease mapping and the HindIII and SalI sites in the multiple cloning region of the pUC9 vector were the sites used for DNA sequencing. Separate symbols for clones MN3, CB9, and MA5 indicate sites labeled at the 5' end using polynucleotide kinase and sites labeled at the 3' end using reverse transcriptase and the appropriate labeled deoxynucleotide triphosphate. A further symbol indicates sites labeled at the 3' end using radiolabeled cordycepin and terminal transferase for a subclone of MN3. (B) Strategy for obtaining the M gene sequence. Parts of clones MN3, MA5, and MA7 were sequenced beginning with the second DdeI site from the 5' end of clone MA5. Separate symbols indicate restriction sites that were 5' end labeled with polynucleotide kinase for clones MN3, MA5, and MA7, and sites that were 3' end labeled using reverse transcriptase and the appropriate labeled deoxynucleotide triphosphate for clones MN3 and MA5. Uniquely labeled molecules for sequencing were obtained from gels after electrophoretic separation of restriction endonuclease-treated, end-labeled fragments or after strand separation. The orientation of the clones in the pUC9 vector is as shown except for CB9, which is inverted; i.e., the poly(A) end is next to the HindIII site of the vector. pUC9 sequences are indicated as a bold line at the end of the restriction endonuclease maps.

The largest open reading frame extends from base 817 through base 2160 and predicts a 448-amino acid protein of 49,379 mol wt. We conclude this to be the coding sequence for the nucleocapsid protein (N) for the following reasons: (i) The only BCV protein described to date that approaches this size is the 52-kDa phosphorylated nucleocapsid protein (King and Brian, 1982). (ii) The predicted protein is basic, a property expected of nucleic acid-binding proteins. Fifty-nine (13%) of the amino acids are basic whereas 43 (10%) are acidic, giving the protein a net charge of +16 at neutral pH. (iii) The amino acids encoded by this sequence share extensive (70%) sequence homology with the N protein of the closely related mouse hepatitis virus strains A59 and JHM (Armstrong et al., 1983; Skinner and Siddell, 1983).

The N gene for BCV shares other properties with the N gene of MHV. (i) It is rich in serine. Forty-two residues of serine make it the most abundant amino acid. (ii) It is flanked on its 5' side by the gene for the M protein, and it is flanked on its 3' side by a noncoding region of 291 bases, only 3 bases fewer than that for MHV JHM. (iii) The sequence between the M and N genes is very similar. Beginning with the first base following the M gene termination codon, the sequence is UAUCUAAACUUUAAGG for BCV, and UCUAAACUUUAAGG for MHV. (iv) The consensus sequence surrounding the initiation codon for the N gene, AGGAUGU, is the same, and is a preferred sequence for translation initiation among eukaryotic messenger RNAs (Kozak, 1983).
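The coordinate and composition figures quoted for the N gene can be cross-checked with simple arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that the base coordinates are 1-based and inclusive and that net charge is counted simply as basic minus acidic residues.

```python
def orf_codons(first_base, last_base):
    """Number of codons in a reading frame given 1-based, inclusive coordinates."""
    length = last_base - first_base + 1
    if length % 3 != 0:
        raise ValueError(f"ORF length {length} is not a multiple of 3")
    return length // 3


def net_charge(basic, acidic):
    """Approximate net charge at neutral pH as basic minus acidic residue counts."""
    return basic - acidic


def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to the nearest integer."""
    return round(100 * part / whole)


# Values reported in the text for the N open reading frame.
N_FIRST, N_LAST = 817, 2160
SEQUENCED_BASES = 2451

n_codons = orf_codons(N_FIRST, N_LAST)
print("N codons:", n_codons)
print("bases 3' of the N reading frame:", SEQUENCED_BASES - N_LAST)
print("net charge:", net_charge(59, 43))
print("basic %:", percent(59, n_codons), "acidic %:", percent(43, n_codons))
```

Under these assumptions the interval 817 to 2160 spans 1344 bases, or 448 codons, and the 291 bases that follow base 2160 match the reported length of the 3'-noncoding region.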
The second largest open reading frame predicts a protein having properties of the matrix protein and also identifies potential O-glycosylation sites

The second largest open reading frame extends from base 112 through base 804 and predicts a 230-amino acid protein having a molecular weight of 26,376 (Fig. 2). This protein has extensive amino acid homology with the M protein of MHV A59 strain (Armstrong et al., 1984) as expected from its close antigenic relatedness (Hogue et al., 1984) and is therefore the apparent BCV counterpart (Fig. 4). By maximum alignment of the proteins, 200 of the 230 BCV amino acids (>86%) are identical to residues in the 228-amino acid MHV sequence, and another 16 (7%) represent conservative changes. Because of the strong similarity in structure between the BCV and MHV M proteins, the BCV M protein can be expected to have a similar topology with respect to the virion envelope (Rottier et al., 1986). Namely, the central portion of the molecule would be expected to span the membrane three times, with approximately 28 amino acids (26 for MHV) at the amino terminus being external to the virion and approximately 100 amino acids at the carboxy terminus being internal to the virion. The protein is slightly basic, having a net charge of +9 at neutral pH. The basic amino acids are clustered in the carboxy terminal 40% of the protein. Within the carboxy terminal 100 amino acids are 14 of the 20 basic amino acids and 6 of the 14 acidic amino acids, giving this region of the molecule a net charge of +8. It is therefore reasonable to expect that this part of the molecule might be interacting with the negatively charged RNA as suggested (Sturman et al., 1980) or possibly with an acidic portion of the N protein to contribute to a direct interaction between the M and N molecules. We predict the latter occurs on the basis of a 1:1 molar ratio between the M and N proteins in BCV (King and Brian, 1982).
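Identity percentages of this kind come from counting matching positions in an alignment. As a minimal sketch, and not the authors' alignment procedure, the Python below counts identical columns in a pair of sequences that have already been aligned. The function name, the gap character, and the gap placement in the two amino-terminal peptides are assumptions made for illustration; only the peptide sequences and the residue counts are taken from the text.

```python
def identity(aligned_a, aligned_b, gap="-"):
    """Return (matches, percent identity relative to the ungapped length of aligned_a)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    length_a = sum(1 for a in aligned_a if a != gap)
    return matches, 100.0 * matches / length_a


# Amino-terminal peptides from the text, in one-letter code; gap placement is assumed.
BCV_NTERM = "MSSVTT"  # Met-Ser-Ser-Val-Thr-Thr
MHV_NTERM = "MSS-TT"  # Met-Ser-Ser-Thr-Thr

matches, pct = identity(BCV_NTERM, MHV_NTERM)
print(matches, round(pct, 1))

# Whole-protein figures reported in the text: 200 identical and 16 conservative
# positions out of the 230 residues of the BCV M protein.
identical_pct = round(100 * 200 / 230, 1)
similar_pct = round(100 * (200 + 16) / 230, 1)
print(identical_pct, similar_pct)
```

The percentage is expressed relative to the first sequence, so swapping the arguments changes the denominator (230 residues for BCV, 228 for MHV).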
One hundred thirteen (49%) of the amino acids are hydrophobic and the distribution of hydrophobic amino acids is nearly identical to that for the MHV M protein. The M proteins of BCV and MHV were together the first viral glycoproteins shown to possess O-linked oligosaccharides (Holmes et al., 1981; Niemann and Klenk, 1981). The character of the oligosaccharides, however, has been described only for MHV A59. Our data suggest there may be up to two O-linked oligosaccharides per BCV M molecule. First, three separate species of M (gp26) molecules were identified from purified BCV when radiolabeled proteins were resolved by electrophoresis (Fig. 5). These were also observed, but less clearly resolved, when identified by immunoblotting with M-specific polyvalent antiserum (Fig. 5) or with M-specific monoclonal antiserum (data not shown). They have apparent molecular weights of 22K, 24K, and 26K and their appearance is consistent with the notion that the 22K species is the unglycosylated precursor and that 1 or 2 oligosaccharide chains, each contributing approximately 2 kDa toward the molecular weight (Klenk and Rott, 1981), are added to assemble the 24- and 26-kDa species, respectively. Second, only three species of M protein were resolved in lysates of infected cells by immunoblotting and neither their sizes nor relative amounts were altered by tunicamycin, an inhibitor of N-glycosylation but not O-glycosylation. Tunicamycin does, however, inhibit the glycosylation of gp190 (the peplomeric protein for which the virion-associated subunits are gp120 and gp100) and gp140 (the hemagglutinin) (Hogue and Brian, in preparation). In the presence of radiolabeled glucosamine only the 24- and 26-kDa species were labeled (Fig. 5).
The fact that monensin, an inhibitor of Golgi function and hence O-glycosylation, diminishes the amount of the 24- and 26-kDa species and enhances the relative abundance of the 22-kDa species strengthens the notion that the M glycosylation is O-linked (Niemann et al., 1982). Assuming the BCV M protein is glycosylated in the region external to the virion envelope, i.e., within the first 28 amino acids of the N terminus, then the serine residues at positions 2 and 3 or the threonine residues at positions 5, 6, 12, and 14 are potential sites for O-glycosylation (Fig. 4). If, as presumed for MHV, the glycosylation sites are primarily within the N-terminal N-Met-Ser-Ser-Thr-Thr sequence, a region identical to the glycosylated region of glycophorin A, then the sequence per se may not be an absolute requirement for glycosylation since the N-terminal sequence for BCV is N-Met-Ser-Ser-Val-Thr-Thr. The discrepancy between the observed molecular weight of 22 kDa for the unglycosylated polypeptide and the molecular weight of 26,376 deduced from sequence data could be explained by a strong tendency of the hydrophobic regions of the M protein to remain in close proximity, even in the presence of SDS, giving rise to more rapidly migrating globular molecules.
Certainly such behavior would explain the self-aggregation

[Fig. 2 (partial; sequence rows not recovered): nucleotide sequence with coordinate labels 1470 through 2370 and the deduced amino acid sequences of the N protein and the internal open reading frame.]

A large open reading frame found internal to the N gene, but in a different reading frame, predicts a 207-amino acid protein having a molecular weight of 23,057

The nucleotide sequence from base 878 through base 1498 in the second reading frame encodes a 207-amino acid protein of 23,057 mol wt (Fig. 2). This protein is hypothetical since we have no proof yet of its existence. The protein has a net charge of +1 at neutral pH and is moderately hydrophobic since 79 (38%) of its amino acids are hydrophobic.
The hydrophobic amino acids are spread somewhat evenly throughout the protein except at the carboxyl terminus, where there are enough to make this part of the protein a potential membrane anchor region. Twenty-seven of the terminal 41 amino acids (66%) are hydrophobic. The existence of the protein cannot be ruled out on the basis of the consensus sequence (GUAAUGG) surrounding its initiation codon, since it is one commonly used, being found at the initiation site of 18% of all eukaryotic mRNAs catalogued (Kozak, 1983), nor can it be ruled out on the basis of codon usage, since its codon usage is similar to that of the N and M genes.

We present the first nucleotide sequence data available for BCV or for any member of the hemagglutinating mammalian coronavirus subgroup, which includes the human respiratory coronavirus OC43 and the porcine hemagglutinating encephalitis virus. Despite the fact that BCV has the hemagglutinin structural protein that is missing on MHV A59 (Hogue et al., 1984; King et al., 1985), it shares membership with MHV in one of the four major antigenic subgroups of coronaviruses (Pedersen et al., 1978). Both the gene map and the primary sequence for that part of the BCV genome described in this paper reflect a close relatedness to MHV, consistent with patterns of shared antigenicity between the two viruses (Hogue et al., 1984). Genome sequence divergence with regard to the hemagglutinin gene must therefore lie 5'-ward of this sequence. Both gene arrangement and primary sequence at the 3' end of the genome, however, suggest a greater degree of evolutionary divergence from both the porcine transmissible gastroenteritis virus (TGEV) and avian infectious bronchitis virus (IBV). TGEV has an open reading frame for a potential 9.1K protein positioned between the 3'-noncoding sequence and the N gene (Kapke and Brian, 1986); IBV has two open reading frames for proteins of 7.5K and 9.5K positioned between the N and M genes.
The N protein of BCV shows an overall amino acid sequence homology of 70% with both MHV A59 and MHV JHM (72% at the nucleotide level) (Skinner and Siddell, 1983; Armstrong et al., 1983) but only 29% (37% at the nucleotide level) with the N protein of TGEV (Kapke and Brian, 1986) and 29% (43% at the nucleotide level) with the N protein of IBV (Boursnell et al., 1985). The degree of homology between the N amino acid sequences of BCV and MHV is not evenly distributed throughout the gene, however. There are stretches of up to 16 amino acids, for example, that show less than 30% homology. Conversely, there are stretches of up to 69 amino acids showing greater than 90% homology. A region of high homology among MHV (beginning at amino acid 86), IBV (beginning at amino acid 53), and TGEV (beginning at amino acid 53), and extending for 68 positions (Kapke and Brian, 1986), is also found in BCV (beginning at amino acid 83). Within this region there is 79% perfect homology between BCV and MHV. Such regions of conservation suggest that there are evolutionary pressures for retention of a specific function associated with this sequence. Other regions having similar chemical properties but little primary sequence homology also suggest conserved functional domains. These include clusters of serine residues and clusters of basic and acidic amino acids. Assuming all coronavirus N proteins are phosphorylated at only serine residues, as in the N protein of MHV (Stohlman and Lai, 1979), then "hot spots" for potential phosphorylation become apparent when the N protein sequences are compared (Fig. 6). By aligning the N proteins of MHV, BCV, IBV, and TGEV with the first amino acid of the conserved 68-amino acid region, three clusters of 3-12 serine residues in common among all viruses become apparent at BCV amino acid positions 40-70, 180-225, and 300-350. The major serine cluster region is at amino acid positions 180-225.
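Residue clusters of the kind described here can be located by sliding a fixed window along a protein sequence and counting residues of a chosen class. The Python sketch below is one simple way to do this and is not the procedure used to prepare Fig. 6; the function name, the window size, the threshold, and the toy sequence are all hypothetical and chosen only to make the example easy to follow.

```python
def residue_clusters(seq, residues, window, min_count):
    """Return merged 1-based (start, end) ranges covered by windows that
    contain at least `min_count` residues from the set `residues`."""
    ranges = []
    for i in range(len(seq) - window + 1):
        count = sum(1 for c in seq[i:i + window] if c in residues)
        if count >= min_count:
            start, end = i + 1, i + window
            if ranges and start <= ranges[-1][1] + 1:
                # Overlapping or adjacent window: extend the previous range.
                ranges[-1] = (ranges[-1][0], end)
            else:
                ranges.append((start, end))
    return ranges


# Toy sequence (not from BCV): a serine-rich stretch and a basic stretch.
TOY = "GASSSSAG" + "A" * 10 + "KRKRK" + "A" * 5

serine_ranges = residue_clusters(TOY, "S", window=5, min_count=3)
basic_ranges = residue_clusters(TOY, "KR", window=5, min_count=3)
print(serine_ranges)
print(basic_ranges)
```

Because qualifying windows are merged, each reported range extends beyond the clustered residues themselves by up to `window - 1` positions on either side; a smaller window gives tighter boundaries at the cost of more noise.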
Cluster groups of 5 to 26 basic amino acids can be seen within 50 residues of the amino terminus, within the 68-amino acid conserved region, between amino acid positions 200 and 300, and in a region extending between amino acids 50 and 25 from the carboxy terminus, but not within six positions of the carboxy terminus (Fig. 6). Clustering of acidic amino acids is less striking but clusters of 10 to 12 are observed within the last 100 residues of the carboxy terminus (Fig. 6). Such regions may indicate sites for protein-nucleic acid or protein-protein interactions.

The high degree of amino acid sequence homology between the M proteins of BCV and MHV (86%) contrasts with the lower degree (70%) between the N proteins. The contrast becomes even more striking when amino acids of conserved nature are included, making the homology 93 and 79%, respectively, for the M and N proteins. This contrast indicates either that structural constraints on the M protein are more rigid, resulting in a more limited evolution of this protein, or that there is a form of genetic exchange that has taken place between the two viruses. The notion that the M protein may be structurally constrained as a result of functional requirements is suggested by the conserved chemical features between the MHV and IBV M proteins in the absence of conserved primary structure. IBV is antigenically unrelated to MHV, and the IBV M protein shares an amino acid sequence homology of only 35% (perfect matches only, using the same method of alignment employed above) with that of MHV. Yet it shows an extremely similar hydrophobicity profile and thus an apparently similar membrane topology. That is, amino acid changes were conservative. The notion of genetic exchanges, similar to those observed for RNA viruses with segmented genomes, must be seriously considered in light of recent evidence that coronaviruses undergo high-frequency recombination (Makino et al., 1986).
The mechanism giving rise to coronavirus recombinants is unknown but may involve displacement of nascent RNA polymerase complexes from the negative-strand template of one parent with subsequent attachment to the negative strand of a second parent (Makino et al., 1986). Recombination might therefore be expected between the closely related BCV and MHV viruses if, by chance, they should replicate simultaneously in the same host. This most certainly would be expected if polymerase binding during the recombinational event involves the conserved intergenic sequences used to identify initiation sites for transcription (Baric et al., 1985; Budzilowicz et al., 1985; Makino et al., 1986).

Sequence and topology of a model intracellular membrane protein, E1 glycoprotein, from a coronavirus
Sequence of the nucleocapsid gene from murine coronavirus MHV-A59
Sequence studies of Pichinde arenavirus S RNA indicate a novel coding strategy, an ambisense viral S RNA
Characterization of leader-related small RNAs in coronavirus-infected cells: Further evidence for leader-primed mechanism of transcription
Sequences of the nucleocapsid genes from two strains of avian infectious bronchitis virus
Sequencing of coronavirus IBV genomic RNA: A 195-base open reading frame encoded by mRNA B
Sequence of the membrane protein gene from avian coronavirus IBV
Three intergenic regions of coronavirus mouse hepatitis virus strain A59 genome RNA contain a common nucleotide sequence that is homologous to the 3' end of the viral mRNA leader sequence
A simple and very efficient method for generating cDNA libraries
Bovine coronavirus genome
Studies on transformation of Escherichia coli with plasmids
Antigenic relationships among proteins of bovine coronavirus, human respiratory coronavirus OC43, and mouse hepatitis coronavirus A59
Tunicamycin resistant glycosylation of a coronavirus glycoprotein: Demonstration of a novel type of viral glycoprotein
Economic impact of rotavirus and other neonatal disease agents of animals
Strand separation of DNA fragments and their isolation from nondenaturing polyacrylamide gels
Sequence analysis of the porcine transmissible gastroenteritis coronavirus nucleocapsid protein gene
Bovine coronavirus structural proteins
Bovine coronavirus hemagglutinin protein
Cotranslational and posttranslational processing of viral glycoproteins
Comparison of initiation of protein synthesis in procaryotes, eucaryotes, and organelles
Cleavage of structural proteins during the assembly of the head of bacteriophage T4
Mouse hepatitis virus A59: mRNA structure and genetic localization in the sequence divergence from hepatotropic strain MHV-3
Oligonucleotide fingerprints of antigenically related bovine coronavirus and human coronavirus OC43
High-frequency RNA recombination of murine coronaviruses
Molecular Cloning: A Laboratory Manual
Sequencing end-labeled DNA with base-specific chemical cleavages. In
Post-translational glycosylation of coronavirus glycoprotein E1: Inhibition by monensin
Glycoprotein E1 of MHV-A59: Structure of the O-linked carbohydrates and construction of full length recombinant cDNA clones
Coronavirus glycoprotein E1, a new type of viral glycoprotein
Antigenic relationship of the feline infectious peritonitis virus to coronaviruses of other species
A comprehensive sequence analysis program for the IBM personal computer
Predicted membrane topology of the coronavirus protein E1
Terminal transferase-catalyzed addition of nucleotides to the 3' termini of DNA. In
Coronavirus JHM: Nucleotide sequence of the mRNA that encodes nucleocapsid protein
A simple method for DNA restriction site mapping
Coronavirus multiplication strategy. II. Mapping the avian infectious bronchitis virus intracellular RNA species to the genome
Phosphoproteins of murine hepatitis virus
Isolation of coronavirus envelope glycoproteins and interaction with the viral nucleocapsid
Electrophoretic transfer of proteins from polyacrylamide gels to nitrocellulose sheets: Procedure and some applications

We thank Paul Kapke for many helpful discussions. This work was supported by Public Health Service Grant AI-14367 from the National Institute of Allergy and Infectious Diseases, by Grant 82-CRSR-2-1090 from the U.S. Department of Agriculture, and in part by a grant from the National Foundation for Ileitis and Colitis, Inc. W.L. was a predoctoral trainee on Grant T32-AI-07123 from the National Institutes of Health. B.H. was a predoctoral fellow supported by the Tennessee Center of Excellence Program for Livestock Diseases and Human Health.

[Fig. 6 (graphic not recovered): three panels of residue-position marks for the N proteins of MHV, BCV, IBV, and TGEV; horizontal axis labeled 100, 200, 300.]