key: cord-0957354-i6vuhaiv authors: De Groot, Raoul J.; Andeweg, Arno C.; Horzinek, Marian C.; Spaan, Willy J.M. title: Sequence analysis of the 3′ end of the feline coronavirus FIPV 79-1146 genome: Comparison with the genome of porcine coronavirus TGEV reveals large insertions date: 1988-12-31 journal: Virology DOI: 10.1016/0042-6822(88)90097-9 sha: 02569bcf9a7e53d2d262ef58dbfb2f9d38e7c7b2 doc_id: 957354 cord_uid: i6vuhaiv

Abstract

The genetic information carried on mRNA 6 of feline infectious peritonitis virus (FIPV) strain 79–1146 was determined by sequence analysis of cDNA clones derived from the 3′ end of the FIPV genome. Two ORFs were found, encoding polypeptides of 11K (ORF-1) and 22K (ORF-2). The FIPV sequence was compared to the 3′ end sequence of transmissible gastroenteritis virus (TGEV). ORF-1 has a homologous counterpart (ORF-X3) in the TGEV genome; both ORFs are located at the same position relative to the nucleocapsid gene. However, as a result of an in-frame insertion or deletion, ORF-1 is 69 nucleotides larger than ORF-X3. A similar event has occurred immediately downstream of ORF-1: a 624-nucleotide segment, containing the complete ORF-2, is absent in the TGEV sequence. Most sequence similarity (98.5%) was found in the 3′ noncoding sequences. ORF-X3 and ORF-1 are preceded by the sequence AACTAAAC, which is assumed to be the transcription-initiation signal in FIPV and TGEV (P. A. Kapke and D. A. Brian (1986) Virology 151, 41–49). By S1 nuclease analysis, the 5′ end of FIPV RNA 6 was mapped immediately upstream of this sequence. A 700-nucleotide TGEV-specific RNA was found by cross-hybridization with an FIPV 3′ end probe, suggesting that TGEV ORF-X3 is also carried on a separate mRNA. The differences at the 3′ ends of the FIPV and TGEV genomes may be the result of RNA recombination events.

Coronaviruses, a group of enveloped, positive-stranded RNA viruses, have attracted considerable interest because of their unusual replication strategy. 
In the infected cell, there are five to seven subgenomic mRNAs which form a 3' coterminal nested set: they have common 3' ends but extend for different lengths in the 5' direction. In addition, the RNAs share a short 5' leader sequence, which is fused to the RNA "body" via discontinuous transcription (Spaan et al., 1983; Lai et al., 1984; Brown et al., 1984). Translation of each RNA is thought to be restricted to the open reading frames (ORFs) at the 5' end that are not present in the smaller RNAs (for review see Siddell et al., 1983). In vitro translation of the viral mRNAs (Rottier et al., 1981; Siddell, 1983; Stern and Sefton, 1984; Jacobs et al., 1986; de Groot et al., 1987a) and the sequence analysis of coronavirus genomes (Rasschaert et al., 1987; Skinner and Siddell, 1985; Schmidt et al., 1987; Luytjes et al., 1987) have allowed the construction of genomic maps. The relative position of the genes encoding the structural proteins is conserved on the genomes of these viruses. However, differences in the genomic maps indicated that other transcription units have been lost, gained, or translocated as the coronaviruses diverged (de Groot et al., 1987a).

Feline infectious peritonitis virus (FIPV) and transmissible gastroenteritis virus (TGEV) of swine belong to the same antigenic cluster (Pedersen et al., 1978; Horzinek et al., 1982; Siddell et al., 1983) and are closely related: sequence analysis of their peplomer genes revealed up to 93% sequence identity (Jacobs et al., 1987). Despite this close relationship, TGEV and FIPV differ in their genomic organization (de Groot et al., 1987a). TGEV is generally reported to specify six poly(A)-containing RNAs, the smallest of which (1.9 kb) encodes the nucleocapsid (N) protein (Jacobs et al., 1986). In contrast to other coronaviruses, like infectious bronchitis virus (IBV) and mouse hepatitis virus (MHV), the nucleocapsid gene is not the 3'-most ORF. 
A short ORF (ORF-X3), potentially encoding a polypeptide of 9.1K, is found further downstream (Kapke and Brian, 1986; Rasschaert et al., 1987). Although the presumptive transcription-initiation signal, AACTAAAC, is present at the 5' end of ORF-X3, it is not clear whether this ORF is carried on a separate mRNA (Jacobs et al., 1986; Kapke and Brian, 1986; Rasschaert et al., 1987). For FIPV, an RNA of 2.8 kb (RNA 5) encodes the N protein, while the smallest RNA (RNA 6) has a length of about 1450 nucleotides. These findings indicated the presence of a large insertion at the 3' end of the FIPV genome as compared to the TGEV genome. In this report we describe the cloning and sequence analysis of the 3' end of the FIPV strain 79-1146 genome; a detailed comparison with the 3' end of the TGEV genome is presented.

Selection and analysis of cDNA clones

cDNA clones containing sequences derived from the 3' end of the FIPV genome were selected from a "random" cDNA library of FIPV 79-1146 genomic RNA (de Groot et al., 1987b) by hybridization to sucrose-gradient purified, 32P-labeled FIPV RNA 6 in 50% formamide, 5× SSC, 5× Denhardt's solution, 100 µg/ml salmon sperm DNA at 42°. Recombinant DNA techniques were performed by standard methods (Maniatis et al., 1982). Sequence analysis was carried out using the dideoxynucleotide chain termination procedure (Sanger et al., 1977). Sequence data were assembled and analyzed by using the computer programs by Staden (1986). S1 nuclease analysis and Northern blot analysis were performed as described (de Groot et al., 1987a,b).

Isolation and characterization of recombinants containing sequences derived from the 3' end of the FIPV genome

The preparation of a cDNA library of FIPV genomic RNA in pUC9 was described previously (de Groot et al., 1987b). Recombinants containing sequences derived from the 3' end of the FIPV genome were isolated by colony hybridization with an RNA fraction enriched for RNA 6 (fraction 28, de Groot et al., 1987a). 
The plasmids pB12, pC12, and pE7 were selected for sequence analysis. The sequencing strategy is outlined in Fig. 1. Three major ORFs were identified (Fig. 1). The 5'-most ORF could be identified as the 3' end of the FIPV nucleocapsid gene, the sequence of which was 64% identical to the corresponding TGEV sequence (not shown). Figure 2 shows the nucleotide sequence and the predicted amino acid sequences of the region downstream of the N gene. As shown in Fig. 1, this sequence was determined on both strands and on two independent cDNA clones, except for the 3'-most 68 nucleotides. ORF-1 (positions 49 to 375) predicts a protein of 108 residues. In the genomic sequence this ORF overlaps with the N gene. The first AUG codon (position 49) is followed by the sequence AACTAAAC; a second AUG codon is present at position 70 (Fig. 2). ORF-2 (positions 380 to 1000) could encode a polypeptide of 206 residues.

Comparison with the 3' end of the TGEV genome

Figure 3a shows a dot matrix comparison of the 3' end of the FIPV and TGEV genomes (Kapke and Brian, 1986). The highest sequence similarity (98.5%) was found in the 3' noncoding regions. ORF-1 is 78% identical to the TGEV ORF-X3 but contains 69 additional nucleotides (indicated by a dashed line in Fig. 2). The AUG start codon of ORF-X3 corresponds to the second AUG codon of ORF-1. A 624-nucleotide segment (positions 376-1000) immediately downstream of ORF-1 is absent in the TGEV sequence. Strikingly, this segment corresponds exactly to ORF-2. A schematic alignment of the TGEV and FIPV sequences is shown in Fig. 3b. None of the recombinants isolated from our cDNA library contained the poly(A) tail, probably because the cDNA synthesis was randomly primed by calf thymus DNA pentamers (de Groot et al., 1987b). If aligned to the TGEV sequence, the most 3' located clone, E7, ends just one nucleotide upstream of the poly(A) tail. 
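The ORF coordinates quoted above can be checked against the stated product sizes with a short calculation. The following Python sketch is illustrative only: the function name `orf_summary` is hypothetical, and it assumes that the quoted positions are 1-based and inclusive and that the end position includes the stop codon, which the text does not state explicitly.

```python
def orf_summary(start, end):
    """Summarize an ORF given 1-based, inclusive nucleotide coordinates.

    Assumption (not stated in the paper): `end` is the last nucleotide
    of the stop codon, so the encoded product has one residue fewer
    than the number of codons.
    """
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    codons = length_nt // 3
    residues = codons - 1  # the stop codon encodes no residue
    return length_nt, codons, residues


# Coordinates as quoted in the text.
for name, (start, end) in {"ORF-1": (49, 375), "ORF-2": (380, 1000)}.items():
    length_nt, codons, residues = orf_summary(start, end)
    print(f"{name}: {length_nt} nt, {codons} codons, {residues} residues")

# The 69 additional nucleotides in ORF-1 are a whole number of codons,
# as expected for an in-frame insertion or deletion.
print("extra codons in ORF-1 relative to ORF-X3:", 69 // 3)
```

Under these assumptions the coordinates give 108 residues for ORF-1 and 206 residues for ORF-2, in agreement with the sizes reported in the text.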
Localization of the 5' end of the presumptive RNA body of RNA 6

To determine the 5' end of RNA 6, we used S1 nuclease analysis. An M13-recombinant phage containing the virus-sense strand of the 950-bp PstI-TaqI fragment (Fig. 1) served as a template to prepare a uniformly labeled probe. This probe was hybridized to sucrose-gradient purified RNA 6, followed by S1 nuclease digestion. A fragment of 514 nucleotides was protected (Fig. 4a). The precise length was determined in a sequencing gel (Fig. 4b). This indicates that the 5' end of the RNA 6 body maps at position 60, immediately upstream of the AACTAAAC box. Consequently, the AUG codon at position 49 is not present in RNA 6. Furthermore, these results suggest that the body of RNA 6 has a length of 1212 nucleotides, provided that there are no additional insertions. Assuming an RNA leader sequence of 60-70 nucleotides (Spaan et al., 1983; Lai et al., 1984; Brown et al., 1984) and a poly(A) tail of about 100 nucleotides, we arrive at a predicted length of approximately 1400 nucleotides. Previously, RNA 6 was estimated to be 1600 nucleotides (de Groot et al., 1987a). By using gels with a better resolution in this molecular weight range, we have now measured a length of about 1450 nucleotides (not shown).

Since the AACTAAAC box preceding ORF-1 apparently is used as a signal for initiation of transcription, we expected this also to be the case for the AACTAAAC sequence preceding the TGEV ORF-X3. Figure 5 shows that in a Northern blot of oligo(dT)-selected RNAs of TGEV-infected cells, an RNA of about 700 nucleotides can be detected by cross-hybridization with the 1600-bp PstI fragment of clone B12.

Differences in the mRNA sets of TGEV Purdue and FIPV 79-1146 indicated the presence of large insertions at the 3' end of the FIPV genome (de Groot et al., 1987a). 
We have characterized these insertions by sequence analysis of recombinants which had been isolated from an FIPV-specific cDNA library by hybridization with the smallest FIPV mRNA (RNA 6). RNA 6 carries two ORFs, encoding polypeptides of 11K (ORF-1) and 22K (ORF-2). In the genomic sequence, ORF-1 overlaps with the 3' end of the N gene. However, by S1 nuclease analysis it was shown that only sequences downstream of the N gene are contained in mRNA 6. Consequently, in this RNA only the second AUG codon of ORF-1 is available for translation initiation. The 5' end of the body of RNA 6 was mapped immediately upstream of the sequence AACTAAAC, the presumptive transcription-initiation signal in the FIP/TGE viruses. This consensus sequence is not present between ORF-1 and ORF-2. Furthermore, there are no indications for an RNA smaller than RNA 6. Therefore, if both ORFs are to be translated, RNA 6 must function as a bicistronic mRNA. The start codon of ORF-1 and the two internal, out-of-frame AUGs at positions 86 and 188 are in an unfavorable context for translation initiation, while the AUG of ORF-2 ranks among the most efficient start codons. According to the scanning hypothesis (Kozak, 1986a,b), this arrangement would favor translation of ORF-2. Translation of ORF-2 could also occur via reinitiation. Coronavirus mRNAs containing two or three ORFs in the 5' "unique" segment have previously been described for MHV and IBV (Boursnell et al., 1985). Recently, Smith et al. (1987) provided evidence for in vivo expression of a "downstream" ORF of IBV RNA D.

ORF-1 is homologous to the TGEV ORF-X3 and present at the same location relative to the N gene. However, due to an in-frame insertion in FIPV or to a deletion in TGEV, ORF-1 is 69 nucleotides longer. The predicted ORF-X3 product contains hydrophobic segments of about 25 residues at the N- and C-termini; these segments are separated by a hydrophilic central region (Kapke and Brian, 1986). 
The 23-residue insert in the ORF-1 product enlarges this hydrophilic region. Moreover, due to a point mutation, ORF-1 contains [...] glycosylation site. The features of the ORF-1 and ORF-X3 products are characteristic for membrane proteins. ORF-1 and ORF-X3 may encode minor structural proteins which have not yet been detected because of their small size.

Fig. 3. [...] (Kapke and Brian, 1986). A schematic presentation of the results is depicted in (b). The ORFs are indicated by bars; black bars represent conserved segments, and insertions are depicted by white bars. The arrow indicates the 5' end of the "body" of RNA 6.

Like ORF-1, TGEV ORF-X3 is preceded by the presumptive transcription-initiation signal AACTAAAC (Kapke and Brian, 1986). However, it was unclear whether in TGEV this ORF is contained in a separate RNA (Kapke and Brian, 1986; Jacobs et al., 1986; Rasschaert et al., 1987). Rasschaert et al. (1987) did not detect such an RNA species in Northern blots, but could have missed it since low-percentage agarose gels were used. By cross-hybridization with an FIPV 3' end probe, we detected a TGEV-specific, poly(A)-containing RNA species of about 700 nucleotides in Northern blots. Jacobs et al. (1986) previously observed this RNA species in TGEV-infected cells, but considered it host specific. However, its synthesis was not affected by actinomycin D (Jacobs et al., 1986), which is a strong indication for virus specificity.

ORF-2 is located on a 624-nucleotide segment, which is absent at the 3' end of the TGEV genome. A comparison with the partial TGEV sequence determined by Rasschaert et al. (1987) showed that the TGEV genome does not contain an ORF-2 homolog downstream of the peplomer gene. Moreover, ORF-2-specific probes did not cross-hybridize with the TGEV mRNA 1 (not shown), indicating the absence of sequences related to ORF-2 in the remaining part of the TGEV genome. 
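Because the argument rests on where the AACTAAAC box lies relative to each ORF, it may help to see how such a motif can be located in a sequence. The Python sketch below is a plain string search, not the procedure used by the authors; the function name `find_motif` and the toy sequence are hypothetical, and the toy sequence is made up for illustration rather than taken from FIPV or TGEV.

```python
def find_motif(sequence, motif="AACTAAAC"):
    """Return 1-based start positions of every occurrence of `motif`.

    RNA input is accepted: U is converted to T before searching.
    Overlapping occurrences are reported as well.
    """
    seq = sequence.upper().replace("U", "T")
    motif = motif.upper().replace("U", "T")
    positions = []
    index = seq.find(motif)
    while index != -1:
        positions.append(index + 1)  # convert 0-based index to 1-based position
        index = seq.find(motif, index + 1)
    return positions


# Made-up toy sequence containing the motif twice (once written as RNA).
toy_sequence = "GGAACTAAACTTAACUAAACG"
print(find_motif(toy_sequence))  # [3, 13]
```

Applied to a real genomic sequence, the reported positions could then be compared with the coordinates of the first AUG downstream of each box.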
ORF-2 is not related to genes of mouse hepatitis virus (MHV), infectious bronchitis virus (IBV) or to any of the sequences in the NBRF and Swiss protein databases. The predicted ORF-2 product is predominantly hydrophilic but contains a hydrophobic segment of 12 residues at the N-terminus. ORF-2 and the 69-nucleotide segment in ORF-1 may have been deleted in TGEV, but an intriguing possibility is that FIPV has acquired these sequences by RNA recombination. Homologous recombination in vitro has been described for the MHV strains A59 and JHM (Makino et al., 1986). The remarkable sequence divergence at the 5' ends of the peplomer genes of TGEV and FIPV suggests that similar events may also occur in vivo (Jacobs et al., 1987). Recently, Luytjes et al. (1988) discovered a striking sequence similarity between a pseudogene contained in RNA 2 of MHV A59 and the hemagglutinin (HA) gene of influenza virus, type C. This finding is best explained by a nonhomologous recombination event. Conceivably, uptake of genetic information via nonhomologous recombination may account for FIPV ORF-2 and the various nonrelated ORFs in other coronaviruses.

Fig. 4. (a) S1 nuclease mapping of the 5' end of the FIPV RNA 6. The uniformly labeled, anti-sense strand of the 950-bp PstI-TaqI fragment was used as a probe. After S1 nuclease digestion the protected fragments were analyzed in a 2% agarose gel. Lane 1, hybridization to total poly(A)-containing RNA extracted from FIPV-infected cells; lane 2, hybridization to yeast tRNA; lane 3, hybridization to sucrose-gradient purified RNA 6 (fraction 28; de Groot et al., 1987a). Sau3A-digested pUC8 was used as a molecular weight marker (lane m). (b) The precise length of the protected fragment was determined in a sequencing gel. Lane 1, marker; lane 2, protected fragment after hybridization to purified RNA 6. 
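The length estimate for RNA 6 derived from the S1 mapping (a body of 1212 nucleotides, a leader of 60-70 nucleotides, and a poly(A) tail of about 100 nucleotides) is a simple sum. The Python sketch below reproduces that arithmetic and compares it with the measured length of about 1450 nucleotides; the constant and function names are illustrative, and the poly(A) length is treated as exactly 100 although the text gives it only as approximate.

```python
# Values quoted in the text; names are illustrative assumptions.
BODY_NT = 1212               # RNA 6 body, from the mapped 5' end
LEADER_NT_RANGE = (60, 70)   # assumed leader length range
POLY_A_NT = 100              # "about 100" in the text, taken as exact here
MEASURED_NT = 1450           # length measured on higher-resolution gels


def predicted_length_range(body_nt, leader_range, poly_a_nt):
    """Return (low, high) predicted mRNA length in nucleotides."""
    low_leader, high_leader = leader_range
    return (low_leader + body_nt + poly_a_nt,
            high_leader + body_nt + poly_a_nt)


def relative_gap(predicted_nt, measured_nt):
    """Fraction by which the measured length exceeds the prediction."""
    return (measured_nt - predicted_nt) / measured_nt


low, high = predicted_length_range(BODY_NT, LEADER_NT_RANGE, POLY_A_NT)
print(f"predicted length: {low}-{high} nt")  # predicted length: 1372-1382 nt
print(f"gap to measured value: {relative_gap(high, MEASURED_NT):.1%} "
      f"to {relative_gap(low, MEASURED_NT):.1%}")
```

The sum gives 1372-1382 nucleotides, which the text rounds to approximately 1400; the remaining difference from the measured value is within what variation in poly(A) tail length and gel-based sizing could plausibly explain.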
Sequence and topology of a model intracellular membrane protein, E1 glycoprotein, from a coronavirus
Cloning and sequencing the nucleocapsid and E1 genes of coronavirus
Sequencing of coronavirus IBV genomic RNA: Three open reading frames in the 5' "unique" region of RNA D
Completion of the sequence of the genome of the coronavirus avian infectious bronchitis virus
Sequence of the nucleoprotein gene from a virulent British field isolate of transmissible gastroenteritis virus and its expression in Saccharomyces cerevisiae
A leader sequence is present on mRNA A of avian infectious bronchitis virus
cDNA cloning and sequence analysis of the gene encoding the peplomer protein of feline infectious peritonitis virus
Intracellular RNAs of the feline infectious peritonitis coronavirus strain 79-1146
Antigenic relationships among homologous structural polypeptides of porcine, feline and canine coronaviruses
The nucleotide sequence of the peplomer gene of porcine transmissible gastroenteritis virus (TGEV): Comparison with the sequence of the peplomer protein of feline infectious peritonitis virus (FIPV)
Characterization and translation of transmissible gastroenteritis virus mRNAs
Sequence analysis of the porcine transmissible gastroenteritis coronavirus nucleocapsid protein gene
Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes
Bifunctional messenger RNAs in eukaryotes
Characterization of leader RNA sequences on the virion and mRNAs of mouse hepatitis virus, a cytoplasmic RNA virus
Sequence of mouse hepatitis virus A59 mRNA 2: Indications for RNA recombination between coronaviruses and influenza C virus
Primary structure of the glycoprotein E2 of coronavirus MHV-A59 and identification of the trypsin cleavage site
High-frequency RNA recombination of murine coronaviruses
Molecular Cloning: A Laboratory Manual
Termination-reinitiation occurs in the translation of mammalian cell mRNAs
Effect of upstream reading frames on translational efficiency in simian virus 40 recombinants
Antigenic relationship of the feline infectious peritonitis virus to coronaviruses of other species
Enteric coronavirus TGEV: Partial sequence of the genomic RNA, its organization and expression
Translation of three mouse hepatitis virus strain A59 subgenomic RNAs in Xenopus laevis oocytes
DNA sequencing with chain-terminating inhibitors
Nucleotide sequence of the gene encoding the surface projection glycoprotein of coronavirus MHV-JHM
Coronavirus JHM: Coding assignment of subgenomic mRNAs
The biology of coronaviruses
Coronavirus MHV-JHM mRNA 5 allows translation of a second, downstream open reading frame
Coding sequence of coronavirus MHV-JHM mRNA 4
Identification of a new gene product encoded by mRNA D of infectious bronchitis virus
Coronavirus mRNA synthesis involves fusion of non-contiguous sequences
Automation of the computer handling of gel reading data produced by the shotgun method of DNA sequencing
Coronavirus multiplication: Location of genes for virion proteins on the avian infectious bronchitis virus genome

We thank Hans Lenstra and Peter Rottier for helpful discussions and for critical reading of the manuscript. This work was supported by a research grant from Duphar BV, Weesp, The Netherlands.