key: cord-0770967-o81j3d1j authors: Page, Kevin W.; Britton, Paul; Boursnell, Michael E. G. title: Sequence analysis of the leader RNA of two porcine coronaviruses: Transmissible gastroenteritis virus and porcine respiratory coronavirus date: 1990 journal: Virus Genes DOI: 10.1007/bf00570024 sha: 4ef6ca984ee03204c7cfb6206fa85e63fc990507 doc_id: 770967 cord_uid: o81j3d1j
The leader RNA sequence was determined for two pig coronaviruses, transmissible gastroenteritis virus (TGEV) and porcine respiratory coronavirus (PRCV). Primer extension of a synthetic oligonucleotide complementary to the 5′ end of the nucleoprotein gene of TGEV was used to produce single-stranded DNA copies of the leader RNA from the nucleoprotein mRNA species of TGEV and PRCV, the sequences of which were determined by Maxam and Gilbert cleavage. Northern blot analysis, using a synthetic oligonucleotide complementary to the leader RNA, showed that the leader RNA sequence was present on all of the subgenomic mRNA species. The porcine coronavirus leader RNA sequences were compared to each other and to published coronavirus leader RNA sequences. Sequence homologies and secondary structure similarities were identified that may play a role in the biological function of these RNA sequences.
Transmissible gastroenteritis virus (TGEV) and porcine respiratory coronavirus (PRCV) belong to the family Coronaviridae, a large group of pleomorphic enveloped viruses with a positive-stranded RNA genome. TGEV causes gastroenteritis in pigs, resulting in high mortality in neonates (1). PRCV was isolated in several European countries between 1984 and 1986 (2-4). It does not cause diarrhea and has been shown to replicate in the respiratory tract with few or no clinical signs, but it is very similar antigenically and serologically to TGEV (2, 4). 
Virions from both viruses contain two envelope glycoproteins of relative molecular mass (Mr) 200,000 (spike) and Mr 28,000-31,000 (membrane protein) and a phosphorylated nucleoprotein of Mr 47,000. cDNA probes to the structural protein genes of TGEV hybridized to the appropriate mRNA species of PRCV, suggesting a high degree of homology at the RNA level (unpublished data). Coronavirus proteins are expressed from a "nested" set of subgenomic mRNAs with common 3' termini but different 5' extensions. The sequence of each mRNA that is translated to produce viral proteins appears to correspond to the 5'-terminal region that is absent from the preceding smaller mRNA species. It has been shown for the coronaviruses mouse hepatitis virus (MHV) and infectious bronchitis virus (IBV) that the subgenomic mRNA species possess short "leader sequences" at their 5' ends. These sequences are not transcribed as a contiguous mRNA species, but are derived from the 5' end of the genomic RNA and are probably joined to the 5' end of each mRNA by a process of discontinuous transcription (5-9). The leader sequence appears to be produced by a mechanism termed leader-primed transcription, in which the leader RNA is transcribed independently, dissociates from the template, and then binds to the template (negative-sense strand) at specific transcriptional start sites (10, 11). The mechanism appears to involve the recognition of consensus sequences identified on the genomic RNA at those points corresponding to the 5' ends of the subgenomic mRNAs. These consensus sequences may act as a binding site for the RNA polymerase-leader complex (7-9, 12-14). It has been previously postulated that a heptameric sequence, ACTAAAC (15-17), or a hexameric sequence, CTAAAC (18-20), may be involved in the binding of the TGEV RNA polymerase-leader complex. 
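The proposed heptameric and hexameric binding sites can be located in any sequence by a simple scan for each motif. The following is a minimal Python sketch and was not part of the original analysis; the function name `find_motif` and the toy sequence are hypothetical and serve only as an illustration.

```python
def find_motif(seq: str, motif: str) -> list[int]:
    """Return the 1-based start positions of every match of motif in seq.

    RNA input is accepted: U is treated as T and case is ignored.
    Overlapping matches are reported.
    """
    seq = seq.upper().replace("U", "T")
    motif = motif.upper().replace("U", "T")
    if not motif:
        raise ValueError("motif must not be empty")
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)          # convert to 1-based numbering
        start = seq.find(motif, start + 1)   # step by one to allow overlaps
    return positions


# Hypothetical toy sequence, not taken from TGEV or PRCV.
toy = "GGAACTAAACTTCTAAACGG"
heptamer_sites = find_motif(toy, "ACTAAAC")
hexamer_sites = find_motif(toy, "CTAAAC")
```

In this toy sequence the first hexamer site lies inside the heptamer site, while the second hexamer site lacks the leading A and is therefore found only by the hexamer search.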
In this paper we describe the elucidation of the leader RNA sequences from the porcine coronaviruses TGEV and PRCV, the first leader sequence to be described from the TGEV serogroup of coronaviruses. Comparison of the leader RNAs of TGEV and PRCV with published leader RNAs of other coronaviruses was used to identify areas of conserved sequence and potential secondary structure that may be involved in the transcription of coronavirus subgenomic mRNA species.
Confluent cultures of a pig kidney cell line LLC-PK1 were infected with a virulent British field isolate of TGEV strain FS772/70 or a British isolate of PRCV strain 86/137004 at a MOI of 1-10 PFU per cell. After 2 hr at 37°C the inoculum was removed and replaced with medium containing 1 µg/ml actinomycin D to inhibit host-cell RNA synthesis (21). After a further 2-hr incubation, 25 µCi of [5,6-3H]uridine (Amersham International plc, TRK.410, 35-50 Ci/mM) was added per culture bottle and the cells were incubated for a further 5 hr. The cells were lysed with guanidinium thiocyanate, the RNA was pelleted through 5.7 M cesium chloride, and poly(A)-containing RNA was isolated by poly(U) Sepharose affinity chromatography, as described previously (21).
Two oligonucleotides were synthesized by the phosphoramidite method using an Applied Biosystems 381A synthesizer. One oligonucleotide, oligo 38 (5'-TGGATTCATCCCCCCAACTA-3'), was complementary to the nucleoprotein gene 22 bp downstream from the initiation ATG codon (15), as shown in Fig. 1, and was used for primer extension. The second oligonucleotide, oligo 58 (5'-AGAGATATAGCCACGCTACACTCACTTTAC-3'), was complementary to the 5' end of the leader RNA (Fig. 1) and was used for Northern blot analysis of viral mRNA. Gel-purified oligo 38 (500 ng) was 5'-end-labeled (22) using 20 U of T4 polynucleotide kinase (Gibco-BRL, Paisley) and 20 µCi [γ-32P]ATP (Amersham International plc, PB 10168, 3000 Ci/mM). 
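Because both oligonucleotides are described as complementary to their targets, the mRNA-sense sequence that each one anneals to is its reverse complement. The sketch below derives those target sequences in Python. The helper names `reverse_complement` and `binding_site` are hypothetical, and the oligonucleotide sequences are those quoted in the text with line-break hyphens removed.

```python
# Translation table for complementing DNA or RNA bases (U pairs like T).
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of seq, written 5' to 3' as DNA."""
    return seq.translate(_COMPLEMENT)[::-1]


def binding_site(oligo: str, target: str) -> int:
    """Return the 1-based start of the site in target that oligo anneals to.

    Returns 0 when no perfectly complementary site is present.
    """
    site = reverse_complement(oligo).upper()
    return target.upper().replace("U", "T").find(site) + 1


OLIGO_38 = "TGGATTCATCCCCCCAACTA"            # primer-extension primer
OLIGO_58 = "AGAGATATAGCCACGCTACACTCACTTTAC"  # Northern blot probe

target_38 = reverse_complement(OLIGO_38)  # sense-strand site in the nucleoprotein gene
target_58 = reverse_complement(OLIGO_58)  # sense-strand site at the 5' end of the leader
```

Only perfect complementarity is considered here; hybridization under the experimental conditions described would tolerate some mismatch, which this sketch does not model.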
Poly(A)-containing RNA (1.5 µg) isolated from TGEV- and PRCV-infected cells was resuspended in water and heated at 60°C for 3 min. A further incubation was carried out using the two mRNA preparations in 27 µl reaction volumes containing 40 U of RNasin (Promega Biotec, Liverpool), 50 mM Tris-HCl (pH 8.3), 10 mM MgCl2, 35 mM KCl, 30 mM 2-mercaptoethanol, 3 mM dithiothreitol, 4 mM dNTPs, 5'-end-labeled oligo 38 (120 ng), and 21 U of AMV reverse transcriptase (Super-RT, Anglian Biotech Ltd, Colchester) for 90 min at 42°C. Formamide dye (80% formamide, 10 mM NaOH, 1 mM EDTA, 0.1% xylene cyanol, 0.1% bromophenol blue) was added and the mixture was boiled for 3 min and electrophoresed on a 40 cm buffer gradient sequencing gel (23). The wet gel was autoradiographed for 1 hr to locate the primer-extended products, which were excised from the gel. The labeled fragments were eluted from the polyacrylamide gel and chemically cleaved (24). Samples of the cleaved products from each of the primer-extended products were electrophoresed on 6% polyacrylamide gels at 35 W constant power for two different lengths of time.
TGEV and PRCV poly(A)-containing RNA was glyoxylated and separated on a 1% agarose gel (22). The RNA was transferred onto Biodyne A membranes (Pall P/N BNNG3R 1.2 µm, Gallenkamp) in 20× SSC (1× SSC = 0.15 M NaCl, 0.015 M trisodium citrate, pH 7.0) for 18 hr and baked at 80°C for 2 hr. The membrane was boiled in 50 mM Tris-HCl pH 8.0 for 5 min to remove glyoxal groups from the RNA and prehybridized in the presence of 50% formamide for 6 hr at 42°C (15). The viral mRNA species were hybridized with 32P-labeled oligo 58 in the presence of 50% formamide for 18 hr at 42°C. The membrane was washed four times in 2× SSC containing 0.1% NaDodSO4 for 15 min at room temperature and autoradiographed. 
Following primer extension, using oligo 38 at the 5' end of the nucleoprotein gene from the porcine coronaviruses TGEV and PRCV, labeled fragments of approximately 140 bases were produced and purified from gels. Larger molecular weight species were also observed (data not shown) in minor amounts, presumably corresponding to read-through sequences upstream of the nucleoprotein gene primed from the larger mRNA species. The nucleotide sequences of the two fragments, determined by chemical cleavage, were identical. The resulting nucleotide sequence of the TGEV leader RNA is shown in relation to the TGEV nucleoprotein gene in Fig. 1. The leader RNA sequence diverges from the genomic sequence 15 bp upstream of the nucleoprotein gene, corresponding to the first nucleotide of the membrane protein gene stop codon (16), indicating a length of 91 nucleotides of unique sequence (Fig. 1). The 91-nucleotide leader sequence of TGEV and PRCV has a low content of G (18%) and C (20%), and a high A (22%) and T (40%) content, with 20% of the T residues grouped in three- to four-nucleotide motifs (Fig. 1). These values are similar to those observed for the TGEV genome so far sequenced, except that the values for A (30.5%) and T (32.1%) are closer to each other in the genome than in the leader sequence. Analysis of the TGEV nucleoprotein nucleotide sequence (15) revealed a potential RNA polymerase-leader complex binding site. The site, ACTAAAC, is seven nucleotides upstream of the nucleoprotein initiation codon and has also been found to precede all the TGEV structural protein genes and two of the three potential genes shown to be at the 5' end of mRNA species (15-17). This consensus sequence is found two nucleotides downstream of the nucleotide where the leader RNA and TGEV genomic sequences diverge, suggesting that this sequence is involved in the leader-primed transcription of TGEV mRNA molecules. As can be seen from Fig. 
2, 4 of the 6 mRNA species from the FS772/70 strain of TGEV have the sequence AACTAAAC, of which the 5'-end adenosine residue is the next base down from the divergence point. In fact, the consensus sequence at the spike/ORF1-ORF2 gene junction has the sequence GAACTAAAC and at the NUC/ORF4 gene junction has the sequence CGAACTAAAC, indicating that the region of the leader sequence 5' to the homology motif, ACTAAAC, may vary between 89 and 91 nucleotides depending on the TGEV gene. Computer analysis has also detected a homology between the leader RNA sequence and the 5' end of the negative strand (i.e., the reverse complement of the noncoding region at the 3' end of the positive strand). This is shown in Fig. 3. The nucleotides on the leader RNA sequence, bases 84-99, and on the negative strand, bases 136-152 counting from the first base after the poly(A) tail, have an overall homology of 82% and include the sequence CTAAAC, which is part of the postulated TGEV RNA polymerase-leader complex binding site. This is very similar to the observation for IBV (25) involving sequences present at the 5' end of the IBV genome, and on the IBV leader RNA sequences, with the 5' end of the IBV negative strand. The homology observed included the sequence CTTAAC, which is part of the postulated IBV RNA polymerase-leader complex binding site CT(T/G)AACAA.
An oligonucleotide, oligo 58, was synthesized that was complementary to the 5' end of the TGEV and PRCV leader RNA sequences (Fig. 1). The oligonucleotide was end-labeled and used to probe TGEV and PRCV mRNA species that were Northern blotted onto Biodyne membranes. As can be seen from Fig. 4, the labeled probe hybridized to all of the TGEV and PRCV mRNA species. 
The intensity of the bands corresponding to labeled probe hybridized to the spike mRNA species and genomic RNA was lower than that observed for the smaller mRNA species, because less of these larger species was recovered from the poly(U) Sepharose column used in the isolation of mRNA. The fact that the probe hybridized to all of the mRNA species showed that the leader RNA sequence was present on the other RNA molecules of TGEV and both strains of PRCV and was not unique to the nucleoprotein mRNA species. The two porcine coronavirus leader sequences were identical, indicating that the two viruses probably use the same RNA polymerase-leader complex binding site, ACTAAAC, for the synthesis of subgenomic mRNA species.
The SEQHP comparison program of the Los Alamos (26) package was used to compare the leader RNA sequences determined in this paper and those published for five other coronaviruses belonging to two different serogroups. The sequences were compared from the 5' ends to the point of divergence from the genomic sequences. The percentage homologies (Table 1) were expressed as the number of bases matched relative to the length of the longer of the two sequences being compared. The homology of the leader sequences fell into three groups. Leader RNAs from coronaviruses belonging to different serological groups had homologies in the region of 35-40%. Serologically related viruses like human coronavirus (HCV) (strain OC43) and MHV (strains A59 and JHM) have about 60% homology. The third group involved different strains of MHV, A59 and JHM, which showed a homology of 91%. This observation indicates that TGEV and PRCV, which have a homology of 100%, are probably different strains of the same virus or that PRCV has very recently diverged from TGEV. In order to identify common areas of homology, the leader RNA sequences from seven coronaviruses were aligned. As can be seen from Fig. 5, these fell into two groups. 
One group consists of MHV (strains A59 and JHM) with HCV (OC43), which have a fairly high degree of homology along their lengths. The other group consists of TGEV and PRCV (not shown on the diagram) with HCV (229E) and IBV, which have high homologies at their 3' ends and areas of homology at their 5' ends. Between the groups there are good homologies toward the 3' ends, involving the postulated RNA polymerase-leader complex binding sites and sequences upstream of these sites, but very little if any homology between the 5' ends.
[Legend fragment: ... (7) and strain JHM (13); avian, IBV strain Beaudette (9, 25).]
As seen from Fig. 5, simple alignment did not reveal very much information about the homologies of the leader RNA sequences from the different coronaviruses, except at the 3' ends involving the consensus sequences. In order to identify any potential similarities in these sequences, the secondary structures of the RNA sequences in Fig. 5 were analyzed. Potential secondary structures of the leader RNA sequences were determined using the computer program FOLD (27) from the UWGCG DNA analysis programs (28). The coordinates determined by the FOLD program were displayed graphically using the UWGCG program SQUIGGLES. The potential secondary structures obtained were compared and, as can be seen from Fig. 6, the overall shapes of these sequences are very similar, except for the avian coronavirus IBV. All the molecules appear to be composed of two stem-loop structures. The two MHV molecules are very similar in shape and, as seen from Fig. 5 and Table 1, are highly homologous (91%) at the base sequence level. The secondary structures of the coronavirus leader RNA sequences are probably influenced by their biological function, which results in the similarity of these potential structures.
This paper presents evidence that the nucleoprotein mRNA species of TGEV and the closely related porcine respiratory variant of TGEV, PRCV, contain an identical leader RNA sequence of about 91 nucleotides. 
Sequencing studies on TGEV have shown that the heptameric sequence ACTAAAC occurs on the genome upstream of the genes and is believed to be the binding site for the leader of the genomic RNA. This mechanism has been termed leader-primed transcription and involves not only the leader RNA primer, but also consensus sequences along the genome found upstream of the genes, which act as binding sites for the leader RNA primer. Comparison of TGEV and PRCV viral products has shown very little difference between the two coronaviruses, and until recently it was impossible to differentiate between the two viruses using antisera. PRCV is fully neutralized by antisera prepared against TGEV, and the majority of monoclonal antibodies (MAbs) raised against TGEV virion proteins cross-react with PRCV. However, MAbs raised against antigenic determinants of the spike protein from either the virulent British isolate FS772/70 (29) or the avirulent Purdue strain of TGEV (30) have been identified that do not recognize PRCV. These observations and the fact that the leader RNA sequences from TGEV and PRCV are identical support the evidence that the two viruses are very similar and that PRCV may have evolved as a TGEV variant. Comparison of the TGEV leader RNA sequence with the genomic sequence upstream of the nucleoprotein gene indicates that the length of the unique sequence of the leader is 91 nucleotides. The point of divergence is two bases upstream of the ACTAAAC sequence, supporting the evidence that the TGEV RNA polymerase-leader complex binding site is ACTAAAC. Four out of the six mRNA species from the FS772/70 strain of TGEV have the sequence AACTAAAC, and the 5'-end adenosine residue is the next base down from the divergence point in the nucleoprotein mRNA (Fig. 2). The differences in the homologies between the leader RNA and sequences upstream of the consensus sequence on the genomic RNA may play a role in the levels of transcription of a particular mRNA species. 
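The arithmetic that links the junction sequences to a unique leader length of 89 to 91 nucleotides can be written out as a short Python sketch. It rests on two assumptions that are inferred from the text rather than read from Fig. 1: that the leader ends with `CGAACTAAAC` (the longest junction match quoted in the Results), and that the ACTAAAC motif ends at leader position 99 (91 unique bases, one shared A, and the seven-base motif). The names `matched_upstream` and `unique_leader_length` are hypothetical.

```python
MOTIF = "ACTAAAC"
LEADER_3PRIME = "CGAACTAAAC"  # assumption: inferred from the junctions quoted in the text
LEADER_END = 99               # assumption: 91 unique + 1 shared A + 7 motif bases


def matched_upstream(junction: str) -> int:
    """Count the bases directly 5' of MOTIF that the junction shares with the leader.

    junction is genomic sequence that ends with MOTIF. At most the three
    leader bases preceding the motif can be compared.
    """
    junction = junction.upper().replace("U", "T")
    if not junction.endswith(MOTIF):
        raise ValueError("junction must end with the consensus motif")
    leader_up = LEADER_3PRIME[: -len(MOTIF)]
    junction_up = junction[: -len(MOTIF)]
    count = 0
    # Walk 5'-ward from the motif until the first mismatch.
    for leader_base, genome_base in zip(reversed(leader_up), reversed(junction_up)):
        if leader_base != genome_base:
            break
        count += 1
    return count


def unique_leader_length(junction: str) -> int:
    """Length of leader sequence 5' of the region shared with this junction."""
    return LEADER_END - len(MOTIF) - matched_upstream(junction)


lengths = {j: unique_leader_length(j) for j in ("AACTAAAC", "GAACTAAAC", "CGAACTAAAC")}
```

Under these assumptions the three junction types quoted in the Results give unique leader lengths of 91, 90, and 89 nucleotides, which matches the stated range.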
The mRNA species of 3.0 kb has been shown to have an open reading frame at the 5' end encoding a potential polypeptide of Mr 9200 (17). This particular mRNA does not have the heptameric consensus sequence but has the hexameric CTAAAC sequence, and it is interesting to note that it is the least abundant TGEV mRNA species (observed from TGEV mRNA in total cell lysates). Hybridization of oligo 58 to the 3.0-kb mRNA species showed that this species does contain the TGEV leader RNA, confirming that it is a true mRNA species, even though it is the only TGEV species not to have the heptameric consensus sequence. Comparison of the seven coronavirus leader RNA sequences against each other identified three groups (Table 1): non-serologically related viruses had about 35-40% homology; serologically related viruses had about 60% homology; viral strains had about 90-100% homology. However, TGEV and HCV (229E), although placed in the same serological group, have only 36% homology within their leader RNA sequences, suggesting that the two viruses are not particularly closely related. TGEV and HCV (229E) have been shown to have 46% homology at the amino acid level within their derived nucleoprotein sequences (31), whereas the homology between the derived nucleoprotein amino acid sequences for different viruses within the MHV serological group is between 80% and 98%. This indicates that the serological grouping of coronaviruses is not a particularly useful test of relatedness, as similar epitopes may exist on the viral structural proteins. Comparisons of nucleic and amino acid sequences from the viruses will provide a more accurate method for grouping the viruses. It will be interesting to compare the leader sequences of bovine coronavirus (BCV), which is serologically related to HCV (OC43) and MHV (A59 and JHM), with feline infectious peritonitis virus (FIPV) and canine coronavirus (CCV), which are serologically related to TGEV, once their sequences have been determined. 
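The homology measure used for Table 1, matched bases relative to the longer of the two sequences, and the three-way grouping described above can be sketched as follows. This is an illustrative Python sketch and not the SEQHP program: it assumes the two sequences have already been aligned with `-` as the gap character, and the cutoffs of 90% and 50% in `homology_group` are hypothetical values chosen only to separate the three ranges reported in the text.

```python
def percent_homology(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Matched bases as a percentage of the longer ungapped sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    a = aligned_a.upper()
    b = aligned_b.upper()
    # A column counts as a match only when both rows hold the same base.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    longer = max(len(a.replace(gap, "")), len(b.replace(gap, "")))
    if longer == 0:
        raise ValueError("sequences contain no bases")
    return 100.0 * matches / longer


def homology_group(pct: float, strain_cutoff: float = 90.0,
                   related_cutoff: float = 50.0) -> str:
    """Assign a homology value to one of the three groups (cutoffs are assumptions)."""
    if pct >= strain_cutoff:
        return "strains of one virus"
    if pct >= related_cutoff:
        return "serologically related"
    return "different serogroups"


# Toy alignment, not taken from Fig. 5: six matched columns, longer sequence of 7 bases.
toy_pct = percent_homology("ACTAAAC-", "-CTAAACA")
```

With these cutoffs the 36% value reported for TGEV and HCV (229E) falls in the "different serogroups" class, which is the discrepancy with serological grouping that the text points out.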
The large variation in sequence length and content made the alignment of the different leader sequences difficult. However, alignment of the six different coronaviruses revealed that they fell into two groups. There appears to be some conservation of short sequence motifs between the seven leader sequences. Toward the 3' end of the sequences, a TAG motif is conserved in all the leaders, followed by a string of Ts. In five out of seven of the sequences, this motif is TAGANNTT. About ten nucleotides downstream of this region is a conserved CT motif, which is followed by a series of nucleotides differing in number, depending on the coronavirus, followed by the postulated RNA polymerase-leader complex binding site. The largest number of nucleotides between the CT motif and the consensus sequence is found in TGEV and PRCV; the smallest is found in HCV (229E) and IBV. It is interesting to note that there is a five-base insert in MHV strain JHM when compared to MHV strain A59, which is also present in HCV (OC43) within this region. All the mammalian coronaviruses appear to have the motif CTAAAC, except HCV (OC43), which has CTAAAT. Recent sequence data suggest that coronaviruses FIPV and BCV have ACTAAAC as their mRNA consensus sequence. Upstream of the TAG motif there is an ACT motif occurring in six out of seven sequences. Toward the 5' end of the leader RNA sequences, the homologies are patchy and limited to short matches, occurring only between pairs of sequences. The area upstream of the consensus sequence has been suggested to be involved in the binding of nucleoprotein to the leader RNA sequence at nucleotides 56-65 in MHV (32). It was suggested that mRNA species and genomic RNA form a complex with the nucleoprotein by the protein binding to or near the leader sequence attached to the RNA molecules (33). Secondary structure analysis of the leader RNA sequences showed that all the sequences except for IBV possess a putative double stem-loop structure (Fig. 6). 
In the case of the mammalian coronaviruses, the consensus sequences and upstream regions of homology are on the second stem-loop structure, leaving the possibility that the RNA-dependent RNA polymerase could interact with the first stem-loop structure. The IBV consensus sequence is present on the free 3' end of the single stem-loop structure, possibly leaving the single stem-loop structure to interact with the polymerase.
We thank Miss K. Mawditt, of this laboratory, for synthesizing oligos 38 and 58 and Dr. S. F. Cartwright, Central Veterinary Laboratory, Weybridge, for PRCV strains 86/137004 and 86/135308. This work was supported by a research contract from the Biomolecular Engineering Programme of the Commission of the European Communities, contract No. BAP-0235-UK(HI).