key: cord-344558-1jgqofbr
authors: Kocherhans, Rolf; Bridgen, Anne; Ackermann, Mathias; Tobler, Kurt
title: Completion of the Porcine Epidemic Diarrhoea Coronavirus (PEDV) Genome Sequence
date: 2001
journal: Virus Genes
DOI: 10.1023/a:1011831902219
sha:
doc_id: 344558
cord_uid: 1jgqofbr
The sequence of the replicase gene of porcine epidemic diarrhoea virus (PEDV) has been determined. This completes the sequence of the entire genome of strain CV777, which was found to be 28,033 nucleotides (nt) in length (excluding the poly A-tail). A cloning strategy using primers based on conserved regions in the predicted ORF1 products of other coronaviruses with known genome sequences was used to amplify the equivalent, but as yet unknown, sequence of PEDV. Primary sequences derived from these products were used to design additional primers, resulting in the amplification and sequencing of the entire ORF1 of PEDV. Analysis of the nucleotide sequences revealed a small open reading frame (ORF) located near the 5′ end (nt 99–137), and two large, slightly overlapping ORFs, ORF1a (nt 297–12650) and ORF1b (nt 12605–20641). The ORF1a and ORF1b sequences overlapped at a potential ribosomal frame shift site. The amino acid sequence analysis suggested the presence of several functional motifs within the putative ORF1 protein. By analogy to other coronavirus replicase gene products, three protease motifs and one growth factor-like motif were seen in ORF1a, and one polymerase domain, one metal ion-binding domain, and one helicase motif could be assigned within ORF1b. Comparative amino acid sequence alignments revealed that PEDV is most closely related to human coronavirus (HCoV)-229E and transmissible gastroenteritis virus (TGEV) and less related to murine hepatitis virus (MHV) and infectious bronchitis virus (IBV). These results thus confirm and extend the findings from sequence analysis of the structural genes of PEDV.
Porcine epidemic diarrhoea virus (PEDV) is a causative agent of diarrhoea in pigs, particularly in neonates. The disease has been recognised for approximately thirty years, but the causative virus was first described only in 1978 [1], and another ten years elapsed before a method was developed for propagating the virus in cell culture [2]. During this time, outbreaks of the disease were reported from numerous European countries as well as Korea, China and Japan. The epidemiology and pathogenesis of the disease have been well described by Pensaert [3]. The biological behaviour, electron microscopic appearance and polypeptide structure of PEDV resulted in its provisional classification as a coronavirus [2, 4, 5].
Coronaviruses belong to the taxonomic order Nidovirales and contain a single-stranded RNA genome of positive polarity, which is approximately thirty kilobases in length. The genes encoding the structural proteins are located at the 3′ end of the genome. An astonishing two-thirds of the genome consists of the replicase gene, which is located at the 5′ end of the genome. The replicase proteins are encoded by ORF1a and ORF1b. These two long, slightly overlapping ORFs are connected by a ribosomal frame shift site in all coronaviruses sequenced to date. This site regulates the ratio of the polypeptide encoded by ORF1a to the readthrough product encoded by ORF1ab. About 70–80% of the translation products are terminated at the end of ORF1a, and 20–30% continue to the end of ORF1b. The polypeptides are post-translationally processed by virally encoded proteases (reviewed in [6]). These proteases are encoded within ORF1a; the polymerase and helicase functions are encoded by ORF1b. We have previously completed the sequencing of the nucleocapsid (N), membrane (M), small membrane (E), ORF3 and spike (S) genes of the PEDV strain CV777 [7–9].
The alignment of the deduced amino acid sequences indicated that PEDV occupies an interesting intermediate position between the two well-characterized members of the group I coronaviruses, transmissible gastroenteritis virus (TGEV) and human coronavirus (HCoV)-229E. In this study, we have continued to determine and analyse nucleotide sequences of PEDV. To our knowledge, only two group I coronaviruses have been sequenced completely, HCoV-229E and TGEV [10, 11]. In addition, two strains of mouse hepatitis virus (MHV), JHM and A59, which belong to the group II coronaviruses, and infectious bronchitis virus (IBV) have been completely sequenced [12–15]. Therefore, the sequence presented in this paper is the sixth coronavirus sequence covering the entire genome.
Growth of the cell-adapted PEDV strain CV777 was performed essentially as described elsewhere [2, 8], except that virus-infected cells were harvested at approximately 18 h post infection. Cells were freeze-thawed three times and cell debris was removed by low-speed centrifugation. Virus was pelleted by centrifugation for 2 h at 22,000 rpm and 4 °C in a SW28 rotor of a Beckman centrifuge. Virus pellets prepared from two 175 cm² flasks were pooled and resuspended in 1 ml Trizol™ (Gibco-BRL), and RNA was prepared as recommended by the manufacturer.
In order to obtain the first partial PEDV-specific sequences, the predicted amino acid sequences of the HCoV-229E and TGEV polymerase ORFs were aligned and homologous regions identified. The homologous regions were used to design degenerate primers [9] that were used for RT-PCR amplifications. These initial amplicons were cloned and sequenced [9]. Later, a mixture of up to six antigenome-sense primers, based on PEDV-specific sequences or on the degenerate primers, together with random hexamer primers (purchased from Schmidheini AG; Balgach, Switzerland), was used for first-strand cDNA synthesis.
RNA prepared from two 175 cm² flasks of virus-infected cells was denatured for 10 min at 65 °C, and first-strand cDNA synthesis was performed in a 20 µl total reaction volume using SuperscriptII™ (GibcoBRL; Basel, Switzerland) according to the manufacturer's protocol. The protocol was modified to obtain longer reverse transcription products by including a denaturation step at 95 °C for 5 min after the first 1 h incubation at 42 °C, followed by the addition of 1 µl SuperscriptII™ and a second extension step of 1 h at 42 °C. Template RNA was digested by adding 1 µl RNaseH (GibcoBRL; Basel, Switzerland) to the reaction mix and incubating at 37 °C for 20 min.
PCR amplification was performed as described elsewhere. In brief, Pfu DNA polymerase (Stratagene; Basel, Switzerland) was used for the amplifications, which were performed on a DNA Engine (MJ Research) machine. PCR fragments were subsequently cloned into pBluescript® II KS or pUC19 vectors using standard procedures. The nucleotide sequence was determined from these cDNA clones. Direct sequencing was performed on an RT-PCR product (see Fig. 1B), which was purified through an agarose gel. The contigs of the sequence determinations were constructed using SeqMan (DNA*, Lasergene, Madison WI, USA). We previously reported the determination of the PEDV leader sequence on the mRNA encoding the N gene [16]. This sequence was used for primer design in order to amplify the 5′ end of the genome. The leader sequence was used for the in silico construction of the genomic RNA sequence, which is available in the GenBank database (Accession Number AF353511).
Virus sequences covering replicase genes were obtained from the GenEMBL sequence database. The files with the accession numbers X69721, Z34093, AF029248, and M95169 for HCoV-229E, TGEV (Purdue 115), MHV-A59, and IBV (Beaudette), respectively, were used.
The deduced amino acid sequences were compared as indicated in the text using PILEUP and GAP (GCG Package version 10.0; Madison, WI, USA). The files generated by PILEUP were used in DISTANCES (GCG Package version 10.0; Madison, WI, USA) to determine the Kimura protein sequence distances, which were subsequently used for the construction of unrooted dendrograms using TreeGen on the CBRG server (http://cbrg.inf.ethz.ch/).
The cloning approach we used previously to clone the PEDV M and N genes involved designing primers based on conserved regions of the coronavirus M and N genes to amplify the equivalent, as yet unknown, PEDV sequence. In this study, we employed this technique to clone parts of the ORF1 of PEDV. Such a method is useful for viruses that do not grow to high titre, avoids lengthy screening of clones and could potentially be applied to the cloning of any group I coronavirus. However, the large size of ORF1 and the paucity of sequence data from other coronaviruses made this an ambitious objective. A number of conserved functional domains were identified in the predicted ORF1 products, but these domains are mainly located in the ORF1b region, leaving large regions of the ORF1a product with no known function and only a low level of sequence conservation between different coronavirus genomes.
In order to clone and determine the sequences of the PEDV ORF1, the predicted amino acid sequences of the HCoV-229E and TGEV ORF1 were aligned and homologous regions identified. The HCoV-229E and TGEV ORFs were sufficiently closely related to allow complete alignment of the predicted expression products. In contrast, the MHV and IBV sequences were much more divergent and could only be aligned with the group I sequences in some of the conserved regions. Degenerate primers were designed from regions conserved between the HCoV-229E and TGEV ORF1 and, where possible, the MHV and IBV ORF1. These primers were used both to prime reverse transcription and for the PCR amplifications.
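The principle behind designing degenerate primers from a conserved amino acid stretch can be illustrated with a minimal sketch. It is not the primer design procedure used in this study (the actual primers are described in [9]); it assumes the standard genetic code and simply collapses all codons of each residue, position by position, into IUPAC ambiguity codes. The function name `back_translate` is hypothetical, and the GDD tripeptide is used only because it is named in the text as the polymerase motif of most RNA viruses.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (third base fastest).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC nucleotide ambiguity codes, keyed by the set of bases they represent.
IUPAC = {frozenset(k): v for k, v in {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}.items()}


def back_translate(peptide):
    """Return (degenerate DNA sequence, degeneracy) for a peptide.

    Each codon position is collapsed independently, so residues with
    six codons (Ser, Leu, Arg) yield more combinations than strictly
    needed; real primer design would usually avoid such residues.
    """
    primer = []
    degeneracy = 1
    for aa in peptide.upper():
        codons = [c for c, a in CODON_TABLE.items() if a == aa]
        if not codons:
            raise ValueError(f"unknown amino acid: {aa!r}")
        for pos in range(3):
            bases = frozenset(c[pos] for c in codons)
            primer.append(IUPAC[bases])
            degeneracy *= len(bases)
    return "".join(primer), degeneracy


if __name__ == "__main__":
    seq, n = back_translate("GDD")
    print(seq, n)
```

For the tripeptide GDD this yields the sequence GGNGAYGAY, a mixture of 16 oligonucleotides. Conserved stretches rich in Met, Trp, Asp or similar low-degeneracy residues keep the mixture small, which is one reason such regions are attractive primer targets.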
Sequence data derived from these PCR products allowed us to design sequence-specific primers, which were then used to amplify the entire ORF1 (see Fig. 1B). Numerous small cDNA clones, five large cDNA clones and one RT-PCR product covering the 5′ two-thirds of the PEDV genome were used to determine the nucleotide sequence of the PEDV ORF1 (Fig. 1). This analysis completes the nucleotide sequence of PEDV, which is thereby the sixth entire sequence determined from a coronavirus genome [10–13, 15]. The genome of PEDV (CV777), excluding the poly A-tail, is 28,033 nt in length.
Analysis of the newly determined nucleotide sequence revealed a pattern of ORFs typical of coronaviruses. A small ORF with the potential to code for a 12-amino acid peptide was found at the 5′ end of the genome at nucleotide positions 99–137. Such small ORFs (uORFs) are present in all coronaviruses sequenced so far. The uORFs of HCoV-229E [17] and IBV [15] are eleven codons in length, while that of MHV is eight codons long [18, 19]. That of TGEV can only encode a three-amino acid peptide [20]. Two long ORFs of 12354 and 8037 nt, which overlap by 46 nt, covered most of the newly determined sequence. By analogy to published coronavirus sequences [15, 17, 20], the ORFs were designated ORF1a and ORF1b. The predicted ORF1a of PEDV extended from nucleotide 297 to 12650. This resulted in a 4117-codon ORF. The overlapping ORF1b, starting at nucleotide 12605 and ending at nucleotide 20641, had the capacity to code for 2678 amino acids. It has been proposed for coronaviruses and other members of the order Nidovirales [21] that the nucleotide sequences in the overlapping regions of ORF1a and ORF1b are able to fold into a pseudoknot tertiary structure [22, 23]. This region allows the ribosome to shift reading frame during translation of ORF1a and then to continue translation in ORF1b.
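The ORF coordinates given above can be cross-checked with simple arithmetic. The following sketch uses only the positions reported in the text and assumes that they are 1-based and inclusive and that each end coordinate includes the stop codon (an assumption; the text does not state this explicitly). The helper names are illustrative.

```python
# Coordinates (1-based, inclusive) as reported in the text for PEDV CV777.
ORFS = {
    "uORF": (99, 137),
    "ORF1a": (297, 12650),
    "ORF1b": (12605, 20641),
}


def orf_length(start, end):
    """Length in nucleotides of a 1-based, inclusive interval."""
    return end - start + 1


def triplets(start, end):
    """Return (number of complete triplets, leftover nucleotides)."""
    return divmod(orf_length(start, end), 3)


def overlap(a, b):
    """Number of nucleotides shared by two intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def frame_offset(a, b):
    """Reading frame of interval b relative to interval a (0, 1 or 2)."""
    return (b[0] - a[0]) % 3


if __name__ == "__main__":
    for name, (start, end) in ORFS.items():
        n, rest = triplets(start, end)
        print(f"{name}: {orf_length(start, end)} nt, "
              f"{n} triplets, remainder {rest}")
    print("ORF1a/ORF1b overlap:", overlap(ORFS["ORF1a"], ORFS["ORF1b"]), "nt")
    print("ORF1b frame relative to ORF1a:",
          frame_offset(ORFS["ORF1a"], ORFS["ORF1b"]))
```

Under these assumptions the lengths are 39, 12354 and 8037 nt, each an exact multiple of three, and the overlap is 46 nt, in agreement with the text. The triplet counts (13, 4118 and 2679) are one higher than the stated 12, 4117 and 2678 residues, which is consistent with a terminal stop codon being counted in each interval. The frame offset of 2 (equivalent to −1 modulo 3) shows that ORF1b lies in a different reading frame from ORF1a, as expected for translation via a ribosomal frame shift.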
The function of these RNA structures as ribosomal frame shift sites was demonstrated for the analogous sequences of IBV [24] and HCoV-229E [25]. It seems likely that the translation of the PEDV ORF1b is mediated by such ribosomal frame shifting. The nucleotide sequences of PEDV, HCoV-229E, and TGEV covering the ribosomal frame shift site are more similar to each other than to those of MHV-A59 or IBV. In order to identify the sequence that could be involved in the formation of the tertiary structure, the nucleotide sequences covering the end of ORF1a and the beginning of ORF1b from HCoV-229E [25] and TGEV [20] were aligned with the corresponding sequence of PEDV. Fig. 2A shows the predicted frame shift region of PEDV based on this comparison. The so-called slippery site (UUUAAAC), at which frame shifting occurs, is identical in all coronaviruses sequenced so far. The stems and loops required to provide the tertiary structure of the frame shift regions of TGEV and HCoV-229E were compared, and Fig. 2B shows the predicted tertiary structure required for the frame shift of PEDV based on this comparison.
Pairwise comparison of the deduced amino acid sequences (using GAP) revealed that ORF1b of PEDV is more conserved than ORF1a relative to the corresponding sequences of other coronaviruses. The percentages of similarity and identity are shown in Table 1. The putative protein sequence of ORF1a was most similar to the sequence of ORF1a of HCoV-229E (59.4%) and less similar to the corresponding ORF1a of TGEV (52.1%), MHV-A59 (39.5%) and IBV (38.7%). The same relationship, but at a higher level of similarity, was true for the deduced amino acid sequence of the predicted PEDV ORF1b. It was most similar to the amino acid sequences of HCoV-229E ORF1b and TGEV ORF1b (83.2% and 80.3%, respectively). The similarity to the ORF1b of MHV-A59 and IBV was around 64%.
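Locating the slippery site described above within a sequence is a simple pattern search. The sketch below is illustrative only: the function name `find_slippery_sites` and the short test sequence are hypothetical, and the real genomic sequence (GenBank AF353511) is not reproduced here. For the PEDV genome one would restrict the search to the ORF1a/ORF1b overlap (nt 12605–12650).

```python
import re

# Heptanucleotide slippery site quoted in the text.
SLIPPERY_SITE = "UUUAAAC"


def find_slippery_sites(seq, start=1, end=None):
    """Return 1-based positions of the slippery site within seq[start..end].

    Accepts RNA or DNA (T is treated as U). The start and end arguments
    are 1-based and inclusive; only sites lying entirely inside that
    window are reported, in coordinates of the full sequence.
    """
    rna = seq.upper().replace("T", "U")
    end = len(rna) if end is None else end
    region = rna[start - 1:end]
    return [m.start() + start for m in re.finditer(SLIPPERY_SITE, region)]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the PEDV genome.
    toy = "GGCA" + "UUUAAAC" + "GGGUACG"
    print(find_slippery_sites(toy))
```

With the toy sequence the site is reported at position 5. A hit alone does not establish a functional frame shift signal; as described in the text, the downstream stems and loops that form the tertiary structure are also required.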
The deduced amino acid sequences of ORF1a and ORF1b from PEDV were aligned with the corresponding sequences of HCoV-229E, TGEV, MHV-A59, and IBV using PILEUP. The degrees of amino acid homology are presented graphically as dendrograms (Fig. 3A, B). The multiple sequence alignments revealed several putative functional domains common to coronavirus sequences [23, 26] located on the deduced amino acid sequence of ORF1ab of PEDV. Some of these had been used to design the primers for the RT-PCR amplification.
In the ORF1a region the following motifs were observed. Two motifs indicative of papain-like proteases (Plp) were present at amino acid positions 1077–1266 and 1716–1917. The Plp motif is found twice in the replicase genes of HCoV-229E, TGEV and MHV, but only once in that of IBV. In this respect, PEDV resembles HCoV-229E, TGEV and MHV rather than IBV. A highly conserved region (X-domain) was found between the two Plp motifs. Although this motif is present in all coronavirus sequences, its function is not yet known. A picornavirus 3C-like (3Cl) protease domain is located between amino acids 2998 and 3299 of the PEDV ORF1a. All corona- and arteriviruses encode this motif, which is the main protease for the coronavirus-mediated processing of the polyproteins. Three markedly hydrophobic domains conserved among coronaviruses are found in ORF1a. The first is located after the second Plp motif and the others flank the 3Cl motif. Finally, a growth factor-like (Gfl) domain was located close to the end of ORF1a (amino acid positions 3965–4000).
In the ORF1b region, three structural protein motifs could be recognized, all of which play a role in viral replication. A sub-sequence at amino acid positions 4636–4939 containing the characteristic tripeptide SDD (or GDD in most RNA viruses) [26] is probably the active site of the RNA-dependent RNA polymerase.
A metal ion-binding domain covering amino acids 5027–5103 and a helicase motif at amino acid positions 5309–5624 were also observed in the PEDV ORF1b product. Alignments of the deduced amino acid sequences of the 3Cl protease and the polymerase motif from five different coronaviruses are shown in Fig. 4A and 4B, respectively. The findings concerning conserved domains are summarised in Fig. 1A.
A deletion of about 180 amino acids located between the X-domain and the second Plp motif in the putative ORF1a sequence of TGEV, compared to that of HCoV-229E, was reported by Eleouet et al. [20]. This additional sequence was present in the PEDV ORF1a product. The alignment (using GAP) of the HCoV-229E and PEDV amino acid sequences revealed 42.5% similarity and 31.5% identity in this region.
Earlier sequence analysis of PEDV based on the structural protein sequences has shown that PEDV is most closely related to HCoV-229E and TGEV [7–9, 27], less related to MHV-A59, and least related to IBV. However, it was not possible to determine the relative similarities of HCoV-229E, TGEV and PEDV. In this study, the similarities and identities of the amino acid sequence alignments based on ORF1a and ORF1b show clearly that PEDV is most closely related to HCoV-229E and, moreover, that HCoV-229E is more similar in sequence to PEDV than it is to TGEV.
In addition to the sequence analysis, the work presented here offers various possibilities for future research on coronaviruses. Functional analysis and processing of the as yet uncharacterised PEDV ORF1 is now possible. Recently, Almazan et al. and Yount et al. achieved the generation of infectious TGEV from cDNA [28, 29], and Thiel et al. succeeded in generating full-length cDNA clones of HCoV-229E and IBV in a recombinant vaccinia virus system [30].
The sequence and the cDNA clones covering the entire genome of PEDV would allow the development of a mini-genome system to study viral replication or the generation of an assembled, infectious cDNA clone. Bearing in mind the close relationship of PEDV and HCoV-229E, the latter approach could be used to exchange functional parts of these viruses to gain new insights into their biology.
The authors thank Christa Meyer for excellent technical assistance. These studies were supported by the Swiss National Science Foundation, grant #31-43503.95.