key: cord-0910598-ay4kqfne authors: Penzes, Zoltan; González, Jose M.; Calvo, Enrique; Izeta, Ander; Smerdou, Cristian; Méndez, Ana; Sánchez, Carlos M.; Sola, Isabel; Almazán, Fernando; Enjuanes, Luis title: Complete Genome Sequence of Transmissible Gastroenteritis Coronavirus PUR46-MAD Clone and Evolution of the Purdue Virus Cluster date: 2001 journal: Virus Genes DOI: 10.1023/a:1011147832586 sha: 249d03ec9cda28c48e21538e2f81dff520b56171 doc_id: 910598 cord_uid: ay4kqfne
The complete sequence (28,580 nt) of the PUR46-MAD clone of the Purdue cluster of transmissible gastroenteritis coronavirus (TGEV) has been determined and compared with those of members of this cluster and other coronaviruses. Computing the distances among their S gene sequences grouped these coronaviruses into four clusters, one of them formed exclusively by the Purdue viruses. Three new potential sequence motifs have been identified, with homology to the α-subunit of the polymerase-associated nucleocapsid phosphoprotein of rinderpest virus, to the Bowman–Birk type of proteinase inhibitors, and to the metallothionein superfamily of cysteine-rich chelating proteins. Comparison of the TGEV polymerase sequence with those of other RNA viruses revealed high sequence homology with the A–E domains of the palm subdomain of nucleic acid polymerases.
Transmissible gastroenteritis coronavirus (TGEV) belongs to the Coronaviridae family of the Nidovirales order [15, 17]. TGEV is the prototype of the group 1 coronaviruses, which include porcine, canine, feline, and human viruses. TGEV is enveloped and spherical, with an internal core and a helical nucleocapsid [18]. Coronaviruses contain a 27.6–31.3 kb single-stranded, positive-sense genomic RNA [15]. The virion RNA functions as an mRNA and is infectious [9].
It contains 7–8 functional genes, four or six of which encode structural proteins: the spike (S), membrane (M), envelope (E), and nucleoprotein (N) and, in some strains, the protein encoded by an internal (I) open reading frame (ORF) of the N gene and the hemagglutinin-esterase (HE) [15, 35]. In addition, the coronavirus genome encodes several non-structural proteins. The number and location of the non-structural genes vary among coronaviruses of different species. In TGEV the genes are arranged in the order 5′-rep-S-3a-3b-E-M-N-7-3′. Four of them, rep, 3a, 3b, and 7, encode non-structural proteins. The recent construction of a cDNA encoding an infectious TGEV RNA [1], the assembly of the TGEV genome from six cDNA fragments [72], and the construction of an infectious cDNA clone of human coronavirus 229E (HCoV-229E) [58] will greatly facilitate the study of coronavirus molecular biology. Coronavirus RNA synthesis occurs via an RNA-dependent RNA synthesis process in which mRNAs are transcribed from negative-stranded templates [34, 52]. Coronaviruses have transcription regulatory sequences (TRSs) that include a highly conserved core sequence (CS, previously named intergenic sequence [IS]), 5′-CUAAAC-3′ or a related sequence depending on the coronavirus, located immediately upstream of most genes. Since genes often overlap in the Nidovirales, the acronym IS does not seem appropriate in these cases, whereas the acronym CS reflects the nature of the highly conserved sequence contained within the TRS. These sequences are signals for the transcription of subgenomic mRNAs [34, 52]. Both genome-size and subgenomic negative-strand RNAs, corresponding in number of species and size to the virus-specific mRNAs, have been detected [54, 55].
* Author for all correspondence: Tel.: 34-91-585 4555; Fax: 34-91-585 4915; E-mail: L.Enjuanes@cnb.uam.es
The two models compatible with most of the experimental data are leader-primed transcription [34] and discontinuous transcription during negative-strand RNA synthesis [53]. Recently, strong experimental evidence supporting discontinuous transcription during negative-strand RNA synthesis has been reported [3, 62], although leader-primed transcription has also received additional support [41]. The complete sequence of a coronavirus genomic RNA was first determined for the avian coronavirus infectious bronchitis virus (IBV) [8]. Since then, several other members of the Coronavirus genus have been fully sequenced, including mouse hepatitis virus (MHV) strains A59 [44] and JHM [37], HCoV-229E [26], the TGEV PUR46-PAR strain [13, 46], and bovine coronavirus (BCoV) [71]. TGEV infects both the epithelial cells of the small intestine and the lung cells of newborn piglets, causing a mortality of nearly 100%. The Purdue strain of TGEV was first isolated around 1946 by Haelterman's group at Purdue University (Lafayette, Indiana) [23, 38]. The original virus (PUR46-SW11) was passaged exclusively in swine. This virus was adapted to grow in swine testis (ST) cells [6, 7], and after 115 passages in this cell line it was cloned and distributed to many laboratories, including ours. While characterizing one of the oldest in vivo passages of the Purdue strain of TGEV (PUR46-SW11) [7, 23], we observed that this virulent Purdue strain was a mixture of at least two TGEV isolates with remarkable differences in their in vivo and in vitro growth [51]. One of them, clone C11, replicated to high titers in the enteric tract and was virulent, whereas the other (clone C8) produced low virus titers in enteric tissues and was attenuated. We report the complete sequence (28,580 nt) of the TGEV PUR46-MAD clone*, a close relative of PUR46-PAR.
We also describe the evolution of the Purdue cluster of TGEV from a highly enteric and virulent strain to a clone that does not replicate in the enteric tract of conventional piglets and has become attenuated. In addition, we report the sequence identity with other TGEV isolates and potential new sequence motifs identified within the replicase domain.
* The nucleotide sequence reported in this paper has been submitted to the GenBank nucleotide sequence database and has been assigned the accession number AJ271965.
Viruses were grown in ST cells [39]. The PUR46-SW11 virus is a historical sample of the Purdue strain of TGEV isolated by Haelterman's group [23, 38]. It was obtained by passaging the first TGEV field isolate 11 times in swine intestine; this virus was kindly provided as a 20% suspension of small intestine cells by Dr. M. Pensaert (Gent, Belgium) [23, 38]. From the uncloned virus passaged once in ST cells (PUR46-SW11-ST1), the PUR46-SW11-ST2-C8 (abbreviated PUR46-C8) and PUR46-SW11-ST2-C11 (abbreviated PUR46-C11) clones were plaque-purified [51]. PUR46-SW11-ST115 was obtained from PUR46-SW11 by 115 passages in ST cells and was distributed by L. Saif (Ohio State University) to other laboratories, giving rise to strains PUR46-MAD [31, 50] and PUR46-PAR [13, 46]. The PUR46-MAD strain was derived from PUR46-SW11-ST115 by five cloning steps in ST cells. The selected clone was named PUR46-MAD after the name of the strain (first three letters), the year of isolation (two digits), and the specific clone (last three letters). We have used a similar nomenclature to name other strains derived in different laboratories. The Purdue virus strain NEB72 [50] was renamed PTV (Purdue-type virus) because of its sequence similarity with the PUR46 strain [2].
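The naming convention described above (three-letter strain code, two-digit year of isolation, clone suffix) can be made explicit with a small parser. This is a hypothetical helper, not software used by the authors: the regular expression, the `StrainName` fields, and the assumption that every two-digit year falls in the 1900s are illustrative choices. Names with multi-part suffixes, such as PUR46-SW11-ST115, keep the whole suffix as the clone field, and names outside the convention (such as PTV) are rejected.

```python
import re
from typing import NamedTuple, Optional


class StrainName(NamedTuple):
    code: str             # three-letter strain code, e.g. "PUR" (Purdue)
    year: int             # year of isolation, expanded from two digits
    clone: Optional[str]  # clone or passage suffix, e.g. "MAD"; None if absent


# Hypothetical pattern for names such as "PUR46-MAD", "MIL65-AME" or "TOY56":
# three capital letters, two digits, then an optional "-suffix".
_STRAIN_RE = re.compile(r"^([A-Z]{3})(\d{2})(?:-(.+))?$")


def parse_strain_name(name: str, century: int = 1900) -> StrainName:
    """Split a strain name into code, year and clone suffix (hypothetical helper)."""
    match = _STRAIN_RE.match(name)
    if match is None:
        raise ValueError(f"not in CODEyy[-CLONE] form: {name!r}")
    code, yy, clone = match.groups()
    # Assumption: two-digit years refer to the given century (all isolates here are 19xx).
    return StrainName(code, century + int(yy), clone)


if __name__ == "__main__":
    for name in ("PUR46-MAD", "MIL65-AME", "TOY56", "PUR46-SW11-ST115"):
        print(name, parse_strain_name(name))
```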
The PTV clone was probably derived by passage of a Purdue strain of TGEV in gnotobiotic pigs by the pulmonary route, followed by passage in gnotobiotic pig lung cell cultures and in diploid swine testicular cells with exposure to an acidic (pH 3) environment and incubation with trypsin (M. Welter, Dallas Center, IA). The original TGEV strains that do not belong to the Purdue cluster have been reported [50]. Genomic RNA was extracted from partially purified virus as described [40]. Briefly, ST cells cultivated in roller bottles (500 cm²) were infected at an MOI of 5. Medium was harvested at 22 h post-infection (hpi), and virions were partially purified as described [31]. The viral pellet was dissociated in 500 µl of TNE buffer (0.04 M Tris-hydrochloride pH 7.6, 0.24 M NaCl, 15 mM EDTA) containing 2% SDS and digested with 50 ng of proteinase K (Boehringer Mannheim) for 30 min at room temperature. RNA was extracted twice with phenol-chloroform and precipitated with ethanol. Cytoplasmic RNA from TGEV-infected cells was extracted using a buffer containing urea-SDS and phenol-chloroform [51]. The complete sequence of clone PUR46-MAD was assembled starting from the sequence of a 9.7 kb defective minigenome (DI-C) derived from the virus [40]. This defective TGEV genome has three deletions of about 10, 1.1, and 7.7 kb in ORF 1a, in ORF 1b, and after the initiation of the S gene, respectively. The sequences of minigenome DI-C, of the homologous region within the virus genome, and of the 7.7 kb deletion were obtained from RNAs amplified by RT-PCR [40]. The resulting PCR products were cloned into pBluescript (Stratagene), pGEM-T (Promega), pCR2.1 (Invitrogen), or pSL1190 (Pharmacia) using standard procedures [49]. cDNA clones covering most of the genome were sequenced with Sequenase 2.0 (USB) or an ABI 373A automated sequencer (Applied Biosystems Inc.).
The TGEV PUR46-MAD 5′- and 3′-end sequences were determined by primer extension using 5′/3′ RACE (Boehringer Mannheim), starting from 0.5 µg of cytoplasmic RNA from virus-infected cells. The RT-PCR amplification was performed using primer 801 rs, with a reverse sequence from nt 782 to 801 (see complete TGEV sequence). The primer used to sequence the 5′ end was 364 rs (including nt 365–385). The 3′-end sequence was determined using primer X3.311vs, with virus-sense sequence from nt 28,381 to 28,400. The presence of two consecutive Cs at position 20,347 was assessed by digestion of the cDNA with the BstII restriction endonuclease. The core sequence was obtained by characterizing at least three clones of independent origin. Sequence data were compiled using the Wisconsin Package software version 9.1-UNIX, Genetics Computer Group (GCG) (Madison, Wisconsin). The sequences obtained were compared with those of previously published TGEV strains [13, 32, 40, 46, 50]. Sequence differences were confirmed by sequencing three independently derived RT-PCR clones or by direct viral RNA sequencing [19]. Sequence comparisons were made using the Wisconsin Package software version 9.1-UNIX. The pairwise distances within the group of aligned sequences were obtained using the Jukes-Cantor program of the GCG. Sequence motifs were identified with the Psi-Blast program using the Swiss-Prot database available through the European Bioinformatics Institute. Sequences were aligned using the Clustal W sequence alignment program for DNA and proteins [27, 59]. The complete sequence of the PUR46-MAD genome has been determined; it comprises 28,580 nt without the poly(A) tail. The 5′ two-thirds of this RNA genome (20,368 nt) encode the viral RNA-dependent RNA replicase, while the structural genes are located at the 3′ end of the genome (8,214 nt). The PUR46-MAD RNA is assumed to have a 5′-terminal cap by analogy with other coronavirus genomes [34].
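The primer names in these methods encode genome coordinates and orientation: 801 rs is a reverse-sense (rs) primer spanning nt 782–801, and X3.311vs is a virus-sense (vs) primer spanning nt 28,381–28,400. Assuming 1-based, inclusive coordinates, a virus-sense primer is the genomic segment written as DNA, and a reverse-sense primer is its reverse complement. The sketch below uses hypothetical helper names and is demonstrated on the 5′-terminal sequence 5′-ACUUUUAAAG-3′ reported for PUR46-MAD, since the full genome is not reproduced here.

```python
# Complement table for DNA bases; other characters (e.g. N) pass through unchanged.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _segment(genome: str, start: int, end: int) -> str:
    """Return genome[start..end] (1-based, inclusive) as upper-case DNA."""
    if not 1 <= start <= end <= len(genome):
        raise ValueError(f"coordinates {start}-{end} outside 1-{len(genome)}")
    return genome[start - 1:end].upper().replace("U", "T")


def virus_sense_primer(genome: str, start: int, end: int) -> str:
    """A 'vs' primer has the same sense as the genomic RNA."""
    return _segment(genome, start, end)


def reverse_sense_primer(genome: str, start: int, end: int) -> str:
    """An 'rs' primer is the reverse complement of the genomic segment."""
    return _segment(genome, start, end).translate(_DNA_COMPLEMENT)[::-1]


if __name__ == "__main__":
    five_prime_end = "ACUUUUAAAG"  # 5'-terminal 10 nt of the PUR46-MAD genome
    print(virus_sense_primer(five_prime_end, 1, 10))
    print(reverse_sense_primer(five_prime_end, 1, 10))
```

For example, `reverse_sense_primer("ACUUUUAAAG", 1, 10)` returns `"CTTTAAAAGT"`. Real primer design would also check melting temperature, self-complementarity, and uniqueness within the genome, which this sketch ignores.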
The viral RNA starts with the sequence 5′-ACUUUUAAAG-3′, as determined by 5′ primer extension. At the 3′ end, the TGEV genome has a poly(A) tail of unknown length. The Purdue virus cluster (Table 1) is defined as a set of viruses closely related in sequence that derive from the original PUR46-SW11 strain of TGEV. The sequence differences among these viruses are shown in relation to the sequence of PUR46-MAD, the prototype strain of our laboratory (Fig. 1). The Purdue virus cluster includes two clones isolated from the original in vivo virus stock (virulent PUR46-C11 and attenuated PUR46-C8); clone PUR46-MAD (passaged 120 times in ST cells), which shows reduced replication in the enteric tract and is partially attenuated; and clone PTV, which does not replicate within the gut of conventional piglets and is fully attenuated (Table 1). These cluster members are linked by their passage history [51] or by their sequence identity within the 3′-terminal 8,214 nt (Fig. 1). PTV has only 5 nt changes within the 3′-terminal 8.2 kb compared with the PUR46-MAD clone (Fig. 1). This accumulation of nucleotide substitutions represents 0.57 nt changes per thousand nucleotides, much lower than the 2.5 per thousand nucleotides accumulated between the PUR46-C8 and PUR46-C11 clones. The 3′ end of the PUR46-MAD genome is completely identical in sequence to that of clone C8. Comparison of the 3′-terminal 8.2 kb sequences of clones PUR46-C11 and PUR46-C8 revealed 22 nt differences, 14 of them in the S gene (Fig. 1). Three of these nucleotide substitutions were in non-coding regions: one downstream of the S gene stop codon (nt S-4370) and upstream of the 3a gene, and two in the 3b gene (nts 3b-332 and 3b-432). The remaining substitutions were scattered through the other 3′-end genes. In addition, there was a 6 nt deletion in the PUR46-C8 clone.
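Comparisons such as those summarized in Fig. 1 and Table 2 reduce to listing the columns in which two aligned sequences differ and, where useful, expressing the count per thousand nucleotides. The minimal sketch below assumes the sequences are already aligned, with '-' marking gaps; the function names are hypothetical, and positions are reported in reference coordinates, with an insertion in the query assigned to the preceding reference base.

```python
from typing import List, Tuple


def list_differences(reference: str, query: str) -> List[Tuple[int, str, str]]:
    """Differences between two pre-aligned sequences ('-' marks a gap).

    Returns (reference position, reference base, query base) for each differing
    column. Positions are 1-based and count only non-gap reference bases, so an
    insertion in the query is reported at the preceding reference base."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to equal length")
    differences = []
    ref_pos = 0
    for ref_base, query_base in zip(reference.upper(), query.upper()):
        if ref_base != "-":
            ref_pos += 1
        if ref_base != query_base:
            differences.append((ref_pos, ref_base, query_base))
    return differences


def differences_per_kb(n_differences: int, length_nt: int) -> float:
    """Differences per thousand nucleotides compared."""
    if length_nt <= 0:
        raise ValueError("length must be positive")
    return 1000.0 * n_differences / length_nt


if __name__ == "__main__":
    # Toy alignment (not TGEV data): one substitution, one insertion, one deletion.
    print(list_differences("ACGT-ACGT", "ACTTAAC-T"))
```

The toy alignment is illustrative only and is not taken from the TGEV sequences.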
This deletion has been considered a trademark of all TGEV Purdue strains, since it is present in all sequenced Purdue isolates except the parental PUR46-C11 clone [10, 11, 46, 47, 50, 67]. The S gene sequences of the PUR46-C8 and PUR46-C11 clones were compared with those of nine other TGEV strains by computing the distances among them with the Jukes-Cantor method. The results indicated that the 11 virus isolates could be grouped into four clusters according to their sequence homology (Fig. 2). These clusters had increasing distances from the viruses of the PUR46 cluster and from TOY56, ranging over 0.0–0.5, 1.3–1.7, 2.0–2.98, and 2.98–3.4, and were formed, respectively, by: (i) the Purdue-type viruses (PUR46-C11, PUR46-C8, PUR46-MAD, and PUR46-PAR); (ii) TOY56 and MIL65-AME; (iii) BRI70 and TAI83; and (iv) the porcine respiratory coronavirus (PRCoV) strains FRA86-RM4, ENG86-II, and HOL87. This organization of TGEVs into clusters matches the previously reported evolutionary tree [50]. PUR46-MAD and PUR46-PAR have similar virulence: both clones are attenuated in colostrum-fed swine and virulent in colostrum-deprived animals [2, 4, 21, 51]. PUR46-MAD replicates to a limited extent within the enteric tract (between 10² and 10³ pfu/gram of tissue) and causes the death of two-day-old newborn piglets (LD50 = 1×10⁴ pfu/animal). The PUR46-PAR clone was the first TGEV strain to be completely sequenced [13]. The 29 nt substitutions detected between the PUR46-MAD and PUR46-PAR clones are responsible for 14 amino acid (aa) changes (Table 2). In some cases, these changes were insertions or deletions.
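The Jukes-Cantor correction used for the S gene comparison converts the observed proportion of differing sites, p, into an estimated number of substitutions per site, d = −(3/4) ln(1 − 4p/3), which accounts for multiple substitutions at the same site. The sketch below is a generic implementation, not the GCG program used by the authors: it ignores gapped columns, returns substitutions per site, and raises an error when p ≥ 0.75, where the correction is undefined. The distances quoted in the text may be expressed on a different scale (for example, per 100 sites).

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns carry no substitution information here
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jukes_cantor(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance in substitutions per site: d = -3/4 ln(1 - 4p/3)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance is undefined (saturation)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    print(jukes_cantor("ACGTACGTAC", "ACGTACGTAA"))  # p = 0.1, so d is slightly above 0.1
```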
One of these changes was a deletion of nt 20,347 in PUR46-PAR that led to a frameshift in a region close to the end of ORF 1b; two others were nucleotide differences (one insertion and one deletion in PUR46-MAD) in the non-coding region at the 3′ end of the genome (nt 28,331 and 28,440, respectively) (Table 2). Within the region that encodes the structural proteins at the 3′ end of the genome (nts 20,365–28,580), 12 nt differences were found, five of which resulted in amino acid changes (Table 2). The nine ORFs identified in the TGEV genome (PUR46-MAD clone) are summarized in Table 3. The first 93 nt of the TGEV sequence correspond to the leader, defined as the sequence preceding the first CS, 5′-CUAAAC-3′. The CS is then repeated along the genome at different distances (3–37 nt) from the first codon (AUG) of each gene (Fig. 3A). In addition, there is another 5′-CUAAAC-3′ sequence 120 nt after the first initiation codon of the S gene. In principle, this CS could direct the synthesis of an mRNA that has not been detected, although the similarity of its size to that of the S gene mRNA could have prevented its identification (S. Alonso, I. Sola, and L. Enjuanes, unpublished data). Transcription in coronaviruses requires the discontinuous synthesis of the mRNAs in order to link the leader to the coding sequence of each mRNA. This process requires complementarity between the sequences downstream of the 3′ end of the leader and the sequences flanking the complement of the CS (cCS) in the negative strand [34, 52, 62].
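The spacing between each CS and the downstream initiation codon (3–37 nt in PUR46-MAD) can be tabulated by scanning the genome for 5′-CUAAAC-3′ and locating the next AUG. This minimal sketch assumes the spacing is counted as the number of nucleotides between the last base of the CS and the A of the AUG, and it takes the first downstream AUG regardless of reading frame, so on its own it would not distinguish a gene's start codon from an earlier, unrelated AUG. Related sequences such as the 3b CS (5′-CUAAAU-3′) are matched only if passed explicitly as `core`; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

CORE_SEQUENCE = "CUAAAC"


def find_all(seq: str, motif: str) -> List[int]:
    """0-based start positions of every (possibly overlapping) occurrence of motif."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def cs_to_aug_spacing(genome: str,
                      core: str = CORE_SEQUENCE) -> List[Tuple[int, Optional[int]]]:
    """For each CS, return (1-based CS start, nt between CS end and the next AUG).

    The spacing is None when no AUG follows the CS."""
    rna = genome.upper().replace("T", "U")
    results = []
    for start in find_all(rna, core):
        cs_end = start + len(core)  # 0-based index just past the CS
        aug = rna.find("AUG", cs_end)
        results.append((start + 1, None if aug == -1 else aug - cs_end))
    return results


if __name__ == "__main__":
    toy = "GGCUAAACGGGAUGCCCUAAACAUGUU"  # toy sequence, not TGEV
    print(cs_to_aug_spacing(toy))
```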
The extent of this complementarity could regulate transcription; it was calculated for the TGEV PUR46-MAD strain using two procedures: by counting the complementary nucleotides in an uninterrupted segment of sequence around the CS, or by counting the total number of complementary nucleotides in a segment including the 6 nt of the CS and 12 nt flanking both its 5′ and 3′ ends (30 nt in total) (Fig. 3B). The amount of each mRNA produced after infection with the PUR46-MAD strain, as determined by Northern blot analysis with a probe specific for the 3′ end of the genome (results not shown), was not related to the extent of potential base-pairing (Fig. 3B). The largest mRNA is the genomic RNA, which also serves as the mRNA for ORFs 1a and 1b. The remaining mRNAs are subgenomic and are designated mRNAs 2–7 (with the exception of mRNA 3-1, corresponding to ORF 3b) in order of decreasing size; they encode ORFs 2 (S), 3a, 3b, 4 (E), 5 (M), 6 (N), and 7 (Table 3). In the PUR46-MAD clone of TGEV and in the other Purdue strains, the CS corresponding to ORF 3b has the sequence 5′-CUAAAU-3′, in which the 'C' in the last position of the CS is replaced by a 'U'. Consequently, mRNA 3-1 encoding gene 3b was not observed [30]. In contrast, this RNA has been detected in cells infected with the MIL65 strain of TGEV, which has a standard CS at the homologous position [67]. A potential internal ORF starting at amino acid 77 is observed within the N gene. This ORF is in the same frame as the full-length N protein (383 aa) and could lead to a truncated N protein of 306 aa with an estimated molecular mass of 35 kDa. A truncated N protein with an estimated molecular mass of around 41 kDa, instead of the 44 kDa of the full-length protein, has been regularly observed by Western blot analysis in TGEV-infected ST cells using N-specific monoclonal antibodies (results not shown).
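The two complementarity measures described in this section can be sketched as follows. Because the negative-strand cCS region is the complement of the positive-sense body TRS, counting Watson–Crick pairs between the leader and the cCS is equivalent to counting identities between the leader and the body TRS written in positive sense; this sketch relies on that equivalence and ignores G·U wobble pairs. It assumes 30-nt positive-sense windows with the CS at positions 13–18 (12 nt of flank on each side), and it reads "uninterrupted segment around the CS" as the longest run of consecutive matches overlapping the CS. These are interpretive assumptions, and the function names are hypothetical, not the authors' exact scoring procedure.

```python
from typing import List

FLANK = 12     # nt on each side of the CS, as described in the text
CORE_LEN = 6   # length of the core sequence (5'-CUAAAC-3')


def _matches(leader_window: str, body_window: str) -> List[bool]:
    """Column-by-column identity of two positive-sense windows (T treated as U)."""
    if len(leader_window) != len(body_window):
        raise ValueError("windows must have equal length")
    a = leader_window.upper().replace("T", "U")
    b = body_window.upper().replace("T", "U")
    return [x == y for x, y in zip(a, b)]


def total_complementarity(leader_window: str, body_window: str) -> int:
    """Procedure 2: all complementary positions within the window."""
    return sum(_matches(leader_window, body_window))


def uninterrupted_complementarity(leader_window: str, body_window: str,
                                  core_start: int = FLANK,
                                  core_len: int = CORE_LEN) -> int:
    """Procedure 1: longest run of consecutive complementary positions overlapping the CS.

    core_start is the 0-based index of the first CS nucleotide in the window."""
    matches = _matches(leader_window, body_window)
    core_end = core_start + core_len  # exclusive
    best = 0
    run_start = None
    for i, ok in enumerate(matches + [False]):  # the sentinel closes the last run
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            if run_start < core_end and i > core_start:  # run overlaps the CS
                best = max(best, i - run_start)
            run_start = None
    return best


if __name__ == "__main__":
    leader = "A" * 12 + "CUAAAC" + "G" * 12                  # toy windows, not TGEV
    body = "C" * 10 + "AA" + "CUAAAC" + "GGG" + "U" + "G" * 8
    print(total_complementarity(leader, body), uninterrupted_complementarity(leader, body))
```

With a 3b-like CS (5′-CUAAAU-3′) in the body window, the last CS position mismatches, so the uninterrupted run stops at the CS even when the flanks match.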
This band is larger than expected for a truncated protein arising from internal initiation of translation and possibly corresponds to a protease cleavage product (see below). The precise location of the predicted PUR46-MAD ORF 1a-1b motifs (Table 3) and their distribution along the genome are indicated (Fig. 4). These include previously described motifs such as two papain-like proteinase domains (PL1 and PL2), a 3C-like (3CL) protease domain, a growth factor-like (GFL) domain, the ribosomal slippage site 5′-UUUAAAC-3′ (RSS), the pseudoknot (PKnt), the polymerase (Pol), the metal ion binding domain (MIB), the helicase (Hel), the ORF 1b variable domain (VD), and a conserved domain (CD) [13]. In addition, new domains showing sequence homology (Fig. 5) with RVPh, a Bowman-Birk type serine proteinase inhibitor (BBPI), and a metallothionein-like protein (MTh) are also indicated. The predicted biological activity has not been experimentally proven. The positions of the first and last nt or aa of each domain within the virus sequence are shown. In addition, we have identified three potential new domains (Figs. 4 and 5) showing variable sequence homology with other sequences: (i) 28% (41/148) amino acid identity with a phosphoprotein of rinderpest virus (RVPh). This protein has 507 aa and is probably a component of the active RNA-directed RNA polymerase (alpha-subunit) that may function in template binding [69] (Fig. 5A); (ii) 30% (15/49) amino acid identity with the invariant active site (core region) of the W1P1 Bowman-Birk serine proteinase inhibitor (BBPI) described in plants, and significant identity with other BBPIs [42, 48]. These proteins have 102 aa, including seven highly conserved cysteine residues. Interestingly, four of these residues are also conserved within the TGEV replicase sequence (Fig. 5B); and (iii) 25% (18/72) amino acid identity with the LeMTA metallothionein-like protein (MTh) of plants, and significant identity with other MThs [68]. Of the 72 aa that make up this full-length metallothionein, 14 are cysteines, and 7 of them are also conserved in the TGEV motif (Fig. 5C). Further work is needed to determine whether TGEV has the activities potentially encoded by the identified domains.
Fig. 5 legend: The sequences in (A) were previously reported [69]. The sequences in (B) for Vicia faba, Vicia angustifolia, wheat germ, Arachis hypogaea, wound-induced protein from maize (W1P1), and mung bean proteinase inhibitor (MBPI) were previously reported [42, 48]. The sequences in (C) for Arabidopsis thaliana, coffee, Lycopersicon esculentum L. metallothionein (LeMT), wheat Al, and barley were reported [68]. Black and gray boxes indicate identity or similarity, respectively, with the corresponding residue in other sequences. Complete residue identity in all included sequences is denoted with an asterisk. Domains were predicted using the Psi-Blast program, and the sequences were aligned using the ClustalW program. The number to the left of each sequence indicates the aligned amino acid or its position within the replicase polyprotein (TGEV-PUR46-MAD).
Five motifs (A–E) have been defined in the palm subdomain of nucleic acid polymerases [24]. The amino acid sequence of the TGEV RNA polymerase was compared with those of other coronaviruses and positive-strand RNA viruses, and similar domains were identified in the coronavirus polymerases (Figs. 6 and 7). An interesting difference between TGEV and the other coronaviruses, on the one hand, and the polymerases of other RNA viruses, on the other, is the presence in coronaviruses of a 44-aa linker sequence between motifs B and C. In contrast, the other RNA virus polymerases analyzed have a 1–8 aa linker, except yellow fever virus (YFV), which has a 30-aa linker (Fig. 6).
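The identity values quoted earlier in this section for the three new motifs are ratios of identical aligned residues to alignment length, in the x/y form that BLAST-family programs such as Psi-Blast report; for instance, 41/148 ≈ 27.7%, given as 28%. The sketch below computes that ratio for two already aligned sequences, using a hypothetical function name. Counting gap columns in the length is an assumption that matches the x/y convention; other definitions divide by the shorter sequence or by ungapped columns and give different percentages.

```python
from typing import Tuple


def percent_identity(aligned_a: str, aligned_b: str) -> Tuple[int, int, float]:
    """Identity over an aligned region: (identical columns, alignment length, percent).

    Assumption: every column, including gap columns, counts toward the length,
    so gaps lower the identity; a column with gaps in both sequences is never
    counted as identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    identical = sum(
        1 for a, b in zip(aligned_a.upper(), aligned_b.upper()) if a == b and a != "-"
    )
    length = len(aligned_a)
    return identical, length, 100.0 * identical / length


if __name__ == "__main__":
    print(percent_identity("MKV-CL", "MRV-CL"))  # toy peptide alignment, not TGEV data
```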
Motif A of the TGEV polymerase shows significant homology with motif A of other positive-strand RNA viruses (Fig. 7). All of these viruses maintain the conserved catalytic-site amino acids D4613 and D4618. TGEV motif B has the highest homology with other positive-strand RNA viruses, with identical amino acids at the highly conserved positions S4677, G4678, T4682, and N4686 (Fig. 7, motif B). The coronavirus motif C, relevant in copy fidelity, includes the sequence SDD (aa 4,754–4,756) in place of the classic GDD conserved in all positive-strand RNA viruses that have been studied. Motifs D and E are less conserved between coronaviruses and other positive-strand RNA viruses.
Fig. 7 legend: … [24, 43]. The amino acid sequences of the TGEV RNA polymerase motifs are shown in comparison with those of other coronaviruses and of positive-strand RNA viruses. The organization of motifs was generated to obtain maximum alignment of highly conserved amino acids for motifs A, B, and C, and was based on motif length and the position of highly conserved residues for motifs D and E, which show limited homology. Multiple sequence alignments were performed using the ClustalW program [60]. Black and gray boxes indicate identity or similarity, respectively, with the corresponding residue of the other viruses. The sequences included within the alignments are from: (i) TGEV, this manuscript; (ii) HCoV-229E, HEV, BCoV, MHV, HCoV-OC43, and IBV [57]; (iii) Polio 3Dpol, TMV p183, HCV NSP5, TBSV p92, and BMV 2a [43], and YFV and PPV [22], or from references cited within these publications. HCoV, human coronavirus; HEV, porcine hemagglutinating encephalomyelitis virus; BCoV, bovine coronavirus; MHV, mouse coronavirus; IBV, infectious bronchitis virus; other acronyms as in Fig. 6. For the coronavirus sequences, amino acid positions are given relative to the first amino acid of the viral replicase. Complete residue identity is denoted with an asterisk. ND indicates that the number of the first amino acid is not known because the complete sequence of the virus is not available.
The complete sequence of the PUR46-MAD clone has been determined, and its relationship with other members of the Purdue cluster of viruses and with other coronaviruses has been defined. In addition, the role of the complementarity between the 3′ end of the leader and the CS has been analyzed, and three new potential sequence motifs have been identified along the replicase gene. The first nucleotide of the PUR46-MAD clone was an A, coinciding with the 5′ sequence of the PUR46-PAR clone [13]. Interestingly, when synthetic TGEV minigenomes were cloned behind the T7 bacteriophage promoter [30] or a cytomegalovirus (CMV) promoter [45], where the first engineered viral nucleotide was a 'C' or an 'A', respectively, the synthetic minigenomes were replicated by the helper virus, indicating that the nature of the first nucleotide, at least for minigenome rescue, was not absolutely critical. The length of the poly(A) tail at the 3′ end of the TGEV genome is not accurately known. Nevertheless, minigenomes and a full-length infectious RNA with a poly(A) tail of 24 residues have been constructed and are efficiently replicated, indicating that 24 residues are enough for TGEV RNA replication [1, 30]. In fact, MHV minigenomes with 5-, 10-, and 68-nt poly(A) tails were replicated during BCoV infection [56]. Longer poly(A) tails (100–130 nt) have also been detected in coronaviruses [28, 33, 70]. The coronavirus poly(A) tail is essential for virus replication [56], but these data suggest that its length is highly flexible. The different members of the Purdue virus cluster (Table 1) are closely related. Two of them, PUR46-C8 and PUR46-C11, were isolated from the same animal. These clones seem to have evolved through the accumulation of nucleotide substitutions and a small deletion.
A comparison of the S gene sequences of eleven TGEV isolates (Fig. 2) showed that clone PUR46-C11 had the lowest distance (0.35) from clone PUR46-C8, while its distances from other TGEVs, such as the MIL65 strain, and from the PRCoVs were higher than 2.0 and 3.0, respectively. These data strongly suggest that clone C8 derived from C11, and not from other viruses circulating at the same time and in the same geographical area, such as the MIL65 strain isolated in Fredericksburg, Ohio [5] (R.D. Wesley, personal communication). According to the previously described epidemiological tree [50], the PUR46-C11 clone could be a recent ancestor of the MIL65 strains of TGEV. The distances of the PRCoV isolates (FRA86, ENG86, and HOL87) were higher from the members of the Purdue cluster than from MIL65 and BRI87, suggesting that the PRCoVs more likely derived from strains related to the MIL65 or BRI87 TGEVs. A surprising observation was the high conservation of the RNA sequence of the PUR46-MAD virus upon passage in ST cells: almost one-third of its genome (8,221 nt), which encodes all the structural and three small non-structural proteins, is completely identical in sequence to that of the PUR46-C8 clone, which had only two passages in the same ST cell line. This sequence identity may indicate that the selected virus has a sequence highly favored for growth in ST cells. In contrast, for the full-length PUR46-PAR genome, which was passaged in a different cell line (PD-5 cells) [46], the estimated number of nucleotide substitutions per nucleotide and replication cycle relative to PUR46-MAD (1×10⁻⁴) was higher and within the expected range for an RNA virus genome [12].
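An estimate of nucleotide substitutions per nucleotide and replication cycle divides the observed substitutions by the number of sites compared and by the number of replication cycles separating the two genomes. The text does not state how the number of replication cycles was obtained, so the sketch below only illustrates this generic ratio with hypothetical inputs and a hypothetical function name; it assumes substitutions accumulate linearly and ignores back-mutation, selection, and sequencing error.

```python
def substitutions_per_site_per_cycle(substitutions: int, sites: int,
                                     cycles: float) -> float:
    """Naive rate: observed substitutions / (sites compared x replication cycles).

    Assumes linear accumulation; ignores back-mutation, selection and
    sequencing error. `cycles` must come from an independent estimate."""
    if sites <= 0 or cycles <= 0:
        raise ValueError("sites and cycles must be positive")
    if substitutions < 0:
        raise ValueError("substitutions cannot be negative")
    return substitutions / (sites * cycles)


if __name__ == "__main__":
    # Hypothetical inputs, not values from this study.
    print(substitutions_per_site_per_cycle(10, 10_000, 100))
```

For example, 10 substitutions across 10,000 compared sites over 100 cycles give 1×10⁻⁵ substitutions per site per cycle.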
The amount of each mRNA produced after infection with the PUR46-MAD strain was not proportional to the extent of potential base-pairing, indicating that in TGEV mRNA abundance is not regulated exclusively by the complementarity between the sequences at the 3′ end of the leader and the sequences complementary to the TRSs in the negative RNA strand, in agreement with previous observations in MHV [34, 61]. In addition, although the mRNAs closer to the 3′ end of the genome are in general more abundant, the relative amount of each mRNA did not precisely correlate with the proximity of each mRNA leader to the 3′ end of the virus genome, in contrast to what has been described in other positive-strand RNA viruses [20, 61] and also in negative-strand ones [25, 29, 64]. These results suggest that, in addition to base-pairing between the 3′ end of the leader and the TRS complementary sequences (cTRSs), transcription in coronaviruses may be controlled by other viral and cellular factors, including TRS primary and secondary structure. In fact, it has been suggested that the discontinuous transcription that takes place during mRNA synthesis is probably mediated by the interaction of proteins with both the 3′ end of the leader and the cTRS, and then by binding between these proteins [34, 62]. This protein-RNA interaction most likely requires the recognition of an RNA-TRS primary and secondary structure larger than the CS. The presence of ORFs 3a and 3b varies among TGEV strains [2, 10, 16, 36, 63, 67]. Some TGEV strains, such as MIL65, express both ORFs 3a and 3b [65]. In contrast, other strains, such as a small plaque (SP) mutant of the MIL65 strain, express neither of these ORFs [66]. All these strains infect swine, implying that ORFs 3a and 3b are non-essential for virus growth in tissue culture or in vivo, which would facilitate the loss of ORF 3b during the passage of the TGEV Purdue strains or the PRCoV isolates.
The truncated N protein of around 41 kDa (instead of the 44 kDa of the full-length protein) regularly observed by Western blot analysis in TGEV-infected ST cells most likely corresponds to a caspase-mediated cleavage product generated during apoptosis of TGEV-infected cells, as previously reported [14]. It has been shown that the N protein sequence VVPD359, located 23 aa residues upstream of the carboxy-terminal end of the N protein, is cleaved, leading to the appearance of a shorter form of the N protein in infected cells. This sequence is also present in other coronavirus N proteins, including that of PRCoV. The cleaved form is not found in purified virions [14]. Polymerase motif C showed that coronaviruses have the DD sequence conserved, as in the RNA-dependent DNA polymerases of retroviruses and in the RNA-dependent RNA polymerases of double-stranded RNA and segmented (−) strand viruses. However, in contrast to these viruses, coronaviruses have the SDD motif instead of the more common GDD [24, 43, 57]. The high conservation of the polymerase domains among group 1, 2, and 3 coronaviruses relative to other positive-strand RNA viruses, and the conservation of additional replicase domains, for example the carboxy-terminal ORF 1b domain, for which no homologue can be found in other viral replicases, clearly indicate that the Nidovirales replicases are more closely related to each other than to any other group of positive-stranded RNA viruses [17]. The longer linker (44 aa) identified between polymerase palm subdomain motifs B and C also supports grouping the coronavirus polymerases as a subset within the positive-stranded RNA viruses. Motif B of the coronavirus polymerase sequence is also more closely related to the poliovirus polymerase than to the homologous domain of the other viruses analyzed.
Three new potential domains have been identified in the TGEV replicase showing limited amino acid homology with the α-subunit of the polymerase-associated nucleocapsid phosphoprotein of rinderpest virus, the Bowman-Birk type of proteinase inhibitors, and the metallothionein superfamily of cysteine-rich chelating proteins [48, 68, 69]. We think that the observed sequence identities are possibly significant because of the number of conserved residues and, at least in the cases of the BBPIs and metallothioneins, because of the highly conserved cysteine residues, which are generally relevant to protein structure and function. Nevertheless, the limited sequence homology observed does not imply that these domains provide the virus with the corresponding activities. The role of these domains in TGEV replication is being investigated.
Proceedings United States Livestock Sanitary Association
The Evolutionary Biology of Viruses. Mutation Rates and Rapid Evolution of RNA Viruses
Virus Taxonomy. Classification and Nomenclature of Viruses. Coronaviridae
Coronaviruses and Arteriviruses
Classification and Nomenclature of Viruses. Nidovirales
Pathogenesis of transmissible gastroenteritis of swine
The Coronaviridae. The Coronavirus Nucleocapsid Protein
Molecular Cloning: A Laboratory Manual. 2nd edn. Cold Spring Harbor Laboratory
Nidovirus: Coronavirus and Arterivirus. Full-Length Genomic Sequence of Bovine Coronavirus
The Coronaviridae. Coronavirus Replication, Transcription, and RNA Recombination
We thank Dr. Alexander Gorbalenya for helpful comments on sequence motifs. This research was supported by grants from the Comisión Interministerial de Ciencia y Tecnología (CICYT, Spain), the Dirección General de Investigación