key: cord-0008902-cefl4zot authors: Ling, Roger; Davis, Philip J.; Qingzhong, Yu; Wood, Carla M.; Pringle, Craig R.; Cavanagh, David; Easton, Andrew J. title: Sequence and in vitro expression of the phosphoprotein gene of avian pneumovirus date: 2000-02-25 journal: Virus Res DOI: 10.1016/0168-1702(95)00008-e sha: 39b3696753dbfa8b233de7e4707b5f33e9fc593e doc_id: 8902 cord_uid: cefl4zot
The phosphoprotein (P) gene of two subgroup A strains of avian pneumovirus comprised 855 nucleotides containing only one substantial open reading frame encoding a protein of 278 amino acids, with a predicted M(r) of 30,323. In vitro translation of P mRNA in a wheat germ system resulted in the synthesis of two polypeptides of M(r) 35,000. Comparison of the deduced P protein sequence with that of the known mammalian pneumoviruses revealed overall amino acid identities ranging from 31 to 34.5%, suggesting a distant relationship. However, there was a much higher identity (63.2–68.4%) in a region of 57 residues, which included a heptad repeat sequence.
The primary aetiological agent causing rhinotracheitis in turkeys, initially referred to as turkey rhinotracheitis virus, was isolated in South Africa in 1978 and subsequently in Europe in 1985. The virus can also replicate in the domestic fowl and other species (for references see Cavanagh, 1992; Naylor and Jones, 1993). The morphology of the virus particle viewed in the electron microscope, together with the number and mobility of the virus proteins and mRNA species on gels, suggested that this was the only example of an avian pneumovirus (APV) described to date (Cavanagh and Barrett, 1988; Collins and Gough, 1988; Ling and Pringle, 1988). Subsequent molecular analyses have confirmed the close genetic homology of APV with human and bovine respiratory syncytial (RS) viruses, the type member of the genus Pneumovirus of the sub-family Pneumovirinae, and pneumonia virus of mice (PVM), the only other known mammalian pneumovirus. 
In particular, there are extensive amino acid similarities in the predicted proteins encoded by the fusion (F) (Yu et al., 1991; Chambers et al., 1992) , the matrix (M) (Yu et al., 1992a) and the 22K (or M2) (Ling et al., 1992; Yu et al., 1992b) genes, with conserved structural features in the proteins encoded by the SH (Ling et al., 1992) and G (attachment protein) (Ling et al., 1992; Juhasz and Easton, 1994) genes. Two subgroups, A and B, of APV have been defined on the basis of extensive differences in the G protein (Juhasz and Easton, 1994) . The presence of the conserved 22K protein gene strongly suggests that APV is closely related to the mammalian viruses as this gene is found only in pneumoviruses. Surprisingly, the genetic organization of the avian virus was found to be different from that of the other pneumoviruses, with the SH and G genes being more distal to the proposed single polymerase entry site at the 3'-end of the genome in APV (Ling et al., 1992; Yu et al., 1992a) . The major feature which distinguishes the pneumoviruses from other members of the family Paramyxoviridae in the electron microscope is the morphology of the helical nucleocapsid (Berthiaume et al., 1974) . The nucleocapsid complex is formed by an association of the genomic RNA, the major nucleoprotein (N), a phosphoprotein (P) and the polymerase (L) protein. The complex is responsible for the various RNA synthesis processes of virus genome replication and transcription within the infected cell, and as such has a pivotal role in virus replication. 
Until recently, it was thought that a major distinguishing feature between pneumoviruses and other paramyxoviruses was that the gene encoding the pneumovirus phosphoprotein was monocistronic, while the equivalent genes of other paramyxoviruses encode several protein products, either as a result of internal initiation by ribosomes both within and outside the major open reading frame (ORF), or by virtue of non-templated insertion of bases during P gene transcription making available alternative ORFs (Shioda et al., 1983; Satake et al., 1984; Bellini et al., 1985; Lambden, 1985; Galinski et al., 1986; Spriggs and Collins, 1986; Lopez et al., 1988; Johnson and Collins, 1990; Kondo et al., 1990; Matsuoka et al., 1991; Mallipedi and Samal, 1992). Recently, analysis of the PVM P gene has shown that it contains two ORFs, similar to the situation with other paramyxoviruses, and that initiation at internal AUG codons directs the synthesis of several polypeptides in vivo (Barr et al., 1994). In the light of this difference in the pattern of expression of the P gene of the two mammalian pneumoviruses it was of interest to investigate the P gene of the avian pneumovirus.
Two isolates of APV, CVL 14/1 (Collins and Gough, 1988; Ling et al., 1992) and UK/3BV/85 (McDougall and Cook, 1986; Cavanagh and Barrett, 1988), were grown in BSC 1 and Vero cells, respectively. Total cytoplasmic RNA was isolated from virus-infected cells and poly(A)+ RNA selected as described previously for CVL 14/1 (Ling et al., 1992) and 3BV (Yu et al., 1991). cDNA was prepared by the method described previously (Ling et al., 1992) using an oligo d(T) primer. The cDNA was amplified by the polymerase chain reaction. 
The P gene of strain CVL 14/1 was amplified by polymerase chain reaction using an oligonucleotide designed to anneal to the poly(A) tail (ATTAACCCTCACTAAG(T)15) and a second oligonucleotide designed to anneal with the conserved sequence at the 5'-end of avian pneumovirus genes shown underlined (ATCCCCGGGACAAGTA). A DNA fragment representing the full-length P gene was produced following amplification (not shown). The DNA fragment was isolated from an agarose gel, digested with a variety of restriction enzymes and the resulting fragments cloned into an appropriate plasmid vector. The sequence of the fragments was determined using the chain termination method of Sanger et al. (1980). A total of 19 independent overlapping clones were sequenced to generate the consensus sequence for the P gene. Both strands of the clones were sequenced and all regions of the gene were sequenced from 3 to 8 times on independent clones, with the exception of the 5'-terminal 100 nucleotides, which were sequenced only twice. The use of a PCR primer for the 5'-end of the gene precluded direct confirmation of the terminal 10 nucleotides.
cDNA synthesis of the P gene of the 3BV strain was performed using a cDNA synthesis kit (Amersham International) and a number of oligonucleotide primers, the cDNA being cloned into pBluescript KS+. Firstly, two clones, MF35 and MF36, were generated using a primer corresponding to F gene sequence (Yu et al., 1991). These clones were derived from a polycistronic mRNA and comprised the following sequence: up to 236 bases of the 3'-end of the P gene; an intergenic region comprising a single base; the M gene; and F gene sequence. Secondly, two clones (D1, D12) were obtained by oligo(dT) priming. These extended from the poly(A) tail to base 304 from the start of the P gene. Thirdly, two clones (D38, D40) were obtained after priming with two P-gene specific oligonucleotides. One clone (D40) was believed to extend beyond the beginning of the P gene. 
All clones were sequenced on both strands and all except the first 33 bases were sequenced from two or three clones. Polycistronic P sequence-containing mRNAs of APV strain 3BV were detected by Northern blot analysis (Yu et al., 1992a,b). A random hexanucleotide-primed 32P-labelled probe was prepared using a RsaI restriction fragment from clone D38. Infectious bronchitis coronavirus mRNAs were used as RNA size markers. The 5'-terminus of the P mRNA of APV strain 3BV was determined by primer extension using a 32P-end-labelled 17-mer oligonucleotide complementary to nucleotides 86-102 of the P mRNA sequence shown in Fig. 1, as described previously (Yu et al., 1992a,b). The same oligonucleotide was used to prime a sequencing reaction (Yu et al., 1991) using as the substrate pBluescript containing insert D40, which was believed to extend beyond the beginning of the P gene. A full-length copy of the CVL 14/1 strain P gene was inserted into plasmid pBluescribe (Stratagene Inc.) downstream of the T3 promoter. Plasmids were linearized by digestion with XhoI, which recognizes a sequence in the polylinker region at the 3'-end of the P gene insert. The P gene was transcribed in vitro using T3 RNA polymerase and the RNA used to direct translation in a wheat germ translation system in the presence of 35S-methionine. Translation products were resolved by electrophoresis in a 12% polyacrylamide gel followed by autoradiography.
Primer extension analysis of the P mRNA resulted in a 32P-labelled cDNA which co-migrated in a sequencing gel with the first G of the sequence GGGACAAGU of the P gene clone D40, which was believed to extend beyond the beginning of the P gene (Fig. 2). The sequence GGGACAAGU occurs (in the opposite, negative sense) at the beginning of all APV genes studied, namely those encoding the M, F, M2, SH and G proteins. This confirmed that the start of the P gene corresponds to the sequence shown in Fig. 1. 
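Because the genome is negative-sense, a motif quoted in mRNA sense, such as GGGACAAGU, appears in the genome as its complement. The following Python sketch illustrates that conversion together with a simple motif search. The function names and the example fragment are illustrative assumptions and are not taken from the APV sequence data or from the methods used in this work.

```python
# Illustrative sketch only: the helper names and the example fragment are
# hypothetical and do not come from the APV sequence data.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def genome_sense(mrna_motif):
    """Complement an mRNA-sense motif written 5'->3'.

    The returned string reads 3'->5' on the negative-sense genome.
    """
    return mrna_motif.upper().replace("T", "U").translate(COMPLEMENT)


def find_motif(sequence, motif="GGGACAAGU"):
    """Return 1-based start positions of motif in an mRNA-sense sequence."""
    sequence = sequence.upper().replace("T", "U")
    positions = []
    index = sequence.find(motif)
    while index != -1:
        positions.append(index + 1)  # convert 0-based index to 1-based
        index = sequence.find(motif, index + 1)
    return positions


# Hypothetical fragment containing a single gene-start motif.
fragment = "AAAAAAUGGGACAAGUCAAA"
print(genome_sense("GGGACAAGU"))
print(find_motif(fragment))
```

Sequences given as cDNA (with T) are converted to RNA notation before matching, so the same helper can be applied to clone sequences.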
Fig. 2. Primer extension analysis of the 5'-terminus of the P mRNA. A 32P-end-labelled oligonucleotide (complementary to nucleotides 86-102 of the P mRNA sequence; Fig. 1) was used to prime synthesis of cDNA on P mRNA substrate (lanes 1 and 6). The same oligonucleotide was used to prime a sequencing reaction using as substrate pBluescript containing APV insert D40, which extends beyond the beginning of the P gene (lanes 2-5). All reactions were analyzed in a 6% polyacrylamide sequencing gel. The arrows indicate the primer extension product in lanes 1 and 6, which comigrates with the first nucleotide in the sequence GGGACAAGU derived from D40, confirming that this sequence was at the beginning of the P mRNA.
The sequence of the APV P gene and the deduced amino acid sequence of the predicted protein are shown in Fig. 1. The gene comprised 855 nucleotides and encoded a protein of 278 amino acids with a predicted M(r) of 30,323. The mRNA contained only one substantial ORF and in that respect was similar to the P genes of human and bovine RS viruses (Satake et al., 1984; Lambden, 1985; Johnson and Collins, 1990; Mallipedi and Samal, 1992) rather than that of PVM (Barr et al., 1994). Clones MF35 and MF36 had been generated from polycistronic mRNAs and contained the end of the P gene in addition to the whole of the 3'-adjacent M gene and the beginning of the F gene. This revealed that the sequence in the genome immediately preceding the GGGACAAGU start of gene M was (in the positive sense) AGUUA UG AAAAAA U (Yu et al., 1992a). This closely resembles the transcription termination/polyadenylation signal sequence of the other sequenced APV genes. For example, the F and M2 genes have the sequences AGUUA UUU AAAA and AGUUA AUU AAAA at the end, followed by an intergenic region. In the case of the M-F and F-M2 junctions, the intergenic regions comprise only two nucleotides. Thus the P-M junction sequence AGUUA UG AAAAAA U most likely comprises an AGUUA UG AAAAAA transcription termination/polyadenylation signal followed by a single nucleotide, U, intergenic sequence. Analysis of clone D40 showed that immediately before the start of the P gene was the sequence AGUAA UU UUUUUU UAU, part of which is shown in Fig. 2. 
The first 13 nucleotides of this sequence probably form the termination/polyadenylation signal of the preceding gene, the UAU being a trinucleotide intergenic sequence. Northern blotting showed that in addition to the monocistronic P mRNA approximately 40% of P gene transcripts were in the form of polycistronic mRNAs (data not shown).
Translation of APV P gene mRNA in a wheat germ system directed the synthesis of two polypeptides with M(r) of approximately 35,000 (Fig. 3). This size contrasts with the expected M(r) (30,323), but it is consistent with the observation that phosphoproteins of several negative strand RNA viruses display aberrant electrophoretic mobilities (Gallione et al., 1981; Johnson and Collins, 1990; Caravokyri and Pringle, 1992) and agrees with that described for the phosphoprotein synthesized in infected cells (Ling and Pringle, 1992).
The nucleotide sequences of the two APV isolates examined in this report were identical. Both were isolated within a few months of each other at different sites in England in 1985, the first year in which APV is known to have caused disease in England. This suggests that they may be isolates of a single strain which was probably introduced into England in 1985. An unusual feature of the APV P protein is the high proportion of glutamate (E) residues in the C-terminal portion. While large numbers of charged, especially glutamate, residues are found at the C-terminus of all pneumovirus P proteins (Fig. 4), the APV P protein contains many more, occurring consecutively or as short repetitive elements. For example, the sequence GESESEEES (residues 253-261) is immediately repeated almost exactly (Fig. 1). This was observed with all clones of both strains. 
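The glutamate enrichment described above can be expressed as a simple residue fraction compared between the C-terminal tail and the rest of a protein. The sketch below is a minimal illustration in Python. The toy protein is hypothetical: only the GESESEEES motif is taken from the text, while the N-terminal part and the near-repeat are invented for the example, so the numbers say nothing about the real APV P protein.

```python
# Minimal sketch; function names and the toy protein are assumptions.
def residue_fraction(protein, residues="E"):
    """Fraction of positions in protein occupied by any of the given residues."""
    protein = protein.upper()
    if not protein:
        return 0.0
    return sum(1 for aa in protein if aa in residues) / len(protein)


def compare_c_terminus(protein, tail_length, residues="E"):
    """Return (fraction in the last tail_length residues, fraction in the rest).

    tail_length is expected to be between 1 and len(protein) - 1.
    """
    tail = protein[-tail_length:]
    body = protein[:-tail_length]
    return residue_fraction(tail, residues), residue_fraction(body, residues)


# Hypothetical toy protein: an invented N-terminal part, the reported
# GESESEEES motif, and an invented near-exact repeat of that motif.
toy_protein = "MSFPEGKDILLMGNEAAK" + "GESESEEES" + "GESDSEEES"
tail_fraction, body_fraction = compare_c_terminus(toy_protein, 18)
print(tail_fraction, body_fraction)
```

Passing `residues="DE"` would count aspartate as well, giving the fraction of acidic residues rather than glutamate alone.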
The significance of the presence of this highly negatively charged C-terminus is not clear, but the C-terminal portion of the Sendai virus P protein has been shown to be important for interacting with the major nucleocapsid protein in the nucleocapsid complex (Ryan et al., 1991). If this is also the case for pneumoviruses, the C-terminus may interact with a basic region in the nucleoprotein. A striking aspect of the amino-terminal portion of the P protein among the pneumoviruses as a group is the high concentration of proline (P) residues; there are 4- to 8-fold more prolines in the first 100 residues than in the remaining 142-190 residues. This suggests that the amino-terminal region of the P protein may adopt an unusual conformation. In common with the other pneumoviruses the APV P protein is devoid of tryptophan (W) residues, but whereas human and bovine RS viruses have no cysteine (C) residues in the P protein, APV and PVM have two and five cysteine residues, respectively.
Comparison of the predicted amino acid sequence of the avian pneumovirus P protein with other paramyxovirus phosphoproteins revealed significant levels of amino acid identity only with those of other pneumoviruses. The levels of similarity observed with the mammalian pneumoviruses are shown in Table 1. It can be seen that the overall levels of similarity are lower than any other pairwise comparison within the pneumoviruses, suggesting only a distant relationship. The immediate N-termini show a little sequence similarity, but as with PVM, APV appears to contain an insertion of sequence when compared to the RS viruses. A high degree of sequence similarity was seen only in a region spanning residues 153-209 of the APV P protein. The N-terminal portion of this region contains a heptad repeat sequence which is often involved in the formation of α-helices and coiled-coils (Fig. 4) (McLachlan and Karn, 1983; Cohen and Parry, 1986). The heptad region terminates with a helix-breaking proline residue. 
The high degree of conservation of such a motif may indicate that this region is involved in intra- or inter-molecular interactions. The sequence conservation continues beyond the heptad repeat region for a further 26 residues, with one region, ARDG/EIRDA, highly conserved in all pneumoviruses sequenced to date. The levels of sequence identity of the various pneumoviruses compared pairwise for this conserved region are shown in Table 1. This exceptionally high level of sequence conservation suggests that this region may play an important role in either the RNA synthesis processes carried out by the nucleocapsid complex, or in maintaining the structural integrity of the complex.
In vitro translation of the APV P protein resulted in the synthesis of two polypeptides of very similar M(r) (Fig. 3). The larger, fainter, of the two observed polypeptides is likely to have been translated by initiation at the 5'-proximal AUG at nucleotides 14-16, while the stronger band probably reflects initiation at one or both of the next two AUG codons, which are adjacent at nucleotides 44-49 (Fig. 1). Of these three AUG codons, the third (nucleotides 47-49) is in the best context for translation in vitro (Kozak, 1987). The significance of these extra potential translational start sites in vivo is not known, but it is interesting to note that the phosphoprotein described by Cavanagh and Barrett (1988) and by Ling and Pringle (1992) did not appear as a discrete band. Whether this represents heterogeneous commencement of translation in vivo or is the result of heterogeneous post-translational modification such as phosphorylation is unclear. 
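The comparison of candidate start codons discussed above can be sketched in Python. The example lists every AUG in an mRNA and flags two positions commonly used as a simplified summary of the Kozak context: a purine at -3 and a G at +4. This reduction to two positions is an assumption of the sketch, not the full analysis of Kozak (1987). The function name and the toy mRNA are hypothetical, and the toy sequence is not the APV P mRNA.

```python
# Minimal sketch; the function name and TOY_MRNA are illustrative assumptions.
def scan_aug_codons(mrna):
    """Describe every AUG in an mRNA given 5'->3'.

    Each hit records the 1-based position of the A, the reading frame
    (0, 1 or 2), and whether the simplified Kozak positions are favourable.
    """
    mrna = mrna.upper().replace("T", "U")
    hits = []
    start = mrna.find("AUG")
    while start != -1:
        minus3 = mrna[start - 3] if start >= 3 else None
        plus4 = mrna[start + 3] if start + 3 < len(mrna) else None
        hits.append({
            "position": start + 1,
            "frame": start % 3,
            "purine_at_minus3": minus3 in ("A", "G"),
            "g_at_plus4": plus4 == "G",
        })
        start = mrna.find("AUG", start + 1)
    return hits


# Hypothetical toy mRNA with one upstream AUG and two adjacent in-frame AUGs.
TOY_MRNA = "GGGACAAGUCAAAAUGUCCUUUCCAAUGAUGGCU"
for hit in scan_aug_codons(TOY_MRNA):
    print(hit)
```

In this toy example only the third AUG satisfies both simplified criteria, which mirrors the kind of reasoning applied to the three candidate start codons in the text without reproducing the actual sequence context.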
Molecular cloning and sequence analysis of the phosphoprotein, nucleocapsid protein, matrix protein and 22K(M2) protein of the ovine respiratory syncytial virus
Sequence of the phosphoprotein gene of pneumonia virus of mice: expression of multiple proteins from two overlapping reading frames
Measles virus P gene codes for two proteins
Comparative structure, morphogenesis and biological characteristics of the respiratory syncytial (RS) virus and pneumonia virus of mice (PVM)
Effect of changes in the nucleotide sequence of the P gene of respiratory syncytial virus on the electrophoretic mobility of the P protein
Recent advances in avian virology
Pneumovirus-like characteristics of the mRNA and proteins of turkey rhinotracheitis virus
Sequence analysis of the fusion glycoprotein of pneumonia virus of mice suggests possible conserved secondary structure elements in paramyxovirus fusion glycoproteins
α-Helical coiled coils: a widespread motif in proteins
Characterisation of a virus associated with turkey rhinotracheitis
Molecular cloning and sequence analysis of the human parainfluenza 3 virus RNA encoding the P and C proteins
Nucleotide sequence of the mRNAs encoding the vesicular stomatitis virus N and NS proteins
Sequence comparison of the phosphoprotein mRNAs of antigenic subgroups A and B of human respiratory syncytial virus identifies a highly divergent domain in the predicted protein
Extensive sequence variation in the attachment (G) protein gene of avian pneumovirus: evidence for two distinct subgroups
Sequence analysis of the phosphoprotein (P) genes of human parainfluenza type 4A and 4B viruses and RNA editing at transcripts of the P genes: the number of G residues added is imprecise
An analysis of 5' non coding sequences from 699 vertebrate messenger RNAs
Nucleotide sequence of the respiratory syncytial virus phosphoprotein gene
Turkey rhinotracheitis virus: in vivo and in vitro polypeptide synthesis
Sequence analysis of the 22 K, SH and G genes of turkey rhinotracheitis virus and their intergenic regions reveals a gene order different from that of other pneumoviruses
Nucleotide sequence of the fusion and phosphoprotein genes of human respiratory syncytial (RS) virus Long strain: evidence of subtype genetic heterogeneity
Sequence comparison between the phosphoprotein mRNAs of human and bovine respiratory syncytial virus identifies a divergent domain in the predicted protein
The P gene of human parainfluenza virus type 1 encodes P and C proteins but not a cysteine-rich V protein
Turkey rhinotracheitis: preliminary investigations
Periodic features in the amino acid sequence of nematode myosin rod
Turkey rhinotracheitis: a review
Two non contiguous regions of Sendai virus P protein combine to form a single nucleocapsid binding site
Cloning in single stranded bacteriophage as an aid to rapid DNA sequencing
Sequence analysis of the respiratory syncytial virus phosphoprotein gene
Sequence of 3687 nucleotides from the 3' end of the Sendai virus genome RNA and the predicted amino acid sequences of viral NP, P and C proteins
Sequence analysis of the P and C protein genes of human parainfluenza virus type 3: patterns of amino acid sequence homology among paramyxovirus proteins
Deduced amino acid sequence of the fusion glycoprotein gene of turkey rhinotracheitis virus has a greater identity with that of human respiratory syncytial virus, a pneumovirus, than that of paramyxoviruses or morbilliviruses
Cloning and sequencing of the matrix protein (M) of turkey rhinotracheitis virus reveal a gene order different from that of respiratory syncytial virus
Sequence and in vitro expression of the M2 gene of turkey rhinotracheitis pneumovirus
The sequence reported here has been deposited in the GenBank database under accession number U22110. This work was supported by the Biotechnology and Biological Sciences Research Council and the Ministry for Agriculture, Fisheries and Food, UK.