key: cord-0988325-a647hwu9 authors: Lin, Debby A.; Roychoudhury, Sonali; Palese, Peter; Clay, William C.; Fuller, Frederick J. title: Evolutionary relatedness of the predicted gene product of RNA segment 2 of the Tick-Borne Dhori virus and the PB1 polymerase gene of influenza viruses date: 1991-05-31 journal: Virology DOI: 10.1016/0042-6822(91)90641-n sha: 61f28c2619954ac3b4ae1b41f9dad830d44d35b9 doc_id: 988325 cord_uid: a647hwu9

Abstract

The complete nucleotide sequence of the second largest RNA segment of Dhori/India/1313/61 virus was determined, and the deduced amino acid sequence was compared with the polymerase (P) proteins of influenza A, B, and C viruses. RNA segment 2 (2224 nucleotides) of Dhori virus contains a single long open reading frame that can encode a 716-amino acid polypeptide (81.3 kDa). The predicted polypeptide shares 27 to 31% sequence identity with the PB1 polypeptides of influenza A, B, and C viruses. Among the most highly conserved regions are the sequences around the Asp-Asp motif common to many RNA polymerases. Despite the high level of sequence identity between the Dhori RNA segment 2 gene product and the influenza A, B, and C virus PB1 proteins, the amino acid composition of the Dhori protein indicates a net acidic charge at pH 7.0, in contrast to the basic nature of the PB1 proteins of the influenza viruses. We suggest that the Dhori PB1-like protein be designated the Pα protein of this virus.

Members of the Dhori virus serogroup are as yet unclassified tick-transmitted viruses that share structural and genetic properties with the Orthomyxoviridae (Clerx et al., 1983). Dhori viruses have been isolated from a variety of tick and vertebrate species (Anderson and Casals, 1973; Filipe and Casals, 1979; Karabatsos, 1985; Williams et al., 1973). The genome of the virus consists of seven unique segments of single-stranded RNA with a total size of approximately 11.9 kb.
The viral RNAs have been shown to encode information in the negative sense (Clerx et al., 1983; Fuller et al., 1987; Freedman-Faulstich and Fuller, 1990), and it was previously shown that the Dhori nucleoprotein (encoded by RNA segment 5) shares conserved amino acid sequences with the influenza A, B, and C virus nucleoproteins (Fuller et al., 1987). RNA segment 4 encodes the single envelope protein of Dhori virus (Freedman-Faulstich and Fuller, 1990), which does not share sequence identity with any orthomyxovirus envelope protein. We have also determined the nucleotide sequence of Dhori RNA segment 6, which most likely encodes the viral matrix protein (unpublished data). The 3' and 5' nontranslated ends of RNA segments 4, 5, and 6 are very similar to the conserved ends of the genes of the recognized members of the Orthomyxoviridae (Clerx et al., 1983; Fuller et al., 1987; Freedman-Faulstich and Fuller, 1990) and of segment 3 of Thogoto virus (Staunton et al., 1989).

A second group of tick-transmitted viruses, the Thogoto viruses, possesses a segmented genome with six or seven segments of single-stranded RNA (Clerx et al., 1983; Staunton et al., 1989). Thogoto virus has been shown to replicate in Rhipicephalus appendiculatus ticks and to be transmitted to laboratory animals (Davies et al., 1986). The sequence of the third largest segment of Thogoto/SiAr/126/72 virus was recently determined (Staunton et al., 1989). [...] unique in having a net negative charge, unlike the net positive charge of the corresponding influenza A, B, and C virus proteins. Despite these differences, the data reported here establish that the influenza viruses and Dhori virus share a common evolutionary ancestor.

Dhori/India/1313/61 virus was grown in green monkey kidney (GMK-Vero) cells as previously described (Clerx et al., 1983).
Virus purification and RNA extraction have also been described previously (Clerx et al., 1983). Dhori virus-infected cellular RNA was isolated from Vero cells at 48 hr postinfection, phenol extracted, ethanol precipitated, and purified by oligo(dT)-cellulose chromatography as previously described (Collins et al., 1982; Fuller et al., 1983).

Cloning of virus-specific DNA and identification of clones

The Dhori/India/1313/61 virus strain was used to infect GMK-Vero cells. Infected-cell mRNA was reverse transcribed with an oligo(dT)12-18 primer. Second-strand cDNA synthesis was performed by the method of Gubler and Hoffman (1983). The blunt-ended double-stranded cDNAs were ligated to pBR322 that had been cut with PvuII and dephosphorylated. Clones bearing Dhori virus-specific inserts were identified by colony hybridization using radiolabeled cDNA synthesized on viral RNA as template with the oligonucleotide 5' AGCAA(A/T)AACAAGCAGT 3' as primer. This oligonucleotide is complementary to sequences common to the 3' ends of the seven viral RNA segments (Clerx et al., 1983). Virus-specific clones were organized into several pools by cross-hybridization.

Viral RNA was extracted from Dhori virus purified on cesium chloride gradients (12-42% (w/w)) and resolved on a 1.5% agarose gel containing 10 mM methylmercuric hydroxide. The viral RNAs were transferred from the agarose gel to Biotrans nylon membranes (ICN Inc., Irvine, CA) and hybridized with nick-translated plasmid DNA.

Both strands of the pD50-40 insert DNA were sequenced directly using specific synthetic oligonucleotide primers and the modified T7 DNA polymerase Sequenase (United States Biochemical Corp.). All oligonucleotides were synthesized on an Applied BioSystems 380A synthesizer (Applied BioSystems, Foster City, CA). For both plus-sense and minus-sense sequencing, specific primers were synthesized to permit the reading of new sequence over a distance of 150-200 nucleotides.
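The relationship between the degenerate screening primer and the viral RNA ends can be illustrated with a short sketch. It assumes only Watson-Crick pairing and reads the notation (A/T) as a two-way choice at that position; the function names are hypothetical and do not correspond to any program used in this study. The sketch expands 5' AGCAA(A/T)AACAAGCAGT 3' into its two concrete sequences and derives, for each, the RNA stretch (written 5' to 3') to which it would anneal.

```python
import itertools
import re

# DNA primer base -> paired base in the RNA template (Watson-Crick rules).
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def expand_degenerate(primer):
    """Expand '(X/Y)' positions into every concrete primer sequence."""
    # With one capture group, re.split alternates fixed stretches and choices.
    parts = re.split(r"\(([ACGT/]+)\)", primer)
    options = [[p] if k % 2 == 0 else p.split("/") for k, p in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*options)]


def annealing_site_rna(primer):
    """Return the RNA stretch (5'->3') that the DNA primer would base-pair with."""
    # Complement each base and reverse, because primer and template are antiparallel.
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in reversed(primer))


for concrete in expand_degenerate("AGCAA(A/T)AACAAGCAGT"):
    print(concrete, "->", annealing_site_rna(concrete))
```

Because the pairing is antiparallel, the final characters of each returned string correspond to the 3'-terminal nucleotides of the viral RNA segments.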
To obtain the sequence at the 3' terminus of RNA segment 2, polymerase chain reactions were performed using purified viral RNA and the oligonucleotides 5' AGCAA(A/T)AACAAGCAGT 3' and 5' ATCTCTGTGGAAGCCAC 3'. The sequence at the 5' terminus of RNA segment 2 was likewise obtained by polymerase chain reaction on purified viral RNA using the oligonucleotides 5' AGTAGAQ A)ATCAAAGCA 3' and 5' ACCTCTTTTGTTGAAG 3'. The resulting dsDNAs were cloned into M13mp18 RF and sequenced directly using the universal primer.

The Dhori virus genome was analyzed by cDNA cloning of the poly(A)-containing RNA of virus-infected GMK-Vero cells. Virus-specific cDNA clones were obtained and grouped by cross-hybridization into seven pools. Nick-translated DNAs of plasmids pD21-39, pD50-40, pD43-22, pD68-30, pD22-39, pD26-16, and pD12a-35 hybridized to RNAs 1, 2, 3, 4, 5, 6, and 7, respectively, in a Northern blot analysis of Dhori virus RNA (Fig. 1). Plasmid pD12a-35 contains a small insert of segment 1 as well as an insert for segment 7 (unpublished results). Plasmid pD50-40 was selected for further study since it had the largest insert among a total of nine different clones belonging to the group that hybridized to RNA segment 2. The insert of pD50-40 was then sequenced using specific synthetic oligonucleotide primers and the modified T7 DNA polymerase Sequenase.

[Table 1 (Yamashita et al., 1989); table body not recoverable. Notes: (a) Alignment score (above diagonal) determined by the program PCOMPARE (Intelligenetics) using the unitary matrix, a bias of 0, a gap penalty of 3, and 100 random runs. A score of 5 or greater is considered significant (Dayhoff et al., 1983). (b) Percent amino acid identity (below diagonal) based on an alignment of the two sequences by the method of Needleman and Wunsch (1970).]
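The Table 1 percent identities rest on a Needleman and Wunsch (1970) global alignment. The sketch below shows the common linear-gap form of that dynamic-programming method, followed by a percent-identity count. The scoring values (match 1, mismatch 0, gap -3) are assumptions that echo the unitary matrix and gap penalty listed for PCOMPARE in the Table 1 notes; the scores actually used for the identities are not stated, and dividing by the length of the shorter sequence is likewise an assumed convention.

```python
def needleman_wunsch(a, b, match=1, mismatch=0, gap=-3):
    """Global alignment with a linear gap penalty; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align two residues
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell, preferring the diagonal on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aligned_a, aligned_b):
    """Identical aligned residues as a percentage of the shorter ungapped sequence."""
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    shorter = min(len(aligned_a.replace("-", "")), len(aligned_b.replace("-", "")))
    return 100.0 * identical / shorter


aln_a, aln_b, total = needleman_wunsch("ACGT", "AGT")
print(aln_a, aln_b, total, percent_identity(aln_a, aln_b))
```

In this toy case the one gap costs 3 and the three matches gain 3, so the optimal global score is 0 even though every residue of the shorter sequence is paired with an identical residue.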
The sequences of the 3'- and 5'-terminal regions of RNA segment 2 that were absent from the pD50-40 insert were determined by polymerase chain reaction amplification, cloning of the amplified DNAs into M13mp18 RF, and sequencing of the inserts. The complete sequence of segment 2 of Dhori/India/1313/61 virus is shown in the plus sense in Fig. 2. The RNA is 2224 nucleotides long, and the cDNA insert of pD50-40 lacks only 25 nucleotides at the 5' end (in the plus-sense orientation). The RNA contains a single long open reading frame that could be initiated by an AUG codon at nucleotides 25 through 27. No other reading frame in the antigenome sense is open for longer than 64 amino acids, and the longest open reading frame in the genome sense is 109 amino acids. We do not know whether these smaller reading frames can be expressed, but we have not detected any subgenomic messenger RNAs derived from RNA segment 2 (unpublished data). The AUG codon at nucleotides 25 through 27 is in a strong context for initiation of protein synthesis (G at -3, A at +4; Kozak, 1986). A 716-amino acid polypeptide is predicted from the largest open reading frame, which is terminated by a single UAG codon at nucleotides 2173 through 2175.

A search of the NBRF (National Biomedical Research Foundation) protein database using the FASTP program (Lipman and Pearson, 1985) indicated that the predicted gene product of Dhori segment 2 is most closely related to the PB1 proteins of the influenza virus group. The PB1 polymerase proteins are the most highly conserved among the proteins of the influenza A, B, and C viruses (Yamashita et al., 1989), and they are most likely required for nucleotide addition during viral RNA synthesis (Braam et al., 1983). A pairwise comparison of the amino acid sequence identities of the Dhori virus protein and the influenza A, B, and C virus PB1 proteins is shown in Table 1.
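The coordinates given above are mutually consistent: an initiation codon at nucleotide 25 followed by 716 sense codons places the stop codon at nucleotides 2173 to 2175, leaving 49 nucleotides of 3' noncoding sequence in the 2224-nucleotide segment. The sketch below checks that arithmetic and shows a simple longest-ORF scan on a toy sequence. The function names are hypothetical, the scan covers only the three frames of the strand it is given (a genome-sense search would first require the reverse complement), and it is not the software used in this study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_span(start_nt, aa_length):
    """1-based positions of the stop codon after an ORF of aa_length codons (AUG included)."""
    first = start_nt + 3 * aa_length
    return first, first + 2


def longest_orf(seq):
    """Longest ATG-initiated ORF on this strand as (start_nt, stop_end_nt, aa_length).

    Coordinates are 1-based and include the stop codon; returns None if no ORF
    with an in-frame stop codon is found.
    """
    seq = seq.upper().replace("U", "T")
    best = None
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i
                # Walk codon by codon until an in-frame stop codon or the sequence end.
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(seq):
                    break  # open to the end: no complete ORF in this frame
                aa_length = (j - i) // 3
                if best is None or aa_length > best[2]:
                    best = (i + 1, j + 3, aa_length)
                i = j + 3  # continue scanning after the stop codon
            else:
                i += 3
    return best


print(stop_codon_span(25, 716))         # (2173, 2175)
print(longest_orf("GGATGAAATTTTAGCC"))  # (3, 14, 3)
```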
The optimized alignments were made by the method of Needleman and Wunsch (1970) and indicate that the Dhori PB1-like protein shares sequence identities of 30.7, 26.9, and 29.8% with the PB1 proteins of influenza A/PR/8/34, influenza B/Ann Arbor/1/66, and influenza C/JJ/50 viruses, respectively (Table 1). In addition, we compared the sequence of the Dhori PB1-like protein with the PB1 proteins of influenza A, B, and C viruses using the PCOMPARE (Intelligenetics) program. This program estimates the probability that the similarity between two sequences could occur by chance. The score is expressed as the number of standard deviation units by which the maximum alignment score for the real sequences exceeds the average maximum score for 100 randomizations of the sequence. An alignment score greater than 5 (probability of the similarity occurring by chance < 3 × 10^-7) implies an evolutionary relationship between proteins (Dayhoff et al., 1983). Pairwise comparisons between the Dhori and influenza PB1 proteins (Table 1, above the diagonal) yield alignment scores of 25-26, which clearly indicate an evolutionary relationship between the Dhori virus PB1-like protein and the influenza PB1 proteins. By contrast, similar comparisons between the Dhori segment 2 gene product and the influenza A/PR/8/34 PB2 and PA proteins gave alignment scores of only 0.25 and 0.015, respectively.

Comparative analysis of the deduced amino acid sequences revealed highly conserved regions in the PB1 proteins of influenza A, B, and C viruses (Kawaoka et al., 1989; Yamashita et al., 1989). Although the functional domains of the influenza virus PB1 proteins have not been determined, the conserved regions may be important for basic functions such as initiation and chain elongation (Braam et al., 1983). Four highly conserved motifs have also been described among the RNA-dependent RNA polymerases of plus- and minus-strand viruses by Poch et al.
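The randomization test described above can be sketched as follows. This is not the PCOMPARE implementation but a hypothetical re-creation that assumes the compared score is the optimal global alignment score (unitary matching, bias 0, linear gap penalty 3, as in the Table 1 notes) and that only the second sequence is shuffled. The last function converts a z-score into a one-sided tail probability under the assumption that shuffled scores are normally distributed; for z = 5 it gives the quoted bound of < 3 × 10^-7.

```python
import math
import random
import statistics


def global_score(a, b, match=1, mismatch=0, gap=-3):
    """Needleman-Wunsch score only (linear gap penalty), keeping two rows of the table."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def shuffle_z_score(a, b, runs=100, seed=0):
    """Standard deviations by which the real score exceeds the mean shuffled score."""
    rng = random.Random(seed)
    real = global_score(a, b)
    shuffled = []
    for _ in range(runs):
        letters = list(b)
        rng.shuffle(letters)  # same composition, order destroyed
        shuffled.append(global_score(a, "".join(letters)))
    sd = statistics.stdev(shuffled)
    if sd == 0:
        raise ValueError("shuffled scores have zero spread; z-score undefined")
    return (real - statistics.mean(shuffled)) / sd


def normal_upper_tail(z):
    """One-sided probability of exceeding z under a standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))


print(f"{normal_upper_tail(5):.2e}")  # 2.87e-07
```

Shuffling preserves amino acid composition, so a high z-score reflects residue order rather than shared composition alone.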
(1989), and the Dhori Pα protein shares all of these motifs (Fig. 3). A comparison of the Dhori virus protein with the PB1 proteins shown in Fig. 3 also reveals regions of highly conserved sequence. A 15-residue sequence motif (a double aspartic acid core flanked by hydrophobic residues) is found in many DNA and RNA polymerases (Argos, 1988). The PB1 protein of each of influenza A, B, and C viruses shares this motif (W/FWD/TGLQSSDDFA/VLI/FV/A) at residues 435/438 to 449/452 (domain 3, Fig. 3). In the Dhori virus protein, we found a similar sequence at residues 419 to 433 (TGDHVESSDDFIHFF). All rules of this Asp-Asp polymerase motif are fulfilled except at position 7, which contains a serine instead of the usual Gly, Met, Cys, Val, or Leu residue (Argos, 1988). Thus it is likely that the Dhori virus protein also possesses a polymerase function similar to that of the influenza virus PB1 proteins.

[Fig. 3 (caption partially recovered): ... (Yamashita et al., 1990) with the Pα protein of Dhori/India/1313/61 virus. The alignment was obtained by compiling pairwise comparisons made by the method of Needleman and Wunsch (1970). An asterisk (*) denotes identity with the amino acid of the Dhori Pα protein at that position, and a dash (-) indicates a gap introduced for optimal pairing. The four domains marked by lines above and below the sequences are the highly conserved regions defined in RNA polymerases by Poch et al. (1989).]

The influenza virus PB1 protein is transported to the nucleus independently of other viral proteins (Smith et al., 1987; Akkina et al., 1987). Nath and Nayak (1990) have recently shown that the nuclear localization signal of the influenza A/WSN/33 virus PB1 protein is located within residues 180-252. Their study suggested that two discontinuous regions of PB1, both containing a stretch of basic amino acids, are required for its nuclear localization. Homologous sequences, however, were not found in the Dhori virus protein.
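The position-by-position comparison of Dhori residues 419 to 433 with the influenza PB1 consensus quoted above can be sketched as follows. The sketch assumes that each "X/Y" in the compact notation denotes a single position at which either residue occurs; it compares against that influenza consensus only and does not encode the general Asp-Asp motif rules of Argos (1988). The constant and function names are hypothetical.

```python
import re

# Influenza A/B/C PB1 consensus as quoted in the text; "X/Y" means either residue.
INFLUENZA_MOTIF = "W/FWD/TGLQSSDDFA/VLI/FV/A"
DHORI_SEGMENT = "TGDHVESSDDFIHFF"  # Dhori residues 419-433


def parse_motif(motif):
    """Split the compact notation into one set of allowed residues per position."""
    return [set(tok.split("/")) for tok in re.findall(r"[A-Z](?:/[A-Z])*", motif)]


def matching_positions(segment, motif):
    """Return 1-based positions where the segment residue is allowed by the motif."""
    allowed = parse_motif(motif)
    if len(allowed) != len(segment):
        raise ValueError("segment and motif lengths differ")
    return [k + 1 for k, (res, ok) in enumerate(zip(segment, allowed)) if res in ok]


hits = matching_positions(DHORI_SEGMENT, INFLUENZA_MOTIF)
print(len(parse_motif(INFLUENZA_MOTIF)), hits)  # 15 [3, 7, 8, 9, 10, 11, 14]
```

Seven of the 15 positions, including the central S-S-D-D-F stretch, match the influenza consensus.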
A comparison of some physical properties of the Dhori and influenza PB1 proteins indicates that the Dhori protein is the smallest in this group (716 amino acids; Mr 81,292). Unlike the influenza PB1 proteins, which have a strong positive charge at neutral pH (+23 for A/PR/8/34, +12 for B/AA/1/66, and +14 for C/JJ/50), the Dhori protein is slightly acidic, with a charge of -10 (PEPTIDESORT; Devereux et al., 1984). Since the nomenclature of the influenza virus polymerase proteins is based on their charge characteristics, a different name for the related protein of Dhori virus may be appropriate, and we thus suggest the designation Pα protein. We propose the designations Pα (PB1), Pβ (PB2), and Pγ (PA) for the Dhori polymerase proteins, since charge characteristics do not appear to be constant across virus groups. For example, the PA-like polymerase gene of influenza C/JJ/50 virus (Yamashita et al., 1989) encodes a weakly basic protein designated P3 (the segment 3 gene product), and the third largest RNA segment of Thogoto virus encodes a PA-like protein with only a very weak negative charge (-2.5 at pH 7.0; Staunton et al., 1989).

Genes of other viruses also share sequence identity with influenza virus segments. It has been previously demonstrated that some members of the coronavirus group and influenza C virus have a hemagglutinin-esterase (HE) glycoprotein that recognizes the same N-acetyl-9-O-acetylneuraminic acid receptors on cells (Vlasak et al., 1988). The influenza C virus HE glycoprotein shares approximately 30% amino acid sequence identity with the coronavirus HE glycoprotein present in bovine coronavirus (Kienzle et al., 1990) and mouse hepatitis virus (Luytjes et al., 1988). This example of significant identity between a plus-strand RNA virus and a minus-strand RNA virus supports the concept of a cassette model of evolution for the HE gene and is most likely the result of a recombinational event.
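Net charge values such as those above can be estimated from amino acid composition. The sketch below applies the Henderson-Hasselbalch equation to ionizable side chains and the two termini; the pKa values are approximate textbook-style assumptions and are not necessarily those used by PEPTIDESORT, so the output would not be expected to reproduce the published figures exactly. Because the full protein sequences are not reproduced here, the example uses toy peptides, and the names are hypothetical.

```python
# Approximate pKa values (assumed textbook-style values, not PEPTIDESORT's).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1}
N_TERMINUS_PKA = 9.0
C_TERMINUS_PKA = 2.0


def net_charge(sequence, ph=7.0):
    """Estimate net charge at a given pH using the Henderson-Hasselbalch equation."""
    seq = sequence.upper()
    # Basic groups contribute their protonated fraction, acidic groups their deprotonated fraction.
    positive = 1 / (1 + 10 ** (ph - N_TERMINUS_PKA))
    negative = 1 / (1 + 10 ** (C_TERMINUS_PKA - ph))
    for residue in seq:
        if residue in POSITIVE_PKA:
            positive += 1 / (1 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            negative += 1 / (1 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return positive - negative


def crude_charge(sequence):
    """Count-only estimate: (Lys + Arg) - (Asp + Glu); ignores His and the termini."""
    seq = sequence.upper()
    return seq.count("K") + seq.count("R") - seq.count("D") - seq.count("E")


for peptide in ("KKKK", "DDDD", "KRDEE"):
    print(peptide, round(net_charge(peptide), 2), crude_charge(peptide))
```

The sign convention matches the text: a negative value, such as the -10 reported for the Dhori protein, indicates a net acidic protein at the chosen pH.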
In contrast, we believe that the relationship of the Dhori virus group to the orthomyxovirus group is most likely an example of divergent evolution, in light of the similarities in genome structure and replication strategy and of the observation that several Dhori virus gene products share significant identity with orthomyxovirus genes.

References

Intracellular localization of the viral polymerase proteins in cells infected with influenza virus and cells expressing PB1 protein from cloned cDNA.
[...] Dhori virus.
A sequence motif in many polymerases.
Molecular model of a eukaryotic transcription complex: Function and movements of influenza P proteins during capped RNA primed transcription.
Tick-borne viruses structurally similar to orthomyxoviruses.
Experimental studies on the transmission cycle of Thogoto virus, a candidate orthomyxovirus.
Establishing homologies in protein sequences.
Sequence comparison of wild-type and cold-adapted B/Ann Arbor/1/66 influenza virus genes.
A comprehensive set of sequence analysis programs for the VAX.
Antibodies to Congo-Crimean haemorrhagic fever, Dhori, Thogoto and Bhanja viruses in southern Portugal.
Isolation of Dhori virus from Hyalomma marginatum ticks in Portugal.
Nucleotide sequence of the tick-borne [...].
Complete nucleotide sequence of the tick-borne, orthomyxo-like Dhori/India/1313/61 virus nucleoprotein gene.
A simple and very efficient method for generating cDNA libraries.
International Catalogue of Arboviruses Including Certain Other Viruses of Vertebrates.
Avian-to-human transmission of the PB1 gene of influenza A viruses in the 1957 and 1968 pandemics.
Structure and orientation of expressed bovine coronavirus hemagglutinin-esterase protein.
Point mutations define a sequence flanking the AUG initiation codon that modulates translation by eukaryotic ribosomes.
Rapid and sensitive protein similarity searches.
Sequence of mouse hepatitis virus A59 mRNA 2: Indications for RNA recombination between coronaviruses and influenza C virus.
Function of two discrete regions is required for nuclear localization of polymerase basic protein 1 of A/WSN/33 influenza virus (H1N1).
A general method applicable to the search for similarities in the amino acid sequence of two proteins.
Identification of four conserved motifs among the RNA-dependent polymerase encoding elements.
Synthesis and cellular location of the ten influenza polypeptides individually expressed by recombinant vaccinia viruses.
Sequence analyses of Thogoto viral RNA segment 3: Evidence for a distant relationship between an arbovirus and members of the Orthomyxoviridae.
Human and bovine coronaviruses recognize sialic acid containing receptors similar to those of influenza C viruses.
Isolation of Wanowrie, Thogoto, and Dhori viruses from Hyalomma ticks infecting camels in Egypt.
Nucleotide sequence of human influenza A/PR/8/34 segment 2.
Comparison of the three large polymerase proteins of influenza A, B, and C viruses.

This work was supported by National Institutes of Health Grants AI-18998 (P.P.), AI-11823 (P.P.), and AI-20939 (F.F.). This work was in partial fulfillment of a Westinghouse project (D.L.).