key: cord-0684552-th4srh88 authors: Aaskov, John; Jones, Anita; Choi, Wilson; Lowry, Kym; Stewart, Emerald title: Lineage replacement accompanying duplication and rapid fixation of an RNA element in the nsP3 gene in a species of alphavirus date: 2011-02-20 journal: Virology DOI: 10.1016/j.virol.2010.11.025 sha: 00cc4fafd19f95fa6462e630bd318892af8d142f doc_id: 684552 cord_uid: th4srh88
Abstract
A sequence of thirty-six nucleotides in the nsP3 gene of Ross River virus (RRV), coding for the amino acid sequence HADTVSLDSTVS, was duplicated some time between 1969 and 1979, coinciding with the appearance of a new lineage of this virus and with a major outbreak of Epidemic Polyarthritis among residents of the Pacific Islands. This lineage of RRV continues to circulate throughout Australia, and both earlier lineages, which lacked the duplicated element, are now extinct. Multiple copies of several other elements were also observed in this region of the nsP3 gene in all lineages of RRV. Multiple copies of one of these, coding for the amino acid sequence P*P*PR, were detected in the C-terminal region of the nsP3 protein of all alphaviruses except those of African origin. The fixation of duplications and insertions in the 3′ region of nsP3 genes from all lineages of alphaviruses suggests they provide some fitness advantage.
Alphaviruses are positive sense RNA viruses that share a common ancestor with plant viruses in the tobamavirus, tobravirus and bromovirus families (Koonin and Dolja, 1993). New World alphaviruses are commonly associated with encephalitic disease in humans, while infections with Old World alphaviruses are usually associated with fever, rash and arthritis (Griffin, 2007).
Following infection, the non-structural viral proteins (nsP1-4) of alphaviruses are translated directly from an open reading frame at the 5′ end of the viral genome, while the structural proteins (C, E3, E2, 6K, E1) are derived from a 26S sub-genomic RNA produced by newly synthesised non-structural proteins (Strauss and Strauss, 1994). While the roles of non-structural proteins nsP1, 2 and 4 are well understood, that of nsP3 is less clear. Furthermore, while alphavirus nsP1, 2 and 4 proteins share extensive sequence homology with proteins from other families of positive strand viruses, nsP3 does not (Ahlquist et al., 1985; Haseloff et al., 1984). nsP3 contains two conserved domains. The first (X or macro domain) is conserved among alphaviruses, coronaviruses, rubella and hepatitis E viruses (Koonin and Dolja, 1993) and the second is conserved among alphaviruses (Strauss and Strauss, 1994). nsP3 is highly phosphorylated, particularly at the serine and threonine residues in the C-terminal region (Vihinen and Saarinen, 2000), and may act to attach the alphavirus replication complex (nsP1-4 proteins) to the cytoskeleton of the host cell (Frolova et al., 2006; Gorchakov et al., 2008). Semliki Forest viruses (SFV) can tolerate deletions of 43 to 119 amino acids in the C-terminal region of their nsP3 proteins with only slight reductions in replication efficiency in vitro and in virulence for mice (Galbraith et al., 2006), and a 102 nucleotide deletion in this region of the nsP3 gene of Venezuelan encephalitis virus (VEEV) had no detectable effect on replication in vitro (Davis et al., 1989). Several members of the alphavirus family have an opal stop codon near the 3′ end of the nsP3 gene (Strauss et al., 1988), requiring read-through for production of the nsP4 polymerase.
Duplicated amino acid elements have been observed in the C-terminal region of nsP3 of several alphavirus isolates (Meissner et al., 1999; Oberste et al., 1996; Strauss et al., 1988) but without any indication of when or where these events occurred and whether they were related to the epidemiology of the viruses concerned. Ross River virus (RRV) employs complex, overlapping, urban and rural cycles of transmission involving multiple mosquito and vertebrate hosts but causes disease only in humans and horses (Russell, 2002). The nsP3 protein of a strain of RRV recovered from an Epidemic Polyarthritis patient in 2004 contained a duplication of the amino acid sequence HADTVSLDSTVS/L which had not been observed in any earlier isolates (Jones et al., 2010). The study described here was designed to determine whether the duplication of this element in this strain of RRV was an isolated event and, if not, when and where it had occurred and how quickly the change was fixed or removed.
The amino acid sequence, HADTVSLDSTVS/L, which was duplicated in the nsP3 protein of RRV strain QML 1 recovered in 2004 (Jones et al., 2010) [...] two lineages of RRV which now are extinct (lineages 1 and 2, Table 1, Fig. S2).
Table 1 footnotes: (a) Amino acid numbering from the N-terminus of RRV T48 nsP3. (b) Single copy of the motif in italics. (c) Multiple copies of motifs in bold type. Motif sequence from left to right from N-terminus to C-terminus of the nsP3 protein, e.g. HADTVSLDSTVL followed by HADTVSLDSTVS. (d) Spaces indicate the motif was not observed in that virus.
The C-terminal region of the nsP3 protein of RRV (amino acids 301-550) contained three additional elements that appeared to have been duplicated and one, P*P*PR, that appeared at four locations (Fig. 1A). The other elements contained fewer amino acids than the HADTVSLDSTVS element and their amino acid sequences were less conserved.
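The kind of motif search described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the sequence `TOY_NSP3` is a hypothetical toy string constructed only so that the patterns match, and the function names `find_motif` and `tandem_copies` are assumptions introduced for illustration. The pattern `P.P.PR` encodes P*P*PR with `*` read as any single amino acid, and `HADTVSLDSTV[SL]` encodes HADTVSLDSTVS/L.

```python
import re

# Hypothetical toy sequence, NOT a real nsP3 sequence; it only exists so the
# patterns below have something to match.
TOY_NSP3 = "MKHADTVSLDSTVLHADTVSLDSTVSGGPVPPPRAAPAPEPRTT"

PROLINE_MOTIF = "P.P.PR"          # P*P*PR, '*' = any single amino acid
REPEAT_UNIT = "HADTVSLDSTV[SL]"   # HADTVSLDSTVS/L


def find_motif(seq, pattern):
    """Return (1-based start, matched text) for every occurrence of pattern.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    rx = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in rx.finditer(seq)]


def tandem_copies(seq, pattern):
    """Return the largest number of back-to-back copies of pattern in seq."""
    best = 0
    for run in re.finditer(f"(?:{pattern})+", seq):
        best = max(best, len(re.findall(pattern, run.group(0))))
    return best


if __name__ == "__main__":
    print(find_motif(TOY_NSP3, PROLINE_MOTIF))
    print(find_motif(TOY_NSP3, REPEAT_UNIT))
    print(tandem_copies(TOY_NSP3, REPEAT_UNIT))
```

Applied to aligned nsP3 sequences from several lineages, a search of this kind would report which lineages carry one copy of an element and which carry two, which is the information summarised in Table 1.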
Within the HADTVSLDSTVS/L element, there were three tri-peptides (TVS) which were not found elsewhere in the nsP3 protein of RRV, suggesting they may have been the footprints of previous duplication events in this region. While the sequence HADTVSLDSTVS was duplicated in all post-1979 strains of RRV studied, the other elements that appeared at multiple sites were observed in all lineages of RRV and in the nsP3 proteins of a number of other alphaviruses (Table 1). The earliest example of a lineage 3 strain of RRV in which the element HADTVSLDSTVS/L was duplicated was recovered from an Epidemic Polyarthritis patient in Fiji in 1979 (Aaskov et al., 1981) at the beginning of an outbreak of infection that swept the Pacific region. The number of cases of RRV infection reported in Australia has climbed steadily from approximately 500 cases in 1980 to an average of approximately 5000 per year at present (Aaskov, 2009). Accompanying this increase in the number of cases in Australia has been the steady replacement of lineage 1 and 2 RRV by lineage 3 viruses (Jones et al., 2010). While there had been outbreaks of RRV infection in Australia prior to that in the Pacific, almost certainly caused by strains of RRV without this duplicated element in the nsP3 gene, these involved scores rather than tens of thousands of cases (Aaskov, 2009). However, we have been unable to identify a mechanism by which this change in the nsP3 gene could have conferred a significant fitness advantage on populations of RRV, and it remains possible that one, or several, of the single nucleotide polymorphisms that distinguish the current lineage of RRV from the previous two (Jones et al., 2010) were responsible for these lineage replacements. There are precedents with other alphaviruses for epidemic potential to be determined by changes in only one or two nucleotides (Anischenko et al., 2006; Tsetsarkin et al., 2009).
The task of evaluating the significance of the duplication of this element is made more difficult by the absence of RRV isolates from Epidemic Polyarthritis patients in Australia prior to 1983 (Aaskov et al., 1985) and by the extensive passage of early lineages of RRV from pools of mosquitoes (which may have contained multiple infected insects) in the brains of suckling mice in order to recover isolates. Nonetheless, no changes to this element have been detected, and no further duplications in the nsP3 gene of RRV have been fixed, since 1979 (Table 1, Fig. S1). A comparison of the nucleotide sequences of the nsP3 genes of the prototype strain of RRV (T48) and that in which the HADTVSLDSTVS repeat element was first observed (F9073) suggested three possible locations at which the duplication might have occurred, i.e. 5′ to the original nucleotide sequence, 3′ to the original sequence or into the middle of it (Fig. 1B). Duplication of the sequence 5′ to its position in the RRV T48 genome would require changes to three nucleotides in the insert. Duplication of the sequence 3′ to its position in the T48 genome, or by insertion into the middle of the original sequence, would require nucleotide changes in both the T48 genome and in the duplicated element. If the insertion occurred 3′ to the ancestral sequence, the nucleotide sequences flanking the insertion site would have been almost identical (Fig. 1B). [...] genome into a more stable stem loop (Fig. 2). Similar observations were made for RNA coding for a single and duplicated element in the 3′ region of the nsP3 gene from VEEV (Davis et al., 1989). However, given the additional energy required to unfold more stable RNA structures prior to translation or copying, it is difficult to imagine such changes would confer any fitness advantage.
The element HADTVSLDSTVS/L differed from two of the others (PVPPPR and VEFPWAPEDL) which also appeared to have been duplicated in RRV in that it was not strongly hydrophobic. Even when variation occurred in the sequence of the two latter elements, the amino acid replacements usually were hydrophobic (Table 1). As these two elements were closer to the C-terminus of the nsP3 protein than the recently duplicated one, their hydrophobicity may indicate an association of this region of nsP3 with membranes or membrane-like structures in host cells (Gorchakov et al., 2008). No inverted repeat nucleotide sequences were detected in the regions flanking the sites of the insertions, deletions or duplications in the nsP3 genes of alphaviruses, nor were there A/U rich regions, which might be associated with polymerase slippage and recombination (Nagy and Simon, 1997), on either side of these changes (Fig. S2). However, the sequences of the nucleotides on either side of one of the putative insertion sites in RRV (Fig. 1) were almost identical, as were the sequences flanking an insertion site in SFV (Fig. 3), but this was not the case in the other alphaviruses studied. The flanking nucleotide sequences in SFV were out of frame and so the similarities were not reflected in the amino acid sequence. The changes observed in the nsP3 protein of RRV appeared less chaotic than those observed in this gene in other alphaviruses. Examples of duplicated elements, similar to those observed in RRV, but unique to particular families or lineages of alphaviruses are highlighted in Fig. 3. A full comparison of this region of the nsP3 protein of the major families of alphaviruses and the corresponding nucleotide sequences are shown elsewhere (Fig. S2). In both EEEV and VEEV, the duplicated element appeared 5′ to the original, suggesting that the same may have occurred with the recently duplicated element in RRV nsP3.
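The flanking-sequence checks discussed above (inverted repeats, A/U richness and near-identity of the two flanks) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis performed in this study: the RNA string `TOY_RNA`, the insertion index, the window width and the k-mer length are all hypothetical, and the helper names are introduced here only for the example.

```python
# Hypothetical toy RNA, NOT a real alphavirus sequence: two 12-nt flanks
# that differ at a single position, joined at a notional insertion site.
LEFT = "GGCACUGAAUCC"
RIGHT = "GGCACUGAAUCG"
TOY_RNA = LEFT + RIGHT

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def flanks(rna, site, width):
    """Return the (upstream, downstream) windows around 0-based index site."""
    return rna[max(0, site - width):site], rna[site:site + width]


def au_fraction(rna):
    """Fraction of bases that are A or U (0.0 for an empty string)."""
    rna = rna.upper()
    return sum(base in "AU" for base in rna) / len(rna) if rna else 0.0


def reverse_complement(rna):
    """Reverse complement of an RNA string written with A, C, G, U."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def has_inverted_repeat(left, right, k):
    """True if any k-mer in left has its reverse complement in right."""
    left, right = left.upper(), right.upper()
    return any(
        reverse_complement(left[i:i + k]) in right
        for i in range(len(left) - k + 1)
    )


def identity(a, b):
    """Fraction of matching positions over the shorter of the two strings."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0


if __name__ == "__main__":
    up, down = flanks(TOY_RNA, 12, 12)
    print(au_fraction(up), au_fraction(down))
    print(has_inverted_repeat(up, down, 6))
    print(identity(up, down))
```

In this toy case the two flanks are nearly identical but contain no inverted repeat, which mirrors the pattern reported for one of the putative insertion sites in RRV.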
In contrast to RRV, the nsP3 genes of many other families of alphaviruses appeared to contain foreign genetic material. For example, there appeared to have been insertions of non-CHIKV RNA at two sites in the nsP3 gene of that virus. The amino acid element STITSLTHSQFDLSVDGE in CHIKV 06-021 was found in most strains of CHIKV but not in an example of one of the earliest lineages, ALSA 1. The amino acid sequence STITSLTH was identical to a region of a putative zinc finger protein from Aedes aegypti (GenBank XM001660684.1). The element GIADLAA in SFV (Y12518) was found nowhere else in the SFV polyprotein but appeared in a wide range of cellular proteins, suggesting that host cell RNA could have been inserted into this region of the SFV genome. Examples of what may represent foreign RNA inserted into the nsP3 genes of other alphaviruses have been reported previously (Davis et al., 1989; Oberste et al., 1996; Meissner et al., 1999) or are highlighted in EEEV, SINV and VEEV in Fig. 3. In EEEV and SINV there appeared to be hotspots for insertion events, with progressively larger elements being inserted at the same site in different lineages. As some repeats, e.g. P*P*PR, were observed in most lineages of alphaviruses (Powers et al., 2001), it is likely that the processes giving rise to them have been occurring for centuries. However, apart from two short ALAAR elements in an A-rich region, no repeat elements could be detected in the p150 gene/protein of rubella virus, which has been suggested to be an antecedent of the alphavirus nsP3 gene (Koonin and Dolja, 1993). The recent suggestion by Arrigo et al. (2010) that North American and South American lineages of EEEV be reclassified as different species in the EEE complex is supported by an analysis of the amino acid sequences of the hypervariable region of their nsP3 proteins (Fig. 3).
The EAEV/IH element is not duplicated in the North American lineage and this lineage appears to contain two, and possibly three, large insertions. Using similar criteria, there may be a case for making lineage 1E strains of VEEV a separate species in the VEE complex, i.e. a large amino acid element is duplicated in VEEV lineages 1AB, 1C and 1D but not in 1E, and lineage 1E viruses contain three large sequences not found in the other lineages of VEEV. The changes observed in the C-terminal region of the nsP3 gene/protein of RRV and other alphaviruses bore some similarities to those in defective interfering (DI) particles of SINV and SFV, i.e. linear repeats and the insertion of foreign nucleotide sequences (Lehtovaara et al., 1981; Tsiang et al., 1985), raising the possibility that the processes giving rise to the hypervariability in nsP3 genes are similar to those that give rise to alphavirus DI particles. These observations and earlier studies (Davis et al., 1989; Lehtovaara et al., 1981; Tsiang et al., 1985) suggest that the hypervariability of the nsP3 gene and the generation of alphavirus DI particles could both be due to recombination as a result of RNA template switching by nsP4. The duplication events in EEEV, VEEV and possibly RRV occurred 5′ to the original element, suggesting that recombination could have occurred during synthesis of negative strand RNA. Perhaps nsP4 is more prone to template switching when it is associated with the uncleaved nsP1-3 polyprotein to synthesise negative strand RNA than when it is complexed with nsP1, nsP2 and nsP3 proteins to produce positive strand RNA. The association of changes in the envelope proteins of alphaviruses with outbreaks of disease (Anischenko et al., 2006; Tsetsarkin et al., 2009) has focussed attention on the structural proteins of this family of viruses.
However, while changes to structural proteins have the potential to influence the entry into, and the egress from, infected cells by virions, changes to non-structural proteins have the potential to have profound effects on the amount of virus produced and on the fitness of those virions, e.g. depending on the fidelity of the replication of viral genomes (Pfeiffer and Kirkegaard, 2005). The observation that all alphaviruses appear to insert pieces of autologous and/or heterologous RNA into the 3′ region of their nsP3 genes, and that some of these changes spread rapidly throughout lineages of these viruses, suggests that there is some evolutionary benefit accruing from this process. What this benefit might be remains to be elucidated.
Strains of RRV (Table 2) were obtained from the collection at the World Health Organisation Collaborating Centre for Arbovirus Reference and Research at the Queensland University of Technology. Nucleotide sequences for other alphaviruses were obtained from GenBank. RNA was extracted from RRV in the supernatant of cultures of infected Vero cells with QIAamp viral RNA minicolumns (Qiagen), according to the manufacturer's instructions. The RNA was reverse transcribed with Superscript III reverse transcriptase (Invitrogen) and random hexanucleotide primers (Boehringer). The resultant cDNA was amplified using a mixture of Taq and Pwo polymerases (Expand Long Template DNA polymerase; Roche) and RRV nsP3 specific primers (Table 3). The PCR product was analysed in 1.5% w/v agarose-Tris-acetate-EDTA gels, and bands of cDNA of interest were recovered and purified with QIAquick gel extraction kits (Qiagen) according to the manufacturer's instructions. The cDNA was sequenced at the Australian Genome Research Facility (Brisbane) using di-deoxy dye termination technology (Applied Biosystems).
Sequences were aligned and analysed with software (Clustal W, DNAdist, Seqboot, Consense, Neighbour, M-Fold) available from the Australian National Genome Information Service (http://biomanager.info/). The one letter amino acid code has been used to identify amino acids.
Supplementary materials related to this article can be found online at doi: 10.1016/j.virol.2010.11.025.
Ross River virus: epidemic polyarthritis
An epidemic of Ross River virus in Fiji
Isolation of Ross River virus from epidemic polyarthritis patients in Australia
Sindbis virus proteins nsP1 and nsP2 contain homology to non-structural proteins from several RNA plant viruses
Venezuelan encephalitis emergence mediated by a phylogenetically predicted viral mutation
Evolutionary patterns of Eastern Equine Encephalitis virus in North versus South America suggest ecological differences and taxonomic revision
In vitro synthesis of infectious Venezuelan Equine Encephalitis virus RNA from a cDNA clone: analysis of a viable deletion mutant
Formation of nsP3 specific protein complexes during Sindbis virus replication
Deletions in the hypervariable domain of the nsP3 gene attenuates Semliki Forest virus virulence
Different types of nsP3-containing protein complexes in Sindbis virus-infected cells
Alphaviruses
Striking similarities in amino acid sequence among non-structural proteins encoded by RNA viruses that have dissimilar genomic organisation
Molecular evolutionary dynamics of Ross River virus and implications for vaccine efficacy
Evolution and taxonomy of positive-strand RNA viruses: implications of comparative analysis of amino acid sequences
18S defective interfering RNA of Semliki Forest virus contains a triplicated linear repeat
Sequencing of prototype viruses in the Venezuelan equine encephalitis antigenic complex
New insights into the mechanism of RNA recombination
Complete sequence of Venezuelan equine encephalitis virus subtype 1E reveals conserved and hypervariable domains within the C terminus of nsP3
Increased fidelity reduces poliovirus fitness and virulence under selective pressure in mice
Evolutionary relationships and systematics of the alphaviruses
Ross River virus: ecology and distribution
Nonstructural proteins nsP3 and nsP4 of Ross River and O'Nyong-nyong viruses: sequence and comparison with those of other alphaviruses
The alphaviruses: gene expression, replication and evolution
Epistatic roles of E2 glycoprotein mutations in adaption of Chikungunya virus to Aedes albopictus and Ae. aegypti mosquitoes
Studies of defective interfering RNAs of Sindbis virus with and without tRNA Asp sequences at their 5′ termini
Phosphorylation site analysis of Semliki Forest virus nonstructural protein 3
This study was supported by grants from the Cook Estate and the National Health and Medical Research Council of Australia.