key: cord-0010778-qqj4l4e1 authors: TOMITAKA, YASUHIRO; OHSHIMA, KAZUSATO title: A phylogeographical study of the Turnip mosaic virus population in East Asia reveals an ‘emergent’ lineage in Japan date: 2006-11-14 journal: Mol Ecol DOI: 10.1111/j.1365-294x.2006.03094.x sha: 87634587c3e88e58b6833a638bdf4de5d55ae03e doc_id: 10778 cord_uid: qqj4l4e1 The genetic structure of populations of Turnip mosaic virus (TuMV) in East Asia was assessed by making host range and gene sequence comparisons of 118 isolates utilizing a population genetic approach. Most, but not all, isolates collected from Brassica plants in China infected only Brassica plants, whereas those from Japan infected both Brassica and Raphanus (BR) plants. Analyses of the positions of recombination sites in five regions of the genomes (one third of the full sequence) of the many recombinant isolates were fully congruent with the results of phylogenetic analysis, and at least one recombination type pattern was shared between Chinese and Japanese populations. One lineage of nonrecombinant isolates from the basal‐BR lineage was found in 2000 in Kyushu, Japan but none in China, and have since been found over the whole island. The sudden expansion of this basal‐BR population was strongly supported by calculations showing the deviations from the neutral equilibrium model for the individual geographical lineages with overall lack of nucleotide diversity, and by analysis of mismatch distribution. Our study shows that the recent Chinese and Japanese TuMV isolates are part of the same population but are discrete lineages. Comparisons of the genetic structures of the populations of codistributed species can provide significant insight into the extent to which extrinsic and intrinsic factors interact to affect the geographical scale of population differentiation (Bermingham & Moritz 1998; Campbell et al . 2006) . 
For instance, ecologically and phylogenetically disparate virus populations may exhibit striking concordance in phylogeographical structure across historical barriers to gene flow. Whereas a large number of such studies on the structure of animal and human RNA viruses have been reported, there have been few similar studies of plant RNA viruses (García-Arenal et al . 2001 ). In addition, very few studies on genetic populations using sequence-based analysis have been reported, whereas several studies of plant viruses have used molecular markers (Tsompana et al . 2005) . Turnip mosaic virus (TuMV) was ranked second only to Cucumber mosaic virus (CMV) as the most important virus infecting field-grown vegetables in a survey of their virus diseases in 28 countries and regions (Tomlinson 1987; . TuMV infects a wide range of plant species, most from the family Brassicaceae . It is probably the most widespread and important virus infecting both crop and ornamental species of this family throughout the world, and especially threatens cultivated brassicas in East Asia (Provvidenti 1996; Tomimura et al . 2003) . TuMV belongs to the genus Potyvirus . This is the largest genus of the largest family of plant viruses, the Potyviridae (Shukla et al . 1994; Fauquet et al . 2005) , which itself belongs to the picorna-like supergroup of viruses. TuMV, like other potyviruses, is transmitted by aphids in the nonpersistent manner (Hamlyn 1953) . Potyviruses have flexuous filamentous particles 700-750 nm long, each of which contains a single copy of the genome, which is a single-stranded positive sense RNA molecule of about 10 000 nucleotides (nts). The genomes have terminal untranslated regions flanking a single open reading frame. The single large polyprotein hydrolyses itself after translation into at least 10 proteins (Riechmann et al . 1992; Urcuqui-Inchima et al . 2001) . 
The protein 1 (P1) gene, situated at the 5′ end of the genome, encodes a serine proteinase, and P1 is the protein with the greatest between-species variation. Even though P1 enhances amplification and movement of the virus, it is not strictly required for viral infectivity (Verchot et al. 1992; Klein et al. 1994). The 6K protein 2 (second 6-kDa protein) is involved in viral long-distance movement and symptom induction (Restrepo-Hartwig & Carrington 1994). The viral genome-linked protein (VPg) is a multifunctional protein with several suggested roles in the viral infection cycle; it may act as a primer for RNA replicase during virus multiplication, possibly through direct interaction with the viral RNA polymerase, and it has essential functions in host genotype specificity (Schaad et al. 1997). The nuclear inclusion a proteinase (NIa-Pro) cleaves the polyprotein (Parks et al. 1995). The coat protein (CP) is involved in aphid transmission, cell-to-cell and systemic movement, encapsidation of the viral RNA and regulation of viral RNA amplification (Atreya et al. 1991; Dolja et al. 1994). CP sequences have been studied extensively to assess the differences within and between potyvirus species (Urcuqui-Inchima et al. 2001). Phylogeographical analysis looks for congruence between the phylogenetic and geographical relationships of organisms. It thereby elucidates the processes that underlie the genetic diversity of populations in space and time. We therefore decided that, to better understand the evolution of host/virus interactions of TuMV, it was important to determine in more detail the molecular changes that occurred during the evolution of its populations. Our earlier studies have shown that the Asian TuMV subpopulation has probably emerged from the more ancient Eurasian subpopulation (Tomimura et al. 2003, 2004).
In both regions, Brassica crops are an important component of local agriculture, but in Europe the crops are mostly Brassica species whereas in Asia they are both Brassica and Raphanus species. It is of particular interest to evaluate the impact of recombination in shaping potyvirus populations. However, the structure of potyvirus populations has rarely been analysed at the local scale, so that the evolutionary processes determining the population structure of the species are often unknown (Moreno et al. 2004). Although Japan is close to China geographically, there is no evidence for plant virus migration between China and Japan. There are many reports on 'emerging' plant viruses such as begomoviruses (Brown 2001; Briddon & Stanley 2006) and tospoviruses (Moyer 1999). It is relatively easy to identify a 'new' emerging virus when it is first introduced into a region; however, a more detailed genetic study of the population is required to identify an 'emergent' virus subpopulation when it is mixed with an older endemic subpopulation, and there are few reports known to us of such studies for plant virus species other than CMV (Moreno et al. 2004). We report here that we have determined and compared gene sequences from c. 120 representative isolates of TuMV from different parts of China and Japan and from different host species, and used these to assess the genetic structure of the population by studies of their recombination, phylogeny and selection, and by neutrality tests and mismatch distribution. In addition, we discuss the results in terms of the information they provide about the changes that have occurred during country-wide evolution, migration and demographic changes in the TuMV populations, resulting in perhaps the most detailed such study of a plant virus to date.
Details of the TuMV isolates, their country of origin, original host plant, year of isolation, and host type are shown in Table 1 , together with details of the isolates for which complete genomic sequences have already been reported (Tomimura et al . 2003; Suehiro et al . 2004 ; GenBank Accession nos AF394601 and AF394602). All the isolates were inoculated to Chenopodium quinoa and serially cloned through single lesions at least three times. They were propagated in Brassica rapa cv. Hakatasuwari or Nicotiana benthamiana plants. Plants infected systemically with each of the TuMV isolates were homogenized in 0.01 m potassium phosphate buffer (pH 7.0), and the isolates were mechanically inoculated to young plants of B. rapa cv. Hakatasuwari, B. pekinensis cv. Nozaki-1go, and B. napus cv. Norin-32go, as well as to Raphanus sativus cvs. Taibyo-sobutori and Akimasari. Inoculated plants were kept for at least 4 weeks in a glasshouse at 25 ° C. The viral RNAs were extracted from purified virions (Choi et al . 1977) or TuMV-infected B. rapa and N. benthamiana leaves using Isogen (Nippon Gene). The RNAs were reverse transcribed and amplified using high-fidelity Platinum™ Pfx DNA polymerase (Invitrogen). The cDNAs were separated by electrophoresis in agarose gels and purified using the QIAquick Gel Extraction Kit (QIAGEN). The purified cDNAs were cloned into pBluescript II SK + . Plasmids were maintained in Escherichia coli XL1-Blue (Stratagene). Sequences from each isolate were determined using two independent reverse transcription-polymerase chain reaction (RT-PCR) products and cloned plasmids. 
Each cloned plasmid and RT-PCR product was sequenced by primer walking in both directions using the BigDye Terminator version 3.1 Cycle Sequencing Ready Reaction Kit (Applied Biosystems) and an Applied Biosystems Genetic Analyser DNA Model 310; ambiguous nts in any sequence were The sequences of 196 isolates, collected worldwide, were obtained from the international gene sequence database and used for the recombination analyses. First, we joined the nt sequences of the P1, part C-terminus of cylindrical inclusion protein (Ct-CI, 3′ terminal 63 nts), 6K2, VPg, NIa-Pro and CP genes of each sequence, and called this its 'concat', namely a concatenated sequence analogous to a contiguous sequence or 'contig'. Phylogenies were calculated from 29 equal slices of the aligned gene sequences. Only those from the region around nt 5501-6500 in degapped sequences, which we call region 12 (R12; see Tomimura et al. 2003) , and which encodes from the Ct-CI to the middle of NIa-Pro nt sequences, gave trees calculated by three different methods [i.e. maximum-likelihood (ML), maximum parsimony (MP) and neighbour-joining (NJ), as described below] that were almost identical to those constructed from entire genomic sequences. The concats were used for evolutionary analyses. The corresponding regions of two sequences of Japanese yam mosaic virus ( JYMV) (Fuji & Nakamae 1999 , one of Scallion mosaic virus (ScMV) (Chen et al. 2002) and one of Narcissus yellow stripe virus (NYSV) (Chen et al. 2003) were used to align the TuMV concats, as blast searches had shown them to be the sequences in the international sequence databases most closely and consistently related to those of TuMV; TuMV P1 genes were more closely related to those of JYMV than ScMV, whereas for some other TuMV regions/genes such as R12 and the remainder of NIa-Pro gene it was the converse, except that the TuMV CP gene was most closely related to that of NYSV. 
We thus aligned all 196 P1 sequences with those of two JYMV isolates as the outgroup, the R12 + Pro sequences with those of JYMV and ScMV, and the CP sequences with that of NYSV using clustal x (Jeanmougin et al. 1998 ). However, this procedure resulted in some gaps, which were not in multiples of three nts. Therefore, the amino acid sequences corresponding to individual gene regions, together with the appropriate outgroups shown above, were aligned using clustal x with transalign (kindly supplied by Georg Weiller) to maintain the degapped alignment of the encoded amino acids, and then the separate gene regions reassembled to form sequences 3315 nts long. The aligned sequences were checked for incongruent relationships, which might have resulted from recombination, using rdp (Martin & Rybicki 2000) , geneconv (Sawyer 1999) , bootscan (Salminen et al. 1995) , maxchi (Maynard Smith 1992), chimaera (Posada & Crandall 2001) and siscan programs in rdp2 software (Martin et al. 2005) , and phylpro (Weiller 1998 ) and siscan version 2 (Gibbs et al. 2000) programs in the original software. We first checked for incongruent relationships using phylpro and siscan version 2, and then the programs in rdp2, whose analyses were done using default settings for different detection programs and a Bonferroni-corrected P-value cut off of 0.05 or 0.01, and then all isolates that had been identified as likely recombinants by the programs in rdp2 were rechecked using phylpro and siscan version 2, reviewing not just the complete nt sequences but also the synonymous (d s ) and nonsynonymous (d n ) sites separately. In addition, we checked 100 and 50 nt slices of all the sequences for evidence of recombination using the same programs. These analyses also assessed which nonrecombinant sequences have regions that are most closely related to regions of the recombinant sequences, and hence indicate the likely lineages that provided those regions to the recombinants. 
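The codon-preserving alignment step described above can be illustrated with a minimal Python sketch. It is not the transalign program, whose internals are not described here; it only shows the general idea of threading each coding sequence onto its aligned amino acid sequence, so that every gap spans a whole codon, and then joining the separately aligned regions. The function names `thread_codons` and `build_concat` and the toy sequences are hypothetical.

```python
def thread_codons(aligned_protein, coding_seq, gap="-"):
    """Thread an ungapped coding sequence onto its aligned protein sequence.

    Each residue in the protein alignment consumes one codon (3 nts) and each
    alignment gap becomes three gap characters, so gaps in the resulting
    nucleotide alignment are always multiples of three.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(coding_seq) != 3 * n_residues:
        raise ValueError("coding sequence must contain exactly one codon per residue")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)
        else:
            codons.append(coding_seq[pos:pos + 3])
            pos += 3
    return "".join(codons)


def build_concat(regions):
    """Join separately aligned gene regions into one concatenated sequence."""
    return "".join(regions)


# Hypothetical toy data: two regions of one isolate, each aligned as protein.
region_a = thread_codons("MA-K", "ATGGCAAAA")  # 'ATGGCA---AAA'
region_b = thread_codons("G-T", "GGAACA")      # 'GGA---ACA'
concat = build_concat([region_a, region_b])
```

The sketch does not check that the codons actually translate to the given residues; a real pipeline would verify this before reassembling the regions.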
For simplicity, we call these the 'parental' isolates of recombinants, although they are merely those that include the most closely related regions. The phylogenetic relationships of the sequences were determined by three methods: the ML algorithm of treepuzzle version 5.0 (Strimmer & von Haeseler 1996; Strimmer et al. 1997), the MP algorithm of paup* 4.0 beta version 10 (Swofford 2003) and the NJ algorithm of phylip version 3.5 (Felsenstein 1993). For ML analyses, 100 puzzling steps were calculated using the Hasegawa-Kishino-Yano (HKY) model of substitution (Hasegawa et al. 1985). For MP analyses, the heuristic search option and 100 bootstrap resamplings were used. For NJ analyses, distance matrices were calculated by dnadist with the Kimura two-parameter option (Kimura 1980), and trees were constructed from these matrices by the NJ method (Saitou & Nei 1987). A bootstrap value for each internal node of the NJ trees was calculated using 1000 random resamplings with seqboot (Felsenstein 1985). The calculated trees were displayed by treeview (Page 1996). One ScMV and two JYMV sequences were used as the outgroup to construct a phylogenetic tree of the concats, because the CP sequence of only one NYSV isolate was available (Chen et al. 2003). The JYMV and ScMV sequences corresponding to individual gene regions within the concats were aligned as the encoded amino acids using clustal x with transalign to maintain the degapped alignment of the nts, and then reassembled to form sequences 2979 nt long. The phylogenetic relationships of all the East Asian sequences were determined in almost the same way as that used to analyse isolates from all parts of the world. In this analysis, only isolates with known collection years were used. The 306 nts nearest to the 5′ end of the concats were discarded before the phylogenies were constructed because this region contained many recombination sites (RSs) involving parents from different major lineages (i.e.
interlineage recombinants). Bootstraps using 100 or 1000 puzzling steps were calculated using the HKY model of substitution for ML analyses. The parts of the JYMV and ScMV sequences homologous to the component parts of the concats were aligned using clustal x with transalign in order to maintain the degapped alignment of the nts, and then reassembled to form concat sequences 2673 nt long. The nt diversity was estimated using the Kimura two-parameter correction (Kimura 1980), and was expressed as the average number of nt substitutions per site in each pair of sequence variants. We used an ML approach to assess selection pressures in TuMV. Differences in d n and d s that correlated with phylogenetic relationships were estimated using the codeml program of the paml package version 3.14 (Yang 1997) with parameters runmode = −2 and NSsites = 0; this model assumes one d n /d s ratio for all sites. They were also estimated using the Pamilo-Bianchi-Li (PBL) method (Li 1993; Pamilo & Bianchi 1993) of mega version 3.1 (Kumar et al. 2004). dnasp version 4.10 (Rozas et al. 2003) was used to estimate Tajima's D (Tajima 1989), Fu & Li's D and F (Fu & Li 1993) statistical tests, and haplotype diversity. Tajima's D test is based on the difference between the number of segregating sites and the average number of nt differences. Fu & Li's D test is based on the difference between the number of singletons (mutations appearing only once among the sequences) and the total number of mutations. Fu & Li's F test is based on the difference between the number of singletons and the average number of nt differences between every pair of sequences. Haplotype diversity was calculated based on the frequency and number of haplotypes in the population. arlequin version 3.0 (Excoffier et al. 2005) was used to estimate the frequency distribution of the number of pairwise differences among all sequences (mismatch distribution). The analysis was based on 1000 simulated samples.
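The quantities behind the neutrality and mismatch analyses can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not a reimplementation of dnasp or arlequin: the sequences are assumed to be aligned, of equal length and free of gaps, no significance testing or simulation is performed, and the toy data set `star` is hypothetical. Tajima's D follows the standard formula of Tajima (1989).

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pairwise_differences(seqs):
    """Number of differing sites for every pair of aligned sequences."""
    return [sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)]


def mismatch_distribution(seqs):
    """Counts of sequence pairs that differ at 0, 1, 2, ... sites."""
    counts = Counter(pairwise_differences(seqs))
    return [counts[d] for d in range(max(counts) + 1)]


def tajimas_d(seqs):
    """Tajima's D: compares the mean pairwise difference (k) with the
    number of segregating sites (S) scaled by a1."""
    n = len(seqs)
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if n < 4 or s == 0:
        raise ValueError("need at least four sequences and one segregating site")
    diffs = pairwise_differences(seqs)
    k = sum(diffs) / len(diffs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical star-like sample: every sequence carries one private mutation.
star = ["TAAAAAAA", "ATAAAAAA", "AATAAAAA", "AAATAAAA", "AAAATAAA"]
distribution = mismatch_distribution(star)  # unimodal: all pairs differ at 2 sites
d_value = tajimas_d(star)                   # negative: excess of singletons
```

In this toy case all mutations are singletons, which gives a single-peaked mismatch distribution and a negative D, the pattern expected after a recent expansion.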
Mismatch distribution was used to evaluate which population had a star-like phylogeny due to the accumulation of low frequency mutations during a recent expansion.
A total of 118 TuMV isolates were examined in this study; 34 came from mainland China, seven from Taiwan, three from Korea, and the remaining 74 from different islands of Japan, of which 64 were from Kyushu, nine from Honshu and one from Hokkaido (Table 1 and Fig. 1). The original host plant of one isolate, CHN12, and the host types of two isolates, C1 and TW, both Taiwanese, were unknown, as was the year of collection of the Korean RAD1 and RHS1 isolates. B. rapa and N. benthamiana plants infected systemically with each of the East Asian TuMV isolates were homogenized and their sap mechanically inoculated to young plants of B. rapa cv. Hakatasuwari, B. pekinensis cv. Nozaki-1go, and B. napus cv. Norin-32go, as well as to R. sativus cvs. Taibyo-sobutori and Akimasari. Nineteen out of 26 isolates collected from Brassica plants in China infected most Brassica plants systemically but did not infect R. sativus (Table 2A). Thus, most of the isolates (19 of 26), despite minor differences in virulence for some of the tested Brassica hosts (data not shown), belong to the Brassica (B) infecting host type (Tan et al. 2004), and seven of the isolates always infected Raphanus plants, although sometimes only occasionally, and thus are of the Brassica/Raphanus (BR) host type. By contrast, only three of 17 isolates collected from Brassica in Japan were B host type, and most were BR infecting host type whether they had been collected from Brassica or Raphanus plants (Table 2B). The same host-type patterns were obtained when the results for China were separated into their mainland Chinese and Taiwanese components, and the Japanese isolates were separated into Kyushu and Honshu-Hokkaido components. All the isolates collected from Raphanus plants in China and Japan were BR host type.
The sequences of the genes encoding the P1, Ct-CI, 6K2, VPg and NIa-Pro, and the CP of each isolate from Kyushu, Japan, were determined. The genomic regions encoding the P1, 6K2, VPg, NIa-Pro and CP of all isolates were 1086, 159, 576, 729 and 864 nts long, respectively. Therefore the lengths of the genes of all Chinese, Korean and Japanese isolates were identical. The sequences are in the GenBank, EMBL and DDBJ databases with Accession nos AB233078-AB233174. We looked for evidence of recombination in the concatenated P1, 6K2, VPg, NIa-Pro and CP sequences ('concats') (Tomimura et al. 2004). As the 6K2, VPg and NIa-Pro regions are contiguous in the genome, the actual position of RSs within those genes, except perhaps in the 5′- and 3′-terminal 100 nts, will be revealed by phylpro and siscan analyses. Similarly, RSs within the P1 and CP regions will be found within those regions. However, RSs in the helper-component protease protein (HC-Pro), protein 3 (P3), first 6-kDa protein (6K1) and CI genes, which lie between the P1 gene and the 6K2 gene, will appear as RSs at the sequence interface of the concat. Similarly, single RSs, but not 'insertions', within the nuclear inclusion b protein (NIb) region will be found at the NIa-Pro/CP interface of the concat. We searched for evidence of recombination in a total of 196 TuMV isolates from around the world (Ohshima et al. 2002; Tomimura et al. 2003, 2004; Tan et al. 2004) using the phylpro program (Weiller 1998). This showed that some were phylogenetically anomalous, and these were examined by the 'sister scanning' method (siscan version 2) (Gibbs et al. 2000) to determine whether these anomalies resulted from recombination rather than convergent selection. Then, the programs in the rdp2 package (Martin et al. 2005) were used to confirm whether the RSs detected by the phylpro and siscan programs were likely.
Recombination detecting programs can be grouped by their basic assumptions; for instance, bootscan (Salminen et al. 1995) , rdp (Martin & Rybicki 2000) and siscan programs are phylogenetic methods, whereas geneconv (Sawyer 1999) , maxchi (Maynard Smith 1992) and chimaera (Posada & Crandall 2001) programs are substitution methods, and the phylpro program is a distance comparison method (Posada & Crandall 2001) . All the East Asian isolates which had been identified as recombinants by Tomimura et al. (2003) and Tan et al. We therefore used siscan to check 50 and 100 nt slices of all Eurasian sequences for evidence of recombination either using all nts, or using those that had ds or dn changes separately. Table 3 summarizes the location of crossover sites found in the East Asian genomes by the seven recombination detecting programs, and Fig. 2 shows the different patterns of RSs in those recombinants together with a list of the sequences in which the various patterns were found. There seemed to be at least 20 RSs in the TuMV genomes and 17 recombination type patterns (A-Q, Fig. 2 ) in the East Asian population. Fourteen of the 20 RSs seemed to be derived from parents from different major lineages (i.e. interlineage RSs), seven RSs seemed to be derived from parents from the same major lineage (i.e. intralineage RSs). Most of RSs were detected by all programs; therefore, we called these 'clear' RSs, but two intralineage RSs at around nts 6121 and 6300 in the VPg gene were only detected by some programs and one of the parents had an RS at the same site, and thus, further analyses will be required to clarify this anomaly. We therefore considered these sites to be 'tentative' RSs or false positives, *Approximately estimated recombination crossover sites detected in the Turnip mosaic virus concat aligned sequences by the recombination detecting programs. Crossover site shows locations of individual genes/regions in 1J genome (Ohshima et al. 1996) . 
†Recombinant isolates identified by the recombination detecting programs; R (rdp), G (geneconv), B (bootscan), M (maxchi), C (chimaera) and S R (siscan) programs in rdp2, and S S (siscan synonymous site analysis) in siscan version 2 and P (phylpro) programs. The analysis was carried out with default settings for the different detection methods and a Bonferroni-corrected P values cutoff of 0.01. Typical pattern for the detected recombinants is listed. ‡The reported P-value is for the program in bold type in RDP2 and is the greatest P-value among the isolates calculated for the region in question. §One of the parents of recombinants showed Z-values greater than 3 in (on analysis of all nucleotidesites in S R of rdp2 program; thus, lower Z-values of one of the parents identified in S S of siscan version 2 program is shown. ¶For all isolates, see Fig. 2 . Recombination crossover sites of the isolates (in parentheses) were detected by the programs using 40 entire genomic sequences available from the international gene sequence data bases. For details of the genogroups, basal-BR, Asian-BR and world-B, see Ohshima et al. (2002) . and so recombination type patterns K and M may be artefacts. Two of the different recombination patterns (I and J) were found in many isolates. The other recombination patterns were only found in the genomes from three or fewer isolates. Isolates with patterns I were found in China and Japan, whereas patterns A, C-G, J and O-Q were found only in Japanese isolates and patterns B and L only in those from China. Isolates with pattern N, the mosaic genome pattern with basal-BR sequence, were found in both China and Korea but not in Japan. In summary, the Japanese population seemed to have a greater number of different recombination patterns than the Chinese (Table 4) . 
Earlier analyses using recombination detecting programs showed that as larger numbers of sequences are obtained and compared, the parents of tentative recombinants are more certainly identified (Tomimura et al. 2004 ). One recombination pattern Q was newly found among East Asian isolates in the present study, and thus, in summary, a total of 45 out of 118 (38%) of the East Asian isolates were found to be 'clear' recombinants. We checked the Fig. 2 Recombination maps of Turnip mosaic virus genomes; the estimated nucleotide positions of the recombination sites (RSs) are shown relative to the 5′ end of the P1 gene using the numbering of the 1J sequence (Ohshima et al. 1996) . The approximate RS positions were estimated from data analysed using the phylpro and siscan version 2 of the original software (Gibbs et al. 2000; Weiller 1998 ) and also rdp (Martin & Rybicki 2000) , geneconv (Sawyer 1999) , bootscan (Salminen et al. 1995) , maxchi (Maynard Smith 1992), chimaera (Posada & Crandall 2001) and siscan in the rdp2 program, together with that published by Ohshima et al. (2002) and Tan et al. (2004) . The wide box shows the P1, R12 + Pro and CP regions sequenced in this study, the narrow box shows the regions not yet sequenced. The boxes in white, grey and black are, respectively, of world-B, Asian-BR and basal-BR group parents as assessed by phylogenetic analysis. The thin line indicates 'tentative' RS that may be false positives (see Results), and the bold line indicates 'clear' RSs. The recombination type patterns K and M may be nonrecombinant (see Results). Recombination type pattern H of Nepalese isolates (NPL4 and 5) is included, and the isolates had interlineage RSs at nt 605 in P1 gene (see Tan et al. 2004). 
geographical distribution of different recombination patterns and found that isolates with identical recombination patterns tended to be grouped and, when present in different countries, also tended to be grouped within those countries, an indication of founder effects during colonization (data not shown). We assessed the phylogenetic relationships of all the 196 concats including those of all the recombinants identified previously (Tomimura et al. 2003, 2004; Tan et al. 2004). However, the resulting trees were inconsistent, and had poor bootstrap support for some lineages, as had previously been found for concat sequences (Tomimura et al. 2004). We therefore excluded the interlineage recombinant sequences and calculated trees from the concats of the remaining 139 isolates. The trees were obtained by ML, MP and NJ methods, and the ML tree is shown in Fig. 3. All the trees partitioned most of the sequences into the same four consistent groups as reported earlier: basal-B, basal-BR, Asian-BR and world-B (Tomimura et al. 2003, 2004; Tan et al. 2004). However, no East Asian isolate fell into the basal-B group, which is the sister group to all others in the ML, MP and NJ phylogenies, consists mostly of Eurasian isolates, and is most closely related to the JYMV and ScMV outgroup sequences. The remaining isolates form two major lineages: one of B host type isolates, the 'world-B group', and the other of BR host type isolates, mostly from East Asia, itself forming two lineages, the 'basal-BR group', which includes a few European isolates, and the 'Asian-BR group'. The Chinese isolates fell into both the Asian-BR and world-B groups, whereas Japanese isolates occurred in all three groups. Within the world-B and basal-BR groups, there were discrete Asian sublineages. The basal-BR group seemed to be split into at least three sublineages. Genetic differences. The genomic recombination type patterns (Fig.
2) were studied further to see whether they provided more information about the East Asian TuMV populations. Table 1 and Fig. 2 , respectively. The recombination type pattern L in China is excluded because its original host and host type are not known. Recombination type patterns K and M may be 'tentative patterns' or 'false positives' because one of parents had recombination sites at almost identical site. ‡Basal-BR isolates were identified as nonrecombinants in this study. A maximum-likelihood (ML) tree calculated from the concat sequences of 139 isolates of Turnip mosaic virus that did not include the interlineage recombinants identified in this study and those reported by Tan et al. (2004) and Tomimura et al. (2004) . Numbers at each node indicate the percentage of supporting puzzling steps (or bootstrap samples) (only values > 50 are shown) in ML, maximum-parsimony and neighbour-joining, respectively. Horizontal branch length is drawn to scale with the bar indicating 0.1 nt replacements per site. The homologous sequences of two isolates (mild and j1) of Japanese yam mosaic virus ( JYMV) and an isolate of Scallion mosaic virus (ScMV) were used as the outgroup. The name of each isolate, its country of origin, original host plant, year of isolation and host type are shown for isolates not listed in Table 1 . For details of the genogroups, basal-B, basal-BR, Asian-BR and world-B, see Ohshima et al. (2002) . However although it was found ( Table 4 ) that most of the B host type isolates had gene sequences with recombination pattern K with 'tentative' RSs, whereas two-thirds of those of host type BR had recombination pattern I and J, the former came from China and the latter from Japan, thus recombination pattern and provenance were confounded. Chinese isolates had fewer different recombination patterns than Japanese isolates, many of which were interlineage recombinants of world-B × Asian-BR groups of BR host types. 
Furthermore, only three of the patterns were unique to China; one isolate with pattern B and L, and two others with pattern N. Thus, the recombination patterns did not yield additional phylogeographical information. Many East Asian recombinant isolates had parents from different major lineages (i.e. interlineage recombinants) with RSs in the middle of P1 genes (Fig. 2) ; therefore, we calculated trees from the concats after discarding the 5′terminal 306 nts of the P1 genes (894 nt from 5′ noncoding region in the genome of original 1J isolate, refer Fig. 2) of each isolate. These trees were calculated for 97 East Asian isolates of known collection year including the nonrecombinants and the intralineage recombinants identified in this study. Figure 4 A shows the ML tree, but all three methods, ML, MP and NJ, partitioned most sequences into the same three consistent groups; basal-BR, Asian-BR and world-B. Many Chinese and Japanese isolates, but not all, seemed to be clustered in country-specific lineages in each of the three major lineages. Some Taiwanese and Japanese isolates clustered into a single lineage in the world-B group (Fig. 4A) . Likewise, the basal-BR group consisted only of Japanese isolates but these split into three sublineages I, II and III, and although there were only a few isolates from Honshu-Hokkaido, they, and the Kyushu isolates, clustered into sublineages of the world-B and basal-BR groups. Star-like phylogenies were seen in basal-BR and Asian-BR groups. We also checked gene-by-gene to see whether there were significant differences between the Chinese and Japanese populations in their d n /d s substitution rates using the codeml and PBL methods (Table 5 ). It was found that d n was always smaller than d s and differed considerably in different genomic regions. 
This indicates that there is selection against most amino acid changes, namely 'negative selection', in most of these regions, although all had a d n /d s ratio in the range reported for most other DNA and plant RNA viruses (García-Arenal et al. 2001; Rubio et al. 2001). The largest d n /d s ratio was for the 5′-terminal region of the P1 gene, whereas the d n /d s ratios for the Chinese and Japanese populations and for the different phylogenetic lineages were almost the same in all the genes and concats analysed by the two methods. The d n /d s ratios of sequences from isolates collected in Taiwan were two to five times higher than those of other populations, although this may have resulted from the small number of isolates analysed. Temporal differences. Temporal differences in populations can be assessed by sampling them on different occasions, and also, making various assumptions, from tree comparisons. We therefore calculated trees from the concat sequences, discarding the 5′-terminal 306 nts of the P1 genes, for isolates collected during and before 1999 and during and after 2000 separately (Fig. 4B, C). Both the '1999-tree' and the '2000-tree' partitioned the sequences into three groups, basal-BR, Asian-BR and world-B, although the basal-BR group consisting of many Japan/Kyushu isolates appeared after 1999. Temporal information can also be obtained from trees, because if different lineages are evolving at similar rates then the lengths of the branches, or the diversity of different parts of the tree, give an indication of their age. This is probably legitimate for these TuMV data, as the paml and mega analyses described above revealed almost no significant selection differences between the lineages except in the Taiwanese population.
The nt diversity (average number of nt substitutions per site in each pair of sequence variants) was estimated for each genomic region (P1, 6K2, VPg, NIa-Pro and CP genes), each country population, each lineage and each collection period. These analyses showed appreciable differences between the lineages. The P1 gene was the most variable when all isolates were compared, but not, as would be expected, when different country populations/lineages of the isolates were considered separately (data not shown). The Chinese populations of both Asian-BR and world-B isolates were more diverse than the Japanese populations in most of the genes examined, suggesting that the Chinese populations may be older, although to confirm this it will be necessary to compare the nt diversities of isolates collected in comparable areas of China and Japan. However, when the pre- and post-1999/2000 populations of Chinese isolates were compared, they were found to be closely similar, whereas the later ones from Kyushu were significantly more diverse (data not shown). Interestingly, isolates of the two basal-BR sublineages that appeared in Japan during and after 2000 had less than one quarter of the diversity of the Asian-BR and world-B isolates (Table 6). The average years of collection of the pre- and post-1999/2000 populations in China and in Japan were 2 and 4 to 5 years apart, respectively. Temporal differences in genetic distance in each gene and in each geographical and genetic group were estimated by plotting, for each pair of sequences, the genetic distance against the interval between the collection times for the pair, but there was no significant correlation between them (correlation coefficient r < 0.5). We also assessed the temporal difference in the genomic recombination type patterns of the Chinese and Japanese populations in two different collection periods (before and after 1999/2000; data not shown), and again found no significant difference (P > 0.05, χ² test).
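The nt diversity defined above (the average number of nt substitutions per site over all pairs of sequences) can be illustrated with a minimal Python sketch. It assumes pre-aligned sequences of equal length and uses uncorrected pairwise proportions of differing sites; the function names and the toy alignment are hypothetical and are not taken from the study, which may have applied a substitution model when estimating distances.

```python
from itertools import combinations

VALID = set("ACGTU")  # sites with gaps or ambiguity codes are skipped (an assumption)


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in VALID and y in VALID:
            compared += 1
            if x != y:
                diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def nucleotide_diversity(seqs):
    """Mean pairwise p-distance over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment (three sequences, ten sites); not real TuMV data.
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
pi = nucleotide_diversity(toy)
print(round(pi, 4))
```

Computing the same quantity separately for each gene, lineage or collection period, as described in the text, amounts to calling `nucleotide_diversity` on the corresponding subset of aligned sequences.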
Fig. 4 A maximum-likelihood (ML) tree calculated from the concat sequences of East Asian isolates of Turnip mosaic virus. Isolates with known collection year were used. The 306 nucleotides from the 5′ end to the interlineage recombination sites of the concat sequences were discarded before the trees were calculated (see Materials and methods). (A) The ML tree calculated from all sequences of East Asian isolates, excluding the interlineage recombinants identified in this study; (B) the '1999-tree', calculated from the sequences of East Asian isolates collected before and including 1999; and (C) the '2000-tree', calculated from those collected after and including 2000. Numbers at each node indicate the percentage of supporting puzzling steps (or bootstrap samples) (only values > 50 are shown) in ML, maximum-parsimony and neighbour-joining, respectively. Horizontal branch lengths are drawn to scale with the bar indicating 0.1 nt replacements per site. The homologous sequences of two isolates (mild and j1) of Japanese yam mosaic virus (JYMV) and an isolate of Scallion mosaic virus (ScMV) were used as the outgroup. For details of the genogroups, basal-BR, Asian-BR and world-B, see Ohshima et al. (2002).
Table 5 note: [...] (Yang 1997) and the Pamilo-Bianchi-Li (PBL) method of mega version 3.1 (Kumar et al. 2004). The dn/ds ratios were estimated from the concats 2673 nts long; those from the concats 2979 nts long are shown in parentheses (see Materials and methods). An isolate from Hokkaido was not included in the analysis because it had an interlineage recombination site in its genome (interlineage recombinant).
The distribution of pairwise nt differences, or mismatch distribution (Rogers & Harpending 1992; Rogers 1995), for each TuMV subpopulation or lineage was evaluated in arlequin (Fig. 5). For populations experiencing long-term demographic stability, the stochastic process of lineage extinction via genetic drift produces a ragged multimodal distribution. In contrast, in a recently expanded and still intact population, the majority of lineage coalescence events are expected to post-date the expansion, producing a smooth unimodal Poisson distribution around the time of expansion, reflecting the star-like phylogeny of alleles due to the accumulation of low-frequency mutations since the expansion. In addition, the nt polymorphism of the TuMV populations was assessed using Tajima's D and Fu & Li's D and F statistical tests (Table 6) because we were interested in discriminating between demographic expansion and contraction. These statistics are expected to take negative values under background selection, genetic hitchhiking and demographic expansion, and negative values indicate that the population has maintained low-frequency polymorphisms. Because selection events such as genetic hitchhiking and background selection affect relatively small fractions of the genome, a multilocus trend of negative values would indicate that demographic forces are acting on the population (Tajima 1989; Hey & Harris 1999; Tsompana et al. 2005). On the other hand, positive values are expected to be produced by balancing selection and by a decrease in population size. We found that the shapes of the pairwise mismatch distributions for the world-B groups in all subpopulations were similar, and were ragged and multimodal, consistent with long-term demographic equilibrium or with an insufficient number of samples (Fig. 5A, C, D, H). On the other hand, the mismatch distribution for the basal-BR III group in Kyushu was smooth and unimodal and showed a good visual fit to the expectation of the sudden expansion model (Fig. 5G). Likewise, highly significant values of Tajima's D and Fu & Li's D and F statistical tests supported demographic expansion of the same isolates (Table 6).
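The two diagnostics discussed above, the mismatch distribution and Tajima's D, can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes a gap-free alignment, counts a site as segregating if it shows more than one nucleotide, and uses hypothetical function names and toy data; it does not reproduce arlequin or the software used to compute Table 6. The toy sample mimics a star-like genealogy (one central sequence plus singleton mutants), for which a negative D is expected.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pairwise_differences(seqs):
    """Number of differing sites for every pair of aligned sequences."""
    return [sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)]


def mismatch_distribution(seqs):
    """Counts of sequence pairs, keyed by their number of pairwise differences."""
    return Counter(pairwise_differences(seqs))


def segregating_sites(seqs):
    """Number of alignment columns showing more than one nucleotide."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def tajimas_d(seqs):
    """Tajima's D (Tajima 1989) for a gap-free alignment of n >= 3 sequences."""
    n = len(seqs)
    if n < 3:
        raise ValueError("need at least three sequences")
    s = segregating_sites(seqs)
    if s == 0:
        raise ValueError("no segregating sites")
    diffs = pairwise_differences(seqs)
    k_hat = sum(diffs) / len(diffs)  # mean number of pairwise differences
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k_hat - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical star-like sample: one central sequence and four singleton mutants.
star = ["AAAAAAAA", "CAAAAAAA", "ACAAAAAA", "AACAAAAA", "AAACAAAA"]
mismatch = mismatch_distribution(star)
d_star = tajimas_d(star)
print(dict(mismatch), round(d_star, 3))
```

In this toy sample every variant is a singleton, so the mean pairwise difference falls below the value expected from the number of segregating sites and D is negative, which is the direction of deviation the text associates with demographic expansion. Significance testing, as reported in Table 6, is not covered by this sketch.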
Whereas the mismatch distributions for the Asian-BR and basal-BR II groups in Kyushu were not unimodal, the accumulation of low-frequency mutations was characteristic of nonequilibrium population dynamics (Fig. 5E, F), and notably, the number of pairwise differences at the peak of the first 'wave' in the distribution was similar to that of the basal-BR III group of Kyushu, suggesting recent demographic instability in the Asian-BR and basal-BR II groups of Kyushu. The haplotype diversity in all groups analysed had a value of 1.000 (Table 6).
We have determined and compared TuMV gene sequences from c. 120 representative isolates from East Asia in order to assess the spatial genetic structure of the population using different evolutionary assumptions. In particular, we collected more than 60 TuMV isolates representing all regions of Kyushu island (approximately 42 000 km²), which is located at the southwest end of the main Japanese island chain and is therefore the island nearest to continental China; this also involved resampling localities from which TuMV had been obtained before 2000. These isolates were cloned by single-lesion isolations because of the high frequency of mixed infections in the field, not only with other viruses, especially CMV, but also with other isolates of TuMV (approximately 2% in Kyushu), which had caused problems during gene sequencing in earlier studies (data not shown). Biological cloning is, of course, also mandatory when attempting to analyse recombinational events, so that one does not mistake a mixture of two distinct isolates for a recombinant (Moreno et al. 2004). There have been several studies of the genetic structure of plant virus populations, for instance of those of begomoviruses (Briddon et al. 2004), Citrus tristeza virus (Rubio et al. 2001), Rice yellow mottle virus (RYMV) (Traore et al. 2005), Tomato spotted wilt virus (TSWV) (Tsompana et al. 2005) and TuMV (Tomimura et al. 2004).
All these have been continental-scale studies. Whereas analyses of the structure of local populations are less frequently reported, they may be particularly informative (García-Arenal et al. 2000; Bateson et al. 2002; Moreno et al. 2004). For example, the increased frequency of recombinants in the CMV population of Spain was reported recently (Bonnet et al. 2005). In addition, there are few reports of phylogeographical studies of plant viruses (Tsompana et al. 2005) like the one we report here of TuMV from species of Brassicaceae in China and Japan.
We have collected samples from nearly 300 plants of R. sativus, B. rapa, B. juncea and other crucifers showing mosaic symptoms over the past decade on Kyushu island. However, although many R. sativus plants reacted positively with TuMV antisera in enzyme-linked immunosorbent assay (ELISA) tests, as it is relatively easy to detect TuMV in this species, it was more difficult to find TuMV-infected Brassica plants. In the present study, only 17 TuMV isolates were found in Brassica plants in Kyushu and other areas of Japan, and all were included in this study (Table 2). Host typing of each isolate also revealed that, in China, most B and all BR host type isolates were collected from Brassica and Raphanus plants, respectively, and hence the 'host type' of each isolate correlated well with the host species from which it was originally isolated, whereas most isolates from Japan were of BR host type regardless of whether they had been collected from Brassica or Raphanus plants. These results indicate that TuMV may be constrained differently by its host plants in the two countries, but whether this is a direct function of the host, or an indirect effect of the host on aphid vectors, is unknown.
Recombination is an important source of genetic variation for potyvirus species (Revers et al. 1996; Moury et al. 2002; Moreno et al. 2004; Tomimura et al. 2004; Chare & Holmes 2006).
Our earlier studies showed that the mapping of recombinational events as well as phylogenetic relationships was useful for tracing the migration and evolution of TuMV (Tan et al. 2004). In the present study, many RSs in the P1 gene of the TuMV genome were similar in the Chinese and Japanese isolates. Although it is not possible to exclude the possibility that the RS positions coincide with recombinational hotspots, it is also possible that genomes with the same recombination type pattern have a shared ancestor. In fact, the analysis of the concat sequences showed that, although the number of isolates in each recombination type pattern differs between the countries, the isolates of recombination type pattern I and the nonrecombinants in the Asian-BR and world-B groups were found in both country populations (Figs 2 and 3) and hence probably represent the successful 'founder' TuMVs of East Asia. However, although the recombination type patterns and nonrecombinants did not give additional insights, they were congruent with the relationships indicated by the phylogenetic analyses. Note that some of the conflicting signals attributed to recombinant events may be due to differences in the rate of evolution among genes and among lineages.
The degree of translational selection in genes can be estimated by comparing the nt diversity at nonsynonymous (dn) vs. synonymous (ds) positions. Using this measure, it has been found that there is strong negative selection (i.e. selection against change) operating on most animal and plant viruses (Domingo et al. 2001; García-Arenal et al. 2001). In TuMV, the P1 gene has the largest dn/ds ratio, whereas the NIa-Pro gene has the smallest, indicating that different genes are under different selective constraints (Tomimura et al. 2004).
In this study, we looked at the dn/ds ratios of five genes in the populations in each country and district, and found that the values for Chinese and Japanese isolates were almost identical but that for the Taiwanese population was much greater (Table 5), indicating that constraints in different local populations may not be similar. We also tried to estimate the dn/ds ratios of each lineage in each country using intralineage recombinant sequences (data not shown), but the numbers of isolates available for these analyses were very small (see Fig. 4B, C), and no worthwhile estimate could be obtained.
Star phylogenies, as found in epidemic populations of Simian and Human immunodeficiency viruses and of CMV, have been considered to indicate a recent emergence with minimal selection (Myers et al. 1993; Roossinck et al. 1999). Such star phylogenies were seen in the Asian-BR and basal-BR groups of the '1999-tree' and '2000-tree' (Fig. 4B, C). Furthermore, phylogenetic analyses of isolates collected pre- and post-1999/2000 showed that new lineages of closely related basal-BR II and III isolates appeared around Kyushu after 1999, and the sudden expansion of the isolates in the basal-BR III lineage was strongly supported by the deviations from the neutral equilibrium model for the geographical lineages with an overall lack of nt diversity (Table 6), and by the analysis of the mismatch distribution within each geographical group (Fig. 5). Mismatch distribution analysis has also provided evidence of sudden expansion of the severe acute respiratory syndrome (SARS) coronavirus (Yeh et al. 2004). A combination of high haplotype diversity and low genetic diversity, assessed by mitochondrial DNA (mtDNA) markers, is taken as evidence of a recent population expansion after a genetic bottleneck (Grant & Bowen 1998), and this was found for a plant virus, TSWV, using nt sequences (Tsompana et al. 2005).
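To make the notion of haplotype diversity concrete, the following minimal Python sketch computes it with the standard unbiased estimator h = n/(n - 1) × (1 - Σ p_i²), where p_i is the frequency of the i-th distinct sequence in a sample of n. The function name and the toy samples are hypothetical, and the choice of estimator is an assumption, since the text does not state which formula was used. The sketch shows why a sample in which every sequence is distinct yields a value of 1.000 whatever its nucleotide diversity.

```python
from collections import Counter


def haplotype_diversity(seqs):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(seqs)  # each distinct sequence is treated as one haplotype
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical samples, not real TuMV data.
all_distinct = ["AAAA", "AAAC", "AACA", "ACAA"]   # closely related but all different
with_repeats = ["AAAA", "AAAA", "AAAC", "AACA"]   # one haplotype sampled twice

h_distinct = haplotype_diversity(all_distinct)
h_repeats = haplotype_diversity(with_repeats)
print(round(h_distinct, 3), round(h_repeats, 3))
```

Because long RNA virus sequences rarely match exactly, each isolate tends to count as its own haplotype, so the estimate saturates at 1 even when the sequences differ at only a few sites.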
This combination was especially evident in the basal-BR III group of the TuMV population, and the same conclusion may also apply to this population, although RNA viruses may evolve faster than mtDNA, and nt mismatches may produce unusually large haplotype diversities. However, it is likely that the results obtained from the basal-BR III population indicate a sudden expansion after a bottleneck, namely a 'founder effect', confirming other conclusions of our study.
In the TuMV population, some closely related subpopulations of the world-B and Asian-BR groups were found in both China and Japan, and this was confirmed by the shared recombination type patterns (Fig. 2) and the phylogenetic relationships of the isolates (Fig. 4). On the other hand, no nonrecombinant basal-BR isolates, like those found in southern Japan, were found in China. So where did the basal-BR lineage in Japan come from? It may have originated in Japan, or have come from the TuMV population in other parts of the world; our existing phylogenetic evidence does not distinguish between these two possibilities. TuMV is transmitted by aphids in a nonpersistent manner, but there is no record of seed transmission (Provvidenti 1980), although several potyviruses are seed-transmitted. However, the occurrence of TuMV in broad bean (Vicia faba) and in saffron (Crocus sativus) in China (Hu et al. 1996; Chen & Chen 2000), in Allium ampeloprasum in Israel (Gera et al. 1997) and in Ranunculus (Ranunculus asiaticus) in Italy (Tomimura et al. 2004) indicates that the host reactions of TuMV, including its ability to be seed-borne, may need to be re-examined. The other alternative would be metapopulation events involving colonization and extinction, as has been reported for several plant viruses including CMV, tobamoviruses and TSWV (Fraile et al. 1997; García-Arenal et al. 2000, 2001; Tsompana et al. 2005).
In conclusion, our biological and molecular studies show that the Chinese and Japanese TuMV isolates analysed here are part of the same population but are discrete lineages, each with little diversity and with close evolutionary relationships, suggesting that recent founder effects have shaped their genetic structure, although this may have been modified subsequently by clinal genetic drift. At present, the only effective control of TuMV is through the use of host plant genetic resistance, either conventional or transgenic, or by cross-protection with attenuated strains. Hence, it is very important for us to understand the phylogeography of the TuMV population in each country or region. This analysis provides the first demonstration, to our knowledge, of population structuring and species-wide population expansions in a single-stranded plant RNA virus, utilizing a population genetic approach.
Amino acid substitutions in the coat protein result in loss of insect transmissibility of a plant virus On the evolution and molecular epidemiology of the potyvirus Papaya ringspot Comparative phylogeography: concepts and application Role of recombination in the evolution of natural populations of Cucumber mosaic virus, a tripartite RNA plant virus Diversity of DNA 1: a satellite-like molecule associated with monopartite begomovirus-DNA β complexes Subviral agents associated with plant single-stranded DNA viruses The molecular epidemiology of begomoviruses Comparative population structure of Cynopterus fruit bats in peninsular Malaysia and southern Thailand A phylogenetic survey of recombination frequency in plant RNA viruses Occurrence and control of mosaic disease [turnip mosaic virus] in saffron (Crocus sativus) Molecular characterization of carla and potyviruses from Narcissus in China An improved method for purification of turnip mosaic virus Distinct functions of capsid protein in assembly and movement of tobacco etch potyvirus in plants Quasispecies and RNA Virus Evolution:
Principles and Consequences arlequin (version 3.0): an integrated software package for population genetics data analysis Virus Taxonomy. Eighth report of the International Committee on Taxonomy of Viruses Confidence limits on phylogenies: an approach using the bootstrap Genetic exchange by recombination or reassortment is infrequent in natural populations of a tripartite RNA plant virus Statistical tests of neutrality of mutations Complete nucleotide sequence of the genomic RNA of a Japanese yam mosaic virus, a new potyvirus in Japan Complete nucleotide sequence of the genomic RNA of a mild strain of Japanese yam mosaic potyvirus in Japan Molecular epidemiology of Cucumber mosaic virus and its satellite RNA Variability and genetic structure of plant virus populations The natural occurrence of turnip mosaic potyvirus in Allium ampeloprasum Sister-scanning: a Monte Carlo procedure for assessing signals in recombinant sequences Shallow population histories in deep evolutionary lineages of marine fishes: insights from sardines and anchovies and lessons for conservation bioedit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT Quantitative studies on the transmission of cabbage black ringspot virus by Myzus persicae (Sulz.) 
Dating of the human-ape splitting by a molecular clock of mitochondrial DNA Population bottlenecks and patterns of human polymorphism A viral disease of broad bean caused by a non-aphid-transmissible strain of turnip mosaic virus Multiple sequence alignment with clustal x Mutations in Turnip mosaic virus P3 and cylindrical inclusion protein are separately required to overcome two Brassica napus resistance genes A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences Mutational analysis of the tobacco vein mottling virus genome mega3: integrated software for molecular evolutionary genetics analysis and sequence alignment Unbiased estimation of the rates of synonymous and nonsynonymous substitution RDP: detection of recombination amongst aligned sequences rdp2: recombination detection and analysis from sequence alignment Analyzing the mosaic structure of genes Variability and genetic structure of the population of watermelon mosaic virus infecting melon in Spain Evidence for diversifying selection in Potato virus Y and in the coat protein of other potyviruses Tospoviruses (Bunyaviridae) Phylogenetic moments in the AIDS epidemic The complete nucleotide sequence of turnip mosaic virus RNA Japanese strain Molecular evolution of Turnip mosaic virus; evidence of host adaptation, genetic recombination and geographical spread treeview: an application to display phylogenetic trees on personal computer Evolution of the Zfx and Zfy genes: rates and interdependence between the genes Expression and purification of a recombinant tobacco etch virus NIa proteinase: biochemical analyses of the full-length and a naturally occurring truncated proteinase form Evaluation of methods for detecting recombination from DNA sequences: Computer simulations Evaluation of Chinese cabbage cultivars from Japan and the People's Republic of China for resistance to turnip mosaic virus and cauliflower mosaic virus Turnip mosaic 
Potyvirus The tobacco etch potyvirus 6-kilodalton protein is membrane associated and involved in viral replication Frequent occurrence of recombinant potyvirus isolates Highlights and prospects of potyvirus molecular biology Genetic evidence for a Pleistocene population explosion Population growth makes waves in the distribution of pairwise genetic differences Rearrangements in the 5′ nontranslated region and phylogenetic analyses of Cucumber mosaic virus RNA 3 indicate radial evolution of three subgroups dnasp, DNA polymorphism analyses by the coalescent and other methods Genetic variation of Citrus tristeza virus isolates from California and Spain: evidence for mixed infections and recombination The neighbor-joining method: a new method for reconstructing phylogenetic trees Identification of breakpoints in intergenotypic recombinants of HIV type 1 by Bootscanning geneconv: a computer package for the statistical detection of gene conversion. Distributed by the Author. Department of Mathematics VPg of tobacco etch potyvirus is a host genotype-specific determinant for longdistance movement The Potyviridae (eds Shukla DD, Ward CW, Brunt AA) Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies Bayesian probabilities and quartet puzzling An important determinant of the ability of Turnip mosaic virus to infect Brassica spp. and/or Raphanus sativus is in its P3 protein Phylogenetic Analysis Using Parsimony (*and other Methods), Version 4. 
Sinauer Associates Statistical method for testing the neutral mutation hypothesis by DNA polymorphism Inter- and intralineage recombinants are common in natural populations of Turnip mosaic virus The phylogeny of Turnip mosaic virus; comparisons of thirty-eight genomic sequences reveal a Eurasian origin and a recent 'emergence' in east Asia Comparisons of the genetic structure of populations of Turnip mosaic virus Epidemiology and control of virus diseases of vegetables Processes of diversification and dispersion of Rice yellow mottle virus inferred from large-scale and high-resolution phylogeographical studies The molecular population genetics of the Tomato spotted wilt virus (TSWV) genome Potyvirus proteins: a wealth of functions Mutational analysis of the tobacco etch potyviral 35-kDa proteinase: identification of essential residues and requirements for autoproteolysis Turnip mosaic virus and the quest for durable resistance Phylogenetic profiles: a graphical method for detecting genetic recombinations in homologous sequences paml: a program package for phylogenetic analysis by maximum likelihood Characterization of severe acute respiratory syndrome coronavirus genomes in Taiwan: Molecular epidemiology and genome evolution
This work is part of Yasuhiro Tomitaka's PhD study on phylogeography and genetic structure of Turnip mosaic virus in Asia and Eurasia. Kazusato Ohshima is a professor in the Faculty of Agriculture at Saga University, Japan. His research interests include the evolution and epidemiology/ecology of Turnip mosaic virus and potyviruses.
We thank Akemi Sato and Hisao Shinohara (Saga University, Japan) for their careful technical assistance, and Adrian Gibbs for very kindly reading the manuscript. This work was supported by Grant-in-Aid for Scientific Research no. 17580040 from the Japan Society for the Promotion of Science.