key: cord-0010228-8ulaw69k authors: Merda, Déborah; Briand, Martial; Bosis, Eran; Rousseau, Céline; Portier, Perrine; Barret, Matthieu; Jacques, Marie‐Agnès; Fischer‐Le Saux, Marion title: Ancestral acquisitions, gene flow and multiple evolutionary trajectories of the type three secretion system and effectors in Xanthomonas plant pathogens date: 2017-09-29 journal: Mol Ecol DOI: 10.1111/mec.14343 sha: ac274c4280249196a1f3f31bb8df28017465b1ca doc_id: 10228 cord_uid: 8ulaw69k
Deciphering the evolutionary history and transmission patterns of virulence determinants is necessary to understand the emergence of novel pathogens. The main virulence determinant of most pathogenic proteobacteria is the type three secretion system (T3SS). The Xanthomonas genus includes bacteria responsible for numerous epidemics in agroecosystems worldwide and represents a major threat to plant health. The main virulence factor of Xanthomonas is the Hrp2 family T3SS; however, this system is not conserved in all strains and it has not been previously determined whether the distribution of T3SS in this bacterial genus has resulted from losses or independent acquisitions. Based on comparative genomics of 82 genome sequences representing the diversity of the genus, we have inferred three ancestral acquisitions of the Hrp2 cluster during Xanthomonas evolution followed by subsequent losses in some commensal strains and re‐acquisition in some species. While mutation was the main force driving polymorphism at the gene level, interspecies homologous recombination of large fragments extending across several genes shaped Hrp2 cluster polymorphism. Horizontal gene transfer of the entire Hrp2 cluster also occurred. A reduced core effectome composed of xopF1, xopM, avrBs2 and xopR was identified that may allow commensal strains to overcome plant basal immunity. In contrast, stepwise accumulation of numerous type 3 effector genes was shown in successful pathogens responsible for epidemics. 
Our data suggest that the capacity to intimately interact with plants through T3SS would be an ancestral trait of xanthomonads. Since its acquisition, T3SS has experienced a highly dynamic evolutionary history characterized by intense gene flux between species that may reflect its role in host adaptation.
The T3SS is a complex protein structure anchored in the bacterial membrane (Diepold & Armitage, 2015).
In pathogenic interactions, the T3SS may enable the development of disease (via neutralization of plant defences and manipulation of host cellular processes) or trigger host resistance (through recognition inducing a hypersensitive response). The T3SS is also widespread in mutualistic and commensal bacteria of protist, fungal and animal hosts (Abby & Rocha, 2012). As the T3SS and T3Es play a crucial role in prokaryote-eukaryote interactions, knowledge of their origin and evolution is likely to deepen our understanding of host adaptation. The origin and evolution of the T3SS have been extensively studied, and it has been shown to share a common evolutionary history with the flagellar cluster required for swimming motility. Abby and Rocha (2012) proposed that the T3SS evolved by exaptation from the flagellar cluster. Further T3SS diversification then led to seven distinct families of T3SSs (Ysc, SPI-1, SPI-2, Chlamy, Hrp1, Hrp2 and Rhizo) (Troisfontaines & Cornelis, 2005). Diversification of the T3SS is not explained simply by vertical evolution. Indeed, it was shown that the evolution of the T3SS cluster has included numerous horizontal gene transfers (HGTs) (Troisfontaines & Cornelis, 2005). These HGT events are facilitated by the localization of the T3SS cluster on plasmids or in chromosomal pathogenicity islands. Extensive diversification of the T3SS seems to be driven by bacterial ecology, as T3SS families correlate with the host type. Indeed, Rhizo, Hrp1 and Hrp2 seem to be more frequently involved in interactions with plants, whereas Ysc, Chlamy, SPI-1 and SPI-2 appear to be specific to interactions with mammals, insects and amoebae (Abby & Rocha, 2012; Diepold & Armitage, 2015; Troisfontaines & Cornelis, 2005). In general, each bacterial genus harbours a T3SS from a specific family, but some bacteria with complex lifestyles harbour several T3SSs from different families. 
These contrasting patterns of T3SS content could be explained by ancient vertical inheritance mixed with T3SS gain and loss events (Kirzinger, Butz, & Stavrinides, 2015). The main T3SS families found in phytopathogens are Hrp1 (in Pseudomonas and Erwinia) and Hrp2 (in Xanthomonas, Ralstonia, Acidovorax and Burkholderia); however, alternative noncanonical T3SSs have also been described in some plant-associated bacteria such as commensal pseudomonads (Barret et al., 2013). These two main families of T3SS clusters, Hrp1 and Hrp2, differ in gene content, synteny and transcriptional regulation. Approximately 20 protein-coding genes, called hrp (hypersensitive reaction and pathogenicity) and hpa (hrp-associated) genes, are involved in the biogenesis of the T3SS. Among them, nine genes, which were renamed hrc (hrp conserved), are highly conserved in plant and animal pathogens, and eight have homologs in the flagellar cluster (Tampakaki et al., 2010). T3E repertoires are highly diverse within each genus and even within single bacterial species (McCann & Guttman, 2008). They vary in both content and size; for instance, a given Xanthomonas axonopodis strain may have anywhere between six and 26 T3E genes (Hajri et al., 2009). It has been suggested that this high variability could be the consequence of the host adaptation process. Indeed, in Pseudomonas and Xanthomonas, the pathogenic strains are highly host specific and the T3E repertoire composition is correlated with host range (Hajri et al., 2009; Sarkar, Gordon, Martin, & Guttman, 2006). The plasticity of the T3E repertoire within a species could be explained by frequent HGTs (McCann & Guttman, 2008), as many T3E genes have been found associated with mobile genetic elements. Understanding the diversity and evolution of T3E repertoires in pathogenic bacteria is essential to gain insight into host adaptation mechanisms. 
However, identification of T3E genes in whole-genome sequences remains a challenge, as T3Es are structurally and functionally highly diverse, with more than 50 families identified so far in Xanthomonas and Pseudomonas (Lindeberg, Cunnac, & Collmer, 2012; Ryan et al., 2011). Recently, machine-learning approaches have been developed. They rely on multiple criteria such as the presence of a secretion signal, found in the N-terminal region of T3Es, that is necessary for recognition by the T3SS machinery (McDermott et al., 2011) or a specific amino acid composition (Lower & Schneider, 2009). In Xanthomonas, these approaches have enabled the identification of seven novel T3Es in the reference strain 85-10 (Teper et al., 2016), showing great promise for future discoveries given the exponential growth of genomic data. Xanthomonas species are major plant pathogens, devastating crops worldwide. The major pathogenicity determinants of xanthomonads are the Hrp2-type T3SS and its effectors (White, Potnis, Jones, & Koebnik, 2009). Xanthomonas albilineans is an exception in the genus as it has no Hrp2 family gene cluster, but a SPI-1 T3SS (Marguerettaz et al., 2011). Four Xanthomonas species lacking any Hrp-T3SSs and associated T3Es, namely Xanthomonas sacchari (Studholme et al., 2011), "Xanthomonas cannabis" (Jacobs, Pesce, Lefeuvre, & Koebnik, 2015), "Xanthomonas pseudalbilineans" (Pieretti et al., 2015) and Xanthomonas maliensis (Triplett et al., 2015), were recently described. Moreover, some Xanthomonas arboricola strains were also found without any T3SS (Cesbron et al., 2015; Merda et al., 2016). The X. arboricola strains lacking any T3SSs are considered commensal, as no pathogenicity on their respective hosts has been observed. The X. arboricola species has an epidemic population structure, where epidemic clones are represented by successful pathovars (defined as pathovars responsible for epidemics worldwide). They infect stone and nut fruit trees. 
The recombinant network is represented by commensal strains and unsuccessful pathovars (defined by a limited geographical and potentially temporal expansion) (Merda et al., 2016). Epidemic clone emergence seems to be correlated with the acquisition of T3Es, whereas in the recombinant network, strains would have lost T3E-coding genes and the T3SS cluster. The recent discovery of several Xanthomonas species and strains that lack the Hrp2 cluster has raised questions about the evolution of virulence and the origin of the T3SS in this genus. Are pathogenicity and Hrp2 clusters ancestral features of Xanthomonas that have been vertically inherited and lost in some species, or do they represent more recent acquisitions? In contrast to our deep knowledge of the ancient evolutionary history of the T3SS, little is known about its recent origin and evolution and its role in plant pathogen emergence. Given the pivotal role of the T3SS and T3Es in xanthomonad pathogenicity and host specificity, and given their heterogeneous distribution at genus and species scales, the Hrp2 cluster and its effectors in the Xanthomonas genus appear to be a good model to study T3SS origin and evolution at a fine evolutionary scale. In this study, we conducted our analyses on a collection of strains representing all valid species from the two phylogenetic groups of the genus (groups 1 and 2) as defined by Young, Park, Shearman, and Fargier (2008). We inferred their phylogenetic relatedness based on the core genome of the whole genus. Moreover, to gain insight into T3SS evolution, we studied not only cluster synteny, hrc gene phylogeny and homologous recombination, but also the genomic environment of the T3SS cluster. Finally, to unveil the evolution of T3E repertoires in relation to pathogen emergence, we determined the T3E repertoires in a collection of 44 X. 
arboricola genomes representing both commensal and pathogenic strains using a machine-learning approach designed to detect T3E-coding genes in Xanthomonas genome sequences.
We used a collection of 82 genome sequences (see Data S1) representing the diversity of the Xanthomonas genus (36 strains belonging to Xanthomonas spp.) and the known diversity of Xanthomonas arboricola (44 additional strains; 23 strains being commensal and 21 pathogens) (Merda et al., 2016). Genomes were sequenced using Illumina technology on HiSeq 2500 (Genoscreen, Lille, France) or MiSeq instruments. Libraries of genomic DNA were prepared using the Nextera XT kit (Illumina, USA). Paired-end reads of 2 × 100 bp were assembled into contigs using SOAPDENOVO 1.05 (Li et al., 2010) and VELVET 1.2.02 (Zerbino & Birney, 2008). Annotation was performed using EuGene-PP (Sallet, Gouzy, & Schiex, 2014). The T3SS-coding genes representing all T3SS families (Ysc, SPI-1, SPI-2, Chlamy, Hrp1, Hrp2, Rhizo) and their diversity were identified in genome sequences using BLASTp searches with the query sequences presented in Data S2. We included in our search […]
The genomic environments flanking and encompassing the T3SS cluster were analysed using the R package GENOPLOTR (Guy, Roat Kultima, & Andersson, 2010). BLASTn between contigs encompassing the T3SS cluster was performed, and only BLAST hits with E-values below 0.01 were used to highlight conserved regions on the plots. First, this analysis was performed using only strains having a T3SS cluster, to detect conserved flanking regions upstream and downstream of the cluster between phylogenetic neighbours. For strains lacking T3SS, a BLASTn search was used to determine whether regions flanking the T3SS in T3SS-positive strains were also present in T3SS-negative strains; 5-kb T3SS-flanking regions identified in the closest phylogenetic neighbours of each T3SS-negative strain were used as query sequences to identify the contig to use in further analyses. 
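The E-value filter applied to the BLASTn hits above can be illustrated with a minimal sketch. It assumes the hits were exported in the standard 12-column BLAST tabular format (`-outfmt 6`), which the text does not specify; the contig names, the `BlastHit` type and both function names are hypothetical and are not part of the authors' pipeline.

```python
from typing import Iterable, List, NamedTuple


class BlastHit(NamedTuple):
    """One row of BLAST tabular output (only the columns used here)."""
    query: str
    subject: str
    identity: float
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float


def parse_blast_tabular(lines: Iterable[str]) -> List[BlastHit]:
    """Parse 12-column tabular BLAST lines, skipping blanks and comments."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        # Columns: qseqid sseqid pident length mismatch gapopen
        #          qstart qend sstart send evalue bitscore
        hits.append(BlastHit(f[0], f[1], float(f[2]), int(f[6]), int(f[7]),
                             int(f[8]), int(f[9]), float(f[10])))
    return hits


def filter_hits(hits: Iterable[BlastHit], max_evalue: float = 0.01) -> List[BlastHit]:
    """Keep hits whose E-value is strictly below the threshold (0.01 in the text)."""
    return [h for h in hits if h.evalue < max_evalue]


# Hypothetical example rows: one strong hit and one weak hit.
sample = [
    "ctgA\tctgB\t98.5\t5000\t10\t2\t1\t5000\t200\t5199\t1e-150\t9000",
    "ctgA\tctgC\t80.0\t30\t6\t0\t10\t39\t77\t106\t0.5\t25.1",
]
kept = filter_hits(parse_blast_tabular(sample))
print([h.subject for h in kept])
```

Only the retained hits would then be passed to the plotting step to highlight conserved regions between contigs.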
To study the synteny in the genomic environments of the T3SS insertion site, contigs from strains both with and without the T3SS cluster were included in the final analysis. Similar genomic environments of the T3SS cluster were defined based on synteny and shared conserved regions spreading over at least four CDS in a 20-kb window on the left and right sides of the T3SS cluster, using X. arboricola CFBP 7179 as a reference (Fig. S1). Similar genomic environments of the T3SS cluster were highlighted with the same colour as shown in Fig. S1. The same approach, but using a 200-kb window upstream and downstream of the cluster, was used to define the genomic context of the T3SS insertion site. For the genomic environments of T3E genes, these T3E genes were located by their locus tag obtained during the search with the machine-learning approach. The same strategy as described above was used to study the genomic environment of the avrBs2 insertion site. To locate the T3E genes in the genomes of X. arboricola pathogenic strains of group A, the contigs of genome sequences were ordered using MAUVE (Darling, Mau, Blattner, & Perna, 2004). The sequence of CFBP 2528 was used as reference because, among group A strains of X. arboricola, the number of contigs was the lowest for this strain (8 contigs). For each genome, the contigs were concatenated using Geneious (Kearse et al., 2012) according to the order obtained with MAUVE. The circular representations were obtained using DNAPlotter (Carver, Thomson, Bleasby, Berriman, & Parkhill, 2009). The localization of each T3E gene in pathogenic strains was identified using its locus tag.
The core proteome of Xanthomonas was identified with orthoMCL companion (Carrere, Cottret, Rancurel, & Briand, 2015). The core proteome of X. arboricola was identified with ORTHOMCL version 2.0.9 analyses on predicted full-length proteins (Li, Stoeckert, & Roos, 2003). 
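As an illustration of what identifying a core proteome from orthologous groups involves, the sketch below keeps the groups that have at least one member in every strain. This is an assumed definition for illustration only: the text does not state whether paralogs were tolerated or how OrthoMCL output was post-processed, and the group identifiers and function name are hypothetical.

```python
from typing import Dict, List, Set


def core_groups(groups: Dict[str, Set[str]], strains: Set[str]) -> List[str]:
    """Return IDs of orthologous groups represented in every strain.

    `groups` maps a group ID to the set of strains contributing at least
    one protein to that group.
    """
    return sorted(gid for gid, members in groups.items() if strains <= members)


# Hypothetical toy data: three strains, three orthologous groups.
strains = {"CFBP 2528", "CFBP 7179", "CFBP 7408"}
groups = {
    "OG0001": {"CFBP 2528", "CFBP 7179", "CFBP 7408"},  # present everywhere
    "OG0002": {"CFBP 2528", "CFBP 7179"},               # missing in one strain
    "OG0003": {"CFBP 2528", "CFBP 7179", "CFBP 7408"},
}
print(core_groups(groups, strains))
```

Groups absent from even a single strain drop out of the core set, which is why the size of a core proteome shrinks as more divergent genomes are added.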
OrthoMCL clustering analyses were performed using the following parameters: p-value cut-off = 1 × 10⁻⁵; per cent match cut-off = 80; MCL inflation = 1.5; maximum weight = 316.
MERDA ET AL. | 5941
2.6 | Phylogenies of core and T3SS-coding genes
Phylogenies were inferred using maximum likelihood in the PHYML software package. The phylogeny of the Xanthomonas genus was inferred from the concatenated core proteome obtained with orthoMCL companion. For the X. arboricola phylogeny, the concatenated orthologous groups were used. Each orthologous group in X. arboricola was aligned using MACSE (Ranwez, Harispe, Delsuc, & Douzery, 2011). Only alignments with more than 75% sequence identity were kept for the phylogeny reconstruction. These alignments were concatenated using Geneious (Kearse et al., 2012). For organism maximum-likelihood phylogenies, the JTT model was used. For the phylogeny of T3SS-coding genes, each hrc gene (with the exception of hrcL, for which some CDSs were truncated) was aligned using MUSCLE (Edgar, 2004), taking into account the translation of sequences into proteins to conserve the reading frame. The 10 hrc genes were concatenated according to their order in the T3SS cluster. The phylogenetic analysis was performed with the GTR + I + gamma model, corresponding to the best model identified by jModelTest (Posada, 2008). The topology of the concatenated hrc tree was compared to the core proteome tree of the Xanthomonas genus with a Shimodaira-Hasegawa test (Shimodaira & Hasegawa, 1999) implemented in the R package PHANGORN (Schliep, 2011). In the same way, topologies of each individual hrc/hrp tree were compared to each other and to the topology of the concatenated hrc/hrp tree. The impact of recombination (r) relative to mutation (m) was analysed with the ρ/θ statistic using RDP v.3.44 (Martin et al., 2010) for each hrc/hrp gene located in the core region of the cluster (16 genes between hrcC and hrpE). 
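To make the reading of the recombination-to-mutation ratio concrete, the following minimal sketch computes the ratio from a per-gene recombination rate estimate (`rho`) and mutation rate estimate (`theta`) and reports which force dominates. The numerical values are hypothetical, not taken from the study, and the estimation of the two rates themselves (done in RDP in the text) is outside the scope of this example.

```python
from typing import Dict, Tuple


def rate_ratio(rho: float, theta: float) -> float:
    """Ratio of recombination rate to mutation rate for one gene."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    return rho / theta


def dominant_force(ratio: float) -> str:
    """Interpret the ratio: below 1, mutation contributes more than recombination."""
    if ratio > 1:
        return "recombination"
    if ratio < 1:
        return "mutation"
    return "balanced"


# Hypothetical (rho, theta) estimates for two genes; NOT values from the paper.
estimates: Dict[str, Tuple[float, float]] = {
    "geneA": (0.5, 2.0),
    "geneB": (3.0, 1.2),
}
for gene, (rho, theta) in estimates.items():
    r = rate_ratio(rho, theta)
    print(gene, round(r, 2), dominant_force(r))
```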
The origins of recombinant sequences were identified by examining the concatenated sequences of hrp and hrc genes (concatenated according to their order in the T3SS cluster) using RDP 3, GENECONV, BOOTSCAN, MAXIMUM CHI SQUARE, CHIMAERA, SISCAN and 3SEQ implemented in RDP v. 3.44 (Martin et al., 2010). We considered a recombination event to be statistically supported when it was detected by at least two methods (Merda et al., 2016). Recombination events were visualized using Circos (Krzywinski et al., 2009).
The presence of T3SS genes was investigated in 82 genome sequences of Xanthomonas strains (Data S1) through BLASTp searches of 295 proteins representing the diversity of T3SS families (Data S2). The Hrp2 cluster was detected in 61 genome sequences, SPI-1 was identified in the genome sequence of Xanthomonas albilineans, and 20 genome sequences lacked any T3SS-encoding genes, fourteen of which belonged to Xanthomonas arboricola (12 commensal strains and two strains of the pathovar populi). T3SS clusters were also missing in Xanthomonas pisi and Xanthomonas melonis genomes and, as previously shown, in some "Xanthomonas cannabis" strains, Xanthomonas maliensis and the group 1 species Xanthomonas sacchari. A robust phylogenetic tree of the Xanthomonas genus, based on the core proteome, was constructed to provide a reference point to infer evolutionary scenarios of T3SS gains and losses (Figure 1). According to this phylogenetic reconstruction, X. maliensis and Xanthomonas campestris diverged very early from other species in group 2. This isolated phylogenetic position of the X. campestris clade was unexpected, as previous multilocus sequence analyses on whole genus diversity placed X. campestris in the core of group 2 (Triplett et al., 2015; Young et al., 2008). 
However, in a genome-based phylogeny of a limited number of species, a similar phylogenetic relationship has been inferred (Naushad & Gupta, 2013; Rodriguez et al., 2012). […] and S1). Given the divergence between these strains, it is tempting to speculate that the Hrp2 cluster was acquired through a single acquisition event in a common ancestor. However, except in the case of clade A and X. bromi, the hrp cluster was found in several different genomic environments in clades B and C (Figures 1 and S1). Two scenarios could explain this situation: the first scenario involves the loss of the ancestral Hrp2 cluster and re-acquisition in a different genomic context, and the second scenario includes rearrangements in the 20-kb flanking regions of the ancestral Hrp2 cluster without affecting the genomic context of the T3SS insertion site. To decipher which scenario is the most probable, genomic contexts of the T3SS insertion site (i.e., broader genomic environments spreading over 200 kb upstream and downstream) were compared in clade B and C strains selected for the quality of their genome assembly (Figures 1 and S2). We showed that rearrangements occurred in the direct flanking regions of the T3SS cluster of these strains but that the genomic context of the T3SS cluster was similar to that of clade A strains and X. bromi. Insertions of large fragments (80 kb for X. euvesicatoria, X. alfalfae and X. fuscans and 50 kb for "X. cannabis" strain CFBP 7912) in T3SS flanking regions broke synteny in the 20-kb window, but synteny with the direct genomic environment of X. bromi and clade A was observed further away from the T3SS cluster in the clade B and C genome sequences. Thus, genomic rearrangement events and gene insertions affected the 20-kb genomic environment of the T3SS cluster but not its location in the genome; the genomic contexts of the T3SS cluster remained similar. 
These rearrangements differed between clades and were sometimes supported by the presence of insertion sequences (ISs) and tRNAs (Figs S1 and S2). Altogether, these results support the hypothesis that there was an ancestral acquisition of the T3SS cluster in the common ancestor of clades A, B and C and subsequent rearrangements in flanking regions of the Hrp2 cluster (Figure 1).
FIGURE 1 Maximum-likelihood phylogeny based on the concatenated sequences of the core proteome (993 proteins) of 80 strains representing the entire Xanthomonas genus and schematic representation of T3SS genomic environments. The T3SS cluster is represented by the letters HRP, its genomic environments (20 kb on each side) by coloured rectangles and its genomic contexts (200 kb on each side) by hatched rectangles in the right column. Different colours correspond to different genomic environments or contexts (Figs S1 and S2). The colour of the letters HRP represents the different cluster organizations; HRP written in red represents the cluster organization found in group 2 xanthomonads and HRP written in green represents the one found in group 1. A dotted line represents absence of information due to contig interruption. In Hrp-negative strains, the letters HRP are replaced by the number of CDS found in place of the T3SS cluster at the putative T3SS cluster insertion site. Genomic environments of the insertion site are represented as described above. Probable T3SS acquisition events are represented by a red arrow, loss events by a blue arrow, and genomic rearrangements by a green circled arrow. A dotted arrow represents the hypothesis of T3SS loss and re-acquisition. Bootstrap scores (100 bootstraps) higher than 85% are displayed at each node.
The second acquisition most likely occurred in the X. campestris ancestor. In this clade, the T3SS cluster was found in a different genomic context, as no synteny could be found even when flanking regions as broad as 200 kb were compared with those of other Xanthomonas species (Figures 1 and S2). The phylogenetic tree of T3SS-encoding genes (Figure 2b) showed that X. campestris T3SS genes were phylogenetically related to those of clade A strains. Thus, the common ancestor of X. campestris might have acquired the hrp cluster by HGT from a clade A strain. However, the T3SS-encoding genes underwent homologous recombination (see below) that altered the phylogenetic signal of the hrp genes. Thus, we cannot exclude the possibility that an ancestral X. campestris hrp cluster was replaced by homologous recombination with a clade A strain. Nonetheless, we favour the most parsimonious scenario of an independent acquisition in the X. campestris ancestor, as opposed to an acquisition by the common ancestor of all group 2 strains followed by the loss of this T3SS cluster in X. campestris and its re-acquisition in a different genomic context. The third acquisition of the T3SS cluster would have occurred in group 1. In this case, the T3SS cluster included the genes encoding the master regulators HrpX and HrpG. In contrast, in group 2 species, these two genes were located outside the T3SS cluster. Moreover, all T3SS-encoding genes are highly divergent from those of group 2 strains (Figures 2b and S3). While the genomic environments of the T3SS clusters among group 1 strains shared similarities, the T3SS genomic context had no similarity to those of group 2 strains (Figures 1, S1 and S2). Altogether, this suggests an independent acquisition event of the T3SS in the common ancestor of Xanthomonas translucens, Xanthomonas hyacinthi and Xanthomonas theicola. Ancestral acquisitions of T3SS clusters imply that the absence of Hrp2 clusters in the 19 strains of Xanthomonas group 2 was the result of multiple loss events. The X. 
arboricola strains lacking T3SSs were dispersed in five monophyletic lineages, indicating at least five loss events during diversification of this species (Figure 1). One loss event could have occurred in the common ancestor of the two strains belonging to "X. cannabis", and two independent loss events could have occurred in X. melonis and X. pisi. As X. maliensis was the most divergent species of group 2, it was impossible to hypothesize any event responsible for the absence of the T3SS cluster, and this species might never have had any T3SS cluster. In the 19 strains without T3SSs, the entire T3SS cluster was missing. No traces of pseudogenization, which could be identified by weak homology with T3SS-coding genes using a BLASTn approach, were detected. BLASTn searches with the flanking regions of the Hrp2 cluster in strains lacking it, together with synteny analysis, allowed us to identify the probable excision sites of the DNA fragment containing the Hrp2-encoding genes (Fig. S1). Excision would have occurred between trpG and ltaE. Nevertheless, we did not find any mobile genetic elements between these loci. A loss followed by a re-acquisition of the whole T3SS cluster must have taken place in the evolutionary history of X. fragariae and X. cassavae. Despite their phylogenetic positions in Xanthomonas group 2, the genomic context of their T3SS insertion site was different. Indeed, a lack of synteny in T3SS flanking regions was observed among X. fragariae, X. cassavae and other group 2 Xanthomonas species, even when flanking regions as large as 200 kb were considered (Figs S1 and S2). Because these species occupy two different phylogenetic positions within group 2, this suggests independent re-acquisition events (Figure 1). However, for X. fragariae, another hypothesis could be put forth. The position of this species is similar in the organism and T3SS phylogenies, indicating a probable ancestral T3SS acquisition. 
In this hypothesis, the different genomic context of T3SS could be the result of a transposition of the cluster within X. fragariae genome. Four ISs were found upstream and downstream of the T3SS cluster in X. fragariae genome that corroborated either transposition or re-acquisition via HGT as a mechanism. As for strains devoid of the T3SS cluster, we looked at the insertion site of the ancestral T3SS acquisition in clades A, B and C and no remnants of the ancestral T3SS cluster were detected at this location. Xanthomonas codiaei shared the same left border of T3SS cluster as X. cassavae, its sister species, suggesting that the loss and reacquisition of T3SS cluster might have occurred in their common ancestor, but unfortunately contig interruption in X. codiaei precluded the analysis of a large genomic environment to confirm this hypothesis. To determine if the T3SS cluster follows the same evolutionary history as the species, a phylogeny based on concatenated hrc-coding genes was compared to the organism phylogeny based on the core proteome of Hrp2-positive strains (Figure 2a and b) . Numerous incongruences were observed and confirmed by the SH test (p-value = .00092). For instance, while most X. arboricola strains exhibited a monophyly for the T3SS-coding genes, T3SS-coding genes from the X. arboricola pv. guizotiae diverged from those of other X. arboricola strains ( Figure 2b ) and they were closely related to those of "X. cannabis" strains CFBP 7912 and Nyagatare. Similar incongruences were observed for X. campestris, X. cassavae, X. codiaei, X. dyei and X. vesicatoria whose T3SS-coding genes were phylogenetically related to those of clade A despite the high divergence among these species in the organism phylogeny. These incongruences between hrc phylogeny and organism phylogeny can be explained by homologous recombination occurring during T3SS evolution. 
To determine if recombination events affected the whole cluster or only some genes, individual phylogenies were built for each hrc/hrp gene coding for the T3SS (Fig. S3). In all phylogenies, X. arboricola pv. guizotiae strains (CFBP 7408 and CFBP 7409) did not group with other X. arboricola strains, but with "X. cannabis" strains CFBP 7912 and Nyagatare, suggesting that their whole T3SS cluster was acquired through a single homologous recombination event with these phylogenetically distant strains. Individual hrc/hrp phylogenies were compared in pairs using an SH test (Table S1). For half of the comparisons, the p-values were below 0.05, indicating that many of these tree topologies were significantly different and that recombination occurred between hrc genes. For instance, T3SS-coding genes from X. campestris clustered with genes from clades A and C (Figure 2b and Fig. S3), which did not reflect the intermediate position of X. campestris between group 1 and group 2 in the organism phylogeny (Figure 2a). In contrast to clade B strains, which clustered together in most hrc phylogenies, strains from clades A and C were interspersed. This suggested numerous HGTs between these two latter clades. To characterize the gene flow affecting the T3SS cluster in Xanthomonas spp. strains, potential recombinant sequences and their likely parental sequences were detected based on phylogenetic incongruences (Martin et al., 2010). Identification of the likely origin of the recombinant fragment can be achieved if at least one sequence resembling the donor sequence is present in the data set. The identified exchanges concerned the entire sequence of one or two adjacent genes in the concatenated hrc genes. The two genes for which the number of exchanges was the highest were hrpE and hrpB2 (32 and 26 events, respectively), and the two genes for which the smallest numbers of exchange events were observed were hrcC and hrcT (3 and 0 events, respectively). 
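The bookkeeping behind such per-gene counts can be sketched as follows: events are first kept only if at least two detection methods support them (the criterion stated in the methods), then tallied by affected gene. The `Event` record, the function names and the toy events are hypothetical and only illustrate the logic; they do not reproduce RDP output or the counts reported above.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, NamedTuple, Tuple


class Event(NamedTuple):
    """A putative recombination event between a donor and a recipient strain."""
    recipient: str
    donor: str
    genes: Tuple[str, ...]   # genes covered by the exchanged fragment
    methods: FrozenSet[str]  # detection methods that flagged the event


def supported(events: Iterable[Event], min_methods: int = 2) -> List[Event]:
    """Keep events detected by at least `min_methods` distinct methods."""
    return [e for e in events if len(e.methods) >= min_methods]


def events_per_gene(events: Iterable[Event]) -> Counter:
    """Count how many retained events involve each gene."""
    counts: Counter = Counter()
    for e in events:
        counts.update(set(e.genes))
    return counts


# Hypothetical toy events (strain labels and method hits are illustrative).
events = [
    Event("strain1", "strain2", ("hrpE",), frozenset({"GENECONV", "3SEQ"})),
    Event("strain3", "strain2", ("hrpE", "hrpB2"), frozenset({"BOOTSCAN", "SISCAN", "CHIMAERA"})),
    Event("strain4", "strain1", ("hrcC",), frozenset({"GENECONV"})),  # single method: dropped
]
print(events_per_gene(supported(events)))
```

In this toy data the single-method event is discarded, so `hrcC` does not appear in the final tally.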
Xanthomonas arboricola strains were the main recipients of recombination events (Figure 3a). They mostly received genes from X. dyei and X. hortorum pv. hederae. Notably, most exchanges were detected within the recombinant network of X. arboricola, and epidemic clones gave hrp/hrc alleles to strains belonging to this network, but no reverse events were detected. In contrast, only two X. arboricola strains were donors for other species (CFBP 1022 was donor for X. cassavae and CFBP 8149 for X. hortorum, X. gardneri and X. cynarae). T3SS gene exchanges were also detected between strains of the X. axonopodis species complex, but remarkably no gene flow occurred between this clade and other clades. For each individual T3SS-coding gene, we estimated the evolutionary force responsible for the observed polymorphism using the ρ/θ ratio. For most genes (14 of 16), mutation had more impact than recombination on generating new alleles (ρ/θ < 1) (Figure 3b). Only two genes had a ρ/θ ratio above one, hrcJ (ρ/θ > 2) and hrcV (1 < ρ/θ < 2). In conclusion, within hrc/hrp genes, mutation was the major evolutionary force that has brought polymorphism and generated allelic variants at the gene scale. This polymorphism was disseminated across the genus through homologous recombination of entire genes or contiguous genes.
To decipher the diversity of T3E repertoires in X. arboricola, T3E-coding genes were predicted for all genome sequences belonging to this species by a machine-learning approach dedicated to Xanthomonas organisms. Briefly, T3E-encoding genes were searched for based on criteria referring to the type three N-terminal secretion signal, structural disorder, regulation by HrpX/HrpG, GC content, codon usage, amino acid properties and homology to known and validated T3Es. Based on this prediction, a set of seven ancestral core T3E genes was observed. The predicted T3E repertoires were highly variable, with some strains having no T3Es and others having up to 34 predicted T3Es (Figure 4). Eight of the 14 strains lacking the T3SS cluster were also devoid of T3E-coding genes. In contrast, one or two T3E genes (avrBs2 and xopR) were identified in the remaining six strains (Data S3). While synteny in the xopR genomic environments has previously been shown (Merda et al., 2016), here we observed that genomic environments of avrBs2 were also highly syntenic between all strains, with the exception of strains CFBP 7408 and CFBP 7409 (Fig. S4), favouring an ancestral acquisition and subsequent losses of these T3Es during X. arboricola evolution. The two strains of pathovar guizotiae, CFBP 7408 and CFBP 7409, probably lost and reacquired avrBs2. It should be noted that avrBs2 was systematically accompanied by three CDSs, namely xylR, a TonB-dependent receptor and a hypothetical protein, which presented the same distribution in our collection whatever the genomic context (see Fig. S4). In addition to xopR and avrBs2, five other T3E genes (already described in Xanthomonas strains) were found in all X. arboricola strains having a T3SS. These five T3E genes were located within the T3SS cluster (Figure 3b) and their distribution strictly followed the distribution of the T3SS cluster. However, among them were XopA, HpaA and HrpW, which, while listed as T3Es in some studies (Hajri et al., 2009; Merda et al., 2016), are a secreted regulator (HpaA) or harpin-like proteins, and their effector function remains unclear (Lorenz et al., 2008; White et al., 2009). Synteny in the flanking regions of avrBs2, xopR, and of the five T3E genes associated with the T3SS cluster suggested that they were most probably acquired by an ancient X. arboricola strain; thus, these seven T3E genes will be designated as the ancestral repertoire hereafter. Pathogenic strains had a higher number of predicted T3E genes than commensal strains (Figure 4). 
However, pathogenic strains CFBP 3122 and CFBP 3123 of pathovar populi lacked T3SS and T3E genes and hence represented an exception. Therefore, the pathogenicity of these bacteria, previously qualified as opportunistic pathogens (Haworth & Spiers, 1992), may rely on different virulence factors. The T3E repertoire of pathogenic strains encompassed the ancestral repertoire and a large number of additional predicted T3E genes. Indeed, the pan-T3 effectome of X. arboricola was composed of 57 predicted T3Es, and among them, only 11 were found in both pathogenic and commensal strains, with the 46 others present exclusively in pathogenic strains. Most putative T3E genes identified in X. arboricola were already described in other Xanthomonas spp. This is the case for the seven T3E genes of the ancestral repertoire and 31 other T3E genes (Data S3). Among the 19 remaining T3E genes composing the pan-effectome, seven were known in other bacteria (Ralstonia and Pseudomonas) and 12 were putative novel T3E genes. Among these putative novel T3Es, six (T3E_14 to T3E_19) had a weak similarity (less than 30% sequence similarity) to T3E genes known in Xanthomonas (xopAH, xopJ1, xopAO, xopAV, xopG and xopM, respectively) and thus are not considered orthologues of these genes, although they could share a common ancestor. The three successful pathovars (pvs. pruni, corylina and juglandis) shared ten T3E genes that were sequentially acquired in their common ancestor. These ten T3E genes were xopX, xopV, xopL, xopK, xopN, xopAV, xopQ, avrXccA2, xopZ and T3E_16, which shared 27.2% sequence similarity with xopJ1. To determine whether their ancestral acquisition resulted from one or several events, we analysed their genomic context. Contig alignment using CFBP 2528 as reference revealed that these T3E genes were dispersed along the chromosome, except xopAV and xopQ, which were colocalized (Figs S5 and S6). The dispersal of these T3E genes along the genome sequences suggested that they were acquired through several acquisition events. Given the synteny observed in the flanking regions of each of these T3E genes in the genomes of the successful pathovars (Fig. S5), it is likely that these independent acquisition events occurred in their ancestor before separation into three distinct pathovars.

FIGURE 3 Gene flow affecting the T3SS cluster. (a) Representation of recombination events affecting 16 hrc/hrp genes of the T3SS cluster in the Xanthomonas genus. The donor and recipient strains were identified with RDP software, and the representation was obtained using Circos. Each strain is represented by a rectangle of a different colour. The recombination events are represented by a link between donor and recipient strains. The colour of the link corresponds to the colour of the donor and indicates the direction of the recombination event. The width of the links represents the number of genes involved in the recombination events. For strains belonging to Xanthomonas arboricola, a colour code was used to represent the three groups previously defined (Merda et al., 2016). Group A strains are indicated in red, group B strains in green, and group C strains in blue. (b) Representation of ratios of recombination rate vs. mutation rate (ρ/θ) along the T3SS cluster using 61 genomes representing the genus diversity. Ratios were calculated for the 16 hrc/hrp genes of the core region of the T3SS cluster using RDP software. Arrows are shaded according to the shading scale, which indicates the range of the ρ/θ value. The black arrows represent hpa and T3E genes for which ρ/θ was not calculated. Genetic organization of the cluster is based on the sequence of CFBP 2528. The five core T3E genes located in the T3SS cluster of X. arboricola strains are represented in red.

We inferred three ancestral acquisitions of the Hrp2 gene cluster during Xanthomonas evolution, two in group 2 strains and one in group 1 strains.
Indeed, we highlighted an ancestral acquisition in the common ancestor of all group 2 species excluding Xanthomonas campestris. This species, which diverged early in group 2, has a T3SS cluster at a different chromosomal location, supporting an independent acquisition event. The third ancestral T3SS acquisition occurred in group 1. A different genetic organization of the T3SS cluster, a high divergence of T3SS-coding genes from those of the group 2 species, and the different genomic contexts of the T3SS clusters all indicate that group 1 strains probably acquired a different Hrp2 cluster independently, as previously proposed by Jacobs et al. (2015). Before this study, Xanthomonas translucens was the only group 1 species known to harbour a Hrp2 cluster (Wichmann et al., 2013). Our results indicate that this atypical Hrp2 cluster is shared with Xanthomonas theicola and Xanthomonas hyacinthi and that it was probably acquired by their common ancestor. T3SS cluster acquisitions occurred before speciation of most xanthomonads; the capacity to interact with plants through translocation of T3Es would thus be an ancient trait of xanthomonads, as is the case for other important plant bacterial pathogens (Diallo et al., 2012; Kirzinger et al., 2015). After ancestral acquisition, the T3SS was lost in some strains and species scattered throughout the Xanthomonas phylogenetic tree. The absence of pseudogenes or remnants of T3SS-encoding genes might be surprising, but a similar observation was made in nonpathogenic Pseudomonas syringae strains, from which the entire cluster has been excised (Mohr et al., 2008). The loss hypothesis in commensal strains of Xanthomonas arboricola was previously proposed (Merda et al., 2016) based on Bayesian inference of gene gains and losses. Such complex scenarios with ancestral acquisition, losses and regains have also been proposed in the Pantoea genus (Kirzinger et al., 2015) and P. syringae (Clarke, Cai, Studholme, Guttman, & Vinatzer, 2010).
Losses of T3SS could be explained by a loss of function (Abby & Rocha, 2012). Indeed, it could be beneficial to lose this energetically costly machinery if it does not enhance bacterial fitness (Gophna, Ron, & Graur, 2003). Thus, for commensal strains colonizing various plant hosts and with a limited set of T3Es (like X. arboricola group B strains) (Merda et al., 2016), the fitness cost imposed by the T3SS might be high, and consequently it could be lost. T3SS-negative strains may also act as profiteers and benefit from the presence of T3SS-positive strains colonizing the same niche, as demonstrated in murine infections by Pseudomonas aeruginosa (Czechowska, McKeithen-Mead, Al Moussawi, & Kazmierczak, 2014). We showed that, once acquired, T3SS-coding genes were prone to homologous recombination events leading to replacement of large fragments encompassing one complete gene, adjacent genes or even the entire cluster. The two genes with the highest number of recombination events were hrpE and hrpB2, which encode the Hrp pilin and the putative inner rod, respectively (Hartmann et al., 2012). These two proteins correspond to the early substrates of the secretion machinery. Weber and Koebnik (2006) observed positive diversifying selection in the hrpE sequence corresponding to the surface-exposed part of the protein and interpreted it as an adaptive mechanism of the pathogen to escape recognition by the host. Homologous replacement of the hrpE gene by recombination could be an alternative mechanism to generate diversity and to escape host recognition. This latter mechanism has been extensively described in the mammalian pathogen Neisseria gonorrhoeae, where it drives antigenic variation of the type IV pilus and avoidance of the host immune system (Obergfell & Seifert, 2015).
The two genes that showed the fewest exchanges, hrcC and hrcT, encode highly conserved proteins located in the basal structure of the secretion system and embedded in the bacterial envelope. Within each gene, allelic polymorphism is mostly generated by mutation, except for hrcJ and hrcV (Figure 3b). (Sarris et al., 2013). Understanding gene flow within and between populations sheds light on bacterial ecology. The study of "donor" and "recipient" strains of recombinant fragments showed that X. arboricola strains, particularly commensal strains, were the main recipients, and that Xanthomonas dyei and Xanthomonas hortorum were the two main donors. The function of the T3SS is to deliver T3Es into host cells. In most strains devoid of T3SSs, no T3E genes could be detected in their genomes using the machine-learning approach and BLASTp (data not shown). Indeed, some T3E genes are housed in the T3SS cluster (Figure 3b) and thus were lost with it. xopR and avrBs2, which are not located in the T3SS cluster, were found in the genomes of some commensal Xanthomonas strains lacking T3SS. Their conserved genomic environments, when compared to strains with T3SSs, suggest that they are remnants of an ancestral T3E repertoire (Fig. S4) (Merda et al., 2016). A recent loss of the T3SS could explain why the T3Es were present despite the lack of the T3SS. Alternatively, xopR and avrBs2 secretion might be mediated by the flagellum apparatus, as demonstrated for some nonflagellar proteins (Journet, Hughes, & Cornelis, 2005). They might also have an additional function independent of the T3SS. We have highlighted an extremely reduced ancestral core repertoire and stepwise acquisition of numerous additional T3Es in pathogenic strains of X. arboricola. Five of the seven core T3E genes were located in the T3SS cluster, as previously observed in other Xanthomonas species (Noel, Thieme, Nennstiel, & Bonas, 2002; Potnis et al., 2011; da Silva et al., 2002; Teper et al., 2016).
Among them, XopA, HpaA and HrpW are better considered as accessory or translocation proteins that help the translocation process (Lorenz et al., 2008; Roux et al., 2015; White et al., 2009). Taking this into account, the X. arboricola core effectome comprises only four T3E genes (xopF1, xopM, avrBs2 and xopR) and is comparable in size to that of X. campestris (Roux et al., 2015). Together, these results challenge the list of ten core T3E genes (avrBs2, xopF1, xopK, xopL, xopN, xopP, xopQ, xopR, xopX, xopZ) previously proposed (White et al., 2009). xopM, missing from this list, was recently shown to be a T3E gene of Xanthomonas euvesicatoria strain 85-10 (Schulze et al., 2012; Teper et al., 2016). Our BLASTp searches showed that it is present in most group 2 Xanthomonas species (data not shown). Considering that xopR and avrBs2 were missing in only one and two strains, respectively, of 13 Xanthomonas campestris strains (Roux et al., 2015), we propose a list of four putative core Xanthomonas T3Es: AvrBs2, XopF1, XopM and XopR. Interestingly, AvrBs2 contributes to bacterial fitness in field conditions, including epiphytic survival (Wichmann & Bergelson, 2004). It is required for full aggressiveness in both dicots and monocots and was shown to inhibit pathogen-associated molecular pattern-triggered immune (PTI) responses in rice (Li et al., 2015; Zhao, Dahlbeck, Krasileva, Fong, & Staskawicz, 2011). Similarly, XopM inhibits immunity-associated cell death mediated by MAP kinase cascades (Teper, Sunitha, Martin, & Sessa, 2015) and XopR inhibits plant basal defences (Akimoto-Tomiyama et al., 2012). Besides the reduced ancestral core T3E repertoire, stepwise accumulation of additional T3Es has occurred in pathogenic strains and particularly in successful pathovars of X. arboricola. This accumulation appears to be a long-term evolutionary process, as many T3Es were acquired before the radiation of the three successful pathovars.
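The reasoning that reduces the seven-gene ancestral repertoire to a four-gene core effectome, and its comparison with the earlier ten-gene list, amounts to a few set operations. The sketch below restates it in Python using only gene names given in the text. The variable names are illustrative, and writing XopA, HpaA and HrpW in lower-case gene notation (xopA, hpaA, hrpW) is a notational assumption.

```python
# Seven T3E genes of the ancestral repertoire of X. arboricola:
# avrBs2 and xopR plus the five genes located in the T3SS cluster.
ancestral_repertoire = {"avrBs2", "xopR", "xopF1", "xopM", "xopA", "hpaA", "hrpW"}

# Products better regarded as accessory or translocation proteins.
accessory_proteins = {"xopA", "hpaA", "hrpW"}

# Ten core T3E genes in the previously proposed list.
previously_proposed_core = {
    "avrBs2", "xopF1", "xopK", "xopL", "xopN",
    "xopP", "xopQ", "xopR", "xopX", "xopZ",
}

# Core effectome once accessory proteins are set aside.
core_effectome = ancestral_repertoire - accessory_proteins

# Overlap with, and departure from, the earlier list.
shared_with_previous = core_effectome & previously_proposed_core
absent_from_previous = core_effectome - previously_proposed_core

print(sorted(core_effectome))        # ['avrBs2', 'xopF1', 'xopM', 'xopR']
print(sorted(shared_with_previous))  # ['avrBs2', 'xopF1', 'xopR']
print(sorted(absent_from_previous))  # ['xopM']
```

Only three of the ten previously proposed genes remain in the reduced core, and xopM is the single member that the earlier list did not contain.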
At the basal steps of pathogen emergence, accumulation of numerous T3Es including XopL, XopN, XopQ, XopX and XopZ occurred. These were shown to target PTI in addition to the ancestral T3Es, which are also involved in PTI suppression. This reinforces the idea that PTI suppression is crucial for pathogenic strains to achieve successful infection (Macho & Zipfel, 2015). In conclusion, we showed three ancestral acquisitions of the Hrp2 cluster, demonstrating that an intimate interaction with plants is an ancestral trait of xanthomonads. During radiation, most species retained this ancestral T3SS, but some lost it, and it was subsequently re-acquired in some strains. Mutation is the main evolutionary force generating new hrc/hrp alleles. In group 2 Xanthomonas, inter- and intraspecies homologous recombination of large fragments extending through one or more genes shuffles this polymorphism, generating new allelic combinations in Hrp2 clusters. A set of four ancestral core T3E genes is found in commensal strains and pathogens in X. arboricola and may approximate the Xanthomonas ancestral core effectome. We propose that these genes may allow the strains to overcome basal plant immunity under specific environmental conditions, but could have a fitness cost, explaining why they were lost in some strains. In contrast, some strains experienced a different evolutionary pathway with stepwise accumulation of T3Es that probably accounts for their efficacy in overcoming plant immunity and could explain their high aggressiveness. X. arboricola represents the archetype of this evolutionary scenario, which seems to share similarities with the one proposed for P. syringae (Lindeberg et al., 2012) and culminates in a narrow host range.
We thank Geraldine Taghouti
The whole-genome sequences obtained for this project have been deposited in GenBank. Accession nos. are available in Data S1.
Most bacterial strains used in this study are available at the microbial resource centre CIRM-CFBP (Beaucouzé, INRA, France; http://www6.inra.fr/cirm_eng/CFBP-Plant-Associated-Bacteria). supervised genome sequencing in ANAN platform The non-flagellar type III secretion system evolved from the bacterial flagellum and diversified into host-cell adapted systems Effects of crop rotation and N-P fertilizer rate on grain yield and related characteristics of maize and soil fertility at Bako, Western Oromia XopR, a type III effector secreted by Xanthomonas oryzae pv. oryzae, suppresses microbe-associated molecular pattern-triggered immunity in Arabidopsis thaliana Characterization of the SPI-1 and Rsp type three secretion systems in Pseudomonas fluorescens F113 Orthomcl-companion: A user friendly tool to analyze protein families DNAPlotter: Circular and linear interactive genome visualization Comparative genomics of pathogenic and nonpathogenic strains of Xanthomonas arboricola unveil molecular and evolutionary events linked to pathoadaptation Pseudomonas syringae strains naturally lacking the classical P.
syringae hrp/hrc locus are common leaf colonizers equipped with an atypical type III secretion system Cheating by type 3 secretion system-negative Pseudomonas aeruginosa during pulmonary infection Mauve: Multiple alignment of conserved genomic sequence with rearrangements Pseudomonas syringae naturally lacking the canonical type III secretion system are ubiquitous in nonagricultural habitats, are phylogenetically diverse and can be pathogenic Type III secretion systems: The bacterial flagellum and the injectisome MUSCLE: Multiple sequence alignment with high accuracy and high throughput Bacterial type III secretion systems are ancient and evolved by multiple horizontal-transfer events genoPlotR: Comparative gene and genome visualization in R A "Repertoire for Repertoire" hypothesis: Repertoires of type three effectors are candidate determinants of host specificity Xanthomonas Characterization of HrpB2 from Xanthomonas campestris pv. vesicatoria identifies protein regions that are essential for type III secretion pilus formation Isolation of Xanthomonas campestris pv. populi from stem lesions on Salix matsudana X alba 'Aokautere' in New Zealand Comparative genomics of a cannabis pathogen reveals insight into the evolution of pathogenicity in Xanthomonas Type III secretion: A secretory pathway serving both motility and virulence (Review) Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data Inheritance of Pantoea type III secretion systems through both vertical and horizontal transfer Circos: An information aesthetic for comparative genomics OrthoMCL: Identification of ortholog groups for eukaryotic genomes The type III effector AvrBs2 in Xanthomonas oryzae pv. 
oryzicola suppresses rice immunity and promotes disease development De novo assembly of human genomes with massively parallel short read sequencing Pseudomonas syringae type III effector repertoires: Last words in endless arguments HpaA from Xanthomonas is a regulator of type III secretion Prediction of type III secretion signals in genomes of gram-negative bacteria Targeting of plant pattern recognition receptor-triggered immunity by bacterial type-III secretion system effectors Genomic and evolutionary features of the SPI-1 type III secretion system that is present in Xanthomonas albilineans but is not essential for xylem colonization and symptom development of sugarcane leaf scald RDP3: A flexible and fast computer program for analyzing recombination Evolution of the type III secretion system and its effectors in plant-microbe interactions Computational prediction of type III and IV secreted effectors in gram-negative bacteria Recombination-prone bacterial strains form a reservoir from which epidemic clones emerge in agroecosystems Naturally occurring nonpathogenic isolates of the plant pathogen Pseudomonas syringae lack a type III secretion system and effector gene orthologues Phylogenomics and molecular signatures for species from the plant pathogen-containing order xanthomonadales Two novel type III-secreted proteins of Xanthomonas campestris pv. 
vesicatoria are encoded within the hrp pathogenicity island Mobile DNA in the pathogenic Neisseria Full genome sequence analysis of two isolates reveals a novel Xanthomonas Species Close to the sugarcane pathogen Xanthomonas albilineans jModelTest: Phylogenetic model averaging Comparative genomics reveals diversity among xanthomonads infecting tomato and pepper MACSE: Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons Genomes-based phylogeny of the genus Xanthomonas Genomics and transcriptomics of Xanthomonas campestris species challenge the concept of core type III effectome Pathogenomics of Xanthomonas: Understanding bacterium-plant interactions EuGene-PP: A next-generation automated annotation pipeline for prokaryotic genomes Comparative genomics of host-specific virulence in Pseudomonas syringae Comparative genomics of multiple strains of Pseudomonas cannabina pv. alisalensis, a potential model pathogen of both monocots and dicots phangorn: Phylogenetic analysis in R Analysis of new type III effectors from Xanthomonas uncovers XopB and XopS as suppressors of plant immunity Multiple comparisons of log-likelihoods with applications to phylogenetic inference Comparison of the genomes of two Xanthomonas pathogens with differing host specificities Draft genome sequences of Xanthomonas sacchari and two banana-associated xanthomonads reveal insights into the Xanthomonas group 1 clade Playing the "Harp": Evolution of our understanding of hrp/hrc genes Type three secretion system in Pseudomonas savastanoi pathovars: Does timing matter? 
Identification of novel Xanthomonas euvesicatoria type III effector proteins by a machine-learning approach Five Xanthomonas type III effectors suppress cell death induced by components of immunity-associated MAP kinase cascades Characterization of a novel clade of Xanthomonas isolated from rice leaves in Mali and proposal of Xanthomonas maliensis sp nov Type III secretion: More systems than you think Positive selection of the Hrp Pilin HrpE of the plant pathogen Xanthomonas The type III effectors of Xanthomonas Effector genes of Xanthamonas axonopodis pv. vesicatoria promote transmission and enhance other fitness traits in the field The noncanonical type III secretion system of Xanthomonas translucens pv. graminis is essential for forage grass infection A multilocus sequence analysis of the genus Xanthomonas New Zealand strains of plant pathogenic bacteria classified by multi-locus sequence analysis; proposal of Xanthomonas dyei sp Velvet: Algorithms for de novo short read assembly using de Bruijn graphs Computational and biochemical analysis of the Xanthomonas effector AvrBs2 and its role in the modulation of Xanthomonas type three effector delivery Ancestral acquisitions, gene flow and multiple evolutionary trajectories of the type three secretion system and effectors in Xanthomonas plant pathogens