key: cord-0018884-nd7xtt97
authors: Sinha, Dhiraj; Sun, Xifeng; Khare, Mudra; Drancourt, Michel; Raoult, Didier; Fournier, Pierre-Edouard
title: Pangenome analysis and virulence profiling of Streptococcus intermedius
date: 2021-07-09
journal: BMC Genomics
DOI: 10.1186/s12864-021-07829-2
sha: 2da974cd68aa88b33383751b093372dfb2da5e43
doc_id: 18884
cord_uid: nd7xtt97

BACKGROUND: Streptococcus intermedius, a member of the S. anginosus group, is a commensal bacterium present in the normal microbiota of human mucosal surfaces of the oral, gastrointestinal, and urogenital tracts. However, it has been associated with various infections such as liver and brain abscesses, bacteremia, osteo-articular infections, and endocarditis. Since 2005, high-throughput genome sequencing methods have made it possible to explore the genetic landscape and diversity of bacteria as well as their pathogenic role. Here, in order to determine whether specific virulence genes could be related to specific clinical manifestations, we compared the genomes of 27 S. intermedius strains isolated from patients with various types of infections, including 13 that were sequenced in our institute and 14 available in GenBank.

RESULTS: We estimated the theoretical pangenome size to be 4,020 genes, including 1,355 core genes, 1,054 strain-specific genes and 1,611 accessory genes shared by 2 or more strains. The pangenome analysis demonstrated that the genomic diversity of S. intermedius represents an "open" pangenome model. We identified a core virulome of 70 genes and 78 unique virulence markers. The phylogenetic clusters based upon core-genome sequences and SNPs were independent of disease types and sample sources. However, using principal component analysis based on the presence/absence of virulence genes, we identified the sda histidine kinase, the adhesion protein LAP and the capsular polysaccharide biosynthesis protein cps4E as being associated with brain abscess or broncho-pulmonary infection.
In contrast, liver and abdominal abscesses were associated with the presence of the fibronectin-binding protein fbp54 and the capsular polysaccharide biosynthesis proteins cap8D and cpsB.

CONCLUSIONS: Based on the virulence gene content of 27 S. intermedius strains causing various diseases, we identified putative disease-specific genetic profiles discriminating those causing brain abscess or broncho-pulmonary infection from those causing liver and abdominal abscess. These results provide insight into S. intermedius pathogenesis and highlight putative targets from a diagnostic perspective.

Streptococcus intermedius belongs to the S. anginosus group (SAG), which also includes S. constellatus and S. anginosus [1]. It is part of the normal flora of the oral cavity and upper respiratory tract, as well as of the gastrointestinal and female urogenital tracts [2-5]. This bacterium was first described by Guthof in 1956 after being isolated from dental abscesses [6]. S. intermedius may also cause human infections, usually monomicrobial, including purulent abscesses of the liver, lungs, psoas, spine and/or central nervous system, and infective endocarditis [7]. Over the years, the role of S. intermedius in human infections has increasingly been reported. Patients with invasive S. intermedius infections were reported to have significantly longer hospital stays and higher mortality than patients with other S. anginosus group infections, suggesting that identifying this species might be important for the management of patients [8].
Various putative virulence factors have been described for Streptococcus intermedius. These include the ability to form biofilms, which protect the bacterium from antibiotics and the host immune system [9]; the production of hydrolytic enzymes, including both glycosaminoglycan-degrading enzymes, such as hyaluronidase and chondroitin sulphate depolymerase, and glycosidases, such as α-N-acetylneuraminidase (sialidase), β-D-galactosidase, N-acetyl-β-D-glucosaminidase and N-acetyl-β-D-galactosaminidase, which allow S. intermedius to grow on macromolecules found in host tissue [10]; a cytotoxin, intermedilysin (ILY), that can directly damage host tissues and immune defense cells and participate in bacterial pathogenicity; and the surface protein antigens I/II, which are involved in adhesion to fibronectin and laminin, an important step in the pathogenesis of endocarditis and abscess formation [11].

The development of high-throughput nucleic acid sequencing technologies has made it possible to observe variations in the genetic repertoire among strains of a given bacterial species. The present study aimed to describe the genetic diversity and the genetic basis of pathogenesis of S. intermedius. Twenty-seven genomic sequences from S. intermedius strains, including 13 newly sequenced in our laboratory and 14 from public databases, were used for pan-genomic analysis. Predicted genes were compared among strains to determine the size of the core and dispensable gene pools, the pangenome and the gain/loss of putative virulence determinants, and to identify genomic islands.

The 13 genome sequences determined in this study were deposited in GenBank and their accession numbers are listed in Table 1. The genomic DNA (gDNA) of each studied S. intermedius strain was extracted in two steps: a mechanical treatment was first performed using acid-washed glass beads (G4649-500 g, Sigma) and a FastPrep BIO 101 instrument (Qbiogene, Strasbourg, France) at maximum speed (6.5) for 90 s.
Then, following a 2-hour lysozyme incubation at 37 °C, DNA was extracted using an EZ1 biorobot and the EZ1 DNA Tissue kit (Qiagen, Hilden, Germany). The elution volume was 50 µL. Genomic DNA was quantified using the Qubit assay (Life technologies, Carlsbad, CA, USA).

The gDNAs were sequenced using a MiSeq sequencer with the paired-end strategy and the Nextera XT library kit (Illumina, Inc, San Diego, CA, USA). The paired-end library was prepared from 1 ng of gDNA as input. The gDNA was fragmented during the tagmentation step. Then, limited-cycle PCR amplification (12 cycles) completed the tag adapters and introduced dual-index barcodes. After purification on AMPure beads (Life technologies, Carlsbad, CA, USA), the libraries were normalized according to the Nextera XT protocol (Illumina). Normalized libraries were pooled for sequencing on a MiSeq sequencer (Illumina). Automated cluster generation and paired-end sequencing with dual-index reads were performed in a single 39-hour run in a 2 × 250 bp format. The numbers of paired-end reads are summarized in Table 2. The paired-end reads were trimmed and filtered according to read quality.

After sequencing, the obtained reads were assembled using the A5 software [12] with default parameters, and contigs were then compared against the NCBI database using BLASTn to remove contaminants. Then, the online tool Fasta dataset joiner (http://users-birc.au.dk/biopv/php/fabox/fasta_joiner.php) was used to merge sequences into a single molecule. The Mauve software was used for multiple genomic sequence alignment [13]. Genes were annotated using the Prokka software with default parameters [14], in which the similarity E-value cut-off is 1e-06 and the minimum contig size is 200 bp. This pipeline also includes several other tools, such as Aragorn for tmRNA detection, Barrnap to predict rRNAs and Prodigal to identify coding sequences.
To estimate the mean level of sequence similarity at the genome level among studied strains, we used the OrthoANI [15] and Genome-to-Genome Distance Calculator (GGDC) [16] tools, with species-level threshold values of 95-96 % and 70 %, respectively. A 16S rRNA-based phylogenetic analysis of the 27 studied S. intermedius strains was performed using the MEGA7 software [17]. For constructing the phylogenetic tree, the following options were used: Maximum Likelihood method; Kimura 2-parameter substitution model; uniform rates among sites; partial deletion option for gaps/missing data; 1000 bootstrap replicates. Using genomic sequences and the Roary program [18], a clustered heatmap of core genes was drawn on the basis of the presence/absence approach. We also detected SNPs in the core genome alignment with the snp-sites program [19] and drew a phylogenetic tree with CGEwebface [20].

Virulence-associated genes were detected by comparing the studied genomic sequences with the virulence factor database (VFDB) [21] and sequences described in recent publications [22]. The BLASTp search was performed using the threshold scores reported by Olson et al.: 35 % identity and a high-scoring pair (HSP) length of 50 % [22]. Additionally, we reviewed the literature to identify the proteins involved in interactions with the host [10, 23, 24]. A principal component analysis was performed using the XLSTAT program (Data Analysis and Statistical Solution for Microsoft Excel, Addinsoft, Paris, France 2017), with Fisher's least significant difference (LSD, α = 0.005) and Pearson's correlation coefficients, to detect any association of virulence-associated genes with specific clinical conditions.

GET_HOMOLOGUES [25] was used to reveal orthologous genes among S. intermedius strains, using the following parameters: minimal coverage (-C) 40 %, minimum identity (-S) 50 %, E-value cut-off (-E) 1e-05.
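The identity and HSP-length thresholds used for the virulence gene search can be illustrated with a short filter over tabular BLASTp hits. This is a minimal sketch and not the authors' script: it assumes hits in the standard 12-column tabular BLAST format, it assumes that the 50 % HSP length is measured relative to the query protein length, and the names `Hit`, `parse_tabular`, `filter_hits` and all example identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str         # query protein identifier
    subject: str       # database (e.g. virulence factor) identifier
    identity: float    # percent identity of the HSP
    aln_length: int    # alignment length of the HSP


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse tabular BLAST lines; only the first four columns are used."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        hits.append(Hit(cols[0], cols[1], float(cols[2]), int(cols[3])))
    return hits


def filter_hits(hits: List[Hit], query_lengths: Dict[str, int],
                min_identity: float = 35.0,
                min_hsp_fraction: float = 0.5) -> List[Hit]:
    """Keep hits meeting both the identity and the HSP-length threshold.

    Assumption of this sketch: HSP length is expressed as a fraction
    of the query protein length.
    """
    kept = []
    for hit in hits:
        fraction = hit.aln_length / query_lengths[hit.query]
        if hit.identity >= min_identity and fraction >= min_hsp_fraction:
            kept.append(hit)
    return kept


# Invented example hits (qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore); not data from the study.
EXAMPLE_LINES = [
    "protA\tVF_1\t62.5\t480\t180\t2\t1\t480\t5\t484\t1e-150\t520",
    "protB\tVF_2\t30.0\t200\t140\t3\t1\t200\t1\t200\t1e-10\t60",
    "protC\tVF_3\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-40\t150",
]
EXAMPLE_LENGTHS = {"protA": 500, "protB": 210, "protC": 400}

kept_hits = filter_hits(parse_tabular(EXAMPLE_LINES), EXAMPLE_LENGTHS)
print([hit.query for hit in kept_hits])
```

In this toy input, protB fails the identity threshold and protC fails the HSP-length threshold, so only protA is retained.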
Sequence similarity searches and clustering of coding sequences (CDS) from the 27 genomes were performed using the pairwise BLASTp and OrthoMCL algorithms [26]. Sequential inclusion of all possible combinations of up to 27 strains was simulated, and the numbers of conserved genes and of strain-specific genes were fitted by regression analysis [27]. This allowed us to estimate and extrapolate the sizes of the core and pan-genomes. Roary [18] was also used, with default parameters, to confirm the reliability of the obtained pan-genome analysis results (identity percent ≥ 70 %, coverage ≥ 70 %) and to generate the core genome alignment. The Clusters of Orthologous Groups (COGs) database was used to identify gene functions [28] using BLASTP (E-value 1e-03, coverage 0.7 and identity percent 30 %). A circular comparison of genomes was obtained using the online GView Server (https://server.gview.ca/) with S. intermedius strain ATCC 27335 as reference genome [29]. ResFinder and the ARG-ANNOT database were used to search for antibiotic resistance-related markers [30, 31]. The presence of CRISPR repeats and prophages was predicted using the CRISPRFinder [32] and PHASTER [33] tools, respectively.

The 27 studied S. intermedius strains originated from China, Canada, South Korea, US, Japan and France. Patient data were not available for some strains. The 13 French strains (G1552-G1557 and G1562-G1568, Tables 1 and 2) were isolated in our laboratory from patients with various infections (Table 2), from August 2014 to November 2016, on 5 % sheep blood-enriched Columbia agar (BioMérieux) at 37 °C in anaerobic atmosphere. Their identification was confirmed by the high scores (> 2) obtained using MALDI-TOF MS. In addition, 14 S. intermedius genome sequences were retrieved from GenBank. The 27 strains were divided into 8 groups according to their isolation source (Table 2). The genome sizes and gene numbers among S.
intermedius strains were relatively similar: each strain had a single chromosome ranging in size from 1.85 Mbp to 2.05 Mbp, and no plasmid was identified in any strain (Table 2). A schematic view of all 27 studied genomes is provided in Fig. 1, showing an overall high degree of conservation. The general features of S. intermedius genomes are summarized in Table 2. The G + C content of S. intermedius ranged from 37.3 to 37.8 % (average 37.641 %, n = 27). All 13 in-house sequenced S. intermedius

The phylogenetic analysis based on the 16S rRNA gene, which is widely used as a marker to differentiate Streptococcus species [34], demonstrated that all S. intermedius strains were grouped in a single cluster that was closely related to S. anginosus and S. constellatus within the S. anginosus group [22] (Fig. 2). In this topology, S. intermedius, S. constellatus and S. anginosus strains clustered together with their subspecies. However, the heatmap obtained using Roary [18], based on the core genome, was more discriminatory within the species than the 16S rRNA-based analysis and identified 3 clusters that were independent of the strain source (Fig. 3). The three clusters were as follows: strains G1557, G1556, LC4, G1562, SK54, AJKN01, ATCC27335 and JTH08 constituted the first group; strains 30309, G1563, G1564, 631_SCON and G1554 clustered in the second group; and the remaining strains clustered in a third group. There was no evidence of correlation between strain clusters and clinical forms, nor between genomic types and the geographical origin of isolates.

To measure the divergence between all studied strains at a deeper level, we also analyzed their phylogenetic relationships on the basis of core genome SNPs, which demonstrated that strains G1562, G1566 and F0413 diverged from other strains and exhibited a higher tendency toward recombination. However, again, no disease-specific clustering was observed (Fig. 4).
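Extraction of variable positions from a core genome alignment, the step for which the snp-sites program was used in this study, can be illustrated as follows. This is a minimal sketch on invented toy sequences: the function name `variable_sites` is hypothetical, and skipping any column that contains a gap or ambiguous base is an assumption of the sketch rather than a description of how snp-sites handles such columns.

```python
from typing import Dict, List, Tuple


def variable_sites(alignment: Dict[str, str]) -> Tuple[List[int], Dict[str, str]]:
    """Return 0-based variable positions and the reduced SNP alignment.

    alignment maps a strain name to its aligned sequence; all sequences
    must have the same length.
    """
    names = list(alignment)
    seqs = [alignment[name].upper() for name in names]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("aligned sequences must have equal length")

    positions = []
    for i in range(length):
        column = {seq[i] for seq in seqs}
        # Sketch assumption: ignore columns with gaps or ambiguous bases.
        if not column <= set("ACGT"):
            continue
        if len(column) > 1:
            positions.append(i)

    snp_alignment = {
        name: "".join(seq[i] for i in positions)
        for name, seq in zip(names, seqs)
    }
    return positions, snp_alignment


# Invented toy alignment, not data from the study.
TOY_ALIGNMENT = {
    "strain_1": "ATGCCGTA",
    "strain_2": "ATGACGTA",
    "strain_3": "ATGCCGTT",
    "strain_4": "AT-CCGTA",
}

snp_positions, snp_alignment = variable_sites(TOY_ALIGNMENT)
print(snp_positions)
print(snp_alignment)
```

The reduced alignment keeps only the informative columns, which is what makes tree building on core genome SNPs tractable for many genomes.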
Digital DNA-DNA hybridization (dDDH) values ranged from 80.5 to 99.3 % between all 27 strains, thus confirming their classification within a single species. This was also cross-validated by the OrthoANI program, which produced pairwise values ranging from 97.78 to 100 %, well above the consensus 95-96 % threshold for prokaryotic species demarcation [35]. This corresponded to 100 % 16S rDNA sequence identity across all studied isolates. The above data are consistent with a strong degree of genome conservation and synteny.

Figure caption (fragment): The alignment gaps tend to coincide with regions of low G + C content. The rings, from the inside out, display the size in kbp, GC skew and G + C content, followed by the genomes as listed in the left legend.

The overall distribution of S. intermedius proteins in COG categories was quite similar in all 27 studied strains (Fig. 5). Previous studies of other Streptococcus species also suggested that, within a given species, the majority of strains had a similar COG profile [36-38]. Approximately 79.72 % of all proteins predicted in all strains were identified in COG superfamilies. The proportion of each category fluctuated within a very small range, showing nearly identical percentages of distribution in all strains. The most abundant subcategories were related to carbohydrate transport and metabolism (G) and translation, ribosomal structure and biogenesis (J), as was their distribution in core genes. Less than half of strain-specific genes, but more than 90 % of core genes, had a match in the COGs database. The most abundant functions in core genes were associated with metabolism (Fig. 5a). The overall proportion of metabolic functions in core genes was 32.47 %, whereas that in strain-specific genes was 9.58 %.
More specifically, energy production and conversion (C), amino acid transport and metabolism (E), nucleotide transport and metabolism (F), carbohydrate transport and metabolism (G) and coenzyme transport and metabolism (H) were noticeably more abundant in core genes (p-value < 0.01) (Fig. 5b). No mobilome-related functions were detected in S. intermedius. Within the functional category of information storage and processing, the proportions of the sub-categories differed markedly (Fig. 5b). The functions of translation, ribosomal structure and biogenesis (J) were significantly enriched (p-value < 0.0001) in core genes, whereas the functions of replication, recombination and repair (L) were significantly enriched (p-value < 0.01) in strain-specific genes. This trend was also observed in other bacteria [35]. In the cellular processing and signaling category, the function of defense mechanisms (V) was more abundant in strain-specific genes (p-value < 0.05) than in core genes (Fig. 5c).

The average number of new genes added by a novel genome was 40 when the 27th genome was added (Fig. 6). The exponential decay model shown in Fig. 7a suggests that the number of conserved core genes approached an asymptote with the comparison of 27 genomes. A total of 1,355 core genes were identified in S. intermedius. The average proportion and sequence identity of core genes per strain were 72 and 97.79 %, respectively, indicating that core genes in S. intermedius are highly conserved and reflecting a low degree of intraspecies genomic variability. Examination of the functional annotation of these core genes suggests, as expected, that they encode mostly core metabolic processes. A total of 1,054 strain-specific genes were identified in S. intermedius and the average number of strain-specific genes was 39 (Fig. 7b).
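The partition of gene clusters into core and strain-specific sets, and the count of new genes contributed by each added genome, can be sketched from a presence/absence table. This is a minimal illustration on invented toy data, not the GET_HOMOLOGUES or Roary implementation: the function names and cluster labels are hypothetical, core genes are taken as those present in every strain, strain-specific genes as those present in exactly one strain, and accessory genes as everything in between.

```python
from typing import Dict, List, Set, Tuple


def partition_pangenome(
    presence: Dict[str, Set[str]]
) -> Tuple[List[str], List[str], List[str]]:
    """Split gene clusters into core, accessory and strain-specific lists.

    presence maps a gene cluster name to the set of strains carrying it.
    Sketch assumption: every strain carries at least one cluster.
    """
    strains = set().union(*presence.values())
    n_strains = len(strains)
    core, accessory, specific = [], [], []
    for cluster, carriers in sorted(presence.items()):
        if len(carriers) == n_strains:
            core.append(cluster)
        elif len(carriers) == 1:
            specific.append(cluster)
        else:
            accessory.append(cluster)
    return core, accessory, specific


def new_genes_per_genome(presence: Dict[str, Set[str]],
                         order: List[str]) -> List[int]:
    """Count clusters seen for the first time as genomes are added in order."""
    seen: Set[str] = set()
    counts = []
    for strain in order:
        clusters = {c for c, carriers in presence.items() if strain in carriers}
        counts.append(len(clusters - seen))
        seen |= clusters
    return counts


# Invented toy presence/absence data for three hypothetical strains.
TOY_PRESENCE = {
    "cluster_A": {"s1", "s2", "s3"},
    "cluster_B": {"s1", "s2", "s3"},
    "cluster_C": {"s1", "s3"},
    "cluster_D": {"s2"},
    "cluster_E": {"s3"},
}

core, accessory, specific = partition_pangenome(TOY_PRESENCE)
added = new_genes_per_genome(TOY_PRESENCE, ["s1", "s2", "s3"])
print(core, accessory, specific, added)
```

The three lists always sum to the pangenome size, and the per-genome counts depend on the order of addition, which is why such curves are usually averaged over many orderings or combinations of strains.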
Among strain-specific genes, 148 genes were found in strain G1562, 107 in strain TYG1620, 105 in strain BA1, 96 in strain 32811, 82 in strain G1557, 73 in strain C270, 69 in strain 631_SCON, 61 in strain G1555, 41 in strain G1554, 38 each in strains G1565 and F0413, 33 each in strains G1564 and LC4, 27 each in strains G1556 and ATCC27335, 20 in strain 30309, 16 in strain G1553, 13 in strain G1552, 11 in strain B196, 6 in strain KCOM1545, 5 in strain FDAARGOS_233 and 1 each in strains G1563, G1566, G1567, JTH08 and SK54AJKN01. The size of the pangenome increased steadily without reaching any plateau. The pangenome trend depicted in Fig. 7b shows a gradual expansion with the addition of new genomes; the pangenome of S. intermedius may thus be considered open, which suggests a homogeneous pattern of genome evolution with similar rates of gene gain/loss across the whole population. In addition, a total of 1,611 accessory genes that were shared by two or more strains were identified. Overall, we identified an S. intermedius pangenome of 4,020 genes, including 1,355 core genes, 1,054 strain-specific genes and 1,611 accessory genes.

In the S. intermedius pangenome, 252 virulence factors were identified in total. Of these, 70 core virulence factors were shared by all strains and 78 unique virulence factors were present in one strain each (Table 3). Virulence-associated genes present in all studied genomes included homologous virulence genes that contribute to bacterial avoidance of the immune system, such as ily, which encodes an intermedilysin; the lmb, pspA, pavB/pfbB and fss3 genes coding surface proteins; the genes coding the polysaccharide capsule (cps4A, cps4B, cps4C, cps4D, cps8D); the auto-inducer LuxS (luxS); the binding proteins (pavA, hitC, fbpC, psaA, mntA, clpC, fss3); neuraminidase (nanA), hyaluronidase (hysA) and heat shock protein B (htpB); genes from the sil locus known to play a role in quorum-sensing and virulence in S.
pyogenes (silA, silD, silE, salX), genes associated with secretion systems (lem11, lem15, sdeC, ceg32, esxA, essC, lpg2372, lirB), and genes associated with Mg2+ transport proteins (mgtB, mgtC); the response regulator CsrR, the beta-hemolysin gene (cylG), the laminin-binding surface protein-like Pac and the invasion protein inlA were also present in all strains.

Several of these core virulence genes have documented roles. The surface protein antigen I/II was demonstrated to play a potential role in S. intermedius pathogenesis [39]; human fibronectin and laminin are thought to bind to this antigenic protein, which induces IL-18 release from monocytes [39]. Genes from the streptococcal invasion locus (sil) are related to enhanced virulence in the SAG group and may contribute to the invasive behavior of S. intermedius strains. The internalin (inlA), likely acquired from Listeria monocytogenes, increases the virulence of S. intermedius by playing a key role in attachment to host cells [40]. The hyaluronidase (hysA) acts in the liquefaction of tissues and is also involved in biofilm formation, which protects bacteria from host defenses and antibiotics, and plays a role in infection [9]. The ily-coded intermedilysin can directly damage host tissues and immune defense cells, causing human cell death by membrane bleb formation [23].
It has also been reported that intermedilysin contributes to bacterial invasion of and adhesion to human liver cells, and to cytotoxicity [41]. The galE gene codes a UDP-galactose 4-epimerase that plays a role in biofilm formation, and its key residues are essential for epimerase activity [42]. The laminin-binding surface protein [21], homologous to that of Streptococcus agalactiae, is coded by the Pac gene, is essential in binding to and invasion of different host surfaces, and is present in almost all group B Streptococcus strains causing pneumonia, septicemia and meningitis [43, 44]. The psaA gene codes a surface lipoprotein that plays a role in Streptococcus pneumoniae systemic infections by interacting with monocytes [45]. We also identified the heat shock protein-coding gene htpB, which is known in Legionella pneumophila to act in adhesion to host fibronectin [46]. The clpC gene codes a heat shock protein with ATPase activity that is involved in the invasion of hepatocytes by Listeria monocytogenes [47]; ATPase proteins were shown to play a role in the survival and virulence of Salmonella typhimurium and S. aureus [48]. The clpP gene codes an ATP-dependent caseinolytic protease that was proven in Streptococcus suis to play a role in colonization and bacterial adaptation to various environmental stresses [49]. The pavB gene codes a fibronectin-binding protein that mediates bacterial attachment to human epithelial and endothelial cells and also promotes transfer of bacteria to the bloodstream [50, 51]. Finally, nanA codes a highly conserved neuraminidase with sialidase activity, which catalyzes the cleavage of terminal sialic acid residues from glycoconjugates. In S. pneumoniae, it promotes biofilm formation and contributes significantly to broncho-pulmonary colonization [52].

Although most of the strains exhibited one to eight unique virulence genes, strains G1562 and BA1 possessed 14 and 10 specific virulence genes, respectively.
Eight strains (G1563, G1566, G1567, G1568, BA1, KCOM1545, JTH08, SK54AJKN01) had no strain-specific virulence factor (Table 3). Among unique virulence genes, sdcA, ybtE, lpbA, SalR, salK and VopT are secretory system-associated genes that are involved in iron-mediated transport across cellular membranes. Some of these genes are linked with bacterial growth and act as important anti-inflammatory effectors [42, 53-58]. Among other unique genes, the pilC gene is suspected to be essential for secretion and assembly of transcription factor P, important in pilus formation [59], while pilT helps in polymerization and depolymerization of pilin [60]. The brkA gene inhibits bactericidal activity and protects the bacterium from complement activation products [61]. Other unique genes are linked with bacterial adherence and colonization, such as hopH, toxB, mpn372 and stcE, which contribute significantly to actin organization and bacterial attachment to human surfactant proteins [62-65]. The iraAB gene utilizes iron-loaded peptides that promote iron assimilation [66], while lepA plays a role in bacterial growth and induces an inflammatory response. This gene also plays a key role in pathogenicity in Pseudomonas aeruginosa [67]. The fcrA gene codes a protein containing receptor domains for immunoglobulins similar to those of M-related proteins [68]. Another immunoglobulin-related gene, aga, plays a barrier function against mucosal antibodies by cleaving IgA1 [69]. IpsA controls transcriptional biogenesis of the cell wall in inositol-derived lipid formation in Corynebacterium and Mycobacterium species [70]. The vasL gene is considered to be a component of the vas genes, associated with the membrane type VI secretion system [71], and ravL is presumably activated at low oxygen levels and regulates virulence gene expression via the clp gene [72].
The lpg0365 gene codes a lipophosphoglycan that, together with other membrane polypeptides, is necessary for Leishmania pathogenesis [73]. The pvdJ gene is involved in the production of cyclodipeptides that may regulate the production of biofilm [74]. In addition, pvdL is associated with the biosynthesis or uptake of the siderophores pyoverdine and pyochelin, which act in the transport of heme and ferrous ions [75], while pvdD is involved in the biosynthesis of pyoverdine in Pseudomonas aeruginosa [76]. IpaJ codes a plasmid antigen involved in demyristoylation of proteins by inducing Golgi fragmentation and inhibiting hormone trafficking [77]. AliA is associated with nasopharyngeal colonization in Streptococcus pneumoniae [78]. The espN gene is reported in Mycobacterium tuberculosis to play a role in adding an acetyl group to the N-terminus of the ESAT-6 virulence factor [79]. Flagella-related unique genes found in different studied strains include flgG, flgI, flgJ and flgK, which play a major role in virulence, adhesion and motility. They are mostly involved in flagellum formation and also act as an interface with other flagellar proteins [80-83]. The inlK gene was reported in Listeria monocytogenes to help avoid autophagy, while virB8 localizes to the inner membrane and is related to the export of alkaline phosphatase to the periplasm [84]. Finally, sigA codes a sigma factor linked with galactosidase activity [85].

Using principal component analysis of differentially distributed virulence genes, three distinct clusters were visualized (Fig. 8). A clear separation of virulence genes associated with brain or broncho-pulmonary abscesses (cps4E, sda and lap) from those associated with liver or abdominal abscesses (cpsB, fbp54 and cap8D) was observed.
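How a principal component can separate groups of genes according to their presence/absence profiles can be illustrated with a small, dependency-free sketch. It is not the XLSTAT procedure used in this study: the first component is obtained here by power iteration on the column-centered cross-product matrix, genes are treated as observations (rows) and strains as variables (columns), which is an assumption about the data layout, and the toy matrix and the name `first_principal_component` are invented for illustration.

```python
from typing import List, Tuple


def first_principal_component(
    matrix: List[List[float]], iterations: int = 200
) -> Tuple[List[float], float]:
    """Return PC1 scores (one per row) and the fraction of variance explained."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    centered = [[row[j] - means[j] for j in range(n_cols)] for row in matrix]

    # Cross-product matrix of the centered data (proportional to covariance).
    cov = [[sum(r[i] * r[j] for r in centered) for j in range(n_cols)]
           for i in range(n_cols)]

    # Power iteration from a fixed, arbitrary start vector.
    vec = [1.0 + 0.1 * j for j in range(n_cols)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(n_cols))
               for i in range(n_cols)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0.0:
            raise ValueError("data have no variance along the start vector")
        vec = [x / norm for x in nxt]

    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(n_cols))
                     for i in range(n_cols))
    total = sum(cov[i][i] for i in range(n_cols))
    scores = [sum(r[j] * vec[j] for j in range(n_cols)) for r in centered]
    return scores, eigenvalue / total


# Invented toy matrix: rows are six hypothetical genes, columns six strains
# (1 = gene present). The first three genes occur mainly in strains 1-3,
# the last three mainly in strains 4-6.
TOY_MATRIX = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
]

pc1_scores, pc1_explained = first_principal_component(TOY_MATRIX)
print([round(s, 2) for s in pc1_scores], round(pc1_explained, 3))
```

In this toy example the two gene groups receive scores of opposite sign on the first component. The sign itself is arbitrary; only the separation between groups is meaningful.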
The first component, which has maximum coverage and captures the largest share of the variation, showed that brain abscess-causing strains were associated with genes coding ATP-dependent proteolytic enzymes, which indicates their potential role in abscess formation. Other virulence genes clustered independently, excluding any association with the previous two disease categories. Among virulence genes associated with brain and broncho-pulmonary infections, sda codes a histidine kinase that regulates sporulation initiation in Bacillus subtilis and mediates the expression of virulence-associated factors [86]; lap codes the Listeria adhesion protein (LAP), a host stress response protein responsible for adhesion and promotion of translocation across monolayers [87]; and cps4E codes the capsular polysaccharide biosynthesis protein that was demonstrated in S. pneumoniae to prevent phagocytosis by forming an inert shield essential for encapsulation [88, 89].

Among virulence genes associated with liver and abdominal abscesses, fbp54 codes, in S. pyogenes, a fibronectin-binding protein that acts as an immunogen in humans. The amino acid sequence of fbp54 in S. intermedius is similar to that of S. pneumoniae. cap8D codes a dehydratase that is essential for the synthesis of the capsule precursor involved in adhesion. It has also been targeted as a component for vaccine development [85, 90]. cpsB codes a capsular polysaccharide biosynthesis protein that is essential for encapsulation in S. pneumoniae and is involved in the interaction of bacteria with their environment, notably their host organism [91].

In contrast to the above-mentioned genes, some were not found to be disease-specific. These included glf, cpsE, cpsI, cpsA, cps4C, cps8P and hasC. The glf gene is involved in the biosynthesis of the unusual monosaccharide galactofuranose [92]; cpsE codes a glycosyl transferase that is responsible for the addition of activated sugars to the lipid carriers in the bacterial membrane and is essential for encapsulation in S.
pneumoniae [93]; cpsI is essential for the production of high molecular weight capsular polysaccharides [94]; cpsA and cps8P are necessary for normal cell wall integrity and composition [95]; cps4C codes a polysaccharide tyrosine kinase adaptor protein that plays a key role in the regulation of capsule biosynthesis [96]; finally, hasC, which encodes glucose-1-phosphate uridylyltransferase, is involved in the biosynthesis of the hyaluronic acid capsule [97].

The tetracycline resistance gene tetM was identified in strains G1552, C270, KCOM1545, G1555, LC4, 30309 and 32811, whereas tet32 was identified in strain 631_SCON (Table 4). The macrolide resistance gene ermB was detected in strains G1552, C270, G1555 and 30309. In other strains, no antibiotic resistance gene was identified. (Table 1).

In addition, four prophage-like elements were detected in strain BA1, three in strain TYG1620, and two in strains G1562, G1564, G1553, G1557, F0413 and G1555. The major difference in genome size among the 27 studied strains of S. intermedius resided in the number of phages; the presence of phages also points to a contribution of horizontal gene transfer to the emergence of this species [98].

The search for CRISPR elements showed that 14 of the 27 studied genomes contained CRISPRs. Three of these 14 strains (G1564, G1565, 631_SCON) had more than one CRISPR, for a total of 17 CRISPR modules identified in the studied strains. The direct repeat (DR) length in identified CRISPRs ranged from 24 to 36 bp, while the number of spacers present within each CRISPR varied. CRISPRs also differed among strains, but the DR regions were similar for a given CRISPR element subtype.
Based on the type of cas proteins, the CRISPRs of strains G1562, G1563, G1564, G1556, G1554, 631_SCON and 30309 were subtype I-C CRISPRs; those of strains FDAARGOS_233 and KCOM1545 were subtype II-A CRISPRs; finally, the CRISPRs of strains G1565, G1552, B196, G1555 and 32811 were subtype II-C CRISPRs [93] (Table 5).

In the present study, we reported 13 new clinical isolates of S. intermedius and, based upon a combined approach of pangenomics, core-genomics and virulence profiling of 27 strains, attempted to identify disease-specific genetic profiles. The comprehensive analysis revealed genomic variability across strains within the species, although synteny of the core genome was preserved. Our results highlight the importance in pathogenesis of surface proteins like pavB, pspA and cps4 (polysaccharide-coding proteins) and of the binding proteins psaA and pavA, which are present in all studied strains. The PCA results suggest two distinct categories of virulence genes: the ATP-dependent proteolytic virulence genes cps4E, sda and lap are associated with brain and broncho-pulmonary abscess, while the capsular polysaccharide protein-coding genes cpsB and cap8D are linked with liver and abdominal abscess formation. The fibronectin-binding protein coded by fbp54 is also linked with liver and abdominal abscess formation. A recent study also attempted to determine the pangenome of S. intermedius [99]. The SNP-based phylogenetic tree as well as the core gene-based tree showed no clustering related to any disease entity in S. intermedius strains. Overall, the study provides a key genetic framework for assessing and understanding the molecular events contributing to S. intermedius pathogenesis. However, due to the limited number of studied strains, validation of the role of these virulence factors will require experimental confirmation.
Phenotypic differentiation of Streptococcus intermedius, Streptococcus constellatus, and Streptococcus anginosus strains within the "Streptococcus milleri group"
Association of viridans group streptococci from pregnant women with bacterial vaginosis and upper genital tract infection
Streptococcus intermedius, Streptococcus constellatus, and Streptococcus anginosus (the Streptococcus milleri group): association with different body sites and clinical infections
Streptococcus milleri group") are of different clinical importance and are not equally associated with abscess
Streptococcus intermedius causing infective endocarditis and abscesses: a report of three cases and review of the literature
Pathogenic strains of Streptococcus viridans
Streptococcus anginosus, Streptococcus constellatus and Streptococcus intermedius. Clinical relevance, hemolytic and serologic characteristics
Characterization of the Pathogenicity of Streptococcus intermedius TYG1620 Isolated from a Human Brain Abscess Based on the Complete Genome Sequence with Transcriptome Analysis and Transposon Mutagenesis in a Murine Subcutaneous Abscess Model
Role of hyaluronidase in Streptococcus intermedius biofilm. Microbiology (Reading, Engl)
Distribution of the intermedilysin gene among the anginosus group streptococci and correlation between intermedilysin production and deep-seated infection with Streptococcus intermedius
Rapid identification of Streptococcus intermedius by PCR with the ily gene as a species marker gene
A5-miseq: an updated pipeline to assemble microbial genomes from Illumina MiSeq data
progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement
Prokka: rapid prokaryotic genome annotation
OrthoANI: An improved algorithm and software for calculating average nucleotide identity
Genome sequence-based species delimitation with confidence intervals and improved distance functions
MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets
Roary: rapid large-scale prokaryote pan genome analysis
SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments
Solving the problem of comparing whole bacterial genomes across different sequencing platforms
VFDB: a reference database for bacterial virulence factors
Phylogenetic relationship and virulence inference of Streptococcus Anginosus Group: curated annotation and whole-genome comparative analysis support distinct species designation
Intermedilysin, a novel cytotoxin specific for human cells secreted by Streptococcus intermedius UNS46 isolated from a human liver abscess
The role of Streptococcus intermedius in brain abscess
GET_HOMOLOGUES, a versatile software package for scalable and robust microbial pangenome analysis
OrthoMCL: identification of ortholog groups for eukaryotic genomes
Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome"
The COG database: an updated version includes eukaryotes
Interactive microbial genome visualization with GView
Identification of acquired antimicrobial resistance genes
ARG-ANNOT, a new bioinformatic tool to discover antibiotic resistance genes in bacterial genomes
CRISPRcompar: a website to compare clustered regularly interspaced short palindromic repeats
Tools for finding prophage in bacterial genomes
Streptococcal taxonomy based on genome sequence analyses
DNA-DNA hybridization values and their relationship to whole-genome sequence similarities
Complete genome sequence of an M1 strain of Streptococcus pyogenes
Rapid evolution of virulence and drug resistance in the emerging zoonotic pathogen Streptococcus suis
Genome of the opportunistic pathogen Streptococcus sanguinis
Expression and functional properties of the Streptococcus intermedius surface protein antigen I/II
Internalin A can mediate phagocytosis of Listeria monocytogenes by mouse macrophage cell lines
Intermedilysin is essential for the invasion of hepatoma HepG2 cells by Streptococcus intermedius
Functional characterization and transcriptional analysis of galE gene encoding a UDP-galactose 4-epimerase in Xanthomonas campestris pv. campestris
Group B streptococcal disease in nonpregnant adults
Enhanced expression of lmb gene encoding laminin-binding protein in Streptococcus agalactiae strains harboring IS1548 in scpB-lmb intergenic region
Roles of virulence genes (PsaA and CpsA) on the invasion of Streptococcus pneumoniae into blood system
Genomic characterization, phylogenetic analysis, and identification of virulence factors in Aerococcus sanguinicola and Aerococcus urinae strains isolated from infection episodes
ClpC ATPase is required for cell adhesion and invasion of Listeria monocytogenes
Simultaneous identification of bacterial virulence genes by negative selection
Genetic alteration of capsule type but not PspA type affects accessibility of surface-bound complement and surface antigens of Streptococcus pneumoniae
PavA of Streptococcus pneumoniae modulates adherence, invasion, and meningeal inflammation
The pavA gene of Streptococcus pneumoniae encodes a fibronectin-binding protein that is essential for virulence
Pneumococcal Neuraminidase A (NanA)
Promotes Biofilm Formation and Synergizes with Influenza A Virus in Nasal Colonization and Middle Ear Infection GapA and CrmA coexpression is essential for Mycoplasma gallisepticum cytadherence and virulence The high-pathogenicity island of Yersinia pseudotuberculosis can be inserted into any of the three chromosomal asn tRNA genes Crystal structure of the N-lobe of lactoferrin binding protein B from Moraxella bovis SspH2 as antiinflammatory candidate effector and its contribution in Salmonella Enteritidis virulence SalK/SalR, a Two-Component Signal Transduction System, Is Essential for Full Virulence of Highly Invasive Streptococcus suis Serotype 2 Identification and characterization of VopT, a novel ADP-ribosyltransferase effector protein secreted via the Vibrio parahaemolyticus type III secretion system 2 Identification and characterization of pilG, a highly conserved pilus-assembly gene in pathogenic Neisseria Multiple conformations facilitate PilT function in the type IV pilus BrkA protein of Bordetella pertussis inhibits the classical pathway of complement after C1 deposition Helicobacter pylori HopH (OipA) and bacterial pathogenicity: genetic and functional genomic analysis of hopH gene polymorphisms Identification and characterization of human surfactant protein A binding protein of Mycoplasma pneumoniae Detection of toxB, a plasmid virulence gene of Escherichia coli O157, in enterohemorrhagic and enteropathogenic E. 
coli The StcE protease contributes to intimate adherence of enterohemorrhagic Escherichia coli O157:H7 to host cells The Legionella pneumophila iraAB Locus Is Required for Iron Assimilation, Intracellular Infection, and Virulence Cooperation between LepA and PlcH Contributes to the In Vivo Virulence and Growth of Pseudomonas aeruginosa in Mice The vir-regulon of Streptococcus pyogenes: coordinate expression of important virulence factors Gene structure and extracellular secretion of Neisseria gonorrhoeae IgA protease IpsA, a novel LacI-type regulator, is required for inositol-derived lipid formation in Corynebacteria and Mycobacteria Pathoadaptive conditional regulation of the type VI secretion system in Vibrio cholerae O1 strains Regulation and secretion of Xanthomonas virulence factors Lipophosphoglycan expression and virulence in ricin-resistant variants of Leishmania major Pyoverdines are essential for the antibacterial activity of Pseudomonas chlororaphis YL-1 under lowiron conditions GeneChip expression analysis of the iron starvation response in Pseudomonas aeruginosa: identification of novel pyoverdine biosynthesis genes Nucleotide sequence of pvdD, a pyoverdine biosynthetic gene from Pseudomonas aeruginosa: PvdD has similarity to peptide synthetases Proteolytic elimination of N-myristoyl modifications by the Shigella virulence factor IpaJ The Ami-AliA/AliB Permease of Streptococcus pneumoniae Is Involved in Nasopharyngeal Colonization but Not in Invasive Disease Virulence profiling of Shiga toxin-producing Escherichia coli recovered from domestic farm animals in Northwestern Mexico Flagellar basal body flg operon as a virulence determinant of Vibrio vulnificus Characterization of flgK gene and FlgK protein required for H. pylori colonization-from cloning to clinical relevance The Role of the Flagellar Protein FlgJ in the Virulence of Brucella abortus Systematic Cys mutagenesis of FlgI, the flagellar P-ring component of Escherichia coli. 
Microbiology (Reading) The essential virulence protein VirB8 localizes to the inner membrane of Agrobacterium tumefaciens Molecular characterization of the capsule locus from non-typeable Staphylococcus aureus Structure and mechanism of action of Sda, an inhibitor of the histidine kinases that regulate initiation of sporulation in Bacillus subtilis Listeria monocytogenes uses Listeria adhesion protein (LAP) to promote bacterial transepithelial translocation and induces expression of LAP receptor Hsp60 Serotypic variations among virulent pneumococci in deposition and degradation of covalently bound C3b: implications for phagocytosis and antibody production Complete genome sequence of a virulent isolate of Streptococcus pneumoniae Systemic and Mucosal Immunizations with Fibronectin-Binding Protein FBP54 Induce Protective Immune Responses against Streptococcus pyogenes Challenge in Mice Capsular Polysaccharide Expression in Commensal Streptococcus Species: Genetic and Antigenic Similarities to Streptococcus pneumoniae Targeted gene deletion of leishmania major UDP-galactopyranose mutase leads to attenuated virulence Disruption of the cpsE and endA Genes Attenuates Streptococcus pneumoniae Virulence: Towards the Development of a Live Attenuated Vaccine Candidate. 
Vaccines (Basel) Capsular Polysaccharide Production in Enterococcus faecalis and Contribution of CpsF to Capsule Serospecificity Characterization of the putative polysaccharide synthase CpsA and its effects on the virulence of the human pathogen Aspergillus fumigatus Topology of Streptococcus pneumoniae CpsC, a Polysaccharide Copolymerase and Bacterial Protein Tyrosine Kinase Adaptor Protein Identification and Disruption of Two Discrete Loci Encoding Hyaluronic Acid Capsule Biosynthesis Genes hasA, hasB, and hasC in Streptococcus uberis Metagenomic detection of phage-encoded platelet-binding factors in the human oral cavity Genome Mining and Comparative Analysis of Streptococcus intermedius Causing Brain Abscess in a Authors' contributions D.S. and X.S. performed the genomic analysis while M.K performs PCA analysis and helps in preparing figures and tables, and M.D, D.Ra., P.E.F. and D.S. wrote the paper and designed the study. All authors reviewed the manuscript. The authors read and approved the final manuscript. The study was supported by the Méditerranée Infection foundation, the National Research Agency under the program "Investissements d'avenir", reference ANR-10-IAHU-03 and by Région Provence Alpes Côte d'Azur and European funding FEDER PRIMI. Ethics approval and consent to participate The study design was validated by the ethics committee of the institut federatif de recherche 48 under reference 13-035. Not applicable. The authors declare no conflict of interest in relation to this research.Author details 1 Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.