key: cord-0011766-sf7c34lj
authors: Guimaraes, Ana M. S.; Zimpel, Cristina K.
title: Mycobacterium bovis: From Genotyping to Genome Sequencing
date: 2020-05-03
journal: Microorganisms
DOI: 10.3390/microorganisms8050667
sha: 34121e05cb8c0be6b539be2613b78f40563dafa7
doc_id: 11766
cord_uid: sf7c34lj

Mycobacterium bovis is the main pathogen of bovine, zoonotic, and wildlife tuberculosis. Despite the existence of programs for bovine tuberculosis (bTB) control in many regions, the disease remains a challenge for the veterinary and public health sectors, especially in developing countries and in high-income nations with wildlife reservoirs. Current bTB control programs are mostly based on test-and-slaughter, movement restrictions, and post-mortem inspection measures. In certain settings, contact tracing and surveillance has benefited from M. bovis genotyping techniques. More recently, whole-genome sequencing (WGS) has become the preferential technique to inform outbreak response through contact tracing and source identification for many infectious diseases. As the cost per genome decreases, the application of WGS to bTB control programs is inevitable moving forward. However, there are technical challenges in data analyses and interpretation that hinder the implementation of M. bovis WGS as a molecular epidemiology tool. Therefore, the aim of this review is to describe M. bovis genotyping techniques and discuss current standards and challenges of the use of M. bovis WGS for transmission investigation, surveillance, and global lineages distribution. We compiled a series of associated research gaps to be explored with the ultimate goal of implementing M. bovis WGS in a standardized manner in bTB control programs.

Tuberculosis (TB) is a transmissible disease of humans and animals accompanying societies for thousands of years [1] . Despite progress in its control and prevention, TB is a top cause of mortality by a single infectious agent in the world and has devastating effects on bovine livestock and wildlife populations. Ten million new cases and 1.2 million human deaths were reported in 2018, and the increasing incidence of multidrug resistant strains is a threat to public health [2] . In addition, bovine TB (bTB) is an OIE (World Organisation for Animal Health) notifiable disease and, of the 179 countries reporting disease status in 2015-2016, approximately 50% declared the presence of TB in animals, with higher prevalence in Africa and parts of Asia and the Americas [3] . Despite an effective global notification system, the actual impact of bTB in animals is not fairly quantified, especially in wildlife and in countries where disease control programs are not well-established [4] . TB in cattle has important socioeconomic consequences, as the loss of livestock severely affects producers in developing countries with poorly implemented disease control programs and in certain developed nations where specific wildlife reservoirs create pockets of infection [4] [5] [6] [7] [8] [9] [10] .

bTB is also a major, but often neglected, public health concern [11] . The causative pathogen of the disease can be transmitted from cattle to humans through close contact or the consumption to handle in genomic studies because the majority of and most commonly used sequencing platforms generate short reads, usually ranging from 50 to 300 bp, which are often shorter than the repeats themselves [72] . Some of these repetitive regions are the basis for the traditional genotyping techniques developed over the years (see next section).

A number of reviews describe in detail traditional typing methods used for M. bovis outbreak investigations [23] [24] [25] [26] [27] [28] [29] . Nearly all techniques, briefly reviewed below, were first developed and applied for M. tuberculosis typing and later validated for M. bovis studies. Due to MTBC's clonal nature, most polymorphisms in genotyping techniques originate from insertion sequences (e.g., IS6110) and other repeat regions (e.g., CRISPR, PE/PPE genes, PGRS genes). Evidence accumulated over the years indicates that each technique or a combination thereof presents distinct resolving power at the country, region, subregion, and farm levels [26] (Figure 1 ; Table S1 ).

Country Region Subregion Farm Animal WGS MIRU-VNTR PCR Spoligotyping Figure 1 . Resolution power of the main techniques used to resolve transmission clusters of Mycobacterium bovis depicted in relation to world, country, region, subregion, farm, and animal levels. WGS: whole-genome sequencing; MIRU-VNTR: mycobacterial interspersed repetitive unit-variable-number tandem repeat typing; PCR: polymerase chain reaction. Arrows indicate the level of resolution each technique is able to achieve. WGS provides fine resolution to discriminate between M. bovis strains distributed globally to the individual farm level, while MIRU-VNTR PCR and spoligotyping have more limited resolution, particularly at the individual farm level. WGS may be able to discriminate between different M. bovis strains infecting the same animal only if sampling is comprehensive, multiple isolate cultures are sequenced, and/or deep sequencing of the primary isolate is performed.

In 1985, Collins and de Lisle [73, 74] developed the first intraspecific typing technique of M. bovis, the restriction endonuclease analysis (REA). REA consists of applying three different enzymes (BstEII, PvuII, and Bcll) to digest high amounts of total DNA extracted from M. bovis isolates, followed by band pattern visualization on agarose gels. Despite its use in molecular epidemiology studies in certain countries at the time [75] [76] [77] , the assay soon proved to be technically demanding, with an excessive number of small DNA fragments difficult to resolve [78] (Table S2 ; Figure S1 ). Currently, its application is mostly restricted to a reference laboratory in New Zealand, in which it was developed, and was last used for routine typing of M. bovis in 2011 [26] .

A pulsed-field gel electrophoresis (PFGE) [79] assay was later developed for M. tuberculosis and other MTBC strains and resulted in improved resolution of band patterns compared to REA (i.e., larger and fewer bands). However, PFGE had two main disadvantages: first, the MTBC's lipid-rich cell wall inhibits the action of lytic enzymes used in PFGE, preventing the proper use of the PFGE's agarose plugs [80] [81] [82] ; and second, comparative studies developed in later years showed that PFGE of M. tuberculosis strains had a lower intra-specific discriminatory power compared to other genotyping techniques that were subsequently developed [83, 84] . This low discriminatory power is associated with MTBC's clonality; the low number of polymorphic positions between strains may result in undistinguishable band patterns [85] ( Figure S1 ). PFGE has also some intrinsic disadvantages, such as being technically demanding and time consuming (Table S2) . Given the drawbacks, there are only three published reports using this technique to type M. bovis strains [86] [87] [88] .

As REA and PFGE proved insufficient to discriminate M. tuberculosis and M. bovis strains, the search for polymorphic and stable genetic markers allowed the elaboration of superior typing techniques. Currently, the most widely used genetic markers are the IS6110 (for M. tuberculosis), the direct repeat (DR) region (which is a mycobacterial CRISPR), the poly(GC) rich sequences (PGRS), and the variable number tandem repeats (VNTR) sequences. Each marker has its corresponding typing technique.

The 1358 bp IS6110 is MTBC's specific [89] and differences in its location and copy numbers is what discriminate among isolates [90, 91] . This repetitive element was first described in 1990, by screening a M. tuberculosis cosmid library constructed in pHC79 with labelled M. tuberculosis total DNA [92] . Presently, the most standardized and commonly used method to detect IS6110 in M. tuberculosis strains is the IS6110-RFLP (IS6110-Restriction Fragment Length Polymorphism) [93, 94] . Briefly, the technique consists in extracting high amounts (2-3 µg) of total bacterial DNA, digesting it with PvuII endonuclease and subjecting the digested sample to standard electrophoresis on agarose gel. The agarose gel is then used to perform a Southern blot, in which the DNA fragments are transferred to a membrane, and probes complimentary to a portion of the 3 end of the IS6110 sequence hybridize to reveal the number of IS elements and size of generated fragments through chemiluminescence (originally radiolabeling) [89, 94, 95] . IS6110-RFLP patterns can be compared and compiled using specific computer software. Mycobacterium tuberculosis isolates from individuals that are part of the same transmission link often display identical IS6110-RFLP patterns, constituting transmission clusters. IS6110 has also been shown to be stable over time (0.57-10.69 years to change, depending on the disease phase) [96] , which means the technique can be used to study recent transmission or in long-term epidemiological studies.

A major drawback of IS6110-RFLP is the fact that nearly all M. bovis strains carries only 1-5 copies of the insertion element [91, 97] and this technique has low discriminatory power in isolates containing five or less IS6110 copies [98] ( Figure S1 ). In other words, many M. bovis isolates will have the same IS6110-RFLP pattern, making it impossible to distinguish among them. As with REA and PFGE, IS6110-RFLP also requires high amounts of DNA and is labor-intensive (Table S2) . For these reasons, despite being commonly used for M. tuberculosis, IS6110-RFLP was of little use for M. bovis genotyping.

In 1991, a Southern blot-based RFLP was developed based on the digestion of M. tuberculosis DNA using enzymes of four-base recognition sites [99] . One of the detected DNA fragments showing high heterogeneity among isolates was cloned and sequenced, revealing a highly repetitive sequence, identified as PGRS [100] . This fragment served as a probe to identify the presence of up to 30 PGRS copies present in MTBC genomes. Owing to the poor applicability of IS6110 typing for M. bovis, PGRS-RFLP allowed significant improvement in M. bovis strain differentiation [98] . However, as with REA, the presence of multiple bands [101] makes it difficult to interpret [102] (Figure S1 ). As a Southern blot-RFLP based system, it also requires high amounts of DNA and is a laborious technique (Table S2) .

As with many bacteria and archaea [103] , MTBC organisms have a defense system against invading nucleic acids called type III-A CRISPR/Cas system. Even before much attention was given to bacterial CRISPR, this sequence in M. tuberculosis, known as DR locus, was described [104] and readily applied in genotyping [105] . Hermans et al. [104] originally described a genomic locus in M. bovis BCG containing a IS6110 element with many 36 bp direct repeats (DRs) interspersed by spacer sequences ranging from 35 to 41 bp in size. One DR and its neighboring spacer sequence is called a "direct variable repeat" (DVR). The order of the spacers is similar among MTBC strains, but DVRs can be deleted. Therefore, the difference between two isolates is given by the variable presence of spacers in the DR region. There is only one DR locus per MTBC genome ( Figure 2 and Figure S1 ) and up to 43 unique spacers between DRs.

The first typing method for the DR locus was called DVR-polymerase chain reaction (DVR-PCR) [105] , which was later substituted by spoligotyping (spacer oligotyping technique). Spoligotyping was developed in 1997 [106] and readily utilized to evaluate M. bovis strains [107] . This "reverse line blot hybridization technique" is PCR-based and detects the presence of the unique spacers in an MTBC isolate in two steps. First, the spacers between DRs are amplified using PCR. A single primer set complimentary to the two extremities of the DR sequences is used, but the reverse primer is biotin labelled, resulting in the synthesis of labelled reverse strands. Individual spacers are subsequently detected by hybridization of the biotin-labelled PCR product to a nylon membrane containing covalently linked oligonucleotides corresponding to 37 spacers of M. tuberculosis H37Rv and six spacers of M. bovis BCG. A mini-blotter is used for hybridization and up to 45 isolates can be simultaneously compared [28] (Figure 2 ). One advantage of this technique is that it can be applied directly to DNA extracted from infected tissue samples, not requiring bacterial isolation [108] . In the case of M. bovis, spacers 3, 9, 16, and 39-43 are lacking, allowing for species differentiation [106] .

Further improvement and automatization of the technique led to the application of microbead-based detection systems, such as Luminex platforms [109] [110] [111] , multiplexed primer extension-based spoligotyping assay using automated matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) [112] , microarray [113] [114] [115] [116] , and ligation-based amplification and melting curve analysis [117] .

The evolution of spoligotype patterns is given by the loss of spacer sequences, which cannot be restored by recombination and is, therefore, fixed in that population [118] . The problem of spoligotyping is the homoplasy, i.e., unrelated lineages can present identical spoligotype patterns because the loss of spacer sequences is a common event [119] . Thus, spoligotypes are not good indicators of phylogenetic relatedness. In addition, its resolving power has been frequently shown to be lower than REA and MIRU-VNTR PCR (mycobacterial interspersed repetitive unit-variable-number tandem repeat typing, polymerase chain reaction) [26] (Figure 1 ; Tables S1 and S2). Despite these factors, spoligotyping remains as one of the most applied genotyping techniques in M. bovis studies (Table S1 ). In "principle", squares denote the quantity of specific genetic markers (i.e., DR locus and VNTR) on M. bovis genomes. While spoligotyping is based on a unique locus, MIRU-VNTR PCR amplifies genetic targets from multiple regions of the genome (up to 24 loci). In contrast, WGS uses information from the whole-genome sequence. Dates refer to the year in which each technique was developed.

In "genomic region", the MIRU40 locus is shown as an example of one of the 24 loci that can be used in MIRU-VNTR PCR. In "results", the spoligotyping membrane is depicted accommodating several samples simultaneously, owing to the high-throughput capability of this technique (up to 45 samples can be simultaneously analyzed). In MIRU-VNTR PCR, although many samples can be amplified at once, each sample can occupy up to 24 wells in an agarose gel, so many electrophoresis runs may be needed depending on the laboratory. MIRU-VNTR databases can subsequently be used to generate a minimum spanning tree. WGS is a high-throughput technique that will lead to single nucleotide polymorphism (SNP)-based analysis. The same generated data can also be used to detect spoligotype and MIRU-VNTR patterns (see text). WGS results can be used to evaluate transmission clusters as well as phylogenetic relationships among the sequenced genomes. WGS: whole-genome sequencing; MIRU-VNTR: mycobacterial interspersed repetitive unit-variable-number tandem repeat typing; PCR: polymerase chain reaction; DR: direct repeat.

VNTR is a locus in which a nucleotide sequence is arranged as tandem repeats, i.e., repeats clustered together and oriented in the same direction. The size (in bp) of this locus varies according to the number of times the nucleotide sequence is repeated. Each repeat can be added or removed through recombination or replication errors, resulting in alleles with different number of repeats. VNTR are present in eukaryotes and prokaryotes, and given its variability, it has been frequently used for DNA typing [120] .

Compared to the single DR locus of spoligotyping, many VNTR loci exist in MTBC (Figures 2 and S1 ) and they are detected using PCR. The sizes of resulting PCR products correspond to the number of repeats in each locus. Initially, 11 VNTR loci (five MPTR loci, with repeats of 15 bp, and six ETR loci, with repeats of 53-75 bp) were evaluated in MTBC strains [121] [122] [123] . Five ETR loci (ETR-A, -B, -C, -D, -E) showed more discriminatory power among strains [124] . However, these five loci did not provide higher resolution compared to the IS6110-RFLP in M. tuberculosis strains with high IS6110 copy numbers [121, 124] . Since M. bovis has few IS6110 copies, ETR loci were indeed more discriminative than IS6110-RFLP [121, 125] , but spoligotyping continued to present higher resolution [124, 126] . Thus, other loci were identified and tested, such as MIRU, QUB, and Mtu, and currently a 24-loci MIRU-VNTR PCR is commonly used [127] . MIRU-VNTR can also be evaluated along with spoligotyping to infer genotyping through the online platform MIRU-VNTRplus [128] , providing a standardized manner of results delivery.

Among M. bovis studies, different sets of VNTR loci have been applied (Table S1 ). Each locus and combination thereof may present better or worse discriminatory power depending on the region and sample set (Table S1 ). It has been suggested that each region should define the best combination of loci for its reality [129] , aiming also at decreasing the cost and time spent in running different PCR assays. It has also been shown that, in certain settings, the capacity of MIRU-VNTR PCR in detecting transmission clusters may be dependable on the M. tuberculosis lineage [130, 131] . For instance, it has been described that standard 24-loci MIRU-VNTR PCR has low resolution power to precisely discriminate closely related isolates of the lineage 2, Beijing of M. tuberculosis [131] . It is unknown if this is also the case for M. bovis lineages or clonal complexes.

The first complete genome sequence of M. bovis to become available originated from a strain denominated AF2122/97 isolated from a cow in England [132, 133] . As this was the first M. bovis genome available, M. bovis AF2122/97 is now considered the reference genome of M. bovis in GenBank, as genomes of M. bovis type strains have never been sequenced. By December 2019, only 74 virulent M. bovis genomes (i.e., not BCG) are deposited as complete or draft forms in NCBI, compared to 6522 M. tuberculosis genomes. In SRA (Sequence Read Archive), the database for depositing raw reads, the number of sequenced M. bovis is in the thousands. The disparity in numbers between assembled complete or draft genomes and raw reads highlights that the majority of developed studies are based on SNP and/or indel detection using reads.

The current WGS workflow ( Figure 3 ) begins with the isolation of M. bovis from de-contaminated tissue samples on solid (e.g., Stonebrink, 7H11-OADC) or liquid media (e.g., 7H9-OADC, MGIT-mycobacterial growth indicator tube), followed by the extraction of its DNA, library preparation, and WGS using short-read technologies (e.g., Illumina platforms). Special attention must be given to the quality of extracted DNA and the use of library kits that can accommodate high-GC content bacteria [70] . DNA extraction of mycobacteria is not trivial; the lipid-rich cell wall interferes with yield and DNA purity, which may affect library construction [70] . An optimized extraction protocol of non-tuberculous mycobacteria for long-read sequencing has been recently proposed [134] . Once DNA is successfully extracted and sequenced, generated reads need to undergo quality checks and are processed in specific data pipelines tailored to each need. For epidemiology purposes, WGS can be used to assess the genetic relatedness among isolates to address transmission investigation, surveillance, and/or lineage identification ( Figure 3 ). Current methodologies are typically based on the identification of SNP and indel differences between or among isolates. Basically, the greater the SNP and indel difference between two isolates, the lower the probability they are related to each other. SNPs and indels are ultimately identified by mapping the quality-checked reads to a reference genome and calling the variants.

Step 01

Step 02

Step 03

Step 04

Step 05 . Mycobacterium bovis whole-genome sequencing (WGS) workflow from bacterial isolation to data analysis. SNP: single nucleotide polymorphism. * Time is highly dependable on library kit and sequencing protocol. ** MTBC-specific and general parameters are described in detail in the text, but overall this includes FastQC parameters, minimum established sequencing coverage, contaminating reads, species confirmation, mixed-strain evaluation (depending on the purpose of the analysis), and homogeneous sequencing coverage (after reference genome mapping). MTBC: Mycobacterium tuberculosis complex.

The quality of WGS reads can dramatically impact the study outcome. Therefore, quality assessment is considered the first step in data analyses. Once laboratory-specific [135] and standard quality controls associated with the sequencer run are evaluated and errors originating from the sequencer itself are ruled out, generated FASTQ files normally undergo general and mycobacteria-specific quality assessments and processing. Accordingly, following adaptors removal with appropriate software [136] [137] [138] [139] , an overall quality evaluation of reads is typically performed using FastQC [140] or similar software [139, [141] [142] [143] . Quality parameters are frequently evaluated rather manually, by analyzing the QC report of each FASTQ file or by using software that compile multiple-sample QC reports [141, 142] . Nevertheless, these are important measures to ensure that high-quality sequencing data are used in downstream analyses. Based on this QC evaluation, FASTQ files are usually processed to remove low quality data.

Parameters such as anomalous GC content, duplicated sequences, per base and per sequence qualities, per base N content, sequence length distribution, among others can be addressed according to pre-established and/or default thresholds. The detection of anomalous GC content may indicate possible sample contamination, as any peak differing from the high value of mycobacteria (≈65%) are not expected. A high level of sequence duplication indicates errors or enrichment bias related to PCR amplification and sequencing that are not expected to occur in WGS [140] . Duplicated sequences (e.g., PCR duplicates) are normally removed downstream in the pipeline, after read mapping, using appropriate software [144] . In addition, reads are often trimmed and filtered out according to quality, with user-specified or default thresholds. Protocols of trimming with different stringency levels have been tested for eukaryotes, showing that variations of parameters may significantly affect end-results [145, 146] . Basic FASTQ processing (e.g., adaptor removal, read trimming, read filtering, removal of duplicates, among others) can be performed by using a combination of different software or by using all-in-one tools [143] . Finally, QC results should be evaluated before and after file processing to guarantee that minimum quality standards have been reached with appropriate read length distribution.

Unfortunately, sequencing files can often contain contaminating reads, i.e., reads not originating from the target genome [147] [148] [149] [150] . These contaminants may or may not result in discrepant GC content peaks. Their presence is sometimes inevitable and challenges for eliminating contaminant reads have been addressed previously [34] . If not evaluated beforehand, their presence may be detected only at read mapping or genome assembly, or go undetected and result in false positive or negative SNPs [151] . One way to check for contaminants is to use FastQ Screen [152] or Kraken [153] , and if desired, filter out unwanted reads following a pre-established threshold of sample contamination acceptance [152] . One study using Kraken defined a threshold of at least 90% of the reads taxonomically assigned to MTBC for the sample to be included in the analyses [154] . However, because MTBC genomes are highly similar, it is difficult to control for cross-contamination when sequencing several MTBC isolates at once [155] . Heterozygous sites may occur, and the sample falsely considered a mixed-infection.

There are three mycobacteria-specific sequencing quality criteria that can be evaluated: (i) homogenous sequencing coverage; (ii) RD identification; and (iii) within-host genetic diversity. One of the advantages of MTBC clonality is that read-mapping coverage to a reference genome can be utilized as measure of homogeneous sequencing coverage of the target genome. When mapping high-quality reads to a MTBC genome, a high mapping coverage (>95%) of the reference genome is expected [156] [157] [158] [159] . Percentage cut-offs may be established [45, 160] , because substantially low percentages are likely to indicate that the target genomes were not evenly sequenced. In addition, the presence of species-specific RDs in the target genome must be assessed. In our laboratory, we have identified a number of MTBC genomes deposited in public databases with mistakenly-assigned MTBC species [45, 161] . As MTBC members have high genomic and phenotypic similarity, errors in species identification may occur. Therefore, even if the bacterial isolate was obtained from cattle tissue, M. bovis specific RD patterns should be confirmed. This confirmation can be performed using reads, by checking RD regions through reference genome-mapping [45] , or by running the automated software RD analyzer [162] .

Another major challenge of SNP-based approaches for MTBC WGS analysis is within-host genetic diversity [163] . The amount of MTBC genetic diversity an individual carry depends on the time between infection and development of active disease (within-host evolution, i.e., microevolution) and/or the number of strains this individual was exposed at single or multiple infection events through life (mixed-infection) [163] . Microevolution occurs during long-term co-existence between pathogen and host and is characterized by a single infection event leading to bacterial mutations over time. On the other hand, mixed-infection occurs when the individual is exposed to a single or repeated infection events through life of different strains, and is thus carrying distinct strains of MTBC [163] . If DNA is extracted from the primary isolate without bacterial propagation from a single, de-clumped colony, there may be simultaneous sequencing of more than one strain in a sample. Thus, when mapping reads to the reference genome and calling variants, heterozygous sites may arise. More details regarding this issue are given in the following sections.

A closed, complete genome must be chosen as reference for read mapping and variant calling. The choice of reference genome can dramatically alter the end-results [164, 165] and it is still a controversial matter [34] . Lack of standardization of reference genomes halts comparisons between pipelines and laboratories. Ideally, the reference genome must have all DNA segments present in the bacterial population under study. If the reference genome has deleted regions compared to the genomes being tested, genetic diversity may be missed. The evolutionary distance between the microbial genomes under study and the reference genome should also be taken into account [165, 166] . For instance, if M. tuberculosis genomes are used as reference for M. bovis studies, the number of detected SNPs increases dramatically [47, 167] , which may lead to errors in read mapping and variant calling [165] , substantially increasing computer usage and time.

WGS-based studies of M. bovis often use as reference the genome of M. bovis AF2122/97 (Table S3) . One recent study has proposed the use of an outbreak-matched M. bovis genome as reference in France [168] . Studies of M. tuberculosis have used the M. tuberculosis H37Rv genome, lineage-or outbreak-matched genomes, or an inferred ancestral MTBC genome, which have been reviewed elsewhere [34] . The use of a MTBC pan-genome as reference, i.e., a gene pool representing the whole diversity of MTBC genes, has also been suggested, but never evaluated [34] . Intergenic regions, however, should not be neglected in future technical validations of these approaches. Recently, a computational pan-genome of M. tuberculosis (in this case, a dataset of whole genome sequences, and not simply core and accessory genes) with 5,205,216 bp obtained from 146 M. tuberculosis genomes has been proposed as a reference genome for this species [169] . Considering that a M. bovis transmission cluster may be defined or ruled out by just few SNPs (see following sections), more comprehensive studies on the effect of the reference genome on SNP identification should be performed.

Bacterial SNPs and indels, as well as structural variants (SVs-indels, duplications, inversions, and translocations >50 bp), can be identified through de novo genome assembly followed by comparison against a reference genome, or by mapping reads to a reference genome [170] . When using assembled genomes, failure to properly identify variant calls can occur due to assembly errors or misidentification of indels [170] . It is more appropriate and faster to use the complete information provided by the reads than relying on assemblers and consensus base callers [171] to detect variants. Thus, mapping reads to a reference genome is the preferred first step to detect high quality variants. Numerous short-read aligners have been developed (to cite a few [171] [172] [173] [174] [175] [176] [177] [178] [179] [180] [181] ) and additional information about mapping principles have been reviewed [179] .

Different approaches of read mapping and variant calling are described in M. bovis studies (Table S3) . Among the most widely used short-read mapping tools are Bowtie/Bowtie2 [172] and BWA/BWA-SW [174] . Output alignment files are subsequently processed to call and generate a list of high-quality variants using tools available from toolkits or pipelines such as VarScan2 [182] , SAMtools [173, 183] , and GATK (Genome Analysis Toolkit) [184, 185] . PCR duplicates are frequently removed after read mapping with Picard (MarkDuplicates; https://broadinstitute.github.io/picard/) or SAMTools (rmdup) [173, 183] ; but the actual necessity for this step has not been systematically evaluated using MTBC genomes. VarScan2 combines a heuristic method coupled with statistical algorithm to detect mutations from read mapping, and integrates identification of SNPs, indels, or both (mpileup2snp, mpileup2indel, mpileup2cns, respectively). On the other hand, SAMtools and GATK are probabilistic methods, implementing Bayesian statistics. SAMtools contemplate the bcftools to call variants [173] , while GATK version 4 uses HaplotypeCaller [184, 185] . Unfortunately, variant callers have been mostly benchmarked with human genomes, which may lead to the report of false variants when analyzing microbial genomes [170] . More recent studies showed marked result differences among pipelines for variant detection in WGS studies of M. tuberculosis [160, 165] and other bacteria [166, 186] .

One of the reasons for these discrepancies is that MTBC studies vary widely on the parameters adopted to map reads and call high-quality variants; no standards have been determined. The choice of parameters greatly influences variant detection (e.g., base call and mapping quality scores, tail distance, presence of variants on both strands for paired-end reads, read depth, minimum allele frequency, maximum number of SNP calls within 10-12 bp, local assembly or realignment around indels, and strand bias) [170] . Sequencing coverage, PCR duplicates, mapping artefacts around indels, SVs and repetitive or duplicated regions may also result in false positive (FP) and/or false negative (FN) calls [170] . As part of MTBC-specific measures, it is common practice to exclude SNPs and indels associated with repetitive DNA, such as PE/PPE family genes, phage genes, repetitive family 13E12 genes, transposases, and integrases, following their identification through annotation or genomic position, or by excluding selected genes from the reference genome [165, 187] . However, comprehensive evaluation of the true probability of these calls being FP or FN are lacking. By using read simulation and comparison to long-read sequencing, a recent study has shown that SNPs detected in PE/PPE regions were highly unlikely to be FP calls when using BWA read mapping and Pilon variant caller [188] (which is a microbial variant caller). In contrast, another study has shown that both FP and FN calls are disproportionately present in PE/PPE regions in a multi-variant caller comparison [165] . This contradiction highlights the need for additional studies. Thus, significant challenges remain to be overcome in order to define the best parameters to call variants and how to handle low-quality variant calls.

Two major technical challenges arise in variant calling when there is within-host genetic diversity: establishment of minimum allele frequency to identify a site as heterozygous, and the minimum number of heterozygous variants in a sample to be considered a "mixed-sample" condition. Importantly, the ability to uncover these parameters in M. tuberculosis studies is also directly affected by sequencing coverage [189] . The minimum allele frequency is a fixed threshold (≈75% to 95%) for the proportion of reads supporting a particular variant call. Sites with the percentage of reads falling below this threshold are considered heterozygous and hence used to support a mixed-infection. Unfortunately, there is no consensus on the percentage level that should be used.

Once heterozygous sites have been identified, two strategies are commonly applied to determine a mixed-sample condition: a cut-off proportion of heterozygous sites to total variants, or a minimum total number of heterozygous sites [21, 45, 51, [190] [191] [192] [193] [194] [195] [21, 45, 51, 192, 194] , consider these heterozygous sites in the context of the contact chain being analyzed to resolve transmission networks [21, 194, [196] [197] [198] [199] , or excluded heterozygous sites from downstream analysis [43, 193, [200] [201] [202] .

Individual animals or humans carrying MTBC isolates with distinct SNP profiles have been described [21, 48, 51, 156, 163, [194] [195] [196] [197] [198] [203] [204] [205] [206] [207] [208] . As explained above, such conditions occur when there is microevolution and/or mixed-infection. Both concepts, the manner in which within-host genetic diversity is detected, and their application to the definition of transmission clusters have been initially defined with M. tuberculosis and later applied to M. bovis WGS studies [21, 45, 47, 51, 194, 195] . Owing to the low substitution rate of MTBC [46, 190, 196, 202, 209, 210] , the number of acquired SNPs by M. tuberculosis strains under a microevolution process (within-host evolution) has been estimated to be very low, usually in single digits [29, 163, 190, 196, 197, 209, 211] . On the other hand, mixed-infection is defined when two or more M. tuberculosis isolates obtained from an individual differ by a great number of SNPs [29, 163, 196] .

Microevolution can be detected at the individual or transmission cluster levels ( Figure 4A ). Very often, microevolution is only detected at the latter, because the whole extent of within-host genetic diversity is frequently missed due to insufficient individual sampling [163] . When these low SNP distances are inferred among individual samples in a cluster, they are used to define them as part of the same transmission cluster. In other words, the same M. bovis strain was transmitted from one animal to the other, and the amount of genetic changes accumulated is zero or just a reflection of within-host evolution (i.e., microevolution) represented by very few SNPs. Contrastingly, if the number of SNPs between two individual samples is too high, they are not considered part of the same transmission cluster ( Figure 4B ).

At the transmission cluster level, if this within-host genetic diversity is captured with adequate sampling, an individual may be considered part of two transmission clusters, representing different infection events that occurred over time ( Figure 4B ). More likely the within-host genetic diversity is not entirely captured, and pathogen transmission between or among individuals may be mistakenly discarded. In other words, if a cow is infected with two distinct strains of M. bovis differing by a great number of SNPs, and only one is sequenced but both are transmitted to other cows, one of the animals in the transmission chain will go undetected as part of that cluster. It is important to highlight that an individual may get re-infected with the same strain, which would be impossible to distinguish using current analytical methods. The actual impact of within-host genetic diversity on the transmission dynamics and pathogenesis of M. bovis remains to be comprehensively studied. . When a great SNP distance is found between two M. bovis isolates from different hosts, these animals are not considered part of the same transmission cluster. However, if an animal is infected with two strains differing by a great number of SNPs (i.e., mixed-infection), it may be identified as participating in two different transmission clusters (cluster 1 and cluster 2). If the within-host genomic diversity is not entirely captured, one of the transmission clusters may be missed. This animal with a mixed-infection may also transmit both strains to another animal (cluster 3), and if the diversity is entirely captured, both animals will be considered as part of the same cluster.

Few approaches have been used to interpret variant calling data in M. tuberculosis and M. bovis studies. The simplest one is to use the absolute number of detected SNPs (indels are excluded) to infer relatedness based on predefined thresholds. This methodology has been often employed in M. bovis studies [21, 47, 51, 210, 212] (Table S3 ). Based on solid epidemiological links, SNP thresholds have been established to distinguish within-host microevolution from mixed-infection of M. tuberculosis [190, 193, 197, 209] . Consequently, the same SNP thresholds were also applied to distinguish isolates belonging to a transmission cluster or not, helping define a transmission chain of M. tuberculosis or M. bovis and differentiate relapse from re-infection in M. tuberculosis infections [21, 47, 51, 163, 190, 193, 196, 197, 199, 209, [213] [214] [215] [216] [217] (Table S3 ). One of the first studies to define SNP thresholds evaluated M. tuberculosis isolates obtained from chronically infected patients, epidemiologically linked patients, and outbreaks with confirmed transmission chain observed in the UK (a low-burden country) from 1994 to 2011 [196] . A maximum of five SNPs was defined as the limit to infer a transmission cluster or microevolution. Similar thresholds have been confirmed in later studies performed at low and high burden settings [190, 197, 209, 211] , and is commonly accepted that five SNPs can be used as a stringent threshold and 10 or 12 as a more relaxed threshold [29, 163] . Nevertheless, SNP thresholds described in the literature of M. tuberculosis vary [163] . It is known that these thresholds may be influenced by variant calling protocols, culture or sampling, read depth, and epidemiological links used to first define them, which makes them unlikely to be adequately transferred between settings and studies [163] . Thresholds have never been determined for M. bovis, which is likely subjected to different evolutionary pressures compared to M. tuberculosis. Moreover, owing to the possibility of false positives, indels are usually excluded from the analysis; just few studies of M. tuberculosis WGS have used this information to better resolve clusters [209, 218] .

Established SNP thresholds defining recent transmission events were calculated according to the evolutionary rate of M. tuberculosis, reported as 0.3-0.5 SNP per genome per year [190, 196, 209] . It is unknown if the same rate applies to M. bovis. Estimated substitution rates of M. bovis range from 0.15 to 0.53 substitutions per genome per year [46, 202, 210] . However, these studies either examined a limited number of isolates [210] or geographically restricted samples [46, 202] . The correct estimation of M. bovis substitution rates has significant implications for the definition of the amount of genetic changes needed to define a transmission cluster, and for the temporal resolution WGS can provide to study disease dynamics in bTB [210] . Although it is possible that M. bovis-derived SNP thresholds are not very different from M. tuberculosis, the paucity of knowledge regarding M. bovis evolution and DNA repair mechanisms implies that more in-depth evaluation should be conducted. It is unknown, for instance, if the phenotype of broad host tropism [52] influences replication and substitutions rates of M. bovis over time.

Traditional MLST (multi-locus sequencing typing) is based on the identification of mutations in a pre-established, limited number of bacterial genes. In order to incorporate the whole gene repertoire of a bacterial species and WGS technology, cgMLST or pgMLST (core genome or pan-genome MLST) schemes based on MTBC core or pan-genome genes (core genes plus some accessory genes), respectively, have been applied [219] [220] [221] [222] . Briefly, the obtained list of SNPs (indels are normally excluded) is translated into a standardizable allele numbering system. SNPs are identified in a pre-defined allele dataset of selected MTBC species; any particular gene identified with a SNP is giving a number. Each sample is then given a sequence type (ST) determined by a combination of allele numbers. In other words, these schemes are based on the concept of allelic variation. STs generated from the bacterial population under study can then be used to generate minimum spanning trees to define transmission clusters [219] [220] [221] [222] . One great advantage of these methods is the possibility of generating a nomenclature that can be readily compared between laboratories, which is vastly appreciated for disease control and eradication programs. However, by using these approaches, information from intergenic regions may not taken into account, unless intergenic loci are added to the initial reference dataset. In addition, there may be variation in the gene pool or gene annotation inconsistencies among different strains of M. bovis and MTBC [156, 161, 223, 224] that may lead to errors when initially defining the gene repertoire to serve as alleles. This approach has been independently applied in M. bovis isolates from a Brazilian State approaching bTB eradication status, revealing recent transmission between farms and multiple M. bovis introductions within the same farm [225] .

Most M. bovis WGS studies use phylogenetic methods to define potential clusters of pathogen transmission, to evaluate populational structure of M. bovis, and/or for surveillance purposes [21, 43, [45] [46] [47] [48] [49] [50] [51] 161, 167, 194, 202, 210, 212, 217, [226] [227] [228] [229] (Table S3 ). In general, phylogenetic trees are constructed from alignments (i.e., matrices) of concatenated SNPs identified in each M. bovis genome under study. These trees are generated using different algorithms, such as maximum likelihood, maximum parsimony, neighbor-joining, or Bayesian inference. This approach provides clusters of associated M. bovis isolates, but additional analyses are normally performed to ascertain a transmission chain [21, 44, 47, 50, 51, 202, 210, 212] . In phylogenetic trees, transmission pairs do not always appear phylogenetically related or associated; phylogenetic trees are not a complete substitute for a transmission network [163, 230, 231] . Nevertheless, Bayesian inference schemes have also been used to estimate temporal scales of bTB outbreaks by dating ancestries of the bacterial population under study [43, 46, 49] . When used to study M. bovis populational structure or evolutionary dynamics in countries or globally, phylogenetic reconstruction has always been the preferred method [21, 45, 46, 48, 50, 167, 194, [226] [227] [228] [229] (Table S3 ). However, it is important to understand that only core SNPs will be considered. All indels and variant sites that are not present in all strains are excluded from the analysis.

Genomic regions containing homopolymers or tandem repeats can lead to false reports of indels and/or SNPs due to sequencing errors or inaccurate read mapping. In addition, small and large indels are difficult to be accurately detected [154, 170, 232] . Therefore, current pipelines to infer M. tuberculosis or M. bovis transmission normally exclude indels or variants detected in repetitive, duplicated, and/or low-complexity regions [21, 34] . Repetitive regions and duplicated genes are likely subjected to distinct evolutionary rate [233] . Thus, with the advent of more accurate variant callers and parameters, as well as long-read sequencing, the inclusion of such sites may provide further resolution for outbreaks as well as changes to the current SNP thresholds for definition of a transmission cluster in the future. Long-read sequencing technologies are powerful and promising tools that can uniquely identify the genomic origin of the read, helping resolve repeat regions, and determining large deletions or rearrangements [234] . However, a major drawback of such technologies is the low base calling accuracy when compared to short read technologies, which is detrimental for variant detection [234, 235] . Hybrid systems, including the association of long-and short-read data, have been proposed to correct base calling errors [234, 236] . In the future, it is expected that an increased accuracy in base calling of long-read technologies will revolutionize genome sequencing. Advantages and disadvantages of long-read sequencing compared to short-read sequencing have been recently reviewed elsewhere [234] .

WGS does not eliminate the identification and reporting of spoligotypes and MIRU-VNTR patterns of samples under study. SpolPred [237] and SpoTyping [238] are two software developed to detect spoligotypes from short-read sequences in FASTQ format. SpoTyping also accepts assembled contigs in FASTA format as input and is reported to be 20-40 times faster than SpolPred [238] . Both software have reported identical spoligotypes in a dataset tested [238] . More recently, a methodology to reconstruct the whole CRISPR locus of MTBC strains have been proposed [239] and is awaiting further investigation for its applicability as a typing tool.

In contrast to spoligotyping, which is based on a single locus, the identification of MIRU-VNTR profiles using WGS data from short-read sequencing has been more challenging. An algorithm to assign 24-loci MIRU-VNTR profiles to isolates using draft and complete genomes have been described [240] , provided that genomes meet a minimum-quality assembly. More recently, a software that uses long-read sequences obtained using Pacific Biosciences and Oxford Nanopore Technologies as input data has been developed [241] , aiming to overcome the difficulties encountered with the long repeats of the MIRU-VNTR loci that may not be resolved with short-read sequencing.

Interpretation of genotyping and WGS data is challenging because the sampling of the population of interest is often partial and/or biased, and there is a variable interval between time of infection (i.e., when the transmission occurred) and time of sample collection [43, 210] . In addition, transmission routes and intervals may be uncertain due to the slow evolving rates of the MTBC [46, 190, 196, 202, 209, 210] and possible differences in substitution and replication rates between active replication state and latent state [43, 210, 242] . To circumvent some of these issues, the population of interest should be sampled in a manner that the epidemiological processes are captured [46, 210] .

Sole genetic data may not be sufficient to detect transmission in human or bovine TB outbreaks [26, 163] . Identified transmission networks based solely on genetic data can be different from the network of actual transmission events if detailed field investigations are not performed [26, 163] . In particular, highly clustered transmission networks can introduce uncertainty to the evaluation of transmission dynamics, especially when lower resolution genotyping is applied (e.g., MIRU-VNTR PCR and spoligotyping). Challenges associated with clustered networks have been reviewed elsewhere [26] , but in general, clustering adds uncertainty to the identification of infection source and transmission patterns. To provide better resolution, the genetic data must be dense (i.e., well sampled) and complemented by good quality, collected epidemiological and demographic parameters. Accordingly, the association between WGS and network data have been elegantly applied to investigate bTB outbreaks at the local level [51, 210] . The association between network, spatial-temporal mathematical models and WGS is the ideal situation to correctly describe the transmission dynamics of a particular outbreak [26, 51, 210] and propose targeted interventions. These are very powerful approaches to delineate disease control strategies in the long-term, particularly in a multi-host system; however, such refined analyses may not be easy to implement in bTB control programs requiring real-time transmission investigation.

Once a transmission cluster, an infection source or a single infected-case or farm has been detected using WGS, such information needs to be communicated to end-users, e.g., veterinarians, epidemiologists, program officials, among others. Preferentially, reporting must be standardized and comparable among different veterinary services, connecting federal, state or province, and local stakeholders. Unfortunately, no standards exist on how these WGS reports must be, while capacity building is expected to play a crucial role in guaranteeing correct interpretation of the results. In other words, the improvement and acquirement of skills, knowledge, equipment, and general resources by personnel involved are vital for success of WGS-based programs. As bTB is an OIE-notifiable disease, M. bovis WGS-based surveillance systems can greatly benefit from general, robust disease reporting systems already in place in many countries [243] . An ideal system would be able to register: (i) an outbreak with genome-based transmission links that were detected using standardized data generation and bioinformatics pipelines, and (ii) individual cases or farms reporting M. bovis genomes that can be compared with a comprehensive database for the prospective identification of transmission links in a disease surveillance context, using the same standardized pipeline ( Figure 5) . A standardized bioinformatics pipeline, publicly available, has been developed by the National Veterinary Services Laboratories (NVSL) of the US Department of Agriculture (USDA), which implemented the use of M. bovis WGS in its official bTB program in 2013 [21] . Other pipelines (for detection of antibiotic resistance, strain typing, and/or transmission detection) have also been reported for M. tuberculosis and reviewed elsewhere [34] . Increasing efforts must be made to provide standardized end-to-end processes that are affordable and easily managed by non-experts. Figure 5 . Components of a Mycobacterium bovis whole-genome sequencing (WGS) pipeline. In grey: components of the pipeline; in green: files that will compose the sequence and analysis databases (raw reads-FASTQ-and vcf file); in pink: metadata, which can also compose a metadata database linked to each FASTQ and vcf file; in purple: the possibilities of genome comparisons: (i) a user can choose to compare the genome against all genomes of the database or against a subset of genomes composing the database; or (ii) a user can input several genomes that can be compared against each other or with other genomes deposited in the database. An ideal pipeline would also allow periodic national surveillance reports, emitting alerts of newly detected clusters or outbreaks in certain regions that warrant further attention according to user-specified thresholds.

An important example of a standardized laboratory network for WGS reporting is GenomeTrakr [37] (US Food and Drug Administration), used for foodborne pathogens. Others exist for viral pathogens [244, 245] . Reflecting GenomeTrakr structure, an effective integration between veterinary, public health, university, and industry laboratories would be of utmost interest to report M. bovis WGS data as part of national control programs. These laboratories can undergo proficiency tests to ensure quality control and standardization in generating and depositing data to a common database [246] . Once sequencing data is deposited in public databases, further comparison and identification can be fast and efficient, provided there is an effective bioinformatic pipeline established.

Report guidelines for animal health surveillance (AHSURED) were recently proposed, aiming at a systematic description of the means by which the output of surveillance has been generated for a particular disease [247] . Through a survey of experienced professionals working in animal surveillance for State Authorities, a consolidated checklist of items to be reported was generated. Although these guidelines are not specific to any bacterial typing technique or transmission cluster identification method, its applicability using WGS data remains to be tested. Other initiatives aimed at harmonizing the documentation of disease surveillance and reporting include the SANTERO (http://santero.fp7-risksur.eu/), HOTLINE (https://www.thehotlineproject.org/), and RISKSUR (http://www.fp7-risksur.eu/) projects. Guidelines for reporting cohort, case-control, and cross-sectional studies of veterinary diseases have also been proposed (STROBE-Vet Statement; https://meridian.cvm.iastate.edu/strobe). The inclusion of pathogen WGS for infection source identification and contact tracing in these projects has never been evaluated.

Spoligotyping and MIRU-VNTR PCR have been often applied to resolve local clustering on larger scales [26] (Table S1 ). However, their power to discriminate within-cluster events or at the farm-to-farm scale is rather limited [26, 29, 202] (Figure 1 ). In such instances, WGS may provide the resolution to finely resolve transmission patterns happening at the individual herd level, in clusters of small spatial extent [48, 210] , or in countries where bTB prevalence is almost null and re-introduction outbreaks occur due to a single-sourced M. bovis strain [43] . Accordingly, many studies show that WGS is useful to differentiate M. tuberculosis strains with identical MIRU-VNTR genotypes, proving superior resolution [29, 197, 209, 248, 249] . Frequently, traditional typing methods of M. bovis depict the same or few genotypes distributed over relatively large local areas or encompassing a great proportion of the tested isolates under study [29, 48, 210, [250] [251] [252] . Such lack of resolution is troublesome for the detection of M. bovis transmission between farms or between cattle and wildlife, especially in regions approaching free-status, with low bTB prevalence. WGS may provide an opportunity to solve this problem.

On the other hand, WGS studies evaluating transmission of M. bovis in high-burden countries or regions with high M. bovis genetic diversity are lacking. Sometimes, the genetic diversity given by MIRU-VNTR and spoligotyping is so high in the region and/or in the sample set being tested that an infection source cannot be accurately identified [199, 253] . The applicability of M. bovis WGS in these instances remains to be elucidated.

The rate of genetic variation of a pathogen has implications for the scale at which the epidemiological events can be resolved using DNA typing data [254] . Accordingly, the use of WGS has been particularly advantageous to trace RNA virus outbreaks, owing to their high substitution rate. However, MTBC has a much lower evolutionary rate compared to these pathogens. As such, the resolution power of WGS for MTBC at the animal-to-animal or human-to-human level may be poor depending on the scenario [196, 202, 209, 210, 255] . In other words, zero or only very few SNPs between or among MTBC isolates are detected, leading to a failure in describing transmission links carrying meaningful information for prospective interventions. This is not a restraint of the WGS technology per se, yet a consequence of the low mutation rate of MTBC when compared to fast evolving pathogens, such as viruses. Regardless of this limitation, for both M. tuberculosis and M. bovis, it has been concluded that the epidemiology of outbreaks can greatly benefit from WGS data, providing better resolution than any other genotyping technique [26, 34, 202, 249, 256, 257] .

In a bacterial genome, repeat regions exhibit faster evolutionary rates compared to non-repeat regions [233] . MIRU-VNTR and spoligotype genomic regions have been successfully applied for genotyping because these are rapidly evolving regions of repetitive DNA. As explained above, the loss and gain of fragments within these regions drive the identification of genotyping patterns. Therefore, the genetic variation given by MIRU-VNTR PCR and spoligotyping is not depicted in current whole-genome data interpretation, which is based on SNP divergences. It also means that WGS is presently based on signals arising from the slowest evolving regions of the bacterial genome. The use of long-read technologies in the future may allow for more informative sites from repetitive regions to be included in the analysis, which may improve the applicability and resolution of WGS in epidemiology.

In the past years, WGS has helped define MTBC lineages, particularly those adapted to humans (M. tuberculosis L1 through L4 and L7, and M. africanum L5 and L6) [258] . Mycobacterium tuberculosis and M. africanum global lineage distribution has been associated with geography and human populations, and later shown to have distinct profiles of virulence and drug resistance acquisition [258, 259] . Similar attempts to classify M. bovis genetically have been made by using a limited set of markers, leading to the classification of clonal complexes (CCs). Accordingly, four M. bovis CCs have been described (African 1 and 2, European 1 and 2), and these are determined based on specific deletions ranging from 806 to 14,094 bp, few SNPs and spoligotypes [260] [261] [262] [263] . Similarly to M. tuberculosis lineages, CCs appear to be geographically segregated, with African 1 and 2 restricted to Africa, European 2 usually found in the Iberian Peninsula, and European 1 distributed globally [118, [260] [261] [262] [263] . However, M. bovis WGS studies indicate that not all isolates can be classified into these complexes, indicating that CCs do not represent the whole genetic diversity of M. bovis [21, 48, 161] . More recently, a global collection of 1,969 M. bovis genomes from different countries has been analyzed using whole-genome based phylogenetics [45] . This study proposed the existence of at least four distinct global lineages of M. bovis (Lb1 to Lb4), geographically segregated and not fully represented by CCs. There were still few M. bovis genomes without CC markers that could not be classified in any of these lineages (unknown clusters 1, 2 and 3) [45] . Another study also described M. bovis isolates without CC classification in France and suggested that these might be country-specific lineages [228] . As these French M. bovis genomes have not been compared to global genome collections, their lineage classification remains to be unraveled. As more M. bovis genomes are sequenced in the future, particularly from Africa and Asia, a more complete picture of M. bovis lineages global distribution will be determined. The continuous investigation of M. bovis genomes at the global level will provide opportunities to understand differences in virulence and transmission profiles underlying the current disease distribution.

Mycobacterium caprae is a causative agent of TB in animals of the Bovidae family [264] [265] [266] . This pathogen has been mostly detected in the European continent, with few reports of M. caprae in animals outside of Europe and cases of zoonotic TB in European patients detected in other countries. Accordingly, one strain of M. caprae was isolated from cattle in Algeria but has been linked to a possible introduction from mainland Europe [267] . In Morocco, three isolates of animal MTBC with intact RD4 and M. caprae-associated spoligotype were obtained from cattle [268] , and in Japan, one captive Borneo elephant was found infected with M. caprae [269] . With a similar generalist tropism for hosts compared to M. bovis, M. caprae has been isolated from humans, goats, sheep, cattle, pigs, red deer (Cervus elaphus), wild boars (Sus scrofa), foxes (Vulpes vulpes), European bisons (Bison bonasus), Borneo elephant, and captive dromedary camel (Camelus dromedarius) [264] [265] [266] . In Spain, the number of cattle farms from which M. caprae was isolated accounted for 0.85-6.67% of the total number of herds with bTB, a number that is increasing over years [264] . WGS has been successfully used for contact tracing of M. caprae in cattle herds from Germany, showing evidence of within and between farm transmission [44] .

More recently, the possibility of M. orygis as a primary pathogen species causing bTB in South Asia has been raised due to the observation of TB caused by this species in people from the region [192, 270] . However, very little is known about the true host range of M. orygis, as it has been isolated from cattle, oryxes, gazelles, deer, antelope, waterbucks, and non-human primates [271] . A single outbreak of M. orygis in a dairy farm of mixed-breed animals of Bos taurus (Friesian breed) and Bos taurus indicus (Sahiwal breed), with 18 affected animals was reported [272] . As similar outbreaks in alternative species are also described for M. tuberculosis (e.g., elephants) and M. bovis (e.g., dogs) [273] [274] [275] [276] , further studies should be conducted on the actual host range of M. orygis and if cattle is a reservoir for this bacterial species.

In this review, we outlined current standards and/or challenges that remain to be unraveled on genotyping and WGS of M. bovis as tools for epidemiologic investigations. One important step towards implementation of WGS in programs of bTB control and eradication is certainly the standardization of data analysis and reporting of M. bovis WGS outcome. Research gaps associated with these subjects have been identified and described throughout this review (Table 1) . Although continuous efforts must be made to address these challenges, WGS ultimate implementation in bTB programs must also integrate systems administration, management of resulting databases, and maintenance of the pipeline. Another important aspect of standardizing data generation and analysis is to define sets of M. bovis isolates and genomes that can be used for validation of different approaches as well as between laboratories. Validation and inter-laboratory quality control Validation datasets (of bacterial isolates and genomes) Protocols for inter-laboratory standardization (from bacterial isolation to sequencing) * Technical validations should encompass the impact of choosing different parameters or stringency levels on the analysis output tailored for each need (contact investigation, surveillance, drug resistance detection), and also the relevance of these steps in the final outcome (are all these steps and parameters necessary to achieve the correct outcome?).

The field of bTB has unquestionably experienced many technique advancements for transmission investigation and surveillance, from genotyping to genome sequencing. Yet, the disease remains a significant challenge in numerous parts of the world. Many low-to-middle income countries have still to establish basic disease control and eradication programs, and they have not benefited from M. bovis genotyping in the past. Only few, developed countries, with well-established bTB control programs, have implemented M. bovis genotyping as an epidemiologic tool. In addition, many genotyping studies worldwide have been performed in a retrospective, research-oriented manner, frequently not providing real-time investigation to solve outbreaks. Nevertheless, these studies have been incredibly valuable to understand transmission dynamics at the local and country levels, providing important information for public policy implementations. Not surprisingly, the same developed countries with a tradition in applying genotyping techniques into their bTB programs have overcome barriers to apply M. bovis WGS in their transmission investigations or on a research-basis, such as the USA, Ireland, New Zealand, and France [21, 43, 46, 50, 51, 202, 210, 212, 228] . Data generated from these countries and beyond show that WGS provides superior resolution power when compared to traditional genotyping techniques. In addition, WGS provided the means to evaluate the global structure of M. bovis population, bringing valuable insights into the current disease distribution [45] .

It is evident that the research community has proven the usefulness of genotyping techniques for M. bovis transmission detection and surveillance and is now accumulating evidence on the applicability of WGS for the same purposes. However, compared to genotyping, WGS will likely see a much slower pace of employment in bTB programs and research. The requirement for an articulate bTB control and eradication program, specialized personnel, laboratory and computing infra-structure, good internet connectivity, streamlined operational procedures and protocols for data generation, availability of reagents, bioinformatic pipelines, and integrated and effective veterinary services are obstacles for widespread M. bovis WGS implementation in many countries [26, 34, 277] . In addition, despite continuous drops in prices, WGS can still reach a high-cost per sample, especially if just a few isolates need to be sequenced [26] . Thus, successful implementation of M. bovis WGS depends on multiple factors and will be contingent on the veterinary service strength, country-specific willingness to eradicate and control bTB, and investments. Most importantly, current stakeholders have to understand the value of such tools in controlling the disease, and this requires continuous research in different scenarios showing its applicability to resolve outbreaks.

The following are available online at http://www.mdpi.com/2076-2607/8/5/667/s1, Table S1 : List of research articles comparing traditional DNA typing methods using Mycobacterium bovis isolates (1998 to mid-2019), Table S2 : Characteristics of genotyping techniques and whole-genome sequencing (WGS) used in Mycobacterium bovis studies, Table S3 : Characteristics of whole genome sequencing studies of Mycobacterium bovis. Figure S1 : Schematic representation of the genomic regions involved in genotyping techniques used in Mycobacterium bovis studies.

Researched and analyzed the data; wrote and approved the manuscript, A.M.S.G. and C.K.Z. All authors have read and agreed to the published version of the manuscript. 

Genomic insights into tuberculosis

Bovine tuberculosis: An old disease but a new threat to Africa

A restatement of the natural science evidence base relevant to the control of bovine tuberculosis in Great Britain

Mycobacterium bovis (bovine tuberculosis) infection in North American wildlife: Current status and opportunities for mitigation of risks of further infection in wildlife populations

Epidemiology and control of Mycobacterium bovis infection in brushtail possums (Trichosurus vulpecula), the primary wildlife host of bovine tuberculosis in New Zealand

The epidemiology of Mycobacterium bovis in wild deer and feral pigs and their roles in the establishment and spread of bovine tuberculosis in New Zealand wildlife

Mycobacterium bovis: Characteristics of Wildlife Reservoir Hosts

An update on bovine tuberculosis programmes in Latin American and Caribbean countries

Zoonotic tuberculosis in human beings caused by Mycobacterium bovis-A call for action

Zoonotic tuberculosis due to Mycobacterium bovis in developing countries

Revised interpretation of the Hain Lifescience Genotype MTBC to differentiate Mycobacterium canettii and members of the Mycobacterium tuberculosis complex

Mutations in pncA, a gene encoding pyrazinamidase/nicotinamidase, cause resistance to the antituberculous drug pyrazinamide in tubercle bacillus

Pyrazinamide susceptibility and amidase activity of tubercle bacilli

Differences in primary sites of infection between zoonotic and human tuberculosis: Results from a worldwide systematic review

Food and Agriculture Organization of the United Nations (FAO); World Organisation for Animal Health (OIE)

Development of integrated surveillance systems for the management of tuberculosis in New Zealand wildlife

Mycobacterium bovis genotypes in Northern Ireland: Herd-level surveillance

Whole genome sequencing of Mycobacterium bovis isolated from livestock in the United States

Infection of wildlife by Mycobacterium bovis in France assessment through a national surveillance system

Molecular typing of Mycobacterium bovis isolates: A review

Molecular Epidemiology of Mycobacterium bovis in Humans and Cattle

Advances in molecular diagnostics for Mycobacterium bovis

Use of genomics to track bovine tuberculosis transmission

Molecular epidemiology of bovine tuberculosis. I. Mycobacterium bovis genotyping

Molecular differentiation of Mycobacterium bovis isolates. Review of main techniques and applications

The evolution of strain typing in the Mycobacterium tuberculosis complex

Sources of bovine tuberculosis in the United States

Whole-genome sequencing in outbreak analysis

Developing insights into the mechanisms of evolution of bacterial pathogens from whole-genome sequences

Transforming clinical microbiology with bacterial genome sequencing

Whole genome sequencing of Mycobacterium tuberculosis: Current standards and open issues

A systematic method for surveying data visualizations and a resulting genomic epidemiology visualization typology: GEViT

High-throughput sequencing in veterinary infection biology and diagnostics

Practical value of food pathogen traceability through building a whole-genome sequencing network and database

Chapter 1.1.7. Standards for High. Throughput Sequencing, Bioinformatics and Computational Genomics

Next-generation sequencing workflows in veterinary infection biology: Towards validation and quality assurance

Whole Genome Sequencing for Foodborne Disease Surveillance: Landscape Paper; World Health Organization (WHO)

The Use of Next-Generation Sequencing Technologies for the Detection of Mutations Associated with Drug Resistance in Mycobacterium tuberculosis Complex.: Technical Guide; World Health Organization (WHO)

EUSeqMyTB to set standards and build capacity for whole genome sequencing for tuberculosis in the EU

Descriptive epidemiology and whole genome sequencing analysis for an outbreak of bovine tuberculosis in beef cattle and white-tailed deer in northwestern Minnesota

Investigation of intra-herd spread of Mycobacterium caprae in cattle by generation and use of a whole-genome sequence

Global distribution and evolution of Mycobacterium bovis

Using whole genome sequencing to investigate transmission in a multi-host system

Mycobacterium bovis persistence in two different captive wild animal populations in Germany: A longitudinal molecular epidemiological study revealing pathogen transmission by whole-genome sequencing

Genetic profiling of Mycobacterium bovis strains from slaughtered cattle in Eritrea

Genomic Polymorphism Associated with the Emergence of Virulent Isolates of Mycobacterium bovis in the Nile Delta

Disease management at the wildlife-livestock interface: Using whole-genome sequencing to study the role of elk in Mycobacterium bovis transmission in

Combining genomics and epidemiology to analyse bi-directional transmission of Mycobacterium bovis in a multi-host system

Strain Variation in the Mycobacterium Tuberculosis Complex: Its Role in Biology

Linkage disequilibrium between minisatellite loci supports clonal evolution of Mycobacterium tuberculosis in a high tuberculosis incidence area

Global phylogeography of Mycobacterium tuberculosis and implications for tuberculosis product development

Stable association between strains of Mycobacterium tuberculosis and their human host populations

Out-of-Africa migration and Neolithic coexpansion of Mycobacterium tuberculosis with modern humans

Pre-Columbian mycobacterial genomes reveal seals as a source of New World human tuberculosis

A new evolutionary scenario for the Mycobacterium tuberculosis complex

Identification of variable regions in the genomes of tubercle bacilli using bacterial artificial chromosome arrays

Use of a Mycobacterium tuberculosis H37Rv bacterial artificial chromosome library for genome mapping, sequencing, and comparative genomics

Physical mapping of Mycobacterium bovis BCG Pasteur reveals differences from the genome map of Mycobacterium tuberculosis H37Rv and from M. bovis. Microbiology

Differentiation of Mycobacterium tuberculosis complex by PCR amplification of genomic regions of difference

Mycobacterium bovis nom.nov

In Bergey's Manual of Systematic Bacteriology

The Mycobacteria: A Sourcebook

Deoxyribonucleic acid relatedness among species of slowly-growing mycobacteria

Ecotypes of the Mycobacterium tuberculosis complex

Phylogenomic analysis of the species of the Mycobacterium tuberculosis complex demonstrates that Mycobacterium africanum, Mycobacterium bovis, Mycobacterium caprae, Mycobacterium microti and Mycobacterium pinnipedii are later heterotypic synonyms of Mycob

Comparison of sample preparation methods used for the next-generation sequencing of Mycobacterium tuberculosis

A tale of conservation and diversity

Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases

DNA Restriction Endonuclease Analysis of Mycobacterium tuberculosis and Mycobacterium bovis

DNA Restriction Endonuclease Analysis of Mycobacterium bovis and other members of the Tuberculosis complex

Geographic distribution of restriction types of Mycobacterium bovis isolates from brush-tailed possums ( Trichosurus vulpecula) in New Zealand

DNA restriction fragment typing of Mycobacterium bovis isolates from cattle and badgers in Ireland

Typing of Mycobacterium bovis isolates from cattle and other animals in the same locality

DNA fingerprinting of Mycobacterium bovis strains by restriction fragment analysis and hybridization with insertion elements IS1081 and IS6110

DNA polymorphisms in strains of Mycobacterium tuberculosis analyzed by pulsed-field gel electrophoresis: A tool for epidemiology

Improved preparation of high molecular weight DNA for pulsed-field gel electrophoresis from mycobacteria

Development of a new DNA extraction protocol for PFGE typing of Mycobacterium tuberculosis complex

Improved Isolation of Genomic DNA from Mycobacteria in Agarose Plugs by Rapid Lysis with a Combination of N-Acetylglucosaminidase and Lysozyme

Molecular typing methods used in studies of Mycobacterium tuberculosis in Iran: A systematic review

Comparison of PFGE, IS6110-RFLP, and 24-Locus MIRU-VNTR for molecular epidemiologic typing of Mycobacterium tuberculosis isolates with known epidemic connections

The changing face of the epidemiology of tuberculosis due to molecular strain typing: A review

Molecular typing of Mycobacterium bovis isolates from Cameroon

Genetic differentiation of Australian isolates of Mycobacterium tuberculosis by pulsed-field gel electrophoresis

Analysis of multidrug-resistant Mycobacterium bovis from three clinical samples from Scotland

Characterization of Mycobacterium tuberculosis insertion sequence, IS6110, and its application in diagnosis

IS6110: Conservation of sequence in the Mycobacterium tuberculosis complex and its utilization in DNA fingerprinting

New insights into the transposition mechanisms of IS6110 and its dynamic distribution between Mycobacterium tuberculosis Complex lineages

Pasteur, I. IS6110, an IS-like element of Mycobacterium tuberculosis

Insertion element IS986 from Mycobacterium tuberculosis: A useful tool for diagnosis and epidemiology of tuberculosis

Restriction fragment length polymorphism analysis using IS6110 as an epidemiological marker in tuberculosis

Strain identification of Mycobacterium tuberculosis by DNA fingerprinting: Recommendations for a standardized methodology

Van Calculation of the stability of the IS6110 banding pattern in patients with persistent Mycobacterium tuberculosis disease

Occurrence and stability of insertion sequences in Mycobacterium tuberculosis complex strains: Evaluation of an insertion sequence-dependent DNA polymorphism as a tool in the epidemiology of tuberculosis

Use of a repetitive element isolated from Mycobacterium tuberculosis in hybridization studies with Mycobacterium bovis: A new tool for epidemiological studies of bovine tuberculosis

Differentiation of Mycobacterium tuberculosis strains by use of a nonradioactive Southern Blot hybridization method

Molecular cloning of a highly repeated DNA element from Mycobacterium tuberculosis and its use as an epidemiological tool

Comparison of various repetitive DNA elements as genetic markers for strain differentiation and epidemiology of Mycobacterium tuberculosis

Spoligotyping and polymorphic GC-rich repetitive sequence fingerprinting of Mycobacterium tuberculosis strains having few copies of IS6110

CRISPR-Mediated Adaptive Immune Systems in Bacteria and Archaea

Insertion element IS987 from Mycobacterium bovis BCG is located in a hot-spot integration region for insertion elements in Mycobacterium tuberculosis complex strains

Nature of DNA polymorphism in the direct repeat cluster of Mycobacterium tuberculosis; application for strain differentiation by a novel typing method

Simultaneous detection and strain differentiation of Mycobacterium tuberculosis for diagnosis and epidemiology

Spacer oligonucleotide typing of Mycobacterium bovis strains from cattle and other animals: A tool for studying epidemiology of tuberculosis

Simultaneous detection and strain differentiation of Mycobacterium tuberculosis complex in paraffin wax embedded tissues and in stained microscopic preparations

Transfer of a Mycobacterium tuberculosis genotyping method, Spoligotyping, from a reverse line-blot hybridization, membrane-based assay to the Luminex multianalyte profiling system

Use of Luminex MagPlex magnetic microspheres for high-throughput spoligotyping of Mycobacterium tuberculosis isolates in Port-au-Prince

Mycobacterium tuberculosis complex CRISPR genotyping: Improving efficiency, throughput and discriminative power of "spoligotyping" with new spacers and a microbead-based hybridization assay

Replacing reverse line blot hybridization spoligotyping of the Mycobacterium tuberculosis complex

A DNA chip-based spoligotyping method for the strain identification of Mycobacterium tuberculosis isolates

Rapid spoligotyping of Mycobacterium tuberculosis complex bacteria by use of a microarray system with automatic data processing and assignment

Spoligotyping of Mycobacterium tuberculosis complex isolates using hydrogel oligonucleotide microarrays

Lab-on-chip-based platform for fast molecular diagnosis of multidrug-resistant tuberculosis

Spoligotyping of Mycobacterium tuberculosis complex isolates using ligation-based amplification and melting curve analysis

The global distribution and phylogeography of Mycobacterium bovis clonal complexes

Microevolution of the Direct Repeat Region of Mycobacterium tuberculosis: Implications for Interpretation of Spoligotyping Data

Variable number of tandem repeats (VNTR) and its application in bacterial epidemiology

Genetic diversity in the Mycobacterium tuberculosis complex based on variable numbers of tandem DNA repeats

Differentiation of strains in Mycobacterium tuberculosis complex by DNA sequence polymorphisms, including rapid identification of M. bovis BCG

Identification of novel intergenic repetitive units in a mycobacterial two-component system operon

Comparison of methods based on different molecular epidemiological markers for typing of Mycobacterium tuberculosis complex strains: Interlaboratory study of discriminatory power and reproducibility

Comparison of variable number tandem repeat and IS6110-restriction fragment length polymorphism analyses for discrimination of high-and low-copy-number IS6110 Mycobacterium tuberculosis isolates

Molecular typing of Mycobacterium tuberculosis based on variable number of tandem DNA repeats used alone and in association with spoligotyping

Proposal for standardization of optimized mycobacterial interspersed repetitive unit-variable-number tandem repeat typing of Mycobacterium tuberculosis

A web tool for polyphasic genotyping of Mycobacterium tuberculosis complex bacteria

MIRU-VNTR allelic variability depends on Mycobacterium bovis clonal group identity

A quantitative evaluation of MIRU-VNTR typing against whole-genome sequencing for identifying Mycobacterium tuberculosis transmission: A prospective observational cohort study

Proposal of a consensus set of hypervariable mycobacterial interspersed repetitive-unit-variable-number tandem-repeat loci for subtyping of Mycobacterium tuberculosis Beijing isolates

The complete genome sequence of Mycobacterium bovis

Updated reference genome sequence and annotation of Mycobacterium bovis AF2122/97

Complete nontuberculous mycobacteria whole genomes using an optimized DNA extraction protocol for long-read sequencing

Setup, validation, and quality control of a centralized whole-genome-sequencing laboratory: Lessons learned

Fastq-clean: An optimized pipeline to clean the Illumina sequencing data with quality control

Trimmomatic: A flexible trimmer for Illumina sequence data

Quality control and preprocessing of metagenomic datasets

A Quality Control for High Throughput Sequence Data

Qualimap 2: Advanced multi-sample quality control for high-throughput sequencing data

Summarize analysis results for multiple tools and samples in a single report

An ultra-fast all-in-one FASTQ preprocessor

Evaluating the necessity of PCR duplicate removal from next-generation sequencing data and a comparison of approaches

Evaluation of quality assessment protocols for high throughput genome resequencing data

An extensive evaluation of read trimming effects on illumina NGS data analysis

Diverse and widespread contamination evident in the unmapped depths of high throughput sequencing data

Reagent and laboratory contamination can critically impact sequence-based microbiome analyses

Human contamination in bacterial genomes has created thousands of spurious proteins

Unexpected cross-species contamination in genome sequencing projects

Contaminant DNA in bacterial sequencing experiments is a major source of false genetic variability

FastQ Screen: A tool for multi-genome mapping and quality control

Ultrafast metagenomic sequence classification using exact alignments

Integrating standardized whole genome sequence analysis with a global Mycobacterium tuberculosis antibiotic resistance knowledgebase

Identifying mixed Mycobacterium tuberculosis infection and laboratory cross-contamination during mycobacterial sequencing programs

Genome sequencing of Mycobacterium pinnipedii strains: Genetic characterization and evidence of superinfection in a South American sea lion (Otaria flavescens)

Novel Mycobacterium tuberculosis Complex Isolate from a Wild Chimpanzee

Lineage-specific SNPs for genotyping of Mycobacterium tuberculosis clinical isolates

Multi-clonal evolution of multi-drug-resistant/extensively drug-resistant Mycobacterium tuberculosis in a high-prevalence setting of Papua New Guinea for over three decades

Towards standardisation: Comparison of five whole genome sequencing (WGS) analysis pipelines for detection of epidemiologically linked tuberculosis cases

Complete Genome Sequencing of Mycobacterium bovis SP38 and Comparative Genomics of Mycobacterium bovis and M. tuberculosis Strains

In silico region of difference (RD) analysis of Mycobacterium tuberculosis complex from sequence reads using RD-Analyzer

Abubakar, I. Interpreting whole genome sequencing for investigating tuberculosis transmission: A systematic review

Does choice matter? Reference-based alignment for molecular epidemiology of tuberculosis

Genomic variant identification methods alter Mycobacterium tuberculosis transmission inference

Genomic diversity affects the accuracy of bacterial SNP calling pipelines. Gigascience 2020, 9

Progenitor strain introduction of Mycobacterium bovis at the wildlife-livestock interface can lead to clonal expansion of the disease in a single ecosystem

The complete genome sequence of Mycobacterium bovis Mb3601, a SB0120 spoligotype strain representative of a new clonal group

Computational pan-genome mapping and pairwise SNP-distance improve detection of Mycobacterium tuberculosis transmission clusters

Best practices for evaluating single nucleotide variant calling methods for microbial genomics

SRST2: Rapid genomic surveillance for public health and hospital microbiology labs

Fast gapped-read alignment with Bowtie 2

A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data

Fast and accurate long-read alignment with Burrows-Wheeler transform

Ultra-fast whole-genome secondary analysis on Illumina sequencing platforms

SOAP2: An improved ultrafast tool for short read alignment

Stampy: A statistical algorithm for sensitive and fast mapping of Illumina sequence reads

Fast and SNP-tolerant detection of complex variants and splicing in short reads

Short read mapping: An algorithmic tour

VarScan: Variant detection in massively parallel sequencing of individual and pooled samples

Haplotype-based variant detection from short-read sequencing

VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing

Genome Project Data Processing Subgroup The Sequence Alignment/Map format and SAMtools

A framework for variation discovery and genotyping using next-generation DNA sequencing data

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data

Evaluation of SNP calling methods for closely related bacterial isolates and a novel high-accuracy pipeline

Accurate whole-genome sequencing-based epidemiological surveillance of Mycobacterium tuberculosis. In Current and Emerging Technologies for the Diagnosis of Microbial Infections

-host population dynamics of M. tuberculosis during treatment failure

Previously undetected super-spreading of Mycobacterium tuberculosis revealed by deep sequencing

Whole-genome sequencing to establish relapse or re-infection with Mycobacterium tuberculosis: A retrospective observational study

Identifying mixed Mycobacterium tuberculosis infections from whole genome sequence data

A new phylogenetic framework for the animal-adapted Mycobacterium tuberculosis complex

Recurrence due to relapse or reinfection with Mycobacterium tuberculosis: A whole-genome sequencing approach in a large, population-based cohort with a high HIV infection prevalence and active follow-up

Whole genome sequencing of Mycobacterium bovis to obtain molecular fingerprints in human and cattle isolates from Baja California

Use of whole-genome sequencing and evaluation of the apparent sensitivity and specificity of antemortem tuberculosis tests in the investigation of an unusual outbreak of Mycobacterium bovis infection in a Michigan dairy herd

Whole-genome sequencing to delineate Mycobacterium tuberculosis outbreaks: A retrospective observational study

Use of whole genome sequencing to determine the microevolution of Mycobacterium tuberculosis during an outbreak

Whole genome sequencing analysis of intrapatient microevolution in Mycobacterium tuberculosis: Potential impact on the inference of tuberculosis transmission

Whole-genome sequencing to detect recent transmission of Mycobacterium tuberculosis in settings with a high burden of tuberculosis

Population Genomics of Mycobacterium tuberculosis in Ethiopia Contradicts the Virgin Soil Hypothesis for Human Tuberculosis in Sub-Saharan Africa

Comparative whole-genome sequence analysis of Mycobacterium tuberculosis isolated from tuberculous meningitis and pulmonary tuberculosis patients

Use of bacterial whole-genome sequencing to investigate local persistence and spread in bovine tuberculosis

Mixed-strain Mycobacterium tuberculosis infections and the implications for tuberculosis treatment and control

Tuberculosis in Swiss captive Asian elephants: Microevolution of Mycobacterium tuberculosis characterized by multilocus variable-number tandem-repeat analysis and whole-genome sequencing

Within patient microevolution of Mycobacterium tuberculosis correlates with heterogeneous responses to treatment

Whole genome sequencing reveals mycobacterial microevolution among concurrent isolates from sputum and blood in HIV infected TB patients

Genomic diversity in autopsy samples reveals within-host dissemination of HIV-associated M. tuberculosis HHS Public Access Author manuscript

Whole genome sequencing versus traditional genotyping for investigation of a Mycobacterium tuberculosis outbreak: A longitudinal molecular epidemiological study

Whole genome sequencing reveals local transmission patterns of Mycobacterium bovis in sympatric cattle and badger populations

Large-scale whole genome sequencing of M. tuberculosis provides insights into transmission in a high prevalence area

Transmission Network of Deer-Borne Mycobacterium bovis Infection Revealed by a WGS Approach

Elucidating emergence and transmission of multidrug-resistant tuberculosis in treatment experienced patients by whole genome sequencing

Reemergence and amplification of tuberculosis in the Canadian Arctic

Assessment of Mycobacterium tuberculosis transmission in Oxfordshire, UK

Clinical application of whole-genome sequencing to inform treatment for multidrug-resistant tuberculosis cases

Mycobacterium bovis in Panama

Identifying likely transmission pathways within a 10-year community outbreak of tuberculosis by high-depth whole genome sequencing

Whole-genome-based Mycobacterium tuberculosis surveillance: A standardized, portable, and expandable approach

Harmonized genome wide typing of tubercle bacilli using a web-based gene-by-gene nomenclature system

Bionumerics for Whole Genome Multi Locus Sequence Typing

Machine learning and structural analysis of Mycobacterium tuberculosis pan-genome identifies genetic signatures of antibiotic resistance

Comparative Whole-Genome Analysis of Clinical Isolates Reveals Characteristic Architecture of Mycobacterium tuberculosis Pangenome

Sequenciamento do Genoma Completo do Mycobacterium bovis Como Instrumento de Sistema de Vigilância no Estado de Santa Catarina

Whole genome sequencing of the monomorphic pathogen Mycobacterium bovis reveals local differentiation of cattle clinical isolates

Molecular epidemiology and whole genome sequencing analysis of clinical Mycobacterium bovis from Ghana

Accurate phylogenetic relationships among Mycobacterium bovis strains circulating in France based on whole genome sequencing and single nucleotide polymorphism analysis

Patterns and processes of Mycobacterium bovis evolution revealed by phylogenomic analyses

Relating phylogenetic trees to transmission trees of infectious disease outbreaks

Reconstructing disease outbreaks from genetic data: A graph approach

Variant profiling of evolving prokaryotic populations

Positive and strongly relaxed purifying selection drive the evolution of repeats in proteins

Opportunities and challenges in long-read sequencing data analysis

Denoising DNA deep sequencing data-high-throughput sequencing errors and their correction

A comparative evaluation of hybrid error correction methods for error-prone long reads

SpolPred: Rapid and accurate prediction of Mycobacterium tuberculosis spoligotypes from short genomic sequences

Fast and accurate in silico Mycobacterium spoligotyping from sequence reads

Exhaustive reconstruction of the CRISPR locus in Mycobacterium tuberculosis complex using short reads

MIRU-profiler: A rapid tool for determination of 24-loci MIRU-VNTR profiles from assembled genomes of Mycobacterium tuberculosis

MIRU-VNTR typing directly from long sequencing reads

Use of whole genome sequencing to estimate the mutation rate of Mycobacterium tuberculosis during latent infection

Challanges of Animal Health Information Systems And Surveillance for Animal Diseases And Zoonoses; FAO Animal Profuction and Health Proceedings

Nextstrain: Real-time tracking of pathogen evolution

Virus Pathogen Database and Analysis Resource (ViPR): A Comprehensive Bioinformatics Database and Analysis Resource for the Coronavirus Research Community. Viruses

GenomeTrakr proficiency testing for foodborne pathogen surveillance: An exercise from

Development of reporting guidelines for animal health surveillance-AHSURED

Whole-genome sequencing and social-network analysis of a tuberculosis outbreak

Epidemiological links between tuberculosis cases identified twice as efficiently by whole genome sequencing than conventional molecular typing

Comparison of 45 variable number tandem repeat (VNTR) and two direct repeat (DR) assays to restriction endonuclease analysis for typing isolates of Mycobacterium bovis

DNA typing of Mycobacterium bovis strains from the castlepoint area of the Wairarapa

Evaluation of Spoligotyping and MIRU-VNTR for Mycobacterium bovis in Xinjiang

Limitations of spoligotyping and variable-number tandem-repeat typing for molecular tracing of Mycobacterium bovis in a high-diversity setting

Measurably evolving pathogens in the genomic era

Whole genome sequence analysis of a large isoniazid-resistant tuberculosis outbreak in London: A retrospective observational study

Inferring patient to patient transmission of Mycobacterium tuberculosis from whole genome sequencing data

Whole genome sequencing of Mycobacterium tuberculosis for detection of recent transmission and tracing outbreaks: A systematic review

Consequences of genomic diversity in Mycobacterium tuberculosis

Co-evolution of Mycobacterium tuberculosis and Homo sapiens

African 2, a clonal complex of Mycobacterium bovis epidemiologically important in East Africa

African 1, an epidemiologically important clonal complex of Mycobacterium bovis dominant in Mali

European 1: A globally important clonal complex of Mycobacterium bovis

European 2-A clonal complex of Mycobacterium bovis dominant in the Iberian Peninsula

Mycobacterium caprae infection in livestock and wildlife

Outbreak of tuberculosis caused by Mycobacterium caprae in a zoological garden

Transmission of Mycobacterium caprae in a herd of European bison in the Bieszczady Mountains, Southern Poland

Molecular characterization of Mycobacterium bovis strains isolated from cattle slaughtered at two abattoirs in Algeria

Molecular characterization of bovine tuberculosis strains in two slaughterhouses in Morocco

Mycobacterium caprae infection in a captive Borneo elephant

Zoonotic tuberculosis in India: Looking beyond Mycobacterium bovis. bioRxiv

Characterization of Mycobacterium orygis as M. tuberculosis complex subspecies

Tuberculosis caused by Mycobacterium orygis in dairy cattle and captured monkeys in Bangladesh: A new scenario of tuberculosis in South Asia

An outbreak of tuberculosis due to Mycobacterium bovis infection in a pack of English Foxhounds

Mycobacterium bovis tuberculosis in hunting hounds

Fatal tuberculosis in a free-ranging African elephant and one health Implications of human pathogens in wildlife

Mycobacterium tuberculosis in wild Asian elephants, southern India

Building international genomics collaboration for global health security

This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license

The authors are in debt to Carolina Bertelli de Souza Ferreira from the University of São Paulo, São Paulo, Brazil for invaluable technical assistance.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.