key: cord-0883119-egwo19oc authors: Aw, Tiong Gim; Rose, Joan B title: Detection of pathogens in water: from phylochips to qPCR to pyrosequencing date: 2011-12-05 journal: Curr Opin Biotechnol DOI: 10.1016/j.copbio.2011.11.016 sha: 9100b941c934c3d01cca98ebdbd6923278a240d0 doc_id: 883119 cord_uid: egwo19oc

Waterborne pathogens pose a significant threat to human health, and a proper assessment of microbial water quality is important for decision making regarding water infrastructure and treatment investments and, eventually, for providing early warning of disease, particularly given increasing global disasters associated with severe public health risks. Microbial water quality monitoring has undergone a tremendous transition in recent years, with novel molecular tools beginning to offer rapid, high-throughput, sensitive and specific detection of a wide spectrum of microbial pathogens that challenge traditional culture-based techniques. High-density microarrays, quantitative real-time PCR (qPCR) and pyrosequencing, which are considered breakthrough technologies borne out of the ‘molecular revolution’, are emerging rapidly as tools for pathogen detection and discovery. Future challenges lie in integrating these molecular tools with concentration techniques and bioinformatics platforms to guide unbiased pathogen surveillance in water, and in developing standardized protocols.

Water is essential to human health as a means to reduce disease, but it is also the vehicle through which disease-causing microorganisms, or pathogens, may be transmitted. A growing number of enteric microorganisms in sewage, including bacteria, viruses and protozoa, have been identified as important waterborne pathogens. Most waterborne pathogens are introduced into drinking water supplies and recreational waters by human and animal wastes, poor wastewater infrastructure and inadequate treatment.
This includes leaking sewers, combined sewer overflows, urban storm water runoff, agricultural land runoff, faulty septic tanks/drain fields and poorly treated wastewater discharges. The vital role of public health in minimizing community-wide acquisition of these waterborne pathogens, and in controlling the infectious diseases caused by hundreds of pathogens, has drawn attention to the need for detecting and identifying bacteria, parasites and viruses directly in water. Until recently, scientists have had no accurate and comprehensive way to detect and quantify microbial pathogens in various environmental samples. Culture-based methods have long been used extensively to detect microbial pathogens in environmental matrices and are considered the 'gold standard', but they have inherent problems, such as lengthy analysis times, labor-intensive procedures and the inability to detect uncultivated species. The term 'the great plate count anomaly' was coined by Staley and Konopka [1] over 20 years ago to describe the great discrepancy between direct microscopic counts and the numbers of bacteria that can be enumerated from environmental samples by plating procedures. The ability to access DNA sequences of microbial genomes offers an exciting alternative for exploring and improving the characterization and rapid detection of a diverse group of pathogens in water. The advent of molecular tools has resulted in tremendous paradigm shifts in microbial water quality monitoring: (i) direct detection of pathogens, including uncultivated pathogens, is now possible and takes hours instead of days to weeks, (ii) the true positivity rate of some pathogens in waters is known to be much higher than previously thought on the basis of culture-based detection alone, and (iii) traditional bacterial indicators often fail to predict or correlate with the presence of waterborne pathogens.
By increasing the capacity to detect a wide range of microbial pathogens in waters, novel and potential pathogens could be recognized and associated with waterborne diseases, and better characterization of known pathogens can be undertaken for waterborne diseases of unknown etiology. According to Web of Science, 403 articles with qPCR and water as key words have been published since 1998, with over 100 articles in 2010 and 2011. Citations of this work grew to over 900/year by 2011. This article reviews some of the most recent and significant progress in molecular tools for the detection of pathogens in water.

eliminated many iconic waterborne enteric diseases such as typhoid and cholera in North America and Europe. However, microbial pathogens continue to be a major cause of water-related disease outbreaks globally and remain a key public health challenge in providing safe drinking water and recreational water in the 21st century. In addition to these traditional waterborne pathogens, a significant number of emerging and re-emerging pathogens have now been recognized, and water transmission has caused significant outbreaks; examples include Escherichia coli O157:H7, Helicobacter pylori, Legionella pneumophila, Campylobacter jejuni, Mycobacterium avium complex, enteric viruses such as noroviruses and hepatitis E virus, and the parasites Cryptosporidium parvum and Microsporidia. According to the latest World Health Organization (WHO) Guidelines for Drinking-water Quality, zoonotic pathogens, which make up 75% of the emerging pathogens, pose the greatest challenges to ensuring the safety of drinking water and ambient water [2]. The list of potential waterborne pathogens is extensive [2], and Table 1 provides examples of emerging waterborne pathogens that are also listed on the US EPA's latest Contaminant Candidate List (CCL-3).
Because of the nature of waterborne pathogens, what are perceived as small concentrations are associated with high levels of community risk [3]. The use of quantitative microbial risk assessment allows for an improved understanding of how concentrations, loading and transport influence the probability of infection associated with microbial contamination of water. The limitations in detecting and identifying pathogens directly from environmental water samples by culture or microscopy can now be addressed by integrating concentration techniques with molecular tools to provide sensitive, specific and quantitative data on any pathogens of interest. The tools have a number of advantages. Multiple pathogens can be characterized by microarray technology; however, sensitivity is often not adequate. qPCR is the most rapidly growing technique for use in the water environment for both microbial source tracking and rapid pathogen-specific quantification; and finally, the power of pyrosequencing for exploring the water environment has offered new insights into the microbial world. Developed in the early to mid-1990s, DNA microarrays are arrays containing a high density of immobilized nucleic acids (genomic DNA, cDNA or oligonucleotides) in an ordered two-dimensional matrix that enable the simultaneous interrogation of thousands of genes in a single assay via nucleic acid hybridization [4]. Unfortunately, current limitations of microarrays, including the lack of sequence information for many pathogens, non-specific binding and the inability to offer the low detection levels available by qPCR, make the use of microarrays for the direct detection of a very dilute spectrum of pathogens in environmental matrices impractical.
Until recently, microarrays have not been used to quantitatively detect pathogens in environmental water samples, and a prior PCR enrichment with gene-specific primers was usually needed to increase the detection resolution [4]. Multiple displacement amplification, an isothermal DNA amplification reaction using phi29 DNA polymerase and random hexamer primers, offers a promising unbiased approach for increasing the amount of target DNA in the environmental sample before microarray detection [5]. The rapid growth of 16S rRNA gene databases has helped pave the way for developing high-density phylogenetic microarrays to simultaneously analyze many taxa in complex environmental samples. Phylogenetic microarrays such as the PhyloChip have proven useful in determining microbial community structure [6]. The PhyloChip, which targets the known diversity within the 16S rRNA gene, is able to simultaneously identify any of thousands of taxa present in an environmental sample. The current version (G3) of the PhyloChip allows simultaneous detection of up to 50,000 bacterial, archaeal and microalgal taxa [7]. Besides monitoring bacterial populations during environmental bioremediation, PhyloChip technology has been used for monitoring airborne bacteria for bioterrorism surveillance [8], microbial community analysis in biological wastewater treatment systems [9] and microbial source tracking of pathogens in a coastal urban watershed [10]. More recently, two methods of PCR-independent microbial community analysis using the PhyloChip microarray, direct rRNA hybridization and double-strand cDNA generation and hybridization, have been developed, providing cost-effective and PCR bias-free alternatives to PCR-amplified microbial community analysis [11].

Table 1. Potential waterborne pathogens, diseases caused and recent qPCR assays for their detection. Columns: Major disease(s); qPCR detection limit (per reaction); Ref.
The ViroChip, a pan-virus DNA microarray containing thousands of unique 70-mer oligonucleotide probes to target all viral families known to infect humans, has been described by Wang et al. [12, 13]. The authors demonstrated that this approach, coupled with a random PCR amplification strategy, was able to detect multiple viruses as well as novel members of known viral families. Recently, several high-density microarray prototypes such as the pan-Microbial Detection Array (MDA) [14] and the GreeneChip system [15] have been developed for the molecular surveillance and detection of a panel of pathogens including bacteria, viruses and parasites in clinical samples. These methods have yet to be applied to real-world water samples for the detection of waterborne pathogens. To make this practical, these high-density microarrays will need to be coupled with water sample concentration and purification techniques, and perhaps with other amplification techniques; they could then be used to assess the diversity of pathogens in water and potentially to discover previously unknown pathogens.

Polymerase chain reaction (PCR), developed in the early 1980s by Kary Mullis, is now a common and often indispensable molecular technique in both clinical and biological research laboratories. The concept of PCR remains unchanged since its inception. However, the agarose gel-based detection of PCR amplification products has undergone dramatic changes that have led to a revolutionary testing platform, namely quantitative real-time PCR (qPCR), which enables quantification of DNA targets by monitoring amplified products during cycling as indicated by increased fluorescence [16]. In the past few years, the development of qPCR has provided significant improvements in PCR-based assays for the detection of waterborne bacterial, viral and protozoan pathogens (Table 1).
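As a rough illustration of how qPCR converts fluorescence-derived quantification cycle (Cq) values into target quantities, the following sketch fits a standard curve to a dilution series and interpolates an unknown sample. The log-linear model Cq = slope x log10(copies) + intercept is the conventional standard-curve form; the dilution-series values and all function names are hypothetical and are not taken from any of the assays reviewed here.

```python
# Minimal sketch of qPCR standard-curve quantification.
# All numbers below are invented for illustration only.

def fit_standard_curve(log10_copies, cq_values):
    """Ordinary least-squares fit of Cq = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency implied by the slope (1.0 means perfect doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_cq(cq, slope, intercept):
    """Interpolate the starting copy number of an unknown from its Cq."""
    return 10 ** ((cq - intercept) / slope)


# Hypothetical ten-fold dilution series of a quantified standard.
standards_log10 = [1, 2, 3, 4, 5, 6]                   # log10 copies per reaction
standards_cq = [36.6, 33.3, 30.0, 26.7, 23.4, 20.1]    # measured Cq values

slope, intercept = fit_standard_curve(standards_log10, standards_cq)

if __name__ == "__main__":
    print("slope:", round(slope, 3))
    print("intercept:", round(intercept, 3))
    print("efficiency:", round(amplification_efficiency(slope), 3))
    print("copies at Cq 30:", round(copies_from_cq(30.0, slope, intercept)))
```

In practice the result is expressed per reaction and still has to be scaled back to the original water volume, and any correction for extraction losses or inhibition would come from the internal control.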
Currently, qPCR offers numerous advantages over conventional, end-point PCR techniques such as higher sensitivity and specificity, faster rate of detection, no post-PCR analysis and thereby minimizing the risk of carryover contamination, and the capability to provide quantitative results. The dual-labeled fluorescent probe such as TaqMan probe has been the most popular technique of choice for detecting pathogens in environmental samples owing to its higher specificity. The advantage of this assay is that the target amplicon is verified by the probe which recognizes internal amplicon sequences. Inclusion of an internal control in each sample is essential for the accuracy of quantitative results, particularly for complex environmental matrices, by monitoring the efficiency of nucleic acid extraction and reverse transcription (for RNA target) as well as the presence of PCR inhibitors. Detection of human pathogenic viruses has undergone a significant transition from conventional cell culture to novel molecular techniques, particularly quantitative reverse-transcriptase (qRT)-PCR for RNA viruses, with higher sensitivity and specificity. Detection of pathogenic viruses by qRT-PCR does not necessarily indicate their infectivity, but it has been an excellent methodology for studying virus pollution in the water environment. Moreover, the use of qRT-PCR for the analysis of non-cultivable noroviruses [17] and slow growing hepatitis A virus [18] has been fundamental to advancing the understanding of the occurrence of these human pathogens in water environments. Surveys around the world have begun exploring the occurrence of both DNA and RNA viruses in water by qPCR (see [19 ] for a review). However, there were no consistent trends found in regard to the viral types or levels. The molecular characterization of environmental viral strains has revealed a great genetic diversity of human pathogenic viruses in polluted surface waters [20] . 
Yet no global survey of viruses has been undertaken by evaluating the same virus targets using the same protocols. A temporal assessment is needed to compare the diversity and level of viral infections and water pollution, and to contrast developing with developed regions of the world, tropical environments with temperate conditions, and rural with urban populations. It may be possible to take on global virus characterization in the future as technology evolves to allow higher throughput and hundreds of targets. Advances in microfluidics and nanobiotechnology allow for the construction of high-density, low-volume qPCR platforms, such as the Biotrove OpenArray system (Applied Biosystems), that are capable of accommodating 3072 reactions per array. With a novel internal amplification control approach, this next-generation qPCR platform offers high-throughput testing of environmental samples against a wide range of pathogens [21, 22].

Since the dideoxy chain termination technique, often referred to as 'Sanger sequencing', was first described more than three decades ago, technological advances have led to the development of massively parallel DNA sequencing platforms, including pyrosequencing. Based on the sequencing-by-synthesis principle, pyrosequencing is a DNA sequencing technology that utilizes enzyme-coupled reactions and bioluminescence to monitor, in real time, the pyrophosphate release accompanying nucleotide incorporation (Figure 1) [23]. A major advance for pyrosequencing technology came with the introduction of the Roche 454 sequencing platform. This next-generation sequencing technology holds several advantages over the more traditional Sanger sequencing, including higher throughput, simplified sample preparation and the miniaturization of sequencing chemistries, enabling massively parallel sequencing reactions to be carried out at a scale and cost not previously possible [24].
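The sequencing-by-synthesis principle described above can be illustrated with a toy base-calling routine. The sketch assumes that nucleotides are flowed over the template in a fixed repeating order (TACG is used here as an assumption) and that the light signal in each flow is roughly proportional to the number of nucleotides incorporated, so that a signal near 2 indicates a two-base homopolymer. The signal values and the function name are invented for illustration; real base callers also model noise and signal decay.

```python
# Toy illustration of reading a pyrosequencing flowgram.
# Flow order and signal values are assumptions made for this example.

def decode_flowgram(signals, flow_order="TACG"):
    """Convert per-flow light signals into the synthesized base sequence.

    Each signal is rounded to the nearest whole number of incorporated
    nucleotides; a value near zero means the flowed nucleotide was not
    incorporated in that cycle.
    """
    bases = []
    for index, signal in enumerate(signals):
        nucleotide = flow_order[index % len(flow_order)]
        run_length = max(0, int(signal + 0.5))  # nearest integer, never negative
        bases.append(nucleotide * run_length)
    return "".join(bases)


# Hypothetical normalized signals for two cycles of the T, A, C, G flows.
example_signals = [1.02, 0.05, 0.97, 2.10, 0.03, 1.95, 0.08, 1.01]
example_read = decode_flowgram(example_signals)

if __name__ == "__main__":
    print(example_read)
```

The decoded read is the newly synthesized strand, which is complementary to the template. Because homopolymer length is inferred from signal intensity, long runs of the same base are the main source of miscalls in this chemistry.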
The latest 454 sequencing platform, GS FLX+, is able to produce one million reads (with a read length of up to 1000 bases) per 23-hour sequencing run. Pyrosequencing technology is revolutionizing the study of microbial ecology as well as direct metagenomic detection of human pathogens in both clinical and environmental samples. Unlike PCR and microarray methods, where investigators are limited by known sequence information and must select the range of pathogens to be considered in a given assay, the high-throughput sequencing approach is unbiased and makes it possible to detect novel pathogens. Figure 1 shows the basic steps that are needed for the detection of microbial pathogens in environmental waters by pyrosequencing.

Table 2 (partial)
Findings: High levels of several classes of resistance genes in bacterial communities exposed to antibiotic were identified. Ref.: [55]
Sample: Reclaimed and potable waters. Target: Viral DNA and RNA. Method: Tangential flow filtration, DNase treatment, MDA, pyrosequencing (454 GS-FLX and GS20 sequencer). Findings: Over 50% of the viral sequences with no significant similarity to proteins in GenBank. Bacteriophages dominated the DNA viral community. The RNA metagenomes contained sequences related to plant viruses and invertebrate picornaviruses. Ref.: [26]
Sample: Wastewater biosolids. Target: Viral DNA and RNA. Method: DNase and RNase treatment, reverse transcription for RNA, pyrosequencing (454 GS-FLX sequencer). Findings: Optimal annotation approach specific for viral pathogen identification is described. Parechovirus, coronavirus, adenovirus, aichi virus and herpesvirus were identified. Ref.: [56]
Sample: Lake water. Target: Viral RNA. Method: Tangential flow filtration, DNase and RNase treatment, random amplification (Klenow DNA polymerase), pyrosequencing (454 GS-FLX sequencer). Findings: 66% of the sequences with no significant similarity to known sequences. Presence of viral sequences (30 viral families) with significant homology to insect, human and plant pathogens.
Besides the direct detection of pathogens in stool specimens [25] , several recent papers describe the application of pyrosequencing to investigate the diversity of bacterial and viral pathogens in environmental samples (Table 2) . A general finding from these studies is that the majority of the sequences in environmental metagenomes, particularly viral sequences, have no similarities to known genes in the database, indicating high abundance of uncharacterized viruses which may be potential human pathogens [26 ,27 ] . The extremely high-throughput of pyrosequencing has significantly expanded the repertoire of complete bacterial pathogen genomes, particularly strains associated with outbreaks [28, 29] , since whole bacterial genomes can be sequenced in a fraction of the time and at a much lower cost. As the majority of waterborne disease outbreaks had undetermined etiology agents [30] , pyrosequencing and other high-throughput sequencing technologies are likely to identify novel pathogens associated with waterborne illnesses in the future and could address multiple etiologies. Moreover, sequencing of outbreak isolates is of public health interest, particularly if rapid data about virulence markers can be obtained. Most recently, a new whole genome amplification and sequencing approach called 'Single Virus Genomics', which enables the isolation and complete genome sequencing by 454 pyrosequencing of the single virus particle (bacteriophages lambda and T4), has been described [31 ] . This proof-of-concept study is likely to enhance uncultivated virus discovery as well as provide relevant viral reference genomes for the assembly of metagenomic data and the design of qPCR primers and probes targeting uncultivated viruses. Further study is needed to optimize this approach for human pathogenic viruses in environmental waters. 
Although high-throughput sequencing has made remarkable advances in the past few years, the most critical challenge for its application in pathogen detection is the development of improved user-friendly bioinformatics and visualization platforms to facilitate rapid and robust analysis and interpretation of high-throughput sequencing data.

Challenges for pathogen detection in water using molecular tools: upstream sample processing

Adapting novel molecular tools to meet the needs of waterborne pathogen monitoring is a significant technical challenge. Rapid detection and identification of waterborne pathogens by novel molecular tools are often hampered by the relatively low abundance of target organisms in water samples. Much work has been carried out in the development of rapid, sensitive and specific analytical methods. However, less emphasis has been placed on the pre-analytical or upstream sample processing, which must be able to selectively separate and concentrate all target pathogens in each water sample. The preconcentration step is important in improving the detection sensitivity of microbial contaminants, particularly by molecular detection tools which use extremely small sample volumes (a few microliters). Recently, tangential-flow, hollow fiber ultrafiltration, which uses size-exclusion as the mechanism of concentration, has been shown to be a promising technique to concentrate diverse pathogens in a single process [32, 33]. Moreover, ultrafiltration is currently the method of choice to concentrate viruses from large volumes of water for metagenomic detection by pyrosequencing [26, 27]. The relationship between detection by molecular approaches and viability or infectivity of waterborne pathogens still remains a concern, particularly for risk assessment to public health.
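To make the point about preconcentration and microliter-scale assay volumes concrete, the arithmetic can be sketched as follows: the volume of original water actually represented in one reaction is the sampled volume scaled by the fraction of concentrate that is extracted and the fraction of nucleic acid eluate that is loaded. Every volume, the recovery value and the function names below are hypothetical and chosen only to illustrate the calculation; they do not describe any protocol cited in this review.

```python
# Back-of-the-envelope sketch: how much of the original water sample
# ends up in a single molecular reaction, and what that implies for
# the lowest concentration that could in principle be detected.
# All numbers are hypothetical.

def equivalent_volume_per_reaction_ml(sample_volume_ml,
                                      concentrate_volume_ml,
                                      extracted_volume_ml,
                                      eluate_volume_ul,
                                      template_volume_ul):
    """Original sample volume (mL) represented in one reaction."""
    fraction_extracted = extracted_volume_ml / concentrate_volume_ml
    fraction_assayed = template_volume_ul / eluate_volume_ul
    return sample_volume_ml * fraction_extracted * fraction_assayed


def theoretical_detection_limit_per_liter(equivalent_volume_ml,
                                          recovery=1.0,
                                          copies_per_reaction=1.0):
    """Lowest detectable concentration (copies/L) for a given assay limit.

    recovery is the assumed fraction of targets surviving concentration
    and extraction (1.0 means no loss).
    """
    return copies_per_reaction / (equivalent_volume_ml * recovery) * 1000.0


# Hypothetical workflow: 100 L filtered down to 50 mL, 0.2 mL extracted
# into a 100 uL eluate, 5 uL of which is used as template.
equivalent_ml = equivalent_volume_per_reaction_ml(
    sample_volume_ml=100_000.0,
    concentrate_volume_ml=50.0,
    extracted_volume_ml=0.2,
    eluate_volume_ul=100.0,
    template_volume_ul=5.0,
)
limit_per_liter = theoretical_detection_limit_per_liter(
    equivalent_ml, recovery=0.5, copies_per_reaction=1.0
)

if __name__ == "__main__":
    print("equivalent volume per reaction (mL):", equivalent_ml)
    print("theoretical detection limit (copies/L):", limit_per_liter)
```

Under these assumed numbers only a small fraction of the 100 L sample is interrogated per reaction, which is why recovery efficiency and the volume carried through each step matter as much as the sensitivity of the downstream assay.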
One of the most recent approaches to determine viability or infectivity using molecular tools is propidium monoazide (PMA) treatment before qPCR to reduce PCR signals from DNA originating from dead cells [34, 35]. Although PMA treatment has also shown promise in excluding signals from dead cells with microarrays [36] and pyrosequencing [37], further validation is needed in environmental water samples, particularly after disinfection. Disinfection of water, mainly by chlorination, serves as the major process for inactivating pathogens. Viable qPCR methods using PMA for distinguishing live cells from dead (heat-killed) cells have not been verified for water after disinfection. Varma et al. [38] and Srinivasan et al. [39] found that there was no observed reduction in the qPCR signals after disinfection of wastewater effluent samples. Thus, currently, qPCR can be used as a tool to monitor pathogen loading and physical removal or dilution but cannot be used to address viability.

This review shows that considerable progress has been made to achieve sensitive, specific and high-throughput detection of pathogens in water. Overall, qPCR is among the most reliable and cost-effective of the molecular tools owing to its shortened time to result as well as its potential to increase sensitivity and specificity, and it is therefore not surprising that qPCR is increasingly being used for the detection of waterborne pathogens. Despite the potential for development of high-throughput analytical systems for the detection and differentiation of pathogens in water, further assay optimization is needed for high-density microarrays to demonstrate reliability and match the sensitivity of qPCR. Roche 454 pyrosequencing and other commercially available high-throughput sequencing platforms, such as the Solexa/Illumina Genome Analyzer, Applied Biosystems SOLiD Sequencing and the most recently released Ion Torrent system [40], have revolutionized the study of microbial diversity.
Nevertheless, these high-throughput sequencing platforms are currently in their infancy for the direct detection of pathogens in water and are exploratory. Whereas many technological and methodological challenges remain, high-throughput sequencing technology has the potential to provide an unbiased detection approach for waterborne pathogens with a single common protocol. By integrating with qPCR, concentration of key pathogen groups can be subsequently determined. A recent study by Svraka et al. [41 ] where metagenomic sequencing was used to identify viruses in samples that exhibited consistent cytopathogenic effects in cell culture but remained unexplained by established PCR, demonstrates the need for unbiased approach for pathogen detection in a public health setting. However, several questions arise when molecular tools are being introduced to microbial water quality monitoring. Which systems offer the greatest promise for more routine water pollution diagnoses? Can current laboratory personnel be trained to perform some of these complex assays on a routine basis? What changes are necessary in water sampling, transport and storage? Are the improved sensitivity and decreased turnaround time worth the additional cost for water quality monitoring? Currently, success is being achieved more in the downstream part of the detection process and strategic sampling as well as concentration techniques of pathogens in environmental waters should be exploited further. It is clear that the field of environmental pathogen detection is and will remain highly dynamic with tremendous potential for development of new tools and continuous improvement of existing concepts. However, the major challenge facing environmental microbiologists is to apply these tools for detecting pathogens in real environmental matrices on a routine basis. 
Multi-laboratory round-robin tests should be performed to evaluate novel molecular tools and refine the prototype protocols into standardized methods with appropriate technology transfer.

Papers of particular interest, published within the period of review, have been highlighted as: of special interest; of outstanding interest

Measurement of in situ activities of nonphotosynthetic microorganisms in aquatic and terrestrial habitats
Microbial health risks and water quality
Challenges of multiplexed detection: detection of pathogens in water and wastewater using microarrays
    A comprehensive review describing recent advances and challenges of microarrays for the detection of waterborne pathogens
Something from (almost) nothing: the impact of multiple displacement amplification on microbial ecology
High-density universal 16S rRNA microarray analysis reveals broader diversity than typical clone library when sampling the environment
Deep-sea oil plume enriches indigenous oil-degrading bacteria
Urban aerosols harbor diverse and dynamic bacterial populations
Bacterial community structure in geographically distributed biological wastewater treatment reactors
Characterization of coastal urban watershed bacterial communities leads to alternative community-based indicators
PCR amplification-independent methods for detection of microbial communities by the high-density microarray PhyloChip
Microarray-based detection and genotyping of viral pathogens
Viral discovery and sequence recovery using DNA microarrays
A microbial detection array (MDA) for viral and bacterial detection
Panmicrobial oligonucleotide array for diagnosis of infectious diseases
Why the need for qPCR publication guidelines? The case for MIQE
    The author provides guidelines to improve the entire qPCR workflow, from experimental design to statistical analysis and reporting. General implementation of these guidelines is important for producing consistent and high quality data from qPCR assays.
transcription-PCR assay for quantification of hepatitis A virus in clinical and shellfish samples
Molecular detection of pathogens in water: the pros and cons of molecular techniques
Prevalence and genetic diversity of waterborne pathogenic viruses in surface waters of tropical urban catchments
Accurate quantification of microorganisms in PCR-inhibiting environmental DNA extracts by a novel internal amplification control approach using Biotrove OpenArrays
Development and experimental validation of a predictive threshold cycle equation for quantification of virulence and marker genes by high-throughput nanoliter-volume PCR on the OpenArray Platform
Pyrosequencing: history, biochemistry and future
    An informative introduction to the principle of pyrosequencing
The development and impact of 454 sequencing
    This article provides an overview of the 454 sequencing technology
Direct metagenomic detection of viral pathogens in nasal and fecal specimens using an unbiased high-throughput sequencing approach
Metagenomic analysis of viruses in reclaimed water
Metagenomic analysis of RNA viruses in a fresh water lake
    These papers describe the metagenomic detection of viruses in environmental water samples by massively parallel pyrosequencing technology
High-throughput genome sequencing of two Listeria monocytogenes clinical isolates during a large foodborne outbreak
Variation in virulence among clades of Escherichia coli O157:H7 associated with disease outbreaks
Causes of outbreaks associated with drinking water in the United States from 1971 to
Single virus genomics: a new tool for virus discovery
    The authors describe a new approach that combines flow cytometry, whole genome amplification and pyrosequencing for the isolation and complete genome sequencing of a single virus particle. This approach has the potential to discover novel viruses and facilitate comparative genomic analyses by providing relevant reference genomes
Dead-end hollow-fiber ultrafiltration for recovery of diverse microbes from water
Tangential-flow ultrafiltration with integrated inhibition detection for recovery of surrogates and human pathogens from large-volume source water and finished drinking water
Quantification of viable Legionella pneumophila cells using propidium monoazide combined with quantitative PCR
Use of propidium monoazide in reverse transcriptase PCR to distinguish between infectious and noninfectious enteric viruses in water samples
Selective detection of live bacteria combining propidium monoazide sample treatment with microarray technology
Discrimination between live and dead cells in bacterial communities from environmental water samples analyzed by 454 pyrosequencing
Quantitative real-time PCR analysis of total and propidium monoazide-resistant fecal indicator bacteria in wastewater
Escherichia coli, enterococci, and Bacteroides thetaiotaomicron qPCR signals through wastewater and septage treatment
Landscape of next-generation sequencing technologies
    The authors provide the current state of the art in next-generation sequencing technologies
    The authors demonstrate that metagenomic sequencing can provide an improved detection of unidentified pathogens in a public health setting
Rapid identification and quantification of Campylobacter coli and Campylobacter jejuni by real-time PCR in pure cultures and in complex samples
Quantification of viable but nonculturable Escherichia coli O157:H7 by targeting the rpoS mRNA
Failure to detect Helicobacter pylori DNA in drinking and environmental water in Dhaka, Bangladesh, using highly sensitive real-time PCR assays
Specific real-time PCR for simultaneous detection and identification of Legionella pneumophila serogroup 1 in water and clinical samples
Influence of environmental gradients on the abundance and distribution of Mycobacterium spp. in a coastal lagoon estuary
Detection of live Salmonella sp. cells in produce by a TaqMan-based quantitative reverse transcriptase real-time PCR targeting invA mRNA
Development of a multiplex real-time PCR assay with internal amplification control for the detection of Shigella species and enteroinvasive Escherichia coli
Real-time PCR with an internal control for detection of all known human adenovirus serotypes
A broadly reactive one-step real-time RT-PCR assay for rapid and sensitive detection of hepatitis E virus
Development and evaluation of a real-time PCR assay for quantification of Giardia and Cryptosporidium in sewage samples
A duplex real-time PCR assay for the quantitative detection of Naegleria fowleri in water samples
Pyrosequencing of the 16S rRNA gene to reveal bacterial pathogen diversity in biosolids
Pathogenic bacteria in sewage treatment plants as revealed by 454 pyrosequencing
    Examples of how massively parallel pyrosequencing can be used to determine the diversity of bacterial pathogens in wastewater based on 16S rRNA
Pyrosequencing of antibiotic-contaminated river sediments reveals high levels of resistance and gene transfer elements
    Using pyrosequencing, the authors demonstrate high levels of resistance genes in bacterial communities in river sediments exposed to antibiotics
Viral metagenome analysis to guide human pathogen monitoring in environmental samples
    Another example of how pyrosequencing can be used to study the diversity of human pathogenic viruses in environmental samples. The authors also describe an improved bioinformatic approach for identifying pathogens in a virome data set