key: cord-355988-4eldkteb
authors: SAMPATH, RANGARAJAN; HALL, THOMAS A.; MASSIRE, CHRISTIAN; LI, FENG; BLYN, LAWRENCE B.; ESHOO, MARK W.; HOFSTADLER, STEVEN A.; ECKER, DAVID J.
title: Rapid Identification of Emerging Infectious Agents Using PCR and Electrospray Ionization Mass Spectrometry
date: 2007-04-23
journal: Ann N Y Acad Sci
DOI: 10.1196/annals.1408.008
sha:
doc_id: 355988
cord_uid: 4eldkteb
abstract: Newly emergent infectious diseases are a global public health problem. The population-dense regions of Southeast Asia are the epicenter of many emerging diseases, as evidenced by the outbreaks of Nipah, SARS, avian influenza (H5N1), Dengue, and enterovirus 71 in this region in the past decade. Rapid identification, epidemiologic surveillance, and mitigation of transmission are major challenges in ensuring public health safety. Here we describe a powerful new approach for infectious disease surveillance that is based on polymerase chain reaction (PCR) to amplify nucleic acid targets from large groupings of organisms, electrospray ionization mass spectrometry (ESI-MS) for accurate mass measurements of the PCR products, and base composition signature analysis to identify organisms in a sample. This approach is capable of automated analysis of more than 1,500 PCR reactions a day. It is applicable to the surveillance of bacterial, viral, fungal, or protozoal pathogens and will facilitate rapid characterization of known and emerging pathogens.

Southeast Asia has been the source of several emerging zoonotic and vector-borne disease outbreaks over the years. 1 These include acute encephalitis caused by a novel paramyxovirus, Nipah virus; 2,3 hand, foot, and mouth disease (HFMD) caused by coxsackie A16 and enterovirus 71; 4 viral encephalitis and hemorrhagic fevers caused by arboviruses, such as Dengue and Japanese encephalitis viruses; 5 and episodic outbreaks of influenza A virus (H5N1, H9N2, etc.) and acute respiratory syndrome caused by the SARS coronavirus.
[6] [7] [8] [9] A fundamental problem in rapid identification of the etiology of infectious disease outbreaks is the sheer number of microbes, both known and novel, that could potentially be the causative agents. According to a recent review, 10 there are more than 1,400 species of microbes known to be pathogenic to humans. Of these, more than 60% were found to be zoonotic and more than 175 pathogenic species were associated with emerging infectious diseases. These numbers do not include numerous strain variants of each organism or classes of pathogens that infect plants or animals. Further, species variants of an otherwise benign group of microbes may prove infectious to humans and strain-level genetic changes may confer resistance to antimicrobial agents. Most of the new technologies being developed for detection of infectious agents incorporate a version of quantitative polymerase chain reaction (PCR), based upon the use of highly specific primers and probes and designed to selectively identify specific pathogenic organisms. Paradoxically, this approach requires assumptions about the type and strain of the infectious agent, limiting detection to a small number of known pathogens. It is both impractical and cost prohibitive to run a large number of individual tests in an attempt to identify every known infectious organism. Moreover, existing tests for known organisms will not identify newly emerging organisms. Broad-range PCR methods provide an alternative to single-agent tests. By amplifying gene targets conserved across groups of organisms, broad-range PCR has the potential to generate amplification products across entire genera, families, or, as with bacteria, an entire domain of life. This strategy has been successfully employed in the past for bacterial and viral detection. 
[11] [12] [13] [14] [15] [16] The drawback of this approach for epidemiologic applications is that the analysis of PCR products for mixed amplified samples requires sequencing of hundreds of colonies per reaction, and thus it cannot be performed rapidly or cost effectively on large numbers of samples. New approaches to the parallel detection of multiple infectious agents include multiplexed PCR methods 17, 18 and microarray strategies. [19] [20] [21] Microarray strategies are promising because undiscovered organisms might be detected by hybridization to probes designed to conserved regions of known families of bacteria and viruses. However, these methods are not well suited for the high-throughput detection of a broad spectrum of microorganisms potentially present in complex biological mixtures.

Triangulation identification for genetic evaluation of risks (TIGER) is a universal pathogen detection platform developed under a Defense Advanced Research Projects Agency (DARPA) biodefense program that is capable of identifying an extraordinarily broad range of pathogens (FIG. 1). The process uses electrospray ionization mass spectrometry (ESI-MS) and base composition analysis of PCR amplification products from highly conserved genomic regions to identify and determine the relative quantity of pathogenic bacteria, viruses, fungi, or protozoa present in a sample.

FIGURE 1. Automated PCR/ESI-MS biosensor. Shown in this view are key modules including amplicon purification (desalting), plate stackers, sample injection, and the mass spectrometer. Precise molecular weight determinations of amplicons yield unambiguous base compositions that are used to uniquely "fingerprint" each pathogen. The automated system is capable of analyzing more than 1,500 PCR reactions in 24 h.

The fundamental distinction between this PCR/ESI-MS strategy and traditional (specific) nucleic acid tests is the nature of the question being answered.
Traditional nucleic acid tests answer the question: "Is infectious organism X in my sample?" The PCR/ESI-MS method answers the question: "What infectious organisms are in my sample?" In effect, this is equivalent to running many thousands of specific nucleic acid tests because the identity of the infectious organism being detected does not need to be anticipated. The premise of this technology is that it can provide rapid, sensitive, and cost-effective detection of a broad range of "normal" pathogenic organisms and simultaneously identify unexpected or emerging infectious organisms. This broad function technology may be the only practical way to rapidly diagnose diseases caused by emerging infectious organisms that might otherwise be missed or mistaken for a more common infection. 22 The technology has been validated for a variety of applications, including emerging virus identification, 23 determination of the pathogens involved in acute pneumonia epidemics, 24 and identification of biological weapons agents in a variety of sample types. 25, 26 The basis of the above approach is the principle that, despite the enormous diversity of microbes, there are sets of essential common features in their biomolecules. Parallel detection of infectious agents takes advantage of highly conserved regions within genes that flank highly variable regions. These conserved genomic features serve as anchors for broad-range PCR priming to amplify, without prejudice, nucleic acids from most microorganisms in a sample. Such broad-range PCR reactions targeted across the broadest possible grouping of organisms typically result in a mixture of amplicons, reflecting the complexity of the starting sample. With no further separation, this mixture of PCR amplicons is desalted and analyzed by ESI-MS for accurate mass measurement, enabling us to determine the base composition (numbers of A, G, C, and T nucleotides) of the PCR amplicons. 
23, 25, 26 Amplification relies on highly conserved regions, but the variability in sequence between the two primer sites, reflected in the base composition, allows us to distinguish between organisms. A key aspect of this approach is the concept of "triangulation." In this context, triangulation refers to taking measurements from multiple loci distributed across the microbial genome and using the combination of the measured base composition signatures to identify the organism. While no single PCR target region might give definitive species resolution for all microbes, species can be clearly identified and differentiated from one another using the triangulation strategy across multiple loci. A database of expected base compositions for each primer region for all known organisms was generated using an in silico PCR search algorithm (electronic PCR or ePCR). An existing ribonucleic acid (RNA) structure search algorithm has been modified to include PCR parameters, such as hybridization conditions, mismatches, and thermodynamic calculations. 27 This also provides information on primer specificity of the selected primer pairs. The regions are selected so that the nucleic acid composition of the resulting amplicons has sufficient variability to maximally resolve the target organisms. Thus, the measured base compositions from strategically selected locations of the genome are used as a signature to identify and distinguish the organisms present in the original sample. Detailed experimental protocols have been published elsewhere. 25 Microbes have been identified from human and animal samples (throat swabs, skin wipes, hair, tissue) and from samples obtained from the environment (food products, soil, water, surface wipes, air). Nucleic acids are extracted from bacteria using either a heat or mechanical lysis step. Viral genome isolation is performed using commercially available kits that have been optimized for use in the system. 
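The step from a measured amplicon mass to a base composition, described above, can be illustrated with a short sketch. It is not the authors' processing algorithm: the residue masses are approximate average values, the terminal correction assumes unmodified 5'-OH and 3'-OH strand ends, the amplicon length is taken as known, and all function names are hypothetical. The idea shown is that a single strand mass may fit several (A, G, C, T) combinations, and that requiring the complementary strand to fit its own measured mass narrows the candidates.

```python
# Approximate average masses (Da) of the four DNA nucleotide residues inside a
# strand, plus a terminal correction. Both are assumptions for illustration
# (an unmodified strand with 5'-OH and 3'-OH ends); real amplicons may differ.
RESIDUE_MASS = {"A": 313.21, "G": 329.21, "C": 289.18, "T": 304.20}
TERMINAL_OFFSET = -61.96


def strand_mass(a, g, c, t):
    """Average mass of one strand with the given numbers of A, G, C and T."""
    return (a * RESIDUE_MASS["A"] + g * RESIDUE_MASS["G"]
            + c * RESIDUE_MASS["C"] + t * RESIDUE_MASS["T"] + TERMINAL_OFFSET)


def candidate_compositions(measured_mass, length, tol_da):
    """All (A, G, C, T) counts of a given length whose mass fits the measurement."""
    hits = []
    for a in range(length + 1):
        for g in range(length + 1 - a):
            for c in range(length + 1 - a - g):
                t = length - a - g - c
                if abs(strand_mass(a, g, c, t) - measured_mass) <= tol_da:
                    hits.append((a, g, c, t))
    return hits


def paired_compositions(mass_fwd, mass_rev, length, tol_da):
    """Keep forward-strand candidates whose complement also fits the reverse strand.

    The complement of a strand with counts (A, G, C, T) has counts (T, C, G, A).
    """
    rev_hits = set(candidate_compositions(mass_rev, length, tol_da))
    return [(a, g, c, t)
            for (a, g, c, t) in candidate_compositions(mass_fwd, length, tol_da)
            if (t, c, g, a) in rev_hits]


if __name__ == "__main__":
    true_comp = (25, 30, 22, 23)  # invented composition, not a real amplicon
    fwd = strand_mass(*true_comp)
    rev = strand_mass(23, 22, 30, 25)  # complementary strand
    single = candidate_compositions(fwd, 100, 0.5)
    paired = paired_compositions(fwd, rev, 100, 0.5)
    print(len(single), "single-strand candidates;", len(paired), "after pairing")
```

The brute-force enumeration is adequate for amplicons of roughly a hundred bases; the tolerance is given in daltons here purely for simplicity.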
Samples are divided into wells of a microtiter plate that contain pairs of broad-range PCR primers. A key element of the PCR/ESI-MS system is the ability to quantify the levels of genomic material present in the samples. To accomplish this, we include an internal calibrant in each PCR reaction performed. The calibrant consists of a specially designed nucleic acid sequence that is similar to, but distinguishable from, sequences of organisms we might detect. This allows for the simultaneous presence of an accurately quantified amount of a known calibrant deoxyribonucleic acid (DNA) with similar amplification properties to the target sequence during the PCR cycle. Further, this serves as an internal positive control for every PCR reaction. The relative abundances of the end products (as measured by the mass peak heights) can be used in conjunction with the starting concentration of the calibrant to accurately estimate the amount of target sequence in the starting sample (FIG. 2) . These measurements are highly reproducible over a wide range of concentrations and provide improved confidence for the determination of the quantity of the infectious agents in the sample. 24, 25 In this section, we describe two specific applications of the PCR/ESI-MS method for identifying known and emerging viruses of particular relevance to Southeast Asia. These concepts are equally applicable to bacteria, 24,25 other viruses, fungi, and protozoa. At the beginning of the SARS outbreak in early 2003, the etiology of the causative agent was not known. Several initial reports based on preliminary characterization of the isolated virus suggested it could be a member of one of several viral families, including Coronaviridae. We designed several broad-range primers targeting the various genera within these families. 
For broad-range detection of all coronaviruses, two PCR primer target regions in orf-1b, one in the RNA-dependent RNA polymerase (RdRp) and the other in Nsp14, were identified prior to publication of the SARS sequence. 23 Locations of primers within these regions were optimized both for sensitivity and broad-range priming potential simultaneously by amplifying dilutions of multiple, diverse coronaviruses. In silico analysis of the final primer pairs against GenBank nucleotide database sequences showed that these primers would be expected to amplify all coronaviruses but no other viruses, bacteria, or human DNA. 27 Multiple isolates of a number of different coronavirus species were obtained from various sources (including the American Type Culture Collection [ATCC] and the Naval Health Research Center in San Diego). SARS isolates were obtained from the Centers for Disease Control and Prevention (CDC) and from Michael Buchmeier (The Scripps Research Institute, La Jolla, CA). PCR products for all isolates were generated, desalted, and analyzed by ESI-MS by methods described previously. 28 The spectral signals were algorithmically processed to obtain base compositions of sense and antisense strands of each amplicon. 29 The results from the analysis of 14 coronavirus isolates are shown in TABLE 1. For both target regions, the measured signals agreed with compositions expected from the known coronavirus sequences in GenBank. Several of the isolates used in this study did not have a genome sequence record in GenBank. Nevertheless, we were able to amplify all test viruses and experimentally determine their base compositions. These experimentally determined base compositions were later confirmed by sequencing and deposited in GenBank (AY874541, AY878317-24). Thus the strategy described here enables identification of organisms without the need for prior knowledge of their sequence.
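The way signatures from two target regions combine to identify an isolate can be sketched as a simple lookup. This is an illustrative assumption about how such a comparison might be coded, not the system's actual software: the organism names and base counts below are invented placeholders (they are not the values in TABLE 1), and `triangulate` and `nearest` are hypothetical helper names.

```python
# Hypothetical signature database: organism -> {locus: (A, G, C, T)}.
# Organism names and base counts are invented placeholders, not measured data.
DATABASE = {
    "virus_X": {"RdRp": (27, 19, 14, 28), "nsp14": (38, 24, 21, 37)},
    "virus_Y": {"RdRp": (27, 19, 14, 28), "nsp14": (36, 25, 22, 37)},
    "virus_Z": {"RdRp": (25, 21, 15, 27), "nsp14": (38, 24, 21, 37)},
}


def triangulate(measured, database):
    """Return organisms whose expected compositions match at every measured locus."""
    return sorted(
        organism
        for organism, signatures in database.items()
        if all(signatures.get(locus) == comp for locus, comp in measured.items())
    )


def nearest(measured, database):
    """Rank organisms by total base-count difference across the measured loci."""
    def distance(signatures):
        return sum(
            sum(abs(m - e) for m, e in zip(comp, signatures[locus]))
            for locus, comp in measured.items()
        )
    return sorted(database, key=lambda organism: distance(database[organism]))


if __name__ == "__main__":
    one_locus = {"RdRp": (27, 19, 14, 28)}
    two_loci = {"RdRp": (27, 19, 14, 28), "nsp14": (36, 25, 22, 37)}
    print(triangulate(one_locus, DATABASE))  # one locus leaves two candidates
    print(triangulate(two_loci, DATABASE))   # the second locus resolves them
```

In this toy database one locus leaves two candidates and the second locus separates them. A signature with no exact match returns an empty list from `triangulate`, and `nearest` then indicates which known organisms it most resembles, which is the kind of placement one would want for an isolate absent from the database.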
We further demonstrated the potential to detect multiple viruses in the same sample, as might occur during a co-infection. Pooled viral extracts from three human coronavirus isolates, HCoV-229E, HCoV-OC43, and SARS CoV, were amplified in a mixture and analyzed by PCR/ESI-MS. Signals from all three viruses were clearly detected and resolved in the mass spectrum, demonstrating that co-infections of more than one coronavirus species could be identified. In addition to broad identification of all coronaviruses tested, including the SARS CoV, we were able to accurately classify the SARS isolates as distantly related members of the group 2 coronaviruses, consistent with the findings of Snijder et al. 30 These results are described in detail in Sampath et al. 23

Viral encephalitis is caused by a number of different viruses from different viral families. Important among these are members of the genus Alphavirus (Togaviridae family) and the genus Flavivirus (Flaviviridae family); both are single-stranded, positive-sense RNA viruses. These are major vector-borne diseases in Southeast Asia and Pacific Rim countries. While the alphaviruses Venezuelan, eastern, and western equine encephalitis viruses (VEEV, EEEV, and WEEV) have received much attention in the Americas for their ability to infect humans and their potential to be used as biowarfare agents, Ross River, Chikungunya, Barmah Forest, and other alphaviruses are common and detrimental to public health in the Asia-Pacific region. 5, 31, 32 Genomic changes drive the emergence and reemergence of these viruses as high-consequence pathogens by causing alterations in host range and virulence properties, 33 and the resulting diversity at the nucleotide and protein levels poses a great challenge for identification and diagnostic strategies.
As an example, VEE complex viruses display 13 serologically distinct types belonging to 6 antigenic subtypes (I to VI), and the EEE virus antigenic complex has 4 antigenic subtypes (I-IV), spanning the North and South American isolates. 34 Our approach for identifying and discriminating these viruses was to design several primer pairs that would amplify all members of the Alphavirus genus, and then design primers that amplify distinct regions of the Alphavirus isolates to distinguish subtypes. 25 Broad-range primer pairs targeting the 5′ end of the RNA virus genome (nonstructural protein, nsP1 gene) were designed and tested against a number of different viral isolates, including both Old World and New World alphaviruses. All of the test isolates showed a perfect match to the expected base compositions where prior data were known, and the tested viral isolates were readily distinguished from one another (TABLE 2). Importantly, the base composition data for some of the isolates represented new measurements from regions that have not been sequenced, showing the potential of this strategy to identify previously unknown or newly emerging viruses. Additional confirmation primer pairs targeting the Old World and New World alphaviruses have been developed, and together they provide complete resolution among all known alphavirus species (Eshoo et al., Personal Observation).

The biosensor system described here is uniquely suited to influenza virus detection and surveillance in avian, human, or other animal reservoirs. Broad surveillance capability for all influenza viruses is complemented by simultaneous type and subtype level resolution of various identified influenza viruses. This will be useful in monitoring the emergence of new influenza viruses.
Based upon the analysis of multiple influenza sequence alignments, pan-influenza virus PCR primer sets have been developed that are capable of amplifying all three influenza virus species (A, B, and C) and subtypes (HxNy) from different animal hosts (human, avian, swine, etc.) and of distinguishing their essential molecular features using base composition signatures (Sampath et al., Personal Observation). To measure the breadth of coverage and resolution offered by this panel we tested 50 different influenza virus isolates, including

The TIGER strategy uses a paradigm that allows simultaneous identification of multiple pathogens, including all bacteria, viruses, fungi, and other microorganisms present in a sample without the need for prior knowledge of a specific nucleotide target sequence. Thus previously unanticipated, newly emerging, or potentially unknown organisms can be identified using this platform. The strategy described here uses integrated ESI-MS and base composition analysis of broad-range PCR products. Broad-range PCR reactions are capable of producing products from groups of organisms, rather than single species, and the information content of each PCR reaction is very high. In many cases, including the SARS-CoV detection described above, priming across broadly conserved regions provides taxonomic resolution at the species level. Where additional subspecies level classification becomes important, broad primers can be supplemented with species-specific primers that can detect even single nucleotide polymorphisms (SNPs) at highly variable loci. As a genotypic method for microbial identification, PCR/ESI-MS can provide strain- or isolate-level information. 35 These data may be important in epidemiological monitoring to determine the origin of a particular outbreak. Due to its high throughput capabilities, the methods described will have great utility in a routine survey and detection setting.
The mass spectrometer is capable of analyzing complex PCR products at a rate of approximately 1 min per sample. Because the process is performed in an automated, microtiter plate format, large numbers of samples can be examined (>1,500 PCR reactions/day/instrument), making this process practical for large-scale analysis of clinical or environmental surveillance samples in public health laboratory settings. This approach can be extended to other viral, bacterial, fungal, or protozoal pathogen groups and is a powerful new paradigm for timely identification of previously unknown organisms that cause disease in humans or animals and for monitoring the progress of epidemics.

1. Emerging infectious diseases-Southeast Asia
2. Nipah virus infection, an emerging paramyxoviral zoonosis
3. Nipah virus infection: pathology and pathogenesis of an emerging paramyxoviral zoonosis
4. Molecular epidemiology of human enterovirus 71 strains and recent outbreaks in the Asia-Pacific region: comparative analysis of the VP1 and VP4 genes
5. Emerging viral diseases of Southeast Asia and the Western Pacific
6. Identification of severe acute respiratory syndrome in Canada
7. Coronavirus as a possible cause of severe acute respiratory syndrome
8. Novel coronavirus and severe acute respiratory syndrome
9. A novel coronavirus associated with severe acute respiratory syndrome
10. Risk factors for human disease emergence
11. Analysis of a marine picoplankton community by 16S rRNA gene cloning and sequencing
12. Bacterial diversity within the human subgingival crevice
13. Phylogenetic analysis of a highly conserved region of the polymerase gene from 11 coronaviruses and development of a consensus polymerase chain reaction assay
14. Molecular phylogeny and proposed classification of the simian picornaviruses
15. A sensitive method for the identification of uncharacterized viruses related to known virus groups: hepadnavirus model system
16. PCR method for detection of adenovirus in urine of healthy and human immunodeficiency virus-infected individuals
17. A multiplex reverse transcription-PCR method for detection of human enteric viruses in groundwater
18. Serotyping Streptococcus pneumoniae by multiplex PCR
19. High-density microarray of small-subunit ribosomal DNA probes
20. Microarray-based detection and genotyping of viral pathogens
21. Viral discovery and sequence recovery using DNA microarrays
22. Novel biosensor for infectious disease diagnostics
23. Rapid identification of emerging pathogens: coronavirus
24. Rapid identification and strain-typing of respiratory pathogens for epidemic surveillance
25. TIGER: the universal biosensor
26. Mass spectrometry provides accurate characterization of two genetic marker types in Bacillus anthracis
27. RNAMotif-An RNA secondary structure definition and search algorithm
28. A highly efficient and automated method of purifying and desalting PCR products for analysis by electrospray ionization mass spectrometry
29. Length and base composition of PCR-amplified nucleic acids using mass measurements from electrospray ionization mass spectrometry
30. Unique and conserved features of genome and proteome of SARS-coronavirus, an early split-off from the coronavirus group 2 lineage
31. Development of reverse transcription-PCR assays specific for detection of equine encephalitis viruses
32. Viruses of the Bunya-Togaviridae families: potential as bioterrorism agents and means of control
33. Venezuelan equine encephalitis emergence: enhanced vector infection from a single amino acid substitution in the envelope glycoprotein
34. Genetic and antigenic diversity among eastern equine encephalitis viruses from North, Central and South America
35. High throughput genotyping of Acinetobacter baumannii

All authors are full-time employees and stockholders of Ibis Biosciences, Inc., a wholly owned subsidiary of Isis Pharmaceuticals, Inc. TIGER technology was jointly developed by Ibis Biosciences, a subsidiary of Isis Pharmaceuticals, and SAIC (San Diego, USA).
We acknowledge DARPA, CDC, Department of Homeland Security (DHS), and National Institute of Allergy and Infectious Diseases (NIAID) for financial support for development of the technology and Jacqueline R. Wyatt, Ph.D. (www.JL-SciEdit.com) for carefully reviewing the manuscript. Patents are currently pending in the United States and internationally. Several key participants contributed significantly to the development and implementation of various aspects of the technology but are too numerous to be listed individually by name.