key: cord-0878990-6i66zf0e
authors: Rutvisuttinunt, Wiriya; Klungthong, Chonticha; Thaisomboonsuk, Butsaya; Chinnawirotpisan, Piyawan; Ajariyakhajorn, Chuanpis; Manasatienkij, Wudtichai; Phonpakobsin, Thipwipha; Lon, Chanthap; Saunders, David; Wangchuk, Sonam; Shrestha, Sanjaya K.; Velasco, John Mark S.; Alera, Maria Theresa P.; Simasathien, Sriluck; Buddhari, Darunee; Jarman, Richard G.; Macareo, Louis R; Yoon, In-Kyu; Fernandez, Stefan
title: Retrospective use of next-generation sequencing reveals the presence of Enteroviruses in acute influenza-like illness respiratory samples collected in South/South-East Asia during 2010–2013
date: 2017-07-14
journal: J Clin Virol
DOI: 10.1016/j.jcv.2017.07.004
sha: c2033bc278a4005808e5338b3a0ffbdfdb92411b
doc_id: 878990
cord_uid: 6i66zf0e

BACKGROUND: Emerging and re-emerging respiratory pathogens represent an increasing threat to public health. Etiological determination during outbreaks generally relies on clinical information, occasionally accompanied by traditional laboratory molecular or serological testing. Often, this limited testing leads to inconclusive findings. The Armed Forces Research Institute of Medical Sciences (AFRIMS) collected 12,865 nasopharyngeal specimens from acute influenza-like illness (ILI) patients in five countries in South/South East Asia during 2010–2013. Three hundred and twenty-four samples that were negative for influenza virus after screening with real-time RT-PCR and cell-based culture techniques nonetheless showed evident cytopathic effect (CPE) in several cell lines, indicating potential viral infection.

OBJECTIVE: To assess whether whole genome next-generation sequencing (WG-NGS) together with conventional molecular assays can be used to reveal the etiology of influenza negative, but CPE positive specimens.

STUDY DESIGN: The supernatants of these CPE positive cell cultures were grouped in 32 pools containing 2–26 supernatants per pool.
Three WG-NGS runs were performed on these supernatant pools. Sequence reads were used to identify positive pools containing viral pathogens. Individual samples in the positive pools were confirmed by qRT-PCR, RT-PCR, PCR and Sanger sequencing from the CPE culture and original clinical specimens.

RESULTS: WG-NGS was an effective way to expand pathogen identification in surveillance studies. It enabled the identification of a viral agent in 71.3% (231/324) of unidentified surveillance samples, including common respiratory pathogens (100/324; 30.9%): enterovirus (16/100; 16.0%), coxsackievirus (31/100; 31.0%), echovirus (22/100; 22.0%), human rhinovirus (3/100; 3.0%), enterovirus genus (2/100; 2.0%), influenza A (9/100; 9.0%), influenza B (5/100; 5.0%), human parainfluenza (4/100; 4.0%), human adenovirus (3/100; 3.0%), human coronavirus (1/100; 1.0%), human metapneumovirus (2/100; 2.0%), and mumps virus (2/100; 2.0%), in addition to the non-respiratory pathogen herpes simplex virus type 1 (HSV-1) (172/324; 53.1%) and HSV-1 co-infection with respiratory viruses (41/324; 12.7%).

Asia is an epicenter for the emergence of diverse respiratory pathogens representing major global public health threats [1] [2] [3]. Accurate molecular identification of respiratory pathogens is a vital decision-making tool for public health officials and health care providers. Unfortunately, traditional laboratory techniques do not allow broad detection of viral pathogens during routine surveillance [4]. Next-generation sequencing (NGS) is a powerful tool to detect emerging or re-emerging pathogens, and to obtain information about the frequency of intra-host genetic variation and virological responses to treatments [5] [6] [7]. NGS has been utilized to study commonly circulating respiratory viruses and novel viruses found in vectors, animals, and humans [8] [9] [10].
In addition, whole genome NGS (WG-NGS) has been utilized to identify viruses in diagnostic clinical virology [11] [12] [13] and occasionally applied in parallel with standard diagnostic assays [12, 14].

During 2010-2013, the Armed Forces Research Institute of Medical Sciences (AFRIMS) conducted routine acute respiratory illness surveillance in 5 countries in South/South East (S/SE) Asia: Nepal, Bhutan, Thailand, Cambodia, and Philippines. A total of 12,865 nasal and/or oropharyngeal swabs in universal transport media (UTM) were collected from individuals seeking medical care for acute influenza-like illness (ILI). These specimens were tested for influenza virus (IFV) by rRT-PCR, and 52.4% of the IFV negative specimens were sent for cell-based culture isolation. Of these, 324 produced CPE but were negative in common respiratory virus serology tests, suggesting the presence of uncommon respiratory viruses in the culture. We sought to determine whether we could identify a pathogen in CPE positive samples even though the samples tested negative using antibody panels for common respiratory viruses. The information obtained will be used to improve current molecular detection protocols, which can then be incorporated in routine surveillance and clinical diagnosis.

To investigate whether WG-NGS can be used to uncover the etiology of IFV negative, but CPE positive specimens, we utilized the Illumina MiSeq platform in conjunction with traditional molecular assays. Viral isolates from IFV rRT-PCR negative specimens that showed CPE were pooled and sequenced by WG-NGS. NGS sequences were utilized to identify positive pools containing viral pathogens. PCRs and Sanger sequencing were further performed on individual samples from the positive pools to identify the presence of specific viral genotypes in the positive specimens.

Nasal and/or oropharyngeal swabs in UTM (Copan Diagnostics Inc., California, USA) were collected from ILI surveillance sites in S/SE Asia during 2010-2013.
The surveillance samples were collected in Nepal, Bhutan, Thailand, Cambodia and Philippines under protocols approved by the Institutional Review Boards of host country institutions and the Walter Reed Army Institute of Research (WRAIR). Respiratory specimens negative by IFV rRT-PCR underwent viral isolation in Madin-Darby canine kidney (MDCK) and HEp-2 cells (Fig. 1) [15, 16]. Culture supernatants from CPE positive cultures were tested by hemagglutination inhibition (HAI) for IFV and by indirect fluorescent antibody (IFA) for respiratory syncytial virus (RSV), IFV A, IFV B, human adenovirus (HAdV), and human parainfluenza (hPIV) 1, 2 and 3. CPE positive, but HAI and IFA-negative supernatants were selected for further testing by WG-NGS.

The supernatants of these CPE positive cultures were grouped in 32 pools containing 2-26 supernatants per pool based on their country of origin, the year they were initially collected and the cell line used for isolation (Tables 1A and 1B). An equal volume of each CPE positive supernatant was pooled to make up a total volume of 300 μl/pool. Virus enrichment steps, including removal of host cells and virus concentrating steps, were applied as described previously [16, 17]. The QIAamp Viral RNA Extraction Kit (QIAGEN, CA, USA) was used for viral nucleic acid extraction. NanoDrop (Wilmington, DE, USA) and Agilent Tapestation1100 (Agilent Technologies, CA, USA) were utilized to quantify and determine the quality of the DNA and RNA (OD 260 /OD 280 values). TruSeq LT RNA sample preparation was conducted following the manufacturer's instructions as described before [16, 17]. Briefly, RNA was heat-fragmented, followed by double stranded cDNA synthesis (Life Technologies, NY, USA) and adapter ligation. DNA quantity and quality were validated and monitored by Agilent Tapestation1100, Qubit® 2.0 Fluorometer (Life Technologies, NY, USA) and rRT-PCR (Applied Biosystems, CA, USA).
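The pooling scheme described above (grouping by country, collection year and cell line, then combining equal volumes up to 300 μl per pool) can be illustrated with a minimal sketch. The sample records, the field names and the `plan_pools` helper are hypothetical and are not taken from the study; only the grouping keys and the 300 μl total come from the text.

```python
from collections import defaultdict

POOL_VOLUME_UL = 300.0  # total pool volume stated in the text


def plan_pools(samples):
    """Group samples by (country, year, cell line) and split the pool volume equally."""
    groups = defaultdict(list)
    for sample in samples:
        key = (sample["country"], sample["year"], sample["cell_line"])
        groups[key].append(sample["id"])
    plan = {}
    for key, ids in groups.items():
        plan[key] = {
            "samples": sorted(ids),
            # Equal volume per supernatant so that the pool totals 300 ul
            "ul_per_sample": POOL_VOLUME_UL / len(ids),
        }
    return plan


# Hypothetical records, for illustration only
samples = [
    {"id": "S1", "country": "Nepal", "year": 2011, "cell_line": "MDCK"},
    {"id": "S2", "country": "Nepal", "year": 2011, "cell_line": "MDCK"},
    {"id": "S3", "country": "Nepal", "year": 2011, "cell_line": "HEp-2"},
    {"id": "S4", "country": "Bhutan", "year": 2012, "cell_line": "MDCK"},
    {"id": "S5", "country": "Nepal", "year": 2011, "cell_line": "MDCK"},
]

plan = plan_pools(samples)
for key, info in sorted(plan.items()):
    print(key, info)
```

Under this equal-volume assumption, a pool of 2 supernatants would take 150 μl from each, and a pool of 26 would take about 11.5 μl from each, so any single virus is diluted more in the larger pools.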
Three runs (10-12 pools/run) were conducted with Illumina MiSeq reagent version 2 according to the manufacturer's protocol. The quality control of sequence reads was done as previously described [16, 17] with additional steps used for novel pathogen findings (Fig. 2). Identification of individual viruses from the pool was performed as follows. Viral genome quantification was conducted by (i) de novo assembly of sequence reads into contigs (consensus sequences assembled from sequence reads) by Trinity [18], followed by identification of the contigs in GENBANK by BLASTn, and (ii) de novo assembly by meta-IDBA [19], followed by BLAST against a complete genome sequence database. Novel pathogen classification followed the analysis procedure by BLASTx alignment against the non-redundant (nr) protein database as described previously [20] with the cut-off at 500 kb or greater in contig length with undetectable similarity to reference.

Real-time RT-PCR, RT-PCR or PCR was used to confirm the WG-NGS results from CPE culture and clinical specimens. Table S1 displays the PCR and RT-PCR primer sequences used to confirm the WG-NGS results. The pan-EV RT-PCR [21] results were confirmed by Sanger sequencing at AITbiotech Pte Ltd. Company, Singapore. The accession numbers of our sequences in GENBANK are listed in Table 3; a small subset of samples was previously published by Zhou et al., 2016. To determine the genetic variations and the relationships with the reference viruses, phylogenetic trees were constructed using Seaview (v4.0) and GTR+G+I parameters calculated by MEGA v. 6.0.

During routine ILI surveillance in S/SE Asia in 2010-2013, 4325 of the 12,865 respiratory specimens collected (approximately 33.6%) were found to be positive for IFV by rRT-PCR and 8540 (66.4%) specimens were found to be negative (Fig. 1).
Approximately 52.4% (4478 of 8540) of the IFV rRT-PCR negative specimens were cultured, 12.8% of which (572 of 4478) were found to induce CPE in either MDCK or HEp-2 cells. A total of 339 of 572 (59.3%) of these CPE positive cultures were negative by standard respiratory HAI and IFA assays, of which 324 (95.6%) were pooled (2-26 culture samples per pool). A total of 32 distinct pools were sequenced by WG-NGS to identify viral agents present in the supernatant following the flow diagram illustrated in Fig. 2.

A total of 46.1 million sequence reads were generated from 3 sequencing runs. After selecting sequence reads with Q scores ≥30 and removing background sequences, 5.9 million sequence reads remained for analyses. Sequence alignment using BLASTn against the NCBI nt database identified viral sequences in 29 of 32 (90.6%) pools (Tables 1A and 1B). Pools from Cambodia 2013, Philippines 2012 and Philippines 2013 yielded sequence reads that BLASTn against the NCBI nt database identified as host background rather than viral pathogens. Sequence reads aligning with HSV-1 were the most common (25 of 32 pools, 78%). In addition to HSV-1, 11 other viral sequences were detected in the pools, including mumps virus (MuV), enterovirus (EV), Coxsackievirus (CV), echovirus (E), human rhinovirus (hRV), IFV A, IFV B, hPIVs, human adenovirus (HAdV), human coronavirus (HCoV), and human metapneumovirus (hMPV). We verified the pools' results using conventional PCR and RT-PCR and were able to confirm the WG-NGS results in 63/105 cases (60%, Tables 1A and 1B, shaded cells). A discrepancy occurred in the Philippines 2012 pool, where PCR/RT-PCR testing failed to corroborate the presence of any of the 4 viruses detected by WG-NGS. No pathogen identification was made by WG-NGS and PCR in three pools (Tables 1A and 1B, italic letters).
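The read-selection step mentioned above (keeping reads that meet a Q score threshold of 30) can be sketched as follows. The text does not say whether the threshold was applied to the mean read quality or per base, nor which tool was used, so this sketch assumes a mean-quality filter on four-line FASTQ records with Phred+33 encoding; the function names and the two example reads are illustrative only.

```python
import io


def mean_phred(quality_line, offset=33):
    """Mean Phred score of one read, assuming Phred+33 (Sanger/Illumina 1.8+) encoding."""
    return sum(ord(char) - offset for char in quality_line) / len(quality_line)


def filter_fastq(handle, min_mean_q=30):
    """Yield (header, sequence, quality) for reads whose mean Phred score is >= min_mean_q.

    Assumes simple four-line FASTQ records (no wrapped sequence lines).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        quality = handle.readline().rstrip("\n")
        if quality and mean_phred(quality) >= min_mean_q:
            yield header, sequence, quality


# Two made-up reads: 'I' encodes Q40, '#' encodes Q2
example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGT\n+\n####\n"
)
kept = list(filter_fastq(example))
print([header for header, _, _ in kept])
```

Only the first read passes in this toy input. A real pipeline would also remove host and background sequences, as the text describes, before assembly and BLAST searches.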
The 310 culture supernatants that made up the 29 pools were further tested by pathogen-specific PCR, RT-PCR or rRT-PCR to validate the WG-NGS findings and to quantify the percentage of samples within each pool where viruses were present (Table 2). Two hundred and thirty-one of the 310 (74.5%) individual samples were positive using at least one of the pathogen-specific PCR tests, verifying the pathogens identified by NGS in the pooled samples (Table 2). Approximately 43.3% (100/231) of the positive samples were respiratory pathogens. Of these, CVs were the most common respiratory viruses found, present in 31.0% (31/100) of all positive samples, followed by E, found in 22.0% of samples. EV (16.0%), hRV (3.0%), MuV (2.0%), hMPV (2.0%) and HCoV (1.0%) were all found at lesser frequencies (Table 2). Two samples positive for the pan-enterovirus (pan-EV) PCR could not be further typed and appear in Table 2 as pan-EV positive (2%). Also found were respiratory viruses that failed to be identified during the original screening for IFV A and HAdV due to a few mismatches in primer sequences (Fig. S2).

The non-respiratory virus HSV-1 was the most common virus found in the individual supernatants, present in 172 of 310 samples (55.4%, Table S3). In at least 41 of these samples (23.8%) HSV-1 appeared as a co-infecting agent, most commonly co-present with enteroviruses. One sample collected from Philippines in 2012 contained the HCoV OC43 strain co-infected with HSV-1. In addition, standard molecular procedures were conducted in selected samples, and the results validated the presence of the identified pathogens in both clinical specimens and CPE culture. The majority of clinical specimens were depleted and unavailable for this confirmation test.

Seventy-one EV positive samples (22 from HSV-1 co-infected samples) from Nepal, Bhutan, Thailand, Cambodia and Philippines were further subtyped within the EV genus utilizing the GTR+G+I model maximum likelihood tree (Fig. 3) using the Sanger sequences obtained from their PCR amplicons as previously described. The 71 positive pan-EV samples were found to be CV B5 ( [22] . Five subtypes of the enterovirus genus and hRV species C were found in more than one country in SE Asia. The genetic distance among samples of the same serotype found in the 5 countries was less than 0.2 (data not shown). One case of EV71 was detected from samples collected from Thailand in 2011.

Seeking for Viral Pathogen Identification of Contigs: BLASTn against NCBI nt database of the Pools

Table 2. Confirmation of individual samples by PCR, RT-PCR or rRT-PCR.
a Further characterization of the enterovirus genus (from the pan-EV PCR-left side of dark grey column) of individual CPE samples was obtained from phylogenetic analysis of sequences (Fig. 3). Total numbers of samples in the gray highlighted columns are equal to the number in the Pan-EV column. Bold letters confirm the presence of the pathogen previously identified in the pools by WG-NGS. Asterisks indicate the 2 exceptional cases in which the pathogens identified differed from those indicated by the pools. Unlined letters indicate samples containing mismatched sequences where the rRT-PCR primers bind (Table S2).

A summary of the pathogens identified and the cases with still unclear etiology collected during routine respiratory surveillance 2010-2013 is illustrated in Fig. 4. Our analyses found HSV-1 to be the most frequent virus present in our supernatants, followed by the EV genus and other common respiratory viruses, which was in agreement with the observation in Thailand reported by Zhou et al. [22]. The frequency of HSV-1 positive samples was most likely due to the exacerbation of existing HSV-1 infections caused by common respiratory infections in patients and the subsequent HSV-1 viral shedding in the upper respiratory tract at the time of sample collection.
Individuals with HSV-1 infections were routinely described as asymptomatic or subclinical [23]. It was estimated that HSV-1 seroprevalence was about 51% in SE Asia [24] and 33.3% in S Asia [25].

Viruses from the EV genus were the most common (71%) respiratory pathogens identified. Nine samples from Thailand 2010-2011 were sequenced individually and confirmed the presence of identical enterovirus serotype sequences [22]. Despite the global spread of EV-D68 [26], it was not detected in our surveillance study, which enrolled adults in the 18-65 age group during 2010-2013. Our study, with its limited sample size, detected a similar range of enterovirus serotypes (CV B2, CV B3, CV B4, E6 and E9) in SE Asian countries as other recent studies in Thailand [21, 22]. The strains of these serotypes identified outside Thailand have a genetic distance < 0.2 from the Thai strains. This suggests some temporal and spatial structure of both the Thai and, more broadly, the SE Asian EV populations, and thus may indicate some regional circulation of EV lineages and viral traffic between countries in SE Asia.

Overall, our study demonstrated that WG-NGS on pooled culture supernatants can be a useful tool to identify viral pathogens in clinical isolates. However, the identification of pathogens in samples that yielded low sequence reads requires additional testing and controls. Our data (Tables 1A and 1B) show total sequence reads without subtracting the background sequences of the negative control experiment. The low sequence reads in several samples were likely generated by a combination of cross talk among neighboring sequenced samples, contamination of barcodes and carry-over from previous runs. For example, the Philippines 2012 pool contained sequence reads identical to those of the Philippines 2011-2012 pool; the two pools were processed next to each other during library preparation.
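One way to apply the additional controls called for above is to subtract negative-control read counts before calling a virus in a pool. The sketch below is not the authors' procedure (their tables report reads without background subtraction); the `call_viruses` helper, both thresholds and all read counts are hypothetical and chosen only to show the idea.

```python
def call_viruses(pool_counts, control_counts, min_reads=10, min_fold=10.0):
    """Flag a virus as detected only if its reads clearly exceed the negative control.

    min_reads and min_fold are arbitrary illustrative thresholds, not values from the study.
    """
    calls = {}
    for virus, reads in pool_counts.items():
        background = control_counts.get(virus, 0)
        net_reads = reads - background
        # Require both an absolute margin and a fold excess over the control
        detected = net_reads >= min_reads and reads >= min_fold * max(background, 1)
        calls[virus] = {"net_reads": net_reads, "called": detected}
    return calls


# Hypothetical read counts for one pool and its negative control
pool = {"HSV-1": 5000, "echovirus": 12, "IFV A": 40}
negative_control = {"echovirus": 8, "IFV A": 1}

calls = call_viruses(pool, negative_control)
for virus, result in calls.items():
    print(virus, result)
```

With these made-up numbers, the 12 echovirus reads would not be called because 8 of them are matched by the control, which is the kind of low-read signal the text attributes to cross talk or carry-over. Any such call would still need PCR confirmation on individual samples.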
In addition, in three pools only host background and no viral pathogens were detected, possibly due to low viral load, absence of viral growth in the culture, or misreading of the culture morphology as CPE positive.

WG-NGS of pooled cultures provided comprehensive information from samples with unclear etiology after routine respiratory assays. Owing to the limitations of the study, only a subset of the samples (2.9%) was selected for individual sequencing, and the identity of the pathogens obtained from individual sequencing agreed with the results from the pooled samples. With the addition of WG-NGS, 29 of 32 (90.6%) pools were shown to contain viral pathogens identifiable using the GENBANK nt database. When broken down into individual samples, we were able to confirm the presence of other respiratory viruses in 100 of the 310 (32.3%) individual samples. WG-NGS substantially increased the valid information obtained beyond the standard respiratory assays. Furthermore, the WG-NGS and bioinformatics analysis were able to identify viral pathogens from pooled cultures containing up to 26 samples, which potentially makes this method time, labor and cost effective. This procedure allows screening of larger numbers of samples, with practical applications to surveillance or clinical studies.

However, the sensitivity of detection of this procedure may have some limitations [27]. Pooled CPE positive cultures, rather than individually sequenced primary specimens, were mainly tested. Without taking into consideration samples where HSV-1 was detected, approximately 67.8% of the samples in the pools remained of unknown etiology after WG-NGS and bioinformatics data analysis (Fig. 4). A potential and likely significant limitation in this work was the choice of cell lines for virus isolation. Restricting isolation to two cell lines may have curtailed the types of viruses we were able to isolate and identify.
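The pool-level and sample-level proportions quoted in this discussion follow directly from the reported counts, and a short calculation makes the effect of pooling on sequencing workload explicit. The counts below are taken from the text; the `pct` helper and the "libraries saved" figure (pooled samples minus pools) are illustrative additions, and the sketch ignores the follow-up PCR testing of individual samples.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


# Counts reported in the text
pools_total, pools_with_virus = 32, 29
samples_pooled = 324
samples_in_positive_pools = 310
samples_pcr_positive = 231
samples_respiratory = 100

summary = {
    "pools with viral reads (%)": pct(pools_with_virus, pools_total),
    "individual samples PCR positive (%)": pct(samples_pcr_positive, samples_in_positive_pools),
    "individual samples with a respiratory virus (%)": pct(samples_respiratory, samples_in_positive_pools),
    # Pooling replaces one library per sample with one library per pool
    "sequencing libraries saved by pooling": samples_pooled - pools_total,
}
for label, value in summary.items():
    print(f"{label}: {value}")
```

The saving in libraries comes at the cost of a second testing stage, since each sample in a positive pool still has to be checked individually by PCR.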
Moreover, antibiotics required to maintain the MDCK and HEp-2 cultures may have also favored the growth of certain viruses over others.

Despite being a useful tool to screen for viral pathogens, WG-NGS on pooled cultures may suffer from low resolution not only for pathogen identification but also for pathogen discovery. WG-NGS utilizes random amplification, which also amplifies the host genome, increasing the complexity of the task of identifying pathogens generally present in low abundance in clinical specimens and culture [28]. Long sequence reads with high depth of coverage (DOC) are optimal to identify diverse types of viruses from various types of clinical specimens [29] [30] [31] [32]. Novel pathogen discovery would require advanced sample preparation with higher capacity host cell removal procedures applied directly to the primary clinical specimen, together with efficient pathogen discovery bioinformatics pipelines.

In summary, WG-NGS on pooled samples provides advantages as a screening tool to complement standard assays used during the surveillance and diagnostic process. WG-NGS does not require a predefined target for identification of specific pathogens based on clinical presentation. In addition, WG-NGS broadens the array, magnitude and complexity of pathogen detection since it is not limited by pathogen-specific primer and probe sequences. WG-NGS does not require continual updating of primer and probe sequences or the continued use of culture for isolation. However, the technology provides complex results which require careful sample preparation, powerful sequencers, exhaustive data analysis with well-planned bioinformatics pipelines and large databases for pathogen identification.

Armed Forces Health Surveillance Center - Global Emerging Infections Surveillance and Response System (AFHSC-GEIS).

Material has been reviewed by the Walter Reed Army Institute of Research. There is no objection to its presentation and/or publication.
The opinions or assertions contained herein are the private views of the author, and are not to be construed as official, or as reflecting true views of the Department of the Army, Department of the Navy or the Department of Defense.

Fig. 4. Regional map displaying the distribution of respiratory viral pathogens identified by NGS from respiratory specimens collected during

[1] Coronavirus host range expansion and Middle East respiratory syndrome Coronavirus emergence: biochemical mechanisms and evolutionary perspectives
[2] High genetic diversity and frequent genetic reassortment of avian influenza A(H9N2) viruses along the East Asian-Australian migratory flyway
[3] Prevalence and molecular characterizations of enterovirus D68 among children with acute respiratory infection in China between
[4] Unbiased detection of respiratory viruses by use of RNA sequencing-based metagenomics: a systematic comparison to a commercial PCR panel
[5] The RNA virus quasispecies: fact or fiction
[6] Within-host nucleotide diversity of virus populations: insights from next-generation sequencing
[7] The first identification and retrospective study of Severe Fever with Thrombocytopenia Syndrome in Japan
[8] Biological characterization and next-generation genome sequencing of the unclassified Cotia virus SPAn232 (Poxviridae)
[9] Next-generation sequencing of elite berry germplasm and data analysis using a bioinformatics pipeline for virus detection and discovery
[10] Development of a virus detection and discovery pipeline using next generation sequencing
[11] HIV Research for Prevention 2014: AIDS Vaccine, Microbicide and ARV-based Prevention Science (HIV R4P) in Cape Town
[12] Evaluation of unbiased next-generation sequencing of RNA (RNA-seq) as a diagnostic method in influenza virus-positive respiratory samples
[13] Identification and characterization of Highlands J virus from a Mississippi sandhill crane using unbiased next-generation sequencing
[14] Whole genome: next-generation sequencing as a virus safety test for biotechnological products
[15] The impact of primer and probe-template mismatches on the sensitivity of pandemic influenza A/H1N1/2009 virus detection by real-time RT-PCR
[16] Simultaneous and complete genome sequencing of influenza A and B with high coverage by Illumina MiSeq Platform
[17] Viral subpopulation diversity in influenza virus isolates compared to clinical specimens
[18] Full-length transcriptome assembly from RNA-Seq data without a reference genome
[19] Meta-IDBA: a de Novo assembler for metagenomic data
[20] Discovery of STL polyomavirus, a polyomavirus of ancestral recombinant origin that encodes a unique T antigen by alternative splicing
[21] Prevalence and characterization of enterovirus infections among pediatric patients with hand foot mouth disease, herpangina and influenza like illness in Thailand
[22] Metagenomics study of viral pathogens in undiagnosed respiratory specimens and identification of human enteroviruses at a Thailand hospital
[23] Public health strategies to prevent genital herpes: where do we stand?
[24] Age-specific prevalence of infection with herpes simplex virus types 2 and 1: a global review
[25] Seroprevalence of HSV1 and HSV2 infections in family planning clinic attenders
[26] Clusters of acute respiratory illness associated with human enterovirus 68-Asia, Europe, and United States
[27] The use of next generation sequencing in the diagnosis and typing of respiratory infections
[28] The diagnosis of infectious diseases by whole genome next generation sequencing: a new era is opening
[29] Next-generation sequencing technologies in diagnostic virology
[30] Next-generation sequencing technology in clinical virology
[31] Temporal response of the human virome to immunosuppression and antiviral therapy
[32] Sequence analysis of the human virome in febrile and afebrile children

We would like to thank Dr. Simon Pollett and Dr. Damon Ellison for comments on the manuscript; Dr. Kimberly Bishop-Lilly, Mr. Gregory Rice, Ms. Ragina Cer, Dr. Patrick Chain for advice on bioinformatics; Ms.
Thidarat Intararit and the clinical research coordinator team for support on the clinical information; Mr. Chitchai Hemachudha, Ms. Angkana Huang, Ms. Tipawan Thipwong and the project management section for specimen processing and demographic data on study subjects; Ms. Kamonthip Rungrojcharoenkit, Ms. Duangrat Mongkolsirichaikul and the virology and serology laboratory team for processing viral isolates, CPE and HAI; and Mr. Kittinun Hussem from the AFRIMS molecular section team and Ms. Phatcharin Chotchuang, an undergraduate student from Burapha University, for laboratory technical assistance on rRT-PCR, Tapestation and qPCR.

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.jcv.2017.07.004.