key: cord-0264660-y954ajwf authors: Zhang, Maodong; Huang, Yanyun; Godson, Dale L.; Fernando, Champika; Alexander, Trevor W.; Hill, Janet E. title: Assessment of metagenomic sequencing and qPCR for detection of influenza D virus in bovine respiratory tract samples date: 2020-06-10 journal: bioRxiv DOI: 10.1101/2020.06.10.144782 sha: 534fc35dc4c2f25946659137a8ab95cfce643837 doc_id: 264660 cord_uid: y954ajwf High throughput sequencing is currently revolutionizing the genomics field and providing new approaches to the detection and characterization of microorganisms. The objective of this study was to assess the detection of influenza D virus (IDV) in bovine respiratory tract samples using two sequencing platforms (MiSeq and Nanopore (GridION)), and species-specific qPCR. An IDV-specific qPCR was performed on 232 samples (116 nasal swabs and 116 tracheal washes) that had been previously subject to virome sequencing using MiSeq. Nanopore sequencing was performed on 19 samples positive for IDV by either MiSeq or qPCR. Nanopore sequence data was analyzed by two bioinformatics methods: What’s In My Pot (WIMP, on the EPI2ME platform), and an in-house developed analysis pipeline. The agreement of IDV detection between qPCR and MiSeq was 82.3%, between qPCR and Nanopore was 57.9% (in-house) and 84.2% (WIMP), and between MiSeq and Nanopore was 89.5% (in-house) and 73.7% (WIMP). IDV was detected by MiSeq in 14 of 17 IDV qPCR-positive samples with Cq (cycle quantification) values below 31, despite multiplexing 50 samples for sequencing. When qPCR was regarded as the gold standard, the sensitivity and specificity of MiSeq sequence detection were 28.3% and 98.9%, respectively. We conclude that both MiSeq and Nanopore sequencing are capable of detecting IDV in clinical specimens with a range of Cq values. Sensitivity may be further improved by optimizing sequence data analysis, improving virus enrichment, or reducing the degree of multiplexing. 
High throughput sequencing is currently revolutionizing the genomics field and providing new approaches to the detection and characterization of viruses.

The workflow of bioinformatic analysis is illustrated in Figure 1b. Nanopore raw data were demultiplexed and trimmed using Porechop, and reads that passed the quality score (Qscore) threshold of 7 were retained. These high-quality reads were aligned to the bovine genome (BioProject accessions PRJNA33843, PRJNA32899) using Minimap2, and unmapped reads (i.e. non-host-derived reads) from each sample were de novo assembled using Trinity [22] [23] [24].
Assembled contigs were mapped to the virus Reference Sequence (RefSeq) database using BLASTn, and virus-like contigs with a minimum alignment length of 100 bp and an expectation (e) value < 10^-3 were further examined by BLASTx alignment to the GenBank non-redundant protein sequence database to confirm the nucleotide sequence-based identification and to remove any spurious matches [25]. The total number of viral reads was determined as previously described [10].

Quality-filtered reads from the Nanopore sequencing were also uploaded to the EPI2ME platform for analysis with the WIMP (What's In My Pot, version 2.3.7) application for taxonomic classification of reads.

Figure 1. [...] sample and library preparation. Extracted RNA was used directly for qPCR, while DNA randomly amplified from the same extracts was used for MiSeq sequencing and GridION Nanopore sequencing; (b) Bioinformatic workflow to identify viruses in BRD samples. WIMP analysis was used only for Nanopore data. The remaining analysis was the same for data from both MiSeq and GridION Nanopore sequencing, except that Minimap2 was used instead of Bowtie2 for host sequence subtraction.

A total of 82.7 million reads were obtained from MiSeq. After removal of low-quality reads and host-derived reads, 33.6 million reads remained. A total of 1.8 million high-quality viral reads were generated, accounting for 2.19% of the total reads obtained from MiSeq [10]. A total of 5.9 million Nanopore reads, including unclassified (30.5%) and classified reads (69.5%), passed the quality filter (Qscore 7) using MinKNOW. After subtracting reads reported as "unclassified" by WIMP, a total of 0.41 million viral reads were obtained, accounting for 6.9% of total reads obtained (Figure 3).
The proportion of viral reads per sample was 0.1% to 18.4% (Nanopore, WIMP analysis) compared to 0.03% to 3.1% for the previously generated MiSeq data; however, with both sequencing approaches, the majority of reads obtained were identified as host-derived or other (bacteria, fungi, unclassified) (Figure 3).

In addition to WIMP classification of quality-filtered Nanopore reads, we also performed a de novo assembly of the Nanopore reads. The largest IDV contigs assembled for each sample from Nanopore data (using the in-house bioinformatics workflow, Figure 1b) were generally longer than those from MiSeq data, ranging from 626 to 2308 bp (Nanopore) and from 249 to 1584 bp (MiSeq) (Table 1). The genome (or genome segment) coverage of the largest contig from each sample ranged from 10.53% to 95.31%. The proportion of Nanopore reads mapped to IDV for each sample by in-house analysis was higher than that from MiSeq, except for samples T10 and T30. The proportion of IDV reads identified in the WIMP analysis of the Nanopore data, however, was generally comparable to that from the Nanopore (in-house) workflow (Table 1). The proportion of reads identified as IDV in Nanopore (WIMP), Nanopore (in-house) and MiSeq sequencing was generally extremely low (average 2.51%, 17.03% and 0.46%, respectively). As expected, approximately six times more reads were obtained for the individually sequenced sample 129 than for the multiplexed samples (Table 1). Sample 129 also had the lowest Cq value in the IDV qPCR (16.99, corresponding to 6.25 × 10^7 copies per reaction) and the highest proportion of IDV reads in the metagenomic sequencing results (Nanopore-WIMP 14.69%, Nanopore-in-house 27.72%, MiSeq 1.48%) (Table 1).

[...] on the x-axis indicate individual specimens; tracheal samples are denoted by "T" before animal number.
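The contig screening criterion described in the bioinformatic workflow (BLASTn alignment length of at least 100 bp and e-value below 10^-3) can be illustrated with a minimal Python sketch. It assumes the BLASTn results were saved in the standard 12-column tabular layout (`-outfmt 6`); the function names, the example contig names and the subject identifiers are hypothetical and are not taken from the authors' pipeline.

```python
from typing import Iterable, List, NamedTuple


class BlastHit(NamedTuple):
    """Subset of the default 12 columns of BLAST tabular output."""
    query_id: str      # column 1: qseqid (the assembled contig)
    subject_id: str    # column 2: sseqid (the reference sequence)
    align_length: int  # column 4: alignment length in bp
    evalue: float      # column 11: expectation value


def parse_blast_tabular(lines: Iterable[str]) -> List[BlastHit]:
    """Parse tab-separated BLAST hits, skipping blank and comment lines."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        hits.append(
            BlastHit(fields[0], fields[1], int(fields[3]), float(fields[10]))
        )
    return hits


def filter_virus_like(hits: Iterable[BlastHit],
                      min_length: int = 100,
                      max_evalue: float = 1e-3) -> List[BlastHit]:
    """Keep hits with alignment length >= min_length and e-value < max_evalue."""
    return [h for h in hits
            if h.align_length >= min_length and h.evalue < max_evalue]


# Hypothetical example rows (made-up contigs and subject identifiers).
EXAMPLE_LINES = [
    "# illustrative data only",
    "contig_1\tref_A\t97.5\t850\t21\t0\t1\t850\t100\t949\t1e-120\t1200",
    "contig_2\tref_B\t95.0\t60\t3\t0\t1\t60\t10\t69\t1e-10\t80",
    "contig_3\tref_C\t70.0\t300\t90\t2\t1\t300\t5\t304\t0.5\t30",
]

if __name__ == "__main__":
    for hit in filter_virus_like(parse_blast_tabular(EXAMPLE_LINES)):
        print(hit.query_id, hit.subject_id, hit.align_length, hit.evalue)
```

In this sketch only `contig_1` passes both thresholds; `contig_2` is too short and `contig_3` has too high an e-value. Contigs retained at this stage would then go to the BLASTx confirmation step described in the text.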
The 19 samples selected for sequencing on the Nanopore GridION platform represented a range of IDV concentrations based on qPCR of 6.88 to 6.25 × 10^7 genome copies per reaction, corresponding to Cq values ranging from 39.46 to as low as 16.99 (Table 1). The agreement of IDV detection between qPCR and Nanopore was 57.9% (in-house) and 84.2% (WIMP), and that between MiSeq and Nanopore was 89.5% (in-house) and 73.7% (WIMP). IDV was detected in the Nanopore data from all but one (18/19) of the IDV-positive samples when reads were classified using the WIMP application, but this proportion dropped to 11/19 when the in-house read assembly workflow was used. For most (7/8) of the samples with disparate results, 10 or fewer IDV reads were identified in the WIMP analysis. The exception was sample T10 with 29 IDV reads.

In order to explore qualitatively whether detection of other viruses in addition to IDV was comparable between the two metagenomic sequencing platforms, we compared the complete lists of viruses detected by MiSeq or Nanopore in the 19 IDV-positive samples. The number of different viruses detected in each sample varied from none to a maximum of four. The proportion of samples with perfect agreement between MiSeq and Nanopore (in-house) was 52.6%, between MiSeq and Nanopore (WIMP) was 47.4%, and between Nanopore (in-house) and Nanopore (WIMP) was 36.8% (Supplementary Table 1).

Metagenomic sequencing is transforming routine detection of viruses from traditional cell culture, antibody-antigen techniques and qPCR to detection of viruses in a target-independent manner. Sequencing approaches have now been widely applied for detection of known and novel agents in various types of clinical specimens in both human and veterinary medicine [26] [27] [28].
The potential usefulness of viral metagenomics for virus surveillance and diagnostics is still under debate due to its performance relative to the gold-standard method of real-time qPCR routinely employed in diagnostic laboratories [3]. A recent assessment of the performance of Nanopore, MiSeq and qPCR for detection of chikungunya and dengue viruses in serum or plasma samples with relatively high viral loads (Cq values from 14 to 32) demonstrated 100% agreement among these methods [1]. In that investigation, however, a maximum of 16 samples were multiplexed and sequenced using MiSeq, and each sample was sequenced individually on Nanopore [1]. This low degree of multiplexing translates to high analytical sensitivity, but correspondingly makes these technologies relatively expensive per sample and decreases their potential application for routine diagnostics. In our current study, we assessed the performance of metagenomic sequencing approaches with a higher degree of multiplexing of clinical samples in both MiSeq and Nanopore sequencing. IDV presented an excellent target for this comparison given its association with BRD in beef and dairy cattle, and the availability of specimens with a wider range of viral loads than has been included in previous investigations [19, 20, 32] [29]. In our current study, the original nucleic acids extracted with the MVSK were used for both qPCR and metagenomic sequencing, eliminating the influence of different extraction methods and kits on our results (Figure 1a).

The IDV-specific qPCR assay detected its target in 22.8% (53/232) of specimens. While the majority of IDV-positive samples with Cq values below 31 were detected by MiSeq, only 1/36 samples with a Cq above this threshold were positive by sequencing (Figure 2).
These results demonstrate that for samples where the viral load exceeds 6.25 × 10^2, even a relatively modest MiSeq sequencing effort (50 samples multiplexed in a single flow cell) is sufficient to detect the virus. The agreement between qPCR and Nanopore of 57.9% (in-house) and 84.2% (WIMP) demonstrated that a relatively modest Nanopore sequencing effort (6 samples multiplexed) is also sufficient to detect the virus. The results from Nanopore sequencing (in-house), however, showed no consistent relationship between viral load and detection by sequencing; furthermore, no consistent relationship between viral load and proportion of viral reads was observed in either MiSeq or Nanopore sequencing (Table 1). For example, the two IDV-positive samples 199 and 10 had Cq values of 26.01 and 24.09, respectively; however, sample 199 had a higher proportion of IDV sequence reads in Nanopore and MiSeq than sample 10, despite its higher Cq value (i.e. lower viral load) (Table 1).

There are several possible explanations for differences in both the proportion of IDV reads and total viral reads detected in each sample by MiSeq and Nanopore. First, variation in the amounts of DNA used for sequencing library preparation for the two sequencing platforms may play an important role. Second, the abundance of virus relative to host or bacterial genetic material is a critical determinant of the detection threshold of metagenomic sequencing. A greater proportional abundance of a virus increases the chance that it will be detected by sequencing and improves the genome coverage obtained. Therefore, virus enrichment is commonly applied to clinical samples, and enrichment methods such as those used in this study (a combination of centrifugation and nuclease treatment) should lead to removal of bacteria and host cells, thus improving virus detection [30].
Virus propagation in cell culture is a less appealing method for virus enrichment since it is time-consuming, requires specific expertise and creates the potential for introduction of mutations [31]. Reduction of the degree of multiplexing of samples is an alternative way to improve virus detection, but there is a corresponding increase in cost per sample and a corresponding reduction in throughput that are undesirable in research or clinical diagnostic settings. Reduction of the degree of multiplexing of samples also reduces the chances of cross-barcode contamination because barcode reagents are susceptible to cross-contamination [32].

Bioinformatic analysis in metagenomic sequencing remains challenging but is crucial for accurate identification of diagnostic targets. We used comparable pipelines to analyze data from both MiSeq and Nanopore sequencing (in-house), which showed the feasibility of metagenomic viral whole-genome sequencing using both Nanopore and MiSeq technology, with the assembled contigs covering from 10 to 95% of each IDV genome segment. Although de novo assembly was performed on Nanopore sequencing data, for the majority of the samples the length of the largest contig was that of a single read. Skipping the assembly step in bioinformatic analysis of Nanopore data could therefore provide an advantage for timely identification of potential pathogens. The long reads of Nanopore sequencing are thought to provide good confidence for species-level identification, but the low coverage combined with the error rates of this platform precludes its use for strain-level resolution [33].

The detection rates of IDV, the number or proportion of IDV reads (Table 1) [...] to compare assembled contig sequences to the NCBI RefSeq virus database, which is a more computationally intense process that produces more detailed results.
The identification of a match using Centrifuge is based on probabilities of particular k-mer combinations occurring in the query and reference and not on a consideration of the entire query sequence, thus increasing the possibility of false positives [34]. In contrast, Trinity assembly followed by a BLAST search against a reference database could lead to false negatives if the particular target sequence is very rare [36]. If there are very few reads derived from some component of the metagenome, these reads may not be included in the assembly since there is insufficient "evidence" to support building contigs from them [35, 36]. The current lack of a defined reference database for WIMP, and of the ability to use custom databases with it, makes this approach inappropriate for clinical diagnostic applications due to the difficulty of validating such approaches. Our results provide an illustration of the profound effects that post-sequencing analysis can have on results, and the trade-offs associated with each choice. Selection of the most appropriate analysis pipeline must consider the sequencing platform, as well as tolerance for false negatives and false positives, logistical considerations, and the required taxonomic resolution.

Analytical sensitivity is currently one of the main limitations of metagenomics. In this study, IDV was detected by MiSeq sequencing in specimens with qPCR Cq values as high as 35.62 when 50 samples were multiplexed, in comparison to a maximum Cq value of 39.46 using Nanopore with multiplexing of 6 samples. For the IDV-positive samples with low virus loads (e.g. sample 32), targeted qPCR may be preferable given its higher analytical sensitivity. Interestingly, we observed two samples that were IDV positive by both Nanopore (WIMP) and MiSeq but negative by qPCR (Samples T30 and T52, Table 1).
These cases illustrate a potential advantage of metagenomic sequencing compared to qPCR, since a likely explanation for this observation is that these specimens contained strain variants of IDV that were not detected by the qPCR assay. We were unable to determine if this was the case since the IDV sequence reads did not cover the region of the genome targeted by the species-specific qPCR assay. Targeted PCR assays for rapidly evolving RNA viruses require ongoing performance monitoring, and optimization of primers and probes [2]. No single method is suitable for application to all pathogens or specimen types, and each one has advantages in different circumstances.

Taken together, our results demonstrate the potential of metagenomic sequencing on the Illumina MiSeq and Oxford Nanopore platforms for detection of viruses, including IDV, in clinical samples from naturally infected animals with a wide range of viral loads. While application of these approaches to screening animal populations or infectious disease research is feasible, their deployment for routine virology diagnostics in clinical settings will require additional research, laboratory and bioinformatic method development, and performance evaluation. Selection of appropriate methods will continue to require careful consideration of the numerous trade-offs that confront practitioners at each step of the investigation.
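The sensitivity, specificity and agreement percentages reported in this study follow from simple count ratios, as the short Python sketch below shows. The qPCR-positive total (53/232) and the MiSeq detections among qPCR-positive samples (14 with Cq below 31 plus 1 above) are taken from the text. The number of qPCR-negative samples positive by MiSeq (2) and the concordant counts for the 19 Nanopore-sequenced samples are assumptions, back-calculated so that they reproduce the reported percentages; they are not counts stated in the text.

```python
def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100.0 * numerator / denominator, 1)


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Percentage of reference-positive samples detected by the index test."""
    return percent(true_pos, true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Percentage of reference-negative samples negative by the index test."""
    return percent(true_neg, true_neg + false_pos)


# Counts stated in the text (qPCR treated as the reference method).
TOTAL_SAMPLES = 232
QPCR_POSITIVE = 53
QPCR_NEGATIVE = TOTAL_SAMPLES - QPCR_POSITIVE   # 179
MISEQ_TRUE_POSITIVE = 14 + 1                    # Cq < 31 plus Cq >= 31

# Assumption: two qPCR-negative samples were MiSeq-positive.
ASSUMED_MISEQ_FALSE_POSITIVE = 2

# Assumption: concordant results among the 19 Nanopore-sequenced samples,
# back-calculated from the reported agreement percentages.
NANOPORE_SAMPLES = 19
ASSUMED_CONCORDANT = {
    "qPCR vs Nanopore (in-house)": 11,
    "qPCR vs Nanopore (WIMP)": 16,
    "MiSeq vs Nanopore (in-house)": 17,
    "MiSeq vs Nanopore (WIMP)": 14,
}

MISEQ_SENSITIVITY = sensitivity(
    MISEQ_TRUE_POSITIVE, QPCR_POSITIVE - MISEQ_TRUE_POSITIVE
)
MISEQ_SPECIFICITY = specificity(
    QPCR_NEGATIVE - ASSUMED_MISEQ_FALSE_POSITIVE, ASSUMED_MISEQ_FALSE_POSITIVE
)
AGREEMENT = {
    name: percent(count, NANOPORE_SAMPLES)
    for name, count in ASSUMED_CONCORDANT.items()
}

if __name__ == "__main__":
    print("MiSeq sensitivity (%):", MISEQ_SENSITIVITY)
    print("MiSeq specificity (%):", MISEQ_SPECIFICITY)
    for name, value in AGREEMENT.items():
        print(name, "agreement (%):", value)
```

Under these assumptions the sketch gives 28.3% sensitivity and 98.9% specificity for MiSeq, and agreement values of 57.9%, 84.2%, 89.5% and 73.7% for the four Nanopore comparisons.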
[1] Assessment of metagenomic Nanopore and Illumina sequencing for recovering whole genome sequences of chikungunya and dengue viruses directly from clinical samples
[2] Clinical Sequencing Uncovers Origins and Evolution of Lassa Virus
[3] Nanopore-based detection and characterization of yam viruses
[4] Real-time, portable genome sequencing for Ebola surveillance
[5] A window into third-generation sequencing
[6] Rapid metagenomic identification of viral pathogens in clinical samples by real-time nanopore sequencing analysis
[7] Next generation sequencing technology: Advances and applications
[8] Metagenomic characterization of the virome associated with bovine respiratory disease in feedlot cattle identified novel viruses and suggests an etiologic role for influenza D virus
[9] A metagenomics and case-control study to identify viruses associated with bovine respiratory disease
[10] Respiratory viruses identified in western Canadian beef cattle by metagenomic sequencing and their association with bovine respiratory disease
[11] Influenza D virus infection in Mississippi beef cattle
[12] Detection of influenza D virus in bovine respiratory disease samples
[13] Characterization of a novel influenza virus in cattle and Swine: proposal for a new genus in the Orthomyxoviridae family
[14] Isolation of a novel swine influenza virus from Oklahoma in 2011 which is distantly related to human influenza C viruses
[15] Novel Influenza D virus: Epidemiology, pathology, evolution and biological characteristics
[16] Serologic evidence of exposure to influenza D virus among persons with occupational contact with cattle
[17] Influenza D Virus in Animal Species in Guangdong Province, Southern China. Emerg Infect Dis
[18] Replication and Transmission of the Novel Bovine Influenza D Virus in a Guinea Pig Model
[19] Distinct bacterial metacommunities inhabit the upper and lower respiratory tracts of healthy feedlot cattle and those diagnosed with bronchopneumonia
[20] Cloning of a human parvovirus by molecular screening of respiratory tract samples
[21] Development and evaluation of a new Real-Time RT-PCR assay for detection of proposed influenza D virus
[22] Fast gapped-read alignment with Bowtie 2
[23] Full-length transcriptome assembly from RNA-Seq data without a reference genome
[24] Minimap2: pairwise alignment for nucleotide sequences
[25] NCBI viral genomes resource
[26] Application of next generation sequencing for the detection of human viral pathogens in clinical specimens
[27] The Fecal Viral Flora of California Sea Lions
[28] The Fecal Virome of Pigs on a High-Density Farm
[29] Metagenomic analysis of viral nucleic acid extraction methods in respiratory clinical samples
[30] Evaluation of rapid and simple techniques for the enrichment of viruses prior to metagenomic virus discovery
[31] Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples
[32] Accurate multiplexing and filtering for high-throughput amplicon-sequencing
[33] Evaluation of Oxford Nanopore's MinION Sequencing Device for
[34] Centrifuge: rapid and sensitive classification of metagenomic sequences
[35] Basic local alignment search tool
[36] Trimmomatic: a flexible trimmer for Illumina sequence data

We thank Anju Tumber (Prairie Diagnostic Services, Inc.) for reagent