key: cord-102547-nxut8ov1
authors: GraÌdel, C.; Terrazos Miani, M. A.; Baumann, C.; Barbani, M. T.; Neuenschwander, S.; Leib, S. L.; Suter-Riniker, F.; Ramette, A.
title: Whole genome sequencing of human enteroviruses from clinical samples by nanopore direct RNA sequencing
date: 2020-06-09
journal: nan
DOI: 10.1101/2020.06.09.20126219
sha: 
doc_id: 102547
cord_uid: nxut8ov1

Enteroviruses are small RNA viruses that affect millions of people each year by causing an important burden of disease with a broad spectrum of symptoms. In routine diagnostic laboratories, those viruses are identified by PCR based methods, often combined with partial sequencing for genotyping. In this proof-of-principle study, we assessed direct RNA sequencing (DRS) using nanopore sequencing technology for fast whole-genome sequencing of viruses directly from clinical samples. Results of the approach were complemented with those obtained by sequencing the corresponding viral cDNA via Illumina MiSeq sequencing. DRS of total RNA extracted from three different enterovirus-positive stool samples produced long RNA fragments, covering between 59% to 99.6 % of the best reference genomes. The identification of the enterovirus sequences in the sample was confirmed by the short-read cDNA sequencing. Sequence identity between DRS and Illumina MiSeq enterovirus consensus sequences ranged between 94-97%. Here we show that nanopore DRS can be used to correctly identify the genotypes of enteroviruses from patient stool samples with high viral load.

Enteroviruses (EVs) present a major burden for human health and healthcare systems worldwide, with large outbreaks consisting of hundreds of thousands of hospitalized cases occurring periodically [1] . EVs are associated with a wide variety of symptoms, ranging from mild respiratory diseases to severe neurological infections, leading potentially to death [2] . These single-stranded RNA viruses that belong to the Picornaviridae family possess a relatively small genome size ranging from 7.2 to 8.5 kb. For routine diagnostics, EV presence is generally determined by real-time, reverse-transcription PCR (qRT-PCR) assays, complemented by partial sequencing of the VP1-VP4 coding regions for genotyping [2, 3] . Due to the lack of proofreading mechanism in RNA replication, EV genomes are highly variable and the likely subject of within-and between-genome recombinations [4, 5] . Therefore, when diagnostic tests solely rely on PCR using conserved primer sequences, false-negative results cannot be excluded. There is thus a need to generalize the assessment of EV diversity and evolution via a whole-genome approach, rather than from the limited information gained from sequencing short sequences alone.

The main limitation towards a general adoption of whole-genome sequencing (WGS) for better EV characterization and better understanding of their evolution and epidemiology, is mostly of technological nature: WGS approaches are more expensive, technically demanding and hence time consuming than standard molecular assays; as such they are generally not routinely used in diagnostic laboratories. In the case of RNA viruses, the challenge is even more exacerbated because various molecular steps (RNA purification, extraction, cDNA synthesis, and optionally amplification) add to the turnaround time and final cost of the assays. Clinical WGS applications are mostly based on Sanger sequencing and on second-generation sequencing platforms, with Illumina MiSeq/HiSeq and Ion Torrent machines leading the market, as benchtop sequencing technologies enable highthroughput sequencing at an affordable price per sample when samples are multiplexed [6] . Third-generation sequencers, i.e. SMRT technology (Pacific Biosciences) and nanopore sequencing (Oxford Nanopore Technologies, ONT), have recently emerged as complement to or replacement of second-generation sequencers, by enabling the sequencing of single, long DNA molecules, a feature that is particularly attractive in the context of sequencing full-length viral genomes [7] .

Nanopore sequencing technology has been successfully applied to genome sequencing of RNA viruses. Based on viruses propagated on cell lines, viral transcriptomes have been examined by nanopore sequencing for instance for porcine circovirus [8] , Herpes Simplex Virus [9] , hepatitis C virus [10] , cultured influenza virus A, human cytomegalovirus 3/15 (HCMV) [11] . Additionally, nanopore sequencing has been used successfully to sequence whole EV genomes from viral cDNA extracted from cell cultures: In their proof-of-concept study [12] , the authors demonstrated that nanopore sequencing could be applied for rapid routine whole genome sequencing of EV with sufficient accuracy compared to Sanger sequencing. Metagenomic nanopore sequencing of influenza virus was performed directly from randomly amplified viral cDNA obtained from clinical respiratory samples [13] . Furthermore, ONT has presented a new, potentially revolutionary nanopore sequencing application, which allows sequencing of RNA molecules directly, i.e. without the prerequirement of cDNA synthesis or PCR amplification [14] . This method, direct RNA sequencing (DRS), yields full-length, strand-specific RNA sequences and enables the direct detection of nucleotide modifications in native RNA molecules. DRS has been used in several studies, including human poly(A) transcriptome [15] and to analyse the transcriptome of DNA viruses, such as HSV herpes simplex virus type 1 (HSV-1) during productive infection of primary cell [16] . DRS was also used for sequencing RNA genomes of e.g. Pseudorabies Virus propagated on immortalized porcine kidney epithelial cell line [17] , influenza A virus (H1N1) from infected chicken eggs [18] , human Coronaviruses viral RNAs produced in cell cultures [19] , and many other examples of complete sequencing of multiple, single-stranded RNA (ssRNA) viruses obtained with or without poly(A)-tailing of RNA viruses obtained from cell cultures [20] .

Here, in a proof-of-concept study, we apply nanopore DRS to EV-positive stool samples and show that whole RNA genomes of enteroviruses can be retrieved with enough genomic information for the characterization of the infectious agents. We further analyse the metatranscriptomic data provided by the DRS approach and compared it with that obtained by Illumina MiSeq sequencing of the same samples.

Sample description. In this study, we used three independent patient stool samples (E590, E372, E026), which were sent for enterovirus diagnostics to the clinical diagnostic laboratory of the Institute for Infection Diseases, Bern. The samples were collected in 2017 (E590, E372) and 2018 (E026) and had a cycle threshold (Ct) value of 23.0 (E590), 18.8 (E372) and 22.5 (E026) via real-time PCR using enterovirus-specific primers (see description below). Ethics approval was granted by the Swiss Ethics committee on research involving humans on 08 March 2018 to conduct sequencing of enteroviruses in clinical samples stored in the IFIK biobank (BASEC-Nr: Req-2018-00158). Figure 1 . Analytical workflow. The workflow used for diagnostic assays is indicated by greyed boxes and that followed for NGS techniques by white boxes.

Routine diagnostic approach. After homogenization with a sterile pipette, about 0.5 g of stool sample was added to 4 mL of transport medium [21] containing 5-10 glass beads (diameter 2 mm; Merck AG, Zug, Switzerland) ( Figure 1 ). After 30 s of vortexing, the suspension was centrifuged 5 min at 3350 g and the resulting supernatant 0.2-µm filtered with 20% penicillin/streptomycin (Biochrom, Berlin, Germany). This preparation was further extracted on the EasyMAG platform (bioMérieux, Geneva, Switzerland) for real-time PCR and with TRIzol LS Reagent (ThermoFisher, Reinach, Switzerland) for Illumina MiSeq sequencing (see below).

RNA extraction and purification. A total of 750 µL of TRIzol LS Reagent (ThermoFisher) was added per 250 µL of stool suspension in PBS. Following homogenization by pipetting, the sample was transferred to Phasemaker tubes (ThermoFisher). After incubation for 5 min, 200 µL of chloroform were added, and the tubes shaken vigorously by hand for 15 seconds. After 15 min incubation, the sample was centrifuged for 5 min at 16,000 g at 4°C. The aqueous phase was transferred to a new tube and 10 µg of carrier RNase-free glycogen (ThermoFisher) were added, followed by 500 µL isopropanol. After incubation for 10 min, the sample was centrifuged 10 min at 12,000 g at 4°C, and the supernatant was discarded. The total RNA precipitate was resuspended in 1 mL of 75% ethanol. The sample was vortexed briefly, and then centrifuged for 5 min at 7500 × g at 4°C. The supernatant was discarded and the RNA pellet was air-dried for 10 min, before being resuspended in 20 µL of RNA storage solution (ThermoFisher). After incubation at 60°C for 10 min, the RNA sample was either used directly in downstream applications, or stored at -80°C. Alternatively, total nucleic acid extraction was performed with NUCLISENS easyMAG from 200 µL of the routine diagnostic stool preparation, to which 2.5 µL carrier RNA (Qiagen) was added and eluted in 110 µL. This extract was used for real-time PCR and Sanger genotyping.

Pre-treatment (Chloroform/beads/centrifugation). We used the WHO recommended protocol for pre-treating stool samples for enterovirus RNA isolation (Enterovirus surveillance guidelines, [22] ) adapted from Nix et al., 2006 [23] as follows: A low amount (0.5 to 1 g) of stool sample was added to PBS (ThermoFisher) up to 1 mL volume, to which 0.3 g of glass beads (2 mm diameter, Sigma) and 0.5 mL of chloroform (AppliChem, Aesch, Switzerland) were added. The mixture was shaken vigorously using a TissueLyser (Qiagen AG, Hombrechtikon, Switzerland) for 20 min at . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted June 9, 2020. . https://doi.org/10.1101/2020.06.09.20126219 doi: medRxiv preprint 5/15 maximum speed. The suspension was centrifuged at 1500 × g for 20 min at 4°C, and approximately 1 mL of the supernatant was transferred to a new 1.5 mL tube and continued with RNA extraction.

Real-time RT PCR. One-step RT-PCR was done with the AgPath-ID One-step kit (Ambion, Reinach, Switzerland) using published primers and probes [24] , following the protocol described previously [25] . Primers and probes were synthesized at Microsynth AG, Balgach, Switzerland.

Enterovirus genotyping. Genotyping of the samples was performed by VP1 amplicon Sanger sequencing as described previously [23, 25] and carried out at Microsynth. Nanopore sequencing. For nanopore sequencing, we followed the manufacturer's instructions of the protocol for kit SQK-RNA001 (version DRS_9026_v1_revN_15Dec2016), but substituted Superscript III with Superscript IV (ThermoFisher). The input RNA and amounts loaded on flow cells for each experiment is specified in Supplementary Table 1. Quantification was done using a Qubit RNA HS Assay Kit (RNA) or DNA HS assay (cDNA) kit on a Qubit fluorometer 3.0 (Thermofisher). The reverse-transcribed and adapted RNA was loaded onto R9.4.1 flowcells and sequenced on MinION sequencer.

Illumina MiSeq. Primer design for Coxsackievirus A6 cDNA synthesis. The following three specific primers for Coxsackievirus A6 were designed for this project to hybridize to Coxsackievirus A6 sequence KJ541158: 2588R_A6: 5′-CCCGTTTCTGCCGCTT-3′ adapted from primer 292 by Oberste et al. [26] , 5672R_A6: 5′-ATATCTCTGAATTTCTCATT-3′ adapted from primer HEV.3C.d1 by Bessaud et al. [27] , EV-3UTR1_A6_rc: 5′-CATATTCACGACCAGATTCCTGGTG-3′ (this study). All primers were synthesized at Microsynth.

cDNA synthesis for Illumina MiSeq. First strand synthesis was performed with SuperScript IV First-Strand Synthesis System as follows: For reverse transcription, a reaction mixture was prepared containing 0.5 µL of the specific primers EV-3UTR1_A6_rc, 2588R_A6, 5672R_A6 (10 µM), or 1 µL of random hexamers (50 µM) or 1 µL of oligo(dT)20 (50 µM) together with 1 µL of 10 mM dNTP mix and filled with RNA extract and DEPC-treated water to a total volume of 13 µL. After heating at 65°C for 5 min and snap cooling, a mixture of 4 µL SSIV Buffer, 1 µL 100 mM DTT, 1 µL ribonuclease inhibitor and 1 µL SuperScript IV reverse transcriptase (200 U/µL) were added. The samples were incubated for 10 min at 50°C and 10 min at 80°C (preceded by 10 min at 23°C when containing random hexamer primers). Subsequently, 1 µL of E. coli RNase H (2 U/µL) was added and incubated for 37°C for 20 min. To this mixture, 10 µL of second strand synthesis reaction buffer NEBNext (New England Biolabs (NEB)), 5 µL NEBNext Second Strand Synthesis Enzyme Mix and 44 µL nuclease free water were added. The reaction mixture was incubated at 16°C for 60 min. Clean up with 144 µL Agencourt AMPure XP magnetic beads (Beckman Coulter, Nyon, Switzerland) was performed according to the manufacturer's instructions, with an elution volume of 35 µL for samples E372 and E026 and 47 µL for sample . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted June 9, 2020. . https://doi.org/10.1101/2020.06.09.20126219 doi: medRxiv preprint 6/15 E590. For sample E590, an additional end-prep step was performed: To 45 µL of the elution, 7 µL of Ultra II End-Prep buffer (NEB), 3 µL of Ultra II End-Prep enzyme mix (NEB), and 5 µL of nuclease free water were added and the mixture was incubated for 5 min at 20°C and 5 mins at 65°C. After an additional purification step using 60 µL Agencourt AMPure XP magnetic beads the cDNA was eluted in 20 µL of nuclease-free water. Illumina MiSeq sequencing. Libraries were prepared from unamplified cDNA using Nextera XT DNA Library Prep kit (E590) and Nextera DNA Flex Library Prep kit (E372, E026) (Illumina) before sequencing using an Illumina MiSeq benchtop sequencer generating 2 ×150 bp paired-end reads (v2), according to the manufacturer's protocols. Sequencing was performed at the Next Generation Sequencing Platform of the Inselspital, Bern.

Bioinformatic analysis. Raw FAST5 files produced by MinION sequencing were basecalled under high accuracy mode using the ONT basecaller Guppy version 3.2.2 with the parameter: "guppy_basecaller --input_path PATH --recursive --save_path PATH --qscore_filtering --min_qscore 7 --flowcell FLO-MIN106 --kit SQK-RNA001 --cpu_threads_per_caller 4 --num_callers 4". Statistics for nanopore sequencing output are summarized in supplementary Table 2 . Basecalled nanopore reads were classified using BLASTN against NCBI's nucleotide (nt) database (downloaded 03.10.2019), using BLAST version 2.6.0. BLAST results were classified using MEGAN (v. 6.12.3, [28] ). RNA sequences were mapped to the best scoring reference genome using minimap2 (version 2.5-r601-dirty, [29] ). For Illumina MiSeq data, adapter trimming was done with bbduk.sh from the BBMap package (v 37.80, [30] ), and human and rRNA reads were removed by mapping reads against the corresponding databases (GRCh18 genome, cDNA and ncRNA, and SILVA 132 LSU SSU https://www.arb-silva.de/documentation/release-132/) using minimap2 [29] . Non-human, non-rRNA reads were then assembled using SPAdes (parameters -k 55 --rna --only-assembler; version 3.11.0; [31] ). The resulting contigs were subsequently analysed using BLASTN and MEGAN as described for nanopore data. Sequence identity between Illumina MiSeq and nanopore DRS enterovirus genome consensus sequences was calculated using legacy BLAST 2.2.9. Coverage plots were made with BBMap and plots created using R statistical computing environment (version 3.6.0).

Data availability. After removal of any human reads, all Illumina MiSeq sequencing data (FASTQ), raw and basecalled ONT data (FAST5 and FASTQ, respectively), and Sanger sequences have been deposited on the European Nucleotide Archive (ENA) under the project reference PRJEB38758.

DRS was performed with enterovirus-positive stool samples with similar viral load (Ct values between 18-23) in three independent experiments, consisting of samples from three different patients. The samples were prepared using the WHO recommended protocol for viral enrichment using chloroform/bead treatment, followed by RNA extraction using TRIzol or easyMAG . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted June 9, 2020. Table 1 ). We sequenced the total polyadenylated RNA using DRS on MinION Nanopore sequencer. The runs were continued until there was only negligible sequencing output or until the maximum recommended duration of the run was reached. For validation purposes, all three samples were also subjected to cDNA sequencing using Illumina MiSeq from samples prepared by the routine diagnostic procedure (Figure 1) . The cDNA was produced with either genotype specific primers (E590), or with both oligo-dT primers and random hexamers (E372, E026). Table 1 . Taxonomic classification of reads (Oxford Nanopore) and contigs (Illumina MiSeq) using MEGAN based on BLASTN matches against NCBI's nucleotide (nt) database at the domain level.

Sample E590. DRS reads from extracted RNA were obtained up to 7 hours after the beginning of the sequencing run, and sequencing was stopped after 12 hours. Out of a total output of 102,391 raw nanopore reads, 9,213 reads were successfully basecalled into RNA sequences, with an average length of 1,128 bases (range 1-7,112 bases). More than 98% of basecalled RNA reads (9,065 reads) were taxonomically classified using BLASTN against NCBI's nucleotide (nt) database. The large majority (>98%) of those hits matched eukaryotic sequences, and only 2.2% bacterial (199 reads) and 0.17% viral sequences (16 reads) ( Table 1 ). The majority of eukaryotic reads were assigned to the yeast species Saccharomyces cerevisiae (8,619 reads; 95%). All other eukaryotic species had <5 reads assigned. Within the reads assigned to bacteria, the only species with notable number of reads were Escherichia coli (37 reads), Faecalibacterium prausnitzii (12 reads) or Bacteroides vulgatus (12 reads), all of which are known to be commensal species in the human gut. As far as viral sequences were concerned, all 16 (0.18% of 9,065) reads matched the Enterovirus genus, species EV-A ( Table 2) . They were on average 2,446 bases long (range 458-7,112 bases). Alignment of the RNA sequences to the best scoring reference genome (Coxsackievirus A6 KJ541158; Figure 2 ) demonstrated that the long reads covered almost the entire genome of Coxsackievirus A6, with one single read covering alone 98.5% of the reference genome (with only 110 bases missing at the 5' end). The top two longest RNA sequences (7,112 and 5,397 bases long) were obtained within the first hour of sequencing, and most of the larger sequences (>2kb) were . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted June 9, 2020. obtained within the first 3 hours of sequencing. After 3 hours of sequencing, sequences matching enterovirus were all <900 bases long. Table 2 . Taxonomic classification of viral reads (Oxford Nanopore) and contigs (Illumina MiSeq) using MEGAN based on BLASTN matches against NCBI's nucleotide (nt) database at the species and genotype levels. Grey cells indicate absence of hits in the dataset. Asterisks (*) indicate that >90% of the respective genome sequence was covered, while thick boxes indicate EV species whose identity was also confirmed by Sanger sequencing of the VP1 gene.

Classical amplicon sequencing of the VP1 region was attempted for sample E590, but it did not yield good results. In order to confirm the genotype identification and to obtain a highly accurate whole genome sequence of the enterovirus genome, we subsequently subjected the sample E590 also to cDNA sequencing using Illumina MiSeq. In this case, the cDNA was produced using genotype-specific primers given that low number of reads was obtained by DRS and MiSeq . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted June 9, 2020. . https://doi.org/10.1101/2020.06.09.20126219 doi: medRxiv preprint 9/15 sequencing library preparation was done with the routine diagnostic pre-treatment given that low amount of original material was available (Figure 1 ). After removal of reads mapping to human genome and bacterial rRNA, metatranscriptomic reads were assembled into contigs and further subjected to BLASTN similarity analysis against known sequences in the nt database, and the top hits were taxonomically summarized using MEGAN. Most contigs of sample E590 were classified as bacteria (38) or viruses (17) ( Table 1 ). Such differences in species composition when compared to the results of the DRS run may be explained by the genotype-specific approach used for cDNA synthesis and different pre-treatment of the sample.

All viral sequences matched again uniquely to EV sequences, however, MEGAN analyses revealed the presence of two enterovirus species consisting of EV-A (8 contigs; contigs lengths ranging from 130 to 3164 bases) and EV-B (8 contigs; contigs lengths ranging from 241 to 2,845 bases) in the cDNA sample, of which the majority of reads were assigned to Coxsackievirus A6 and Echovirus 18 ( Table 2) . Mapping of the viral sequences to a VP1 database confirmed the presence of these two genotypes and the identification was verified by using the online RIVM bioinformatic platform [32] . Alignment of the reads to the closest reference genome revealed that the coverage of CV-A6 was 96% with an average depth of 7.8 (±4.5), while for Echovirus 18, only 68% of the reference genome were covered (Figure 2) . The consensus sequence for CV-A6 obtained from Illumina MiSeq was compared to the one from DRS sequencing, which showed 94.0% identity (6,926/7,368) , with 2% gaps (167/ 7,368).

Sample E372. Based on the promising results obtained for sample E590, the DRS transcriptomic approach was repeated on two other clinical samples: For the DRS of RNA extracted from sample E372, the total yield of the sequencing run was 58,548 reads (14.84 Mb) after 12 h of sequencing.

A total of 31,636 reads passed basecalling, with an average length of 144.4 bases (range of 2-4,276). Only 42% (13,297) of the basecalled sequences were taxonomically classified using BLASTN. Interestingly, there was a drastic difference in read composition as compared to the first sample (E590), and the majority of reads classified as of bacterial origin (97.5%, 12,967), and only very few reads matched to eukaryotic (10, 0.8%) or archaeal (5, 0.4%) species. For bacteria, DRS reads were assigned to a large variety of species. The most prominent phyla were Bacteroidetes (4,518 reads) and Proteobacteria (717 reads). A total of 315 viral reads, corresponding to 2.4% of classified reads, were found. As seen for the previous sample, only matches to the Enterovirus genus were found. Although many more viral reads were found in comparison to sample E590, they were, with an average of 1,192 bases, much shorter, and with a range of 106-4,276. No reads spanned the complete genome sequence of any known species or genotype, and altogether could cover 59% of the reference genome sequence (Figure 2) . The short fragment length might result from increased fragmentation of the viral RNA during extraction or library preparation. Due to the directional sequencing used in the DRS process of nanopore sequencing, higher coverage at the 3' end of the sequence may be obtained, and fragments without poly-A tail cannot be sequenced. Therefore fragmented, incomplete RNA molecules may hinder the recovery of the 5'end of the RNA genome. Nevertheless, although the covered region did not span the VP1 sequence, we were able to identify the enterovirus as Echovirus 30 with the best scoring reference genome. This . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted June 9, 2020. . https://doi.org/10.1101/2020.06.09.20126219 doi: medRxiv preprint 10/15 species identification was independently confirmed by standard genotyping approach that uses Sanger-based sequencing of the partial VP1 amplicon. Figure 2 . Coverage plots for both DRS nanopore runs (blue lines) and Illumina MiSeq (grey lines) for all three samples for each EV genotype. Shown are the depth of coverage after mapping the corresponding reads to the best reference genomes in sample E590 (CA6: KJ541158.1, E18: HM777023.1) and E372 (MH484072.1). For E026, the coverage plot was produced by using the assembled contig sequence of the Illumina sequencing run (oligo-dT primers) as reads did not map well any other reference EV genome. For samples E372 and E026, Illumina MiSeq reads based on cDNA produced with oligo-dT and random hexamers are indicated by light and dark grey lines, respectively.

The cDNA from sample E372 was also sequenced by Illumina MiSeq, although in this case with non-specific primers for cDNA synthesis for a more unbiased approach towards EV detection and . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted June 9, 2020. . https://doi.org/10.1101/2020.06.09.20126219 doi: medRxiv preprint 11/15 not to miss possible EV co-infections. Both oligo-dT and random hexamer primers were used to ensure a better chance of obtaining the full genome, with the idea that oligo-dT may lead to results more comparable to the those of DRS due to polyadenylation requirements, and random hexamers may offer better chance of obtaining a well-covered 5' end of the enterovirus genome. The distribution of contigs assigned on the domain level did not vary much between random hexamer and oligo-dT produced cDNA: Most contigs were assigned to bacteria (99.3% or 99.5% respectively), with only a small minority mapping to eukaryotes (0.15%, 0.20%) and viruses (0.51%, 0.28%). Within bacteria, most contigs were assigned to the phyla Proteobacteria (11,120, 12,551), followed by Bacteriodetes (735, 896) and Actinobacteria (303, 381). Taking a closer look at all the detected viral genotypes ( Table 2) , we found that, besides Echovirus 30, there were also contigs mapping to other virus genotypes, such as Rhinovirus A, and a variety of other bacterial or plant viruses, albeit only with few short contigs. The Echovirus 30 genome, however, was well covered with either approach (Figure 2) . Comparison of the DRS nanopore sequencing consensus to the one obtained from Illumina MiSeq showed 97.3% identity (4,211/ 4,329), with 0.8% gaps (34/ 4,329).

Sample E026. The DRS run of the RNA extract from sample E026 was stopped after the maximal run duration of 48 h. Notably, this run had much higher output compared to two previous samples, resulting in a total of 1.56 M sequenced reads (1.78 Gb), from which 1.46 M reads passed basecalling. The average length was also considerably higher than in previous runs with 1,035 bases (range 1-7,101 bases). Although the RNA input material used for library preparation was measured to be of a similar amount for all three samples (361 ng), cDNA measurement (896 ng) before loading on the flow cell indicated that the initial RNA concentration might not have been the same. Overall, 91% of basecalled reads (1.41 M reads) were taxonomically classified using BLASTN. Of those, 95.3% (1,349,407) mapped to eukaryotic sequences. As seen for sample E590, the vast majority of eukaryotic reads mapped to Saccharomyces cerevisiae (96.8%), while 4.6% (65,408) were classified as of bacterial origin, and only a negligible amount as archaea (6, 0.0004%). Viral sequences comprised 0.11% (1,579) of the reads, the majority of which were classified as plant viral pathogens ( Table 2) , mostly belonging to the tomato mosaic virus (1,274 reads, 81% of viral reads). Other, more rarely detected species, included Melon necrotic spot virus (8) , and Cactus virus X (21) . In total, 181 reads mapped to enteroviruses. Although sequences were mapped to several genotypes, mapping the reads to the best scoring reference genome and to a VP1 database identified genotype Echovirus 25 as the most likely and only EV genotype present in the sample. Although there were relatively few hits to enteroviruses considering the high total output of the run, there were some very long sequences present that covered up to 99.6% of the reference genome ( Figure 2 ).

Illumina sequencing of the cDNA from sample E026 produced using oligo-dT primers and random hexamers showed similar distributions on the domain level: The majority of contigs belonged to bacterial species (98.6% and 98.9%, respectively for oligo-dT and random hexamer approaches), with a minority of reads mapping to eukaryotes (0.49%, 0.50%) and viruses (0.87%, 0.56%). This major shift from yeast to bacteria as compared to the DRS approach, which was also observed for . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted June 9, 2020. . https://doi.org/10.1101/2020.06.09.20126219 doi: medRxiv preprint 12/15 sample E590, could be explained by the different pre-treatment using 0.2-µm filtration. Additionally, comparison of number of contigs and nanopore reads are of limited value when not considering the coverage depth. For E026, the genome of Echovirus 25 genotype, whose presence was also confirmed by Sanger sequencing of the VP1 gene, was well covered with both types of cDNA synthesis (Figure 2) . Some contigs also mapped to another genotype, Echovirus 30, but these were only short contigs, which may have been wrongly taxonomically assigned due to the short sizes of the Illumina reads. Otherwise, we also found a variety of viral sequences, with the most prominent being the tomato mosaic viruses and other plant pathogens, reflecting the results of the DRS run. The consensus sequence identity for Echovirus 25 sequenced by DRS to Illumina MiSeq was calculated as 96.7% (6,979/ 7,218) , with 1% gaps (104/ 7,218).

Given the high cost associated with DRS at the present time, those experiments were designed to assess the potential of the technology on clinical samples, while refining wet laboratory approaches, so that maximum information could be obtained as more samples were analysed. Using RNA extracted from clinical samples, we were able to repeatedly identify human enteroviruses in stool samples from three independent patients by nanopore-based DRS. Beyond identifying EV species and genotypes correctly, we showed that the approach may also provide rich metatranscriptomic information on sample composition for all life domains. Clear differences in overall species compositions, with either yeast or bacteria dominating the majority of obtained reads, were observed between samples. Viral reads constituted between 0.11-2.4% of total passed reads. By complementing DRS data with Illumina MiSeq sequencing data for the same samples, we were able to validate the obtained enterovirus sequences and the further analysis of the metatranscriptomic data. Illumina MiSeq sequencing revealed a higher diversity in viral species in these samples ( Table 2) , which was not captured by the DRS approach.

In all samples, 59% to 99.6 % of the EV reference genome sequences were covered. The average identity of the DRS consensus was 94-97% compared to the Illumina MiSeq consensus sequence. Other studies using DRS have achieved similar values with 93-97% identity when sequencing viruses from cell culture [20] , or up to 99% consensus identity for influenza A [18] . For applications with samples with low target concentration such as patient samples, getting enough sequencing depth to obtain accurate consensus sequences constitutes a challenge for the method. The resulting consensus sequences might therefore be of limited use for applications which require high accuracy, such as phylogenomic analysis. However, for the purpose of genotype identification the approach is sufficiently accurate, especially as it can provide long reads, which can facilitate mapping to the correct reference, and expedite downstream bioinformatic analyses.

The good overall agreement in EV detection between the two NGS approaches suggests that the variability observed in number of reads and composition between samples may be attributable to the natural biological variability that may exist between the sampled patients. Yet, finer differences . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted June 9, 2020. . https://doi.org/10.1101/2020.06.09.20126219 doi: medRxiv preprint 13/15 in composition between samples may be explained by the sequencing technology and by the different pre-treatments of the samples: On the one hand, smaller cDNA molecules produced by the wet laboratory procedure (cDNA synthesis, bead cleaning), followed by short-read sequencing, may be easier to detect, than larger, intact RNA molecules via DRS. On the other hand, unreliable mapping and poor taxonomical identification may be produced by short reads aligning to genomic regions that are conserved among genotypes, but also by long, error-prone nanopore reads [14] aligning incorrectly to reference sequences. Therefore, further large-scale comparisons of the two NGS approaches would be needed to confirm or reject the hypothesis about the effect of read length on organism detectability in metagenomic or metatranscriptomic studies.

While enteroviruses were detected in all tested samples using DRS, it should be noted that these were all samples with relatively high viral load, i.e. low Ct values. With a range of only 16-315 total reads mapping to enteroviruses for the three samples, the approach was yet close to the limit of detection. In the case of sample E590, this sensitivity issue may explain why co-infection with another enterovirus was not detected, as compared to the short-read sequencing-based approach. Low sensitivity of nanopore sequencing in a viral metagenomic approach has been reported previously in patient samples with low viral titers, even when sequencing viral genomes via SISPA-based amplification [13] . An improvement in sensitivity would be necessary before attempting to sequence samples with low viral load in general. A limiting factor of the approach was the amount of polyadenylated RNA used for library preparation, as in two of the sequencing runs the full flowcell sequencing capacity was not used and the sequencing was stopped before maximal run duration, because low amounts of RNA library were loaded on the flowcell. If sufficient amounts of patient samples were available, increasing the input concentration might have provided better coverage of the target sequences. Furthermore, DRS sequencing is expensive if only one sample is loaded per flowcell. Recent progress with barcoding for DRS [33] might however reduce this problem in the future. Additionally, results might be improved with adaptations in viral enrichment steps. We used a chloroform/bead pre-treatment in DRS samples for enrichment of viral capsid and although it might be efficient in enriching viral reads, it could also reduce the overall amount of RNA extracted. As the maximum sequencing yield of the flowcell was not always reached, perhaps milder enrichment methods might be preferable, especially if the broader diversity of pathogens is to be studied. Currently the DRS method is restricted to sequencing of polyadenylated RNA, however, performing polyadenylation of the total RNA fraction could expand its applications.

Another characteristic of the DRS method is the expected sequencing bias towards the 3' end of the poly-adenylated molecules [14] . Hence, coverage of the 5' region of the EV genomes was generally low, which may be problematic given that the VP1 region is the main region used for EV genotyping (e.g. located at position 2,207-3,082 in Echovirus 30 KT353720.1). Thus, correct genotyping would require at least around 4,100 bases long RNA reads from the 3' towards the 5' end of the genome sequence. We encountered this problem in the case of sample E372, although in that instance we were able to identify the same genotype as found by the gold standard VP1 . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted June 9, 2020. . https://doi.org/10.1101/2020.06.09.20126219 doi: medRxiv preprint

A generic assay for whole-genome amplification and deep sequencing of enterovirus A71

Rhinoviruses and Respiratory Enteroviruses: Not as Simple as ABC. Viruses

Species-specific RT-PCR amplification of human enteroviruses: a tool for rapid species identification of uncharacterized enteroviruses

Virology, epidemiology, pathogenesis, and control of enterovirus 71

Human enterovirus 109: a novel interspecies recombinant enterovirus isolated from a case of acute pediatric respiratory illness in Nicaragua

Coming of age: ten years of next-generation sequencing technologies

Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples

Multi-platform analysis reveals a complex transcriptome architecture of a circovirus

Long-Read Isoform Sequencing Reveals a Hidden Complexity of the Transcriptional Landscape of Herpes Simplex Virus Type 1. Front Microbiol

A case of genotype-3b hepatitis C virus in which the whole genome was successfully analyzed using third-generation nanopore sequencing

Enrichment by hybridisation of long DNA fragments for Nanopore sequencing

Rapid and Accurate Sequencing of Enterovirus Genomes Using MinION Nanopore Sequencer

Metagenomic Nanopore Sequencing of Influenza Virus Direct from Clinical Respiratory Samples

Highly parallel direct RNA sequencing on an array of nanopores

Nanopore native RNA sequencing of a human poly(A) transcriptome. bioRxiv

Direct RNA sequencing on nanopore arrays redefines the transcriptional complexity of a viral pathogen

Multi-Platform Sequencing Approach Reveals a Novel Transcriptome Profile in Pseudorabies Virus. Front Microbiol

Direct RNA Sequencing of the Coding Complete Influenza A Virus Genome

Direct RNA nanopore sequencing of full-length coron-avirus genomes provides novel insights into structural variants and enables modification analysis. bioRxiv

Rapid Sequencing of Multiple RNA Viruses in Their Native Form. Front Microbiol

Evaluation of 2-SP transport medium for detection of Chlamydia trachomatis and Neisseria gonorrhoeae by two automated amplification systems and culture for chlamydia

Enterovirus surveillance guidelines: Guidelines for enterovirus surveillance in support of the Polio Eradication Initiative. 2015; World Health Organization (WHO) Regional Office for Europe

Sensitive, seminested PCR amplification of VP1 sequences for direct identification of all enterovirus serotypes from original clinical specimens

Rapid routine detection of enterovirus RNA in cerebrospinal fluid by a onestep real-time RT-PCR assay

Rapid and Cost-Efficient Enterovirus Genotyping from Clinical Samples Using Flongle Flow Cells

Improved molecular identification of enteroviruses by RT-PCR and amplicon sequencing

Characterization of the genome of human enteroviruses: design of generic primers for amplification and sequencing of different regions of the viral genome

MEGAN analysis of metagenomic data

Minimap2: pairwise alignment for nucleotide sequences

BBMap: A Fast, Accurate, Splice-Aware Aligner

SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing

An automated genotyping tool for enteroviruses and noroviruses

Barcoding and demultiplexing Oxford Nanopore native RNA sequencing reads with deep residual learning. bioRxiv

PCR-based approach. However, unambiguous identification relies on the capsid region and its presence is crucial in case of recombinations or co-infections. Therefore, great care must be applied during extraction and library preparation to avoid shearing or degrading extracted RNA. The known high variability of EV genome sequences makes it necessary to have suitable alternatives for genotype identification available if PCR-based approaches fail. In such cases, DRS of patient samples could be a valuable alternative, as the sequencing is primer-independent and the sequences of long, native RNA molecules are recovered. Additionally, as is the case with all nanopore sequencing, the data is available in real-time, which can provide faster identification and a sequencing-on-demand approach to molecular assays.Overall, our proof-of-concept study demonstrated to the possibilities offered by the DRS technique for genotype identification of human enteroviruses directly from clinical samples. The method requires further optimization to improve the overall sensitivity and to lower costs for future possible applications in routine diagnostics. However, with the rapid advancement of the technologies those issues will likely improve in the near future.