key: cord-0268280-yfqhvcxr
authors: Chrzastek, Klaudia; Sellers, Holly S.; Kapczynski, Darrell R.
title: A universal, single primer amplification protocol (R-SPA) to perform whole genome sequencing of segmented dsRNA reoviruses
date: 2021-11-01
journal: bioRxiv
DOI: 10.1101/2021.11.01.466778
sha: 91a8060718400df50a61b737d75c208184dffde8
doc_id: 268280
cord_uid: yfqhvcxr

Background
The Reoviridae family is the largest family of double-stranded RNA (dsRNA) viruses, and its members have been isolated from a wide range of mammals, birds, reptiles, fishes, insects, and plants. Orthoreoviruses, one of the 15 recognized genera in the Reoviridae family, can infect humans, nearly all other mammals, and birds. Genomic characterization of reoviruses has not been adopted on a large scale because of the complexity of obtaining sequences for all 10 segments.

Results
In this study, we developed a time-efficient and practical method to enrich reovirus sequencing reads from isolates, which allowed full genome recovery using a single-primer amplification method coupled with next-generation sequencing. We refer to this protocol as reovirus-Single Primer Amplification (R-SPA). Our results demonstrated that most of the genes were covered with at least 500 reads per base. Furthermore, R-SPA covered both the 5' and 3' ends of each reovirus gene.

Conclusion
This study presents a universal and fast amplification protocol that yields double-stranded cDNA in sufficient abundance and facilitates and expedites whole genome sequencing of reoviruses.

The flexibility of next-generation sequencing (NGS) technology allows almost any genetic material to be studied on a genome-wide scale. Complete and accurate genome sequencing is essential to facilitate full genome assemblies.
Genetic characterization of RNA viruses using sequence-based methods is widely used to understand virus diversity, spread, origin, and evolutionary history, or to perform clinical diagnostics. In eukaryotes, RNA viruses account for the majority of the virome diversity [1]. Orthoreoviruses belong to the Reoviridae family and have a double-stranded RNA (dsRNA) genome. Reoviruses are ubiquitous and can infect humans and animals such as other mammals, fish, reptiles, and birds. Exposure to reoviruses is presumed to be very common in the human population: 50% of children 5-6 years of age and more than 90% of adults can be seropositive for reovirus [2-4]. The orthoreovirus genome consists of ten dsRNA segments that are divided into three classes based on size: large (L1, L2, and L3), medium (M1, M2, and M3), and small (S1, S2, S3, and S4) segments. Each segment encodes one protein, with the exception of segment S1, which contains three open reading frames and codes for the p10, p17, and σC proteins [5-8].

The avian orthoreoviruses (ARVs) have been isolated from cases of enteric disease syndromes, myocarditis, hepatitis, arthritis/tenosynovitis, and malabsorption syndrome in commercial poultry and are responsible for economic losses in poultry worldwide [12-14]. Recently, newly emerging avian reovirus variants were isolated in the USA [15-17] and China [18]. The molecular characterization of avian reoviruses is based on amplification of the sequence encoding Sigma C. Sigma C is a major antigenic determinant of ARVs, and its gene is the most genetically variable within the reovirus genome [16, 19-21].
A next-generation sequencing method for avian reoviruses is largely dependent on isolation of the virus, and in such cases direct RNA-TruSeq or cDNA HiSeq sequencing results in a high amount of contaminating (non-viral) nucleic acid in the sequences generated, as well as low yields of reovirus-specific sequences. Furthermore, direct sequencing usually requires multiple template preparation steps, such as RNase and/or DNase treatment to remove contaminants, or high-speed sedimentation to concentrate packaged viral genomes. In contrast, targeted reovirus sequencing usually requires multiple reactions to target each reovirus gene separately [18].

In this study, we developed a simple template enrichment protocol that uses one universal primer to target all ten segments of the reovirus genome. We refer to this protocol as reovirus-Single Primer Amplification (R-SPA); coupled with next-generation sequencing, it allowed us to obtain full genome sequences of reoviruses. Furthermore, we compared the new R-SPA strategy with our previously described sequence-independent single primer amplification (SISPA), which allows ssRNA viral genome enrichment, to assess whether that method could also be applied to dsRNA reoviruses.

Whole genome sequencing and genome coverage
Complete, full genome sequences of reoviruses were obtained using the single primer amplification method presented in this study, coupled with next-generation sequencing. We were able to recover all 10 segments of each reovirus genome sequenced in this study using R-8N single primer amplification (R-SPA) or a combination of R-8N and R-Rev-8N, and obtained a nearly full genome sequence using the K-8N SISPA strategy ( was achieved for lambda gene segments (λA, λB, λC), followed by µA and σC.
R-SPA and R-SPA combined with the R-reverse-8N primer covered eight and nine of the 10 segments, respectively, at a depth of coverage above 1000 reads per bp. The K-8N SISPA strategy allowed six of the 10 genes to be covered with at least 1000 reads per bp ( Table 2. Compared with the K-8N SISPA strategy, R-SPA achieved an over 13x higher mean depth of coverage for the σB segment, followed by an 8.5x higher mean depth of coverage for the µNS segment and 2.5x for λB and µB (Figure 3A). The mean depth of coverage for the Sigma C-encoding region of S1 was similar after the R-SPA and K-8N SISPA strategies and slightly lower when a helper primer was added to the R-SPA reaction. The addition of the Rev-8N primer to R-SPA did not increase the depth of coverage for most of the genome segments sequenced, except for the sigma NS gene. Following R-SPA with the helper primer, the sigma NS segment was sequenced at a very high depth of coverage, more than 10 times higher than with R-SPA only (10-70x) and over 8 times higher (8-21x) than with the K-SISPA strategy (Figure 3A). Overall, the highest depth of coverage was achieved using R-SPA only, followed by R-SPA with a helper primer and the K-8N SISPA strategy. The distribution of reads aligned to the reference reovirus genome is shown in Figure 3B. Our results demonstrated that most of the genes were covered with at least 500 reads per base, especially when R-SPA was used for reovirus genome amplification. Furthermore, R-SPA covered both the 5' and 3' ends of each reovirus gene (Suppl. Figure 1).

The single primer amplification methods presented in this study allowed detection of avian orthoreovirus by metagenomics. For Reo/Ck/TX/117816/Tendon/2017, 95% of the 54,856 reads classified by Kraken after the R-SPA strategy represented avian reovirus, and 77% of the 49,108 reads classified after the K-8N SISPA strategy were assigned to avian reovirus (Table 3).
For Reo/Ck/TX/115940/Tendon/2016, 92% and 90% of reads were assigned to avian reovirus after the R-8N R-SPA and K-8N SISPA strategies, respectively (Table 3).

primer sets for other segmented viruses as we successfully applied to reoviruses in this study. In this study, we were able to obtain full gene sequences for reoviruses with a high depth of coverage that may be used for subconsensus-level characterization of viral segments. Furthermore, we have shown that a modification of a previously described SISPA NGS strategy could also be successfully applied to amplify the reovirus genome. However, the number of

In conclusion, this study presents a universal and fast amplification protocol that yields double-stranded cDNA in sufficient abundance and facilitates and expedites whole genome sequencing of reoviruses. Further, our method reduces contamination with non-reovirus sequences that may overwhelm the final sequence output, which can occur when non-selective nucleic acid amplification procedures are used prior to sequencing. We additionally provide evidence to suggest that the method described is simpler than other targeted sequencing approaches and could produce ten segments of the reovirus genome using only one universal primer. It can be applied in any basic molecular virology/microbiology laboratory with access to a thermal cycling machine.

Table 1.

The quality of sequencing reads was assessed using FastQC ver. 0.11.5. The reads were then quality trimmed at a Phred quality score of 30 or more, in addition to trimming of low-quality ends and adapter removal, using Trim Galore ver. 0.5.0 (powered by Cutadapt)

Primer Amplification (R-SPA), an R-8N primer was designed that contains a tag of 21 known nucleotides (barcode) joined to a random octamer at the 3' end.
Out of 21 nucleotides, six represent conserved nucleotides found in all reovirus segments (grey color), followed by six nucleotides (5'-end, position 6-12) that were commonly found in large viral segments, eight random nucleotides used to increase the annealing temperature and a random octamer (8xN).

Figure 2. A schema of reovirus amplification. Total RNA was extracted using a commercially available kit. The reovirus-Single Primer Amplification (R-SPA) protocol consists of three main steps: (i) RT-PCR reaction, (ii) Klenow reaction and (iii) conventional PCR reaction. In the first step, a single primer (barcode) tagged to a random octamer is used to convert RNA into cDNA (R-8N, yellow; K-8N, red). Next, cDNA is converted into dsDNA by Klenow polymerase in the presence of the K-8N or R-8N primer in an isothermal amplification. The purified dsDNA from the Klenow reaction is subsequently used as a template for PCR amplification using the barcode primer (primer R or primer K). After PCR amplification and size selection using Agencourt AMPure XP beads (Beckman Coulter), the product can be used as input for library preparation and sequencing.

Suppl. Figure 1. Distribution of aligned reovirus reads at the 5'-end. Each graph represents the distribution achieved after R-Single Primer Amplification (R-SPA) (red, R1 and salmon, R2), the K-8N SISPA strategy (K1 and K2, green) and R-SPA in combination with the helper primer R-Rev (grey, R+Rev1 and black, R+Rev2).

) 280922 (85.64%) K 603870 393944 (65.24%) 201306 (93.11%) R and R-Rev
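
The depth-of-coverage metrics used in the Results (mean depth per segment, the share of bases covered at a given threshold, and fold differences between amplification strategies) can be illustrated with a minimal Python sketch. The function names and the toy per-base depths below are hypothetical and are not data from this study; in practice, per-base depths would be derived from an alignment of the trimmed reads to the reference reovirus genome.

```python
from statistics import mean


def mean_depth(per_base_depth):
    """Mean number of reads covering each base of one genome segment."""
    return mean(per_base_depth)


def fraction_at_least(per_base_depth, threshold):
    """Fraction of positions whose depth is at or above the threshold."""
    covered = sum(1 for depth in per_base_depth if depth >= threshold)
    return covered / len(per_base_depth)


def fold_change(depth_a, depth_b):
    """Ratio of the mean depth of strategy A to that of strategy B."""
    return mean_depth(depth_a) / mean_depth(depth_b)


# Hypothetical toy depths for a four-base "segment" (NOT study data).
toy_rspa = [1200, 1300, 1250, 1250]   # assumed depths after one strategy
toy_sispa = [400, 600, 500, 500]      # assumed depths after another strategy

rspa_mean = mean_depth(toy_rspa)
sispa_share_500 = fraction_at_least(toy_sispa, 500)
rspa_vs_sispa = fold_change(toy_rspa, toy_sispa)
```

A statement such as "2.5x higher mean depth of coverage" corresponds to `fold_change` applied to the per-base depths of the same segment under two strategies, while "covered with at least 500 reads per base" corresponds to `fraction_at_least` evaluated with a threshold of 500.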