key: cord-0982733-izu5x2d4 authors: Moreno, Gage K.; Braun, Katarina M.; Halfmann, Peter J.; Prall, Trent M.; Riemersma, Kasen K.; Haj, Amelia K.; Lalli, Joseph; Florek, Kelsey R.; Kawaoka, Yoshihiro; Friedrich, Thomas C.; O’Connor, David H. title: Limited SARS-CoV-2 diversity within hosts and following passage in cell culture date: 2020-04-20 journal: bioRxiv DOI: 10.1101/2020.04.20.051011 sha: 14358c597ffc4f985fb0dadadbb7a151a863ab44 doc_id: 982733 cord_uid: izu5x2d4 Since the first reports of pneumonia associated with a novel coronavirus (COVID-19) emerged in Wuhan, Hubei province, China, there have been considerable efforts to sequence the causative virus, SARS-CoV-2 (also referred to as hCoV-19) and to make viral genomic information available quickly on shared repositories. As of 30 March 2020, 7,680 consensus sequences have been shared on GISAID, the principal repository for SARS-CoV-2 genetic information. These sequences are primarily consensus sequences from clinical and passaged samples, but few reports have looked at diversity of virus populations within individual hosts or cultures. Understanding such diversity is essential to understanding viral evolutionary dynamics. Here, we characterize within-host viral diversity from a primary isolate and passaged samples, all originally deriving from an individual returning from Wuhan, China, who was diagnosed with COVID-19 and subsequently sampled in Wisconsin, United States. We use a metagenomic approach with Oxford Nanopore Technologies (ONT) GridION in combination with Illumina MiSeq to capture minor within-host frequency variants ≥1%. In a clinical swab obtained from the day of hospital presentation, we identify 15 single nucleotide variants (SNVs) ≥1% frequency, primarily located in the largest gene – ORF1a. While viral diversity is low overall, the dominant genetic signatures are likely secondary to population size changes, with some evidence for mild purifying selection throughout the genome. We see little to no evidence for positive selection or ongoing adaptation of SARS-CoV-2 within cell culture or in the primary isolate evaluated in this study. Author Summary Within-host variants are critical for addressing molecular evolution questions, identifying selective pressures imposed by vaccine-induced immunity and antiviral therapeutics, and characterizing interhost dynamics, including the stringency and character of transmission bottlenecks. Here, we sequenced SARS-CoV-2 viruses isolated from a human host and from cell culture on three distinct Vero cell lines using Illumina and ONT technologies. We show that SARS-CoV-2 consensus sequences can remain stable through at least two serial passages on Vero 76 cells, suggesting SARS-CoV-2 can be propagated in cell culture in preparation for in-vitro and in-vivo studies without dramatic alterations of its genotype. However, we emphasize the need to deep-sequence viral stocks prior to use in experiments to characterize sub-consensus diversity that may alter outcomes. To understand how serial passaging SARS-CoV-2 affects genomic variation, we sequenced 118 virus populations after each passage using the same SISPA metagenomics approach we used 119 to characterize the original biological specimen. Passaged sample names and cell lines are 120 described in the methods. An in-house pipeline (available at: 121 https://github.com/katarinabraun/SARSCoV2_passage_MS) was applied to trim out primer 122 sequences, bioinformatically deplete host reads, and generate alignment files, which contained 123 all reads mapping to the SARS-CoV-2 Madison reference genome (MT039887.1). At the 124 consensus level, SARS-CoV-2 does not accumulate genetic variation after two passages on 125 Vero 76 cells (Fig 2) . We also examined deletions ≥ 1% frequency and ≥ 3 nt in length. We found 126 no evidence of deletions that fit these criteria in any of the cell culture isolates. 127 To characterize patterns of sub-consensus diversity, we looked at SNVs at or above 1% 129 frequency in only the Illumina reads. We previously established that this conservative cutoff 130 ensures that only bona fide mutations are considered [15, 16] . All minor variant analyses and 131 figures were completed using the Illumina SNV data as these data are higher average quality 132 and ideal for analysis involving low-frequency variants (Fig 3) . Seventy-five percent of all minor 133 variants we identify fall in ORF1a and ORF1b, which together take up 72.8% of the length of the 134 28kb coding genome. ORF1a and ORF1b encode the replicase machinery [7] . We account for 135 differences in gene size by normalizing variants to kilobase gene length (variants / kb-gene-136 length -"v/kbgl") [10] . The highest density of variants was reported in smaller genes like 137 envelope, ORF7a, and ORF10 (S2 Table) . We also show that through each passage, variant 138 density in ORF1a and ORF1b increases. There were no SNVs ≥ 1% in the spike gene in the 139 primary NP swab, but low-frequency SNVs (all <5%) were identified in spike following passage 140 in cell culture (Fig 3) . Outside of ORF1a and ORF1b, the other genes in the primary NP swab are clonal above the 1% threshold, with the exception of one low-frequency SNV In addition to assessing the fate of individual minor variants, we were also interested in 188 evaluating population dynamics using diversity metrics. Specifically, we calculated genewise 189 diversity using π , the average number of pairwise differences per nucleotide site among a set of sequences, for each gene in each sample. Overall, genewise nucleotide diversity is very low 191 compared to other RNA viruses, consistent with low mutation rates in coronaviruses due to RNA 192 proofreading machinery [18, 19] . Genewise diversity was very low in the primary NP swab and 193 was only measurable in ORF1a (9 SNVs), ORF1b (5 SNVs) and N (1 SNV). Genewise diversity 194 is more varied in the passaged samples (Fig 5) . Interestingly, π is highest in ORF7a in these 195 samples -although this signal seems to be primarily driven by the small size of this gene. To Vero 76 ORF10, p1 Vero E6 envelope (E), and p1 Vero STAT-1 KO ORF3a. 205 We examined the concordance between SNV calls at the same sites, irrespective of frequency, 207 determined by Illumina and ONT workflows. To begin, we used a stringent cutoff of 10% 208 frequency for ONT SNVs. We then called variants at percentage frequencies decreasing by 209 0.5% (eg. calling 8% variants, then 7.5%, etc) until the variants called by ONT no longer 210 matched Illumina variants irrespective of frequency at these sites (Fig 6, S1 Illumina and ONT at lower frequencies in the passaged samples because viral titer in vitro 219 typically exceeds viral titer in vivo resulting in higher average coverage in the passaged samples 220 required to support minor variant calls at lower frequencies. Oxford Nanopore kits SQK-LSK109. Samples were barcoded using the Oxford Nanopore Native 327 Barcodes (EXP-NBD104 and EXP-NBD114), and pooled to a total of 140ng prior to being run 328 on the appropriate flow cell (FLO-MIN106) using the 72hr run script. 329 Amplified cDNA was purified using a 1:1 concentration of AMPure XP beads (Beckman Coulter, 331 Brea, CA, USA) and eluted in 48µL of water. PCR products were quantified using Qubit dsDNA 332 high-sensitivity kit (Invitrogen, USA) and were diluted to a final concentration of 0.2 ng/µl (1 ng 333 in 5 µl volume). Each sample was then made compatible for deep sequencing using the Nextera 334 fragmented and tagged with short oligonucleotide adapters, followed by 14 cycles of PCR for 336 template indexing. Samples were purified using two consecutive AMPure bead cleanups (0.5x 337 and 0.7x) and were quantified once more using Qubit dsDNA high-sensitivity kit (Invitrogen, 338 USA). The average sample fragment length and purity was determined using Agilent High were pooled with seven other samples (not included in this manuscript) and were denatured to a 346 final concentration of 14pM with a PhiX-derived control library accounting for 1% of total DNA 347 and was loaded onto a 600-cycle v3 flowcell. Average quality metrics were recorded, reads 348 were demultiplexed, and FASTQ files were generated on Illumina's BaseSpace platform. 349 Seventy-two hours after sequencing was initiated, raw sequencing reads were demultiplexed 351 using qcat (https://github.com/nanoporetech/qcat). In order to deplete host sequences, 352 sequencing reads are mapped against host genome and transcript references, and unmapped 353 reads are saved. Reads were then trimmed by 30bp on each side to discard SISPA primer 354 sequences. In this step, reads with quality scores SNPGenie adapts the Nei and Gojobori method of estimating nucleotide diversity (π), and its 384 synonymous (π S ) and nonsynonymous (π N ) partitions from next-generation sequencing data 385 [31] . As most random nonsynonymous mutations are likely to be disadvantageous, we expect 386 π N = π S indicates neutrality suggesting that allele frequencies are determined primarily by 387 genetic drift. per gene segment, we report the density of variants normalized to gene kilobase length. 540 Outbreak of pneumonia of unknown etiology in Wuhan China: The mystery and the miracle Coronavirus-Infected Pneumonia Epidemiological and clinical characteristics of 99 cases of 415 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study Clinical features of patients infected with 2019 novel 418 coronavirus in Wuhan A novel coronavirus outbreak of global health 420 concern Nextstrain: real-time tracking of pathogen evolution Genomic characterization of the 2019 novel human-424 Genomic characterisation and epidemiology of 2019 novel 427 coronavirus: implications for virus origins and receptor binding Molecular evolution of the SARS coronavirus during the course of the 429 SARS epidemic in China SARS-CoV-2 431 exhibits intra-host genomic plasticity and low-frequency polymorphic quasispecies Genomic diversity of SARS-CoV-2 in Coronavirus Disease 434 2019 patients Identification of a common deletion in the spike protein of 436 SARS-CoV-2 Metagenomic Nanopore Sequencing of Influenza 438 Virus Direct from Clinical Respiratory Samples Nanopore and Illumina sequencing for recovering whole genome sequences of 441 chikungunya and dengue viruses directly from clinical samples Selection on haemagglutinin imposes a bottleneck 443 during mammalian transmission of reassortant H5N1 influenza viruses Pathways Taken during Mammalian Adaptation of a 1918-like Avian Influenza Virus Quantifying within-host diversity of H5N1 influenza 449 viruses in humans and poultry in Cambodia Direct RNA sequencing and early evolution 451 of SARS-CoV-2 Coronaviruses as DNA wannabes: a new model for the regulation 453 of RNA virus replication fidelity Measurements of intrahost viral diversity require an unbiased 455 diversity metric An amplicon-based sequencing framework 457 for accurately measuring intrahost virus diversity using PrimalSeq and iVar Mutations present in a low-passage 460 Zika virus isolate result in attenuated pathogenesis in mice Viral mutation rates Within-Host Evolution of Human Influenza 464 Genetic variation in neuraminidase 466 genes of influenza A (H3N2) viruses Moderate mutation rate in the SARS coronavirus genome and its 468 implications A new twist in measuring mutation rates Trimmomatic: a flexible trimmer for Illumina sequence 471 data VarScan 2: somatic mutation and copy number 473 alteration discovery in cancer by exome sequencing SNPGenie: estimating evolutionary parameters to 475 detect natural selection using pooled next-generation sequencing data Simple methods for estimating the numbers of synonymous and 478 nonsynonymous nucleotide substitutions Adaptive evolution of genes and genomes