key: cord-1015277-1wrgslkz
authors: Lin, M. J.; Rachleff, V. M.; Xie, H. J.; Shrestha, L.; Lieberman, N. A.; Peddu, V.; Addetia, A.; Casto, A.; Breit, N.; Mathias, P. C.; Huang, M.; Jerome, K.; Greninger, A. L.; Roychoudhury, P.
title: Host-pathogen dynamics in longitudinal clinical specimens from patients with COVID-19
date: 2021-04-29
journal: nan
DOI: 10.1101/2021.04.27.21256149
sha: 1dea12e5357885a011e57d6246795a00970d476c
doc_id: 1015277
cord_uid: 1wrgslkz

Rapid dissemination of SARS-CoV-2 sequencing data to public repositories has enabled widespread study of viral genomes, but studies of longitudinal specimens from infected persons are relatively limited. Analysis of longitudinal specimens enables understanding of how host immune pressures drive viral evolution in vivo. Here we performed sequencing of 49 longitudinal SARS-CoV-2-positive samples from 20 patients in Washington State collected between March and September of 2020. Viral loads declined over time with an average increase in RT-PCR cycle threshold (Ct) of 0.87 per day. We found that there was negligible change in SARS-CoV-2 consensus sequences over time, but identified a number of nonsynonymous variants at low frequencies across the genome. We observed enrichment for a relatively small number of these variants, all of which are now seen in consensus genomes across the globe at low prevalence. In one patient, we saw rapid emergence of various low-level deletion variants at the N-terminal domain of the spike glycoprotein, some of which have previously been shown to be associated with reduced neutralization potency from sera. In a subset of samples that were sequenced using metagenomic methods, differential gene expression analysis showed a downregulation of cytoskeletal genes that was consistent with a loss of ciliated epithelium during infection and recovery. We also identified co-occurrence of bacterial species in samples from multiple hospitalized individuals.
These results demonstrate that the intrahost genetic composition of SARS-CoV-2 is dynamic during the course of COVID-19, and highlight the need for continued surveillance and deep sequencing of minor variants.

[...] using the Wald test was performed using DESeq2 [21] and deemed significant at a Benjamini-Hochberg adjusted p value < 0.1. Statistical enrichment of Gene Ontology Biological Processes was performed on all significant genes using the R package clusterProfiler [22]. Raw [...] (https://github.com/FredHutch/CLOMP) as previously described [23]. Samples with more than 10 million reads were randomly down-sampled to 10 million reads before analysis using the "sample" command in seqtk (https://github.com/lh3/seqtk). The pipeline output was visualized using the Pavian metagenomic explorer [24], and reads per million (RPM) calculations were done using a custom R script. Results were filtered to highlight RPM counts for a shortlist of clinically relevant taxa (S4 Table). Samples were determined to be positive if the species-level RPM was at least 30 for viruses and at least 100 for bacteria.

The copyright holder for this preprint (which was not certified by peer review; this version posted April 29, 2021) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. https://doi.org/10.1101/2021.04.27.21256149

[...] many with severe disease given the availability of multiple samples from these patients. Consistent with other reports [26], we observed that viral load declined over time in most patients with two or more positive or inconclusive samples, with an average increase in RT-PCR cycle threshold (Ct) of 0.87 per day (Fig 1 and S1 Fig).
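The positivity rule given in the metagenomic methods above (species-level RPM of at least 30 for viruses and at least 100 for bacteria) can be illustrated with a minimal Python sketch. This is not the authors' custom R script. The function names are hypothetical, and the choice of denominator (total reads in the sample after any down-sampling) is an assumption, since the text does not state which read count was used.

```python
# Thresholds taken from the text: species-level RPM >= 30 (viruses), >= 100 (bacteria).
RPM_THRESHOLDS = {"virus": 30.0, "bacteria": 100.0}


def reads_per_million(taxon_reads: int, total_reads: int) -> float:
    """Reads assigned to a taxon, scaled per million reads in the sample.

    Assumption: total_reads is the number of reads in the (possibly
    down-sampled) sample; the source does not specify the denominator.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    # Multiply before dividing so round numbers stay exact in floating point.
    return taxon_reads * 1_000_000 / total_reads


def is_positive(taxon_reads: int, total_reads: int, kind: str) -> bool:
    """Apply the RPM threshold for the given kind of organism.

    kind is a hypothetical label, either "virus" or "bacteria".
    """
    if kind not in RPM_THRESHOLDS:
        raise ValueError(f"unknown kind: {kind!r}")
    return reads_per_million(taxon_reads, total_reads) >= RPM_THRESHOLDS[kind]
```

For a sample down-sampled to 10 million reads, 300 reads assigned to a viral species would meet the viral threshold, while the same count for a bacterial species would not.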
[...] from P004 collected at the time of autopsy, two samples from P006 collected at the same time during a hospital admission, and samples collected 9 hours apart from P012 during an emergency room visit.

Low frequency variants detected across the genome

We analyzed intrahost viral genetic variation by examining all sites with >100x locus depth, masking known problematic sites (see Methods). We examined sites in 47 samples from 20 different patients and found a total of 1267 unique non-synonymous variants relative to the Wuhan-Hu-1 (NC_045512.2) reference genome present at frequencies between 5% and 95% (Fig 2).

Of the seven most commonly observed variants in our dataset (Table 3) [...]
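The site-level filter described in this section (locus depth above 100x, masked problematic sites excluded, non-synonymous variants at frequencies between 5% and 95%) could be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the `VariantCall` record, the `minor_variants` function, and the treatment of the 5% and 95% bounds as inclusive are all hypothetical choices, and the positions in any example input are placeholders rather than real masked sites.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class VariantCall(NamedTuple):
    """Hypothetical per-site record; field names are not from the source."""
    position: int        # 1-based coordinate on the reference genome
    depth: int           # total reads covering the site
    alt_reads: int       # reads supporting the variant allele
    nonsynonymous: bool  # True if the variant changes the amino acid


MIN_DEPTH = 100      # text: sites with >100x depth (strictly greater)
MIN_FREQ = 0.05      # text: frequencies between 5% and 95%
MAX_FREQ = 0.95      # bounds treated as inclusive here (an assumption)


def minor_variants(
    calls: Iterable[VariantCall],
    masked_positions: Set[int] = frozenset(),
) -> List[Tuple[int, float]]:
    """Return (position, frequency) for calls passing all filters."""
    kept = []
    for call in calls:
        if call.position in masked_positions:
            continue  # skip known problematic sites
        if call.depth <= MIN_DEPTH:
            continue  # require depth strictly above 100x
        if not call.nonsynonymous:
            continue  # only non-synonymous variants are counted
        freq = call.alt_reads / call.depth
        if MIN_FREQ <= freq <= MAX_FREQ:
            kept.append((call.position, freq))
    return kept
```

Keeping the thresholds as module-level constants makes it easy to see how the count of retained variants changes if, for example, the depth cutoff is raised.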
For samples that were sequenced metagenomically, we pseudo-aligned reads to the human transcriptome to perform differential expression analysis comparing initial (t = 0) timepoints to later timepoints. Samples with more than 900,000 pseudo-aligned reads (n = 7 initial, 3 later timepoints) were included in the analysis to determine variation in host gene expression over time. We observed a dramatic downregulation of several cytoskeletal genes, particularly dynein heavy chain (DNAH 2, 3, 5, 6, 7, 9, 10, 11, 12), as well as WDRs, MAP1A, [...]

Taken together, our results suggest that low frequency genomic variants emerge in immunocompetent individuals, but that these variants are unlikely to reach fixation.
Given the emergence of rapidly spreading variants of concern over the past several months, the limited intra-host evolution observed in our dataset highlights the critical impact that a select few [...]

[...] Virus. Trends Microbiol. 2018 Sep;26(9):781-93.

[...] microbiome studies and pathogen identification. Schwartz R, editor. Bioinformatics. 2020 [...]
Within-Host Evolution of Human Influenza [...]
[...] SARS-CoV-2 detection across multiple specimen types
[...] SARS-CoV-2 samples to increase molecular testing throughput
Metagenomic Next-Generation Sequencing during an Investigation of Hospital-Acquired Human Parainfluenza Virus 3 Infections
Recovery of Complete SARS-CoV-2 Genomes from Clinical Samples by Use of Swift [...]
Global initiative on sharing all influenza data - from vision to reality
MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability
Nextstrain: real-time tracking of pathogen evolution
Trimmomatic: a flexible trimmer for Illumina sequence data
Near-optimal probabilistic RNA-seq [...]
Pavian: interactive analysis of metagenomics data for [...]
[...] SARS-CoV-2 in Europe and North America
Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus
Prolonged Infectious SARS-CoV-2 Shedding from an Asymptomatic [...]