key: cord-0297941-r53waxc7 authors: Wertheim, J. O.; Wang, J. C.; Leelawong, M.; Martin, D. P.; Havens, J. L.; Chowdhury, M. A.; Pekar, J.; Amin, H.; Arroyo, A.; Awandare, G. A.; Chow, H. Y.; Gonzalez, E.; Luoma, E.; Morang'a, C. M.; Nekrutenko, A.; Shank, S. D.; Quashie, P. K.; Rakeman, J. L.; Ruiz, V.; Torian, L. V.; Vasylyeva, T. I.; Kosakovsky Pond, S. L.; Hughes, S. title: Capturing intrahost recombination of SARS-CoV-2 during superinfection with Alpha and Epsilon variants in New York City date: 2022-01-21 journal: nan DOI: 10.1101/2022.01.18.22269300 sha: f2cd77c858961a7fc3c600ecb268e11ff5aa6f2f doc_id: 297941 cord_uid: r53waxc7

Recombination is an evolutionary process by which many pathogens generate diversity and acquire novel functions. Although a common occurrence during coronavirus replication, recombination can only be detected when two genetically distinct viruses contemporaneously infect the same host. Here, we identify an instance of SARS-CoV-2 superinfection, whereby an individual was simultaneously infected with two distinct viral variants: Alpha (B.1.1.7) and Epsilon (B.1.429). This superinfection was first noted when an Alpha genome sequence failed to exhibit the classic S gene target failure behavior used to track this variant. Full genome sequencing from four independent extracts revealed that Alpha variant alleles comprised 70-80% of the genomes, whereas the Epsilon variant alleles comprised 20-30% of the sample. Further investigation revealed the presence of numerous recombinant haplotypes spanning the genome, specifically in the spike, nucleocapsid, and ORF8 coding regions. These findings support the potential for recombination to reshape SARS-CoV-2 genetic diversity.

Major strain alleles are those that occurred at frequencies between 60 and 90% (≥3 samples), with all diagnostic Alpha mutations in this set.
Minor strain alleles are those that occurred at frequencies between 10 and 25% (≥3 samples), with all but one diagnostic Epsilon mutation in this set; the A28272T mutation characteristic of Epsilon is absent in NYCPHL-002461. Notably, the "other" category encompasses all other variable sites, i.e., those occurring at an allele frequency (AF) between 25% and 60% or those found in only one or two samples. Two alleles were found in all four replicate sequences at intermediate frequencies: G7723A (30.3%) and C23099A (46.7%). These frequencies are suggestive of intrahost variation in the major strain.

In contrast, the sequencing dataset for the index case, NYCPHL-002130, showed all but one of the alleles occurring at ≥85%, and all but one of the alleles (C14676T) were also found in the "shared" or "major strain" classes in the NYCPHL-002461 datasets (Figure 1). The C23099A mutation, which was at intermediate frequency in NYCPHL-002461, was present at only 88.1% in NYCPHL-002130 from the index case, suggesting the transmission of a mixed viral population between these individuals.

Phylogenetic inference with major and minor variants. We identified sub-clades within Alpha and Epsilon that shared substitutions with the major and minor strains (Figure 2). We inferred a maximum likelihood (ML) phylogenetic tree in IQ-TREE 2 for the major strain and 1174 related B.1.1.7 genomes containing the C2110T, C14120T, C19390T, and T7984C substitutions found in the major strain (Figure 2A). We also inferred an ML tree for the minor strain and 807 related B.1.429 genomes containing the C8947T, C12100T, and C10641T substitutions found in the minor strain (Figure 2C).
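The selection of related genomes by shared substitutions can be illustrated with a minimal Python sketch. It assumes that each genome has already been aligned to Wuhan-Hu-1 coordinates, so that character 2110 of a sequence string corresponds to reference position 2110. The function names (`parse_substitution`, `carries_all`, `select_related`) are hypothetical and are not taken from the authors' pipeline; only the substitution labels come from the text.

```python
import re

# Substitutions named in the text for the two sub-clades.
MAJOR_SUBCLADE = ["C2110T", "C14120T", "C19390T", "T7984C"]  # B.1.1.7 relatives
MINOR_SUBCLADE = ["C8947T", "C12100T", "C10641T"]            # B.1.429 relatives


def parse_substitution(label: str) -> tuple[str, int, str]:
    """Split a label such as 'C2110T' into (reference base, 1-based position, derived base)."""
    match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", label)
    if match is None:
        raise ValueError(f"not a single-nucleotide substitution: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def carries_all(aligned_genome: str, labels: list[str]) -> bool:
    """Return True if the aligned genome has the derived base at every listed position."""
    for label in labels:
        _, pos, alt = parse_substitution(label)
        # Positions are 1-based in the labels and 0-based in Python strings.
        if pos > len(aligned_genome) or aligned_genome[pos - 1].upper() != alt:
            return False
    return True


def select_related(genomes: dict[str, str], labels: list[str]) -> list[str]:
    """Return the names of genomes that carry every substitution in `labels`."""
    return [name for name, seq in genomes.items() if carries_all(seq, labels)]
```

In this sketch, `select_related(genomes, MAJOR_SUBCLADE)` would return the names of the aligned genomes to place in the tree with the major strain. Ambiguous bases and alignment gaps are simply treated as non-matching, which is a simplifying assumption.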
Root-to-tip regression analyses show that the NYCPHL-002461 sampling date is consistent with the molecular clock for both the major and minor strain sequences (Figure

The minor variant is genetically distinct from all other sampled genomes, including any genome sequenced by NYC DOHMH (Figure 2C). The closest relatives were sampled in California (EPI_ISL_3316023, EPI_ISL_1254173, EPI_ISL_2825578), the United Kingdom (EPI_ISL_873881), and Cameroon (EPI_ISL_1790107, EPI_ISL_1790108, EPI_ISL_1790109). The most similar of these relatives is EPI_ISL_3316023, which was sampled on 11 January 2021 in California and represents the direct ancestor of the minor variant on the phylogeny. The only mutation separating these genomes is T28272A, which is a reversion away from an Epsilon-defining mutation. There are no sequenced closely related Epsilon genomes from NYC,

CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 21, 2022.

The longest cloned region spanned 947 nt within the S gene and contained 5 nucleotide substitutions differentiating the major and minor strains plus a variable site in the major variant. Of the 104 clones sequenced within this region, 60 (57.7%) were major strain haplotypes and 13 (12.5%) were minor strain haplotypes, whereas the remaining 31 clones (29.8%) contained both major and minor strain mutations, consistent with recombination (Figure 3A). We observed 11 distinct combinations of major and minor strain mutations across these clones, with two distinct haplotypes present in 6 clones apiece. Most haplotypes are consistent with only a single recombination breakpoint, though we did observe clones consistent with 2 or 3 breakpoints.
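The breakpoint reasoning above can be made concrete with a small Python sketch. It assumes each clone has been reduced to a string with one character per strain-differentiating site, "A" for the major (Alpha) allele and "E" for the minor (Epsilon) allele. This encoding, the function names, and the toy clones are illustrative assumptions, not the authors' code or data. Under this encoding, the minimum number of breakpoints needed to explain a clone is the number of times the parental origin switches between adjacent sites.

```python
from collections import Counter


def min_breakpoints(haplotype: str) -> int:
    """Count switches in parental origin between adjacent differentiating sites."""
    return sum(a != b for a, b in zip(haplotype, haplotype[1:]))


def classify_clone(haplotype: str) -> str:
    """Label a clone as 'major', 'minor', or 'mixed' from its allele string."""
    alleles = set(haplotype)
    if not alleles or not alleles <= {"A", "E"}:
        raise ValueError(f"expected a non-empty string of 'A'/'E': {haplotype!r}")
    if alleles == {"A"}:
        return "major"
    if alleles == {"E"}:
        return "minor"
    return "mixed"


def summarize(clones: list[str]) -> Counter:
    """Tally clone classes across a list of allele strings."""
    return Counter(classify_clone(h) for h in clones)


# Hypothetical clones over five differentiating sites (not real data).
toy_clones = ["AAAAA", "EEEEE", "AAEEE", "AEEAA"]
for hap in toy_clones:
    print(hap, classify_clone(hap), min_breakpoints(hap))
```

This count is a lower bound: two crossovers between the same pair of adjacent sites leave no trace in the allele string, so the true number of recombination events in a clone may be higher.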
The second cloned S region spanned 658 nt, including the Δ69-70 and Δ144 deletions characteristic of the major strain and two substitutions in the minor strain. Of the 93 clones sequenced, 69 (74.2%) were major strain haplotypes, 17 (18.3%) were minor strain haplotypes, and 7 (7.5%) were mixed haplotypes (Figure 3B). Five of these mixed haplotypes contained only one of the two deletions. Unlike in the primary sequencing analyses where the

The third, and shortest, cloned region spanned 476 nt of ORF8, surrounding 4 substitutions defining the major strain and 1 minor strain substitution. Of the 36 cloned sequences, 30 (83.3%) had the major strain haplotype, 2 (5.6%) had the minor strain haplotype, and 4 (11.1%) had mixed haplotypes consistent with recombination (Figure 3C)

Fisher's exact test).

Here, we report evidence of intrahost recombination of SARS-CoV-2 within a single individual superinfected with Alpha and Epsilon viral variants during the second COVID-19 wave in New York City in early 2021. Because recombinant viruses can be successfully generated and transmitted between humans [15], this finding underscores their potential relevance to the future of the COVID-19 pandemic.
The major and minor strains described within the superinfected individual are unlikely to be the result of bioinformatics error, contamination, or experimental artifacts. The degree of evolutionary divergence of each of the strains from other available SARS-CoV-2 genomes is consistent with viruses circulating at the time of their January 2021 sampling dates. Moreover, the major strain genome is identical to contemporaneously sampled genomes from both a named contact and strains circulating in the country that they had both recently visited. No genome closely related to the minor strain was ever sequenced by NYC DOHMH, lessening the probability of a contaminated sample. Given the relatively low sequencing coverage in NYC in January 2021 and the low prevalence of the Epsilon variant, around 1% in NYC at the time, it is not unexpected that a closely related genome would not be observed. Furthermore, the major and minor variants were both present in all four extractions of the two aliquots at similar frequencies, indicating that any contamination, if present, would need to have occurred in the original sample swab.

The timing of this superinfection is important, because January 2021 was the peak of the second COVID-19 wave in NYC, a time when numerous variants were circulating and immediately prior to the vaccination roll-out campaign. Hence, January 2021 in NYC represents not only the height of superinfection risk, but also a setting where superinfection would be most apparent due to the co-circulation of numerous genetically distinct viral variants.
There remain unexplained patterns in the genome sequencing data from the superinfected individual. Evidence of a major and minor strain was not apparent at the S deletions Δ69-70 and Δ144 in the genome sequencing, but the cloning analysis showed major and minor alleles at these sites at the expected frequencies. Therefore, it is possible that the ARTIC protocol preferentially sequenced templates containing these deletions, giving a false impression of their predominance in the genomic analysis. Also of interest is the T28272A change in the minor strain, which is either a reversion of the Epsilon-defining A28272T mutation or a potential sequencing artifact.

The high number and genomic variability of recombinant haplotypes that we have identified within a single superinfected individual suggest that recombination is perpetually occurring within SARS-CoV-2 infections. Whether recombination will play a role in the emergence of novel SARS-CoV-2 variants is an open question. Reduced incidence due to vaccine-induced and

sequencing instrument using V3 600-cycle reagent kit, with a V3 flow cell for 250-cycle paired-end sequencing (Illumina). For NYCPHL-002461, the same "extraction" specimen aliquot was used for a second extraction, Extract B. Extracts C and D were independent extractions, but from the "archived" specimen aliquot.
As such, the first extract (A) and extracts B, C, and D were independent samples that underwent independent reverse transcription, ARTIC PCR, library preparation, and sequencing reactions.

Additionally, reads without a combination of major or minor alleles are excluded. All unique

Population-level recombination detection. We downloaded all SARS-CoV-2 sequences from GISAID that were deposited by 5 September 2021 [37] and analyzed the 27,806 genomes sequenced by the NYC PHL and Pandemic Response Lab (PRL) from specimens collected from NYC residents. These genomes were aligned to the Wuhan-Hu-1 reference genome (GenBank accession NC_045512.2) using MAFFT v7.453 (options --auto --keeplength --addfragments) [33].

To determine whether there was any onward transmission of a major-minor strain recombinant, we used 3SEQ v.1.7 [26] as a statistical test for recombination in the NYC data. 3SEQ interrogates triplets of sequences for signals of mosaicism in a sequence given two 'parental' sequences. We interrogated each of the 27,806 NYC PHL and PRL-generated genomes for mosaicism given the major and minor strains as parents.
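To give a sense of what a mosaicism signal looks like in such a triplet, the following Python sketch reduces a query genome to the sites where the two parental genomes differ and labels each site by the parent it matches. This only illustrates the kind of input a triplet test works from: it does not reproduce the statistic computed by 3SEQ, and the function names and toy sequences are hypothetical.

```python
def informative_sites(query: str, parent_p: str, parent_q: str) -> list[tuple[int, str]]:
    """List (1-based position, 'P' or 'Q') for sites where the parents differ
    and the query matches exactly one of them. Sequences must be aligned."""
    if not (len(query) == len(parent_p) == len(parent_q)):
        raise ValueError("sequences must be aligned to the same length")
    pattern = []
    triplets = zip(query.upper(), parent_p.upper(), parent_q.upper())
    for pos, (c, p, q) in enumerate(triplets, start=1):
        # Skip uninformative sites: parents agree, or either is ambiguous/gapped.
        if p == q or p not in "ACGT" or q not in "ACGT":
            continue
        if c == p:
            pattern.append((pos, "P"))
        elif c == q:
            pattern.append((pos, "Q"))
    return pattern


def parental_runs(pattern: list[tuple[int, str]]) -> list[str]:
    """Collapse consecutive identical labels, e.g. P,P,Q,Q -> ['P', 'Q']."""
    runs: list[str] = []
    for _, label in pattern:
        if not runs or runs[-1] != label:
            runs.append(label)
    return runs


# Toy example (not real data): the parents differ at positions 2, 4, 6, and 8.
major = "AAAAAAAA"
minor = "ACACACAC"
query = "AAAAACAC"
sites = informative_sites(query, major, minor)
print(sites, parental_runs(sites))
```

A query whose labels form more than one run (here a block of major-like sites followed by a block of minor-like sites) is a candidate mosaic. Whether that pattern is statistically distinguishable from chance is the question a formal test such as 3SEQ addresses.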
The resulting p-values were Dunn-Šidák corrected for multiple comparisons (n = 55,612), and we tested for mosaicism at p-value thresholds of 0.05 and 0.25. The single nucleotide differences between a putative recombinant and the major and minor strains were visualized using snipit (https://github.com/aineniamh/snipit).

Data availability. The data analyzed as part of this project were obtained from the GISAID database and through a Data Use Agreement between NYC DOHMH and the University of California San Diego. We gratefully acknowledge the authors from the originating laboratories and the submitting laboratories, who generated and shared via GISAID the viral genomic sequence data on which this research is based. A complete list acknowledging the authors who submitted the data analyzed in this study can be found in Data S1.

Trimmed, host-depleted viral sequencing data and cloned sequence fragments have been submitted to NCBI (accession numbers pending).

Acknowledgements. to their institution unrelated to this research. All other authors declare no competing interests.
there are clones with only major alleles, only minor alleles, and a mix of both alleles. Of the three cloned regions, the S-substitution clones have the highest frequency of mixed variants.
Evolutionary aspects of recombination in RNA viruses
Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic
RNA recombination in a coronavirus: recombination between viral genomic RNA and transfected RNA fragments
Exploring the natural origins of SARS-CoV-2 in the light of recombination
Co-circulation of three camel coronavirus species and recombination of MERS-CoVs in Saudi Arabia
Cytodiagnosis of Candida organisms in cervical smears
Dual infection of novel influenza viruses A/H1N1 and A/H3N2 in a cluster of Cambodian patients
Natural co-infection of influenza A/H3N2 and A/H1N1pdm09 viruses resulting in a reassortant A/H3N2 virus
Case report: change of dominant strain during dual SARS-CoV-2 infection
Long-Term Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Infectiousness Among Three Immunocompromised Patients: From Prolonged Viral Shedding to SARS-CoV-2 Superinfection
Phylogenomics reveals viral sources, transmission, and potential superinfection in early-stage COVID-19 patients in Ontario
Novel Coronavirus Is Undergoing Active Recombination
A Glimpse Into the Origins of Genetic Diversity in the Severe Acute Respiratory Syndrome Coronavirus 2
Generation and transmission of interlineage recombinants in the SARS-CoV-2 pandemic
Recombinant SARS-CoV-2 genomes are currently circulating at low levels. bioRxiv
Evidence for increased breakthrough rates of SARS-CoV-2 variants of concern in BNT162b2-mRNA-vaccinated individuals
The emergence and ongoing convergent evolution of the SARS-CoV-2 N501Y lineages
Transmission, infectivity, and neutralization of a spike L452R SARS-CoV-2 variant
Assessing transmissibility of SARS-CoV-2 lineage B.1.1.7 in England
Ready-to-use public infrastructure for global SARS-CoV-2 monitoring
Detection and characterization of the SARS-CoV-2 lineage B.1.526 in New York
A simple and robust statistical test for detecting the presence of recombination