key: cord-0849958-1gr6tlhp authors: Fischer, Will; Giorgi, Elena E.; Chakraborty, Srirupa; Nguyen, Kien; Bhattarcharya, Tanmoy; Theiler, James; Goloboff, Pablo A.; Yoon, Hyejin; Abfalterer, Werner; Foley, Brian T.; Tegally, Houriiyah; San, James Emmanuel; de Oliveira, Tulio; Gnanakaran, S.; Korber, Bette title: HIV-1 and SARS-CoV-2: Patterns in the Evolution of Two Pandemic Pathogens date: 2021-06-03 journal: Cell Host Microbe DOI: 10.1016/j.chom.2021.05.012 sha: 1b76fabdc490bb942e338c3065784be1d294d92e doc_id: 849958 cord_uid: 1gr6tlhp

Humanity is currently facing the challenge of two devastating pandemics, caused by two very different RNA viruses: HIV-1, which has been with us for decades, and SARS-CoV-2, which has swept the world in the course of a single year. The same evolutionary strategies that drive HIV-1 evolution are at play in SARS-CoV-2. Single nucleotide mutations, multi-base insertions and deletions, recombination, and variation in surface glycans all generate the variability that, guided by natural selection, enables both HIV-1’s extraordinary diversity and SARS-CoV-2’s slower pace of mutation accumulation. Even though SARS-CoV-2 diversity is more limited, recently emergent SARS-CoV-2 variants carry Spike mutations that have important phenotypic consequences, in terms of both antibody resistance and enhanced infectivity. We review and compare how these mutational patterns manifest in these two distinct viruses to provide the variability that fuels their evolution by natural selection.

In the past half century, two distinct novel RNA viruses have caused global pandemics: human immunodeficiency virus type 1 (HIV-1) and severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2). Both emerged as zoonotic pathogens. HIV-1 is most closely related to immunodeficiency viruses found in wild chimpanzees (SIVcpz; Hahn et al., 2000; Sharp and Hahn, 2011); zoonoses involving SIVcpz have occurred multiple times.
The vast majority of HIV/AIDS cases worldwide are associated with the M-group (Main), which comprises subgroups A, B, C, D, F, G, H, and J, in addition to circulating recombinant forms (CRFs). In this review, we use the term HIV-1 in a restricted sense to refer to the M-group only. HIV-1 is transmitted sexually or through other body fluids, such as blood and breast milk. The ancestor of the HIV-1 M group, which ultimately gave rise to the pandemic, is likely to have first entered the human population in Africa early in the 20th century (Zhu et al., 1998; Korber et al., 2000; Worobey et al., 2008). HIV-1 is a chronic infection that over many years results in immunodeficiency and eventually causes death by impairing immune defenses, allowing opportunistic infections to arise. A major challenge in eliminating HIV-1 is latency, which does not occur in SARS-CoV-2. As a consequence, acquired immune deficiency syndrome

position in the alignment, to use as a central reference point. This choice of a reference sequence was made to minimize the number of differences highlighted in our HIV figures (Figure 1A and S1). For illustration, we condense the alignments into a single figure that displays the 35 million bases in the 3,903-sequence curated full-genome HIV-1 alignment (containing only one sequence per infected individual), and >100 million bases in a randomly sampled 3,903-sequence near-full-genome SARS-CoV-2 alignment (Figure S1). The mutational patterns in the Env and Spike proteins, most relevant to vaccine design, are shown in Figure 1A and 1B respectively; the high density of Env mutations reflects the formidable challenge of creating an HIV vaccine that can elicit cross-reactive immune responses.
Other conspicuous features of the data are the different degrees of "bushiness" in the phylogenetic trees (Figure 1), and the large numbers of lineage-specific mutations in HIV-1 Env; these patterns of enriched mutations are specific to, and help to define, major clades and circulating forms (Figure 1 and Figure S1). The HIV-1 tree itself has a star-like structure that is consistent with a rapidly expanding infection in a homogeneously susceptible population. In contrast, in Spike, the emerging clades and "variants of interest" of SARS-CoV-2 are associated with a small number of amino-acid changes across the protein (or base mutations across the ∼30,000 bases of the genome). The long vertical lines in Figures 1B and S1 represent mutations that are shared among phylogenetically clustered sequences. Some prominent clade-defining mutations in Spike are apparent (Figure 1B). These include D614G, which is embedded in a 4-mutation haplotype that defines the G clade (Figure S1); A222V, which became common in the UK and Europe in the summer of 2020 (Bartolini et al., 2020); and S477N, which dominated the Australian sampling in the summer of 2020. Both A222V and S477N became relatively less common in late 2020/early 2021 as the lineages with these mutations were replaced (Shen et al., 2021).

In the first quarter of 2021, different VOI/VOCs have been increasing in prevalence in some geographically distinct local populations at a rapid pace (Deng et al., 2021; West et al., 2021; Rambaut et al., 2020b; Bugembe et al., 2021), but only the B.1.1.7 form was sampled at a high enough frequency globally to be visually apparent in the subset of sampled viruses included in Figure 1B.
We therefore highlight the changes in the Spike protein (Figure 1C), and in the full-length genome (Figure S1C), that characterize the baseline forms of six of the variants that are increasing in frequency in local populations, and are spreading globally. Two things are evident in these figures: first, that each VOI/VOC is itself a lineage (continuing to evolve, sampling additional mutations over time), and second, that multiple VOI/VOCs share particular mutations. For example, the mutation N501Y, noticed first in the B.1.1.7 lineage and considered important due to its location in the Receptor Binding Motif (Rambaut et al., 2020b), is also found in the distinct lineages B.1.351 (South Africa; also called 501Y.V2 (Wibmer et al., 2021)) and P.1 (Brazil). N501Y enhances infectivity, with a modest impact on neutralization (Leung et al., 2021; Rathnasinghe et al., 2021). L452R, found in the B.1.427 and B.1.429 lineages from California and more recently in the B.1.617-related lineages from India, can enhance infectivity, and impart resistance to many RBD-targeting antibodies and sera (Deng et al., 2021; McCallum et al., 2021b). E484K is found in B.1.351, P.1, and in a sublineage of A.23.1; it confers neutralizing antibody resistance (Wibmer et al., 2021). Two distinct mutations at amino acid K417, K417T and K417N, appear in P.1 and B.1.351, respectively; mutations in K417 also contribute to neutralizing antibody resistance (Wibmer et al., 2021). B.1.351 shares a further mutation, A701V, with the B.1.526 variant first reported from New York (Figure 1C); A701V is an example of a shared mutation that is, as yet, unexplored for phenotypic consequences. The observation of any single mutation in multiple expanding lineages suggests convergent evolution, i.e., that fitness effects of that mutation help drive lineage expansion.
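The screen for mutations that recur in distinct lineages, described above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary holds just the Spike substitutions that this section names for each lineage (it is not a full lineage definition, and the E484K-carrying sublineage of A.23.1 is left out), and the helper names `shared_mutations` and `shared_sites` are hypothetical.

```python
import re
from collections import defaultdict

# Spike substitutions per lineage, restricted to those named in the text.
# Illustrative subset only; real lineage definitions contain more changes.
LINEAGE_MUTATIONS = {
    "B.1.1.7": {"N501Y"},
    "B.1.351": {"N501Y", "E484K", "K417N", "A701V"},
    "P.1": {"N501Y", "E484K", "K417T"},
    "B.1.427": {"L452R"},
    "B.1.429": {"L452R"},
    "B.1.617-related": {"L452R"},
    "B.1.526": {"A701V"},
}

# Substitution notation: ancestral residue, position, new residue (e.g. N501Y).
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def shared_mutations(lineage_mutations, min_lineages=2):
    """Return {mutation: sorted lineages} for mutations seen in >= min_lineages."""
    carriers = defaultdict(set)
    for lineage, mutations in lineage_mutations.items():
        for mutation in mutations:
            carriers[mutation].add(lineage)
    return {m: sorted(ls) for m, ls in carriers.items() if len(ls) >= min_lineages}


def shared_sites(lineage_mutations, min_lineages=2):
    """Same idea, but grouped by position, so K417N and K417T count as one site."""
    carriers = defaultdict(set)
    for lineage, mutations in lineage_mutations.items():
        for mutation in mutations:
            match = MUTATION.match(mutation)
            if match:
                carriers[int(match.group(2))].add(lineage)
    return {s: sorted(ls) for s, ls in carriers.items() if len(ls) >= min_lineages}


if __name__ == "__main__":
    for mutation, lineages in sorted(shared_mutations(LINEAGE_MUTATIONS).items()):
        print(mutation, lineages)
    print(shared_sites(LINEAGE_MUTATIONS)[417])
```

Grouping by position as well as by exact substitution matters here because K417T and K417N are different changes at the same site, and only the site-level view shows that P.1 and B.1.351 both altered residue 417.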
As noted above, several of these mutations have been shown experimentally to be advantageous to the virus. The observation of multiple such mutations in particular lineages suggests that these fitness effects can be cumulative.

Insertions and deletions (indels) are a critical adaptive mechanism for both HIV-1 and SARS-CoV-2, but they manifest differently. Indels originate via non-homologous recombination and can happen anywhere in the HIV genome. However, viable indels that do not introduce frameshifts are most commonly found in hypervariable regions (Wood et al., 2009). HIV-1 insertions generally manifest as direct, short-repeat duplications (Wood et al., 2009). Four of the variable loops of the HIV-1 Env protein (V1, V2, V4, and V5, but not V3) contain hypervariable sections that have an extraordinary capacity to change by insertion and deletion (Bricault et al., 2019); the variability in these regions is dramatic and plays an important role in neutralizing antibody resistance. The extraordinary length variation in these four hypervariable loops (Figure 2A) is accompanied by changes in net charge and in the number of N-linked glycosylation sites (Tian et al., 2016). For example, the V1 loop in Env can accommodate lengths that range from 5 to 66 amino acids (median, 30; Figure 2A); some hypervariable V1 loops include no N-linked glycosylation sites, others up to 11 (median, 4), and they have a net charge ranging from -6 to 8 (median, -1). Much of the observed population-level variation in these loops can be recapitulated in a single individual during the course of infection (Stephenson et al., 2020).
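The three loop attributes quoted above (length, net charge, and number of N-linked glycosylation sites) can each be computed from a loop sequence alone, without an alignment. The following Python sketch shows one way to do so. It assumes a simple charge convention (K and R count +1, D and E count -1, histidine is treated as neutral) and the standard N-X-S/T sequon with X not proline; the function name `loop_attributes` and the example peptide are hypothetical and are not taken from any real Env sequence.

```python
import re

# N-linked glycosylation sequon: N, then any residue except P, then S or T.
# The lookahead lets overlapping sequons (e.g. "NNST") each be counted.
SEQUON = re.compile(r"N(?=[^P][ST])")


def loop_attributes(loop_seq):
    """Return length, net charge and sequon count for one loop sequence.

    Gap characters ('-') are removed first, so the function accepts either
    a raw loop sequence or a loop excised from an alignment.
    """
    seq = loop_seq.upper().replace("-", "")
    positive = seq.count("K") + seq.count("R")  # assumed +1 each
    negative = seq.count("D") + seq.count("E")  # assumed -1 each
    return {
        "length": len(seq),
        "net_charge": positive - negative,
        "n_glyc_sites": len(SEQUON.findall(seq)),
    }


if __name__ == "__main__":
    # Hypothetical toy loop, for illustration only.
    print(loop_attributes("CNATNTSDEKC"))
```

Because none of the three values depends on column positions, loops of very different lengths can be compared directly, which is the property the text relies on when alignment in these regions is unreliable.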
Hypervariable region indel evolution begins early in HIV-1 infections, and contributes to viral escape from the earliest antibodies as they begin to impose selective pressure during antibody/viral co-evolution (Bar et al., 2012; Gao et al., 2014; Bonsignori et al., 2017; Roark et al., 2021; Korber et al., 2017).

Because indels are the primary evolutionary driver within the hypervariable regions of Env, it is inappropriate to assume sequence homology, or that base substitution drives evolution, in these regions. Thus, alignment-dependent strategies for identifying positive selection associated with the acquisition of immunological resistance can be misleading. Generally, therefore, we explore the impact of hypervariable regions on neutralizing antibody sensitivity by using three attributes of the variable loops that are independent of alignment: loop length, net charge, and number of glycosylation sites. Particular characteristics of certain loops are associated with resistance to particular classes of broadly neutralizing antibodies (Bricault et al., 2019). The loop lengths, their charge, and variable glycosylation patterns all affect loop conformation, directly modulating access to critical epitopes. A remarkable aspect of Env hypervariable region evolution is that the location of hypervariable indels observed in a human host during early infection with HIV-1 will often be precisely recapitulated when that same Env is incorporated into a SHIV construct and used to infect Rhesus macaques (Roark et al., 2021).

SARS-CoV-2 is also accumulating indels that can have critical phenotypic consequences, but, thus far, only a few specific deletions have become prominent among pandemic variants. As in HIV-1 Env, these deletions often occur in, or proximal to, structurally flexible loop regions.
The Spike ∆H69/V70 deletion (∆69/70) is the most common globally, and is found in many lineages and distinctive Spike contexts (Figure 2B,C). It first came to prominence in association with mink farm outbreaks in Denmark (European Centre for Disease Prevention and Control, 2020; Lassaunière et al., 2020; van Dorp et al., 2020), paired with either the N439K or Y453F mutations in the RBD (Shen et al., 2021; Kemp et al., 2021a). One study suggested that ∆69/70 has minimal impact on the neutralization potency of serum from convalescents or vaccinees (Shen et al., 2021), although another found that this deletion could affect antibody binding and/or neutralization (McCarthy et al., 2021); a third study reported that ∆69/70 can enhance infectivity in vitro (Kemp et al., 2021). These forms of the virus were a significant presence in the European epidemic in the summer and early fall of 2020 (Shen et al., 2021). Their prevalence, however, like that of the G clade they descended from, began to decline soon after the more transmissible B.1.1.7 variant (also a G clade descendant) was first sampled in the United Kingdom in late September 2020 (Volz et al., 2021; Rambaut et al., 2020b; Davies et al., 2020).

There are also small spatially localized clusters of distinct indels found in Spike that are rare but likely to be viable and transmitted, as they often are sampled multiple times (Figures 2B and S2). The most interesting of these clusters is in the region between Spike positions 137-148 (Figure S2A).
While this Spike variable region is much less variable than the hypervariable regions of HIV-1, it shares some features with them: (i) the region overlaps with an exposed loop on Spike, the N3 loop (Chi et al., 2020); (ii) there are many distinctive patterns of local deletions observed in this region: along with the very frequently observed ∆144 deletion, a spectrum of 24 other distinct deletion patterns is found among 341 different Spike sequences (Figure S2); and finally, (iii) it is embedded in the NTD supersite, and so, like ∆144 (McCallum et al., 2021; Cerutti et al., 2021), the other deletions in this region are also likely to impact antibody resistance. There are also rare deletions that are near to, or span, the furin cleavage site (Figure S2A, positions 671-693); a deletion of the furin cleavage site augmented viral growth in culture, but produced virus that was attenuated in vivo.

A third deletion, ∆L242/A243/L244 (∆242-244), is found in the B.1.351 lineage that has come to dominate the South African epidemic (Wibmer et al., 2021). This variant has a formidable neutralization resistance profile, and ∆242-244 has been proposed to alter the loop structure and contact region for NTD-targeting neutralizing antibodies (Wibmer et al., 2021). The positions 242-244 are not themselves in a loop, but represent three hydrophobic residues at the end of a strand in a β-hairpin motif; deleting them would likely alter the structure of the N5 loop that connects the strands, a contact region for NTD-targeting neutralizing antibodies. Interestingly, ∆242-244 is found not only in the B.1.351 backbone but also in a small number of B.1.1.7 lineage sequences, as well as in a few sequences with no additional Spike mutations (e.g., 3 from South Africa in Dec. 2020, and one from China in Feb. 2020; in all, 4 cases in the context of an ancestral form of Spike at position 614, D614).
The ancestral D614 is currently (March, 2021) very rarely sampled. D614G confers a fitness advantage in terms of transmissibility, and global samples had almost entirely shifted to the mutated form by early summer of 2020 (Hou et al., 2020). Still, the D614G mutation may come at a cost for the virus, as some have found the ancestral D614 form to be more resistant to neutralization by sera. A 4-6 fold increase in vaccine sera sensitivity was observed for D614G in one study (Weissman et al., 2021), while in another an average 1.7 fold difference was observed among sera from hamsters infected with D614 virus against the D614G variant. However, not all studies find such a difference. For example, Hou and colleagues did not see a significant difference between the two forms in terms of neutralization sensitivity to human convalescent sera. As the virus is increasingly confronted with convalescent and vaccine sera over the course of 2021, the greater neutralization sensitivity of the D614G form (if this is indeed the case) may come to outweigh its increased transmissibility as a selective force at the population level, and D614 may begin to re-emerge. Of note in this regard, the ancestral D614 is part of the Spike signature of the VOI in the A.23.1 lineage that recently emerged from Uganda. D614 has also recently resurfaced in combination with ∆69/70 and with ∆144 (Figure

and in many other contexts, by a precise 6-base deletion that overlaps 3 codons encoding Spike amino acids 68-70 (Figure 2E). As noted above, such repeated precise mutations were also found in SHIV studies with specific HIV-1 Envs even in different infected hosts (Roark et al., 2021).
In addition to spontaneous indel recurrence, recombination may contribute to indel movement through the population, enabling selection and increasing the frequencies of distinct variants, and variants-of-variants, that carry ∆69/70, ∆144, and ∆242-244 in Spike backbones (Giorgi et al., 2021; Varabyou et al., 2020).

More extensive indel patterns have been increasingly observed in recent regionally emerging variants through the spring of 2021. Some examples include a complex variant increasingly sampled in Chile, and spreading internationally, that carries a seven amino acid deletion at Spike ∆246-252 (previously Pango lineage B.1.1.1; now called C.37); a variant that is increasingly sampled in the Philippines that carries two deletions, Spike ∆141-143 and ∆243-244 (lineage P.3); a variant that is increasingly frequently sampled globally and was first sampled in India (lineage B.1.617.2, a CDC-listed VOI), that carries a two amino acid deletion in a distinctive region, Spike ∆156-157; and a still very rare but particularly interesting variant in terms of indels that was first sampled in Russia (lineage AT.1), with a large nine amino acid deletion, Spike ∆136-144, and a four amino acid insertion at Spike 679, GIAL, very near the furin cleavage site.

The SARS-CoV-2 Spike protein and HIV-1 Envelope (Env) protein are Class I viral fusion glycoproteins (White et al., 2008) that are trimeric in both pre- and post-fusion states. Env is the smaller of the two. In its native state, it forms a heterodimeric trimer comprised

Figure 3C) (Wrobel et al., 2020); this increases the solvent-accessible surface area (SASA) to 1300 nm². HIV-1 Env also undergoes substantial conformational change during the pre-fusion process of docking to a target cell. Upon Env binding to its primary receptor (CD4), the variable V1 and V2 loops move away from the Env apex, exposing the CCR5 co-receptor binding site.
CCR5 binding triggers a conformational transition that enables the gp41 fusion machinery to access the target cell membrane and initiate fusion (Wang et al., 2016). Thus, both viral surface proteins exhibit significant conformational plasticity that protects the receptor-binding interface until the critical moment, enabling the preservation of high-affinity binding to host receptors in the face of immune pressure.

Both Spike and Env proteins are highly glycosylated, primarily with N-linked glycans, although both also include less well-characterized O-glycans (Shajahan et al., 2020; Silver et al., 2020). Glycans are extremely dynamic and are much more flexible than the underlying protein, so any single glycan can sample a large volume in space. Multiple glycans in combination become, in effect, a physical shield that blocks antibody access to the antigenic surface of the protein for both HIV-1 Env (Wei et al., 2003; Berndsen et al., 2020)

Site-specific mass spectrometry studies (Behrens et al., 2016; Watanabe et al., 2020a) indicate that, in both proteins, each glycosylation site is occupied by heterogeneous mixtures of glycoforms: high oligomannose, complex (with or without fucosylated cores and negatively charged sialic acid tips), and hybrids of the two. Relative glycoform frequencies depend on expression cell lines and their glycosylation enzyme repertoire (Goh and Ng, 2018). Both the chemical composition of individual glycans and of the amino acids in physical contact with them can affect the orientation of glycans and inter-glycan interactions. All these factors affect the overall topology and the immunological protection conferred by the glycan shield.

For HIV-1, glycan shield evolution is an important immune evasion strategy.
Glycans on the HIV-1 Env are concentrated on long, flexible loops, increasing their dynamic range and spatial coverage (Figure 3B). The Glycan Encounter Factor (GEF) gives the probability of a probe's encountering a sugar's heavy atoms as it approaches the Env surface. This provides a metric to quantify the glycan coverage over the surface of the protein, illustrating how well the glycan shield can protect against approaching antibodies (Figure 3C). Due to the dense and dynamic glycan coverage, the Env protein has high GEF across almost the entire antigenic surface, although the CD4 binding site where the Env interacts with the host receptor (Figure 3C) remains relatively exposed. In HIV-1 Env, certain glycan sites can sometimes shift by a few amino acids with important immunological consequences. For example, a shift of a glycan from position 332 to 334 can result in significant antibody resistance for broadly neutralizing antibodies (bNAbs) targeting V3-glycans (Bricault et al., 2019). HIV-1 bNAbs with breadth and potency favor long heavy chain third complementarity determining regions (HCDR3s), which enable these antibodies to reach through the glycan shield to target epitopes at the protein surface (Dashti et al., 2019). Since the glycan shield topology varies between HIV-1 viruses, glycan holes with low GEF regions may vary likewise, greatly affecting Env variant sensitivity to different antibody responses.

Due to the smaller number of glycans on a larger surface area, the density of glycosylation is much lower on the SARS-CoV-2 Spike than it is on HIV-1 Env (Figure 3). Since the Spike protein surface is less effectively shielded by glycan coverage relative to the Env surface, two regions that harbor critical neutralizing antibody epitopes, the RBD and the N-terminal Domain (NTD) supersite region, are relatively exposed.
The RBD forms the functional interface between the Spike protein and the host ACE2 receptor, and contains several mutations seen in the recent variants. The recurrent N501Y mutation alters specific interactions with ACE2 and may lead to increased binding affinity, as well as enhanced infectivity (Leung et al., 2021; Rathnasinghe et al., 2021). When a single RBD is rotated up, effecting the change from "all-down" to "one-up" conformation that enables ACE2 receptor binding (Wrapp et al., 2020), all-atom molecular dynamics simulations show that there is an accompanying transition in the glycan shield (Figure 3C): in the "one-up" conformation, the glycan coverage at the apex of the trimer disappears (Mansbach et al., 2021). Of note, when the RBD is in the "one-up" conformation (Figure 3C) the amino acid at site 501 is exposed to solvent with no glycan coverage, not even by neighboring RBD glycans N331 and N343.

Recombination has played a critical role in HIV-1 evolution (Zhang et al., 2010). HIV-1 nomenclature (Robertson et al., 2000) recognizes both major clades, specified A-K, and over 100 circulating recombinant forms (CRFs) (characterized and listed at the Los Alamos HIV database, www.hiv.lanl.gov). Such inter-subtype recombination events are readily detected by sequence analysis; within-subtype recombination is more challenging to resolve, but can still be identified (Kiwelu et al., 2013; Nikolaitchik et al., 2015). Recombination is also a very important evolutionary mechanism over the course of a natural HIV-1 infection within a single individual (Shriner et al., 2004; Song et al., 2018). A bioinformatic tool developed to track within-subject recombination in the low-diversity setting of HIV early infection (RAPR; Song et al., 2018) can also be usefully applied in the low-diversity setting of SARS-CoV-2 in the COVID-19 pandemic.
Coronavirus infections are frequent and widespread across different animal reservoirs, where distinct viruses may coexist in the same hosts and often recombine (Denison et al., 2011; Su et al., 2016). At high multiplicities of infection, more than 25% of viral progeny may be recombinant (Baric et al., 1990). Recombination is an important element of coronavirus evolution, can be observed even between different coronavirus families, and has been implicated in the origin of SARS-1, MERS, and SARS-CoV-2 (Sabir et al., 2016; Li et al., 2020a; Lam et al., 2020; Hon et al., 2008). In low diversity settings, such as the first year of the SARS-CoV-2 pandemic, many standard bioinformatic strategies for detecting recombination will be insufficiently sensitive (this includes, e.g., strategies developed to detect recombination between major HIV-1 clades). Nevertheless, several studies have found occurrences of recombination among SARS-CoV-2 pandemic variants (De Maio et al., 2020; Korber et al., 2020a; Varabyou et al., 2020). Varabyou and colleagues found evidence of recombination in SARS-CoV-2 based on major clades and their defining mutations. By using this method to screen the full GISAID database (http://gisaid.org), they found hundreds of instances of likely recombinants, some of which persisted in the population. They could demonstrate that at least some of these recombinants were not the result of sequencing from mixed infections, and that some were parts of transmitted lineages (Varabyou et al., 2020).

Using the RAPR tool, which was designed specifically for low diversity settings (Song et al., 2018), we find strong evidence of recombination among geographically regional sets of SARS-CoV-2 sequences.
RAPR uses the full set of variable positions in its analysis, not just major clade defining positions, which may enhance sensitivity in some cases, but it is computationally intensive (as it compares all possible sequence triplets), which limits its use to

The level of sequence diversity differs conspicuously between SARS-CoV-2 and HIV-1 (Figures 1 and S1). Outside of closely related transmission chains, within-subtype Hamming distances between Env proteins typically range from 10% to 25%, with between-subtype differences of 20% to 40%. Within each single host, HIV evolves at a rate approaching 1% per year (Krakoff et al., 2019). The baseline form of the most divergent lineage of SARS-CoV-2 characterized to date, P.1, carries only 12 changes in the 1273 amino acids of the Spike protein, less than 1% (Toovey et al., 2021). But a comparison of the global transitions in variant frequencies over time merits consideration, as for both viruses, variants and global diversity will shape the future success of vaccines. A general feature of the HIV-1 epidemic is the gradual increase in diversity within clades in local geographic populations; these changes are accompanied by greater levels of resistance to sera derived from natural infection (Rademeyer et al., 2016).

To illustrate the degree of large-scale change in the HIV-1 pandemic over time, we compare the global subtype and CRF distribution between two 6-year windows, 2000-2005 and 2015-2020 (Figure 5A). A striking feature of this analysis is the relatively consistent frequency of sampling of different subtypes in different major geographic regions (Bbosa et al., 2019). This consistency, which is likely a consequence of the much lower transmission rate of sexually transmitted pathogens, may enable regional deployment of subtype-specific immunological strategies for prevention, if they can be successfully developed.
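The diversity figures quoted earlier in this section (Env Hamming distances of 10% to 25% within subtypes, and 12 changes in the 1273-residue Spike for P.1) are per-site fractions of differing positions between aligned sequences. A minimal Python sketch of that calculation follows. It rests on assumptions not stated in the text: the two sequences are already aligned to equal length, a gap against a residue counts as a difference, and columns where both sequences have a gap are skipped. The function name `hamming_fraction` is hypothetical.

```python
def hamming_fraction(seq_a, seq_b):
    """Fraction of aligned positions at which two sequences differ.

    Assumes seq_a and seq_b come from the same alignment (equal length).
    Columns that are gaps ('-') in both sequences are ignored; a gap
    opposite a residue is counted as a difference.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differing / compared


if __name__ == "__main__":
    # Toy peptides: one difference in four positions.
    print(hamming_fraction("ACDE", "ACDF"))
    # The Spike figure from the text: 12 changes in 1273 residues.
    print(f"{12 / 1273:.2%}")
```

The second calculation gives the basis for the "less than 1%" statement: 12/1273 is about 0.94%, an order of magnitude below the lower end of the within-subtype Env range.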
The form was found to be commonly circulating in Nigeria, an unexpected finding since B clade variants have rarely been sampled in Africa (Billings et al., 2019). One of the more unfortunate recent trends evident in Figure 5 is that CRF01 viruses, which once had a more limited distribution focused in Southeast Asia, are now more commonly sampled in China and Australia. Due to glycan shifts, as discussed above, CRF01 viruses are almost completely insensitive to V3-glycan-targeting bNAbs, a key focus of current vaccine design efforts.

In contrast to the slow transitions in HIV subtypes and diversification over decades, SARS-CoV-2 variants that carry an advantageous set of mutations can move globally very swiftly, and effect near-total turnover of local populations on the time scale of a few months (Figure 5B and C). If this were to occur repeatedly without lineage extinction, a star-like phylogeny would result; in contrast, if less-fit lineages are repeatedly driven to very low levels, a ladder-like tree is expected, as with influenza. A visual signature of this phenomenon is present in Figures 1 and S1, where rapidly expanding lineages have reduced background mutations. Rapid lineage expansion in SARS-CoV-2 was first observed as the G clade rapidly replaced the ancestral virus that had initially seeded the global pandemic in the spring of 2020 (Korber et al., 2020a). By the autumn of 2020, the viruses carrying the ancestral D614 form of the virus were very rarely seen (Figure 5B) are persisting. One of several CDC-designated B.1.617-related VOIs that were originally detected in India, B.1.617.2, has begun to be increasingly sampled globally, and is rapidly increasing in prevalence in England, and also has a presence in North America (Fig.
5C).

As we have highlighted throughout this review, there are some similarities between SARS-CoV-2 and HIV-1, but key differences as well. Both are enveloped RNA viruses, and are animal viruses that crossed into humans, and gave rise to pandemics. HIV-1 is primarily sexually transmitted and took many decades to acquire a global presence, while SARS-CoV-2 is a respiratory infection that became a global pandemic within months of its initial detection. Both viruses evolve using insertions, deletions, and recombination in addition to base substitution. Both viruses evolve under immune pressure, and have circulating variants with mutations in key epitope regions that confer relative resistance to neutralizing antibodies; indels are important for the evolution of antibody resistance in both HIV-1 and SARS-CoV-2. Both viruses have heavily glycosylated receptor-binding surface proteins that enable entry into host cells, but the glycan shield of HIV-1 Env is far denser. A fundamental biological difference is that HIV-1 is a retrovirus and its genetic material can be harbored in latently infected cells, making it very difficult to clear. As a chronic infection, HIV-1 continues to evolve under immune pressure in every infected individual. In contrast, SARS-CoV-2 infections are typically soon cleared, although rare chronic cases of COVID-19 may be contributing to the more extensively mutated variants of interest and concern. Vaccines for HIV-1 are very challenging, in part because of the difficulty of inducing antibodies that can penetrate the glycan shield, and in part because of the tremendous diversity of the virus. Vaccines for SARS-CoV-2 were enabled by the relative accessibility of the key epitopes for neutralizing antibodies in the RBD and the NTD supersite, and by the limited variability of these epitope regions in the initial phases of the pandemic.
As further variation in these regions continues to emerge, it will be critical to document the impact of arising mutations and to remain agile in our response to COVID-19.

In the spring of 2020, SARS-CoV-2 was advancing through an immunologically naive population, and swiftly spreading in the context of its new human host. In such a scenario, it was reasonable to suppose that enhanced infectivity and transmissibility would confer a primary selective advantage, and this indeed proved to be the case. There was a repeated and very rapid shift in prevalence to G clade viruses (which carried the D614G mutation) essentially whenever that variant entered a new geographic region, even if that region had an ongoing, well-established, ancestral Wuhan-variant epidemic. G clade viruses were found to be more infectious in pseudotype assays, were associated with higher viral loads in the upper respiratory tract, and were shown to be more infectious in laboratory animals. What changed over the course of 2020 is that the virus began to encounter and propagate through populations with varying levels of immunity from prior exposure. Under these conditions, immunological resistance has greater potential to be favored as a force for positive selection; widespread vaccination will further increase such selective pressure during 2021. The G-clade viruses may be somewhat more susceptible to serum neutralization (Weissman et al., 2021; Plante et al., 2021), and selection for antibody resistance may in the future counter-select for viruses with the ancestral D614 form. Some early evidence for this is that the A.23.1 viral lineage, which was increasing in prevalence in Central Africa (Bugembe et al., 2021), carries the ancestral D614 form.
Many new VOI and VOC have begun to emerge that carry both multiple neutralizing antibody resistance mutations and enhanced-infectivity mutations, and many of the mutations that phenotypically benefit the virus are being resampled concurrently in different lineages (Figures 1, 2 and 4). This suggests that the virus may be exploring and re-exploring a favored mutational landscape within the context of currently circulating forms. Thus, understanding and defining recurrent mutation events may serve to guide us as we prepare for the possibility of second-generation vaccine designs to contend with growing viral diversity. By the summer of 2021, the virus will be moving through mostly vaccinated populations in some countries, and through populations increasingly enriched for recovered individuals, and so the evolutionary pressures driving selection may once again be altered. The future course of the SARS-CoV-2 pandemic may well be set by the events surrounding large-scale vaccine rollout in the first and second quarters of 2021. Although complete viral eradication is unlikely, there are different possible modes of viral persistence, with different implications for future vaccine control. In influenza, selection for resistance and seasonal bottlenecking give rise to a ladder-like tree topology: there is extensive diversity over many years, but relatively little in any one season. In HIV-1, long-term persistent infection and co-evolution with immune responses produce a "bushy" tree, with the simultaneous and temporally extended epidemiological presence of extremely diverse viral lineages. Forcing case counts to low levels (as in the influenza seasonal bottleneck) reduces the variation available for viral evolution. If this were achieved in the current pandemic, SARS-CoV-2 might be driven to an influenza-like evolutionary trajectory.
The developing phylogeny would then have a more ladder-like topology, with population-level immunity influencing shifts in dominant variants over time. In this case, a strategy of periodically updated vaccines specifically targeting the currently circulating variants may suffice for continuing vaccine protection. If, on the other hand, a wide range of variants continues to circulate, diversify, and recombine, the eventual result could be simultaneous and continuous circulation of various phylogenetically and immunologically distinctive variants, and possibly emerging recombinants between them. This situation, more reminiscent of HIV-1 than of influenza, would present a different kind of challenge for vaccines, and might require vaccines designed to induce broad responses that specifically address frequently sampled antibody-resistance mutations arising in multiple lineages. Several highly efficacious COVID-19 vaccines were deployed within a year of the emergence of SARS-CoV-2 (Richman, 2021; Tumban, 2020), whereas 40 years of research have failed to produce any comparable vaccine for HIV-1. To begin to understand this discrepancy, one only has to look at Figures 1 and 3. In HIV-1, the key vaccine-elicited antibody epitopes are diverse at the sequence level, and they are well protected by a dense, highly variable, and dynamic glycan shield. These challenges require innovative and complex strategies to ultimately enable the design of an effective HIV-1 vaccine that can induce broadly neutralizing antibodies. In contrast, the SARS-CoV-2 key epitope regions have accumulated only small numbers of mutations, and comparatively slowly (Figure 1), and key neutralizing epitope regions are exposed and hence vulnerable to antibodies (Figure 3). Spike vaccine antibody responses are very potent and target multiple epitopes, and to date they have generally been resilient and able to offer protection against variants with modest numbers of mutations.
The SARS-CoV-2 vaccines may also elicit cross-reactive T-cell responses; known SARS-CoV-2 T-cell epitopes are highly conserved (Tarke et al., 2021; Redd et al., 2021). The impact of such responses is still being determined. Rhesus macaques (RMs) are naturally resistant to severe disease. In one experiment, CD4+ or CD8+ T cell depletion in RMs only slightly prolonged recovery from infection and did not impact re-infection (Hasenkrug et al., 2021), while another found that CD8+ T cell depletion of convalescent macaques partially abrogated protective immunity against rechallenge (McMahan et al., 2021). In the more diverse HIV-1, many T-cell epitopes are also highly variable, and relatively few vaccine-elicited responses to full HIV-1 proteins are likely to cross-react with circulating variants (Korber and Fischer, 2020). As variants shift in prevalence and the SARS-CoV-2 pandemic takes on new forms, our capacity to track and test these variants, and to use past mutational patterns to anticipate the future, may enable us to keep up with our viral foe's evolutionary twists and turns.
Mutations of particular interest that are discussed in the text are labeled in panels B and C. The SARS-CoV-2 sequence data in this figure are from the GISAID 2021-02-25 release ("near-complete" alignment as described in Korber et al., 2020b; alignment statistics at https://cov.lanl.gov).
Kingdom between November 1, 2020 and May 10, 2021. In the fall, the G clade (light grey) and the GV clade (the G clade with an additional A222V mutation; darker grey) were co-circulating, with a gradual increase in the GV clade relative to the G clade over the summer and fall. B.1.1.7 (orange) was first sampled in September and rapidly increased in prevalence in the UK, comparable to the global transitions we found when the G clade became globally dominant.
In the spring of 2021, B.1.617.2, initially sampled in India, had begun to rise significantly in frequency in the UK. In this evolutionary pattern, one form gave way successively to another: G, to GV, to B.1.1.7. Currently, B.1.617.2 is being increasingly sampled; over the next few months we will learn whether B.1.617.2 continues on this upward trajectory in the UK and elsewhere. The same data are plotted two ways: weekly average tallies of each form, to give a sense of sampling, and weekly average frequencies. Below, the same data are plotted for North America. The G clade was dominant in the fall. G clade forms that carried additional mutations near the furin cleavage site (magenta and purple) became increasingly frequently sampled, but then gave way to variants with more complex forms of Spike, which often still carried a positive charge near the furin cleavage site. When the B.1.1.7 variant began to be sampled in early December, distinct forms with an established presence and relative fitness advantages were already co-circulating, and VOI/VOCs first sampled from California, Brazil, and New York all had a significant presence. Still, B.1.1.7 has been increasingly sampled throughout North America, although P.1 and B.1.526 also continue to maintain or increase their frequency in some states in the USA. As of early May 2021, B.1.617.2 is still rare but present and increasing in frequency in North America.
Figure 1 in the main text. Estimated mutational parameters based on maximum likelihood trees generated using RAxML-NG (Kozlov et al., 2019), with a general time-reversible model with rate categories estimated using an 8-category Gamma rate distribution (GTR+G8). Base frequencies:
Sequences are ordered top-to-bottom according to the phylogenetic tree in the left panel; consequently, continuous vertical stripes indicate lineage-specific mutations that are shared by related sequences (see text).
The trees in the left panel are in each case derived from a whole-genome nucleotide alignment: for HIV-1 Env, an approximation to the maximum-likelihood tree generated with RAxML-NG (Kozlov et al., 2019); for SARS-CoV-2 Spike, a neighbor-joining tree inferred with PAUP (Swofford, 2003) using Log-Determinant (logdet) distances with uninformative sites removed (see Swofford et al., 1996, pp. 459-462), saved with parsimony branch lengths.
Additional prolines are occasionally added to Spike near 520/521, which are located at the end of the RBD. In the top case (shown in the row beneath Anc), we found a repeated A522P change, no insertion, but a distinctive way to add two prolines in this local domain, so we included it here. In the other cases, either a single P, or a two-
Autologous HIV-1 neutralizing antibodies: emergence of neutralization-resistant escape virus and subsequent development of escape virus neutralizing antibodies
Prolonged severe acute respiratory syndrome coronavirus 2 replication in an immunocompromised patient
Early low-titer neutralizing antibodies impede HIV-1 replication and select for virus escape
Establishing a genetic recombination map for murine coronavirus strain A59 complementation groups
The newly introduced SARS-CoV-2 variant A222V is rapidly spreading in Lazio region
HIV subtype diversity worldwide
Visualization of the HIV-1 Env glycan shield across scales
Viral variants that initiate and drive maturation of V1V2-directed HIV-1 broadly neutralizing antibodies
New subtype B containing HIV-1 circulating recombinant of sub-Saharan Africa origin in Nigerian men who have sex with men
Epidemic dynamics and antigenic evolution in a single season of influenza A
Viral evolution and escape during acute HIV-1 infection
RNA 3'-end mismatch excision by the severe acute respiratory syndrome coronavirus nonstructural protein nsp10/nsp14 exoribonuclease complex
A SARS-CoV-2 lineage A variant (A.23.1) with altered spike
has emerged and is dominating the current Uganda epidemic
The Incorporation of Host Proteins into the External HIV-1 Envelope
Kaposi's sarcoma and Pneumocystis pneumonia among homosexual men-New York City and California
Quantification of the resilience and vulnerability of HIV-1 native glycan shield at atomistic detail
Origin and evolution of pathogenic coronaviruses
Broadly neutralizing antibodies against HIV: Back to blood
Estimated transmissibility and severity of novel SARS-CoV-2
Issues with SARS-CoV-2 sequencing data
Transmission, infectivity, and antibody neutralization of an emerging SARS-CoV-2
Evidence for host-dependent RNA editing in the transcriptome of SARS-CoV-2
Worldwide Reduction in MERS Cases and Deaths since 2016
Recurrent mutations in SARS-CoV-2 genomes isolated from mink point to rapid host-adaptation
The race is on
Detection of new SARS-CoV-2 variants related to mink
Transmission of single HIV-1 genomes and dynamics of early immune escape revealed by ultra-deep sequencing
Cooperation of B cell lineages in induction of HIV-1-broadly neutralizing antibodies
Diversity considerations in HIV-1 vaccine selection
WHO Director-General's opening remarks at the media briefing on COVID-19 -11
Recombination and low-diversity confound homoplasy-based methods to detect the effect of SARS-CoV-2 mutations on viral transmissibility
Predictive value of immunologic and virologic markers after long or short duration of HIV-1 infection
Impact of host cell line choice on glycan profile
TNT version 1.5, including a full implementation of phylogenetic morphometrics
Coronaviridae Study Group of the International Committee on Taxonomy of Viruses, 2020.
The species severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2
Recombination, reservoirs, and the modular spike: Mechanisms of coronavirus cross-species transmission
Pervasive and nonrandom recombination in near full-length HIV genomes from Uganda
AIDS as a zoonosis: scientific and public health implications
Recovery from acute SARS-CoV-2 infection and development of anamnestic immune responses in T cell-depleted rhesus macaques. bioRxiv
Multiple roles for HIV broadly neutralizing antibodies
Temporal dynamics in viral shedding and transmissibility of COVID-19
WHO-UNAIDS Network for HIV Isolation Characterisation, 2019. Global and regional molecular epidemiology of HIV-1, 1990-2015: a systematic review, global survey, and trend analysis
Intractable COVID-19 and prolonged SARS-CoV-2 replication in a chimeric antigen receptor-modified T-Cell therapy recipient: A case study
Emergence and spread of a SARS-CoV-2 variant through Europe in the summer of 2020
Evidence of the recombinant origin of a bat severe acute respiratory syndrome (SARS)-like coronavirus and its implications on the direct ancestor of SARS coronavirus
SARS-CoV-2 D614G variant exhibits efficient replication ex vivo and transmission in vivo
Impact of clade, geography, and age of the epidemic on HIV-1 neutralization by antibodies
HIV-1 reverse transcription. Cold Spring Harb Perspect Med 2
VMD: visual molecular dynamics
Kaposi's sarcoma in homosexual men-a report of eight cases
Recurrent emergence and transmission of a SARS-CoV-2 spike deletion H69/V70.
bioRxiv
SARS-CoV-2 evolution during treatment of chronic infection
Frequent intra-subtype recombination among HIV-1 circulating in Tanzania
T cell-based strategies for HIV-1 vaccines
Polyvalent vaccine approaches to combat HIV-1 diversity
Timing the ancestor of the HIV-1 pandemic strains
RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference
Variation in intraindividual lentiviral evolution rates: a systematic review of human, nonhuman primate, and felid species
Crystal structure, conformational fixation and entry-related interactions of mature ligand-free HIV-1 Env
Working paper on SARS-CoV-2 spike mutations arising in Danish mink, their spread to humans and neutralization data
The impact of mutations in SARS-CoV-2 spike on viral infectivity and antigenicity
Emergence of SARS-CoV-2 through recombination and strong purifying selection
Eastern chimpanzees, but not bonobos, represent a simian immunodeficiency virus reservoir
Vertical T cell immunodominance and epitope entropy determine HIV-1 escape
Early transmissibility assessment of the N501Y mutant strains of SARS-CoV-2 in the United Kingdom
APOBEC proteins and intrinsic resistance to HIV-1 infection
The SARS-CoV-2 spike variant D614G favors an open conformational state
Lower in vivo mutation rate of human immunodeficiency virus type 1 than that predicted from the fidelity of purified reverse transcriptase
N-terminal domain antigenic mapping reveals a site of vulnerability for SARS-CoV-2 immune evasion by variant B.1.427/B.1.429.
bioRxiv
Recurrent deletions in the SARS-CoV-2 spike glycoprotein drive antibody escape
Correlates of protection against SARS-CoV-2 in rhesus macaques
Transmission of SARS-CoV-2: A review of viral, host, and environmental factors
Decades of basic research paved the way for today's 'warp speed' Covid-19 vaccines
High recombination potential of subtype A HIV-1
Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2): An update
Human immunodeficiency virus genetic variation that can escape cytotoxic T cell recognition
Spike mutation D614G alters SARS-CoV-2 fitness
Features of recently transmitted HIV-1 clade C viruses that impact antibody recognition: Implications for active and passive immunization
Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
The N501Y mutation in SARS-CoV-2 spike leads to morbidity in obese and aged mice and is neutralized by convalescent and post-vaccination human sera
CD8+ T cell responses in COVID-19 convalescent individuals target conserved epitopes from multiple prominent SARS-CoV-2 circulating variants
COVID-19 vaccines: implementation, limitations and opportunities
HIV-1 nomenclature proposal
Coronavirus RNA proofreading: Molecular basis and therapeutic targeting
A structural view of SARS-CoV-2 RNA replication machinery: RNA synthesis, proofreading and final capping
Co-circulation of three camel coronavirus species and recombination of MERS-CoVs in Saudi Arabia
Deducing the N- and O-glycosylation profile of the spike protein of novel coronavirus SARS-CoV-2
Origins of HIV and the AIDS pandemic
SARS-CoV-2 variant B.1.1.7 is susceptible to neutralizing antibodies elicited by ancestral spike vaccines
Pervasive genomic recombination of HIV-1 in vivo
Discovery of O-linked carbohydrate on HIV-1 Envelope and its role in shielding against one category of broadly neutralizing antibodies
Tracking HIV-1 recombination to resolve its contribution to HIV-1 evolution in natural infection
Vaccines and broadly neutralizing antibodies for HIV-1 prevention
Epidemiology, genetic recombination
PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates
Phylogenetic inference
Comprehensive analysis of T cell immunodominance and immunoprevalence of SARS-CoV-2 epitopes in COVID-19 cases
Residual colours: a proposal for aminochromography
Effect of glycosylation on an immunodominant region in the V1V2 variable domain of the HIV-1 Envelope gp120 protein
Introduction of Brazilian SARS-CoV-2 484K.V2 related variants into the UK
Lead SARS-CoV-2 candidate vaccines: Expectations from phase III trials and recommendations post-vaccine approval
The increasing genetic diversity of HIV-1 in the UK
Rapid detection of interclade recombination in SARS-CoV-2 with Bolotie
Transmission of SARS-CoV-2 lineage B.1.1.7 in England: Insights from linking epidemiological and genetic data.
medRxiv
Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein
Glycan shield and epitope masking of a coronavirus spike protein observed by cryo-electron microscopy
Cryo-EM structure of a CD4-bound open HIV-1 envelope trimer reveals structural rearrangements of the gp120 V1V2 loop
Site-specific glycan analysis of the SARS-CoV-2 spike
Vulnerabilities in coronavirus glycan shields despite extensive glycosylation
Exploitation of glycosylation in enveloped virus pathobiology
Antibody neutralization and escape by HIV-1
Coronavirus genomes carry the signatures of their habitats
D614G Spike mutation increases SARS-CoV-2 susceptibility to neutralization
SARS-CoV-2 lineage B.1.526 emerging in the New York region detected by software utility created to query the spike mutational landscape
Structures and mechanisms of viral membrane fusion proteins: multiple variations on a common theme
SARS-CoV-2 501Y.V2 escapes neutralization by South African COVID-19 donor plasma
HIV evolution in early infection: selection pressures, patterns of insertion and deletion, and the impact of APOBEC
Direct evidence of extensive diversity of HIV-1 in Kinshasa by 1960
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
SARS-CoV-2 and bat RaTG13 spike glycoprotein structures inform on virus evolution and furin-cleavage effects
Major antigenic site B of human influenza H3N2 viruses has an evolving local fitness landscape
The role of recombination in the emergence of a complex and dynamic HIV epidemic
An African HIV-1 sequence from 1959 and implications for the origin of the epidemic
Network for Genomic Surveillance in South Africa (NGS-SA) author list Sibongile Walaza 9
Both, however, are also frequently sampled in other contexts: ∆69/70 was found an additional 10,168 times, and ∆144 an additional 1,513 times. Focused regions of rare but recurring indels are highlighted here, and details are provided in Figures S1 and S2.
The different regions in Spike are highlighted and include: the signal peptide (SP), the N-terminal domain (NTD), the receptor binding domain (RBD) and motif (RBM), subdomains 1 and 2 (SD1 and SD2), the fusion peptide (FP), heptad repeats 1 and 2 (HR1 and HR2), the central helix (CH), the connecting domain (CD), and the transmembrane region (TM). (C) A parsimony tree based on the cov
In the (lower) close-up view of NTD, the positions of the most common deletions (69/70, 144, and 242-244) are depicted as red beads. Residues shown in light blue are loops N1 (14-26), N3 (141-156), and N5 (246-260), which define the supersite for NTD-binding neutralizing antibodies. (E) The position of the 6-nucleotide, 3-codon deletion at SARS-CoV-2 genome positions 21,766-21,771 that causes most instances of the Spike ∆69/70 2-amino-acid deletion. Note that the third position of the original isoleucine codon
Acknowledgments. This work was supported by LANL LDRD project 20200554ECR and by a LANL project to characterize COVID diversity provided by Los Alamos National Laboratory Technology Innovation funds, as well as through the NIH NIAID, DHHS Interagency Agreement R-00441015-0/AAI12007. We thank the staff at GISAID for kindly supporting our efforts at cov.lanl.gov, and the many groups throughout the world that provide the global SARS-CoV-2 viral sequence data. Special thanks to Duncan McBranch and Joseph "Pat" Fitch for their leadership of the COVID pandemic response at Los Alamos National Laboratory.
Declaration of Interests. B.K., W.F., J.T., T.B., and S.G. have provisional patents and patents relating to vaccine design to address viral diversity as applied to HIV-1 and/or SARS-CoV-2.
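Worked example for the panel (E) legend. The legend states that a 6-nucleotide deletion at genome positions 21,766-21,771 produces the 2-amino-acid Spike ∆69/70 deletion. The following is a minimal sketch of the coordinate arithmetic behind that statement, not the authors' own method. It assumes that the Spike coding region begins at genome position 21,563 of the Wuhan-Hu-1 reference (NC_045512.2), a value that is not given in this text; the function names `spike_codon` and `deleted_bases_per_codon` are hypothetical helpers introduced only for illustration.

```python
from typing import Dict, Tuple

# Assumption (not stated in the text): the Spike coding region starts at
# genome position 21563 in the Wuhan-Hu-1 reference, NC_045512.2.
SPIKE_START = 21563


def spike_codon(genome_pos: int) -> Tuple[int, int]:
    """Map a 1-based genome position to (Spike codon number, position in codon)."""
    offset = genome_pos - SPIKE_START
    if offset < 0:
        raise ValueError("position lies upstream of the assumed Spike start")
    return offset // 3 + 1, offset % 3 + 1


def deleted_bases_per_codon(first: int, last: int) -> Dict[int, int]:
    """Count how many bases of each Spike codon fall inside a deletion."""
    counts: Dict[int, int] = {}
    for pos in range(first, last + 1):
        codon, _ = spike_codon(pos)
        counts[codon] = counts.get(codon, 0) + 1
    return counts


if __name__ == "__main__":
    # Deletion coordinates quoted in the panel (E) legend.
    for pos in range(21766, 21772):
        print(pos, spike_codon(pos))
    print(deleted_bases_per_codon(21766, 21771))
```

Under the stated assumption, the deletion removes the third base of codon 68, all of codon 69, and the first two bases of codon 70. Six bases are lost across three codons, so the reading frame is preserved, and the first two bases of codon 68 are left to pair with the third base of codon 70. This is consistent with the legend's description of a 6-nucleotide, 3-codon deletion that yields a 2-amino-acid loss.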