key: cord-1027391-k6nyb7el
authors: Klink, G.; Safina, K. R.; Garushyants, S. K.; Moldovan, M.; Nabieva, E.; The CoRGI Consortium; Komissarov, A. B.; Lioznov, D.; Bazykin, G. A.
title: Spread of endemic SARS-CoV-2 lineages in Russia
date: 2021-05-27
journal: nan
DOI: 10.1101/2021.05.25.21257695
sha: 5614926a12bbe71b2ca72181a0e7f45a1a82610f
doc_id: 1027391
cord_uid: k6nyb7el

In 2021, the COVID-19 pandemic is characterized by global spread of several lineages with evidence for increased transmissibility. Russia is among the countries with the highest number of confirmed COVID-19 cases, making it a potential hotspot for emergence of novel variants. Here, we show that among the globally significant variants of concern, B.1.1.7 (501Y.V1), B.1.351 (501Y.V2) and P.1 (501Y.V3), none had been sampled in Russia before January 2021. Instead, since summer 2020, the epidemic in Russia has been characterized by the spread of two lineages that are rare elsewhere: B.1.1.317 and a sublineage of B.1.1 including B.1.1.397 (hereafter, B.1.1.397+). In February-March 2021, these lineages reached frequencies of 26.9% (95% C.I.: 23.1%-31.1%) and 32.8% (95% C.I.: 28.6%-37.2%) respectively in Russia. Their frequency has increased in different parts of Russia. Together with the fact that these lineages carry several spike mutations of interest, this suggests that B.1.1.317 and B.1.1.397+ may be more transmissible than the previously predominant B.1.1, although there is no direct data on change in transmissibility. Comparison of frequency dynamics of lineages carrying subsets of characteristic mutations of B.1.1.317 and B.1.1.397+ suggests that, if indeed some of these mutations affect transmissibility, the transmission advantage of B.1.1.317 may be conferred by the (S:D138Y+S:S477N+S:A845S) combination, while the advantage of B.1.1.397+ may be conferred by the S:M153T change.
On top of these lineages, in January 2021, B.1.1.7 emerged in Russia, reaching a frequency of 17.4% (95% C.I.: 12.0%-24.4%) in March 2021. Additionally, we identify three novel distinct lineages, AT.1 and two lineages prospectively named B.1.1.v1 and B.1.1.v2, that have started to spread, together reaching a frequency of 11.8% (95% C.I.: 7.5%-18.1%) in March 2021. These lineages carry combinations of several notable mutations, including the S:E484K mutation of concern, deletions at a recurrent deletion region of the spike glycoprotein (S:Δ140-142, S:Δ144 or S:Δ136-144), and nsp6:Δ106-108 (also known as ORF1a:Δ3675-3677). Community-based PCR testing indicates that these variants have continued to spread in April 2021, with the frequency of B.1.1.7 reaching 21.7% (95% C.I.: 12.3%-35.6%), and the joint frequency of B.1.1.v1 and B.1.1.v2 reaching 15.2% (95% C.I.: 7.6%-28.2%). The combinations of mutations observed in B.1.1.317, B.1.1.397+, AT.1, B.1.1.v1 and B.1.1.v2, together with the frequency increase of these lineages, make them candidate variants of interest.

epidemiological and/or antigenic properties. In spring 2020, the S:D614G change spread globally due to its fitness advantage 1,2. Subsequently, a number of variants of concern, including B.1.1.7 (501Y.V1) first sampled in Great Britain in September, B.1.351 (501Y.V2) first sampled in South Africa in October, and P.1 (501Y.V3) first sampled in Brazil in December, were shown to be associated with increased transmissibility 3-5. These variants are characterized by overlapping sets of changes in the spike receptor-binding domain which affect ACE2 binding and antibody recognition, as well as other changes with demonstrated functional and antigenic effects. Emergence of SARS-CoV-2 variants with evidence for change in transmissibility, and possibly other properties, highlights the importance of continued surveillance of novel variants.
In particular, locally arising variants that grow in frequency over time may suggest a transmission advantage, although such an increase may also occur by chance 6.

Here, we show that the outbreak in Russia is characterized by a spread of two lineages, B.1.1.317 and B.1.1.397+, which are highly prevalent in Russia but rarely appear in non-Russian samples. We trace the accumulation of sequential mutations in the evolution of these lineages, and single out the spike mutations that are followed by a burst in frequency.

on April 5, 2020 in Moscow, spreading across the country throughout 2020 (Fig. 2)

The frequency dynamics of the derived variants at the remaining 38 positions are shown in Fig. 3. These include the mutations characterizing the B.1.1.7 variant, which has been increasing in frequency in Russia since January 2021 (Fig. 1), as well as some of the other globally spreading mutations of concern or interest, including the E484K mutation in spike. However, at many of these sites, the non-reference variants were rare outside Russia (Fig. 3). Most of these variants showed similar temporal dynamics in Moscow and St. Petersburg regions, as well as in the European and Asian parts of Russia (Fig. S1), indicating that their increase in frequency is not a result of sampling bias.

We aimed to identify the high-frequency variants carrying these mutations. Many of these sites were highly homoplasic, and overall we found the resulting phylogenies not to be robust. Instead, we defined the most frequent variants composed of these mutations, independent of the alleles at other sites (Fig. 4).

four potentially important changes in spike (Q675R+D138Y+S477N+A845S; column 1 in Fig. 4). The frequencies of such "non-canonical" combinations of mutations increased throughout 2020-2021 (Fig. 5).
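As an illustration of defining variants by the combination of derived alleles at selected sites, independent of the alleles elsewhere, the following is a minimal sketch only. It is not the study's pipeline: the function name `count_combinations`, the choice of sites and the toy samples are all hypothetical.

```python
from collections import Counter

def count_combinations(samples, sites_of_interest):
    """Count how often each combination of mutations at the chosen sites occurs.

    samples: iterable of sets of mutation labels, one set per genome.
    sites_of_interest: mutation labels used to define variants.
    Mutations outside sites_of_interest are ignored, so two genomes that
    differ only at other sites fall into the same combination.
    """
    sites = frozenset(sites_of_interest)
    counts = Counter()
    for mutations in samples:
        combo = frozenset(mutations) & sites
        counts[combo] += 1
    return counts

# Hypothetical toy data: four genomes, each given as a set of mutation labels.
SITES = {"S:Q675R", "S:D138Y", "S:S477N", "S:A845S", "N:A211V"}
toy_samples = [
    {"S:Q675R", "S:D138Y", "S:S477N", "S:A845S", "S:D614G"},
    {"S:Q675R", "S:D138Y", "S:S477N", "S:A845S"},
    {"N:A211V", "S:D614G"},
    {"S:D614G"},
]

combo_counts = count_combinations(toy_samples, SITES)
# Combinations ordered from most to least frequent.
ranked = [(sorted(combo), n) for combo, n in combo_counts.most_common()]
```

Counting combinations directly avoids relying on a phylogeny, which matters when the sites are homoplasic; the cost is that identical combinations that arose independently are pooled together.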
Finally, we observe three high-frequency combinations of mutations, including the S:E484K mutation of concern as well as other mutations of interest (notably S:Δ140-142, S:Δ136-144 and nsp6:Δ106-108, also referred to as ORF1a:Δ3675-3677; columns 7, 8 and 10 in Fig. 4). One of these combinations (column 8 in Fig. 4)

It is made available under a CC-BY-NC-ND 4.0 International license. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This version posted May 27, 2021.

alone, it demonstrates modest growth (Fig. 6A); however, its combination with S:Q675R+D138Y+S477N+A845S demonstrates a much more rapid frequency increase, with an estimated daily growth rate of 1.93% (95% CI: 1.8%-2.06%; Fig. 6B). While this leads to a frequency increase of the N:A211V mutation independent of the background (Fig. 6C), this suggests that the frequency increase is more likely driven by the S:Q675R+D138Y+S477N+A845S combination than by the N:A211V change defining the B.

combinations are still frequent many months after they all originated (Fig. 5, 7).

(Fig. 8B); as well as three novel variants carrying the following combinations of
2.8% and 2.6% respectively), and also because they are composed of known mutations of interest or concern. The daily growth rate estimated for these variants by the logistic growth model is in the range of 2.44% to 7.18% (Fig. 8C-E).

The continued spread of some of these variants between February and April 2021 is confirmed by community-based PCR testing. To obtain independent frequency estimates, we made use of a PCR system sensitive to the presence of nsp6:Δ106-108 and S:Δ69-70 deletions (see Methods) to detect the B.

(Fig. 4). While the frequency estimates were highly uncertain (

wide-spread in April (Fig. 9, Table 1). A considerable fraction (59.6%) of PCR samples from February and March were included in our main analysis, as their sequences were in GISAID. However, the frequency increase was also observed in the 136 PCR samples for which no sequencing data was available (Fig. S2), providing independent validation of the NGS results. Similarly, it was observed when the PCR tests only for St. Petersburg were analysed (Fig. S3, S4), indicating that the prevalence of these variants increases at least in this city as opposed to being an artefact of changing sampling between regions.

Notations for logistic curves are the same as in Fig. 6.

Table 1.
Frequencies of (B.1.1.v1 or B.1.1.v2)

Mutational composition of the high-frequency variants

In this section, we discuss the mutations that constitute the variants spreading in Russia.

Russian dataset (p=0.0396 for the MEME model and p=0.0268 for the FEL model, the likelihood-ratio test), as well as a rapid increase in frequency of non-reference variants in the

included in one of the four regions of the nucleocapsid protein with the highest affinity to multiple MHC-I alleles 13. Nevertheless, the frequency of the variant carrying the N:A211V mutation alone has declined since October 2020 (Fig. 5), suggesting that it is unlikely to confer transmission advantage against the background of other currently frequent variants (Fig. 6).

these other mutations (Fig. 7). This increase has been ongoing since late spring 2020 (Fig. 5

transmissibility; the variant carrying the S:P681H mutation alone; and three novel variants.

S:P681H is one of the nine spike changes that characterize the rapidly spreading B.1.1.7 lineage 3; however, it is absent from the two other lineages of concern, B.1.351 and P.1, indicating that it is not essential for increased transmissibility. The 681 position is adjacent to the furin cleavage site; this site is absent in non-human CoV, and is assumed to have contributed to pathogenicity in humans 32. Changes at this position experience both persistent and episodic positive selection 12. P681H appears to increase in frequency globally 33, although it is hard to
disentangle this increase from that of the other changes constituting the rapidly spreading B.1.1.7 lineage. We find that the frequency of this mutation in Russia in the absence of other B.1.1.7 mutations does not increase (Fig. 8), indicating that it does not increase transmissibility by itself.

The three remaining high-frequency variants with evidence for rapid frequency increase carry combinations of the following high-frequency mutations: S:P9L, S:Δ140-142 (or S:Δ136-144), S:E484K, and nsp6:Δ106-108. The sets of mutations in these variants are in conflict (i.e., not nested within each other; Fig. 4), indicating that at least some of these mutations emerged in them independently. These mutations are of interest or concern. Specifically, S:E484K (present in AT.1 and B.1.1.v2) is involved in multiple variants of concern including the B.1.351 (501Y.V2) 4, P.1 (501Y.V3) 5,34 and P.2 (S.484K) 5,34 lineages, and has been shown by several groups to cause escape from neutralizing antibodies 35-37. nsp6:Δ106-108 (also referred to as ORF1a

cases of COVID-19 and the arrival of variants of concern, notably the B.

rates estimated by our model are all below 2% (Fig. 6-8). Besides, these variants currently lack the spike changes L452R, E484K or N501Y, which occur in other VOCs 41.

The combinations of mutations seen in the three variants that emerged in 2021, AT.1, B.1.1.v1 and B.1.1.v2 (Fig. 8C-E), look more suspicious, because their estimated rate of frequency increase is higher and because they include mutations with known effects that occur in other variants of interest or concern. While little can be said about their frequency dynamics on the basis of the currently available data, they require careful monitoring.

100 nucleotides from the beginning and from the end of the alignment were trimmed.
After that, we excluded sequences (1) shorter than 29,000 bp, (2) with more than 3,000 (for Russian sequences) or 300 (for all other countries) positions of missing data (Ns), (3) excluded by Nextstrain 44, (4) from non-human animals, (5) with a genetic distance to the reference genome more than four standard deviations from the epi-week mean genetic distance to the reference, or (6) with incomplete collection dates. As our focus was on the spread of lineages in Russia, and since Russia is relatively poorly sampled, we chose a less stringent threshold at step (2) for Russian compared to non-Russian sequences in order to keep more Russian

To estimate positive selection, we employed MEME and FEL models implemented in the HyPhy package 10,11. For this analysis, we added Russian sequences with incomplete collection dates to

data was available. Notations are the same as in Fig. 6.
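The exclusion criteria (1)-(6) listed in the Methods can be pictured as a single predicate applied to each genome. The sketch below is an assumption-laden illustration rather than the authors' code: the `GenomeRecord` layout and its field names are hypothetical, the Nextstrain exclusion is assumed to be available as a precomputed flag, and the distance criterion is assumed to be supplied as a z-score relative to the epi-week mean. Only the numeric thresholds come from the text.

```python
import re
from dataclasses import dataclass

# Thresholds quoted in the exclusion criteria.
MIN_LENGTH = 29_000       # criterion (1)
MAX_N_RUSSIA = 3_000      # criterion (2), Russian sequences
MAX_N_OTHER = 300         # criterion (2), all other countries
MAX_DISTANCE_SD = 4.0     # criterion (5)

# Assumption: a complete collection date is written as YYYY-MM-DD.
FULL_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

@dataclass
class GenomeRecord:
    """Hypothetical record layout; field names are illustrative only."""
    sequence: str
    country: str
    host: str
    collection_date: str
    nextstrain_excluded: bool  # assumed precomputed flag, criterion (3)
    distance_z: float          # assumed precomputed z-score, criterion (5)

def passes_filters(rec: GenomeRecord) -> bool:
    """Return True if the record survives all six exclusion criteria."""
    max_n = MAX_N_RUSSIA if rec.country == "Russia" else MAX_N_OTHER
    return (
        len(rec.sequence) >= MIN_LENGTH
        and rec.sequence.upper().count("N") <= max_n
        and not rec.nextstrain_excluded
        and rec.host.lower() == "human"
        and abs(rec.distance_z) <= MAX_DISTANCE_SD
        and FULL_DATE.fullmatch(rec.collection_date) is not None
    )

# Toy genome with 500 missing positions: kept if Russian, dropped otherwise.
toy_sequence = "A" * 29_000 + "N" * 500
russian = GenomeRecord(toy_sequence, "Russia", "Human", "2021-03-15", False, 0.5)
foreign = GenomeRecord(toy_sequence, "Germany", "Human", "2021-03-15", False, 0.5)
kept = [passes_filters(russian), passes_filters(foreign)]
```

The toy pair shows the effect of the country-specific threshold at step (2): the same sequence passes as a Russian sample and fails as a non-Russian one.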
Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus
Evaluating the Effects of SARS-CoV-2 Spike Mutation D614G on Transmissibility and Pathogenicity
Rapid increase of a SARS-CoV-2 variant with multiple spike protein mutations observed in the United Kingdom
Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa
Genomic characterisation of an emergent SARS-CoV-2 lineage in Manaus: preliminary findings
Emergence and spread of a SARS-CoV-2 variant through Europe in the summer of 2020
Global initiative on sharing all influenza data - from vision to reality
Genomic epidemiology of the early stages of the SARS-CoV-2 outbreak in Russia
Near real-time visualization of SARS-CoV-2 (hCoV-19) genomic variation
Not So Different After All: A Comparison of Methods for Detecting Amino Acid Sites Under Selection
Detecting Individual Sites Subject to Episodic Diversifying Selection
Selection history of genes in SARS-CoV-2/COVID-19 genomes enabled by data from
Immunoinformatic Analysis CoV-2 Nucleocapsid Protein and Identification of COVID-19 Vaccine Targets
Genomics and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus
Antibody evasion by the P.1 strain of SARS-CoV-2
The antigenic anatomy of SARS-CoV-2 receptor binding domain
Evolution of Antibody Immunity to SARS-CoV-2
Landscape analysis of escape variants identifies SARS-CoV-2 spike mutations that attenuate monoclonal and serum antibody neutralization
Serine 477 plays a crucial role in the interaction of the SARS-CoV-2 spike protein with the human receptor ACE2 | Scientific Reports
Novel and Expanding SARS-CoV-2 Variant, B.1.526, Identified in New York | medRxiv
Detection and characterization of the SARS-CoV-2 lineage B.1.526 in New
The expert reported on the mutation of the coronavirus in 13 regions of the Russian Federation
Dangerous COVID-19 mutations, which Popova warned about, were found in the Urals
Evolutionary relationships and sequence-structure determinants in human SARS coronavirus-2 spike proteins for host receptor recognition
Potent SARS-CoV-2 Neutralizing Antibodies Directed Against Spike N-Terminal Domain Target a Single Supersite
A neutralizing human antibody binds to the N-terminal domain of the Spike protein of SARS-CoV-2
The SARS-CoV-2 nucleocapsid protein is dynamic, disordered, and phase separates with RNA
A new SARS-CoV-2 lineage that shares mutations with known Variants of Concern is rejected by automated sequence repository quality control
Identification and functional analysis of the SARS-COV-2 nucleocapsid protein

We are grateful to all GISAID submitting and originating labs (