key: cord-0280855-ecx5cqqi authors: Colson, P.; Fournier, P.-E.; Chaudet, H.; Delerce, J.; GIRAUD-GATINEAU, A.; HOUHAMDI, L.; ANDRIEU, C.; BRECHARD, L.; BEDOTTO, M.; PRUDENT, E.; GAZIN, C.; BEYE, M.; BUREL, E.; DUDOUET, P.; TISSOT-DUPONT, H.; GAUTRET, P.; LAGIER, J.-C.; MILLION, M.; BROUQUI, P.; Parola, P.; Drancourt, M.; LA SCOLA, B.; LEVASSEUR, A.; Raoult, D. title: Analysis of SARS-CoV-2 variants from 24,181 patients exemplifies the role of globalisation and zoonosis in pandemics date: 2021-09-12 journal: nan DOI: 10.1101/2021.09.10.21262922 sha: 295c78b3e93d4b5bd5f46de6887278bec5aac0a8 doc_id: 280855 cord_uid: ecx5cqqi

After the end of the first epidemic episode of SARS-CoV-2 infections, as cases began to rise again during the summer of 2020, we at IHU Mediterranee Infection in Marseille, France, intensified the genomic surveillance of SARS-CoV-2, and described the first viral variants. In this study, we compared the incidence curves of SARS-CoV-2-associated deaths in different countries and reported the classification of SARS-CoV-2 variants detected in our institute, as well as the kinetics and sources of the infections. We used mortality data collected from a COVID-19 data repository for 221 countries. Viral variants were defined based on ≥5 hallmark mutations shared by ≥30 genomes. SARS-CoV-2 genotype was determined for 24,181 patients using next-generation genome and gene sequencing (in 47% and 11% of cases, respectively) or variant-specific qPCR (in 42% of cases). Sixteen variants were identified by analysing viral genomes from 9,788 SARS-CoV-2-diagnosed patients. Our data show that since the first SARS-CoV-2 epidemic episode in Marseille, importation through travel from abroad was documented for seven of the new variants. In addition, for the B.1.160 variant of Pangolin classification (a.k.a. Marseille-4), we suspect transmission from mink farms.
In conclusion, we observed that the successive epidemic peaks of SARS-CoV-2 infections are not linked to rebounds of viral genotypes that are already present but to newly introduced variants. We thus suggest that border control is the best means of combating this type of introduction, and that intensive control of mink farms is also necessary to prevent the emergence of new variants generated in this animal reservoir.

worldwide and 5.9 million cases in France by 31 July 2021 (https://www.ecdc.europa.eu/en/geographical-distribution-2019-ncov-cases)). In addition, there are animal reservoirs, a major one being mink (8, 9). SARS-CoV-2 is, therefore, currently a rapidly evolving virus with a mutation rate estimated to be 9.8 × 10⁻⁴ substitutions/site/year (10) and characterised by a high rate of lineage turnover (11). Several SARS-CoV-2 specific classification and naming systems including those of GISAID (https://www.gisaid.org/ (12)), Nextstrain (https://clades.nextstrain.org/ (13)), Pangolin (https://cov-lineages.org/pangolin.html (11)

(34.1±4.2). Similar trends were observed for the number of amino acid changes in the spike protein (Fig. S1). Overall, when analysing the pairwise genetic distance between genomes obtained over time, two major periods with increased viral genetic diversity compared to previous months were identified, during the summer of 2020 compared to before June 2020, mean±standard deviation of pairwise genetic distances between genomes being 8.28x10 -

There are no universally recognised or used strategies and criteria to classify viruses below the species level, and only biological classifications and nomenclatures have been proposed (20, 21) (Box 1).
Classification and naming systems have been specifically proposed for SARS-CoV-2 including those of GISAID (https://www.gisaid.org/) (12), Nextstrain (https://clades.nextstrain.org/) (13), Pangolin (https://cov-lineages.org/pangolin.html) (11), and the WHO (https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/) (Box 2). However, we chose to implement our own classification system and nomenclature and to use names rather than only numbers. This approach fitted the viral genomic epidemiology in our geographical area, and was more understandable and easier to use, which is the very purpose of classification and nomenclature. Our strategy consisted of detecting all mutations in all genomes obtained from patients diagnosed in our institute and of assigning a dynamic nomenclature to track SARS-CoV-2 genetic patterns.

A phylogenetic tree was built that included all genomic sequences of SARS-CoV-2 obtained in our institute (Fig. 4A). Based on tree topology and to distinguish them from SARS-CoV-2 mutants, we delineated SARS-CoV-2 variants as groups of viral genomes carrying a set of at least five mutations differentiating them from any other viral genomes and obtained from at least 30 (initially 10) different patients. Genomes were, therefore, unambiguously assigned to a given variant based on the presence of particular sets of mutations (Table 1). This differentiated variants from mutants, the genomes of which do not harbour this minimal set of hallmark mutations. We first delineated seven Marseille variants of SARS-CoV-2 as early as 7 September 2020, two of which did not eventually reach the threshold number of 30 genomes (2). This was the first description of the emergence and expansion of several SARS-CoV-2 variants responsible for distinct epidemics in the same geographical area. Three additional variants were then defined in 2020 (Fig.
5) and four in 2021, making a total of 12 (Table 1; Figs. 4A and 4B). In addition, we delineated 9 other viral lineages comprising genomes harbouring a specific set of at least five mutations but obtained from <30 different patients; some of these lineages may grow and become bona fide variants as additional members are obtained. Besides, we detected four other SARS-CoV-2 variants that were defined as variants of concern (VOC; Box 2) and were named Alpha, Beta, Gamma and Delta by the WHO (https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/). Regarding the nomenclature of the variants we defined, we used our city name, Marseille, followed by a number according to the chronology of their description (e.g. Marseille-1, Marseille-2, …) or one of their hallmark amino acid substitutions (e.g. Marseille-501). Hereafter, we will use both the Marseille and Pangolin nomenclatures to describe the genotypes of the SARS-CoV-2 genomes obtained in our institute.

medRxiv preprint doi: https://doi.org/10.1101/2021.09.10.21262922; this version posted September 12, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Overall prevalence of the Marseille SARS-CoV-2 variants

SARS-CoV-2 genotyping was performed and successful for 24,181 (44%) of the 54,703 SARS-CoV-2-positive patients diagnosed in our institute as of 18 August 2021 (Table 2). SARS-CoV-2 genotyping was performed by genome sequencing in our institute for at least 12% of positive diagnoses of SARS-CoV-2 infection monthly, and for at least one fifth and one third of positive diagnoses monthly for 14 and 7 of the 18 months from February 2020 to July 2021, respectively (Table S2).
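The classification rule stated above (a set of at least five hallmark mutations, found in genomes from at least 30 different patients) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `classify_groups`, the input layout and the mutation labels are hypothetical, and the candidate hallmark sets are simply given as input, whereas the study derived them from the phylogenetic tree. Groups that meet the mutation threshold but not the patient threshold are labelled "lineage", and groups without the minimal hallmark set are labelled "mutant", following the wording of the text.

```python
from collections import defaultdict

MIN_HALLMARK_MUTATIONS = 5  # threshold stated in the text
MIN_PATIENTS = 30           # threshold stated in the text (initially 10)


def classify_groups(genomes, hallmark_sets):
    """Label each candidate group as 'variant', 'lineage' or 'mutant'.

    genomes: iterable of (patient_id, set of mutation strings).
    hallmark_sets: dict mapping a group name to its set of hallmark mutations
                   (hypothetical input; assumed to be known in advance).
    """
    patients = defaultdict(set)
    for patient_id, mutations in genomes:
        for name, hallmarks in hallmark_sets.items():
            # A genome joins a group only if it carries every hallmark mutation.
            if hallmarks <= mutations:
                patients[name].add(patient_id)

    labels = {}
    for name, hallmarks in hallmark_sets.items():
        if len(hallmarks) < MIN_HALLMARK_MUTATIONS:
            labels[name] = "mutant"    # no minimal set of hallmark mutations
        elif len(patients[name]) >= MIN_PATIENTS:
            labels[name] = "variant"   # both thresholds reached
        else:
            labels[name] = "lineage"   # enough mutations, too few patients
    return labels


# Toy data with made-up mutation labels (m1, m2, ...), not real hallmark sets.
hallmark_sets = {
    "group-A": {"m1", "m2", "m3", "m4", "m5"},
    "group-B": {"m6", "m7", "m8", "m9", "m10"},
    "group-C": {"m11", "m12"},
}
genomes = (
    [(f"A{i}", {"m1", "m2", "m3", "m4", "m5", "x"}) for i in range(30)]
    + [(f"B{i}", {"m6", "m7", "m8", "m9", "m10"}) for i in range(12)]
    + [(f"C{i}", {"m11", "m12"}) for i in range(40)]
)
labels = classify_groups(genomes, hallmark_sets)
print(labels)
```

Counting distinct patients rather than genomes mirrors the requirement that genomes come from "different patients"; a group labelled "lineage" moves to "variant" as soon as enough additional patients are observed.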
Genotype was obtained primarily by whole genome next-generation sequencing in 11,387 (47%) cases and by partial genome next-generation sequencing (spike fragment) in 2,621 (11%) cases, and retrospectively or prospectively (since January 2021) by in-house variant-specific qPCR assays gradually designed and implemented subsequent to the detection of new variants (17, 18)

Based on our definition of variants, we were able to identify several since the summer of 2020, which we would currently call variants of concern (VOC) (Box 2). After having defined and named the Marseille variants, we attempted to characterise their dynamics and to analyse their epidemiological features. For some of them, we were able to find their source. During the onset of the SARS-CoV-2 epidemic in late February 2020, we observed two distinct SARS-CoV-2 genotypes that did not make it possible to define variants based on our classification system. They corresponded to Nextstrain clades 19A and 20B that briefly co-circulated in our geographical area during the first period of the SARS-CoV-2 epidemic (

We named the first variant we defined Marseille-1 (corresponds to Pangolin clade B.1.416), which emerged in week 27 of 2020 and predominated in July 2020 before rapidly disappearing at the end of August 2020 (3) (Table S3;

from February 2021. The first cases were part of a family cluster during the Christmas holidays after some family members came from England by train. We also had smaller epidemics with the Beta (South African) and the Gamma (Brazilian) variants from January 2021. Regarding the Gamma variant, we first detected it among people returning from the
Overall, we identified several SARS-CoV-2 variants that were responsible as early as the summer of 2020 for distinct epidemics causing several waves of SARS-CoV-2 incidence (Figs. 6A, 6B, and 7) (22, 23). The epidemics caused by these different variants each exhibited a bell-shaped curve, and they occurred successively or concomitantly, therefore contributing to the total case load. We documented that new SARS-CoV-2 variants were introduced in our geographical area by boat in one case (Marseille-1), by train in one case

Nigeria through air travel is suspected. Importation from abroad through travel therefore accounted for a substantial source of variants and, subsequently, of cases. The other source identified for the emergence of new variants is the mink epidemics in Northern France, with the case of Marseille-4 (4).

In this study, we wanted to clarify the classification and naming of SARS-CoV-2 genotypes we implemented in our institute in order to have a simple identification of SARS-CoV-2 variants and to characterise their mutations, their source, and spread. This is essential to get a better understanding of SARS-CoV-2 epidemiology. Since the SARS-CoV-2 emergence in France, we have performed in our institute extensive surveillance, identification and monitoring of SARS-CoV-2 variants. Currently, the viral genotype has been determined for 44% of the patients diagnosed in our institute and the viral genome was obtained in approximately one fifth of patients diagnosed with SARS-CoV-2.
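The proportions quoted in this study can be cross-checked from the reported counts. The short Python sketch below uses only figures given in the text (54,703 positive patients, 24,181 genotyped, 11,387 whole genomes, 2,621 spike fragments); the qPCR count is not stated and is derived here as the remainder, which is an assumption, and the variable names are illustrative.

```python
# Counts stated in the text.
positive_patients = 54_703
genotyped = 24_181
whole_genome_ngs = 11_387
spike_fragment_ngs = 2_621

# Derived as the remainder; an assumption, not a figure reported in the text.
variant_qpcr = genotyped - whole_genome_ngs - spike_fragment_ngs


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to the nearest integer."""
    return round(100 * part / whole)


summary = {
    "genotyped among positive patients": percent(genotyped, positive_patients),
    "whole genome among genotyped": percent(whole_genome_ngs, genotyped),
    "spike fragment among genotyped": percent(spike_fragment_ngs, genotyped),
    "qPCR among genotyped": percent(variant_qpcr, genotyped),
    "whole genome among positive patients": percent(whole_genome_ngs, positive_patients),
}

for label, value in summary.items():
    print(f"{label}: {value}%")
```

Under these assumptions the first four values reproduce the 44%, 47%, 11% and 42% quoted in the text, and the last one comes to about 21%, in line with "approximately one fifth".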
As early as the summer of 2020, we increased our surveillance of SARS-CoV-2 genotypes which allowed us to decipher

First, it appears relatively clear that in countries which have been isolated, either geographically (islands that have closed their access) or politically (China and Korea), there has been only one major epidemic episode with a bell-shaped incidence curve. In Europe and the United States, where such a policy has not been implemented or has been made impossible by political circumstances, several epidemics have occurred successively or concurrently with different viral variants. Marseille, which is a very particular city due to its geographical location, has for centuries been at the forefront of the importation of epidemics, including plague and cholera (28), and operated in an identical way during the SARS-CoV-2 pandemic. Thus, we observed that the first diagnosed SARS-CoV-2 infections were imported cases involving people who had stayed in northwest Italy and at the French/Italian border. This preceded a first epidemic phase during which international borders were closed. Their reopening in early summer 2020 was followed by a short epidemic. This was linked to a SARS-CoV-2 variant that we named Marseille-1, which was imported from North Africa by boat and originated from sub-Saharan Africa, and was probably not very transmissible, since this epidemic ended two months after its emergence and did not spread beyond the Marseille

travellers.
This explains the importation of infectious agents of various origins, which has been recognised in Marseille since antiquity (28). The considerable role of travel in the introduction and subsequent spread of new SARS-CoV-2 variants on the national or continental scales has also been reported in a recent phylogeographical study that analysed genomic, epidemiological, and mobility data (including ours) collected from ten European countries between January and October 2020 (30). This study found that the intensity of international travel predicted the spread of SARS-CoV-2 with the introduction of new viral variants, and it estimated that more than half of the lineages that were spreading in late summer 2020 had been newly introduced to countries since mid-June 2020. Such importations of SARS-CoV-2 variants into countries and their impact on the pattern of incidence of SARS-CoV-2 cases have also been reported in other studies (31). This clearly confirms what we described as early as the beginning of September 2020 (2) with the emergence of seven new SARS-CoV-2 variants including one (Marseille-1) for which we had already found a source (3). Thus, we were the first to consider and highlight that the renewed increase of SARS-CoV-2 incidence observed during the summer of 2020 in Marseille and in France may have resulted from the importation of new variants rather than from a rebound of previous viral genotypes (2, 3).
This aligns with the fact that countries for which cross-border travel was the most limited geographically and/or politically by closing the borders experienced a single epidemic peak, whereas countries with high levels of international travel experienced several epidemic peaks and longer periods with a significant incidence of infections. Hence, these findings indicate that while local control measures such as lockdowns and curfews may limit viral spread, they are ineffective at preventing the introduction of new viral genotypes. The role of local lockdowns in controlling the viral spread has been controversial. This measure has been found to be irrelevant by some researchers (32). In Japan, lockdown was not associated with a reduction of SARS-CoV-2 incidence, whereas travel restrictions were (33).

Second, our findings indicate that the role of epizootics in dense animal herds has been overlooked in the generation of variants, their transmission to humans, and their contribution

(when sequence reads were generated on the MiSeq Illumina instrument) were used. Detection of mutations was performed using the freebayes tool (https://github.com/freebayes/freebayes) (45) with a mapping quality score of 20. SAMtools was used for soft clipping of Artic primers (https://artic.network/), and to remove PCR duplicates (46). Sequences described in the present study have been deposited on the GISAID sequence database (https://www.gisaid.org/) (12) and can be retrieved online using the GISAID online search tool with "Marseille" as keyword, then selecting sequence names
containing "IHU" or "MEPHI". In addition, they have been deposited on the IHU Marseille Infection website: https://www.mediterranee-infection.com/sequences-genomiques-sars-cov-2-completes-partielles-sequences-spike-protein-jusquen-mai-2021/.

Numbers of nucleotide changes in the SARS-CoV-2 genomes and of amino acid changes in the SARS-CoV-2 spike protein were obtained using the Nextclade tool (https://clades.nextstrain.org/results). They were plotted using the GraphPad software v5.01 (https://www.graphpad.com/). Pairwise nucleotide distances between SARS-CoV-2 genomes were computed using the MEGA7 software v10.2.5 (https://www.megasoftware.net/).

For specimens with Ct values > 30 or those with Ct values < 30 but from which genome sequences were not obtained, we identified those harbouring specific Marseille genotypes using qPCR targeting variant-specific regions. We previously described qPCR specific to the

Patterns of incidence curves of SARS-CoV-2-associated deaths are shown in 10 panels numbered from 1 to 10; these numbers correspond to those from the cladogram of Figure 1A. The example of one country is shown in a panel at the right of each of the ten patterns.
(C) Worldwide distribution of the ten major patterns of incidence curves of SARS-CoV-2-associated deaths defined based on hierarchical clustering. Colours are as used for Figure 1A.

WHO Declares COVID-19 a Pandemic
Dramatic increase in the SARS-CoV-2 mutation rate and low mortality rate during the second epidemic in summer in Marseille
Introduction into the Marseille geographical area of a mild SARS-CoV-2 variant originating from sub-Saharan Africa: An investigational study
Emergence and outcomes of the SARS-CoV-2 'Marseille-4' variant
Classification: purposes, principles, progress, prospects
Viral quasispecies
On the nature of virus quasispecies
Transmission of SARS-CoV-2 on mink farms between humans and mink and back to humans
SARS-CoV-2, and the Human-Animal Interface
No evidence for increased transmissibility from recurrent mutations in SARS-CoV-2
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
Nextstrain: real-time tracking of pathogen evolution
The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak
Ultrarapid diagnosis, microscope imaging, genome sequencing, and culture isolation of SARS-CoV-2.
Genomic diversity and evolution of coronavirus (SARS-CoV-2) in
Implementation of an in-house real-time reverse transcription-PCR assay to detect the emerging SARS-CoV-2 N501Y variants
Implementation of an in-house real-time reverse transcription-PCR assay for the rapid detection of the SARS-CoV-2 Marseille-4 variant
Spread of a SARS-CoV-2 variant through Europe in the summer of 2020
Virus Taxonomy: The ICTV Report on Virus Classification and Taxon Nomenclature. The Online (10th) Report of the International Committee on Taxonomy of Viruses
Geminivirus strain demarcation and nomenclature
Introduction of SARS-CoV-2 mutants and variants in the Marseille geographical area through travel from abroad
Incidence rate of SARS-CoV-2 infections by French department per sliding week
Spreading of a new SARS-CoV-2 N501Y spike variant in a new lineage
Limited spread of a rare spike E484K-harboring SARS-CoV-2 in
New Pathways of Mutational Change in SARS-CoV-2 Proteomes Involve Regions of Intrinsic Disorder Important for Virus Replication and Release
Two-millennia fighting against port-imported epidemics
SARS-CoV-2 variant from India to Marseille: the still active role of ports in the introduction of epidemics
Untangling introductions and persistence in COVID-19 resurgence in
Early and ongoing importations of SARS-CoV-2 in Canada
Should governments continue lockdown to slow the spread of covid-19?
A Cross-Country Analysis of the Determinants of Covid-19
SARS-CoV2 spike protein gene variants with N501T and G142D mutation-dominated infections in mink in the United States
Clinical outcomes in COVID-19 patients infected with different SARS CoV-2 variants in Marseille, France
Not all COVID-19 pandemic waves are alike
SARS-CoV-2 variants, spike mutations and immune escape
Confronting the Delta Variant of SARS-CoV-2, Summer 2021
An interactive web-based dashboard to track COVID-19 in real time
Clustering of time series data-a survey
A review on time series data mining
Similarity Measure Selection for Clustering Time Series Databases
Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R
Minimap2: pairwise alignment for nucleotide sequences
Haplotype-based variant detection from short-read sequencing
The Sequence Alignment/Map format and SAMtools
Recently agreed changes to the International Code of Virus Classification and Nomenclature
Revising the way we conceive and name viruses below the species level: a review of geminivirus taxonomy calls for new standardized isolate descriptors
Virus species and virus identification: past and current controversies
In vitro site-directed mutagenesis: generation and properties of an infectious extracistronic mutant of bacteriophage Qbeta
Molecular self-organization and the early stages of evolution
Viral quasispecies
Darwinian evolution in the light of genomics
The causes and consequences of HIV evolution
Virus nomenclature below the species level: a standardized nomenclature for laboratory animal-adapted strains and variants of viruses assigned to the family Filoviridae

Three-dimensional plot of weekly proportions accounted for by each SARS-CoV-2 mutant and variant among SARS-CoV-2-diagnosed patients

Weekly incidence of each SARS-CoV-2 mutant and variant extrapolated to the total number of cases, based on their
proportions of genotyped cases

V3 84 21D B.1.525 Eta C1498T 17 20A B.1.416.1 - C28833T, C2706T, C25731T V2-20B 9 20B B.1.1.318 - C3961T, C9072T, C9891T Marseille-681R-19B 7 19B A.23.1 - C9430T, A15477T, C18395T, A20622T, G20623T, A20624T, C23730T, A26319G, C28854T
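The extrapolation named in the figure legend above (weekly incidence of each mutant or variant, extrapolated to the total number of cases from its proportion among genotyped cases) amounts to a simple proportional scaling. The Python sketch below is one plausible reading of that legend, not the authors' code; the function name and the weekly counts are hypothetical.

```python
def extrapolate_weekly_incidence(total_cases, genotyped_counts):
    """Scale genotyped counts for one week up to the total number of cases.

    total_cases: total number of diagnosed cases in that week.
    genotyped_counts: dict mapping a variant name to its number of genotyped cases.
    Returns a dict mapping each variant to its estimated number of cases.
    """
    n_genotyped = sum(genotyped_counts.values())
    if n_genotyped == 0:
        return {}  # no genotyped case, so nothing can be extrapolated
    return {
        variant: total_cases * count / n_genotyped
        for variant, count in genotyped_counts.items()
    }


# Hypothetical week: 1,000 diagnosed cases, 200 of them genotyped.
week = {"Marseille-4": 120, "Alpha": 60, "other": 20}
estimates = extrapolate_weekly_incidence(1000, week)
print(estimates)
```

The estimate is only as reliable as the assumption that genotyped cases are representative of all cases diagnosed in the same week.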