key: cord-0317132-r7ygjhfs authors: Colson, P.; Gautret, P.; Delerce, J.; Chaudet, H.; Pontarotti, P.; Forterre, P.; Tola, R.; Bedotto, M.; Delorme, L.; Levasseur, A.; Lagier, J.-C.; Million, M.; Yahi, N.; Fantini, J.; La Scola, B.; Fournier, P.-E.; Raoult, D. title: The emergence, spread and vanishing of a French SARS-CoV-2 variant exemplifies the fate of RNA virus epidemics and obeys the Black Queen rule date: 2022-01-08 journal: nan DOI: 10.1101/2022.01.04.22268715 sha: 29f87685d89210f8371e824518bdce054b52a1bc doc_id: 317132 cord_uid: r7ygjhfs

The nature and dynamics of mutations associated with the emergence, spread and vanishing of SARS-CoV-2 variants causing successive waves are complex. We determined the kinetics of the most common French variant (Marseille-4) over the 10 months following its onset in July 2020. We analysed 7,453 genomes obtained by next-generation sequencing and classified them into subvariants and lineages. We identified two subvariants, Marseille-4A, which contains 22 different lineages of at least 50 genomes, and Marseille-4B. Their average lifetime was 4.1 ± 1.4 months, during which 4.1 ± 2.6 mutations accumulated. Growth rate was 0.079 ± 0.045, varying from 0.010 to 0.173. All the lineages exhibited a gamma distribution. Several beneficial mutations at unpredicted sites initiated a new outbreak, while the accumulation of other mutations resulted in more viral heterogeneity, increased diversity and vanishing of the lineages. Marseille-4B emerged when the other Marseille-4 lineages vanished. Its ORF8 gene was knocked out by a stop codon, as reported in several mink lineages and in the alpha variant. This subvariant was associated with increased hospitalization and death rates, suggesting that ORF8 is a nonvirulence gene. We speculate that the observed heterogeneity of a lineage may predict the end of the outbreak.

The shape of epidemic curves of acute infectious diseases is the subject of several hypotheses and interpretations.
The occurrence of successive waves of SARS-CoV-2 infections during the current pandemic was linked to the emergence of viral variants [1-5], while possible causes of

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. This version posted January 8, 2022.

leads to the accumulation of many random mutations, most presumably mildly deleterious, with little effect on fitness. Only 10⁻⁸ may be associated with a fitness gain [15, 18, 19]. Overall, the RNA virus fitness evolution includes an initial period of rapid multiplication, possibly caused by a positive mutation, followed by the decline of viral fitness caused by accumulation of unfit mutations, as described for the vesicular stomatitis virus [15, 20].

We observed heterogeneity of the growth rates for the different Marseille-4 subvariants and lineages, making it challenging to generalize the behaviour of one SARS-CoV-2 subvariant or lineage to all of them. All Marseille-4 lineages present a genetic signature with mutations sometimes associated with the inactivation of ORF7a or ORF7b, as described [21]. None of these mutations, apart from those located in the spike gene, were predicted to be possibly associated with increased transmissibility. Knock-out of the ORF8 and ORF7a/b genes shows that
SARS-CoV-2 virulence may be associated with a loss of genes, as shown for bacteria, in which the decline in genomic content is often associated with an increased specificity and virulence [22].
Nextclade tool (https://clades.nextstrain.org/) and freebayes (https://github.com/freebayes/freebayes) [6] using a mapping quality score of 20 and results filtered by the Python script based on major nucleotide frequencies ≥ 70% and nucleotide depths ≥ 10 (when sequence reads were generated on the NovaSeq Illumina instrument (Illumina Inc.)) or ≥ 5 (when sequence reads were generated on the MiSeq Illumina instrument). SARS-CoV-2 genotyping was performed using a second in-house script written

A structural model of the ORF8 protein was generated from PDB file 7JTL [11]. The gaps in the crystal structure were fixed by incorporating the missing amino acids with the Robetta protein structure prediction tool [12], followed by energy minimization with the Polak-Ribière algorithm as previously reported [13]. Mutant and truncated proteins were then generated with Swiss-PdbViewer [14] and submitted to several rounds of energy minimization as described [15].

average infectious and pre-infectious periods (according to the SEIR model). Here, we set D at 9.3 [18] and d' at 3.3 [19].
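The filtering thresholds quoted above for variant calls (major nucleotide frequency ≥ 70%, and depth ≥ 10 for NovaSeq reads or ≥ 5 for MiSeq reads) can be illustrated with a minimal Python sketch. This is not the authors' in-house script, which the text does not show. The names `SiteCall`, `base_counts`, `major_base` and `passes_filter`, the per-position input layout and the example counts are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Minimum read depth per instrument, as stated in the methods.
MIN_DEPTH = {"NovaSeq": 10, "MiSeq": 5}
# Minimum frequency of the major nucleotide at a position.
MIN_MAJOR_FREQ = 0.70


@dataclass
class SiteCall:
    """Hypothetical per-position summary; field names are illustrative."""
    position: int
    base_counts: dict  # read counts per nucleotide, e.g. {"A": 1, "T": 40}


def major_base(call):
    """Return (base, frequency, depth) for the most frequent nucleotide."""
    depth = sum(call.base_counts.values())
    if depth == 0:
        return None, 0.0, 0
    base = max(call.base_counts, key=call.base_counts.get)
    return base, call.base_counts[base] / depth, depth


def passes_filter(call, instrument):
    """Apply the depth and major-frequency thresholds to one position."""
    _, freq, depth = major_base(call)
    return depth >= MIN_DEPTH[instrument] and freq >= MIN_MAJOR_FREQ


# Made-up read counts, for illustration only.
calls = [
    SiteCall(23191, {"A": 0, "C": 2, "G": 0, "T": 38}),  # depth 40, major 95%
    SiteCall(28086, {"A": 0, "C": 0, "G": 3, "T": 4}),   # depth 7, major 57%
    SiteCall(487, {"A": 6, "C": 0, "G": 0, "T": 0}),     # depth 6, major 100%
]
kept_novaseq = [c.position for c in calls if passes_filter(c, "NovaSeq")]
kept_miseq = [c.position for c in calls if passes_filter(c, "MiSeq")]
print(kept_novaseq, kept_miseq)
```

With these invented counts, the position with depth 6 is kept only under the MiSeq threshold, and the position whose major nucleotide reaches only 57% is rejected on both instruments.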
To calculate the growth rate, we used Chow's F test to determine the inflexion point of the logarithm of the cumulated number of cases, which corresponded to the end of the exponential phase. Then, we applied a linear model to this phase to obtain the regression slope (i.e. the growth rate) and its 95% confidence interval. Statistical analyses were performed using R software version 4.0.2 (https://cran.r-project.org/). All statistical conclusions were made using a 0.05 threshold.

Untangling introductions and persistence in COVID-19 resurgence in
The emergence, genomic diversity and global spread of SARS-CoV-2
Genomic reconstruction of the SARS-CoV-2 epidemic in England
Ongoing global and regional adaptive evolution of SARS-CoV-2
Analysis of SARS-CoV-2 variants from 24,181 patients exemplifies the role of globalisation and zoonosis in pandemics, Front.
Microbiol
Estimating epidemiologic dynamics from cross-sectional viral load

Emergence and outcomes of the SARS-CoV-2 'Marseille-4' variant
Analysis of SARS-CoV-2 variants from 24,181 patients exemplifies the role of globalisation and zoonosis in pandemics, Front. Microbiol
Implementation of an in-house real-time reverse transcription-PCR assay for the rapid detection of the SARS-CoV-2 Marseille-4 variant
Minimap2: pairwise alignment for nucleotide sequences
The Sequence Alignment/Map format and SAMtools
Haplotype-based variant detection from short-read sequencing
Nextclade: clade assignment, mutation calling and quality control for viral genomes
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
Data, disease and diplomacy: GISAID's innovative contribution to global health
Structure of SARS-CoV-2 ORF8, a rapidly evolving immune evasion protein
Robetta server
Structural dynamics of SARS-CoV-2 variants: A health monitoring strategy for anticipating Covid-19 outbreaks
SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling
Hybrid In Silico/In Vitro Approaches for the Identification of Functional Cholesterol-Binding Domains in Membrane Proteins
Parametric statistical change point analysis with applications to genetics, medicine, and finance
Estimating the generation interval and inferring the latent period of

Supplementary Table S1.
List of nucleotide and amino acid changes associated with the onset and expansion of Marseille-4A and Marseille-4B subvariants and lineages

Marseille-4 subvariants and lineages    Nucleotide changes (hallmark mutations)    Amino acid changes
01                                      C23191U                                    -
Marseille-4A.02                         G28086U                                    ORF8: A65S
Marseille-4A.03                         G487A                                      -
Marseille-4A.04                         C222U                                      -
Marseille-4A.05                         C3096U; C23188U                            ORF1a: S944L; -
Marseille-4A.06                         G29701A                                    -
Marseille-4A.07                         C27434U                                    ORF7a: T14I
Marseille-4A.08                         G571A                                      -
Marseille-4A.09                         G2600U                                     ORF1a: V779F
Marseille-4A.10                         G27877U                                    ORF7b: C41F
Marseille-4A