key: cord-0894300-ojse10p5 authors: Korencak, Marek; Sivalingam, Sugirthan; Sahu, Anshupa; Dressen, Dietmar; Schmidt, Axel; Brand, Fabian; Krawitz, Peter; Hart, Libor; Maria Eis-Hübinger, Anna; Buness, Andreas; Streeck, Hendrik title: Reconstruction of the Origin of the First Major SARS-CoV-2 Outbreak in Germany date: 2022-05-10 journal: Comput Struct Biotechnol J DOI: 10.1016/j.csbj.2022.05.011 sha: a8317d677e5da975b828a2b748a051def8bf32e1 doc_id: 894300 cord_uid: ojse10p5

The first major COVID-19 outbreak in Germany occurred in Heinsberg in February 2020, with 388 officially reported cases. Unexpectedly, this first outbreak happened in a small town with few or no travelers. We used phylogenetic analyses to investigate the origin and spread of the virus in this outbreak. We sequenced 90 (23%) SARS-CoV-2 genomes from the 388 reported cases, including the samples from the first documented cases. Phylogenetic analyses of these sequences revealed mainly two circulating strains, with 74 samples assigned to lineage B.3 and 6 samples assigned to lineage B.1. Lineage B.3 was introduced first and probably caused the initial spread. Using phylogenetic analysis tools, we identified closely related strains in France and hypothesize a possible introduction from France.

In December 2019, China reported several fatal pneumonia cases. Shortly afterward, Zhou et al. identified the cause of those deaths: a novel coronavirus closely related to SARS-CoV, which was later named SARS-CoV-2 [1]. Since then, the virus has spread to all continents, and the World Health Organization (WHO) has declared a pandemic. While in most countries the first SARS-CoV-2 outbreaks occurred in major cities such as Milan [2], Manchester [3] or Chicago [4], or in high-traffic hubs [5-7], the first outbreak in Germany happened in Heinsberg, a small, relatively unknown town with little to no tourism [8].
After a carnival session at which a super-spreading event occurred, about 3.1% of the local population was reported to be PCR-positive [8]. However, it is still uncertain how the virus was first introduced into this town and how it was able to spread thereafter.

The virus strains circulating today evolved from the original Wuhan strain by accumulating different types of mutations. In general, RNA viruses have very high mutation rates, up to a million times higher than those of their hosts, which may correlate with enhanced virulence and other traits considered beneficial for virus replication [9]. Sequencing data suggest that coronaviruses change more slowly than most other RNA viruses, likely because of a proofreading enzyme that corrects copying mistakes [10]. At the root of the SARS-CoV-2 phylogeny are two lineages, denoted lineages A and B. The earliest lineage A virus (GISAID EPI_ISL_406801) was sampled on January 5, 2020. Two nucleotide positions distinguish these two lineages.

Known sequencing errors were masked using a custom Python script [20]. Before the downstream analyses, sequences were kept only if they were longer than 28,000 bp and had less than 0.05% missing bases. Alignment columns containing more than 50% gaps were also removed. After this stringent quality control, the maximum likelihood (ML) phylogenetic tree was reconstructed using FastTree v2.

We sequenced the viral genomes of 90 (23%) of the 388 SARS-CoV-2 cases reported in the Heinsberg district in February and March 2020. After quality control, we retained 89 samples for phylogenetic analysis. The phylogenetic tree, annotated with the Pangolin lineage assignment system, showed that the samples clustered into two groups, 1 and 2 (Figure 1A).
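The quality-control thresholds in the methods above, together with the two-position split between lineages A and B, can be sketched in Python. This is a minimal illustration, not the authors' custom script or pipeline. It assumes unaligned genomes for the length and missing-base checks, treats `N` as a missing base and `-` as an alignment gap, and, for the lineage call, assumes sequences aligned to the Wuhan-Hu-1 reference with insertions removed. The marker positions 8782 and 28144 (T and C for lineage A, C and T for lineage B) come from the Pango nomenclature proposal, not from this text; all function names are hypothetical.

```python
from collections.abc import Sequence

# Thresholds as stated in the text.
MIN_LENGTH = 28_000            # keep sequences longer than 28,000 bp
MAX_MISSING_FRACTION = 0.0005  # fewer than 0.05% missing bases
MAX_GAP_FRACTION = 0.5         # drop columns with more than 50% gaps

# Lineage A/B markers at reference positions (8782, 28144); assumption taken
# from the Pango nomenclature proposal, not from this text.
LINEAGE_MARKERS = {("T", "C"): "A", ("C", "T"): "B"}


def passes_qc(seq: str) -> bool:
    """Length and missing-base filter; assumes 'N' marks a missing base."""
    if len(seq) <= MIN_LENGTH:
        return False
    return seq.upper().count("N") / len(seq) < MAX_MISSING_FRACTION


def drop_gappy_columns(alignment: Sequence[str]) -> list[str]:
    """Remove alignment columns in which more than half of the rows are '-'."""
    n_rows = len(alignment)
    keep = [
        col
        for col in range(len(alignment[0]))
        if sum(row[col] == "-" for row in alignment) / n_rows <= MAX_GAP_FRACTION
    ]
    return ["".join(row[col] for col in keep) for row in alignment]


def root_lineage(ref_aligned_seq: str) -> str:
    """Call lineage A or B from the bases at positions 8782 and 28144.

    Assumes the sequence is aligned to Wuhan-Hu-1 with insertions removed,
    so string index i - 1 corresponds to reference position i.
    """
    key = (ref_aligned_seq[8782 - 1].upper(), ref_aligned_seq[28144 - 1].upper())
    return LINEAGE_MARKERS.get(key, "unresolved")
```

For example, `drop_gappy_columns(["AC-T", "A--T", "ACGT"])` drops only the third column, where two of the three rows carry a gap. Genomes that match neither marker pair are reported as `"unresolved"` rather than forced into A or B.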
Most samples in group 1 were assigned to Pangolin lineage B.3 (74 samples), whereas most samples in group 2 were assigned to lineage B.1 (6 samples). Samples belonging to lineage B.3 were collected early in the outbreak (before March 13, 2020; Figure 1B), indicating that this lineage caused the initial outbreak. Lineage B.1 was introduced at a later time point (after March 13, 2020; Figure 1B).

To further investigate the origin at the state and national level, we performed a phylogenetic analysis using Nextstrain. We incorporated SARS-CoV-2 samples from North Rhine-Westphalia (NRW; the state where the outbreak took place) and from other regions of Germany that were uploaded to GISAID and collected between December 5, 2019 and April 4, 2020. Subsampling was based on samples collected in NRW and in Germany (Figure 2A, 2B). Our analysis confirmed that lineage B.3 was indeed the most prevalent strain in the region at the beginning of the outbreak. The analysis also revealed that B.3 and B.1 were competing strains around that time. We therefore hypothesize that the outbreak was not caused by the introduction of a single virus strain, but by a series of at least two separate events that introduced different viral strains into this region and thereby fueled the spread of the virus. Outbreaks involving multiple variants have been observed before. A study of SARS-CoV-2 genomic diversity in Brazil showed that lineage B.1 was the most prevalent one at the time when it also started to gain significance in Europe. The authors also concluded that local transmission can be caused by multiple strains [30]. Another outbreak, at a university in the USA from March to May 2021, was caused by multiple strains simultaneously, which was confirmed by the positive travel history of the infected individuals [31].
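The regional subsampling described above (GISAID samples collected between December 5, 2019 and April 4, 2020, drawn from NRW and from Germany) can be illustrated with a generic stratified subsampling sketch in Python. This is not Nextstrain's implementation, which performs this step in its own workflow; the `Sample` record, its `region` labels, the grouping by region and calendar month, and the `per_group` cap are hypothetical choices made only for illustration.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Collection window as stated in the text.
WINDOW_START = date(2019, 12, 5)
WINDOW_END = date(2020, 4, 4)


@dataclass(frozen=True)
class Sample:
    """Hypothetical metadata record for one GISAID genome."""
    strain: str
    region: str       # hypothetical label, e.g. "NRW" or "Germany"
    collected: date


def subsample(samples: list[Sample], per_group: int, seed: int = 0) -> list[Sample]:
    """Keep at most `per_group` samples per (region, year, month) inside the window."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    groups: dict[tuple[str, int, int], list[Sample]] = defaultdict(list)
    for s in samples:
        if WINDOW_START <= s.collected <= WINDOW_END:
            groups[(s.region, s.collected.year, s.collected.month)].append(s)

    chosen: list[Sample] = []
    for key in sorted(groups):
        members = groups[key]
        if len(members) <= per_group:
            chosen.extend(members)  # small groups are kept whole
        else:
            chosen.extend(rng.sample(members, per_group))
    return chosen
```

Capping each region-month group keeps heavily sequenced places or periods from dominating the tree, at the cost of discarding genomes, which is one reason subsampling can limit conclusions about where ancestral nodes sit.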
Another outbreak with multiple variants was linked to a single flight from New Delhi to Hong Kong in April 2021, in which 59 people were infected; sequencing analyses revealed at least 3 sub-lineages [32]. Similar to these cases, we identified two dominant lineages, and we assume that they were not introduced simultaneously but in separate events at different times.

We next used the same approach to identify the closest ancestor of the strain that caused the outbreak, performing a phylogenetic analysis with SARS-CoV-2 samples from Europe collected between December 5, 2019 and April 4, 2020. The B.3 samples cluster in one branch of the European SARS-CoV-2 phylogenetic tree, and the parent branch is assigned to France. The European-level analysis thus revealed a closely related strain located in France (Figure 3). Considering that the first reported cases of SARS-CoV-2 in France were in the east of the country, a region bordering Germany [33], it is possible that the virus was introduced from there. However, the lack of travel-history information and the subsampling approach applied in the Nextstrain workflow limit the analysis. Moreover, most of the early samples (collected before mid-February 2020) came from France. This may bias the phylogenetic analysis at the European level, but it is consistent with the sample collection dates.

Lastly, we characterized and assessed the genetic differences between the two lineages associated with the outbreak. We were able to identify a prominent missense mutation in the spike protein, D614G

A) Nextstrain-based phylogenetic tree analysis using a subsampling scheme based on the European country level from January to March 2020. The node colors indicate the exposed countries, and each dot represents a genome from the GISAID database.
B) Zoom into our cohort, revealing that the internal nodes prior to the cohort were assigned to France. The red circles indicate the representative genomes from our cohort and a closely related strain from France.

Author contributions
MK conducted the wet lab experiments and wrote the manuscript. SS and AS performed the bioinformatic and phylogenetic analyses. DD and AMEH provided the samples. All authors provided edits, discussion and changes to the manuscript.

References
A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature.
Baseline Characteristics and Outcomes of 1591 Patients Infected With SARS-CoV-2 Admitted to ICUs of the Lombardy Region.
Real-world SARS CoV-2 testing in Northern England during the first wave of the COVID-19 pandemic.
Transmission Dynamics of Large Coronavirus Disease Outbreak in Homeless Shelter.
Epidemiological Characteristics of Infectious Diseases Among Travelers Between China and Foreign Countries Before and During the Early Stage of the COVID-19 Pandemic. Front Public Health.
Airport Pandemic Response: An Assessment of Impacts and Strategies after One Year with COVID-19.
Infection fatality rate of SARS-CoV2 in a super-spreading event in Germany.
Why are RNA virus mutation rates so damn high?
The coronavirus is mutating - does it matter? Nature.
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology.
Cutadapt removes adapter sequences from high-throughput sequencing reads.
Minimap2: pairwise alignment for nucleotide sequences.
The Sequence Alignment/Map format and SAMtools.
An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar.
QUAST: quality assessment tool for genome assemblies.
A quality control tool for high throughput sequence data.
MultiQC: summarize analysis results for multiple tools and samples in a single report.
MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform.
Issues with SARS-CoV-2 sequencing data.
FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments.
Nextstrain: real-time tracking of pathogen evolution.
TreeTime: Maximum-likelihood phylodynamic analysis.
GISAID: Global initiative on sharing all influenza data - from vision to reality.
ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data.
A high-performance computing toolset for relatedness and principal component analysis of SNP data.
SARS-CoV-2 Lineages and Sub-Lineages Circulating Worldwide: A Dynamic Overview.
Genomic epidemiology of SARS-CoV-2 reveals multiple lineages and early spread of SARS-CoV-2 infections in Lombardy, Italy.
SARS-CoV-2 introduction and lineage dynamics across three epidemic peaks in Southern Brazil: massive spread of P.1.
Multiple Variants of SARS-CoV-2 in a University Outbreak After Spring Break -
Air travel-related outbreak of multiple SARS-CoV-2 variants. medRxiv.
Hospital and Population-Based Evidence for COVID-19 Early Circulation in the East of France.
Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus.
SARS-CoV-2 outbreak in a tri-national urban area is dominated by a B.1 lineage variant linked to a mass gathering event.
Dynamics, outcomes and prerequisites of the first SARS-CoV-2 superspreading event in Germany in February 2020: a cross-sectional epidemiological study.
Why crowding matters in the time of COVID-19 pandemic? - a lesson from the carnival effect on the 2017/2018 influenza epidemic in the Netherlands.