key: cord-0261794-bxvymsne
authors: Nadeau, S. A.; Vaughan, T. G.; Beckmann, C.; Topolsky, I.; Chen, C.; Hodcroft, E.; Schaer, T.; Nissen, I.; Santacroce, N.; Burcklen, E.; Ferreira, P.; Jablonski, K. P.; Posada-Cespedes, S.; Capece, V.; Seidel, S.; Santamaria de Souza, N.; Martinez-Gomez, J. M.; Cheng, P.; Bosshard, P. P.; Levesque, M. P.; Kufner, V.; Schmutz, S.; Zaheri, M.; Huber, M.; Trkola, A.; Cordey, S.; Laubscher, F.; Goncalves, A. R.; Aeby, S.; Pillonel, T.; Jacot, D.; Bertelli, C.; Greub, G.; Leuzinger, K.; Stange, M.; Mari, A.; Roloff, T.; Seth-Smith, H.; Hirsch, H. H.; Egli, A.; Redondo, M.; Kobel, O.; Noppen, C.; Be,
title: Swiss public health measures associated with reduced SARS-CoV-2 transmission using genome data
date: 2021-11-11
journal: nan
DOI: 10.1101/2021.11.11.21266107
sha: 9efe13daade710da81bc2382c64e0300de6edbb7
doc_id: 261794
cord_uid: bxvymsne

Genome sequences allow quantification of changes in case introductions from abroad and local transmission dynamics. We sequenced 11,357 SARS-CoV-2 genomes from Switzerland in 2020, the 6th largest effort globally. Using these data, we estimated introductions and their persistence throughout 2020. By contrasting estimates with null models, we estimate at least 83% of introductions were averted during Switzerland's border closures. Further, transmission chain persistence roughly doubled after the partial lockdown was lifted. Then, using a novel phylodynamic method, we suggest transmission in newly introduced outbreaks slowed 36-64% upon outbreak detection in summer 2020, but not in fall. This could indicate successful contact tracing over summer before overburdening in fall. The study highlights the added value of genome sequencing data for understanding transmission dynamics.

SARS-CoV-2 genomes were collected at an unprecedented scale in 2020 (1) and they have been extensively used to characterize transmission dynamics, in particular because genetic data can be used to link related cases.
This enables reconstruction of introductions and downstream transmission chains in the absence of contact tracing data (2). Where contact tracing data are available, this approach has been verified and has additionally helped link unassigned individuals to known transmission chains (3, 4). So far, genome data have primarily been used to reconstruct transmission dynamics at the onset of the pandemic in spring 2020. Phylogenetic approaches reconstruct pathogen phylogenies for calculation of relevant statistics without further explicit models. For example, (5, 6) showed that national lockdowns during the early Irish and English epidemics coincided with reduced lineage size and diversity. In Switzerland, (7) linked regional super-spreading events to a dominant lineage in the city of Basel using a phylogenetic reconstruction. Phylodynamic studies assume the phylogeny arises from some underlying model of transmission between hosts (and possibly migration of hosts between regions). This assumption enables estimation of population-level transmission dynamics from pathogen genome data. For example, (8-10) showed that public health measures reduced SARS-CoV-2 transmission rates in Israel, New Zealand, and Washington State, USA.

This preprint (which was not certified by peer review) is made available under a CC-BY-NC-ND 4.0 International license. The copyright holder is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

However, new models and careful consideration of potential biases are required to quantify the effects of specific public health measures.
Here, we developed a two-step analysis framework that combines phylogenetic and phylodynamic methods to address potential sampling biases and phylogenetic uncertainty while enabling analysis of a large data set of 5,520 sequences, representing approximately 5% of weekly confirmed cases for the first pandemic year in Switzerland. We use these sequences to quantify transmission dynamics until just before the widespread dispersal of more transmissible variants of concern in December 2020 (World Health Organization, n.d.). We aimed to quantify the association between cross-border and local SARS-CoV-2 transmission dynamics and the implementation and lifting of major public health measures. We consider border restrictions, lockdown measures, and contact tracing, three front-line tools in the fight against COVID-19.

First, we identified independent introductions of SARS-CoV-2 into Switzerland and estimated their persistence. To do this, we selected a spatiotemporally representative set of Swiss SARS-CoV-2 genome sequences (Figure S1) and then divided them by Pango lineage (12). We aggregated Pango lineages dominated by Swiss sequences into parent lineages so that we can assume each analyzed lineage originated outside Switzerland (Table S1). We constructed an approximate maximum-likelihood phylogeny for each lineage of Swiss and genetically similar foreign sequences. Next, we identified independently introduced singleton sequences or clusters of Swiss sequences from these lineage phylogenies as described in the Materials and methods. Importantly, we identified two different plausible sets of introductions resulting from two different assumptions about the ordering of transmission events at polytomies with Swiss and non-Swiss descendants. We refer to these sets as "few" or "many" introductions; they are based on opposite extreme assumptions about the number of introductions when phylogenetic uncertainty is large.
We estimate that the analyzed sequences originate from between 557 (few introductions) and 2284 (many introductions) independent introductions into Switzerland. These introductions are roughly power law-distributed in size, with the 10 largest introductions accounting for 16% (many introductions) to 30% (few introductions) of sampled genomes (Figure S2). From a down-sampling analysis, we observe that we do not reach saturation: if we were to include more sequences, we would identify more introductions (Figure S3A). Therefore, the total number of introductions into Switzerland cannot be determined from our data.

Since we sampled genome sequences proportional to the number of confirmed cases through time, we expect that trends through time in the number and persistence of introductions are representative of the underlying dynamics. Figure 1A shows the number of newly sampled introductions into Switzerland each week, which peaked the week of 15 March. Switzerland closed its external borders beginning 13 March 2020 (13). We fit a simple null model to the estimated sampled introductions before the borders closed. Namely, we assume introductions into Switzerland are a linear function of case numbers in neighboring countries. Figure 1B shows this model fit to both sets of identified introductions and the resulting projections from 13 March through the re-opening of Switzerland's European borders on 15 June 2020. We estimate that, compared to the null model, reduced travel in the border closure period averted 593-4038 sampled introductions, a lower-bound estimate for the true number of introductions averted.
This represents a reduction of at least 83%. Finally, newly sampled introductions rose more or less steadily from border opening until November 2020, coinciding with borders reopening across Europe and globally and a European fall wave of cases.

New introductions cannot sustain an epidemic unless they persist in the local population. Our analysis suggests several introductions were quite persistent in Switzerland, including one that may have persisted across the whole sampling period (Figure S4). On average, introductions persisted 5-34 days from first to last sampling in 2020. However, introductions in late 2020 persisted longer than in early 2020. Figure 1C shows that 0.5-8% of April introductions persisted at least 60 days, compared to 12-52% of September introductions.

We can also formulate a simple null model for persistence in the absence of spring 2020 lockdown measures and associated behavioral changes. We assume that persistence, measured as the time until introductions circulating each day are last sampled, does not change through time. We assume this delay distribution always equals median persistence calculated over the entire spring. Figure 1D contrasts this null model assumption with empirical persistence calculated from each day for each set of introductions. The distribution does indeed vary through time, deviating from the null model. We estimate that median persistence for introductions at lockdown start is less than or around the median calculated over the whole spring and rises above the null model in the post-lockdown period.
Quantitatively, introductions persist roughly twice as long at a post-lockdown peak around 10 June compared to at lockdown start, though the temporal signal is noisy (Figure 1D). We note that under the few introductions assumption, persistence estimates are upper-bounded by the end of our sampling period, so the increase in persistence may be an underestimate (Figure 1D).

Genome data capture summer 2020 "slowdown" transmission dynamics

Next, we investigated more specific transmission dynamics once SARS-CoV-2 lineages were introduced to Switzerland. Here, we quantified time-varying transmission dynamics in Switzerland in a Bayesian phylodynamic framework. As a base model, we used the birth-death model with serial sampling originally described by (14). We conditioned this model on the introductions identified from our phylogenetic analysis. In a nutshell, the model assumes that once lineages are introduced, they are transmitted between hosts according to a time-varying transmission rate that is the same across introductions, die out upon recovery/death of the host according to a constant becoming-uninfectious rate, and yield genome samples with a time-varying sampling probability. We assume individuals who test positive adhere to the self-isolation regulations, so lineages die out when they are sampled. Under this parameterization, the effective reproductive number in Switzerland is a function of the transmission rate, becoming-uninfectious rate, and sampling probability.

Here, we develop an extension to this methodology by adding a transmission rate "damping" factor, as shown in Figure 2.
The transmission rate is assumed to decrease by a multiplicative damping factor two days after each introduction is first sampled. We use a spike-and-slab prior on this factor to include the possibility of no transmission slowdown. We allow this damping factor to vary between spring, summer, and fall 2020 in Switzerland, periods characterized by very different case numbers and testing regimes (Figure 3A, (15)).

Using this model, we aim to test whether contact tracing efforts in Switzerland slowed transmission once introductions were detected. We reason that test-trace-isolate can only slow transmission shortly after the first case of an introduction tests positive but not beforehand, as beforehand the introduction was circulating cryptically. The two-day delay aims to account for the time between an individual giving a sample (i.e., being swabbed) and having their contacts notified. Specifically, this delay consists of the time to RT-PCR results, which was generally less than 24 hours in most Swiss laboratories (16), plus a small delay for contact tracers to reach the contacts.

Figure caption (beginning missing): are sampled. The likelihood of the genome sequence data at the tips of the phylogenies is calculated given the "applied" effective reproductive number specific to each introduction (B and C, bottom).

Across several model configurations (see supplementary text) and our two polytomy assumptions, we estimate effective reproductive number values that roughly capture the same trends as estimates based on confirmed case numbers (Figure S5B). Strikingly, we estimate the reproductive number dropped from between 3.7-6.7 in the week of 9 March to between 0.2-0.5 in the week of 16 March 2020, coinciding with lockdown measures (estimates are posterior median estimates conditioned on the few and many identified introductions, respectively). On the other hand, sampling probability estimates in fall 2020 are strongly dependent on the prior (Figure S5A), which we discuss more in the supplementary text.

Based on our main model configuration, we estimate a 36-64% slowdown in transmission after introductions are first sampled in summer 2020 (posterior median estimates conditioned on the many and few identified introductions, respectively). In comparison, we estimate no or very little slowdown upon the first sampling during fall 2020 (Figure 3). These results are qualitatively robust to conditioning on few or many introductions and imposing a strong prior bound on the sampling proportion (Figure S6). In contrast, estimates in spring 2020 are inconsistent. This could be because low genomic diversity in SARS-CoV-2 during this period causes high phylogenetic uncertainty (17). In summary, we report a "slowdown" dynamic in SARS-CoV-2 transmission in Switzerland in summer 2020 where transmission slows after the first genome in a new introduction is sampled, while this slowdown is not observed in fall 2020.

Figure caption (beginning missing): show estimates for if and how much transmission rates were dampened after introductions were sampled during different time periods in (C) Switzerland and (D) New Zealand. The inference was done twice, once conditioning on introductions identified assuming many introductions (light grey) and once assuming few introductions (dark grey).
Thus, the difference between estimates in light and dark grey is due to phylogenetic uncertainty.

While Switzerland is centrally located in Europe and well-connected to other countries, especially those in the (normally) barrier-free Schengen travel zone, New Zealand is a relatively isolated island nation. Additionally, New Zealand aimed to eradicate SARS-CoV-2 throughout 2020 using strong measures, such as keeping its borders closed (19), while Switzerland re-opened to Europe on 15 June. We compared our estimates for the transmission damping factor between Switzerland in spring, summer, and fall 2020 with New Zealand before and after an epidemic breakpoint in mid-May 2020 when local transmission was briefly eradicated (9). From this point until early August, all cases in New Zealand were linked to managed quarantine facilities at the border. Then, a new community outbreak was identified on 11 August (20). Case numbers were subsequently held at lower levels through December 2020 (Figure 3B). Based on the available genome data from New Zealand, which includes samples linked to managed quarantine facilities and from the community according to the sequence submitters, we estimated damping factors in New Zealand before and after 15 May to be comparable to or stronger than in Switzerland during summer and fall 2020 (Figure 3D).

(21). We estimate that introductions circulating on 17 March persisted only about half as long as in mid-June. We also estimate that only 0.5-8% of April introductions persisted more than 60 days, compared to 12-52% of September introductions. These findings agree with (22), who demonstrated a reduction in the number of transmission clusters and the risk of transmission within clusters in the Canton of Vaud, Switzerland after lockdown. Finally, we obtained genome-based estimates for the time-varying effective reproductive number throughout 2020 from our phylodynamic inference. We estimate the reproductive number dropped from between 3.7-6.7 the week of 9 March to between 0.2-0.5 the week of 16 March, coinciding with lockdown measures. Since our genome sampling was proportional to case numbers in Switzerland, these estimates are informed by case numbers, but the genome data additionally inform the model on putative within-Switzerland transmission, removing potential bias due to introduced cases.

Finally, we quantified a summertime "slowdown" dynamic in which introductions initially spread faster, then slowed 36-64%. A plausible explanation of this dynamic is a successful test-trace-isolate implementation that roughly halved transmissions once an introduction was identified during summer 2020 in Switzerland. For this analysis, we make the strong assumption that all in transmitting more than non-travelers (23). Thus, a passive transmission slowdown might have happened as introduced lineages moved into the non-traveler population. We would expect travelers in fall to have similar contact networks as those in summer, but we do not quantify a transmission slowdown in Switzerland in fall. This coincides with high case numbers during a fall wave, when Swiss contact tracing was reported to be overburdened (24).
Second, contacts of positive cases are likely tested more intensely, potentially yielding "bursts" of samples around the first detected cases that subsequently disappear. If so, we can still interpret the slowdown dynamic as evidence that test-trace-isolate implementation was working, but it is difficult to determine precisely how much transmission actually slowed.

International comparisons also lend perspective to the transmission slowdown effect we quantify from Swiss genome data. Again using our newly developed phylodynamic model, we quantified a significant slowdown effect in New Zealand during two different time periods. Thus, this slowdown effect is not unique to Switzerland in summer. Importantly, (4) showed, using genome sequence data, that New Zealand contact tracing was highly effective in identifying SARS-CoV-2 infection clusters. Then, (25) recently exploited an accidental, partial breakdown of English contact tracing to show that normal contact tracing in early fall 2020 reduced transmissions by 63% in the 6 weeks following a positive case. This measure is within the range of our estimates for a transmission slowdown in Switzerland in summer 2020.

Together, our results quantify the reduction of case importation and local transmission in Switzerland in time periods of general border closure and general lockdown measures using phylogenetic methods. Second, we provide genome-based quantification of a summertime transmission slowdown that may be linked to successful contact tracing efforts. This slowdown is not observed in fall when the specific contact tracing efforts were overwhelmed.
We envision that the quantitative estimates presented here can help policy-makers weigh these general and specific measures against the burdens they impose.

Most of the Swiss SARS-CoV-2 genomes analyzed until 1 December 2020 were generated by the Swiss SARS-CoV-2 Sequencing Consortium (26), which contributed in total 11,357 SARS-CoV-2 genome sequences in 2020. Here, we briefly describe the swab-to-sequence process for these samples. RNA extracts from qPCR-positive patient naso- or oral-pharyngeal swabs were provided by Viollier AG, a Swiss medical diagnostics company. RNA extraction was done with either the Abbott m2000sp or Seegene STARMag 96x4 Universal Cartridge RNA extraction kit. Extracts were then transferred to the Genomics Facility Basel or the Functional Genomics Center Zurich for whole-genome sequencing. Both centers used the ARTIC v3 primer scheme (27, 28) to generate tiled, approximately 400 bp-long amplicons. Library preparation was done with the New England Biolabs (NEB) library preparation kit. Libraries were sequenced on Illumina MiSeq or NovaSeq machines, resulting in 2 x 251 base reads. Bioinformatics was done using V-pipe (29), including read trimming and filtering with PRINSEQ (30), alignment to GenBank accession MN908947 (31) with bwa (32), and consensus base calling. For consensus base calling, positions with <5x coverage are masked with "N", positions with >5% and >2 reads supporting a minor base are called with IUPAC ambiguity codes, and positions with >50% reads supporting a deletion are called with "-". We rejected samples with <20,000 non-N bases.
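The consensus base-calling thresholds above can be illustrated with a minimal Python sketch. This is not the V-pipe implementation: the function name `call_consensus_base`, the per-position count dictionary, the choice to count deletion reads toward coverage, and the order in which the rules are applied are all assumptions made for illustration.

```python
# IUPAC nucleotide codes keyed by the sorted set of bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def call_consensus_base(counts, min_cov=5, minor_frac=0.05,
                        minor_reads=2, del_frac=0.5):
    """Call one consensus character from read counts at a single position.

    counts: hypothetical dict mapping "A", "C", "G", "T", "-" to read counts.
    Assumption: coverage is the total over all keys, deletions included.
    """
    coverage = sum(counts.values())
    if coverage < min_cov:                      # <5x coverage: mask
        return "N"
    if counts.get("-", 0) / coverage > del_frac:  # >50% reads support deletion
        return "-"
    base_counts = {b: counts.get(b, 0) for b in "ACGT"}
    major = max(base_counts, key=base_counts.get)
    called = {major}
    for base, n in base_counts.items():
        # A minor base needs >2 supporting reads AND >5% of reads.
        if base != major and n > minor_reads and n / coverage > minor_frac:
            called.add(base)
    return IUPAC["".join(sorted(called))]
```

For example, a position with 90 reads supporting A and 10 supporting G would be called with the ambiguity code for A/G, whereas 98 A reads and 2 G reads would be called as plain A because the minor base does not exceed 2 reads.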
The consensus sequences we generated have been made publicly available on both GISAID (33) (submitting lab: Department of Biosystems Science and Engineering, ETH Zürich) and European Nucleotide Archive (ENA) (study: PRJEB38472).

We supplemented these data with other Swiss sequences and foreign sequences available via GISAID (accessed 31 May 2021). From the full set of sequences available on GISAID, we removed from consideration non-human samples, samples < 27000 bases long, and samples flagged by the Nextclade tool (34) for one of the following reasons: suspiciously clustered SNPs (QC SNP clusters status metric not "good"; >= 6 mutations in 100 bases), too many private mutations (QC private mutations status metric not "good"; >= 10 mutations from the nearest tree node), or overall bad quality (Nextclade QC overall status "bad"). We aligned the sequences to the reference genome MN908947.3 using MAFFT (35). Finally, we followed the Nextstrain pipeline's recommendation to mask the first 100 and last 50 sites of the alignment (36) since the start and end of SARS-CoV-2 sequences are prone to sequencing error (37).

From the quality-filtered alignment of GISAID sequences, we selected a focal set of sequences from Switzerland and a context set of sequences from abroad. For the focal sequences from Switzerland, we aimed to select a spatially and temporally representative sample. Therefore, we down-sampled available sequences to 5% of confirmed case counts in each Swiss Canton each week from the start of the Swiss epidemic in February until 1 December 2020. Cases were only attributed at the Canton level beginning in mid-May, so until this point, we sampled randomly from across Switzerland. Where there were not enough sequences available from a Canton in a week, we took all available sequences. Figure S1 shows the resulting numbers of analyzed sequences through time for each Canton compared to confirmed case numbers.
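As a rough illustration of the stratified down-sampling just described, the following Python sketch draws up to 5% of confirmed cases per Canton and week and falls back to all available sequences when a stratum is too small. The function name, the input layout, the rounding rule (ceiling), and the fixed random seed are assumptions for illustration; the actual pipeline may differ. Weeks before Canton-level attribution could be handled by passing a single pseudo-Canton label for all of Switzerland.

```python
import math
import random
from collections import defaultdict


def downsample(sequences, confirmed_cases, fraction=0.05, seed=0):
    """Select sequences proportional to confirmed cases per (canton, week).

    sequences: list of (seq_id, canton, week) tuples (hypothetical layout).
    confirmed_cases: dict mapping (canton, week) to confirmed case counts.
    Assumption: the per-stratum target is rounded up.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for seq_id, canton, week in sequences:
        by_stratum[(canton, week)].append(seq_id)

    selected = []
    for stratum, ids in sorted(by_stratum.items()):
        target = math.ceil(fraction * confirmed_cases.get(stratum, 0))
        if target >= len(ids):
            selected.extend(ids)          # not enough sequences: take all
        else:
            selected.extend(rng.sample(ids, target))
    return selected
```

With 100 sequences and 1000 confirmed cases in one stratum, this draws 50 sequences; with 3 sequences and 200 confirmed cases in another, all 3 are kept.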
To reduce the size of the alignments for phylogenetic analysis, we divided the focal Swiss set into Pango lineages (12), similar to (6). Lineages composed of > 50% Swiss sequences were aggregated into their parent lineage(s) until <= 50% were Swiss. This aims to ensure that each analyzed lineage originated outside of Switzerland. Table S1 gives the number of analyzed sequences per lineage.

For the contextual sequences from abroad, we aimed to select the most genetically similar sequences to focal Swiss sequences. This set should help distinguish between SARS-CoV-2 variants unique to Switzerland (likely within-Switzerland transmission) and variants also circulating abroad (possibly recent introductions). To select this set, we applied the Nextstrain priority script (38) to rank sequences from abroad by their genetic similarity to Swiss sequences in each lineage alignment. Then, we selected two times as many context sequences as focal Swiss sequences for each analyzed lineage.

We tested different numbers of focal Swiss sequences and different ratios of contextual to focal sequences, with the conclusion that the most variation in identified introductions is still due to how we define an introduction (see "Identifying introductions", below and Figure S3). The final dataset used for analysis includes 5,520 focal sequences from Switzerland and 11,009 genetically similar contextual sequences from abroad.

We estimated the maximum likelihood phylogeny for each lineage alignment using IQ-TREE (39) under an HKY substitution model (40) with empirical base frequencies and four gamma rate categories.
We then rooted each phylogeny with GISAID strain EPI_ISL_406798 as an outgroup and estimated branch lengths in time units using least-squares dating (LSD) (41) with a strict molecular clock and a minimum mutation rate of 8x10^-4 substitutions per site per year. We additionally assumed the root date to be between 15 November and 24 December 2019 (roughly in line with estimates provided by Nextstrain (42)) and set the minimum branch length to zero. Sequences that violated the strict clock assumption (z-score threshold > 3) were removed and near-zero branches (<1.7x10^-5 substitutions per site per year) were collapsed into polytomies, reflecting the fact that the sequence data are not sufficient to resolve the ordering of these transmission events. Given the root date constraints, the mutation rate conformed to the lower bound of 8x10^-4 with extremely narrow confidence intervals. Confidence intervals for node dates were generated in LSD by re-sampling branch lengths 100 times under a lognormal relaxed clock model with standard deviation 0.4.

We identified putative Swiss transmission chains (collections of two or more genomic sequences resulting from within-Switzerland transmissions) from each Pango lineage tree according to the following criteria applied on a recursive tip-to-root tree traversal: at least two Swiss sequences are part of a clade in the tree and the subtree spanned by these Swiss sequences is monophyletic upon removing (a) up to three export events where (b) only one export event may occur along each internal branch. Exports are defined to be clades containing non-Swiss sequences.
We chose a conservative value for criterion (b) while still allowing some export events and note that the

We refer to any Swiss sequence not falling into a Swiss transmission chain as a Swiss singleton. We assume each singleton and each transmission chain represent an independent introduction of SARS-CoV-2 into Switzerland; together these are called introductions.

Finally, we repeated this procedure twice for each lineage tree, interpreting polytomies in two different ways. Namely, we made two different assumptions upon reaching a polytomy where non-Swiss descendant(s) of the polytomy would cause the proposed introduction to violate criterion (a). To resolve the polytomy such that it generates many introductions, we split descendant Swiss clades into independent introductions. Alternatively, to resolve the polytomy such that it generates few introductions, we aggregated descendant Swiss clades, going in size order, into a single introduction. If in doing this we reached criterion (a), we continued aggregating descendants into a second introduction, and so on. These procedures are heuristic, but are conceptually similar to the ACCTRANS (accelerated transformations) and DELTRANS (delayed transformations) methods for assigning character transformations when multiple scenarios are equally parsimonious (43). In summary, we identify introductions twice, generating estimates that represent two plausible sets of many introductions and few introductions. Our pipeline for down-sampling available sequences, constructing phylogenies, and estimating introductions is available at https://github.com/cevo-public/Grapevine-SARS-CoV-2-Introduction-Analysis.

We wanted to understand if the Swiss border closure on 13 March 2020 coincided with a change in introduction dynamics.
Our null model is that the number of weekly introductions before border closure, estimated from the genome data as described above, is a linear function of weekly SARS-CoV-2 cases in surrounding countries. Introductions are dated by the first sampled case, whereas case counts (taken from the ECDC (18)) are dated by case reporting dates. We considered an average delay between these variables of up to 18 days. We also considered two options for "surrounding countries", either all the non-Swiss European countries in the ECDC dataset or just Switzerland's neighboring countries Italy, France, Germany, and Austria. We fit these models separately to the weekly many and few introductions estimates before 13 March 2020 and selected the best-fit models (lowest root mean squared error). For both estimates, the neighboring-countries-only incidence data better predicted introductions. The best-fit delay was 2 days for the many introductions estimates compared to 5 days for the few introductions estimates, so we used these values. The model was only fit to the few weekly introduction estimates before 13 March 2020; values after that date are projections based on the neighboring country case numbers.

We also wanted to understand if the Swiss lockdown between 17 March and 27 April 2020 coincided with a change in the persistence of introductions. Our null model is that introductions circulating on any given day persist equally long, regardless of the day. In other words, introductions die out (are no longer sampled) according to some delay distribution that is constant through time.
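The border-closure null model described above (weekly introductions as a linear function of delayed case counts, with the delay chosen by lowest root mean squared error) can be sketched in Python as follows. This is an illustrative sketch only: the inclusion of an intercept, the way daily case counts are shifted and summed into weeks, and all function names are assumptions rather than the authors' implementation.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def weekly_cases(daily_cases, week_starts, delay):
    """Sum daily cases over each 7-day window shifted back by `delay` days."""
    return [sum(daily_cases[max(s - delay, 0):max(s - delay + 7, 0)])
            for s in week_starts]


def best_delay_model(daily_cases, week_starts, weekly_intros, max_delay=18):
    """Return (rmse, delay, slope, intercept) for the lowest-RMSE delay."""
    best = None
    for delay in range(max_delay + 1):
        x = weekly_cases(daily_cases, week_starts, delay)
        slope, intercept = fit_line(x, weekly_intros)
        residuals = [yi - (intercept + slope * xi)
                     for xi, yi in zip(x, weekly_intros)]
        rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
        if best is None or rmse < best[0]:
            best = (rmse, delay, slope, intercept)
    return best
```

Under these assumptions, projections for the closure period would be obtained by applying the fitted slope and intercept to the delayed case counts after 13 March, and the number of averted sampled introductions would be the sum of projected minus observed weekly introductions.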
For each date, we calculated the time from that date to the last sample for each introduction persisting on that date. Singleton introductions trivially persist for 1 day. Then, we report the median and inter-quartile range of this delay distribution from each date.

After identifying introductions, we performed phylodynamic inference on them using the BDSKY (birth-death skyline) method (44) in BEAST2 (45). To avoid model mis-specification due to the more transmissible alpha variant, we analyzed data only until 1 December 2020. The inference relies on two main models: a nucleotide substitution model describing an evolutionary process and a population dynamical model describing a transmission and sampling process. For the nucleotide substitution model, we assumed an HKY (40) model with four Gamma rate categories to account for site-to-site rate heterogeneity (46). We used the default priors for kappa and the scale factor of the Gamma distribution. We assumed a strict clock with rate 8 × 10^-4 substitutions per site per year. For the population dynamical model, we used BDSKY (44). In BDSKY, the identified introductions are the result of a birth-death with sampling process parameterized by an effective reproductive number, a becoming-uninfectious rate, and a sampling probability. As in (47), we inferred these population dynamical parameters jointly from the different introductions. More concretely, each introduction is assumed to result from an independent birth-death process having its own start time, but sharing all other parameters with the processes associated with the other introductions.
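To make the shared-parameter structure concrete, the following Python sketch simulates several introductions forward in time, each with its own start time but the same effective reproductive number, becoming-uninfectious rate, and sampling probability. It is a forward simulation for intuition, not the BDSKY inference itself. The function names are hypothetical, the reproductive number is held constant (the analysis lets it vary week-to-week), and the parameter values in the usage lines are purely illustrative.

```python
import random

def bd_rates(re, delta, s):
    """Convert (Re, becoming-uninfectious rate, sampling probability)
    into per-capita event rates of a birth-death-sampling process."""
    birth = re * delta           # transmission rate
    sampling = s * delta         # removal with sampling
    death = (1.0 - s) * delta    # removal without sampling
    return birth, death, sampling

def simulate_introduction(start, end, re, delta, s, rng):
    """Gillespie simulation of one introduction seeded by a single case
    at time `start` (years); returns sample times occurring before `end`."""
    birth, death, sampling = bd_rates(re, delta, s)
    per_capita = birth + death + sampling
    t, infected, samples = start, 1, []
    while infected > 0:
        t += rng.expovariate(infected * per_capita)  # waiting time to next event
        if t >= end:
            break
        u = rng.random() * per_capita                # choose the event type
        if u < birth:
            infected += 1
        else:
            infected -= 1                            # removal event
            if u < birth + sampling:
                samples.append(t)                    # removal was sampled
    return samples

# Illustrative values only: three introductions sharing all parameters
# except their start times (expressed as fractions of a year).
rng = random.Random(1)
chains = [simulate_introduction(t0, 0.9, 1.1, 36.5, 0.05, rng)
          for t0 in (0.10, 0.15, 0.20)]
```

Under this parameterization the reproductive number is the transmission rate divided by the total removal rate, which is why fixing the becoming-uninfectious rate leaves only the reproductive number and sampling probability to be estimated.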
Further, we used a uniform prior on the time of origin for each introduction, from 1 January until the end of the sampling period. This means our prior expectation is a uniform rate of introductions through time. We fixed the becoming-uninfectious rate to 36.5 per year, which corresponds to an average of 10 days to becoming uninfectious. We allowed the effective reproductive number to vary week-to-week. We applied an Ornstein-Uhlenbeck smoothing prior to the logarithm of this parameter, defined such that the stationary distribution of the process was LogNormal(0.8, 0.5) and with an Exp(1) hyperprior on the relaxation parameter of the process. This prior constrains both the absolute reproductive number and the relative sizes of the change between weeks. Finally, we allowed the sampling probability to vary when Swiss testing or genome sampling regimes changed significantly (Table 2). For our main analysis, we applied a LogUniform(10^-4, 1) prior on the sampling probability. As a sensitivity analysis, we also tried a LogUniform(10^-4, 0.05) prior since we upper-bounded our sampling to 5% of confirmed cases each week. See the supplementary text below for more information.

Then, we added an additional "contact tracing" factor to the model. This factor is a multiplicative damping of the effective reproductive number applied to each introduction from 2 days after the first sample date until the introduction dies out. Since we hypothesized contact tracing was not functioning as well during periods of high case numbers, we estimated a separate value in each of three periods: before 15 June 2020 (spring), 15 June to 30 September 2020 (summer), and 30 September to 1 December 2020 (fall). We used the same spike and slab prior for the damping factor in each period, with an inclusion probability of 0.5 and, if included, a uniform prior between 0 and 1. The XML files for the phylodynamic analysis are available at https://github.com/tgvaughan/TransmissionChainAnalyses.
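The fixed rate and the contact tracing factor described above can be illustrated with a short Python sketch. Several choices here are assumptions made for the sketch rather than details confirmed by the text: the damping value is treated as a multiplier on the reproductive number (1.0 meaning no damping), the period is determined by the calendar day being evaluated, each boundary date is assigned to the later period, and the names `period_of` and `damped_re` are hypothetical.

```python
from datetime import date, timedelta

BECOMING_UNINFECTIOUS_RATE = 36.5                         # per year, fixed
MEAN_INFECTIOUS_DAYS = 365 / BECOMING_UNINFECTIOUS_RATE   # 10 days

# Period boundaries from the text; assigning each boundary date to the
# later period is an assumption of this sketch.
SUMMER_START = date(2020, 6, 15)
FALL_START = date(2020, 9, 30)

def period_of(day):
    """Return the contact tracing period ('spring', 'summer', 'fall')."""
    if day < SUMMER_START:
        return "spring"
    if day < FALL_START:
        return "summer"
    return "fall"

def damped_re(day, weekly_re, first_sample, damping):
    """Reproductive number for one introduction on `day`.

    `weekly_re` is the population-level value for the week containing `day`;
    `damping` maps period name to a multiplier in [0, 1] (assumed semantics).
    The multiplier applies from 2 days after the introduction's first sample.
    """
    if day >= first_sample + timedelta(days=2):
        return weekly_re * damping[period_of(day)]
    return weekly_re
```

For example, with an illustrative multiplier of 0.5 in summer, an introduction first sampled on 1 July 2020 would keep the population-level value on 1 and 2 July and have it halved from 3 July onward.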
All the code used to generate figures and values for the manuscript is available at https://github.com/SarahNadeau/cov-swiss-phylo. As linked to within the previous sections, the raw results files upon which these scripts operate were produced by the phylogenetic analysis code at https://github.com/cevo-public/Grapevine-

The Ethikkommission Nordwest- und Zentralschweiz (EKNZ) ruled on the ethics of genome sequence generation by the Swiss SARS-CoV-2 Sequencing Consortium (S3C) for this research. Ethical approval was waived as only viral material is processed.
responsible for obtaining the specimens and the submitting laboratories where genetic sequence data were generated and shared via the GISAID Initiative, on which this research is based. A full acknowledgements table of these groups, including the identifiers for all GISAID data used in this study, is available on the project GitHub repository at https://github.com/SarahNadeau/cov-swissphylo. We also thank Jana Huisman for valuable discussions on the manuscript.

Geographical and temporal distribution of SARS-CoV-2 clades in the WHO European Region
Reconstruction and prediction of viral disease epidemics
Revealing COVID-19 transmission in Australia by SARS-CoV-2 genome sequencing and agent-based modeling
Phylodynamics Reveals the Role of Human Travel and Contact Tracing in Controlling the First Wave of COVID-19 in Four Island Nations
Whole-genome sequencing of SARS-CoV-2 in the Republic of Ireland during waves 1 and 2 of the pandemic
Establishment and lineage dynamics of the SARS-CoV-2 epidemic in the UK
SARS-CoV-2 outbreak in a tri-national urban area is dominated by a B.1 lineage variant linked to a mass gathering event
Full genome viral sequences inform patterns of SARS-CoV-2 spread into and within
Genomic epidemiology reveals transmission patterns and dynamics of SARS-CoV-2 in Aotearoa New Zealand
Viral genomes reveal patterns of the SARS-CoV-2 outbreak in Washington State
World Health Organization, Tracking SARS-CoV-2 variants
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
Switzerland re-opens its European borders. SWI swissinfo.ch (2020)
Sampling-through-time in birth-death trees
Swiss Federal Office for Public Health, COVID-19 Switzerland | Coronavirus | Dashboard
Impact of different SARS-CoV-2 assays on laboratory turnaround time
Phylogenetic Analysis of SARS-CoV-2 Data Is
ECDC, Download the daily number of new reported cases of COVID-19 by country worldwide
History of the COVID-19 Alert System | Unite against COVID-19
Use of Genomics to Track Coronavirus Disease Outbreaks. Emerging Infectious Diseases
Size and duration of COVID-19 clusters go along with a high SARS-CoV-2 viral load: A spatio-temporal investigation in Vaud state
Spread of a SARS-CoV-2 variant through Europe in the summer of 2020
Measuring the scientific effectiveness of contact tracing: Evidence from a natural experiment
Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples
V-pipe: a computational pipeline for assessing viral genetic diversity from high-throughput data
Quality control and preprocessing of metagenomic datasets
A new coronavirus associated with human respiratory disease in China
Fast and accurate short read alignment with Burrows-Wheeler transform
MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform
master · nextstrain/ncov · GitHub
Issues with SARS-CoV-2 sequencing data
Nextstrain build for novel coronavirus (nCoV)
IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies
Dating of the human-ape splitting by a molecular clock of mitochondrial DNA
Fast Dating Using Least-Squares Criteria and Algorithms
Lattice-theoretic properties of MPR-posets in phylogeny
Birth-death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV)
BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis
Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: Approximate methods
Characterising the epidemic spread of Influenza A/H3N2 within a city through phylogenetics

Competing interests: TS is president of the Swiss National COVID-19 Science Task Force.

Data and materials availability: All data used in the analysis is available on GISAID (gisaid.org).
Data generated by the Swiss SARS-CoV-2 Sequencing Consortium is available on GISAID (submitting lab

We gratefully acknowledge the authors from the originating laboratories

Figs. S1 to S7
Tables S1 to S3