key: cord-0696369-vhhmivbq authors: Lopez, M. G.; Chiner-Oms, A.; Garcia de Viedma, D.; Ruiz-Rodriguez, P.; Bracho, M. A.; Cancino-Munoz, I.; Dauria, G.; de Marco, G.; Garcia-Gonzalez, N.; Goig, G. A.; Gomez-Navarro, I.; Jimenez-Serrano, S.; Martinez-Priego, L.; Ruiz-Hueso, P.; Ruiz-Roldan, L.; Torres-Puente, M.; Alberola, J.; Albert, E.; Aranzamendi Zaldumbide, M.; Bea-Escudero, M. P.; Boga, J. A.; Bordoy, A. E.; Canut-Blasco, A.; Carvajal, A.; Cilla Eguiluz, G.; Cordon Rodriguez, M. L.; Costa-Alcalde, J. J.; de Toro, M.; de Toro Peinado, I.; del Pozo, J. L.; Duchene, S.; Ferandez, J.; Fuster Escriva, B.; Gimeno Cardona, C.; Go, title: The first wave of the Spanish COVID-19 epidemic was associated with early introductions and fast spread of a dominating genetic variant date: 2020-12-22 journal: nan DOI: 10.1101/2020.12.21.20248328 sha: d0ea193672ea4b906a0be2ff41b8ff3497dc4236 doc_id: 696369 cord_uid: vhhmivbq

The COVID-19 pandemic has shaken the world since the beginning of 2020. Spain is among the European countries with the highest incidence of the disease during the first pandemic wave. We established a multidisciplinary consortium to monitor and study the evolution of the epidemic, with the aim of contributing to decision making and stopping rapid spreading across the country. We present the results for 2170 sequences from the first wave of the SARS-CoV-2 epidemic in Spain, representing 12% of diagnosed cases until 14th March. This effort allows us to document at least 500 initial introductions, between early February and March, from multiple international sources. Importantly, we document the early rise of two dominant genetic variants in Spain (Spanish Epidemic Clades), named SEC7 and SEC8, likely amplified by superspreading events. In sharp contrast to other non-Asian countries, those two variants were closely related to the initial variants of SARS-CoV-2 described in Asia and represented 40% of the genome sequences analyzed.
The two dominant SECs were widely spread across the country compared to other genetic variants, with SEC8 reaching a 60% prevalence just before the lockdown. Employing Bayesian phylodynamic analysis, we inferred a reduction in the effective reproductive number of these two SECs from around 2.5 to below 0.5 after the implementation of strict public-health interventions in mid-March. The effects of lockdown on the genetic variants of the virus are reflected in the general replacement of preexisting SECs by a new variant at the beginning of the summer season. Our results reveal a significant difference in the genetic makeup of the epidemic in Spain and support the effectiveness of lockdown measures in controlling virus spread even for the most successful genetic variants. Finally, earlier control of SEC7 and particularly SEC8 might have reduced the incidence and impact of COVID-19 in our country.
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Despite the high incidence accumulated across the country, some regions had significantly higher incidence than others. Genomic epidemiology and phylodynamics [6-8] offer a unique opportunity to understand the early events of the epidemic at the global, regional and local levels, to track the evolution of the epidemic after its initial stages and to quantify the impact of lockdown measures on the genetic variants of the virus. However, there are challenges and caveats that prevent the use of pathogen genomes as the sole source of interpretation. While there is now a large number of SARS-CoV-2 sequences deposited in the databases [9], there are still important unsampled areas of the world, including some that played an important role in the initial spread of the epidemic.
In addition, the virus spreads faster than it evolves [10,11], which limits the resolution of phylogenetic and phylodynamic analysis [12]. Finally, despite important efforts by sequencing consortia, only a fraction of the total number of infections has been sequenced. Nevertheless, genomic epidemiology has played an important role in understanding the global and local epidemiology of COVID-19 [13-15].

After the pandemic was declared in Spain, we assembled the National Consortium of genomic epidemiology of SARS-CoV-2 (http://seqcovid.csic.es/). This established a unique network incorporating more than 50 hospitals and scientific institutions across the country to collect clinical samples and epidemiological information from COVID-19 cases. Here we present the results of this nationwide effort. We were able to sequence 12% of the reported cases before the national lockdown, and 1% of the reported cases of the first wave (until 14th May), including samples of SARS-CoV-2 across Spain in the early months of the pandemic (February-May). Using a combination of pathogen genomics, phylogenetic tools, and clinical and epidemiological data, we have been able to dissect the very early events in the dispersion of SARS-CoV-2 throughout Spain as well as the evolution of the virus during the exponential phase and after the lockdown. We document simultaneous introductions in the country from multiple sources. We show that up to 40% of cases were caused by two Spanish epidemic clades, named SEC7 and SEC8. Seven other Spanish epidemic clades were detected but their role was minor, probably because they were introduced relatively close to the lockdown and had no opportunities for a rapid exponential expansion as the
initial two clades had. In contrast to other European countries, these SECs belong to early lineages in the epidemic (A in Pangolin, 19B in NextStrain). We also show that the reproductive number, Re, of the most successful Spanish epidemic clades quickly declined after the implementation of lockdown measures and that these clades were completely absent from samples taken in July-September. Our results suggest that the most successful variants were those associated with earlier introductions but also that their success may depend on the synergy between superspreading events and high mobility. These results also show the effectiveness of lockdown measures in controlling the virus spread and eliminating established successful epidemic clusters from circulation.

SARS-CoV-2 was introduced multiple times from multiple sources

Our dataset consists of 2,170 sequences from Spain, collected under ethical approval, from 25th February to 22nd June, coinciding with the initial phases of the COVID-19 pandemic in the country (Figure 1a). The most populated Spanish regions were sampled, resulting in a dataset with sequences representing 16 of the 17 administrative regions into which the country is divided (Figure 1b). Of the 2,170 samples analyzed here, 1,962 (90.4%) were sequenced by the SeqCOVID consortium, while the remaining 208 were generated by independent laboratories and downloaded from GISAID [9] (Table S1). Spain displayed a particular viral population structure with a higher proportion of lineage A sequences compared to other European countries [16] (Figure 1c).
Strains from patients in Spain were more closely related to cases sequenced in China and were the most abundant during the first weeks of the Spanish epidemic. They were replaced by lineage B strains (Figure S1), which differ by at least 6-7 substitutions from lineage A and dominated the beginning of the pandemic in most European countries. In addition, we observed a heterogeneous distribution of the SARS-CoV-2 genetic diversity within Spain, both at the regional and local levels. For example, our analysis shows how viral diversity declined with geographic distance from a large urban area like Valencia (see Supplementary Notes).

Similarly to other countries [17,18], phylogenomic analyses suggest the existence of multiple independent entries of the virus into Spain. To identify possible introductions we inspected the placement of Spanish viral samples in a global phylogeny constructed with more than 30,000 sequences (Figure 1d). Given the low genetic diversity of the virus, particularly at the beginning of the epidemic, we found many instances in which a Spanish sample is genetically identical to other variants circulating in the rest of the world. Three different possibilities were considered for the phylogenetic position of Spanish sequences. A sequence was included in a 'candidate transmission cluster' when it was found in a monophyletic clade with other Spanish sequences; it was included in a 'zero distance' group when it grouped with other genetically identical Spanish sequences but also with other foreign sequences; and it was denoted as 'unique' when no matching sequence in the Spanish dataset was identified (see detailed definitions of the groups in Materials and Methods and in Figure S2).
We detected 224 'candidate transmission clusters' comprising 827 sequences (~40% of the Spanish samples); 30 'zero distance clusters', comprising 831 sequences; and 513 'unique' sequences (Figure S3). Next, we determined how many unique cases and clusters were compatible with an introduction before the general lockdown. We detected that 191 groups (165 'candidate transmission clusters' plus 26 'zero distance clusters') and 328 unique sequences met this criterion, representing at least 519 independent introductions (distribution of dates in Figure 1e). This is probably an underestimation of the total number of entries because the number of sequences analyzed is a small subset of the total notified cases (Figure 1a). Phylogenetic analysis suggests that the most likely introductions of cases with a clear phylogenetic link (see Methods) came from Italy, the Netherlands, England, and Austria (accounting for ~23%, ~20%, ~13% and 12% of the cases for which a likely country of origin can be inferred, respectively) (Figure 1f). The observation that more than half of the introduction events detected are unique sequences illustrates the heterogeneous outcome after an introduction, as some events resulted in large epidemiological clusters, and others disappeared leaving almost no trace. A clear example is the first described death in Spain, a patient for whom we generated a partial sequence and who was infected in Nepal but did not generate any identifiable secondary cases in our dataset.
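The lower bound of 519 introductions quoted above follows from simple bookkeeping over the three group types. The snippet below is only a minimal sketch of that arithmetic, not the consortium's pipeline: the class and attribute names are hypothetical, and it assumes that each pre-lockdown cluster or unique sequence counts as exactly one introduction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntroductionCounts:
    """Pre-lockdown groups as reported in the text (names are illustrative)."""

    candidate_clusters: int      # 'candidate transmission clusters'
    zero_distance_clusters: int  # 'zero distance clusters'
    unique_sequences: int        # 'unique' sequences

    @property
    def groups(self) -> int:
        # Clusters of either type compatible with a pre-lockdown introduction
        return self.candidate_clusters + self.zero_distance_clusters

    @property
    def lower_bound(self) -> int:
        # Assumption: one introduction per cluster and per unique sequence
        return self.groups + self.unique_sequences

    @property
    def unique_share(self) -> float:
        # Fraction of detected introductions that left a single sequence
        return self.unique_sequences / self.lower_bound


pre_lockdown = IntroductionCounts(
    candidate_clusters=165,
    zero_distance_clusters=26,
    unique_sequences=328,
)

print(pre_lockdown.groups)                 # 191
print(pre_lockdown.lower_bound)            # 519
print(f"{pre_lockdown.unique_share:.1%}")  # 63.2%
```

Under these assumptions the unique sequences account for roughly 63% of the detected introductions, which is consistent with the statement that more than half of the introduction events are unique sequences.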
(https://microreact.org/showcase) loading the Data S1 files. The size of each pie chart correlates with the number of sequences collected in the corresponding area. Each color corresponds to a specific Pangolin lineage, as detailed in Figure S1 (light yellow and green correspond to lineage A, all the others are lineage B). c. Distribution of major SARS-CoV-2 clades during the first stages of the pandemic (before 1st April 2020), in those European countries with more than 50 sequences deposited in GISAID as of 13th November 2020. d. Global maximum likelihood phylogeny constructed with 32,416 sequences; placement of Spanish samples is indicated in red. e. New and accumulated introductions to Spain. Lower-bound introduction estimates were defined as the date of the likely infection of the first case in a cluster (14 days before symptom onset). f. Estimated international origin of SARS-CoV-2 introductions based on phylogenetic data; in color, those countries with a likely contribution larger than 10%.

A few genetic variants dominated the first wave in Spain

To identify those introductions that resulted in sustained transmission and were therefore epidemiologically successful in the long term, we scanned the phylogeny for larger clades mainly composed of Spanish samples (see Materials and Methods for criteria). We identified 9 Spanish Epidemic Clades (SEC) distributed across the phylogeny, representing 46% of the total Spanish dataset analyzed (995 out of 2,170 samples) (Figure 2a, Figure S4, Figure S5, Table S1, Table S2).
We first noticed that two SECs alone encompassed 30% and 10% of all Spanish samples (SEC8 and SEC7, respectively). This implies that the introduction of these two specific genetic variants explains a high proportion of the entire epidemic for the first wave in the country. In fact, they were responsible for 44% of the 'candidate transmission clusters' identified before the lockdown (Figure 2b). We then estimated the time of introduction in Spain for the 9 SECs using a Bayesian approach (Table S2). As a conservative estimate we considered the time of introduction as any time between the age of the most recent common ancestor of the SEC and the date of the first Spanish sample (Figure 2c). Thus, we assume that the ancestor of the SEC was not necessarily in Spain.

Our analysis shows that the earlier the introduction, the larger the size of the SEC (Figure S6). The larger clades, SEC7 and SEC8, were the first successful genetic variants introduced into Spain, during late January to February (Figure 2b). Both belong to lineage A (Pangolin nomenclature) and partially explain the peculiar population structure in Spain relative to other European countries (Figure 1c). In addition, compared with other SECs, SEC7 and SEC8 were widely spread in the country, being present in at least 10 of the 17 administrative regions (Figure 2b), and had a mean pairwise geographic distance between samples of more than 300 km regardless of whether or not the Islas Canarias and Baleares are included (Figure S7). On the contrary, SECs that were introduced later were smaller and showed a narrower geographic spread.

Superspreading events and mobility were key for the success of SEC8

Why some genetic variants succeed over others cannot be answered solely from genomic sequence data. We must also take into account the epidemiological dynamics in the country.
There are data supporting a role for the 614G mutation in the spike protein in epidemiological success. However, SEC7 and SEC8 do not harbour the variant, explaining why 614G was less frequent during the first weeks of the epidemic in Spain than in other countries (Figure S8). In addition, the inspection of signature positions for both SECs did not lead to any likely genomic determinant of epidemiological success (Table S3). Alternatively, we have investigated linked epidemiological data from the earliest cases to shed light on the early success of SEC8.

In a first phase, SEC8 was introduced at least twice from Italy to the city of Valencia (Figure 3a). There is epidemiological evidence that both cases were infected in Italy, as they attended the Atalanta-Valencia Champions League football match on 19th February, and that one of them initiated a transmission chain upon returning to Valencia a few days later. This epidemiological link strongly suggests that the SEC8 genetic variant was imported from Italy. The timing of this introduction agrees with the estimated time of entry of SEC8 into Spain (Table S2). NextStrain tracking tools for viral spatial spread suggest additional SEC8-related early seedings in the Madrid, País Vasco, Andalucía, and La Rioja regions (Video S1), which may involve other countries, not necessarily Italy.
Given the lack of virus genetic differentiation and the scarce epidemiological information, there is no certainty on whether they resulted from independent introductions from abroad or from internal migrations of infected persons, although the simultaneous detection in different regions favours the first option. Most of these multiple introductions occurred during the second half of February, a period in which more than 11,000 daily entries of travelers from Italy were recorded.

In a second phase, SEC8 was fueled by superspreading events. Based on the topology of the phylogenetic tree (Figure 2a), there were multiple clades involving a large number of very closely related sequences (1-3 SNPs) (Figure 3a). Of special relevance was a funeral on 23rd February with attendees from the País Vasco and La Rioja regions, from which 25 samples were sequenced. Importantly, although they did not differ by more than 2 SNPs, these sequences are spread across the SEC8 phylogeny, suggesting the existence of many more non-sampled secondary cases across the country (Figure 3a). In a third phase, SEC8, after reaching high frequency locally, was redistributed across the country and in less than two weeks it reached a prevalence of 60% among the sequenced genomes (Figure 3b), being present in almost every region analysed. All these phases occurred between the first known diagnosed SEC8 case on 25th February (Table S2) and the lockdown on 14th March, highlighting the need for very early containment measures to stop the spread of SARS-CoV-2.

In the second half of March, Spain imposed a strict lockdown on non-essential services and movements. A Bayesian birth-death skyline analysis allowed us to evaluate the impact of the lockdown on the effective reproductive number (Re) of the most successful SECs. The analyses of SEC7 (Figure S9) and SEC8 (Figure 3c)

their origin are marked in the tree.
In red, cases imported from different events in Italy. In orange, secondary cases originated from one of the cases introduced from Italy (also marked with blue arrows). In purple, cases related to a large funeral in La Rioja. Green stars mark potential superspreading events of more than 10 sequences sharing at least one clade-defining SNP. b. Contribution of SEC8 to the total of samples sequenced over time.

identified in our analyses of SEC7 and SEC8 (Table S3). In fact, neither SEC7 nor SEC8 carries the 614G mutation in the spike protein, contrary to what is seen in most, but not all, lineage B variants (Figure S8). The 614G mutation has been associated with increased viral shedding compared to the ancestral 614D variant in laboratory conditions [26] and in transmission studies [27,28]. However, other studies cast doubt on its actual role in the epidemic [29], suggesting that its impact on epidemic transmission was minor, if any. In the case of Spain, 614G was not behind the initial success of the epidemic because SEC7 and particularly SEC8 were much more common than other genetic variants until the lockdown (10% and 30% of cases, respectively). On the contrary, founder events seem to have played an important role for these particular variants. Our analysis shows that they were the first variants introduced in the country and that at least SEC8 was linked to very early superspreading events that contributed to its success. However, an early introduction of lineage A variants also occurred in other European countries, where they did not take hold and were displaced by lineage B. Despite the early adoption of strong NPI measures, we hypothesize that epidemic control in the first wave in Spain was soon overwhelmed as compared to countries that controlled early outbreaks [13].
This was likely associated with a strict implementation of the case definition by the WHO, which allowed a stealth dispersion of the first introductions, but also with several superspreading events, which strongly favoured the establishment of the earliest variants arriving into the country. Spain implemented one of the strictest lockdowns in Europe, with high compliance from the population as tracked by mobility data [30]. The efficacy of NPI measures was evident a few weeks later and it was reflected in the almost complete elimination of SEC7 and SEC8 by the end of the first wave. The spread of new variants, represented by other SECs and more isolated cases, corresponded to a new phase in the epidemic at the national level, with much more limited mobility and social interactions, which prevented the establishment of large clusters and transmission chains except in high-risk settings such as nursing homes and long-term care facilities.

This study has several limitations. Although Spain is one of the countries contributing most to public repositories, our dataset only represents a small subset of confirmed cases that occurred in the first COVID-19 wave (1% of cases). Moreover, sampling across the country was heterogeneous and the representation of each region in the dataset was not always proportional to the incidence during the studied period.
Lack of genome data from countries with high disease burden, especially at the beginning of the pandemic, may have led to underestimating the total number of introductions and prevented a reliable identification of their likely sources based only on viral genome sequences. In addition, we did not have access to individual patient data for most cases. Despite these limitations, we have been able to investigate some of the key cases and events that ignited the epidemic in Spain. This allowed us to understand the origin and early spread of SEC8, which would not have been possible based only on genome data. But we have also shown that genetic data can be used to accurately estimate relevant epidemiological parameters such as Re and doubling times even when the proportion of sampling is low.

We believe that our results allow us to draw lessons for the control of this and future pandemics. First, we have shown how specific variants can be used to track the effectiveness of epidemic control measures. In February, the number of SEC8 cases was just a few dozen and yet it ended up accounting for 60% of the sequenced samples in the first weeks of March. Second, the closure of borders to countries with high incidence is relevant to reduce simultaneous and multiple imports of the virus, but its efficacy depends on the inward incidence of the disease [31]. The most successful SECs during the first wave were probably those that arrived early, multiple times, and to diverse locations. Thus, as suggested elsewhere, founder effects are important for the success of certain variants. Third, SEC7 and SEC8 extended across Spain in a matter of days. Controlling mobility is essential when the level of community transmission is high, as demonstrated by the significant decrease in Re for these high-transmission genetic variants after the lockdown.
As a comparison, before the lockdown Re values were 50% higher in Spain (3.3 for SEC8) than in Australia (1.63), and they underwent a reduction down to 7% of the original value (0.23) as a result of the containment measures, compared to 30% (0.48) in Australia [15]. From a public health perspective, our results add to the evidence that the success of specific genetic variants is fueled by superspreading events which rapidly increase the prevalence of the virus [32]. Subsequently, coupled with the high mobility of our connected world, a variant may end up dominating the epidemic in a geographic location. This is what occurred with SEC8 and what has been described at a local level in Boston [33]. In fact, we have recently described a new variant in Europe that is rapidly growing in several countries, which is also linked to initial superspreading events [34]. The conclusion is that early diagnosis and notification of cases would have helped the timely implementation of effective contact tracing that, coupled with earlier mobility closures and maybe tighter border control, would probably have delayed by a few days the expansion of genetic variants such as SEC8 during the early stages of the epidemic in Spain. Whether this might have changed the global shape of the epidemic in the country, or whether other genetic variants would have played the same role leading to a similar outcome, cannot be ascertained, but the comparison with other countries leads us to suspect that there would not have been many differences with them.

25. Phylogenetic analysis of SARS-CoV-2 diversity in Europe (Italy).
https://nextstrain.org/groups/neherlab/ncov/italy.

Supplementary Data
- Table S1: GISAID accession numbers for the 32914 sequences used in this study. The 'basal group' used for dating and the sequences representative of the pangolin lineages are marked for identification.
- Table S2: SEC characteristics and inferred origin time. The time of the most recent common ancestor (MRCA) of each SEC was estimated with a Bayesian molecular clock analysis. "MRCA date" indicates the median value for the age of the closest SEC MRCA. The 95% Highest Posterior Density (HPD) credibility interval for this value is provided. "SEC size" indicates the number of samples belonging to each SEC. The first Spanish collected sample within each SEC is also indicated; the date of infection is inferred as the time span between the oldest MRCA date and the first Spanish collected sample. The numbers of "candidate transmission clusters", "zero distance clusters" and "unique" sequences included in each SEC are given. "MRCA2" indicates the time of the previous ancestor to the MRCA, considering only nodes that display a posterior probability higher than 0.5. If we consider that the MRCAs were already in Spain, then the introductions into the country occurred between the MRCA2 and the MRCA dates.
- Figure S1: Abundance of the different Pangolin lineages in the dataset by epidemiological week (number of weeks since 2019-12-23) as plotted in Microreact.
- Figure S2: Examples of the different groups of sequences identified.
'Candidate transmission clusters' are groups of Spanish sequences that form a clade. 'Zero distance clusters' are groups of Spanish sequences that are at zero distance from each other. Finally, the 'unique' sequences are Spanish sequences that are more than 1 SNP away from any other Spanish sequence and that do not share a most recent common ancestor (MRCA) node with other Spanish sequences.
- Figure S3: Distribution of the different cluster/group sizes in Spanish samples.
- Figure S4: Number of international and Spanish sequences in each SEC.
- Figure S5: Phylogenetic location of each SEC in the global SARS-CoV-2 phylogeny. Sequences from Spain are coloured according to their SEC (as indicated in Figure 2) while international sequences remain in black.
- Figure S6: Time of the MRCA of each SEC plotted against the contribution of each SEC to the total number of samples in the Spanish dataset. We observed a significant correlation (ρ = -0.69, p-value = 0.03) between the time of the MRCA of each SEC and its size, estimated as the number of samples sequenced.
- Figure S7: Distribution of genetic (salmon) versus geographic (grey) distances for each pair of samples belonging to the same SEC.
- Figure S8: Distribution of sequences harbouring the 614G mutation (blue) versus the 614D variant (salmon, wild-type) in the S gene for the Spanish sequences in our dataset. In the

phylogenetic tree in the background is the maximum clade credibility tree from the BDSKY analysis, with the tips colored according to whether they were sampled before or after 20th March.
- Figure S10: Mean pairwise genetic distance (in number of SNPs) vs geographic distance between the largest cities (> 70k inhabitants) of the Comunidad Valenciana autonomous region.
- Figure S11: Left) Heatmap of genetic diversity for the province of Valencia; red colors indicate high diversity; blue colors indicate lower diversity. Genetic diversity has been measured as the number of base substitutions per site averaged over all sequence pairs within each municipality. Genetic diversity is largest near Valencia, the region's capital, and decreases with geographic distance from it. Right) All sequences included in our dataset from Comunidad Valenciana, coloured according to the pangolin lineage they belong to.

This work was funded by the Instituto de Salud Carlos III project COV20/00140, Spanish National Research Council project CSIC-COV19-021 and ERC StG 638553 to IC, and BFU2017-89594R to FGC. MC is supported by the Ramón y Cajal program from Ministerio de Ciencia and by grants RTI2018-094399-A-I00 and SEJI/2019/011. We [...]

Annex I - List of the SeqCOVID-SPAIN consortium members

Iñaki Comas (icomas@ibv.csic.es), Fernando González-Candelas (fernando.gonzalez@uv.es), Álvaro Chiner-Oms (achiner@ibv.csic.es), Manoli Torres-Puente (mtorres@ibv.csic.es), Inmaculada Gómez-Navarro (igomez@ibv [...] Lidia Ruiz-Roldán (lidiarroldan@gmail [...] María Alma Bracho (bracho_alm@gva.es), [...] martinez_lucpri@gva.es), Inmaculada Galán-Vendrell (galan_inm@gva.es [...] Griselda De Marco (demarco_gri@gva.es), Mireia Coscollá (mireia.coscolla@uv.es), Paula Ruiz-Rodríguez (ruizro5@alumni.uv.es), Giuseppe D'Auria (dauria_giu@gva.es), Francisco Javier Roig Sena (roig_fco@gva.es), Isabel Sanmartín (isanmartin@rjb.csic.es), Daniel García-Souto (danielgarciasouto@gmail [...] Jorge Rodríguez-Castro (jorge.rodriguez@usc.es), Martín Santamarina (martin.santamarina.garcia@usc.es), Nuria Rabella (nrabella@santpau.cat), José Manuel Azcona-Gutiérrez (jmazcona@riojasalud.es), Miriam Blasco-Alberdi (mblasco@riojasalud.es), Alfredo Mayor (alfredo.mayor@isglobal.org), Alberto L. García-Basteiro (alberto [...] Jon Sicilia (jsiciliamambrilla@gmail.com), Pilar Catalán (pcatalan.hgugm@salud.madrid.org), Julia Suárez (julia.suarez@iisgm.com), Patricia Muñoz (pmunoz@hggm.es), Cristina Muñoz-Cuevas (cristina.munozc@salud-juntaex.es), Guadalupe Rodríguez Rodríguez (guadalupe.rodriguez@salud-juntaex.es), Juan Alberola (juan [...] Antonio Rezusta (arezusta@unizar.es), Alexander Tristancho (aitristancho@salud.aragon.es), Ana Milagro Beamonte (amilagro@salud.aragon.es), Nieves Martínez Cameo [...] Elisa Martró (emartro@igtp.cat), Antoni E. Bordoy (aescalas@igtp.cat), Anna Not (anot@igtp.cat), Adrián Antuori (adrian [...] Sonia Algarate Cajo (sonialgarate@gmail.com), Jose Luis del Pozo (jdelpozo@unav.es), Cristián Castelló Abietar (crcaab@hotmail.com), Maitane Aranzamendi Zaldumbide (maitane.aranzamendizaldumbide@osakidetza.eus), Andrea Vergara (vergara@clinic.cat), David Posada (dposada@uvigo.es), Diana Valverde (dianaval@uvigo.es), Nuria Estévez (nuestevez@uvigo.es), Iria Fernández-Silva (irfernandez@uvigo.es), Loretta de Chiara (Ldechiara@uvigo.es), Pilar Gallego (mgallego@alumnos.uvigo.es), Nair Varela (nvarela@alumnos.uvigo.es), Ulises Gómez-Pinedo (ulisesalfonso.gomez@salud.madrid.org), Mónica Gozalo Margüello (monica.gozalo@scsalud.es), Mª Eliecer Cano García (meliecer.cano@scsalud.es), Andrés Canut-Blasco (andres.canutblasco@osakidetza.eus), Silvia Hernáez Crespo (silvia.hernaezcrespo@osakidetza.eus), María Luz Cordón Rodríguez (marialuzalbina.cordonrodriguez@osakidetza.eus), Mª Concepción Lecaroz Agara (mariaconcepcion.lecarozagara@osakidetza.eus), Carmen Gómez González (carmen.gomezgonzalez@osakidetza.eus), Amaia Aguirre Quiñonero (amaia.aguirrequiñonero@osakidetza.eus), Milagrosa Montes Ros (mariamilagrosa.montesros@osakidetza.eus), Luis Piñeiro Vázquez (luisdario.pineirovazquez@osakidetza.eus), Ane Sorrarain (ane.sorarrain@biodonostia.org), José María Marimón (josemaria.marimonortizdez@osakidetza.eus), Maria Dolores Gómez Ruiz (gomez [...] José María Navarro-Marí (josem.navarro.sspa@juntadeandalucia.es), Irene Pedrosa-Corral (irene.pedrosa.sspa@juntadeandalucia.es), Sara Luisa Sanbonmatsu-Gámez (saral.sanbonmatsu.sspa@juntadeandalucia.es), Maria Carmen Perez Gonzalez (mcpergon@gobiernodecanarias.org), Francisco Javier Chamizo López (fchalop@gobiernodecanarias.org), Eliseo Albert (eliseo.al.vi@gmail.com), Eva Pastor Boix (pastor_eva@gva.es), [...] Cascales Ramos (cascales_pal@gva.es), Begoña Fuster Escrivá (begona [...] María Dolores Ocete Mochón (ocete_mar@gva.es), Rafael Medina González (rafa.medina.gonzalez@gmail.com), Julia González-Cantó (juliagonzalez1992@hotmail.com), Inmaculada de Toro Peinado (inmadetoro@yahoo.es), María Concepción Mediavilla Gradolph (gradolphilla@hotmail [...] Encarnación Simarro Córdoba (mesimarro@sescam.jccm.es), Julia Lozano Serra (jlozanos@sescam.jccm.es), Mónica Parra Grande (monicaparra88@hotmail.com), Lorena Robles Fonseca (lrobles@sescam.jccm.es), Natalia Chueca (naisses@yahoo.es), Federico García (fegarcia@ugr.es), Cristina Gomez-Camarasa (gomezcamarasa@gmail.com), Ana Carvajal (ana.carvajal@unileon.es), Vicente Martín (vicente.martin@unileon.es), Héctor Argüello (hector.arguello@unileon.es), Amparo Farga Martí (farga_amp@gva.es), Rocío Trastoy (rocio.trastoy.pena@sergas.es), Gema Barbeito-Castiñeiras (gema.barbeito.castineiras@sergas.es), Amparo Coira (amparo.coira.nieto@sergas.es), [...] guirao@sergas.es), Anna Planas (anna.planas@iibb.csic.es), María Ángeles Marcos (mmarcos@clinic.cat), Manuel Segovia Hernández (msegovia@um.es), Antonio Moreno Docón (a.moreno@um.es), Juan Carlos Galan (juancarlos.galan@salud.madrid.org), Esther Viedma Moreno (viedmaesther@gmail.com), Jesús Mingorance (jesus.mingorance@idipaz.es), Jovita Fernández-Pinero (fpinero@inia.es), Elisa Rubio García (elrubio@clinic.cat), Aida Peiró-Mestres (aida.peiro@isglobal.org)