key: cord-0707005-82x7gq5z
authors: Lai, A.; Bergna, A.; Caucci, S.; Clementi, N.; Vicenti, I.; Dragoni, F.; Cattelan, A. M.; Menzo, S.; Pan, A.; Callegaro, A.; Tagliabracci, A.; Caruso, A.; Caccuri, F.; Ronchiadin, S.; Balotta, C.; Zazzi, M.; Vaccher, E.; Clementi, M.; Galli, M.; Zehender, G.
title: Molecular tracing of SARS-CoV-2 in Italy in the first three months of the epidemic
date: 2020-07-07
DOI: 10.1101/2020.07.06.20147140
sha: 601a927ca43c0cd37b3bf09a0eb013491c223ddc
doc_id: 707005
cord_uid: 82x7gq5z
This study aimed to characterize, and to trace genomically by phylogenetic analysis, 59 new Italian SARS-CoV-2 isolates obtained from patients attending clinical centres in Northern and Central Italy up to the end of April 2020. All but one of the newly characterized genomes belonged to lineage B.1, the lineage most frequently identified in European countries, including Italy; the remaining sequence belonged to lineage B. A mean of 6 nucleotide substitutions per viral genome was observed, with no significant difference between synonymous and non-synonymous mutations, indicating genetic drift as a major source of viral evolution. Estimates of the time to the most recent common ancestor (tMRCA) confirmed that the epidemic probably originated between the end of January and the beginning of February, followed by a rapid increase in the number of infections between the end of February and mid-March. The effective reproduction number (Re) was estimated to be greater than 1 from early February and then increased, peaking at 2.3 in early March; this confirms that the virus was circulating before the first COVID-19 cases were documented. Continuous use of state-of-the-art methods for molecular surveillance is warranted to trace virus circulation and evolution and to inform effective prevention and containment of future SARS-CoV-2 outbreaks.
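The comparison of synonymous and non-synonymous mutations reported in the abstract rests on classifying each nucleotide substitution by whether it changes the encoded amino acid. The following is a minimal Python sketch of that classification, assuming two aligned, gap-free, in-frame coding sequences and the standard genetic code (NCBI translation table 1). The names `CODON_TABLE` and `classify_substitutions` are hypothetical; this is not the authors' pipeline, which used Datamonkey for the selection analyses.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated
# in TCAG order with the first position varying slowest; "*" marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitutions(reference: str, query: str) -> dict:
    """Count synonymous and non-synonymous single-nucleotide differences.

    Assumes (for illustration) that both sequences are aligned, of equal
    length, in frame, and contain only A, C, G and T.
    """
    if len(reference) != len(query) or len(reference) % 3 != 0:
        raise ValueError("sequences must be aligned, equal-length and in frame")
    counts = {"synonymous": 0, "non-synonymous": 0}
    for start in range(0, len(reference), 3):
        ref_codon = reference[start:start + 3].upper()
        query_codon = query[start:start + 3].upper()
        for offset in range(3):
            if ref_codon[offset] == query_codon[offset]:
                continue
            # Each change is judged on its own against the reference codon,
            # a simplification when one codon carries several changes.
            mutant = ref_codon[:offset] + query_codon[offset] + ref_codon[offset + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[ref_codon]:
                counts["synonymous"] += 1
            else:
                counts["non-synonymous"] += 1
    return counts


# Toy example: GGT (Gly) -> GTT (Val) changes the amino acid,
# while AGA (Arg) -> AGG (Arg) does not.
print(classify_substitutions("GGTAGA", "GTTAGG"))
# {'synonymous': 1, 'non-synonymous': 1}
```

Formal selection tests such as MEME, FEL and FUBAR also account for how many synonymous and non-synonymous sites a sequence offers, so raw counts like these are only a first look.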
Italy is one of the European countries affected earliest and most severely by the COVID-19 pandemic (https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6). The first autochthonous cases of Coronavirus Disease 2019 (COVID-19) were observed from February 21, 2020 in Codogno (Lodi province), leading on February 22, 2020 to the establishment of a 'red zone' encompassing 11 municipalities to contain the epidemic. Shortly thereafter, it became evident that the epidemic had already involved a large part of the Lombardy region, and it then spread to neighbouring regions and, to a substantially lesser extent, to the rest of the country. On
(Table S1). All of the data used in this study were previously anonymised as required by the Italian Data
This version posted July 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
The RDP5 software was used to investigate the presence of potential recombination [9]. All of the genes were tested for selection pressure using Datamonkey (https://www.datamonkey.org/). The simplest evolutionary model best fitting the sequence data was selected using jModelTest
The MCMC analysis was run until convergence, sampling every 10,000 generations. Convergence was assessed by estimating the effective sample size (ESS) after a 10% burn-in using Tracer v.
Because the samples were collected over a short period of time, a "birth-death contemporary" model was used.
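The convergence check described above (ESS after a 10% burn-in) can be illustrated with a simple autocorrelation-based estimator. This is a minimal Python sketch, assuming a single parameter trace sampled at regular intervals; the truncation rule (stop summing at the first non-positive autocorrelation) is a common simplification and is not claimed to match Tracer's exact algorithm. The function name `effective_sample_size` is hypothetical.

```python
import random


def effective_sample_size(trace, burn_in=0.10):
    """Estimate the effective sample size (ESS) of one MCMC parameter trace.

    The first `burn_in` fraction of samples is discarded, then
    ESS = n / tau with tau = 1 + 2 * (sum of lag-k autocorrelations),
    summed until the first non-positive autocorrelation (a simple
    truncation rule assumed here for illustration).
    """
    samples = list(trace)
    kept = samples[int(len(samples) * burn_in):]
    n = len(kept)
    if n < 2:
        return float(n)
    mean = sum(kept) / n
    centred = [x - mean for x in kept]
    variance = sum(c * c for c in centred) / n
    if variance == 0.0:
        return float(n)  # degenerate constant trace: report n by convention
    tau = 1.0
    for lag in range(1, n):
        # Lag-k sample autocorrelation of the post-burn-in trace.
        rho = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / (n * variance)
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return n / tau


if __name__ == "__main__":
    random.seed(1)
    independent = [random.gauss(0.0, 1.0) for _ in range(2000)]
    correlated, x = [], 0.0
    for _ in range(2000):
        x = 0.95 * x + random.gauss(0.0, 1.0)  # strongly autocorrelated AR(1) chain
        correlated.append(x)
    # The autocorrelated chain yields a far smaller ESS from the same number of draws.
    print(round(effective_sample_size(independent)), round(effective_sample_size(correlated)))
```

Under the ESS > 200 criterion used in this study, a parameter would be accepted only if an estimate of this kind exceeded 200 after burn-in; thinning the chain (here, storing every 10,000th generation) reduces both autocorrelation between stored samples and output size.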
The analyses were based on the previously selected HKY substitution model and the
The MCMC analyses were run for 100 million generations and sampled every 10,000 steps. Convergence was assessed on the basis of ESS values (ESS > 200). Uncertainty in the estimates was indicated by 95% highest posterior density (95% HPD) intervals. The mean growth rate was calculated from the birth and recovery rates (r = λ - δ), and the
Considering the Italian isolates, only one site was found to be under significant selective pressure by three different methods (MEME, FEL and FUBAR): site 1,046 in the S gene, which was mutated in three isolates from Padua. This G1046V mutation is located in the S2 subunit, between heptad repeats 1 and 2. The R203K and G204R mutations in the N gene were always detected together. These mutations appear to interrupt a serine-arginine (S-R) dipeptide by introducing a lysine between the two residues, potentially affecting the structure and function of the mutated N protein. Fifty-two sequences in our dataset carried these mutations, including 11 of the 59 newly characterized whole genomes: six from Tuscany, four from Milan and one from Padua.
https://doi.org/10.1101/2020.07.06.20147140 doi: medRxiv preprint
(Table S3).
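The growth-rate relation r = λ - δ stated above can be made concrete with a small calculation. This is a minimal Python sketch, assuming a constant-rate birth-death process in which λ is the transmission (birth) rate and δ the rate at which infected individuals become non-infectious. The relations Re = λ / δ and doubling time = ln 2 / r are standard for this simple model and are added here for illustration only; the function name `birth_death_summary` and the rates in the example are hypothetical, not estimates from this study.

```python
import math


def birth_death_summary(birth_rate: float, become_uninfectious_rate: float) -> dict:
    """Summarize a constant-rate birth-death epidemic model.

    birth_rate (lambda) and become_uninfectious_rate (delta) must share the
    same time unit; the returned growth rate and doubling time use that unit.
    """
    if birth_rate <= 0 or become_uninfectious_rate <= 0:
        raise ValueError("rates must be positive")
    growth_rate = birth_rate - become_uninfectious_rate  # r = lambda - delta
    reproduction_number = birth_rate / become_uninfectious_rate  # Re = lambda / delta
    # Doubling time is only defined for a growing epidemic (r > 0).
    doubling_time = math.log(2) / growth_rate if growth_rate > 0 else math.inf
    return {
        "growth_rate": growth_rate,
        "Re": reproduction_number,
        "doubling_time": doubling_time,
    }


# Hypothetical rates per day, chosen only for illustration.
summary = birth_death_summary(0.25, 0.125)
print(summary["growth_rate"], summary["Re"], round(summary["doubling_time"], 2))
# 0.125 2.0 5.55
```

Under this model, Re > 1 exactly when λ > δ, that is, when r > 0; an Re above 1 therefore corresponds to a growing number of infections.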
The mean tMRCA of the tree root (
The Bayesian tree of the Italian sequences showed 15 small, significant subclades, each including two to ten isolates (Figure 2).
The Bayesian birth-death skyline plot of the Re estimates (with 95% HPD), using a single R group (corresponding to R0), estimated a mean value of 2.25 (1.5-3.1). Figure 4 (panels a and b) shows the changes in Re since the origin of the epidemic and suggests that Re was higher than 1 from the early
suspended flights from China.
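The interval reported with the single-group R0 estimate (a mean of 2.25 with 1.5-3.1) is, per the methods, a 95% HPD interval. A common sample-based way to obtain such an interval is to take the shortest window that contains 95% of the sorted posterior samples. The Python sketch below illustrates that idea, assuming a unimodal posterior; `hpd_interval` is a hypothetical name, and its output is not claimed to reproduce BEAST or Tracer exactly.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the posterior samples.

    Assumes a unimodal posterior; for multimodal posteriors a single
    interval can hide gaps between modes.
    """
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("no samples")
    width = min(n, max(1, math.ceil(mass * n)))  # number of samples inside the interval
    # Slide a window of `width` sorted samples and keep the narrowest one.
    start = min(
        range(n - width + 1),
        key=lambda i: ordered[i + width - 1] - ordered[i],
    )
    return ordered[start], ordered[start + width - 1]


# Toy example: the narrowest window holding 4 of the 5 samples
# excludes the outlying value 100.
print(hpd_interval([1, 2, 3, 4, 100], mass=0.8))  # (1, 4)
```

Unlike an equal-tailed interval, an HPD interval need not be symmetric around the mean, which suits skewed posteriors such as those often obtained for reproduction numbers.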
The couple landed at Milan airport and travelled to other locations in Northern and Central Italy before the onset of symptoms requiring hospitalization in Rome, but they had not travelled to Padua. Thus, the origin of this strain remains unexplained, and further investigations are underway to evaluate whether it may have played a role in causing an epidemic, at least locally. It would also be interesting to investigate whether the currently predominant strain had, for some reason, greater epidemic potential than the initial strain, or whether the spread of the latter was limited by random factors. In conclusion, our data show the importance of molecular and phylogenetic evolutionary reconstruction in the surveillance of emerging infections. Of note, it appears that the outbreak in
Genomic characterization and phylogenetic analysis of SARS-COV-2 in Italy
Molecular characterization of SARS-CoV-2 from the first case of COVID-19 in
The New England Journal of Medicine 2020
Spike mutation pipeline reveals the emergence of a more transmissible form of SARS-CoV
Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data
MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability
Molecular Evolutionary Genetics Analysis across Computing Platforms
RDP4: Detection and analysis of recombination patterns in virus genomes
RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies
A dynamic nomenclature proposal for SARS-CoV-2 to assist genomic epidemiology
Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10
Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen)
Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty
Bayesian selection of continuous-time Markov chain evolutionary models
Birth-death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV)
Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia
Comparative population dynamics of HIV-1 C: subtype-specific differences in patterns of epidemic growth
The reproductive number of COVID-19 is higher compared to SARS coronavirus
Assessment of the SARS-CoV-2 basic reproduction number, R0, based on the early phase of COVID-19 outbreak in Italy
Spread and dynamics of the COVID-19 epidemic in Italy: Effects of emergency containment measures
Monitoring transmissibility and mortality of COVID-19 in Europe
Early phylogenetic estimate of the effective reproduction number of SARS-CoV-2
SARS-CoV-2 seroprevalence trends in healthy blood donors during the COVID-19 Milan