key: cord-0911326-jrf6l1cc authors: Micheli, Valeria; Rimoldi, Sara Giordana; Romeri, Francesca; Comandatore, Francesco; Mancon, Alessandro; Gigantiello, Anna; Perini, Matteo; Mileto, Davide; Pagani, Cristina; Lombardi, Alessandra; Gismondo, Maria Rita title: Geographical reconstruction of the SARS‐CoV‐2 outbreak in Lombardy (Italy) during the early phase date: 2020-08-20 journal: J Med Virol DOI: 10.1002/jmv.26447 sha: 2e20f7cc0f4a4f2f0ea1438110e865e82e0b9f9a doc_id: 911326 cord_uid: jrf6l1cc The first identification of autochthonous transmission of SARS‐CoV‐2 in Italy was documented by the Laboratory of Clinical Microbiology, Virology and Bioemergencies of L. Sacco Hospital (Milano, Italy) on 20th February 2020 in a 38‐year‐old male patient who had been diagnosed with pneumonia at the Codogno Hospital. Thereafter, Lombardy reported the highest prevalence of COVID‐19 cases in the country, especially in the Milano, Brescia and Bergamo provinces. The aim of this study was to assess the potential presence of different viral clusters in the six main Lombard provinces affected by COVID‐19, in order to highlight province‐dependent viral characteristics. A phylogenetic analysis was conducted on 20 full‐length genomes obtained from patients presenting to several Lombard hospitals from 20th February to 4th April 2020, aligned with 41 Italian viral genome assemblies available on the GISAID database as of 30th March 2020. Two main monophyletic clades, containing 8 and 53 isolates, respectively, were identified. Notably, the Bergamo isolates mapped inside the smaller clade, which harbours the D3G mutation in the M gene. The molecular clock analysis estimated that the clusters diverged approximately one month before the first patient was identified, supporting the hypothesis that different SARS‐CoV‐2 strains had spread worldwide at different times, but that their presence became evident only in late February, with the emergence of the Italian epidemic.
Therefore, this epidemiological reconstruction suggests that the initial circulation of the virus in Lombardy was ascribable to multiple introductions. The robustness of the phylogenetic reconstruction, however, will improve as more genomic sequences become available, allowing complete epidemiological surveillance. This article is protected by copyright. All rights reserved.
represented a public health concern in the past 20 years. 4, 5 The virus causing COVID-19 was classified as a betacoronavirus and, given its close relationship to SARS-CoV, it was renamed SARS-CoV-2. 6 In Europe, Italy is one of the most affected areas, accounting for more than 230,000 cases as of 5th June 2020. 7 Northern Italy has reported the highest prevalence in the country, especially in the Milano, Brescia and Bergamo provinces, which registered more than 23,000, 14,000 and 13,000 cases, 7 respectively, and an infection rate almost double (0.58% and 0.52%, respectively) that of the rest of Italy (0.32%). In particular, the Bergamo and Brescia provinces had to face a high percentage of severe clinical cases with a very high mortality rate.
The first identification of autochthonous transmission of SARS-CoV-2 in Italy was documented by the Laboratory of Clinical Microbiology, Virology and Bioemergencies of L. Sacco University Hospital (Milano, Italy) on 20th February 2020. The patient was a 38-year-old male who had been diagnosed with pneumonia at the Codogno Hospital and had no evident link with other COVID-19 cases. Soon afterwards, the laboratory received thousands of respiratory samples for confirmation of suspected COVID-19 from many regional institutions (Bergamo, Brescia, Cremona, Codogno, Lodi, Milano). Owing to the geographical distribution of these specimens, viral sequence data could give insight into SARS-CoV-2 molecular epidemiology and possible local virus introductions.
The aim of the present study was to assess the potential presence of different viral clusters in the six main Lombard provinces affected by COVID-19, in order to highlight province-dependent viral characteristics.
The study included SARS-CoV-2 positive samples collected at the Laboratory of Clinical considered. Since the laboratory received samples from other hospitals widely distributed across the region, patients' locations were highly variable: samples were therefore selected to maximize geographical representation and the detection of related diversity. Relevant clinical, demographic and geographical data were recorded.
Total RNA was extracted from 200 μL of sample and eluted in 100 μL using the QIAamp Viral RNA Mini Kit (Qiagen, Hilden, Germany), according to the manufacturer's instructions. Quality, quantity and purity of the genomic RNA were determined using a Qubit 4 fluorometer (Thermofisher Scientific Inc, Italy). cDNA was synthesized using ImProm-II™ Reverse Transcriptase (RT) and related reagents (Promega Corporation, Italy). The reaction mixture was prepared as follows: 6 μL of ImProm-II™ 5X Reaction Buffer, 1.8 μL of 5 mM Mg2+, 1 μL of 10 mM Random Primers, 1 μL of 5 mM dNTPs, 1 μL of RT, 0.5 μL of RNasin (40 U/μL) and 20 μL of purified RNA, for a total volume of 31.3 μL. Reverse transcription was performed under the following conditions: 37 °C for 45 min, then 80 °C for 5 min. cDNA was amplified using two SARS-CoV-2 specific primer sets, in order to cover the whole viral genome.
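As a quick check of the reverse-transcription mix listed above, the component volumes can be summed and compared with the stated total. This is only a minimal bookkeeping sketch: the dictionary name and keys are illustrative labels, and the volumes are copied from the text.

```python
# Component volumes (in microlitres) of the reverse-transcription mix,
# copied from the text; the variable name and keys are illustrative.
reaction_mix_ul = {
    "ImProm-II 5X Reaction Buffer": 6.0,
    "Mg2+ (5 mM)": 1.8,
    "Random Primers (10 mM)": 1.0,
    "dNTPs (5 mM)": 1.0,
    "Reverse Transcriptase": 1.0,
    "RNasin (40 U/uL)": 0.5,
    "purified RNA": 20.0,
}

# Sum the volumes and round to one decimal to avoid floating-point noise.
total_ul = round(sum(reaction_mix_ul.values()), 1)
print(f"Total reaction volume: {total_ul} uL")
```

The sum agrees with the 31.3 μL total reported for the reaction.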
cDNA was then digested, ligated to barcodes, purified and amplified again. A second purification was performed before dsDNA quantification on the Qubit 4 and library preparation on the One Touch™ 2 instrument (Thermofisher Scientific, Monza, Italy). The One Touch ES instrument (Thermofisher Scientific, Monza, Italy) was used for final enrichment and the Ion Torrent™ Personal Genome Machine™ (PGM) System (Thermofisher Scientific, Monza, Italy) for sequencing, following the manufacturer's instructions.
For each sample, the genome assembly was obtained using a mapping-based approach. Low-quality read bases were trimmed using Trimmomatic software, 8
A dataset of 41 genome assemblies of SARS-CoV-2 strains isolated in Italy between 20th February 2020 and 30th March 2020 was retrieved from the GISAID database (Table S1). A global dataset including these 41 GISAID genome assemblies and the 20 genome assemblies generated in this study was produced and aligned using MAFFT. 10 The low-quality regions at the extremities of the alignment were removed using Gblocks with default parameters. 11 The alignment was subjected to Maximum Likelihood phylogenetic analysis using RAxML. 12 The obtained tree was then analysed using
A total of 20 samples were sequenced and included in the phylogenetic analysis, each assigned a progressive ID of the form HSacco-N (from HSacco-2 to HSacco-21). All patients were resident in Lombardy, distributed across different provinces. In particular, the 'Milano' province group also includes patients hospitalized for non-COVID-19 disease, in whom hospital acquisition of SARS-CoV-2 was suspected; in addition, HSacco-20 was a nurse living in Bergamo and working at Lodi Hospital. The 20 SARS-CoV-2 genome assemblies are available on the GISAID database (Table 1).
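The idea behind removing the poorly covered extremities of the alignment can be illustrated with a small sketch. This is not the Gblocks algorithm, which selects conserved blocks along the whole alignment; it is a simplified stand-in that only trims leading and trailing columns in which too few sequences carry an unambiguous base. The function name `trim_alignment_ends` and the `min_fraction` parameter are hypothetical.

```python
def trim_alignment_ends(seqs, min_fraction=1.0):
    """Trim leading/trailing alignment columns with too few called bases.

    seqs: dict mapping sequence name -> aligned sequence (equal lengths).
    min_fraction: minimum fraction of sequences that must carry A, C, G or T
                  in a column for that column to count as well covered.
    Simplified illustration only; NOT the Gblocks block-selection method.
    """
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()

    def well_covered(col):
        called = sum(1 for s in seqs.values() if s[col].upper() in "ACGT")
        return called / len(seqs) >= min_fraction

    # Move the start forward past poorly covered leading columns.
    start = 0
    while start < length and not well_covered(start):
        start += 1
    # Move the end backward past poorly covered trailing columns.
    end = length
    while end > start and not well_covered(end - 1):
        end -= 1
    # Internal gaps are kept; only the extremities are removed.
    return {name: s[start:end] for name, s in seqs.items()}


# Toy alignment (made-up sequences, not study data).
toy = {
    "iso1": "--ACGTAC--",
    "iso2": "NNACGTACGT",
    "iso3": "GGACG-AC-N",
}
trimmed = trim_alignment_ends(toy)
```

In the toy alignment, the first two and last two columns are dropped because at least one sequence has a gap or `N` there, while the internal gap in `iso3` is preserved.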
The comparison of the marginal likelihoods of constant and exponential coalescent models under a log-normal relaxed clock showed that the best-fitting model was the exponential coalescent prior (PS BF exponential growth vs. constant = 2.42; SS BF exponential growth vs. constant = 2.39). The obtained phylogenetic tree is reported in Figure 1. The tree shows the existence of two major monophyletic clades, containing 8 and 53 isolates, respectively (in green and blue, respectively, in Figure 1).
Lombardy had the highest prevalence of COVID-19 in Italy, making it the likely epicentre of the national outbreak. 7 However, it is still unclear how SARS-CoV-2 circulation started: no epidemiological link was found for the first identified autochthonous patient. Moreover, emerging evidence suggests multiple virus introductions at least in January 2020. 17, 18
These are key factors in patient management, and any variation can strongly affect drugs, vaccines and diagnostic tools or be related to a more severe clinical presentation. In conclusion, this study gave insights into the early dynamics of SARS-CoV-2 circulation in Italy, highlighting a peculiar localization of strains and supporting multiple virus introductions at least in January 2020.
Geographic map of Italy, with the regions from which isolates were collected, colored following the color code of the "Region" legend in figure a.
1. World Health Organization. Novel Coronavirus (2019-nCoV), Situation report-1, 21
2. A pneumonia outbreak associated with a new coronavirus of probable bat origin
3. World Health Organization. WHO Director-General's opening remarks at the media briefing on COVID-19 -11
4. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
5. SARS and MERS: recent insights into emerging coronaviruses
6. Coronaviridae Study Group of the International Committee on Taxonomy of Viruses. The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2
7. COVID-19 Italia - Monitoraggio della situazione
8. Trimmomatic: a flexible trimmer for Illumina sequence data
9. A framework for variation discovery and genotyping using next-generation DNA sequencing data
10. Parallelization of MAFFT for large-scale multiple sequence alignments
11. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis
12. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies
13. Exploring the temporal structure of heterochronous sequences using TempEst
14. jModelTest 2: more models, new heuristics and parallel computing
15. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10
16. Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty
17. A doubt of multiple introduction of SARS-CoV-2 in Italy: A preliminary overview
18. Genomic characterization and phylogenetic analysis of SARS-CoV-2 in Italy
19. Emergence of genomic diversity and recurrent mutations in SARS-CoV-2; Infection
HSacco-17   hCoV19/Italy/Hsacco-17/2020   n/a            n/a            n/a   n/a   2020-04-04
HSacco-18   hCoV19/Italy/Hsacco-18/2020   Milano         Milano         n/a   n/a   2020-04-02
HSacco-19   hCoV19/Italy/Hsacco-19/2020   Codogno        Lodi           n/a   n/a   2020-02-20
HSacco-20   hCoV19/Italy/Hsacco-20/2020   Bergamo/Lodi   Bergamo/Lodi   n/a   n/a   2020-02-26
HSacco-21   hCoV19/Italy/Hsacco-21/2020   n/a            n/a            n/a   n/a   2020-04-01