key: cord-0855263-c3ezmshe authors: Bartolini, Barbara; Rueca, Martina; Gruber, Cesare Ernesto Maria; Messina, Francesco; Carletti, Fabrizio; Giombini, Emanuela; Lalle, Eleonora; Bordi, Licia; Matusali, Giulia; Colavita, Francesca; Castilletti, Concetta; Vairo, Francesco; Ippolito, Giuseppe; Capobianchi, Maria Rosaria; Di Caro, Antonino title: SARS-CoV-2 Phylogenetic Analysis, Lazio Region, Italy, February–March 2020 date: 2020-08-03 journal: Emerg Infect Dis DOI: 10.3201/eid2608.201525 sha: 8b4db8223d6e8a4cb050f7246668aacb90878296 doc_id: 855263 cord_uid: c3ezmshe

We report phylogenetic and mutational analysis of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains from the Lazio region of Italy and provide information about the dynamics of virus spread. Data suggest effective containment of clade V strains, followed by multiple waves of clade G strains that were circulating widely in Europe.

We analyzed nasopharyngeal swab (n = 6) and bronchoalveolar lavage (n = 3) samples from 9 patients with COVID-19 to perform SARS-CoV-2 whole-genome reconstruction and mutational analysis. We collected samples in late February and early March 2020 (Table 1). At sampling time, all patients reported symptoms such as fever, sore throat, cough, or other respiratory symptoms. Two sequences were identical, so we included only 1 of them in the analysis, resulting in 8 total sequences. We named the sequences INMI3–INMI10 after the National Institute for Infectious Diseases, where they were detected, and analyzed them together with the previously published INMI1 and INMI2 (6), along with all sequences from Italy posted to the GISAID database by April 11, 2020. We performed next-generation sequencing (SARS-CoV-2 Panel) on the Ion Torrent platform (Thermo Fisher Scientific, https://www.thermofisher.com), using a shotgun approach for INMI3–4 and an amplicon approach for INMI5–10.
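The de-duplication step described above (9 samples, 2 identical sequences, 8 sequences analyzed) can be illustrated with a short Python sketch. This is not the authors' pipeline: the record format, the function name `deduplicate`, and the toy sequences are hypothetical, and the sketch assumes consensus genomes are compared as exact strings after upper-casing, with no alignment and no special treatment of ambiguous bases such as N.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (sequence name, consensus sequence); hypothetical format


def deduplicate(records: List[Record]) -> Tuple[List[Record], Dict[str, str]]:
    """Keep the first record for each distinct sequence.

    Returns the retained records (in input order) and a map from each
    dropped name to the retained name that has the identical sequence.
    Comparison is exact after upper-casing; ambiguous bases such as N
    are not treated specially.
    """
    kept: List[Record] = []
    first_name: Dict[str, str] = {}
    dropped: Dict[str, str] = {}
    for name, seq in records:
        key = seq.upper()
        if key in first_name:
            # Identical to a sequence already kept: record which one.
            dropped[name] = first_name[key]
        else:
            first_name[key] = name
            kept.append((name, seq))
    return kept, dropped


if __name__ == "__main__":
    # Toy placeholder sequences, not real genomes.
    toy = [("sample1", "ACGTACGT"), ("sample2", "ACGTACGA"), ("sample3", "acgtacgt")]
    kept, dropped = deduplicate(toy)
    print([name for name, _ in kept])  # ['sample1', 'sample2']
    print(dropped)  # {'sample3': 'sample1'}
```

Keeping a dropped-to-kept map, rather than silently discarding duplicates, preserves a record of which samples share a genome, which may matter when sample metadata differ.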
After quality control, we generated a median of 4.3 × 10⁷ reads for each shotgun sample and 1.5 × 10⁶ reads for each amplicon sample (range 7.5 × 10⁵ to 4.8 × 10⁷). The mean sequencing depth of SARS-CoV-2 ranged from 367-fold in INMI3 to 16,661-fold in INMI5. We submitted consensus sequences to GISAID. We used the proposed phylogenetic lineage classification (A. Rambaut …) … according to GISAID phylogenetics, as reported (6), and clade B2; this clade includes other sequences from EU countries but no additional sequences from Italy. All other INMI sequences cluster with the GISAID G clade and with the B1 clade; we focused subsequent analysis on clade B1 (Figure). The clade B1 INMI sequences are distributed in 2 main clusters, one including most of the northern Italy strains and the other including sequences mainly from central Italy. In particular, INMI4, which was epidemiologically linked to Bergamo (Lombardy region), clusters with sequences from central Italy (Abruzzo region). The other INMI sequences cluster with strains from northern Italy. Of note, in both clusters the sequences from Italy are intermixed with sequences from other EU countries; the same pattern appears in the broader phylogenetic analysis on GISAID, which includes more EU sequences. We identified 5 synonymous and 9 nonsynonymous substitutions distributed along the whole genome (Table 2). The number of amino acid substitutions per patient ranged from 4 to 7. The G clade-specific single-nucleotide polymorphism A23403G results in the amino acid change D614G in the S protein.
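The notation used here pairs a genome-level change (A23403G) with its protein-level effect (D614G). The mapping is codon arithmetic: subtract the gene start, divide by 3 for the codon number, and take the remainder for the position within the codon. The Python sketch below illustrates this for the S gene. It rests on assumptions not stated in the letter: the S coding sequence is taken to start at nt 21563 of Wuhan-Hu-1 (GenBank MN908947), and the caller supplies the reference codon (GAT for S codon 614 and CTT for S codon 5 in the example) because the sketch does not load the reference genome. The function names are hypothetical.

```python
from typing import Dict, Tuple

# Assumption: the S (spike) coding sequence starts at nt 21563 of the
# Wuhan-Hu-1 reference genome (GenBank MN908947); check the annotation.
S_GENE_START = 21563

# Standard genetic code, enumerated in TCAG order at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def locate_in_gene(genome_pos: int, gene_start: int) -> Tuple[int, int]:
    """Map a 1-based genome position to (1-based codon number, 0-based offset in codon)."""
    offset = genome_pos - gene_start
    if offset < 0:
        raise ValueError("position lies upstream of the gene start")
    return offset // 3 + 1, offset % 3


def annotate_snv(genome_pos: int, ref_base: str, alt_base: str,
                 ref_codon: str, gene_start: int = S_GENE_START) -> Tuple[str, str]:
    """Return (protein-level label, 'synonymous' or 'nonsynonymous').

    ref_codon is the reference codon containing genome_pos, supplied by
    the caller because this sketch does not read the reference genome.
    """
    codon_no, idx = locate_in_gene(genome_pos, gene_start)
    if ref_codon[idx] != ref_base:
        raise ValueError(f"reference base mismatch at codon {codon_no}")
    alt_codon = ref_codon[:idx] + alt_base + ref_codon[idx + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return f"{ref_aa}{codon_no}{alt_aa}", kind


if __name__ == "__main__":
    # Assumed reference codons: GAT at S codon 614, CTT at S codon 5.
    print(annotate_snv(23403, "A", "G", "GAT"))  # ('D614G', 'nonsynonymous')
    print(annotate_snv(21575, "C", "T", "CTT"))  # ('L5F', 'nonsynonymous')
```

The same arithmetic accounts for C21575T (L5F) in INMI7, a change at the first position of S codon 5. It applies only to genes read in a single frame; a gene containing a ribosomal frameshift, such as orf1ab, would need extra handling.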
We observed 1 additional mutation in the S protein, C21575T (L5F) in INMI7, which has been detected in a few other sequences in GISAID, interspersed among different non-G clades (M. Chiara et al., unpub. data, https://doi.org/10.1101/2020.03.30.016790). Its location in a marginal region of the gene and its sporadic distribution in different clades indicate repeated occurrence not followed by fixation, consistent with the absence of an evolutionary advantage. The S protein of SARS-CoV-2 is a chief determinant of host range and pathogenicity. The virion attaches to the cell membrane through binding of the S protein to the host ACE2 receptor (7). The D614G mutation, located in the putative S1–S2 junction region near the furin polybasic cleavage site (RRAR), might affect priming by host cell proteases; however, the real impact of this high-frequency mutation is unclear. The variants C241T, C3037T (located in the noncoding region), and C14408T (in open reading frame 1ab [orf1ab]) were present in all INMI3–INMI10 sequences. These mutations have been detected in several SARS-CoV-2 isolates throughout Europe and are characteristic of clade G (C. Yin, unpub. data). A nonsynonymous substitution, D3G in the membrane glycoprotein, was detected in 1 sequence (INMI9). In INMI4, we detected 3 nucleotide changes in 2 adjacent codons of the nucleocapsid (N) gene, in a highly variable region of the gene, resulting in 2 amino acid changes, R203K and G204R. The N protein, which forms the helical nucleocapsid, can elicit humoral and cell-mediated immune responses and has potential value in vaccine development. However, none of the observed mutations has so far been associated with changes in viral pathogenicity or transmissibility. The phylogenetic reconstruction we report suggests possible multiple introductions of SARS-CoV-2 into Italy, supporting previously reported analyses conducted on a more limited number of sequences (3–5).
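How 3 nucleotide changes in INMI4 yield only 2 amino acid changes (R203K and G204R) can be shown by grouping substitutions by codon before translating. The Python sketch below does this. The positions and codons are assumptions not given in this letter and should be checked against Table 2 and the reference annotation: the N coding sequence is assumed to start at nt 28274 of MN908947, the changes are assumed to be G28881A, G28882A, and G28883C, and the reference codons are assumed to be AGG (codon 203) and GGA (codon 204). The function name is hypothetical.

```python
from typing import Dict, List, Tuple

# Standard genetic code, enumerated in TCAG order at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

# Assumption: the N coding sequence starts at nt 28274 of MN908947.
N_GENE_START = 28274

Substitution = Tuple[int, str, str]  # (1-based genome position, ref base, alt base)


def codon_level_changes(subs: List[Substitution], ref_codons: Dict[int, str],
                        gene_start: int) -> List[str]:
    """Apply all substitutions falling in the same codon together and return
    the resulting amino acid changes; synonymous codons are omitted."""
    by_codon: Dict[int, List[Tuple[int, str, str]]] = {}
    for pos, ref, alt in subs:
        offset = pos - gene_start
        if offset < 0:
            raise ValueError(f"position {pos} lies upstream of the gene start")
        by_codon.setdefault(offset // 3 + 1, []).append((offset % 3, ref, alt))

    changes: List[str] = []
    for codon_no in sorted(by_codon):
        ref_codon = ref_codons[codon_no]
        alt_codon = list(ref_codon)
        for idx, ref, alt in by_codon[codon_no]:
            if ref_codon[idx] != ref:
                raise ValueError(f"reference base mismatch in codon {codon_no}")
            alt_codon[idx] = alt
        ref_aa = CODON_TABLE[ref_codon]
        alt_aa = CODON_TABLE["".join(alt_codon)]
        if ref_aa != alt_aa:
            changes.append(f"{ref_aa}{codon_no}{alt_aa}")
    return changes


if __name__ == "__main__":
    # Assumed triplet change GGG>AAC at nt 28881-28883.
    subs = [(28881, "G", "A"), (28882, "G", "A"), (28883, "G", "C")]
    ref_codons = {203: "AGG", 204: "GGA"}  # assumed reference codons
    print(codon_level_changes(subs, ref_codons, N_GENE_START))  # ['R203K', 'G204R']
```

Grouping by codon matters because a codon carrying 2 changes must be translated with both applied: here the change at the third position of codon 203 (AGG to AGA) would be synonymous on its own, and only the combined codon AAA gives lysine as the final residue.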
The phylogenetic analysis consistently places the strains described in this study in 2 distinct clusters in clade B1. No other sequence from Italy clusters in clade B2 (GISAID clade V), indicating the positive effect of containment measures established by health authorities in both Italy and China to limit viral transmission directly from China. The same measures were unable to contain subsequent waves of introductions into Italy of strains that were circulating widely in Europe, all clustering with clade B1. Including viral sequences from infections occurring in the Lazio region helps clarify the dynamics of virus circulation in Italy. A small number of mutations were detected in these strains, but their actual impact on the pathogenicity and transmissibility of SARS-CoV-2 remains to be determined. A limitation of our research is that only a portion of viral sequences, including those from Italy, had been published as of April 10, 2020; the phylogenetic analysis could change substantially as more sequences become available. Continued genomic surveillance is needed to improve monitoring and understanding of the current SARS-CoV-2 epidemic, which might help lessen the public health impact of COVID-19. Furthermore, increased sequencing capacity is necessary for contact tracing and enhanced surveillance.

References
1. Coronavirus disease 2019 (COVID-19) situation report-81.
2. European Centre for Disease Prevention and Control (ECDC).
3. A doubt of multiple introduction of SARS-CoV-2 in Italy: a preliminary overview on behalf of ISS COVID-19 Study Group.
4. Whole-genome and phylogenetic analysis of two SARS-CoV-2 strains isolated in Italy in …
5. Genomic characterization and phylogenetic analysis of SARS-CoV-2 in Italy.
6. Molecular characterization of SARS-CoV-2 from the first case of COVID-19 in Italy.
7. Characterization of spike glycoprotein of SARS-CoV-2 on virus entry and its immune cross-reactivity with SARS-CoV.

We thank the contributors of genome sequences of the newly emerging coronavirus (the originating and submitting laboratories) for sharing their sequences and other metadata through the GISAID Initiative, on which this research is based. We thank Salvatore Conti and Alessandro Albiero for their support in next-generation sequencing and analysis.

Dr. Bartolini is a senior scientist at the Microbiology Laboratory and Infectious Diseases Biorepository at the National Institute for Infectious Diseases "L. Spallanzani." Her primary research interests are next-generation sequencing and emerging and reemerging infections.

Table 2 (table body not recoverable from the extracted text). *Nucleotide positions refer to the Wuhan-Hu-1 reference genome (GenBank accession no. MN908947). Orf, open reading frame; Syn, synonymous substitution; UTR, untranslated region.