key: cord-0794466-z5hle1hm authors: Amato, Laura; Candeloro, Luca; Di Girolamo, Arturo; Savini, Lara; Puglia, Ilaria; Marcacci, Maurilia; Caporale, Marialuigia; Mangone, Iolanda; Cammà, Cesare; Conte, Annamaria; Torzi, Giuseppe; Mancinelli, Adamo; Di Giallonardo, Francesca; Lorusso, Alessio; Migliorati, Giacomo; Schael, Thomas; D’Alterio, Nicola; Calistri, Paolo title: Epidemiological and genomic findings of the first documented Italian outbreak of SARS-CoV-2 Alpha variant date: 2022-05-13 journal: Epidemics DOI: 10.1016/j.epidem.2022.100578 sha: a20af29844dd37a85d732411f4beb3642189b085 doc_id: 794466 cord_uid: z5hle1hm From 24 December 2020 to 8 February 2021, 163 cases of SARS-CoV-2 Alpha variant were identified in Chieti province, Abruzzo region. Epidemiological information allowed the identification of 14 epi-clusters. With one exception, all the epi-clusters were linked to the town of Guardiagrele: 149 contacts formed the network, two-thirds of which were referred to the family/friends context. Real data were then used to estimate transmission parameters: according to our method, the calculated Re(t) was higher than 2 before the 12 December 2020. Similar values were obtained from other studies considering Alpha variant. However, results should be critically interpreted in light of the epidemiological situation and control measures in place in the studied area at that particular time. Italian sequence data were combined with a random subset of sequences obtained from GISAID database. Genomic analysis showed close similarity between the sequences from Guardiagrele, forming one distinct clade. This would suggest one or limited unspecified introductions from outside to Abruzzo region in early December 2020, which led to the diffusion of Alpha in Guardiagrele and in neighbouring municipalities, with very limited inter-regional mixing. 
Since the beginning of the pandemic, the Istituto Zooprofilattico Sperimentale of Abruzzo and Molise regions (IZSAM) has been appointed by the Italian Ministry of Health to support the molecular diagnosis, genomic characterization and phylogenetic analysis of SARS-CoV-2. In addition, IZSAM has been providing the Local Health Authorities of the Abruzzo region with epidemiological support when needed (Danzetta et al., 2020; Di Giallonardo et al., 2020; Lorusso et al., 2020). From 20 December 2020 onward, IZSAM started to regularly perform Next Generation Sequencing (NGS) on SARS-CoV-2 PCR-positive swabs collected in Abruzzo since the second half of the month. On 24 December 2020, the SARS-CoV-2 Alpha variant was identified for the first time in four samples collected about one week earlier in Guardiagrele (42°11'26.7"N, 14°13'15.3"E), a town of nearly 9,000 inhabitants in the hinterland of the Chieti province.

The testing and sequencing of suspected SARS-CoV-2 cases and traced contacts in the Abruzzo region was conducted within the official surveillance program established by the Italian health authorities, and is exempt from ethical approval. Nasopharyngeal swabs were collected from individuals showing SARS-CoV-2 clinical signs, either hospitalized or not, or from individuals screened in the framework of contact-tracing activities or monitoring programs for employees of the national health care system (Servizio Sanitario Nazionale, SSN). Detection of SARS-CoV-2 RNA in the swabs followed the laboratory procedure and diagnostic methods described in Lorusso et al. The workflow for SARS-CoV-2 RNA detection followed two main steps: viral inactivation (PrimeStore® MTM), carried out in a BSL3 biocontainment laboratory, and RNA detection by the TaqMan™ 2019-nCoV Assay Kit v2 (Thermo Fisher, qPCR), targeting three different portions of the SARS-CoV-2 genome located in the replicase, S and N protein-encoding genes.
Selected positive samples showing threshold cycle (Ct) values less than or equal to 25 were further processed by NGS in order to obtain the whole genome sequence of the occurring strains. Genome sequencing was performed as described in Di Giallonardo et al. (2020). SARS-CoV-2 lineages were assigned to each sequence using the Pangolin COVID-19 Lineage Assigner tool v2.0.7 (Github, 2021). Sequences, once produced, were immediately shared with the GISAID database.

Alpha SARS-CoV-2 confirmed cases detected from 1 December 2020 to 18 January 2021 in the Chieti province were first considered in the cluster analysis. When genome sequencing was not possible (i.e. high PCR Ct values), spike gene target failure (SGTF) was taken as a proxy of the Alpha variant. SGTF is defined as any test with Ct < 30 for the ORF1ab and N targets but no detectable S gene (Davies et al., 2021b). The presence of SGTF and a link to one or more Alpha sequenced cases were used for assigning these individuals to the Alpha cluster. If PCR results were not available (i.e. the sample was processed by a different laboratory), the link with the Alpha confirmed case was individually evaluated on the basis of the robustness of the epidemiological information.

Data on confirmed SARS-CoV-2 cases in Chieti province are routinely collected and stored in an electronic database by the Local Health Authority. Personal data (i.e. name, surname, address, municipality, individual fiscal code, date of birth) and epidemiological information (i.e. case unique identification number; date of onset of symptoms; date of sampling and date of diagnosis; date of start of health surveillance; date of recovery; any additional remarks such as workplace details, school attended, or any other relevant information; and, if known, in-contact case and relationship with the in-contact case) of the study population were retrieved from the Local Health Authority dataset on 8 February 2021.
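The case-assignment rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually coded by the authors: the function names, the use of `None` for an undetected target, and the returned labels are all hypothetical. The thresholds (Ct < 30 for ORF1ab and N, no detectable S gene) and the requirement of a link to a sequenced Alpha case come from the text; B.1.1.7 is the Pango lineage name of the Alpha variant.

```python
from typing import Optional

CT_SGTF_MAX = 30.0  # SGTF rule from the text: Ct < 30 for ORF1ab and N


def is_sgtf(ct_orf1ab: Optional[float], ct_n: Optional[float],
            ct_s: Optional[float]) -> bool:
    """True when ORF1ab and N are detected with Ct < 30 and S is not detected.

    An undetected target is represented as None (assumption of this sketch).
    """
    if ct_orf1ab is None or ct_n is None:
        return False
    return ct_orf1ab < CT_SGTF_MAX and ct_n < CT_SGTF_MAX and ct_s is None


def assign_alpha_status(lineage: Optional[str], pcr_available: bool,
                        ct_orf1ab: Optional[float], ct_n: Optional[float],
                        ct_s: Optional[float],
                        linked_to_alpha_case: bool) -> str:
    """Hypothetical labelling of a case following the rules in the text."""
    if lineage is not None:
        # A sequenced sample is decided by its assigned lineage alone.
        return "confirmed" if lineage == "B.1.1.7" else "not assigned"
    if not linked_to_alpha_case:
        return "not assigned"
    if not pcr_available:
        # Sample processed elsewhere: the link is evaluated case by case.
        return "manual review"
    if is_sgtf(ct_orf1ab, ct_n, ct_s):
        return "suspected (SGTF)"
    return "not assigned"


if __name__ == "__main__":
    print(assign_alpha_status(None, True, 27.0, 28.0, None, True))
```

Keeping the SGTF test separate from the assignment step mirrors the text, where SGTF alone is not sufficient and an epidemiological link to a sequenced Alpha case is also required.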
Data quality was improved by means of clean-up, validation, and update of the original information. Tools and network analysis techniques were used to assist the data validation process and to rebuild the infectious transmission chains. Cases were grouped into clusters on the basis of epidemiological information (hereafter referred to as "epi-clusters"). Links between cases were categorised according to common environment: household/friends, work/occupational, school, health care structures, spatial proximity (households within 5 meters), or unknown. A relational spatial-temporal network dataset was created (by using visNetwork, dplyr and igraph libraries in R environment, version 1.4.1106). To identify all possible transmission chains, a trace-forward analysis was performed, using each case as seed and a time window of 66 days, starting from 1 December 2020. The network analysis was conducted to identify subnetwork structures providing a greater contribution to the transmission of the infection. Address and municipality information was used to find spatial coordinates through OpenStreetMap and geocoding procedures developed using the R package tidygeocoder (Cambon and Hernangómez, 2021). The detailed contact tracing procedure is described in the Supplementary Information.

Gender, age, presence of symptoms and fatality rate were evaluated. A two-tailed Mann-Whitney test was applied to evaluate differences between the age of symptomatic and asymptomatic cases. Data management was performed by using Microsoft Excel® (Microsoft Corporation, 2013) and Microsoft Access® (Microsoft Corporation, 2013). Statistical analysis was performed with XLSTAT (Statistical Software for Excel, version 2013.2.04; Addinsoft, 1995-2013).

The effective reproduction number Re(t) represents the number of expected secondary cases deriving from each primary case at time t.
Its calculation is used as an indicator of the epidemic trend or to evaluate the effectiveness of interventions. To estimate Re(t), it is essential to quantify the serial interval or the generation interval (Griffin et al., 2020). As the dataset of the Alpha cluster contained detailed information on the possible transmission chains, it was used for providing the incidence time series and the corresponding contact chains for the whole outbreak. A procedure to sample plausible exposure times within a reconstructed contact chain was developed. Exposure times, with assumed upper and lower bounds, were calculated starting from the index case of each epi-cluster, and the process was iterated for each epi-cluster. At the end of the iteration, 10,000 infection trees had been generated. The detailed procedure is described in the Supplementary Information.

The model described by Cori et al. (2013) for Re(t) estimation, integrated with the best practice procedures and considerations on Re(t) estimation proposed by Gostic et al. (2020), was applied to the obtained infection incidence and generation interval distributions. The average daily infection incidence was calculated across the set of generated trees. The generation interval was determined by first calculating, for each infector-infected pair, the median of the times across the generated trees. Then, a Bayesian Markov Chain Monte Carlo (MCMC) approach was used to estimate the distribution parameters, given the median generation intervals and assuming they were Gamma distributed (using the uninformative Uniform(0, 10) as prior for both shape and scale parameters). In this way, the uncertainty of the generation interval parameters is linked to the size of the chain rather than to the number of trees generated. The parameters of the final distribution were obtained from the median values of the posterior distributions of the parameters. The BayesTools library was used for the MCMC framework.
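To make the Gamma fitting step more concrete, the following is a minimal sketch of a random-walk Metropolis sampler for the shape and scale of a Gamma distribution with Uniform(0, 10) priors on both parameters, as stated above. It is written in plain Python and is not the BayesTools-based implementation used in the study; the proposal step size, the number of iterations, the starting values, the function names and the input intervals are all assumptions made for illustration.

```python
import math
import random
import statistics


def log_posterior(sum_log_x, sum_x, n, shape, scale, upper=10.0):
    """Gamma log-likelihood plus flat Uniform(0, upper) priors on both parameters."""
    if not (0.0 < shape < upper and 0.0 < scale < upper):
        return -math.inf  # outside the prior support
    return ((shape - 1.0) * sum_log_x - sum_x / scale
            - n * shape * math.log(scale) - n * math.lgamma(shape))


def metropolis_gamma(intervals, n_iter=20000, burn_in=5000, step=0.3, seed=1):
    """Return post burn-in (shape, scale) samples from a random-walk Metropolis chain."""
    rng = random.Random(seed)
    n = len(intervals)
    sum_log_x = sum(math.log(x) for x in intervals)
    sum_x = sum(intervals)
    shape, scale = 2.0, 2.0  # arbitrary starting point inside the prior support
    current = log_posterior(sum_log_x, sum_x, n, shape, scale)
    samples = []
    for i in range(n_iter):
        new_shape = shape + rng.gauss(0.0, step)
        new_scale = scale + rng.gauss(0.0, step)
        proposed = log_posterior(sum_log_x, sum_x, n, new_shape, new_scale)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            shape, scale, current = new_shape, new_scale, proposed
        if i >= burn_in:
            samples.append((shape, scale))
    return samples


if __name__ == "__main__":
    # Hypothetical median generation intervals (days), one per infector-infected pair.
    intervals = [2.1, 3.4, 4.0, 4.7, 5.2, 6.8, 3.9, 7.5, 1.8, 5.6, 4.4, 9.3]
    samples = metropolis_gamma(intervals)
    # Final parameters taken as posterior medians, as described in the text.
    shape_hat = statistics.median(s[0] for s in samples)
    scale_hat = statistics.median(s[1] for s in samples)
    print(shape_hat, scale_hat, shape_hat * scale_hat)
```

Because the likelihood is evaluated on one median interval per infector-infected pair, the width of the posterior depends on the number of pairs and not on how many trees were simulated, which is the property noted in the text.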
Re(t) calculation was carried out following two different approaches, one using knowledge of the infectious transmission chain and the other based on symptom onset data. The first approach consists of calculating Re(t) on simulated trees starting from the real infectious transmission chain observed. This method, therefore, makes it possible to determine Re(t) from the date of symptom onset and to simulate the exposure time. The distribution of the average case-reproduction number (the number of secondary cases per each case infected on day t) was used to estimate Re(t) from the 10,000 generated trees, similarly to what was done by Hens et al. (2012). For comparison with the first approach, we applied the model described by Cori et al. (2013) and implemented in the R package EpiEstim (Cori et al., 2019), which allows estimating Re(t) in a Bayesian framework from the incidence and the serial interval distribution. Gostic et al. (2020) and Knight and Mishra (2020) showed that using infection incidence and generation interval improves the accuracy of the estimations. Thus, estimations derived from generated trees were used to feed the Cori model and to obtain Re(t) on a sliding window of seven days.

A random subset of 1,500 Alpha variant sequences was obtained from the GISAID database (GISAID, 2021), covering a time span from 1 November 2020 to 10 March 2021. This 'background' data was combined with Italian sequence data (n=608, 18 December 2020 to 1 March 2021). An alignment was performed in MAFFT applying the L-INS-I algorithm and manually inspected for accuracy using Geneious Prime® 2021.1.1 (Biomatters, 2020). Sequences which did not cover the complete coding region or had >5% ambiguities were removed. The final alignment consisted of 608 Italian sequences and 1,363 global sequences. A P.1 lineage was used as an outgroup (EPI_ISL_833137).
A phylogenetic tree was estimated in IQ-TREE using the Hasegawa-Kishino-Yano nucleotide substitution model with a gamma distributed rate variation among sites (HKY+Γ) and an SH-like approximate likelihood ratio test for branch support (1000 replicates). All Italian sequences fell into one node containing 1,097 sequences (see Supplementary Information, Figure S10), which was used for subsequent analysis. Italian clusters with more than 10 Italian sequences were defined as branches with high node support (>80%) and less than 5% of global sequences.

From 24 December 2020 to 8 February 2021, 94 cases were confirmed by genome sequencing to be infected by the SARS-CoV-2 Alpha variant. An additional 69 individuals were suspected to be infected by Alpha because they were epidemiologically linked to one or more confirmed Alpha cases. Of the 69 suspected cases, 64 samples were not suitable for sequencing but showed SGTF. For the remaining five cases, information about PCR Ct was not available and they were considered only on the basis of strong epidemiological connections with one confirmed Alpha variant case. The overall study population comprised 163 cases: 85 women (52%) and 78 men (48%). The mean age was 46 years, and the most represented age group was 50-59 years old (32 cases, 19.63%). Overall, 76.69% of people (125/163) reported some SARS-CoV-2 clinical symptoms, while 38 cases (23%) were asymptomatic. The median ages of symptomatic (51 years, IQR 29 years) and asymptomatic cases (45 years, IQR 42.75 years) were not significantly different (p-value = 0.137, two-tailed Mann-Whitney test). Two fatalities were recorded within the study population (2/163, 1.23%), one man and one woman, of 63 and 41 years old, respectively.

Two large epi-clusters (C1 and C7) were documented. C1, which consists of 48 cases, was the first to occur. The acknowledged index case reported to the Local Health Authority an unspecified contact with people coming from outside of the region.
C7 is the largest identified epi-cluster and it was linked to a medical centre. Among the 53 cases of C7, 15 were inpatients of the centre, 15 workers (health personnel and cleaning workers) and 19 related family members and friends. C3-C6 and C8-C14 were mostly family-related epi-clusters (from 2 to a maximum of 10 cases each). C8 and C3 also included work-related transmission links. It is noteworthy that the C2 epi-cluster, apparently unrelated to any other cluster, was directly caused by the return from the UK of a person who tested positive on a nasopharyngeal swab after the arrival at

The average and the standard deviation of the median generation intervals (for each infector-infected person pair over the 10,000 trees) were 4.7 days and 2.9 days, respectively. The infection incidence curve estimated from the generated trees is shown in Figure 3A. The overall period considered ranges from 4 December 2020 to 17 January 2021, i.e. it starts eight days before the onset of the first symptoms (12 December) and ends one day before the date of the last positive test. The curves of the Re(t) values estimated according to the model described by Cori et al. (2013) and from the distribution of the average case-reproduction number (the number of secondary cases per each case infected on day t) are reported in Figure 3B. Although in the initial growth phase, i.e. during the first 10 days of the epidemic, the case Re(t) shows greater dispersion and higher values than the Cori Re(t) distribution, the two Re(t) curves show a similar trend, and both reach the value 1 when the infection incidence peaks (maximum of 9 cases on 27 December).

All Italian Alpha sequences fell within one node in the global phylogeny. Sequences from infections in Abruzzo were overrepresented (61%, n=372), followed by those from Molise (16%, n=99) and Campania (12%, n=70).
The phylogeny reveals limited mixing between the sequences detected in Italy and those abroad (Supplementary Information, Figure S10). We identified six larger clades with more than 10 Italian sequences and node support >80%. These contained 26, 129, 19, 383, and 14 sequences, respectively (Figure 4A). Within these phylogenetic clades most infections were sampled within one region only, but infections from the same region were scattered across different clades (Figure 4A). We found two clades with infections from Campania (clades 1 and 5), two clades with infections from Molise (clades 2 and 3), and also two clades with infections from Abruzzo (clades 2 and 5). This suggests multiple introductions of Alpha into different regions with very limited subsequent inter-region transmission. Notably, the four sequences linked to the C2 epi-cluster were not linked to any of these six clades but instead formed independent branches on the phylogeny. The latter was further subdivided into Guardiagrele (which is part of Chieti) and all other towns in this province. Within this clade, 55% (n=188) of Abruzzo sequences were from Chieti, and an additional 17% (n=59) were from Guardiagrele only (Figure 4C). Notably, the first infection was sampled in Guardiagrele on 18 December and only three days later in Chieti (21 December), then on 28 December in L'Aquila, on 29 December in Pescara, and lastly on 30 December in Teramo. We could not determine the exact number of within- and between-province transmission events due to low node support. However, the phylogeny does not exclude between-province transmission, as sequences from all four provinces are scattered across the phylogeny, although such mixing was limited.

Notably, social distancing measures were already in place in the Abruzzo region when the outbreak in Guardiagrele started.
The region was classified as "red area" from mid-November 2020 to 7 December and from 23 December until 7 January 2021, and as "orange area" during the period in between. Red is the highest level of alert and entails a ban on the movement of people, even within municipalities, except for urgent needs related to health or necessary work (e.g. linked to the food supply production chain, health workers, etc.). The measures foreseen in the orange area include a ban on movement across regions, except for urgent or working reasons. However, the restrictions imposed by the partial lockdown did not appear able to stop the spread of the Alpha variant in a context like that of Guardiagrele, where at least two thirds of the transmission occurred in family settings.

When the period of time before the official confirmation of the first cases is taken into consideration (before 12 December 2020), the Re(t) calculated according to the model of Cori et al. (2013) was higher than 2 (Figure 3B). A systematic review of 29 studies on the reproductive number for SARS-CoV-2 transmission showed an overall estimation of 2.87 (95% CI: 2.39-3.44), under several different conditions, with the highest values reported for the Diamond Princess Cruise Ship in Japan (14.8), followed by some country-level estimations, such as those reported for France (6.32, 95% CI: 5.72-6.99), Germany (6.07, 95% CI: 5.51-6.69) and Spain (3.56, 95% CI: 1.62-7.82) (Billah et al., 2020). These values seem clearly higher than our estimation. However, when the Alpha variant is considered and studies more similar to ours are taken into account, the difference appears smaller. Using the method described in Abbott et al. (2020) and in Sherratt et al. (2021), and implemented in the EpiNow2 R package (Abbott, 2020), Davies et al.
(2021a) estimated Re(t) values for the Alpha variant, from October 2020 to January 2021, for some regions of England, which varied from 1.01 to 1.04. In Switzerland, between 1 January and 17 January 2021, before the introduction of more severe control measures, the reproductive number for the Alpha variant was significantly above 1 (1.24 [1.07-1.41] and 1.46 [1.21-1.72] according to two different datasets). Nevertheless, considering the specific approach followed in our work, any comparison of the Re(t) values estimated by us with those reported in previously published papers should be carefully interpreted. In fact, our estimations are strictly linked to the epidemiological situation observed in the area under study during the time window considered, and are thus influenced by the control measures in place, people's behaviour and the level of immunity of the population.

In our approach, the median generation intervals across the set of generated trees were used to estimate the distribution parameters, assuming they were Gamma distributed. The uncertainty of the generation interval parameters is linked to the size of the chain rather than to the number of generated trees. However, this approach could be biased by the variance reduction of the estimated generation interval distribution. In the cluster observed in Guardiagrele, social distancing measures, strict trace back, testing and quarantine measures were in place, which probably contained and eventually interrupted the viral transmission. In addition, the lack of epidemiological links between some observed cases in Guardiagrele may have led to an underestimation of the real Re(t), and the assumptions made in the model (e.g. an incubation period of 1-15 days) may limit the comparability of our results with those of other authors. As mentioned, the available dataset did not include the full transmission chain.
Therefore, we decided to apply the EpiEstim method to cope with the incompleteness of the available epidemiological data (i.e., chain of transmission, introduction of the virus, case zero, etc.). We also attempted to identify a plausible chain of transmission, based on available data. As reported in the literature, many models can be applied for the calculation of Re(t) when the chain of transmission is unknown but the dates of clinical onset of symptoms are available (Abbott et al., 2020a, 2020b). One of the first to be described is the one proposed by Cori et al. (2013), later improved in the EpiEstim package (Cori et al., 2019). However, the main limits of Cori's approach have been reported by Gostic et al. (2020): the method is distorted by misspecification of the generation interval and by the lag between the incidence of infections and the incidence of symptoms. Although the limits of the method presented by Cori and colleagues have been overcome by other packages (Abbott, 2020), in the present work we applied the EpiEstim package also because, at the time of our study, EpiEstim was the standard method applied by the Italian National Institute of Health (ISS) when defining restriction areas/regions in Italy (Istituto Superiore di Sanità, 2021d). The proposed method aimed at temporally relating the estimated case reproduction number to infection dates; in our view, this approach is more useful than obtaining estimates based on symptom onset dates. In fact, in this second case, the impact of control measures would be observed with a temporal delay because of the incubation period distribution. Therefore, we think that our approach could be used to evaluate applied control measures in a timelier manner.
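For readers less familiar with the estimator of Cori et al. (2013) discussed above, the following plain-Python sketch shows its core calculation: the posterior mean of Re(t) over a sliding window is the prior shape plus the cases in the window, divided by the prior rate plus the total infectiousness in the window. It is not the EpiEstim code. The function names are hypothetical, the Gamma prior (shape 1, scale 5) is an assumption because the prior used in the study is not stated here, and the generation interval is discretised by simply normalising the Gamma density at whole days, which is cruder than the discretisation used by EpiEstim. The mean of 4.7 days and standard deviation of 2.9 days are borrowed from the results only to give the example realistic numbers, and the incidence series is invented.

```python
import math


def gamma_pdf(x, shape, scale):
    """Density of a Gamma(shape, scale) distribution at x > 0."""
    return (x ** (shape - 1.0) * math.exp(-x / scale)
            / (math.gamma(shape) * scale ** shape))


def discretised_interval(mean, sd, max_days=20):
    """Weights w_1..w_max_days: Gamma density at whole days, normalised to sum to 1."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    weights = [gamma_pdf(k, shape, scale) for k in range(1, max_days + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def cori_posterior_mean(incidence, w, window=7, prior_shape=1.0, prior_scale=5.0):
    """Posterior mean of Re(t) for each window ending on day t (0-based index)."""
    # Total infectiousness on day t: past incidence weighted by the interval.
    infectiousness = []
    for t in range(len(incidence)):
        infectiousness.append(sum(incidence[t - k] * w[k - 1]
                                  for k in range(1, min(t, len(w)) + 1)))
    estimates = {}
    for t in range(window, len(incidence)):
        days = range(t - window + 1, t + 1)
        cases = sum(incidence[s] for s in days)
        pressure = sum(infectiousness[s] for s in days)
        estimates[t] = (prior_shape + cases) / (1.0 / prior_scale + pressure)
    return estimates


if __name__ == "__main__":
    # Hypothetical daily infection incidence, for illustration only.
    incidence = [1, 1, 2, 3, 3, 5, 6, 8, 9, 8, 7, 7, 5, 4, 4, 3, 2, 2, 1, 1]
    w = discretised_interval(mean=4.7, sd=2.9)
    for day, value in cori_posterior_mean(incidence, w).items():
        print(day, round(value, 2))
```

Feeding this calculation with infection dates rather than symptom onset dates is what removes the incubation-period lag mentioned in the text; the arithmetic of the estimator itself is unchanged.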
Although a case-by-case comparison between genetic and epidemiological data cannot be done, since sequence data could be used only for 59 of 163 cases belonging to the Guardiagrele cluster, the results of the genetic analyses do not seem to contradict the main findings of the epidemiological investigations carried out. The phylogenetic analysis shows that the sequences obtained from the cases linked to this cluster are very closely related to each other, with limited mixing with other SARS-CoV-2 Alpha sequences from the same province (Figure 4). The sole exception is represented by the sequences belonging to the epi-cluster C2, which fall within a different node, thus confirming the different introduction pathway identified by the epidemiological investigations. In particular, the results of the genetic analyses do not seem to contradict the hypothesis, arising from the epidemiological investigations, that the observed cluster in Guardiagrele, with the exception of the C2 epi-cluster, might derive from a single introduction from outside that occurred in early December 2020. Although multiple introductions cannot be excluded, the close similarity of the genome sequences would suggest that a single common source of infection caused the SARS-CoV-2 cluster in Guardiagrele. The observed differences among the sequences might be consistent with local independent evolution during the outbreak, with no or limited re-introduction from outside the cluster. The limited mixing between the geographic regions investigated might also be due to the lockdown restrictions imposed and the reduced number of journeys allowed across the country. However, several factors may have influenced and partially distorted the results of our epidemiological investigations.
The lack of evidence of epidemiological connections among the observed epi-clusters may indicate the existence of multiple infection sources or, more coherently with the results of the genetic analyses, may be linked to the difficulty of identifying all possible social connections among cases and exposure opportunities, which can easily occur in a small town like Guardiagrele. Likewise, hesitancy in epidemiological interviews may have interfered with the identification of linkages within and between epi-clusters.

The methodological approach presented in this study allowed a more accurate estimation of Re(t) and other transmission parameters (generation interval and incubation period), using data from the epidemiological investigations carried out on a cluster of SARS-CoV-2 Alpha variant infection identified in a town of the Abruzzo region (Italy) in late December 2020. The whole genome sequencing of viral RNA present in the nasopharyngeal swabs of cases belonging to the outbreak under study partially confirmed that the virus belonged to the same phylogenetic node, suggesting that one or limited unspecified introductions from outside in early December 2020 led to the diffusion of the Alpha variant in Guardiagrele and in the neighbouring municipalities.

Figure 1. Network based on epidemiological data. Fourteen apparently unrelated epi-clusters were identified, with 163 cases (circles) and 149 transmission links (arrows). Epi-clusters are numbered (number located on the index case) and the diameter of each circle is proportional to the number of secondary cases. The colour of the circles indicates whether the date of symptom onset was recorded (black circles) or not (grey circles). Arrows represent the transmission chain, from the infector to the infected.

Figure 2. Epi-clusters and nature of connections between cases.
One hundred contacts occurred in the family/friends context; 25 connections were related to working locations; 15 connections were linked to a residential medical centre; 3 cases were associated with a fourth one because they attended the same school; spatial analysis highlighted 2 connections between cases; and, lastly, the nature of four connections reported by the Local Health Authorities was not specified.

epiforecasts/EpiNow2: Initial release
EpiNow2: Estimate real-time case counts and time-varying epidemiological parameters
Estimating the time-varying reproduction number of SARS-CoV-2 using national and subnational case counts
Risk of hospitalisation associated with infection with SARS-CoV-2 lineage B.1.1.7 in Denmark: an observational cohort study
Reproductive number of coronavirus: A systematic review and meta-analysis based on global level evidence
Geneious - Bioinformatics Software for Sequence Data Analysis
Increased Household Secondary Attacks Rates With Variant of Concern Severe Acute Respiratory Syndrome Coronavirus 2 Index Cases
Infection sustained by lineage B.1.1.7 of SARS-CoV-2 is characterised by longer persistence and higher viral RNA loads in nasopharyngeal swabs
Tidygeocoder: Geocoding Made Easy
SARS-CoV-2 Variant Classifications and Definitions
A covid-19 hotspot area: Activities and epidemiological findings
A new framework and software to estimate time-varying reproduction numbers during epidemics
SARS-CoV-2 RNA persistence in naso-pharyngeal swabs
CMMID COVID-19 Working Group. Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England
Increased mortality in community-tested cases of SARS-CoV-2 lineage B.1.1.7
Genomic Epidemiology of the First Wave of SARS-CoV-2 in Italy
Ulteriori misure urgenti in materia di contenimento e gestione dell'emergenza epidemiologica da COVID-19
Cov-lineages/pangolin: Software package for assigning SARS-CoV-2 genome sequences to global lineages
Practical considerations for measuring the effective reproductive number
Rapid review of available evidence on the serial interval and generation time of COVID-19
Robust reconstruction and analysis of outbreak data: Influenza A(H1N1)v transmission in a school-based population
Prevalenza della variante VOC 202012/01, lineage B.1.1.7 in Italia - Studio di prevalenza 4-5 febbraio 2021
Prevalenza delle varianti VOC (Variant Of Concern) in Italia: lineage B
Prevalenza delle VOC (Variant Of Concern) del virus SARS-CoV-2 in Italia: lineage B.1.1.7, P.1 e B.1.351, e altre varianti (Variant Of Interest, VOI) tra cui lineage P
FAQ sul calcolo del Rt [WWW Document]
A "One-Health" approach for diagnosis and molecular characterization of SARS-CoV-2 in Italy. One Heal
Neutralization of SARS-CoV-2 lineage B.1.1.7 pseudovirus by BNT162b2 vaccine-elicited human sera
202012/01 - Technical briefing 4, gov.uk
Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations
A municipality-based approach using commuting census data to characterize the vulnerability to influenza-like epidemic: The COVID-19 application in Italy
Exploring surveillance data biases when estimating the reproduction number: With insights into subpopulation transmission of COVID-19 in England
A cluster of the coronavirus's U.K. variant was found in Italy. Four cases grew to 29 before the town was alert
World Health Organization (WHO), 2021a. Tracking SARS-CoV-2 variants
World Health Organization (WHO), 2021b. COVID-19 Weekly Epidemiological Update 45, World Health Organization

Figure 4. (A) Maximum likelihood tree showing 1,097 full-genome SARS-CoV-2 sequences. Branches are colored according to Italian regions; global data is shown in grey. For the five main Italian clusters the proportion of sequences per region is shown as a pie chart. (B) The largest cluster, containing all sequences from Guardiagrele, is shown enlarged (n=383).
Branches are colored according to the province in Abruzzo. Sequences from Guardiagrele are marked with white. (C) The provinces in Abruzzo and the proportion of sequences from each province in the large cluster.

Writing - Review & Editing. Luca Candeloro: Methodology; Software; Formal analysis; Writing - Original Draft; Writing - Review & Editing. Arturo Di Girolamo: Conceptualization; Formal analysis; Data Curation; Investigation. Lara Savini: Methodology; Software; Formal analysis; Writing - Original Draft; Writing - Review & Editing. Ilaria Puglia: Methodology; Data Curation. Maurilia Marcacci: Methodology; Data Curation. Marialuigia Caporale: Methodology; Data Curation. Iolanda Mangone: Methodology; Data Curation. Cesare Cammà: Methodology; Data Curation. Annamaria Conte: Conceptualization; Methodology; Software; Writing - Review & Editing. Giuseppe Torzi: Conceptualization; Investigation; Resources; Supervision. Adamo Mancinelli: Conceptualization; Writing - Review & Editing. Alessio Lorusso: Methodology; Data Curation; Writing - Review & Editing. Giacomo Migliorati: Supervision; Project administration; Funding acquisition. Thomas Schael: Supervision; Project administration; Resources; Funding acquisition. Nicola D'Alterio: Supervision; Project administration; Funding acquisition. Paolo Calistri: Conceptualization; Methodology; Writing - Original Draft; Visualization; Supervision.

Mention of trade names or commercial products in this article is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the IZSAM. No potential conflict of interest was reported by the author(s).
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Highlights
• SARS-CoV-2 Alpha variant has been identified in Guardiagrele (Abruzzo, Italy) starting from late December 2020;
• Epidemiological investigations led to the identification of epi-clusters comprising 163 Alpha variant cases;
• A reconstructed transmission chain can be used to estimate transmission parameters including Re(t);
• A comparison between sequences in the GISAID database supports limited virus introductions scenario in the area.