key: cord-0930761-8niqpwvc authors: Santamaria, Luis; Hortal, Joaquin title: Chasing the ghost of infection past: identifying thresholds of change during the COVID-19 infection in Spain date: 2020-04-14 journal: nan DOI: 10.1101/2020.04.09.20059345 sha: 24c5610d5aea11c5216c113e9770184ece0f1ef3 doc_id: 930761 cord_uid: 8niqpwvc The COVID-19 pandemic has spread rapidly worldwide from its first outbreak in China. Its impact has varied with the age and social structure of each population and with the measures taken by each government. Within Europe, the first countries to be strongly affected were Italy and Spain. In Spain, the infection has expanded in highly populated areas, resulting in one of the largest nationwide outbreaks so far by early April. We analyze the evolution of the growth curve of the epidemic in both the whole of Spain and the Madrid Autonomous Region (the second largest conurbation in Europe), based on the cumulative numbers of reported cases and deaths. We conducted segmented linear regressions on log-transformed data to identify changes in the slope of these curves and/or sudden shifts in the number of cases (i.e. changes in the intercept) at fitted breaking points. We then compared the results with a timeline including both key events of the epidemic and the containment measures taken by the national and regional governments. Results were largely consistent across the four curves analyzed (reported infections and deaths for Spain and Madrid). All four showed two major shifts in slope (growth rate), on 14-15 and 26-29 March, that reduced the slope by 37-65%. These shifts originated in infections on 4-5 and 16-18 March (for case detections) and on 14-23 February and 5-6 March (for deaths). Small upward shifts in the progress of the disease in Madrid were not associated with significant changes in the intercept of the curve, and seem related to unevenness in case reporting.
These results provide evidence of an early deceleration in the spread of COVID-19 that coincided with personal hygiene and social-distancing recommendations and with the general awareness of the population. A second, stronger decrease followed when harder isolation measures were enforced. The combination of both breakpoints seemingly led to the start of the containment of the outbreak by early April, the end of our time series. This highlights two needs for public health strategies: disseminating basic knowledge on personal hygiene and reduced social contact at the onset of the epidemic, and enforcing hard containment measures early to bring it under control. COVID-19 infection has spread rapidly worldwide since its first outbreak in Wuhan (China) in mid-December 2019. The global number of confirmed cases exceeded one million on 3 April 2020 (Johns Hopkins University Coronavirus Resource Center; see Dong et al. 2020), barely 3 months after the first report on 31 December. Individuals infected with COVID-19 remain asymptomatic for 5-6 days, yet carry enough viral load to be infective 1-2 days after infection (Linton et al. 2020, Lai et al. 2020). Severe cases require hospitalisation 3-15 days after the appearance of the first symptoms, which are similar to those of other infectious respiratory illnesses. This, together with the initial unawareness of the population, led to a high transmission rate. The infection spread rapidly to neighbouring countries, the Middle East and Europe, and then to the rest of the world (see https://nextstrain.org/ncov).
An increasing number of countries were progressively affected, and each responded according to several factors:
- the WHO and local expert advice at the time;
- the structure and resources of its public health system;
- its R+D+i capacity, which determined, among other things, the number of PCR tests available for detecting infections;
- its ability to implement social-distancing measures.

This diversity of policy responses adds to the preexisting differences in spatial aggregation, social behaviour and age structure among populations. Together they provide a unique array of test cases for understanding how different levels and combinations of preventive quarantine and social-distancing measures affected the spread of the pandemic. COVID-19 arrived in mainland Spain in early February (the first recorded hospitalisation dates back to 15 February; Table S1). During the first 2-3 weeks of February, COVID-19 infection reached Spain at least three times, via the UK and Italy, as evidenced by the presence of three different genetic clusters identified by nextstrain (Hadfield et al. 2018; last accessed 8 April). In Italy, infections were concentrated in the North. In Spain, by contrast, these three introductions combined with early, unnoticed community transmission to produce consecutive outbreaks in distant, highly populated areas: the Basque Country and Navarra (North), Madrid (Center), Catalonia (North East), Andalusia (South) and Valencia (East) (see timeline in Figure 5 below, and Table S1). The spatial structure of the Spanish population has played a role in the particularly rapid spread of the pandemic in some regions of the country. The impact has been harshest in the big conurbations of Madrid (around 6.4M people; the second most populated metropolitan area of the EU, after Paris) and Barcelona (c. 5.4M), and in Álava, Navarra and La Rioja (c. 1M in total). In these last three provinces it followed the early infection of healthcare workers at Txagorritxu Hospital.
The Balearic and Canary archipelagos also received infections from the early onset of the pandemic, so it is reasonable to assume that by early March COVID-19 infections were widely distributed throughout the whole country. Several factors make the Spanish data one of the fairest accounts of the effects of the pandemic at the country and regional levels, together with those of Italy and, especially, South Korea. Despite the shortage of tests that has been pervasive in most countries (except South Korea), Spain has achieved one of the highest per-capita testing rates (Clark et al. 2020). This was possible thanks to the early mobilisation of most PCR machines available in universities and research centres for either COVID-19 testing or COVID-19 research. Importantly, only cases testing positive by PCR make it into the official statistics. In addition, similar to Italy but unlike other European countries, all deaths testing positive are registered as caused by COVID-19 infection, including those associated with previous pathologies or occurring outside hospitals (e.g. in private homes and nursing homes). Because of the limited number of tests, these data underestimate both the total infected population and the number of fatalities. For deaths, this is partly compensated by mortality from other pathologies being attributed to COVID-19 whenever the PCR test is positive. However, testing intensity was relatively homogeneous and the criteria for disease attribution were stable throughout the period of this analysis, which probably yields unbiased estimators of the spread of the pandemic. It is therefore safe to assume that the number of reported cases of infection and the number of deaths are reasonably good proxies for the advance of the pandemic.
Here we characterize the growth curve of COVID-19 infections in the whole of Spain, from the onset of the pandemic in early February through the introduction of increasingly strict social and governmental restrictions on mobility and personal contact. We also perform the analyses for the Madrid Autonomous Region (Madrid hereafter). This highly populated area has good public transportation and a high daily commuting rate, and it was the country's largest focus of the pandemic. It is therefore a prime example of how the virus spread through time in a large, mostly panmictic population, and of how social-distancing measures affected that spread. The national and regional governments tightened their containment measures steadily over time:
- recommendation of preventive measures in late February and early March;
- increasingly strict social-distancing measures on 9-10 March;
- a nationwide lockdown, announced on 13 March and enforced on 15 March;
- the closure of all non-essential economic activities on 31 March (see Figure 5 below, and Table S1).

Experts, the media and social media discussed this sequence of measures broadly. During the first weeks of the outbreak, opinions ranged from calling the measures exaggerated or unnecessary; in the weeks that followed, they were criticized as tardy or insufficient. Two controversies have been particularly strong. (i) Were preventive and soft social-distancing measures useful, or should hard social-distancing measures have been introduced from the early moments (late February to early March)? (ii) Did the mass events on the weekend of 7-8 March trigger the early spread of the pandemic in Spain's largest cities, especially in Madrid? These events included the International Women's Day demonstrations (over 300k attendants in the whole country, 120k in Madrid) and premier football league matches (around 280k spectators in total, 72k in Madrid).
Bearing this temporal sequence in mind, we analyze the growth curves of the cumulative number of cases detected and the cumulative number of deaths for both the whole of Spain and Madrid. We focus specifically on the changes in the growth rate (i.e. the slope of the log-transformed data) of these curves through time. Based on this analysis, we seek to answer two questions: (1) how effective were the different social-distancing measures in reducing infection and mortality rates? and (2) how significant were the effects of the 7-8 March mass gatherings on the expansion of the epidemic, compared with other key events and control measures? We gathered data from official sources, national and international media, and scientific publications on three kinds of events:
- events that marked the evolution of the pandemic in Spain (e.g. first cases detected, large infection bouts, first deaths) or influenced its perception by the general public;
- policy measures (e.g. preventive isolation, social distancing, lockdowns);
- putative key events (e.g. large gatherings associated with sporting events, political demonstrations and party rallies).

Whenever possible, and in all cases for policy measures, we confirmed their date and content from official documents and/or websites of international, national or regional institutions. We include a broad list of events in Table S1 and selected the most relevant ones for the timeline shown, together with the results of the statistical analyses, in the graphical summary of Figure 5. Official data on (i) the cumulative number of cases and (ii) the cumulative number of deaths were obtained from the daily COVID-19 reports of the Spanish Ministry of Health. We used the compilations of the Worldometer Coronavirus data service for national data and of the Covid data service of eldiario.es for regional data. Data were extracted at two levels of aggregation: for Spain as a whole, and for the Madrid Autonomous Region (i.e. Comunidad Autónoma de Madrid).
For the analyses we included data from the first day on which at least 10 cases or at least 1 death were recorded. We extended the analyses to 22-24 days after the onset of social-distancing measures on 13-15/3/20. This period doubles the average infection-to-detection time (10.1 days; see next section) and equals the average infection-to-death time (21 days; see next section). To estimate the infection date of reported cases, we calculated the infection-to-testing time by adding two reported intervals. The first is the incubation time (mean = 5.0 days in Lauer et al. 2020; median = 5.1 days in Linton et al. 2020; mean = 6.4 days in Lai et al. 2020). The second is the time from illness onset to hospital admission for treatment and/or isolation (median = 3.3 days among living cases and 6.5 days among deceased ones; Linton et al. 2020). Hence, we used an infection-to-testing time of 9 days for living cases and 12 days for fatal cases. From 3/3/20 to 6/4/20, 36% of the 57,006 closed cases in Spain ended in death and 64% in recovery. Weighting by these proportions, we estimated an average infection-to-testing time of 10.1 days (0.64 × 9 + 0.36 × 12 ≈ 10.1), which, for simplicity, was
We fitted a family of segmented (broken-line) regressions with no, one, two and three breaking points (Models 1 to 4, with two, four, six and eight parameters, respectively) and compared their goodness of fit. Goodness-of-fit comparisons were based on three criteria: (i) the distribution of the residuals; (ii) the adjusted R²; and (iii) an F-statistic comparing each model with the next level of restriction, that is, with the model with one breakpoint (hence, two parameters) fewer (Hank et al. 2020). To ensure homoscedasticity of the residuals, we used linear fits on log10-transformed data; exponential fits on untransformed data gave similar results (not shown).
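The model family and the nested comparison can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' code. Each breakpoint enters as a hinge term, so the fitted line stays continuous while its slope changes. Breakpoints are searched on a grid of observed days, with a hypothetical minimum spacing `min_gap` of 3 days. The data are synthetic log10 counts. Every breakpoint location is counted as a parameter, which gives the 2/4/6/8 parameter counts of Models 1-4. The F-test p-value should be read as approximate, because breakpoint locations are not regular linear parameters.

```python
import itertools

import numpy as np
from scipy import stats


def hinge_design(t, breakpoints):
    """Intercept, slope, and one hinge max(t - b, 0) per breakpoint: the fitted
    broken line stays continuous while its slope changes at each b."""
    cols = [np.ones_like(t), t] + [np.maximum(t - b, 0.0) for b in breakpoints]
    return np.column_stack(cols)


def fit_segmented(t, y, n_breaks, min_gap=3):
    """Grid search over breakpoints on observed days (assumption), OLS for the
    rest. Returns (rss, n_params, breakpoints, coef), n_params = 2 + 2*n_breaks."""
    best = None
    for bps in itertools.combinations(t[min_gap:-min_gap], n_breaks):
        if n_breaks > 1 and np.min(np.diff(bps)) < min_gap:
            continue  # breakpoints too close together
        X = hinge_design(t, bps)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        if best is None or rss < best[0]:
            best = (rss, 2 + 2 * n_breaks, bps, coef)
    return best


def adjusted_r2(y, rss, n_params):
    n = len(y)
    tss = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - (rss / (n - n_params)) / (tss / (n - 1))


def nested_f_test(rss_r, p_r, rss_f, p_f, n):
    """F-test of a fuller model (p_f parameters) against a nested one (p_r)."""
    df1, df2 = p_f - p_r, n - p_f
    f = ((rss_r - rss_f) / df1) / (rss_f / df2)
    return f, float(stats.f.sf(f, df1, df2))


# Synthetic stand-in for log10(cumulative cases): slope changes on days 18 and 30.
rng = np.random.default_rng(0)
t = np.arange(40.0)
y = (1.0 + 0.15 * t - 0.07 * np.maximum(t - 18, 0) - 0.05 * np.maximum(t - 30, 0)
     + rng.normal(0, 0.005, t.size))

fits = {k: fit_segmented(t, y, k) for k in range(4)}  # Models 1-4
for k in range(1, 4):
    (rss_r, p_r, *_), (rss_f, p_f, bps, _) = fits[k - 1], fits[k]
    f, p = nested_f_test(rss_r, p_r, rss_f, p_f, len(y))
    print(f"Model {k + 1} vs Model {k}: breaks={[int(b) for b in bps]}, "
          f"adjR2={adjusted_r2(y, rss_f, p_f):.4f}, F={f:.1f}, p={p:.2g}")
```

A grid search keeps the sketch transparent. Dedicated segmented-regression tools instead estimate breakpoints iteratively, which also yields the fractional breakpoint days (e.g. day 17.9) reported in the results.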
Fitted breaking points provide objective information on the moment at which infection dynamics changed, while slopes provide information on the direction and magnitude of such changes.
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. https://doi.org/10.1101/2020.04.09.20059345
When analyzing the data from Madrid, we observed discontinuities suggesting that some breakpoints could involve a change in the intercept rather than in the slope. Such a change would imply a significant shift in values on a given day, followed by continued growth at the same rate that preceded that day. This scenario would be consistent, for example, with a sudden increase in infection rate during the mass gatherings of 7-8 March. To test for this possibility, we assessed the fit of an additional model with two breaking points, the first involving a change in the intercept and the second a change in the slope (Model 5). For the whole of Spain, the model with two breaking points (Model 3) provided the best fit (Table 1). Fitted breaking points were placed on day 17.9 (14/3/20, estimated infection on 4/3/20) and day 30.5 (26/3/20, estimated infection on 16/3/20) (Figure 1). The growth rate of the number of cases decreased by 49% (from 0.15 to 0.08) after the first breakpoint (14/3/20) and by another 54% (from 0.08 to 0.03) after the second breakpoint (26/3/20).
Figure caption (beginning lost in extraction): "... of infection is provided in the upper X-axis. Filled points indicate Sundays. Broken vertical lines indicate the breaking points of the best fit (Model 3), which is shown with a thicker line."
The analyses performed on the number of cases from Madrid are consistent with the results for the whole country.
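The back-dating of breakpoints and the percentage changes in slope can be reproduced with a few lines of date arithmetic. This sketch assumes the 10.1-day mean delay is applied as a whole number of days, since the reported pairs (14/3/20 to 4/3/20, 26/3/20 to 16/3/20) are consistent with a 10-day shift. `infection_date` and `slope_reduction` are hypothetical helper names.

```python
from datetime import date, timedelta

# Mean infection-to-testing delay: 9 days for recovered cases (64% of closed
# cases) and 12 days for fatal cases (36%), weighted by those proportions.
DELAY_CASES = round(0.64 * 9 + 0.36 * 12, 1)  # 10.1 days
# Average infection-to-death time quoted in the text.
DELAY_DEATHS = 21


def infection_date(observed: date, delay_days: float) -> date:
    """Back-date an observed breakpoint by the delay, rounded to whole days
    (the rounding is an assumption of this sketch)."""
    return observed - timedelta(days=round(delay_days))


def slope_reduction(before: float, after: float) -> float:
    """Percent decrease in the log10 growth rate across a breakpoint."""
    return 100.0 * (before - after) / before


# Breakpoints of the Spain case curve (Model 3).
for bp in (date(2020, 3, 14), date(2020, 3, 26)):
    print(bp, "-> estimated infection", infection_date(bp, DELAY_CASES))

# Two-decimal slopes quoted for Spain: 0.15 -> 0.08 -> 0.03.
print(f"{slope_reduction(0.15, 0.08):.1f}% and {slope_reduction(0.08, 0.03):.1f}%")
```

Computed from the two-decimal slopes, the reductions come out somewhat different from the reported 49% and 54%. Presumably the reported percentages were derived from the unrounded slope estimates.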
The model with two breaking points (Model 3) provided the best fit (Table 2). Fitted breaking points were placed on day 12.7 (14/3/20, estimated infection on 4/3/20) and day 26.6 (28-29/3/20, estimated infection on 18-19/3/20) (Figure 2). The growth rate of the number of cases decreased by 65% (from 0.18 to 0.06) after the first breakpoint (14/3/20) and by a further 59% (from 0.08 to 0.03) after the second breakpoint (28-29/3/20). The cases detected on 9/3/20 (estimated infection on 28/2/20) appear to jump. An inspection of the values and fits (Figure 2) shows that this apparent jump came from a decrease during the weekend (7-8/3/20) followed by an increase the next Monday, which kept the point in line with the preceding and subsequent values. Only one fitted model identified a change of slope around that date (Model 4, breaking point at day 9.0, i.e. on 10/3/20, estimated infection 29/2/20). It showed a 19% decrease in the growth rate at that point (from 0.18 to 0.15). However, it did not significantly improve goodness of fit relative to a more parsimonious model without that breaking point (Model 3). Similarly, consider the model with one change of intercept and one change of slope (Model 5). It placed an 11% increase in the intercept (from 1.3 to 1.45) on day 7.1 (8/3/20, estimated infection 27/2/20), but did not provide a significantly better fit than Model 2 (Table 2). A similar 'decrease-and-jump' in the number of cases was observed one week earlier, from Saturday 29/2/20 to Monday 2/3/20. Those dates fall before 1/3/20 and were not included in the analysis, owing to the low number of registered cases (below the ten-case threshold).
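Model 5 differs from Models 2-4 only in its design matrix: the first breakpoint enters as a step (an intercept shift) rather than a hinge (a slope change). The sketch below fits that structure to synthetic log10 counts by grid search. All of the following are illustrative assumptions, not the authors' procedure: the integer-day grid, the hypothetical `min_gap` spacing, the requirement that the step precede the slope change, and the simulated values (an intercept of 1.3 rising by about 0.15, echoing the Madrid figures). Model 2 is Model 5 with a zero step, so the two models are nested and can be compared with a nested F-test.

```python
import itertools

import numpy as np


def model5_design(t, b_step, b_slope):
    """Columns: intercept, slope, a step from b_step on (intercept shift),
    and a hinge max(t - b_slope, 0) (slope change at b_slope)."""
    step = (t >= b_step).astype(float)
    hinge = np.maximum(t - b_slope, 0.0)
    return np.column_stack([np.ones_like(t), t, step, hinge])


def fit_model5(t, y, min_gap=3):
    """Grid search over b_step < b_slope on observed days; OLS otherwise.
    Returns (rss, b_step, b_slope, coef)."""
    best = None
    for b_step, b_slope in itertools.combinations(t[min_gap:-min_gap], 2):
        if b_slope - b_step < min_gap:
            continue
        X = model5_design(t, b_step, b_slope)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        if best is None or rss < best[0]:
            best = (rss, b_step, b_slope, coef)
    return best


# Synthetic log10 cumulative cases: a 0.15 jump from day 7 and a 0.12 drop in
# slope from day 13 (illustrative values only).
rng = np.random.default_rng(1)
t = np.arange(30.0)
y = (1.3 + 0.18 * t + 0.15 * (t >= 7) - 0.12 * np.maximum(t - 13, 0)
     + rng.normal(0, 0.005, t.size))

rss, b_step, b_slope, coef = fit_model5(t, y)
print(f"intercept shift on day {b_step:.0f}: {coef[0]:.2f} -> {coef[0] + coef[2]:.2f}")
print(f"slope change on day {b_slope:.0f}: {coef[1]:.2f} -> {coef[1] + coef[3]:.2f}")
```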
The model with three breaking points (Model 4) provided the best fit (Table 3).
(Table 4). Fitted breaking points were placed on day 9.8 (15/3/20, estimated infection on 23/2/20) and day 20.7 (26/3/20, estimated infection on 5/3/20) (Figure 4). The growth rate of the number of deaths decreased by 56% (from 0.22 to 0.10) after the first breakpoint (15/3/20) and by a further 65% (from 0.10 to 0.03) after the second breakpoint (26/3/20). The model with one slope shift (on day 12.4, i.e. 17/3/20, estimated infection on 25/2/20) and one intercept shift (on day 8.2, i.e. 13/3/20, estimated infection on 21/2/20) showed a marginally significant improvement in goodness of fit relative to Model 2 (0.10