key: cord-1033283-w80t2xvp authors: Puca, M.; Buonanno, P. title: Using newspapers obituaries to nowcast daily mortality: evidence from the Italian COVID-19 hot-spots date: 2020-06-03 journal: nan DOI: 10.1101/2020.05.31.20117168 sha: 61f7e0b9c5062551dc66791af638352440f04ad9 doc_id: 1033283 cord_uid: w80t2xvp

Real-time tracking of infectious disease outbreaks helps policymakers make timely, data-driven decisions. Official mortality data, when available, may be incomplete and published with a substantial delay. We report the results of using newspaper obituaries to nowcast the mortality levels observed in Italy during the COVID-19 outbreak between February 24, 2020 and April 15, 2020. We find that mortality levels predicted from newspaper obituaries outperform forecasts based on past mortality according to several performance metrics, making obituaries a potentially powerful alternative source of information for real-time tracking of infectious disease outbreaks.

Since the first suspected pneumonia cases were observed in December 2019 in Wuhan (China), the novel coronavirus (COVID-19), which causes a severe acute respiratory syndrome, has turned into a global pandemic. A timely reaction to the outbreak of an infectious disease is fundamental to the success of containment measures [1, 2, 3]. While the number of reported cases and infections suffers from several measurement biases, comparing total mortality rates with those of previous years offers reliable information on the severity of an epidemic [4, 5]. Mortality data in the middle of a pandemic, however, are imperfect and difficult to estimate [6, 7]. Mortality records, moreover, are published with substantial delay. For example, Britain's National Statistical Office has recently started to release weekly mortality data after death certificates have been processed.
In Italy, the National Statistical Institute released official mortality data for the period from January 1, 2020 to February 21, 2020 only on March 31, 2020, and it usually releases mortality data with a one-year lag.

In this paper we propose using newspaper obituaries as an alternative source of information to 'nowcast' daily mortality levels. Specifically, we use obituaries published in the local newspapers of Bergamo [...] population of approximately 10 million inhabitants [8, 9] (footnote 5). Figure 1 displays the daily evolution of the raw mortality level (solid line) and the number of published obituaries (dashed line). While obituaries represent only a subset of the officially registered deaths, with a gap that widens at the peak of the outbreak, the correlation between the two measures is striking.

Our contribution. Building on standard forecasting techniques, we show the predictive power of newspaper obituaries as an alternative measure of mortality levels. We also compare different forecasting models and report that obituary-based forecasts outperform all other models considered according to several accuracy criteria.

Figure 1. Notes: This figure shows, for each municipality in our sample, the daily evolution of deaths (solid line) and obituaries (dashed line).

Table 1 reports retrospective estimates of daily mortality from February 24, 2020 to May 15, 2020, using several forecasting models, with Panel A reporting observations for the municipality of Bergamo and Panel B for the municipality of Brescia. We compare the estimated mortality level with the true mortality published by ISTAT on May 4, 2020 and compute the accuracy metrics described in Section 3.
These measures include the root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), Theil's U, the Akaike information criterion (AIC), and the Bayesian information criterion (BIC). We compare these measures for (i) ordinary least squares (OLS) estimates; (ii) "augmented" autoregressive-moving-average (AARMA(1,2)) estimates with obituaries as an exogenous variable; (iii) one-lag autoregressive estimates (AR(1)); and (iv) three lags autoregressive estimates (AR(2)). Comparing these metrics, we report that the AARMA(1,2) model outperforms all other models according to every performance metric, for both municipalities in our sample.

Footnote 5: Data on cumulative cases are available at http://www.protezionecivile.gov.it/media-communication/press-release/detail/-/asset_publisher/default/content/coronavirus-la-situazione-dei-contagi-in-ita-37.

This version posted June 3, 2020. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Table 1. Notes: AARMA(1,2) refers to the equation $y_t = \mu + y_{t-1} + \text{obituaries}_t + \sum_{\tau=t-2}^{t} \varepsilon_\tau$, where the $\text{obituaries}_t$ term is treated as an exogenous variable. RMSE, MAE, and MAPE stand for root mean squared error, mean absolute error, and mean absolute percentage error, respectively. Theil's U statistic [10] is the ratio between the RMSE of a model and the RMSE of a naive forecast (i.e. $y_{t+1} = y_t$). Lower values of the statistic imply a more accurate forecasting model. AIC and BIC refer to the Akaike Information Criterion and the Bayesian Information Criterion, respectively. Lower values of these metrics imply a lower out-of-sample prediction error.
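As a minimal sketch of how the accuracy metrics named in the notes to Table 1 could be computed, the snippet below implements RMSE, MAE, MAPE, Theil's U, AIC, and BIC in plain Python. The function names, the choice to report MAPE as a fraction rather than a percentage, and the convention of dropping the first observation when computing Theil's U (so that the model and the naive forecast are scored over the same days) are assumptions made for illustration; they are not taken from the authors' code. The numbers in the usage lines are toy values, not results from the paper.

```python
import math


def rmse(pred, actual):
    """Root mean squared error between predictions and observed values."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / n)


def mae(pred, actual):
    """Mean absolute error."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


def mape(pred, actual):
    """Mean absolute percentage error, returned as a fraction (assumption)."""
    return sum(abs(p - a) / a for p, a in zip(pred, actual)) / len(actual)


def theil_u(pred, actual):
    """RMSE of the model divided by the RMSE of the naive forecast.

    The naive forecast predicts each day with the previous day's observed
    value, so the first observation has no naive forecast and is dropped
    from both RMSEs (an assumption of this sketch).
    """
    if len(actual) < 2:
        raise ValueError("need at least two observations")
    target = actual[1:]
    naive = actual[:-1]
    return rmse(pred[1:], target) / rmse(naive, target)


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, sample_size):
    """Bayesian information criterion: k ln(T) - 2 ln(L)."""
    return k * math.log(sample_size) - 2 * log_likelihood


# Toy usage with invented numbers (not data from the paper).
observed = [10.0, 12.0, 15.0, 11.0]
predicted = [11.0, 12.0, 13.0, 12.0]
print(rmse(predicted, observed), mae(predicted, observed))
print(mape(predicted, observed), theil_u(predicted, observed))
```

Under this definition, a Theil's U below 1 means the model has a lower RMSE than the naive forecast over the scored days.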
The basic principle of now-casting is to exploit information that is published at a higher frequency than the variable of interest [11]. We explore the accuracy of obituaries published in local newspapers in predicting actual daily mortality in almost real time. Newspaper obituaries contain information on individual characteristics such as name, surname, gender, age, date of death, and municipality of death. This information allows us to expand the information set available to external observers and to estimate a real-time mortality rate.

Newspaper obituaries. We digitized the obituaries published by L'Eco di Bergamo and Il Giornale di Brescia, the two most widely read and circulated newspapers in the province of Bergamo and in the province of Brescia, respectively (footnote 6). Our final dataset contains 4,054 unique individuals from February 24 to May 14, 2020 for the province of Bergamo and 3,784 unique individuals for the province of Brescia over the same period. We combine the obituary data with municipality-level mortality data released by the Italian National Statistical Institute (ISTAT) on May 9, 2020 (footnote 7). The ISTAT dataset contains daily deaths at the municipality level from January 1 to April 15, 2020 for a sample of 4,433 Italian municipalities. The ISTAT sample covers the universe of municipalities belonging to the two provinces of our analysis (243 municipalities in the province of Bergamo and 205 municipalities in the province of Brescia).

Footnote 6: In 2019, L'Eco di Bergamo had 402,000 daily readers and Il Giornale di Brescia had 427,000. Source: http://audipress.it/quotidiani/

Footnote 7: Data are available at the ISTAT website: https://www.istat.it/it/archivio/240401.
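The combination of obituary records with official daily deaths described above can be sketched as follows. This is an illustrative outline only: the record layout, the de-duplication key (name, surname, municipality, date of death), the function names, and every value in the toy data are assumptions made here, not the authors' pipeline or data.

```python
from collections import Counter
from datetime import date

# Toy obituary records, invented for illustration (not real data).
obituaries = [
    {"name": "A", "surname": "X", "municipality": "Bergamo", "died": date(2020, 3, 1)},
    # The same person appearing in a second notice.
    {"name": "A", "surname": "X", "municipality": "Bergamo", "died": date(2020, 3, 1)},
    {"name": "B", "surname": "Y", "municipality": "Bergamo", "died": date(2020, 3, 1)},
    {"name": "C", "surname": "Z", "municipality": "Brescia", "died": date(2020, 3, 2)},
]

# Toy official daily deaths keyed by (municipality, day), also invented.
official = {
    ("Bergamo", date(2020, 3, 1)): 5,
    ("Brescia", date(2020, 3, 2)): 3,
    ("Brescia", date(2020, 3, 3)): 4,
}


def count_unique_obituaries(records):
    """Count unique individuals per (municipality, day of death)."""
    unique = {
        (r["name"], r["surname"], r["municipality"], r["died"]) for r in records
    }
    return Counter((municipality, died) for _, _, municipality, died in unique)


def align(records, deaths):
    """Pair each official daily death count with the obituary count.

    Days with no obituary get a count of 0. Rows are sorted by
    municipality and then by date.
    """
    counts = count_unique_obituaries(records)
    return [
        (municipality, day, deaths[(municipality, day)], counts.get((municipality, day), 0))
        for (municipality, day) in sorted(deaths)
    ]


aligned = align(obituaries, official)
for row in aligned:
    print(row)
```

Each row holds the municipality, the day, the official death count, and the number of unique obituaries, which is the pairing a nowcasting regression would take as input.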
Formulation of the AARMA(1,2) model. Our AARMA(1,2) model is motivated by inspection of the autocorrelation and partial autocorrelation plots, which display a significant autocorrelation coefficient at one lag and significant partial autocorrelation coefficients at two lags. This leads us to estimate the equation reported in the notes to Table 1, written here in terms of $y_t$ and $x_t$:

$$y_t = \mu + y_{t-1} + x_t + \sum_{\tau=t-2}^{t} \varepsilon_\tau,$$

where $y_t = \ln(\text{mortality}_t)$ is the log-transformed mortality observed at time $t$ and $x_t = \ln(\text{obituaries}_t)$ is the log-transformed number of newspaper obituaries published at time $t$, which is assumed to be exogenous with respect to the time series $\{y_t\}$ (i.e. $E[\varepsilon_t \mid x_t] = 0$).

Accuracy metrics. The RMSE, MAE, MAPE, and Theil's U of the estimator $\hat{y}_t$ relative to the target mortality level $y_t$ are defined, respectively, as

$$\mathrm{RMSE}(\hat{y}_t, y_t) = \Big[\frac{1}{n}\sum_{t=1}^{n}(\hat{y}_t - y_t)^2\Big]^{1/2},$$

$$\mathrm{MAE}(\hat{y}_t, y_t) = \frac{1}{n}\sum_{t=1}^{n}|\hat{y}_t - y_t|,$$

$$\mathrm{MAPE}(\hat{y}_t, y_t) = \frac{1}{n}\sum_{t=1}^{n}\frac{|\hat{y}_t - y_t|}{y_t},$$

$$\mathrm{Theil}(\hat{y}_t, y_t) = \frac{\mathrm{RMSE}(\hat{y}_t, y_t)}{\mathrm{RMSE}_{\text{naive}}},$$

where $\mathrm{RMSE}_{\text{naive}}$ refers to the RMSE of a naive forecast, i.e. $y_t = y_{t-1}$. The AIC and BIC are defined, respectively, as $\mathrm{AIC} = 2k - 2\ln(\hat{L})$ and $\mathrm{BIC} = k\ln(T) - 2\ln(\hat{L})$, where $\hat{L}$ is the maximized value of the likelihood function of the estimated model, $k$ is the number of estimated parameters, and $T$ is the sample size.

We use newspaper obituaries to nowcast the mortality levels observed in Italy during the peak of the COVID-19 outbreak. We find that forecasting models using newspaper obituaries outperform other models based on previously observed mortality. Our approach, although powerful, is not free from limitations. First, newspaper obituaries may underrepresent the actual mortality level, an issue that becomes more severe during the epidemic peak (see Figure 1). Such underrepresentation, however, works against our results, since it should decrease the precision of our estimates.
Second, although concentrated in the most affected Italian region, our sample refers to only two municipalities. We are agnostic about the existence of heterogeneous individual behavioral attitudes towards publishing newspaper obituaries in other locations (footnote 8). Understanding how such heterogeneity may affect our estimates constitutes a valuable path for future research.

Footnote 8: The large heterogeneity observed in civic attitude and prosocial behavior across Italian municipalities may play a role in determining the propensity to publish obituaries.

References.
[1] Public health interventions and epidemic intensity during the 1918 influenza pandemic.
[2] Accurate estimation of influenza epidemics using Google search data via ARGO.
[3] Beware of the second wave of COVID-19.
[4] Estimating the severity of COVID-19: evidence from the Italian epicenter.
[5] How deadly is COVID-19? A rigorous analysis of excess mortality and age-dependent fatality rates in Italy. medRxiv.
[6] How deadly is COVID-19? Understanding the difficulties with estimation of its fatality rate.
[7] Data gaps and the policy response to the novel coronavirus.
[8] Spread and dynamics of the COVID-19 epidemic in Italy: effects of emergency containment measures.
[9] Patterns of COVID-19 related excess mortality in the municipalities of Northern Italy. medRxiv.
[10] Applied economic forecasting.
[11] Now-casting and the real-time data flow.
[12] The prosperous community: social capital and public life. The American Prospect.

We thank Nunzia Vallini (Director of Il Giornale di Brescia) and Mauro Torri (CEO of Editoriale Bresciana) for their help. We thank Sergio Galletta for useful comments and discussions. We also thank Endri Avduli and Oumar Ben Salha for research assistance.