key: cord-0967907-kdw9sg8x
authors: Rivieccio, B. A.; Micheletti, A.; Maffeo, M.; Zignani, M.; Comunian, A.; Nicolussi, F.; Salini, S.; Manzi, G.; Auxilia, F.; Giudici, M.; Naldi, G.; Gaito, S.; Castaldi, S.; Biganzoli, E.
title: CoViD-19, learning from the past: A wavelet and cross-correlation analysis of the epidemic dynamics looking to emergency calls and Twitter trends in Italian Lombardy region
date: 2020-10-16
journal: nan
DOI: 10.1101/2020.10.14.20212415
sha: 88ae9213a66cfc64776a6c1a74ebe34dc7fe9021
doc_id: 967907
cord_uid: kdw9sg8x

The first case of Coronavirus Disease 2019 in Italy was detected on February 20th in the Lombardy region. Since then, Lombardy has been the Italian region most affected by the epidemic, and its healthcare system underwent a severe crisis during the outbreak. From a public health point of view, it is therefore fundamental to provide healthcare services with tools that can reveal a possible new epidemic burden some time in advance, which is the main aim of the present study. Moreover, the sequence of law decrees issued to face the epidemic and the large amount of news generated feelings of anxiety and suspicion in the population. In this complex context, it is easy to understand why people crowded social media with messages about the pandemic and why emergency numbers were overwhelmed by calls. Thus, in order to find potential predictors of a possible second epidemic wave, we analyzed data both from Twitter and from emergency services, comparing them to the daily infected time series at a regional level. Since our principal goal is to forecast a possible new ascending phase of the epidemic, we performed a wavelet analysis in the time-frequency plane, to finely discriminate over time the anticipation capability of the considered potential predictors.
In addition, a cross-correlation analysis was performed to find a synthetic indicator of the time delay between the predictor and the infected time series. Our results show that Twitter data are more related to social and political dynamics, while the emergency call trends can be further evaluated as a powerful tool to potentially forecast a new burden. Since we analyzed aggregated regional data, and also taking into account the huge geographical heterogeneity of the epidemic spread, a future perspective would be to conduct the same analysis on a more local basis.

The progressive decrease of CoViD-19 cases should not lead us to let our guard down. Indeed, it is clear that, since the beginning of the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) pandemic, public health has suffered from the absence of a proper preparedness plan to face an episode which was unexpected and unpredictable and has heavily impacted the territorial and hospital healthcare services. Planning has a fundamental role nowadays, but it can be adequate only if the next possible pandemic peak can be effectively foreseen by means of a predictive tool which accounts for all the available signals. To do so, it is of paramount importance to learn from what happened during the first peak in order to be prepared for the potential next one.

The SARS-CoV-2 outbreak in Italy has been characterized by a massive spread of news coming from both official and unofficial sources, leading to what has been defined as infodemia, an over-abundance of information, some accurate and some not, that has made it hard for people to find the trustworthy sources and reliable guidance they need [4].

Infodemia on SARS-CoV-2 created the perfect field to build suspicion in the population, which was scared and not prepared to face this outbreak.
It is understandable how the rapid increase in the number of cases, the massive spread of news and the adoption of laws to face this outbreak led to a feeling of anxiety in the population, whose everyday life changed very quickly.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. This version posted October 16, 2020. https://doi.org/10.1101/2020.10.14.20212415

A way to assess the dynamic burden of social anxiety is a context analysis of the activity on the major social networks. To this aim Twitter represents a possibly ideal tool, because tweets are focused on the most urgent needs of information and communication, rather than on general aspects of social projection and debate as in the case of Facebook, which could respond more slowly to the fast evolution of the individual and social context [5].

Taking into account this specific context, it is easy to understand why the 112 emergency number service in the Lombardy region was suddenly overwhelmed by an enormous number of calls that rapidly exceeded its capacity to cope and compromised the possibility of identifying those patients who needed immediate medical assistance [6].

As pointed out by the Scientific Italian Society for Medical Emergency (SIEMS), the number of calls to 112 for the Milan province was 5,086 on February 16th, before the outbreak, and rapidly increased to 6,798 on February 21st and to 10,657 on February 22nd [7].

The emergency service in the Lombardy region is organized through three first-level PSAPs (public-safety answering points), called CUR-NUE (Unique answering operating room / point - European emergency number), which forward the call to the most appropriate service, i.e. Police, Fire Department or Medical emergency rescue service.
So, after the first assessment, calls requiring medical assistance are sent to one of the four second-level PSAPs called SOREU (Regional Operating Rooms for Medical Emergency and Urgency), depending on the geographical area the call is coming from, in order to evaluate the patient and decide the most appropriate intervention.

[8], Lombardy region created a regional toll-free number for CoViD-19, the first one in Italy. Other Italian regions created their own in the following weeks, as did other European countries like Spain, Germany and Croatia, which were facing similar issues [9].

The 24-hour toll-free number was set up on February 23rd by AREU (Regional Emergency Service Agency) and, although it helped to funnel non-urgent calls, it was not enough because of the huge number of calls: for example, on the second day it received more than 400,000 calls.

Calls to the emergency services could be an important and helpful indicator of the spread of the infection among the population, given the possibility of analyzing data on the municipality from which the calls originated and on the reasons that led people to ask for fast medical support. Statistical models could be used to assess the association of these data with new cases of CoViD-19 in order to predict new epidemic hotspots on a municipal scale, or on a smaller spatial scale for big cities.
In addition to the usual public health indicators, social media data may also be used as probes of people's behaviour, according to the recent trends of digital epidemiology. As mobile technology continues to evolve and proliferate, social media are expected to occupy an increasingly prominent role in the field of infectious diseases [10-12].

controlling the spread of disease [11].

Aim of the study is to understand the correlation between the users calls to the emergency

Data about SOREU-118, NUE-112 and toll-free number daily incoming calls were only available at a regional level. The very first days of the NUE and toll-free number time series were discarded because of the very intense panic reaction of the population, which resulted in a very large number of calls (whose peak was even higher than the following one, related to new cases). In the case of NUE in particular, these were inappropriate non-urgent calls (mostly requests for information), so they were not forwarded to the corresponding SOREU: indeed, in the SOREU time series we do not observe any peak in the very first days. Moreover, this choice is justified if we consider that, in case of a new epidemic burden, there would not be such a powerful reaction, so for the purpose of predictability we can take into account just the subsequent new increase in the calls to NUE and the toll-free number, which is more related to the CoViD-19 dynamics. Twitter data, unlike the emergency calls, were not geolocalized. Finally, daily new cases were collected at the province level and then aggregated at the regional level.
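The two preprocessing steps described above (summing province-level daily new cases into a regional series, and discarding the first days of a series) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `aggregate_to_region` and `drop_initial_days`, the tuple layout of the input rows, the sample numbers and the cutoff `n_days=1` are all hypothetical, since the text does not report how many initial days were discarded.

```python
from collections import defaultdict
from datetime import date


def aggregate_to_region(province_rows):
    """Sum province-level daily new cases into one regional total per day.

    province_rows: iterable of (day, province, new_cases) tuples.
    Returns a list of (day, regional_total) pairs sorted by day.
    """
    totals = defaultdict(int)
    for day, _province, new_cases in province_rows:
        totals[day] += new_cases
    return sorted(totals.items())


def drop_initial_days(series, n_days):
    """Discard the first n_days samples of a (day, value) series."""
    if n_days < 0:
        raise ValueError("n_days must be non-negative")
    return series[n_days:]


# Made-up illustrative numbers, NOT the study's data.
rows = [
    (date(2020, 2, 24), "Milano", 10),
    (date(2020, 2, 24), "Lodi", 25),
    (date(2020, 2, 25), "Milano", 14),
    (date(2020, 2, 25), "Lodi", 30),
    (date(2020, 2, 26), "Milano", 20),
]
regional = aggregate_to_region(rows)

# Hypothetical cutoff: the source does not state how many days were dropped.
trimmed = drop_initial_days(regional, n_days=1)
```

Keeping the trimming step separate from the aggregation makes it easy to apply the cutoff only to the call series (NUE and toll-free number), as the text describes, while leaving the case series untouched.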
Twitter data analysis

The monitoring of the communication dynamics on online social media was conducted on Twitter [18]. Specifically, the Twitter Search API (Application Programming Interface) was used to collect all the tweets in Italian containing the keywords "112" or "118" in the body text. The data span the period from 2020.02.18 to 2020.06.29. In addition, the text of the tweets was further filtered to identify the most common keywords related to the emergency,

possess a huge amount of information content, thus it is of primary importance to reveal them adequately. Among the others, the wavelet transform has relevant features such as a good capability

over time between the signals, we also performed a time domain analysis estimating their cross-correlation sequence. Cross-correlation could also be estimated using wavelets, specifically through a maximal
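The time domain analysis mentioned above, estimating a cross-correlation sequence and reading the delay from its peak, can be sketched in plain Python. This is only an illustration under assumptions: the source does not specify the estimator or the sign convention, so the normalized (biased) estimator, the convention that a negative peak lag means the predictor leads the target, and the names `cross_correlation` and `peak_lag` are all choices made here. The confidence limits for the peak lag discussed in the paper are not implemented.

```python
import math


def cross_correlation(x, y, max_lag):
    """Normalized cross-correlation r(k) = sum_t xc[t] * yc[t - k] / norm.

    xc and yc are the mean-centered series. With this (assumed) convention,
    a peak at a negative lag k means that x leads y by |k| samples.
    Returns a dict mapping each lag in [-max_lag, max_lag] to r(lag).
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("series must be non-empty and of equal length")
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(x)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    xc = [v - mean_x for v in x]
    yc = [v - mean_y for v in y]
    norm = math.sqrt(sum(v * v for v in xc) * sum(v * v for v in yc))
    if norm == 0:
        raise ValueError("cross-correlation is undefined for a constant series")
    ccf = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for t in range(n):
            s = t - lag
            if 0 <= s < n:
                total += xc[t] * yc[s]
        ccf[lag] = total / norm
    return ccf


def peak_lag(ccf):
    """Lag at which the cross-correlation sequence is largest."""
    return max(ccf, key=ccf.get)


# Synthetic example, NOT the study's data: the target is the predictor
# delayed by 3 samples, so by construction the peak sits at lag -3.
predictor = [0, 0, 1, 3, 5, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0]
target = [0, 0, 0, 0, 0, 1, 3, 5, 3, 1, 0, 0, 0, 0, 0]
ccf = cross_correlation(predictor, target, max_lag=7)
best = peak_lag(ccf)
```

In practice the series would first be smoothed, and the peak lag would be reported together with its confidence limits; both steps are left out of this sketch.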
As already pointed out, the following daily regional aggregated data were considered:

In addition, we also compared Twitter data to the number of regional daily infected patients. The WCS and the MSWC were calculated for each of these time series in relation to the regional daily infected data. Indeed, both wavelet cross-power spectrum and coherence, through the CWT, can show areas in the time-frequency space where two signals share common harmonic components. In particular, the focus will be on the areas for which coherence is higher than 0.

In the following figures (Figs 1-7), the time courses of the smoothed and normalized series

(which has been converted to the equivalent Fourier frequency, cycles/day), and the color scale

the only one for which wavelet analysis does not display any relevant coherence compared to daily new cases data is that of regional toll-free number incoming calls (Fig 1).
Instead, NUE regional data and daily infected signals (Fig 2) display coherence over days from 18 to 22 at frequencies around 0.25 cycles/day, with a phase lag from -126.4° to -134.7°, corresponding to a time delay from 2.5 days to 2.6 days. Interestingly, days from 18 to 22 are confined between the two peaks, since the NUE calls curve reaches its peak at day 16, while the infected curve reaches its maximum at day 29. Not surprisingly, wavelet cross-spectrum and coherence analysis between regional daily incoming calls to SOREU and infected people (Fig 3) shows an anomaly less limited over time and

Although Twitter data were not geolocalized, we finally compared the regional epidemic time series with them (Figs 4-7): just considering the time courses, it is evident that the best potential predictor is the

finally, the replies time series (Fig 7) shows a background even more localized both in time and frequency, and an anomaly around the frequency of 0.14 cycles/day from day 19 to day 33, both with a coherence value around 0.6.

It is evident that, concerning the dynamics of communications and the context on social media, Twitter activity (Figs 8-9) is not so strictly related to the epidemic dynamics, since it is mostly triggered by social, political and current-affairs news, which drive the emotional participation of the users.
Indeed, the first increase in all these time series (tweets, replies, likes and retweets), from day -3 to day 1, just precedes the establishment of the red areas in Codogno and Vo' Euganeo, while the second peak of daily tweets (Fig 8) at day 20 is related to the death of an operator of SRA due to CoViD-19. Moreover, the trends of likes and retweets (Fig 9) look more aligned with the announcements about lockdown policies.

following: (i) daily regional incoming calls to NUE-112 (Fig 10); (ii) daily regional incoming calls to SOREU-118 (Fig 11); (iii) daily number of new tweets (Fig 12). In the following figures, the maximum of each function is depicted in red, and the confidence limits for the peak lag, deduced by

In addition, a sensitivity analysis of these results with respect to the amplitude of the initial smoothing with a moving average filter was performed. The results are reported in Table 1

(Table 1), consequently it can be assumed that our results are robust with respect to this parameter.

represents a limitation for this kind of analysis.
In our specific instance, it can be noticed that the lower limit of the frequency domain in the case of Twitter data is much lower than that of the daily regional emergency call time series (112, 118, toll-free number), simply because many more samples are available for the Twitter data (Figs 1-7). Consequently, in the cases of the calls to the

indicators are currently under investigation and many of them will provide useful information, but we should not rely only on indicators focused on detecting an increase in new cases, because the main impact on the health system is related more to the characteristics of the infected population than to the number of infected people.

The severe countermeasures put in place, such as the national lockdown, had a deep impact on the population from several points of view, not only on the health system. It is therefore important to take into account the social reaction to the crisis, and analyzing it is part of the public health response. Our analysis shows that Twitter trends correlate more with social factors than with the number of cases (Figs 8-9). This finding suggests that a thorough analysis of social media would improve our understanding of the most common worries, fears and feelings of the population, in order to address them through a public health strategy that should include a proper use of social media to inform the population.
Among all the Twitter data, only the daily number of new tweets reveals some anticipation capability with respect to the epidemic curve: wavelet analysis, indeed, detects a trend at the lowest frequencies, and a phase-lagged anomaly in a frequency range centred around 0.25 cycles/day that occurs just between the two peaks (Fig 4). This finding is consistent both with the 7-day distance between the two curve peaks and with the cross-correlation analysis (maximum at lags of -6 days and -3 days, respectively for the original data and the modelled time series, with small confidence intervals, see Figs 12 and 15).

Heterogeneity of COVID-19 outbreak in Italy
Italian Department for Civil Defense. CoViD-19 Italia. Monitoraggio situazione
Novel Coronavirus (2019-nCoV) situation report - 13
What can we learn about the Ebola outbreak from tweets?
Monitoring emergency calls and social networks for COVID-19 surveillance. To learn for the future: the outbreak experience of the Lombardia region in Italy
EENA recommendations for Emergency Services Organisations during the COVID-19
Data and strategies per country on emergency calls & public warning during COVID-19 outbreak
Tracking social media discourse about the COVID-19 pandemic: development of a public coronavirus Twitter data set
Trending on social media: integrating social media into infectious disease dynamics
The COVID tracking project
Social media based surveillance systems for healthcare using machine learning: a systematic review
COVID-19: the end of lockdown what next?
Chiamate al 118, in Lombardia oltre il 30% per motivi respiratori e infettivi. Picco il 16 marzo
In Lombardia calano le telefonate al 112, il picco il 12 marzo. Resta elevato il rapporto con i ricoveri
R: unleash machine learning techniques
The wavelet transform time-frequency localization and signal analysis
Spectral analysis of signals
Wavelet coherence analysis of dynamic cerebral autoregulation in neonatal hypoxic-ischemic encephalopathy
A practical guide to wavelet analysis
Essential wavelets for statistical applications and data analysis
Wavelet methods for time series analysis
An introduction to wavelet analysis in oceanography and meteorology: with application to the dispersion of Yanai waves
On the 'probable error' of a coefficient of correlation deduced from a small sample
A method to estimate the statistical significance of a correlation when the data are serially correlated
Coronavirus: more than a third of people in Italy's COVID-19 epicentre estimated to have had disease
Disposizioni attuative del decreto-legge 23 febbraio 2020, n. 6, recante misure urgenti in materia di contenimento e gestione dell'emergenza epidemiologica da COVID-19. Italian.
30. Misure straordinarie ed urgenti per contrastare l'emergenza epidemiologica da COVID-19 e contenere gli effetti negativi sullo svolgimento dell'attività giudiziaria. Italian Law Decree n