key: cord-0112163-0w2a3ay1 authors: Rollier, Michiel; Miranda, Gisele; Vergeynst, Jenna; Meys, Joris; Alleman, Tijs; Vijver, Ellen Van De; Surveillance, the Belgian Collaborative Group on COVID-19 Hospital; Nopens, Ingmar; Baetens, Jan title: Mobility and the spatially heterogeneous spread of SARS-CoV-2 in Belgium date: 2022-02-23 journal: nan DOI: nan sha: bf941d80e7a0e675efd1a5a38544566890a34b60 doc_id: 112163 cord_uid: 0w2a3ay1 Given that social interactions drive the spread of infectious diseases amongst humans, one anticipates that human mobility in Belgium affected the spread of COVID-19 during both 2020"waves". Measures against this spread in turn influenced mobility patterns. In this study, we analyse and mutually compare time series of COVID-19-related data and mobility data across Belgium's 43 arrondissements (NUTS 3 level). First, we confirm that overall mobility did in fact change significantly over the consecutive stages of the pandemic. So doing, we define a quantity that represents the degree of mobility between two arrondissements: the"connectivity index". Second, we analyse spatio-temporal COVID-19-related incidence and hospitalisation data using dynamic time warping and time-lagged cross-correlation. This allows us to quantify time lag and morphological similarities between localised waves. Third, by coupling those analyses, we conclude that mobility between arrondissements is in fact indicative of the spatio-temporal spread of COVID-19 in Belgium, as was previously shown for other EU countries. This conclusion supports the need for mobility analysis and/or control during a pandemic, and by extension motivates the development of our spatially explicit Belgian metapopulation model. covid-19 is a respiratory disease caused and spread by sars-cov-2, an infectious coronavirus. The first confirmed case in Belgium was a repatriated person from Wuhan [1] , who tested positive on February 4, 2020 but did not spread the disease further (see Figure 1 for a chronological overview). Returning vacationers from affected regions in Italy led to additional importations during the half-term holidays [2] , and by early March 2020, disease transmission in Belgium was confirmed. On March 13, 2020, the Belgian government imposed the first measures to control the virus spread. On March 18, 2020, the measures were tightened to a lockdown. Two days later the Belgian borders were closed for non-essential travel, meaning that from that moment on crossborder mobility was severely restrained. Restrictions were gradually eased in May, June and July 2020. Upon the emergence of a second covid-19 wave in September-October, measures were again restricted on October 19, 2020. To inform policymakers about the forecasted evolution of the pandemic and possible scenarios of disease spread related to changes in the containment measures, we developed a compartmental covid-19 metapopulation model operational at the national scale [4] that tracks the number of individuals in 10 different epidemiological and clinical stages and 9 different age groups. It was developed in parallel with other researchers with slightly varying approaches [5, 6] . The collective results of these efforts were finally bundled to obtain ensemble forecasts [7] . The aforementioned models work at the level of the entire Belgian population and hence do not capture the spatial heterogeneous nature of disease spread, despite the fact that human mobility plays an important role in the emergence of spatial disease dynamics [8] [9] [10] [11] . For what concerns covid-19, [12] demonstrated that the initial covid-19 spread in France and Spain Table 1 ). Khaki-coloured areas represent conventional school holidays, added in particular to match the time series plot for mobility, on which school holidays have a noticeable impact (see Figure 3 ). can be explained to a large extent by mobility between departments. Additionally, (initial) susceptibility and hence incidence may vary spatially as a consequence of differential mobility and a spatially varying sociodemographic context [13] [14] [15] [16] , and a recent geostatistical study pointed out that there is indeed a spatial structure underlying the spread of covid-19 in Belgium [17] . Moreover, the locality of imposed control and mitigation measures as well as their effective implementation further increases the spatial heterogeneity of the disease dynamics. To account for this, [18] [19] [20] and [21] developed spatial models to investigate and simulate covid-19 spread in Spain, Brazil, France and the United Kingdom, respectively. With a road density of more than 500 km per 100 km 2 in 2011 [22] and a population density of 374 inhabitants per km 2 in 2020 [23] , Belgium is one of the most connected countries in the world. Still, small-scale spatial heterogeneity is observed nonetheless [17] , making Belgium both heterogeneously and highly connected. The relationship between mobility and viral spread was convincingly observed in large EU countries [12] , but these countries are admittedly (much) more heterogeneous than Belgium. In this study, we therefore investigate whether these relationships are also observed in a small and highly connected country like Belgium. We first define a quantitative measure for mobility between the Belgian arrondissements and confirm that non-pharmaceutical interventions strongly affected the overall mobility during the pandemic. Second, we rely on dynamic time warping and time-lagged cross correlation to unveil relationships with covid-19-related time series associated with the same arrondissements. Third -and essentially -, these separate analyses are combined, demonstrating quantitatively that a high degree of mobility between arrondissements does influence the spread of sars-cov-2 both regarding time lag between arrondissements and overall shape of the local waves. As such, it is meaningful to not model Belgium as a homogeneous pool, but rather to extend current covid-19 metapopulation models for Belgium [4] [5] [6] in a spatially explicit fashion. 2 Data collection and preprocessing 2 All analyses in this paper were performed at NUTS 3 level. The NUTS 3 level in the Nomenclature of Territorial Units for Statistics (NUTS) corresponds to administrative units with an average population size between 150 000 and 800 000. In Belgium, there are 43 such administrative units, which are called arrondissements ( Figure 2 ). Three reasons motivate this choice of spatial resolution: 1) we guarantee consistency and comparability with other spatial analysis papers within Europe (e.g. [12] ); 2) we avoid troublesome analysis of overly noisy time series associated with smaller geographical units; and 3) we avoid unneeded complication arising from the EU GDPR legislation when working with privacy-sensitive data at higher spatial resolution. In 2020 Belgium faced two covid-19 waves. The first one we defined from March 1 2020 (first confirmed covid-19 cases) until July 1 2020, where the end date was chosen to coincide with the start of the summer holidays, during which incidences were low despite lifting of the most stringent restrictions. The second wave started on September 1, 2020 (reopening of schools and hence a sudden increase in contact dynamics) and was flattened again by the end of December 2020. Due to the presumed delayed effect of the pandemic on mobility, it is informative to include an additional analysis on the "ascending" and "descending" phases of the waves. The transition between both phases is determined by the date at which the nationwide maximum number of hospitalisations was registered for the respective waves (Table 1 ). We analysed spatio-temporal data on excess deaths, and covid-19-related cases and hospitalisations. Daily data on confirmed cases and hospitalisations due to covid-19 were provided by the Belgian public health institute Sciensano ( Figure 1 ). An aggregated format of these datasets at various NUTS levels is publicly available [3] . For our analyses, we considered all incidence and hospitalisation data at NUTS 3 level and expressed all data proportionally per 100 000 inhabitants. We opted not to include time series of confirmed covid-19 deaths in our analysis because these data were only available at weekly basis, and data during the first weeks of the pandemic were not available at all. Moreover, these data are not easily comparable between countries because of differences in death count conventions (e.g. only hospital deaths, only confirmed deaths or suspected deaths included) and in testing regime. This issue was addressed by using data on excess deaths to assess covid-19 mortality instead [24] . These data are not only internationally comparable but also account for potential indirect covid-19 deaths by an overload of the health system or co-infections. In our study, excess deaths are defined in relation to local mortality data from 2009 to 2019. As a proxy for Belgian mobility, we used positioning data from mobile phones connecting to transmission towers. These data were provided by Proximus, Belgium's largest telecommunication company with a market share of 30-40% in terms of active SIM cards [25] . These data account for the movement of 25 to 50% of the population in any given arrondissement, hence we may safely assume this is representative of the entire arrondissement's population. The provided data sets contain information on the daily number, duration (henceforth "staytime"), and location of visits by Proximus users, at NUTS 3 level, from which origin-destination matrices (ODMs) can be compiled. Records with a visit count smaller than 30 are censored (as "< 30") to comply with EU GDPR legislation. The data were upscaled to the entire arrondissement population by using data on the number of Proximus customers and inhabitants in each arrondissement. As an example, the staytime time series for mobility from Brussels-Capital to Antwerpen (NIS 21000 to 11000) is shown in Figure 3 . Our analyses are focused on the covid-19 epidemic in 2020. Similar to many other countries across the world, Belgium was largely isolated in 2020 as a consequence of the imposed containment measures, especially during the defined waves (see Figure 1 ). This rather unique situation justified ignoring cross-border mobility and allowed for the study of Belgium as a quasi-independent geographic entity. To verify to what extent the pandemic affected mobility for all arrondissements, we compared the average daily outward mobility during the ascending and descending phases of both the first and second 2020 covid-19 waves with prepandemic mobility, per 100 000 inhabitants. The former was computed as: where g denotes the considered (home) arrondissement, h the visited arrondissement, ∆t the duration of the considered period and N g the population of arrondissement g. To demonstrate that the spread of covid-19 between arrondissements can be correlated to their bilateral mobility, we followed the method proposed by [12] . As a quantitative standard for the connectivity between arrondissements, we defined a mobility-based connectivity index (CI). This is the logarithm of the daily number of hours P gh spent in arrondissement h by people living in arrondissement g and vice versa (P hg ), normalised over the duration of the considered time period ∆t: Note that the CI does not normalise over the number of inhabitants, as it is assumed that the absolute -rather than the relative -number of visitors will affect viral spread. Further note that, by definition, this index is indifferent to the direction of movement (i.e. CI gh = CI hg ). This reflects the assumption that sars-cov-2 is equally well spread geographically by healthy subjects visiting infected subjects, as vice versa. The preprocessed ("raw") covid-19 time series under consideration typically show high variability between days, especially when looking at smaller spatial units such as arrondissements. Part of this can be explained by differences in reporting between week and weekend, but much of the variation comes from the inherent stochastic nature of (detecting) infections, hospitalisations and deaths. A rudimentary one-week window rolling average as in Figure 1 still results in jagged series. More sophisticated smoothing of such data is required and routinely performed to meaningfully compare trends between different regions [26] . Additionally, it is of interest to systematically quantify the expected variability around the raw data. To tackle both issues, we chose to model each time series using a Generalised Additive Mixed Model (GAMM) approach with a log link [27] (see Supplementary Material). Figure 4 illustrates the result of the GAMM approach for two Belgian arrondissements, Brussels-Capital and Charleroi, using as input the number of confirmed cases per 100 000 inhabitants. This plot also visualises the corresponding rolling average with a one-week window. Note that the incidence peaks in Figure 4 appear to occur after the national hospitalisation peak. This is due to the fact that testing was not yet up to date during the first wave. We assume that this was the case homogeneously throughout Belgium, such that incidence waves may still be mutually compared. As can be seen in this figure, the GAMM approach results in smoother curves whilst keeping relevant epidemic trends. Additionally, this method allow us to incorporate both autocorrelation and overdispersion, which less advanced methods do not. We simulated a set of 100 GAMM time series per wave for every arrondissement (see Supplementary Material). This set is used to repeatedly perform the analyses between two time series (see below), resulting in an equally large set of numerical values. The distribution of these values is in turn used to assess the uncertainty of the analysis in a methodological and quantitative fashion. We remark that a similar approach is not required for the mobility time series, because these data are much less noisy to begin with, and because we will mostly use a cumulative quantity, rather than daily data. We systematically compared incidence and hospitalisation time series for 43 × 42/2 = 903 unique arrondissement couples, for both the first and the second 2020 covid-19 wave. Local waves will generally resemble the epidemic behaviour at the national level, but minor differences between arrondissements in amplitude and timing can be indicative of the geographical direction Figure 4 . We are not interested in the value of the correlation, but in the horizontal position of the maximum. Right: illustration of the alignment enforced by the dynamic time warping between the same two time series. This particular normalised DTW distance is 0.81. and intensity of the viral spread. We anticipated that strongly connected arrondissements would have near-synchronous time series with a highly analogous shape. We assessed morphological similarities between smoothed time series using dynamic time warping (DTW). Difference in timing (lag) was computed using time-lagged cross-correlation (TLCC). Both methods first impose standardisation to zero mean and unit standard deviation, such that the analysis on absolute and relative (per-100k) time series generates identical results. TLCC quantifies how synchronised two time series are by computing the correlation between them at different time shifts. The highest correlation indicates the time lag with which one signal leads the other [28, 29] . For instance, Figure 4 suggests visually that the peak of the confirmed number of cases in Brussels occurs earlier than in Charleroi. This is reflected by the maximum TLCC occurring at a lag of −6 days ( Figure 5 ), which allows us to conclude that Brussels was about 6 days ahead of Charleroi for what concerns the number of confirmed cases. Note however that this alone does not imply any type of causation. DTW gauges morphological similarity by minimising a Euclidean measure of distance between two time series, after standard score normalisation and after synchronisation based on TLCC values. The latter precondition is important in order to ascertain that the DTW value does not intrinsically correlate with the TLCC value. Minimising said distance can be thought of as stretching or compressing the second series with the aim of having it resemble as much as possible the first (reference) series [26, 30] , as shown in the right panel of Figure 5 . This requires that a DTW value ("distance") of 0 indicates identity. As the DTW distance scales with the number of analysed data points, will normalise over the duration of the considered period. During both phases of the first covid-19 wave, the average mobility outside the home arrondissement dropped substantially in comparison to the baseline mobility ( Figure 6 ). This is less pronounced in the ascending part of the first wave, because the stringent national lockdown was installed only halfway this period. This is corroborated by the staytime time series shown in Figure 1 . The latter also indicates that mobility fluctuates periodically as a consequence of the weekends. During the entire descending part of the first wave, teleworking, closure of non-essential shops and a ban on non-essential long-distance movements were imposed. Consequently, it is clear that the imposed lockdown measures had an immediate effect on the daily mobility. In contrast, during the ascending part of the second wave, mobility was similar as in pre-covid-19 times. This indicates that in terms of mobility within Belgium, the Belgian population behaved almost as before the start of the covid-19 pandemic by September 2020. Of course, with the rapidly increasing numbers throughout September and October 2020, measures were gradually imposed again and social behaviour changed. As a result, the mobility dropped again significantly during the descending part of the second wave as compared to the baseline to a level similar as during the first wave. Clearly, the fact that the level of mobility was not consistent during the entire second covid-19 wave, makes it harder to investigate the interplay between mobility and covid-19 spread for this wave. Second, the (symmetric) heatmaps containing CIs are shown and compared for the two waves ( Figure 7) . The change in mobility over the waves was relatively consistent over the different arrondissements (about 10% higher during the second wave), with some exceptions for arrondissements Arlon, Bastogne, Virton, Veurle and Philippeville. The latter can be attributed to their low number of inhabitants (making any change in mobility relatively large). This consistency supports the use of these CIs as a proxy for connectivity [12] . In general the mobility patterns did not differ much between the 2020 covid-19 waves, so we only report the CI matrix for the first covid-19 wave, whilst the one for the second wave can be found in the heatmap in A2. From this heatmap it can be inferred that most arrondissements containing a province capital are relatively well connected to any other arrondissement, with Arlon being the only exception to this rule. The latter is expected because of its location and the fact that a considerable number of its inhabitants commute to the Grand Duchy of Luxembourg. Further, we may conclude that the least connected arrondissements can be found in the province of Luxemburg, but also along the Walloon stretch of the French border (Neufchateau, Philippeville, Thuin, Virton). On the other hand, the most connected arrondissements are typically located in Flanders, in addition to Brussels-Capital, which is in agreement with former studies on the general metropolitan connectivity in Belgium [31] . All results presented above are in line with the recent comprehensive work by [32] . We now turn to the analysis of covid-19 incidence and hospitalisation data. After generating 100 bootstrapped time series for each arrondissement, we computed the TLCC and DTW values for all GAMM-fitted series between each unique pair of arrondissements. From this set of values, we calculated the mean and standard deviation of the TLCC lag and DTW distance. The heatmaps in Figure 8 show the average time lag (in days) over all bootstrapped time series for each pair of arrondissements. The corresponding standard deviation can be found in the Supplementary material. From Figure 8 -(a), we infer amongst other things that covid-19 hospitalisations in Brussels-Capital during the first 2020 wave were about 2.34 ± 0.68 days ahead of the ones in the arrondissement Antwerpen. According to Sciensano, the regions in the southwest of province Limburg and the so-called Borinage region containing the province capital Mons were defined as the initial clusters [3]. These regions correspond with the arrondissements Tongeren, Hasselt and Mons. Upon inspection of Figure 8 , this is most clearly visible for Mons. For Tongeren and Hasselt this is less convincingly so, arguably because the TLCC gauges the lag of the entire wave, whilst Sciensano identified initial clusters based on local index patients. Interestingly, a distinction can be made between both waves when it comes to TLCC values, as opposed to the mobility patterns that did not change much over the two 2020 covid-19 waves (Figure 7) . The lags observed during the second wave are much larger than the ones observed during the first wave. This is directly related to the duration of the second wave, which was longer and also resulted in a higher number of hospitalisations. Additionally, the second-wave heatmap shows lower variability. This can of course be understood as the result of national homogenising during the course of the covid-19 crisis, especially during the summer when mobility was almost at the prepandemic level ( Figure 6 ). This suggests that a spatial analysis is especially pertinent at the start of a pandemic. Similarly, Figure 9 provides a comparison between arrondissements for both 2020 covid-19 waves in terms of the average DTW distance between the corresponding hospitalisation time series and for each bootstrapping realisation. The heatmaps in Figure 9 illustrate the similarity between covid-19 waves in terms of DTW values. For the first wave, the largest distances are observed for Diksmuide, Ieper and Huy, which also have larger TLCC values. For the former two arrondissements this can be understood by observing that they are amongst the least connected arrondissements in Flanders according to Figure 7 . The highest average DTW value was observed between Diksmuide and Huy, 0.52 ± 0.10 (figure 9), which is in agreement with the low connectivity between them (Figure 7) . For the second wave, Maaseik and Tielt are more distant from the other arrondissements in terms of the DTW distance and both are lagging behind on the others, as can be seen in Figure 8-(b) . The highest average DTW value for the second wave was observed between Maaseik and Liège, 0.69 ± 0.17, even though the connectivity between Maaseik and Liège is not that low. In line with the mobility restrictions that were less strict Fig. 9 Normalised dynamic time warping distances computed between each unique pair of Belgian arrondissements using data on hospitalisations, for first (a) and second (b) covid-19 waves. during the second 2020 wave, and in agreement with the TLCC results, this indicates that the link between mobility and covid-19 spread was much more cluttered during the second 2020 wave. We computed the CI involving each arrondissement and one of the initial clusters (Tongeren, Hasselt and Mons), and compared it to the corresponding number of excess deaths, for each week following the introduction of covid-19 in Belgium. This is the first indication confirming initial clusters and gauging their influence on sars-cov-2 spread. We verified the existence of a linear relationship between excess deaths and the CI to either of the initial clusters for every week since the start of the first wave in Belgium, as demonstrated before by (author?) [12] for other countries in Europe. From mid-March until the beginning of May, the initial spread of COVID-19 in Belgium was indeed related to the connectivity to the initial clusters located in the arrondissements Tongeren, Hasselt, and Mons ( Figure 10 ). This relationship is the most pronounced for Tongeren with a maximum R 2 of 0.48 in the week of April 22, 2020 (Figure 11, left) . In this week, the most connected arrondissements to Tongeren (such as Hasselt and Liège, see Figure A1 ) had a high percentage of excess deaths, whilst poorly connected arrondissements had fewer excess deaths. By the week of May 13, 2020, this relationship had weakened and connectivity to Tongeren no longer influenced the number of excess deaths (Figure 11 , right). This decline can be explained by the fact that by that time covid-19 had spread across the entire country, so sources of covid-19 infection were spread more homogeneously across Belgium. Remarkably, the influence of connectivity (i.e. R 2 -values ¿ 0.3) occurred mainly after the first hospitalisation peak and well after the start of the first lockdown, in a period when the mobility to and from the initial clusters had already reached lockdown level (Figure 10, bottom) . In France, on the other hand, the influence of connectivity to the Haut-Rhin department on the initial spread disappeared 14 days after the first lockdown measures [12] , which corresponds to the period between first symptoms and death [33] . The extra delay in response of the virus spread to mobility changes in Belgium can be explained as follows: in contrast to the case of the Haut-Rhin department in France, no big spreading event was documented in one of the initial clusters in Belgium, such that the effect on the percentage of excess deaths was more gradual. Furthermore, Belgium is a very connected country, and hence the difference between most and least connected arrondissements is smaller. The combination of both aspects makes the relationship between connectivity and initial spread in general less clear (lower R 2 -values) than in the study of [12] . In a final step, we also evaluated whether the connectivity indices CI gh during the 2020 covid-19 waves (and their ascending/descending parts) are reflected in the average DTW distances and TLCC absolute-value lags between arrondissements g and h. For this purpose, we computed the Pearson's correlation and the Spearman's rank correlation coefficients for all unique (g, h) pairs. The Pearson's coefficient R is used to identify a linear relationship between data; the Spearman's coefficient ρ accounts for monotonic relationships that are not necessarily linear [34] , and is less sensitive to outliers. We consider both quantities because we are a priori agnostic on the nature of the The results for covid-19 hospitalisations are shown in Figure 12 . Irrespective of the considered wave, the coefficient always point towards a negative correlation, both for the TLCC lag and the DTW distance. This indicates that covid-19 did in fact spread in a more similar fashion (low DTW values) in highly connected arrondissements, whilst this similarity decreased with a decreasing connectivity between the arrondissements. Additionally, it demonstrates that covid-19 time series of strongly connected regions quickly succeed each other (low time lags). The highest Pearson's and Spearman's correlation coefficients are retrieved for the DTW values, in particular for the first wave. In order to complement the results shown in Figure 6 , these correlation analyses have been accomplished for the ascending and descending parts of the waves as well. These results are less convincing when analysing only the ascending or decending phase of the wave. Correlations for the DTW values are shown in Supplementary Material (Figures A6 and A7) , but are left out entirely for the TLCC values, because they contained no meaningful information. This demonstrates that the analysis of the entire wave is needed to properly assess relations between morphology, synchronicity, and connectivity. We found a more quantitatively endorsed value of this correlation, by computing the Pearson's and Spearman's rank correlation coefficient for all (Tables A2 and A3 ). As explained in section 4.1, lockdown measures were established in a more consistent way during this entire first wave. The late adoption of lockdown measures during the second wave probably complicates unveiling a clear relationship between the mobility and similarity in disease evolution because ad hoc movements across a larger distance were again allowed. The correlation coefficients provide a clear indication that mobility is related to the spatial spread of covid-19. This confirms results from the study by [35] , where a (non-linear) spatial linkage between covid-19 and mobility is observed for Belgium as well. Still, there are several other mobility-related factors that must have also played a role, but that could not be included in our analysis since relevant data are not available at the appropriate spatiotemporal resolution. These factors include the reason behind a mobility event (work, leisure, education,. . . ), the number of stops until the final destination of one mobility event, and so on. Furthermore, by working at NUTS 3 level, we ignored the role of short-distance mobility (mobility within an arrondissement) due to, for instance, local shopping, leisure and educational activities. Taking those short-distance movements into account, would probably allow us to identify a stronger relationship between mobility and covid-19 spread, as also [17] pointed out that the spatial autocorrelation between covid-19 incidence dropped beyond 15 kilometres for the first wave -whilst mostly exceeding 50 kilometres during the second. This again indicates that the assumption of homogeneity within the arrondissements seems to be more valid at the start of the pandemic. Still, an even more fine-grained analysis would require the use of covid-19-related time series at municipal level, which are generally very noisy due to the relatively low number of cases and hospitalisations per municipality, and hence complicating further statistical analysis without gaining much insight. Results in this paper were presented in three stages that build up to the main conclusion: mobility within Belgium is indicative for the spatio-temporal spread of sars-cov-2. We first showed that mobility between geographical regions at NUTS 3 level (arrondissements) in Belgium was reduced during the first and second wave of the covid-19 pandemic in 2020. Particularly, we quantified the strength of the link between the connectivity to arrondissements that were affected first on the one hand, and local excess deaths on the other hand. Second, we quantified time lag and morphological similarity between local covid-19-related time series using dynamic time warping and time-lagged cross-correlation. This approach proved to be meaningful and intuitive for both incidence and hospitalisation data when considering the entire wave. Third, we assessed the strength of the relationship between the connectivity index of pairs of arrondissements on the one hand, and DTW or TLCC values on the other. We observed a significantly nonzero anticorrelation, notably for DTW distances. This confirms that strongly connected arrondissements will exhibit morphologically similar time series, that will be (on average) roughly synchronised. The techniques developed in and conclusions drawn from this research demonstrate that a spatio-temporal data analysis of mobility and epidemiological data at NUTS 3 level in Belgium is feasible and informative. This motivates data analysis for other sociological and demographic aspects of society that may be employed as a metric for the current pandemic. Moreover, the conclusions imply that a spatially explicit model of covid-19 in Belgium ought to include a notion of mobility. The additional model complexity is justified by the demonstrated impact of mobility on properties of local time series. Supplementary information. This article is associated with a supplementary document containing additional information on the Belgian geography, a more in-depth discussion of the GAMM fitting and bootstrapping, and with additional results. It is currently included as an appendix. Vergeynst contributed equally to this work. Jenna Vergeynst was responsible for early exploratory work with time series analysis, with early examination of mobility trends, and with the full analysis of excess deaths. Gisele Miranda was occupied with the realisation of GAMM fits, the calculation of corresponding DTW and TLCC values, and the conception of the heat plots with mean values and standard deviations of these quantities. Michiel Rollier was responsible for processing of the mobility data, development of the correlation coefficients for the scatter plots, and for general overview, formatting and wording of the document. Joris Meys made important contributions regarding the technical background of GAMM fitting, as well as scrutinising statistical methods. Tijs Alleman contributed to the interpretation and processing of the covid-19 time series from a modeller's point of view. Ellen Van De Vijver made critical suggestions and corrections from a geostatistical point of view. The Belgian Collaborative Group on covid-19 processed and provided all data required for this analysis, and commented on its analysis. Ingmar Nopens is Tijs Alleman's doctoral advisor. Jan Baetens is Michiel Rollier's doctoral advisor. Both, and in particular the latter, gave formal advice and structural guidelines, and maintained the focus of this work. All authors were in close communication and reviewed the final document. All authors consent to its submission to the arXiy. Funding. This work was supported by the UGent Special Research Fund, Belgium, by the Research Foundation Flanders (FWO), Belgium, project numbers G0G2920 and 3G0G9820 and by VZW 100 km Dodentocht Kadee, Belgium through the organization of the 2020 100 km COVID-Challenge. Further, the computational resources and services used in this work were also provided by the VSC (Flemish Supercomputer Center ) , funded by FWO, Belgium and the Flemish Government . The funding sources played no role in study design; in the collection, analysis and interpretation of data; in the writing of the report; nor in the decision to submit the article for publication. A.1 Additional geographical information Figure A1 illustrates the CI to Tongeren, the major initial cluster. Tongeren is most strongly connected to Hasselt and Liège. Naturally, a strong correlation between CI and geographical distance is observed. Smoothing of time series and systematic estimation of data variability is used in this paper in order to establish a meaningful comparison between time series. This includes a quantification of the uncertainties associated with the values calculated from the dynamic time warping and time-lagged cross-correlation algorithms. Our approach makes use of GAMM fitting and bootstrapping [27] . For every time t we modelled the expected count of events E(Y t ) per 100k inhabitants as E(Y t ) = exp(α + s(X) t β) + t . In this equation, α represents a general intercept, s(X) t the t-th row of a matrix with the projection of time vector X on a second-order penalised B-spline basis and β a vector of coefficients for the fixed effects. We chose the dimension of the spline basis to roughly match the number of weeks in the dataset. This allows for enough detail to catch peaks in number of events, whilst the penalisation helps the model withstand erratic fluctuations [36] . The residuals t are modelled according to a first-order autoregressive process (AR1) to account for the auto-correlation in the time series. Estimation was done using a quasi-likelihood method that assumes a linear relation between the mean and the variance, to account for the documented overdispersion in covid-19 transmission [37] . For the details on parametrisation and estimation, we refer to [27] . For every raw time series, we simulated 100 GAMM curves using a parametric bootstrap procedure based on the spline coefficients. In this procedure, we treated the coefficients as originating from a multivariate normal distribution, with the estimated value as mean and the estimated variance-covariance matrix of these coefficients as the variance structure. For every simulated time series a vector with coefficient values β was randomly drawn from this distribution, and multiplied with the matrix s(X t ). We carried out all calculations with the mgcv package [27] in the statistical software R [38] . Note, for completeness, that due to the log transformation the model fit becomes unstable if there are long periods without events. To increase numerical stability, we simply added 1 to the data prior to fitting in these cases. This correction was subsequently subtracted from the resulting predictions. For each arrondissement, we computed 100 time series using the bootstrapping approach. An overview of the obtained time series is shown in Figure A3 for four different arrondissements: Brussels and Antwerpen (the two most populated arrondissements in Belgium), and Diksmuide and Philippeville (amongst the least populated arrondissements). This figure shows the 95% confidence intervals, given all times series generated for each illustrated region using the bootstrapping approach, together with the observations, the GAMM fitted curve and the one-week rolling average for the covid-19 hospitalisations during the first epidemic wave. It is clear that the GAMM fitted curve is much smoother than the one defined by the rolling average. As expected from the lower signal-to-noise ratio for sparsely inhabited arrondissements, their confidence intervals are much wider. Essentially, we use GAMMs as a more advanced smoothing technique. Contrary to moving averages and other local smoothing techniques, they allow us to incorporate both autocorrelation and overdispersion. The resulting smooth curves are in general less sensitive to erratic fluctuations in the data, whilst still sufficiently flexible to describe general trends (and differences between these) in a time series of number of events ( [36] , [27] ). The GAMM framework also offers a more formal estimation of the uncertainty on the parameters. This allows for a computationally efficient method to construct bootstrap samples from these general trends. A possible weakness lies in the fact that this estimate of uncertainty relies heavily on a number of assumptions. First of all, we assume that the β coefficients follow a multivariate normal distribution. Whilst this distribution is not guaranteed, the used Laplace approximation performs well for a sufficiently large sample size [39] . We argue that the amount of data used is sufficient to expect little deviation from this assumption. Second, we assume a linear relationship between the mean and variance of predictions. This approach, often referred to as quasi-Poisson, has been used in numerous other analyses (e.g. (author?) [40] , (author?) [41] , (author?) [42] and others ). Yet, assuming a quadratic relationship between mean and variance would be more equivalent to the negative binomial distribution assumed by [37] . On the other hand, such an approach would give larger values less weight in the fit compared to the quasi-Poisson method [43] . Comparison of both methods for a selection of arrondissements showed that assuming a quadratic relationship would lead to a systematic underestimation of the peak height. By analysing the squared deviation from weekly averages, we concluded that the linear assumption could be defended. Only for the few cases with a very large number of events, variance would be underestimated. In the majority of cases, the linear relation would slightly overestimate the variance, making the approach more conservative. Figure A4 contains TLCC values for first and second wave incidence data. Figure A5 demonstrates the average DTW values and the corresponding standard deviations, calculated from 100 GAMM realisations for first and second wave incidence data. These values quantify a correlation between connectivity on the one hand, and morphological similarity on the other. Similar scatter plots for TLCC values have been left out, because these values can only be meaningfully calculated from an analysis of the entire wave, as in Figure 12 . The same information as Figure A6 , but from the descending parts of the waves. Nopens, I., Baetens, J.M.: Assessing the effects of non-pharmaceutical interventions on SARS-CoV-2 spread in Belgium by means of a compartmental , age-stratified , extended SEIQRD model and public mobility data. Epidemics 37, 100505 (2021) [5] Abrams, S., Wambua, J., Santermans, E., Willem, L., Kuylen, E., Coletti, P., Libin, P., Faes, C., Petrof, O., Herzog, S.A., Beutels, P., Hens, N.: Modeling the early phase of the belgian covid-19 epidemic using a stochastic compartmental model and studying its implied future trajectories. medRxiv (2020 One repatriated Belgian has tested positive for the novel coronavirus Returns from Italy push COVID-19 tally higher the Belgian Collaborative Group on COVID-19 Hospital Surveillance, 0.pdf Accessed How the individual human mobility spatio-temporally shapes the disease transmission dynamics The role of population heterogeneity and human mobility in the spread of pandemic influenza Quantifying the impact of human mobility on malaria Impact of human mobility on the emergence of dengue epidemics in Pakistan How human mobility explains the initial spread of COVID-19 Transportation of coronavirus disease from wuhan to other cities in china Population flow drives spatio-temporal distribution of covid-19 in china The role of spatio-temporal information to govern the covid-19 pandemic: A european perspective Spatial analysis of covid-19 incidence and the sociodemographic context in brazil Geostatistical analysis of one year of COVID-19 data in Belgium A mathematical model for the spatiotemporal epidemic spreading of COVID19. medRxiv Metapopulation modeling of COVID-19 advancing into the countryside: an analysis of mitigation strategies for Brazil. medRxiv A parsimonious approach for spatial transmission and heterogeneity in the COVID-19 propagation: Modelling the COVID-19 propagation Modelling and predicting the spatiotemporal spread of covid-19, associated deaths and impact of key risk factors in england Statbel: Number of deaths per day, sex, age, region, province, district Alignment of curves by dynamic time warping Generalized Additive Models: An Introduction with R, 2nd edn Analysis of detrended time-lagged cross-correlation between two nonstationary time series Time lags between exanthematous illness attributed to zika virus, guillain-barré syndrome, and microcephaly Computing and visualizing dynamic time warping alignments in r: the dtw package Metropoolvorming in België en Vlaanderen:De polycentrische ruimtelijke structuur van de arbeidsmarkt Excess deaths associated with covid-19 pandemic in 2020: age and sex disaggregated time series analysis in 29 high income countries The incubation period of coronavirus disease 2019 (CoVID-19) from publicly reported confirmed cases: Estimation and application Pearson correlation coefficient Non-linear spatial linkage between COVID-19 pandemic and mobility in ten countries: A lesson for future wave Flexible smoothing with B-splines and penalties Estimating the overdispersion in covid-19 transmission using outbreak sizes outside china R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing Some asymptotic results on generalized penalized spline smoothing Emergence of kawasaki disease related to sars-cov-2 infection in an epicentre of the french covid-19 epidemic: a time-series analysis Coronavirus Disease 2019 Pandemic: Impact Caused by School Closure and National Lockdown on Pediatric Visits and Admissions for Viral and Nonviral Infections-a Time Series Analysis Forecasting the 2020 covid-19 epidemic: A multivariate quasi-poisson regression to model the evolution of new cases in chile Quasi-poisson vs. negative binomial regression: How should we model overdispersed count data? Acknowledgements. The authors wish to express their gratitude to Proximus, Belgium's leading telecom operator, for their generous commitment in the context of the Task Force 'Data & Technology against Corona'; in particular for providing daily detailed mobility data.