key: cord-1039159-auvn8rca
authors: Selinger, C.; Choisy, M.; Alizon, S.
title: Predicting COVID-19 incidence in French hospitals using human contact network analytics
date: 2021-06-23
journal: nan
DOI: 10.1101/2021.06.17.21258666
sha: 6b48f83f0bdca86fb3b509caddb9270092dca5bb
doc_id: 1039159
cord_uid: auvn8rca

Coronavirus disease (COVID-19) was detected in Wuhan, China in 2019 and spread worldwide within few weeks. The COVID-19 epidemic started to gain traction in France in March 2020. Sub-national hospital admissions and deaths were then recorded daily and served as the main policy indicators. Concurrently, mobile phone positioning data have been curated to determine the frequency of users being colocalized within a given distance. Contrarily to individual tracking data, these can provide a proxy of human contact networks between subnational administrative units. Motivated by numerous studies correlating human mobility data and disease incidence, we developed predictive time series models of hospital incidence between July 2020 and April 2021. Adding human contact network analytics such as clustering coefficients, contact network strength, null links or curvature as regressors, we found that predictions can be improved substantially (more than 50%) both at the national and sub-national for up to two weeks. Our sub-national analysis also revealed the importance of spatial structure, as incidence in colocalized administrative units improved predictions. This original application of network analytics from co-localisation data to epidemic spread opens new perspectives for epidemics forecasting and public health.

The COVID-19 pandemic revealed the importance of identifying early predictors of epidemic dynamics. Indeed, for this infectious disease, hospital admission data is usually the most reliable indicator but it suffers from a two weeks delay with the current state of the epidemic [1, 2] . Screening test results can provide closer monitoring but they often suffer from strong sampling biases.

Mobility data gathered daily by telephone providers and internet services can help to understand epidemic spread, 40 as shown early in the pandemic using individual mobility [3] . More recently, [4, 5, 6] observed that epidemic growth ratios and effective reproduction numbers correlated with human mobility, whereas other studies have used mobility data as early predictors for disease incidence [7, 8, 9] . For instance, [10] used mobile phone location data provided by Google Mobility Trends to calibrate individual movements. However, a limitation of such data is that it only involves individual movement but disease transmission requires at least two people. Here, we use data from Facebook of individuals with respect to their administrative unit of residence (see Methods for details). Similar data has been used by [11] and [12] to simulate meta-population models and investigate the effect of interventions on the resulting network. However, to our knowledge, this data has not yet been used to predict key features of the epidemic such as hospital admissions or deaths. 50 The body of literature for COVID-19 time series analyses linking epidemiological data to Public Health measures or environmental factors is already widespread, ranging from cross-correlation analysis to complex predictive machine learning tasks [13, 14, 15, 16, 17, 18, 19] . Some methods such as neural networks provide powerful predictions, but the interpretation of the results is offset by the method's convolutive nature. Here, we choose a middle ground and a more straightforward approach by combining standard Autoregressive Integrated Moving Average (ARIMA) models 55 with original regressors of human contact derived from colocation data.

Facebook Inc. recorded the position of mobile app users, who agreed to turn location tracking on. Every week, users were assigned resident administrative regions based on consistent overnight stays. Given two administrative regions (the 'départements') and , we denote by ( ) and ( ) the number of users assigned to each département on

week . The colocation probability between and for week is calculated as follows. The week is partitioned into 5 minutes time bins 1 , … , 2016 . We denote ( ) the number of users assigned to and located in the same For continental France, colocation probabilities were provided at the département level, with a total of = 94 départements covered. Therefore, we constructed 94x94 matrices that capture the weekly colocation probabilities between users depending on their département of residence. These can readily be used to construct weighted contact networks. We also investigated the coverage of Facebook users with assigned département of residence among to French population from the 2019 census data provided by insee.fr.

To summarize temporal changes in the resulting weighted contact networks between French départements, we calculated for each week three node-based (clustering, strength and null links) and one edge-based (curvature) graph descriptors. The node-based descriptors were calculated by constructing undirected, weighted graphs from colocation data. For clustering, we used the local clustering coefficient ('transitivity' function in the R package igraph), which calculates for a given node all edges weights between the node neighbours relative to the maximal neighbourhood clique size. This clustering coefficient equals one if the node is contained in a clique, i.e. the node and all its neighbours are connected to each other. The strength of a node is calculated by summing the edge weights incident to the node ('strength' function in igraph). The number of null links counts the number of incident edges with zero weight and is bounded from above by −1. A high centrality score for a particular node indicates that colocation between other nodes is decreasing, forcing shortest paths to go through that node, which becomes more central. Curvature was calculated in the sense of discrete Ollivier-Ricci (OR) curvature [20] with a freely available Python code at https://github. com/saibalmars/GraphRicciCurvature. The OR curvature of an edge compares the distance between two nodes and with the optimal transport cost  between uniform measures on unit balls centered in and :

Numerical studies showed that edges with positive curvature tend to be part of a cluster, whereas edges with negative curvature tend to act as bridges between clusters [21, 22, 23] . Therefore, a decrease in curvature can be seen as an indicator for connectivity breakdown. . CC-BY-NC-ND 4.0 International license It is made available under a display the preprint in perpetuity.

is the author/funder, who has granted medRxiv a license to (which was not certified by peer review) for this preprint For comparison, we also incorporated Google Mobility Report data www.google.com/covid19/mobility/, 70 from which we obtain the daily change in percentage with respect to a pre-pandemic baseline in visits to grocery stores, parks, workplace, residential areas, transit and retail stores. We calculated cumulative weekly changes matching the dates of the weekly recorded colocation data. Since Google Mobility Reports were not available at the départementlevel, we used these data only for national-level analyses.

We downloaded the positive testing rate, i.e. the ratio of positive tests over all tests from www.ourworldindata. 75 org. To remove reporting bias, we calculated the rolling average with a right-aligned seven-day window and calculated the weekly positive test rate matching the dates of the weekly recorded colocation data.

We downloaded daily temperature data by département which was aggregated into weekly minimum, maximum and average temperature. For the same data, we also calculated quantiles for national analysis.

COVID-19 hospital data for France was downloaded from www.data.gouv.fr. The data comprised daily hospital 80 admissions, ICU admissions, and deaths in hospitals by département. To match the Facebook colocation data, we only considered départements in continental France spanning the time period from March 24, 2020 to March 30, 2021. For the country-wide analysis, we summed the daily incidence in all départements and calculated right-aligned seven-day rolling averages to remove reporting bias. Finally, we calculated cumulative weekly incidence matching the dates of the weekly recorded colocation data. The weekly incidence data were log-transformed prior to the time series analysis.

The time series analysis was performed using the R package forecast. For each week, we trained and tuned Autoregressive Integrated Moving Average (ARIMA) models on historic data starting from March 24, 2020 and performed forecasting for one and two-week horizons. Model parsimony during tuning was determined by the Akaike Information Criterion (AIC). The first four weeks were used for training only, and we started prediction at the fifth week of our records. At any particular week, starting from June 2020, the prediction accuracy was evaluated in terms of mean 90 average error (MAE) between incidence data and predictions for the following two weeks. More precisely, given logtransformed incidence data from week , with exogenous regressors , the regression model with ARIMA error [24] is defined by

where the regression error is an ARIMA error with the order of the autoregressive part and is the order of the moving average part and are Gaussian errors. Note that and can undergo differencing up to order to ensure 95 stationarity.

We used exogenous regressors based on quantiles of network descriptors, Facebook colocation data, Google mo- is the author/funder, who has granted medRxiv a license to (which was not certified by peer review) for this preprint The copyright holder this version posted June 23, 2021. ; https://doi.org/10.1101/2021.06.17.21258666 doi: medRxiv preprint bility reports, and positive testing rates to tune parameters , , and and to train each model. We then compared the prediction results from models using regressor data to those using historic incidence data alone.

The predictive power of combinations of regressors was assessed by following a step-wise method. For each week, 100 we started using a single regressor and incremented the set of regressors as long as the mean average error of the prediction decreased, thereby yielding a linear combination of regressors with minimal prediction error per week.

For the département-level models, we also applied regression models with ARIMA error. Node-based regressors (e.g. node strength, null links) were based on the actual values, whereas the edge-based curvature regressors consisted of quantiles relative to the département of interest. In addition, for each département, we also used incidence data for 105 hospital admission and mortality in départements with the ten strongest colocation links to the département of interest (denoted by incid_1, incid_2, etc.).

The analysis of network descriptors showed a highly dynamic range for the clustering coefficients in relationship 110 with the hospital incidence data and underlying non-pharmaceutical interventions (Fig 1) . For instance, in the midst of the first national lockdown in April 2020, the minimum clustering coefficient was at 0.6 but by July 2020 it had regained almost maximum levels (close to 1). The number of null links, which offers a more discrete measure of decreasing connections between départements, reached 72 out of 93 possible null links (77%) for several départements in April 2020. In contrast, several départements exhibited small changes in null links, suggesting that the mobility 115 restrictions impacted the colocation probabilities unevenly across the country. The network strength more than tripled during the same time period in the départements, reaching the strongest colocation probability; a high level which was maintained throughout October 2020, while hospitalization was increasing again in September. The curvature trends followed that of the network strength but were more sensitive to perturbations, for instance around January 1st 2021.

We also investigated spatial snapshots of the network descriptors prior to and during the second lockdown period 120 at the beginning of November 2020. Figure 2 shows, for these two weeks, the 50 strongest colocation weights per week between two départements (red edges) and the number of null links in the départements involved at the end of these links. These result point towards spatial clusters of départements which maintained high colocation weights between each other (e.g. in the Paris region or in Eastern France) while shutting down connections to many other départements.

Correlation plots (Fig S1) across all recorded weeks indicated a strong positive correlation between network 125 strength, curvature and Google Mobility (retail, grocery, parks, transit), but negative correlations between these quantities and null links. Interestingly, within-département colocation was only weakly correlated with Google Mobility data, supporting the hypothesis that colocation data could yield signals qualitatively different from more classical is the author/funder, who has granted medRxiv a license to (which was not certified by peer review) for this preprint 130 The time series analysis showed that predictions made using only incidence data from past time points led to an average mean error of 1,632 hospitalisations per week and 238 deaths per week (regressor group "none" in Fig 3) .

Including a single exogenous regressor across all time points clearly reduced the prediction error by about 13% for hospital admissions and by 50% for hospital deaths (Fig 3) . Quantiles from Facebook colocation data such as network strength, clustering or null links were among regressors that reduced prediction errors most. Google Mobility data 135 related to recreation and residential movement also greatly improved predictions regarding mortality but not hospital admissions. Positive testing rate only appeared to improve predictions with two-week horizons. Finally, temperature did not belong the best predictors for hospital admissions or mortality.

To further our understanding of temporal dynamics of hospital admissions, in particular at epidemic turning points, we determined for each week combinations of regressors that reduced the error most. For the one-week prediction 140 horizon, we obtained almost perfect fits using regressor combinations mostly related to colocation networks (e.g. clustering and curvature quantiles) and, to a lesser extent, temperature (Fig 5) . Although combining regressors improved the model fit at the two-week horizon compared to the model without exogenous regressors (the red curve), the model was not capable of reproducing the precision seen at the one-week prediction horizon. Similarly, combining regressors to predict hospital deaths resulted in almost perfect fits for the one-week horizon, and greatly improved fits for the 145 two-week horizon, albeit slightly overestimating peak mortality (Fig 4) .

Performing the same analysis at the département level revealed the importance of spatial structure. In particular for Paris region départements (Paris, Yvelines, Seine-Saint-Denis, Seine-et-Marne) and Nantes, the predictors that decreased the mean prediction error most when compared to models without regressors were those related to the 150 incidence in highly colocalized départements (Fig 6) . Conversely, for the metropolitan areas of Lyon, Bordeaux, and Marseilles, network descriptors (in red) such as curvature, null links, but also overall between-département colocation weights were among the best predictors of hospital admissions and deaths. This was also the case for two-week predictions of local hospital deaths, and to a lesser extent for two-week prediction of hospital admissions (Fig 7) .

Numerous studies have put forward the potential insights that can be gained from mobile phone usage data in order to understand the spread of infectious diseases. However, this data is usually analysed from an individual perspective, by following where the users are and how they move. The Facebook Inc. data differs in that it contains colocation data. is the author/funder, who has granted medRxiv a license to (which was not certified by peer review) for this preprint We hypothesised that such data could be particularly suited to understand the transmission of respiratory infections, which involves close contact between individuals. To address this question, we developed an explicit network-based 160 analysis to analyse relevant summary statistics regarding colocation data.

Our analysis indicates that human mobility and contact data improves time series prediction of French COVID-19 hospital admissions and deaths by up to 50%. Determining a posteriori the optimal combination of regressors shows that human contact network analytics augmented with temperature and positive testing rate data yields perfect fits at a one-week horizon. Although these combinations are not of predictive nature, they highlight the impact of global 165 network properties, which are a proxy of the extent of human contact in contrast to human mobility alone. Spatial disparities in disease incidence have motivated sub-national policy decisions to manage the pandemic [25] , and our sub-national time series analysis confirms the predictive power of spatial structure since incidence from colocalized administrative units improved most incidence predictions. Notably, we find that human contact network analytics such as curvature, null links, and network strength are prominent predictors for the metropolitan areas of Lyon, Bordeaux, 170 and Marseilles. Further investigations using a meta-population point of view could yield additional insights.

Another asset of our study is that even though our two-week prediction horizon is relatively large compared to other studies, it remains quite robust for predictions of hospital deaths when compared to one-week predictions, even for a posteriori fits. This is also the slightly the case for hospital admissions. The fact that peak incidence was systematically overestimated points to the weakness of using only linear models, especially in situations where the susceptible 175 populations are rapidly depleted.

Our results reveal the contrasting importance of temperature fluctuations, since temperature was not included in the top univariate regressors, but improved a posteriori fits when combined with other regressors. One possible explanation could be that national and sub-national policy changes over the time period considered led to human contact mobility regressors to be favored over temperature in terms of explanatory power.

Our study has several limitations. First, by focusing on the elaboration of contact network analytics we have utilized rather elementary statistical models. These have the advantage to be easy to interpret, but, given the number of features, more sophisticated machine learning techniques such as long short-term memory neural networks or random forests might could allow extracting more information from the data. Second, the interpretation of colocation data as a proxy for infectious contacts remains to be validated in the field, e.g. in community transmission studies [26] . The validation 185 would also require to concurrently taking into account the importance of hospital catchment areas when recording disease incidence [27] . Furthermore, assortativity between users and user coverage might introduce biases that do not reflect the extent of human contacts pertaining to disease transmission.

Taken together, our time series analysis shows the potential of human contact network analytics to improve both predictions and a posteriori model fits of disease incidence as recorded during the COVID-19 pandemic in France. is the author/funder, who has granted medRxiv a license to (which was not certified by peer review) for this preprint The copyright holder this version posted June 23, 2021. ; https://doi.org/10.1101/2021.06.17.21258666 doi: medRxiv preprint Predicting disease incidence with human contact network analytics Combining network analytics with mechanistic models of disease transmission opens promising novel avenues for real-time disease control.

We thank the Data for Good initiative by Facebook Inc. who provided the co-localization data through a data sharing agreement with the University of Montpellier. is the author/funder, who has granted medRxiv a license to (which was not certified by peer review) for this preprint is the author/funder, who has granted medRxiv a license to (which was not certified by peer review) for this preprint is the author/funder, who has granted medRxiv a license to (which was not certified by peer review) for this preprint is the author/funder, who has granted medRxiv a license to (which was not certified by peer review) for this preprint is the author/funder, who has granted medRxiv a license to (which was not certified by peer review) for this preprint is the author/funder, who has granted medRxiv a license to (which was not certified by peer review) for this preprint min_temperature_30%  min_temperature_40%  min_temperature_50%  min_temperature_60%  min_temperature_70%  min_temperature_80%  min_temperature_90%  min_temperature_100%  mean_temperature_0%  mean_temperature_10%  mean_temperature_20%  mean_temperature_30%  mean_temperature_40%  mean_temperature_50%  mean_temperature_60%  mean_temperature_70%  mean_temperature_80%  mean_temperature_90%  mean_temperature_100%  ricci_mean_0%  ricci_mean_10%  ricci_mean_20%  ricci_mean_30%  ricci_mean_40%  ricci_mean_50%  ricci_mean_60%  ricci_mean_70%  ricci_mean_80%  ricci_mean_90%  ricci_mean_100%  null_links_30%  null_links_40%  null_links_50%  null_links_60%  null_links_70%  null_links_80%  null_links_90%  null_links_100%  clustering_0%  clustering_10%  clustering_20%  clustering_30%  clustering_40%  clustering_50%  clustering_60%  clustering_70%  clustering_80%  clustering_90%  clustering_100%  strength_0%  strength_10%  strength_20%  strength_30%  strength_40%  strength_50%  strength_60%  strength_70%  strength_80%  strength_90%  strength_100%  within_departement_colocation_0%  within_departement_colocation_10%  within_departement_colocation_20%  within_departement_colocation_30%  within_departement_colocation_40%  within_departement_colocation_50%  within_departement_colocation_60%  within_departement_colocation_70%  within_departement_colocation_80%  within_departement_colocation_90%  within_departement_colocation_100%  between_departement_colocation_0%  between_departement_colocation_10%  between_departement_colocation_20%  between_departement_colocation_30%  between_departement_colocation_40%  between_departement_colocation_50%  between_departement_colocation_60%  between_departement_colocation_70%  between_departement_colocation_80%  between_departement_colocation_90%  between_departement_colocation_100%  facebook_coverage_0%  facebook_coverage_10%  facebook_coverage_20%  facebook_coverage_30%  facebook_coverage_40%  facebook_coverage_50%  facebook_coverage_60%  facebook_coverage_70%  facebook_coverage_80%  facebook_coverage_90%  facebook_coverage_100%  positive_test_ratio_weekly  retail_and_recreation_percent_change_from_baseline  grocery_and_pharmacy_percent_change_from_baseline  parks_percent_change_from_baseline  transit_stations_percent_change_from_baseline workplaces_percent_change_from_baseline residential_percent_change_from_baseline is the author/funder, who has granted medRxiv a license to (which was not certified by peer review) for this preprint The copyright holder this version posted June 23, 2021. ; https://doi.org/10.1101/2021.06.17.21258666 doi: medRxiv preprint

Estimating the burden of SARS-CoV-2 in France

Memory is key in capturing COVID-19 epidemiological dynamics

The effect of human mobility and control measures on the COVID-19 epidemic in China

Association between mobility patterns and COVID-19 transmission in the USA: a mathematical modelling study

tail/recreation and public transport mobility during non-lockdown periods

Effects of population co-location reduction on cross-county transmission risk of COVID-19 in the united states

An ensemble model based on early predictors to forecast COVID-19 healthcare demand in France

A spatiotemporal tool to project hospital critical care capacity and mortality from covid-19 in us counties

Novel coronavirus (2019-nCoV) early-stage importation risk to Europe

Variation in human mobility and its impact on the risk of future COVID-19 outbreaks in Taiwan

Mobility network models of COVID-19 explain inequities 235 and inform reopening

Modeling nigerian covid-19 cases: A comparative analysis of models and estimators

Predictions for COVID-19 with deep learning models of LSTM

Time series analysis and forecasting of coronavirus disease in indonesia using ARIMA model and PROPHET

Using discrete ricci curvatures to infer COVID-19 epidemic network fragility and systemic risk

COVID-19 pandemic prediction using time series forecasting models

Forecasting COVID-19 confirmed cases, deaths and recoveries: Revisiting established time series 250 modeling through novel applications for the USA and italy

Data analysis of covid-19 pandemic and short-term cumulative case forecasting using machine learning time series methods

Ricci curvature of metric spaces

Community detection on networks with ricci flow

Ricci Curvature of the Internet Topology

Ollivier's ricci curvature, local clustering and curvature-dimension inequalities on graphs

Automatic time series forecasting: the forecast package for R

Local lockdowns outperform global lockdown on the far side of the COVID-19 epidemic curve

Resurgence of SARS-CoV-2: Detection by community viral surveillance

The authors thank the CNRS, the IRD, and acknowledge the itrop HPC (South Green Platform) at IRD Montpellier for providing HPC resources that have contributed to the research results reported within this study (https: //bioinfo.ird.fr/).