title: An Expectation-Based Network Scan Statistic for a COVID-19 Early Warning System
authors: Haycock, Chance; Thorpe-Woods, Edward; Walsh, James; O'Hara, Patrick; Giles, Oscar; Dhir, Neil; Damoulas, Theodoros
date: 2020-12-08

Abstract: The Greater London Authority's (GLA) response to the COVID-19 pandemic includes bringing together multiple large-scale and heterogeneous datasets capturing mobility, transportation and traffic activity over the city of London to better understand 'busyness' and enable targeted interventions and effective policy-making. As part of Project Odysseus, we describe an early-warning system and introduce an expectation-based scan statistic for networks to help the GLA and Transport for London understand the extent to which populations are following government COVID-19 guidelines. We explicitly treat the case of geographically fixed time-series data located on a (road) network and primarily focus on monitoring the dynamics across large regions of the capital. Additionally, we focus on the detection and reporting of significant spatio-temporal regions. Our approach extends the Network-Based Scan Statistic (NBSS) by making it expectation-based (EBP) and by using stochastic processes for time-series forecasting, which enables us to quantify metric uncertainty in both the EBP and NBSS frameworks. We introduce a variant of the metric used in the EBP model which focuses on identifying space-time regions in which activity is quieter than expected.

In response to the current COVID-19 pandemic, governments of half the world's population have enacted some form of "lock-down" [11]: a non-pharmaceutical approach to suppressing the spread of the virus by restricting social interaction, a primary source of COVID-19 transmission [4, 10]. With some countries now beginning to relax these measures, and with a second wave looming, the monitoring of daily human activity has never been more important for policy makers. As we transition out of nationwide lock-downs (e.g. in the UK, France, Spain and Italy) and enter the recovery period, it is of paramount importance to understand to what extent human activity is returning to normal, enabling targeted interventions and effective policy-making, to prevent the resurgence of COVID-19 and to aid the economic recovery. Here, we describe the implementation of an early warning system that enables the monitoring of human activity levels and, ultimately, the detection of regions where levels are higher than expected. This enables us to understand mobility and activity in near real-time, assist public health planners, epidemiologists and policy-makers, and help to optimise the economic recovery. Monitoring the collective movement of millions of people has been done before, e.g. to quantify malaria transmission rates [18]. Clearly, however, large-scale computational studies such as this raise substantial ethical concerns; we note that all data used in this project is anonymised and that the system undergoes continuous review by The Alan Turing Institute's Ethical Advisory Group.

The field of scan statistics is vast and is used for detecting anomalous clusters in spatial or spatio-temporal data [7, 6, 8, 1].
The spatial scan statistic was introduced by Kulldorff [9]; one of its many extensions is the expectation-based version of Neill et al. [15], Neill [13]. In the latter, a large number of spatially localised time-series are monitored simultaneously to detect emerging clusters (i.e. spatial patterns) [15]. By computing the expected count for each recent day at each spatial location, and then comparing these to the more recent counts, Neill et al. [15] improve the timeliness, accuracy and spatial resolution of detection. Shiode & Shiode [17] recently introduced the network-based scan statistic (NetScan), an extension of the original by Kulldorff [9]. NetScan is one of many statistical methods for analysing spatial patterns of points on a network of lines, such as road accident locations along a road network [2]. Focusing on NetScan for brevity: it departs from the original scan approach by replacing the standard planar search window with a network-based search window [17, §2.1], which is then swept across the entire extent of the street network in the area of investigation. By extending the EBP to the network space (see §3), we bring its considerable statistical power to the often complex geometries found in networks; see fig. 1.

Consider a time-series, sampled every hour, at a set of $N_s$ fixed spatial locations $\{x_i\}_{i=1}^{N_s}$ lying on a (road) network $G = (V, E)$, where $V$ is a set of vertices and $E$ is a set of edges, spanning up to the present time $T$ (e.g. vehicle count data from $N_s$ fixed road sensors). Our goal is to scan over the most recent $W$ hours' worth of data and identify spatio-temporal clusters by comparing expected counts $b_i^t \in \mathbb{Z}^+$ with true counts $c_i^t \in \mathbb{Z}^+$. Our approach has three distinct stages: first (§3.1), we generate the baselines $b_i^t$, expected count data representing typical behaviour; this provides the null-hypothesis data against which actual count data is compared in order to locate emerging regions of busyness. Second (§3.2), a grid is defined on the search domain, and hence the search regions (recall, our search is over a road network $G$). Third (§3.3), a metric is defined which compares the expected and true counts. Consequently, we are able to assign scores to each search region and use these either to monitor the entire region or to report back on significant increases/decreases in activity.

We first generalise the forecasting by allowing disjoint intervals of time for the training and forecasting periods; these are denoted by the time-indexing sets $T_t$ and $T_f$ respectively. After accounting for day-of-the-week variation, we then use true counts $\{c_i^t \mid t \in T_t\}$ to learn baseline estimates $\{b_i^t \mid t \in T_f\}$. In the context of a COVID-19 monitoring system, the specification of $T_t$ depends on the subtle scientific question being asked. For example, one could specify $T_t$ as a time period of mid-lockdown data in order to identify regions where traffic levels are higher than expected. The second generalisation incorporates uncertainty into the forecast through the use of Gaussian processes (GPs) [16]. The GP framework yields predictions $b_i^t$ that are Gaussian distributed with mean $\mu_i^t$ and standard deviation $\sigma_i^t$ [16] (when treating the counts as continuous variables). Consequently, we estimate an upper and a lower bound for each $b_i^t$ by adding/subtracting some pre-determined number of $\sigma_i^t$ to/from the mean $b_i^t = \mu_i^t$. We denote these upper and lower bounds by $b_{i,\text{upp}}^t$ and $b_{i,\text{low}}^t$ respectively, and ultimately scan over the search regions with these bounds as well as the point estimates. Further, most human activity, such as traffic flow, is expected to exhibit both daily and weekly periodic variation, which suggests the use of a tailored kernel of the form $k(t, t') = k_{\text{per}}(t, t'; p_1) + k_{\text{per}}(t, t'; p_2)$, where we explicitly initialise the periods of the two periodic components to $p_1 = 24$h and $p_2 = 168$h respectively.
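To make the baseline step concrete, the snippet below sketches per-sensor forecasting with daily and weekly periodic kernel components and the $\pm n\sigma$ bounds. It is a minimal sketch assuming scikit-learn's GP implementation; the kernel composition, hyperparameter initialisation and function name are illustrative rather than the deployed system's exact configuration.

```python
# A minimal sketch of the baseline step of Section 3.1 for a single sensor.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (ConstantKernel, ExpSineSquared,
                                              WhiteKernel)

def gp_baselines(t_train, c_train, t_forecast, n_sigma=3.0):
    """Return (b, b_low, b_upp) for the forecast hours in t_forecast."""
    # Daily (24h) and weekly (168h) periodic components plus observation
    # noise; counts are treated as continuous variables.
    kernel = (ConstantKernel() * ExpSineSquared(periodicity=24.0)
              + ConstantKernel() * ExpSineSquared(periodicity=168.0)
              + WhiteKernel())
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(np.asarray(t_train).reshape(-1, 1), np.asarray(c_train))
    mu, sigma = gp.predict(np.asarray(t_forecast).reshape(-1, 1),
                           return_std=True)
    # b_low / b_upp: mean minus/plus a pre-determined number of standard
    # deviations; counts cannot be negative, so the lower bound is clipped.
    return mu, np.clip(mu - n_sigma * sigma, 0.0, None), mu + n_sigma * sigma
```

Running this once per sensor over $T_t$ yields the sets $\{b_i^t, b_{i,\text{low}}^t, b_{i,\text{upp}}^t \mid t \in T_f\}$ consumed by the scan.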
In addition, we compare the GP approach to the very successful Holt-Winters method used by Neill [12] (though performance is found to be very much dependent on the scientific question being asked). See fig. 6, appendix C, for an example forecast.

We divide the network into approximately equal-length line-segments, as discussed by Shiode & Shiode [17] and shown in fig. 1 (see also appendix B). The search regions are generated by defining a lower and an upper bound on the length of the paths to search on the network: instead of searching rectangular regions, we search over multiple, overlapping paths that lie on $G$. The number of paths generated is very much dependent on the shape of the network and on the lower and upper bounds for the path length. Although the number of search paths is expected to overwhelmingly exceed the number of rectangular search regions, the ability to easily hash network edges in dictionary-like data structures enables a simpler and faster implementation than its planar counterpart. For directional datasets (e.g. traffic sensors with road-direction data), it is also possible to adapt the network-based scan to produce scores for each direction of a search path $S$.

We denote the collection of spatio-temporal search paths by $\mathcal{S}$. For every search region $S \in \mathcal{S}$, we define $B_S := \sum_{(i,t) \in S} b_i^t$ and $C_S := \sum_{(i,t) \in S} c_i^t$ as the total baseline estimates and actual counts within the spatio-temporal region $S$ respectively; $B_{S,\text{upp}}$ and $B_{S,\text{low}}$ are defined similarly. Fundamentally, for each searched space-time region $S$ we are interested in estimating the posterior probabilities $\mathbb{P}[H_1(S) \mid D]$ and $\mathbb{P}[H_0 \mid D]$ for some null and alternative hypotheses $H_0$ and $H_1(S)$, $S \in \mathcal{S}$. The EBP model works under the null hypothesis that the true counts are Poisson distributed with mean given by the baseline estimate; the alternative hypotheses are similar but with mean $q b_i^t$ for some risk factor $q > 1$, which is to say

$H_0 : c_i^t \sim \text{Poisson}(b_i^t)$ for all $(i, t)$, and $H_1(S) : c_i^t \sim \text{Poisson}(q b_i^t)$ for $(i, t) \in S$, with $q > 1$.

Without using Bayesian approaches, as in Neill & Cooper [14], it is hard to calculate these probabilities in isolation. Instead, Neill [12] introduces the Poisson likelihood metric $F_{\text{EBP}}(S)$, derived from their ratio (see appendix D for a derivation); note that in the case of $C_S < B_S$ the value of this metric defaults to 1. We introduce a generalised form of this metric which also accounts for quieter-than-expected activity: relaxing the constraint $q > 1$ in the maximisation of appendix D to $q > 0$, we define the ASYM metric as

$F_{\text{ASYM}}(S) := (C_S / B_S)^{C_S} \, e^{B_S - C_S},$

which is greater than or equal to 1, with equality only when $C_S = B_S$, so that regions with $C_S < B_S$ are also scored. The ability to specify different training profiles $T_t$ motivates the introduction of this metric: previous works have focused on capturing events with increased counts, but in the context of a COVID-19 monitoring system this may not be the user's goal.
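A minimal sketch of the two scores follows, working in log space to avoid overflow for large counts. The closed forms follow the likelihood-ratio derivation in appendix D; the function names are ours, and the ASYM expression reflects the relaxation of $q > 1$ to $q > 0$ described above.

```python
import math

def log_f_ebp(B_S: float, C_S: float) -> float:
    """log F_EBP(S): the Poisson likelihood ratio maximised over q > 1.
    Returns 0 (i.e. F_EBP = 1) whenever C_S <= B_S."""
    if C_S <= B_S:
        return 0.0
    return C_S * math.log(C_S / B_S) + B_S - C_S

def log_f_asym(B_S: float, C_S: float) -> float:
    """log F_ASYM(S): the same ratio maximised over q > 0, so that
    quieter-than-expected regions (C_S < B_S) also score above 0."""
    if C_S == 0.0:
        return B_S  # limit of C*log(C/B) + B - C as C -> 0
    return C_S * math.log(C_S / B_S) + B_S - C_S
```

Scanning then amounts to computing these scores for every $S \in \mathcal{S}$, and again with $B_{S,\text{low}}$ and $B_{S,\text{upp}}$ in place of $B_S$ to propagate the forecast uncertainty into the metric.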
In the live version of the system, currently deployed in the city of London, we use real heterogeneous data sources such as loop detectors (for counting vehicles) and traffic cameras. These data are proprietary and hence cannot be included in this paper (a snippet of real traffic-camera data is shown in fig. 6, appendix C). To test the proposed framework, we therefore use semi-synthetic, hourly vehicle count data from ∼900 road sensors across the borough of Westminster, London; see appendix E for the data-generation process. We build the forecasts using 21 days' worth of training data and set a forecasting period of $W = 48$ hours; scanning over spatio-temporal regions that are two days long was deemed optimal for detecting hotspots within this type of data.

(Figure 2a: a higher corrected EBP score corresponds to a more significant result when compared to the 99th percentile of historical data (see appendix E). On average, the NET scan yields scores of higher or similar significance for each day throughout the simulated surge.)

For consistency, we define search regions that take roughly the same time to scan over for each scan type. For the network (NET) scan, we generate all paths on the main road network of Westminster that are between 50m and 1km in length; we include the lower bound to ensure that small circular paths on roundabouts are avoided. We also take the intermediate step of restricting the dataset to detectors that are within $5 \times 10^{-4}$ degrees of the main road network. For the planar (PL) scan, we first extend the Westminster boundary to its smallest bounding box and define a grid of resolution $N = 8$. The number of detectors and search regions for each scan type are summarised in table 1.

We compare the performance of each scan type by analysing three statistics: the average time to detect a simulated surge, and the corresponding spatial precision and recall, as defined in [12]. Figure 2a shows how the EBP score changes throughout the duration of the surge for each scan type. On average, the network scan yields scores of higher or similar significance when compared to its planar counterpart; however, it is likely to report on a subset of the affected region due to the way the paths are constructed. Figures 2b and 2c summarise the spatial accuracy of each scan type. In sum, if the required output of the system is a single region of space-time that engulfs the surging sensors, then the PL scan is superior in terms of speed, simplicity and high spatial recall. If the user does not require a single returned region and would prefer a visual heat-map output over the domain, then the NET scan offers a much more localised view of the surge, boasting superior spatial precision. For a more in-depth discussion of the results, the system and future directions, see appendix A.

A   The value and ethics of using 'big data' to capture human mobility

Concluding remarks on the early warning system. It is clear that there are benefits and drawbacks to all types of scan, and these are mostly dependent on the scientific use case. Furthermore, the size of the region of interest also plays its part. The simplicity of the PL scan allows for much larger domains without increasing the run-time of the scan; for example, the whole city of London, UK. It was found that imposing a 16 × 16 grid on the entire London domain provided sufficient resolution for detecting larger regions of increased activity, as shown in fig. 3. However, if data sources are confined to a network that sits in a relatively small region of space (e.g. a borough), then the NET scan is far superior in visualising localised hotspots, with the added advantage of including the direction of the surged activity.
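To make the NET scan's search regions concrete, the sketch below enumerates candidate paths on a road graph whose total length falls within the 50m to 1km bounds used above. It is a minimal sketch assuming a networkx graph with a 'length' edge attribute in metres; the depth-first expansion is illustrative, and the deployed implementation additionally hashes edges for fast region lookup.

```python
# Illustrative enumeration of NET-scan search paths on a road graph G.
# Paths may not revisit a node, which also rules out the small circular
# paths (e.g. roundabouts) excluded by the 50m lower bound.
import networkx as nx

def search_paths(G: nx.Graph, l_min: float = 50.0, l_max: float = 1000.0):
    """Yield node tuples whose total path length lies in [l_min, l_max]."""
    def extend(path, length):
        if l_min <= length <= l_max:
            yield tuple(path)
        if length >= l_max:      # no admissible extension remains
            return
        for nbr in G.neighbors(path[-1]):
            if nbr not in path:
                yield from extend(path + [nbr],
                                  length + G.edges[path[-1], nbr]["length"])
    for node in G.nodes:
        yield from extend([node], 0.0)
```

Note that each undirected path is generated once per orientation, which lines up naturally with the directional scoring mentioned in §3.2.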
Regarding the forecasting, the two methods appear to perform similarly, again each with its benefits and drawbacks. The HW method is a good and fast starting point for creating these types of baselines, running roughly a factor of 20 faster than a GP. However, if run-time is not an issue, GPs manage to achieve slightly better spatial accuracy, with the added benefit of uncertainty quantification. Whilst both variants of the NET scan boast a higher spatial precision, the poorer recall suggests that a more useful output may be a heat-map, as shown in fig. 4. These heat maps are generated by taking the collection of searched regions and aggregating their scores by mean onto the defined grid. This output should be of greater use to decision makers, providing a simple and visual explanation of the underlying dynamics. It also yields information about the sensors which are not surging, something that is not available from a single returned search region. Naturally, presenting these heat maps with a temporal dependence (e.g. through the use of a slider) is even more useful in showing the evolution of a potential surge.

(Figure 4c: road network sensors shown on the same region as the planar scan in fig. 4b, including an example line-segment used by the NET scan; the full road network in Westminster borough is shown in fig. 1. Figure 4d: network heat-map results for the same surging sensors as shown in fig. 4b; regions of high significance (yellow) are clearly more localised than in the planar counterpart.)
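As a sketch of this aggregation, each grid cell's intensity below is the mean score over all searched paths whose sensors fall in that cell; the cell-indexing scheme and function name are illustrative assumptions rather than the deployed implementation.

```python
# Aggregate per-path scan scores onto an n-by-n grid by mean, as used to
# produce the heat maps in fig. 4. `path_scores` holds (sensor_ids, score)
# pairs, `sensor_xy` maps sensor id -> (x, y), `bbox` is the domain box.
from collections import defaultdict

def score_heatmap(path_scores, sensor_xy, bbox, n=8):
    x0, y0, x1, y1 = bbox
    sums, counts = defaultdict(float), defaultdict(int)
    for sensor_ids, score in path_scores:
        for s in sensor_ids:
            x, y = sensor_xy[s]
            cell = (min(int(n * (x - x0) / (x1 - x0)), n - 1),
                    min(int(n * (y - y0) / (y1 - y0)), n - 1))
            sums[cell] += score
            counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}
```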
London's digital twin of busyness. The early warning system presented herein plays but a small part in the larger digital twin. The twin combines scalable cloud computing and machine learning to continuously learn and update itself by drawing from multiple streams of transportation data, capturing the constantly evolving activity in the city. A digital twin is more than a digital abstraction reflecting reality: it also attempts to generate interventions, which may be implemented directly or via "human-in-the-loop" recommendations to advisors, to benefit both the accuracy of the abstraction and any objectives therein. One such example within London has been observing air quality within areas of high public transit: encoding a capacity to represent pollution near a school or park permits estimating alternative routes or prioritising hybrid buses for a given route. Other parts of the twin can count the number of people walking on a street and estimate their social distance from snapshots of Transport for London's traffic-camera footage. This is implemented through specialised cloud compute clusters, virtual machines and data structures actively engaged in data collection, event detection and intervention evaluation. A future extension of our work is to apply our scan statistics to multiple modes of transport, thus giving further insights to policy makers.

Privacy. As was touched upon in §1, privacy is a major issue when conducting large-scale human monitoring of the form done in this project. First, owing to the nature of a pandemic of this magnitude, spread and severity, certain civil liberties may need to be temporarily suspended or bypassed; around the world this has been the case, where e.g. some lock-downs have been more onerous than others. Similarly for privacy: in order to get ahead of the virus's transmission, certain liberties may need to be temporarily suspended for the scientific community to fully understand the virus. That being said, any new government powers must expire when the disease is contained, and any new processing of personal data must be proportionate to the actual need. In this project all data is anonymised, the collected data cannot be reverse-engineered in order to track people, and there is a strong regimen of ethical oversight. Ultimately, we are not interested in micro-behaviours, but rather the macro-behaviour of people, as this is what is required for more informed policy decisions. Further, we have taken steps to make our work transparent and reproducible. Transparency improves public trust in a system; this is particularly important when many are rightly concerned about user privacy and the "big brother" effect. By making our code open-source, using technologies such as Docker, and (where possible) making the data publicly accessible, other researchers can reproduce our work and our methods are fully transparent.

Future work. At the time of submission, the digital twin, with its early warning system, is live in the city of London, and discussions are ongoing with regard to expanding the system to other cities in the UK. On the technical level there are a multitude of improvements that can be made, as well as far more research to be undertaken. For the warning system alone, a more robust way of modelling busyness would be to employ an inhomogeneous spatio-temporal point-process model, which would allow for a non-stationary intensity function; see e.g. the work by Baddeley et al. [3]. Elegant though that approach may be, the complexity of such a model is orders of magnitude higher than that of our initial proposal herein. Moreover, technical improvements must be weighed against the needs of the ultimate users of this system, i.e. the stakeholders, for whom usability is the primary motivator. In this light, a research software engineering effort is actively focused on ease of use, Application Programming Interface (API) simplicity and model transparency.

Stakeholders. The two primary stakeholders are the Greater London Authority and Transport for London. When explaining our work to policy makers we emphasise the high-level mechanics, intuition and limitations of the methods rather than the mathematical intricacies. As the system matures, it is our hope that the list of stakeholders grows, so that multiple policy makers can make use of the system. We see this as a likely future scenario as the virus, unfortunately, appears to be the 'new normal'.

(Figure 5: the Westminster boundary and the sensor locations $x_i$.)

The Holt-Winters (HW) method [5] is trained on historical counts $\{c_i^t \mid t \in T_t\}$ by iterating the three equations given in eq. (5). It has three main components: the smoothed value $X_t$, the trend component $Y_t$ and the periodic component $Z_t$. In the standard additive form, with period $p$,

$X_t = \alpha (c_t - Z_{t-p}) + (1 - \alpha)(X_{t-1} + Y_{t-1}), \quad Y_t = \beta (X_t - X_{t-1}) + (1 - \beta) Y_{t-1}, \quad Z_t = \gamma (c_t - X_t) + (1 - \gamma) Z_{t-p}. \quad (5)$

The tuning parameters $\alpha$, $\beta$ and $\gamma$ are optimised to increase the forecast accuracy of the model. The baseline estimate for the next hour is then given by

$b_{T+1} = X_T + Y_T + Z_{T+1-p}. \quad (6)$

The whole training period is used to predict $b_i^t$ for the next hour, and hence, by iterating this process, the baseline for any subsequent hour can be determined. An example of HW applied to real count data is shown in fig. 6.

(Figure 6: a comparison of 'busyness' forecasting methods used within the scan-statistic pipeline, applied to data from London's JamCam system, which captures how traffic is moving along particular roads and helps people to understand live traffic conditions. Both methods appear to perform well, with clear periodic variation in both.)

In this particular example, the HW method appears to exaggerate some of the periodic behaviour more severely than the GP counterpart. Clearly, the GP method comes with the added benefit of confidence bands.
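To complement the description above, the following sketch iterates the HW recursions of eqs. (5) and (6) for a single sensor. It is a minimal sketch assuming the additive form written above; the initialisation scheme and the default values of the tuning parameters are illustrative, since in practice they are optimised for forecast accuracy.

```python
def holt_winters_baseline(counts, p=168, alpha=0.1, beta=0.01, gamma=0.2):
    """Iterate eqs. (5) over hourly `counts` (length > p) and return the
    one-step-ahead baseline b for the next hour, as in eq. (6)."""
    X, Y = float(counts[0]), 0.0                 # smoothed value and trend
    Z = [float(c) - X for c in counts[:p]]       # naive periodic initialisation
    for t in range(p, len(counts)):
        X_prev = X
        X = alpha * (counts[t] - Z[t % p]) + (1 - alpha) * (X + Y)
        Y = beta * (X - X_prev) + (1 - beta) * Y
        Z[t % p] = gamma * (counts[t] - X) + (1 - gamma) * Z[t % p]
    return X + Y + Z[len(counts) % p]            # baseline for the next hour
```

Appending each prediction to the series and re-iterating rolls the baseline forward over the whole forecast period $T_f$, as described above.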
Throughout what follows, we use a $3\sigma$ confidence interval.

The Poisson likelihood metric $F_{\text{EBP}}(S)$ is derived from the ratio of $\mathbb{P}[D \mid H_1(S)]$ and $\mathbb{P}[D \mid H_0]$, $S \in \mathcal{S}$. In its original form, the metric is derived as follows:

$F_{\text{EBP}}(S) = \max_{q > 1} \frac{\mathbb{P}[D \mid H_1(S)]}{\mathbb{P}[D \mid H_0]} = \max_{q > 1} \prod_{(i,t) \in S} \frac{\text{Poisson}(c_i^t \mid q b_i^t)}{\text{Poisson}(c_i^t \mid b_i^t)} = \max_{q > 1} \; q^{C_S} e^{(1-q) B_S} = \begin{cases} (C_S / B_S)^{C_S} \, e^{B_S - C_S} & \text{if } C_S > B_S, \\ 1 & \text{otherwise}, \end{cases}$

using that the maximum value of $F_{\text{EBP}}(S)$ is achieved for $q = \max(1, C_S / B_S)$.

In this example, we aim to create data that corresponds to a mid-lockdown profile beginning from April 2020, and focus on identifying regions of increased activity. To generate (and anonymise) the surge-free data, we first calculate the 90th percentile of the amplitude for each sensor and use this to generate a sinusoidal time-series with added Poisson noise; the periods of the sinusoidal variation are chosen to match the empirical daily and weekly variation of each sensor. We use this generated data for two purposes: firstly, to act as base data on which to later build a surge, and secondly, to build up the empirical distribution of the EBP metric when the scan is performed on surge-free data. In practice, real data requires pre-processing; our implementation includes a four-stage pipeline for this, involving anomaly detection/removal, missing-data interpolation and the removal of time-series which are deemed either a-periodic or to be missing too much data.

To simulate a surge, we first choose an epicentre uniformly at random within the borough boundary, together with its $k$ nearest sensors; $k$ is chosen uniformly at random between $k_{\min} = 10$ and $k_{\max} = 100$. We apply a linear increase to each of the affected sensors' time-series spanning the last three days; it is this three-day surge that we wish to capture in the scan. We give each sensor its own rate of linear increase $\omega_i$, proportional to its busyness, i.e. busier sensors surge faster than quieter ones; this was found to give a surge severity similar to what would be expected from a breach of lockdown rules. Parametrically, the surge factor $\lambda_i(t) = 1 + \omega_i t$ for outbreak days $t = 1, 2, 3$ is bounded between 1 and 4.

To define when a surge is detected, we first estimate the empirical distribution of EBP scores when no surges are present. Using the mid-lockdown profile, we generate 122 days' worth of surge-free sensor data across the borough; the first 21 days are used to train the model for the first forecast period. Using the remaining 101 days of data, we calculate the highest EBP score for each scan type; the distributions are shown in fig. 7. It is this empirical distribution that is used to calculate each scan's average time to detection as a function of the allowable false-positive rate.

(Figure 7: the distribution of EBP scores from surge-free simulated data for the Planar-GP, Network-GP, Planar-HW and Network-HW variants. Both variants of the NET scan appear to yield a thinner-tailed distribution, potentially making "significant" easier to define in the alarm system.)
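A condensed sketch of this surge-injection step follows. Drawing the epicentre from the sensor locations (standing in for a uniform point in the borough) and using the mean historical count as the busyness proxy are illustrative assumptions, as are the function and variable names.

```python
# Inject a three-day linear surge into semi-synthetic sensor data, as in
# appendix E: pick an epicentre, surge its k nearest sensors with rates
# proportional to busyness, and clip the surge factor to [1, 4].
import numpy as np

def inject_surge(counts, xy, rng=None, k_min=10, k_max=100, days=3):
    """counts: (n_sensors, n_hours) array; xy: (n_sensors, 2) positions.
    Returns the surged counts and the indices of the affected sensors."""
    rng = rng or np.random.default_rng()
    n, T = counts.shape
    epicentre = xy[rng.integers(n)]                       # random epicentre
    k = int(rng.integers(k_min, k_max + 1))               # k ~ U{10, ..., 100}
    affected = np.argsort(np.linalg.norm(xy - epicentre, axis=1))[:k]
    omega = counts[affected].mean(axis=1)
    omega = omega / omega.max()             # rate proportional to busyness
    out = counts.astype(float).copy()
    for day in range(1, days + 1):          # outbreak days t = 1, 2, 3
        lam = np.clip(1.0 + omega * day, 1.0, 4.0)        # surge factor
        lo, hi = T - (days - day + 1) * 24, T - (days - day) * 24
        out[affected, lo:hi] *= lam[:, None]
    return out, affected
```

The alarm threshold is then taken from the upper tail (e.g. the 99th percentile) of the maximum surge-free scores collected over the 101 evaluation days.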
References

[1] Spatial Point Patterns: Methodology and Applications with R.
[2] Analysing point patterns on networks: a review.
[3] Non- and semi-parametric estimation of interaction in inhomogeneous point patterns.
[4] Association between mobility patterns and COVID-19 transmission in the USA: a mathematical modelling study. The Lancet Infectious Diseases.
[5] The Holt-Winters forecasting procedure.
[6] An Introduction to the Theory of Point Processes, Volume II: General Theory and Structure.
[7] Statistical analysis of spatial and spatio-temporal point patterns.
[8] Handbook of Spatial Statistics.
[9] A spatial scan statistic.
[10] Evaluating the effectiveness of social distancing interventions to delay or flatten the epidemic curve of coronavirus disease.
[11] COVID-19 pandemic and environmental pollution: a blessing in disguise? Science of The Total Environment.
[12] Expectation-based scan statistics for monitoring spatial time series data.
[13] Machine Learning and Event Detection for Population Health. Georgia Institute of Technology.
[14] A multivariate Bayesian scan statistic for early event detection and characterization.
[15] Detection of emerging space-time clusters.
[16] Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning).
[17] A network-based scan statistic for detecting the exact location and extent of hotspots along urban streets. Computers, Environment and Urban Systems.
[18] Quantifying the impact of human mobility on malaria.

Acknowledgements. Funded by the Lloyd's Register Foundation programme on Data Centric Engineering and the Warwick Impact Fund via the EPSRC Impact Acceleration Account. Further supported by the Greater London Authority, Transport for London, Microsoft, the Department of Engineering at the University of Cambridge and the Science and Technology Facilities Council.