title: A Probabilistic Assessment of the COVID-19 Lockdown on Air Quality in the UK authors: Pinder, Thomas; Hollaway, Michael; Nemeth, Christopher; Young, Paul J.; Leslie, David date: 2021-04-22

In March 2020 the United Kingdom (UK) entered a nationwide lockdown period due to the COVID-19 pandemic. As a result, levels of nitrogen dioxide (NO2) in the atmosphere dropped. In this work, we use 550,134 NO2 data points from 237 stations in the UK to build a spatiotemporal Gaussian process capable of predicting NO2 levels across the entire UK. We integrate several covariate datasets to enhance the model's ability to capture the complex spatiotemporal dynamics of NO2. Our numerical analyses show that, within two weeks of a UK lockdown being imposed, UK NO2 levels dropped 36.8%. Further, we show that as a direct result of lockdown, NO2 levels were 29-38% lower than they would have been had no lockdown occurred. Alongside these numerical results, we provide a software framework that allows practitioners to easily and efficiently fit similar models.

In response to the COVID-19 pandemic, many countries resorted to ordering citizens to remain at home. These lockdown measures had the effect of curtailing certain polluting activities, resulting in reduced pollutant concentrations in many places around the world (e.g., Forster et al., 2020). Clear evidence of reductions has been found in nitrogen dioxide (NO2), a pollutant that can negatively impact respiratory health (Henschel and Chan, 2013; Anenberg et al., 2018) and which is strongly associated with transport emissions due to the burning of fossil fuels (Sundvor et al., 2013).
Yet these (often provisional) assessments of lockdown impacts on NO2 have largely been restricted to analyses of individual measurement locations or satellite retrievals, focused on daily means (Berman and Ebisu, 2020; Carslaw, 2020; Schindler, 2020), rather than combining information across different sources. In this paper we utilise a spatiotemporal approach to model the patterns of change in NO2.

Modelling continuous spatiotemporal surfaces using a set of discrete observations is a common task in geostatistics (Diggle and Ribeiro, 2007). This approach assumes that there exists a latent, partially observed function, which is commonly modelled with a Gaussian process (GP). Consequently, it is then possible to construct a fully Bayesian model. Historically these models have been computationally challenging due to their O(n^3) cost, where n is the number of datapoints. However, recent advances in the sparse GP literature have reduced this cost to O(nm^2), where m << n (Hensman et al., 2013).

In this work we combine several measurement datasets and covariate information to build a highly informative spatiotemporal model that yields novel insights into the effect of the COVID-19 UK lockdown on NO2 levels. To build a model that is both computationally efficient to fit and powerful enough to capture the complex latent process, we draw on recent developments in the GP literature, using the SteinGP presented by Pinder et al. (2020). Finally, we examine the inferred latent function to report two quantitative results concerning the change in NO2 levels. The first gives the integrated NO2 levels pre- and post-lockdown, and the second compares the latent function post-lockdown to the NO2 levels that, accounting for weather-driven variability, would have been recorded had a lockdown not been instigated.
Ground measurements Hourly NO2 concentrations are recorded by the Automatic Urban and Rural Network (AURN), which covers England and Northern Ireland, together with the Scottish Air Quality Network (SAQN) and Welsh Air Quality Network (WAQN) groups of sensors. Together these networks comprise 237 sites (shown in Appendix A) that are used to monitor compliance of UK air quality with ambient air quality directives. They provide hourly concentrations of a number of key pollutants, including NO2, and cover a wide range of site types, from urban roadside through to rural settings. The NO2 values are pre-processed with the rmweather package (Grange et al., 2018) to normalise for the effects of weather, namely wind speed and wind direction.

Land cover data The UK Centre for Ecology and Hydrology Land Cover Map (LCM) 2019 describes the land cover of Great Britain (Morton et al., 2020b) and Northern Ireland (Morton et al., 2020a) at 25 m resolution for 21 broad classes. Here, the 25 m data are aggregated to 200 m grid squares over the UK and the proportion of each square that falls under one of the 3 urban classes is calculated, giving an estimate of the urban fraction in each grid cell.

Reanalysis data ERA5 is a reanalysis product developed by the European Centre for Medium-Range Weather Forecasts (ECMWF). Using data assimilation techniques with a forecast model, satellite data and ground-based observations, global hourly meteorological conditions are produced at a horizontal resolution of ≈31 km (Hersbach et al., 2020), from 1979 through to the present day. Here, wind speed, wind direction, relative humidity and temperature data are used to provide our model with meteorological covariate information (Grange et al., 2018).

In the model described in Section 3.1 we will refer to the land cover and reanalysis data as covariates, the locations of the ground measurement stations as the spatial dimensions and the timestamp as the temporal dimension.
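The 25 m to 200 m aggregation step can be sketched as a block mean over 8 x 8 cells. The grid and the urban class codes below are synthetic placeholders, not the real LCM codes:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 25 m land-cover grid with 21 classes; the urban class codes
# below are placeholders for illustration, not the real LCM codes.
URBAN_CLASSES = (19, 20, 21)
lcm_25m = rng.integers(1, 22, size=(400, 400))   # 400 x 400 cells of 25 m

# Aggregate 8 x 8 blocks of 25 m cells (i.e. 200 m squares) into the
# fraction of each square covered by an urban class.
is_urban = np.isin(lcm_25m, URBAN_CLASSES).astype(float)
urban_fraction = is_urban.reshape(50, 8, 50, 8).mean(axis=(1, 3))
print(urban_fraction.shape)  # (50, 50)
```

Each entry of `urban_fraction` is then a covariate value in [0, 1] for one 200 m grid cell.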
This selection of covariates is consistent with other air pollution studies (e.g., Grange et al., 2018).

Consider a dataset D = {(x_i, y_i)}_{i=1}^n where x_i ∈ R^d and y_i ∈ R. We seek to learn a function f : R^d → R that relates inputs to outputs, given D and a likelihood function p(y | f), where f = f(X) is the realisation of f at the input locations X. We place a GP prior on f such that f ∼ GP(µ, k_θ), characterised by a mean function µ : X → R and a θ-parameterised kernel function k_θ : X × X → R. To introduce sparsity into our GP model, we introduce a set of inducing points Z that are used to approximate X. This is common practice for GP modelling with large datasets as it enables tractable computation (see Snelson and Ghahramani, 2005; Hensman et al., 2013). Using recent work by Pinder et al. (2020), the kernel parameters θ, observational noise σ^2 and latent values u = f(Z) of the GP can be learned using Stein Variational Gradient Descent (SVGD). Such an approach allows highly efficient computation whilst allowing the practitioner to place prior distributions on parameters, a quality not enjoyed by regular sparse frameworks. Further, using the software package described by Pinder et al. (2020), we demonstrate how practitioners can easily integrate these models into their existing analyses; see Appendix E for an example.

3 Analysis We formulate a composite kernel to capture the complex spatiotemporal dynamics of NO2, with subscripts s, t and c denoting the spatial, temporal and covariate dimensions of the data. A third-order Matérn kernel is used for the spatial kernel and a squared exponential kernel for the covariate values. Furthermore, we use a product kernel of a non-stationary third-order polynomial kernel and a first-order Matérn kernel for the temporal dimension. A white noise process is applied to all seven input dimensions. Full kernel expressions can be found in Appendix B.1.
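A composite kernel of this kind can be sketched in numpy. The dimension layout, the product composition and all hyperparameters below are assumptions made for illustration (the paper's exact forms are in its Appendix B.1), and "third-order" and "first-order" Matérn are read here as Matérn-3/2 and Matérn-1/2:

```python
import numpy as np

# Assumed layout of the 7 input dimensions: 2 spatial, 1 temporal, 4 covariate.
S, T, C = slice(0, 2), slice(2, 3), slice(3, 7)

def sqdist(a, b):
    # Pairwise squared Euclidean distances between the rows of a and b.
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)

def matern32(a, b, ls=1.0):
    # Matern-3/2 kernel, standing in for the paper's "third-order" Matern.
    r = np.sqrt(sqdist(a, b)) / ls
    return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

def matern12(a, b, ls=1.0):
    # Matern-1/2 (exponential) kernel, the paper's "first-order" Matern.
    return np.exp(-np.sqrt(sqdist(a, b)) / ls)

def rbf(a, b, ls=1.0):
    # Squared exponential kernel for the covariate dimensions.
    return np.exp(-0.5 * sqdist(a, b) / ls**2)

def poly3(a, b, gamma=1.0, c=1.0):
    # Non-stationary third-order polynomial kernel.
    return (gamma * a @ b.T + c) ** 3

def composite(a, b, noise=0.1):
    # Product composition (an assumption); white noise on the diagonal.
    k = (matern32(a[:, S], b[:, S])
         * poly3(a[:, T], b[:, T]) * matern12(a[:, T], b[:, T])
         * rbf(a[:, C], b[:, C]))
    if a is b:
        k = k + noise**2 * np.eye(len(a))
    return k

X = np.random.default_rng(1).normal(size=(20, 7))
K = composite(X, X)
print(K.shape)  # (20, 20)
```

Because products and sums of positive semi-definite kernels remain positive semi-definite, the resulting Gram matrix is a valid GP covariance.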
The choice of kernels was driven by two factors: firstly, our beliefs about the spatiotemporal behaviour of NO2, and secondly, a comparison of the performance of several different kernels on a held-out set of data (see Appendix B.2). To supplement the above kernel, we also equip the GP with a linear mean function µ(x_i) = a^T x_i + b, where a ∈ R^d and b ∈ R. The rationale is that, focusing on the temporal dimension, there is a globally decreasing trend in NO2 levels, and a linear mean function provides an effective way to capture this.

From the above model design, our full set of parameters becomes Ω = {θ, a, b, σ^2, u}, where |Ω| = 1022. The learned values of these parameters are crucial as they govern the behaviour of the inferred latent function. Within θ we have three types of parameter: a lengthscale that controls the amount of correlation between datapoints, a variance α that controls the magnitude of the latent function, and polynomial coefficients γ that control the temporal behaviour of the GP. To learn posterior distributions for each of these parameters, we independently initialise J = 10 particles to be used within the SVGD optimisation update step. We employ the Adam optimiser (Kingma and Ba, 2015) to estimate step-sizes, and compute gradients using mini-batches of 250 observations.

We fit the model from Section 3.1 to 89 days' worth of hourly data from February 1st to April 30th, noting that lockdown measures were first imposed in the UK on March 23rd, and that enhanced social distancing was introduced in the UK from March 16th. A snapshot of the inferred spatial surface on March 23rd at 9AM can be seen in Figure 1. A set of 1000 inducing points Z is initialised using a k-determinantal point process (k-DPP) with k = 1000, with 10000 samples drawn according to the Markov chain Monte Carlo (MCMC) sampler described in Anari et al. (2016).
As is standard, we treat the latent values u = f(Z) as a model parameter.

NO2 reduction From the inferred latent process, we are able to go beyond assessing changes in NO2 levels at the locations of measurement stations; we can instead compute integrated spatiotemporal mean levels. This is desirable as it gives quantities that are representative of the entire UK, and not just the parts of the UK where measurement stations exist. For the entire UK, the integrated spatial NO2 level decreased 36.8% between 9AM on March 16th and 9AM on March 30th (one week pre- and post-lockdown).

Deviation from normal trends While a clear reduction in NO2 is apparent across the pre- and post-lockdown period, we would also like to give more context to the apparent decrease by comparing to levels in previous years. To define a normal trend, we consider data from February 1st to April 30th in 2019. Comparing the latent function learned from 2019 data to the latent function learned from data for the same period in 2020, we can see the effect that lockdown had on NO2 levels, relative to the expected temporal changes seen in 2019. Quantitatively, computing the integrated spatiotemporal mean from February 1st through to March 23rd in each year, NO2 levels are 9% lower in 2020 than in 2019. However, computing the same quantity from March 23rd through to April 30th, levels in 2020 are 38% lower. This can be seen in Figure 3, where we plot the difference in NO2 levels from 2019 to 2020, integrated over the temporal values from March 23rd to April 30th. In all locations NO2 values are lower in 2020 than in 2019. Moreover, the decrease is greater in cities and the more urban areas of the UK than in the more rural regions. For illustrative purposes, we can also see this temporal change in Figure 2.
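The integrated comparison above reduces to averaging a predicted surface over valid grid cells and comparing two periods. The sketch below mimics this on purely synthetic surfaces; the grids, mask and scaling are invented and do not reproduce the paper's results:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic stand-ins for the model's posterior-mean NO2 surfaces (ug/m^3)
# on a regular grid, one week pre- and post-lockdown.
pre = rng.uniform(10.0, 40.0, (50, 50))
post = 0.7 * pre + rng.normal(0.0, 0.5, (50, 50))
outside_uk = rng.random((50, 50)) < 0.4   # invented land/sea mask
pre[outside_uk], post[outside_uk] = np.nan, np.nan

def integrated_mean(surface):
    # Spatially integrated mean over the valid (in-country) grid cells.
    return np.nanmean(surface)

change = 100.0 * (1.0 - integrated_mean(post) / integrated_mean(pre))
print(f"{change:.1f}% reduction")
```

In the paper this averaging is performed over the inferred latent surface rather than synthetic arrays, which is what makes the resulting percentages representative of the whole UK.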
Here we observe the inferred time series at a single location (51.5°N, 0.1°W), corresponding to Westminster in London, UK, where it is apparent that the divergence in NO2 levels begins around March 23rd.

Future directions This work has demonstrated a method to infer spatially and temporally complete air quality data, which will serve machine learning researchers, environmental scientists and policymakers. Computing spatiotemporal interpolations in the way described in this article allows us to make informed decisions based upon air quality levels in locations where we have no sensors. Further, using the associated predictive uncertainty, we can express to policymakers the confidence we have in these interpolations. Overall, this is a powerful technique for atmospheric and environmental scientists. Whilst we have focussed our analysis on the COVID-19 lockdown in the UK, there is nothing preventing this analysis from being applied to different time periods, locations and environmental variables. For instance, it could be used in conjunction with UK-wide health data to identify regions of the UK most at risk of air-quality-related illnesses. Additionally, due to the uncertainty estimates accompanying the predictive output of a GP, there is scope for this work to be used in a decision-making framework to decide where new sensors should be placed. Perhaps the simplest metric for informing this decision is choosing the location that yields the largest decrease in predictive uncertainty. Finally, this work could also provide a dataset suitable for understanding and quantifying the spatiotemporal evolution of pollution, or indeed any sparsely measured environmental variable, and for evaluating process models of the atmosphere and climate.

Table 1: Explicit forms of the kernels used in Section 3, operating on an arbitrary x ∈ R^d. For Matérn kernels, the respective order is given by c/2 − 2. For notational brevity we let τ = ||x − x'||_2^2.
In the course of building the GP model described in Section 3, several alternative models were considered. Primarily, this was driven by the use of increasingly complex kernels whilst trying to build the most parsimonious model, where complexity is determined by the number of parameters. In order of increasing complexity, the alternative kernels considered were: 1) an isotropic third-order Matérn; 2) an Automatic Relevance Determination (ARD) third-order Matérn; 3) a third-order polynomial kernel acting on the data's temporal dimension convolved with an ARD third-order Matérn acting on all but the temporal dimension. In each case both a zero and a linear mean function were considered. As the kernel's complexity increased, the marginal log-likelihood increased and the root mean-squared error on a held-out set of data decreased. Based upon these two metrics, the kernel described in Section 3 was used.

Building upon the GP setup in Section 2.2, we give a brief description of Stein variational Gaussian processes for the interested reader. Following Pinder et al. (2020), we introduce a set of J particles {λ_i}_{i=1}^J where λ_i = {θ, σ_n^2, u}. The particles are then jointly optimised according to the mapping T(λ_t) = λ_t + ε_t ζ(λ_t), where T is the velocity field that yields the fastest reduction of the Kullback-Leibler divergence from the approximate posterior to the true posterior, and ε_t is a step-size parameter. Empirically, this velocity field can be computed by

ζ(λ) = (1/J) Σ_{j=1}^J [ k(λ_j, λ) ∇_{λ_j} log p(λ_j | D) + ∇_{λ_j} k(λ_j, λ) ],   (2)

where k : X × X → R is a kernel function, such as the radial basis function. The first term in (2) attracts particles to areas of high probability mass in the Gaussian process posterior, whereas the second term in (2) encourages diversity amongst particles. For more details, see Pinder et al. (2020).

D Training details

Prior distributions For the lengthscale parameters present in stationary kernels we use a Gamma prior with shape and scale parameters of 1. All variance parameters are assigned a Gamma prior with shape and scale parameters of 2. For the mean function's coefficients a and b, …

SVGD kernel The kernel's lengthscale is estimated at each iteration of the optimisation procedure using the median rule.

Inducing points The inducing points used in Section 3 are set using a k-DPP, as per Anari et al. (2016).

Particle initialisation Particle initialisation is carried out by making a random draw from each parameter's prior distribution.

Parameter constraints For all parameters where positivity is a constraint (i.e. variance), the softplus transformation is applied with a clipping of 10^-6, as is the default in GPflow.

Optimisation We use the Adam optimiser (Kingma and Ba, 2015) to optimise the set of SVGD particles. The step-size parameter is started at 0.01 for the first 500 iterations.

Data availability The ERA5 dataset for GB is available online (search?text=ERA5&type=dataset). Near real-time air quality data from the AURN, WAQN and SAQN networks are available through the openair R package. A full implementation of the SteinGP used in Section 3 can be found in Appendix E.

References

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
Monte Carlo Markov chain algorithms for sampling strongly Rayleigh distributions and determinantal point processes
Estimates of the Global Burden of Ambient PM2.5, Ozone, and NO2 on Asthma Incidence and Emergency Room Visits
Changes in U.S. air pollution during the COVID-19 pandemic
Convergence of Sparse Variational Inference in Gaussian Processes Regression
COVID-19 and changes in air pollution
openair - An R package for air quality data analysis
Model-based Geostatistics
Current and future global climate impacts resulting from COVID-19
Random forest meteorological normalisation models for Swiss PM10 trend analysis
Health risks of air pollution in Europe - HRAPIE project. New emerging risks to health from air pollution - results from the survey of experts
Gaussian processes for big data
The ERA5 global reanalysis
Adam: A method for stochastic optimization
Stein variational gradient descent: A general purpose Bayesian inference algorithm
GPflow: A Gaussian Process Library using TensorFlow
Land cover map
Stein Variational Gaussian Processes
SVS: Reductions in Pollution Associated with Decreased Fossil Fuel Use Resulting from COVID-19 Mitigation
Sparse Gaussian processes using pseudo-inputs

E Demo Implementation

from steingp import SteinGPR, RBF, Median, SVGD
import tensorflow_probability as tfp
import numpy as np
import gpflow

tfd = tfp.distributions

# Define some synthetic timestamps
X = np.random.uniform(-5, 5, 100).reshape(-1, 1)
# Simulate a response plus some noise at each timestamp
y = np.sin(X) + np.random.normal(loc=0, scale=0.1, size=X.shape)
# (The model-construction and optimisation lines are missing from the
#  extracted text; a SteinGPR model `model` and an SVGD optimiser `opt`
#  are assumed to have been built and run here.)
# Predict
Xtest = np.linspace(-5, 5, 500).reshape(-1, 1)
theta = opt.get_particles()
posterior_samples = model.predict(Xtest, theta, n_samples=5)
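The SVGD update used above can also be illustrated end-to-end on a toy target. The sketch below targets a standard normal rather than a GP posterior, and assumes an RBF kernel with a median-rule bandwidth; none of these choices are tied to the SteinGP implementation:

```python
import numpy as np

def svgd_step(x, grad_logp, eps=0.05):
    # One SVGD update: zeta(x_i) = mean_j [ k(x_j, x_i) grad_logp(x_j)
    #                                       + d/dx_j k(x_j, x_i) ].
    diff = x[:, None] - x[None, :]                      # x_j - x_i over (j, i)
    h = np.median(diff**2) / np.log(len(x) + 1) + 1e-8  # median-rule bandwidth
    K = np.exp(-diff**2 / h)
    attract = K.T @ grad_logp(x)            # pulls particles toward high mass
    repulse = (-2.0 * diff / h * K).sum(0)  # pushes particles apart
    return x + eps * (attract + repulse) / len(x)

# Toy target: a standard normal, so grad log p(x) = -x.
rng = np.random.default_rng(0)
x = rng.uniform(-6.0, 6.0, 50)
for _ in range(1000):
    x = svgd_step(x, lambda z: -z)
print(x.mean(), x.std())  # approaches the target's mean 0 and std 1
```

The attraction term is what drives particles toward the posterior mode, while the kernel-gradient term keeps the particle set spread out, giving an ensemble that approximates the full posterior rather than a single point estimate.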