key: cord-0747960-cx44mosv authors: Schüler, Lennart; Calabrese, Justin M.; Attinger, Sabine title: Data driven high resolution modeling and spatial analyses of the COVID-19 pandemic in Germany date: 2021-08-18 journal: PLoS One DOI: 10.1371/journal.pone.0254660 sha: f1a80031c8ed87e6aca9c096c7f29628466e35c3 doc_id: 747960 cord_uid: cx44mosv The SARS-CoV-2 virus has spread around the world with over 100 million infections to date, and currently many countries are fighting the second wave of infections. With neither sufficient vaccination capacity nor effective medication, non-pharmaceutical interventions (NPIs) remain the measure of choice. However, NPIs place a great burden on society, the mental health of individuals, and economics. Therefore the cost/benefit ratio must be carefully balanced and a target-oriented small-scale implementation of these NPIs could help achieve this balance. To this end, we introduce a modified SEIRD-class compartment model and parametrize it locally for all 412 districts of Germany. The NPIs are modeled at district level by time varying contact rates. This high spatial resolution makes it possible to apply geostatistical methods to analyse the spatial patterns of the pandemic in Germany and to compare the results of different spatial resolutions. We find that the modified SEIRD model can successfully be fitted to the COVID-19 cases in German districts, states, and also nationwide. We propose the correlation length as a further measure, besides the weekly incidence rates, to describe the current situation of the epidemic. The SARS-CoV-2 virus was first detected in China in late 2019, and then rapidly spread around the world. By March 2020, COVID-19, the disease caused by SARS-CoV-2, was officially declared a pandemic by the World Health Organization [1] . To date, the pandemic has resulted in devastating consequences to life, health, and national economies. The novelty of the SARS-CoV-2 virus, coupled with the comparative lack of clinical research on coronaviruses in general, has left Non-Pharmaceutical Interventions (NPIs), such as masks, lockdowns, and social distancing measures, as the main weapons in the fight against COVID-19. Indeed, NPIs have so far played an important role in modulating the dynamics of the pandemic [2] . reporting obligation of all positive COVID-19 tests to the RKI and the fact that these data are published on the district level makes it possible to model the epidemic at this comparatively high spatial resolution. The population size of the districts is taken from the Federal Statistical Office of Germany [13] . The COVID-19 epidemic in Germany is modeled using a compartmental epidemiological model [14] on the district level. Within each district, the population is divided into Susceptible, Exposed, Infectious, Recovered, and Dead compartments, with the total population being the sum of the individuals in the compartments minus the COVID-19 related deaths N = S + E + I + R. To keep the number of parameters as low as possible, the exposed individuals and the asymptomatic cases are handled together in one compartment. The modified SEIRD model is formulated as It is assumed that the asymptomatic cases can recover, but not die due to COVID-19, thus Eq (5) is only coupled to Eq (3). A graphical visualization of the system of Eqs (1)-(5) is shown in Fig 1. The NPIs are modeled by a piecewise constant contact rate β(t), which is allowed to change at the dates of the NPI implementations. Without loss of generality, this assumption is reformulated to constant contact rates β j , with j = 1, 2, . . ., M + 1 and M being the total number of NPIs. β j is exchanged by β j+1 at the date of the j-th NPI. All simulations start on the 2020-03-01 and for the initial conditions, we use the number of laboratory-confirmed cases per day obs , gathered by the RKI. It translates to the SEIRD-model (1)-(5) as I obs ¼ b aE. Thus, the initial condition for the Exposed compartment is E(0) = I obs /α. For the Infectious compartment, the reported cases over the last 1/α days are integrated Ið0Þ ¼ R 0 À 1=a I obs ðtÞdt. This is only an approximation, but it quickly becomes irrelevant, as the compartment is filled by the Exposed. The Recovered are set to R(0) = 0, exactly like the Dead D(0) = 0, as there where no reported deaths at the initial time. Then, the initial condition for the Susceptible compartment is calculated as Because the latent and asymptomatic cases are lumped together into one compartment, parts of the model structure and some of its parameters cannot easily be mapped to quantities which can actually be measured, like the mean time it takes for the asymptomatic cases to recover. This decision was made in order to keep the number of parameters as low as possible, but at the same time, to have a model, that is flexible enough to reproduce the course of the COVID-19 epidemic across different scales and all districts in Germany. The assumptions made for SIR-type models break down for small populations. Because the number of cases per day is often already low on the district level without separating the cases into different age groups, we neglect the age distribution of the population to avoid further reducing the number of individuals in the respective compartments. Using the next generation matrix approach [15] , the reproduction number for the SEIRDmodel can be calculated yielding The system of non-linear ordinary Eqs (1)-(5) is numerically solved using an explicit Runge-Kutta method of order 5(4), derived by Dormand et al. [16] and implemented by Virtanen et al. [17] . The M + 5 unknown parameters θ = (α, β 1 , β 2 , . . .β j , γ, κ, μ) T per district in Eqs (1)-(5) are estimated using Bayesian inference. For the evidence, the number of laboratory-confirmed cases per day I obs and the number of deaths related to COVID-19 per day D obs , gathered by the RKI, are used. These data are grouped together as X obs = (I obs , D obs ) T . Translating X obs to the SEIRD-model (1)- (5) , the rate of positively tested cases per day is expressed as I obs ¼ b aE and the rate of COVID-19 related deaths as D obs ¼ b mI, with X = (αE, μI) T . Assuming Poisson error distributions, we maximize its log-likelihood with t being the total number of days simulated. The parameter inference is set up for all of the 412 districts and the sampling is repeated 200000 times for each of them. The prior distributions of the parameters are uniform P(θ)�U and the sampling is done using the Metropolis-Hastings MCMC algorithm [18, 19] . The first 10% of the simulations are used for classical Monte Carlo sampling for the burn-in period. From this, the best parameter set is used as the initial parameter set for the Metropolis sampler. 10 MCMC chains are used for convergence checks. SPOTPY [20] is used for the implementation of the parameter inference. The RKI gathers and updates its data on the COVID-19 epidemic once a day. These data are downloaded and preprocessed in order to use it for the evidence in the Bayesian framework. Next, the parameter inference is executed for all districts in parallel. Finally, the analyses are done and the plots are created. All these steps are part of a fully automatised workflow on the HPC Cluster EVE [21] at the UFZ Leipzig. For a comparison with the much more common approach of modeling an epidemic on a national level, the results from all fitted district level simulations are aggregated, first to the level of states within Germany, and subsequently to the national level. This yields three different spatial resolutions that can then be compared: 1) district, 2) state, and 3) national. Additionally, the same SEIRD-model (1)-(5), which was applied to the districts, is also parametrized for the national case and death rates for resolution 3) and for the 16 individual German states for resolution 2). We performed sensitivity analyses to better understand the model behavior using the extended Fourier amplitude sensitivity test (FAST) algorithm [22] . This method is a variancebased global sensitivity analysis taking parameter interactions into account and is implemented by SPOTPY [20] . The relatively high spatial resolution of German districts makes it possible to use geostatistical methods to identify spatial correlation structures. The variogram is a function describing the type, length, and strength of these spatial correlations. In the cases considered here, the variogram increases monotonically from 0 until it flattens out when it reaches its maximum, which is equal to the variance of the field. The greater the variogram at a certain distance, the smaller the correlations at that distance. If only few and spatially separated superspreader events take place in Germany, we expect to see a high correlation length but a low correlation strength, because all the districts with low infection numbers are highly correlated over a large area (left panel in S1 Fig) . But if a superspreader event causes a spreading of infections to neighboring districts and a map of the case numbers on a district level would be plotted, this map would look very patchy, with clusters of high case numbers next to clusters of low case numbers (right panel in S1 Fig) . This would be reflected in a variogram with shorter correlation lengths and a higher correlation strength. The variogram is calculated by first computing the pairwise distances of all data points z(x i ) (in this case the number of positively tested individuals at the centroid x i of the district in which they where reported). Depending on these pairwise distances kx i − x j k, the values are grouped into bins of different distances r with r k � kx i − x j k