key: cord-0213754-28mfnhsl authors: Chatzilena, Anastasia; Demiris, Nikolaos; Kalogeropoulos, Konstantinos title: A modelling framework for the analysis of the transmission of SARS-CoV2 date: 2022-03-07 journal: nan DOI: nan sha: 37a692954398bb4075576c722aaeadd0098628c1 doc_id: 213754 cord_uid: 28mfnhsl Action plans against the current SARS-CoV-2 pandemic have been implemented around the globe in order to reduce transmission. The reproduction number has been found to respond to public health interventions changing throughout the pandemic waves. However, the actual global burden of SARS-CoV-2 remains unknown due to severe under-ascertainment of cases. The use of reported deaths has been pointed out as a more reliable source of information, likely less prone to under-reporting. Given that daily deaths occur from past infections weighted by their probability of death, one may infer the true number of infections accounting for their age distribution, using the data on reported deaths. We adopt this framework and assume that the transmission dynamics generating the total number of underlying infections can be described by a continuous time transmission model expressed through a system of non-linear ordinary differential equations, where the transmission rate and consequently the reproduction number are stochastic. We model the transmission rate as a diffusion process, with distinct volatility phases for each pandemic wave, allowing to reveal both the effect of control strategies and the changes in individuals behavior. We study the case of 6 European countries and estimate the time-varying reproduction number ($R_t$) as well as the true cumulative number of infected individuals using Stan's No-U-Turn sampler variant of Hamiltonian Monte Carlo. As we estimate the true number of infections through deaths, we offer a more accurate estimate of $R_t$. We also provide an estimate of the daily reporting ratio and discuss the effects of changes in mobility and testing to the inferred quantities. The current SARS-CoV-2 pandemic which originated in December 2019 in the city of Wuhan, China and spread rapidly across the globe, has had devastating economic and social consequences, in addition to the severe loss of human life. Early on the pandemic, there have been consistent efforts from national health authorities to publicly report the daily counts of laboratory-confirmed cases and deaths in real-time, however, it became more than apparent that surveillance was going to be a challenging part in our quantitative understanding due to the low diagnostic capacity of many asymptomatic and mild infections. In response to the rising numbers of reported cases and deaths, which characterized the pandemic waves in early and late 2020 and mid-2021, many European countries, have implemented several control strategies, ranging from social distancing recommendations to large-scale lockdowns. Aiming to reduce a key epidemiological parameter, the time-varying reproduction number, these control strategies have been changing over time and different countries have adopted different action plans, expressing different policy decisions, social mechanisms, health systems capacity and transmission dynamics. The evaluation of the effectiveness of the implemented preventive measures in each country, as reflected in the reduction of transmission, is not straightforward. Inference from infectious disease data is a non-standard problem since one rarely observes the necessary evidence [Rhodes et al., 1996, Demiris and O'Neill, 2005] . Despite the massive progress in data collection this has become apparent in the current pandemic in the estimation of the time-varying reproduction number that typically relies on surveillance data which are usually biased and incomplete. It is hard to evaluate the actual burden of the disease when we deal with such a highly transmissible disease as COVID-19, with many asymptomatic and mild symptomatic infections which are not detected by health systems [Jombart et al., 2020 , Li et al., 2020a , Verity et al., 2020 . Some large-scale seroprevalence studies [Ward et al., 2021 , Pollán et al., 2020 aimed to estimate the actual number of infections and found severe under-ascertainment. Depending on the testing capacities imposed by healthcare resource constraints and the adopted testing and tracing policies, the level of under-ascertainment has been changing over time and across countries. The number of reported deaths is a more reliable indication of which countries around the globe have faced the most severe effects of the SARS-CoV-2 pandemic and even though, the reporting of deaths may vary over time and across countries, data on reported deaths are likely less prone to under-reporting. Therefore, given that daily deaths occur from past infections weighted by their probability of death, we can infer the total number of infections using the data on reported deaths [Jombart et al., 2020 , Flaxman et al., 2020 . This work uses a model-based approach to estimate the transmissibility of SARS-CoV-2 and the effect of the adopted control measures across 6 European countries. We introduce an extension of the Dureau et al. [2013] model where we fit to the unknown true number of cases an SEIR (susceptible-exposed-infected-recovered) compartmental model driven by a stochastic time-varying transmission rate that captures the effect of both the control measures and the behavioural changes. Our model may also be viewed as an extension of the work in Flaxman et al. [2020] where the estimated number of cases is indirectly inferred and generated by a diffusion-driven stochastic SEIR process. We implement our suggested model in a Bayesian framework, using Hamiltonian Monte Carlo employing the Stan software, by fitting our model to daily reported deaths for Greece, Portugal, United Kingdom, Germany, Sweden and Norway. We then examine how we can combine our estimates of the total number of daily cases with data on daily laboratory-confirmed cases and estimate the daily reporting ratio. Finally, through a multivariate regression analysis, we disentangle the effects of preventive measures and testing policies on the estimated total cases, the time-varying transmission rate and the reporting rate, using only publicly available data. Our proposed model is then fit to data from country pairs with similar population demographics and health and social welfare infrastructures, to gain insights on the COVID-19 pandemic. The paper is organised as follows: In Section 2 we present the developed model and adopted methods. Specifically, we provide an overview of the available data and present analytically our modelling framework. Section 3 contains the results of our empirical analysis. Section 4 concludes and provides some relevant discussion. All the data, R and Stan code files are made freely available at https://github.com/anastasiachtz/seir-gbm.git. Publicly available datasets containing surveillance data on new confirmed cases and deaths per day and per country or region, are maintained in the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU) [Dong et al., 2020] based on various sources. To estimate our model parameters describing the mechanisms of disease spread we use only data on the reported number of deaths as a more reliable source of information compared to laboratory-confirmed cases. The number of laboratory-confirmed daily cases constitutes a biased source of information, primarily due to the high proportion [Lavezzo et al., 2020 , Jombart et al., 2020 , Li et al., 2020a , Verity et al., 2020 of mild or asymptomatic infections which are not typically reported since people may not seek medical care or testing. This phenomenon intensifies during the pandemic waves, when the health care systems are overwhelmed, people with mild COVID-19 symptoms are advised to avoid health care unless it is necessary. Taking also into account the testing capacity constraints of each country, which has been changing over time, casts further doubts regarding the quality of the daily cases, as a source of data, towards estimating the actual number of infected people during the pandemic. The latter has been confirmed by some large-scale seroprevalence studies [Ward et al., 2021 , Pollán et al., 2020 . In this regard, we consider the total number of infections to be unobserved (latent). Reported deaths are offering a more reliable quantitative understanding of the pandemic despite their own limitations. Flaxman et al. [2020] suggested calculating backwards from observed deaths to the number of infections. The number of deaths attributed to COVID-19 has been considered less prone to under-reporting since deaths mainly arise from severe cases who are more likely to have been tested while seeking health care or after death. Therefore, we assume that the number of unreported deaths in the countries under study is negligible. However, early in the pandemic, in the absence of European and international standards, some deaths due to COVID-19 may have not be recorded. We begin our analysis 2 weeks prior each country reports 10 cumulative deaths following Flaxman et al. [2020] , with knowledge of the incomplete nature of these early data since our estimates are not affected by them. We are aware that the timing of the reporting procedures may differ between countries and reporting delays may exist, but we consider that they are relatively minor for the countries under study, however, we incorporate uncertainty within our observational model. In our context, deaths offer a window to the past, revealing the infections which led to them, but they can not be used in real-time analysis to inform current infections unless additional assumptions are imposed. Based only on publicly available data sources on the reported number of deaths, we develop a modelling framework where we link the daily new infections to the reported deaths, and infections are generated by a stochastic transmission SEIR compartmental model, adding another level of hierarchy to the model. Also, given the significant vaccine rollout, we extend our model to reflect the impact of vaccinations. The directed acyclic graph at figure 1 represents the structure of the model delineating its transmission and observational components. Statistical inference procedures are addressed within a Bayesian framework and parameter learning is carried out using Stan's implementation of Hamiltonian Monte Carlo [Stan Development Team, 2018] . The analysis period ranges from March 2020 to September 2021. Death count data for Greece, Portugal, Germany, United Kingdom and Norway are obtained from CSSE at JHU [Dong et al., 2020] , while data for Sweden were directly obtained from Folkhälsomyndigheten, the Public Health Agency of Sweden. Swedish Public Health Agency adjusts the daily number of deaths ex-post, correcting for the reporting delay, resulting in significantly different counts compared to their initial reports. We examine all possible discrepancies between data reported by national health authorities and data maintained in the COVID-19 Data Repository by CSSE at JHU and use the most integrated dataset for each country. Detailed references for each data source are presented in the supplementary material. The base of the model is to link the daily new infections to the data on reported deaths. The infection fatality ratio serves as a bridge between deaths and true infections, in the sense that deaths on any day occur from previously acquired infections according to their probability of death given infection. Following Flaxman et al. [2020] , we assume that the expected number of deaths at time t, d t , is a function of the unobserved true past infections c t−s , weighted by the distribution of time from infection to death, f [Verity et al., 2020] , and multiplied by their probability of death i.e the infection fatality ratio, if r [Levin et al., 2020] . Therefore, the expected number of deaths at time t can be expressed as, where c τ are the unobserved true past infections which will be linked to the solution of a system of ODEs (6). The infection-to-death distribution is the sum of two independent Gamma distributions, the estimated infection-to-onset distribution and the estimated onset-to-death distribution [Verity et al., 2020] , f ∼ Γ(6.29, 0.26) and it is discretized by f 1 = 1.5 0 f (τ )dτ and f s = s+0.5 s−0.5 f (τ )dτ for s = 2, 3, .... The overall if r depends on the age distribution of the infections, which changes over time. Until the end of 2020 in our period under study, we identify the points in time where the age distribution of reported infections changes, especially for the 50 − 69 and over 70 age groups which have both the lower under-reporting rates and the higher age-specific if r. We evaluate different overall if r across these different periods using estimates of the age-specific if r reported by the Centers for Disease Control and Prevention (CDC) based on Levin et al. [2020] , which we then adjust taking into account the improvement of health infrastructures after the first few months of the pandemic and the emergence of more lethal and transmissible variants at the end of 2020 [Davies et al., 2021 , Volz et al., 2021 . Our central calculations of the overall if r for each time period j in the year 2020, of a relatively constant proportion of reported cases per age where if r g is the age-specific if r for age group g, g = 1, . . . , G, c rep g is the cumulative number of reported cases of age-group g and C rep is the cumulative number of all reported cases. Given our calculations on mean if r j s for every country, we assign Beta priors for each if r j with mean if r j and low variance. At the end of 2020 and the beginning of 2021, COVID-19 vaccines became available in several countries, including the countries under study. Vulnerable groups at the highest risk of severe disease were prioritized by the designed vaccination programs, therefore older age groups were immunised first. Protection of older age groups and an increase in the number of infections in younger age groups in the presence of more transmissible variants, led to a significant decrease in if r. Based on the rate of decrease of the estimated overall if r in the UK by Birrell et al. [2021] , we re-adjust the overall if r in all countries, taking into account the timing of their immunization programs. Given the existence of asymptomatic infections and the inability of mass testing, the confirmed number of infections will always be smaller than the actual number of infections. We use the latter as a rule of thumb and update the overall if r if the estimated number of infections is larger than the reported number of infections. The reported deaths D t at time t are assigned a Negative Binomial distribution with mean d t and variance d t + where 1/φ controls the overdispersion and a-prioti 1/φ ∼ C + (0, 5). Several epidemiological models have been proposed [Birrell et al., 2021 , Flaxman et al., 2020 , Lemaitre et al., 2020 , Dehning et al., 2020 , Wood, 2021 attempting to describe the transmission dynamics of SARS-CoV-2 and the effects of preventative measures on these dynamics. Non-pharmaceutical interventions such as social distancing recommendations, limitations on the size of indoor and outdoor gatherings, promotion of teleworking, self-isolation of symptomatic individuals, school closures and ultimately stay-at-home measures, primarily aim to limit the contact rate between individuals while also affecting the relative infectiousness of infected individuals. These control strategies have been changing over time, different action plans have been adopted according to the epidemiological situation of each country and local communities have responded differently to these measures between pandemic waves. These considerations suggest that a flexible model should be adopted for capturing the dynamics of the effective transmission rate. Here we consider a stochastic expansion of the well-known deterministic SEIR compartmental model [Anderson and May, 1992] which assumes a homogeneously mixing population in which all individuals are equally susceptible and equally infectious should they become infected. Instead of a constant transmission rate between susceptible and infected individuals, we assume that it follows a stochastic process [Dureau et al., 2013] . Specifically, conditional upon the stochastic infection rate β t , we consider that the transmission dynamics resulting to the generated true infections in each country, are expressed as the solution of the following system of non-linear ordinary differential equations (ODEs): where S t represents the number of susceptible, E t the number of exposed, but not yet infectious, I t the number of infected and R t the number of recovered individuals at time t. The total population size of each country is denoted by N (with N = S t + E t + I t + R t ). In order to allow the latent and infectious periods to be gamma distributed, we assume that each of the E and I compartments are defined by two classes, E 1t , E 2t and I 1t , I 2t respectively. Hence, γ 1 denotes the rate at which the exposed individuals become infective so that 2 γ1 is the mean latent period and γ 2 denotes the recovery rate so that 2 γ2 is the mean infectious period. The vaccine efficacy is denoted by ρ, and ν t−U is the reported number of individuals who received the first dose of a vaccine U days prior to time t. We consider a simple vaccination scenario, where vaccinated individuals are protected 45 days after the first dose of any of the available vaccines, which is the average time to obtain immunity given that during this time interval they also have their second dose where necessary. To account for imperfect vaccine efficacy, we consider that vaccinated individuals move to the removed compartment proportionally to the vaccines efficacy, which we set equal to 50% as an average efficacy of the different types of the distributed vaccines. The transmission rate at time t is denoted by β t , for which we assume the following stochastic differential equation (SDE) The model in (5) may be viewed as the prior for the transmission rate trajectory β t . The function g(·) transforms to the real line and is typically set to the logarithm log(·). The drift function µ(·) determines the mean change in β t and is being set to 0 as we a-priori assume that upward and downward movements are equally likely. The function σ(η t , θ η ) reflects the volatility and B t denotes standard Brownian motion. Our starting point is a constant volatility assumption, σ(·) ≡ σ, but this is relaxed by introducing specific change-points across different waves to capture potential different responses resulting from adaptive human behavior. Given these specifications, we get the geometric Brownian motion as the prior for the β t trajectory. In order to link with the available observations, the model-implied daily new infections, denoted by c t , are obtained by The above integral requires solving the ODE in (4) together with the SDE in (5) and in this respect our model can be viewed as a hypo-elliptic diffusion. As such, it cannot be solved analytically meaning that the exact likelihood function for the observed data is intractable. For this reason, we adopt the data augmentation framework of Dureau et al. [2013] , in the spirit of Roberts and Stramer [2001] , that employ time-discretization via the Euler approximation. The fineness of the discretization can be chosen by the user to control the approximation error. For illustration purposes, let us consider β t to be constant between each each pair of days, i.e. for each [t − 1, t), noting that smaller intervals can also be used. The model in (5) then implies that η t |η t−1 ∼ N η t−1 , σ 2 and the solution of the ODEs in (4) can be approximated using the trapezoidal rule. More sophisticated Runge-Kutta methods may also be used at the expense of higher computational cost. To complete the model specification, we consider Gamma prior distributions for the rate of loss of latency and the recovery rate, with small variances, reflecting 2 days average latent period and 4-5 days average infectious period [Li et al., 2020b . Due to the fact that the testing capacity of the countries under study has been scaled up, particularly during 2021, we assume that new cases are isolated slightly faster, translated in shorter infectious period. Therefore, we consider an average infectious period of 5 days during the first year of the pandemic and adopt a shorter average infectious period of 4 days for the last several months. Finally, a half-Cauchy prior is assigned for the volatility of the Brownian motion, σ w ∼ C + (0, 5) for each pandemic wave w. The model was fitted in the Stan software using the NUTS algorithm. Inference for ODE-based models represents a non-trivial statistical problem. As the chosen MCMC algorithm explores the parameter space we are effectively solving an increasing number of ODE systems and the ODEs' behaviour varies for different parameter values [Grinsztajn et al., 2020] . Therefore, sophisticated systems of ODEs can be computationally intensive. This became apparent when fitting our stochastic SEIR model for an extended time period. As the number of observations increased, so did the number of parameters given the time-varying nature of the transmission rate, resulting in relatively small effective sample sizes indicating slow mixing. To improve efficiency, we fit first the corresponding SIR model and then use the posterior estimates to uniformly draw initial values for the SEIR model. Details on the implementation can be found at the supplementary material. In order to assess the time course of the pandemic in the 6 European countries under study, estimates on the time-varying reproduction number, the number of daily infections and the cumulative infections per country are reported. For the stochastic transmission SEIR model, the time-varying reproduction number R t can be thought of as the average number of secondary cases generated by a typical infectious individual, calculated here using 2β t /γ 2 . Figures 2, 3 , 4 summarize our results on R t , for each country pair. Estimates of R t at the start of the epidemic must be viewed with caution since early data on deaths can not accurately reflect the transmission dynamics of local, as opposed to country-wide, spread. For R t , unless otherwise stated, we report posterior medians and pointwise equal-tailed 95% credible intervals. Analytical results for all the inferred parameters characterizing transmission, can be found at the supplementary material, including the parameters appearing in the observational process. We examine countries in pairs based on their similarities in terms of demographics and health and social welfare infrastructures. Even though our study includes only European countries, the adopted preventive measures may differ significantly between countries and so may the response of the local populations to those measures, as captured by the variation in the transmission rate. In early March 2020, after the detection of the first SARS-CoV-2 infections, both Greece and Portugal (Fig. 2) introduced sequentially several control measures aiming to prevent large-scale outbreaks. We estimate that early measures taken by both countries, such as cancellation of large public events and closure of educational facilities, managed to drop R t significantly, before their nationwide lockdowns. In Greece, lockdown sustained a low spread of SARS-CoV-2 until the beginning of the summer, as reflected in the estimated R t which remained below 1, as opposed to Portugal for which we estimate temporary fluctuations of R t well above 1 during the same period. Between August and September, some months after the restrictive measures were eased, there was a gradual re-emergence of SARS-CoV-2 transmission in both countries. We estimate that the gradual reintroduction of social distancing measures led to a fall in R t , before the imposition of lockdowns, which sustained lower R t in the short run. However, in early 2021, in the wake of Christmas and New Year's relaxed measures, as well as the establishment of more transmissible variants, a steady increase in R t is estimated in Greece. As a consequence, control measures were strengthened further which resulted in lower transmission levels until June, as reflected in the estimated R t . In Portugal on the other hand, even though a state of emergency was declared in early November, the transmission wasn't tamed easily, and in conjunction with relaxed measures during Christmas, we estimate that R t reached its highest level since the fist wave, at the end of December 2020. Hospitals were pushed to the limit of their capacity during Portugal's third wave which was driven by the highly transmissible alpha variant [Davies et al., 2021 , Volz et al., 2021 . Eventually, stricter lockdown rules were imposed reducing R t . As the control measures where eased during summer 2021, in the presence of the more transmissible delta variant, both countries faced significant rises in R t . Figure 3 illustrates the progression of estimated R t in United Kingdom and Germany. In Germany large events were cancelled and schools as well as non-essential shops were closed by the middle of March reducing R t significantly even before the partial lockdown, which however sustained R t levels well below 1. Elevated testing and tracing policy early in the outbreak, allowed Germany to start lifting restrictions in early May while maintaining low transmission. Regarding the United Kingdom, we estimate that by the middle of March the slow introduction of social distancing measures, before the nationwide lockdown, managed to reduce R t below 1. A gradual re-emergence of SARS-CoV-2 transmission after summer gave rise to R t , which by mid-September is estimated to be constantly above 1 in both countries. Control measures and ultimately lockdown tamed transmission in the short-run in both countries, however, we estimate that the significant drop in R t was prominent before the implemented lockdowns. Increased transmissibility of the alpha variant as well as relaxed restrictions during Christmas, resulted in a rapid rise in infections both in the United Kingdom and Germany. Both countries implemented stricter lockdowns while the estimated R t was decreasing. Despite the stricter measures and a significant vaccine rollout in both countries, we estimate that R t remained over 1 up to the end of September in the presence of highly transmissible variants. While Norway's public health response to COVID-19 was similar to that of many European countries including those under study, Sweden adopted less restrictive measures with no general lockdown. Figure 4 presents our results on the estimated R t for Sweden and Norway during the period under study. The day the global pandemic was declared, Norway acted quickly, imposing a lockdown with school closures and rigorous testing. As we estimate, the early measures resulted in a substantial reduction in R t well below 1. In contrast, Sweden sequentially introduced non-pharmaceutical interventions, cancelling large public events and recommending social distancing measures especially for more vulnerable groups. However, all businesses as well as schools continued to operate. We estimate that R t managed to drop below 1, although at a smaller pace compared to Norway, but Sweden faced excess transmission in elderly care homes leading to many deaths. A resurgence of infections during the second pandemic wave, led Sweden to introduce stricter control strategies, similar to those of other countries. We estimate that R t started increasing in September in both countries, however, the estimated R t in Sweden peaked at a much higher level in October, compared to Norway. Re-introduction of control measures dropped the estimated R t in the short-run for both countries, yet during the first months of 2021, increased transmissibility characterizing the third pandemic wave, led to an increase in R t . Sweden faced a more severe burden with an estimated R t higher than Norway. The number of new daily infections is one of the main epidemiological quantities offering straightforward insight into the actual burden of the pandemic. In the supplementary material we present the estimated daily new cases, the reported cases for each country as well as the estimated aggregate incidence from March 2020 to September 2021. Our findings indicate that in all countries, during the first and second pandemic wave, the estimated daily new cases are significantly higher than the laboratory-confirmed ones. In Figure 5 , the estimated number of cumulative cases in the United Kingdom is presented, along with the equivalent estimate from REACT-2 [Ward et al., 2021] . Seroprevalence surveys such as REACT constitute a direct approach to estimate the actual number of individuals that have been infected but have not been detected by surveillance systems. We selected this particular survey since it was carefully conducted, it was sufficiently large to offer an accurate estimate of the proportion infected and took place essentially after the first wave, therefore minimising the chance of missing cases due to waning antobody levels. According to REACT-2, by mid-July the overall antibody prevalence in England was 6% (95% CI: 5.8 − 6.1). If we adjust the estimated overall prevalence to the population in the United Kingdom, our estimates on the total population infected essentially coincide with the estimates from REACT-2, independently validating our findings. Our estimates on the number of true daily cases may be combined with data on daily laboratory-confirmed cases resulting in an estimate of the number of unreported cases each day and consequently an estimate of the daily reporting ratio. Thus, using the posterior median of the estimated total number of infections at time t, denoted by c t , we explicitly incorporate a reporting delay between actual exposure and report, L, so the number of unreported cases can be described as is data on the number of laboratory-confirmed cases which are reported at time t (see supplementary material). We consider a time delay between infection and report (L) equal to 6 days and define the daily reporting ratio as the ratio of laboratory-confirmed cases to the estimated total number of cases adjusted to their time of report i.e. r t = c rep t /c t−L Especially during the first pandemic wave, when reporting protocols had not been established, there are several days where the reported number of cases display spikes that do not represent an actual increase in cases in this particular day but inconsistencies in reporting. Given that we want to capture the general direction of the varying reporting ratio, we implement a generalized additive model smoothing to remove those resulting peaks [Wood et al., 2016] . We used the mgcv-package [Wood and Wood, 2015] and derive a spline-based smoother as a function of time. The results on the smoothed reporting ratio for each country are presented in Figure 6 . Our findings indicate that during periods of high transmission there is a large proportion of under-reported infections in all countries, consistent with advice on stay at home unless needed. Even though most countries improved their testing coverage during the second and third wave, the predominance of more transmissible variants led to an increased number of infections, a large proportion of which was not detected by health systems, resulting in estimated reporting ratios lower than 40%. Discordance between our estimates on cases and the reported data for Sweden and Norway suggests that further refinement is required in these estimates. As noted in subsection 3.1, the estimated R t varies over time, which seems consistent with the expected changes as a result of the implemented control measures for COVID-19 in the countries under study. Essentially, the time-varying transmission rate β t captures transmission dynamics as shaped by the evolution of susceptible and infected individuals which in turn is affected by several time-varying factors such as non-pharmaceutical interventions, variations of human behaviour based on sociodemographic characteristics and climatic variations among others. These effects are highly interdependent making it difficult, in the absence of additional data sources, to disentangle their individual contributions to β t and design targeted and more efficient public health control strategies for each country. However, using publicly available data, we can investigate the link between some generic measures of disease control and key epidemiological quantities. In what follows we examine the relationship between the time-varying transmission rate, the daily number of true cases and the daily reporting ratio with two disease measures; mobility patterns and testing policies. Given the significant impact of vaccinations on transmission dynamics, we have chosen to run this analysis until March 2021, when the effect of non-pharmaceutical interventions was still dominant. Social distancing measures such as stay-at-home recommendations, limitations on gatherings, closure of schools and workplaces and general restrictions on internal movement can be reflected in changes in mobility patterns [Kraemer et al., 2020 , Lemaitre et al., 2020 . These kinds of measures aim to limit the contact rate between individuals and therefore reduce transmission. Mobility data by Google [Google, 2021] are collected by geographical location and summarize relative changes in movement in different categories of places, such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential. We access Google mobility data for the studied countries and kept the first component of a principal component analysis as a single measure of mobility [James et al., 2013] . Regarding testing and tracing policies a crude measure of the efforts to increase testing capacity in each country is the reported number of daily tests. . Data of this kind may not be an accurate representation of each country's testing and tracing policy but may reasonably account for a significant fraction of the actual tests performed and crudely reflect possible variations in the testing policy. We assume that the time-varying transmission rate, the true cases and the reporting ratio correlate over time and use multivariate regression analysis in order to examine their relationship to mobility patterns and testing policies. We use as covariates the sum of lagged mobility trends weighted by the time they are able to generate infections and the daily number of tests performed 3 to 6 days ago. So we run the following multivariate regression for each country: Y t ∼ MVN δ 1 m t + δ 2 tests t−3 + δ 3 tests t−4 + δ 4 tests t−5 + δ 5 tests t−6 , Σ where Y t = (c t , log(β t ), logit (r t )). The mobility proxy m t is described as t−1 τ =1 mob τ π t−τ , where mob is the first principal component of movement trends and π ∼ Γ(2.6, 0.4) is the serial interval which is discretized by π 1 = 1.5 0 π(τ )dτ and π s = s+0.5 s−0.5 π(τ )dτ for s = 2, 3, .... The number of tests at day t is denoted by tests t and the regression coefficients by δ i , i = 1, . . . , 5. The covariance matrix is represented by Σ and we can rewrite it in terms of the correlation matrix Ω as, Σ = D σ ΩD σ , where D σ denotes a diagonal matrix with diagonal elements σ i , i = 1, 2, 3. Then, we specify a LKJ onion method correlation matrix distribution [Lewandowski et al., 2009] for Ω. The parameterization of LKJ distribution used in Stan allows to sample matrices depending on a shape parameter. The shape parameter determines whether we expect to sample posterior matrices close to the identity matrix or to general positive definite matrices [Stan Development Team, 2018] . In principle, large values of the shape parameter pertain to correlations close to zero while values less than 1 suggest high probability for non-zero correlations. We use Stan's implicit parameterization of the LKJ correlation matrix in terms of its Cholesky factor. We account for the dependence in our variables by assuming that Ω ∼ LkjCholesky(0.5) which translates to a U-shaped prior over random correlation matrices and assign a half-Normal prior on the standard deviations: σ i ∼ N + (0, 10). For the vector of regression coefficients δ we use Zellner's g-prior, a multivariate normal density with covariance matrix proportional to the inverse Fisher information matrix [Zellner, 1986] where g reflects the amount of information available in the data relative to the prior. We set g = n which is equivalent to the prior having the same amount of information as one observation [Kass and Wasserman, 1995] . Our results suggest that mobility has a significantly positive effect on the transmission rate for Greece, Portugal, United Kingdom and Germany, indicating that increased mobility is associated with increased transmission. This effect is weaker with respect to the total infections as a significant effect is clear only for Portugal. The relation between the reporting ratio and mobility is not immediately apparent since increased mobility can increase infections but if at the same time the number of tests increases then the effect on the reporting ratio would be unclear. In Greece, United Kingdom, Germany, Sweden and Norway our estimates reflect a positive effect of mobility on the reporting ratio while there is a negative effect in Portugal. Concerning the impact of tests on transmission and infections, we would expect increased testing capacity to decrease the number of infections, as well as the transmission rate, given that confirmed infections are detected earlier and isolated. Our findings are in line with the latter, indicating a significant negative effect of the lagged number of tests on transmission rate for Greece, Portugal, Sweden and Norway. However, an increase in the tests performed may be the result of increased transmission in the community. A positive statistically significant relation between the estimated daily cases and the number of tests performed during the previous days, is observed in all countries. Finally, the reporting ratio is positively associated with test numbers in the United Kingdom and Norway. In any case, our results for Sweden and Norway should be cautiously interpreted, given the discrepancies in our estimates of the daily infections. In this paper, we present a Bayesian approach for the estimation of temporal changes in the reproduction number of SARS-CoV-2 through data on deaths. Using a flexible stochastic extension of the SEIR model, we examine the COVID-19 pandemic in 6 European countries, inferring key epidemiological quantities such as the case reproduction number and the daily number of cases. The adopted COVID-19 outbreak control measures primarily aim to affect the transmission rate affecting the evolution of susceptible and infected populations. We assume that the generation of infections is described by an extension of the deterministic SEIR compartmental model where the transmission rate is stochastic. A proportion of these infections results to the deaths that we observe, according to a certain probability. We estimate that during the time course of the pandemic there have been substantially more infections than those detected by health care systems. Especially during the peak of each pandemic wave, the actual number of infections is significantly larger. The estimated cumulative cases can offer a measure of the actual burden of the pandemic. We show that the estimated changes in the reproduction number are consistent with the expected variation in SARS-CoV-2 transmission over time, as a result of the implemented control strategies. We estimate that all countries except Sweden, having introduced several non-pharmaceutical interventions, were able to drop R t below 1 well short of their nationwide lockdowns. The effects of sequentially introduced interventions in a small period of time, are highly interdependent making it hard to disentangle their individual contribution. Therefore, one may not offer robust conclusions concerning optimal strategies in the absence of additional data sources. A distinct characteristic of our modelling approach is the absence of strong structural assumptions for the temporal evolution of the transmission rate, and subsequently for the case reproduction number, departing from the piecewise constant assumption of Flaxman et al. [2020] . Changes in R t are only driven by variations in the observed data on deaths for each country. We consider that control measures, different public responses on these measures based on cultural characteristics, adaptive human behaviour during a pandemic and any other time-varying factor is reflected in the trends in the numbers of deaths resulting from the respective infections. Therefore, our framework can be adapted to other countries in a straightforward manner. Several limitations need to be considered when using death counts as the main source of information. Early in the pandemic, in the absence of European and international standards, some deaths due to COVID-19 may not be recorded leading to underestimation of infections. Therefore our initial estimates must be viewed with caution. Also, reporting procedures differ between countries both in terms of the timing of the report as well as the definition of COVID-19 related death. We used an integrated data source, comparing the data reported by each national health authority to the data maintained by CSSE at JHU. In addition, data on deaths depend on past infections and are not suitable for real-time analysis without further assumptions. However, in the absence of large seroprevalence studies in many countries, death counts offer a credible option for evaluating the actual burden of the pandemic in terms of people infected. The objective of this work was to provide a flexible framework offering an accurate representation of what has happened in the pandemic thus far. Extending our analysis for 19 months increased the computational cost and introduced several time-dependent factors which should be taken into account. Our findings rely on estimates of the if r which have large uncertainty especially after partial immunity is induced through vaccination. We allow only for deterministic changes in if r at specific time points based on our empirical observations on the emergence of more lethal and transmissible variants, the introduction of vaccines and the improvement or extreme pressure on health infrastructures. Seroprevalence studies may offer additional insights into the time-varying if r. Unfortunately, such surveys are not readily available available for all countries. During the time course of the pandemic factors such as the capacity of hospitals during periods of high transmission and the efficacy of the available vaccines may affect the infection to death distribution, assumed constant throughout this analysis. Additional detailed data on hospitalization can relax this assumption, further refining the results. Nevertheless, as evident by the comparison of our results to independent studies, the generic approach of this work offers a flexible and accurate framework for modelling SARS-CoV2 transmission. In order to improve efficiency while fitting the stochastic transmission SEIR model for an extended time period, we first fit the analogous SIR model and then use the posterior estimates to draw initial values for the SEIR model. For shorter time periods, the SIR model was less sensitive to initial values and computationally less intensive. Details on the SIR model's specification and indicative results for Greece can be found in the extended supplementary material available at https://github.com/anastasiachtz/seir-gbm.git. In executing this analysis, we began with 4 chains each with 4500 iterations, a warm-up of 2000 and thinning rate equal to 5, leading to a total of 2000 posterior samples. However, time constraints led to the use of fewer iterations which were adequate to produce reliable estimates. Therefore, for some countries, we use 4 chains, each with 1000 iterations of which the first 500 are warm-up to automatically tune the sampler, leading to a total of 2000 posterior samples. We examine the convergence of the parameters by inspecting the trace plots of all chains, indicating that there is no lack of convergence, and by checking theR convergence statistic and effective sample sizes reported by Stan. In all cases it appears that the chains converged reasonably well with relatively low effective sample sizes only for the volatility parameters of the Brownian motion. Volatilities are top-level parameters in our hierarchical model and we have very little information for them. Other non-informative priors have been also tested for σ w giving similar results. Analytical results are provided in the next section. Figures S1a-S6b offer a thorough graphic presentation of the time course of estimated infections in parallel with some general control measures implemented in each country, for the time period between March and mid-October 2020 and between mid-October 2020 and September 2021. For ease of presentation, uncertainty in our results on new daily infections is expressed through 50% credible intervals (CI) derived from the 25% and 75% quantiles. Our initial estimates of the infections are uncertain, due to possible under-ascertainment in deaths, and likewise, when it comes to the estimated infections of the latest period under study, uncertainty exists due to the fact that deaths can only offer an accurate representation of infections during the past 2 − 3 weeks. The estimated aggregate infections in each country from March 2020 to September 2021 are presented in Figure S7 offering an indication of the actual burden of the disease. Finally, Figure S8 presents a visual inspection of the model fit to the data on death counts for each country and Figure S9 sets out additional model's estimates on the unreported daily number of cases for each country. Tables S1, S2, S3 present posterior means and credible intervals for some indicative model parameters for each country, as well as effective sample sizes andRs. In all countries, during the first and second pandemic wave, the estimated daily new cases are significantly higher than the laboratory-confirmed cases. Also, both posterior medians and credible intervals are skewed to the left in comparison to the reported cases, which can be explained by possible reporting delays and by the fact that transmission also occurs through asymptomatic individuals [Lavezzo et al., 2020] . We estimate that during the first pandemic wave, Greece and Portugal having introduced several control measures well short of lockdown, were able to suppress infections early ( Fig. S1a and S2a). With regard, however, to the resurgence of infections in fall 2020, stricter measures were necessary to stem the increase of infections. The notable difference between estimated and reported cases in Portugal in January of 2021 (Fig. S2b) , indicates a possible change in the infection to death distribution, which we considered constant throughout our analysis. A shorter infection to death distribution is consistent with the extreme pressure in hospitals at that time in Portugal. Regarding the estimated infections in Germany and the United Kingdom, during the first wave, a decrease in daily new cases appeared some days before the nationwide lockdown in both countries ( Fig. S3a and S4a) . However, during the subsequent pandemic waves infections are estimated to increase steadily although both countries had imposed nationwide lockdowns in November. The dominance of the alpha variant led to extended lockdowns which eventually reduced the number of infections as estimated. In Sweden, we estimate that a large proportion of total infections were undocumented during the first wave (Fig. S5a) . Also, a substantial amount of these infections appear much earlier in the pandemic than what is reported. Poor monitoring procedures and uncontrolled outbreaks in care homes in Sweden can partly explain this significant mismatch, since older infected individuals may have a smaller infection to death distribution. The assessment of the subsequent waves, especially after January 2021 requires further investigation. The apparent inconsistency between estimated and reported cases may be the result of discrepancies in the data as well as several time-dependent factors not accounted for in our model. Estimates on the true cases in Norway must also be viewed with caution, given the sparsity of the reported data on deaths (Fig. S8f) . Days with zero reported deaths were followed by days with high death counts particularly after September 2020. Norway is a case in point, indicating the strong dependence of our model on the quality of reported data. Regarding our estimates on the total infected population, Norway's increased testing capacity compared to the other countries under study, can explain the small difference between the estimated and reported number of individuals who have been infected, as presented in Figure S7f . Counting process models for infectious disease data: distinguishing exposure to infection from susceptibility Bayesian inference for stochastic multitype epidemics in structured populations via random graphs Inferring the number of covid-19 cases from recently reported deaths Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov-2) Estimates of the severity of coronavirus disease 2019: a model-based analysis. The Lancet infectious diseases Sars-cov-2 antibody prevalence in england following the first peak of the pandemic Prevalence of sars-cov-2 in spain (ene-covid): a nationwide, population-based seroepidemiological study Estimating the effects of non-pharmaceutical interventions on covid-19 in europe Capturing the time-varying drivers of an epidemic using stochastic dynamical systems An interactive web-based dashboard to track covid-19 in real time. The Lancet infectious diseases Suppression of a sars-cov-2 outbreak in the italian municipality of vo' Reconstructing the early global dynamics of under-ascertained covid-19 cases and infections Stan Modeling Language User's Guide and Reference Manual, Version 2 Assessing the age specificity of infection fatality rates for covid-19: systematic review, metaanalysis, and public policy implications Estimated transmissibility and impact of sars-cov-2 lineage b. 1.1. 7 in england Assessing transmissibility of sars-cov-2 lineage b. 1.1. 7 in england Real-time nowcasting and forecasting of covid-19 dynamics in england: the first wave Assessing the impact of non-pharmaceutical interventions on sars-cov-2 transmission in switzerland. medRxiv Inferring change points in the spread of covid-19 reveals the effectiveness of interventions Inferring uk covid-19 fatal infection trajectories from daily mortality data: Were infections already in decline before the uk lockdowns? Infectious diseases of humans: dynamics and control On inference for partially observed nonlinear diffusion models using the metropolishastings algorithm Early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia Bayesian workflow for disease transmission modeling in stan Estimating the time-varying reproduction number of sars-cov-2 using national and subnational case counts Smoothing parameter and model selection for general smooth models The effect of human mobility and control measures on the covid-19 epidemic in china Coronavirus pandemic (covid-19). Our World in Data Generating random correlation matrices based on vines and extended onion method On assessing prior distributions and bayesian regression analysis with g-prior distributions. Bayesian inference and decision techniques A reference bayesian test for nested hypotheses and its relationship to the schwarz criterion A global panel database of pandemic policies (oxford covid-19 government response tracker) COVID-19 Data Repository by the COVID-19 Data Repository by the sex Daily vaccinations with 1st dose Greece Hellenic National Public Health Organization Anastasia Chatzilena states that: "Part of this research is co-financed by Greece and the European Union (European Social Fund-ESF) through the Operational Programme «Human Resources Development, Education and Lifelong Learning» in the context of the project "Strengthening Human Resources Research Potential via Doctorate Research" (MIS-5000432), implemented by the State Scholarships Foundation (IKY)". This work was part of first author's PhD thesis submitted in the Department of Economics at Athens University of Economics and Business.