key: cord-0256657-zxbldech authors: McGregor, Geoffrey; Tippett, Jennifer; Wan, Andy T.S.; Wang, Mengxiao; Wong, Samuel W.K. title: Comparing regional and provincial-wide COVID-19 models with physical distancing in British Columbia date: 2021-04-22 journal: nan DOI: 10.3934/math.2022376 sha: 3f90a8226877142e7db0c8d718e3d9e44052e50d doc_id: 256657 cord_uid: zxbldech We study the effects of physical distancing measures for the spread of COVID-19 in regional areas within British Columbia, using the reported cases of the five provincial Health Authorities. Building on the Bayesian epidemiological model of Anderson et al. (2020), we propose a hierarchical regional Bayesian model with time-varying regional parameters between March to December of 2020. In the absence of COVID-19 variants and vaccinations during this period, we examine the regionalized basic reproduction number, modelled prevalence, relative reduction in contact due to physical distancing, and proportion of anticipated cases that have been tested and reported. We observe significant differences between the regional and provincial-wide models and demonstrate the hierarchical regional model can better estimate regional prevalence, especially in rural regions. These results indicate that it can be useful to apply similar regional models to other parts of Canada or other countries. The coronavirus disease 2019 is caused by the SARS-CoV-2 virus, and was declared by the World Health Organization to be a global pandemic on March 11, 2020. The virus and its associated illness have spread from their origin in Wuhan, China to nearly every corner of the world. The first reported case of COVID-19 arrived in the province of British Columbia (BC) in Canada on January 28, 2020, and on the one-year anniversary of the pandemic, BC has had 86,219 positive cases and 1,397 deaths, as reported by the Government of Canada (2021) . Prior to the development of vaccines and also with on-going COVID-19 immunizations, many countries around the world and their associated provinces, states, or sub-regions, including BC, have been relying on non-medical interventions to control the transmission of the virus. The application of these interventions -including mask usage, physical distancing, contact tracing, hand washing, improved ventilation recommendations, travel restrictions, and restriction on social gatherings -can be either fixed or dynamic and is intended to prevent overwhelming the health care system. Fixed interventions are in place for a set duration of time and dynamic interventions are cycled on and off in response to the demand on the health care system. Periods of tightening and relaxation of COVID-19 restrictions can have a dramatic effect on the number of new COVID-19 case incidences over time and the overall duration of the pandemic as discussed in Shafer et al. (2021) . Thus some countries, including New Zealand, have preferred a "COVID zero" approach where these non-medical interventions are strictly enforced with the goal of eradication of COVID-19 within a specific geographic region, as recommended by Baker et al. (2020) . In contrast, BC has applied dynamic non-medical interventions to control the strain on the health care system for the majority of 2020. Urban and rural areas, including those within BC, are heterogeneous in terms of population density, cultural profile, social determinants of health, and attitudes towards COVID-19 restrictions. As defined by Health Canada (2018), social determinants of health refer to social and economic factors affecting the health of groups of individuals. Specifically, there is an inherent lack of availability of health services in rural areas and significant geographical barriers to accessing health services. On average, rural residents have to travel four times farther than those in urban regions to visit a physician, as reported in Garasia and Dobbs (2019) . Moreover, non-medical interventions rely on the collective goodwill of the people to comply with public health orders and the perceived threat of COVID-19. These behaviours and perceptions are known to vary between urban and rural regions. Specifically, rural residents have been found to be less likely to adapt their behaviour in response to COVID-19 and to have more negative perceptions of the effectiveness of the non-medical interventions, as discussed in . While these significant differences between urban and rural areas persist, BC has seen predominantly blanketed restrictions across the province for the majority of 2020, based on COVID-19 projections using data sourced from urban centres. Thus, macro-level modelling at the provincial or territorial level may be ignoring the nuances of rural living and limiting the specificity of these models to non-urban areas. One metric utilized for quantifying these regional differences is the basic reproduction number, R 0 , defined as the expected number of new cases a single infectious individual will generate. As discussed in Delamater et al. (2019) , a region's estimated R 0 value will vary depending on the population density and social behaviours specific to that region. For example, in a COVID-19 modelling study of Turk et al. (2020) , R 0 was estimated to be lower for a central urban area than for the entire state of North Carolina. In BC, a compartmental mathematical modelling study by Anderson et al. (2020) played an instrumental role in guiding the reopening plans devised by the provincial government after it shut down all but essential services on March 17, 2020. This model was used to assess the spread of COVID-19 in BC and to provide projections based on various levels of physical distancing. It was constructed as a modified SEIR model, which differs from typical SEIR models in that each model compartment was replicated, in order to create a series of two parallel compartments for those who did and did not engage in physical distancing. While the model of Anderson et al. (2020) was vital in BC's response to COVID-19 at the beginning of the pandemic, it did not account for the inherent and aforementioned differences between urban and rural regions. Furthermore, it contained parameters related to physically distancing that were derived from urban data sources, such as the rate at which individuals move to and from being physically distanced. In this work, there are two main objectives. First, we adapt the compartmental model of Anderson et al. (2020) to a regional model. Second, we study the differences between the proposed regional model and the provincial-wide model during March to December of 2020, prior to the vaccination phase beginning in January of 2021. Specifically, we propose a hierarchical Bayesian epidemiological model of COVID-19 for each of BC's five regional health authorities: Vancouver Coastal Health, Fraser Health, Interior Health, Island Health, and Northern Health. We refer the five regional health authorities respectively as Coastal, Fraser, Interior, Island and Northern and often refer to them as regions. The former two regions contain the most populated urban areas in BC, including Greater Vancouver and the Fraser Valley, whereas the latter three regions are primarily rural and less populated. Since previous research has highlighted the inverse relationship between rurality and the likelihood of modified behaviours in response to the pandemic, certain model parameters are regionalized to these five regions of BC. Specifically, we chose the regional parameters to be the relative reduction in contacts due to physical distancing and the proportion of anticipated COVID-19 cases that have been tested and reported, leading to a Bayesian hierarchical model. We discuss briefly the assumptions of our hierarchical regional model and limitations to our approach, for which the subsequent section will provide more details. While grouping the provincial case counts into regions would allow us to inspect case data more precisely within each region, looking solely at the regional data can be misleading for predicting future regional case counts. One main reason is that these regions are not isolated, since there can be migration between regions of British Columbia, even during periods of intraprovincial travel restrictions. Therefore, if one region has a large spike in cases, then it is possible that neighbouring regions will also see an increase in cases over time. The impacts of migration on COVID-19 have been recently studied by Hu et al. (2021) ; Sirkeci and Yucesahin (2020) ; Zhan et al. (2020) . Overall, reducing regional migration is an important aspect of controlling the spread of COVID-19. Despite migration playing a significant role in the spread of COVID-19 throughout regions, an epidemiological and statistical model accounting for migration requires access to migration data, which were not publicly available for the period of our study. Thus, we do not inherently take into account of migration between regions and instead use the observed case counts to infer estimates about our model parameters. As shown later in the Results and Discussion section, we observed that there were periods when the provincial trend was opposite to that of specific regional trends. For example, the province as a whole may be experiencing a rapid decrease in daily cases, while certain rural regions are experiencing growth in cases. In these situations, the insight gained from our regional modelling may help guide region specific mitigation measures. Conversely, if the opposite scenario occurs, where the provincial case numbers are on the rise while certain regions' case numbers remain low, then restricting migration may help to prevent the spread of new cases into these regions. This paper is organized as follows. In Section 2, we extend the Bayesian epidemiological model introduced by Anderson et al. (2020) to include the five health regions of BC. Specifically, we introduce a regional epidemiological model in Section 2.1 and discuss the regional time-varying specific parameters, such as regional relative contact reduction due to physical distancing. In Section 2.2, we detail the hierarchical structure of our Bayesian model. In particular, we discuss the regional proportion of anticipated cases that have been tested and reported due to improvement in testing and provide justifications on the choice of priors. In Section 2.3, we summarize specific model parameters for both the hierarchical regional model and the provincial-wide model used for comparison. In Section 2.4, we assess the hierarchical model's ability to estimate model parameters from a controlled simulation study. In Section 2.5, we provide justifications on our choice of hierarchical model versus a non-hierarchical one. Finally, in Section 3, we present our main results and compare the differences in prevalence and parameters between the provincial-wide and regional models. To close the paper, we summarize our work and discuss limitations of our regional model, along with future work. In this section, we introduce an extension of the mathematical and statistical framework presented in Anderson et al. (2020) for modelling the spread of COVID-19 throughout BC, Canada. We choose to build upon this model as it captures many of the key features required for modelling COVID-19. For example, the epidemiological model contains COVID-19 specific compartments, such as, exposed but not infectious, infectious but not symptomatic, and quarantining. In addition to these compartments, populations practising physical distancing are accounted for, with reductions in exposure being time-dependent based on government lockdown protocols. The model also accounts for changes in test availability by allowing the proportion of infected individuals being tested to change over time while also accounting for potential delays between the onset of symptoms and getting a positive test. The resulting epidemiological model is then utilized within the statistical framework to compute likely values of the unknown model parameters. As in Anderson et al. (2020) , we take a Bayesian approach to parameter estimation and sample values of parameters from their posterior distribution via Hamiltonian Monte Carlo (HMC). Computations are carried out in R, using the Stan package, as detailed in Carpenter et al. (2017) . Further details on the statistical approach are in Section 2.2. Before we proceed to the model specifics, we highlight our proposed extensions to the model presented in Anderson et al. (2020) . First and foremost, we regionalize the model to describe the spread of COVID-19 in each of the five health regions of BC: Coastal, Fraser, Interior, Island, and Northern. This entails having a distinct set of differential equations for each of these regions, with certain parameters being region-specific and others shared provincial-wide. We also extend the modelling period to the end of December 2020, therefore we have included seven different physical distancing periods, based on government protocol, and four different testing periods accounting for the increasing availability of tests. We have restricted our modelling period from March to December 2020 for two main reasons. Beginning in January 2021, BC began rolling out vaccines, as reported by Migdal A. (2020) . Accounting for vaccinated individuals would require alterations to the epidemiological model. While such extensions are well studied in the literature, it is not the main goal of our work. In addition to vaccinations, COVID-19 variants began spreading throughout BC and the rest of Canada starting in mid-December of 2020, as reported by Martins N. (2021) . Moreover, certain variants are known to be far more infectious according to BC Centre for Disease Control (2021a), and therefore, significant changes to the modelling framework would be required to study the impacts of the new variants. Given that our main objective is to study the impacts of regional versus provincial-wide modelling, we have focused our attention on data from the time period between March and December of 2020. We now present a regionalized version of the epidemiological model of Anderson et al. (2020) and discuss the assumptions of our modelling choices throughout. The epidemiological model being considered is an SEIQR type model, meaning we model individuals in health region i which are susceptible (S i ), exposed, not symptomatic and not infectious (E i 1 ), exposed, not symptomatic and infectious (E i 2 ), symptomatic and infectious (I i ), quarantined (Q i ) and recovered (R i ). In addition to these compartments, a secondary set of equations , is included to model individuals in health region i practising physical distancing. The strictness of the physical distancing measures will vary in time due to policy changes throughout the different lockdown phases between March 1, 2020 and December 31, 2020. As discussed in more detail later in this section, the proposed epidemiological model has no population influx or out-flux, meaning births and deaths are not being accounted for. Due to the relatively low death rate, and the relatively short time period being studied, we believe this is a reasonable modelling choice. Within the i th health region with population N i , the epidemiological model for individuals not practising physical distancing is given by the ordinary differential equations (ODEs): For ease of reference and clarity, we have summarized the provincial and regional parameters appearing in Equation (1) in Table 1 . The values for the fixed model parameters k 1 , k 2 , D, q, u r and u d were chosen to be the same as in Anderson et al. (2020) . The parameters k 1 , k 2 and 1 D quantify the progression of the infected population through the compartments E i 1 to E i 2 and eventually to I i and R i . Specifically, D represents the mean duration of the infectious period, and the parameter q represents the rate at which infectious individuals begin fully quarantining. We have chosen to fix k 1 , k 2 and D throughout the province, as they are intrinsic to COVID-19 during the period of our study and we fix q as we do not have access to region specific quarantining data. As discussed in Anderson et al. (2020) , approximately one fifth of all severe cases eliminated transmission by either fully quarantining, or ending up in the hospital (both considered to be within compartment Q i ). Therefore, equating 1 5 = q q+1/D , yields the value q = 0.05 as seen in Table 1 . The parameter u d governs the rate at which individuals switch their behaviour to begin physical distancing and u r governs the rate at which they return to normal behaviour 1 . Later in this section, we also show how u r and u d describe the asymptotic proportion of the population who are practising physical distancing. Since the physical distancing mandates were provincial-wide during the time period under study, u d and u r are taken as provincial-wide fixed parameters. Furthermore, we interpret the parameters f i as the relative reduction in contact due to physical distancing in region i and β as the rate of transmission from the infectious population to the susceptible population. We briefly discuss these interpretations next. Firstly, physical distancing has not been the only tactic employed throughout British Columbia to mitigate the spread of COVID-19. Other mitigation measures, such as mask usage, contact tracing, hand washing, sanitization and improved ventilation recommendations, also played prominent roles in mitigating transmission of COVID-19 between March and December of 2020. Therefore, if a lower value of f i (t) is observed in a region, this could be interpreted as a combination of mitigation factors, instead of a relative reduction in contacts based solely on physical distancing. However, the lockdown phases under study were mostly associated with changes in physical distancing protocols within the province. The other aforementioned mitigation measures remained largely unchanged throughout this same time period. Thus, while a baseline portion of f i (t) can be attributed to other mitigation measures, the changes in observed case counts in the provincial and regional data are then largely due to the changes in physical distancing measures. Due to lack of data to account for the effects of other mitigation measures on f i (t), we therefore make the assumption that physical distancing is the largest contributor to mitigating transmission of COVID-19 during the time period under study. Thus, for the rest of the paper, we interpret f i (t) mostly as the relative reduction in contacts due to physical distancing. Furthermore, this is also consistent with the interpretation of f (t) as presented in the original model of Anderson et al. (2020) . Secondly, given the heterogeneity within the regions of British Columbia, it seems reasonable to have the rate of transmission parameter, β, be region specific. However, there is dependence between β and the relative reduction in contacts, f i (t), leading to some over-parametrization in our regional model in the absence of additional constraints. For example, the same case count data can be modelled by different combinations of parameter values: (a) A less contagious disease (smaller β) but with more contacts (larger f i (t)), or (b) A more contagious disease and less contacts. Given that we are interested in observing the changes in f i (t) over time, and comparing these values across regions, its dependence on β complicates their interpretation. We therefore have chosen to estimate β as a provincial-wide parameter and f i (t) as regional parameters. Prior distributions can be assigned to stabilize these parameters while providing sufficient flexibility for the data to inform their estimates; these are discussed in Section 2.2. The justification of this modelling choice is further explored in more detail in Section 2.5. Having discussed these assumptions and limitations, we continue with introducing the remaining part of the model for the analogous compartments practising physical distancing. Specifically, within the i th health region, individuals practising physical distancing have their own set of ODEs, given by: The equations for the physical distancing population mirrors (1) almost exactly. Firstly, we see that the signs of u d and u r are switched from (1) because the physical distancing population exits (1) and enters (2), and the returning to regular population exits (2) and enters (1). Next, we see the modification that the susceptible population has a reduced rate of infection, as seen by the additional factor of f i (t) in the equations for dt . We assume f i (t) to be a continuous, piecewise constant function with values in [0, 1], where f i (t) = 1 models zero reduction in contacts and f i (t) = 0 models 100% reduction in contacts for physically distanced individuals. Moreover, the f i (t) remain constant during lockdown phases, with one week linear transitions between different phases. This results in, where the dates for t S j and t F j , for j = 1, . . . , 6 are summarized in Table 2 . These change points reflect the various phases of provincial lockdown within BC throughout 2020 as reported on BC Centre for Disease Control (2021b). Specifically, Phase 1, Phase 2 and Phase 3A were between t F 1 and t F 2 , t F 2 and t F 3 , t F 3 and t F 4 respectively. According to our notation, Phase 3B would account of t F 4 to t F 6 , and we included an additional change point for Thanksgiving at t F 5 . Finally, Phase 3C took effect after t F 6 . Table 2 : The change points are used in the definition of f i (t). We note that all dates (except Thanksgiving) were obtained from BC Centre for Disease Control (2021b). As mentioned earlier in this section, the parameters u r and u d also determine the asymptotic proportion of the population following physical distancing protocols. This follows by defining , to represent the non-physically distanced population and the physically distanced population in the i th health region respectively. Using the systems (1) and (2) the total population of our modelled region. This is as expected, since we are not accounting for birth or death. . Therefore, the system of differential equations in health region i, (1) and (2), rapidly converges to u d ur+u d N i individuals practising physical distancing. Throughout this paper, we will distinguish between the basic reproduction number in the absence of physical distancing, denoted by R 0b , and the regionalized basic reproduction number, denoted by R i 0 , which factors in physical distancing and quarantining. We obtain the equation for R 0b by computing the basic reproduction number of a simplified version of (1), without physical distancing or quarantining. This can be done, for example, using the next-generation method of Diekmann et al. (1990) ; van den Driessche and Watmough (2002), where E i 1 , E i 2 , and I i are taken as the disease states. This yields R 0b = β(D + 1 k 2 ) which is common throughout the province, since β, D, and k 2 are not regionalized parameters. As shown in Appendix A of Anderson et al. (2020) , the next-generation method can also be used to compute the regionalized basic reproduction number R i 0 , given by where e represents the asymptotic proportion of individuals practising physical distancing, given . . 6, and varies based on the different regions. With the epidemiological part of the model established, we now discuss the statistical aspects of our model. The ODEs of (1) and (2) are solved within Stan using a numerical ODE solver that includes arguments: t (independent variable time), state (the ODE system at the time specified), θ (the ODE arguments that depend on parameters), x r (the ODE arguments that depend on data only) and x i (the integral data values used to evaluate the ODE system). To relate the compartments of (1) and (2) to testing data, we follow the approach of Anderson et al. (2020) and define the expected number of reported cases on day r to be µ r , given by where ψ i (r) is now the regional proportion of anticipated cases in the i th health region on day r that have been tested and reported, and w(s) is the density function of delay s, with a maximum delay of 45 days, with w(·) denoting the Weibull distribution. The histogram of times between symptoms onset and case reporting was fitted using a Weibull distribution, as in Anderson et al. (2020) where the 99.99992% quantile of 45 days is set to be the maximum delay. The resulting parameters for the Weibull distribution are λ = 9.85 and k = 1.73, which correspond to a mean of 8.8 days and SD 5.2 days. Furthermore, given that testing protocol and availability evolved throughout the pandemic, we define ψ i (r) to be the piecewise function We chose these change points for ψ i (r) because starting on March 16, 2020, the testing shifted focus to healthcare workers, long-term care residents, and community clusters not linked to travel. Then from April 9, 2020, expanded testing included residents in remote regions, people in homeless or unstable housing, first responders and returning travellers to Canada. Finally from April 21 and onwards, any individual with COVID-19 symptoms was eligible to get a test. Figure 1 : This depicts the hierarchical structure of the model dependence on parameters. R 0b is shared between all regions and f 2 , . . . , f 7 , ψ 2 , . . . , ψ 4 and φ are individualized to each region. As described earlier in Section 2.1 and illustrated in Figure 1 , we use a hierarchical model to have all five regions share a common R 0b , with the prior of Lognormal (log(2.6), 0.2). This prior reflects the range of reported values for the basic reproduction number of COVID-19 in early 2020 before large-scale interventions were implemented, as provided by MIDAS Network (2021). This approach leads to the combining of regional and information from throughout BC to obtain smoothed estimates; that is the estimates are not purely driven by local dynamics. The rate of spread within each region is then dependent on the regional specific parameters f i 2 , . . . , f i 7 . We also expect R 0b to be an important driver of the predictions made by the epidemiological model, with larger R 0b resulting in an increased number of predicted cases. As discussed in Section 2.1, the parameters D (the mean duration of the infectious period) and k 2 (rate of movement from E i 2 to I i ) are fixed, thus ensuring that the dynamics of our model can clearly reflect a given value of R 0b , since β = R 0b (D+1/k 2 ) . For each region in the hierarchical model, we used a negative binomial observation model to link the expected case count in region i, µ i r from the structural model, and the observed case count data in region i, denoted by C i r : where φ i is the (inverse) dispersion parameter with prior 1/φ i ∼ χ 2 1 ; this follows Stan's prior recommendations which guards against favouring a high over-dispersion a priori. For more details, see Stan development team (2020). The negative binomial (with dispersion) is a commonlyused distribution to model random counts, when the variance is not necessarily a deterministic function of the mean. The separate dispersion parameters φ i for each region allow for flexibility in modelling the negative binomial variance, which could differ between regions. Equation (7) makes the assumption that the observed cases in a region on a given day are conditionally independent of other days and regions, given its modelled mean and dispersion. Since µ i r is directly linked to the epidemiological model (see Equation 5 ) and captures the trends in each region over time, we believe this conditional independence assumption is reasonable. Taking a Bayesian approach, the corresponding statistical model has parameters R 0b , f i k , ψ i j and φ i , for k = 2, . . . , 7 and j = 1, . . . , 4, and the joint posterior distribution of the parameters can be written as Table 3 . The priors chosen for f i 2 . . . f i 7 reflect the gradual loosening of restriction between March 21, 2020 and November 6, 2020, and tightening of restrictions starting from November 7, 2020. Equation (8) further indicates that we assume the priors are independent. Thus, for example, we expect a priori that f i 3 is likely to be larger than f i 2 due to gradual loosening of restrictions and so Table 3 places their prior modes at 0.5 vs. 0.4; however, this choice of prior does not force f i 3 to be larger than f i 2 in the posterior distribution due to their independence. For ψ 1 , . . . , ψ 4 , the prior modes reflect our knowledge of increasing test availability, but these quantities cannot be directly measured; the antibody study of Statistics Canada (2021) provides a rough estimate of antibody seroprevalence due to infection and it is reasonable to set the prior mode for ψ 4 (which covers the majority of the study period duration) at 0.4 with a wide SD of 0.2, given the high degree of uncertainty in the Statistics Canada estimate. 2 In general, the priors we have chosen encode our knowledge of policy and testing protocols, while being flexible enough (via prior standard deviations of 0.25 for f i 2 . . . f i 7 and 0.2 for ψ i 1 , . . . , ψ i 4 ) to allow the data to inform the final parameter estimates. This point is validated in the simulation study that follows in Section 2.4. In this section, we summarize the rest of the model inputs from known sources. Table 4 shows the populations of each region reported in Interior Health (2020). This was used for initialization and comparison with a provincial-wide model. Moreover, we have used the value of N =5,100,000 in our provincial-wide model for the population of BC 3 . For the provincial-wide model, we treat the entire province as one region and thus have a single set of differential equations (1) and (2) to solve. We take the same initial conditions as those given in Anderson et al. (2020) . Specifically, the model is initialized on February 1 with 8 individuals shared between compartments E 1 , E 2 , I and E 1d , E 2d and I d , where the 8 cases reflects an assumed 10-30% reporting rate in the early days of the pandemic (see Table 2 of Anderson et al. (2020) for the specific values). Using these initial conditions, equations (1) and (2) are solved throughout February to the obtain compartment values needed for the observation model beginning on March 1. The purpose of including the provincial-wide model is to subsequently compare results with those of our regional model. To do so, we impose the same model structure on the provincial-wide model, with the same change points for f 2 , . . . , f 7 and ψ 1 , . . . , ψ 4 and the same priors on all model parameters. Then to fairly compare the results with those from the regional model, we scale our provincial-wide results using the regional population ratios ( N i N ), i.e., Coastal by 0.24, Fraser by 0.37, Interior by 0.16, Island by 0.17 and Northern by 0.06. For the regional model, we have equations (1) and (2) to solve for each region given the set of parameters f i 1 , . . . , f i 7 and R 0b , or equivalently β. Likewise, appropriate initial conditions for the differential equations are needed, namely initial values for all model compartments in all five regions. To obtain these, we similarly scale the provincial-wide compartmental values on February 1 according to their relative population. For example, in the i-th region we set S i (February 1) = N i N S(February 1), where S denotes total number of susceptible individuals provincial-wide and N is the total population of BC. The data used in this research are publicly available from the British Columbia Centre for Disease Control website of BC Centre for Disease Control (2021b). At the time that this project commenced, the data for all health authorities were downloaded in a single CSV file and was manually organized by region. During the time frame under investigation in this work, BC experienced two waves of COVID-19 cases. The first occurred from March to early April of 2020, where the seven-day moving average of cases (red line) reached approximately 50 cases per day. From the beginning of April onward, case numbers were very low and remained that way until they started climbing at the beginning of July 2020. The second wave surge commenced at the end of October 2020. The seven-day moving average of cases skyrocketed from 150 cases per day to just under 800 cases per day by the beginning of December 2020 before tapering off and plateauing at approximately 500 cases per day for the remainder of 2020. In this section, we discuss details on validating our modelling approach using simulated data. We conducted a simulation study to assess the estimation of model parameters in a controlled setting. Starting with known values of all the parameters, we use the model to simulate observations (case counts). The model is then fitted to the simulated data, and we examine whether the original parameter values can be recovered. We follow the model setup as laid out in the previous sections, including the priors (Table 3) , change-points for f i (t) ( Table 2 and transitions in Equation 3) and ψ i (r) (Equation 6), and initialization of the compartments (Section 2.3) . The values set for the hierarchical R 0b and regional Table 5 ; these values were chosen such that case counts simulated from these parameters can mimic the real data observed for the five regions. In particular, there is substantial heterogeneity in several of the regional parameters (e.g., f 3 , ψ 1 , ψ 2 , φ), and the modes of the priors are not necessarily close to the actual parameter values (e.g., hierarchical R 0b , f 3 and f 5 for most regions, ψ(r) for Coastal region). Thus, this is also a realistic test for whether the guidance provided by the priors leave sufficient flexibility for the data to inform the parameter estimates. To simulate observations from the model, we use the initial conditions (Section 2.3) to solve the ODE system (Equations 1 and 2) and obtain compartment values for each region from February 1 to December 31 of 2020. Then we use Equation (5) to calculate the expected number of reported cases for each region on each day from March 1 to December 31 of 2020, and finally generate our simulated daily case counts from the negative binomial observation model (Equation 7) . We then fit the model to the simulated data, and the histograms of the MCMC samples from Stan for each parameter are shown in Figures 8-10 in Appendix A, with the blue vertical lines indicating the true value of the parameter (as given in Table 5 ) and the red lines indicating the 0.05 and 0.95 quantiles (i.e., the bounds of the 90% central credible interval) of the MCMC samples for that parameter. It can be seen that all of the true parameter values lie within their 90% credible intervals, which suggests that the Bayesian estimation procedure can reasonably infer the model parameters given the choice of priors. For example, the prior mode for the hierarchical R 0b is at 2.6, while its posterior distribution (last panel of Figure 10 ) has shifted to a mode just below the true value of 3.0 together with a fairly precise 90% credible interval (2.90, 3.08). The posteriors of f i likewise have fairly tight credible intervals, showing that the estimates are not unduly influenced by the priors (e.g., see the f 3 panels in the second column of Figure 9 , which clearly reflect the differences in the true regional f 3 values in Table 5 ). The most uncertain parameter is ψ 4 , as seen in the relatively wide credible intervals ( Figure 8, fourth column) ; nonetheless, a shift away from the prior mode (0.4) towards higher values is noticeable. Overall, these results provide confidence in proceeding to apply our hierarchical model to the real data analysis. In this section, we further justify our choice of a hierarchical regional model with β being a provincial-wide estimated parameter and f i (t) being regional estimated parameters. As discussed at the beginning of Section 2.1, there is dependence between the rate of infection parameter β and the relative reduction in physical contacts f i (t). In particular, we stated earlier in Section 2.1 that the same data can be fitted by having decreases in β compensated by increases in f i (t), or vice versa. This inverse relationship can be seen analytically using equation (4) for the basic reproduction number R i 0 for region i. Using the known parameters defined in Table 1 , equation (4) simplifies to with R 0b = β(D + 1 k 2 ) = 6β. Thus, we see that indeed increases or reductions in β, or equivalently in R 0b , can be compensated for by reductions or increases respectively in f i (t) to achieve the same R i 0 . This inverse relationship is also confirmed in Figures 11 and 12 of Appendix B. In these plots, we are comparing a non-hierarchical regional model, meaning both β and the f i s are estimated using only regional data, versus our hierarchical regional model with an estimated β taken to be the common throughout the province. Specifically in Figure 11 , we see that the non-hierarchical regional model estimates R 0b , shown in blue, to be consistently lower than the provincial estimates, with more pronounced variation in the less populated regions of Island and Northern. Looking at Figure 12 , we also see that the non-hierarchical regional model, shown in blue, estimates consistently larger f i (t) than the hierarchical regional model. Therefore, although the non-hierarchical regional model may be able to obtain an adequate fit of the data, the increased variability in R 0b throughout the regions inhibits our ability to interpret the estimated f i (t) values. Moreover, the stability of the parameter estimates is an important feature of the hierarchical regional model, as validated in the simulation study (Section 2.4) in which the original parameter values could be largely recovered from sample data. Furthermore, we again emphasize that the priors chosen for the hierarchical regional model provide adequate constraints on the parameters, especially R 0b and f i (t), while allowing the data to inform the final parameter estimates. In this section, we present numerical results which highlight the differences between provincial and regional modelling. We focus on two comparisons, the first being a full analysis using reported cases from March 1 to December 31 of 2020, comparing the total number of active cases or prevalence inferred from the fitted provincial-wide and regional models. We emphasize that the prevalence is different from the total number of active cases which have been reported. Prevalence takes into account all active cases within a specific region, not just those who have tested positive. The second comparison is to illustrate the predictive ability of the fitted provincial-wide and regional models, where we use the data from March 1 to November 7 of 2020 and let both models predict the future prevalence. Overall both comparisons show significant differences between the provincial-wide and regional models. Below, we present a detailed discussion of these results and provide some region-specific recommendations based on model predictions. To obtain the samples from the posterior distribution of the parameters, we used Stan 2.21.0 and R 4.0.2, running 2000 HMC iterations and 4 chains. All 4 chains were observed to have converged, as shown in the trace plots of Figures 13-16 from Appendix C. In this section, we compare the results between the regional and provincial-wide models, using data from March 1 to December 31 of 2020. In particular, our objective is to compare the differences in COVID-19 prevalence predicted by the regional and provincial-wide models. We now discuss the results of this comparison depicted in Figures 2 through 6 and in Table 6 . Figure 2 displays the estimated densities for R 0b for the hierarchical regional model in blue and the provincial-wide model in red. There is a relatively small difference between the posterior means of these densities, which are 2.98 and 2.95 for the hierarchical regional model and the provincial-wide model, respectively. The posterior density for R 0b in the hierarchical regional model is more concentrated around its mean, which is sensible since data from all five regions help inform its estimate. Notably, while the prior for R 0b had its mode at 2.6 for both models, the modes of the corresponding posteriors have shifted to the 2.9-3.0 range. This similarity in R 0b suggests that differences in the computed R 0 between the models can be primarily attributed to the f parameters. Figure 2 : R 0b comparison of hierarchical regional model (blue) vs provincial-wide model (red). Figure 3 showcase the differences between the regional f i 2 , . . . , f i 7 and the provincial-wide f 2 , . . . , f 7 , where we recall that f i j represents the relative reduction of contacts in region i during phase j. We observe from Figure 3 that within the urban regions of Fraser and Coastal, the differences in regional and provincial-wide f i are minimal. Conversely, there are significant variations between the regional and provincial-wide f i in the more rural regions of Island, Interior and Northern. The MCMC samples for the hierarchical regional R 0b and the provincial-wide R 0b (Figure 2) , together with the regional f i j and the provincial-wide f j . . , f 7 by regions (blue) and provincial-wide (red). Red and blue densities with little overlap indicate that the provincial-wide estimates are substantively different from those of the regional model. (Figure 3 ), were used in equation (4) to compute Table 6 . These are the posterior means and 95% credible intervals of R i 0 , along with the regional weighted average R 0 and the provincialwide R 0 , for each period associated with the f i (t) phases (Table 2) . Non-overlapping credible intervals indicate that there is a high probability of different R i 0 values for that phase. Such differences can be seen when comparing the provincial-wide and regional models (e.g., Interior and Northern differ from BC-wide R 0 for the f 7 phase), and also when comparing different regions (e.g., Interior and Island have different R 0 's for the f 7 phase) within the regional model. There is a reasonable agreement between the regional weighted average R 0 and the provincialwide R 0 , except for possibly phase f 3 . Given the proportionality of R i 0 to f i (t) in equation (9), the substantial increase in f 3 results in a large increase in the regionalized basic reproduction number, R i 0 , within the Interior region. This increase is reflected in Table 6 , resulting in a statistically significant difference from the provincially estimated R 0 . This increase can be attributed to the large increase in cases occurring in the Interior region in early July. We provide further discussion of the results related to the Interior region at the end of this section. Region Table 6 : For each period corresponding to the f 2 , . . . , f 7 phases (columns), estimates (posterior means) of the regionalized basic reproduction number R i 0 computed from the MCMC samples of f i j and R 0b for the regional model. The corresponding weighted average of the R i 0 for the province based on the regional populations is also shown. The final row lists the estimates of the basic reproduction number R 0 based on the MCMC samples from the provincial-wide model. The 95% credible intervals for each quantity are shown in parentheses. Figure 4 highlights the discrepancies between the regional and provincial-wide model prevalence of COVID-19 between March 1 and December 31 of 2020. First, we see vast differences in prevalence in the Interior region, Island region and Northern region. Within the Island region for example, the provincial-wide model grossly overestimates the number of cases from August through December 31. Whereas in the Interior and Northern regions, the provincial-wide model predicts decline through late November and December, when in fact the prevalence in these two regions were increasing. Recalling that the R 0 values were significantly different between the provincial-wide and regional models during the f 7 phase, significant increase in prevalence were predicted in the Northern and Interior regions. To verify that the regional model accurately fits the case data, we next look at Figure 5 . In Figure 5 , we observe that the regional model is in good agreement with the reported cases, with the vast majority of reported case counts lying within the 90% posterior probability bands. Again, the Interior outbreak in early July 2020 is a notable exception. Overall, these plots further support that the regional model provides a more accurate description of the modelled prevalence as illustrated in Figure 4 . Figure 4 : Comparison of modelled prevalence by regions (blue) and provincial-wide (red) scaled by respective regional population ratios. The dotted lines indicate start of reopening phases of f i 2 , . . . , f i 7 , as detailed in Table 2 . Here, the solid lines indicate the respective mean and shaded regions indicate respective ranges with 50% and 90% posterior probability. Finally, Figure 6 plots the estimated densities of ψ 1 , . . . , ψ 4 , namely the proportion of anticipated cases that have been tested and reported, shown in blue for each region. Again for comparison, the corresponding density estimates for these parameters in the provincialwide model are shown in red. There are some apparent differences between the regional and provincial-wide estimates for all four periods, and most noticeably for ψ 1 and ψ 2 . As these are the early weeks of the pandemic, this indicates that there may be uneven testing availability in the province, resulting in posteriors for ψ 1 and ψ 2 that vary considerably from the priors assigned to them. Furthermore, the assumed initial conditions for the model (which were simply scaled regionally due to low initial case counts) may contribute to these differences. Once testing became widely available (April 21 onwards, represented by ψ 4 ), it appears that there is considerable uncertainty: while ψ 4 has increased relative to ψ 3 , its plausible range of values in the posterior is quite wide, between 0.3-0.9 for most regions and also in the provincial-wide model. Nonetheless, the ψ 4 posteriors are not purely reflecting the prior chosen (which had a mode at 0.4 from Section 2.2) and regional differences are still apparent; e.g., Northern has most of its ψ 4 posterior probability density from 0.1-0.7, while the corresponding range is 0.5-0.9 for Coastal. Figure 6 : Comparison of estimated posterior probability densities for the proportion of anticipated cases that have been tested and reported, ψ 1 , . . . , ψ 4 by regions (blue) and provincial-wide (red). Red and blue densities with little overlap indicate that the provincial-wide estimates are substantively different from those of the regional model. Combining the results showcased in Figures 3, 4 , 5 and Table 6 , we see that the provincialwide model is unable to adequately capture the modelled prevalence throughout all regions. In particular, since data from all regions were grouped together, smaller outbreaks, such as the July outbreak in the Interior region, did not make a significant impact on the estimated provincial R 0 . Specifically, the estimated mean of the provincial R 0 during the f 3 phase was 1.26, while in the Interior region, the regional model yielded an estimated R i 0 mean of 2.21, with nonoverlapping 95% credible intervals. Additionally, Table 6 indicates several other statistically significant differences between the provincial and regional models, for instance during the f 5 phase for the Northern and Island regions. While the hierarchical regional model accurately estimated an increase in cases for the Interior region through the end of June into July of 2020, it failed to fit the isolated increase and decrease in cases appearing during the f 4 phase, as seen in Figure 5 . Since R i 0 is constant during each phase, as shown in equation (9), our proposed modelling framework is unable to predict an increase and decrease in cases within a single phase. One explanation why the outbreak was immediately followed by a rapid decrease in cases was due to interventions introduced in mid-July of 2020 affecting restaurants, bars, nightclubs, rental properties and house boats, as reported by Ghoussoub M. (2020) . If this additional intervention had been counted as a provincial lockdown phase, our hierarchical regional model may have been able to better estimate the increase and decrease in cases during this short period. Overall, the results presented in this section highlight that using strictly provincial modelling to estimate prevalence may miss localized trends within regions. This is illustrated by Figure 4 during the f 7 phase in December of 2020. The provincial-wide model suggests a significant decline in cases, while the regional model suggests a drastic increase in cases within the Northern and Interior regions. Therefore, if only provincial-wide modelling is utilized, loosening of provincial restrictions may be recommended. However, as shown by the regional model, such a recommendation could be catastrophic within the Interior and Northern regions. For this comparison, the two models were fitted using data from March 1 to November 7 of 2020 and subsequently used to predict the prevalence of COVID-19 cases in the time period between November 8 and November 30 of 2020. The resulting plots are shown in Figure 7 . In many ways, these plots are the most impactful of our results, as we now explain. The epidemiological model described by (1) and (2), with parameters estimated via the statistical methodology presented in Section 2.2, yield our best estimates of how the pandemic is evolving at a given time, along with credible intervals quantifying the associated uncertainty with those estimates. We verified that our modeling approach allows us to fit the regional data well, as discussed in the previous subsection. The final step is to illustrate the utility of our model in making predictions. The predicted time period is shown in the light grey portions of Figure 7 . Focusing our attention to the rural regions (Interior, Island and Northern) in Figure 7 , we see an overestimation of cases predicted by the provincial-wide model. Recall from Section 2.3, the provincial-wide model groups together all cases within the province, predicts the provincewide prevalence, and then scales the cases to the individual regions based on their population ratio. Since the cases per capita in the urban regions (Fraser and Coastal) are the highest in the province, scaling the provincial results to the individual regions leads to an overestimation and an underestimation in the rural and urban regions, respectively. From the results of Figure 7 , the provincial-wide model suggests a drastic increase in cases throughout the province if additional mitigation measures are not implemented. In contrast, the regional model provides different prevalence predictions in certain rural regions. Specifically, we see no overlap with the 90% posterior band for the Island region and minor overlap with the 90% posterior band for the Interior region. Overall, both the provincial-wide and regional models suggest that all regions should introduce additional mitigation measures. However, the regional model suggests further mitigation efforts should take place in the urban regions 4 . These efforts could include an increase in testing, additional contact tracing and travel restrictions in the Fraser and Coastal regions to prevent the further spread of COVID-19. As discussed in the introduction section of the paper, migration between regions has been shown to be a significant contributor to the spread of COVID-19. Therefore, in order to protect the more rural regions from further increase in cases, a reduction in migration from Coastal and Fraser regions would be advisable. In the absence of such travel restrictions, the rural regions are vulnerable to a sudden increase in cases due to an influx of infectious individuals from the Coastal or Fraser regions. Moreover, we observe from Figure 4 that the rise in cases throughout November was first observed in the Fraser and Coastal regions. This is then followed by increases in the Northern region and later in the Interior and Island Regions. Hence, through regionalized predictive modelling, specific region mitigation measures can made to help prevent further spread of COVID-19. In this paper, we adapted the provincial-wide model of Anderson et al. (2020) to a regional model and studied their differences. In contrast to Anderson et al. (2020) 's provincial-wide model, we introduced an additional hierarchical structure to facilitate regional modelling, with specific regional parameters. Our results showed that the proposed hierarchical model can effectively fit the regional case data and highlighted important differences in both prevalence estimates and prediction versus the provincial-wide model. For example, when comparing the provincial and regional models, statistically significant differences in the R i 0 estimates were observed in rural regions during certain lockdown phases. Our results also indicate that the regional model was able to detect smaller trends overlooked by the provincial-wide model, such as the rise in cases within the Interior region throughout late June and early July of 2020. Furthermore, the provincial-wide model was shown to overestimate prevalence in certain rural regions and underestimate prevalence in the urban regions of BC. This suggests that regionalized models can provide valuable insight potentially overlooked by using a provincial-wide model and help guide future interventions to reduce the spread of COVID-19 in BC. The value of regional modelling of COVID-19 has also been demonstrated in other recent work, such as Karatayev et al. (2020) and Bakhta et al. (2021) . Moreover, the application of our proposed regional model is not restricted to only British Columbia. Upon modifying parameters to specific regions, our proposed regional model can be applied to other parts of Canada or other countries, to enhance regional predictions of COVID-19 prevalence. A public Github repository with the R code is provided to facilitate such usage, as given in the Code Availability statement at the end of the paper. Nonetheless, our proposed model has certain limitations. The ODE systems do not account explicitly for migration between regions. Such migration could have played a key role especially during the initial spread of COVID-19 to the different regions of the province. Also, we did not attempt to incorporate time-varying or region-specific parameters u d and u r . For example, u d and u r may have changed as the pandemic progressed due to lockdown fatigue. We also note that data on the number of tests performed and positivity rates have not been used in our model. Intuitively, such data may have some relationship to ψ i , but their actual role is difficult to determine without random testing. Overall, access to additional data could greatly improve the accuracy of the proposed model. For example, the parameter f i (t) currently accounts for all the mitigation measures that limit the spread of COVID-19, such as physical distancing, masks, hand washing and contact tracing. Although these mitigation measures remained fairly constant during our period of study, they still had some influence on the resulting estimates for f i (t). Therefore, access to specific data on the other mitigation measures could allow us to better determine the degree to which the parameter f i (t) can be attributed to physical distancing. As discussed in Section 3.1, our model only accounts for the official lockdown phases imposed provincially between March 1 and December 31 of 2020. However, other interventions, such as changes in protocols for bars and restaurants, may have impacted the provincial and regional case counts during these months. Therefore, our regional model could be improved by accounting for all such interventions. Finally, it is important to note that not all outbreaks are a result of an increase in contacts. Certain outbreaks, known as superspreaders, emerge from parties or gatherings, where significantly more infections occur than anticipated. Recent models account for superspreaders as highly infectious individuals as discussed in Karatayev et al. (2020) , or individuals with a high number of contacts as studied by . Additionally, superspreader events have been studied using an SEIR model in conjunction with cellular network data by Chang et al. (2021) . In contrast, our ODE model is unable to account for superspreader events, or superspreading individuals, and therefore may misrepresent outbreaks resulting from superspreaders as a general increase in contacts. Furthermore, upon finishing our work, we have noticed that the BC Centre for Disease Control has started reporting regional predictions in the weekly report of BC Centre for Disease Control (2021c) as of the week of April 14 of 2021, and as well as in their recent annual report of BC Centre for Disease Control (2021d). Thus, another extension to this work would be to modify our hierarchical Bayesian epidemiological model to incorporate vaccinated individuals and additional modifications to account for more infectious variants of COVID-19. With additional change points to account for recent restrictions in 2021, the modified model would allow predictions to be made with the current 2021 data and be compared with the regional predictions from the BC Centre for Disease Control. The R code and programs that support the results of this analysis are publicly available in the Github repository at https://github.com/wongswk/hierarchical-regional-covid. Figure 10 : Histograms of MCMC samples for f 6 , f 7 , φ and hierarchical regional R 0b in the simulation study. The true value of each parameter is shown with the blue vertical line, and the bounds of the central 90% credible interval for each parameter is shown with the red vertical lines. Appendix B Non-Hierarchical versus Hierarchical regional model fits comparison Figure 11 : For models fitted to individual regions only, the estimated posterior density of R 0b in each region is shown in blue, with the corresponding posterior density of R 0b from the hierarchical regional model shown in red. Figure 15 : Trace plots of runs for the regional f 6 , f 7 , φ and hierarchical regional R 0b . Quantifying the impact of COVID-19 control measures using a Bayesian model of physical distancing Elimination could be the optimal response strategy for COVID-19 and other emerging pandemic diseases Epidemiological Forecasting with Model Reduction of Compartmental Models. Application to the COVID-19 Pandemic BC Centre for Disease Control: COVID-19 Variants BC COVID-19 Data British Columbia (BC) COVID-19 Situation Report Week COVID-19: One Year of the Pandemic in BC Stan: A probabilistic programming language Mobility network models of COVID-19 explain inequities and inform reopening Differences in Preventive Behaviors of COVID-19 between Urban and Rural Residents: Lessons Learned from A Cross-Sectional Study in China A Time-Dependent SIR Model for COVID-19 With Undetectable Infected Persons Complexity of the Basic Reproduction Number (R0) On the definition and the computation of the basic reproduction ratio R 0 in models for infectious diseases in heterogeneous populations Socioeconomic determinants of health and access to health care in rural Canada Social determinants of health and health inequalities Population migration, spread of COVID-19, and epidemic prevention and control: empirical evidence from China Health Authority Profile Local lockdowns outperform global lockdown on the far side of the COVID-19 epidemic curve CityNews: COVID-19 variant arrives in B 2019 Novel Coronavirus Repository CBCNews: 64-year-old residential care aide is 1st person in B.C. to receive COVID-19 vaccine Relaxation of social distancing restrictions: Model estimated impact on COVID-19 epidemic in Manitoba Coronavirus and migration: analysis of human mobility and the spread of Covid-19 Prior Choice Recommendations Few Canadians had antibodies against SARS-CoV-2 in early 2021 Modeling COVID-19 Latent Prevalence to Assess a Public Health Intervention at a State and Regional Scale: Retrospective Cohort Study Reproduction numbers and sub-threshold endemic equilibria for compartmental models of disease transmission General Model for COVID-19 Spreading With Consideration of Intercity Migration, Insufficient Testing, and Active Intervention: Modeling Study of Pandemic Progression in Japan and the United States The authors would like to thank the anonymous reviewers for their helpful comments and valuable suggestions at improving this paper. The authors have no competing interests declared.