key: cord-0838321-tdfmmwgp authors: Jung, Sung-mok; Endo, Akira; Akhmetzhanov, Andrei R.; Nishiura, Hiroshi title: Predicting the effective reproduction number of COVID-19: Inference using human mobility, temperature, and risk awareness date: 2021-10-08 journal: Int J Infect Dis DOI: 10.1016/j.ijid.2021.10.007 sha: 2002e5f2367f95ee5469a83d6250a4368a0f6cf3 doc_id: 838321 cord_uid: tdfmmwgp

Objectives: The effective reproduction number ($R_t$) is critical for assessing the effectiveness of countermeasures during the coronavirus disease 2019 (COVID-19) pandemic. Conventional methods using reported incidence cannot provide timely estimates of $R_t$ because of the delay from infection to reporting. Here, we aim to develop a framework to predict $R_t$ in real time using timely accessible data, i.e., human mobility, temperature, and risk awareness. Methods: A linear regression model to predict $R_t$ was designed and embedded in the renewal process. Four prefectures of Japan with high incidence in the first wave were selected for model fitting and validation. Predictive performance was assessed by comparing the observed and predicted incidence using cross-validation, and by testing on a separate dataset from two other prefectures with geographical settings distinct from those of the four prefectures. Results: The predicted mean values of $R_t$ and their 95% uncertainty intervals traced the overall trend of incidence well, while predictive performance diminished when $R_t$ changed abruptly, potentially due to superspreading events, and when stringent countermeasures were implemented. Conclusions: The described model can potentially be used for monitoring the transmission dynamics of COVID-19 ahead of the formal estimates subject to delay, providing essential information for timely planning and assessment of countermeasures.
The first confirmed case of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection was reported in Japan on 15 January 2020, and since then the transmission of the disease it causes, coronavirus disease 2019 (COVID-19), has continuously affected the entire country. As the incidence of COVID-19 began to surge, the national government declared a state of emergency on 16 April, requesting a voluntary reduction of physical contact, which likely helped to suppress the epidemic (Jung et al., 2021). However, the resumption of socioeconomic activities in late May led to a resurgence of cases. Although a temporary decline in incidence was observed, the country has experienced a third wave since late October. In response, prefectural governments with a large number of cases have requested bars and restaurants to curtail their operating hours from late November (Ministry of Health Labour and Welfare, 2020a). Despite these measures, the spread of disease continued, leading the national government to declare a second state of emergency on 7 January 2021, asking citizens to refrain from non-essential outings (Cabinet Relations Office, 2021).

As part of the evaluation of countermeasures, the effective reproduction number ($R_t$), defined as the expected number of secondary cases arising from a single primary case at calendar time $t$, is widely used for monitoring the trend in community transmission (Nishiura et al., 2009). However, timely and accurate estimation of $R_t$ using incidence data remains challenging. First, to precisely link the timing of a control measure to the resulting changes in the transmission trend, it is vital to estimate $R_t$ as a function of the infection time (Gostic et al., 2020). Since infection times are rarely observed in practice, they need to be estimated by accounting for the empirical delay distributions. Moreover, the estimation of $R_t$ is further complicated by right truncation with respect to the time interval from infection to reporting.
In real-time practice, the number of recent infections is underestimated because some cases have already been infected but are not yet reported. Given the empirically observed delay of around 9 days from infection to reporting in Japan, the $R_t$ estimated from the reported incidence is likely to be biased, at least within 9 days of the latest reporting date.

Considering that SARS-CoV-2 transmission is facilitated by human-to-human contact, digital proxies of human mobility patterns can provide an important avenue for inferring the directly unobservable transmission patterns as a function of time. Indeed, various datasets of mobility patterns have become widely available during the ongoing COVID-19 pandemic, and such datasets have been used to monitor time-dependent patterns of physical distancing (Buckee et al., 2020; Kishore et al., 2020; Leung et al., 2021; Nouvellet et al., 2021; Xiong et al., 2020). In addition, published studies have reported that temperature is inversely associated with COVID-19 transmission (Pequeno et al., 2020; Qi et al., 2020; Smith et al., 2021; Ujiie et al., 2020; Wang et al., 2021). These data are often more readily accessible than case counts (which are typically subject to delays of ~9 days) and thus may enable near real-time assessment of interventions if they can be used to predict $R_t$. Furthermore, quantified risk awareness can also help to predict $R_t$ more accurately, because the induced adherence to personal protective behaviors (e.g., wearing a mask or washing hands) may reduce virus transmission (West et al., 2020). Accumulated evidence suggests that human mobility, coupled with temperature and risk awareness, reflects the contact pattern as a function of time well, and thus an integrative model could provide an opportunity for the timely prediction of $R_t$ during the ongoing COVID-19 pandemic.
Here, we aim to develop a simple statistical framework to predict a proxy of $R_t$ using key driving factors of COVID-19 transmission, which can be used before a formal estimate relying on reported case counts becomes available.

period") and cross-validation ("validation period"), respectively, and 22,379 cases were included in the test data.

unknown illness onset dates. Thus, to estimate $R_t$ as a function of the infection time, the back-projection was conducted in the following two steps. First, the missing illness onset dates of those cases were back-projected from the date of laboratory confirmation using a right-truncated distribution of the time interval from illness onset to laboratory confirmation in each prefecture (Jung et al., 2021). Second, we back-projected the infection times of all reported COVID-19 cases from either the observed or the back-projected illness onset date, using the incubation period distribution (Linton et al., 2021). The R package "surveillance" (Höhle, 2007) was used for nonparametric back-projection.

Human mobility patterns and temperature were hypothesized to be driving factors of COVID-19 transmission, and the corresponding datasets for the abovementioned time period were collected. Google community mobility reports (hereafter Google mobility) were used to capture human mobility patterns (Google, 2021). Google mobility data provide six categories (i.e., "retail and recreation", "grocery and pharmacy", "parks", "transit stations", "workplaces", and "residential") of changes in human mobility, relative to the average on the same day of the week in the pre-pandemic period (i.e., 3 January-6 February 2020). We used the mobility pattern related to "retail and recreation" in our analyses, based on our domain knowledge that this category likely represents mobility in close-contact settings associated with COVID-19 transmission (Cazelles et al., 2021).
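The back-projection steps described above can be illustrated with a minimal sketch. It assumes a plain EM (Richardson-Lucy) deconvolution without the smoothing step that nonparametric back-projection routines typically add, so it is not the procedure of the "surveillance" package itself. The function name `back_project` and its arguments are hypothetical.

```python
import numpy as np

def back_project(reported, delay_pmf, n_iter=200):
    """EM (Richardson-Lucy) deconvolution of daily counts by a delay distribution.

    reported[s]  : events observed on day s (s = 0, ..., T-1)
    delay_pmf[d] : probability that the delay from the earlier event
                   (e.g., infection) to the observed one is exactly d days
    Returns the estimated number of earlier events per day, inflated for
    events not yet observed by day T-1 (right truncation).
    """
    y = np.asarray(reported, dtype=float)
    f = np.asarray(delay_pmf, dtype=float)
    T = len(y)

    # K[s, t] = P(observed on day s | earlier event on day t)
    K = np.zeros((T, T))
    for t in range(T):
        for s in range(t, min(T, t + len(f))):
            K[s, t] = f[s - t]

    # Probability that an event on day t is observed inside the data window
    observed_prob = K.sum(axis=0)

    lam = np.full(T, max(y.sum() / T, 1e-9))  # flat, strictly positive start
    for _ in range(n_iter):
        mu = K @ lam  # expected observations per day under the current estimate
        ratio = np.divide(y, mu, out=np.zeros_like(y), where=mu > 0)
        lam = lam * (K.T @ ratio) / np.maximum(observed_prob, 1e-12)
    return lam
```

Under these assumptions, the two steps would chain two calls: first from confirmation dates to onset dates with the onset-to-confirmation delay, then from onset dates to infection dates with the incubation period distribution.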
Daily temperature data were retrieved from the Japan Meteorological Agency (Japan Meteorological Agency, 2021) by selecting one representative observatory near the prefectural government office of each prefecture. To extract the overall trend of these time series without the influence of daily noise, both datasets were smoothed by taking a 7-day moving average.

A prediction model for $R_t$, integrating human mobility, temperature, and risk awareness, was designed and evaluated in the following steps. First, three candidate regression models were fitted to the reconstructed COVID-19 incidence during the training period via the renewal process. We then compared the performance of these candidate models by cross-validation against the $R_t$ estimated from data in the validation period and selected the best model. Lastly, the predictive performance of the best-ranked model was evaluated using the separate test data, by comparing the estimated $R_t$ with the values predicted by the trained model, to determine the applicability of the model to other geographic settings.

We propose three simple regression models that incorporate different combinations of explanatory factors for COVID-19 transmission: only Google mobility (Model 1), Google mobility and temperature (Model 2), and Google mobility, temperature, and risk awareness of COVID-19 (Model 3). In each model, the corresponding variables were included in a log-linear regression form. The $R_t$ in the model with all three factors (Model 3) is formulated as Equation (1), which includes a baseline effective reproduction number for each prefecture $i$. We assume that the baseline values are similar across all four prefectures of the training data and follow a Gamma distribution with a common mean and a fixed coefficient of variation (CV) of 0.5 (Park et al., 2020). We fixed the value of the CV because of the small number of prefectures; varying the CV in the range of 0.25-1 yielded similar results.
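Equation (1) itself did not survive in this text. A form consistent with the description above (log-linear in the covariates, with a prefecture-specific baseline and a capped risk-awareness term) would be $R_{t,i} = R_{0,i}\exp\left(\alpha M_i(t) + \beta T_i(t) + \gamma \min(A_i(t), A_{\max,i})\right)$. All symbols here are placeholders chosen for illustration, not the authors' notation. The sketch below implements this assumed form. The function name, argument names, baseline value, and mobility coefficient are hypothetical; the temperature and risk-awareness coefficients and the Tokyo cap are the values reported in this article.

```python
import numpy as np

def predict_rt(r0, mobility, temperature, new_cases, case_cap,
               coef_mobility, coef_temperature, coef_awareness):
    """Log-linear sketch of the Model 3 description (assumed form, see lead-in).

    r0            : baseline effective reproduction number of the prefecture
    mobility      : smoothed Google mobility (change from baseline)
    temperature   : smoothed temperature in degrees Celsius
    new_cases     : smoothed daily number of newly reported cases
    case_cap      : daily case count at which the awareness effect saturates
    coef_awareness is expressed per 100 reported cases.
    """
    mobility = np.asarray(mobility, dtype=float)
    temperature = np.asarray(temperature, dtype=float)
    new_cases = np.asarray(new_cases, dtype=float)

    # Risk awareness grows linearly with reported cases up to the cap
    awareness = np.minimum(new_cases, case_cap) / 100.0

    log_rt = (np.log(r0)
              + coef_mobility * mobility
              + coef_temperature * temperature
              + coef_awareness * awareness)
    return np.exp(log_rt)

# Illustrative call: r0 and coef_mobility are made-up values
rt = predict_rt(r0=1.5, mobility=[-20.0, -5.0], temperature=[10.0, 25.0],
                new_cases=[100.0, 600.0], case_cap=497.0,
                coef_mobility=0.01, coef_temperature=-0.02, coef_awareness=-0.12)
```

Setting `coef_temperature` and `coef_awareness` to zero gives the structure of Model 1, and setting only `coef_awareness` to zero gives that of Model 2.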
The mobility and temperature covariates are the smoothed values of Google mobility and temperature at calendar time $t$ in prefecture $i$. The degree of risk awareness at calendar time $t$ in prefecture $i$ was graded by assuming that it is linearly associated with the smoothed number of newly reported COVID-19 cases, following the positive association between confirmed cases and risk perception reported in a longitudinal study in the UK (Schneider et al., 2021). The effect of this variable was capped at a predefined upper limit, corresponding to the governmental definition of the "highest alert level" incidence in Japan (i.e., 25 confirmed cases per 100,000 population in a week) (Cabinet Relations Office, 2020). According to this definition, the daily number of cases giving the upper limit for each prefecture was specified as 497 in Tokyo, 315 in Osaka, 270 in Aichi, 188 in Hokkaido, 182 in Fukuoka, and 52 in Okinawa. The $R_t$ in Models 1 and 2 was specified by fixing at 0 the coefficients of the variables in Equation (1) that were not included.

We constructed a likelihood function for the proposed regression models of $R_t$ (Models 1-3) based on the renewal process, and estimated the corresponding parameters of each regression model by fitting it to the COVID-19 incidence during the training period. The expected number of daily reported domestic cases at calendar time $t$ in a given prefecture $i$ was calculated using Equation (2), whose inputs are the total (imported + domestic) daily number of COVID-19 cases reported at time $t$ in prefecture $i$ and the probability mass function (PMF) of the generation time (Nishiura et al., 2020). To account for the right truncation, the cumulative mass function of the time delay from infection to report was calculated by convolving the PMF of the incubation period with that of the time interval from illness onset to reporting in prefecture $i$ (see May 2020) and a part of the second wave (15 July-31 August 2020).
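The renewal-process calculation described above can be sketched as follows. Equation (2) is not reproduced in this text, so the code assumes the standard renewal form (expected cases equal $R_t$ times past cases weighted by the generation-time PMF) and assumes that right truncation enters as a multiplicative factor equal to the probability of having been reported by the latest date. Function and argument names are hypothetical.

```python
import numpy as np

def delay_cdf_from_pmfs(incubation_pmf, onset_to_report_pmf):
    """Cumulative mass function of the infection-to-report delay.

    Both inputs are PMFs indexed from a delay of 0 days; the delay from
    infection to report is the sum of the two, hence the convolution.
    """
    pmf = np.convolve(incubation_pmf, onset_to_report_pmf)
    return np.cumsum(pmf)

def expected_domestic_cases(rt, total_cases, gen_pmf, report_cdf=None):
    """Expected daily domestic cases under an assumed renewal equation.

    rt[t]          : effective reproduction number on day t
    total_cases[t] : total (imported + domestic) cases on day t
    gen_pmf[k-1]   : P(generation time = k days), k = 1, 2, ...
    report_cdf[d]  : P(infection-to-report delay <= d days), optional
    """
    rt = np.asarray(rt, dtype=float)
    cases = np.asarray(total_cases, dtype=float)
    T = len(cases)

    expected = np.zeros(T)
    for t in range(T):
        for k, g_k in enumerate(gen_pmf, start=1):
            if t - k >= 0:
                expected[t] += cases[t - k] * g_k
        expected[t] *= rt[t]

    if report_cdf is not None:
        cdf = np.asarray(report_cdf, dtype=float)
        # Days elapsed between each infection date and the latest date
        elapsed = (T - 1) - np.arange(T)
        # Beyond the tabulated range the last CDF value is reused
        expected *= cdf[np.minimum(elapsed, len(cdf) - 1)]
    return expected
```

In a fitting routine of this kind, `rt` would come from the regression model and the result would be compared with the observed counts through a likelihood.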
The intra-wave period from 1 May to 15 July 2020 was excluded because of the low case counts observed during that period. For parameter estimation, a Poisson likelihood was used, with a parameter set that includes all or part of the parameters specific to each of the three proposed models (i.e., the prefecture-specific baselines, their common mean, and the regression coefficients of the three covariates). The maximum likelihood method was employed, and 95% confidence intervals (CIs) of each parameter were derived from 10,000 samples from a Laplace-approximate normal distribution.

To select the best model among the proposed models, cross-validation was conducted by comparing the predicted and the estimated "ground-truth" $R_t$ during the validation period (from 1 September 2020 to 31 January 2021) in the four prefectures. Each regression model produced predictive $R_t$ values for the validation period using the explanatory variables, based on the estimated parameters in Method 2-2-1. We used the $R_t$ values estimated from incidence data via the renewal process (Equation (2)) as the ground truth. $R_t$ was estimated as a free time-dependent parameter from the COVID-19 incidence during the validation period through the corresponding likelihood function. The 95% CIs were derived using the profile likelihood method. Considering the right truncation in the recently reported incidence, the $R_t$ estimated for the latest 15 days (1-15 February 2021) was excluded from the cross-validation. Furthermore, to smooth out abrupt fluctuations (e.g., superspreading events) in the estimated $R_t$ values, a 7-day moving average was taken. Then, the predictive performance of each model (i.e., the comparison between the predicted and estimated $R_t$ during the validation period) was quantitatively assessed using four measures: bias, root-mean-square error (RMSE), ranked probability score, and Dawid-Sebastiani score (Funk et al., 2019), and the best model was selected.
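As an illustration of the quantities named above, the following sketch computes a Poisson log-likelihood, the RMSE, and the Dawid-Sebastiani score in its usual mean and standard deviation form. It assumes that predictions are summarized by a mean and a standard deviation. The bias and ranked probability score of Funk et al. (2019) are distribution-based and are not reproduced here. All function names are hypothetical.

```python
from math import lgamma

import numpy as np

def poisson_log_likelihood(observed, expected):
    """Sum over days of log Poisson(observed | expected)."""
    y = np.asarray(observed, dtype=float)
    mu = np.asarray(expected, dtype=float)
    if np.any(mu <= 0):
        raise ValueError("expected counts must be strictly positive")
    log_factorial = np.array([lgamma(v + 1.0) for v in y])
    return float(np.sum(y * np.log(mu) - mu - log_factorial))

def rmse(predicted, observed):
    """Root-mean-square error between two series of equal length."""
    p = np.asarray(predicted, dtype=float)
    o = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((p - o) ** 2)))

def dawid_sebastiani(observed, pred_mean, pred_sd):
    """Mean Dawid-Sebastiani score: ((y - mu) / sigma)^2 + 2 log(sigma).

    Lower values indicate better predictions; the score penalizes both
    large errors and overly wide or overly narrow predictive spread.
    """
    y = np.asarray(observed, dtype=float)
    mu = np.asarray(pred_mean, dtype=float)
    sd = np.asarray(pred_sd, dtype=float)
    return float(np.mean(((y - mu) / sd) ** 2 + 2.0 * np.log(sd)))
```

Because the Dawid-Sebastiani score depends on the predictive spread while the RMSE depends only on point predictions, the two measures can rank models differently.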
In addition, the number of predicted infections using the conditional forecasting method (i.e., forecasting future incidence based on the predicted $R_t$ and the empirically reported incidence in the past) was also compared against the back-projected incidence by infection date.

Lastly, the predictive performance of the finally selected model was evaluated using the test data, indicating the potential applicability of the proposed model to other geographical settings. Accordingly, the $R_t$ values in the two prefectures of the test data (Fukuoka and Okinawa) were predicted from 15 July 2020 through 31 January 2021, relying only on the trained model, and were compared against the $R_t$ estimated from the renewal process. In addition, the predicted number of infections using the conditional forecasting method was also compared with the empirical data in each of the two prefectures.

A substantial reduction in Google-based mobility was observed in all regions during the first and second states of emergency, consistent with the small numbers of reported cases by the end of each declaration. Human mobility tended to increase abruptly on consecutive national holidays, and accordingly the number of reported cases increased slightly after roughly 9 days, consistent with the empirical time delay from infection to reporting in Japan. Table 1 shows the parameters estimated from the data in the training period and summarizes the predictive performance of the proposed models during the validation period. Among the three models, the model that incorporated Google mobility, temperature, and risk awareness (Model 3) was selected as the best model based on the Dawid-Sebastiani score, while the model that accounted for only Google mobility (Model 1) showed the best performance in RMSE, bias, and ranked probability score.
The predicted time trend of $R_t$ using Model 1 (an almost linear trend near the value of one; Figure S1) was less informative than that of Model 3. In Model 3, temperature and risk awareness were negatively associated with $R_t$, with estimated coefficients of -0.02 per degree Celsius (95% CI: -0.02, -0.01) and -0.12 per 100 reported cases (95% CI: -0.15, -0.10), respectively. All coefficients were statistically significant.

The present study proposed a simple regression model for predicting the real-time $R_t$ of COVID-19, accounting for human mobility, temperature, and risk awareness. Our analysis suggested that human mobility was positively associated with COVID-19 transmission, while temperature and risk awareness were negatively associated with it. These findings indicate that a reduction in socioeconomic activities and a higher level of risk awareness may be linked to a reduction in transmission, highlighting the potential of social distancing interventions and risk communication for controlling the COVID-19 epidemic (Anderson et al., 2020; Heydari et al., 2021). The inverse association between temperature and COVID-19 transmission was also in line with other published papers (Ma et al., 2021; Smith et al., 2021). This finding could be explained by two possible mechanisms. First, cold temperature induces behavioral changes and increases indoor contact, which is associated with the transmission risk of SARS-CoV-2 (McClymont and Hu, 2021). Second, the virus survives longer in cold environments (Riddell et al., 2020), as was the case for other human coronaviruses (Chan et al., 2011; van Doremalen et al., 2013). Although the cumulative number of reported cases during the summer season was higher than that of the winter season in 2020 (Figure 1 incidence, showing a clear emerging signal ($R_t$ > 1) of the second and third waves in Japan.
Such performance of the proposed model suggests that our framework can provide a plausible proxy of the latest $R_t$ of COVID-19 using readily accessible data, which conventional methods relying on the reported incidence cannot provide in a timely manner because of the inherent delays. Timely assessment of $R_t$ is essential to inform public health policy aiming, e.g., to bring the epidemic under control before hospital and intensive care unit occupancy reaches full capacity. Although abrupt changes in $R_t$ values, presumably induced by temporary local surges of cases (e.g., clusters in hospitals and nursing homes), could not be fully captured by the proposed model, it was still able to provide a timely signal of changes in $R_t$ before the formal estimates became available.

Despite the overall good performance of the proposed model, our framework over- or underestimated $R_t$ when stringent interventions (i.e., reduced opening hours for restaurants and bars from November 2020) were in place. Although we believe that including the mobility patterns associated with retail and recreation to represent physical mixing in high-risk settings was a reasonable choice, such mobility data, with their limited temporal and spatial resolution, may not fully reflect detailed social contact patterns. The transmission of COVID-19 is suggested to involve substantial individual variation, characterized by a highly dispersed offspring distribution (Endo et al., 2020), and thus stringent control measures were imposed primarily on settings regarded as high-risk (e.g., nightclubs, bars, and restaurants). The resulting changes in detailed contact patterns in those places may not have been fully reflected in the simple summary data of human mobility.
Moreover, digital proxies of human mobility patterns have been suggested to be not very informative about changes in the density of individuals within high-risk places, although this metric may play a crucial role in SARS-CoV-2 transmission (Chang et al., 2021). These limitations of the mobility data may account for the temporary deviations in the prediction.

There are some additional limitations to be mentioned. First, the relationship between the number of COVID-19 cases and the degree of risk awareness may change over time in the long run. Indeed, a decrease in adherence to non-pharmaceutical interventions was reported in the United States from April to November 2020 (Crane et al., 2021), in spite of the continuously increasing number of reported COVID-19 cases. Second, the upper limit for the effect of risk awareness was chosen rather arbitrarily and is not necessarily theoretically justified. We assumed that risk awareness affects the transmission risk via personal behavioral changes that are not reflected in changes in mobility (e.g., wearing a mask or avoiding crowded places during outside visits). It is likely that there is a certain limit to the risk reduction achieved by such behavioral changes, which we have incorporated in our model as a prespecified cap. Third, with the roll-out of COVID-19 vaccines, the proposed model might become insufficient to predict $R_t$ because of the herd immunity effect conferred by vaccination. Lastly, more accurate prediction may require an extended model that accounts for age-stratified transmission dynamics (e.g., age-specific susceptibility), along with age-specific mobility patterns.
In conclusion, our study suggests that human mobility, temperature, and risk awareness can be integrated into the renewal process to predict the effective reproduction number in a timely manner during ongoing COVID-19 transmission, ahead of the formal empirical estimates subject to delays, which provides essential information for timely planning and assessment of epidemic control measures.

incidences using the conditional forecasting method. Blue lines and shaded areas indicate the estimated $R_t$ and its 95% confidence intervals using the renewal process and profile likelihood, while purple lines and shaded areas are the predicted $R_t$ and its 95% confidence interval from the best model and its estimated parameters.

How will country-based mitigation measures influence the course of the COVID-19 epidemic?
Aggregated mobility data could help fight COVID-19
Parallel trends in the transmission of SARS-CoV-2 and retail/recreation and public transport mobility during non-lockdown periods
The Effects of Temperature and Relative Humidity on the Viability of the SARS Coronavirus
Mobility network models of COVID-19 explain inequities and inform reopening
Change in Reported Adherence to Nonpharmaceutical Interventions During the COVID-19 Pandemic
Stability of Middle East respiratory syndrome coronavirus (MERS-CoV) under different environmental conditions
Estimating the overdispersion in COVID-19 transmission using outbreak sizes outside China
Assessing the performance of real-time epidemic forecasts: A case study of Ebola in the Western Area region of Sierra Leone
COVID-19 Google Community Mobility Reports
Practical considerations for measuring the effective reproductive number, Rt
The effect of risk communication on preventive and protective Behaviours during the COVID-19 outbreak: mediating role of risk perception
surveillance: An R package for the monitoring of infectious diseases
Projecting a second wave of COVID-19 in Japan with variable interventions in high-risk settings
Measuring mobility to monitor travel and physical distancing interventions: a common framework for mobile phone data analysis
Real-time tracking and prediction of COVID-19 infection using digital proxies of population mobility and mixing
Air pollution and temperature are associated with increased COVID-19 incidence: A time series study
Correlation between times to SARS-CoV-2 symptom onset and secondary transmission undermines epidemic control efforts
Role of meteorological factors in the transmission of SARS-CoV-2 in the United States
Weather Variability and COVID-19 Transmission: A Review of Recent Research
Ministry of Health Labour and Welfare
Ministry of Health Labour and Welfare. Requests for reducing operation hours from
Ministry of Health Labour and Welfare. Evaluation report for the latest COVID-19 infections 2020c
Early epidemiological assessment of the virulence of emerging infectious diseases: a case study of an influenza pandemic
Serial interval of novel coronavirus (COVID-19) infections
Reduction in mobility and COVID-19 transmission
Reconciling early-outbreak estimates of the basic reproductive number and its uncertainty: framework and applications to the novel coronavirus (SARS-CoV-2) outbreak
Air transportation, population density and temperature predict the spread of COVID-19 in Brazil
COVID-19 transmission in Mainland China is associated with temperature and humidity: A time-series analysis
The effect of temperature on persistence of SARS-CoV-2 on common surfaces
COVID-19 risk perception: a longitudinal analysis of its predictors and associations with health protective behaviours in the United Kingdom
Temperature and population density influence SARS-CoV-2 transmission in the absence of nonpharmaceutical interventions
Effect of temperature on the infectivity of COVID-19
Impact of temperature and relative humidity on the transmission of COVID-19: a modelling study in China and the United States
Applying principles of behaviour change to reduce SARS-CoV-2 transmission
Mobile device data reveal the dynamics in a positive relationship between human mobility and COVID-19 infections