key: cord-0556969-mb85lcix authors: Ozaki, Jun'ichi; Shida, Yohei; Takayasu, Hideki; Takayasu, Misako title: Daily-activity-dependency of effective reproduction number in COVID-19 pandemic: direct modelling from GPS data date: 2022-03-17 journal: nan DOI: nan sha: 29a5d5677752c594c75f04c3d711683dabd218a7 doc_id: 556969 cord_uid: mb85lcix During the COVID-19 pandemic, governments faced difficulties in implementing mobility restriction measures, as no clear quantitative relationship between human mobility and infection spread in large cities is known. We developed a model that enables quantitative estimations of the infection risk for individual places and activities by using smartphone GPS data for the Tokyo metropolitan area. The effective reproduction number is directly calculated from the number of infectious social contacts defined by the square of the population density at each location. The difference in the infection rate of daily activities is considered, where the `stay-out' activity, staying at someplace neither home nor workplace, is more than 28 times larger than other activities. Also, the contribution to the infection strongly depends on location. We imply that the effective reproduction number is sufficiently suppressed if the highest-risk locations or activities are restricted. We also discuss the effects of the Delta variant and vaccination. Since the beginning of the COVID-19 pandemic in 2019, there have been 452 million confirmed cases and over six million deaths globally as of 12 March 2022, posing serious healthcare challenges [1] [2] [3] . Most governments have struggled to control this disease and simultaneously minimize its damage to daily and economic activities owing to limited time and resources [4] [5] [6] [7] [8] . Along with vaccinations, nonpharmaceutical interventions are considered essential for managing this disease [9] [10] [11] [12] [13] . For example, the governments around the Tokyo metropolitan area in Japan declared states of emergency (SoEs) that limited daily and economic activities including schools, department stores, cinemas, restaurants, bars, and travel to reduce human mobility in public spaces 5, 9 . Consequently, the number of social contacts and, thereby, the effective reproduction number decreased. Governments require reliable quantitative estimations of the effect of such policies on the pandemic, that is, a predictable model, for making informed decisions. Studies have estimated the effectiveness of lockdowns or non-compulsory measures such as SoEs on reducing human mobility [9] [10] [11] [12] [13] . However, the effect of reduced human mobility on the pandemic remains unclear, as it is insufficient to investigate the number of social contacts alone. Infectious diseases, including the COVID-19 pandemic, have been studied using global positioning system (GPS) data [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] . However, with such diseases, the types of social contacts are considered more critical. Specifically, the infection rate is known to strongly depend on whether social contacts are wearing masks or talking and on the ventilation state of the rooms they are in [24] [25] [26] . The point is that the number of social contacts through each daily activity type should be investigated. In this study, we propose a model to predict the effective reproduction number of COVID-19 based on daily human activities inferred from smartphone GPS data. From these data, first, we categorize citizens' daily activity patterns into four types and estimate the population density for each activity, location, and time in the Tokyo metropolitan area. We also calculate the number of social contacts for each activity by assuming it to be proportional to the square sum of the population density. Second, we propose an activity-dependent infection model based on the susceptible-infectious-removed (SIR) model. This model seems to have a lower resolution than that of other compartmental models (e.g., susceptible-exposed-infectious-removed (SEIR) model). However, it is suitable for direct formulation using GPS data. The effective reproduction number is a linear combination of the number of social contacts, and the coefficients are the infection rate per contact. We determine the parameters to fit the empirical data before the spread of the Delta variant and verify that the model is sufficient to predict the effective reproduction number. Then, we compare the parameters to observe the activity that has the highest infection risk. We also calculate the effective reproduction number for each daily activity, location, and individual. The model prediction is valid for up to around two weeks after the human mobility data are obtained. Third, we investigate effects other than those of human mobility. We show that the domination of the Delta variant is described by only two parameters: starting date and ratio of effective reproduction number to that of existing variants. Furthermore, we demonstrate the effectiveness of vaccination through its high prevention ratio of infection in a metropolis. First, we analysed COVID-19 epidemiological data in the Tokyo metropolitan area from 1 Feb. 2020 to 31 Oct. 2021 27 . t denotes the day count from the beginning of 2020 (e.g., t = 1 means 1 Jan. 2020) and I new (t), the number of new COVID-19 infection cases on day t. The effective reproduction number R data e (t) was estimated as 28 where γ −1 = 5 is the mean generation time 29 . Fig. 1(a) shows a plot of the effective reproduction number. Similarly, we plotted the effective reproduction number limited in severe cases, which is defined using only the number of new severe cases instead of the number of all infections. The correlation between the effective reproduction numbers for all cases and severe cases is maximized if that for severe cases is regarded to be delayed by 12 days. Both are consistent with each other after t = 200. The 1st to 4th SoEs in the Tokyo metropolitan area 5 are also shown. During each SoE, the effective reproduction number started to decrease after around two weeks. The Delta variant was first detected in Japan on 18 May 2021 (i.e., t = 504) 30 . We analysed empirical human mobility data in Japan, especially in the Tokyo metropolitan area, from 1 Jan. 2020 to 30 Sep. 2021 31 . We calculated the population density at each time and location to estimate the number of infection routes. The infection probability of each infection route depends on how people contact each other and, consequently, on the activity. Therefore, we categorized the GPS user activity into four states, staying home ('home'), moving ('move'), staying out ('stay'), and working ('work'), to distinguish the activity-dependent infection probability with a time resolution of 15 min. The definition is as follows: (1) home is the state when the user is within the 1 km-square corresponding to their home, (b) work is the state when the user is within the 1 km-square corresponding to their workplace, (c) move is the state in which the user is within a 1 km-square that is different from the one in which they were 15 min ago and that does not correspond to either their home or workplace, and (d) stay-out is any other situation, e.g., in restaurants, department stores, or stadiums away from their home. Here, the home and work regions are estimated from the home/work city data in the GPS dataset, as explained in the Methods section. Fig. 1 (b) shows a plot of the time series of the mean duration per day of the four activities, where we take the 7-day moving average for avoiding the periodicity related to weekdays. The total time per day is 19 h because the data point is from 05:00 to 24:00. During the pandemic, roughly speaking on average, users were at home for 12 h (not including midnight), at work for 3 h, moved for 2 h, and stayed out for 2 h. Note that this is averaged over all days including holidays for all users. We also calculated the correlation among the mean durations of the four states in the span 200 ≤ t < 500. This span was used later for model parameter fitting. Move was strongly correlated with work and stay. We derived an infection model based on the classical SIR model using the GPS data for the four states with several assumptions. The classical SIR model is given by the following set of equations: where t is the time; S(t), the number of susceptible people; I(t), the number of infectious people; R(t), the number of removed people (either recovered, quarantined, or dead); and N = S(t) + I(t) + R(t), the total number. The parameter β represents the strength of infection spread, and the parameter γ determines the timescale from infected state to recovered or quarantined state. The reciprocal of γ is the mean generation time 29 : γ −1 = 5. We adopted the following four assumptions: (A1) the number of removed people is much smaller than the number of susceptible people, (A2) no nonlocal infections occur between different spaces or time periods, (A3) infectious people are distributed uniformly in the target area (i.e., Tokyo metropolitan area), and (A4) no infections occur between different activities. Assumption (A1) comes from the low cumulative number of COVID-19 cases in Japan (below 2% on 30 Oct. 2021) 27 . Then, the number of susceptible people is assumed to be constant, that is, the whole population of Japan. Assumption (A2) means that we discard indirect infections such as droplet infection with long-distance or contact infection after a long time 32 . Nonetheless, indirect infections are effectively included if the population in the target area does not change drastically. Assumption (A3) is used for simplifying the model. As our GPS data does not contain users' privacy including the information of infection, we simply assume that infected people distribute uniformly. We discard the effect from infected people who went outside or inside of the Tokyo metropolitan area. Assumption (A4) is the intuition that social contacts between different activities are fewer than those within the same activity. For example, in a restaurant, 'work' and 'stay' people will have much fewer contacts than between 'stay' and 'stay'. By applying the first assumption (A1), we only need to consider the second equation of the SIR model because infection causes a negligible change in S(t). In this case, the equation becomes where R e (t) = β Nγ S(t) is an effective reproduction number. We discretize the equation by setting dt = γ −1 as whereβ = β Nγ . This discretization is needed for fitting the empirical data. Next, we adopt the second assumption (A2) of local infections. We suppose that all people in the target area have equal possibility of close contact with each other. In this case, the time evolution of I(t) is described by the product of I(t) and S(t) because the number of infection routes or social contacts is proportional to I(t)S(t). In this sense, the classical SIR model is exact only if the population density is uniform. However, this is not the case in real situations. Therefore, we divide the area into squares (grids) in which social contacts are equally possible among all people in a square. The size of the squares is set to 1 km×1 km. The case of 100 m×100 m 9 is discussed in Supplementary note 1. Thus, we obtain where m is the square label, and τ is the time label in the considered area A and date t. The area A is the Tokyo metropolitan area, and it contains 36,898 1 km-squares. The time label τ moves in 15 min intervals around the current date t for five days, from t − 2 to t + 2, according to the discretization unit γ −1 = 5. The parameterβ takes a different value from the earlier equations because the space size and period are different:β is multiplied by (number of 1 km-squares)/(number of time steps) if the density is uniform in space and time. The third assumption (A3) gives the following relation: where S(t) = ∑ m∈A S mτ (t), I(t) = ∑ m∈A I mτ (t), and A is the target area. Therefore, the effective reproduction number is given as Finally, assumption (A4) is applied. We divide the population density S mτ (t) into that of the activities S amτ (t) and introduce activity-dependent parameters β a . The equation is where the time label τ moves from t − 2 to t + 2, and the activity label a takes the values 'home', 'move', 'stay', and 'work'. The coefficient β a is supposed to be constant in the timespan considered. For simplicity, we approximate the sum as follows: for the equation to have values only on date t, where the time label τ moves only on date t in the right-hand side. Thus, the effective reproduction number is a linear combination of the human mobility time series, by introducing M a (t) = γ −1 ∑ m∈A,τ∈t (S amτ (t)) 2 , the population moment of activity a on date t. We emphasize that the effective reproduction number is simply determined only by the human mobility on the same day. The parameters β a are interpreted as the infection rate during activity a per infection route and 15 min period. We determine the model parameters β a to fit the effective reproduction number; however, the observed effective reproduction number is based on the report date and not the infection date. We have to consider a typical time delay from infection to report ∆T to deal with it. In light of this effect, Eq. (12) is modified as follows: Here, we use the 7-day moving average of M a (t) to remove the periodicity of weekdays as well as the definition of the empirical effective reproduction number. We estimate the time delay from infection to report ∆T in the period 200 ≤ t < 500. Figure 2 shows the correlation between the effective reproduction number and the population moment of each activity delayed by ∆T . The peak at ∆T ∼ 14 is the delay of the infection reports, whereas the trough at ∆T ∼ −60 means that human mobility is decreased after infection spread. The optimal value ∆T = 14 is derived by minimizing a fitting loss to the effective reproduction number by the population moments as a function of ∆T , where the fitting loss is the squared distance in the log-10 space, to prevent the estimation from being dominated only by the significant value of the effective reproduction number. This value is consistent with that reported in a previous study 33, 34 . The remaining problem is the multicollinearity of the move population moment; the correlation of the population moments is 0.89 between move and stay-out and 0.90 between move and work. To address this, we assume that the coefficient β a for move and work is the same because the situations in these two activities are similar: people wear masks in trains (move) and offices (work) but not always in restaurants (stay-out). We fit the model parameters to the data for all cases in the period 200 ≤ t < 500. We do not use data for t < 200 because the social situation drastically changed during the 1st SoE (e.g., mask distribution in Japan 35 ), and the data are not stationary. Furthermore, the effective reproduction numbers for all cases and severe cases are different: epidemiological data are not reliable in that span. The fitting is performed under the log space to prevent the estimation from being dominated only by the significant value of R e (t). As a result, the parameters are β home = (1.2 ± 0.1) × 10 −7 , β move = β work = (6.2 ± 2.8) × 10 −8 , and β stay = (3.4 ± 0.2) × 10 −6 , where the unit is per 15 min. Figure 4 shows that the model explains the data in the period; however, they are different before t = 200 and after t = 500. Before t = 200, the effective reproduction number for severe cases is more 5/13 consistent with the model result. After t = 500, the effects of the Delta variant and the vaccination result in differences 30, 36 . Thus, the classical SIR model is suitable for direct GPS data modelling. Despite its simplicity, we observe that the following model quantitatively explains the COVID-19 epidemic. The effects of the Delta variant and vaccination are estimated as follows 37 . Let r δ = R δ /R 0 be the ratio of the effective reproduction number of the Delta variant R δ to that of the other existing variants R 0 . Furthermore, we assume that the numbers of people infected by the Delta variant I δ (t) and the existing variants I 0 (t) are the same at t = t δ . If both variants increase independently, the ratio is . By definition, the effective reproduction number of mixed variants R mix (t) is where S(x) = 1/(1 + exp(−x)) is the standard sigmoid function. The range of R mix (t) is R 0 to R δ . Therefore, the effect of the Delta variant is introduced by multiplying by the factor 1 + (r δ − 1)S(γ log(r δ ) · (t − t δ )). The vaccination model is also multiplied by a factor where C V is the infection prevention ratio of the vaccine 38, 39 , and R V (t) is the vaccination ratio in the target area. We approximate the vaccination ratio as R V (t) =(number of vaccinations in Japan up to date t)/2(Japanese population) and fit its data by a sigmoid function as shown in Fig. 3 . The parameters are t V = 590, γ −1 V = 34.6, and R ∞ V = 0.817. The sigmoid function S(x) is suitable for fitting the vaccination ratio because it is saturated in the limit x → ±∞. We determine the other parameters of the Delta variant and the vaccination after t = 500 by fitting the epidemiological data as r δ = 1.68 ± 0.03, t δ = 530 ± 2, and C V = 0.99 ± 0.02, where the model equation is finally The effective reproduction number for each activity, location, and person was investigated. Fig. 5 shows the result for each activity: the sum equals the total effective reproduction number. The stay-out activity dominates the change in the whole effective reproduction number because the parameter for stay-out is 28 and 55 times larger than that for home and move/work, respectively. The effective reproduction number at each location m is defined by the partial sum of Eq. (12) R e (m,t) = 1 as shown in Fig. 6 (a) two months just before the 1st SoE, (b) during the 1st SoE, (c) two months just before the 2nd SoE, and (d) during the 2nd SoE. The movie of the effective reproduction number map each day is provided as Supplementary movie 1, where a raw value is taken instead of a 7-day moving average to calculate the population moments. The sum over the Tokyo metropolitan area gives the total effective reproduction number. High-risk zones are concentrated in downtown Tokyo. Fig. 7 shows the cumulative distribution. This figure shows that the distribution is almost the same for low-risk regions, and a power law approximates it for high-risk regions whose exponents are ( For example, the total effective reproduction number is reduced by 17%, 25%, and 36% if the infection rate is suppressed to 10% in the top-5, -10, and -20 highest-risk 1 km-squares in period (a), respectively. With regard to real restrictions, the 1st SoE successfully reduced the population density in the downtown region. The SoEs reduced the exponents; however, the change amount of the exponent before and after SoEs also decreased. The effective reproduction number for each infected person 42, 43 is defined by the mean number of people to whom they would spread the infection; the average for all people gives the overall effective reproduction number. We calculate it using the time series of the activity and the location in a day by the following relation: where τ is the time label with 15 min intervals, and a i (τ) and m i (τ) represent the history of person i. We refer to this value as the GPS-based individual effective reproduction number. Fig. 8 shows the cumulative distribution function (CDF) of R (i) e (t), where we collect all data for (a) two months just before the 1st SoE, (b) during the 1st SoE, (c) two months just before the 2nd SoE, and (d) during the 2nd SoE. The distribution is approximated by truncated power laws. The exponents between -1 and 0 in (a) and (c) clearly indicate that significant cluster infections or superspreaders dominate the total infection. In fact, the top 10% of people contribute to (a) 58%, (b) 38%, (c) 54%, and (d) 50% of the overall effective reproduction number. A comparison before and during SoEs reveals that the cut-off does not change while the exponent is decreased. This implies that the effects of the SoE are represented by the reduction of the exponent in the CDF power law; it results in the suppression of the overall effective reproduction number. The cut-off corresponds to the largest infection clusters or superspreaders, which was not suddenly changed by SoEs; however, we observe that it decreased slowly from the 1st to the 2nd SoE. In this study, we verified human-activity-dependent COVID-19 infection rates using smartphone GPS data. We classified human activity patterns into four types, 'home', 'move', 'stay-out', and 'work', and estimated the number of social contacts for each daily activity in the Tokyo metropolitan area. Then, we derived an equation from the classical SIR model for GPS data with activity information to be used directly. The model successfully predicted effective reproduction numbers for future reporting. We demonstrated that infection risk is the highest when the people are not at home or work or not moving. By quantitatively understanding the effect of human mobility on infection spread, we distinguished the impact of the Delta variant and vaccination. Furthermore, we derived formulas that divide the effective reproduction number into the contributions from each location or individual. These formulas enabled us to observe the distributions of infection risk. As applications, we present an effective reproduction number map and GPS-based individual effective reproduction numbers, whose distributions obey the power law or truncated power law. The model provides a comprehensive understanding of infection spread of epidemics. A previous research by T. Yabe et al. 9 investigated the COVID-19 spread in the Tokyo metropolitan area during 2020 in detail and discovered the nonlinear relation between the effective reproduction number and the contact index, the number of social contacts among people not in their homes. The nonlinear relation is explained by the change in the ratio of the population moment of stay-out to that of work and move. Another previous research conducted by S. Rüdiger et al. 15 shows that the non-uniformity in the infective contact network has an important role as well as the total number of social contacts. The difference from this study in approach is the origin of the non-uniformity in social contacts: the non-uniform distribution of the infectious people in their study (c.f. it can break the assumption (A3)), and the type of the social contacts in this study. The assumptions (A1)-(A4) determines the limitation of our model, but relaxing some of them can lead to a new application or complex model for epidemics. If we do not adopt the assumption (A1), the effective reproduction number is decreased to the ratio (1 − R(t)/N). This is because the possibility of infection in each location decreases to that same ratio. For the assumption (A3), the non-uniformity in the infective contact network is considered. Although it is impossible to track each person across the days in the GPS dataset for privacy protection, preferences of people with high infection risk could be assumed or observed in another dataset to estimate the spatial distribution of infected people. Epidemiological data COVID-19 epidemiological data were provided by the Ministry of Health, Labour, and Welfare in Japan 27 . The data consist of the number of new infection cases and severe cases from Feb. 2020 and May 2020, respectively, to Oct. 2021. We took a 7-day moving average to remove the periodicity of epidemiological data in a week. The moving average was taken from 6 days prior to the target date. Vaccination statistics were obtained from the Prime Minister of Japan and His Cabinet 36 . GPS data were provided by Agoop Corp, Japan 31 . They include the location of smartphones with a user ID, time, latitude/longitude, and home/work city from 1 Jan. 2020 to 30 Sep. 2021. The median accuracy of the latitude/longitude data was 10 m. The provider restricts the data resolution for privacy protection if users are close to their home, where the latitude/longitude is fixed to the centre of the home 1 km-square. We pre-processed the original data to obtain 15 min interval data using linear interpolation, discarding data that cannot be interpolated sufficiently. The timestamps were every 15 min from 05:00 to 24:00. Data for midnight were discarded because most people stay in bed and do not spread infection. The home/work city data were converted into the home/work 1 km-square data. A home square is a 1 km-square where the user stays at 05:00 in the home city, and a work square is where the user visits continuously for at least 5 h in a day in the work city. Data for iOS smartphones were discarded because they are not gathered if the user is not moving due to the iOS specification, making staying at home undetectable. The converted data cover approximately 0.4 million users in Japan. We investigated the population corresponding to a GPS point to renormalize the GPS data to the actual population distribution. We counted the users at 05:00 for each home prefecture defined by their home city's prefecture for each day. The effective population of one GPS data point in a prefecture was calculated as the ratio of the actual population in the prefecture in Oct. 2019 44 to the number of users counted. At the beginning of 2020, the typical values ranged from 250 in metropolitan areas to 500 in the countryside. Supplementary note 1 | Estimation of population moment using 1km-square data. We consider the case of using the 100m-squares instead of the 1km-squares because the social contact is still inhomogeneous within the 1km-squares. We estimate the population moment M 100m a (t), defined by the 100m-square data of GPS, from the 1km-square data because the GPS data does not have enough resolution in home squares for privacy protection and enough user number for the data limitation. Here we think of 100m-squares labeled by m 100m within a 1km-square marked by m 1km , and a partial sum ∑ m 100m ∈m 1km S am 100m τ (t) 2 . We approximate it as a function of the population of 1km-square S am 1km τ (t). In Fig. 1 , for all 1km-square in Japan, we plot the partial sum of the population moment ∑ m 100m ∈m 1km S m 100m τ (t) 2 conditioned by the 1km population S m 1km τ (t), where activities do not condition them, and the home activity is removed because of its low resolution. Here the 1km population is typically over several hundred if a GPS point is observed. This means that the partial sum is approximated by where ∑ m 100m ∈m 1km S am 100m τ (t) = S am 1km τ (t). If the population is homogeneous in the 1km-square, the coefficient must be 0.01, not 0.024. The population inhomogeneity in 1km-squares makes the coefficients 2.4 times more significant. The population moment is described by Therefore the infection rate β 100m a using the 100m-squares is calculated as where β a is the infection rate in the 1km-square case. COVID-19) Dashboard Timeline: WHO's COVID-19 response Clinical features of patients infected with 2019 novel coronavirus in wuhan, china How will country-based mitigation measures influence the course of the covid-19 epidemic? Economic impact of government interventions during the covid-19 pandemic: International evidence from financial markets Impacts of the covid-19 pandemic on life of higher education students: A global perspective Modelling safe protocols for reopening schools during the covid-19 pandemic in france Non-compulsory measures sufficiently reduced human mobility in tokyo during the covid-19 epidemic Impacts of social distancing policies on mobility and covid-19 case growth in the us The effect of human mobility and control measures on the covid-19 epidemic in china Effect of non-pharmaceutical interventions for containing the covid-19 outbreak in china Covid-19 outbreak response: A first assessment of mobility changes in italy following national lockdown Mobile phone data for informing public health actions across the covid-19 pandemic life cycle Predicting the SARS-CoV-2 effective reproduction number using bulk contact data from mobile phones Mobile phone data for informing public health actions across the covid-19 pandemic life cycle Covid-19 outbreak response, a dataset to assess mobility changes in italy following national lockdown Association of mobile phone location data indications of travel and stay-at-home mandates with covid-19 infection rates in the us Twitter reveals human mobility dynamics during the covid-19 pandemic The use of mobile phone data to inform analysis of covid-19 pandemic epidemiology Effect of covid-19 response policies on walking behavior in us cities Interpretable Sequence Learning for COVID-19 Forecasting Universal scaling of human flow remain unchanged during the covid-19 pandemic Face mask use in the general population and optimal resource allocation during the covid-19 pandemic Association of social distancing and face mask use with risk of covid-19 The risk of indoor sports and culture events for the transmission of covid-19 Visualizing the data: information on COVID-19 infections Effective Reproduction Number Estimation from Data Series Serial interval of novel coronavirus (covid-19) infections SARS-CoV-2) of concern for increased infectivity and transmissibility and altered antigenicity (12th report) Detection of air and surface contamination by sars-cov-2 in hospital rooms of infected patients Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: A statistical analysis of publicly available case data On the relation between active population and infection rate of COVID-19 Policy of distributing 2 cloth masks per address Prime Minister of Japan and His Cabinet Sars-cov-2 variant dynamics across us states show consistent differences in effective reproduction numbers About Pfizer's COVID-19 Vaccine The effect of covid-19 vaccination in italy and perspectives for living with the virus The reproductive number of the Delta variant of SARS-CoV-2 is far higher compared to the ancestral SARS-CoV-2 virus Information for Healthcare Professionals on COVID-19 Vaccine Pfizer/BioNTech Individual-based approach to epidemic processes on arbitrary dynamic contact networks Superspreading and the effect of individual variation on disease emergence We thank Kenta Yamada, Yukie Sano, Takashi Shimada, and Takahiro Nishi for the helpful discussions. We thank Agoop for providing the GPS datasets. This work was supported by the Tokyo Tech World Research Hub Initiative (WRHI) Program of the Institute of Innovative Research, Tokyo Institute of Technology. The authors declare no competing interests.