title: Predictability states in human mobility
authors: Pacheco, Diogo; Oliveira, Marcos; Chen, Zexun; Barbosa, Hugo; Foucault-Welles, Brooke; Ghoshal, Gourab; Menezes, Ronaldo
date: 2022-01-04

Spatio-temporal constraints coupled with social constructs have the potential to create fluid predictability in human mobility patterns. Accordingly, predictability in human mobility is non-monotonic and varies according to this spatio-socio-temporal context. Here, we propose that the predictability in human mobility is a state and not a static trait of individuals. First, we show that time (of the week) explains people's whereabouts more than the sequences of locations they visit. Then, we show that predictability depends not only on time but also on the type of activity an individual is engaged in, thus establishing the importance of context in human mobility.

Human beings are routine-oriented to the extent that a lack of predictability in daily mobility patterns is linked to high levels of stress [1, 2]. This change-averse behaviour leads to people having well-defined routines, which allows for high predictability in daily mobility patterns. Human trajectories have been shown to exhibit regularities at multiple scales, despite the inherent complexity of the choices humans can make for the routes of their daily travels. Indeed, the analysis of large populations via mobile phone data has suggested the possibility of predicting up to 93% of human movement [3]. The predictability of human mobility, however, tells us only part of the story, since it neglects spatio-temporal constraints and the social embedding behind mobility regularities. Therefore, this work demonstrates that mobility predictability should be seen as a transient state, rather than a trait of individuals.
The understanding of mechanisms governing human travelling behaviour is crucial to a variety of domains such as epidemic modelling [4], traffic management [5], and national security [6], to name but a few [7]. Modelling human predictability as a state that depends on the activity being performed (spatio-social) and on the time of that activity (temporal) can lead to better decisions within these domains, as it offers a finer, more detailed view of human dynamics.

arXiv:2201.01376v1 [physics.soc-ph] 4 Jan 2022

Several factors in our daily lives, such as work schedules and biological processes, restrict our travelling behaviour. For instance, our internal circadian and ultradian (i.e., less than 24h) rhythms have a direct impact on our activity schedules and therefore on our mobility patterns [8, 9, 10, 11, 12]. Interestingly, a sudden absence of constraints can completely alter the predictability levels and rhythms in human mobility (e.g., as in the recent COVID-19 lockdown procedures [13], where people were not bound to the usual constraints and constructs). These constraints are likely to shape the uncertainty about the whereabouts of people, which in turn brings more demand to services such as transport networks, grocery stores, and hospitals; the lack of predictability reduces our ability to plan based on demand. Guessing that a person will be at home on a Tuesday at 4am will most likely be correct for most individuals under "normal" circumstances. The same cannot be said for the same individual at 11am on a Saturday; in fact, it may be much harder to know a person's whereabouts during times in which they are not bound by typical daily rhythms. Such variations lead to predictability being a temporal/transient state of individuals.
In all certainty, this state also depends on social and spatial aspects, but the differences imposed by socio-spatial aspects are also captured by predictability (or lack thereof) in the temporal dimension. For instance, the presence of someone at a pub at 12pm versus 9pm is not only a temporal difference; the location of the pub (spatial) and who will be there with this person (social) are related to the time of the event.

In this work, we describe the temporal regularities of the theoretical predictabilities of human mobility, and examine their different frequency and time components. Our results suggest that, in addition to the daily routines, mobility diversity is also marked by periods of approximately 12h and 6h, which correspond to the second and fourth harmonics of our internal circadian rhythm. These findings suggest that the processes responsible for our visitation regularities are governed by our internal biological cycles beyond the sleeping and feeding needs, as evidenced by the predominance of the 12h periods over the 8h and 6h cycles. We argue that such patterns should lead to the predictability in human mobility being considered a transient state rather than a fixed characteristic of individuals.

In a seminal study in human mobility, Song et al. [3] used cellphone data and proposed an information-theoretic approach to estimate mobility predictability based on the uncertainty of visiting a number of different locations. Knowing the sequence of visitations (entropy rate) in addition to their frequencies (Shannon entropy) was the key to estimating predictability. They showed that one's next visited location could be predicted at most 93% of the time. In the present work, the locations of social media check-ins are used to approximate one's mobility trajectory. This data is, however, fundamentally different from cellphone data.
For instance, the location of a mobile user is given by the triangulation of tower positions, whereas for social media users it represents the latitude/longitude coordinates of the place checked in. Also, check-in data is always provided intentionally, while cellphone users can be tracked continuously without explicit consent, e.g., while receiving a call (see more details in the Materials and Methods section).

Before introducing the new concepts of the proposed predictability state, we validate our data by replicating Song et al. [3]. Figure 1 shows that, for our datasets, the predictabilities derived from the sequence of check-ins (visited locations), $\Pi_c$, peak around 40%. They are indeed higher than the ones based on the frequency of check-ins, $\Pi_u$, which peak around 20%. It is worth noting that these figures are markedly lower than the ones previously reported, 93% [3] and 71% [14]. Such a difference can be explained by the fact that, in our work, locations are determined by the places in which users have checked in, without any spatial coarse-graining. As pointed out by Ikanovic and Mollgaard [14], the predictability measure depends on the spatial resolution such that when the spatial resolution $\Delta s \to \infty$, the predictability $\Pi \to 1$. Conversely, when $\Delta s \to 0$, the predictability $\Pi \to 0$.

Predictability based on trajectory (the sequence of visited locations) indirectly encapsulates a temporal order, but it ignores any time property larger than the one used to define the trajectory itself. That is, if a trajectory is the sequence of visited locations during one day, the days themselves are not always distinguished. Moreover, the time interval in which humans move between locations is not constant over time. Sleeping, eating, and working are just a few examples of activities constrained by distinct duration, time, and space. Therefore, a 93% predictability of the next location turns out to be relatively meaningless for predicting when it will be visited.
Moreover, it reveals the predictability as a transient, time-dependent state rather than a constant property of an individual. We can go further and say that the transiency arises from time, space, and social aspects, although in this work we focus on the first two aspects.

Traditionally, two trajectories would be considered identical if the sequence of visited places is the same, regardless of which day of the week they were performed. We break the visited location trajectories into weekday-hour bins to represent the time dimension embedded in the data beyond a single day. For instance, these trajectories would be represented differently if they are performed on distinct days of the week or shifted within a day (see Materials and Methods for more details). In this work, we measure how typical it is for someone to be in a specific place at a specific hour on a specific day of the week, rather than how typical a sequence of visitation throughout any day is. We measure the predictability $\Pi$ of the 168 independent weekday-hour bins based on the frequency of visited locations in each particular bin. By using an information-theoretic measure instead of a simpler quantity such as the relative location frequencies or the pure location diversity, we encapsulate in a single quantity both the location diversity and the relative frequencies.

Figure 1: Although humans' choices seem complex and unpredictable, Song et al. [3] have shown human mobility to be fairly predictable when considering temporal sequence in addition to frequency of visitation. Here, we replicate this finding using the three datasets that capture human mobility. The distributions of (A) entropy H and (B) predictability values. Sequences predict more than frequencies ($\Pi_c > \Pi_u$) and are peaked around 40% for all three datasets.

Figure 2 shows the predictability timeline for different groups of users in three datasets. It reveals a remarkable 24h periodic pattern of predictability.
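The weekday-hour binning described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the `(datetime, place_id)` input format are hypothetical, bin 0 is arbitrarily taken to be Monday 00:00, and the conversion from entropy to predictability assumes the Fano-based relation of Song et al. [3], S = H(Π) + (1 − Π) log2(N − 1), solved here by bisection.

```python
from collections import Counter, defaultdict
from datetime import datetime
from math import log2


def week_hour_bin(ts):
    """Map a datetime to one of 168 bins (assumption: Monday 00:00 is bin 0)."""
    return ts.weekday() * 24 + ts.hour


def entropy_bits(places):
    """Frequency-based (Shannon) entropy, in bits, of a list of place IDs."""
    total = len(places)
    return sum(-(c / total) * log2(c / total) for c in Counter(places).values())


def max_predictability(S, N, tol=1e-9):
    """Solve S = H(P) + (1 - P) * log2(N - 1) for P in [1/N, 1] by bisection."""
    if N <= 1:
        return 1.0  # a single visited place leaves no uncertainty

    def fano(P):
        # Binary entropy H(P) plus the (1 - P) * log2(N - 1) term.
        h = sum(-q * log2(q) for q in (P, 1.0 - P) if q > 0.0)
        return h + (1.0 - P) * log2(N - 1)

    S = min(max(S, 0.0), log2(N))  # keep S inside the attainable range
    lo, hi = 1.0 / N, 1.0          # fano decreases from log2(N) to 0 here
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano(mid) > S:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def predictability_by_bin(checkins):
    """Return {bin: predictability} from an iterable of (datetime, place_id)."""
    by_bin = defaultdict(list)
    for ts, place in checkins:
        by_bin[week_hour_bin(ts)].append(place)
    return {
        b: max_predictability(entropy_bits(p), len(set(p)))
        for b, p in by_bin.items()
    }


# Hypothetical check-ins: always home on Tuesdays at 04:00,
# two different places on Saturdays at 11:00.
checkins = [
    (datetime(2022, 1, 4, 4), "home"),
    (datetime(2022, 1, 11, 4), "home"),
    (datetime(2022, 1, 8, 11), "cafe"),
    (datetime(2022, 1, 15, 11), "park"),
]
profile = predictability_by_bin(checkins)
```

In this toy input, the Tuesday 04:00 bin (index 28) has a predictability of 1, while the Saturday 11:00 bin (index 131), with two equally frequent places, has a predictability of about 0.5. Real data would of course need many more observations per bin for the estimate to be meaningful.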
We analysed the impact of user heterogeneity by grouping users based on their number of unique visited locations S. As one would expect, the predictability decreases as the number of visited places increases. We also grouped users based on their radius of gyration (the geographical coverage based on the coordinates of the visited places) and their engagement level with the social media platform (the monthly average number of days with mobility trajectories, i.e., days with at least two check-ins). In all cases, the results show that it is harder to predict the mobility of users with more diverse routes, regardless of how we measure this diversity. Moreover, the predictability is higher during nighttime, peaking around 4-5 AM when most people are sleeping at home, regardless of dataset.

Although all datasets are from location-based social network (LBSN) platforms, the predictability amplitude in the Weeplaces data is much larger than what we observed in the other datasets. A possible cause is that the Weeplaces data was provided by Foursquare users who were interested in visualising their check-in history. It is reasonable to believe that these users were, on average, more active (in terms of their check-in history) than a regular LBSN user. In fact, the median number of check-ins of Brightkite and Gowalla users was 11 and 25, respectively, whereas for Weeplaces this figure was 329 check-ins. Also, a median Weeplaces user has visited 131 unique locations, while for Brightkite and Gowalla these numbers were 8 and 19 unique locations, respectively.

To better understand the generalisation of the results shown in Figure 2, we also analyse the aggregated patterns of the entire population. Figure 3-A shows the analysis of the wavelet power spectrum and determines the significance (dashed line) of the 24h and 12h cycles for the populations in all of our datasets.
Not surprisingly, this analysis reveals that the circadian period (approximately 24h) is the most prominent component of the predictability regularity. However, finding the 12h component (i.e., the circasemidian period) as the second-strongest component was not expected. For instance, given the working schedules and the sleeping cycles, one would expect to find significance for the 8h period as well. The third-strongest component is centred approximately around the 6h regime during the day, even though the signal is not significant at the population level.

By combining the strongest component with the variance of predictability observed throughout the week, we reveal a strong spectral agreement among all datasets. Figure 3-B shows the standardised predictability timeline for each dataset based on the sample mean of the predictabilities $P_i$ across all users within each time bin t. Though the three datasets are from different sources, they seem to capture the same regularities. Such a finding suggests that the temporal variation of the mobility diversity is activity-independent and is therefore likely to be a characteristic manifestation of the underlying human dynamics.

An alternative hypothesis for the influence of the 12h rhythm is that these periods are in fact rooted in population-level heterogeneity in the activity routines. For instance, the 12h component could be explained by the same 24h rhythms if a large proportion of the users had their 24h schedules offset by 12h. To test this hypothesis, we performed the wavelet analysis on individual-level data. Interestingly, this analysis reveals the 6h period to be more prevalent than the 12h period. Figure 3-C shows the percentage of users by their strongest component. Gowalla users differ from the others: they are less likely to have circadian periods and have significantly more 6h periods.
A possible explanation for this observation is the trips feature offered by this platform, which incentivizes same-category visits, such as pub crawls.

As already mentioned, the emergent circadian pattern across groups of individuals and datasets, not surprisingly, reveals the predictability peaking at night when people tend to go (and stay at) home. Yet, this result brings up a new aspect of predictability that was never considered before. Such a finding implies that, for instance, stating that "an individual is 80% predictable" must be interpreted in an averaged sense. Missing from this are the instantaneous changes in a person's predictability state over time.

Figure 4-A shows how time-granularity affects the predictability. As the time window decreases, from a week-day (seven 24-hour bins per week) to a week-day-hour (one hundred and sixty-eight 1-hour bins) representation, the average predictability increases. Interestingly, if one only distinguishes the days of the week (i.e., 24h bins), the average predictability is equivalent to Song's time-frequency predictability [3]. Figure 4-B explores the effects of different time windows in our datasets. It shows the average predictability monotonically decreasing until the 24-hour window, where it seems to stabilise. Specifically for Gowalla, predictability drops sharply when the time window is bigger than 3h, as the strong 6h-period component shown in Figure 3-C might be scattered.

Figure 4: Effects of time-granularity in the limits of predictability. As the time window decreases, the average predictability increases.

The temporal dependency in people's whereabouts arguably results from daily routines due to individual, social, and spatio-temporal constraints. Likewise, we expect that the predictability in human mobility depends on the context. For example, it is plausible that people are more predictable about their workplace than about the restaurants they visit.
To investigate the role of context in predictability, we examine the places individuals visit over time using the Weeplaces dataset. The rightmost panel in Figure 1B shows the distribution of predictabilities when the trajectories are composed of all data available. For this investigation, these distributions are the baselines. Figure 5 shows the distributions of predictabilities when trajectories are filtered to contain only places of specific categories (e.g., food, shop, home). For instance, a context-dependent trajectory for food could be: breakfast-place-A, lunch-place-B, coffee-place-C, etc.; while a full baseline trajectory could be: home-place-X, breakfast-place-A, work-place-Y, lunch-place-B, coffee-place-C, gym-place-Z, etc. As expected, Figure 5 shows that the limits of predictability (solid lines) can increase or decrease depending on context (compared against the dotted lines, i.e., full trajectories). For example, activities related to home and work are more predictable than the average baseline. In contrast, leisure activities (i.e., nightlife and entertainment) are less predictable than the average activity. This finding suggests that knowing the context in which people are embedded can inform us about their predictability.

To understand the role of context in human mobility, we investigate the extent to which context helps us estimate an individual's predictability. Precisely, we analyse how well we can estimate an individual's predictability based on this individual's context preference. For instance, we want to understand whether knowing that a person often goes shopping informs us about this person's overall predictability. We first represent this context preference by using the relative frequency with which an individual stayed at places from each category. Then, we use linear regression to estimate the predictability of an individual based on context preference.
Figure 6 shows that such a simple model can estimate an individual's predictability well ($R^2 = 0.424$, MAE = 0.045). Figure 6A plots individual predictabilities $\Pi_c$ (estimated based on full trajectories over multiple days, e.g., home, restaurant, work, restaurant, etc.) against the estimated predictabilities $\hat{\Pi}_c$ (based on context profiling, e.g., 80% home/work, 18% food, 1% nightlife, etc.). Figure 6B shows that the residuals are normally distributed and centred on zero. This result implies that having a piece of coarse-grained information about individuals (i.e., context preference) informs us about an individual's characteristic, namely their intrinsic mobility uncertainty, that we would otherwise need fine-grained data (i.e., check-in data) to calculate.

Meeting people's needs requires governments, industries, and other stakeholders to be able to plan for demands (e.g., hospital admissions, public transportation, store opening times, etc.). The predictability of human movement is at the heart of planning, hence more accurate modelling should lead to better planning and better decision-making. Previous research models predictability as an aggregate value for an individual, which fails to capture the temporal variations of such regularities for different times of the day and days of the week. In this work, we model predictability as a state that is time-dependent but also context-dependent. With this definition, we show that the state of predictability of individuals varies significantly and hence should not be seen as an intrinsic characteristic of the individuals.

The main implication of this work is that the planning of activities related to human mobility (e.g., city events, epidemic modelling, road maintenance) needs to consider time-space variations of individual activities. Furthermore, during periods of restrictions, such as the COVID-19 pandemic, the understanding and characterisation of these time-space variations can help governments make the correct decisions.
In 2020 and 2021, many governments imposed curfew/lockdown measures on citizens after certain hours (e.g., Spain, Colombia) in a "blanket" way. Effective curfews depend on the time and the location, and the consideration of such variations can lead to a better approach where not all areas are treated equally. Using predictability as a state and using the states for planning could lead to more just/equitable outcomes.

Figure 6: Knowing the type of activity people engage in helps us to estimate their predictability in mobility.

This work has limitations, but in most cases they are related to a dependency on high-resolution data. We have demonstrated here that even using a somewhat mediocre data resolution, we can characterise the state of predictability of individuals. With access to higher-resolution data, such as the data being collected as part of "track-and-trace" systems in certain countries, our modelling can lead to more accurate characterisations. Many governments and companies (as part of "data-for-good" efforts) are starting to open their datasets to scientists, which will naturally lead to better urban analytics, including human dynamics modelling.

We use data from different location-based social networking services (Table 1). Brightkite and Gowalla were two popular social networking sites that existed from 2007 until 2012. Weeplaces was a website on which users could upload their check-in activities from other social network services (e.g., Facebook Places, Foursquare). These datasets contain users' check-in activity, including user identification, location coordinates (i.e., latitude and longitude), and time stamp. Additionally, the Gowalla dataset contains a description (i.e., the category) of the locations (e.g., nightlife, outdoors). All datasets are publicly available.
To study individuals' mobility, we use their check-in activity to create trajectories, described as time series of the form

$$X = \{x(t)\}, \qquad (1)$$

where $x(t) \in V$ is a place, and $V$ is the set of visited places. In our work, we want to investigate the dynamics of this time series and its embedded uncertainty. Specifically, we examine the visitation preferences of an individual and the emerging patterns in these preferences.

To investigate an individual's visitation preferences, we analyse the probability $p(i)$ of visiting a location $i \in V$ and the number of locations $N = |V|$ visited by this individual. The value of $N$ tells us how broadly an individual visits places. However, $N$ neglects visitation frequency, missing the existence of favourite locations. To account for frequency, we are also interested in the spread of the probability distribution $p(i)$. To do so, we use the Shannon entropy $S_{unc}(p)$ of the random variable $X$, defined as

$$S_{unc}(p) = -\sum_{i \in V} p(i) \log_2 p(i), \qquad (2)$$

and expressed in bits. The entropy $S_{unc}$ quantifies the uncertainty regarding the places that a specific user visits over time. For example, when an individual visits one place only, $S_{unc}(p)$ is zero because there is no uncertainty about this individual's whereabouts. In contrast, an individual without any favourite location has the highest uncertainty and the maximum entropy, which occurs when $p(i) = 1/N$ for all $i$. In this hypothetical case, the entropy simplifies to $S_{rand}(p) = \log_2 N$, and represents the uncertainty of a visitation preference following a uniform distribution. That is, the uncertainty regarding this individual is the same as guessing randomly. Note that the Shannon entropy captures the uncertainty of an individual without accounting for any temporal correlation or patterns of visitation, since it neglects the sequence in which events (i.e., visitations) take place.
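As a small worked illustration of the two entropies just described, the following Python sketch computes the frequency-based Shannon entropy and the random (uniform) entropy of a toy trajectory. The function names and the example trajectory are hypothetical and chosen only for illustration; they are not taken from the datasets used in this work.

```python
from collections import Counter
from math import log2


def shannon_entropy(visits):
    """Frequency-based entropy, in bits, of a sequence of visited place IDs."""
    total = len(visits)
    return sum(-(c / total) * log2(c / total) for c in Counter(visits).values())


def random_entropy(visits):
    """Entropy log2(N) of a uniform preference over the N distinct places."""
    if not visits:
        raise ValueError("at least one visit is required")
    return log2(len(set(visits)))


# Hypothetical trajectory with a clear favourite location ("home").
visits = ["home", "work", "home", "cafe", "home", "work", "home", "gym"]

s_unc = shannon_entropy(visits)   # 1.75 bits: home 1/2, work 1/4, cafe and gym 1/8 each
s_rand = random_entropy(visits)   # 2.0 bits: four distinct places
```

The gap between the two values (1.75 versus 2 bits) reflects the existence of a favourite location; for a trajectory with a single place both quantities are zero, and for a trajectory that visits every place equally often they coincide.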
To investigate emerging patterns in the visitation preferences of individuals, we can understand $X$ as the result of processes that generate such sequences. Precisely, we analyse the source entropy rate $h_\mu$ of a stochastic process [18, 19]. The entropy rate tells us the uncertainty of a trajectory while discounting the recurrent patterns in it. In our case, $h_\mu$ quantifies the irreducible uncertainty of an individual even when we learn about the visitation patterns that emerge over time. In our study, we use the Lempel-Ziv compression algorithm to estimate the entropy rate of an individual, following previous work [3]. We call this estimate the time-correlated entropy and denote it as $S$.

Not only do these entropy values enable us to assess the uncertainty in the mobility of individuals, but they also help us to characterise the extent to which an individual's trajectory is indistinguishable from random; that is, they enable us to estimate the limits of predictability of an individual's mobility. In our work, we are interested in estimating the probability $\Pi$ of correctly predicting future locations of an individual, given a past series of observations. Song et al. [3] showed that $\Pi$ is subject to Fano's inequality and has an upper bound, denoted as $\Pi_{\max}$. This upper bound reveals the theoretical upper limit to predicting the future location of an individual when restricting ourselves to the trajectory data only. For example, $\Pi_{\max} = 0.7$ tells us that an individual's trajectory exhibits an intrinsic uncertainty that makes their behaviour indistinguishable from random 30% of the time. This randomness restricts the predictive power of any algorithm seeking to predict the future locations of this individual. Though it is infeasible to measure $\Pi_{\max}$ directly, the quantity has an explicit relationship with the time-correlated entropy:

$$S = H(\Pi_{\max}) + (1 - \Pi_{\max}) \log_2 (N - 1), \qquad (3)$$

where $H(\Pi_{\max}) = -\Pi_{\max} \log_2 \Pi_{\max} - (1 - \Pi_{\max}) \log_2 (1 - \Pi_{\max})$ is the binary entropy function. To estimate $\Pi_{\max}$ of an individual, we need to solve Eq.
(3) using a numerical solver, given that we know the number of locations $N$ and the entropy $S$. Similarly, we can estimate the hypothetical predictability of an individual if this individual lacked favourite locations or visitation patterns. To estimate these special cases, we replace $S$ with $S_{rand}$ and $S_{unc}$, respectively, to find the corresponding $\Pi_{rand}$ and $\Pi_{unc}$.

We are also interested in the temporal dependencies of these limits of predictability; thus, we analyse individuals at different moments of the week. Specifically, we split the trajectory of each individual into time slots representing the time of the week. We create 168 slots (i.e., 24 hours × 7 days of the week) and define $X_{t=t_0}$ as a random variable representing the places that an individual visits at the time slot $t = t_0 \in \{1, \ldots, 168\}$. We measure the entropy and predictability limits of this random variable for each individual in the datasets, which enables us to construct their respective time series of predictability. We note that, in our approach, the idea of a mobility trajectory as an arbitrarily long sequence of visits is demoted in favour of a routine-oriented perspective. Thus, the sequential information, which the entropy rate leverages, is less relevant.

To understand the temporal dimension of human mobility predictability, we use the continuous wavelet transform to describe the regularities in the time series of the individuals. With the wavelet transform, we extract both time and frequency components from a time series. The method has a long history of successful applications in a variety of domains such as climate prediction [20], digital image processing [21], and crime dynamics analysis [22]. The wavelet transform decomposes a time series using functions called wavelets, which dilate (scale) to capture different frequencies and translate (shift) in time to capture changes over time. We can define the wavelet transform of a discrete sequence Y = {y(1), y(2), . . .
, y(N)} having observations with a uniform step $\delta t$ as the following:

$$W_Y(s, n) = \sum_{n'=1}^{N} y(n')\, \psi^*\!\left[\frac{(n' - n)\,\delta t}{s}\right],$$

where the '*' denotes the complex conjugate and $s$ is the wavelet scale. The wavelet transform can be seen as the cross-correlation between the time series $y(t)$ and a set of functions $\psi^*_{s,\tau}(t)$, distributed over time and having different widths [23]. By varying the scale $s$ and translating over time (i.e., varying $n$), we have a representation of the amplitude of the different periodic features of $Y$ and how they vary with time. To examine the overall periodicity in $Y$, we evaluate the average of the wavelet power $|W_Y(s, n)|^2$ over $n$:

$$\overline{W}_Y^2(s) = \frac{1}{N} \sum_{n=1}^{N} |W_Y(s, n)|^2,$$

called the global wavelet spectrum, which provides us with an unbiased estimate of the true power spectrum [24]. We analyse its statistical significance by using the method developed by Torrence and Compo [25], which tests the wavelet power against a null model that generates a background power spectrum $P_k$. The test is given by:

$$\frac{|W_Y(s, n)|^2}{\sigma^2} \Rightarrow \frac{1}{2} P_k\, \chi^2_\nu,$$

where $\sigma^2$ is the variance of $Y$, $\chi^2_\nu$ is the chi-square distribution with $\nu$ degrees of freedom, and $\nu = 2$ for complex wavelets (our case) [25].

[1] The morning rush hour: Predictability and commuter stress
[2] Aversive event unpredictability causes stress-induced hypoalgesia
[3] Limits of predictability in human mobility
[4] Multiscale mobility networks and the spatial spreading of infectious diseases
[5] Understanding road usage patterns in urban areas
[6] Conformal Anomaly Detection: Detecting Abnormal Trajectories in Surveillance Applications
[7] Human mobility: Models and applications
[8] Ultradian, circahoral and circadian structures in endothermic vertebrates and humans
[9] Plasticity of the intrinsic period of the human circadian timing system
[10] Coupling human mobility and social ties
[11] Unravelling daily human mobility motifs
[12] Spatiotemporal Patterns of Urban Human Mobility
[13] Analysis of socioeconomic aspects related to mobility patterns in the UK during the COVID-19 pandemic
[14] An alternative approach to the limits of predictability in human mobility
[15] Friendship and mobility: user movement in location-based social networks
[16] Exploiting geographical neighborhood characteristics for location recommendation
[17] Maps: A multi aspect personalized poi recommender system
[18] Elements of Information Theory
[19] Web routineness and limits of predictability: Investigating demographic and behavioral differences using web tracking data
[20] Inter-annual to inter-decadal streamflow variability in Quebec and Ontario in relation to dominant large-scale climate indices
[21] Image analysis with two-dimensional continuous wavelet transform
[22] Spatio-temporal variations in the urban rhythm: the travelling waves of crime
[23] Time-dependent spectral analysis of epidemiological time-series with wavelets
[24] On estimation of the wavelet variance
[25] A Practical Guide to Wavelet Analysis

This work was supported by the US Army Research Office under Agreement Number W911NF-17-1-0127.