key: cord-0202034-38ty9jtw authors: Ayan, Necati A.; Damasceno, Nilson L.; Chaskar, Sushil; Sousa, Peron R. de; Ramesh, Arti; Seetharam, Anand; Rocha, Antonio A. de A. title: Characterizing Human Mobility Patterns During COVID-19 using Cellular Network Data date: 2020-10-27 journal: nan DOI: nan sha: e5525c0b10ce15a6f0dcf1e2562412e5b755cf62 doc_id: 202034 cord_uid: 38ty9jtw In this paper, our goal is to analyze and compare cellular network usage data from pre-lockdown, during lockdown, and post-lockdown phases surrounding the COVID-19 pandemic to understand and model human mobility patterns during the pandemic, and evaluate the effect of lockdowns on mobility. To this end, we collaborate with one of the main cellular network providers in Brazil, and collect and analyze cellular network connections from 1400 antennas for all users in the city of Rio de Janeiro and its suburbs from March 1, 2020 to July 1, 2020. Our analysis reveals that the total number of cellular connections decreases to 78% during the lockdown phase and then increases to 85% of the pre-COVID era as the lockdown eases. We observe that as more people work remotely, there is a shift in the antennas incurring top 10% of the total traffic, with the number of connections made to antennas in downtown Rio reducing drastically and antennas at other locations taking their place. We also observe that while nearly 40-45% users connected to only 1 antenna each day during the lockdown phase indicating no mobility, there are around 4% users (i.e., 80K users) who connected to more than 10 antennas, indicating very high mobility. Finally, we design an interactive tool that showcases mobility patterns in different granularities that can potentially help people and government officials understand the mobility of individuals and the number of COVID cases in a particular neighborhood. 
Our analysis, inferences, and interactive showcasing of mobility patterns based on large-scale data can be extrapolated to other cities of the world and have the potential to help in designing more effective pandemic management measures in the future. COVID-19 is a global pandemic that has infected human beings in all countries of the world. Lack of physical distancing, isolation, mask-wearing, and effective contact tracing are some of the key factors that have contributed to the spread of COVID-19 and transformed it into a global pandemic. To mitigate the spread of the disease, most countries around the world have implemented varying levels of lockdown. Though lockdowns have been effective in decreasing the rate of spread [4], they have not been able to curb the disease and infections continue to soar in countries around the world. Therefore, there is an urgent need to understand the effects of lockdown on mobility patterns, so they can be effectively integrated into government policies to manage the COVID-19 pandemic. In this paper, our goal is to analyze and compare cellular network usage data (comprising phone calls, 3G/4G data connections, and text messages) from pre-lockdown, during lockdown, and post-lockdown phases of the COVID-19 pandemic to understand and model human mobility patterns, and evaluate the effect of lockdown on mobility. To this end, we collaborate with one of the main cellular network providers in Brazil, TIM Brazil, and conduct a large scale study by collecting and analyzing anonymized cellular network connections from all users in the city of Rio de Janeiro, the second most populous city in Brazil, and its suburbs. The data consists of individual connections made by users to approximately 1400 cellular antennas in and around the city of Rio de Janeiro and its suburbs during each 5-minute interval from March 1, 2020 to July 1, 2020.
There are approximately 120 million connections per day made by approximately 2 million users per day during this time period in our dataset, amounting to a total of approximately 10 billion connection logs. As Brazil enforced strict lockdown measures between the third week of March and the end of May, the data and the ensuing analysis provide valuable insight into human behavior and mobility in the pre-lockdown, during lockdown, and post-lockdown time periods, making it a comprehensive study of mobility that can offer valuable perspective for effective lockdown and pandemic management. Our main contributions are summarized below. Connectivity and User Mobility Analysis: Our data analysis reveals some interesting trends. We observe that the total number of cellular connections decreases to 78% during lockdown and then increases again to 85% of the pre-lockdown values during the post-lockdown period. The number of distinct users using cellular network connections also increases in the post-lockdown period as more people venture outside and thus need to use the cellular network. To investigate the impact of lockdown on mobility, we investigate the top 10% of antennas (i.e., 140) that carry the highest amount of traffic. We observe that new antennas emerge in the top 10% both in the lockdown and post-lockdown phases that replace some of the antennas in the top 10% in the pre-lockdown phase. A closer look reveals that antennas that serve downtown Rio de Janeiro as well as those that serve the commercial hubs of some of the main districts of the city no longer feature in the top 10% of antennas during lockdown. We also conduct user mobility analysis and observe that while approximately 35-40% of users exhibit no mobility (i.e., connect to a single antenna per day) during the lockdown and post-lockdown periods, approximately 4% of users (i.e., 80K users) exhibit high mobility (i.e., connect to more than 10 antennas per day).
This high mobility is interesting as it is likely to be demonstrated by essential workers and those flouting lockdown measures. Graph-based User Mobility Analysis: We next conduct a graph-based analysis to better understand and model the mobility patterns of people during COVID-19. To this end, for each day, we construct a graph where the antennas correspond to the vertices and the movement of users between antennas corresponds to the weight of that particular edge. We determine the total in-degree of the nodes of the graph to quantify the total number of mobility events and observe that user mobility starts increasing around 3 weeks before the end of lockdown, with the trend continuing into the post-lockdown period. We also investigate the impact of the day of the week on mobility. We observe that weekdays and Saturday have similar levels of mobility during and after lockdown while Sunday has the least mobility. This difference in mobility between the other days of the week and Sunday is interesting because it suggests that many people do not have the opportunity to work from home even during the lockdown period. We next construct heatmaps for mobility by grouping the antennas into main municipal regions of the city to: i) identify the geographical regions that have the highest mobility, and ii) investigate the change in the mobility of particular antennas during and post lockdown. Interestingly, we detect a connection between mobility patterns observed at antennas and the social progress index (SPI) of the region in which they are located. We observe that antennas in regions having a low SPI often exhibit higher mobility when compared to the ones in regions having a higher SPI. COVID-19 Borescope: Finally, we design a visual/interactive tool, COVID-19 Borescope, which helps people and government administrators analyze the mobility of individuals as well as correlate it with the number of COVID-19 cases in the city.
To deal with the massive amount of data, we use an optimized version of the nanocubes data structure to make the tool scalable and highly interactive. The interactive web interface provides multiple functionalities including selecting specific regions of the city, specifying date ranges, and zooming in and out, and it shows the total number of active cases, recovered cases, and deaths. Concluding Remarks and Ongoing Efforts: We conclude our introduction with some final remarks and outline our current research efforts. (1) The large scale nature of our study, where we discern and model the mobility patterns of 2 million users each day in Rio de Janeiro and its suburbs, the second most populous city in Brazil, coupled with the fact that most countries around the world have failed to effectively contain the pandemic, provides us the footing to confidently hypothesize that the observations and conclusions drawn here can be extrapolated to cities around the world. (2) Overall, our research reveals that while lockdowns reduced the amount of human mobility, a sizable share (approximately 15%) of the population still ventured significantly out of their neighborhood, which could have partially contributed to our failure in containing the spread of COVID-19. With COVID-19 cases once again on the rise and countries around the world bracing for a second wave, our analysis shows that if governments resort to lockdowns as a measure to contain the disease, stricter implementation of lockdown measures may be necessary to decrease the mobility of people. (3) Our analysis and the interactive website arm government authorities with scientific analysis and tools to design and implement effective policies to contain the current pandemic.
Importantly, the learnings from this work along with our ongoing research on designing and integrating mobility prediction models can enable authorities to take minimally invasive actions (e.g., traffic rerouting, city planning) to avert a surge in infections in place of widely unpopular blanket lockdown interventions [1]. Additionally, as part of our ongoing efforts, we are investigating the correlation between mobility and the number of reported COVID-19 infections, which can further enable us to mitigate the spread. Our work on characterizing mobility during COVID-19 touches upon different areas such as Internet and web measurement studies, mobility analysis and modeling, and analysis of mobile use with ties to geographic locations. In contrast to existing work, the primary goal of this work is to understand and model human mobility by leveraging cellular data connections during COVID-19, and lays the foundation for designing analytics-based tools and models to improve societal and governmental preparedness and response. Due to the recent nature of the pandemic, there is limited work on measurement studies examining the impact of pandemic on different network parameters. Lutu et al. [20] characterize the impact of COVID-19 on mobile network operator traffic and analyze the changes brought upon by the pandemic. Feldman et al. [13] analyze Internet traffic during COVID-19 and find that the overall traffic volume increases by 15-20% within a week of the pandemic. There is also work on measuring the reaction to the pandemic on the Internet and social media [5, 8] . Zakaria et al. [31] analyze the impact of COVID-19 control policies on campus occupancy and mobility via passive WiFi sensing. Trivedi et al. [27] use passive WiFi sensing for network-based contact tracing for infectious diseases, particularly focused on the COVID-19 pandemic. 
Modeling human mobility using cellular network and mobile application data is a problem that has garnered interest in the last decade. Notable examples include predicting human mobility using attentive recurrent neural networks [14], spatio-temporal modeling and prediction using deep neural networks [28], learning to transfer mobility between cities [17], and leveraging cellular network data for understanding fine-grained mobility [12]. Zhang et al. [32] develop a real-time model for human mobility using multi-view learning and Zhu et al. [34] develop a spherical hidden Markov model for understanding human mobility. There is also work on analyzing data in relationship to the geographic locations [11, 25]. Cao et al. [6] conduct measurement studies on the predictability of human movement in a college campus using WLAN measurements. Pattern mining approaches to detect underlying mobility patterns have also been developed [10, 18]. Sadri et al. [24] develop a continuous model to predict user mobility in a day. Nikhat et al. [23] present an analysis of user mobility in cellular networks. There is also work on early detection of gathering events by understanding the traffic flow [33]. There is also work on measurement studies in networks understanding how users transition across networks [30], measuring city-wide signal strength [2], modeling mobility using a mixed queueing network model [9], empirical characterization of mobility of multi-device Internet users [26], and quantitatively evaluating different mobility approaches across different architectures [7]. There is also extensive work on indoor localization and location prediction [3, 22, 29], which are also related to human mobility prediction and analysis. In this section, we describe the cellular network traffic datasets that we collect and use in our analysis.
We collaborate with one of the main cellular network providers in Brazil, TIM Brazil, and collect and analyze cellular network connections from all users using this cellular provider in the city of Rio de Janeiro and its suburbs. Our goal is to analyze and compare cellular network usage data (comprising phone calls, 3G/4G data connections, and text messages) from pre-lockdown, during lockdown, and post-lockdown phases to understand and model human mobility patterns during the COVID-19 pandemic, and evaluate the impact of lockdowns on mobility. We collect and log individual cellular connections made by users to approximately 1400 cellular antennas in and around the city of Rio de Janeiro and its suburbs during each 5-minute interval from March 1, 2020 to July 2, 2020. The data consists of approximately 120 million connection logs for each day during this time period, which encompasses approximately 2 million users per day. Overall, the entire dataset comprises approximately 10 billion connection logs. As Brazil enforced strict lockdown measures from the third week of March to the end of May, the data provides valuable information on human behavior and mobility in the pre-lockdown, during lockdown, and post-lockdown time periods. We note that the cellular network provider has anonymized the data to ensure user privacy. We investigate mobile connection data at two different granularities in our analysis: i) aggregated data measured at the antenna level, corresponding to the total number of connections made to each antenna (aggregated), and ii) anonymized individual connections made by each mobile device to the antennas (individual). While the aggregated data is available for the entire duration of the study, the individual data is only available from April 5, 2020. We present some details and statistics about the datasets to understand them before proceeding to a more detailed analysis. In Table 1, we present some example instances from the aggregated data.
The data represents the number of connections made to a specific antenna at a specific instance in time. For example, on the day 04-26-2020 at the time 00.00.00, there are 262 connections to the antenna located at the coordinates [-20.837028, -43.563111]. In Table 2, we present some example instances from the individual data. Here, each data instance corresponds to a single anonymized user connecting to an antenna at a specific instance in time. Figure 1 shows the total number of connections across the entire duration of our study. Orange vertical lines in the figure separate the different phases in the pandemic: i) the first period from March 1, 2020 to March 16, 2020 represents the period before lockdown, ii) the second period from March 16, 2020 to June 1, 2020 represents the period during lockdown, and iii) the third period from June 1, 2020 to July 1, 2020 represents the period after lockdown when the lockdown measures were eased. The red line captures the average number of connections per week. From the figure, we can see that the total number of connections shows a steadily decreasing trend when lockdown measures are imposed, followed by a period of low connection activity, and then a gradual increase even before lockdown measures are eased. The number of connections continues to increase after the lockdown is lifted, but the numbers are still lower than in the pre-lockdown period. Figure 2 represents the number of distinct users per day from April 5, 2020 to July 2, 2020 in the individual dataset. We again see a reduced number of user connections during the lockdown period when compared to the period after lockdown. From our initial analysis, we observe a discrepancy in the data collection for three days (May 5, 2020, June 21, 2020, and June 22, 2020). We exclude data from these days and then normalize the rest of the week's data for a seven-day period so that it does not interfere with our interpretations of change in mobility due to COVID-19.
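The exclusion and seven-day normalization described above can be illustrated with a minimal sketch. It assumes that a week's total is simply rescaled by 7 divided by the number of usable days; the function name `normalized_weekly_total` and the input layout (a dictionary from date to daily count) are hypothetical and chosen only for illustration.

```python
from datetime import date

# Days reported in the text as having a data-collection discrepancy.
EXCLUDED_DAYS = {date(2020, 5, 5), date(2020, 6, 21), date(2020, 6, 22)}


def normalized_weekly_total(daily_counts):
    """Scale one week's total to a seven-day equivalent.

    daily_counts: dict mapping datetime.date -> count for the days of a
    single week (hypothetical layout). Excluded days are dropped and the
    remaining sum is scaled by 7 / (number of usable days).
    """
    kept = {d: c for d, c in daily_counts.items() if d not in EXCLUDED_DAYS}
    if not kept:
        raise ValueError("no usable days in this week")
    return sum(kept.values()) * 7 / len(kept)
```

Under this assumption, a week that loses one day is scaled by 7/6 and a week with five usable days by 7/5.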
To facilitate a better understanding of our analysis and inferences on human mobility patterns, in this section, we provide background on the socio-economic and geographic distribution of people in the different regions of Rio de Janeiro. To this end, we consider the map in Figure 3 , which depicts the municipality administration categorization of the city and indicates the Social Progress Index (SPI) for each administrative region of Rio de Janeiro. The SPI is a universal performance metric that captures the socio-economic situation [21] . The map illustrates the administrative regions with low, fair, good, and high SPIs that can be correlated to the average quality of life of the population in these regions. This background fundamentally drives our analysis, and provides us with the necessary context to pose important research questions and draw insightful conclusions. Rio de Janeiro, like most major cities in the world, has a significant number of companies and business establishments in the downtown area (indicated on the map as CE) or in the surrounding neighborhoods (PO -Portuaria, RC -Rio Comprido, BO -Botafogo, CO -Copacabana). Additionally, while some of the regions (e.g., BO -Botafogo, CO -Copacabana, LA -Lagoa, TI -Tijuca, BT -Barra da Tijuca) are relatively close to the downtown (i.e., a short driving distance away or easy to access via public transportation), some other regions (such as SC -Santa Cruz, CG -Campo Grande, BA -Bangu, RE -Realengo, JA -Jacarepagua) are pretty far and/or very time consuming to commute to/from downtown daily. Another important detail to note is that people/families with higher socio-economic status usually live near the coast. Administrative regions such as BO -Botafogo, CO -Copacabana, LA -Lagoa, TI -Tijuca, BT -Barra da Tijuca are more expensive and, consequently, have a higher SPI. On the other hand, the lesser privileged population tends to live farther from the coast/downtown. 
The administrative regions such as SC -Santa Cruz, CG -Campo Grande, BA -Bangu, and RE -Realengo have a lower SPI. One important thing to note is that although Jacarepagua (JA) has a high SPI, there are some parts within this region with a lower SPI. One of the larger lower SPI regions within JA is highlighted on the map (known as Cidade de Deus area). In this section, we present analysis on the aggregate and individual data to answer the following questions on user connectivity and mobility: (1) Which antennas/locations correspond to the maximum user connectivity and traffic and how do they vary during the different phases of the pandemic? (2) What percentage of users are mobile/stationary and how does that vary during the different phases of the pandemic? (3) How does user mobility change with the day of the week and how does that vary during the different phases of the pandemic? (4) What antennas in each region attract the maximum number of users in a week and how do they vary during the different phases of the pandemic? We first conduct analysis on aggregate connectivity data and present results on how the connectivity in the top antennas changes across pre-lockdown, during lockdown, and post-lockdown time periods. Figure 4 shows the percentage of traffic supported by the top 10% antennas. We first consider the top 10% antennas before lockdown (solid red line) and the top 10% of antennas during lockdown (dashed blue line), both ordered in decreasing order by the percentage of traffic supported by them. We observe that the top 10% antennas during lockdown overall incur a higher percentage of traffic when compared to the top 10% of antennas before lockdown even though the absolute amount of traffic during lockdown is lower than pre-lockdown (Figure 1).
Now, to understand the difference between the traffic in the antennas before and during lockdown, we consider the same set of antennas that garner the top 10% traffic before lockdown and plot the traffic supported by them during the lockdown period (dotted magenta). We keep the order of antennas here the same as in the solid red line (top 10% before lockdown in decreasing order of traffic) to enable an easier visual comparison. We observe that some antennas incur a significantly lower percentage of traffic during lockdown (the low points in the dotted magenta line). We also observe that some of these antennas forfeit their position in the top 10% during lockdown (marked as red dots in dotted magenta) and other antennas take their place in the top 10% during lockdown (blue stars in the dashed blue line). Having examined the differences in connectivity in the top 10% antennas, we proceed to analyze the locations of these displaced antennas. In Figure 5, red markers represent the antennas that are in the top 10% of antennas before lockdown but not in the top 10% of antennas during lockdown (corresponding to the red dots in Figure 4). Blue markers represent the antennas that emerge in the top 10% during lockdown but are not in the top 10% of antennas before lockdown (corresponding to the blue stars in Figure 4). We see that heavily trafficked antennas during lockdown emerge in the suburbs, e.g., SC and GU regions (blue markers in the left side of the map in Figure 5), because of the mobility restrictions under lockdown. The antennas that get displaced from the top 10% during lockdown are in the downtown regions, e.g., CE and CO in Figure 3 (corresponding to red markers in the right side of the map in Figure 5). This is expected because the majority of people are likely to be working from home during the lockdown period. We perform a similar analysis comparing connectivity before and after lockdown (Figure 6).
We see a similar trend of the top 10% antennas after lockdown (dashed blue) incurring heavier traffic than the top 10% antennas before lockdown (solid red), though the distance between the red and blue lines in Figure 6 is smaller than in Figure 4. We note that, similar to the during lockdown phase, the total traffic post-lockdown is still lower than in the pre-lockdown phase (Figure 1). This shows that even after the lockdown measures are lifted, the traffic pattern still has not returned to normalcy. Similar to the comparison between pre-lockdown and during lockdown, we see changes in the antennas that contribute to the top 10% during the pre-lockdown and post-lockdown periods. Examining the traffic of the antennas in the top 10% before lockdown in the post-lockdown time period (dotted magenta), we observe that antennas get displaced from the top 10% from positions as high as the 20th percentile (red dots in the dotted magenta) and other antennas take their place (blue stars in the dashed blue line). These graphs serve as motivation to perform a finer grained analysis of connectivity and mobility, which we detail in the following sections. Now, analyzing the locations of the displaced antennas in the post-lockdown period, we find that there are a higher number of newly added antennas (blue markers in the map in Figure 7) when compared with the number of new additions to the top 10% during lockdown (blue markers in the left side of the map in Figure 5). When compared with the lockdown period (Figure 5), we observe newly emerging antennas in the CG region, a densely populated region with fair SPI. Interestingly, we observe that four main antennas close to the downtown in CE, CO, and BO regions still do not feature in the top 10% antennas in the post-lockdown period, which suggests that even after the lockdown is lifted, the majority of people are working remotely and are also actively avoiding heavily congested areas.
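The top-10% comparison used in this subsection can be sketched as a pair of set differences. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the input layout (a dictionary from antenna identifier to total traffic in a phase) are hypothetical.

```python
def top_fraction(traffic, fraction=0.10):
    """Return the set of antenna ids carrying the most traffic.

    traffic: dict mapping antenna id -> total connections in one phase
    (hypothetical layout). At least one antenna is always returned.
    """
    k = max(1, round(len(traffic) * fraction))
    ranked = sorted(traffic, key=traffic.get, reverse=True)
    return set(ranked[:k])


def compare_top_sets(before, after, fraction=0.10):
    """Return (displaced, emerging) antenna sets between two phases.

    displaced: in the top set before, but not after (red markers).
    emerging:  in the top set after, but not before (blue markers).
    """
    top_before = top_fraction(before, fraction)
    top_after = top_fraction(after, fraction)
    return top_before - top_after, top_after - top_before
```

With roughly 1400 antennas and `fraction=0.10`, each top set holds 140 antennas, matching the figure quoted in the text.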
Here, we conduct a macroscopic analysis of the mobility of individual users. If a user is only connected to a single antenna, we conclude that the mobility of the user is limited (i.e., the user is primarily indoors; their movement is primarily restricted to the neighborhood where they live). In contrast, if a user is connected to multiple antennas in a day, we conclude that the user is mobile as they must have ventured significantly outside their neighborhood. While it is possible that some users live in an area that is serviced by two antennas, we believe such occurrences are rare and do not significantly alter our findings. Thus, the number of antennas a particular user connects to in a day provides a succinct picture of the mobility of the user. We use the individual data for this analysis. Since this data is only available from April 5, our graphs start from week 6 to synchronize the duration of our analysis with the aggregated data. After calculating the distinct number of antennas for each user per day, we group users based on this number as exhibiting no mobility (i.e., 1 antenna visited), low mobility (i.e., between 2 and 5 antennas visited), medium mobility (i.e., between 6 and 10 antennas visited), and high mobility (i.e., greater than 10 antennas visited). Figure 8a shows the percentage of users who exhibit no/low mobility during the lockdown period. We observe that the number of users with no mobility follows a decreasing trend during the lockdown period from ∼40% to ∼35%, which suggests that more people are venturing out of their homes as the lockdown progresses. Correspondingly, we observe an increase in the percentage of users with medium mobility as the lockdown progresses (Figure 8b). We observe that this trend continues after lockdown as well (Figures 8c and 8d).
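The grouping described above (distinct antennas per user per day, then four mobility classes) can be sketched as follows. This is a minimal illustration that assumes records arrive as `(user_id, day, antenna_id)` tuples; the tuple layout and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def mobility_class(n_antennas):
    """Map a daily distinct-antenna count to the mobility groups in the text."""
    if n_antennas <= 1:
        return "none"    # 1 antenna visited
    if n_antennas <= 5:
        return "low"     # between 2 and 5 antennas
    if n_antennas <= 10:
        return "medium"  # between 6 and 10 antennas
    return "high"        # greater than 10 antennas


def daily_mobility_shares(records):
    """Return {day: {class: percentage of that day's users}}.

    records: iterable of (user_id, day, antenna_id) tuples (assumed layout).
    """
    antennas = defaultdict(set)
    for user, day, antenna in records:
        antennas[(user, day)].add(antenna)

    per_day = defaultdict(Counter)
    for (_user, day), seen in antennas.items():
        per_day[day][mobility_class(len(seen))] += 1

    shares = {}
    for day, counts in per_day.items():
        total = sum(counts.values())
        shares[day] = {cls: 100.0 * n / total for cls, n in counts.items()}
    return shares
```

Repeated connections to the same antenna count once, since only the number of distinct antennas per day matters for the classification.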
Following the medium mobility users during and after lockdown, we observe an increase from < 10% at the start of lockdown in week 6 to approximately 13% in week 17. While the percentage of users in the high mobility group remains almost constant overall during lockdown, we can see a more pronounced increase after lockdown (Figures 8b and 8d). The number of users with low mobility remains approximately the same throughout the lockdown period and after the lockdown (Figures 8a and 8c). One possible reason could be that users from the no mobility group may have transitioned to the low mobility group (no mobility users decrease from 40% to 35%), while a similar percentage of users transition from the low mobility to medium/high mobility groups. Some users in this group may have also "adjusted" to the new normal and adapted their mobility patterns around the pandemic for the duration of our study to keep the total percentage approximately the same. In contrast, our analysis in Figures 8b and 8d reveals that approximately 4% of users (i.e., 80K users) visited more than 10 antennas per day, which suggests high mobility for certain individuals. This high mobility can be attributed to essential workers (e.g., sanitation workers, postal workers, taxi drivers) as well as low-income workers who need to travel far for work to sustain their livelihood and families during these trying times. Additionally, the high mobility could also be attributed to individuals who demonstrate less adherence to lockdown rules. From our analysis, we conclude that while the lockdown reduced the amount of human mobility, a sizable share (approximately 15%) of the population still ventured significantly out of their neighborhood, which could have partially contributed to our failure in containing the spread of COVID-19. Our analysis in the previous subsection demonstrates that a significant number of individuals move across antennas, which indicates high mobility in and around the city.
Therefore, to better understand the mobility, we perform a graph-based mobility analysis. We construct a graph where the nodes/vertices correspond to the antennas (i.e., the graph has approximately 1400 vertices). We parse the individual user data and every time a user switches from one antenna to another antenna (referred to as a mobility event), we increase the weight of the edge between those two vertices by one. The mobility graph constructed in this way transforms user connections into mobility events and presents the opportunity to investigate the overall mobility in Rio and its suburbs at an aggregate level. As we have data loss for May 5, June 21, and June 22, we perform the following pre-processing to ensure fair comparison across weeks. For week 10, we ignore May 5 and scale by a factor of 7/6. For week 17, we ignore June 21 and June 22 and then scale it by 7/5. For week 18, we only have 5 days of data, so we also scale week 18 by 7/5. Figure 9 shows the distribution of the total number of mobility events over weeks. The vertical orange line represents the day when lockdown is eased. We observe from the figure that the overall user mobility in the city starts increasing from about three weeks before the lockdown measures are eased. The increase in mobility continues into the post-lockdown period as well. This finding is consistent with Figures 8a, 8b, 8c, and 8d. We next investigate the impact of the day of the week on the mobility of individuals (Figure 10). To perform this study, we first observe from Figure 9 that the overall mobility pattern remains similar for some weeks. Therefore, we group some weeks together based on their mobility patterns (weeks 6-7, weeks 8-11, weeks 12-13, and weeks 14-18). We then group Monday through Thursday together because they are working days and keep Friday, Saturday, and Sunday as separate days.
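The mobility graph construction described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes records are `(user_id, timestamp, antenna_id)` tuples for a single day, and it treats edges as directed (source antenna to destination antenna), which is an assumption motivated by the use of in-degree to count mobility events. All function names are hypothetical.

```python
from collections import defaultdict


def build_mobility_graph(records):
    """Return {(src_antenna, dst_antenna): weight} for one day of records.

    records: iterable of (user_id, timestamp, antenna_id) tuples (assumed
    layout). Each switch between consecutive, different antennas of the
    same user is one mobility event and adds 1 to the edge weight.
    """
    by_user = defaultdict(list)
    for user, ts, antenna in records:
        by_user[user].append((ts, antenna))

    edges = defaultdict(int)
    for visits in by_user.values():
        visits.sort(key=lambda v: v[0])  # order each user's visits by time
        for (_, prev), (_, cur) in zip(visits, visits[1:]):
            if prev != cur:
                edges[(prev, cur)] += 1
    return dict(edges)


def weighted_in_degree(edges):
    """Return {antenna: total weight of incoming edges}."""
    in_degree = defaultdict(int)
    for (_src, dst), weight in edges.items():
        in_degree[dst] += weight
    return dict(in_degree)
```

Summing the values returned by `weighted_in_degree` gives the total number of mobility events for the day, which is the quantity aggregated per week in the analysis.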
We consider Friday separately because it captures the mobility pattern of a work day during the earlier part of the day and that of a weekend during the later part of the day. For Mon-Thu, we plot the average of the four days. The error bars capture the variation in the number of mobility events. We observe from Figure 10 that the overall mobility is lower on weekends, particularly on Sundays. While one would expect this behavior in a pre-COVID society, we observe that this behavior persists even during lockdown. This additionally suggests that a significant portion of the population still ventures outside for their work during the week and does not have the opportunity to work remotely. Interestingly, we observe that the mobility on Fridays is lower than Mon-Thu during the initial part of the lockdown period and then increases and surpasses Mon-Thu. One plausible explanation is that, despite the rising number of cases, more individuals are gradually relaxing their own adherence to the lockdown measures, going to work during the day on Friday and then participating in recreational and/or social activities in the evening, which results in a higher number of mobility events on Friday in comparison to Mon-Thu. We next investigate the top 10% antennas to identify the differences in the connectivity pattern depending on the day of the week. As Friday is a work day, we observe that there is limited variation in the top 10% antennas, with a difference of 5-8 antennas when compared with the top 10% antennas in Mon-Thu. In comparison, we observe a significantly higher variation in the top 10% antennas over the weekend when compared to the weekdays, with the difference being higher for Sunday in comparison to Saturday. Figures 11 and 12 show the variations in the top 10% antennas for Saturday and Sunday for weeks 8-11, respectively.
The red markers on the map correspond to the antennas that were in the top 10% during Mon-Thu but were replaced by the new antennas shown in pink and green for Saturday and Sunday, respectively. We see that the antennas being replaced from the Mon-Thu group are located in downtown Rio and Duque de Caxias. As expected, the new antennas in the top 10% lie in the more residential areas. There are also similarities in the new locations that emerge in the top 10% on Saturday and Sunday, suggesting that the locations that gather the top 10% of traffic tend to be similar over the weekend, though the amount of activity is higher on Saturday.

To better understand the variation in mobility for the 1400 antennas over the weeks, we first group the antennas according to the municipality classification in Rio as shown in Figure 3. Figure 13 shows the heatmap outlining the variation in the mobility at each antenna in the various regions for weeks 6 through 18. The vertical ticks on the horizontal axis mark the boundary of each region and coincide with the last antenna located in that particular region. The region OUT signifies all the antennas in our dataset that are located in the outskirts or suburban areas of Rio. Therefore, the antennas in OUT may not be geographically close to one another. We observe from Figure 13 that the amount of mobility varies significantly across regions. While regions such as SC and CG show high mobility throughout the lockdown and post-lockdown periods, some regions such as BT and CE show low mobility. The low levels of mobility in the CE region, which is in the downtown area, are consistent with Figures 5 and 7, where we observe that the antennas in the downtown region relinquish their position in the top 10% of antennas in terms of total traffic. This is primarily due to commercial businesses and offices being closed during lockdown and their employees working remotely.
For the other regions, revisiting Figure 3 provides possible context for explaining this difference in mobility. We observe from Figure 3 that SC and CG are regions with low SPI, while BT is a region with high SPI. We hypothesize that due to this socio-economic disparity, people in SC and CG may be compelled to venture out of their homes for work and personal reasons during this challenging time, while people in BT may have the opportunity to stay indoors. In comparison, some areas such as JA contain a mix of low and high mobility antennas. Again, this can be explained by the presence of sub-regions in JA that lie on the extremes of the SPI spectrum (low and high in Figure 3).

We next investigate the antennas that show the highest variation in mobility during the lockdown and post-lockdown periods. To conduct this study, we split the antennas into two groups: antennas whose average number of mobility events is less than 50,000 and antennas whose average number of mobility events is greater than 50,000. Such categorization helps us separately study antennas with high variation in the low mobility and high mobility groups. Figures 14 and 15 show the heatmaps for the top 15 antennas with the highest variation in the low mobility and high mobility groups, respectively. We observe that there are a number of antennas that show significant variation. Such mobility variations could plausibly be attributed to fluctuations in the number of COVID-19 positive cases, government policies, and transitions between remote and on-site work that cause a higher number of individuals to gather in a specific geographical region covered by an antenna.

With many cities around the world going into lockdown again, analyzing mobility and correlating it with the number of infections is a key and promising factor for controlling the spread of the virus.
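The split of antennas into low and high mobility groups and the selection of the 15 most variable antennas in each group, described above, can be sketched as follows. The text does not state how variation is measured, so the population standard deviation of weekly mobility events is used here purely as an assumption; the function name, the input layout, and the handling of an average of exactly 50,000 (placed in the high group) are also hypothetical.

```python
from statistics import mean, pstdev


def rank_by_variation(weekly_events, threshold=50_000, top_n=15):
    """Split antennas by average weekly mobility and rank each group by variation.

    weekly_events maps an antenna id to its list of weekly mobility-event counts
    (hypothetical layout). Variation is measured with the population standard
    deviation, which is an assumption and not the paper's stated metric.
    """
    low_group, high_group = [], []
    for antenna, counts in weekly_events.items():
        entry = (pstdev(counts), antenna)
        if mean(counts) < threshold:
            low_group.append(entry)
        else:
            high_group.append(entry)

    def most_variable(group):
        # Sort by decreasing variation, then by id to keep the order deterministic.
        ordered = sorted(group, key=lambda item: (-item[0], item[1]))
        return [antenna for _, antenna in ordered[:top_n]]

    return most_variable(low_group), most_variable(high_group)


# Toy example with two antennas in each group.
weekly = {
    "x": [10, 10, 10],
    "y": [10, 40, 70],
    "z": [60_000, 90_000, 60_000],
    "w": [70_000, 70_000, 70_000],
}
low_top, high_top = rank_by_variation(weekly, top_n=1)
```

A relative measure such as the coefficient of variation would be an equally plausible choice and would change which antennas are selected.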
Thus, another significant contribution of this work is the development of a visual and interactive tool called COVID-19 Borescope that helps government and municipal administrations better understand the evolution of COVID-19 by analyzing the correlation between people's mobility and the infection data reported in different regions of Rio de Janeiro. The tool is still under development, but it has already been launched and is available to the public (footnote 1). Developing a tool to provide insights from our analysis is challenging because it has to (i) support a huge amount of data, possibly from multiple data sources, (ii) be scalable, (iii) offer the flexibility to integrate new algorithms and data sources, and (iv) be user friendly and intuitive for end users. The details of our tool are described in the following sections.

To address the challenges involved in the development of a graphical and interactive system that performs intelligent data analysis of visually selected geo-temporal subsets of the collected information, COVID-19 Borescope is supported by a robust underlying architecture. As shown in Figure 17, the architecture consists of three servers, two in the back end and one in the front end. At the front end, we have NginX (footnote 2) running as the application server. At the back end, we have two data servers, one to store the data received from the cellular network provider and another to store the data obtained from the Open Data repository provided by the Brazilian ministry of health (footnote 3). The application server processes requests received from the Web User Interface, forwards the processed queries to the appropriate back-end server(s), and waits for the response(s). Once the NginX application server receives the response(s) from the data server, it either performs some post-processing or directly forwards the results to the end user. In the former case, the post-processing scripts are triggered.
The tool is flexible enough to include external calls to machine learning algorithms. In the current version, as part of our ongoing research efforts, we are using this function to analyze the correlation between mobility and infection cases in different regions of the city. After the post-processing finishes, the results are sent back to the end user and presented graphically on the web interface.

For the data server, the tool uses a new data structure, which is an optimized variation of Nanocubes [19]. A full description of this data structure is beyond the scope of this paper and will be the subject of a future publication. However, it is important to mention that the data structure is an in-memory database and, as is the case with any Datacube structure [16], it is specialized to perform statistical geo-temporal queries in an efficient manner with coordinates organized as QuadTrees [15], offering low response time for queries and moderate memory usage. The data structure uses a JSON-based language that emulates a simplified SQL syntax to retrieve data. It offers the traditional "select", "where", and "group by" statements to select, filter, and group/fold data, respectively. The outcome of a query is a time series that is forwarded to the end user by the NginX framework, before and/or after being submitted to post-processing functions.

1. Accessible at: http://gwrec.cloudnext.rnp.br:57074/
2. NginX: http://nginx.org/
3. Brazilian COVID-19 OpenData: https://opendatasus.saude.gov.br/dataset

Figure 16 provides an overview of the web user interface, which is used for visualization and interactive analysis. We briefly describe the interface. On the top left, the interface provides the option to select the region(s) of the map the user wants to analyze. At the bottom of the interface, the graph shows the evolution of the number of connections over time for the selected area of the map.
On the right-hand side, the "Total" option shows the histogram of COVID-19 reported cases, which includes the number of recovered, active, and total cases, and deaths during the selected time period. The "Correlation" option shows the result of the correlation analysis. On the top right, the control panel allows users to zoom in and out, navigate different time periods, and filter city regions by name. The pop-up button over the selected neighborhood provides a summary of the COVID-19 numbers for that specific region of the city.

In this paper, we presented a large-scale analysis of human mobility during a crucial stage of the COVID-19 pandemic in Rio de Janeiro and its suburbs, based on cellular network connection logs from one of the main cellular network providers in Brazil, TIM Brazil. Our analysis employs aggregate and individual data on cellular connections from three phases in the first wave of the COVID-19 pandemic (pre-lockdown, during lockdown, and post-lockdown) and draws important conclusions on the impact of lockdown on mobility. Overall, our research revealed that while lockdowns reduced the amount of human mobility, a high proportion (approximately 15%) of the population still ventured significantly out of their neighborhood, which could have partially contributed to the failure to contain the spread of COVID-19. Since our analysis is based on large-scale data from one of the most populous cities of the world, our analysis and the resulting conclusions can potentially have positive implications for understanding mobility and designing lockdowns in other cities in future waves of the COVID-19 pandemic or other future events of a similar nature. With COVID-19 still surging in many countries and cities of the world, we believe our analysis and conclusions can potentially help in the effective management of the pandemic.

Our work opens up avenues for several important research directions.
One immediate next step of our study involves studying the correlation between the mobility of users and infection rates. A fine-grained understanding of this correlation would be helpful in designing region-specific lockdowns rather than a one-size-fits-all solution, which is challenging for governments to enforce and hard for people to adhere to. Another potential direction involves applying and designing more sophisticated mobility models to understand the patterns more effectively [14, 32, 34]. Here, one idea is to study traffic flow patterns to identify bottlenecks and suggest alternate, less congested routes and times that can spread the mobility and reduce overcrowding in populous and heavily trafficked areas that have a surge in infection rates. We also plan to develop and integrate mobility prediction models in this effort so that appropriate actions can be taken before a surge in infections occurs. We will continue to integrate our analysis and findings in the COVID-19 Borescope, which presents us with the perfect environment to expand the visibility and utility of our research and potentially help convert it into actionable policies or self-awareness for people.

[1] Minutes. 2020. 60 Minutes With Dr. Anthony Fauci
[2] City-Wide Signal Strength Maps: Prediction with Random Forests
[3] Hapi: A Robust Pseudo-3D Calibration-Free WiFi-based Indoor Localization System
[4] Is the lockdown important to prevent the COVID-19 pandemic? Effects on psychology, environment and economy-perspective
[5] How the Internet reacted to Covid-19
[6] On human mobility predictability via WLAN logs
[7] A cross-architectural quantitative evaluation of mobility approaches
[8] Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set
[9] A mixed queueing network model of mobility in a campus wireless network
[10] Mining pattern similarity for mobility prediction in location-based social networks
[11] CellRep: Usage Representativeness Modeling and Correction Based on Multiple City-Scale Cellular Networks
[12] Modeling Fine-Grained Human Mobility on Cellular Networks
[13] The Lockdown Effect: Implications of the COVID-19 Pandemic on Internet Traffic
[14] Deepmove: Predicting human mobility with attentional recurrent networks
[15] Quad trees a data structure for retrieval on composite keys
[16] Data cube: a relational aggregation operator generalizing GROUP-BY, CROSS-TAB, and SUB-TOTALS
[17] What is the Human Mobility in a New City: Transfer Mobility Knowledge Across Cities
[18] Joint mobility pattern mining with urban region partitions
[19] Nanocubes for Real-Time Exploration of Spatiotemporal Datasets
[20] A Characterization of the COVID-19 Pandemic Impact on a Mobile Network Operator Traffic
[21] SOCIAL PROGRESS INDEX 2016
[22] Can you find me now? Evaluation of network-based localization in a 4G LTE network
[23] An Analysis of User Mobility in Cellular Networks
[24] What will you do for the rest of the day? an approach to continuous trajectory prediction
[25] Urban Vibes and Rural Charms: Analysis of Geographic Diversity in Mobile Service Usage at National Scale
[26] Empirical Characterization of Mobility of Multi-Device Internet Users
[27] Rajesh Balan, and Prashant Shenoy. 2020. WiFi-Trace: Network-based Contact Tracing for Infectious Diseases Using Passive WiFi Sensing
[28] Spatiotemporal modeling and prediction in cellular networks: A big data enabled deep learning approach
[29] CSI-based fingerprinting for indoor localization: A deep learning approach
[30] Measurement and modeling of user transitioning among networks
[31] Prashant Shenoy, and Rajesh Balan. 2020. Analyzing the Impact of Covid-19 Control Policies on Campus Occupancy and Mobility via Passive WiFi Sensing
[32] Real-time human mobility modeling with multi-view learning
[33] A traffic flow approach to early detection of gathering events
[34] A spherical hidden Markov model for semantics-rich human mobility modeling