key: cord-0901441-fmf3a7b8
authors: Zhang, L.; Ghader, S.; Pack, M. L.; Xiong, C.; Darzi, A.; Yang, M.; Sun, Q.; Kabiri, A.; Hu, S.
title: AN INTERACTIVE COVID-19 MOBILITY IMPACT AND SOCIAL DISTANCING ANALYSIS PLATFORM
date: 2020-05-05
journal: nan
DOI: 10.1101/2020.04.29.20085472
sha: 0da30325190e191b9888ad2660f4de0a24fc7053
doc_id: 901441
cord_uid: fmf3a7b8

The research team has utilized privacy-protected mobile device location data, integrated with COVID-19 case data and census population data, to produce a COVID-19 impact analysis platform that can inform users about the effects of COVID-19 spread and government orders on mobility and social distancing. The platform is updated daily to continuously inform decision-makers about the impacts of COVID-19 on their communities through an interactive analytical tool. The research team has processed anonymized mobile device location data to identify trips and produced a set of variables including a social distancing index, the percentage of people staying at home, visits to work and non-work locations, out-of-town trips, and trip distance. The results are aggregated to the county and state levels to protect privacy and scaled to the entire population of each county and state. The research team is making its data and findings, which are updated daily and go back to January 1, 2020 for benchmarking, available to the public in order to help public officials make informed decisions. This paper presents a summary of the platform and describes the methodology used to process data and produce the platform metrics.

Informed decision-making requires data. No previous pandemic in modern history has had such a large and universal impact on societies as COVID-19; as a result, historical data lack key information on how people react to such a pandemic and how the virus affects economies and societies. Data-driven decision-making therefore becomes a challenge in such an unprecedented event. Thanks to technology, we now have an enormous amount of observed data collected by mobile devices amid the pandemic. We can utilize these data to learn more about the various impacts of a pandemic on our lives, make informed decisions to fight the current invisible enemy, and be better prepared the next time such a pandemic happens. Our research team has utilized a national set of privacy-protected mobile device location data and produced a COVID-19 Impact Analysis Platform to provide comprehensive data and insights on COVID-19's impact on mobility, economy, and society.

Mobile device location data are becoming popular for studying human behavior, especially mobility behavior. Earlier studies with mobile device location data mainly used GPS technology, which is capable of recording accurate information including location, time, speed, and possibly a measure of data quality [1]. Later, mobile phones and smartphones gained popularity, as they enabled researchers to study individual-level mobility patterns [2-4]. Other emerging mobile device location data sources such as call detail records (CDR) [5-7], cellular network data [8], and social media location-based services [9-13] have also been used by researchers to study mobility behavior. Mobile device location data have proved to be a great asset for decision-makers amid the current COVID-19 pandemic. Companies such as Google, Apple, and Cuebiq have already utilized location data to produce valuable information about mobility and economic trends [14-16]. Researchers have also utilized mobile device location data to study COVID-19-related behavior [17,18].

Non-pharmaceutical interventions such as social distancing are important and effective tools for preventing virus spread. One recent study projected, based on pharmaceutical estimates for COVID-19 and other coronaviruses, that recurrent outbreaks might be observed this winter, so prolonged or intermittent social distancing may be required until 2022 in the absence of other interventions [19]. This highlights the importance of improving our understanding of how individuals react to social distancing. Researchers have highlighted the importance of social distancing in disease prevention through modeling and simulation [20-23]. These simulation models assume a level of compliance, which can now be validated against observed data. Our current platform utilizes mobile device location data to provide observed data and evidence on social distancing behavior and the impact of COVID-19 on mobility. We used daily feeds of mobile device location data, representing movements of more than 100 million anonymized devices, integrated with COVID-19 case data from Johns Hopkins University and census population data to monitor mobility trends in the United States and study social distancing behavior [24]. In the next section, we describe the methodology used to process the anonymized location data and produce the metrics that are available on the platform. The methodology section is followed by a brief overview of the platform. The last section presents concluding remarks.

All rights reserved. No reuse allowed without permission.

medRxiv preprint, https://doi.org/10.1101/2020.04.29.20085472; this version posted May 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The research team first integrated and cleaned location data from multiple sources representing person and vehicle movements in order to improve the quality of our mobile device location data panel. We then clustered the location points into activity locations and identified home and work locations at the census block group (CBG) level to protect privacy. We examined both temporal and spatial features for the entire activity location list to identify home CBGs and work CBGs for workers with a fixed work location. Next, we applied previously developed and validated algorithms [25] to identify all trips from the cleaned data panel, including trip origin, destination, departure time, and arrival time. Additional steps were taken to impute missing information for each trip, such as trip purpose (e.g., work, non-work), point-of-interest visited (restaurants, shops, etc.), travel mode (air, rail, bus, driving, biking, walking, and others), trip distance (airline and actual distance), and socio-demographics of the travelers (income, age, gender, race, etc.) using advanced artificial intelligence and machine learning algorithms. If an anonymized individual in the sample did not make any trip longer than one mile, that individual was considered to be staying at home. A multi-level weighting procedure expanded the sample to the entire population, using device-level and trip-level weights, so the results are representative of the entire population in a nation, state, or county. The data sources and computational algorithms have been validated against a variety of independent datasets such as the National Household Travel Survey and the American Community Survey, and peer reviewed by an external expert panel in a U.S. Department of Transportation Federal Highway Administration Exploratory Advanced Research Program project titled "Data analytics and modeling methods for tracking and predicting origin-destination travel trends based on mobile device data" [25]. Mobility metrics were then integrated with COVID-19 case data, population data, and other data sources. Figure 1 shows a summary of the methodology.
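To make the staying-at-home definition and the role of device-level weights concrete, the following minimal Python sketch computes a weighted percentage of people staying home. It is an illustration under stated assumptions, not the platform's implementation: the `DeviceDay` record, its field names, and the sample weights are hypothetical, and only the one-mile rule comes from the text.

```python
from dataclasses import dataclass, field
from typing import List

METERS_PER_MILE = 1609.344


@dataclass
class DeviceDay:
    """One anonymized device on one day (hypothetical record layout)."""
    weight: float  # assumed device-level expansion weight
    trip_distances_m: List[float] = field(default_factory=list)


def stays_home(device: DeviceDay, threshold_m: float = METERS_PER_MILE) -> bool:
    # Staying home: the device made no trip longer than the threshold (one mile).
    return all(d <= threshold_m for d in device.trip_distances_m)


def percent_staying_home(devices: List[DeviceDay]) -> float:
    # Weighted share of devices staying home, expressed as a percentage.
    total = sum(d.weight for d in devices)
    if total <= 0:
        raise ValueError("total device weight must be positive")
    at_home = sum(d.weight for d in devices if stays_home(d))
    return 100.0 * at_home / total


# Made-up sample: weights stand in for the number of people each device represents.
sample = [
    DeviceDay(weight=100.0),                                   # no trips at all
    DeviceDay(weight=300.0, trip_distances_m=[5000.0]),        # one 5 km trip
    DeviceDay(weight=100.0, trip_distances_m=[400.0, 900.0]),  # short trips only
]
print(percent_staying_home(sample))  # 200 of 500 weighted people -> 40.0
```

Weighting by device rather than counting devices means that an under-sampled county still contributes in proportion to its population.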


Trips are the unit of analysis for almost all transportation applications. Traditional data sources such as travel surveys include trip-level information. Mobile device location data, on the other hand, do not directly include trip information. Location sightings can be continuously recorded while a device moves, stops, stays static, or starts a new trip, and these changes in status are not recorded in the raw data. As a result, researchers must rely on trip identification algorithms to extract trip information from the raw data; that is, they must identify which locations together form a trip. The following subsections describe the steps our research team took to identify trips. The algorithm runs on the observations of each device separately.

2.1.1. Pre-Processing

First, all device observations are sorted by time. The trip identification algorithm assigns a hashed ID to every trip it identifies. The location dataset may include many points that do not belong to any trip. The algorithm assigns "0" as the trip ID to these points to identify them as static points. For every observation, we compute the distance, time, and speed between the point and its previous and next points, if they exist.
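The pre-processing step can be sketched as follows. This is a minimal illustration, assuming observations arrive as dictionaries with a timestamp in seconds and WGS84 coordinates; the function names, the field names (`dist_prev`, `speed_next`, and so on), and the use of the haversine formula are assumptions made for the example, not details given in the paper.

```python
import math

EARTH_RADIUS_M = 6371000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def preprocess(observations):
    """Sort one device's observations by time and attach neighbor attributes.

    Each observation is a dict with keys 't' (seconds), 'lat', and 'lon'.
    """
    pts = sorted((dict(o) for o in observations), key=lambda o: o["t"])
    for i, p in enumerate(pts):
        p["trip_id"] = "0"  # every point starts out as a static point
        for label, j in (("prev", i - 1), ("next", i + 1)):
            if 0 <= j < len(pts):
                q = pts[j]
                dist = haversine_m(p["lat"], p["lon"], q["lat"], q["lon"])
                dt = abs(q["t"] - p["t"])
                p[f"dist_{label}"] = dist
                p[f"dt_{label}"] = dt
                # Speed is undefined for duplicate timestamps; None marks that case.
                p[f"speed_{label}"] = dist / dt if dt > 0 else None
            else:
                # First point has no previous neighbor, last point has no next one.
                p[f"dist_{label}"] = p[f"dt_{label}"] = p[f"speed_{label}"] = None
    return pts


obs = [
    {"t": 20, "lat": 0.0, "lon": 0.002},
    {"t": 0, "lat": 0.0, "lon": 0.0},
    {"t": 10, "lat": 0.0, "lon": 0.001},
]
for point in preprocess(obs):
    print(point["t"], point["dist_prev"], point["speed_next"])
```

Computing both the backward and forward attributes once per point lets the later trip identification pass make each decision from a single record.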

The trip identification algorithm has three hyper-parameters: distance threshold, time threshold, and speed threshold.

This algorithm checks every point to identify whether it belongs to the same trip as its previous point. If it does, it is assigned the same trip ID. If it does not, it is either assigned a new hashed trip ID (when its "speed from" ≥ speed threshold) or its trip ID is set to "0" (when its "speed from" < speed threshold). Whether a point belongs to the same trip as its previous point is determined from the point's "speed to", "distance to", and "time to" attributes. If a device is seen at a point with "distance to" ≥ distance threshold but is not observed to move there ("speed to" < speed threshold), the point does not belong to the same trip as its previous point. When the device is on the move at a point ("speed to" ≥ speed threshold), the point belongs to the same trip as its previous point; but when the device stops, the algorithm checks the radius and dwell time to identify whether the previous trip has ended. If the device stays at the stop (points should be closer than the distance threshold) for a period shorter than the time threshold, the points still belong to the previous trip. When the dwell time rises above the time threshold, the trip ends, and the next points no longer belong to the same trip. The algorithm does this by updating "time from" to be measured from the first observation in the stop, not the point's previous point.

The algorithm may identify a local movement as a trip if the device moves within a stay location.
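As an illustration only, the point-by-point rule described in this subsection can be sketched in Python. This is a simplified reading, not the validated implementation: planar coordinates in meters, the threshold defaults (300 m, 300 s, 1.4 m/s), the SHA-1-based trip ID, the function name `identify_trips`, and the assumption that a new trip starts when the speed toward the next observation reaches the speed threshold are all introduced for the example.

```python
import hashlib
import math


def identify_trips(points, device_id="dev", dist_thr=300.0, time_thr=300.0, speed_thr=1.4):
    """Assign a trip ID to each (t_seconds, x_m, y_m) observation of one device.

    Returns the trip IDs in time order; "0" marks a static point.
    All threshold values are placeholders, not values from the paper.
    """
    pts = sorted(points)
    n = len(pts)

    def leg(i, j):
        # Distance (m), elapsed time (s), and speed (m/s) from observation i to j.
        d = math.hypot(pts[j][1] - pts[i][1], pts[j][2] - pts[i][2])
        dt = pts[j][0] - pts[i][0]
        return d, dt, (d / dt if dt > 0 else 0.0)

    ids = []
    current = "0"
    stop_start = None  # index of the first observation of the current stop
    for i in range(n):
        same_trip = False
        if i > 0 and current != "0":
            dist_to, _, speed_to = leg(i - 1, i)
            if speed_to >= speed_thr:
                # The device is on the move: the point continues the trip.
                same_trip, stop_start = True, None
            elif dist_to < dist_thr:
                # The device has stopped: measure dwell time from the start of the stop.
                if stop_start is None:
                    stop_start = i - 1
                same_trip = (pts[i][0] - pts[stop_start][0]) < time_thr
            # Otherwise the device jumped far without observed movement: not the same trip.
        if not same_trip:
            stop_start = None
            speed_from = leg(i, i + 1)[2] if i + 1 < n else 0.0
            if speed_from >= speed_thr:
                # Start a new trip with a hashed ID (assumed hashing scheme).
                key = f"{device_id}:{pts[i][0]}".encode()
                current = hashlib.sha1(key).hexdigest()[:12]
            else:
                current = "0"
        ids.append(current)
    return ids


track = [(0, 0, 0), (600, 0, 0), (660, 600, 0), (720, 1200, 0),
         (780, 1200, 0), (1200, 1200, 0), (1800, 1200, 0)]
print(identify_trips(track))
```

In this made-up track the device is static, drives 1,200 m, and then dwells; the points observed during the first minutes of the dwell still carry the trip ID, and the ID reverts to "0" once the dwell time passes the time threshold.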

To filter out such trips, all trips that are within a static cluster and all trips that are shorter than 300 meters are removed.

We first identify all activity points.

non-static clusters based on time and speed checks. After finalizing the potential stay clusters, the framework combines nearby clusters to avoid splitting a single activity (Figure 4).

Figure 4. Activity clustering methodology

2.2.2. Home and work CBG Identification

Figure 5 shows the methodology for home and work CBG identification. Instead of setting a fixed time period for each type, e.g., 8 pm to 8 am as the study period for home CBG identification and the other half of the day for work CBG identification, the framework examines both temporal and spatial features for the entire activity location list. The benefits are two-fold: the results for workers with flexible or opposite work schedules are more accurate, and the employment type of each device can be detected simultaneously. Figure 6 shows the validation of home and work location imputations, comparing the home-to-work distance between Longitudinal Employer-Household Dynamics (LEHD) data and the imputed locations for a set of mobile device location data for the Baltimore metropolitan area. We can observe a satisfactory match.

Figure 5. Home/work CBG imputation methodology


to be able to make sufficient generalizations using a multi-layer DNN and capture the exceptions using the wide single-layer model. The datasets used for training the model were collected from the incenTrip mobile phone app [28], developed by the authors, where ground truth information for car, bus, rail, bike, walk, and air trips was collected. Feature construction is critical to detecting the travel mode of each trip effectively, and travel mode-specific knowledge is needed to improve the detection accuracy. In addition to the traditional features used in the literature (e.g., average speed, maximum speed, trip distance), we also integrated multi-modal transportation network data to construct innovative features that improve the detection accuracy. The wide and deep learning method utilized in this study achieved over 95% prediction accuracy for drive, rail, air, and non-motorized modes, and over 90% for the bus mode, on test data. We have applied the trained algorithms to the location dataset to obtain multimodal trip rosters (see Figure 7 that shows raw location data points by different travel modes


In the Home and work CBG Identification section, we described how home and work CBGs can be identified. Other trip purposes can be identified directly through a spatial join of trip-end locations and point of interest (POI) data. We have used a popular commercial POI dataset that includes more than forty million records for the U.S. For each trip end, we first select all POIs that are located within a 200-meter radius of the trip end. Next, we identify the trip purpose from the POI type of the closest POI.

2.3.3. Socio-Demographic Imputation
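The nearest-POI rule for trip purpose described above (select POIs within 200 meters of the trip end, then take the type of the closest one) can be sketched as follows. This is a minimal brute-force illustration; the function names, the tuple layout of a POI record, and the sample coordinates and POI types are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def trip_purpose(trip_end, pois, radius_m=200.0):
    """Return the type of the closest POI within radius_m of the trip end, or None.

    trip_end is (lat, lon); pois is an iterable of (lat, lon, poi_type).
    """
    best = None  # (distance, poi_type) of the closest candidate so far
    for lat, lon, poi_type in pois:
        d = haversine_m(trip_end[0], trip_end[1], lat, lon)
        if d <= radius_m and (best is None or d < best[0]):
            best = (d, poi_type)
    return best[1] if best else None


# Made-up POIs roughly 56 m, 111 m, and 1.1 km north of the trip end.
pois = [
    (39.0010, -76.9, "shop"),
    (39.0005, -76.9, "restaurant"),
    (39.0100, -76.9, "school"),
]
print(trip_purpose((39.0, -76.9), pois))
```

A linear scan is adequate for a handful of records; with a POI table of tens of millions of rows, a spatial index or grid lookup would be the more practical choice.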

Due to privacy concerns, mobile device location data contain very little ground truth information about the device owners. However, it is essential to understand how representative the sample is and how different segments of the population travel. The state-of-the-practice method is to assign either the census population socio-demographic distribution or public use microdata sample (PUMS) units to the sample devices within the same geographic area, based on the imputed home locations. More advanced socio-demographic imputation methods utilize travel patterns and visited POI types to impute the socio-demographics. These methods require a significant amount of computation, as various features from different databases must be calculated and used. In order to limit computation and conduct a timely analysis for the pandemic, we have used the state-of-the-practice method and assigned socio-demographic information to the anonymized devices based on the census socio-demographic distribution of their imputed home CBG. Five-year American Community Survey (ACS) estimates for 2014 to 2018 from the U.S. Census Bureau can be used to obtain the median income, age distribution, gender distribution, and race distribution for each U.S. CBG [29]. For each device, we used Monte-Carlo simulation [30] to draw from the age, gender, and race distributions at the device's imputed home CBG. We also assigned the CBG's median income to the device.

2.4. Weighting
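The Monte-Carlo assignment of socio-demographics described in Section 2.3.3 can be sketched as a weighted random draw per attribute. This is an illustration under stated assumptions: the dictionary layout of a CBG profile, the category labels, and the shares below are made up, and the attributes are drawn independently of one another, which the paper does not specify.

```python
import random


def impute_demographics(cbg_profile, rng):
    """Draw age, gender, and race for one device from its home CBG's distributions.

    cbg_profile maps each attribute to {category: share} and also carries
    'median_income'. rng is a random.Random instance, passed in for reproducibility.
    """
    drawn = {}
    for attr in ("age", "gender", "race"):
        categories = list(cbg_profile[attr].keys())
        shares = list(cbg_profile[attr].values())
        # random.choices performs a weighted draw; shares need not sum to 1.
        drawn[attr] = rng.choices(categories, weights=shares, k=1)[0]
    # Income is not drawn: the CBG's median income is assigned directly.
    drawn["income"] = cbg_profile["median_income"]
    return drawn


# Hypothetical CBG profile with made-up shares.
cbg = {
    "age": {"under 18": 0.22, "18-64": 0.62, "65+": 0.16},
    "gender": {"female": 0.51, "male": 0.49},
    "race": {"group A": 0.55, "group B": 0.30, "group C": 0.15},
    "median_income": 65000,
}
print(impute_demographics(cbg, random.Random(42)))
```

Repeating the draw over all devices in a CBG reproduces the census marginals in aggregate, even though any single device's assigned attributes are only a plausible guess.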

The sample data need to be weighted to represent population-level statistics. First, the devices available in our dataset are a sample of all individuals in the population, so we need to apply device-level weights. Second, for an observed device, only a sample of all trips may be recorded, so trip-level weights are also needed. For the sake of timeliness, we have applied simple weighting methods to obtain county-level device weights and state-level trip weights. To obtain device-level weights, we have used the home county, obtained from the imputed home CBG information.

research findings available to other researchers, agencies, non-profits, media, and the general public. The platform will evolve and expand over time as new data and impact metrics are computed and additional visualizations are developed. Table 1 shows the current metrics available in the platform at the national, state, and county levels in the United States with daily updates. Figure 8 illustrates the platform.


The integrated dataset compiled by our research team shows how the nation and different states and counties are impacted by COVID-19 and how communities are conforming with the social distancing and stay-at-home orders issued to prevent the spread of the virus. The platform utilizes privacy-protected anonymized mobile device location data integrated with healthcare system data and population data to assign a social distancing score to each state and county based on derived information such as the percentage of people who are staying home, the average number of trips per person, and the average distance traveled by each person. As next steps, the research team is integrating socio-demographic and economic data into the platform to study the multifaceted impact of COVID-19 on our mobility, health, economy, and society.

ACKNOWLEDGMENT

We would like to thank and acknowledge our partners and data sources in this effort: (1) Amazon Web Services and its Senior Solutions Architect, Jianjun Xu, for providing cloud computing and technical support; (2) computational algorithms developed and validated in a previous USDOT Federal Highway Administration Exploratory Advanced Research Program project; (3) mobile device location data provider partners; and (4) partial financial support from the U.S. Department of Transportation's Bureau of Transportation Statistics and the National Science Foundation's RAPID Program.


Simulation suggests that rapid activation of social distancing can arrest epidemic development due to a novel strain of influenza

Does Social Distancing Matter?

Estimating the scale of COVID-19 Epidemic in the United States: Simulations Based on Air Traffic directly from Wuhan

An interactive web-based dashboard to track COVID-19 in real time

Data Analytics and Modeling Methods for Tracking and Predicting Origin-Destination Travel Trends Based on Mobile Device Data. (Federal Highway Administration Exploratory Advanced Research Program

Deep learning

An integrated and personalized traveler information and incentive scheme for energy efficient mobility systems

Proceedings 21st International Cartographic Conference

Monte carlo methods