key: cord-0814362-kr8kjao6
authors: Silva, Julio Cezar Soares; de Lima Silva, Diogo Ferreira; Delgado Neto, Afonso de Sá; Ferraz, André; Melo, José Luciano; Ferreira Júnior, Nivan Roberto; de Almeida Filho, Adiel Teixeira
title: A city cluster risk-based approach for Sars-CoV-2 and isolation barriers based on anonymized mobile phone users' location data
date: 2020-11-05
journal: Sustain Cities Soc
DOI: 10.1016/j.scs.2020.102574
sha: 388dbb3bf7b2de73e8f12445b2a104893b34707b
doc_id: 814362
cord_uid: kr8kjao6

Given the recent outbreak of Sars-CoV-2, several countries started to seek different strategies to control contamination and minimize fatalities, which are usually the primary objectives for all strategies. Secondary objectives are related to economic factors, ensuring that society is able to keep its essential activities and avoid supply disruptions. This paper presents an application of anonymized mobile phone users' location data to estimate population flow amongst cities with an origin-destination matrix. The work includes a clustering analysis of cities, which may enable policymakers (and epidemiologists) to develop public policies giving the appropriate consideration to each set of cities within a Province or State. Risk measures are included to analyze the severity of the spread among the clusters, which can then be ranked. Intelligence can be obtained from the analysis, and some clusters could be isolated to avoid contagion while keeping their economic activities. This analysis is reproducible for other states of Brazil and other countries and can be adapted for districts within a city, especially considering the possibility of a second wave of the COVID-19 pandemic.

The rapid worldwide spread of the novel COVID-19 calls for studies regarding the causes, the current global/local situation, the effects of this epidemic disease, and the possible solutions on both socio-economic and health aspects.
Intending to contain the advance of this pandemic, several governments have imposed social restrictions on their populations, including policies such as quarantine and lockdowns (Gao & Yu, 2020; Yang et al., 2020). In parallel, the academic discussion on the impact of border/travel control, travelers targeting, and local governments' responses advances (Anzai et al., 2020; Shrivastava & Shrivastava, 2020; Sirkeci & Yucesahin, 2020; Wells et al., 2020). In Brazil, the Supreme Federal Court authorized governors and mayors to determine the rules that should be followed by the citizens of their respective states and cities (Brígido, 2020). This decision results in the need for a local analysis for each state to justify its position. Moreover, the potential of coronaviruses to result in large case clusters via super spreading (Wu et al., 2020), such as the catastrophic situation caused by the novel COVID-19 in the Lombardy region of Italy (Giordano et al., 2020; Sebastiani et al., 2020), where more than 70,000 cases were reported, and worries regarding a second wave of infection call attention to the importance of studies focused on microregion scenarios and economic recovery in this context (Hart & Halden, 2020).

In this paper, the state of Pernambuco, located in northeastern Brazil, is analyzed. To date, Pernambuco is among the states with the highest number of deaths caused by COVID-19 in Brazil, 845 (Ministério da Saúde, 2020). Furthermore, about 98% of its intensive care units are occupied, and a crash of the health system is near. Real anonymized data regarding movements between 192 cities located in Pernambuco from 2020-01-01 to 2020-03-31 were collected and used as input for a network model. From the obtained population mobility network, weighted directed graphs were considered, which enabled us to analyze data with effects prior to and after the first isolation policies adopted.
Therefore, the contributions of this paper are threefold. First, it discusses the COVID-19 outbreak within one of the most affected states of Brazil. Second, it presents a clustering analysis based on real data collected from mobile phone location, used to describe the dynamics of travels made by phone users between pairs of cities located in Pernambuco. Plots of the clusters give a visual illustration of how cities interact with each other. Third, this paper uses the statistics of the cities' population and the number of COVID-19 active cases to calculate risk measures that can be used by authorities when deciding on the inclusion/exclusion of isolation barriers. Thus, the paper contributes by proposing the integration of anonymized mobile location data to estimate mobility patterns through an origin-destination matrix. The bi-directional mobility graph is obtained to enable a persistence analysis of the network to find natural clusters of cities. Then, an analysis is provided integrating a traditional epidemic measure, the Force of Infection (Keeling & Rohani, 2008), and a proposed Risk Exposure measure considering each node's strength and the actual epidemic data provided by official sources. This enables insights such as those presented in Section 3.1.2 and Section 4, supporting the decisions of policymakers and epidemiologists when establishing isolation protocols.

The paper is organized into five sections. Section 1 introduces the topics discussed in the paper. Section 2 presents how the data were collected and the methods used to analyze them. The results of a case study are reported and discussed in Section 3, followed by a discussion on the insights and limitations in Section 4. Finally, Section 5 concludes the paper and presents some suggestions for the continuity of this work.

Mobile location data has been included as one of the main digital tools/technologies used for combating COVID-19 in a recent review (Budd et al., 2020).
In this review, Budd et al. (2020) explain that the use of location data mainly addresses a public-health need for interruption of community transmission. The effort of large global technology companies such as Google and Apple in generating and making available community mobility reports to aid the fight against COVID-19 highlights the importance of this kind of study (Apple, 2020; Google, 2020). Table 1 presents a comparison between this work and the state-of-the-art studies involving COVID-19 and mobile phone location data. The effect of travel restrictions in Wuhan to prevent or delay case importations in other Chinese cities as well as internationally was discussed in Chinazzi et al. (2020), where Baidu mobility data was used. Kraemer et al. (2020) also used real-time mobility data from Wuhan in their study, where the impact of control measures in the region was analyzed. Zhang et al. (2020b) used contact-tracing data from one of the largest operators in China to investigate the impact of age difference in COVID-19 transmission and the modification of mixing patterns due to social distancing. Jia et al. (2020b) studied the effect of the quarantine in ceasing mobility and showed that the outflow distribution from Wuhan predicts where infections by COVID-19 mostly occur. The authors also developed a spatio-temporal risk model to identify regions with a high risk of infection at an early stage. Zhou et al. (2020) used anonymized mobile phone data as an input for an SEIR model to study COVID-19 evolution dynamics when different mobility restrictions were imposed. Hu et al. (2020) studied the correlation between COVID-19 spatiotemporal data and two categories of location-based services data from mobile devices to understand the effect of human movement on the COVID-19 spread. Mobility data was also used to understand the COVID-19 spread and mitigation strategies in other countries. Pepe et al.
(2020) used mobile location data provided by Cuebiq Inc to develop individual proximity and mobility metrics to study the impact of the social distancing measures imposed by the Italian government on the reduction of the COVID-19 spread. Aleta et al. (2020) used mobile location and census data to build agent-based models of Boston's metropolitan area to understand the impact of strategies to relax social distancing. They used compartmental models to study the evolving dynamics of COVID-19. Hill et al. (2020) used mobile phone location and census data to develop a robust regression model that analyzed the impact of religion on human mobility during the COVID-19 pandemic. Concerning Brazil, only one study using mobile device location data has been developed so far. Peixoto et al. (2020) used a metapopulation SI model to study the spatiotemporal dynamics of COVID-19 in two Brazilian states. By ranking cities from São Paulo and Rio de Janeiro with respect to the rank of infection measure, the authors obtained a risk map for the evolution of the pandemic spread. Our study brings innovation because it explores the COVID-19 risk of infection in groups of cities with a relatively strong relationship (clusters), therefore offering methods to mitigate infection risk, in this case the strategic placement of isolation barriers, while maintaining both economic and social interactions among the cities of each cluster. This study also adopted both the risk of infection of an individual present in a group of infected cities and the risk of contamination of cities containing no active cases.

The data was provided by In Loco Company, which collects anonymized location data from about 60 million devices around the world, enabling mobile apps to provide location-aware services while securing the privacy of their users.
In Pernambuco, more than 2 million devices are registered and were used in this application, which is a significant sample considering the 9.5 million population of Pernambuco. However, In Loco Company is unable to associate a user with any external information for privacy reasons. Nevertheless, most of the apps that have In Loco Company technology embedded (e.g., e-commerce stores, shopping stores, and online sales of used products) are used homogeneously amongst social classes and represent a significant portion of the company's database. Having such a stratified sample of users enables In Loco Company to sell mobility-based business intelligence applications for a wide variety of purposes. The performance of In Loco Company technology was among the best infrastructure-free technologies in IPSN 2014 (Lymberopoulos & Liu, 2017), with an average location error of 2.81 meters for indoor location. It relies only on the smartphone's sensors, including Wi-Fi, accelerometer, and magnetometer, in order to enhance the GPS precision of location events, giving context to their nature. Also, the embedded Software Development Kit (SDK) of In Loco Company is specially designed to capture location at crucial moments of the user's path, that is, only moments that actually establish a visit to a specific place (a sufficiently steady state of movement for a sufficient amount of time in that location), gathering the information only once and when needed. Also, the absence of an internet connection at the moment of location collection is not a limitation. The SDK is programmed to store location visits and wait for the moment in which sending the visit information is possible. Therefore, movement noise is diminished, and locations may be gathered even if the device is offline or if its owner only connects to the internet when a Wi-Fi connection is available.
Using such rich yet anonymized information on the user's location enables us to build the origin-destination matrix by relying first on the distribution of pairs of distinct location transitions on users' paths. The same source of anonymized mobile users' location data has been used for the main social isolation index (Queiroz et al., 2020) used in Brazil by policymakers for controlling the COVID-19 outbreak with social distancing. Figure 1 presents a diagram that summarizes the source from which data was collected, how it was preprocessed, and how it was used as input for the experiments.

The origin-destination matrix characterizes the overall flow distribution between pairs of locations. Each entry $w_{ij}$ of the origin-destination matrix represents the relative flow estimate of people moving from city $i$ to city $j$. If $w_{ij} = 0.2$, then a flow of 20% of the population of city $i$ was estimated. Only the intracity movement was not estimated using that technique. For it, the idea is to consider the rate of users that did not leave their current city on a given day (which can be estimated by clustering visits by time and location and choosing the most frequent location in moments of resting, such as the night). So the data collected includes an origin-destination matrix for each day $t$. In this paper, the data used regards the behavior of the population of Pernambuco concerning travels between the $n = 192$ cities on 91 different dates (2020-01-01 to 2020-03-31). Therefore, for each city $i$, the percentage of people that remained in the city and the percentage that left city $i$ for each of the other $n - 1$ cities were calculated for each of the 91 dates collected. Devices from other states found within these 192 cities during the data collection were also considered due to their mobility from one city to another, even if it was inter-state mobility.
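To make the meaning of the matrix entries concrete, the following is a minimal sketch of how one daily origin-destination matrix could be assembled from device counts. It is not the In Loco pipeline: the function name `od_matrix`, the inputs `trip_counts` and `city_devices`, and the simplification of taking the stay-at-home share as the complement of the outflows are all assumptions made for illustration, and the numbers are invented.

```python
def od_matrix(trip_counts, city_devices):
    """Build one daily OD matrix of relative flows (illustrative assumption).

    trip_counts:  dict mapping (origin, destination) -> number of devices
                  observed moving between the two cities on that day.
    city_devices: list with the number of devices attributed to each city.
    Returns w, where w[i][j] is the share of city i's devices that went to j
    and w[i][i] is the share that stayed (simplified here as the complement).
    """
    n = len(city_devices)
    w = [[0.0] * n for _ in range(n)]
    for (i, j), count in trip_counts.items():
        if i != j and city_devices[i] > 0:
            w[i][j] = count / city_devices[i]
    for i in range(n):
        outflow = sum(w[i][j] for j in range(n) if j != i)
        w[i][i] = max(0.0, 1.0 - outflow)  # guard against multi-city trips
    return w


# Hypothetical day with three cities; city 0 has 100 devices,
# 20 of which visit city 1 and 5 of which visit city 2.
trips = {(0, 1): 20, (0, 2): 5, (1, 0): 3}
devices = [100, 60, 40]
w = od_matrix(trips, devices)
```

With these invented counts, `w[0][1]` is 0.2, which reads as an estimated flow of 20% of the population of city 0 towards city 1.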
The objective of this study is to consider worst-case scenarios of the COVID-19 spread; thus, we have estimated the worst-case flow for each pair of cities in the state. We considered the worst-case flow between $i$ and $j$ as $w_{ij} = \max_t w_{ij}^{(t)}$, where $w_{ij}^{(t)}$ is the flow estimate in the origin-destination matrix of day $t$ in the database. These scenarios were evaluated in two different periods: before the pandemic, and after the start of the pandemic and the establishment of isolation protocols. An extract using the indexes [1:7, 1:7] from an OD matrix is shown in Table 2.

Table 2. OD-matrix sample.

Furthermore, statistics involving the COVID-19 spread in Pernambuco were collected from the Brazilian Ministry of Health reports (Ministério da Saúde, 2020) and the Secretariat for Planning and Management of Pernambuco (Seplag, 2020). Also, information regarding the population and coordinates of the cities was collected from the Brazilian Institute of Geography and Statistics (IBGE, 2019).

In this section, definitions that may be necessary for a better understanding of the results and analysis of this article are presented. The sources cited here can be consulted by the reader who wants a more complete and in-depth explanation of the measures and definitions.

Definition 1 (Bondy & Murty, 2008): Let $D = (V(D), A(D))$ be a directed graph, or digraph, where $V := V(D)$ represents a set of vertices and $A := A(D)$, disjoint from $V$, consists of a set of directed arcs together with an incidence function $\psi_D$ that associates each arc of $D$ with an ordered pair of vertices of $D$. Let each vertex (or node) $v_i \in V$ represent one city $i$, where the set of vertices (cities) is obtained from the origin-destination data collected from anonymized mobile phone location data. A proportion of the population of city $i$ travels to each of the other cities $j$. Let this proportion be the weight $w_{i,j}$ of the associated arc $(v_i, v_j)$, with tail in $v_i$ and head in $v_j$.
Then, the digraph $D$, together with the weights $w$ of its arcs, is called a weighted directed graph $(D, w)$. Consider four cities. A directed graph connecting them is illustrated in Figure 2.

Let $w_{i,j}$ be the weight of the arc that has its tail in vertex $v_i$ and its head in vertex $v_j$. Then, a graph constrained by the threshold $\tau$ can be obtained from the original graph using the following rule.

Definition 2: An arc between two vertices $v_i, v_j \in V$ exists if and only if $w_{i,j} \geq \tau$ or $w_{j,i} \geq \tau$.

In other words, this threshold represents the fractional volume of people that implies a well-established communication amongst cities, indicating that there is a proportion $w_{i,j} \geq \tau$ of people that travel from city $i$ to city $j$. This well-established communication is associated with frequent travels related to family, work, and the consumption of services and goods. For instance, analyzing Figure 2, if we assume that only four of the flows shown are above the threshold, the newly constructed graph could be expressed as in Figure 3. Therefore, this threshold implies a risk exposure considering the connectedness amongst cities. Thus, depending on the tolerance level described by a threshold $\tau$, cities within a province may be divided into clusters considering the dynamics of the flow of people amongst those cities.

Definition 3: Force of Infection Measure (FOI) (Keeling & Rohani, 2008): Intending to simplify the management of infected clusters, a proposition is to rank them in order of importance according to the FOI of a cluster $C^1$, represented here as $\lambda_{C^1}$, which is the rate at which a single individual contracts the disease (Keeling & Rohani, 2008):

$$\lambda_{C^1} = \beta \, \frac{\sum_{i \in C^1} I_i}{N_{C^1}} \qquad (1)$$

where $I_i$ is the number of infectious individuals from a city $i \in C^1$ and $N_{C^1}$ is the population size of $C^1$. The parameter $\beta$ is the product of the contact rate ($c$) and the probability of transmission ($t$), therefore $\beta = c \, t$.
For developing a more conservative analysis, the probability of transmission was considered equal to 100%, and the mean strength of the cluster (Barrat et al., 2004) was considered as a proxy for modeling the contact rate.

Definition 4: The City Risk Exposure Measure is given by equation (2).

Definition 5: The Cluster Risk Exposure Measure is given by equation (3).

The following intersections of events may occur: city $i$ is infected by cities $j$ and $k$ (two cities simultaneously), city $i$ is infected by more than two cities simultaneously, and so on. For both risk measures (city and cluster) a conservative approach has been assumed, therefore including the redundant intersection of events, i.e., being contaminated by more than one city at the same time.

The state of Pernambuco has an estimated population of more than 9.5 million people. The state publishes a report at the end of each day, in which the cities are divided according to their location into 12 health regions. However, no information regarding movements between cities is presented in this report. As mentioned in Section 2.1, mobile phone location data have been used to obtain information about travels between cities in Pernambuco. Therefore, the percentage of the population of city $i$ that traveled to city $j$ on a given date was inferred. The experiments developed in this section were performed using the python-igraph library (Csardi & Nepusz, 2006), a Python library developed for network science, which contains functions that allow the user to build graphs and develop quantitative analyses, and also offers graph visualization tools. Also, data concerning the active coronavirus infectious individuals in Pernambuco were collected from the Secretariat for Planning and Management of Pernambuco (Seplag, 2020).
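The worst-case flow and the threshold rule of Definition 2 can be sketched in a few lines. The paper's experiments used python-igraph; the sketch below deliberately uses only the Python standard library so that it is self-contained. It rests on two assumptions that are not stated in this form in the text: that a cluster is a connected component of the thresholded graph, and that the threshold is called `tau`. The function names and the four-city matrices are invented for illustration.

```python
def worst_case(daily_matrices):
    """Element-wise maximum over the daily OD matrices (worst-case flow)."""
    n = len(daily_matrices[0])
    return [[max(m[i][j] for m in daily_matrices) for j in range(n)]
            for i in range(n)]


def clusters(w, tau):
    """Group cities linked by a flow of at least tau in either direction.

    Assumption: a cluster is a connected component of the graph in which
    cities i and j are joined when w[i][j] >= tau or w[j][i] >= tau.
    """
    n = len(w)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if w[i][j] >= tau or w[j][i] >= tau:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two hypothetical daily OD matrices for four cities (off-diagonal flows only).
day1 = [[0, 0.10, 0.01, 0], [0.02, 0, 0, 0], [0, 0, 0, 0.01], [0, 0, 0.03, 0]]
day2 = [[0, 0.04, 0.00, 0], [0.03, 0, 0, 0], [0, 0, 0, 0.06], [0, 0, 0.01, 0]]
worst = worst_case([day1, day2])

loose = clusters(worst, 0.05)   # cities 0-1 and 2-3 form two clusters
strict = clusters(worst, 0.08)  # only cities 0-1 remain linked
```

Raising `tau` in this toy example splits city 2 and city 3 into isolated cities, while `tau = 0` would return a single cluster containing every city, since every flow is at least zero.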
Experiments were performed to evaluate the connection dynamics and the possibility of economic recovery of the clusters produced as the connection threshold increases from 0 (complete network with original connections) to 0.2. As the threshold grows from 0 to 0.2, clusters of cities with relatively strong connections are revealed. We considered that a representative cluster set comprises a good quantity of disjoint regions that cover the majority of the cities of the state. There was no reason to investigate thresholds beyond 0.2, since we could not obtain a representative set of clusters when the threshold is greater than that value. Experiments were performed in two scenarios, one considering data collected before 2020-03-15 and the other considering data collected after it, as this date marks the starting point of the social distancing interventions in Pernambuco. Both scenarios considered the maximum flow of travels amongst the cities during the respective periods (January 01 to March 15 and March 16 to March 31).

Figure 6 illustrates cluster generation as the threshold $\tau$ decreases. As $\tau$ is relaxed, more connections between cities are found. If $\tau = 0$, then the complete network associated with the original data is obtained. As $\tau$ increases, the cities with strong dependency (not necessarily mutual) form isolated clusters, and cities with weak dependency become isolated due to the inferred mobility pattern. Since the generated clusters are fully separated, one can use them to continuously classify regions into two categories: those in which policymakers and epidemiologists may propose a protocol for border/travel control when necessary, and others where social restrictions must be maintained/implemented. Policymakers and epidemiologists may also consider that, with specific protocols, some regions may have the necessary care and border controls, associated with a plan for the gradual recovery of activities.
In both categories, policymakers should consider border controls so that infected individuals do not leave their original region. Such an isolation barrier protocol is particularly important when a region/cluster is classified as feasible for economic recovery planning, in order to prevent the entrance of infectious individuals. The following subsections aim to answer three questions: "How to establish isolation barriers to contain the outbreak while minimizing socio-economic losses?", "What are the possibilities of representative mobility clusters available for the state?", and "Once the representative set of clusters is selected, how to assess risk and establish priorities for mitigation actions?"

This subsection concerns understanding how the clusters of strongly related cities were affected by the government's isolation protocols and the available possibilities of representative sets of clusters for the state. Clusters are structures with a relatively higher probability of exchanging infected individuals amongst the cities belonging to them. When a threshold $\tau$ is used to construct a cluster, the probability of mobility inside it is greater than or equal to $\tau$, and the probability of mobility coming from outside it is less than $\tau$. A decrease in the threshold reduces the anomalous behavior that comes from outside each cluster. Nevertheless, if the threshold is too small, few clusters are constructed, and it is more difficult to control the spread inside a cluster. When a few cities with no reported COVID-19 cases are placed within the same cluster, travel barriers could be used to protect/isolate them. In the first section of the supplementary material, a more detailed exploratory analysis of the persistence of the state's networks and the characteristics of the generated clusters is presented. Figure 7 shows the behavior of the population included or not in clusters with more than one city when the threshold varies.
Considering a threshold of $\tau = 0.15$, for instance, it is observed that 186 (172 before isolation) cities in individual clusters cover about 70% (60.44%) of the total population. It can be interpreted that a large proportion of individuals present less than a 15% chance of mobility between cities. Thus, the chosen threshold also brings a likelihood of less than $\tau$ of exceptional communication with cities not included in any plotted cluster in the above figures. Another way is to fix (or accept) a relatively smaller threshold value and to create more clusters to subdivide the state, based on local communications among cities. These clusters can be used to manage the minimization of the COVID-19 spread and to plan a gradual economic recovery of the cities.

The objective of this section is to give answers on how to prioritize clusters and manage mitigation actions in the state. The first step in developing the risk analysis was to classify clusters into not infected and infected. A cluster is infected if it includes at least one infected city. The classification results are presented in Figure 9. Six clusters were found to be not infected ($C^0$) and 12 clusters are infected ($C^1$). These clusters were colored according to their condition and indexed with a number $k$ inside a colored square. Once the clusters are separated, their risk assessment can be developed to guide decisions using the metrics detailed in Section 2. The FOI measure is used to evaluate the criticality of infected clusters of cities or of an isolated infected city, so epidemiologists may use this information for planning social isolation policies. In contrast, the Risk Exposure measures are used for non-infected cities and clusters and enable decisions regarding isolation barrier policies to keep economic activities and prevent infection. Table 3 presents the 12 infected clusters, ordered by their respective FOI.
In this table, the first column indicates the clusters' indexes, and the reader can check the cities that constitute each cluster in Table 1 in the Supplementary Material. In the second column, the FOI of the clusters, given by equation (1), is presented, where the estimated actual active cases for each city were extrapolated from the number of cases officially reported. This extrapolation was necessary since only a small portion of the infected population effectively goes to a hospital and takes a COVID-19 test. There are reports that persons seeking hospital and health support do so because they have more severe symptoms (Day, 2020; World Health Organization, 2020b), and the same is being reported for Pernambuco. We assumed that the registered cases are only 20% of the total number of cases that can transmit the disease, since there is a general perception of underreporting due to the small number of tests. The third column presents the difference between the FOI of consecutive clusters, and the fourth column presents a ratio over these differences. Lastly, the fifth column shows the cumulative ratio, and its final value shows the relative risk difference amongst each pair. When the sign '-' is included in an entry of the table, it means that there is not sufficient information to calculate the associated column output. This happens for the last three columns, because at least one row below the row in which the values are being calculated is needed to calculate their output. When analyzing Table 3, it can be observed that Cluster $C_1^1$ is the most critical and should be prioritized by the policymakers. Also, policymakers can visualize the risk gap among clusters in order to prioritize initiatives under a resource constraint and shed light on the different aspects amongst those city sizes and populations.
For example, in the fifth column, the difference from Cluster $C_1^1$ to Cluster $C_2^1$ is more than 35 times the difference from Cluster $C_2^1$ to Cluster $C_3^1$. This magnitude is substantial and shows the relative severity of cluster $C_1^1$, which contains Recife. Furthermore, the sixth column shows the cumulative ratio, and its final value shows the quantity of risk range that needs to be mitigated. This additional information is useful due to the resource limitations faced by the managers. Some clusters can contribute to the risk management process with relatively more impact. This is the case for clusters $C_1^1$ to $C_3^1$, which together contain more than half of the DiffRatio, or risk range to be mitigated. Thus, the public manager can discuss with epidemiologists which of these clusters to address first and, depending on the budget, choose more than one for simultaneous risk mitigation.

In Table 4, the metrics discussed above for Table 3 are presented for isolated infected cities, which are not within any cluster. In this case, Cachoeirinha is the most critical city. As discussed before, we can also see the impact of each isolated city on the risk mitigation process. The first three cities (Cachoeirinha, Ipubi, and Tupanatinga) can be addressed first in order to perform a more efficient risk mitigation process.

Metrics regarding non-infected clusters and cities are presented in Table 5 and Table 6, respectively. Table 5 details the non-infected clusters ordered by their risk exposure. In Table 5, the second column contains the risk of infection of the non-infected clusters. The cities within each cluster are presented in Table 2 in the Supplementary Material. It can be seen that the clusters that most contribute to the exposure risk mitigation are $C_1^0$ and $C_2^0$, since they are associated with almost all the risk range to be mitigated. In Table 6, one can find the city exposure risk, calculated for non-infected isolated cities.
According to this measure, Sertânia is the city with the highest risk of becoming infected based on the location flows used in this application. The first ten cities (Sertânia to Cedro) are those that contributed most to the variation in the range of risk to be mitigated and should be given special attention.

This section presented a risk mitigation process that considers groups of cities and minimal intervention in their economic relationships. The tools to prioritize infected clusters ($C^1$) and healthy clusters ($C^0$) were risk measures that quantify a range of risk to be mitigated. For each type of cluster, there is a subset of clusters that contributes more efficiently to the quantified range of risk to be mitigated. Also, not all cities were allocated to clusters, considering the adopted threshold. However, using the proposed tools, one can also investigate the situation of the infected and healthy isolated cities. Suppose an isolated city is healthy, with low exposure risk, and its geographical position is close to other healthy clusters. In that case, one can think of ways to allocate this city to a cluster such that there is no increase in this cluster's exposure risk, by considering the links (or strong relationships) between the cities of the cluster and the isolated city. Finally, since the COVID-19 reports are made available to the public daily, these risk tables must be recalculated on a daily basis.

When analyzing a biological calamity scenario such as the ongoing COVID-19 pandemic, it is essential to consider that people's behavior is affected, including the flows of people traveling between cities. For this analysis, data before and after the implementation of social isolation policies in the state were considered. Then, in the calculation of the risk measures, only the networks produced with data compatible with the social isolation behavior have been used.
The networks have been constructed considering the maximum observed flow within the observed isolation state, ensuring a pessimistic approach. A pessimistic approach was also adopted when calculating the risk measures, where the number of reported cases of COVID-19 within each city was considered to be only 20% of the total cases due to under-reporting. Therefore, an estimated number of infected people was used, considerably increasing the number found in the reports. The entire cluster is considered contaminated if there is a single COVID-19 case reported for any one city of the cluster. In this paper, the focus is on the spread of the disease amongst the cities of the state. Therefore, the decisions based on these metrics are related to the implementation of isolation barriers to isolate cities that are not infected. The risk exposure metrics show which cities that are not yet infected have a mobility relationship with infected cities. For instance, Table 6 shows that Sertânia is the most exposed non-infected city. This can be used to support a decision by the governor or mayor (policymaker) to regulate the flow of people entering the city. On the other hand, the FOI is applied to clusters and cities already infected. As this metric is based on the proportion of the city population infected at the moment, it can be used to support social isolation or lockdown policies. Nevertheless, this is a multidimensional decision and should consider the city's healthcare system and the socio-economic impact of totally closing the city for its population. For example, in a low-income Brazilian neighborhood, a small house with one or two rooms may be the home of a whole family. All these aspects should be taken into account by policymakers, although they are not the focus of this paper. Epidemiologists must support the definition of lockdown policies, helping policymakers to assess all consequences when defining a lockdown procedure.
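The pessimistic FOI calculation discussed above can be sketched as follows. This is only an illustration under stated assumptions: the FOI is taken as the product of the contact rate and the transmission probability, multiplied by the estimated share of infectious individuals in the cluster; the contact rate is passed in directly (the paper uses the cluster's mean strength as a proxy); and the cluster names, case counts, populations, and contact rates are invented.

```python
def force_of_infection(reported_cases, populations, contact_rate,
                       transmission_prob=1.0, reporting_rate=0.2):
    """FOI of a cluster under the pessimistic assumptions in the text.

    reported_cases: officially reported active cases, one entry per city.
    populations:    population of each city in the cluster.
    contact_rate:   proxy for the contact rate (assumed given here).
    reporting_rate: share of true cases that is reported (20% in the text).
    """
    estimated_infectious = sum(reported_cases) / reporting_rate
    beta = contact_rate * transmission_prob
    return beta * estimated_infectious / sum(populations)


# Hypothetical infected clusters; none of these numbers come from the paper.
cluster_data = {
    "A": {"reported": [10, 0], "population": [50_000, 50_000], "contact_rate": 0.05},
    "B": {"reported": [2], "population": [20_000], "contact_rate": 0.02},
}

foi = {
    name: force_of_infection(c["reported"], c["population"], c["contact_rate"])
    for name, c in cluster_data.items()
}

# Rank clusters from most to least critical and take the gaps between
# consecutive clusters in the ranking.
ranking = sorted(foi, key=foi.get, reverse=True)
diffs = [foi[a] - foi[b] for a, b in zip(ranking, ranking[1:])]
```

Note that cluster "A" counts as infected even though one of its two cities reports no cases, which mirrors the rule that a single reported case contaminates the whole cluster.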
It is essential to highlight that Pernambuco is similar to other states in Brazil, in that its capital (Recife) is a center of gravity in terms of mobility and population concentration. Therefore, it is the node with the highest strength within the graph. Since our approach was based on the persistence of the network, it is important to note that only those arcs with a weight higher than 2.5% were considered to build the final cluster scenario. Since Recife is about 80 km from the nearest border of Pernambuco, there is minimal flow to other cities compared to the overall mobility.

The data collected considered all mobile devices that were found within the cities of Pernambuco, and this includes devices of people arriving from cities located in other states, including those that border Pernambuco. The complete database of mobile phone devices monitored by In Loco Company has 60 million devices. From this database, only about 2 million devices had a visit record in Pernambuco during the data collection. A share of 2.5% of the mobility flow is not considered when building these clusters of cities. There could therefore be an error associated with this unconsidered population flow, but it would not significantly change the clustering analysis, since the analysis was based on the network persistence built from such a large dataset.

Our contribution is to provide information so that policymakers can effectively control this 2.5% of population flow, for instance by establishing further contact tracing or quarantine protocols, instead of controlling all population flow without a strategic priority. With this perspective, epidemiologists can structure isolation and barrier protocols to ensure the safety of the essential population flows required for food and other essential supply chain activities.

There are open issues and challenges to be addressed in future research.
Integrating the isolation barriers and our analysis with SEIR compartment models, Bayesian SEIR models, and other quantitative simulation models is the main challenge related to this research. Also, one could perform spatiotemporal clustering to analyze different cluster dynamics. Finally, another issue concerns how to connect healthy isolated cities or small clusters to relatively big healthy clusters in a way that does not increase the cluster's exposure risk.

In this paper, networks based on mobile phone location data were used to analyze the isolation of cities in one of the most affected states of Brazil. Mobility thresholds were used to construct clusters and investigate the behavior of 192 cities before and after social distancing policies were implemented. It was thus possible to identify the cities which connect weakly and those which connect strongly with each other. Among the contributions of this approach, the clustering scenarios based on weighted flows can be used to aid rule-makers when deciding whether to isolate one city or a set of cities with border controls. Further developments of this study include the clusters' risk assessment, which involves ranking clusters from the most critical to the least critical, depending on their situation and based on the adopted risk measures. Isolated clusters with no cases can receive priority when economic recovery policies start to be implemented. Therefore, economically viable clusters can be generated, preserving strong and weak relationships among cities. Thus, the idea is that our proposition may be used to provide intelligence, so that policymakers may consider it together with the epidemiologists' perspective and define regions of interest for isolation and progressive economic recovery.
The isolation protocols and the criteria for locking down and reopening areas shall be defined by epidemiologists, considering the timeline of infections and the available facilities for medical treatment. Future work can include small-world effects analysis and variations of clustering algorithms, such as those for weighted directed graphs. In this case, thresholds regarding a minimum amount of flow are not used to transform the original directed graph, so different clusters can be found and further analyses performed. Furthermore, aspects regarding the hospital situation of each city or region can be added to the risk analysis. Also, the analysis made in this paper is reproducible for other states of Brazil and other countries.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
The authors thank the InLoco company for all the cooperation for this study.
This study was partially supported by Facepe (IBPG-0753-3.08/17, IBPG-0373-1.03/19), CNPq