key: cord-0763558-rvimszc8 authors: Fang, Lei; Huang, Jinliang; Zhang, Zhenyu; Nitivattananon, Vilas title: Data-driven Framework for Delineating Urban Population Dynamic Patterns: Case Study on Xiamen Island, China date: 2020-07-04 journal: Sustain Cities Soc DOI: 10.1016/j.scs.2020.102365 sha: bb97692e2eee67749fec0a4b0fc35f6152d35e2b doc_id: 763558 cord_uid: rvimszc8 The effective data mining of social media has become increasingly recognized for its value in informing decision makers of public welfare. However, existing studies do not fully exploit the underlying merit of big data. In this study, we develop a data-driven framework that integrates machine learning with spatial statistics, and then use it on Xiamen Island, China to delineate urban population dynamic patterns based on hourly Baidu heat map data collected from August 25 to September 3, 2017. The results showed that hot grids are primarily clustered along the main street through the downtown area during working days, whereas cold grids are often observed at the edge of the city during the weekend. The mixed use (of commercial and life services, restaurants and snack bars, offices, leisure areas and sports complexes) is the most significant contributing factor. A new cold grid emerged near conference venues before the Brazil, Russia, India, China, and South Africa Summit, revealing the strong effects of regulations on population dynamics and its evolving patterns. This study demonstrates that the proposed data-driven framework might offer new insights into urban population dynamics and its driving mechanism in support of sustainable urban development. Urban areas are the busiest places for human activities, such as transportation, education, catering, shopping, sports, and leisure. Hence, the spatiotemporal distribution of urban populations differs significantly from that of rural areas (Yang, Liu, Li, & Li, 2018) . 
The dynamics of a population within a particular spatial zone can promote the re-aggregation and diffusion of social and economic activities (Jacobs, 1961; Jacobs-Crisioni, Rietveld, Koomen, & Tranos, 2014). China has been experiencing rapid urbanization over the past few decades, and the urban share of the population reached 58.52% by the end of 2017 (He, Zhou, Tang, Fan, & Guo, 2019). The significant change in the urban population in China is a key driver of urbanization, informatization, industrialization, and globalization in the future, which can reshape the dynamic patterns of the population (de Haas, 2010; Pan & Lai, 2019). Therefore, accurately mapping and understanding the dynamic patterns of urban populations, together with epidemiological modeling and disaster prevention, is essential for sustainable urban planning and development (Catlett, Cesario, Talia, & Vinci, 2019; Liu, Wang, Xiao, & Gao, 2012; Santos-Reyes & Olmos-Peña, 2017; Zhang, Huang, Duarte, & Zhang, 2016).

Conventionally, urban population data were acquired through questionnaires and census sampling, which covered only approximately 1% of the population and focused on specific time segments or macroscale dynamic patterns over a long time period (Fan, 2005; Dong, Pu, & Wang, 2013). Interpolation, choropleth mapping, and dasymetric mapping, for example, are widely used methods to estimate the spatial distribution of populations. However, these maps assume a uniform population distribution within each geographic unit (census tract or block group), causing the mapped population values to change abruptly at the boundaries of spatial units, although the dasymetric method can partially amend this problem using statistical surfaces (Nikola, Branislav, Milan, & Dragutin, 2011).
Therefore, the current approaches cannot acquire and map spatiotemporal population data completely and efficiently, and precise, quantitative studies of population dynamic patterns are rare.

In recent years, with the rapid development of information and communication technology and the extensive use of smartphones with various location-based applications (typically known as "Apps") or social media platforms (e.g., Twitter, Foursquare, Weibo, and Baidu heat map), capturing accurate location information and points of interest (POIs) has become easier and cheaper than before. Numerous studies of population or urban dynamics are currently based on location-related information acquired from application programming interfaces or social media platforms (García-Palomares, Salas-Olmedo, Moya-Gómez, Condeço-Melhorado, & Gutiérrez, 2018; Li, Li, Yuan, & Li, 2019; Shen & Karimi, 2016; Jiang, Alves, Rodrigues, Ferreira, & Pereira, 2015; Zhen, Cao, Qin, & Wang, 2017). Twitter data, for example, are a promising alternative big data source that provides the information required for studying people dynamics at the local level. Researchers have used geo-tagged (active check-in) Twitter data, which contain precise spatial and temporal information, to perform a city dynamics analysis (García-Palomares, Salas-Olmedo, Moya-Gómez, Condeço-Melhorado, & Gutiérrez, 2018). However, researchers have indicated that geo-tagged Tweets constituted only 1% of all posted messages and that the users were primarily "young people"; furthermore, "check-in" activities typically occurred in scenic, educational, or commercial spots at specific times, rendering the results biased and incomplete (Blanford, Huang, Savelyev, & MacEachren, 2015). Therefore, a wider collection of big data sets is required to compensate for the inherent biases of the Twitter dataset.
Unlike Twitter, which relies on active information interaction, the Baidu heat map was developed for passive location information collection (Li, Li, Yuan, & Li, 2019; Lyu & Zhang, 2019; Zhou, Pei, & Wu, 2018). It not only constantly records location information in the background but also integrates the location information streamed from various widely used location-based services (LBS) apps in the smartphone market, most of which were developed or purchased by the Baidu Company. The information covers almost every aspect of daily life, such as navigation, local services, and takeaway services (e.g., Baidu Map, Baidu Nuomi, Baidu Search; see the official website for more information: http://home.baidu.com/home/index/company). In contrast to "check-in" or census data, this heat map dataset is nearly unbiased and can provide almost real-time population data, making it a robust and appropriate source for population dynamics studies.

Population distribution has been characterized using the Getis-Ord Gi* method and the Baidu heat map dataset (Li, Li, Yuan, & Li, 2019). However, the Getis-Ord Gi* method can only extract hot and cold spots in one data frame; in other words, the results are static at a specific time slice. Because this method cannot analyze dynamic patterns in a nearly real-time big data series, a three-dimensional (3D) GIS visualization technique was proposed to characterize the dynamics of city structures (Zhiqiang & Zhongnan, 2016) or human movement (Gao, 2015). However, this technique focused mainly on descriptive analysis, so its results lack statistical support because no objective tests are applied. Therefore, a more robust and efficient approach must be developed to fully utilize the big data set not only in a static data frame but also across the entire time series with quantitative measures.
Xiamen, the island city formerly known in Western circles as Amoy, is emerging as Southeastern China's most sophisticated city (Fig. 1). Xiamen Island has two districts, i.e., the Huli and Siming districts. The Huli district comprises five subdistricts (Jiangtou, Heshan, Huli, Jinshan, and Dianqian), whereas the Siming district comprises nine subdistricts (Kaiyuan, Jialian, Zhonghua, Yuandang, Xiagang, Wucun, Lujiang, Lianqian, and Binhai). These districts have undergone rapid urbanization over the past four decades. As the host of the 2017 Brazil, Russia, India, China, and South Africa (BRICS) Summit, Xiamen has attracted attention worldwide for its beautiful scenery and unique culture. It is interesting to observe how population dynamic patterns respond to changing local regulations and rules during such an event in a modern coastal city such as Xiamen, which may have implications for its sustainable urban development.

Hence, this study was performed to achieve the following objectives: 1) to develop a deep mining approach to visualize and analyze urban population dynamic patterns using hourly Baidu heat maps, and 2) to explore the potential influencing mechanism of the dynamic patterns of urban populations. The findings of this study are significant for sustainable urban planning and development.

Herein, a new space and time deep mining method is proposed to analyze urban population dynamics and its evolving patterns based on Baidu heat maps. As shown in Fig. 2, this framework comprises four components: (1) data preparation and processing, (2) spatiotemporal distribution analysis, (3) dynamic pattern analysis, and (4) driving mechanism analysis. The major methods involved are described in detail in the following sections.

Chinese corporations, has a series of widely used applications.
It is estimated that approximately 1.1 billion Chinese mobile users access those applications each month (Chinese Search Engine Market Share, 2018). The Baidu heat map is a type of visualization product that uses colors to display the relative population density, where red indicates high population density and blue indicates low population density (Appendix 1). The Baidu heat map is updated every 15 min to reflect the real-time population distribution and uses an eight-bit map value (i.e., 256 values) to represent the relative population density (from 0 to 255). Although this value does not reflect the actual population count, it is a good big data source for assessing population distribution and dynamics.

The Baidu heat maps obtained were handled by the batch processing of georeferencing, time aggregation, gridding, and population density index (PDI) calculation. The geographical coordinate information of the Baidu heat maps obtained was insufficient. To match the heat maps with other typically used remote sensing images (such as Landsat), the WGS 84 coordinate system was selected. All the heat maps obtained have the same scope and size; therefore, they share the same georeference coordinates in terms of longitudes and latitudes. First, one Baidu heat map was registered with the ArcGIS georeference tool. Subsequently, the georeferenced coordinate information was applied to the other Baidu heat maps through a batch processing program.

Because the number of active users varied considerably, considerable fluctuations were observed in the Baidu heat map (Leng, Ying, Huang, & Zheng, 2015). Therefore, the PDI was applied to normalize the data, as follows:

$$\mathrm{PDI}_{th} = \frac{Q_{th}}{\sum_{h} Q_{th}} \qquad (1)$$

where $\mathrm{PDI}_{th}$ is the normalized population density index; $Q_{th}$ is the sum of the heat map values in zone $h$ at time $t$; and $\sum_{h} Q_{th}$ is the sum of the heat map values of all zones at time $t$.
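The normalization above can be illustrated with a minimal sketch. It assumes the PDI is the plain ratio of a zone's summed heat map value to the sum over all zones at the same time; the function name `population_density_index` and the toy array are hypothetical and are not part of the authors' ArcGIS workflow.

```python
import numpy as np


def population_density_index(q):
    """Normalize summed heat map values into a PDI.

    q: array of shape (T, H), where q[t, h] is the summed heat map
       value of zone h at time t (assumed layout, for illustration).
    Returns an array of the same shape in which every row sums to 1,
    so fluctuations in the number of active users cancel out.
    """
    q = np.asarray(q, dtype=float)
    totals = q.sum(axis=1, keepdims=True)  # sum over all zones at each time t
    if np.any(totals <= 0):
        raise ValueError("each time step needs a positive total heat value")
    return q / totals


if __name__ == "__main__":
    # Two time steps, three zones: the second step has fewer active users
    # overall, but the relative shares remain comparable after normalization.
    toy = [[10, 30, 60], [5, 5, 10]]
    print(population_density_index(toy))
```

Because each time step is divided by its own total, a zone's PDI is comparable between hours with very different numbers of active users.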
To realize the merit of these datasets and to facilitate analysis, the data were aggregated into the following four time slots: morning (07:00 to 10:59), noon (11:00 to 14:59), afternoon (15:00 to 18:59), and night (19:00 to 22:59), and the average PDI of each time slot was calculated daily. In contrast to previous investigations conducted at the city or regional level (Chen, Liu, Li, Liu, Yao, Hu, Xu, & Pei, 2017; He, He, Song, Wu, Yin, & Mou, 2018; Huang & Wong, 2016), 400 m × 400 m spatially connected grids were created in this study to reflect the heterogeneity of the population dynamics. Consequently, 1096 grids covering the entire island were constructed to support this analysis (Fig. 1). Because it was impractical to manually calculate the PDI value of each grid for all the heat maps, an iterative calculation program was developed using Model Builder in ESRI ArcGIS 10.3 (Arcgis Pro, 2019). Model Builder is a visual programming language for building geoprocessing workflows, where a model is represented as a diagram that chains together sequences of processes and geoprocessing tools, using the output of one process as the input to another. The PDI values of all the heat maps are calculated automatically, one heat map per iteration. The workflow diagram is shown in Fig. 3.

A web mining technique was developed to automatically collect 149,074 POI records for Xiamen Island from the Baidu map. A POI is a geographic entity that can be abstracted as a point and that contains precise spatial information (latitude and longitude). A name or description and a category are typically included for each POI. POI categories are similar to land-use categories, and the preferences and social functions of people can be well represented by POIs. The type and density of POIs at a specific location can directly or indirectly reflect land use and functional zoning (Wu, Ye, Ren & Du, 2018).
These points were classified into 12 types according to the classification of urban land use and planning standards of development Land (GB50137-2011). Some POIs that were not related to land use, such as public toilets and newsstands, were removed, whereas some categories were aggregated into a major one. Consequently, the POIs were grouped into six major categories.

Spatiotemporal distribution analysis was applied to the Baidu heat maps at different time slots. It included local spatial autocorrelation analysis, which was used to assess the local clustering characteristics of each grid so that statistically significant hot/cold grids could be recognized. Meanwhile, a 3D space-time cube approach was used to visualize the spatiotemporal distribution of the statistically significant hot/cold grids.

Getis-Ord Gi* was used to evaluate the local clustering characteristics at each time slot; subsequently, the spatiotemporal distribution of hot and cold grids was statistically identified (Getis & Ord, 2010; Ord & Getis, 1995). The null hypothesis for this statistical test was complete spatial randomness (CSR), which postulates that the observed spatial phenomenon represents one of many possible spatial arrangements (Fig. 6). For example, if all the grid PDI values were scattered at random over the grid locations many times, the highest PDI values would only occasionally fall into the same area by chance. When the probability of such a chance arrangement (the p-value) is small (less than 0.01), the null hypothesis is most likely rejected, and a hot grid can be distinguished. The z-score (also known as the Gi* value) is expressed in standard deviations. The resultant z-scores and p-values indicate whether the grids with either high or low values cluster or disperse spatially.
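As a rough numerical illustration of the Gi* statistic described above, the sketch below computes the z-score for every grid using the standard Getis-Ord Gi* formulation with a binary fixed-distance weight (each grid counts itself and every grid whose centre lies within the threshold). The function name `gi_star`, the toy 5 × 5 grid, and the normal-approximation p-value are assumptions made for illustration; the sketch does not reproduce the ArcGIS tool, the random-sampling iterations, or the FDR correction used in the study.

```python
import math

import numpy as np


def gi_star(x, coords, threshold):
    """Getis-Ord Gi* z-scores with a binary fixed-distance band.

    x:         values (e.g., PDI) for n grids
    coords:    (n, 2) grid-centre coordinates in metres
    threshold: neighbourhood search distance in metres
    Returns (z, p): z-scores and two-sided normal p-values.
    """
    x = np.asarray(x, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = x.size

    # Pairwise distances; w[i, j] = 1 if grid j is within the band of grid i.
    # The diagonal is 1 because Gi* includes the focal grid itself.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = (d <= threshold + 1e-9).astype(float)

    x_bar = x.mean()
    s = math.sqrt((x ** 2).mean() - x_bar ** 2)  # global standard deviation
    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)

    numerator = w @ x - x_bar * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    z = numerator / denominator

    # Two-sided p-value under a standard normal approximation.
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return z, p


if __name__ == "__main__":
    # Toy 5 x 5 grid of 400 m cells with a high-value cluster in one corner.
    coords = [(400.0 * i, 400.0 * j) for i in range(5) for j in range(5)]
    x = [10.0 if (i < 2 and j < 2) else 1.0 for i in range(5) for j in range(5)]
    z, p = gi_star(x, coords, threshold=400.0)
    print(np.round(z.reshape(5, 5), 2))
```

Large positive z-scores with small p-values mark hot grids, and large negative z-scores mark cold grids; with 400 m cells and a 400 m threshold, each grid's neighbourhood consists of itself and its edge-adjacent cells.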
The associated equations are as follows:

$$G_i^* = \frac{\sum_{j=1}^{n} \omega_{i,j} x_j - \bar{X} \sum_{j=1}^{n} \omega_{i,j}}{S \sqrt{\dfrac{n \sum_{j=1}^{n} \omega_{i,j}^2 - \left(\sum_{j=1}^{n} \omega_{i,j}\right)^2}{n-1}}}$$

$$\bar{X} = \frac{\sum_{j=1}^{n} x_j}{n}$$

$$S = \sqrt{\frac{\sum_{j=1}^{n} x_j^2}{n} - \bar{X}^2}$$

where $x_j$ is the PDI for grid $j$; $\omega_{i,j}$ is the spatial weight between grids $i$ and $j$; and $n$ is the total number of grids, which is 1096.

To ensure the credibility and reliability of this procedure, some preconditions must be clarified, such as the method used to define the spatial relationship between the grids and their values. Unlike traditional (nonspatial) statistics, the foundation of this spatial statistical test is the CSR hypothesis. We wish to determine whether the observed spatial pattern (hot or cold spot) is merely one of the many (n!) possible random spatial arrangements or is unlikely enough under CSR that the hypothesis can be rejected. Some previous studies have indicated that multiple testing techniques are suitable for obtaining the optimal parameters (Goovaerts, 2010; He, Zhou, Tang, Fan, & Guo, 2019). Therefore, we conducted 10,000 iterations of random sampling to test the possibility of the observed spatial patterns. Finally, the spatial relationship was set to a fixed distance, and its neighborhood search threshold was set to 400 m. For a more stringent and credible statistical test, the false discovery rate (FDR) procedure was applied, which can potentially reduce the critical p-value thresholds shown in Fig. 6.

A 3D space-time cube technique was used to statistically visualize the spatiotemporal distribution of hot/cold grids. Consider the morning slot from August 25 to September 3 as an example (Fig. 7). Each cube represents the clustering status at one location during a specific time slot per day. The column with different colors represents the daily variation from August 25 to September 3. The slice represents the spatial distribution of the hot/cold grids on the same day.

Dynamic pattern analysis was performed on each time series of the hot/cold grids at different time slots.
The trends of those hot/cold grids in each column were evaluated using the Mann-Kendall trend test (Hamed, 2009; Kossack & Kendall, 1950; Mann, 1945). The Mann-Kendall trend test is a non-parametric test; therefore, it does not require the data to follow any particular distribution. This method is typically applied to detect increasing or decreasing trends in a time series dataset. The null hypothesis for this test is that no monotonic trend exists in the series, and the alternative hypothesis is that a trend exists. The equations involved are as follows:

$$S = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \operatorname{sgn}(x_j - x_i)$$

$$\operatorname{sgn}(x_j - x_i) = \begin{cases} +1, & x_j - x_i > 0 \\ 0, & x_j - x_i = 0 \\ -1, & x_j - x_i < 0 \end{cases}$$

where $x_i$ and $x_j$ are the values at time steps $i$ and $j$ ($j > i$), and $n$ here denotes the number of time steps in the series. The mean of S is E[S] = 0 and var(S) = n(n-1)(2n+5)/18. Every value is compared with every value preceding it in the time series; if a trend is present, the signs will tend to be consistently positive or consistently negative. The test can be used to obtain trends for as few as four samples. However, with only a few data points, the test has a high probability of missing a trend that would be detected if more points were provided. The more data points there are, the more likely it is that a trend obtained in the test is a true one (as opposed to one obtained by chance). The minimum number of recommended measurements is between 8 and 10. Therefore, 10 days of continuous data (August 25 to September 3, 2017) were used to guarantee the credibility of the results.

The dynamic patterns for each column were then identified according to the trend characteristics. Seventeen types of dynamic patterns were identified (Arcgis Pro, 2019). A new hot spot is a grid that is a statistically significant hot spot for the final time step and has never been a statistically significant hot spot before. An intensifying hot spot is a grid that has been a statistically significant hot spot for 90% of the time-step intervals, including the final time step, and in which the intensity of clustering of high counts in each time step is increasing and that increase is statistically significant.
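The Mann-Kendall test described above can be sketched in a few lines. The version below uses the variance n(n-1)(2n+5)/18 quoted in the text, which assumes no tied values, and adds the usual continuity-corrected normal approximation for the z-score and two-sided p-value. The function name `mann_kendall` and the toy series are hypothetical, and the sketch is not the implementation inside the ArcGIS tool.

```python
import math


def mann_kendall(series):
    """Mann-Kendall trend test without tie correction.

    series: sequence of values ordered in time (e.g., the 10 daily
            Gi* z-scores of one grid column in the space-time cube).
    Returns (s, z, p): the S statistic, the continuity-corrected
    z-score, and the two-sided p-value (normal approximation).
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")

    # Compare every value with every earlier value and sum the signs.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)

    var_s = n * (n - 1) * (2 * n + 5) / 18.0  # valid when there are no ties

    # Continuity correction: shrink |S| by one before standardizing.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    p = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p


if __name__ == "__main__":
    # A steadily increasing 10-day series gives the maximum S of 45.
    print(mann_kendall([0.1 * k for k in range(10)]))
```

A significantly positive z-score on a column of hot-spot z-scores corresponds to an intensifying pattern, and a significantly negative one to a diminishing pattern, in the sense of the definitions given in the text.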
Detailed information regarding the other types of dynamic patterns is presented in Appendix 2.

A driving mechanism analysis was performed in terms of spatial functional differentiation, timing of urban activities, and regulations and local rules to determine the factors driving population dynamics and their evolving patterns on Xiamen Island. Local clustering analysis (Gi*) based on three significance levels (i.e., p = 0.01, 0.05, and 0.1) was performed to detect the hot spots for each POI category, by which the effects of spatial functional differentiation on population dynamic patterns can be determined. We used the PDI for the four time slots (i.e., morning, noon, afternoon, and night), which were defined in subsection 2.2, to detect the effects of the timing of activities on the population dynamic patterns. The population dynamic patterns before and during the BRICS Summit were used to measure the effect of regulations and local rules.

The spatiotemporal distribution of the hot and cold grids at different time slots was represented in the form of a space-time cube. Each space-time cube represents one day in a 400 m × 400 m area. Each column contained 10 cubes that represented the timeline from August 25 to September 3 (Fig. 8). As shown in Fig. 8, most of the cold grids were distributed in the north and south of Xiamen Island, and the hot grids appeared in an "X" shape along the major streets through the downtown area. The remaining insignificant grids were observed at the junction between the hot and cold areas. In some areas, the grid clustering type varied over time, which means that the clustering type can change from hot to cold or vice versa. A series of dynamic changes was observed on the east coast of the island, especially in terms of the clustering type at the summit venue. Some grids changed from hot to cold during the test slot. The clustering types varied substantially among the different areas (Fig. 9).
The total number of grids with different clustering types in the subdistricts is illustrated from August 25 to September 3. Many subdistricts showed a single-type distribution dominated by either hot or cold grids, e.g., the Heshan, Jiangtou, Yuandang, Binhai, Lujiang, Jialian, and Zhonghua subdistricts.

accounting for 31.8% to 41.9% of the total grids. However, they were rarely observed on August 27 (Sunday), when they constituted 16.0% to 24.5% of the total grids. These findings suggest that people were resting rather than gathering in public places. Furthermore, in the first half of September 3, which coincided with the BRICS opening morning, hot grids reduced significantly to 26.1% and 25.7%. Local residents were advised not to be out on the morning of the BRICS opening day for security reasons. However, the hot grids for the afternoon and night slots did not decrease significantly compared with the same time slots of the previous days. Furthermore, a slightly increasing trend was observed during the night slot, suggesting that local residents preferred to gather after the opening.

The dynamics of the cold grids were almost opposite to those of the hot grids. The cold grids were observed the most on August 27 (Sunday) in the morning and noon slots, which constituted 42.9% to 52.0% of the total grids, respectively. This finding indicates that the local people were relaxing at home on Sunday rather than working or commuting. Another interesting phenomenon was that the number of insignificant grids (or hybrid dynamics) remained at almost the same level for the entire day, and the insignificant grids were normally distributed at the boundary of the hot and cold areas (Fig. 8).

A total of 17 population dynamic patterns were identified at various time slots from August 25 to September 3 (Fig. 11). The detailed spatial results are shown in respectively.
These cold grids were commonly associated with industrial, port, newly built-up, and mountainous areas. A new hot block was observed in the Dianqian subdistrict in the afternoon and night slots, indicating that people were increasingly attracted to this area as urbanization proceeded. One historical cold block and 18 diminishing cold grids were identified in the Lianqian subdistrict in the night slot, suggesting that the population intensities in these cold grids were increasing, and one of those grids did not show any cold characteristics on the last day. In total, 16, 28, and 32 new cold grids were observed for the morning, noon, and afternoon slots, respectively. This means that these grids had never been cold until the last day. Among them, a new cold grid was located near the Xiamen International Convention and Exhibition Center in the noon slot, demonstrating the powerful effect of the regulations on population movement.

The hot spots for each POI category were detected (Fig. 12). The greatest spatial autocorrelation was observed in the CPOIs, followed by the RPOIs, OPOIs, LPOIs, SPOIs, and EPOIs. Most of the hot spots for commercial and life services and for restaurants and snack bars were located in the downtown area from the southwest to the center of the island. Meanwhile, most of the hot spots for scenic and green spaces were distributed in the southwest of the island, which is the traditional downtown area of Xiamen city. The hot spots for offices, leisure, and sports were more scattered around the island, and, unlike the other categories, some hot spots for offices were observed in the north of the island. Hot spots for education and health, however, were primarily distributed along the west coast or east coast of the island. Fig. 12 indicates that most of the insignificant hot spots (cold spots) for each category were in the north and south of the island; these areas were limited to mountains, airports, industrial areas, or wetland parks.
A new deep mining framework using a nearly unbiased and real-time big data set is proposed herein to characterize the population dynamics and its evolving patterns. Historically, questionnaires, sampling, physical modelling, and theory-driven extrapolation beyond the observed data have been widely used in demography. Choropleth mapping and dasymetric mapping, for example, are based on Tobler's First Law of Geography, i.e., everything is related to everything else, but near things are more related than distant things (Tobler, 1970). They can extrapolate any quantitative variable based on geographical units, such as distance to the observed phenomenon. However, in the calculated population distribution, these methods always create an impression of an overly uniform area at the edge of two different land-use types (such as an urban area and a water body) or change abruptly between two adjacent subdistricts. Other researchers have recognized these problems (Nikola, Branislav, Milan, & Dragutin, 2011; Mennis, 2003).

In the big data era, a deluge of earth system data has become available. For instance, social networking and user-generated web content, which has been termed volunteered geographic information (VGI; Goodchild, 2007), has gained more attention in recent years. In particular, the mobile phone has become an important platform by which interactions between individuals and their geographic space can be observed. Data volumes are already well beyond dozens of petabytes, with rapidly increasing transmission rates exceeding hundreds of terabytes per day (Agapiou, 2017). Therefore, multiple VGI data have provided a well-informed source of demographic behavior data owing to their nearly real-time and full coverage. However, the capability of information processing methods has not kept pace with data availability.
Several major challenges must be addressed, such as how to extract knowledge from data and how to derive models and predictions that learn much more from data than traditional physical modeling approaches do. This point of view has been emphasized by Reichstein et al. (2019). Although population distributions have been conducted using multiple VGI data previously worth of data was used in their study as basic data, i.e., February 18, 2017 (Saturday) as an off day and February 20, 2017 (Monday) as a working day. Relatively static results were used to represent the population distribution in these two days. However, more data are required to understand the population dynamic pattern and its underlying driving mechanism using big data and deep mining techniques. Deep/machine learning and data-driven approaches are used increasingly to characterize patterns and gain insights with the support of the ever-increasing stream of big data, which has been high latitudes (Forkel, 2016). However, to our knowledge, reports on the spatiotemporal distribution of urban populations and their dynamic patterns using data-driven and deep mining approaches are not available.

Compared with other related studies, our proposed framework offers several advantages. First, we developed a data-driven approach to characterize the population dynamic patterns at each grid from the Baidu heat maps, with a total volume of 1.83 GB from August 25 to September 3, 2017 (Appendix 1), and to identify the underlying driving mechanism based on 149,074 POIs (Figs. 4 and 5), which were acquired from the Baidu map. Second, we developed a machine learning method integrating a multiple testing operation with an FDR procedure to optimize the identification of the spatiotemporal distribution of the population. Third, a space-time cube method was designed to statistically represent the hot/cold spots at one grid per day at a specific time slot (Fig. 8).
The framework proposed in this study can be easily applied to extract useful information from other spatiotemporal big data sets. Using precipitation radar data as an example, more attention should be paid to the consecutive, intensifying, and persistent hot areas to mitigate flooding, whereas the drought probabilities in consecutive, intensifying, and persistent cold areas are higher than those in other places. This approach may offer a fresh data-driven perspective on water-related hazard modeling and forecasting compared with traditional physics-based models.

Based on the results of this study, we discovered that urban spatial functional differentiation is the most essential factor affecting the spatial distribution of a population (Figs. 11 and 12). In summary, the distribution patterns of a population are significantly spatially correlated with the distribution patterns of POIs. However, the hot spots of a population are not related to a single type of POI, and mixed use is the most significant driving force of population distribution. Since the 1980s, mixed use has regained attention through theories such as sustainable development and new urbanism (Grant, 2002). This point of view was advocated by Yue et al. (2017). China has witnessed rapid development since the reform and opening-up policy was introduced decades ago. However, infrastructure such as life services, hospitals, universities, finance, and business offices is primarily located downtown in urban areas, resulting in uneven social and economic development compared with rural areas. He et al. (2019) reported the same results. Meanwhile, the cold spots of the population were highly consistent with the cold spots of POIs for each category, which were concentrated in mountains, industrial parks, and newly developed residential areas.

The timing of various urban activities is an internal factor that determines the population dynamics.
Hot grids were primarily observed on work days. During the weekend, however, people typically relax at home or visit relatives and friends. These types of lifestyles resulted in a more scattered clustering type (Figs. 9 and 10); similar findings have been reported in previous studies (Leng, Ying, Huang, & Zheng, 2015; Li, Li, Yuan, & Li, 2019). This investigation also revealed that a fixed number of people were distributed and moved randomly and evenly throughout the area and time, resulting in many hybrid grids. These hybrid grids were typically observed at the boundary of hot and cold areas, which was not reported in previous studies.

Regulations and local rules are external factors that significantly affected the spatiotemporal distribution of the population. Quantitative evidence from this investigation revealed that the number of people decreased before and during the BRICS meeting days, with a significant decrease observed on September 2 (Fig. 10), and several new cold grids appeared near the BRICS Summit venue, the airport, and the restricted areas (Fig. 11). To our knowledge, this is the first study to quantify the effects of regulations and local rules on population dynamics.

Based on this study's findings, and supported by the relevant literature presented earlier, several implications for policy recommendations on sustainable urban development can be drawn. Firstly, and most significantly, the data-driven framework and supporting methods could be used to monitor and evaluate how population dynamics respond to external factors, such as the implementation of specific regulations and local rules to control urban development and planning. Secondly, the framework could be applied not only to special events such as the 2017 BRICS Summit but also to other events, especially natural and epidemic disasters (such as typhoons and COVID-19), to reveal how people react to risk, including
their movements in relation to urban space, facilities, and services, which can contribute to the design of effective response or adaptation measures. Lastly, a basic but important implication is that the key factors (i.e., the driving mechanism in this study) related to land use types (including different POIs and mixed use) that influence population distribution could inform land use plans and control measures associated with population activities.

The data-driven framework proposed in this study offers new insights into the urban population dynamics and its driving mechanism in Xiamen Island. Hot grids were primarily

The authors declare that they have no conflicts of interest in this work. We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

This research was supported by the National Natural Science Foundation of China (Grant No. 41971231; Grant No. 41471154). We thank Dr. Wang Shih-Chi from the U.S. EPA for his assistance with English writing and the anonymous reviewers for their constructive feedback and comments, which helped improve this paper.

Remote sensing heritage in a petabyte-scale: satellite data and heritage Earth Engine© applications
Terrestrial gross carbon dioxide uptake: global distribution and covariation with climate
Spatio-temporal crime predictions in smart cities: A data-driven approach and experiments
Delineating urban functional areas with building-level social media data: A dynamic time warping (DTW) distance based k-medoids method
Chinese Search Engine Market Share
Migration and Development: A Theoretical Perspective
Interprovincial Migration, Population Redistribution, and Regional Development in China: 1990 and 2000 Census Comparisons
Enhanced seasonal CO2 exchange caused by amplified plant productivity in northern ecosystems
Spatio-Temporal Analytics for Exploring Human Mobility Patterns and Urban Dynamics in the Mobile Age
City dynamics through Twitter: Relationships between land use and spatiotemporal demographics
The Analysis of Spatial Association by Use of Distance Statistics
Mixed use in theory and practice: Canadian experience with implementing a planning principle
Citizens as Sensors: The World of Volunteered Geography
How do multiple testing correction and spatial autocorrelation affect areal boundary analysis
Exact distribution of the Mann-Kendall trend test statistic for persistent data
The impact of urban growth patterns on urban vitality in newly built-up areas based on an association rules analysis using geographical 'big data'
The spatial organization pattern of urban-rural integration in urban agglomerations in China: An agglomeration-diffusion analysis of the population and firms
Activity patterns, socioeconomic status and urban spatial structure: what can social media data tell us?
The death and life of Great American cities
Evaluating the impact of land-use density and mix on spatiotemporal urban activity patterns: an exploratory study using mobile phone data
Rank Correlation Methods
Big Data Based Job-residence Relation In Chongqing Metropolitan Area
Spatiotemporal distribution characteristics and mechanism analysis of urban population density: A case of Xi'an
Urban land uses and traffic 'source-sink areas': Evidence from GPS-enabled taxi data in Shanghai
Using multi-source big data to understand the factors affecting urban park use in Wuhan. Urban Forestry & Urban Greening
Nonparametric tests against trend
Generating Surface Models of Population Using Dasymetric Mapping
Modelling the spatial distribution of Vojvodina's population by using dasymetric method
Local Spatial Autocorrelation Statistics: Distributional Issues and an Application
Spatial pattern of population mobility among cities in China: Case study of the National Day plus Mid-Autumn Festival based on Tencent migration data
Deep learning and process understanding for data-driven Earth system science
A research on complex network of Chinese interprovincial migration based on the fifth population census
Analysis of the 'News Divine' stampede disaster
Urban function connectivity: Characterisation of functional urban streets with social media check-in data
Mining point-of-interest data from social networks for urban land use classification and disaggregation. Computers, Environment and Urban Systems
Code for classification of urban land use and planning standards of development land
A Computer Movie Simulating Urban Growth in the Detroit Region
Measure of urban-rural transformation in Beijing-Tianjin-Hebei region in the new millennium: Population-land-industry perspective
Measurements of POI-based mixed use and their relationships with neighbourhood vibrancy
Dynamic population flow based risk analysis of infectious disease propagation in a metropolis
Delineation of an urban agglomeration boundary based on Sina Weibo microblog 'check-in' data: A case study of the Yangtze River Delta
Early Warning of Human Crowds Based on Query Data from Baidu Maps: Analysis Based on Shanghai Stampede

New Hot Spot: A grid that is a statistically significant hot spot for the final time step and has never been a statistically significant hot spot before.
Consecutive Hot Spot: A grid with a single uninterrupted series of statistically significant hot spots in the final time-step intervals. The grid has never been a statistically significant hot spot prior to the final hot spot series, and less than ninety percent of all time-step intervals are statistically significant hot spots.
Intensifying Hot Spot: A grid that has been a statistically significant hot spot for ninety percent of the time-step intervals, including the final time step, and the intensity of clustering in each time step is increasing overall and that increase is statistically significant.
Persistent Hot Spot: A grid that has been a statistically significant hot spot for ninety percent of the time-step intervals with no discernible trend indicating an increase or decrease in the intensity of clustering over time.
Diminishing Hot Spot: A grid that has been a statistically significant hot spot for ninety percent of the time-step intervals, including the final time step, and the intensity of clustering in each time step is decreasing overall and that decrease is statistically significant.
Sporadic Hot Spot: A grid that is an on-again then off-again hot spot. Less than ninety percent of the time-step intervals have been statistically significant hot spots and none of the time-step intervals have been statistically significant cold spots.
Oscillating Hot Spot: A statistically significant hot spot for the final time-step interval that has a history of also being a statistically significant cold spot during a prior time step. Less than ninety percent of the time-step intervals have been statistically significant hot spots.
Historical Hot Spot: The most recent time period is not hot, but at least ninety percent of the time-step intervals have been statistically significant hot spots.
New Cold Spot: A grid that is a statistically significant cold spot for the final time step and has never been a statistically significant cold spot before.
Consecutive Cold Spot: A grid with a single uninterrupted series of statistically significant cold spots in the final time-step intervals. The grid has never been a statistically significant cold spot prior to the final cold spot series, and less than ninety percent of all time-step intervals are statistically significant cold spots.
Intensifying Cold Spot: A grid that has been a statistically significant cold spot for ninety percent of the time-step intervals, including the final time step, and the intensity of clustering in each time step is increasing overall and that increase is statistically significant.
Persistent Cold Spot: A grid that has been a statistically significant cold spot for ninety percent of the time-step intervals with no discernible trend indicating an increase or decrease in the intensity of clustering over time.
Diminishing Cold Spot: A grid that has been a statistically significant cold spot for ninety percent of the time-step intervals, including the final time step, and the intensity of clustering in each time step is decreasing overall and that decrease is statistically significant.
Sporadic Cold Spot: A grid that is an on-again then off-again cold spot. Less than ninety percent of the time-step intervals have been statistically significant cold spots and none of the time-step intervals have been statistically significant hot spots.
Oscillating Cold Spot: A statistically significant cold spot for the final time-step interval that has a history of also being a statistically significant hot spot during a prior time step. Less than ninety percent of the time-step intervals have been statistically significant cold spots.
Historical Cold Spot: The most recent time period is not cold, but at least ninety percent of the time-step intervals have been statistically significant cold spots.
No Pattern Detected: Does not fall into any of the hot or cold spot patterns defined above.
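
The hot and cold spot pattern definitions given in this section can be read as a decision rule applied to each grid's time series. The following Python sketch is one possible reading of those definitions, not the implementation used in the study. Several details are assumptions made for illustration: the input is a list of per-time-step clustering z-scores for one grid, significance is approximated by |z| > 1.96 rather than by corrected p-values, the trend is assessed with a basic Mann-Kendall test without tie correction, the category labels follow the usual emerging hot spot naming, and the order of checks (used where definitions overlap) is a choice of this sketch. The function names are hypothetical.

```python
from math import erfc, sqrt


def mann_kendall_trend(values, alpha=0.05):
    """Return +1 (significant increase), -1 (significant decrease), or 0.

    Basic Mann-Kendall test without tie correction (an assumption).
    """
    n = len(values)
    s = sum(
        (values[j] > values[i]) - (values[j] < values[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if var_s == 0 or s == 0:
        return 0
    # Continuity-corrected standard normal statistic.
    z = (s - 1) / sqrt(var_s) if s > 0 else (s + 1) / sqrt(var_s)
    p_two_sided = erfc(abs(z) / sqrt(2))
    if p_two_sided < alpha:
        return 1 if z > 0 else -1
    return 0


def _classify_sign(z, trend, threshold):
    """Apply the hot spot definitions; cold spots reuse this on negated input."""
    n = len(z)
    hot = [v > threshold for v in z]
    cold = [v < -threshold for v in z]
    if not any(hot):
        return None
    if 10 * sum(hot) >= 9 * n:  # at least ninety percent of the intervals
        if not hot[-1]:
            return "Historical"
        if trend > 0:
            return "Intensifying"
        if trend < 0:
            return "Diminishing"
        return "Persistent"
    # Length of the uninterrupted significant run ending at the final step.
    run = 0
    for flag in reversed(hot):
        if not flag:
            break
        run += 1
    if run > 0 and not any(hot[: n - run]):
        return "New" if run == 1 else "Consecutive"
    if hot[-1] and any(cold[:-1]):
        return "Oscillating"
    if not any(cold):
        return "Sporadic"
    return None


def classify_pattern(z_scores, threshold=1.96, alpha=0.05):
    """Label one grid's z-score time series with a hot or cold spot pattern."""
    trend = mann_kendall_trend(z_scores, alpha)
    label = _classify_sign(z_scores, trend, threshold)
    if label:
        return label + " Hot Spot"
    # Mirror the series so that cold spots are handled by the same rules.
    label = _classify_sign([-v for v in z_scores], -trend, threshold)
    if label:
        return label + " Cold Spot"
    return "No Pattern Detected"
```

For example, `classify_pattern([0.0] * 7 + [2.5, 2.5, 2.5])` falls under the consecutive hot spot definition, because the grid is significant only in an uninterrupted run of final time steps. Negating the series before reusing the hot spot rules keeps the cold spot logic identical, so "increasing intensity" for a cold spot corresponds to z-scores becoming more negative.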