key: cord-0058628-ias9gkkl authors: Gonschorek, Julia title: Geoprofiling in the Context of Civil Security: KDE Process Optimisation for Hotspot Analysis of Massive Emergency Location Data date: 2020-08-19 journal: Computational Science and Its Applications - ICCSA 2020 DOI: 10.1007/978-3-030-58811-3_42 sha: 8151b85f6d141ba073a04a28220ea9fc780c0e02 doc_id: 58628 cord_uid: ias9gkkl

In the performance of their duties, authorities and organisations with safety and security tasks face major challenges. As a result, there is a growing need to expand the knowledge and skills of security forces in a targeted manner through knowledge-based, systemic, and technological solutions. Of particular importance for this inhomogeneous end-user group is the time factor, and thus in general also space, distance, and velocity. Authorities focus on people, goods, and infrastructure in the fields of prevention, protection, and rescue. For purposive tactical, strategic, and operational planning, geodata and information about past and ongoing operations, dispatched and archived at control centres, can be used. For that reason, a rule-based process for the geovisual evaluation of massive spatio-temporal data is developed using geoinformation methods, techniques, and technologies, illustrated by the example of operational emergency data from fire brigades and rescue services. This contribution to the extension of the KDE for hotspot analysis aims to enable professional and managerial personnel to create well-founded geoprofiles based on the spatio-temporal location, distribution, and typology of emergency mission hotspots. In doing so, significant data is generated for the neighbourhood of the operations in abstract spatial segments and is used to calculate distance measures for the Kernel Density Estimation (KDE) process. The result is a fully derived, rule-based KDE process for the geovisual analysis of massive spatio-temporal data for hotspot geoprofiling.
This includes emergency response plans for major emergencies and special protection plans for particularly endangered objects. Geoprofiling, also known as geographic profiling or geographic profiling analysis, should be understood as a bundle of methods. "[The] geographic locations of an offender's crimes […] are used to identify and prioritize areas where the offender is likely to live." [1] Geoprofiling in the context of civil security was first adopted by Rossmo in the 1990s as a theme in perpetrator identification. The main function of this investigative methodology is "to prioritise suspects and assist in investigative information management." [2] The differences between the requirements of geoprofiling for police applications and for the fire brigade and rescue services stem from the respective areas of responsibility and competence. Emergency location and time are known very precisely; the task is not to identify offenders, reconstruct the course of events, or estimate possible further victims in the case of serial offenders. Moreover, in the field of responsibility of the fire brigade and the rescue service there are usually no spatio-temporal connections to preceding or subsequent emergencies. Exceptions include serial arsonists ("fire devils") and vandalism as well as domestic violence; however, these are investigated by the police. Fire and rescue services provide care for injured persons in such situations and secure the infrastructure in the event of fire or possibly leaking hazardous substances. Geovisual analysis with geoprofiles in non-police control centres is used to plan requirements and the deployment of forces. A typical question concerns the change of hotspots, e.g. regarding their spatio-temporal stability. Figure 1 shows the spatial association and aggregation of emergency events, mapped in field cartograms for geoprofiling (a model space), with temporal variances and classified into different density intensity levels.
Detailed predictive information based on spatio-temporal data analyses can help to plan efficiently, in a demand- and focus-oriented manner. The visualisation of spatio-temporal peculiarities represents a value in itself, because humans perceive the predominant part of information visually. In addition, visualisation can sensitise practitioners in control centres to certain, possibly recurring and therefore plannable deployment strategies. The mass data analysed for this purpose are past emergencies, i.e. operations involving the fire and rescue services. The mission data are available in their entirety. The aim is to identify significant spatial, temporal, or spatio-temporal aggregations of emergencies visually, not to calculate possible or statistically probable event locations in space (e.g. by using methods of the kriging family) [4]. Aggregations need to be defined, and a distinction must be made here between hotpoints and hotspots. Hotpoints mark punctual locations where emergencies occur repeatedly within a defined temporal interval and which show significantly strong event intensities in their neighbourhood [3]. In contrast to hotpoints, hotspots describe areas. This is often formulated imprecisely in the literature. Hotspots can be understood as a concentration or clustering of events in space. In criminalistics, these spaces are regarded as high-crime areas [5]. It is often assumed that there is a connection between aggregations of events and their spatial association. It is likely that events of this kind will occur again in the space identified as a hotspot. However, not every location in a hotspot becomes the scene of such an event [5]. These definitions have in common that they assume that the totality of all events describes a hotspot.
This means that every event, even a statistically random one, is taken into account when determining the hotspot and thus contributes to the hotspot's profile (appearance, form, and expression, cf. Figs. 1 and 2). These assumptions are factually unfounded and mathematically incorrect. Here the problem is described and a process is developed that feeds only the hotspot-characterising events, namely hotpoints, into the analysis and thus allows more objective, because rule-based, visualisations of spaces of high event density. Therefore it is defined: hotspots mark areas based on the existence of hotpoints. Hotspots consist of a high density of hotpoints. Each hotspot has at least one density centre, i.e. an area of highest event density [3]. Depending on predefined time intervals, such as time of day or month, hotspots can be determined and statements can be made about their behaviour in terms of permanence or periodicity, mobility or movement patterns, and aggregation or dissimilation [6, 7]. The results can be used for more detailed situation assessments and form a basis for prognostic and geovisual analyses. Long-term hotspots become visible and analysable for strategic and operational tasks. This makes it possible to identify strengths and weaknesses in previous deployment practice and subsequently to adapt to identified needs. The aim is to support fire brigades and rescue services in planning the required resources with data analyses that go beyond descriptive statistics, and to make mission-relevant information visible and communicable. In this way, together with the long-term experience of the emergency forces, a set of instruments is created that strengthens the ability to act and supports adaptation processes in changing hazard situations. Related work presents a large variety of methods that are often presented one after the other, with their respective advantages and disadvantages pointed out.
The connecting elements between the approaches are rarely pointed out. As a result, separate analyses and visualisations are produced whose individual expressiveness is limited [1, 2, 5, 8-10]. On the other hand, methods are combined in so many ways, for example in the approach of systemically linked views in software, that the multiple visual representations of one or more variables are not always easy to interpret and can sometimes overwhelm the viewer [12]. A central element for most authors is the neighbourhood of the events to be investigated, usually reduced to their density. The understanding of neighbourhood underlying this work is based on the spatial and temporal proximity of events or hotpoints and hotspots. Tobler's first law of geography describes this connection in general terms: "Everything is related to everything else, but near things are more related than distant things." [13] In this field of application, emergencies that are several kilometres apart cannot be described as spatially close. This is a scale problem in cartographic representation. It should be noted that events that are spatially distant from each other are not adjacent: the spatial distance has to be taken into account, and separate events should not be combined into one area. HENGL is convinced: "[Standard] grid resolutions (20-200 m in most cases) with which we work today, will soon shift towards finer and finer, which means that we need to consider grid resolution in time context also." [14] In the real world, 20 m of street length means one to three single-family or terraced houses in residential areas, whereas in the city centre it means several business floors and numerous residential units in vertical construction. Therefore, even shorter distances are not recommended if the information is to be generated at a large scale.
Furthermore, this is a not insignificant data protection issue: identity-exact data analyses are not permitted in Germany for non-police use. Moreover, they are not relevant for the questions and methods presented here. These values were verified in the course of our own unpublished calculations and discussed with the control services of the professional fire brigades of Cologne and Berlin. The resources, i.e. the vehicles of the emergency services, report the geo-coordinate of the operation. This is usually not congruent with the actual location of the operation: an offset of 20 to 40 m is quite common, in some cases even more. With a minimum mesh size as proposed by HENGL, misinterpretations of the field cartograms can therefore occur. To counteract this problem, it is recommended here to set the minimum edge length of the grid cell at 50 m. Larger distances or cell sizes, as often found in US-American crime analyses, should also be evaluated critically: due to the method of representation, field cartograms with coarsely meshed grid cells give the impression that events are homogeneously distributed across the large area, which in the real world can include two to three street crossings, shopping arcades, and high-rise commercial buildings. Cell size and search radius are of considerable importance for KDE, which is realised in field cartograms. It should be noted here that a mesh or cell size of 50 m × 50 m to 200 m × 200 m is recommended for this field of application as a sensible distance measure for neighbourhood, i.e. the distance between events. In addition, spatial distances must be delimited from each other in case of semantic proximity. This is possible in urban areas by abstract segmentation of space and is based on the functional selection of subspaces even before geo- and statistical analyses are applied. In the transition from model space to representation space, the scale is introduced.
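The effect of the recommended 50 m minimum cell size can be sketched in a few lines (a minimal illustration with hypothetical coordinates; it assumes projected metric coordinates such as UTM, and the helper names are not from any particular GIS):

```python
from collections import Counter

def snap_to_grid(x, y, cell_size=50.0):
    """Snap a projected coordinate (metres, e.g. UTM) to the
    lower-left corner of its grid cell."""
    return (int(x // cell_size) * cell_size,
            int(y // cell_size) * cell_size)

def events_per_cell(events, cell_size=50.0):
    """Count emergency events per grid cell."""
    return Counter(snap_to_grid(x, y, cell_size) for x, y in events)

# Two reports of the same incident whose vehicle-reported
# positions are 30 m apart:
counts = events_per_cell([(383010.0, 5818020.0), (383040.0, 5818020.0)])
```

Because reported positions may deviate 20 to 40 m from the actual emergency location, a 50 m cell absorbs much of this offset (here both reports land in the same cell), whereas a finer 20 m grid would often split such duplicate reports into different cells.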
Each representation space is scaled; this effect does not exist in the data space. In addition, the facts under consideration are scaled spatially and semantically. The data considered have semantic information and spatial coordinates. The former can be classified into categories. These categories were analysed with respect to their spatial distribution in order to identify patterns. Territorial units such as street sections, building blocks, and statistical and administrative units are deliberately avoided. Spatial units consist of semantic categories based on the function of space; examples are public playgrounds, railways, airports, industrial areas, bridges, and residential areas. This cannot be mapped onto territorial units without causing misinterpretations. The semantic facts are categorised, and space is defined from them. This does not correspond to the conventional way of first defining the spatial reference and then mapping the data onto it. The transformation from model to real space is data-driven, focusing on the position of points within the area. The common denominator is formed by field cartograms as a special form of area cartograms; in concrete terms, regular square grids are formed. Thus a common area reference is available in which different semantic and spatial questions can be processed. After analysis, the segments formed are reassembled in the final field map. In this way it is possible to reduce the amount of data (segmentation by selection) and to increase the information content of the result (aggregation). Similar to the determination of hotpoints, various methods of cartography and statistics can be used to calculate and visualise hotspots. The aim of the approach presented here is to extend the spatial reference from single-point to area information. Therefore, a transformation of the point geometry into an area geometry is necessary.
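The two data-reduction steps named above, segmentation by selection and aggregation into a regular square grid, can be illustrated with a minimal sketch (the record layout and category names are hypothetical; real analyses would draw the semantic categories from the control-centre database):

```python
from collections import Counter

# Hypothetical records: (x, y, semantic_category) in metric coordinates.
events = [
    (100.0, 100.0, "residential"),
    (120.0, 110.0, "residential"),
    (900.0, 900.0, "industrial"),
]

def segment(events, category):
    """Segmentation by selection: keep only events belonging to one
    functional subspace (semantic category)."""
    return [(x, y) for x, y, c in events if c == category]

def aggregate(points, cell=50.0):
    """Aggregation: count points per regular square grid cell,
    keyed by integer cell indices."""
    return Counter((int(x // cell), int(y // cell)) for x, y in points)

residential = aggregate(segment(events, "residential"))
```

The segment masks reduce the data volume before any statistics are applied; the grid aggregation then provides the common area reference in which differently segmented analyses can be recombined.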
Known methods include, among others, interpolation by isolines, cluster analyses such as NNH and k-means, and the creation of choropleth maps in which the point information is related to areas as densities. Kriging and inverse distance weighting are able to estimate intensities between known points and output them as area information. However, if a geoprofile is needed, the methods mentioned are not suitable. When choosing a method, the objective is of central importance. This involves the development of hotspot maps in which the location, distribution, density values, and movement patterns of deployment hotspots in urban space become visible. Based on the real distribution of events in both space and time, the basic assumption holds: fire and rescue operations do not take place everywhere in the entire urban area and are often not homogeneously distributed, neither in space nor in time. Furthermore, the focus is not on estimating the probability of occurrence of events, but on the information about neighbourhoods contained in the data in the form of spatio-temporal event densities. One method that has proven suitable and has been accepted in criminology over the past 30 years is the KDE. According to Nadaraya, a sufficiently large sample and a correspondingly selected bandwidth allow an arbitrarily good estimation of the unknown event distribution by kernel density estimation [15]. This is a non-parametric method for density interpolation. A pioneer in this field is PARZEN, who worked on probability density functions from the 1950s onwards [16, 17]. The prerequisite for applying the continuous estimator to the density of the distribution of individual events is that the random variables are independent of each other. It is generally true that the deployments of the fire brigade and the rescue service are not interdependent and that random samples from the deployment database are unrelated.
How KDE works: first, in a GIS, a uniform grid is laid over the study area. The spatial reference is provided by the georeferenced base map data of the study area (e.g. UTM). The grid has a maximum north-south and west-east extent according to the boundaries of the investigation area. Starting from the cell centre x, the kernel function K moves from cell to cell and searches for events X_i within the fixed bandwidth d. The events that lie within this window are weighted according to their distance from the cell centre. The cell centre is therefore the point at which the density is estimated. The following applies: events that lie closer to the centre point receive a greater weight than events that lie further away. Finally, the summed and averaged density value is transferred to the raster cell. The adjustable parameters of the KDE are the kernel function, the cell size, and the bandwidth. Kernel functions, or kernels, are estimation functions that act as weighting functions. There are numerous functions that can be used for this purpose. SILVERMAN has shown in empirical studies that the differences in the estimation results are minor regardless of the choice of kernel function: "It is quite remarkable that the efficiencies obtained are so close to one […]." [15] LEVINE confirms this in principle, but shows differentiated effects for geospatial application: "Each method of interpolation will produce slightly different results. Triangular and negative exponential functions tend to produce and emphasize many small hot […] spots and thus produce a "mottled" appearance on […] [the] map. Quartic, uniform, and normal distribution functions tend to smooth the data more." [10] The choice of cell size influences the calculation of density values and the visual granularity, i.e. the graphical output of density values into the field cartogram.
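The grid-based KDE described above can be sketched in a few dozen lines (a minimal illustration using a quartic kernel, one of the kernels LEVINE discusses; the function names and grid layout are illustrative and not taken from any particular GIS):

```python
import math

def quartic(u):
    """Quartic kernel: nearer events weigh more, zero outside the
    bandwidth window (|u| >= 1)."""
    return (15.0 / 16.0) * (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0

def kde_grid(events, xmin, ymin, ncols, nrows, cell=50.0, bandwidth=345.0):
    """Estimate a density value at every cell centre of a regular grid.

    For each cell centre, events within the bandwidth are weighted by
    their distance and summed; the result is written to the cell."""
    grid = [[0.0] * ncols for _ in range(nrows)]
    for r in range(nrows):
        for c in range(ncols):
            cx = xmin + (c + 0.5) * cell   # cell centre x
            cy = ymin + (r + 0.5) * cell   # cell centre y
            for x, y in events:
                dist = math.hypot(x - cx, y - cy)
                grid[r][c] += quartic(dist / bandwidth) / (bandwidth ** 2)
    return grid

# One event near cell (1, 1) of a small 4 x 4 grid:
density = kde_grid([(175.0, 175.0)], 0.0, 0.0,
                   ncols=4, nrows=4, cell=100.0, bandwidth=345.0)
```

As expected, the cell whose centre lies closest to the event receives the highest density value, and the value decays with distance until the bandwidth window is left.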
If one compares different cell sizes on the same data basis, the cartographic results sometimes differ considerably. Smaller cell sizes produce a smooth surface; larger cell sizes produce a more granular surface in which hotspots can disappear. For comparability between hotspots of different time intervals, uniform cell sizes should always be chosen [7, 15]. The discussion of a fitting cell size was already conducted above; the results are applied here. The third parameter is the bandwidth (syn.: smoothing parameter, search radius). The bandwidth is fixed during the entire process and must be determined before use. It is generally agreed that the choice of this parameter represents the greatest challenge in KDE. Figure 2 illustrates the effects of different cell sizes and bandwidths on KDE geoprofiles: each geoprofile allows different conclusions. A well-founded basis for decision support and planning for authorities and organisations with safety and security tasks has to be derived differently, because this scope for interpretation is not scientifically acceptable [7]. Hotspots based on conventional bandwidth selection for KDE are usually created by the mass of data, i.e. by quantity [1, 2, 5, 8, 9, 20, 22-24]. The aggregation of the mass data, its weighting, and the neighbourhood analysis by the GETIS-ORD Gi* statistic [18, 19] for bandwidth selection, which is anchored in the concept presented here, establishes a quality criterion. This enables the classification of the geodata into non-significant events as well as weakly significant, significant, high, and highly significant hotpoints. This smaller subset of the original sample is fed into the KDE. It follows that each hotspot identified by the KDE is based on at least one weakly significant hotpoint at its centre. By applying the Gi* test statistic and calculating the mean distances between the hotpoints, spatial proximity is taken into account directly in the KDE.
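The Getis-Ord Gi* statistic used here for hotpoint selection can be sketched as follows (a simplified version with binary weights, where each point counts as its own neighbour; a production analysis would typically rely on a GIS implementation such as the one used later in ArcGIS):

```python
import math

def gi_star(points, values, radius):
    """Getis-Ord Gi* z-scores with binary weights: a point j is a
    neighbour of i if it lies within `radius` of i (including j = i)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i, (xi, yi) in enumerate(points):
        nbrs = [j for j, (xj, yj) in enumerate(points)
                if math.hypot(xi - xj, yi - yj) <= radius]
        w = len(nbrs)                       # sum of binary weights
        num = sum(values[j] for j in nbrs) - mean * w
        den = s * math.sqrt((n * w - w * w) / (n - 1))
        scores.append(num / den if den > 0 else 0.0)
    return scores

# A tight cluster of high event counts vs. two isolated low counts:
z = gi_star([(0.0, 0.0), (10.0, 0.0), (0.0, 10.0),
             (500.0, 500.0), (600.0, 600.0)],
            [5.0, 6.0, 5.0, 1.0, 1.0], radius=50.0)
```

Points inside the high-count cluster receive clearly positive z-scores (above the 1.65 hotpoint threshold), while the isolated low-count points score negative, so only the cluster members survive the hotpoint selection.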
As a result, the hotspots are more sharply distinguishable from one another. The preceding discussion of methods and parameter problems points out interfaces from which a complete, largely automatable process for the rule-based generation of geoprofiles can be derived. A new way of setting the parameters of the KDE process is defined here: a Gi* test statistic preceding the KDE, taking into account the adapted cell-size approach of HENGL, in combination with an extension of the distance approach of Williamson et al. [20, 21] by masking the investigation area into segments (Fig. 2: influence of the KDE-relevant parameters cell size and bandwidth on hotspot field cartograms and their effects on the interpretability of geoprofiles [3]). The analysis is described conceptually as follows; Figure 3 shows the whole process modelled in UML.
1. a. Data pre-processing: a subset to be analysed is selected from the data set. The selection is based on one or several operation types within one or several time intervals. This new data set is used for the subsequent analysis.
b. Spatial pre-processing: the expert in the control centre has thorough knowledge of the study area. He or she divides the total space into abstract segments that can be delimited from other or adjacent spaces, e.g. by spatial-structural elements (streets, fences, water bodies) or by functions of spaces, and generates masks.
2. Hotpoint analysis: a new data set is generated: based on step 1a, all operations located at the same position are summarised, and their number is saved as a new attribute, weight. This data set is fed into the Gi* test statistic and the identified hotpoints are classified.
3. Distance analysis: the distances between the hotpoints determined in step 2 are calculated within their spatial segments (masks), and their mean value d is computed.
4. Hotspot analysis: d is now set as the bandwidth.
The hotpoint data are fed into the KDE procedure and classified.
5. Map products: as a result, at least two visual products are available: a hotpoint map and a hotspot map (geoprofile). This information can be displayed in a map. The interpretation of hotspots combined with the localisation of hotpoints can be important for the analysis and planning of requirements.
6. Furthermore, it is useful to create temporal hotspot series in order to carry out a change analysis. This is realised by choosing several time intervals.
For applications in civil security research, this KDE procedure is thus considerably enhanced in terms of the quality of both the process and the results. For the first time, the visual result is determined neither by individual decisions or aesthetic aspects, nor by rules of thumb or formulas without reference to geographic space. When choosing the two parameters cell size and bandwidth, both the position of the data points relative to each other and the real spatial and temporal anchoring of these events are taken into account directly: characteristics of neighbourhood relations are embedded into the KDE by hotpoint analysis, subsequent data preparation, and distance determination (Fig. 3: rule-based process for geovisual analyses of massive spatio-temporal data of emergency events to generate geoprofiles [3]).
The functionality of the KDE process is demonstrated by means of a case study in Berlin. For this purpose, Table 1 describes the process step by step in graphic and textual form (Table 1: the conception of the rule-based process for generating a geoprofile from massive spatio-temporal data sets, using the case study of 41,498 emergency alerts for rescue vehicles in an abstract space segment within one calendar year [3]).
Raw data: load the complete operations data into a database program. It contains all alert information. The data set is sufficiently large.
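Two of the central preparation steps, collapsing co-located operations into weighted records (step 2) and deriving the bandwidth d as the mean distance between hotpoints within a segment (step 3), can be sketched as follows (a minimal illustration with hypothetical coordinates and function names):

```python
import math
from collections import Counter
from itertools import combinations

def weight_by_location(operations):
    """Step 2 (data preparation): collapse operations at identical
    coordinates into one record with a `weight` attribute."""
    counts = Counter(operations)
    return [(x, y, w) for (x, y), w in counts.items()]

def mean_pairwise_distance(points):
    """Step 3: mean distance between the hotpoints inside one spatial
    segment; this value is used as KDE bandwidth d in step 4."""
    dists = [math.hypot(ax - bx, ay - by)
             for (ax, ay), (bx, by) in combinations(points, 2)]
    return sum(dists) / len(dists)

# Two operations at the same address plus one elsewhere:
weighted = weight_by_location([(0.0, 0.0), (0.0, 0.0), (100.0, 0.0)])

# Three hotpoints in one segment; pairwise distances 300, 600, 300 m:
d = mean_pairwise_distance([(0.0, 0.0), (300.0, 0.0), (600.0, 0.0)])
```

Feeding the weighted records into the Gi* statistic and using d as the bandwidth ties both KDE parameters to the actual spatial configuration of the data rather than to a rule of thumb.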
The quality check follows: incorrect and incomplete entries are not present, and a uniform georeference is available; steps to clean up or delete data are therefore not necessary.
Gi* test statistic: apply the Gi* test statistic to the data set in ArcGIS. The search radius is set to 50 m. The newly created data set is saved.
Selection: within the new data set, hotpoints are selected and all other data are removed. The applied selection rule: z-score ≥ 1.65 and p-value ≥ 0.95. A total of 379 hotpoints were identified. These can be displayed in the GIS (point map). The hotpoint data set is saved.
Classification: the hotpoints of the hotpoint data set are classified. The applied classification rules:
z-score ≥ 3.28 and p-value ≥ 0.9995 := highly significant
z-score ≥ 2.58 and p-value ≥ 0.995 to < 0.9995 := high
z-score ≥ 1.96 and p-value ≥ 0.975 to < 0.995 := significant
z-score ≥ 1.65 and p-value ≥ 0.95 to < 0.975 := weakly significant
The point map shows all calculated hotpoints (points in shades of red) and all non-significant data (points in black). An exemplary representation of the regular grid is shown, because at the chosen display scale the 50 m × 50 m cells would be too small to be visually detectable as such.
Distance analysis: the distances of the hotpoints to each other are determined and their arithmetic mean is calculated. This value (d = 345 m) is fed into the kernel density estimation as the bandwidth.
KDE: in ArcGIS, the KDE is run on the hotpoint data set with a cell size of 50 m × 50 m and the bandwidth d = 345 m. The result is the hotspot field cartogram.
The central research question of this dissertation project is: how must an automatable process be described in order to analyse point mass data of real emergencies geovisually with regard to their location in space and time and to make them interpretable?
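The z-score classification rules from the case study can be expressed as a small helper function (an illustrative sketch; the class names follow the four significance levels distinguished in this concept, and scores below 1.65 are not hotpoints at all):

```python
def classify_hotpoint(z):
    """Map a Gi* z-score to its significance class.

    Thresholds as in the case study: 3.28 / 2.58 / 1.96 / 1.65."""
    if z >= 3.28:
        return "highly significant"
    if z >= 2.58:
        return "high"
    if z >= 1.96:
        return "significant"
    if z >= 1.65:
        return "weakly significant"
    return "not significant"   # excluded from the KDE input
```

Applying this function to the 379 selected hotpoints yields the colour classes of the point map; everything below the lowest threshold stays black and is excluded from the subsequent KDE.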
To answer this question, a concept was developed that enables rule-based geovisual analyses of the continuously accruing mass data stored in the control centres of authorities and organisations with safety and security tasks. The concept is characterised by the standardisation of central process steps and can be implemented in GIS, for example, via program interfaces. The goal of transforming the point into the area in order to increase the overall informative value is achieved. The KDE procedure is discussed for this purpose, and a solution is offered through the targeted modelling of a new methodical process. Neighbourhood and density are described in the geoanalytical context of space and time, and the concept of horizontality and verticality of event neighbourhoods is introduced. Furthermore, methods for hotpoint and hotspot analysis are presented, their deficits are identified, and a generic approach to their elimination is presented. Based on this, the experts of the control centres can formulate requirements and recommendations for strategic, operational, and tactical action on deployment planning in more detail. Through the combination of methods and the definition of comprehensible spatial and temporal variables, it has been possible to develop an analysis process for the calculation and visualisation of hotspots that is standardisable and largely automatable. The necessary expert knowledge enters the process in the definition of the abstract segmentation of space for mask creation. These masks are stored in the overall process and can easily be adapted in case of changing spatial structures. The innovative, generic analytics process designed here to determine hotspots based on KDE is an optimised extension of the KDE process. The proven benefit lies in its calculability, the availability of expert knowledge in process form, and the improved quality of analysis. The concept is more comprehensive than the conventional KDE process.
At the same time, the quality of the geovisual analyses increases. Not all emergency events of the entire urban space are included in the determination of hotspots: with the integration of hotpoint determination into the overall process, a necessary selection of the actually significant emergency events enters the hotspot calculation. This also reduces the enormous data volume and thus the required computing effort. Practice shows that the data volume reduced by hotpoint-based selection does not always provide a sufficiently large sample for the kernel density estimation to be carried out. This circumstance is not to be understood as a deficit of the concept developed here. In statistics, the functionality of numerous methods rests on minimum data requirements; test and estimation methods must not be applied without verified compliance with these requirements. If these indications are ignored, the procedures may lead to erroneous results and misinterpretations of the real situation. If hotpoints are completely absent in a spatial segment, despite a comparatively high concentration of individual events, methods other than KDE must be applied in order to generate a spatial, cartographic visualisation. The result is geoprofiles in the form of maps with standardised class formation, signatures, and colour values. These also provide a starting point for further analyses and for strategic, tactical, and operational planning steps.
References
- Crime Analysis and Crime Mapping
- Geographic profiling analysis: principles, methods and applications
- Konzeption und prototypische Umsetzung eines regelbasierten Prozesses zur geovisuellen Auswertung massiver raumzeitlicher Datenbestände von Feuerwehreinsätzen: Ein Beitrag zur Erweiterung der KDE für die Hotspot-Analyse im Kontext ziviler Sicherheitsforschung
- Spatial modeling and geovisualization of rental prices for real estate portals
- Space, Time, and Crime
- Civil security in urban spaces: adaptation of the KDE-method for optimized hotspot-analysis for emergency and rescue services
- Zivile Sicherheit in urbanen Räumen - Adaption des KDE-Verfahrens zur optimierten Hotspot Analyse für Behörden und Organisationen mit Sicherheitsaufgaben
- Kernel density estimation (KDE) vs. hot-spot analysis - detecting criminal hot spots in the City of San Francisco
- CrimeStat III: A Spatial Statistics Program for the Analysis of Crime Incident Locations (v3.1)
- Crime mapping: spatial and temporal challenges
- Visualize This: The FlowingData Guide to Design, Visualization, and Statistics
- A computer movie simulating urban growth in the Detroit region
- Finding the right pixel size
- Density Estimation for Statistics and Data Analysis
- On estimation of a probability density function and mode
- Nonparametric statistical data modeling
- The analysis of spatial association by use of distance statistics
- Local spatial autocorrelation statistics: distributional issues and an application
- A better method to smooth crime incident data
- Tools in the spatial analysis of crime
- Examining the influence of cell size and bandwidth size on kernel density estimation crime hotspot maps for predicting spatial patterns of crime
- Kernel density estimation and hotspot mapping: examining the influence of interpolation method, grid cell size, and bandwidth on crime forecasting
- Crime Analysis: From First Report to Final Arrest

Acknowledgements.
The work presented here is part of the dissertation entitled "Konzeption und prototypische Umsetzung eines regelbasierten Prozesses zur geovisuellen Auswertung massiver raumzeitlicher Datenbestände von Feuerwehreinsätzen. Ein Beitrag zur Erweiterung der KDE für die Hotspot-Analyse im Kontext ziviler Sicherheitsforschung." and was successfully completed at the University of Potsdam, Germany, in 2019. The support of this conference contribution by the German Aerospace Center, Institute of Optical Sensor Systems, Department Security Research and Applications is gratefully acknowledged.