key: cord-0000312-eabho73n
authors: Shahid, Rizwan; Bertazzon, Stefania; Knudtson, Merril L; Ghali, William A
title: Comparison of distance measures in spatial analytical modeling for health service planning
date: 2009-11-06
journal: BMC Health Serv Res
DOI: 10.1186/1472-6963-9-200
sha: 4961ab3b26b0165a3f1b267fc776329d792fc5f2
doc_id: 312
cord_uid: eabho73n

BACKGROUND: Several methodological approaches have been used to estimate distance in health service research. In this study, focusing on cardiac catheterization services, Euclidean, Manhattan, and the less widely known Minkowski distance metrics are used to estimate distances from patient residence to hospital. Distance metrics typically produce less accurate estimates than actual measurements, but each metric provides a single model of travel over a given network. Therefore, distance metrics, unlike actual measurements, can be directly used in spatial analytical modeling. Euclidean distance is most often used, but it is unlikely to be the most appropriate metric. Minkowski distance is a more promising method. Distances estimated with each metric are contrasted with road distance and travel time measurements, and an optimized Minkowski distance is implemented in spatial analytical modeling.

METHODS: Road distance and travel time are calculated from the postal code of residence of each patient undergoing cardiac catheterization to the pertinent hospital. The Minkowski metric is optimized to approximate travel time and road distance, respectively. Distance estimates and distance measurements are then compared using descriptive statistics and visual mapping methods. The optimized Minkowski metric is implemented, via the spatial weight matrix, in a spatial regression model identifying socio-economic factors significantly associated with cardiac catheterization.

RESULTS: The Minkowski coefficient that best approximates road distance is 1.54; 1.31 best approximates travel time.
The latter is also a good predictor of road distance, thus providing the best single model of travel from patient's residence to hospital. The Euclidean metric and the optimal Minkowski metric are alternatively implemented in the regression model, and the results compared. The Minkowski method produces more reliable results than the traditional Euclidean metric.

CONCLUSION: Road distance and travel time measurements are the most accurate estimates, but cannot be directly implemented in spatial analytical modeling. Euclidean distance tends to underestimate road distance and travel time; Manhattan distance tends to overestimate both. The optimized Minkowski distance partially overcomes their shortcomings; it provides a single model of travel over the network. The method is flexible, suitable for analytical modeling, and more accurate than the traditional metrics; its use ultimately increases the reliability of spatial analytical models.

Health service research is concerned with the investigation of how social, financial, organizational, technological, and behavioral factors affect access to health care, the quality and cost of health care, and ultimately health and well-being [1]. Distance plays a vital role in studies assessing spatial disease patterns as well as access to hospital services. In a highly complex health care environment, even micro-geographic differences in the availability of tertiary services can affect access to care [2, 3]. The study of distances from patient homes to the nearest hospital is an example where distance is often studied as a crude but objective indicator of geographic accessibility to hospital services [4]. In such situations, the measurement of actual travel distance (or travel time) on a road network is clearly the most appropriate method [5].
Health service research, however, encompasses a much broader investigation area, where spatial analytical models are employed to assist in the provision of effective accessibility to health care services. Distance is often used indirectly in these types of analysis as one of the parameters defining the model's thrust and its results. In a rapidly changing physical and social environment, transportation means and travel modes change quickly, as do epidemic transmission modes, overturning traditional ways of conceptualizing and measuring distance [6]. A commonly used distance metric is the Euclidean distance, a straight line distance measurement between two points, 'as the crow flies' [7, 8]. This method is simple and intuitive, but it yields accurate distance estimates in very few applications. An alternative, well-known distance metric is the Manhattan, or taxi-cab, distance: as its name suggests, it is most appropriate for grid-like road networks, typical of many North American cities, characterized by a rectangular city block pattern. The Manhattan metric measures distance between points along a rectangular path with right angle turns [9, 10]. Most commonly, travel along road networks involves a mixture of Euclidean, Manhattan, and curvilinear trajectories. There is no firm consensus on methods for selecting a distance metric [11], nor is there much published information on the extent to which Euclidean, Manhattan, and road distances relate to one another in applied distance analysis [12, 13]. Travel along a complex or mixed network can be usefully modeled by a class of distance metrics known as Minkowski distance [14], a general distance metric of which the Euclidean and Manhattan metrics are special cases.
This array of metrics provides flexibility and generality, in that, within a single class of metrics, a range of parameters can be selected; therefore, a single yet flexible method for measuring distance can be defined for the optimal estimation of distance on a variety of empirical road networks.

One further important aspect is the fundamental role of time in accessing health care services: if distance is a crude estimate of accessibility, travel time is a more relevant estimate. Travel time computation is no longer a prohibitively time-consuming and computationally intensive task, thanks to powerful GIS software, hardware, and rich road network datasets [14-17]. However, actual travel time on a road network is highly variable due to local (spatial and temporal) conditions which are hardly predictable and controllable. Because of this characteristic, travel time computations lack general validity, requiring adjustments to account for specific temporal conditions, e.g., weekend vs. weekdays, rush vs. non-rush hours, season, and weather, as well as local spatial conditions, e.g., local traffic congestion, lane closures, or proximity to amenities or popular destinations. All these reasons hamper the implementation of travel time computations in spatial analytical models, since even local analytical models require the definition of a single rule for distance measurement. A crude solution to this problem is the use of average travel time in spatial models; a more realistic solution can be obtained through the use of Minkowski distance: an optimal value of Minkowski distance can be selected to model travel time on a complex road network.

Spatial data tend to exhibit characteristics that negatively impact the statistical properties of quantitative models, decreasing their reliability: spatial analytical models are designed to mitigate these negative effects.
The most crucial properties of spatial data are spatial dependence (near things tend to be more similar than distant things) and non-stationarity (inconstant variability of phenomena across space) [18]. Two broad categories of spatial analytical models include spatially autoregressive (SAR) methods, which deal with spatial dependence [19], and geographically weighted (GWR) methods, which deal with spatial non-stationarity [20]. In empirical situations, spatial dependencies and non-stationarities take on specific forms, which are a function of many factors, including the nature of the phenomena under investigation and the representation of space underpinning the model. For this reason, a simplistic application of spatial analysis, one that does not carefully model the salient aspects of phenomena, often fails to fulfill its primary objective, which is to enhance model reliability. The transition from a simplistic to a customized implementation of spatial analysis requires the calibration of each parameter defining the analysis: one of the most crucial parameters affecting the analytical results is the distance measurement method.

Cardiac catheterization is a procedure that is performed to determine the presence or absence of coronary artery blockages. The procedure involves the percutaneous insertion of a catheter into the arterial system, after which it is guided into the aorta where the coronary arteries are positioned. Contrast dye is then injected into the coronary arteries so that blockages can be located and identified. In some instances, cardiac catheterization can lead to immediate use of percutaneous coronary intervention with balloon angioplasty and the insertion of coronary stents that open up partially or completely blocked arteries to restore blood flow. In some instances, this procedure is performed in stable patients where distance and travel times are a minor concern.
In other instances, however, the procedure is done urgently, and in such situations distances and travel times become a central consideration in the planning of health services.

In the context of an applied study of distance between patient residence and a tertiary cardiac catheterization facility in a large city, this paper analyzes the effectiveness of a selection of distance metrics in providing a useful model of travel distance and travel time along an urban road network. The comparison of different metrics leads to the identification of a metric that is conceptually sound and computationally effective. The metric thus identified is experimentally used in a spatial autoregressive model analyzing the spatial distribution of cardiac catheterization cases in the city.

The study area encompasses the City of Calgary, one of the largest Canadian cities, with approximately 1 million residents [21], distributed over a large geographic area (roughly 750 km²), characterized by diversity of population, housing type, residential density, and accessibility to health services. Cardiac catheterization is an invasive procedure for patients experiencing cardiovascular symptoms and defines coronary anatomy, left ventricular and valvular function; it provides important prognostic information for individuals affected by cardiovascular conditions [22]. During the study period the procedure was only performed at the Foothills Medical Centre, located in the northwest of the city.

Three types of data are used in this study: cardiac catheterization patient database, postal code locations, and the Calgary road network. Cardiac catheterization patient data were obtained from the Alberta Provincial Project for Outcome Assessment in Coronary Heart Disease (APPROACH), an ongoing data collection initiative, begun in 1995, producing information on all patients undergoing catheterization in Alberta [22]. The data are released at the postal code spatial aggregation level.
Data were extracted for Calgary residents only and catheterizations performed over the year 2002, resulting in a total of 2,445 catheterization cases, distributed over 2,138 postal codes. A postal code conversion file (PCCF) [23] was obtained from Statistics Canada. Only postal codes that have at least one catheterization case are retained for the analysis. It should be noted that postal code locations refer to the primary residence of catheterization patients, not to the place where symptoms were felt or where emergency care was first administered. The Calgary road network data were obtained from the University of Calgary data holdings, based on street information collected and compiled in 2005 by DMTI Spatial [24]. This road network was used to calculate shortest road distances from patient residence location to hospital for cardiac catheterization services.

Straight line (Euclidean) distance and Manhattan distance are often used in health service research [25]. Each of these distance metrics may appropriately estimate distance in some parts of a study area, but their application at the city level tends to yield large errors in areas that depart from the dominant pattern, and may lead to highly inaccurate distance estimations. One of the reasons for using Euclidean and/or Manhattan distance is the relative ease of their implementation; in contrast, it is more problematic to design algorithms implementing actual road network distance in spatial analytical models. In order to reduce the error associated with the Euclidean and Manhattan metrics while maintaining the computational simplicity of a single, intuitive mathematical formula, the general Minkowski metric is examined, to devise a single method that best approximates the average pattern of an empirical road network.
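As an illustration of the single-formula idea, the Minkowski family can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes planar coordinates in kilometers, the names `minkowski_distance` and `best_p` are hypothetical, the coordinates in the demo are invented, and the grid search over candidate exponents with a sum-of-squared-errors criterion is only one plausible way to pick an exponent (the optimization procedure actually used in the study is not specified in this passage).

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in kilometers; an assumption of this sketch


def minkowski_distance(a: Point, b: Point, p: float) -> float:
    """Minkowski distance between two planar points.

    p = 1 gives the Manhattan distance, p = 2 the Euclidean distance.
    """
    if p < 1:
        raise ValueError("p must be >= 1 for the formula to define a metric")
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return (dx ** p + dy ** p) ** (1.0 / p)


def best_p(
    origins: Sequence[Point],
    destination: Point,
    observed: Sequence[float],
    candidates: Sequence[float],
) -> float:
    """Return the candidate exponent whose estimates are closest to the
    observed distances, measured by the sum of squared errors.

    Hypothetical helper: a plain grid search, chosen here for clarity.
    """
    def sse(p: float) -> float:
        return sum(
            (minkowski_distance(o, destination, p) - d) ** 2
            for o, d in zip(origins, observed)
        )

    return min(candidates, key=sse)


if __name__ == "__main__":
    hospital = (0.0, 0.0)                          # invented location
    homes = [(3.0, 4.0), (1.0, 7.0), (6.0, 2.0)]   # invented residences
    for p in (1.0, 1.54, 2.0):
        print(p, [round(minkowski_distance(h, hospital, p), 3) for h in homes])
```

For any pair of points the estimate shrinks as the exponent grows, so an intermediate exponent yields a value between the Manhattan and the Euclidean estimates. In practice `observed` would hold measured road distances or travel times, and a finer candidate grid or a numerical optimizer could replace the coarse search.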
Optimizing values of the Minkowski formula are calculated for road distance as well as travel time; the results are compared with more traditional distance measures in the context of assessing geographic accessibility to cardiac facilities. The Minkowski distance has the potential to provide a more accurate estimate of road network distance and travel time than the Euclidean and Manhattan metrics.

A set of 2,138 distances between each patient's postal code of residence and the Foothills hospital is calculated according to each of the distance measurement methods considered. The geographic locations of each postal code from the PCCF and the hospital are recorded in latitude and longitude; therefore, in order to implement distance computations, the road network is projected using an equidistant projection system, which is chosen in order to preserve distance and produce consistent distance measurements [26]. Latitude and longitude coordinates are then converted into Eastings and Northings, i.e., x and y values, expressed in kilometers. Alternative methods could have been used, for example the great circle distance formula [27], which, however, provides rougher estimations. The ArcGIS 9.3 [28] Geometry calculator was used to calculate the x and y coordinates based on the projected dataset, and the resulting x and y values were used in the distance formulas defined below. Euclidean [7, 8], Manhattan [9, 10], and Minkowski [14] distance can be calculated by the formula:

$$d = \left( |x_i - x_j|^p + |y_i - y_j|^p \right)^{1/p} \qquad (1)$$

where, for this application: $d$ is the distance between a patient's residence and the hospital; $x_i$, $y_i$ are the geographic coordinates of the centroid of each postal code of residence; $x_j$, $y_j$ are the geographic coordinates of the Foothills hospital. The generic $p$ parameter in Equation 1 can be replaced by the value 2 to yield the well known Euclidean distance; the value 1 would yield the Manhattan distance, and all the intermediate values in the [1