key: cord-0026836-m5dd5qmb authors: nan title: Predicting the Travel Distance of Patients to Access Healthcare Using Deep Neural Networks date: 2021-12-08 journal: IEEE J Transl Eng Health Med DOI: 10.1109/jtehm.2021.3134106 sha: b8d44c1d48d1810eb6487e35dd41e032a20cefcd doc_id: 26836 cord_uid: m5dd5qmb Objective: Improving geographical access remains a key issue in determining the sufficiency of regional medical resources during health policy design. However, patient choices can be the result of the complex interactivity of various factors. The aim of this study is to propose a deep neural network approach to model the complex decision of patient choice in travel distance to access care, which is an important indicator for policymaking in allocating resources. Method: We used the 4-year nationwide insurance data of Taiwan and accumulated the possible features discussed in earlier literature. This study proposes the use of a convolutional neural network (CNN)-based framework to make predictions. The model performance was tested against other machine learning methods. The proposed framework was further interpreted using Integrated Gradients (IG) to analyze the feature weights. Results: We successfully demonstrated the effectiveness of using a CNN-based framework to predict the travel distance of patients, achieving an accuracy of 0.968, AUC of 0.969, sensitivity of 0.960, and specificity of 0.989. The CNN-based framework outperformed all other methods. In this research, the IG weights are potentially explainable; however, the relationship does not correspond to known indicators in public health. Conclusions: Our results demonstrate the feasibility of the deep learning-based travel distance prediction model. It has the potential to guide policymaking in resource allocation. Clinical and Translational Impact Statement— Deep learning technology is feasible in investigating the distance that patients would travel while accessing care. It is a tool that integrates complex interactive variables with highly imbalanced data distributions. It is ideal for people to receive sufficient healthcare services without the need to travel a distance. However, nonmedical financial obstacles, such as transportation and high travel burdens, have been acknowledged as key barriers in accessing healthcare [1] , [2] . Several studies have identified that patients are willing to travel farther distances under certain circumstances. For example, [3] noted that patients with chronic illnesses travel approximately two-thirds of the distance to a physician compare to those who report no chronic illnesses. Further, [4] observed that half of the patients expressed that they would travel a distance to reduce the time on a waiting list for surgery. Other studies have identified an association between patient travel distance and disease severity [1] , [4] , [5] . However, studies have shown that the burden of travel makes treatment an unaffordable option [6] . Furthermore, [7] showed that patient health outcomes (for example, survival rates, length of hospital stay, and nonattendance at follow-up) decrease gradually as the distance to the healthcare facilities increases. Some studies noted VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ that follow-up services must be geographically accessible to ensure utilization, irrespective of insurance status [8] , [9] . Previous studies have used econometrics to predict patient choice. For example, [10] built patient choice models to predict the hospital that a patient would go to in the region using multinomial logit (MNL) and utility-maximizing nested logit. Furthermore, [11] distinguished patient choice between hospital-based and clinic-based care using a two-level nested MNL model, and [12] described the impact of quality on hospital choice using MNL. Generally, the purpose of econometrics is to estimate the interaction of variables and to explain causality, whereas it does not make predictions that indicate an action [10] , [13] . Improving geographical access remains a key issue in health policy design [6] , [17] . Travel distance is an essential piece of information that affects a patient's choice [10] , [13] - [15] . Evaluating the travel distances of patients is a way of investigating whether the medical resources of the area are sufficient, estimating potential medical demands, and the tolerable distance that an individual is willing to travel [4] , [16] . It provides important guidance for allocating resources from the perspective of policy [16] , [18] . Most earlier studies mainly relied on conventional statistics and econometrics to analyze the impact of specific variables under certain hypotheses, preconditions, and limited patient groups. This involved in ruled out the potential confounding variables under balanced label sampling [13] , [19] . However, it may be insufficient to support decision-making in reality when patient choices can result from a complex interaction of various factors and circumstances [4] , [6] , [19] - [22] . It is difficult to determine the insufficiency of regional medical resources without a tool that can integrate complex interactive variables and illustrate their interactivity under highly imbalanced circumstances. Such restrictions limit policymakers' decisions based on their experiences and interpretations. Machine learning is known to be capable of processing multidimensional features and provides a generalized prediction [13] , [23] . It has exhibited excellent performance in the medical domain, such as early risk detection [24] , [25] , mortality prediction [26] , [27] , symptom classification [28] , and patient admission prediction [29] . The aim of this study is to propose a framework using a deep learning approach to predict the travel distance of a patient. The proposed approach has the potential to support decision making in policy design. To the best of our knowledge, this is the first study to demonstrate the feasibility of deep learning techniques in predicting the travel distance of patients to access healthcare. The remainder of this paper is organized as follows. The methods section illustrates the retrieval of data, extraction of features, and the predicted target. Furthermore, it introduces machine learning methods, training strategies, evaluation indicators, and model interpretation methods. The results section illustrates the prediction and interpretation of the outcomes. The discussion section thoroughly interprets our findings. Finally, the conclusion section concludes our findings. The training and testing processes used in study are illustrated in Fig. 1 . The following subsections introduce our method according to different phases, including data collection, data preprocessing, model training, prediction, performance evaluation, and interpretation of the prediction model. The data used in this research were the insurance claims from two million clinical declaration files and the Registry for Beneficiaries files from the Taiwan National Health Insurance Research Database (NHIRD), dated January 1, 2008 to December 31, 2011. The data released were originally sampled to ensure their representation of the population across Taiwan. The files included demographic information and visiting records of outpatients and emergency settings. In addition, four publicly announced data were included. One is the ''physician density'' information that referred to the number of practicing physicians serving per 10,000 people in each region of Taiwan [30] . Second, the national calendar was used to distinguish between workdays, weekends, and national holidays. Third, the center latitude and longitude of each district was used to calculate the travel distance of each visit. Fourth, the number of medical institutes and medical staff in each region, which were used to calculate the healthcare accessibility index (acc. index) based on the adjustment of the enhanced two-stage floating catchment area (2SFCA) method. The 2SFCA is a way to evaluate the local accessibility of medical care based on the regional physicianto-population ratio and the weight of distance decay effect (that is, the farther the distance is required to travel, the less likely an individual is to use a healthcare service) [2] , [17] , [31] , [32] . This study was approved by the Research Ethics Committee at the National Taiwan University (No. 202004EM035 and No. 202104EM038) and waived the requirement for informed patient consent for the data, which had already been de-identified before analysis. This study extracted 25 features that were accumulated from the earlier literature [34] - [39] , as listed in Table 1 , and the detailed calculation equations are summarized in the Appendix [40] . Incomplete or questionable data, such as individuals without a birthdate or gender (or with two genders), records without a date, a birthdate later than the visit date, patients without any visiting records, patients without a primary diagnosis, incomplete information of visiting hospitals, patients unable to determine their place of residence (POR), and places that could not indicate the acc. index, were excluded. To avoid having dominated predictors, a Pearson correlation test was performed, presenting with a heatmap to demonstrate the correlation between all features and the prediction target. The targeted prediction outcome in this study was the travel distance of the patients. The POR and the location of the hospital were required to calculate the travel distance for each visit. Owing to the nature of privacy protection, the NHIRD provides the POR of each patient with an approximate district that the patient registers as a hometown when filling the insurance form. However, registered districts often differ from [42] . The nearby district list was generated using GeoDa version 1.14.0, an open geographic information system (GIS) software. The rules are slightly reduced because family members and relatives are not considered in this study. Rule 1: Whether the patient belongs to a type A identity. Type A identity is the type of insurance identity that people are required to register at their POR. Rule 2: Accessing care owing to flu or respiratory infection conditions. Rule 3: Is the visiting institute in a nearby registered district? Rule 4: At least having two emergency service records. Rule 5: Is the emergency service institute in a nearby registered district? where the patient actually lives. People may leave their hometown and live at an alternative location for several reasons, which makes it difficult to capture the travel distance. This study adopted the method proposed by Lin et al. [41] to obtain the estimated POR. It mainly used the records of treatment for flu, respiratory infection conditions, or emergency services to determine the POR of patients. These types of care services are less likely to travel far away. The estimated rules are shown in Fig. 2 . This study identified the center of the POR district and the center of the hospital district with latitude and longitude, and calculated the distance between the two centers to obtain the estimated travel distance [2] , [17] , [31] . Considering the characteristics of the distance decay effect, impedance differentiations among areas [4] , [32] , [33] , and that the distance was determined through approximation, we categorized the distance into four levels (<5 km, 5-10 km, 10-15 km, and >15 km). Approaching the prediction with approximate labels instead of actual values was considered to result in better generalization and more straightforward in practical use. After the preprocessing, all the numerical values (including age, number of diseases and chronic diseases, number of visits, number of votes as the most frequent provider continuity (MFPC), the least frequent provider continuity (LFPC), physician density, Charlson comorbidity index (CCI), accessibility index of each region, and disease importance rate (DIR)) were normalized between −1 and 1. The categorical features (including gender, regional groups, and the indication of low income, surgery, emergency service, severe VOLUME 10, 2022 condition, and workdays) were transformed into one-hot encoding [43] , which converts a categorical variable that has n values into n variables. The numeric and categorical features were then concatenated into a patient visit vector to represent each event. Each physician visit was seemed as an independent visit event. Afterward, the data were randomly split into training and testing data at an 80:20 ratio. Owning to the imbalanced distribution of patient travel distance, the training data were further randomly undersampled to reach a balanced label among the four distance labels. The training data were then randomly separated into five subsets. The training process included the rotation of each subset as a validation subset whereas the others acted as a training subset. On the other hand, considering that the models were required to face the actual behavior of patient travel, the testing data remained in its original label distribution, as the final evaluation of this research remains in its original label distribution. This study proposes a convolutional neural network (CNN)-based framework for travel distance prediction. The proposed framework was tested against three conventional machine learning methods and a deep learning method to demonstrate its effectiveness. The three conventional machine learning methods were regression analysis (RA), random forest (RF), and support vector machine (SVM). Because the conventional machine learning method does not inherit the ability of feature selection, principal component analysis (PCA) is added to conventional machine learning methods for feature reduction. That is, conventional machine learning methods underwent two preprocessing steps, one with PCA and the other without PCA. PCA is a multivariate statistical technique that extracts important information from the data and expresses it as a set of new orthogonal variables, known as the principal components [44] , [45] . In our design, the number of components to be kept was set to 95%. The deep learning method used was a multilayer perceptron (MLP) and the proposed CNN-based framework. Deep learning is known to inherit the ability to extract useful features automatically [46] - [48] ; hence, PCA was not applied to MLP and CNN. The following are a brief introduction of the used methods used, including the conventional machine learning methods (i.e., RA, RF, and SVM) and deep learning methods (i.e., ML and CNN): where Y and X represents the collection of all samples y (i) and x (i) , respectively;Ŷ =β X denotes the vector of the fitted value [49] , [50] . In our design, the RA used the logistic regression method with a one-vs-rest scheme to model multiclass prediction. RF is a tree-based classifier that ensembles the results of multiple decision trees. By setting the number of decision trees to be generated and the number of features to be selected, RF was tested for the best split when growing the trees. It returns the probabilities of the averaging classes of the produced trees for classification tasks or the general mean value of trees for regression tasks. The performance is considered better than that of a single classifier [51] , [52] . In our design, the user-defined number of trees to be generated was set to 100, and the number of features to determine the best split was set to five. The basic idea of SVM is to establish a hyperplane that can maximize the distance between the plane and the nearest data [53] , [54] . The hyperplane f(x) that separates the given data can be denoted as f( where M denotes the number of samples, and the inputs are x i , where i = {1, 2, . . . , M }. In our design, the C parameter, which indicates the tolerance degree of misclassification, was set to 0.2. MLP is a complex version of an artificial neural network that contains multiple hidden layers [55] , [56] , where every neuron in layer i is fully connected to every other neuron in layer i + 1. In a multi-layer neural network, each layer of the network is trained to produce a higher level of representation of the observed pattern [57] , [58] . The computation of MLP can be denoted asŷ = σ d j=1 x j w ij + b ij , where each hidden layer computes a weight w ij and a bias b ij of the output from the previous layer, followed by a nonlinear activation function σ that calculates the sum as outputs. The number of units in the previous layer is represented by d, and the output of the previous layer is represented by x j . In our design, the proposed MLP model contained 25 input nodes (based on input features) and five hidden layers containing 1500 neurons each. Rectified linear unit (ReLU) activation functions were used between each layer. Four output nodes symbolized the four categorized levels of distance. The concept of CNN is to extract meaningful information from the spatial pattern of data, which is primarily used for pattern recognition within images. CNNs comprise three types of layers: convolutional (Conv) layers, pooling layers, and fully-connected (FC) layers. The Conv layer uses a small array of numbers, called a kernel, to repeatedly apply across the input, and calculate an element-wise product between the input and the kernel to extract the spatial dimensionality, as shown in Fig. 3 . The element-wise product is then summed up to obtain the output value in the corresponding position, called a feature map. The pooling layers operate over each feature map; maxpooling, for example, takes only the maximum value of a certain size of matrix on the feature map, shown in Fig. 4 , and denotes it as the feature in that section. Finally, the FC layer, which is analogous to the MLP form, predicts the final outcome based on the pooling results [59] - [61] . In this study, we used a CNN to predict the distance that the patients would travel. The design of the proposed model is illustrated in Fig. 5 . In our design, four one-dimensional convolutional (Conv1D) layers with a ReLU activation function and max-pooling were used. The convolution kernel size was set to 3, stride was set to 1, and the width of the max-pooling was 3. The FC layer had two hidden layers each containing 500 neurons. The final fully-connected output layer (FCO) consists of four output nodes to condense the result to a fourcategorized output. This study further demonstrated how adding layers to the proposed framework contributed to the prediction results and achieved optimization. The process included 1-layer Conv+1-layer FCO, 2-layer Conv+1-layer FCO, 3-layer Conv+1-layer FCO, 4-layer Conv+1-layer FCO, 4-layer Conv+1-layer FC+1-layer FCO, and finally, the proposed framework with 4-layer Conv+2-layer FC+1-layer FCO. A ReLU activation function was included in every FC layer to achieve nonlinear transformation. The prediction model was evaluated using indicators including the receiver operating characteristics (ROC) curve, area under the receiver operating characteristics curve (AUC), accuracy, sensitivity, specificity, precision, and F1 score [62] . (1)-(5) demonstrate the calculation of the indicators. For a multi-class classification of travel distance, the macroaverage was used to generalize the performance index, which computed the metric independently for each class and then obtained the average to consider each class equally. The AUC used the one-vs-rest scheme to demonstrate the general performance. Although deep learning models can achieve excellent performance, the model cannot explain how and why the model reaches its prediction, known as the black-box problem [47] , [63] . The lack of transparency in the model is a serious barrier to implementing the application in practice [64] . This study explored the interpretation of the proposed model using the Integrated Gradients (IG) method [65] . Using a function F : R n → [0, 1] to represent a deep neuron network, let input x ∈ R n and baseline input x ∈ R n . IG considers the straight-line path from the baseline x to the input x, and computes all the integral gradients at all points along the path. The IG is obtained by cumulating these gradients, denoted as in (6) , where the baseline is commonly chosen as near-zero F(x ) ≈ 0 so that it can be ignored and represents the weight of the individual input feature. i represents the i th dimension along the x and x paths. The IG was further averaged to obtain a weight for each input feature. We used an open-source interpretation library Captum [66] to implement the IG. This study was implemented with Python version 3.7.6, combined with PyTorch framework 1.1.0, scikit-learn 0.22.2, and Captum 0.3.1. A total of 7,139,603 visiting records of patients were included in the analysis, of which 65% of the patients had traveled less than 5 km, 17% had traveled 5-10 km, 6% had traveled 10-15 km, and 11% had traveled more than 15 km. Demographic information of the patients is shown in Table 2 . The number of distance labels each patient traveled across accumulates up to 1.89 (SD = 0.80), and the headcount for each distance label shows that 92.45% of the patients have a record of choosing institutes that are nearby (<5 km). Meanwhile, 32.01% of the patients had a record of choosing institutes that were far away (>15 km). According to the correlation heatmap shown in Fig. 6 , six pairs of correlated features exceeded ±0.7: the total number of diseases and the total number of chronic diseases (0.998), MFPC and LFPC (0.986), age and CCI score (0.879), usual provider of care (UPC) and sequential continuity of care index (SECON) (0.778), the total number of visits and the total number of diseases (0.759), and the total number of visits and the total number of chronic diseases (0.758). None of the features appeared to be a dominat feature (correlation exceeding ±0.7) towards the distance, nor was the distance in level form or continuous value form. The results indicated that the proposed CNN-based framework exceeded the performance of all other methods, achieving an accuracy of 0.968, AUC of 0.969, sensitivity of 0.960, and specificity of 0.989, as shown in Table 5 . Fig. 7 shows the ROC curve of the CNN-based framework against all other methods. Fig. 8 shows the contributions of the layers added to the proposed framework. The detailed performance values are listed Table 6 in the appendix. The interpretation results are shown in Fig. 9 . The detailed values are listed Table 7 in the appendix. The top three weighted features were the LFPC, physician density, and the number of chronic diseases. The last three weighted features were the MFPC, total number of visits, and SECOC. In our study, we successfully demonstrated the effectiveness of using a deep learning-based framework to predict the travel distance of patients. The results indicated that deep learning methods (MLP and CNN) outperformed conventional machine learning methods (RA, RF, and SVM) in processing structuralized insurance data. The proposed CNN-based framework outperformed all other methods. The performance converges and is optimized by adding layers by layers. Although the models were trained based on balanced label sampling, the results showed that the model was feasible when the testing data were highly imbalanced [19] , [21] . This implies that the designed framework can be applied in real practice. In addition, we adopted a cross-validation strategy as an act of generalizing data distribution. The reported result is based on the validation of unseen data, which were isolated before the training process. This avoids the potential for overfitting in the prediction model. Meanwhile, based on the number of distance labels that a patient traveled and the number of headcounts for each distance label, both results indicate that patients may change their choice of travel distance under different circumstances; therefore, each visit event should be considered as independent. Although conventional machine learning methods are known to require the process of feature selection, our results showed that, with or without the PCA process, performance did not differ much; nonetheless, the features used decreased from 25 to 16. This indicates that after PCA, conventional machine learning methods can use fewer features and resulting in an approximate performance. Notably, conventional machine learning methods were incapable of handling large amounts of data, whereas SVM failed to converge without increasing parameter C. The insurance and public health data include nationwide records, which are typically large and sparse. In contrast, the deep learning approach is known to extract useful information automatically without additional preprocessing [46] - [48] . This has the potential to achieve an VOLUME 10, 2022 ideal end-to-end system that requires no further interference after data input. In addition, it has no difficulty in processing a large volume of data. Combined with the observation of performance results, we conclude that deep neural networks are a better choice for implementation in public health problems such as predicting the travel distance of patients. According to the IG interpretation results, the effect of the features on the prediction results can be positive or negative. Although all the features included were extracted based on earlier studies (meaning that they should all appear to be significant features), not all variables appear to be as effective. Some of the features are weighted approximately to zero, indicating that the scenario in Taiwan may appear differently. In general, patient preference (LFPC and MFPC), medical resources in each region (physician density and region groups), the complexity of diseases (number of chronic diseases), and the regularity of visits (total number of visits and SECOC) were weighted as the most effective features of the prediction results. It is worth noting that LFPC and MFPC are weighted as the top and last features, respectively, symbolizing that patient preference affects significantly and in different directions (positively and negatively). This indicates that the IG weights are potentially explainable and can be related to disciplinary knowledge in public health. In addition, although some of the features shows correlated with each other in the Pearson correlation test, the calculated IG weight did not appear correspondingly (such as the total number of chronic diseases and the total number of diseases). Traditionally, acc. index and CCI score are considered more sophisticated indicators to identify the density of medical resources and the complexity of disease. Their IG weights did not exceed the value of physician density and the number of chronic diseases, which are indicators that provided less insight. Both results indicates that common consensus, such as correlation or sophisticated indicators, may not necessarily affect the prediction model simultaneously. The discipline of social science typically focuses on clarifying the causality and interrelationship of variables that affect patients' access to healthcare services. However, the machine learning approach attempted to provide a prediction that integrates the interaction results of the variables [79] . Further studies are required to interpret the differences in between. Ensuring freedom of choice for patients is valued across countries. It has the potential to empower patients by prompting providers to compete for patients through a customermarket mechanism, improving care quality, efficiency, and wait time [67] - [76] . In Taiwan, freedom is assured under the universal coverage of National Health Insurance [77] , [78] , which is a perfect field for observing patient choice. The decision-making process in policymaking generally involves current status investigation, policy design and evaluation, and post-implementation evaluation. Our work focused on supporting the first phase, whereas the model was capable of simulating patient choice with up to 96% accuracy. The policymaker can input patient data based on household registration in a particular region and investigate the distance that the patients would travel. Such information led to the evaluation of whether the patients considered the medical resources in the region to be sufficient or whether there are other considerations that induce them to travel further. In addition, the prediction result of patient choice has the potential to direct us in understanding patients' reactions under the current medical resource allocation. Based on the characteristics of patients' choices, the distance they would travel can be further analyzed, and the conclusion is informative in policy design. Our work led to a more precise current status investigation, and as a return, may potentially lead to more precise resource allocation. The prediction was based on the trajectory of the de-identified patient-visit data commonly collected by insurance companies. Therefore, the model is highly achievable elsewhere, as it does not involve complex information that is difficult to collect or violates patient privacy. However, because of the nature of the de-identified data, the distance to travel can only be determined based on projection and assumption and cannot be validated accurately, which is a limitation of this study. Applications using deep learning technology are promising in healthcare policymaking, and further investigation is encouraged before industrialization, such as the discussion of individual impacts of each feature and its effect on the model performance and resource allocation. This study successfully demonstrated the effectiveness of using machine learning to predict the travel distance of patients. The proposed CNN-based framework performs well in processing structured insurance data. It was capable of handling complex combinations of features and imbalanced datasets, which is commonly noted when facing the problem of patient choice. Age, gender, low income (Yes/No), total number of visits, total number of diseases, total number of chronic diseases, four continuity indicators, and a disease complexity index were included as characteristics of the patients. Age was determined according to the birthdate and visit date. The denotation of gender and low income was the status identified when entering the insurance. The total number of visits was calculated as the number of visiting records during the study period. The total number of diseases and chronic diseases were identified using the International Classification of Diseases (ICD) codes for each patient. The four continuity indicators captured the duration, density, dispersion, and continuity of an individual while accessing care [34] - [36] . To track the duration and density of patients accessing care, the UPC and least usual provider of care (LUPC) were used to indicate the visiting ratio of medical institutes. The UPC represented the most frequently visited institute, while LUPC represented the least frequently visited institute. The patients were required to visit the provider at least once to denote an institute as LUPC. The SECOC was used to calculate the change in providers, representing the dispersion of care. The continuity of care index (COCI) is a single indicator that represents the continuity of care for an individual. The calculation of the four continuity indicators is shown in (7) to (10) , where N denotes the total number of visits, n i denotes the number of visits to the ith provider, k denotes the number of providers once visited, and C j is denoted as 1 when the jth provider is the same as the (j + 1)th provider (0, if not). In addition, we included the CCI score [37] for the disease complexity of the individual. LUPC = min n i N (8) We characterized the providers using four indicators: the physician density of each region, MFPC, LFPC, and acc. index. Regarding patients' ''vote with their feet'' in choosing providers by themselves, we calculated the MFPC and LFPC to represent the patients' experiences and recommendations for each institute [38] , [39] . The MFPC represents the frequency of being voted as the UPC, and the LFPC represents the frequency of being voted as the LUPC. Each patient was able to vote for only one MFPC or LFPC. The calculations are shown in (11) and (12), where p indicates the total number of patients, and UPC i and LUPC i indicate the ith patient who voted the provider as the UPC or LUPC, respectively. The acc. index of each region was calculated based on the adjustment of the enhanced two-stage floating catchment area method [2] , [17] , [31] , [32] , where the distance decay of each regional physician-to-population ratio is considered. Each incident was characterized based on five encoded features. Through the encoded ICD codes and treatment codes of visiting records, we identified whether surgery was involved (Yes/No), whether it was an emergency service (Yes/No), whether it was considered as a severe condition (Yes/No), whether the visit day was a workday (Yes/No), and the disease importance rate (DIR) of the target disease during that visit. The identification of surgeries and emergency services was based on the treatment codes defined by the NHIRD. Severity was defined based on the emergency triage results. Triage results rank for levels 1 to 3 (1 = resuscitation, 2 = emergency, and 3 = urgent) were listed as severe, and the conditions that were included in the catastrophic illness announced by the National Health Insurance were also listed as severe. The date of the incident was categorized as workday or nonworkday. We used the DIR to represent the importance of the target disease in that visit to determine whether the visit of the patient was a regular or singular event. The encoded primary diagnosis was identified as the target disease at that visit. The DIR represents the ratio of importance and is calculated as shown in (13) , where N indicates the total number of visits to the patient, and d i indicates the total number of visits for disease d i . We used only the primary diagnosis of the visit to identify the DIR. DIR = d i N (13) Association between travel distance and metastatic disease at diagnosis among patients with colon cancer A rational agent model for the spatial accessibility of primary health care Distance and health care utilization among the rural elderly Access, choice and travel: Implications for health policy The interplay of socioeconomic status, distance to center, and interdonor service area travel on kidney transplant access and outcomes Association between travel distance and choice of treatment for prostate cancer: Does geography reduce patient choice?'' Are differences in travel time or distance to healthcare for adults in global north countries associated with an impact on health outcomes? A systematic review Association between geographic access to cancer care, insurance, and receipt of chemotherapy: Geographic distribution of oncologists and travel distance The influence of distance on utilization of outpatient mental health aftercare following inpatient substance abuse treatment Patient choice modelling: How do patients choose their hospitals? The demand for health care services in rural Tanzania Does quality influence choice of hospital? Machine learning: An applied econometric approach Access to health care for the rural elderly Nearby, but not wanted? The bypassing of rural hospitals and policy implications for rural health care systems Patient choice of a hospital: Implications for health policy and management Accessing doctors at times of need-measuring the distance tolerance of rural residents for health-related travel A typology of crossborder patient mobility Predicting the future-Big data, machine learning, and clinical medicine Determinants of patient choice of healthcare providers: A scoping review Prediction policy problems Determinants of patient choice of healthcare providers: A scoping review Recommendations for the safe, effective use of adaptive CDS in the U.S. Healthcare system: An AMIA position paper Deep neural network for predicting diabetic retinopathy from risk factors Diabetes prediction model based on an enhanced deep neural network Prediction of mortality from 12-lead electrocardiogram voltage data using a deep neural network Using automated machine learning to predict the mortality of patients with COVID-19: Prediction model development study A deep convolutional neural network model to classify heartbeats Hospital admission location prediction via deep interpretable networks for the yearround improvement of emergency patient care Statistics Yearbook of Practicing Physicians and Health Care Organizations in Taiwan Measuring spatial accessibility of health care providers-introduction of a variable distance decay function within the floating catchment area (FCA) method An enhanced two-step floating catchment area (E2SFCA) method for measuring spatial accessibility to primary care physicians A suite of methods for representing activity space in a healthcare accessibility study Measuring care continuity: A comparison of claimsbased methods Using an integrated COC index and multilevel measurements to verify the care outcome of patients with multiple chronic conditions Measuring Majority of Care. Manitoba Centre for Health Policy A new method of classifying prognostic comorbidity in longitudinal studies: Development and validation Measuring the performance of individual physicians by collecting data from multiple health plans: The results of a two-state test Relationship between continuity of care in the multidisciplinary treatment of patients with diabetes and their clinical results Using deep learning and explainable artificial intelligence in patients' choices of hospital levels Using regional differences and demographic characteristics to evaluate the principles of estimation of the residence of the population in national health insurance research databases (NHIRD),'' Taiwan Contiguity-Based Spatial Weights Beyond one-hot encoding: Lower dimensional target embedding Principal component analysis Principal component analysis Deep EHR: A survey of recent advances in deep learning techniques for electronic health record (EHR) analysis,'' Opportunities and challenges in developing deep learning models using electronic health records data: A systematic review Deep patient: An unsupervised representation to predict the future of patients from the electronic health records Logistic regression diagnostics An introduction to logistic regression analysis and reporting Random forest in remote sensing: A review of applications and future directions How many trees in a random forest? What is a support vector machine? Support vector machine in machine condition monitoring and fault diagnosis A survey of deep neural network architectures and their applications Efficient processing of deep neural networks: A tutorial and survey Deep learning for healthcare: Review, opportunities and challenges Artificial intelligence for the otolaryngologist: A state of the art review An introduction to convolutional neural networks Convolutional neural networks: An overview and application in radiology A deep neural network application for improved prediction of HbA 1c in type 1 diabetes Understanding and using sensitivity, specificity and predictive values Shapley additive explanations for NO 2 forecasting Explaining nonlinear classification decisions with deep Taylor decomposition Axiomatic attribution for deep networks Captum: A unified and generic model interpretability library for PyTorch Determinants of patient choice of healthcare providers: A scoping review Free choice of healthcare providers in The Netherlands is both a goal in itself and a precondition: Modelling the policy assumptions underlying the promotion of patient choice through documentary analysis and interviews Gatekeeping and provider choice in OECD healthcare systems Gatekeeping function of primary care physicians under Japan's free-access system: A prospective open cohort study involving 14 isolated islands The influence of general practitioners on access points to health care in a system without gatekeeping: A cross-sectional study in the context of the QUALICOPC project in Austria Usability problems do not heal by themselves: National survey on physicians' experiences with EHRs in Finland Identifying quality indicators used by patients to choose secondary health care providers: A mixed methods approach Gatekeeping and the utilization of physician services in france: Evidence on the Médecin traitant reform Effect of health insurance type on health care utilization in patients with hypertension: A national health insurance database study in Korea Current trends in Japanese health care: Establishing a system for board-certificated GPs Does universal health insurance make health care unaffordable? Lessons from Taiwan An overview of the healthcare system in Taiwan Explainable machine-learning predictions for the prevention of hypoxaemia during surgery The authors would like to thank Kai-Chun Liu, who is with the Research Center for Information Technology Innovation, Academia Sinica, for his friendly support.