key: cord-0997482-w88hdlmr authors: Qi, Xiaoyu; Mei, Gang; Cuomo, Salvatore; Xiao, Lei title: A network-based method with privacy-preserving for identifying influential providers in large healthcare service systems date: 2020-04-06 journal: Future Gener Comput Syst DOI: 10.1016/j.future.2020.04.004 sha: 3d437a83aac3593fcea6eb2a6f6f5e56fc13b7ee doc_id: 997482 cord_uid: w88hdlmr

Abstract
In data science, networks provide a useful abstraction of the structure of many complex systems, ranging from social systems and computer networks to biological networks and physical systems. Healthcare service systems are among the main social systems that can also be understood using network-based approaches, for example, to identify and evaluate influential providers. In this paper, we propose a privacy-preserving network-based method for identifying influential providers in large healthcare service systems. First, the provider-interacting network is constructed by employing publicly available information on the locations and types of healthcare services of providers. Second, the nodes in the generated provider-interacting network are ranked in parallel on the basis of four nodal influence metrics. Third, the impact of the top-ranked influential nodes in the provider-interacting network is evaluated using three indicators. Unlike other research based on patient-sharing networks, the provider-interacting network in this paper can be approximately constructed from the locations and the publicly available types of healthcare services of providers, without the need for private electronic medical claims, thus protecting the privacy of patients. The proposed method is demonstrated using the Physician and Other Supplier Data CY 2017, and it can be applied to other similar datasets to help make decisions on the optimization of healthcare resources in response to public health emergencies.
mainly consists of two stages: (1) creating reasonable networks for the healthcare service systems of interest and (2) identifying the influential nodes in the created networks.

Much research has been conducted to abstract healthcare systems as networks, and most of this research has focused on creating "patient-sharing networks" based … both deliver care to the same patient and examined the approaches to conceptualizing, measuring, and analyzing provider patient-sharing networks.

There are typically two procedures in identifying influential nodes in the created networks: (1) ranking nodes according to nodal influence metrics, such as degree centrality [11], clustering coefficients [12], H-Index [13, 14], and k-shell [15, 16]; and (2) evaluating the influence of the top-ranked nodes by comparing the structure and function of the network before and after removing a certain percentage of the top-ranked nodes.

Currently, many algorithms have been proposed for identifying influential nodes. … In this paper, we propose a privacy-preserving network-based method …

The novelty of the proposed method can be explained as follows. As mentioned above, most of the related research is based on patient-sharing networks, which are created from electronic medical claims and focus on the sharing of patient information between providers. The advantage of patient-sharing networks is that they are quite precise. However, electronic medical claims data are personal and private [32, 33] and are generally not directly available to the public. Healthcare organizations need to consider privacy regulations and rules regarding patient information, and researchers need to handle personal private information carefully before creating the required patient-sharing network. In contrast, our study is based on the provider-interacting network rather than a patient-sharing network.
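The nodal influence metrics listed above can be computed directly from an adjacency list. The sketch below is a minimal illustration, not the authors' implementation: it computes the degree, the H-Index (taken here, as is common in the literature on the h-index of a network node, to be the largest h such that a node has at least h neighbors of degree at least h), and the local clustering coefficient, and then ranks nodes by any of them. All function names are hypothetical.

```python
# Illustrative sketch only; helper names are hypothetical, not from the paper's code.
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) pairs; self-loops are ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def degree(adj, node):
    """Degree centrality (unnormalized; dividing by N - 1 would not change the ranking)."""
    return len(adj[node])


def h_index(adj, node):
    """Assumed definition: largest h such that `node` has >= h neighbors of degree >= h."""
    neighbor_degrees = sorted((len(adj[n]) for n in adj[node]), reverse=True)
    h = 0
    for rank, deg in enumerate(neighbor_degrees, start=1):
        if deg < rank:
            break
        h = rank
    return h


def clustering(adj, node):
    """Local clustering coefficient 2E/(K(K-1)); 0 when the degree K is 0 or 1."""
    neighbors = adj[node]
    k = len(neighbors)
    if k <= 1:
        return 0.0
    # E = number of edges that exist among the neighbors of `node`
    e = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * e / (k * (k - 1))


def rank_nodes(adj, metric):
    """Nodes sorted by metric value, highest first; ties broken by node id."""
    return sorted(adj, key=lambda n: (-metric(adj, n), n))


if __name__ == "__main__":
    adj = build_adjacency([("A", "B"), ("A", "C"), ("A", "D"),
                           ("B", "C"), ("C", "D"), ("D", "E")])
    print(rank_nodes(adj, degree))      # ['A', 'C', 'D', 'B', 'E']
    print(rank_nodes(adj, h_index))     # ['A', 'B', 'C', 'D', 'E']
    print(rank_nodes(adj, clustering))  # ['B', 'A', 'C', 'D', 'E']
```

Because the metric is passed in as a function, the same `rank_nodes` call can be repeated for each metric; ties are broken by node identifier so that the ordering is deterministic.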
In the proposed method, the provider-interacting network formed by healthcare providers can be approximately constructed from the locations and available types of healthcare services, without the need for personal, private electronic medical claims data. We utilize only the relationships between providers to identify influential providers, thus protecting the privacy of patients. Most im…

The main contributions of this paper can be summarized as follows.
(1) We construct a provider-interacting network by employing publicly available information on the locations and types of healthcare services of providers.
(2) We rank the nodes of the created provider-interacting networks by influence using four nodal influence metrics.
(3) We evaluate the impact of the top-ranked influential nodes in the provider-interacting network using three indicators.

The rest of this paper is organized as follows. Section 2 describes the details of the proposed network-based method for identifying influential healthcare service providers in the provider-interacting network. Section 3 presents the ap…

In this section, we will first introduce the data source and then describe the details of the proposed network-based method. …
(1) npi: the National Provider Identifier (NPI) of the performing provider on the claim. The provider's NPI is the numeric identifier registered in the NPPES, and each provider has a unique NPI.
(2) nppes_provider_zip: the provider's ZIP code.
(3) hcpcs_code: the Healthcare Common Procedure Coding System (HCPCS) code used to identify the specific medical service provided.

The availability of the obtained datasets is as follows. …

In this paper, we propose a privacy-preserving network-based method for identifying influential providers in large healthcare service systems. The proposed method is composed of three main procedures.
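Only three fields per claim row are needed to build the network. The following is a minimal sketch of that extraction step using Python's standard `csv` module; it assumes a tab-delimited file whose header spells the columns `npi`, `nppes_provider_zip`, and `hcpcs_code`, and it keeps the first five digits of each ZIP code. These are assumptions for illustration, not a description of the authors' script, and `load_providers` is a hypothetical name.

```python
# Illustrative sketch; column names, delimiter and ZIP handling are assumptions.
import csv
import io
from collections import defaultdict


def load_providers(stream, delimiter="\t"):
    """Map each NPI to one five-digit ZIP code and to its set of HCPCS codes.

    Assumes a header row containing npi, nppes_provider_zip and hcpcs_code;
    rows with any of these fields empty are skipped.
    """
    zips = {}
    services = defaultdict(set)
    for row in csv.DictReader(stream, delimiter=delimiter):
        npi = (row.get("npi") or "").strip()
        zip_code = (row.get("nppes_provider_zip") or "").strip()
        hcpcs = (row.get("hcpcs_code") or "").strip()
        if not (npi and zip_code and hcpcs):
            continue
        # Keep the first ZIP seen per provider, truncated to five digits (ZIP+4 dropped).
        zips.setdefault(npi, zip_code[:5])
        services[npi].add(hcpcs)
    return zips, dict(services)


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample = (
        "npi\tnppes_provider_zip\thcpcs_code\n"
        "1000000001\t100011234\t99213\n"
        "1000000001\t100011234\t99214\n"
        "1000000002\t10002\t99213\n"
    )
    zips, services = load_providers(io.StringIO(sample))
    print(zips)  # {'1000000001': '10001', '1000000002': '10002'}
    print(sorted(services["1000000001"]))  # ['99213', '99214']
```

Because each provider appears in many rows (one per service), grouping into sets at load time keeps memory proportional to the number of distinct provider and code pairs rather than the number of claim rows.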
The first procedure is the construction of the network, which uses the ZIP codes and HCPCS codes of the providers to build the provider-interacting networks. The key step in this procedure is to set a threshold on the distance between neighboring providers, to connect providers that lie within the threshold range, and to weight the edges according to the similarity of the types of medical services provided.

The second procedure is to rank influential nodes. The nodes in the generated provider-interacting network are ranked in parallel on the basis of four nodal influence metrics, namely, the DC, CB, CC, and H-Index.

The third procedure is to evaluate the impact of the identified influential nodes. By removing a certain proportion of the top-ranked nodes, the impact on three indicators of the network, namely, the maximum connectivity coefficient, the network efficiency, and the susceptibility, is evaluated … effectiveness of the influential node ranking algorithm.

The flowchart of the proposed network-based method for identifying influential providers in a healthcare service system is illustrated in Fig. 1. … information [34, 35, 36]. In this paper, we employed the second approach by using …

(1) ZIP code proximity
We converted the centroid of each provider's ZIP code to longitude and latitude coordinates. Let the longitude and latitude of node v_i be X_i and Y_i, and let the longitude of node v_j be X_j and the latitude be Y_j. If the absolute difference between the longitudes of two nodes, i.e., |X_i - X_j|, and the absolute difference between their latitudes, i.e., |Y_i - Y_j|, are both within given thresholds (α and β, respectively), then the two providers are considered potential neighbors; see Eq. (2). Given the division of cities into districts and streets, people who live in the same city are likely to know each other. Based on statistical calculation, the average values of both thresholds α and β are 0.2°.
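The neighbor test above compares every pair of providers, which is quadratic in a nationwide network. A minimal sketch of one way to avoid that is shown below: points are bucketed into α-by-β grid cells, so any pair passing the test must lie in the same or an adjacent cell. The grid bucketing, the inclusive `<=` bounds, and the function names are assumptions for illustration; Eq. (2) itself is not reproduced here.

```python
# Illustrative sketch; grid bucketing and inclusive bounds are assumptions.
import math
from collections import defaultdict


def is_potential_neighbor(p, q, alpha=0.2, beta=0.2):
    """p and q are (longitude, latitude) in degrees; inclusive bounds are assumed."""
    return abs(p[0] - q[0]) <= alpha and abs(p[1] - q[1]) <= beta


def proximity_pairs(coords, alpha=0.2, beta=0.2):
    """All id pairs (a, b) with a < b whose coordinates pass is_potential_neighbor.

    Points are bucketed into alpha-by-beta grid cells, so each point is compared
    only with points in its own cell and the eight surrounding cells instead of
    with every other provider.
    """
    grid = defaultdict(list)
    for pid, (lon, lat) in coords.items():
        grid[(math.floor(lon / alpha), math.floor(lat / beta))].append(pid)
    pairs = set()
    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # .get does not insert new keys, so iterating grid stays safe
                for b in grid.get((cx + dx, cy + dy), ()):
                    for a in members:
                        if a < b and is_potential_neighbor(coords[a], coords[b], alpha, beta):
                            pairs.add((a, b))
    return pairs


if __name__ == "__main__":
    # Hypothetical ZIP-centroid coordinates (longitude, latitude).
    coords = {"p1": (-73.99, 40.75), "p2": (-73.85, 40.70), "p3": (-73.60, 40.75)}
    print(proximity_pairs(coords))  # {('p1', 'p2')}
```

The result is identical to a brute-force double loop over `is_potential_neighbor`; the grid only prunes pairs that could never pass the test.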
(2) HCPCS code similarity
The HCPCS is divided into two principal subsystems, referred to as Level I and … there is descriptive terminology that identifies a category of similar items.

In this paper, the set of HCPCS codes for medical services provided by node v_i is denoted by S_i, and the set of HCPCS codes provided by node v_j is denoted by S_j. If two providers share an HCPCS code, they are considered potential neighbors, and the weight of the edge is k (Eq. (3)).

Due to the large number of nodes in the nationwide provider-interacting net… (Eq. (5)). If the JC of two nodes is small, the relationship between the two nodes may be weak; conversely, if the JC of two nodes is large, the relationship between the two nodes may be strong.

… between v_i and its neighbors is E_i, the CC of a node can be calculated via Eq. (7). In the special case where the degree K_i of the node is 0 or 1, the C_i of the node is taken to be 0. Hence, the nodal CC also lies between 0 and 1 [20, 44].

…nent after the removal of nodes; the more obvious this trend is, the more effective the corresponding method is at attacking the network compared with other methods.

(2) Network efficiency
To investigate the effect of node removal, the network efficiency is used to evaluate the connectivity of the network (Eq. (10)). … where η_ij = 1/d_ij, d_ij is the shortest-path length between nodes i and j, and N is the number of network nodes.

In this paper, we remove a certain proportion of specific nodes in the network to simulate the effect of a network attack and then calculate the decline ratio of the network efficiency before and after the attack to quantify the accuracy of each node influence metric. The proportion of network efficiency decrease is expressed through Eq. (11), $\mu = (\eta_0 - \eta)/\eta_0$, where η represents the network efficiency after node removal, η_0 represents the original network efficiency, and 0 ≤ µ ≤ 1.
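A minimal sketch of the efficiency and decline-ratio computation follows. It assumes that Eq. (10) has the standard global-efficiency form η = 1/(N(N-1)) Σ_{i≠j} 1/d_ij, that d_ij is an unweighted hop count (whether the paper uses edge weights here is not stated in this text), and that N is kept at the original node count after removal. That last choice is also an assumption; it guarantees η ≤ η_0, which is consistent with 0 ≤ µ ≤ 1. Function names are hypothetical.

```python
# Illustrative sketch; the form of Eq. (10), hop distances and fixed N are assumptions.
from collections import deque


def hop_distances(adj, source, alive):
    """Breadth-first shortest-path lengths (in hops) from source within `alive`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in alive and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def network_efficiency(adj, alive, n_total):
    """eta = 1/(N(N-1)) * sum_{i != j} 1/d_ij over nodes in `alive`.

    Unreachable pairs contribute 0; N is fixed to n_total (the original size).
    """
    if n_total < 2:
        return 0.0
    total = 0.0
    for s in alive:
        for t, d in hop_distances(adj, s, alive).items():
            if t != s:
                total += 1.0 / d
    return total / (n_total * (n_total - 1))


def efficiency_decline(adj, removed):
    """mu = (eta_0 - eta) / eta_0 after deleting the nodes in `removed`."""
    nodes = set(adj)
    n = len(nodes)
    eta0 = network_efficiency(adj, nodes, n)
    eta = network_efficiency(adj, nodes - set(removed), n)
    return (eta0 - eta) / eta0 if eta0 > 0 else 0.0


if __name__ == "__main__":
    path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}  # a - b - c
    print(network_efficiency(path, set(path), 3))  # 5/6, i.e. about 0.833
    print(efficiency_decline(path, ["b"]))         # 1.0: no pair stays connected
    print(efficiency_decline(path, ["a"]))         # about 0.6
```

Running one breadth-first search per node costs O(N·M) overall, which is fine for illustration; for a nationwide network, sampling source nodes or using a compiled graph library would be more practical.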
The higher the value of µ, the more the network efficiency deteriorates after the nodes are removed.

(3) Susceptibility
…

In this subsection, we will introduce more implementation details on the devel…

There are two steps in this procedure: the extraction of the NPI, ZIP code, and HCPCS data, and the construction of the provider-interacting network according to (1) the ZIP code proximity and (2) the HCPCS code similarity. We wrote a Python script specifically to extract the NPI, ZIP code, and HCPCS … Table 2.

The frequency distribution of the four node-ranking metrics is shown in Fig. 8. Fig. 9(a) and Fig. 9(b) show that when the removal ratio reaches 0.5% ∼ 1%, using the metric DC to remove the top-ranked nodes leads to the largest decline in the maximum connectivity coefficient and network efficiency, followed by the metrics CB and H-Index. The metric CC leads to only a slight decrease in the maximum connectivity coefficient and network efficiency, which remain almost unchanged. In Fig. 9(c), the metrics CC and H-Index reduce the susceptibility by only a small margin, leaving it almost unchanged, while the metric CB reduces the susceptibility by the largest margin, followed by the metric DC. The above results show that, in the provider-interacting networks, when selectively removing the top 0.1% ∼ 1% of nodes for each metric, the efficiency, connectivity … not be accurate because it is not able to determine the "best" distance threshold for selecting neighboring providers. In fact, the distance threshold is specified. In … resource. In the future, we will also conduct research in this field.
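The removal experiment behind Fig. 9(a) can be sketched as follows. This is a minimal illustration under an assumed definition: the maximum connectivity coefficient is taken to be the size of the largest connected component after removal divided by the original number of nodes N, since the paper's own definition is not recoverable from this text. The `score` mapping lets any of the four metrics (DC, CB, CC, or H-Index) drive the ranking; the function names are hypothetical.

```python
# Illustrative sketch; the definition of the maximum connectivity coefficient is assumed.
from collections import deque


def largest_component_size(adj, alive):
    """Number of nodes in the largest connected component induced by `alive`."""
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


def max_connectivity_after_removal(adj, score, fraction):
    """Remove the top `fraction` of nodes ranked by `score` and return N'/N.

    `score` maps every node to its metric value (e.g. DC, CB, CC or H-Index);
    ties are broken by node id so the result is deterministic.
    """
    n = len(adj)
    ranked = sorted(adj, key=lambda v: (-score[v], v))
    k = int(fraction * n + 0.5)  # round half up to a whole number of nodes
    alive = set(ranked[k:])
    return largest_component_size(adj, alive) / n


if __name__ == "__main__":
    # Two small stars whose hubs h1 and h2 are linked.
    adj = {"h1": {"a", "b", "h2"}, "h2": {"c", "d", "h1"},
           "a": {"h1"}, "b": {"h1"}, "c": {"h2"}, "d": {"h2"}}
    dc = {v: len(nbrs) for v, nbrs in adj.items()}
    for frac in (0.0, 1 / 6, 2 / 6):
        print(frac, max_connectivity_after_removal(adj, dc, frac))
```

In the toy graph, removing the single highest-degree hub halves the largest component, and removing both hubs leaves only isolated nodes, which mirrors the kind of decline the DC curve shows for hub-like providers.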
Identifying the most influential providers in a healthcare service system is an …

Provider patient-sharing networks and multiple-provider prescribing of benzodiazepines
An analysis of patient-sharing physician networks and implantable cardioverter defibrillator therapy
The formation of physician patient sharing networks in Medicare: Exploring the effect of hospital affiliation
Patient-sharing networks of physicians and health care utilization and spending among Medicare beneficiaries
A scoping review of patient-sharing network studies using administrative data
Future Generation Computer Systems
The h-index of a network node and its relation to degree and coreness
An index to quantify an individual's scientific research output
The k-core as a predictor of structural collapse in mutualistic ecosystems
Theories and applications
Identifying influential and susceptible members of social networks
Identification of influential spreaders in complex networks
Influential node ranking in social networks based on neighborhood diversity, Future Generation Computer Systems
Ranking in evolving complex networks
Centrality in social networks conceptual clarification
A new status index derived from sociometric analysis, Psychometrika
Identification of influential spreaders in complex networks
Searching the web
Authoritative sources in a hyperlinked environment
Leaders in social networks, the Delicious case
Vital nodes identification in complex networks
Cascade-based attacks on complex networks
Efficient behavior of small-world networks
Evaluating the importance of nodes in complex networks
Physician and Other Supplier Data CY Medicare-Provider-Charge-Data
A distributed ensemble approach for mining healthcare data under privacy constraints
Gtsim-pop: Game theory based secure incentive mechanism and patient-optimized privacy-preserving packet forwarding scheme in m-healthcare social networks
Disparities in obesity rates: analysis by zip code area
Newcomb, Geocoding addresses from a large population-based study: lessons
A comparison of address point, parcel and street geocoding techniques
The node importance in actual complex networks based on a multi-attribute ranking method, Knowledge-Based Systems
Identifying influential nodes in complex networks based on expansion factor
Evidential identification of influential nodes in network of networks
Ranking the spreading ability of nodes in complex networks based on local structure
Identification of influential nodes in social networks with community structure based on label propagation
Evidential method to identify influential nodes in complex networks
Identifying influential nodes in complex networks based on the inverse-square law
Role of centrality for the identification of influential spreaders in complex networks
Universal resilience patterns in complex networks
Statistical mechanics of complex networks
Error and attack tolerance of complex networks
SNAP for C++: Stanford Network Analysis Platform
SNAP: A general purpose network analysis and graph mining library
On the design and analysis of protocols for personal health record storage on personal data server devices
A data integration platform for patient-centered e-healthcare and clinical decision support, Future Generation Computer Systems
COVID-19: what is next for public health?
Case of the index patient who caused tertiary transmission of COVID-19 infection in Korea: the application of lopinavir/ritonavir for the treatment of COVID-19 infected pneumonia monitored by quantitative RT-PCR
Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV2)
SARS virus infection of cats and ferrets
Impact of international travel and border control measures on the global spread of the novel 2019 coronavirus outbreak
The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak
A survey of Internet of Things (IoT) for geo-hazards prevention: Applications, technologies, and challenges
A machine learning approach for IoT cultural data
Decision making in IoT environment through unsupervised learning
Exploring unsupervised learning techniques for the Internet of Things

The authors declare that there are no conflicts of interest.