key: cord-0997482-w88hdlmr
authors: Qi, Xiaoyu; Mei, Gang; Cuomo, Salvatore; Xiao, Lei
title: A network-based method with privacy-preserving for identifying influential providers in large healthcare service systems
date: 2020-04-06
journal: Future Gener Comput Syst
DOI: 10.1016/j.future.2020.04.004
sha: 3d437a83aac3593fcea6eb2a6f6f5e56fc13b7ee
doc_id: 997482
cord_uid: w88hdlmr

Abstract In data science, networks provide a useful abstraction of the structure of many complex systems, ranging from social systems and computer networks to biological networks and physical systems. Healthcare service systems are one of the main social systems that can also be understood using network-based approaches, for example, to identify and evaluate influential providers. In this paper, we propose a network-based method with privacy-preserving for identifying influential providers in large healthcare service systems. First, the provider-interacting network is constructed by employing publicly available information on locations and types of healthcare services of providers. Second, the ranking of nodes in the generated provider-interacting network is conducted in parallel on the basis of four nodal influence metrics. Third, the impact of the top-ranked influential nodes in the provider-interacting network is evaluated using three indicators. Compared with other research work based on patient-sharing networks, in this paper, the provider-interacting network of healthcare service providers can be roughly created according to the locations and the publicly available types of healthcare services, without the need for personally private electronic medical claims, thus protecting the privacy of patients. The proposed method is demonstrated by employing Physician and Other Supplier Data CY 2017, and can be applied to other similar datasets to help make decisions for the optimization of healthcare resources in the response to public health emergencies.

mainly consists of two stages: (1) creating reasonable networks for the healthcare service systems of interest and (2) identifying the influential nodes in the created networks.

Much research has been conducted to abstract healthcare systems as networks, and most of this research has focused on creating "patient-sharing networks" based both deliver care to the same patient and examined the approaches to conceptualizing, measuring, and analyzing provider patient-sharing networks.

There are typically two procedures in identifying influential nodes in the created networks: (1) ranking nodes according to nodal influence metrics, such as degree centrality [11], clustering coefficients [12], H-Index [13, 14], and k-shell [15, 16]; and (2) evaluating the influence of top-ranked nodes by comparing the structures and functions before and after removing a certain percentage of top-ranked nodes.

Currently, many algorithms have been proposed for identifying influential nodes. In this paper, we propose a network-based method with privacy-preserving for identifying influential providers in large healthcare service systems.

The novelty of the proposed method can be explained as follows. As mentioned above, most of the related research is based on patient-sharing networks, which are created according to electronic medical claims, focusing on the sharing of patient information between providers. The advantage of creating patient-sharing networks is that they are quite precise. However, the data of electronic medical claims are personal and private [32, 33], and the data are generally not directly available to the public. Healthcare organizations need to consider regulations and rules of privacy in regard to patient information. Researchers need to deal with personal privacy information carefully before creating the required patient-sharing network.

In contrast, our study is based on a provider-interacting network rather than a patient-sharing network. In the proposed method, the provider-interacting network formed by healthcare providers can be roughly created according to the locations and available types of healthcare services, without the need for personal, private electronic medical claims data. We only utilize the relationships between providers to identify influential providers, thus protecting the privacy of patients. Most im-

The main contributions of this paper can be summarized as follows.

(1) We construct a provider-interacting network by employing publicly available information on locations and types of healthcare services of providers.

(2) We rank the influential nodes of the created provider-interacting networks using four nodal influence metrics.

(3) We evaluate the impact of the top-ranked influential nodes in the provider-interacting network using three indicators.

The rest of this paper is organized as follows. Section 2 describes the details of the proposed network-based method for identifying the influential providers of healthcare services in the provider-interacting network. Section 3 presents the ap-

In this section, we will first introduce the data source and then describe the details of the proposed network-based method.

(1) npi - NPI for the performing provider on the claim. The provider's NPI is the numeric identifier registered in the NPPES. Each provider has a unique NPI.

(2) nppes_provider_zip - The provider's ZIP code.

(3) hcpcs_code - Healthcare Common Procedure Coding System (HCPCS) code used to identify the specific medical service provided.

The availability of the obtained datasets is as follows.

In this paper, we propose a network-based method with privacy-preserving for identifying influential providers in large healthcare service systems. The proposed network-based method is composed of three main procedures.

The first procedure is the construction of the network, which uses the ZIP code and HCPCS code of the providers to build the provider-interacting networks. The key step in this procedure is to set a threshold value for the location of the neighboring providers, specify that providers within the threshold range can be connected, and add weights to the edges according to the similarity of the types of medical services provided.

The second procedure is to rank influential nodes. The ranking of the nodes in the generated provider-interacting network is conducted in parallel on the basis of four nodal influence metrics, including the DC, CB, CC, and H-Index.
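As an illustration of the ranking step, the following minimal sketch computes two of the four metrics on a toy graph: the degree and the H-Index of a node, the latter taken in its usual network sense (the largest h such that the node has at least h neighbors of degree at least h). The adjacency-set representation, the function names, and the toy graph are assumptions made for this example and are not taken from the paper's implementation.

```python
from typing import Dict, Hashable, List, Set

# Undirected graph stored as adjacency sets (an assumed representation).
Graph = Dict[Hashable, Set[Hashable]]


def degree(graph: Graph) -> Dict[Hashable, int]:
    """Number of neighbors of every node (unnormalized degree)."""
    return {v: len(nbrs) for v, nbrs in graph.items()}


def h_index(graph: Graph) -> Dict[Hashable, int]:
    """Largest h such that a node has at least h neighbors of degree >= h."""
    deg = degree(graph)
    result = {}
    for v, nbrs in graph.items():
        nbr_degrees = sorted((deg[u] for u in nbrs), reverse=True)
        h = 0
        for rank, d in enumerate(nbr_degrees, start=1):
            if d >= rank:
                h = rank
            else:
                break
        result[v] = h
    return result


def top_ranked(scores: Dict[Hashable, float], fraction: float) -> List[Hashable]:
    """Top `fraction` of nodes by descending score (at least one node)."""
    count = max(1, int(len(scores) * fraction))
    ordered = sorted(scores, key=lambda v: (-scores[v], str(v)))
    return ordered[:count]


# Hypothetical toy network: a triangle a-b-c with a pendant node d on a.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

degree_scores = degree(toy)
h_scores = h_index(toy)
most_connected = top_ranked(degree_scores, 0.25)
```

Because each metric is computed independently of the others, the four rankings could be produced in parallel, as the procedure describes.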

The third procedure is to evaluate the impact of the identified influential nodes. By removing a certain proportion of the top-ranked nodes, the impact on three indicators of the network is evaluated, including the maximum connectivity coefficient, the network efficiency, and the susceptibility, which together indicate the effectiveness of the influential-node ranking algorithm.
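The removal experiment can be sketched as follows. The surviving text does not give the formula for the maximum connectivity coefficient, so this example assumes it is the size of the largest connected component after removal divided by the original number of nodes; the function names and the star-shaped toy graph are likewise hypothetical.

```python
from collections import deque


def remove_nodes(graph, removed):
    """Copy of `graph` (adjacency sets) without the nodes in `removed`."""
    removed = set(removed)
    return {
        v: {u for u in nbrs if u not in removed}
        for v, nbrs in graph.items()
        if v not in removed
    }


def largest_component_size(graph):
    """Size of the largest connected component, found by breadth-first search."""
    seen = set()
    best = 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            v = queue.popleft()
            size += 1
            for u in graph[v]:
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
        best = max(best, size)
    return best


def max_connectivity_coefficient(graph, removed):
    """Assumed definition: largest component after removal / original node count."""
    n = len(graph)
    if n == 0:
        return 0.0
    return largest_component_size(remove_nodes(graph, removed)) / n


# Hypothetical star network: hub "h" connected to four leaves.
star = {"h": {"a", "b", "c", "d"}, "a": {"h"}, "b": {"h"}, "c": {"h"}, "d": {"h"}}

after_hub_removal = max_connectivity_coefficient(star, {"h"})   # 1 of 5 nodes remain connected
after_leaf_removal = max_connectivity_coefficient(star, {"a"})  # 4 of 5 nodes remain connected
```

Removing the hub fragments the toy network far more than removing a leaf, which is the kind of contrast the evaluation uses to compare ranking metrics.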

The flowchart of the proposed network-based method for identifying influential providers in a healthcare service system is illustrated in Fig. 1.

information [34, 35, 36]. In this paper, we employed the second approach by using

(1) ZIP code proximity

We converted the centroid coordinates of each provider's ZIP code to latitude

is $X_j$, and the latitude is $Y_j$. If the absolute value of the difference between the longitudes of two nodes, i.e., $|X_i - X_j|$, and the absolute value of the difference between the latitudes, i.e., $|Y_i - Y_j|$, are both within a certain range, then the two providers are considered potential neighbors; see Eq. (2).

According to the division of cities, districts, and streets, people who live in a city are likely to know each other. After statistical calculation, the average value of threshold $\alpha$ is 0.2°, and the average value of threshold $\beta$ is also 0.2°.
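The proximity criterion can be sketched as a simple predicate. This example assumes that "within a certain range" means a non-strict comparison against the thresholds (α for longitude, β for latitude, both 0.2° as stated above); the function names and the centroid coordinates are hypothetical and serve only as illustration.

```python
from itertools import combinations

ALPHA = 0.2  # longitude threshold in degrees (value quoted in the text)
BETA = 0.2   # latitude threshold in degrees (value quoted in the text)


def are_potential_neighbors(coord_i, coord_j, alpha=ALPHA, beta=BETA):
    """True if both the longitude and the latitude differences are within range.

    Each coordinate is a (longitude X, latitude Y) pair for a ZIP code centroid.
    The non-strict comparison (<=) is an assumption of this sketch.
    """
    x_i, y_i = coord_i
    x_j, y_j = coord_j
    return abs(x_i - x_j) <= alpha and abs(y_i - y_j) <= beta


def candidate_pairs(centroids, alpha=ALPHA, beta=BETA):
    """All provider pairs whose ZIP centroids satisfy the proximity test.

    This is a quadratic scan, adequate for a small illustration only.
    """
    return [
        (p, q)
        for p, q in combinations(sorted(centroids), 2)
        if are_potential_neighbors(centroids[p], centroids[q], alpha, beta)
    ]


# Hypothetical providers with made-up centroid coordinates.
centroids = {
    "npi_1": (-74.00, 40.70),
    "npi_2": (-73.90, 40.75),
    "npi_3": (-73.50, 40.70),
}

pairs = candidate_pairs(centroids)
```

A nationwide dataset would need a spatial index or a grid of cells rather than the all-pairs scan used here.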

(2) HCPCS code similarity

The HCPCS is divided into two principal subsystems, referred to as level I and

there is descriptive terminology that identifies a category of similar items.

In this paper, we specify that the set of HCPCS codes for medical services provided by node $v_i$ is represented by $S_i$, and the set of HCPCS codes provided by node $v_j$ is represented by $S_j$. If two providers share an HCPCS code, they are considered potential neighbors, and the weight of the edge is $k$ (Eq. (3)).
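A minimal sketch of the similarity test on the code sets $S_i$ and $S_j$ follows. Eq. (3) is not reproduced in the surviving text, so the example assumes that the weight $k$ is the number of shared codes; it also assumes that the abbreviation JC used later in this section denotes the Jaccard coefficient of the two sets. The function names and the code sets are illustrative only.

```python
def shared_code_weight(s_i, s_j):
    """Assumed edge weight k: number of HCPCS codes the two providers share."""
    return len(s_i & s_j)


def jaccard_coefficient(s_i, s_j):
    """Jaccard coefficient |S_i & S_j| / |S_i | S_j| (assumed meaning of JC)."""
    union = s_i | s_j
    if not union:
        return 0.0
    return len(s_i & s_j) / len(union)


def hcpcs_edge(s_i, s_j):
    """Return (is_potential_neighbor, weight) for two providers' code sets."""
    k = shared_code_weight(s_i, s_j)
    return k > 0, k


# Illustrative code sets for two hypothetical providers.
codes_i = {"99213", "99214", "93000"}
codes_j = {"99213", "93000", "36415"}

connected, weight = hcpcs_edge(codes_i, codes_j)
similarity = jaccard_coefficient(codes_i, codes_j)
```

The raw count favors providers that bill many codes, whereas the Jaccard coefficient normalizes by the size of the combined code set, which is one reason a normalized measure is useful when filtering weak edges.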

Due to the large number of nodes in the nationwide provider-interacting net-

5)). If the JC of two nodes is small, the relationship between the two nodes may be weak. Conversely, if the JC of two nodes is large, the relationship between the two nodes may be strong.

between $v_i$ and its neighbors is $E_i$, the CC of a node can be calculated via Eq. (7).
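Since Eq. (7) is not reproduced in the surviving text, the sketch below assumes the standard local clustering coefficient, $C_i = 2E_i / (K_i(K_i - 1))$, where $K_i$ is the degree of $v_i$ and $E_i$ is taken to be the number of edges among its neighbors; nodes of degree 0 or 1 are assigned a coefficient of 0. The function name and the toy graph are hypothetical.

```python
from itertools import combinations


def clustering_coefficient(graph, node):
    """Local clustering coefficient C_i = 2 * E_i / (K_i * (K_i - 1)).

    `graph` maps each node to the set of its neighbors (undirected).
    E_i counts the edges that exist among the neighbors of `node`.
    Nodes with degree 0 or 1 get a coefficient of 0.
    """
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, w in combinations(neighbors, 2) if w in graph[u])
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy network: a triangle a-b-c with a pendant node d on a.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

cc = {v: clustering_coefficient(toy, v) for v in toy}
```

In the toy graph, node b has both of its neighbors connected to each other and so reaches the maximum value of 1, while the pendant node d falls under the low-degree convention.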

In special cases, if the degree $K_i$ of the node is 0 or 1, the $C_i$ of the node is considered to be 0. Obviously, the nodal CC is also between 0 and 1 [20, 44].

nent after the removal of nodes, the more obvious the trend is, indicating that the effect of using this method to attack a network is better than that of other methods.

(2) Network efficiency

To investigate the effect of node removal on network efficiency, the network efficiency can be used to evaluate the connectivity of the network (Eq. (10)). To

where $\eta_{ij} = 1/d_{ij}$, $d_{ij}$ is the length of the shortest path between nodes $i$ and $j$, and $N$ is the number of network nodes.

In this paper, we remove a certain proportion of specific nodes in the network to simulate the effect of a network attack and then calculate the network efficiency decline ratio before and after the attack to quantify the accuracy of each node influence metric.

The proportion of network efficiency decrease is expressed through Eq. (11).

where $\eta$ represents the network efficiency after node removal, $\eta_0$ represents the original network efficiency, and $0 \le \mu \le 1$. The higher the value of $\mu$ is, the worse the network efficiency becomes after removing the nodes.
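Eqs. (10) and (11) are not reproduced in the surviving text, so the following sketch assumes the commonly used forms $\eta = \frac{1}{N(N-1)} \sum_{i \ne j} \eta_{ij}$ and $\mu = 1 - \eta/\eta_0$. It further assumes unweighted hop distances, and it keeps the original $N$ as the normalizer after removal so that $\mu$ stays within $[0, 1]$. All function names and the toy graph are hypothetical.

```python
from collections import deque


def hop_distances(graph, source):
    """Breadth-first search: hop count from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in graph[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def network_efficiency(graph, n_nodes=None):
    """Average of 1/d_ij over ordered node pairs; unreachable pairs add 0."""
    n = len(graph) if n_nodes is None else n_nodes
    if n < 2:
        return 0.0
    total = 0.0
    for v in graph:
        for u, d in hop_distances(graph, v).items():
            if u != v:
                total += 1.0 / d
    return total / (n * (n - 1))


def efficiency_decline(graph, removed):
    """Assumed form mu = 1 - eta / eta_0, normalizing both by the original N."""
    removed = set(removed)
    eta_0 = network_efficiency(graph)
    if eta_0 == 0.0:
        return 0.0
    pruned = {
        v: {u for u in nbrs if u not in removed}
        for v, nbrs in graph.items()
        if v not in removed
    }
    eta = network_efficiency(pruned, n_nodes=len(graph))
    return 1.0 - eta / eta_0


# Hypothetical star network: hub "h" connected to four leaves.
star = {"h": {"a", "b", "c", "d"}, "a": {"h"}, "b": {"h"}, "c": {"h"}, "d": {"h"}}

eta_original = network_efficiency(star)
mu_hub = efficiency_decline(star, {"h"})
mu_leaf = efficiency_decline(star, {"a"})
```

Running one breadth-first search per node is quadratic in the network size, so a network of nationwide scale would likely require sampling or a parallel implementation.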

(3) Susceptibility

In this subsection, we will introduce more implementation details on the devel-

There are two steps in this procedure: the extraction of NPI, ZIP code, and HCPCS data and the construction of the provider-interacting network according to (1) the ZIP code proximity and (2) the HCPCS code similarity.

We wrote a Python script specifically to extract the NPI, ZIP code, and HCPCS

Table 2. The frequency distribution of the four node-ranking metrics is shown in Fig. 8.

Fig. 9(a) and Fig. 9(b) show that when the removal ratio reaches 0.5% to 1%, using the metric DC to remove the top-ranked nodes leads to the largest decline in the maximum connectivity coefficient and network efficiency, followed by the metrics CB and H-Index. The metric CC leads to only a slight decrease in the maximum connectivity coefficient and network efficiency, which remain almost unchanged. In Fig. 9(c), the metrics CC and H-Index reduce the susceptibility by a small margin, leaving it almost unchanged, while the metric CB reduces the susceptibility by the largest margin, followed by the metric DC.

The above results show that, in the provider-interacting networks, when selectively removing the top 0.1% to 1% of nodes for each metric, the efficiency, connectivity

not be accurate because it is not able to determine the "best" distance threshold to select neighboring providers. In fact, the threshold of distance is specified. In

resource. In the future, we will also conduct research work in this field.

Identifying the most influential providers in a healthcare service system is an 

Provider patient-sharing networks and multiple-provider prescribing of benzodiazepines

An analysis of patient-sharing physician networks and implantable cardioverter defibrillator therapy

The formation of physician patient sharing networks in Medicare: Exploring the effect of hospital affiliation

Patient-sharing networks of physicians and health care utilization and spending among Medicare beneficiaries

A scoping review of patient-sharing network studies using administrative data

Future Generation Computer Systems

The h-index of a network node and its relation to degree and coreness

An index to quantify an individual's scientific research output

The k-core as a predictor of structural collapse in mutualistic ecosystems

Theories and applications

Identifying influential and susceptible members of social networks

Identification of influential spreaders in complex networks

Influential node ranking in social networks based on neighborhood diversity, Future Generation Computer Systems

Ranking in evolving complex networks

Centrality in social networks conceptual clarification

A new status index derived from sociometric analysis, Psychometrika

Identification of influential spreaders in complex networks

Searching the web

Authoritative sources in a hyperlinked environment

Leaders in social networks, the Delicious case

Vital nodes identification in complex networks

Cascade-based attacks on complex networks

Efficient behavior of small-world networks

Evaluating the importance of nodes in complex networks

Physician and Other Supplier Data CY

Medicare-Provider-Charge-Data

A distributed ensemble approach for mining healthcare data under privacy constraints

Gtsim-pop: Game theory based secure incentive mechanism and patient-optimized privacy-preserving packet forwarding scheme in m-healthcare social networks

Disparities in obesity rates: analysis by zip code area

Newcomb, Geocoding addresses from a large population-based study: lessons

A comparison of address point, parcel and street geocoding techniques

The node importance in actual complex networks based on a multi-attribute ranking method, Knowledge-Based Systems

Identifying influential nodes in complex networks based on expansion factor

Evidential identification of influential nodes in network of networks

Ranking the spreading ability of nodes in complex networks based on local structure

Identification of influential nodes in social networks with community structure based on label propagation

Evidential method to identify influential nodes in complex networks

Identifying influential nodes in complex networks based on the inverse-square law

Role of centrality for the identification of influential spreaders in complex networks

Universal resilience patterns in complex networks

Statistical mechanics of complex networks

Error and attack tolerance of complex networks

SNAP for C++: Stanford Network Analysis Platform

SNAP: A general purpose network analysis and graph mining library

On the design and analysis of protocols for personal health record storage on personal data server devices

A data integration platform for patient-centered e-healthcare and clinical decision support, Future Generation Computer Systems

COVID-19: what is next for public health?

Case of the index patient who caused tertiary transmission of COVID-19 infection in Korea: the application of lopinavir/ritonavir for the treatment of COVID-19 infected pneumonia monitored by quantitative RT-PCR

Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV2)

SARS virus infection of cats and ferrets

Impact of international travel and border control measures on the global spread of the novel 2019 coronavirus outbreak

The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak

A survey of Internet of Things (IoT) for geo-hazards prevention: Applications, technologies, and challenges

A machine learning approach for IoT cultural data

Decision making in IoT environment through unsupervised learning

Exploring unsupervised learning techniques for the Internet of Things

The authors declare that there are no conflicts of interest.