key: cord-017831-anadq4j9 authors: Lai, Yi-Horng title: Network Analysis of Comorbidities: Case Study of HIV/AIDS in Taiwan date: 2015-07-30 journal: Multidisciplinary Social Networks Research DOI: 10.1007/978-3-662-48319-0_14 sha: doc_id: 17831 cord_uid: anadq4j9 Comorbidities are the presence of one or more additional disorders or diseases co-occurring with a primary disease or disorder. The purpose of this study is to identify diseases that co-occur with HIV/AIDS and analyze the gender differences. Data was collected from 536 HIV/AIDS admission medical records out of 1,377,469 admission medical records from 1997 to 2010 in Taiwan. In this study, the comorbidity relationships are presented in the phenotypic disease network (PDN), and φ-correlation is used to measure the distance between two diseases on the network. The results show that there is a high correlation in the following pairs/triad of diseases: human immunodeficiency virus infection with specified conditions (042) and pneumocystosis pneumonia (1363), human immunodeficiency virus infection with specified malignant neoplasms (0422) and kaposi’s sarcoma of other specified sites (1768), human immunodeficiency virus acquired immunodeficiency syndrome, and unspecified (0429) and progressive multifocal leukoencephalopathy (0463), and lastly, human immunodeficiency virus infection with specified infections (0420), meningoencephalitis due to toxoplasmosis (1300), and human immunodeficiency virus infection specified infections causing other specified infections (0421). The human immunodeficiency virus infection and acquired immune deficiency syndrome (HIV/AIDS) epidemic is one of the most important and crucial public health risks facing governments and civil societies in the world. Adolescents were at the center of the pandemic in terms of transmission, impact, and potential for changing the attitudes and behaviors that underlie this disease. 
Therefore, HIV/AIDS prevention has become a priority all over the world. The first HIV/AIDS case in Taiwan was reported in 1984. As of the end of 2013, the cumulative number of HIV/AIDS cases had reached 26,475. Faced with this serious situation, Taiwan's Centers for Disease Control worked with other departments and dedicated a tremendous amount of effort and resources to introducing harm reduction programs. Total reported cases dropped in 2006, which was the first trend reversal since 1984. From 2008 onward, the epidemic took a turn; infections mainly occurred through sexual contact [1]. There are no clear boundaries between many diseases, as diseases can have multiple causes and can be related in several dimensions. From a genetic perspective, a pair of diseases can be related because they have both been associated with the same gene, whereas from a proteomic perspective, diseases can be related because disease-associated proteins act on the same pathway [2]. During the past half-decade, several resources have been constructed to help understand the entangled origins of many diseases. Many of these resources have been presented as networks in which interactions between disease-associated genes, proteins, and expression patterns have been summarized. Goh, Cusick, Valle, Childs, Vidal, and Barabási created a network of Mendelian gene-disease associations by connecting diseases that have been associated with the same genes [3]. In addition, a growing number of studies have applied the network approach to diseases such as neurodegenerative diseases [4], infertility etiologies [5], SARS, and HIV/AIDS [6]. A comorbidity relationship exists between two diseases whenever they affect the same individual substantially more often than expected by chance alone. In the past, comorbidities have been used extensively to construct synthetic scales for mortality prediction [7, 8], yet their utility exceeds their current use.
Studying the structure defined by entire sets of comorbidities might help the understanding of many biological and medical questions from a perspective that is complementary to other approaches. For example, a recent study built a comorbidity network in an attempt to elucidate neurological diseases' common genetic origins [9]. To date, however, neither this data nor the data necessary to explore relationships between all diseases has been available to the research community. The present study provides this data in the form of a phenotypic disease network (PDN) that includes all diseases recorded in the medical claims. Additionally, this study illustrates how a PDN can be used to study disease progression from a network perspective by interpreting the PDN as the landscape where disease progression occurs, and shows how the network can be used to study phenotypic differences between patients of different demographic backgrounds. This study indicates the directionality of disease progression, as observed in the dataset, and finds that more central diseases in the PDN are more likely to occur after other diseases and that more peripheral diseases tend to precede other illnesses. To guide HIV/AIDS-related disease prevention programs, this study constructed the PDN of HIV/AIDS to explore the relationship between HIV/AIDS and other diseases. The objective of this study is to identify diseases that are highly correlated with HIV/AIDS and to discuss gender differences. The National Health Insurance (NHI) program was initiated in Taiwan in 1995 and covers nearly all residents. In 1999, the Bureau of NHI began to release all claims data in electronic form to the public under the National Health Insurance Research Database (NHIRD) project. The structure of the claim files is described in detail on the NHIRD website and in other publications [10]. NHIRD offers reliable, systematic, and complete data for disease detection.
The datasets contained only the visit files, including dates, medical care facilities and specialties, patients' genders, dates of birth, and the four major diagnoses coded in the International Classification of Disease, 9th Revision, Clinical Modification (ICD-9-CM) format [10, 11]. In total, the ICD-9-CM classification consists of 657 different categories at the three-digit level and 16,459 categories at the five-digit level. Human immunodeficiency virus infection and acquired immune deficiency syndrome (HIV/AIDS) is coded 042 as human immunodeficiency virus (HIV) infection disease, 0420 as human immunodeficiency virus infection with specified infections, 0421 as specified infections causing other specified infections, 0422 as human immunodeficiency virus infection with specified malignant neoplasms, and 0429 as acquired immunodeficiency syndrome, unspecified. To protect privacy, the data on patient identities and institutions had been scrambled cryptographically. The visit files in this study represented 1,377,469 admission activities within the NHI from 1997 to 2010. Demographically, the data set consists of 1,377,469 admission medical records from 1,000,000 patients. Of all these patients, 50.93% were females, 48.93% were males, 20.67% were over 71 years of age, and 536 persons were diagnosed with HIV/AIDS (Table 1). These HIV/AIDS admission medical records included 45 females (8.40%) and 491 males (91.60%). Of all the 536 HIV/AIDS records, 461 (86.01%) had major diagnoses coded 042, 21 (3.92%) had major diagnoses coded 0420, 3 (.56%) had major diagnoses coded 0421, and 7 (8.77%) had major diagnoses coded 0429 (Table 2). There are several limitations to the current study. First, although data gathered from NHIRD is comprehensive and reliable, there are still some mistakes that the system could not detect, such as code entry errors. These errors may be carried into data pre-processing, which is beyond the control of this study.
Second, the data was not up to date. Although future researchers are still encouraged to apply the method of this study to analyze the characteristics of patients for the purpose of disease prevention, changes in medical treatments and other factors should be considered. Third, this study does not have a global sample, so the findings may not replicate in other countries. They might, however, generalize to other ethnic Chinese populations due to the similarity in genes and physiology. To measure the comorbidity relationships, it is necessary to quantify the strength of comorbidities by introducing a notion of distance between two diseases. A difficulty of this approach is that different statistical distance measures have biases that over- or under-estimate the relationships between rare or prevalent diseases. These biases are important given that the number of times a particular disease is diagnosed, that is, its prevalence, follows a heavy-tailed distribution [2], meaning that while most diseases are rarely diagnosed, a few diseases have been diagnosed in a large fraction of the population. In this study, the φ-correlation is used to quantify the distance between two diseases. The φ-correlation, which is Pearson's correlation for binary variables, can be expressed mathematically as [2, 12]:

$$\phi_{ij} = \frac{C_{ij} N - P_i P_j}{\sqrt{P_i P_j (N - P_i)(N - P_j)}} \qquad (1)$$

where $C_{ij}$ is the number of patients affected by both diseases, $N$ is the total number of patients in the population, and $P_i$ and $P_j$ are the prevalences of diseases $i$ and $j$. The distribution of φ values representing all disease pairs where $C_{ij} > 0$ is presented in Figs. 1, 3, and 5. This research utilizes data from NHIRD to obtain the four major diagnosis codes of all patients. This study calculates φ-correlation with Equation 1. The Pajek 4.03 program was used to compute the degree centrality and betweenness of each node and the path value (φ-correlation).
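The φ-correlation can be computed directly from co-occurrence counts using φ = (C_ij N − P_i P_j) / sqrt(P_i P_j (N − P_i)(N − P_j)), the form given in [2]. The following is a minimal sketch, assuming each patient is represented as a set of diagnosis codes. The function names `phi_correlation` and `pairwise_phi` and the toy records are illustrative assumptions and are not taken from the study.

```python
import math
from collections import Counter
from itertools import combinations


def phi_correlation(c_ij, p_i, p_j, n):
    """Phi-correlation for one disease pair.

    c_ij: number of patients with both diseases
    p_i, p_j: number of patients with disease i and disease j
    n: total number of patients
    """
    denom = math.sqrt(p_i * p_j * (n - p_i) * (n - p_j))
    if denom == 0:
        # Undefined when a disease affects nobody or everybody.
        return float("nan")
    return (c_ij * n - p_i * p_j) / denom


def pairwise_phi(patients):
    """Compute phi for every pair of codes that co-occur at least once.

    patients: list of sets of diagnosis codes, one set per patient.
    Returns a dict mapping (code_i, code_j), sorted alphabetically, to phi.
    """
    n = len(patients)
    prevalence = Counter()
    cooccurrence = Counter()
    for codes in patients:
        prevalence.update(codes)
        cooccurrence.update(combinations(sorted(codes), 2))
    return {
        (i, j): phi_correlation(c, prevalence[i], prevalence[j], n)
        for (i, j), c in cooccurrence.items()
    }


# Toy records (hypothetical, for illustration only).
patients = [
    {"042", "1363"},
    {"042", "1363"},
    {"042"},
    {"4019"},
    {"4019"},
    {"4019", "1363"},
    {"25000"},
    {"25000"},
]
phi = pairwise_phi(patients)
for pair, value in sorted(phi.items()):
    print(pair, round(value, 3))
```

Only pairs with at least one shared patient are returned, which mirrors the restriction to C_ij > 0 used for the φ distributions in the text.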
This study treats the paths between diseases as the network and the correlation as the value of each line (path weight); both can differ between populations, and here they are compared between genders. The set of all comorbidity associations between all diseases expressed in the study population can be summarized by constructing a Phenotypic Disease Network (PDN). In the PDN, nodes are disease phenotypes identified by unique ICD9 codes, and links connect phenotypes that show significant comorbidity according to the measure introduced above. In principle, the number of disease-disease associations in the PDN is proportional to the square of the number of phenotypes, yet many of these associations are either not strong or not statistically significant [2]. The structure of the PDN can be explored by focusing on the strongest and the most significant of these associations. The PDN can be seen as a network of the phenotypic space. This network allows people to understand the relationship between illnesses. The distribution of φ values representing all disease pairs is presented in Figure 1. Most of them are between .000 and .005. A discussion on the confidence interval and statistical significance of these measures can be found in Hidalgo, Blumm, Barabási, and Christakis's study, in which a φ-correlation > .06 is statistically significant [2]. In Figure 2, nodes are diseases and links are correlations. Node color identifies the ICD9 category; node size is proportional to disease prevalence. Link color indicates correlation strength. The PDN is built using φ-correlation. All statistically significant links where φ-correlation > .06 are shown.
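The construction described above, keeping only links with φ-correlation > .06 and then inspecting node connectivity, can be sketched as follows. This is a minimal illustration under stated assumptions: the φ values are made-up placeholders, `build_pdn` and `degree_centrality` are hypothetical helper names, and the normalized degree used here is one common definition of degree centrality rather than a reproduction of Pajek's output.

```python
from collections import defaultdict

PHI_THRESHOLD = 0.06  # significance cut-off quoted in the text from [2]


def build_pdn(phi_by_pair, threshold=PHI_THRESHOLD):
    """Build an undirected weighted network from pairwise phi values.

    Returns {code: {neighbor_code: phi}} containing only links whose
    phi exceeds the threshold. Codes without any such link are omitted.
    """
    adjacency = defaultdict(dict)
    for (i, j), phi in phi_by_pair.items():
        if phi > threshold:
            adjacency[i][j] = phi
            adjacency[j][i] = phi
    return dict(adjacency)


def degree_centrality(adjacency):
    """Fraction of the other nodes each node is linked to."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


# Hypothetical phi values, for illustration only.
phi_by_pair = {
    ("042", "1363"): 0.20,
    ("042", "3210"): 0.08,
    ("042", "4019"): 0.01,  # below the threshold, so no link is drawn
    ("0420", "1300"): 0.12,
}

pdn = build_pdn(phi_by_pair)
centrality = degree_centrality(pdn)
# Strongest comorbidity link of code 042 in this toy network.
strongest = max(pdn["042"], key=pdn["042"].get)
print(centrality["042"], strongest)
```

A plain dictionary of dictionaries keeps the sketch dependency-free; a graph library could be substituted when betweenness or layout is needed.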
Human immunodeficiency virus infection with specified conditions (042) and pneumocystosis pneumonia (1363), cryptococcal meningitis (3210), candidiasis of mouth (1120), unspecified secondary syphilis (0919), kaposi's sarcoma of unspecified (1769), cryptococcosis (1175), kaposi's sarcoma of palate (1762), neurosyphilis, unspecified (0949), kaposi's sarcoma of lung (1764), kaposi's sarcoma of lymph nodes (1765), and late syphilis, latent (096) are highly correlated. Those φ-correlations are all above .06. The relationship between human immunodeficiency virus infection with specified conditions (042) and pneumocystosis pneumonia (1363) is the strongest. Pneumocystis pneumonia is a form of pneumonia caused by the yeast-like fungus Pneumocystis jirovecii. Pneumocystis pneumonia is not commonly found in the lungs of healthy people, but, being a source of opportunistic infection, it can cause a lung infection in people with a weak immune system [13]. Human immunodeficiency virus infection with specified infections (0420) and human immunodeficiency virus infection specified infections causing other specified infections (0421), kaschin-beck disease (7160), pneumocystosis (1363), meningoencephalitis due to toxoplasmosis (1300), falciparum malaria (0840), kaposi's sarcoma of other specified sites (1768), and human immunodeficiency virus infection with specified malignant neoplasms (0422) are highly correlated. Those φ-correlations are all above .06. The φ-correlation of human immunodeficiency virus infection with specified infections (0420) and meningoencephalitis due to toxoplasmosis (1300) is highest. Toxoplasmosis is a parasitic disease caused by the protozoan Toxoplasma gondii. The parasite infects most genera of warm-blooded animals, including humans, but the primary host is the felid family. Infection occurs by eating infected meat, particularly swine products, or by ingesting water, soil, or food that has come into contact with infected animals' fecal matter [14].
In addition, human immunodeficiency virus infection with specified infections (0420) and specified infections causing other specified infections (0421) are highly correlated, and the φ-correlation is .11. Human immunodeficiency virus infection with specified malignant neoplasms (0422) and kaposi's sarcoma of skin [...] and human immunodeficiency virus infection with specified infections (0420) are highly correlated. Those φ-correlations are all above .06. The φ-correlation of human immunodeficiency virus infection with specified malignant neoplasms (0422) and kaposi's sarcoma of other specified sites (1768) is highest. Human immunodeficiency virus acquired immunodeficiency syndrome, unspecified (0429) and progressive multifocal leukoencephalopathy (0463) are highly correlated. Those φ-correlations are all above .06. The distribution of φ values representing all disease pairs is presented as Fig. 3. Most of them are between .000 and .005. In PDNs for females (Figure 4), human immunodeficiency virus infection with specified conditions (042) and cryptococcal meningitis (3210), kaposi's sarcoma of unspecified (1769), and pneumocystosis (1363) are highly correlated. Human immunodeficiency virus acquired immunodeficiency syndrome, unspecified (0429) and other cerebellar ataxia (3343) and progressive multifocal leukoencephalopathy (0463) are highly correlated. Those φ-correlations are all above .06. The distribution of φ values representing all disease pairs is presented as Fig. 5. Most of them are between .000 and .005. In PDNs for males (Figure 6), human immunodeficiency virus infection with specified conditions (042) and unspecified secondary syphilis (0919), cytomegaloviral disease (0785), kaposi's sarcoma of palate (1762), amebic liver abscess (0063), kaposi's sarcoma of lung (1764), kaposi's sarcoma of lymph nodes (1765), cryptococcosis (1175), late syphilis, latent (096), candidiasis of mouth (1120), cryptococcal meningitis (3210), neurosyphilis, unspecified (0949), kaposi's sarcoma of unspecified (1769), and pneumocystosis (1363) are highly correlated. Those φ-correlations are all above .06. The φ-correlation of human immunodeficiency virus infection with specified conditions (042) with pneumocystosis (1363) is highest.
Human immunodeficiency virus infection with specified infections (0420) and specified infections causing other specified infections (0421), meningoencephalitis due to toxoplasmosis (1300), pneumocystosis (1363), kaschin-beck disease (7160), kaposi's sarcoma of other specified sites (1768), human immunodeficiency virus infection with specified malignant neoplasms (0422), and falciparum malaria (0840) are highly correlated. Those φ-correlations are all above .06. The φ-correlation of specified infections causing other specified infections (0421) with human immunodeficiency virus infection with specified infections (0420) is highest. Human immunodeficiency virus acquired immunodeficiency syndrome, unspecified (0429) and HIV infection, unspecified (0449) are highly correlated, and the φ-correlation is .09. Through the PDN, this paper has identified the diseases that are associated with HIV/AIDS. The PDN was shown to have a complex structure in which some diseases are highly connected while others are barely connected at all. While not conclusive, these findings may explain the observation that more connected diseases are seen to be more lethal, as patients developing highly connected diseases are more likely to be at an advanced stage of disease, which can be reached through multiple paths in the PDN. The findings suggest that human immunodeficiency virus infection with specified conditions (042) and pneumocystosis pneumonia (1363) are highly correlated. This result is consistent with Aliouat-Denis, Chabé, Demanche, Aliouat el, Viscogliosi, Guillot, Delhaes, and Dei-Cas's study [13]. Pneumocystis pneumonia is seen especially in people with cancer undergoing chemotherapy, people with HIV/AIDS, and people using medications that suppress the immune system. Human immunodeficiency virus infection with specified infections (0420) and meningoencephalitis due to toxoplasmosis (1300) are highly correlated.
In addition, it is also highly correlated with human immunodeficiency virus infection specified infections causing other specified infections (0421). This result agrees with the study by Dubey, Hill, Jones, Hightower, Kirkland, Roberts, Marcet, Lehmann, Vianna, Miska, Sreekumar, Kwok, Shen, and Gamble [14]. Human immunodeficiency virus infection with specified malignant neoplasms (0422) and kaposi's sarcoma of other specified sites (1768) are highly correlated. This result agrees with the study by Holmes, Hawson, Liu, Friedman, Khiabanian, and Rabadan [15]. Since patients infected with HIV/AIDS have a high risk of developing Kaposi sarcoma, the prevention of this malignant disease should be prioritized. Human immunodeficiency virus acquired immunodeficiency syndrome, unspecified (0429) and progressive multifocal leukoencephalopathy (0463) are highly correlated. This result agrees with the studies by Casado, Corral, García, Martinez-San Millán, Navas, Moreno, and Moreno [16] and by Sano, Nakano, Omoto, Takao, Ikeda, Oga, Nakamichi, Saijo, Maoka, Sano, Kawai, and Kanda [17]. Progressive multifocal leukoencephalopathy is a usually fatal viral disease characterized by progressive damage or inflammation of the white matter of the brain at multiple locations. It is caused by the JC virus, which is normally present and kept under control by the immune system. JC virus is harmless except in cases of weakened immune systems. Progressive multifocal leukoencephalopathy occurs almost exclusively in patients with severe immune deficiency, most commonly among patients with acquired immune deficiency syndrome, but people on chronic immunosuppressive medications, including chemotherapy, are also at increased risk [16, 17]. For females, human immunodeficiency virus infection with specified conditions (042) and cryptococcal meningitis (3210), kaposi's sarcoma of unspecified (1769), and pneumocystosis (1363) are highly correlated.
Human immunodeficiency virus acquired immunodeficiency syndrome, unspecified (0429) and other cerebellar ataxia (3343) and progressive multifocal leukoencephalopathy (0463) are highly correlated. For males, human immunodeficiency virus infection with specified conditions (042) and pneumocystosis (1363) are highly correlated. Human immunodeficiency virus infection with specified infections (0420) and meningoencephalitis due to toxoplasmosis (1300) and falciparum malaria (0840) are highly correlated. Human immunodeficiency virus infection with specified malignant neoplasms (0422) and kaposi's sarcoma of other specified sites (1768) are highly correlated. Human immunodeficiency virus acquired immunodeficiency syndrome, unspecified (0429) and HIV infection, unspecified (0449) are highly correlated. Exploring comorbidities from a network perspective could help determine whether differences in the comorbidity patterns expressed in different populations indicate differences in race, country, or socioeconomic status for each population. As an initial step, this study shows that there are differences in the strength of comorbidities measured for patients of different genders. The PDN could be the starting point of studies exploring these and related questions.
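One way to make the gender comparison concrete is to compare the sets of significant links obtained for each group. The sketch below assumes that φ values have already been computed separately for each stratum. The helper names `significant_links` and `compare_strata` are hypothetical, and the φ values are invented placeholders chosen only so that the example runs; they are not results of this study.

```python
def significant_links(phi_by_pair, threshold=0.06):
    """Return the set of undirected links whose phi exceeds the threshold."""
    return {frozenset(pair) for pair, phi in phi_by_pair.items() if phi > threshold}


def compare_strata(phi_a, phi_b, threshold=0.06):
    """Split significant links into shared and stratum-specific sets."""
    links_a = significant_links(phi_a, threshold)
    links_b = significant_links(phi_b, threshold)
    return {
        "shared": links_a & links_b,
        "only_a": links_a - links_b,
        "only_b": links_b - links_a,
    }


# Placeholder phi values per stratum (illustrative, not study results).
phi_female = {
    ("042", "1363"): 0.15,
    ("0429", "3343"): 0.07,
}
phi_male = {
    ("042", "1363"): 0.22,
    ("042", "0919"): 0.08,
    ("0429", "3343"): 0.02,  # below the threshold in this stratum
}

result = compare_strata(phi_female, phi_male)
for name, links in result.items():
    print(name, sorted(sorted(link) for link in links))
```

Links are stored as `frozenset` pairs so that the comparison does not depend on the order in which the two codes of a pair were recorded.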
Communicable Diseases & Prevention-HIV/AIDS, Health Topics
A Dynamic Network Approach for the Study of Human Phenotypes
The Human Disease Network
A Network Approach to Clinical Intervention in Neurodegenerative Diseases
Infertility Etiologies Are Genetically and Clinically Linked with Other Diseases in Single Meta-Diseases
Network-Based Analysis of Comorbidities Risk During an Infection: SARS and HIV Case Studies
Chronic Conditions and Risk of In-Hospital Death
Improved Comorbidity Adjustment for Predicting Mortality in Medicare Populations
Probing Genetic Overlap among Complex Human Phenotypes
International Classification of Diseases, Ninth Revision
The Phi-coefficient, the Tetrachoric Correlation Coefficient, and the Pearson-Yule Debate
Pneumocystis Species, Co-Evolution and Pathogenic Power. Infection
Prevalence of Viable Toxoplasma Gondii in Beef, Chicken, and Pork from Retail Meat Stores in the United States: Risk Assessment to Consumers
Discovering Disease Associations by Integrating Electronic Clinical Data and Medical Literature
Continued Declining Incidence and Improved Survival of Progressive Multifocal Leukoencephalopathy in HIV/AIDS Patients in the Current Era
Rituximab-associated Progressive Multifocal Leukoencephalopathy Derived from Non-Hodgkin Lymphoma: Neuropathological Findings and Results of Mefloquine Treatment

Acknowledgements. This study is based in part on data from the National Health Insurance Research Database provided by the Bureau of National Health Insurance, Department of Health and managed by National Health Research Institutes (NHRI). The interpretation and conclusions contained herein do not represent those of Bureau of National Health Insurance, Department of Health or National Health Research Institutes.