key: cord-0821798-erwec022 authors: Byeon, Haewon title: Exploring Factors for Predicting Anxiety Disorders of the Elderly Living Alone in South Korea Using Interpretable Machine Learning: A Population-Based Study date: 2021-07-18 journal: Int J Environ Res Public Health DOI: 10.3390/ijerph18147625 sha: 7a33ab5eb74885333b053640a1d464f58c500956 doc_id: 821798 cord_uid: erwec022 This epidemiological study aimed to develop an X-AI that could explain groups with a high anxiety disorder risk in old age. To achieve this objective, (1) this study explored the predictors of senile anxiety using base models and meta models. (2) This study presented decision tree visualization that could help psychiatric consultants and primary physicians easily interpret the path of predicting high-risk groups based on major predictors derived from final machine learning models with the best performance. This study analyzed 1558 elderly people (695 males and 863 females) who were 60 years or older and completed Zung’s Self-Rating Anxiety Scale (SAS). We used support vector machine (SVM), random forest, LightGBM, and AdaBoost as base models (single predictive models) and the XGBoost algorithm as the meta model. The analysis results confirmed that the predictive performance of the “SVM + Random forest + LightGBM + AdaBoost + XGBoost model (stacking ensemble: accuracy 87.4%, precision 85.1%, recall 87.4%, and F1-score 85.5%)” was the best. Also, the results of this study showed that the elderly who often (or mostly) felt subjective loneliness, had a Self Esteem Scale score of 26 or less, and rated their subjective communication with family at 4 or less (on a 10-point scale) were the group at the highest risk of anxiety disorder. The results of this study imply that it is necessary to establish a community-based mental health policy that can identify elderly groups with high anxiety risks based on multiple risk factors and manage them constantly.
Anxiety, which is defined as a disorder causing difficulties in daily life due to excess worry, fear, and hyperarousal, is known as one of the most common mental disorders worldwide [1]. It was reported that one in five Americans suffered from anxiety disorders [2] and the lifetime prevalence of anxiety disorders was 9.3% in South Korea [3]. The number of patients with an anxiety disorder is rapidly increasing in South Korea: the number of patients treated for an anxiety disorder increased from 533,619 in 2014 to 690,735 in 2018, a 29.4% increase in five years [3]. In particular, the incidence rate of anxiety disorders by age group showed that the number of treated patients per 100,000 increased the most (15% increase) from 2014 to 2018 in the elderly group (≥60 years old), suggesting that the elderly experience anxiety frequently and that anxiety disorder is a rapidly increasing mental illness. A number of epidemiologic studies [3] [4] [5] have reported that the prevalence of anxiety disorders in the elderly is lower than that of the young/prime-aged. In particular, Gum et al. (2009) [5] examined a community-based epidemiologic survey and showed that the prevalence of anxiety disorders was 20.7% in the 18-44 years old group, 18.7% in the 45-64 years old group, and 7.0% in the 65 years old or older group, indicating that the prevalence among the elderly was the lowest. However, it is believed that the actual prevalence of anxiety disorders in the elderly may be higher than reported, considering that the elderly are reluctant to recall and report psychiatric symptoms and often tend to express the symptoms in physical terms [6].
The elderly are at very high risk of experiencing anxiety because (1) they face a lot of social stress such as bereavement, retirement, economic hardship, and abuse from people around them, (2) they are vulnerable to anxiety due to neurobiological changes in the brain as a result of aging, (3) they are more likely to experience the fear of death in senescence, and (4) they suffer from more physical diseases than younger people and take many medications [7]. Nevertheless, since the elderly perceive emotional problems such as depression and anxiety as a result of aging and do not actively seek medical assistance, only a small number of them are diagnosed with an anxiety disorder and treated [8]. Anxiety disorders can be treated with anti-anxiety drugs such as buspirone or with psychotherapy [9]. Therefore, it is important to identify factors associated with anxiety and to detect and manage people who are very vulnerable to anxiety as soon as possible. It is highly likely that anxiety is affected by social factors as well as the physical and psychological problems of individuals [10]. Therefore, it is necessary to consider environmental factors such as social factors and social networks, in addition to sociodemographic characteristics, when identifying factors related to anxiety. The capability to cope emotionally with social and environmental changes inevitably weakens in old age, when people tend to be highly dependent on social factors in terms of economic, physical, and mental health [11, 12]. Moreover, the risk factors of anxiety are complex and more likely to cluster with each other [11, 12]. Therefore, it is important in public health science to understand the characteristics of anxiety in old age, considering that South Korea is facing a super-aged society. It is clear that the elderly are vulnerable to anxiety and that anxiety disorder is a common disease in the elderly.
However, compared to other mental disorders such as cognitive disorders, only a few studies have evaluated the risk factors of anxiety disorder in old age while considering social factors and social networks as well as sociodemographic and personal characteristics [7]. Many recent studies [13, 14] have used machine learning based on big data to identify the risk factors of a disease while considering multiple risk factors. However, a single machine learning technique may show lower prediction performance, depending on the algorithm used, and it can induce errors because the bias inherent in each algorithm can affect the prediction result. For example, a decision tree model such as Iterative Dichotomiser 3 (ID3) is very useful for making simple decisions; however, when the tree becomes complicated, it has lower prediction power and poses a risk of result instability (the possibility of deriving different results in iterated analysis) [15]. As an alternative method to overcome this limitation, many studies have developed predictive models using various machine learning techniques and combined them into a stacking ensemble learning model to reduce the risk of bias that individual models may have [16] [17] [18]. On the other hand, when developing a predictive model using medical data, explanatory power (interpretation) of the results is important in addition to accuracy. Recently, one important issue in medical artificial intelligence (AI) is to develop eXplainable Artificial Intelligence (X-AI) that can explain and present decisions made by AI in a form that can be understood by humans [19]. In the case of image classification, which uses unstructured data, new methods such as learning deep explanation or gradient-weighted class activation mapping (Grad-CAM) have been developed and used in various fields [20]. In the case of structured data, such as examination data, Carvalho et al. (2019) [21] and Wang et al. (2019) [22] introduced a method of presenting the key predictors derived from machine learning with decision tree visualization as an alternative way to increase the interpretability of the black box model.
This epidemiological study aimed to develop an X-AI that could explain groups with a high anxiety disorder risk in old age. To achieve this objective, (1) this study explored the predictors of senile anxiety using base models and meta models. (2) This study presented decision tree visualization that could help psychiatric consultants and primary physicians easily interpret the path of predicting high-risk groups based on major predictors derived from final machine learning models with the best performance.
This study is a secondary analysis of data from the Korean Psychosocial Anxiety (KPA) Survey, a national survey. The KPA survey was conducted from August to September 2015 under the supervision of the Korea Institute for Health and Social Affairs. The survey stratified 17 cities and provinces in South Korea using the population data of the statistical yearbook (complete enumeration) of the Ministry of Safety and Public Administration as of June 2015, and sampled using the quota sampling method while considering the composition ratios of gender, age, and residential region. It selected 200 eup, myeon, or dong as sampling sites using the probabilities proportional to size (PPS) method, treating the 3552 eup, myeon, or dong in South Korea as the population. PPS was applied after sorting cities, counties, and districts by administrative district code to secure the randomness of the samples. After the 200 sample sites were chosen, surveyors visited each site and chose the fifth household from the community center of each eup, myeon, and dong. As a result, this study surveyed 7000 adults who were 19 years or older.
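The site selection described above can be illustrated with a minimal sketch of systematic PPS sampling. The text states only that PPS was applied to a frame sorted by administrative district code; the systematic variant used here, the function name `pps_systematic`, and the synthetic unit sizes are all assumptions made for illustration and are not taken from the KPA survey.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def pps_systematic(sizes, n, rng):
    """Pick n unit indices with probability proportional to size.

    Assumes `sizes` is already in frame order (e.g. sorted by district code)
    and that no single unit is larger than the sampling interval.
    """
    cumulative = list(accumulate(sizes))   # running population totals
    total = cumulative[-1]
    interval = total / n                   # one selection point per interval
    start = rng.random() * interval        # random start within the first interval
    picks = []
    for k in range(n):
        point = start + k * interval
        # First unit whose cumulative total exceeds the selection point.
        index = bisect_right(cumulative, point)
        picks.append(min(index, len(sizes) - 1))  # guard against float rounding
    return picks


# Synthetic frame: 3552 units with made-up population sizes (not survey data).
rng = random.Random(2015)
frame_sizes = [rng.randint(500, 30000) for _ in range(3552)]
sites = pps_systematic(frame_sizes, 200, rng)
print(len(sites), sites[:5])
```

Because the selection points are evenly spaced along the cumulative size, larger units are proportionally more likely to contain a point, and sorting the frame beforehand spreads the selected sites across the sort order.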
A trained surveyor visited each sample household and conducted a 1:1 survey using a computer-assisted personal interview. This study was approved by the Clinical Research Ethics Committee of University H (No. 20180042). This study analyzed 1558 elderly people (695 males and 863 females) who were 60 years or older and completed Zung's Self-Rating Anxiety Scale (SAS) [23], which was translated into Korean and standardized. Anxiety disorder, the outcome variable, was measured using the Korean version of SAS [23], which is a translated and standardized version of Zung's SAS [24]. SAS is a self-reporting test that encompasses emotional and psychophysiological aspects. It is a widely used standardized screening test that can easily measure anxiety disorders in healthy people [25]. The SAS consists of 20 items rated on a 4-point Likert scale, for a maximum total score of 80 points. A higher score indicates more severe anxiety symptoms. When the Korean version of SAS was developed, the Cronbach alpha value, indicating internal consistency, was 0.96, and the overall accurate discrimination rate between healthy people and patients with anxiety was 93.7% [24]. In this study, the threshold for anxiety disorder was set at 45 points.
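As a rough illustration of how the outcome variable could be derived from item responses, the sketch below sums 20 responses on a 1 to 4 scale and applies the 45-point threshold. Treating a score of 45 or more as positive is an assumption, since the text does not say whether the cut-off is inclusive. The text also does not say whether any items are reverse-keyed, so the `reverse_items` parameter is a hypothetical hook that is empty by default. The function names are illustrative only.

```python
from typing import Iterable, Sequence

N_ITEMS = 20                 # number of SAS items stated in the text
MIN_RESP, MAX_RESP = 1, 4    # 4-point Likert responses
THRESHOLD = 45               # cut-off from the text; ">=" is an assumption


def sas_total(responses: Sequence[int], reverse_items: Iterable[int] = ()) -> int:
    """Sum 20 item responses; items listed in reverse_items (1-based) are flipped."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    reverse = set(reverse_items)
    total = 0
    for number, value in enumerate(responses, start=1):
        if not MIN_RESP <= value <= MAX_RESP:
            raise ValueError(f"item {number}: response {value} out of range")
        # A reversed item maps 1<->4 and 2<->3.
        total += (MIN_RESP + MAX_RESP - value) if number in reverse else value
    return total


def screens_positive(responses: Sequence[int], reverse_items: Iterable[int] = ()) -> bool:
    """Binary outcome: True when the total reaches the assumed cut-off."""
    return sas_total(responses, reverse_items) >= THRESHOLD


example = [2] * 15 + [3] * 5   # made-up responses, total = 45
print(sas_total(example), screens_positive(example))
```

With these bounds, totals range from 20 to 80, which matches the 80-point maximum given in the text.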
Referring to previous studies [26] [27] [28] [29] [30], explanatory variables of this study included age, self-esteem, alcohol use disorder (normal drinker, high-risk drinker, or alcohol use disorder), subjective loneliness (very rare, occasionally lonely, often lonely, or mostly lonely), the experience of suicidal urge over the past year (yes or no), subjective frequency of communication with neighbors and friends (10-point scale; a higher score means more frequent communication), subjective frequency of communication with other family members (10-point scale), subjective satisfaction with help (support) from neighbors (yes or no), regular club activities (yes or no), perceived social support, subjective trust satisfaction with neighbors (yes or no), subjective satisfaction in the safety level of the neighborhood (yes or no), subjective satisfaction in the living environment of the neighborhood (yes or no), subjective satisfaction in the medical service of the region (yes or no), mean monthly household income (