key: cord-0055308-9afi91l7 authors: Lee, Sang Chul; An, Chansik; Yoo, Jongha; Park, Sungho; Shin, Donggyo; Han, Chang Hoon title: Development and validation of a nomogram to predict pulmonary function and the presence of chronic obstructive pulmonary disease in a Korean population date: 2021-01-19 journal: BMC Pulm Med DOI: 10.1186/s12890-021-01391-z sha: 97f1ecac8bc485ee9b11445a09c5194bc532436a doc_id: 55308 cord_uid: 9afi91l7 BACKGROUND: Early suspicion followed by assessing lung function with spirometry could decrease the underdiagnosis of chronic obstructive pulmonary disease (COPD) in primary care. We aimed to develop a nomogram to predict the FEV(1)/FVC ratio and the presence of COPD. METHODS: We retrospectively reviewed the data of 4241 adult patients who underwent spirometry between 2013 and 2019. By linear regression analysis, variables associated with FEV(1)/FVC were identified in the training cohort (n = 2969). Using the variables as predictors, a nomogram was created to predict the FEV(1)/FVC ratio and validated in the test cohort (n = 1272). RESULTS: Older age (β coefficient [95% CI], − 0.153 [− 0.183, − 0.122]), male sex (− 1.904 [− 2.749, − 1.056]), current or past smoking history (− 3.324 [− 4.200, − 2.453]), and the presence of dyspnea (− 2.453 [− 3.612, − 1.291]) or overweight (0.894 [0.191, 1.598]) were significantly associated with the FEV(1)/FVC ratio. In the final testing, the developed nomogram showed a mean absolute error of 8.2% between the predicted and actual FEV(1)/FVC ratios. The overall performance was best when FEV(1)/FVC < 70% was used as a diagnostic criterion for COPD; the sensitivity, specificity, and balanced accuracy were 82.3%, 68.6%, and 75.5%, respectively. CONCLUSION: The developed nomogram could be used to identify potential patients at risk of COPD who may need further evaluation, especially in the primary care setting where spirometry is not available. 
Open Access. Correspondence: chansikan@gmail.com; chansikan@nhimc.or.kr. Research Institute, National Health Insurance Service Ilsan Hospital, Goyang, Republic of Korea. Full list of author information is available at the end of the article.

Chronic obstructive pulmonary disease (COPD), the third leading cause of mortality worldwide, is a common and preventable disease characterized by progressive airflow obstruction [1, 2]. One of the main challenges in COPD is its frequent underdiagnosis [3]. People with early or undiagnosed COPD have been shown to have significant morbidity from exacerbations many years before their diagnosis, which can add to healthcare costs [4]. One of the most important factors contributing to the delayed diagnosis of COPD is the low use of spirometry in primary care [3, 5, 6]. People with early or undiagnosed COPD are most likely to encounter the healthcare system in the primary care setting. Therefore, earlier diagnosis of COPD in primary care followed by proper management could significantly improve the prognosis [7]. However, primary care providers do not always have the access, time, or training needed to use spirometry for patients suspected of having COPD [8]. Alternatively, an easy-to-use tool to predict spirometry results or the presence of COPD would be helpful in enhancing COPD screening in primary care. When a patient is predicted to have COPD, the primary care provider could then refer the patient for spirometry. Previous studies have proposed prediction models for the diagnosis or prognosis of COPD patients [9-16]. However, most of these models require information not available in the primary care setting and were developed mainly in non-Asian populations.
Therefore, the purpose of this study was to develop and validate a history- and symptom-based nomogram that can be conveniently used to predict a spirometry result, namely the forced expiratory volume in one second (FEV1)/forced vital capacity (FVC) ratio, and the presence or absence of COPD.

We searched our electronic medical record database and found 6322 adult (≥ 40 years) patients who underwent pulmonary function tests, including spirometry, at a single medical institution in South Korea (hereafter, Korea) between January 2012 and December 2019. Of these, 1703 patients who were already diagnosed with COPD in 2012 were excluded. Thus, patients included in this study were either first diagnosed with COPD between 2013 and 2019 or were not diagnosed with COPD during the study period. When a patient underwent multiple spirometry measurements, the first test result was used to exclude a possible treatment effect on the spirometry results. Of the remaining patients, 378 were excluded because smoking history was missing. Other respiratory ailments, such as asthma, bronchiectasis, or interstitial lung disease, were not included in this study. The final study cohort was randomly split into training and test cohorts at a ratio of 7:3 while preserving the same proportion of COPD patients (Fig. 1). The ratio of 7:3 is commonly used as a rule of thumb when splitting a dataset into training and test sets; a recent machine-learning study also reported that using 70% or 80% of the data as a training set gave the best result [17].

To assess pulmonary function, spirometry was performed according to the American Thoracic Society/European Respiratory Society (ATS/ERS) standards by trained research assistants [18]. A dry rolling-seal spirometer (Model 2130; SensoMedics, Yorba Linda, CA, USA) was used for all subjects. All spirometry traces were reviewed by a specialist to determine whether they fulfilled the reproducibility and acceptability criteria of the ATS/ERS Task Force.
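The stratified 7:3 split described above can be illustrated with a minimal sketch. This is not the authors' code (the study used R); the function name `stratified_split`, the seed, and the toy labels are assumptions made only for illustration. Each class is shuffled and cut separately so that the proportion of COPD patients is the same in both cohorts.

```python
import random


def stratified_split(labels, train_frac=0.7, seed=0):
    """Split indices into train/test sets, keeping each class's share equal.

    Hypothetical helper for illustration; not taken from the study.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                      # random order within class
        cut = round(train_frac * len(indices))    # 70% of this class
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Toy cohort: 10 COPD (1) and 90 non-COPD (0) patients.
labels = [1] * 10 + [0] * 90
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))              # 70 30
print(sum(labels[i] for i in train_idx))          # 7 COPD cases in training
```

Applied to the 4241 patients of the final cohort, a 70% cut is consistent with the reported cohort sizes of 2969 and 1272.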
The normal predictive values for spirometry data were calculated using a reference equation derived from Korea's general population [19]. A fixed criterion for the ratio of forced expiratory volume in 1 s to forced vital capacity (i.e., FEV1/FVC < 0.7) was used to diagnose patients with COPD in accordance with the Global Initiative for Chronic Obstructive Lung Disease (GOLD) guidelines [20].

Outcome variables were the FEV1/FVC ratio and the presence or absence of COPD. Predictors for the outcome were age, sex, overweight (defined as body mass index [BMI] > 25 kg/m²), smoking history, the symptoms of dyspnea, cough, or sputum, the presence or absence of underlying hypertension, diabetes, congestive heart failure, coronary vascular disease, stroke, or anemia, and the prior use of salbutamol or antibiotics. The information on age, sex, height, weight, and the presence or absence of the symptoms was retrieved from the electronic medical records, while the information on medication and preexisting medical conditions was extracted from the national health insurance claim data. The diagnosis claimed by the healthcare providers and the actual diagnosis may differ because the dataset was established for recording claims and reimbursements. Thus, operational definitions were used to determine the presence of medical conditions, as previously reported [21]. For example, hypertension was diagnosed when a patient on antihypertensive medication was admitted for the first time or visited an outpatient clinic for a second time with the International Classification of Diseases 10th revision (ICD-10) codes for hypertensive disease. The prior use of salbutamol or antibiotics was defined as the presence of the drug codes together with the ICD-10 codes for lower respiratory infections within 3 years prior to the spirometry. The antibiotics included were amoxicillin, clarithromycin, co-amoxiclav, erythromycin, doxycycline, and cephalexin.
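The two threshold-based definitions given above (overweight as BMI > 25 kg/m² and COPD as FEV1/FVC < 0.7) can be written out as a short sketch. The function names and the example measurements are hypothetical and are not taken from the study data.

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2 from weight (kg) and height (cm)."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def is_overweight(weight_kg, height_cm, cutoff=25.0):
    """Overweight as defined in the text: BMI strictly above 25 kg/m^2."""
    return bmi(weight_kg, height_cm) > cutoff


def has_copd(fev1_l, fvc_l, cutoff=0.7):
    """Fixed-ratio criterion from the text: FEV1/FVC strictly below 0.7."""
    return (fev1_l / fvc_l) < cutoff


# Made-up measurements, for illustration only.
print(is_overweight(80, 170))   # True  (BMI is about 27.7)
print(is_overweight(60, 170))   # False (BMI is about 20.8)
print(has_copd(1.8, 3.0))       # True  (ratio 0.60)
print(has_copd(2.8, 3.5))       # False (ratio 0.80)
```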
Smoking history was available from both our medical records and the national health screening results, as all Koreans over 40 years old are mandated to undergo the biennial national health screening, which contains a questionnaire about smoking habits. However, there were approximately 5 times more missing values in the health screening records than in the medical records. Therefore, we mainly used smoking history from the medical records; we used the health screening database only when smoking history was missing in the medical records.

Continuous and categorical variables were compared between the training and test cohorts using t tests and chi-square tests, respectively. Univariable and multivariable linear regression was performed to determine the association between the risk factors and the FEV1/FVC ratio and to find independent predictors. In the multivariable regression, only variables with a significant association with the FEV1/FVC ratio in the univariable regression were used. A linear regression model for predicting the FEV1/FVC ratio was fitted in the training cohort and validated in the test cohort using mean absolute error (MAE) as the evaluation metric. In addition, the agreement between the predicted and actual FEV1/FVC values was graphically assessed using a Bland-Altman plot. A nomogram to predict FEV1/FVC was created based on the prediction model fitted in the training cohort. Using predicted FEV1/FVC values as a diagnostic criterion, the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and balanced accuracy were calculated for discriminating between patients with and without COPD, with their 95% confidence intervals (CIs). Our study cohort was imbalanced, with approximately 9 times more patients in the non-COPD group than in the COPD group.
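To illustrate why balanced accuracy is reported alongside the other diagnostic metrics in a cohort with roughly nine non-COPD patients per COPD patient, the following sketch computes the metrics listed above from binary labels. It is a simplified illustration rather than the study's analysis (which used the R packages named in this article); the function name and the toy cohort are assumptions.

```python
def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, NPV, accuracy, and balanced accuracy.

    Labels are 1 (COPD) or 0 (non-COPD). A metric whose denominator is
    zero is returned as None.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else None

    sens = ratio(tp, tp + fn)
    spec = ratio(tn, tn + fp)
    balanced = None
    if sens is not None and spec is not None:
        balanced = (sens + spec) / 2
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, len(pairs)),
        "balanced_accuracy": balanced,
    }


# Toy 1:9 cohort and a trivial rule that labels everyone as non-COPD.
y_true = [1] * 100 + [0] * 900
y_pred = [0] * 1000
metrics = diagnostic_metrics(y_true, y_pred)
print(metrics["accuracy"])           # 0.9
print(metrics["balanced_accuracy"])  # 0.5
```

The trivial rule reaches 90% accuracy while detecting no COPD case at all, whereas its balanced accuracy of 50% exposes the failure.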
In an imbalanced cohort, it is highly likely that predicted outcome values are biased towards the majority group (i.e., the non-COPD group or a higher FEV1/FVC ratio in this study). Therefore, when training the model, we used the synthetic minority over-sampling technique (SMOTE) algorithm to create synthetic minority class cases to balance the two classes [22].

All analyses were performed using R 3.6.0. The packages used include 'stats (v3.6.0)' for linear regression, 'pROC (v1.15.3)' for AUC analysis, 'epiR (v1.0-15)' for calculating diagnostic performances, 'DMwR (v0.4.1)' for SMOTE, 'rms (v5.1-3.1)' for drawing a nomogram, and 'BlandAltmanLeh (v0.3.1)' for drawing a Bland-Altman plot. Two-sided probability values of < 0.05 were considered statistically significant.

The final study cohort comprised 4241 patients (2204 men and 2037 women) with a mean age of 67 (range, 40-98) years. The mean or frequency of all the variables was not significantly different between the training and test cohorts (Table 1). [...] (Table 2). The mean absolute difference between the predicted and actual FEV1/FVC ratios (i.e., MAE) was 8.858 in the training cohort and 8.721 in the test cohort. For FEV1/FVC in the range between 65 and 75, the MAE was 6.324 in the training cohort and 6.490 in the test cohort.

In the diagnosis of COPD using the predicted FEV1/FVC ratio, the AUC was 0.832 (95% CI 0.812-0.845) and 0.822 (95% CI 0.789-0.854) in the training and test cohorts, respectively. The overall performance was best when the criterion of FEV1/FVC < 70 was used to diagnose COPD; the sensitivity, specificity, PPV, NPV, and balanced accuracy were 82.3%, 68.6%, 25.5%, 96.7%, and 75.5%, respectively (Table 3). The Bland-Altman plot revealed a trend that our model overestimated FEV1/FVC when the actual FEV1/FVC value was less than 65; in this range, many cases were observed above the upper 95% limit of agreement (Fig. 2).
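The agreement measures used above, MAE and the Bland-Altman bias with its 95% limits of agreement, can be sketched as follows. The numbers are made up and the sign convention (predicted minus actual) is an assumption; the study itself used the R package 'BlandAltmanLeh'.

```python
from statistics import mean, stdev


def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| over all patients."""
    return mean(abs(p - a) for p, a in zip(predicted, actual))


def bland_altman(predicted, actual):
    """Return (bias, lower limit, upper limit) of agreement.

    Differences are taken as predicted minus actual (assumed convention);
    the limits are bias -/+ 1.96 sample standard deviations.
    """
    diffs = [p - a for p, a in zip(predicted, actual)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Made-up FEV1/FVC values (in %), for illustration only.
predicted = [72, 75, 70, 78]
actual = [60, 74, 71, 80]

print(mean_absolute_error(predicted, actual))   # 4
bias, lower, upper = bland_altman(predicted, actual)
print(bias)                                     # 2.5
```

In this toy example the single patient with a low actual ratio (60) produces the largest positive difference, which is the kind of pattern the Bland-Altman plot makes visible.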
Because of this overestimation at low actual values, the effective range of the FEV1/FVC ratio predicted by our nomogram was from 65 to 90; a predicted FEV1/FVC value less than 65 or larger than 90 must be interpreted as 'less than 65' or 'larger than 90', respectively, instead of the actual value itself (Fig. 3).

In this study, we developed a multivariable model to identify patients who are expected to have decreased pulmonary function and thus are at risk for COPD. In developing this prediction model, we aimed to create an easy-to-use tool that can help primary care providers decide whether to refer a patient suspected of having COPD to a facility where spirometry is available. Thus, we examined variables that are obtainable from simple physical examination and history taking as potential predictors.

In our study, the five variables associated with airflow limitation (i.e., decreased FEV1/FVC) were older age, male sex, the absence of overweight, the presence of dyspnea, and ever-smoking history. Old age, male sex, and smoking are well-known risk factors for COPD. Historically, COPD has been considered a disease of elderly male smokers, although evidence suggests that this historical view is slowly changing [23]. The prevalence and mortality of COPD have increased more rapidly in women than in men during the past 2 decades, which has been attributed to the changing smoking trends during the past 50 years [24]. Hence, reevaluation of risk stratification by gender is warranted in the future.

Tobacco smoking is the most powerful risk factor for COPD. Although accurate information on actual smoking habits (duration, amount, and type of cigarette) is of utmost importance, the information in electronic medical records is often quite variable depending on the timing of data entry, the visit route (i.e., outpatient, emergency room, or general ward), or the medical staff who entered the data. Thus, we processed the smoking data as a binary variable: non-smoker and ever (current or past)-smoker.
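The handling of smoking history described in this article (a binary non-smoker versus ever-smoker variable, taken from the medical record and from the health screening questionnaire only when the medical record is missing) might be coded along the following lines. The category strings "never", "current", and "past" and both function names are hypothetical; the actual coding of the source records is not given in the text.

```python
def to_ever_smoker(status):
    """Map a smoking status to 1 (ever-smoker), 0 (non-smoker), or None.

    The status labels are assumed for illustration; None or an
    unrecognised label is treated as missing.
    """
    if status is None:
        return None
    label = status.strip().lower()
    if label in ("current", "past"):
        return 1
    if label == "never":
        return 0
    return None


def resolve_smoking(emr_status, screening_status):
    """Prefer the medical record; fall back to screening only if missing."""
    primary = to_ever_smoker(emr_status)
    if primary is not None:
        return primary
    return to_ever_smoker(screening_status)


print(resolve_smoking("past", "never"))    # 1: medical record takes priority
print(resolve_smoking(None, "current"))    # 1: filled from health screening
print(resolve_smoking(None, None))         # None: would be excluded as missing
```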
In this study, the presence of overweight showed a protective effect, which is in line with previous studies. A study of Asian COPD patients reported that COPD patients with a higher BMI have better pulmonary function [25]. In another study, while underweight was associated with poor survival in COPD, overweight and obesity had a protective effect on mortality in COPD patients [26].

The ATS/ERS guideline recommends using percentile values rather than a fixed percentage for interpreting the results of a pulmonary function test because using 0.70 as the lower limit of the FEV1/FVC ratio could yield a significant number of false-positive results in males aged > 40 years and females aged > 50 years [27]. However, our purpose was to develop an easy-to-use nomogram for primary care physicians to decide whether to refer a patient for a pulmonary function test. For physicians who are not familiar with spirometry, a fixed ratio can be more intuitive and easier to read than a percentile value when making such a decision. In addition, in this screening setting, a false-negative diagnosis caused by underestimating the risk is generally considered worse than a false-positive diagnosis caused by overestimating the risk. Furthermore, the results of a pulmonary function test were similar with both the fixed ratio and the percentile value in a previous study [28]. Hence, we used a fixed FEV1/FVC ratio as the outcome in this study.

GOLD guidelines also support the use of multivariable prediction models to assess the prognostic profile and facilitate follow-up of patients, instead of single predictors such as spirometry or history of exacerbations [20]. Several models have been proposed to predict the risk of COPD. However, they either require information that cannot be obtained in the primary care setting [9-11, 13, 15] or were developed for non-Asian populations [9, 10, 13-16].
Therefore, our model may have additional value because it only requires simple physical examination and history taking. In addition, since the occurrence and manifestation of COPD are unique to each race and country [29-32], we believe that our model could screen more undiagnosed COPD patients in Korea. We hope to improve our model as more data are obtained in the future and eventually develop a robust, reliable prediction model that can be used nationwide.

Fig. 2 Plot of differences between predicted and actual FEV1/FVC ratio versus the mean of the predicted and actual ratios in the test set. The black line represents the mean of the differences, showing the presence of bias, about −4.7, in the predicted FEV1/FVC ratio. The two dotted lines represent the limits of agreement, ±1.96σ

Fig. 3 Nomogram predicting FEV1/FVC ratio. The nomogram is used by first giving each variable a score on the 'Points' scale. The scores for all variables are then added to obtain the total score and a vertical line is drawn from the 'Total Points' row to estimate the FEV1/FVC ratio. A predicted FEV1/FVC value < 65 or > 90 must be interpreted as 'less than 65' or 'larger than 90', respectively

This study has some limitations. First, further external validation is needed, because this model was developed with a retrospective study in a single institution. Second, detailed smoking history, such as the type of smoking, amount, and duration, was not used in our analysis. In addition to conventional tobacco smoking, various electronic cigarettes using new nicotine delivery technologies have recently gained popularity among the public. Although recent national health screening questionnaires are changing to reflect recent smoking behavior, the data used in this study did not contain it.
Third, we were not able to collect information on the history of occupation or exposure to chemical irritants, which can also be critical for predicting pulmonary function and the presence of COPD.

In conclusion, we developed a nomogram to predict the FEV1/FVC ratio and the presence of COPD based on age, sex, weight, the presence of dyspnea, and smoking history. This nomogram could be used conveniently to screen potential high-risk patients, especially in the primary care setting where spirometry is not available.

COPD guidelines: a review of the 2018 GOLD report
The unmet global burden of COPD
Global burden of COPD
Screening for and early detection of chronic obstructive pulmonary disease
Prevalence and underdiagnosis of chronic obstructive pulmonary disease among patients at risk in primary care
The prevalence of undiagnosed chronic obstructive pulmonary disease in a primary care population with respiratory tract infections-a case finding study
Earlier diagnosis and earlier treatment of COPD in primary care
A mixed methods study to compare models of spirometry delivery in primary care for patients at risk of COPD
A machine learning approach to triaging patients with chronic obstructive pulmonary disease
A new approach for identifying patients with undiagnosed chronic obstructive pulmonary disease
Comparison and development of machine learning tools for the prediction of chronic obstructive pulmonary disease in the Chinese population
Prognostic models for outcome prediction in patients with chronic obstructive pulmonary disease: systematic review and critical appraisal
Developing an algorithm to identify people with Chronic Obstructive Pulmonary Disease (COPD) using administrative data
Development and validation of a model to predict the 10-year risk of general practitioner-recorded COPD
An algorithm for the identification of undiagnosed COPD cases using administrative claims data
Predicting risk of undiagnosed COPD: development and validation of the TargetCOPD score
Hybrid adaptive computational intelligence-based multisensor data fusion applied to real-time UAV autonomous navigation
Standardisation of spirometry
Normal predictive values of spirometry in Korean population
Global strategy for the diagnosis, management, and prevention of chronic obstructive lung disease: the GOLD science committee report 2019
Machine learning prediction for mortality of patients diagnosed with COVID-19: a nationwide Korean cohort study
SMOTE: synthetic minority over-sampling technique
COPD and gender differences: an update
Comparison of World Health Organization and Asia-Pacific body mass index classifications in COPD patients
Body weight and mortality in COPD: focus on the obesity paradox
Interpretative strategies for lung function tests
Comparison of per cent predicted and percentile values for pulmonary function test interpretation
Defining race/ethnicity and explaining difference in research studies on lung function
Ethnic differences in pulmonary function in healthy nonsmoking Asian-Americans and European-Americans
Racial differences in quality of life in patients with
Socioeconomic status, race and COPD health outcomes

Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This study was also part of the Ilsan Hospital Machine Intelligence with National Big Data (I-MIND) project.

Authors' contributions: SCL, CA, and CHH conceived the study. JY and DS obtained and extracted data. CA and SHP cleaned data.
SCL and CA analyzed the data and wrote the paper; CA performed the statistical analysis. All authors have taken due care to ensure the integrity of this work. CA and CHH were in charge of the overall direction. All authors read and approved the final manuscript.

This work was supported by the Technology Innovation Program (20011642, CDM based algorithm treatment protocol service system development and spread), funded by the Ministry of Trade, Industry & Energy (Korea).

Owing to institutional policy, the data can be made available, upon reasonable request, only to researchers who are subject to a non-disclosure agreement.

The Institutional Review Board of National Health Insurance Service Ilsan Hospital (NHIMC 2020-06-005) approved this Health Insurance Portability and Accountability Act-compliant retrospective study and waived the requirement for informed consent. All methods were performed in accordance with relevant guidelines and regulations.

Not applicable.

The authors declare that they have no conflict of interest.