key: cord-0030743-fb2829yj
authors: Nuutinen, Mikko; Haukka, Jari; Virkkula, Paula; Torkki, Paulus; Toppila-Salmi, Sanna
title: Using machine learning for the personalised prediction of revision endoscopic sinus surgery
date: 2022-04-29
journal: PLoS One
DOI: 10.1371/journal.pone.0267146
sha: 92624a96d16350242d6166841d6fc4927cbce6f7
doc_id: 30743
cord_uid: fb2829yj

BACKGROUND: Revision endoscopic sinus surgery (ESS) is often considered for chronic rhinosinusitis (CRS) if maximal conservative treatment and baseline ESS prove insufficient. Emerging research outlines the risk factors of revision ESS. However, accurately predicting revision ESS at the individual level remains uncertain. This study aims to examine the prediction accuracy of revision ESS and to identify the effects of risk factors at the individual level. METHODS: We collected demographic and clinical variables from the electronic health records of 767 surgical CRS patients ≥16 years of age. Revision ESS was performed on 111 (14.5%) patients. The prediction accuracy of revision ESS was examined by training and validating different machine learning models, while the effects of variables were analysed using the Shapley values and partial dependence plots. RESULTS: The logistic regression, gradient boosting and random forest classifiers performed similarly in predicting revision ESS. Area under the receiver operating characteristic curve (AUROC) values were 0.744, 0.741 and 0.730, respectively, using data collected from the baseline visit until six months after baseline ESS. A longer data collection period improved the prediction performance. For data collection times of 0, 3, 6 and 12 months after baseline ESS, AUROC values for the logistic regression were 0.682, 0.715, 0.744 and 0.784, respectively. The number of visits before or after baseline ESS, the number of days from the baseline visit to the baseline ESS, patient age, CRS with nasal polyps (CRSwNP), asthma, non-steroidal anti-inflammatory drug exacerbated respiratory disease and immunodeficiency or suspicion of it were all associated with revision ESS. Patient age and number of visits before baseline ESS had non-linear effects on predictions.
CONCLUSIONS: Intelligent data analysis identified important predictors of revision ESS at the individual level, such as the frequency of clinical visits, patient age, Type 2 high diseases and immunodeficiency or a suspicion of it.


Chronic rhinosinusitis (CRS) is a symptomatic inflammatory disease of the nasal and paranasal mucosa lasting more than 12 weeks [1] . With a prevalence of about 11%, CRS diminishes patient quality of life and productivity and increases healthcare costs [1] . The main phenotypes are CRS with nasal polyps (CRSwNP) and without (CRSsNP) [1] [2] [3] . The majority of CRS cases occurring in Western countries are characterised by Type 2 high inflammation with elevated levels of eosinophils, interleukin-4 (IL-4), IL-5 and IL-13 [1] . Nonsteroidal anti-inflammatory drug (NSAID) exacerbated respiratory disease is a Type 2 high chronic inflammatory syndrome with a partially unknown pathobiology associated with CRSwNP and asthma and with an increased morbidity [4] [5] [6] . Endoscopic sinus surgery (ESS) represents a cost-effective treatment [7] if conservative therapy (such as intranasal corticosteroids and nasal saline irrigation) is insufficient [1] . The success rates for initial ESS range from 76% to 98% [8, 9] . The early identification of CRS recurrence risk following ESS is cost-effective [10, 11] , helping to correctly target treatment [12] and prevent permanent tissue changes [1] .

A substantial number of studies have identified the risk factors of revision ESS [13] [14] [15] [16] [17] [18] [19] [20] [21] , in studies varying according to sample size (n = 66 [21] vs. n = 61 000 [15] ), data collection methods (large retrospective database [15] vs. prospective questionnaires [14] ) or geographic location (USA [15] , Australia [22] and Finland [13] ). Commonly recognised risk factors include nasal polyps, asthma, allergy, non-steroidal anti-inflammatory drug (NSAID) exacerbated respiratory disease (NERD) and a previous ESS. In a meta-analysis [19] , the strongest predictors of revision ESS were allergic fungal rhinosinusitis, NERD, asthma, prior polypectomy and operations prior to 2008. However, no prior research has analysed the prediction accuracy of revision ESS at the individual level or for variables with a nonlinear association. In this study, we examined the accuracy of the personalised prediction of revision ESS, and attempted to identify the effects of important predictor variables via modern machine-learning algorithms and methods.

This study consisted of rhinitis or rhinosinusitis patients presenting at the Department of Otorhinolaryngology at the Hospital District of Helsinki and Uusimaa (HUS), Finland. The HUS ethics committee approved the study protocol (no. 31/13/03/00/2015). Written informed consent from patients was not required for this retrospective follow-up study.

The inclusion criterion for the initial patient population (n = 5080) was an ICD-10 diagnosis of J30, J31, J32, J33 or J01 registered during outpatient visits in 2005, 2007, 2009, 2011 or 2013. Longitudinal data for a random patient sample were collected from the electronic health records (EHRs), such that the sample size was the same for each sampling year and each month of the sampling year. The last data collection day for follow-up was 31 September 2019. CRS was defined as diagnostic codes J33 and/or J32. ESS was defined based on the surgical codes (Table A in S1 File). In total, we excluded 27 CRS patients <16 years of age. The baseline visit was defined as the first clinic visit, and baseline ESS was defined as the first ESS identified in EHRs at the specific sampling time. Revision ESS was defined as an ESS performed following the baseline ESS during the follow-up period.

A total of 111 of 767 (14.5%) CRS patients underwent revision ESS (mean±stdev) 30.3±31.0 months following the baseline ESS (Fig 1a and Table 1 ). Among revised patients, 88 underwent one revision ESS and 23 patients underwent two or more revisions (Fig 1b) . Table 2 summarises the patient characteristics that were analysed both from the structured EHR data (visits, procedure codes and patient diagnoses) and free clinical texts (diagnoses and comorbidities). Comorbidity-related variables were obtained from the ICD-10 codes (Table B in S1 File) and using validated keyword-based information extraction from free clinical texts (see S2 File). For asthma, we used ICD-10 code J45, doctor-diagnosed lung function test-confirmed asthma. A NERD diagnosis was obtained from EHR text and was based on a typical history of airway symptoms following the ingestion of NSAID with/without challenge test confirmation of NERD. 

In this study, we conducted four analyses: univariate models, a machine learning classifier comparison, an analysis of the effect of the data collection time and a model interpretability analysis. The univariate models examined the prediction accuracy of individual variables using univariate logistic regression classifiers. The machine learning classifier comparison examined the predictive performance of three classifiers: random forest, logistic regression and gradient boosting. The random forest and gradient boosting classifiers were chosen since they are widely used and have demonstrated good performance [23]. Logistic regression was chosen because it is simple and still performs relatively well [24]. It remains important to determine whether simpler algorithms perform comparably well. To understand the effect of the data collection time, the performance of the classifier was calculated when variables were collected from the baseline visit to the baseline ESS or to 3, 6 or 12 months following baseline ESS. Fig 2 illustrates the timeline of the data collection for the 0-, 3-, 6- and 12-month models. For example, the model for 3 months was trained and validated using patient data collected from patient EHRs between patients' baseline visits and 3 months following baseline ESS. The logistic regression classifier was selected for the analysis of the data collection time period because it is simple and because the classifier comparison demonstrated that its performance was higher than or similar to that of the other classifiers. However, the logistic regression classifier is linear and thus unable to model possible nonmonotonic relationships between predictors and outcomes. The random forest and gradient boosting classifiers can model complex, nonmonotonic relationships, but are so-called black box models or uninterpretable classifiers: the relationships between inputs and outputs are difficult to understand directly from the parameters or the structure of the trained model. For the model interpretability analysis, we chose to use gradient boosting classifiers. Shapley values (SHAP) and partial dependence plots (PDPs) [25, 26] were computed and analysed for variable importance and possible nonmonotonic effects of the variables.

Fig 3 shows the data flows for training and testing the classifiers. Original data were first divided into two distinct data folds: the training fold (70% of the data) and the test fold (30% of the data). We used the training fold to select the variables and hyperparameters and to train the final models. The test fold served as a held-out dataset, which we used to measure the performance of the final models. During splitting, folds were stratified to preserve the proportion of patients in both target classes (no revision vs. revision).
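The stratified 70/30 split described above can be illustrated with a minimal sketch. It is written in plain Python for clarity and is not the authors' implementation. The function name `stratified_split` and the seed are assumptions, and the label vector only mirrors the cohort sizes reported in this study (111 revision, 656 no revision).

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) so that each class keeps its proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random assignment within each class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Illustrative labels only: 1 = revision ESS, 0 = no revision.
labels = [1] * 111 + [0] * 656
train_idx, test_idx = stratified_split(labels)
test_positive_rate = sum(labels[i] for i in test_idx) / len(test_idx)
```

Repeating the call with different seeds would give the kind of repeated train/test reformulations over which results are averaged.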

The data flow shown in Fig 3a was used to compare the machine learning classifiers and for the analysis of the data collection time period. Here, we summarise the steps in the process, which included variable selection (step 1), searching for model hyperparameters (step 2), model training (step 3) and performance evaluation (step 4). The data flow in Fig 3b provides the univariate model and the model interpretability analyses. Specifically, we proceeded by searching for model hyperparameters (step 2), model training (step 3) and performance evaluation (step 4). One primary difference between the data flows presented is that Fig 3a uses 

The data flow in Fig 3a contains the method of sequential forward variable selection (SFS, step 1) [27]. SFS begins with an empty set, and adds one variable at a time from the original variable set $S_{all}$ (Table 2) for classifier $F(\cdot)$ by maximising the performance measure. We used the area under the receiver operating characteristic (AUROC) curve as the performance metric. Because our data are unbalanced, we used class weight balanced loss functions. The output of the SFS was the 15 most important variables $S_{k,sel}$ for classifier $F(\cdot)$. The (average) importance of each variable $a$ was measured using the following rank metric: $\hat{R}(a) = \frac{1}{10} \ldots$

where r(k, a) is the rank of variable a based on the data set k and #F is the size of the largest subset resulting from SFS [28] [29] [30]. In this study, #F = 15. A higher R(a) (rank score) indicates that variable a is more important according to SFS, because it was selected in the smaller variable subsets. That is, the variable has a higher capability to predict revision ESS. The optimal hyperparameters for classifier $F(\cdot)$ with variables $S_{m,sel}$ were identified using the grid-search method (step 2). The hyperparameter values for different classifiers and the summary statistics for the selected hyperparameter values appear in S5 File. Following the identification of the optimal hyperparameters for classifier $F(\cdot)$ using variables $S_{m,sel}$, the model was trained (step 3) and the performance was calculated using dataset $X_{test}$ (step 4).
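A rank score of this kind could be computed as in the sketch below. The scoring rule is an assumption: a variable selected at rank r in a run earns #F − r + 1 points (so the first selected variable earns 15 points, the next 14, and so on), unselected variables earn 0, and points are averaged over runs. The function name and the variable names are hypothetical.

```python
def rank_scores(runs, max_subset_size=15):
    """Average points per variable across SFS runs.

    runs: list of lists; each inner list holds variable names in the
    order SFS selected them (first selected = rank 1).
    Assumed scoring: rank r earns max_subset_size - r + 1 points and a
    variable that was not selected in a run earns 0 points.
    """
    totals = {}
    for selected in runs:
        for rank, name in enumerate(selected[:max_subset_size], start=1):
            totals[name] = totals.get(name, 0) + (max_subset_size - rank + 1)
    n_runs = len(runs)
    return {name: total / n_runs for name, total in totals.items()}


# Hypothetical selection orders from three runs (not data from the study).
runs = [
    ["visits_6m", "crswnp", "asthma"],
    ["crswnp", "visits_6m", "age"],
    ["visits_6m", "asthma", "crswnp"],
]
scores = rank_scores(runs)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Multiplying the averaged score by the number of runs gives the equivalent sum of points.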

We used the following standard performance metrics: AUROC, the area under the precision recall curve (AUPRC), precision, sensitivity, specificity and the F1 score. AUROC is one of the most widely used evaluation metrics for measuring the performance of a classification model. An AUROC of 0.5 indicates no discrimination above chance, while an AUROC of 1.0 indicates a perfect classification. As a rough guide to the classification ability of a model, AUROC = 0.9-1.0 indicates an excellent performance, AUROC = 0.8-0.9 indicates a good performance, AUROC = 0.7-0.8 indicates a fair performance and AUROC = 0.6-0.7 indicates a poor performance [31, 32]. AUPRC is an evaluation metric often used for imbalanced data sets. The baseline (no discrimination above chance) of AUPRC is equal to the fraction of positives. The baseline AUPRC of our study is 0.145, reflecting that 14.5% of the patients underwent revision ESS.
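To make the two baselines concrete, the following sketch computes AUROC as the probability that a randomly chosen positive case is scored above a randomly chosen negative case (ties counted as one half), and the AUPRC baseline as the fraction of positives. It is a plain-Python illustration, not the library routine used in the study, and the example scores are made up.

```python
def auroc(y_true, y_score):
    """AUROC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def auprc_baseline(y_true):
    """AUPRC of a classifier with no discrimination: the positive fraction."""
    return sum(y_true) / len(y_true)


# Made-up scores for four patients (1 = revision ESS).
example_auroc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Cohort sizes as reported in this study: 111 of 767 revised.
cohort_baseline = auprc_baseline([1] * 111 + [0] * 656)
```

A constant score for every patient yields an AUROC of 0.5, which matches the chance level described above.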

The baseline values for the AUROC and AUPRC metrics were confirmed for our data by training and testing the models using randomised label data (Table A in S3 File). Precision refers to the number of true positive results divided by the number of all positive results, including those not identified correctly. In this study, precision refers to the ability of a model to identify only revision patients. Sensitivity, by comparison, indicates the number of true positive results divided by the number of all samples that should have been identified as positive. In this study, sensitivity refers to the ability of the model to identify all of the revision patients. Specificity is the number of true negative results divided by the number of all samples that should have been identified as negative. In this study, specificity refers to the ability of the model to identify all patients not needing revision. Finally, the F1 score represents the harmonic mean of precision and sensitivity. Precision, sensitivity, specificity and F1 score are calculated using the following equations:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}, \qquad F1 = \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$$

where TP is the number of true positives predicted by the classifier, FP is the number of false positives, FN is the number of false negatives and TN is the number of true negatives.
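As a worked illustration of these definitions, the sketch below derives the four metrics from confusion-matrix counts. The counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, sensitivity, specificity and F1 from confusion counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for a test fold of 230 patients (33 revised).
metrics = classification_metrics(tp=20, fp=40, fn=13, tn=157)
```

With an imbalanced outcome such as this one, specificity can remain high while precision stays low, which is why the metrics are reported together.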

We used seven Python packages to implement the classifiers and compute the performance values and model interpretations: sklearn [33], xgboost [34], mlxtend [35], numpy [36], pandas [37], shap [25, 38] and pdpbox [39]. SFS was computed using the 'SequentialFeatureSelector' function in the mlxtend package. The logistic regression, random forest and gradient boosting classifiers were implemented using functions from the sklearn.linear_model, sklearn.ensemble and xgboost packages, respectively. The grid search for the hyperparameters was conducted using the 'GridSearchCV' function in the sklearn.model_selection package. We computed the Shapley values using the 'TreeExplainer' function in the shap package. Partial dependence plots (PDPs) were created using the 'pdp_isolate' function in the pdpbox package. The numpy and pandas packages were used for data reading and processing.
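The greedy loop behind sequential forward selection can be sketched in plain Python as follows. This is not the mlxtend implementation. The scoring function `toy_score` and its gains are hypothetical stand-ins for a cross-validated AUROC, and the variable names are illustrative.

```python
def sequential_forward_selection(candidates, score_fn, k):
    """Greedily add the variable that maximises score_fn until k are chosen."""
    selected = []
    remaining = list(candidates)
    history = []  # (subset, score) after each added variable
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda v: score_fn(selected + [v]))
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), score_fn(selected)))
    return selected, history


# Hypothetical additive gains; a real score would come from model validation.
GAIN = {"visits_6m": 0.10, "crswnp": 0.06, "asthma": 0.04, "sex": 0.00}


def toy_score(subset):
    return 0.5 + sum(GAIN[v] for v in subset)


selected, history = sequential_forward_selection(GAIN, toy_score, k=3)
```

The order in which variables enter `selected` is the rank that a rank score would then aggregate across runs.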

The CRS patient population that underwent baseline ESS (n = 767) included 448 (58%) females and ranged in age from 16 to 90 years. Table 2 summarises the patient characteristics and the proportion who did and did not undergo revision ESS. The following comorbidities were significantly associated with revision ESS during follow-up: doctor-diagnosed lung function test-confirmed asthma, CRSwNP, allergies, chronic respiratory disease, EHR text-based NERD and immunodeficiency or a suspicion of immunodeficiency. The following continuous variables were significantly associated with revision ESS: an older age, a shorter time from the baseline visit to baseline ESS, a higher frequency of visits between the baseline visit and baseline ESS and a higher number of visits from the baseline ESS to 3, 6 and 12 months postoperatively. Table 3 presents the results of the univariate logistic regression models to predict revision ESS following baseline ESS. Results represent the average of 10 reformulations from the training and test folds (Fig 3b).

The plots in Fig 4 show the AUROC values for the classifiers of random forest, logistic regression and gradient boosting as a function of the number of variables. We applied the SFS method to select variables collected from the baseline visit until six months after the baseline ESS. Results are reported as the averages from 10 reformulations from the training and test folds (see Fig 3a) . Table 6 presents the variables selected using SFS in order of the rank scores calculated using Eq 1. When using any of the three classifiers, the following variables resulted in high rank scores, indicating their importance as predictors of revision ESS: the number of visits six months after baseline ESS, CRSwNP, asthma and NERD. In addition, the frequency of visits from the baseline visit to baseline ESS and the number of visits before the baseline ESS emerged as important predictors.

The effect of the length of time for data collection on the model's ability to predict the risk of revision ESS was evaluated using the logistic regression classifier. Fig 5 presents values when the data collection time period was from the baseline visit to the baseline ESS or until 3, 6 or 12 months after baseline ESS. Table 7 summarises the AUROC, AUPRC, sensitivity, specificity and F1 score values. The highest performance (AUROC = 0.784) was, as expected, found in the model that included a 12-month follow-up period, because more information was available in that model compared with models using 3- or 6-month follow-up periods or no follow-up period at all. The sensitivity for the 12-month model reached 0.61,

Fig 4. Three models were used, with logistic regression, gradient boosting and random forest classifiers, for predicting revision ESS. AUROC, area under the receiver operating characteristic curve; ESS, endoscopic sinus surgery. https://doi.org/10.1371/journal.pone.0267146.g004

Table 4. AUROC and AUPRC values as a function of the number of variables for predicting revision ESS. We selected variables using the sequential forward selection (SFS) method. Three models were used and the classifiers in these models were the logistic regression (LR), gradient boosting (GB) and random forest (RF) for predicting revision ESS.

For the model interpretability analysis, we trained the gradient boosting classifier using the variables collected from the baseline visit until 6 months following baseline ESS, employing the data flow from Fig 3b. Fig 6 illustrates the variables sorted by the sum of the absolute Shapley values across all patients. The distributions of the data points on the plots show the impact of each variable on the classifier output. We found that a high number of visits after baseline ESS and a short time interval between the baseline visit and baseline ESS both increased the revision ESS risk. In addition, CRSwNP, asthma and allergies increased the revision ESS risk. The Shapley values revealed that patient age and the frequency of clinical visits from baseline visit to baseline ESS (that is, the time period from the baseline visit to the baseline ESS and the number of visits before baseline ESS) affected the revision ESS risk in a nonmonotonic manner. That is, the red values (values higher than average) of these variables are dispersed on both sides of the scale. Fig 7 shows the PDPs for the ten variables with the highest Shapley values. The PDP for the number of visits 6 months following baseline ESS revealed a wide risk score range, from a value of 0.1 for patients with fewer than two visits following baseline ESS up to a value of about 0.26 for patients with more than seven visits (Fig 7b). Similarly, if the patient had two or more postoperative visits within 3 months, the risk score for revision ESS increased (Fig 7g). The plot for the time between the baseline visit and baseline ESS revealed a sharp drop in the risk score after about 100 days (Fig 7f). When the time between the baseline visit and ESS was less than 100 days, the risk score was about 0.15. When the time increased to >500 days, the risk score decreased to <0.13. The PDP curve for age was nonmonotonic and the risk scores varied (Fig 7e). The risk score was 0.16 for patients aged 30 to 65 years. Furthermore, the effect of the number of visits between the baseline visit and baseline ESS was nonmonotonic. Patients with 10 to 25 visits between the baseline visit and baseline ESS exhibited a smaller risk for revision ESS than patients with fewer than 10 visits or patients with more than 25 visits (Fig 7j).

Table 5. Sensitivity and specificity values as a function of the number of variables for predicting revision ESS. We selected variables using the sequential forward selection (SFS) method. Three models were used and the classifiers in these models were the logistic regression (LR), gradient boosting (GB) and random forest (RF) for predicting revision ESS.
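The idea behind a partial dependence curve can be shown with a short sketch: one variable is forced to each grid value for every patient in turn, and the model predictions are averaged. This is a generic illustration, not the pdpbox routine used in the study. The model `toy_risk`, its coefficients and the patient rows are hypothetical and chosen only to produce a nonmonotonic age effect.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is set to each value in `grid`."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value  # override only the studied variable
            total += predict(modified)
        curve.append((value, total / len(rows)))
    return curve


# Hypothetical risk model with a nonmonotonic age effect (not the fitted model).
def toy_risk(row):
    age_effect = 0.03 if 30 <= row["age"] <= 65 else 0.0
    return 0.10 + age_effect + 0.02 * row["visits"]


rows = [
    {"age": 25, "visits": 1},
    {"age": 50, "visits": 3},
    {"age": 72, "visits": 2},
]
curve = partial_dependence(toy_risk, rows, "age", [20, 50, 80])
```

A curve that rises and then falls across the grid, as here, is what a linear model cannot represent for a single untransformed variable.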

We also detected a moderate correlation between the number of days from the baseline visit to the baseline ESS and the number of visits (p < 0.001, correlation coefficient r = 0.51 from the Pearson's linear correlation test). Yet, the correlation was weak between the number of days from the baseline visit to the baseline ESS and the following variables: age (r = 0.14), CRSwNP (r = 0.06), asthma (r = 0.19) and immunodeficiency (r = −0.00).

Table 6. The top ten variables in prediction capacity of revision ESS. Three models were used and the classifiers of the three models were the logistic regression, gradient boosting and random forest for predicting revision ESS. We used the sequential forward selection (SFS) method to select the top performing variables. In each SFS run, the best variable was awarded 15 points, the next best variable 14 points and so on. Ten runs were performed using each of the three classifiers. The rank score represents the sum of points (range, 0-150 points).
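For reference, Pearson's correlation coefficient for two variables can be computed as in the sketch below. The data are made up for illustration and do not reproduce the coefficients reported here.

```python
import math


def pearson_r(x, y):
    """Pearson's linear correlation coefficient between two sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Made-up values: days from baseline visit to baseline ESS, and visit counts.
days = [30, 90, 200, 400, 600]
visits = [2, 3, 4, 9, 8]
r = pearson_r(days, visits)
```

Because the coefficient only captures linear association, a nonmonotonic relationship can yield a value near zero even when the variables are clearly related.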

This study aimed to identify individual-level risk factors associated with revision ESS among CRS patients through the use of machine-learning algorithms. Personalised risk assessment is a process whereby an individual's level of risk is calculated using multiple predictors [40] . Personalised risk communication represents a process through which the results of an individual's risk assessment are tailored to their preferences and for specific uses [40] . In part, we identified previously unpublished important predictors of revision ESS, such as a high number of visits before and after baseline ESS as well as a short time interval between the baseline visit and baseline ESS. Our data also demonstrated that demographic variables such as age, Type 2 high diseases (CRSwNP, asthma and NERD) and immunodeficiency or a suspicion of it were important predictors of revision ESS at the individual level. These findings agree with previous observations at the population level [41] . In addition, our findings reinforce the importance of diagnostics and the management of NERD, nasal polyps, asthma and other comorbidities in preventing uncontrolled CRS. In terms of clinical implications, our findings may prove relevant to patient counselling, following up on and planning treatment, such as that of biological therapy [12] . However, validation studies for these results remain necessary. Personalised risk communication has previously proven effective in clinical decision-making, such as in COVID-19 diagnostics [42] , patient selection for cardiac resynchronisation therapy [43] and in organising follow-up for patients receiving adjuvant endocrine therapy [44] .

To our knowledge, machine learning models have not been previously used to predict revision ESS among CRS patients. Machine learning, however, has previously been used in allergology and related research [45], including in the prediction of persistent early childhood asthma [46], eosinophilic esophagitis [47], eosinophilic CRS [48] or osteomeatal complex inflammation [49]. In addition, machine learning has found applications in predicting postoperative outcomes for degenerative cervical myelopathy [50], revision surgery following knee replacement [51], prolonged opioid prescription following surgery for lumbar disc herniation [52], blood transfusion following adult spinal deformity surgery [53], surgical infections [54] and olfactory recovery after ESS [55]. None of these previous studies, however, have presented models designed to predict revision ESS at the individual level. Revision ESS risk has previously been studied at the population level relying instead on traditional statistical models such as Cox's proportional hazard models [13, 15, 16] or logistic regression models [14, 15, 18, 20]. Such studies have assumed associations are linear and that an alpha error <5% indicates the importance of a predictor.

We found that a greater number of visits, a higher frequency of visits and a shorter time period between the baseline visit and baseline ESS were all associated with revision ESS. This might reflect that patients with a high number of visits exhibit more uncontrolled CRS and thus may ultimately undergo revision ESS. In other words, our findings suggest that an increasing number of visits before ESS might signal more severe disease, which affects not just the physician's but also the patient's decision regarding ESS at baseline as well as the revision ESS during follow-up. These results indicate that patients who achieved control of disease following baseline ESS did not require further follow-up visits at tertiary care centres and were discharged from hospital follow-up. Patients with ongoing problems, however, tend to visit the clinic more frequently and exhibit a higher probability of ultimately undergoing revision ESS. We found little evidence in the literature on the predictive potential of visits at the individual level. A retrospective cohort study from the USA (n = 6985) revealed that the number of postoperative outpatient visits was associated with revision surgery for anterior cruciate ligament reconstructions [56], a finding similar to ours, albeit for a different type of surgery and at a population level. Our finding that patients with a higher frequency of visits at baseline exhibit a higher risk, one only partially controlled by surgery, might prove helpful when counselling patients.

Our study also showed that CRSwNP, asthma and NERD represent important predictors of revision ESS at the individual level. In accordance with this, previous studies demonstrated at the hospital population level that several factors associate with CRS recurrence and/or revision ESS, including CRSwNP, asthma, allergic rhinitis, NERD, eosinophilia and smoking [1, 13, 57, 58] . CRSwNP patients with a comorbidity of asthma and/or NERD carry an increased risk for recurrence and revision ESS, although these patients appear to benefit from an initial ESS [19, 41, [59] [60] [61] . This finding may reflect more severe disease, typically presenting with comorbidities for NERD, anosmia, Type 2 high eosinophilic inflammation and a higher likelihood of polyp regrowth [5, 57, [62] [63] [64] [65] [66] [67] [68] [69] [70] . In SFS, immunodeficiency or a suspicion of it also emerged as one of the top ten predictors in all three classifiers. Immunodeficiency increases the risk of infectious exacerbations and uncontrolled CRS, thereby also increasing the risk of revision ESS. This agrees with a previous study that demonstrated that at the hospital population level immunodeficiency and granulomatosis with polyangiitis increase the revision ESS risk [71] . While the variable 'suspicion of immunodeficiency' is not the same as a diagnosed immunodeficiency, it might indirectly reflect a similar situation regarding poor CRS control, leaving a physician to suspect a rare comorbidity or allowing consideration for the need of revision ESS.

We also demonstrated that a longer EHR data collection period increased the predictive accuracy of the models. The time period for data collection from the baseline visit until 12 months following the baseline ESS carried the highest predictive accuracy in our models. Choosing the data collection interval for the model is therefore a trade-off between the time required following baseline ESS and model accuracy.

We validated the predictive accuracy using three classifiers. To do so, we chose to use logistic regression, gradient boosting and random forest classifiers since they possess different properties and generally have been used in predicting surgical outcomes [50, 72] or persistent asthma [46]. The logistic regression classifier is linear and thus incapable of modelling possible nonmonotonic and nonlinear relationships between predictors and outcomes [73]. The random forest and gradient boosting classifiers can model complex relationships, but they represent so-called black box models, meaning that they are uninterpretable classifiers in which the relationships between inputs and results are difficult to interpret directly from the parameters or the structure of the trained model [73]. Since the predictive accuracy was similar across the three classifiers in our study, we used logistic regression primarily to validate the variable collection time period. Overall, our findings indicate the importance of validating outcome prediction using different classifiers and evaluating the effect of the data collection time period, as suggested in previous studies [74, 75]. By evaluating different classifiers, we found that a simple and interpretable logistic regression model may prove adequate for clinical application. However, if modelling requires nonlinear relationships, then random forest or gradient boosting models can be used. Classification performance proved comparable across all classifiers.

Revision ESS risk was previously studied at the population level using Cox's proportional hazard [13, 15, 16] or logistic regression [14, 15, 18, 20] models, which usually assume associations are linear and that an alpha error <5% indicates the importance of a predictor. Using these assumptions, previous studies have demonstrated that a younger age was associated with revision ESS [13, 66]. We found that age actually affects revision ESS risk in a nonmonotonic manner, suggesting that machine learning improves the prediction potential of age in revision ESS risk. Similarly, nonlinear approaches have significantly improved the prediction of stroke risk [76].

Both our own and previous study groups have examined populations of CRSwNP [66] or CRS [13] patients. Because age affected revision ESS risk in a nonmonotonic manner in our study, logistic regression models appear less than ideal for examining the impact of the individual patient's age on revision ESS risk. By performing partial dependence plot analyses, we showed that the revision ESS risk was highest for patients aged 30 to 70 years, and medium high for patients older than 70 years, whereas the risk was lowest among patients aged 16 to 30 years. Younger patients have CRSwNP less often, or CRSwNP among such patients often comprises antrochoanal polyps, which carry a smaller revision surgery risk [1].

Furthermore, the number of visits before baseline ESS carried nonlinear effects as a predictor in our study. Patients logging 10 to 25 clinical visits between the baseline visit and baseline ESS exhibited a lower risk for revision ESS than patients with fewer than 10 or more than 25 clinic visits. Those patients visiting the clinic 10 to 25 times before baseline ESS may have CRSsNP with acute recurrent exacerbations. However, this subgroup warrants further study in order to confirm this assumption, since the number of subjects in our study was rather small. We can speculate that some physicians may schedule more frequent follow-up visits even with sufficient disease control. That said, consistent practices have been employed in our hospital, the clinical visit frequencies are closely monitored and routine follow-up visits are not booked. Thus we argue that a visit frequency ≥2 per year reflects relatively poor disease control. Previous studies found that CRSwNP patients with recurrent acute rhinosinusitis episodes benefit from an initial ESS [1]. Previous studies on other conditions and on other predictors revealed a U-shaped association between the predictor variable and outcome, including associations between intraoperative net fluid balance and early atrial tachyarrhythmia recurrence [77] as well as between body mass index and asthma in Japanese children [78]. These findings highlight the importance of evaluating the linearity of associations to improve their personalised predictive value.
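One simple way to check for such a nonlinear pattern before modelling is to compare observed outcome rates across bins of the predictor. The sketch below does this for visit counts. The bin edges follow the 10 and 25 visit thresholds mentioned above, but the records are invented for illustration and the helpers `visit_bin` and `rate_by_bin` are hypothetical.

```python
from collections import defaultdict

def visit_bin(n_visits):
    """Map a visit count to the three ranges discussed in the text."""
    if n_visits < 10:
        return "<10"
    if n_visits <= 25:
        return "10-25"
    return ">25"

def rate_by_bin(records):
    """Return {bin: (events, total, rate)} for (n_visits, revision) pairs."""
    counts = defaultdict(lambda: [0, 0])
    for n_visits, revision in records:
        entry = counts[visit_bin(n_visits)]
        entry[0] += revision
        entry[1] += 1
    return {b: (e, n, e / n) for b, (e, n) in counts.items()}

# Invented records (assumption): (visits before baseline ESS, revision 0/1).
records = [
    (2, 1), (4, 0), (6, 0), (8, 1),
    (11, 0), (15, 0), (19, 1), (24, 0),
    (27, 1), (30, 1), (33, 0), (41, 1),
]
rates = rate_by_bin(records)
for label in ("<10", "10-25", ">25"):
    events, total, rate = rates[label]
    print(f"{label} visits: {events}/{total} = {rate:.2f}")
```

A lower rate in the middle bin than in both outer bins, as in this invented data, is the kind of pattern that a single linear term cannot represent.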

The strengths of this study include the random sample of hospital patients, the long follow-up time period we captured and the discovery of nonlinear associations between certain variables and outcomes. In addition, the novelty of this study lies in the validation of models employing several classifiers, which were also tested at the individual level.

We should also mention several limitations to our study, which include changes that occurred in ESS and CRS care during the sampling time period. To minimise the impact of any possible chronological or seasonal bias, we spread the sampling time over several baseline years (2005, 2007, 2009, 2011 and 2013) and each month during the baseline year. Patients with recurrence may have sought treatment elsewhere, although this potential bias was minimal since over 90% of ESS are performed in public healthcare settings [79]. In this study, we were authorised to extract data from a relatively small number of patients. However, this limitation was addressed by using cross-validation methods. Unfortunately, we did not process time series variables. Thus, recurrent neural network type models such as long short-term memory (LSTM) or bidirectional LSTM could not be used to predict revision ESS risk. EHR data have been available in our hospital since 2005; consequently, we acknowledge that the baseline ESS does not always indicate the first ESS. As such, we lacked data for possible earlier ESS, which we have previously shown to affect the revision ESS risk on a population level [13]. Furthermore, data were lacking for some other important factors, such as postoperative treatment, validated symptoms, endoscopic nasal polyp score, medication, the Lund-Mackay score for sinus computed tomography scans, smoking status, eosinophils and the extent of baseline ESS. Yet, some of these variables, such as smoking [13] or total ethmoidectomy, have not emerged as strong predictors of revision ESS compared with Type 2 high diseases [57] in our previous studies. That said, we acknowledge that the inclusion of more variables and additional cases would most likely improve our estimates. Therefore, before extrapolating our results to clinical practice, replication studies in other populations and with additional variables are needed.

Our results indicate that Type 2-high conditions (CRSwNP, asthma and NERD), a high clinical visit frequency, a short time interval between the baseline clinic visit and ESS, and immunodeficiency or a suspicion of it increase the likelihood of revision ESS at the individual level. Moreover, age and the number of preoperative clinical visits predict revision ESS risk in a nonlinear manner. Although these findings require validation in other populations, our results reinforce the importance of diagnostics and the management of NERD, CRSwNP, asthma and other comorbidities to prevent uncontrolled CRS, and are particularly relevant for patient counselling.

Supporting information

S1 File. Procedure and ICD-10 codes. List of procedure and ICD-10 codes that were used for identifying ESS patients and chronic diseases.

European Position Paper on Rhinosinusitis and Nasal Polyps 2020

Multidimensional endotypes of chronic rhinosinusitis and their association with treatment outcomes

Multivariate analysis of inflammatory endotypes in recurrent nasal polyposis in a Chinese population

Diagnosis and management of NSAID-Exacerbated Respiratory Disease (N-ERD)-a EAACI position paper

Factors affecting upper airway control of NSAID-exacerbated respiratory disease: A real-world study of 167 patients. Immunity, Inflammation and Disease

Risk Factors of Severe Adult-onset Asthma: A Multi-factor Approach

Economic evaluation of endoscopic sinus surgery versus continued medical therapy for refractory chronic rhinosinusitis. The Laryngoscope

Can FESS Combined with Submucosal Resection (SMR)/Septoplasty Reduce Revision Rate? Otolaryngology-Head and Neck Surgery

The national comparative audit of surgery for nasal polyposis and chronic rhinosinusitis

Cost-effectiveness and comparative effectiveness of biologic therapy for asthma: To biologic or not to biologic

Predicting difficult-to-treat chronic rhinosinusitis by noninvasive biological markers

Monoclonal Antibodies and Airway Diseases

Factors affecting revision rate of chronic rhinosinusitis

Using postoperative SNOT-22 to help predict the probability of revision sinus surgery

Revision rates and time to revision following endoscopic sinus surgery: A large database analysis

Long-term revision rates for endoscopic sinus surgery

Predicting revision sinus surgery in allergic fungal and eosinophilic mucin chronic rhinosinusitis. The Laryngoscope

Factors impacting revision surgery in patients with chronic rhinosinusitis with nasal polyposis

Revision surgery rates in chronic rhinosinusitis with nasal polyps: meta-analysis of risk factors

Endoscopic Sinus Surgery for Chronic Rhinosinusitis with Nasal Polyps: Clinical Outcome and Predictive Factors of Recurrence

Escalation in mucus cystatin 2, pappalysin-A, and periostin levels over time predict need for recurrent surgery in chronic rhinosinusitis with nasal polyps

Outcomes of modified endoscopic Lothrop in aspirin-exacerbated respiratory disease with nasal polyposis

Data-driven advice for applying machine learning to bioinformatics problems

A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models

From local explanations to global understanding with explainable AI for trees

Evaluation: from Precision, Recall and F-measure to ROC, Informedness, Markedness and Correlation

The Elements of Statistical Learning

Stochastic Stepwise Ensembles for Variable Selection

Opinion versus practice regarding the use of rehabilitation services in home care: an investigation using machine learning algorithms

Prediction of adverse cardiac events in emergency department patients with chest pain using machine learning for variable selection

External validation of two prediction models identifying employees at risk of high sickness absence: Cohort study with 1-year follow-up

A readers' guide to the interpretation of diagnostic test properties: clinical example of sepsis

Scikit-learn: Machine Learning in Python

A Scalable Tree Boosting System

The NumPy Array: A Structure for Efficient Numerical Computation

Data Structures for Statistical Computing in Python

A Unified Approach to Interpreting Model Predictions

PDPbox: python partial dependence plot toolbox

Personalized risk communication for personalized risk assessment: Real world assessment of knowledge and motivation for six mortality risk measures from an online life expectancy calculator. Informatics for Health and Social Care

Revision Rates after Endoscopic Sinus Surgery: A Recurrence Analysis

Covid-19 Automated Diagnosis and Risk Assessment through Metabolomics and Machine Learning

Can machine learning improve patient selection for cardiac resynchronization therapy?

Supervised Machine Learning to Predict Follow-Up Among Adjuvant Endocrine Therapy Patients

Natural language processing to advance EHR-based clinical research in Allergy

Personalized prediction of early childhood asthma persistence: A machine learning approach

An algorithm for the classification of mRNA patterns in eosinophilic esophagitis: Integration of machine learning

Machine learning of biomarkers and clinical observation to predict eosinophilic chronic rhinosinusitis: a pilot study

Automated classification of osteomeatal complex inflammation on computed tomography using convolutional neural networks

Using a machine learning approach to predict outcome after surgery for degenerative cervical myelopathy

Estimating an Individual's Probability of Revision Surgery After Knee Replacement: A Comparison of Modeling Approaches Using a National Data Set

Development of machine learning algorithms for prediction of prolonged opioid prescription after surgery for lumbar disc herniation

Predictive Modeling for Blood Transfusion After Adult Spinal Deformity Surgery

Predicting the occurrence of surgical site infections using text mining and machine learning

Biometric predictive models for the evaluation of olfactory recovery after endoscopic sinus surgery in patients with nasal polyposis

Type and frequency of healthcare encounters can predict poor surgical outcomes in anterior cruciate ligament reconstruction patients

Factors Affecting the Control of Chronic Rhinosinusitis With Nasal Polyps: A Comparison in Patients With or Without NERD

Assessing Cut-off Points of Eosinophils, Nasal Polyp, and Lund-Mackay Scores to Predict Surgery in Nasal Polyposis: A Real-World Study

Long-term outcomes of different endoscopic sinus surgery in recurrent chronic rhinosinusitis with nasal polyps and asthma

Real-life study showing uncontrolled rhinosinusitis after sinus surgery in a tertiary referral centre

High Discontinuation Rates of Peroral ASA Treatment for CRSwNP: A Real-World Multicenter Study of 171 N-ERD Patients

Eosinophils and Mast Cells in Aspirin-Exacerbated Respiratory Disease

The prognostic role of serum eosinophil and basophil levels in sinonasal polyposis

Predictive Significance of Tissue Eosinophilia for Nasal Polyp Recurrence in the Chinese Population

Are neutrophil-, eosinophil-, and basophil-to-lymphocyte ratios useful markers for pinpointing patients at higher risk of recurrent sinonasal polyps?

A prospective investigation of predictive parameters for post-surgical recurrences in sinonasal polyposis

Prediction models for postoperative uncontrolled chronic rhinosinusitis in daily practice

The Importance of Local Eosinophilia in the Surgical Outcome of Chronic Rhinosinusitis: A 3-Year Prospective Observational Study

Mucosal eosinophilia and recurrence of nasal polyps-new classification of chronic rhinosinusitis

Subclassification of chronic rhinosinusitis with nasal polyp based on eosinophil and neutrophil. The Laryngoscope

Revision endoscopic sinus surgery rates by chronic rhinosinusitis subtype

Deep Learning for Improved Risk Prediction in Surgical Outcomes

The Elements of Statistical Learning

Development and validation of classifiers and variable subsets for predicting nursing home admission. BMC Med Inform Decis Mak

Comparison of machine learning classifiers for differentiation of grade 1 from higher gradings in meningioma: A multicenter radiomics study

Machine learning provides evidence that stroke risk is not linear: The non-linear Framingham stroke risk score

U-Shaped Association Between Intraoperative Net Fluid Balance and Risk of Postoperative Recurrent Atrial Tachyarrhythmia Among Patients Undergoing the Cryo-Maze Procedure: An Observational Study

U-Shaped Association between Body Mass Index and the Prevalence of Wheeze and Asthma, but not Eczema or Rhinoconjunctivitis: The Ryukyus Child Health Study. The Journal of asthma: official journal of the Association for the Care of

Regional differences in endoscopic sinus surgery in Finland: A nationwide register-based study

Open access funded by Helsinki University Library.