key: cord-0030945-a2tbljn3 authors: Li, Wenle; Liu, Yafeng; Liu, Wencai; Tang, Zhi-Ri; Dong, Shengtao; Li, Wanying; Zhang, Kai; Xu, Chan; Hu, Zhaohui; Wang, Haosheng; Lei, Zhi; Liu, Qiang; Guo, Chunxue; Yin, Chengliang title: Machine Learning-Based Prediction of Lymph Node Metastasis Among Osteosarcoma Patients date: 2022-04-20 journal: Front Oncol DOI: 10.3389/fonc.2022.797103 sha: d7d83ebf4e27e7ae6690fd7d7df2d15879a34a37 doc_id: 30945 cord_uid: a2tbljn3

BACKGROUND: Regional lymph node metastasis contributes to poor prognosis in osteosarcoma. However, studies on risk factors for predicting regional lymph node metastasis in osteosarcoma are scarce. This study aimed to develop and validate a prediction model based on machine learning (ML) algorithms.

METHODS: A total of 1201 patients were included: 1094 cases from the Surveillance, Epidemiology, and End Results (SEER) database (the training set) and 107 cases admitted to four medical centers in China (the external validation set). Independent risk factors for lymph node metastasis were screened with multivariate logistic regression models. Six ML algorithms, including logistic regression (LR), the gradient boosting machine (GBM), extreme gradient boosting (XGBoost), random forest (RF), decision tree (DT), and multilayer perceptron (MLP), were used to evaluate the risk of lymph node metastasis. The prediction model was built on the best-performing ML algorithm, and its performance was evaluated by the area under the curve (AUC), prediction accuracy, sensitivity, and specificity. An online calculator was built to estimate the probability of lymph node metastasis in individual patients.

RESULTS: Of all included patients, 9.41% (113/1201) developed regional lymph node metastasis.
ML prediction models were developed based on nine variables: age, tumor (T) stage, metastasis (M) stage, laterality, surgery, radiation, chemotherapy, bone metastases, and lung metastases. In multivariate logistic regression analysis, T stage, M stage, surgery, and chemotherapy were significantly associated with lymph node metastasis. Among the six ML algorithms, XGBoost (XGB) had the highest AUC (0.882) and was used to develop the prediction model. An online calculator based on this model estimates the probability of lymph node metastasis in individual patients.

CONCLUSIONS: T stage, M stage, surgery, and chemotherapy are independent risk factors for lymph node metastasis among osteosarcoma patients. The XGB algorithm has the best predictive performance, and the online risk calculator can help clinicians estimate the probability of lymph node metastasis in osteosarcoma patients.

Lymph node metastasis is one of the most common forms of metastasis and has important prognostic implications for many types of cancer (1, 2). Metastatic cells originating from the primary cancer spread through the blood and lymphatic systems, and lymph nodes are usually the first site to develop metastases. In various tumor types, the presence of metastatic tumor cells in regional lymph nodes is an indicator of poor prognosis (3, 4). Osteosarcoma, also called osteogenic sarcoma, is a common primary malignant bone tumor that is highly aggressive (5). Patients with osteosarcoma often have a poor prognosis because of metastases, with the lung being the most common site (6). Patients may also develop lymph node metastasis in the ipsilateral or contralateral limb (7). The reported incidence of lymph node metastasis in osteosarcoma is 1.4% to 2.3%. Although this incidence is low, lymph node metastasis remains a serious concern because it is significantly associated with reduced 5-year survival among osteosarcoma patients (8).
Thampi et al. (9) found that the 5-year overall survival rates for patients with and without regional lymph node involvement were 10.9% and 54.3%, respectively, and that regional lymph node involvement was an independent predictor of poorer survival. These results suggest that lymph node metastasis in osteosarcoma deserves greater attention in clinical practice and that a predictive model to stratify this risk in advance is particularly important. Machine learning, a branch of artificial intelligence, is increasingly used in healthcare data analysis and is a powerful tool for improving clinical strategies (10-14). Machine learning algorithms can learn automatically from input data to predict output values within acceptable accuracy and to identify patterns and trends in the data (11-17). Therefore, this study aimed to develop machine learning models that use clinical features to predict the risk of lymph node metastasis among osteosarcoma patients, so that individualized prevention strategies can be proposed to help clinicians make therapeutic decisions. We hypothesized that an optimal model could be developed from significant clinical features using machine learning.

This was a retrospective analysis of the SEER (Surveillance, Epidemiology, and End Results) database and of data from patients admitted to the Second Affiliated Hospital of Jilin University, the Second Affiliated Hospital of Dalian Medical University, the Liuzhou People's Hospital affiliated to Guangxi Medical University, and the Xianyang Central Hospital. SEER is an authoritative source for cancer statistics in the United States, and it provides information on cancer statistics in an effort to reduce the cancer burden among the U.S. population.
Patients were included if they had (1) pathologically confirmed primary osteosarcoma, (2)

The cohort from the SEER database formed the training group, and the cohort from the four medical centers formed the validation group. We compared the pathological characteristics of the training and validation groups and analyzed risk factors for lymph node metastasis by univariate analysis. Subsequently, multivariate logistic regression analysis was used to evaluate each variable and to identify independent predictors of lymph node metastasis. The independent predictors were entered into six machine learning algorithms, and the AUC was calculated to identify the best-performing model. In addition, a web-based calculator was built to estimate the probability of lymph node metastasis in individual patients. The training group was extracted from the SEER database using SEER*Stat software (version 8.3.6).

All analyses were performed using R software (version 3.6.0). Continuous variables are presented as medians with interquartile ranges (IQR), and categorical variables as counts with proportions. Differences between two groups were compared with the Wilcoxon rank-sum test for continuous variables and with the chi-squared test or Fisher's exact test for categorical variables. Logistic regression analysis was used to analyze the relationship between predictor variables (categorical or continuous) and the binary outcome. Six ML algorithms, including logistic regression (LR), the gradient boosting machine (GBM), extreme gradient boosting (XGBoost), random forest (RF), decision tree (DT), and multilayer perceptron (MLP), were used to evaluate the risk of lymph node metastasis.
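To make the model-comparison step concrete, the following minimal Python sketch computes the AUC, sensitivity, specificity, and accuracy for two candidate models and keeps the one with the higher AUC. It is only an illustration under stated assumptions: the study itself used R, and the function names, the toy labels, the predicted probabilities, and the 0.5 decision threshold are all hypothetical rather than taken from the paper.

```python
from typing import Dict, Sequence


def auc_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outranks a random negative one."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Count concordant (positive, negative) pairs; tied scores count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy at a fixed probability cut-off."""
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if y == 1 and p >= threshold)
    fn = sum(1 for y, p in pairs if y == 1 and p < threshold)
    tn = sum(1 for y, p in pairs if y == 0 and p < threshold)
    fp = sum(1 for y, p in pairs if y == 0 and p >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both outcome classes must be present")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical toy data: 1 = lymph node metastasis, 0 = none.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
# Hypothetical predicted probabilities from two candidate models.
candidates = {
    "model_a": [0.05, 0.10, 0.20, 0.40, 0.35, 0.60, 0.70, 0.90],
    "model_b": [0.30, 0.20, 0.60, 0.10, 0.40, 0.50, 0.45, 0.55],
}

aucs = {name: auc_score(y_true, probs) for name, probs in candidates.items()}
best = max(aucs, key=aucs.get)  # keep the model with the highest AUC
metrics = threshold_metrics(y_true, candidates[best])
print(best, round(aucs[best], 3), metrics)
```

The AUC is threshold-free, whereas sensitivity, specificity, and accuracy depend on the chosen cut-off, which is one reason to rank models by AUC first and report the threshold-based metrics for the selected model.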
The prediction model was built on the best-performing ML algorithm, and its performance was evaluated by the area under the curve (AUC), prediction accuracy, sensitivity, and specificity. Two-sided P values < 0.05 were considered statistically significant.

The 1201 patients were divided into two groups according to the presence of lymph node metastases. The differences between the two groups (lymph node metastases vs. no lymph node metastases) in age (P=0.01), T stage (P<0.001), M stage (P<0.001), surgery (P<0.001), chemotherapy (P=0.005), bone metastases (P<0.001), and survival time (P<0.001) were statistically significant (Table 1). Lymph node metastases were less frequent among patients who underwent surgery than among those who did not (Table 1).

The models were trained on the training group with 10-fold cross-validation: the data set was divided into 10 parts, of which 9 were used for training and 1 for testing on a rotating basis, and performance was averaged over the 10 rounds. The results (Figure 1) showed that the XGB model best predicted the risk of lymph node metastasis in osteosarcoma, with an AUC of 0.882. External validation (Figure 2) also showed that the XGB model performed best, with an AUC of 0.874, a sensitivity of 0.750, a specificity of 0.868, and an accuracy of 0.851 (Table 3). Therefore, the XGB model was selected as the final prediction model in this study. The relative importance of the variables in each ML algorithm for predicting lymph node metastasis in osteosarcoma is shown in Figure 3. Although the importance of the variables varied slightly across algorithms, T stage, M stage, age, surgery, and chemotherapy generally ranked in the top five.
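The 10-fold cross-validation scheme described in this section can be sketched in a few lines of Python. This is a hedged illustration, not the authors' R code: the one-feature threshold classifier is a deliberately simple stand-in for the six ML algorithms, and the synthetic data, function names, and random seed are assumptions made for the example.

```python
import random
from statistics import mean
from typing import Iterator, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs so that each sample is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal parts
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def fit_threshold(x: Sequence[float], y: Sequence[int]) -> float:
    """Stand-in model: a cut-off halfway between the two class means of one feature."""
    pos = [v for v, t in zip(x, y) if t == 1]
    neg = [v for v, t in zip(x, y) if t == 0]
    return (mean(pos) + mean(neg)) / 2


def accuracy(cutoff: float, x: Sequence[float], y: Sequence[int]) -> float:
    """Fraction of cases where 'value >= cutoff' agrees with the true label."""
    return mean(int((v >= cutoff) == bool(t)) for v, t in zip(x, y))


def cross_validate(x: Sequence[float], y: Sequence[int], k: int = 10, seed: int = 0):
    """Train on k-1 folds, test on the held-out fold, and average over the k rounds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(x), k, seed):
        cutoff = fit_threshold([x[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(accuracy(cutoff, [x[i] for i in test_idx], [y[i] for i in test_idx]))
    return mean(scores), scores


# Synthetic example: one feature that separates the two outcome classes.
x = [float(i) for i in range(40)]
y = [1 if v >= 20 else 0 for v in x]
avg, scores = cross_validate(x, y, k=10)
print(f"mean accuracy over {len(scores)} folds: {avg:.3f}")
```

Because every sample is held out exactly once, the averaged score estimates performance on unseen data from the training cohort alone, which is why the separate external validation cohort can be reserved for the final check.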
Radiation, bone metastases, and lung metastases contributed little to predicting the risk of lymph node metastasis in osteosarcoma. In the XGB model, which performed best, the variables ranked in descending order of importance as follows: age, T stage, laterality, M stage, surgery, chemotherapy, lung metastases, radiation, and bone metastases. An online calculator based on the best model (the XGBoost algorithm) was developed to predict the risk of lymph node metastasis in osteosarcoma patients (Figure 4). The calculator is easy to use in clinical practice: the user simply enters the patient's clinical characteristics and laboratory data. It is available at https://share.streamlit.io/liuwencai123/os_lnm/main/os_lnm.py.

Osteosarcoma, Ewing sarcoma, and chondrosarcoma are the three most common types of malignant bone tumors. Of the three, osteosarcoma and Ewing sarcoma occur more frequently in childhood, whereas chondrosarcoma is more common in the elderly (18). Pathology is the gold standard for the diagnosis of osteosarcoma, but clinical experience is required to determine the presence of tumor cells, and variation in that experience can influence pathologic judgments (19). Prognosis differs between patients with different subtypes of osteosarcoma, with a higher overall survival rate for classic osteosarcoma and a lower survival rate for the small cell and telangiectatic types (20). Staging is important in the development of treatment plans. More than 90% of typical osteosarcomas have eroded the bone cortex and invaded soft tissues at the time of consultation (21) and are classified as extracompartmental stage IIB, whereas those with pulmonary metastases are classified as stage III.
The most common site of metastasis in osteosarcoma is the lung (22), but pulmonary metastases have a smaller negative prognostic effect than metastases at other sites (23). Although the incidence of lymph node metastasis in osteosarcoma is low, patients with lymph node metastases have a poorer prognosis than those without (9). An animal study showed that the median survival time of dogs without lymph node metastases (318 days; range, 20 to 1,711 days) was significantly longer than that of dogs with lymph node metastases (59 days; range, 19 to 365 days) (24), indicating that lymph node metastasis is an unfavorable prognostic factor. Therefore, early identification of the risk of lymph node metastasis in patients with osteosarcoma is clinically important and will help clinicians take timely measures to optimize treatment.

This study developed and validated several models using popular machine learning algorithms to predict the risk of lymph node metastasis among osteosarcoma patients. The logistic regression analysis showed that T stage, M stage, surgery, and chemotherapy were independent risk factors for lymph node metastasis. After comparing the performance of the six ML algorithms, we found that the XGB algorithm performed best (AUC=0.882). To make the model easier to apply, an online web calculator was further developed for assessing the individual probability of lymph node metastasis in patients with osteosarcoma. Age, T stage, laterality, M stage, and surgery were identified by the ML algorithm as the most important predictors of lymph node metastasis in patients with osteosarcoma.
The results of this study showed that older patients with osteosarcoma had a greater risk of lymph node metastasis, which may be related to the bimodal age distribution of osteosarcoma (25). T stage and M stage, as indicators of the biological progression of the tumor, are positively correlated with lymph node metastasis in many tumor types (26). Surgery, another of the more important variables, may promote metastatic dissemination of tumor cells through invasive procedures such as puncture or through intraoperative injury to tissues such as blood vessels (27). In a study by Dong et al. (28), gender, primary tumor site, tumor type, and tumor size were identified as independent risk factors for lymph node metastasis in osteosarcoma by univariate and multivariate analysis, and age, race, distant metastasis, tumor type, and surgical treatment were shown by multivariate Cox regression analysis to be prognostic factors affecting overall survival in patients with lymph node metastasis in osteosarcoma. However, despite the significance of those results, no external validation was performed, and the model may have been over-fitted. In a large population-based cohort study (29), the presence of bone metastases (OR 8.73; 95% CI: 4.37-17.48) or brain metastases (OR 25.63; 95% CI: 1.55-422.86) in patients with osteosarcoma was significantly associated with the occurrence of pulmonary metastases. This suggests that our subsequent work should consider, as a candidate predictor, metastases at other sites that are present before the patient develops lymph node metastasis.
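The odds ratios quoted above from the cited cohort study can be related back to logistic-regression coefficients with a short calculation. The sketch below assumes, without confirmation from the source, that the reported intervals are Wald intervals, which are symmetric on the log-odds scale; under that assumption the geometric mean of the interval bounds should recover the odds ratio, and the interval width gives the standard error. The function names are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta: float, se: float, z: float = Z_95):
    """Odds ratio and Wald interval from a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_ci(lower: float, upper: float, z: float = Z_95):
    """Recover (beta, se) on the log-odds scale from a reported Wald interval."""
    beta = (math.log(lower) + math.log(upper)) / 2   # midpoint on the log scale
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Values quoted in the text: (OR, lower bound, upper bound).
reported = {
    "bone metastases": (8.73, 4.37, 17.48),
    "brain metastases": (25.63, 1.55, 422.86),
}

recovered = {}
for site, (or_reported, lower, upper) in reported.items():
    beta, se = coef_from_ci(lower, upper)
    or_back, lower_back, upper_back = odds_ratio_ci(beta, se)
    recovered[site] = (or_back, lower_back, upper_back, se)
    print(f"{site}: reported OR {or_reported}, recovered OR {or_back:.2f}, SE(log OR) {se:.3f}")
```

The much larger standard error recovered for brain metastases reflects how wide that interval is, so the point estimate of 25.63 should be read with correspondingly more caution than the estimate for bone metastases.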
The advantage of the present study is that ML algorithms were used to develop a prediction model that assesses the risk of lymph node metastasis in patients with osteosarcoma from readily available clinical data. The model was validated in the external validation group and showed strong predictive power and some advantages over the linear models used in previous studies. The inclusion of different ethnic groups in the training and validation groups also supports the generalizability of the model. Finally, to make the prediction model more convenient for clinical use, an online application based on the model was created, which allows clinicians to predict the risk of lymph node metastasis in patients with osteosarcoma from the clinical characteristics available.

This study has some limitations. First, it was retrospective, so some selection bias may exist. Second, the results demonstrate only an association between the risk factors and lymph node metastasis and cannot establish causality, because the original data in the SEER database carry no chronological sequence. Therefore, further investigation is needed.

This study developed and validated ML algorithms for individualized prediction of whether a patient with osteosarcoma will develop lymph node metastasis by using readily available clinical features. T stage, M stage, surgery, and chemotherapy are independent risk factors for lymph node metastasis among osteosarcoma patients.
Among the six ML algorithms, the XGB algorithm has the best predictive performance, and the online risk calculator built on it can help clinicians estimate the probability of lymph node metastasis among osteosarcoma patients.

The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding authors.

The SEER database is a comprehensive, population-based data source that has been updated annually since its launch in 1973. Its data are publicly accessible and de-identified, so their analysis is treated as non-human-subjects research by the Office for Human Research Protections. As such, neither institutional review board approval nor informed consent was required. For the multicenter data, the study was approved by the ethics review committees of the four medical institutions in China (the Second Affiliated Hospital of Jilin University, the Second Affiliated Hospital of Dalian Medical University, Liuzhou People's Hospital, and Xianyang Central Hospital) (No. 2021-00-22) and was conducted in accordance with the Declaration of Helsinki.

CLY, CXG, and QL designed the study. WCL conducted the research and collected and analyzed the data. WLL and YFL performed the statistical analysis and drafted the manuscript. ZRT and STD provided expert consultation and suggestions. ALL conceived the study, participated in its design and coordination, and helped shape the language. All authors contributed to the article and approved the submitted version.
The Construction and Development of a Clinical Prediction Model to Assess Lymph Node Metastases in Osteosarcoma
Prognostic Significance of Metastatic Lymph Node Ratio: The Lymph Node Ratio Could be a Prognostic Indicator for Patients With Gastric Cancer
Unexpected Contribution of Lymphatic Vessels to Promotion of Distant Metastatic Tumor Spread
Propensity-Matched Analysis of Clinical Relevance of the Highest Mediastinal Lymph Node Metastasis
Anlotinib, a Novel Small Molecular Tyrosine Kinase Inhibitor, Suppresses Growth and Metastasis via Dual Blockade of VEGFR2 and MET in Osteosarcoma
Epidemiology and Risk Factors of Osteosarcoma
Distal Fibula Reconstruction in Primary Malignant Tumours
Adverse Impact of Regional Lymph Node Involvement in Osteosarcoma
Multiparametric Ultrasomics of Significant Liver Fibrosis: A Machine Learning-Based Analysis
On the Design of Blockchain-Based ECDSA With Fault-Tolerant Batch Verification Protocol for Blockchain-Enabled IoMT
Blockchain and PUF-Based Lightweight Authentication Protocol for Wireless Medical Sensor Networks
Classification of COVID-19 Individuals Using Adaptive Neuro-Fuzzy Inference System
Socio-Technological Factors Affecting User's Adoption of Ehealth Functionalities: A Case Study of China and Ukraine Ehealth Systems
Secure-Enhanced Federated Learning for Ai-Empowered Electric Vehicle Energy Prediction
COVID-19 Patient Health Prediction Using Boosted Random Forest Algorithm
When Machine Learning Meets Medical World: Current Status and Future Challenges
Allografts in Malignant Bone
Negative CT Contrast Agents for the Diagnosis of Malignant Osteosarcoma
Evaluation of Clinical and Histopathologic Prognostic Factors for Survival in Canine Osteosarcoma of the Extracranial Flat and Irregular Bones
Establishment and Characterization of Novel Patient-Derived Extraskeletal Osteosarcoma Cell Line NCC-ESOS1-C1
Establishment and Characterization of a Highly Metastatic Human Osteosarcoma Cell Line From Osteosarcoma Lung Metastases
Survival and Prognosis With Osteosarcoma: Outcomes in More Than 2000 Patients in the EURAMOS-1 (European and American Osteosarcoma Study) Cohort
Incidence and Prognostic Importance of Lymph Node Metastases in Dogs With Appendicular Osteosarcoma: 228 Cases (1986-2003)
Translational Cell Biology of Highly Malignant Osteosarcoma
American Joint Committee on Cancer Staging and Other Platforms to Assess Prognosis and Risk
Surgical Ventricular Entry is a Key Risk Factor for Leptomeningeal Metastasis of High Grade Gliomas
Risk Factors of Regional Lymph Node (RLN) Metastasis Among Patients With Bone Sarcoma and Survival of Patients With RLN-Positive Bone Sarcoma
Lung Metastases at the Initial Diagnosis of High-Grade Osteosarcoma: Prevalence, Risk Factors and Prognostic Factors. A Large Population-Based Cohort Study

Conflict of Interest: Author CG was employed by Hengpu Yinuo (Beijing) Technology Co., Ltd, Beijing, China. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.