key: cord-0841186-81mcs4uo authors: Asghari, Parvaneh title: A diagnostic prediction model for colorectal cancer in elderlies via internet of medical things date: 2021-06-16 journal: Int J Inf Technol DOI: 10.1007/s41870-021-00663-5 sha: b63ba2d76e6df9112556f06a7c00e2b4a3880ca2 doc_id: 841186 cord_uid: 81mcs4uo

Internet of Medical Things (IoMT) and embedded systems have improved healthcare systems by enabling remote monitoring of patients' health conditions anywhere and at any time, especially during the COVID-19 pandemic. In this paper, an IoT-based model is proposed to predict colorectal cancer (CRC) in elderlies. It lets the involved medical team continuously trace an elderly person's biological indicators using smart wearable embedded systems and medical IoT devices. In this model, vital medical data are collected by IoMT devices and sensors, and analytical information is then derived via machine learning (ML) methods for early CRC diagnosis and for tracking changes in the elderly's health parameters. The experimental results indicate that the suggested model predicts CRC in aged people with adequate accuracy.

The fast-growing population of aged people and the high related healthcare costs have become a serious challenge in today's life [1]. The rapidly declining health of elderlies motivates them to use online health monitoring systems to prevent common threatening diseases such as cancers [2]. Recent technologies such as wireless sensor networks (WSN), the Internet of Things (IoT) and the Internet of Medical Things (IoMT) contribute significantly to the development of smart health monitoring systems and applications for early diagnosis of non-contagious diseases such as cancers [3]. IoMT devices, i.e., medical sensor-based IoT smart devices, provide remote health monitoring for elderlies while they are at home, in hospital, or anywhere else.
Medical IoT sensors are attached to the patient and continuously transfer vital health data to the medical team from anywhere and at any time [4]. With the high prevalence of COVID-19, early detection and prediction of CRC can considerably reduce its heavy burden on communities of aged and disabled people by supporting timely and accurate clinical decisions on CRC detection and on the treatments to recommend [5]. Most studies have concentrated on predicting general chronic diseases such as diabetes mellitus, heart failure and kidney disease in IoT contexts using ML methods [6]. However, early detection of cancers, a serious non-contagious disease, via IoT technology and data mining methods has remained an important open problem because of the poor performance of clinical prediction models and the high cost of diagnosis procedures. This issue has led us to provide an IoT-based diagnostic model for early detection of CRC in elderlies based on monitoring the most influential indicators. The suggested model applies medical IoT sensors to constantly monitor the elderly's overall health condition in order to detect the symptoms of CRC. Our IoT-based CRC prediction model is developed for detecting high-risk conditions and identifying changes in elderlies' biological indicators via data mining methods under steady observation.

The main contributions of this paper are: (i) proposing an IoT-based monitoring model for on-line collection of colorectal health data from elderlies anywhere and at any time, (ii) providing an abstract model for early CRC diagnosis that follows the influencing parameters, including essential colorectal vital signs and clinical health status, and (iii) evaluating the elderlies' biological alterations with data mining methods to predict CRC cases and their level of severity. The rest of this paper is organized as follows.
Section 2 presents a brief survey and analysis of the current related works in this field. Section 3 explains the proposed model in detail, including the IoT-based model for collecting the required data and the method for predicting CRC based on the extracted associative rules. Section 4 presents the experimental outcomes of the suggested method, using several existing data classification methods and comparing their results. Section 5 concludes the proposed method for CRC prediction and provides future research directions in this area.

CRC prediction has received little attention in digital healthcare systems, especially in IoT environments [1]. In this section, some of the recent related studies are surveyed. A comparison between them is presented at the end of this section in Table 1. An approach for predicting the survival or death of advanced-stage CRC patients was proposed in [1]. The authors treated survival prediction as a two-phase process. In the first phase, a tree-based classification algorithm groups advanced-stage CRC patients by survivability. In the second phase, a selective regression technique predicts the lifetime. The authors claimed that their prediction model was more accurate than one-phase regression methods. In [2], a supervised ML model for CRC prediction based on high-dimensional gene data was proposed. In this supervised model, the authors presented a light three-layer neural network using a group of factors. The authors claimed that their model, which employs the Monte Carlo algorithm, was robustly efficient in comparison to the other methods. An approach that applies data mining methods to a large volume of medical records to build a CRC prediction model was presented in [3]. A decision tree learner was applied as the classification method.
The performance of the proposed model was evaluated based on sensitivity and specificity. In [7], a predictive model based on ML techniques uses blood counts to identify patients who have a greater probability of CRC and should undergo colonoscopy. The efficiency of the offered model was evaluated based on True Positive/False Positive (TP/FP) and True Negative/False Negative (TN/FN) parameters. A predictive model for CRC detection based on Electronic Medical Records (EMRs) was also proposed in [4]. ML methods such as Random Forests (RF), Support Vector Machine (SVM), and classification and regression tree (CART) algorithms were applied in this model. The TP/FP and TN/FN parameters were used for assessing the proposed model. In [5], an approach was presented for predicting colorectal surgical complications using Electronic Health Records (EHRs) as heterogeneous clinical data, based on ML methods. Both linear and non-linear forms of the SVM technique were applied in the proposed model. Using patient vital signs, blood sample values, and EHRs enhanced the performance of the proposed model. Recently, an Internet-based virtual human (VH) approach for CRC screening was presented in [8]. The authors provided design strategies for CRC screening derived from two aspects of user perception: the visual design and the information about users' intentions to follow additional health data. An experimental analysis compared the impact of the information medium, comprising animated and static VHs and a text-based intervention, on user intentions. It was revealed that a VH-based intervention can affect users' decisions outside of a laboratory environment.

Table 1 (fragment): [5]: platform not mentioned; method: support vector machine (SVM). [8]: Internet-based; method not mentioned. [9]: Internet-based; method not mentioned. Our model: multi-layer perceptron (MLP), J48, sequential minimal optimization (SMO), naïve Bayes.
In [9], several potential predictors for CRC were identified in a selected population of aged people by linking the screening database with primary care records. The authors observed that these predictors could be used in a refined risk prediction model that incorporates the newer quantitative FIT for bowel cancer screening. Table 1 presents a summary of the employed platform, the applied data mining method and the measurement factors for performance evaluation in the reviewed papers. As observed in Table 1, none of the surveyed papers employed cloud or IoT technologies in their approaches for predicting the critical condition of CRC and monitoring potential patients.

The proposed abstract model of CRC prediction in Fig. 1 comprises three main components:
1. In the IoT environment and patients' sphere, besides the required personal information, which has to be given by the patient, the other on-line data required for CRC prediction are collected by bio-medical sensors and medical IoT devices, all of which are considered behavioral data. Two other factors, Hemoglobin (Hgb) and the Fecal Immunochemical Test (FIT), which are known as vital signs for CRC prediction, are also collected in this component.
2. In the Cloud broker component, data preprocessing and CRC prediction are performed using the colorectal health data collected in the previous component.
3. In the medical care component, the analytical outcomes derived from the CRC prediction process are passed to the involved medical team, who recommend the required prescriptions and instructions to the patients.

The proposed model consists of four phases, which are described in the following subsections. Most previous studies have considered clinical signs of possible CRC that consist of the complete blood count (CBC) [7]. However, there are other signs for CRC detection in aged people that are vital to assess [5, 10].
The main clinical indicators for CRC detection in elderlies, with their definitions and normal limits, are described in Table 2. As shown in Table 2, the parameters that influence colorectal health status consist of three types of data, namely personal data, clinical data and clinical IoT data, which have been confirmed by the World Health Organization (WHO) [11, 12]. The first group of required data consists of gender, personal history and family history of CRC [5]. This group is considered personal data and has to be entered into the system by the patient once. The second group, categorized as clinical data that must be entered into the system daily, consists of constipation, diarrhea, rectal bleeding, abdominal pain and abdominal tenderness. The third group of essential data for CRC prediction is the clinical IoT data, which have to be collected via bio-medical sensors and MIoT devices that continuously gather Hemoglobin (Hgb), Fecal Immunochemical Test (FIT) and weight loss values, sensed and collected online.

To monitor the elderly's colorectal health factors, the IoT data have to be collected via MIoT smart devices. These data are the IoT indicators listed in Table 2, which were described in Sect. 3.1. The medical IoT smart devices continuously acquire and collect the colorectal health indicators in order to detect potential CRC in the patient [6]. The data preprocessing phase is required to clean the sensed medical IoT data of inconsistencies and noise before the data mining phase. Furthermore, feature selection procedures are used to reduce the dimension of the data for easier classification in the CRC prediction phase [6]. In this study, 400 Iranian aged people have been considered as the samples for evaluating the proposed method.
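The three data groups and the cleaning step described above can be pictured as a small record structure. The following is only a minimal sketch: the class names, field names, field types and the `drop_incomplete` helper are hypothetical, and the paper does not specify which cleaning operations are applied, so dropping readings with missing values is an assumption made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PersonalData:
    """Entered once by the patient (first data group)."""
    gender: str
    personal_history: bool  # personal history of CRC
    family_history: bool    # family history of CRC


@dataclass
class DailyClinicalData:
    """Entered daily (second data group)."""
    constipation: bool
    diarrhea: bool
    rectal_bleeding: bool
    abdominal_pain: bool
    abdominal_tenderness: bool


@dataclass
class IoTReading:
    """Sensed by MIoT devices (third data group); None marks a missing value.
    Units are not fixed here because the source does not state them."""
    hemoglobin: Optional[float]
    fit: Optional[float]
    weight_loss: Optional[float]


def drop_incomplete(readings: List[IoTReading]) -> List[IoTReading]:
    """Assumed cleaning rule: keep only readings with all three values present."""
    return [
        r for r in readings
        if r.hemoglobin is not None and r.fit is not None and r.weight_loss is not None
    ]


# Hypothetical values, not taken from the study's dataset.
raw = [IoTReading(13.2, 40.0, 0.5), IoTReading(None, 55.0, 1.0)]
cleaned = drop_incomplete(raw)
```

A real preprocessing phase would likely also handle noisy or out-of-range values; that logic is omitted because the source gives no details about it.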
According to the symptoms of the elderlies' colorectal health status, the CRC possibility is classified into four cases: (1) the white status, which means No-CRC, (2) the yellow status, which means Pre-CRC, where the patient needs more follow-up, (3) the orange status, which means CRC detected with medium probability, and (4) the red status, which means CRC detected with high probability. These cases result from the monitored values of the colorectal health indicators in Table 2 and the derived rules for case diagnosis in Table 3. Table 4 gives a tabular presentation of the obtained rules based on the observed symptoms, with the colorectal health status categorized in white, yellow, orange and red boxes. After the status of the patient is detected, CRC prediction is performed by evaluating the FIT factor, personal history and family history. Figure 2 shows the procedure of the offered scheme for CRC prediction. In the case of No-CRC detection, the patients should be followed up for potential CRC screening in the future.

For evaluating the effectiveness of the CRC prediction method, five main parameters are considered: precision (Pr.), accuracy (Acc.), f-score, recall (Re.) and execution time. Table 5 lists these factors with their descriptions and their equations in terms of the confusion matrix, which is usually employed for measuring the performance of ML classifiers [6]. The confusion matrix contains four kinds of instances: (1) True Positive (TP), the CRC instances that have been correctly categorized as CRC, (2) False Negative (FN), the CRC instances that have been detected as normal, (3) False Positive (FP), the normal instances that have been grouped as CRC cases, and (4) True Negative (TN), the normal instances that have been correctly classified as normal [6].
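The four confusion-matrix counts defined above map to the evaluation factors through the standard formulas. The sketch below assumes those standard definitions, since the exact equations of Table 5 are not reproduced in this text; the function name and the example counts are hypothetical and are not the paper's data.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, precision, recall and f-score from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one instance is required")

    # Accuracy: share of all instances that were classified correctly.
    accuracy = (tp + tn) / total
    # Precision: share of predicted CRC cases that really are CRC.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of real CRC cases that were detected.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-score: harmonic mean of precision and recall.
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Hypothetical counts for a binary CRC / normal split of 200 instances.
example = classification_metrics(tp=90, fp=10, tn=95, fn=5)
```

With these hypothetical counts the accuracy is 185/200 = 0.925 and the precision is 90/100 = 0.9. Execution time, the fifth factor, is measured separately and does not come from the confusion matrix.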
In the offered model, several classification techniques are used for CRC prediction on the collected instances. The classification outcomes were obtained with Weka 3.6 using different ML classification techniques: J48, Sequential Minimal Optimization (SMO), Multi-Layer Perceptron (MLP) and Naïve Bayes (NB). The collected dataset is used to train and test each classifier. The k-fold cross-validation technique is employed, in which the dataset is randomly separated into k mutually exclusive folds of approximately equal size, and the classifier is trained and tested k times. In the cross-validation method, accuracy is the total number of correct classifications divided by the number of instances in the dataset. The results of the different classification methods are presented in Figs. 3, 4, 5, 6 and 7 for several numbers of folds. A common variant of k-fold cross-validation builds the folds so that each one contains roughly the same proportion of class labels. Here, for performance evaluation of the applied classifiers, stratified k-fold cross-validation is employed with values between 1 and 20 for k. As shown in Figs. 3 to 7, the MLP technique achieved the minimum execution time, by a large margin compared to the other algorithms. The MLP showed an accuracy of 97%, with a precision of 94%, a recall of 99% and an f-score of 96.4%. Its execution time of 120 ms was also the lowest among the classifiers. Hence, it can be concluded that, for CRC prediction, the MLP classifier performs best among the evaluated classifiers.
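The stratified k-fold procedure and the accuracy definition used in this evaluation can be sketched in plain Python. This is an illustration only and is not the Weka implementation used in the study: the function names, the majority-class baseline and the toy labels are all hypothetical. Note that cross-validation needs at least two folds, so the sketch rejects k < 2.

```python
import random
from collections import Counter, defaultdict


def stratified_k_folds(labels, k, seed=0):
    """Split instance indices into k folds with similar class proportions."""
    if k < 2:
        raise ValueError("cross-validation needs at least 2 folds")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # continue dealing where the previous class stopped
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % k].append(idx)
        offset += len(indices)
    return folds


def cross_validated_accuracy(features, labels, k, train_and_predict, seed=0):
    """Accuracy = correct classifications over all folds / number of instances."""
    correct = 0
    for test_idx in stratified_k_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predictions = train_and_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        correct += sum(p == labels[i] for p, i in zip(predictions, test_idx))
    return correct / len(labels)


def majority_baseline(train_x, train_y, test_x):
    """Hypothetical stand-in classifier: always predict the most common label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return [most_common] * len(test_x)


# Toy labels using the four status colors; not the study's dataset.
toy_labels = ["white"] * 12 + ["yellow"] * 4 + ["orange"] * 2 + ["red"] * 2
toy_features = [[0.0]] * len(toy_labels)
toy_folds = stratified_k_folds(toy_labels, k=4)
toy_accuracy = cross_validated_accuracy(toy_features, toy_labels, 4, majority_baseline)
```

Any classifier can be plugged in through the `train_and_predict` callback; the majority-class baseline is used here only so that the sketch runs without external libraries.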
Given elderlies' concerns about chronic diseases such as cancers and the growing cost of cancer therapy, mainly in developing countries, we present an abstract IoT-based model for colorectal cancer (CRC) prediction and early diagnosis to support medical teams in making timely decisions about their patients. Since people in such countries have limited access to hospitals and medical care centers for following their colorectal health status, we propose a CRC monitoring and prediction model that applies the benefits of sensor technology in the IoT context. Furthermore, we suggest a

1. A tree ensemble-based two-stage model for advanced-stage colorectal cancer survival prediction
2. Supervised machine learning model for high dimensional gene data in colon cancer detection
3. Utilizing data mining for predictive modeling of colorectal cancer using electronic medical records
4. On the advantage of using dedicated data mining techniques to predict colorectal cancer
5. Predicting colorectal surgical complications using heterogeneous clinical data and kernel methods
6. Haj Seyyed Javadi H (2019) A medical monitoring scheme and health-medical service composition model in cloud-based IoT platform
7. Early colorectal cancer detected by machine learning model using gender, age, and complete blood count data
8. Internet-based tailored virtual human health intervention to promote colorectal cancer screening: design guidelines from two user studies
9. The use of electronic healthcare records for colorectal cancer screening referral decisions and risk prediction model development