key: cord-0075901-g36lgjvs
authors: Abdollahi, Jafar; Nouri-Moghaddam, Babak
title: Hybrid stacked ensemble combined with genetic algorithms for diabetes prediction
date: 2022-03-21
journal: Iran J Comput Sci
DOI: 10.1007/s42044-022-00100-1
sha: 1a16e176e53adc9142b0f2709bf4daab56c79ba8
doc_id: 75901
cord_uid: g36lgjvs
Diabetes is currently one of the most common, dangerous, and costly diseases worldwide. It is caused by increased blood sugar or a decrease in insulin in the body. Diabetes can have detrimental effects on people's health if diagnosed late. Today, diabetes has become one of the challenges for health and government officials. Prevention is a priority, and taking care of people's health without compromising their comfort is an essential need. In this study, an ensemble training methodology based on genetic algorithms was used to diagnose diabetes mellitus and predict its outcomes accurately. This study uses real experimental data on Indian diabetics available on the University of California website. Current developments in ICT, such as the Internet of Things, machine learning, and data mining, allow us to provide health strategies with more intelligent capabilities to accurately predict the outcomes of the disease in daily life and in the hospital, and to prevent the progression of this disease and its many complications. The results show the high performance of the proposed method in diagnosing the disease, which reached 98.8% and 99% accuracy in this study.
Chronic diabetes (CD) is a disease that affects the body's metabolism and causes structural changes. In 2014, the number of patients increased from 100 to 422 million [1] [2] [3]. Diabetes is typically divided into type 1, type 2 [4], and gestational diabetes. Type 2 is increasing with a high prevalence worldwide and is one of the leading causes of death. It threatens people regardless of age and gender because of the lack of insulin in the body [5].
Increased blood sugar is associated with the risk of death in the community due to pneumonia, stroke, acute myocardial infarction, etc. Moreover, its effect on the vital organs is so harmful that it is considered the mother of all diseases. Diabetic patients are at risk of miscarriage, kidney failure, heart attack, blindness, and other chronic and deadly diseases. Therefore, it is essential to diagnose diabetes faster [6] [7] [8].
Babak Nouri-Moghaddam, babaknouriit85@gmail.com, Department of Computer Engineering, Ardabil Branch, Islamic Azad University, Ardabil, Iran
The promising emerging potential of the Internet of Things (IoT) for connected medical devices and sensors plays a vital role in the next-generation healthcare industry for quality patient care. Due to the increasing number of elderly and disabled people, there is an urgent need for real-time health care infrastructure that analyzes patient data to avoid preventable deaths [9]. Also, in intelligent health, modern wearable devices have gradually increased their capabilities in recent decades. They are equipped with several internal and external sensors to detect many vital signs [10]. Designing and implementing a remote monitoring system allows physicians and caregivers to keep track of people's health in the least time. Current developments in ICT, such as the Internet of Things, machine learning (ML), and data processing, will enable us to produce health strategies with more intelligent capabilities to accurately predict the outcomes of the disease in daily life and in the hospital. Besides, medical advances in recent decades have significantly increased life expectancy while significantly reducing mortality [11]. According to the study by La et al. [12], the benefits of personal health care with IoT include promoting frequent assessments of health conditions and awareness of preventive health care needs. For example, diagnosing a disease from a set of measurements taken over a period can be done effectively through routine examinations in clinics. This is because, in clinics and hospitals, doctors routinely take vital signs first and then perform medical tests based on those vital signs to diagnose the disease. Rahmani et al. [13] used IoT technology to offer an objective and structured approach to improving human health. This approach differentiates the health sector's IoT-based devices by social benefit, influence, and cost-effectiveness. Owing to the nature of IoT computing, all health entities (people, equipment, medicine) can be continuously monitored and managed. Therefore, using these technologies within the health industry can improve the quality and cost of medical care by automating tasks currently carried out by humans. Figure 1 shows the overall IoT-based health monitoring system [14, 15].
Fig. 1 General IoT-based health monitoring system [14, 15]
As a result, the prediction accuracy of a model may be high even with optimal parameters. Not surprisingly, the models produced by MLR and by support vector regression with a linear kernel are not statistically distinct and perform significantly better than other methods in the IWPC Ensemble [16]. One way to overcome the limitations of a single machine learning algorithm is to combine the strengths of several algorithms (e.g., the ensemble method). Recently, the "bagging" ensemble method has been used to predict diabetes [17, 18]. Stacked generalization is another ensemble method that uses a higher-level model to combine lower-level models to achieve higher prediction accuracy [19, 20].
Unlike bagging and boosting approaches, which can only combine machine learning algorithms of the same type, stacked generalization can combine different algorithms through a meta-machine learning model to maximize generalization accuracy. The objectives of this study are as follows:
• Use a hybrid machine learning model to diagnose diabetes.
• Significantly improve forecast accuracy.
• Use several models in combination.
• Achieve a high level of reliability in classification.
• Use multiple models to improve the estimate of the final model.
• Improve accuracy and reduce error compared to single-core models.
The contributions of this paper can be summarized as follows:
• Machine learning ensemble models for diabetes (T2D) prediction demonstrated high performance.
• Results are compared with the most closely related research in the literature.
• The benefits of recently proposed ensemble methods for prediction are examined.
• A hybrid stacked generalization (SG) approach based on metaheuristics is used in the diagnosis of diabetes.
In the last several years, the number of persons diagnosed with diabetes has increased exponentially. According to a health report, 347 million individuals globally have diabetes. Diabetes is a disease that affects both the elderly and the younger generation [21] [22] [23]. Diagnosing diabetes in its early stages is also difficult. Early diagnosis will aid the decision-making process of the medical system and help save lives. Therefore, early prediction of diabetes is important. Two data sets were used in this investigation: the type 2 diabetes data set, which contains nine useful variables and 768 records, and another data set with 55 useful variables and 100,000 records. Tables 2 and 3 list the variables and their abbreviations. Data analysis is all about finding patterns in a vast data set, which allows us to draw particular conclusions from the data supplied.
Different machine learning methods can be used to perform the analysis. However, investigations have demonstrated that none of the algorithms can adequately solve a problem on its own. This paper presents two sets of machine learning algorithms for diabetes prediction: one is classification-based, and the other is based on ensemble learning. Artificial intelligence (AI) research in healthcare is quickly advancing, with possible applications being proven in a variety of medical fields. We employ ensemble learning deliberately to look for superior prediction performance or classification accuracy [24] [25] [26] [27]. We use nine classifiers for classification: Random Forest, Support Vector Machine, Decision Tree, K-Nearest Neighbors, Gradient Boosting, Multilayer Perceptron, Extra Trees, AdaBoost, and Gaussian Naive Bayes. We used a learning-based Genetic Algorithm (GA) for the ensemble learning approach. These ten algorithms were applied and compared to evaluate the accuracy of diabetes prediction for the two approaches to machine learning, and they scored 98% on average, which is higher than previous machine learning algorithms.
The rest of the article is organized as follows: section two reviews the method, section three examines the results, and section four presents the discussion. Finally, the last section presents the conclusions and future work.
The present study deals with an ensemble stacking-based learning methodology for detecting diabetes. This section provides a brief review of the literature on metaheuristic optimization-based and ensemble learning-based methods for predicting diabetes. Measures have been taken to reduce the number of chronic disease diagnostic tests to reduce overall costs.
One possible solution is to apply machine learning techniques to healthcare data, where they are used to find frequent patterns in an extensive database and obtain helpful information. Machine learning methods are instrumental in diagnosing diabetes and in making diagnosis more efficient. One of the most important challenges for machine learning research is accurately diagnosing diabetes. For example, Fatima et al. [28] studied different machine learning methods to diagnose various diseases such as cardiopathy, diabetes, liver disease, and hepatitis, and succeeded in diagnosing these diseases. Besides, Alić et al. [29] used artificial neural networks and Bayesian networks to diagnose and classify diabetes, aiming to evaluate these two techniques and their application to classifying Type 2 diabetes (T2D) and cardiovascular disease. Kumar et al. [30] used three distinct data mining classification methods, namely Naive Bayes (NB), support vector machine (SVM), and decision tree, as potential approaches to predict the likelihood of heart disease for people with diabetes. However, the accuracy of their prediction is dependent on the accuracy of their prediction. Saru et al. used medical bioinformatics analyses to predict diabetes [31]. Subramaniyan et al. [32] used predictive analytics with machine learning to evaluate massive data and anticipate future difficulties in diabetic patients. Yang et al. [33] also created a computer method that combined multiple forms of physical examination data to predict the risk of diabetes.
Today, ML offers various tools for efficient data analysis. Especially in the last few years, the digital revolution has provided affordable and accessible tools for collecting and saving data. Data collection and examination machines are installed in new and modern hospitals to collect and share data in large information systems.
ML technology is very effective for analyzing medical data and has an influential role in solving diagnostic problems. In modern hospitals, correct diagnostic data are presented as medical histories or reports, or kept in a dedicated information department. These techniques are used to classify the data set [28, 34]. To evaluate the early diagnosis of diabetes, Zeki et al. [35] used three data mining (DM) methods: Naive Bayes (NB), logistic regression (LR), and Random Forest (RF). According to their findings, RF has the best level of accuracy compared to the other procedures. In addition, in the study by Kalyankar et al. [36], a machine learning technique in the Hadoop MapReduce environment was used to detect missing values and discover trends in the Pima Indian Diabetes dataset. Based on the patient's risk level, that research can forecast the form of diabetes mellitus, the associated future hazards, and the type of therapy to supply. Predictive algorithms based on data mining for evaluating diabetes data can aid in the early diagnosis and prediction of the condition and of important related events such as hypo/hyperglycemia. Various approaches have been developed to diagnose, forecast, and classify diabetes [14].
The Internet of Things (IoT) is increasingly being utilized to implement various applications, particularly as the amount of data available grows. These applications include patient monitoring systems. For example, we can use the Internet of Things to evaluate data and offer it to physicians and paramedics in the health industry. We can identify solutions for longer and healthier lives by analyzing, processing, and exploiting the knowledge and information contained in big data on health issues and illness trends in a specific population. In many ways, big data analysis improves healthcare insight [37].
Each year, many expenses are incurred in diagnosing and treating patients with diabetes. Diagnostic techniques that have been used in the past are time-consuming. As a result, the most significant and urgent concern is precise forecasting and dependable procedures [38]. Machine learning techniques applied to healthcare data could be one option. Diabetes can be diagnosed and managed more efficiently using machine learning approaches. Researchers have investigated several machine learning methods for identifying diseases such as cardiopathy, diabetes, liver disease, and hepatitis and have been successful in doing so. Many studies suggest that artificial neural networks, random forests, and Bayesian networks are the most accurate ways of diagnosing and classifying diabetes when compared to other techniques. We offer a paradigm that incorporates artificial neural networks, Bayesian networks, and seven other classification methods. This combination strategy is called an Ensemble Learner. We proposed our approach for more accurate disease prediction compared to prior models. The experimental findings also demonstrated that our strategy outperforms artificial neural networks, Bayesian networks, and random forests. Table 1 compares the benefits and drawbacks of the proposed approach for diagnosing diabetes to previous methods.
This section discusses all the supporting methodologies used in this study. Machine learning techniques have been widely used in many scientific fields. However, their use in the medical literature is limited, partly because of technical difficulties [49, 50].
• Decision tree: a decision support tool that uses a tree structure as its model [51, 52].
• Naive Bayes classifier: a family of simple probabilistic classifiers based on Bayes' theorem, assuming the independence of the random variables [53, 54].
• Artificial Neural Network: inspired by how the biological nervous system processes data and information for learning and knowledge creation [55, 56].
• Support Vector Machine: one of the supervised learning methods used for classification and regression [56, 57].
• C4.5: one of the decision tree algorithms, which is very important due to its very high interpretability [58].
• Random Forest: an ensemble learning method for classification and regression that builds a structure of many decision trees at training time and outputs the class (classification) or the prediction of the individual trees (regression) [59, 60].
• K-Nearest Neighbors (KNN): a suitable choice of k has a significant impact on the diagnostic performance of the KNN algorithm. A large k reduces the effect of variance caused by random error but runs the risk of ignoring small but significant patterns [59, 61, 62].
The following figure shows the proposed flowchart [62]. In the following, we examine the popular methods of combining classifiers.
• Bagging: one of the most straightforward and successful ensemble approaches to improving classification, commonly used with decision trees. This algorithm is beneficial for large volumes of data and works well for unstable learning algorithms, that is, algorithms whose output changes when the data change.
• Boosting: this algorithm aims to combine several weak classifiers into a strong one to improve performance; the predictors are trained sequentially.
• AdaBoost: a meta-algorithm designed to improve performance and address the problem of unbalanced classes. It combines weak learners to produce a robust, high-quality, and accurate classifier [63] [64] [65] [66] [67].
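The difference between Bagging and AdaBoost described above can be illustrated with a minimal sketch. It assumes scikit-learn is available, and it uses a synthetic data set from `make_classification` as a stand-in for the diabetes data. The number of estimators and the random seeds are illustrative choices, not values taken from this study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a tabular medical data set (8 features, binary label).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Bagging: each tree is fitted on a bootstrap sample; predictions are voted.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            random_state=0)

# AdaBoost: weak learners (decision stumps by default) are fitted one after
# another, each focusing on the samples the previous ones misclassified.
adaboost = AdaBoostClassifier(n_estimators=50, random_state=0)

scores = {}
for name, model in [("bagging", bagging), ("adaboost", adaboost)]:
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # held-out accuracy

print(scores)
```

Bagging trains its members independently, so it mainly reduces variance, while AdaBoost trains them in sequence and reweights hard samples. The accuracies printed here apply only to the synthetic data.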
This study provides an intelligent monitoring system for patients and older people with chronic diseases. It uses the collected data for effective diagnosis and prediction in noncritical situations, to promote smart health and prevent deaths, on IoT infrastructure with intelligent ensemble learning algorithms. Ensemble methods are learning algorithms that construct an ensemble of classifiers and classify new data points by taking a (weighted) vote of their predictions. The original ensemble method is Bayesian averaging, but newer algorithms include error-correcting output coding, Bagging, and boosting. This article reviews these methods and explains why ensembles often perform better than any single classifier. The use of intelligent machine learning algorithms in diagnosing and predicting diseases has not been successful in many scenarios. Considering this and the challenges mentioned in the background section, many challenges to smart health in IoT need to be addressed, one of which is the accurate diagnosis and prediction of disease outcomes. This paper uses the machine learning approach called Ensemble Learning, described below, to diagnose and predict chronic diseases. The purpose of this article is to improve the accuracy and speed of diagnosis of chronic diseases in the context of the intelligent network, using ensemble learning approaches and a new meta-learner in stacking. Stacked generalization is an approach that allows researchers to combine several different prediction algorithms. In this study, the type 2 diabetes data set available at (https://archive.ics.uci.edu/ml/datasets/diabetes) has been used, which has nine useful variables and 768 records. These variables and their abbreviations are listed in Table 2. 70% of the data is used for training, and 30% for testing. In addition, another dataset has been used to train the algorithms.
This data has been prepared to investigate factors associated with readmission as well as other outcomes regarding patients with diabetes. The data set, available at (https://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008), has 55 useful variables and 100,000 records. Variables and their abbreviations are listed in Table 3. 80% of the data is used for training, and 20% for testing.
For the analysis of big clinical databases, the Knowledge Discovery in Databases (KDD) methodology appears to be appealing. The preprocessing step (data cleaning and management of missing values) is critical in the KDD process since it determines the quality of the results acquired by data mining processes and takes up roughly 80% of the total project time. Data preprocessing is vital to prepare the diabetes data and the Pima Indians data for a machine learning model. Separating the training and testing data sets ensures that the model learns only from training data and that its performance is tested with the testing data. Therefore, the data set used was divided into training and test data. The training data contain 70% of the data set, and the test and validation data include 15% each. At first, all the data was shuffled.
This article develops a stacking-based evolutionary ensemble learning system, "Stacked Generalization based Metaheuristics," to predict the onset of Type-2 diabetes mellitus (T2DM) within five years. Before learning, as a data preprocessing step, the missing values and outliers were identified and imputed with the median values. Several machine learning optimization algorithms are utilized for base learner selection, which simultaneously maximizes the classification accuracy and minimizes the ensemble complexity. As for model combination, Bagging, Boosting, and AdaBoost are employed as meta-classifiers that combine the predictions of the base learners [68].
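The preprocessing steps described above (median imputation of missing values, shuffling, and a train/test split) can be sketched in a few lines of standard-library Python. The helper names `impute_median` and `shuffle_split`, the three-column records, and their values are hypothetical illustrations, not the study's actual code or data.

```python
import random
from statistics import median

def impute_median(rows):
    """Replace None entries in each column by the median of observed values."""
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        medians.append(median(observed))
    return [[medians[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]

def shuffle_split(rows, train_fraction=0.7, seed=0):
    """Shuffle the records, then cut them into training and test parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(round(train_fraction * len(rows)))
    return rows[:cut], rows[cut:]

# Toy records (hypothetical columns: glucose, blood pressure, BMI).
records = [
    [148, 72, 33.6], [85, None, 26.6], [None, 64, 23.3], [89, 66, 28.1],
    [137, 40, None], [116, 74, 25.6], [78, 50, 31.0], [115, None, 35.3],
    [197, 70, 30.5], [125, 96, 22.0],
]

clean = impute_median(records)
train, test = shuffle_split(clean, train_fraction=0.7, seed=0)
print(len(train), len(test))
```

The sketch imputes before splitting, as the text describes. A stricter variant would compute the medians on the training portion only and reuse them for the test portion, so that no test information leaks into training.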
The comparative results demonstrate that the proposed stacking genetic method outperforms several individual ML and conventional ensemble approaches. Figure 2 depicts the learning process with stacked generalization, based on model selection from 9 base learners (Table 5) and three stacking-based combination methods. Stacked generalization is a technique for combining several different classifiers such as decision tree, artificial neural network, support vector machine, etc. It consists of two stages: base learners at level zero and stacking model learners at level one. At level zero, several different models learn from the dataset, and the output of each model is used to build a new dataset for level one. Figure 3 shows the stacking algorithm [69].
Algorithms: an overview. Figure 4 shows a flow chart of the algorithms we applied in this study. Here, the Pima Indian diabetes dataset and the Diabetes 130-US hospitals for years 1999-2008 dataset are considered for testing all the models. The source of these datasets is the UCI repository [70] [71] [72] [73] [74]. To determine whether ensemble predictors constructed using stacked generalization improve the prediction accuracy for diabetes, we constructed different stacked generalization frameworks using the same parameters in the individual algorithms. We also use feature selection to select the best features and improve the accuracy of the suggested method. Feature selection reduces the number of attributes while keeping a subset of the original features. It is frequently used in data preparation to find previously unknown features useful to classification tasks and to remove unnecessary or redundant features. The goal of feature selection is to boost classification accuracy. GA is used to remove insignificant features in this study. We defined chromosomes as a mask over the features to achieve this goal [75].
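The two-level scheme described above (base learners at level zero, a meta-learner at level one) can be sketched with scikit-learn's `StackingClassifier`. This is a minimal illustration under stated assumptions: the data are synthetic, only three of the nine base learners are included, and logistic regression serves as a stand-in meta-learner. The paper reports Bagging, Boosting, and AdaBoost as its combiners, and any of them could be passed as `final_estimator` instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (8 features, binary label).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# Level zero: heterogeneous base learners trained on the original features.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=1)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("dtree", DecisionTreeClassifier(criterion="entropy", random_state=1)),
]

# Level one: the meta-learner is fitted on cross-validated predictions of the
# base learners, which form the "new dataset" mentioned in the text.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(),
                           cv=5)
stack.fit(X_train, y_train)
stack_accuracy = stack.score(X_test, y_test)
print(round(stack_accuracy, 3))
```

Using out-of-fold predictions (`cv=5`) for the level-one training set keeps the meta-learner from simply trusting base learners that overfit the training data.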
To put it another way, each chromosome is a collection of features. The size of the chromosome (number of genes) is equal to the number of features describing a diabetes patient. As previously stated, a chromosome is represented by a binary string. GA is one of the first population-based stochastic algorithms proposed. Selection, crossover, and mutation are the most common GA operators [76]. These algorithms use recombination operators to store crucial information in simple chromosome-like structures that encode a possible solution to a specific problem. Although there is a wide variety of genetic algorithms [77], GAs are frequently regarded as performance optimizers. A simple genetic algorithm's operating concept is depicted in Fig. 5. GA has other parameters as well: executive parameters such as the mutation and elite rates, and structural characteristics such as population size, must be set. We employed the most frequent values for these parameters, which yielded satisfactory results. The GA parameters utilized in this experiment are shown in Table 4. The conventional GA method was employed in this experiment. There are four significant steps in the GA method: (1) The features were coded as genomes in binary, with '1' denoting selection and '0' denoting non-selection. Thus, phenotypes labeled '0' denote features that were eliminated, whereas phenotypes labeled '1' denote significant features that were chosen. Then, based on the concept of phenotypes, each genotype creates a collection of subsets. The proposed approach uses these subsets as training sets.
Then, (2) each chromosome was evaluated using the fitness function, and the best features were picked; (3) the chromosomes were modified using crossover and mutation to form a new generation of the population; and (4) the new generation returned to step (2) until the halting criteria were satisfied. A population of 50 individuals was used. The ranking was used as part of a selection strategy in which individuals in the top half by fitness were chosen as parents. The following population was created using a single-point crossover with a rate of 0.6 and a single-point mutation with a rate of 0.033. First, the elitist strategy was set at 2, which meant that the two fittest members of the current generation were included in the next population. Then, the number of generations allowed with the same best fitness value was set to twenty. The number of generations was finally set to 100. There were 100 GA executions, each with different initial conditions (different data splits). The classifier fit was evaluated using a linear ranking value in this work. Linear ranking is a statistical measure of the agreement between expected and actual values. The classifier's performance is determined by how the training set is generated. Because the data set is small, cross-validation was done using a repeated random sub-sampling technique. Data were randomly partitioned into 70% training sets and 30% validation sets before each GA analysis. Each stage's performance was recorded using the linear ranking value. As the genome fitness value, the middle value of the ten linear ranking values was recorded [76, 77]. In crossover, two parent solutions are used to produce a child. The selection (reproduction) procedure then comes to a close, followed by the emergence of fitter individuals. To pair two chromosomes, this study uses a single-point crossover: two chromosomes are cut once, and the slices are swapped between them.
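The GA loop described above can be sketched in plain Python. The sketch uses the reported settings (population 50, crossover rate 0.6, mutation rate 0.033, elitism of 2, top-half parent selection, at most 100 generations, and a stop after 20 generations without improvement). The `fitness` function and the `INFORMATIVE` set are hypothetical stand-ins: in the study, fitness comes from a classifier evaluated on the selected feature subset, which is replaced here by a toy score so the example runs on its own.

```python
import random

N_FEATURES = 8          # chromosome length = number of features
POP_SIZE = 50
CROSSOVER_RATE = 0.6
MUTATION_RATE = 0.033
N_ELITE = 2
MAX_GENERATIONS = 100
PATIENCE = 20           # generations tolerated without improvement

# Hypothetical stand-in: pretend features 1, 5 and 7 are the informative ones.
INFORMATIVE = {1, 5, 7}

def fitness(mask):
    """Toy fitness: reward informative features, lightly penalize mask size."""
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits - 0.1 * sum(mask)

def crossover(a, b, rng):
    """Single-point crossover: cut both parents once and swap the tails."""
    if rng.random() < CROSSOVER_RATE:
        point = rng.randint(1, len(a) - 1)
        return a[:point] + b[point:], b[:point] + a[point:]
    return a[:], b[:]

def mutate(mask, rng):
    """Bit-flip mutation: each gene flips with probability MUTATION_RATE."""
    return [1 - g if rng.random() < MUTATION_RATE else g for g in mask]

def run_ga(seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(N_FEATURES)]
                  for _ in range(POP_SIZE)]
    best, stale = None, 0
    for _ in range(MAX_GENERATIONS):
        ranked = sorted(population, key=fitness, reverse=True)
        if best is not None and fitness(ranked[0]) <= fitness(best):
            stale += 1
        else:
            best, stale = ranked[0][:], 0
        if stale >= PATIENCE:
            break
        parents = ranked[: POP_SIZE // 2]               # top half reproduces
        next_population = [m[:] for m in ranked[:N_ELITE]]  # elitism
        while len(next_population) < POP_SIZE:
            child_a, child_b = crossover(rng.choice(parents),
                                         rng.choice(parents), rng)
            next_population.append(mutate(child_a, rng))
            if len(next_population) < POP_SIZE:
                next_population.append(mutate(child_b, rng))
        population = next_population
    return best

best_mask = run_ga()
selected = [i for i, g in enumerate(best_mask) if g]
print(best_mask, selected)
```

To use this for real feature selection, `fitness` would be replaced by a function that trains a classifier on the columns where the mask is 1 and returns its validation score.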
When the crossover procedure is finished, the strings are passed on to the mutation process. In bit-flip mutation, a small number of bits are flipped from 0 to 1 and from 1 to 0 [76, 77].
Hyperparameters are variables whose values influence the learning process and affect the learning algorithm's model parameters. As the prefix 'hyper_' suggests, they are 'top-level' parameters that regulate the learning process and the model parameters that come from it. Before training starts, the machine learning engineer chooses and sets the hyperparameter values that the learning algorithm will employ. This paper investigates several approaches to establish which parameters do not affect model performance and which may have an additional impact on it, and offers appropriate parameters for diabetes research to show the issues involved with identifiability. The purpose of parameter tuning is to find the set of parameter values that reduces the cost function to the smallest possible value. For machine learning algorithms, there are six must-know parameters. At the end of the learning process we have the trained model parameters, which is effectively what we refer to as the model.
Support Vector Machine (SVM): like gradient boosting, the SVM algorithm is very popular, very effective, and has many hyperparameters to tune. The choice of the kernel, which controls how the input variables are projected, is perhaps the most significant parameter. There are numerous kernels to pick from, but linear, polynomial, and RBF are the most frequent. This work evaluated the subset of selected genes using an SVM classification model with an RBF kernel. The penalty (C), which can take on various values and has a significant impact on the shape of the resulting regions for each class, is another important parameter.
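The roles of the kernel and the penalty C can be explored with a small cross-validated search. The sketch below assumes scikit-learn. The candidate values, the feature scaling step, and the synthetic data are illustrative assumptions and are not the configuration reported in this study, which states only that an RBF kernel was used.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (8 features, binary label).
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2, stratify=y)

# Scaling is included because SVMs are sensitive to feature ranges.
pipeline = make_pipeline(StandardScaler(), SVC())

# Candidate kernels and penalties; keys are prefixed with the pipeline step
# name ("svc") followed by a double underscore.
param_grid = {
    "svc__kernel": ["linear", "poly", "rbf"],
    "svc__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```

A larger C penalizes misclassified training points more heavily and tends to produce a tighter, more complex boundary, which is why it is usually searched on a logarithmic scale.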
Random Forest (RF): the number of random features to sample at each split point (max_features) is the most critical parameter in Random Forest. In this study, a range of integer values, such as 1 to 20, or 1 to half the number of input features, was tested. However, after careful consideration and testing, we decided to change this number to zero. The number of trees (n_estimators) is another key parameter for the random forest. Ideally, it should be increased until the model shows no more improvement; a log scale from 10 to 1000 can be helpful too. In this paper, Ring was utilized to determine the optimal settings. For this parameter, the best selected value is 100.
K-Nearest Neighbors (KNN): the most important hyperparameter for KNN is the number of neighbors (n_neighbors). We tested values ranging from 1 to 21 in this research; the best value selected for this parameter was n_neighbors = 5. This study also looked at distance measures (metrics) for determining the composition of a neighborhood. After checking, the Minkowski metric was used.
Multilayer Perceptron (MLP): a variety of hyperparameters, such as the number of hidden neurons, layers, and iterations, must be tuned in order for MLP to work. Grid search is a method for optimizing model hyperparameters. When creating the grid-search object, a dictionary of hyperparameters to evaluate is given in the param_grid argument. This is a map from each model parameter's name to an array of values to try. Grid search was utilized in this paper to discover the optimal parameters for this technique. Grid-search is configured as follows.
Decision tree (DTree): decision trees are a great technique for classification because, unlike Random Forests, they are transparent or white-box classifiers, which means we can see the logic behind their classification. The criterion parameter is the function for determining a split's quality; "gini" for Gini impurity and "entropy" for information gain are supported.
In this paper, entropy was used as the criterion parameter. With the entropy measure as the impurity measure, a node is split so as to deliver the highest information gain. Gini impurity, on the other hand, examines the divergences between the probability distributions of the target attribute's values and splits a node so that the least amount of impurity is produced. The splitter parameter, which sets the strategy used to choose the split at each node, was set to "best".
AdaBoost: the number of decision trees employed in the ensemble is an essential hyperparameter for the AdaBoost method. For the model to perform successfully, many trees must be added to it, often hundreds, if not thousands. The n_estimators option specifies the number of trees. This parameter was set at 100 in this study.
Naive Bayes (NB): these classifiers are scalable, with a number of parameters proportional to the number of variables (features/predictors) in a learning problem. The parameter set for the Naive Bayes classifier is somewhat narrow. Depending on the implementation, the number of classes may be the only parameter over which we have no influence in practice.
Extra Trees: it is simple to use because it only includes a few key hyperparameters and sensible heuristics for tuning them. The three significant hyperparameters to tune are the number of decision trees in the ensemble, the number of input features to randomly select and consider for each split point, and the minimum number of samples necessary in a node to establish a new split point. The number of decision trees utilized in the ensemble is an essential hyperparameter for the Extra Trees technique. This setting is left at its automatic default in this paper. Like Random Forest, the Extra Trees algorithm is not overly sensitive to the number of features sampled at each split point, although this is a critical hyperparameter to control. It is set via the max_features argument and defaults to the square root of the number of input features.
In this case, for our test dataset, this would be three features.

Gradient boosting (GB): among data scientists, GB is a very popular prediction model. The following are the parameters used in this algorithm.
• Learning rate: this influences how much each tree affects the final result. This parameter was set to 0.1 in this study.
• N estimators: this is the number of sequential trees modeled. Even though GBM is fairly resilient when dealing with many trees, it can nonetheless overfit at times. As a result, it should be tuned by cross-validation (CV) for a given learning rate.
• Loss: the loss function that must be minimized in each split. It can take a variety of values for classification and regression cases. In most cases, the default settings are sufficient. The loss value 'deviance', which refers to deviance (logistic regression) for classification with probabilistic outputs, was employed in this paper.

In Table 5: RF, Random Forest; KNN, k-Nearest Neighbors algorithm; MLP, Multilayer Perceptron; AdaBoost, AdaBoost; DTree, decision tree algorithm; NB, Naive Bayes; GBC, gradient boosting classifier algorithm; SVM, support vector machine; Extra Trees, Extremely Randomized Trees classifier.

For the study, Jupyter notebook was used for implementation, and Python was used as the programming language. Among all models, we selected the model with the highest predictive accuracy. This article used cost-benefit analysis (the confusion matrix), the ROC curve, and other model selection criteria such as accuracy.
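As an illustration of how the matrix of correct and incorrect predictions (commonly called a confusion matrix) yields accuracy and related selection criteria, the sketch below counts the four outcomes for a binary classifier and derives the usual ratios. The labels and predictions are hypothetical toy values, not results from this study, and the helper names are illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if truth == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Derive common performance measures from the four counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,  # recall
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Hypothetical ground truth and predictions (1 = diabetic, 0 = non-diabetic).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
scores = classification_metrics(tp, fp, tn, fn)
print("TP, FP, TN, FN:", tp, fp, tn, fn)
print(scores)
```

Because false negatives and false positives carry different costs in diagnosis, sensitivity and specificity are usually reported alongside accuracy rather than relying on accuracy alone.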
Performance measurement is used to determine the effectiveness of the classification algorithm. For two-class classification problems, the cost of classification can be shown with a cost matrix covering the two types of error, false positive (FP) and false negative (FN), and the two types of correct classification, true positive (TP) and true negative (TN), each of which carries different costs and benefits, as shown in Tables 6 and 7 [6, 49, 50, 78].

Diabetes is a condition in which blood flow is obstructed throughout the body. The retinal blood vessels may leak in this disorder, resulting in retinal edema [79]. Four learning strategies for extracting patterns from data have been described based on data types: supervised, semi-supervised, unsupervised, and reinforcement learning. Labeled data is difficult to obtain in machine learning, whereas unlabeled data is frequently collected and quickly accessible. In most projects, however, most of the data is unlabeled and only some of it is labeled [80]. In machine learning and data mining, the primary assumption is that the training data and future data have the same distribution and properties [81].

In past years, medical service providers manually examined patients' vital signs and diagnosed and predicted the disease based on patient records and research findings. In this study, intelligent machine-learning algorithms are used to diagnose the disease effectively and predict its outcomes accurately, with factors such as age, gender, blood pressure, cholesterol, and smoking considered in the diagnosis. Finally, the risk of the disease with respect to the mentioned factors is determined. Table 8 compares similar works by others with ours. Compared with the findings of other researchers, our model's ability to predict diabetes is high, with acceptable accuracy.
These models can be integrated into online computer software to assist doctors in predicting the onset of diabetes in patients and offering the required preventive measures.

The success of an ensemble learning system depends on the diversity of the classifiers that make it up. If all classifiers produce the same output, it is impossible to correct a possible error. So, the classifiers need to make different errors on different samples. If each classifier makes a different mistake, the total error can be reduced by combining them strategically. Such a set of classifiers must therefore be diverse. This diversity can be achieved in different ways, as shown in Fig. 6 [86].
• Use different training datasets to train the classifiers.
• Use different training parameters for different classifiers.
• Use different classifiers.

Today, because of a lack of knowledge about how to use the various data around us, these data are neglected by managers. In contrast, if these seemingly insignificant data are purposefully stored and then mined, they will generate much knowledge and help us make managerial decisions. In this study, a genetic algorithm with logistic regression and random forest is used to select appropriate features based on the correct diagnosis of the desired class. This follows the application of statistical and probabilistic approaches to the data set, and preprocessing to remove redundant and missing data, in order to extract the features that have more variance in the complications of diabetes. The genetic algorithm selects a subset of the most essential attributes for classification [87]. Logistic regression and a random forest were used to calculate the accuracy and examine the features with more variance in the rapid diagnosis of diabetes. We applied the data set properties as input to both algorithms. Then this algorithm calculated the accuracy of

We chose ensemble learning in this work since it usually outperforms any single trained model [24].
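The genetic-algorithm feature selection described in this section can be sketched as a search over binary masks, where each bit marks whether a feature is kept. The sketch below is a generic illustration using only the standard library and is not the authors' implementation: the population size, rates, and operators (tournament selection, one-point crossover, bit-flip mutation, elitism) are assumptions, and `toy_fitness` with its weights is a hypothetical stand-in for the classifier accuracy (for example, of logistic regression or a random forest) that would score each mask in practice.

```python
import random


def ga_feature_selection(n_features, fitness, pop_size=20, generations=30,
                         crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Search for a binary feature mask that maximizes fitness(mask)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    history = []  # best fitness observed in each generation

    def tournament(scored):
        # Return the fitter of two randomly drawn individuals.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]
        best_score, best = max(scored, key=lambda pair: pair[0])
        history.append(best_score)
        next_population = [best[:]]  # elitism: carry the best mask over unchanged
        while len(next_population) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: each bit flips with a small probability.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population

    scored = [(fitness(ind), ind) for ind in population]
    best_score, best = max(scored, key=lambda pair: pair[0])
    history.append(best_score)
    return best, best_score, history


# Hypothetical per-feature usefulness weights for 8 features (assumption only).
WEIGHTS = [0.30, 0.05, 0.25, 0.02, 0.01, 0.20, 0.03, 0.15]
PENALTY = 0.04  # hypothetical cost per selected feature, favouring small subsets


def toy_fitness(mask):
    """Stand-in for model accuracy: reward useful features, penalize subset size."""
    return sum(w * bit for w, bit in zip(WEIGHTS, mask)) - PENALTY * sum(mask)


best_mask, best_score, history = ga_feature_selection(len(WEIGHTS), toy_fitness)
print("Selected feature mask:", best_mask)
print("Fitness of selected mask:", best_score)
```

In a real pipeline, the fitness function would train and validate a classifier on the columns selected by the mask; elitism is used here so that the best score never decreases from one generation to the next.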
Ensemble learning has been successfully applied to both supervised (regression [25], classification, and distance learning [26]) and unsupervised (density estimation) learning tasks [27]. It has also been used to estimate bagging's error rate [38, 69]. Using a series of models instead of a single model is advantageous for various reasons:
• Performance: compared to a single model, there is a significant performance improvement.
• Error reduction: predictive errors in machine learning models can be described using bias and variance.

As a result, this paper aims to identify the limitations of machine learning algorithms employed by other researchers in the accurate diagnosis of diabetes and to compare them with the ensemble learning approach for better outcomes. Using these findings, this paper describes the limitations of machine learning models used for diabetes diagnosis on the dataset, with the goal of highlighting critical issues such as data quality, data quantity, explainability, and data privacy while getting quick results. The following are the article's strengths:
• Combining the best machine learning architectures for voting.
• Achieving a high level of classification reliability.
• Improved accuracy and error compared with single-core models.
• A comparison of the proposed method's outcomes.

Diabetes detection rates were compared among various algorithms with the holdout approach.
Diabetes detection rates were compared among various algorithms with the K-fold cross-validation approach.

Future research could resolve several significant shortcomings of this study, whose first goal was to predict diabetes, such as:
• The sample size.
• Data that is not readily available or is not trustworthy.
• Inadequate access to hospital information.

Diabetes has become one of the most important concerns of people and officials due to its irreversible complications and high prevalence.
The PID database and the Diabetes 130-US hospitals (1999-2008) database were used to diagnose diabetes in this study. In recent years, data mining methods have been widely used in medicine and health care to diagnose and prevent diseases, choose treatment methods, and predict deaths and treatment costs. For this purpose, we used an ensemble learning algorithm called stacked generalization, based on genetic algorithms, to classify diabetic patients according to the observed complications. This study aimed to combine data mining algorithms to show that combining models can improve performance. According to the methods used, the highest accuracy was obtained with the proposed stacked generalization algorithm.

Recent studies have developed many machine learning algorithms for predicting diabetes. The ensemble learning method for the best diabetes prediction is key to this study. Through the site, we have gathered patient information. Following data collection, only the relevant features were retained from each data set to improve the proposed model's accuracy, and unrelated features that slowed down calculation were removed. The ensemble learning approach works more accurately on larger healthcare datasets and gives better outcomes. Finally, we developed a diabetes data gathering and ensemble learning approach for accurate and timely prediction. Our recommended system's total performance is between 98.8 and 99.9%. As a result, new medical researchers will benefit from future research and academic practice, particularly for Internet of Things-based prediction systems.
In future research, considering the importance of diagnosing the disease, we intend to expand the research to the diagnosis of diseases such as breast cancer metastasis, lung cancer, and Covid-19 using data mining tools and the proposed algorithm, and also to develop and implement the following: (1) consumption of drugs and supplements (drug interaction), (2) providing solutions to caregivers, (3) introducing a specialist related to the disease, and (4) online medical services.

Author contributions JA: designed and performed experiments, analyzed data. BNM: supervised the findings of this work and co-wrote the paper. All authors discussed the results and contributed to the final manuscript.

Funding None.

Improving type 2 diabetes mellitus glycaemic control through lifestyle modification implementing diet intervention: a systematic review and meta-analysis
Identification of stress-related microRNA biomarkers in type 2 diabetes mellitus: a systematic review and meta-analysis
Combined lifestyle factors and risk of incident type 2 diabetes and prognosis among individuals with type 2 diabetes: a systematic review and meta-analysis of prospective cohort studies
Depression in context: Important considerations for youth with type 1 vs type 2 diabetes
Type 2 diabetes
Improving diabetes diagnosis in smart health using genetic-based Ensemble learning algorithm.
Approach to IoT Infrastructure
Fasting blood glucose at admission is an independent predictor for 28-day mortality in patients with COVID-19 without previous diagnosis of diabetes: a multi-centre retrospective study
Diagnosis of diabetes in pregnant woman using a Chaotic-Jaya hybridized extreme learning machine model
Cloud-assisted industrial internet of things (iiot)-enabled framework for health monitoring
Accurate wearable heart rate monitoring during physical exercises using PPG
Federated internet of things and cloud computing pervasive patient health monitoring system
A conceptual framework for trajectory-based medical analytics with IoT contexts
Exploiting smart e-health gateways at the edge of healthcare internet-of-things: a fog computing approach
Cloud-centric IoT based disease diagnosis healthcare framework
Machine learning in the Internet of Things: designed techniques for smart cities
Estimation of the warfarin dose with clinical and pharmacogenetic data
Predicting warfarin dosage from clinical data: a supervised learning approach
Evolutionary ensemble learning algorithm to modeling warfarin dose prediction for Chinese
Stacked regressions
Using stacked generalization to predict membrane protein types based on pseudo-amino acid composition
Machine learning technique to prognosis diabetes disease: random forest classifier approach
Investigating health-related features and their impact on the prediction of diabetes using machine learning
A tongue features fusion approach to predicting prediabetes and diabetes with machine learning
Ensemble learning.
Handb
Ensemble learning
Ensemble learning
Ensemble learning: A survey
Survey of machine learning algorithms for disease diagnostic
Machine learning techniques for classification of diabetes and cardiovascular diseases
Comparative analysis of data mining techniques to predict heart disease for diabetic patients
Analysis and prediction of diabetes using machine learning
Semi-supervised machine learning algorithm for predicting diabetes using big data analytics
Risk prediction of diabetes: big data mining with fusion of multifarious physical examination indicators
Comparison of machine learning algorithms for clinical event prediction (risk of coronary heart disease)
Developing a predictive model for diabetes using data mining techniques
Predictive analysis of diabetic patient data using machine learning and Hadoop
Diabetic patients monitoring and data classification using IoT application
Diabetes prediction using machine learning techniques
Severity classification of diabetic retinopathy using an ensemble learning algorithm through analyzing retinal images
A smart healthcare recommendation system for multidisciplinary diabetes patients with data fusion based on deep ensemble learning
Comparative study of ensemble learning algorithms on early stage diabetes risk prediction
Diabetic retinopathy detection using texture features and ensemble learning
Early detection of type 2 diabetes mellitus using machine learning-based prediction models
Predictive supervised machine learning models for diabetes mellitus
Diabetes detection using machine learning classification methods
Deep learning approach for diabetes prediction using PIMA Indian dataset
Performance comparison of machine learning techniques on diabetes disease detection
A classification system for diabetic patients with machine learning techniques
Deep neural network based ensemble learning algorithms for the healthcare system (diagnosis of chronic diseases)
Deep neural network based ensemble learning algorithms for
the healthcare system (diagnosis of chronic diseases)
Decision tree classifier: a detailed survey
Induction of decision trees
Comparative study of K-NN, naive Bayes and decision tree classification techniques
PCA-NB algorithm to enhance the predictive accuracy
Kidney disease prediction using SVM and ANN algorithms
12 fast training of support vector machines using sequential minimal optimization
Classification and regression by randomForest
An evaluation of Guided Regularized Random Forest for classification and regression tasks in remote sensing
Introduction to machine learning: k-nearest neighbors
Stacked generalization: an introduction to super learning
GIS-based modeling of rainfall-induced landslides using data mining-based functional trees classifier with AdaBoost, Bagging, and MultiBoost ensemble frameworks
Induction of fuzzy-rule-based classifiers with evolutionary boosting algorithms
Boosting theory towards practice: Recent developments in decision tree induction and the weak learning framework
Prediction of weather-induced airline delays based on machine learning algorithms
The boosting approach to machine learning: An overview
Analysis of prediction accuracy of diabetes using classifier and hybrid machine learning techniques
Ensemble learning
A stacked generalization approach for diagnosis and prediction of type 2 diabetes mellitus
Stacking-based multi-objective evolutionary ensemble framework for prediction of diabetes mellitus
A multi-class classification model for supporting the diagnosis of type II diabetes mellitus
Early temporal prediction of type 2 diabetes risk condition from a general practitioner electronic health record: a multiple instance boosting approach
Realizing a stacking generalization model to improve the prediction accuracy of major depressive disorder in adults
Detection and prediction of diabetes using data mining: a comprehensive review
Evolutionary Algorithms and Neural Networks.
Studies in Computational Intelligence
Genetic algorithm
Accurate detection of breast cancer metastasis using a hybrid model of artificial intelligence algorithm
A new model for retinal lesion detection of diabetic retinopathy using hierarchical self-organizing maps
A review of various semi-supervised learning models with a deep learning and memory approach
Heterogeneous transfer learning techniques for machine learning
Classification of diabetic patient data using machine learning techniques
A novel machine learning framework for diagnosing the type 2 diabetics using temporal fuzzy ant miner decision tree classifier with temporal weighted genetic algorithm
Automated detection of diabetic retinopathy using SVM
An efficient decision support model based on ensemble framework of data mining features assortment & classification process
Stacked generalization
Application of a genetic algorithm to feature selection under full validation conditions and to outlier detection

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors declare that they have no conflict of interest.