key: cord-0059995-2rz5aa4k authors: Hammam, Ahmed A.; Elmousalami, Haytham H.; Hassanien, Aboul Ella title: Stacking Deep Learning for Early COVID-19 Vision Diagnosis date: 2020-07-29 journal: Big Data Analytics and Artificial Intelligence Against COVID-19: Innovation Vision and Approach DOI: 10.1007/978-3-030-55258-9_18 sha: 8c5d79c7fc7e071fc17d55e9dacdf0abf50708f0 doc_id: 59995 cord_uid: 2rz5aa4k

Early and accurate COVID-19 diagnosis prediction plays a crucial role in helping radiologists and health care workers take reliable corrective actions for classifying patients and detecting confirmed COVID-19 cases. Prediction and classification accuracy are critical for COVID-19 diagnosis applications. Current practice for COVID-19 image classification is mostly built on convolutional neural networks (CNNs), where a CNN is a single algorithm. Ensemble machine learning models, on the other hand, can produce higher accuracy than a single machine learning model. Therefore, this study applies a stacking deep learning methodology to COVID-19 classification. The stacked ensemble deep learning model achieved 98.6% test accuracy, outperforming every single model evaluated. Accordingly, ensemble machine learning is emerging as a future trend owing to its scalability, stability, and prediction accuracy.

The World Health Organization (WHO) declared the COVID-19 outbreak an international pandemic on 11 March 2020. By 24 April 2020, the number of confirmed cases had risen to 3,042,444, with 211,216 deaths, across more than 209 countries. The numbers of confirmed cases and deaths are growing exponentially worldwide. Therefore, all efforts should be integrated to fight the COVID-19 pandemic. In the age of digital transformation, big data and machine learning (ML) play a key role in processing data into knowledge and decisions [1, 2].

Several means of diagnosis can be applied to identify confirmed cases of COVID-19. Radiological diagnosis includes computed tomography (CT) scans and chest X-ray (CXR) radiographs [3]. However, reading CT scans and X-ray images is time-consuming and exhausting even for expert radiologists. Computer vision and deep learning methods such as convolutional neural networks (CNNs) can effectively assist radiologists in identifying confirmed COVID-19 cases. Based on chest CT scans, radiologists can detect COVID-19 pneumonia and the stage of patient recovery or deterioration. Artificial intelligence models can support early diagnosis of COVID-19 patients by detecting early signs in the lungs within X-ray images.

An Inception-based neural model has been applied to the binary classification of COVID-19-infected versus healthy people using 1119 CT images [4]. Using 6000 CT images, a U-Net++ system can identify COVID-19 patients with 93.55% specificity and 100% sensitivity [5]. A Feature Pyramid Network has been used to distinguish COVID-19 cases with an acceptable overall accuracy of 86.7%. Fully connected layers produced better results, with a sensitivity of 0.93 and an AUC of 0.99 [6, 7].

A convolutional neural network (CNN) is a deep neural network designed for computer vision applications [8]. CNN architectures are commonly applied to biomedical image processing, analysis, and classification. CNNs consist of regularized multilayer perceptrons and filters. The hidden layers of a CNN typically comprise a series of convolutional layers, which slide filters over the image and compute dot products to extract features from each patch of pixels.
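To make the dot-product view of a convolutional layer concrete, the following is a minimal sketch in plain Python. The function name `conv2d_valid`, the 4x4 toy image, and the 2x2 kernel are illustrative assumptions and do not come from the study; a real CNN learns its kernels and applies them across many channels.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Each output value is the dot product between the kernel and the image
    patch currently under it (strictly a cross-correlation, which is what
    deep learning libraries usually call "convolution").
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy single-channel "image": dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the column where the kernel straddles the edge, which is the sense in which a filter "extracts a feature" from the pixels.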
Each convolutional layer has the following parameters: the convolutional kernels, the number of input and output channels, and the depth of the convolution filter. CNNs include pooling layers, which reduce the dimensions of the data by combining a group of neuron outputs into a single value using an aggregate such as the average. A multilayer perceptron (MLP) can be used as the fully connected stage to classify the images, as displayed in Fig. 1.

MobileNets have proved efficient in embedded and mobile vision applications. MobileNets use a streamlined architecture based on depthwise separable convolutions to build lightweight models. They expose global hyperparameters that trade off accuracy against latency [9]. The key advantage of MobileNets is that they run on limited hardware resources by reducing the number of network parameters while maintaining model accuracy. MobileNets need 1/33 of the parameters required by VGG-16 to achieve the same classification accuracy [10].

Inception is a deep convolutional neural network (CNN) architecture whose design is optimized using multi-scale processing and the Hebbian principle [11].

Training deeper neural networks is computationally demanding. Therefore, the residual learning framework was developed to ease the training of substantially deeper models by learning residual functions with reference to the layer inputs, as shown in Fig. 2. The key advantages of residual networks are that they are easier to optimize and that they gain accuracy from increased depth. Residual networks achieved first place in the ILSVRC 2015 classification task [12].

Model ensembling is a technique in which the predictions of a collection of models are given as inputs to a second-stage learning model. Ensemble learning helps improve machine learning results by combining several models. This approach can yield better predictive performance than a single model.
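A small worked calculation can illustrate why combining models may help. The sketch below assumes a binary task, an odd number of classifiers that all have the same accuracy, and fully independent errors. That last assumption is an idealization that CNNs trained on the same images will not satisfy, so the numbers are an upper-bound intuition and not a result from this study. The function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_models, p_correct):
    """Probability that a strict majority of `n_models` classifiers is right.

    Assumes every classifier is correct with probability `p_correct` and
    that their errors are independent (binomial model).
    """
    needed = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p_correct**k * (1 - p_correct) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


for n in (1, 3, 5):
    print(n, round(majority_vote_accuracy(n, 0.7), 5))
# 1 0.7
# 3 0.784
# 5 0.83692
```

Under these assumptions, three classifiers that are each 70% accurate reach 78.4% by majority vote, and five reach about 83.7%. The gain shrinks as the models' errors become correlated.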
Ensemble ML algorithms depend on ensemble voting, such as majority voting, plurality voting, "hard" voting, or "soft" voting. In hard voting, the final class label is the label predicted most frequently by the classification models. In soft voting, the class labels are predicted by averaging the class probabilities. Soft voting is only recommended if the classifiers are well calibrated [13, 14]. Majority voting (hard voting) over $m$ classifiers $C_1, \ldots, C_m$ can be formulated as:

$$\hat{y} = \operatorname{mode}\{C_1(x), C_2(x), \ldots, C_m(x)\}$$

The weighted majority vote can be computed by associating a weight $w_j$ with classifier $C_j$:

$$\hat{y} = \arg\max_i \sum_{j=1}^{m} w_j \, \mathbf{1}\left[C_j(x) = i\right]$$

where $i$ ranges over the class labels, $C_j(x)$ is the outcome of classifier $C_j$, and $\mathbf{1}[\cdot]$ equals 1 when its argument holds and 0 otherwise. In soft voting, the class label is predicted from the probabilities $p_{ij}$ that classifier $C_j$ assigns to class $i$:

$$\hat{y} = \arg\max_i \sum_{j=1}^{m} w_j \, p_{ij}$$

Stacking is a technique for ensembling several classification algorithms, alongside techniques such as bagging and boosting. Stacking (stacked generalization) applies different algorithms, each learning part of the problem space, and combines their outputs. The stacking paradigm can improve overall accuracy beyond that of any individual base learner [15]. As shown in Fig. 3, the implementation of stacking models consists of two main levels. Level-0 trains the base learners (model A, model B, and model C), each of which produces different classifications. Level-1 (the generalizer) collects the classifications of the base learners to make the final classification.

The current research focuses on developing a stacking ensemble deep learning model for early COVID-19 diagnosis prediction, using a deep stacking model to boost the accuracy of single computer vision algorithms. As illustrated in Fig. 5, the research methodology follows Algorithm 1 and the following steps:
• Collecting a data set of X-ray or CT scans for confirmed COVID-19 cases.
• Applying pre-processing techniques to remove missing data.
• Applying different computer vision classification models.
• Comparing the results and performance of the applied models and ranking them using classification evaluation techniques.
• Applying the stacking ensemble deep learning using the best-performing models.
• Comparing the results of the stacking ensemble deep learning against the best single model using classification evaluation techniques.

The data set consists of 500 X-ray images and has been divided into three subsets: a training set (80%), a validation set (10%), and a testing set (10%). Each X-ray image carries one of two labels: 0 for a positive COVID-19 case and 1 for a negative COVID-19 case, as shown in Fig. 4. To compare the machine learning algorithms, identical blind validation cases are used to test the performance of each algorithm. The validation cases are excluded from the training data to ensure generalization capability.

Fig. 4 A sample of X-ray images dataset for normal cases (first row) and COVID-19 patients (second row)

Classification accuracy (Acc), specificity, and sensitivity are scalar measures of classification performance. Moreover, the receiver operating characteristic (ROC) is a graphical measure for classification algorithms [16]. The ROC curve is a two-dimensional graph in which the true positive rate (TPR) is the y-axis and the false positive rate (FPR) is the x-axis. Classification accuracy (Acc) is the ratio of correctly classified instances to the total number of samples; together with sensitivity and specificity it is given by the following equations:

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad \mathrm{Specificity} = \frac{TN}{TN + FP}$$

where TP is the number of true positives, FP the number of false positives, TN the number of true negatives, and FN the number of false negatives. On the ROC graph, perfect classification occurs when the classifier's curve passes through the upper left corner.

The study applies six different deep learning models for COVID-19 classification.
These models are MobileNet, InceptionResNetV2, ResNet50, ResNet50V2, InceptionV3, and VGG16. Table 1 shows the settings used to train each model: the number of epochs was 100, the learning rate was 0.001, and the optimization algorithm was Adam. Figures 6 and 7 illustrate the training accuracy and validation accuracy during training for the applied models. Moreover, the corresponding confusion matrix to each frame [...] superior performance than any single model. The stacked model improves the accuracy of COVID-19 classification by 1.54% over the most accurate of the other applied models.

We proposed a stacked ensemble deep learning model that combines the predictions of multiple deep learning models trained on the same dataset. The models typically differ in architecture, so each is skillful on the dataset but in a different way. Stacking is an ensemble method in which a model learns how best to combine the predictions of several existing models. The stacked ensemble deep learning model achieved a test accuracy of 0.986, outperforming every single model (the best single model reached 0.971). Ensemble learning algorithms could be a future trend for prediction and classification applications where data sets have limited size. Moreover, we plan to explore further ensemble deep learning approaches to produce higher predictive accuracy than single computer vision algorithms for classification.
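The two-level stacking scheme described in this study (Level-0 base learners feeding a Level-1 generalizer) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The Level-0 outputs are synthetic probabilities produced by a hypothetical helper `fake_level0_outputs` in place of real CNN predictions, the generalizer is a plain logistic regression, and the noise levels, sample counts, and function names are all made up for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_generalizer(meta_x, labels, lr=0.5, epochs=1000):
    """Level-1 generalizer: logistic regression fitted by batch gradient descent."""
    n, d = len(meta_x), len(meta_x[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(meta_x, labels):
            err = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            grad_w = [g + err * v for g, v in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def stacked_predict(weights, bias, x):
    """Final label from the base models' probabilities for one sample."""
    return int(sum(w * v for w, v in zip(weights, x)) + bias >= 0.0)


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def fake_level0_outputs(n_samples, noise_levels, rng):
    """Hypothetical stand-in for Level-0 models: each emits a noisy P(label = 1)."""
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    meta_x = []
    for y in labels:
        center = 0.65 if y == 1 else 0.35
        meta_x.append(
            [min(1.0, max(0.0, rng.gauss(center, s))) for s in noise_levels]
        )
    return meta_x, labels


rng = random.Random(0)
noise_levels = [0.20, 0.25, 0.30]  # three base models of differing reliability (assumed)

# Fit Level-1 on predictions for held-out data, never on the base models' training data.
val_x, val_y = fake_level0_outputs(200, noise_levels, rng)
test_x, test_y = fake_level0_outputs(100, noise_levels, rng)

weights, bias = fit_generalizer(val_x, val_y)
stacked_acc = accuracy([stacked_predict(weights, bias, x) for x in test_x], test_y)
base_accs = [
    accuracy([int(x[k] >= 0.5) for x in test_x], test_y)
    for k in range(len(noise_levels))
]
print("base model accuracies:", base_accs)
print("stacked accuracy:", stacked_acc)
```

Fitting the generalizer on predictions for data the base models did not train on is the usual design choice, since otherwise Level-1 learns to trust overfitted Level-0 outputs. In a real pipeline the columns of `val_x` would be the predicted probabilities of the trained CNNs on the validation images.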
[1] Day level forecasting for coronavirus disease (COVID-19) spread: analysis, modeling and recommendations
[2] Comparison of artificial intelligence techniques for project conceptual cost prediction: a case study and comparative analysis
[3] Automatic Xray COVID-19 lung image classification model based on multi-level thresholding and support vector machine
[4] A deep learning algorithm using CT images to screen for corona virus disease (COVID-19)
[5] Deep learning-based model for detecting 2019 novel coronavirus pneumonia on high-resolution computed tomography: a prospective study
[6] Deep learning model to screen coronavirus disease 2019 pneumonia
[7] Deep learning enables accurate diagnosis of novel coronavirus (COVID-19) with CT images
[8] Imagenet classification with deep convolutional neural networks
[9] Mobilenets: efficient convolutional neural networks for mobile vision applications
[10] Research on a surface defect detection algorithm based on
[11] Going deeper with convolutions
[12] Deep residual learning for image recognition
[13] Ensemble methods in machine learning
[14] Empirical analysis of ensemble machine learning techniques for bug Triaging
[15] Scalable stacking and learning for building deep architectures
[16] Classification assessment methods