key: cord-0777491-vt9smefx
authors: Loey, Mohamed; Manogaran, Gunasekaran; Hamed N. Taha, Mohamed; Eldeen M. Khalifa, Nour
title: A Hybrid Deep Transfer Learning Model with Machine Learning Methods for Face Mask Detection in the Era of the COVID-19 Pandemic
date: 2020-07-28
journal: Measurement (Lond)
DOI: 10.1016/j.measurement.2020.108288
sha: 70ab957d4fc26fc2186be36fca0d1a1a76d3103e
doc_id: 777491
cord_uid: vt9smefx

The coronavirus COVID-19 pandemic is causing a global health crisis. According to the World Health Organization (WHO), one of the effective protection methods is wearing a face mask in public areas. In this paper, a hybrid model using deep and classical machine learning for face mask detection is presented. The proposed model consists of two components. The first component is designed for feature extraction using ResNet-50, while the second component is designed for the classification of face masks using decision trees, Support Vector Machine (SVM), and an ensemble algorithm. Three masked face datasets have been selected for investigation: the Real-World Masked Face Dataset (RMFD), the Simulated Masked Face Dataset (SMFD), and the Labeled Faces in the Wild (LFW). The SVM classifier achieved 99.64% testing accuracy in RMFD. In SMFD, it achieved 99.49%, while in LFW, it achieved 100% testing accuracy.

The trend of wearing face masks in public is rising all over the world due to the COVID-19 coronavirus epidemic. Before COVID-19, people used to wear masks to protect their health from air pollution. Others, self-conscious about their looks, hide their emotions from the public by hiding their faces. Scientists have shown that wearing face masks impedes COVID-19 transmission [1]. COVID-19 (known as coronavirus) is the latest epidemic virus to hit human health in the last century [2].
In 2020, the rapid spread of COVID-19 forced the World Health Organization to declare COVID-19 a global pandemic. According to [3], more than five million people were infected by COVID-19 in less than 6 months across 188 countries. The virus spreads through close contact and in crowded areas. The coronavirus epidemic has given rise to an extraordinary degree of worldwide scientific cooperation. Artificial Intelligence (AI) based on machine learning and deep learning can help to fight COVID-19 in many ways. Machine learning allows researchers and clinicians to evaluate vast quantities of data to forecast the distribution of COVID-19, to serve as an early warning mechanism for potential pandemics, and to classify vulnerable populations. The provision of healthcare needs funding for emerging technologies such as artificial intelligence, IoT, big data, and machine learning to tackle and predict new diseases. The power of AI is being exploited to address the COVID-19 pandemic [4], to better understand infection rates and to trace and quickly detect infections, for example through the detection of COVID-19 in medical chest X-rays [5].

Policymakers face many challenges and risks in dealing with the spread and transmission of COVID-19 [6]. In many countries, people are required by law to wear face masks in public. These rules and laws were developed in response to the exponential growth in cases and deaths in many areas. However, monitoring large groups of people is becoming more difficult. The monitoring process involves detecting anyone who is not wearing a face mask. In France, to guarantee that riders wear face masks, new AI software tools have been integrated into the surveillance cameras of the Paris Metro system [7].
The French startup Datakalab [8], which developed the software, reports that the goal is not to recognize or arrest people who do not wear masks but to produce anonymous statistical data that can help the authorities predict potential outbreaks of COVID-19.

In this paper, we introduce a face mask detection model that is based on deep transfer learning and classical machine learning classifiers. The proposed model can be integrated with surveillance cameras to impede COVID-19 transmission by detecting people who are not wearing face masks. The model integrates deep transfer learning with classical machine learning algorithms. We used deep transfer learning for feature extraction and combined it with three classical machine learning algorithms. We compared them to find the most suitable algorithm, that is, the one that achieves the highest accuracy and consumes the least time in training and detection. The novelty of this research lies in a feature extraction model with an end-to-end structure, free of traditional techniques, combined with three classical machine learning classifiers for face mask detection.

The rest of the paper is organized as follows. Section 2 reviews previous related works. Section 3 describes the characteristics of the dataset. Section 4 illustrates the proposed model in detail. Section 5 reports and analyses the experimental results, and Section 6 presents the conclusions and possibilities of future work.

Generally, most publications focus on face construction and identity recognition when wearing face masks. In this research, our focus is on recognizing people who are not wearing face masks to help decrease the transmission and spread of COVID-19. Researchers and scientists have shown that wearing face masks helps minimize the spreading rate of COVID-19. In [9], the authors developed a new facemask-wearing condition identification method.
They were able to classify three categories of facemask-wearing conditions: correct facemask-wearing, incorrect facemask-wearing, and no facemask-wearing. The proposed method achieved 98.70% accuracy in the face detection phase. Sabbir et al. [10] applied Principal Component Analysis (PCA) to masked and unmasked face recognition to recognize the person. They found that the accuracy of face recognition using PCA is extremely affected by wearing masks. The recognition accuracy drops to less than 70% when the recognized face is masked. PCA was also used in [11]. The authors proposed a method for removing glasses from a human frontal facial image. The removed part was reconstructed using recursive error compensation with PCA reconstruction. In [12], the authors used the YOLOv3 algorithm for face detection. YOLOv3 uses Darknet-53 as the backbone. The proposed method achieved 93.9% accuracy. It was trained on the CelebA and WIDER FACE datasets, including more than 600,000 images. Testing was performed on the FDDB dataset. Nizam et al. [13] proposed a novel GAN-based network that can automatically remove masks covering the face area and regenerate the image by filling in the missing hole. The output of the proposed model is a complete face image that looks natural and realistic. In [14], the authors presented a system for detecting the presence or absence of a compulsory medical mask in the operating room. The overall objective is to minimize false positive face detections as much as possible without missing mask detections, in order to trigger alarms only for medical staff who do not wear a surgical mask. The proposed system achieved 95% accuracy. Muhammad et al. [15] presented an interactive method called MRGAN. The method depends on getting the microphone area from the user and using a Generative Adversarial Network to rebuild this area. Shaik et al. [16] used deep learning for real-time face emotion classification and recognition.
They used VGG-16 to classify seven facial expressions. The proposed model was trained on the KDEF dataset and achieved 88% accuracy.

This research conducted its experiments on three original datasets. The first dataset is the Real-World Masked Face Dataset (RMFD) [17]. The author of RMFD created one of the biggest masked face datasets, which is used in this research. The RMFD dataset consists of 5,000 masked faces and 90,000 unmasked faces. Figure 1 illustrates samples of faces with and without masks. To balance the dataset, 5,000 images of faces with masks and 5,000 images of faces without masks were used in this research, for a total of 10,000 images. The RMFD dataset is used for the training, validation, and testing phases. The second dataset is the Simulated Masked Face Dataset (SMFD) (Figure 2). The SMFD dataset is used for the training, validation, and testing phases. The third dataset used in this research is the Labeled Faces in the Wild (LFW) [19]. It is a simulated masked face dataset that contains 13,000 masked faces of celebrities around the world. Figure 3 illustrates samples of LFW images. The LFW dataset is used for the testing phase only, as a benchmark testing dataset on which the proposed model was never trained.

layer, and ends with a fully-connected layer; in between are 16 residual bottleneck blocks, each with three convolution layers, as shown in Figure 5. For classification, the last layer in ResNet-50 was removed and replaced with three traditional machine learning classifiers (Support Vector Machine (SVM), decision tree, and ensemble) to improve the model performance. The main contribution of this research is to construct SVM, decision tree, and ensemble classifiers that do not overfit during the training process. SVM is one of the most popular supervised learning techniques, with associated learning algorithms for classification and regression tasks.
SVM is a classification machine learning algorithm based on the hinge function, as shown in Equation 1, where z is a label from 0 to 1, w and b are the coefficients of the linear classifier that produces the output, and I is an input vector. The loss function to be minimized is given in Equation 2 [23], [24]. The decision tree is a classification model based on the entropy function and information gain. Entropy measures the amount of uncertainty in the data, as shown in Equation 3, where D is the current data, q is a binary label from 0 to 1, and p(x) is the proportion of the q label. To measure the difference in entropy across the data, we calculate the information gain (I), as illustrated in Equation 4, where v is a subset of the data [25], [26]. Ensemble methods are machine learning algorithms that create a collection of classifiers. An ensemble of classifiers is a collection of classifiers whose individual decisions are merged in one way or another (usually by weighted or unweighted voting) to classify new instances [27]. The ensemble methods used are the K-Nearest Neighbors algorithm (k-NN) [28], Linear Regression [29], and Logistic Regression [30].

All the experimental trials have been conducted on a computer server equipped with an Intel Xeon processor.
• Three original datasets and one combined dataset:
─ A dataset of RMFD with real face masks for the training and testing phases, referred to as DS1.
─ A dataset of SMFD with fake face masks for the training and testing phases, referred to as DS2.
─ A combined dataset from DS1 and DS2 for the training and testing phases, referred to as DS3.
─ A dataset of LFW with simulated face masks for testing only, referred to as DS4.
• Datasets used for training and testing are split into 70% for training, 10% for validation, and 20% for testing.

To evaluate the performance of the different classifiers, performance metrics need to be investigated in this research.
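As a rough illustration of the classifier formulas described above (the hinge function for the SVM, and entropy and information gain for the decision tree), the following minimal Python sketch may help. It is not the authors' implementation. It assumes the common textbook forms: labels of -1 or +1 for the hinge loss (the text describes labels from 0 to 1), a linear output computed as w·I + b, and entropy measured in bits. All function names are hypothetical.

```python
import math
from collections import Counter


def linear_score(w, x, b):
    """Output of a linear classifier, assumed here to be dot(w, x) + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def hinge_loss(label, score):
    """Hinge loss for one sample; label is assumed to be -1 or +1."""
    return max(0.0, 1.0 - label * score)


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def information_gain(labels, subsets):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    total = len(labels)
    weighted = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(labels) - weighted


# Toy usage: 1 = mask, 0 = no mask (illustrative values only).
labels = [1, 1, 0, 0]
print(hinge_loss(1, linear_score([1.0, -1.0], [2.0, 0.5], 0.5)))
print(entropy(labels))
print(information_gain(labels, [[1, 1], [0, 0]]))
```

A split that separates the two classes perfectly removes all uncertainty, so its information gain equals the parent entropy; a split that leaves both subsets evenly mixed has zero gain.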
The most common performance measures to be calculated are Accuracy, Precision, Recall, and F1 Score [32], and they are presented in Equation (6) to Equation (9).

[38] to improve image classification accuracy, but the classification accuracy was not acceptable.

The experimental results are presented in five subsections. The first subsection discusses the results for the decision trees classifier, while the second subsection introduces the results for the SVM classifier. The third subsection presents the results for the ensemble classifier. The fourth subsection illustrates the confusion matrices for the different classifiers. Finally, the fifth subsection gives a comparative analysis with related works according to the testing accuracy.

As mentioned earlier in the experimental setup, three datasets (DS1, DS2, and DS3) are used for training, validation, and testing. DS4 is used for testing only. Figure 6 illustrates the results for the decision trees classifier in the validation phase for the different datasets. The highest validation accuracy with performance metrics, 98%, was achieved on DS3. DS3 is a combined dataset from DS1 and DS2. DS3 is large in terms of the number of images, which helps achieve better accuracy, since more data generally yields better accuracy in machine learning [39]. Although training time varies from one machine to another, it is a good indicator of the performance of the classifier [40]. Figure 7 illustrates the time consumed by the decision trees classifier in the training process for the different datasets.

Different testing strategies have been tested in this research, and they are summarized as follows:
• Training over DS1, and testing over DS1, DS2, DS3, and DS4.
• Training over DS2, and testing over DS1, DS2, DS3, and DS4.
• Training over DS3, and testing over DS1, DS2, DS3, and DS4.
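The testing strategies listed above, scored with the four performance measures named earlier in this section, can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' code. The metrics use the standard confusion-count definitions, the positive class is assumed to be label 1, and the threshold classifier and one-dimensional toy data are hypothetical stand-ins for the real classifiers and ResNet-50 features.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def evaluate_strategies(train_sets, test_sets, fit, predict):
    """Train once per training set, then score the model on every test set."""
    results = {}
    for train_name, (x_train, y_train) in train_sets.items():
        model = fit(x_train, y_train)
        for test_name, (x_test, y_test) in test_sets.items():
            y_pred = predict(model, x_test)
            results[(train_name, test_name)] = classification_metrics(y_test, y_pred)
    return results


def fit_threshold(xs, ys):
    """Hypothetical stand-in classifier: midpoint between the class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, xs):
    """Predict label 1 when the feature exceeds the learned threshold."""
    return [1 if x > threshold else 0 for x in xs]


# Toy 1-D "features"; a real run would use held-out test splits instead of
# reusing the training samples as done here for brevity.
ds1 = ([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
ds2 = ([0.3, 0.4, 0.6, 0.7], [0, 0, 1, 1])
ds3 = (ds1[0] + ds2[0], ds1[1] + ds2[1])  # combined dataset
ds4 = ([0.0, 1.0], [0, 1])                # testing only

train_sets = {"DS1": ds1, "DS2": ds2, "DS3": ds3}
test_sets = {"DS1": ds1, "DS2": ds2, "DS3": ds3, "DS4": ds4}
results = evaluate_strategies(train_sets, test_sets, fit_threshold, predict_threshold)
for (train_name, test_name), scores in sorted(results.items()):
    print(f"train {train_name} -> test {test_name}: accuracy {scores['accuracy']:.2f}")
```

Keeping the classifier behind `fit` and `predict` callables means the same loop could, in principle, be reused for the decision trees, SVM, and ensemble classifiers, giving one row of scores per training set and one column per testing set.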
Figure 8 illustrates the testing accuracy and performance metrics for the different testing strategies for the decision trees classifier. Figure 8 shows notable results: 1) when trained over DS1, the decision trees classifier achieved a poor classification accuracy of 68% on DS2, as DS2 contains many variations of fake masks. This is also reflected in DS3, which is a combined dataset from DS1 and DS2. 2) when trained over DS2, the decision trees classifier achieved 93% on DS1, which contains real masks. 3) when trained over DS3, the decision trees classifier achieved the highest accuracy with performance metrics on all datasets. All the achieved results are above 95%. 4) on DS4, which is used only for testing and was never trained on, the decision trees classifier achieved a competitive accuracy of 99% regardless of whether the training was performed over DS1, DS2, or DS3. From this subsection, we conclude that the decision trees classifier achieved its highest accuracy when the training was performed over DS3. The highest testing accuracy for DS1, DS2, DS3, and DS4 was 96.78%, 95.64%, 96.5%, and 99.89%, respectively.

The same experimental trials that were conducted using the decision trees classifier are performed on the SVM classifier. Figure 9 presents the validation accuracy and performance metrics for the SVM classifier for the different datasets.
Fig. 9. SVM classifier validation accuracy with performance metrics for the different datasets.
Figure 9 shows that the SVM classifier achieved a higher validation accuracy than the decision trees classifier for all datasets. In DS1, SVM achieved 98% validation accuracy while decision trees achieved 93%. In DS2, the SVM classifier achieved 100% while decision trees achieved 96%. In DS3, SVM achieved 99% while decision trees achieved 98%.
The SVM classifier surpasses the decision trees classifier in validation accuracy and in the other performance metrics. One more notable remark is that training over DS2 achieved the highest possible validation accuracy of 100%, while for the decision trees classifier the highest validation accuracy was 98%, in DS3. The consumed time is also an essential factor in evaluating the performance of the classifier, and Figure 10 illustrates the time consumed by the SVM classifier for the different datasets.

Figure 11 shows acceptable results: 1) the behavior of the SVM classifier is similar to that of the decision trees classifier, but the SVM classifier achieves a higher testing accuracy. When trained over DS1, the SVM classifier achieved 82% testing accuracy on DS2, while the decision trees classifier achieved 68%. When trained over DS2, the SVM classifier achieved accuracies over 97% for all datasets, while the decision trees classifier achieved accuracies over 93%. The same pattern holds for training over DS3: the SVM classifier achieved accuracies over 98% for all datasets, while the decision trees classifier achieved accuracies over 95%. 2) on DS4, which is used only for testing and was never trained on, the SVM classifier achieved a higher accuracy of 99% regardless of whether the training was performed over DS1, DS2, or DS3. From this subsection, we conclude from the achieved results that the SVM classifier is better than the decision trees classifier in terms of validation accuracy, testing accuracy, performance metrics, and consumed time. The highest testing accuracy for DS1 was achieved by training over DS3, with 99.4%. For DS2, the highest accuracy was achieved by training over DS2, with 99.49%. For DS3, the highest accuracy was achieved by training over DS3, with 99.19%, and for DS4, the highest testing accuracy was achieved by training over DS3, with 100%.
The same experimental trials that were conducted on the decision trees and SVM classifiers are performed on the ensemble classifier. Figure 12 presents the validation accuracy and performance metrics for the ensemble classifier for the different datasets. From the achieved results, we can conclude that the ensemble classifier is not competitive at all in terms of consumed time. This is due to the nature of the ensemble classifier, which tries all possible classifiers to find those that achieve the highest accuracy and therefore takes a long time compared to the other classifiers. Relevant results appear in Figure 14, and they are 1) on the training over DS1, the ensemble classifier

Confusion matrices offer another useful insight into the performance of the classifiers. Training over the combined dataset (DS3) is the most appropriate choice to achieve the highest accuracy possible for the different classifiers. The confusion matrices for decision trees are not included in this section, as that classifier reached the lowest testing accuracy. Figure 15 illustrates the confusion matrices for the SVM classifier in the testing phase for DS1, DS2, and DS3 when the training is over DS3. The achieved testing accuracy is 99.4% for DS1, 98.7% for DS2, and 99.2% for DS3. Figure 16 illustrates the confusion matrices for the ensemble classifier in the testing phase for DS1, DS2, and DS3 when the training is over DS3.
• All testing accuracy results for training over DS3 are very close, with only a 0.01% difference for DS3 between the SVM and the ensemble classifier.
• For the testing accuracy on DS4 when training over DS3, the SVM classifier achieved 100%, the same result as the ensemble classifier.
• The SVM classifier consumes less time in training as a performance indicator.

The work presented in [17] used the same datasets, which include the real masked dataset RMFD (DS1) and the fake masked dataset LFW (DS4).
The authors of [17] achieved a testing accuracy ranging from 50% to 95%. In the presented work, the testing accuracy for DS1 ranges from 93.44% using the decision tree classifier to 99.64% using the ensemble classifier. For DS4, the testing accuracy ranges from 99.76% using the decision tree classifier to 100% using the SVM classifier. For the fake masked dataset SMFD (DS2), there is no reported accuracy according to the author of the dataset (https://github.com/prajnasb/observations). In this work, we report an accuracy ranging from 94.54% using the decision tree classifier to 99.49% using the SVM classifier. For the combined masked dataset (DS3), there is no reported accuracy in related works, as it is introduced in this work; we report an accuracy ranging from 96.50% using the decision tree classifier to 99.35% using the SVM classifier. As a future study, we plan to approach masked face detection from a neutrosophic environment with deep transfer learning models.

The coronavirus COVID-19 pandemic is causing a global health crisis. Governments all over the world are struggling to stand against this type of virus. According to the World Health Organization (WHO), protection from infection caused by COVID-19 is a necessary countermeasure. In this paper, a hybrid model using deep and classical machine learning for face mask detection was presented. The proposed model consisted of two parts. The first part performed feature extraction using ResNet-50, one of the popular models in deep transfer learning, while the second part performed the detection of face masks using classical machine learning algorithms. The Support Vector Machine (SVM), decision trees, and ensemble algorithms were selected as the traditional machine learning methods for investigation.
Three datasets were experimented on, and different training and testing strategies were adopted through this

Rational use of face masks in the COVID-19 pandemic
COVID-19: Face masks and human-to-human transmission
WHO Coronavirus Disease (COVID-19) Dashboard
Digital technology and COVID-19
Within the Lack of Chest COVID-19 X-ray Dataset: A Novel Detection Model Based on GAN and Deep Transfer Learning
What policy makers need to know about COVID-19 protective immunity
Paris Tests Face-Mask Recognition Software on Metro Riders
Datakalab | Analyse de l'image par ordinateur
Identifying Facemask-wearing Condition Using Image Super-Resolution with Classification Network to Prevent COVID-19
Implementation of Principal Component Analysis on Masked and Non-masked Face Recognition
Glasses removal from facial image using recursive error compensation
Face Detection Based on YOLOv3
A Novel GAN-Based Network for Unmasking of Masked Face
System for Medical Mask Detection in the Operating Room Through Facial Attributes
Interactive Removal of Microphone Object in Facial Images
A real time face emotion classification and recognition using deep learning model
Masked face recognition dataset and application
Labeled Faces in the Wild: A Survey
Exudate detection in fundus images using deeply-learnable features
A transfer convolutional neural network for fault diagnosis based on ResNet-50
Deep Residual Learning for Image Recognition
Feature Extraction Based on Deep Learning for Some Traditional Machine Learning Methods
Feature Extraction using Convolution Neural Networks (CNN) and Deep Learning
Overview of use of decision tree algorithms in machine learning
A new decision-tree classification algorithm for machine learning
Ensemble Learning
Wind power forecasting using the k-nearest neighbors algorithm
Linear Regression for Face Recognition
Logistic regression
A deep learning-based multi-model ensemble method for cancer prediction
A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation
CNN for Handwritten Arabic Digits Recognition Based on LeNet-5, in Proceedings of the International Conference on Advanced Intelligent Systems and Informatics
Arabic Handwritten Characters Recognition Using Convolutional Neural Network
Deep bacteria: robust deep learning data augmentation design for limited bacterial colony dataset
Aquarium Family Fish Species Identification System Using Deep Neural Networks
Artificial Intelligence Technique for Gene Expression by Tumor RNA-Seq Data: A Novel Optimized Deep Learning Approach
Deep Transfer Learning in Diagnosing Leukemia in Blood Cells
Impact of dataset size and variety on the effectiveness of deep learning and transfer learning for plant disease classification
Effectiveness analysis of machine learning classification models for predicting personalized context-aware smartphone usage