key: cord-0712550-onicoabk
authors: Sheeba Rani, S.; Selvakumar, S.; Pradeep Mohan Kumar, K.; Thanh Tai, Duong; Dhiravida Chelvi, E.
title: Internet of Medical Things (IoMT) with machine learning–based COVID-19 diagnosis model using chest X-ray images
date: 2021-05-21
journal: Data Science for COVID-19
DOI: 10.1016/b978-0-12-824536-1.00001-0
sha: 4d3dc0e5e231e43ce04c50a56231817063076e33
doc_id: 712550
cord_uid: onicoabk

The outbreak of COVID-19 in Wuhan, China spread to other parts of the world at a drastic rate. COVID-19 is classically diagnosed by a reverse-transcription polymerase chain reaction test on a blood sample. However, this test has limitations related to its sensitivity, the availability of tests, and the turnaround time for results. To address these issues, artificial intelligence techniques can diagnose COVID-19 from computed tomography scans and investigate radiological features for accurate COVID-19 diagnosis. This chapter presents a new Internet of Medical Things–based COVID-19 diagnosis model using different machine learning–based classification models on chest X-rays. The proposed model first collects patient samples using Internet of Things devices and transfers the data to the cloud server, where the actual diagnosis takes place. Once diagnosis is completed, the report is transferred to the concerned health care centers for further processing. Diagnosis involves a series of processes: preprocessing, texture feature extraction, and classification. The performance of the proposed model has been validated using a chest X-ray dataset. Analysis of the experimental results indicated that the AdaBoost with random forest model is superior to the other models, with a maximum accuracy of 90.13%, F score of 90.28%, kappa value of 89.59%, and Matthews correlation coefficient (MCC) of 87.44%. The results demonstrate that the proposed model is effective for the diagnosis of COVID-19 along with severe acute respiratory syndrome compared with other methods.

The standard test for coronavirus disease 2019 (COVID-19) is the real-time reverse-transcription polymerase chain reaction (rRT-PCR) assay, a typical molecular assay that takes a long time to produce a result. Although the assay has been applied extensively, it depends on a well-equipped laboratory and expert staff, and it is time-consuming [1–5]. Because the COVID-19 outbreak has been difficult to contain, a great number of lives are at risk, which has strained medical services and caused global panic. Testing that relies only on rRT-PCR is not sufficient to manage the disease because COVID-19 has many asymptomatic cases [6].

Another option for COVID-19 diagnosis is point-of-care (POC) tools based on the lateral flow immunoassay (LFIA) method, which are mainly used to detect COVID-19 in humans [7, 8]. Immunoglobulin (Ig)G and IgM antibodies against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) can be detected in human serum once COVID-19 infection has occurred. The detection of such antibodies provides data on the development and phase of viral infection. Because there are many confirmed cases, various POC-LFIA tools have the ability to detect IgG and IgM levels, which makes them suitable as screening devices for COVID-19, and they can be applied to satisfy the emergency requirement for additional, robust diagnostic tools [8–12]. A Conformité Européenne-certified coronavirus rapid test from Sure Biotech, USA, can detect IgG/IgM antibodies in whole blood within a given time interval [13]. Other LFIA diagnostics that work on whole blood include the GenBody COVID-19 IgM/IgG device from GenBody, Korea; the COVID-19 IgM/IgG rapid test from BioMedomics, USA; and the Coronavirus COVID-19 Test–Home Self Test Kit from The Hive Pharmacy, United Kingdom. These tests detect COVID-19-related antibodies present in blood. Tan et al. identified strong positive results using specimens such as nucleocapsid (N) and spike (S) proteins gathered with nasal or pharyngeal swabs [14].

COVID-19 is very contagious. It requires people to undergo lockdowns and remain at home. The need for medical supplies and devices is likely to outpace the capacity for fast and effective diagnosis. Patients have to visit hospitals for medical treatment, which works against lockdown efforts intended to slow disease transmission. In addition, limited isolation wards and medical tools have prompted medical sectors to post signs warning people to stay inside their homes. A home-based diagnostic test offers a valid solution for these emergency requirements. The Internet of Medical Things (IoMT) is an extended version of the Internet of Things (IoT) [15–17]. Models embedded in the IoMT are applied to develop a medical environment that guides patients to appropriate treatment and builds an extensive disease management database for government and health care sectors, as shown in Fig. 34.1. Patients can routinely report their health condition to an IoMT environment through the Internet, and the data are transmitted to nearby clinics, the Centers for Disease Control and Prevention, and state and local health bureaus [18–21]. Hospitals should provide online health solutions according to the patient's health status, and the government should designate equipment and allocate quarantine places such as hotels and theaters. With the IoMT platform, users can monitor the disease level and obtain proper medical remedies without transmitting the virus to each other. This limits national health costs, reduces stress on medical tools, and offers a systematic database that enables the government to limit the spread of disease, distribute supplies, and carry out immediate legislation.

In past decades, POC tools for infectious diseases were developed mainly for low-resource settings [22–24]. They are therefore useful during contagious disease outbreaks such as the COVID-19 pandemic. During self-quarantine in an emergency, developing, producing, and deploying a supportive environment that integrates home screening, POC tools, and the IoMT makes it possible to diagnose and monitor the disease [25, 26]. A rapid and robust medical platform plays a significant part in reducing risk and saving lives.

An extensive clinical platform has become essential. Limited medical resources and the need for self-quarantine strategies call for POC, home-based diagnostic tools that minimize testing overhead and allow people to stay indoors. Integrating these tools with the IoMT supplies data to health care providers and serves as a practical tool for government sectors. Hence, this approach can be applied to save many lives, protect strained finances, and develop new capabilities that extend lives [27–29].

This chapter presents an IoMT-based COVID-19 diagnosis model using different machine learning (ML)-based classification models. First, data are collected from patients using IoT devices and transferred to the cloud. The diagnosis of COVID-19 then follows a step-by-step procedure of preprocessing, feature extraction, and classification. A set of four ML models is used: artificial neural network (ANN), decision tree (DT), support vector machine (SVM), and AdaBoost with random forest (AB-RF). A detailed set of simulations is carried out to verify the effectiveness of the applied classifier models.

The overall working procedure of the technique is depicted in Fig. 34.2. The proposed model is composed of preprocessing, feature extraction, and classification. Once the data have been collected by the IoMT devices and transferred to the cloud, preprocessing is performed first to improve the image quality. Next, texture features are extracted, and then classification is executed.

In general, texture is a repeating pattern of local variation in image intensity. Here, the gray-level co-occurrence matrix (GLCM) is a statistical technique applied to analyze textures. Its use rests on the hypothesis that the same gray-level configuration is repeated throughout a texture. This pattern varies more rapidly in fine textures than in coarse textures. The matrix entry $P(m, n \mid d, \theta)$ counts the co-occurrences of pixel pairs with gray values $m$ and $n$ separated by distance $d$ in direction $\theta$ [30].
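The counting step can be illustrated with a minimal pure-Python sketch. It assumes an image stored as a list of rows of integer gray levels and one of the four standard directions (0°, 45°, 90°, 135°). The function name `glcm`, the `symmetric` option, and the normalization to probabilities are illustrative assumptions, not details given in the chapter.

```python
import math

def glcm(image, d=1, theta_deg=0, levels=None, symmetric=False):
    """Normalized co-occurrence matrix P(m, n | d, theta) of a small image.

    image: list of rows holding integer gray levels in [0, levels).
    All names and defaults here are hypothetical choices for illustration.
    """
    rows, cols = len(image), len(image[0])
    if levels is None:
        levels = max(max(row) for row in image) + 1
    # Pixel offset for the chosen direction; row indices grow downward.
    t = math.radians(theta_deg)
    dr = -int(round(d * math.sin(t)))
    dc = int(round(d * math.cos(t)))
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m, n = image[r][c], image[r2][c2]
                counts[m][n] += 1          # pair (m, n) seen once more
                if symmetric:
                    counts[n][m] += 1      # also count the reversed pair
    total = sum(map(sum, counts))
    if total == 0:
        return [[0.0] * levels for _ in range(levels)]
    return [[v / total for v in row] for row in counts]

# Toy 4 x 4 image with four gray levels (made-up data).
toy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
P = glcm(toy, d=1, theta_deg=0)
```

For the toy image, 12 horizontal neighbor pairs exist at distance 1, so each entry of `P` is a count divided by 12.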

Therefore, the values of the GLCM-derived features and the ranking of texture features depend on the number of gray levels employed. Here, $\mu$ denotes the mean value of $P$; $\mu_x$, $\mu_y$, $\sigma_x$, and $\sigma_y$ are the means and standard deviations of $P_x$ and $P_y$; and $G$ is the size of the co-occurrence matrix. The GLCM features were obtained using Eqs. (34.1)–(34.14).

The angular second moment (ASM) measures the homogeneity of an image. A homogeneous image contains only a few gray levels. Its GLCM therefore has few entries, each with a relatively high value of $P(m, n)$. Hence, the sum of squares is high:

$$\mathrm{ASM} = \sum_{m=0}^{G-1} \sum_{n=0}^{G-1} P(m, n)^2 \tag{34.1}$$

Contrast is a measure of the local variations of an image. It weights the entries $P(m, n)$ by their distance from the main diagonal, where $m = n$. When an image has large local variation, the $P(m, n)$ values are concentrated away from the main diagonal and contrast is high.

The inverse difference moment (IDM) is affected by the homogeneity of an image. Because of the weighting factor $\left(1 + (m - n)^2\right)^{-1}$, the IDM receives only small contributions from dissimilar regions ($m \neq n$). The result is a low IDM value for dissimilar images and a high value for homogeneous images:

$$\mathrm{IDM} = \sum_{m=0}^{G-1} \sum_{n=0}^{G-1} \frac{P(m, n)}{1 + (m - n)^2} \tag{34.3}$$

Entropy measures the complexity of the image. Complex textures result in high entropy. It is strongly but inversely related to energy:

$$\mathrm{Entropy} = -\sum_{m=0}^{G-1} \sum_{n=0}^{G-1} P(m, n) \times \log\left(P(m, n)\right) \tag{34.4}$$

Correlation measures the linear dependency of gray levels between pixels at the specified positions relative to one another. Correlation is high when an image contains a considerable amount of linear structure.

The variance is a measure of the dispersion of gray-level values about the mean at a specific distance, $d$. It gives relatively high weights to elements that differ from the mean of $P(m, n)$:

$$\mathrm{Variance} = \sum_{m=0}^{G-1} \sum_{n=0}^{G-1} (m - \mu)^2 P(m, n) \tag{34.6}$$

Difference entropy (DE) is a measure of histogram content as well as of the differences between two images. When two images are the same, the DE may be either at a maximum or at a minimum. Here, $P_x(m)$ denotes the $m$th entry of the marginal probability matrix obtained by summing the rows of $P(m, n)$, and $P_y(n)$ is obtained by summing the columns of $P(m, n)$.

Inertia shows the distributive characteristics of gray-scale images:

$$\mathrm{Inertia} = \sum_{m=0}^{G-1} \sum_{n=0}^{G-1} (m - n)^2 \times P(m, n) \tag{34.8}$$

The image is nonidentical if shade is at a maximum; shade is defined as:

$$\mathrm{Shade} = \sum_{m=0}^{G-1} \sum_{n=0}^{G-1} \left(m + n - \mu_x - \mu_y\right)^3 \times P(m, n) \tag{34.9}$$

The image is asymmetric if prominence is at a maximum.

The energy of a texture describes how uniform the texture is. It is 1 for a constant image:

$$\mathrm{Energy} = \sum_{m=0}^{G-1} \sum_{n=0}^{G-1} P(m, n)^2 \tag{34.11}$$

Homogeneity measures how close the distribution of elements in the GLCM is to the GLCM diagonal. A homogeneous image results in a co-occurrence matrix with a combination of high and low $P(m, n)$ values, whereas a heterogeneous image tends to produce an even spread of $P(m, n)$ values:

$$\mathrm{Homogeneity} = \sum_{m=0}^{G-1} \sum_{n=0}^{G-1} \frac{P(m, n)}{1 + |m - n|} \tag{34.12}$$

Dissimilarity is the value of symmetry between two sets:

$$\mathrm{Dissimilarity} = \sum_{m=0}^{G-1} \sum_{n=0}^{G-1} |m - n| \, P(m, n) \tag{34.13}$$

The difference in variance measures the total variation between the intensity of a reference pixel and that of its neighboring pixels:

$$\mathrm{Difference\ variance} = \sum_{m=0}^{G-1} \sum_{n=0}^{G-1} (m - \mu)^2 P(m, n) \tag{34.14}$$
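As an illustration, several of the features above (entropy, inertia, shade, energy, homogeneity, and dissimilarity) can be computed from a normalized co-occurrence matrix with a few lines of Python. This is a minimal sketch under stated assumptions: the natural logarithm is used, zero entries are skipped in the entropy sum, and $\mu_x$ and $\mu_y$ are taken as the means of the row and column marginals. The function name `glcm_features` is hypothetical.

```python
import math

def glcm_features(P):
    """Selected texture features of a normalized G x G co-occurrence matrix P.

    Assumptions (not stated in the chapter): natural log, 0 * log(0) = 0,
    and mu_x / mu_y computed from the row / column marginals of P.
    """
    G = len(P)
    cells = [(m, n, P[m][n]) for m in range(G) for n in range(G)]
    mu_x = sum(m * p for m, n, p in cells)   # mean of the row marginal
    mu_y = sum(n * p for m, n, p in cells)   # mean of the column marginal
    return {
        "entropy": -sum(p * math.log(p) for _, _, p in cells if p > 0),
        "inertia": sum((m - n) ** 2 * p for m, n, p in cells),
        "shade": sum((m + n - mu_x - mu_y) ** 3 * p for m, n, p in cells),
        "energy": sum(p * p for _, _, p in cells),
        "homogeneity": sum(p / (1 + abs(m - n)) for m, n, p in cells),
        "dissimilarity": sum(abs(m - n) * p for m, n, p in cells),
    }

# Two made-up 2 x 2 matrices: mass on the diagonal vs. off the diagonal.
diagonal = [[0.5, 0.0], [0.0, 0.5]]
off_diagonal = [[0.0, 0.5], [0.5, 0.0]]
f_diag = glcm_features(diagonal)
f_off = glcm_features(off_diagonal)
```

In this toy case the diagonal matrix gives inertia 0 and homogeneity 1, whereas the off-diagonal matrix gives inertia 1 and homogeneity 0.5, which matches the qualitative descriptions of these features.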

The SVM classifier is an effective supervised classification model. Its underlying statistical learning theory was proposed by Vapnik in 1982 [31]. It gives good results in diverse applications such as clinical diagnosis. SVM is based on the structural risk minimization principle from statistical learning theory. The main aim of this method is to control the empirical risk as well as the classification capacity so as to maximize the margin between the classes and minimize the true cost. The SVM searches for the best hyperplane separating members from nonmembers of a given class in a high-dimensional feature space. Classification then divides images into COVID-19 and healthy images, with every subject represented by a feature vector extracted from its image.
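The role of the separating hyperplane can be sketched for the linear case. The weights `w` and bias `b` below are made-up values standing in for a trained model, and the mapping of label 1 to COVID-19 and label 0 to healthy is an assumption for illustration. The chapter does not state which kernel or training procedure was used.

```python
import math

def decision_value(w, b, x):
    """Signed score w . x + b of feature vector x relative to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    """Assumed labeling: 1 (COVID-19) on the positive side, 0 (healthy) otherwise."""
    return 1 if decision_value(w, b, x) >= 0 else 0

def margin_width(w):
    """Width 2 / ||w|| of the margin that a linear SVM tries to maximize."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical two-feature model.
w, b = [3.0, 4.0], -5.0
label = predict(w, b, [1.0, 1.0])   # score 3 + 4 - 5 = 2, so label 1
width = margin_width(w)             # 2 / 5 = 0.4
```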

The ANN is a mathematical model of interlinked computing elements organized into layers, whose geometry and functionality mimic the actions of the human brain [32]. A cascade-forward back-propagation network with two layers is applied as the classifier. The first layer has 28 input components and 14 feature vectors. There are five neurons in the hidden layer. The neural network is trained by adjusting the connection weights and biases to generate the required mapping. Here, the feature vectors are the input to the network, and the network tunes its weights and biases to learn the relationship between input and output patterns.
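A forward pass through a cascade-forward network can be sketched as follows. In a cascade-forward topology the output layer receives the hidden activations and also a direct connection from the input. The tanh hidden activation, the sigmoid output, the single output unit, and the choice of 14 inputs are assumptions made for illustration, and the zero weights are placeholders rather than trained values.

```python
import math

def cascade_forward(x, w_hidden, b_hidden, w_out_hidden, w_out_input, b_out):
    """Forward pass of a two-layer cascade-forward network with one output.

    The output unit sees both the hidden activations and the raw input x.
    Activation functions are assumed, not taken from the chapter.
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    z = (
        sum(w * h for w, h in zip(w_out_hidden, hidden))
        + sum(w * xi for w, xi in zip(w_out_input, x))  # direct input link
        + b_out
    )
    return 1.0 / (1.0 + math.exp(-z))

# Placeholder network: 14 texture features in, five hidden neurons.
n_inputs, n_hidden = 14, 5
x = [0.1] * n_inputs
score = cascade_forward(
    x,
    w_hidden=[[0.0] * n_inputs for _ in range(n_hidden)],
    b_hidden=[0.0] * n_hidden,
    w_out_hidden=[0.0] * n_hidden,
    w_out_input=[0.0] * n_inputs,
    b_out=0.0,
)
```

Training by back-propagation would adjust every weight and bias in this function; that step is omitted here.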

The DT is a well-known classification method [33]. A DT is a tree-structured prediction approach in which every internal node represents a test on an attribute, every outgoing branch denotes an outcome of the test, and every leaf node is assigned a class label. DTs are widely applied to classification and prediction tasks. They are an easy and effective way to represent knowledge. The models generated by a DT are represented as a tree structure. Learning a DT is the process of selecting a split at every node and deciding the depth of the tree. Samples are classified by traversing the tree from the root through the internal nodes to a leaf.
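Split selection at a single node can be sketched as a search over thresholds on one feature. The Gini impurity used here is an assumed criterion, because the chapter does not name the one it uses, and the function names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split.

    Returns (None, impurity of the unsplit node) when no split improves on it.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Made-up values of one texture feature with binary class labels.
threshold, impurity = best_split([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
```

A full tree learner would apply `best_split` to every feature at every node and recurse on the two child subsets until a stopping rule, such as a depth limit, is met.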

The integration of AdaBoost and random forest (RF) methodologies was employed for brain magnetic resonance (MR) image classification to improve accuracy and stability and to resolve the problem of overfitting [32]. AdaBoost is a boosting method widely employed to improve the accuracy of a given learning algorithm. It is an ensemble approach that combines several weak classifiers with high error rates and produces a final hypothesis with a low training error. It is regarded as elegant, fast, and simple to implement. It is nonparametric and capable of finding outliers. Furthermore, it requires no prior knowledge about the weak learners. It is used effectively for large classification problems. Here, it is used for binary classification. The boosting framework takes as input a training dataset $S$ of $N$ samples, $S = \{(x_i, y_i)\},\ i = 1, \ldots, N$, where $x_i \in X$ is the feature vector of a sample and $y_i \in Y = \{0, 1\}$ is the associated class label. Label 1 denotes the abnormal (positive) class and label 0 denotes the normal (negative) class. AdaBoost calls a base learning algorithm repeatedly over a number of iterations. In every iteration, each training sample is assigned a weight that denotes the probability of the sample being selected for the training set of the classification model. At the initial stage, the same weight is assigned to every example in the training data.

Once a base learner has been trained in a given round, the weights of the misclassified instances are increased so that the next round concentrates on the hard examples in the training set. The final hypothesis is then built as a linear combination of the weak hypotheses produced in every round. RF is an enhanced bagging approach with several important properties: it works well on massive datasets, handles a large number of input variables without variable deletion, and estimates which features are important for the classification task. In addition, it is simple and easily parallelized, is robust to outliers and noise, and handles missing data efficiently. RF is a collection of tree-structured classifiers in which every tree depends on the values of a random vector that is sampled independently and with the same distribution for all trees in the forest. The following points describe how each tree in the forest is grown from the training set:

If $N$ is the number of instances in the training data, each tree is grown on $N$ instances drawn at random with replacement from the original data. This is called a bootstrap sample. If the dataset has $R$ features, a number $r \ll R$ is specified such that at every node, $r$ features are chosen at random from the $R$ features and the best split on these $r$ features is used to divide the node. The value of $r$ is held constant while the forest is grown. Every tree is grown to the largest extent possible, with no pruning.
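The two sources of randomness in tree construction, bootstrap sampling of instances and random selection of features at a node, can be sketched with the standard library. The sizes used below (10 instances, 14 features, 3 features per node) and the function names are hypothetical.

```python
import random

def bootstrap_sample(n_samples, rng):
    """Indices of N instances drawn with replacement from N training instances."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

def random_feature_subset(n_features, r, rng):
    """Indices of r distinct features chosen at random from R features."""
    return sorted(rng.sample(range(n_features), r))

rng = random.Random(0)                       # seeded for repeatability
rows = bootstrap_sample(10, rng)             # may contain repeated indices
features = random_feature_subset(14, 3, rng)
```

Each tree would be trained on the instances in `rows`, and a fresh feature subset would be drawn at every node while the tree is grown.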

Once the forest has been constructed, an input vector $x$ is classified by the majority of the votes cast by the trees in the ensemble. Here, RF is applied as the base learner of the AdaBoost model for image classification. This integration combines the merits of the RF and AdaBoost methods; hence, it is expected to attain high classification performance on every dataset.
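The boosting loop described above can be sketched with a much simpler base learner. In this minimal sketch a one-feature threshold stump stands in for the random forest, purely to keep the example short, so it is not the AB-RF model itself. The labels 0 and 1 are mapped to -1 and +1 internally, which is the usual convention for discrete AdaBoost, and all names and data are hypothetical.

```python
import math

def train_stump(xs, ys, w):
    """Lowest weighted-error threshold stump; ys are in {-1, +1}."""
    best = None
    for thr in sorted(set(xs)):
        for pol in (1, -1):
            preds = [pol if x >= thr else -pol for x in xs]
            err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best

def adaboost(xs, labels01, rounds=3):
    """Discrete AdaBoost with stumps as a stand-in base learner."""
    ys = [1 if y == 1 else -1 for y in labels01]
    n = len(xs)
    w = [1.0 / n] * n                        # equal weights at the start
    ensemble = []
    for _ in range(rounds):
        err, thr, pol = train_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, pol))
        # Misclassified samples gain weight; correct ones lose weight.
        w = [wi * math.exp(-alpha * y * (pol if x >= thr else -pol))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the weighted linear combination of the weak hypotheses."""
    score = sum(a * (pol if x >= thr else -pol) for a, thr, pol in ensemble)
    return 1 if score >= 0 else 0

# Made-up 1-D data in which the positive class lies in the middle.
xs = [1, 2, 3, 4, 5, 6]
labels = [0, 0, 1, 1, 0, 0]
model = adaboost(xs, labels, rounds=3)
```

No single stump can separate the middle interval from both sides, but the weighted vote of three stumps does, which illustrates how re-weighting steers later rounds toward the examples that earlier rounds got wrong.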

The proposed model was validated on a COVID chest X-ray dataset. Fig. 34.3 shows a sample set of images from the different classes [34].

In the kappa analysis, the SVM model did not reach optimal results and had the lowest kappa, 76.23%. The DT model had better results than SVM, with a kappa of 77.12%. The ANN model improved on both of these models, with a kappa of 80.31%. Finally, the best diagnosis results were achieved by the AB-RF model, with a maximum kappa of 89.59%.

A comprehensive MCC analysis of the four ML-based classifier models was performed; the outcome is shown in Fig. 34.7. The DT model had the worst classifier outcome, with a minimal MCC of 79.30%. The SVM model had a somewhat higher MCC of 80.59%. The ANN model outperformed DT and SVM, with an MCC of 83.50%. The AB-RF model had the best outcome of all the models, with a maximum MCC of 87.44%.

These figures and tables show that the proposed model can be employed as an appropriate tool for diagnosing COVID-19. The performance of the proposed model was validated using a chest X-ray dataset. The experimental results analysis indicated that the AB-RF model was superior to other models with a maximum accuracy of 90.13%, F score of 90.28%, kappa value of 89.59%, and MCC of 87.44%. Therefore, this model can be used by physicians for proper diagnosis.
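For reference, the four reported measures can be computed from a confusion matrix as in the following minimal sketch. It assumes a binary problem with nonempty classes; the counts in the example are made up and are not the chapter's results, and the function name is hypothetical.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Accuracy, F score, Cohen's kappa, and MCC from binary confusion counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    # Agreement expected by chance, from the row and column totals.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"accuracy": accuracy, "f_score": f_score, "kappa": kappa, "mcc": mcc}

# Hypothetical counts: 45 true positives, 5 false positives,
# 5 false negatives, 45 true negatives.
metrics = binary_metrics(tp=45, fp=5, fn=5, tn=45)
```

With these counts, accuracy and F score are both 0.9, and kappa and MCC are both 0.8. Kappa and MCC come out lower than accuracy because both correct for agreement expected by chance.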

This chapter discusses an IoMT-based COVID-19 diagnosis model using different ML-based classification models. First, data are collected from patients using IoT devices and transferred to the cloud. Preprocessing is then performed to improve the image quality. Next, texture features are extracted from the preprocessed image, and classification takes place. A set of four ML models, namely ANN, DT, SVM, and AB-RF, is used for classification. The performance of the proposed model was validated using a chest X-ray dataset. Analysis of the experimental results indicated that the AB-RF model was superior to the other models, with a maximum accuracy of 90.13%, F score of 90.28%, kappa value of 89.59%, and MCC of 87.44%. In future work, the proposed model can be extended with image fusion techniques.

Coronavirus disease 2019 (COVID-19): role of chest CT in diagnosis and management

Profiling early humoral response to diagnose novel coronavirus disease (COVID-19)

Epidemiology, causes, clinical manifestation and diagnosis, prevention and control of coronavirus disease (COVID-19) during the early outbreak period: a scoping review

Coronavirus disease 2019 (COVID-19): a perspective from China

Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases

Potential preanalytical and analytical vulnerabilities in the laboratory diagnosis of coronavirus disease 2019 (COVID-19)

Current status of epidemiology, diagnosis, therapeutics, and vaccines for novel coronavirus disease 2019 (COVID-19)

Diagnosis of the Coronavirus disease (COVID-19): rRT-PCR or CT?

Epidemiological characteristics of 2143 pediatric patients with 2019 coronavirus disease in China

Molecular diagnosis of a novel coronavirus (2019-nCoV) causing an outbreak of pneumonia

Molecular immune pathogenesis and diagnosis of COVID-19

Coronavirus disease 2019 (COVID-19): a systematic review of imaging findings in 919 patients

The epidemiological characteristics of an outbreak of 2019 novel coronavirus diseases (COVID-19) in China, Zhonghua liu xing bing xue za zhi

Evolving status of the 2019 novel coronavirus infection: proposal of conventional serologic assays for disease diagnosis and infection monitoring

A rapid advice guideline for the diagnosis and treatment of 2019 novel coronavirus (2019-nCoV) infected pneumonia (standard version)

Development and clinical application of a rapid IgM-IgG combined antibody test for SARS-CoV-2 infection diagnosis

Antibody tests in detecting SARS-CoV-2 infection: a meta-analysis

Detection of SARS-CoV-2 in different types of clinical specimens

Internet of Medical Things (IOMT): applications, benefits and future challenges in healthcare domain

An IoMT based cyber training framework for orthopedic surgery using Next Generation Internet technologies

An IoMT cloud-based real time sleep apnea detection scheme by using the SpO2 estimation supported by heart rate variability

In: AMA-IEEE medical technology conference on individualized healthcare

Wearable hardware design for the internet of medical things (IoMT)

Point-of-care diagnostics for global health

Chemical, Gas, and Biosensors for Internet of Things and Related Applications

Combining point-of-care diagnostics and internet of medical things (IOMT) to combat the Covid-19 pandemic

Clinical course of COVID-19 infection in elderly patient with melanoma on nivolumab

Real-world scenario of patients with lung cancer amid the COVID-19 pandemic in China

Novel approaches to diagnose COVID-19, Kurd

Computer aided detection of ischemic stroke using segmentation and texture features

Properties of support vector machines

Brain MR image classification using two-dimensional discrete wavelet transform and AdaBoost with random forests

A survey of decision tree classifier methodology