key: cord-0900232-pp8sy0us
authors: Padmakala, S.; Revathy, S.; Vijayalakshmi, K.; Mathankumar, M.
title: CNN Supported Automated Recognition of Covid-19 Infection in Chest X-Ray Images
date: 2022-05-08
journal: Mater Today Proc
DOI: 10.1016/j.matpr.2022.05.003
sha: 455f8f4f9c39ad8a5ba7ca21d11ddee556989c65
doc_id: 900232
cord_uid: pp8sy0us

An automatic lung recognition system is used to distinguish normal lungs from COVID-infected lungs in chest X-ray images. In 2020, the coronavirus pushed the entire world into an unprecedented situation, and the foremost challenge was to diagnose the infection. The standard diagnostic test, the PCR test, is complex and costly for checking a patient's sample at the initial stage. With this in mind, we developed a system that automatically recognizes a chest X-ray image and labels it as COVID or normal lungs. We collected the dataset from open-source data repositories and pre-processed the X-ray images in each category (COVID and non-COVID) using techniques such as filtering, edge detection and segmentation. The pre-processed X-ray images were then used to train a CNN ResNet-18 network. The ResNet-18 network was created with the PyTorch Python package and gave higher accuracy than the other algorithms compared in this work. Using the acquired knowledge, the model correctly classifies the test X-ray images. The performance of the model was then calculated and compared with that of various algorithms; the results show that the ResNet-18 network improves model performance, with specificity and sensitivity above 90%.

Coronavirus disease first broke out in China, followed by South Korea, Iran and Italy, and it started spreading in India in February 2020. Experts named the virus Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2), and the disease it causes is called COVID-19.
The virus is contagious, spreads widely and has affected many people, especially those above the age of 60, asthma patients and people with existing lung problems. The number of infections and deaths is increasing day by day, and there is no proper treatment or vaccine to cure or prevent the disease, because the genetic material of the virus does not follow a standard form. The coronavirus has created challenges in several sectors such as health, education, the economy and society; this paper focuses mainly on the health sector. Health departments have been widely affected across the world, and even developed countries struggle for equipment, tools and other resources. The primary goal of the health department is first to diagnose the virus and then to treat it. Suspected cases are isolated and tested to avoid the risk of spreading the infection further. Diagnosis alone can take more than two weeks when only the traditional PCR test is used, and in rare cases the test gives a false result. To reduce this problem, experts have suggested diagnosis using lung X-ray images [1].

COVID-19 is a new infectious disease caused by SARS-CoV-2, which was previously called the novel coronavirus. COVID-19 is highly contagious. The virus is an enveloped RNA virus, which makes it more susceptible to disinfectants than many other viruses. The infection rate is high, mainly among the elderly and those with underlying medical conditions such as asthma, autoimmune diseases, diabetes and respiratory diseases, whereas the infection rate and the spreading rate in children are low. The origin of COVID-19 has not yet been proved or confirmed, but research suggests that a wet market is one of the potential origins [6].
China experienced an outbreak at a meat market where dozens of workers were infected, which led experts to consider the wet market the main candidate for the origin, although there were no bats in those markets. The main symptoms of COVID-19 are high fever, body aches, runny nose, sore throat, shortness of breath, chest pain, dry cough and/or gastrointestinal symptoms. Some of these symptoms are similar to those of influenza (flu) or the common cold. The most serious and dangerous symptoms are chest pressure and difficulty in breathing due to a low oxygen level. Some patients do not show these symptoms but are still infected; such cases can be found by tests such as the RT-PCR test [13], radiology tests and X-ray. The duration of illness varies among patients according to age, exposure, pre-existing conditions, state of health, etc.

The transmission rate of COVID-19 is much higher than that of other viral infections. COVID-19 is transmitted through direct contact with the respiratory droplets of an infected person, mainly saliva, and by touching contaminated surfaces such as glass, counter tops, plastic, doorknobs and stainless steel, on which the virus can last for 12-24 hours; it can, however, be disinfected by sunlight or ultraviolet light within 30 seconds or less. Therefore, the spread of COVID-19 can be slowed by blocking infectious droplets with a mask, keeping a distance of at least 3-6 feet, and washing and sanitizing the hands frequently.

The diagnostic tests for COVID-19 fall into three categories. The first identifies whether the genetic material of the virus is present and is called a NAAT test. It is a PCR test in which a nasopharyngeal or pharyngeal swab is taken and examined for the genetic material of the virus itself. The second identifies the outer proteins of the viral shell or envelope and is called antigen testing.
This test therefore detects the outer proteins of the virus. The third type of test detects whether the human body has developed antibodies, that is, antibodies against the outer proteins of the virus. If the individual has developed an immune response to these proteins, the person has developed immunity towards the virus. These are the three main categories of tests used to identify COVID-19.

Some other viral diseases, namely influenza, SARS, MERS, SARS-CoV-2 and EVD, are introduced for comparison with COVID-19. They are briefly discussed, and a final comparison table is given below.

Influenza, called "flu" for short, is an illness caused by influenza viruses. The virus infects the nose, the throat and mainly the lungs. It spreads quickly and easily and can cause serious problems, mainly for very young children, older people, pregnant women and people with medical conditions such as asthma and diabetes. There are two types of influenza virus, types A and B, which are responsible for seasonal flu epidemics each year. The virus can change in two ways, known as "antigenic drift" and "antigenic shift". "Antigens" are molecular structures on the surface of viruses that are recognized by the immune system and are capable of triggering an immune response. The symptoms are similar to those of COVID-19, such as fever, cough, sore throat, headache, vomiting or diarrhoea. The diagnostic tests available for influenza are viral culture, serology, rapid antigen testing, immunofluorescence assays, rapid molecular assays and the reverse transcription polymerase chain reaction (RT-PCR) test.

Ebola is an infectious disease caused by the Ebola virus, officially called Zaïre Ebola Virus (EBOV). The virus can affect both humans and animals.
It is a very rare but severe, dangerous and potentially lethal disease. Once the virus attacks, it spreads and multiplies at lightning speed. It causes bleeding in various places in the body, which results in multiple organ failure. The main symptoms of Ebola are those of severe flu or malaria, such as diarrhoea, fatigue, vomiting, muscle pain and headache. The virus is transmitted through direct contact with any kind of body fluid such as blood, saliva or feces. It is transmitted not only from human to human but also from animals to humans, although it is not clear which animals carry the Ebola virus. Bats probably play a vital role in carrying the virus from one animal to another. The virus is transmitted from animals to humans through eating contaminated meat. The most commonly used diagnostic method is the PCR test. This test can detect virus particles in the blood, but only while the number of viruses in the blood is high during the active infection. When the virus is no longer present in great enough numbers in a patient's blood, the PCR method is not effective. Another method is based on detecting antibodies in infected cases, which can be used to confirm the patient's exposure to and infection by the Ebola virus.

SARS stands for Severe Acute Respiratory Syndrome. It is a viral respiratory disease caused by a SARS-associated coronavirus. The virus is airborne and is transmitted through small droplets of saliva, as with the common cold or influenza. It can also spread indirectly through surfaces that have been touched by an infected person. Most affected cases are healthy adults aged 25 to 70 years; only a few suspected cases were under 15 years. The symptoms are the same as those of influenza and COVID-19; other common symptoms are chills and rigors, malaise and mild respiratory symptoms. The diagnostic methods are also the same as for COVID-19 and include NAATs and antigen tests.
These tests are used only for screening purposes, to identify SARS patients who need to be isolated from others and so reduce transmission.

| Characteristic | COVID-19 | Influenza | Ebola | SARS |
| --- | --- | --- | --- | --- |
| Cause | Coronaviridae family of viruses | Influenza viruses | Ebola virus | SARS-associated coronavirus |
| Transmission | Small liquid particles from an infected person's mouth or nose when they speak, cough, sneeze or breathe. The particles range from larger respiratory droplets to smaller aerosols. | Droplets produced while coughing or sneezing. | Body fluids such as blood, saliva or feces. | Droplets or touching virus-contaminated objects. |
| Symptoms | Fever, cough and headache are most common. Chest pain and low oxygen saturation are serious. | Fever, chills, runny nose, sore throat, vomiting or diarrhea. | Joint pain, weakness and fatigue, loss of appetite, unexplained hemorrhaging, bleeding. | Fever (greater than 38 °C), dry cough, shortness of breath. |
| Diagnosis | Nucleic Acid Amplification Tests (NAATs), antigen tests and antibody tests (serology tests). | Viral culture, serology, rapid antigen testing, RT-PCR, immunofluorescence assays and rapid molecular assays. | The PCR test is the most common test to detect whether the virus is present in the blood. | PCR test and antibody testing. |

The novel coronavirus outbreak has engaged scientists, researchers, laboratories and health organizations all around the world in efforts to treat, prevent and cure the disease. They proposed several treatment and diagnosis strategies in the months following the COVID-19 outbreak. We examined some of these research papers [1, 4, 5, 7] to identify the different diagnosis, treatment and analysis methods, which are shown in the table below with different aspects from each paper [8, 9, 10, 12, 16].

Yujin Oh et al. [1] suggested a patch-based deep neural network with probabilistic Grad-CAM. The datasets used with these methods are chest X-ray images from the Japanese Society of Radiological Technology (JSRT), Segmentation in Chest Radiography (SCR) and the National Library of Medicine (NLM).
The limitations noted are that the current diagnostic performance of CXR is not sufficient for clinical use, hence the need for AI to improve it, and that a large dataset for deep neural network training is difficult to obtain.

Nuha Zamzami et al. [5] suggested a Convolutional Neural Network (CNN). The dataset used consists of chest X-ray images from a public GitHub repository. They did not focus on pre-training on the ImageNet database or on pre-trained weights. In addition, chest X-ray and CT scan images from a GitHub repository are used with the shifted-scaled Dirichlet maximum likelihood method. A limitation is that regression is used only for linear models, whereas images can also be modelled non-linearly.

Edwin Montes-Orozco et al. [7] use the Mutually Connected Giant Component (MCGC) methodology; the datasets are raw text data available from WHO, WB and IT, to which statistical analysis is applied. The main drawback of this system is that a drastic change in one or more variables causes the dynamics of the system to change.

Mohammad (Behdad) Jamshidi et al. [8] introduced methods based on Generative Adversarial Networks (GANs), Extreme Learning Machine (ELM) and Long Short-Term Memory (LSTM). The data used are recent publications, investigated medical reports, clinical data and medical imaging. The potential of AI-based methods for COVID-19 is yet to be achieved, and novel approaches have to be in place for problems of this level of complexity.

Shaoping Hu et al. [9] use Class Activation Maps (CAMs); the datasets are CT lung images from The Cancer Imaging Archive (TCIA). The relevant parts of the CT image must be marked manually for diagnosis.

Furqan Rustam et al. [10] use Exponential Smoothing (ES), Linear Regression (LR), Least Absolute Shrinkage and Selection Operator (LASSO) and Support Vector Machine (SVM) on time series data (including date, number of confirmed cases, number of deaths, etc.) from a GitHub repository.
SVM produces poor results in all scenarios because of the ups and downs in the dataset values.

Abdul Waheed et al. [12] use the Auxiliary Classifier Generative Adversarial Network (ACGAN) methodology; the datasets are chest X-ray images from the IEEE Covid Chest X-Ray dataset, the COVID-19 Radiography dataset and the COVID-19 Chest X-Ray Dataset Initiative. The parts of the image are manually marked and used to train the classifier.

Fazle Karim et al. [16] use the Long Short-Term Memory Fully Convolutional Network (LSTM-FCN) and the Attention Long Short-Term Memory Fully Convolutional Network (ALSTM-FCN) on time series data retrieved from the UCR dataset. More training time is required for refinement because of the added computational complexity of re-training the model using smaller batch sizes.

These related works were studied and the performance of their algorithms analysed with respect to factors such as dataset, processing technique and pre-trained weights. We therefore decided to develop a model that recognizes lungs automatically using ResNet-18, an 18-layer member of the CNN family [4] [2], to minimize the error rate. This model will also help researchers, scientists and doctors to diagnose patients quickly and easily. The following section describes the development and classification of the model.

The lung recognition system is used to identify COVID-infected lungs and normal (healthy, non-COVID) lungs from a chest CT scan image given as input. COVID-19 first broke out in Wuhan, China, in late 2019 and set the entire world into a pandemic. The coronavirus spreads rapidly because it is contagious; it spreads mainly through droplets of saliva that travel through the air when an infected person coughs or sneezes. The symptoms are the same as those of the common cold, but they are severe for patients with existing lung conditions such as asthma, and people above the age of 55 or with a weak immune system are infected easily and quickly.
In some cases, however, there are no symptoms such as high temperature, cold or cough, which makes the virus very difficult to diagnose and treat [8]. In such cases the pathological and chest scan results were COVID positive, so it is very difficult for doctors to diagnose every single person all over the world. Doctors still follow the traditional pathological test, the Polymerase Chain Reaction (PCR) test, which takes a long time to produce a result, is sometimes false and is expensive for many people. To avoid these problems, this paper proposes using a chest X-ray or CT scan as the diagnostic tool instead of a PCR test. The paper uses the deep learning neural network ResNet18, a CNN model, to train the system to classify the input image into two classes, COVID and normal [19] [20] [21].

We collected a dataset of around 1000 images in each category for training and testing. The datasets are pre-processed to produce structured data, such as images, from the unstructured data. The collected data then undergo image pre-processing techniques such as scaling, median filtering, Canny edge detection and region-based segmentation. The pre-processed images are used to train and test the model using the deep learning method Convolutional Neural Network ResNet18, which then becomes the lung detection model. Finally, the model predicts whether the input image is COVID or normal [10]. The entire workflow of the paper is illustrated in Figure 1.

...
    If R1('finding') == "Covid":
        Extract only the image and save it in Covid-folder (S1)
    Else:
        Do nothing
For each record R2 in dataset D2:
    If R2('finding') == "Normal":
        Extract only the image and save it in Normal-folder (S1)
    Else:
...
Test the image I with the trained model

We collected the datasets from open-source data repositories.
The normal lung chest X-ray (CXR) [1, 2] images and the COVID lung CXR images were collected from separate open-source repositories. The COVID-19 chest X-ray dataset contains 29 attributes for each of around 950 patient records, including attributes that are unnecessary here, such as pO2_saturation, leukocyte_count, neutrophil_count, lymphocyte_count and other_notes. Its metadata also covers similar lung diseases such as Pneumonia/Bacterial, Pneumonia/Viral, SARS and COVID-19. We therefore need to pre-process the data so that the model is trained on what it needs and assigns the correct class when a different input is given. Since this project concerns image classification, we extract only the finding, view and URL attributes for the subsequent steps. From these attributes we need only the COVID-related images, so the remaining records are eliminated from the dataset; this process is called data cleaning. The same data cleaning is applied to the other dataset (i.e., normal chest X-rays), which has only 7 attributes for each of its 5900 records. From it we extract only the normal lung images. After data cleaning of both datasets, the extracted images are placed in separate folders, covid-19 and normal, under a single folder named dataset. This process is called data integration. Figure 2 shows the output of the dataset collection module.

Pre-processing is a procedure that transforms raw or improper data into proper data before it is used further [14]. In this paper, lung images with noise and distortions are converted into proper images before they are used in the convolutional process. Every image in both categories undergoes pre-processing techniques such as scaling, filtering, edge detection, segmentation and contouring.
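The data cleaning and data integration steps described earlier in this section can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the column names `finding`, `view` and `url`, the label strings, the file names and the sample rows are hypothetical and do not reproduce the actual metadata files used in this work.

```python
import csv
import io

# Attributes kept after cleaning (column names are assumed for illustration).
KEEP = ("finding", "view", "url")


def clean_records(csv_text, label):
    """Keep only records whose finding equals `label`, with only the KEEP columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = []
    for row in reader:
        if row["finding"].strip().lower() == label.lower():
            kept.append({name: row[name] for name in KEEP})
    return kept


def integrate(covid_records, normal_records):
    """Group image references by class, mirroring the covid-19/normal folder layout."""
    return {
        "covid-19": [r["url"] for r in covid_records],
        "normal": [r["url"] for r in normal_records],
    }


# Hypothetical metadata rows, not taken from the real repositories.
COVID_METADATA = """finding,view,url,pO2_saturation,other_notes
COVID-19,PA,img_001.png,92,
Pneumonia/Viral,PA,img_002.png,,
SARS,AP,img_003.png,,
COVID-19,AP,img_004.png,88,note
"""

NORMAL_METADATA = """finding,view,url
Normal,PA,img_101.png
Pneumonia/Bacterial,PA,img_102.png
"""

covid = clean_records(COVID_METADATA, "COVID-19")
normal = clean_records(NORMAL_METADATA, "Normal")
dataset = integrate(covid, normal)
print(dataset)
```

In a real pipeline the image files referenced by the kept records would be copied into the two class folders; here the dictionary stands in for that folder structure.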
The pre-processing techniques are listed and briefly explained below.

Scaling is the process of reshaping images of different sizes to a standard, uniform size for the subsequent training process. Scaling is done by changing every pixel of the image while maintaining the quality of the image. We reshaped each image to 210x220 pixels for better quality of the 2D grey-scale X-ray images. A good scaling technique retains all the details of the image for further resizing. The more an image is scaled up or down, the less its details and quality are maintained.

Filtering is a technique usually used to remove noise and unwanted signals from an image. Here we used a median filter for this purpose. The median filter first sorts the neighbourhood pixels in numerical order and then replaces the pixel with the middle value [17]. The advantage of the median filter is that it preserves the sharp edges in the image, because outlying values do not affect the median significantly. The formula for the median filter is given below [13].

$$ y[a, b] = \operatorname{median}\{\, x[i, j] : (i, j) \in w \,\} $$

where y[a, b] is the pixel to be changed, w is the neighbourhood of n pixels around position (a, b), and x[i, j] are the values of the neighbouring pixels, from which the middle value is taken. If the number of neighbouring pixels is even, the two middle values are found and their average is used.

Canny edge detection is an edge detection algorithm that finds the edges of an image in multiple stages to give the correct boundary or outline of the image. It combines basic techniques such as filtering, smoothing, edge detection and thresholding. The Canny edge detection algorithm follows these 5 steps:
❖ Filter the image to remove noise and obtain a clear image.
❖ Find the intensity gradients of the image.
❖ Perform non-maximum suppression to eliminate unwanted pixels from the image.
❖ Apply a threshold to classify the pixels of the image.
❖ Track the edges by hysteresis.

Segmentation is the process of partitioning an image into small regions. The role of region-based segmentation is to determine a region directly. Region-based segmentation can be done in four ways: region growing, region splitting, region merging, and region splitting and merging. Of these four techniques, region splitting and merging is implemented in this paper. Region splitting starts with the full image as a single region and keeps splitting a region until it satisfies the homogeneity condition. Region merging is the opposite of region splitting: it starts with small pixel regions and merges regions that have similar properties such as grey-scale level and variance [17]. The two approaches are applied alternately, and the process stops when no further merging or splitting is possible. In a binary image, a region whose pixels all have the same value is not split; otherwise it is split.

Image contouring is used to find the accurate outline of the image for a better understanding of the pre-processed image. The output gives the structural outlines of an object (in our case the lungs) in the image. From these contoured images alone we can tell whether the image shows COVID or normal lungs, as shown in Figure 3 (1.e and 2.e). These are the techniques used to pre-process the images before they are used to train the model.

The next module creates and trains the model using the Convolutional Neural Network (CNN) ResNet18 to classify COVID and normal lungs from the given image, because CNNs give high accuracy in recognizing X-ray images [2]. ResNet-18 stands for Residual Network with 18 layers, organized in 5 main blocks. The first block of the ResNet-18 architecture contains a convolution with a 7x7 kernel, 64 channels and a stride of 2, followed by max pooling. Each of the remaining blocks is a residual block of 4 layers that uses the same number of channels throughout, with the number of channels differing from block to block.
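As a rough illustration of this layout, the following sketch traces the feature map sizes and counts the weighted layers of a ResNet-18. It is not the authors' implementation: it assumes the standard ResNet-18 arrangement found in common libraries (a 3x3 max pooling with stride 2, two residual units of two 3x3 convolutions in each of the four residual blocks with 64, 128, 256 and 512 channels, and global average pooling before a final fully connected layer), a 224x224 input and two output classes. The function names are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_resnet18(height, width, num_classes=2):
    """Return (stage name, channels, height, width) tuples and the weighted layer count."""
    stages = []

    # Block 1: 7x7 convolution, 64 channels, stride 2 (padding 3 assumed).
    h, w = conv_out(height, 7, 2, 3), conv_out(width, 7, 2, 3)
    stages.append(("conv1 7x7/2", 64, h, w))
    weighted_layers = 1

    # Max pooling, 3x3 with stride 2 (assumed, as in the standard layout).
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
    stages.append(("maxpool 3x3/2", 64, h, w))

    # Blocks 2 to 5: two residual units, each with two 3x3 convolutions.
    for index, channels in enumerate((64, 128, 256, 512)):
        stride = 1 if index == 0 else 2  # later blocks halve the resolution
        h, w = conv_out(h, 3, stride, 1), conv_out(w, 3, stride, 1)
        stages.append((f"block{index + 2}", channels, h, w))
        weighted_layers += 2 * 2

    # Global average pooling followed by the final fully connected layer.
    stages.append(("avgpool + fc", num_classes, 1, 1))
    weighted_layers += 1
    return stages, weighted_layers


stages, layer_count = trace_resnet18(224, 224)
for name, channels, h, w in stages:
    print(f"{name:14s} channels={channels:4d} size={h}x{w}")
print("weighted layers:", layer_count)
```

The count of 18 comes from the first convolution, the sixteen 3x3 convolutions in the residual blocks and the final fully connected layer; pooling layers and shortcut connections carry no counted weights in this convention.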
The second, third, fourth and fifth blocks have 64, 128, 256 and 512 channels respectively. The input image is first convolved with the 7x7 kernel, and then all the remaining convolutions are performed. Finally, the fully connected layer [16] is applied, with max pooling and the ReLU activation function. The standard ResNet-18 architecture is shown in Figure 4.

In this paper, we analyse four pre-trained models: ResNet50, ResNet18, Keras and VGG. The experimental results of these models are summarized in Table 3. The performance of the four pre-trained models was evaluated using the metrics accuracy, precision, recall and F1-score. The accuracy values of the four models are based on the basic convolutional neural network architectures for diagnosing lung images as COVID or normal. Of these four pre-trained CNN models, ResNet18 is the best, with the maximum F1-score of 96%. Hence, this work uses the CNN ResNet18 model with PyTorch to classify the CXR images.

The performance of the automatic lung recognition model is evaluated using the confusion matrix, from which precision, recall, F1-score and specificity are obtained. Precision is the fraction of true positives among all cases predicted as positive. In this paper, precision is the fraction of lung images correctly classified as COVID among all images classified as COVID [9, 10]. Recall tells how many of the COVID lung images are correctly classified as COVID; recall is also called the true positive rate or sensitivity. The F1-score is calculated from the precision and recall values. Specificity is the fraction of correctly classified normal lungs among all normal lungs, and is also called the true negative rate. In addition to the total accuracy, the macro-average and weighted average are also calculated.
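The measures described above can be computed directly from the four cells of a binary confusion matrix. The sketch below uses the standard definitions with COVID as the positive class; the function name and the example counts are hypothetical and are not results from this paper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute precision, recall (sensitivity), specificity, F1-score and accuracy.

    tp: COVID images classified as COVID
    fp: normal images classified as COVID
    fn: COVID images classified as normal
    tn: normal images classified as normal
    """
    def ratio(numerator, denominator):
        # Guard against empty classes to avoid division by zero.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    f1 = ratio(2 * precision * recall, precision + recall)
    accuracy = ratio(tp + tn, tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
    }


# Made-up counts for illustration only.
example = binary_metrics(tp=45, fp=5, fn=10, tn=40)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Treating the normal class as positive and repeating the calculation gives the per-class values from which macro and weighted averages can be formed.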
The formulas of the measures are given below:

$$ \text{Precision} = \frac{TP}{TP + FP} $$

$$ \text{Recall} = \frac{TP}{TP + FN} $$

$$ \text{Specificity} = \frac{TN}{TN + FP} $$

$$ \text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} $$

$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} $$

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, with COVID taken as the positive class. The confusion matrix of the automatic lung recognition model is calculated and summarized in Table 3.

During the COVID period the entire globe suffered greatly, especially health departments with limited equipment, resources and personnel. To reduce their workload, we created an automatic lung recognition model that classifies COVID lungs and normal lungs from a given image. The model is built using the ResNet18 CNN model with pre-processing techniques such as median filtering and Canny edge detection, all implemented on an open-source platform using the PyTorch package in Python. The performance of the model was observed on 60 test samples (30 COVID lung images and 30 normal lung images). The performance of the model is improved by using CNN-ResNet18, and the accuracy of the system exceeds 90%. This work can be enhanced by using a larger dataset with more varied images for training and testing. The model can also be deployed as a web page using the Flask package to work on live data and correctly classify a new image supplied by the user, so that it can be used by all types of users such as doctors, researchers, patients and others.
[1] Deep Learning COVID-19 Features on CXR Using Limited Training Data Sets
[2] Convolution neural network with low operation FLOPS and high accuracy for image recognition
[3] A new approach for the detection of Pneumonia in children using CXR images Based on an real-time IoT system
[4] Deep Learning based Diagnosis Recommendation for COVID-19 using Chest X-ray Images
[5] A Distribution-based Regression for Real-time COVID-19 Cases Detection from Chest X-ray and CT Images
[6] Study on Epidemic Prevention and Control Strategy of COVID-19 Based on Personnel Flow Prediction
[7] Identification of COVID-19 Spreaders Using Multiplex Networks Approach
[8] Deep Learning Approaches for Diagnosis and Treatment
[9] Weakly Supervised Deep Learning for COVID-19 Infection Detection and Classification From CT Images
[10] COVID-19 Future Forecasting Using Supervised Machine Learning Models
[11] A Comprehensive Review of the COVID-19 Pandemic and the Role of IoT, Drones, AI, Blockchain, and 5G in Managing Its Impact
[12] CovidGAN: Data Augmentation Using Auxiliary Classifier GAN for Improved Covid-19 Detection
[13] Early Prediction of the 2019 Novel Coronavirus Outbreak in the Mainland China Based on Simple Mathematical Model
[14] Techniques of medical image processing and analysis accelerated by high-performance computing: a systematic literature review
[15] Deep Learning Applications in Medical Image Analysis
[16] LSTM Fully Convolutional Networks for Time Series Classification
[17] A Comparative study of different noise filtering techniques in digital images
[18] A Novel Method of Synthetic CT Generation from MR image based on Convolutional Neural Networks
[19] A graphical user interface-based heart rate monitoring process and detection of PQRST peaks from ECG signal
[20] Medical Images Processing using Effectiveness of Walsh Function
[21] Classification of Electrocardiogram Cardiac Arrhythmia Signals Using Genetic Algorithm-Support Vector Machines

☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
☒ The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: