key: cord-0798013-1zbe6t48 authors: Al-karawi, D.; Al-Zaidi, S.; Polus, N.; Jassim, S. title: AI based Chest X-Ray (CXR) Scan Texture Analysis Algorithm for Digital Test of COVID-19 Patients date: 2020-05-08 journal: nan DOI: 10.1101/2020.05.05.20091561 sha: e97b59dc2293445fa5c75693ac85a23c3e253b0f doc_id: 798013 cord_uid: 1zbe6t48 Chest Imaging in COVID-19 patient management is becoming an essential tool for controlling the pandemic that is gripping the international community. It is already indicated in patients with COVID-19 and worsening respiratory status. The rapid spread of the pandemic to all continents, albeit with a nonuniform community transmission, necessitates chest imaging for medical triage of patients presenting moderate-severe clinical COVID-19 features. This paper reports the development of innovative machine learning schemes for the analysis of Chest X-Ray (CXR) scan images of COVID-19 patients in almost real-time, demonstrating significantly high accuracy in identifying COVID-19 infection. The performance testing was conducted on a combined dataset comprising CXRs of positive COVID-19 patients, patients with various viral and bacterial infections, as well as persons with a clear chest. The test resulted in successfully distinguishing CXR COVID-19 infection from the other cases with an average accuracy of 94.43%, sensitivity 95% and specificity 93.86%. The SARS-CoV-2 pandemic continues to wreak havoc across the world in a manner that unwitnessed since the unusual deadly influenza's pandemic of 1918, otherwise known as the Spanish flu. Since December 2019 to third of May 2020, over 3.3 million cases were globally reported [1] . Despite the enormity of the present challenge, the impact of a century of gigantic advances in science, medicine and technology is manifested by the emergence of a globally adopted strategy to ensure significantly lower level of casualties. This strategy is based on (a) developing faster and more accurate diagnostic methods, (b) conducting clinical trials on possible drug treatments, and (c) developing and producing appropriate vaccines. With increasing prevalence of the disease and the nature of the outbreak, the priority is to scale up public health testing while developing new means of testing. Molecular diagnosis of COVID-19 using real-time RT-PCR test is the preferred screening/testing method [2] . While laboratorybased performance evaluations of RT-PCR test show high analytical sensitivity and nearperfect specificity with no misidentification of other coronaviruses or common respiratory pathogens, test sensitivity in clinical practice may be adversely affected by a number of variables including: adequacy of specimen, specimen type and handling, stage of infection (CDC guidelines for in-vitro diagnostics). Indeed, early reports of test performance in the Wuhan outbreak showed variable sensitivities ranging from 37% to 71% ( [3] , [4] ). Concerns about test availability and delays in PCR test, have led to promoting the use of serological tests for antibodies detection as proof of positive COVID-19 infection. Unlike the RT-PCR, these tests are not diagnostic testy but are meant to help estimate the proportion of the population that have had the infection and developed antibodies to it. However, recently doubts have been raised about the reliability of antibody tests. The World Health Organization's head of emerging diseases (Dr. Maria Van Kerkhove) said that while these tests measure the level of antibodies in the blood, there is no evidence they can show "that an individual is immune or is protected against re-infection". According to WHO, only a tiny proportion of the global populationmaybe as few as 2% or 3%appear to have antibodies in the blood showing they have been infected with COVID-19. Interested readers can find more on these issues from a variety of web reports (e.g. see [5] , [6] ). Despite the many controversial views continuously reported about the reliability of antibody tests, or otherwise, they are still being developed, manufactured, and made available in some countries. For example, on 17 April 2020 -Roche (SIX: RO, ROG; OTCQX: RHHBY) announced the development and upcoming launch of its Elecsys® Anti-SARS-CoV-2 serology test to detect antibodies in people who have been exposed to the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) that causes the COVID-19 disease. The newly formed Rapid Testing Consortium at Oxford University have been reported to be "close to picking up 100% of all cases where people have antibodies. Now it is just a question of scaling up the manufacturing process." A third significant and complimentary line of investigations target the use of Artificial Intelligence based efficient classification algorithms to analyse radiology chest scan images of suspected patients. Therefore, this paper is aiming to make a contribution to these efforts. Radiographic chest scans with CXR and CT images, as key tools for pulmonary disease diagnosis and management, form a useful source of complementary tools for testing and management purposes in dealing with COVID-19 pandemic. Since the emergence of the disease in Wuhan, china, there has been a growing number of investigations in this direction, evidence of their use across the different diagnosis and management scenarios have been noted as of 1 st April 2020, in [7] , as yet scant and may undergo refinement through rigorous scientific investigation. Guided by high clinical suspicion and laboratory assessment of COVID-19 suspected cases, radiographic imaging has been used to guide patient management decisions in categories of self-isolation at home, admission and isolation or evaluation for alternative diagnoses [8] and a similar process has been used in Spain [9] . Classic CXR findings in COVID-19 are multiple, bilateral, or unilateral, peripheral ground glass opacities. Organising pneumonia patterns and crazy paving may be present [10], [11] . Due to the wide availability of conventional chest radiography units especially the mobile ones and the ease of decontaminating the equipment between patients has given CXR an essential role in the fight against this pandemic. In order to support radiologist to perform such tasks, some research groups have developed Deep learning /Transfer learning techniques to analyse CXRs and classify them into COVID and non-COVID (normal CXR), however these algorithms have achieved varying results [12] , [13] . On 11 th April 2020, M. Ilyas et al, [14] , presented a review of several early AI based deep learning systems designed for the automatic detection of COVID-19 from CXR. The reported . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 8, 2020. . https://doi.org/10.1101/2020.05.05.20091561 doi: medRxiv preprint schemes are trained on different size datasets primarily to detect pneumonia as an indicator of COVID-19. With two exception all achieve considerable-to-impressive accuracy rates 80%-98%. The review concludes that all those AI approaches are detecting the subjects suffering with pneumonia without determining whether the pneumonia is caused by COVID-19 or due to any other bacterial or virus. In most cases, the training and testing datasets are rather of small size. This is a known source of overfitting by Deep learning schemes besides the black-box style of their decisions. The main contribution of this paper is the development of innovative AI schemes for the analysis of CXR images on a combined dataset comprising CXRs of positive COVID-19 patients, patients with various viral and bacterial infections, as well as persons with clear chest. Instead of developing deep learning models, we opt for certain types of texture analysis that allow informative way of justifying the output decisions, This paper is a follow up on our recent work [15] , on the analysis of Chest CT Scan Images of Coronavirus (COVID-19) Patients, and building on our accumulated experience on developing Computer Aided Diagnosis for tumour analysis we shall promote our innovative idea of texture analysis in the frequency and other transform domains. The paper is structured as follows. Section 3 will be describing our proposed schemes, while Section 4 presents the experimental results and discussion. Section 5 concludes the existing study and illustrates some opportunities for future works. A computer-based system for categorising chest X-Rays scan images are expected to add innovative digital test to complement and improve performance of existing diagnostic tests for what is known as a "feature map" image representation followed by the use of a data classifier, and they differ only in where to put most efforts. In the DL approach, the emphasis is on extracting a feature map that encapsulate as many as possible image local features whereas the . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 8, 2020. . https://doi.org/10.1101/2020.05.05.20091561 doi: medRxiv preprint non-DL approach create a feature map consisting of one or more known/meaningful features. DL schemes require a large number of samples for training and the outcome suffers more than the non-DL schemes from overfitting [16] . In this paper, we continue to opt for non-DL approach. This is motivated by the success in our earlier work [15] on using CT scans to test for COVID-19 infection, and is justified by research done by the first and last authors, on classifying Utrasound Ovarian Tumours by AI texture analysis schemes ( [17] , [18] , [19] ). The innovative contribution of the current paper is to extend the practice of extracting image texture features from the Fourier domain into another known image transform domain, namely the Local Binary Patterns (LBP). Our proposed AI scheme for distinguishing CXR images of COVID-19 patients from those of patients who are uninfected (i.e. normal) or infected by other viral or bacterial diseases, will be based on extracting texture features that exhibit noticeable patterns of variation relating to COVID-19 infection progression. The scheme follows the typical framework of non-deep machine learning Scheme for image classification that works by (a) pre-process the image using adaptive winner filter for noise reduction and image quality improvement, followed by inversion, (b) apply the LBP transform, (c) select a texture type and extract the appropriate feature vector, and (d) train and test the performance of each scheme on a sufficiently large dataset of image, using the linear Support Vector Machine (SVM) classifier [20] . Image features refer to pixel intensity patterns or quantitative data values that reveal meaningful information about image pixels in terms of local and/or global variations. Feature extraction is the process of determining a feature vector representation of input images, and effectively isolating the most critical attributes relevant to the aim of the intended application. Popular image analysis features include colour, colour distributions, colour variation patterns from different domains (spatial or frequency domain) of an image [21] . Image-texture often refers to repeated patterns of pixel intensity value changes, and texture is accepted as a unifying approach to analysing a wide range of image types (natural, biometrics, satellite, etc.) for various tasks including segmentation, object detection/tracking, and classification. In our classification of CT scan images of COVID-19, [15] , we found that textures extracted from the Fourier transform (FFT-spectrum) provided the most COVID-19 distinguishing features. Fourier transform is a mathematical tool that represents data signals/images in terms . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 8, 2020. . https://doi.org/10.1101/2020.05.05.20091561 doi: medRxiv preprint of waveforms of different frequencies, and when applied to image data its frequency content is analysed into different ranges. Higher frequency ranges are associated with more significant texture components. The FFT spectrum quantifies the presence/strength of various frequencies. Based on the success of the FFT-Gabor texture in achieving high accuracy for CT based analysis, we first tested the performance of this feature for our intended scheme only to find that it performed well but not as good as its performance with CT scan. Therefore, we opted to extract texture features from the LBP transform domain. The LBP transform acts as a nonlinear filtered version of the image. Each pixel is replaced by a byte representing the order relation between the pixel value and its 8-immediate neighbours scanned in a clockwise manner starting from to top left corner. If the neighbour is ≥ than the centre pixel value, then the corresponding bit is set to 1 otherwise it is set to 0. See Figure 1 , below for an illustration. While each pixel in the FFT spectrum depends on the entire image each in the LBP image depends on the neighbouring 8 pixels. Besides being efficient, the LBP transform is invariant to illumination (i.e. unaffected by global grey-level increase/decrease). We extracted various types of texture features from the LBP transformed images, but here we shall only briefly describe the best three COVID-19 discriminating texture features. • Gabor Texture feature. Images are transformed by a set of parametrised 2-dimensional band-pass Gabor wavelet filters designed for detecting specific frequency information within an image region at a user chosen set of scales and orientation ranges [22] . The extracted Gabor-texture quantifies the frequency content in the input image within the subregions determined by the selected scales and directions. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 8, 2020. . https://doi.org/10.1101/2020.05.05.20091561 doi: medRxiv preprint with significant geometric textures. There are 58 ULBP's, and another way to form a texture feature vector representation of an input image, is a 59-bins histogram whereby where the first bin is for the non-uniform LBP's and the remaining ones are for the 58 ULBPs [22] . • Histograms of Oriented Gradient (HOG): A localised texture feature , introduced in [23] , that uses image oriented gradient information for detecting image objects. The oriented gradient of an image I at pixel point p(x,y) , is the resultant of the horizontal vector gx = (I(p(x+1,y))-I(p(x-1,y) )) and the vertical vector gy = (I(p(x,y+1))-I(p(x,y-1) This framework emulates a typical CAD process. It involves 2 key stages: training and testing. In the training stage, the feature vectors extracted from a sufficient number of samples will be used to determine the best separation of the extracted feature vectors by the chosen classifier model. Due to relatively high dimensionality of the various extracted texture feature vectors compared to the modest number of samples available for training, we choose the Linear SVM binary classifier. The output from the SVM training stage is a hyperplane that separates the feature vectors of the positive class from those of the negative class. In the testing stage, the feature vector of each testing image is extracted as usual and its signed distance from the SVM hyperplane is calculated, where predicted class is determined by the sign and actual distance is used to determine probability of the predicted class being accurate. Figure 2 , below, illustrates the full process of feature extraction applied to all image at the training as well as testing stages. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 8, 2020. Figure 3 , below, displays samples from these different sets. While the COVID-19 transition is continuing, datasets of CXR images of newly examined COVID-19 patients will be frequently updated. These updates provide us with opportunities to update, improve and hopefully soon help achieve the best performance of our schemes . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 8, 2020. For each experiment, we shall first train and test the performance of our LBP-based schemes with the remaining images of the corresponding dataset and follow the protocol described by the left side part of the self-explanatory block diagram of Figure 4 , below. At the end of this experiments, 30 SVM classifier models are output with its performance and the average of their accuracy/sensitivity/specificity rates are declared as the performance of the tested scheme. Subsequently to test any new case, we use one of the 30 SVM hyperplanes namely the model with highest accuracy rate among those for which the different between the sensitivity and specificity are minimal. This model will be used to test the performance of the scheme over the corresponding prospective testing data set, per the right-side diagram of Figure 4 . . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 8, 2020. Table 1 , below, displays the results of the 3 experiments and include the performance of the 3 different LBP-based texture COVID-19 classification schemes in terms of the average accuracy, sensitivity and specificity rates for the original training/testing experiments over the 700 images dataset. In each scenario, we also present the results of fusing the three texture schemes at the decision level using simple majority rule. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 8, 2020. . In Table 2 below, we present the performance of the saved models over the corresponding 210 prospective datasets for the 3 scenarios including the majority rule fusion results. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 8, 2020. . https://doi.org/10.1101/2020.05.05.20091561 doi: medRxiv preprint Furthermore, each of the 3 texture schemes achieves the highest discrimination between CXR of COVID-19 patients against normal CXR images, and the lowest discrimination is achieved between COVID-19 against all others. Finally, although fusion of the 3 LBP-texture schemes resulted in improved performance, unfortunately only marginally. This may be explained as a shortcoming of the primitive majority decision rule rather than fusion in general, but it also reflects the superiority of the LBP-Gabor scheme which distinguishes COVID-19 CXR's from the normal, viral bacterial, and All cases with accuracy of 97.16%, 92.70%, and 94.15% respectively. In all scenarios, the sensitivity and specificity are close to the accuracy rates. It is worth noting that extracting Gabor features in the CXR spatial domain achieved significantly lower accuracy for the same experimental data set. The results in Table 2 for the prospective tests follow the same pattern described above, but in all cases, there is a tolerable reduction in accuracy for all scenarios. This reduction can be explained by the fact none of the images in the prospective datasets is involved in the training/testing of the SVM model. The fact that all features were extracted directly from the LBP images with image enhancement and de-noising, demonstrate the success of our innovative strategy of texture analysis in the LBP transform domain. It is worth noting that, we test a larger number of LBP-texture schemes and most of the unpresented schemes achieved in-excess of 80% accuracy. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 8, 2020. . https://doi.org/10.1101/2020.05.05.20091561 doi: medRxiv preprint The paper investigated and evaluated our new innovative approach of extracting texture features from the LBP-transformed CXR images of different types of Chest infections, with the aim of developing an automatic CAD system to be used for distinguishing between COVID-19 and Non COVID-19. This paper is a follow up on our COVID-19 infection detection by texture analysis CT scan images where the FFT-Gabor scheme achieved accuracy in-excess of 95%. The choice of using CXR instead of CT (standard and Low dose) is due to the significant differences in the harmful ionising radiation dose (0.1 mSv for CXR compared to 7 mSv for standard CT and 1.5 mSv for LDCT). Consequently, CXR is a much safer option. Experimental test results over different collected sufficiently large datasets show that the corresponding SVM models built on several texture features extracted from the LBP domain achieved accuracy well above random guess (min 75%). The LBP-Gabor filters attained the highest and stable accuracy of 94.15% with equally impressive near equal sensitivity and specificity rates. Decision level fusion of all texture features resulted in average accuracies close to 95%, and higher than those by the individual features. These results demonstrate the viability and effectiveness of the LBP transform domain to extracted features for the stated task. We have also observed the possibility of using the LBP-Gabor filtered sub-images as an additional visual evidence of the predicted decision within clinical setting. Due to the promising results, we are currently testing LBP-Gabor and FFT-Gabor texture features for Lung tumour detection. Below is an illustration of the output of using the LBP-Gabor scheme that filters the LBP transformed images (2 Normal and 2 COVID-19) in 5 scales and 10 orientations. It is easy to visibly note the differences in the pattern displayed in each of the 50 small LBP-Gabor blocks between the COVID-19 and normal cases. Such displays provide the clinician and informative justification of the output decision. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 8, 2020. . https://doi.org/10.1101/2020.05.05.20091561 doi: medRxiv preprint European Centre for Disease Prevention and Control, situation update worldwide Guidance and standard operating procedure: COVID-19 virus testing in NHS laboratories Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases Stability Issues of RT-PCR Testing of SARS-CoV-2 for Hospitalized Patients Clinically Diagnosed with COVID-19 WHO warns that few have developed antibodies to Covid-19 Two Antibody Studies Say Coronavirus Infections Are More Common Than We Think. Scientists Are Mad The role of chest imaging in patient management during the COVID-19 pandemic: a multinational consensus statement from the Fleischner Society A British Society of Thoracic Imaging statement: considerations in designing local imaging diagnostic algorithms for the COVID-19 pandemic healthcare-in-europe Thoracic Imaging in COVID-19 Infection Frequency and distribution of chest radiographic findings in COVID-19 positive patients Artificial intelligence applied on chest X-ray can aid in the diagnosis of COVID-19 infection: a first experience from Lombardy Detection of Covid-19 From Chest X-ray Images Using Artificial Intelligence: An Early Review Detection of Covid-19 From Chest X-ray Images Using Artificial Intelligence: An Early Review International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity Machine Learning Analysis of Chest CT Scan Images as a Complementary Digital Test of Coronavirus (COVID-19) Patients," medRxiv Analysing the limitations of deep learning for developmental robotics OC04. 04: A machine-learning algorithm to distinguish benign and malignant adnexal tumours from ultrasound images Prospective clinical evaluation of texture-based features analysis of ultrasound ovarian scans for distinguishing benign and malignant adnexal tumors An automated technique for potential differentiation of ovarian mature teratomas from other benign tumours using neural networks classification of 2D ultrasound static images: a pilot study Support-vector networks Feature extraction & image processing for computer vision Texture Analysis based Machine Learning Algorithms For Ultrasound Ovarian Tumour Image Classification within Clinical Practices Histograms of oriented gradients for human detection