title: Towards Clinical Practice: Design and Implementation of Convolutional Neural Network-Based Assistive Diagnosis System for COVID-19 Case Detection from Chest X-Ray Images
authors: Kvak, Daniel; Bendik, Marian; Chromcova, Anna
date: 2022-03-20

One of the critical tools for early detection and subsequent evaluation of the incidence of lung diseases is chest radiography. This study presents a real-world implementation of the Carebot Covid app, which is based on a convolutional neural network (CNN) and detects COVID-19 from chest X-ray (CXR) images. Our proposed model takes the form of a simple and intuitive application. The CNN can be deployed as a STOW-RS prediction endpoint for direct implementation into DICOM viewers. The results of this study show that the deep learning model based on the DenseNet and ResNet architectures can detect SARS-CoV-2 from CXR images with a precision of 0.981, a recall of 0.962 and an AP of 0.993.

[Table: Comparison with state-of-the-art methods.]

COVID-19 can be identified on a conventional chest radiograph based on several typical patterns. The two most common patterns are ground-glass opacities and lung consolidation. Ground-glass opacities observed on a chest CT can be difficult to detect on a chest radiograph; however, they are often accompanied by a region of reticular opacities, which is more easily appreciable on a standard CXR. (Cozzi et al. 2021) Lung consolidations in COVID-19 (and other viral pneumonia) are often found multifocally; the distribution is usually bilateral and includes the lower lobes. (Jacobi et al. 2020) One of the most specific features of COVID-19 on a CXR is a peripheral and posterior distribution of air-space opacities.
In more serious stages of the disease, when patients are typically hypoxic, diffuse air-space opacities covering the majority of the lung parenchyma can be found, and the CXR pattern can resemble acute respiratory distress syndrome (ARDS). Rare or less common findings in COVID-19 are pleural effusions, pneumothorax, and lung cavitation. (Jacobi et al. 2020) During the first four days of symptomatic COVID-19 disease, the chest radiograph and chest CT scan can appear normal. Later on, when signs of COVID-19 are present on CXR, the images can closely resemble those of several types of viral pneumonia and other inflammatory lung diseases. Therefore, it is difficult for medical doctors to distinguish COVID-19 infections from other viral pneumonia using only a chest X-ray. However, the conventional chest radiograph and the chest CT scan were found to be important diagnostic tools in addition to the PCR test because of their higher sensitivity, availability, speed, and possible prediction of disease severity. (Tahir et al. 2021)

3 Proposed model architecture

In the recent past, deep learning has been very successful in a variety of visual tasks. Deep learning-based models have revolutionized computer-aided detection (CADe) by accurately analyzing, identifying and classifying patterns in medical images. (Yamashita et al. 2018) For example, deep learning has had success in mammography image classification. (Shen et al. 2019) The success of machine learning-based algorithms in CADe and the rapid growth of COVID-19 cases have created the need for an automatic detection and diagnosis system based on artificial intelligence. Recently, many researchers have proposed the use of CNN-based CADe models to detect COVID-19 from CXR. (Mahmud et al. 2020, Ozturk et al. 2020, El Asnaoui & Chawki 2021, Khan et al. 2020)

In recent years, the advent of a new generation of GPUs and the deployment of cloud [...] convolutional layers that segment the image into small pieces that can be easily processed. (Albawi et al. 2017) The outputs from these layers are aggregated in subsequent layers to reduce the size of the data and suppress noise. The sequential layers feed into a neural network, which then produces a [...] middleware. The endpoint is integrated into the Picture Archiving and Communication Systems (PACS) web system and provides services for a specific PACS system. It can be deployed in a hospital to provide the WADO service to healthcare professionals and used in regional PACS to transfer medical images and messages. (Liu et al. 2015) The use of WADO-RS allows us to build a DICOM Structured Report object that contains the requested CADe application predictions.

CNNs are often trained with a closed-world assumption, i.e., the distribution of the test data is assumed to be similar to the distribution of the training data. (Yamashita et al. 2018) However, when a model is deployed in the real world, this assumption often does not hold, leading to significant performance degradation. Although this degradation is acceptable for applications such as product recommendation, it is dangerous to use such systems in error-intolerant domains such as CADe because they can cause serious accidents. (Firmino et al. 2016) The proposed Carebot Covid app generalizes to out-of-distribution (OOD) examples whenever possible, flagging those that are beyond its capabilities and seeking human intervention. (Hendrycks & Gimpel 2016)

Figure 3. While similar solutions try to predict the diagnosis even from an inappropriate image, the Carebot Covid app successfully detects that it is not a CXR image and discards it as invalid.

One of the difficult tasks is knowing when not to make a prediction. If you ask a radiologist to diagnose an image of something outside their specialty, they should not provide a diagnosis.
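One simple way to express "knowing when not to predict" is the maximum-softmax-probability baseline described by Hendrycks & Gimpel (2016): if the highest class probability falls below a threshold, the system abstains and defers to a human reader. The following is a minimal Python sketch of that idea only. It is not the Carebot Covid app implementation, and the class names, the threshold of 0.9 and the function names are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_abstain(logits, class_names, threshold=0.9):
    """Return (label, confidence); label is None when the model should abstain.

    The threshold is a hypothetical value and would need to be tuned on
    held-out in-distribution and out-of-distribution images.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        # Low confidence: flag the image for human review instead of guessing.
        return None, probs[best]
    return class_names[best], probs[best]


# Hypothetical class names; the source does not list them in this section.
CLASS_NAMES = ["covid-19", "pneumonia", "normal"]

confident = predict_or_abstain([10.0, 0.0, 0.0], CLASS_NAMES)
uncertain = predict_or_abstain([0.10, 0.20, 0.15], CLASS_NAMES)
```

In this sketch, a sharply peaked score vector yields a label, while a nearly flat one yields `None`, which a calling service could map to an "invalid image" response rather than a diagnosis.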
For a CNN, its specialization is the classifier domain, which is defined by its training [...] were taken in posteroanterior (PA) projection or, in patients who were unable to stand, in anteroposterior (AP) projection. All images in AP projection were taken using portable X-ray machines with patients in a supine or sitting position. (Zhang et al. 2021) To create the dataset, we combined and modified several different publicly available datasets. Examples of CXR images from the dataset are shown in Figure 1 and illustrate the diversity of patient cases (including age, sex, stage of infection, and imaging projection). Patients younger than 15 years were excluded from the dataset, as were images of poor quality or incorrect projection.

[...] and JPEG encoding noise. The training data was augmented using five randomly selected variations; extension of the test dataset was not investigated.

With unbalanced dataset samples, the F1 Score becomes a critical evaluation tool because it accounts for both the False Positive and the False Negative cases produced at a given discriminating threshold. (Sokolova et al. 2006) The classification performance of the Carebot Covid app model on the multi-class problem was evaluated for each class, and the average classification performance of the model was calculated. The following table reports the accuracy, precision, recall, and F1 Score, calculated using the equation below:

$$F_1\,\mathrm{Score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}$$

For these experiments, and given the class imbalance problem, the most reliable metric is the model's average accuracy; given that this accuracy is high, the second most important metric is the recall for individual classes. (Japkowicz & Stephen 2002) This is due to the importance of correctly identifying true cases that are not COVID-19 (True Negatives).
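The metrics discussed above can be computed directly from the confusion counts of a single class treated one-vs-rest. The sketch below uses the standard textbook definitions of accuracy, precision and recall (only the F1 Score formula is given explicitly in the text), and the function name and the example counts are illustrative assumptions, not results from this study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion counts.

    tp, fp, fn, tn are the True Positive, False Positive, False Negative
    and True Negative counts for one class (one-vs-rest).
    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Count-based form of F1; equal to 2 * P * R / (P + R) when defined.
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up counts for illustration only (not taken from the paper).
example = classification_metrics(tp=8, fp=2, fn=2, tn=88)
```

With these made-up counts, accuracy is much higher than precision and recall because the negative class dominates, which illustrates why accuracy alone can be misleading on unbalanced data.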
AP (Average Precision) summarizes a precision-recall curve as the weighted mean of the precisions achieved at each threshold (Yilmaz & Aslam 2006), with the increase in recall from the previous threshold used as the weight:

$$\mathrm{AP} = \sum_{n} (R_n - R_{n-1}) \, P_n$$

where $P_n$ and $R_n$ are the precision and recall at the $n$-th threshold. The precision-recall curve shows the trade-off between precision and recall for different thresholds. (Buckland & Gey 1994) A high area under the curve represents both high recall and high precision, with high precision associated with few False Positive cases and high recall associated with few False Negative cases. (Boyd et al. 2013) The baseline classifier shown is defined as a classifier that cannot distinguish between classes and would predict a random class or the same class in all cases. (Brownlee 2018)

6 Conclusion and future work

However, data quantity is not the only element that can be central to a CADe system; the machine learning algorithm used and its parameters, as well as the characteristics of the added data, have a significant impact on the final accuracy. (Reiner et al. 2005) As CNNs grow in applicability and importance in the medical imaging domain, radiology workflows that enable AI models to access medical data are critically important. (Reiner et al. 2005) Our design avoids the limitation of usability in the real world by adapting to the radiology workflow and adhering to industry norms and standards. Although we did not evaluate the added value of the CADe to the diagnostic performance of radiologists, given the results of previous studies in which radiologists' diagnostic performance improved with deep learning algorithms, we believe that this algorithm can improve radiologists' performance even in large-scale population screening.

All persons who meet authorship criteria are listed as authors, and all authors certify that they have participated sufficiently in the work to take public responsibility for the content, including participation in the concept, design, analysis, writing, or revision of the manuscript.
Furthermore, each author certifies that this material or similar material has not been and will not be submitted to or published in any other publication. The authors hereby declare that this research article meets all applicable standards with regard to the ethics of experimentation and research integrity. The authors also declare that the text of the article complies with ethical standards and that the anonymity of the patients was respected.

References

Understanding of a convolutional neural network
Extracting possibly representative COVID-19 biomarkers from X-ray images with deep learning approach and image data related to pulmonary diseases
COVID-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks, Physical and Engineering Sciences in Medicine
Area under the precision-recall curve: point estimates and confidence intervals
How to use ROC curves and precision-recall curves for classification in Python, Machine Learning Mastery 30
The relationship between recall and precision
Features, evaluation, and treatment of coronavirus (COVID-19), StatPearls
The economic impact of the COVID-19 pandemic on radiology practices
Technical challenges of enterprise imaging: HIMSS-SIIM collaborative white paper
Ground-glass opacity (GGO): A review of the differential diagnosis in the era of COVID-19
The effectiveness of image augmentation in deep learning networks for detecting COVID-19: A geometric transformation perspective
Computer-aided detection (CADe) and diagnosis (CADx) system for lung cancer with likelihood of malignancy
DICOMweb™: Background and application of the web standard for medical imaging
Deep residual learning for image recognition
A baseline for detecting misclassified and out-of-distribution examples in neural networks
The origins and prevalence of texture bias in convolutional neural networks
Densely connected convolutional networks
A survey of medical imaging, storage and transfer techniques
Data augmentation for improving deep learning in image classification problem
A novel method for multivariant pneumonia classification based on hybrid CNN-PCA based feature extraction using extreme learning machine with CXR images
Automated detection of COVID-19 cases using deep neural networks with X-ray images
Computer-aided detection in chest radiography based on artificial intelligence: a survey
Multi-institutional analysis of computed and direct radiography: Part I. Technologist productivity
The epidemiology and pathogenesis of coronavirus disease (COVID-19) outbreak
Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation
Chest radiography in general practice: indications, diagnostic yield and consequences for patient management
COVID-19 infection localization and severity grading from chest X-ray images
The outbreak of COVID-19: An overview
Convolutional neural networks: an overview and application in radiology
Estimating average precision with incomplete and imperfect judgments