key: cord-0691065-0h1aqapb
authors: Zhu, Hongling; Lai, Jinsheng; Liu, Bingqiang; Wen, Ziyuan; Xiong, Yulong; Li, Honglin; Zhou, Yuhua; Fu, Qiuyun; Yu, Guoyi; Yan, Xiaoxiang; Yang, Xiaoyun; Zhang, Jianmin; Wang, Chao; Zeng, Hesong
title: Automatic pulmonary auscultation grading diagnosis of Coronavirus Disease 2019 in China with artificial intelligence algorithms: a cohort study
date: 2021-10-27
journal: Comput Methods Programs Biomed
DOI: 10.1016/j.cmpb.2021.106500
sha: ef80481f6cfa30156f036bfdb52e5808f492c0ec
doc_id: 691065
cord_uid: 0h1aqapb

BACKGROUND AND OBJECTIVE: Research on automatic auscultation diagnosis of COVID-19 has not yet been developed. We therefore aimed to engineer a deep learning approach for the automated grading diagnosis of COVID-19 by pulmonary auscultation analysis.
METHODS: 172 confirmed cases of COVID-19 in Tongji Hospital were divided into moderate, severe and critical groups. Pulmonary auscultation was recorded at 6-10 sites per patient with a 3M Littmann electronic stethoscope, and the data were transferred to a computer to construct the dataset. Convolutional neural networks (CNNs) were designed to classify the auscultation recordings. The F1 score, the area under the receiver operating characteristic curve (AUC ROC), sensitivity and specificity were quantified. Another 45 normal subjects served as the control group.
RESULTS: Abnormal auscultation was found in about 56.52%, 59.46% and 78.85% of the moderate, severe and critical groups, respectively. The model showed promising performance, with an averaged F1 score (0.9938, 95% CI 0.9923–0.9952), AUC ROC score (0.9999, 95% CI 0.9998–1.0000), sensitivity (0.9938, 95% CI 0.9910–0.9965) and specificity (0.9979, 95% CI 0.9970–0.9988) in identifying COVID-19 patients among the normal, moderate, severe and critical groups.
It is capable of identifying crackles, wheezes and phlegm sounds, with an averaged F1 score (0.9475, 95% CI 0.9440–0.9508), AUC ROC score (0.9762, 95% CI 0.9848–0.9865), sensitivity (0.9482, 95% CI 0.9393–0.9578) and specificity (0.9835, 95% CI 0.9806–0.9863). CONCLUSIONS: Our model is accurate and efficient in automatically diagnosing COVID-19 according to different categories, laying a promising foundation for AI-enabled auscultation diagnosis systems for lung diseases in clinical applications.

The 2019 novel coronavirus (2019-nCoV) emerged in December 2019 and caused a cluster of acute respiratory illnesses called Coronavirus Disease 2019 (COVID-19) 1. As of Dec 3, 2020, more than 89,906 cases had been confirmed and 4,642 patients had died of the disease in China, while worldwide the confirmed cases had increased sharply to more than 47,007,194 and deaths had reached 1,208,224, causing enormous harm to people around the world 2. The 2019-nCoV is a coronavirus that infects humans through the angiotensin-converting enzyme 2 (ACE2) receptor, inducing dysfunction of organs including the lungs, heart and kidneys 3. The main manifestations of patients vary, including fever, cough, dyspnea and vomiting, accompanied by leukocytopenia and lymphocytopenia 4. The virus can be transmitted and spread among humans of all ages and genders through close contact and droplets, and even through high concentrations of aerosol 5. Thus, it is a highly infectious and dangerous disease. According to the severity of COVID-19, the Chinese Center for Disease Control and Prevention (CDC) divides patients into four types, called light, moderate, severe and critical 6. As there was no effective drug or vaccine available at the time, the severity level of a patient is essential information for the treatment of COVID-19 7, 8. However, the main methods for determining the severity level of COVID-19 patients are symptoms, computed tomography (CT) and blood testing of inflammation markers.
These testing methods are radiation-based, expensive or invasive; there is not yet an efficient, cost-effective and reproducible method. Mosby's Medical and Nursing Dictionary defines the physical examination as 'an investigation of the body to determine its state of health using any or all of the techniques of inspection, palpation, percussion, auscultation, and olfaction' 9. Specifically, auscultation is the process of listening to the internal sounds of the human body through a stethoscope, and it is an effective and widely used tool for diagnosing lung diseases and abnormalities in particular. Through the stethoscope, physicians may hear, depending on the patient's disease, various abnormal lung sounds including wheezes, crackles, squawks, rhonchi and phlegm sounds, as well as normal lung sounds 10. Auscultation is thus an essential, yet simple and patient-friendly method for assessing pneumonia 11. In COVID-19, however, the enhanced protective equipment worn to prevent infection makes it difficult for doctors to auscultate patients. Moreover, because the quality of auscultation depends heavily on the surrounding environment and its diagnostic value depends on the physician's experience, both of which are prone to inherent subjectivity, manual auscultation often has low diagnostic value in lung disease diagnosis and testing. This study is the first to investigate the auscultation characteristics of COVID-19 of different severities and, by analyzing auscultation data with deep learning algorithms, to offer a practical, highly accurate, cost-effective and comprehensive automatic auscultation diagnosis framework that can serve as a reliable tool for diagnosis and prediction not only of COVID-19 but also of various pathological respiratory conditions.
172 confirmed cases of COVID-19 treated in Tongji Hospital of Huazhong University of Science and Technology, Wuhan, China, from Mar 31 to Apr 5, 2020, were included in this study. Cases were confirmed by next-generation sequencing or real-time PCR 12, or according to the clinical diagnosis criteria 13. Epidemiological, clinical, laboratory and radiological features were extracted from electronic medical records in Tongji Hospital. Throat-swab specimens were obtained after clinical remission of symptoms such as fever, cough and dyspnea, and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) RNA was detected. Routine laboratory examinations consisted of a complete blood count (Sysmex XN-2000 and its original reagent, Kobe, Japan), liver and renal function tests (Roche cobas 8000 and its original reagent, Basel, Switzerland), myocardial enzymes, and inflammatory cytokines such as hsCRP (<1 mg/L low risk, 1–3 mg/L moderate risk, >3 mg/L high risk, >10 mg/L infection and inflammation) (Roche cobas e602 and its original reagent, Basel, Switzerland). Non-contrast CT scanning (GE Healthcare, Philips, or Toshiba Medical Systems) of the thorax was performed in almost all patients in the supine position during end-inspiration. Pulmonary auscultation was recorded at 10 sites per patient, 30 seconds per site, according to the diagnostic guidelines, with a 3M Littmann electronic stethoscope. The auscultation data were transferred as digital WAVE files to the Littmann StethAssist™ software (version 1.3.230) via the stethoscope's built-in Bluetooth transceiver to construct the dataset. The auscultation recordings were diagnosed by a committee of two independent board-certified, actively practicing doctors specializing in cardiology and respirology in Tongji Hospital. The committee members first annotated the auscultation records independently, then discussed the records on which they did not reach agreement.
After comprehensive discussion, all auscultation records were annotated by consensus, providing an expert standard for artificial intelligence (AI) model evaluation.

Overview of the deep learning AI models

Three convolutional neural network (CNN) models were developed, one for each of the three lung sound classification tasks studied. In these CNN models, rectified linear units were used to prevent vanishing gradients and dropout was used to prevent overfitting. We chose Adam for stochastic optimization and used cross-entropy as the loss function. In the output layers, the softmax function was used to calculate the probabilities of the input test sample belonging to the two or four categories, respectively, and the category with the largest probability was chosen as the final classification result. The structures of the deep CNN models and related information, including the layer type, kernel stride, filter shape, and input size, are shown in appendix 1 (Supplementary Figures 1 and 2 and Tables 2 and 3). We evaluated the performance of the CNN models with several criteria, such as the prediction accuracy, the area under the curve (AUC) of the receiver operating characteristic (ROC), the F1 score (the harmonic mean of precision and sensitivity), sensitivity, and specificity, with two-sided 95% CIs. Confusion matrices were also used to evaluate whether the predictions of the CNN models were consistent with the labelled results from committee consensus. Through these different criteria, the CNN models can be assessed comprehensively. For the prediction accuracy, we evaluated the agreement between the predicted results Ŷ_s and the labelled results Y_s according to the expression Accuracy = (1/S) Σ_s 1(Ŷ_s = Y_s), where S is the number of lung sound segments to be assessed and 1(·) is the indicator function. We also calculated the macro sensitivity, specificity, precision and F1 score, using the classification results expressed as true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) rates.
The formulas are defined by the following equations 15, 19: Sensitivity = TP/(TP+FN), Specificity = TN/(TN+FP), Precision = TP/(TP+FP), and F1 = 2 × Precision × Sensitivity / (Precision + Sensitivity). We used the macro F1 score to evaluate the performance over all classes according to the expression macro F1 = (1/N) Σ_n F1_n, where N is the number of classes. In terms of clinical characteristics, all statistical analyses were performed using SPSS (Statistical Package for the Social Sciences) version 21.0 software (SPSS Inc). Categorical variables were described as frequency counts and percentages, and continuous variables were described as mean ± standard deviation (SD) when the data were normally distributed, and otherwise as median and interquartile range (IQR). Means of continuous variables were compared using independent-group t tests when the data were normally distributed; otherwise, the Mann-Whitney test was used. Proportions of categorical variables were compared using the χ² test. For unadjusted comparisons, a two-sided α of less than 0.05 was considered statistically significant. We found positive correlations between the sound signals and CT scanning in diagnosing 2019-nCoV disease (Figure 3). The original lung sound signals were segmented with a duration of 4 seconds. To handle the imbalance problem in the dataset, we used data augmentation methods 19, including noise addition and time shifting, and re-segmented the signals. For the four major lung sounds, the numbers of normal, crackle, wheeze and phlegm sound segments were 6446, 5379, 2732 and 2714, respectively. We randomly shuffled the lung sound segments, and then divided the segment data into training, validation, and testing datasets at a ratio of 6:2:2. We used the testing dataset to calculate the accuracies of the CNN models. ROC curves and AUC were plotted to assess the model discrimination of each class in the three CNN models (Table 3 and Figure 2a, b). The confusion matrices were also used to illustrate the discordance between the CNNs' predictions and the labelled results from committee consensus.
(Figure 3). Experiments were carried out on neural network models with and without depthwise separable convolutions and residual structures; we found that lightweight CNN models and residual networks perform well on audio signals (Supplementary Figures 1 and 2). Specifically, the confusion matrix shows that it is more difficult to distinguish the normal class from the crackle class. The main reason the performance index is worse than for the first two CNN models is that some of the crackle signals were wrongly classified into the normal category. Indeed, crackles may sometimes occur in healthy subjects during a deep inspiration 21, which may explain why the prediction accuracy of the third CNN model is lower than that of the first two. To further evaluate the proposed model, we also used the international scientific challenge respiratory sound database (ICBHI 2017) 22 as an independent dataset. For the classification of normal and abnormal respiratory sounds, the accuracy, sensitivity and specificity were 82.59%, 97.10% and 80.59%, respectively, while for the classification of four kinds of respiratory sounds, the accuracy, sensitivity and specificity were 91.59%, 97.10% and 90.83%, respectively. These test results are comparable with the state-of-the-art research 19. In our study, we found that the auscultation paradigm represented by end-to-end automatic deep learning shows high potential as a new and efficient approach to the auxiliary diagnosis of 2019-nCoV with exact severity classifications, and reveals the pathophysiological condition in a new way compared with the radiological characteristics of traditional CT. The stethoscope was one of the first medical diagnostic instruments ever used in clinical practice, and auscultation of the respiratory system is broadly utilized since it is low-cost, noninvasive and safe.
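As a concrete illustration of the pipeline described above (4-second segmentation, augmentation by noise addition and time shifting, a shuffled 6:2:2 split, and macro-averaged metrics computed from one-vs-rest TP/TN/FP/FN counts), here is a minimal sketch in Python with NumPy. The sampling rate, augmentation magnitudes, and toy labels are illustrative assumptions, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
FS = 8000          # assumed sampling rate in Hz (not stated in the paper)
SEG_LEN = 4 * FS   # 4-second segments, as in the paper

def segment(signal, seg_len=SEG_LEN):
    """Cut a 1-D recording into non-overlapping fixed-length segments."""
    n = len(signal) // seg_len
    return [signal[i * seg_len:(i + 1) * seg_len] for i in range(n)]

def add_noise(seg, scale=0.005):
    """Augmentation: add low-amplitude Gaussian noise (scale is illustrative)."""
    return seg + scale * rng.standard_normal(len(seg))

def time_shift(seg, max_shift=FS // 2):
    """Augmentation: circularly shift the segment by up to half a second."""
    return np.roll(seg, rng.integers(-max_shift, max_shift + 1))

def split_622(items):
    """Shuffle, then split into training/validation/testing at a 6:2:2 ratio."""
    idx = rng.permutation(len(items))
    n_tr, n_va = int(0.6 * len(items)), int(0.2 * len(items))
    return ([items[i] for i in idx[:n_tr]],
            [items[i] for i in idx[n_tr:n_tr + n_va]],
            [items[i] for i in idx[n_tr + n_va:]])

def macro_metrics(y_true, y_pred, n_classes):
    """One-vs-rest TP/TN/FP/FN per class, then macro-averaged sensitivity,
    specificity, precision and F1 (harmonic mean of precision and sensitivity)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    sens = spec = prec = f1 = 0.0
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        tn = np.sum((y_pred != c) & (y_true != c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        se = tp / (tp + fn) if tp + fn else 0.0
        sp = tn / (tn + fp) if tn + fp else 0.0
        pr = tp / (tp + fp) if tp + fp else 0.0
        sens, spec, prec = sens + se, spec + sp, prec + pr
        f1 += 2 * pr * se / (pr + se) if pr + se else 0.0
    return sens / n_classes, spec / n_classes, prec / n_classes, f1 / n_classes

# a dummy 30-second recording (one auscultation site) -> 7 whole 4-s segments
recording = rng.standard_normal(30 * FS)
segs = segment(recording)
augmented = segs + [add_noise(s) for s in segs] + [time_shift(s) for s in segs]
train, val, test = split_622(augmented)
print(len(segs), len(augmented), len(train), len(val), len(test))  # 7 21 12 4 5

# toy labels for 4 classes (normal, crackle, wheeze, phlegm)
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 0, 3, 3]
acc = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
print(acc)  # 0.75
print(round(macro_metrics(y_true, y_pred, 4)[3], 3))  # macro F1 -> 0.742
```

Macro averaging weights every class equally, which matters for an imbalanced set like this one, where the wheeze and phlegm classes have roughly half as many segments as the normal class.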
However, correctly distinguishing auscultated lung sounds related to different lung diseases is an art that requires rigorous practice and experience. Several recent investigations have aimed to identify respiratory sound signals using machine learning or deep learning, making automatic analysis of lung sounds feasible 10, 15, 16, 19, 21, 23, 24. However, these studies are limited to common pulmonary diseases without any medical or physiological parameters of the patients 25, 26, weakening the clinical influence and applicability of their AI models. Auscultation has advantages in terms of availability, cost, comfort, and diagnostic potential. Unlike CT and X-rays, which emit ionizing radiation, it is safe for all patients; unlike MRI, it is available to those with metal implants and pacemakers 30. It is also accessible, portable equipment, available whenever and wherever patients need a check of their lung condition, without taking off the ventilator or going to a dedicated CT room. In our study, we found that the patients in the critical group were so sick that they could not tolerate CT scanning in the dedicated CT rooms, and instead underwent X-ray examination roughly every 3 days, or even daily as their condition worsened. This causes considerable costs and radiation harm to the patients, as well as less accurate results from X-rays. The contrast between auscultation and the different radiological tests implies that auscultation has the desired sensitivity and superior features for clinical use, indicating the value of auscultation in disease diagnosis and treatment. This study had some limitations. The number of patients in each classification was limited.
Because of the infectious characteristics of 2019-nCoV, our dataset consists of lung sounds acquired from patients in an isolation ward, so the auscultation was inevitably recorded with several kinds of noise, such as talking, cell phones ringing and air conditioners running, which made the lung sounds impure and could introduce interference for the model. However, this also keeps the auscultation close to real clinical conditions. Some fallibility is inevitable with a large sample size, even with a gold standard established at the expert level by the committee members. To the best of our knowledge, our study is the first to systematically identify lung sounds in COVID-19 with deep learning methods. Our study shows that the deep learning algorithm performs with high precision in distinguishing abnormal lung sounds and, moreover, performs with valuable precision in identifying different kinds of abnormal lung sounds. This is the first time that a deep learning approach has been used to systematically diagnose almost all classifications of COVID-19, resulting in an end-to-end computerized, AI-based diagnosis model. Given the special characteristics of COVID-19, a serious, widely spreading viral pneumonia without targeted medicine or vaccines, we foresee this deep learning approach showing promising use in infectious diseases and newly emerging diseases. Meanwhile, this AI-based diagnosis system can also be used in telemedicine, especially for rural areas and hospitals where experienced doctors are scarce. Our models, with their cost-effectiveness and efficiency in classifying COVID-19 lung sounds, could potentially be used to help real-time diagnosis and observation through wearable devices that monitor pulmonary and cardiovascular conditions via lung and cardiac auscultation.
In principle, our deep learning approach shows a superior capability, in accuracy, efficiency and precision, to classify COVID-19 at different severity levels and to classify different kinds of abnormal lung sounds, supporting the deployment of automatic, computerized, AI-based decision-support systems in clinical environments. It provides clinicians with useful early prognostic information to facilitate pretreatment risk stratification for COVID-19, and guides medical staff to conduct more intensive surveillance and treatment of patients at high risk of severe illness to improve outcomes.

References

1. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study.
2. Cardiovascular Implications of Fatal Outcomes of Patients With Coronavirus Disease 2019 (COVID-19).
3. Single-cell RNA-seq data analysis on the receptor ACE2 expression reveals the potential risk of different human organs vulnerable to 2019-nCoV infection.
4. Clinical Characteristics of 138 Hospitalized Patients With 2019 Novel Coronavirus-Infected Pneumonia in Wuhan.
5. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet.
6. Diagnosis and treatment of novel coronavirus pneumonia in China (trial version).
7. Biological Product Development Strategies for Prevention and Treatment of Coronavirus Disease.
8. Epidemiology, Treatment, and Epidemic Prevention and Control of the Coronavirus Disease 2019: a Review. Sichuan Da Xue Xue Bao Yi Xue Ban.
9. Mosby's medical and nursing dictionary.
10. Unwrapping the phase portrait features of adventitious crackle for auscultation and classification: a machine learning approach.
11. Mosby's Dictionary of Medicine, Nursing & Health Professions, seventh edition. Mosby. ISBN 9780723433934.
12. Clinical management of severe acute respiratory infection when novel coronavirus (2019-nCoV) infection is suspected: interim guidance.
13. CT Imaging of the 2019 Novel Coronavirus.
14. World Medical Association Declaration of Helsinki: ethical principles for medical research involving human subjects.
15. A Lightweight CNN Model for Detecting Respiratory Diseases From Lung Auscultation Sounds Using EMD-CWT-Based Hybrid Scalogram.
16. A Smart Digital Stethoscope for Detecting Respiratory Disease Using bi-ResNet Deep Learning Algorithm.
17. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.
18. Deep Residual Learning for Image Recognition.
19. Deep Neural Network for Respiratory Sound Classification in Wearable Devices Enabled by Patient Specific Model Tuning.
20. Characteristic of breath sounds and adventitious respiratory sounds.
21. Standardization of computerized respiratory sound analysis.
22. ICBHI 2017 Challenge.
23. An automated computerized auscultation and diagnostic system for pulmonary diseases.
24. Classification of lung sounds using convolutional neural networks.
25. Automatic heart and lung sounds classification using convolutional neural networks. Signal and Information Processing Association Annual Summit and Conference (APSIPA).
26. Lung sounds classification using convolutional neural networks.
27. Ultrasound and stethoscope as tools in medical education and practice: considerations for the archives. Advances in Medical Education and Practice.
28. Physical examination: a revered skill under scrutiny.
29. Don't throw the stethoscope away! European Heart Journal.
30. Use of advanced imaging techniques during visits to emergency departments: implications, costs, patient benefits/risks.

All authors declare no competing interests. This work is partially supported by the National Natural Science Foundation of China.