title: Pay Attention to the Cough: Early Diagnosis of COVID-19 using Interpretable Symptoms Embeddings with Cough Sound Signal Processing authors: Pal, Ankit; Sankarasubbu, Malaikannan date: 2020-10-06 The COVID-19 (coronavirus disease 2019) pandemic caused by SARS-CoV-2 has led to a treacherous and devastating catastrophe for humanity. At the time of writing, no specific antiviral drugs or vaccines are recommended to control infection transmission and spread. The current diagnosis of COVID-19 is done by Reverse-Transcription Polymerase Chain Reaction (RT-PCR) testing. However, this method is expensive, time-consuming, and not easily available in straitened regions. To overcome these limitations, an interpretable COVID-19 diagnosis AI framework is devised and developed based on cough sound features and symptoms metadata. The proposed framework's performance was evaluated using a medical dataset containing symptoms and demographic data of 30,000 audio segments and 328 cough sounds from 150 patients across four cough classes (COVID-19, Asthma, Bronchitis, and Healthy). Experimental results show that the model captures robust feature embeddings to distinguish COVID-19 patient coughs from several types of non-COVID-19 coughs, with a specificity of 95.04 ± 0.18% and an accuracy of 96.83 ± 0.18%, all the while maintaining interpretability. The novel coronavirus (COVID-19) disease has affected over 31.2 million lives, claiming more than 1.02 million fatalities globally, representing an epoch-making global crisis in health care. At the time of writing, no specific antiviral drugs or vaccines are recommended to control infection transmission and spread.
The current diagnosis of COVID-19 is made by Reverse-Transcription Polymerase Chain Reaction (RT-PCR) testing, which utilizes several primer-probe sets depending on the assay used (Emery et al., 2004). However, this method is time-consuming, expensive, and not easily available in straitened regions due to a lack of adequate supplies, healthcare facilities, and medical professionals. A low-cost, rapid, and easily accessible testing solution is needed to increase diagnostic capability and devise treatment plans. Computed Tomography (CT) helps clinicians perform complete patient assessments and describe the specific characteristic manifestations in the lungs associated with COVID-19 (Li et al., 2020), hence serving as an efficient tool for early screening and diagnosis of COVID-19. In analyzing medical images, AI-based methods have shown great success (Du et al., 2018; Heidari et al., 2018, 2020). These methods are scalable, automatable, and easy to apply in clinical environments (Ahmed et al., 2020; Shah et al., 2019). Significant attempts have been made to use x-ray images for automatic diagnosis of COVID-19 (Pereira et al., 2020; Narin et al., 2020; Zhang et al., 2020; Apostolopoulos and Mpesiana, 2020). Studies dealing with the classification of COVID-19 show promising results on this task. However, the work of Cohen et al. (2020) examines the limitations of classifying x-ray images, since the network may learn features unique to the dataset rather than features unique to the disease. Moreover, CT scans of COVID-19 display imaging characteristics similar to those of other pneumonia types, making the two difficult to distinguish. In addition, CT-based methods can be integrated only within the healthcare system to help clinical doctors, radiologists, and specialists detect COVID-19 patients using chest CT images; unfortunately, an individual cannot utilize this method at home.
To obtain a CT scan image and report, one must visit a well-equipped clinical facility or diagnostic center, which may increase the risk of exposure to the virus. According to official WHO and CDC reports, the four primary symptoms of COVID-19 are dry cough, fever, tiredness, and difficulty breathing (CDC). Cough is the most common of these, as it is one of the early symptoms of respiratory tract infections; studies show that it occurs in 68% to 83% of people presenting for medical examination. Cough classification is usually carried out manually during a physical examination, where the clinician may listen to several episodes of voluntary or natural coughs to classify them. This information is crucial for diagnosis and treatment. In previous studies, several methods using speech features have been proposed to automate the classification of different cough types. In the study published by Knocikova et al. (2008), the sound of voluntary cough in patients with respiratory diseases was investigated. Later, in 2015, Guclu et al. (2015) published a study on the analysis of asthmatic breathing sounds. These studies utilized the wavelet transformation, a signal processing technique generally used on non-stationary signals. In a study by Swarnkar et al. (2012), a Logistic Regression model was utilized to classify dry and wet coughs from pediatric patients with different respiratory illnesses. For pertussis cough classification, the performance of three separate classifiers was analyzed in the research of Parker et al. (2013). Several AI-based approaches, motivated by prior work, have been presented to detect patients with COVID-19 using cough sound analysis. Deshpande and Schuller (2020) give an overview of audio, signal, speech, and NLP processing for COVID-19, and (Orlandic et al., 2020; Brown et al., 2020; Sharma et al., 2020) have collected crowdsourced datasets of respiratory sounds and shared findings over subsets of the data. Imran et al.
(2020) and Furman et al. (2020) performed similar analyses on cough data and achieved good accuracy. Most studies feed short-term magnitude spectrograms computed from cough sound data to a convolutional neural network (CNN). However, these methods have the following limitations:

• Ignoring domain-specific sound information. Cough is a non-stationary acoustic event. When a CNN is based only on a spectrogram input, some important domain-specific characteristics of cough sounds (beyond the spectrogram) might be overlooked in the feature space.

• Using cough features only. These methods exploit cough features alone, ignoring patient characteristics, medical conditions, and symptoms data. Both cough features and other symptoms accompanied by demographic data are indicative of COVID-19 infection: the former carries vital information about the respiratory system and the pathologies involved, while the latter encodes patient characteristics, signs, and health conditions (fever, chest pain, dyspnea). However, their existence alone is not a precise enough marker of the disease. Therefore, determining which symptoms (besides cough) presented by suspected cases are the best predictors of a positive diagnosis would be useful for making rapid decisions on treatment and isolation needs.

• Lack of interpretability. In AI research, a model is not limited to accuracy and sensitivity reports; it is also expected to describe the underlying reasons for its predictions and enhance medical understanding and knowledge. Clinical adoption of an algorithm depends on two main factors: its clinical usefulness and its trustworthiness. When a prediction does not directly answer a particular clinical question, its use is limited.

To overcome the limitations of existing methods, a novel interpretable COVID-19 diagnosis AI framework is proposed in this study, which uses symptoms and cough features to accurately classify COVID-19 cases from non-COVID-19 cases.
A three-layer Deep Neural Network model is used to generate cough embeddings from the handcrafted signal processing features, and symptoms embeddings are generated by a transformer-based self-attention network called TabNet (Arik and Pfister, 2020). Finally, the prediction score is obtained by concatenating the Symptoms Embeddings with the Cough Embeddings, followed by a Fully Connected layer. In a sensitive discipline such as healthcare, where every decision carries extended, long-term responsibility, wrong predictions can lead to critical judgments in life-and-death situations. This study illustrates that the framework is not limited to accurate predictions; rather, it explains the underlying reasons for them and answers the question of why the model predicts what it does. The contributions of the paper can be summarized as follows:

• A novel explainable and interpretable COVID-19 diagnosis framework based on deep learning (AI) that uses information from symptoms and cough signal processing features. The proposed solution is a low-cost, rapid, and easily accessible testing solution to increase diagnostic capability and help devise treatment plans in areas where adequate supplies, healthcare facilities, and medical professionals are not available.

• An interpretable diagnosis solution capable of explaining, and establishing a dialogue with its end-users about, the underlying process, resulting in transparent, human-interpretable outputs.

• Three binary classification tasks and one multi-class classification task are developed in this study. Task 1 uses only cough features to classify between COVID-19 positive and COVID-19 negative. In Task 2, only demographic and symptoms data are used, and in Task 3, both types of information are used, which helps the model learn deeper relationships between the temporal acoustic characteristics of cough sounds and the symptoms features, and hence perform better.
In Task 4, multi-class classification is performed to demonstrate the proposed model's effectiveness in classifying between four cough types: Bronchitis, Asthma, COVID-19 Positive, and COVID-19 Negative.

• An in-depth analysis is performed for different cough sounds. The observations and findings distinguishing COVID-19 cough from other types of cough are presented.

• A Python module was developed to extract robust cough features from raw cough sounds. This module is open-sourced to help users, developers, and researchers who are not necessarily experts in domain-specific cough feature extraction, contributing to real-time cough-based research applications and better mobile health solutions.

• This study hence provides a medically-vetted approach.

The model architecture consists of two subnetwork components, the Symptoms Embeddings and the Cough Embeddings, which process data from different modalities. Symptoms Embeddings capture the hidden features of patient characteristics, diagnoses, and symptoms. A feature that has been masked heavily has low importance for the model, and vice versa. Averaged attention masks are used to explain the overall importance of the symptoms features. TabNet stacks subsequent decision steps (DS) one after the other. Decision steps are composed of a Feature Transformer (FT, Appendix B.0.2), an Attentive Transformer (AT, Appendix B.0.3), and feature masking. Symptoms features are mapped into D-dimensional trainable embeddings q ∈ R^{B×D}, where B is the batch size and D is the feature dimension. Batch normalization (BN) is performed across the whole batch. For soft feature selection, and to explain feature importance, TabNet uses a learnable mask M[i] ∈ R^{B×D}. Each decision step has its own mask and selects its own features; steps are sequential, so the second step requires the first to be finished.
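To make the sequential masking concrete, the decision-step flow described above can be sketched in numpy. This is an illustrative simplification, not the authors' implementation: the fixed weight matrix stands in for a learned attentive transformer, softmax stands in for TabNet's sparsemax, and all names are hypothetical.

```python
import numpy as np

def softmax(x):
    """Stand-in for sparsemax: normalizes logits into a probability mask."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def decision_step(q, prior, weight):
    """One illustrative TabNet-style decision step (hypothetical sketch).

    q      : (B, D) batch-normalized symptom features
    prior  : (B, D) prior scales tracking how much each feature was already used
    weight : (D, D) stand-in for the learned attentive-transformer weights
    """
    logits = q @ weight
    mask = softmax(logits * prior)    # feature-selection mask M[i]
    masked = mask * q                 # element-wise selection: M[i] . q
    new_prior = prior * (1.0 - mask)  # heavily used features are down-weighted
    return masked, mask, new_prior
```

Because each step's prior depends on the masks of all earlier steps, later decision steps are pushed to attend to features the earlier steps did not use, which is why the steps must run sequentially.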
The masked output at each decision step is obtained by multiplying the mask with the normalized symptoms features q: M[i] · q. The normalized domain features are passed to the FT, and a split block divides the processed representation into two chunks, d[i] ∈ R^{B×n_d} and a[i] ∈ R^{B×n_a}, where n_d and n_a are the sizes of the decision layer and the attention bottleneck respectively. After the n-th step, two outputs are produced:

• Mask outputs are aggregated from all the decision steps to provide the model interpretability result. Figure 7 and Figure 8 show the interpretability results.

• The final output is a linear combination of all the summed decision steps, similar to a decision-tree result.

A pairwise dot product is computed between the output d_out and an FC layer to obtain the Symptoms Embeddings S_e ∈ R^{B×F}, where B is the batch size and F is the output dimension. TabNet uses a regularized sparse entropy loss to control the sparsity of the attentive features; the regularization factor is an aggregation of the attention masks over steps, where ε is a small positive value added for numerical stability.

Cough Embeddings learn and capture deeper features in the temporal acoustic characteristics of cough sounds. Before extracting cough features and feeding them to a Deep Neural Network (DNN), some preprocessing of the raw audio data is needed. Each cough recording was downsampled to 16 kHz, and normalization was applied to the cough signal level with a target amplitude of -28.0 dBFS. The normalized recordings were split into cough segments based on a silence threshold. Let s[t] be the discrete-time cough sound recording; see Appendix C for detailed information about our data collection process. The final feature matrix was grouped into chunks of n consecutive feature vectors, and a total of 44 cough features were extracted by taking the mean and standard deviation of all the cough features in each chunked matrix.
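The preprocessing steps above (dBFS normalization, silence-based segmentation, and mean/std chunk summarization) can be sketched roughly as follows. This is an assumed reconstruction: the silence threshold, minimum segment length, and chunk size are illustrative placeholders, not the paper's exact values.

```python
import numpy as np

TARGET_SR = 16_000   # recordings downsampled to 16 kHz (as in the paper)
TARGET_DBFS = -28.0  # target normalization level (as in the paper)

def normalize_dbfs(signal, target_dbfs=TARGET_DBFS):
    """Scale a signal so its RMS level matches the target dBFS."""
    rms = np.sqrt(np.mean(signal ** 2))
    target_rms = 10 ** (target_dbfs / 20.0)
    return signal * (target_rms / max(rms, 1e-12))

def split_on_silence(signal, threshold=0.02, min_len=1600):
    """Split a recording into segments wherever the amplitude stays below a
    silence threshold (illustrative rule, not the authors' exact one)."""
    active = np.abs(signal) > threshold
    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            if i - start >= min_len:
                segments.append(signal[start:i])
            start = None
    if start is not None and len(signal) - start >= min_len:
        segments.append(signal[start:])
    return segments

def chunk_features(feature_matrix, n=4):
    """Group consecutive per-frame feature vectors into chunks of n and
    summarize each chunk by its mean and standard deviation, mirroring the
    paper's mean/std summary over chunked matrices."""
    chunks = [feature_matrix[i:i + n] for i in range(0, len(feature_matrix), n)]
    return np.array([np.concatenate([c.mean(axis=0), c.std(axis=0)])
                     for c in chunks])
```

Summarizing each chunk by mean and standard deviation doubles the per-frame feature dimension, which is consistent with the 44-dimensional summary the paper describes.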
The final feature matrix is then fed to a three-layer Deep Neural Network (DNN) with ReLU activation functions to obtain the final Cough Embeddings C_e ∈ R^{B×F}, where B is the batch size and F is the output dimension. In the multi-class classification setting, a categorical cross-entropy loss function is used to compute the loss of the Cough Embeddings, where N is the number of classes in the dataset and ŷ_i denotes the i-th predicted class in the model output. The prediction score is obtained by concatenating the Symptoms Embeddings with the Cough Embeddings, followed by an FC layer. Figure 1 shows the overall structure of the proposed architecture. The total loss is then calculated as a weighted combination of the component losses, where α is a small constant value that balances the contribution of the different losses.

In this section, a comprehensive evaluation is carried out to investigate the results of four clinical classification tasks. Based on the collected dataset, the model was trained on the following combinations of features:

• Task 1, using cough data only. In this experimental setup, only cough features were utilized from the collected dataset to train the model and distinguish between COVID-19 positive and negative cases. Cough features were extracted using the signal processing pipeline, as described in Section 1.

It is demonstrated that the proposed framework benefits from the high accuracy and generality of deep neural networks and from TabNet's interpretability, which is crucial for AI-empowered healthcare. Figure 7 and Figure 8 visualize the symptoms of a healthy and a COVID-19 infected individual, showing that the model comprehends the hidden patterns in symptoms data and their relationship with cough sounds. To intuitively show the quality of the representations, the cough features (via t-SNE) and the symptoms correlation matrix are visualized in Figure 9 and Figure 10. An in-depth analysis is conducted for different cough sounds diagnosed with different diseases based on the collected data.
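A minimal numpy sketch of the fusion step described above, assuming placeholder weights: the two embeddings are concatenated, passed through a fully connected layer with a softmax, and scored with categorical cross-entropy. This illustrates the data flow only; the real model learns its weights end to end.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_and_predict(symptom_emb, cough_emb, W, b):
    """Concatenate Symptoms and Cough Embeddings and apply an FC layer,
    mirroring the paper's fusion step (W and b are placeholder weights)."""
    fused = np.concatenate([symptom_emb, cough_emb], axis=1)  # (B, 2F)
    return softmax(fused @ W + b)                             # (B, n_classes)

def cross_entropy(probs, labels):
    """Categorical cross-entropy over integer class labels."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))
```

In training, this classification loss would be combined with TabNet's sparsity regularizer via the small weighting constant α mentioned in the text.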
Different types of cough samples are visualized in Figure 6. Based on the analyzed data, the findings are as follows. A coughing sound consists of three phases: Phase 1, the initial burst; Phase 2, noisy airflow; and Phase 3, glottal closure. It is observed that in the cough samples of healthy individuals, Phase 3 finishes with vocal fold activity. Figure 2 shows that after Phase 1, i.e., the initial burst, the energy levels are high at higher frequencies. It is observed that a COVID-19 cough is continuous, with its energy distribution spread across frequencies and preceded by a short catch. Analyzing the mean energy distribution of many COVID-19 cough sounds shows that the energy distribution is high in Phase 2 and Phase 3. The abnormal oscillatory motion in the vocal folds may be produced by altered aerodynamics over the glottis due to respiratory irritation. Figure 5 shows the result.

Mass COVID-19 monitoring has proved essential for governments to successfully track the disease's spread, isolate infected individuals, and effectively "flatten the curve" of the infection over time. In the wake of the COVID-19 pandemic, many countries cannot conduct tests rapidly enough; hence an alternative could prove very useful. This study brings forth a low-cost, accurate, and interpretable AI-based diagnostic tool for COVID-19 screening that incorporates demographic, symptoms, and cough features and achieves high mean accuracy and precision in the mentioned tasks. This achievement supports large-scale COVID-19 disease screening, including in areas where healthcare facilities are not easily accessible. Data collection is being performed daily. Future experiments will incorporate different voice data features such as breathing sounds, counting sounds (natural voice samples), and sustained vowel phonation. The results demonstrate transparent, interpretable, multi-modal learning in cough classification research.

A.1. Mel-frequency cepstral coefficients. The cepstral coefficients are computed from the log filter bank energies as

c_i = Σ_{m=1}^{M} log(E(m)) · cos(iπ(m − 0.5)/M), i = 1, 2, …, l, (10)

where l denotes the cepstrum order, and E(m) and M are the filter bank energies and the total number of mel-filters respectively.

A.2. Log energy. To calculate the log energy of each sub-segment, we compute log(Σ_t y_i[t]^2 + ε), where ε is a minimal positive value.

A.3. Zero-crossing rate. The ZCR is used to calculate the number of times a signal crosses the zero axis. To detect the cough signal's periodic nature, we compute the number of zero crossings for each sub-segment as Σ_t Π[y_i[t] · y_i[t − 1] < 0], where Π[A] is an indicator function that equals 1 if the condition A is true and 0 otherwise.

A.4. Skewness. Skewness is the third-order moment of a signal, which measures the asymmetry of a probability distribution: the mean of ((y_i[t] − µ)/σ)^3, where µ and σ are the mean and standard deviation of the sub-segment y_i[t] respectively.

A.5. Entropy. We compute the entropy for each sub-segment of the cough signal to capture differences between signal energy distributions.

A.6. Formant frequencies. In the analysis of speech signals, formant frequencies are used to capture the resonance characteristics of the human vocal tract. We compute the formant frequencies by peak-picking the Linear Predictive Coding (LPC) spectrum, using the Levinson-Durbin recursive procedure to select the parameters for a 14th-order LPC model. The first four formant frequencies (F1-F4) are enough to discriminate various acoustic features of the airways.

A.7. Kurtosis. Kurtosis is the fourth-order moment of a signal, which measures the peakiness or heaviness of the cough sub-segment's probability distribution: the mean of ((y_i[t] − µ)/σ)^4, where µ and σ are the mean and standard deviation of the sub-segment y_i[t] respectively.

A.8. Fundamental frequency. To estimate the fundamental frequency (F0) of each cough sub-segment, we used the center-clipped auto-correlation method, removing the formant structure from the auto-correlation of the cough signal.
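Several of the per-sub-segment features described above can be computed directly from the samples; the following numpy sketch (function names are ours) implements the zero-crossing count, log energy, skewness, and kurtosis.

```python
import numpy as np

def zcr(y):
    """Zero-crossing count: number of sign changes across the sub-segment."""
    return int(np.sum(np.abs(np.diff(np.sign(y))) > 0))

def log_energy(y, eps=1e-10):
    """Log energy of a sub-segment; eps guards against log(0)."""
    return float(np.log(np.sum(y ** 2) + eps))

def skewness(y):
    """Third standardized moment: asymmetry of the amplitude distribution."""
    mu, sigma = y.mean(), y.std()
    return float(np.mean(((y - mu) / sigma) ** 3))

def kurtosis(y):
    """Fourth standardized moment: peakiness of the amplitude distribution."""
    mu, sigma = y.mean(), y.std()
    return float(np.mean(((y - mu) / sigma) ** 4))
```

For a perfectly symmetric sub-segment the skewness is zero, while a heavy-tailed burst (such as a cough's initial explosive phase) drives the kurtosis up, which is why these moments help separate cough types.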
GLU layers are concatenated with each other after being multiplied by a constant scaling factor (√0.5). The Feature Transformer processes the filtered features by looking at all of the assessed symptoms features and deciding which ones indicate which class. The Attentive Transformer is the other main component of the TabNet architecture. It performs sparse instance-wise feature selection based on the learned symptoms data and directs the model's attention by forcing sparsity onto the feature set, focusing on specific symptoms features only. It is a powerful way of prioritizing which features to look at in each decision step; an FC layer handles the learning part of this block. TabNet uses Sparsemax (Martins and Astudillo, 2016), an alternative to the softmax function, for soft feature selection. The sparsemax activation function is differentiable and supports both forward and backward propagation. Due to its projection and thresholding, sparsemax produces sparse probabilities, leading to a selective and more compact attention focus on the symptoms features.

All the COVID-19 data utilized in this study were obtained from 200 subjects at Dr. Ram Manohar Lohia Hospital, New Delhi, India. Of these, 100 were confirmed positive by COVID-19 reverse transcription-polymerase chain reaction (RT-PCR) results. The Clinical Trials Registry-India (CTRI) approved the study protocols and the patient recruitment procedure. After data preprocessing, 50 of the 200 samples were discarded due to low data quality. Aside from the COVID-19 and healthy data, we also collected bronchitis and asthma coughs from different online and offline sources. The data collection personnel followed all clinical safety measures and inclusion-exclusion criteria, and the cough sounds, breathing sounds, counting from 1 to 10 (natural voice samples), sustained phonation of the 'a,' 'e,' and 'o' vowels, and demographic and symptoms data, such as fever, headache, sore throat, or any other medical conditions, were collected at the same time.
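The sparsemax activation described above can be sketched in a few lines of numpy, following the closed-form projection of Martins and Astudillo (2016). Unlike softmax, it can assign exactly zero probability to features, which is what yields the sparse masks.

```python
import numpy as np

def sparsemax(z):
    """Sparsemax: Euclidean projection of a logit vector onto the
    probability simplex (Martins & Astudillo, 2016). Entries below the
    computed threshold tau are clipped to exactly zero."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]            # logits in descending order
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    # support: largest k with 1 + k * z_(k) > cumulative sum of top-k logits
    support = 1 + k * z_sorted > cumsum
    k_max = k[support][-1]
    tau = (cumsum[k_max - 1] - 1) / k_max  # simplex-projection threshold
    return np.maximum(z - tau, 0.0)
```

For well-separated logits such as [2.0, 1.0, 0.1], sparsemax concentrates the entire probability mass on the largest entry, whereas softmax would still spread small probabilities over the others; this thresholding behavior is what makes TabNet's feature masks selective.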
The average interaction time with each subject was 10-12 minutes.

References

• Artificial intelligence with multi-functional machine learning platform development for better healthcare and precision medicine
• Covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks
• Tabnet: Attentive interpretable tabular learning
• Exploring automatic diagnosis of covid-19 from crowdsourced respiratory sound data
• On the limits of cross-domain generalization in automated x-ray prediction
• Language modeling with gated convolutional networks
• An overview on audio, signal, speech, & language processing for covid-19
• Bin Zheng, and Yuchen Qiu. Classification of tumor epithelium and stroma by exploiting image features learned by deep convolutional neural networks
• Real-time reverse transcription-polymerase chain reaction assay for sars-associated coronavirus
• The remote analysis of breath sound in covid-19 patients: A series of clinical cases
• Classification of asthmatic breath sounds by using wavelet transforms and neural networks
• Prediction of breast cancer risk using a machine learning approach embedded with a locality preserving projection algorithm
• Development and assessment of a new global mammographic image feature analysis scheme to predict likelihood of malignant cases
• Train longer, generalize better: closing the generalization gap in large batch training of neural networks
• Ai4covid-19: Ai enabled preliminary diagnosis for covid-19 from cough samples via an app
• Wavelet analysis of voluntary cough sound in patients with respiratory diseases
• Using artificial intelligence to detect covid-19 and community-acquired pneumonia based on pulmonary ct: Evaluation of the diagnostic accuracy
• Martins and Ramón Fernandez Astudillo. From softmax to sparsemax: A sparse model of attention and multi-label classification
• Automatic detection of coronavirus disease (covid-19) using x-ray images and deep convolutional neural networks
• The coughvid crowdsourcing dataset: A corpus for the study of large-scale cough analysis algorithms. ArXiv, abs
• Detecting paroxysmal coughing from pertussis cases using voice recognition technology
• Covid-19 identification in chest x-ray images on flat and hierarchical classification scenarios
• Artificial intelligence and machine learning in clinical development: a translational perspective
• Coswara - a database of breathing, cough, and voice sounds for covid-19 diagnosis
• Automated algorithm for wet/dry cough sounds classification
• Viral pneumonia screening on chest x-ray images using confidence-aware anomaly detection