title: Multi-modal Point-of-Care Diagnostics for COVID-19 Based On Acoustics and Symptoms
authors: Chetupalli, Srikanth Raj; Krishnan, Prashant; Sharma, Neeraj; Muguli, Ananya; Kumar, Rohit; Nanda, Viral; Pinto, Lancelot Mark; Ghosh, Prasanta Kumar; Ganapathy, Sriram
date: 2021-06-01

The research direction of identifying acoustic bio-markers of respiratory diseases has received renewed interest following the onset of the COVID-19 pandemic. In this paper, we design an approach to COVID-19 diagnosis using crowd-sourced multi-modal data. The data resource, consisting of acoustic signals such as cough, breathing, and speech, along with symptom data, is recorded using a web application over a period of ten months. We investigate the use of statistical descriptors of simple time-frequency features for the acoustic signals and binary features for the presence of symptoms. Unlike previous works, we primarily focus on the application of simple linear classifiers, such as logistic regression and support vector machines, for the acoustic data, while decision tree models are employed on the symptom data. We show that a multi-modal integration of the acoustics and symptoms classifiers achieves an area-under-curve (AUC) of 92.40, a significant improvement over any individual modality. Several ablation experiments are also provided which highlight the acoustic and symptom dimensions that are important for the task of COVID-19 diagnostics.

A highly contagious variant of the coronavirus family, SARS-CoV-2 has resulted in the most significant health crisis of the twenty-first century [1]. The outbreak was termed the coronavirus disease 2019 (COVID-19) and declared a pandemic in March 2020 by the World Health Organization (WHO). While vaccination efforts have partly reduced the viral spread in some parts of the world, the multiple waves of infections across various countries indicate that the COVID-19 pandemic has the potential to persist for months and years to come [1]. The pathogenesis of COVID-19 suggests that infection triggers a series of events that enable the SARS-CoV-2 virus to replicate and migrate down the respiratory tract to the epithelial cells in the lungs [1]. The most common timeline for these events is 2-5 days from the onset of the infection. Further, the risk of viral spread to primary contacts is the highest during the first week of infection. Early diagnosis can help to identify and isolate infected individuals, and to control the community spread of the virus [2]. Currently, the gold standard of COVID-19 diagnosis is the reverse transcription polymerase chain reaction (RT-PCR) assay [3]. The RT-PCR test, based on molecular testing, detects nucleic-acid sequences unique to SARS-CoV-2 in swab samples. The throat and/or nasal swab samples are first collected, stored, and processed at a lab facility, where the ribo-nucleic acid (RNA) content is amplified for detecting the presence of the COVID-19 genome in the sample. However, RT-PCR has four major limitations when it comes to massive population-level scaling: i) the cost of the RT-PCR chemical reagents and facility, ii) the need for expert supervision, iii) the turnaround time from sample collection to results (hours to days), and iv) the lack of physical distancing during sample collection. A widely used alternative to RT-PCR testing is the rapid antigen testing (RAT) methodology [4].
In particular, the RAT attempts to identify one of the outer proteins of the viral shell or envelope, with results available in 15-20 min. This testing methodology is less expensive compared to the chemical reagent based testing. However, the key limitation of potential spread during sample collection is not alleviated. Further, the sensitivity values at the same level of specificity are lower than those of RT-PCR tests [5]. In summary, there is a need to discover alternative test methodologies which improve the trade-off between time, cost, physical distancing, and performance. The WHO blueprint on COVID-19 diagnostic tests highlights the urgent need for developing point-of-care tests (POCTs) [6]. In this paper, we present a study analyzing the respiratory acoustics of COVID-19 infection and its suitability for the design of POCT tools. Listening to acoustic changes in lung sounds for preliminary screening of abnormalities in the respiratory passage was first formalized by Laennec [7] in 1838. With the advances in sensor and communication technologies, the "machine listening" approach to automatic analysis of respiratory sounds has gained interest. The evaluation metrics typically employed are the area under the curve (AUC) and the specificity at a pre-defined sensitivity. Pramono et al. [8] developed an automated diagnosis of pertussis using cough sound signals. A smartphone based, multi-modal, user-friendly approach to the detection of chronic obstructive pulmonary disease (COPD) and congestive heart failure was explored by Windmon et al. [9]. Botha et al. [10] showed that a spectral analysis of cough sounds can provide low-cost detection of tuberculosis. In a recent work, Porter et al. [11] developed a smartphone based analysis of cough sounds and reported symptoms to detect chronic airway disease. Additionally, in controlled studies using small cohort sizes, the effectiveness of machine learning based sound analysis has been demonstrated for childhood pneumonia detection [12], wet versus dry cough classification [13], and asthmatic versus healthy classification [14]. These efforts illustrate that acoustic signals have the potential to enable low-cost, rapid and efficient diagnostics of respiratory ailments. For COVID-19, a meta-analysis of symptoms by Li et al. [15] found fever (78.8%), followed by cough (53.9%) and malaise (37.9%), as the common symptoms in 281,641 COVID-19 infected individuals. The clinical symptoms of COVID-19 include fever, common cold, cough, chest congestion, breathing difficulties, dyspnea, loss of smell/taste and pneumonia [16]. Human sounds, such as cough, breathing, and speech, result from a coordinated functioning of the lungs and the organs in the respiratory pathway. An impairment in the functioning of these organs, such as constriction, fatigue, or difficulty in breathing, may be manifested in the acoustic characteristics of these sounds. These observations motivate the design and evaluation of a sound sample based diagnosis approach for COVID-19 [17]. A success can lead to the creation and large-scale deployment of a POCT tool, where an individual can record their sound samples on a portable web-connected device, and an application can analyze and display the result. For sound sample dataset creation, the notable efforts include the COVID-19 Sounds project (cough and breathing) by the University of Cambridge [18], the COUGHVID dataset (cough) by EPFL [19], and the COVID-19 Cough dataset by MIT [20].
Further, our team has also been involved in the dataset creation task [21]. Using cough and breathing sounds, Brown et al. [18] report a performance of 0.80 AUC on a subset of the COVID-19 Sounds dataset. On a similar dataset, Coppock et al. [22] report 0.85 AUC. Agbley et al. [23] demonstrated a specificity of 0.81 at a sensitivity of 0.43 on a subset of the COUGHVID dataset. Feng et al. [24] used a subset of cough sounds from the Coswara dataset and reported a performance of 0.90 AUC. Laguarta et al. [20] obtained an AUC greater than 0.90 on samples from the COVID-19 Cough dataset. These studies use acoustic feature representations of cough sounds such as Mel frequency cepstral coefficients (MFCCs) [18], Mel-spectrograms [20], [22], or scalograms [23], while the classifier models are deep learning based neural networks such as convolutional neural networks (CNNs) [23], recurrent neural networks (RNNs) [24], CNN based feature embeddings with support vector machines (SVMs) [18], or CNN based residual networks [20], [22]. There are also attempts at creating more controlled COVID-19 cough sound datasets from individuals in hospitals [25], [26]. For voice sounds, Verde et al. [27] used sustained phonation of the vowels /a/, /e/, /o/ from a subset of the Coswara dataset, and obtained AUCs in the range 0.71-0.97 using different kinds of classifiers. Using self-reported symptoms of COVID-19, Menni et al. [28] showed that an AUC of 0.74 is achievable from two different sets of recorded data (from the US and UK). A recent study by Zoabi et al. [29] further extended the analysis using a large pool of COVID-19 and healthy subjects. The key contributions of the current work are as follows.
1) Investigating a multi-modal integration approach to classification using cough, breathing and speech signals, along with self-reported symptoms.
2) Exploring recording-level statistical feature descriptors of the acoustic signals, and a binary feature encoding of the symptom data.
3) Emphasis on simple linear classifier models, like logistic regression and support vector machine approaches, for the COVID-19 classification task.
4) Understanding the importance of feature dimensions using ablation studies.
This study is based on sound and symptom samples from the Coswara dataset [21]. We use the data recorded (and released) up to 07-May-2021. This is composed of contributions from 1699 participants (157 COVID-19 positive). Each participant contributes 9 audio recordings, namely, (a) shallow and deep breathing, (b) shallow and heavy cough, (c) sustained phonation of three vowels /ae/ (as in bat), /i/ (as in beet), and /u/ (as in boot), and (d) fast and normal pace counting from 1 to 20. Alongside this, each participant also records their current health status (COVID-19 infection, symptoms and co-morbidity, if any), gender, age, and broad geographical location. No personally identifiable information is collected. The dataset collection protocol is approved by the Human Ethics Committee of the Indian Institute of Science, Bangalore and the P. D. Hinduja National Hospital and Medical Research Center, Mumbai, India. In this study, we focus on the modeling and analysis of three sound categories, namely, (i) breathing-deep (breathing), (ii) cough-heavy (cough), and (iii) counting-normal (speech), along with the symptom data. We select participants in the age group of 15-80 years. Any participant with less than 100 ms of sound sample, or with a peak amplitude below 10^-4, is removed.
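As a rough illustration, this screening step can be sketched in Python as below (assuming the soundfile and numpy packages; the helper name and thresholds mirror the description above but the code is not part of the released scripts):

```python
import numpy as np
import soundfile as sf

def passes_screening(path, min_duration_s=0.1, min_peak=1e-4):
    """Return True if a recording meets the duration and peak-amplitude criteria above."""
    audio, sr = sf.read(path)
    if audio.ndim > 1:                        # mix down multi-channel recordings
        audio = audio.mean(axis=1)
    duration = len(audio) / sr                # require at least 100 ms of audio
    peak = np.max(np.abs(audio)) if len(audio) else 0.0
    return duration >= min_duration_s and peak >= min_peak
```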
The resulting subset consists of data from 1569 participants. An illustration of the geographic distribution, age, and gender is shown in Figure 1(a,b,c). The participants come from several countries; however, 89% belong to India. The majority of the participants belong to the 15-45 years age group, and a majority are male (73%). The 1569 participants can be further grouped into three pools (Figure 1(d)). In the first pool, referred to as non-COVID, 1403 participants are self-declared COVID-19 negative. This comprises individuals who are healthy (85%), exposed to COVID-19 positive patients (9%), or have pre-existing respiratory ailments. In the second pool, referred to as COVID, 135 participants are self-declared COVID-19 positive. This comprises individuals who have mild (73%), moderate (16%), or asymptomatic COVID-19 infection at the time of recording their audio sample. In the third pool, referred to as recovered, we have 32 participants. The three pools of participants, shown in Figure 1(d), are re-organized to create non-overlapping subsets facilitating the development of classifiers, testing, and analysis (Figure 2). The subsets are also illustrated in Figure 1(e).
1) Development and test sets: These are obtained by performing an 80-20% random split on both the non-COVID and COVID pools after removing the observation set (126 participants). The resulting subsets are referred to as the development and test sets. The development set is composed of 1125 (106 COVID) participants, and the test set has 286 (29 COVID) participants. The development set is further divided into training and validation sets in a five-fold validation setup. The train/validation folds are used for training and fine-tuning the classifiers. All the subsets have COVID and non-COVID pool distributions across gender/age groups similar to the full data.
2) Observation set: From the non-COVID pool shown in Figure 1(d), we partition out the data recorded between April 1 and May 7, 2021. This data was recorded during the second wave of COVID-19 infections in India. This subset of the non-COVID pool (126 participants) is only used for score analysis in Section V-D.
3) Recovered set: The set of participants who self-reported as recovered (32 participants) is also kept separate from the development and test sets. As the date of recovery is not collected during recording, we only analyze the score distribution of these participants in Section V-D.
The block schematic of the multi-modal diagnostic tool proposed in this work is shown in Figure 3. We consider the binary classification task designed to separate COVID participants from non-COVID participants. On the sound sample data, we explore the logistic regression (LR) model and support vector machine (SVM) classifiers, while a decision tree classifier is employed on the symptom data. For the discussion below, let $\mathbf{x}$ denote an acoustic feature vector and $\mathbf{y}$ denote a symptom feature vector. We denote the prediction score (higher values indicating a higher probability of the COVID class), an output of the model, by $p$.
Logistic regression (LR): The LR model generates the prediction score $p$ as
$$p = \sigma(\mathbf{w}^\top \mathbf{x} + b), \quad (1)$$
where $\mathbf{w}$ and $b$ are the weight vector and the bias of the model, respectively. Here, $\sigma$ is the logistic function, $\sigma(a) = (1 + e^{-a})^{-1}$. The LR model is trained by minimizing the cost function $E(\cdot)$ defined as
$$E(\mathbf{w}, b) = -\sum_{n} \left[ c_n \log p_n + (1 - c_n) \log (1 - p_n) \right] + \lambda \|\mathbf{w}\|_2^2, \quad (2)$$
where $c$ denotes the class label of the feature vector $\mathbf{x}$ ($c = 1$ for the COVID class and $c = 0$ for the non-COVID class), $\|\cdot\|_2$ is the $\ell_2$-norm of the vector, and $\lambda$ is a regularization parameter. The cost function is optimized using standard gradient based methods [30].
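For orientation, a minimal sketch of how such a class-balanced LR classifier could be trained with scikit-learn is given below; the variable names (X_dev, y_dev, X_test) and the grid of C values (C = 1/λ) are illustrative assumptions, not the exact settings used in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X_dev: per-recording acoustic feature vectors, y_dev: 1 = COVID, 0 = non-COVID
pipe = make_pipeline(
    StandardScaler(),                                # global mean-variance standardization
    LogisticRegression(class_weight="balanced",      # "balanced loss" weighting
                       max_iter=1000),
)
# Choose the regularization strength via five-fold cross-validation on AUC
search = GridSearchCV(
    pipe,
    param_grid={"logisticregression__C": np.logspace(-3, 2, 6)},
    scoring="roc_auc",
    cv=5,
)
search.fit(X_dev, y_dev)
test_scores = search.predict_proba(X_test)[:, 1]     # COVID-class probability per participant
```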
Support vector machine (SVM): The linear SVM (Lin-SVM) model generates the prediction score as
$$p = f(\mathbf{w}^\top \mathbf{x} + b), \quad (3)$$
where $\mathbf{w}$ and $b$ are the weight vector and bias of the model, respectively, and $f(\cdot)$ denotes the Platt scaling based calibration [31]. The model is learned by minimizing the soft-margin cost function $E(\cdot)$ defined as
$$E(\mathbf{w}, b) = \frac{1}{2}\|\mathbf{w}\|_2^2 + \lambda \sum_{n} \max\left(0,\, 1 - c_n (\mathbf{w}^\top \mathbf{x}_n + b)\right), \quad (4)$$
where $c$ denotes the class label of the feature vector $\mathbf{x}$, with $c = 1$ for the COVID class and $c = -1$ for the non-COVID class. The above cost function is optimized using constrained optimization methods involving primal-dual modeling. In the dual space, the weight vector $\mathbf{w}$ is expressed as a function of the inner-product of the feature matrix [31]. By replacing the inner-product with a kernel, the Lin-SVM model can be made to operate in a higher dimensional space. We explore the SVM with the radial basis function (RBF) kernel defined as
$$k(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|_2^2\right), \quad (5)$$
where $k$ is the kernel function of two data points $\mathbf{x}_i$, $\mathbf{x}_j$ and $\gamma$ is a free parameter of the RBF kernel. The acoustic features that are input to the classifier are standardized using the global mean and variance computed on the training data.
Decision tree: Each node in the tree is associated with a feature dimension. The edges drawn out of a node correspond to the possible values of that feature dimension (in our case, the features are binary). The leaf nodes are associated with a posterior probability distribution over the classes. In a classification tree model, the leaf nodes represent classification decisions while the other nodes represent the set of conditions applied to the features that lead to the class labels. The Gini criterion is used by the classifier [32], and the minimum number of samples at a leaf node of the tree is chosen based on cross-validation.
The primary metric used in our analysis is the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. The ROC plots "1-specificity" versus the sensitivity. The sensitivity (a.k.a. true positive rate) and specificity (a.k.a. true negative rate) are defined as
$$\text{Sensitivity} = \frac{\#\text{ correctly predicted COVID labels}}{\#\text{ COVID labels}}, \quad (6)$$
$$\text{Specificity} = \frac{\#\text{ correctly predicted non-COVID labels}}{\#\text{ non-COVID labels}}, \quad (7)$$
where a label corresponds to a participant. We compute the ROC curve by varying the decision threshold from 0 to 1 in steps of $10^{-4}$ and obtaining the specificity and sensitivity at each of these thresholds. The AUC is computed using the trapezoidal rule. The positive predictive value (PPV) is the probability that a participant with a positive decision from the test has the COVID-19 infection. Similarly, the negative predictive value (NPV) is the probability that a participant with a negative decision does not have the COVID-19 infection.
The Coswara dataset provides sound samples as WAV format audio files. A majority (>90%) of these are sampled at 44.1 kHz. We standardize all sound files to a sampling rate of 44.1 kHz via re-sampling. The amplitude range of each audio file is also normalized to ±1. Any initial and trailing silence in the audio files (greater than 50 ms on either side) is removed using a threshold amplitude value of $10^{-4}$. The average duration of the sound samples is of the order of 16 s. For feature extraction, we make use of the low-level descriptors (LLDs) of the ComParE2016 feature set [33]. These descriptors broadly quantify the energy, spectral, and voicing attributes in an acoustic signal.
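A rough sketch of this pre-processing and feature-extraction chain is given below, assuming the soundfile, librosa, and opensmile Python packages; the function and file names are illustrative and not taken from the released implementation.

```python
import numpy as np
import soundfile as sf
import librosa
import opensmile

TARGET_SR = 44100
SILENCE_THR = 1e-4

def preprocess(path):
    """Resample to 44.1 kHz, peak-normalize to +/-1, trim leading/trailing silence."""
    audio, sr = sf.read(path)
    if audio.ndim > 1:                                    # mix down multi-channel audio
        audio = audio.mean(axis=1)
    if sr != TARGET_SR:
        audio = librosa.resample(audio, orig_sr=sr, target_sr=TARGET_SR)
    audio = audio / (np.max(np.abs(audio)) + 1e-12)       # amplitude range +/-1
    # Files that are entirely silent are assumed to be removed by the earlier screening
    active = np.where(np.abs(audio) > SILENCE_THR)[0]     # samples above the silence threshold
    margin = int(0.05 * TARGET_SR)                        # retain ~50 ms on either side
    start = max(int(active[0]) - margin, 0)
    end = min(int(active[-1]) + margin + 1, len(audio))
    return audio[start:end]

# ComParE2016 functionals: statistics of the low-level descriptors
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_signal(preprocess("cough_heavy.wav"), TARGET_SR)
```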
These descriptor categories are listed in Table I. The use of such simple statistical descriptors, together with linear classification models, allows us to understand the acoustic characteristics of the cough, speech and breathing signals that enable the classification of COVID and non-COVID participants. This analysis is given in Section V-D.
The dataset also has information on the presence/absence of 8 common symptoms for all the participants. The odds ratios of the participants with symptoms among the COVID and non-COVID categories are shown in Figure 4. Not all COVID participants have symptoms, and a few participants have more than one symptom. In terms of proportions, a higher proportion of COVID participants have symptoms than the non-COVID participants. Figure 4 shows that the odds ratio is higher for fatigue, muscle pain and loss of smell. In our model development, the symptoms are converted to an 8-dimensional binary feature vector for each participant. Each feature dimension is set to 1 or 0 depending on the presence or absence of the corresponding symptom. The binary vectors derived in this manner are used to train and test the symptom classifier.
We explore the fusion of predicted probability scores from the multiple modalities of acoustic data (cough, breathing and speech) and symptom data. Let $\{p_1, p_2, \ldots, p_N\}$ be the scores obtained from $N$ different classifiers (the maximum value of $N$ is 4) for a given participant. The fused score is
$$\bar{p} = \frac{1}{N}\sum_{i=1}^{N} p_i. \quad (8)$$
We use the openSMILE [34] Python toolbox to extract the features from the sound samples. The classifiers are implemented using the Scikit-learn Python toolkit [36]. In order to allow reproducible research, all the implementation scripts to extract features and train the classifiers reported in this work are available at https://github.com/iiscleap/MuDiCov.
The experiments are performed on the dataset splits shown in Figure 2. The hyper-parameters, namely, $\lambda$ in Equation (2) (LR) and Equation (4) (SVM), and the minimum number of samples in the leaf nodes (decision tree), are selected using a five-fold validation procedure on the development data. The final value for each hyper-parameter is chosen based on the best average AUC measure across the five folds. Subsequently, the classifier is trained on the entire development set with the chosen hyper-parameter value. This classifier is evaluated on the test set. We use the default options for the other configuration parameters. For all the classifiers, we use the "balanced loss" option in the classifier configuration. In the balanced loss setting, the loss value for the positive class samples is weighted by the ratio of the number of negative class samples to positive class samples in the training set.
The top row in Figure 5 depicts the five-fold cross-validation results obtained for the three classifiers, independently trained on each of the three acoustic modalities. The plots correspond to the classifier with the best regularization parameter $\lambda$, identified based on the maximum average validation AUC. The breathing (Br) and speech (Sp) categories performed relatively better compared to the cough (Co) data. The confidence interval around the mean ROCs and the mean AUCs indicates that the variance across the folds is small. The bottom row in Figure 5 depicts the ROCs on the test set using the classifiers trained on the entire development set. The AUC is 0.79, 0.74 and 0.79 for the Br, Co and Sp categories, respectively. Also, the performance of the Lin-SVM is similar to that of LR.
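To recap the evaluation pipeline, the following minimal sketch implements the arithmetic-mean fusion of Equation (8) and the threshold-sweep AUC computation described earlier (the helper names are hypothetical and not from the released code):

```python
import numpy as np

def fuse_scores(per_modality_scores):
    """Arithmetic-mean fusion of per-modality probability scores, as in Equation (8)."""
    return np.mean(np.stack(per_modality_scores, axis=0), axis=0)

def roc_auc(scores, labels, step=1e-4):
    """Sweep the decision threshold from 0 to 1 in steps of 1e-4; AUC via the trapezoidal rule."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    thresholds = np.arange(0.0, 1.0 + step, step)
    pos, neg = labels == 1, labels == 0
    sensitivity = np.array([(scores[pos] >= t).mean() for t in thresholds])
    specificity = np.array([(scores[neg] < t).mean() for t in thresholds])
    fpr = 1.0 - specificity
    order = np.argsort(fpr)                   # integrate over increasing 1-specificity
    return np.trapz(sensitivity[order], fpr[order])
```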
Further, the use of the non-linear RBF kernel did not show consistent benefits. We therefore use the logistic regression (LR) classifier, a linear model, in all subsequent experiments.
Figure 6 shows the final decision tree model obtained after five-fold cross-validation. At each node, the classifier tests whether a given symptom is present or not. The numerical value at a leaf node is the probability of the COVID class. The probability is greater than 0.76 if any one of the symptoms is present, and is 0.2 if none of the symptoms are present. For the isolated symptoms of cough, cold and fever, the predicted probabilities are 0.775, 0.821 and 0.89, respectively. The isolated symptoms of loss of smell and fatigue are also assigned probabilities greater than 0.9 (consistent with the higher odds ratios seen in Figure 4). The symptom of sore throat has the smallest probability of 0.764. The ROC for the decision tree classifier is shown in Figure 8, where an AUC of 0.80 is observed.
We investigate the cross-correlation between the test scores predicted by the classifiers trained on the four categories of multi-modal data: the three categories of acoustics and the symptom data. Table III shows the cross-correlation coefficient values on the test data. The correlation coefficient is less than 0.4 for all pairs of modalities. The table shows that the scores predicted using symptoms have less correlation with the scores from the sound categories of cough and speech.
We analyze a score fusion across the multi-modal data categories (Equation (8)). Figure 8 shows the test ROCs for the individual modalities, the fusion of the acoustic categories, and the fusion of all four categories. The fusion of the acoustic categories yields an improvement over the individual categories and gives an AUC of 0.84. The symptom data has better performance than the sound categories in the region of low sensitivity. Further, the fusion of the four categories improves the overall AUC significantly and gives an AUC of 0.92. It is noteworthy that this AUC performance is achieved using classical machine learning models, LR and decision tree, along with an arithmetic mean based score combination. Figure 7 also shows the performance for the fusion of symptoms with the acoustic modalities, as well as the fusion of pairs of acoustic categories. We see that fusion with symptoms improves the performance of the acoustic based classifiers significantly. The fusion of all four modalities has a test AUC of 0.92, an absolute improvement of 8% compared to the fusion of the acoustic categories alone. At 95% specificity, a sensitivity of 69% is achieved for the fusion of all modalities. The corresponding PPV is 0.75, with an NPV of 0.95.
Next, we investigate the decisions of the developed classifiers. Figure 9 shows the confusion matrices for the four modalities and the score fusion. The confusion matrices are shown for an operating point with a specificity of 0.95. At the chosen operating point, the overall accuracy of the fusion system is 92.7%, and the class-weighted accuracy is 82.1%.
We analyze the score distributions from the individual modalities as well as the fusion system. The score distributions are given in Figure 10(a). The dataset distribution used in training and testing of the models is shown in Figure 2. Figure 10(a) shows the score distribution for the recovered subset (in blue shade) along with the score distributions of the COVID class and non-COVID class data (green and red shade, respectively).
As seen here, for all the classifier settings, the score distribution of the participants with recovered status is found to be similar to that of the COVID positive participants. While coupling this data with the duration of the recovery period would have enabled an in-depth analysis, we would like to point out that the majority of this data came from participants who had just been discharged from a hospital facility. This indicates that the acoustic bio-markers of COVID-19 may persist for longer periods of time. Figure 10(a) also shows the score distribution of the observation set (participants who are non-COVID, but recorded during the second wave period of April 1 to May 7, 2021; all participants in this observation set came from India). The observation set consists of data from three categories: completely healthy with no exposure to COVID-19, healthy but exposed to COVID-19, and individuals with pre-existing respiratory ailments but not infected with COVID-19. The score distribution for this subset is shown in orange shade in Figure 10(a). As seen here, the data from the observation set also tends to have higher probability scores compared to the score values obtained for the non-COVID category. A closer analysis is given in Figure 10(b). Here, the score distribution of the blind test data belonging to the non-COVID category is compared with the score distribution of the observation subset (also of the non-COVID category). It is observed that the subset of participants who self-reported as "exposed" have higher scores during this second-wave period in India compared to the same category of participants from an earlier time period. This indicates that the participants in the "exposed" category of the observation set may carry acoustic bio-markers of the COVID-19 infection.
The ComParE2016 features [35] are statistics computed from low-level descriptors of 13 different categories, shown in Table I. In this section, we analyze the importance of subsets of features by (i) training a classifier with a selected feature subset (fSet) and (ii) training a classifier excluding the selected feature subset ({All features}\{fSet}). Table IV shows the test AUC obtained for the different subsets. We see that the performance of statistical features based on MFCCs alone is comparable to the use of all the features for cough and speech. For cough sounds, excluding the MFCC based features from the feature set also results in a degradation in performance, indicating that the statistical features of MFCCs capture the essential characteristics for the classification of cough sounds. For breathing and speech, more than one subset of features gives a test AUC comparable to that obtained using the full set. However, the subsets with AUC above 0.7 correspond to the broad category of spectral features. Since multiple subsets have predictive power for breathing and speech, excluding a specific subset did not degrade the performance. The score combination of the best-performing feature categories in Table IV, i.e., the 400-dimensional spectral roll-off features for breathing, the 1400-dimensional MFCC features for cough, and the 400-dimensional energy features for speech, gives a test AUC of 0.88. The above analysis suggests that the ComParE2016 features capture redundant information with good predictive power for the COVID-19 classification task.
The paper presents a novel approach to point-of-care diagnostics of COVID-19 using multi-modal data of acoustics and symptoms.
The data used in this study comes from a web-based crowd-sourced data collection platform. The acoustic features are based on statistical measures of low-level acoustic descriptors, while binary features are extracted from the symptom data. The proposed multi-modal diagnostic tool for COVID-19 (MuDiCoV) is the fusion of scores from the individual classifiers. The classification models are simple logistic regression and decision tree classifiers with a small memory footprint (600 kB). On the test set, the average time from input to decision was found to be 5.06 s (std. dev. 1.21 s) on a desktop computer with an Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz and 16 GB memory. The performance (69% sensitivity at 95% specificity) obtained on the test set surpasses the benchmark set by the Indian Council of Medical Research (ICMR) for approval of a POCT (≥ 50% sensitivity at ≥ 95% specificity) [37]. We foresee that the use of simple classifiers and models would allow the diagnostic methods to be more interpretable. The proposed methodology combines the advantages of being a rapid, low-cost, scalable, and remotely usable testing approach.

References:
[1] Characteristics of SARS-CoV-2 and COVID-19.
[2] WHO Director-General's opening remarks at the media briefing on COVID-19 - 16.
[3] Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR.
[4] Scaling up COVID-19 rapid antigen tests: promises and challenges.
[5] Low performance of rapid antigen detection test as frontline testing for COVID-19 diagnosis.
[8] A cough-based algorithm for automatic diagnosis of pertussis.
[9] TussisWatch: a smart-phone system to identify cough episodes as early symptoms of chronic obstructive pulmonary disease and congestive heart failure.
[10] Detection of tuberculosis by automatic cough sound analysis.
[11] Diagnosing chronic obstructive airway disease on a smartphone using patient-reported symptoms and cough analysis: Diagnostic accuracy study.
[12] Cough sound analysis can rapidly diagnose childhood pneumonia.
[13] Automatic identification of wet and dry cough in pediatric patients with respiratory diseases.
[14] Development of machine learning for asthmatic and healthy voluntary cough sounds: a proof of concept study.
[15] Epidemiology of COVID-19: A systematic review and meta-analysis of clinical characteristics, risk factors, and outcomes.
[16] Clinical features of patients infected with 2019 novel coronavirus in.
[17] Recent advances in computer audition for diagnosing COVID-19: An overview.
[18] Exploring automatic diagnosis of COVID-19 from crowdsourced respiratory sound data.
[19] The COUGHVID crowdsourcing dataset: A corpus for the study of large-scale cough analysis algorithms.
[20] COVID-19 artificial intelligence diagnosis using only cough recordings.
[21] Coswara - a database of breathing, cough, and voice sounds for COVID-19 diagnosis.
[22] End-to-end convolutional neural network enables COVID-19 detection from breath and cough audio: a pilot study.
[23] Wavelet-based cough signal decomposition for multimodal classification.
[24] Deep-learning based approach to identify COVID-19.
[25] A generic deep learning based cough analysis system from clinically validated samples for point-of-need COVID-19 test and severity levels.
[26] AI4COVID-19: AI enabled preliminary diagnosis for COVID-19 from cough samples via an app.
[27] Exploring the use of artificial intelligence techniques to detect the presence of coronavirus COVID-19 through speech and voice analysis.
[28] Real-time tracking of self-reported symptoms to predict potential COVID-19.
[29] Machine learning-based prediction of COVID-19 diagnosis based on symptoms.
[30] The method of steepest descent for non-linear minimization problems.
[31] Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods.
[32] An introduction to decision tree modeling.
[33] The INTERSPEECH 2016 computational paralinguistics challenge: Deception, sincerity & native language.
[34] openSMILE: The Munich versatile and fast open-source audio feature extractor.
[35] On the acoustics of emotion in audio: What speech, music, and sound have in common.
[36] Scikit-learn: Machine learning in Python.
[37] ICMR Rapid Antigen Test Kits for COVID-19 (Oropharyngeal / Nasopharyngeal swabs) - 28.

Acknowledgments: The authors would like to express their gratitude to Anand Mohan for the design of the web-based data collection platform. The authors would like to thank Dr. Nirmala, Dr. Shrirama Bhat, Dr. Chandra Kiran and Dr. Suhail Khalid for their co-ordination in data collection. The authors would like to acknowledge Amir Poorjam and Flavio Avila for discussions on the ComParE2016 features.