key: cord-0730233-x3yqp9lp
authors: Alzubaidi, Mohammad A.; Otoom, Mwaffaq; Otoum, Nesreen; Etoom, Yousef; Banihani, Rudaina
title: A novel computational method for assigning weights of importance to symptoms of COVID-19 patients
date: 2021-01-15
journal: Artif Intell Med
DOI: 10.1016/j.artmed.2021.102018
sha: d9e17b977d9cbbe36d0146a7a5da10786bfd2407
doc_id: 730233
cord_uid: x3yqp9lp

BACKGROUND AND OBJECTIVE: The novel coronavirus disease 2019 (COVID-19) is considered a pandemic by the World Health Organization (WHO). As of April 3, 2020, there were 1,009,625 reported confirmed cases, and 51,737 reported deaths. Doctors have been faced with a myriad of patients who present with many different symptoms. This raises two important questions. What are the common symptoms, and what is their relative importance? METHODS: A non-structured and incomplete COVID-19 dataset of 14,251 confirmed cases was preprocessed. This produced a complete and organized COVID-19 dataset of 738 confirmed cases. Six different feature selection algorithms were then applied to this new dataset. Five of these algorithms have been proposed earlier in the literature. The sixth is a novel algorithm being proposed by the authors, called Variance Based Feature Weighting (VBFW), which not only ranks the symptoms (based on their importance) but also assigns a quantitative importance measure to each symptom. RESULTS: For our COVID-19 dataset, the five different feature selection algorithms provided different rankings for the top five most important symptoms. They even selected different symptoms for inclusion within the top five. This is because each of the five algorithms ranks the symptoms based on different data characteristics. Each of these algorithms has advantages and disadvantages. However, when all these five rankings were aggregated (using two different aggregating methods) they produced two identical rankings of the five most important COVID-19 symptoms. Starting from the most important to least important, they were: Fever/Cough, Fatigue, Sore Throat, and Shortness of Breath. (Fever and cough were ranked equally in both aggregations.) Meanwhile, the novel sixth algorithm, Variance Based Feature Weighting, chose the same top five symptoms, but ranked fever much higher than cough, based on its quantitative importance measures for each of those symptoms (Fever - 75 %, Cough - 39.8 %, Fatigue - 16.5 %, Sore Throat - 10.8 %, and Shortness of Breath - 6.6 %). Moreover, the proposed VBFW method achieved an accuracy of 92.1 % when used to build a one-class SVM model, and an NDCG@5 of 100 %. CONCLUSIONS: Based on the dataset, and the feature selection algorithms employed here, symptoms of Fever, Cough, Fatigue, Sore Throat and Shortness of Breath are important symptoms of COVID-19. The VBFW algorithm also indicates that Fever and Cough symptoms were especially indicative of COVID-19, for the confirmed cases that are documented in our database.

The novel coronavirus disease 2019 (COVID-19) is considered a pandemic by the World Health Organization (WHO). As of August 31, 2020, there are 24,954,140 reported confirmed cases, and 838,924 reported confirmed deaths. In addition, the disease has been transmitted to 208 countries, areas or territories [1]. Doctors have encountered countless COVID-19 patients with many different symptoms. This raises two important questions. What are the common symptoms of COVID-19 patients, and what is the relative importance of these symptoms?

Machine learning methods can be used to analyze the importance of the different symptoms of the disease. However, these methods need a dataset of COVID-19 patients and their symptoms. At this time, few datasets are available on COVID-19 patients, and their symptoms, and the available datasets have some problems that make applying machine learning algorithms to them difficult. Such problems include:

• The data is not well structured.

• The data is of one class: all records in the database are for confirmed COVID-19 cases.

To address the first problem, this work has created a structured dataset, using currently available datasets. To address the second problem, many one-class machine learning approaches have been proposed in the literature. To rank the symptoms, based on their importance, feature selection algorithms can be used. This paper makes the following contributions:

• We construct a preprocessed, cleaned and organized dataset of COVID-19 symptoms for confirmed cases, available to researchers upon request.

• We propose a novel feature selection and weighting method, called the Variance Based Feature Weighting (VBFW) method for COVID-19 symptoms, for ranking the features (or symptoms) from the most important to least important, and assigning weights of importance to each of them. This assignment is made automatically, based on the change that would occur to the variance of the training data instances if the selected feature were to be removed from the dataset.

The rest of this paper is organized as follows. Section 2 presents the background and literature review and poses our research question. Section 3 proposes our novel variance-based feature weighting method. It also presents the set of experiments conducted in this work. The results of these experiments are presented in Section 4, and are discussed in Section 5. Finally, Section 6 concludes the paper.

The coronavirus disease 2019 (known as COVID-19) is a new disease that appeared late in 2019 [1] .

Lauer et al. [2] investigated the incubation period of the coronavirus. Their study included 181 confirmed cases. They found that the incubation period ranges from 5.1 to 11.5 days.

Bai et al. [3] studied the asymptomatic carrier transmission of COVID-19. Their study included 5 patients who had some symptoms (fever and respiratory symptoms) and 1 asymptomatic patient. All patients underwent chest CT imaging. Their study was the first to find that transmission of the disease could occur from an asymptomatic patient with a normal CT scan.

Shi et al. [4] described the CT findings across 81 patients with confirmed COVID-19. They found that abnormalities appeared on the chest CT scans for all COVID-19 patients, even the asymptomatic ones. Thus, the assessment of CT imaging features could facilitate the early diagnosis of the disease.

Bernheim et al. [5] studied chest CT scans of 121 symptomatic patients with confirmed COVID-19. Surprisingly, they found that 20 of the 36 patients imaged 0-2 days after symptom onset (i.e. 56 %) had normal CT scans. However, as more time passed after symptom onset, abnormalities started to appear on the CTs.

Rothan and Byrareddy [6] reviewed and highlighted the symptoms that could present in COVID-19 patients. These symptoms include systemic disorders (such as Fever, Cough, Fatigue, Sputum Production, Headache and Diarrhea) and respiratory disorders (such as Rhinorrhea, Sneezing, Sore Throat and Pneumonia).

Hellewell et al. [7] proposed a stochastic mathematical transmission model to assess whether isolation is an effective method to control the transmission of the COVID-19 disease. They found that case isolation is enough to control the transmission of the disease within 3 months.

The World Health Organization (WHO), in their situation report about COVID-19 [8] , defined three cases of COVID-19 patients: suspect case, probable case and confirmed case. A suspect case is defined as a patient with Fever and at least one more symptom (such as Cough, Shortness of Breath … etc.) and, in some cases, a history of travel to a suspicious area. A probable case is defined as a suspect case with a pending lab test, while a confirmed case is defined as a patient with a positive lab test result.

One-class learning [9, 10] is the problem of learning a model from a training dataset that has instances from only one class, with the absence of instances from the counter class. The learnt model should be able to distinguish between instances that belong to either the target class or the absent class. Several one-class learning algorithms are available in the literature. Some of these algorithms employ data generation methods to generate artificial data from the second (absent) class, and then use the traditional two-class learning algorithms [9, 11] .

Other algorithms try to learn the distribution of the available training instances [9, 12] or learn a compact boundary that encloses most of the training instances [9, 13] . A new instance that follows the learnt distribution, or lies inside the learnt boundary, is assigned to the target class. Otherwise, it is assigned to the absent class. These algorithms include one-class Support Vector Machines (SVM) [14] .

Feature selection [15, 16] , also known as attribute selection, is the process of selecting a subset of relevant or important features to be used in the learning process, and thus removing all irrelevant and redundant features. The selected features should be able to represent the original dataset without a substantial loss in the prediction performance. This process helps in (1) reducing the time complexity of the learning process by removing all irrelevant features from the feature space and (2) highlighting the most important and informative features that contribute most to the learnt model, and to the predictive variable.

Feature selection can also be used to rank the current features within a given dataset, based on their importance, starting from the most informative features down to the least informative [16] . If a weight that quantifies the importance and informativeness of a feature can be assigned to each feature in the dataset, it helps in the ranking process, and it can be reflected in the learning process as well, by putting more emphasis on the most important features during the learning process.

To do so, several measures can be extracted from the dataset to evaluate the importance of each feature. These measures consider several factors. Liu and Motoda [16] define the importance of a feature as the change that occurs to a specific measure after the removal of that feature from the dataset. In a later study [17] , the authors defined a set of categories for feature importance measures. These categories include:

• Distance. This category studies how far apart the feature places the training instances.

• Information. This category studies the change in the information gain before and after the removal of the feature of interest.

• Dependency. This category studies how dependent each feature is on the others.

• Consistency. This category studies how consistent each feature is in predicting the output variable.

Although many feature selection methods have been proposed for regular classification problems, few studies have investigated the application of these methods to one-class datasets. One of the most interesting studies on feature selection for one-class problems was done by Lorena et al. [18] . The authors proposed five feature selection measures from the different categories mentioned above. These measures are: Spectral Score, Information Score, Pearson Correlation, Intra-Class Distance and Interquartile Range. The authors then used these measures to rank the features, based on their importance. The resulting rankings were then combined using rank aggregation strategies [19] such as average ranking [20] and majority voting [21, 22] .

This work employs the five measures proposed in [18] . We also propose an additional measure, called Variance Based Feature Weighting, which allows ranking of the features based on importance, assigning weights of importance to each feature. Next, we explain these feature selection and importance measures.

2.4.1. Spectral score [18]

In this measure, a weighted graph data structure is built using the available dataset. The graph consists of nodes linked by weighted edges. The nodes of the graph are the data instances, while the weights of the edges represent the similarities between the data instances. The similarity between two data instances is computed using the Radial Basis Function (RBF), as shown in Eq. (1).

where S_ij is the similarity between the instances x_i and x_j, and σ is the standard deviation of the data instances.
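As a minimal sketch of the similarity computation described above, the following assumes the common Gaussian form exp(-||x_i - x_j||^2 / (2σ^2)) for the RBF and takes σ as the standard deviation over all data values. Both choices are assumptions, and the function name `rbf_similarity` and the toy data are hypothetical.

```python
import numpy as np

def rbf_similarity(X, sigma=None):
    """Pairwise RBF similarity between the rows (instances) of X."""
    X = np.asarray(X, dtype=float)
    if sigma is None:
        # Assumption: sigma is the standard deviation over all data values.
        sigma = X.std()
    # Squared Euclidean distance between every pair of instances.
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    # Assumed Gaussian form of the RBF.
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

# Toy data: 4 patients x 3 binary symptoms (hypothetical values).
X = np.array([[1, 1, 0],
              [1, 0, 0],
              [1, 1, 0],
              [0, 0, 1]])
S = rbf_similarity(X)
```

Identical instances receive a similarity of 1, and the similarity decays toward 0 as two instances differ in more symptoms.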

The spectrum of the graph is then used to rank the features based on how consistent and similar the data instances are, before and after the removal of each feature.

2.4.2. Information score [18]

The entropy of the data is a measure of its randomness. When the entropy is low, the similarity between the data instances is high. This measure uses the RBF similarity in Eq. (1) to compute the entropy of the data, as shown in Eq. (2).

where E is the entropy of the data, n is the number of data instances, and S is the similarity matrix for the data instances. The change in the entropy is then used to rank the features, based on how homogeneous and similar the data instances are, before and after the removal of each feature.

2.4.3. Pearson correlation [18]

This measure uses Pearson correlation to compute the correlation between each feature and all other features in the dataset, as shown in Eq. (3).

where corr is the total correlation of feature f_i and m is the number of features.
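A minimal sketch of a total-correlation computation of this kind is given below. It assumes the total is the sum of absolute Pearson correlations between a feature and every other feature, with the self-correlation excluded and constant columns treated as uncorrelated. These details, the function name `total_abs_correlation`, and the toy data are assumptions.

```python
import numpy as np

def total_abs_correlation(X):
    """Sum of absolute Pearson correlations of each feature with all others."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)   # m x m correlation matrix
    # Assumption: a constant column (undefined correlation) counts as zero.
    corr = np.nan_to_num(corr, nan=0.0)
    # Assumption: the self-correlation of a feature is excluded.
    np.fill_diagonal(corr, 0.0)
    return np.abs(corr).sum(axis=1)

# Toy data: 5 patients x 3 binary symptoms (hypothetical values).
# Features 0 and 1 always co-occur; feature 2 is nearly independent of them.
X = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 0, 0],
              [0, 0, 1],
              [1, 1, 1]])
totals = total_abs_correlation(X)
```

In this toy example the two co-occurring features receive the same, larger total, while the nearly independent feature receives a smaller one.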

The sum of absolute correlation values is then used to rank the features, based on how each feature is associated with other features in the dataset.

2.4.4. Intra-class distance [18]

In this measure, a centroid instance is computed as the average of all data instances. The intra-class distance is then computed as the average distance between the centroid and all data instances, as shown in Eq. (4).

where ICD is the intra-class distance, n is the number of data instances and d(x_i, x) is the Euclidean distance between the data instance x_i and the centroid x.
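The intra-class distance as described (the average Euclidean distance between each instance and the centroid) can be sketched as follows, together with its recomputation after removing each feature in turn. The function names and toy data are hypothetical.

```python
import numpy as np

def intra_class_distance(X):
    """Average Euclidean distance between each instance and the centroid."""
    X = np.asarray(X, dtype=float)
    centroid = X.mean(axis=0)
    return np.linalg.norm(X - centroid, axis=1).mean()

def icd_without_each_feature(X):
    """Intra-class distance recomputed after removing each feature in turn."""
    X = np.asarray(X, dtype=float)
    return np.array([intra_class_distance(np.delete(X, f, axis=1))
                     for f in range(X.shape[1])])

# Toy data: 4 patients x 2 binary symptoms (hypothetical values).
# Feature 0 is present in every patient; feature 1 in half of them.
X = np.array([[1, 0],
              [1, 1],
              [1, 0],
              [1, 1]])
icd_all = intra_class_distance(X)
icd_removed = icd_without_each_feature(X)
```

Removing the symptom that every patient shares leaves the distance unchanged, whereas removing the symptom that varies between patients collapses all instances onto the centroid.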

The change in the intra-class distance is then used to rank the features based on how close the data instances are, before and after the removal of each feature.

2.4.5. Interquartile range [18]

This measure quantifies the variability and dispersion of the data instances by dividing them into four equal parts (Q_1, Q_2, Q_3 and Q_4), called quartiles. The interquartile range (IQR) is then computed as the difference between the third quartile (Q_3) and the first one (Q_1), as shown in Eq. (5).

The change in the interquartile range is then used to rank the features, based on how dispersed the data instances are, before and after the removal of each feature.

The coronavirus disease 2019 (COVID-19) appeared late in 2019. Research is needed to discover the different aspects of the disease, such as its incubation period, its symptoms, its effects on chest CT scans and its transmission.

Some studies identified the main symptoms of the disease. However, none has computationally investigated the importance of each symptom. To do so, a researcher will be faced with datasets that only have information about confirmed COVID-19 cases. This suggests the use of one-class learning and feature selection methods.

Many feature selection methods have been proposed in the literature. However, only a few have focused on a one-class dataset, including spectral score, information score, Pearson correlation, intra-class distance and interquartile range. Also, none of these proposed methods have focused on assigning weights of importance to features in one-class datasets.

With this in mind, we pose the following research question.

Research Question: How could we use feature selection methods to (1) rank the COVID-19 symptoms based on their importance and (2) assign importance weights to each symptom?

In this section, we present our proposed feature importance measure, as well as the experiments we conducted to evaluate that measure.

This section presents our novel Variance Based Feature Weighting method. First, we define what an important feature is. Then, we describe our feature weighting function. Last, we formally define our proposed VBFW method.

A feature is defined to be important if its inclusion in the training dataset causes the variance of the feature values across the data instances to increase. Conversely, if its inclusion in the dataset causes the variance to decrease, or to stay constant, it is not an important feature.

If the dataset contains n instances and m features, then the values of feature i across the n instances form an n-element vector (where n > 1). The variance taken across the m feature vectors, instance by instance, is also an n-element vector (Table 1, Step 2). The variance comparison in the weighting function produces an n-element binary vector, with 1 in the jth element if the corresponding element of the original variance (including the feature) is greater than that of the new variance (after removing the feature), and 0 otherwise. The weight of importance of the feature then depends on the number of ones in that binary vector, which ranges between 0 and n. We normalize the weights to values between 0 and 1 by dividing the count of ones in the binary vector by the length of that vector (i.e. n).

In Table 1 , we present the formal definition for our proposed VBFW feature weighting method.

The proposed importance weight of a feature is calculated as the percentage of data instances that saw an inter-feature variance decrease (that is, V_ALL > V_new) after removing said feature. When the proposed VBFW method works in a feature space that binarizes the presence of a feature, the direction of the change of variance from removing a feature is influenced by how many total positive (i.e. present) features a data instance has. The variance of equally likely discrete values can be expressed without referring to the mean as squared deviations of all points from each other. When a negative (i.e. absent) feature is removed, if the remaining features are primarily positive (i.e. present), then the variance is likely to decrease. On the other hand, when a positive feature is removed, if the remaining features are primarily negative, then the variance is likely to decrease.
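For binary features, the direction of the variance change can be made explicit with a short worked example. The following assumes the population form of the variance (the text does not state whether the population or sample form is used) and considers one data instance with k present symptoms among m binary features. The superscripts (+) and (-) are notation introduced here for removing a present and an absent feature, respectively.

```latex
% Pairwise form of the variance of m equally likely values x_1, ..., x_m:
V = \frac{1}{2m^{2}} \sum_{a=1}^{m} \sum_{b=1}^{m} (x_a - x_b)^{2}

% Binary case with k ones: 2k(m-k) ordered pairs differ, each by 1, so
V_{ALL} = \frac{k(m-k)}{m^{2}} = \frac{k}{m}\left(1 - \frac{k}{m}\right)

% After removing a present (+) or an absent (-) feature:
V_{new}^{(+)} = \frac{k-1}{m-1}\left(1 - \frac{k-1}{m-1}\right), \qquad
V_{new}^{(-)} = \frac{k}{m-1}\left(1 - \frac{k}{m-1}\right)

% Example with m = 20 and k = 3:
V_{ALL} = \frac{3}{20}\cdot\frac{17}{20} = 0.1275, \qquad
V_{new}^{(+)} = \frac{2}{19}\cdot\frac{17}{19} \approx 0.0942, \qquad
V_{new}^{(-)} = \frac{3}{19}\cdot\frac{16}{19} \approx 0.1330
```

In this example, removing a present symptom lowers the variance (V_ALL > V_new, so the instance contributes a 1 to the binary vector), while removing an absent symptom raises it (the instance contributes a 0).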

In other words, features with higher importance weights are essentially those that are less likely to co-exist with other features. The fact that a feature has high importance weight reflects the possibility that it is less likely to be present with other features. The less important features are likely to be regarded as supplementary features that typically accompany more common ones.

Thus, one main characteristic of the proposed VBFW method is that it is largely determined by the co-occurrence of features in a data instance. Therefore, the proposed VBFW method is recommended when the application at hand shares this characteristic, and is not recommended otherwise.

Similarly, the behavior of the proposed VBFW method when dealing with datasets that have features with continuous and/or multiple discrete values could be summarized as follows. The direction of the change of variance from removing a feature is influenced by how many features of relatively high values a data instance has. When a feature of relatively low value is removed, if the remaining features are primarily of relatively high values, then the variance is likely to decrease. On the other hand, when a feature of relatively high value is removed, if the remaining features are primarily of relatively low values, then the variance is likely to decrease.
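The procedure formally defined in Table 1 can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes the population variance (ddof = 0), and the function name `vbfw_weights` and the toy data are hypothetical.

```python
import numpy as np

def vbfw_weights(X):
    """Variance Based Feature Weighting, following the steps of Table 1."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    # Step 2: row-wise variance over all m features (one value per instance).
    # Assumption: population variance (ddof=0).
    v_all = X.var(axis=1)
    weights = np.empty(m)
    for f in range(m):
        # Step 3: row-wise variance with feature f excluded.
        v_new = np.delete(X, f, axis=1).var(axis=1)
        # Step 4: binary vector, 1 where the variance decreased.
        b = v_all > v_new
        # Steps 5-6: count the ones and normalize by n.
        weights[f] = b.sum() / n
    return weights

# Toy data: 4 patients x 4 binary symptoms (hypothetical values).
X = np.array([[1, 0, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 1, 1, 1]])
weights = vbfw_weights(X)
# Rank the features from most to least important.
ranking = np.argsort(-weights, kind="stable")
```

In this toy example, the first symptom mostly appears on its own and therefore receives the highest weight, which matches the co-occurrence behavior discussed above.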

This section presents the experiments that were conducted in this work to evaluate the proposed feature weighting method.

A COVID-19 dataset from the COVID-19 Open Research Dataset (CORD-19) repository [23] was used in this work. The data contains information about 14,251 confirmed cases of COVID-19 patients, geographically distributed as shown in Table 2 . This information does not include information about all symptoms in all of the patients. Further, the data is not well structured for learning and data mining algorithms, such as feature selection. Thus, data preprocessing must be performed.

The data was preprocessed and organized as follows. First, cases with symptoms were collected. This resulted in 738 patient records, geographically distributed as shown in Table 3. Then, the reported symptoms were collected, to form a list of 80 symptoms. Many of these symptoms were synonyms for each other. Thus, we were able to reduce the list to 20 symptoms. This was done in an ad-hoc manner by the two medical doctors who are co-authors of this work. For example, "anorexia" and "loss of appetite" were merged. (The grouping of the 80 symptoms is included as supplementary material.) The final list is shown in Table 4, along with the distribution of age and gender. This list was then used to create a 738 × 20 set of data records for the 738 confirmed COVID-19 cases. Each of the 738 rows represents a patient case, while each column represents a binary feature for each of the 20 symptoms. A value of 1 for a feature means that the corresponding symptom was recorded for the patient. On the other hand, a value of 0 means that the symptom was not recorded. As part of our contribution in this work, we make this preprocessed, cleaned and organized dataset available to researchers upon request.

Table 1. Formal definition: VBFW method.

Let D be the dataset of one-class instances with n instances and m features.

Step 1: For each feature f, form an n-element feature vector from the values of f across the n data instances. Each feature vector has a shape of "matrix (n, 1)" and there are m such feature vectors.

Step 2: Compute the variance V_ALL of all m feature vectors generated in Step 1. The variance should be taken row-wise, resulting in a shape of (n, 1).

For each feature f:

Step 3: Compute the new variance V_new, excluding the n-element vector of feature f.

Step 4: Compare V_new to V_ALL. The result is an n-element binary vector B: B = V_ALL > V_new.

Step 5: Count the number of ones in B. The result is between 0 and n: N_ones = Σ_{i=1}^{n} B_i.

Step 6: Assign an importance weight W_f to feature f as W_f = N_ones / n.

Output W as the set of importance weights of all m features.

Table 2 (partial).

No.  Country      Cases    No.  Country      Cases
     Hong Kong    94       46   Switzerland  11
21   India        6        47   Taiwan       34
22   Iran         46       48   Thailand     81
23   Iraq         6        49   UAE          41
24   Israel       5        50   UK           32
25   Italy        591      51   USA          35
26   Japan        921      52   Vietnam      32

In this experiment, we applied the Spectral Score feature selection measure to our COVID-19 dataset. The features (i.e. 20 symptoms) were then ranked, based on this measure. Below are the detailed steps followed in this experiment.

(1) Compute a 738 × 738 similarity matrix S between the 738 instances in our dataset, using Eq. (1) 

(4) Rank the symptoms (features) based on their spectral scores (higher values indicate more important symptoms).

In this experiment, we applied the Information Score feature selection measure to our COVID-19 dataset. The features (i.e. 20 symptoms) were then ranked based on this measure. Below are the detailed steps followed in this experiment. 
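One way an entropy-based information score of this kind could be computed is sketched below. It assumes the pairwise similarity entropy E = -Σ_i Σ_j [S_ij log S_ij + (1 - S_ij) log(1 - S_ij)], a Gaussian RBF with a single σ taken from the full dataset, and a score defined as the entropy after removing a feature minus the entropy with all features. The formula, the sign convention, the function names and the toy data are all assumptions.

```python
import numpy as np

def rbf_similarity(X, sigma):
    """Pairwise RBF similarity between rows of X (assumed Gaussian form)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.exp(-(diff ** 2).sum(axis=-1) / (2.0 * sigma ** 2))

def entropy(S, eps=1e-12):
    """Entropy of a similarity matrix; low when similarities are near 0 or 1."""
    S = np.clip(S, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.sum(S * np.log(S) + (1.0 - S) * np.log(1.0 - S)))

def information_scores(X):
    """Change in entropy caused by removing each feature in turn."""
    X = np.asarray(X, dtype=float)
    sigma = X.std()  # assumption: one sigma, computed on the full dataset
    e_all = entropy(rbf_similarity(X, sigma))
    scores = []
    for f in range(X.shape[1]):
        e_new = entropy(rbf_similarity(np.delete(X, f, axis=1), sigma))
        scores.append(e_new - e_all)  # assumed sign convention
    return np.array(scores)

# Toy data: 4 patients x 3 binary symptoms (hypothetical values).
X = np.array([[1, 1, 0],
              [1, 0, 0],
              [1, 1, 0],
              [0, 0, 1]])
scores = information_scores(X)
```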

In this experiment, we applied the Pearson Correlation feature selection measure to our COVID-19 dataset. The features (i.e. 20 symptoms) were then ranked based on this measure. Below are the detailed steps followed in this experiment. Note that there are different types of correlation computation methods such as Pearson, Spearman and Kendall. There are some advantages of using one method over another depending on the type and/ or distribution of the data. However, in the case of having binary vectors to represent the input data, which is the case of this study, the results of these correlation methods are identical. Hence, there is no need to compare Pearson correlation with the other methods.

In this experiment, we applied the Intra-Class Distance feature selection measure to our COVID-19 dataset. The features (i.e. 20 symptoms) were then ranked based on this measure. Below are the detailed steps followed in this experiment. (1) and (2), compute the standard deviation, as well as the 95 % Confidence Interval, for the average of Euclidean distances of each feature f.

In this experiment, we applied the Interquartile Range feature selection measure to our COVID-19 dataset. The features (i.e. 20 symptoms) were then ranked based on this measure. Below are the detailed steps followed in this experiment.

(1) Compute the interquartile range R for the 738 instances in our dataset, using Eq. (5).

(2) For each feature f of the 20 features (i.e. symptoms):
    a. Compute a new interquartile range R_n after removing the feature f, using Eq. (5).
    b. Compute the Interquartile Range IQR for the feature f as the Euclidean distance between the two ranges (R, R_n).

(3) Rank the symptoms (features) based on their interquartile ranges (higher values indicate more important symptoms).
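The steps above can be sketched as follows under one possible reading: R is taken to be an n-element vector holding, for each instance, the interquartile range of its feature values, so that the Euclidean distance between R and R_n is well defined. This reading, the use of NumPy's default (linear) percentile interpolation, the function names and the toy data are assumptions.

```python
import numpy as np

def row_iqr(X):
    """Interquartile range (Q3 - Q1) of the feature values of each instance."""
    q1, q3 = np.percentile(X, [25, 75], axis=1)
    return q3 - q1

def iqr_scores(X):
    """Euclidean distance between R (all features) and R_n (feature removed)."""
    X = np.asarray(X, dtype=float)
    r_all = row_iqr(X)                              # step (1)
    scores = np.empty(X.shape[1])
    for f in range(X.shape[1]):                     # step (2)
        r_new = row_iqr(np.delete(X, f, axis=1))    # step (2a)
        scores[f] = np.linalg.norm(r_all - r_new)   # step (2b)
    return scores

# Toy data: 3 patients x 4 binary symptoms (hypothetical values).
X = np.array([[1, 1, 0, 0],
              [0, 0, 0, 0],
              [1, 0, 0, 0]])
r = row_iqr(X)
scores = iqr_scores(X)
# Step (3): rank features, higher score first.
ranking = np.argsort(-scores, kind="stable")
```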

In this experiment, we apply rank aggregation to the ranking results in Experiments I-V. We use the Averaging Method as well as the Majority Method. Below are the detailed steps followed in this experiment.

(1) Compute the ranking for the 20 features (i.e. symptoms) in our dataset, using Experiments I -V. 
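A minimal sketch of the averaging form of rank aggregation is shown below: each feature receives the mean of its rank positions across the individual methods, and features are then ordered by that mean. The function name and the toy rankings are hypothetical, and the majority method is not covered by this sketch.

```python
import numpy as np

def average_rank_aggregation(rank_lists):
    """Aggregate rankings by averaging each feature's rank across methods.

    rank_lists has one row per method and one column per feature;
    entry (k, f) is the rank that method k gives feature f (1 = most important).
    """
    ranks = np.asarray(rank_lists, dtype=float)
    mean_rank = ranks.mean(axis=0)
    # Lower mean rank means more important; a stable sort keeps tied
    # features in their original order.
    order = np.argsort(mean_rank, kind="stable")
    return mean_rank, order

# Toy rankings of 4 features by 3 methods (hypothetical values).
rank_lists = [[1, 2, 3, 4],
              [2, 1, 3, 4],
              [1, 2, 4, 3]]
mean_rank, order = average_rank_aggregation(rank_lists)

# Two features with the same mean rank are tied.
tied_mean, _ = average_rank_aggregation([[1, 2], [2, 1]])
```

Returning the mean ranks alongside the ordering makes ties visible, such as two symptoms that end up with the same aggregated rank.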

In this experiment, we apply the proposed VBFW method on our COVID-19 dataset. Using this method, quantitative importance weights will be computed for each of the 20 symptoms. Below are the detailed steps followed in this experiment. 

(4) Rank the symptoms (features) based on their importance weights (higher values are more important)

The purpose of this experiment is to validate and quantify the performance of the proposed VBFW method. To do so, we use machine learning evaluation metrics as well as rank-aware evaluation metrics.

One way to evaluate feature selection methods is to apply the same machine learning model with the features selected from all feature selection methods, and then use some classification evaluation metrics (such as accuracy) to report the performance [24] .

In this part of the experiment, we apply the One-Class Support Vector Machines (OCSVM) [14] with the top five features (i.e. symptoms) selected from each of the six feature selection methods. For each method, the classification accuracy is computed using the well-known 10-fold cross validation method [25] . Below are the detailed steps followed in this part of the experiment.

(1) For each of the six feature selection methods (used in Experiments I-VII), select the top five features (i.e. symptoms).

(2) Create an updated version of our COVID-19 dataset for each of the feature selection methods, based on the five features (i.e. symptoms) selected in (1).

(3) For each of the feature selection methods, use the updated version of the dataset to:
    a. Compute a prediction model using the OCSVM method.
    b. Compute the classification accuracy using the 10-fold cross validation method.
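Step (3) can be sketched with scikit-learn as follows. Because every record belongs to the target class, accuracy is taken here as the fraction of held-out instances that the model accepts as inliers. That reading, the hyperparameters (RBF kernel, `nu=0.1`, `gamma="scale"`), the function name and the randomly generated toy data are assumptions, not the settings reported in the paper.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import OneClassSVM

def ocsvm_cv_accuracy(X, n_splits=10, nu=0.1, seed=0):
    """10-fold cross-validated inlier rate of a one-class SVM."""
    X = np.asarray(X, dtype=float)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    accuracies = []
    for train_idx, test_idx in folds.split(X):
        model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu)
        model.fit(X[train_idx])
        # predict returns +1 for inliers (target class) and -1 for outliers.
        predictions = model.predict(X[test_idx])
        accuracies.append(np.mean(predictions == 1))
    return float(np.mean(accuracies))

# Toy data: 100 patients x 5 selected binary symptoms (randomly generated).
rng = np.random.default_rng(0)
X_top5 = (rng.random((100, 5)) < 0.4).astype(float)
accuracy = ocsvm_cv_accuracy(X_top5)
```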

Another way to evaluate feature selection methods is to apply rank-aware evaluation metrics [26] to the ranking results from all feature selection methods. A good feature selection method is one that places relevant features near the top of the list of ranked features. Rank-aware metrics favor the feature selection methods that achieve this goal. There are many rank-aware metrics. Some are binary relevance-based metrics, such as Mean Reciprocal Rank (MRR) [27] and Mean Average Precision (MAP) [27]. These metrics focus on whether a feature is good (relevant) or not. Others are utility-based metrics, such as the Normalized Discounted Cumulative Gain (NDCG) [27], which focus on the degree of goodness (relevance) or relative goodness of each feature. Since our proposed VBFW method aims at providing a relative order of importance between the features (i.e. symptoms), this work uses the NDCG metric to compare the performance of the ranked results of the proposed VBFW method with those of the other five baseline methods.

The NDCG metric [27] provides a measure of the ranking quality. It does so by comparing the ranked results with a ground truth ranking. It is usually computed for the first p ranked items (i.e. at a particular rank position p) and is known as NDCG@p or NDCG_p. Its value is between 0 and 1. A higher value means a higher quality ranking. It is given by the following formula:

where rel_i is the graded relevance of the item at position i, and REL_p is the list of relevant items ordered by their relevance up to position p.

This work uses the NDCG implementation provided by [28]. In this part of the experiment, we apply the NDCG@p metric to the proposed VBFW method and the five baseline feature selection methods to quantify their ranked results up to the feature (i.e. symptom) at rank position p = 5. The ground truth ranking used in this computation is the result of the rank aggregation of Experiment VI, plus two additional rankings provided by two medical doctors. Below are the detailed steps followed in this part of the experiment.

(1) Set the two ranked results of Experiment VI as ground truth.

(2) Two medical doctors are asked to rank the 20 symptoms (shown in Table 4) based on their importance in predicting COVID-19.

(3) The two rankings from (2) are added to the ground truth in (1).

(4) For each of the six feature selection methods:
    a. Select the top five features (i.e. symptoms) with their relative ranking positions.
    b. Compute the NDCG@5 metric.
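The NDCG@p computation in step (4b) can be sketched as below. This sketch assumes the linear-gain form DCG_p = Σ_{i=1}^{p} rel_i / log2(i + 1), normalized by the DCG of the ideal ordering. The implementation in [28] may use a different gain (for example, an exponential one), and the graded relevance values in the example are hypothetical.

```python
import numpy as np

def dcg_at_p(relevances, p):
    """Discounted cumulative gain over the first p positions (linear gain)."""
    rel = np.asarray(relevances, dtype=float)[:p]
    discounts = np.log2(np.arange(2, rel.size + 2))  # log2(i + 1), i = 1..p
    return float(np.sum(rel / discounts))

def ndcg_at_p(ranked_items, relevance, p):
    """NDCG@p of a ranked list, given ground-truth graded relevance per item."""
    gains = [relevance.get(item, 0.0) for item in ranked_items]
    ideal = sorted(relevance.values(), reverse=True)
    idcg = dcg_at_p(ideal, p)
    return dcg_at_p(gains, p) / idcg if idcg > 0 else 0.0

# Hypothetical graded relevance derived from a ground-truth ranking.
relevance = {"fever": 5, "cough": 4, "fatigue": 3,
             "sore throat": 2, "shortness of breath": 1, "headache": 0}

perfect = ["fever", "cough", "fatigue", "sore throat", "shortness of breath"]
swapped = ["cough", "fever", "fatigue", "sore throat", "shortness of breath"]

ndcg_perfect = ndcg_at_p(perfect, relevance, p=5)
ndcg_swapped = ndcg_at_p(swapped, relevance, p=5)
```

A ranking that matches the ground truth order scores 1, and any deviation near the top of the list lowers the score more than the same deviation further down.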

In Experiment VII, we use the proposed VBFW method to find some important features (i.e. symptoms). These features are across the entire population. However, it might be interesting to detect important features (i.e. symptoms) with respect to people with different genders, ages, and countries, which could then be used for each sub-population.

In this experiment, we split our COVID-19 dataset into partitions based on gender, age and country. Then, we apply our VBFW method to each partition to compute quantitative importance weights for each of the 20 symptoms. Below are the detailed steps followed in this experiment.

(1) Split our COVID-19 dataset into two partitions: one for males and one for females. For each partition, apply the proposed VBFW method to compute the importance weights for each of the 20 symptoms and rank the symptoms based on these weights (as done in Experiment VII).

(2) Split our COVID-19 dataset into four partitions: one for each age range (as presented in Table 4). For each partition, apply the proposed VBFW method to compute the importance weights for each of the 20 symptoms and rank the symptoms based on these weights (as done in Experiment VII).

(3) Split our COVID-19 dataset into 26 partitions: one for each country (as presented in Table 3). For each partition, apply the proposed VBFW method to compute the importance weights for each of the 20 symptoms and rank the symptoms based on these weights (as done in Experiment VII).

In this section, we present the results of the nine experiments (I-IX) explained in the previous section.

Table 5 shows a sorted list of the Spectral Score measures for the 20 features (i.e. symptoms) in our dataset, derived from Experiment I. Higher Spectral Scores indicate greater importance. Thus, a rank of 1 is assigned to the feature with the highest Spectral Score and a rank of 20 to the feature with the lowest. Table 6 shows a sorted list of the Information Score measures for the 20 features, derived from Experiment II. Higher Information Scores indicate greater importance. Thus, a rank of 1 is assigned to the feature with the highest Information Score and a rank of 20 to the feature with the lowest. Table 7 shows a sorted list of the Pearson Correlation measures for the 20 features, derived from Experiment III. Higher Pearson Correlations indicate greater importance. Thus, a rank of 1 is assigned to the feature with the highest Pearson Correlation and a rank of 20 to the feature with the lowest. Table 8 shows a sorted list of the Intra-Class Distance measures for the 20 features, along with their standard deviation and confidence interval measures, derived from Experiment IV. Lower Intra-Class Distances indicate greater importance. Thus, a rank of 1 is assigned to the feature with the lowest Intra-Class Distance and a rank of 20 to the feature with the highest. Moreover, the resulting narrow confidence intervals (≈ ±0.03) indicate that the average Euclidean distance is a good representative of the sample metric. Table 9 shows a sorted list of the Interquartile Range measures for the 20 features, derived from Experiment V. Higher Interquartile Ranges indicate greater importance. Thus, a rank of 1 is assigned to the feature with the highest Interquartile Range and a rank of 20 to the feature with the lowest.

Table 10 shows the rank aggregation results derived from Experiment VI, using the Averaging rank aggregation method. Table 11 shows the rank aggregation results derived from Experiment VI, using the Majority voting rank aggregation method. In both tables, lower rank values indicate greater importance. Thus, the features (i.e. symptoms) were ordered from rank 1 (i.e. most important) to rank 20 (i.e. least important).

Fig. 1 shows the results of applying the proposed VBFW method to our dataset, derived from Experiment VII. The figure shows the importance weights (in percentages) assigned to each of the 20 features (or symptoms) along with their ranking. Fig. 2 shows the results of applying a one-class Support Vector Machine to our dataset using the top five features (or symptoms) resulting from each of the six feature selection methods, derived from Experiment VIIIA. The figure shows the 10-fold cross validation accuracy (in percentages) for each feature selection method. Fig. 3 shows the results of applying rank-aware evaluation and validation to the ranked results generated by each of the six feature selection methods, derived from Experiment VIIIB. The figure shows the NDCG@5 (in percentages) for each feature selection method. Figs. 4, 5 and 6 show the results of applying the proposed VBFW method to our dataset when split by gender, age and country, respectively, all derived from Experiment IX. Each of these figures shows the importance weights (in percentages) assigned to each of the 20 features (or symptoms) along with their ranking.
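The rank convention described above (rank 1 for the highest score, except for the Intra-Class Distance, where rank 1 goes to the lowest value) can be sketched as follows. This is a minimal illustration only: the function name `rank_features` and the score values are hypothetical and are not taken from Tables 5-9.

```python
def rank_features(scores, higher_is_better=True):
    """Return {feature: rank}, where rank 1 marks the most important feature."""
    # Sort feature names by score; descending when higher scores mean more important.
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {feature: position for position, feature in enumerate(ordered, start=1)}


# Hypothetical scores, for illustration only (not values from the paper).
spectral = {"Cough": 0.9, "Fatigue": 0.7, "Fever": 0.6}
distance = {"Cough": 0.2, "Fever": 0.3, "Fatigue": 0.5}

# Spectral Score, Information Score, Pearson Correlation, IQR: higher is better.
spectral_ranks = rank_features(spectral, higher_is_better=True)
# Intra-Class Distance: lower is better.
distance_ranks = rank_features(distance, higher_is_better=False)

print(spectral_ranks)
print(distance_ranks)
```

Ties are broken here by input order, which is an assumption of this sketch; the paper does not state a tie-breaking rule in this section.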

The results presented in Tables 5 and 6 indicate that the five most important symptoms of COVID-19 confirmed cases (based on the spectral score as well as the information score measures) are as follows, starting from the most important to least important:

Cough, Fatigue, Fever, Sore Throat, Shortness of Breath

The results presented in Table 7 indicate that the five most important symptoms of COVID-19 confirmed cases (based on the Pearson correlation measure) are as follows, starting from the most important to least important:

Fever, Fatigue, Diarrhea, Chill, Cough

The results presented in Table 8 indicate that the five most important symptoms of COVID-19 confirmed cases (based on the intra-class distance measure) are as follows, starting from the most important to least important:

Cough, Fever, Fatigue, Sore Throat, Shortness of Breath

The results presented in Table 9 indicate that the five most important symptoms of COVID-19 confirmed cases (based on the interquartile range measure) are as follows, starting from the most important to least important:


In summary, for our COVID-19 dataset, the use of different feature selection measures provided different importance levels for the top-five ranked symptoms, and even different sets of important symptoms. This is due to the fact that each of the five measures for feature selection (or importance) ranks the symptoms based on different data characteristics.

For example, the spectral score and information score measures focus on the similarity and consistency of the data, while the intra-class distance measure focuses on the dissimilarity with the centroid artificial instance. On the other hand, the Pearson correlation measure focuses on the association and correlation between the features, while the interquartile range measure focuses on the features with more concentrated values. Taken together, this suggests that each of the five measures has advantages and disadvantages. All of these rankings can be aggregated, to combine the distinct aspects of the data considered by each measure.

The aggregation results presented in Tables 10 and 11 show that the five most important symptoms of COVID-19 confirmed cases (based on both the average ranking and the majority vote ranking aggregation methods) are as follows, starting from the most important to least important. (Fever and cough were ranked equally in both aggregations.)

Fever/Cough, Fatigue, Sore Throat, Shortness of Breath

The fact that the results for the top five symptoms from both of these aggregation ranking methods were identical supports the identification of these five symptoms as being the most indicative of COVID-19 in these confirmed cases.
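The averaging step can be illustrated with the per-method ranks (SPEC, IS, PC, ICD, IQR) listed for these five symptoms in the rank table of this paper. The sketch below only takes the arithmetic mean of the five ranks and sorts by it; it covers five of the 20 symptoms and does not reproduce the majority voting method, whose rule is not restated in this section.

```python
# Ranks per method in the order SPEC, IS, PC, ICD, IQR, as listed in the rank table.
ranks = {
    "Fever": [3, 3, 1, 2, 1],
    "Cough": [1, 1, 5, 1, 2],
    "Fatigue": [2, 2, 2, 3, 6],
    "Sore Throat": [4, 4, 10, 4, 3],
    "Shortness of Breath": [5, 5, 8, 5, 7],
}

# Averaging rank aggregation: the mean of the ranks assigned by the five methods.
average_rank = {symptom: sum(values) / len(values) for symptom, values in ranks.items()}

# A lower average rank means greater importance; ties keep their input order.
ordered = sorted(average_rank, key=average_rank.get)

for symptom in ordered:
    print(f"{symptom}: {average_rank[symptom]:.1f}")
```

Fever and Cough both average to 2.0, which is consistent with the two symptoms being ranked equally, and the means for the first four symptoms match the values shown in the rank table.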

The results presented in Fig. 1 indicate that the five most important symptoms of COVID-19 (based on the proposed Variance Based Feature Weighting (VBFW) method) are as follows, starting from the most important to least important:

Fever, Cough, Fatigue, Sore Throat, Shortness of Breath

Note that the VBFW method ranked fever much higher than cough, based on its quantitative importance measures for each of those symptoms (Fever: 75 %, Cough: 39.8 %, Fatigue: 16.5 %, Sore Throat: 10.8 %, and Shortness of Breath: 6.6 %).

These percentages show that, out of the five most important symptoms (i.e. Fever, Cough, Fatigue, Sore Throat, Shortness of Breath), Fever and Cough symptoms are common to a very high percentage of confirmed COVID-19 cases.

Since this work deals with a feature space that binarizes the presence of a feature (i.e. symptom) in unstructured text, the direction of the change of variance from removing a symptom is influenced by how many total present symptoms a case has. Symptoms with higher importance weights are essentially those that are less likely to co-exist with other symptoms. The fact that fever has a high importance weight reflects the possibility that it is less likely to be mentioned with other symptoms. In other words, when fever is reported, people may be less inclined to report other symptoms. The less important symptoms, such as anorexia and sweating, are likely to be regarded as secondary symptoms that typically accompany more common ones such as fever and cough. In this work, the result of the proposed VBFW method is largely determined by the co-occurrence of symptoms in a case.
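To make the binarized feature space concrete, the sketch below maps free-text symptom notes to a 0/1 presence vector and counts how many other symptoms co-occur with a given one. It is an illustration under stated assumptions, not the authors' preprocessing or the VBFW computation: the vocabulary `SYMPTOMS`, the helper names, the substring matching, and the case texts are all hypothetical.

```python
# Hypothetical vocabulary; the dataset in the paper uses 20 symptoms.
SYMPTOMS = ["fever", "cough", "fatigue", "sore throat", "shortness of breath"]


def binarize(case_text):
    """Map a free-text symptom note to a 0/1 presence vector over SYMPTOMS."""
    lowered = case_text.lower()
    # Naive substring match; it does not handle negations such as "no fever".
    return [1 if symptom in lowered else 0 for symptom in SYMPTOMS]


def co_occurring(vector, index):
    """Count the other symptoms present in a case where symptom `index` is present."""
    return sum(vector) - vector[index] if vector[index] else 0


# Made-up case notes, for illustration only.
cases = ["Fever", "fever, cough, fatigue", "Cough and sore throat"]
vectors = [binarize(text) for text in cases]

fever = SYMPTOMS.index("fever")
print([co_occurring(vector, fever) for vector in vectors])
```

A symptom that is often reported alone, like fever in the first made-up case, has a low co-occurrence count, which is the property the paragraph above associates with a higher importance weight.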

The results presented in Fig. 2 indicate that a one-class Support Vector Machine model built using the five most important features (or symptoms) resulting from our VBFW method (i.e. Fever, Cough, Fatigue, Sore Throat, Shortness of Breath) outperforms models built using the features selected by the other methods. The model achieved an accuracy of 92.1 % under 10-fold cross validation. Note that these results represent a rankless performance evaluation, in which the rank order of the top five features (or symptoms) does not affect the performance of the built model.

On the other hand, the results presented in Fig. 3 indicate that the ranking of the features (or symptoms) by importance resulting from our VBFW method outperforms the rankings resulting from the other five feature selection methods. The VBFW ranking achieved a normalized discounted cumulative gain of 100 % over the top five symptoms (i.e. NDCG@5). Note that these results represent a rank-aware performance evaluation, in which the rank order of the top five features (or symptoms) affects the measured performance of the feature selection method.
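For readers unfamiliar with the metric, NDCG@k can be sketched as below. This uses the common formulation with linear gain and a log2 position discount; the exact gain function and the ground-truth relevance grades used in Experiment VIIIB are not restated in this section, so the relevance values here are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k positions (linear gain)."""
    return sum(
        relevance / math.log2(position + 1)
        for position, relevance in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k):
    """DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades of the symptoms in the order a method ranked them.
perfect_order = [5, 4, 3, 2, 1]
swapped_order = [4, 5, 3, 2, 1]

print(ndcg_at_k(perfect_order, 5))
print(ndcg_at_k(swapped_order, 5))
```

An ordering that matches the ideal one scores 1 (i.e. 100 %), while swapping two items near the top lowers the score, which is why the metric is sensitive to rank order.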

Taken together, the results of Experiment VIII suggest that our proposed VBFW method outperforms five state-of-the-art feature selection methods. 

The results presented in Fig. 4(a) indicate that the five most important symptoms of COVID-19 for males are as follows, starting from the most important to least important:

Fever, Cough, Fatigue, Sore Throat, Myalgias

The results presented in Fig. 4(b) indicate that the five most important symptoms of COVID-19 for females are as follows, starting from the most important to least important:

Fever, Cough, Fatigue, Sore Throat, Runny Nose

This suggests that although the first four important symptoms are common to both males and females who are infected with COVID-19, the fifth symptom differs: males are likely to suffer from Myalgias, while females are likely to suffer from Runny Nose.

The results presented in Fig. 5 indicate the five most important symptoms of COVID-19 for each age group, starting with people of age 0-14 in Fig. 5(a).

This suggests that children of age 0-14 who are infected with COVID-19 are likely to suffer from Runny Nose, Diarrhea and Flu rather than Fatigue and Shortness of Breath. It also suggests that people of age 15-64 are likely to suffer from Myalgias rather than Shortness of Breath, and elderly people are likely to suffer from Sputum rather than Sore Throat.

The results presented in Fig. 6(a) indicate that the five most important symptoms of COVID-19 for people in China are as follows, starting from the most important to least important:

Fever, Cough, Fatigue, Sore Throat, Pneumonia

The results presented in Fig. 6(b) indicate that the five most important symptoms of COVID-19 for people in Hong Kong are as follows, starting from the most important to least important:

Cough / Fever, Sore Throat, Shortness of Breath / Diarrhea

The results presented in Fig. 6(c) indicate that the five most important symptoms of COVID-19 for people in Japan are as follows, starting from the most important to least important:

Fever, Cough, Fatigue, Sore Throat, Shortness of Breath

The results presented in Fig. 6(d) indicate that the four most important symptoms of COVID-19 for people in Malaysia are as follows, starting from the most important to least important:

Fever, Cough, Sore Throat, Fatigue

The results presented in Fig. 6 (e) indicate that the five most important symptoms of COVID-19 for people in South Korea are as follows, starting from the most important to least important:

Fever, Cough, Myalgias, Chill, Sore Throat

The results presented in Fig. 6(f) indicate that the five most important symptoms of COVID-19 for people in Taiwan are as follows, starting from the most important to least important:

Cough, Fever, Sore Throat, Fatigue / Shortness of Breath

This suggests that people in China who are infected with COVID-19 are likely to suffer from Pneumonia rather than Shortness of Breath. It also suggests that people in Hong Kong are likely to suffer from Diarrhea rather than Fatigue, and people in South Korea are likely to suffer from Myalgias and Chill rather than Fatigue and Shortness of Breath.

In this paper, we posed the following research question: Q: How could we use feature selection methods to (1) rank the COVID-19 symptoms based on their importance and (2) assign importance weights to each symptom?

A novel Variance Based Feature Weighting (VBFW) method is proposed in this paper. This method is able to (1) rank the features in one-class datasets based on their importance and (2) assign quantitative importance weights to each of these features.

The results presented in this paper show that the proposed VBFW method provides weight assignment for the features (symptoms) of the COVID-19 one-class dataset that is equal to, or better than, the feature ranking results obtained by state-of-the-art methods. The results also show that the proposed VBFW method achieved an accuracy of 92.1 % when used to build a one-class SVM model, and an NDCG@5 of 100 %.

Overall, the results suggest that symptoms of Fever, Cough, Fatigue, Sore Throat and Shortness of Breath should be considered important symptoms when diagnosing patients for COVID-19, with a particular focus on Fever and Cough symptoms.

The following aspects form future directions and plans for researchers:

• Testing and validating the proposed VBFW method on other available COVID-19 datasets.
• Generalizing the proposed VBFW method to other one-class datasets, besides COVID-19 data.
• Performing further statistical analysis to study the common symptoms of COVID-19 patients with respect to different genders, ages, races and countries. This might include the use of odds ratios.
• Experimenting with the proposed VBFW method on datasets that have features with continuous and/or multiple discrete values.

The authors report no declarations of interest.


Symptom              SPEC  IS  PC  ICD  IQR  Average Rank
Fever                   3   3   1    2    1  2
Cough                   1   1   5    1    2  2
Fatigue                 2   2   2    3    6  3
Sore Throat             4   4  10    4    3  5
Shortness of Breath     5   5   8    5    7
Chill                  11  11   4   10    9  11
Chest Pain             12  12  15   13   11  12
Flu                    13  13  16   15   15  13
Anorexia               14  14  18   16   16  14
Conjunctivitis         15  15  20   18   17  15
Vomiting               16  16  13   12   14  16
Diarrhea               17  17   3   14    8  17
Abdominal Pain         18  18  14   17   19  18
Pleural Effusion       20  20  19   19   13  19
Sweating               19  19  17   20   20  19

Note: Where the rankings produced by both aggregation methods agreed, they are shown in bold in Tables 10 and 11 above.

Fig. 1. VBFW Weights and Ranking.

Supplementary material related to this article can be found, in the online version, at doi:https://doi.org/10.1016/j.artmed.2021.102018.