key: cord-0950239-0cnyidct
authors: Tavakolian, Alireza; Hajati, Farshid; Rezaee, Alireza; Fasakhodi, Amirhossein Oliaei; Uddin, Shahadat
title: Fast COVID-19 versus H1N1 screening using Optimized Parallel Inception
date: 2022-05-20
journal: Expert Syst Appl
DOI: 10.1016/j.eswa.2022.117551
sha: 41b86f3111012be44c621b9c449a0c415c8a359d
doc_id: 950239
cord_uid: 0cnyidct

COVID-19 and swine-origin influenza A (H1N1) are both pandemics that sparked significant concern worldwide. Since these two diseases have common symptoms, fast COVID-19 versus H1N1 screening helps better manage patients at healthcare facilities. We present a novel deep model, called Optimized Parallel Inception, for fast screening of COVID-19 and H1N1 patients. We also present a Semi-supervised Generative Adversarial Network (SGAN) to address the problem related to the small size of the COVID-19 and H1N1 research data. To evaluate the proposed models, we merged separate COVID-19 and H1N1 data from two different sources to build a new dataset. The created dataset includes 4,383 positive COVID-19 cases, 989 positive H1N1 cases, and 1,059 negative cases. We applied SGAN on this dataset to remove issues related to unequal class densities. The experimental results show that the proposed model's screening accuracy is 99.2% and 99.6% for COVID-19 and H1N1, respectively. According to our analysis, the most significant symptoms and underlying chronic diseases for COVID-19 versus H1N1 screening are dry cough, breathing problems, diabetes, and gastrointestinal conditions.

According to recent regulations by the international virus taxonomy committee, coronaviruses are non-segmented positive-sense ribonucleic acid (RNA) viruses that belong to the family Coronaviridae, the order Nidovirales, and the genus Coronavirus (Zhou et al., 2019). One of the variants of coronaviruses is COVID-19. Hundreds of coronavirus strains cause illness in animals. In a process called a 'spillover event', COVID-19 jumped from unknown animal sources to humans. The case-fatality rate of COVID-19 has varied from 10.8% in European countries such as Italy during their first pandemic wave to 0.7% in Germany. However, the global fatality rate, based on the total deaths and the total recovered cases in the world, has been 2.71% (Vaillant et al., 2009).

The influenza virus is notable for its periodic occurrence and yearly economic impact (Purohit et al., 2018). The annual seasonal influenza epidemic infects 3-5 million people with serious conditions worldwide (Nguyen et al., 2016). In 2009, the novel swine-origin influenza A (H1N1) virus was identified (Patel et al., 2010). Most fatal cases of H1N1 occurred in patients aged 21 to 50 years (Patel et al., 2010). The reported case-fatality rate of H1N1 during the pandemic ranged from 0.3% to 3% (Vaillant et al., 2009). For diagnosing H1N1 in patients with flu-like symptoms, routine investigations such as haematological, microbiological, biochemical, and radiologic tests are performed. Owing to the reaction of the body's immune system, common symptoms of H1N1 include high fever, coryza, and myalgia. In severe cases, viral pneumonia, superimposed bacterial pneumonia, and hemorrhagic bronchitis have been reported (Jilani et al., 2020). Other symptoms of the H1N1 virus are similar to common seasonal flu symptoms, such as sore throat, fatigue, runny nose, cough, and headache. The most common symptoms of COVID-19 are cough, weakness, myalgia, fever, headache, impaired sense of smell, impaired sense of taste, sore throat, runny nose, and nasal congestion (Vaillant et al., 2009). The common symptoms of COVID-19 and H1N1 are similar, which makes the screening task challenging. Also, the peak seasons of these viruses may overlap (Foust et al., 2020). So, fast screening of patients for these two viruses using a non-invasive procedure can help healthcare systems respond better.

This research proposes a novel deep model, called Optimized Parallel Inception (OPI), for fast screening of COVID-19 and H1N1 patients. We evaluate the proposed model by measuring accuracy, precision, recall, F1-score, and Area Under the Receiver Operating Characteristic curve (AUROC). We have built a dataset by merging two publicly available sets of COVID-19 and H1N1 data to conduct the experiments. We also identify the possible chronic disease predictors for COVID-19 and H1N1 using the outcome of the experiments. Finally, to address the lack of data, a Semi-supervised Generative Adversarial Network (SGAN) is deployed on 400 samples (less than 10% of the built dataset). The result shows the effectiveness of the SGAN for screening H1N1 and COVID-19 patients even with a small training dataset.

Machine learning tasks are generally categorized as supervised and unsupervised (Esen et al., 2008a, Esen et al., 2008b). If the labels of the training samples are known, the task is categorized as supervised learning (Esen et al., 2017, Esen et al., 2009, Esen et al., 2009). Since we have the training labels of the COVID-19 patients in the screening task, we can treat it as supervised learning. Many studies have been conducted to diagnose COVID-19 patients and to extract the most critical features for predicting the behaviour of the COVID-19 virus in order to reduce infection. Khanday et al. (2020) used clinical text data to classify COVID-19 patients using classical and ensemble machine learning methods. Their models grouped 212 patients into five classes: COVID-19, Acute Respiratory Distress Syndrome (ARDS), COVID-19 and ARDS, Severe Acute Respiratory Syndrome (SARS), and healthy patients. Using the Term Frequency-Inverse Document Frequency (TF-IDF) technique, they first extracted the most correlated features. Then, they used multinomial naive Bayes (Corbett-Davies & Goel, 2018) and logistic regression to achieve an accuracy of 96.2%. 
Wang and Wong (2020) used 13,975 chest radiography images to develop a Convolutional Neural Network (CNN) model to diagnose COVID-19 in patients. They compared their model with both Residual Networks-50 (ResNet50) and Visual Geometry Group-16 (VGG16) (Gikunda & Jouandeau, 2019). Their proposed model classified patients with an accuracy of 93.3%. Yan et al. (2020) developed five machine learning algorithms, including logistic regression, support vector machine, gradient boosted decision tree, k-nearest neighbour, and neural network, for the prediction of critical COVID-19 using immune-inflammatory features at admission in Tongji Hospital, Wuhan. They studied the electronic records of 2,799 patients and tested the models on 29 patients. Finally, they extracted three significant features to distinguish critical patients. Jiang et al. (2020) proposed a data-driven Artificial Intelligence (AI)-based algorithm to identify high-risk patients, namely those with ARDS. Fever and cough were the most common symptoms in their study population (53 individuals in total). They concluded that elevated haemoglobin, body aches, and alanine aminotransferase (a liver enzyme) are the most predictive features for recognizing those prone to ARDS. Their model's accuracy varied from 70% to 80% using decision trees, random forests, and support vector machine algorithms. Zoabi and Shomron (2020) used a gradient-boosting algorithm to build a decision-tree-based model for diagnosing COVID-19 patients. They trained the model with 51,831 tested individuals, including 3,624 positive cases. Their test set consists of 47,401 samples, including 3,624 positive cases. They used three features (gender, age, and close contact with a positive COVID-19 case) and five clinical symptoms (fever, cough, shortness of breath, headache, and sore throat). Their model is a binary classifier that predicts whether the tested person is infected by COVID-19 or not. They reported an AUROC of 0.90. Batista et al. 
(2020) developed a binary classification model with various classical machine learning algorithms to diagnose COVID-19 in emergency care patients. In this study, models were trained on 235 adults, including 101 positive COVID-19 cases. The support vector machine obtained the best result, with an accuracy of 85%, an AUROC greater than 0.84, and a sensitivity of 0.68. They also extracted the most predictive features, which were lymphocytes, leukocytes, and eosinophils. Shen et al. (2020) tried to classify pneumonia and COVID-19 through a comparative analysis of patient data and the distinguishing features of each disease group. However, due to the similarity of the patients' clinical characteristics, they had to search for a suitable discriminator. They found that increased monocyte percentage, increased C-reactive protein, and decreased eosinophils were more common in COVID-19 patients than in H1N1 patients. However, these features were not robust enough for a reliable classification. Finally, they reported that computed tomography (CT) scanning combined with nucleic acid detection is an effective and accurate method for detecting COVID-19. Beyond the use of medical imaging for detecting COVID-19 patients, various machine learning models help to investigate more aspects of the problem, and using these models improves the result (Esen et al., 2008d, Esen et al., 2008c). Most models developed for diagnosing the COVID-19 virus use a binary classifier to detect positive and negative cases. All the data-driven models have been developed on classical machine learning algorithms with a limited number of parameters, which give a better performance on small datasets (Belkin et al., 2019). Due to the COVID-19 pandemic and the limitations of medical services, CT imaging is not available to all patients (Shah et al., 2021). In addition, infection in the early stages of COVID-19 is not evident in CT images. 
To address these limitations, we propose a deep learning method for fast screening of COVID-19 versus H1N1 using clinical symptoms, which provides a rapid and accurate result. Since the proposed deep learning models find complex and nonlinear patterns, they can discriminate between COVID-19 and H1N1 despite their similar symptoms. Also, deep learning models can generalize to larger datasets.

Currently, there is no publicly available dataset that includes both COVID-19 and H1N1 cases for evaluating screening models. To address this limitation, we have merged two sets of publicly available data on COVID-19 and H1N1. This dataset can be used in COVID-19 and H1N1 screening research.

For COVID-19, we use the COVID-19 symptom checker data (Bilal H, 2020). The cleaned data contains 5,435 patients, including 4,383 positive cases. This data contains information about symptoms of COVID-19 such as 'Breathing Problem', 'Fever', 'Dry Cough', 'Headache', 'Sore Throat', 'Running Nose', and 'Fatigue'. The dataset also contains the history of patients' chronic diseases such as 'Asthma', 'Chronic Lung Disease', 'Heart Disease', 'Diabetes', 'Hypertension', and 'Gastrointestinal'. Moreover, patients' behavioural information such as 'Abroad Travel', 'Contact with COVID-19 Patients', 'Attended Large Gathering', 'Visited Public Exposed Places', and 'Family Working in Public Exposed Places' is recorded in this dataset. The attribute values are recorded as either 'Yes' or 'No'. 'Cough' and 'Fever' are the most frequent symptoms in the dataset. We also show the percentage of the chronic diseases for the positive cases only. Among patients with a positive COVID-19 test, hypertension is the most frequent chronic condition. 

The H1N1 data were obtained from the NIAID Influenza Research Database (IRD) (Zhang et al., 2017). The dataset contains 996 patients, including 989 positive H1N1 cases. The dataset includes attributes such as 'Collector Institution', 'Host Identifier', 'Collection Year', 'Country', 'Symptoms', 'Subject Age', and 'Temperature'. Information about age, gender, and year of data collection is illustrated in Figure 2. One of the major aspects of the data, shown in Figure 2(a), is the average age of infected patients, which is 21 and 11 years for females and males, respectively. This information confirms that the 2009 H1N1 pandemic mostly affected younger individuals. Also, Figure 2(b) shows that the first wave of H1N1 occurred from 2008 to 2009, while the second wave was from 2013 to 2015. [...] 'Fatigue' (Vaillant et al., 2009). This data also contains the percentage of chronic diseases in the study population. Information about the chronic diseases is shown in Figure 3. The figure shows that asthma and the H1N1 virus have a high correlation. Also, 37% of H1N1 patients have asthma. 

We have built the COV-H1N1 dataset by merging the COVID-19 and H1N1 data. Most machine learning algorithms only accept numerical data as input, so the input data should be transformed into numerical features. Filling the missing values is the first priority. Given the categorical nature of features such as symptoms, underlying disease, and gender, missing values in each attribute are imputed with the most frequent value (García et al., 2015). In the COV-H1N1 dataset, the highest percentage of missing values is 2.5%, which belongs to 'Dry Cough'. The other attributes have less than 2.5% (in total) missing values. Constant attributes are attributes that contain only a single value. They provide no useful information for screening a record. Therefore, we remove all the constant attributes from the dataset.
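The two cleaning steps described above (most-frequent-value imputation and removal of constant attributes) can be sketched roughly as follows. This is a minimal illustration with pandas, not the authors' code; the toy records and the 'Hypothetical Flag' column are made up for the example.

```python
import pandas as pd

# Hypothetical toy records; values are invented for illustration only.
records = pd.DataFrame({
    "Dry Cough": ["Yes", None, "Yes", "No"],
    "Fever": ["Yes", "Yes", "No", None],
    "Hypothetical Flag": ["No", "No", "No", "No"],  # a constant attribute
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Impute missing values with the column mode, then drop constant columns."""
    filled = df.copy()
    for col in filled.columns:
        mode = filled[col].mode(dropna=True)
        if not mode.empty:
            filled[col] = filled[col].fillna(mode.iloc[0])
    # A constant attribute has one distinct value and carries no signal.
    keep = [c for c in filled.columns if filled[c].nunique(dropna=False) > 1]
    return filled[keep]

cleaned = clean(records)
```

Imputing before the constant check means a column whose only variation was missing values is also treated as constant and removed.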

After cleaning the dataset, we apply three different encoding procedures. First, we use a label encoder that converts 'No' values to '0' and 'Yes' values to '1'. Then, we apply one-hot encoding and target encoding (Rodríguez et al., 2018) to the cleaned COV-H1N1 dataset. The use of one-hot and target encoders has shown promising results in CNNs (Gikunda & Jouandeau, 2019). The one-hot encoder creates an orthogonal space for the values of the attributes. The target encoder converts attributes' values into numerical values according to the average of the attributes' values. Figure 4 shows the correlation between the attributes in the COV-H1N1 dataset. Since the correlation between the attributes is within (-0.5, 0.5), no feature elimination is required.
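A small sketch may help to contrast the three encoders. It assumes the common form of target encoding, in which each attribute value is replaced by the mean target observed for that value; the text does not spell out the exact variant used. The toy data and the `label` column name are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; "label" is an assumed name for the class column.
df = pd.DataFrame({
    "Dry Cough": ["Yes", "No", "Yes", "Yes"],
    "Fever": ["No", "No", "Yes", "Yes"],
    "label": [1, 0, 1, 0],
})
features = ["Dry Cough", "Fever"]

# Label encoding: 'No' -> 0, 'Yes' -> 1.
label_encoded = df[features].apply(lambda col: col.map({"No": 0, "Yes": 1}))

# One-hot encoding: one indicator column per (attribute, value) pair.
one_hot = pd.get_dummies(df[features], dtype=int)

# Target encoding (assumed variant): mean of the target for each value.
target_encoded = pd.DataFrame({
    col: df[col].map(df.groupby(col)["label"].mean()) for col in features
})
```

For binary 'Yes'/'No' attributes, the one-hot columns are redundant in pairs, which is one reason to inspect attribute correlations before training.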

The classes of the COV-H1N1 dataset have different densities. Traditional classification algorithms, such as K-Nearest Neighbors (KNN) (Sha'abani et al., 2020), Support Vector Machine (SVM), and decision trees, which perform well in problems with balanced classes, do not necessarily achieve an acceptable performance in imbalanced class problems. One solution for reaching a balanced dataset is oversampling. One of the first and simplest oversampling methods is random oversampling (Ghazikhani et al., 2012). The idea behind random oversampling is to generate instances in the minority class until the class densities are equal. The Synthetic Minority Over-sampling Technique (SMOTE) is one of the most popular sampling methods. Many improved oversampling algorithms attempt to retain SMOTE's advantages and reduce its shortcomings. Modified SMOTE (MSMOTE) is a modified version of SMOTE which divides the samples of the minority class into three groups (safe, border, and latent noise instances) by calculating distances among all samples (Feng et al., 2018). MSMOTE, unlike SMOTE, first tries to identify noisy samples in the majority class. Then, having defined three groups in the minority class, MSMOTE generates new samples only for minority class instances that are not classified as latent noise. Thus, generating new samples in the minority class with MSMOTE leads to samples that are more similar (label-wise) to the minority class than those generated with SMOTE. So, using MSMOTE rather than SMOTE balances the minority and majority classes more robustly. The result of applying MSMOTE sampling and oversampling to the COV-H1N1 dataset is shown in Figure 5. As can be seen, the use of sampling methods has changed the density of values in each class. Among the applied methods, MSMOTE with a safe border strategy for the minority class showed the best result. 
So, in this research, we use MSMOTE for balancing the COV-H1N1 dataset. After obtaining the balanced dataset, we divide it into three sets: train, validation, and test sets. 
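To make the interpolation idea behind SMOTE-style oversampling concrete, the sketch below generates synthetic minority rows between a sample and one of its nearest minority neighbours. It is plain SMOTE-like interpolation under simplifying assumptions, not MSMOTE: the safe/border/latent-noise grouping described above is omitted, and the function name and parameters are illustrative.

```python
import numpy as np

def smote_like(minority: np.ndarray, n_new: int, k: int = 3, seed: int = 0) -> np.ndarray:
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)           # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_new, minority.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(minority))    # random minority sample
        mate = neighbours[base, rng.integers(k)]
        gap = rng.random()                    # interpolation factor in [0, 1)
        synthetic[i] = minority[base] + gap * (minority[mate] - minority[base])
    return synthetic

# Toy minority class with two numeric features (requires k < number of rows).
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_rows = smote_like(minority, n_new=6)
```

Each synthetic row lies on the segment between two existing minority rows, so it stays inside the range already covered by the minority class.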

The inception model is a well-known deep CNN architecture that has an auxiliary path to increase computational efficiency. Here, we propose an Optimized Parallel Inception (OPI) model to screen the records of the created dataset. The structure of the proposed model is shown in Figure 6. Unlike conventional CNNs, which extract information from two-dimensional images, the proposed model uses healthcare record data. The structure of OPI consists of one main path and two auxiliary paths. The first 8 layers of all paths are similar. These layers extract primary information from the input data. Inception layers with kernel sizes of 3 and 5 extract the relationship between co-occurring symptoms and comorbidities, while kernel sizes of 7 and 9 focus on extracting the relationship between other symptoms and underlying diseases. After these common feature extraction layers, Auxiliary Path 1 tries to classify the instances with the corresponding dense layer. The other two paths have more inception layers in their structure for extracting high-level features. After the inception layers in each path, a different structure of fully connected layers is used for classification. In the main path, 19 different layers including dropout, dense, convolutional, pooling, and inception layers are used. In the first auxiliary path, we use a small window for the pooling layer and deploy a dropout layer with a higher probability to avoid overfitting. The second auxiliary path follows the opposite design to the first auxiliary path. Also, we can use different activation functions, such as the Swish function (Harshanand & Sangaiah, 2020) and the hyperbolic tangent function, for different auxiliary paths. Experimental results show that using the Swish activation function for the shortest path helps OPI reach a more consistent performance. In the proposed model, the outputs of the main and two auxiliary paths enter a competitive layer. 
The strategy behind this layer is defined as:

• If the difference between the accuracy of the main and two auxiliary paths in the training phase is equal to or greater than 0.1%, the output of the path with the maximum accuracy will determine the model's decision in the testing phase.

• If the difference between the accuracy of the main and two auxiliary paths in the training phase is less than 0.1%, the model's decision will be determined using the average of the paths' outputs in the testing phase.

Using the above strategy, the path or paths with the best individual performance determine the decision of the model. This strategy is used because of the small difference among the accuracies of the paths.
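The competitive-layer rule above can be sketched as a small function. One point is an assumption: the text does not define how the "difference" among three accuracies is computed, so the sketch uses the spread between the best and the worst path. The function name and argument layout are illustrative only.

```python
import numpy as np

def competitive_decision(train_acc, path_probs, threshold=0.001):
    """Combine path outputs following the two rules stated above.

    train_acc  : training accuracies of the main and two auxiliary paths (fractions).
    path_probs : array of shape (3, n_samples, n_classes) with class probabilities.
    threshold  : 0.1% expressed as a fraction.
    """
    train_acc = np.asarray(train_acc, dtype=float)
    path_probs = np.asarray(path_probs, dtype=float)
    # Assumption: "difference" means the spread between best and worst path.
    spread = train_acc.max() - train_acc.min()
    if spread >= threshold:
        chosen = path_probs[train_acc.argmax()]   # the best path decides alone
    else:
        chosen = path_probs.mean(axis=0)          # average the three paths
    return chosen.argmax(axis=1)

# One test sample, three classes, one probability row per path (made-up numbers).
toy_probs = [[[0.6, 0.4, 0.0]], [[0.1, 0.9, 0.0]], [[0.2, 0.8, 0.0]]]
winner_takes_all = competitive_decision([0.99, 0.98, 0.97], toy_probs)
averaged = competitive_decision([0.9900, 0.9901, 0.9902], toy_probs)
```

In the first call the main path is clearly the most accurate, so its prediction is used; in the second call the paths are nearly tied, so their probabilities are averaged before the class is picked.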

The OPI uses a modified inception module, which is shown in Figure 7. For better performance, we use the inception layers before the deep convolutional layers. We exploit the merit of using a one-dimensional CNN with a large kernel size. These layers help us reduce the computational cost and the number of parameters, which speeds up training and improves generalization. Also, because the healthcare data does not include a spatial dimension, these layers help to capture the patterns along the depth dimension. Lastly, each pair of filters ([1 × 1, 7 × 7] and [1 × 1, 9 × 9]) acts like a single, powerful convolutional layer, capable of capturing more complex patterns (Gikunda & Jouandeau, 2019). Using the modified inception module, we achieve a deeper network without overfitting, and vanishing gradients do not affect the network.

Feature selection using Particle Swarm Optimization (PSO) is another aim of this research for COVID-19 versus H1N1 screening. PSO is a population-based optimization technique inspired by the motion of bird flocks and schooling fish (Amoozegar & Minaei-Bidgoli, 2018). In PSO, the system is initialized with a population of random solutions, and the search for the optimal solution is performed by updating generations. PSO uses the information of each individual and of the swarm's search space to reach the global minimum of the objective function (Amoozegar & Minaei-Bidgoli, 2018). One of the reasons for choosing PSO for the feature selection task is its fast convergence speed.
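As a rough illustration of the generation-by-generation search described above, the sketch below runs a standard continuous PSO on a toy objective. It is not the feature-selection setup of this work: the objective here is a simple sum of squares rather than the OPI-based score, the position update (position plus new velocity) is the textbook rule, and all parameter values are assumptions. For feature selection, positions would additionally have to be mapped to feature subsets, which is not shown.

```python
import numpy as np

def pso_minimize(objective, dim, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` with a basic particle swarm."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                        # best location of each particle
    best_val = np.array([objective(p) for p in pos])
    g = best_pos[best_val.argmin()].copy()       # best location of the swarm
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Inertia term plus attraction to the personal and swarm bests.
        vel = w * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g - pos)
        pos = pos + vel
        vals = np.array([objective(p) for p in pos])
        improved = vals < best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = vals[improved]
        g = best_pos[best_val.argmin()].copy()
    return g, float(best_val.min())

# Toy objective (sum of squares) with its minimum at the origin.
best, value = pso_minimize(lambda x: float(np.sum(x ** 2)), dim=3)
```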

Assume the location of the $i$-th particle is $P_i(t) = (P_{i,1}, P_{i,2}, \ldots, P_{i,D})$, its velocity is $V_i(t) = (V_{i,1}, V_{i,2}, \ldots, V_{i,D})$, the optimal location found by this particle is $L_i(t) = (L_{i,1}, L_{i,2}, \ldots, L_{i,D})$, and the optimal location found by the swarm is $G_i(t) = (G_{i,1}, G_{i,2}, \ldots, G_{i,D})$. Then, the location and velocity of each particle are updated as below.

$$V_{i,j}(t+1) = w \cdot V_{i,j}(t) + r_1 \cdot c_1 \cdot \left(L_{i,j}(t) - P_{i,j}(t)\right) + r_2 \cdot c_2 \cdot \left(G_{i,j}(t) - P_{i,j}(t)\right) \tag{1}$$

where $t$ is the iteration number, $c_1$ and $c_2$ are two acceleration coefficients, $r_1$ and $r_2$ are random values between 0 and 1, and $w$ is the inertia weight applied to the particle's current velocity. The aim is to rank the features based on their significance for COVID-19 and H1N1 screening and to select a proper feature set. The PSO algorithm defines the subsets of features randomly based on the optimization policy. For the evaluation of each particle, we use the OPI model. The objective function for evaluating each particle (a subset of features) is determined by the measured accuracy of the OPI model. To increase the overall performance of the OPI, we use a parameter $P$, which is defined as the average accuracy obtained over the paths. For better discrimination between particles, a parameter $E$, which stands for the average mean squared error of the three paths, is also added to the objective function. So, the objective function, $F$, is defined as

where $N_s$ is the number of selected features, $N_t$ is the total number of features, which is equal to 18 in the COV-H1N1 dataset, and $\alpha$ is a hyperparameter specifying the contribution of accuracy and loss in the objective function. We have used a grid search strategy to find the optimum value for $\alpha$. The chosen range for $\alpha$ is from 0.01 to 0.99. The result of this experiment is shown in Figure 8. Based on this result, $\alpha$ is set to 0.9. Also, $P$ and $E$ are the aggregated accuracy and loss of OPI for each PSO subset. The optimization algorithm is presented below.

To address the limitation in the available COVID-19 and H1N1 data, we present a semi-supervised data generator model based on Generative Adversarial Networks (GANs) (Pan et al., 2019). GANs consist of one model for generation and one model for discrimination. The generator tries to produce fake data from noise, while the discriminator decides whether the generated data is real or not. In semi-supervised GANs (SGANs), the discriminator also classifies each sample into different classes. The main aim of the SGAN in our case is to train the discriminator for a better screening task using supervised loss minimization. The architecture of the proposed SGAN is shown in Figure 9. Table 3 shows the architecture of the proposed SGAN. The generator is a multi-layer perceptron that takes a noise vector and converts it to a one-dimensional vector whose length is equal to the number of features in the dataset. The discriminator is composed of a fully connected layer. Here, we use the Leaky Rectified Linear Unit (Leaky ReLU) as the activation function. With this activation function, the negative side of the inputs is also considered for prediction. 
Also, we use batch normalization as a trainable layer, which decreases unwanted interdependence between parameters across layers, speeds up the training process, and increases robustness (Pan et al., 2019). To avoid overfitting, a dropout layer is also used. 
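For readers less familiar with these building blocks, the forward computations of Leaky ReLU and batch normalization can be sketched in a few lines of NumPy. The negative slope of 0.2 and the epsilon value are assumptions for illustration; the values used in the proposed SGAN are not stated here.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    """Pass positives unchanged and scale negatives by a small slope."""
    return np.where(x > 0, x, slope * x)

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch, then apply scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# Toy batch of three samples with two features (made-up values).
batch = np.array([[-2.0, 1.0], [0.0, 3.0], [2.0, 5.0]])
activated = leaky_relu(batch)
normalized = batch_norm(batch)
```

Unlike a plain ReLU, the negative entry in the batch is kept (scaled down) rather than zeroed, which is the effect on negative inputs mentioned above.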

For better evaluation of the proposed model, we use k-fold cross-validation. Here, we consider k = 10. A single fold acts as the validation set, while the remaining nine folds are used for training. Finally, the results are averaged to give a single estimate. We use the Linear Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost) classifiers as the benchmarks (Corbett-Davies & Goel, 2018). For the evaluation, we use accuracy, precision, recall, F1-score, confusion matrix, and AUROC. For training the model, we use Categorical Cross-Entropy (CCE) as the loss function and Adaptive Moment Estimation (Adam) as the optimizer.
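The k-fold procedure described above can be sketched as follows. This is a generic NumPy illustration; the shuffling seed and the `fit_and_score` callback are hypothetical placeholders for training and scoring a model on the given indices.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs; each fold is the validation set once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

def cross_validate(fit_and_score, n_samples, k=10):
    """Average a user-supplied score over the k folds."""
    scores = [fit_and_score(tr, va) for tr, va in k_fold_indices(n_samples, k)]
    return float(np.mean(scores))

# Toy split of 25 sample indices into 10 folds.
splits = list(k_fold_indices(n_samples=25, k=10))
```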

First, we evaluated the proposed model without using an optimization block. The results of COVID-19 and H1N1 screening using different methods are shown in Table 4. The results show the superiority of the proposed model in the screening of COVID-19 versus H1N1 compared to the benchmarks. Also, the accuracies of the top five folds are shown in Figure 10. As can be seen, both auxiliary paths helped to increase the performance of the main path. The average result of the 10-fold cross-validation converges to an accuracy of 98.88%.

Based on the results in Table 4, the difference between the accuracy and precision of the models is trivial. But based on the measured sensitivity and specificity, the proposed model performs better than the others. Since the task in this research is the screening of COVID-19 and H1N1 patients, the model's ability to correctly detect positive patients (sensitivity) and negative patients (specificity) is more important. For better observation of the performed screening task, we show the normalized confusion matrix of the proposed model and the benchmarks in Figure 11. A key point of the proposed model's confusion matrix is the high sensitivity (recall) for both COVID-19 and H1N1. With such a high sensitivity, we may receive false alarms for COVID-19 or H1N1. However, the model can predict the positive COVID-19 and H1N1 cases with high confidence. The OPI perfectly detects the H1N1 cases and the cases that are neither COVID-19 nor H1N1, while the random forest algorithm has the best result for COVID-19 detection. For better observation of the result achieved in each class, detailed information about the precision, sensitivity, specificity, and AUROC of each class is shown in Table 5. Based on Table 5, the detection of H1N1 in patients with the proposed OPI is more accurate than that of COVID-19. The specificity achieved in the classes "Neither COVID-19 Nor H1N1" and "COVID-19" shows that the detection rate for true negative cases in the "COVID-19" class is higher than in the "Neither COVID-19 Nor H1N1" class. However, based on the achieved sensitivity, the detection rate for true positive cases in the "Neither COVID-19 Nor H1N1" class is much higher than in the "COVID-19" class.
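Per-class precision, sensitivity, and specificity of the kind discussed above can be derived from a multi-class confusion matrix in a one-vs-rest manner. The sketch below shows the arithmetic with made-up counts for three classes; the numbers are not the results of this study.

```python
import numpy as np

def per_class_metrics(cm):
    """Return precision, sensitivity and specificity for each class.

    cm[i, j] counts samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp      # predicted as the class but belonging elsewhere
    fn = cm.sum(axis=1) - tp      # belonging to the class but predicted elsewhere
    tn = cm.sum() - tp - fp - fn
    return {
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Made-up counts for three classes (COVID-19, H1N1, neither); not the paper's results.
toy_cm = [[90, 0, 10],
          [0, 50, 0],
          [5, 0, 45]]
metrics = per_class_metrics(toy_cm)
```

In this toy matrix the first class has a sensitivity of 0.90 (10 of its 100 cases are missed) and a specificity of 0.95 (5 of the 100 other cases are flagged as false alarms).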

To achieve an acceptable performance using a limited number of features, we used a PSO-based optimization. We measured the performance of the proposed model after applying the optimization technique. As the number of features increases, the accuracy of the classifier increases and the loss of the classifier decreases. However, we can achieve an acceptable performance even with a limited number of features. For this purpose, we conducted an experiment measuring the performance of the model using various subsets of the features. Figure 12 shows the best results achieved in this experiment. Using the best seven subsets of features, we calculated a significance score for each feature. Here, the significance score of a feature is the number of times the feature has been used in the best subsets. As shown in Figure 12, one of the most significant features for detecting COVID-19 is contact with other COVID-19 positive cases. 'Dry Cough' and 'Sore Throat' are also significant symptoms for the screening of COVID-19 versus H1N1. In addition, 'Diabetes' and 'Gastrointestinal' are the significant chronic diseases that can help the proposed model in COVID-19 versus H1N1 screening. However, using 'Dry Cough' and 'Breathing Problems' shows more promising results for reaching the highest accuracy.
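The significance score defined above is a simple count, which can be sketched as follows. The subsets listed here are hypothetical and only illustrate the counting; they are not the seven best subsets found in the experiment.

```python
from collections import Counter

def significance_scores(best_subsets):
    """Count how many of the best subsets contain each feature."""
    counts = Counter()
    for subset in best_subsets:
        counts.update(set(subset))   # a feature counts once per subset
    return counts

# Hypothetical subsets for illustration; not the subsets found in the paper.
toy_subsets = [
    ["Dry Cough", "Diabetes"],
    ["Dry Cough", "Breathing Problem", "Gastrointestinal"],
    ["Sore Throat", "Dry Cough"],
]
scores = significance_scores(toy_subsets)
ranking = [name for name, _ in scores.most_common()]
```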

Due to the lack of proper data for COVID-19 versus H1N1 screening in healthcare systems, we have proposed a semi-supervised GAN (SGAN) model in this research to tackle the issue. To evaluate the model, we randomly select less than 10% of the dataset (400 samples) and check whether the model can accomplish the screening task appropriately. We set an accuracy threshold of 99.2% to stop the training. This threshold helps us to address the convergence problem which usually arises during the GAN training procedure. After 3,396 iterations, the model's accuracy reached the threshold and the training procedure finished. Using the proposed SGAN model, an accuracy of 99.7% was achieved with only 400 samples. Without the SGAN, however, the accuracy of the proposed model could not reach 90% on the small dataset. 

To compare the proposed OPI with similar work on COVID-19 screening, we have compared the result of the proposed model with other research. A summary of this comparison is shown in Table 6. The comparison shows that the proposed OPI is superior to the other models. Also, cough is the most frequently reported symptom for the detection of COVID-19. RF with proper hyperparameter tuning has shown promising results for COVID-19 detection.

With the spread of COVID-19 at the end of 2019, the detection of COVID-19 cases all around the world has attracted the attention of researchers. H1N1 is a branch of the influenza family that has similar symptoms to COVID-19. The peak prevalence of the H1N1 virus has been observed from October to April. In this research, we proposed the Optimized Parallel Inception (OPI) model to screen COVID-19 versus H1N1. To evaluate the proposed model, we have built a dataset by merging two publicly available sets of COVID-19 and H1N1 data. We proposed a procedure that processes the raw dataset in four steps: cleaning data, preprocessing, encoding, and balancing. The proposed model shows an accuracy of 98.88% for the screening task. Unlike existing procedures, we proposed a non-invasive screening method using the symptoms, history of underlying disease, and social behaviour of each patient. The proposed procedure does not impose a cost on healthcare systems, decreases contact between positive cases and medical staff, and has no side effects.

Further investigation of the symptoms related to each virus was conducted. For COVID-19, the symptoms most related to a positive case are 'dry cough' and 'breathing problem'. The underlying diseases most related to positive COVID-19 are 'hypertension' and 'heart disease'. For the H1N1 virus, the most related symptoms are 'sore throat', 'fever', and 'myalgia'. The underlying diseases most related to positive H1N1 are 'asthma' and 'diabetes'. When both datasets are combined into the COV-H1N1 dataset, the experiments show that 'Diabetes' and 'Gastrointestinal' are the most significant chronic disease factors for screening COVID-19 patients from H1N1 patients. Also, using the 'dry cough' and 'breathing problems' symptoms has shown promising results. The proposed model is useful for developing an expert system that screens patients quickly and accurately and breaks the chain of coincident COVID-19 and H1N1 waves. For better observation, we used only 400 samples (less than 10% of the dataset) for the screening task using the proposed semi-supervised GAN (SGAN). Even with a lower number of instances, the SGAN successfully achieved 99.7% accuracy. So, we suggest the SGAN model for cases with insufficient H1N1 and COVID-19 samples. Based on Figure 12, combining OPI with PSO has shown that a small subset of features can be used for COVID-19 screening with an accuracy of 98%. So, the proposed OPI with PSO and SGAN can work with a small subset of features or instances and outperform similar models for COVID-19 and H1N1 screening. Although, based on Table 6, OPI shows superiority compared to other machine learning models, the novelty of this research is the development of an expert hybrid system (OPI with PSO) that uses an optimum number of features and reaches a better result compared to similar models. The OPI screens COVID-19 and H1N1 patients using their symptoms, chronic diseases, and social behavior. 
In case of asymmetric patients, patients without symptoms, we may still use the model with the chronic diseases and social behavior only as the input feature. However, a drop in the performance is expected.
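The idea of pairing a classifier with PSO to find a small feature subset can be sketched with a generic binary particle swarm search. This is not the authors' implementation: the feature names are taken from the discussion above, while `toy_fitness`, the swarm size, and the coefficients `w`, `c1`, and `c2` are hypothetical choices. In practice the fitness would be the validation accuracy of the trained screening model on the selected features; here a toy objective that rewards four features and penalizes subset size stands in for it.

```python
import math
import random

FEATURES = ["dry cough", "breathing problem", "diabetes", "gastrointestinal",
            "fever", "sore throat", "myalgia", "asthma"]
# Toy assumption: pretend only the first four features carry signal.
INFORMATIVE = {0, 1, 2, 3}


def toy_fitness(mask):
    """Stand-in for model accuracy: reward informative features, penalize size."""
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    return hits - 0.25 * sum(mask)


def binary_pso(fitness, n_features, n_particles=10, n_iters=30, seed=0,
               w=0.7, c1=1.5, c2=1.5):
    """Search for a 0/1 feature mask that maximizes fitness(mask)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
           for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia plus pulls toward personal and global bests.
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, v))  # clamp to keep sigmoid unsaturated
                # Sigmoid of the velocity gives the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


best_mask, best_score = binary_pso(toy_fitness, len(FEATURES))
selected = [name for name, bit in zip(FEATURES, best_mask) if bit]
```

The returned mask can also be restricted in advance, for example by fixing the symptom bits to zero, to mimic the asymptomatic setting in which only chronic diseases and social behavior are available as inputs.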

The fast screening of COVID-19 versus H1N1 is a challenging task that is essential in disease trend monitoring and pandemic management. With the rapid growth in the number of COVID-19 patients, we should equip our healthcare systems with expert systems to deal with this pandemic wisely. In this research, we presented the Optimized Parallel Inception (OPI) model as a high-performing machine learning model to screen COVID-19 and H1N1 cases. The proposed model is robust to missing values and makes precise predictions in the presence of imbalance and errors in the recorded attributes. The proposed model is state-of-the-art for the detection of COVID-19 cases, H1N1 cases, and cases that are neither COVID-19 nor H1N1. It achieved 99.6% accuracy in separating H1N1 cases from cases that are neither COVID-19 nor H1N1. Diabetes and gastrointestinal disease are the most significant chronic disease indicators of COVID-19 and H1N1. Among the symptoms, dry cough and breathing problems have shown the most effective screening results. In addition, a semi-supervised GAN (SGAN) model was presented to deal with the problem of insufficient data; it reached 99.2% accuracy. Compared to existing works on the detection of COVID-19, OPI has shown a 2.68% improvement in accuracy using an electronic healthcare dataset. The proposed models help healthcare providers during pandemics by enabling rapid screening and decreasing human interactions. With the emergence of new variants such as Omicron or any other contagious variant, future work will train the proposed model on more data. We will also explore how to add more diverse symptoms to the proposed screening systems.
