key: cord-0855883-9s60pc6g authors: Kalaycı, Mehmet; Ayyıldız, Hakan; Tuncer, Seda Arslan; Bozdag, Pinar Gundogan; Karlidag, Gulden Eser title: Can laboratory parameters be an alternative to CT and RT‐PCR in the diagnosis of COVID‐19? A machine learning approach date: 2022-01-22 journal: Int J Imaging Syst Technol DOI: 10.1002/ima.22705 sha: 4556a247b56db54160b2914d4e50369f6064aca6 doc_id: 855883 cord_uid: 9s60pc6g

In this study, a machine learning-based decision support system that uses routine laboratory parameters is proposed to increase diagnostic success in COVID-19. The main goals of the proposed method are to reduce the number of misdiagnoses made with RT-PCR and CT scans and to reduce the cost of testing. We retrospectively reviewed the files of patients who presented to the coronavirus outpatient clinic. The demographic, thoracic CT, and laboratory data of individuals without any symptoms of the disease whose RT-PCR test was either negative or positive were analyzed. To show the superiority of the decision support system based on laboratory parameters, CT images were also classified using hybrid CNN methods. COVID-19 was detected from CT images with an accuracy of 97.56% using the AlexNet-SVM hybrid method, whereas the proposed method using laboratory parameters classified COVID-19 with an accuracy of 97.86%.

Pneumonia findings on computed tomography (CT) are present in 97% of cases with a diagnosis confirmed by RT-PCR, and CT imaging has a high sensitivity for the diagnosis of COVID-19 [7]. RT-PCR analysis is required for a definitive diagnosis; however, false-negative RT-PCR results are observed at a rate of approximately 20%-30% [8].
The following factors may be related to false-negative test results in infected individuals: (1) poor quality of the swab sample, with very little patient material; (2) collection of the swab sample at a very early or very late stage of infection; (3) improper processing of the swab sample; and (4) technical reasons inherent to the test, such as PCR inhibition or virus mutation [8, 9]. In addition, the effect of SARS-CoV-2 on the lungs may not be detectable on CT scans performed during the early stages of the disease, so clinicians have great difficulty detecting the disease if the virus has not yet affected the lungs.

Recently, many investigators have started applying data mining to medical data for clinical decision-making and for learning personalized medical prediction models. These studies have suggested new treatments and alternative diagnostic methods, helping people live longer and healthier lives. Numerous studies have been initiated on artificial intelligence (AI) systems for the diagnosis of COVID-19 [10, 11]. Zali et al. reported that CT scans are safer, faster, and more reliable than RT-PCR for the diagnosis of COVID-19 in regions with a high prevalence of the disease [12]. Kayaaslan et al. investigated the effectiveness of a second RT-PCR test in patients with suspected COVID-19 whose first test was negative; they reported that the second test added very little to the first, was costly, and was a waste of labor and time [13]. Ai et al. showed that 59% of COVID-19 patients had positive RT-PCR results and 88% had positive thoracic CT scans [14]. Long et al. determined the sensitivity of thoracic CT and RT-PCR for the diagnosis of COVID-19 infection to be 97.2% and 83.3%, respectively, and emphasized that CT can be used as a test complementary to RT-PCR [15].
Consequently, CT images should be examined to identify suspected patients early and to help evaluate the progression of the disease, because thoracic CT is highly sensitive and, unlike RT-PCR, its results can be obtained quickly. However, it is often difficult to accurately assess the severity of COVID-19 on CT images because lung tissue affected by COVID-19 looks similar to tissue affected by other causes of pneumonia. Many convolutional neural network (CNN) models have been developed in the literature to make this distinction. Pathak et al. detected COVID-19 from CT images with an accuracy of 96.22% using deep transfer learning (DTL) [16]. Zhou et al. proposed an ensemble deep learning model (EDL-COVID) combining AlexNet, GoogleNet, and ResNet. The EDL-COVID classifier was evaluated in terms of accuracy, sensitivity, specificity, F-measure, and Matthews correlation coefficient; it was reported to be more successful than classical deep learning models and classified patients with COVID-19 with a mean accuracy of 99.05% [17]. Wang et al. created an AI system that automatically analyzes CT images and detects COVID-19 and pneumonia, and predicted that it would reduce the workload of physicians by approximately 30%-40%; in the classification of 1136 CT images, 723 of which were positive, it achieved a sensitivity of 0.974 and a specificity of 0.922 [18]. Polsinelli et al. proposed a light CNN based on the SqueezeNet model to efficiently differentiate COVID-19, pneumonia, and healthy CT images, achieving an accuracy of 85.03%.

Early diagnosis and treatment play a major role in the prevention and gradual reduction of the coronavirus pandemic. Many studies have investigated the laboratory parameters of COVID-19 patients; however, these studies mostly concern the prognosis and severity of COVID-19.
In this study, we aim to develop a machine learning-based decision support system that evaluates routine laboratory parameters, which are requested from almost every patient, to detect COVID-19. The RT-PCR test has low sensitivity and is time-consuming to perform. The diagnosis of COVID-19 from chest CT images is promising; however, CT scans are negative if the virus has not affected the lungs. Complementary tests are therefore needed to reduce the number of false-negative results and, in doubtful cases, to replace RT-PCR or CT scans. To demonstrate the accuracy of the proposed system, CT images of 220 individuals who underwent RT-PCR testing were classified with deep learning methods, and their laboratory parameters were classified with machine learning methods. Different methods were used because the two data types differ: images for CT and numerical test values for the laboratory data. The optimum classification method was chosen for each data type, and both data types belonged to the same individuals. The results of RT-PCR, CT, and the proposed method are discussed comparatively, and we conclude that the proposed method can serve as an alternative, complementary test to RT-PCR or CT.

The main contributions of this study are as follows:
• The use of laboratory parameters and machine learning algorithms in the diagnosis of COVID-19.
• An alternative test for the diagnosis of COVID-19 in doubtful cases.
• RT-PCR test results become available within 4-8 h, and CT scans are costly; the proposed method provides cheaper and more accurate results in a shorter time.
• A machine learning-based decision support system with a lower cost than existing COVID-19 diagnostic methods.
• Laboratory parameters were classified using 11 different machine learning methods, with a highest accuracy of 97.86%.
Approval for this study was obtained from the Ethics Committee of Firat University (2020/0729). We retrospectively reviewed the files of patients who presented to the coronavirus outpatient clinic of Elazig Fethi Sekin City Hospital between April and September 2020. Demographic, thoracic CT, and laboratory data (hemogram, urea, creatinine, protein, albumin, AST, ALT, LDH, D-dimer, and CRP) of individuals without any symptoms of the disease whose RT-PCR test was negative (100 controls) or positive (120 patients with COVID-19) were examined. Patients with malignancy or renal insufficiency were excluded. Thoracic CT scans were acquired on a Philips Ingenuity 128 CT scanner. The complete blood count was analyzed on a DXH-800, biochemical parameters on a Beckman AU-5800, immunoassay parameters on a DXI-800, D-dimer levels on a Radiometer IQT-90, and CRP levels on an Immage-800 device.

In this study, COVID-19 was classified with methods using both thoracic CT images and laboratory parameters. Figure 1 shows the structure of the study. CT and laboratory data belonging to the same individuals were used to compare the cost-effectiveness and performance of the proposed method. First, a classification was made using CT data: feature vectors were extracted from the last pooling layer of the pretrained CNN architectures AlexNet, ResNet50, and GoogleNet for the CT images of the 220 individuals, and these vectors were classified with SVM and K-NN. Figure 2 shows the AlexNet-SVM hybrid CNN model [19].

Figure 1. The proposed system

AlexNet is a convolutional neural network model developed by Krizhevsky et al. [20]. This architecture has a total of 25 layers (1 data, 5 convolutional, 7 ReLU, 2 normalization, 3 max-pooling, 3 fully connected, 2 dropout, 1 softmax, and 1 output layer). In its nonlinear parts, AlexNet uses the ReLU (Rectified Linear Unit) activation instead of the tanh and sigmoid functions.
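The benefit of ReLU over saturating activations can be seen from their derivatives. The minimal Python sketch below is only an illustration and is not part of the authors' MATLAB implementation: it evaluates the derivatives of ReLU, sigmoid, and tanh at a few inputs. For large positive inputs the sigmoid and tanh gradients shrink toward zero, which slows gradient-based training, whereas the ReLU gradient stays at 1.

```python
import math


def relu(x: float) -> float:
    """Rectified Linear Unit: max(0, x)."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    # ReLU derivative: 1 for positive inputs, 0 otherwise (0 at x = 0 by convention).
    return 1.0 if x > 0 else 0.0


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)); its maximum is 0.25, at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x: float) -> float:
    # d/dx tanh(x) = 1 - tanh(x)^2; its maximum is 1, at x = 0.
    return 1.0 - math.tanh(x) ** 2


if __name__ == "__main__":
    for x in (0.5, 5.0, 10.0):
        print(f"x={x:4.1f}  ReLU'={relu_grad(x):.4f}  "
              f"sigmoid'={sigmoid_grad(x):.2e}  tanh'={tanh_grad(x):.2e}")
```

At x = 10 the sigmoid derivative is below 1e-4 and the tanh derivative below 1e-8, while the ReLU derivative is still 1.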
A remarkable feature of AlexNet is that it reduces overfitting by using dropout. The building blocks of the ResNet50 model, which consists of 50 layers, are residual blocks. In a residual block, a shortcut connection adds the block's input to the output of its layers, so the layers only need to learn a residual correction; this allows much deeper networks to be trained faster and with better performance than simply stacking more layers [21]. GoogleNet is a complex architecture because of the inception modules in its structure [22]. Consisting of 144 layers, it is one of the first CNN architectures to move away from stacking convolution and pooling layers in a purely sequential structure. The model is also notable for its efficient memory and power usage: stacking many layers with many filters increases computational and memory cost and raises the risk of overfitting (memorization), and GoogleNet overcomes this with inception modules whose branches are connected in parallel.

The hybrid CNN models were built as follows:
Step 1. Obtaining CT images from COVID-19 (−) and COVID-19 (+) individuals.
Step 2. Obtaining the feature vectors in the last pooling layer of the pretrained models.
Step 3. Using the classical machine learning SVM algorithm instead of the classification layer of the pretrained models.
Step 4. Training the CNN model based on optimized hyper-parameters.
Step 5. Applying k-fold cross validation to prevent overfitting.
Step 6. Evaluating the results.

In Step 1, CT images were obtained from the patients. In Step 2, a feature vector was obtained for each image using the pretrained models, which apply a series of convolution, normalization, and pooling operations to each image.
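Steps 2 and 3 replace the softmax classification layer of a pretrained network with a separate classifier trained on the feature vectors from its last pooling layer. The Python sketch below illustrates only that idea and is not the authors' MATLAB code. Two simplifications are assumptions: the CNN forward pass is replaced by a hypothetical `toy_image` helper that returns small hand-made activation maps, and `global_average_pool` stands in for the last pooling layer (ResNet50 and GoogleNet end with global average pooling, whereas AlexNet's last pooling layer is a max-pooling layer). The SVM is a minimal linear SVM trained by subgradient descent on the L2-regularized hinge loss.

```python
from typing import List, Sequence

# One channel of a CNN activation map: H x W values.
FeatureMap = List[List[float]]


def global_average_pool(channels: Sequence[FeatureMap]) -> List[float]:
    """Average each channel over its spatial positions -> one value per channel."""
    vector = []
    for channel in channels:
        values = [v for row in channel for v in row]
        vector.append(sum(values) / len(values))
    return vector


class LinearSVM:
    """Minimal linear SVM: subgradient descent on the L2-regularized hinge loss."""

    def __init__(self, lam: float = 0.01, lr: float = 0.1, epochs: int = 200):
        self.lam = lam        # regularization strength
        self.lr = lr          # fixed learning rate
        self.epochs = epochs  # passes over the training set
        self.w: List[float] = []
        self.b = 0.0

    def decision(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "LinearSVM":
        """Train on feature vectors X with labels y in {+1, -1}."""
        self.w = [0.0] * len(X[0])
        self.b = 0.0
        for _ in range(self.epochs):
            for xi, yi in zip(X, y):
                margin = yi * self.decision(xi)
                # The regularization term shrinks w at every step.
                self.w = [wj * (1.0 - self.lr * self.lam) for wj in self.w]
                if margin < 1.0:
                    # Hinge loss is active: move the boundary toward classifying xi correctly.
                    self.w = [wj + self.lr * yi * xj for wj, xj in zip(self.w, xi)]
                    self.b += self.lr * yi
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        return [1 if self.decision(x) >= 0.0 else -1 for x in X]


def toy_image(a: float, b: float) -> List[FeatureMap]:
    """Hypothetical 2-channel, 2x2 activation maps standing in for real CNN output."""
    return [[[a, a], [a, a]], [[b, b], [b, b]]]


if __name__ == "__main__":
    images = [toy_image(3, 0), toy_image(2.5, 0.5), toy_image(0, 3), toy_image(0.5, 2)]
    labels = [1, 1, -1, -1]  # +1 = COVID-19 (+), -1 = COVID-19 (-)
    features = [global_average_pool(img) for img in images]  # Step 2
    svm = LinearSVM().fit(features, labels)                   # Step 3
    print(svm.predict(features))
```

Because the pretrained network acts only as a fixed feature extractor in this sketch, swapping `LinearSVM` for another classifier (for example k-NN, as in the AlexNet-KNN model) changes only Step 3.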
In the original pretrained models, the feature vectors obtained in Step 2 are classified by the softmax layer; in Step 3 of the proposed model, the feature vectors in the last pooling layer of each CNN were instead classified with SVM. In Step 4, the CNN models were trained on the training data set based on optimized hyper-parameters. In Step 5, five-fold cross validation was applied to overcome overfitting and to keep errors under control. The data set was divided into training (60%), validation (20%), and test (20%) data, which made the classification results more reliable and allowed their accuracy to be checked. In the last step, the classification results were obtained and evaluated under the supervision of a physician.

Figure 2. Proposed hybrid convolutional neural network (CNN) model

Because the two data types were different, the proposed methods were also different: machine learning methods were appropriate for the laboratory data, which consist of numerical parameters. COVID-19 patients were therefore evaluated with this more rapid and cost-effective machine learning-based model instead of RT-PCR and CT scans. Figure 3 shows the model that classifies COVID-19 patients using laboratory parameters. First, laboratory test results were obtained from the patients and stored in a database. Fifteen laboratory parameters (CRP, D-dimer, urea, creatinine, protein, albumin, AST, ALT, LDH, WBC, platelets, neutrophils, lymphocytes, monocytes, and eosinophils) were used for the classification. After the data were normalized, each record in the database was input into the SVM, ANN, K-NN, and deep transfer learning (DTL) classification algorithms. Five-fold cross validation was applied to prevent overfitting. The models were trained on 60% of this data set, and their accuracy was determined on the remaining 20% validation and 20% test data.
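The laboratory-parameter pipeline (normalization, five-fold cross validation, classification) can be sketched in Python as below. This is a hedged illustration, not the authors' implementation, and several choices are assumptions: the text does not say which normalization was used, so min-max scaling is assumed; k-NN with k = 3 stands in for the four classifiers; folds are assigned by record index modulo 5 rather than by the 60/20/20 split described above; and the records are hypothetical two-parameter values, not study data.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Vector = List[float]


def min_max_fit(X: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
    """Per-parameter minimum and maximum, learned from training records only."""
    columns = list(zip(*X))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_apply(X: Sequence[Sequence[float]], lo: Vector, hi: Vector) -> List[Vector]:
    """Rescale each parameter with the training range (constant columns map to 0)."""
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in X
    ]


def k_fold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Assign record i to fold i % k; each fold serves once as the held-out set."""
    return [
        ([i for i in range(n) if i % k != j], [i for i in range(n) if i % k == j])
        for j in range(k)
    ]


def knn_predict(train_X: Sequence[Vector], train_y: Sequence[int], x: Vector, k: int = 3) -> int:
    """Majority vote among the k nearest training records (Euclidean distance)."""
    nearest = sorted(zip((math.dist(x, t) for t in train_X), train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_validate(X, y, folds: int = 5, k: int = 3) -> List[float]:
    """Accuracy of k-NN on each held-out fold, normalizing inside every fold."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), folds):
        lo, hi = min_max_fit([X[i] for i in train_idx])
        train_X = min_max_apply([X[i] for i in train_idx], lo, hi)
        test_X = min_max_apply([X[i] for i in test_idx], lo, hi)
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, x, k) == y[i] for x, i in zip(test_X, test_idx)
        )
        scores.append(correct / len(test_idx))
    return scores


if __name__ == "__main__":
    # Hypothetical records (CRP in mg/L, lymphocytes in 10^9/L); not study data.
    X = [[2.0, 2.5], [1.5, 2.2], [3.0, 2.8], [2.5, 2.0], [1.0, 2.4],
         [40.0, 0.8], [55.0, 0.9], [35.0, 1.0], [60.0, 0.7], [48.0, 0.6]]
    y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # 0 = control, 1 = COVID-19
    scores = cross_validate(X, y)
    print(scores, sum(scores) / len(scores))
```

Fitting the minimum and maximum on the training folds only keeps information from the held-out records out of the normalization step.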
The proposed model was implemented in MATLAB on a PC with an Intel Core i7-9750H processor, 8 GB of RAM, and a GeForce GTX 1050 GPU. The performance criteria [23] were sensitivity (Se), specificity (Sp), F-score (F-Scr), and accuracy (Acc), calculated from the confusion matrix obtained in the experiments. True positive (TP), false positive (FP), true negative (TN), and false negative (FN) values were used to calculate these measures. Figure 4 shows the confusion matrix and the basic parameters [23]. Accuracy is not sufficient by itself, especially for unbalanced data, since it is the ratio of correctly classified samples to the total number of samples. Sensitivity measures how often a test gives a correct positive result for people who have the condition being tested, whereas specificity measures how often it gives a correct negative result for people who do not have the condition; the two should therefore be evaluated together. High specificity (TN) helps prevent misdiagnoses and unnecessary interventions, whereas high sensitivity (TP) is essential, especially in cases of uncertain diagnosis or early disease. The F1 score is the harmonic mean of precision and sensitivity; unlike the arithmetic mean, the harmonic mean is pulled toward the smaller of the two values, so extreme cases are not hidden, and the F1 score is therefore also included in the evaluation metrics.

The data set comprised data of 220 individuals: 100 healthy individuals with routine laboratory parameters and 120 individuals diagnosed with COVID-19. In this section, the classification results obtained from the CT and laboratory data of these 220 subjects are presented comparatively. Because the subjects are the same, this comparison gives an objective answer to the question of which data type yields better diagnostic performance at a lower cost. The variables have different value ranges; the demographic and biochemical data of the data set are given in Table 1.
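The performance criteria above follow from the four confusion-matrix counts. Assuming the standard definitions (the formulas of Figure 4 are not reproduced in this text): Se = TP/(TP + FN), Sp = TN/(TN + FP), Acc = (TP + TN)/(TP + FP + TN + FN), Precision = TP/(TP + FP), and F1 = 2·Precision·Se/(Precision + Se). The Python sketch below computes them; the counts in the example are hypothetical and are not the study's results. The second example shows why accuracy alone is misleading on unbalanced data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # COVID-19 patients classified as positive
    fp: int  # controls classified as positive
    tn: int  # controls classified as negative
    fn: int  # COVID-19 patients classified as negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def f1(self) -> float:
        # Harmonic mean of precision and sensitivity (recall).
        p, r = self.precision(), self.sensitivity()
        return 2 * p * r / (p + r)


if __name__ == "__main__":
    # Hypothetical counts for 120 patients and 100 controls (not the study's results).
    cm = ConfusionMatrix(tp=118, fp=3, tn=97, fn=2)
    print(f"Se={cm.sensitivity():.4f} Sp={cm.specificity():.4f} "
          f"Acc={cm.accuracy():.4f} F1={cm.f1():.4f}")

    # Unbalanced example: labelling everyone positive gives high accuracy, zero specificity.
    everyone_positive = ConfusionMatrix(tp=95, fp=5, tn=0, fn=0)
    print(f"Acc={everyone_positive.accuracy():.2f} Sp={everyone_positive.specificity():.2f}")
```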
To demonstrate the superiority of the proposed method, classification was first performed using CT data. Five-fold cross validation was performed in all experiments. The CT classification results of AlexNet, GoogleNet, ResNet50, and the hybrid models created by combining these networks with SVM and K-NN are shown in Table 2.

Figure 3. Proposed model using laboratory parameters

Table 2 shows the results of the hybrid CNN models obtained using the CT data of 220 individuals. The models AlexNet-SVM, AlexNet-KNN, ResNet50-KNN, ResNet50-SVM, GoogleNet-SVM, and GoogleNet-KNN were used for classification. AlexNet-KNN achieved an accuracy of 98.06%, a sensitivity of 98.65%, and a specificity of 97.37%, the highest accuracy obtained with CT data. The highest precision was achieved with the AlexNet-SVM model and the highest specificity with the AlexNet-KNN model.

Second, the laboratory parameters of the same patients were analyzed and classified with the decision support system. The classifiers ANN, SVM, KNN, and DTL were applied to the data set containing 15 laboratory parameters (CRP, D-dimer, urea, creatinine, protein, albumin, AST, ALT, LDH, WBC, platelets, neutrophils, lymphocytes, monocytes, and eosinophils), and ANN achieved an accuracy of 97.86%, a sensitivity of 97.36%, and a specificity of 97.04%. As Table 3 shows, the highest accuracy, sensitivity, and specificity values were achieved with ANN. To check the consistency of these results, five-fold cross validation was applied to ANN, and the values obtained in each fold are shown in Figure 5.

Although RT-PCR and CT appear to be complementary tests, their high misdiagnosis rates, their cost, and the difficulty of distinguishing COVID-19 from pneumonia due to other causes suggest that new tests should be developed.
In this study, COVID-19 was detected with an accuracy of 97.56% from thoracic CT images using the AlexNet-SVM hybrid method and with an accuracy of 97.86% from laboratory parameters using the proposed method. Using laboratory parameters, accurate results were obtained at a lower cost and in a shorter time. We therefore conclude that the proposed method is an alternative, complementary test to RT-PCR and CT imaging.

Various biochemical changes have been reported in COVID-19 patients. In particular, lymphopenia is one of the most common complete blood count findings at the beginning of the infection [25]. Studies have reported that increased WBC, CRP, IL-6, procalcitonin, ferritin, LDH, AST, ALT, D-dimer, and troponin levels and decreased lymphocyte and platelet levels are associated with the severity of COVID-19 [4, 26-29]. This study has also shown that laboratory data contribute significantly to the diagnosis, prognosis, and treatment follow-up of COVID-19. The most important feature distinguishing this study from others in the literature is that a diagnosis made with RT-PCR can be supported, more cost-effectively and rapidly, by machine learning methods using biochemistry parameters. Previous studies using various machine learning systems diagnosed COVID-19 from CT data but did not analyze laboratory data from the same patients or compare the diagnoses obtained from the two data types. In this study, the classifiers ANN, SVM, KNN, and DTL were used in the method proposed as an alternative to those in the literature.
Table 3. Classification results obtained with the decision support system using laboratory parameters

Since system parameters are problem-dependent in models such as ANN, it is not known in advance which values of parameters such as the number of layers of the MLP network, the number of neurons in the hidden layers, and the learning coefficient will give an optimal result. These parameters were therefore determined by trial and error so as to give the best performance. For this reason, the comparison cannot establish that one classifier is better in general; it can only indicate that an algorithm is more suitable for a specific problem [30]. The relatively large difference between sensitivity and specificity for the decision tree, KNN, and SVM compared with ANN indicates weak generalization performance, that is, weak classification. The high sensitivity and specificity of ANN indicate good generalization performance and qualify this classifier as a reliable diagnostic tool. The accuracy rates of SVM, KNN, and the decision tree are also lower than that of ANN. ANN is therefore a more suitable classifier than the others for diagnosing COVID-19 from laboratory parameters (Table 3). Because factors such as limited data can weaken the generalization of the classifiers, five-fold cross validation was performed to increase the validity of the classifier (Figure 5). The low variation across folds further strengthens the assumption of linearity of the data set. Data augmentation methods were not applied, mainly to show that COVID-19 can be detected with the original laboratory parameters. The classification made with the hybrid deep learning structure and the classification made with biochemistry parameters reach very similar accuracy levels.
However, classification by biochemistry parameters was concluded to be much more advantageous, considering the harmful effects of CT radiation on human health. This result shows that a machine learning method built on biochemistry laboratory data can improve the sensitivity and specificity of other examinations (CT and RT-PCR) in the diagnosis of COVID-19. In addition, given the probability of FN RT-PCR results and the relatively long duration of the test, such a method can allow people suspected of having COVID-19 to be isolated quickly, greatly reducing the risk of transmission.

It is crucial to diagnose the disease quickly in order to reduce COVID-19-related mortality and increase recovery rates. In this study, we investigated whether COVID-19 could be diagnosed using biochemistry parameters as an alternative to CT scans, and we evaluated and compared the results with RT-PCR and CT data. The comparison revealed that ANN was more effective than other classifiers, such as SVM, KNN, and DTL, in diagnosing COVID-19 from laboratory parameters. The proposed method can be used as a decision support system for the diagnosis of COVID-19 and can help clinicians as an alternative solution with a different perspective. We therefore suggest that, for suspected patients with negative RT-PCR test results, COVID-19 can be diagnosed by analyzing laboratory parameters instead of CT scans. We believe that our method can aid early diagnosis, thereby providing an opportunity to start treatment quickly during the COVID-19 pandemic. We also expect it, with the help of AI, to reduce the workload of healthcare workers and the cost of healthcare provision compared with other methods.
[1] Coronaviruses and the human airway: a universal system for virus-host interaction studies.
[2] A new coronavirus associated with human respiratory disease in China.
[3] Clinical characteristics of coronavirus disease 2019 in China.
[4] Laboratory abnormalities in patients with COVID-2019 infection.
[5] The use of laboratory parameters and computed tomography score to determine intensive care unit requirement in COVID-19.
[6] … six signatories.
[7] A role for CT in COVID-19? What data really tell us so far.
[8] False negative tests for SARS-CoV-2 infection - challenges and implications.
[9] Reducing false negative PCR test for COVID-19.
[10] The potential for artificial intelligence in healthcare.
[11] Determination of the effect of red blood cell parameters in the discrimination of iron deficiency anemia and beta thalassemia via Neighborhood Component Analysis Feature Selection-Based machine learning.
[12] Correlation between low-dose chest computed tomography and RT-PCR results for the diagnosis of COVID-19: a report of 27,824 cases in Tehran.
[13] The additional contribution of second nasopharyngeal PCR to COVID-19 diagnosis in patients with negative initial test. Infectious Diseases Now.
[14] Correlation of chest CT and RT-PCR testing for coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases.
[15] Diagnosis of the Coronavirus disease (COVID-19): rRT-PCR or CT?
[16] Deep transfer learning based classification model for COVID-19 disease.
[17] The ensemble deep learning model for novel COVID-19 on CT images.
[18] AI-assisted CT imaging analysis for COVID-19 screening: building and deploying a medical AI system.
[19] Classification of lymphocytes, monocytes, eosinophils, and neutrophils on white blood cells using hybrid AlexNet-GoogleNet-SVM.
[20] ImageNet classification with deep convolutional neural networks.
[21] Deep residual learning for image recognition.
[22] Going deeper with convolutions.
[23] Discrimination of β-thalassemia and iron deficiency anemia through extreme learning machine and regularized extreme learning machine based decision support system.
[24] Routine blood analysis greatly reduces the false-negative rate of RT-PCR testing for COVID-19.
[25] Biochemical biomarkers alterations in Coronavirus Disease 2019 (COVID-19).
[26] The critical role of laboratory medicine during coronavirus disease 2019 (COVID-19) and other viral outbreaks.
[27] Predictors of mortality for patients with COVID-19 pneumonia caused by SARS-CoV-2: a prospective cohort study.
[28] Biomarkers associated with COVID-19 disease progression.
[29] Clinical laboratory evaluation of COVID-19.
[30] On the comparison of classifiers' performance in emotion classification: critiques and suggestions.

We thank the staff of Fethi Sekin City Hospital who helped to collect the data used in this study.

The authors declare no conflicts of interest.

Conceptualization, investigation, data curation: Mehmet Kalaycı. Formal analysis, investigation, data curation, writing-original draft preparation: Hakan Ayyıldız. Conceptualization, methodology, software, testing, formal analysis, review and editing: Seda Arslan Tuncer. Data curation, investigation, original draft preparation: Pinar Gundogan Bozdag. Data curation, investigation: Gulden Eser Karlidag.
The datasets used for analysis during the current study are available from the corresponding author on reasonable request.

Seda Arslan Tuncer https://orcid.org/0000-0001-6472-8306