key: cord-0695155-mbwpcuth
authors: Çubukçu, Hikmet Can; Topcu, Deniz İlhan; Bayraktar, Nilüfer; Gülşen, Murat; Sarı, Nuran; Arslan, Ayşe Hande
title: Detection of COVID-19 by Machine Learning Using Routine Laboratory Tests
date: 2021-11-17
journal: Am J Clin Pathol
DOI: 10.1093/ajcp/aqab187
sha: aba7cfbd4c7b2e4903a6a365771985c59ddd7c6d
doc_id: 695155
cord_uid: mbwpcuth

OBJECTIVES: The present study aimed to develop a clinical decision support tool to assist coronavirus disease 2019 (COVID-19) diagnosis with machine learning (ML) models using routine laboratory test results.

METHODS: We developed ML models using laboratory data (n = 1,391) comprising six clinical chemistry (CC) results, 14 CBC parameter results, and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) real-time reverse transcription–polymerase chain reaction results as the gold standard. Four ML algorithms, random forest (RF), gradient boosting (XGBoost), support vector machine (SVM), and logistic regression, were used to build eight ML models, trained either on CBC parameters alone or on a combination of CC and CBC parameters. Performance was evaluated on the test data set and on an external validation data set from Brazil.

RESULTS: Accuracy across all models ranged from 74% to 91%. The RF model trained on CC and CBC analytes showed the best performance on the present study's data set (accuracy, 85.3%; sensitivity, 79.6%; specificity, 91.2%). The RF model trained on CBC parameters alone detected COVID-19 cases with 82.8% accuracy. The best performance on the external validation data set belonged to the SVM model trained on CC and CBC parameters (accuracy, 91.18%; sensitivity, 100%; specificity, 84.21%).

CONCLUSIONS: The ML models presented in this study can be used as clinical decision support tools to contribute to physicians' clinical judgment in COVID-19 diagnosis.

associated with increased mortality risk in patients with COVID-19.
Furthermore, higher neutrophil counts, D-dimer, prothrombin time, alanine aminotransferase (ALT), lactate dehydrogenase (LDH), total bilirubin, and high-sensitivity troponin I levels, together with lower lymphocyte counts and albumin levels, were detected in patients with COVID-19 who were hospitalized in an intensive care unit (ICU). 5 Prothrombin time, fibrin degradation products, and D-dimer were higher in patients who died of COVID-19 pneumonia than in survivors. 6 The recommended laboratory tests and their changes during COVID-19 were described in detail by the International Federation of Clinical Chemistry and Laboratory Medicine (IFCC) Taskforce on COVID-19. 7 Although the literature identified which analytes are valuable for diagnosis, monitoring, and prognosis, it did not quantify the overall contribution of these analytes to SARS-CoV-2 detection in terms of accuracy, sensitivity, and specificity. Real-time reverse transcription-polymerase chain reaction (rRT-PCR), which detects SARS-CoV-2 RNA, is the gold standard method. 8 Errors originating in the preanalytical phase, such as improper specimen handling and transportation, contamination, inadequate sample quality, the presence of PCR inhibitors, and misidentification, can lead to false-negative results. 9-11 Thus, a negative test result cannot exclude infection when clinical suspicion is strong. 12 Machine learning (ML) models using routine laboratory results can provide valuable tools to support clinical decisions in detecting COVID-19 cases. The present study aimed to detect SARS-CoV-2 cases with high accuracy, sensitivity, and specificity using ML models based on routine laboratory test results. The present study was approved by the Ministry of Health (form No. 2020-05-28T154351) and the Başkent University Institutional Review Board (project No. KA20-169). FIGURE 1 schematically shows the data collection process and ML model development.
The demographic information and laboratory results of patients with COVID-19 confirmed by rRT-PCR were obtained from the laboratory information system. Data from patients with COVID-19 were collected from April 2020 to November 2020. Only records with concomitant CC and CBC results obtained within 48 hours of rRT-PCR testing were selected. CC parameters comprised total bilirubin, ALT, aspartate aminotransferase (AST), LDH, creatinine, and C-reactive protein (CRP). CBC parameters included eosinophils, monocytes, lymphocytes, red cell distribution width, platelets, mean corpuscular hemoglobin, leukocytes, mean corpuscular volume, mean corpuscular hemoglobin concentration, neutrophils, hemoglobin, hematocrit, basophils, and RBCs. Patients with multiple PCR results were considered follow-up patients, and only their first rRT-PCR result was included in the study. In line with the Republic of Turkey Ministry of Health's decision, only symptomatic patients were tested for SARS-CoV-2 by rRT-PCR. In our hospital, patients with suspected COVID-19 are almost always admitted to the COVID-19 outpatient clinic; thus, only outpatient cases were included in the present study. To establish a PCR-negative group, patient data from January 2018 to November 2019 were retrieved from the laboratory information system. The following exclusion criteria were then applied to both the PCR-positive and PCR-negative groups: inpatient status, missing values for any of the required laboratory parameters, and age younger than 18 years or older than 65 years, as shown in FIGURE 1. The data came from a single center, Başkent University Ankara Hospital, a university hospital in Ankara, Turkey. The present study's data did not include any individuals vaccinated against COVID-19. For external validation of the ML models, we obtained a public data set from the Israelita Albert Einstein Hospital in São Paulo, Brazil. 13
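The record-selection rules above (first rRT-PCR per patient, CC and CBC results within 48 hours of the test, outpatients only, ages 18 to 65, no missing analytes) can be expressed as a small filter. The following is a minimal Python sketch under stated assumptions: the `Record` layout, field names, and analyte keys are hypothetical, and the 48-hour window is interpreted as an absolute time difference in either direction, which the text does not specify. It is not the authors' code, which is available in their GitHub repository.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Hypothetical analyte keys for the 6 CC and 14 CBC parameters listed in the text.
CC = ["total_bilirubin", "alt", "ast", "ldh", "creatinine", "crp"]
CBC = ["eosinophils", "monocytes", "lymphocytes", "rdw", "platelets", "mch",
       "leukocytes", "mcv", "mchc", "neutrophils", "hemoglobin", "hematocrit",
       "basophils", "rbc"]
WINDOW = timedelta(hours=48)  # assumption: applies before or after the PCR test


@dataclass
class Record:
    """One rRT-PCR test paired with its nearest CC/CBC draw (hypothetical layout)."""
    patient_id: str
    age: int
    outpatient: bool
    lab_time: datetime
    labs: Dict[str, Optional[float]]
    pcr_time: Optional[datetime] = None  # None for prepandemic controls


def passes_common_filters(r: Record, analytes: List[str]) -> bool:
    """Exclusions applied to both groups: inpatients, age outside 18-65, missing analytes."""
    return (r.outpatient
            and 18 <= r.age <= 65
            and all(r.labs.get(a) is not None for a in analytes))


def select_pcr_group(records: List[Record], analytes: List[str]) -> List[Record]:
    """Keep each patient's first rRT-PCR, then apply the 48-hour window and exclusions."""
    first: Dict[str, Record] = {}
    for r in sorted(records, key=lambda rec: rec.pcr_time):
        first.setdefault(r.patient_id, r)  # later tests are follow-ups and are dropped
    return [r for r in first.values()
            if abs(r.lab_time - r.pcr_time) <= WINDOW
            and passes_common_filters(r, analytes)]
```

Prepandemic controls have no rRT-PCR time, so in this sketch only `passes_common_filters` would apply to them; `select_pcr_group(records, CC + CBC)` would correspond to data set A and `select_pcr_group(records, CBC)` to data set B.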
The São Paulo data set (n = 5,644) included SARS-CoV-2 rRT-PCR results and routine laboratory results of patients admitted to that hospital from March 8, 2020, to April 3, 2020. The data set was divided into two subsets: patients with concomitant CC and CBC results (n = 34) and patients with CBC results only (n = 513). The CBC subset was imbalanced (75 positive and 438 negative SARS-CoV-2 results); therefore, we randomly sampled 75 of the patients with negative results. The final CBC data set comprised 150 patients (75 positive and 75 negative SARS-CoV-2 results), as shown in FIGURE 1. Serum AST, ALT, total bilirubin, creatinine, LDH, and CRP levels were measured on the Abbott Alinity c analyzer (Abbott Diagnostics) using the following methods: the enzymatic NADH method without pyridoxal-5′-phosphate for AST and ALT, the IFCC-recommended lactate-to-pyruvate forward reaction method for LDH, the diazo reaction for total bilirubin, the kinetic alkaline picrate method for creatinine, and the immunoturbidimetric method for CRP. CBC analyses were performed using Abbott CELL DYN Ruby hematology analyzers (Abbott Diagnostics). The rRT-PCR analysis was conducted on a Rotor-Gene Q rRT-PCR quantification system (Qiagen) with a Diagnovital HS SARS-CoV-2 rRT-PCR kit (A1 Life Sciences) that targets the ORF1ab and N genes. In our laboratory, internal quality control materials were run every 12 hours at two levels for CC analytes (Technopath Clinical Diagnostics) and every 8 hours at three levels for CBC parameters (Abbott Diagnostics). Our laboratory was also enrolled in a monthly external quality control program (Randox Quality Control) for all analytes. Total error values are given in the Supplementary Materials (all supplemental materials can be found at American Journal of Clinical Pathology online).
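Reducing the São Paulo CBC subset from 75 positives and 438 negatives to 75 + 75 is random undersampling of the majority class. A minimal sketch, assuming each record is a dict with a hypothetical boolean key `sars_cov_2_positive`; the seed is arbitrary because the study does not report one.

```python
import random
from typing import Dict, List


def undersample_negatives(records: List[Dict], label_key: str = "sars_cov_2_positive",
                          seed: int = 42) -> List[Dict]:
    """Random undersampling: keep all positives and an equal-sized random subset of negatives.

    `label_key` and `seed` are hypothetical; the study does not report its sampling seed.
    """
    positives = [r for r in records if r[label_key]]
    negatives = [r for r in records if not r[label_key]]
    rng = random.Random(seed)  # local generator so the draw is reproducible
    # random.sample draws without replacement and raises ValueError if negatives are too few
    sampled_negatives = rng.sample(negatives, k=len(positives))
    return positives + sampled_negatives


# Usage with the São Paulo CBC subset's class counts (75 positive, 438 negative)
subset = [{"id": i, "sars_cov_2_positive": i < 75} for i in range(513)]
balanced = undersample_negatives(subset)
print(len(balanced))  # 150
```

Undersampling discards most of the negative records; it balances the external set at the cost of using fewer controls, which is one reason the resulting external estimates rest on a small sample.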
Our laboratory followed the health quality standards set by the Republic of Turkey Ministry of Health. In this study, two data sets were formed from patient records: data set A contained both CC and CBC results, and data set B contained CBC results only (FIGURE 1). The laboratory results served as input variables, and rRT-PCR results served as the target variable. The Boruta feature selection method was applied to detect noninformative or redundant features. 14 The Boruta method created a shadow copy of each attribute, whose values were obtained by shuffling the original attribute's values across observations. The algorithm then trained a random forest (RF) model on the original and shadow attributes to evaluate the importance of each feature. The importance of each real feature, expressed as a z score, was compared with the maximum z score among the shadow attributes; over repeated iterations, the decision threshold was determined dynamically using a binomial distribution. Finally, features were classified into three classes, discard (red), speculative (blue), and keep (green), to identify important features. 14 All parameters were confirmed as important, as given in FIGURE 2. Four ML methods, RF, XGBoost, logistic regression, and support vector machine (SVM), were used to predict SARS-CoV-2 results. The RF classifier was constructed with 200 decision trees, entropy as the information gain criterion, and otherwise default parameters using the scikit-learn 0.24.1 package. 15, 16 The XGBoost classifier was built with 100 decision trees and otherwise default parameters based on the scikit-learn 0.24.1 package. 16, 17 The support vector classifier 18 and logistic regression models were implemented with default parameters using the scikit-learn 0.24.1 package. 16 Hyperparameter optimization was not performed for any model. In the first step of model construction, all data were split into training (80%) and test (20%) data sets. The data sets were then standardized by z score transformation.
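The Boruta procedure described above can be illustrated with a simplified Python sketch. It is not the Boruta package cited above: to stay short, it uses scikit-learn's impurity-based `feature_importances_` instead of z scores, compares each real feature with the best shadow feature in every iteration, and applies one-sided binomial tests to the hit counts without the multiple-comparison correction or the iterative removal of rejected features that the full algorithm performs. The RF settings reuse those reported for the RF classifier, which is an assumption; `n_iter` and `alpha` are illustrative values.

```python
import numpy as np
from scipy.stats import binom
from sklearn.ensemble import RandomForestClassifier


def boruta_sketch(X: np.ndarray, y: np.ndarray, n_iter: int = 20,
                  alpha: float = 0.05, seed: int = 0) -> np.ndarray:
    """Simplified Boruta-style screening; returns 'keep', 'speculative' or 'discard' per column."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for i in range(n_iter):
        # Shadow attributes: each column shuffled independently across observations (rows)
        shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(n_features)])
        rf = RandomForestClassifier(n_estimators=200, criterion="entropy", random_state=i)
        rf.fit(np.hstack([X, shadows]), y)
        importance = rf.feature_importances_
        # A "hit" means the real feature beat the best shadow feature in this iteration
        hits += importance[:n_features] > importance[n_features:].max()
    # Under the null hypothesis, hits ~ Binomial(n_iter, 0.5); test in both directions
    p_more = binom.sf(hits - 1, n_iter, 0.5)  # P(H >= hits)
    p_fewer = binom.cdf(hits, n_iter, 0.5)    # P(H <= hits)
    return np.where(p_more < alpha, "keep",
                    np.where(p_fewer < alpha, "discard", "speculative"))


# Synthetic example: only column 0 carries signal
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
decisions = boruta_sketch(X, y)
print(decisions)
```

On this synthetic input, column 0 should be labeled "keep", while the two noise columns should fall into "speculative" or "discard". In the study, every parameter reached the "keep" class.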
The training data set was used for model construction and for 10-fold cross-validation 19 to assess the proposed models' performance. Finally, model performance was evaluated independently on the test data set using accuracy, sensitivity, specificity, F score, receiver operating characteristic (ROC) curve analysis, and κ statistics. 20 For external validation, model performance was also assessed on the São Paulo data set using accuracy, sensitivity, specificity, F score, and ROC curve analysis. Data preprocessing, implementation of the ML models, and statistical analyses were conducted using Python 3.7.6 21 and R statistical software 3.6.0. 22 The Python code for each step of the present study is available in our GitHub repository (https://github.com/hikmetc/COVID-19-AI). The demographic features and laboratory measurements of the study population are summarized in TABLE 1. The PCR-positive and PCR-negative groups were matched by sex; however, the median age of the PCR-positive group was lower than that of the PCR-negative group (42 vs 48 years, P < .001). No statistically significant difference was detected for LDH (186 vs 187 IU/L), whereas the largest difference among laboratory parameters was in eosinophil count (0.04 vs 0.13 × 10³/μL; effect size, 0.62). Furthermore, the COVID-19-positive and COVID-19-negative groups differed in pulmonary and gastrointestinal symptoms (TABLE 1). The São Paulo data set did not include analyzer, sex, race, or clinical information. Its laboratory results were provided as standardized values, and patients' ages were given as quantiles; therefore, the exact laboratory values and ages were unknown. Positive and negative cases in the São Paulo data set were determined by rRT-PCR testing. See TABLE 2 for patients' admission information.
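The modeling workflow (80/20 split, z score standardization, 10-fold cross-validation on the training set, then evaluation on the held-out test set) can be sketched with scikit-learn. This is a sketch of the described steps under stated assumptions, not the authors' code: the split is stratified; the scaler sits inside a `Pipeline` so that standardization parameters are learned from training data only, which the text does not specify; scikit-learn's `GradientBoostingClassifier` with 100 trees stands in for XGBoost so that no extra package is needed; and the input data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_models(seed: int = 0) -> dict:
    """Model settings as described in the text; GradientBoosting stands in for XGBoost."""
    return {
        "RF": RandomForestClassifier(n_estimators=200, criterion="entropy", random_state=seed),
        "GB (XGBoost stand-in)": GradientBoostingClassifier(n_estimators=100, random_state=seed),
        "SVM": SVC(),  # default parameters
        "LR": LogisticRegression(),  # default parameters
    }


def evaluate(X, y, seed: int = 0) -> dict:
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)  # 80/20 split
    results = {}
    for name, model in build_models(seed).items():
        # z score standardization is refitted on each training fold inside the pipeline
        pipe = make_pipeline(StandardScaler(), model)
        cv_acc = cross_val_score(pipe, X_tr, y_tr, cv=10, scoring="accuracy").mean()
        pipe.fit(X_tr, y_tr)
        pred = pipe.predict(X_te)
        # Default SVC has no predict_proba, so use decision_function scores when available
        if hasattr(pipe, "decision_function"):
            score = pipe.decision_function(X_te)
        else:
            score = pipe.predict_proba(X_te)[:, 1]
        results[name] = {
            "cv_accuracy": cv_acc,
            "test_accuracy": accuracy_score(y_te, pred),
            "f1": f1_score(y_te, pred),
            "kappa": cohen_kappa_score(y_te, pred),
            "auc": roc_auc_score(y_te, score),
        }
    return results


# Synthetic stand-in for data set A (20 inputs); not the study's data
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
results = evaluate(X, y)
for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Fitting the scaler within each cross-validation fold keeps test-fold statistics out of the standardization step; standardizing the full data set before splitting would leak a small amount of information into the evaluation.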
The ML models' performance for COVID-19 detection on our data set is given in TABLE 3. Models trained on CBC and CC analytes performed better than models trained on CBC analytes alone. Moderate agreement (Cohen's κ, 0.6-0.7) was observed between all models' predictions and rRT-PCR results. On the present study's data set, the ML models' accuracy ranged from 80% to 85%, and all models had area under the curve (AUC) values above 0.8. In external validation, by contrast, accuracy ranged from 74% to 91%. Interestingly, the ML models' sensitivity was higher on the external validation data set than on the present study's data set; all ML models trained on CC and CBC results reached 100% sensitivity on the external validation data set, as shown in TABLE 4. The RF model trained on CC and CBC analytes showed the best performance on our data set (accuracy, 85.30%; specificity, 91.24%; sensitivity, 79.58%; positive predictive value, 90.40%; AUC, 0.925), as shown in TABLE 3 and illustrated in FIGURE 3. In external validation, however, the SVM model performed better than the other ML models, as given in TABLE 4. [...] test set accuracy, which demonstrated the models' generalizability. We performed external validation to assess our ML models' reproducibility and found that sensitivity values were high, as shown in TABLE 4. Several ML models using routine laboratory tests have been proposed to support COVID-19 diagnosis. Some were built using only CBC parameters, and others used extended routine laboratory parameters and other clinical features. Cabitza et al 23 reported that an RF model trained on CBC parameters could predict COVID-19 with 76% accuracy, 76% sensitivity, 82% specificity, and an AUC of 0.76.
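All of the metrics reported in TABLE 3 and TABLE 4 derive from a 2 × 2 confusion matrix against rRT-PCR. As a worked check, the RF model trained on CBC parameters is reported in the discussion to reach 90.67% sensitivity and 68% specificity on the balanced external CBC set of 150 patients (75 positive, 75 negative); these percentages imply TP = 68, FN = 7, TN = 51, and FP = 24. That matrix is back-calculated here from the reported percentages, not taken from the paper's tables. The sketch below computes accuracy, sensitivity, specificity, PPV, F score, and Cohen's κ from such a matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """2 x 2 confusion matrix with rRT-PCR as the reference standard."""
    tp: int
    fn: int
    tn: int
    fp: int

    @property
    def n(self) -> int:
        return self.tp + self.fn + self.tn + self.fp

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.n

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def f_score(self) -> float:
        # F1: harmonic mean of PPV (precision) and sensitivity (recall)
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)

    def kappa(self) -> float:
        # Cohen's kappa: observed agreement corrected for chance agreement
        p_obs = self.accuracy()
        pred_pos = (self.tp + self.fp) / self.n
        true_pos = (self.tp + self.fn) / self.n
        p_chance = pred_pos * true_pos + (1 - pred_pos) * (1 - true_pos)
        return (p_obs - p_chance) / (1 - p_chance)


# Back-calculated from the reported external-validation RF (CBC) results
cm = ConfusionMatrix(tp=68, fn=7, tn=51, fp=24)
print(f"accuracy {cm.accuracy():.2%}, PPV {cm.ppv():.2%}, kappa {cm.kappa():.2f}")
# accuracy 79.33%, PPV 73.91%, kappa 0.59
```

The computed accuracy (119/150) and PPV (68/92) match the reported 79.33% and 73.91%, which supports the back-calculated matrix. The κ of 44/75 is illustrative only, because κ for this model on the external set is not reported in the text.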
Another RF model trained on CBC parameters, proposed by Tschoellitsch et al, 24 showed 86% accuracy; however, its positive predictive value was only 20%. Our RF model trained on CBC parameters outperformed the previously reported models, showing 82.8% accuracy, 80.28% sensitivity, 85.4% specificity, and 85.07% positive predictive value (PPV) on the present study's data set and 79.33% accuracy, 90.67% sensitivity, 68% specificity, and 73.91% PPV on the external validation data set. In a study by Joshi et al, 25 the specificity and PPV of logistic regression models using CBC parameters and sex were lower than 50% and 30%, respectively. On the present study's data, the specificity and PPV of our logistic regression model trained on CBC parameters alone were 78.10% and 79.45%, respectively, as given in TABLE 3. The ML models of Schwab et al 26 were developed using 106 laboratory tests, demographics, and clinical parameters; in that study, the best AUC, sensitivity, and specificity values were 0.66, 75%, and 59%, respectively. Goodman-Meza et al 27 used a combination of seven ML models with age, sex, and a broad spectrum of laboratory tests to predict COVID-19 diagnosis. Although their model's AUC, sensitivity, and specificity were 0.91, 93%, and 63%, respectively, its PPV was only 29%. More balanced performance characteristics were reported by Yang et al. 28 Their XGBoost model, using 27 laboratory test parameters and three demographic features, showed 85% sensitivity, 81% specificity, and an AUC of 0.854. Likewise, our models reached balanced performance characteristics on both our data set and the external validation data set. Our ML models used 20 (CC and CBC) or 14 (CBC only) routine laboratory test parameters as input variables to predict a SARS-CoV-2 result. With only the 14 CBC parameters, our ML models achieved 79.9% to 82.8% accuracy (TABLE 3).
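Unlike sensitivity and specificity, PPV also depends on the proportion of positives in the evaluated population, so PPV values from studies with different case mixes are not directly comparable. A minimal sketch using Bayes' theorem with the external-validation sensitivity and specificity of the RF CBC model reported above; the prevalence values other than 50% are illustrative assumptions, not figures from any cited study.

```python
def ppv_from_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Reported external-validation RF (CBC) operating point: sensitivity 90.67%, specificity 68%
for prevalence in (0.50, 0.20, 0.05):  # 0.20 and 0.05 are illustrative
    print(f"prevalence {prevalence:.0%}: PPV {ppv_from_prevalence(0.9067, 0.68, prevalence):.1%}")
```

At 50% prevalence, which matches the balanced external CBC set, the function returns about 0.739, consistent with the reported 73.91% PPV; at lower prevalence, the same sensitivity and specificity yield a markedly lower PPV.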
Whereas the highest sensitivity of the ML models on the present study's data set was 81.67%, sensitivities ranged from 85.33% to 100% in the external validation, as shown in TABLE 4. Our data set comprised outpatients aged 18 to 65 years. By contrast, the external validation data set included outpatients as well as patients referred to the regular ward, semi-ICU, and ICU, as given in TABLE 2. Therefore, even though exact ages were not available, the external validation data set probably included more severe cases, given its admission characteristics, and these severe cases may have had more abnormal laboratory results than the present study's outpatients. Thus, ML models may detect true positives more reliably, with fewer false negatives, when the population includes more severe cases, which would improve sensitivity. The importance of input parameters in COVID-19 prediction differs among studies. Some studies reported leukocytes, 24 arterial lactic acid, 26 and LDH 23, 27, 28 as the most important parameters. However, consistent with the present study, Plante et al 29 identified eosinophils as the most important feature. Eosinophil counts are lower in patients with COVID-19, 30 and this decrease is related to disease severity. 31 Moreover, eosinophils have been shown to be recruited to lung tissue during severe viral respiratory infections. 32 In the present study, eosinophils were the most important feature, followed by CRP, as illustrated in the Boruta plot (FIGURE 2). A single routine laboratory test cannot accurately detect a COVID-19 case. 33 However, the combined use of routine laboratory tests with ML methods can contribute to more accurate COVID-19 diagnosis, as shown in the present study. The turnaround time for routine laboratory tests is shorter than that for rRT-PCR tests.
For example, in our laboratory the mean turnaround times for CBC, CC, and rRT-PCR tests were 22 minutes, 41 minutes, and 8 hours, respectively. Routine laboratory tests are also cheaper than rRT-PCR: the total reagent cost of the routine laboratory parameters in the present study was only about $1 per analysis, whereas the market reagent cost of rRT-PCR was $5. Therefore, we can infer that ML models using routine laboratory test results can serve as cheaper and quicker clinical decision support tools for COVID-19 diagnosis. Our ML models were initially developed and evaluated using laboratory results from a single university hospital. The first limitation of the study was that the PCR-positive and PCR-negative groups in our data set were not age matched, even though both comprised adults aged 18 to 65 years. In addition, the COVID-19-positive group had more pulmonary symptoms than the historical outpatient control group, as inferred from TABLE 1, whereas stomachaches and other symptoms were more prevalent in the control group (TABLE 1). Furthermore, the study patients' comorbidities were not available. Hence, we could not confirm that our PCR-positive and PCR-negative groups were comparable, which may have affected the initial performance evaluation. Second, whereas the PCR-negative group's data were gathered from January 2018 to November 2019, the PCR-positive group included patients admitted from April 2020 to November 2020; therefore, the PCR-positive group's data did not include the winter season, which may have confounded the study's outcome. Third, because the study relied on rRT-PCR results as the reference, it could not identify false-positive rRT-PCR results. In addition, our data set may have included predominantly mild to moderate cases because only outpatients aged 18 to 65 years were included. On the other hand, all patients in the PCR-positive group had at least one symptom at admission.
Thus, the present study's data did not include severe or asymptomatic patients with COVID-19. For these reasons, we externally evaluated our ML models' performance on the public data set from the Israelita Albert Einstein Hospital in São Paulo. The São Paulo data set also lacked detailed clinical and demographic characteristics, and exact laboratory values were unavailable because results were presented as standardized values for anonymity. However, the São Paulo data set included patients referred to the regular ward, semi-ICU, and ICU in addition to outpatients, so we could evaluate our ML models externally on a data set with more severe cases. Our ML models showed satisfactory performance on the external validation data set. Whereas the PCR-negative group in the present study's data set includes only true negatives, the external validation data set may contain false-negative results because it relies on rRT-PCR testing; hence, the external validation performance should be interpreted in light of this limitation. Furthermore, the ML models presented in this study still need to be validated in asymptomatic cases. The external validation data set used in the present study consisted of 34 patients with concomitant CC and CBC results and 150 patients with CBC results (TABLE 2). Therefore, external validation of the proposed models should be performed on larger data sets to ensure the reliability of the developed ML models. In addition, the models' performance could be improved by incorporating additional input variables, such as medical imaging, symptoms, physical examination findings, and vital signs, and by increasing the sample size. The absence of vaccinated subjects also limits the present study. Moreover, none of the recent SARS-CoV-2 variants, such as Beta and Delta, were in circulation when the study data were collected in Turkey. 34 The external validation data set has the same limitation.
Hence, ML models should be validated in vaccinated populations and against more recent SARS-CoV-2 variants. False-negative rRT-PCR results arising from preanalytical errors remain an important problem in the fight against the pandemic. 35 We selected our PCR-negative group from prepandemic data for training the ML models. Thus, the present study's data set was free of false-negative results arising from preanalytical errors such as inappropriate sampling, and such errors did not influence the development of the ML models. The ML models presented in this study can be used as clinical decision support tools to contribute to physicians' clinical judgment on COVID-19 or to prompt them to order repeat rRT-PCR testing when a preanalytical error is suspected.

References:
1. COVID-19: epidemiology, evolution, and cross-disciplinary perspectives
2. Clinical characteristics of 140 patients infected with SARS-CoV-2 in Wuhan
3. Procalcitonin in patients with severe coronavirus disease 2019 (COVID-19): a meta-analysis
4. Thrombocytopenia is associated with severe coronavirus disease 2019 (COVID-19) infections: a meta-analysis
5. Clinical features of patients infected with 2019 novel coronavirus in Wuhan
6. Abnormal coagulation parameters are associated with poor prognosis in patients with novel coronavirus pneumonia
7. IFCC Taskforce on COVID-19. IFCC interim guidelines on biochemical/hematological monitoring of COVID-19 patients
8. Laboratory testing for 2019 novel coronavirus (2019-nCoV) in suspected human cases: interim guidance
9. Diagnostic strategies for SARS-CoV-2 infection and interpretation of microbiological results
10. Guidelines for laboratory diagnosis of coronavirus disease 2019 (COVID-19) in Korea
11. Potential preanalytical and analytical vulnerabilities in the laboratory diagnosis of coronavirus disease 2019 (COVID-19)
12. The SARS-CoV-2 outbreak: diagnosis, infection prevention, and public perception
13. Diagnosis of COVID-19 and its clinical spectrum
14. Feature selection with the Boruta package
15. Random forests
16. Scikit-learn: machine learning in Python
17. Greedy function approximation: a gradient boosting machine
18. LIBSVM: a library for support vector machines
19. Estimating classification error rate: repeated cross-validation, repeated hold-out and bootstrap
20. Interrater reliability: the kappa statistic
21. Python 3 Reference Manual
22. R: a language and environment for statistical computing
23. Development, evaluation, and validation of machine learning models for COVID-19 detection based on routine blood tests
24. Machine learning prediction of SARS-CoV-2 polymerase chain reaction results with routine blood tests
25. A predictive tool for identification of SARS-CoV-2 PCR-negative emergency department patients using routine test results
26. Clinical predictive models for COVID-19: systematic study
27. A machine learning algorithm to increase COVID-19 inpatient diagnostic capacity
28. Routine laboratory blood tests predict SARS-CoV-2 infection using machine learning
29. Development and external validation of a machine learning tool to rule out COVID-19 among adults in the emergency department using routine blood tests: a large, multicenter, real-world study
30. Dysregulation of immune response in patients with coronavirus 2019 (COVID-19) in Wuhan, China
31. Hematologic, biochemical and immune biomarker abnormalities associated with severe illness and mortality in coronavirus disease 2019 (COVID-19): a meta-analysis
32. Activated mouse eosinophils protect against lethal respiratory virus infection
33. Routine laboratory testing to determine if a patient has COVID-19
34. A cross-sectional overview of SARS-CoV-2 genome variations in Turkey
35. False negative results and tolerance limits of SARS-CoV-2 laboratory tests

Acknowledgments: We thank Başkent University Faculty of Medicine Clinical Laboratory's personnel for their unseen endeavors in the COVID-19 pandemic.