key: cord-1026443-q51eoc3a
authors: Santos Castro, J. D.
title: Discrimination of SARS-Cov 2 and arboviruses (DENV, ZIKV and CHIKV) clinical features using machine learning techniques: a fast and inexpensive clinical screening for countries simultaneously affected by both diseases
date: 2021-02-01
journal: nan
DOI: 10.1101/2021.01.28.21250714
sha: 1c20e5485196fb13074144654d61f4df80466bde
doc_id: 1026443
cord_uid: q51eoc3a

SARS-CoV-2 (Covid-19) has spread rapidly throughout the world, especially in tropical countries already affected by outbreaks of arboviruses such as Dengue, Zika and Chikungunya, which may lead to the collapse of health systems in these locations. The present work therefore aims to develop a methodology that uses a machine learning algorithm (Support Vector Machine) to predict and discriminate between patients affected by Covid-19 and by arboviruses (DENV, ZIKV and CHIKV). Clinical data from 204 patients with either Covid-19 or an arbovirus infection, obtained from 23 scientific articles and 1 dataset, were used. The developed model correctly predicted 93.1% of Covid-19 cases and 82.1% of arbovirus cases, with an accuracy of 89.1% and an area under the ROC curve of 95.6%. It thus proved effective for the prediction and possible screening of these patients, especially those affected by Covid-19, which allows early isolation.

In December 2019, a series of pneumonia cases of unknown cause emerged among visitors to a wet market in the city of Wuhan, Hubei (China). Genetic sequencing of [...]

[...] while arboviruses can cause leukopenia, lymphohistiocytosis and mild lymphocytopenia [19] [20] [21] [22].

Therefore, based on the clinical data of patients affected by Covid-19 or arboviruses (DENV, ZIKV and CHIKV), this study aims to build a predictive model using machine learning algorithms, in order to screen and diagnose patients in the event of a possible syndemic state.
It is also intended to encourage the medical community to share clinical data on globally circulating diseases, in order to mitigate their impact.

[...] For clinical data related to arboviruses, the terms "Clinical features" OR "White blood cells count" OR "Haemogram" AND "Dengue" OR "Zika" OR "Chikungunya" were used. A dataset containing clinical data from Brazilian patients positive for SARS-CoV-2, made available by Hospital Israelita Albert Einstein [23], was also added to the data set. The papers were selected by analysing the title, abstract and body of the text, excluding works that reported only the average of the data obtained for a group of patients or that did not report the patients' blood count. In this way, 23 scientific papers were selected, all of them written in English.

This study was carried out in accordance with the Declaration of Helsinki and with the terms of local legislation.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. This version posted February 1, 2021. https://doi.org/10.1101/2021.01.28.21250714 doi: medRxiv preprint

Before the algorithm was applied, the data were normalized (centered by the mean) and then screened for outliers (abnormal data) using covariance assessment.

The input data for training the algorithm were the White Blood Cells count (WBC) and the Lymphocyte count, as available in the articles, and the classes used for the prediction were Covid-19 and Arboviruses. The algorithm used was an SVM with cost (C) = 1.50, an RBF kernel (g = 1.04) and the following optimization parameters: numerical tolerance = 0.005, with no iteration limit.
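The preprocessing and kernel settings described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the source does not say which covariance-based outlier method was used, so squared Mahalanobis distance with a chi-square cutoff is assumed here, and g is assumed to be the usual gamma in the kernel exp(-g * ||a - b||^2). The sample values and all function names are hypothetical.

```python
import math

def center_by_mean(rows):
    """Subtract each column's mean so every feature has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [tuple(v - m for v, m in zip(row, means)) for row in rows]

def covariance_2d(centered):
    """Sample covariance (s_xx, s_xy, s_yy) of mean-centered 2-feature rows."""
    n = len(centered)
    s_xx = sum(x * x for x, _ in centered) / (n - 1)
    s_xy = sum(x * y for x, y in centered) / (n - 1)
    s_yy = sum(y * y for _, y in centered) / (n - 1)
    return s_xx, s_xy, s_yy

def mahalanobis_sq(point, cov):
    """Squared Mahalanobis distance of a centered point from the origin."""
    s_xx, s_xy, s_yy = cov
    det = s_xx * s_yy - s_xy * s_xy
    x, y = point
    # Closed-form inverse of the 2x2 covariance matrix applied to (x, y).
    return (s_yy * x * x - 2.0 * s_xy * x * y + s_xx * y * y) / det

def flag_outliers(rows, threshold_sq=5.99):
    """Return indices whose squared distance exceeds the threshold.

    5.99 is roughly the 95% chi-square quantile for 2 degrees of freedom;
    the cutoff is an assumption, not a value reported in the source.
    """
    centered = center_by_mean(rows)
    cov = covariance_2d(centered)
    return [i for i, p in enumerate(centered)
            if mahalanobis_sq(p, cov) > threshold_sq]

def rbf_kernel(a, b, g=1.04):
    """RBF kernel exp(-g * ||a - b||^2), assuming g is the usual gamma."""
    return math.exp(-g * sum((p - q) ** 2 for p, q in zip(a, b)))

# Hypothetical (WBC, lymphocyte) counts in 10^9 cells/L; not from the study.
samples = [(5.2, 1.1), (6.8, 1.4), (4.1, 0.9), (7.5, 1.8), (3.9, 1.2), (6.1, 1.0)]
centered = center_by_mean(samples)
print(flag_outliers(samples))                # indices of flagged records
print(rbf_kernel(centered[0], centered[1]))  # similarity between two patients
```

The kernel value is 1 for identical points and decays toward 0 as two patients' blood counts diverge; a larger g makes that decay faster and the decision boundary more local.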
The model was trained on the data and validated by k-fold cross-validation with k = 20. This validation method consists of dividing the data set into k parts, using k-1 parts for modeling (training) and the remaining part for testing. Here the model was tested twenty times, each time with a different fold of the data set held out.

The idea behind this type of algorithm (SVM) is to map the data into a high-dimensional space in which hyperplanes separate the data into subgroups (clusters) according to their characteristics, allowing the classification problem to be more easily [...]

The results of a classification can be represented as a matrix called a confusion matrix: a square matrix (G x G, where G is the number of classes) whose rows and columns represent the experimental (actual) and predicted data, respectively [49].
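The validation scheme and the confusion matrix described above can be sketched as follows. This is an illustrative sketch only: a nearest-centroid rule stands in for the SVM so that the example needs nothing beyond the standard library, the fold layout (contiguous, unshuffled) is an assumption, and the toy feature values and all function names are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_centroids(X, y):
    """Stand-in classifier (not an SVM): store the mean vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict_centroid(model, row):
    """Predict the class whose centroid is closest to the row."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, model[c])))

def cross_validate(X, y, k, fit, predict):
    """Train on k-1 folds, test on the held-out fold, repeat for every fold."""
    preds = [None] * len(X)
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            preds[i] = predict(model, X[i])
    return preds

def confusion_matrix(actual, predicted, classes):
    """G x G matrix: rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Share of cases on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total

# Toy single-feature data with made-up values; classes alternate so that
# every contiguous fold contains both classes.
classes = ["Covid-19", "Arbovirus"]
X = [[1.0], [5.0], [1.2], [5.5], [0.8], [4.7], [1.1], [5.2]]
y = [classes[i % 2] for i in range(len(X))]

preds = cross_validate(X, y, 4, fit_centroids, predict_centroid)
cm = confusion_matrix(y, preds, classes)
print(cm, accuracy(cm))
```

With k = 20 and, for example, 204 records, `k_fold_indices(204, 20)` yields four folds of 11 records and sixteen folds of 10, so every record is tested exactly once. In practice the records would normally be shuffled or stratified by class before splitting.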
The matrix shows that the model was able to predict more than 90% of Covid-19 cases [...]

Misclassified cases total 10.93% of all cases and are presented in Table 3.
Monitoramento dos casos de arboviroses urbanas transmitidas pelo Aedes [...] Semanas Epidemiológicas 1 a 13
Doença pelo Coronavírus
Thrombocytopenia is associated with severe coronavirus disease 2019 (COVID-19) infections: A meta-analysis
Clinical, laboratory and virological data from suspected ZIKV patients in an endemic arbovirus area
Dengue and haemophagocytic lymphohistiocytosis
Zika Virus Infection after [...]
Chikungunya Disease: Infection-Associated Markers from the Acute to the Chronic Phase of Arbovirus-Induced Arthralgia
Diagnosis of COVID-19 and its clinical spectrum
Coronavirus in a pregnant woman with preterm delivery
A unique case of human Zika virus infection in association with severe liver injury and coagulation disorders
[...] Australia following a monkey bite in Indonesia
First imported case of Zika virus infection into Korea
Characteristics of Zika Virus Disease in Children: Clinical, Hematological, and Virological [...]
Zika Virus Infection in a Massachusetts Resident After Travel to [...]
[...] performance measures
[...] Onset Infection with SARS-CoV-2 in 33 Neonates Born to Mothers with COVID-19 in Wuhan, China
Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study
The SARS-CoV-2 outbreak: what we know
Pan American Health Organization (PAHO), Tool for the diagnosis and care of patients with suspected arboviral diseases
[...] Infected Patients: Lessons Learned From the Co-circulation of [...]
Zika virus infections, a review
Clinical and differential diagnosis: Dengue, chikungunya and Zika