key: cord-0779132-nosrzg2z
authors: Mohammadi, Farzaneh; Pourzamani, Hamidreza; Karimi, Hossein; Mohammadi, Maryam; Mohammadi, Mohammad; Ardalan, Nahid; Khoshravesh, Roya; Pooresmaeil, Hassan; Shahabi, Samaneh; Sabahi, Mostafa; Sadat miryonesi, Fatemeh; Najafi, Marzieh; Yavari, Zeynab; Mohammadi, Farideh; Teiri, Hakimeh; Jannati, Mahsa
title: Artificial neural network and logistic regression modelling to characterize COVID-19 infected patients in local areas of Iran
date: 2021-02-25
journal: Biomed J
DOI: 10.1016/j.bj.2021.02.006
sha: 82ebd402b7ab38583a3258b290886feffba87d7f
doc_id: 779132
cord_uid: nosrzg2z

OBJECTIVES: COVID-19 is an infectious disease that started spreading globally at the end of 2019. Because patient characteristics and symptoms differ between regions, this research compared COVID-19 patients in 6 provinces of Iran. In addition, multilayer perceptron (MLP) neural network and logistic regression (LR) models were applied to the diagnosis of COVID-19.
METHODS: A total of 1043 patients with suspected COVID-19 infection in Iran participated in this study. Twenty-nine characteristics, symptoms and underlying diseases were recorded for hospitalized patients. We then compared these data among confirmed cases. The data were also used to build the ANN and LR models to diagnose patients infected with COVID-19.
RESULTS: In 750 confirmed patients, the common symptoms were fever (>37.5°C), cough, shortness of breath, fatigue, chills and headache. The most common underlying diseases were hypertension, diabetes, chronic obstructive pulmonary disease and coronary heart disease. Finally, the accuracy of the ANN model in diagnosing COVID-19 infection was higher than that of the LR model.
CONCLUSIONS: The prevalent symptoms and underlying diseases of COVID-19 patients were similar in different provinces, but the incidence of symptoms differed significantly between provinces.
Also, the study demonstrated that the ANN and LR models perform well in diagnosing COVID-19 infection.

In February 2020, the first case of coronavirus was reported in Iran. According to the latest report from the World Health Organization (WHO), the number of coronavirus infections worldwide has exceeded 63,000,000, and the disease has caused more than 1,466,000 deaths. Of these, more than 948,749 confirmed infections and 47,874 deaths were in Iran (as of November 30, 2020). Along with SARS and MERS, COVID-19 is the third pathogenic coronavirus to emerge in humans over the past two decades [1].

The problem that makes the COVID-19 pandemic so complicated is that it is hard to know how the virus will affect any individual. Most people infected with COVID-19 present with few or mild symptoms, whereas others come to rely on a ventilator to breathe, and some die quickly. This makes it difficult to diagnose the disease based on clinical symptoms [2,3]. In the current situation, early diagnosis of coronavirus infection and timely treatment reduce its complications and spread [4]. Artificial intelligence and logistic regression have already been used to diagnose various diseases in many studies [5-7].

Therefore, this study had two main goals. First, we performed a statistical analysis and comparison of the characteristics, symptoms and underlying diseases of COVID-19 patients in 6 provinces of Iran and investigated whether they differ significantly. Second, the MLP neural network and logistic regression were used to predict binary responses in COVID-19 infection diagnosis. The ability of the two models was then compared using several performance parameters. Finally, external validation was performed to evaluate the generalizability of the newly developed diagnostic models.
[...] analyses were performed on non-missing data. SPSS 26 statistical software was used for the analysis, and a P-value < 0.05 was considered statistically significant.

Logistic regression is a statistical regression model for binary dependent variables such as infection or non-infection, disease or health, death or life [8,9] [...]

[...] layers, could classify vectors arbitrarily well, given enough neurons in its hidden layer. In this study, equations 1-3 were applied to determine the number of neurons in the hidden layer, where i, o, n_h, L and n are the number of input neurons, the number of output neurons, the number of hidden-layer neurons, the number of hidden layers and the number of datasets, respectively [11-13].

The ability and accuracy of the ANN and LR models, which are classifier models, were compared in predicting COVID-19-infected patients using the area under the receiver operating characteristic (ROC) curve. Other performance parameters were estimated using equations 4-6 [...] and negative, respectively [14].

In total, 750 of 1023 hospitalized patients were confirmed to have COVID-19 infection; these patients were selected from 12 hospitals in 6 provinces of Iran. The total data are summarized in [...] is a statistically significant difference between groups (P-value < 0.05). Pairwise comparison divided the provinces into two subgroups: Tehran, Kermanshah and Isfahan in the first group and the others in the second.

Statistical analysis of characteristics, symptoms and underlying diseases across the 6 provinces showed statistically significant differences between the groups (P-value < 0.01), except for cerebrovascular disease, which was generally the least common of the underlying diseases.
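The performance parameters used to compare the ANN and LR classifiers (area under the ROC curve, sensitivity, specificity and accuracy) are conventionally computed from predicted scores and from the four cells of a binary confusion matrix. The following is a minimal Python sketch of those standard definitions. It is an illustration, not the authors' code, and it assumes that equations 4-6 correspond to sensitivity, specificity and accuracy. The names `ConfusionCounts`, `sensitivity`, `specificity`, `accuracy` and `roc_auc`, and all example numbers, are hypothetical and are not results from the study.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (infected = positive class)."""
    tp: int  # infected patients predicted as infected
    tn: int  # non-infected patients predicted as non-infected
    fp: int  # non-infected patients predicted as infected
    fn: int  # infected patients predicted as non-infected


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of truly infected patients that the model detects."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of truly non-infected patients that the model clears."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all patients that the model classifies correctly."""
    return (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn)


def roc_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Area under the empirical ROC curve, computed as the probability that
    a random positive case scores higher than a random negative case
    (ties count as one half)."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts and scores for illustration only (NOT the study's data).
example = ConfusionCounts(tp=45, tn=40, fp=10, fn=5)
example_auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])

if __name__ == "__main__":
    print(f"sensitivity = {sensitivity(example):.2f}")
    print(f"specificity = {specificity(example):.2f}")
    print(f"accuracy    = {accuracy(example):.2f}")
    print(f"AUC         = {example_auc:.3f}")
```

The pairwise AUC computation is quadratic in the number of cases, which is acceptable for cohorts of this size; a rank-based formulation would be preferable for much larger datasets.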
In each province, the top 5 most common symptoms were different, as follows ( [...]

From the results of the statistical analysis in Table 1 and Figures 2 [...], it was concluded that the prevalent symptoms and underlying diseases of COVID-19 patients were similar in different provinces, but the incidence of symptoms differed significantly between provinces.

[...] performance. The ANN structure is shown in Fig. 4 [...] based on cross-entropy in the training, validation and test steps is visible in Fig. 4 [...] testing group of 153 patients. The confusion matrix for these data is shown in Fig. 6. Based on the mentioned parameters, the ANN model performed better than the LR model.

Prediction models tend to perform better on the data from which they were constructed than on new data. This highlights the importance of external validation. In this research, because internal validation is limited in its ability to establish the generalizability of diagnostic prediction models, external validation was performed [15,16]. For this purpose, data on 20 patients with suspected COVID-19 were collected from a hospital in Yazd province. The data of these patients were new to both diagnostic models. As Fig. 7 shows, the ANN model correctly classified all infected and non-infected patients (100%). The LR model also performed very well, misdiagnosing only one person: a non-infected patient was diagnosed as infected. The AUC, sensitivity, specificity and accuracy of the diagnostic models on the external validation data can be seen in [...]

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is a new strain of coronavirus that has not been previously identified in humans. Mortality of COVID-19 appears to be higher than that of influenza and lower than that of SARS and MERS [17].
This study investigated the characteristics, symptoms and underlying diseases of COVID-19 patients in 6 provinces of Iran and compared them to determine whether they differ significantly. Although epidemic prediction is essential for effective prevention and control of infectious diseases [7], it has so far been somewhat neglected in COVID-19 research. Hence, using data obtained from hospitalized patients with suspected COVID-19, the ANN and LR models were developed to distinguish COVID-19-infected from non-infected patients.

The age of patients ranged from 1 to 91 years, and about 17.0% of patients were over 65 years of age. There was no significant difference between males and females at the 0.05 level.

Based on this study in Iran, only about 20% of those admitted to hospitals due to [...] are hospitalized, and among them, approximately 8.5% are admitted to the ICU. An average mortality rate of 9.8% was calculated among hospitalized patients; therefore, the total mortality rate would be about 1.96% (20% × 9.8%). In this research, severe symptoms were significantly more frequent in older, obese and overweight patients than in other patients. Mortality rates were significantly higher in elderly patients over 65 years old [18]. The mean age of patients who died was 66.4±16.7 years (range 22 to 90 years). Also, patients with underlying heart disease might be at higher risk of severe infection and [...]

The mortality rate in Tehran and Isfahan, which are industrialized and more populous provinces, was higher than in the others. These provinces are often heavily affected by environmental problems such as air pollution, and pulmonary and heart diseases have a higher rate there [19].

[...] regression in the diagnosis of infectious disease [7]. However, no studies have compared the abilities of ANN and LR models to predict COVID-19 infection.
In this study, the ANN and LR models were applied to predict and diagnose COVID-19 infection. The ability of the models was then compared by AUC, sensitivity, specificity and accuracy in classifying infected (750) [...] in other studies (50%) both models show similar performance [22].

It should be noted that, among published articles that used mathematical and machine learning models to diagnose COVID-19 patients, either the amount of data was much smaller than in this study or, where the data were extensive, far fewer variables were evaluated. Xiong et al. investigated pseudo-likelihood-based logistic regression for estimating COVID-19 infection and case fatality rates by gender, race and age in California. Their model focused on gender, race and age, and did not include patients' symptoms. Their analysis indicates that, in California, males had higher infection and case fatality rates across age and race groups, elderly people infected with COVID-19 were at elevated risk of mortality, and LatinX and African Americans had higher infection rates than other race groups [15].

[...] 96.2% accuracy. They stated that the efficiency of the models can be improved by increasing the amount of data. [...] [23]. Patients' laboratory findings were introduced to the model. The total number of cases in this study was 279 (177 confirmed and 102 unconfirmed) [24]. In some other studies, mathematical and computational epidemiological models have been used to predict the number of COVID-19 cases and infection rates [25-27].

A strength of our study is that it makes full use of demographic and clinical data, which are convenient and easy to obtain, to build models that predict confirmed cases. Our models help detect COVID-19 more accurately, thus optimizing patient selection for appropriate treatment.
In addition, including data from more than a thousand people from different regions has greatly increased the accuracy of the model in detecting COVID-19. However, this study also has some limitations: some of the data used to determine whether participants were infected with COVID-19 were self-reported, and it was not possible to follow up some patients until they were discharged from the hospital.

The authors would like to thank all the officials, nurses and other people who helped us with this research project.

The epidemiology and pathogenesis of coronavirus disease (COVID-19) outbreak
Coronavirus Disease 2019 in China
Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China: Summary of a Report of 72 314 Cases From the Chinese Center for Disease Control and Prevention
A systematic review on the efficacy and safety of chloroquine for the treatment of COVID-19
Artificial intelligence in diagnosis of obstructive lung disease: Current status and future potential
Machine Learning techniques in breast cancer prognosis prediction: A primary evaluation
A Review of Epidemic Forecasting Using Artificial [...]
A weighted bootstrap approach to logistic regression modelling in identifying risk behaviours associated with sexual activity
The occurrence, fate, and distribution of natural and synthetic hormones in different types of wastewater treatment plants in Iran
Evaluation of Sampling and Cross-Validation Tuning Strategies for Regional-Scale Machine Learning Classification
Artificial neural networks as emulators of process-based models to analyse bathing water quality in estuaries
Design and Implementation of Artificial Neural Network System for [...]
[...] of Artificial Neural Network Models and Logistic Regression Models Predicting Survival of [...]
Neural network and logistic regression diagnostic prediction models for giant cell arteritis: Development and validation
Evaluation of the effects of AlkylPhenolic compounds on kinetic parameters in a moving bed biofilm reactor
COVID-19, SARS and MERS: are they closely related?
[...] Patients Infected with COVID-19: A Descriptive Study
Quality of life in cardiovascular patients in Iran and factors affecting it: a systematic review
Global Epidemiological Investigation of SARS-CoV-2 and SARS-CoV Diseases Using Meta-MUMS Tool Through Incidence, Mortality, and Recovery Rates
Al Moummar SH. Demographic, clinical, and outcomes of confirmed cases of Middle East Respiratory Syndrome coronavirus Kingdom of Saudi Arabia (KSA) A retrospective record based study
A comparison of logistic regression model and artificial neural networks in predicting of student's academic failure
Mohi Ud Din M. Machine learning based approaches for detecting COVID-19 using clinical text data
Detecting COVID-19 patients based on fuzzy inference engine and Deep Neural Network
Modeling and prediction of COVID-19 in Mexico applying mathematical and computational models
[...] Pandemic in India using Genetic Programming
A Machine Learning-Aided Global Diagnostic and Comparative Tool to Assess Effect of Quarantine Control in COVID-19 Spread