key: cord-0427224-4zqkrfqx authors: Garcia Islas, E. I.; De Anda Jauregui, G.; Salas Rodriguez, J.; Serrania Soto, F. title: Machine Learning in the analysis of lethality and evolution of infection by the SARS-CoV-2 virus (COVID-19) in workers of the Mexico City Metro date: 2021-11-05 journal: nan DOI: 10.1101/2021.10.27.21265573 sha: b6e57fb46f0efe522cde5a986a1036a36f099a2d doc_id: 427224 cord_uid: 4zqkrfqx
In terms of the number of fatalities, Mexico has been one of the countries most affected worldwide by the pandemic. Using different Machine Learning techniques, some of the first cases of infection registered in Mexico City (CDMX), the geographical and political center of the country, are analyzed in order to determine the causes of lethality and the evolution of infection by the SARS-CoV-2 virus among workers of the Mexico City Metro from April 1 to September 27, 2020. The Mexico City Metro is a mass public transport system of the metropolitan rail type, which serves [...]
In order to reach a better understanding of the origin, dispersion and evolution of the SARS-CoV-2 virus (COVID-19), which causes severe acute respiratory syndrome, some of the first cases of infection registered in the country are analyzed, in particular the one that occurred on February 6, 2020 among the workers of the Mexico City Metro (CDMX).
1 Collective Transportation System, Mexico City, Mexico: https://metro.cdmx.gob.mx/operacion/cifras-deoperacion.
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. This version posted November 5, 2021. medRxiv preprint doi: https://doi.org/10.1101/2021.10.27.21265573 NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.
Based on an analysis of databases from the health care clinics of the Mexico City Metro, the confirmed cases of SARS-CoV-2 infection during the period from April 1 to September 27, 2020 are analyzed. In this interval, a total of 4.15 consultations were observed, corresponding to the care of 511 workers diagnosed with the SARS-CoV-2 virus, that is, an average of 8 consultations per worker linked to symptoms of the disease or to associated comorbidities.
Of the 511 confirmed cases, 152 correspond to women between 21 and 70 years of age (μ = 44.6, σ = 9.3) and 359 to men between 19 and 73 years of age, while the population at risk due to advanced age (60 years and over) included only 19 individuals, that is, 3.7% of the workers analyzed. Of the patients diagnosed with COVID-19, 436 workers (85.32%) received outpatient treatment, while 75 workers (14.68%) required hospitalization: 53 men, representing 14.76% of the infected male population, and 22 women, representing 14.47% of the infected female population. The percentage of deaths was 1.34% for both sexes, and the average hospital stay, counted from the patient's admission to the health unit, was 17 days for women and 28 days for men. Patients who required hospitalization were referred to two clinics, the first concentrating 70.66% and the second 29.34% of cases.
According to the number of infected workers by department, administrative unit and level of command during the aforementioned period, 142 infected workers worked in the Transportation Directorate, 124 in the Fixed Facilities Management, 97 in the Rolling Stock Maintenance Directorate, 60 belonged to the
(Table 1).
Regarding the type of treatment received, patients were treated with antivirals and antiretrovirals such as oseltamivir, amantadine, dolutegravir, lopinavir/ritonavir, abacavir/lamivudine and emtricitabine/tenofovir, and with antiparasitics such as ivermectin, nitazoxanide, quinfamide and metronidazole, while secondary infections were treated with azithromycin, levofloxacin, trimethoprim + sulfamethoxazole and cefixime, and even with immunomodulators such as tocilizumab.
Of the total staff infected by the SARS-CoV-2 virus, 238 workers were identified as being at risk due to comorbidity associated with chronic noncommunicable diseases (NCDs). Among the 511 infected workers, 182 reported a history of respiratory diseases (35.61%), 73 hypertension (14.28%), 66 hypercholesterolemia and/or hypertriglyceridemia (12.91%), 59 diabetes (11.54%), 55 obesity (10.76%), 6 heart diseases (1.17%), 6 kidney diseases (1.17%), 3 liver diseases (0.58%), one immunosuppression (0.19%), and 6 tobacco use (1.17%). However, the disease generally evolved without complications in outpatients. Among hospitalized patients, 11 deaths occurred during the period analyzed.
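The comorbidity percentages above can be checked with a few lines of arithmetic. The sketch below assumes that each percentage is the count divided by the 511 confirmed cases and truncated (not rounded) to two decimals; that convention is inferred from the reported figures and is not stated by the authors. The names `truncated_percent` and `comorbidity_table` are illustrative.

```python
# Counts reported in the text; the denominator is the 511 confirmed cases.
TOTAL_CASES = 511
COMORBIDITY_COUNTS = {
    "respiratory diseases": 182,
    "hypertension": 73,
    "hypercholesterolemia and/or hypertriglyceridemia": 66,
    "diabetes": 59,
    "obesity": 55,
    "heart diseases": 6,
    "kidney diseases": 6,
    "liver diseases": 3,
    "immunosuppression": 1,
    "tobacco use": 6,
}


def truncated_percent(count: int, total: int) -> str:
    """Return count/total as a percentage truncated to two decimals.

    Integer arithmetic is used on purpose: the value is computed in
    hundredths of a percent, so no floating-point rounding is involved.
    """
    hundredths = count * 10_000 // total
    return f"{hundredths // 100}.{hundredths % 100:02d}"


def comorbidity_table(counts: dict, total: int) -> dict:
    """Map each comorbidity name to its truncated percentage of all cases."""
    return {name: truncated_percent(n, total) for name, n in counts.items()}


if __name__ == "__main__":
    for name, pct in comorbidity_table(COMORBIDITY_COUNTS, TOTAL_CASES).items():
        print(f"{name}: {pct}%")
```

Under this assumption every comorbidity percentage in the paragraph is reproduced exactly. Because one worker can present several comorbidities, the counts are not expected to add up to 238.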
When cases are grouped into 20-year age groups, the mortality rate (case fatality rate) increases progressively with age, from 9.70% for the 20 to 40 group and 16.50% for the 40 to 60 group, up to 36.8% for the 60 to 80 group. Figure 1 shows that the "over 60 years" group has the highest risk of death of all age groups. This observation is consistent with the data reported for the general population, both in Mexico and in other countries.
It should be noted that, in general, the curves described coincide with the epidemiological dynamics observed in Mexico City, as well as with those reported by federal authorities from the database of the National Epidemiological Surveillance System (SINAVE).
In order to determine the lethality and evolution of the disease, the data set described, $X = \{x_1, \cdots, x_n\}$, was analyzed. The objective of this analysis is to determine, regardless of the $m$ initial characteristics, which $m'$ of them are most relevant for discriminating between the outcomes of interest (recoveries versus deaths). To this end, classifiers were built to distinguish between the classes of interest. The feature selection process consists of choosing the subset of $m'$ most relevant traits among the $m$ existing in a dataset, with the aim of classifying an observation into one of the $k$ classes, improving prediction performance, and providing a better understanding of the underlying process that gave rise to the data.
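The idea of keeping a small, stable subset of the original traits can be sketched in a few lines. The example below is a stand-in only: it scores each trait by its absolute Pearson correlation with a binary outcome and repeats the selection on bootstrap resamples, keeping the traits selected every time. The correlation score, the threshold of 0.3, the bootstrap resampling and all function names are assumptions made for illustration, not the authors' procedure.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def select_features(rows, labels, threshold):
    """Return the indices of traits whose |correlation| with the label exceeds threshold."""
    m = len(rows[0])
    return {
        j for j in range(m)
        if abs(pearson([row[j] for row in rows], labels)) > threshold
    }


def stable_features(rows, labels, threshold=0.3, iterations=10, seed=0):
    """Keep only the traits selected in every bootstrap iteration.

    The sorted result acts as a mapping from selected to original traits:
    position i holds the original column index of the i-th retained trait.
    """
    rng = random.Random(seed)
    n = len(rows)
    kept = None
    for _ in range(iterations):
        sample = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        chosen = select_features(
            [rows[i] for i in sample], [labels[i] for i in sample], threshold
        )
        kept = chosen if kept is None else kept & chosen
    return sorted(kept)


if __name__ == "__main__":
    # Toy data: column 0 copies the outcome, column 1 is constant.
    labels = [0, 1] * 20
    rows = [[float(y), 1.0] for y in labels]
    print(stable_features(rows, labels))
```

Requiring agreement across all resamples is a deliberately strict rule: it trades recall of weakly informative traits for a shorter list that is less sensitive to the particular sample drawn.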
Among the methodologies and techniques that make the operation of the models described and the selection of traits more efficient are filter-based methods and wrapper methods. Wrapper methods use a classifier to evaluate different subsets of traits, taking the performance of the classifier on each subset as the evaluation criterion and keeping the best-performing subset. In this way the traits most suitable for the algorithm are obtained. Filter-based methods, on the other hand, look for statistical relationships between the traits and the target variable and use this score as the basis for selection.
The result is the set of traits $\{x_{g(1)}, \cdots, x_{g(m')}\}$ that consistently exceed the decision criterion in each iteration, for a function $g$ that maps the selected traits to the original ones. Using the $m'$ characteristics considered relevant by the selection process, machine learning classifiers $h$ were generated that assign the $m'$ characteristics of an example to its corresponding class. In the present analysis, the following classifiers were analyzed:
Random Forest. Classifier based on decision trees, in which the training samples are randomly chosen, as are the predictors used to perform each split.
Neural networks (NN). They are universal approximators, that is, they can approximate any function with an arbitrary level of precision.
Neural networks present a hierarchical learning model in which each layer projects the inputs into non-linear spaces, possibly of higher dimension. In the last layer, hyperplanes are expected to be able to discriminate between classes.
Support Vector Machines (SVM). In general, Support Vector Machines find an optimal hyperplane that maximizes class separation. In the case of classes that are not linearly separable, SVM projects the data into a nonlinear, higher-dimensional space through kernel functions, such as the dot product and nonlinear functions that meet Mercer's criterion, which serve as a measure of similarity.
Gradient boosting. It produces a predictive model in the form of an ensemble of decision trees. It builds the model in a stage-wise fashion, as other boosting methods do, and generalizes them by allowing the optimization of an arbitrary differentiable loss function.
For the present analysis, feature selection was executed ten times using Random Forest, and the traits that were considered relevant in every iteration were retained. On top of the aforementioned classifiers, a selection system based on neural networks was defined that takes as input the output of each classifier and generates the medical prognosis. The available data were divided into training, validation and test subsets. To evaluate performance, receiver operating characteristic (ROC) curves and precision-recall (PR) curves were used. It is important to note that the results are reported as the area under the curve (AUC), either ROC or PR, and that uncertainty is bounded by repeating the above procedure.
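To make the two evaluation metrics concrete, the following minimal sketch computes the area under the ROC curve and the area under the precision-recall curve (in its step-wise form, usually called average precision) for binary labels and classifier scores, and summarizes repeated runs by mean and standard deviation. It uses only the Python standard library; the function names and the choice of mean and standard deviation as the uncertainty summary are assumptions made for illustration, not the authors' implementation.

```python
import statistics


def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive scores higher
    than a randomly chosen negative, counting ties as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Examples are ranked by decreasing score; precision is recorded at each
    rank where a positive appears and averaged over all positives.
    Tied scores keep their input order, which is a simplification.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive example is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, precision_sum = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_pos


def summarize(values):
    """Mean and sample standard deviation of a metric over repeated runs."""
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    s = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(y, s), average_precision(y, s))
```

With a rare positive class, as is the case for deaths in this cohort, the PR curve is usually the more informative of the two, because its baseline is the prevalence of the positive class rather than 0.5.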
This allows us to affirm that the mortality rate associated with infection by the SARS-CoV-2 virus (case fatality rate) is affected by different risk factors related to traits, comorbidities and the pharmacological treatment received.
Lethality increases progressively with age. While the probability of contagion is the same for both sexes, the mortality rate in infected men is almost double that of the female population. Thus, if the patient is male and over 60 years of age, the risk of mortality increases with a preponderance of 100.31 and 115.2, respectively. Even though the role of sex in the prevalence of infection by the SARS-CoV-2 virus is unknown, progression towards severe clinical presentations of the disease is significantly more frequent in the male population. This allows us to suggest the existence of a factor of hormonal origin that increases the propensity of men to suffer severe effects of the disease and to present a greater risk of lethality.
On the other hand, the most important common symptoms related to infection by the SARS-CoV-2 virus are, in order of prevalence: fever, cephalalgia (headache), odynophagia (painful swallowing), general malaise. Likewise, patients who require endotracheal intubation have a higher risk of lethality than those receiving antiviral or antibiotic treatment, while the lowest risk is found in outpatients who are given an antipyretic in combination with any of the pharmacological treatments described above.
In relation to the problem addressed in the present study, researchers have analyzed cases of patients who survived the SARS-CoV-2 virus in order to discriminate between patients in general, those seriously ill and those deceased. In this context, Yan et al. (Yan, et al., 2020) present an ML method that infers clinical prognosis from three blood markers: lactic dehydrogenase (LDH, associated with tissue damage in pulmonary disorders), lymphocytes (a subtype of white blood cells that are part of the immune system) and high-sensitivity C-reactive protein (hs-CRP, whose blood levels rise during inflammation and infection). Additionally, researchers have developed AI techniques for the selection of risk factor characteristics and for medical prognosis. An ML algorithm has been implemented to predict the outcome of qRT-PCR tests and to determine whether a confirmed positive case will require hospitalization or intensive care.
In an international research effort that includes the United States and the European Union (Bertsimas et al., 2020), a method of mortality risk assessment was developed from XGB-based classifiers using clinical and laboratory predictors. The critical risk factors found included age, reduced oxygen saturation and high levels of hs-CRP, as well as creatinine, blood urea nitrogen and blood glucose.
In the present case study, a wrapper feature selector was employed and, unlike the studies cited, the role of regional data was emphasized. The CDMX Metro, a public transport system that carries 5,072,384 daily travelers in one of the largest cities in the world, offers a study scenario that can be used to understand the origin and expansion of the epidemic nationwide.
In particular, it may help explain why Mexico is one of the countries with the highest mortality from COVID-19 in the world. The sample population reflects an age group with relevant risk, while its figures for contagion, hospitalization and mortality coincide with those of the rest of the population. In this sense, since the risk factors associated with death and recovery coincide with those that apply to the general population, the prediction values produced by the algorithm can be transferred without risk of ambiguity to the population of the CDMX Metro. Therefore, the experimental results obtained through this study suggest that the population of the Metro can be considered a representative group for the origin, dispersion and evolution of the COVID-19 pandemic in Mexico City.
According to the classification algorithm, the lethality and evolution of the disease are affected by 39 risk factors related to traits, comorbidities and the treatment received. Lethality increases progressively with age, so it is suggested to evaluate the role of sex in the prevalence of [...] who require endotracheal intubation present a higher risk of lethality than those receiving antiviral or antibiotic treatment, while the lowest risk is found in outpatients who are given an antipyretic in combination with any of the pharmacological treatments described above.
As future work, the analysis of the correlation between the characteristics of the general population and their representativeness in the capital's Metro will continue, in order to support the definition of public policies aimed at reducing the adverse effects of the pandemic in Mexico City.
COVID-19 Mortality Risk Assessment: An International Multi-Center Study
Applications of Machine Learning and Artificial Intelligence for COVID-19 (SARS-CoV-2) Pandemic: A Review
Artificial Intelligence-enabled Rapid Diagnosis of Patients with COVID-19
COVID-19 Machine Learning based Survival Analysis and Discharge Time Likelihood Prediction using Clinical Data
Predicting mortality risk in patients with COVID-19 using artificial intelligence to help medical decision-making
predCOVID-19: A Systematic Study of Clinical Predictive Models for Coronavirus Disease
Drawing Insights from COVID-19-infected Patients using CT Scan Images and Machine Learning Techniques: A Study on
Predicting the disease outcome in COVID-19 positive patients through Machine Learning: a retrospective cohort study with Brazilian data
An Interpretable Mortality Prediction Model for COVID-19 Patients
Aprendizaje automático
Voronoi-Based Multi-Robot Autonomous Exploration in Unknown Environments via Deep Reinforcement Learning
Computer Science Handbook
Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations
Aplicaciones de Inteligencia Artificial (IA) para la pandemia de COVID-19. Diabetes y síndrome metabólico: investigación clínica y revisiones
A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection
Implementing Machine Learning in Health Care
Probabilistic Machine Learning