key: cord-0863911-5ozchkd6 authors: Kim, Y. H.; Ko, Y.; Kim, S. Y.; Kim, K. title: How closely is COVID-19 related to HCoV, SARS, and MERS? : Clinical comparison of coronavirus infections and identification of risk factors influencing the COVID-19 severity using common data model (CDM) date: 2020-11-24 journal: nan DOI: 10.1101/2020.11.23.20237487 sha: d48b0cd2b6faaa6a95ab8a78e5a5ea7c39441ce6 doc_id: 863911 cord_uid: 5ozchkd6
South Korea was one of the epicenters for both the 2015 MERS and 2019 COVID-19 outbreaks. However, there has been a lack of published literature, especially literature based on electronic medical records (EMR), that provides a comparative summary of the prognostic factors present in patients with coronavirus-derived diseases. Therefore, in this study, we aimed to compare and evaluate the distinct clinical traits of patients infected with different coronaviruses, including the less pathogenic HCoV strains, SARS-CoV, MERS-CoV, and SARS-CoV-2. We also examined risk factors by COVID-19 severity to investigate the extent of resemblance in clinical features between the disease groups and to identify unique factors that may influence the prognosis of COVID-19 patients. Here, we utilize the common data model (CDM), a database that houses EMR records transformed into a common format for use by multiple institutions. For the comparative analyses between the disease groups, we used the independent t-test, Scheffe post-hoc test, and Games-Howell post-hoc test for continuous variables, and the chi-square test and Fisher's exact test for categorical variables. Based on the analyses, we selected the variables with p-values less than 0.05 to predict COVID-19 severity by nominal logistic regression adjusted for age and gender. From the study, we observed diabetes, cardiovascular and cerebrovascular diseases, cancer, pulmonary disease, gastrointestinal disease, and renal disease in all patient groups.
Of all comorbidities, the proportion of cancer patients was the highest in all groups, with no statistical significance. Most interestingly, we observed a high degree of clinical similarity between the COVID-19 and SARS patients, with more than 50% of the measured clinical variables showing statistical similarity between the two groups. Our research is significant for the bioinformatics field in that we were able to effectively utilize the integrated CDM to reflect real-world challenges in the context of coronavirus. We expect the results of our study to provide clinical insights that can serve as predictors of risk factors in future coronavirus outbreaks as well as prospective guidelines for clinical treatment.
COVID-19 is a global pandemic that has caused more than a million deaths and nearly 50 million cases since its outbreak [1]. Coronavirus disease 2019 (COVID-19) is a novel respiratory viral disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which was first discovered in Wuhan, China, in December 2019 [2]. Current evidence suggests that the virus is transmitted via respiratory droplets (aerosols) produced by coughing, sneezing, speech, and heavy breathing, yet it is still unknown whether these aerosols persist in the air for a prolonged period [2] [3]. However, this is not the first time the world has faced health threats from coronaviruses. Coronavirus infections in humans were first reported in the 1960s. Coronaviruses are RNA viruses that are commonly present in bats and belong to the family Coronaviridae. The family comprises four genera, Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and Deltacoronavirus, of which two, Alphacoronavirus and Betacoronavirus, are known to cause respiratory infections in humans. Including SARS-CoV-2, there are seven human coronaviruses; the other six are HCoV-229E, HCoV-NL63, HCoV-OC43, HCoV-HKU1, SARS-CoV, and MERS-CoV.
While the HCoV strains cause mild upper respiratory diseases, more pathogenic strains including SARS-CoV, MERS-CoV, and SARS-CoV-2 cause severe respiratory symptoms and complications that may lead to death [4] [5] [6]. South Korea was no exception in facing the pandemic, as one of the epicenters for both the 2015 MERS and 2019 COVID-19 outbreaks. However, there has been a lack of published literature that provides a comprehensive and comparative summary of the prognostic factors present in patients with coronavirus-derived diseases in South Korea. In light of emerging and reemerging viral transmissions, it is critically important to collect a comprehensive set of data upon which robust control measures can be established. Therefore, in this study, we aim to compare and evaluate the distinct clinical traits of patients infected with different coronaviruses, including the HCoV strains, SARS-CoV, MERS-CoV, and SARS-CoV-2. Further, we aim to characterize the COVID-19 patients clinically, to observe the risk factors by disease severity, and to compare them with the risk factors identified for the other coronavirus infections, in order to see whether any common or distinct trait observed in the other coronavirus diseases significantly influences the prognosis of COVID-19 patients. The study utilizes the common data model (CDM), a database that houses EMR records transformed into a common format for use by multiple institutions for research purposes [7]. With the use of real-world data, we expect to present valuable insights on the clinical variables of importance in COVID-19 as well as the degree of resemblance between the coronaviruses.
We utilized the CDM of the Seoul National University Hospital, located in Seoul, South Korea. The data collection period was from October 15, 2004 to July 31, 2020.
Without any restriction on age and gender, we collected the records of symptoms, comorbidities, and laboratory test results of patients diagnosed with HCoV-229E, HCoV-NL63, HCoV-OC43, HCoV-HKU1 (HCoVs), SARS-CoV, MERS-CoV, and SARS-CoV-2 infections. During the data mining process, we categorized all individual diagnoses by disease type with reference to ICD codes and selected the laboratory measurement variables primarily based on the literature review. We excluded any variable that had null or sparse patient records. We divided the COVID-19 patients by disease severity based on the published World Health Organization (WHO) criteria [7]. Among the COVID-19 confirmed patients, those who experienced mild cold-like symptoms with no pneumonia were classified as mild, whereas the patients who showed any additional clinical presentation of pneumonia were classified as non-mild. Continuous variables were compared by the independent t-test, the Scheffe post-hoc test when variances were homogeneous, and the Games-Howell post-hoc test otherwise. They were expressed as mean ± standard deviation. For categorical variables, we conducted the chi-square test or Fisher's exact test to indicate whether the presence of conditions differed across disease groups. Based on the analyses, we selected the variables with p-values less than 0.05 to predict COVID-19 severity by logistic regression adjusted for age and gender [8] [9] [10] [11] [12]. All statistical analyses were carried out using R version 3.6.2.
For the study, we collected the records of 2840 COVID patients, 67 MERS patients, 39 SARS patients, and 81 other HCoV-positive patients. Table 1 shows the summary of clinical characteristics of the COVID, MERS, SARS, and HCoV patients. Among the COVID patients, the mean age was 51.8 ± 26.0 and 1457 (51.3%) were males.
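The group comparisons of categorical variables reported in this section rely on the chi-square test and Fisher's exact test named in the methods. The following is a minimal sketch of both tests for a single 2x2 table, written in Python with the standard library only rather than in the R environment the authors used. The function names `chi_square_2x2` and `fisher_exact_2x2` and the counts in `table` are hypothetical and serve for illustration; they are not taken from the study's tables, and the sketch assumes all four margins of the table are non-zero.

```python
from math import comb, erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value). With 1 degree of freedom, the upper-tail
    probability of a chi-square value x equals erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return stat, erfc(sqrt(stat / 2))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Integer weights are proportional to the probabilities, so the
    # "no more likely than observed" comparison is exact.
    weights = {k: comb(row1, k) * comb(row2, col1 - k) for k in range(lo, hi + 1)}
    total = comb(row1 + row2, col1)
    extreme = sum(w for w in weights.values() if w <= weights[a])
    return extreme / total


# Hypothetical counts for illustration only (not from the paper):
# rows = two disease groups, columns = condition present / absent.
table = (8, 2, 1, 5)
chi2, p_chi2 = chi_square_2x2(*table)
p_fisher = fisher_exact_2x2(*table)
print(f"chi-square = {chi2:.3f}, p = {p_chi2:.4f}; Fisher exact p = {p_fisher:.4f}")
```

The chi-square approximation is generally considered unreliable when expected cell counts are small, which is the usual reason for falling back to Fisher's exact test in small groups such as the SARS and MERS cohorts.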
Among the MERS patients, the mean age was 36.9 ± 23.4 and 35 (52.2%) were males, whereas among the SARS and HCoV patients, the mean ages were 22.8 ± 28.9 and 7.2 ± 11.9, and 27 (69.2%) and 42 (51.9%) were males, respectively. For comorbidities, diabetes, cancer, pulmonary disease, gastrointestinal disease, and renal disease were present in all groups. In the comparison between the COVID and MERS patients, the COVID patients had significantly higher proportions of diabetes (4.4% vs. 1.1%, p < 0.001), cancer (29.5% vs. 3.0%, p < 0.001), and gastrointestinal disease (14.4% vs. 1.5%, p < 0.01). However, SARS patients, compared to the COVID patients, more frequently had cerebrovascular (12.8% vs. 1.1%, p < 0.001), pulmonary (61.5% vs. 8.4%, p < 0.001), and renal conditions (17.9% vs. 6.4%, p < 0.01), whereas the HCoV group had a significantly higher proportion of patients with pulmonary diseases (48.1% vs. 8.4%, p < 0.001) and musculoskeletal diseases (9.9% vs. 2.2%, p < 0.001) than the COVID group. For symptoms, patients in all disease groups reported fever, cough, dyspnea, gastrointestinal symptoms, and upper respiratory infections. Compared to the COVID patients, MERS and HCoV patients presented fever (79.1% vs. 33%, p < 0.001 and 60.5% vs. 33%, p < 0.001, respectively) and upper respiratory infections (28.4% vs. 0.8%, p < 0.001 and 37% vs. 0.8%, p < 0.001, respectively) more frequently. Except for dyspnea and sore throat, SARS patients compared to the
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This version posted November 24, 2020.
0.47, p < 0.001 and 1.39 ± 2.34 vs.
0.67 ± 0.33, p < 0.001, respectively) compared to the HCoV and MERS patients, whereas the COVID patients had a higher level of albumin than the SARS group (3.53 ± 0.74 vs. 2.94 ± 0.83, p < 0.001) but a lower level than the MERS group (4.17 ± 0.52 vs. 3.53 ± 0.74, p < 0.001). For kidney function, levels of blood urea nitrogen and creatinine were significantly higher among the COVID patients than in the HCoV group (23.30 ± 20.20 vs. 12.90 ± 11.80, p < 0.001 and 1.33 ± 1.77 vs. 0.58 ± 0.64, p < 0.1, respectively). No statistically significant differences in coagulation function were observed between the groups.
After dividing the COVID patients by the WHO disease severity criteria (ref), 2596 patients (91.4%) belonged to the mild group and 244 (8.6%) belonged to the non-mild group. Table 3 shows the summary of clinical characteristics of the mild and non-mild patients. In the mild group, the mean age was 47.9 ± 25.6 and 1298 were males. In the non-mild group, the mean age was 67.0 ± 21.7 and 159 were males. There were 1567 COVID patients with comorbidities, of which 1327 and 240 patients belonged to the mild and non-mild groups, respectively. By comparison, non-mild COVID patients presented more comorbidities upon hospital admission (98.3% vs. 51.1%, p < 0.001). As shown in Table 3, patients in the non-mild group more frequently had cerebrovascular disease (4.9% vs. 0.8%, p < 0.001), pulmonary disease (36.9% vs. 5.7%, p < 0.001), and renal disease (16.0% vs.
5.5%, p < 0.001), whereas more patients in the mild group experienced hepatic disease than in the non-mild group (5.0% vs. 0.8%, p < 0.01). For symptoms, the non-mild group had a higher proportion of people reporting dyspnea (20.1% vs. 3.1%, p < 0.001), chest pain (5.7% vs. 1.9%, p < 0.001), and upper respiratory infections (2.5% vs. 0.7%, p < 0.001), whereas the mild group experienced more fever (34.5% vs. 18.0%, p < 0.001) than the non-mild group.
In order to observe the extent of common clinical characteristics among disease groups that may uniquely affect the prognosis of COVID-19 patients, we conducted nominal logistic regression with the variables that showed statistical significance with p-values less than 0.05.
In all disease groups, we observed patients with diabetes, cardiovascular and cerebrovascular diseases, cancer, pulmonary disease, gastrointestinal disease, and renal disease. Among the comorbidities, the proportion of cancer patients was the highest in all groups, with no statistical significance. Interestingly, we observed the most similarities in clinical features between the COVID-19 and SARS patients. Out of 17 conditions, including the comorbidities and symptoms, the two groups showed no statistical differences in 12 conditions (71%). Further, within the laboratory findings, both groups presented statistical similarities in 17 out of 19 measurements (89%).
Such similarity in clinical characteristics between SARS-CoV-2 and SARS-CoV-1, which has also been supported by Petrosillo et al. [5], may be explained by their common ancestor, the bat coronavirus HKU9-1 [13]. In both the comparative analyses between the disease groups and those between the COVID-19 severity groups, cerebrovascular disease, hepatic disease, pulmonary disease, and renal disease showed statistical significance. Thus, when applying those factors along with the selected measurement variables, we saw that cerebrovascular disease, pulmonary disease, renal disease, and increased eosinophil count were associated with a worse prognosis of COVID-19. Studies have found that COVID-19 infection can accelerate the development of cerebrovascular disease; previously published autopsy results of COVID-19 patients showed hyperemic and edematous brain tissue with some degenerated neurons [14] [15]. Based on such findings, Avula et al. suggested the possibility of hypercoagulation leading to macro- and micro-thrombi formation in the vessels during COVID-19 infection [16]. Further, our observations of patients' existing conditions, especially cancer, reflect the susceptibility of immunosuppressed patients. Other coexisting conditions in multiple organs, including the kidney and GI tract, may be attributed to a particular pathway: upon entry into human cells, the spike protein of SARS-CoV-2 binds to the angiotensin-converting enzyme 2 (ACE2) receptors, which are also expressed in the heart, lungs, kidneys, and other organs [17] [18].
Thus, our observations of a wide range of comorbidities among the COVID-19 patients may be explained by the aforementioned mechanism. Our research is significant in that we effectively utilized big data from an integrated model, the CDM, at one of the largest national hospitals in South Korea. CDM research is conducted worldwide to invigorate medical research [19]. With the use of commonly formatted EMR records, our research successfully reflects the real-world challenges posed by the coronavirus. Thus, we expect the results of our study to provide clinical insights that can be used as a basis for predicting the clinical presentations of yet another coronavirus-derived illness in the near future.
It is made available under a CC-BY-NC-ND 4.0 International license. https://doi.org/10.1101/2020.11.23.20237487 (medRxiv preprint)
World Health Organization - WHO Coronavirus Disease (COVID-19) Dashboard [Website]
The origin and underlying driving forces of the SARS-CoV-2 outbreak
Origin and evolution of pathogenic coronaviruses
Prognostic Factors for Severe Coronavirus Disease
Issues on Coronavirus Disease 2019 (COVID-19) Pathogenesis. Viral Immunology, Online Ahead of Print
SARS-CoV, MERS-CoV, and 2019-nCoV viruses: an overview of origin, evolution, and genetic variations
COVID-19, SARS and MERS: are they closely related?
OHDSI (Observational Health Data Sciences and Informatics): OMOP Common Data Model
The Status of Metabolic Control in Patients With Type 2 Diabetes Attending Dasman Diabetes Institute
Reassess the t Test: Interact with All Your Data via ANOVA
Application of student's t-test, analysis of variance, and covariance
Twice upon a time: The progression of canine visceral leishmaniasis in an Argentinean city
Use of logistic regression for prediction of the fate of Staphylococcus aureus in pasteurized milk in the presence of two lytic phages
Evolution of the novel coronavirus from the ongoing Wuhan outbreak and modeling of its spike protein for risk of human transmission
Neurologic Manifestations of Hospitalized Patients with Coronavirus Disease
Cerebrovascular disease in COVID-19: Is there a higher risk of stroke? Brain, Behavior, & Immunity - Health
COVID-19 presenting as stroke
Acute Kidney Injury and Kidney Damage in COVID-19 Patients
Single-cell RNA-seq data analysis on the receptor ACE2 expression reveals the potential risk of different human organs vulnerable to 2019-nCoV infection
Analysis of antiseizure drug-related adverse reactions from the electronic health record using the common data model