key: cord-0925628-9zpzfztr
authors: Murri, Rita; Masciocchi, Carlotta; Lenkowicz, Jacopo; Fantoni, Massimo; Damiani, Andrea; Marchetti, Antonio; Sergi, Paolo Domenico Angelo; Arcuri, Giovanni; Cesario, Alfredo; Patarnello, Stefano; Antonelli, Massimo; Bellantone, Rocco; Bernabei, Roberto; Boccia, Stefania; Calabresi, Paolo; Cambieri, Andrea; Cauda, Roberto; Colosimo, Cesare; Crea, Filippo; De Maria, Ruggero; De Stefano, Valerio; Franceschi, Francesco; Gasbarrini, Antonio; Landolfi, Raffaele; Parolini, Ornella; Richeldi, Luca; Sanguinetti, Maurizio; Urbani, Andrea; Zega, Maurizio; Scambia, Giovanni; Valentini, Vincenzo
title: A REAL-TIME INTEGRATED FRAMEWORK TO SUPPORT CLINICAL DECISION MAKING FOR COVID-19 PATIENTS
date: 2022-01-29
journal: Comput Methods Programs Biomed
DOI: 10.1016/j.cmpb.2022.106655
sha: fc53a26431836dd2c50197d00b4e602d6919aa07
doc_id: 925628
cord_uid: 9zpzfztr

BACKGROUND: The COVID-19 pandemic affected healthcare systems worldwide. Predictive models developed by Artificial Intelligence (AI) and based on timely, centralized and standardized real-world patient data could improve the management of COVID-19 and achieve better clinical outcomes. The objectives of this manuscript are to describe the structure and technologies used to construct a COVID-19 Data Mart architecture and to present how a large hospital has tackled the challenge of supporting daily management of the COVID-19 pandemic emergency by creating a strong retrospective knowledge base, a real-time environment and an integrated information dashboard for daily practice and early identification of critical conditions at the patient level. This framework is also used as an informative, continuously enriched data lake, which is the basis for several ongoing predictive studies.

METHODS: The information technology framework for clinical practice and research was described.
It was developed using the SAS Institute analytics software and the SAS® Viya® environment, together with the open-source environments R® and Python® for fast prototyping and modelling. The included variables and the source extraction procedures were presented.

RESULTS: The Data Mart covers a retrospective cohort of 2634 patients with SARS-CoV-2 infection. Patients who died were older, had more comorbidities, more frequently reported dyspnea at onset, and had higher D-dimer, C-reactive protein and urea nitrogen levels. The dashboard was developed to support the management of COVID-19 patients at three levels: hospital, single ward and individual care.

INTERPRETATION: A COVID-19 Data Mart, integrating a large collection of clinical data with an AI-based framework, has been developed on a set of automated procedures for data mining and retrieval, transformation and integration, and has been embedded in clinical practice to help manage daily care. Benefits of having a Data Mart available include the opportunity to build predictive models with a machine learning approach, to identify undescribed clinical phenotypes and to foster hospital networks. A real-time updated dashboard built from the Data Mart may be a valid tool for better knowledge of the epidemiological and clinical features of COVID-19, especially when multiple waves are observed, as well as for epidemic and pandemic events of the same nature (e.g. with critical clinical conditions leading to severe pulmonary inflammation). Therefore, we believe the approach presented in this paper may find several applications in comparable situations, even at the regional or state level. Finally, models predicting the course of future waves or new pandemics could largely benefit from networks of Data Marts.
HIGHLIGHTS

• An unexpectedly rapid spread of SARS-CoV-2, the agent of coronavirus disease 2019 (COVID-19), has been observed in China since January 2020, resulting in a worldwide pandemic and a high number of deaths.
• Real-time acquisition, centralization and constant updating of a COVID-19 Data Mart with information collected in healthcare systems on patients affected by COVID-19, together with the availability of user-oriented data visualization tools, is a valuable source of information to support clinical practice and research on the pandemic.
• A detailed description is given of the structure and technologies used to construct the COVID-19 Data Mart architecture.
• Several views are presented to demonstrate how a large hospital faced the challenge of the pandemic emergency by creating a strong retrospective knowledge base, a real-time environment and an integrated information dashboard for daily practice and early identification of critical conditions at the patient level.
• A set of AI-based modelling tools currently supports ongoing research studies from several clinical teams.

The framework was designed through the following steps: a) the cross-disciplinary group of [...]

The framework was developed using the SAS Institute analytics software and the SAS® Viya® environment, together with the open-source environments R® and Python® for fast prototyping and modelling.

As a result of this process of clinical variable selection and semantic control, the following variable groups have been identified to feed the Data Mart and support clinical activities: demographics, reverse-transcriptase polymerase chain reaction (RT-PCR) nasopharyngeal test for SARS-CoV-2, SARS-CoV-2 serology, isolated respiratory pathogens other than SARS-CoV-2, laboratory parameters at admission and during hospitalization, comorbidities and treatments before hospital admission, symptoms, arterial blood gas parameters, respiratory support, therapies for COVID-19, anticoagulant therapies, other drugs, radiology findings, intensive care measures, complications during hospitalization, length of hospitalization and outcomes (need for oxygen therapy, mechanical ventilation and death). A detailed description of the data available for each category is given in Table 1 in the Appendix.
The selected variables were extracted from the corresponding data sources through a standard extract, transform and load (ETL) procedure. This procedure made it possible to integrate data from different applications, including data cleaning and standardization to the target structure: a relational database (the COVID-19 Data Mart) with a general structure able to support daily practice and research activities. Where necessary, the procedures include a transformation step that converts unstructured information into structured data. The ETL procedures consist of several components, briefly described below and summarized in Figure 1.

• Daily update of the cohort of currently hospitalized patients, including only those patients who have had at least one positive RT-PCR nasopharyngeal test for SARS-CoV-2 and who have passed through one of the COVID-dedicated wards in the hospital; patients who are discharged (fully recovered or transferred to a pre-discharge structure) or deceased become part of the retrospective cohorts for statistical analysis and research studies.
• Daily extraction, validation and Data Mart update for structured clinical variables (e.g. laboratory data) for each hospitalized patient (defined as ETL 1 + ETL 3 in Figure 1); baseline data for newly hospitalized patients are included in this step. For structured sources, an identification code has been associated with each field. The codes refer to national and international standards such as the International Classification of Diseases, version 9 (ICD-9) and the Diagnosis Related Group classification. Where none of these standards was available, specific coding from hospital legacy applications was used.
• Daily extraction, transformation, validation and Data Mart update for unstructured clinical variables (e.g. text extracted from medical reports and converted into structured clinical data) for each patient (defined as ETL 2 + ETL 3 in Figure 1).

[...] comorbidities. This is built on chained ETL procedures that run on an incremental basis and includes robust and reliable error management through the creation of a log file.

Once the sensitive data associated with each patient (such as hospital code and hospitalization code) had been identified, the MD5 cryptographic hash function was applied to each identification code. The conversion table is stored in a dedicated, secure area for separation-of-duty and privacy reasons. A simplified view of the resulting relational database is shown as an example in Figure 2.

Statistical differences between patients who died and those who survived were evaluated with Pearson's chi-square test for categorical variables and the Mann-Whitney test for numerical ones. The Fondazione Policlinico Universitario A. Gemelli IRCCS Institutional Review Board approved the study protocol (IRB 3447).

As of January 5th, 2022, the Data Mart covers a retrospective cohort of 5528 patients with SARS-CoV-2 infection. We excluded from the analysis 210 patients who were still hospitalized and under care. Leveraging the wealth of information available from the Data Mart, which is updated on a daily basis, we are currently analyzing in detail a variety of correlation indices that are prerequisites for building accurate predictors of critical outcomes.

With the COVID-19 Data Mart online, the team has developed an extensive library of visualization dashboards, using SAS® Viya® functionalities, to bring information to the bedside for the clinical teams engaged in the daily care of infected patients. These dashboards cover cumulative views at the hospital management, ward management and patient care levels. A conceptual view of how this is exploited is shown in Figure 3.

[...] length of stay, current P/F and the P/F trend. In dashboard 6C, a detail view for a specific patient is provided.
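The ward-level view lists each patient's current P/F and P/F trend. As an illustration only, the following Python sketch assumes that P/F denotes the PaO2/FiO2 ratio (arterial oxygen partial pressure in mmHg divided by the fraction of inspired oxygen) and that the trend is summarized as a least-squares slope over time. The record layout (`BloodGas`), the function names and the example values are hypothetical and are not taken from the Data Mart or its SAS Viya dashboards.

```python
from dataclasses import dataclass


@dataclass
class BloodGas:
    """One arterial blood gas reading (hypothetical record layout)."""
    day: float        # days since admission
    pao2_mmhg: float  # arterial oxygen partial pressure, mmHg
    fio2: float       # fraction of inspired oxygen, 0.21 to 1.0


def pf_ratio(reading: BloodGas) -> float:
    """Return PaO2/FiO2 for a single reading."""
    if not 0.21 <= reading.fio2 <= 1.0:
        raise ValueError(f"FiO2 out of range: {reading.fio2}")
    return reading.pao2_mmhg / reading.fio2


def pf_trend(readings: list[BloodGas]) -> float:
    """Least-squares slope of P/F over time, in P/F units per day."""
    if len(readings) < 2:
        raise ValueError("need at least two readings")
    days = [r.day for r in readings]
    ratios = [pf_ratio(r) for r in readings]
    mean_day = sum(days) / len(days)
    mean_ratio = sum(ratios) / len(ratios)
    sxx = sum((d - mean_day) ** 2 for d in days)
    if sxx == 0:
        raise ValueError("readings must span more than one time point")
    sxy = sum((d - mean_day) * (p - mean_ratio) for d, p in zip(days, ratios))
    return sxy / sxx


# Illustrative values only: oxygenation worsening over three days.
readings = [
    BloodGas(day=0, pao2_mmhg=90, fio2=0.30),
    BloodGas(day=1, pao2_mmhg=80, fio2=0.40),
    BloodGas(day=2, pao2_mmhg=60, fio2=0.60),
]
current_pf = pf_ratio(readings[-1])
slope = pf_trend(readings)
print(f"current P/F = {current_pf:.0f}, trend = {slope:.0f} per day")
```

A negative slope indicates worsening oxygenation over the observed period. How the dashboards actually define the trend is not specified in the text, so the slope here is only one plausible choice.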
The patient detail view includes the history and trend of both P/F and fever over the hospitalization period, the result of the RT-PCR nasopharyngeal test, and the features of the most recent chest X-ray or chest computed tomography (CT).

The burden of the COVID-19 pandemic on people's daily lives and its impact on healthcare systems all over the world remain considerable. The high number of people with COVID-19 admitted to emergency rooms, general wards and ICUs critically stressed hospitals [9]. Preparedness for the pandemic has been largely suboptimal [10]. In particular, the occurrence of multiple pandemic waves, with continuously changing conditions, is an additional challenge that requires flexible and comprehensive tools for data analysis and understanding. Clinical and epidemiological data may differ significantly among patients from different waves, and healthcare needs may therefore vary even over a very short time. Among strategies to respond to a pandemic such as that caused by SARS-CoV-2, we experienced the need to evolve from manual data sharing towards building a health data infrastructure (the so-called "health data superhighway") [11] that facilitates automatic, interoperable data exchange and use. The potential insight provided by a very comprehensive Data Mart that integrates clinical data and real-world data (RWD) for COVID-19 patients along the whole natural history of the disease, combined with AI methods, is significant.

We [...] necessary. External, international validation of the ontology is mandatory and, finally, this workflow has so far been implemented and used by a single hospital.

In conclusion, a large Data Mart, including numerous structured and unstructured variables, gives the opportunity to build a real-world, readily available, interactive dashboard and to develop sophisticated and advanced predictive models [24].
Networks and pan-cohorts promoting collaboration across health centers, disciplines and institutions are crucial instruments for responding to pandemics or global health events. Therefore, the complex integration of a large volume of clinical, radiological and laboratory data in an advanced architecture could be useful to quickly and reliably test new predictive models, therapeutic agents active against SARS-CoV-2 or innovative regimens. The experience gained from applying and exploiting a COVID-19 Data Mart can serve as a paradigm for wider application, such as across an entire region or state. Lastly, models predicting the course of future waves or new pandemics could largely benefit from Data Mart networks like the one presented here.

[...] Riabilitativo Aziendale del Policlinico Gemelli). We wish to thank Franziska Lohmeyer for her English language assistance. We also want to acknowledge the professional services support of the SAS Analytics® team, which was instrumental in the Data Mart build, automated procedures, quality assurance and semantic consistency process.

WHO Director-General's opening remarks at the media briefing on COVID-19 - March 11th, 2020
COVID-19 and Italy: what next?
Priorities for the US Health Community Responding to COVID-19
Emergency department triage prediction of clinical outcomes using machine learning models
Effect of a sepsis prediction algorithm on patient mortality, length of stay and readmission: a prospective multicentre clinical outcomes evaluation of real-world patient data from US hospitals
Rapid Learning health care in oncology - an approach towards decision support systems enabling customised radiotherapy
An umbrella protocol for standardized data collection (SDC) in rectal cancer: a prospective uniform naming and procedure convention to support personalized medicine
Characteristics of and Important Lessons from the Coronavirus Disease 2019 (COVID-19) Outbreak in China: Summary of a Report of 72 314 Cases from the Chinese Center for Disease Control and Prevention
Council of State and Territorial Epidemiologists. Driving Public Health in the Fast Lane: The Urgent Need for a 21st Century Data Superhighway
Prognostic value of interleukin-6, C-reactive protein, and procalcitonin in patients with COVID-19
Risk Factors of Fatal Outcome in Hospitalized Subjects With Coronavirus Disease 2019 From a Nationwide Analysis in China
Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study
Intensivists' Preferences for Patient Admission to ICU: Evidence From a Choice Experiment
Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study
Baseline Characteristics and Outcomes of 1591 Patients Infected With SARS-CoV-2 Admitted to ICUs of the Lombardy Region, Italy
Derivation, Validation, and Potential Treatment Implications of Novel Clinical Phenotypes for Sepsis
A machine-learning parsimonious multivariable predictive model of mortality risk in patients with Covid-19
Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study
Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer
Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning
Real-time data analysis using a machine learning model significantly improves prediction of successful vaginal deliveries
Building an Artificial Intelligence Laboratory Based on Real World Data: The Experience of Gemelli Generator