key: cord-0191414-6wubl1pb
authors: Pegoraro, Marco; Narayana, Madhavi Bangalore Shankara; Benevento, Elisabetta; Aalst, Wil M.P. van der; Martin, Lukas; Marx, Gernot
title: Analyzing Medical Data with Process Mining: a COVID-19 Case Study
date: 2022-02-08
journal: nan
DOI: nan
sha: 8736e3ca62922e71d896f71d1ab379d46254a99f
doc_id: 191414
cord_uid: 6wubl1pb

The recent increase in the availability of medical data, made possible by the automation and digitization of medical equipment, has enabled more accurate and complete analysis of patients' medical data across many branches of data science. In particular, medical records that include timestamps showing the history of a patient make it possible to represent medical information as sequences of events, which in turn allows process mining analyses to be performed. In this paper, we present some preliminary findings obtained by applying established process mining techniques to the medical data of patients of the Uniklinik Aachen hospital affected by the recent COVID-19 epidemic. We show that process mining techniques are able to reconstruct a model of the ICU treatments for COVID patients.

The widespread adoption of Hospital Information Systems (HISs) and Electronic Health Records (EHRs), together with recent Information Technology (IT) advancements such as cloud platforms, smart technologies, and wearable sensors, is allowing hospitals to measure and record an ever-growing volume and variety of patient- and process-related data [7]. This trend is making the most innovative and advanced data-driven techniques more applicable to process analysis and improvement in healthcare organizations [5]. In particular, process mining has emerged as a suitable approach to analyze, discover, improve, and manage real-life, complex processes by extracting knowledge from event logs [2].
Indeed, healthcare processes are recognized to be complex, flexible, multidisciplinary, and ad hoc, and are thus difficult to manage and analyze with traditional model-driven techniques [9]. Process mining is widely used to devise insightful models describing the flow from different perspectives, e.g., control-flow, data, performance, and organizational.

Being both highly contagious and deadly, COVID-19 has been the subject of intense research efforts by a large part of the international research community. Data scientists have taken part in this scientific work, and a great number of articles have now been published on the analysis of medical and logistic information related to COVID-19. In terms of raw data, numerous openly accessible datasets exist, and efforts are ongoing to catalog and unify them [6]. A wealth of approaches based on data analytics is now available for descriptive, predictive, and prescriptive analytics, with objectives such as measuring the effectiveness of early response [8], inferring the speed and extent of infections [3, 10], and predicting diagnosis and prognosis [11]. However, the process perspective of datasets related to the COVID-19 pandemic has, thus far, received little attention from the scientific community.

The aim of this work-in-progress paper is to exploit process mining techniques to model and analyze the care process for COVID-19 patients treated at the Intensive Care Unit (ICU) ward of the Uniklinik Aachen hospital in Germany. To do so, we use a real-life dataset extracted from the ICU information system. In more detail, we discover the patient-flows for COVID-19 patients, extract useful insights into resource consumption, compare the process models based on data from the two COVID waves, and analyze their performance. The analysis was carried out in collaboration with the ICU medical staff.

The remainder of the paper is structured as follows.
Section 2 describes the COVID-19 event log that is the subject of our analysis. Section 3 reports insights from preliminary process mining analysis results. Lastly, Section 4 concludes the paper and describes our roadmap for future work.

The dataset that is the subject of our study records information about COVID-19 patients monitored in the context of the COVID-19 Aachen Study (COVAS). The log contains event information regarding COVID-19 patients admitted to the Uniklinik Aachen hospital between February 2020 and December 2020. The dataset includes 216 cases, of which 196 are complete cases (for which the patient has been discharged either dead or alive) and 20 are ongoing cases (partial process traces) that were under treatment in the COVID unit at the time the data was exported. The dataset records 1645 events in total, resulting in an average of 7.6 events recorded per admission. The cases recorded in the log belong to 65 different variants, each with a distinct event flow. The events are labeled with the executed activity; the log includes 14 distinct activities. Figure 1 shows a dotted chart of the event log.

In this section, we illustrate the preliminary results obtained through a detailed process mining-based analysis of the COVAS dataset. More specifically, we elaborate on results based on the control-flow and performance perspectives. First, we present a process model extracted from the event data of the COVAS event log. Among the several process discovery algorithms in the literature [2], we applied the Interactive Process Discovery (IPD) technique [4] to extract the patient-flows for COVAS patients, obtaining a model in the form of a Petri net (Figure 2). IPD allows domain knowledge to be incorporated into the discovery of process models, leading to improved and more trustworthy models. This approach is particularly useful in healthcare contexts, where physicians hold tacit domain knowledge that is difficult to elicit but highly valuable for the comprehensibility of the process models.
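
The descriptive figures reported for the COVAS log (number of cases, events, events per case, variants, and distinct activities) are standard event-log statistics. The following is a minimal sketch of how such figures can be computed, assuming the log is available as rows of case identifier, activity, and timestamp. The toy rows, the activity names, and the function name `log_statistics` are hypothetical and are not taken from the COVAS data or from the authors' tooling.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical toy event log: (case_id, activity, timestamp) rows.
# Activity names and dates are illustrative only, not COVAS data.
events = [
    ("c1", "Hospitalization", "2020-03-01"),
    ("c1", "Oxygen therapy", "2020-03-02"),
    ("c1", "Discharge", "2020-03-10"),
    ("c2", "Hospitalization", "2020-03-05"),
    ("c2", "ICU admission", "2020-03-06"),
    ("c2", "Ventilation", "2020-03-07"),
    ("c2", "Discharge", "2020-03-20"),
    ("c3", "Hospitalization", "2020-04-01"),
    ("c3", "Oxygen therapy", "2020-04-03"),
    ("c3", "Discharge", "2020-04-09"),
]


def log_statistics(rows):
    """Compute basic descriptive statistics of an event log."""
    # Group events by case.
    traces = defaultdict(list)
    for case_id, activity, timestamp in rows:
        traces[case_id].append((datetime.fromisoformat(timestamp), activity))

    # A variant is the ordered sequence of activities of a case.
    variants = defaultdict(int)
    for case_events in traces.values():
        case_events.sort(key=lambda event: event[0])  # order by timestamp
        variants[tuple(activity for _, activity in case_events)] += 1

    return {
        "cases": len(traces),
        "events": len(rows),
        "events_per_case": len(rows) / len(traces),
        "variants": len(variants),
        "activities": len({activity for _, activity, _ in rows}),
    }


stats = log_statistics(events)
print(stats)
```

Applied to the log described above, the same ratio gives 1645 / 216 ≈ 7.6 events per case.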
The discovered process map provides operational knowledge about the structure of the process and the main patient-flows. Specifically, the analysis reveals that COVID-19 patients are characterized by a fairly homogeneous high-level behavior, but several variants exist due to the possibility of an ICU admission and to the different outcomes of the process. In more detail, after hospitalization and the onset of first symptoms, if present, each patient may be subject to oxygen therapy and, possibly, to the ICU pathway, with subsequent ventilation and ECMO activities, until the end of the symptoms. Once conditions improve, patients may be discharged or transferred to another ward.

We evaluated the quality of the obtained process model through conformance checking [2]. Specifically, we measured the token-based replay fitness between the Petri net and the event log, obtaining a value of 98%. This is a strong indication of both a high level of compliance in the process (the flow of events does not deviate from the intended behavior) and a high reliability of the methodologies employed in data recording and extraction (very few deviations in the event log also imply very few missing events and a low amount of noise in the dataset).

From the information stored in the event log, it is also possible to gain insights regarding the time performance of each activity and the resource consumption. For example, Figure 3 shows the rate of utilization of ventilation machines. This information may help hospital managers to manage and allocate resources, especially critical or shared ones, more efficiently.

Finally, with the aid of the process mining tool Everflow [1], we investigated the different patient-flows of the first wave (until the end of June 2020) and the second wave (from July 2020 onward) of the COVID-19 pandemic, and evaluated their performance perspective, which is shown in respectively.
The first wave involves 133 cases with an average case duration of 33 days and 6 hours; the second wave includes 63 patients, with an average case duration of 23 days and 1 hour. The difference in average case duration (about 10 days) is substantial and may be due to medical staff being more skilled and better prepared in treating COVID cases, as well as to a lower average number of simultaneous admissions in the second wave.

In this preliminary paper, we show some techniques to inspect hospitalization event data related to the COVID-19 pandemic. The application of process mining to COVID event data appears to lead to insights related to the development of the disease, to the efficiency in managing the effects of the pandemic, and to the optimal usage of medical equipment in the treatment of COVID patients in critical conditions. We show a normative model, obtained with the aid of IPD, for the operations at the COVID unit of the Uniklinik Aachen hospital; the model indicates a high reliability of the data recording methods in the ICU facilities.

Within the ongoing research on COVID event data, a prominent future development consists in performing comparative analyses between geographically and temporally diverse datasets and event logs. By inspecting differences only detectable with process science techniques (e.g., deviations on the control-flow perspective), novel insights can be obtained on aspects of the pandemic such as spread, effectiveness of different crisis responses, and long-term impact on the population.
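
As an illustration of how control-flow deviations are quantified, the token-based replay fitness used in the conformance check can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure or tooling used in the study: the Petri net is a hypothetical three-step sequential toy net (not the model of Figure 2), activity and place names are invented, every activity labels exactly one transition, and there are no silent transitions. Fitness is computed with the usual formula 0.5 (1 - m/c) + 0.5 (1 - r/p), where p, c, m, and r are the produced, consumed, missing, and remaining tokens summed over all traces.

```python
from collections import Counter

# Hypothetical toy Petri net, NOT the model discovered in the paper.
# Each transition (activity) maps to (input places, output places).
NET = {
    "Hospitalization": (["start"], ["p1"]),
    "Oxygen therapy": (["p1"], ["p2"]),
    "Discharge": (["p2"], ["end"]),
}


def replay_counts(trace, net, source="start", sink="end"):
    """Replay one trace; return (produced, consumed, missing, remaining)."""
    marking = Counter({source: 1})  # the environment produces the initial token
    produced, consumed, missing = 1, 0, 0

    def consume(place):
        nonlocal consumed, missing
        if marking[place] == 0:
            missing += 1          # token is absent: add it artificially
            marking[place] += 1
        marking[place] -= 1
        consumed += 1

    for activity in trace:
        inputs, outputs = net[activity]
        for place in inputs:
            consume(place)
        for place in outputs:
            marking[place] += 1
            produced += 1

    consume(sink)                 # the environment consumes the final token
    remaining = sum(marking.values())
    return produced, consumed, missing, remaining


def log_fitness(log, net):
    """Token-based replay fitness aggregated over all traces of a log."""
    p = c = m = r = 0
    for trace in log:
        tp, tc, tm, tr = replay_counts(trace, net)
        p, c, m, r = p + tp, c + tc, m + tm, r + tr
    return 0.5 * (1 - m / c) + 0.5 * (1 - r / p)


# Two conforming traces and one trace that skips "Oxygen therapy".
LOG = [
    ["Hospitalization", "Oxygen therapy", "Discharge"],
    ["Hospitalization", "Oxygen therapy", "Discharge"],
    ["Hospitalization", "Discharge"],
]

fitness = log_fitness(LOG, NET)
print(f"fitness = {fitness:.3f}")
```

In this toy log, the deviating trace leaves one token behind and requires one artificially added token, which lowers the fitness below 1; a log containing only conforming traces yields a fitness of exactly 1.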
[1] Everflow Process Mining
[2] Process Mining: Data Science in Action
[3] Data-based analysis, modelling and forecasting of the COVID-19 outbreak
[4] Interactive data-driven process model construction
[5] A review of the literature on big data analytics in healthcare
[6] COVID-19 data hub
[7] A Big Data-driven Model for the Optimization of Healthcare Processes
[8] Suppression of a SARS-CoV-2 outbreak in the Italian municipality of Vo'
[9] Process mining in healthcare: evaluating and exploiting operational healthcare processes
[10] Modeling and forecasting the COVID-19 pandemic in India
[11] Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal