key: cord-0530312-4hhk2opl authors: Narayana, Madhavi Bangalore Shankara; Benevento, Elisabetta; Pegoraro, Marco; Abdullah, Muhammad; Shahid, Rahim Bin; Sajid, Qasim; Mansoor, Muhammad Usman; Aalst, Wil M.P. van der title: A Web-Based Tool for Comparative Process Mining date: 2022-04-01 journal: nan DOI: nan sha: 989c736de884a85d070dae45caa544386edc6220 doc_id: 530312 cord_uid: 4hhk2opl Process mining techniques enable the analysis of a wide variety of processes using event data. Among the available process mining techniques, most consider a single process perspective at a time-in the shape of a model or log. In this paper, we have developed a tool that can compare and visualize the same process under different constraints, allowing to analyze multiple aspects of the process. We describe the architecture, structure and use of the tool, and we provide an open-source full implementation. Many companies today are unaware of why their processes are not performing at the maximum capacity, what are the bottlenecks in their processes, and which are the root causes of such performance issues. Process mining enables early detection of such problems, and allows companies to use it for monitoring and optimizing their process. Comparing the behavior of a process under different circumstances, like time and other parameters, can help in identifying the causes of differences in models and performance. Particularly, comparative process mining aims at finding differences between two processes, and to detect process-related issues. Comparing the behaviour of two processes can help to investigate why one of them performs better in terms of predetermined characteristics or key performance indicators, to extract patterns or additional information, and to improve the understanding of these processes. * Corresponding author. Authors' addresses: Madhavi Bangalore Shankara Narayana, madhavi.shankar@pads. rwth-aachen.de, RWTH Aachen University, Ahornstr. 55, Aachen, Germany, 52074; Elisabetta Benevento, benevento@pads.rwth-aachen.de, RWTH Aachen University, Ahornstr. 55, Aachen, Germany, 52074 and University of Pisa, Pisa, Italy; Marco Pegoraro, pegoraro@pads.rwth-aachen.de; Muhammad Abdullah, muhammad.abdullah1@ rwth-aachen.de; Rahim Bin Shahid, rahim.shahid@rwth-aachen.de; Qasim Sajid, qasim.sajid@rwth-aachen.de; Muhammad Usman Mansoor, usman.mansoor@rwthaachen.de; Wil M.P. van der Aalst, wvdaalst@pads.rwth-aachen.de, RWTH Aachen University, Ahornstr. 55, Aachen, Germany, 52074. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. © 2022 Association for Computing Machinery. XXXX-XXXX/2022/4-ART $15.00 https://doi.org/10.1145/nnnnnnn.nnnnnnn There exist two variants of process comparison: model-based or log-based comparison. The former uses models as input, while the latter considers event logs. Model-based comparison [Cordes et al. 2014; Ivanov et al. 2015; La Rosa et al. 2013 ] is based on the control-flow dimension, and focuses on comparing the similarity of the input models. Model-based comparison techniques start by extracting a model from an event log through process discovery, and then check what activities are present in all models, or absent in one of them. Often, color coding or thickness of arcs and activities is used to highlight (significant) differences. The main drawback of model-based comparison is that the comparison is based mostly on the structure of the input models. It is not possible to analyze other process metrics (e.g., frequency or time statistics). Log-based comparisons do not have such limitation. For example, in [van Beest et al. 2016] , the proposed framework allows comparing the event logs of two processes. Such framework constructs a transition system based on the two-input event logs and provides process metrics. The coloring allows a user to quickly see where the differences of the two input event logs lie. In [van der Aalst et al. 2013] , the authors propose a comparative analysis using process cubes. Each cell in the cube corresponds to a set of events that can be used to discover a process model, to check conformance, or to discover bottlenecks. Slicing, rolling-up, and drilling-down, enable viewing event data from different angles and produce results that can be compared. The purpose of our log-based comparison tool is to derive a way in which one can visualize the business process under certain conditions. Moreover, it also helps in comparing process models generated to find patterns and additional information to have a better understanding of the process. The remainder of this paper is structured as follows. In Section 2, we describe the tool, its functions, and its architecture. Then, Section 3 concludes this paper. The high-level architecture of the tool is shown in Figure 1 . The tool is build on Python 3.8 and Django framework. PM4Py 2.7.7.1 grants the core process mining functionalities, such as import/export of logs and models. We obtain Directly-Follows Graph (DFG) models and visualize them using the G6 library. The tool allows the users to upload and/or select an event log from the root directory of event logs. In the Comparison window, we provide users with the ability to choose a column and its attributes, which will be used to filter the log for variants satisfying the criteria. Users can choose what they wishes to visualize and compare. A common discovery algorithm will be applied to the variants, and once the models are generated, these will be analyzed to identify unique activities and edges. Unique elements in an activity will be highlighted so the user is easily able to identify comparative information. We also provide statistics such as number of traces, cases and average running time of the variants, and will be displayed to the user between the two filtered variants model. There, users can export the complete comparison information as PDF or can choose to export the individual components, like export variants, logs, or DFGs individually. The user may also add more models, although the comparison is limited to two models. The red colored activities in the model signify that these activities are not common between the models. The models can be visualized in performance or frequency metrics. Figure 2 shows the comparison of a given process at two different time intervals. It also highlights the activities that are not common in both the models. The implementation and user manual of the tool are available on Github 1 . A video is also available 2 . We have used a COVID-19 patients log recorded in the context of the COVID-19 Aachen Study (COVAS) [Pegoraro et al. 2021 ] to demonstrate the tool. in this demonstration, we have compared the models in two different time ranges. Figure 2 shows the process model followed in Uniklinikum, Aachen for the first two waves of COVID-19. In this paper, we presented our Comparative Process Mining tool built on the existing PM4Py framework. This tool will enable the research community to compare the behavior of processes in terms of performance or frequency and derive ideal and baseline models. As future work, we plan to enable comparing more than two models. A generic approach for calculating and visualizing differences between process models in multidimensional process mining BPMNDiffViz: A Tool for BPMN Models Comparison Business process model merging: An approach to business process consolidation Analyzing Medical Data with Process Mining: a COVID-19 Case Study Log delta analysis: Interpretable differencing of business process event logs Comparative process mining in education: An approach based on process cubes We acknowledge the ICU4COVID project (funded by European Union's Horizon 2020 under grant agreement n. 101016000) and the COVAS project for our research interactions.