key: cord-0511273-s4iyt13j
authors: Augusto, Adriano; Deitz, Timothy; Faux, Noel; Manski-Nankervis, Jo-Anne; Capurro, Daniel
title: Process Mining-Driven Analysis of the COVID19 Impact on the Vaccinations of Victorian Patients
date: 2021-12-09
journal: nan
DOI: nan
sha: 68a07ee20a683086935b767735b624fd05925421
doc_id: 511273
cord_uid: s4iyt13j

Process mining is a discipline sitting between data mining and process science, whose goal is to provide theoretical methods and software tools to analyse process execution data, known as event logs. Although process mining was originally conceived to facilitate business process management activities, research studies have shown the benefit of leveraging process mining tools in different contexts, including healthcare. However, applying process mining tools to analyse healthcare process execution data is not straightforward. In this paper, we report the analysis of an event log recording more than 30 million events capturing the general practice healthcare processes of more than one million patients in Victoria (Australia) over five years. Our analysis allowed us to understand the benefits and limitations of state-of-the-art process mining techniques when dealing with highly variable processes and large data-sets. While we provide solutions to the identified limitations, the overarching goal of this study was to detect differences between the patients' health services utilization pattern observed in 2020 (during the COVID-19 pandemic and mandatory lock-downs) and the one observed in the prior four years, 2016 to 2019. By using a combination of process mining techniques and traditional data mining, we were able to demonstrate that vaccinations in Victoria did not drop as drastically as other interactions did. On the contrary, we observed a surge of influenza and pneumococcus vaccinations in 2020, contradicting research findings of similar studies conducted in different geographical areas.

The discipline of Process Mining [1] was born with the goal of designing automated data analysis techniques that could support the phases of the business process management lifecycle [2], especially those phases where data analysis plays a central role, e.g., process discovery and process monitoring. Over the past two decades, research in the area of process mining has generated a number of methodologies and software tools (henceforth, process mining techniques). Process mining techniques usually require process execution data, which is known as event logs. It is possible to distinguish two major families of process mining techniques [2]: operational techniques and tactical techniques. The former family encompasses techniques whose goal is to generate insights in real time during the process execution, e.g., estimating the probability of a negative event happening, or the likelihood of a specific process outcome. The latter family encompasses techniques whose goal is to help analysts discover, analyse, and periodically monitor the process execution in order to understand how the process is performed, what its weaknesses are, and how the process can be improved. Two of the most popular tactical process mining techniques, which we will refer to throughout this study, are: i) automated process discovery, which allows one to automatically discover a process model from event logs; and ii) variant analysis, which facilitates the analysis of behavioural differences between process variants (e.g., process instances with a positive outcome versus those with a negative outcome).

Although process mining was initially conceived to be applied within business contexts (such as banking, wholesaling, and manufacturing), research has shown that its value can be harnessed and reused in a multitude of different contexts, including healthcare [3, 4, 5, 6, 7, 8]. This study sits within the healthcare context, and it is set during the COVID-19 pandemic in Victoria (Australia). Our research goal was to identify differences between the patients' health services utilization pattern observed in 2020 (during the COVID-19 pandemic and mandatory lock-downs) and the one observed in the prior four years, 2016 to 2019. Given that health services are provided via the enactment of healthcare processes, process mining techniques are well suited to achieving our research goal, in particular process discovery and process variant analysis techniques. To this end, we analysed process execution data extracted from more than 100 general practice (GP) clinics in Victoria. This data included more than 30 million events capturing the GP healthcare processes of more than one million patients in Victoria, over a time-span of approximately five years.

The contributions of this study are on two fronts.

-From a medical perspective, the results of our analysis show that vaccinations in Victoria did not drop as drastically as other clinical interactions did. On the contrary, we observed a surge of influenza and pneumococcus vaccinations in 2020, contradicting research findings of similar studies conducted in different geographical areas in the equivalent seasonal periods [9, 10, 11] .

-From a process mining perspective, our study highlights the capabilities of state-of-the-art process mining techniques as well as their limitations when dealing with large data-sets recording highly variable processes, which is typical of healthcare processes. While we address some of these limitations by providing i) a method for fixing timestamp equivalence issues in process execution data, and ii) a method to identify boundaries of incomplete traces with unknown start events, we also draw directions for future research in the area of applied process mining in healthcare.

The remainder of the paper is structured as follows. In Section 2 we discuss related work and background. In Section 3, we describe the data, the analysis we ran, the challenges we faced, and the solution we adopted. In Section 4, we review the findings of the data analysis, providing a medical interpretation and considering their consequences. Lastly, Section 5 summarises our results and draws the conclusion.

The analysis of healthcare delivery from the process perspective has been a core aspect of health services research and redesign. However, until recently, the analysis of healthcare data using a process perspective has been challenging due to the limited availability of electronic health data (and/or its poor quality) and the lack of powerful methods to quickly make sense of it [12, 8] . The recent adoption of electronic health records alongside process-aware information systems [13] has generated vast amounts of healthcare data-both clinical and administrative-that can be leveraged to better understand healthcare processes. Several systematic reviews have highlighted the use and potential benefits of applying process mining methods to understand and improve healthcare processes [3, 14, 15, 8] , and research reports include uses of a range of process mining techniques such as automated process discovery, conformance checking, and process variant analysis.

Automated process discovery techniques [16, 17, 18, 19, 20] allow one to discover patients' clinical pathways from the recordings of their healthcare process activities captured by hospitals' and clinics' information systems [21, 12]. Conformance checking techniques [22, 23] allow one to automatically compare the observed healthcare process behaviour (in the form of process execution data) against a prescribed process behaviour to identify differences between actual and normative healthcare behaviour. The latter is usually provided in the form of an imperative process model or as a set of declarative process rules [24], which, rather than capturing the full process behaviour, may describe clinical guidelines. Process variant analysis techniques [25, 26, 27, 28] allow one to automatically compare two or more sets of healthcare process executions exhibiting different outcomes (or performance) to identify relevant differences between the executions that may have had an impact on the outcome or performance of the healthcare process. These types of techniques are applied to answer questions such as: what were the differences between the healthcare treatments provided by two different hospitals to patients having the same diagnosis?

One of the earliest applications of process mining in healthcare dates back to 2008, when Mans et al. [21] used Heuristics Miner [16] to extract insights from healthcare process data, from both clinical and administrative perspectives, including process handover analysis, by leveraging the process mining analytics platform ProM. Poelmans et al. [29] used a combination of process mining and data mining techniques to detect and analyse differences in the healthcare pathways of patients treated for breast cancer and how they would respond to different therapies. Lakshmanan et al. [30] proposed an approach for discovering patients' healthcare pathways and correlating them to their outcomes, combining techniques from process mining and data mining (including clustering and pattern mining). Suriadi et al. [31] applied process mining techniques to understand the differences in the treatments provided to patients suffering from chest pain at four South Australian hospitals. Partington et al. [32] applied process mining techniques to analyse the quality and the costs of the healthcare services provided to patients at one South Australian hospital. Roviani et al. [24] reported a case study on how to leverage declarative process mining techniques to identify divergences between clinical guidelines and the observed execution of clinical processes, at the urology department of the Isala hospital in the Netherlands. Leonardi et al. [4] proposed a method to abstract low-level process execution data (in the form of simple actions), turning it into high-level data that can be used for process mining applications. They validated their method by discovering process models from healthcare services, showing that their method improved the graphical representation of the healthcare processes and facilitated the clustering of similar process executions. Alvarez et al. [5] applied process mining techniques to discover process models capturing how healthcare professionals operate within emergency rooms, analysing them to identify opportunities for process improvement. Chen et al. [6] proposed a framework to extract high-level descriptions of medical treatment processes from electronic medical records by applying clustering techniques on doctor order set sequences. Their framework allows one to enrich the extracted process descriptions with additional information regarding the process performance (e.g., cost, length), providing support for improvement. Yang et al. [7] designed a process mining approach to automatically detect, in real time, process deviations from recommended clinical guidelines. They validated their approach on a set of pediatric trauma resuscitation procedures, demonstrating the effectiveness of their solution.

All these studies on process mining in healthcare represent only a fraction of the existing ones, but reporting on all of them would require a separate study and it would be outside the scope of this one. Hence, we refer the interested reader to the latest literature reviews [15, 14] .

Given the diversity of tools available and the applicability of process mining to healthcare, we used this perspective to understand changes in health services utilization patterns during the COVID-19 pandemic in Australia.

Since the early months of the COVID-19 pandemic, the main driver behind lockdowns and stay-at-home measures was the need to reduce face-to-face interactions, in order to prevent the virus from spreading uncontrollably and to avoid the subsequent increase in morbidity and mortality and the overwhelming of healthcare service providers.

In parallel, there were growing concerns that stay-at-home recommendations, lockdown measures, and the fear of becoming infected would have a deep impact on the provision of non-COVID-19 health services. Although the measures were heterogeneous, most governments across the world recommended some form of mobility reduction to lower the transmission rate of SARS-CoV-2, so the expectation was that most countries would be impacted, although to different extents. Several publications reported the observed effects on the utilization of health services. The World Stroke Organization reported a reduction in the number of patients being diagnosed with stroke, despite COVID-19 apparently increasing the risk of this disease, and attributed the change to reduced access to health services [33]. These findings were confirmed in the USA [34]. Similar effects were described for patients with acute myocardial infarction [35] and cancer [36, 37], among other conditions. This phenomenon was also observed for preventative care services such as cancer screening [38, 39, 40], and maternal and child health services [41]. In particular, there were growing concerns that a significant reduction in immunizations would result in an increase in vaccine-preventable conditions [42, 9].

The goal of this study was to analyse changes in health services utilization patterns during the 2020 COVID-19 pandemic and associated lock-downs in Victoria (Australia).

In this section, we introduce the data we analysed, discussing its characteristics and highlighting those that are the most critical in the context of this study. We describe what methodology and tools we used to analyse the data, what findings we uncovered, what challenges we faced during the analysis and how we addressed them. While we were able to solve some of these challenges, by proposing approaches that can be reused in different contexts, other challenges remain open or partially addressed and should be considered in future research work in the area of process mining.

Before discussing our analysis, we provide some formal definitions for the concepts we refer to throughout this section. While we contextualised these definitions within our study, we remark that these are well-known definitions and concepts in the area of process mining [1].

Definition 1. Event - An event $e$ captures the execution of an activity within a process instance. An event can be represented as a tuple $(x_1, x_2, \dots, x_n)$, where each element $x_i$ captures an attribute of the event, and at least three attributes are present: the process instance ID ($c$, event ID); the label of the activity the event refers to ($a$, event activity); and the timestamp ($t$, event timestamp). Additional attributes usually capture the process resource who executed the activity, customer information, etc. In the following, given an event $e$, we will refer to its three required attributes with the notation $e_{|c}$, $e_{|a}$, $e_{|t}$.

Definition 2. Event Log - An event log $L$ is a sequence of events $\langle e_1, e_2, \dots, e_n \rangle$, such that all the events are ordered by their timestamp. Formally, $\forall e_i \in L \mid i \in [1, n-1] \cap \mathbb{N} \Rightarrow e_{i|t} \leq e_{i+1|t}$.

Definition 3. Trace - Given an event log $L$, a trace of the event log $\tau \in L$ is a sequence of events, $\tau = \langle e_1, e_2, \dots, e_n \rangle$, such that all the events belong to the event log, all the events are ordered by their timestamp, and all the events have the same event ID attribute. Formally, $\forall e_i \in \tau \mid i \in [1, n-1] \cap \mathbb{N} \Rightarrow e_{i|c} = e_{i+1|c} \wedge e_{i|t} \leq e_{i+1|t}$. We note that, according to Definition 3, we can also consider an event log as a multiset of traces.
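Definitions 1 to 3 can be illustrated with a minimal Python sketch. The names `Event`, `build_log`, and `traces_of` are hypothetical and introduced only for illustration; they are not part of the tooling used in this study.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class Event(NamedTuple):
    """Definition 1: an event with its three required attributes."""
    case_id: str     # e|c, the process instance ID
    activity: str    # e|a, the activity label
    timestamp: date  # e|t, the event timestamp


def build_log(events):
    """Definition 2: a log is the sequence of events ordered by timestamp.

    sorted() is stable, so events with equal timestamps keep their input order.
    """
    return sorted(events, key=lambda e: e.timestamp)


def traces_of(log):
    """Definition 3: group the events of a log by case ID, preserving order."""
    traces = defaultdict(list)
    for event in log:
        traces[event.case_id].append(event)
    return dict(traces)


# Illustrative (made-up) events for two cases.
events = [
    Event("p2", "A", date(2020, 3, 5)),
    Event("p1", "A", date(2020, 3, 2)),
    Event("p1", "G", date(2020, 4, 1)),
]
log = build_log(events)
traces = traces_of(log)
```

Because each trace is built by scanning the timestamp-ordered log, the ordering constraint of Definition 3 holds by construction.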

Definition 4. Directly-follows Relation - Given an event log $L = \langle e_1, e_2, \dots, e_n \rangle$, we say that a directly-follows relation holds between any two events $e_i, e_j \in L$ if and only if $e_i$ and $e_j$ belong to the same trace and $j = i + 1$; in other words, the two events follow each other in (at least) one trace. We indicate such a relation with the notation $e_i \rightarrow e_j$. Formally, given $e_i, e_j \in L$, $e_i \rightarrow e_j \iff \exists \tau \in L \mid e_i, e_j \in \tau \wedge j = i + 1$. We extend the concept of directly-follows relation to the event activities, i.e., if $e_i \rightarrow e_j$ then we say that also $e_{i|a} \rightarrow e_{j|a}$ holds.

Definition 5. Directly-Follows Graph (DFG) - Given an event log $L$, its Directly-Follows Graph (DFG) is a directed graph $G = (N, E)$, where: $N$ is the set of nodes, $N = \{x \mid \exists e \in L \wedge e_{|a} = x\}$; and $E$ is the set of edges, $E = \{(x, y) \in N \times N \mid \exists e_1, e_2 \in L \wedge e_1 \rightarrow e_2 \wedge e_{1|a} = x \wedge e_{2|a} = y\}$. In other words, each node of the DFG represents a unique activity recorded in the event log, and each edge of the DFG represents a directly-follows relation between two activities, represented by the source node and target node of the edge.
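As an illustration of Definitions 4 and 5, the following Python sketch builds a DFG from a small log. It assumes events are given as `(case ID, activity, timestamp)` tuples already sorted by timestamp; the function name `directly_follows_graph` is hypothetical, and the edge frequencies it counts go beyond the plain edge set of Definition 5.

```python
from collections import Counter, defaultdict


def directly_follows_graph(log):
    """Return (nodes, edges) of the DFG of a timestamp-ordered log.

    nodes: set of activity labels observed in the log.
    edges: Counter mapping (source, target) activity pairs to the number of
           times target directly follows source within a trace.
    """
    # Rebuild the traces: activities of each case, in log order.
    traces = defaultdict(list)
    for case_id, activity, _timestamp in log:
        traces[case_id].append(activity)

    nodes = {activity for acts in traces.values() for activity in acts}
    edges = Counter()
    for acts in traces.values():
        # Consecutive events of the same trace are in directly-follows relation.
        for source, target in zip(acts, acts[1:]):
            edges[(source, target)] += 1
    return nodes, edges


# Illustrative log: integer timestamps stand in for dates.
log = [
    ("p1", "A", 1), ("p2", "A", 1), ("p1", "B", 2),
    ("p2", "C", 2), ("p1", "C", 3),
]
nodes, edges = directly_follows_graph(log)
```

Here the trace of `p1` is A, B, C and the trace of `p2` is A, C, so the DFG has three nodes and the edges (A, B), (B, C), and (A, C).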

A process is a sequence of events, activities, and decisions involving actors and data objects, triggered by a specific start event and leading to a specific end event (i.e., process outcome) that delivers value to a customer.

In this study we used the Patron dataset [43]. This dataset stores de-identified patient data from the Patron primary care data repository (extracted from consenting general practices), which has been created and is operated by the Department of General Practice at The University of Melbourne [43, 44]. This dataset is aggregated from more than 100 General Practice (GP) clinics in Victoria (Australia) and includes both administrative and clinical data, including all interactions between patients and their GPs, for more than one million patients. Access to the data was approved by the Melbourne Health Human Research Ethics Committee (HREC). The dataset is stored in a relational database, which includes the following six tables: Patient Details (Demographics); Patient Clinical Information; Medical History (Diagnoses); Patient Visits; Medications; Investigations (Pathology and Imaging). While the first three tables contain information regarding the patient and their clinical history, the last three tables contain information regarding the patient healthcare processes, respectively: information on patient visits to and interactions with their GP doctor(s); information on patient drug prescriptions; and information on patient pathology and imaging tests and results.

Looking at the latter three tables through the lens of Definition 1, an event ID corresponds to a patient ID, which identifies a unique patient accessing GP services across all the tables. An event activity corresponds to a medical activity the patient underwent. From the three tables capturing the patient healthcare process, it is possible to extract seven medical activities, which are reported in Table 1. These activities capture all the interactions of a patient with their GP, including the drugs they have been prescribed, their pathology and imaging tests and results, and their vaccinations. For simplicity, we will refer to each of these seven activities by using a letter A to G (following the mapping in Table 1). Lastly, the event timestamp corresponds to the time a medical activity was completed. We note that the Patron dataset does not record information regarding activities' lifecycle, e.g., the start and the completion of the activities, and that the timestamp granularity is at day-level (i.e., the smallest difference between timestamps is one day). Such a timestamp granularity is frequent in healthcare contexts, and (at least in our case) it is related to how the system records events into the database. Consequently, it was virtually impossible to infer a finer timestamp granularity (e.g., hours and minutes) or the duration of a single medical activity (e.g., how long a GP visit would last). In light of this, in the Patron dataset, a trace captures a unique patient accessing GP services over time, i.e., a process instance of the GP day-to-day healthcare process.

Table 1: Medical activities extracted from the Patron dataset.

Label  Activity
A      GP visit
B      GP records a measurement (e.g., blood pressure)
C      Patient is prescribed a medication
D      Patient is prescribed a medication refill
E      Patient is referred for a laboratory or imaging study (e.g., blood analysis)
F      Test results are recorded
G      One or more vaccinations are administered/recorded

The de-identified data was stored in a secure virtual machine. While this was a strict requirement for analysing the data, such a secure environment posed some challenges during the data analysis stage (discussed later in this section), mostly related to the fact that it did not allow for internet access.

To conduct our analysis, we adhered to the methodology proposed by van Eck et al. [45], adapting it to our context. The PM² methodology [45] has six stages: planning; data extraction; data processing; data mining and analysis; evaluation; and process improvement and support. We executed all the stages with the exception of the last one. Given that this study did not involve GP clinics and healthcare practitioners, we did not have the means to implement a redesigned process; besides, doing so would have been outside the scope of this study.

Following the PM² methodology, we started from the planning stage, which includes three steps: i) selecting the process to analyse; ii) determining the process analysis goal; and iii) assembling a team. Indirectly, the Patron dataset drove the process selection. Given that it captures ambulatory patients' interactions with their GPs, we selected for our analysis the GP day-to-day healthcare process. Our analysis objective was to identify differences between the GP healthcare services provided in 2020 (during the COVID-19 pandemic and mandatory lock-downs) and those observed in the prior four years, 2016 to 2019. The authors of this paper composed the research team, bringing expertise in process mining, data mining, and (medical) general practice.

Once the scope of our analysis was set, we moved to data extraction and processing. The Patron dataset, as mentioned above, already included all the data we required to analyse the selected process. The extraction of this data was performed outside this study, and it is not our contribution. However, healthcare process data rarely comes in the form of ready-to-use event logs [12, 32], which is the required data format for conducting a process mining analysis [1, 45], and the Patron dataset was no exception. During this stage, we focused on transforming the available data into an event log that could allow us to achieve our analysis goal. This required us to identify which entries of the relational database were suitable to be turned into events. As mentioned above, we extracted all the entries from three tables out of six, which captured the medical activities shown in Table 1. Each entry of a table included the patient ID and the timestamp; hence, the conceptual mapping from table entries to events was straightforward. We note that this mapping was facilitated by the existence of a very extensive data dictionary describing the Patron dataset, which is often not available.
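The conceptual mapping from table entries to events can be sketched as follows. This is only an illustration: the column names (`patient_id`, `date`, `is_refill`), the sample rows, and the assignment of activity labels to tables are assumptions made for the example, since the actual Patron schema is not given here.

```python
from datetime import date

# Hypothetical rows; the real Patron column names are not stated in the text.
visits = [
    {"patient_id": "p1", "date": date(2020, 3, 2)},
]
medications = [
    {"patient_id": "p1", "date": date(2020, 3, 2), "is_refill": False},
    {"patient_id": "p1", "date": date(2020, 4, 6), "is_refill": True},
]


def to_events(rows, label_of):
    """Turn table rows into (case ID, activity, timestamp) events.

    label_of maps a row to an activity label (A to G, see Table 1).
    """
    return [(row["patient_id"], label_of(row), row["date"]) for row in rows]


# Assumed labelling: a visit row becomes activity A; a medication row becomes
# C (first prescription) or D (refill).
events = (
    to_events(visits, lambda row: "A")
    + to_events(medications, lambda row: "D" if row["is_refill"] else "C")
)
```

Every event inherits its case ID from the patient ID and its timestamp from the row's date, which is why the mapping needs no further joins in this simplified setting.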

The extracted data captured a time-span of (almost) five years, from January 2016 to November 2020, but we reduced this time-span to keep only the data collected between 1 March and 30 November of the years 2016, 2017, 2018, 2019, and 2020. This choice was driven by three factors: i) our analysis goal (as mentioned above); ii) a key date in the international response to the COVID-19 pandemic; and iii) our latest access to data. Precisely, given that the World Health Organization (WHO) officially declared COVID-19 a pandemic on 11 March 2020, we set the start date of our analysis to 1 March, while we were forced to set the end date to 30 November, which was our latest available access to data. We also note that the two dates are closely related to the enforcement of the first lockdown restrictions in Victoria (16-March-2020) and the lifting of the last lockdown restrictions in Victoria (09-November-2020).
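The time-window restriction can be sketched in Python as below. The functions `in_window` and `split_by_year` are hypothetical helpers (the study itself used an ad-hoc R script); events are assumed to be `(case ID, activity, date)` tuples, and they are grouped per year because the later comparison is made across years.

```python
from collections import defaultdict
from datetime import date


def in_window(day):
    """True if day lies between 1 March and 30 November of a year in 2016-2020."""
    return 2016 <= day.year <= 2020 and (3, 1) <= (day.month, day.day) <= (11, 30)


def split_by_year(events):
    """Keep only in-window events and group them by calendar year."""
    per_year = defaultdict(list)
    for case_id, activity, day in events:
        if in_window(day):
            per_year[day.year].append((case_id, activity, day))
    return dict(per_year)


# Illustrative (made-up) events around the window boundaries.
events = [
    ("p1", "A", date(2020, 2, 29)),  # before 1 March: dropped
    ("p1", "A", date(2020, 3, 1)),   # first day of the window: kept
    ("p2", "G", date(2019, 11, 30)), # last day of the window: kept
    ("p2", "G", date(2019, 12, 1)),  # after 30 November: dropped
    ("p3", "A", date(2015, 5, 5)),   # outside 2016-2020: dropped
]
per_year = split_by_year(events)
```

Comparing `(month, day)` tuples keeps the boundary check independent of the year, so the same nine-month window applies to each of the five years.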

The data was extracted via an ad-hoc R script and saved in the form of CSV event logs. These CSV logs were then converted into the standard XES format via Apromore (academic version), which can be used without internet access and does not have limits on the amount of data to be processed, as opposed to Disco or Celonis. Alternatively, we could have converted the CSV event logs into XES format via ProM. Once we obtained the event logs from the Patron dataset, we proceeded to the data mining and analysis stage.

By looking at the data through the lens of Definitions 1, 2, and 3, we could summarise its characteristics as shown in Table 2.

The main log (labelled GP16-20) covered 45 months. The GP16-20 log counts almost 2.5 million traces (about 20 thousand short of that figure), of which 1.0 million (41.4%) are distinct, meaning no duplicate of that trace is present in the log. These traces include 31.8 million events, which, to the best of our knowledge, dwarf any of the real-life public logs used in automated process discovery research [20]. The trace length varies widely, with minimum, average, and maximum length of 1, 12, and 2317 events (respectively). Given that our goal was to compare the patients' behaviour in the months between March and November 2020 against the patients' behaviour in the same timeframe of the past four years, we divided the log into five sublogs (namely, GP20, GP19, GP18, GP17, GP16), each of them capturing the 9-month timeframe in one of the five years under analysis. Such an approach is common for performing process behavioural comparison, known in the area of process mining as process variant analysis [27]. Looking at Table 2, we notice that dividing the GP16-20 log into five sublogs does not much affect the variety of the process behaviour. Although the absolute number of events and traces reduces, each of the five (sub)logs maintains remarkable characteristics; i.e., 5.1 million (GP20 log) to 6.9 million (GP19) events, and 401 thousand (GP20 log) to 520 thousand (GP17 log) traces (on average, 41% distinct). As a comparison, the largest real-life event log used in the series of business process intelligence challenges had 1.6 million events. By analysing the characteristics of these five logs, we can immediately draw some initial observations.

Observation 1. In 2020, there was an average drop of 22.8% of patients accessing GP clinic healthcare services, compared to 2016-19. This is captured by the decrease of the total number of traces observed in the GP20 log, 401,370 as opposed to an average of 520,304 across the previous years (with a minimum of 507,075 and a maximum of 531,618).

Observation 2. In 2020, GP clinic healthcare processes maintained their high-level overarching variety. This is captured by the almost constant percentage of distinct traces, 41.6% in 2020 and 41.4% on average over 2016-19. In other words, on average, each distinct healthcare process instance was observed a little more than two times.
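The arithmetic behind Observations 1 and 2 can be checked with a short Python sketch. The figures are those quoted in the text; the variable names are hypothetical, and the interpretation of the distinct-trace percentage as "distinct traces over total traces" is an assumption.

```python
# Observation 1: relative drop in the number of traces in 2020.
traces_2020 = 401_370
avg_traces_2016_19 = 520_304
drop = 1 - traces_2020 / avg_traces_2016_19

# Observation 2: if a fraction d of the traces is distinct, each distinct
# trace is observed 1 / d times on average (assumed interpretation).
distinct_2020 = 0.416
distinct_2016_19 = 0.414
repeats_2020 = 1 / distinct_2020
repeats_2016_19 = 1 / distinct_2016_19

print(f"drop in traces: {drop:.2%}")
print(f"average repetitions per distinct trace (2020): {repeats_2020:.2f}")
print(f"average repetitions per distinct trace (2016-19): {repeats_2016_19:.2f}")
```

Computed directly from the two totals, the drop falls between 22.8% and 22.9%, in line with the reported figure, and both repetition values are slightly above two.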

Observation 1 was expected, given that a strict lockdown was enforced in Victoria from 16-March to 21-June and from 04-July to 09-Nov, which possibly deterred patients from accessing healthcare for what they considered minor issues. Observation 2 was surprising: intuitively, we would expect the combination of lockdown and pandemic to foster standardization in healthcare processes (i.e., less variability).

We explored the distribution of the activities over time, their frequencies, and how they varied over the five years. This information is shown in Figure 1 . Figure 1a and 1b show the absolute and relative frequencies of each of the seven activities over the five years; Figure 1c to 1i show the absolute frequency of each activity over time, month by month; and Figure 1j shows the changes in absolute frequency of each of the activities in 2020, compared to the previous four years. From this data, we can observe the following.

Observation 3. In 2020, the relative frequency of activity B (GP records a measurement) dropped to 9.0% from an average of 12.1%. Although this seems a small variation, we note that in 2019, 2018, 2017, and 2016, the relative frequency of activity B was remarkably stable at 12.2%, 12.0%, 12.1%, and 12.2% (respectively).

Observation 4. In 2020, the relative frequency of activity D (GP prescribes a refill) increased to 9.2% from an average of 5.0%. Also in this case, we note that in 2019, 2018, 2017, and 2016, the relative frequency of activity D was somewhat stable at 5.8%, 5.1%, 4.6%, and 4.4% (respectively).

Observation 5. In 2020, the variation in the absolute frequency of activity G (vaccinations are administered/recorded) is remarkably low: it decreased by only 12.8% and 6.1% compared to 2019 and 2018, and it increased by 7.4% and 16.9% compared to 2017 and 2016. Furthermore, the absolute frequency of activity G is concentrated in the months of March and April, in contrast with the other years, where activity G is mostly observed in April and May.

Observation 3 can be straightforwardly interpreted. Given that activity B represents a GP taking and recording a measurement of the patient (e.g., measuring and recording the patient's blood pressure), its decrease can relate to the actual implementation of safety measures: GP doctors may have avoided interacting with patients unless strictly necessary.

Observation 4 represents an increase in medication refills. In particular, looking at Figure 1f, which captures the activity D distribution over the nine months, we note a clear spike in March, April, June, July, and September. This can relate to an overstocking of drugs by patients who could not risk running out of their medications. We recall that, during the early COVID-19 pandemic, overstocking was a phenomenon observed across a variety of products, from food to toilet paper, known also as panic buying [46]. However, taking into account the changes of absolute frequency for activity D (see Figure 1j), we can observe that drug prescriptions have increased steadily in the past four years, with an average increase of 14.2%. Given that a similar trend can also be observed for activity C (capturing a first-time drug prescription), we cannot conclude that the increase observed in activity D derived exclusively from the COVID-19 pandemic context.

Lastly, Observation 5 is probably the most interesting one, also because it is in contrast with research findings of similar studies conducted in different geographical areas during the equivalent seasons [9, 10]. The data clearly shows that vaccinations were not substantially impacted in 2020, with a decrease in absolute frequency that is lower than that of other activities (see Figure 1j). Leaving aside medication-related activities (i.e., activities C and D), other activities reported an absolute frequency drop of between 23.9% (on average for activity A, GP visit) and 43.2% (on average for activity B, measurement). In contrast, activity G (vaccinations) reported a maximum absolute frequency drop of 12.8% (compared to 2019) and an average drop of 1.3%. If we consider this in light of the total drop of the activities observed in 2020 (23.6% on average, see Table 2, total events), the drop of vaccination activities is well below the average. In addition, there is a noticeable shift in the vaccination timeline for 2020, bringing the vaccinations forward by one month. Observation 5 set a direction for additional analysis, which led us to additional findings that we will discuss in depth in Section 4.

Until now, we have described and analysed the data in general terms. Although we approached it from a process perspective, identifying the process activities and their execution over time, we have neither discussed nor analysed the process behaviour, i.e., how such activities follow one another, and what their execution leads to. To analyse the process behaviour, process mining methodologies and tools often rely on directly-follows relations [47] (see Definition 4), especially for automated discovery of process models [48] and for process variant analysis [25, 49, 26].

Recalling the event log definition (Definition 2), given that the order of the events in an event log is imposed by the order of their timestamps, incorrect or imprecise timestamps can have a significant (negative) impact on the identification of directly-follows relations and, consequently, on the output of process mining tools that rely on directly-follows relations. This is a well-known problem in the field of process mining [50, 51], especially in healthcare [12], where activities are documented manually. We recall that also in our case the event timestamps had a day-level granularity.

To give an idea of the issue, let us consider a patient visiting a GP doctor (activity A); the doctor measures the blood pressure of the patient (activity B), and then prescribes a medication for the first time (activity C). The order of the activities is ⟨A, B, C⟩. However, they will all be recorded in the information systems with the same timestamp (i.e., the day of the visit), and not necessarily in the order in which they were executed. For example, the fact that the patient has visited the doctor may be recorded at the end of a consultation, and the doctor may log activities B and C after they really occurred (inputting them manually into the software). As a result, the actual recording may read as follows: ⟨C, A, B⟩. The more activities there are to record, and the more users are involved in their (manual) logging, the greater the number of errors.

Past research studies in process mining have addressed the problem of cleaning (or repairing) imprecise timestamps and timestamp errors [52, 53, 54, 55]. However, three of the proposed methods require a reference process model as input [52, 53, 54], while the method of Conforti et al. [55] requires that at least a subset of the events recorded in the event log is not affected by imprecise timestamps. In our case, we could not rely on any of these existing methods, because their requirements were not met.

While recent work [8] called for improving the quality of the data captured by healthcare information systems, with the goal of fixing the problem at its root, we would like to highlight the opportunity (and the need) for additional research addressing the automated repair and cleaning of event log data errors, especially timestamps.

To continue our analysis and ensure the most reliable outcome, we devised an effective solution to deal with the imprecise timestamps. We imposed a standard order among the activities (matching the alphabetical order of their labels, see Table 1), and we reordered the events in the event log based on two attribute values: the event timestamp and the event activity. The latter attribute is used as a tie-breaker when timestamps are equal.

For example, let us consider the following sequence of events ⟨e1, e2, e3, e4, e5⟩, and let us assume that the five events (e1 to e5) all have the same timestamp and that the corresponding sequence of activities is ⟨D, A, G, F, E⟩. In such a case, we would reorder the events as ⟨e2, e1, e5, e4, e3⟩, yielding the sequence of activities ⟨A, D, E, F, G⟩. Note that the event IDs do not play a role in the ordering. Events having the same ID will be ordered correctly, while events having different IDs would not be affected by the reordering.
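The reordering in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Event` record and its field names (`case_id`, `activity`, `timestamp`) are hypothetical, and the built-in `sorted` is used for brevity, without any attempt to reproduce the implementation used in the study.

```python
from datetime import date
from typing import List, NamedTuple


class Event(NamedTuple):
    case_id: str     # hypothetical name for the "event ID" of the text
    activity: str    # one of the labels "A" to "G"
    timestamp: date  # day-level granularity, as in the analysed log


def reorder(events: List[Event]) -> List[Event]:
    # Order by day first; on equal days the activity label breaks the tie,
    # which matches the alphabetical standard order of the labels.
    return sorted(events, key=lambda e: (e.timestamp, e.activity))


# Five events recorded on the same (arbitrary) day as D, A, G, F, E.
same_day = date(2020, 3, 2)
log = [Event("p1", label, same_day) for label in "DAGFE"]
reordered = [e.activity for e in reorder(log)]
print(reordered)
```

Because the case identifier is not part of the sort key, events of different cases keep their relative order whenever their timestamps differ, and each case is put in the standard order on days where timestamps coincide.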

Our solution is based on the idea that, in most scenarios (and especially in healthcare), certain activities have logical order constraints, e.g., a GP doctor cannot take and record a patient's blood pressure (activity B) if the patient is not attending a visit (activity A). Yet, our solution has limitations, given that not all the activities have a logical order constraint, e.g., a patient may be administered a vaccine (activity G) either before or after she is prescribed a medication (activity C or D). In fact, there are only three strict logical order constraints in our case: A before B, C before D, and E before F. The order we imposed satisfies the three constraints, but also enforces others. We note that, while enforcing additional constraints may distort the factual reality, it homogenises the data, allowing for a correct and fair comparison.

To describe the effects of our solution, let us consider two traces ⟨A, B, G⟩ and ⟨A, G, B⟩, and let us assume that all the events within each trace have the same timestamp. Comparing the two traces as they are would tell us that they are different, but according to the data they are not (i.e., the timestamps are equal, so any order is valid in principle). Enforcing a standard order over the activities as a tie-breaker on timestamp equality ultimately leads to data standardization and a correct interpretation.

Our approach for fixing imprecise timestamps due to high-level granularity can be generalized to virtually any other context when the objective of the process analysis is the comparison of process variants, so it should not be considered as an ad-hoc approach for our specific scenario. However, we acknowledge that to define the logical order on the activities, the input of domain experts may be required. In our case, we relied on the experience in general practice medicine of the co-authors Dr Capurro and Dr Manski-Nankervis.

Lastly, we note that the time complexity of our approach is linear on the number of events contained in the event log, making it not only effective but also efficient.

Once we solved the problem of imprecise timestamps, we focused on the process behaviour, analysing how the process activities follow one another and what their execution leads to. However, we note that our healthcare process instances do not perfectly fit the traditional definition of process [2] (see Definition 6), because they miss both a specific start event and a specific end event, making these process instances unbounded.

In our context, a patient may consult their GP doctor to discuss several health issues at once, each of which may lead to different outcomes, and some of which may never reach an outcome (e.g., a chronic disease, which requires indefinite monitoring), forcing the patient into indefinite follow-ups. At the same time, while following health issues up, new health issues may arise. As one can see, the GP day-to-day healthcare process is conceptually unbounded. In particular, when we look at the activities of a patient within a specific timeframe, the first activity we observe is not necessarily the one that started their GP day-to-day healthcare process, and the only way to determine that with 100% accuracy would be to have a timeframe at least equal to the patient's age, which is an unrealistic requirement for most patients.

Existing process mining techniques for automated process discovery and variant analysis (e.g., [18, 17, 25, 28]) are not very effective when dealing with unbounded process instances, because by design they would implicitly (and erroneously, in our context) assume the first event of a trace in the input event log to be the start of the process instance, and the last event of a trace to be the end of the process instance. We can, however, identify the most appropriate start and end events given a process instance. This can be achieved by narrowing down the scope of an unbounded process instance, for example, by focusing on a single GP visit or a single health issue/procedure. To do that, we devised an algorithm that leverages the knowledge of domain experts (once again, the co-authors Dr Capurro and Dr Manski-Nankervis).

We started from the assumption that a process instance should begin with a visit to the GP doctor (i.e., activity A), effectively making activity A the only possible start event of a trace. Any subsequent activity other than activity A (i.e., activities B to G) is assumed to be a follow-up of the initial visit to the GP doctor. However, when a second activity A is observed for the same process instance, we have to distinguish two cases: i) the new activity A is a follow-up of the past activities; ii) the new activity A is not related to the past activities (i.e., this would trigger a new process instance). We distinguished the two cases on a time basis. Precisely, if the new activity A is more than six months away from the first observed activity A and more than one month away from the last observed activity of the current process instance, we are in case ii); otherwise, we are in case i). These time thresholds were set empirically, following the advice of the domain experts.

Algorithm 1 describes a generalisation of our approach to generate traces from a given event log containing unbounded process instances. The algorithm takes as input the log (L), a set of allowed start activities (α), in our case containing only activity A, and two time thresholds ∆0 and ∆n, in our case six months and one month, respectively. Three data structures are initialised (see lines 1 to 3): i) a map linking an event ID to its trace (Π), representing the collection of traces to output; ii) a map linking an event ID to the timestamp of the first event in the corresponding trace (T0); and iii) a map linking an event ID to the timestamp of the last observed event in the corresponding trace (Tn). Then, we read the log (L) one event at a time, starting from its first event (e, line 4).

If the event ID (e|c) is not yet in the map Π and the event activity (e|a) is in the set α, we create a new empty trace (τ), we append e to τ, we add the event ID and the trace to the map Π, and we save the timestamp of e in T0 and in Tn (lines 5 to 10).

If e|c is already mapped in Π and e|a is not an allowed start activity (line 12), we retrieve the trace linked to the event ID (Π(e|c)) and we append e to that trace (line 13). Then, we update the timestamp information by overwriting the last observed event timestamp in the map Tn (line 14).

If e|c is already mapped in Π and e|a is an allowed start activity (line 12), we distinguish the two possible cases mentioned above. In case i), i.e., if the time elapsed between T0(e|c) and e|t is less than or equal to ∆0, or the time elapsed between Tn(e|c) and e|t is less than or equal to ∆n, we append e to the already existing trace Π(e|c) (as just described above, see lines 16 to 18). Otherwise, in case ii), we create a new event ID (that is not present in the event log), we link the new event ID to the existing trace in Π that is mapped to e|c, we create a new empty trace (τ), we append e to τ, we add the event ID and the trace to the map Π, and we save the timestamp of e in T0 and in Tn (lines 21 to 26).

Once all the events in the event log have been read, Algorithm 1 returns the map of event IDs and the corresponding traces.

Assuming that accessing the maps is a constant-time operation, as is the case in modern object-oriented programming languages, Algorithm 1 has a linear time complexity in the number of events contained in the event log.
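As an illustration of the steps described above, the following Python sketch follows the same structure: three dictionaries play the roles of Π, T0 and Tn, and the log is read once. It is a sketch under stated assumptions rather than the authors' implementation: events are modelled as plain `(case ID, activity, day)` tuples, the six-month and one-month thresholds are approximated as 183 and 31 days, and fresh identifiers are built with a hypothetical `#n` suffix that is assumed not to occur in the real identifiers.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, Set, Tuple

Event = Tuple[str, str, date]  # (case ID, activity label, day)


def split_traces(log: Iterable[Event], start_activities: Set[str],
                 delta_0: timedelta, delta_n: timedelta) -> Dict[str, List[Event]]:
    traces: Dict[str, List[Event]] = {}  # plays the role of the map Π
    first_ts: Dict[str, date] = {}       # plays the role of T0
    last_ts: Dict[str, date] = {}        # plays the role of Tn
    fresh = 0                            # counter for hypothetical new IDs
    for event in log:
        case, activity, ts = event
        if case not in traces:
            if activity not in start_activities:
                continue  # filtered: not preceded by a start activity
            traces[case] = [event]
            first_ts[case] = last_ts[case] = ts
        elif activity not in start_activities:
            traces[case].append(event)
            last_ts[case] = ts
        elif ts - first_ts[case] <= delta_0 or ts - last_ts[case] <= delta_n:
            # Case i): the new start activity is treated as a follow-up.
            traces[case].append(event)
            last_ts[case] = ts
        else:
            # Case ii): close the current trace under a fresh ID, start a new one.
            fresh += 1
            traces[f"{case}#{fresh}"] = traces[case]
            traces[case] = [event]
            first_ts[case] = last_ts[case] = ts
    return traces


log = [
    ("p1", "B", date(2020, 1, 1)),   # dropped: no visit (A) seen yet
    ("p1", "A", date(2020, 1, 5)),
    ("p1", "B", date(2020, 1, 5)),
    ("p1", "A", date(2020, 1, 20)),  # follow-up of the visit on 5 January
    ("p1", "A", date(2020, 9, 1)),   # beyond both thresholds: new trace
    ("p1", "G", date(2020, 9, 1)),
]
result = split_traces(log, {"A"}, timedelta(days=183), timedelta(days=31))
for trace_id, trace in result.items():
    print(trace_id, [activity for _, activity, _ in trace])
```

Each event triggers a constant number of dictionary operations, which is where the linear bound stated above comes from.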

We note that the information shown in Table 2 is the one obtained after the execution of Algorithm 1. The column filtered events reports the number of events that were removed by applying Algorithm 1, i.e., events that are not preceded by an activity A. On average, we removed 3.6% of events from the data, which is a negligible amount.

At this stage, we can finally turn our attention to the process behaviour analysis, by leveraging process mining techniques [45]. Since we are interested in identifying process behavioural differences over five different timeframes (each captured in an event log), the appropriate process mining techniques are in the class of automated process discovery [20] and process variant analysis [27]. Automated process discovery techniques receive an event log as input and automatically produce a process model, which is a graphical representation of the process behaviour, such as a workflow chart, a Petri net, or a BPMN model. By looking at different process models, it is possible to detect behavioural differences. On the other hand, process variant analysis techniques receive two event logs and automatically produce an artifact that highlights the process behavioural differences. Differences captured by variant analysis techniques are either at control-flow level (i.e., process behavioural differences in terms of executed activities) or at performance level (i.e., differences in the execution/hand-over times of/between the process activities). From each class of techniques, we selected three state-of-the-art tools, based on the evaluations of previous studies [20, 26, 28]: Fodina [17], Inductive Miner [19], Split Miner [18], and their metaheuristics optimization variants [48] (for automated process discovery); and process comparator [25], fingerprints-based variant analysis [26], and variant analysis via declarative rules [28] (for variant analysis). We attempted to discover a process model from each of the five event logs, by running each of the three automated process discovery tools. The models we obtained were not structurally complex (i.e., they were not spaghetti-models), but they showed that any behaviour was allowed, with minimal constraints and many cyclical patterns. This finding highlights that the GP day-to-day healthcare processes have a behavioural degree of freedom that is not comparable with most business processes, allowing a vast amount of different behaviour to be executed and repeated over time.

As an example, Figure 2a and 2c show the models discovered by Inductive Miner and Split Miner from the GP20 log, while Figure 2a and 2b show the models discovered by the same techniques but from the GP19 log. The process models discovered by Split Miner are almost identical (with a small variation involving activity G), and they allow for much repetitive and variable behaviour over the set of seven activities. Those discovered by Inductive Miner are in fact identical (we discovered the same model from the two event logs), and they allow for an even wider range of behaviour.

Figure 2: Automatically discovered process models [18, 19], from the GP20 and GP19 event logs. (a) Inductive Miner [19] process model (GP19 and GP20 event logs); (b) Split Miner [18] process model (GP19 event log); (c) Split Miner [18] process model (GP20 event log).

Subsequently, we ran the variant analysis. In this case, we experienced some technical limitations of the existing tools, which future research should consider addressing. In particular, the tool of Bolt et al. [25] was not able to process the input data, yielding exceptions. The tool of Taymouri et al. [26] either was unable to provide an output within a two-hour timeout, or was not able to identify statistically significant differences. Lastly, the tool of Cecconi et al. [28] was the only one that returned a valid output within the timeout; however, that was possible only by applying its embedded filtering algorithm, which allowed us to focus only on the most frequent process behaviour.
The top-10 differences identified are reported in Table 3. For instance, the output of the process variant analysis of the GP20 and GP19 logs supports and refines Observation 3 and Observation 4 (see Section 3.2), highlighting a decrease in the number of observations of activity B in the healthcare processes of 2020 (see Table 3, rows 5, 7 and 10), and an increase in the number of observations of activity D in the healthcare processes of 2020 (see Table 3, rows 1-4 and row 9). Similar results were obtained when comparing the data from 2020 against the data from 2018, 2017, and 2016.

Given that the selected process mining techniques struggled to deal with the variety of behaviour captured in the logs under analysis, we took a step back, and decided to review the behaviour recorded in each of the five event logs by visualising their directly-follows graphs (DFGs).

For reasons of space, clarity, and simplicity, we report the DFG of only two event logs (GP20 and GP19) and in their matrix form, where each matrix row (and column) represents a node of the DFG, i.e., an activity; and each cell of the matrix captures the frequency of the edge between the two nodes, i.e., how many times we observe in the event log a directly-follows relation between two activities.

Table 5: Directly-follows graph in matrix form (2019 process)

          A        B        C        D        E        F        G
A   1190430   449745   299143    70762   337449   122043   121266
B    219968   107579    78582    13069    74732   221429    12921
C     46754    11415       39   294829    26567     3681     4079
D    179501    25863      145      456    88901    11993    15318
E    109527    72731     1078      519    19149   412086    17438
F    485141   120865    16046     6182    88711   556912     5921
G    108491    19419       77      101     7483    15293      185

Tables 4 and 5 report the DFGs in matrix form of the event logs GP20 and GP19 (respectively). We note that any directly-follows relation can be observed in the two DFGs. Although some of them are rare (e.g., C → C, with a frequency in the order of hundreds), the vast majority can be observed with a frequency in the order of thousands. The DFGs of the other event logs (GP18, GP17, and GP16) are very similar to the two we reported here, and this clearly highlights that any behaviour is allowed in the process under analysis, across the five event logs.
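To make the matrix form concrete, the sketch below counts directly-follows relations over a handful of traces and lays them out as a matrix. The traces are invented toy data used only to show the mechanics, and the function names are illustrative, not those of any process mining tool.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def directly_follows(traces: Iterable[Sequence[str]]) -> Counter:
    # Count every pair of consecutive activities within each trace.
    counts: Counter = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts


def as_matrix(counts: Counter, activities: Sequence[str]) -> List[List[int]]:
    # Cell [i][j] holds how often activities[j] directly follows activities[i].
    return [[counts[(a, b)] for b in activities] for a in activities]


toy_traces: List[Tuple[str, ...]] = [
    ("A", "B", "G"),
    ("A", "B"),
    ("A", "C", "D"),
]
dfg = directly_follows(toy_traces)
matrix = as_matrix(dfg, "ABCDEFG")
for label, row in zip("ABCDEFG", matrix):
    print(label, row)
```

Pairs that never occur simply read as zero, since a `Counter` returns 0 for missing keys.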

The extent of behavioural variability we are observing puts state-of-the-art process mining techniques under major strain, to the point of rendering them ineffective. An alternative would be to remove some behaviour by applying filtering techniques [56, 57, 58]; however, we recall that all the automated process discovery algorithms that we used already apply a filter [17, 19, 18], as do two of the three variant analysis approaches [25, 28].

While in a business context some infrequent behaviour may be a violation of compliance or internal business rules, in our context, all behaviour is actually allowed. As such, we are not interested in removing behaviour, but rather in changing our focus and considering only a portion of behaviour that can be fruitfully analysed.

With that in mind, we considered only the most frequent behaviour. Table 6 reports the top-20 most frequent traces that we could observe in each of the five event logs. Scanning carefully through Table 6, we notice that in 2020, traces containing activity G were more frequent than in other years (for clarity, we reported these traces in Table 7). In 2020, not only were the traces containing activity G more frequent, but they also accounted for 25% of the most frequent behaviour (5 traces out of 20). This finding is remarkable and, when paired with Observation 5 (discussed in Section 3.5), clearly hints at a variation in the behaviour involving vaccinations during 2020. A similar reasoning can also be applied to traces containing activity D (see ⟨A, D⟩, across the five logs). Further investigation of the most frequent process behaviour may reveal several additional differences, but within the scope of this study, we decided to investigate the specific behavioural difference related to activity G and its traces among the top-20.
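The selection of the most frequent traces, and the share of them that contain a given activity, can be sketched as follows. The traces are invented toy data and the helper names are hypothetical; the snippet only illustrates the kind of frequency count behind a top-k table, not the tooling used in the study.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Variant = Tuple[str, ...]


def top_variants(traces: Iterable[Sequence[str]], k: int) -> List[Tuple[Variant, int]]:
    # A variant is the sequence of activity labels of a trace.
    return Counter(tuple(t) for t in traces).most_common(k)


def share_containing(top: List[Tuple[Variant, int]], activity: str) -> float:
    # Fraction of the listed variants that include the given activity.
    return sum(1 for variant, _ in top if activity in variant) / len(top)


toy_traces = (
    [["A"]] * 5
    + [["A", "D"]] * 3
    + [["A", "G"]] * 2
    + [["A", "B", "G"]]
)
top3 = top_variants(toy_traces, 3)
share_g = share_containing(top3, "G")
print(top3)
print(share_g)
```

With the real logs, the same count with k = 20 would correspond to the "5 traces out of 20" figure discussed above.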

Our process mining analysis highlights two limitations of the state-of-the-art process mining techniques that we used in this study:

1. Process mining techniques for automated process discovery and process variant analysis suffer from scalability and/or quality issues when they deal with too much and too variable behaviour.

2. Automated process discovery techniques try to capture as much behaviour as possible from the event log, filtering infrequent behaviour only when it is strictly necessary to either simplify the process model or increase its accuracy. However, depending on the context, one may be interested in capturing very little process behaviour from the event log -requiring special filtering techniques.

Limitation 1 is ground for future research directions and studies, while limitation 2, at the moment, can be addressed manually, by applying ad-hoc filters to the process behaviour recorded in the event logs (as we did). The best ad-hoc filters must be identified by domain experts, often on a trial-and-error basis, and applied either via ProM plugins or commercial tools such as Celonis, Disco, or Apromore (which we used). Future process mining techniques should allow the user to automatically design such filters, without relying on the knowledge of domain experts, for instance, by automatically analysing the outputs of a set of process mining techniques (e.g., both automated process discovery and variant analysis), as we did manually. Once we narrowed our attention down to only the most frequent traces containing a vaccination event (activity G), we could easily discover a clear and simple process model for each of the five years and identify the differences. Figures 3a to 3c show the process models. We note that in 2016, 2017, and 2018, the most frequent vaccination process is the same (Figure 3a); changes in this process are minimal in 2019 (Figure 3b), and substantial in 2020 (Figure 3c).

Lastly, we analysed each of the process traces by looking into the distribution of the executed activities over time; we report this information in Figures 4a to 4f. At this point, it is evident that the differences in behaviour were not only in terms of how the activities were executed (i.e., their order and frequencies) but also when. We summarised these findings in the following observation.

Observation 6. In 2020, we can observe a clear (left-)shift (i.e., towards March) and early peak in the distribution of the activities executed within the most frequent behaviour of the vaccination process, as well as a different trend when compared to the past four years, which holds for all the activities involved in the vaccination process. Furthermore, the most frequent vaccination process in 2020 was more complex than in the previous four years, allowing for more behavioural variants, which frequently required additional activities (activities A and B).

The observed change can be explained by the recommendation that Australians receive their influenza vaccinations before the normal season (April-May). This recommendation was broadly advertised to minimize a possible double hit to the healthcare system: an epidemic of SARS-CoV-2 in addition to the usual Fall/Winter influenza season. In the next section, we will discuss this observation in more depth from a medical perspective.

Our process mining analysis provided us with a lead to follow. It allowed us to identify relevant differences in the behaviour of the patients in 2020, in particular, when considering the vaccination process. Accordingly, we refined our original research question into two sub-questions.

RQ1. What type of vaccines have driven the frequency increase in the most frequent vaccination process traces?

RQ2. What are the differences between vaccination behaviour of different age classes, i.e., children (0-17 years), adults (18-64 years), and elderly people (65+)?

To answer these research questions, process mining techniques can provide little help. Process mining and, more generally, process science and process thinking can be a lighthouse in an ocean of data. However, it is difficult to dig deeper by relying only on process mining techniques, given that at the current stage they do not take into account the rich perspectives surrounding the process behaviour. In fact, to the best of our knowledge, there are no reliable and effective process mining techniques, in the area of automated process discovery [20] and process variant analysis [27], that give a global picture of the process, taking into account all the additional data recorded in the event attributes available in the event log. For instance, to answer our research questions, the crucial event attributes are patient age and vaccine type, but the integration of this information in a process model is not a trivial problem. Besides, we note that our refined research questions are data mining oriented. In fact, process mining and data mining are complementary, and future research directions should leverage this relation between the two disciplines to bring them together.

To continue with our analysis, we extracted all the data regarding vaccination events (activity G) from the original dataset (GP16-20 event log), and we analysed the different vaccine types and the immunity they provide. Table 8 shows a mapping between the vaccines and the labels we will use to simplify the presentation of the data. Figure 5a shows the absolute number of vaccines we observed in 2020, grouped by the provided immunisation (see Table 8). Figures 5b to 5e report the change in the absolute number of vaccines observed in 2020, when compared to the past four years. We compared the vaccination count by grouping the patients by age, specifically: all ages (Figure 5b); young people (0 to 17 years old, Figure 5c); adults (18 to 64 years old, Figure 5d); elderly people (65+ years old, Figure 5e). From the data, we can draw the following observation.

Observation 7. In 2020, there was a surge of influenza (V8) and pneumococcus (V13) vaccinations (see Figure 5, vaccines V8 and V13), predominant in adults and elderly people, and in contrast with a decrease of these vaccinations for young people (see Figures 5b to 5e, V8 and V13). The increase is even more startling when we consider that all the other vaccines suffered a decrease of approximately 50% (on average). As in our discussion of Observation 6, the surge in influenza vaccinations can be linked to the public health campaign aiming at increasing the proportion of patients receiving the influenza vaccine to reduce the size of the seasonal peak of influenza infections and hospital admissions, in anticipation of the potential overload of the healthcare system by COVID-19 patients. Similarly, a larger proportion of older adults might have received their pneumococcal vaccines concomitantly. Furthermore, we were able to observe that vaccines associated with international travel requirements (e.g., Yellow Fever, Japanese Encephalitis) practically disappeared, probably as a result of international border closures.
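The comparison behind this observation amounts to counting vaccination events per year, age class, and vaccine label, and then taking differences between years. The sketch below shows that bookkeeping on a few invented records; the record layout `(year, age, vaccine label)` and the helper names are assumptions made for illustration, and the numbers are not results of the study. Only the age boundaries (0-17, 18-64, 65+) and the labels V8 and V13 come from the text.

```python
from collections import Counter
from typing import Iterable, Tuple

Record = Tuple[int, int, str]  # (year, patient age, vaccine label), hypothetical layout


def age_class(age: int) -> str:
    # Age classes as defined in RQ2: 0-17, 18-64, 65+.
    if age <= 17:
        return "young"
    if age <= 64:
        return "adults"
    return "elderly"


def count_by_group(records: Iterable[Record]) -> Counter:
    return Counter((year, age_class(age), label) for year, age, label in records)


def absolute_change(counts: Counter, group: str, label: str,
                    year: int, baseline_year: int) -> int:
    # Difference in the absolute number of vaccinations between two years.
    return counts[(year, group, label)] - counts[(baseline_year, group, label)]


# Invented toy records, for illustration only.
records = (
    [(2019, 70, "V8")] * 2 + [(2020, 70, "V8")] * 3
    + [(2019, 10, "V8")] * 2 + [(2020, 10, "V8")] * 1
)
counts = count_by_group(records)
elderly_change = absolute_change(counts, "elderly", "V8", 2020, 2019)
young_change = absolute_change(counts, "young", "V8", 2020, 2019)
print(elderly_change, young_change)
```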

In the next section, we explore these implications in more depth from a medical point of view.

In this section, we described how we have analysed the data and the observations we could draw from it.

To analyse the data, we followed a well-known methodology, PM² [45]. We note that the latter was designed to be applied in a business context. However, we argue that this does not pose a threat to its applicability in healthcare. In fact, we were able to adhere to its stages from start to end, with the exception of the process improvement stage, which was out of the scope of this study.

To execute the process mining analysis, we relied on a subset of the existing state-of-the-art process mining techniques for automated process discovery and variant analysis, which we selected according to the findings of the most recent literature reviews. While, in theory, applying other techniques may have yielded different or better results, we recall that the process mining techniques we used were the latest and the most reliable.

The observations reported in this study cannot be generalised to Australia, nor to the state of Victoria. However, we note that the analysed data captured the behaviour of approximately 400 thousand patients (per year), which account for almost 6% of the entire population of the state of Victoria, a remarkable percentage, especially when we consider that not the whole population regularly visits GP clinics. While the observations reported in this study are derived from the data and, hence, objective, their analysis and our discussion represent our interpretation. We note that when providing an explanation for a specific observation we considered findings of other similar studies and the experience of two domain experts who co-authored this study (Dr Capurro and Dr Manski-Nankervis). In theory, alternative interpretations for some of our observations may be possible but, to the best of our knowledge, the ones we provided in this study are the most reasonable and realistic.

The study presented here represents the first use of process mining techniques to analyze the impact of the COVID-19 pandemic in health services utilization patterns in primary care. Using a combination of process mining techniques we were able to highlight several relevant changes in health services utilization patterns associated with the disruptions seen in 2020. In addition to these, we were able to highlight some limitations of the process mining tools available today, in particular, when applying them to analyze healthcare process data.

Overall, we observed a widespread reduction of GP activities during the period included in our study, when compared to the same period in the four preceding years, concordant with what has been reported in other countries [59]. It is expected that such a reduction of GP activities led to a reduction of specialist visits, given that the Australian healthcare system is referral-based. The consequences of such additional potential reduction of healthcare activities remain to be seen. From the process perspective, in such a situation, we would have expected a reduction in the number of distinct healthcare process executions; instead, the degree of variety of process behaviour remained almost unchanged during this period.

One activity that showed a different behavior was drug prescription. We observed an increase in drug refill prescriptions, with peaks in March, April, July and September. These peaks are associated with periods immediately before lock-downs and might represent overstocking of chronic medications. This observation is in line with what has been observed in Australian national drug prescription databases [60].

The most notable changes were observed in activities involving vaccinations. First, we see that although there still was a reduction in the total number of vaccinations, the drop was relatively minor compared to the rest of the GP activities. Vaccinations dropped by an average of 1.3%, while all activities together dropped by an average of 23.6%. This contrasts with what has been reported elsewhere, where the 2020 pandemic has been associated with a significant reduction in vaccination rates [11]. When we look into specific vaccines, we can see an increase in influenza vaccinations together with an earlier peak. This is in line with public health campaigns urging citizens to get their annual influenza vaccines and prevent a double epidemic. Interestingly, in older adults we can see a parallel increase in pneumococcal vaccinations. The most likely explanation is the drive to reduce any preventable respiratory infection in preparation for the impending pandemic. Finally, vaccines normally recommended for international travel (Yellow Fever, Japanese Encephalitis, Cholera) practically disappeared, as a consequence of the severe limitations on international travel.

From the process mining perspective we faced several challenges related to the problem of analysing a vast amount of process execution data. To the best of our knowledge, the event log analysed in this study represents the largest real-life event log used for automated process discovery and process variant analysis, especially, in the healthcare context. We showed that traditional process mining tools present some limitations when attempting to analyze processes with high behavioural variability.

The first challenge consisted of imprecise timestamps, since the time granularity was limited to day-level, a recurrent problem in the healthcare context that has yet to be solved. In our case, we relied on clinical knowledge to address this issue by defining a sequence of clinically meaningful activities as a tie-breaker for activities that had identical timestamps.

The second challenge involved the identification of start and end events for a process that is, by nature, unbounded. Once again, we relied on domain expertise to overcome this problem and we presented a generalisation of our solution, suitable for various contexts, in the algorithm described in Section 3.

The third challenge was the amount of data itself, and its high behavioral variability, which disarmed state-of-the-art process mining techniques for automated process discovery and process variant analysis. Although the scope of this study was not to devise novel variants of these techniques to deal with such type of data, we highlighted possible directions for future research addressing the improvement of these techniques.

Lastly, our study highlights that process mining techniques cannot yet leverage the event log information that is not related to the process behaviour and control-flow (e.g., patient age, medications). This requires process analysts to integrate process mining analysis with data mining analysis. While this problem could be solved straightforwardly by further analysing the data from a different perspective, we call for future analysis methodologies and tools that automatically integrate both process and data perspectives.

This study represents the first application of process mining techniques to analyse the impacts of the COVID-19 pandemic on the patterns of primary care service utilization, specifically, on the General Practice day-to-day healthcare processes of Victorian patients. Our analysis identified several relevant changes in the behavioural patterns of the patients. While some of these changes were expected, i.e., an overall reduction in the number of attended GP visits, some were not, i.e., an increase in the number of medication prescriptions, a smaller than expected drop in vaccinations, and an increase in influenza and pneumococcus vaccinations, in contrast with research findings from different geographical areas [9, 11, 10].

The size of the data-set under analysis (31 million events) and the variability of the observed process behaviour were unique, and the challenges we faced and overcame during the process mining analysis clearly highlighted the need for improving existing process mining techniques, drawing directions for future work. In particular, future process discovery techniques should also integrate into the discovered process models the data surrounding the process behaviour and its control flow. In the healthcare context, this data is the information capturing a patient profile (e.g., age, gender) and their medical procedure (e.g., type of vaccination or prescribed medication). Furthermore, existing process mining techniques are not tailored to deal with large amounts of data that capture highly variable behaviour. Future research should consider the design of methods that can automatically filter process execution data to detect and extract the most relevant or interesting process behaviour (not necessarily the most frequent) by analysing the outputs of a set of process mining techniques (e.g., a combination of automated process discovery and process variant analysis). Lastly, as process mining applicability in the healthcare context gains momentum, novel process mining techniques should be tailored for such a context and leverage domain expertise to increase their effectiveness.

Fundamentals of business process management

Process mining in healthcare: A literature review

Leveraging semantic labels for multi-level abstraction in medical process mining and trace comparison

Discovering role interaction models in the emergency room using process mining

A data-driven framework of typical treatment process extraction and evaluation

An approach to automatic process deviation detection in a time-critical clinical process

Recommendations for enhancing the usability and understandability of process mining in healthcare

Effects of the COVID-19 pandemic on routine pediatric vaccine ordering and administration - United States, 2020, MMWR. Morbidity and Mortality Weekly Report 69

The impact of the COVID-19 pandemic on immunization campaigns and programs: a systematic review

Impact of COVID-19-related disruptions to measles, meningococcal A, and yellow fever vaccination in 10 countries

Process mining in healthcare: Data challenges when answering frequently posed questions, in: Process Support and Knowledge Representation in Health Care

Process-aware information systems: bridging people and software through process technology

Systematic mapping of process mining studies in healthcare

Process mining in healthcare: a systematic review

Flexible heuristics miner (FHM), in: Computational Intelligence and Data Mining (CIDM), 2011 IEEE Symposium on

Fodina: a robust and flexible heuristic process discovery technique

Split miner: automated discovery of accurate and simple business process models from event logs

Discovering block-structured process models from event logs containing infrequent behaviour

Automated discovery of process models from event logs: Review and benchmark

Application of process mining in healthcare - a case study in a Dutch hospital

Conformance checking

Conformance checking: a state-of-the-art literature review

Declarative process mining in healthcare

Process variant comparison: using event logs to detect differences in behavior and business rules

Business process variant analysis based on mutual fingerprints of event logs

Business process variant analysis: Survey and classification, Knowledge-Based Systems

Detection of statistically significant differences between process variants through declarative rules

Combining business process and data discovery techniques for analyzing and improving integrated care pathways

Investigating clinical care pathways correlated with outcomes

Measuring patient flow variations: A cross-organisational process mining approach

Process mining for clinical processes: a comparative analysis of four Australian hospitals

COVID-19 and stroke - a global World Stroke Organization perspective

Decrease in stroke diagnoses during the COVID-19 pandemic: Where did all our stroke patients go?

COVID-19 pandemic and the reduction in ST-elevation myocardial infarction admissions

Impact of the COVID-19 pandemic on cancer care: A global collaborative study

Access to cancer surgery in a universal health care system during the COVID-19 pandemic

Impact of the COVID-19 pandemic on lung cancer screening program and subsequent lung cancer

Impact of COVID-19 pandemic on colorectal cancer screening program

Review of the impact of COVID-19 on medical services and procedures in Australia utilising MBS data: Skin, breast and colorectal cancers, and telehealth services

Early estimates of the indirect effects of the COVID-19 pandemic on maternal and child mortality in low-income and middle-income countries: a modelling study

Special feature: immunization and COVID-19

Data for Decisions and the Patron Program

Gathering data for decisions: best practice use of primary care electronic records for research

PM²: a process mining project methodology

Psychological underpinning of panic buying during pandemic (COVID-19)

Process Mining - Discovery, Conformance and Enhancement of Business Processes

Optimization framework for DFG-based automated process discovery approaches, Software and Systems Modeling

Multi-perspective comparison of business process variants based on event logs

Event log imperfection patterns for process mining: Towards a systematic approach to cleaning event logs

Wanna improve process mining results?

Improving documentation by repairing event logs

Cleaning structured event logs: A graph repair approach

Cleaning timestamps with temporal constraints

Automatic repair of same-timestamp errors in business process event logs

Filtering out infrequent behavior from business process event logs

Discovering more precise process models from event logs by filtering out chaotic activities

Improving process discovery results by filtering outliers using conditional behavioural probabilities

Reduced in-person and increased telehealth outpatient visits during the COVID-19 pandemic

Increased dispensing of prescription medications in Australia early in the COVID-19 pandemic