key: cord-0870276-6ynmaii7
authors: Baumgartl, T.; Petzold, M.; Wunderlich, M.; Hohn, M.; Archambault, D.; Lieser, M.; Dalpke, A.; Scheithauer, S.; Marschollek, M.; Eichel, V. M.; Mutters, N. T.; Consortium, HiGHmed; Landesberger, T. von
title: In Search of Patient Zero: Visual Analytics of Pathogen Transmission Pathways in Hospitals
date: 2020-08-21
journal: IEEE transactions on visualization and computer graphics
DOI: 10.1109/tvcg.2020.3030437
sha: 8751ff3fffd901cf6c8095990f3bce0666066414
doc_id: 870276
cord_uid: 6ynmaii7

Pathogen outbreaks (i.e., outbreaks of bacteria and viruses) in hospitals can cause high mortality rates and increase costs for hospitals significantly. An outbreak is generally noticed when the number of infected patients rises above an endemic level or the usual prevalence of a pathogen in a defined population. Reconstructing transmission pathways back to the source of an outbreak -- the patient zero or index patient -- requires the analysis of microbiological data and patient contacts. This is often manually completed by infection control experts. We present a novel visual analytics approach to support the analysis of transmission pathways, patient contacts, the progression of the outbreak, and patient timelines during hospitalization. Infection control experts applied our solution to a real outbreak of Klebsiella pneumoniae in a large German hospital. Using our system, our experts were able to scale the analysis of transmission pathways to longer time intervals (i.e., several years of data instead of days) and across a larger number of wards. Also, the system is able to reduce the analysis time from days to hours. In our final study, feedback from twenty-five experts from seven German hospitals provides evidence that our solution brings significant benefits for analyzing outbreaks. It is also applicable to COVID-19 hospital-associated transmissions.

Pathogen transmissions are an acute problem in hospitals around the world [21, 25]. The transmission of pathogens, such as bacteria and viruses, can endanger the lives of patients, since they are a very vulnerable group of persons. Pathogen infections in hospitals can be transmitted via patient-to-patient contact in the same room or ward [19]. Transmissions are generally difficult to detect because patients may be infected but show no clinical symptoms. These carriers are invisible sources of potential transmissions. Screenings can detect carriers, but regular screening of all patients is very costly and ineffective. Thus, usually only high-risk patients are screened when entering the hospital or after transfers within the hospital.

One initially infected patient, the patient zero or index patient, may transmit the pathogen to other patients [21]. These patients may change wards and infect further patients. The endemic level represents the usual prevalence of a pathogen in a defined population. When the number of infected patients rises above this endemic level and a spatio-temporal context is observed, the event is called a cluster or outbreak. The endemic level depends on parameters such as the type of patients (e.g., their underlying diseases) and their location within the hospital, as well as on pathogen specifics, e.g., the mode of transmission, seasonality of occurrence, resistance potential and infectiousness. Automatic outbreak detection is still an open research problem [22, 84]. For this reason, the endemic level is usually set manually by comparing the number of newly infected patients within a certain time frame with the number of infected patients in a previously recorded time frame. Not all patients can be regularly tested or screened, so it can take days, weeks or months, depending on the pathogen, before an outbreak is detected [28]. Therefore, the primary goal of infection control experts (e.g., hygienists, clinicians, hygiene experts) is to monitor infectious persons and determine their transmission pathways, so that they can intervene in time and prevent further spreading of the pathogen.

Once an outbreak has been detected (Task 1), the infection control experts need to trace the infection back to its source in order to determine whether the patients are connected and belong to the outbreak. This means identifying all potentially infected patients back to the patient zero, i.e., the overall source of the outbreak (Task 2). The experts need to reconstruct these transmission pathways: by whom, when and where did a transmission occur? This information is used to find the origin of the infectious agent, i.e., whether it is nosocomial (hospital-acquired). Furthermore, the patient locations (which rooms/wards, Task 3) and the outbreak duration (begin/end, Task 4) need to be determined. The transmission pathway is then used to identify putatively colonized or infected patients who are not yet known and thus require testing (Task 5). Interventions can then be implemented. Examples are isolating or cohorting affected patients in separate rooms, special disinfection processes, increased hand hygiene, and screening and teaching the affected staff (infected and exposed).

The identification of transmission pathways is challenging because a transmission can occur over several contacts, involve several hospital wards and span various time periods (days to years). Tracing requires integrating both the spatio-temporal patient information and the microbiological test results. Outbreak pathway reconstruction needs to be fast and precise: each potentially infected patient and their contacts need to be detected to prevent further disease spreading. The patient status at the time of a contact is often uncertain, because screening often detects an infection or colonization only at a later date. Thus, potential infections also need to be considered.

Currently, transmission pathway reconstruction is a time-consuming and potentially error-prone manual process. It may take days to weeks, using current hospital systems [28] . Visual analytics systems have the potential to support this analysis process by saving analyst time [16] . However, current solutions focus mainly on the disease evolution at a population level (see Sect. 2).

In cooperation with infection control experts from four German hospitals and infection research institutes, we developed a novel visual analytics system for the exploration of disease outbreaks. We followed an iterative, user-centered design process within the joint project HiGHmed over two years. The system offers several specialized views for the exploration of outbreaks and pathogen transmissions. The core contribution is a novel view for contact tracing, inspired by the well-known storyline visualization [38, 47, 53]. Its visual design, layout and interactions let users explore contacts. They also automatically determine and highlight the patients and contacts that could transmit pathogenic organisms.

Our approach was applied to a real outbreak of Klebsiella pneumoniae in a large German hospital. The experts were able to reconstruct the transmission pathway back to the patient zero faster and more comprehensively than with existing methods. In our final qualitative study, we gathered feedback from twenty-five experts located in seven German hospitals. The results indicated significant added value when tracing transmissions and analyzing outbreaks in hospitals.

Visual Analytics of Health Data. Visual analysis of healthcare data is an active area of research. Event-based visual analytics approaches for health record data have been a main focus of this area [14, 27, 29, 42, 54, 55, 58]. Disease surveillance and epidemiology have also been of interest [50]. Visual analytics approaches for epidemiologists focus on the spatio-temporal evolution of a disease at a population level: how many people will be infected, in which geographic area(s), and how fast the disease spreads [13, 40, 83]. These approaches target disease spreading over a large population at a macroscopic level. They visualize the number of people infected over time, but detailed views of individual patient contacts are not their focus. Machine learning methods combined with visual analytics have been proposed [17, 35], and more specifically for disease progression pathways [34]. They have been applied to the problem of infection control [43], but this closely related work did not reconstruct the transmission pathway. Machine learning is effective, but it may not be feasible for rarely occurring pathogens such as K. pneumoniae, where the training classes of (non-)infection are highly imbalanced. Thus, transmission pathway reconstruction and the identification of potentially infected patients require interactive exploration.

Disease Spread and Dynamic Networks. Many pathogens are transmitted preferentially through contacts, so disease spreading is often simulated as a dynamic process over contact networks. A number of approaches have focused on the population level, showing the number of infected patients through line charts [1, 19, 37, 57]. They use the contact networks in the simulation but only provide macroscopic views of disease progression. The simulations often result in a collection of thousands of dynamic graphs, and very few approaches exist for visualizing such data [10, 39]. ManyNets [23], GraphLandscape [30] and SOM-based clustering [76] extract graph properties and compare many static graphs, and these methods could be adapted and applied. However, two very different graphs can have the same properties [15]. Network piling approaches [5, 72] allow the detailed comparison of several static graphs. However, they do not scale well to larger graphs and have not been adapted to compare multiple dynamic graphs. Recently, a simulation and visualization of disease spreading over contact networks in hospitals was presented [81]. It shows the disease and transmission probability at the individual level but takes only a static network as input. This simplification is made for simulation purposes. In real cases, however, patients move across wards over time, so both visualization and simulation should take the network dynamics into account. In sum, these simulations focus only on predicting disease spreading and not on reconstructing pathogen transmission pathways.

Visualization of Contact Dynamics. The reconstruction of pathogen transmission pathways requires analyzing patient contacts and patient infection status over time. A number of methods exist for visualizing a single dynamic graph over time in a scalable way [6, 56, 70, 74]. These methods use timeslices as a basis for the visual analytics process. The uniformity of timeslices poses a problem, because transmission events are unevenly distributed over time. Event-based methods [42, 51] for the visualization of dynamic networks [36, 64, 65, 79] are more applicable. However, they have not been designed for outbreak network visualization in a way that fulfills the needs of infection control experts.

The visualization of contact network dynamics for hospital data could be supported by dynamic set visualization and storyline approaches. Set visualization methods [2] for the visualization of set dynamics [46, 73] can show set sizes and the number of changes between sets over time. Methods for visualizing long time series [7] and text visualizations [18, 60] use similar representations. Showing group membership over time at the individual level has also motivated storyline research. Storyline visualizations [38, 47-49, 68, 69] devote one dimension, usually the x-axis, to time and encode each character in a story as a line. Lines are placed close together to indicate that the characters share a scene; when the lines separate, the scene ends. For disease spreading, this encoding can show contacts in the same ward, and it forms a key part of our approach. Research in storyline visualization has covered the following topics:
- optimizing compactness, either automatically or with user assistance [4, 24, 38, 48, 49, 52, 62, 63, 68, 69];
- reducing crossings [26, 33, 71, 80];
- plotting approaches [61];
- combining storylines with event-based methods [3];
- genealogical data [32];
- streaming and dynamic data [67, 82];
- contacts between living things or actors exhibiting similar behavior [53].

Reda et al. [53] is the closest approach to ours, but it needs to consider all contacts in the storyline. In our work, we reduce clutter by determining and highlighting pathogen-transmitting contacts. On the other hand, we need to show more specific locations inside the hospital, as well as whether a patient has been discharged.

Over the course of one year, we investigated the data and tasks of infection control experts in four German hospitals. The data and tasks were guided by the scope of the HiGHmed project. To gain a deep understanding of the data, tasks and current analysis methods, we conducted a structured interview following the methodology in [75]. We first used an online questionnaire, which was answered by six infection control experts and epidemiologists. Moreover, we interviewed three hygienists to learn the current work practices of infection control experts in three hospitals and one infection control institute. Further details on the tasks and data were gathered during the iterative design process (see Sect. 4). Our system is designed to support one of their main tasks: tracing transmission pathways in outbreaks.

Our work resulted in the following tasks, which have no strict workflow order:

T1 Detect Outbreak. Is there an outbreak? An outbreak occurs when the number of infected patients rises above a normal level within a certain period, i.e., the endemic level. Depending on the pathogen, this endemic level can be two or more patients. Patients are generally not all tested or screened, so an outbreak is determined by manual inspection.

T2 Outbreak Pathway.

T2. These potentially (i.e., putative) infected patients need to be identified via their contacts to infected patients.

The pathogen transmission analysis requires combining two types of data: 1) patient locations for determining contacts, and 2) microbiological data for identifying infection status. For privacy reasons, the collection of these data is limited (see Sect. 8). A patient's location is only determined by a log record of their ward, not by tracking. The infection status is only known at the time of a screening or test. These data sets can span years, with events recorded down to the second. The time between two consecutive events (e.g., transfers) can range from seconds to months.

Patient Locations: The location data consists of a list of patient transfer events $TR = \{TR_k\}$. A transfer $TR_k$ records that patient $P_k$ was transferred to location $L_k$ at time $t_{TR_k}$ for the reason $type_k$. Thus, a transfer is $TR_k = (P_k, L_k, t_{TR_k}, type_k)$. The reason $type_k$ states whether the transfer was:
1) between wards;
2) from home to hospital, i.e., the first hospital admission;
3) to home between hospital stays, i.e., 'temporary home';
4) the end of a hospital stay.

The current location of a patient is the destination of the patient's last transfer (see Fig. 2). The time intervals between successive transfers are irregular. They cannot be transformed to regular intervals without resampling or causing scalability challenges. Even very short stays at a location can lead to important contacts in pathway reconstruction.
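The transfer model above can be sketched as a small record type. This is a hypothetical illustration, not the authors' implementation. The names `Transfer`, `TransferKind` and `current_location`, as well as plain numeric timestamps, are assumptions. The sketch derives a patient's current location as the destination of their last transfer at or before a given time.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class TransferKind(Enum):
    """The four transfer types listed in the text (type_k)."""
    BETWEEN_WARDS = 1
    ADMISSION = 2        # from home to hospital (first admission)
    TEMPORARY_HOME = 3   # to home between hospital stays
    END_OF_STAY = 4


@dataclass(frozen=True)
class Transfer:
    patient: str         # P_k
    location: str        # L_k, destination of the transfer
    time: float          # t_{TR_k}, assumed to be seconds since some epoch
    kind: TransferKind   # type_k


def current_location(transfers: List[Transfer], patient: str, t: float) -> Optional[str]:
    """Destination of the patient's last transfer at or before time t, or None."""
    last: Optional[Transfer] = None
    for tr in transfers:
        if tr.patient == patient and tr.time <= t:
            if last is None or tr.time > last.time:
                last = tr
    return last.location if last is not None else None
```

For example, suppose a patient is admitted to S279 at time 10 and transferred to S278 at time 50. The sketch then reports S279 at time 30, S278 at time 60, and no location before the admission.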

Microbiological data: $MB$ is information about the tests and screenings of patients for pathogens. More formally, it is a set of events $MB = \{MB_j\}$. Each element $MB_j$ records:
- which patient $P_j$ was tested or screened ($s_j$);
- for which pathogen $\rho_j$;
- at what time $t_{MB_j}$;
- the result $r_j$.

That is, $MB_j = (P_j, s_j, t_{MB_j}, \rho_j, r_j)$. Patients may have several microbiological data records associated with them or none at all. The analysis can take place before, during or after the hospital stay, and the spacing between events is irregular.

The result $r_j$ determines the patient's infection status (see Fig. 2):

• infected - carrier: A positive screening result means that the patient is colonized. Due to the lack of data on recovery (see Sect. 8), the patient retains this status from this moment onwards, unless he/she is later identified as diseased.
• infected - diseased: If a patient with symptoms tests positive for the pathogen $\rho$, he/she is in a diseased state. Due to the lack of data on recovery (see Sect. 8), the patient retains this status from the moment of the positive test onwards.
• unknown: Before the first positive screening or targeted microbiological test, the infection state is unknown. If the patient is not tested for the pathogen $\rho$, it is unknown whether they are infected/colonized. If a patient becomes infected at a later point in the data set, the patient is labeled "unknown - will be infected".
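The status rules above can be derived from the records $MB_j$ with a small function. This minimal sketch makes two assumptions: $s_j$ is encoded as a boolean (`True` for a screening, `False` for a targeted test of a symptomatic patient), and $r_j$ is encoded as positive/negative. `MicroResult` and `infection_status` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MicroResult:
    patient: str      # P_j
    screening: bool   # s_j: True = screening, False = targeted test (assumed encoding)
    time: float       # t_{MB_j}
    pathogen: str     # rho_j
    positive: bool    # r_j


def infection_status(records: List[MicroResult], patient: str, pathogen: str, t: float) -> str:
    """Infection status of `patient` for `pathogen` at time t, following the rules above."""
    positives = [r for r in records
                 if r.patient == patient and r.pathogen == pathogen and r.positive]
    # No recovery data: once positive, a patient keeps the status from then on.
    diseased = [r.time for r in positives if not r.screening and r.time <= t]
    carrier = [r.time for r in positives if r.screening and r.time <= t]
    if diseased:
        return "infected - diseased"   # diseased overrides an earlier carrier status
    if carrier:
        return "infected - carrier"
    if positives:
        return "unknown - will be infected"  # a positive result exists, but only after t
    return "unknown"
```

Because the status never reverts, the function only needs the earliest positive results relative to `t`; negative results do not affect the status.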

Our approach was developed through iterative design with our domain experts over two years, following guidelines for effective visualization design [31, 44, 59]. After the initial task and data analysis, we held regular quarterly meetings with project participants. The number of participants varied because not all of them could attend each meeting; on average, ten of our expert participants were present. Their expertise covered infection control, hygiene, infection control data management, epidemiology and infection control management. Between the meetings, we communicated by email and held interactive sessions on site.

Our first prototype had several linked views [77] (see Fig. 3a). The epidemic curve view showed the number of infected patients over a two-week period (see Fig. 3a, parts 1 and 2). It highlighted potential outbreaks with infections above the endemic level for each selected pathogen. To identify hospital-associated (nosocomial) infections, the patient stay was shown together with the infection status (see Fig. 3a, part 3). Potential transmissions between patients were supported only by node-link diagrams showing patient contacts and current infection status (see Fig. 3a, part 4). A total of thirteen domain experts responded to our call for feedback by filling out an online questionnaire [77]. The prototype demonstrated a high degree of usefulness (modes of 4 and 5, respectively, on a five-point Likert scale). The experts especially appreciated the views that were new to them: a) patient stay and infection duration, and b) the contact network. They wanted these views to scale to longer periods of time and larger sets of microbiological data. They also asked for views that support pathogen transmission pathway reconstruction.

The second design included a Patient Timeline View that showed the patient stay and his/her microbiological data (see Fig. 3b). This design built on the infection control experts' experience with Excel-based information. Each row is one patient, and the x-axis is time. The background color shows the infection status, colored horizontal bars show the patient location, and colored vertical bars show the screening and test results. Our experts found that this view provided a good overview of patient location and infection status. However, separating patients into rows made it difficult to spot contacts leading to pathogen transmission. Sorting the rows by the time of the first infection did not help, because a patient can have contact with several patients.

Our third design used a storyline representation of the infection data (see Fig. 3c ). Patients are lines and the x-axis is time. Lines are colored according to infection status. Contacts in this view are line bundles. Although the view shows all data needed for pathogen transmission analysis, the experts had difficulty relating to it. The layout that emphasized the movements made it difficult to follow patients and to determine transmissions.

The final visual interface is shown in Fig. 1. It combines the following views for reconstructing the pathway of a user-selected pathogen, as needed for our tasks (see Sect. 3.1):

1. Epidemic Curve View shows the number of infected persons per day in order to support Task 1, outbreak detection. The infection control expert can select the total number of infections or only new ones (i.e., without copystrains). A moving average over a user-selected time period is shown, so that the counts can be related to the endemic level. This view can show data for the whole hospital (see Fig. 9 top) or for specific wards (see Fig. 9 bottom). It supports longer time periods via focus-and-context methods inspired by [78].
2. Contact Network View shows the contacts of selected patients for determining putative infected patients (Task 5).
3. Transmission Pathway View supports Tasks 2-4. This view shows patient contacts and possible pathogen transmissions over time in a storyline-like view. It extends storylines in layout and design, and it highlights potential transmission events more directly.
4. Patient Timeline View shows the patient location overlaid with microbiological data. It supports the visual analytics process with infection status and location information. Its data encoding is unchanged from the early prototype described in Sect. 4.1.
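The epidemic curve with its moving average can be sketched as follows. The sketch assumes that positive results arrive as (patient, day) pairs and that "new" infections are counted once per patient, on the day of the first positive result; this is one way to drop copystrains. The text only states that a moving average over a user-selected period is shown. The trailing window (including the current day), the shorter windows at the start of the series, and the function names are therefore assumptions.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, Iterable, List, Tuple


def new_infections_per_day(positives: Iterable[Tuple[str, date]]) -> Dict[date, int]:
    """Count each patient only on the day of their first positive result."""
    first: Dict[str, date] = {}
    for patient, day in positives:
        if patient not in first or day < first[patient]:
            first[patient] = day
    return dict(Counter(first.values()))


def epidemic_curve(counts: Dict[date, int], start: date, end: date,
                   window: int) -> List[Tuple[date, int, float]]:
    """(day, count, trailing moving average over `window` days) from start to end inclusive."""
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    series = [counts.get(d, 0) for d in days]
    curve = []
    for i, d in enumerate(days):
        lo = max(0, i - window + 1)   # shorter window at the start of the series
        avg = sum(series[lo:i + 1]) / (i + 1 - lo)
        curve.append((d, series[i], avg))
    return curve
```

Days whose count clearly exceeds the moving average are candidates for closer inspection. In the system described here, however, that judgment stays with the expert.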

The Transmission Pathway View shows patient infection status and contacts over time and across locations (Fig. 1 (3)). Outbreak duration, potential transmission contacts between patients, and patient locations are visible (Tasks 2-4). Each line represents a patient, and the x-axis encodes time (Task 4). Each patient line starts with the earliest recorded hospital admission and ends with the last recorded stay. Temporary home stays are also shown, which helps in detecting hospital-associated transmissions during previous stays (Task 3.1). Patient lines pass close to each other for every potential contact (Task 2.1). The y-axis encodes patient location. Fixed vertical positions for individual wards (as in Baling et al. [8]) are not scalable due to the large number of wards and patients (hundreds), but our layout still aims to preserve the vertical position of wards [4].

[Fig. 3: (a) Initial visualization, see also [77]. (b) Patient timeline view. (c) Initial storyline-like design.]

Line color conveys infection status, and background color is used to encode location information. Technical details are below.

We propose a modified storyline [47] layout algorithm to support our tasks. Typical storyline drawing approaches minimize the number of edge crossings and bends. We, in contrast, must respect patient locations, including wards (Task 3), and patient contacts (Task 2). A fast layout is required for interactive exploration, because the set of patients in the view changes when filters are applied or data sets are loaded. Thus, we prioritize runtime over minimizing crossings and bends (see Sect. 8). We build upon existing layouts, combining and adapting them for our purposes. Given these goals, our algorithm is structured as follows (see also Fig. 4):

1. Data pre-processing: construct the patient contact graph
2. Initialization: compute an initial layout
3. Constrained force-directed layout
4. Finalization:
   (a) Line order: for simultaneous contacts of many patients, order the lines in y according to transfer time
   (b) Temporal adjustment: align the x positions with the exact time moments

1. Pre-processing
We construct a directed acyclic graph (DAG) from the event-based data. Nodes are sets of patients at a location (i.e., patient contacts) or infection state changes (see Fig. 4a). Edges are transitions between these states. As each transfer has a single timestamp, nodes are interpreted as instantaneous events. The edges connecting these nodes form from-to transfer edges. This duplication also captures the duration of the stay at a particular location and all possible contacts between two transfer time moments. Locations, location changes and patient contacts can be followed along paths in this graph. For a change in infection state, an infection state change node is created; passing through this node changes the patient's infection state.
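The pre-processing step can be sketched in simplified form. The following is one possible reading of the description, not the authors' code:
- Every transfer creates a node (location, time, patients present) at its destination and, if anyone remains, at its source.
- Each patient's consecutive nodes are connected by an edge.

Infection state change nodes are omitted. The tuple layout of transfers and the name `build_contact_dag` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A node is an instantaneous event: (location, time, patients present after the event).
Node = Tuple[str, float, FrozenSet[str]]
# An edge (u, v, patient) means the patient goes from node u to node v.
Edge = Tuple[int, int, str]


def build_contact_dag(transfers: List[Tuple[str, str, float]]) -> Tuple[List[Node], Set[Edge]]:
    """Build a contact DAG from (patient, destination, time) transfer events."""
    nodes: List[Node] = []
    edges: Set[Edge] = set()
    where: Dict[str, str] = {}            # current location of each patient
    occupants: Dict[str, Set[str]] = {}   # patients currently at each location
    last_node: Dict[str, int] = {}        # last node each patient passed through

    def emit(location: str, time: float) -> None:
        members = frozenset(occupants.get(location, set()))
        if not members:
            return  # nobody left at this location, so no contact node
        idx = len(nodes)
        nodes.append((location, time, members))
        for p in members:
            if p in last_node:
                edges.add((last_node[p], idx, p))
            last_node[p] = idx

    # Transfers are processed in time order and node indices only grow,
    # so every edge points from a smaller to a larger index: the graph is acyclic.
    for patient, dest, time in sorted(transfers, key=lambda tr: tr[2]):
        src = where.get(patient)
        if src is not None:
            occupants[src].discard(patient)
            emit(src, time)    # the remaining patients form a new contact set
        occupants.setdefault(dest, set()).add(patient)
        where[patient] = dest
        emit(dest, time)       # the patients at the destination after arrival
    return nodes, edges
```

With transfers A to W1 at time 0, B to W1 at time 1 and A to W2 at time 2, the node for W1 at time 1 contains both A and B, which is their contact. The edges from that node lead B to the next W1 node and A to the W2 node.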

2. Initialization
Each node of the DAG represents a set of patients together at a location. An initial layout can therefore be computed using dynamic set visualization based on Sankey diagrams (using the D3 implementation [12]). This initial layout takes the cardinalities of the sets into account and operates on the structure of the DAG only. The exact times of patient transfers are taken into account in the finalization.

3. Constrained force-directed layout
We employ a force-directed layout for storylines [47, 63]. Our initialization helps avoid local minima, but the layout requires refinement subject to additional constraints (e.g., to convey patient location). We experimented with various ways to use space to convey patient location (e.g., placing them all at the bottom). We finally divided the screen into home, hospital and temporary home areas from top to bottom (see Fig. 4b). Admission to the hospital is drawn as a vertical line descending from the top area of the screen, and temporary home transfers are vertical lines towards the bottom of the screen. When optimizing the layout with the force-directed algorithm, we use a constraint-based approach [20] to enforce these locations. Additional forces encourage the desired properties of the layout. The first constraint preserves the temporal order of movements along the x-axis. The second constraint prevents nodes from leaving the y-axis areas of the three location types (Task 3.1). Additional forces maximize the stability of a patient's y position during a stay in a ward, which helps represent the contact location (see Fig. 4c) (Task 3.2). Still, the y position can vary significantly when a patient stays at a location for a long period of time.

To remove this variation, we replace the individual y positions of these nodes with their average (see Fig. 4d).
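The two refinement ideas above, band constraints on y and averaging over a stay, can be illustrated with a deliberately simplified sketch. This is not the authors' solver. Instead of the constraint-based approach [20], it uses plain linear springs between consecutive nodes of each patient line. After each step, every node is clamped into its band, which is a simple projection. The x positions are treated as fixed, and the band boundaries, `relax_y` and `average_stays` are hypothetical. The step size must stay small relative to the number of patient lines passing through a node; otherwise the updates can oscillate.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical y ranges (screen units, top to bottom) for the three location types.
BANDS: Dict[str, Tuple[float, float]] = {
    "home": (0.0, 100.0),
    "hospital": (100.0, 900.0),
    "temporary_home": (900.0, 1000.0),
}


def relax_y(y: Sequence[float], band_of: Sequence[str], paths: Sequence[Sequence[int]],
            iterations: int = 50, step: float = 0.1) -> List[float]:
    """Pull consecutive nodes of each patient line together, then clamp them into bands."""
    y = list(y)
    for _ in range(iterations):
        force = [0.0] * len(y)
        for path in paths:
            for u, v in zip(path, path[1:]):
                d = y[v] - y[u]   # linear spring along the patient line
                force[u] += d
                force[v] -= d
        for i in range(len(y)):
            lo, hi = BANDS[band_of[i]]
            # Projection step: keep each node inside its location-type band.
            y[i] = min(hi, max(lo, y[i] + step * force[i]))
    return y


def average_stays(y: Sequence[float], stays: Sequence[Sequence[int]]) -> List[float]:
    """Replace the y positions of the nodes of each stay by their average."""
    y = list(y)
    for stay in stays:
        mean = sum(y[i] for i in stay) / len(stay)
        for i in stay:
            y[i] = mean
    return y
```

Consider a line that goes from home to a ward to temporary home. The springs pull its three nodes together, while the clamping keeps them at the edges of their bands. The averaging step then flattens the nodes of a single stay onto one y position.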

4. Finalization
Lines are ordered from top to bottom according to the order in which patients entered the ward (see Fig. 4e). This order may cause more edge crossings, but it supports Task 2, tracing transmissions. Patients who were in the ward for longer periods of time are more likely to be involved in transmission events. Finally, individual nodes are placed at the precise times of their events along the x-axis. This placement is important for determining the outbreak duration and the time of possible pathogen transmissions (Task 4) (see Fig. 4f).
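Both finalization steps are simple to express. The sketch below makes three assumptions: each patient's entry time into the ward is known, ties are broken by patient identifier, and a linear time scale is used for x. The function names are hypothetical.

```python
from typing import Dict, List


def order_lines(patients_in_ward: List[str], entry_time: Dict[str, float]) -> List[str]:
    """Top-to-bottom order of lines in a ward: earlier entry is drawn higher."""
    # Ties are broken by patient id so that the order is deterministic.
    return sorted(patients_in_ward, key=lambda p: (entry_time[p], p))


def time_to_x(t: float, t_min: float, t_max: float, width: float) -> float:
    """Place a node at the exact time of its event on a linear x scale."""
    if t_max == t_min:
        return 0.0  # degenerate time range: put everything at the left edge
    return (t - t_min) / (t_max - t_min) * width
```

Patients who have been in the ward longest end up at the top. These are the patients the text identifies as most likely to be involved in transmissions.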

Infection status is encoded using color (see Fig. 5 left) in order to help in tracing transmissions (Task 2). Differentiating between 'unknown' and 'unknown-will be infected' helps infection control experts track patients over long time periods, reducing the requirement to pan and zoom. Details on microbiological data are shown on demand through a tooltip. The contact location (Task 3), specifically the ward, is shown on demand by a colored background hull.

Using a process inspired by [32, 69] , patient lines are drawn smoothly. As short periods of contact can lead to transmission, wiggles in the line indicate new contacts (see Fig. 4e ). We emphasize contacts between patients on the same ward (Task 2) by minimizing the line width for vertical lines (see Fig. 5 right) . This also reduces overplotting as inspired by Tanahashi and Ma [68] .

Standard pan and zoom operations are supported, as well as highlighting of selected patients. When a patient is selected, their line width is increased. Non-focus information is not filtered out [32], because experts need to be able to track patients across the data set. Straightening selected patient lines [38] would be an alternative, but this would

[Figure caption: This variant leads to more clutter and lower discriminability between home and 'temporary home' transfers. The used variant distinguishes transfers from home (lines from the top) and from 'temporary home' (lines from the bottom).]

Potential pathogen transmissions must be visible [45] among many patient contacts over long periods of time (months to years). Panning and zooming over such long time periods is inefficient and may lead to missing important contacts in the data. We developed specialized tracing interactions to support this visual analysis.

Our interactive approach supports two kinds of tracing:
1) Backward tracing finds transmission events and patients in the past that could have infected a selected patient. This interaction enables the search for the index patient, i.e., patient zero (Task 2).
2) Forward tracing finds patients who could have been infected by a selected infected patient at a later point in time, i.e., putative infected patients (Task 5).

We now explain the computation required for interactive backward tracing; forward tracing is computed analogously. For a selected patient $P$, we calculate which contacts might have led to the patient's infection. A pathogen can be transmitted from an infected patient $P_i$ to patient $P$ when the two patients come into contact. Note that a contact between two diseased patients is deemed irrelevant, because both patients are already diseased (see Fig. 6 top). We identify relevant contacts as shown in Fig. 6. The pathogen transmission must have happened during a critical contact before the infection was detected, i.e., before $\tau$, the time of the earliest positive microbiological result. For each contact $P_i$ of $P$ and each contact location $L$, we determine the earliest critical contact before the infection is detected:

$CC_{i,L} = \min_{t_s} \{ (P_i, t_s, t_e) : t_s \le \tau,\ L(P) = L(P_i) = L \}$,

i.e., the contact interval $(t_s, t_e)$ with the smallest start time. We take the start of this interval as the transmission event time $t_t = t_s$. $L_t$ is the transmission location (Task 3) and $P_t$ is the transmission contact. Note that $P$ can have several potential transmission events, involving different persons, locations and times. This analysis is repeated, especially for contacts with unknown infection status, because they could have been infected through an earlier contact with an infected patient. These critical contacts are computed using a constrained path search in the DAG used for the layout. There is currently no bound on how far back or forward in time transmission events are searched. User-specified bounds could be implemented, depending on the specifics of the investigated pathogen.
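The definition of $CC_{i,L}$ can be illustrated on a flat list of contacts. This is a simplified sketch, not the authors' implementation:
- It replaces the constrained path search in the DAG with a linear scan over precomputed contact intervals.
- It omits the exclusion of diseased-diseased contacts.
- It omits the repeated search for contacts with unknown status.
- It does not show forward tracing.

`Contact`, `earliest_critical_contacts` and `transmission_events` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Contact:
    a: str           # one patient of the contact
    b: str           # the other patient
    location: str    # shared location L
    t_start: float   # t_s
    t_end: float     # t_e


def earliest_critical_contacts(patient: str, tau: float,
                               contacts: List[Contact]) -> Dict[Tuple[str, str], Contact]:
    """CC_{i,L}: for each partner P_i and location L, the earliest contact with t_s <= tau."""
    best: Dict[Tuple[str, str], Contact] = {}
    for c in contacts:
        if patient not in (c.a, c.b) or c.t_start > tau:
            continue  # not a contact of P, or it started after the infection was detected
        partner = c.b if c.a == patient else c.a
        key = (partner, c.location)
        if key not in best or c.t_start < best[key].t_start:
            best[key] = c
    return best


def transmission_events(patient: str, tau: float,
                        contacts: List[Contact]) -> List[Tuple[float, str, str]]:
    """Potential transmission events (t_t, P_t, L_t) with t_t = t_s, sorted by time."""
    cc = earliest_critical_contacts(patient, tau, contacts)
    return sorted((c.t_start, partner, loc) for (partner, loc), c in cc.items())
```

Each returned tuple corresponds to one flashback line in the view. A patient can therefore have several potential transmission events, one per partner and location.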

In the view, the selected patient $P$ is emphasized. Relevant contact patients $P_t$ are kept, and non-relevant patients are de-emphasized. The positive test/screening events $\tau$ are highlighted with ellipses (orange for carrier, red for diseased). They are connected to the potential transmission events $(t_t, P_t, L_t)$ by flashback lines inspired by [49] (see Figs. 7 and 8). The circle color encodes the type of relevant contact (Task 2.1). We show all possible transmission events, i.e., several flashback lines. The length of a connection indicates how long the potential infection remained undetected by screening or testing (Task 4).

The visual analytics system was applied to real data from a large German hospital by an infection control expert. The anonymized data include hospital location data of ∼180,000 patients and ∼900,000 microbiological data records over four years. The case study focuses on a multi-resistant pathogen of special interest: the bacterium Klebsiella pneumoniae. K. pneumoniae is a commensal gut bacterium and a dominant cause of hospital-acquired infections. It is responsible for infections of the urinary tract, respiratory tract and bloodstream [41].

In the first quarter of 2012 (actual date anonymized), the infection control experts faced an outbreak of K. pneumoniae in an area with four wards. Staff members work on all four wards, and patients are frequently moved between these locations. The total capacity of the area is 50 beds. The outbreak investigation at that time was done manually by the infection control experts: the data were collected from several systems and merged into spreadsheets. The original analysis, based on classical epidemiological data, assigned a total of 12 patients to the outbreak. However, a deeper analysis of the pathogen transmission was not possible.

This use case shows how the infection control experts analyzed this outbreak retrospectively using the system, with the visual analytics experts supporting the interaction. Since patient zero was not found during the original outbreak investigation, the experts aimed to trace back the initial source of infection and to identify potential cases that were overlooked at that time.

First, the infection control expert consults the epidemic curve for K. pneumoniae for the first months of 2012 for the entire hospital to assess the number of infected patients (Task 1). The data were cleaned of copy strains (repeated samples from the same patients), so that only new positive results are counted (see Fig. 9a). The hospital-wide curve shows only a small increase in the number of infections; thus, this outbreak is difficult to detect automatically. Based on his expertise, the expert turns to the four wards S276, S278, S279, and S295 (see Fig. 9b), where the outbreak was initially localized. The numbers of new infections on these wards in February were 1 patient (S276), 3 patients (S278) and 9 patients (S279) (see Fig. 9b). The number of newly infected patients on the wards exceeded the endemic level, suggesting a K. pneumoniae outbreak on the wards. The expert begins to reconstruct the transmission pathway (Task 2) and visualizes all positively tested patients on the wards in the Transmission Pathway View (see supplementary material and Fig. 7). In total, there were 23 infected patients, more than the 12 patients assigned to the outbreak by the original analysis. A dominant cluster highlights 12 infected patients on ward S279 in winter 2011/2012. It reveals the first infected patient, P68475, who had already tested positive in November 2011, three months before this cluster of K. pneumoniae (Task 4). Using forward tracing from patient P68475, the infection control expert identified further patients who had been in contact with this patient (see Fig. 7). Thus, patient P68475 is the cause of the spread of K. pneumoniae on ward S279, but does not account for all patients in the outbreak.
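The copy-strain cleaning behind the epidemic curve can be sketched as follows. This is a simplified illustration: it assumes that every positive sample after a patient's first positive result for the same pathogen is a copy strain (real cleaning rules may use time windows or typing results), and the record layout and the name `epidemic_curve` are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import Iterable

# Hypothetical record: (patient_id, sample_date, ward, pathogen, positive)
Result = tuple[str, date, str, str, bool]


def epidemic_curve(results: Iterable[Result], pathogen: str) -> Counter:
    """Count new positive patients per (ward, (year, month)) after removing copy strains."""
    first_positive: dict[str, tuple[date, str]] = {}
    for patient, day, ward, bug, positive in results:
        if bug != pathogen or not positive:
            continue
        # Assumption: any later positive sample of the same patient is a copy strain.
        if patient not in first_positive or day < first_positive[patient][0]:
            first_positive[patient] = (day, ward)
    curve: Counter = Counter()
    for day, ward in first_positive.values():
        curve[(ward, (day.year, day.month))] += 1
    return curve
```

Summing the counts over all wards for a given month yields the hospital-wide curve.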

Since patient P68475 had no previous contact with colonized patients (carriers) or diseased patients (with K. pneumoniae) on ward S279, the expert uses backward tracing to search for potential contacts on other wards (see Fig. 8). The visualization shows that patient P68475 had no direct contact with patients with positive laboratory tests for K. pneumoniae. However, the tracing lines lead to ward S295. As shown in the zoomed view of Fig. 8, patient P68475 had indirect contact with infected patients via patient P76101, who in turn was in contact with two patients on ward S295 (Task 2.1). This is a novel insight. Both patients, P113749 and P108085, had tested positive for K. pneumoniae. Thus, a possible transmission route of K. pneumoniae to ward S279 is P113749/P108085 → P76101 → P68475. As Fig. 8 shows, both patients frequently returned to the hospital (many vertical lines from the bottom) and usually stayed for long periods (long horizontal lines). Forward tracing shows that the potential transmission events began before their first recorded admission (see the start and length of the tracing lines in Fig. 10). The Patient Timeline View confirms this finding (see Fig. 11): the first positive microbiological results of patients P113749 and P108085 (red vertical lines) precede their first hospital admission (Task 3.1). This is an artifact of the limited data: the actual first admission predates the start of the available data. Forward tracing also shows that both patients could have infected patients on wards S295 and S279 (see Fig. 10).
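The indirect route found here (two patients on ward S295, then P76101, then P68475) is a time-respecting chain: each hop must end no earlier than the time the intermediate patient could have acquired the pathogen. The Python sketch below finds the earliest such chain with a Dijkstra-style search over contact intervals. It is an illustrative stand-in, not the system's DAG-based path search; the tuple layout `(a, b, t_s, t_e, location)` and the function name are assumptions.

```python
import heapq
from collections import defaultdict


def earliest_transmission_chain(source, target, contacts, start=0):
    """Return (arrival_time, path) for the earliest time-respecting chain, or None.

    contacts: iterable of (a, b, t_s, t_e, location) tuples with t_s <= t_e.
    """
    adjacency = defaultdict(list)
    for a, b, t_s, t_e, _location in contacts:
        adjacency[a].append((b, t_s, t_e))
        adjacency[b].append((a, t_s, t_e))
    arrival = {source: start}  # earliest time each patient could carry the pathogen
    previous = {}
    queue = [(start, source)]
    while queue:
        time, patient = heapq.heappop(queue)
        if time > arrival[patient]:
            continue  # stale queue entry
        if patient == target:
            path = [patient]
            while path[-1] in previous:
                path.append(previous[path[-1]])
            return time, path[::-1]
        for other, t_s, t_e in adjacency[patient]:
            if t_e < time:
                continue  # contact ended before this patient could transmit
            reached = max(time, t_s)
            if reached < arrival.get(other, float("inf")):
                arrival[other] = reached
                previous[other] = patient
                heapq.heappush(queue, (reached, other))
    return None
```

A `None` result means that no time-respecting chain exists, e.g., when the second contact ended before the first one began.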

The Patient Timeline View (see Fig. 11) identifies a third infected patient, P152039 on ward S276, who tested positive before the number of infected patients increased (see the red vertical line in 2011). Since all three patients had contact with several other patients, they are regarded as patient-zero candidates and the potential origin of K. pneumoniae in this outbreak on the four wards S276, S278, S279 and S295 (Tasks 2 & 3); genetic analysis is required for confirmation.

To provide further support and verification for the hypothesized transmission pathways, the experts also used genome information of the pathogen. Comparing the genomes of bacterial strains adds further evidence to outbreak analysis. Details of this external analysis are provided in the annex in the supplementary material. The genome analysis confirms that eleven patients on wards S279 and S295 were infected by one K. pneumoniae strain, including the patient-zero candidate P113749. Interestingly, the other two candidates, P108085 and P152039, show substantial genome differences from this set of patients (see Fig. A1 in the annex). Thus, the expert assigns P113749 as patient zero of this outbreak on the wards. The genome analysis reveals P152039 as a further source of infection for at least one more patient on the wards. Hence, the two transmission pathways occurred simultaneously (Task 2.3). This is a novel insight.
Fig. 9a: Epidemic curve for new infections of K. pneumoniae, sum of whole hospital.

A further question is how many undetected transmissions patient zero P113749 caused and whether there are potentially other infected patients (Task 5). Therefore, the infection control expert opens a Contact Network View for this patient for the period in which the initial transmissions were detected, i.e., the beginning of November 2011 (see Fig. 12). The tracing interaction in this view highlights several potential transmissions to patients, both those who were and those who were not detected as infected during their stay on the wards. The Transmission Pathway View and the Patient Timeline View (see Fig. 1) show that the non-infected patients were screened regularly (e.g., rectal screening, urine, catheters) and tested negative.
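A contact network for a chosen period, like the one opened here, can be derived from ward stays by treating overlapping stays on the same ward as contacts. The following Python sketch assumes stays given as `(patient, ward, start, end)` tuples with inclusive integer times; the function name and this contact definition (same ward, overlapping time) are simplifying assumptions, since the system may also consider rooms or other contact types.

```python
from collections import defaultdict
from itertools import combinations


def contact_network(stays, window_start, window_end):
    """Return {frozenset({p, q}): [(ward, start, end), ...]} for stays overlapping in the window."""
    by_ward = defaultdict(list)
    for patient, ward, start, end in stays:
        # Clip each stay to the analysis window and drop stays outside it.
        s, e = max(start, window_start), min(end, window_end)
        if s <= e:
            by_ward[ward].append((patient, s, e))
    edges = defaultdict(list)
    for ward, ward_stays in by_ward.items():
        for (p, s1, e1), (q, s2, e2) in combinations(ward_stays, 2):
            if p == q:
                continue  # two stays of the same patient are not a contact
            s, e = max(s1, s2), min(e1, e2)
            if s <= e:  # the two stays overlap in time
                edges[frozenset((p, q))].append((ward, s, e))
    return dict(edges)
```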

The use case indicates that the visual analytics system enables infection control experts to conduct a classical epidemiological analysis in a fast, reliable and comprehensive way. Highlighting potential transmission events makes it possible to reveal transmission pathways within the hospital. This analysis took the hygienist 30 minutes with our tool; he estimated that it would normally take about two working days using his previous methods, since hygienists usually determine epidemic curves and contact networks manually from raw data in multiple information systems.

This qualitative evaluation focused on the Transmission Pathway View and used an online questionnaire. Twenty-five domain experts from seven institutions in Germany participated. All participants had several years of practical experience with infection control in hospitals: eleven were clinicians, epidemiologists or hygienists, six were medical data analysis experts, six were medical data experts, and two were healthcare managers. Overall, 60% of the participants were familiar with our interactive visualization, i.e., had participated in feedback sessions or had seen a live demonstration of the tool. At the beginning of the experiment, a one-paragraph explanation of each view was provided, with no further training. The online questionnaire showed the views from the use case and asked participants to read these visualizations and answer the usability and understandability questions presented in Figs. 13-15. Participants rated each question on a 1-5 Likert scale, with 5 meaning 'fully agree', and could provide free-text comments. Participants found the view useful (Mo = 5) and easy to understand (Mo = 4) (see Fig. 13). As expected, participants familiar with the tool gave higher scores. This result, together with the free-text feedback, indicates that training is needed to use the approach effectively.

We also assessed how well the views supported the intended tasks (see Fig. 14). Participants could identify contacts (Mo = 4) and understand their infection status (Mo = 5) (Task 2.1). They could clearly identify patient stay durations and movements into or out of the hospital (Mo = 4) (Task 3.1). Participants found it harder to identify the wards (Mo = 3) (Task 3.2). Free-text comments on the use of background color were mixed: some participants preferred it, while others did not. Therefore, users can enable or disable this encoding interactively.

Participants found the highlighting of potential transmission events very useful (Mo = 4) and understandable (Mo = 4) (see Fig. 15). They found the feature helpful in determining which contacts could lead to transmission and spread of pathogens (Mo = 4, 5). Feedback on support for long time periods when determining outbreak duration (Task 3) was also positive (Mo = 5). In free text, experts called for further pan and zoom support, which we addressed in the revised visual interface.

The comparison of the network diagram and pathway views showed that nine participants preferred the Transmission Pathway View and six preferred the Contact Network View, while ten preferred a combination of the two views. The distribution of preferences was the same for familiar and non-familiar participants. This result is expected, as the views target different tasks: "I need both views, as they help find an answer to different tasks." The Transmission Pathway View is useful for detailed spatio-temporal analysis of transmissions, and the Contact Network View is useful for an overview of contacts without temporal focus (e.g., finding a "super-spreader").

The free-text feedback was positive: "Interesting and well thought-out visualization. I was very enthusiastic about the presentation of this visualization. The visualization of such complex data is not trivial." Participants found that some training was required, but once they understood the view, they found it very useful: "I find the Transmission Pathway View really good. Of course, one has to get used to it first and learn to use it first." One expert suggested broader applicability: "I would strongly recommend this visualization to be shown to public health authority (Gesundheitsamt). It could be 'fit for purpose'."

The participants found that scalability in terms of the number of patients posed a significant challenge. They recommended a two-stage process: first filtering through the contact network and then showing the filtered data in the Transmission Pathway View. Four of the twenty-five participants wanted to change the colors (e.g., showing each patient in a different color, or using more prominent colors for the three areas: home, hospital, temporary home). One participant requested further differentiation between carriers and diseased patients, as this is important for certain pathogens. Several future extensions were suggested that mainly require additional data; we discuss them in Sect. 8.

The Transmission Pathway View layout was designed specifically for our tasks rather than to minimize edge crossings and wiggles. Still, the approach performs well on the benchmark Matrix dataset with 14 entities and 90 transfers [66] (see Fig. 16). We used a laptop with an Intel i7-9750H CPU (2.6 GHz) and 32 GB of memory. Pre-processing takes 18 ms, the initialization stabilizes after 30 iterations (62 ms), and the force-directed layout after 10 iterations (135 ms). The layout of the use case, with 27 entities and 704 transfers and the same number of iterations, took 2517 ms in total (190 ms + 1265 ms + 1062 ms). Computing the tracing interaction for all critical patients took 37 ms in total.
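Edge crossings, the quality measure reported for Fig. 16, can be counted for any storyline layout by checking, between consecutive time steps, how many pairs of lines swap their vertical order. The Python sketch below assumes a layout given as a list of time steps, each listing entity identifiers from top to bottom; it ignores lines that start or end between steps, and it is not the benchmark's official evaluation code.

```python
from itertools import combinations


def count_crossings(layout):
    """layout: list of time steps; each step lists entity ids from top to bottom."""
    crossings = 0
    for before, after in zip(layout, layout[1:]):
        pos_before = {e: i for i, e in enumerate(before)}
        pos_after = {e: i for i, e in enumerate(after)}
        # Only lines present in both steps can cross between them.
        shared = [e for e in before if e in pos_after]
        for x, y in combinations(shared, 2):
            # A crossing occurs when two lines swap their vertical order.
            if (pos_before[x] - pos_before[y]) * (pos_after[x] - pos_after[y]) < 0:
                crossings += 1
    return crossings
```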

Our data sets pose visual scalability challenges in terms of time span, number of patients, number of locations, number of patient transfers, and number of screenings and tests. We leverage interaction to scale the approach to larger data sets, combining several views and automatically highlighting transmission pathway events. Together, these allow users to explore pathways of up to a hundred patients (see Annex). Explicitly visualizing the number of forward and backward connections of a patient would help the system scale when tracing contacts. Other methods for increasing the scalability of our approach should be investigated.

The visual design of the Transmission Pathway View relies heavily on color to encode infection status and wards, which limits the number of wards that can be represented. Automatic detection of suspicious wards (i.e., wards with a sudden increase in the number of cases), along with other forms of automated support, would help with this scalability issue. To scale to a larger number of wards, interactive highlighting could help, and hulls could improve space efficiency [68]. The expert feedback also indicated a need for annotating the visualizations with genome data or with measures taken during the outbreak, which could be implemented.
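One simple way to flag suspicious wards would be to compare each ward's current case count with its own recent history. The Python sketch below flags a ward when its count exceeds the baseline mean plus `k` standard deviations; this threshold rule, the parameter defaults and the function name are assumptions for illustration, and dedicated outbreak detection algorithms would likely perform better.

```python
from statistics import mean, stdev


def suspicious_wards(history, current, k=2.0, min_periods=3):
    """Flag wards whose current count exceeds mean + k * SD of their own history.

    history: {ward: [case counts of earlier periods]}
    current: {ward: case count in the latest period}
    Returns {ward: (count, threshold)} for flagged wards.
    """
    flagged = {}
    for ward, count in current.items():
        baseline = history.get(ward, [])
        # stdev needs at least two values; too little history gives no endemic level.
        if len(baseline) < max(min_periods, 2):
            continue
        threshold = mean(baseline) + k * stdev(baseline)
        if count > threshold:
            flagged[ward] = (count, threshold)
    return flagged
```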

The data collection methodology and privacy considerations limit what can be analyzed. A core challenge is localizing contacts accurately, which is currently done via electronic health records. More accurate data on patient movements through the hospital, locations of rooms and beds, and information on where procedures take place (e.g., surgery, endoscopy, radiography) could help with certain visual analytics tasks. Technologies such as RFID could provide such data, but would raise significant privacy concerns. Also, current records do not indicate when a patient has recovered. Therefore, patients remain marked as infected for the remainder of the data set, which is often not the case. Recovery information needs to be extracted from the current raw data by new algorithms, which have to be adjusted to the ward, the patient type, the type of pathogen, and the tests.

Figure 16: Layout quality using the Matrix dataset [66]. (a) Comparable to StoryFlow [38]; (b) including start/end lines.
Layout          | Edge crossings | Runtime (ms) | Pre-processing (ms)
Ours            | 25 (a), 37 (b) | 197          | 18
StoryFlow [38]  | 14             | 160          | N/A

Additional features, such as comparing and matching antimicrobial profiles together with integrating genomic results of the pathogens, would help infection control experts automatically cluster patients by strain and thus better identify transmission pathways. We used an external tool to perform the genome analysis that verified the suggested transmission routes; the genome analysis visualization is not part of our system. However, since laboratories have access to near real-time genome analysis, these data could be extracted and used to match specific pathogens and help verify transmission pathway hypotheses. Moreover, the visual interface could be extended to support provenance of analysis and interaction. In the future, it would be interesting to encode the mode of transmission (e.g., airborne or contact) or other potential risk factors (e.g., diabetes, immunotherapy). Our approach was designed and implemented for transmission pathway reconstruction in hospitals. A modified version of the system has now been deployed in several hospitals for analyzing COVID-19 hospital-associated transmissions (see video in German at https://youtu.be/HAsb3dnUKyI). More generally, our approach could be used to visually analyze contact tracing graphs of localized outbreak clusters and in closed environments, such as cruise ships and buildings. The approach could also apply to other notions of spread, such as financial or information contagion. The tracing interaction could help with such reconstructions, but it would need to be adapted to the specifics of the application. For example, for information spread, the model would take into account people's beliefs about the information, i.e., whether a person supports or opposes it [9].

In this paper, we have presented a visual analytics system for visualizing transmission events and outbreaks in hospitals. The approach focuses on methods for reconstructing the transmission pathways of a given pathogen back to patient zero. It was developed through iterative, user-centered design with seven German hospitals and infection control institutes or units. As part of the design process, storyline visualizations were adapted to visualize temporal contact networks. Our approach was applied to a real outbreak of Klebsiella pneumoniae in a large German hospital. Infection control experts were able to effectively unravel the outbreak and reveal two distinct transmission pathways. Tracing back to the initial source, patient zero, was faster and more comprehensive than with existing methods. In our final qualitative user study with twenty-five users, we found significant value in the approach for making sense of outbreaks in hospitals.
This work was supported within the Medical Informatics Initiative (01ZZ1802B/HiGHmed).

To provide further support and verification for the hypothesized transmission pathways, the experts also used genome information of the pathogen. Comparing the genomes of bacterial strains adds further evidence to the analysis of outbreaks. The genome analysis tool is not part of the described system but is a standard tool for describing the relatedness of pathogens collected from patients.

In brief, the DNA of each K. pneumoniae isolate was extracted (using the Qiagen DNeasy Blood & Tissue kit) and sequenced using the Illumina MiSeq system (sequences available from the authors on request).

In total, the genomes of 23 bacterial isolates were analyzed and compared using the whole-genome MultiLocus Sequence Typing (wgMLST) scheme for K. pneumoniae [11] (BioNumerics v7.6, Applied Maths NV). The results represent the genomic relationships of the bacterial strains as a minimum spanning tree. Identical or very similar genomes (fewer than five allelic differences) were regarded as the same strain and support a direct transmission. The genome analysis confirms that eleven patients on the subwards S279 and S295 were infected by one K. pneumoniae strain, including the patient-zero candidate P113749. The bacterial strains of these patients show no differences in the wgMLST profile. Interestingly, the other two patient-zero candidates, P108085 and P152039, show substantial differences from this cluster of patients (see Fig. A1). Based on these data, the expert assigns patient P113749 as patient zero of this particular outbreak on the subwards. The genome analysis reveals patient P152039 as a further source of infection for at least one more patient on the considered wards. Thus, the two transmission pathways occurred simultaneously.
Figure A1: The genome analysis of patients infected by K. pneumoniae. The numbers on the lines indicate the number of differing alleles in the wgMLST scheme. Bacterial strains were regarded as identical/similar when fewer than five alleles differed. The visualization is a minimum spanning tree created with BioNumerics.
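The clustering rule behind Fig. A1 can be approximated with a short Python sketch: pairwise allele distances between wgMLST profiles, a minimum spanning tree (Prim's algorithm), and clusters formed by MST edges with fewer than five differing alleles. The profile representation (tuples of allele numbers, `None` for missing loci) and all function names are assumptions; BioNumerics uses its own distance and tie-breaking settings, which this sketch does not replicate.

```python
def allele_distance(p, q):
    """Number of loci at which two wgMLST profiles differ; missing alleles (None) are ignored."""
    return sum(1 for a, b in zip(p, q) if a is not None and b is not None and a != b)


def minimum_spanning_tree(profiles):
    """Prim's algorithm on the complete graph of isolates; returns (u, v, distance) edges."""
    names = list(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge connecting the current tree to a new isolate.
        u, v, d = min(
            (
                (a, b, allele_distance(profiles[a], profiles[b]))
                for a in in_tree
                for b in names
                if b not in in_tree
            ),
            key=lambda edge: edge[2],
        )
        in_tree.add(v)
        edges.append((u, v, d))
    return edges


def strain_clusters(profiles, threshold=5):
    """Group isolates linked by MST edges with fewer than `threshold` allelic differences."""
    parent = {name: name for name in profiles}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, d in minimum_spanning_tree(profiles):
        if d < threshold:
            parent[find(u)] = find(v)
    groups = {}
    for name in profiles:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=lambda g: sorted(g))
```

Cutting MST edges at the threshold yields the same groups as single-linkage clustering, so the result does not depend on which of several equally short edges the tree happens to use.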

Interactive exploration and understanding of contagion dynamics in networked populations

The state-of-the-art of set visualization

Too much time not enough space: Multiscale event sequence visual analytics with m-SVEN

The y of it matters, even for storyline visualization

Small multipiles: Piling time to explore temporal patterns in dynamic networks

GraphDiaries: Animated transitions and temporal navigation for dynamic networks

Time Curves: Folding time to visualize patterns of temporal evolution in data

Storyline visualizations of eye tracking of movie viewing

A new rumor propagation model and control strategy on social networks

A taxonomy and survey of dynamic graph visualization

Genomic definition of hypervirulent and multidrug-resistant Klebsiella pneumoniae clonal groups

D3: Data-driven documents

Integrating predictive analytics into a spatiotemporal epidemic simulation

Visual analytics for evaluating clinical pathways

Same stats, different graphs

What is visualization really for

RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism

TextFlow: Towards better understanding of evolving topics in text

Hospital networks and the dispersal of hospital-acquired pathogens by patient transfer

IPSep-CoLa: An incremental procedure for separation constraint layout of graphs

In-hospital costs of community-acquired colonization with multidrug-resistant organisms at a German teaching hospital. BMC Health Services Research

Comparison of statistical algorithms for the detection of infectious disease outbreaks in large multiple surveillance systems

ManyNets: An interface for multiple network analysis and visualization

Minimizing wiggles in storyline visualizations

Nosocomial Infections and Multidrug-resistant

Crossing minimization in storyline visualization

EventThread: Visual summarization and stage analysis of event sequence data

HiGHmed: An open platform approach to enhance care and research across institutional boundaries

Proact: Iterative design of a patient-centered visualization for effective prostate cancer health risk communication

The graph landscape: using visual analytics for graph set analysis

Constructing and evaluating visualisation task classifications: Process and considerations

Tracing genealogical data with timenets

On minimizing crossings in storyline visualizations

DPVis: Visual analytics with hidden markov models for disease progression pathways

RetainVis: Visual analytics with interpretable and interactive recurrent neural networks on electronic medical records

Dynamic network plaid: A tool for the analysis of dynamic networks

Simulation of the spread of epidemic disease using persistent surveillance data. Simulation

StoryFlow: Tracking the evolution of stories

Aggregated dendrograms for visual comparison between many phylogenetic trees

A pandemic influenza modeling and visualization tool

Fridkin, and the Emerging Infections Program Healthcare-Associated Infections & Antimicrobial Use Prevalence Survey Team. Multistate point-prevalence survey of health care-associated infections

Temporal event sequence simplification

Visual analysis for hospital infection control using a RNN model

A nested model for visualization design and validation

TreeJuxtaposer: Scalable tree comparison using focus+context with guaranteed visibility

Timesets: Timeline visualization with set relations

Software evolution storylines

Yarn: generating storyline visualizations using HTN planning

A system for generating storyline visualizations using hierarchical task network planning

A survey of visual analytics for public health

Visual analysis of parallel interval events

Storycake: A hierarchical plot visualization method for storytelling in polar coordinates

Visualizing the evolution of community structures in dynamic social networks

Visual analytics of electronic health records with a focus on time

Interactive information visualization to explore and query electronic health records

Mapping change in large networks

GEMFsim: A stochastic simulator for the generalized epidemic modeling framework

Combining visual cleansing and exploration for clinical data

Design study methodology: Reflections from the trenches and the stacks

Trains of thought: Generating information maps

Storygraph: Extracting patterns from spatio-temporal data

Variantflow: Interactive storyline visualization using force directed layout

Storyline visualization with force directed layout

Drawing dynamic graphs without timeslices

Event-based dynamic graph visualisation

Movie interaction dataset

An efficient framework for generating storyline visualizations from streaming data

Design considerations for optimizing storyline visualizations

iStoryline: Effective convergence to hand-drawn storylines

Reducing snapshots to points: A visual analytics approach to dynamic network exploration

Block crossings in storyline visualizations

BayesPiles: Visualisation support for bayesian network structure learning

Visual analytics methods for categoric spatio-temporal data

Visual analysis of contagion in networks

Visualization system requirements for data processing pipeline design and optimization

Visual analysis of graphs with multiple connected components

Visual-Interactive Exploration of Pathogen Outbreaks in Hospitals

Timenotes: a study on effective chart visualization and interaction techniques for time-series data

Nonuniform timeslicing of dynamic graphs based on visual complexity

Computing storyline visualizations with few block crossings

Visual analysis of probabilistic infection contagion in hospitals

A layout technique for storyline-based visualization of consecutive numerical time-varying data

Pandemcap: Decision support tool for epidemic management

Supervised learning improves disease outbreak detection

Acknowledgments
We would like to thank all infection control experts involved, in particular A. Wulff, P. Biermann, S. Rey, C. Baier and M. Kaase, for their helpful feedback. This work was supported by the German Federal Ministry of Education and Research (BMBF).