key: cord-0331268-2jcg70n0
authors: Franke, Loraine; Weidele, Daniel Karl I.; Zhang, Fan; Cetin-Karayumak, Suheyla; Pieper, Steve; O'Donnell, Lauren J.; Rathi, Yogesh; Haehn, Daniel
title: FiberStars: Visual Comparison of Diffusion Tractography Data between Multiple Subjects
date: 2020-05-16
journal: nan
DOI: 10.1109/pacificvis52677.2021.00023
sha: 11c77db495d671dc44950e99b49ab80a7eed1562
doc_id: 331268
cord_uid: 2jcg70n0

Tractography from high-dimensional diffusion magnetic resonance imaging (dMRI) data enables analysis of the brain's structural connectivity. Recent dMRI studies aim to compare connectivity patterns across subject groups and disease populations to understand subtle abnormalities in the brain's white matter connectivity and distributions of biologically sensitive dMRI-derived metrics. Existing software products focus solely on the anatomy, are not intuitive, or restrict the comparison of multiple subjects. In this paper, we present the design and implementation of FiberStars, a visual analysis tool for tractography data that allows the interactive visualization of brain fiber clusters by combining existing 3D anatomy with compact 2D visualizations. With FiberStars, researchers can analyze and compare multiple subjects in large collections of brain fibers using different views. To evaluate the usability of our software, we performed a quantitative user study. We asked domain experts and non-experts to find patterns in a tractography dataset with either FiberStars or an existing dMRI exploration tool. Our results show that participants using FiberStars can navigate extensive collections of tractography faster and more accurately. All our research, software, and results are openly available.

In recent years, studying the brain and its neural connectivity has become an emerging discipline among various research fields. In particular, diffusion magnetic resonance imaging (dMRI) is currently the only technique that enables tracing the structural anatomy of white matter tracts in vivo in the human brain. DMRI is sensitive to molecular water diffusion, characterizes subtle changes in the brain microstructure, and measures structural connectivity abnormalities in white matter tracts [5]. To analyze the white matter tracts and construct maps or diagrams of the brain's structural connectivity with high-resolution images, researchers use a process called tractography [6, 17, 37]. DMRI tractography has gained popularity in clinical practice and in research on brain diseases such as autism, multiple sclerosis, stroke, dementia, and schizophrenia [3, 54]. Moreover, dMRI is a powerful tool to track and detect disruptions in structural connectivity related to brain diseases and disorders [22, 23]. For example, comparing the brain connectivity of healthy and disease populations is critical to understanding the potential pathology.

Tractography data needs interpretation to be useful, and visualizations are therefore required to understand the underlying tissue microstructure of fiber tracts. High-dimensional fiber tracking datasets can be tens of gigabytes in size and contain millions of fibers; together with their spatial 3D characteristics, this poses fundamental challenges for data exploration and visualization. Tractography data can include millions of 3D polylines, with each line representing the path of a single white matter tract. These lines can form a fiber bundle, also called a cluster. The data is highly variable across fiber clusters and subjects.

Our main goal is to visualize fiber data efficiently, in a way that allows comparisons between different clusters or bundles and between subjects. We choose to pair existing 3D visualizations with a 2D approach to reduce the complexity of the data. FiberStars aims to assist researchers from various disciplines, including neuroscientists, neurosurgeons, and psychologists. State-of-the-art tools used by clinicians involve an extensive workload and require users to navigate through massive amounts of data on their own. We design a web-based analysis tool for brain connectivity research that is easy to use for novices and experts alike. Another goal is that users do not need a detailed understanding of complex relationships and patterns in the data. With FiberStars, users can generate and validate new hypotheses when comparing tractography of multiple subjects at several levels of abstraction. We visualize large multi-subject datasets in a projection view that shows the overall distribution. A compact 2D representation creates fingerprints for different subjects and fiber clusters. Finally, we support the paired visualization of 3D anatomical renderings with 2D representations across multiple subjects and multiple regions of interest on different devices. We build on existing visualization research that has demonstrated the effectiveness of additional two-dimensional representations in medical imaging, such as for connectomics [2] or cerebral arteries [47].

While dMRI is a very specific use case, it is representative of a large class of complex visualization challenges involving multidimensional data or data composed of collections of subjects, each with multiple 3D shapes with spatially varying properties. Other contexts yielding similar visualization challenges are, for example, the visualization of biological diversity measured with microCT or surface scanning of specimens, or the comparison of large multidimensional astronomical data collected with multispectral telescopes. Overall, increasing amounts of these types of data are being collected in a variety of fields, while most current visualization methods are still insufficient.

Therefore, we present the design, implementation, and evaluation of FiberStars for the specific use case of dMRI data. However, we hope to inspire and contribute to other fields with similar data and visualization challenges. Our application facilitates the analysis of high-dimensional diffusion MRI data with different levels of abstraction. We focus specifically on ensemble visualization to allow the direct comparison of regions of interest and across multiple subjects. Such comparisons are important as tractography datasets are getting larger and include multiple timestamps. FiberStars maximizes usability, and our quantitative user study shows that novices without any tractography experience can generate meaningful insights. We also evaluate FiberStars with tractography experts and show that our software allows faster and more precise analysis compared to alternative state-of-the-art tools. All our materials and software with documentation are openly available on GitHub at https://github.com/lorifranke/FiberStars.

Among various scientific disciplines, the development of interactive three-dimensional renderings plays an increasingly important role in data and information visualization. In recent years, the field of neuronal connectivity visualization of brain imaging data has emerged. Current tools and libraries such as XTK [25] and Fiberweb [35] contribute to this task and allow web-based 3D renderings. However, such visualizations can be complex and hard to understand. Many works present approaches to visually explore the complex topology of biological datasets [16, 29, 30, 48] . To further decrease the cognitive load for visualization consumers, researchers suggest visualizing high-dimensional data with a reduced representation for data exploration and analysis [8] .

Neurolines [2] also provides dimensionality reduction. Here, researchers visualize 3D brain tissue data as a 2D subway map. This multi-scale approach allows scientists to study connectivity with much greater ease than working in 3D. Further, Mohammed et al. [43] visualize similar datasets with different levels of detail. Another example is Jianu et al. [29], where abstract 2D paths represent brain fibers while preserving anatomical information. Further literature aims to cluster fibers automatically by providing similarity measures among fibers or whole fiber bundles. The remaining need for exploring fiber bundles was addressed by [13-15, 32, 62]. However, none of these visualizations supports the analysis of multiple data points across different subjects. Other related research has shown comparative visualizations, for example, using fMRI and multivariate clinical data [31] with parallel coordinates, or tensor changes in DTI with scatterplots [1].

However, to allow comparative visualizations of DTI data, researchers need software that allows cohort and ensemble visualizations. One example is DiffRadar by Mei et al. [42] , a combined 2D dimensionality reduction with 3D fiber visualization using multidimensional scaling (MDS). Yet, to allow MDS, all fibers need to have the same number of vertices, and the data requires resampling, which might lead to distortion.

Another available application was created by Yeatman et al. [60]. The authors propose the AFQ-Browser (Automated Fiber Quantification), which is related to a previous approach called BundleMap [34]. This tool enables quantitative analysis of white matter fiber tracts and comparisons across different subjects. The authors compare healthy subjects with subject groups suffering from multiple sclerosis. In initial experiments using AFQ-Browser on our data, we faced challenges in comparing effectively across subjects. We design FiberStars to overcome these limitations and carefully compare our software to the AFQ-Browser in this paper. In Section 5, we evaluate the limitations of each application in more detail.

DMRI by Basser et al. [5] allows exploring information from in vivo fibrous structures such as white matter or muscles and is widely used across hospitals, universities, and research centers. In collaboration with neuroscience researchers, we studied how we can efficiently visualize dMRI data of single and multiple subjects.

Tractography data represents trajectories of fibers (or streamlines) in the white matter. These paths are positional data, each a series of vertices (x, y, z) in 3D. Researchers then often attach per-vertex scalars or per-fiber properties to include additional information such as acquisition parameters, diffusion properties, or quantities obtained during processing or analysis. Researchers use this information to estimate cellularity, size of cell bodies and processes, or presence of myelin during the diffusion process [4].

ADHD dataset. We tested our tool on different datasets. The first dataset contains dMRI scans of subjects suffering from attention deficit hyperactivity disorder (ADHD) [50, 59, 61]. High-resolution MR images were obtained on a Siemens 3T scanner at Boston Children's Hospital, Boston, USA, with approval of the local ethics board. Multi-shell diffusion-weighted imaging (DWI) data were acquired using a simultaneous multi-slice acquisition factor of 2 at a spatial resolution of 2 × 2 × 2 mm³ with 70 gradient directions spread over the three b-value shells of 1000/2000/3000 s/mm². Whole-brain tractography was conducted using the unscented Kalman filter (UKF) tractography method from the ukftractography package [40, 49, 51]. The scalars recorded during fiber tracking include the normalized signal estimation error, signal means, return-to-origin probability (RTOP), return-to-plane probability (RTPP), and return-to-axis probability (RTAP). The ADHD dataset includes 67 subjects, each containing 800 clusters.

HCP dataset. Our second dataset is from the Human Connectome Project (HCP) [56]. HCP data was acquired with a customized Connectome Siemens Skyra scanner and the acquisition parameters TE = 89.5 ms, TR = 5520 ms, phase partial Fourier = 6/8, and voxel size = 1.25 × 1.25 × 1.25 mm³. A total of 288 images were acquired for each subject, including 18 baseline images with low diffusion weighting b = 5 s/mm² and 270 diffusion-weighted images evenly distributed over three shells of b = 1000/2000/3000 s/mm². Scalars include diffusion measures of the fractional anisotropy (FA), mean diffusivity (MD), and the hemisphere location. Changes in these diffusion scalars are considered to reflect alterations of the underlying tissue microstructure. Quantifying changes is helpful for monitoring disease and abnormalities, for example, inflammation, cell death, changes in myelination, edema, gliosis, an increase in connectivity of crossing fibers or in extra- or intracellular water, and many more [7, 46]. Besides scalars, each fiber bundle contains cell data with properties such as embedding coordinate, cluster number, embedding color, total fiber similarity, and measured fiber similarity. In particular, total and measured fiber similarity in a fiber tract are of special interest in terms of crossing fiber bundle comparisons. Interpreting changes in those scalar measurements is a complex task due to their non-specificity. Our tool supports any type and number of scalars attached to the fiber data. The data is available in the .TRK and .VTP file formats; VTP is an XML-based format. Both datasets include data of the right and left hemisphere of the brain. Each file includes data for one fiber bundle. The HCP dataset includes 100 subjects and additional metadata (such as patient demographics) as CSV files.

In regular meetings with our collaborating scientists, we discussed goals and possible visualization designs. Through semi-structured interviews, we explored which types of visualizations are helpful for domain-specific tasks. As most recent works show limitations in terms of multi-subject, multi-cluster comparisons, we decided to develop FiberStars. We derived the following requirements for a visualization tool:

(R1) Multi-resolution: Multi-level visualizations from individual fiber clusters to whole-brain analysis. Most existing work focuses on single-fiber visualization only, which is useful for surgical planning and individual diagnosis in a clinical environment. Besides retrieving information about a certain fiber bundle, the system should support direct comparisons on the level of multiple fiber bundles of a subject's brain.

(R2) Multi-subject comparison: Allow comparisons between multiple subjects. Our collaborators explicitly need group comparisons to study longitudinal scans from healthy subjects and those who suffer from diseases. The original purpose of FiberStars was to use the software for the Adolescent Brain Cognitive Development (ABCD) study. ABCD is the largest long-term study of brain development and child health in the United States [28]. With a targeted visualization tool, we can associate levels of brain development between subjects with confounding factors such as water quality, pollution, social and lifestyle behaviors, physical activity, and others. Therefore, our domain experts need to assess anatomical structures, and the system should provide three-dimensional views paired with 2D views to make white matter tract differences in tractography easier and faster to identify.

(R3) Interpretability and Usability: Provide an intuitive visual design that allows fast comparison and detection of group patterns, outliers, or abnormalities in the high-dimensional data. Even without prior knowledge of patterns in the data, the user can gain new insight into the dataset and identify subjects or clusters that differ from others depending on the properties or scalars. This can help domain experts to explore data faster and more easily. The targeted visualization tool requires scalable, interactive elements that allow working with high-dimensional data.

Our domain experts work with extensive dMRI data collections that require a complex computational infrastructure for processing, storage, and visualization. FiberStars provides a web-based user interface to explore, view and query the data for a user-defined set of subjects from an entire cohort of subjects across different diagnostic categories, including anatomical 3D visualizations. Our researchers perform data quality control in collections of unprecedented size, test hypotheses and answer research questions by comparing fiber tract changes in longitudinal studies for brain development as presented in R2. Clinical studies include the analysis of factors combined with a pathological finding, e.g. the fractional anisotropy (FA) in the corpus callosum in patients with auditory verbal hallucinations is reduced compared to the controls [19] .

We derived multi-level tasks (R1/R2) with the task taxonomy of Brehmer and Munzner [12]:

T1: Analyze an abnormal measurement. When a possible outlier or abnormal cluster of a subject is pre-selected or has been identified in a previous step, the user might want to investigate how this measurement contributes to its abnormality. For example, we check if other measurements/scalars are abnormal or identify the abnormality's location along the fiber tract by browsing for deviations. T1 is derived from the interpretability and usability requirement R3 and tests both 2D and 3D representations in each tool for the simple case of a single subject and a single cluster. The user is asked to browse and compare the relevant measurement as input to obtain an anomaly as output.

T2/T3: Comparing. With relevant measurements as input, the user explores, compares, and identifies [12] a certain measurement or scalar value. Users can test prior assumptions and hypotheses about the selected cluster or subject of interest. In T2, the user has previously identified a subject of interest with anomalies and then compares the previously identified cluster to multiple, or all other, clusters in a single subject's brain (R1). Conversely, in T3, a possible scenario is to compare an identified cluster to the same cluster in other, apparently healthy subjects (R2).

T4: Identify anomalies/outliers and extreme values among multiple subjects and clusters. For an in-depth analysis, the user needs to discover, explore, and identify a relevant cluster or subject in a full collection (R2). Additionally, the user wants to retrieve information, such as the subject's metadata. According to [12], the user should discover, explore, and identify one relevant cluster within the input of all available clusters.

T5: Interaction within a cluster. Currently, our domain experts are unable to quantify differences in tract shape automatically. With FiberStars' 3D visualization and different color schemes, the user can interactively analyze the dMRI data. T5 measures overall performance and usability (R3). The user is asked to navigate, select, and change a given anomaly, attribute, or feature with a different color or cluster as output.

This section describes the final design choices of FiberStars.

To tackle the challenge of creating a customized visualization software for the exploration process, we conceptualized FiberStars in close collaboration between neuroscientists and visualization researchers working with DTI data in an iterative process. Our collaborators regularly work with large-scale tractography data. The following five visual components offer tractography exploration on different levels and controlled navigation of dMRI data collections:

Universal Toolbar. Users can navigate through different views by selecting and deselecting options in the navigation toolbar on the left. First, users select the desired subjects and clusters. Once one or more subjects are selected, the fiber tracts are projected in the two-dimensional projection view, and the navigation toolbar offers additional options (R1). The user can select the scalar of interest and has different options for coloring the points in the projection view (see Section Projection View). The user can easily switch between the multi-cluster view and the split-screen view by enabling the 3D slider button (R3). Furthermore, the navigation toolbar offers dropdown menus with different ways to color the 3D fiber tracts.

Projection View. The projection view (Fig. 2) of FiberStars serves as an entry point for users without in-depth a priori knowledge about clusters. In the Universal Toolbar, users can select subjects s ∈ S from the data collection. As we currently only support manual sampling of subjects, automated or guided approaches are subject to future work. Only after a subject is drawn from the collection are its associated clusters C_s loaded lazily into FiberStars.

Then, for every cluster c ∈ C, where C denotes the set of all loaded clusters, we seek layout coordinates x_c ∈ ℝ^d in the Euclidean plane (d = 2). In particular, we would like to preserve distances δ(c_i, c_j) ≈ ||x_{c_i} − x_{c_j}|| with c_i, c_j ∈ C and δ : C × C → ℝ a distance function operating in the high-dimensional space of cluster scalars. In our application we find this problem formulation, known as Multidimensional Scaling (MDS) [55], to be favorable over alternative dimensionality reduction techniques [39, 41]. Firstly, a less distorting approach allows for more intuitive reasoning, especially when expanding the exploration process from smaller to larger numbers of scalars (R1/R2). For example, consider a domain expert primarily interested in the relationship between the following scalars: Fractional Anisotropy (FA) and Estimated Uncertainty (EU). In this case, the MDS solution in the projection view reduces to a simple scatter-plot-like layout as a starting point. The domain expert can use the universal toolbar to add or replace scalars of interest, which helps to form an intuition about the impact of the different variables. Further guidance in the resulting layout could be provided by adding artificial data points, which can be obtained by fully maximizing a single scalar while fixing all others to the minimum [58]. Secondly, we find that an analytical solution to the dimensionality reduction problem tends to be more user-friendly in that it is robust and requires no fine-tuning of additional parameters (R3). In FiberStars, our method of choice is PivotMDS [10], as it satisfies the above requirements and can efficiently scale to very large data sets. Mei et al. [42] use a related technique, Landmark MDS [18], to compute projections for fibers within a cluster. However, Brandes and Pich [11] show that PivotMDS is superior to Landmark MDS in general graph layouts, a closely related problem. For better readability of individual data points in the already colorful space, we forgo the density map overlay suggested in [42].
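To make the projection step more concrete, the following is a minimal sketch in the spirit of PivotMDS, not the FiberStars implementation. It assumes that each cluster is described by a vector of scalar means, that δ is the Euclidean distance between these vectors, and that pivots are drawn at random; the function names `pivot_mds` and `_top_eigenvector` are hypothetical. The sketch double-centers the squared point-to-pivot distances into a matrix C, extracts the two leading eigenvectors of CᵀC by power iteration, and projects C onto them.

```python
import math
import random


def _top_eigenvector(matrix, rng, scale, iterations=200):
    """Power iteration on a symmetric PSD matrix: (eigenvalue, unit eigenvector)."""
    k = len(matrix)
    v = [rng.uniform(-1.0, 1.0) for _ in range(k)]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm <= 1e-9 * scale:  # remaining spectrum is numerically zero
            return 0.0, [0.0] * k
        v = [x / norm for x in w]
    return norm, v


def pivot_mds(points, num_pivots=50, seed=0):
    """Project scalar vectors to 2D; returns one [x, y] pair per input point."""
    n = len(points)
    if n == 0:
        return []
    k = min(num_pivots, n)
    rng = random.Random(seed)
    pivots = rng.sample(range(n), k)  # assumption: random pivot selection
    # Squared Euclidean distances from every point to every pivot (n x k).
    d2 = [[sum((a - b) ** 2 for a, b in zip(points[i], points[p])) for p in pivots]
          for i in range(n)]
    # Double centering of the squared distances.
    row_mean = [sum(row) / k for row in d2]
    col_mean = [sum(d2[i][j] for i in range(n)) / n for j in range(k)]
    total_mean = sum(row_mean) / n
    c = [[-0.5 * (d2[i][j] - row_mean[i] - col_mean[j] + total_mean)
          for j in range(k)] for i in range(n)]
    # Small k x k matrix C^T C whose leading eigenvectors define the layout axes.
    m = [[sum(c[i][a] * c[i][b] for i in range(n)) for b in range(k)]
         for a in range(k)]
    scale = sum(m[a][a] for a in range(k))
    coords = [[0.0, 0.0] for _ in range(n)]
    for axis in range(2):
        lam, v = _top_eigenvector(m, rng, scale)
        for i in range(n):
            coords[i][axis] = sum(c[i][j] * v[j] for j in range(k))
        # Deflate so the next pass finds the second eigenvector.
        for a in range(k):
            for b in range(k):
                m[a][b] -= lam * v[a] * v[b]
    return coords
```

Because only a k × k eigenproblem is solved, the cost grows linearly with the number of clusters for a fixed number of pivots, which is the property that makes this family of methods attractive for large collections.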

Via the Universal Toolbar, the user can further map cluster scalar or subject attribute values to points in the projection. The technique allows the user to select multiple scalars as dimensions and set different colors for the individual points (e.g., color points by gender, age, or cluster). Moreover, the projection view hosts two action listeners (Fig. 2): (1) Upon hovering over a cluster point, a detailed pop-up summarizes statistics about the cluster and its corresponding subject, and also displays the abstract cluster representation in the form of a radar chart that is further described in the next part of this section. (2) Via rectangular brushing in the projection view, the user can choose clusters of interest, which are then added to the alternative views across the application. Since we always display all selected clusters for all selected subjects in the following views of the application, we highlight all these clusters in the projection, even if they are outside of the drawn selection window. We find that this feature has a useful side effect: corresponding clusters in other subjects can be identified more quickly (R2/R3). Thus, the distribution of these grouped clusters can be assessed directly within a potentially crowded point cloud.
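The brushing behavior described above can be sketched as follows. This is an illustration under assumed data structures, not the FiberStars code: each projected point is taken to be a dictionary with the hypothetical keys `subject`, `cluster`, `x`, and `y`, and the function name `brush_selection` is likewise an assumption. The sketch returns both the points inside the brush rectangle and the full set of points to highlight, which includes the same cluster IDs in all other selected subjects even when those points lie outside the rectangle.

```python
def brush_selection(points, x_min, x_max, y_min, y_max):
    """Return (brushed, highlighted) point lists for a rectangular brush.

    brushed:     points that fall inside the rectangle.
    highlighted: every point whose cluster ID was brushed, in any subject.
    """
    brushed = [
        p for p in points
        if x_min <= p["x"] <= x_max and y_min <= p["y"] <= y_max
    ]
    cluster_ids = {p["cluster"] for p in brushed}
    highlighted = [p for p in points if p["cluster"] in cluster_ids]
    return brushed, highlighted


# Example: cluster 7 of subject A is brushed; cluster 7 of subject B lies
# outside the rectangle but is still highlighted.
example_points = [
    {"subject": "A", "cluster": 7, "x": 0.2, "y": 0.3},
    {"subject": "B", "cluster": 7, "x": 0.9, "y": 0.8},
    {"subject": "A", "cluster": 12, "x": 0.7, "y": 0.1},
]
brushed, highlighted = brush_selection(example_points, 0.0, 0.5, 0.0, 0.5)
```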

Abstract Representation. We choose consistent coloring of recurring elements and information throughout FiberStars. For example, the same subject and the same cluster are always highlighted in the same color, so that users do not get lost in the wealth of information. We implement color consistency in all views of the application. Furthermore, we leverage pop-up elements that help the user stay oriented across the different linked views (R3). In the projection view, we provide a pop-up when hovering over a single cluster point, as seen in (1) in Figure 2, which shows more details of the cluster and the region of interest.

Comparison Matrix. The Multi-Cluster view (or Comparison Matrix) in Figure 3 enables a comparison between multiple subjects and multiple fiber clusters. In this view, the user selects any number of subjects and clusters of interest in the navigation toolbar (R2). For each cluster, one 2D radial plot is shown with normalized scalars on each axis. Displaying multiple radial plots next to each other, as in Figure 3, makes it easier for the user to see the scalar differences between clusters. In particular, radial plots allow fast outlier detection in the scalars and properties when multiple charts are shown next to each other. We evaluated alternative plots and designs for multi-dimensional data, for example, line graphs as in [60], parallel coordinates similar to [31], circular plots (which we used in prototypes), heat maps, and scatter plots. For exploratory purposes, radial axes plots are useful to obtain an overview of the data [52]. Moreover, because star-based visualizations are less visually complex, a user can find outliers or extreme values on the star axes faster when the plots are juxtaposed [20]. Other information displayed in the Multi-Cluster view includes subject demographics and overall statistics such as mean fiber length and total fiber similarity. After filtering by certain clusters and subjects of interest, the user can switch back to the 3D split-screen view at any time (R1).

3D Split-Screen Visualization. The split-screen view is shown in Figure 1. Here we integrate a typical 3D scientific view with additional information from the two-dimensional radial plots. The Split-Screen view allows comparing fibers from multiple subjects side by side (R2). Camera interaction can be synced across all 3D views. Additionally, users can choose between 120 available color maps and color the 3D fibers by selecting a scalar of interest to map. This enables finding areas along the fiber bundle with high or low scalar values, and the user can easily switch between scalars of interest. Domain scientists want to verify assumptions or hypotheses by additionally consulting the 3D anatomy. With the 3D view, experts can locate exact areas along the fiber bundle that contain anomalies or relevant measurements and draw conclusions for specific regions of interest in a subject's brain. A typical use case of the final FiberStars application can be seen in Figure 1. The three-dimensional fiber clusters in the background can be explored interactively. Multiple subjects are shown next to each other, allowing interpretation of the 3D illustration together with the 2D data summaries.

FiberStars builds interactive 2D and 3D visualizations with JavaScript libraries. We use VTK.js for 3D renderings, a JavaScript implementation of the VTK software toolkit [53] which uses WebGL. For 2D visualizations, we included D3.js [9]. FiberStars is a Node.js application using React.js for the frontend. Adopting Node.js allows for flexible extensions in the future, such as additional visualizations, statistical plots, and other features. Being web-based software, FiberStars runs hosted on a web server and does not require any client-side installation. For the first prototype, we brainstormed and experimented with different 2D charts to represent the scalars, where we grouped the data by scalar types and averaged the values of all scalars for all fibers. Then, we mapped these values to 2D using radial plots as described in Section 4.1. However, scaling was important. For example, in the ADHD dataset the scalar Normalized Signal Estimation Error has values between 0 and around 0.05, whereas the Estimated Uncertainty ranges from negative values up to 22,000. We now perform min-max feature scaling to normalize the ranges and allow comparison among subjects and clusters. React.js improves the overall usability and facilitates integration of additional components and features. With elements from the Material-UI framework, we were able to implement a modern user interface. This leads to the final design of FiberStars, which seamlessly integrates different linked and interactive views that allow multi-dimensional data exploration.

We support multiple tractography data formats. Our expert collaborators use VTK PolyData to store fiber clusters with a VTP file extension, offering a flexible data model and storing vertices, per-vertex scalars, and per-fiber properties as well as metadata without restrictions. VTK.js includes functionality to load fiber clusters from VTP files in JavaScript. Each cluster is then represented as VTK polydata, describing a surface mesh structure that holds additional data arrays in points, cells, or in the dataset itself. After loading these arrays, we calculate the means of all assigned scalars and properties per fiber and cluster. However, tractography clusters stored as VTP files can be on the order of hundreds of megabytes, and a whole-brain tractography can be as large as multiple terabytes in size.
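The averaging and normalization steps described above can be sketched as follows. This is a simplified illustration rather than the FiberStars code: it assumes that each fiber is a dictionary mapping a scalar name to its per-vertex values, that the cluster mean is the average of the per-fiber means, and that min-max scaling is applied per scalar across all clusters being compared. The function names and the scalar keys in the example (`err`, `eu`) are hypothetical.

```python
from statistics import fmean


def cluster_scalar_means(fibers):
    """Average each scalar per fiber, then average the fiber means per cluster."""
    names = list(fibers[0].keys())
    return {
        name: fmean([fmean(fiber[name]) for fiber in fibers])
        for name in names
    }


def min_max_scale(rows):
    """Rescale each scalar to [0, 1] across rows (one dict per cluster).

    A scalar that is constant across all rows is mapped to 0.0.
    """
    names = list(rows[0].keys())
    scaled = [{} for _ in rows]
    for name in names:
        values = [row[name] for row in rows]
        low, high = min(values), max(values)
        span = high - low
        for out, value in zip(scaled, values):
            out[name] = (value - low) / span if span else 0.0
    return scaled


# Example with ranges similar to those mentioned in the text: one scalar in
# [0, 0.05] and one spanning negative values up to 22,000.
rows = [
    {"err": 0.0, "eu": -100.0},
    {"err": 0.05, "eu": 22000.0},
    {"err": 0.025, "eu": 10950.0},
]
radar_axes = min_max_scale(rows)
```

After scaling, both scalars share the range [0, 1], so they can be drawn on the axes of the same radial plot without the larger-valued scalar dominating the shape.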

Recently, we developed the Trako Compression Scheme [24] and integrated it into FiberStars. Trako allows compressing .trk, .tck and .vtp files while achieving data reductions of over 28x. While Trako uses lossy compression, our experiments show no loss of statistical significance when replicating analysis from previously published tractography experiments. Paired with state-of-the-art 3D geometry compression algorithms, Trako allows fast data transfer and real-time visualization with nearly no preprocessing. As part of this paper, we present Trako file readers for the VTK.js, Three.js, and XTK [25] visualization frameworks. For FiberStars, we convert Trako files to VTK poly data structures.

We evaluated the performance, effectiveness, and usability of FiberStars within an extensive user study. With a between-subjects study design, we recruited non-experts (novices) and domain experts to compare our software to the existing state-of-the-art tool AFQ-Browser [60]. We performed an A/B comparison of both applications. The results of our study confirm our design decisions.

Most of the major work in this field requires previous knowledge of the patterns in the data. The AFQ-Browser [60] and DiffRadar [42] are closest to our interactive tool. Table 1 compares these tools and their most critical features for the exploration of dMRI data. Yeatman et al.'s AFQ-Browser has a user interface that consists of four different panels: a) bundles, b) anatomy, c) bundle details, and d) subject metadata (Figure 4). While using their code, we noticed that the AFQ-Browser only accepts input in MATLAB or JSON file formats, so we had to convert our dMRI files. Also, we were only able to use fibers with 100 data points. Since fibers have a varying number of points, we needed to interpolate the points, which might distort the data. The other conventional tool, Fiber Models (DiffRadar) by Mei et al. [42], shows the differences between DTI fiber data by providing an intra-cluster comparison of single fibers. Similarly to AFQ, the user interface in Figure 5 consists of four panels. The authors use a two-phase projection technique to map 3D fibers onto a 2D space with Multidimensional Scaling (MDS). They implement a density estimation on their projection of the fibers. All fibers were reparameterized using the LAMP technique so that they have the same number of vertices and orientation. This approach has already been shown in [16, 48]. The authors implement Landmark MDS (LMS) by randomly selecting a subset of fibers as landmarks and then computing, in a 2D plane, the squared distance of each fiber to the landmark fibers. A scatterplot is used to position similar fibers in a cluster next to each other, and a kernel density estimation is applied to produce a continuous 2D density map of the scatterplot. Unfortunately, we were not able to include this software in our quantitative user study. Moreover, this approach only appears to allow intra-cluster comparison of single fibers, not comparison among multiple subjects and clusters.
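The kind of interpolation needed to bring fibers to a fixed number of points can be sketched as follows. This is an illustrative assumption, not the procedure used by AFQ-Browser or DiffRadar: the hypothetical function `resample_fiber` places the requested number of points at equal arc-length spacing along the polyline and linearly interpolates between the original vertices.

```python
import math


def resample_fiber(vertices, n_points=100):
    """Resample a 3D polyline to n_points equally spaced by arc length."""
    if len(vertices) < 2 or n_points < 2:
        raise ValueError("need at least two vertices and two output points")
    # Cumulative arc length at every original vertex.
    cumulative = [0.0]
    for a, b in zip(vertices, vertices[1:]):
        cumulative.append(cumulative[-1] + math.dist(a, b))
    total = cumulative[-1]
    resampled = []
    segment = 0
    for i in range(n_points):
        target = total * i / (n_points - 1)
        # Advance to the segment that contains the target arc length.
        while segment < len(vertices) - 2 and cumulative[segment + 1] < target:
            segment += 1
        length = cumulative[segment + 1] - cumulative[segment]
        t = (target - cumulative[segment]) / length if length > 0 else 0.0
        a, b = vertices[segment], vertices[segment + 1]
        resampled.append(tuple(pa + t * (pb - pa) for pa, pb in zip(a, b)))
    return resampled


# Example: a three-vertex fiber with uneven spacing, resampled to four points.
fiber = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
even = resample_fiber(fiber, n_points=4)
```

Linear interpolation straightens the path between original vertices, and per-vertex scalars would need to be interpolated in the same way; this is one source of the distortion mentioned above.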

Hypotheses. We propose three hypotheses to validate the design of FiberStars:

H1: FiberStars provides higher usability than AFQ-Browser. Both user groups evaluate the usability of each tool and how they perceive working with it. During the development of FiberStars, we focused on overall usability and intuitiveness with minimal and slick user experience in mind. We predict that participants using FiberStars report higher usability than the ones working with AFQ-Browser.

H2: Analyzing DTI data is more accurate in FiberStars. The users in our study are presented with the same datasets in both tools. FiberStars is optimized towards analyzing and comparing tractography data of multiple subjects. With matching amounts of training, we predict that novices and domain experts will more accurately explore scalars and properties of fiber clusters with FiberStars than with AFQ-Browser. We measure accuracy in terms of correct answers for the tasks.

H3: Within a given timeframe, the users are faster in solving tasks with FiberStars than with the alternative tool. We measure the time it takes participants to complete pre-defined tasks. These tasks were defined in connection with diffusion imaging researchers and replicate day-to-day use-cases of domain experts. Since we designed the FiberStars application with constant feedback and input by domain experts, we predict that participants using our software perform more efficiently.

Participants. First, we evaluated the usability of both tools, FiberStars and AFQ-Browser, with participants without prior knowledge of DTI or tractography, recruited through flyers and mailing lists. We planned to recruit 11 participants per tool, allowing for a dropout rate of roughly 10% [21, 27, 44]. Of the initial 22 participants, we had to exclude 2 subjects due to technical issues during the online meeting. Twenty participants completed the full study (N = 20). Thirteen participants were female and seven were male, with an age range from 18 to 39 years, consisting of students and workers with a variety of backgrounds. All participants reported not having any visual impairments. Participants received monetary compensation for their time. Additionally, we asked 6 domain experts to participate in our user study, each testing one of the two tools. Qualifying domain experts are researchers who perform complex data analyses with dMRI or DTI data and who have not used either AFQ-Browser or FiberStars before. We also excluded all researchers who helped during the design of FiberStars.

Data. For a fair evaluation and comparability, we used the same data in both tools. We randomly picked a sample of 5 subjects from the Human Connectome Project and, to further reduce data loading times, selected every 50th of the 800 clusters (16 clusters in total). We added the corresponding metadata to both tools, including information such as subject ID, age, gender, weight, and height.
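The data reduction described above can be sketched in a few lines. This is only an illustration: the subject identifiers, the 1-based cluster numbering, the starting offset of the selection, and the random seed are all assumptions that the text does not specify.

```python
import random

# Hypothetical identifiers; the real subject IDs and cluster numbering may differ.
all_subjects = [f"subject-{i:03d}" for i in range(1, 101)]
all_clusters = list(range(1, 801))  # 800 clusters, numbered 1..800 (assumption)

# Randomly pick 5 subjects (seed chosen only to make the sketch reproducible).
selected_subjects = random.Random(42).sample(all_subjects, 5)

# Keep every 50th cluster, assuming the selection starts at the 50th cluster.
selected_clusters = all_clusters[49::50]

print(selected_subjects)
print(selected_clusters)
```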

Tasks. Following Section 3.1, we derived concrete tasks evaluating both tools in a controlled experiment. An expert user could get insights into the data after having identified a specific abnormal cluster (T1), or similarly, compare clusters from tractography of a single diseased subject (T2). T3 helps to identify a subject having anomalies in certain scalar values. In T4, the user needs to control for multiple variables. For example, an expert tests the hypothesis that FA1 is reduced in diseased subjects. We made sure all tasks were possible in both tools and structured them with increasing difficulty:

T1: Interpreting a single cluster of a single subject. a) (For a given cluster) Which value is higher, FA1 or FA2? b) (For a given cluster) Where along the fiber bundle is FA1 the highest? c) (For a given cluster) Is the standard deviation of FA1 rather large or small compared to that of Estimated Uncertainty?

T2: Comparing multiple clusters of a single subject. a) (For a given subject) Which are the two clusters with the highest estimated uncertainty? b) (For two given clusters and a given subject) Which cluster has a higher number of fibers in the bundle?

T3: Interaction between the same cluster in multiple subjects. a) (For given subjects and a cluster) In which subject is FA1 maximal? b) (For given subjects and a cluster) For which pair of subjects is the difference in FA2 minimal?

T4: Comparing multiple clusters of multiple subjects. a) In which female subject cluster is FA2 maximal? b) In which cluster is FA1 minimal? c) (For each two given clusters of two subjects) Does subject A or B have a lower FA1?

T5: General usability of components/features. a) In a view of your choice, color the data by subjects. b) Can you find a U-shaped cluster in the 3D visualization?

Procedure. Due to COVID-19, the study was conducted online via pre-arranged video conferences of 50-60 minutes with the participants. We asked the participants to share their screens while working on the tasks. We assigned participants randomly to one of the two tools to avoid user bias. Each study session started with an introduction demonstrating the underlying visualization components and the fundamental interaction possibilities available in each tool. Then, users had 2 minutes to explore the tool and its main features and were allowed to ask questions during this period. Next, we provided the participants with an online document describing the tasks to complete. Users wrote short answers in the document once they considered a task done and immediately notified the experimenter when they finished a task. Users were not told whether their answers were correct or wrong. The first task of each session (in addition to T1-5) served as an example to provide hands-on familiarity with the assigned tool. Following this training, the users performed the tasks while we measured their task-completion times. We budgeted a fixed timeframe of 5 minutes for each task. When a user was not able to solve a sub-task in this timeframe, we assigned a penalty of 150 seconds. After participants completed all tasks, they were asked to complete a post-study questionnaire assessing their demographic data, judgments of usability, and qualitative feedback. Additionally, we used a standard NASA-TLX survey to assess the workload with 6 questions [26].
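The random assignment and the timing rule from the procedure above could be expressed roughly as follows. This sketch rests on two assumptions that the text leaves open: that the assignment was balanced between the two tools, and that the 150-second penalty is recorded in place of the measured time for an unsolved sub-task. The function names are hypothetical.

```python
import random

PENALTY_SECONDS = 150.0       # penalty stated in the text
TIME_LIMIT_SECONDS = 5 * 60   # fixed timeframe stated in the text


def assign_tools(participants, tools=("AFQ-Browser", "FiberStars"), seed=None):
    """Shuffle participants and alternate tools (balanced split is an assumption)."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {p: tools[i % len(tools)] for i, p in enumerate(shuffled)}


def recorded_time(elapsed_seconds, solved):
    """Return the time entered into the analysis for one sub-task.

    Assumption: an unsolved or timed-out sub-task is recorded as the penalty.
    """
    if not solved or elapsed_seconds > TIME_LIMIT_SECONDS:
        return PENALTY_SECONDS
    return elapsed_seconds


assignment = assign_tools([f"P{i:02d}" for i in range(1, 21)], seed=7)
```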

Expert study design. We recruited 6 domain experts to evaluate the performance of both AFQ-Browser and FiberStars. We randomly assigned half of the experts to AFQ-Browser and the other half to FiberStars. First, experts performed the same training task as the novices. Afterwards, we asked the participants to solve Tasks 1-4 with an overall time limit of twenty minutes. For the expert study, we did not include Task 5 and instead asked for more extensive and detailed qualitative post-experiment feedback.

The results of our user study show an advantage of FiberStars in terms of accuracy and significant improvements of processing time for exploring the data.

We conducted a quantitative statistical analysis by analyzing the answers of the N = 20 novices and N = 6 domain experts.

Accuracy and Performance. Regarding accuracy in exploring the given DTI data, we examine the correctness of results by verifying the user responses for each task. For a correct answer in a sub-task, we assigned 1 point, and for a false answer, we assigned 0 points. Overall, the 10 participants using AFQ-Browser answered, on average, 69.17% of the tasks correctly (SD = 31.75%). The 10 non-expert participants using FiberStars were able to answer 87.5% correctly (SD = 10.55%). Comparing the correct answers in both tools, the AFQ users answered questions 1b and 3a correctly more often than FiberStars users. In all other sub-tasks, users performed equally or more accurately with FiberStars, as depicted by the error rates in Table 2. To test the significance of the mean difference between both tools, we used a two-sided t-test with the null hypothesis that the two independent group means for AFQ and FiberStars are equal. We obtained a p-value of 0.0354 (p ≤ 0.05). With respect to H2, we therefore conclude that novices performed significantly better on our tasks with FiberStars.

Table 2: Error rates per sub-task (1a, 1b, 1c, 2a, 2b, 3a, 3b, 4a, 4b, 4c, 5a, 5b) and their mean for AFQ and FiberStars.

Efficiency. To test H3, we compared the times the users needed to solve the tasks. The mean time per sub-task for the 10 AFQ users was 66.49 seconds (SD = 23.29), and for FiberStars 41.40 seconds (SD = 25.41). On average, FiberStars yields an improvement in speed of more than 37% for non-experts. Task 1b took the least average time with AFQ, while with FiberStars, users required the least time for Task 5a with 20.5 seconds. Figure 6 shows the distributions. Novices performed faster with FiberStars in all tasks except sub-task 1b. Summarizing all sub-tasks from Task 1, which involve single-subject, single-cluster problems, non-experts required 108.5 seconds on average with FiberStars and 156.6 seconds with the alternative tool. Task 2, concerning multiple clusters of one subject, was solved in 121.6 seconds with AFQ and 58.9 seconds with FiberStars. Task 3 (one cluster across multiple subjects) required 100.6 seconds with AFQ and 61 seconds with FiberStars. For Task 4, comparing multiple subjects with multiple clusters, AFQ users needed 212.2 seconds and FiberStars users 146.3 seconds. Task 5 took users 206.9 seconds with AFQ and 122.2 seconds with FiberStars. In general, users performed slowest in both tools when comparing multiple clusters across multiple subjects (high-level task). Overall, the expert users were faster with FiberStars in every task except for Task 1c. Improvement for domain experts was around 32%, with an average of 30.1 seconds per sub-task with AFQ and 20.2 seconds per sub-task with FiberStars. We state the null hypothesis that the group means are equal, with no difference between both groups. The alternative hypothesis states that the group means differ. The resulting p-value of 0.0195 is statistically significant (p ≤ 0.02). We reject the null hypothesis in favor of the initial hypothesis H3, with significant differences between both user groups. Overall, novices perform faster with our software.

Table 3: Mean values of subjective responses of novices to statements such as "The usability was very good.", rated on a 7-point Likert scale (1 = totally disagree, 7 = totally agree). The asterisk (*) marks the statistically significant result.
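As a rough illustration of the statistics reported above, the sketch below computes a two-sample t statistic and the relative speed improvement. It assumes the pooled-variance (Student) form of the t-test, since the text does not state which variant was used; the sample lists are made-up toy values, not study data, and the function names are hypothetical. The p-value would follow from the t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Two-sample t statistic with pooled variance (assumed variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, na + nb - 2


def relative_improvement(baseline, new):
    """Fraction by which `new` improves on `baseline` (lower is better)."""
    return (baseline - new) / baseline


# Toy values for illustration only, not taken from the study.
t_value, dof = pooled_t_statistic([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# Mean times per sub-task reported for novices: 66.49 s (AFQ) vs. 41.40 s (FiberStars).
novice_speedup = relative_improvement(66.49, 41.40)  # about 0.377, i.e. more than 37%
```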

We analyzed all non-expert answers (N = 20). The Likert scale questionnaire helps validate our findings, clarify hypotheses, and provide feedback on the usability results. The NASA-TLX questionnaire, which assesses workload in terms of mental demand, frustration, effort, etc., did not yield any notable results. Using the participants' subjective questionnaire responses, we evaluate the perceived performance of both tools. The participants' responses were recorded on a 7-point Likert scale, with 1 = totally disagree and 7 = totally agree, in a post-experimental questionnaire. FiberStars scored a modest advantage over AFQ-Browser in perceived usability. Questions are stated in Table 3. There was no statistical significance for these questions between both groups, AFQ and FiberStars. Users rated FiberStars' usability on average with 6.2 (SD = 0.79) and AFQ with 5.8 (SD = 1.14) on the 1-7 scale. We could find a significant difference for the question 'Additional information beside the 2D plots was very helpful.' with p = 0.05. Additional information in FiberStars was rated higher on average, with 6.4 (SD = 0.97), than in AFQ with 5.1 (SD = 1.66). Regarding H1, novices liked working with both tools but preferred FiberStars' 2D visualizations. During the study, we additionally recorded which FiberStars component was used to solve a given task. For Task 1, 43.3% of the participants used the Comparison Matrix, whereas 56.6% used the split-screen view. In Task 2, roughly 75% preferred the split-screen view, while the other 25% used the Matrix Comparison View. Task 3 was mainly solved with the split-screen view. Task 4 was predominantly solved with the Projection View (53.33%), followed by the split-screen view (26.67%) and the Matrix Comparison View (20%). Task 5 was solved by 45% with the Projection View and by 55% with the split-screen view. The use of a specific component depends on the given task. Tasks involving the comparison of different scalar values for a single subject or a single cluster were likely to be solved with either the split-screen view or the Matrix Comparison. For all tasks, the users found the answers in the radar charts. For tasks where users had to find values among a group with many clusters or subjects, they mainly used the Projection View. Grouping values from lowest to highest was the most popular ordering choice there.

Domain Expert performance. The 3 experts testing AFQ-Browser needed, on average, 30.1 seconds per sub-task (SD = 8.27), while experts using FiberStars required 20.2 seconds to solve a sub-task (SD = 11.35). The difference in mean time between AFQ and FiberStars is statistically significant with p = 0.0200 (p ≤ 0.05) (H3). The distribution of how experts performed in terms of timing is shown in Figure 7. We used a two-sided t-test for testing statistical significance between novices and experts. Experts performed significantly faster than novices in both tools. Furthermore, we could not find significant differences between both expert groups in terms of subjectively perceived usability (H1). AFQ and FiberStars experts perceived usability as good in both tools. In terms of accuracy, the experts using AFQ were able to answer 73.33% (SD = 30.63%) of the tasks correctly on average, whereas the experts using FiberStars gave 96.67% (SD = 0.11) correct answers. We could find statistical significance between both group means in terms of correct answers for domain experts. With a p-value of around 0.0351, we confirm that there are significant differences between both groups in terms of correctness and reject the null hypothesis in favor of H2. For novices and experts, we could not find statistical significance in mean differences between the groups using AFQ and FiberStars.

We collected useful qualitative feedback from the experts. Because most expert users had prior exposure to SlicerDMRI [45] or TrackVis [57], they were able to familiarize themselves quickly with both tools. In AFQ-Browser, users appreciated the brushing technique, which highlights clusters directly in the brain model. They further found the plots to be a good summary, despite having difficulties in differentiating single subjects. Experts missed a way to study exact values and would prefer more detailed statistics (mean or standard deviation for scalars). When comparing multiple clusters at once, participants requested an option to sort the 2D plots by scalar metrics in AFQ. Similarly, experts missed this sorting feature in the Comparison Matrix in FiberStars, despite having the option to sort by cluster or subject. However, they found it valuable to compare multiple clusters and subjects at the given level of detail. According to the users, the radar charts even reflected all of the essential information on DTI data. Experts would have liked to be able to configure the scalars shown in the radar chart. In the 3D visualization, a legend describing the color map would have been useful. The Universal Toolbar proved to be compelling, granting easy access to the data at all times. One expert expressed interest in FiberStars' compatibility with certain file formats. In summary, the pros and cons of both tools underline our results from the previous section.

Qualitative feedback and quantitative analysis indicate that novices and experts appreciated the usability of both FiberStars and AFQ. All groups were able to quickly adapt to the user-interface design and understand the functionalities of each tool. Both cohorts were significantly faster completing the tasks in FiberStars as compared to AFQ. Unfortunately, accuracy in AFQ did not significantly increase for experts. In FiberStars, however, the observed significant improvement from novice to expert efficiency suggests that experts are enabled to harness more of their full potential. This is an essential finding, providing experts with a tool for accelerating the process of analyzing large-scale data sets. Based on questionnaire findings, FiberStars' 2D plots were found to be significantly more helpful than AFQ's.

Displaying metadata directly next to visualizations like 3D or radar charts is useful, but users still appreciate traditional tabular views as in AFQ. FiberStars' higher efficiency could stem from insights at the individual subject level combined with comparisons of multiple subjects. It was more difficult for non-experts to compare values across two subjects and two clusters in AFQ-Browser. This is reconfirmed by experts feeling challenged when analyzing multiple line plots while extracting and comparing data for several individual subjects. Moreover, participants faced difficulties distinguishing subjects in the plots, as coloring by subject is not easily accessible in AFQ, which was also independently suggested by two experts in the qualitative feedback. Participants using FiberStars preferred to consult the Projection View over the Comparison Matrix to compare multiple subjects and multiple clusters simultaneously and to find overall trends or correlations in the data. We assume that the Projection View can have a significant impact in contributing to our initial goal of comparing multiple subjects or drilling down to a representative sample of interest, thereby facilitating the exploration process for domain experts.

Limitations. Representing fiber bundle values in simple line plots is straightforward and generally comprehensible. In FiberStars, we experimented with encoding values in tract color gradients directly as part of the 3D rendering, which seems less favorable in some situations (see Task 1b).

When working with users, we typically did not show more than 15 radial plots on-screen simultaneously. While shrinking or scrolling would allow representing even more components, we do not know at which point the otherwise useful comparison matrix would degrade as fewer details can be spotted. However, previous small multiples visualizations frequently scale to the order of 10 by 10 at a time [33, 36, 38]. We also want to emphasize that appropriate filtering steps (e.g., the Projection View, Fig. 1) can lower the demand for ever-larger matrix scaling. Moreover, with multi-subject DTI visualization still in its early stages, it would have been desirable to evaluate FiberStars in the context of further alternatives, such as Fiber Models (DiffRadar).

We have presented FiberStars, a new open-source web-based visualization software to view extensive diffusion MRI datasets. In an iterative design process, we have derived requirements for analyzing such high-dimensional and longitudinal neuroscience datasets in web-based and scalable visualization software. Our resulting tool now supports the exploration of multiple fiber clusters across multiple subjects. The performed user study shows that it better supports experts on the given tasks and even lets novices gain insights from tractography data efficiently. In the future, we would like to investigate how we can automate steps in the analysis and create intelligent tractography exploration techniques taking even more data into account. Our findings and the open nature of our research will hopefully spur the adoption of web-based scientific visualizations and encourage further research in comparative visualizations and multidimensional scaling beyond the neurosciences.

Interactive formation of statistical hypotheses in diffusion tensor imaging

Neurolines: a subway map metaphor for visualizing nanoscale neuronal connectivity

Diffusion tensor imaging (dti)-based white matter mapping in brain research: a review

Clinical feasibility of using mean apparent propagator (map) mri to characterize brain tissue microstructure

Estimation of the effective self-diffusion tensor from the nmr spin echo

In vivo fiber tractography using dt-mri data. Magnetic resonance in medicine

Microstructural and physiological features of tissues elucidated by quantitative-diffusion-tensor mri

Evaluation of artery visualizations for heart disease diagnosis

D3 data-driven documents

Eigensolver methods for progressive multidimensional scaling of large data

An experimental study on distance-based graph drawing

A multi-level typology of abstract visualization tasks

An automated string-based approach to white matter fiberbundles clustering

Fiber segmentation using a density-peaks clustering algorithm

Abstractive representation and exploration of hierarchically clustered diffusion tensor fiber tracts

A novel interface for interactive exploration of dti fibers

Tracking neuronal fiber pathways in the living human brain

Sparse multidimensional scaling using landmark points

Neuroimaging auditory verbal hallucinations in schizophrenia patient and healthy populations

A survey of radial methods for information visualization

Beyond the five-user assumption: Benefits of increased sample sizes in usability testing

The connectomics of brain disorders

Parsimonious approximation of streamline trajectories in white matter fiber bundles

Trako: Efficient transmission of tractography data for visualization

Neuroimaging in the browser using the x toolkit

Development of nasa-tlx (task load index): Results of empirical and theoretical research

Number of people required for usability evaluation: the 10±2 rule

The adolescent brain cognitive development study

Exploring 3d dti fiber tracts with linked 2d representations

Exploring brain connectivity with two-dimensional neural maps

A visual environment for hypothesis formation and reasoning in studies with fmri and multivariate clinical data

Automated segmentation of white matter fiber bundles using diffusion tensor imaging data and a new density based clustering algorithm

Interactive coordinated multiple-view visualization of biomechanical motion data

Bundlemap: Anatomically localized classification, regression, and hypothesis testing in diffusion mri

Fiberweb: diffusion visualization and processing in the browser

Hipiler: Visual exploration of large genome interaction matrices with interactive small multiples

The big and the small: challenges of imaging the brain's circuits

Correlatedmultiples: Spatially coherent small multiples with constrained multi-dimensional scaling

Visualizing data using t-sne

Filtered multitensor tractography

Umap: Uniform manifold approximation and projection for dimension reduction

Visually exploring differences of dti fiber models

Abstractocyte: a visual tool for exploring nanoscale astroglial cells

A mathematical model of the finding of usability problems

Slicerdmri: open source diffusion mri software for brain cancer research

Does diffusion mri tell us anything about the white matter? an overview of methods and pitfalls

Cerebrovis: Designing an abstract yet spatially contextualized cerebral artery network visualization

Employing 2d projections for fast visual exploration of large fiber tracking data

Multi-shell diffusion signal recovery from sparse measurements

Sparse multi-shell diffusion imaging

Joint multi-fiber noddi parameter estimation and tractography using the unscented information filter

Radar plots: a useful way for presenting multivariate health care data

The visualization toolkit: an object-oriented approach to 3D graphics

Diffusion imaging, white matter, and psychopathology

Multidimensional scaling: I. theory and method

The wu-minn human connectome project: an overview

On graphical representations of similarity in geo-temporal frequency data

Detecting microstructural white matter abnormalities of frontal pathways in children with adhd using advanced diffusion models

A browser-based tool for visualization and analysis of diffusion mri data

Suprathreshold fiber cluster statistics: Leveraging white matter geometry to enhance tractography statistical analysis

An anatomically curated fiber clustering white matter atlas for consistent white matter tract parcellation across the lifespan

We would like to thank the participants of the user study and the authors of AFQ-Browser for helping with the installation and setup.