title: Recommending Insightful Drill-Downs Based on Learning Processes for Learning Analytics Dashboards
authors: Shabaninejad, Shiva; Khosravi, Hassan; Leemans, Sander J. J.; Sadiq, Shazia; Indulska, Marta
date: 2020-06-09
journal: Artificial Intelligence in Education
DOI: 10.1007/978-3-030-52237-7_39

Learning Analytics Dashboards (LADs) make use of rich and complex data about students and their learning activities to assist educators in understanding and making informed decisions about student learning and about the design and improvement of learning processes. With the increase in the volume, velocity, variety and veracity of data on students, manual navigation and sense-making of such multi-dimensional data have become challenging. This paper proposes an analytical approach to assist LAD users with navigating the large set of possible drill-down actions to identify insights about the learning behaviours of sub-cohorts. A distinctive feature of the proposed approach is that it takes a process mining lens to examine and compare students' learning behaviours. The process-oriented approach considers the flow and frequency of the sequences of performed learning activities, which is increasingly recognised as essential for understanding and optimising learning. We present results from an application of our approach in an existing LAD, using a course with 875 students with high demographic and educational diversity. We demonstrate the insights the approach enables, exploring how the learning behaviour of an identified sub-cohort differs from that of the remaining students and how the derived insights can be used by instructors.

The use of online learning systems provides a rich set of data that makes it possible to extract information about student learning behaviours. This information provides an opportunity for understanding and improving education, which has motivated many universities to invest in learning analytics dashboards (LADs) [6, 28, 41, 48]. These dashboards generally provide visualisations of student data, collected from a variety of educational systems, to assist educators in making decisions [41]. However, the increasing popularity and improvement of online learning systems over the years has resulted in a significant increase in data volume, velocity and variety. Consequently, making sense of data in LADs has become more challenging than in earlier years [43].

In some domains, a common approach to navigating large, complex, multi-dimensional data sets is to use drill-downs [39]. A drill-down operation, in an educational setting, allows users to explore the behaviour of sub-cohorts of students by progressively adding filters. Manual drill-down operations can generally be used by instructors to effectively investigate curiosity-driven questions related to student attributes. For example, it is possible to use a drill-down filter to find how international or female students have performed compared to other students. However, instructors may also be interested in finding which drill-down filters lead to insightful results. As an example, an instructor may be interested in finding drill-downs that identify a sub-cohort of students who have significantly different behaviour or performance compared to the rest of the class. Given the availability of a large number of potential drill-downs, manually finding drill-downs that provide insights is a challenging task [1, 42].
In this paper, we report on extending LADs with a functionality that provides recommendations of insightful drill-downs. Our approach takes a process mining lens to examine students' learning processes, considering three aspects of their learning behaviour: the learning activities performed, the frequency of each activity and the order in which the activities are performed. Utilising the learning process, rather than focusing on aggregated engagement metrics as is the common approach in LADs [41], is increasingly being recognised as essential to understanding and optimising learning [33, 46]. In our approach, an insightful drill-down is defined as a set of filtering rules that identify a sub-cohort of students whose learning processes are most differentiated from those of the remaining students. Our key contribution is the design and development of an algorithm, which we refer to as Learning Process Automated Insightful Drill-Down (LP-AID). LP-AID employs a process mining method called Earth Movers' Stochastic Conformance Checking (EMSC) [29] to compute the distance between the learning processes of different cohorts and thereby recommend insightful drill-downs. We present a practical application of LP-AID in an existing LAD called Course Insights, which provides users with a manual drill-down functionality. Specifically, we apply LP-AID to data from a course with 875 students with high demographic and educational diversity, to demonstrate the drill-down recommendations and to explore the possible insights that can be derived from them. Our initial findings, and instructor feedback on our approach, suggest that LP-AID can be integrated into LADs to provide automated and insightful drill-down recommendations.

Learning Analytics Dashboards (LADs). Several recent systematic literature reviews have been published on LADs [6, 41]. Schwendimann et al. [41] provide a comprehensive picture of the common data sources used by LADs, which include clickstream logs (e.g., [12, 14, 25, 34]), data related to learning artefacts (e.g., [11, 16, 20, 24, 45]), survey data (e.g., [4, 35, 40]), institutional databases (e.g., [9, 19, 23]), physical user activities (e.g., [16, 31, 44]) and data captured from external educational technologies (e.g., [10, 26, 27, 36]). To make sense of these data, LADs provide a variety of visualisation options. Schwendimann et al. [41] outline the types of visualisations commonly used in LADs, which include bar charts, line graphs, tables, pie charts and network graphs. While these visualisations simplify the process of making sense of large data sets, they naturally abstract away much of the detail related to learning processes, which is essential to understanding and optimising learning [17]. We aim to address this challenge by employing process mining approaches to guide drill-down operations and the identification of insightful data.

Smart Drill-Down Approaches. The concept of a drill-down operation was initially introduced in the context of OLAP data cubes, where it enabled analysts to explore a large search space to identify exceptions and highlight interesting subsets of data [39]. In recent years, drill-downs have also been employed in analytical dashboards. While their use has enabled users to explore large datasets, they present users with too many drill-down choices and also create the potential for incorrect reasoning due to incomplete exploration [1]. Several attempts to address these challenges have been made.
Many of the proposed methods for discovering insightful drill-downs focus on detecting anomalies in small data portions (e.g., [1, 37, 38]), while some focus on identifying interesting differences in larger data subsets (e.g., [21]). In this paper, we take an approach similar to [42] by letting LAD users request drill-down recommendations at a level of granularity they are interested in, thus reducing drill-down choices without affecting user autonomy. While [42] recommends drill-downs based on differences between cohorts' attribute values, this paper bases the recommendations on differences between cohorts' learning processes.

Process mining aims to derive information from historical organisational behaviour, recorded in event logs [2]. Educational process mining uses data from educational contexts to discover, analyse and visualise educational and learning processes, for instance to analyse whether students' behaviour corresponds to a learning model, to detect bottlenecks in the educational process, to identify patterns in processes [7], to study administrative processes [18] and to study student learning through their interactions with online learning environments [3, 8, 49]. Prior work [7] indicates that current educational process mining solutions have not adequately supported users in identifying and investigating cohorts of interest.

Next, we introduce our method for recommending insightful drill-down criteria in LADs, by first introducing relevant concepts and formally defining our problem statement, then presenting our approach and illustrating it with an example.

Assume that a LAD has access to an event log L that captures a collection of traces T = {t_1, ..., t_N}, each representing a student. A trace t_i has a unique identifier (e.g., a student ID), a set of feature values (a value v being assigned to each feature f ∈ F for student s_i) and a sequence of events E_i = ⟨e_{i1}, ..., e_{iL_i}⟩ representing the learning path taken by student s_i, where the trace length L_i can vary for each student. Each event e_{ij} has a timestamp and a label representing the learning activity. A rule r expresses a condition on a feature (e.g., 'program' = 'Computer Science'). For a feature with numerical values in an event log L, the corresponding rule value can be a range instead of a single value (e.g., 'age' > 25). A drill-down criterion σ is defined as the conjunction of a set of rules (e.g., 'program' = 'Computer Science' ∧ 'age' > 25). A drill-down criterion σ is said to cover a student s_n if all rules in σ are satisfied by the corresponding features of s_n. Consequently, applying σ to L leads to the selection of a set of students S′ ⊆ S such that σ covers each s_n ∈ S′. We define the coverage of a drill-down criterion as C_σ = |S′| / |S|, the fraction of the students S covered by the resulting sub-cohort S′. Using this notation, our problem can be formalised as follows:

Formal Problem Statement: Given an event log L, a set of features F′ ⊆ F, a constant 0 ≤ α ≤ 1 and a constant k, find a set of drill-down criteria Σ = {σ_1, ..., σ_k} that uses features in F′ such that each criterion σ_j: (1) has coverage larger than α (i.e., C_{σ_j} > α); and (2) selects a sub-cohort of students S′ whose learning paths deviate most from those of the remaining students, in terms of the events performed, the relative frequency of each distinct learning path and the order in which the activities were triggered (i.e., the distance between the sub-log L′ and the log of the remaining students, L \ L′).
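To make these definitions concrete, the following minimal Python sketch encodes traces, rules, drill-down criteria and coverage. The representation (a Trace dataclass, rules as predicates) is an illustrative assumption for exposition, not part of the formal model.

    from dataclasses import dataclass

    @dataclass
    class Trace:
        student_id: str
        features: dict   # e.g. {"program": "Computer Science", "age": 27}
        events: list     # ordered [(timestamp, activity_label), ...]

    def rule(feature, predicate):
        # a rule is a condition on one feature, e.g. rule("age", lambda v: v > 25)
        return lambda trace: predicate(trace.features[feature])

    def covers(criterion, trace):
        # a criterion (a conjunction of rules) covers a student iff all rules hold
        return all(r(trace) for r in criterion)

    def coverage(criterion, log):
        # C_sigma = |S'| / |S|: fraction of students covered by the criterion
        return sum(covers(criterion, t) for t in log) / len(log)

    # example criterion: 'program' = 'Computer Science' AND 'age' > 25
    sigma = [rule("program", lambda v: v == "Computer Science"),
             rule("age", lambda v: v > 25)]

Under this encoding, applying σ to a log is simply filtering by covers, and C_σ is the covered fraction of students.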
We present our approach by first providing a high-level overview of the underlying algorithm, and then describing the automatic drill-down process using an example. Our algorithm takes the students' event log as input and returns, as output, a set of drill-down criteria annotated with the learning process distance and the covered fraction of the student population. The algorithm examines all possible drill-down actions to find the drill-downs that result in sub-cohorts with the most deviating learning processes. Algorithm 1 provides the high-level pseudocode of our proposed approach. It takes four parameters as input: the event log L, the features F, the minimum coverage α and the number of drill-down criteria to be recommended k. The output of the algorithm is the set Σ of the top-k scored drill-down criteria. The algorithm consists of three main blocks, as described in the remainder of this section.

Algorithm 1. Finding a set of k smart drill-down criteria

    function LP-AID(Log L, Features F, Minimal Coverage α, Count k)
        T ← BuildTree(L, F)                    // create drill-down tree
        PruneAndScore(L, root(T), L, α)        // score nodes and prune the tree
        topK ← topDistances(T, k)              // sort and return the top K drill-down criteria
        return nodeToDrillDown(topK)
    end function

    function PruneAndScore(Log L, Node parentNode, Log parentL, Minimal Coverage α)
        for each childNode of parentNode do
            cohortL ← ObtainSubLog(childNode, parentL)
            if coverage(cohortL) > α then              // otherwise prune childNode
                if coverage(cohortL) < 1 − α then      // otherwise exclude from scoring
                    remainderL ← L \ cohortL
                    distance(childNode) ← computeDistance(cohortL, remainderL)
                end if
                PruneAndScore(L, childNode, cohortL, α)
            end if
        end for
    end function

Create Drill-Down Tree. The BuildTree function takes two parameters as input, the event log L and the list of selected features F, and returns a drill-down tree. The function obtains all values of each feature in F that exist within L and generates a tree-like collection of nodes T, where each node represents a splitting rule r for one feature. Each path in the tree thus corresponds to a set of feature-value pairs.

Score Nodes and Prune the Tree. The tree embodies all possible drill-down paths, not all of which will result in a cohort of the required minimum size (i.e., α). PruneAndScore traverses the tree recursively to examine all possible drill-down actions. ObtainSubLog takes a node, which represents a feature-value pair, and its parent's event log parentL as input, and filters parentL to obtain a sub-log cohortL containing only the data of the sub-cohort. The sub-cohort's size is then checked: the covered fraction of the student population must be no smaller than α and no greater than 1 − α. If the condition is met, the main event log L is filtered to obtain the event log of the remaining students, remainderL. Otherwise, the node is pruned (if coverage ≤ α) or excluded from scoring (if coverage ≥ 1 − α). For each retained drill-down path, computeDistance takes the sub-cohort's and the remaining students' sub-logs as input and computes the distance between them using Earth Movers' Stochastic Conformance Checking [29].

Sort and Return the Top K Drill-Down Criteria. topDistances takes the scored drill-down tree T and k as input and returns k recommendations. To pick the k nodes, this function uses a solution set ranking function that maximises diversity, similar to the approach of [47]. As an alternative, we could simply pick the k highest-scored nodes; however, diversifying the recommendations allows us to provide a wider range of insightful drill-downs. Our algorithm converts the chosen nodes to a set of drill-down criteria Σ, each annotated with its distance score, and returns them as recommendations to users.
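For concreteness, the following condensed Python sketch mirrors the three blocks of Algorithm 1 under simplifying assumptions: the log is represented as (student ID, feature dictionary, trace) triples, rules on numerical ranges are omitted, the EMSC distance is abstracted behind a pluggable distance function (a simplified stand-in is sketched after the worked example below), and the diversity-maximising ranking of [47] is replaced by a crude greedy heuristic.

    def lp_aid(log, features, alpha, k, distance):
        """log: list of (student_id, features_dict, trace) triples.
        distance: pluggable function over two lists of traces (EMSC in our approach)."""
        scored = []  # (criterion, distance, coverage) triples

        def prune_and_score(criterion, parent_log, remaining):
            # the drill-down tree is implicit in this recursion; fixing the
            # feature order prevents generating the same criterion twice
            for i, f in enumerate(remaining):
                for v in {feats[f] for _, feats, _ in parent_log}:
                    child = {**criterion, f: v}                # add one rule
                    cohort = [s for s in parent_log if s[1][f] == v]
                    cov = len(cohort) / len(log)
                    if cov <= alpha:
                        continue                               # prune: deeper nodes only shrink
                    if cov < 1 - alpha:                        # score mid-sized cohorts only
                        ids = {sid for sid, _, _ in cohort}
                        remainder = [s for s in log if s[0] not in ids]
                        d = distance([t for _, _, t in cohort],
                                     [t for _, _, t in remainder])
                        scored.append((child, d, cov))
                    prune_and_score(child, cohort, remaining[i + 1:])

        prune_and_score({}, log, list(features))
        scored.sort(key=lambda c: -c[1])                       # highest distance first
        chosen = []
        for cand in scored:
            # crude diversity stand-in: keep a candidate only if it differs
            # from every chosen criterion in more than one rule position
            if all(len(set(cand[0].items()) ^ set(c.items())) >= 3
                   for c, _, _ in chosen):
                chosen.append(cand)
                if len(chosen) == k:
                    break
        return chosen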
We illustrate our approach using an event log for a small set of 6 students, with k = 1 and α = 0.2. We explain how our algorithm finds the most insightful drill-down criterion (namely, the criterion that identifies the sub-cohort with the highest distance) for the event log given in Fig. 1a,b, with students {S1, ..., S6} and the feature set F = {Residential Status, Assessment}. Our example course has the learning activities {Lecture 1, Lecture 2, Quiz A, Lecture 3, Lecture 4, Quiz B, Lecture final}, which were made available to students weekly in that order. The trace of learning events triggered by each student is shown in Fig. 1a; each event is represented by an activity label and a timestamp.

Our algorithm initially extracts all values of F present in the event log and generates the drill-down tree T. Next, the tree is traversed depth-first; based on each node's filtering criterion, the event log is divided into the sub-cohort's sub-log and the remaining students' sub-log. Nodes covering less than α = 0.2 of the student population are pruned. For instance, the node [Assessment = 'Mid Grade'] is pruned, as only one student (i.e., 0.16 coverage) satisfies this criterion. As a result, 5 actionable drill-down paths remain (shown in Fig. 1c). Our algorithm computes the distance between the sub-logs of each drill-down path and annotates each node with the distance d and the coverage (as shown in Fig. 1c). The drill-down path P5, which has the highest distance (57%), is the resulting recommendation. Figure 1d shows the LP-AID interface in Course Insights, presenting the input and the resulting recommendation, including the drill-down criteria, coverage and distance.

To understand the difference between the learning behaviour of the sub-cohort and that of the remaining students, we used Disco [15] to visualise the underlying learning process of each group. Disco generates a process map in which boxes represent activities, numbers in the boxes represent the frequency of each activity, arrows represent the order in which activities were performed (i.e., the control flow), numbers on the arrows represent how often the two connected activities were performed in that sequence, and the thickness of the arrows represents relative frequency. For demonstration purposes, we highlighted in red the activities that were performed in a different order. To compare the two modelled learning processes, we look at the differences between the activities, their frequencies and their order. For instance, Fig. 1e shows that Lecture 3 was skipped by one of the two students in the sub-cohort, while Fig. 1f shows that the remaining students performed this activity. From a control-flow perspective, Quiz A and Quiz B were performed as the last activities by the sub-cohort, while the remaining students performed these quizzes during the semester.
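The distance scores above come from EMSC [29]. As a rough illustration of the underlying idea, not of the actual EMSC implementation, the following sketch computes an earth movers' distance between the stochastic languages of two logs: trace variants are weighted by relative frequency, the cost of moving probability mass between variants is a length-normalised Levenshtein distance, and the resulting transportation problem is solved with SciPy's linear programming routine.

    from collections import Counter
    from scipy.optimize import linprog

    def levenshtein(a, b):
        # classic dynamic-programming edit distance between activity sequences
        prev = list(range(len(b) + 1))
        for i, x in enumerate(a, 1):
            cur = [i]
            for j, y in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                               prev[j - 1] + (x != y)))
            prev = cur
        return prev[-1]

    def stochastic_language(log):
        # map each trace variant to its relative frequency in the log
        counts = Counter(map(tuple, log))
        return {variant: n / len(log) for variant, n in counts.items()}

    def emd_distance(log_a, log_b):
        # earth movers' distance between two stochastic languages, phrased
        # as a transportation problem over trace variants
        pa, pb = stochastic_language(log_a), stochastic_language(log_b)
        va, vb = list(pa), list(pb)
        n, m = len(va), len(vb)
        cost = [levenshtein(s, t) / max(len(s), len(t), 1)
                for s in va for t in vb]          # normalised move costs
        A_eq, b_eq = [], []
        for i in range(n):                        # mass leaving source variant i
            A_eq.append([1.0 if q // m == i else 0.0 for q in range(n * m)])
            b_eq.append(pa[va[i]])
        for j in range(m):                        # mass arriving at target variant j
            A_eq.append([1.0 if q % m == j else 0.0 for q in range(n * m)])
            b_eq.append(pb[vb[j]])
        result = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
        return result.fun

The result lies in [0, 1]: a score of 0 means the two cohorts exhibit identical distributions over learning paths, while larger values mean that more probability mass must be moved further to align one cohort's behaviour with the other's.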
This section presents an application of our approach using an existing LAD called Course Insights, which is equipped with manual drill-down functionality. We first provide background on Course Insights and its main segments. We then use data from a course that was integrated with Course Insights to: (1) explore the recommended drill-downs generated by LP-AID; (2) visualise the process deviation for an example drill-down; and (3) report on the comments and feedback provided by the course coordinator upon reviewing our recommendations.

Course Insights. Course Insights (CI) is a LAD that provides filterable and comparative visualisations of students' aggregated daily activities. CI aims to provide actionable insights for instructors by linking data from several sources, including a Student Information System, Blackboard [5], edX Edge [32] and embedded learning tools such as Echo360 [13] and Kaltura [22], to create a multi-dimensional educational data set. CI is embedded in the learning management system of The University of Queensland and is available to all instructors. It is equipped with filtering functionality that enables instructors to drill down into the data and explore the behaviour of sub-cohorts of students. Figure 2a illustrates the filter interface, which allows users to select attributes from demographic, assessment, engagement and enrolment features. When a filter is applied, statistical data and a graph representing the filtered versus unfiltered distribution of the target feature are presented (as shown in Fig. 2b).

We applied our technique to an introductory calculus and linear algebra course offered in 2019 to 875 undergraduate students from 16 programs. Following our data cleaning process, we were left with a dataset on 739 students. As input for our approach, the event log includes three types of learning activities: (1) Accessing course materials: access to course materials, organised by chapter. (2) Submitting formative quizzes: submission of chapter-based practice quizzes; practice quizzes were formative assessments and thus optional. (3) Reviewing summative assessment solutions: access to chapter-based workbook solutions, released weekly; workbooks were summative assessments, assigned weekly, with a requirement to submit answer sheets each week (paper-based submissions). As the features F, we selected the attributes Brand New, Final Exam, Gender, Program and Residential Status. A total of 2447 drill-down actions were possible for this data set. Table 1 presents the recommendations generated for this course using small (α = 0.05), medium (α = 0.1) and large (α = 0.3) coverage, respectively.

To investigate what insights can be derived from the recommended drill-downs, we used process discovery methods on the identified sub-cohort and the remaining students. Here, we demonstrate the insights derived from recommended drill-down (1) (shown in Table 1). This drill-down results in the sub-cohort: Brand New = 'Yes' ∧ Residential Status = 'International' ∧ Final Exam = 'High' ∧ Gender = 'Male'. According to the LP-AID result, this sub-cohort's learning process is 72% different from that of the remaining students. To investigate the difference between the two learning processes, we visualised the underlying process of the sub-cohort (shown in Fig. 3a) and of the remaining students (Fig. 3b). Each box in the map is an activity, labelled with the action type and the relevant chapter (e.g., Formative Quiz-chapter1). To visually distinguish the three types of learning activities in the process map, we use colour coding. In the sub-cohort's process, the arrows between the three different types of activities indicate switching between the types of learning tasks. Such switching can be an indication that the three types of tasks were being performed every week, before the next chapter's activities were made available. In contrast, the underlying process of the remaining students shows that the activities of each type related to chapters 9 to 18 (highlighted in Fig. 3b) were mainly performed sequentially, which is indicative of students performing them at the end of the semester, when all tasks were available.
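The comparison reported next can also be sketched directly over the event log with pandas. This is an illustration rather than part of Course Insights: the file and column names are hypothetical, and the significance test (Welch's t-test) is our assumed choice for the sketch.

    import pandas as pd
    from scipy import stats

    # hypothetical file and column names, for illustration only
    events = pd.read_csv("event_log.csv")  # one row per event, student attributes denormalised

    criterion = {"brand_new": "Yes",
                 "residential_status": "International",
                 "final_exam": "High",
                 "gender": "Male"}                  # recommended drill-down (1)

    mask = pd.Series(True, index=events.index)
    for feature, value in criterion.items():        # conjunction of rules
        mask &= events[feature] == value
    cohort, remainder = events[mask], events[~mask]

    # events per student in each group, then compare activity levels
    cohort_counts = cohort.groupby("student_id").size()
    remainder_counts = remainder.groupby("student_id").size()
    print(cohort_counts.mean(), remainder_counts.mean())
    print(stats.ttest_ind(cohort_counts, remainder_counts, equal_var=False))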
To further investigate our initial findings, we used Disco's Events graph to compare the distribution of events over the semester. Figures 3c and d demonstrate that the sub-cohort was more active during the semester compared with the remaining students. Furthermore, the average number of events per student was 36 in the sub-cohort and 25 among the remaining students, a significant difference (p = 0.0006). To conclude our analysis, the identified sub-cohort maintained a high rate of activity throughout the semester compared to the remaining students. One of the common features of this cohort was their high performance in the final exam, which may be correlated with the learning process they developed. Other differences observed when comparing the two process maps are that the Formative Quiz of chapter 8 was not performed by any student in the sub-cohort, that the Solution Reviews of chapters 2, 7, 8 and 9 were the most frequently performed activities in the sub-cohort, and that the Solution Reviews of chapters 1, 2, 6, 7, 8 and 9 were the most frequently performed activities among the remaining students.

Feedback From the Instructor. We presented the reported drill-down recommendations and the process visualisations to the instructor of the course to capture their feedback and comments on the findings. Their feedback can be summarised as follows: (1) While the instructor had access to Course Insights throughout the semester, they rarely used it and generally found it overwhelming, citing the large number of potential drill-down options within the platform as the main reason. (2) Findings about behaviours that have led to successful outcomes can be used for positive deviance [30] purposes. The instructor indicated they would like to share Fig. 3 with their students as a recommended pattern of successful learning, as evidence that consistent engagement with learning activities throughout the semester is related to better outcomes. (3) Providing the ability to receive drill-down recommendations based on a rule (e.g., 'midterm' < 50) would be useful. The instructor indicated that they would like to understand deviations among low-performing and at-risk students to help them pass the course.

The OLAP drill-down operation is commonly used in data-driven dashboards to enable users to meaningfully zoom in and explore data in more detail. For LADs, this operation can be used to enable educators to identify sub-cohorts of students who deviate from class norms and who may require special attention. In this paper, we provided an automated method called LP-AID for finding and recommending a set of insightful drill-down actions to guide data exploration. To support understanding of student learning approaches, LP-AID takes a process mining lens to examine and compare student learning behaviour in terms of the learning activities performed, the relative frequency of each distinct learning path and the order in which the activities were performed. It examines all drill-down paths and uses Earth Movers' Stochastic Conformance Checking to score the 'insightfulness' of each path by measuring the distance between the learning behaviours of the two cohorts it induces. Furthermore, we use a solution set ranking function that maximises diversity to rank and select the drill-down paths for instructors to consider. We illustrated how LP-AID can be used as part of a LAD to guide the discovery of insightful drill-downs.
The learning processes of the students selected by the recommended drill-downs were visualised and compared, highlighting how the learning process of the identified sub-cohort deviates from that of the remaining students. Feedback from the instructor of the course suggests that manual drill-downs without guidance can be overwhelming, and that insights gained from the recommendations can be shared with students to encourage change (i.e., an application of positive deviance). Future work aims to embed LP-AID in Course Insights and to partner with course instructors through co-creation to investigate: (1) the practical implications of our approach, refining it accordingly; (2) the most effective way to present the drill-down recommendations to instructors; and (3) the most appropriate visualisation method(s) for presenting the learning process deviation of sub-cohorts to instructors.

References

Avoiding drilldown fallacies with vispilot: assisted exploration of data subsets
Process mining - data science in action
Comparative process mining in education: an approach based on process cubes
Perceptions and use of an early warning system during a higher education transition program
Open learner models and learning analytics dashboards: a systematic review
A survey on educational process mining
Clustering for improving educational process mining
Eco D2.5 learning analytics requirements and metrics report
Improving teacher awareness through activity, badge and content visualizations
Graph-based visual topic dependency models: supporting assessment design and delivery at scale
Design and implementation of a learning analytics toolkit for teachers
Echo360 Inc.: Echo360
Towards a system of guidance, assistance and learning analytics based on multi agent system applied on serious games
A framework to support educational decision making in mobile learning
Discovering time management strategies in learning processes using process mining techniques
Educational process mining: a systematic literature review
Socially augmented argumentation tools: rationale, design and evaluation of a debate dashboard
Academic dashboard for tracking students' efficiency
Interactive data exploration with smart drill-down
Kaltura Software company: Kaltura video platform
Empowering L&D managers through customisation of inline learning analytics
Topic dependency models: graph-based visual analytics for communicating assessment data
Using learning analytics to investigate patterns of performance and engagement in large classes
Ripple: a crowdsourced adaptive platform for recommendation of learning activities
Development and adoption of an adaptive learning system: reflections and lessons learned
Effects of learning analytics dashboard: analyzing the relations among dashboard utilization, satisfaction, and learning achievement
Earth movers' stochastic conformance checking
The power of positive deviance
TSCL: a conceptual model to inform understanding of collaborative learning processes at interactive tabletops
Analytics of learning strategies: associations with academic performance and feedback
Towards textual reporting in learning analytics dashboards
LIM app: reflecting on audience feedback for improving presentation skills
Addressing learner issues with StepUp!: an evaluation
User-adaptive exploration of multidimensional data
User-cognizant multidimensional analysis
Discovery-driven exploration of OLAP data cubes
Learning process analytics for a self-study class in a semantic mediawiki
Perceiving learning at a glance: a systematic literature review of learning dashboard research
Automated insightful drill-down recommendations for learning analytics dashboards
Application of big data in education data mining and learning analytics: a literature review
Using learning analytics to visualise computer science teamwork
Integrated analytic dashboard for virtual evaluation laboratories and collaborative forums
From local patterns to global models: towards domain driven educational process mining
Defining and optimizing indicator-based diversity measures in multiobjective search
Learning analytics dashboard applications
Recompiling learning processes from event logs. Knowl.-Based Syst.