key: cord-0046954-6xctfzme authors: Mbouzao, Boniface; Desmarais, Michel C.; Shrier, Ian title: Early Prediction of Success in MOOC from Video Interaction Features date: 2020-06-10 journal: Artificial Intelligence in Education DOI: 10.1007/978-3-030-52240-7_35 sha: 0e4c2ce6914a76661bc17533f1aac46a71a0841f doc_id: 46954 cord_uid: 6xctfzme

The popularity of online learning, such as MOOCs (Massive Open Online Courses), continues to increase among students. However, MOOC dropout remains high. Predicting student performance, so that it can feed instructors' dashboards and help them adapt their course structure and material, or trigger help and tailor interventions to specific groups of students, is a valuable research objective. Towards that end, this paper focuses on three predictive metrics of how students interact with MOOC videos (student attendance rate AR, utilization rate UR, and watch index WI) in order to predict which students will pass or fail the course. Results show that these metrics, taken after the first week and at the midpoint, can be highly effective for predicting which students will pass or fail the course.

Can the analytics of student interaction with videos provide effective predictors of their academic performance? With thousands of students registering for online courses every year, it is of great interest to both MOOC developers and instructors to predict, early on in the course, which students may drop out and which are likely to complete the course successfully. By using data taken directly from student behaviour in video-based learning (VBL), we believe we can predict student performance with minimal research bias while helping students avoid failing the course. The interactions during the first week of the course are a key reference point in predicting student performance throughout the course [8], should the student remain enrolled in the MOOC.
This can inform MOOC developers and instructors about which students are likely to persist and perform better if the course were tailored to their learning needs, rather than focused on large-volume enrollment to the detriment of successful student performance. The aim of this paper is to investigate characteristics of video utilization that can determine, at an early stage of the MOOC, whether a student is likely to pass or fail, even as early as after one week of interaction with VBL modules.

A fair amount of research is devoted to studying student video interactions, including specific characteristics of video-watching behaviors. For instance, Giannakos et al. [4] found a relationship between video interactions, repeated viewing, and the level of cognition required for specific video segments. Li et al. [13] also studied the link between student behavior patterns and learning performance. Their study included features such as video lectures, lecture slides, shared assignments and open-forum messages. They concluded that video lectures and lecture slides were the resources most used by students. He et al. [7] measured students' utilization of video in a MOOC and its relation to academic performance. They proposed indicators based on student interaction data to measure the utilization of video resources: attendance rate, utilization rate and watch ratio. Our study uses the utilization and attendance rates for performance prediction. Hughes et al. [9] show how data analysis techniques in MOOCs can help to identify students at risk of dropping out before it happens. Some studies used learners' behaviour, by defining engagement [12] or online social networking [3], to predict performance [10] or dropouts [6]. It is then possible to identify students who might drop out early enough [11] to give them proper help before it happens [14].
Student performance can be influenced by features of the videos themselves [5], which lead students to play some videos more than others [2, 17] or to spend more time on videos than on other online resources [16]. Beyond video interactions, Baker et al. [1] predicted success and failure based on student online activity in the Soomo Learning Environment. Comparing several prediction models, they show that the logistic regression model of their proposed combined model can identify up to 59.5% of the students who will perform poorly in the course, well above chance performance. Recently, Adithya et al. [18] analyzed student logs from three blended courses to predict student performance (see also Owen et al. [15], who use student profiles for performance prediction). Their results show that it is possible to predict student performance from the students' level of use of the online system.

The general objective of this study is to use video interaction metrics to assess the likelihood that a student will succeed in a MOOC. We also aim to evaluate the accuracy of these estimates as a function of the MOOC timeline; earlier estimates are expected to be less accurate but more useful for remedial purposes.

We first describe the three metrics involved and then describe how they are used to define groups. The AR and UR metrics are inspired by [7]. The attendance rate $AR_{s,c}$ of a student $s$ at a given week $c$ since the beginning of the course is the number of distinct videos that the student played divided by the total number of videos released up to that point in the course schedule. The utilization rate $UR_{s,c}$ of a student $s$ at a given week $c$ since the beginning of the course is the student's total video play time divided by the sum of the lengths of all videos released up to week $c$.
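As a minimal sketch of how these two rates could be computed, the Python snippet below assumes the event log has already been reduced to a per-student mapping from video id to seconds played, together with a mapping from video id to video length for the videos released up to week c. The function and variable names (`attendance_rate`, `utilization_rate`, `play_time`, `video_length`) and the sample values are illustrative assumptions, not taken from the paper.

```python
from typing import Dict


def attendance_rate(play_time: Dict[str, float],
                    video_length: Dict[str, float]) -> float:
    """AR: distinct released videos the student played / videos released so far."""
    if not video_length:
        return 0.0
    # Count a video as played only if it is released and has positive play time.
    played = {v for v, t in play_time.items() if t > 0 and v in video_length}
    return len(played) / len(video_length)


def utilization_rate(play_time: Dict[str, float],
                     video_length: Dict[str, float]) -> float:
    """UR: total play time / summed length of all released videos."""
    total_length = sum(video_length.values())
    if total_length == 0:
        return 0.0
    watched = sum(t for v, t in play_time.items() if v in video_length)
    return watched / total_length


# Hypothetical data in seconds (assumption: not taken from the course).
video_length = {"v1": 300.0, "v2": 240.0, "v3": 460.0}
play_time = {"v1": 300.0, "v3": 230.0}

ar = attendance_rate(play_time, video_length)
ur = utilization_rate(play_time, video_length)
print(f"AR={ar:.3f} UR={ur:.3f}")
```

Because play time is not capped at the video length in this sketch, a student who replays videos can have a utilization rate above 1; whether to cap it is a design choice the text does not settle.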
The watch index (WI) is defined as:

$$WI_{s,c} = UR_{s,c} \times AR_{s,c}, \qquad UR_{s,c} = \frac{\sum_{i=1}^{n} Wt_{s,i}}{\sum_{j=1}^{N} Vt_{j}}, \qquad AR_{s,c} = \frac{|W_{s,c}|}{V_{c}}$$

where $UR_{s,c}$ is the utilization rate of student $s$ at week $c$, $Vt_{j}$ is the duration of video $j$, $N$ is the total number of videos released up to week $c$, $Wt_{s,i}$ is the total time student $s$ played video $i$, and $n$ is the number of unique videos that student $s$ played up to week $c$; $AR_{s,c}$ is the attendance rate of student $s$ after week $c$, $V_{c}$ is the course's total number of videos released after week $c$, and $W_{s,c}$ is the collection of different videos watched by student $s$ after week $c$.

The rule for grouping and testing is taken from the classification of a single student compared to the average value over the group of students: a student whose metrics are above the average of the students enrolled in the course is likely to pass the course. The algorithm for grouping students compares each student's WI with $\overline{WI}$, the average WI over all students: students above the average form one group (Group II in the results below) and the remaining students form the other (Group I).

Our study uses the traces of an online course offered by McGill University on the edX platform, in which 30,640 students interacted with the learning videos. The course has 138 videos (plus one live session) and lasts 13 weeks. Among the students who interacted with the videos, 10,424 students are honors and 970 passed the course. The first week of the course had nine (9) videos with a total length of 41 min. We extracted 2,733,169 different student watching events (play, pause, seek forward/backward, stop). The event data is structured such that we can compute the play time of each student for each video.

Table 1 shows the results of classifying students into two groups according to the rule defined above. We find that after one week of data, students who passed the course overwhelmingly fall into Group II (78%). The trend is even stronger after week 6 (93%). For the students who failed (or dropped) the course, the split is closer to even: 60% of the students who failed are in Group I, and that ratio grows to 2/3 after the sixth week.
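The grouping rule and the per-outcome split discussed above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: it takes WI as the product of UR and AR, assigns students strictly above the mean WI to Group II, and runs on a tiny made-up cohort. The names `Student`, `assign_groups` and `split_by_outcome` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Student:
    sid: str
    ar: float      # attendance rate up to week c
    ur: float      # utilization rate up to week c
    passed: bool   # final course outcome

    @property
    def wi(self) -> float:
        # Assumption: watch index taken as the product UR * AR.
        return self.ur * self.ar


def assign_groups(students: List[Student]) -> Dict[str, str]:
    """Group II if WI is strictly above the cohort mean, otherwise Group I."""
    mean_wi = sum(s.wi for s in students) / len(students)
    return {s.sid: ("II" if s.wi > mean_wi else "I") for s in students}


def split_by_outcome(students: List[Student],
                     groups: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Share of each outcome class (pass/fail) that falls in each group."""
    counts = {"pass": Counter(), "fail": Counter()}
    for s in students:
        counts["pass" if s.passed else "fail"][groups[s.sid]] += 1
    return {
        outcome: {g: c[g] / sum(c.values()) for g in ("I", "II")}
        for outcome, c in counts.items()
        if sum(c.values()) > 0
    }


# Made-up cohort for illustration only (not data from the course).
cohort = [
    Student("a", ar=1.0, ur=0.9, passed=True),
    Student("b", ar=0.8, ur=0.7, passed=True),
    Student("c", ar=0.2, ur=0.1, passed=False),
    Student("d", ar=0.1, ur=0.05, passed=False),
    Student("e", ar=0.9, ur=0.8, passed=False),
]
groups = assign_groups(cohort)
split = split_by_outcome(cohort, groups)
print(groups)
print(split)
```

Each row of `split` sums to 1, which matches the way the percentages are reported per outcome class (students who passed, students who failed) rather than per group.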
We can see that around 4/5 of the students who will pass the course are in Group II, whereas around 60% of the students who will fail are in Group I. These results show an improvement over a previous study using the same kind of metrics to flag students at risk of failing. Comparing these results to those we obtained using the methodology of He et al. [7] (results shown in parentheses), we see that after the first week their method yields no majority class separating the students who will pass from those who will fail: the majority of students who will pass and the majority of students who will fail are both in Group II. After six weeks, the results based on their methodology do identify majority classes (the majority of students who passed are in Group II and the majority of students who failed are in Group I), but 35% of the students who passed are still in Group I.

Through quantitative analysis based on the Attendance Rate (AR), Utilization Rate (UR), and Watch Index (WI) metrics defined in this paper, it is possible to identify up to 60% of the students who will drop out or fail, and 78% of the students who will succeed, from the first week of student interaction with MOOC videos in a course with a total length of thirteen (13) weeks. Using these metrics, educational institutions can flag students at risk of failing or dropping out of the MOOC based on their interactions with the learning videos in the early stages of the course. Our study shows a better classification than the previous results of He et al. [7]. Results such as these should help MOOC developers better identify students who might drop out or fail the course, and thus take action to prevent it.
[1] Analyzing early at-risk factors in higher education e-learning courses
[2] Studying learning in the worldwide classroom: research into edX's first MOOC
[3] Learning about social learning in MOOCs: from statistical analysis to generative model
[4] Making sense of video analytics: lessons learned from clickstream interactions, attitudes, and learning outcome in a video-assisted course
[5] How video production affects student engagement: an empirical study of MOOC videos
[6] Dropout prediction in MOOCs using learner activity features
[7] Measuring student's utilization of video resources and its effect on academic performance
[8] Emerging student patterns in MOOCs: a (revised) graphical view
[9] The utilization of data analysis techniques in predicting student performance in massive open online courses (MOOCs)
[10] Predicting MOOC performance with week 1 behavior
[11] Behavioral analysis using cumulative playback time for identifying task hardship of instruction video
[12] Deconstructing disengagement: analyzing learner subpopulations in massive open online courses
[13] Accessing online learning material: quantitative behavior patterns and their effects on motivation and learning performance
[14] How do in-video interactions reflect perceived video difficulty?
[15] Applying learning analytics for the early prediction of students' academic performance in blended learning
[16] Video lecture watching behaviors of learners in online courses
[17] Who does what in a massive open online course?
[18] Predicting student performance based on online study habits: a study of blended courses