key: cord-0046943-8wl1xzc3 authors: Mogessie, Michael; Elizabeth Richey, J.; McLaren, Bruce M.; Andres-Bray, Juan Miguel L.; Baker, Ryan S. title: Confrustion and Gaming While Learning with Erroneous Examples in a Decimals Game date: 2020-06-10 journal: Artificial Intelligence in Education DOI: 10.1007/978-3-030-52240-7_38 sha: b941756386ab3e5007c97a27ec5c855a1c6a1698 doc_id: 46943 cord_uid: 8wl1xzc3

Prior studies have explored the potential of erroneous examples in helping students learn more effectively by correcting errors in solutions to decimal problems. One recent study found that while students experience more confusion and frustration (confrustion) when working with erroneous examples, they demonstrate better retention of decimal concepts. In this study, we investigated whether this finding could be replicated in a digital learning game. In the erroneous examples (ErrEx) version of the game, students saw a character play the games and make mistakes, and then they corrected the characters’ errors. In the problem solving (PS) version, students played the games by themselves. We found that confrustion was significantly, negatively correlated with performance in both pretest (r = −.62, p < .001) and posttest (r = −.68, p < .001) and so was gaming the system (pretest r = −.58, p < .001, posttest r = −.66, p < .001). Posthoc (Tukey) tests indicated that students who did not see any erroneous examples (PS-only) experienced significantly lower levels of confrustion (p < .001) and gaming (p < .001). While we did not find significant differences in post-test performance across conditions, our findings show that students working with erroneous examples experience consistently higher levels of confrustion in both game and non-game contexts.

Researchers have investigated the value of solving problems using non-traditional approaches to problem solving. Worked examples [1] [2] [3] and erroneous examples [4] [5] [6] have been of particular interest.
Worked examples demonstrate a procedure to arrive at a correct solution and may prompt students to explain the correct steps of a solution, while erroneous examples require them to identify and fix errors in incorrect solutions. The reason these approaches improve learning has been attributed to their role in freeing up cognitive resources that can then be used to learn new knowledge [7].

Factors not specific to a particular approach may also interact with learning. Of these, affect and behavior have garnered the most attention [8] [9] [10] [11]. In particular, states of confusion, concentration and boredom have been shown to persist across computer-based learning environments (dialog tutors, problem-solving games, problem-solving intelligent tutors) [12]. In a recent study, we found that students who were assigned erroneous examples implemented in an intelligent tutor [13] experienced higher levels of confrustion [14], a mix of confusion and frustration, than those who were asked to answer typical problem-solving questions. However, we found that confrustion was negatively correlated with both immediate and delayed learning, albeit less so for students who worked with erroneous examples.

This study, which replicates our recent findings in a game context rather than an intelligent tutoring system (ITS) context, was motivated by two observations. First, in order to determine whether this relationship is robust, it is important to explore whether our recent findings persist in other digital learning environments. This is because levels of affective states such as frustration and behaviors such as gaming the system have been shown to vary across learning environments and user interfaces [12, 15]. Second, research has shown that students who engage in gaming the system also experience frustration [10], though frustration does not always precede gaming [12].
Therefore, it is interesting to explore whether this association persists when erroneous examples are implemented in a digital learning game context.

Participants were divided into four groups: two groups worked with only Erroneous Examples (ErrEx) or only Problem Solving (PS) questions, and the other two worked with a mix of questions, either ErrEx then PS or PS then ErrEx. We expected that students in all four groups would perform better from pretest to posttest. We then tested the following hypotheses:

H1: Confrustion and gaming will be negatively related to performance, even when controlling for prior knowledge.
H2: Students in any of the conditions that include erroneous examples will experience higher levels of confrustion and gaming the system.
H3: Students in any of the conditions that include erroneous examples will perform better than their PS-only counterparts in the posttest.

The data used in this study were collected in the spring of 2015. Participants were recruited from four teachers' classes at two middle schools, and participated over four to five class sessions. Both schools are located in the metropolitan area of a city in the United States. The analysis for this study included the data of 191 students, divided into four conditions within the game context.

Materials consisted of the digital learning game, Decimal Point [16], and three isomorphic versions of a test administered as a pretest and posttest. The Decimal Point game is laid out on an amusement park map with 24 mini-games, each of which students play for two rounds. All tests and the game used the Cognitive Tutor Authoring Tool (CTAT) [17] as a tutoring backend. The game was designed with a focus on common misconceptions middle school students have about decimals [18].

We used gameplay data to generate machine learning models to detect confrustion and gaming the system. In this study, we applied text replay coding [19, 20] to student logs to label 1,560 clips (inter-rater reliability κ = .74).
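The inter-rater reliability reported above is Cohen's kappa, which corrects the raw agreement between two coders for the agreement expected by chance. The following is a minimal sketch of that calculation, not the authors' coding pipeline; the function name `cohens_kappa` and the ten example labels are hypothetical and are not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal rates, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical labels for ten clips (1 = confrusted, 0 = not); NOT study data.
coder_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
coder_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]
kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

In this toy example the coders agree on 8 of 10 clips, but half of that agreement is expected by chance, so kappa is lower than the raw agreement rate.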
To predict confrustion and gaming, the detectors used 23 features of the students' interaction with the decimal tutor, involving the number of attempts, amount of time spent and restart behavior. After evaluating the performance of several classification algorithms in terms of Area Under the Receiver Operating Characteristic Curve (AUC ROC) and Cohen's Kappa (κ), we built the confrustion detector using the Extreme Gradient Boosting (XGBoost) ensemble tree-based classifier [21] (AUC ROC = .97, κ = .81) and the gaming detector using the J-Rip classifier [22] (AUC ROC = .85, κ = .62).

Confrustion was significantly, negatively correlated with performance on the pretest (r = −.62, p < .001) and posttest (r = −.68, p < .001). A multiple regression model using confrustion to predict posttest performance while controlling for pretest was also significant, F(2, 188) = 181.14, p < .001. Within the model, both pretest (b = .57, p < .001) and confrustion (b = −.32, p < .001) were significant; confrustion was a significant, negative predictor of posttest performance even after controlling for pretest.

Gaming was significantly, negatively correlated with performance on the pretest (r = −.58, p < .001) and posttest (r = −.66, p < .001). A multiple regression model using gaming to predict posttest performance while controlling for pretest was also significant, F(2, 188) = 181.14, p < .001. Within the model, both pretest (b = .59, p < .001) and gaming (b = −.31, p < .001) were significant, indicating that gaming was also a significant, negative predictor of posttest performance even after controlling for pretest.

Mean levels of confrustion and gaming for each condition are reported in Table 1. A one-way analysis of variance (ANOVA) comparing gaming and confrustion levels across conditions indicated a significant effect of condition on confrustion, F(3, 187) = 14.01, p < .001, and gaming, F(3, 187) = 10.07, p < .001.
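For readers unfamiliar with the F statistic in a one-way ANOVA, the sketch below shows how it is formed from between-group and within-group variability. With four conditions and 191 students, the degrees of freedom are 4 − 1 = 3 and 191 − 4 = 187, which matches the F(3, 187) reported above. The function name `one_way_anova` and the per-condition values are hypothetical illustrations, not the study's data or analysis code, and the sketch does not compute a p-value.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = fmean([x for g in groups for x in g])
    group_means = [fmean(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    # Assumes some within-group variability; otherwise F is undefined.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical confrustion levels per condition; NOT the values behind Table 1.
ps_only = [0.20, 0.25, 0.30]
errex_only = [0.40, 0.45, 0.50]
errex_then_ps = [0.40, 0.50, 0.45]
ps_then_errex = [0.45, 0.50, 0.40]

f_stat, df_b, df_w = one_way_anova([ps_only, errex_only, errex_then_ps, ps_then_errex])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F indicates that the condition means differ more than the within-condition noise would suggest; a follow-up test such as Tukey's is then needed to identify which conditions differ.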
Posthoc (Tukey) tests indicated that students in the PS-only condition experienced significantly lower levels of confrustion (ps < .001), while there were no differences among the other conditions (ps > .97). Similarly, posthoc (Tukey) tests indicated that students in the PS-only condition experienced significantly lower levels of gaming (ps < .001), while there were no differences among the other conditions (ps > .91).

Finally, a repeated-measures analysis of variance (ANOVA) indicated that students across all conditions improved significantly from pretest to posttest, F(3, 187) = 167.04, p < .001. See Table 1 for means and standard deviations across conditions. A series of ANOVAs indicated no significant differences across conditions on pretest, F(3, 187) = 1.63, p = .18, or posttest, F(3, 187) = 1.65, p = .18.

In this study, we implemented erroneous examples in a digital learning game context and found that students who played the erroneous examples versions of the game experienced higher levels of confrustion. There was also a significant correlation between gaming the system and confrustion. Future research might further explore the relationship between frustration and gaming, as previous research using affect detectors has found that frustration did not tend to precede gaming the system [12].

A previous study using a web-based intelligent tutor showed that students working with erroneous examples performed better than their problem-solving counterparts [6]. This study, however, did not replicate that finding. While it is not possible to make a direct comparison between confrustion levels in the game and intelligent tutor versions of the ErrEx condition, it is worth noting that students who played the game experienced higher levels of confrustion (M = 0.46, SD = 0.26) than those who used the intelligent tutor (M = 0.34, SD = 0.16) [13].
Since confrustion has been shown to be significantly, negatively correlated with learning, these higher levels of confrustion may explain why we did not see better learning effects of erroneous examples in the game context. Alternatively, integrating the game interface with a feature where students watch a game character play the game for them may have negatively impacted both the game experience and the intended benefit of erroneous examples. In an upcoming study, we will explore mechanisms intended to reduce the negative impact of confrustion and gaming on learning with erroneous examples in a digital learning game.

[1] The use of worked examples as a substitute for problem solving in learning algebra
[2] When and how often should worked examples be given to students? New results and a summary of the current state of research
[3] Learning from worked-out examples and problem solving
[4] Can erroneous examples help middle-school students learn decimals?
[5] Using erroneous examples to improve mathematics learning with a web-based tutoring system
[6] Delayed learning effects with erroneous examples: a study of learning decimals with a web-based tutor
[7] To err is human, to explain and correct is divine: a study of interactive erroneous examples with middle school math students
[8] Digital game-based learning
[9] Off-task behavior in the cognitive tutor classroom: when students "game the system"
[10] Detection and analysis of off-task gaming behavior in intelligent tutoring systems
[11] Situated Language and Learning: A Critique of Traditional Schooling
[12] Better to be frustrated than bored: The incidence, persistence, and impact of learners' cognitive-affective states during interactions with three different computer-based learning environments
[13] More confusion and frustration, better learning: the impact of erroneous examples
[14] Sequences of frustration and confusion, and learning
[15] Educational software features that encourage and discourage "gaming the system"
[16] A computer-based game that promotes mathematics learning more than a conventional approach
[17] Scaling up programming by demonstration for intelligent tutoring systems development: an open-access website for middle school mathematics learning
[18] Towards intelligent tutoring with erroneous examples: a taxonomy of decimal misconceptions
[19] Exploring the relationship between novice programmer confusion and achievement
[20] Human classification of low-fidelity replays of student actions
[21] XGBoost: a scalable tree boosting system
[22] Fast effective rule induction