title: Use of Adaptive Feedback in an App for English Language Spontaneous Speech
authors: Lehman, Blair; Gu, Lin; Zhao, Jing; Tsuprun, Eugene; Kurzum, Christopher; Schiano, Michael; Liu, Yulin; Tanner Jackson, G.
date: 2020-06-09
journal: Artificial Intelligence in Education
doi: 10.1007/978-3-030-52237-7_25

abstract: Language learning apps have become increasingly popular. However, most of these apps target the first stages of learning a new language and are limited in the type of feedback that can be provided on users' spontaneous spoken responses. The English Language Artificial Intelligence (ELAi) app was developed to address this gap by providing users with a variety of prompts for spontaneous speech and adaptive, targeted feedback based on the automatic evaluation of spoken responses. Feedback in the ELAi app was presented across multiple pages such that users could choose the amount and depth of feedback they wanted to receive. The present work evaluates how 94 English language learners interacted with the app. We focused on participants' use of the feedback pages and whether performance on spontaneous speech improved over the course of using the app. The findings revealed that users were most likely to access the shallowest feedback page, but use of the feedback pages differed based on the total number of sessions that users completed with the app. Users showed improvement in their response performance over the course of using the app, which suggests that the design of repeated practice and adaptive, targeted feedback in the ELAi app is promising. Patterns of feedback page use are discussed further, as well as potential design modifications that could increase the use of feedback and maximize improvement in English language spontaneous speech.

Language learning has moved from the traditional classroom-only model to computer-assisted language learning and, more recently, to mobile-assisted language learning (MALL) [1, 2]. MALL apps provide users with flexibility, autonomy, and personalized learning experiences [3]. There are currently over 100 language learning apps in the iOS App Store, and apps have even expanded to smart watches that incorporate exercise into language learning [4]. MALL apps have been shown to be effective [5, 6]. Duolingo, for example, claims to be as effective as college-level language courses [7], but others report more mixed findings [8]. Within this abundance of MALL apps, many share a similar focus: they target (a) general language learning and (b) learners at an initially low proficiency level. Thus, there is still a need for apps that support learners at other proficiency levels and with differing goals. For example, recent efforts in MALL app development have focused on the particular language needs of migrants and refugees [9, 10] and low-literacy adults [11].

One of the main advantages of MALL apps (and attractions for users) is that they provide immediate, targeted feedback on users' performance on learning activities. This is consistent with years of research showing that simply providing feedback is not enough; it must be delivered in a way that is optimally useful for learners [12, 13]. MALL apps are typically able to provide targeted feedback on the quality of selected-response items, grammar and spelling for written responses, and word pronunciation for constrained speaking tasks.
However, many MALL apps are limited in the level of detail they can provide as feedback on speaking tasks [14, 15]. Duolingo, for example, identifies whether or not a user has correctly pronounced a word, but it does not provide feedback about how the user could more accurately pronounce the target word. Given that speaking is often one of the more challenging aspects of learning a language [16-18], it is important for MALL apps to provide feedback in such a way that users feel confident that they can improve their speaking skills.

Given the challenges of providing targeted feedback in real time for speaking tasks, most MALL apps focus only on constrained speaking tasks, in which users are given a text to read aloud verbatim, because automated feedback can be more easily provided. However, our recent user interviews suggested that many language learners would like to practice and receive feedback on spontaneous speaking tasks. Spontaneous speaking tasks ask learners to respond to an open-ended prompt (e.g., "Tell me about your favorite vacation."). This type of task is used on many standardized assessments of language skills (e.g., TOEFL®, IELTS™) because it demonstrates an advanced level of speaking proficiency, and spontaneous speaking skills are viewed as an important aspect of effective communication [19, 20]. This type of task is often not included in MALL apps because it is difficult to provide immediate, targeted feedback: spontaneous speaking tasks are typically evaluated by human raters in standardized assessments, which limits the ability to provide feedback to users immediately after responding.

To address the apparent lack of spontaneous speaking practice with immediate feedback for language learners, we developed the English Language Artificial Intelligence (ELAi) app. The ELAi app was designed to give users an opportunity to practice spontaneous speech and receive detailed feedback about the quality of their responses. This learning model is consistent with languaging [21-23], as students are asked to engage in effortful language production that can draw attention to their current weaknesses, but with the added benefit of targeted feedback to help focus efforts for improvement. We used an automated speech analysis tool that evaluates spontaneous speech on delivery, language use, and topic development to provide targeted, detailed feedback. However, it is not enough to simply provide feedback [12, 13]. A recent review of research on oral feedback for spoken responses, for example, found that there is a limited understanding of how learners make use of feedback [24]. It is therefore important that we understand how users interact with the feedback provided.

The present work is the first evaluation of the ELAi app and was guided by three research questions: (1) How do users interact with the app features? (2) What do users do after viewing feedback? (3) Does users' performance improve during app use? We investigated these questions with native Mandarin speakers who are learning English for the purpose of attending university in an English-speaking country.

The ELAi app was developed to provide an easily accessible resource for intermediate and advanced English language learners who aim to attend university in an English-speaking country to practice spontaneous speech and receive feedback.
The development was guided by interviews with potential users from the target audience, which revealed that users were often practicing spontaneous speech on their mobile phones but were unable to receive feedback in the same medium [25]. Users were most interested in feedback that corresponded to standardized English language assessment evaluations. Users also expressed a desire for access to sample responses to compare to their own responses, both in terms of delivery and content. The ELAi app was then developed to address the needs of these real-world users.

Users began with the ELAi app by browsing the many prompt options available. Figure 1 shows (from left to right) screenshots of the app splash page as well as the process of selecting a prompt category, responding to a specific prompt (e.g., "Do you think the use of smart watches will increase or decrease in the future? Why?"), and the feedback overview. After completing a new response, users were notified when feedback was available (the latency was roughly equal to the length of the response). User responses were evaluated with an automated speech analysis tool that used acoustic and language models to extract acoustic characteristics and create a response transcript. The models were based on nonnative English speakers to account for pronunciation differences due to accents. The automated speech analysis tool then evaluated the response on over 100 raw speech features derived from the acoustic characteristics and transcript. A subset of these features was selected based on its potential for learning feedback; these features were then combined to provide feedback on six key speech features (filler words, pauses, repeated words, speaking rate, stressed words, vocabulary diversity) to help users improve their speaking skills. Users could access feedback on four pages within the app, which allowed them to self-select the type and amount of feedback provided.

The first feedback page was My History (Fig. 1, rightmost panel), which provided a Feedback Overview for each response at a relatively shallow level: it identified only two speech features that needed improvement (weightlifter icon) and one feature that was done well (thumbs-up icon). This was the first instance of feedback that was adaptive to the individual user. For example, in Fig. 1 the user needed to improve on Repeated Words and Vocabulary Diversity, whereas Filler Words was done well in the technology response (top card). "Needs work" was defined separately for each speech feature: some features were flagged for overuse (filler words, pauses, repeated words, vocabulary diversity), whereas others had an inverted U-shaped relationship in which too much or too little was problematic (speaking rate, stressed words); a minimal sketch of this kind of mapping is given below. On Overview, however, users were not provided with any explanations or resources to improve future responses. Thus, Overview provided minimal feedback on the quality of a response and minimal support for improving future responses, but it did serve as an organized resource from which users could access all of the feedback they had received.
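To make the two decision rules concrete, the following is a minimal sketch of how per-response feature scores could be mapped to the "needs work" and "done well" labels shown on Overview. This is not the ELAi implementation: the thresholds, feature scales, and selection rule are hypothetical and only illustrate the overuse and inverted-U logic described above.

```python
# Hypothetical sketch of the feedback mapping, not the ELAi implementation.
# All cutoffs, bands, and scales below are assumptions for illustration only.

OVERUSE_CUTOFFS = {                  # higher score = more overuse
    "filler_words": 0.10,            # proportion of tokens that are fillers
    "pauses": 0.15,                  # proportion of response time spent pausing
    "repeated_words": 0.08,          # proportion of immediately repeated tokens
    "vocabulary_diversity": 0.30,    # here scored as overuse of the same words
}

ACCEPTABLE_BANDS = {                 # inverted-U features: too much or too little is flagged
    "speaking_rate": (2.0, 4.0),     # words per second
    "stressed_words": (0.15, 0.40),  # proportion of stressed words
}


def evaluate_response(scores: dict) -> dict:
    """Label each of the six key speech features for one spoken response."""
    labels = {}
    for feature, cutoff in OVERUSE_CUTOFFS.items():
        labels[feature] = "needs work" if scores[feature] > cutoff else "done well"
    for feature, (low, high) in ACCEPTABLE_BANDS.items():
        labels[feature] = "done well" if low <= scores[feature] <= high else "needs work"
    return labels


def overview_card(labels: dict) -> dict:
    """Pick up to two features to improve and one done well, as on Feedback Overview."""
    needs = [f for f, label in labels.items() if label == "needs work"]
    good = [f for f, label in labels.items() if label == "done well"]
    # A real system would rank the features (e.g., by distance from the cutoff);
    # this sketch simply takes the first ones found.
    return {"improve": needs[:2], "done_well": good[:1]}
```

Applied to a single response's scores, evaluate_response() would yield the six labels and overview_card() the condensed view shown on the My History cards.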
Figure 2 shows the next feedback page, which users could access by selecting a specific response card on Overview or directly through the feedback-ready notification. This page, the Feedback Summary Report, was designed to be the main source of feedback for users. On Summary Report, users could listen to their own response, review explanations for the three speech features shown on Overview, access additional ideas for how to develop a response to that prompt, and listen to sample responses from both native and nonnative English speakers (from left to right in Fig. 2). The design of Summary Report allowed users to quickly develop an understanding of the quality of their response by focusing on the two features that needed improvement, and it ensured that this feedback was actionable by providing additional information and resources for improving future responses.

Users could view more detailed feedback on the Feedback Full Report and Feedback Details pages (see Fig. 3). The Full Report provided explanations for all six speech features. For example, in Fig. 3 the two leftmost panels show that the user did well on Filler Words but needed to improve on Repeated Words and Speaking Rate. Details provided even more detailed information about four of the six speech features (pauses, repeated words, filler words, vocabulary diversity). Details included a transcript of the response (see the second panel from the right in Fig. 3), which highlighted the problematic aspects of the speech feature (e.g., repeated words). Details for vocabulary diversity suggested additional words that could be used to respond to the prompt (see the rightmost panel in Fig. 3). Full Report and Details provided users with a greater amount of, and more in-depth, feedback, which can be beneficial if users dedicate the time and effort needed to process and apply the information provided [26].

Users were also able to view their app use metrics on the Me Screen: the total amount of time they had spent recording responses, the total number of responses, the recording time for the current week, and how many days in a row they had recorded responses. The Me Screen also allowed users to View Badges that they had earned. Users could earn a variety of badges that targeted engagement and performance. Engagement-based badges were designed to encourage persistence and regular practice (e.g., multi-day streaks of recording), whereas performance-based badges allowed users to track their progress over time on a single speech feature (e.g., receiving "good job" on filler words three times in a row); a minimal sketch of this streak logic follows the procedure description below.

Participants were 94 students from an English language learning program in China that primarily focused on preparation for standardized English language assessments. Gender information was obtained from 62 participants: 58% female, 33% male, and 6% preferred not to respond. Participants completed from 1 to 45 sessions with the ELAi app over a one-month period (M = 8.62, SD = 8.56). Sessions lasted a little over five minutes on average (SD = 4.52 minutes) and included an average of 17.2 user-initiated actions (SD = 14.8). Users completed an average of 14.4 spoken responses over the course of using the ELAi app (SD = 24.2).

Participants were recruited through their English language learning program. Those who were interested completed an informed consent form and were provided with the information needed to access the ELAi app. Participants were free to use the app as they wanted for one month. There were no direct instructions about how users should interact with the app; however, participants were told that they would receive a certificate of participation if they recorded at least five spoken responses.
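As a concrete illustration of the performance-based badge logic referenced above, the following is a minimal sketch of streak detection over a user's per-feature feedback history. The badge name, streak length, and data format are assumptions for illustration; the actual ELAi badge rules are not described beyond the examples above.

```python
# Minimal, hypothetical sketch of performance-badge streak detection.
# The badge name, streak length, and label history format are assumptions.
from typing import Iterable


def has_streak(labels: Iterable[str], target: str = "done well", length: int = 3) -> bool:
    """Return True if `target` occurs at least `length` times in a row."""
    run = 0
    for label in labels:
        run = run + 1 if label == target else 0
        if run >= length:
            return True
    return False


# Example: a hypothetical "Filler Words x3" badge for three consecutive
# responses in which filler words were labeled "done well".
filler_word_history = ["needs work", "done well", "done well", "done well"]
if has_streak(filler_word_history):
    print("Badge earned: Filler Words x3")
```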
First, we investigated the use of app features in four ways (see Table 1): feature access (proportion of participants), average feature time use (average time per access, in seconds), proportion of total session time (proportion of time), and proportion of total session actions (proportion of actions) [27]. The proportion of participants that accessed each feature at least once revealed a generally high rate of feature access, with the exception that 60% or fewer of users accessed the more in-depth feedback pages (Full Report, Details) and listened to their own or sample responses, which were features that users specifically requested. This contradiction between what users say they want and how they interact with a MALL app has been found in other apps as well [28].

Overall, the feature-use analyses revealed that users spent the majority of their time with the ELAi app browsing for a prompt, responding to prompts, viewing the shallowest level of feedback, and viewing their overall app usage data. This pattern is only partially consistent with user requests: users frequently practiced their spontaneous speech, but they typically did not use the more detailed feedback and learning resources that they had requested. It is important to note, however, that the more detailed feedback and learning resources were embedded in the app, meaning that users could only access them via another feedback page; Feedback Overview and Summary Report, on the other hand, could be accessed directly. Thus, the lack of access to the more detailed feedback (Full Report, Details) could reflect either a lack of user interest or a lack of feature awareness. To account for this dependence between actions, we repeated the proportion-of-actions analysis after removing views of less detailed feedback pages that immediately preceded a view of a more detailed feedback page. This was an overly conservative analysis, as it assumed that all of these less detailed feedback page views were only in service of accessing more detailed feedback. The pattern of findings remained the same, which suggests that, although we cannot pinpoint the exact reason for infrequent access of more detailed feedback, we can be confident that those pages were accessed less frequently.

The previous analyses considered the sample as a whole; however, there was a wide range in the degree to which users engaged with the ELAi app (1 to 45 sessions), which suggests the potential for different use patterns. Users were divided into low-engagement (five or fewer sessions, n = 47) and high-engagement (more than five sessions, n = 47) groups based on a median split to explore potential feature use differences. Table 1 shows the descriptive statistics for each engagement group. Particularly large differences can be seen for access to the more detailed feedback and learning resources, with high-engagement users accessing those features at least once at a higher rate than low-engagement users. The two engagement groups were compared with independent-samples t-tests on the average time spent on each feature, which revealed that the high-engagement group spent more time on all features except Summary Report, Full Report, and the Me Screen. Despite this difference in time spent on features, there were no differences in how users in each engagement group distributed their time (proportion of time, p's > .05) or actions (proportion of actions, p's > .05) within a session.
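The following is a minimal sketch of how such an engagement split and per-feature time comparison could be computed; the data layout, column names, toy values, and use of Welch's t-test are assumptions, since the paper does not report these implementation details.

```python
# Hypothetical sketch of the engagement-group comparison; the DataFrame layout,
# column names, toy values, and Welch's t-test are assumptions for illustration.
import pandas as pd
from scipy import stats

# One row per user: total sessions and average seconds per access for two features.
users = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5, 6],
    "n_sessions": [2, 12, 5, 30, 4, 9],
    "summary_report_sec": [20.1, 35.4, 18.0, 40.2, 15.3, 28.7],
    "full_report_sec": [0.0, 12.3, 4.1, 22.8, 1.5, 9.9],
})

# Median split on total sessions (the study used five or fewer vs. more than five).
cutoff = users["n_sessions"].median()
users["group"] = (users["n_sessions"] > cutoff).map({True: "high", False: "low"})

# Independent-samples t-test for each feature's average time per access.
for col in ["summary_report_sec", "full_report_sec"]:
    low = users.loc[users["group"] == "low", col]
    high = users.loc[users["group"] == "high", col]
    t, p = stats.ttest_ind(high, low, equal_var=False)  # Welch's variant
    print(f"{col}: t = {t:.2f}, p = {p:.3f}")
```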
The comparison of engagement groups revealed that users who engaged more with the ELAi app accessed more features and spent more time on those features, particularly the features that provided more in-depth feedback and support for improving future performance. These findings led us to ask whether particular patterns of behavior after viewing feedback were indicative of more or less productive behavior. For example, a productive behavior after viewing Feedback Overview would be to access the Feedback Summary Report in order to better understand why certain speech features need improvement and to access resources for improvement. Thus, we investigated the next action taken after viewing each type of feedback. For these analyses we combined several actions into action categories: Browse Behavior (View Category, View Prompt), Feedback Viewed, Me Screen Viewed (Me Screen, View Badge), and Exit App.

Repeated-measures ANOVAs compared the prevalence of post-feedback actions for each feedback page (see Table 2) and were significant [Overview: F(4, 352) = 33.6, p < .001, MSe = .044, partial η² = .277; Summary Report: F(4, 268) = 333, p < .001, MSe = .024, partial η² = .832; Full Report: F(4, 212) = 313, p < .001, MSe = .026, partial η² = .855; Details: F(4, 224) = 259, p < .001, MSe = .032, partial η² = .822]. Bonferroni corrections were applied to all post hoc comparisons. For Overview, all action categories were more likely to occur than exiting the app. A different pattern emerged for the remaining feedback pages: viewing feedback was the most likely action after viewing Summary Report, Full Report, and Details, with at least 80% of next actions involving a view of one of the feedback pages. Exit App and Browse Behavior were the next most likely to occur, and Me Screen Viewed was the least likely action after viewing those three feedback pages. These findings suggest that if users go deeper into the feedback than Overview, they may enter a potentially beneficial feedback loop.

Last, we investigated changes in spoken response performance over the course of app interaction. User sessions (visits to the app) were divided into thirds (first, middle, last), and we investigated changes in performance from the first third to the last third. Performance was measured as the proportion of spoken responses that received "Needs Work" feedback on each speech feature in each third of sessions. This investigation reduced the number of users to 27, as users were required to have at least three sessions and at least one spoken response in both the first and last third of sessions. All 27 users included in this analysis were in the high-engagement group, which means that they made greater use of the app features, in particular the more detailed feedback and resources for response improvement. Table 3 shows the descriptive statistics and paired-samples t-test comparisons for each of the six speech features. The comparisons revealed a reduction in the proportion of speech features that needed work, which suggests an overall improvement in performance across use of the ELAi app. The effect sizes for the differences between the first and last third of sessions were all large (d > .8) [29], with the exception of a medium effect size (.5 < d < .8) for Pauses. These findings are very promising, as they show large improvements in a variety of speech features over a relatively short period of time.
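A minimal sketch of the first-third versus last-third comparison for a single speech feature is shown below; the data values are toy numbers, and the paired Cohen's d formula (using the standard deviation of the differences) is an assumption about a detail the paper does not specify.

```python
# Hypothetical sketch of the first-third vs. last-third comparison for one
# speech feature; toy data, and the paired Cohen's d formula is an assumption.
import numpy as np
from scipy import stats

# Proportion of responses flagged "Needs Work" on one feature (e.g., filler
# words) for each user in the first and last third of their sessions.
first_third = np.array([0.80, 0.60, 0.75, 0.50, 0.90, 0.70])
last_third = np.array([0.40, 0.30, 0.50, 0.20, 0.60, 0.35])

t, p = stats.ttest_rel(first_third, last_third)  # paired-samples t-test

diff = first_third - last_third
cohens_d = diff.mean() / diff.std(ddof=1)  # effect size for paired data

print(f"t = {t:.2f}, p = {p:.3f}, d = {cohens_d:.2f}")
```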
However, the findings should be interpreted with caution, as only a small number of participants were included in these analyses (29% of the sample), time on task varied across participants, and we were not able to account for additional resources that users may have accessed during the same time period (e.g., language courses, other MALL apps). It is also important to note that this investigation was limited to performance within the app; a more formal investigation of changes in speaking skills (e.g., a pre/posttest design) is needed to determine the true effectiveness of the ELAi app as a learning tool [27].

There is currently a plethora of language learning apps available to users. However, these apps are often designed for beginning language learners and are limited in their ability to provide feedback on spoken responses. The ELAi app was developed as an easily accessible English language learning app for users who want detailed feedback on their speaking skills during spontaneous speech. The present work was the first evaluation of the ELAi app.

Overall, the findings revealed that users spent the majority of their time browsing for prompts, completing new responses, and viewing shallow-level feedback. This suggests that users are generally not taking advantage of the more in-depth feedback and resources that facilitate improvement, even though these were requested by users during interviews [28]. Although the prominence of shallow feedback viewing is disappointing, it could represent productive behavior. Feedback Overview is the only page on which users can compare their performance on multiple speech features across individual responses, which could reveal patterns of improvement or persistent issues [30] by leveraging the benefits of open learner models [31]. Future research is needed to determine whether this cross-response comparison is occurring and to explore designs that facilitate such comparisons [32, 33], as language learners may not engage in self-regulated learning behaviors on their own in MALL apps [34].

We also investigated changes in user performance. Our preliminary findings were promising in that more engaged users improved their performance on all six speech features from the beginning to the end of their interaction with the ELAi app. However, these findings are only preliminary, and a more rigorous investigation of the impact of the ELAi app on speaking skills is needed.

Overall, our initial findings suggest that the ELAi app is a promising MALL app, but there is still room for improvement. The Feedback Summary Report, for example, could be improved by requiring less scrolling for users to access learning resources and by explicitly highlighting the availability of more in-depth feedback to reduce any lack of feature awareness. Tailoring the feedback to user characteristics (e.g., cultural background) could also benefit learning [35]. New in-app incentives (e.g., badges) could encourage more frequent use (e.g., more than five sessions) and greater use of the in-depth feedback pages and learning resources. Users could also benefit from being shown their improvement over time to implicitly reward continued use of the app. Overall, the ELAi app shows initial promise as an easily accessible resource for practicing and receiving feedback on spontaneous speaking tasks, but more research is needed to understand how this app can be most beneficial to users.
References

[1] Mobile-assisted language learning
[2] A theory of learning for the mobile age
[3] Mobile devices for language learning? A meta-analysis
[4] Smart watches for making EFL learning effective, healthy, and happy
[5] Enhancing L2 learning through a mobile assisted spaced-repetition tool: an effective but bitter pill?
[6] Mobile-assisted ESL/EFL vocabulary learning: a systematic review and meta-analysis
[7] Duolingo effectiveness study final report
[8] Mobile-assisted language learning: a Duolingo case study
[9] Migrants and mobile technology use: gaps in the support provided by current tools
[10] Mobile language learning innovation inspired by migrants
[11] Hidden in plain sight: low-literacy adults in a developed country overcoming social and educational challenges through mobile learning support tools
[12] The effects of feedback interventions on performance: a historical review, a meta-analysis, and a preliminary feedback intervention theory
[13] Focus on formative feedback
[14] User experience of a mobile speaking application with automatic speech recognition for EFL learning
[15] An overview of mobile assisted language learning: can mobile devices support collaborative practice in speaking and listening? Paper presented at EuroCALL
[16] Language anxiety and achievement
[17] Investigating the dynamic nature of L2 willingness to communicate
[18] Students' perspectives on foreign language anxiety
[19] Why do many students appear reluctant to participate in classroom learning discourse? System
[20] Challenges experienced by Japanese students with oral communication skills in Australian Universities
[21] Communicative competence: some roles of comprehensible input and comprehensible output in its development
[22] Three functions of output in second language learning
[23] Languaging, agency, and collaboration in advanced second language proficiency
[24] Understanding linguistic, individual and contextual factors in oral feedback research: a review of empirical studies in L2 classrooms
[25] The appropriation of interactive technologies: some lessons from placeless documents
[26] Levels of processing: a framework for memory research
[27] Exploring mobile tool integration: design activities carefully or students may not learn
[28] Self-directed language learning in a mobile-assisted, out-of-class context: do students walk the talk?
[29] A power primer
[30] ProTutor: historic open learner models for pronunciation tutoring
[31] Open learner models
[32] Mobile-assisted language learning (MALL)
[33] Mobile learning as 'microlearning': conceptual considerations towards enhancements of didactic thinking
[34] The role of self-regulation and structuration in mobile learning
[35] Learners' oral corrective feedback preferences in relation to their cultural background, proficiency level and types of errors