Lauer, Luisa; Altmeyer, Kristin; Malone, Sarah; Barz, Michael; Brünken, Roland; Sonntag, Daniel; Peschel, Markus: Investigating the Usability of a Head-Mounted Display Augmented Reality Device in Elementary School Children. Sensors (Basel), 2021-10-05. DOI: 10.3390/s21196623

Abstract: Augmenting reality via head-mounted displays (HMD-AR) is an emerging technology in education. The interactivity provided by HMD-AR devices is particularly promising for learning, but presents a challenge to human activity recognition, especially with children. Recent technological advances regarding speech and gesture recognition concerning Microsoft's HoloLens 2 may address this prevailing issue. In a within-subjects study with 47 elementary school children (2nd to 6th grade), we examined the usability of the HoloLens 2 using a standardized tutorial on multimodal interaction in AR. The overall system usability was rated "good". However, several behavioral metrics indicated that specific interaction modes differed in their efficiency. The results are of major importance for the development of learning applications in HMD-AR as they partially deviate from previous findings. In particular, the well-functioning recognition of children's voice commands that we observed represents a novelty. Furthermore, we found different interaction preferences in HMD-AR among the children. We also found the use of HMD-AR to have a positive effect on children's activity-related achievement emotions. Overall, our findings can serve as a basis for determining general requirements, possibilities, and limitations of the implementation of educational HMD-AR environments in elementary school classrooms.

Augmented reality (AR) is an emerging technology in education that enables real-time integration of real and virtual objects in the field of view [1, 2].
The real world represents the main channel of perception, and virtual objects are spatially and/or semantically connected to real objects [3]. In educational settings in particular, this offers great potential to enhance learning processes and, therefore, there is a high interest in the development and research of AR-environments and devices in this context. Head-mounted displays in particular enable engaging interaction with real and virtual objects. Recent review studies and meta-analyses have confirmed the general benefits of AR-applications for learning [4] [5] [6]. However, it is noticeable that children of elementary school age benefit less than older students [7]. The usability of the applied devices seems to play a significant role in the success of AR-applications [8]. Technology for the recognition of user activities and behavior is referred to as 'human activity recognition' (HAR) technology [9, 10]. It is suspected that the HAR of AR-devices such as Microsoft's HoloLens (first generation) was not yet technologically mature enough to enable interference-free learning in younger children. However, AR-technology has evolved and may be able to make up for the shortcomings of the past, potentially making AR suitable for younger students. Therefore, the current study aimed to test the usability of the HoloLens 2 for elementary school students and to provide an empirical basis to decide whether it is worthwhile to develop school-related learning scenarios for the device. In addition, we examined which of the offered multimodal interaction modes can be handled best by the students. Following Santos et al. [11], the overlay of the physical world with external representations through AR enables situated multimedia learning.
Based on this approach, AR attempts to combine the best of two worlds: situated active learning in a meaningful real-world environment and virtual learning environments carefully designed according to the principles of the Cognitive Theory of Multimedia Learning [12]. Initial studies indicate that, by fulfilling the spatial contiguity principle, AR-based learning environments can reduce cognitive load [13, 14] and increase learning gains [15]. Moreover, Szajna et al. [16] found that HMD-AR-based training applications can significantly reduce the time required to perform tasks. Most educational AR-applications are designed for handheld display-devices like smartphones or tablets, while head-mounted display AR-devices (HMD-AR-devices) are rarely used [4]. Nevertheless, HMD-AR-devices provide several advantages when used in educational settings. In contrast to handheld devices, learners wearing see-through HMD-AR-devices experience a seamless merge of virtual and physical worlds. From the perspective of multimedia learning, this should facilitate the creation of meaningful cognitive relations between virtual information and the physical environment, improving learning outcomes [12, 17]. Moreover, unlike handheld display-devices, HMD-AR-devices allow for freehand interaction with physical as well as virtual objects [18]. This becomes particularly useful for learning settings based on physical activities, like laboratory work, which requires learners to use both of their hands [19]. Theories of embodied cognition suggest that bodily interactions with a learning task, such as hand and finger gestures, can support cognitive processes [20]. Furthermore, Korbach et al. [21] used 2D multimedia learning material to show that using the index finger for pointing at and tracing related information influences a learner's focus of visual attention and promotes the learning process.
Since HMD-AR-based learning environments enable the presentation or adaptation of learning information based on a learner's gestures or actions in real-time, positive effects of embodied cognition are expected to be particularly strong [22]. According to Yuen et al. [23], educational AR-applications can be designed for discovery-based learning (DBL), object modelling (OM), game-based learning (GBL), for the teaching of specific skills in training, or they can be integrated into distinct educational AR-books, with GBL and OM being the most frequently addressed purposes of educational AR-applications [24]. AR in education can help learners to conduct authentic explorations in the real world by displaying virtual elements [25], and can facilitate the observation of processes that cannot be perceived with the naked eye [26]. Further, AR opens new opportunities for the individualization of the learning process through real-time interaction between reality and virtuality, i.e., real-time reaction and adaptation to the learner's actions [27]. For instance, recent technological advances enable the augmentation of relevant real objects that were fixated by a learner [28]. Besides promoting the acquisition of knowledge and skills [5, 29], AR can positively influence curiosity [30] as well as motivation and interest [31] in educational situations. Motivation and interest are known to be modulated by so-called activity emotions, which, as a type of achievement emotions, concern ongoing achievement-related activities [32, 33]. While positive, activating emotions (e.g., enjoyment) are assumed to promote motivation and interest, negative, rather deactivating emotions (e.g., boredom) are associated with their decline. Therefore, emotions such as enjoyment and boredom are referred to as 'activity-related achievement emotions'. However, the use of AR in education can be obstructed by technical issues and can require additional instruction [34].
Hence, well-designed user interfaces in AR-applications are essential for successful learning [35]. Due to the prevailing research gap concerning the use of HMD-AR-devices and applications in education, as well as ongoing technical advancements concerning HMD-AR-devices, further research is required to validate the existing results and to investigate the effects of HMD-AR on the learning process. In order for AR-devices to exert their positive impact on information processing during learning, the handling of and interaction with the device itself or with the virtual learning information offered must not itself impose additional load on the learners, as described in the previous section. According to several reviews on AR in education, an often-reported issue concerning the practical use of AR-devices and applications in educational situations is their underwhelming usability [4, 36, 37]. The (technical) usability of an educational technology-supported setting, which comprises technically conditioned aspects of use and operation, influences the overall usefulness of a learning application [38]. While good usability of educational AR-applications facilitates learning, poor usability can even hamper learning processes [39]. Further, Papakostas et al. [40] found usability to be the strongest predictor of the behavioral intention to use an AR-application for training. For HMD-AR-devices, poor performance of the user activity recognition concerning the detection of operation commands can impact usability, as the device is operated through gesture- or voice-based interaction [41]. This aspect is even more important when using the devices with young children, as their physical body characteristics (e.g., hand size, arm length, voice pitch) differ from those of adults [42], for whom the devices are currently designed and calibrated. Previous research concerning the (technical) usability of HMD-AR-devices focused mainly on Microsoft's HoloLens (first generation) and samples of adults.
An evaluation of the device for the purpose of an assembly application for manufacturing [18] found the device to be applicable, but also revealed that the spatial mapping required improvement. Munsinger et al. [43] used the Microsoft HoloLens (first generation) to investigate its usability for a target group of elementary school children. They compared three AR-interaction modes provided by the HoloLens ('remote clicker', 'air-tap', 'voice command') in their efficiency using the measures 'input errors', 'tutorial time', and 'game time', and found that the 'voice command'-interaction performed significantly worse than the other two. Their findings are in line with the rather poor performance generally observed for interactive devices with voice-based operation [41, 44]. Besides their physical body characteristics, the children's individual state of cognitive development concerning motoric skills and spatial cognition [42] may affect the usability of HMD-AR-devices. For many applications, multimodal interfaces have long been recognized to be more robust, accurate, and preferred by users than unimodal ones. A major benefit is that users can freely choose their preferred modality combination [45]. However, this requires the ability to make a good modality choice, because ineffective interaction modalities may lead to unsatisfactory results [46]. Still, multimodal interfaces are considered to be "especially well-suited for applications like education, which involve higher levels of load associated with mastering new content" [47] (p. 33). As the HoloLens 2 offers different means to interact multimodally in AR, we investigate the preferred modality choices of elementary school children. So far, investigations on children's handling of the revised interaction modes of the latest HoloLens 2 are still pending.
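Computationally, the mode-efficiency comparisons described above (and reported later in this paper) reduce to grouping per-trial logs by interaction mode and averaging an error or time measure. The following Python sketch illustrates this; all data values and field names are invented for illustration and are not the study's data:

```python
from statistics import mean

# Hypothetical trial logs in the style of Munsinger et al.'s comparison:
# one record per tutorial trial, with the interaction mode used and the
# number of input errors observed. Values are invented for illustration.
trials = [
    {"mode": "remote clicker", "input_errors": 1},
    {"mode": "remote clicker", "input_errors": 0},
    {"mode": "air-tap",        "input_errors": 2},
    {"mode": "air-tap",        "input_errors": 1},
    {"mode": "voice command",  "input_errors": 4},
    {"mode": "voice command",  "input_errors": 5},
]

def mean_errors_by_mode(records):
    """Group trials by interaction mode and average the error counts."""
    by_mode = {}
    for r in records:
        by_mode.setdefault(r["mode"], []).append(r["input_errors"])
    return {mode: mean(errs) for mode, errs in by_mode.items()}

# The mode with the highest mean error count performed worst:
means = mean_errors_by_mode(trials)
worst_mode = max(means, key=means.get)
print(worst_mode)  # voice command
```

The same grouping applies unchanged to time-based measures such as 'tutorial time' or 'game time'; only the aggregated field differs.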
Announced innovations and improvements concerning the spatial positioning, speech, and gesture recognition for the successor model HoloLens 2 by Microsoft (see Figure 1) do not only make it necessary to investigate the applicability of existing findings for the new device. The new device could further represent an important step towards user-friendly HMD-AR-applications for educational purposes, especially for young children. The HoloLens 2 offers various means to interact in AR. To describe these modes of interaction, we will focus on the action 'selection of an AR-object' from the tutorial that is pre-installed on the device (see Video S1a). The AR-object to select in the tutorial was a shimmering gemstone. On the one hand, there are gesture-based interactions: To select the AR-gemstone with a gesture-based interaction, one can either tap directly on the gemstone (newly implemented 'tap'-interaction, see Figure 2a and Video S1a) or one can aim at the gemstone from a distance with the open palm and then tap with the thumb and index finger ('air-tap'-interaction, see Figure 2b and Video S1a). On the other hand, there is voice-and-gaze-based interaction: To select the AR-gemstone, one can also look at the gemstone and say 'select' ('voice command'-interaction, see Figure 2c and Video S1a). In total, two gesture-based and one voice-and-gaze-based interaction mode are available for AR-interaction on the HoloLens 2. Children are vulnerable and it is the responsibility of adults to protect them from possible harm.
Technologies in research on children should therefore be applied very prudently. Particularly when using immersive technologies, such as AR and virtual reality (VR), special precautions should be taken [48]. To ensure that the psychological and cognitive state of the target group was taken into account, our research team included experts in the fields of infant mental development (psychologists) and elementary school pedagogy (teachers and researchers). These considerations led us to introduce the children to the technology gently: we first showed them the device and explained how it works in a child-friendly way. While they were using the smartglasses, an experimental supervisor was always on hand to help them. We also monitored the physical well-being of the children [49] by repeatedly asking them whether they experienced any discomfort in terms of simulator sickness. Moreover, the virtual content of the AR environment used did not contain frightening or startling elements. To protect the children's data (e.g., eye movement recordings [50]), we used a private offline Wi-Fi network to enable Mixed Reality Capture. Our aim is to assess the usability of Microsoft's HoloLens 2, the latest HMD-AR-device, for use with elementary school children. The device is not yet technically designed for use with children younger than 13 years: young children's lower interpupillary distance might hamper the perception of virtual objects [51]. Therefore, we want to explore how usable the device is in its current state, and which technical adaptations need to be carried out before the device can be successfully used with young children, following similar evaluations of the predecessor model by Munsinger et al. [43]. We want to gain an insight into the general challenges and benefits that can serve as baseline findings once the device is used in educational applications.
Our main research foci concerning the use of the device are:
1. Evaluation of the overall usability of the HoloLens 2 as an HMD-AR-device;
2. Comparison of the provided AR-interaction modes concerning their efficiency;
3. Assessment of the children's interaction preference in HMD-AR;
4. Examination of the change in activity-related achievement emotions.
We invited 47 students (29% female; age: M = 9.3 years, SD = 0.9 years; 2nd to 6th grade) to participate in a laboratory study at Saarland University. They took part in another study on the same day (either before or after attending this study), but the other study did not include the use of an HMD-AR-device. None of the children had previous experience with AR. In the beginning, we conducted test runs with four children (for procedure and instruction refinement, without data collection), so n = 43 valid data sets were collected. The study was conducted using a within-subjects design. The independent variable was interaction mode, and the modes were modeled as different measuring points. The different multimodal AR-interaction modes provided by the HoloLens 2 ('tap', 'air-tap', and 'voice command') were compared regarding the dependent variables 'mean number of attempts' and 'mean time'. For the children's personal interaction preference in AR, we formed the variables 'most favorite interaction mode' and 'least favorite interaction mode'. The most and the least favorite interaction mode can each be 'tap', 'air-tap', or 'voice command'. To investigate general effects of HMD-AR usage on activity-related achievement emotions, we formed a pre- and a post-test variable for 'enjoyment', 'boredom', and 'frustration'. To assess the overall device usability, we formed the variable 'system usability score'. Due to the COVID-19 situation, only individual appointments with private journeys could be made.
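The variable 'system usability score' presumably refers to the standard 10-item System Usability Scale (SUS), whose scoring rule is fixed. Assuming the conventional form (ten items, 1-5 agreement ratings, alternating positively and negatively worded items), the score can be computed as follows; the function name is our own:

```python
def sus_score(responses):
    """Score a standard 10-item System Usability Scale questionnaire.

    Odd-numbered items (positively worded) contribute (response - 1),
    even-numbered items (negatively worded) contribute (5 - response);
    the sum is scaled by 2.5 onto a 0-100 range.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects exactly 10 responses rated 1..5")
    raw = sum((r - 1) if i % 2 == 0 else (5 - r)  # index 0 is item 1 (odd)
              for i, r in enumerate(responses))
    return raw * 2.5

# A respondent answering 4 to every positive item and 2 to every
# negative item scores 75.0:
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
```

On widely used interpretation benchmarks (e.g., Bangor et al.'s adjective ratings), SUS scores in roughly the low 70s and above are typically described as "good", which matches the overall rating reported in the abstract.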
Prior to the start of the study, parents were informed about the investigation and gave their written consent for their children's study participation. The procedure of the study was centered around a standardized tutorial on interaction in HMD-AR on the HoloLens 2, in German. Before starting the tutorial, we assessed the children's enjoyment, boredom, and frustration. These activity-related achievement emotions are assumed to allow for inferences about motivation and interest [32, 33] (variables 'enjoyment-pre', 'frustration-pre', 'boredom-pre'). Each emotion was assessed using a single item adapted for children from Riemer and Schrader [52] (see Appendix A and Document S1b). Moreover, children were asked about their previous experience with AR. The children were then introduced to the HoloLens 2 and the concept of HMD-AR by showing them the 'Mixed Reality Capture' (livestream) while the experimental supervisor was wearing the device (see Figure 3). We thoroughly instructed the children to handle the device carefully and explained that it is not a toy. Afterwards, the experimenter mounted the device on the child's head and an eye calibration was carried out. As described in Section 1.5, the device is currently designed for adults, and the manual states that children under the age of 13 years might not be able to see virtual objects comfortably due to a low interpupillary distance. We therefore asked the children after the eye calibration whether they had any problems seeing the virtual objects, especially reading texts. Then, the children were informed that they were going to learn different methods of interaction in AR and went through the standardized tutorial on multimodal interaction in HMD-AR that is pre-installed on the HoloLens 2 (see Video S1a). The tutorial includes several interaction scenarios. For our analysis, we focus on the task 'selecting a gemstone' only, because it is available for all interaction modes.
The task 'selecting a gemstone' is the first shown in the tutorial. The three interaction modes are introduced one after the other, and the order of the tutorial tasks is fixed ('tap'-'air-tap'-'voice command'). At the beginning, three gems are shown. They must be selected one after the other with the respective method. During the entire tutorial, the gems can only be selected with the interaction method that is currently being introduced. An invisible speech-based virtual agent explains what to do in each case, and this information is additionally displayed in text form. For 'tap', the translated instruction is: "Tap a nearby gem with your finger to select it." The translated instruction for 'air-tap' is: "Aim the beam from your palm at holograms out of range. Tap to select with your index finger and thumb and release." For the 'voice command'-interaction, the translated instruction is: "Target a gem with the gaze cursor and say 'Select'." The auditory explanation is played only once, while the text remains visible. If the correct (gesture or voice) input does not follow immediately, help is given depending on the interaction method: For gesture-based interaction, a hand appears that repeats the correct gesture until the gem is successfully selected. In voice-based interaction, the text "say