Embodied Head Gesture and Distance Education


http://www.diva-portal.org

This is the published version of a paper presented at 2nd International IBM Symposium on Human
Factors, Software, and Systems Engineering, 26–30 July 2015, Las Vegas, United States.

Citation for the original published paper :

Khan, M S., ur Réhman, S. (2015)

Embodied head gesture and distance education.

In: 6th International Conference on Applied Human Factors and Ergonomics (AHFE 2015) and the

Affiliated Conferences (pp. 2034-2041).

Procedia Manufacturing

http://dx.doi.org/10.1016/j.promfg.2015.07.251

N.B. When citing this work, cite the original published paper.

Permanent link to this version:
http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-109205


Muhammad Sikandar Lal Khan and Shafiq ur Réhman / Procedia Manufacturing 3 (2015) 2034–2041

1. Introduction 

In the teaching and learning process, the role of the teacher is vital: his or her presence stimulates social, emotional and cognitive interaction, especially in cross-cultural communication and education [5, 31]. Researchers agree that in cross-cultural education, face-to-face interaction and communication deliver the best experience for developing skills, knowledge and competence [3, 4]. Recent technological advances, however, have given rise to online or distance education settings in which text, audio and video are the main forms of interaction. These forms of distance education have greatly benefited the education process [6]; however, pragmatics, prosody and non-verbal behavior, which are considered important components of the emotional and cognitive presence of a teacher, are not fully conveyed by internet-based text, audio and video communication [2]. The emotional and cognitive embodied presence of a teacher not only assists with planning, conducting and intervening in the teaching process but also improves the student’s ability to relate to and create meaning from the taught content [7,8]. It has been reported that a teacher’s behavioral cues, such as facial expression, eye contact, proximity, direction of attention, posture and gesture, have a significant impact on student performance and strengthen the teacher’s influential role [9,10,12].

The most common form of computer-mediated distance education is based on standard audio-video conferencing systems. Distance education settings based on video conferencing are considered economical and motivating for students but are restrictive when it comes to non-verbal communication [1,2,32]. In face-to-face learning, head gesture (and eye contact) conveys important communicative messages, especially where speech is ambiguous or hard to hear or understand [11,13]. Recently, researchers have proposed various robotic-agent based solutions to narrow the gap between face-to-face learning and computer-based distance learning. Distance learning based on robotic agents is also termed robot-mediated learning. These robotic agents improve instructional effectiveness by presenting an embodied presence of the teacher in the distance education setting (see [19]). It is reported that this inclusion can provide an effective blend of human behavioral cues and standard computer-mediated video-conferencing systems. This embodied presence increases with the social behaviour of the robotic agent, which is directly related to its movements [28]. Human head movement is very important in general conversation and in in-class communication: a tutor’s head movements represent contextual information and are an integral part of deictic gestures [26]. Despite the influential role of head gestures in teaching and learning, very little research has examined the role of gesture in the robot-mediated learning process.

In this work, we present experimental studies that investigate the role of embodied head gesture (and eye contact) developed for a robotic agent in distance learning scenarios. For these studies, we used our embodied telepresence system (ETS), which represents the head gestures of a human tutor in a distance education setting. We provide a comparison of the learning experience using ETS and a standard video conferencing system. Through such scenarios we want to test our hypothesis that ‘a robotic agent with expressive head gesture (i.e., one that mimics human head movement) can improve the students’ learning process and hence have a positive impact on their performance’.

2. Background 

The massive growth in communication technologies over the last two decades has had an impact on distance education and training, which differ markedly from traditional classroom interaction. It is estimated that more than 6.7 million students are registered in online distance education [14]. During in-class interaction, people exchange non-verbal cues, facial expressions, gaze direction, bodily gestures and tone of voice to create presence and perform various interaction patterns that aid information transmission. It is believed that non-verbal cues accomplish two distinct purposes: 1) a direct passage of information from one person to another; and 2) the ‘integrational aspects’ of the communication process [18]. The ‘integrational aspects’ comprise all the non-verbal physical manifestations of information exchange that regulate the interaction process, keep the conversation going, and provide semantic meaning as well as relation to larger contexts. The most dominant form of computer-mediated communication is based on standard video conferencing software, which is often described as a medium that is limited in non-verbal cues and social context [15,16,17]. Considering these limitations, researchers are proposing robotic and animated agents that would assist remote students and may offer computer-mediated education with the flavour of a human tutor’s behavioural cues.

Recently, robotic agents, also known as educational assistive robots (EAR), have been used to interact with students in order to help them develop educational skills both in distance education and in in-class settings [19, 20]. However, the effectiveness of these robotic agents is measured not only by task performance but also by their social behaviour as perceived and understood by their interaction partners [21, 22]. The perception of the social behaviour of a robotic agent is directly related to its movements, which also represent contextual information. Therefore, the design and modeling of the movements of a robotic agent’s mechanical parts (eyes, head, arm, etc.) must be approached carefully. When it comes to educational assistive robots in distance education, it is highly desirable that their head movement (and hence eye contact) ‘look like’ or be ‘similar’ to a human tutor’s head gesture, as human gestures are considered crucial in social interaction as well as in teaching and learning settings [9-11]. In an in-class setting, the tutor’s head movement communicates levels of gaze, physical proximity and other behaviors indicative of interactivity. This feedback communicates relevant information to synchronize rhythm between participants and provides contextual information [23-27].

In this work, we study the role of the head movements of educational assistive robots because of their significance in face-to-face settings. For these experiments, we used our robotic agent, the Embodied Tele-presence System (ETS). ETS is a three degree-of-freedom tele-robot that mimics human neck movements; i.e., ETS embodies the remote participant’s head movements to give a more levelheaded presence, so that the remote participant can be perceived by his or her collaborators as being equally present through gaze direction and head embodiment. For more details about the design and functionality of ETS, see [28].

2.1. Hypothesis 

For the following studies, we want to see how head gesture affects the focus of attention and whether it improves the learning experience of remote participants in distance education. We expected that remote participants would be more interested in, and more adaptively engaged by, interactions based on embodied head gesture than by a standard video conferencing based distance education setting. Therefore, we consider the following two hypotheses:

H1: Participants will have a positive reaction towards ETS, which will increase over time.
H2: Participants’ engagement with ETS in real time will improve their attention and involvement and hence motivate them toward learning.

Fig. 1. (a) Human head orientation modelling used for head pose estimation and for designing a 3 DOF neck/head robot; (b) Embodied Telepresence System (ETS), our educational assistive robot.

 

3. Embodied telepresence system (ETS): Modeling head gesture 

The simulation of head gesture through ETS consists of two modules: a software module and a hardware module. In the software module, the pose of the human head is estimated under the constraint that the head is a 3 DOF rigid object with yaw, pitch and roll movements, as shown in Fig. 1(a). We used our geometric head pose estimation (GHPE) algorithm, which estimates the head pose through the computer’s standard webcam. The GHPE algorithm is implemented in VC++, and its details can be found in [29].
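To make the 3 DOF rigid-object constraint concrete, the following minimal Python sketch shows how a head orientation can be written as three angles and recovered again from a rotation matrix. It is not the GHPE algorithm of [29]; the function names and the Z-Y-X (yaw about z, pitch about y, roll about x) axis convention are assumptions chosen only for illustration.

```python
import math


def rotation_from_ypr(yaw, pitch, roll):
    """Build a 3x3 rotation matrix R = Rz(yaw) * Ry(pitch) * Rx(roll).

    Angles are in radians. The axis assignment is an assumed convention,
    not one stated in the paper.
    """
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def ypr_from_rotation(r):
    """Recover (yaw, pitch, roll) in radians from a rotation matrix.

    Valid while |pitch| < 90 degrees, which covers ordinary head motion.
    """
    sin_pitch = max(-1.0, min(1.0, -r[2][0]))  # guard against rounding error
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(r[1][0], r[0][0])
    roll = math.atan2(r[2][1], r[2][2])
    return yaw, pitch, roll
```

Under this convention, any head pose within the stated pitch range corresponds to exactly one (yaw, pitch, roll) triple, which is what allows three servo motors to reproduce it.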

The hardware module of ETS consists of a 3 DOF neck/head robot that exhibits the real head movement. We named this robot the Embodied Telepresence System (ETS); it is shown in Fig. 1(b). The design of ETS consists of three servo motors attached in a configuration that gives all three degrees of head motion (yaw, pitch and roll). A tablet PC is used to present the audio and video of the person.

We used ETS in a distance education scenario where the tutor’s head gestures are presented to the students through the combination of the software and hardware modules of ETS. The deployment of the ETS system in a real distance education scenario is explained in Section 3.1.

3.1. System deployment 

The ETS system is deployed in a distance education setting with two sites: a student site and a tutor site, as shown in Fig. 2. The left column in Fig. 2 shows the tutor site and the right column shows the student site. At the tutor site, there are two computer screens: one displays the lecture slides and the other shows a real-time video stream of the student. A webcam-based GHPE algorithm is also installed at the tutor site to estimate the head gesture of the tutor. At the student site, an ETS presents the audio-video and head gesture of the tutor during the lecture. The same lecture slides are also presented on the student’s computer screen and are controlled by the tutor. The real-time deployment of the system consists of the following steps:

 
• Audio-video communication is set up between the tutor and student sites through video-over-IP software (i.e., Skype).
• Wireless data communication is done through an XBee wireless transceiver.
• The GHPE algorithm is used to calculate the yaw, pitch and roll angles of the tutor’s head.
• The pitch and roll angles are mapped directly to ETS to present the head gesture of the tutor at the student site, whereas the yaw angle determines where the tutor is looking, i.e., at the lecture slides or at the student video.
• Based on the angles provided to the ETS controller, ETS turns toward the lecture slides or toward the student at the student site, ‘showing’ eye contact and head gesture.
• The ETS controller generates PWM signals to perform these yaw, pitch and roll movements.
• The whole system performs real-time communication at 25 frames per second.
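The angle-mapping steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ETS controller itself: the pulse-width range, the yaw threshold, the side on which the slide screen stands, and all function and constant names are hypothetical and do not come from the paper.

```python
# All numeric constants are illustrative assumptions, not values from the paper.
SERVO_MIN_US = 1000        # assumed pulse width at -90 degrees
SERVO_MAX_US = 2000        # assumed pulse width at +90 degrees
SERVO_RANGE_DEG = 90.0     # assumed mechanical range of each servo
YAW_THRESHOLD_DEG = 15.0   # hypothetical: tutor yaw above this means "looking at slides"
SLIDES_YAW_DEG = 40.0      # hypothetical robot yaw that faces the student's slide screen
STUDENT_YAW_DEG = 0.0      # hypothetical robot yaw that faces the student


def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def angle_to_pulse_us(angle_deg):
    """Linearly map an angle in [-90, 90] degrees to a PWM pulse width in microseconds."""
    angle = clamp(angle_deg, -SERVO_RANGE_DEG, SERVO_RANGE_DEG)
    fraction = (angle + SERVO_RANGE_DEG) / (2 * SERVO_RANGE_DEG)
    return round(SERVO_MIN_US + fraction * (SERVO_MAX_US - SERVO_MIN_US))


def robot_yaw_for_tutor(tutor_yaw_deg):
    """Turn the tutor's yaw into a discrete gaze target: slides or student.

    Assumes the slide screen stands on the positive-yaw side of the tutor.
    """
    if tutor_yaw_deg > YAW_THRESHOLD_DEG:
        return SLIDES_YAW_DEG
    return STUDENT_YAW_DEG


def servo_commands(tutor_yaw_deg, tutor_pitch_deg, tutor_roll_deg):
    """Return the three pulse widths for one frame of head-pose estimates.

    Pitch and roll are passed through directly; yaw only selects the gaze target.
    """
    return {
        "yaw": angle_to_pulse_us(robot_yaw_for_tutor(tutor_yaw_deg)),
        "pitch": angle_to_pulse_us(tutor_pitch_deg),
        "roll": angle_to_pulse_us(tutor_roll_deg),
    }
```

At 25 frames per second, a loop of this kind would have to produce one set of commands roughly every 40 ms. Treating yaw as a two-way choice rather than a continuous angle reflects the fact that the screens at the tutor site and the student site are not arranged identically, so a raw yaw value would not point at the right target.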

Fig. 2. Application scenario: one-to-one distance education setting; the left side is the ‘tutor site’ and the right side depicts the remote student participant with the educational assistive robot.

 

Fig. 3. Real experimental setting: one-to-one distance education setting; the left side is the ‘tutor site’ with two screens and the head gesture estimation algorithm installed, and the right side depicts the remote student participant with one screen and an Embodied Telepresence System (ETS).

4. Experimental studies 

The goal of the experimental studies is to investigate the effectiveness of our novel distance education scenario, in which the remotely located student is assisted by the tutor’s head gestures, gaze and focus of attention. Furthermore, this study focuses on the effects of head gestures in distance education over time.

4.1. Participants and procedure 

Ten students (5 boys and 5 girls), aged 15 to 23, were recruited from the campus of Umeå University, Sweden. All the students were directly involved in distance education and had basic knowledge of mathematics. Furthermore, we hired one mathematics teacher to deliver a lecture on triangulations. The teacher was given training to familiarize him with the ETS and the head gesture based distance education setting. All the students and the teacher were told about the purpose of the experiment. The experiment was set up according to the system deployment steps. The real experimental setup is shown in Fig. 3. The experiment was run 10 times, once for each of the 10 students; in each run the teacher delivered a 5-minute lecture on triangulations to one student. Each run covered two scenarios: one where the teacher delivered the lecture through a simple Skype conversation for 2.5 minutes and one where the teacher delivered the lecture through ETS for 2.5 minutes. The order of these sessions was random; i.e., some students took the Skype lecture first and others the ETS-based lecture first. At the end of the experiment there was an informal interview, and each student was given a questionnaire with questions related to our hypotheses.

4.2. Questionnaire

Our subjective questionnaire was adapted from a previously developed questionnaire used in studies of distance education [30]. We selected the following questions, which are the most important and directly related to our experiment. The questionnaire used a 7-point Likert-style rating system, scaled from 1 to 7, where 1 represents strong disagreement (negative) and 7 represents strong agreement (positive). The questions in the questionnaire are:

 
1) Motivation toward learning: Does the movement capability of the educational assistive robot (ETS) motivate you to learn more during an online lecture?
2) Monitoring participants: Do you feel monitored by your tutor during the lecture?
3) Trust: Does the gaze direction of the tutor help to build trust during the lecture?
4) Understanding: Does ETS help you to understand more as compared to the traditional online teaching method?
5) Disturbing: Do you feel disturbed by the ETS? (1 = disturbed and 7 = not disturbed at all)
6) Track: Can you keep track of the lecture?
7) Welcome and Comfortable: Is ETS welcoming and comfortable during teaching?
8) Time: Do you forget the role of technology over time?
9) Physical Presence: Do you feel the physical presence of the tutor at your site?

 
For all these questions, the responses are compared with those for the traditional online teaching method (i.e., through video-over-IP software).

5. Results and discussion 

The questionnaire results were analyzed by calculating the means and standard deviations of the responses for the ETS and Skype systems used in the distance learning scenario. Table 1 shows the mean and standard deviation of the responses to each question on the 7-point Likert scale. The results are shown graphically in Fig. 4, where red bars show the ETS-based user responses and blue bars show the Skype-based user responses.

If we compare the standard deviations for ETS and Skype, we see larger variation for ETS than for the Skype-only setting on every question except Welcome/comfortable. The smaller variation for Skype may be due to the fact that the students were already used to traditional distance learning methods and had used them before, whereas this was the first time they had experienced distance learning through ETS. However, if we compare the mean values of the questionnaire, the ETS setting outperforms the Skype-only setting on every question except Disturbing. The students found the ETS-based scenario (μ = 4.5) unusual and therefore more disturbing than the Skype setting (μ = 5.0; higher values mean less disturbed). This was expected, as movement-based interaction is sometimes distracting, but with time (μ = 5.9) students forgot the role of technology and reported a ‘feeling’ of physical presence (μ = 6.3) of the tutor when using ETS. The students felt monitored (μ = 6.0), as the ETS sometimes turns toward the computer screen and sometimes makes eye contact with the student.
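The comparison in this paragraph can be reproduced directly from the values reported in Table 1. The short Python sketch below does so; the numbers are copied from Table 1, while the function and variable names are our own and purely illustrative. Because only summary statistics are reported, the sketch compares means and standard deviations and does not attempt any significance test.

```python
# (ETS mean, ETS std. dev., Skype mean, Skype std. dev.), copied from Table 1.
TABLE_1 = {
    "Motivation": (5.7, 0.7, 4.1, 0.6),
    "Monitoring": (6.0, 0.5, 3.8, 0.2),
    "Trust": (4.6, 1.1, 3.9, 0.9),
    "Understanding": (5.0, 0.9, 4.0, 0.5),
    "Disturbing": (4.5, 1.3, 5.0, 0.5),
    "Track": (6.2, 0.5, 6.1, 0.4),
    "Welcome/comfortable": (6.5, 0.3, 4.5, 1.0),
    "Time": (5.9, 0.9, 4.2, 0.7),
    "Physical Presence": (6.3, 0.6, 4.0, 0.2),
}


def mean_differences(table):
    """Return ETS mean minus Skype mean for each question, rounded to one decimal."""
    return {
        question: round(ets_mean - skype_mean, 1)
        for question, (ets_mean, _, skype_mean, _) in table.items()
    }


def questions_where_skype_scores_higher(table):
    """List the questions on which the Skype mean exceeds the ETS mean."""
    return [q for q, diff in mean_differences(table).items() if diff < 0]


def questions_where_ets_varies_less(table):
    """List the questions on which the ETS standard deviation is the smaller one."""
    return [q for q, (_, ets_sd, _, skype_sd) in table.items() if ets_sd < skype_sd]


if __name__ == "__main__":
    for question, diff in mean_differences(TABLE_1).items():
        print(f"{question:22s} ETS - Skype = {diff:+.1f}")
    print("Skype mean higher on:", questions_where_skype_scores_higher(TABLE_1))
    print("ETS std. dev. smaller on:", questions_where_ets_varies_less(TABLE_1))
```

The largest mean differences are for Physical Presence (2.3) and Monitoring (2.2), and the smallest positive difference is for Track (0.1).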

One of the students mentioned that ‘now he feels the influence of the teacher during lecture….’ Because of this monitoring/influence and the head gestures, students felt motivated to learn, and these non-verbal cues helped them to understand better.

There was more excitement among the students, as they found this learning method more fascinating. Trust was also higher with ETS-based learning than with Skype-only communication. The students kept track of the lecture almost equally well with both systems. Finally, the students found ETS welcoming and comfortable, and in the informal interviews they showed willingness to buy this product. Based on these results, it can be said with confidence that both of our hypotheses hold for this setting: H1, participants have a positive reaction towards ETS, which increases over time; and H2, participants’ engagement with ETS in real time enhances their attention and involvement and hence motivates them toward learning.

Table 1. Statistical analysis of the survey questionnaire.

                        ETS                      Skype
Question                Mean (μ)  Std. dev. (σ)  Mean (μ)  Std. dev. (σ)
Motivation              5.7       0.7            4.1       0.6
Monitoring              6.0       0.5            3.8       0.2
Trust                   4.6       1.1            3.9       0.9
Understanding           5.0       0.9            4.0       0.5
Disturbing              4.5       1.3            5.0       0.5
Track                   6.2       0.5            6.1       0.4
Welcome/comfortable     6.5       0.3            4.5       1.0
Time                    5.9       0.9            4.2       0.7
Physical Presence       6.3       0.6            4.0       0.2



6. Conclusion and future direction 

Most traditional methods for online teaching are limited to standard audio-video and text-based communication. These methods have limited ability to transmit certain non-verbal cues. In this paper, we have proposed a novel scenario for distance education by introducing an educational assistive robot named the embodied telepresence system (ETS). ETS is a physical representation of the tutor at the student site, where the head gestures of the tutor are mapped to ETS. Furthermore, ETS shifts its focus of attention according to the tutor’s focus of attention. Our user study shows the effectiveness of the proposed approach in a distance education scenario. The results suggest that non-verbal cues such as head nods and head shakes provide vital feedback and are a useful supplement to audio-video communication. Our hypothesis is upheld by the experimental results, i.e., the robotic agent with expressive head gesture improves the learning process and has a positive impact on student performance.

The present study is subject to limitations that should be addressed in future research. It focused on a small number of university students in Sweden with little cultural diversity; therefore, future research is required to determine whether the same results would be obtained in distance education environments with learners of different ages, genders, grades, intellectual levels and cultural backgrounds. Similarly, future research is needed to determine whether the proposed ETS-based online learning environment fosters or affects two-way presence, with a positive effect on student learning outcomes.

References 

[1] S. Yarosh, K. M. Inkpen and A. Brush, “Video playdate: toward free play across distance,” in Proceedings of the SIGCHI Conference on 
Human Factors in Computing Systems, 2010.  

[2] D. Szafir and B. Mutlu, “Pay attention!: designing adaptive agents that monitor and improve user engagement,” in Proceedings of the SIGCHI 
Conference on Human Factors in Computing Systems, 2012.  

[3] D. Batchelder and E. G. Warner, Beyond Experience: The Experiential Approach to Cross-Cultural Education., ERIC, 1977.  
[4] K. Cushner and R. W. Brislin, Intercultural interactions: A practical guide, Sage publications, 1995.  
[5] H. L. Dreyfus, On the internet: Thinking in Action, Routledge, 2008.  

 
Fig. 4. Mean Questionnaire Score. 



[6] M. Merryfield, “Like a veil: Cross-cultural experiential learning online,” Contemporary issues in technology and teacher education, vol. 3, no. 2, pp. 146-171, 2003.
[7] C. F. Aust, “Face-to-Face Communication outside the Digital Realm to Foster Student Growth and Development,” in Teaching, Learning, and the Net Generation: Concepts and Tools for Reaching Digital Learners, CH5, IGI Global, 2012, p. 74.
[8] D. M. Christophel, “The relationships among teacher immediacy behaviors, student motivation, and learning,” Communication education, vol. 39, no. 4, pp. 323-340, 1990.
[9] W.-M. Roth, “Gestures: Their role in teaching and learning,” Review of Educational Research, vol. 71, no. 3, pp. 365-392, 2001.
[10] J. R. Nelson and M. L. Roberts, “Ongoing reciprocal teacher-student interactions involving disruptive behaviors in general education classrooms,” Journal of Emotional and Behavioral Disorders, vol. 8, no. 1, pp. 27-37, 2000.
[11] R. H. Thaler and C. R. Sunstein, Nudge, Yale University Press, 2008.
[12] J. A. Fredricks, P. C. Blumenfeld and A. H. Paris, “School engagement: Potential of the concept, state of the evidence,” Review of educational research, vol. 74, no. 1, pp. 59-109, 2004.
[13] R. B. Church, S. Ayman-Nolley and S. Mahootian, “The role of gesture in bilingual education: Does gesture enhance learning?,” International Journal of Bilingual Education and Bilingualism, pp. 303-319, 2004.
[14] I. E. Allen and J. Seaman, Changing Course: Ten Years of Tracking Online Education in the United States, ERIC, 2013.
[15] A. Feenberg, “The written world: On the theory and practice of computer conferencing,” Mindweave: Communication, computers, and distance education, pp. 22-39, 1989.
[16] J. Wuther and J. Andersson, “Interpersonal Effects in Computer-Mediated Interaction,” Communication Research, pp. 460-480, 1994.
[17] D. R. Garrison, T. Anderson and W. Archer, “Critical inquiry in a text-based environment: Computer conferencing in higher education,” The internet and higher education, pp. 87-105, 1999.
[18] R. L. Birdwhistell, Kinesics and context: Essays on body motion communication, University of Pennsylvania press, 2010.
[19] L. Brown, R. Kerwin and A. M. Howard, “Applying behavioral strategies for student engagement using a robotic educational agent,” in IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2013.
[20] G. Gordon and C. Breazeal, “Bayesian Active Learning-based Robot Tutor for Children’s Word-Reading Skills,” in Proceedings of the 29th AAAI Conference on Artificial Intelligence, Austin, USA, 2015.
[21] M. Saerbeck and A. J. van Breemen, “Design guidelines and tools for creating believable motion for personal robots,” in the 16th IEEE International Symposium on Robot and Human interactive Communication (RO-MAN 2007), 2007.
[22] M. Saerbeck and C. Bartneck, “Perception of affect elicited by robot motion,” in The 5th ACM/IEEE international conference on Human-robot interaction, 2010.
[23] A. T. Dittmann and L. G. Llewellyn, “Relationship between vocalizations and head nods as listener responses,” Journal of personality and social psychology, vol. 9, no. 1, p. 79, 1968.
[24] J. Lemke, “Meaning-making in the conversation: Head spinning, heart winning, and everything in between,” Human Development, pp. 87-91, 1999.
[25] C. Pelachaud, V. Carofiglio, B. De Carolis, F. de Rosis and I. Poggi, “Embodied contextual agent in information delivering application,” in the first international joint conference on Autonomous agents and multiagent systems: part 2, 2002.
[26] C. Goodwin, Conversational organization: Interaction between speakers and hearers, Academic Press, 1981.
[27] L.-P. Morency, C. Sidner, C. Lee and T. Darrell, “Contextual recognition of head gestures,” in 7th international conference on Multimodal interfaces, 2005.
[28] M. Khan, H. Li and S. Rehman, “Embodied Tele-Presence System (ETS): Designing Tele-Presence for Video Teleconferencing,” in HCI International 2014, 2014.
[29] M. Khan, S. Rehman, Z. Lu and H. Li, “Head Orientation Modeling: Geometric Head Pose Estimation using Monocular Camera,” in Proceedings of the 1st IEEE/IIAE International Conference on Intelligent Systems and Image Processing, 2013.
[30] P. Youngblood, F. Trede and S. Di Corpo, “Facilitating online learning: A descriptive study,” Distance Education, vol. 22, pp. 264-284, 2001.
[31] R. M. Lehman and S. C. Conceicao, Creating a sense of presence in online teaching: How to 'be there' for distance learners, John Wiley & Sons, 2010.
[32] M. Khan and S. Rehman, Distance Communication: Trends and Challanges and how to resolve them, HandBook: Strategies for a Creative Future with Computer Science, Quality Design and Communicability. Blue Herons, 2014.