Artificial Emotional Intelligence in Socially Assistive Robots for Older Adults: A Pilot Study
Hojjat Abdollahi, Mohammad H. Mahoor, Rohola Zandie, Jarid Siewierski, Sara H. Qualls
Date: 2022-01-26. DOI: 10.1109/taffc.2022.3143803

Abstract: This paper presents our recent research on integrating artificial emotional intelligence in a social robot (Ryan) and studies the robot's effectiveness in engaging older adults. Ryan is a socially assistive robot designed to provide companionship for older adults with depression and dementia through conversation. We used two versions of Ryan for our study, empathic and non-empathic. The empathic Ryan utilizes a multimodal emotion recognition algorithm and a multimodal emotion expression system. Using different input modalities for emotion, i.e. facial expression and speech sentiment, the empathic Ryan detects users' emotional state and utilizes an affective dialogue manager to generate a response. On the other hand, the non-empathic Ryan lacks facial expression and uses scripted dialogues that do not factor in the users' emotional state. We studied these two versions of Ryan with 10 older adults living in a senior care facility. The statistically significant improvement in the users' reported face-scale mood measurement indicates an overall positive effect from the interaction with both the empathic and non-empathic versions of Ryan. However, the number of spoken words measurement and the exit survey analysis suggest that the users perceive the empathic Ryan as more engaging and likable.

Socially Assistive Robotics (SAR) is a sub-field of robotics that aims to develop intelligent robots that can provide aid and support to users [1]. For instance, older adults living in senior care facilities often feel lonely and isolated. Social interaction and mental stimulation are critical for improving their well-being [2], [3]. SAR has been shown to alleviate this problem by providing companionship to assist older adults through conversation and social interaction [4], [5]. Furthermore, the global outbreak of COVID-19 and the effects of social distancing and stay-at-home orders drew more attention to the isolation of older adults living in senior care facilities. The COVID-19 pandemic has highlighted the healthcare worker shortage that currently plagues the healthcare system [6], and SAR has recently been used by researchers to address this problem [7], [8], [9].

To more naturally and effectively interact with humans, we can endow robots with social capabilities. A social robot must be equipped [10] with human-oriented interaction that exhibits context- and user-appropriate social behavior and focuses attention and communication on the user. Studies suggest that adding emotional information to SAR enhances user satisfaction [11] and results in a more positive interaction between robot and human. Empathy is a critical skill in health and elder care; users perceive robots that express empathic behavior as more friendly, understanding, and caring [12]. A social robot with Artificial Emotional Intelligence (AEI) can recognize, process, simulate, and react to human affects/emotions [13]. The development of affective and empathic robots that can recognize users' emotions and interact with them naturally and effectively is in its infancy, and more research needs to be carried out in this field [14].
Liz: "Sure. Here is one: What's Forrest Gump's password? One Forrest one..." [The robot tells a joke while smiling for the user]

This dialogue example illustrates the different components that can serve to develop a friendly robot. Liz proactively asks Katie how she is doing. When a human-oriented robot proactively starts a conversation with a user living in a senior care facility, it is helpful for the robot to detect how long the user has been in the room. If the user has been in their room for a long period of time, they have probably been alone and had little social interaction during that time. The robot should also have the ability to engage in a spoken dialogue with the user [15]. In the example above, the robot uses Sentiment Analysis (SA) and Facial Expression Recognition (FER) and detects a discrepancy between Katie's response and her facial expression. Emotional intelligence requires a multimodal emotion perception system [16]. To improve Katie's mood, the robot decides to tell a joke and smile. This means that the robot needs multiple channels to express emotional information.

This paper presents the results of our recent progress in developing an emotionally intelligent and autonomous conversational robot named Ryan. Ryan is designed to assist older adults suffering from mild dementia. Impaired thinking and cognitive decline, apathy, loss of interest in activities and hobbies, social withdrawal, isolation, and trouble concentrating are common symptoms of both dementia and depression [17]. Figure 1 depicts a general diagram of our human-robot-interaction (HRI) system. We utilized state-of-the-art deep learning technology for multimodal emotion recognition (i.e. affective computing), the output of which is integrated into Ryan's dialogue management system. We developed Ryan's dialogue management system by writing scripted conversations on 12 different topics, including science, history, nature, music, movies, and literature. Based on the detection of users' facial expressions and language sentiment analysis, Ryan appears to empathize with users through emotive conversation and mirroring users' positive facial expressions (for example, Ryan smiles when the user smiles). We conducted an HRI study to measure the effectiveness of our emotionally intelligent robot in communicating and empathizing with older adults by creating two versions of the robot, one equipped with emotional intelligence (empathic Ryan) and one unequipped for emotional intelligence (non-empathic Ryan).

In 2016, we studied the feasibility of using a prototype version of Ryan with a broad range of features (dialogue, calendar reminders, photo album slide shows, music and video play, and facial expression recognition) to interact with older adults with mild depression and cognitive impairments [18]. The results of our previous study show that elderly individuals were interested in having a robot as a social companion and that their interest did not wane over time. The subjects reported enjoying their interactions with Ryan and accepted the robot as a social companion, although they did not believe that Ryan could replace human companionship [18]. Because Ryan was equipped with several features, we could not thoroughly study the effect of emotional intelligence on users' engagement in conversational interaction.
Therefore, in this study, we specifically focused on how emotional intelligence impacts the quality of interaction and engagement with Ryan. The main contributions of this paper are: 1) creating a multimodal emotion sensory and facial expressive system, 2) integrating the developed sensory and expressive system into a physical robot (i.e., creating the empathic Ryan), and 3) studying the effectiveness of the empathic Ryan with a cohort of older adults living in a senior care facility. Our hypothesis is that an emotionally intelligent robot is perceived as more friendly by users and positively affects their mental well-being (measured by changes in depression score and emotional state) in comparison to a robot without empathic capabilities.

The remainder of this paper is organized as follows. Section 2 defines the term Emotional Intelligence and details the makeup of an emotionally intelligent robot. Section 3 introduces a social robot named Ryan and explains the robot's hardware and software, concentrating on the components that correspond with the definition of emotional intelligence. Section 4 lays out the design of the study. The results are presented in Section 5. Finally, Section 6 concludes the paper and outlines future work.

Emotional Intelligence (EI) is the combination of thoughts and feelings [19] that enables us to perceive and manage our own emotions and also observe and interpret others' emotions and respond accordingly [20]. Dr. Picard, the author of the book "Affective Computing" [16], argues for the need to integrate emotion into our machines and claims that it might be impossible to reach true intelligence without emotions. Integrating emotions into machines and technology services can improve numerous and diverse aspects of our lives. EI can improve communication systems, governance, personal assistants, physical and mental healthcare, education, advertisement, and the gaming industry [21].

Before delving into EI, we will first clarify the word "emotion" and differentiate "empathy" from EI. Since there is no agreed-upon definition for emotion, we will use this word as the intuitive and subjective concept that is commonly used in the HRI literature [22]. Empathy is the ability to feel and experience other people's emotions: the capacity to (a) share other people's emotional state or be affected by it, (b) infer the reasons for said emotional state, and (c) adopt other people's perspectives [23]. Compared to empathy, EI is the general ability to perceive, understand, express, and manage emotions [16]. EI consists of three components, while empathy is considered one of the many aspects of EI:
(A) Sensing and measuring emotions: monitor and measure one's own and others' mental and emotional states.
(B) Understanding and modeling emotions: understand and interpret recorded emotions. Usually, this step is carried out by mixing sensory information to get a clear picture of the emotional states of all agents involved.
(C) Using and expressing emotions: utilize the measured emotions and current state of mind to drive one's thoughts, take action, choose responses, empathize, and express appropriate emotions using verbal and nonverbal cues.

Recently, there have been several studies that investigate incorporating empathy in social robots [24], [25], [26], [27]. This is mainly due to advances in emotion recognition in different modalities.
Due to these advances, more studies have fused different modalities of emotion to create a more natural emotion recognition system [28], [29]. One group of people that has been the subject of robotics studies in healthcare is the residents of senior care facilities. Back in 2003, Wada et al. [30] successfully showed that the social robot Paro can lower stress levels and create a strong bond with older adults. Although Paro is a pet-like robot with limited emotion expression and no emotion perception or speech abilities, it can be an effective companion for older adults. Paro is still being used as a robotic pet in dementia care studies [31]. With recent advancements in technology, especially in AI, HRI studies have evolved into a more sophisticated process. Dino et al. [32] studied the use of a social robot to deliver iCBT (Internet-based Cognitive Behavioural Therapy) to adults with depression. Sarabia et al. [33] used Nao [34] to combat social isolation in acute hospital settings. However, robots such as Paro and Nao are not expressive, and these studies do not focus on emotional intelligence and its effects on the user.

A robot with AEI should be able to detect people's emotional state while simulating its own state of mind. The act of understanding one's own feelings is called intra-personal intelligence [35]. It is possible to simulate intra-personal intelligence by modeling the state of mind of the robot using an internal emotion model. Sensing other people's emotions (interpersonal intelligence [35]) is more challenging. Other people's emotions are conveyed in several different modalities. As humans use multiple modalities to express their emotions, an emotionally intelligent robot must ideally have a multimodal emotion recognition system [36], [37]. However, there are very few studies using a multimodal emotion recognition system in a robot. Many studies on HRI use a uni-modal emotion recognition system. One of the most popular approaches to uni-modal emotion recognition is FER. Other than FER, which is based on non-verbal visual cues, sentiment analysis [38] provides verbal cues and has also been used in affective computing. Some researchers have used biological markers such as heart rate, galvanic skin response [11], vocal features [39], and body gesture [40] as other modalities to measure users' emotional state.

In this study, we use a multimodal emotion recognition model (i.e., facial expression analysis and sentiment analysis). This approach helps us to weigh different modalities based on their reliability in representing users' emotions. For instance, we may recognize a facial expression as "happy" though the person may feel "sad" inwardly. This could be due to low accuracy in automated FER systems or misinterpreting facial expressions. Therefore, to best perceive one's emotional state, we combine different verbal and nonverbal cues gathered from different sensors. This multimodal measurement model can help disambiguate the sensory information. Equation 1 describes our multimodal emotion perception model:

$E = \mathbf{S} \cdot \mathbf{I}$    (1)

Based on this model, E is a continuous variable {E ∈ R : −1 ≤ E ≤ +1} that describes valence (i.e., Negative, Neutral, or Positive). E is calculated as the dot product of the input sensory information vector (I) and the sensitivity vector (S). The sensitivity vector contains coefficients that indicate the weight of each sensory input value. For example, we can give a higher weight to the output of the sentiment analysis and a lower weight to the output of the FER. The weights can be determined using an HRI study or based on the measurement accuracy of each modality. This model can be expanded using an emotional dynamic matrix [22], which represents the influence that each emotion has on itself and other emotions over time.
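As a worked illustration of Eq. 1 (using the equal FER/sentiment weights that the paper later adopts when fusing the two modalities, and the FER and sentiment values from the Session 5 exchange reported in the results), the fused valence works out to:

```latex
E = \mathbf{S}\cdot\mathbf{I}
  = \begin{bmatrix} 0.5 & 0.5 \end{bmatrix}
    \begin{bmatrix} -0.37 \\ +0.75 \end{bmatrix}
  = 0.5\,(-0.37) + 0.5\,(+0.75)
  = +0.19
```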
In addition to sensing and interpreting emotions, a social robot should have means and tools to express and demonstrate its own emotions. Such tools include the ability to show facial expressions through mechanical actuators or computer graphics, to make gestures using hand and head movements, and to express emotions using voice intonation. The robot's "feelings" can be based on: (a) an internal emotion model that rests on the robot's emotional state, or personality, which can manifest when the robot receives a compliment or is being verbally abused; (b) a reaction to the user's feelings, which can be as simple as emotion mirroring (some studies suggest that empathy can be traced back to the mirror neuron system [41], [42]); or (c) a predefined emotion scripted by a psychologist. For example, a scripted story or memory can be accompanied by gestures and emotional expression. Emotion in social robots can be expressed using many modalities such as spoken language (Nao [34], Pepper [43], Ryan [44]), a mechanical face (Zeno [45]), a digitally animated face (Ryan [44], Socibot [46]), and body gesture (Nao [34]). In summary, we believe a social robot with AEI would be capable of sensing users' emotions using multiple modalities, interpreting the perceived emotions, choosing an appropriate response, and delivering it using a multimodal expression system. One such social robot is Ryan, and we describe this robot in the following section.

Due to the increasing life expectancy of human beings and the increasing shortage of caregivers in the United States, social robots are becoming more appealing as a helping hand. Studies show that social robots are successfully improving the overall well-being of their users [30], [47]. Social robots may also alleviate some of the side effects of loneliness in housing designed for older adults, such as depression or the degradation of cognitive abilities [32], [48], [49]. Ryan, a social robot created by DreamFace Technologies [44], is a companionbot for older adults living in assisted or independent living facilities. Ryan is specifically designed to be a companion robot, which means that we aim for Ryan to be empathic, expressive, appealing in appearance and manner, and able to motivate users to live in ways that improve their mental and physical health. Such a robot should have multiple streams of input data for observation, many output streams for reaction, and an intelligent program for making decisions and empathizing and conversing with users. Ryan has an expressive animated face (Figure 2). Ryan also has a high-definition RGB camera, a depth camera, a microphone, an active neck, a 10-inch display, and speakers. Section 3.4 describes Ryan's hardware in more detail.

As described in Section 2, there are three components to emotional intelligence. This section describes how these components are integrated into Ryan. There are several models of emotions in the literature [50], where Russell's [51] and Ekman's [52] are the most common models used in HRI studies [53], [54]. We use Russell's dimensional model for measuring emotional facial expression.
Using an RGB camera, Ryan captures 10 images per second. We feed each image into a face detector that uses the Viola-Jones algorithm [55]. We then crop the detected face and feed it into a deep neural network (DNN) for FER. The FER algorithm returns the probabilities for three emotion classes: Positive, Neutral, and Negative. Figure 3 illustrates the structure of our FER network. The input to the network is a 64 × 64 RGB image (the output of the face detector), and the output of the network is three numbers that represent the probabilities of the three emotion classes (i.e., Negative, Neutral, and Positive). We use a residual neural network (ResNet50) [56] for FER. ResNet is a state-of-the-art DNN that has been shown to work well on visual recognition tasks. Network depth is of crucial importance and may increase accuracy; however, increasing the depth makes training more difficult. Residual networks allow us to train deeper networks more easily and, therefore, improve recognition accuracy. We used the AffectNet [57] facial image dataset to train the residual network. AffectNet consists of more than 320,000 facial images with annotated expressions. We trained the network to classify a facial image into three categories of emotion (i.e., valence): "Positive" (class +1), "Negative" (class -1), or "Neutral" (class 0). The network was initially trained on an Nvidia 1080 Ti GPU using the AffectNet dataset and then fine-tuned, using transfer learning, for the target population (50+ years old) on a subset of 44 thousand facial images until the accuracy on the training data stabilized around 80% (Fig. 4).

Our FER algorithm returns 10 estimates of the user's facial expression per second, since the user's facial expression may change multiple times while conversing with the robot. The last frame before the user stops speaking might not be the best candidate for representing their facial expression at that moment and could result in a misclassification; for example, if the user is blinking, yawning, or covering their face, the output of the FER system might be incorrect. To reduce noise and create a more stable measure of emotional state, we use the data from the last 30 frames (see Figure 5). However, to make the algorithm more sensitive to the most recent changes in the subject's facial expression, we assigned higher weights to the more recent frames. The value (-1, 0, +1) for each new frame was added to the end of the list and the oldest one was deleted. Then the new emotional state was calculated as a dot product of the list of class values and the weights:

$EmotionalState = \sum_{i=1}^{30} w_i v_i$

where $w_i$ is the i-th weight and $v_i$ is the valence for the i-th frame. Figure 5 illustrates the video frames and the measured emotional state for a 72-year-old subject who was not included in the training set. We divided the measured emotional state into three categories for facial expression mirroring: Negative [−1, −0.1), Neutral [−0.1, +0.1], and Positive (+0.1, +1]. These ranges were chosen experimentally. Based on the output of our SA and FER algorithms, we found that defining Neutral as [−0.1, 0.1] provides reasonable accuracy when detecting neutral responses.
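The following Python sketch illustrates this per-frame classification and temporal smoothing under stated assumptions: a torchvision ResNet50 with a replaced 3-way output layer stands in for the trained FER network (the paper's trained weights are not available here), and a linearly increasing weight scheme is used for the 30-frame smoothing, since the exact weights are not given in the text. Function and class names are ours, not the authors'.

```python
import collections
import numpy as np
import torch
import torchvision

# Stand-in for the trained FER network: a ResNet50 backbone with a 3-way
# (Negative/Neutral/Positive) head. The paper's model was trained on AffectNet
# and fine-tuned on older-adult faces; the weights here are untrained/illustrative.
model = torchvision.models.resnet50(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 3)
model.eval()

CLASS_VALENCE = np.array([-1.0, 0.0, +1.0])  # per-frame class values (Negative, Neutral, Positive)

def frame_valence(face_crop: torch.Tensor) -> float:
    """face_crop: a 1x3x64x64 tensor from the face detector (the paper's input size)."""
    with torch.no_grad():
        probs = torch.softmax(model(face_crop), dim=1)
    return float(CLASS_VALENCE[int(probs.argmax())])

class ValenceSmoother:
    """Weighted average over the last 30 per-frame valence values,
    with higher weights on more recent frames."""

    def __init__(self, window: int = 30):
        self.values = collections.deque([0.0] * window, maxlen=window)
        # Linearly increasing, normalized weights -- an assumption; the paper
        # only states that more recent frames receive higher weights.
        w = np.arange(1, window + 1, dtype=float)
        self.weights = w / w.sum()

    def update(self, v: float) -> float:
        self.values.append(v)  # newest value at the end; the oldest is dropped
        return float(np.dot(self.weights, np.asarray(self.values)))

def mirror_category(emotional_state: float) -> str:
    """Bands used for facial expression mirroring."""
    if emotional_state < -0.1:
        return "Negative"
    if emotional_state > 0.1:
        return "Positive"
    return "Neutral"
```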
Automated sentiment analysis is a mature task in the field of natural language processing, with several open-source, publicly available toolboxes such as CoreNLP [58], developed at Stanford University for public use. The CoreNLP sentiment analysis toolbox is based on deep neural networks and is trained on the Stanford Sentiment Treebank, consisting of 11,855 single sentences extracted from movie reviews [21]. The system has an accuracy of 85.4% and is suitable for our research. The sentiment analysis module returns a value between -1 and +1 as the sentiment value of the preprocessed sentence. Finally, we use the model described in Sec. 2.2 to fuse the perceived emotional facial expressions and sentiment values to make sure the robot understands the multi-faceted user emotions correctly:

$FinalEmotion = \frac{1}{2}(Sentiment + EmotionalState)$    (4)

The FinalEmotion is a weighted average of the user utterance sentiment and the emotional state and is used to direct the flow of conversation. The decision to equally average the sentiment and the emotional state was based on our tests in the laboratory; more experiments are needed to find the ideal balance and weights.

For conversations with users, we wrote more than 90 minutes (2342 questions/answers) of conversational dialogues on 12 different topics (family, pets, TV shows, science, music, nature, foods, travel, art, movies, reading, and sports). We integrated the dialogues with the emotion recognition technology so that Ryan could engage users in a pleasant conversation while empathizing with them based on the perceived facial expressions and the sentiment of their responses. For example, if the participant's response to the question "How does playing cards make you feel?" was negative or the participant showed a "sad" facial expression, Ryan would say "I'm sorry to hear that!" If the sentiment was positive or Ryan detected a positive expression on the user's face, Ryan would say "I thought you seemed content! Do you prefer to play alone or with friends?", and if neutral, Ryan would say "What makes you feel this way?" Ryan also mirrored the user's positive facial expressions (Positive valence) to establish shared feelings and rapport, or showed a compassionate face when users had a negative emotion to facilitate empathy and rapport.
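A minimal sketch of how the fused score could drive these empathic follow-ups, assuming the same ±0.1 valence bands used for expression mirroring (the paper does not state the exact thresholds used by the dialogue manager); the example replies are taken from the card-playing dialogue above:

```python
def fuse(sentiment: float, emotional_state: float,
         w_sentiment: float = 0.5, w_face: float = 0.5) -> float:
    """Eq. 4: weighted average of utterance sentiment and facial emotional state."""
    return w_sentiment * sentiment + w_face * emotional_state

def empathic_reply(final_emotion: float) -> str:
    """Pick an affect-appropriate follow-up for the card-playing example."""
    if final_emotion < -0.1:   # negative band
        return "I'm sorry to hear that!"
    if final_emotion > 0.1:    # positive band
        return "I thought you seemed content! Do you prefer to play alone or with friends?"
    return "What makes you feel this way?"  # neutral band

# Session 5 example from the results: sentiment +0.75, FER -0.37 -> +0.19 (positive band).
print(empathic_reply(fuse(+0.75, -0.37)))
```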
For our dialogue management system, we created Program-R, a modified version of Program-Y [59], a publicly available dialogue manager that utilizes the Artificial Intelligence Markup Language (AIML) for scripted dialogues. Figure 6 demonstrates a sample dialogue between a user and Ryan. As the figure shows, the dialogue is more than just question-and-answer, and users can take different paths through the conversation. Program-R is a hybrid (rule-based and machine learning) system that uses state-of-the-art sentiment analysis to deliver an affective dialogue system. Studies on emotion-based dialogue systems stress different sources of information to extract user sentiments. Approaches like [38], [60] use only textual cues for sentiment-based dialogue systems, while [61], [62] explored the use of acoustic features. Our system uses multimodal facial and textual information in a dialogue management system. Program-R is a sentiment-adaptive AIML-based dialogue system (also known as a template-based dialogue system) that can fuse visual and textual information and respond to users accordingly. Unlike most dialogue systems, Program-R is an active agent, which means that it initiates the conversation and tries to have a controlled chat with the user. AIML [63] is an XML-based language that is used for organizing the set of all dialogues in chatbots such as Alice [64].

In AIML-based dialogue systems, we try to find the best response (responses are stored in the template tag in AIML) for any user input utterance using regex matching (stored in the pattern tag). Pattern and template tags together represent a unit of conversation under the category tag. One advantage of AIML is that history can be accessed via the that tag: every question is contextualized and is answered based on the last unit of dialogue between robot and user. To deliver a more interactive user experience, we added tags and features to AIML. The robot tags were added to send multimedia information along with the raw text response, giving the user a multimedia experience; they contain information such as images, videos, and the possible answers to multi-option questions (e.g., yes/no questions) to be presented to the user in certain dialogues. Moreover, the getsentiment tag, a custom tag built for this study, takes the user utterance after preprocessing and sends it to the sentiment analysis module.

Figure 7 depicts the architecture of the dialogue system. Program-R communicates with Ryan through a Representational State Transfer (RESTful) API [65]. After receiving the output of speech-to-text from Ryan, the raw text is sent to the Preprocessing module, which removes unnecessary punctuation, normalizes the text, and segments sentences. The Sentiment Analysis module is where the sentiment of the text is mixed with the output of the Facial Expression Recognition module (the Emotional State) to get a single score (see Eq. 4). The Brain's Question Handler takes into account the context, sentiment, and session data, while the Context Manager handles the context in which the conversation is happening. For example, some questions may have identical answers (e.g., yes/no), and without knowing the context it is impossible to produce the proper response. With the information provided by the Context Manager and the value computed from the Emotional State and Sentiment, the Question Handler produces an answer. Finally, the selected answer is sent to the Answer Handler and Postprocessing module to be sent back to Ryan.
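To make this round trip concrete, here is a minimal, hypothetical sketch of the dialogue-manager side of such a RESTful exchange. The endpoint name, JSON fields, and module stubs are assumptions for illustration, not the authors' implementation; only the Eq. 4 fusion step reflects the paper's description.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def preprocess(text: str) -> str:
    # Stub for punctuation removal, normalization, and sentence segmentation.
    return text.strip()

def analyze_sentiment(text: str) -> float:
    # Stub for the CoreNLP sentiment call; returns a score in [-1, +1].
    return 0.0

def question_handler(text: str, final_emotion: float, session_id: str) -> str:
    # Stub for AIML pattern matching plus context and session handling.
    return "What makes you feel this way?"

@app.route("/ask", methods=["POST"])  # hypothetical endpoint and field names
def ask():
    payload = request.get_json()
    text = preprocess(payload["utterance"])              # speech-to-text output from Ryan
    emotional_state = float(payload["emotional_state"])  # smoothed FER valence sent by the robot
    final_emotion = 0.5 * analyze_sentiment(text) + 0.5 * emotional_state  # Eq. 4
    answer = question_handler(text, final_emotion, payload.get("session_id", ""))
    return jsonify({"answer": answer, "final_emotion": final_emotion})

if __name__ == "__main__":
    app.run(port=5000)
```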
DreamFace Technologies [44] has been developing Ryan as a socially assistive, bio-inspired humanoid robot designed to provide both companionship and cognitive stimulation for older adults. Ryan has an expressive, 3D animated face powered by rear-projection technology that enables the robot to show facial expressions and accurate visual speech (lip movement). Ryan's head and animated face sit atop a two-degree-of-freedom actuated neck that allows it to track its user and maintain eye contact for more personal interactions. A standard RGB webcam mounted in Ryan's head provides the visual input for the FER algorithm. Ryan's torso houses the remaining I/O, computation, and power components and provides embodiment, complete with passive arms that make it appear more human. Interaction with a fully embodied physical system such as Ryan can have benefits over a purely virtual, 2D avatar [66]. There are many studies that incorporate emotion into virtual agents [67], [68], [69], [70], [71], but in this study we focus on a physical robot. We investigated the differences between a virtual agent and a physical robot in our previous study [72]. A Kinect depth camera is embedded in the chest and provides sensing for body tracking. Given that Ryan is a conversational robot, it needs audio input and output, which is provided by a cardioid microphone and stereo speakers. These conversations are based on turn-taking, and indicator LEDs in the shoulders inform the user when it is their turn to speak. An adjustable touch screen display is also mounted on the torso and provides a convenient multimedia interface for Ryan to display images and videos and play the music that is integrated into the conversations.

Ten older adults (Age M=77.1 yrs, SD=9 yrs; 7 females; 9 Caucasian, 1 Hispanic) living in the independent living facility at Eaton Senior Communities in Lakewood, Colorado, participated in the study. See Table 1 for the participants' demographics. Inclusion criteria were: i) suspicion of early-stage Alzheimer's disease or related dementia (ADRD) by administrative staff in their residential facility and/or early-stage ADRD diagnosed by a qualified provider, ii) being 60+ years old at the time of the study, iii) having a Saint Louis University Mental Status (SLUMS) [73] score between 15 and 26, iv) verbal skills in English in order to interact with Ryan, v) presence of identifiable behavior difficulties (depression), and vi) availability for a period of three weeks to interact with Ryan. The SLUMS exam is an assessment tool for mild cognitive impairment and dementia and is commonly used in research on aging and in senior care facilities. Scores of 27 to 30 are considered normal in a person with a high school education, scores between 21 and 26 suggest a mild neurocognitive disorder, and scores of 20 or below indicate dementia. Prior to participating, subjects were fully briefed on the study design and consented to their involvement, with the proper Institutional Review Board (IRB) approvals for human-subjects research in place.

Participants interacted and conversed with Ryan twice a week over a period of three weeks (October 2018 to November 2018), for six sessions total. Figure 8 illustrates the experimental setup and an example of the user's interaction with Ryan during a session. Each session consisted of about 15 minutes of the prepared dialogues. In order to assess the impact of Ryan's use of empathy on the users' engagement and emotional state, we randomly assigned participants to two groups (G1 and G2). The first group interacted with a non-empathic version of Ryan that did not show any facial expressions or empathize with the users (Emotion-OFF), while the second group interacted with the fully empathic version of Ryan that mirrored the user's facial expressions and empathized with them throughout the conversation (Emotion-ON). The users were not aware of the different versions of Ryan. After three sessions, we switched the groups to interact with the other version of Ryan. This cross-over study design (illustrated in Table 2) makes analyzing the results meaningful, as all the subjects were exposed to both versions of Ryan and hence the only independent variable is Emotion (ON/OFF).

To measure users' engagement, we used the average number of words uttered by the user in each question and answer. Word count has been used as a measure of engagement for chatbots in the affective computing literature [74]. The outputs of the FER and sentiment systems were stored for analysis, and the percentage of positive facial expressions compared to negative expressions could determine the condition that the user enjoyed the most. To measure the impact of interacting with Ryan, every user was asked to rate their mood on a scale of 0 to 10 (on a face-scale) before and after each session.
Face-scale mood measurement has been used in the affective computing literature to assess participants' mood [75], [76]. At the end of the study, we interviewed the participants and asked them to complete an exit survey to measure the robot's likeability and empathy. The survey questions were based on [78]. We also interviewed the caregiver to get more information about the participants' well-being in the nursing home during the study.

To analyze the study, we used quantitative measures (word count, percentage of positive emotions detected from the participants, and pre/post-study depression measures) as well as qualitative measures (e.g., the likeability of Ryan) collected via an exit survey and post-study interviews with the subjects and the caregiver. The following sections describe the results in detail.

This section presents the quantitative analysis of the recorded data. We used a Linear Mixed-effects Model (LMM) in SPSS with either word count, emotional state (FER over time), or sentiment as the dependent variable, Emotion ON/OFF (empathic vs. non-empathic) as a fixed-effect factor, and session and subject as random-effect factors. Table 3 shows the results of running three separate LMMs on word count, emotional state, and sentiment values. Before fitting the model, we normalized the emotional state and sentiment values per session. This ensures that the data are not biased and that we only measure the effect of robot interaction and the condition (empathic vs. non-empathic) on the dependent variables. As reported in Table 3, Emotion ON/OFF has a significant effect on word count, where individuals who spent time with the empathic Ryan uttered more words than when they talked with the non-empathic Ryan. However, emotional state and the sentiment of users' responses were not significantly affected by the type of robot. We present more measurements and detailed quantitative analysis below.

Word count measurement: To measure how engaged the users were in conversations with Ryan, we recorded each conversation and converted it automatically to text using the Microsoft Speech Recognition SDK. The number of words in each utterance was then counted by the robot and stored in its database. As Table 3 shows, the Emotion feature (i.e., Emotion ON/OFF) has a significant effect on the number of words uttered by Ryan's users. The mean and standard deviation of the word count are M=4.11, STD=5.372 when Ryan empathizes with users, and drop to M=3.71, STD=3.350 when Ryan does not empathize with users.

Face-scale mood measurement: Before and after each session, we asked the users to tell us how they felt using a face-scale mood evaluation. The face-scale is a pictorial, nonverbal assessment designed to measure mood on a scale of 0-10, where a score of 10 is the most positive and a score of 0 is the most negative mood a person may feel. Previous evaluations suggest it is a valid method for assessing mood with little guidance required and useful for screenings [75], [76]. Figure 9 illustrates the difference in the users' face-scale scores before and after each session. A Wilcoxon signed-rank test [79] shows that there is a statistically significant difference (Z = −5.466, p < 0.001) between pre-session (Median = 7) and post-session (Median = 9) face-scale mood measurements, regardless of the empathic or non-empathic condition. This means interaction with Ryan is effective in improving the users' mood.
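The study's analyses were run in SPSS; the sketch below shows an equivalent analysis in Python with statsmodels and SciPy, assuming a long-format table with hypothetical column names (subject, session, emotion_on, word_count, pre_mood, post_mood). It is not the authors' analysis script, only an illustration of the two tests described above.

```python
import pandas as pd
from scipy.stats import wilcoxon
import statsmodels.formula.api as smf

# One row per utterance; file name and column names are hypothetical.
df = pd.read_csv("ryan_sessions.csv")

# Linear mixed-effects model: Emotion ON/OFF as a fixed effect,
# subject as the grouping factor, session as an additional variance component.
lmm = smf.mixedlm(
    "word_count ~ C(emotion_on)",
    df,
    groups=df["subject"],
    vc_formula={"session": "0 + C(session)"},
).fit()
print(lmm.summary())

# Wilcoxon signed-rank test on pre- vs post-session face-scale mood scores
# (one pair of scores per session).
mood = df.drop_duplicates(subset=["subject", "session"])
stat, p = wilcoxon(mood["pre_mood"], mood["post_mood"])
print(f"Wilcoxon: statistic={stat:.3f}, p={p:.4f}")
```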
Users' percentage of automatically recognized expressions: Using the automated FER method described in Section 3.1, participants' facial expressions were recorded during each conversational session with Ryan. As reported in Table 3, emotion does not have a significant effect on the measured valence values. To get a better sense of this effect, and since the duration of each session is constant across all subjects, we counted the number of times each participant had a positive, neutral, or negative expression throughout the session. Table 2 shows the participants reacted more positively when Ryan empathized with them. In particular, in the first three dialogue sessions, on average 25.7% of the detected facial expressions were positive for G1 with Ryan's emotion disabled, while this value is 45% for G2 with emotion enabled. Table 2 also shows that both G1 and G2 exhibit a lower percentage of positive expressions (4% less in G1 and 11.7% less in G2) when the emotion condition is OFF. A similar pattern can be seen for negative facial expressions, where the percentage of negative expressions goes down when users interact with the empathic Ryan. However, the magnitude of the difference may vary between G1 and G2 for several reasons. One reason is the order of the experiment: Emotion is off for one group for the first three sessions, while it is on for the other group, and then we switch it on for the first group and off for the second. Because the topics of conversation also varied from session to session, topic had an effect on the subjects' experience; namely, some participants expressed to the researchers a preference for certain topics over others.

In the following, we show examples of dialogues between Ryan and the study participants, as well as their measured facial expressions.

Session 3, Emotion ON
Ryan: "I am happy to be here with you [name is removed]. It is nice to see you again. As a reminder, my name is Ryan and I have a lot of fun things for us to talk about! Are you ready to get started?"
SN03: "Yes please." (FER: +0.07)
Ryan: "Great! Even though we have chatted before, I would still love to know how you are feeling! How has this experience been so far?"
SN03: "Very extraordinary I like it." (FER: +0.25, sentiment: +0.97, final sentiment (FER + sentiment): +0.61)

Session 5, Emotion OFF
Ryan: "I sure am feeling hungry now. Are you hungry?"
SN09: "You're making me hungry yes." (FER: -0.7)
Ryan: "What was it like for you to go on this culinary adventure today?"
SN09: "I had fun." (FER: -0.37, sentiment: +0.75, final sentiment (FER + sentiment): +0.19)

Change in users' depression: We used the Patient Health Questionnaire-9 (PHQ-9) [80] and the Geriatric Depression Scale (GDS) [81] to assess participants' depression levels pre- and post-study. The PHQ-9 is a widely used questionnaire to diagnose and measure the severity of symptoms of Major Depressive Disorder (MDD). It consists of questions that are answered on a scale of 0 (not at all) to 3 (nearly every day). Previous studies have indicated that the PHQ-9 is a consistent and valid measure of depression severity [80]. The GDS is a dichotomous "yes"/"no" evaluation tool commonly used to measure depression. While this scale has a long and a short form, the long form of 30 questions was used to obtain the most accurate and comprehensive results. This scale has been specifically tested and used extensively with older adults aged 65 and higher.
Data show that the GDS is reliable and promising in screening for depression in older adults [81]. The results of our study are given in Table 5. As the table shows, 7 out of 10 participants had an improvement of between 1 and 16 in their GDS depression score (the maximum score is 30) or between 1 and 6 in the PHQ-9 assessment (the maximum score is 27).

At the end of the study, we asked each participant to complete an exit survey. The survey contains 33 questions about their experiences with Ryan, covering the evaluation of Ryan's empathy and emotion, the evaluation of the interaction with Ryan, and the likeability of the conversation with Ryan and of the conversation topics. We also asked the users to give us feedback about any other aspects of the robot and the study. The majority of questions were based on a five-point Likert scale, where 1 means "Strongly Disagree" and 5 means "Strongly Agree", with an additional five yes/no questions. Table 4 reports the questions and the average scores. It also shows the score for each topic. The average score was above 4.00 on all the questions except "Q17: Talking with Ryan was like talking to a person", where the average score was 3.90 (STD = 1.37). Notably, participants gave an average score of 4.5 (STD = .67) on "Q4: I feel happier when I was in the company of Ryan." and 4.57 (STD = .49) on "Q10: How much do you agree that Ryan empathized with you". We specifically asked participants (Q2/Q9) whether they noticed a change in the way Ryan communicated with them and in its ability to show facial expressions after the session-three crossover, and 73% of them said they noticed the change.

In our exit interviews, we asked the participants to give us comments about the study and provide feedback on the experience they had with Ryan. The participants used the pronoun "she/her" to refer to Ryan, since Ryan had a female face and voice in this study. In the following, we report their comments:

SN01: "I had a good time. I enjoyed her very much. You want her to be a real thing like an addition to your home. I didn't think of her as a person like a dog or a cat."
SN02: "Ryan told me a lot of good things and I had a good time with her. She was very interesting and helpful."
SN03: "I liked her ("Ryan"). She is witty. At first, I didn't know what to think. I got better as I went. She sure has a pretty smile. It tears me up when she smiles, blinks her eyes. I would like to take her out to dinner but she wasn't hungry. Maybe next time."
SN04: "I liked her when she smiled. She interrupted me sometimes. Give me a chance to finish what I am saying. She was fun to talk to. I think the first one talked more I like with a smile. Very friendly." (Note: she is in G2, where Emotion was ON first and Ryan smiled and empathized.)
SN05: "She was sort of creepy looking a little bit but she was fine. I was surprised I enjoyed it! I like her when she smiled. When she wasn't smiling she was kind of crummy."
SN06: "They forgot the eyelashes. The only thing I had difficulty was the lights. Took getting used to it. I had so much fun in those meetings. Also, the thing was that when robot communicated and I paused, it would repeat itself."
SN08: "Ryan was very interesting and informative. When you first told me I was going to talk to a robot, I thought you were out of your mind but I really enjoyed it. She gave me ideas and information I had no ideas on."
SN09: "The longer I made an effort to communicate with Ryan the better it seemed to go. At a point, it became more natural to speak with the robot. She was cathartic."
SN10: "The robot asked a lot of questions and I didn't get to ask many questions. She looked really good. Her eyes blinked, her mouth moved. She smiled."

We asked the participants' caregiver (a staff member at Eaton Senior Communities) about her observations of the subjects' behavior and mood pre- and post-study. Although the caregiver's observations are anecdotal and represent only one person's views of the subjects, they are still worth reviewing, since the caregiver had seen the subjects pre- and post-study and could judge changes in their well-being as an outsider. She reported that the subjects who struggled with depression and social isolation benefited the most from interacting and conversing with Ryan. For instance, SN02, who struggled with depression and social isolation (e.g., not attending holiday activities and no longer taking meals in the dining room), smiled and laughed again post-study and re-engaged with the community. The caregiver also reported that participants kept talking to her about the variation in Ryan's facial expressions, particularly smiling, as a feature that positively affected their relationship with Ryan. She reported that improvement was quickly apparent not only in mood but also in cognition, as residents were exposed to research and educational opportunities and "stimulated human interaction."

The growth of the elderly population and the widespread understaffing across nursing homes can exacerbate feelings of loneliness in residents and overburden their nurses. During the COVID-19 pandemic, this issue became more evident [6]. The development of AI technologies has drawn attention to service robots and SAR as potential solutions to these problems. Robots may effectively relieve the burden on healthcare workers and improve the well-being of elderly individuals. Such robots need to be socio-emotionally intelligent in order to effectively engage the aging population.

In this paper, we discussed Ryan, a socially assistive robot, and its multimodal emotion recognition and multimodal emotion expression systems. More specifically, we compared two versions of the robot: one that uses a scripted dialogue that does not factor in the users' emotions and lacks facial expressions (the non-empathic version), and one with facial expressions that uses an affective dialogue manager to generate responses and has the capability to recognize users' emotions (the empathic version). We studied the differences and effects of Ryan's two versions with a cohort of older adults living in a senior care facility. The statistical analysis of the users' face-scale mood measurement (illustrated in Figure 9) indicates an overall positive effect as a result of the interaction with Ryan, irrespective of the robot being empathic or non-empathic. However, the word count measurement (Table 3) and the exit survey analyses (Table 4) suggest that the empathic Ryan is perceived as more engaging and likable. Considering that the only difference between Ryan's two versions is empathic versus non-empathic, the findings suggest that empathy can encourage users to have longer conversations. Nonetheless, more experiments are needed to further study interactions using a more natural dialogue manager (chatbot). The changes in users' depression measurement scores (Table 5) suggest that Ryan can potentially decrease users' depression, although more subjects and longer-term studies are required to verify this finding.
Although the study's results are positive and encouraging, our work has several limitations. Addressing these limitations in the future can improve Ryan and the effectiveness of similar HRI systems. When it comes to Ryan's perception and sensory input, acoustic signals and other modalities such as eye movement, gaze, head and body gesture, posture, and even breathing rhythm can be used to determine users' emotional state. Currently, Ryan does not utilize these sensory inputs, and adding these features would make the recognition of users' emotional state and intention more accurate and reliable. The other limitations of our study include the small sample size, the imbalanced participant demographics, and the limited number of sessions. Automatic speech recognition (ASR) is not specifically designed for older adults; their slow pace of talking and long pauses are interpreted as the end of a sentence. This issue caused the robot to interrupt participants, which had a negative effect on their perception of it. Open-domain dialogue is still an open question in computer science and consequently was the area that proved to have the most limitations in our study. While rule-based chatbots will never be perfect, our system still has room to grow in terms of the size of our knowledge base and our pattern-matching rules. Finally, in this study, Ryan only mirrors the user's facial expression; in the future, the conversation and the context could drive the expression on Ryan's face.

References
Defining socially assistive robotics
Social isolation in covid-19: The impact of loneliness
Loneliness among elderly in nursing homes
Social companion robots to reduce isolation: a perception change due to covid-19
How do older adults experience and perceive socially assistive robots in aged care: a systematic review of qualitative evidence
Shortages of staff in nursing homes during the covid-19 pandemic: What are the driving factors?
A social robot intervention on depression, loneliness, and quality of life for taiwanese older adults in long-term care
Social Activities in Community Settings: Impact of COVID-19 and Technology Solutions
Robotic transformative service research: deploying social robots for consumer well-being during covid-19 and beyond
The grand challenges in socially assistive robotics
The empathic companion: A character-based interface that addresses users' affective states
A reinforcement learning based cognitive empathy framework for social robots
Heart of the machine: Our future in a world of artificial emotional intelligence
The effectiveness of social robots for older adults: a systematic review and meta-analysis of randomized controlled studies
A platform for human-robot dialog systems research
Affective computing
Pathways connecting late-life depression and dementia
A pilot study on using an intelligent life-like robot as a companion for elderly individuals with dementia and depression
Emotional intelligence: Implications for personal, social, academic, and workplace success
Intelligent expressions of emotions
Designing emotionally sentient agents
An emotional model for a guide robot
Empathy: Its ultimate and proximate bases
Studying effects of incorporating automated affect perception with spoken dialog in social robots
Learning by feeling: Evoking empathy with synthetic characters
The influence of empathy in human-robot relations
Empathic robot for group learning: A field study
Emotion recognition through multiple modalities: face, body gesture, speech (in Affect and Emotion in Human-Computer Interaction)
Emotion recognition for human-robot interaction: recent advances and future perspectives
Effects of robot assisted activity to elderly people who stay at a health service facility for the aged (IEEE/RSJ International Conference on)
The utilization of robotic pets in dementia care
Delivering cognitive behavioral therapy using a conversational social robot
Assistive robotic technology to combat social isolation in acute hospital settings
Emotional intelligence: Implications for personal, social, academic, and workplace success
Affective multimodal human-computer interaction
Multimodal emotion recognition
Sentiment adaptive end-to-end dialog systems
Automatic statistical analysis of the signal and prosodic signs of emotion in speech
A categorical approach to affective gesture recognition
Understanding emotions in others: mirror neuron dysfunction in children with autism spectrum disorders
Facial mimicry and emotional contagion to dynamic emotional facial expressions and their influence on decoding accuracy
Pepper robot
Social robotics
Socibot platform of engineeredarts
Pilot study on improvement of quality of life among elderly using a pet-type robot
Social function and cognitive status: Results from a us nationally representative survey of older adults
The association between social support and cognitive function in mexican adults aged 50 and older
Models of emotion: The affective neuroscience approach
A circumplex model of affect
The Facial Action Coding System: A Technique for the Measurement of Facial Movement
Emotion analysis in human-robot interaction
Emotion modelling for social robotics applications: a review
Robust real-time face detection
Going deeper with convolutions
Affectnet: A database for facial expression, valence, and arousal computing in the wild
The Stanford CoreNLP natural language processing toolkit
Program-y
Emotion detection in dialog systems: Applications, strategies and challenges
Real-time speech emotion and sentiment recognition for interactive dialogue systems
Speech emotion recognition using hidden markov models
The elements of aiml style
The anatomy of alice
Restful api
Embodiment in socially interactive robots
Basic: A believable, adaptable socially intelligent character for social presence
Making them remember: emotional virtual characters with memory
Building autonomous sensitive artificial listeners
Simsensei kiosk: A virtual human interviewer for healthcare decision support
Dynamic emotional language adaptation in multiparty interactions with agents
Role of embodiment and presence in human perception of robots' facial cues
The saint louis university mental status (slums) examination for detecting mild cognitive impairment and dementia is more sensitive than the mini-mental status examination (mmse): a pilot study
Real conversations with artificial intelligence: A comparison between human-human online conversations and human-chatbot conversations
The face scale: a brief, nonverbal method for assessing patient mood
A Pilot Study on the eBear Socially Assistive Robot: Implication for Interacting with Elderly People with Moderate Depression
The emote project
Measuring individual differences in empathy: Evidence for a multidimensional approach
Wilcoxon signed-rank test
The phq-9: validity of a brief depression severity measure

Hojjat Abdollahi is the VP of Engineering at DreamFace Technologies, LLC, and a Ph.D. student of Electrical and Computer Engineering at the University of Denver, USA. During his Ph.D. program, Hojjat has focused on developing and studying a social robot called Ryan the Companionbot.

Mohammad H. Mahoor is a Professor of Electrical and Computer Engineering at DU and the founder of DreamFace Technologies, LLC, a start-up robotics company that aims at developing and commercializing the Ryan companionbot for assisting older adults with depression and dementia. He does research in the areas of computer vision, affective computing, and human-robot interaction (HRI).

Acknowledgments: Research reported in this manuscript was supported by the National Institute on Aging of the National Institutes of Health under award number R44AG059483 to DreamFace Technologies, LLC. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The authors would like to thank Chandler Yunker and Gabriela Nordman for their contribution and help with dialogue writing.