key: cord-0154244-wvhjky1v authors: Ishii, Etsuko; Winata, Genta Indra; Cahyawijaya, Samuel; Lala, Divesh; Kawahara, Tatsuya; Fung, Pascale title: ERICA: An Empathetic Android Companion for Covid-19 Quarantine date: 2021-06-04 journal: nan DOI: nan sha: 92ede3a18f0f710b5342eac3371758029da656d1 doc_id: 154244 cord_uid: wvhjky1v

Over the past year, research in various domains, including Natural Language Processing (NLP), has been accelerated to fight against the COVID-19 pandemic, yet such research has only just started on dialogue systems. In this paper, we introduce an end-to-end dialogue system which aims to ease the isolation of people under self-quarantine. We conduct a controlled simulation experiment to assess the effect of the user interface, comparing a web-based virtual agent called Nora with the android ERICA accessed via a video call. The experimental results show that the android offers a more valuable user experience by giving the impression of being more empathetic and engaged in the conversation, owing to its nonverbal information such as facial expressions and body gestures.

To combat the COVID-19 pandemic, lockdowns have been imposed around the world, leading many to experience social isolation. Many people have also undergone weeks of mandatory self-quarantine after crossing a border or having close contact with a patient. The resulting social loneliness can affect people's mental state, and mental health support for those under isolation has been recommended (Choi et al., 2020; Zhao et al., 2020). For more than half a century, dialogue systems have played the role of therapist, psychologist, or counselor (Vaidyam et al., 2019), and many were designed to help people with a specific concern (Rizzo et al., 2011; DeVault et al., 2014). Hence, dialogue systems have a role in helping curb the effects of social isolation arising from the pandemic.
To meet the emerging needs arising from the pandemic, we extend the idea of Nora, an empathetic dialogue system which mimics a conversation with a psychologist (Winata et al., 2017, 2021), to provide mental support specifically for people under self-quarantine, and we install her dialogue system into the autonomous android ERICA (Glas et al., 2016). We utilize ERICA's nonverbal features, which are not offered by Nora, to improve the user interface (UI), because it is well accepted that the nonverbal behavior of clinicians and therapists affects patient outcomes (Foley and Gentile, 2010; Beck et al., 2002). During the conversation session, our system asks a set of questions to screen for stress and depression as well as health conditions such as body temperature or shortness of breath. We conduct a comparative study of the web-based virtual agent Nora and the android ERICA, and we design a dialogue flow specifically for quarantined users based on Nora's graphical UI. The experimental results show that nonverbal information enhances the quality of the user experience during the session by giving users the impression that they are being empathized with and listened to. This suggests the importance of the design of nonverbal behavior in dialogue agents, especially for those in the mental health care domain.

Here we describe the end-to-end system for the Nora web-based virtual agent and the android ERICA, whose architectures are depicted in Figure 1. The dialogue manager consists of three submodules: language understanding, response generation, and facial expression prediction. The language understanding module detects the user's intent and slot entities. The response generation module then generates an appropriate response sentence according to the information from language understanding and empathy analysis.
The system utterance produced by the response generation module is then passed to the facial expression prediction module, which decides the appropriate facial expression to show. The facial expression is categorized into six distinct classes: happiness, sadness, anger, surprise, laughter, and neutral.

We design a dialogue flow that focuses on a conversation with users in quarantine. As shown in Figure 2, Nora's conversation is divided into two kinds of session, the first-day session and the daily session. In the first-day session, the agent introduces the session and asks about the user's profession. In the daily sessions, the agent asks about the user's mood and continues with a temperature and shortness-of-breath check. Afterward, the agent asks questions about gratitude and then recommends that the user enjoy activities such as yoga, exercise, and meditation. At the end of each activity, the agent asks a follow-up question about how the user feels about the activity. When ending the conversation, the agent says goodbye and reminds the user to wash their hands and wear a mask.

The empathy analysis module contains three submodules to understand the user's mood: stress detection, sentiment analysis, and emotion recognition from text and audio (Winata et al., 2017). The dialogue manager uses their outputs to respond appropriately without causing the user discomfort. We compute stress, sentiment, and emotion scores on every user turn, and use them to identify whether the user has an extreme psychological condition. We also use the scores to track the user's mood every day and provide suggestions to the user for improving their mental well-being.

The Nora virtual agent has a web interface, as shown in Figure 1(a), that accepts speech input. Users can see their input and responses in text as well as the automatic speech recognition (ASR) results of their utterances.
To improve the interaction, the virtual agent provides sound effects to signal to the user when the system starts and stops listening. To make the conversation more natural, Nora uses a text-to-speech (TTS) module to generate a speech response.

ERICA is a super-realistic female humanoid developed as a conversational agent to play various roles (Glas et al., 2016). Her facial expressions are controlled by the facial expression predictor inside the dialogue manager. We develop a mapping from each emotion category to ERICA's actual facial movement and execute it during her utterance, with examples shown in Figures 3(a) and (c). During the user turn, ERICA adopts the default (neutral) face. She also has a lip-motion generation module which is directly driven by the speech signals obtained from the TTS module.

We implement nonverbal behaviors which are triggered based on the turn: body gestures during the system turn, and nodding during the user turn. Body gestures, which mainly move her right hand, are intended to show openness to users during ERICA's utterance, as shown in Figures 3(d) and (e). We design four versatile movements and play one of them at random during ERICA's utterance. During the user turn, ERICA nods to play the role of an active listener until 2.0 s of user silence is detected.

To enhance the naturalness of ERICA's behavior during the conversation, a random gazing model is also introduced. ERICA normally performs speaker tracking using Kinect (Inoue et al., 2016, 2020), but since the participant in our case is not on-site, we model gazing behavior as uniform random sampling of a gaze point near the webcam. The gaze point is changed randomly within a hollow cylinder centered on the webcam with an outer radius of 0.3 m, an inner radius of 0.05 m, and a width of 0.2 m. The gaze change decision is taken every 1.5 s.
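The random gazing model described above can be sketched as follows. This is a minimal illustration under stated assumptions, not ERICA's actual controller: we assume the cylinder axis passes through the webcam along a local z axis, that the 0.2 m width is the axial extent centered on the webcam, and that a new gaze point is drawn at every 1.5 s decision step. The function names `sample_gaze_point` and `gaze_schedule` are hypothetical.

```python
import math
import random

# Parameters taken from the text
OUTER_RADIUS_M = 0.3
INNER_RADIUS_M = 0.05
WIDTH_M = 0.2
GAZE_UPDATE_PERIOD_S = 1.5


def sample_gaze_point(rng, center=(0.0, 0.0, 0.0)):
    """Draw one gaze target uniformly from a hollow cylinder around `center`.

    Assumption: the cylinder axis is the local z axis through the webcam.
    """
    # Sampling r^2 uniformly (then taking the square root) gives points that
    # are uniform in area over the annulus between the inner and outer radii.
    r = math.sqrt(rng.uniform(INNER_RADIUS_M ** 2, OUTER_RADIUS_M ** 2))
    theta = rng.uniform(0.0, 2.0 * math.pi)
    # Assumption: the width is split evenly in front of and behind the webcam.
    depth = rng.uniform(-WIDTH_M / 2.0, WIDTH_M / 2.0)
    cx, cy, cz = center
    return (cx + r * math.cos(theta), cy + r * math.sin(theta), cz + depth)


def gaze_schedule(duration_s, rng):
    """Return (time, point) pairs with one gaze decision every 1.5 s.

    Assumption: every decision step selects a fresh gaze point.
    """
    steps = int(duration_s // GAZE_UPDATE_PERIOD_S) + 1
    return [(i * GAZE_UPDATE_PERIOD_S, sample_gaze_point(rng))
            for i in range(steps)]


if __name__ == "__main__":
    rng = random.Random(0)
    for t, (x, y, z) in gaze_schedule(6.0, rng):
        print(f"t={t:4.1f}s  gaze=({x:+.3f}, {y:+.3f}, {z:+.3f}) m")
```

The inner radius keeps the gaze from landing exactly on the webcam, so the android never appears to stare fixedly at the remote participant, while the outer radius keeps the gaze close enough to read as attention toward the camera.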
We conducted a comparative evaluation to see how nonverbal information such as facial expressions and body gestures affects the user experience by asking volunteers to participate in a session with each of the Nora virtual agent and ERICA. We conducted a simulation of counseling and recruited 19 participants who are fluent in English. In the experiment, a participant first accessed the web interface through their web browser and reached the dialogue session page shown in Figure 1(a) to have a session with Nora. Then, using a video conference tool, they talked with ERICA just as they would in a usual video call. After the two sessions, we asked participants to evaluate the two systems by choosing which agent they preferred on four different criteria based on their experience during the conversation. Participants were also asked to give an additional comment describing the reason for their choice on each criterion.

In Table 1, we summarize the experimental results. Overall, ERICA is only slightly preferred (52.6%) over the Nora virtual agent (47.4%) due to its system drawbacks, even though it is perceived to be more attentive and empathetic.

Q1: Overall Experience is comparable for several reasons. Although ERICA is regarded as more empathetic and engaged in conversations, users reported that they had a poorer experience, mainly because of the delay in ERICA's response. Moreover, some participants pointed out that the virtual agent is preferable since calling ERICA every day might be troublesome.

Q2: Empathy shows that ERICA is perceived as significantly more empathetic thanks to its facial expressions and gestures. Some participants reported that ERICA's gestures reflected their emotions and thus that ERICA was being empathetic, even though her gestures are independent of their emotions.

Q3: Attentiveness shows that ERICA is perceived to be significantly more attentive to users because of her nodding, facial expressions, and gestures that mimic human listening behaviors to some extent.
Most of the participants agreed that the feedback from ERICA during the user turn, namely nodding, reduced their anxiety about not being understood.

Q4: User Friendliness measures technical or psychological difficulties. The majority of the participants reported that ASR accuracy and response time are the drawbacks of ERICA, while some preferred ERICA as she is more human-like and easier to talk to. To enhance user friendliness, further investigation is needed into handling the additional environmental noise in the video call.

One of the major challenges in dialogue systems is how to incorporate empathy, and several papers have explored approaches for end-to-end chatbots (Lin et al., 2020; Ma et al., 2020). Empathetic dialogue systems are attracting more interest in the field of psychiatry as well (Vaidyam et al., 2019), especially those equipped with nonverbal features (DeVault et al., 2014; Rizzo et al., 2011). In addition, Inoue et al. (2016, 2020) utilized ERICA's nonverbal features to make her more empathetic in more generic situations.

In this paper, we described the implementation of the Nora dialogue system and its application in the android ERICA. A comparison of ERICA against Nora shows that the facial expressions and body gestures of ERICA give a better impression of attentiveness and empathy, even though ERICA has technical drawbacks such as delayed response and worse ASR quality than Nora. These results suggest that nonverbal communication is as crucial for machine-to-human conversation as for human-to-human conversation, and that special care is needed to design the nonverbal behaviors of empathetic dialogue systems.
Physician-patient communication in the primary care office: a systematic review
Depression and anxiety in Hong Kong during COVID-19
SimSensei Kiosk: A virtual human interviewer for healthcare decision support
Nonverbal communication in psychotherapy
ERICA: The ERATO intelligent conversational android
An attentive listening system with android ERICA: Comparison of autonomous and WOZ interactions
Talking with ERICA, an autonomous android
CAiRE: An end-to-end empathetic chatbot
A survey on empathetic dialogue systems
SimCoach: an intelligent virtual human system for providing healthcare information and support
Chatbots and conversational agents in mental health: A review of the psychiatric landscape
Nora the empathetic psychologist
Farhad Bin Siddique, Yongsheng Yang, and Pascale Fung. 2021. Nora: The well-being coach
Social distancing compliance under COVID-19 pandemic and mental health impacts: A population-based study