key: cord-0452377-o1vms1pf
authors: Ali, Mohammad Rafayet; Razavi, Zahra; Mamun, Abdullah Al; Langevin, Raina; Rawassizadeh, Reza; Schubert, Lenhart; Hoque, M Ehsan
title: A Virtual Conversational Agent for Teens with Autism: Experimental Results and Design Lessons
date: 2018-11-07
journal: nan
DOI: nan
sha: a2428eb9531abfbb463934163b68ac7de223bf91
doc_id: 452377
cord_uid: o1vms1pf

We present the design of an online social skills development interface for teenagers with autism spectrum disorder (ASD), who often lack access to social skills training. The interface is intended to enable private conversation practice anywhere, anytime using a browser. Users converse informally with an on-screen persona, receiving real-time feedback on nonverbal cues as well as summary feedback. The prototype was developed in consultation with an expert UX designer, two psychologists, and a pediatrician. Using the data from 47 individuals, feedback and dialogue generation were automated using a hidden Markov model and a schema-driven dialogue manager capable of handling multi-topic conversations. We conducted a study with nine high-functioning ASD teenagers, and through thematic analysis of post-experiment interviews, identified several key design considerations, notably: 1) Users should be fully briefed at the outset about the purpose and limitations of the system, to avoid unrealistic expectations. 2) An interface should incorporate positive acknowledgment of behavior change. 3) Realistic appearance of a virtual agent and responsiveness are important in engaging users. 4) Conversation personalization, for instance in prompting laconic users for more input and reciprocal questions, would help the teenagers engage for longer terms and increase the system's utility.

Autism spectrum disorder (ASD) is a developmental disorder which affects one in 59 individuals in the US alone [55]. Almost all individuals with ASD show deficits in nonverbal communication [19]. They often fail to use appropriate gestures, eye contact, and smiles to make their verbal communication compelling. Thus they often face difficulty in expressing their feelings and act out their frustrations through vocal outbursts [34], [51], [28]. Current practices for treating the communication skills deficit involve therapy sessions with behavior experts. There is a significant shortage of behavioral experts, which leaves therapy time limited and often inaccessible. A computer-mediated social skills intervention has the potential to enable individuals with ASD to practice conversation frequently with a standardized and repeatable stimulus. Additionally, computer-mediated tools can enable a therapist to serve more individuals by monitoring their progress remotely while continuing their weekly face-to-face therapy.

Although computers have many advantages, social skills intervention for individuals with ASD is very challenging. Dr. Stephen Shore, an autistic professor of special education, famously said, "If you've met one person with autism, you've met one person with autism". This indicates that a single solution does not exist, and any intervention we design needs to be customized for individual needs. Thus, we need the active participation of the ASD individuals in the design process, so that the design will be appropriate both in typical cases and in terms of individual variation. Because of the deficit in communication skills, a participatory design is often hard to achieve.

In this paper, we review the design of an online social skills development interface, LISSA (Live Interactive Social Skills Assistance); we focus on design lessons learned, based in part on additional trials with autistic teens, and particularly on analysis of post-session interviews with participants. LISSA features a virtual agent capable of engaging the user in a multi-topic conversation and giving real-time feedback on nonverbal cues by analyzing the spoken language and the facial expressions of the user in real time. In addition, LISSA provides summary feedback after the conversation. The motivating idea is that users can interact with LISSA repeatedly, and potentially learn and observe their skills improvement from the summary feedback in multiple conversation sessions, held in a private and safe environment. The design of the interface was guided by an expert UX designer, psychologists, and a pediatrician. The initial design of the interface was evaluated in a Wizard of Oz setting through a speed-dating study with 47 college students [1]. Participants demonstrated improvements in several areas of nonverbal communication using the interface, including eye contact, head nods, and smile. The feedback system and the dialogue manager were automated using the data collected through the study.

Subsequently, we began to focus on adapting the system design to teenagers with ASD. We decided to conduct an initial study that would inform further development of a system that could be evaluated in controlled intervention experiments, and subsequently deployed broadly. The initial study, whose lessons we explore here, involved nine teenagers with ASD. Figure 1 shows two teenagers with ASD interacting with LISSA in the lab. LISSA can be accessed through a laptop or computer which has Internet access and a webcam/mic. With the help of professionals in developmental and behavioral pediatrics, we recruited teens with ASD for preliminary interaction with LISSA and interviewed them about the experience after their interaction. Through a thematic analysis of the interview transcripts we then identified several key design guidelines, including the following: 1) Users should be fully briefed about the purpose of an interface, and its capabilities and limitations. 2) An interface should incorporate positive acknowledgment of behavior change. 3) Realistic appearance of a virtual agent and responsiveness are significant factors in engaging users. 4) Conversation personalization, e.g., user-sensitive turn-taking and prompting laconic users, would help the teenagers engage for longer terms, and thus have them benefit from the interaction.

Our paper makes the following contributions: 1) We motivate and explain the design of the LISSA system, with emphasis on its adaptation towards helping teens with ASD to improve their conversational skills. 2) We describe the automation of the system that makes it suitable for private, ubiquitous use. 3) Based on our experience with ASD subjects, including transcribed post-session interviews, we discuss key design guidelines that we identify as important for future interface design for individuals with ASD.

Individuals with autism spectrum disorder (ASD) are characterized in varying degrees by difficulties in social interaction, difficulties in verbal and nonverbal communication, and repetitive behaviors [11]. High-functioning ASD individuals are those diagnosed with autism but functioning cognitively at a relatively high level (e.g., IQ greater than 70) [47], [10]. However, high-functioning ASD individuals may still demonstrate deficits in communication, emotion recognition, and social interaction [47]. The existing treatments for high-functioning ASD do not address the condition as a whole; rather, they focus on individual symptoms. Thus there is no single intervention for such individuals. In this section, we discuss the existing computer-based social skills interventions for individuals with ASD.

Hourcade et al. [25] developed tablet apps for social skills development intervention for children with ASD. The authors conducted a randomized control study [26] with eight children with ASD to find empirical evidence of the effectiveness of the tablet apps. They used four apps from their Open Autism Software [25] and found that, when using the app, children spoke more, had more verbal interactions, and demonstrated physical engagement with the app activities. Gal et al. [18] developed a tabletop application as an intervention for improving social skills among children with high functioning ASD. The application allows two to four children to collaborate and create a narrative of a story for a given scenario. A study with 14 high functioning ASD children over a 3-week time period revealed that children were more likely to initiate positive social interaction, and demonstrate collaborative behavior after the intervention. Piper et al. [43] also developed a shared interface on tabletop known as SIDES to improve social skills among ASD individuals. The authors showed that their intervention tool enabled a middle school therapy class to become more effective in group work. Boyd et al. [8] designed SayWAT, a wearable assistive technology for adults with autism, which provides real-time feedback on prosody during face-to-face interaction. They demonstrated that their tool could detect atypical prosody and deliver feedback in real time without disruption to the conversation. Hayes et al. [21] have provided three prototype systems addressing the design challenges with the use of large group displays, mobile personal devices, and personal recording technologies. Through a qualitative study with 13 children, they presented design guidance for visual support. Benton et al. [4] presented a methodology for incorporating children with ASD in the design process. 
They conducted a study with 20 participants with ASD aged between 11 and 14 and, using their design methodology, came up with ten design guidelines specific to game design and idea generation. Madsen et al. [32] conducted a study with seven teenagers with ASD to understand the design concepts needed for developing interfaces for the particular target group. They came up with three design considerations: adaptation of the form factor, customizability of the graphical user interface, and adaptation of user experience. Kamaruzaman et al. [27] developed a touchscreen-assistive learning numeracy app, known as TaLNA, for children with ASD. The design was driven by participatory design guidelines.

Nojavanasghari et al. [42] designed a virtual agent-based system to mediate human-to-human interaction for children with autism. The aim of their work was to design a tool to help improve the social skills of children with autism by providing visual support for the children and real-time feedback to the interactor about the children's affective states, using a recommendation system. Foster et al. [17] designed ECHOES as a multimodal learning environment intended for children with ASD. The system features a virtual agent that engages a child in a collaborative learning activity and provides feedback based on sensed features including gaze direction and gesture. In a subsequent work with ECHOES, Bernardini et al. [5] designed a virtual agent as a credible social partner for children with ASD that engages them in interactive learning activities. Milne et al. [36] designed a virtual agent as an educational tool for children with ASD, which helps improve their conversational skills and ability to deal with bullying. In a study with ten participants, the authors showed that participants who received the intervention had gained higher conversational skills and more knowledge about bullying than the control participants. Mower et al. [41] developed Rachel, an embodied conversational agent designed to elicit and analyze naturalistic interactions. This tool was designed for children with autism to encourage their affective and social behavior. Boujarwah et al. [7] presented a tool to enable non-expert humans to generate conversational scenarios, which can be used to teach children with ASD appropriate behaviors in different social scenarios. DeVault et al. [14] developed SimSensei, a virtual agent in the context of a healthcare decision support system. The goal of this system differs from that of our work, as it aims to identify psychological distress indicators through a conversation with a patient in which the patient feels comfortable sharing information.
This system has both nonverbal sensing and a dialogue manager. The dialogue manager uses four classifiers to categorize the users' speech, and hence to generate a relevant response. Hopkins et al. [24] developed a computer-based social skills training system for children with ASD. In a study with 49 individuals, the authors demonstrated that the program helped the participants improve their emotion recognition ability and social interactions. Razavi et al. [44] developed a conversational agent capable of conversing with teenagers with ASD. The authors employed a script-like schema [45] for guiding the dialogues, and generated appropriate responses using hierarchical pattern transductions. Tartaro et al. [54] developed an authorable virtual peer for children with ASD. The virtual peer is capable of interacting with the children, sharing real toys, and responding to the children's input.

Although interaction with conversational virtual agents proved to be effective in various applications, most state-of-the-art virtual agents are weak in terms of language understanding. Smartphone-based conversational agents can be impressive in their question-answering tasks but have been found to provide incomplete and inconsistent responses in areas like mental health, interpersonal violence, and physical health [37]. Woebot [16], a chat-bot recently developed at Stanford for delivering CBT to young adults with depression and anxiety, was able to significantly reduce symptoms of depression; however, its inability to hold a natural conversation was reported as the main issue users complained about.

Popular methods in dialogue management include frame-based methods such as [16] and [46], where the user's responses are used to fill frame slots and to perform a task according to the contents of the frames. Another method, employed in SimCoach [40] and ASST [53], controls dialogue via transitions between predetermined states based on the user's input. While this method shows good performance in limited dialogues, scalability remains an issue for more open-ended conversation. Some other end-to-end systems achieve meaningful single-turn responses through data-driven methods (e.g., [49] and [57]), but cannot conduct a coherent longer dialogue; also, data-driven methods are susceptible to bias and privacy issues, among others [23].

Taken together, prior studies in computer-based social skills training and dialogue management suggest that dialogue-based systems hold considerable promise for helping individuals with ASD (or certain others with special needs). In our work, we have focused on improving users' nonverbal behavior in communication through real-time feedback. Through participatory studies and post-session interviews, we have identified significant design guidelines for such virtual agent-based interventions.

In this section, we outline the process of system design and implementation. We first describe the LISSA interface and then discuss the initial data collection process implemented with a Wizard-of-Oz prototype. Then we describe the machine learning techniques and dialogue management approach used to automate the system. The automation of the system, and its adaptation to the version aimed at ASD teens, prepared the way for the studies on which we focus here.

Our design team includes two psychologists, a psychiatrist, and a user experience designer. The online interface consists of two major parts: a conversation interface and a post-conversation feedback interface. The LISSA interface features a virtual agent (see Figure 2), able to hold conversations with users while simultaneously providing real-time feedback to them on their nonverbal behavior. The feedback is presented through four flashing icons placed at the bottom of the interface. The four icons represent eye contact, smile, speaking volume, and body movement. The icons are green by default but turn to flashing red as a prompt to the user to adjust their behavior. We kept the simple red-green color scheme to reduce the cognitive load on the user, considering the fact that real-time feedback can be distracting and difficult to interpret [22]. The four nonverbal cues were selected because of their known applicability in improving social skills [50].

The post-conversation feedback interface (see Figure 3 ) shows a summary of the feedback provided during the conversation. This interface shows how many times the user received feedback (Reminder), how long the user kept the icons green (Best Streak), and how much time the user took to adjust nonverbal behavior (Response Lag).
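The three summary quantities can be illustrated with a small sketch. It assumes, purely for illustration, that each icon's state is sampled at a fixed rate into a list of booleans (True while the icon flashes red), that Reminder counts green-to-red switches, that Best Streak is the longest green run, and that Response Lag is the mean length of a red episode. The paper does not give exact definitions, so the names `FeedbackSummary` and `summarize` and these formulas are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FeedbackSummary:
    reminders: int       # times the icon switched from green to red
    best_streak: int     # longest run of consecutive green samples
    response_lag: float  # mean length of a red episode, in samples


def summarize(red: List[bool]) -> FeedbackSummary:
    """Summarize one icon's timeline; red[i] is True while the icon is red."""
    reminders = 0
    best_streak = 0
    current_green = 0
    current_red = 0
    red_lengths: List[int] = []
    for is_red in red:
        if is_red:
            if current_red == 0:
                reminders += 1  # a new red episode starts here
            current_red += 1
            current_green = 0
        else:
            if current_red > 0:
                red_lengths.append(current_red)  # the user adjusted
                current_red = 0
            current_green += 1
            best_streak = max(best_streak, current_green)
    if current_red > 0:
        red_lengths.append(current_red)  # episode still open at the end
    lag = sum(red_lengths) / len(red_lengths) if red_lengths else 0.0
    return FeedbackSummary(reminders, best_streak, lag)
```

With one sample per second, `summarize([False, False, True, True, False, False, False, True, False])` reports two reminders, a best streak of three seconds, and a response lag of 1.5 seconds.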

We first developed a Wizard of Oz prototype of the interface. There were two human operators for driving the system. One was responsible for the dialogue management and the other was responsible for giving feedback by flashing icons. Both operators were able to monitor the user remotely and control the interface through a web-based interface. This design allowed us to collect data from users and learn about their user experience without as yet applying machine learning techniques. Subsequently, we used the collected data to train machine learning models for automation.

To collect data and assess the viability of the interface we conducted a randomized control study with 47 college students and 8 female research assistants in the context of speed-dating. Speed-dating is gaining increasing popularity among researchers as a tool for studying social and communication skills [15] . The study results revealed that participants who used the LISSA interface improved their eye-contact and head nods [1] . From this study, we collected 46 videos of 23 individuals interacting with LISSA. The videos were captured through a camera attached on top of the computer monitor which displayed the LISSA program. We then employed six undergraduate research assistants majoring in Psychology to label the collected videos. The psychologists in our team provided multiple training sessions to the research assistants before the start of the labeling work. The research assistants watched the video recordings and marked those moments where the participants should receive feedback (i.e., red icon flash). We only considered those moments for feedback labels where more than two research assistants marked it as a feedback moment.
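The agreement rule for feedback labels (keep a moment only if more than two annotators marked it) can be sketched as a simple vote count. The sketch assumes that marks have already been discretized into integer time bins, for example whole seconds; the binning, the function name `consensus_moments`, and the annotator identifiers are illustrative assumptions rather than details from the study.

```python
from collections import Counter
from typing import Dict, Iterable, List


def consensus_moments(marks_by_annotator: Dict[str, Iterable[int]],
                      min_exclusive: int = 2) -> List[int]:
    """Return time bins marked by more than `min_exclusive` annotators."""
    votes: Counter = Counter()
    for marks in marks_by_annotator.values():
        votes.update(set(marks))  # each annotator counts once per time bin
    return sorted(t for t, n in votes.items() if n > min_exclusive)


# Hypothetical marks (in seconds) from four annotators for one video.
example = {
    "ra1": [3, 10, 10],
    "ra2": [3, 10],
    "ra3": [3, 25],
    "ra4": [25],
}
labels = consensus_moments(example)
```

Here only second 3 is kept, because it is the only bin with three votes; seconds 10 and 25 each have two votes and are dropped.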

We extracted facial and prosodic features from the recorded videos, including head pose, smile, facial action units, volume, and voice pitch. For this, we used off-the-shelf software tools, namely OpenFace [3], Praat [6], and SHORE [59]. We then trained a hidden Markov model (HMM) to generate the flashing-icon feedback from the facial and prosodic features. In the past, HMMs have proven successful in modeling human behaviors and actions [58], [13]. Figure 4 shows the training and feedback generation technique.
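To make the HMM step concrete, the following is a minimal decoding sketch, not the trained model from the study. It assumes two hidden states (`ok` and `flash`), a single discretized observation per frame (`on` or `off` for eye contact), and hand-picked probabilities; all of these are illustrative assumptions. Given such parameters, Viterbi decoding recovers the most likely sequence of feedback states.

```python
import math
from typing import Dict, List, Sequence


def viterbi(obs: Sequence[str],
            states: Sequence[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]]) -> List[str]:
    """Most likely hidden-state path for `obs`, computed in log space."""
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
             for s in states}
    back: List[Dict[str, str]] = []
    for o in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            # Best predecessor for state s at this time step.
            best = max(states, key=lambda p: prev[p] + math.log(trans_p[p][s]))
            score[s] = (prev[best] + math.log(trans_p[best][s])
                        + math.log(emit_p[s][o]))
            pointers[s] = best
        back.append(pointers)
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical parameters, chosen by hand for illustration only.
STATES = ["ok", "flash"]
START = {"ok": 0.8, "flash": 0.2}
TRANS = {"ok": {"ok": 0.8, "flash": 0.2},
         "flash": {"ok": 0.3, "flash": 0.7}}
EMIT = {"ok": {"on": 0.9, "off": 0.1},
        "flash": {"on": 0.2, "off": 0.8}}

path = viterbi(["on", "on", "off", "off", "off", "on"],
               STATES, START, TRANS, EMIT)
```

For this input the decoded path is `ok, ok, flash, flash, flash, ok`: the icon would turn red during the sustained loss of eye contact and return to green once it is restored. A real system would estimate the parameters from the labeled videos and use continuous, multi-dimensional observations.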

In order for LISSA to conduct a conversation, we developed an automated dialogue manager capable of handling multiple topics. Much of the content was based on our experience with the WOZ experiments. An outline of the functionality of the dialogue module is presented in Figure 6. LISSA leads the conversation by asking questions on different topics (often after a "personal" remark about the topic), and making relevant comments on the user's responses, comments intended to show actual understanding of the user. The dialogue manager can also handle some questions asked by the user. It continually updates the conversation plan, based on the user's responses. At the top level, LISSA uses a structure called a schema, which contains a list of expected successive events in a dialogue; it allows for actions by both interlocutors and can be dynamically modified by user responses. The schemas are hierarchically structured, allowing LISSA to insert subschemas into the dialogue plan, helping to make the conversation more spontaneous. In order to capture users' inputs, we use the Nuance [60] speech recognizer. Automatic detection and control of turn-taking is still under development, and we required our users to indicate their turn-taking by pressing a button on a wireless clicker. When LISSA's dialogue manager receives input from the speech recognizer, it generates a high-level interpretation in the form of short, explicit, context-independent English sentences which we call "gist clauses". Gist clauses are extracted by applying several pattern transduction trees to the user's input, taking the current LISSA question as the context. In order to facilitate input matching, input words are automatically annotated with syntactic and semantic features before extraction of any gist clauses. Features are recursively attached to input words, such as GOODPRED for words like "happy", and ones like SOCIAL-SCIENCE and (by recursion) ACADEMIC-SUBJECT for "linguistics".
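The recursive feature annotation and gist-clause extraction can be sketched as follows. This is a flat stand-in for the hierarchical pattern transduction trees, which the text does not specify in detail: the feature table, the question identifier `"major"`, the output template, and the function names are all hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical feature table: a word or feature maps to its direct features.
FEATURES: Dict[str, List[str]] = {
    "happy": ["GOODPRED"],
    "linguistics": ["SOCIAL-SCIENCE"],
    "SOCIAL-SCIENCE": ["ACADEMIC-SUBJECT"],
}


def features_of(word: str) -> Set[str]:
    """Collect all features of a word, following feature-to-feature links."""
    found: Set[str] = set()
    stack = list(FEATURES.get(word.lower(), []))
    while stack:
        feature = stack.pop()
        if feature not in found:
            found.add(feature)
            stack.extend(FEATURES.get(feature, []))  # recursion step
    return found


def gist_clause(question_id: str, user_input: str) -> Optional[str]:
    """Map a noisy answer to a short, context-independent sentence."""
    words = [w.strip(".,!?") for w in user_input.split()]
    if question_id == "major":  # the current question supplies the context
        for w in words:
            if "ACADEMIC-SUBJECT" in features_of(w):
                return f"My major is {w.lower()}."
    return None
```

For example, `gist_clause("major", "Um, I study Linguistics.")` yields `"My major is linguistics."`, while an input with no recognizable academic subject yields `None`, in which case a real system would fall back to a more general pattern.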

After gist clauses have been derived from the user's input, they undergo a second stage of transduction, producing a reaction for LISSA to output. If the user's input answers a question by LISSA, the reaction will usually be a relevant reaction to that answer; or, if there is a question among the extracted gist clauses (typically at the end of a user's input), LISSA is likely to answer the question. Every gist clause obtained is stored in LISSA's memory so that LISSA won't ask for information already provided by the user in previous turns. Also, the collected gist-clauses could be used for future enhancements that allow inference during the conversation and reference to previous contributions of the participants. Figure 5 provides an overview of the functioning of the automated system.
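A minimal sketch of the memory and reaction logic, under simplifying assumptions: the dialogue plan is a flat list of topic names rather than a hierarchical schema, a gist clause counts as a question if it ends with a question mark, and the class and function names are hypothetical.

```python
from typing import Dict, List, Optional


class DialogueState:
    """Tracks the remaining question plan and the gist clauses heard so far."""

    def __init__(self, plan: List[str]) -> None:
        self.plan = list(plan)            # topics still to be asked about
        self.memory: Dict[str, str] = {}  # topic -> stored gist clause

    def remember(self, topic: str, gist: str) -> None:
        self.memory[topic] = gist

    def next_question(self) -> Optional[str]:
        """Pop the next planned topic that the user has not already covered."""
        while self.plan:
            topic = self.plan.pop(0)
            if topic not in self.memory:
                return topic
        return None


def choose_reaction(gists: List[str]) -> str:
    """Answer a user question if one is present; otherwise comment."""
    for gist in gists:
        if gist.rstrip().endswith("?"):
            return "answer:" + gist
    if gists:
        return "comment:" + gists[0]
    return "none"
```

If the user volunteers their major while answering an earlier question, `remember("major", ...)` stores it, and `next_question()` later skips that topic instead of asking for information already provided.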

Over the past three years, we have been recruiting participants for our studies through the developmental/behavioral pediatric research center at the local medical center. As already mentioned, our experimental sessions have been conducted with nine teenage participants (i.e., between 13 and 18 years old). The goal of this study has been to learn what aspects of LISSA are judged to be useful, and what adjustments need to be made in order to make LISSA a useful tool for iterative conversation training in the lives of teens with ASD anywhere. The sessions with the teens (whose parents were also invited) were scheduled on separate days. Each participant first interacted with LISSA for five minutes, then took a break for two minutes, followed by a second conversation with LISSA for another four minutes. As dialogue topics, we picked ones common in casual conversations such as "getting to know each other", "living in the current city", "crazy room" (aimed at eliciting imaginative responses), "city I want to move to in future", "free time", and "movies". During the conversations, the participants received real-time feedback through the flashing icons. After each conversation, the participants received post-session summary feedback. We conducted an interview with both the accompanying parent and the participant right after the LISSA session. The interview included survey questions on LISSA's usability and capacity for open-ended discussion. The interview was audio recorded and then transcribed by professional transcribers.

We presented 12 statements to the participants and asked them to specify their opinion (from 'strongly disagree' to 'strongly agree'). The questions were targeted to understand the usefulness of the feedback and the dialogues. The questions were inspired by the well-established system usability questionnaire [9] and modified to our specific needs. Figure 7 shows the specific questions and the percentage of participants' answers in each category. The questions marked with a star (*) were answered significantly more often (p<0.05) with the options 'agree' or 'strongly agree' than with other options. We performed a one-sample nonparametric significance test [33] with Bonferroni correction [2] against the option 'neutral'.
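The text does not name the specific one-sample test, so the sketch below uses an exact one-sided sign test as an assumed stand-in, with responses coded 1 to 5 and 'neutral' coded as 3 (also an assumption). Responses equal to neutral are discarded, and the Bonferroni step multiplies each p-value by the number of questions tested.

```python
from math import comb
from typing import List


def sign_test_p(responses: List[int], neutral: int = 3) -> float:
    """Exact one-sided sign test: are responses above neutral more often?"""
    above = sum(r > neutral for r in responses)
    below = sum(r < neutral for r in responses)
    n = above + below  # ties with neutral carry no sign information
    if n == 0:
        return 1.0
    # P(X >= above) for X ~ Binomial(n, 0.5)
    return sum(comb(n, k) for k in range(above, n + 1)) / 2 ** n


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical responses from nine participants to one statement.
p_raw = sign_test_p([4, 5, 4, 4, 5, 5, 4, 4, 5])
p_adj = bonferroni(p_raw, n_tests=12)
```

With nine participants who all agree, the raw p-value is 1/512 and the adjusted value is 12/512, about 0.023, which stays below 0.05. Under this assumed test, a single neutral or negative response among nine already pushes the adjusted p-value above 0.05, which illustrates how conservative the correction is at this sample size.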

Participants felt that they were being understood by LISSA. Participants also expressed that they could continue the conversation and pay attention to the icons without any trouble. This indicates that real-time feedback might be applicable to the teenage population. Additionally, participants felt that the feedback they received from LISSA was useful. During our interview session participants expanded on this perceived usefulness. The feedback was consistent and in accord with what their therapist said in the past. For instance, one participant had issues (i.e., slouching) with his posture and he received feedback through the 'body movement' icon. During the interview, the participant mentioned this and said that his parents often ask him to sit straight.

As can be seen in the figure, participants had mixed opinions about several questions, such as whether the conversational experience felt real, and whether LISSA's movements seemed natural. Three participants responded positively about the latter question, four responded negatively, and two were neutral. In the interview session, several mentioned that the lip movement and eye gaze were unnatural. Additionally, LISSA was not responding immediately. This was due to the fact that LISSA processes the dialogues and facial features in real-time and the processing takes place on a remote server. In our future versions of LISSA, we will make it more responsive by performing most of the computing locally.

We performed a thematic analysis [20] on the interview transcripts. In the past, thematic analysis was successfully used for user-centered design [35] and rapid online interface prototyping [31] . Additionally, thematic analysis was used for identifying the design guidelines for developing computer and phone-based technology to help improve the social skills of the children with ASD [39] , [38] . In our analysis, three researchers performed thematic analysis and then the themes were merged to produce the final analysis report. As a basis for a qualitative thematic analysis of the interview transcripts, we considered individual interviews from the perspective of the following 14 labels: usefulness, perceptiveness, related systems, accuracy, familiarity, curiosity, realism, speed, appearance, improvements, social, multitasking, uncertainty, and adult identity. As a result, six themes (closely related to some of these labels) stood out to us as relevant to summarizing the experiences of the participants. These are elaborated in the following. The first theme briefly summarizes positively perceived aspects of LISSA, while the rest dwell mostly on aspects where further developments are desirable. We believe it is at least as important to focus on weaknesses as on strengths, as a guide to further development.

Participants generally found LISSA useful for practicing conversations. Additionally, they thought that the program was not hard to navigate. One participant said, "Wasn't that hard to use. It could definitely be used for somebody who really needs help with conversations or for somebody who is not really social or for somebody who is not really the kind of person to be talkative."

Participants liked the fact that the feedback was not coming from a human and they could use it in their private space without being observed. When we asked if they prefer human or a computer for giving feedback, some were ambivalent, some preferred the computer ("I would rather have that (LISSA) for feedback,"), and some were skeptical about LISSA. Some participants with favorable reactions added that the automatic facial feature detection and the accuracy of the feedback were the main reason for endorsing LISSA for conversational skills training. One participant said, "The fact that it was able to actually detect the facial features and everything being so accurate, I would consider that is good enough to actually train on."

The caregivers also liked the fact that LISSA allows users to have a conversation with a virtual agent instead of (for example) a stranger online. They felt that LISSA was realistic and appreciated that it provided quite complex and private interactions.

When conversing with LISSA, participants were often evaluating the experience in relation to real, social settings, for example, whether the feedback they received was truly appropriate. One participant noted that smiling broadly enough for LISSA to notice might be perceived as inappropriate in a public setting:

Occasionally the self-aware evaluation of LISSA took the form of push-back against the presumption that they needed to improve their behavior, implicit in LISSA's feedback. One participant indicated that any inadequacy in their behavior with LISSA was due to the awkwardness of interacting with a virtual agent:

"Well, I am actually social. I do make good eye contact with other people but it's just kinda awkward you know."

Another teenager commented similarly:

From this it appears that some participants would need some time to become more familiar with LISSA. After multiple, unobserved interactions they might well feel more comfortable with the system. The comments also suggest that users might be more accepting of LISSA if the purpose of the system were more fully explained: it is simply a tool for users who would like to improve technical aspects of their conversational ability, if they feel they could improve in that area.

Participants focused quite intently on the realism and precision of the LISSA persona. There were debates on how realistic the eyes, lips, face, voice, and overall movement were. Participants suggested various improvements, such as more flowing speech, immediate responses, and faster blinks:

"The other part I didn't like about her is because it took her a while to respond to my statements. So it was kind of confusing and irritating."

"I would say blinking eyes was definitely a little bit slow, because if I blink it looks almost instant. But for that it was half a second total, it seems quite slow."

"Nice sounding voice, nice face but the moving the lips thing was kind of a little creepy. Mainly because kind of it's little too computer-ish maybe."

When participants were asked about the usefulness of LISSA, their judgments were dependent on realism. For the other interview questions, participants had varying responses, but there always seemed to be an underlying fixation on realism.

"Well, it was good that she asked me what I liked to do. But since it wasn't really real it just seemed all awkward for me. I would make it better by making the quality High Definition and High quality better with talking to her for real."

When they were asked about multitasking between the conversations with LISSA and looking at the icons, there were different opinions: some found the icons helpful, and others wanted more explanation and bigger icons.

When asked whether they preferred feedback from a computer or a human, the participants had mixed opinions. For example, one participant said, "The feedback I received from LISSA was useful. Well it was kind of a bit choppy and kind of pretty much prefer something a bit more realistic."

Interpreting feedback through LISSA's behavior while conducting a conversation may be overwhelming for many participants. Perhaps if LISSA herself provided the feedback verbally during the conversations, rather than relying entirely on icons for feedback, it would feel more like a real interaction to users. Regardless of their preferences in feedback delivery, all participants expressed that the feedback was consistent with what other people had said to them in the past face-to-face training sessions. One participant said, "They (feedback) were actually kind of handy and appeared to be pretty accurate."

An important observation was that users broadly agreed on the need for real-time positive feedback. While the flashing red icons noticeably indicated the need for improvement, the reversion to static green icons was not sufficiently noticeable as positive feedback. This indicates both the need for changes in icon functioning and, once again, the desirability of verbal feedback (especially acknowledgement and praise) from LISSA.

At the end of the interviews, participants tended to compare LISSA to other conversational agents. They were often familiar with Siri and Alexa, and discussed the standard set by these virtual assistants in terms of speed and knowledgeability. They liked that LISSA could detect behavioral features, but hoped that LISSA could better understand and recognize them in the future. For example, a participant said, "In its current state yes I might mess around with it some but I don't believe I would use it as an actual social skills training and jump into actual conversations just yet." It seemed that they had high expectations of LISSA, which in some cases led to impatience during the conversation. One participant said they would feel more comfortable talking with systems like Siri, which they believe understand them.

"That's like my first time ever talking to a computer, well except for Siri that's different though. That one is fine."

The desire for adult identity

One teenager, after being urged by a caregiver to be honest, admitted that they probably wouldn't choose to interact with LISSA, unless perhaps LISSA "popped up on their computer". Part of the reason seemed to be that LISSA straddles the boundary between fantasy and reality; and while fantasy is fine for a kid, for real conversations they prefer actual humans:

These participants were ambivalent about the future use of LISSA because they were inclined to regard it as a conversational tool for children. One participant said their schedule was too busy, but others could benefit from LISSA.

"Just so you know I am already 17 years old, I am growing up and some of these little kids' things I have outgrown but not all of them."

In one conversation, a participant felt momentarily uncomfortable with LISSA's comment about a "crazy room," (asking them to speculate what kind of crazy room they would enjoy) and said they wanted to feel like an average adult:

"Well, pretty much just the crazy room cause well I wanna be what your average adult is. Basically responsible, kind, but also a bit unique."

These comments again suggest the need to make LISSA's purpose clearer to users. It is not intended to be a surrogate human, but rather a tool for repetitive, private practice of conversational behavior. Also, being sensitive in the choice of terminology is important, and perhaps asking for a creative response is inappropriate for some participants.

A majority of participants found that LISSA provided useful feedback and might well be helpful for practicing their conversational skills. As we noted, they also liked the fact that LISSA would allow them to converse in private. Another point of interest is that the participants preferred real-time feedback to post-session feedback.

Our interviews with users helped to shed light on how the interface design could be further improved. Our qualitative analysis of these interviews provides evidence for the soundness of our design so far and grounds for optimism about our further development plans. The analysis also allowed us to formulate several important guidelines for the design of LISSA-like interfaces for conversation practice.

Our experience with users made clear that users' assessment of LISSA's behavior and potential utility for conversation practice depended very much on their expectations. They generally acknowledged that LISSA seemed to understand them and responded appropriately to inputs, yet was not genuinely human-like. The perceived shortcomings concerned LISSA's physical behavior, accuracy of perception of user behavior, and depth of knowledge. In part, this perception arose from comparisons with commercial systems such as Alexa and Siri, which have been optimized for smooth functioning in targeted information retrieval and other assistive functions.

These reactions indicate the need for fuller preparation of users about LISSA's purpose: It is not a surrogate human, and it is not an app for access to useful knowledge or personal assistance. It is simply a tool for repeatedly practicing casual conversation for those who feel they could improve in that area. While LISSA has a range of verbal reactions to users depending on their particular inputs, and provides nonverbal feedback as a function of the user's behavior, the conversations are bound to be shallow, and to become more repetitive with multiple uses. Further, LISSA's physical behavior is not the focus; it is merely intended to be sufficiently human-like to make a casual conversation possible. All this should be made clear to potential users, along with a comment that there is no assumption that all users with ASD are lacking in the skills that LISSA is intended to help with. Such preparatory information could be provided both in advance of actual use of LISSA, and as part of LISSA's opening remarks (which already include "I might sound a bit choppy, but ..."). The post-session interviews of users should likewise focus on relevant aspects of LISSA's functioning. For instance, interview questions might include disclaimers, as in, "We know that LISSA doesn't smile and blink very naturally, but was the content of her responses to you reasonable and natural?". In general, it is evidently important to prepare users not only for the capabilities of an interactive system, but also for its limitations.

As noted in the previous section, the participants wanted to be made aware of positive changes. Perhaps flashing green could be used for behavioral improvements. Better yet, the virtual agent could say, for example, "You have good eye contact now". The efficacy of positive feedback and acknowledgment has been observed in past research [56] , [48] , [29] , and our experience further confirms the desirability of positive acknowledgments for interventions aimed at social skills development.

Notwithstanding disclaimers about LISSA's physical behavior, the issue deserves further attention. A possible reaction to users' comments about insufficiently realistic smiling, eye blinking, and reaction speed might be to back away from the "uncanny valley" (e.g., Chattopadhyay et al. [12] ) by using a more cartoon-like avatar. However, this would risk reducing LISSA to a toy in the eyes of potential users. Instead, we interpret the users' comments as urging further development of the avatar towards greater realism. This is consistent with their age: they are approaching adulthood, and prefer a realistic avatar to a childish one. In fact, their comments suggest that the more life-like the character, the more likely they are to take it seriously. Realistic appearance of virtual characters has also been shown to be effective in other scenarios such as negotiation and tactical questioning [30] , [52] . The most important areas for improvement seem to be smiling and eye blinking. (Smiling is of course well-known to be very important in communication.) For example, smiling needs to be consistent with current feedback (one participant commented on co-occurrence of a smile by LISSA with negative feedback). Furthermore, smiles could be used directly to indicate improvements, or in support of positive verbal or icon feedback.

Some participants expressed enthusiasm about home-use of LISSA. However, they varied in their opinions about the choices of topics. For example, while the "crazy room" topic struck a chord with some (e.g., they would fill it with video games), others objected to it, terming it childish. In addition, participants thought that it would be useful if LISSA could talk about topics of their own choosing. For example, one participant was very interested in computer programming and wanted to talk about it more. The LISSA program, at its current stage, is designed for initiating conversational topics, and treating a specialized topic like computer programming seriously would be a major challenge. However, adding further mundane topics is quite feasible, and in fact we have added many more in a related application currently under development. Thus we could personalize interactions to a considerable degree by having LISSA choose topics dynamically, skipping those that the user seems indifferent to. Also, choices could be made sensitive to the user's age or maturity. An immature 13-year-old may have quite different interests from a mature 18-year-old. (Some of our newly developed topics pertain only to seniors, just as some of LISSA's topics for ASD teens, such as bullying at school, pertain only to school-age users.)
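The dynamic topic choice suggested above can be illustrated with a minimal sketch. It is not part of the LISSA implementation: the `Topic` record, the age bounds, the per-topic interest scores (imagined here as estimates carried over from earlier sessions), and the `min_interest` threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    """Hypothetical topic record; the age bounds are illustrative only."""
    name: str
    min_age: int = 0
    max_age: int = 200


def next_topic(topics, user_age, interest, discussed, min_interest=0.3):
    """Return the first suitable topic, or None if none remains.

    interest maps topic names to an assumed score in [0, 1]; topics
    without a score are treated as fully interesting (1.0).
    """
    for topic in topics:
        if topic.name in discussed:
            continue  # already covered in this session
        if not (topic.min_age <= user_age <= topic.max_age):
            continue  # not appropriate for this user's age
        if interest.get(topic.name, 1.0) < min_interest:
            continue  # the user seemed indifferent to this topic
        return topic
    return None
```

With such a filter, a 17-year-old who showed little interest in one topic would simply be offered the next age-appropriate one. How the interest scores would actually be estimated is left open here.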

Another opportunity for personalization lies in the verbosity or otherwise of the user. Our experiments showed that while some users provide expansive responses to LISSA, others respond tersely. As the goal of the system is to help users improve their communication skills, the system could gauge users' verbosity and provide helpful feedback where appropriate. For instance, LISSA might encourage laconic users to elaborate their answers, or conversely, provide gentle suggestions about curtailing rambling or off-topic inputs.
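As a rough illustration of how verbosity might be gauged, the sketch below averages the number of words per user turn and maps the result to a hint. The function names and the `low` and `high` thresholds are assumptions made for this example, not values used in LISSA.

```python
from statistics import mean


def words_per_turn(turns):
    """Average number of whitespace-separated words per user turn."""
    return mean(len(turn.split()) for turn in turns) if turns else 0.0


def verbosity_hint(turns, low=5.0, high=40.0):
    """Map average turn length to a hypothetical feedback hint.

    The thresholds are illustrative assumptions, not empirical values.
    """
    if not turns:
        return None  # nothing observed yet, so give no hint
    average = words_per_turn(turns)
    if average < low:
        return "encourage_elaboration"
    if average > high:
        return "suggest_brevity"
    return None
```

A word count is only a crude proxy; it says nothing about whether a long answer is on topic, which would require looking at the content of the dialogue.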

Assessing users' verbosity throughout the conversation can also help improve turn-taking behavior. Users who tend towards longer answers should probably be allowed slightly longer silences before the turn is seized from them. The same applies to hesitant, slower speakers. Certainly humans adapt to such individual differences. This is an important research area: to our knowledge, no available automatic turn-handling methods take into account individual speakers' verbosity or rate of speech.
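One simple way to picture such adaptation is a silence timeout that grows with the user's typical answer length and shrinks with their speaking rate. The sketch below is a hypothetical heuristic; the base timeout, the reference values of 40 words per turn and 2.5 words per second, and the linear scaling are all assumptions, not measured quantities.

```python
def silence_timeout(avg_words_per_turn, words_per_second,
                    base=1.0, max_extra=1.5):
    """Seconds of silence to allow before the agent takes the turn.

    All constants are illustrative assumptions.
    """
    # Users who give longer answers get more time, saturating at 40 words.
    verbosity_factor = min(avg_words_per_turn / 40.0, 1.0)
    # Users who speak more slowly than 2.5 words/s get more time.
    slowness_factor = min(max((2.5 - words_per_second) / 2.5, 0.0), 1.0)
    # Apply whichever consideration calls for the longer wait.
    return base + max_extra * max(verbosity_factor, slowness_factor)
```

Under these assumptions the timeout ranges from 1.0 s for a terse, fast speaker to 2.5 s for a very verbose or very slow one.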

One of the most important observations we made about teens with ASD in comparison with (neurotypical) college students, independently of verbosity, was that the ASD teens refrained from asking any reciprocal or other questions of LISSA (e.g., after telling LISSA about their favorite movie, asking "and what's your favorite movie?"). Whether this is due to less willingness to treat LISSA as human-like, or to limitations in social intuition, it is an area where verbal feedback by LISSA could be particularly useful; for instance LISSA might say, "This would be a good point to ask me about my favorite movie. Would you like to try?".

The current version of LISSA was not designed for immediate use in a randomized controlled intervention study. Rather, it is an exploratory system, which will enable a randomized controlled study after modification and enhancement based on the lessons learned from the trials with the initial set of ASD teens.

LISSA's dialogue manager was adapted from the initial version for college students to the anticipated needs of ASD teens, with advice from experts. It worked well, but the experiments have shown where improvements are most desirable, for example in topical adaptation to the user, inclusion of direct helpful hints in the verbal reactions to the user, and allowance for different turn-taking styles. Similarly, LISSA's nonverbal feedback system was trained on the data collected from college students, and although the participants perceived the feedback as useful, our experimental results indicate ways in which flashing icons, sensitivity to user smiles, and reaction speed could be improved. Also, while the post-session interviews indicated that the users liked the appearance and voice of the avatar, they saw a need for improvements in the naturalness of the avatar's behavior (especially smiles and blinking of eyelids). In our future work, we will design a customizable interface based on the knowledge we gathered through this study.

Data collected from teenagers with ASD using future versions of the system will help us to further improve the sensitivity and responsiveness of the system. In the current system, the dialogue and the feedback modules are independent. It clearly would be useful to tie the nonverbal feedback to the dialogue content, and to supplement nonverbal feedback signals with direct verbal ones. In the future, we will make the feedback dialogue-aware.

In this paper, we described an interface capable of conducting a multi-topic conversation and providing feedback to help improve the user's overt behavior. The design benefited from the expertise of a pediatrician, psychologists, and a UX designer. We investigated further design desiderata through a study with nine teenagers with ASD. Using a thematic analysis, we formulated several guidelines for improved interface design for teens with ASD. In the future, this knowledge will help guide interface design for this population as well as others with similar needs for social skills enhancement in casual dialogues.

Ophthalmic & physiological optics : the journal of the British College of Ophthalmic Opticians (Optometrists)

OpenFace: an open source facial behavior analysis toolkit

Ideas: An Interface Design Experience for Autistic Spectrum

Designing an Intelligent Virtual Agent for Social Communication in Autism

Praat: doing phonetics by computer

Socially computed scripts to support social problem solving skills

SayWAT: Augmenting Face-to-Face Conversations for Adults with Autism

Latha Carpenter, Laura Arnstein Soorya and Danielle Halpern

Centers for Disease Control and Prevention. 2013. Learn the Signs. Act Early. program (www.cdc.gov/ActEarly)

Familiar faces rendered strange: Why inconsistent realism drives characters into the uncanny valley

A daily behavior enabled hidden Markov model for human behavior understanding

SimSensei Kiosk: A Virtual Human Interviewer for Healthcare Decision Support

Speed-dating as an invaluable tool for studying romantic attraction: A methodological primer

Delivering Cognitive Behavior Therapy to Young Adults With Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent (Woebot): A Randomized Controlled Trial

Supporting children's social communication skills through interactive narratives with virtual characters

Using Multitouch Collaboration Technology to Enhance Social Interaction of Children with High-Functioning Autism

The Use of Virtual Characters to Assess and Train Non-Verbal Communication in High-Functioning Autism

Introduction to applied thematic analysis

Interactive visual supports for children with autism. Personal and Ubiquitous Computing

Cognitive-behavioral therapy for social anxiety disorder: Current status and future directions

Ethical Challenges in Data-Driven Dialogue Systems

Avatar assistant: Improving social skills in students with an ASD through a computer-based intervention

Multitouch tablet applications and activities to enhance the social skills of children with autism spectrum disorders. Personal and Ubiquitous Computing

Evaluation of tablet apps to encourage social interaction in children with autism spectrum disorders

Developing User Interface Design Application for Children with Autism

Assessing the minimally verbal school-aged child with autism spectrum disorder

Caregiver interactions with autistic children

Building Interactive Virtual Humans for Training Environments

A user-centered model for web site design: needs assessment, user interface design, and rapid prototyping

Lessons from participatory design with adolescents on the autism spectrum

On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other

Emotional and behavioural problems in children with autism spectrum disorder

mHealth consumer apps: the case for user-centered design

Development of a software-based social tutor for children with autism spectrum disorders

Smartphone-based conversational agents and responses to questions about mental health, interpersonal violence, and physical health

Additional key factors mediating the use of a mobile technology tool designed to develop social and life skills in children with Autism Spectrum Disorders: Evaluation of the 2nd HANDS prototype

Key factors mediating the use of a mobile technology tool designed to develop social and life skills in children with Autistic Spectrum Disorders

A mixed-initiative conversational dialogue system for healthcare

Rachel: Design of an emotionally targeted interactive agent for children with autism

Exceptionally social: Design of an avatar-mediated interactive system for promoting social skills in children with autism

SIDES: a cooperative tabletop computer game for social skills development

The LISSA virtual human and ASD teens: An overview of initial experiments

Managing Casual Spoken Dialogue Using Flexible Schemas, Pattern Transduction Trees

Mobile phone-based asthma self-management aid for adolescents (mASMAA): A feasibility study

Qualitative or quantitative differences between asperger's disorder and autism? Historical considerations

Reward processing in autism

Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation. 3288-3294

Nonverbal Behavior and Communication

Psychiatric Disorders in Children With Autism Spectrum Disorders: Prevalence, Comorbidity, and Associated Factors in a Population-Derived Sample

Intelligent Agents for Interactive Simulation Environments

Embodied conversational agents for multimodal automated social skills training in people with autism spectrum disorders

Authorable virtual peers for autism spectrum disorders

Prevalence of autism spectrum disorder among children aged 8 years - autism and developmental disabilities monitoring network, 11 sites, United States

A positive approach to autism

End-to-end LSTM-based dialog control optimized with supervised and reinforcement learning

Recognizing human action in time-sequential images using hidden Markov model. Computer Vision and Pattern Recognition

NUANCE: Speech Recognition Solutions