key: cord-0632875-5e1ty6xb
authors: Mahmoudi-Nejad, Athar; Guzdial, Matthew; Boulanger, Pierre
title: Arachnophobia Exposure Therapy using Experience-driven Procedural Content Generation via Reinforcement Learning (EDPCGRL)
date: 2021-10-07
journal: nan
DOI: nan
sha: b2133fead321176fca20314f91c100f304ac2b54
doc_id: 632875
cord_uid: 5e1ty6xb

Personalized therapy, in which a therapeutic practice is adapted to an individual patient, leads to better health outcomes. Typically, this is accomplished by relying on a therapist's training and intuition along with feedback from a patient. While there exist approaches to automatically adapt therapeutic content to a patient, they rely on hand-authored, pre-defined rules, which may not generalize to all individuals. In this paper, we propose an approach to automatically adapt therapeutic content to patients based on physiological measures. We implement our approach in the context of arachnophobia exposure therapy, and rely on experience-driven procedural content generation via reinforcement learning (EDPCGRL) to generate virtual spiders to match an individual patient. In this initial implementation, and due to the ongoing pandemic, we make use of virtual or artificial humans implemented based on prior arachnophobia psychology research. Our EDPCGRL method adapts more quickly to these virtual humans, with high accuracy, than existing search-based EDPCG approaches.

Experience-driven Procedural Content Generation (EDPCG) is a PCG framework that modifies content to optimize a user's experience. Although EDPCG was developed for games, it can be applied to other HCI domains that require automated customized content (e.g., recommender systems). We argue EDPCG can also be a useful framework for computer-assisted therapy. We can divide computer-assisted therapy into two groups: non-adaptive and adaptive.
The non-adaptive approach provides predesigned content for all users, which is not ideal since individuals benefit from individualized treatment (Zahabi and Razak 2020). Adaptive approaches generally structure this as a selection problem: choosing between preexisting content to better match an individual's treatment needs, resulting in better health outcomes (Zahabi and Razak 2020). Current adaptive computer-assisted therapy is mainly therapist-guided or rule-based. The former requires therapist intervention, which is time-consuming, burdensome, and requires specialized training. The latter modifies the therapeutic content based on predefined rules, which are brittle and cannot account for all possible individuals (Abdessalem and Frasson 2017; Heloir et al. 2014). In contrast, an EDPCG framework could be used in computer-assisted therapy as an automatic content generation tool that adapts to satisfy individuals' therapeutic needs. Only a handful of studies have used EDPCG for physical rehabilitation, e.g., motor rehabilitation (Dimovska et al. 2010), amblyopia (Correa et al. 2014), and upper limb rehabilitation (Hocine et al. 2015). In these studies, the Player Experience Model (PEM) in the EDPCG framework is gameplay-based (player performance). A gameplay-based PEM assumes that the player's internal state can be derived from the way a player plays a game. However, according to Yannakakis and Togelius (2011), a gameplay-based PEM is a low-resolution model of the player's experience, which is not ideal for psychological rehabilitation. Further, less attention has been paid to using EDPCG for psychological rehabilitation, e.g., mental health (Badia et al. 2018). We draw on physiological measures in our PEM, which more closely correspond to a player's internal state, especially for measuring stress.

Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

This paper presents an EDPCG framework for arachnophobia treatment leveraging an Experience-driven Reinforcement Learning-based PCG (EDPCGRL) content generator. The RL agent dynamically generates spiders in order to induce a desired stress level. The framework's goal is to keep players within a defined physiological state to allow for more effective exposure therapy. Exposure therapy is a therapy technique for treating anxiety disorders in which an individual is gradually exposed to the anxiety source. This paper introduces a new research area: EDPCGRL. We investigate the application of an EDPCGRL system to computer-assisted therapy. The contributions of our framework are: 1) Demonstrating the feasibility of a cognitive-based EDPCGRL approach for rehabilitation; 2) Demonstrating the first instance, to the best of our knowledge, of PCGRL outperforming search-based PCG.

Virtual reality (VR) immerses individuals in graphical computer-generated environments. VR has been found to be effective in exposure therapy for specific phobias, including fear of heights (Freeman et al. 2018), fear of spiders (arachnophobia) (Côté and Bouchard 2009), and other anxiety disorders (Maples-Keller et al. 2017). In exposure therapy, a subject is gradually exposed to a feared situation or object in a safe environment, leading to desensitization and a healthier response. While we are not dismissing non-adaptive VR for arachnophobia, such as (Shiban et al. 2015; Miloff et al. 2019), we focus on adaptive approaches because they better fit subjects' different needs. For example, Kritikos, Alevizopoulos, and Koutsouris (2021) defined rules to change a spider's appearance and pattern of behaviour (e.g., size and velocity of the spider, probability of walking towards the user, etc.) to induce a desired level of anxiety in a subject. The level of anxiety is calculated based on normalized electrodermal activity (EDA¹) changes.
Instead of using predefined rules based on EDA changes, our framework adapts a spider using PCGRL.

Procedural Content Generation (PCG) automatically generates content using algorithms. Traditional PCG approaches, such as constructive (Shaker, Togelius, and Nelson 2016), search-based, and constraint-based methods (Smith and Mateas 2011), require hand-authored knowledge. To address this limitation, researchers started applying machine learning (ML) methods to PCG (Summerville et al. 2018); however, because they are primarily supervised learning methods, they require a pre-existing dataset. Our framework is based on PCGRL to automatically generate new content without the need for a dataset.

¹ EDA measures the variations in the skin's electrical conductance due to increases in the activity of sweat glands.

PCG for Rehabilitation

PCG has rarely been applied to rehabilitation. Dimovska et al. (2010) used a constructive PCG generator in a ski-slalom game to place challenges according to a player's performance for motor rehabilitation. Correa et al. (2014) developed an adaptive first-person shooter game using constructive PCG for amblyopia treatment based on the player's performance. Hocine et al. (2015) developed a game for upper limb rehabilitation through pointing tasks, i.e., reaching targets. They locate targets using Monte Carlo Tree Search (MCTS) based on the user's performance, and generate the level with constructive PCG (i.e., choosing game entities). Badia et al. (2018) developed a labyrinth game that adapts by estimating a user's emotions via physiological responses. The system promotes emotional self-awareness for more effective emotion regulation. A constructive PCG approach was applied to generate different graphical content. These works use a set of predefined content, assuming the subjects are known.
Instead, our work assumes that the subjects are unknown; therefore, content needs to be generated and adapted dynamically.

PCG via Reinforcement Learning (PCGRL) methods focus on applications where we do not have any pre-authored training data, but we do have an environment of possible content and a way of automatically evaluating that content. In contrast to supervised PCGML methods, no pre-existing data is required for PCGRL. Thus, it can be applied to situations like ours in which no data is available. We summarize prior PCGRL work in Table 1. The purpose of these projects is mainly entertainment or education. The reward functions used in these works are based on different game-specific content. Applying a game-specific reward function in our framework would require assuming that the game-specific content reflects the players' cognitive state. However, this assumption is not necessarily true, and it would be challenging to implement since it requires a sophisticated game design. Instead, based on Shaker (2016), we define our reward function via human interaction with the system, which we argue is an essential factor in effectively using PCGRL for rehabilitation purposes.

This section overviews our framework for adapting a game environment using PCGRL for exposure therapy based on user responses. We visualize our framework in Figure 1. We use arachnophobia as a case study for this framework. A subject interacts with a virtual spider, and their physiological responses are used to estimate their stress levels in real time. A PCGRL agent modifies the virtual spider in order to reach a therapist-defined goal stress level. While we intend for the final version of this system to draw on prior work on estimating stress from physiological responses (Schmidt et al. 2018), we do not implement this part of the framework for this paper. In the following subsections, we describe each component of Figure 1 in detail.

For the environment, we envision a VR program with a generated 3D model of a spider. Our initial implementation does not require the completed VR environment, so we use a prototype version of the game environment, which is a simple representation of the spider given its specific attributes. We visualize three sets of attribute values for our prototype spider in Figure 2. The virtual spider is initially generated with random attributes, which are later adjusted by our EDPCG approach. These attributes include movement-related and appearance characteristics of spiders based on Lindner et al.'s work. In this study, spider-fearful individuals (n = 194) were asked to rate the characteristics of spiders based on their fear response. We overview this study in more detail in the next subsection.

Theoretically, players' physiological responses should allow for an estimation of their stress level (Schmidt et al. 2018). Lindner et al.'s study introduced seven spider attributes and measured their impact on the participants' overall amount of fear by asking the participants to quantitatively rate each spider attribute. We choose six attributes for spider generation, as shown in Table 2. We drop the realness attribute since it indicates whether or not to use real spiders, and we focus on virtual spiders. For each attribute, we define 2-3 ordinal values. The movement attributes (locomotion, amount of movement, and closeness) denote how the spider moves (specifically the movement of the legs), how much it moves, and how close it gets to the subject, respectively. The appearance attributes (largeness, hairiness, and color) denote the size of the spider, whether the spider is hairy or not, and the color of the spider, respectively.

We generate 100 virtual subjects as stress estimation functions, each with different responses to spider attributes. In Lindner et al.'s study, an impact factor is associated with each spider attribute, denoting the effect of the attribute on the subjects' amount of fear.
Based on the impact factors reported in the study, we model a normal distribution for each of the attributes as $\mathcal{N}(\mu_{a_i}, \delta_{a_i})$, $i \in \{1, \dots, 6\}$, where $a_i$ denotes the i-th attribute in Table 2, and $\mu_{a_i}$ and $\delta_{a_i}$ are derived from the i-th attribute's mean impact factor and the subjects' fear variance for the attribute, respectively. We draw 100 random samples from each distribution to generate 100 virtual humans as our subjects. An example of stress for one virtual subject is $1.37 \times (0.97 a_1 + 0.87 a_2 + 0.07 a_3 + 0.63 a_4 + 0.67 a_5 + 0.77 a_6)$, where 1.37 is the coefficient that scales the stress level to the range 0-10 for this virtual subject. This approach led to 100 unique virtual stress functions, which are still based on a psychological study of real, spider-fearing humans. Thus, we expect there to be at least some psychological grounding to our results.

State: Each state represents a combination of the spider's attributes. We define for each attribute a range of possible values, which are listed in Table 2. The state at time t can be represented as the vector of attribute values $s_t = (a_{1,t}, a_{2,t}, \dots, a_{6,t})$ (Eq. 1), where $a_{i,t}$ is the value of the i-th attribute at time t.

Action: The action is defined as increasing or decreasing one attribute at a time (Eq. 2), where $a_{policy,t}$ is the attribute chosen by the EDPCGRL policy for the action at time t. The intuition behind this decision is that therapists gradually increase or decrease these attributes in exposure therapy to find reasonable values that produce their intended stress level.

Reward: The reward function is calculated based on a normal distribution $\mathcal{N}(\mu, \delta)$, where $\mu$ is the target stress level and $\delta = (MaxStress - MinStress)/2$, where $MaxStress$ and $MinStress$ are the maximum and minimum values that the stress level can reach, respectively. We scale the distribution to the range $(-1, 1)$ such that the target stress level achieves a reward of 1, and the reward decreases as the stress level gets further from the target stress.
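As a rough illustration of the virtual-subject model and reward described above, the following Python sketch builds a stress function from six impact weights and scores a stress level against a target. Several details are assumptions rather than the paper's implementation: the attribute ranges in `ATTRIBUTE_SIZES` (five three-valued attributes and a binary hairiness, inferred from the neighbour counts given for the baselines), the clipping of sampled weights to positive values, all helper names, and the closed form of `reward`, which is only one plausible way to rescale a normal curve so that the target stress scores 1.

```python
import math
import random

# Number of ordinal values per attribute, in the order a1..a6:
# locomotion, amount of movement, closeness, largeness, hairiness, color.
# Assumption: inferred from the text, not copied from Table 2.
ATTRIBUTE_SIZES = (3, 3, 3, 3, 2, 3)
MIN_STRESS, MAX_STRESS = 0.0, 10.0


def make_virtual_subject(weights):
    """Return (stress_function, scale) for one subject's six impact weights."""
    # Largest possible weighted sum: every attribute at its maximum value.
    max_sum = sum(w * (n - 1) for w, n in zip(weights, ATTRIBUTE_SIZES))
    scale = MAX_STRESS / max_sum  # maps the most frightening spider to 10

    def stress(state):
        return scale * sum(w * a for w, a in zip(weights, state))

    return stress, scale


def sample_weights(means, stdevs, rng):
    """Draw one weight per attribute from N(mu_i, delta_i).

    Clipping to a small positive value is an assumption of this sketch.
    """
    return [max(1e-6, rng.gauss(m, s)) for m, s in zip(means, stdevs)]


def reward(x, target):
    """Assumed reward shape: 1 at the target stress, lower further away."""
    delta = (MAX_STRESS - MIN_STRESS) / 2
    return 2.0 * math.exp(-((x - target) ** 2) / (2 * delta ** 2)) - 1.0


# Worked example using the weights quoted in the text.
example_weights = [0.97, 0.87, 0.07, 0.63, 0.67, 0.77]
stress, scale = make_virtual_subject(example_weights)
```

With the example weights quoted above, the largest weighted sum is 7.29, so the scaling coefficient comes out at roughly 1.37, which agrees with the coefficient given in the text.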
The resulting reward function is shown in Eq. 3, where x is the current stress level.

We propose a framework for arachnophobia exposure therapy that can automatically adapt to subjects. For each subject, it generates spiders until it finds a spider (with specific features) that induces the subject's target stress level. This is a difficult task since Vetter (2013) suggests that people with arachnophobia respond to different aspects of spiders differently. We evaluate our proposed method in terms of how many times we present new spiders to a subject, denoted as Spiders Presented. This number should be as low as possible, because presenting many spiders may adversely affect the effectiveness of the treatment. This metric is equivalent to the number of times that an algorithm outputs a new spider.

At this stage, we evaluate our framework using virtual subjects (discussed in the methodology section) as a proof of concept. This is because the ongoing COVID-19 pandemic stopped us from conducting a human subject study. We generate 100 virtual subjects with a theoretical basis in real spider-fearing individuals. If our method outperforms our baseline methods using these virtual subjects, it will indicate that we can likely expect similar performance with real subjects.

Since there is no existing EDPCGRL framework to compare against, we compare our proposed method to EDPCG via search-based procedural content generation (SBPCG) methods. SBPCG is a traditional way to adapt content based on the assessment of users (Risi et al. 2015; Liapis, Yannakakis, and Togelius 2013). We briefly describe the baseline methods used in our evaluation and the implementation of our proposed method.

Genetic Algorithm: A Genetic Algorithm (GA) is our first baseline because it is a popular experience-driven SBPCG approach. In our problem, the GA's chromosome is equal to the state defined in the previous section.
The GA starts with a population that consists of a given initial state and its nine neighbour states (the population size is 10). We denote two states as neighbours if and only if they differ by only 1 value for a single attribute. Our approach selects the ten best chromosomes in terms of their fitness as its initial population. Based on Table 2, there are two chromosomes that could have eleven neighbours (locomotion=1, amount of movement=1, closeness=1, largeness=1, hairiness=0 or 1, color=1). The "average" initial state is one of these chromosomes. Therefore, we do not anticipate different results if we took all available neighbours. The GA then performs crossover, mutation, and selection until it reaches the termination condition. The crossover process uses weighted sampling to pick two pairs of chromosomes based on fitness score. Each pair is swapped at the middle point (one child takes the first half of the attributes from one parent and the second half from the other, and the other child vice versa) to generate two new offspring. For mutation, a random attribute in the new offspring is changed with 0.1 probability to a random, valid value. These operations do not respect the constraint of increasing or decreasing one attribute at a time, allowing for much larger steps through the search space. Finally, the selection process chooses the ten chromosomes with the highest fitness values as the next population.

Greedy search: Our second baseline is a simple greedy search. This approach works best if the fitness function does not have local optima, since it always exploits the best neighbour. It starts from the given initial state, searches its neighbours, and chooses the neighbour with the maximum fitness. Therefore, the fitness gradually increases (or stays steady) until the termination condition. This approach should perform best in terms of the minimum number of Spiders Presented if our problem is simple enough.

Random Search: Our final baseline is a random search.
It also starts from the same initial state as the previous baselines. In each step, it randomly selects an action (defined in the previous section) and goes to a new state. It keeps searching until it reaches the termination condition. We include random search to investigate the importance of exploration in this domain.

PCGRL (Our method): This is our method, which optimizes the spiders' attributes using RL. We employ the tabular Q-learning algorithm with an epsilon-greedy (ε = 0.05) action selection policy (Sutton and Barto 2018). We initialize a Q-table, which stores the values of state-action pairs, with either random or zero values, and update the values in each iteration.

These methods all use the same fitness function: the reward function for the PCGRL agent. We use the same initial state and termination criteria for all the methods for a fair evaluation. We define these as:
• Initial State: We use three initial states for our evaluation: in each state, we set the spider's attributes to either the maximum, minimum, or the average values within each attribute's range (visualized in Figure 2).
• Termination: These algorithms terminate if they reach one of these criteria: achieve a state with maximum fitness or run for 100 iterations.

We run each algorithm 10 times for each virtual subject and for each of the three initial states. We apply these approaches for every goal stress level from 1 to 9. In total, each algorithm is executed 27000 times. We define three stress categories: low ([1, 3]), moderate ([4, 6]), and high ([7, 9]). We calculate the average value of Spiders Presented for our 100 virtual subjects for each initial state and each stress category. We repeat this process 10 times and report the averages in Table 3. Spiders Presented in the table denotes the number of new spiders shown to a virtual subject on average when the approach is successful.
Accuracy is the percentage of times, out of the ten attempts on average, that the method could find a spider that provokes the target stress level. The Spiders Presented result is not taken into account if a run had Accuracy less than 75%. For example, for the Max initial state and Low stress goal, RL Zero has the lowest number of Spiders Presented, but since its Accuracy is 70.93% (less than 75%), we did not consider it to be the best result. In each row, the method that outperforms the other methods according to the Spiders Presented metric is shown in bold text. Table 3 shows that our proposed method outperformed the baseline methods for almost all of the target stress levels with various initial states. We note that we do not expect these results to necessarily generalize to all hyperparameter values. A hyperparameter sweep of the approaches is beyond the scope and page limit of this paper. Two-tailed, paired-samples t-tests (p < 0.05) were performed to compare the mean Spiders Presented metric across different methods. The results are marked in the table with ** if the best algorithm is significantly better than all the other methods. In some cases, the best method differs significantly from all but one or two of the other methods; we mark these cases with *.

We found that our proposed method, EDPCGRL, outperforms the baseline methods in the Spiders Presented metric, i.e., on average PCGRL showed fewer spiders to our virtual subjects before finding one that induces the specific subject's target stress level. This might be because it combines exploitation and exploration. Figure 3 shows the Spiders Presented for each virtual subject in the sequence they were presented to all approaches. It reveals that the RL agent with the Q-table initialized to 0 consistently outperforms other methods, which shows the method did not just learn our subjects' behaviour over time.

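To make the zero-initialized agent discussed above more concrete, here is a minimal sketch of tabular Q-learning with an epsilon-greedy policy over the spider attributes. Only the use of tabular Q-learning, ε = 0.05, the zero-initialized Q-table, and the 100-iteration cap come from the text. The learning rate `alpha`, discount `gamma`, success `tolerance`, attribute ranges, reward shape, clamping of out-of-range actions, and every function name are assumptions made for illustration, and each step (even a clamped one) is counted as one spider presented.

```python
import math
import random

ATTRIBUTE_SIZES = (3, 3, 3, 3, 2, 3)  # assumed ordinal ranges for a1..a6
# Twelve actions: increase or decrease one of the six attributes.
ACTIONS = [(i, d) for i in range(len(ATTRIBUTE_SIZES)) for d in (+1, -1)]
WEIGHTS = (0.97, 0.87, 0.07, 0.63, 0.67, 0.77)  # example subject from the text


def example_stress(state):
    """Stress of the example virtual subject for a given spider."""
    return 1.37 * sum(w * a for w, a in zip(WEIGHTS, state))


def example_reward(x, target, delta=5.0):
    """Assumed reward shape: 1 at the target, decreasing with distance."""
    return 2.0 * math.exp(-((x - target) ** 2) / (2 * delta ** 2)) - 1.0


def step(state, action):
    """Change one attribute by one step, clamped to its valid range."""
    i, d = action
    value = min(max(state[i] + d, 0), ATTRIBUTE_SIZES[i] - 1)
    return state[:i] + (value,) + state[i + 1:]


def run_q_learning(stress_fn, reward_fn, target, start, rng,
                   epsilon=0.05, alpha=0.5, gamma=0.9,
                   tolerance=0.5, max_iters=100):
    """Return (spiders_presented, final_state, success) for one subject."""
    q = {}  # missing entries count as 0, i.e. a zero-initialized Q-table
    state = start
    for presented in range(1, max_iters + 1):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        nxt = step(state, action)
        x = stress_fn(nxt)  # the subject reacts to the new spider
        r = reward_fn(x, target)
        best_next = max(q.get((nxt, a), 0.0) for a in ACTIONS)
        old = q.get((state, action), 0.0)
        q[(state, action)] = old + alpha * (r + gamma * best_next - old)
        state = nxt
        if abs(x - target) <= tolerance:  # assumed success criterion
            return presented, state, True
    return max_iters, state, False


presented, final_state, success = run_q_learning(
    example_stress, example_reward, target=5.0,
    start=(0, 0, 0, 0, 0, 0), rng=random.Random(0))
```

Here `presented` plays the role of the Spiders Presented metric for a single run, and `success` corresponds to one of the attempts counted towards Accuracy. Starting from the minimum-attribute spider mirrors the "Min" initial state used in the evaluation.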
Our results reveal that for the low, moderate, and high stress level categories, it might be ideal to set the initial state attributes to the minimum, average, and maximum, respectively. However, we emphasize that increasing or decreasing all the spider's attributes does not always produce spiders closer to our desired one, due to the variance in the virtual subject fitness functions. Nevertheless, in these situations, the Random baseline performs near optimally, because it is already closer to the desired spider in most cases.

We evaluate these methods on the virtual subjects, which simplifies the problem since the subjects are deterministic and do not change over time. However, if we imagine our virtual subjects not as distinct individuals, but as the same subject over time (e.g., Figure 3), we can observe that RL would also likely do better with dynamic, real-world individuals. Therefore, our results show that our method has potential for real subjects.

We intend to evaluate the proposed method on real human subjects, who are complicated and may respond differently over time. It is therefore challenging to obtain a model that maps physiological measures to stress levels. In fact, the same individual may show different patterns in their physiological measures. Environmental factors may affect individuals' physiological responses, e.g., drinking coffee or sleeping less or more than usual. In our future work, we require a robust and reliable model that accurately estimates stress levels from physiological measures. We also plan to improve the fidelity of the virtual spiders by using more attributes and/or more possible ordinal values. Currently, our approach uses a simple yet effective RL method. However, we are interested in investigating more sophisticated RL methods to determine if their performance differs significantly.
There are many other algorithms available that combine exploitation and exploration, e.g., Monte Carlo Tree Search (MCTS). However, they would require hyperparameter fine-tuning to work properly in our domain. We are also interested in applying transfer learning approaches to adapt the knowledge learned from one subject to a new subject.

Another limitation of our work is related to the ethical aspects. Our framework might cause excessive stress in the subjects. Therefore, the action selection policy in our PCGRL should be carefully adjusted so that the RL agent increases the stress level gradually. Nevertheless, the framework might also re-traumatize the subjects.

This paper introduces a new research area, EDPCGRL, which had not been identified in prior work. We defined a proof of concept of our EDPCGRL framework for virtual reality exposure therapy, where PCGRL adjusts game content according to a subject's physiological measures. We hypothesized that our EDPCG framework could be applied in computer-assisted therapy rather than relying on pre-authored methods. We support this hypothesis by evaluating our proposed framework for arachnophobia in a case study. Different spiders with six different attributes are generated based on a subject's stress level to find one that induces a target stress level. Our goal was to design a method that finds a desired spider with fewer spiders presented to a subject. We found that EDPCGRL outperformed existing experience-driven SBPCG methods for this task.

Real-time brain assessment for adaptive virtual reality game: A neurofeedback approach
Toward emotionally adaptive virtual reality for mental health applications
A new approach for self adaptive video game for rehabilitation-experiences in the amblyopia treatment
Cognitive mechanisms underlying virtual reality exposure
Multi-Context Generation in Virtual Reality Environments Using Deep Reinforcement Learning
Towards procedural level generation for rehabilitation
Learning Controllable Content Generators
Automated psychological therapy using immersive virtual reality for treatment of fear of heights: a single-blind, parallel-group, randomised controlled trial
Co-creative level design via machine learning
Design and evaluation of a self adaptive architecture for upper-limb rehabilitation
Adaptation in serious games for upper-limb rehabilitation: an approach to improve training outcomes
Pcgrl: Procedural content generation via reinforcement learning
Personalized Virtual Reality Human-Computer Interaction for Psychiatric and Neurological Illnesses: A Dynamically Adaptive Virtual Reality Environment That Changes According to Real-Time Feedback From Electrophysiological Signal Responses
Sentient sketchbook: computer-assisted game level authoring
What is so frightening about spiders? Self-rated and self-disclosed impact of different characteristics and associations with phobia symptoms
Reinforcement learning content generation for virtual reality applications
Deep Reinforcement Learning for Procedural Content Generation of 3D Virtual Environments
The use of virtual reality technology in the treatment of anxiety and other psychiatric disorders
Automated virtual reality exposure therapy for spider phobia vs. in-vivo one-session treatment: A randomized non-inferiority trial
Generation of Diverse Stages in Turn-Based Role-Playing Game using Reinforcement Learning
Petalz: Search-based procedural content generation for the casual gamer
Introducing wesad, a multimodal dataset for wearable stress and affect detection
Intrinsically motivated reinforcement learning: A promising framework for procedural content generation
Procedural content generation in games
Effect of combined multiple contexts and multiple stimuli exposure in spider phobia: A randomized clinical trial in virtual reality
Answer set programming for procedural content generation: A design space approach
Procedural content generation via machine learning (PCGML)
Applying Hindsight Experience Replay to Procedural Level Generation
Reinforcement learning: An introduction
Search-based procedural content generation: A taxonomy and survey
Arachnophobic entomologists: when two more legs makes a big difference
Experience-driven procedural content generation
Adaptive virtual reality-based training: a systematic literature review and framework

We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC) and CISCO systems.