key: cord-0552857-7krgydn4 authors: Lin, Baihan; Bouneffouf, Djallel; Cecchi, Guillermo; Tejwani, Ravi title: Neural Topic Modeling of Psychotherapy Sessions date: 2022-04-13 journal: nan DOI: nan sha: bd139af94d59b80ae18e66fe3a6a7e03e6f21dc5 doc_id: 552857 cord_uid: 7krgydn4 In this work, we compare different neural topic modeling methods in learning the topical propensities of different psychiatric conditions from psychotherapy session transcripts parsed from speech recordings. We also incorporate temporal modeling to put this additional interpretability into action by parsing out topic similarities as a time series at a turn-level resolution. We believe this topic modeling framework can offer interpretable insights that help the therapist optimally decide his or her strategy and improve psychotherapy effectiveness. Mental health remains an issue in all countries and cultures across the globe. According to the National Institute of Mental Health (NIMH), nearly one in five U.S. adults live with a mental illness (52.9 million in 2020). Depression is among the most prevalent of these illnesses [1] , and suicide, which often follows it, is the second leading cause of death among young people [2] . It is clear that there is a need for innovative new solutions in this domain. Psychotherapy is the term for treating mental health problems by talking with a mental health provider such as a psychiatrist or psychologist [3] . To reduce the workload on mental health providers, natural language processing (NLP) is increasingly being adopted [4] . Notably, psychotherapy was among the first disciplines to use NLP: it started with ELIZA [5] , a chatbot capable of mimicking a psychotherapist, and another chatbot, Parry [6] , was capable of simulating an individual with schizophrenia. Natural language processing, including topic modeling, has shown interesting results on mental illness detection.
In [7] the authors demonstrate that Latent Dirichlet Allocation (LDA) can uncover latent structure within depression-related language collected from Twitter. The authors of [8] show the added value of using social media content to detect post-traumatic stress disorder. Although this previous work demonstrates the effectiveness of classical topic modeling, such methods are no longer state-of-the-art. In recent years, deep learning has advanced the field, and neural topic modeling has emerged as a consistently better solution than classical topic modeling [9] . In this context, we propose to use neural topic modeling to learn the topical propensities of different psychiatric conditions from psychotherapy session transcripts. The goal is first to evaluate existing neural topic modeling techniques and find the one best adapted to this domain, and second to incorporate temporal modeling for additional interpretability, so that the framework can offer interpretable insights for the therapist to optimally decide on a psychotherapy strategy. In natural language processing and machine learning, a topic model is a type of statistical graphical model that helps uncover the abstract "topics" that appear in a collection of documents. Topic modeling is frequently used in text-mining pipelines to unravel the hidden semantic structure of a text body. Quite a few neural topic models are evaluated in this work. The Neural Variational Document Model (NVDM) [9] is an unsupervised text modeling approach based on the variational autoencoder. [10] further shows that among NVDM variants, the Gaussian softmax construction (GSM) achieves the lowest perplexity in most cases and is thus recommended; we denote it NVDM-GSM. Unlike traditional variational-autoencoder-based methods, the Wasserstein-based Topic Model (WTM) uses Wasserstein autoencoders (WAE) to directly enforce a Dirichlet prior on the latent document-topic vectors [11] .
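To make the Gaussian softmax construction concrete, the following is a minimal NumPy sketch of its core step: a Gaussian latent draw (as produced by a variational encoder) is mapped through a linear layer and a softmax to yield document-topic proportions. The weight matrix, dimensions, and random draw here are illustrative stand-ins, not the NVDM-GSM implementation.

```python
import numpy as np

def gsm_topic_proportions(z, W):
    """Map a latent Gaussian sample z (dim d) to a distribution over
    K topics via softmax(z @ W), where W is a (d, K) weight matrix."""
    logits = z @ W
    logits = logits - logits.max()   # shift for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Illustrative stand-in values: an 8-dim latent draw mapped to 10 topics.
rng = np.random.default_rng(0)
z = rng.standard_normal(8)
W = rng.standard_normal((8, 10))
theta = gsm_topic_proportions(z, W)   # document-topic proportions
```

In the full model, the decoder reconstructs the document's word counts from these proportions, and the encoder producing `z` is trained with the usual variational objective.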
Traditionally, it applies a suitable kernel in minimizing the Maximum Mean Discrepancy (MMD) to perform distribution matching; we call this variant WTM-MMD. Similarly, we can replace the MMD prior with a Gaussian mixture prior and apply a Gaussian softmax on top of it; we denote this method WTM-GMM. To tackle the issue of large and heavy-tailed vocabularies, the Embedded Topic Model (ETM) [12] models each word with a categorical probability distribution given by the inner product between a word embedding and a vector embedding of its assigned topic. To avoid imposing improper priors, the Bidirectional Adversarial Training Model (BATM) applies bidirectional adversarial training to neural topic modeling by constructing a two-way projection between the document-word distribution and the document-topic distribution [13] . Figure 1 outlines the analytic framework. During a session, the dialogue between the patient and therapist is transcribed into pairs of turns (such as the example in Figure 2 ). We take the full records of a patient, or of a cohort of patients belonging to the same condition, and either use them as-is before feature extraction or truncate them into segments based on timestamps or topic turns. Since the original format consists of dialogue pairs, we can extract features in three ways: use the full pairs of dialogues; extract only what the patient says; or extract only what the therapist says. Each feature format has its pros and cons. The dialogue format contains all the information, but the intents within the sentences come from two individuals and may mix together. The patient format contains the full narrative of the patient, which is usually more coherent, but it is only part of the story.
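The three feature formats described above can be sketched as follows. The session structure (a list of patient/therapist turn pairs) and the function name are illustrative assumptions, not the authors' code.

```python
def extract_features(session, mode="dialogue"):
    """Flatten a list of (patient_turn, therapist_turn) pairs into one
    text document according to the chosen feature format:
    'dialogue' keeps both speakers, 'patient' and 'therapist' keep one."""
    if mode == "dialogue":
        texts = [turn for pair in session for turn in pair]
    elif mode == "patient":
        texts = [patient for patient, _ in session]
    elif mode == "therapist":
        texts = [therapist for _, therapist in session]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return " ".join(texts)

# Toy session: two dialogue pairs.
session = [("I feel anxious.", "When did that start?"),
           ("Last week.", "Tell me more.")]
```

For example, `extract_features(session, "patient")` yields the patient-only narrative `"I feel anxious. Last week."`, which would then feed the topic model.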
The therapist format, which people in computational psychiatry also believe to be a kind of semantic labeling of what the patient feels, can be informative, but it can also sometimes be too simplistic. Once we have the features, we fit the topic models on them. The end result of topic modeling is a list of weighted topic words that tells us what the text block is concerned with. This information is usually very informative and interpretable, and thus important in psychotherapy applications. A few downstream tasks and user scenarios can be plugged into our analytical framework. We can use the extracted weighted topics to inform whether the therapy is going in the right direction, whether the patient is entering a certain bad mental state, or whether the therapist should adjust his or her treatment strategy; this can be built into an intelligent AI assistant that reminds the therapist of such things. Some topics can also be off-limit taboos, such as those in suicidal conversations, so if such terms arise from the topic modeling (say, a dynamic topic modeling), they can be flagged for the doctor to notice.

Algorithm 1 Temporal Topic Modeling (TTM)
1: Input: learned topics T as references
2: for turn i = 1, 2, ..., N do
3:   for each topic Tj in T do
4:     topic score W_ij^t = similarity(Emb(Tj), Emb(S_i^t))
5:   end for
6: end for

Given the learned topics, we can backtrack through the transcript to get turn-resolution topic scores. Algorithm 1 outlines the pipeline of our temporal topic modeling (TTM) analysis. If we have learned, say, 10 topics, the topic score of each turn will be a 10-dimensional vector, with each dimension corresponding to some notion of the likelihood that the turn belongs to that topic.
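Algorithm 1's double loop collapses to a single matrix operation once turns and topics are embedded. The sketch below assumes pre-computed embeddings (the function name and array layout are our own); it returns the turn-by-topic cosine-similarity score matrix described above.

```python
import numpy as np

def temporal_topic_scores(turn_embs, topic_embs):
    """Cosine similarity between each embedded turn and each embedded
    topic. Input shapes: (n_turns, d) and (n_topics, d); output is the
    (n_turns, n_topics) score matrix W of Algorithm 1."""
    T = np.asarray(turn_embs, dtype=float)
    K = np.asarray(topic_embs, dtype=float)
    # Normalize rows so the dot product equals cosine similarity.
    T = T / np.linalg.norm(T, axis=1, keepdims=True)
    K = K / np.linalg.norm(K, axis=1, keepdims=True)
    return T @ K.T
```

Row `i` of the result is the topic-score vector for turn `i`; stacking rows over a session gives the time series used in the temporal analysis.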
Because we want to characterize the directional alignment of each turn with a given topic, we compute the cosine similarity of the embedded topic vector and the embedded turn vector, instead of directly inferring a probability as in the traditional topic-assignment problem (which would be more suitable if we merely wanted the single most likely topic assignment). In the results section, we present the temporal modeling of the Embedded Topic Model (ETM), but this analytic pipeline can in principle be applied to any learned topic model. The ETM is special because, like our approach here, it also models each word with a categorical distribution whose natural parameter is the inner product between a word embedding and an embedding of its assigned topic. We use the same word embedding (Word2Vec [16] ) to embed our topics and turns. The Alex Street Counseling and Psychotherapy Transcripts dataset 1 consists of transcribed recordings of over 950 therapy sessions between multiple anonymized therapists and patients. This multi-part collection includes transcripts of recordings from real therapy sessions, 40,000 pages of client narratives, and 25,000 pages of reference works. The sessions belong to three types of psychiatric conditions: anxiety, depression and schizophrenia. Each patient response turn S_i^p followed by a therapist response turn S_i^t is treated as a dialogue pair. In total, these materials include over 200,000 patient and therapist turns and provide access to a broad range of clients for our linguistic analysis of the therapeutic process of psychotherapy. In this section, we compare the five state-of-the-art neural topic modeling approaches introduced above and analyze their learned topics.
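A common way to embed a topic (its top words) or a turn (its tokens) with a shared word embedding is to average the word vectors. The tiny vector table below is a hypothetical stand-in for a pretrained Word2Vec model, used only to make the averaging step concrete.

```python
import numpy as np

# Hypothetical 2-d word vectors standing in for a pretrained Word2Vec
# model; in practice both topics and turns share one real embedding.
wv = {"fear": np.array([1.0, 0.0]),
      "worry": np.array([0.9, 0.1]),
      "sleep": np.array([0.0, 1.0])}

def embed(tokens, wv, dim=2):
    """Embed a token list as the mean of its in-vocabulary word vectors;
    fall back to a zero vector if nothing is in vocabulary."""
    vecs = [wv[t] for t in tokens if t in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
```

Embedding the topic word list and the turn token list this way produces the two vectors whose cosine similarity yields the topic score.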
We separate the transcript sessions into three categories based on the patients' psychiatric conditions (anxiety, depression and schizophrenia), and train the topic models on each of them for over 100 epochs at a batch size of 16. As in standard preprocessing for topic model training, we set the lower bound on the count of words kept in topic training to 3, and the upper-bound ratio on the count of words kept in topic training to 0.3. The optimization and evaluation procedure follows the same implementation as [13] 2 . Topic models are usually evaluated with the likelihood of held-out documents and topic coherence. However, it has been shown that a higher likelihood of held-out documents does not necessarily correlate with human judgments of topic coherence [17] . Therefore, we adopt a series of better-validated measurements of topic coherence and diversity, following [14] . In the first evaluation, we compute four topic embedding coherence metrics (cv, cw2v, cuci, cnpmi) to evaluate the topics generated by the various models (as outlined in [14] ); the higher these measurements, the better. In all experiments, each topic is represented by the top 10 words according to the topic-word probabilities, and the four metrics are calculated using the Gensim library [18] 3 . Beyond these four topic embedding coherence evaluations provided by the Gensim package, we also include two other useful metrics. [15] proposed a robust and automated coherence evaluation metric that relies on neither additional human annotations nor reference collections outside the training set; this method computes an asymmetrical confirmation measure between top word pairs (smoothed conditional probability). In addition, we compute topic diversity as the ratio between the size of the vocabulary among the topic words and the total number of words in the topics. Again, the higher these two measures, the better the topic model.
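The topic diversity measure just described reduces to a few lines. A minimal sketch, with an illustrative set of topics (top-3 words each for brevity; the paper uses top-10):

```python
def topic_diversity(topics):
    """Ratio of unique words across all topics' top-word lists to the
    total number of topic words. 1.0 means no word is shared between
    topics; lower values indicate more overlap."""
    all_words = [word for topic in topics for word in topic]
    return len(set(all_words)) / len(all_words)

# Illustrative topics: "work" repeats across two of them.
topics = [["family", "parent", "child"],
          ["work", "money", "house"],
          ["work", "stress", "sleep"]]
```

Here 8 unique words appear among 9 total, giving a diversity of 8/9; a model whose topics all recycle the same frequent words would score much lower.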
Tables 1 and 2 summarize the quantitative evaluations. We first observe that the different coherence measures give different rankings of the topic models, but a few models perform relatively well across the metrics. For instance, the Wasserstein-based Topic Models and the Embedded Topic Model both yield relatively high topic coherence and topic diversity. We then examine the interpretability of the topic models by plotting the learned topics as word clouds (Figure 3 ). Each word cloud represents the word frequencies in the 10 learned topics. If we see many big word clusters, the diversity of the topics tends to be low but the coherence higher; however, in this visualization we are interested not in coherence and diversity but in the represented words themselves. We observe that these neural topic models tend to locate reasonable topics, such as family topics (parent, child, mom, sister), emotion topics (like, think, kind, know), finance topics (work, buy, house, spend) and time-related topics (sunday, party, birthday, saturday, moment). This shows the promise of the topic modeling approach in psychotherapy. As introduced earlier, in this section we present results on temporal modeling built upon the learned Embedded Topic Model. Given the topics learned in the last step, we can compute a 10-dimensional topic score for each turn, corresponding to the 10 topics; the higher the score, the more positively correlated the turn is with that topic. Given this time-series matrix, we can probe the dynamics of these dialogues within the topic space. As shown in Figure 4 , we plot the average trajectories of the different psychiatric conditions across the space of topics 0, 1 and 2. We notice that the trajectories of the patient and therapist are more separable from one another in anxiety and depression sessions, but more entangled in the schizophrenia sessions.
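The averaging step behind these condition-level trajectories can be sketched as follows, assuming each session's topic scores form an (n_turns, n_topics) matrix and sessions within a condition have been truncated or padded to a common turn count (the grouping-by-dict layout is our own, not the authors' code).

```python
import numpy as np

def average_trajectories(score_mats_by_condition):
    """Average the (n_turns, n_topics) score matrices of all sessions
    within each condition, giving one mean trajectory per condition
    for plotting through the topic space."""
    return {cond: np.mean(np.stack(mats), axis=0)
            for cond, mats in score_mats_by_condition.items()}
```

Each resulting matrix traces a mean path over turns; plotting three of its topic columns against each other gives a trajectory like those in the figure.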
This is a first step toward a turn-level-resolution temporal analysis of topic modeling. The approach also generalizes: a therapist could go over his or her own sessions with it and analyze the dynamics afterwards. We also perform a four-way ANOVA on the topic scores as time-series sequences. Figure 5 demonstrates how the dynamics of the topics differ across the psychiatric conditions. We observe that they vary by both disorder and topic, and there appear to be certain trends and interesting dynamics along the temporal dimension (the x-axis in each subplot). This suggests that the topics parsed by this topic model are diverse with respect to their dynamics and can potentially offer useful insights. We can then better understand what these topics are by parsing out the highest-scoring turns in the transcripts that correspond to each topic. For instance, here are the interpretations from the top-scoring turns in the anxiety sessions: topic 0 is mostly vocal sounds; topic 1 is low-energy exercises; topic 2 is fear; topic 3 is medication planning; topic 4 is the past, control and worry; topic 5 is other people and some objects; topic 6 is general well-being; topic 7 is music, headache and emotion; topic 8 is stress; and topic 9 is fear and responsibilities. For depression, topic 0 is time; topic 1 is husband and anger; topic 2 is time and distance; topic 3 is energy and stress levels; topic 4 is self-esteem; topic 5 is money and time; topic 6 is age and time; topic 7 is mood and time; topic 8 is other people and objects; topic 9 is holidays and vocal sounds.
For schizophrenia, topic 0 is family; topic 1 is extreme terms ("exactly", "anybody", "everyone"); topic 2 is energy level and positivity; topic 3 is people and family; topic 4 is operational matters; topic 5 is calming things ("safe", "nice to meet people", "it's nice", "meet again"); topics 6 and 9 are hard topics ("house arrest", "self esteem", "anybody in particular", "spend money", "what's wrong with Texas"); topics 7 and 8 are others. In this work, our first goal is to compare different neural topic modeling methods in learning the topical propensities of different psychiatric conditions. We observe that the different coherence measures give different rankings of the topic models, but a few topic models perform relatively well across metrics; for instance, the Wasserstein-based Topic Models and the Embedded Topic Model both yield relatively high topic coherence and diversity. Our second goal is to parse topics in different segments of the session, which allows us to incorporate temporal modeling and add interpretability. For instance, this allows us to notice that the session trajectories of the patient and therapist are more separable from one another in anxiety and depression sessions, but more entangled in the schizophrenia sessions. This is a first step toward a turn-level-resolution temporal analysis of topic modeling. We believe this topic modeling framework can offer interpretable insights for the therapist to improve psychotherapy effectiveness. Next steps include predicting these topic scores as states (such as [19] ), training chatbots as reinforcement learning agents given these states (like [20, 21, 22] ), and studying their relation with other inference anchors (e.g., working alliance [23] ).
[1] Changes in mental health among psychiatric patients during the COVID-19 pandemic in Hong Kong: a cross-sectional study
[2] Risk factors for deliberate self-harm and suicide among adolescents and young adults with first-episode psychosis
[3] Designing personality-adaptive conversational agents for mental health care
[4] Natural language processing in psychiatry: the promises and perils of a transformative approach
[5] From ELIZA to XiaoIce: challenges and opportunities with social chatbots
[6] A brief history of chatbots
[7] Beyond LDA: exploring supervised topic modeling for depression-related language in Twitter
[8] Synonym, topic model and predicate-based query expansion for retrieving clinical documents
[9] Neural variational inference for text processing
[10] Discovering discrete latent topics with neural variational inference
[11] Topic modeling with Wasserstein autoencoders
[12] Topic modeling in embedding spaces
[13] Neural topic modeling with bidirectional adversarial training
[14] Exploring the space of topic coherence measures
[15] Optimizing semantic coherence in topic models
[16] Distributed representations of words and phrases and their compositionality
[17] Reading tea leaves: how humans interpret topic models
[18] Gensim: statistical semantics in Python
[19] Predicting human decision making in psychological tasks with recurrent neural networks
[20] A story of two streams: reinforcement learning models from human behavior and neuropsychiatry
[21] Unified models of human behavioral agents in bandits, contextual bandits and RL
[22] Models of human behavioral agents in bandits, contextual bandits and RL
[23] Deep annotation of therapeutic working alliance in psychotherapy