key: cord-0574518-i11l0i9r authors: Alambo, Amanuel; Padhee, Swati; Banerjee, Tanvi; Thirunarayan, Krishnaprasad title: COVID-19 and Mental Health/Substance Use Disorders on Reddit: A Longitudinal Study date: 2020-11-20 journal: nan DOI: nan sha: 515f8e2eb529bf9150067dd585e8e59dfd1e3cdb doc_id: 574518 cord_uid: i11l0i9r

The COVID-19 pandemic has adversely and disproportionately impacted people suffering from mental health issues and substance use problems. This has been exacerbated by social isolation during the pandemic and by the social stigma associated with mental health and substance use disorders, which makes people reluctant to share their struggles and seek help. Because of the anonymity and privacy it provides, social media has emerged as a convenient medium for people to share their day-to-day struggles. Reddit is a well-recognized social media platform that provides focused and structured forums called subreddits, which users subscribe to in order to discuss their experiences with others. Temporal assessment of the topical correlation between social media postings about mental health/substance use and postings about Coronavirus is crucial to better understand public sentiment on the pandemic and its evolving impact, especially on vulnerable populations. In this study, we conduct a longitudinal topical analysis of postings between subreddits r/depression, r/Anxiety, r/SuicideWatch, and r/Coronavirus, and postings between subreddits r/opiates, r/OpiatesRecovery, r/addiction, and r/Coronavirus from January 2020 to October 2020. Our results show a high topical correlation between postings in r/depression and r/Coronavirus in September 2020. Further, the topical correlation between postings on substance use disorders and Coronavirus fluctuates, showing the highest correlation in August 2020. 
By monitoring these trends on platforms such as Reddit, epidemiologists and mental health professionals can gain insights into the challenges faced by communities and use them to inform targeted interventions.

The number of people suffering from mental health or substance use disorders has significantly increased during the COVID-19 pandemic. In June 2020, 40% of adults in the United States were identified as suffering from disorders related to depression or drug abuse (footnote 1). In addition to the uncertainty about the future during the pandemic, policies such as social isolation that were enacted to contain the spread of COVID-19 have placed additional physical and emotional stress on the public. During these unpredictable and hard times, those who misuse or abuse alcohol and/or other drugs can be especially vulnerable. Because of the stigma surrounding mental health and substance use, people generally do not share their struggles with others, and this is further aggravated by the lack of physical interaction during the pandemic. With most activities moving online, and given the privacy and anonymity they offer, social media platforms have become a common place for people to share their struggles with depression, anxiety, suicidal thoughts, and substance use disorders. Reddit is a widely used social media platform that offers users convenient access to discussions on sensitive topics such as mental health or substance use. The forum-like structure of subreddits enables users to discuss a specific topic with others and to seek advice without disclosing their identities. We conduct an initial longitudinal study of the extent of topical overlap between user-generated content on mental health and substance use disorders and content on COVID-19 during the period from January 2020 to October 2020. For mental health, our study focuses on subreddits r/depression, r/Anxiety, and r/SuicideWatch. 
Similarly, for substance use, we use subreddits r/Opiates, r/OpiatesRecovery, and r/addiction. We use subreddit r/Coronavirus for extracting user postings on Coronavirus. To constrain our search for relevance, we collect postings in mental health/substance use subreddits that contain at least one of the keywords in a Coronavirus dictionary. Similarly, to collect postings related to mental health/substance use in r/Coronavirus, we use the DSM-5 lexicon [7], PHQ-9 lexicons [12], and the Drug Abuse Ontology (DAO) [3]. We use a topic modeling algorithm, latent Dirichlet allocation (LDA) [2], to generate topics. Furthermore, we explore two variations of the Bidirectional Encoder Representations from Transformers (BERT) [5] model for representing the topics and computing topical correlation among different pairs of subreddits on mental health/substance use and r/Coronavirus. The topical correlations are computed for each month from January 2020 to October 2020.

The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 presents our method, including data collection, linguistic analysis, and model building. Section 4 presents our results, which show higher correlation between topics discussed in a mental health or substance use subreddit and topics discussed in a Coronavirus subreddit after June 2020 than during the first five months of 2020. Finally, Section 5 concludes the paper and outlines future work.

In the last few months, there has been a high number of cases and deaths related to COVID-19, which led governments to respond rapidly to the crisis [10]. Topic modeling of social media postings related to COVID-19 has been used to produce valuable information during the pandemic. While Yin et al. [13] studied trending topics, Medford et al. [8] studied the change in topics on Twitter during the pandemic. Stokes et al. 
[11] applied topic modeling to Reddit content and found it effective in identifying patterns of public dialogue to guide targeted interventions. Furthermore, there has been a growing amount of work on the relationship between mental health or substance use and COVID-19. While [6] studied the prevalence of depressive symptoms in US adults before and during the COVID-19 pandemic, [1] compared the susceptibility to stressors that might lead to mental disorder between people with existing anxiety disorders and the general population. [4] assessed mental health, substance use, and suicidal ideation using panel survey data collected from US adults in June 2020. They observed that people with pre-existing mental disorders are more likely to be adversely affected by the different stressors during the COVID-19 pandemic. We propose an approach to study the relationship between topics discussed in mental health/substance use subreddits and the Coronavirus subreddit.

In this study, we crawl Reddit for user postings in subreddits r/depression, r/Anxiety, r/SuicideWatch, r/Opiates, r/OpiatesRecovery, r/addiction, and r/Coronavirus. To keep our queries focused so that relevant postings are returned from each category of subreddits, we use mental health/substance use lexicons while crawling for postings in subreddit r/Coronavirus; similarly, we use the WebMD Coronavirus glossary of terms (footnote 2) to query for postings in the mental health/substance use subreddits. Table-1 shows the size of the data collected for each subreddit over three periods of three to four months each. We build a corpus of user postings from January 2020 to October 2020 for each of the subreddits. For better interpretability during topic modeling, we generate bigrams and trigrams from the collection of postings for each month using gensim's implementation of the skip-gram model [9]. 
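The bigram and trigram generation step can be illustrated with a small, self-contained sketch. It does not use gensim and is not the authors' code: it scores adjacent word pairs with the count-based phrase score from [9], (count(ab) - delta) / (count(a) * count(b)), and merges pairs that score above a threshold. The function names `find_phrases` and `merge_phrases`, the parameter values, and the toy postings are all hypothetical. Running the procedure a second time over the merged tokens yields trigrams.

```python
from collections import Counter


def find_phrases(sentences, min_count=2, threshold=0.1, delta=1):
    """Return adjacent word pairs whose phrase score exceeds `threshold`.

    Score follows the count-based formula in [9]:
        (count(a b) - delta) / (count(a) * count(b))
    min_count, threshold and delta are illustrative values, not the paper's.
    """
    unigrams = Counter(word for sent in sentences for word in sent)
    bigrams = Counter(pair for sent in sentences for pair in zip(sent, sent[1:]))
    phrases = set()
    for (a, b), n_ab in bigrams.items():
        if n_ab < min_count:
            continue  # ignore rare pairs
        score = (n_ab - delta) / (unigrams[a] * unigrams[b])
        if score > threshold:
            phrases.add((a, b))
    return phrases


def merge_phrases(sentence, phrases, sep="_"):
    """Greedily join detected pairs, scanning left to right."""
    merged, i = [], 0
    while i < len(sentence):
        if i + 1 < len(sentence) and (sentence[i], sentence[i + 1]) in phrases:
            merged.append(sentence[i] + sep + sentence[i + 1])
            i += 2
        else:
            merged.append(sentence[i])
            i += 1
    return merged


# Toy tokenized postings (hypothetical data).
posts = [
    ["social", "distancing", "rules", "are", "hard"],
    ["social", "distancing", "rules", "changed"],
    ["i", "feel", "anxious", "about", "social", "distancing", "rules"],
]

# First pass: detect and merge bigrams.
bigram_posts = [merge_phrases(p, find_phrases(posts)) for p in posts]
# Second pass over the merged tokens: detect and merge trigrams.
trigram_posts = [merge_phrases(p, find_phrases(bigram_posts)) for p in bigram_posts]
print(trigram_posts[0])
```

Merged tokens such as `social_distancing_rules` then act as single vocabulary items for the topic model, which is what makes the resulting topic keywords easier to interpret.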
We then train an LDA topic model with the objective of maximizing the coherence scores over the collections of bigrams and trigrams. As we are interested in computing the topical correlation between topics in a mental health/substance use subreddit and r/Coronavirus, we use deep representation learning to represent a topic from its constituent keywords. We employ transformer-based bidirectional language models of two kinds: 1) a language model that is pre-trained on a huge generic corpus; and 2) a language model that we tune on a domain-specific corpus. Thus, we experiment with two approaches:

1. We use a vanilla BERT [5] model to represent each of the keywords in a topic. A topic is then represented as a concatenation of the representations of its keywords, after which we perform dimensionality reduction to 300 units using t-SNE.
2. We fine-tune a BERT model on a sequence classification task on our dataset, where user postings from a mental health/substance use subreddit or r/Coronavirus are labeled positive and postings from a control subreddit are labeled negative. For subreddits r/depression, r/Anxiety, and r/SuicideWatch, we fine-tune one BERT model, which we call MH-BERT, and for subreddits r/opiates, r/OpiatesRecovery, and r/addiction, we fine-tune a different BERT model, which we call SU-BERT. We do the same for subreddit r/Coronavirus. Finally, the fine-tuned BERT model is used for topic representation.

Once topics are represented using a vanilla BERT or MH-BERT/SU-BERT embedding, we compute inter-topic similarities between topics in an MH/SU subreddit and topics in subreddit r/Coronavirus for each month from January 2020 to October 2020.

We report our findings using both vanilla BERT and a fine-tuned BERT model for topic representation. Figure-1 and Figure-2 show the topical correlation results using vanilla BERT and a fine-tuned BERT model. 
We can see from the figures that there is a significant topical correlation between postings in a mental health subreddit and postings in r/Coronavirus during the period from May 2020 to September 2020, with each of the mental health subreddits peaking in a different month. For substance use, we see higher topical correlation after June 2020. While the results using a fine-tuned BERT model show trends similar to those of vanilla BERT, they give higher values for the topical correlation scores. We present one pair of topic groups that has low topical correlation and another pair that has high topical correlation. To illustrate low topical correlation, we show the topics generated for r/OpiatesRecovery and r/Coronavirus during April to June (Table-2). For high topical correlation, we show topics in r/SuicideWatch and r/Coronavirus for the period June to August (Table-3). From Figure-1 and Figure-2, we see that Coronavirus vs. depression has the highest topical correlation in September, followed by May. On the other hand, the fine-tuned BERT model gives larger absolute topical correlation scores than vanilla BERT, although the topics and keywords are the same under both representation techniques; i.e., the same keywords in a topic receive different representations under the vanilla BERT and fine-tuned BERT models. The different representations of the keywords, and hence of the topics, yield different topical correlation scores, as seen in Figure-1 and Figure-2. The reason we generally see higher topical correlation scores with a fine-tuned BERT representation is that a fine-tuned BERT has a smaller semantic space than a vanilla BERT model, so keywords across different topics lie at smaller semantic distances from one another. According to our analysis, high topical overlap implies a close connection and mutual impact between postings in one subreddit and postings in another subreddit. 
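A topical correlation score of the kind discussed above can be sketched as follows. The text does not specify the similarity measure or how scores are aggregated across topics, so this sketch assumes cosine similarity between topic vectors and a plain mean over all cross-subreddit topic pairs. The function names `cosine` and `topical_correlation` and the toy vectors are hypothetical; in practice the vectors would be the BERT-based topic representations.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for zero vectors
    return dot / (norm_u * norm_v)


def topical_correlation(topics_a, topics_b):
    """Mean pairwise cosine similarity between two sets of topic vectors.

    Assumption: averaging over all pairs; the paper does not state
    how inter-topic similarities are aggregated.
    """
    sims = [cosine(u, v) for u in topics_a for v in topics_b]
    return sum(sims) / len(sims)


# Toy 2-dimensional topic vectors for one month (hypothetical data).
mental_health_topics = [[1.0, 0.0], [0.0, 1.0]]
coronavirus_topics = [[1.0, 0.0]]
print(topical_correlation(mental_health_topics, coronavirus_topics))
```

Under this formulation, a representation that places all keywords closer together raises every pairwise cosine similarity and therefore the averaged score, which is consistent with the higher scores reported for the fine-tuned model.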
In this study, we conducted a longitudinal study of the topical correlation between social media postings in mental health or substance use subreddits and a Coronavirus subreddit. Our analysis reveals that the period including and following Summer 2020 shows higher correlation between topics discussed by users in mental health or substance use groups and those discussed in r/Coronavirus. Our analysis can give insight into how the sentiment of social media users in one group can influence or be influenced by users in another group. This makes it possible to capture and understand the impact of topics discussed in r/Coronavirus on other subreddits over time. In the future, we plan to investigate user-level and posting-level features to further study how the collective sentiment of users in one subreddit relates to that in another subreddit. Our study can provide insight into how discussions of mental health/substance use and the Coronavirus pandemic relate to one another over time, in support of epidemiological intervention.

[1] Do pre-existing anxiety-related and mood disorders differentially impact COVID-19 stress responses and coping
[2] Latent Dirichlet allocation
[3] PREDOSE: a semantic web platform for drug abuse epidemiology using social media
[4] Mental health, substance use, and suicidal ideation during the COVID-19 pandemic - United States
[5] BERT: Pre-training of deep bidirectional transformers for language understanding
[6] Prevalence of depression symptoms in US adults before and during the COVID-19 pandemic
[7] "Let me tell you about your mental health!" Contextualized classification of Reddit posts to DSM-5 for web-based intervention
[8] An "infodemic": Leveraging high-volume Twitter data to understand public sentiment for the COVID-19 outbreak
[9] Distributed representations of words and phrases and their compositionality
[10] The neglected dimension of global security - a framework for countering infectious-disease crises
[11] Public priorities and concerns regarding COVID-19 in an online discussion forum: Longitudinal topic modeling
[12] Semi-supervised approach to monitoring clinical depressive symptoms in social media. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
[13] Detecting topic and sentiment dynamics due to COVID-19 pandemic using social media