key: cord-0191232-akdb16ha
authors: Alvarez, Richard; Bhatt, Paras; Zhao, Xingmeng; Rios, Anthony
title: Turning Stocks into Memes: A Dataset for Understanding How Social Communities Can Drive Wall Street
date: 2022-03-16
journal: nan
DOI: nan
sha: 870c1c8341f6e5c9589be7368ec76a80a08f5bcd
doc_id: 191232
cord_uid: akdb16ha

Who actually expresses an intent to buy GameStop shares on Reddit? What convinces people to buy stocks? Are people convinced to support a coordinated plan to adversely impact Wall Street investors? Existing literature on understanding intent has mainly relied on surveys and self-reporting; however, these methodologies have limitations. Hence, in this paper, we develop an annotated dataset of communications centered on the GameStop phenomenon to analyze subscribers' intentions to buy (or not buy) stocks within the r/WallStreetBets community. Likewise, we curate a dataset to better understand how intent interacts with a user's general support for the community's coordinated actions around GameStop. Overall, our dataset can provide insight to social scientists on how adopting a common language and narrative can persuade people to buy into social movements online. WARNING: This paper contains offensive language that commonly appears on Reddit's r/WallStreetBets subreddit.

There has been substantial research in persuasion and social engineering, with a particular interest in how individuals can be convinced to behave in a certain way, buy a product, or support some idea. For instance, Caldas et al. use surveys to understand how the political framing of online discourse can impact "buy-in" to specific ideas. Likewise, Wang et al. evaluate how content (e.g., videos and text) and content creator characteristics affect the likelihood that people support or purchase a product. An interesting research direction is to understand the persuasive techniques used on social media platforms (e.g., Reddit) to get users to buy a product or support a specific social campaign. Before we can understand what drives persuasion, we must detect who intends to buy a product or at least supports the general ideology behind it. Hence, in this paper, we develop a new resource that can help researchers better understand purchase intentions and expressions of support on Reddit.

Surveys and interviews have been used in prior research to understand persuasion, which requires knowing the user's intent to buy products or support campaigns. For example, research in online marketing focuses on measuring the persuasiveness of an argument using self-reported surveys (Caldas et al. 2019; Gerlach, Buxmann, and Dinev 2019), or by quantifying how responsive individuals are to tailored arguments in an online survey (Axt, Landau, and Kay 2020; Ormond, Warkentin, and Crossler 2019; Wang et al. 2021b). While there have been recent advances in understanding persuasion and attitudes towards products and marketing campaigns (Wang et al. 2021a; Pignot, Nicolini, and Thompson 2020), the studies are not entirely practical (e.g., using executive messaging for interviews (Pignot, Nicolini, and Thompson 2020)) or do not extract direct intentions to purchase (Wang et al. 2021a). Hence, the motivation for this paper is to provide social researchers with a dataset to extract social support for an online campaign as well as users' intentions, thereby providing researchers with a new way to conduct quantitative studies about buying intentions and support, which can potentially be used for downstream research on persuasion and social manipulation.

Figure 1: An example conversation on Reddit where a user is convinced to buy GME stock.

Specifically, we introduce a new dataset that uses comments from the Reddit community r/WallStreetBets (WSB), which substantially influenced GameStop in early 2021. We use GME throughout the paper to refer to the GameStop stock; GME is the ticker symbol for GameStop. An example conversation that may appear on r/WallStreetBets is shown in Figure 1. We can see that the last comment directly mentions someone's positive intent to buy a product (GME shares). One specific use case for our dataset is to facilitate the development of machine learning models to extract these direct intentions. Our aim for this dataset is to help researchers study both direct intent (General Intent) and the gradual evolution of persuasion by measurements of supporting metrics (General Support) tied to social manipulation. For instance, in Figure 1 the user was persuaded to buy the GME shares. Is this a one-off event, or are there specific patterns that cause someone to buy? Hence, an important use case of our dataset is to train a model that predicts intent and use it to identify all positive and negative purchase intentions, which can then serve as a dependent variable for various discourse analyses.

Particularly, our dataset captures both purchase Intent as well as varying degrees of Support for GME and related campaigns. As previously stated, Intent is related to whether someone actually intends to purchase (or has already purchased) GME shares, while Support is focused on the general anti-Wall Street narrative. Specifically, the GameStop narrative grew on Reddit into a David vs. Goliath-esque narrative between the Redditors (representing ordinary investors) and big hedge funds (representing faceless corporations). Starting from December 2020, WSB saw an explosion of threads and comments centered on GameStop, a struggling videogame retail company. Rather than simply being a stock recommendation, the stock experienced a short squeeze in which, according to the narrative, large faceless hedge funds were trying to kill the ailing corporation GameStop by betting against its success. Whether users saw themselves as heroes defending the company and Main Street from large-scale investors by forcing the price up, wanted to "stick it to the establishment" (Business Insider 2021) and exact revenge for the 2008 market collapse, or simply exhibited the fear of missing out (referred to as FOMO on WallStreetBets), users were persuaded to purchase the stock. For this dataset, we collected data on developing events of GME, the favorite stock of WallStreetBets during the period.

What makes this circumstance particularly interesting from a persuasion perspective is that each individual who ends up purchasing GameStop stock is heavily incentivized to sell when it is high (prior to the event, the stock was worth only $18 a share, while it peaked at more than $480 per share). Yet, even after seeing more than a tenfold increase in price (and therefore considerable profit), individuals were persuaded to hold onto the stock together. The longer retail investors held, the more damage could be done to the hedge funds and the greater the profit that could be extracted. The factors that contribute to this decentralized unity amongst the members of these online communities, which was strong enough to cause individuals to ignore the financial incentives of selling out, are interesting. Therefore, an individual would have to be convinced to "buy-in" to the investment opportunity. We hope that our dataset to detect intent and support can be used to understand such phenomena.

Overall, this dataset and paper make several contributions to the existing literature. First, we provide an easily accessible, annotated dataset of peer-to-peer conversations online between anonymous and semi-anonymous individuals. Such availability will assist researchers, particularly with the replicability and transparency of their studies. Furthermore, the accessibility of the dataset will hopefully encourage further studies into social intent and online social manipulation. One advantage this study has over similar ones is the verification of intent. Individuals who express intent to engage in an action may not always follow through. With this dataset, it would be possible to track intent and the eventual completion of the action (a transition not easily traceable in existing literature). Second, we provide the results of several baseline models, showing that models trained on our data can provide accurate predictions for Intent and Support. Third, we provide detailed future use cases for our dataset to answer interesting social science-related questions.

This section describes the main areas of research related to this paper: Stocks and Financial NLP, Aspect-Based Sentiment Analysis (ABSA), and Stance Detection.

Research in forecasting stock market activity has been a mainstay of natural language processing (NLP) studies that leverage content from the financial industry. For example, Das and Chen use NLP to facilitate "news-based trading," wherein analysts seek to isolate financial news that affects stock prices and/or market activity. Seo, Giampapa, and Sycara used NLP with various combinations of feature extraction (e.g., Latent Semantic Analysis), a Naive Bayes classifier, and a weighted-majority voting ensemble to analyze news articles, with the optimal combination yielding 79% accuracy in classifying articles that signaled an increase or decrease in stock prices. Similarly, Seo, Giampapa, and Sycara processed text from web-based stock-discussion bulletin boards, analyzed the output using a Naive Bayes classifier and multiple runs of a genetic algorithm, and generated significant (p<0.0001) excess returns.

In another study, Yıldırım et al. classified financial news articles as "hot" (significant) and "non-hot" (nonsignificant) to study their impact as predictors in stock price forecasting. Over time, multiple NLP-based approaches were used to explore the predictive value of various international accounting and finance-related text sources. Zhai, Hsu, and Halgamuge applied part-of-speech features and TF-IDF weighting, combined with supervised SVM classifiers using Gaussian radial-basis-function and polynomial kernels, to confirm correlations between the textual content of financial news articles and stock-price trends in Australia. Lugmayr and Gossen proposed analyzing broker newsletters with a German-language sentiment analysis SVM (LUGO Sentiment Indicator) to predict trading activity levels of the Deutscher Aktienindex (DAX 30), the German stock index. Hagenau, Liebmann, and Neumann employed bi-normal separation-based feature selection, combined with an SVM classifier, to predict stock-price changes signaled by German financial news with 71.8% precision. Argentine and Brazilian currency trends were successfully predicted by Jin et al. using their Forex-Foreteller system, which employed topic clustering, sentiment analysis (based on the Loughran-McDonald and AFINN sentiment-analysis dictionaries), and regression analysis.

Finally, there has been a recent surge in research exploring the r/WallStreetBets community. For example, Buz and de Melo evaluate whether people should take investment advice from the community, finding that many buy signals on Reddit can result in gains. Wang and Luo use data from r/WallStreetBets to predict stock movement. Mendoza-Denton qualitatively explores r/WallStreetBets' general sentiment towards the anti-establishment narrative. Overall, our work expands on prior work by considering both the task of predicting whether someone intends to buy a particular stock and the task of extracting their Support for the GME-related "take down the establishment" campaign for quantitative analysis. Our dataset allows for analyses of both financial interest and general political/social ideology. Moreover, our work is less focused on how social behaviors affect the company and more on how the individual is persuaded to make the purchase at a micro level. Ultimately, our dataset will support subsequent research on conversation-level and user-level persuasion and purchase intent on Reddit.

There has been substantial work in understanding various incarnations of Support in the NLP community. Support can be thought of as some sort of valence (sentiment) towards a specific entity (ABSA) or simply whether someone agrees or disagrees with a specific topic (Stance Detection). Demszky et al. apply ABSA methods to tweets about US mass shootings, where the topic was discussed politically from different viewpoints depending on the locations of the events, with the contrasting use of terms such as "terrorist" and "crazy" contributing to polarization. Chen et al. show that most recent ABSA approaches rely on state-of-the-art supervised models that combine complex neural network layers (e.g., transformers) to classify aspects in the text along with the standard sentiment labels (i.e., positive, negative).

Stance detection refers to the task of classifying a piece of text as being in support of, in opposition to, or neutral towards a given target. The most well-known dataset for political stance detection was published for SemEval 2016 (Mohammad et al. 2016). The paper describing the dataset provides a high-level review of approaches to stance detection using Twitter data. The best user-submitted system was a neural classifier from Zarrella and Marsh, which utilized a language model pre-trained on a large amount of unlabeled data. An important contribution of this study was using pre-trained word embeddings from an auxiliary task where a language model was trained to predict a missing hashtag from a given tweet. Likewise, Wei et al. show that convolutional neural networks also perform accurately for the task.

Our work differs from prior work on ABSA and stance detection in one important aspect. Specifically, we label positive support towards an event/idea even if that idea is not explicitly mentioned within the text. For example, the comment "I'm never going to sell my GME shares!!!" would show positive support for the GME campaign, even though the campaign itself is never mentioned. Traditional ABSA methods may only calculate a sentiment score with respect to nouns in the sentence (e.g., GME shares). We note that there has been some recent work on implicit sentiment using Connotation Frames (Sap et al. 2017). However, that work is focused on verb understanding and does not perform classification over an entire comment.

In this section we discuss the methodology involved in data collection, annotation, and evaluating annotation quality.

We collected submissions and comments from the r/WallStreetBets community using the Python Pushshift.io API Wrapper (PSAW) library (Baumgartner et al. 2020). Submissions refer to descriptive posts made on the discussion board of the community by its members. Each submission has a 'title' and a corresponding 'body' of text that represents the main idea discussed within the submission. We captured these data items separately in our dataset because, compared to other social media communities (Twitter, Facebook, etc.), Reddit is more of a discussion-based forum where people can talk about anything. To interact with one another, redditors join smaller subject-based groups referred to as subreddits. We scraped the r/WallStreetBets subreddit for all threads during the period January 1, 2021, to March 1, 2021, that contained some mention of GameStop or GME. The collected data comprises two complementary datasets. The first contains the Reddit posts and includes the author, post date, ID, posting category, number of comments, author cross-posts, whether the comment was pinned, the comment score, the post submission text, the post ID number, the post title, and the ratio of upvotes to downvotes. The second tracks the accompanying post ID and lists the commenter IDs, the time and date of the comment, the reputation score of the comment, and the comments themselves. Both datasets are linked through the post ID number. Overall, we obtained a total of 71,075 submissions and 100,069 comments. From the entire dataset of comments, we randomly sampled 5,000 to annotate.
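The filtering and sampling steps can be illustrated with a minimal sketch. It does not reproduce the PSAW collection itself. It assumes each comment is a dictionary with hypothetical `body` and `created_utc` fields (the released CSV may use different column names), and it applies the keyword filter to individual comments as a simplification, whereas the collection described above filtered at the thread level.

```python
import random
import re
from datetime import datetime, timezone

# Assumption: a case-insensitive match on "GameStop" or "GME" as whole words.
GME_PATTERN = re.compile(r"\b(gamestop|gme)\b", re.IGNORECASE)

# Collection window stated in the paper, expressed as UTC epoch seconds.
START = datetime(2021, 1, 1, tzinfo=timezone.utc).timestamp()
END = datetime(2021, 3, 1, tzinfo=timezone.utc).timestamp()


def mentions_gme(text):
    """Return True if the text mentions GameStop or its ticker symbol."""
    return GME_PATTERN.search(text) is not None


def filter_comments(comments):
    """Keep comments inside the collection window that mention GameStop/GME.

    `comments` is a list of dicts with hypothetical keys "body" and
    "created_utc" (epoch seconds).
    """
    return [
        c for c in comments
        if START <= c["created_utc"] <= END and mentions_gme(c["body"])
    ]


def sample_for_annotation(comments, k=5000, seed=0):
    """Draw a reproducible random sample of at most k comments."""
    rng = random.Random(seed)
    return rng.sample(comments, min(k, len(comments)))
```

Fixing the random seed is a design choice of this sketch, made so that the same annotation sample can be regenerated later; the paper does not state how its sample was seeded.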

For this study, we first took an exploratory approach to analyze the comments from the WSB community. The comments corpus was shuffled using a Python randomization script. Initially, a pilot annotation was done with ten observations to establish relevant annotation rules. Comments were reviewed for two primary characteristics: the intent of the comment (Does the comment indicate that the author intends to purchase or has purchased GameStop during the craze period?) and the level of support (Does the comment indicate support for the WSB community or support for the David vs. Goliath narrative?). These two annotation categories were chosen to capture the level of community support as well as to identify whether the support had a tangible impact. A second annotator was brought in to annotate another set of 50 observations using the pilot rule set for refinement. After multiple revisions, both parties agreed on the finalized annotation ruleset and guidelines. The final guidelines can be found in the Appendix.

Intent Annotations. The first annotation category, Intent, refers to the general intent to purchase the stock. For this category, we wanted to capture all statements that suggest the individual plans to purchase or has already purchased GME. The category is broken into five different annotations: Yes, Maybe, Informative, Unknown, and No. We describe each category below:

• Yes indicates that there is clear intent to purchase, or that the individual has already purchased, GME shares in the recent time period.
• Maybe indicates uncertainty about whether the individual has the stock, but the context hints at a possibility of purchasing or already owning the stock.
• Informative posts are meant to capture potential moderator or bot comments. These are meant to inform users without any personal opinions or biases visible. No emotions, no sides taken, only sharing information.
• Unknown indicates that it is not clear one way or another whether the individual intends to purchase or currently owns the stock. This can serve as a catchall if the comment cannot be assigned to any other category, such as completely unrelated posts.
• No indicates a clear disinterest in, or no intention of, purchasing the stock. The individual does not and will not purchase the stock. Alternatively, the individual could be betting against the stock, hoping to bring it down.

Support Annotations. The second annotation category, Support, measures the degree of buy-in the individual has with the current narrative. We defined the narrative as support for the us vs. them mentality (that is, support for GameStop because it hurts the institutions), support for the hype (that is, support for the camaraderie, being part of the moment, or seeing it as a historical moment in the making), or, alternatively, seeing GameStop as a legitimate investment. This category was broken into the following annotation classes:

• Yes posts indicate clear support for the GME narrative, whether for GameStop as a company or for the movement. The post could also show hostility towards the counter companies.

An individual post can receive any combination of labels from these two categories. The following comment illustrates one combination found in our dataset:

"I agree, I only one .02 shares of GME, and I did that specifically for this reason. After the squeeze, HF know what to expect going forward, so the momentum here is pretty much done. I'll hold my measly little .02 because I think once COVID ends we could see potential for growth, but I feel bad for those here that put thousands into this stock after the price was over $200."

In this comment, the individual states that they own some shares of GameStop outright, so Intent is "Yes." However, they express little belief in the future success of GameStop beyond hope, showing a lack of support for, or belief in, a continued rise of GME; therefore, Support is rated as "No."

The following comment clearly supports the GME movement, as the author claims GME will moon (a term suggesting the stock will skyrocket); hence, Support is "Yes": "You will make 10% while watching everyone else moon on GME." However, we cannot confirm whether the individual owns GME stock, so Intent is classified as "Unknown."

After the annotation rules were agreed upon and the annotation guidelines were completed, annotation began on the set of 5000 randomly selected comments. Annotators completed the process independently of one another. The two annotators first completed 3000 annotations, with agreement, measured by Cohen's kappa, of .81 for Intent and .72 for Support. Following the annotation, both annotators worked together to adjudicate comments that had differing annotations. The remaining 2000 comments were then annotated by a single annotator, so agreement was not measured for them. Finally, an outside third-party annotator was given the finalized instructions and annotation guidelines and asked to annotate a random set of 100 observations to measure the external validity of the annotation process. The third-party annotator achieved a Cohen's kappa of .76 for Intent and .65 for Support when compared to the 100 adjudicated annotations. Overall, a total of 5000 comments were annotated, of which 3000 were annotated by two individuals and 100 by three individuals.
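For readers unfamiliar with the agreement statistic, Cohen's kappa compares the observed agreement p_o between two annotators with the agreement p_e expected by chance from their label distributions: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that computation; the function name and toy labels are illustrative and are not taken from our annotation scripts.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")
    n = len(labels_a)

    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over labels of the product of marginal proportions.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy example with made-up Intent labels (not from the dataset).
annotator_1 = ["Yes", "Yes", "No", "No"]
annotator_2 = ["Yes", "No", "No", "No"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

In the toy example the annotators agree on three of four items (p_o = 0.75) while chance agreement is 0.5, which gives a kappa of 0.5.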

The final dataset statistics can be found in Table 1. We find that the majority of comments have an Intent of "Unknown". However, a large proportion of comments are labeled "Yes", while "No" is the smallest Intent category. Intuitively, many comments just discuss the event without discussing an actual purchase. Still, nearly one-fifth of the comments express an intent to buy. We make similar findings for Support, except that "Yes" is the largest category instead of "Unknown". Again, this makes sense because r/WallStreetBets started the GameStop hype; hence, most of its members support it.

In this section, we describe and evaluate several baseline models. Overall, our goal is to show that models can be trained to learn the categories we annotated. If models cannot learn anything better than random guessing, the dataset will be of little use to both social scientists and computational social science researchers.

We explore four baselines on our dataset: a Linear SVM, RoBERTa, and two random baselines (Uniform and Stratified). We describe each baseline below: Linear SVM. We trained a Linear SVM using term frequency-inverse document frequency (TF-IDF) weighting of unigrams and bigrams (i.e., single words like "wsb" and pairs of words like "GameStop sucks" are used as features) and L2 regularization. TF-IDF is a statistical measure that weights how important a word is to a document within a corpus. Furthermore, we searched for the best C value from the set {0.0001, 0.001, 0.01, 0.1, 1, 10} using a validation dataset. The SVM is implemented using the LinearSVC classifier in scikit-learn (Pedregosa et al. 2011).
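A minimal sketch of this baseline in scikit-learn is shown below. It is not our exact training script: the helper name `fit_best_svm` is hypothetical, all other TfidfVectorizer and LinearSVC settings are left at their defaults, and selecting C by validation Macro F1 is an assumption, since the selection criterion is not stated above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Candidate regularization strengths listed in the paper.
C_GRID = [0.0001, 0.001, 0.01, 0.1, 1, 10]


def fit_best_svm(train_texts, train_labels, val_texts, val_labels):
    """Fit one TF-IDF + LinearSVC pipeline per C and keep the best on validation data.

    Assumption: "best" means highest validation Macro F1.
    """
    best_model, best_c, best_score = None, None, -1.0
    for c in C_GRID:
        model = make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2)),  # unigrams and bigrams
            LinearSVC(C=c, penalty="l2"),         # L2-regularized linear SVM
        )
        model.fit(train_texts, train_labels)
        predictions = model.predict(val_texts)
        score = f1_score(val_labels, predictions, average="macro")
        if score > best_score:
            best_model, best_c, best_score = model, c, score
    return best_model, best_c, best_score
```

The returned pipeline can then be applied to held-out comments with `best_model.predict(test_texts)`.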

RoBERTa. We fine-tuned RoBERTa (Liu et al. 2019) from the Huggingface library (Wolf et al. 2019), specifically the roberta-base variant. We pass the last layer's CLS token representation to a softmax layer, and the model is fine-tuned for up to 25 epochs. The model was checkpointed after each epoch, and the best version was chosen using the validation data. We used cross-entropy loss as the objective function, a mini-batch size of 8, and a learning rate of 2e-5 (other hyper-parameters are the same as in Liu et al. 2019).
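The classification head and objective described above can be written compactly. The notation is introduced here only for illustration and is not taken from the original implementation: h_CLS denotes the last-layer CLS representation of a comment x, w_k and b_k are the softmax layer's parameters for class k, K is the number of classes, and B is a mini-batch (of size 8 in our setting).

```latex
\begin{align}
p(y = k \mid x) &= \frac{\exp\left(w_k^{\top} h_{\mathrm{CLS}} + b_k\right)}
                        {\sum_{j=1}^{K} \exp\left(w_j^{\top} h_{\mathrm{CLS}} + b_j\right)}, \\
\mathcal{L} &= -\frac{1}{|B|} \sum_{(x,\, y) \in B} \log p(y \mid x).
\end{align}
```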

We split the 5000 comments 70/10/20 into training, validation, and test datasets, respectively. Furthermore, we evaluate the models using precision, recall, and F1 score for each class independently, along with the aggregate Macro F1 score.
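The split, the evaluation metrics, and the two random baselines can be sketched in plain Python as follows. The function names are hypothetical, and the baseline definitions are assumptions: Uniform is taken to sample every class with equal probability, and Stratified to sample classes in proportion to their training frequency.

```python
import random
from collections import Counter


def split_70_10_20(items, seed=0):
    """Shuffle items and split them into 70% train, 10% validation, 20% test."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10
    n_val = len(shuffled) // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def per_class_prf(gold, pred):
    """Return {label: (precision, recall, f1)} over all labels in gold or pred."""
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_prf(gold, pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


def uniform_baseline(train_labels, n, seed=0):
    """Assumed Uniform baseline: pick each class with equal probability."""
    rng = random.Random(seed)
    classes = sorted(set(train_labels))
    return [rng.choice(classes) for _ in range(n)]


def stratified_baseline(train_labels, n, seed=0):
    """Assumed Stratified baseline: pick classes by their training frequency."""
    rng = random.Random(seed)
    counts = Counter(train_labels)
    classes = sorted(counts)
    return rng.choices(classes, weights=[counts[c] for c in classes], k=n)
```

Because Macro F1 averages the per-class scores without weighting by class size, rare classes such as "No" influence it as much as the frequent "Unknown" class.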

The results for Intent are shown in Table 2. Overall, we find that both the Linear SVM and RoBERTa outperform the random baselines with regard to the Macro F1 metric. Furthermore, RoBERTa outperforms the Linear SVM with regard to Macro F1 score by nearly 15%. Unsurprisingly, Unknown is the easiest class to predict, as it was the largest class in our analysis. However, we are pleased to see a considerable increase in F1 score compared to the baselines, particularly for the Yes, No, and Maybe classes. These Intent results show that machine learning models trained on our annotated data can learn the classes. Hence, other researchers can use the predictions for potential downstream studies.

The results for Support are shown in Table 3. Again, we find that both the Linear SVM and RoBERTa models outperform the random baselines, indicating that the classes are learnable. Specifically, we find that RoBERTa substantially outperforms the Linear SVM baseline by more than 14% with regard to the Macro F1 score. Unsurprisingly, the worst performance is found for the "No" class, with an F1 of .268, because it is the least frequent class within the dataset. Interestingly, the Informative class is the most accurately predicted class for Support, with an F1 of .800, followed by the most common class, Yes, with an F1 score of .771.

In this section, we aim to find the most predictive phrases for each category based on the learned coefficients of the Linear SVM model. The results can be found in Table 4. There are a number of noteworthy and intuitive patterns. For example, we find that predictive words for the "Yes" Intent category include "bought" and "holding", indicating direct information about buying GME shares. Likewise, the "No" Support category includes predictive words such as "bagholders" and "puts" (i.e., bets against the stock), indicating negative valence towards the GameStop company or opposition to the general GameStop narrative of fighting the establishment. Here, the term holding refers to owning GME stock. The term bagholder is an insult and refers to individuals who purchased a stock at a high price before the price dropped considerably, leaving the individual "holding the bag". The term put refers to betting against a stock (betting that the stock will decrease in value). The "Informative" category for both Intent and Support shows words that indicate information, such as linking to a website. Interestingly, the "Maybe" Intent category contains plural pronouns (e.g., "we" and "us"), making it unclear whether someone is expressing an intent for themselves or for someone else. Overall, this simple phrase analysis provides further evidence of the quality of the data annotation process by providing intuitive insights into each category beyond the annotation guidelines.

Table 4: Most predictive words found by the Linear SVM model for each Intent and Support category. The words are ranked based on predictive power, e.g., the first word is the most predictive, the second word is the second most predictive, etc.
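The ranking behind this kind of analysis can be sketched as follows: for each class, features are sorted by their learned weight, and the largest weights are taken as the most predictive. The sketch assumes one row of coefficients per class, which is the layout scikit-learn's LinearSVC exposes as `coef_` for more than two classes, with feature names aligned to the columns. The function name and the toy weights are made up for illustration.

```python
def top_features(coef_rows, class_names, feature_names, k=10):
    """Return the k highest-weighted features for each class.

    coef_rows: one sequence of weights per class, aligned with feature_names.
    """
    ranked = {}
    for cls, row in zip(class_names, coef_rows):
        # Sort feature indices from largest to smallest weight.
        order = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        ranked[cls] = [feature_names[i] for i in order[:k]]
    return ranked


# Toy weights (invented for illustration, not values from Table 4).
features = ["bought", "puts", "holding"]
weights = [
    [0.9, -0.2, 0.1],   # hypothetical weights for class "Yes"
    [-0.5, 0.7, 0.3],   # hypothetical weights for class "No"
]
ranking = top_features(weights, ["Yes", "No"], features, k=2)
```

With a fitted TF-IDF and LinearSVC pipeline, the feature names would come from the vectorizer's vocabulary and the weight rows from the classifier's coefficient matrix.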

The initial study findings are promising. First, the high Cohen's kappa scores during the annotation phase indicate that there is enough differentiation in the language that humans can extract both the intent and support levels from the text. Furthermore, despite the colorful language used, the annotators were able to identify and agree on the interpretation of community-specific language such as sarcasm or other narrative wordplay. Other researchers have also explored the power of narrative-based discussions on Reddit (Antoniak, Mimno, and Levy 2019), depicting how emotions and narratives unfold through language use on social media platforms. Next, the findings suggest that machine learning models can perform substantially better than random baselines, indicating that such models can learn from the data. This is important for both NLP researchers and computational social scientists. For example, NLP researchers can use the data to develop better algorithms. Likewise, social scientists can use the predictions of a model trained on our dataset to answer social and behavioral questions (e.g., related to persuasion).

In this section, we describe two future research avenues and use cases for our dataset: Detecting cases of persuasion on Reddit and understanding persuasion methods that can change user Intent.

Recent research into persuasion has generally utilized the Elaboration Likelihood Model (ELM) as an explanation of how an individual can be persuaded to behave. The ELM suggests that when an individual arrives at a decision, the decision will be based either on the message and logical reasoning (the central route) or on cues related to the message (the peripheral route). This model has been important for social scientists in better understanding human behavior as it relates to persuasion. One study on petitions from Change.org found that cognitive reasoning and moral judgments do not lead to effective campaigns. Instead, successful persuasive campaigns rely on emotionally charged language and enlightening information. Furthermore, another case study on textual conversations of CEOs and businesses found that individuals who process information through the peripheral route of the ELM do so based on one of four appeal channels (social, ethical, political, and ideological) (Pignot, Nicolini, and Thompson 2020). However, much of the current research on human persuasion tends to rely on surveys asking subjects whether they were persuaded (Yi et al. 2019), examines persuasion in the aggregate (Wang et al. 2021a), or attempts to model receptiveness to phrasing (Pignot, Nicolini, and Thompson 2020). The literature lacks a definitive, easily identified indicator for detecting the move from intention to action. As a potential use case of this dataset, researchers can leverage the ELM (particularly the peripheral route) to qualitatively interpret how an individual is persuaded to change their mind from an unsure (Unknown) or tentative (Maybe) intention to buy GME shares to a positive intention (Yes). Moreover, researchers can expand the data to include qualitative longitudinal data on users who are confirmed to have made a stock purchase, yielding a real-life case of detected persuasion.
Another potential direction for the research would be to annotate more details such as the degree of support level (strongly opposed, strongly agree), capturing the degree of changed opinion over time.
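As a hedged illustration of this use case, the sketch below scans time-ordered Intent predictions per user and reports users whose label moves from Unknown or Maybe to Yes. The record layout (user, timestamp, intent) and the function name are hypothetical, and such a shift is only a candidate for closer qualitative study, not evidence of persuasion on its own.

```python
from collections import defaultdict

UNSURE_LABELS = {"Unknown", "Maybe"}


def find_intent_shifts(records):
    """Map each user to the timestamp of their first "Yes" after an unsure label.

    records: iterable of (user, timestamp, intent) tuples, where intent is a
    predicted or annotated Intent label. Users without such a shift are omitted.
    """
    by_user = defaultdict(list)
    for user, timestamp, intent in records:
        by_user[user].append((timestamp, intent))

    shifts = {}
    for user, events in by_user.items():
        events.sort()  # chronological order
        seen_unsure = False
        for timestamp, intent in events:
            if intent in UNSURE_LABELS:
                seen_unsure = True
            elif intent == "Yes" and seen_unsure:
                shifts[user] = timestamp
                break
    return shifts


# Toy records with invented users and timestamps.
toy_records = [
    ("user_a", 1, "Unknown"), ("user_a", 2, "Maybe"), ("user_a", 3, "Yes"),
    ("user_b", 1, "Yes"), ("user_b", 2, "Yes"),
    ("user_c", 1, "Maybe"), ("user_c", 2, "No"),
]
shifted_users = find_intent_shifts(toy_records)
```

In the toy records only user_a moves from an unsure label to Yes, so the conversations preceding that user's timestamp 3 would be the ones to inspect for persuasive cues.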

As previously stated, in modeling intent, particularly in the Management Information Systems literature, there is a reliance on reported perceptions such as "Are you convinced?" or "Would you buy this?" (Yi et al. 2019; Gerlach, Buxmann, and Dinev 2019; Yin, Bond, and Zhang 2020). On the other hand, our dataset attempts to minimize this element by identifying characteristics of individuals who are persuaded to engage in an activity. While there has been plenty of research using text analysis outside of surveys, it tends to look at a more passive context such as past consumer sentiment (Jiang et al. 2021) or the success of a past event in the aggregate (Wang et al. 2021b). An interesting use of this dataset would be to identify persuasion at the individual level among peers. Identifying the conversation patterns would allow extended research into cyber community behaviors where it is not as obvious when the jump between intent and behavior is made, for example, from talk of threatening cybercrime to actually taking action offline. The linguistic characteristics associated with committed behavior could then be applied in more practical circumstances.

The authors of this paper acknowledge reading and abiding by the AAAI code of conduct and ethical guidelines for this submission. We acknowledge the risk that, because our paper observes the human behavior of persuasion and social manipulation, those with unsavory intent can abuse the research for their own ends. Nevertheless, we posit that research on how to manipulate and socially engineer is already available. Social manipulation is already a problem. Fake news, for example, is a major challenge of our time, and by not studying how manipulation occurs, we cannot learn how vulnerable we are and how to defend against it effectively. This dataset aims to provide a freely available corpus of peer-to-peer conversations that can confirm user support for a campaign (e.g., supporting GameStop) and user intentions to make a purchase (e.g., GME shares). Our research does not use any individual's personal information, nor any identifiable information that could put the individual at risk. All information was taken from publicly available data on Reddit.

Our dataset adheres to the FAIR principles (Findable, Accessible, Interoperable, and Reusable). The dataset is findable and accessible through Zenodo. Moreover, the data is licensed under the Creative Commons Attribution License (CC BY 4.0). Finally, the data is shared as a CSV file along with the annotation guidelines, which are shared as Word documents. Thus, the data is reusable and interoperable.
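Because the data is distributed as a CSV file, a first look at how the Intent and Support annotations interact can be sketched in a few lines of Python. The sketch below is illustrative only: the column names (`text`, `intent`, `support`), the label strings, and the placeholder rows are assumptions made for this example and are not taken from the released file.

```python
import csv
import io
from collections import Counter

# Placeholder rows standing in for the released CSV. The column names and
# label strings are assumptions for illustration, not the dataset's schema.
SAMPLE_CSV = """text,intent,support
"placeholder comment 1",yes,yes
"placeholder comment 2",yes,uncertain
"placeholder comment 3",informative,informative
"placeholder comment 4",no,no
"placeholder comment 5",yes,yes
"""


def cross_tabulate(csv_file):
    """Count how often each (intent, support) label pair occurs."""
    reader = csv.DictReader(csv_file)
    return Counter((row["intent"], row["support"]) for row in reader)


counts = cross_tabulate(io.StringIO(SAMPLE_CSV))
for (intent, support), n in sorted(counts.items()):
    print(f"intent={intent:<12} support={support:<12} n={n}")
```

To run this on the actual release, the `io.StringIO` object would be replaced with an opened file handle, and the column names adjusted to whatever the CSV header actually contains.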

We have seen that online social media communities are increasingly turning to collaboration as a mechanism to coordinate activities beyond the boundaries of the community itself. The case of WallStreetBets suggests the importance of social media communication and its resulting influence on online human communication. This dataset provides the unique capability not only to measure intent but also to reveal a tangible result, bridging intent and action.

GME halted 10:13:47

This post only informs when GME stock had stopped trading. It is simple and informative.

Probably not the most popular, but I bookmark the GME on Yahoo Finance. I like that it shows pre-market. I've not done much hunting for anything else.

This example only informs users of a site to look up stocks. The intent is clearly to help users utilize a resource. There is no side taken for GME intent. 

The post indicates clear disinterest, or no intention to purchase the stock. The individual does not and will not purchase the stock.

All it is, is a case of Misery loves company its why GME bag holders still hyping it up and hating on everyone taking interest in other stocks lol

This individual refers to people holding GME as bag holders (people who will lose money), indicating that he does not own the stock and considers it a scheme.

Lol yall still doing this gme thing huh? We had the greenest week in 8 months and you think your money should be invested in a pump and dump? Man, I almost feel bad for you, then I remember that you had 30 warnings and chose to ignore them. Have fun!

In this case, the writer clearly considers GameStop a scam and has no interest in holding.

Tesla in some weird way could actually be the company of the future. They certainly aren't right now but it's not out of the realm of possibility. GameStop has no future, even if they pivoted entirely to digital sales and e-commerce, nothing could possibly justify a $20B valuation, with the information they have currently released to the public.

While not outright said, this individual suggests GameStop is a dying company and vastly overvalued; thus, it can be strongly inferred that the individual has no interest in the stock.

For this process, support is defined as any indication that the commenter has some support for the GameStop company. Support can encompass supporting the GameStop company itself (that is, believing in the worth of the firm), supporting an increase in the stock price, or using supportive terminology (such as "to the moon" or "GME GME GME" cheers); it can also include some degree of support for the "us vs. them" narrative.

For example: They're thinking: launch invesitigations and take every reddit wsb user to court that got GME for the sole purpose of forcing you to sell your GME to cover legal fees, therefore dropping the stock price so the shorts can benefit. HAHA They crazy if they think this group won't band together. I see strength here Ihave never seen on ws. Stay strong!

In this example, the individual first implies that the courts are against GME, which suggests support; the comment then goes on to invoke themes such as banding together and staying strong.

These are direct indicators of support for the "Us vs Them" theme.

Rules Example(s) Notes

The post indicates clear support for the GME narrative and for GameStop as a company. The post could also show hostility towards the counter companies.

GME was also predicted to hit 1k or more.

In example one, the post shows clear support for GME, believing the stock will rise.

i agree GME will stay at the same levels as it is now for long period of time, but there is still probability it will jump very high! Hedge funds will not scare us! GME, AMC, NOK, BB!

In example two, the support is indicated through antagonism against the hedge funds (the companies betting against GameStop), promoting the "Us vs Them" narrative. This sentiment is positive.

CNBC try hards trying shitting on Reddit and GME so hard, let's shit on them, BUY OR HOLD, there is no sell.

Example three expresses an "us vs them" stance against the general media, in support of the GameStop narrative.

There is no clear indication, in either direction, that the comment supports the GameStop narrative. This can also be a catchall if the observation does not meet other criteria, such as completely unrelated posts.

if I got it right GME has 122% of float shorted, AMC has 27% ... it's why I'm asking someone to confirm

This comment does not indicate support. It attempts to inform but gives the individual's opinion and calculations; therefore, it is not an informative post either.

$GME isn't available on cashapp afaik.

This comment gives no indication of support at all. It makes a claim and could be informative, but it expresses uncertainty, so it does not qualify.

Gonna assume it will get deleted. AMC crack down started now GME crack downs.

This comment only offers the author's opinion of events; it does not imply any support.

Post is meant to inform users without any personal opinions or biases visible. No emotions, no sides taken, only sharing information.

Probably not the most popular, but I bookmark the GME on Yahoo Finance. I like that it shows premarket. I've not done much hunting for anything else.

This post recommends a site to analyze GME data. It does not indicate any bias or interest, simply a source recommendation.

If you've got a Revolut account you can get them on their, and GME

This post recommends an app to purchase GME; it does not indicate any bias or interest.

Easier to calculate the overall breakeven using the total in/out He spent 90k overall on all the contracts combined, so he needs at least 90k out to breakeven. He has 10 each of 275, 285. For each dollar above each strike he can get $1000 back, so he needs to be a total of $90 in the money across all the strikes. 90 = (x - 275) + (x - 285) = 2x - 560, x = (90 + 560)/2 = 325, so he needs GME at $325 at expiry to breakeven

This post is focused on how a specific financial vehicle can be calculated. The post is meant to inform and does not indicate any support.
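The arithmetic in the quoted comment can be checked with a short Python sketch. It assumes, as the comment implies with its "$1000 back" per dollar for 10 contracts, that each call contract covers 100 shares and that the breakeven price lies above both strikes. The function and parameter names are hypothetical and chosen only for this illustration.

```python
def breakeven_price(total_cost, strikes, contracts_per_strike, multiplier=100):
    """Underlying price at expiry where the call payoffs equal total_cost.

    Assumes the breakeven lies above every strike, so all calls finish
    in the money (true for the example in the quoted comment).
    """
    per_dollar = contracts_per_strike * multiplier  # payoff per $1 above a strike
    # total_cost = per_dollar * sum(x - k) = per_dollar * (n * x - sum(k))
    return (total_cost / per_dollar + sum(strikes)) / len(strikes)


def payoff_at_expiry(price, strikes, contracts_per_strike, multiplier=100):
    """Total payoff of long calls at the given strikes, ignoring premiums."""
    return sum(
        max(price - k, 0) * contracts_per_strike * multiplier for k in strikes
    )


x = breakeven_price(90_000, [275, 285], 10)
print(x)                                    # 325.0
print(payoff_at_expiry(x, [275, 285], 10))  # 90000.0
```

The result reproduces the $325 figure stated in the comment: at that price the two strikes are $50 and $40 in the money, which at $1,000 per dollar returns the $90,000 spent.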

The individual does not support GameStop.

Amc and nok, gme is broken

The first post shows a lack of support for GameStop independent of intent.

I'm holding because i wouldn't gain much from selling at this point, but raw number comparisons to VW were always bad logic. sure, VW hit 1k, but it strated at 250. Gamestop rocketing from 40 to almost 400 was already a much larger jump. Will it go up again, idk. it might, but I think some here are too confident in that. But if we are waiting for the moment's, then we barreled right past it last week.

The second post clearly suggests that the individual owns GameStop stock, but the individual does not seem to support the company or expect the stock to increase.

Idk, probably going to be called a bot, my guess GME has been squeezed already. I'm still holding on for another but I think its probably just starting its slow descent back down. I'm not very knowledgeable in the stock market's more complicated areas. Just super basic stuff. Still learning more everyday so feel free to correct me this is not financial advice yada yada yada.

This post is interesting, as it is clear the individual owns GameStop stock but does not believe in the company. Instead, the individual believes the stock will decrease, strongly implying that he thinks the stock increase was just an event and did not reflect the actual value of the company.

Narrative paths and negotiation of power in birth stories

The psychological appeal of fake-news attributions

Business Insider. 2021. Wallstreetbets traders are pushing risky stocks to all-time highs

Should you take investment advice from wallstreetbets? a data-driven approach

Persuasion at different levels of elaboration: experimental effects of strength, valence and ego depletion

A multi-appeal model of persuasion for online petition success: A linguistic cue-based approach

Yahoo! for amazon: Sentiment extraction from small talk on the web

Analyzing polarization in social media: Method and application to tweets on 21 mass shootings

"They're all the same!" Stereotypical thinking and systematic errors in users' privacy-related judgments about online services

Automated news reading: Stock price prediction based on financial news using context-capturing features

Investigating the effects of dimension-specific sentiments on product sales: The perspective of sentiment preferences

Forex-foreteller: Currency trend modeling using news articles

Adam: A method for stochastic optimization

Evaluation of methods and techniques for language based sentiment analysis for dax 30 stock exchange: a first concept of a "LUGO" sentiment indicator. International SERIES on Information Systems and Management in Creative eMedia (CreMedia)

"Sticking it to the man": r/wallstreetbets, generational masculinity and revenge in narratives of our dystopian capitalist age

Semeval-2016 task 6: Detecting stance in tweets

Integrating cognition with an affective lens to better understand information security policy compliance

Scikit-learn: Machine learning in python

Affective politics and technology buy-in: A framework of social, political, and fantasmatic logics

Connotation frames of power and agency in modern films

Text classification for intelligent agent portfolio management

Predicting $gme stock price movement using sentiment from reddit r/wallstreetbets

Mitigating information asymmetry to achieve crowdfunding success: Signaling and online communication

Social media and attitude change: Information booming promote or resist persuasion?

pkudblab at semeval-2016 task 6: A specific convolutional neural network system for effective stance detection

Leveraging user-generated content for product promotion: the effects of firm-highlighted reviews

Classification of "hot news" for financial forecast using nlp techniques

Anger in consumer reviews: Unhelpful but persuasive? MIS Quarterly, Forthcoming

Mitre at semeval-2016 task 6: Transfer learning for stance detection

Combining news and technical indicators in daily stock price trends prediction

In this section, we provide the annotation guidelines that were developed for this project. Specifically, Table 5 provides the guidelines for annotating Intent, while Table 6 provides the guidelines for Support.

For this process, we define intent as the commenter exhibiting some interest in acquiring GameStop stock. For this section, intent can be already owning the stock, intending to purchase it, or the desire not to purchase it.

For example: "Looks like my puts on GME and AMC have me +14k today. Sorry! At some point GME and AMC would go down. Hope no one is stuck holding the bag"

In this example, the commenter refers to GameStop stockholders as bag holders, a term used to describe the people left holding stock that is now worthless. This strongly suggests the author has no interest in the stock and would not purchase it. Furthermore, puts are a way to bet against the GameStop price, so the desire not to purchase the stock is visible.

Rules Example(s) Notes

There is clear intent to purchase GameStop stock, or the individual has already purchased it in the recent time period.

I'll send you my worth after it went 10x thanks to the GME squeeze

The individual indicates his worth will increase, so it is clear he currently owns GameStop shares.

Fuck it, my tax return is going back to GME.

In this example, the individual outright states his intent to use his tax return to invest in GameStop in the future.

Where can i still buy GME Shares?

Again, the author intends to buy GameStop but does not know where.

It is not clear that the individual has the stock, but the context hints at a possible purchase or at existing ownership.

Come listen to the GME WAR ROOM RADIO, for market open !!! Hold the line apes only https://www.twitch.tv/stashkonig

"Hold the line" implies that he could be someone who has the stock, but this cannot be determined with certainty.

CNBC try hards trying shitting on Reddit and GME so hard, let's shit on them, BUY OR HOLD, there is no sell

In this comment, it is not entirely certain whether the individual owns or will purchase GME, but it is clear he leans towards the buy side.

Worked for me yesterday. Was up and funded instantly. GME appears to be buyable

In this comment, the author suggests GME can be bought and that he may have done so yesterday, but it could also be that only the app worked. So it leans towards buying intent.

It is not clear, one way or the other, whether there is intent to purchase or whether the individual currently owns the stock. This can serve as a catchall if the post cannot be annotated with any other category, such as completely unrelated posts.

ITS ALL ABOUT GME!!!! BUY AND HOLD

This clearly supports GME, but there is no indication that the person already possesses the stock.

Google gamestop news today

This post does not reveal much information, nor does it inform.

yo l can finally find some DD on stocks not named gme thank the fucking god

This post somewhat hints at a lack of interest in GameStop; however, it is not informative enough to draw a conclusion.

Post is meant to inform users without any personal opinions or biases visible. No emotions, no sides taken, only sharing information.