title: Identifying Morality Frames in Political Tweets using Relational Learning
authors: Roy, Shamik; Pacheco, Maria Leonor; Goldwasser, Dan
date: 2021-09-09

Extracting moral sentiment from text is a vital component in understanding public opinion, social movements, and policy decisions. Moral Foundations Theory identifies five moral foundations, each associated with a positive and a negative polarity. However, moral sentiment is often motivated by its targets, which can correspond to individuals or collective entities. In this paper, we introduce morality frames, a representation framework for organizing moral attitudes directed at different entities, and introduce a novel, high-quality annotated dataset of tweets written by US politicians. We then propose a relational learning model to predict moral attitudes towards entities and moral foundations jointly. Qualitative and quantitative evaluations show that moral sentiment towards entities differs highly across political ideologies.

Morality is a set of principles for distinguishing between right and wrong. Shared moral values form the social and cultural norms that unite social groups (Dehghani et al., 2016). Moral Foundations Theory (MFT) (Haidt and Joseph, 2004; Haidt and Graham, 2007) provides a theoretical framework for analyzing different expressions of moral values. The theory suggests that there are at least five basic moral values, emerging from evolutionary, social, and cultural origins. These are referred to as Moral Foundations (MFs), each with a positive and a negative polarity, and include Care/Harm, Fairness/Cheating, Loyalty/Betrayal, Authority/Subversion, and Purity/Degradation (Table 1 provides details). Identifying MFs in text is a relatively new challenge, and past work has relied on lexical resources such as the Moral Foundations Dictionary (Graham et al., 2009; Fulgoni et al., 2016; Xie et al., 2019) and annotated data (Johnson and Goldwasser, 2018; Lin et al., 2018; Hoover et al., 2020).

Social and political science studies have repeatedly shown the correlation between ideological and political stances and moral foundation preferences (Graham et al., 2009; Wolsko et al., 2016; Amin et al., 2017). For example, Graham et al. (2009) capture the correlation between political ideology and moral foundation usage, showing that Liberals have a preference for Care/Harm and Fairness/Cheating while Conservatives use all five. Our main intuition in this paper is that even when different groups use the same MF, the moral sentiment is directed towards different targets. To clarify, consider the following tweets discussing the Affordable Care Act (ACA, Obamacare).

[@SenThadCochran and I]_CARING are working to protect [MS small businesses]_CARE-FOR from more expensive [#Obamacare mandates]_HARMING.

[The ACA]_CARING was a life saver for the more than [130 million Americans]_CARE-FOR with a preexisting condition - including covid now. [Republicans]_HARMING want to take us back to coverage denials.

While both tweets use the Care/Harm MF, in the top tweet (Conservative) the ACA is described as causing Harm, while in the bottom tweet (Liberal) the ACA is described as providing the needed Care.
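To make the entity-centric representation concrete before defining it formally, the following minimal Python sketch encodes the two example tweets above as Care/Harm frames with typed roles. All names here are illustrative and are not taken from the paper's code.

```python
from dataclasses import dataclass, field

# Typed roles associated with the Care/Harm morality frame:
# "entity providing care", "target of care/harm", "entity causing harm".

@dataclass
class MoralityFrame:
    """A moral foundation predicate plus role assignments for entities in a tweet."""
    moral_foundation: str                      # e.g. "Care/Harm"
    roles: dict = field(default_factory=dict)  # entity span -> typed role

# Conservative tweet: the ACA ("#Obamacare mandates") fills the harming role.
conservative = MoralityFrame(
    moral_foundation="Care/Harm",
    roles={
        "@SenThadCochran and I": "entity providing care",
        "MS small businesses": "target of care/harm",
        "#Obamacare mandates": "entity causing harm",
    },
)

# Liberal tweet: the ACA fills the caring role instead.
liberal = MoralityFrame(
    moral_foundation="Care/Harm",
    roles={
        "The ACA": "entity providing care",
        "130 million Americans": "target of care/harm",
        "Republicans": "entity causing harm",
    },
)

# Both tweets evoke the same moral foundation, but comparing the frames
# exposes the opposite moral sentiment directed at the ACA.
print(conservative.roles["#Obamacare mandates"], "vs.", liberal.roles["The ACA"])
```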
Our main contribution in this paper is to introduce morality frames, a representation framework for organizing moral attitudes directed at different targets, by decomposing the moral foundations into structured frames, each associated with a predicate (a specific MF) and typed roles. For example, the morality frame for Care/Harm is associated with three typed roles: entity providing care, entity needing the care, and entity causing harm. We focus on analyzing political tweets, each describing an eliciting situation that evokes moral sentiment, and map the text to an MF and the entities appearing in it to typed roles. Given tweets by different ideological groups discussing the same real-world situation, morality frames can provide the means to explain and compare the attitudes of the two groups. We build on the MF dataset by Johnson and Goldwasser (2018), consisting of political tweets, and annotate each tweet with MF roles for its entities.

Identifying moral roles from text in our setting requires inferences based on political knowledge, mapping between the author's perspectives and the judgements appearing in the text. For example, Donald Trump is likely to elicit a negative moral judgement from most Liberals and a positive one from most Conservatives, regardless of the specific moral foundation that is evoked. From a technical perspective, our goal is to model these kinds of dependencies in a probabilistic framework, connecting MF and role assignments, entity-specific sentiment polarity, and repeating patterns within ideological groups (while our focus is U.S. politics, these settings could be easily adapted to capture patterns based on other criteria). We formulate these dependencies as a structured learning task and compare two relational learning frameworks, PSL (Bach et al., 2017) and DRaiL (Pacheco and Goldwasser, 2021). Our experiments demonstrate that modeling these dependencies, capturing political and social knowledge, results in improved performance. In addition, we conduct a thorough ablation study and error analysis to explain their impact on performance. Finally, we demonstrate how entity-based MF analysis can help capture perspective differences along ideological lines. We apply our model to tweets by members of Congress on the issue of Abortion and the 2021 storming of the US Capitol. Our analysis shows that while Conservative and Liberal tweets target the same entities, their attitudes are often conflicting.

The use of sociological theories such as Moral Foundations Theory (MFT) (Haidt and Joseph, 2004; Haidt and Graham, 2007) and Framing (Entman, 1993; Chong and Druckman, 2007; Boydstun et al., 2014) in Natural Language Processing tasks has gained significant interest. MFT has been widely used to study the effect of moral values on human behavioral patterns, such as charitable donations (Hoover et al., 2018), violent protests (Mooijman et al., 2018) and social homophily (Dehghani et al., 2016). Framing is a strategy used to bias the discussion on an issue towards a specific stance by emphasizing certain aspects that prime the reader to support that stance. Framing is used to study political bias and polarization in social and news media (Tsur et al., 2015; Baumer et al., 2015; Card et al., 2015; Field et al., 2018; Demszky et al., 2019; Fan et al., 2019; Roy and Goldwasser, 2020). Moral Foundations Theory is frequently used to analyze political framing and agenda setting.
For example, Fulgoni et al. (2016) analyzed framing in partisan news sources using MFT, and Dehghani et al. (2014) studied the difference in moral sentiment usage between liberals and conservatives. Brady et al. (2017) found that moral/emotional political messages are diffused at higher rates on social media. Previous works have also contributed to the detection of moral sentiments. Johnson and Goldwasser (2018) showed that policy frames (Boydstun et al., 2014) are helpful for identifying moral foundations in political tweets. While existing works study MFT at the issue and sentence level, Roy and Goldwasser (2021) showed that there is a correlation between entity mentions and the sentence-level moral foundation in tweets by U.S. politicians. We extend this work by studying MFT directly at the entity level. Hence, our work is broadly related to work on entity-centric affect analysis (Deng and Wiebe, 2015; Field and Tsvetkov, 2019; Park et al., 2020).

Combining neural networks and structured inference has been explored for traditional NLP tasks such as dependency parsing (Chen and Manning, 2014; Weiss et al., 2015; Andor et al., 2016), named entity recognition (Lample et al., 2016) and sequence labeling (Ma and Hovy, 2016; Zhang et al., 2017). Recently, these efforts have expanded to discourse-level tasks such as argumentation mining (Niculae et al., 2017; Widmoser et al., 2021), event/temporal relation extraction (Han et al., 2019) and discourse representation parsing (Liu et al., 2019). Following this trend, Pacheco and Goldwasser (2021) introduced DRaiL, a general declarative framework for deep structured prediction, designed specifically for NLP tasks. In this paper, we use DRaiL to model moral foundations and entity roles jointly.

3 Identifying Entity-Centric Moral Roles

MFT defines a convenient abstraction of the moral sentiment expressed in a given text. Morality frames build on MFT and provide entity-centric moral sentiment information. Rather than defining negative and positive MF polarities (e.g., CARE or HARM), we use the five MFs as frame predicates and associate positive and negative entity roles with each frame. As described in Table 1, these roles capture information specific to each MF. For example, entity causing harm is a negative sentiment role associated with the CARE/HARM morality frame. The entities filling these roles can be individuals, collective entities, objects, activities, concepts, or legislative elements.

We build on the dataset proposed by Johnson and Goldwasser (2018), consisting of tweets by US politicians posted between 2016 and 2017. A subset of it (2K out of 93K) is annotated for Moral Foundations and Policy Frames (Boydstun et al., 2014). The tweets focus on six politically polarized issues: immigration, guns, abortion, ACA, LGBTQ, and terrorism, and the party affiliations of the authors are known. We consider only labeled moral tweets, and choose the most prominent MF annotation for each tweet (some tweets are annotated for a secondary MF). Since the data contains only a few examples of the Purity/Degradation moral foundation, we collected more examples from the unlabeled segment and manually annotated them. Table 2 shows the statistics of the final dataset. The annotation process and per-topic distribution of tweets are outlined in Appendix A. We annotate each tweet for entities and their associated moral roles.

Annotation Schema: We set up a QA task on Amazon Mechanical Turk. Annotators were given a tweet, the associated MF label and its description.
They were then presented with multiple questions and asked to mark the answers, corresponding to our entity roles, in the tweet. Table 3 shows the questions asked for the Care/Harm case. We asked additional questions to assess the annotators' understanding of the task. The questions for the other moral foundations can be found in Appendix B.1.

Quality Assurance: We provided the annotators with worked-through examples and hints with each question about the entity type. The interface allowed them to mark a segment of the text with one moral role only. To further improve quality, we did the annotation in two phases. In the annotator selection phase, we released a small subset of tweets for annotation. Based on the annotations, we assigned qualifications to high-performing workers and released the rest of the tweets only to them. We awarded the annotators 15-18¢ per tweet. We define agreement among annotators as marking the same segment in the text with the same entity role. We calculate the agreement among multiple annotators using Krippendorff's α (Krippendorff, 2004), where α = 1 means perfect agreement, α = -1 means inverse agreement, and α = 0 is the level of chance agreement. Table 4 shows that the average agreement increased in the final stage. Note that the annotator agreement (Krippendorff's α) is calculated by comparing the character-by-character agreement between annotations. For example, if one annotator marked 'President Trump' as an answer in a tweet, and another marked 'Trump', it is counted as agreement on the characters 'Trump' but disagreement on 'President', although the annotators did not really disagree. This makes the agreement measurement very strict. Nevertheless, we still obtained very good average agreement among annotators in the final annotation step. We further refine the annotations by majority voting, as described in the following section.

Annotation Results: A tweet is annotated by at least three annotators. We define a text span to be an entity E with a moral role M in tweet T if it is annotated as such by at least two annotators. This way, we found 2,945 (T, E, M) tuples. To compare the partisanship of MFs and MF roles, we calculate the z-scores for the proportion of MFs and MF roles in the left and right, and use this as a partisan score (negative: right-partisan, positive: left-partisan); a short sketch of this computation is given at the end of this section. The partisan scores for common MFs and their corresponding most partisan (role: entity) tuples are shown in Table 5; one example row (LGBT issue): Fairness/Cheating (+2.1); Cheating: SCOTUS-Marriage (-6.5); Target of Fairness: LGBT (+1.0). The results of this analysis align with our intuition: moral sentiment towards entities can be more indicative of partisanship than the high-level MFs. In Table 6, we present the top-5 most used entities per role by political party for Care/Harm. We can see that the target entities of moral roles vary significantly across parties. Details for other MFs and z-scores are in Appendix B.2.
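As a rough illustration, the partisan score can be computed as a two-proportion z-score comparing how often an MF or (role: entity) tuple occurs in left- vs. right-authored tweets. The sketch below is illustrative only; the exact formulation used for Table 5 may differ, and the example counts are made up (only the 959/640 left/right totals come from the dataset summary).

```python
from math import sqrt

def partisan_z_score(count_left: int, n_left: int,
                     count_right: int, n_right: int) -> float:
    """Two-proportion z-score: positive -> left-partisan, negative -> right-partisan."""
    p_left = count_left / n_left
    p_right = count_right / n_right
    # Pooled proportion under the null hypothesis that usage is identical.
    p_pool = (count_left + count_right) / (n_left + n_right)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_left + 1 / n_right))
    return (p_left - p_right) / se

# Hypothetical counts: an element used in 200 of 959 left tweets
# and in 60 of 640 right tweets.
print(round(partisan_z_score(200, 959, 60, 640), 2))  # positive -> left-partisan
```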
We propose a relational learning model for identifying morality frames in text. We begin by defining our relational structure (Sec. 4.1) and proceed to describe its implementation using relational learning tools (Sec. 4.2). Statistical Relational Learning (SRL) methods attempt to model a joint distribution over relational data, and have proven useful in tasks where contextualizing information and interdependent decisions can compensate for a low number of annotated examples (Deng and Wiebe, 2015; Johnson and Goldwasser, 2016, 2018; Subramanian et al., 2018). By breaking down the problem into interdependent relations, these approaches are easier to interpret than end-to-end deep learning techniques.

We propose a joint prediction framework for morality frames, modeling the dependency between MF labels and moral role instances. Following SRL conventions (Richardson and Domingos, 2006; Bach et al., 2017), we use first-order logic to describe relational properties. Specifically, a logical rule is used to define a probabilistic scoring function over the relation instances appearing in it; the full description appears in Section 4.2.

r1 : Tweet(t) ⇒ MF(t, m)
r2 : Tweet(t) ∧ Ent(t, e) ⇒ Role(t, e, r)

In addition, we make the observation that both moral foundations and entities' moral roles depend on external factors that go beyond the text, such as author information and party affiliation. Previous work has shown that explicitly modeling party affiliation and the topics discussed is helpful for predicting moral foundations (Johnson and Goldwasser, 2018). For this reason, we condition both the moral foundation and the moral roles on this additional information, as shown in the rules below.

r3 : Tweet(t) ∧ Ideo(t, i) ∧ Topic(t, k) ⇒ MF(t, m)
r4 : Tweet(t) ∧ Ideo(t, i) ∧ Topic(t, k) ∧ Ent(t, e) ⇒ Role(t, e, r)

Rules r1, r2 condition the moral foundation label (m) and moral foundation role label (r) on the tweet (t) and entity (e), while r3, r4 additionally condition on the ideology of the author (i) and the topic of the tweet (k). Concretely, r4 can be translated as "if a tweet t has author ideology i, topic k, and mentions entity e, the entity will have moral role r". The other rules can be translated similarly. We then explicitly model the dependencies among different decisions using the following three constraints.

c1 : Ent(t, e) ∧ Role(t, e, r) ∧ MF_Role(m, r) ⇒ MF(t, m)
c2 : Ent(t, e1) ∧ Ent(t, e2) ∧ Role(t, e1, r) ⇒ ¬Role(t, e2, r)
c3 : SameIdeo(t1, t2) ∧ SameTopic(t1, t2) ∧ Ent(t1, e) ∧ Ent(t2, e) ∧ Role(t1, e, r1) ∧ Role(t2, e, r2) ⇒ SamePolarity(r1, r2)

(c1) Consistency between MF label and roles: While rules r1, r3 predict the MF labels and r2, r4 predict the role labels, these two predictions are interdependent. Knowing the MF of a tweet limits the space of feasible roles. Likewise, knowing the role of an entity in a tweet directly gives us the MF label. For example, the presence of an entity frequently used as a harming entity indicates a higher probability of the MF label 'Care/Harm'. We model the dependency between these two decisions using constraint c1, which can be translated as "if an entity e, mentioned in tweet t, has role r, tied to MF m, then tweet t will have MF label m".

(c2) Different roles for different entities in the same tweet: Our intuition is that if multiple entities are mentioned in the same tweet, they are likely to have different roles. While this may not always hold true, we use this constraint to prevent the model from relying only on textual context and assigning the same role to all entities.

(c3) Consistency in the polarity of sentiment towards an entity within a political party: Intuitively, role types have a positive or negative sentiment associated with them. For example, an entity causing harm and an entity doing betrayal carry negative sentiment. Intuitive polarities for each MF role can be found in Appendix C.1. Given the highly polarized domain that we are dealing with, we assume that regardless of the MF, an entity will likely maintain the same polarity when mentioned by a specific political party on the same topic. Constraint c3 encourages this consistency, and it can be translated as: "if two tweets t1, t2 are written by authors of the same political ideology, on the same topic, and mention the same entity e, then the polarity of the roles r1 and r2 of e in both tweets will likely be the same." We consider two entities to be the same if they are an exact lexical match, and leave entity clustering for future work. A small sketch of the role inventory and the c3 groundings is given below.
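The following sketch spells out the role inventory with its intuitive polarities (the positive roles follow Appendix C.1) and shows how grounding pairs for constraint c3 could be enumerated from tweet metadata. The function and field names are illustrative assumptions, not the authors' code.

```python
from itertools import combinations

# Intuitive polarity of each moral role; each MF has exactly one negative role.
ROLE_POLARITY = {
    "target of care/harm": +1, "entity providing care": +1, "entity causing harm": -1,
    "target of fairness/cheating": +1, "entity ensuring fairness": +1, "entity doing cheating": -1,
    "target of loyalty/betrayal": +1, "entity being loyal": +1, "entity doing betrayal": -1,
    "justified authority": +1, "justified authority over": +1,
    "failing authority": -1, "failing authority over": +1,
    "target of purity/degradation": +1, "entity preserving purity": +1,
    "entity causing degradation": -1,
}

def c3_groundings(tweets):
    """Enumerate pairs of tweets that instantiate constraint c3.

    `tweets` is a list of dicts with keys: id, ideology, topic, entities
    (entities are lowercased strings; exact lexical match is used, as in the paper).
    """
    groundings = []
    for t1, t2 in combinations(tweets, 2):
        if t1["ideology"] != t2["ideology"] or t1["topic"] != t2["topic"]:
            continue
        for e in set(t1["entities"]) & set(t2["entities"]):
            # c3: the roles of e in t1 and t2 should have the same polarity.
            groundings.append((t1["id"], t2["id"], e))
    return groundings

# Toy example with hypothetical tweets.
tweets = [
    {"id": 1, "ideology": "left", "topic": "abortion", "entities": {"planned parenthood", "women"}},
    {"id": 2, "ideology": "left", "topic": "abortion", "entities": {"planned parenthood"}},
    {"id": 3, "ideology": "right", "topic": "abortion", "entities": {"planned parenthood", "life"}},
]
print(c3_groundings(tweets))  # [(1, 2, 'planned parenthood')]
```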
In this work, we experiment with two existing frameworks for modeling relational learning problems: (1) Probabilistic Soft Logic (PSL) (Bach et al., 2017) and (2) Deep Relational Learning (DRaiL) (Pacheco and Goldwasser, 2021). Both PSL and DRaiL are probabilistic frameworks for specifying and learning relational models using weighted logical rules, specifically Horn clauses of the form w_r : P_1 ∧ ... ∧ P_{n-1} ⇒ P_n. Weights w_r indicate the importance of each rule in the model, and they can be learned from data. Predicates P_i can be closed if they are observed, or open if they are unobserved. Probabilistic inference is used over all rules to find the most probable assignment to open predicates. The main differences between PSL and DRaiL are: (a) in DRaiL, each rule weight is learned using a neural network, which can take arbitrary input representations, while in PSL a single weight is learned for each rule and expressive classifiers can only be leveraged as priors; (b) DRaiL defines a shared relational embedding space by specifying entity- and relation-specific encoders that are reused across all rules. In both frameworks, rules are transformed into linear inequalities corresponding to their disjunctive form, and MAP inference is defined as a linear program. In PSL, rules are compiled into a Hinge-Loss Markov Random Field, defined over continuous variables. Weights can be learned using maximum likelihood estimation, maximum-pseudolikelihood estimation, or large-margin estimation. In DRaiL, rule weights are learned using neural networks. Parameters can be learned locally, by training each neural network independently, or globally, by using inference to ensure that the scoring functions for all rules result in a globally consistent decision. To learn global models, DRaiL can also employ maximum likelihood estimation or large-margin estimation. Details regarding both frameworks can be found in Appendix C.2.

The goal of our relational learning framework is to identify morality frames in tweets by modeling them jointly, and to derive interpretable relations between them and other contextualizing information. In this section, we compare the performance of our model with multiple baselines and present a detailed error analysis. Then, we collect tweets on one topic (Abortion) and one event (2021 US Capitol Storming) written by US Congress members and analyze the discussion (collected from https://github.com/alexlitel/congresstweets). We identify the morality frames in these tweets using our best model.

We experiment with PSL and DRaiL for modeling the rules presented in Section 4.1. In DRaiL, each rule r is associated with a neural architecture, which serves as a scoring function to obtain the rule weight w_r. In the case of rules r1 and r2, which map tweets and entities to labels, we use a BERT encoder (Devlin et al., 2019) with a classifier on top. We use task-adaptive pretraining for BERT (Gururangan et al., 2020) and fine-tune it on a large number of unlabeled tweets.
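A minimal sketch of this kind of scoring architecture for rules r1 and r2 (a BERT encoder with a classifier on top) is shown below. It assumes the Hugging Face transformers API and is an illustration, not the authors' implementation; for r2, the entity span could additionally be appended to the input or its token representations pooled.

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class RuleScorer(nn.Module):
    """Scores label assignments for a rule such as r1: Tweet(t) => MF(t, m)."""
    def __init__(self, model_name: str = "bert-base-uncased", num_labels: int = 5):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] representation of the tweet
        return self.classifier(cls)        # one score per moral foundation

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
scorer = RuleScorer()
batch = tokenizer(["The ACA was a life saver for 130 million Americans."],
                  return_tensors="pt", padding=True, truncation=True)
scores = scorer(batch["input_ids"], batch["attention_mask"])  # shape: (1, 5)
```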
In the case of rules r3 and r4, which incorporate ideology and topic information, we learn topic and ideology embeddings with one-layer feed-forward nets over their one-hot representations. Then, we concatenate the output of BERT with the topic and ideology embeddings before passing everything through a classifier. On the other hand, PSL directly learns a single weight for each rule. Given that our rules are defined over complex inputs (tweets), we use the output of the locally trained neural nets as priors for PSL, by introducing additional rules of the form Prior(t, x) ⇒ Label(t, x). This approach has been successfully used in previous work dealing with textual inputs (Sridhar et al., 2015). Note that while PSL can only leverage these classifiers as priors, DRaiL continues to update the parameters of the neural nets during learning. We model constraint c1, aligning the MF and role predictions, and c3, aligning role polarity, as unweighted hard constraints in both frameworks. For constraint c2, we learn a weight to encourage different entities in a tweet to have different roles. PSL learns a weight directly over this rule, while in DRaiL we use a feed-forward net over the one-hot vector of the relevant MF.

We compare our relational models with the following baselines.

Lexicon Matching: Direct keyword matching using the MF Dictionary (MFD) (Graham et al., 2009) and a PMI-based lexicon extracted from the dataset by Johnson and Goldwasser (2018).

Sequence Tagging: We cast MF role prediction as a sequence tagging problem and map each entity in a tweet to a role label. We use a BiLSTM-CRF (Huang et al., 2015) over the full tweet, and use the last time-step in each entity span as its emission probability.

End-to-end Classifiers: We map the text, entities, and other contextualizing features (e.g., topic) to a single label. We compare BERT-base and task-adaptive pretraining (BERT-tapt) using a whole-word-masking objective over the large set of unlabeled political tweets.

Multi-task: We define a single BERT encoder, and single ideology and topic embeddings, shared across the two tasks. Task-specific classifiers are used on top of these representations. Then, the loss functions are added as L = λ1·L_MF + λ2·L_Role. We set λ1 = λ2 = 1 (a minimal sketch of this loss is shown at the end of this subsection).

We perform 3-fold cross-validation over the dataset introduced in Section 3, and show results for MF and role prediction in Table 7. First, we observe that leveraging unlabeled data for task-adaptive pretraining improves performance. Then, we find that relational models that use probabilistic inference outperform all of the other baselines on both tasks. Further, we find that modeling rules using neural nets in DRaiL, and learning their parameters with global learning, performs better than using them as priors and learning a single weight in PSL. We also include results obtained by fixing the gold labels for the MF prediction, and refer to this as a skyline. Unsurprisingly, having perfect MF information improves results for roles considerably. In this case, the candidates for each entity are reduced from 16 possible assignments to 3 or 4, which results in a much easier task. Details regarding all baselines and hyper-parameters can be found in the Appendix.
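A minimal sketch of the multi-task baseline's combined objective, assuming a shared tweet representation and two task heads (names are illustrative, not the authors' code):

```python
import torch
import torch.nn as nn

class MultiTaskHeads(nn.Module):
    """Task-specific MF and role classifiers on top of a shared representation."""
    def __init__(self, hidden_size: int = 768, num_mfs: int = 5, num_roles: int = 16):
        super().__init__()
        self.mf_head = nn.Linear(hidden_size, num_mfs)
        self.role_head = nn.Linear(hidden_size, num_roles)

    def forward(self, shared_repr):
        return self.mf_head(shared_repr), self.role_head(shared_repr)

heads = MultiTaskHeads()
ce = nn.CrossEntropyLoss()
lambda_mf, lambda_role = 1.0, 1.0  # the paper sets both weights to 1

# Dummy shared representations and gold labels for a batch of 4 (tweet, entity) pairs.
shared = torch.randn(4, 768)
mf_gold, role_gold = torch.randint(0, 5, (4,)), torch.randint(0, 16, (4,))

mf_logits, role_logits = heads(shared)
loss = lambda_mf * ce(mf_logits, mf_gold) + lambda_role * ce(role_logits, role_gold)
loss.backward()
```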
We perform an ablation study, evaluating different versions of our model by adding and removing constraints and analyzing the corresponding errors. To study the effect of different rules and constraints on role prediction, we define three types of errors: (E1) Polarity Swap: the role of an entity with one polarity (positive/negative) is identified as a role of the opposite polarity. (E2) Mixed MFs: different entities in the same tweet are identified with roles from an MF other than the gold label of the tweet. (E3) Same Roles: all of the entities in a tweet are identified as having the same role when the gold labels differ. The analysis is shown in Table 8. First, we see that constraint c1, aligning the two decisions, does most of the heavy lifting and reduces error (E2) in all cases. Enforcing consistent polarities with c3 further improves performance and reduces error (E1), which it is designed for. c3 also reduces error (E3) in some models. Encouraging entities to have different roles with c2 does not improve the overall performance, but it helps reduce error (E3) when combined with c3. We use a soft version of c2, so it is not strictly enforced. We find that roles with negative sentiment are easier for the model to identify (Appendix D.4). Note that every MF has only one role with negative sentiment, and the model does not frequently swap role labels with different sentiments (E1). Therefore, determining the correct positive role is more challenging.

To analyze the political discussion using the moral sentiment towards entities, we collected more tweets from US politicians on the topic of Abortion and around the storming of the US Capitol on Jan. 6, 2021. The Abortion tweets span 2017 to Feb. 2021. For the US Capitol incident, we collected tweets from 7 days before and after the event, with the goal of studying any change in sentiment towards entities. We took noun phrases occurring at least 50 times, manually filtered out non-entities, and grouped different mentions of the same entity (Appendix D.6). We then collected tweets that mention these entities. Statistics for the resulting data can be found in Table 9. We re-trained our model using all of our labeled data and predicted the morality frames for each tweet in the new dataset. We performed a human evaluation of the predictions on this new data by randomly sampling 50 tweets from each issue. This resulted in 91 and 76 (tweet, entity) pairs for Abortion and US Capitol, respectively. This procedure resulted in an MF prediction accuracy of 88% for each issue, and a role prediction accuracy of 75% for Abortion and 60.44% for the US Capitol incident. We found that entities that appear less frequently in the training data have low precision for role prediction (see Table 10). Note that the US Capitol event was not observed during training, which makes it more challenging.

For Abortion, we observed that Democrats mention the entity Women most, and 84% of the time the predicted MF role is target of care/harm or fairness/cheating; it is never assigned a negative role (possibly because of constraint c3). For Republicans, we observed the same pattern for the entity Life (statistics in Appendix D.8). However, in a few cases (2.4%) Life is predicted as the entity ensuring fairness/purity/care, justified authority, or being loyal. While these roles carry a positive sentiment, they are intuitively wrong predictions for Life. We found that in 34.21% of such cases there are multiple mentions of Life in the same tweet. Given that constraint c2 encourages different roles for different entities in a tweet, this can be the source of this error. Examples for these cases can be found in Appendix D.9.
In this section, we first characterize the political discussion on Abortion using the predicted morality frames. Then, we analyze how an event impacts the moral sentiment towards entities by looking at the usage of MF roles before and after the 2021 US Capitol Storming for the different parties.

We first look at the MFs that were most frequent for both parties (Appendix E.1). To analyze MF role usage, we list the most frequent entities and their most frequent moral roles in Figure 2a. The left portrays entities related to Reproduction Freedom as the target of Fairness/Cheating, while on the right the top target of Purity/Degradation is Life. Both parties use Planned Parenthood frequently, but their sentiment towards it differs. To further examine this, we plot Planned Parenthood's polarity graph in Figure 2b. It shows that the parties express opposite sentiments towards Planned Parenthood. These findings are consistent with the known stances of Democrats and Republicans on this topic.

Entity-Relation Graph: We examine how the political discussion is framed by each party by looking at the sentiments expressed towards different entities, regardless of whether they use the same high-level MF. We look at Care/Harm, which is frequently used by both parties, and take the two most used targets by each party. We then take the top three care-providing and harming entities used in the same tweet as the target. We assign the most common role to each entity and represent it in an entity-relation graph in Figures 1a-1b (a small sketch of this aggregation is given at the end of this section). We can see that both Democrats and Republicans express care for Women, but the caring and harming entities vary highly across parties. For example, the left portrays Planned Parenthood as the caring entity, while the right portrays it as the harming entity. This analysis shows that, while there is overlap in the MFs used, the moral roles of entities can highlight the differences between parties in politically polarized discussions at an aggregate level.

To analyze how the moral sentiment towards entities changed after the storming of the US Capitol on January 6, 2021, we look at the sentiment towards entities before and after the event. We found that Authority/Subversion and Care/Harm were the two most used moral foundations after the incident for both parties (Appendix E.1). In Table 11, we present the top three most frequent entities for role types under Care/Harm and Authority/Subversion, before and after the event. Entities appearing fewer than 15 times are omitted from this analysis. Our model predicted that, after the event, the left justified the authority of Mike Pence, and violence appeared as a harming entity even before the event occurred. On the right, Trump shifted from an entity providing care prior to the event to a harming entity after the event. We show some relevant tweets and their corresponding predictions in the Appendix.
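The aggregation behind the entity-relation graphs can be sketched as follows. This is an approximation of the procedure described above; field names, thresholds, and the exact tie-breaking are assumptions, not the authors' code.

```python
from collections import Counter, defaultdict

def entity_relation_graph(predictions, party, mf="Care/Harm",
                          top_targets=2, top_sources=3):
    """predictions: list of dicts with keys tweet_id, party, mf, entity, role."""
    preds = [p for p in predictions if p["party"] == party and p["mf"] == mf]

    # Most frequent targets of care/harm for this party.
    targets = Counter(p["entity"] for p in preds
                      if p["role"] == "target of care/harm")

    # Group predictions by tweet so we can find entities co-occurring with a target.
    by_tweet = defaultdict(list)
    for p in preds:
        by_tweet[p["tweet_id"]].append(p)

    graph = defaultdict(Counter)
    for target, _ in targets.most_common(top_targets):
        for tweet_preds in by_tweet.values():
            if target not in {p["entity"] for p in tweet_preds}:
                continue
            for p in tweet_preds:
                if p["entity"] != target and p["role"] in (
                        "entity providing care", "entity causing harm"):
                    # Edge label: the role most often assigned to the source entity.
                    graph[target][(p["entity"], p["role"])] += 1

    # Keep the top co-occurring caring/harming entities per target.
    return {t: edges.most_common(top_sources) for t, edges in graph.items()}
```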
In this paper, we present the first study of Moral Foundations Theory at the entity level, assigning moral roles to entities, and present a novel dataset of political tweets annotated for this purpose. We propose a relational model that predicts moral foundations and the moral roles of entities jointly, and show the effectiveness of modeling dependencies and contextualizing information for this task. Finally, we analyze political discussions in the US using these predictions and show the usefulness of our proposed schema. In the future, we intend to study how morality frames and our relational framework can be applied in other settings, where contextualizing information is not observed.

To the best of our knowledge, no code of ethics was violated throughout the annotations and experiments done in this paper. We used human annotation to annotate an existing dataset with new labels. We adequately acknowledged the dataset, and its various properties are explained thoroughly in the paper. While annotating, we respected the privacy rights of the crowd annotators and did not ask for any personal details of the anonymous human annotators. They were informed that the task contains potentially sensitive political content. The crowd annotators were fairly compensated with rewards per annotation. We determined a fair amount of compensation by taking into consideration the feedback from the annotators and comparing our reward with other annotation tasks on the crowd-sourcing platform. The dataset presented is comprised of tweets, and for the reviewers we only submitted a subset of the tweets with text. We will replace the tweet text with tweet IDs when publishing it publicly, to respect the privacy policy of Twitter. We did a thorough qualitative and quantitative evaluation of our annotated dataset, presented in the paper. We reported all pre-processing steps, hyper-parameters, and other technical details, and will release our code and data for reproducibility. Due to space constraints, we moved some of the pre-processing steps, detailed hyper-parameter information, and additional results to the Appendix. The results reported in this paper support our claims and we believe that they are reproducible. Any qualitative result we report is the outcome of a machine learning model and does not represent the authors' personal views, nor the official stances of the political parties analyzed. As we study text written by humans to identify moral sentiment and draw conclusions, we rely on a machine learning model which is more interpretable than an end-to-end deep learning model.

Identification and Annotation of Tweets with 'Purity/Degradation': To collect more tweets on Purity/Degradation, we took more examples from the unlabeled segment of the dataset (93K tweets). We filtered these down to 619 tweets based on lexicon matching with the Moral Foundations Dictionary entries for Purity/Degradation. Two of the authors of this paper then individually went over the 619 tweets and selected tweets having Purity/Degradation as the primary moral foundation. The two authors agreed on 95% of the cases. We then combined the two lists and resolved any disagreement by discussion. In this manner we found 44 tweets on Purity/Degradation. We then annotated these 44 Purity/Degradation tweets with the 17 Policy Frames in the same manner: two authors of this paper annotated them for Policy Frames individually, agreeing on 47% of the cases about the primary policy frame in a tweet. Most of the disagreements occurred when more than one policy frame was present, and the authors resolved them by discussion.

Full Dataset Statistics: The statistics of the full dataset can be found in Table 13.
MORAL FOUNDATION        TWEETS   LEFT  RIGHT   ABO   ACA   GUN   IMM  LGBT   TER
Care/Harm                  589    378    211    30   142   221    31    11   154
Fairness/Cheating          264    201     63    42    81    33    22    73    13
Loyalty/Betrayal           231    167     64    15    20    92    28    24    52
Authority/Subversion       471    200    271    33   177    76    99    19    67
Purity/Degradation          44     13     31    21     3     8     6     2     4
TOTAL                     1599    959    640   141   423   430   186   129   290

Table 13: Dataset summary. LEFT/RIGHT gives the ideology split; ABO, ACA, GUN, IMM, LGBT, TER give the per-topic counts.

B.1 Questionnaire asked to the annotators for annotation of entity roles

The questionnaire asked to the annotators for all moral foundations can be found in Table 14. An excerpt of the questions for the Authority/Subversion and Purity/Degradation roles:
Justified authority over: If the LEADERSHIP or AUTHORITY is obeyed/praised/justified, then praised/obeyed by whom or justified over whom?
Failing authority: Which LEADERSHIP or AUTHORITY is disobeyed or failing or criticized?
Failing authority over: If the LEADERSHIP or AUTHORITY is disobeyed or failing or criticized, then failing to lead whom or disobeyed/criticized by whom?
Target of purity/degradation: What or who is SACRED, or subject to degradation?
Entity preserving purity: Who is ensuring or preserving the sanctity?
Entity causing degradation: Who is violating the sanctity or who is doing degradation or who is the target of disgust?

To determine the partisanship of the elements (1) moral foundations and (2) (moral foundation role: entity) tuples, we use the z-score measure of these elements in the two political ideologies (left, right). We calculate the z-score to evaluate whether two groups (e.g., left and right) differ significantly on some single characteristic; in our case the characteristics are elements of type (1) or type (2) as described above. A positive z-score means an element is left-partisan and a negative score means it is right-partisan. The most frequent entities per moral role can be found in Table 15. To examine how well moral roles account for political standpoints when compared to moral foundations, we use the moral foundations (MF) and (moral foundation role, entity) tuples (MFR) as one-hot encoded features to classify the ideology of the tweet (left/right). The results are shown in Table 16. Moral roles classify the ideology reasonably well compared to MF and BoW features, which demonstrates the usefulness of moral roles for capturing political perspectives.

Moral roles with positive polarity: Target of care/harm, Entity providing care, Target of fairness/cheating, Entity ensuring fairness, Target of loyalty/betrayal, Entity being loyal, Justified authority, Justified authority over, Failing authority over, Target of purity/degradation, Entity preserving purity.

PSL models are specified using weighted Horn clauses, which are compiled into a Hinge-Loss Markov Random Field (HL-MRF), a class of undirected probabilistic graphical model. In HL-MRFs, a probability distribution is defined over continuous values in the range [0, 1], and dependencies among them are modeled using linear and quadratic hinge functions. This way, they define a probability density function:

P(Y | X) = (1/Z) exp( - Σ_r w_r ψ_r(Y, X) )     (1)

where w_r is the rule weight, Z is a normalization constant, and ψ_r(Y, X) = max{l_r(X, Y), 0}^{ρ_r} is the hinge-loss potential corresponding to the instantiation of rule r, represented by a linear function l_r of X and Y, with an optional exponent ρ_r ∈ {1, 2}. Inference in PSL is performed by finding a MAP estimate of the random variables Y given evidence X; this is done by maximizing the density function in Eq. 1 as arg max_Y P(Y | X). To solve this, they use the Alternating Direction Method of Multipliers (ADMM), an efficient convex optimization procedure.
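To make the HL-MRF scoring concrete, here is a small numeric sketch of the hinge-loss potential and the unnormalized density in Eq. 1. It is purely illustrative and is not the PSL implementation.

```python
import math

def hinge_potential(l_r: float, rho: int = 1) -> float:
    """psi_r(Y, X) = max(l_r(X, Y), 0) ** rho, with rho in {1, 2}."""
    return max(l_r, 0.0) ** rho

def unnormalized_density(weights, linear_terms, rhos):
    """exp(-sum_r w_r * psi_r); Z is omitted since MAP inference does not need it."""
    energy = sum(w * hinge_potential(l, rho)
                 for w, l, rho in zip(weights, linear_terms, rhos))
    return math.exp(-energy)

# Two groundings: one satisfied rule (l_r <= 0, zero penalty) and one violated rule.
print(unnormalized_density(weights=[1.5, 2.0], linear_terms=[-0.2, 0.4], rhos=[1, 2]))
```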
Weights can be learned through maximum likelihood estimation by using the structured perceptron algorithm. The partial derivative of the log of the likelihood function in Eq. 1 with respect to a parameter W_r is:

∂ log P(Y | X) / ∂ W_r = E_W[ Σ_r ψ_r(Y, X) ] - Σ_r ψ_r(Y, X)     (2)

where the sums range over the groundings of the rule template with weight W_r, and E_W is the expectation under the distribution defined by W. Given that computing this expectation is intractable, it is approximated by taking the values in the MAP state. This approximation makes this learning approach a structured variant of the voted perceptron. Note that alternative estimations are also supported. More details can be found in the original paper (Bach et al., 2017).

Rules in DRaiL can be weighted (i.e., classifiers, soft constraints) or unweighted (i.e., hard constraints). MAP inference is defined as a linear program:

arg max_y Σ_r w_r ψ_r(x_r, y_r)     (3)

where each rule grounding r, generated from template t, with input features x_r and predicted variables y_r, defines the potential ψ_r(x_r, y_r), added to the linear program with weight w_r. DRaiL implements both exact and approximate inference to solve the MAP problem; in the latter case, the AD3 algorithm is used (Martins et al., 2015). In DRaiL, weights w_r are learned using neural networks defined over a parameter set θ. Parameters can be learned locally, by training each rule independently, or globally, by using inference to ensure that the scoring functions for all rules result in a globally consistent decision. To train global models using large-margin estimation, DRaiL uses the structured hinge loss:

max( 0, max_ŷ [ Σ_r Φ_t(x_r, ŷ_r) + Δ(y, ŷ) ] - Σ_r Φ_t(x_r, y_r) )     (4)

where Φ_t represents the neural network associated with rule template t and parameter set θ_t, and Δ(y, ŷ) measures the distance between the gold and predicted assignments. Here, y corresponds to the gold assignments, and ŷ corresponds to the prediction resulting from the MAP inference defined in Eq. 3. Note that alternative estimations are also supported. More details can be found in the original paper (Pacheco and Goldwasser, 2021).

We do task-adaptive pretraining for BERT (Gururangan et al., 2020) and fine-tune it on a large number of unlabeled tweets. To select unlabeled tweets, we build a topic-specific lexicon of n-grams (n ≤ 5) from our training dataset based on Pointwise Mutual Information (PMI) scores (Church and Hanks, 1990). Namely, for an n-gram w we calculate the pointwise mutual information with a label l (e.g., topic), I(w, l), using the following formula:

I(w, l) = log ( P(w | l) / P(w) )

where P(w | l) is computed by taking all tweets with label l and computing count(w) / count(all words). Similarly, P(w) is computed by counting n-gram w over the set of tweets with any label. To construct the lexicon, we rank n-grams for each label based on their PMI scores. We explore three pretraining objectives, described below. In all cases, models were initialized using BERT (Devlin et al., 2019).

Masked Language Modeling: We randomly mask some of the tokens from the input, and predict the original vocabulary id of each masked token based on its context (Devlin et al., 2019).

Whole Word Masking: Instead of masking randomly selected tokens, which may be sub-segments of words, we mask randomly selected whole words.

Moral Foundations Dictionary: We create a lexicon for each moral foundation from the dataset by Johnson and Goldwasser (2018) using the same PMI formula described above. We use the normalized PMI scores as weights for each unigram, and assign a weight of 1 to unigrams in the Moral Foundations Dictionary (MFD) (Graham et al., 2009). We score a tweet by summing the scores of words matching the lexicon. We take the highest scoring moral foundation for each tweet, and fine-tune a moral foundation classifier using this weakly annotated data.
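A minimal sketch of the PMI-based lexicon construction and the lexicon scoring described above. It is illustrative only; the smoothing, normalization, and cutoff choices are assumptions, not the exact procedure used in the paper.

```python
import math
from collections import Counter

def pmi_lexicon(tweets_by_label, top_k=100):
    """tweets_by_label: dict label -> list of token lists. Returns label -> top unigrams by PMI."""
    label_counts = {l: Counter(tok for toks in tweets for tok in toks)
                    for l, tweets in tweets_by_label.items()}
    all_counts = sum(label_counts.values(), Counter())
    total = sum(all_counts.values())

    lexicon = {}
    for label, counts in label_counts.items():
        label_total = sum(counts.values())
        scores = {}
        for w, c in counts.items():
            p_w_given_l = c / label_total
            p_w = all_counts[w] / total
            scores[w] = math.log(p_w_given_l / p_w)  # I(w, l) = log P(w|l) / P(w)
        lexicon[label] = dict(sorted(scores.items(), key=lambda x: -x[1])[:top_k])
    return lexicon

def score_tweet(tokens, lexicon):
    """Sum lexicon weights of matching words; return the highest-scoring label."""
    totals = {label: sum(weights.get(t, 0.0) for t in tokens)
              for label, weights in lexicon.items()}
    return max(totals, key=totals.get)
```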
We evaluate these objectives by performing the pre-training stage on the unlabeled data and fine-tuning the encoder for our base task of leveraging only text to predict moral foundations and entity roles. Results can be seen in Table 17.

Lexicon Matching: We label a tweet with the moral foundation with the maximum score based on lexicon matching, using the moral foundation lexicons created in Appendix D.1. If there is no lexicon match for a tweet, we assign it a moral foundation label randomly. For the multi-task baseline, task-specific classifiers are used on top of the shared representations; the loss functions are added as L = λ1·L_MF + λ2·L_Role, and we set λ1 = λ2 = 1. For topic and ideology embeddings, we use feed-forward computations with hidden layers of size 100 and ReLU activations. For BERT we use the same configuration as the end-to-end classifiers. For the underlying BERT, we use the default parameters of the Hugging Face implementation. Other parameters can be observed in Table 18 (top). The bottom part of Table 18 shows the validation performance during the learning of the best performing model. The per-class classification results are also included.

All experiments were run on a 4-core Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz machine with 64GB RAM and an NVIDIA GeForce GTX 1080 Ti 11GB GDDR5X GPU. Runtimes for our models can be found in Table 20.

Examples of erroneous predictions for the entity 'Life' include the following tweet: "I will always, always, ALWAYS be proud to Stand 4 [Life]_BEING-LOYAL. I'm so grateful to @TXRightToLife for their support and pledge to never stop fighting for the [unborn]_TARGET-OF-LOYALTY. Now, Texas, let's get out and vote to #KeepTexasRed!" (predicted MF: Loyalty/Betrayal). Here the MF prediction is correct, but in MF role prediction the model makes a mistake when there are multiple mentions of the same entity, possibly because of constraint c2, while still assigning a positive role to 'Life', possibly because of constraint c3. In another example, the MF prediction is 'Care/Harm', possibly because there is a notion of protecting babies, and the same role-prediction error occurs.

The most targeted entities and entity-relation graphs after the US Capitol Storming (2021) are shown in Figures 3 and 4, respectively. The entity groupings used for this analysis are:
Raskin: raskin
Capitol: capitol, capitol building, capitol hill, nation capitol
Impeachment: impeachment, impeach president
Kamala Harris: kamala harris, vice president elect
Capitol Police: capitol police, police officer, law enforcement, law enforcement officer
Mike Pence: pence, vp pence, mike pence
Mitch McConnell: mitch mcconnell, mcconnell
GOP: house gop, gop leader, gop, republican
Domestic Terrorism: domestic terrorist, domestic terrorism
Nation: nation
National Security: national security, national guard
Democrats: dem, democrat, house democrat
Violence: violence, violent insurrection, violent attack, violent mob
White Supremacist: white supremacist
Fair Election: fair election

D.7 Human Evaluation on Test Data

Model Prediction Validation: We trained our model with all of our labeled data and used it to predict the moral foundations and entity roles of (tweet, entity) pairs in the new set. The validation set (randomly selected from the train set) weighted F1 scores were 72.20% and 64.59% for moral foundations and roles, respectively. We validate our model's predictions on the unseen dataset using human evaluation. We randomly sampled 50 tweets from each of the two test sets.
This resulted in 91 and 76 (tweet, entity) pairs for Abortion and US Capitol, respectively. Note that one tweet may contain more than one entity. We then presented the predictions of moral foundations and entity roles to two graduate students and asked them whether each prediction was correct. We found the Cohen's Kappa (Cohen, 1960) score between the annotators to be 0.50 (moderate agreement) and 0.64 (substantial agreement) for the moral foundations and entity roles, respectively. In case of a disagreement, we asked a third graduate student to break the tie. The accuracy of the model for moral foundations was 88% for each topic, while for roles it was 75% and 60.44% for Abortion and US Capitol, respectively.

Acknowledgements

We thank Nikhil Mehta, Rajkumar Pujari, and the anonymous reviewers for their insightful comments. This work was partially supported by an NSF CAREER award IIS-2048001.