key: cord-0774062-koo7l9b9
authors: Jing, Elise; Ahn, Yong-Yeol
title: Characterizing partisan political narrative frameworks about COVID-19 on Twitter
date: 2021-10-30
journal: EPJ Data Sci
DOI: 10.1140/epjds/s13688-021-00308-4
sha: c57513d367eed637de0f008501b1aa19b15382a0
doc_id: 774062
cord_uid: koo7l9b9

The COVID-19 pandemic is a global crisis that has been testing every society and exposing the critical role of local politics in crisis response. In the United States, there has been a strong partisan divide between the Democratic and Republican party’s narratives about the pandemic which resulted in polarization of individual behaviors and divergent policy adoption across regions. As shown in this case, as well as in most major social issues, strongly polarized narrative frameworks facilitate such narratives. To understand polarization and other social chasms, it is critical to dissect these diverging narratives. Here, taking the Democratic and Republican political social media posts about the pandemic as a case study, we demonstrate that a combination of computational methods can provide useful insights into the different contexts, framing, and characters and relationships that construct their narrative frameworks which individual posts source from. Leveraging a dataset of tweets from the politicians in the U.S., including the ex-president, members of Congress, and state governors, we found that the Democrats’ narrative tends to be more concerned with the pandemic as well as financial and social support, while the Republicans discuss more about other political entities such as China. We then perform an automatic framing analysis to characterize the ways in which they frame their narratives, where we found that the Democrats emphasize the government’s role in responding to the pandemic, and the Republicans emphasize the roles of individuals and support for small businesses. Finally, we present a semantic role analysis that uncovers the important characters and relationships in their narratives as well as how they facilitate a membership categorization process. Our findings concretely expose the gaps in the “elusive consensus” between the two parties. Our methodologies may be applied to computationally study narratives in various domains. 
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1140/epjds/s13688-021-00308-4.

Human beings make sense of the reality around them by constructing narratives using what they see, hear, and encounter [49]. However, narratives that evolve around different identities, cultures, religions, etc. are often at odds with each other [42]. One of the areas where contrasting narratives fiercely collide and fight is politics. Political communication often happens through narratives and stories, rather than logical reasoning [5, 24]. Many politicians have been effectively using their tweets to spread their narratives [28]. While there have been studies on the hashtags [25], sentiments [33], and moral values [29] from the politicians' tweets, systematic studies of political narratives on Twitter are rare, although political science increasingly adopts text analysis methods [58].

While the scale of social media data provides great opportunities, it also poses many challenges. Traditional approaches to narrative studies through "close reading" [43] may allow deep understanding of narratives, but are labor-intensive and rely on subjective judgements. Such constraints may be addressed by computational methods, where we can automatically identify patterns in large datasets. For example, Shurafa et al. [53] studied hashtags and rhetorical devices used by U.S. Twitter users leaning towards the Democratic or Republican parties, and identified their framing preference regarding the COVID-19 crisis; Green et al. [20] identified key words from politicians' tweets, and showed that partisanship can be inferred from their word usage. However, these studies rely on word-level analysis and Twitter hashtags, while in-depth analysis of such narratives is rarely attempted.

Additionally, the brief nature of Twitter postings makes it unlikely for each of them to contain a complete narrative. Rather, each tweet may contain "fragments" of a larger narrative. While human readers can often infer the overarching narrative based on their reading of other tweets and background knowledge, it is difficult for computational models to do so. A similar challenge is identified by Tangherlini, et al. in their study of online conspiracy theories [55] , where the complete narrative is often scattered in multiple short postings. Their response is to consider a narrative framework consisting of "cast of characters, the relationships between those characters, and the contexts in which those relationships arise", which individual postings sample from. Similarly, we consider two narrative frameworks for the U.S. Democratic and Republican parties, which are conceptualized by the aggregation of each party's tweets respectively, containing the contexts, characters, and relationships used by each party's narrative. Individual tweets draw their "ingredients" from this larger space, and allude to the complete narrative therein.

Following this intuition, we characterize the narrative frameworks for the two parties by analyzing collections of their tweets to identify three elements: context, framing, and characters and relationships. Our approach has two key differences from Tangherlini et al. [55]: (i) we consider the context as the main topics and issues that each party engages with, instead of characterizing it with relationships; and (ii) we examine framing separately, as we consider it to be a central piece of political discourse, which shapes how political narratives are conveyed to the audience independently of what is communicated (we further elaborate on this below). In doing so, we aim to provide a more nuanced analysis beyond the common term-based approaches.

First, we analyze the word frequencies in the tweets and identify the most characteristic words used by each party; this simple method allows us to see the most contrasting differences in each group's narratives at the level of "ingredients", which set up the contexts for their narrative frameworks.

Next, we ask how they are framed. Framing analysis is a central piece in political discourse analysis [57]. Framing is about selectively presenting some aspects of an issue and making them more salient, in order to promote certain values, interpretations, or solutions [13]. For example, on the undocumented immigration issue, the Democrats often focus on the human rights aspect, while the Republicans often focus on the legality. Similar divergence in framing between the two parties is widely recognized across major political issues. Hemphill et al. [25] showed that using Twitter data, a machine learning classifier can be trained to easily predict the partisanship of a politician from the frames that they use.

Traditional studies on political framing mostly rely on manual content analysis and discourse analysis to detect frames from texts [46], and are therefore confined to a small set of frames because the process is labor-intensive. Here, we employ the FrameAxis model [35], which was developed to facilitate this process by using word embeddings and antonymous word pairs. With this method, the overall bias (the alignment with a frame) and intensity (the strength of a frame) of a document with respect to many "microframes" can be computed. We apply FrameAxis to identify important frames in the politicians' tweets about COVID-19. For example, we found that the microframe dead vs. live is used to discuss the deaths related to COVID-19, and the microframe fast vs. slow is used to discuss the spread of COVID-19.

Finally, we analyze the characters and relationships in each party's narrative framework. We focus on the relationships captured by actions, the Agent (the one who initiates an action), and the Patient (the one being affected or the recipient of the action). For example, in the sentence Mary sold the book, Mary is the Agent, book is the Patient, and the relationship is captured in the verb sold. The Agent-Patient-Action pattern appears to be universal in human cognition [8].

We use semantic role labeling (SRL) models to automatically identify Agents, Patients, and verbs in our dataset. Originating in traditional linguistics [16], SRL has attracted much interest from computational linguistics, leading to the development of large annotated corpora such as FrameNet [1] and PropBank [32]. Trained on such corpora, modern NLP platforms such as SENNA and AllenNLP can perform the SRL task with high accuracy [9, 18]. With the development of deep learning, SRL has been successfully applied to analyze events either as a stand-alone work or as part of an NLP pipeline [14, 27, 37]. As different semantic roles can refer to the same underlying character (e.g. "Kamala Harris" and "Vice President Harris" refer to the same person), other NLP techniques such as named entity recognition and coreference resolution are sometimes used to aggregate similar semantic roles and verbs [55].

We are especially interested in the characters that play key roles in the COVID-19 crisis and the relationships between them. For example, when the Democrats use the word "help", who are to be helped and who will help them? Furthermore, how are these agents different in the Republican tweets? Our analysis shows the most prominent Agents and Patients in the Democratic/Republican narratives about the pandemic as well as the partisan differences. In particular, we identify a membership categorization process, namely the division between "us" and "them", where "us" is often projected as the heroes and "them" as the villains in each party's narratives. As the most general membership categories, they help people to organize their everyday knowledge and actions [51] . For example, the former President Donald Trump frequently used this categorization in his campaign: "They hate me. They hate you. They hate rallies and it's all because they hate the idea of MAKING AMERICA GREAT AGAIN!" [38] . Our analysis reveals a similar process where memberships are established by the interaction between characters.

Overall, our work applies a set of computational methods to comprehensively describe the elements making up the two parties' narrative frameworks, as well as how they diverge. Such divergence may be one of the "wedges" that exacerbate polarization in U.S. politics.

The combination of methods we employ here to explore political narratives is not limited to politics. The code we develop and publish would allow similar automatic analysis in various domains.

We collect data from major U.S. politicians on Twitter. Using the Twitter lists created by cspan, we retrieve screen names of politicians including U.S. Senators, House Representatives, state governors, and former President Trump. These Twitter accounts may be managed by the politicians or their staff, but in either case, they convey the messages from these politicians and are integral parts of their public images. We collect tweets from these accounts monthly starting in April 2020. In this study, we use tweets timestamped between February 1, 2020 (one week after Wuhan's lockdown started) and July 22, 2020. We use the full texts of tweets and only keep the English tweets.

The number of politicians' tweets from each group is summarized in Table 1. We found that the Democratic politicians tend to post more compared to their Republican peers. Figure 1 shows the distribution of politicians' posting frequencies and the length distribution of the tweets. We found a highly skewed distribution, where a few politicians tweet often and most only tweet occasionally. The majority of tweets have between 20 and 50 words for both groups.

Because we are most interested in the COVID-19 related political discourse, we identify COVID-19 related tweets by checking if "COVID" or "coronavirus" is present in a tweet (case insensitive). This may omit some tweets that are about the pandemic but do not mention the name, but it ensures that all tweets we consider are related to COVID-19.
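The keyword filter described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names and the compiled pattern are assumptions, and the check is a plain case-insensitive substring match on the two keywords.

```python
import re

# Case-insensitive match for either keyword anywhere in the tweet text.
COVID_PATTERN = re.compile(r"covid|coronavirus", re.IGNORECASE)


def is_covid_related(text: str) -> bool:
    """Return True if the tweet mentions "COVID" or "coronavirus"."""
    return COVID_PATTERN.search(text) is not None


def split_by_topic(tweets):
    """Split tweets into (COVID-related, non-related) lists, keeping order."""
    related = [t for t in tweets if is_covid_related(t)]
    background = [t for t in tweets if not is_covid_related(t)]
    return related, background
```

Because the match is on substrings, hashtags such as "#COVID19" are also captured, while a tweet that only says "the virus" falls into the non-related group, which is the trade-off noted in the text.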

The number of COVID-19 related versus non-related tweets is shown in Table 2.

For an overall understanding of the topics and key issues that set up the contexts of each party's narrative framework, we identify the over-represented words in their tweets. We use the log-odds ratios with informative Dirichlet priors [41] by computing the log-odds ratio of each word w in two corpora i and j, with a background corpus bg as prior. This is formally expressed as:

$$\delta_w^{(i-j)} = \log\frac{f_w^i + f_w^{bg}}{n^i + n^{bg} - (f_w^i + f_w^{bg})} - \log\frac{f_w^j + f_w^{bg}}{n^j + n^{bg} - (f_w^j + f_w^{bg})}$$

where $f_w^i$ is the frequency of the word in the target corpus; for example, words in the COVID-19 related Democratic tweets. $f_w^{bg}$ is the frequency of the word in the background corpus. In this case, it is the combination of the Democratic and Republican tweets that are not related to COVID-19. $n^i$ is the size of the target corpus, and $n^{bg}$ is the size of the background corpus. $f_w^j$ is the frequency of the word in the other corpus, in this case, the COVID-19 related Republican tweets; and $n^j$ is the size of this corpus. Furthermore, we compute the z-scores of the log-odds ratio as:

$$z_w = \frac{\delta_w^{(i-j)}}{\sqrt{\sigma^2\left(\delta_w^{(i-j)}\right)}}, \qquad \sigma^2\left(\delta_w^{(i-j)}\right) \approx \frac{1}{f_w^i + f_w^{bg}} + \frac{1}{f_w^j + f_w^{bg}}$$

where the term under the square root in the denominator serves as an estimate of the variance of the log-odds ratio. We choose the top 40 words with highest z-scores from each party's COVID-related tweets as the most over-represented words. We exclude the politicians' names and Twitter handles as they tend to be over-represented in each party's tweets. To better explore these words and the topics they represent, we obtain their embeddings using a word embedding model. While many word embedding models are available, we choose the GloVe [50] embeddings as it is considered one of the most effective word embedding models [44] and is widely used. We use the pre-trained GloVe model with 6 billion tokens and a dimensionality of 300.
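As a rough sketch of this computation, the function below adds each word's background count to its count in both corpora, takes the difference of the two log-odds, and divides by the square root of the estimated variance 1/(f_i + f_bg) + 1/(f_j + f_bg), following the usual formulation of [41]. The function name and the `Counter` inputs are illustrative assumptions; skipping words whose smoothed count is zero in either corpus is a simplification made here to avoid taking the logarithm of zero.

```python
import math
from collections import Counter


def log_odds_z_scores(counts_i, counts_j, counts_bg):
    """z-scored log-odds ratio of each word in corpus i versus corpus j,
    using the background corpus counts as an informative prior."""
    n_i = sum(counts_i.values())
    n_j = sum(counts_j.values())
    n_bg = sum(counts_bg.values())
    z_scores = {}
    for word in set(counts_i) | set(counts_j):
        f_bg = counts_bg.get(word, 0)
        a_i = counts_i.get(word, 0) + f_bg  # prior-smoothed count in i
        a_j = counts_j.get(word, 0) + f_bg  # prior-smoothed count in j
        if a_i == 0 or a_j == 0:
            continue  # simplification: undefined log-odds, skip the word
        delta = (math.log(a_i / (n_i + n_bg - a_i))
                 - math.log(a_j / (n_j + n_bg - a_j)))
        variance = 1.0 / a_i + 1.0 / a_j
        z_scores[word] = delta / math.sqrt(variance)
    return z_scores


def top_words(z_scores, k=40):
    """Words most over-represented in corpus i (highest z-scores first)."""
    return sorted(z_scores, key=z_scores.get, reverse=True)[:k]
```

Calling `top_words` on the negated scores, or swapping the two corpora, gives the words most over-represented in the other corpus.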

As many of the topic words are specific to the COVID-19 crisis, we train a new GloVe model on our tweet corpus for 500 epochs to obtain embeddings for words not in the pre-trained GloVe model. Furthermore, for a consistent representation of terms related to "COVID", we compile a list of all tokens including "COVID" or "coronavirus" and replace them with "COVID" in the corpora. After removing emojis and words without embeddings, we show the top 35 words for each party.
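The token normalization step might look like the sketch below. It assumes whitespace tokenization and replaces any token containing one of the two keywords with the single form "COVID"; the actual token list compiled by the authors is not given in the text, so this is only an approximation, and the function name is hypothetical.

```python
import re

# Any whitespace-delimited token containing one of the keywords is collapsed.
KEYWORD = re.compile(r"covid|coronavirus", re.IGNORECASE)


def normalize_covid_tokens(text: str) -> str:
    """Replace every token that contains "covid" or "coronavirus" with "COVID"."""
    return " ".join(
        "COVID" if KEYWORD.search(token) else token
        for token in text.split()
    )
```

Note that this version also swallows punctuation attached to a matching token (for example "coronavirus," becomes "COVID"), which may or may not match the original preprocessing.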

To explore the topic words visually, we use the Uniform Manifold Approximation and Projection (UMAP), an effective [59] and efficient [3] dimensionality reduction method, to reduce the dimensionality of the GloVe embeddings. This method works by finding low-dimensional projections of the data that preserve their topological structures in high-dimensional space as much as possible [39]. We use the Python package umap. We plot the word embeddings with the dimensionality reduced to 2. With this visual aid, we identify and manually label six clusters for the Democratic tweets and three for the Republican tweets (see Sect. 3).

Most of the traditional framing analysis methods rely on "close reading" and manual examination of linguistic material, and are therefore challenging to apply to our dataset. Here, we employ the FrameAxis model [35], which allows an exploratory framing analysis through "microframes". A microframe is operationalized as a pair of antonyms, such as "legal" and "illegal", or "fast" and "slow". In political science research, antonym pairs have been used successfully to capture political stances. For example, the Moral Foundations Theory uses five pairs of antonyms such as "Care/Harm" and "Fairness/Cheating" to serve as moral "axes" [22]. Here we use 1621 antonym pairs obtained from WordNet [40].

We then compute the bias and intensity of each microframe present in a document based on the vector representations of the microframes and other words in the text. We define the contribution of a word to a microframe as the cosine similarity between the word vector $v_w$ and the microframe's vector $v_f$ (see Kwak et al. [35] for details):

$$c_f^w = \frac{v_w \cdot v_f}{\lVert v_w \rVert \, \lVert v_f \rVert}$$

The bias of a microframe is defined as the average contribution of all words in the document to the microframe. It captures the stance of a political argument; for example, a conservative document on the immigration issue may be biased towards illegal rather than legal in the illegal versus legal microframe. Formally, the bias is computed as

$$B_f^t = \frac{\sum_{w \in t} n_w \, c_f^w}{\sum_{w \in t} n_w}$$

where t is a document, f is a microframe, and $n_w$ is the number of occurrences of word w in t. Meanwhile, the intensity of a microframe captures how strongly it is presented in a document, regardless of which "pole" the document is closer to. The intensity is computed using the second moment of the word contribution with a background corpus as baseline:

$$I_f^t = \frac{\sum_{w \in t} n_w \left(c_f^w - B_f^T\right)^2}{\sum_{w \in t} n_w}$$

where $B_f^T$ is the baseline microframe bias of the entire text corpus T on a microframe f for computing the second moment. As the squared term is included in the equation, the words that are far from the baseline microframe bias, and close to either of the poles, contribute strongly to the microframe intensity.
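The bias and intensity computations can be illustrated with a small pure-Python sketch. It is not the FrameAxis implementation: the helper names and the dictionary of toy embeddings are assumptions, and the microframe vector is taken to be the difference between the two pole word vectors, as in Kwak et al. [35]. Words without an embedding are ignored.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def microframe_vector(emb, pole_a, pole_b):
    """Microframe vector, assumed to be the difference of the pole vectors."""
    return [a - b for a, b in zip(emb[pole_a], emb[pole_b])]


def bias(tokens, emb, frame_vec):
    """Count-weighted average contribution of the document's words."""
    counts = Counter(t for t in tokens if t in emb)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n * cosine(emb[w], frame_vec) for w, n in counts.items()) / total


def intensity(tokens, emb, frame_vec, baseline_bias):
    """Count-weighted second moment of contributions around the baseline bias."""
    counts = Counter(t for t in tokens if t in emb)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(
        n * (cosine(emb[w], frame_vec) - baseline_bias) ** 2
        for w, n in counts.items()
    ) / total
```

In practice, `baseline_bias` would be obtained by calling `bias` on the tokens of the whole background corpus for the same microframe, and both quantities would be computed for each tweet and each of the antonym pairs.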

Here we compute the bias and intensity for each COVID-19 related tweet, using a background of non-COVID-19 related tweets, for each microframe. We focus on the microframes with the largest difference in intensity between the two parties; for the Democratic party, we present the microframes where the intensity in Democratic tweets is higher than that in the Republican tweets, and vice versa. In addition to showing the microframes, we also show the top 3 tweets with the strongest intensity for each microframe.

To identify important semantic roles, we use the Python package AllenNLP [18] to perform semantic role labeling on our corpus. We focus on the verb, the Agents (Arg0 in the AllenNLP system), and the Patients (Arg1). To focus on the most common semantic roles, we only consider the Agents and Patients consisting of three or fewer tokens.
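The post-processing of the SRL output can be sketched as follows. The example assumes that the labeler returns, for each verb, one BIO tag per word (such as `B-ARG0`, `I-ARG0`, `B-V`, `B-ARG1`), which is the PropBank-style tagging scheme; the function names are hypothetical, and running the SRL model itself is left out.

```python
def spans_from_bio(words, tags):
    """Collect the first span of each label from parallel word/BIO-tag lists."""
    spans = {}
    label, tokens = None, []
    for word, tag in zip(words, tags):
        if tag.startswith("I-") and tag[2:] == label:
            tokens.append(word)  # continue the current span
            continue
        if label is not None:
            spans.setdefault(label, tokens)  # close the previous span
        if tag.startswith("B-"):
            label, tokens = tag[2:], [word]  # open a new span
        else:
            label, tokens = None, []
    if label is not None:
        spans.setdefault(label, tokens)
    return spans


def extract_triple(words, tags, max_tokens=3):
    """Return a lowercased (Agent, verb, Patient) triple, or None if a role is
    missing or the Agent or Patient is longer than max_tokens tokens."""
    spans = spans_from_bio(words, tags)
    if not all(key in spans for key in ("ARG0", "V", "ARG1")):
        return None
    agent, verb, patient = spans["ARG0"], spans["V"], spans["ARG1"]
    if len(agent) > max_tokens or len(patient) > max_tokens:
        return None
    return tuple(" ".join(span).lower() for span in (agent, verb, patient))
```

For the sentence used earlier, `extract_triple(["Mary", "sold", "the", "book"], ["B-ARG0", "B-V", "B-ARG1", "I-ARG1"])` yields the Agent "mary", the verb "sold", and the Patient "the book".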

To obtain a list of semantic roles specifically related to the Democratic and the Republican party, we produce two lists of terms most similar to the words "Democrat", "Democratic", and "Republican" using the GloVe embedding model we described above. The terms most similar to "Democrat" and "Democratic" include "dems", "housedemocrats", "reddemocrats", "democraticled", "pelosi", "speakerpelosi", "nancy pelosi", "chuck schumer", "ralph northam", "ayanna pressley", "gwen moore", and "senatedems". The terms most similar to "Republican" include "gop", "republicans", "president", "trump", "donald trump", "patrick mchenry", "larry hogan", "mitch mcconnell", and "mcconnell" (case insensitive).

We identify important verbs by considering the top 100 most frequent verbs in each party's tweets. We obtain the GloVe embeddings for each verb in the same manner as we describe above. We then use UMAP to reduce the dimensionality of the embeddings, and use the k-means clustering algorithm to group the verbs from each party into 15 clusters. This produces clusters of verbs that are semantically close to each other in daily usage, but also reveals some verb usages that are specific to parliamentary politics.

First, we look at the most characteristic words found in each party's tweets. We start by comparing each word's dense rank [31] in the COVID-related Democratic and Republican tweets and the background corpus to find words over-represented in the COVID-related tweets. While these tweets unsurprisingly feature many words shared between the parties, as shown in Table S1, we notice that the two parties have different focuses. We therefore use the log-odds ratio to identify the most representative words for each party in Fig. 2.

We find that the Democratic tweets have over-represented words related to media, such as "telephone", "town hall", and "facebook", while a similar cluster for the Republican tweets appears to be related to the White House and its press conferences, such as "whitehouse" and "press". Additionally, each party has words related to states, cities, and public figures from these places in the U.S. Meanwhile, the largest category in the Democratic tweets appears to be about the pandemic, such as "health", "response", "covid", "emergency", etc. Another cluster including "disparities" and "disproportionately" also suggests that they discuss issues about social and racial inequalities more. In the Republican case, few words such as "inittogether" appear to be directly related to the pandemic. Only the phrases and hashtags for certain regions such as "covidma" and "inthistogetherohio" are detected, indicating a much less active narrative regarding the pandemic from the Republicans. Lastly, both parties have some unique categories: the Democratic tweets have a cluster related specifically to testing, including words such as "tested" and "positive", and the Republican tweets have a particular cluster about China and the Chinese Communist Party, reflecting the ex-president's narrative against China.

The over-represented words give us a sense of the topics and issues that set up the context for each party's narrative frameworks. Our analysis of the framing used in each party's tweets reveals the ways in which they shape their narratives. While the two parties share many common microframes about the pandemic, such as new versus worn and endemic versus epidemic (see Figure S2), here we focus on the microframes that one party uses significantly more than the other. Figure 3 shows the bias and intensity for each of the top ten microframes we identify (see Sect. 2). For example, the Democratic tweets feature the public versus private frame more intensely than the Republican tweets, and at the same time they are more biased towards "public" rather than "private".

Since it is hard to interpret the pole words without context, we also show the tweets with the highest intensity for each microframe in Table 3. Combining the pole words and tweet texts, we find that the Democratic frames strongly feature the economic relief during the pandemic, discussing topics such as financial relief, increased funds for support, free testing, etc., which are picked up by the microframe pole words including free, financial, increased, and paid. Additionally, the public versus private microframe identifies the emphasis on the public aspect of the pandemic and its response. They also frequently tweet about live events and town hall meetings, invoking the live frame. Taken together, we interpret that they emphasize the roles that the government should play regarding the pandemic, in contrast to the Republican framing that we discuss below.

Republican microframes include aid for small business, the eligibility for financial aid, and securing the economy and nation. "Slowing the spread" appears to be the top slogan used in Republican tweets, emphasizing the roles that individuals play, which contrasts with the Democratic narrative. Additionally, the top tweets about declaring a national emergency, important information, and full statements also suggest that the Republicans tend to use Twitter as a channel for formal announcements.

Finally, we examine the characters in each party's narrative frameworks (people who need healthcare, travelers, voters, etc.) and their relationships. For insights into how these characters are represented in the politicians' tweets, we explore the semantic roles in these tweets, in particular, the Agents and Patients. We explore the most frequent Agents and Patients in both parties' tweets in Figure S1. We find many common semantic roles such as personal pronouns, but also notice some unique semantic roles, such as "the resources" and "lives" in Democratic tweets, and "COVID" and "relief" in Republican ones. Furthermore, the Republican tweets often mention the Agent "Democrats", and the Democratic tweets often use "Trump" and "the president".

For a more detailed analysis of the semantic roles, we consider the combinations of an Agent, a verb, and a Patient in each party's tweets. We use the frequency for each combination to identify the most characteristic combinations. We found 321,913 unique combinations in the Democratic tweets and 82,821 unique combinations in Republican tweets. Table 4 shows the top combinations whose frequency in Democratic tweets is higher than in Republican tweets, and vice versa.
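One way to rank such combinations is sketched below. It assumes that each party's tweets have already been reduced to a list of (Agent, verb, Patient) tuples and that the comparison uses raw count differences; the text does not say whether raw or normalized frequencies are used, so this choice, like the function name, is an assumption.

```python
from collections import Counter


def characteristic_combinations(triples_a, triples_b, top_n=10):
    """Combinations that are more frequent in party A than in party B,
    ranked by the difference in raw counts (ties broken alphabetically)."""
    counts_a, counts_b = Counter(triples_a), Counter(triples_b)
    # Counter returns 0 for combinations that never occur in party B.
    diffs = {t: counts_a[t] - counts_b[t] for t in counts_a}
    ranked = sorted(
        (t for t in diffs if diffs[t] > 0),
        key=lambda t: (-diffs[t], t),
    )
    return [(t, diffs[t]) for t in ranked[:top_n]]
```

Calling the function twice with the arguments swapped gives the two columns of a comparison like Table 4.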

We find that most of the top combinations from Democratic tweets convey the message that "they" need support and "we" do everything we can to provide the resources, save lives, etc., further confirming the emphasis on the public response to the pandemic that we found in our framing analysis. Meanwhile, the combinations from Republicans are more diverse, featuring combating COVID, holding press conferences, and aiding small businesses. Additionally, one combination discusses the threat of socialism.

From Figure S1 , we also notice that the Agents often contains personal pronouns such as "I", "we", "they", and both parties frequently discuss the opposite party, such as the Agent "Trump" from Democratic tweets, and "Democrats" from Republican tweets, evoking a membership categorization process. We therefore focus on the personal pronouns as Agents that we group into two categories-us, including the personal pronouns "I", "we", "us", "our", and "ours", and them, including the words "they", "their", and "them". Addition- Table 3 Three top tweets from each microframe with the largest difference in intensity between two parties. URLs, emojis, and some special characters are omitted Democratic microframe Republican microframe bound-free fast-slow "Free COVID testing is available near you. " "Today's free COVID testing sites" "Testing, testing, testing. the bill makes sure that COVID testing is free for all Americans. " 'Do your part to slow the spread of the Wuhan COVID:" "We all need to do our part to slow the spread of the COVID. here's what you can do to help:" "rt @housegop: are you doing your part to slow the spread of the COVID?" decreased-increased declared-undeclared "Check to see if you qualify for paid sick leave because of the COVID here" "Stand with @pattymurray and @sengillibrand and support the paid leave act to provide additional support to workers & businesses for paid family and sick leave during the COVID outbreak. " "Today, the house will vote on our next COVID response legislation to provide Americans wpaid family and medical leave, increased federal medicaid funds to support our state public health partners, free testing, & emergency sick leave for those impacted by the virus" "My statement after president @realdonaldtrump declared a #nationalemergency to respond to COVID. " "The first public health emergency was declared on March 6 and allows the state to increase coordination across all levels of government in the state's response to COVID. 
" "President @realdonaldtrump has declared today as national day of prayer. Please join me in praying for our country as we continue to respond to the COVID pandemic. " sure-unsure important-unimportant "#MD02 constituents, unsure where to turn for local COVID resources? check out the below graphic for the hotline for your county. " "The least the president can do is make sure they have the equipment they need. COVID 3/3" "The response to COVID needs to help all Americans. i'm working with my colleagues to make sure that it does. " "Important information for you and your family about the COVID" "Important information from @cdcgov regarding COVID" "Important COVID update from the @deptofdefense in the thread below. " critical-noncritical large-small "rt @frankpallone: @WHO is critical in the fight against the COVID pandemic. Trump must work with the world's premier public health. . . " "rt @uazmedphx: to address the critical needs of the Navajo nation during the COVID outbreak, #uazmedphx, @repgregstanton, as well as?"

"It is critical that we ensure those who have access to any COVID vaccine are not the privileged few, but the many who actually need it most. " "If you own or work for a small business affected by the COVID pandemic, visit my website for information on support for small businesses" "Visit learn about the EPCC's grant program for small businesses impacted by the COVID find more helpful EPCC small business resources" "Welcomed news for Georgia small business owners. @sbagov emergency loans are now available to impacted businesses in all 159 counties. COVID" financial-nonfinancial eligible-ineligible "Thank you, @abigaildisney, for looking out for the most vulnerable affected by the financial repercussions of COVID. " "rt @repmalinowski: "the COVID will prey not just on the health of Americans but their financial wellbeing. In its next bill responding. . . " "May 1 is quickly approaching, and I know that many marylanders are experiencing severe financial hardship because of the COVID. In this thread you'll find information about financial assistance available in MD. " "Alabamians laid off or unpaid due to COVID are eligible for unemployment compensation" "rt @oronline: if you work in Pennsylvania and the novel COVID has affected your job, you may be eligible for benefits. "

"Small businesses: you may be eligible for up to $2 million in @sbagov low-interest loans if your business has been affected by the COVID. These loans can help fill your working capital needs. Non-profits may also be eligible. Apply online here:" live-recorded empty-full "Tune in now: I'm hosting a Facebook live town hall with @repbillfoster and @repcasten. We will be answering your questions on COVID. Watch live here:" "tune in now for my Facebook live COVID town hall with @stevelockhartmd of @sutterhealth:" "I am #live now on Facebook addressing your questions and concerns about the COVID. Tune in here:" "Read here: my full statement in support of the COVID relief legislation the House just passed. " "My full statement on presumptive COVID cases in South Dakota" "See my full statement on president @realdonaldtrump's new actions to fight COVID here" dead-live insecure-secure "Tune in now: I'm hosting a Facebook live town hall with @repbillfoster and @repcasten. We will be answering your questions on COVID. Watch live here:" "I am #live now on Facebook addressing your questions and concerns about the COVID. tune in here:" "As of 2pm today 1,700 people in my state new jersey are tragically dead from COVID and 16,642 Americans are dead across the country. " "rt @waysandmeansgop: in the phase three package to secure our economy as we fight against COVID, @ustreasury secretary @stevenmnuchi?" "I also thank our brave frontline @tsa officers for the risks they face on our behalf, continuing to keep our nation safe & secure in the COVID pandemic. " "rt @waysandmeansgop: Dems voted against the phase three package to secure our economy as we fight against COVID. This package include?" available-unavailable helpful-unhelpful "More information is available from @cdcgov here: COVIDupdates COVIDUS" "Free COVID testing is available near you. " "rt @sfpelosi: 77,000 Americans killed by COVID unavailable for comment. 
" "This is a helpful resource for hoosiers to stay updated on COVID" "Here's some helpful information on COVID for pregnant women and parents from the @cdcgov. You can find these and other resources on my website at" "Continue to follow @cdcgov for the latest updates on the COVID and helpful information. #MI06" paid-unpaid first-last "Check to see if you qualify for paid sick leave because of the COVID here" "rt @facttank: new: as COVID spreads, which U.S. workers have paid sick leave? and which don't?" "I stand with @pattymurray and @sengillibrand and support the paid leave act to provide additional support to workers & businesses for paid family and sick leave during the COVID outbreak. " "Love this. @starbucks is fueling our first responders on the frontlines of the COVID crisis! #inittogether" "'rt @chadsabadie: @repabraham: the first responders, you bring calm to chaos COVID" "rt @woodtv: @rephuizenga pitches COVID aid bill for doctors, nurses and other first responders:" private-public affected-unaffected "rt @indivisibleteam: medicines, like the COVID vaccine, that are developed with public money should benefit public health, not create?" "@unitedwaydenver @cohealth coloradans can call the cohelp line for the latest public health information on the COVID at 1-877-462-2911. " "rt @bryan_pietsch: healthcare workers battling the COVID would have their public and private student loans forgiven under a new bill?" "If you own or work for a small business affected by the COVID pandemic, visit my website for information on support for small businesses" "If you own a small business and your operations are being affected by COVID you may be able to get assistance from @sbagov. More info here:" "Appeared on @foxbusiness to discuss congressional action being taken to help Americans affected by COVID" Table 4 Top Agent, verb, and Patient combinations in Democratic and Republican tweets extracted by semantic role labeling with largest differences in frequency. 
The left column shows the combinations that are more frequent in Democratic tweets than in Republican tweets; the right column shows the reverse. Most combinations in Democratic tweets focus on resources and support, while combinations in Republican tweets discuss combating COVID, news updates, support for small businesses, and the threat of socialism.

We compile two lists of words associated with "Democrats" for Republicans, and vice versa (see Sect. 2). We choose specific verbs for a more focused investigation. To leverage the semantic similarities between verbs, we consider the verb clusters that we create from the GloVe embeddings of verbs (see Sect. 2 for details). These clusters are shown in Fig. 4. Based on the proximity between verbs and an examination of their Patients, we choose three sets of verbs that are most relevant to the pandemic and that have a diverse range of semantic roles as their Patients. We then consider the Patients with the highest frequency for each set of verbs.
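
The selection of frequent Patients described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the word lists `US_WORDS` and `THEM_WORDS`, the helper names `categorize` and `top_patients`, and the toy triples are all hypothetical, and a real run would take its (Agent, verb, Patient) triples from an SRL tool.

```python
from collections import Counter

# Hypothetical word lists, written from the Democratic point of view.
# They are illustrative only and are not the lists used in the paper.
US_WORDS = {"we", "democrats", "house democrats"}
THEM_WORDS = {"they", "republicans", "gop"}


def categorize(agent):
    """Map an Agent string to 'us', 'them', or None if it is in neither list."""
    a = agent.lower().strip()
    if a in US_WORDS:
        return "us"
    if a in THEM_WORDS:
        return "them"
    return None


def top_patients(triples, verbs, n=3):
    """Count the Patients of the chosen verbs, separately per category."""
    counts = {"us": Counter(), "them": Counter()}
    for agent, verb, patient in triples:
        category = categorize(agent)
        if category is not None and verb in verbs:
            counts[category][patient.lower()] += 1
    return {cat: c.most_common(n) for cat, c in counts.items()}


# Toy (Agent, verb, Patient) triples standing in for SRL output.
triples = [
    ("We", "help", "small businesses"),
    ("We", "protect", "workers"),
    ("Democrats", "help", "workers"),
    ("They", "protect", "oil companies"),
    ("Someone", "help", "workers"),  # Agent in neither list: skipped
    ("We", "want", "answers"),       # verb outside the chosen set: skipped
]

result = top_patients(triples, verbs={"help", "save", "protect"})
```

Agents that match neither list are dropped in this sketch, so an ambiguous Agent is never counted for either category.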

We begin by examining the Patients for the verbs "help", "save", and "protect" in Fig. 5. For both the "us" and "them" categories, we find a strong shared theme about curbing the pandemic, such as saving lives and helping Americans and public health. Despite some party-specific Patients such as "#DACA" and "oil companies", these semantic roles indicate an overlap in both parties' tweets when it comes to protecting the American people (although the way they frame help can differ, as we discuss above).

We then move to the set of verbs "stop", "slow", and "prevent". While both parties share a common theme in "stop the spread", we observe many inter-partisan exchanges in both categories. For example, the Democrats discuss stopping "mass employment" and "gun violence", and the Republicans discuss stopping "terrorism", as part of their own agendas. In the "them" category, the Democrats accuse the Republicans of stopping Fauci and "doing stock buybacks", and the Republicans call for the other party to stop "attacking president Trump" and "the deceptive mailers". Compared to the previous set, this set of verbs has far fewer Patients in common between the two parties.

Finally, we check the verb "want" and find that the Patients are rather distinctive in both categories. In the "us" category, the Democrats emphasize "answers", "justice", and "a healthy earth", and call for the Equal Rights Amendment. Meanwhile, the Republicans do not make such strong calls, potentially due to the ruling/opposition party dynamics. In the "them" category, we see strong partisan messages about the opposite party, such as the Republican tweets discussing the Democrats' "blue masks" and "to remove president". This verb does not have any shared Patients, hinting at the different agendas of each party.
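
The comparison of shared Patients across the three verb sets is made by inspection in the text. One simple way to quantify it is the Jaccard overlap between the two parties' Patient sets. The sketch below is an assumption about how one might do this; the function name `shared_patients` is hypothetical, and the toy sets only borrow a few Patients quoted above rather than reproducing the actual data.

```python
def shared_patients(dem_patients, rep_patients):
    """Return the Patients both parties use and the Jaccard overlap of the sets."""
    dem = {p.lower() for p in dem_patients}
    rep = {p.lower() for p in rep_patients}
    shared = dem & rep
    union = dem | rep
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard


# Toy Patient sets; a real analysis would use the frequent Patients per verb set.
stop_shared, stop_overlap = shared_patients(
    ["the spread", "gun violence"],
    ["the spread", "terrorism"],
)
want_shared, want_overlap = shared_patients(
    ["answers", "justice", "a healthy earth"],
    ["blue masks", "to remove president"],
)
```

On these toy sets the "stop" verbs share one Patient out of three distinct ones, while "want" shares none, which mirrors the qualitative pattern described above.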

Figure 4 One hundred most frequent verbs from Democratic and Republican tweets. Each verb is plotted using its GloVe embedding, with dimensionality reduced to 2 using UMAP. For each party, the verbs are grouped into 15 distinct clusters using the K-means algorithm. Colors of the points indicate cluster membership.
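
The grouping step named in this caption can be illustrated with a plain K-means loop. This is a minimal sketch rather than the authors' code: it skips the UMAP reduction, uses hand-made 2-D points as stand-ins for reduced GloVe embeddings, and initialises centroids from the first k points so the run is deterministic. A real analysis would more likely rely on library implementations.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain K-means on a list of equal-length tuples.

    Centroids start at the first k points (a simplifying assumption made
    here for determinism; libraries use better initialisation schemes).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(xs) / len(members) for xs in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Toy 2-D coordinates standing in for UMAP-reduced verb embeddings.
verbs = ["help", "stop", "save", "slow", "protect", "prevent"]
points = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]

labels, centroids = kmeans(points, k=2)
cluster_of = dict(zip(verbs, labels))
```

With these toy coordinates, "help", "save", and "protect" fall into one cluster and "stop", "slow", and "prevent" into the other.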

In this work, we characterize the political narrative frameworks about the COVID-19 crisis constructed by the two major U.S. parties, demonstrating that a suite of relatively simple natural language processing methods can be applied to a large dataset to produce useful insights into the diverging narrative frameworks. We examine each narrative framework from three aspects: context, framing, and characters and relationships. In terms of framing, the Democratic narrative focuses on financial relief and public health services during the COVID-19 crisis, whereas the Republican narrative emphasizes small businesses and the role of individuals. When we consider the semantic agents, these different focuses are further exposed. We also find that while both parties find common ground in battling the pandemic, they have distinct agendas and political goals, and use their narratives to criticize the other party.

Our work demonstrates that computational methods can automatically extract strong signatures of political narratives that fit the key theories of political science, providing a useful "recipe" for computational narrative analysis. In addition, we provide an empirical analysis of the diverging narrative frameworks in U.S. politics during the pandemic. That our results confirm our intuitions, common sense, and social theories about American politics is strong evidence for the effectiveness of the tools that we employ. By using an integrated set of computational methods, we bridge the gap between sophisticated NLP methodologies and real-world social problems.

Our study has several key limitations. One limitation of our FrameAxis model is that it cannot distinguish word senses; for example, it is not able to separate "live" as the antonym of "dead" from "live" as the antonym of "recorded". This may lead to confusion when both word senses are widely used in the corpora. Tweets with very different topics may also be identified under the same microframe, such as in the case of available versus unavailable, where the availability of COVID testing and availability for comment are mixed together. Such limitations may be partially addressed by using contextualized word embeddings such as ELMo or BERT, which would be interesting future work.
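
The word-sense limitation can be made concrete with a small sketch. It assumes a simplified form of microframe bias, namely the mean cosine similarity between a tweet's in-vocabulary tokens and the axis formed by subtracting one pole's vector from the other's. The 3-D vectors, the token lists, and the function names are hypothetical and chosen only to show that a single static vector for "live" feeds both microframes.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def axis(vectors, toward, away):
    """Microframe axis pointing from the `away` pole to the `toward` pole."""
    return tuple(t - a for t, a in zip(vectors[toward], vectors[away]))


def bias(tokens, vectors, microframe_axis):
    """Mean cosine similarity of in-vocabulary tokens to the axis.

    Out-of-vocabulary tokens are skipped (an assumption of this sketch).
    """
    sims = [cosine(vectors[t], microframe_axis) for t in tokens if t in vectors]
    return sum(sims) / len(sims) if sims else 0.0


# Toy static embeddings: one vector per word form, regardless of sense.
vectors = {
    "live": (1.0, 1.0, 0.0),
    "dead": (-1.0, 0.0, 0.0),
    "recorded": (0.0, -1.0, 0.0),
}

dead_live = axis(vectors, toward="live", away="dead")
live_recorded = axis(vectors, toward="live", away="recorded")

# A town-hall tweet uses "live" in the broadcast sense only.
town_hall = ["facebook", "live", "town", "hall"]
score_on_dead_live = bias(town_hall, vectors, dead_live)
score_on_live_recorded = bias(town_hall, vectors, live_recorded)
```

Because "live" has one vector, the town-hall tweet scores strongly toward "live" on the dead-live axis as well, even though it says nothing about mortality. A contextualized embedding would give the two senses different vectors.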

Our semantic agent analysis uses modern SRL tools to automatically identify semantic roles, but the interpretation of such roles remains a challenging task. For example, in Fig. 5, manual examination is required to select the Agents and verbs, as well as to infer their context. We are also limited to showing several small sets of verbs and their semantic roles. Additionally, when we examine the membership categorization, some semantic roles such as "they" may refer to a third group instead of one of the parties, and these cannot be identified by our model. More automatic ways of analyzing and exploring the SRL data can therefore be fruitful future research.

As we are not working on well-established tasks with systematic benchmarks, and because the tools are exploratory in nature (i.e., they serve as discovery tools and should be combined with human expertise in most cases), it is difficult to evaluate them quantitatively, although more rigorous evaluation tasks exist for our FrameAxis model [35]. We believe that designing systematic benchmarks for narrative analysis is challenging yet important future work. Nevertheless, even with these limitations, our set of methods provides an effective way to systematically characterize narrative frameworks that can be applied not only to the political communication domain, but to other domains as well.

Supplementary information accompanies this paper at https://doi.org/10.1140/epjds/s13688-021-00308-4.

Additional file 1. Supplementary information (PDF 79 kB)

The Berkeley FrameNet project

Who leads? Who follows? Measuring issue attention and agenda setting by legislators and the mass public using social media data

Evaluation of UMAP as an alternative to t-SNE for single-cell data

Science vs conspiracy: collective narratives in the age of misinformation

Actual minds, possible worlds

Why "an angel rides in the whirlwind and directs the storm"?: a corpus-based comparative study of metaphor in British and American political discourse

Analysing political speeches

Prediction, events, and the advantage of agents: the processing of semantic roles in visual narrative

Natural language processing (almost) from scratch

Political discourse analysis: exploring the language of politics and the politics of language

Twitter as arena for the authentic outsider: exploring the social media campaigns of Trump and Clinton in the 2016 US presidential election

Personalized campaigns in party-centered politics

Framing: toward clarification of a fractured paradigm

Using semantic role labeling to extract events from Wikipedia

Crowd or hubs: information diffusion patterns in online social networks in disasters

The case for case

Human communication as narration: toward a philosophy of reason, value, and action

AllenNLP: a deep semantic natural language processing platform

Who's speaking?: evidentiality in US newspapers during the 2004 presidential campaign

Elusive consensus: polarization in elite communication on the COVID-19 pandemic

Health disinformation & social media: the crucial role of information hygiene in mitigating conspiracy theory and infodemics

When morality opposes justice: conservatives have moral intuitions that liberals may not recognize

Above and below left-right: ideological narratives and moral foundations

The moral mind: how five sets of innate intuitions guide the development of many culture-specific virtues, and perhaps even modules

Framing in social media: how the US Congress uses Twitter hashtags to frame political issues

Who benefits from Twitter? Social media and political competition in the US House of Representatives

Web mining for event-based commonsense knowledge using lexico-syntactic pattern matching and semantic role labeling

All I know about politics is what I read in Twitter: weakly supervised models for extracting politicians' stances from Twitter

Classification of moral foundations in microblog political discourse

A narrative policy framework: clear enough to be wrong?

Scattertext: a browser-based tool for visualizing how corpora differ

From TreeBank to PropBank

Twitter sentiment analysis: the good the bad and the omg!

Personal experiences bridge moral and political divides better than facts

FrameAxis: characterizing microframe bias and intensity with word embedding

Are they talking to me? Cognitive and affective effects of interactivity in politicians' Twitter communication

Applying semantic knowledge to the automatic processing of temporal expressions and events in natural language

Trump turns virus conversation into 'US vs. THEM' debate

UMAP: uniform manifold approximation and projection for dimension reduction

WordNet: a lexical database for English

Fightin' words: lexical feature selection and evaluation for identifying the content of political conflict

Who am I and who are we? Conflicting narratives of collective selfhood in stigmatized groups

Conjectures on world literature

Comparative study of word embedding methods in topic segmentation

The age of Twitter: Donald J. Trump and the politics of debasement

Framing analysis: an approach to news discourse

Analyzing climate change debates in the US Congress: party control and mobilizing networks

Politics and the Twitter revolution: how tweets influence the relationship between political leaders and the public

Narrative in political science

GloVe: global vectors for word representation

Lectures on conversation

Polarization of the vaccination debate on Facebook

Political framing: US COVID19 blame game

A narrative analysis of the party platforms: the democrats and republicans of 1984

An automated pipeline for the discovery of conspiracy and conspiracy theory narrative frameworks: bridgegate, pizzagate and storytelling on the web

IHME COVID-19 forecasting team (2020) Modeling COVID-19 scenarios for the United States

New political and communication agenda for political discourse analysis: critical reflections on critical discourse analysis and political discourse analysis

Large-scale computerized text analysis in political science: opportunities and challenges

Dimensionality reduction by UMAP reinforces sample heterogeneity analysis in bulk transcriptomic data

We thank Sandra Kübler, Xiaozhong Liu, Minje Kim, Haewoon Kwak, Jisun An, Byungkyu Lee, Matthew Josefy, and the anonymous reviewers for their insightful comments.

Not applicable.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. 

The authors declare that they have no competing interests.

EJ and YYA designed the study. EJ collected the data and performed the analysis. EJ and YYA wrote the paper. All authors read and approved the final manuscript.