Mitigating Media Bias through Neutral Article Generation
Nayeon Lee, Yejin Bang, Andrea Madotto, Pascale Fung
2021-04-01

Abstract: Media bias can lead to increased political polarization, and thus, the need for automatic mitigation methods is growing. Existing mitigation work displays articles from multiple news outlets to provide diverse news coverage, but without neutralizing the bias inherent in each of the displayed articles. Therefore, we propose a new task, the generation of a single neutralized article out of multiple biased articles, to facilitate more efficient access to balanced and unbiased information. In this paper, we compile a new dataset, NEUWS, define an automatic evaluation metric, and provide baselines and multiple analyses to serve as a solid starting point for the proposed task. Lastly, we obtain a human evaluation to demonstrate the alignment between our metric and human judgment.

Media bias refers to the bias produced when journalists report about an event in a prejudiced manner or with a slanted viewpoint (Gentzkow and Shapiro, 2006). Since news media play a crucial role in shaping public opinion toward various important issues (De Vreese, 2004; McCombs and Reynolds, 2009; Perse and Lambe, 2016), bias in media could reinforce the problem of political polarization. Due to its potential societal harm, this issue has been extensively studied in the social sciences, and there have been both journalistic and computational efforts to detect and analyze media bias (Entman, 1993; Groseclose and Milyo, 2005; Recasens et al., 2013). However, computational methods for media bias mitigation are still under-explored.

Currently, the main computational approach to mitigating media bias is the aggregation of multiple news articles to provide comprehensive reporting with additional analysis of news outlets (Park et al., 2009; Sides, 2018; Zhang et al., 2019). In this way, readers can access diverse news coverage, however, at the cost of reading more articles than they would normally read. Moreover, since the bias within each individual article is not neutralized, it could still undesirably sway readers' views. According to Bail et al., exposure to opposing political views can actually reinforce political polarization. Therefore, presenting articles from different stances alone cannot entirely solve the problem.

As an illustration of how differently outlets frame the same event:
[Right] Republicans scored an important victory in North Carolina Tuesday as President Trump helped them hang on to a GOP congressional seat in a closely-watched special election.
[Neutral] Republican Dan Bishop won a special election in North Carolina on Tuesday.

As a remedy, we take a step forward from the news aggregation approach and propose a new task: generating a single neutralized article out of multiple biased articles on the same event (multi-document neutralization). Articles from conflicting news outlets still share the "same set of underlying facts", yet convey different impressions of the same event through the deliberate omission of certain facts and a slanted choice of words (Gentzkow and Shapiro, 2006). With an automatic method to extract and aggregate neutral information from multiple articles, the public could more easily access unbiased news information, leading to a reduced risk of political polarization.
In this work, we formulate our new task by i) constructing a weakly-labeled news neutralization dataset, which we call NEUWS, ii) defining a general model setup, and iii) designing an automatic metric, called the NEUTRAL score, to evaluate the success of the neutralization. Then, we establish solid baselines by leveraging large pre-trained models for neutralization. Lastly, we conduct a human evaluation to examine how aligned our NEUTRAL score is with human judgment, and provide interesting insights from additional experimental analysis that suggest potential directions for future work.

Media Bias Detection and Prediction. Media bias has been studied extensively in various fields such as social science, economics and political science, and various measures have been used to analyze the political preference of news outlets (Groseclose and Milyo, 2005; Miller and Riechert, 2001; Park et al., 2011; Gentzkow and Shapiro, 2010; Haselmayer and Jenny, 2017). For instance, Gentzkow and Shapiro count the frequency of slanted words within articles. In natural language processing (NLP), computational approaches for detecting media bias consider lexical bias, i.e., linguistic cues that induce bias in political text (Recasens et al., 2013; Yano et al., 2010; Hamborg et al., 2019b). While these methods specifically focus on the lexical aspect of media bias, our work attempts to address media bias more comprehensively. As highlighted by Fan et al., media bias also has an informational aspect due to framing bias, which is the selective reporting of an event to sway readers' opinions, e.g., through the omission of crucial facts or the choice of words (Entman, 1993, 2007; Gentzkow and Shapiro, 2006). Efforts related to informational bias (Park et al., 2011; Fan et al., 2019) are constrained to detection tasks. In this work, we attempt to tackle the mitigation of media bias (both lexical and informational bias).

Media Bias Mitigation. News aggregation, displaying articles from different news outlets on a particular topic (e.g., Google News, Yahoo News), is the most common approach in NLP to mitigate media bias, but it still has limitations (Hamborg et al., 2019a). Thus, multiple approaches have been proposed to provide additional information (Laban and Hearst, 2017), such as automatically classified multiple viewpoints (Park et al., 2009), multinational perspectives (Hamborg et al., 2017), and detailed media profiles (Zhang et al., 2019). Allsides.com provides bias ratings of each news outlet alongside balanced political coverage. However, these approaches focus on making news consumers more aware of what they are reading. Thus far, there has been no attempt to automatically aggregate biased articles to produce a single neutralized article.

Controlled Text Generation. One line of work under controlled text generation tries to de-bias or neutralize text (Dathathri et al., 2019; Pryzant et al., 2020; Ma et al., 2020). Specifically, efforts were made to reduce toxicity (Dathathri et al., 2019), implicit social bias (Ma et al., 2020) or subjectivity (Pryzant et al., 2020) in generations. Our work differs from theirs in two ways: 1) neutralization in the previous works is close to a revision of a single input text with a focus on style, while our work is the neutralized aggregation of multiple texts; 2) we focus on media bias. To the best of our knowledge, we are the first to attempt politically neutralized news article generation.
Hallucination. Recent studies have shown that neural sequence models can suffer from hallucination of additional content not supported by the input, as a result adding factual inaccuracies to the generations of abstractive summarization models. To address this problem, many researchers have proposed methods to measure factual inconsistency (Holtzman et al., 2019; Kryściński et al., 2019; Zhou et al., 2020; Lux et al., 2020; Gabriel et al., 2020) and to correct it (Zhao et al., 2020; Cao et al., 2020; Dong et al., 2020). While these works focus on factual inaccuracy and inconsistency, we focus on bias that is not factually incorrect but can still pose a problem due to the way it affects readers' opinions.

The main objective of this work is to neutralize biased news articles from two different news outlets (left-wing and right-wing) into a single neutral article which (i) retains as much information as possible and (ii) eliminates as much bias as possible from the input articles. We follow the categorization and definition of media bias from Fan et al. There are two types of media bias: lexical bias, which refers to the writing style or linguistic attributes that may mislead readers, and informational bias, which refers to tangential or speculative pieces of information used to sway the minds of readers. Such biases can make an article convey a different impression of what actually happened (Gentzkow and Shapiro, 2006). Ideally, a neutral article should avoid both types of bias by using a neutral tone and including balanced information without preference towards any particular stance or target.

Table 2 (excerpt). Event: Democratic presidential candidates ask to see full Mueller report.
Left: "Democrats want access to special counsel Robert Mueller's investigation into Russian interference in the 2016 presidential election before President Donald Trump has a chance to interfere. [...] Sen. Mark Warner said in a statement: 'Any attempt by the Trump Administration to cover up the results of this investigation into Russia's attack on our democracy would be unacceptable.'"
"Democratic presidential candidates wasted no time Friday evening demanding the immediate public release of the long-awaited report from Robert S. Mueller III. [...] Several candidates, in calling for the swift release of the report, also sought to gather new supporters and their email addresses by putting out 'petitions' calling for complete transparency from the Justice Department."

The neutralization task requires a dataset that consists of politically opposing source articles and a neutral target article, all reporting on the same event. We therefore build a weakly-labeled dataset from article triplets consisting of articles from politically left, right, and center publishers. The dataset language is English, and it focuses mainly on U.S. political events. We consider our dataset to be weakly labeled on the following basis. First, there is no single answer to writing a neutral news article, and thus there cannot be "the gold" neutral article to optimize for. Second, since all news requires editorial judgment about which information is "important" to report, it is possible that even news articles from the most neutral publisher contain some bias. Note that the literature still considers center publishers to be bias-free, especially in comparison to other hyper-partisan publishers (Baum and Groeling, 2008).
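As a concrete illustration of the data format just described, the sketch below shows one way a NEUWS triplet (and, for the test set, its annotated bias spans) could be represented, with the center article serving as the weak target. This is a minimal sketch with illustrative field names, not the authors' released data schema; the "[SEP]" separator mirrors the input format described in the experiments section.

```python
# A minimal sketch of a NEUWS article triplet (illustrative field names).
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class NewsTriplet:
    event: str                                              # event being covered
    left: str                                               # X^(l), left-leaning article
    right: str                                              # X^(r), right-leaning article
    center: str                                             # X^(c), center article (weak target)
    left_bias_spans: List[str] = field(default_factory=list)   # B^(l), test set only
    right_bias_spans: List[str] = field(default_factory=list)  # B^(r), test set only

def to_seq2seq_pair(t: NewsTriplet, sep: str = " [SEP] ") -> Tuple[str, str]:
    """Form a (source, target) pair: concatenated biased articles -> center article."""
    return (t.left + sep + t.right, t.center)
```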
For the political orientations of the publishers, we rely on the Media Bias Ratings, which use editorial reviews, blind bias surveys (10,000+ community participants), independent reviews, and third-party research to judge the political stance of various publishers.

Train/Valid Set. To construct the train/valid set, we first crawled the URLs of article triplets from Allsides.com, which displays news coverage of events from left, right, and center publishers. Then, we built custom news crawlers to obtain the full article content from the collected URLs. In total, we collected 1,740 full articles, which were compiled into 580 article triplets. The data statistics are listed in Table 3.

Test Set. For the test set, we utilize a subset of the BASIL dataset (Fan et al., 2019), which contains sentence-level annotations of media bias within news articles. These annotations of bias spans are key to the measurement of the neutralization performance. This dataset is only used as the test set because it is too small (100 samples) to be split into train/valid/test. We extend the BASIL dataset by adding center articles. The BASIL dataset originally consists of article triplets from the Huffington Post (left-wing), Fox News (right-wing), and the New York Times (left-wing). The New York Times is the outlet closest to center among the three. However, its political leaning is still considered pro-Liberal (Puglisi, 2011; Chiang and Knight, 2011). Therefore, we replace the New York Times articles with those from center publishers (e.g., Reuters, BBC) that report on the same event. To ensure that the newly collected center news articles cover the same event as the left/right articles, we manually confirmed the content and the publication dates.

Notations. Here, we introduce the notations used throughout the paper. We denote the two biased articles as X^(l) = {x^(l)_1, ..., x^(l)_m} and X^(r) = {x^(r)_1, ..., x^(r)_n}, and a center article as X^(c) = {x^(c)_1, ..., x^(c)_k}, where x represents a token. These three articles form one triplet (X^(l), X^(r), X^(c)). Then, we denote the sets of annotated bias spans B^(l) = {x^(l)_{i,j} | 1 ≤ i ≤ j ≤ m} and B^(r) = {x^(r)_{i,j} | 1 ≤ i ≤ j ≤ n} as the bias sub-strings in X^(l) and X^(r) respectively, where x_{i,j} denotes the sub-string spanning tokens i to j. An example of bias spans is shown in Table 2.

Model. Following the current state-of-the-art in sequence-to-sequence modelling, we propose to first encode the concatenation of the two articles and to then generate a neutralized article token-by-token using a decoder. Hence, given the two articles X^(l) and X^(r) as a single sequence of tokens, the encoder processes the input as follows:

H = ENC(X^(l) ⊕ X^(r)),

where ⊕ denotes concatenation, H ∈ R^((m+n)×d), and d is the hidden feature size. Note that the conversion between tokens and embeddings is done directly in the encoder (ENC). This hidden representation is then passed to the decoder, which generates an article X̂^(n) token-by-token in an autoregressive manner. More formally,

x̂_t = DEC(H, x̂_{<t}), for t = 1, ..., |X̂^(n)|.

In the experiment section (Section 4), we provide more details on how we train this general model.

Two important goals for successful neutralization are to minimize bias and to maximize information recall. Therefore, we introduce ways to measure each aspect and combine both to serve as the final neutralization score (NEUTRAL). The most important criterion for success is to assess whether the generated article X̂^(n) manages to filter out the bias spans (B^(l) and B^(r)) originally existing in the input articles. To quantify this, we obtain generated neutralized articles {X̂^(n)} for the whole test set and measure the ratio that still contains at least one bias span. The lower the ratio, the better the neutralization performance (i.e., fewer bias spans).
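Before formalizing the metric, the sketch below illustrates the general model setup and the bias-span check described in this section. It is an illustration under assumptions (a HuggingFace BART checkpoint, exact-substring matching for bias spans, illustrative lengths and decoding settings), not the authors' released implementation.

```python
# Minimal sketch: encode the concatenated biased articles, generate a neutralized
# article, and check whether any annotated bias span survives in the generation.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large")

left_article, right_article = "...", "..."   # X^(l), X^(r)
bias_spans = ["...", "..."]                  # B = B^(l) ∪ B^(r) (test set only)

# ENC: the concatenation of the two articles is processed as a single token sequence.
inputs = tokenizer(left_article + " [SEP] " + right_article,
                   return_tensors="pt", truncation=True, max_length=1024)

# DEC: the neutralized article X̂^(n) is generated token-by-token (autoregressively).
output_ids = model.generate(**inputs, num_beams=4, max_length=512)
neutralized = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Bias check: does the generation still contain at least one annotated bias span?
# (Exact substring matching is an assumption about how BIASEXISTS is realized.)
contains_bias = any(span in neutralized for span in bias_spans)
```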
Formally, we define the bias score as follows:

BIAS = 100 × (1 / |D_test|) × Σ_{X̂^(n) ∈ D_test} BIASEXISTS(X̂^(n), B),

where B = B^(l) ∪ B^(r) refers to the union of the bias spans from the left-wing and right-wing articles, and BIASEXISTS refers to a function that identifies the existence of bias in a given generated article X̂^(n):

BIASEXISTS(X̂^(n), B) = 1 if any bias span b ∈ B appears as a sub-string of X̂^(n), and 0 otherwise.

Information Recall. One of the easiest ways of deceiving the BIAS metric is to generate random text that has nothing to do with the biased source articles. Therefore, it is crucial to also ensure that key information is retained while removing the bias. We adopt the ROUGE-1 (Lin, 2004) recall score between the generated neutralized article X̂^(n) and the center article X^(c) to measure the information coverage. The higher the ROUGE-1 score, the better the information coverage of the neutralized generation. We mainly report unigram-based ROUGE-1 for simplicity, but ROUGE-2 and ROUGE-L are also reported in the appendix for interested readers.

Neutralization Score (NEUTRAL). Since the ultimate goal is to optimize for both the information recall score and the bias score, we define a single neutralization score by combining the two:

NEUTRAL = ROUGE-1 recall × (100 - BIAS) / 100.

Through multiplication, both scores get equal weighting in the final metric.

Summarization, the task of producing a shorter version of one or several documents that preserves most of the input's meaning, has a similar setup to our proposed neutralization task. Therefore, we investigate the zero-shot neutralization performance of two strong BART-based (Lewis et al., 2019) summarization models by utilizing their pre-trained weights to initialize our encoder (ENC_θ) and decoder (DEC_θ). We test with two versions of BART, trained on the CNN/DailyMail (Hermann et al., 2015) and XSum (Narayan et al., 2018) datasets: BART_cnn and BART_XSum.

Many Transformer-based (Vaswani et al., 2017) pre-trained language models (Raffel et al., 2019; Lewis et al., 2019) achieved excellent performance in downstream tasks through simple fine-tuning with small, task-specific data. Therefore, we fine-tune the encoder-decoder parameters of pre-trained Seq2Seq models by minimizing the negative log-likelihood over the training set D. We experiment with the following two pre-trained models:
• BART_ft: pre-trained BART-large model fine-tuned with NEUWS data.
• T5_ft: pre-trained T5-base (Raffel et al., 2019) model fine-tuned with NEUWS data.

Next, we explore adding an additional loss when fine-tuning the Seq2Seq model. We encourage the model to learn about hyper-partisan writing and, as a result, to avoid generating similar text. This is done by incorporating an additional classification head on top of the encoder (ENC_θ) to jointly optimize this classification cross-entropy loss with the original negative log-likelihood loss from the decoder (DEC_θ). In our experiments, the following settings are explored:
• BART_ft+LFakeNews: pre-trained BART-large jointly fine-tuned on NEUWS and a fake news dataset (Potthast et al., 2017).
• BART_ft+LProp: pre-trained BART-large jointly fine-tuned on NEUWS and a propagandistic sentence detection task (Da San Martino et al., 2019), i.e., classifying whether a given sentence contains any propagandistic technique (e.g., "Name calling", "Appeal to fear").

All our experimental code is based on the HuggingFace library (Wolf et al., 2020). During training, and across models, we used the following hyper-parameters: 10 epochs, a learning rate of 3e-5, and a batch size of 8. We did not do hyper-parameter tuning since our objective is to provide various baselines and analyses.
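A rough sketch of the joint objective used for the BART_ft+LFakeNews and BART_ft+LProp settings is given below: a classification head over the encoder output is trained together with the decoder's negative log-likelihood. This is our own illustration of the described setup, not the authors' code; the mean-pooling over encoder states and the equal loss weighting are assumptions.

```python
import torch.nn as nn
from transformers import BartForConditionalGeneration

class BartWithAuxClassifier(nn.Module):
    """BART fine-tuning with an auxiliary classification head on the encoder,
    jointly optimized with the generation NLL (a sketch of the described setup)."""
    def __init__(self, model_name="facebook/bart-large", num_labels=2, aux_weight=1.0):
        super().__init__()
        self.bart = BartForConditionalGeneration.from_pretrained(model_name)
        hidden = self.bart.config.d_model
        self.classifier = nn.Linear(hidden, num_labels)  # e.g., propagandistic vs. not
        self.aux_weight = aux_weight                     # assumed equal weighting

    def forward(self, input_ids, attention_mask, labels, aux_labels=None):
        out = self.bart(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        loss = out.loss  # negative log-likelihood of the (weakly neutral) target article
        if aux_labels is not None:
            # Pool the encoder hidden states and add the classification cross-entropy.
            pooled = out.encoder_last_hidden_state.mean(dim=1)   # assumed mean-pooling
            aux_loss = nn.functional.cross_entropy(self.classifier(pooled), aux_labels)
            loss = loss + self.aux_weight * aux_loss
        return loss
```

Training such a module would then follow the stated hyper-parameters (10 epochs, learning rate 3e-5, batch size 8) with a standard optimizer loop or the HuggingFace Trainer.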
The training run-time for all of our experiments is short (< 6 hr). No pre-processing of the text was done, except for the concatenation of the left and right articles with a special [SEP] token in the middle (i.e., "left-article [SEP] right-article"). When concatenating, we ensured that half of the inputs begin with the left article and the other half begin with the right article. This was done to avoid any unintended bias from the ordering of the ideologies in the input. All the experimental results are reported in Table 4.

A. Zero-shot. From Table 4, we can observe that the zero-shot performance of both summarization models on the proposed neutralization task is poor in terms of NEUTRAL. One possible reason is the difference in the training data distribution, especially regarding the characteristics of the target generation. In fact, summarization tasks focus on obtaining a concise and representative summary, whereas our task focuses on obtaining a neutral and representative article. Within the summarization models, we can observe that different training datasets (CNN vs. XSum) lead to different neutralization performance (11.85 vs. 7.32). It is important to note that the final NEUTRAL score of BART_XSum is very low (7.32) despite it having the lowest BIAS score (6%). The reason behind BART_XSum's low bias score is its extremely short generation length (1.01 ± 0.10 sentences on average). By design, our NEUTRAL metric also includes the ROUGE-1 score, which counteracts this in such a scenario. This illustrates the effectiveness of considering both the comprehensiveness and the neutrality of the generated article to avoid such pitfalls.

B. Fine-tuning. We compare and report two fine-tuned Seq2Seq models. It is evident that the choice of the base model (i.e., BART_ft vs. T5_ft) greatly affects the neutralization performance after fine-tuning. BART_ft achieves double the NEUTRAL score compared to the zero-shot baselines, but T5_ft hardly shows any improvement. A likely explanation for this observation is the difference in the parameter size of these two base models: BART-large has 406M parameters whereas T5-base has only 220M, so BART_ft has access to richer features and greater model capacity.

From the zero-shot and fine-tuning experiments, we can observe a weak positive correlation between the ROUGE-1 scores and the BIAS scores: the lowest ROUGE-1 result (7.79) comes with the lowest BIAS score (6%), and the highest ROUGE-1 (38.11) with the highest BIAS score (42%). This is because these baseline models failed to select the important information from the input articles in a neutral manner. With this positively correlating pattern, achieving even the highest performance in one metric would not lead to a good neutralization score. To illustrate, achieving 100.00 ROUGE-1 can still result in a 0.00 NEUTRAL score if BIAS is also 100. Therefore, more sophisticated models are required to correctly identify as much important information as possible while avoiding the selection of bias spans at the same time.

In generation tasks, various decoding strategies have been proposed and studied to find the optimal method of decoding for different tasks. For instance, sampling-based decoding techniques are normally adopted to encourage more diverse responses instead of generic ones. Thus, we also investigated the impact of the decoding strategy on generation neutrality for the best-performing baseline (BART_ft+LProp).
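To make the comparison in the next section concrete, the sketch below shows how the different decoding strategies can be invoked with HuggingFace's generate() and scored with the NEUTRAL metric defined earlier. The rouge-score package, the exact-substring bias check, and the decoding hyper-parameters are our assumptions for illustration, not the authors' exact configuration.

```python
from rouge_score import rouge_scorer

def bias_exists(generated: str, bias_spans) -> bool:
    """True if the generation still contains at least one annotated bias span."""
    return any(span in generated for span in bias_spans)

def neutral_score(generations, bias_span_sets, center_articles) -> float:
    """NEUTRAL = ROUGE-1 recall x (100 - BIAS) / 100, computed over the test set."""
    scorer = rouge_scorer.RougeScorer(["rouge1"])
    bias = 100.0 * sum(
        bias_exists(g, b) for g, b in zip(generations, bias_span_sets)
    ) / len(generations)
    rouge1_recall = 100.0 * sum(
        scorer.score(c, g)["rouge1"].recall
        for c, g in zip(center_articles, generations)
    ) / len(generations)
    return rouge1_recall * (100.0 - bias) / 100.0

# Illustrative settings for the decoding strategies compared below
# (to be passed to model.generate(**inputs, max_length=512, **cfg)):
decoding_configs = {
    "beam":  dict(num_beams=4),
    "top_k": dict(do_sample=True, top_k=50),
    "top_p": dict(do_sample=True, top_p=0.9),
}
```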
We explored two questions: i) whether different decoding strategies affect the neutralization performance, and ii) if so, which decoding strategy is the best or the worst for the task of neutralization. The decoding techniques explored were Beam Search, Top-k sampling, and Nucleus sampling (Top-p) (Holtzman et al., 2019). The results are shown in Table 5. For ease of comparison between the decoding techniques, we report the average NEUTRAL scores as well. We can observe that BEAM and Top-p have rather similar average NEUTRAL scores, although each has its strength: BEAM produces better ROUGE-1 scores, whereas Top-p achieves lower BIAS scores (17%). The most notable observation is the performance drop shown in the Top-k decoding results, which lose around 10 ROUGE-1 points while retaining a similar level of bias. It would be interesting future work to understand the root cause of this phenomenon and to devise a decoding technique that can better avoid generating media bias.

Table 6: Generation examples from BART_ft and BART_ft+LProp for two different events. According to our NEUTRAL metric, BART_ft+LProp is more neutral than BART_ft. We highlight neutral and biased parts of the generations.

Event 1: Obama willing to use executive orders on guns
[BART_ft]: President Obama unveiled a sweeping plan on Wednesday to curb gun violence that includes a ban on assault weapons and a 10-round limit on high-capacity magazines, as well as measures to improve school safety and mental health. The president's proposal, unveiled at the White House after a month-long review by Vice President Joe Biden and other top officials, is expected to face stiff opposition from gun-control advocates and Republican lawmakers, who have vowed to fight it in Congress.
[BART_ft+LProp]: President Obama said Wednesday he will propose sweeping gun control measures to address a growing gun violence crisis in the United States, including a ban on assault-style weapons and requiring background checks for all gun purchases. The proposal was unveiled in response to the latest deadly shootings in America, the latest in which 20 people were killed by a gunman with an assault-type rifle.

Event 2: Netanyahu plans to deliver a speech to Congress
[BART_ft]: Susan Rice, President Obama's national security adviser, said on Tuesday that Israeli Prime Minister Benjamin Netanyahu's planned speech in front of a joint session of Congress will be "destructive" to U.S.-Israel relations. "It's destructive to the fabric of the relationship," Rice said on PBS's "Charlie Rose" show. "We need to keep it that way. The American people want it to be bipartisan."
[BART_ft+LProp]: Israel Prime Minister Benjamin Netanyahu will address a joint session of Congress next week on his visit to Washington in what will be his first speech in almost a decade. He will address lawmakers in an address that will come two weeks before he is scheduled to visit the country ahead of his expected election. The event, which will take place at the U.S. Capitol on March 3, comes amid tensions between President Obama and Republicans in Congress over Iran's nuclear program and tensions between the United States and Israel ahead of next month's elections.

In Fig. 2, we visualize the breakdown of bias types (lexical bias, informational bias, or both) that exist in the different versions of the generations. Through this visualization, we aim to investigate whether any one type of bias is harder to eliminate than another.
To begin with, we illustrate the "original input" ratio, which refers to the bias breakdown of the biased input articles (X^(l) + X^(r)). This gives a complete picture of the original breakdown of bias before the neutralization attempt. Then, we visualize the bias breakdown of the generations from the BART_ft model and the BART_ft+LProp model.

Figure 2: Breakdown of bias types. "Original Input" refers to the biased articles before neutralization; the other two represent the generated articles from the BART_ft and BART_ft+LProp models.

The most important insight is that the majority of the eliminated bias is informational bias. For BART_ft+LProp, the "lexical bias only" ratio is relatively unchanged compared to the big drop in "info bias only". Similarly, BART_ft's "lexical bias only" ratio stays unchanged, clearly indicating the difficulty involved in neutralizing lexical bias. This is likely because lexical bias normally exists as a very short phrase hidden within an otherwise neutral sentence, thus requiring more sophisticated mitigation. In contrast, informational bias is shown to be relatively easy to mitigate. We conjecture this is because informational bias is often an additional piece of information intended to sway readers' minds. Information that does not overlap between politically conflicting articles has a high chance of being informational bias, and this serves as a good signal for the neutralization models. For instance, as shown in Table 2, the Left article contains the informational bias span "before President Donald Trump has a chance to interfere," negatively targeting Donald Trump, which does not appear in the Right article.

We compare the generation outputs from two models with different NEUTRAL scores to qualitatively check whether the difference in scores aligns with the generated articles. In this analysis, we selected two models: BART_ft+LProp, the relatively more neutral model, and BART_ft, the less neutral model. The generation examples from Table 6 (we provide more examples in the appendix) illustrate that BART_ft+LProp indeed generates a more neutralized version. In the first example, about gun control, we can clearly notice a difference in the language used when describing the same information: BART_ft+LProp writes in a neutral manner ("address" gun violence), but BART_ft writes in a more sensational style ("curb" gun violence). Going further, we can also observe a difference in neutrality in the nature of the additional information generated by each model. BART_ft+LProp provides substantial detail directly related to the event being discussed (i.e., the reason behind the gun control proposal), but BART_ft reports on the negative opposition that "is expected" to be faced from those who have "vowed to fight it in Congress". For the second example, related to Netanyahu, BART_ft generates more provocative content (i.e., the polarized relationship between the US and Israel) and lexicon (i.e., "destructive"), whereas the BART_ft+LProp generation focuses on factual information related to the actual event (i.e., the event will take place at the U.S. Capitol on March 3).

We conduct A/B testing of neutrality between two articles to verify the alignment between our NEUTRAL metric and human judgment. The A/B pair-comparison method is chosen over the scale-rating evaluation method because it has been shown to be more reliable in the literature (Kiritchenko and Mohammad, 2017). We carry out the human evaluation on a data annotation platform, Appen.com.
Each annotator is provided two different model generations (i.e., BART_ft vs. BART_ft+LProp) and is asked to select the less biased version. Our goal is to see if the model with the higher NEUTRAL score (i.e., BART_ft+LProp) is also perceived to be more neutral by humans. We obtained three annotations each for 50 random samples of the generations. 74% voted for BART_ft+LProp as more neutral, with an average sample-wise agreement of 78.14%, suggesting that our NEUTRAL metric aligns with human judgment. In addition, we also asked the annotators to vote for the generation that has the higher information overlap with the given neutral article. 60% of the annotators voted for BART_ft+LProp as having higher information coverage as well, with an average sample-wise agreement of 75.52%.

To ensure the quality of the annotation, we did the following. First, we only allowed annotators who passed a qualification test to participate in the A/B testing. Second, we asked annotators for their political orientation and only incorporated the answers of annotators with a center political orientation. This was done to reduce the political bias of the annotators as much as possible. For more detail, refer to the Appendix.

In this work, we presented a novel generation-based media bias mitigation task. By providing a new dataset, an evaluation metric, and baselines, this work serves as a solid benchmark for the proposed task. Through a human evaluation, we showed that the proposed metric aligns with human judgment. Furthermore, our experimental results empirically showed that adding an additional disinformation-related loss is helpful for neutralization. Lastly, our analysis showed that lexical bias is harder to neutralize than informational bias and that decoding techniques can affect the neutrality of the generations. We hope our work initiates more research on automatic media bias mitigation to make a positive impact on society.

Our proposed work directly addresses a major ethical issue in our society, namely media bias. We propose a new task for the computational mitigation of media bias. We believe in the potential beneficial impact of our work, as, if published, it will lead to more research on mitigating media bias.

Table 1: ROUGE-2 and ROUGE-L recall scores for baseline generations using beam-4 decoding.

The human evaluation was conducted to check the alignment between our metric and the actual generated articles. Before annotators are given the set of articles, they need to read instructions to gain a basic understanding of media bias; please refer to Figure 1. Annotators are asked to choose between the given articles, which are anonymously named "Article 1" and "Article 2". The questions that annotators were asked to answer are the following:
1. Which article is more biased?
2. Which article is more neutral?
3. Which article contains more information overlapping with the given neutral article?
4. Which article contains less information overlapping with the given neutral article?
5. Check if you are a citizen of the U.S.
To ensure the quality of the human evaluation, we only selected qualified annotators who are experienced and have a higher accuracy rate on the annotation platform (Appen.com). We also added a separate set of qualification tasks so we could ensure that the human annotators understand the task. We designed the quiz set by taking a human-written neutral summary from Allsides.com as the neutral article.
Then, we prepared two manually selected articles: (i) a Neutral-Modified article, which is expected to be selected as more neutral, and (ii) a Biased article, which is expected to be selected as more biased. For the Neutral-Modified article, we changed the order of the sentences while keeping the information and writing style unchanged. For the biased sample, we took opinion pieces written about the event. An example qualification quiz set is available in Table 2. We provide more example generations from BART_ft and BART_ft+LProp; please refer to the example generations below.

Figure 1 (annotator instructions): Media bias refers to the bias produced when journalists report about an event in a prejudiced manner or with a slanted viewpoint. A Neutral News Article refers to one without any preference towards any particular individual, group, or party. In this job, you will be presented with two short articles about one political event. You will also be given a neutral article for reference and to provide background about the event. After reading the two articles, you have to decide which article is more neutral or less biased. You also have to assess which article covers the most overlapping information with the exemplary neutral article; again, you have to choose the one with the most information coverage and the one with the least information coverage. To decide whether an article is biased or not, you can think of the following criteria:
• Commission of unnecessary information: the article includes any unnecessary information that potentially frames the story of the event in a certain direction (positively or negatively toward some specific target).
• The article has omitted any information that could help readers to have a broader view of the event.
• The article contains any biased phrases or a biased way of description.
Example sentences shown to annotators: "Paul Ryan went on offense Tuesday in response to criticism over his Medicare plan, using an interview with Fox News coupled with a new TV ad to claim President Obama's health care plan treats the treasured entitlement like a 'piggy bank,' while the 'Romney-Ryan' plan preserves it."; "In a letter dated Thursday, the GOP committee members accused Schiff of standing 'at the center of a well-orchestrated media campaign' about a possible Trump-Russia connection."; "And that was the point of contention for Democratic lawmakers who complained that exempting parts of the document from the reading undermines the objective of the exercise."; "Trump made waves by very loudly claiming he was considering running as a Republican earlier in the year."
Note: Please assess them as neutral as possible, regardless of your political stance. What articles are considered to be biased?

Example qualification quiz (Table 2):
Neutral article (reference): President Donald Trump's lawyer Rudy Giuliani and former federal prosecutor Sidney Powell led a press conference Thursday, claiming to have proof of a mass conspiracy to elect Joe Biden and manipulate the 2020 presidential election results against Trump. Trump's legal team described him as the election's winner on Thursday; Joe Biden is widely projected to be elected the next president, and there's currently no public evidence of the decisive voter fraud claimed by Trump, Giuliani and others.
Article 1 (Neutral-Modified): Rudy Giuliani, President Donald Trump's lawyer, and Sidney Powell, former federal prosecutor, led a press conference, claiming to have proof of a mass conspiracy to elect Joe Biden and manipulate the 2020 presidential election results against Trump. Trump's legal team described him as the election's winner on Thursday.
Currently, there is no public evidence of the decisive voter fraud claimed by Trump, Giuliani and others, while Joe Biden is widely projected to be elected the next president.
Article 2 (Biased): Rudy Giuliani sweated so much during a press conference Thursday that streaks of what appeared to be dark hair dye streamed down his cheeks, making for a bizarre juxtaposition as the ex-mayor insisted without evidence that Democrats stole the election from President Trump. Giuliani, who's spearheading the Trump campaign's bumbling legal battle to overturn Joe Biden's election, did not appear to notice the black liquid dripping down from his sideburns, though he kept dabbing pearls of sweat from his face with a napkin.

Additional example generations:

[...] which on Thursday sued to stop the state from enforcing its state law. In a federal lawsuit filed in the U.S. Supreme Court, attorneys general from Nebraska and Oklahoma accuse the state of acting illegally by allowing marijuana sales within its borders under its so-called Amendment 64, a measure voters overwhelmingly approved in 2012 that legalizes the possession and consumption of cannabis for adults. The two states filed a lawsuit in federal court in Nebraska asking the court to stop Colorado from enforcing the law.
[BART_ft] Nebraska and Oklahoma on Thursday filed a federal lawsuit challenging Colorado's legalization of recreational marijuana, saying the state's law violates federal law. The two states, which voted overwhelmingly in 2012 to legalize the drug for adults over 21, say the measure violates the supremacy clause of the U.S. Constitution, which bars the federal government from interfering with state marijuana laws. "Federal law undisputedly prohibits the production and sale of marijuana," Nebraska Attorney General Jon Bruning said at a news conference in Lincoln.

Event: AARP drops its opposition to Social Security cuts
[BART_ft+LProp] The head of the American Association of Retired Americans told CNN on Friday he had no plans to change its decades-old stance on cuts to Social Security benefits, a move that would signal a shift in the organization's position on the government's entitlement program. David Certner said the group's position remains unchanged on cuts, despite a Wall Street Journal report on Thursday that it could accept cuts in benefits under a potential privatization proposal that would shrink benefits and increase taxes on wealthier Americans.
[BART_ft] WASHINGTON (AP) - The head of the AARP, the nation's largest advocacy group for senior citizens, on Friday disputed a report that said the group was softening its opposition to cutting Social Security benefits. AARP CEO Barry Rand issued a statement saying the Wall Street Journal story was "misleading" and that the group remains opposed to cutting benefits for the sake of deficit reduction and to any privatization plan.

Event: White House Plays Down Data Program
[BART_ft+LProp] The president on Friday defended his administration's role in intelligence programs as part of the "war on terrorism" after the Guardian and The Washington Post reported on a secret phone records collection program that spied on Internet companies. "I welcome this debate and I think it's healthy for the American people to have a full and full discussion," President Barack Obama said.
"The programs that we have in place are essential for keeping us safe from terrorism and keep us in the know about the threat of radicalization and foreign interference" [BART f t ] President Obama on Friday defended his administration's surveillance programs, saying they are essential to fighting terrorism and that Congress has been fully briefed on them. "The programs are secret in the sense that they are classified. They are not secret, in that every member of Congress have been briefed," he said during a speech in San Jose, Calif. "These are programs that have been authored by large bipartisan majorities repeatedly since 2006." Exposure to opposing views on social media can increase political polarization Team yeon-zi at semeval-2019 task 4: Hyperpartisan news detection by de-noising weakly-labeled data Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension Rouge: A package for automatic evaluation of summaries Truth or error? towards systematic analysis of factual errors in abstractive summaries Powertransformer: Unsupervised controllable revision for biased language correction How the news shapes our civic agenda The spiral of opportunity and frame resonance: Mapping the issue cycle in news and public discourse. Framing public life: Perspectives on media and our understanding of the social world Don't give me the details, just the summary! topic-aware convolutional neural networks for extreme summarization Newscube: delivering multiple aspects of news to mitigate media bias Contrasting opposing views of news articles on contentious issues Media effects and society A stylometric inquiry into hyperpartisan and fake news Automatically neutralizing subjective bias in text Being the new york times: the political behaviour of a newspaper. The BE journal of economic analysis & policy Exploring the limits of transfer learning with a unified text-to-text transformer Linguistic models for analyzing and detecting biased language Event: Walmart sex discrimination case goes to supreme court [BART f t+LProp ] A group of women alleging gender discrimination at Wal-Mart Stores Inc. is seeking a nationwide class action lawsuit over pay and promotions that could lead to billions in damages from the world's largest retailer. The Supreme Court heard arguments on Tuesday about whether the company has violated the rights of workers who claim they were underpaid or mistreated at work. The nine justices heard arguments Tuesday before a panel of three liberal justices about whether Wal-MEX Inc. must answer claims that it pays its workers less than men and sets wage and job conditions that favor men.[BART f t ] The Supreme Court on Tuesday appeared ready to dismiss a massive class action lawsuit against Wal-Mart alleging that the world's largest retailer discriminates against women by favoring men in pay and promotions. The justices heard arguments in a case that could be the largest class action of its kind in U.S. history and could have far-reaching implications for other women's rights cases. Justice Anthony Kennedy, often a key vote on the high court, said the women's argument points in apparently conflicting directions.