title: Automatic Fake News Detection: Are current models "fact-checking" or "gut-checking"?
authors: Kelk, Ian; Basseri, Benjamin; Lee, Wee Yi; Qiu, Richard; Tanner, Chris
date: 2022-04-14

Automatic fake news detection models are ostensibly based on logic, where the truth of a claim made in a headline can be determined by supporting or refuting evidence found in a resulting web query. These models are believed to be reasoning in some way; however, it has been shown that the same results, or better, can be achieved without considering the claim at all, using only the evidence. This implies that other signals are contained within the examined evidence, and could be based on manipulable factors such as emotion, sentiment, or part-of-speech (POS) frequencies, which are vulnerable to adversarial inputs. We neutralize some of these signals through multiple forms of both neural and non-neural pre-processing and style transfer, and find that this flattening of extraneous indicators can induce the models to actually require both claims and evidence to perform well. We conclude with the construction of a model using emotion vectors built off a lexicon and passed through an "emotional attention" mechanism to appropriately weight certain emotions. We provide quantifiable results that support our hypothesis that manipulable features are being used for fact-checking.

Recent events such as the last two U.S. presidential elections have been greatly affected by fake news, defined as "fabricated information that disseminates deceptive content, or grossly distort actual news reports, shared on social media platforms" (Allcott and Gentzkow, 2017). In fact, the World Economic Forum 2013 report designates massive digital misinformation as a major technological and geopolitical risk (Bovet and Makse, 2019). As daily social media usage increases (Statista Research Department, 2021), manual fact-checking cannot keep up with this deluge of information. Automatic fact-checking models are therefore a necessity, and most of them function using a system of claims and evidence (Hassan et al., 2017). Given a specific claim, the models use external knowledge as evidence: typically, a web search query is treated as the claim, and a subset of the top search results is treated as the evidence. There is an implicit assumption that the fact-checking models are reasoning in some way, using the evidence to confirm or refute the claim. Recent research (Hansen et al., 2021) found this conclusion may be premature; current models can show improved performance when considering the evidence alone, essentially fact-checking an unasked question. While this might seem reasonable given that the evidence is conditioned on the claims by the search engine, it can be exploited, as illustrated in Figure 1, which shows that evidence returned for a ridiculous claim can still appear reasonable if we view the evidence alone, without the claim. Furthermore, textual entailment requires both a text and a hypothesis; if we have a result without a hypothesis, we are performing a different, unknown task. This finding indicates a problem with current automatic fake news detection: the models rely on features in the evidence that are typical of fake news rather than performing entailment.
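To make the claim-and-evidence setup concrete, the sketch below shows one common way such inputs are encoded for a BERT-style classifier, contrasting a claim-plus-evidence pair with an evidence-only input. It is a minimal illustration under assumptions, not the exact pipeline used here: the tokenizer checkpoint is a standard public one, and the example claim and snippets are invented for demonstration.

from transformers import AutoTokenizer

# Any BERT-style encoder checkpoint could be substituted here.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

claim = "All Canadians have eaten at least one bear."
evidence_snippets = [  # in practice, the top-10 Google News snippets for the claim
    "Bear meat is legal to sell in several Canadian provinces.",
    "A guide to safely preparing and cooking game meat.",
]

def encode_claim_and_evidence(claim, snippet):
    # Claim and evidence as a sentence pair: [CLS] claim [SEP] snippet [SEP]
    return tokenizer(claim, snippet, truncation=True, max_length=256)

def encode_evidence_only(snippet):
    # Evidence alone: the classifier never sees the claim it should verify.
    return tokenizer(snippet, truncation=True, max_length=256)

paired_inputs = [encode_claim_and_evidence(claim, s) for s in evidence_snippets]
evidence_only_inputs = [encode_evidence_only(s) for s in evidence_snippets]

The evidence-only variant corresponds to the setting in which Hansen et al. (2021) report competitive performance, even though the model never sees the claim it is meant to verify.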
Since most automated fact-checking research is primarily concerned with the accuracy of the results rather than with how the results are achieved, we propose a novel investigation into these models and their evidence. We use a variety of pre-processing steps, both neural and non-neural, to attempt to reduce the affectations common in evidence. We also build on the EmoCred system (Giachanou et al., 2019), adding an "emotional attention" layer to weight the most relevant emotional signals in a given evidence snippet. We make our code publicly available. 1

Figure 1: An example of why evidence alone does not suffice in identifying fake news, despite the evidence being conditioned on the claim as a search-engine query. Although the returned evidence appears reputable, it clearly has little relevance to deciding the veracity of the claim that "all Canadians have eaten at least one bear."

With each of these methods, we focus on scores where the models perform better using the claims and the evidence combined, S_C&E, rather than the evidence alone, S_E. Going forward, we refer to the difference between these dataset combinations as the delta of the pre-processing step, where delta = S_C&E − S_E. A positive delta indicates that the claim was useful and helped yield an increase in performance. Since we are removing indicators that the current models rely on, some of the models perform worse at the task than they did previously. A surprising result, however, is that many improve, and the need to consider the claim and the evidence together is a sign of reasoning rather than reliance on manipulable indicators.

Under current fact-checking models, adversarial data can subvert these detectors: paraphrasing or inserting fictitious statements into otherwise truthful evidence has little effect on the model's output. For example, an article titled "Is the GOP losing Walmart?" could have "Walmart" substituted with "Apple," and the predictions remain nearly identical despite the news now being fictitious (Zhou et al., 2019).

1 GitHub repository link

There has been significant work on automatic fact-checking models using RNNs and Transformers (Shaar et al., 2020a; Alam et al., 2020; Shaar et al., 2020b), as well as non-neural machine learning using TF-IDF vectors (Reddy et al., 2018). Current fake news detection models that use a claim's search-engine results as evidence may unintentionally use hidden signals that are not attributable to the claim (Hansen et al., 2021). Additionally, models may in fact simply memorize biases within data (Gururangan et al., 2018). Improvements can be made by using human-identified justifications for fact-checking (Alhindi et al., 2018; Vo and Lee, 2020), and making use of textual entailment can also offer improvements (Saikh et al., 2019). Emotional text can signal low credibility (Rashkin et al., 2017), and fake news detection is a task where pre-processing can be used effectively to diminish bias (Giachanou et al., 2019; Babanejad et al., 2020). A framework to both categorize fake news and identify features that differentiate fake news from real news has been described by Molina et al. (2021), and debiasing inappropriate subjectivity in text can be accomplished by replacing a single biased word in each sentence (Pryzant et al., 2020).

Figure 2: Ablation studies where evidence was sequentially removed for training and evaluation of models. On the far left, we show the most effective non-neural pre-processing compared with the baseline of no pre-processing. Performance generally worsens as the ablation increases.
Using the claim as a query, the top ten results from Google News ("snippets") constitute the evidence (Hansen et al., 2021). PolitiFact and Snopes use five labels (False, Mostly False, Mixture, Mostly True, True), which we collapse to True, Mixture, and False. To construct the emotion vectors for our EmoAttention system, we use the NRC Affect Intensity Lexicon, which maps approximately 6,000 terms to values between 0 and 1, representing each term's intensity along eight different emotions (Mohammad, 2017). For example, "interrupt" and "rage" are both categorized as anger words, but with respective intensity values of 0.333 and 0.911.

The most common automatic fact-checking NLP models are based on term frequency, word embeddings, and contextualized word embeddings, using Random Forests, LSTMs, and BERT (Hassan et al., 2017). We limit our experimentation to the BERT model, as it is the highest-performing state-of-the-art model and was thoroughly tested by Hansen et al. (2021). This BERT model with no pre-processing is our baseline. For style transfer, we use the Styleformer model (Li et al., 2018; Schmidt, 2020), a Transformer-based seq2seq model. We also develop our own BERT-based model using the EmoLexi and EmoInt implementations of the EmoCred system (Giachanou et al., 2019), adding an emotional attention layer to emphasize certain emotion representations for a given claim and its evidence. A separate snippet attention layer attends to which evidence snippets should be weighted most heavily for the given claim.

Our goal is to separate affect-based properties from the factual content of the text. Toward this, we run a large number of permutations of the following four simple pre-processing steps (see Figure 4 in Appendix B for results). These steps were chosen because they have been shown to facilitate affective tasks such as sentiment analysis, emotion classification, and sarcasm detection (Babanejad et al., 2020). In some cases we use a modified form, such as removing adverbs for POS pre-processing.

• Negation (NEG): A mechanism that transforms a negated statement into its inverse (Benamara et al., 2012). For example, "I am not happy" would have "not" removed and "happy" replaced by its antonym, forming the sentence "I am sad."
• Parts-of-Speech (POS): We keep only three parts of speech: nouns, verbs, and adjectives. We initially included adverbs but found that removing them improved results, possibly because some adverbs are emotionally charged.
• Stopwords (STOP): These are generally the most common words in a language, such as function words and prepositions; we remove them using the NLTK library.
• Stemming (STEM): Reducing a word to its root form. We use the NLTK Snowball Stemmer.

We use the adversarial technique of generating paraphrases for all claims and evidence through style transfer. The neural Transformer-based seq2seq model Styleformer changes the formality of the text, and it frequently changes the ordering of the sentence itself as well. For example, the formal-to-informal model changes "A photograph shows William Harley and Arthur Davidson unveiling their first motorcycle in 1914" to "In a 1914 photograph William Harley and Arthur Davidson unveil their first motorcycle."
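As a rough sketch of how such a paraphrasing step could be applied to every claim and evidence snippet before training, the code below uses the Hugging Face text2text-generation pipeline; the checkpoint name is only a placeholder for whichever formality-transfer (Styleformer-style) seq2seq model is used, and the example strings are illustrative rather than taken from our data.

from transformers import pipeline

# Placeholder checkpoint: substitute the actual formal-to-informal
# (or informal-to-formal) style-transfer model being used.
restyler = pipeline("text2text-generation", model="your-org/formal-to-informal")

def restyle(texts, max_length=128):
    # Paraphrase each claim or evidence snippet via formality transfer.
    restyled = []
    for text in texts:
        output = restyler(text, max_length=max_length)
        restyled.append(output[0]["generated_text"])
    return restyled

claims = ["A photograph shows William Harley and Arthur Davidson "
          "unveiling their first motorcycle in 1914."]
snippets = ["The first Harley-Davidson motorcycle was built in 1903."]

restyled_claims = restyle(claims)
restyled_snippets = restyle(snippets)

The restyled claims and snippets then replace the originals when training and evaluating the classifier, so any stylistic signal the paraphraser flattens is no longer available to the model.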
The formal-to-informal model also removes punctuation and alters phrasing that might be understood as sarcasm, changing "Melania Trump said that Native Americans upset about the Dakota Access Pipeline should 'go back to India'" to "Melania Trump told Native Americans that was upset by the Dakota Access Pipeline, that they should travel to India." The informal-to-formal model lowercases everything and also changes the text significantly. We chose this paraphrasing model based on the idea that fake news, especially that which is frequently posted on social media, has a certain polarizing style that might be neutralized by altering the formality of the text. Rather surprisingly, we obtained better results transforming the style from formal to informal than from informal to formal.

The EmoCred systems EmoLexi and EmoInt use a lexicon to determine emotional word counts and intensities, respectively (Giachanou et al., 2019). We use the NRC Affect Intensity Lexicon, a "high-coverage lexicon that captures word-affect intensities" for eight basic emotions, which was created using a technique called best-worst scaling (Mohammad, 2017). These eight emotions can be used to create an emotion vector for a sentence, where each index corresponds to a score: [anger, anticipation, disgust, fear, joy, sadness, surprise, trust]. As an example, a sentence that contains the word "suffering" conveys sadness with an NRC Affect Intensity Lexicon intensity of 0.844, whereas the word "affection" indicates joy with an intensity of 0.647. We create a vector of length eight, and for each word associated with an emotion, the emotion's indexed value is either (1) incremented by one for EmoLexi, or (2) incremented by its intensity for EmoInt. Thus, the sentence "He had an affection for suffering" would have an EmoLexi emotion vector of [0, 0, 0, 0, 1, 1, 0, 0] and an EmoInt emotion vector of [0, 0, 0, 0, 0.647, 0.844, 0, 0].

We build on this EmoCred framework, adding an attention system for emotion that gives a weight to each emotion vector, just as the attention layer over snippets gives a weight to each snippet. The result is two independent attention layers, one attending over the ten snippets and one over the ten emotional representations; we call the resulting system Emotional Attention (see Figure 3).

Surprisingly, the four top-performing models on the Snopes dataset include two non-neural and two neural pre-processing approaches. All four achieve greater F1 Macro scores than the baseline BERT model without pre-processing (see Figure 2). POS and STOP yield the biggest delta between S_C&E and S_E, followed by EmoInt and Informal Style Transfer. However, EmoInt yields the highest F1 Macro, followed by POS, Informal, and STOP. On PolitiFact, none of the pre-processing steps achieves a delta greater than zero for S_C&E versus S_E. The combination of POS+STOP comes closest to parity, followed by EmoInt, then POS and STOP. For the best F1 Macro scores overall, EmoAttention's two forms (i.e., EmoInt and EmoLexi) were the two best, followed by STOP and POS. All of these pre-processing steps achieve higher F1 Macro scores than the baseline BERT model. Further, they yield better deltas for S_C&E versus S_E than the baseline, implying that the model now requires the claims in order to reason. Many pre-processing steps increase both the model's F1 scores and its need for claims and evidence, supporting our hypothesis that signals in style and tone have become a crutch for fact-checking models.
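For concreteness, the sketch below illustrates the EmoLexi and EmoInt vector construction described above, using a tiny hand-written subset of the NRC Affect Intensity Lexicon in place of the full ~6,000-term file; it is a minimal illustration, not the exact implementation behind our EmoAttention model.

# Emotion order used for the vectors throughout the paper.
EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]

# Tiny illustrative subset of the NRC Affect Intensity Lexicon;
# in the full lexicon a term may carry intensities for several emotions.
NRC_SUBSET = {
    "interrupt": {"anger": 0.333},
    "rage":      {"anger": 0.911},
    "affection": {"joy": 0.647},
    "suffering": {"sadness": 0.844},
}

def emotion_vectors(sentence, lexicon=NRC_SUBSET):
    # Returns the (EmoLexi, EmoInt) vectors of length eight for one sentence.
    emo_lexi = [0.0] * len(EMOTIONS)  # count-based
    emo_int = [0.0] * len(EMOTIONS)   # intensity-based
    for token in sentence.lower().split():
        token = token.strip(".,!?\"'")
        for emotion, intensity in lexicon.get(token, {}).items():
            idx = EMOTIONS.index(emotion)
            emo_lexi[idx] += 1.0
            emo_int[idx] += intensity
    return emo_lexi, emo_int

lexi_vec, int_vec = emotion_vectors("He had an affection for suffering")
# lexi_vec -> [0, 0, 0, 0, 1.0, 1.0, 0, 0]
# int_vec  -> [0, 0, 0, 0, 0.647, 0.844, 0, 0]

In the full system, one such pair of vectors is computed for each of the ten evidence snippets and passed, alongside the snippet representations, through the emotional attention layer.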
Rather than performing entailment, the models are leveraging other signals, perhaps similar to sentiment analysis, and relying on a "gut feeling." EmoAttention generates our best predictions and deltas, confirming our suspicion that the models rely on emotionally charged style as a predictive feature. This can be narrowed further to emotional intensity: the intensity-based EmoInt model performs much better than its count-based counterpart EmoLexi. Thus, evidence containing emotions associated with fake news is weighted more heavily when scoring the claim.

One surprising result is the effectiveness of the simple POS and STOP pre-processing steps. POS keeps only nouns, verbs, and adjectives, and therefore removes a superset of the words that STOP removes; this could explain why it has the best delta between S_C&E and S_E. Future research could investigate whether stopwords, which are often discarded, actually carry signals such as anaphora, a repetitive rhetorical style that can affect NLP analyses (Liddy, 1990). For example, Donald Trump makes heavy use of anaphora in his 2017 inauguration speech: "Together, we will make America strong again. We will make America wealthy again. We will make America proud again. We will make America safe again. And, yes, together, we will make America great again." (Trump Inauguration Address, 2017) By removing the stopwords "we", "will", and "again", the model relies less on the text's rhetorical style and more on the entailment we are seeking. We propose further study of the effects of STOP and POS, as well as experimenting with different emotional vectors and EmoAttention, to make fact-checking models more robust.

Automatic fake news detection remains a challenging problem, and unfortunately, current fact-checking models can be subverted by adversarial techniques that exploit emotionally charged writing. Disinformation is much more than a mild inconvenience for society; it has resulted in needless deaths during the COVID-19 pandemic and has fomented violence and political instability all over the globe (van der Linden et al., 2020). Our goal in this paper is to discover exploitable weaknesses in current fact-checking models and to recommend that such models not be relied upon in their current form. We point out how the models depend on emotional signals in the texts instead of exclusively performing textual entailment, and that additional research needs to be done to ensure they are performing the proper task.

Harm Minimization: Our quantification of the effects of pre-processing on fact-checking models does not cause any harm to real-world users or companies. Research has demonstrated that adversarial attacks could result in disinformation being labeled as factual news. Disinformation has become increasingly present in global politics, as some nation-states with significant resources have disseminated propaganda to create political dissent in other countries (Zhou et al., 2019). Our research here has demonstrated potential risks: emotional writing could be used as an exploit to circumvent fact-checking models. Thus, we urge others to further illuminate such vulnerabilities, to minimize potential harms, and to encourage improvements with new models.

Deployment: Social media companies often deal with fake news by placing highly visible labels on it. However, simply tagging stories as false can make readers more willing to believe and share other false, untagged stories.
This unintended consequence, in which the selective labeling of false news makes other news stories seem more legitimate, has been called the "implied-truth effect" (Pennycook et al., 2019). Thus, unless these models become so accurate that they catch all fake news presented to them, the entire basis of their use is called into question. Despite the significant progress in developing models to correctly identify fake news, the real elephant in the room is that many people simply ignore the labels (Molina et al., 2021). There is, however, prior work supporting the idea that if people are warned that a headline is false, they will be less likely to believe it (Ecker et al., 2010; Lewandowsky et al., 2012). Because of this, we believe this research represents a net benefit for humanity. Warning labels are just one way of dealing with properly identified fake news, and publishers can choose to simply not allow it on their platforms. Of course, this issue leads to questions of censorship.

In Figure 4, we report the full results for each pre-processing step.

Figure 4: The full table of results for all pre-processing steps on the Snopes (SNES) and PolitiFact (POMT) datasets. Due to the high compute requirements of the formal and informal style-transfer models, those variants were only prepared for the Snopes dataset. The darkest green indicates the best results, while red indicates the worst. Multiple pre-processing steps, such as (pos, stop), were performed in the order written.

References

Fighting the COVID-19 infodemic: Modeling the perspective of journalists, fact-checkers, social media platforms, policy makers, and the society
Where is your evidence: Improving fact-checking by justification modeling
Social media and fake news in the 2016 election
MultiFC: A real-world multi-domain dataset for evidence-based fact checking of claims
Aijun An, and Manos Papagelis. 2020. A comprehensive analysis of preprocessing for word representation learning in affective tasks
How do negation and modality impact on opinions?
Influence of fake news in Twitter during the 2016 US presidential election
Explicit warnings reduce but do not eliminate the continued influence of misinformation
Leveraging emotional signals for credibility detection
Annotation artifacts in natural language inference data
Automatic fake news detection: Are models learning to reason?
Toward automated fact-checking: Detecting check-worthy factual claims by ClaimBuster. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '17
Misinformation and its correction: Continued influence and successful debiasing
Delete, retrieve, generate: A simple approach to sentiment and style transfer
Anaphora in natural language processing and information retrieval
"Fake news" is not simply false information: A concept explication and taxonomy of online content
The implied truth effect: Attaching warnings to a subset of fake news headlines increases perceived accuracy of headlines without warnings
Automatically neutralizing subjective bias in text. Proceedings of the AAAI Conference on Artificial Intelligence
Truth of varying shades: Analyzing language in fake news and political fact-checking
DeFactoNLP: Fact verification using entity recognition, TF-IDF vector comparison and decomposable attention
A Novel Approach Towards Fake News Detection: Deep Learning Augmented with Textual Entailment Features
Generative text style transfer for improved language sophistication
Overview of CheckThat! 2020 English: Automatic identification and verification of claims in social media
Inoculating against fake news about COVID-19
Where are the facts? Searching for fact-checked information to alleviate the spread of fake news
Fake news detection via NLP is vulnerable to adversarial attacks