key: cord-0314011-fhdsdxf3 authors: Botnevik, Bjarte; Sakariassen, Eirik; Setty, Vinay title: BRENDA: Browser Extension for Fake News Detection date: 2020-05-27 journal: nan DOI: 10.1145/3397271.3401396 sha: 97a178d03d075e2a06132d6a45cef2bb4ea6be19 doc_id: 314011 cord_uid: fhdsdxf3

Misinformation such as fake news has drawn a lot of attention in recent years. It has serious consequences for society, politics and the economy. This has led to a rise of manual fact-checking websites such as Snopes and Politifact. However, the scale of misinformation limits their ability to verify claims. In this demonstration, we propose BRENDA, a browser extension that can be used to automate the entire process of credibility assessment for false claims. Behind the scenes, BRENDA uses a tested deep neural network architecture to automatically identify fact-check-worthy claims, classify them, and present the result to the user along with evidence. Since BRENDA is a browser extension, it facilitates fast automated fact checking for the end user without having to leave the Webpage.

Online fake news has become a major societal challenge due to its consequences in real life. For example, there are instances of stock market disruptions 1, election meddling 2 and mob lynchings 3. To address this, several fact checking organizations such as Snopes, Politifact and FullFact have become popular. Typically they employ experts and journalists who perform the tedious task of manually selecting fact-check-worthy claims made in online news and social media and debunking them.
1 http://business.time.com/2013/04/24/how-does-one-fake-tweet-cause-a-stockmarket-crash/
2 https://www.theguardian.com/commentisfree/2016/nov/14/fake-news-donald-trump-election-alt-right-social-media-tech-companies
3 https://en.wikipedia.org/wiki/Indian_WhatsApp_lynchings

We propose BRENDA, a proof-of-concept browser extension that anyone can install on desktop browsers to perform end-to-end fact checking. BRENDA automates the following two tasks: (1) selecting fact-check-worthy claims and (2) verifying the truthfulness of claims based on the evidence found online. Existing demos (e.g., CredEye [6], FactMata 4, [4]) are limiting for users reading online news, since they must first identify the claims within the articles and then switch to a different website for fact checking. There are also demos that either do only claim ranking [1] or just list the relevant websites [10]. Moreover, existing demos do not provide any explanation for the claim classifications. No existing demo jointly identifies claims, fact checks them, and provides evidence to support the decision. To address these issues, BRENDA provides the following contributions:

Most fact-checking websites, such as Snopes.com and Politifact.com, perform manual fact checking. Some automated fact-checking systems such as CredEye [6] are available.
However, since CredEye only uses word-level attention, it can only highlight which words were used for classifying a claim. BRENDA, on the other hand, can provide evidence at both the word level and the sentence level. Moreover, BRENDA can provide evidence with respect to each aspect of the claim, such as subject, author and domain. FactMata is a commercial tool for automated fact-checking, but there is no description of its detection algorithm, and it does not provide any evidence snippets. Grover 5 [9] is another solution, which focuses on detecting neural-generated fake news. To the best of our knowledge, none of these systems is provided as a browser extension that allows users to fact-check without leaving the article they are reading. There are some browser extensions such as The Factual 6, Trusted Times 7, and FakerFact 8 which claim to support automated fact checking and are listed in the Google Chrome extension store. However, there is no research paper or documentation explaining the models they use. Moreover, we could not find any system that narrows down the claim within the article using fact-check worthiness detection and uses that claim to detect fake news.

BRENDA follows a client-server architecture with a frontend and a backend module. The frontend is a browser extension and the backend is a Python Flask server.

We develop a browser extension which works with the popular Google Chrome browser. When the user invokes fact checking by clicking on the browser extension, JavaScript modules retrieve information and details from the web page and send the query to the server. When the results are returned from the server, another JavaScript module is invoked to display them.

The server provides a RESTful API for the browser extension. The browser extension sends the URL or claim text chosen by the user to the server.
The server then analyzes the claim text: it first retrieves relevant articles from the Web via search engines such as Google, analyzes them by applying machine learning models, and predicts the credibility of the claim. A score indicating how credible the claim is, based on the evidence found, is sent back to the browser. In this section, we explain the different parts of the server. The overall block diagram of the server can be seen in Figure 1.

Querying the Web: Given a claim text, we use the Google API to retrieve the top-10 relevant web pages. We use the claim text as the query without quotes. Before passing the text to the neural network for credibility prediction, we preprocess it with the Python library Newspaper3k 9 to tokenize it and extract the publication date, authors, summary, etc. Since not all parts of a news article are important for classifying the claim, we filter the articles for relevant snippets using cosine similarity (inspired by [7]). We then select all snippets with a similarity score above 0.75 for fact checking.

SADHAN Model: In this demo, for the classification of fake news articles and false claims we use a deep neural network called SADHAN [5]. The SADHAN model uses a hierarchical neural attention mechanism [8] to learn representations of both the claim text and the evidence news article, at both the word level and the sentence level. As shown in Figure 2, SADHAN takes the claim text and evidence document embeddings as input. Optionally, SADHAN can also take latent aspects such as 'author', 'topic' and 'domain' into account to guide the attention. The aspect attribute vector used to compute attention at both the word and sentence level comes from latent aspect embeddings, whose weights are trained jointly in the model using the corresponding aspect attentions. As shown in Table 1 which uses word-level attention such as DeClarE [7] . For more details and performance evaluation of SADHAN see [5].
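The snippet-filtering step described under "Querying the Web" can be illustrated with a minimal sketch. The text only states that cosine similarity is used and that snippets scoring above 0.75 are kept; it does not say how claims and snippets are vectorized. The bag-of-words vectors below are therefore an assumption made for illustration, and the function names `bag_of_words`, `cosine_similarity` and `select_snippets` are hypothetical, not part of BRENDA.

```python
import math
import re
from collections import Counter

SIMILARITY_THRESHOLD = 0.75  # threshold stated in the text


def bag_of_words(text):
    """Lowercased term counts. ASSUMPTION: a stand-in for whatever
    vector representation the real system uses."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


def select_snippets(claim, snippets, threshold=SIMILARITY_THRESHOLD):
    """Return (score, snippet) pairs whose similarity to the claim
    is strictly above the threshold."""
    claim_vec = bag_of_words(claim)
    scored = [(cosine_similarity(claim_vec, bag_of_words(s)), s) for s in snippets]
    return [(score, s) for score, s in scored if score > threshold]


claim = "disinfectants cure covid"
snippets = [
    "Disinfectants cure covid, some claim",
    "The weather is nice today",
]
kept = select_snippets(claim, snippets)
for score, snippet in kept:
    print(f"{score:.3f}  {snippet}")
```

With real article text, a sparse word-count representation would rarely reach 0.75 for paraphrased evidence, so a production system would more plausibly compare dense embeddings; the thresholding logic would stay the same.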
Claim Detection: Since not all sentences in an article are worthy of a fact check, we train a classifier and use it to detect check-worthy sentences. We use ULMFiT, a language model fine-tuning technique [2], with a model inspired by Averaged-SGD-LSTM [3] to train our classifier. The model is trained on a dataset of 9069 labeled sentences (4094 from a presidential debate dataset 10 and 4975 from the Politifact dataset 11). We combined these two datasets, and together the dataset has 4666 sentences with the label "claim" and 4193 with the label "non-claim". We performed 5-fold cross validation and obtained a precision of 0.913, a recall of 0.937 and an F1 score (micro) of 0.920. We use the softmax value of the model as the check-worthiness score for a given sentence.

10 https://github.com/apepa/claim-rank/tree/master/data
11 politifact.com

Figure 4: Evidence visualization for the claim "Covid-19 can be cured by ingesting disinfectants" using the attention weights.

The screen recording of the demonstration can be found here 12. When the user invokes BRENDA, a popup is launched where the user can choose the method used to analyze the article. The user can choose one of two options, as shown in Figure 3(a). When "Analyze marked text" is chosen, the selected text is used as the claim and sent to the server, which then runs the series of web page extraction, NLP and classification steps explained in Section 3.2. The result from the SADHAN model is displayed in the same popup window (see Figure 3(b)). The user can also choose to see the evidence by clicking on the "evidence" button, which extracts the evidence snippet according to the attention mechanism of the SADHAN model. Users can also give feedback if the model makes a mistake (Figure 3(c)), which in turn could potentially be used to improve the classifier or evaluate its performance on live data. When "Analyze the whole article" is clicked, another popup, shown in Figure 3(d), is launched.
BRENDA automatically analyzes the whole article and fact checks the top-scored claim using the SADHAN model. The user can also explore other identified claims in the article by setting the claim score threshold and the number of top-k sentences. The user can also provide feedback on the claim score predicted by our model. When the user clicks on the evidence button, the user can also see the sentences highlighted on the basis of the attention mechanism in the SADHAN model [5]. The sentence-level attention weights are aggregated using word-level attention weights. This provides an intuitive understanding of the text the model considered important for the classification. For example, in Figure 4, for the claim "Covid-19 can be cured by ingesting disinfectants", the evidence is shown with highlighted sentences whose color contrast is proportional to the normalized aggregated word-level attention weights. The Chrome browser extension, along with instructions on how to install it, can be found here 13.

In this demonstration we proposed BRENDA, a browser extension to tackle the challenge of misinformation. The user can use BRENDA to first identify fact-check-worthy claims in any news article online. Subsequently, the user gets the credibility classification from a sophisticated deep neural network model. Users are also presented with the evidence from the model, and can achieve all this without leaving the Web page of the news article they are reading.
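The claim-exploration step in the demonstration, where the user sets a claim score threshold and a top-k value, can be sketched as follows. This is only an illustration under stated assumptions: the scores stand for the classifier's softmax check-worthiness values, while the function name `select_claims`, the default threshold of 0.5, the default k of 3 and the example scores are all hypothetical and not taken from BRENDA.

```python
def select_claims(scored_sentences, threshold=0.5, top_k=3):
    """Keep sentences whose check-worthiness score reaches the threshold,
    ordered from highest to lowest score, truncated to top_k entries.

    scored_sentences: list of (sentence, score) pairs, score in [0, 1].
    ASSUMPTION: the default threshold and top_k are illustrative only.
    """
    above = [(s, p) for s, p in scored_sentences if p >= threshold]
    above.sort(key=lambda pair: pair[1], reverse=True)
    return above[:top_k]


# Hypothetical scores for four sentences of an article.
scored = [
    ("Sentence A", 0.90),
    ("Sentence B", 0.20),
    ("Sentence C", 0.70),
    ("Sentence D", 0.95),
]
selected = select_claims(scored, threshold=0.5, top_k=2)
top_claim = selected[0][0] if selected else None  # the claim that would be fact checked first
print(selected, top_claim)
```

The first element of the result corresponds to the "top-scored claim" that is fact checked by default; the remaining elements are the other claims a user could explore.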
[1] A context-aware approach for detecting worth-checking claims in political debates
[2] Universal language model fine-tuning for text classification
[3] Regularizing and optimizing LSTM language models
[4] Automated fact checking in the news room
[5] Sadhan: Hierarchical attention networks to learn latent aspect embeddings for fake news detection
[6] Credeye: A credibility lens for analyzing and explaining misinformation
[7] Declare: Debunking fake news and false claims using evidence-aware deep learning
[8] Hierarchical attention networks for document classification
[9] Defending against neural fake news
[10] Claimverif: a real-time claim verification system using the web and fact databases