key: cord-0582237-fnk0o3yk authors: Ghosal, Soumya Suvra; Deepak, P; Jurek-Loughrey, Anna title: ReSCo-CC: Unsupervised Identification of Key Disinformation Sentences date: 2020-10-21 journal: nan DOI: nan sha: 71585be23073a49dcffc116c7f1db9369d9932d4 doc_id: 582237 cord_uid: fnk0o3yk Disinformation is often presented in long textual articles, especially when it relates to domains such as health, often seen in relation to COVID-19. These articles are typically observed to have a number of trustworthy sentences among which core disinformation sentences are scattered. In this paper, we propose a novel unsupervised task of identifying sentences containing key disinformation within a document that is known to be untrustworthy. We design a three-phase statistical NLP solution for the task which starts with embedding sentences within a bespoke feature space designed for the task. Sentences represented using those features are then clustered, following which the key sentences are identified through proximity scoring. We also curate a new dataset with sentence level disinformation scorings to aid evaluation for this task; the dataset is being made publicly available to facilitate further research. Based on a comprehensive empirical evaluation against techniques from related tasks such as claim detection and summarization, as well as against simplified variants of our proposed approach, we illustrate that our method is able to identify core disinformation effectively. The internet is rife with different types of disinformation openly available to the public. Research has shown that disinformation tends to spread much faster and further than truth, as illustrated in a recent study [26] . The spread of fake news is further aided by cognitive patterns such as confirmation bias. 
Fake news published online can have serious consequences on our health, democracy and economy. Data science approaches for estimating the veracity of an article, i.e., determining whether fake or true, can be broadly seen as exploiting one or more of the following three categories of information: content, structural and propagation. The content refers to the textual as well as any image/multimedia content of the article, whereas structural information refers to the usage of the social network positioning of either the authors or 'sharers' within a social network. Propagation patterns, on the other hand, consider making use of how fast the news is re-shared and whether the sharing happens through specified cross-sections of the network or more broadly. While all three types of information have been used, structural and propagation information have been most popular. In fact, techniques that totally discard content, such as [16] and [29] , have also met with reasonable success.

iiWAS '20, November 30-December 2, 2020
While arriving at a verdict on the veracity of news articles has rightly been the subject of much data science activity in this space, recent guidelines on fact checking, such as that from a recent EU expert group on disinformation [5], place a lot of emphasis on democratic practices for combating fake news, such as empowering users by facilitating a positive engagement between users and technologies. A natural direction to tackle this would be to move towards identifying key disinformation extracts from an article as a way of providing more fine-grained information than a single article-level veracity verdict. These extracts, we believe, will function as a prop to encourage the user to ascertain the veracity for herself by carefully perusing such key disinformation. Likely motivated by such considerations, a number of recent techniques adopt claim identification as an important step [1, 10, 15] in the fact-checking process. Our observation has been that disinformation within online articles is not necessarily in the form of well-structured claims, especially in the case of articles comprising long narratives. Further, even when disinformation is embedded within claims, the core disinformation within the article may be localized within a few claims, rather than spread evenly across all claims. In this work, we outline the novel task of identifying sentences containing key disinformation within a long textual article known to contain disinformation, within an unsupervised setting where no labelled information is available to aid the key disinformation identification process. We propose a technique for the task, called ReSCo-CC, that adopts a three-step approach. First, sentences in the article are represented within a bespoke feature space, sentence representations being based on their relevance to the entire article, smoothness with adjacent sentences, and coherence as measured using entities referenced within them.
Second, simple existing clustering techniques are applied in order to identify clusters of sentences based on their positioning in the feature space. Third, clusters are ranked on the basis of their coherence and centrality in order to identify a cluster of sentences which would be regarded as key disinformation. In addition to a binary sentence-level judgement on whether sentences contain key disinformation as described above, we aim to score and rank sentences to reflect their contribution to the overall disinformation within the article. Based on an empirical evaluation against a number of natural baseline methods over a newly curated dataset that is being made publicly available, we establish the effectiveness of our method. We start by covering some related work in Section 2. We define our task in Section 3, followed by detailing our proposed method in Section 4. Our experimental analysis comparing our method against baselines over a real-world dataset is presented in Section 5. This is followed by conclusions and pointers for future work.

As outlined earlier, there has been an abundance of work on automated approaches for fake news detection. In this section, we focus on techniques that attempt to analyze sub-article level information for combating fake news, aligning well with our goal of identifying key disinformation sentences within articles. The research in the space of sub-article level disinformation spans multiple fields including ML and NLP, and most of it targets one of the following objectives: claim identification, claim verification and verdict justification/explanation. Broadly speaking, the main dimensions on which the existing approaches differ include: (i) claim representation and identification method; (ii) evidence used in the process; (iii) method used for automated fact checking, and (iv) final verdict representation. For (i), claims are often represented as subject-predicate-object triples (e.g.
mobile-cause-cancer) or textual fragments. In both cases, the main challenge is the non-trivial level of processing required to extract the claim from a text. The most common approaches rely on a combination of NLP and ML [10] . Within (ii), existing fact checking systems include those that restrict attention to just the information within the claim, and those that seek additional evidence beyond the claim. The former estimate the veracity of a claim using surface-level linguistic features [19] and additional metadata like author's history of previous false claims [27] . The fact checking task itself (i.e., (iii)) is often defined as a supervised learning problem. It involves construction of a text classification model using labelled data, such as existing claims previously annotated as fake or non-fake. The classifier is further applied to assign fake/non-fake label to a new claim [19, 27] or to detect fact-check worthy claims [11] . Text classification only considers features extracted from the claim itself or the author's profile and does not rely on any other sources of evidence. Evidence based methods commonly consider fact checking as a knowledge graph analysis problem in order to predict whether an unobserved triple is likely to appear in the graph [4, 23, 25] . Another common approach casts fact checking as a textual entailment task, modelled to predict whether a document or its part is for, against or observing a given claim [7, 20] , or in order to retrieve sentence-level evidence for the claim [12] . Methods that rely on repositories of previously fact checked claims are mainly based on sentence-level textual similarity [10, 25] . Task (iv), in the simplest scenario, is just about providing a true/false output directly from a binary classification engine [18] . Alternatively, multi-class labels over a multi-point scale [19] , or a numerical range indicating how likely a claim is to be true [2] are used. 
Against the backdrop of claim-based and NLP-oriented data science solutions, we propose a significantly different task formulation, that of scoring sentences based on their contribution to the overall article disinformation. While we haven't come across work addressing this particular task, we will evaluate our method against state-of-the-art claim detection and summarization methods. Our task is designed to be invoked on only those news articles that are known to contain disinformation; thus, identifying disinformation-laden articles is a task that is upstream to ours. In addition to a variety of supervised learning methodologies that seek to accurately identify articles as containing fake news or not, there has been recent interest in unsupervised fake news detection as well. Such unsupervised methods vary widely in character, with techniques capitalizing on synchronous user behavior [8] and emotions [13]. Other more subtle features, such as lexical [22] and thematic [21] ones, have also been found to be useful to differentiate fake news from real news without the aid of supervision.

Consider a document $D$ which is known to contain some disinformation. Let the document comprise $n$ sentences; $D = [s_1, s_2, \ldots, s_n]$. Our task targets unsupervised scoring of sentences at a per-document level, in a way that the scoring correlates with the contribution of the sentence to the overall disinformation within the document. We would like to consider two scenarios, with each $d_i$ being either a binary or a numeric value indicating whether/how much $s_i$ contributes to the disinformation within document $D$. We call the setting of binary $d_i$ the identification task, and the numeric setting the scoring task. Unlike similar tasks such as claim detection, which are typically formulated within supervised settings, we address the unsupervised identification/scoring problem.

... will not be intercepted in any way.
However, as noted in the EU report [5], such a paradigm of presenting disinformation risks undermining freedom of expression as well as the diversity of the news media ecosystem, and democratic liberal values in general. The task of key disinformation identification offers a pathway towards addressing this issue, that of enabling selective highlighting of key disinformation sentences in an article. Such selective highlighting, with appropriate indicators, as well as perhaps shortcut links to authoritative information on the topic of the sentence, would align the presentation paradigm with a much more democratic model focusing on user empowerment and positive engagement (as alluded to earlier). Thus, the key disinformation identification task enriches the fake news detection pipeline with presentation opportunities in line with cherished democratic values. Any improvements achieved in the upstream task of disinformation identification will evidently be complementary to this task, and would enrich the upstream part of the pipeline.

Our method, codenamed ReSCo-CC, aims to address both the identification and scoring tasks outlined in Section 3. Our approach consists of three distinct phases: (i) representing sentences in a bespoke feature space, (ii) clustering the sentence representations, and (iii) choosing sentences based on the clustering output. We describe these in separate subsections herein.

We undertook a qualitative study of various fake news articles, as well as literature on fake news from journalism, with a view to understanding the aspects that could contribute to sentences within them being regarded as core disinformation. Our focus was on identifying sentence characteristics indicative of core disinformation content that may be considered sufficiently generic (so that they would not need additional fine-tuning to work for niche sub-domains), with an additional preference towards characteristics that are amenable to computational modelling.
Our analyses indicated that there are at least three kinds of sentences within a disinformation-laden document: sentences that contain key disinformation, sentences that cushion the key disinformation by providing (tangential) truthful information, and sentences that are oriented towards smoothing the overall flow. Based on this observation, we distilled three key sentence characteristics from our study, viz., relevance, smoothness and coherence, abbreviated together as ReSCo, which form the basis of our bespoke feature space for key disinformation identification. These three features bear significant complementarities, relevance being a document-level feature, smoothness being a more fine-grained feature estimated at a sub-article level, whereas coherence is a sentence-level feature assessed in reference to a knowledge base. We describe them herein.

Relevance of a sentence $s$ in document $D$, with $V(s)$ denoting the vector representation of $s$ and $sim(\cdot,\cdot)$ a similarity between two such vectors, is:

$$Rel(s) = \frac{1}{|D| - 1} \sum_{s' \in D,\, s' \neq s} sim\big(V(s), V(s')\big) \quad (1)$$

Thus, the relevance is quantified as the average of sentence-level similarities between $s$ and each other sentence in $D$, estimated using the chosen vector representation. Under this formulation, which has a flavor of document-level summarization, sentences that bear higher similarities with a larger number of $D$'s sentences will be scored higher. In other words, sentences that are semantically central to the document are likely to achieve higher $Rel(\cdot)$ scores.

Smoothness measures how well each sentence gels with sentences on either side of itself. We quantify it using the same vector representation (as introduced above); one form consistent with this description is:

$$Smo(s_i) = \frac{1}{2} \Big[ sim\big(V(s_i), V(s_{i-1})\big) + sim\big(V(s_i), V(s_{i+1})\big) \Big] \quad (2)$$

Thus, this is a local feature, quantified using the similarity between each sentence and its adjacent ones. Intuitively, it measures how well $s_i$ maintains the local 'flow' of information in the document, with those that offer a smooth flow being accorded higher $Smo(\cdot)$ scores.

Unlike the above two measures, coherence measures the coherence of a sentence in reference to an external knowledge base. First, we identify mentions of Wikipedia entities within each sentence using entity linking methods [9].
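The relevance and smoothness features described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes sentence vectors are already available as plain lists of floats, uses cosine similarity as the similarity function (an assumption), and lets the first and last sentences use only the single neighbour they have (also an assumption). The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def relevance(vectors, i):
    """Average similarity between sentence i and every other sentence in the document."""
    sims = [cosine(vectors[i], v) for j, v in enumerate(vectors) if j != i]
    return sum(sims) / len(sims) if sims else 0.0


def smoothness(vectors, i):
    """Average similarity between sentence i and its adjacent sentences.

    Assumption: boundary sentences use only the one neighbour that exists.
    """
    neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(vectors)]
    sims = [cosine(vectors[i], vectors[j]) for j in neighbours]
    return sum(sims) / len(sims) if sims else 0.0


# Toy 2-d "sentence vectors": the first two sentences agree, the third is unrelated.
toy = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
rel = [relevance(toy, i) for i in range(len(toy))]
smo = [smoothness(toy, i) for i in range(len(toy))]
```

In this toy document the third sentence is dissimilar to everything else, so it receives the lowest relevance and smoothness, while the first sentence has perfect smoothness because its only neighbour is identical to it.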
We denote the set of entities identified within a sentence $s$ as $E(s)$. We then use the Wikipedia2Vec method [30] in order to map each entity $e \in E(s)$ to a vector, denoted as $v_e$. We regard coherent sentences as those that refer to a number of related entities, yielding the following formulation for coherence:

$$Coh(s) = \frac{2}{|E(s)|\,(|E(s)| - 1)} \sum_{\{e, e'\} \subseteq E(s),\, e \neq e'} sim\big(v_e, v_{e'}\big) \quad (3)$$

Thus, $Coh(s)$ measures the average of pairwise similarities between entities mentioned within $s$. As a feature of $s$ that is independent of all other sentences in $D$, this is highly complementary to the relevance and smoothness features.

We now illustrate the qualitative difference between the above three features using extracts from a fake news article in the dataset we use in our experiments, in Table 1. As may be noted therein, the top-scoring sentences for each of the features are quite different qualitatively. The top-Rel sentence is seen to be quite central to the article, the top-Smo sentence identifies a region of very smooth flow of text, and the top-Coh sentence is seen to talk about very related entities (e.g., antibiotics and bacteria). We do not imply that the top-scored sentences are likely to be key disinformation, but simply that these are useful and complementary indicative features to capture; more details on using ReSCo for key disinformation identification and the motivation for using the three features will follow.

Table 1 (extract from a fake news article in the dataset; the markers Rel, Smo and Coh precede the top-scoring sentence on the corresponding feature):

Eczema is the most common skin disease worldwide. A new clinical trial is testing a natural treatment that researchers hope will provide a long-term solution for those dealing with the dry, itchy and painful skin that comes with chronic eczema. The trial uses a cream containing beneficial bacteria to fight harmful bacteria on the skin.
[Smo] While it may seem counterintuitive to treat bacteria with more bacteria, experts say this approach seeks to restore the natural microbial balance of healthy skin. "There are over 1,000 species of bacteria that all live in balance on healthy skin, some that even produce natural antibiotics." . . .
[Coh] Powerful antibiotics are commonly prescribed for eczema, but they kill good bacteria on patients' skin along with the bad. . . .
[Rel] Experts say there is more research to be done, but that the goal of the trial is to discover the best combination of bacteria to clear eczema from the skin and then make it available to patients as a prescription cream.

Having represented the sentences within $D$ in the $\mathbb{R}^3$ ReSCo space as above, we now cluster them to form groups of sentences that are coherent in the $\mathbb{R}^3$ ReSCo space. This may be accomplished by any clustering algorithm; we use the popular centroid-based partitional clustering algorithm, $K$-Means [17]. $K$-Means requires a parameter $K$, the number of clusters in the output. Given that individual articles may vary considerably in the number of sentences they contain, we use the elbow method [14] to choose the value of $K$. The elbow method is a popular method of choosing the number of clusters in the data in a mathematically principled manner [24] that involves choosing a trade-off between the competing criteria of fewer clusters and more coherent ones; in particular, it chooses the point beyond which the returns of choosing larger numbers of output clusters saturate.

It would rightly appear that each of the ReSCo features is correlated with what may be regarded as readability and quality, and one may be left wondering how such a set of features would lend itself to disinformation identification. This brings us to the key intuition and heuristic within our method. While relevance, smoothness and coherence are all individually good characteristics, the combination poses trade-offs due to some conflicting elements across them. A good quality article, i.e., one that is crafted to deceive the reader into believing the disinformation, would comprise sentences, each of which would excel in any one, or perhaps two, of those features.
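The clustering step with an elbow-based choice of the number of clusters can be sketched as below. This is an illustrative sketch only, under several assumptions not stated in the text: the K-Means here is a plain Lloyd's iteration with a deterministic farthest-first initialisation (chosen so that the example is reproducible), and the elbow is taken as the number of clusters with the largest drop-off in improvement (the largest second difference of the within-cluster sum of squares). All function names and the toy points are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(p[d] for p in points) / len(points) for d in range(len(points[0]))]


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with farthest-first initialisation (an assumption).

    Returns (labels, centroids, inertia); k must not exceed the number of
    distinct points.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(mean_point(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


def choose_k_by_elbow(points, k_max):
    """Pick the k in 2..k_max-1 where the gain from adding clusters falls off most."""
    inertias = [kmeans(points, k)[2] for k in range(1, k_max + 1)]
    best_k, best_bend = 1, float("-inf")
    for idx in range(1, len(inertias) - 1):
        bend = (inertias[idx - 1] - inertias[idx]) - (inertias[idx] - inertias[idx + 1])
        if bend > best_bend:
            best_k, best_bend = idx + 1, bend
    return best_k


# Toy 3-d points forming three tight, well-separated groups.
points = [
    [0.0, 0.0, 0.0], [0.2, 0.0, 0.0], [0.0, 0.2, 0.0],
    [10.0, 0.0, 0.0], [10.2, 0.0, 0.0], [10.0, 0.2, 0.0],
    [0.0, 10.0, 0.0], [0.2, 10.0, 0.0], [0.0, 10.2, 0.0],
]
chosen_k = choose_k_by_elbow(points, 6)
labels, centroids, inertia = kmeans(points, chosen_k)
```

On the toy points, the within-cluster sum of squares collapses once three clusters are allowed and barely improves afterwards, which is the saturation behaviour the elbow method looks for. A library implementation of K-Means could be substituted for the hand-written one without changing the idea.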
For example, a sentence that delves into some minor but important points in the overall narrative would suffer on relevance, and sentences that connect different sub-narratives would suffer on smoothness and coherence. In other words, good quality articles encompass a bouquet of sub-narratives held closely together, making individual sentences unable to optimize on all ReSCo features.

On the other hand, two key observations from our qualitative analysis guide the formulation of our identification/scoring method. First, we observe that fake news authors ensure that key disinformation sentences are not particularly disadvantaged on any of the ReSCo features; this is implicitly due to the urge to optimize on readability and quality. Given that optimizing for all of them is quite challenging, this leads to key disinformation sentences being positioned close to the centroid of the document $D$'s ReSCo space. Secondly, the positioning in the ReSCo space is strongly influenced by the style of the author and the topic in question, making it likely that the key disinformation sentences are all very close to each other in the ReSCo space. These lead to our two key heuristics, Centrality and Cohesion, for identification and scoring of key disinformation sentences. With cohesion being enforced by the clustering step, the key disinformation sentences are those belonging to the cluster whose centroid is closest to $D$'s ReSCo space centroid; this cluster is denoted as $C^*$:

$$C^* = \arg\min_{C \in \mathcal{C}} \; dist\Big( \frac{1}{|C|} \sum_{s \in C} R(s), \; \frac{1}{|D|} \sum_{s \in D} R(s) \Big) \quad (5)$$

where $\mathcal{C}$ is the set of all clusters output from the clustering step, $R(s)$ is the 3-d ReSCo space representation of sentence $s$, and $dist(\cdot,\cdot)$ denotes the Euclidean distance. The binary output $d_i$ for sentence $s_i$ in the identification task is computed as:

$$d_i = \begin{cases} 1 & \text{if } s_i \in C^* \\ 0 & \text{otherwise} \end{cases} \quad (6)$$

In other words, the identification task is accomplished by choosing all sentences in the cluster whose ReSCo-space centroid is closest to that of all sentences within $D$. The output for the scoring task computes a score for all sentences in $C^*$ based on their proximity to $C^*$'s centroid.
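The centrality-based choice of cluster and the two kinds of output can be sketched as follows. This is a hedged illustration: the cluster labels are assumed to come from an earlier clustering step, and the particular proximity score used here, 1 / (1 + distance to the chosen cluster's centroid), is a hypothetical choice made only to have something concrete; the text only requires that sentences outside the chosen cluster score 0.0 and that scores inside it reflect proximity to its centroid.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(p[d] for p in points) / len(points) for d in range(len(points[0]))]


def euclid(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def central_cluster(points, labels):
    """Label of the cluster whose centroid is nearest to the centroid of all points."""
    doc_centroid = centroid(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return min(clusters, key=lambda lab: euclid(centroid(clusters[lab]), doc_centroid))


def identify(points, labels):
    """Binary output: 1 for sentences in the central cluster, 0 otherwise."""
    star = central_cluster(points, labels)
    return [1 if lab == star else 0 for lab in labels]


def score(points, labels):
    """Numeric output: 0.0 outside the central cluster, higher when nearer its centroid.

    The form 1 / (1 + distance) is an assumption made for illustration.
    """
    star = central_cluster(points, labels)
    star_centroid = centroid([p for p, lab in zip(points, labels) if lab == star])
    return [1.0 / (1.0 + euclid(p, star_centroid)) if lab == star else 0.0
            for p, lab in zip(points, labels)]


# Toy ReSCo representations (relevance, smoothness, coherence) and cluster labels.
resco = [
    [0.1, 0.1, 0.1], [0.2, 0.1, 0.1],
    [0.5, 0.5, 0.5], [0.5, 0.6, 0.5],
    [0.9, 0.9, 0.9], [0.8, 0.9, 0.9],
]
cluster_labels = [0, 0, 1, 1, 2, 2]
binary = identify(resco, cluster_labels)
scores = score(resco, cluster_labels)
```

In the toy example the middle cluster sits closest to the centroid of all six points, so its two sentences are the ones flagged and scored.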
Sentences belonging to the other clusters are left at 0.0; with $C^*$ denoting the chosen central cluster and $R(s_i)$ the ReSCo representation of sentence $s_i$, the score $d_i$ takes the form:

$$d_i = \begin{cases} f\Big( dist\big( R(s_i), \frac{1}{|C^*|} \sum_{s \in C^*} R(s) \big) \Big) & \text{if } s_i \in C^* \\ 0.0 & \text{otherwise} \end{cases} \quad (7)$$

where $f$ is a decreasing function of the distance, so that sentences nearer to the centroid of $C^*$ receive higher scores.

Algorithm 1: ReSCo-CC
1-2. Represent each sentence of the document in the ReSCo space (relevance, smoothness and coherence features)
3. Cluster the sentence representations
4. Determine $C^*$ according to Eq. 5
5. Determine the $d_i$s according to Eq. 6 or Eq. 7 and output.

The scoring provides a way to discriminate among sentences in $C^*$; in particular, if $C^*$ contains many sentences and a top-$\tau$ needs to be chosen, the selection may be done using the scoring in Eq. 7. This completes the description of our method, codenamed ReSCo-CC, that uses ReSCo features and the Centrality and Cohesion heuristics to perform key disinformation identification. The overall method is outlined in Algorithm 1. The first two steps correspond to the design of the ReSCo space, with the clustering in Step 3 followed by the identification or scoring process in Steps 4-5. Once the ReSCo embedding is achieved, all further steps are linear. The computation of the relevance feature in achieving the ReSCo embedding, however, is quadratic in the number of sentences in the document, since it compares every pair of sentences. Given that typical documents contain only 15-30 sentences, the quadratic complexity does not pose much concern. ReSCo-CC was found to offer turnaround times of milliseconds even on quite long documents.

We describe the experimental setup followed by results and analysis.

5.1.1 Creation of gold standard dataset. Evaluation of the key disinformation sentence identification task would ideally require sentence-level manual labellings. This being a novel task, we have not come across such sentence-annotated datasets. Further, we found that health domain fake news often requires significant domain expertise to annotate, something that we did not have access to. Thus, we develop an innovative approach for generating labelled data for evaluating our approach. We identified a website, healthnewsreview.org, which aims, among other initiatives, to debunk fake health news (hoaxes, as they call them). They produce, for each fake health news piece, a textual refutation detailing why the information in the news piece is false.
We selected 80 articles which were debunked within healthnewsreview, to form our empirical testbed. It may be noticed that ours is an article-level task in which each article is treated separately, so the small size of the dataset is less of a concern than it would be for scenarios that involve corpus-level learning. We use simple statistical methods over the dataset comprising (hoax, refutation) pairs, denoted $(h, r)$, to arrive at a disinformation scoring for each sentence in the hoax article. For each sentence $s$ in the hoax $h$ (the hoax file is also the same that will be input to the method as the document $D$), the score is as follows, with $V(\cdot)$ denoting the sentence vector representation and $sim(\cdot,\cdot)$ the similarity between vectors:

$$refsim(s) = \frac{1}{|r|} \sum_{s' \in r} sim\big(V(s), V(s')\big)$$

This estimates $refsim(s)$ as the average similarity that sentences in the refutation have with $s$. The refutation narratives in healthnewsreview were found to focus on the key disinformation sentences in the hoax. Given that, we expect $refsim(s)$, estimated as average similarity to the refutation sentences, to approximate the disinformation-ness of each sentence; we verified this assumption through manual perusal of a significant fraction of the dataset. Thus, high $refsim(\cdot)$ would be associated with sentences that hold key disinformation. We call this $[\ldots, refsim(s), \ldots]$ vector the refsim vector, short for refutation similarity. This vector holding similarities to the refutation forms our gold standard labelling for experiments; as may be obvious, the refutations and thus the refsim vector are not available to the methods, and are used only in evaluation. We are making this dataset, comprising both the (hoax, refutation) pairs and the refsim vectors, available in the public domain, at http://member.acm.org/~deepaksp > Publications > Entry for ReSCo-CC paper.

We measure the effectiveness of the binary and scored versions of the estimates from ReSCo-CC against the refsim vector using two evaluation models. Firstly, for the identification setting, our task is to compare the per-sentence outputs $d_i$, which are in {0, 1}, with the numeric refsim vector.
As a natural method for comparing two vectors, we use the Pearson product moment correlation coefficient. The Pearson correlation coefficient has been popular in evaluating NLP methods, and has been extensively used in evaluating methods in shared tasks such as SemEval [3] and others. Second, for the scoring task, our intent is to understand whether the high values among the per-sentence scores $d_i$ indeed correlate with the highest scored sentences from the refsim vector. Towards bringing this into a standard information retrieval evaluation model, we discretize/truncate the gold-standard vector by retaining the top-$\tau$ sentences according to $refsim(\cdot)$ as key disinformation and others as not, forming a binary labelling. Now, the ranking offered by the $d_i$s is evaluated using NDCG [28] to measure whether they rank the key disinformation (i.e., top-$\tau$) sentences highly enough. NDCG is the de facto measure for evaluating ranking quality in information retrieval (an analysis of its theoretical underpinnings appears in [28]); the construction of NDCG accounts for not just whether the expected results appear in the top-$\tau$, but also quantifies how close to the top the correct results appear. We also vary the discretization/truncation parameter $\tau$ to study the effectiveness trends. For both these metrics, higher values are desirable and correlate with better effectiveness. Each of these is computed at the document level; we report the average of these measures over the 80 documents in our dataset.

Given that our task is novel, there exist no methods in the literature addressing the precise task. Thus, we adapt methods for related tasks to serve as baselines in our empirical evaluation. Our task of key disinformation sentence identification/scoring has a similar structure in construction to claim detection (typically supervised) and document summarization (typically unsupervised, like our task).
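The two evaluation measures described above can be sketched for a single document as follows. This is an illustrative sketch under assumptions: the gold-standard similarity vector is taken as given, NDCG is computed over the full ranking with binary relevance and a log2 discount, and ties in the scores are broken by sentence order. The function names and the toy numbers are hypothetical and are not results from the paper.

```python
import math


def pearson(x, y):
    """Pearson product moment correlation between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_x * var_y)


def top_tau_labels(gold, tau):
    """Binary labelling that marks the tau highest gold-standard values as relevant."""
    top = set(sorted(range(len(gold)), key=lambda i: gold[i], reverse=True)[:tau])
    return [1 if i in top else 0 for i in range(len(gold))]


def ndcg(scores, relevant):
    """NDCG of the ranking induced by scores against binary relevance labels."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    dcg = sum(relevant[i] / math.log2(rank + 2) for rank, i in enumerate(order))
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(sum(relevant)))
    return dcg / ideal if ideal > 0 else 0.0


# Toy gold-standard vector for a four-sentence document.
gold = [0.9, 0.1, 0.8, 0.2]
binary_output = [1, 0, 1, 0]          # identification-style output
good_scores = [0.5, 0.0, 0.9, 0.0]    # ranks the two top gold sentences first
poor_scores = [0.0, 0.9, 0.5, 0.0]    # ranks an irrelevant sentence first

correlation = pearson(binary_output, gold)
labels_at_2 = top_tau_labels(gold, 2)
ndcg_good = ndcg(good_scores, labels_at_2)
ndcg_poor = ndcg(poor_scores, labels_at_2)
ndcg_all = ndcg(poor_scores, top_tau_labels(gold, len(gold)))
```

The last value shows why large truncation values are uninformative: when every sentence is labelled as relevant, any ranking attains an NDCG of 1.0.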
Algorithms for both tasks are capable of operating in the binary output model (as in our identification task) and the scored model (as in our scoring task). Thus, we use state-of-the-art techniques from these tasks as baseline methods; these are (i) a fact-checking oriented supervised claim detection method [1] (we abbreviate it to CD) trained over the IBM Debater dataset, and (ii) the deep learning-based document summarization method from a very recent paper [6] that has been shown to produce good document summaries (abbreviated as Summ). In addition to these, we also evaluate against variants of our method formed by dropping each of the centrality and cohesion assumptions, to arrive at an ablation study. First, we omit centrality and focus on cohesion by choosing the most coherent cluster, the one that has the highest average pairwise similarity between its members; we call this ReSCo-Coh. Similarly, we relax the cohesion criterion and rank sentences based on their similarity to the document centroid along the ReSCo features; this is called ReSCo-Cen. In order to achieve robust results (given the randomness in the clustering initialization step), we report results averaged over 100 iterations.

Tables 2 and 3 list the results of the comparative evaluation of our method against the four baseline techniques based on the evaluation measures of correlation coefficient and NDCG respectively. It may be noted from Table 2 that ReSCo-CC achieves significantly higher performance than the baselines, with the second best one, CD (which is notably a supervised technique, being trained over the IBM Debater dataset), being left far behind. These trends continue into Table 3 as well, across varying values of the truncation parameter. Across both these settings, we carried out the two-tailed t-test, wherein it was observed that the ReSCo-CC gains over each of the baselines were statistically significant with a p-value < 0.01.
While all techniques show an increase in NDCG with increasing values of the truncation parameter $\tau$, it may be noted that NDCG is more meaningful for low values of $\tau$. For example, at the extreme, setting $\tau = n$ (the number of sentences in the document) is equivalent to considering all sentences in the document as key disinformation sentences, making it meaningless since all techniques would score NDCG = 1.0 by design. Given this construction of the evaluation, we find it very promising to note that ReSCo-CC scores 0.856 even at a relatively low value of $\tau = 3$. In addition to the improvements over CD and Summ, it is seen that ReSCo-CC outperforms both ReSCo-Coh and ReSCo-Cen by very large margins, indicating that the combination of centrality and cohesion is very pertinent to core disinformation detection within our method, and relaxing either one would lead to significant drops in accuracy.

Table 4 illustrates the ReSCo-CC outputs on a part of a sample document; we have chosen eight sentences from a part of the document with key disinformation for illustration. The first column indicates the binary identification output from ReSCo-CC for the sentence in the second column, with the last (third) column populated with the refsim score associated with the sentence. The article excerpt starts with largely true sentences introducing bipolar disorder (sentence #1) and standard treatments for it (#2), followed by introducing the main point, that of a dubious probiotics-based treatment (#3) backed up by a spurious study (#4). Then, as is typical of good hoaxes, it moves to some truthful details about probiotic bacteria (#5 and #6) before once again moving back to references to disinformation (#7 and #8). It may be noticed from the ReSCo-CC outputs that disinformation sentences are being identified correctly, and that the refsim scores (the external scores computed using the refutation document, which are unavailable to ReSCo-CC and used only in evaluation) are also well correlated with the disinformation in the document.
Towards analyzing ReSCo-CC on the ongoing COVID-19 pandemic, we tested it on a number of COVID-19 fake news articles and obtained promising results. For example, in a popular debunked post, ReSCo-CC correctly identified the key disinformation, that lemon would 'alkalize' the immune system, among the top disinformation sentences. Within our COVID-19 analysis, ReSCo-CC was found to identify the core disinformation in each case. We hope to do a rigorous COVID-19 analysis as data accumulates within healthnewsreview.org, our data source for hoax-refutation pairs.

We considered a novel task, that of identifying/scoring key disinformation sentences within long textual articles that are known to be disinformation-laden, motivated by applications to health fake news, and proposed an unsupervised method for the same. Our method makes use of three features, viz., relevance, smoothness and coherence, as well as two cross-sentence assumptions, cohesion and centrality, within a clustering-based construction over sentences in a document. Based on an empirical evaluation against a wide variety of baselines over a task-specific dataset that we curated (to be made public), we illustrate that our method is very effective in identifying core disinformation sentences within disinformation-laden health-related articles. In the light of recent and hitherto unseen interest in health disinformation owing to the COVID-19 situation, we are considering further technological avenues of deepening democratic practices in combating disinformation. In particular, we are developing an evidence-based medicine approach towards annotating key disinformation sentences with information obtained through queries over trusted medical databases such as TRIP, towards allowing users to verify and ascertain the disinformation themselves.
We are also considering the use of captions of images that appear embedded within text articles to further improve key disinformation identification; the images, and thus their captions, typically relate to key elements in the article, providing complementary input to improve disinformation assessment.

[1] Real-time Claim Detection from News Articles and Retrieval of Semantically-Similar Factchecks
[2] Overview of the triple scoring task at the WSDM Cup
[3] Semeval-2017 task 1: Semantic textual similarity-multilingual and crosslingual focused evaluation
[4] Computational fact checking from knowledge networks
[5] A multi-dimensional approach to disinformation: Report of the independent High level Group on fake news and online disinformation
[6] Unified Language Model Pretraining for Natural Language Understanding and Generation
[7] Emergent: a novel data-set for stance classification
[8] Unsupervised Fake News Detection: A Graph-based Approach
[9] Entity linking via joint encoding of types, descriptions, and context
[10] ClaimBuster: the first-ever end-to-end fact-checking system
[11] Toward automated fact-checking: Detecting check-worthy factual claims by ClaimBuster
[12] Understanding and detecting supporting arguments of diverse types
[13] Emotion cognizance improves health fake news identification
[14] Review on determining number of Cluster in K-Means Clustering
[15] Towards automated factchecking: Developing an annotation schema and benchmark for consistent automated claim detection
[16] Detect rumors in microblog posts using propagation structure via kernel learning
[17] Some methods for classification and analysis of multivariate observations
[18] Language-aware truth assessment of fact candidates
[19] Truth of varying shades: Analyzing language in fake news and political fact-checking
[20] A simple but tough-to-beat baseline for the Fake News Challenge stance detection task
[21] Exploring thematic coherence of fake news
[22] On the Coherence of Fake News Articles
[23] An extensible framework for verification of numerical claims
[24] Estimating the number of clusters in a data set via the gap statistic
[25] Fact checking: Task definition and dataset construction
[26] The spread of true and false news online
[27] "Liar, liar pants on fire": A new benchmark dataset for fake news detection
[28] A theoretical analysis of NDCG type ranking measures
[29] Tracing fake-news footprints: Characterizing social media messages by how they propagate
[30] Wikipedia2vec: An optimized implementation for learning embeddings from wikipedia

Deepak P was partly supported by projects funded by MHRD SPARC (P620) and UKIERI.