SOK: Fake News Outbreak 2021: Can We Stop the Viral Spread?
Tanveer Khan; Antonis Michalas; Adnan Akhunzada
2021-05-22

Social Networks' omnipresence and ease of use have revolutionized the generation and distribution of information in today's world. However, easy access to information does not equal an increased level of public knowledge. Unlike traditional media channels, social networks also facilitate faster and wider spread of disinformation and misinformation. The viral spread of false information has serious implications for the behaviors, attitudes and beliefs of the public, and can ultimately seriously endanger democratic processes. Limiting the negative impact of false information through early detection and control of its extensive spread presents the main challenge facing researchers today. In this survey paper, we extensively analyze a wide range of different solutions for the early detection of fake news in the existing literature. More precisely, we examine Machine Learning (ML) models for the identification and classification of fake news, online fake news detection competitions, statistical outputs, as well as the advantages and disadvantages of some of the available data sets. Finally, we evaluate the online web browsing tools available for detecting and mitigating fake news and present some open research challenges.

The popularity of Online Social Networks (OSNs) has rapidly increased in recent years. Social media has shaped the digital world to such an extent that it is now an indispensable part of life for most of us [54]. The rapid and extensive adoption of online services is influencing and changing how we access information, how we organize to demand political change and how we find partners. One of the main advantages and attractions of social media is the fact that it is fast and free. This technology has dramatically reshaped the news and media industries since becoming a dominant and growing source of news and information for hundreds of millions of people [80]. In the United States today, more people are using social media as a news source than ever before [157]. Social media has progressively changed the way we consume and create news, and the ease of producing and distributing news through OSNs has simultaneously sharply increased the spread of fake news.

Fake news is not a new phenomenon; it existed long before the arrival of social media. However, following the 2016 US presidential election it has become a buzzword [3]. There are numerous examples of fake news through history. A notable one from antiquity is the Mark Antony smear campaign circa 44 BC [128]. In more recent times, examples include the anti-German "corpse factory" campaign of 1917 [115] and the Reich Ministry of Public Enlightenment and Propaganda, established in 1933 by the Nazis to spread Nazi ideology and incite violence against Jews [19]. Although propaganda campaigns and the spread of fabricated news may have been around for centuries, their fast and effective dissemination only became possible by means of modern technology such as the internet. The internet revolutionized fake news, regardless of how the misinformation is manifested: whether we are talking about rumors, disinformation, or biased, sloppy, erroneous reporting.
In a recent study [180], it was found that almost 50 percent of traffic taken from Facebook is fake and hyperpartisan, while at the same time news publishers relied on Facebook for 20 percent of their traffic. In another study, it was found that 8 percent of 25 million Uniform Resource Locators (URLs) posted on social media were indicative of malware, phishing and scams [164]. Researchers in Germany conducted a study regarding fake news distribution in the country and people's attitudes and reactions towards it [26]. Based on the published results, 59 percent of participants stated that they had encountered fake news; in some regions, this number increased to almost 80 percent [163]. Furthermore, more than 80 percent of participants agreed that fake news poses a threat, and 78 percent strongly believed it directly harms democracy. Government institutions and powerful individuals use it as a weapon against their opponents [22]. In the 2016 US presidential election, a significant shift in how social media was used to reinforce and popularize narrow opinions was observed. In November of the same year, 159 million visits to fake news websites were recorded [5], while the most widely shared stories were considered to be fake [154]. Similarly, it is believed that the distribution of fake news influenced the UK European Union membership referendum [64].

However, fake news is not only about politics. During the recent fires in Australia, several maps and pictures of Australia's unprecedented bushfires spread widely on social media. While users posted them to raise awareness, the result was exactly the opposite, since some of the viral maps were misleading, spreading disinformation that could even cost human lives [131]. The recent COVID-19 pandemic accelerated the rise of conspiracy theories on social media, with some alleging that the novel coronavirus is a bio-weapon funded by Bill Gates to boost vaccine sales [49]. Undoubtedly, fake news threatens multiple spheres of life and can bring devastation not only to economic and political processes but also to people's wellbeing and lives.

The main motivation behind our study was to provide a comprehensive overview of the methods already used in fake news detection, as well as to bridge the knowledge gap in the field, thereby helping boost interdisciplinary research collaboration. This work's main aim is to provide a general introduction to the current state of research in the field. We performed an extensive search of a wide range of existing solutions designed primarily to detect fake news. The studies surveyed deal with the identification of fake news based on ML models, network propagation models, fact-checking methods, etc. More precisely, we start by examining how researchers formulate ML models for the identification and classification of fake news and which tools are used for detecting fake news, and we conclude by identifying open research challenges in this domain.

Comparison to Related Surveys. In a related work, Vitaly Klyuev [85] provided an overview of different semantic methods, concentrating on Natural Language Processing (NLP) and text mining techniques. In addition, the author also discussed automatic fact-checking as well as the detection of social bots. In another study, Oshikawa et al. [121] focused on the automatic detection of fake news using NLP techniques only. Two studies can be singled out as being the closest to our work. First, a study by Collins et al.
[32] examined fake news detection models by studying the various variants of fake news and provided a review of recent trends in combating malicious content on social media. Second, a study by Shu et al. [150] mostly focused on various forms of disinformation, the factors influencing it, and mitigation approaches. Although some similarities are inevitable, our work differs from the aforementioned ones. We provide a more detailed description of some of the approaches used and highlight the advantages and limitations of some of the methods. Additionally, our work is not limited to NLP techniques, but also examines the types of detection models available, such as knowledge-based approaches, fact-checking (manual and automatic) and hybrid approaches. Furthermore, our approach considers how NLP techniques are used for the detection of other variants of fake news such as rumors, clickbait, misinformation and disinformation. Finally, it also examines the governmental approaches taken to combat fake news and its variants.

The rest of this paper is organized as follows: Section 2 discusses the most important methods for detecting fake news. In Section 3, we detail both the automatic and manual assessment of news and analyze different ways of measuring the relevance, credibility and quality of sources. To automate the process of fake news detection, the analysis of comprehensive data sets is of paramount importance. To this end, in Section 4, we first discuss the characteristics of online tools used for identifying fake news and then compare and discuss different data sets used to train ML algorithms to effectively identify fake news. The classification of the existing literature, identified challenges, future directions and existing governmental strategies to tackle the problem of fake news detection are discussed in Section 5. Finally, concluding remarks are given in Section 6.

People are heavily dependent on social media for getting information and spend a substantial amount of time interacting on it. In 2018, the Pew Research Center revealed that 68 percent of Americans [108] used social media to obtain information. On average, 45 percent of the world's population spends 2 hours and 23 minutes per day on social media, and this figure is constantly increasing [9]. The biggest problem with information available on social media is its low quality. Unlike traditional media, at the moment there is no regulatory authority checking the quality of information shared on social media. The negative potential of such unchecked information became evident during the 2016 US presidential election. In short, it is of paramount importance to start considering fake news as a critical issue that needs to be solved.

In spite of the overwhelming evidence supporting the need to detect fake news, there is, as yet, no universally accepted definition of fake news. According to [90], "fake news is fabricated information that mimics news media content in form but not in organizational process or intent". In a similar way, fake news is defined as "a news article that is intentionally and verifiably false" [152]. Some articles also associate fake news with terms such as deceptive news [5], satire news [139], clickbait [25], rumors [196], misinformation [88], and disinformation [87]. Hence, these terms are used interchangeably in this survey.
The following forms of misuse of information have been considered variants of fake news in the existing literature [139, 160]:
• Clickbait: Snappy headlines that easily capture user attention without fulfilling user expectations, since they are often tenuously related to the actual story. Their main aim is to increase revenue by increasing the number of visitors to a website.
• Propaganda: Deliberately biased information designed to mislead the audience. Recently, an increased interest in propaganda has been observed due to its relevance to political events [139].
• Satire or Parody: Fake information published by several websites for the entertainment of users, such as "The Daily Mash" website. This type of fake news typically uses exaggeration or humor to present audiences with news updates.
• Sloppy Journalism: Unreliable and unverified information shared by journalists that can mislead readers.
• Misleading Headings: Stories that are not completely false, but feature sensationalist or misleading headlines.
• Slanted or Biased News: Information that describes one side of a story by suppressing evidence that supports the other side of the argument.

For years, researchers have been working to develop algorithms to analyze the content and evaluate the context of information published by users. Our review of the existing literature is organised in the following way: subsection 2.1 examines approaches to identifying different types of user accounts, such as bots, spammers and cyborgs. It is followed by subsection 2.2, where different methods used for identifying rumors and clickbait are discussed. In subsection 2.3, users' content and context features are considered, while in subsection 2.4, different approaches for the early detection of fake news by considering its propagation are discussed.

According to a report published in 2021, Twitter alone has 340 million users and 11.7 million registered apps, and delivers 500 million tweets a day and 200 billion tweets a year [10]. Its popularity has made it an ideal target for bots, or automated programs [81]. Recently, it was reported that around 5-10 percent of Twitter accounts are bots, responsible for the generation of 20-25 percent of all tweets [119]. Some of these bots are legitimate, comply with Twitter objectives, and can generate a substantial volume of benign tweets, like blogs and news updates. Other bots, however, can be used for malicious purposes, such as malware that gathers passwords or spam that adds random users as friends and expects to be followed back [77]. Such bots have a more detrimental effect, particularly when spreading fake news. The significance of differentiating legitimate bots from malicious ones stems from the fact that malicious bots can be used to mimic human behaviour in harmful ways. Researchers examined bots in a number of existing publications [35, 43, 60, 92, 156, 185]. Gilani et al. [63] focused on classifying Twitter accounts into "humans" and "bots" and analyzing the impact each has on Twitter. The proposed technique was based on previous work, "Stweeler" [62], for the collection, processing, analysis, and annotation of data. For the identification of bots, human annotation was used, where participants differentiated bots from humans and generated a reliable data set for classification. The process provided an in-depth characterization of bots and humans by observing differences and similarities.
The findings stated that the removal of bots from Twitter would cause serious repercussions for content production and information dissemination, and also indicated that bots rely on re-tweeting, redirecting users via URLs, and uploading media. However, the imprecision in the existing algorithm revealed by the authors and the manual collection of data limited the ability to analyse accounts. Similarly, Giachanou et al. [58] investigated whether the author of a Twitter account is a human or a bot and further determined the gender of human accounts. For this purpose, a linear Support Vector Machine (SVM) classifier was trained to analyse words, character grams, and stylistic features. For the identification of human gender, a stochastic gradient descent classifier was used to assess the sentiment of tweets, words and character grams, and pointwise mutual information features - the importance of terms per gender. The data set used consisted of tweets in English and Spanish. The experiments illustrated the accuracy of bot detection, i.e., 0.906 for bots in English and 0.856 for Spanish. Similarly, for the identification of gender, the accuracy amounted to 0.773 for English tweets and 0.648 for Spanish tweets. Overall, the bot detection model outperformed the gender detection model.

Another account type that can be created on Twitter is a cyborg. Cyborg refers to a human-assisted bot or a bot-assisted human [27]. Cyborgs have characteristics of both human-generated and bot-generated accounts and as such require a level of human engagement. These accounts facilitate posting various information more frequently, rapidly and long-term [40]. Differentiating a cyborg from a human can be a challenging task. The automated Turing test [169], used to detect undesirable programs or bots, is not capable of differentiating cyborgs from humans. However, Jeff Yan [187] proposed that a cyborg might be identified by comparing the characteristics of its machine and human elements. Similarly, Chu et al. [28] differentiated between bot, cyborg and human accounts by taking into account tweet content, tweeting behaviour and features of the account.

OSNs also serve as platforms for the rapid creation and spread of spam. Spammers act similarly to bots and are responsible for posting malicious links, prohibited content and phishing sites [110, 124]. Traditional methods of detecting spammers that utilize network structure are classified into three categories:
• Link-based, where the number of links is used as a measure of trust. These links are considered to be built by legitimate users [92].
• Neighbor-based, which treats links as a measure of homophily, the tendency of linked users to share similar beliefs and values [76, 98, 132].
• Group-based, which recognizes that spammers often work in groups to coordinate attacks [78]. Group-based methods detect spammers by taking advantage of the group structure hidden in the social network.

Additionally, spammers behave differently from legitimate users, so they can be treated as outliers [2, 52]. Current efforts for the detection of social spammers utilize their structure and behavioural patterns in an attempt to discover how their behaviour can be differentiated from that of legitimate users [27, 97, 99, 186, 189, 191]. However, spammers often find ways to create links with legitimate users, making it more difficult to detect specific spamming patterns. Wu et al. [182] tackled this problem by taking into account both content and network structure.
They proposed "Sparse Group Modelling for Adaptive Spammer Detection (SGASD)", which can detect both types of spammers - those within a group and individuals. Another challenging task is the detection of camouflaged content polluters on OSNs. Content polluters - spammers, scanners and fraudsters - first establish links with a legitimate user and then merge malicious with real content. Due to the insufficient label information available for camouflaged posts in online media, the use of these manipulated links and contents as camouflage makes detecting polluters very difficult. In order to tackle this challenge, Wu et al. [183] studied how camouflaged content polluters can be detected and proposed a method called "Camouflaged Content Polluters using Discriminate Analysis (CCPDA)", which detects content polluters using the patterns of camouflaged pollution. In their spam detection analysis, Grier et al. [65] juxtaposed two different types of Twitter accounts: a "professional spamming account", whose sole purpose is to distribute spam, versus "accounts compromised by spammers". The authors found that many accounts currently sending spam were once legitimate accounts that had been compromised and taken over by spammers. Furthermore, to detect spam activity on Twitter, a directed social graph model [173] based on friend and follower relationships was proposed. Different classifier techniques were used to distinguish between spammer and normal behaviour, and it was determined that the Naive Bayes classifier performs best with respect to F-measure.

Huge momentum has been observed in exploiting user-generated microblog content for predicting real-world phenomena, such as prices and traded stock volume on financial markets [34]. Research efforts in this domain targeted sentiment metrics as a predictor for stock prices [16, 24, 51], company tweets and the topology of the stock network [107, 141], and used weblogs pointing to the relationship between companies [84]. Cresci et al. [36] demonstrated the use of Twitter stock micro-blogs as a platform for bots and spammers to practice cash-tag piggybacking - an activity for promoting low-value stocks by exploiting the popularity of real high-value stocks. They employed a spambot detection algorithm to detect accounts that issue suspicious financial tweets. Nine million tweets from five main US financial markets were investigated with respect to their social and financial significance, revealing stocks with unusually high social significance compared to their low financial relevance. These tweets were compared with financial data from Google Finance. The results indicated that 71 percent of the users were classified as bots and that the high volume of discussion of low-value financial stocks was due to a massive number of synchronized tweets.

Twitter currently has no defined policy for addressing automated malicious programs operating on its platform. However, it is expected that these malicious accounts will be deleted in the near future [146]. A survey of the literature has identified numerous studies [1, 8, 38, 45, 63, 91, 113, 172, 177, 178] that describe the important characteristics which can be used for the identification of bots on Twitter. Despite these attempts, limitations still exist in employing these characteristics for detecting fake news, especially its early detection during propagation. Other methods, such as network propagation, have to be utilized for this purpose.
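Before turning to those methods, a concrete illustration of the account-level classifiers discussed in this subsection may help. The following is a minimal sketch of a character n-gram SVM in the spirit of Giachanou et al. [58], assuming scikit-learn; the two-tweet corpus and all names are illustrative, and the cited work additionally used word grams and stylistic features.

# Minimal sketch of a character n-gram SVM for bot-vs-human classification,
# in the spirit of the account-level classifiers above (illustrative data and names).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical labelled tweets: 1 = bot, 0 = human.
tweets = ["Buy followers now! http://spam.example", "Had a great day at the park :)"]
labels = [1, 0]

# Character 2-4 grams approximate the lexical and stylistic cues used in such studies.
pipeline = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LinearSVC(),
)
pipeline.fit(tweets, labels)
print(pipeline.predict(["Click here for free followers!"]))

In practice, such a pipeline would of course be trained on thousands of annotated accounts rather than two tweets, and its feature union extended with profile metadata.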
Social media is like a blank sheet of paper on which anything can be written [190], and people easily become dependent on it as a channel for sharing information. This is exactly why social media platforms (e.g., Twitter and Facebook) are highly scrutinized for the information shared on them [71]. These platforms have undertaken some efforts to combat the spread of fake news but have largely failed to minimize its effect. For instance, in the United States, 60 percent of adults who depend on social media for news consumption are sharing false information [181]. In April 2013, two explosions during the Boston Marathon gained tremendous notoriety in the news and on social media, and the tragedy was commented on in millions of tweets. However, many of those tweets were rumors (controversial factual claims) and contained fake information, including conspiracy theories. Similarly, a survey published by Kroll - a business intelligence and investigations firm - states that 84 percent of companies feel threatened by the rise of rumors and fake news fuelled by social media [15]. On Weibo, rumors were detected in more than one-third of trending topics [193]. The spread of rumors on social media has thus become an important issue for companies worldwide. Still, there is no clear policy defined by social media administrators to verify shared content. Below, we discuss different techniques that have been proposed by researchers to address this problem.

Traditionally, human observers have been used to identify trending rumors. Current research is focused on building an automated rumor identification tool. For this purpose, a rumor detection technique [193] was designed. In this technique, two types of clusters were generated: posts containing words of inquiry such as "Really?", "What?" and "Is it true?" were grouped into one cluster, and these inquiries were then used to detect rumor clusters, while posts without words of inquiry were grouped into another cluster. Similar statements were extracted from both clusters. The clusters were then ranked based on their likelihood of containing these words, and the entire cluster was later scanned for disputed claims. These experiments, performed with Twitter data, resulted in early and effective detection of rumors (almost 50 rumor clusters were identified). However, there is still considerable room to improve these results [193]. For instance, the manual collection of inquiry words could be improved by training a classifier, and the ranking process could be improved by exploring more features for the rumor cluster algorithm.

People can share fake information on social media for various reasons. One of those is to increase readership, which is easily achievable by using clickbait. Clickbait is a false advertisement with an attached hyperlink. It is specifically designed to get users to view and read the contents behind the link [7]. These advertisements attract users with catchy headlines but contain little in the way of meaningful content. A large number of users are lured by clickbait. Monther et al. [4] provided a solution to protect users from clickbait in the form of a tool that filters and detects sites containing fake news. In categorizing a web page as a source of fake news, they considered several factors. The tool navigates the content of a web page, analyzes the syntactical structure of the links and searches for words that might have a misleading effect. The user is then notified before accessing the web page.
In addition, the tool searches the links for words associated with the title and compares the result with a certain threshold. It also monitors punctuation such as question and exclamation marks used on the web page, marking the page as potential clickbait. Furthermore, they examined the bounce rate - the percentage of visitors who leave a website - associated with the web page. Where the bounce rate was high, the content was marked as a potential source of misleading information.

A competition was organised with the aim of building a classifier rating the extent to which a media post can be described as clickbait. In the clickbait competition, the data set was generated from Twitter and consisted of 38,517 Twitter posts from 27 US news publishers [129]. Of the 38,517 tweets, 19,538 were available in the training set and 18,979 were available for testing. For each tweet, a clickbait score was assigned by five annotators from Amazon Mechanical Turk. The clickbait scores assigned by the human evaluators were: 1.0 heavily clickbaity, 0.66 considerably clickbaity, 0.33 slightly clickbaity and 0.0 not clickbaity. The goal was to propose a regression model that could determine the probable clickbaitiness of a post. The evaluation metric used for the competition was Mean Squared Error (MSE). In this competition, Omidvar et al. [120] proposed a model using a deep learning method and won the challenge. They achieved the lowest MSE for clickbait detection by using a bi-directional Gated Recurrent Unit (biGRU). Instead of solving the clickbait challenge using a regression model, Yiwei Zhou [195] reformulated the problem as multi-class classification. On the hidden states of a biGRU, a token-level self-attentive mechanism was applied to perform the classification. This self-attentive Neural Network (NN) was trained without performing any manual feature engineering. They used 5 self-attentive NNs with an 80-20 percent split and obtained the second lowest MSE value. Similarly, Alexey Grigorev [66] proposed an ensemble of linear SVM models to solve the clickbait problem and achieved the third lowest MSE value. In addition to the given data set, they gathered more data from multiple Facebook groups that mostly contained clickbait posts, using the approach described in "identified clickbaits using ML" [162].

The rapid dissemination of fake news is so pernicious that researchers have resorted to automating the detection process using ML techniques such as Deep Neural Networks (DNNs). However, the black-box problem - a lack of transparency in the decision-making of the NN - obscures reliability. Nicole et al. [118] addressed the deep learning "black-box problem" for fake news detection. A data set composed of 24,000 articles was created, consisting of 12,000 fake and 12,000 genuine articles. The fake articles were collected from Kaggle, while the genuine ones were sourced from The Guardian and The New York Times. The study concluded that DNNs can be used to detect the language patterns in fabricated news. Additionally, the algorithm can also be used for detecting fake news on novel topics. Another technique to tackle the deep learning "black-box problem" in fake news detection is CSI (Capture, Score and Integrate) - a three-step system which incorporates the three basic characteristics of fabricated news [140]: the text, the source, and the response provided by users to articulate missing information.
In the first step, a Recurrent Neural Network (RNN) is used to capture the temporal pattern of user activity. The second step scores the suspiciousness of a source based on user behaviour. The third, hybrid step integrates steps one and two and is used to predict fake articles. The experiments were performed on real-world data sets and demonstrated a high level of accuracy in predicting fake articles. Still, the lack of manually labelled fake news data sets poses a bottleneck for using such computationally intensive models. William Yang Wang [174] addressed the limited availability of labelled data sets for combating fake news using statistical approaches and chose a contemporary publicly available data set called LIAR. This data set was utilized to investigate fabricated news using linguistic patterns. The results were based on an evaluation of several approaches, including Logistic Regression (LR), a Convolutional Neural Network (CNN) model, Long Short-Term Memory (LSTM) networks and SVM. The study concluded that combining meta-data with text significantly improves the detection of fake news. According to the author, this body of information can also be used to detect rumors, classify stance and carry out topic modeling, argument mining, and political Natural Language Processing (NLP) research. Table 1 presents a summary of the different approaches proposed for both the account as well as the content and context analysis of fake news.

In 2017, a competition named the Fake News Challenge (FNC) was held with the aim of using Artificial Intelligence (AI) techniques to combat the problem of fake news. During the initial phase, stance detection was used; it refers to estimating the relative stance of two pieces of relevant text towards an issue, claim or topic (what other organizations say about the topic). A two-level scoring system was applied: 25 percent of the weight was assigned for correctly deciding whether a body text was related or unrelated to its headline, and 75 percent was assigned for correctly labelling the related pairs as agrees, disagrees or discusses. In this competition, the top team submitted an ensemble model of a Deep Convolutional Neural Network (DCNN) and Gradient-Boosted Decision Trees (GBDT) with a 50/50 weighted average [147]. The DCNN and GBDT separately did not achieve perfect accuracy; however, the combination of both approaches detected the stance of headlines with a score of 82.01. Similarly, the approach proposed by team Athene [70] achieved a score of 81.97 and won second place in the competition. They used an ensemble approach involving multi-layer perceptrons (MLPs) with Bag-of-Words (BoW) features. The team in third place, Riedel et al. [134], proposed a stance detection system for FNC Stage 1. For the input text, they used two BoW representations. An MLP classifier was used with one hidden layer of 100 units. For the hidden layer, a Rectified Linear Unit (ReLU) activation function was used, while the final linear layer utilized a softmax. They achieved an accuracy of 81.72.

At a different competition, named Web Search and Data Mining (WSDM) 2019, fake news was detected by classifying the titles of articles. Given the title of a fake news article 'A' and the title of another incoming news article 'B', participants were asked to classify the incoming article into one of three categories: agrees, disagrees or unrelated [136].
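Before moving on to the WSDM systems, the FNC two-level weighting described above can be made concrete with a small scoring sketch. This reflects our reading of the 25/75 scheme and is not the competition's official evaluation script.

# Sketch of the FNC two-level scoring described above: 0.25 credit for a correct
# related/unrelated decision, plus 0.75 for the exact label on related pairs.
RELATED = {"agrees", "disagrees", "discusses"}

def fnc_score(gold_labels, predicted_labels):
    score = 0.0
    for gold, pred in zip(gold_labels, predicted_labels):
        both_related = gold in RELATED and pred in RELATED
        if both_related or (gold == pred == "unrelated"):
            score += 0.25                  # level 1: relatedness
        if both_related and gold == pred:
            score += 0.75                  # level 2: exact stance
    return score

print(fnc_score(["agrees", "unrelated"], ["discusses", "unrelated"]))  # 0.5

Under this weighting, a system that only solves the easy related/unrelated split earns at most a quarter of the available credit, which is why the top teams focused on the harder agrees/disagrees/discusses distinction.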
The winner of the WSDM competition, Lam Pham [127], achieved 88.298 percent weighted accuracy on the private leaderboard and 88.098 percent weighted accuracy on the public leaderboard. This ensemble approach incorporated NNs and gradient boosting trees. In addition, Bidirectional Encoder Representations from Transformers (BERT) was used for encoding news title pairs, transforming and incorporating them into a new representational space. The approach by Liu et al. [101] won second place by proposing a novel ensemble framework based on the Natural Language Inference (NLI) task. Their proposed framework for fake news classification consisted of a three-level architecture: 25 BERT models along with a blending ensemble strategy in the first level, followed by 6 ML models and, finally, a single LR model for the final classification. Yang et al. [188] also treated this problem as an NLI task and considered both NLI models and BERT. They trained the strongest NLI models - Dense RNN, Dense CNN, ESIM, Gated CNN [37] and decomposable attention - and achieved an accuracy of 88.063 percent.

One of the OSNs' main strong points is facilitating the propagation of information between users. Information of interest to users is further shared with relatives, friends, etc. [86]. In order to detect the propagation of fake news at an early stage, it is crucial to be able to understand and measure the information propagation process. The influence of propagation on OSNs and its impact on network structure was studied in [75, 145]. A study by Ye et al. [192] revealed that more than 45.1 percent of the information shared by a user on social media is further propagated by his/her followers. Furthermore, approximately 37.1 percent of the information shared is propagated up to 4 hops from the original publisher. Liu and Wu [102] used network features and introduced a propagation path-based model for the early detection of fake news. They addressed the low accuracy of early fake news detection by classifying news propagation paths as multivariate time series. The characteristics of each user involved in spreading a news item were represented by a numerical vector. Then, a time series classifier was built by combining CNN and RNN components. This classifier was used for fake news detection by capturing the local and global variations of the observed user characteristics along the propagation path. The model is considered more robust, as it relies on common user characteristics which are more reliable and accessible in the early stage of news propagation. The experiments were performed on two real-world data sets based on Weibo [104] and Twitter [105]. The proposed model detected fake news within 5 minutes of its spread, with 92 percent accuracy on the Weibo and 85 percent accuracy on the Twitter data set.

Sebastian et al. [167] examined ways to minimize the spread of fake news at an early stage by stopping its propagation in the network. They leveraged user flagging, a feature introduced by Facebook that allows users to flag fake news. In order to utilize this feature efficiently, the authors developed a technique called 'DETECTIVE', which uses Bayesian inference to learn users' flagging accuracy. Extensive experiments were performed using a publicly available data set [96] from Facebook. The results indicated that, even with minimal user engagement, DETECTIVE can leverage crowd signals to detect fake news. It delivered better results in comparison to existing algorithms, i.e., NO-Learn and RANDOM.
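The core idea of learning how reliable each user's flags are can be illustrated with a toy Beta-Bernoulli update. This is a minimal sketch of Bayesian accuracy estimation in the spirit of DETECTIVE, not the algorithm of [167] itself, and all observations are made up.

# Toy Beta-Bernoulli update for a user's flagging accuracy, illustrating the idea
# of learning flag reliability via Bayesian inference (not the DETECTIVE algorithm).
class FlaggerModel:
    def __init__(self, alpha=1.0, beta=1.0):    # uniform prior over accuracy
        self.alpha, self.beta = alpha, beta

    def observe(self, flag_was_correct: bool):
        if flag_was_correct:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def accuracy(self):                          # posterior mean of the Beta
        return self.alpha / (self.alpha + self.beta)

user = FlaggerModel()
for outcome in [True, True, False, True]:        # flags verified against fact-checks
    user.observe(outcome)
print(f"estimated flagging accuracy: {user.accuracy:.2f}")

A detector can then weight each incoming flag by the flagger's estimated accuracy, so that a few reliable users outweigh many unreliable ones, which is the intuition behind leveraging crowd signals with minimal engagement.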
The dissemination of misinformation on OSNs has a particularly undesirable effect when it comes to public emergencies. The Dynamic Linear Threshold (DLT) model [100] was developed in an attempt to limit this type of information. It analyzes a user's probability, based on an analysis of competing beliefs, of propagating either credible or non-credible news. Moreover, an optimization problem based on DLT was formulated to identify a set of users that could limit the spread of misinformation by initiating the propagation of credible information. A study by Garcia et al. [53] focused on examining reputation [11, 41, 42, 109], popularity and social influence on Twitter, using digital traces from 2009 to 2016. They evaluated the global features and specific parameters that make users more popular, keep them more active and determine their social influence. Global measures of reputation were calculated by taking into account the network information of more than 40 million users. These new global features of reputation are based on the D-core decomposition method [59] and Twitter's bow-tie structure [17] in 2009. The results indicated that social influence is more related to popularity than to reputation, and that global network metrics such as social reputation are more accurate predictors of social influence than local metrics such as follower counts.

Soroush et al. [170] collected and studied Twitter data from 2006 to 2017 in order to classify news as true or false, based on information collected from six independent fact-checking organizations. They generated a data set that consisted of approximately 126,000 stories, tweeted by 3 million Twitter users approximately 4.5 million times. They found that fake news was more novel and inspired surprise, fear and disgust in replies, while true news inspired trust, sadness, anticipation and joy. As people prefer to share novel information, false news spreads more rapidly, deeply and broadly than true news. According to Panos et al. [33], the rapid dissemination of information on social media is due to information cascades. Liang Wu and Huan Liu [184] also classified Twitter messages using diffusion network information. Instead of using content features, they focused on the propagation of Twitter messages. They proposed TraceMiner, a novel approach that uses diffusion network information to classify social media messages. TraceMiner accepts message traces as input and outputs their category. Table 2 presents a detailed summary of the studies used in network as well as content and context analysis.

After reviewing the studies discussed above, it became evident that there is no 'one size fits all' solution when it comes to fake news detection. Extensive research is still required to fully understand the dynamic nature of this problem. The rapid spread of fraudulent information is a big problem for readers, who may fail to determine whether a piece of information is real or fake. Since fake news is a big threat to society and responsible for spreading confusion, it is necessary to have an efficient and accurate solution to verify information in order to secure the global content platform. To address the problem of fake news, the American media education agency Poynter established the International Fact-Checking Network (IFCN) in 2015, which is responsible for observing trends in fact-checking as well as providing training to fact-checkers.
A great deal of effort has already been devoted to providing a platform where fact-checking organizations around the world can use a uniform code of principles to prevent the spread of fake news. Two fact-checking organizations, Snopes and PolitiFact, provide tools useful in classifying the veracity of news in graded levels. However, these tools require a lot of manual work, and there is a profound need for a model that can automatically detect fake news. Giovanni et al. [29] reduced the complex manual fact-checking task to a simple network analysis problem, as such problems are easy to solve computationally. The proposed approach was evaluated by analyzing tens of thousands of statements related to culture, history, biographical and geographical information, using a public knowledge graph extracted from Wikipedia. They found that true statements consistently receive higher support than false ones, and concluded that applying network analytics to large-scale knowledge repositories provides new strategies for automatic fact-checking. Below, we examine two facets of the fact-checking problem: in subsection 3.1 we look into computational approaches to automatic fact-checking, whereas in subsection 3.2 we concentrate on the issue of trust and the credibility of information and the source providing it.

Computational approaches to fact-checking are considered key to tackling the massive spread of misinformation. These approaches are scalable and effective in evaluating the accuracy of dubious claims. In addition, they improve the productivity of human fact-checkers. One of the proposed approaches is an unsupervised network flow-based approach [149], which helps to ascertain the credibility of a statement of fact. The statement of fact is given as a triple consisting of the subject entity, the object entity, and the relation between them. First, the background information of any real-world entity is viewed as a flow on a knowledge graph. Then, a knowledge stream is built by computational fact-checking, which shows the connection between the subject and object of a triple. The authors evaluated the network flow model on actual and customized fact data sets and found it to be quite effective in separating true and false statements.

A study by Baly et al. [13] examined the factuality and bias of claims across various news media. They collected features from articles of the target news websites, their URL structures, the web traffic they attract, their Twitter accounts (where applicable), as well as their Wikipedia pages. These features were then used to train SVM classifiers for bias and factuality separately. The evaluation showed that the article features achieved the best performance on both factuality and bias, Wikipedia features were somewhat useful for bias but not for factuality, and Twitter and URL features fared better on factuality than on bias.

A different approach to automatic fake news detection [125] was based on several exploratory analyses identifying the linguistic differences between legitimate and fake news. It involved the introduction of two novel data sets, the first collected using both manual and crowdsourced annotation, and the second generated directly from the web. Based on these, several exploratory analyses were first performed to identify the linguistic properties most common in fake news.
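As a toy illustration of the kind of surface signals such exploratory analyses inspect, the sketch below counts a few stylistic markers that are often contrasted between legitimate and fake articles. The specific features and the example text are our own illustrative choices, not those of [125].

import re

# Toy stylistic feature extractor; the feature set is illustrative, not from [125].
def linguistic_features(article: str) -> dict:
    words = re.findall(r"[A-Za-z']+", article)
    sentences = [s for s in re.split(r"[.!?]+", article) if s.strip()]
    return {
        "exclamations": article.count("!"),
        "questions": article.count("?"),
        "all_caps_words": sum(w.isupper() and len(w) > 1 for w in words),
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "first_person": sum(w.lower() in {"i", "we", "my", "our"} for w in words),
    }

print(linguistic_features("SHOCKING! You won't believe what we found. Is it true?"))

Feature vectors of this sort can then be fed to any standard classifier, which is essentially the model-building step that follows.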
Secondly, a fake news detector model based on these extracted linguistic features was built. The authors concluded that the proposed system performed better than humans in certain scenarios, particularly for more serious and diverse news sources. However, human beings outperformed the proposed model in the celebrity domain.

OSNs are also used as a vector for the diffusion of hoaxes. Hoaxes spread uncontrollably, as the propagation of such news depends on very active users. At the same time, news organizations devote a great deal of time and effort to high-quality fact-checking of online information. Eugenio et al. [158] used two classification algorithms, LR and Boolean Crowd Sourcing (BCS), to classify Facebook posts as hoaxes or non-hoaxes based on the users who "liked" them. On a data set of 15,500 posts and 909,236 users, they obtained a classification accuracy of more than 99 percent. The proposed technique even worked for users who "liked" both hoax and non-hoax posts. Similarly, Kumar et al. [89] studied the presence of hoaxes in Wikipedia articles based on a data set consisting of 20K hoaxes explicitly and manually labeled by Wikipedia editors. According to their findings, most hoaxes have very little impact and can be easily detected. A multi-modal hoax detection system that merges diverse modalities - the source, text and image of a tweet - was proposed by Maigrot et al. [106]. Their findings suggested that using only the source or text modality ensures higher performance than using all the modalities. Marcella et al. [159] focused on the diffusion of hoaxes on OSNs by treating hoaxes as viruses: a normal user, once infected, behaves as a hoax-spreader. The proposed stochastic epidemic model can be interpreted as a Susceptible-Infected-Susceptible (SIS) or Susceptible-Infected-Recovered (SIR) model - an infected user can either be a believer (someone who believes the fake news) or a fact-checker (someone who checks the news before believing it). The model was implemented and tested on homogeneous, heterogeneous and real networks. Based on a wide range of values and topologies, the fact-checking activity was analysed and a threshold was defined for the fact-checking (verifying) probability. This threshold was used to achieve the complete removal of fake news, based on the number of fact-checkers considering the news fake or real.

A study by Shao et al. focused on the temporal relation between the spread of misinformation and fact-checking, and the different ways in which both are shared by users. They proposed Hoaxy [148] - a platform for the collection, detection and analysis of this type of misinformation. They generated a data set by collecting data from both fake news sources (71 sites, 1,287,768 tweets, 171,035 users and 96,400 URLs) and fact-checking sources (6 sites, 154,526 tweets, 78,624 users and 11,183 URLs). According to their results, the sharing of fact-checking content lags behind misinformation by 10-20 hours. They suggested that social news observatories could play an important role by exposing the dynamics of real and fake news distribution and the associated risks.

The ease of sharing and discovering information on social media results in a huge amount of content published for target audiences. Both participants (those who share and those who consume) must check the credibility of shared content. Social media also enables its users to act simultaneously as content producers and consumers. The content consumer has flexibility in which content to follow.
For the content producer, it is necessary to check and evaluate the source of information. If a user is interested in receiving information on a particular topic from a specific source, their primary task is to check the credibility, relevance and quality of that source. Different ways of checking credibility include:
• Looking for other users who have subscribed to such information [20].
• Assessing both the expertise (support and recommendations from other professionals) [44, 175] and the credibility of the user.
• Assessing the credibility of the sources (examining the content and peer support) [135].

Researchers have proposed different techniques for identifying credible and reputable sources of information. Canini et al. [20] proposed an algorithm based on both the content and the social status of the user. Weng et al. merged a web page ranking technique with topic modelling to compute the rank of a Twitter user [179]. Cha et al. [23] studied the factors that determine user influence. A random walk approach [126] was proposed for separating credible sources from malicious ones by performing network feature analysis. TweetCred, a real-time web-based system, was developed to evaluate the credibility of tweets [68]. It assigns each tweet on a user's timeline a credibility score from 1 (low credibility) to 7 (high credibility). The credibility score is computed using a semi-supervised ranking algorithm trained on a data set consisting of an extensive set of features collected from previous work [125] and manually labelled by humans. TweetCred was evaluated based on its usability, effectiveness and response time. For 80 percent of tweets, the credibility score was calculated and displayed within 6 seconds, and 63 percent of users either agreed with the generated score or differed from it by 1-2 points. Irrespective of its effectiveness, the results were still influenced by user personalization and by the context of tweets which did not involve factual information.

A different research model - based on perceptions related to news authors, news sharers and users - was developed to test the verification behaviours of users [166]. The aim was to study the validation of content published by users on Social Networking Sites (SNSs). The results were assessed using a three-step analysis to evaluate the measurement model, the structural model and common method bias. The study focused on the epistemology of declarations of interpersonal trust to examine the factors that influence user trust in news disseminated on SNSs. To test the full model, the researchers used SmartPLS 2.0. The evaluation showed that variety in social ties on SNSs increases trust among network participants, and that trust in the network reduces news verification behaviours. However, the evaluation disregards the importance of the nature of the news in relation to the recipient.

Trust is an important factor to be considered when engaging in social interaction on social media. When measuring trust between two unknown users, the challenging task is the discovery of a reliable trust path. In [56], Ghavipour et al. addressed the problem of reliable trust paths by utilizing a heuristic algorithm built on learning automata, called DLATrust. They proposed a new approach for aggregating the trust values from multiple paths based on a standard collaborative filtering mechanism.
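A toy sketch of multi-path trust aggregation helps fix the idea: here trust decays multiplicatively along a path, and the path estimates are combined by a reliability-weighted mean. This is one plausible reading of path-based aggregation, not the DLATrust algorithm itself, and the path values are invented.

# Toy multi-path trust aggregation: trust along a path decays multiplicatively,
# and path estimates are combined by a reliability-weighted mean. This is one
# plausible reading of path-based trust, not the DLATrust algorithm of [56].
def path_trust(edge_trusts):
    value = 1.0
    for t in edge_trusts:
        value *= t                       # each hop attenuates the trust estimate
    return value

def aggregate(paths):
    # Weight each path estimate by its own value, favouring more reliable paths.
    estimates = [path_trust(p) for p in paths]
    total = sum(estimates)
    return sum(e * e for e in estimates) / total if total else 0.0

# Two hypothetical paths from Alice to Carol through different intermediaries.
paths = [[0.9, 0.8], [0.6, 0.5, 0.9]]
print(f"aggregated trust: {aggregate(paths):.2f}")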
Their experiments, performed on Advogato - a well-known trust network data set - showed efficiency and high accuracy in predicting the trust of reliable paths between two indirectly connected users.

The authors of [153] studied the correlation between user profiles and the fake news shared on social media. A real-world data set comprising social context and news content was built for categorizing users based on measuring their trust in fake news. Representative groups of both experienced users (able to differentiate between real and fake news) and naive users (unable to differentiate between real and fake news) were selected. They proposed that the features relevant to these users could be useful in identifying fake news. The results for the identified user groups showed that the feature distribution satisfies a power-law distribution [31] with high goodness-of-fit scores. This result indicated a significant difference between the features of experienced and naive users. However, the paper left unexplored the credibility and political bias of experienced users before characterizing them for fake news detection.

The timely detection of misinformation and the sharing of credible information during emergency situations are of utmost importance. The challenge of distinguishing useful information from misinformation during these events is, however, still significant. Moreover, a lack of know-how about social networks makes it even more challenging to discern the credibility of shared information [176]. Antoniadis et al. [8] developed a detection model to identify misinformation and suspicious behavioural patterns during emergency events on the Twitter platform. The model was based on a supervised learning technique using the user's profile and tweets. The experiments were performed on a data set consisting of 59,660 users and 80,294 tweets. The authors filtered 81 percent of the tweets and claimed that more than 23 percent were misrepresentations. Although the proposed technique makes no distinction between intentional and unintentional misinformation [12], it successfully achieved timely detection. In Table 3, we analyze trust and reputation models in terms of the mechanism used and the data set, as well as the outcomes and weaknesses of each model.

In the existing literature, insufficient importance is given to the sources responsible for spreading fake news. Evaluating the source is not a straightforward process, as there are multiple variables to be considered in source verification, such as the affiliation and reputation of the source, expertise in the domain, agreement or disapproval of other sources, etc. Moreover, the absence of a source makes information unreliable, regardless of whether it is generated by an authentic source or not. Hence, fake news evaluation requires a model capable of performing source tracking, verification and validation.

Social media popularity, the availability of the internet, the extreme growth of user-generated website content, the lack of quality control and poor governance all provide fertile ground for sharing and spreading false and unverified information. This has led to a continuous deterioration of information veracity. As the significance of the fake news problem grows, the research community is proposing increasingly robust and accurate solutions. Some of the proposed solutions are discussed below and their characteristics are provided in Table 4.

• BS-Detector: Available as a browser extension for both Mozilla and Chrome.
It searches a web page for all links to unreliable sources and checks these links against a manually compiled list of domains. It can classify the domains as fake news, conspiracy theory, clickbait, extremely biased, satire, proceed with caution, etc. The BS-Detector has been downloaded and installed around 25,000 times [67].
• FiB (https://devpost.com/software/fib): The distribution of content is as important as its creation, and FiB takes both post creation and distribution into account. It verifies the authenticity of a post in real time using AI. The AI uses keyword extraction, image recognition and source verification to check the authenticity of posts and provide a trust score. In addition, FiB tries to provide true information for posts that are deemed false [48].
• Trusted News add-on (https://trusted-news.com/): Built in conjunction with the MetaCert Protocol powered by the MetaCert organization to help users spot suspicious or fake news. It is used to measure the credibility of website content and flags content as good, questionable or harmful. It gives a wider set of outputs, including marking website contents as malicious, satirical, trustworthy, untrustworthy, biased, clickbait and unknown [171].
• SurfSafe (https://chrome.google.com/webstore/detail/surfsafe-join-the-fight-a/hbpagabeiphkfhbboacggckhkkipgdmh?hl=en): There are different ways to analyze fake news, such as textual analysis, image analysis, etc. Ash Bhat and Rohan Phadte focused on the analysis of fake news using images and generated a data set consisting of images collected from 100 fact-checking and trusted news sites. They developed a plug-in that checks images against the generated data set. The main idea is to check each new image against the generated image data set. If the image is used in a fake context or has been modified, the information as a whole is considered fake [30].
• BotOrNot: A publicly available service used to assign a classification score to a Twitter account. This score is assigned to an account on the basis of its similarity to the known characteristics of social bots. This classification system leverages more than 1,000 features extracted from content, interaction patterns and available metadata [38]. These features are further grouped into six sub-classes, including:
- Network features - built by extracting statistical features for mentions, retweets and hashtag co-occurrence.
- Sentiment features - built by using a sentiment analysis algorithm that takes into account happiness, emotion scores, etc.
• Decodex (https://chrome.google.com/webstore/detail/decodex/kbpkclapffgmndlaifaaalgkaagkfdod?hl=fr): An online fake news detection tool that alerts the user to the potential of fake news by labeling information as 'satire', 'info' or 'no information' [61].
• TrustyTweet (https://peasec.de/2019/trustytweet/): A browser plug-in proposed for Twitter users to assess and increase media literacy. It shifts the focus from fake news detection by labelling to supporting users in making their own assessment by providing transparent, neutral and intuitive hints when dealing with fake news. TrustyTweet is based on gathering potential indicators of fake news that have already been identified and proven promising in previous studies [72].
• Fake News Detector (https://github.com/fake-news-detector/fake-news-detector/tree/master/robinho): An open source project used for flagging news. A user can flag news as either fake news, extremely biased or clickbait.
The user flagging activity is visible to other Fake News Detector users, who may flag it again. Once news is flagged, it is saved in a repository accessible to Robinho - an ML robot trained on the inputs provided by humans - which automatically flags news as clickbait, fake news or extremely biased.
• Fake News Guard: Available as a browser extension, it can verify the links displayed on Facebook or any page visited by the user. There is insufficient information about the way this tool works; however, the key idea is that "Fake News Guard uses AI techniques along with network analysis and fact-checking".
• TweetCred: A web browser tool used for assessing the credibility of tweets by using a supervised ranking algorithm trained on more than 45 features. TweetCred assigns a credibility score to each tweet on the user's timeline. Over the course of three months, TweetCred was installed 1,127 times and computed credibility scores for 5.4 million tweets [68].
• LiT.RL News Verification: A research tool that analyses the language used on web pages. The core functionality of the News Verification browser is textual data analysis using NLP and automatic classification using an SVM. It automatically detects and highlights website news as clickbait, satirical fake news or fabricated news [137].

Detecting fake news on social media poses many challenges, as most fake news is intentionally written. Researchers are considering different kinds of information, such as user behaviour, the engagement content of news, etc., to tackle the problem. However, there is no data set available that provides information on how fake news propagates, how different users interact with fake news, how to extract temporal features which could help to detect it, and what the impact of fake news truly is. In the previous section, we discussed the automatic detection of fake news using ML models. ML models require high-quality data sets to be efficient. This continues to be a major challenge when it comes to social media data, due to its unstructured nature, high dimensionality, etc. In order to facilitate research in this field, a comprehensive guide to existing data sets is required. Below we present the details of some of the more widely used ones:

• CredBank: Collected by tracking more than 1 billion tweets between October 2014 and February 2015 [111]. It consists of tweets, events, topics and associated credibility judgments assigned by humans. The data set comprises 60 million tweets which are further categorized into 1,049 real-world events. Further, the data is spread over a streaming tweet file, a topic file, a credibility annotation file and a searched tweet file [112].
• LIAR: A publicly available fake news detection data set [46] that can be used for fact-checking. It consists of 12,836 short statements labelled manually by humans. In order to verify their truthfulness, each statement is evaluated by an editor of POLITIFACT.COM. Each statement is labelled with one of the following six categories: true, mostly-true, half-true, barely-true, false, pants on fire [174].
Detecting fake news on social media poses many challenges, as most fake news is written intentionally to mislead. Researchers are considering different kinds of information, such as user behaviour, the engagement content of news, etc., to tackle the problem. However, no data set is available that could provide information on how fake news propagates, how different users interact with it, how to extract the temporal features that could help detect it, and what the impact of fake news truly is. In the previous section, we discussed the automatic detection of fake news using ML models. ML models require high-quality data sets to be efficient, which continues to be a major challenge when it comes to social media data due to its unstructured nature, high dimensionality, etc. In order to facilitate research in this field, a comprehensive guide to existing data sets is required. Below we present the details for some of the more widely used ones:

• CredBank: Collected by tracking more than 1 billion tweets between October 2014 and February 2015 [111]. It consists of tweets, events, topics and an associated credibility judgment assigned by humans. The data set comprises 60 million tweets, which are further categorized into 1,049 real-world events. The data is split into a streaming tweet file, a topic file, a credibility annotation file and a searched tweet file [112].
• LIAR: A publicly available fake news detection data set [46] that can be used for fact-checking. It consists of 12,836 short statements labelled manually by humans. In order to verify their truthfulness, each statement is evaluated by an editor of POLITIFACT.COM. Each statement is labelled with one of the following six categories: true, mostly-true, half-true, barely-true, false and pants-on-fire [174].
• Memetracker: This data set [94] recorded social media activity and online mainstream content over a three-month period. The authors used the Spinn3r API and collected 90 million documents from 1.65 million different websites [95]. The resulting data set is 350GB in size. They first extracted 112 million quotes, which were further refined and from which 22 million distinct phrases were collected.
• FakeNewsNet 12: A multi-dimensional data repository consisting of social context, content and spatiotemporal information [151]. The data set was constructed using FakeNewsTracker, a tool for collecting, analyzing and visualizing fake news. In this data set, the content consists of news articles and images, while the context consists of information related to the user, post, response and network. The spatiotemporal information consists of spatial information (user profiles and tweets with location) and temporal information (timestamps for news and responses).
• BuzzFeedNews 13: This data set recorded all the news published on Facebook by nine news agencies regarding the US election. The articles and news were fact-checked by journalists from BuzzFeed. It contains 1,627 articles and 826 streams from hyperpartisan Facebook pages which publish misleading and false information at an alarming rate [116].
• BS Detector [168]: This data set was collected using BS Detector, a web browser extension for both Chrome and Mozilla. The extension searches all the links on a given web page that point to unreliable sources and checks them against a manually compiled list of domains.
• BuzzFace: This data set [143] consists of 2,263 news articles and 1.6 million comments. BuzzFace extends the BuzzFeed data set by adding the Facebook comments related to the news articles. The news articles are categorized as "mostly true", "mixture of true and false", "mostly false" and "no factual comments".
• FacebookHoax: This data set [158] was collected using the Facebook Graph API. It consists of 15,500 posts, of which 8,923 are hoaxes and the rest are non-hoaxes. The posts were collected from 32 pages: 14 conspiracy and 18 scientific pages. In addition, the data set includes the likes on these posts, which number more than 2.3 million.
• Higgs-Twitter: This data set [39] consists of Twitter posts related to the discovery of the Higgs boson. The tweets were collected using the Twitter API and include all tweets containing one of the following hashtags or keywords: cern, higgs, lhc, boson. The data set consists of 527,496 users and 985,590 analysed tweets, of which 632,207 are geo-located.
• Trust and Believe: This data set consists of 50,000 Twitter users, all of whom are politicians [82]. For each user, a unique profile containing 19 features is created. A total of 1,000 users were manually annotated, with the rest classified using an active learning approach.

Table 6 presents a detailed summary of the available data sets used for fake news detection in the existing literature. Most are either small in size or contain mainly uni-modal data. The existing multi-modal data sets, unfortunately, still cannot be used as benchmarks for training and testing fake news detection models [79]. The next step is to generate large and comprehensive data sets that include resources from which all relevant information can be extracted.
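As an example of working with these resources, the Python sketch below loads LIAR with pandas and collapses its six labels into a binary task, a common preprocessing step in the surveyed detection papers. The column names and label strings reflect our reading of the original tab-separated release and should be verified against its documentation.

```python
# A minimal sketch for inspecting the LIAR data set, assuming the
# tab-separated layout of the original release (train.tsv); the column
# names below are abbreviations we chose for readability.
import pandas as pd

columns = [
    "id", "label", "statement", "subjects", "speaker", "job",
    "state", "party", "barely_true_ct", "false_ct", "half_true_ct",
    "mostly_true_ct", "pants_on_fire_ct", "context",
]
df = pd.read_csv("train.tsv", sep="\t", header=None, names=columns)

# The six-way truthfulness labels discussed above.
print(df["label"].value_counts())

# Collapse to a binary task (1 = fake-leaning, 0 = true-leaning).
fake = {"false", "pants-fire", "barely-true"}
df["is_fake"] = df["label"].isin(fake).astype(int)
print(df[["statement", "is_fake"]].head())
```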
Solving the problem of fake news detection and minimizing its impact on society is one of the important issues under consideration in the research community. In this review, we analysed different studies using varying methods for detecting fake news. With the aim of aiding future research, we provide their classification based on the social media platform used in Table 5.

Table 5. Classification of research papers by social media platform
- Twitter: Grier et al. [65], Ye et al. [192], Chengcheng et al. [148], Soroush et al. [170], Liang Wu …

Similarly, the current literature on false news identification can be divided into four paradigms: hybrid approaches, feature-based, network propagation and knowledge-based. Hybrid approaches employ both human and ML approaches for the detection of fake news. In the feature-based method, multiple features associated with a specific social media account are used to detect fake news; this paradigm can be further divided into three sub-categories: account-based, context- and content-based, and text categorization. These methods are explicitly discussed in section 2. The third paradigm, network propagation, describes potential methods for discovering, flagging and stopping the propagation of fake news in its infancy. The final paradigm entails supplementing AI models with human expert knowledge for decision-making (see section 2). An overview of these paradigms is given in Figure 1; a minimal code sketch of the feature-based paradigm follows the figure.

Figure 1. Overview of the fake news detection paradigms
- Knowledge Based: Info Retrieval [13, 47, 68, 74]; Semantics [29, 148, 149, 159]
- Network Propagation: Link Based [4, 92, 149]; Neighbor Based [76, 98, 132]; Group Based [2, 52, 78, 182, 184]
- Feature Based: Text Categorization [118, 125, 130, 158, 182]; Context and Content Based [140, 173, 174, 193]; Account Based [27, 28, 35, 43, 45, 58, 62, 63, 92, 185]
- Hybrid Approach: [70, 83, 102, 103, 127, 147, 174, 188]
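To make the feature-based text categorization paradigm concrete, the sketch below trains a TF-IDF plus logistic regression pipeline on a few invented headlines; a practical system would instead train on a corpus such as LIAR or BuzzFeedNews, and might swap in the SVM used by LiT.RL.

```python
# A minimal sketch of the feature-based "text categorization" paradigm:
# bag-of-words/TF-IDF features fed to a linear classifier. The headlines
# and labels are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

headlines = [
    "Scientists confirm miracle cure hidden by doctors",
    "Parliament passes annual budget after long debate",
    "Celebrity secretly replaced by clone, insiders say",
    "Local library extends weekend opening hours",
]
labels = [1, 0, 1, 0]  # 1 = fake, 0 = real

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), stop_words="english"),
    LogisticRegression(max_iter=1000),
)
model.fit(headlines, labels)

# Probability that an unseen headline is fake.
print(model.predict_proba(["Miracle cure shocks doctors"])[:, 1])
```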
Identifying and mitigating the spread of fake news and its variants presents a set of unique challenges. Fake news dissemination is often part of coordinated campaigns targeting a specific audience with the aim of generating a plausible impact at either a local or global level. Many companies, as well as entire countries, have been faced with the need to start building mechanisms to protect citizens from fake news. In September 2019, Facebook announced it was contributing $10 million to a fund to improve deepfake detection technologies, while several governments have taken different initiatives to defeat this problem [50, 142, 163]. Educational institutions and non-profit organizations have also tried to mitigate the problem through advocacy and literacy campaigns. Specifically, these institutions, in collaboration with technology companies, have designed various techniques for detecting, flagging and reporting fake news [21, 117, 133, 144]. Table 7 summarizes the actions that have been taken by governments around the world in order to battle the spread of fake news.

Table 7. Approaches Taken by Governments to Tackle the Problem of Fake News
- …: imposition of a fine and imprisonment; reliable information is published to systematically rebut fake news.
- Egypt (media regulation): three domestic laws have been passed to regulate information distribution and its accuracy; sanctions are imposed for spreading fake news.
- France (election misinformation): no specific law, but there is general legislation against fake news; sanctions are imposed for spreading fake news.
- Germany (hate speech): a number of civil and criminal laws apply to fake news; the Network Enforcement Act was passed specifically to fight fake news.
- Israel (foreign disinformation campaigns): a high-level committee appointed by the president examines the current law for threats and ways to address them; sanctions are imposed for spreading fake news.
- Japan (media regulation): a law exists to counter fake news; the Ministry of Communication and Internal Affairs works jointly to counter fake news.
- Kenya (election misinformation): a computer misuse and cyber-crime act has been passed but is not yet in force; citizens are being educated.
- Malaysia (election misinformation): the Malaysian Anti-Fake News Act 2018; a fact-checking portal is operated by government agencies; sanctions are imposed for spreading fake news.
- Nicaragua (media regulation): no specific law is available; however, some provisions can be found within the penal code and election law.
- Russia (election misinformation): legislation addressing the spread of fake news has been passed; sanctions are imposed for spreading fake news.
- Brazil (election misinformation): no law yet, but the topic is under discussion in congress; fines and imprisonment.
- United Kingdom (foreign disinformation campaigns): no legislation to scrutinize or validate news on social media; reliable information is published to systematically rebut fake news.
- United Arab Emirates (election misinformation): sharing misinformation is a crime by law; imposition of a fine.
- United States (disinformation, misinformation): a federal law has been proposed; state media literacy initiatives.

The greatest obstacle in fake news detection is that information spreads through social media platforms like wildfire (especially if it is polarizing) and, when not addressed, becomes viral in a matter of moments [155]. The implications of this instantaneous consumption of information, on the other hand, are long-lasting. As a result, fake news becomes indistinguishable from real information, and the ongoing trends are difficult to recognize. We believe that fake news propagation can only be successfully controlled through early detection (see section 2.4); the sketch below illustrates the idea. Another significant problem is that the rise in the influence of social media is closely connected to the increase in the number of users. According to Figure 2, there are currently more than 3 billion users, and by 2024 this number is expected to exceed 4 billion, a development that will eventually lead to an exponential rise in data [161]. Much of this data is likely to be uncertain due to inconsistencies, incompleteness, noise and its unstructured nature. This complexity increases the velocity, variety and amount of data, and will most probably jeopardize the legitimacy of the results of any standard analytic process and of the decisions based on them. Analysis of such data requires tailor-made, advanced analytical mechanisms. Designing techniques that can efficiently predict or evaluate future courses of action with high precision thus remains very challenging.
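Since early detection is the only realistic point of control, the sketch below shows the idea in its simplest form: summarize only the first few reshares of a story into temporal features and classify the cascade before it goes viral. The features, toy cascades and labels are illustrative assumptions, far simpler than the propagation-path models in the surveyed literature.

```python
# A minimal sketch of early detection from propagation behaviour:
# featurize the first k reshares of a story and classify the cascade.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def cascade_features(timestamps: list[float], k: int = 5) -> list[float]:
    """Summarize the first k reshare times (seconds after posting)."""
    first = sorted(timestamps)[:k]
    gaps = np.diff(first) if len(first) > 1 else np.array([0.0])
    return [
        float(len(first)),            # reshares observed so far
        float(first[-1] - first[0]),  # time span of the early burst
        float(gaps.mean()),           # mean gap between reshares
        float(gaps.std()),            # burstiness of the gaps
    ]

# Toy cascades: false news has been observed to spread faster and in
# denser bursts than true news [170].
cascades = [
    [1, 3, 4, 6, 9],            # fast burst          -> fake
    [60, 600, 1800, 7200],      # slow, steady spread -> real
    [2, 2, 5, 7, 8],            # fast burst          -> fake
    [120, 900, 3600, 10800],    # slow, steady spread -> real
]
labels = [1, 0, 1, 0]

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit([cascade_features(c) for c in cascades], labels)
print("fake probability:", clf.predict_proba([cascade_features([1, 2, 2, 4, 5])])[:, 1])
```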
To summarize, humans are susceptible to becoming victims of false information because their intrinsic way of processing and interpreting information is influenced by cognitive biases, namely the truth bias, naive realism and confirmation bias [155]. Consequently, the false information in circulation is capable of ruining the "balance of the news ecosystem". The main challenge is that most users do not pay close attention to manipulated information, while those who manipulate it systematically try to create more confusion. The outcome of this process is that people's ability to distinguish real from false information is further impeded [138, 152]. Can we stop the viral spread? The answer, obviously, is not yet, because of the critical challenges surrounding the detection of fake news (see Figure 3). Several efforts have nevertheless been put in place to help limit it, such as media literacy. Media literacy, comprising practices that enable people to access and critically evaluate content across different media, seems like the only valid solution. Although this is, and always was, a challenging task, a coherent understanding, proper education, training, awareness and responsible media engagement could change this [18]. In the meantime, a culture of resisting disinformation and "fake news" should be promoted and encouraged. In addition, cross-disciplinary collaboration (i.e., social psychology, political science, sociology, communication studies, etc.) can help streamline findings across diverse disciplines to devise a holistic approach to understanding the structure of the media environment and how it operates.

Today, OSNs can be seen as platforms where people from all over the world can instantly communicate with strangers and even influence people's actions. Social media has shaped the digital world to an extent that it now seems like an indispensable part of our daily lives. However, social networks' ease of use has also revolutionized the generation and distribution of fake news. This prevailing trend has had a significant impact on our societies. In this survey paper, we studied the problem of fake news detection from two different perspectives. Firstly, to assist users in identifying who they are interacting with, we looked at the different approaches in the existing literature used for the identification and classification of user accounts. To this end, we analysed in depth both the users' context (anyone) and content (anything). For the early identification and mitigation of fake news, we studied different approaches that focus on data network features. Recently proposed approaches for measuring the relevance, credibility and quality of sources were analysed in detail. Secondly, we approached the problem of automating fake news detection by elaborating on the top three approaches used in fake news detection competitions and looked at the characteristics of the more robust and accurate web-browsing tools. We also examined the statistical outputs, advantages and disadvantages of some of the publicly available data sets. As the detection and prevention of fake news presents specific challenges, our conclusion identified open challenges and promising research directions.

This research has received funding from the EU research projects ASCLEPIOS (No. 826093) and CYBELE (No. 825355).
References

- Malicious accounts: Dark of the social networks
- Graph based anomaly detection and description: a survey
- The #Election2016 Micro-Propaganda Machine
- Detecting fake news in social media networks
- Social media and fake news in the 2016 election
- Detecting social bots on Twitter: a literature review
- A model for identifying misinformation in online social networks
- How much time do people spend on social media?
- Twitter by the Numbers: Stats, Demographics & Fun Facts
- Sifting: A Privacy-Preserving Reputation System Through Multi-Input Functional Encryption
- Identification of credulous users on Twitter
- Predicting factuality of reporting and bias of news media sources
- Detecting spammers on twitter
- Companies fear rise of fake news and social media rumours
- Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena
- Graph structure in the web
- The promises, challenges, and futures of media literacy
- Grassroots propaganda in the Third Reich: The Reich ring for National Socialist propaganda and public enlightenment
- Finding credible information sources in social networks based on content and social structure
- Flagging Fake News
- Fake news: What exactly is it and how can you spot it
- Measuring user influence in twitter: The million follower fallacy
- Wisdom of crowds: The value of stock opinions transmitted through social media
- Misleading online content: recognizing clickbait as "false news"
- Fake News Perception in Germany: A Representative Study of People's Attitudes and Approaches to Counteract Disinformation
- Who is tweeting on Twitter: human, bot, or cyborg
- Detecting automation of twitter accounts: Are you a human, bot, or cyborg?
- Computational fact checking from knowledge networks
- SurfSafe offers a browser-based solution to fake news
- Power-law distributions in empirical data
- Trends in combating fake news on social media - a survey
- Introduction - platforms and infrastructures in the digital age
- DNA-inspired online behavioral modeling and its application to spambot detection
- The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race
- Cashtag piggybacking: Uncovering spam and bot activity in stock microblogs on Twitter
- Language modeling with gated convolutional networks
- BotOrNot: A system to evaluate social bots
- The anatomy of a scientific rumor
- Multi-Party Trust Computation in Decentralized Environments
- Multi-party Trust Computation in Decentralized Environments in the Presence of Malicious Adversaries
- Is that a bot running the social media feed? Testing the differences in perceptions of communication quality for a human agent and a bot agent on Twitter
- The Cambridge handbook of expertise and expert performance
- Twitter fake account detection
- Emergent: a novel data-set for stance classification
- The current state of fake news: challenges and opportunities
- Spread of coronavirus fake news causes hundreds of deaths
- A guide to anti-misinformation actions around the world
- Twitter sentiment around the Earnings Announcement events
- On community outliers and their efficient detection in information networks
- Understanding popularity, reputation, and social influence in the twitter society
- The research of the level of social media addiction of university students
- A dynamic algorithm for stochastic trust propagation in online social networks: Learning automata approach
- Trust propagation algorithm based on learning automata for inferring local trust in online social networks
- Understanding and combating link farming in the twitter social network
- Bot and Gender Detection using Textual and Stylistic Information
- D-cores: measuring collaboration of directed graphs based on degeneracy
- The rise of machine learning for detection and classification of malware: Research developments, trends and challenges
- Evaluation of the existing tools for fake news detection
- STCS - Streaming Twitter Computation System
- Classification of twitter accounts into automated agents and human users
- Fake news handed Brexiteers the referendum and now they have no idea what they're doing
- @spam: the underground on 140 characters or less
- Identifying clickbait posts on social media with an ensemble of linear models
- Facebook warned people that a popular fake news detector might be "unsafe"
- TweetCred: Real-time credibility assessment of content on twitter
- Triple Scoring
- Team Athene on the Fake News Challenge
- Lifespan and propagation of information in On-line Social Networks: A case study based on Reddit
- TrustyTweet: An Indicator-based Browser-Plugin to Assist Users in Dealing with Fake News on Twitter
- Consensus dynamics in online collaboration systems
- Journalists, social media, and the use of humor on Twitter
- Predicting popular messages in twitter
- Social spammer detection in microblogging
- Social bots - the technology behind fake news
- Review spam detection
- NewsBag: A Multimodal Benchmark Dataset for Fake News Detection
- Social media, the digital revolution, and the business of media
- Rise of spam and compromised accounts in online social networks: A state-of-the-art review of different combating approaches
- Trust and Believe - Should We? Evaluating the Trustworthiness of Twitter Users
- Trust and Believe - Should We? Evaluating the Trustworthiness of Twitter Users
- Weblog analysis for predicting correlations in stock price evolutions
- Fake news filtering: Semantic approaches
- Academic social networks: Modeling, analysis, mining and applications
- The economics of "fake news"
- Study epidemiology of fake news
- Disinformation on the web: Impact, characteristics, and detection of wikipedia hoaxes
- The science of fake news
- Uncovering social spammers: social honeypots + machine learning
- Seven months with the devils: A long-term study of content polluters on twitter
- Warningbird: A near real-time detection system for suspicious urls in twitter stream
- Meme-tracking and the dynamics of the news cycle
- Meme-tracking and the dynamics of the news cycle
- Learning to discover social circles in ego networks
- Analyzing and detecting opinion spam on a large-scale dataset via temporal and spatial patterns
- Robust unsupervised feature selection on networked data
- Detecting product review spammers using rating behaviors
- Real-time and cost-effective limitation of misinformation propagation
- Trust or Suspect? An Empirical Ensemble Framework for Fake News Classification
- Early detection of fake news on social media through propagation path classification with recurrent and convolutional networks
- FNED: A Deep Network for Fake News Early Detection on Social Media
- Detecting rumors from microblogs with recurrent neural networks
- Detect rumors in microblog posts using propagation structure via kernel learning
- Mediaeval 2016: A multimodal system for the verifying multimedia use task
- Correlating S&P 500 stocks with Twitter data
- News use across social media platforms 2018
- The lord of the sense: A privacy preserving reputation system for participatory sensing applications
- Keep Pies Away from Kids: A Raspberry Pi Attacking Tool
- Credbank: A large-scale social media corpus with associated credibility annotations
- Tweeting is believing?: understanding microblog credibility perceptions
- Real-time detection of content polluters in partially observable Twitter networks
- Media and Propaganda: The Northcliffe Press and the Corpse Factory Story of World War I
- Fact-Checking Facebook Politics Pages
- Instagram Is Removing "Fake News"
- The language of fake news: Opening the black-box of deep learning based detectors
- New application can detect Twitter bots in any language
- Using Neural Network for Identifying Clickbaits in Online News Media
- A survey on natural language processing for fake news detection
- Content based fake news detection using knowledge graphs
- Investigating the emotional appeal of fake news using artificial intelligence and human contributions
- Twitter's spam reporting tool now lets you specify type, including if it's a fake account
- Automatic detection of fake news
- Deepwalk: Online learning of social representations
- Transferring, Transforming, Ensembling: The Novel Formula of Identifying Fake News
- A short guide to the history of 'fake news' and disinformation
- Webis Clickbait Corpus
- A stylometric inquiry into hyperpartisan and fake news
- Australia fires: Misleading maps and pictures go viral
- Collective opinion spam detection: Bridging review networks and metadata
- Social Media News: Fake News Flagging Tool, Clear Facebook History and More
- A simple but tough-to-beat baseline for the Fake News Challenge stance detection task
- Annual review of information science and technology
- Getting Real about Fake News
- A News Verification Browser for the Detection of Clickbait, Satire, and Falsified News
- Deception detection and rumor debunking for social media
- Deception detection for news: three types of fakes
- CSI: A hybrid deep model for fake news detection
- Correlating financial time series with micro-blogging activity
- Legislative Measures Adopted at the International Level Against Fake News
- BuzzFace: A news veracity dataset with facebook user commentary and egos
- Instagram fact-check: Can a new flagging tool stop fake news?
- Understanding spreading patterns on social networks based on network topology
- Your favorite Twitter bots are about to die, thanks to upcoming rule changes
- Talos Targets Disinformation with Fake News Challenge Victory
- Hoaxy: A platform for tracking online misinformation
- Finding streams in knowledge graphs to support fact checking
- Combating disinformation in a social media age
- FakeNewsNet: A Data Repository with News Content, Social Context, and Spatiotemporal Information for Studying Fake News on Social Media
- Fake news detection on social media: A data mining perspective
- Understanding user profiles on social media for fake news detection
- This Analysis Shows How Viral Fake Election News Stories Outperformed Real News On Facebook
- Fake news detection in social media
- Your botnet is my botnet: analysis of a botnet takeover
- More Americans Are Getting Their News From Social Media
- Some like it hoax: Automated fake news detection in social networks
- Fact-checking effect on viral hoaxes: A model of misinformation spread in social networks
- Defining "fake news": A typology of scholarly definitions
- Identifying Clickbaits Using Machine Learning
- Global Legal Research Directorate, The Law Library of Congress, 2019
- 53K rumors spread in Egypt in only 60 days, study reveals
- The role of the underground economy in social network spam and abuse
- Design and evaluation of a real-time url spam filtering service
- Combating fake news: An investigation of information verification behaviors on social networking sites
- Fake news detection in social networks via crowd signals
- Telling humans and computers apart automatically
- The spread of true and false news online
- Factmata Trusted News Chrome Add-On Has Been Turned Off Until Further Notice
- Automatic scoring of online discussion posts
- Don't follow me: Spam detection in twitter
- "Liar, liar pants on fire": A new benchmark dataset for fake news detection
- A trust-based probabilistic recommendation model for social networks
- Explained: What is Fake News?
- Credibility improves topical blog post retrieval
- Automatically assessing the post quality in online discussions on software
- TwitterRank: finding topic-sensitive influential twitterers
- Almost all the traffic to fake news sites is from Facebook, new data show
- Fake news is thriving thanks to social media users, study finds
- Adaptive spammer detection with sparse group modeling
- Detecting camouflaged content polluters
- Tracing fake-news footprints: Characterizing social media messages by how they propagate
- Detecting marionette microblog users for improved information credibility
- VoteTrust: Leveraging friend invitation graph to defend against social network sybils
- Bot, cyborg and automated Turing test
- Fake News Detection as Natural Language Inference
- Uncovering social network sybils in the wild
- How should social media platforms combat misinformation and hate speech?
- Temporal opinion spam detection by multivariate indicative signals
- Measuring message propagation and social influence on Twitter
- Enquiring minds: Early detection of rumors in social media from enquiry posts
- Network-based Fake News Detection: A Pattern-driven Approach
- Clickbait detection in tweets using self-attentive network
- Detection and resolution of rumours in social media: A survey