Title: A Review of Web Infodemic Analysis and Detection Trends across Multi-modalities using Deep Neural Networks
Authors: Chahat Raj; Priyanka Meel
Date: 2021-11-23

Abstract: Fake news and misinformation are a matter of concern for people around the globe. Users of the internet and social media sites encounter content containing false information very frequently. Fake news detection is one of the most analyzed and prominent areas of research. These detection techniques apply popular machine learning and deep learning algorithms. Previous work in this domain largely covers fake news detection in text circulating online. Platforms that have been extensively observed and analyzed include news websites and Twitter. Facebook, Reddit, WhatsApp, YouTube, and other social applications are gradually gaining attention in this emerging field. Researchers are analyzing online data based on multiple modalities composed of text, image, video, speech, and other contributing factors. The combination of various modalities has resulted in efficient fake news detection. At present, there is an abundance of surveys consolidating textual fake news detection algorithms. This review primarily deals with multi-modal fake news detection techniques that include images, videos, and their combinations with text. We provide a comprehensive literature survey of eighty articles presenting state-of-the-art detection techniques, thereby identifying research gaps and building a pathway for researchers to further advance this domain.

Since the advent of online social platforms, the need to establish baselines for information assessment across all information flow channels has demanded researchers' attention. It remains doubtful how wisely people using these platforms communicate and digest the information circulating on the internet.
With the exponential emergence of social media as a globally used platform in the recent decade, we have encountered a massive escalation in fake news dispersion. Any act of deliberate, malicious, or unverified inclusion of information creates fake news. Substantial instances include widespread misinformation since January 2020 claiming World War 3 [1], its probability, the countries involved, and tentative dates. The arrival of the pandemic has led to a multi-fold rise in the dissemination of fake news globally. This escalated spread of misinformation amidst the pandemic has been termed an 'Infodemic.' Social media interactions have contributed a lot towards inventories of big data. Users contribute it as text, images, videos, audio, emoticons, reactions, etc. Text, images, and videos compose a major actuating portion of the information on the internet and hence mark the grey areas of fake news. Since fake news and its critical nature came into the picture, several scientific developments have been made in textual fake news detection. Fake news detection technologies using NLP, text classification, vector-space models, rhetoric structure theory, opinion mining, sentiment analysis, graph theory, deep neural networks, and others have been created, reviewed, and summarized by fellow researchers [13] [14] [15] [16]. Probing of image-based fake news is comparatively limited, and of videos, negligible. Most of the information users encounter on the internet is accompanied by visual representations, either using an image, video, or other modalities. Visual data is quickly gazed upon and leaves a lasting impact. It is a breeding ground for tampering, manipulation, and forgery. With emerging technology and editing applications, images and videos are being manipulated to mislead information consumers. Moreover, online social media allows users to freely add their own data, be it real or fake.
Recent examples include an image of a shark on a Houston freeway that went viral during Hurricane Harvey (Figure 1). The image was shared and retweeted at large, creating havoc amongst people. Misinformation has been so outrageous that even US President Donald Trump could not escape it and went on to retweet a fake video stating that anti-malaria drugs could cure coronavirus, which was later taken down when proven fake [17]. In 2020, during the coronavirus pandemic, fake news claiming Kim Jong Un was dead, absconding, or assassinated spread widely [18]. Fake videos of his funeral were shared on social media (Figure 2). In this survey, we broadly categorize the data modalities spreading fake news on the internet into text, images, and video. We present a literature review of fake news detection techniques that cover these data modalities individually or in combination as hybrid or multi-modal data. This review intends to cover detection mechanisms for the data modalities that promote most of the spreading fake news, i.e., text, images, and videos. This survey aims to highlight detection mechanisms that can detect misinformation based on any of these data types. We enlist the challenges and limitations of past research and outline ways towards future work. As we observe trends in fake news detection, a huge amount of published work and resources is available for text-based detection. A considerable number of frameworks and detection mechanisms using textual features have been designed since the problem domain emerged. Machine learning and deep learning algorithms have been applied largely to provide solutions due to their extreme popularity in this domain. These include sentiment analysis, text mining, stance classification, similarity analysis, etc. Texts are analyzed based on their sentence structures, words, punctuation, tone, grammar, and pragmatics. Textual fake news analysis is a domain well explored and worked upon by a large number of researchers.
Various other detection techniques have emerged, like using psycholinguistic features, deception modeling, fake news spreading prediction, graph-based propagation detection, and reputation scores. We present this review in a modality-focused manner. Fake news spreads in the form of textual or visual data. We aim to elaborate on how mechanisms have been developed to deal with each type of fake information that spreads textually, through images, or through videos. The fake news domain has been expressively presented by Sharma and Sharma [19], Figueira and Oliveira [20], Torabi and Taboada [21], Zhang and Ghorbani [22], Tandoc et al. [23], and Shu et al. [24]. Summaries of noteworthy works have been provided in reviews by Mosinzova et al. [25] and Rubin et al. [26]. In contrast, Rajendran et al. [27] brought to light the importance of Deep Neural Networks for stance classification. Moving towards fake visual news, the tasks performed have been comparatively fewer in number. The domain started gaining attention in 2013 and rose considerably from 2017 to 2020. Image fake news detection started gaining importance when image-accompanied news and posts started appearing at large on online platforms. The rise in visual data made way for fake news to seep in and thus encouraged researchers to explore this domain. Cao et al. [28] have highlighted the role of exploring visual data while detecting fake news. Fake news detection has shown notable performance improvements when visual analysis is combined with text. Most of the visual classification techniques applied neural networks that provided quick and efficient results. Fake images can be classified as tampered images, where manipulations of some kind have been made to the image, or as misleading images, where the context of the news content and the image do not correspond. Some fake news is also accompanied by older images from other events, i.e., combining recent news with outdated images.
Another category of fake images that recently emerged is computer-generated images, commonly created by Generative Adversarial Networks (GANs). All these types of images contribute to fake news. No FND framework has yet been able to detect fake news covering all these types collectively. Individual modules have been created for different tasks. Some architectures can classify tampered images well, while others can spot misleading images that do not match the context. Some frameworks classify images based on statistical features of textual and visual data. Parikh and Atrey [29] provided a survey of multimedia FND. There is a lack of a holistic framework that could efficiently detect fake news across all modalities. Speaking of visual data, fake news videos have proliferated across social media. These have a great impact on the minds of viewers and bring adverse social and political effects. Frameworks for video fake news detection are very few. Video analysis is a complex task: multiple features need to be studied, spanning spatial and temporal information, speech, and movements. Some researchers have exploited inconsistencies between speech and lip movements; some have analyzed whether facial expressions follow similar trends in real and fake videos, while others utilize the image information at every frame. Other multimedia news in audio, podcasts, and broadcasts is as yet less infiltrated with fake news. With an acutely low amount of work performed, video fake news detection is an emerging problem domain that researchers need to pay heed to. The domain of fake news detection gained popularity very quickly. Large amounts of unverified and non-credible posts have been misleading people. Using linguistic features for credibility assessment of content is a popular and widely used method. Here, we list the existing challenges in detecting fake news spreading through all types of data. The fake news menace is rising.
It began by spreading through text and has now started gripping users through all forms of multimedia. Visual data susceptible to fake news exploitation can be categorized into images or videos. There are various existing approaches to detect text-based fake news. However, visuals play a great role in impacting viewers' minds and are therefore being infiltrated with fake news in the current generation. An image or video can be easily modified using media editing applications. Various manipulations in visual data go unrecognized by the viewer's eye. It is very difficult for humans to observe the minor changes in modified images and videos needed to classify them as fake or real. Automated tools are required that can identify the minute variations made to fabricate or manipulate visual data. This poses a great challenge for researchers designing visual-based fake news detectors. Auditory fake news is a type of fake news that exists but goes unnoticed. Many social applications allow users to share recorded audio. These audio files are vulnerable to spreading fake content, propaganda, unverified information, and more. This type of multimedia has not been put to use for credibility assessment. There are no fake news detection mechanisms that incorporate audio as a sole modality or in combination with other data modalities. This issue needs to be addressed to prevent the contamination of audio with false content and to enable its early detection. When one type or modality of data is fixed or embedded within another type of data, it is embedded content. A new type of social media post, known as a 'meme,' is spreading widely. It is an image or a video, mostly with text embedded on it. Various forms of media like text, images, videos, GIFs, or hyperlinks are embedded into other forms. It is a complex task to analyze media that is embedded in some other data type. There is an upsurge of fake content in the form of embedded media or memes.
Efficient detection mechanisms are required to fight such misleading content. To build fake news detection mechanisms, most machine learning and deep learning tasks require large amounts of data. There is a lack of real-world multimodal datasets. Text-based fake news datasets outnumber visual or multimodal datasets. The lack of proper datasets limits the extent of research. There is a need to collect real-world fake news data that consists of various types of information like text, image, video, and meta-data. The research community has produced many techniques that can robustly detect fake news. These techniques use linguistic features, visual features, sentiment scores, social context, network/propagation-based features, meta-data, and hybrid features. At present, no single mechanism extracts all of these details from given content and predicts its integrity based on all the contributing factors. Different researchers have highlighted the importance of each of these techniques individually or in hybrid combinations. It is worthwhile to consider all these features for building a holistic fake news detector. Given the ease of information spread through the internet and online social platforms, fake news is being generated and spread at every instant. Fake news can be about anything and anyone. It spreads continuously as we interact on the internet. Existing detection tools either require users to self-validate a piece of news by fact-checking on their website/application or classify news only after it has spread and affected various aspects of life. The world needs a system that analyses content in real time and declares it fake or real based on its decision. There is a lack of significant literature in the domain of multi-modal fake news detection. Although many authors have presented textual detection mechanisms, work done in the multimodal domain is minimal and incomplete.
In this paper, we endeavour to cover the past research performed using multiple data modalities, rather than only text-based techniques. We highlight works that have used visual content, alone or along with textual content, for detecting fake news. It is easily recognizable that tasks involving visual data in the fake news domain have been rising for the past five years and are grabbing researchers' attention. The search words applied for querying digital databases include multi-modal fake news detection, image fake news detection, fake news detection, multi-modal fake news datasets, and their synonyms. Figure 4 presents the article-wise percentage distribution of algorithms and techniques utilized for multi-modal fake news detection. To the best of our knowledge, this is the first data modality-based review in fake news detection. The contributions of this study are as follows:
• Analyzing and identifying the techniques utilized in multi-modal fake news detection tasks.
• Comparing these techniques based on their applications, advantages, and disadvantages.
• Providing a comprehensive review of remarkable work done in the domain, discussing popular techniques, datasets used, and results obtained.
• Providing a detailed summary of multi-modal usable datasets for fake news detection.
• Comparing the efficiencies of the available literature in terms of the evaluation parameters utilized.
• Identifying the research gaps in multimodal fake news detection methods and enlisting potential research directions.
The organization of this paper is depicted in figure 5. Section 1 discusses the current Infodemic situation and introduces readers to the problem domain. We describe the motivation to conduct this review and the existing challenges in multimodal fake news detection. The review methodology used and the analysis of articles are described next.
In section 2, we describe the types of modalities in which fake news propagates, which form the basis of this review. Section 3 discusses several detection methods used so far across textual and visual modalities and lists their advantages, disadvantages, and applications. Section 4 provides a detailed review of the literature and important architectures built for fake news detection tasks. Section 5 serves readers with rich tabular information on benchmark multi-modal datasets for FND tasks. Section 6 explains the performance evaluation metrics various researchers have used in their works, their distribution, and the performance analysis of noteworthy architectures. In section 7, we discuss potential future research directions. Section 8 concludes this review by summarizing our work and imparting motivation and future research directions to readers. Fake news is defined as any piece of false information that misleads people. It can be deliberate, fabricated, or simply unintentional. The intent of spreading false news could be malicious, political, for gaining monetary benefits or popularity, or simply for fun. Referring to data modalities, fake news spreads through text, images, videos, audio, hyperlinks, embedded content, and hybrids. Because little or no work exists on the remaining modalities, research is limited to the textual and visual modalities. Therefore, we consider these modalities for review, as they have been explored by researchers for fake news detection. Text: This is the most popular mode of communication on the internet. People interact through textual matter on social media platforms, websites, blogs, e-mails, personal messaging, and more. Most false information spreads through text on the internet. Fake news is found propagating in social media posts, articles, and online messaging services. Text is the simplest and most used way for an internet user to convey their concerns.
Being a largely used modality for communication, it also accounts for a large amount of fake news. Figure 6 shows an example of fake textual news. The screengrab is taken from Twitter. In the tweet, the user falsely attributes a claim to WION News, which states that China is hiding the real number of deaths amidst the coronavirus pandemic. The post says that SO2 concentration around Wuhan, China, has grown due to the burning of a large number of dead bodies. The claim is false and has been debunked by various fact-checking websites. Images: Visual content is easily manipulated through cropping, splicing, copy-move, retouching, or blurring. Any image can be manipulated to convey a false message, which contributes to fake news. Often, images are not manipulated but accompanied by false text. Many times, an irrelevant or out-of-context image is placed with fake text. All of these types of images propagate false conceptions in accord with fake news. In figure 7, a Facebook post shows a girl rescuing a koala bear from Australia's bushfires. Originally, the picture is a digitally created artwork, used out of context to match the bushfire situation. Videos: Online platforms also enable the sharing of fake content through videos. Videos are a powerful and impactful tool. They are capable of successfully manipulating people through their content. Therefore, it raises a serious concern to authenticate video content and decide whether a video is credible or not. Figure 8 shows a screengrab from Twitter of a video claiming Vladimir Putin's daughter was getting the first shot of coronavirus vaccine. The girl is, in fact, a volunteer and not the Russian President's daughter. Other Modalities: Numerous data types have not been analyzed for fake news yet due to limitations of exposure and datasets. These include embedded data types, audio, and hyperlinks (clickbait). Embedded media is that where one type of data is merged or superimposed onto another. For example, textual matter on images or videos, embedding audio in images, altering audio in videos, etc.
Detection of fraudulent content in such data types is complex and challenging. There is a lack of past research in this area. Figure 9 shows a meme with text embedded on it that says the North Korean leader Kim Jong Un faked his death to expose traitors. Many such false statements and claims circulated the internet. Various fact-checking sites have debunked these claims. Various fake news detection algorithms utilized by researchers are explained below. Figure 10 depicts the percentage distribution of the algorithms and methods used in the reviewed articles. First introduced in the 1980s, CNNs have come a long way in the domain of computer vision. They have been applied to Natural Language Processing [30], image classification [31], video classification [32], object recognition [33], time-series forecasting [34], anomaly detection [35], speech analysis [36], handwriting recognition [37], and the like. For image classification, CNNs require training over large image datasets. Their learning process proved to be substantially faster than previously known methods, an underlying feature that brought CNNs into the picture. They are efficient at analyzing latent features present inside an image or a video. A rich survey of the latest Convolutional Neural Networks is provided in Khan et al. [38]. Compared to images, little work has been done in video classification using CNNs, as videos are more complex to process owing to their temporal dimension. Most works utilize CNNs to classify videos by extracting images at every video frame [39]. Another method treats the spatial and temporal domains separately, classifying them using two convolutional neural networks and fusing the outputs afterwards. Many researchers have also applied CNNs to text classification using one-dimensional convolutional networks. Recurrent Neural Networks analyze sequential inputs like text, image, speech, and video, and produce output in a feedback loop.
A network with feedback loops is created, which allows RNNs to retain information and train themselves. RNNs have been utilized in image classification [40], video classification [41], object recognition [42], video annotation [43], time-series prediction [44], anomaly detection [45], sentiment analysis [46], speech recognition, and other ML and DL tasks. LSTMs (Long Short-Term Memory networks), a special case of RNNs, have been widely utilized in fake news classification tasks across multiple modalities. These train much faster and can perform more complex classification tasks than other RNNs. 13% of the articles reviewed in this paper performing multi-modal fake news detection have utilized RNNs or a combination of RNNs and CNNs. Reverse image search can help verify the authenticity of an image. We can learn how far back an image dates and where it appeared first. Metadata can also be extracted from such visual data. It also helps us verify the context of the image against the text it accompanies. This method is used by automated fake news detection tools, applications, or web plugins. Under explicit features utilized for FND tasks, we categorize: statistical features (number of words, likes, shares, retweets, comments, reactions, etc.); similarity features, which analyze the similarity between the textual content and visual information of a news article and state how well the two are correlated; semantic features, which verify the meaningfulness of data; user profile features, which provide information about users' age groups, backgrounds, faiths and beliefs, inclinations, online social behavior, and other relevant profile information; propagation features, which help analyze the flow of fake news among networks and people; geolocation features, which study the areas of fake news generation and propagation; and other external features. These features, when combined with other modalities, improve detection accuracy. They serve as an important factor for fake news analysis and detection.
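As a toy illustration of the explicit features just listed, a post's statistical and user-profile signals can be collected into a feature dictionary before being fed to a classifier. This is a minimal sketch: the field names below are illustrative assumptions, not any platform's actual API.

```python
def extract_explicit_features(post):
    """Collect simple statistical and user-profile features from a
    social media post (field names are illustrative assumptions,
    not taken from any specific platform API)."""
    text = post.get("text", "")
    user = post.get("user", {})
    return {
        "num_words": len(text.split()),          # statistical feature
        "num_likes": post.get("likes", 0),       # statistical feature
        "num_shares": post.get("shares", 0),     # statistical feature
        "num_comments": len(post.get("comments", [])),
        "has_image": bool(post.get("image_url")),
        "user_followers": user.get("followers", 0),  # user profile feature
        "user_verified": user.get("verified", False),
    }

# Hypothetical post dictionary for demonstration
post = {
    "text": "Breaking: shark spotted on Houston freeway!",
    "likes": 1200,
    "shares": 300,
    "comments": ["wow", "is this real?"],
    "image_url": "http://example.com/shark.jpg",
    "user": {"followers": 45, "verified": False},
}
features = extract_explicit_features(post)
```

In practice, such a feature vector would be concatenated with the textual and visual embeddings described above before classification.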
Images and videos, given current technological advancements, can be easily edited and tampered with. We have classified fake news detection techniques using forgery detection, splice detection, copy-move detection, face-swapping detection, face manipulation detection, and pixel-based detection. A few other methods that have been utilized by researchers for fake news detection include co-occurrence matrices, blockchain, pattern recognition, etc. It has become popular to match the semantics between post text, image, and video. Some of the latest works verify whether a post's modalities convey the same meaning and then classify the post as real or fake. They have provided a new dimension to investigating fake news detection. This area provides opportunities to be explored further, to enhance currently available methods, and to leverage new ones. This section presents an overview of crucial research performed in visual and multi-modal fake news detection. Through the survey, we highlight the vast usage of Deep Neural Networks and forgery detection techniques in multi-modal analysis. We present the survey classified based on the modalities used for fake news detection. Qureshi and Deriche [47] explained the taxonomy of the types of forgeries found in images: copy-move forgery, image retouching, resampling, and image splicing. They also discussed pixel-based forgery detection methods in images, including contrast enhancement detection, sharpening filtering detection, median filtering detection, resampling detection, post-processing editing detection, copy-move detection, and image splicing detection. Brezeale and Cook [48] provided a survey of existing video classification methods that classify videos using text features, among other modalities. Image tampering has become easier than ever, given the advances in photo editing tools. It is crucial to detect such forgeries to keep a check on fake news data. Fake images accompanying fake news are categorized as in figure 11.
The categories are: tampered/edited images, outdated images reused in a later situation, and images out of context with the accompanying text. Table 2 summarizes all the tasks related to fake news detection that involve visual modalities. It helps to understand the necessary details of the related works easily. In table 3, a summary of supportive works is provided, which utilize data forensics mechanisms to identify tampered visual data. In one approach, results were fused into a message-level feature vector as an extra feature before training the classifier; each tweet was given a separate label instead of labeling each event. Shu et al. [62] presented a way to utilize user profile features for fake news detection. They extracted and studied explicit and implicit user profile features, also studying which users were most likely to share real and fake news. Chen et al. [73] argue the need for an automatic fake news detector tool for evaluating the integrity of any news online. Using multi-modal features, Budack et al. [74] measured the consistency between the modalities for fake news verification. The proposed work evaluates the coherence between text and image data in an unsupervised manner. The textual module extracts persons, entities, and locations using Named-Entity Recognition (NER). POS tagging is applied, and subsequently, embeddings are calculated using fastText. For visual features, the ResNet model is used. Verification of persons, locations, events, and scene context is performed in the cross-modal entity verification process. This model applies to real-world news classification. Parikh et al. [75] performed the task on tweet text and images, providing a web application. The UI allows users to upload a screengrab of a tweet, from which the model extracts useful information like tweet text, image, username, timestamp, location, etc., and predicts the authenticity of the tweet using these features.
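The cross-modal consistency idea used by works such as Budack et al. can be sketched with plain cosine similarity between text and image embeddings: if the two modalities point in clearly different semantic directions, the post is flagged as potentially misleading. The toy vectors below are assumptions standing in for real fastText/ResNet embeddings, and the threshold is illustrative.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def modalities_consistent(text_emb, image_emb, threshold=0.5):
    """Flag a post as cross-modally consistent when its text and
    image embeddings are semantically close (threshold is a
    hypothetical choice, not from any specific paper)."""
    return cosine_similarity(text_emb, image_emb) >= threshold

# Toy embeddings: the matching pair points the same way,
# the unrelated pair is nearly orthogonal.
text_emb = np.array([1.0, 0.9, 0.1])
matching_img = np.array([0.9, 1.0, 0.2])
unrelated_img = np.array([-0.1, 0.2, 1.0])
```

A post whose text and image embeddings fail this check would be a candidate for the "out-of-context image" category described earlier.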
It has become easier to create fake scenarios in videos by replacing, removing, or adding content. Nixon et al. [76] annotated videos as real or fake to verify the news. They analyzed the text of news stories and videos circulating online and used their metadata to fact-check and annotate the videos. In textual analysis, the authors check the stories for correctness, distinctiveness, homogeneity, and completeness and then group them under clusters. The videos related to these news stories were then retrieved and annotated based upon fragment information. Bagade et al. [77] developed a fact-checking web and mobile application, 'Kauwa-Kaate,' for full-article verification incorporating text, images, and videos. Their proposed system provides a user-friendly interface to query and fact-check information as and when users encounter it. The algorithm scrapes news articles from fact-checked and trusted news sources available on the internet and maintains a repository in the backend. The verification is carried out by matching the query item with news articles in the repository. Devoting a platform entirely to fact-checking is a very practical method for users to verify fake news. Using several tweet-based and user-based features, Boididou et al. [78] have introduced a model that predicts tweets as fake or real depending upon the majority vote from individual classifiers that use different features. The approach has displayed successful classification of news from a variety of events. An event rumor detection mechanism for Sina Weibo has been designed by Sun et al. [79]. The model uses content-based linguistic features, user-based features, and multimedia features. It is suitable for detecting rumors in the form of text, image, and video. Detecting fake news in visual data is closely related to identifying manipulations in images and videos.
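A majority-vote combination of single-feature classifiers, of the kind Boididou et al. describe, can be sketched as follows. The tie-break towards 'fake' is our own cautious assumption, not necessarily the authors' rule, and the per-classifier verdicts are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine the labels of several single-feature classifiers by
    majority vote. Ties default to 'fake' as the cautious choice
    (an assumption, not necessarily the original tie-break rule)."""
    counts = Counter(predictions)
    return "fake" if counts["fake"] >= counts["real"] else "real"

# Hypothetical verdicts for one tweet: one classifier uses
# tweet-based features, one user-based features, one image forensics.
votes = ["fake", "real", "fake"]
verdict = majority_vote(votes)
```

The appeal of this design is that each classifier stays simple and interpretable, while the ensemble tolerates any single feature set being uninformative for a given event.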
The utilization of data manipulation detection algorithms for fake image/video detection offers worthy scope. In this sub-section, we identify some of the important manipulation detection techniques that could be useful in fake news detection tasks. A new convolutional layer for detecting image manipulations has been proposed by Bayar and Stamm [80]. Sabir et al. [81] and Jaiswal et al. [82] have detected image repurposing, where unaltered images are put together with false metadata in a news item. Pomari et al. [83], Zampoglou et al. [84], and Wu et al. [85] detected fake images by checking whether the images or their portions had been spliced. Another line of work has detected face swapping, where the face of a person is replaced by another, using deep neural networks. Guera and Delp [102] also contributed a dataset of 300 deep-fake videos extracted from websites. They classified videos using CNN and LSTM into pristine and deep-fake categories. A CNN is used for feature extraction from video frames, concatenated and propagated to an LSTM for analyzing sequences temporally. This architecture allows detecting fake videos as short as 2 seconds in length. Korshunov and Marcel [101] showed that using static frame features corresponds to higher accuracies than audio-visual analysis. Nguyen et al. [103] identified forgeries like replay attacks and computer-generated images/videos by building a capsule network with CNN layers. Videos are analyzed at the frame level, and the fake and real probabilities of every frame are averaged to generate the result for the video. The lack of suitable multi-modal datasets has considerably hampered progress in fake news detection. Deep learning algorithms largely depend on huge amounts of training data, which, being meager, has posed a big challenge. Most fake news detection frameworks built to date have been trained upon data extracted from Twitter, Sina Weibo, or various websites.
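The frame-level scheme shared by several of the video approaches above, scoring each frame with an image model and averaging the per-frame probabilities, can be sketched as follows. `cnn_score` is a hypothetical stand-in for a trained CNN, and the synthetic frames exist only to make the sketch runnable.

```python
import numpy as np

def cnn_score(frame):
    """Stand-in for a trained image CNN returning the probability
    that a single frame is fake (hypothetical placeholder: a real
    model would be a learned network, not a pixel mean)."""
    return float(frame.mean())

def classify_video(frames, threshold=0.5):
    """Frame-level video classification: score every frame with the
    image model, average the per-frame fake probabilities, and
    threshold the mean to obtain the video-level label."""
    probs = [cnn_score(f) for f in frames]
    mean_prob = float(np.mean(probs))
    label = "fake" if mean_prob >= threshold else "real"
    return label, mean_prob

# Toy example: four 8x8 frames of constant synthetic pixel data in [0, 1]
video = [np.full((8, 8), v) for v in (0.9, 0.8, 0.7, 0.6)]
label, score = classify_video(video)
```

Averaging over frames is robust to a few misclassified frames, which is one reason the per-frame approaches reviewed above favour it over single-frame decisions.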
A few other small-sized datasets have been generated for image-based tasks. There is an urgent demand for good-quality multi-modal datasets to furnish the need of the hour. The advancements in data augmentation and computer-generated data are beginning to contribute towards building datasets. For the time being, we present tabulated information about available datasets (image, video, and multi-modal) that have been used in the above-reviewed articles for fake news detection and similar tasks (Table 4). We also list datasets that contain news article URLs or image/video URLs. These datasets can be further improved by extracting visual data using web scraping methods. In this section, we demonstrate the usage of evaluation metrics utilized by fake news detection tasks and compare their performances based on the most utilized metrics, i.e., accuracy and F1-score. The comparison provided here is irrespective of the dataset but highlights each task's features and methods. We determine how results have evolved over the years and identify promising detection methods. Performances are displayed per task, showing the results achieved by the experiments on the datasets used. Visual representations are provided for an easy understanding of how a given model performs when it uses a specific set of features. This section discusses the various evaluation parameters utilized for fake news classification tasks. We explain the evaluation methods utilized to examine a model's performance using Accuracy, Precision, Recall, F-score, ROC, AUROC, and EER. In figure 16, a confusion matrix is presented that explains the categorization of correctly and wrongly classified items. We refer to the items as news, images, and videos.
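A minimal sketch of how the confusion-matrix counts yield the four metrics most reviewed works report; the counts in the example are invented for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 directly from
    the confusion-matrix counts (tp = fake items correctly flagged,
    fp = real items wrongly flagged, fn = fake items missed,
    tn = real items correctly passed)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical example: 80 fake items caught, 10 real items wrongly
# flagged, 20 fake items missed, 90 real items correctly passed.
m = classification_metrics(tp=80, fp=10, fn=20, tn=90)
```

F1, the harmonic mean of precision and recall, is the metric most of the comparisons in this section rely on, since accuracy alone is misleading on class-imbalanced fake news datasets.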
EER, the Equal Error Rate, measures the classification error at the operating point where the false-positive and false-negative rates are equal. Table 5 identifies the evaluation metrics used by the reviewed tasks. Several metrics and parameters have been developed to express functional performance, or in simpler terms, an algorithm's efficiency in producing the desired classification output on a given dataset. Among them, the most widely utilized and relied-upon metrics are Accuracy (in percentage), Precision, Recall, and F1, alongside others such as AUC and EER. Almost all major research on fake news detection utilizes one or more of the former four metrics (Accuracy, Precision, Recall, and F-score). Hence, we bring forth a summary of these metric evaluations for the most relevant and pivotal experiments conducted for fake news detection. Figure 17 demonstrates the results of tasks in terms of F1-scores. We observe that overall performance stays between 80-95% for methods that combine textual and visual features. The video classification task that uses the annotation technique still has a long way to go. In terms of accuracy (Figure 18), we observe that results range between 70-100%, with an average score of 85%. The majority of fake news classification tasks have relied upon deep neural networks, and with changing times we also notice an inclination towards forensic algorithms. Trends show that most existing approaches prefer deep learning algorithms for their efficiency, robustness, feasibility, and accuracy. Most works prefer to use more than one feature, i.e., multimodal data, thereby exploiting the additional cues that fake information posts can offer.
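As a worked illustration of EER, the sketch below sweeps candidate thresholds over the scores of ground-truth fake and real items and returns the error rate at the point where the false-positive and false-negative rates (approximately) coincide. The function name and the simple linear sweep are our own illustrative choices:

```python
def equal_error_rate(fake_scores, real_scores):
    """Approximate EER for a score-based fake/real classifier.

    Scores are 'probability of fake'; higher means more likely fake.
    fake_scores: scores of items whose ground truth is fake.
    real_scores: scores of items whose ground truth is real.
    """
    best = (1.0, 1.0)  # (|FPR - FNR|, candidate EER)
    for t in sorted(set(fake_scores) | set(real_scores)):
        # Real items scored at or above t are false positives;
        # fake items scored below t are false negatives.
        fpr = sum(s >= t for s in real_scores) / len(real_scores)
        fnr = sum(s < t for s in fake_scores) / len(fake_scores)
        if abs(fpr - fnr) < best[0]:
            best = (abs(fpr - fnr), (fpr + fnr) / 2)
    return best[1]
```

With fake scores [0.9, 0.8, 0.7, 0.4] and real scores [0.1, 0.2, 0.3, 0.6], the rates cross at threshold 0.6, giving an EER of 0.25.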
The aim is to consider all parameters that form or alter a user's perception of a piece of information. Convolutional neural networks, with maximum usage among the reviewed articles, have displayed eminent classification performance by exploiting implicit features. They hold the potential to provide better results in future implementations. Research done in past years is substantial, yet insufficient to cope with the amount of fake news pouring in. Each new happening or event in the world serves as a topic for fake news generation and propagation. In the present scenario, while a pandemic is going on, fake news reaches people more swiftly than authentic news does. No data modality is left behind in the race of spreading fake news: text is not the only type of data consumers should be wary of. People need to be careful while digesting anything available on the internet, because false information can arrive in any form, be it text, image, or video. Hence the need for designing efficient and robust detection mechanisms. Analyzing the limitations and research gaps, this section highlights potential directions in which research can proceed. With the assistance of deep learning algorithms, real-time detection models can be built that train on fact-checked articles from the web and generate predictions for unseen data. There is wide opportunity for the development of real-time detectors and automated fact-checkers. Fake news detectors are built by feeding past data to algorithms, with baseline comparisons made against previous data; such algorithms are built only after the fake information has already spread into the world and affected many. Entrapped within the fake news web, the world requires early detection of false news as and when it appears online. Users benefit from fake news detectors only when they provide early detection that prevents the propagation of fake news at large scale.
Early detection would allow intervention, and thus mitigation of fake news before it spreads to a larger audience. With many social networking platforms available, it is challenging to incorporate a fake news detection mechanism into each platform individually. Similar content makes the rounds on multiple platforms because one user can hold accounts on various networks, creating replicas of data across social networks. With the help of such redundant data and manual annotations, classification becomes easier for deep neural networks. A cross-platform system is required that can detect fake content on multiple social platforms; we suggest implementing models that train on manually annotated content from one platform and then identify fake news on others. As far as previous research is concerned, we have very few frameworks that provide credibility assessment across fake content types. Most techniques consider text only, while some allow visual verification. It is challenging for a single system to verify the contents of all data modalities, yet such a system would be most beneficial for the general public to authenticate information. 6. Feature-oriented Detection: All existing approaches use a limited subset of features, whether linguistic, visual, hybrid, data-centered, sentiment scores, social context, network-based, user-based, or post-based. These contributing factors of fake news identification could be used together for more dependable predictions. In multimodal approaches, existing works perform detection based on features from each type of data independently. In many fake news instances, however, the post contents are not semantically related: the text, image, or video of a given post may express unrelated contexts. Few works focus on assessing the semantic integrity of the news, which helps detect false news where the data modalities have not been manipulated but are unrelated to each other.
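One way to operationalize the semantic-integrity idea above is to compare embeddings of a post's text and image and flag low agreement as potentially out-of-context. The encoder, the 0.3 threshold, and the function names here are hypothetical illustrations, not a published method:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def is_out_of_context(text_emb, image_emb, threshold=0.3):
    """Flag a post whose text and image embeddings disagree semantically.

    Embeddings would come from any joint text/image encoder; the
    threshold is a tunable assumption, not a published value.
    """
    return cosine_similarity(text_emb, image_emb) < threshold
```

A joint encoder trained so that matching text-image pairs score high would make this check meaningful; the sketch only shows where such a check slots into a detection pipeline.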
Such integrity assessment tools shall help identify out-of-context news items. Detection of fake embedded content has not yet been attempted, although a large volume of fake news spreads through such data; mechanisms to detect it are therefore required. Current approaches focus on English-language data in text and videos. Due to the spread of fake news through regional languages on the web, multilingual approaches should be considered to detect fake news in other languages in the form of text, videos, or embedded content. With the growing maturity of image and video forensics, forgery detection has become easier. Manipulation detection techniques such as face-spoofing detection, deepfake identification, tampering detection, splicing and copy-move detection, and object removal/addition detection should be merged with fake news detection mechanisms; there is a need to bring the domains of fake news detection and data manipulation detection together. The availability of fake news detector tools in the form of easy-to-use browser plugins, add-ons, software, and mobile applications will enhance their accessibility and serve detection on a per-user basis. The uncontrolled, unauthenticated data overloading the web demands appropriate solutions to the complexities it generates and has become a hard nut to crack. CNNs are taking the lead in computer vision and allied domains and have become a prospective tool for future FND tasks. Many researchers have identified fake images and videos, and tampered regions within them; we review these as supportive tasks that can help classify fake news based on fake visuals. We motivate readers to combine such tasks with FND modules to perform multimodal FND: by fusing modules that perform such tasks on different modalities, optimized performance can be achieved. Consolidated literature in this domain has so far been unavailable.
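The module-fusion suggestion above can be sketched as simple late fusion: each modality's detector emits a fake probability, and a weighted average yields the final score. The weights, names, and uniform default are illustrative assumptions; a real system might instead learn the fusion:

```python
def fuse_modalities(scores, weights=None):
    """Late fusion of per-modality fake probabilities.

    scores: maps modality name -> probability of fake from that
    modality's detector (text classifier, image forensics module,
    deepfake detector, ...).
    weights: optional per-modality trust weights; defaults to uniform.
    """
    if weights is None:
        weights = {m: 1.0 for m in scores}
    total = sum(weights[m] for m in scores)
    # Weighted average keeps the fused value in [0, 1].
    return sum(scores[m] * weights[m] for m in scores) / total
```

For example, uniform fusion of {"text": 0.9, "image": 0.6, "video": 0.3} gives 0.6, while upweighting a trusted text module shifts the verdict accordingly.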
Progress along the pathways of multimodal fake news detection has been slow, and researchers are often unaware of the advancements reached so far. Existing literature is focused upon fake textual news and its detection mechanisms. This survey acknowledges that shortage and provides a broad, integrated overview of multimodal fake news detection that incorporates image, video, audio, and their combinations with text. We engage the readers with a taxonomy of detection methods utilized in the referenced articles, identify the methods with the most applications, and highlight their prospects. In this review, we have endeavored to cover almost all the significant works performed in relevant domains. We segregate articles based on the modalities employed in them, demonstrate the year-wise trend of work, and analyze their method-wise distribution. We summarize all the notable work and highlight important techniques that may yield powerful algorithms in future fake news classification tasks. Further, we described the evaluation metrics adopted in research so far: accuracy, precision, recall, and F-scores were observed for most tasks, whereas some evaluated their models' performances using AUC and ROC; a few other metrics utilized were EER, HTER, TPR, and FPR. Accuracy appears as the most adopted metric. Owing to the scarcity of multimodal datasets, we regard the obstacles faced by full-fledged research as reflected in suboptimal solutions. Hence, we provide collective information on all the good-quality image, video, and multimodal datasets available that have previously been used in various tasks. This provides a route for fellow researchers to choose efficiently among the few available datasets and perform future research. We also encourage them to build good-quality multimodal fake news datasets that would benefit this domain.
Concerning future work, we promote a multimodal framework that would efficiently detect fake news in all the forms that circulate on the internet. We suggest exploring the domain of fake news in the form of videos, and we motivate researchers to build versatile multimodal datasets for future use, collecting information from websites, online social platforms, and the like. We encourage readers to dive deeper into machine learning and deep learning algorithms and derive effective solutions to this problem domain. Our work helps bridge the research gaps and points to potential future opportunities. We conclude this review anticipating that interested researchers will benefit from the information provided and narrow their interests to this domain to contribute to society and the research community.

References
- A Dataset of Fact-Checked Images Shared on WhatsApp During the Brazilian and Indian Elections
- Faking sandy: Characterizing and identifying fake images on twitter during hurricane sandy
- A survey on natural language processing for fake news detection
- Detecting opinion spams and fake news using text classification
- Sentiment analysis for fake news detection by means of neural networks
- Graph Neural Networks with Continual Learning for Fake News Detection from Social Media
- Fake News Detection: A long way to go
- The current state of fake news: Challenges and opportunities
- Big Data and quality data for fake news and misinformation detection
- An overview of online fake news: Characterization, detection, and discussion
- Defining 'Fake News': A typology of scholarly definitions
- Fake News Detection on Social Media
- Fake News, Conspiracies and Myth Debunking in Social Media - A Literature Survey Across Disciplines
- Deception detection for news: Three types of fakes
- Stance-In-Depth Deep Neural Approach to Stance Classification
- Exploring the Role of Visual Content in Fake News Detection
- Media-Rich Fake News Detection: A Survey
- Convolution Neural Network for Text Mining and Natural Language Processing
- Review of deep convolution neural network in image classification
- Large-scale video classification with convolutional neural networks
- 3D convolutional neural network for object recognition: a review
- Convolutional neural networks for time series classification
- An empirical study on network anomaly detection using convolutional neural networks
- An analysis of convolutional neural networks for speech recognition
- A comparative study on handwriting digit recognition using neural networks
- A survey of the recent architectures of deep convolutional neural networks
- A Deep Learning Approach for Multi-modal Deception Detection
- Deep recurrent neural networks for hyperspectral image classification
- Beyond short snippets: Deep networks for video classification
- Recurrent convolutional neural network for object recognition
- Hierarchical recurrent neural encoder for video representation with application to captioning
- Recurrent neural network for time series prediction
- Long short term memory networks for anomaly detection in time series
- Recurrent neural networks for sentiment analysis
- A bibliography of pixel-based blind image forgery detection techniques
- Automatic video classification: A survey of the literature
- Verifying information with multimedia content on twitter: A comparative study of automated approaches
- Leveraging Heterogeneous Data for Fake News Detection
- Deepfakes and beyond: A survey of face manipulation and fake detection
- Multi-modal, Semi-supervised and Unsupervised web content credibility analysis frameworks
- What You See is What You Get? Automatic Image Verification for Online News Content
- EANN: Event adversarial neural networks for multi-modal fake news detection
- Image Credibility Analysis with Effective Domain Transferred Deep Networks
- Exploiting multi-domain visual information for fake news detection
- Detection and veracity analysis of fake news via scrapping and authenticating the web search
- Fighting Fake News: Image Splice Detection
- SAME: Sentiment-aware multi-modal embedding for detecting fake news
- MCG-ICT at MediaEval 2015: Verifying multimedia use with a two-level classification model
- The role of user profiles for fake news detection
- Fake news identification on Twitter with hybrid CNN and RNN models
- SpotFake: A multi-modal framework for fake news detection
- Fake tweet buster: A webtool to identify users promoting fake news on Twitter
- Faking sandy: Characterizing and identifying fake images on twitter during hurricane sandy
- Visual and Textual Analysis for Image Trustworthiness Assessment within Online News
- TI-CNN: Convolutional Neural Networks for Fake News Detection
- Fake News: A Technological Approach to Proving the Origins of Content, Using Blockchains
- Identifying tweets with fake news
- NewsVallum: Semantics-Aware Text and Image Processing for Fake News Detection system
- SAFE: Similarity-Aware Multi-modal Fake News Detection
- News in an online world: The need for an 'automatic crap detector'
- Multi-modal analytics for real-world news using measures of cross-modal entity consistency
- A framework to detect fake tweet images on social media
- Multi-modal Video Annotation for Retrieval and Discovery of Newsworthy Video in a News Verification Scenario
- The Kauwa-Kaate fake news detection system: Demo
- Detection and visualization of misleading content on Twitter
- Detecting Event Rumors on Sina Weibo Automatically
- A deep learning approach to universal image manipulation detection using a new convolutional layer
- Deep multi-modal image-repurposing detection
- AIRD: Adversarial learning framework for image repurposing detection
- Image Splicing Detection Through Illumination Inconsistencies and Deep Learning
- Detecting Image Splicing in the Wild (Web)
- Deep matching and validation network: An end-to-end solution to constrained image splicing localization and detection
- Detecting both machine and human created fake face images in the wild
- Detection of GAN-Generated Fake Images over Social Networks
- Modular convolutional neural network for discriminating between computer-generated images and photographic images
- Distinguishing Computer Graphics from Natural Images Using Convolution Neural Networks
- Exposing computer generated images by using deep convolutional neural networks
- Recurrent Convolutional Strategies for Face Manipulation Detection in Videos
- Automated face swapping and its detection (Conf. Signal Image Process., ICSIP 2017)
- Two-Stream Neural Networks for Tampered Face Detection
- Face image manipulation detection based on a convolutional neural network
- ManTra-Net: Manipulation tracing network for detection and localization of image forgeries with anomalous features
- Detecting Photoshopped faces by scripting Photoshop
- BusterNet: Detecting copy-move image forgery with source/target localization
- Identification of deep network generated images using disparities in color components
- Detecting GAN generated fake images using co-occurrence matrices
- Fake news detection by image montage recognition
- DeepFakes: A New Threat to Face Recognition? Assessment and Detection
- Deepfake Video Detection Using Recurrent Neural Networks
- Capsule-forensics: Using Capsule Networks to Detect Forged Images and Videos
- Learn Convolutional Neural Network for Face Anti-Spoofing
- Speaker inconsistency detection in tampered video
- Deception detection in videos
- Deception detection using real-life trial data
- In Ictu Oculi: Exposing AI created fake videos by detecting eye blinking
- Local tampering detection in video sequences
- Novel Visual and Statistical Image Features for Microblogs News Verification
- MVAE: Multi-modal variational autoencoder for fake news detection
- Web video verification using contextual cues
- Multi-modal fusion with recurrent neural networks for rumor detection on microblogs
- MediaEval 2016: A multi-modal system for the Verifying Multimedia Use task
- Multimedia semantic integrity assessment using joint embedding of images and text
- Fact-checking meets fauxtography: Verifying claims about images
- Detecting fake news stories via multi-modal analysis
- The Point Where Reality Meets Fantasy: Mixed Adversarial Generators for Image Splice Detection
- r/Fakeddit: A new multi-modal benchmark dataset for fine-grained fake news detection
- NewsBag: A Multi-modal Benchmark Dataset for Fake News Detection
- ReCOVery: A Multi-modal Repository for COVID-19 News Credibility Research