title: Multimodal Dual Emotion with Fusion of Visual Sentiment for Rumor Detection
authors: Wang, Ge; Tan, Li; Shang, Ziliang; Liu, He
date: 2022-04-25

In recent years, rumors have had a devastating impact on society, making rumor detection a significant challenge. However, existing studies on rumor detection ignore the intense emotions conveyed by the images in rumor content. This paper verifies that image emotion improves rumor detection. A Multimodal Dual Emotion feature for rumor detection, which consists of visual and textual emotions, is proposed. To the best of our knowledge, this is the first study to use visual emotion in rumor detection. Experiments on real-world datasets verify that the proposed features outperform state-of-the-art sentiment features and can be plugged into existing rumor detectors to improve their performance.

With the advancement of the information age, the speed of information dissemination on the Internet has reached an unprecedented level. Social media and micro-blogging have gradually become the preferred ways for people to collect and disseminate information. An existing study [1] shows that more than two-thirds (68%) of Americans get news from social media, and journalists also use social media as a convenient and powerful work platform. Although online media has achieved success in communication and practicality, it also contributes to the rapid growth and spread of rumors. According to the 2018 Internet Development Trends Report [2], more than one-third of social media news events contain false information such as fabricated images, videos and texts. The rapid spread of rumors has proven to have serious consequences. For instance, the official Twitter account of the Associated Press was hacked on April 23, 2013, and tweeted that two explosions at the White House had injured the president. Although this rumor was quickly debunked, it had already spread to millions of users, causing severe social panic and a rapid stock market crash [3]. In addition, some rumors about COVID-19 pose even more irreversible threats to life, such as false claims suggesting that drinking bleach cures the disease [4]. Therefore, if rumors cannot be detected in time, sensational news may cause severe social panic, and rumors can have a powerful impact during the outbreak of emergencies [5], [6] such as the novel coronavirus incident [7]. Thus, rumors on social media are a major concern.

In recent years, several enterprises and researchers have focused on rumor detection. Rumors are defined as stories or reports currently circulating whose truth is uncertain or dubious [8]. Most early rumor detection methods are based on text features [9]-[11] and visual features [12]-[16]. Studies have found that rumors often carry strong personal views and emotional color to attract people's attention, and that they exploit people's curiosity and social media circulation to spread quickly [17]. Several studies highlight emotional features, add them to rumor classification experiments, and thereby obtain more accurate results [9], [19]-[21]. Unfortunately, there are no rumor detection studies involving visual emotion features.
No existing study incorporates into rumor detection the extreme emotions that users experience when they see rumor images. In fact, to deepen people's impression of rumors and lend them credibility, rumor publishers often exploit the emotional information of images, taken from historical events or generated by computers, to attract attention and enrich the rumor's emotional content. Studies have shown that people obtain more intuitive emotions from vision [22]. For instance, in Figure 1, most of the textual information of the rumor is a plain statement without much sentiment, so extracting emotional features from the text alone has minimal effect. On the contrary, the rumor publisher instills more emotional content into the images, so that users experience the emotional color more intuitively through vision. Therefore, extracting the emotional features of the images can, in theory, provide more help. For this reason, inspired by dual emotion [19], this paper attempts to incorporate automatically extracted multimodal emotion into dual emotion in order to support rumor detection, thereby overcoming the limitation of using text emotion alone.

The main contributions of this paper are summarized as follows:
• To the best of our knowledge, this is the first study that automatically extracts multimodal sentiment for rumor detection, incorporating visual emotion into dual emotion.
• An accurate multimodal approach is proposed to classify social media posts using only the post content (i.e., text and attached images).
• Experiments are conducted on real-world datasets. The results show that: 1) the Multimodal Dual Emotion features outperform existing sentiment features for rumor detection; 2) the multimodal dual emotion feature module can be attached to existing multimodal rumor detection methods and improves their performance.

The remainder of this paper is organized as follows. In Section 2, the related work on rumor detection and visual emotion feature extraction is introduced. Section 3 presents the problem statement. In Section 4, the proposed Multimodal Dual Emotion rumor detection framework is detailed. In Section 5, the details, results and analysis of the experiments are presented. Finally, the conclusions and future work are drawn in Section 6.

In response to the problem of rumor detection, [9] evaluate the credibility of platform-specific topic-related information by manually designing different types of statistical features, such as sentence length and counts of positive and negative emotional words. In addition, [10] analyze fake news and rumors together with their emotional features; their experiments show that fake news and rumors carry more emotional color, which can improve rumor detection. [23] manually extract words with emotional features such as emotion, morality and hyperbole in the news, and extract the emotional difference between fake news and real information using Bi-LSTM. [19] use publisher sentiment and social emotion features to assist fake news detection. However, their method only uses textual information and emotion to address rumor detection, and lacks visual semantic and emotional features. In fact, several experiments have shown that visual features have a positive impact on rumor detection.
[14] successfully extract the visual features of rumor microblogs and propose a recurrent neural network-based deep learning method. [12] and [13] recently explore the impact of multimodal data on fake news detection. Their experiments show that, owing to the higher attractiveness of visual information, visual features play a more important role in such problems. However, these works still do not use image emotion features to help solve the rumor detection problem; combining rumor detection with automatic image emotion recognition remains largely unexplored.

The problem of image emotion recognition has been widely studied since 2015. Several studies consider statistical color features in order to identify the important features for image emotion prediction. For instance, [24] use psychological and art-theory-based features such as color contrast. In addition, [25] use a pre-trained neural network model for image sentiment classification, treating it as a binary classification problem with positive and negative sentiment labels. [26] classify image sentiment into eight categories and train eight classifiers to solve the image sentiment classification problem. However, these methods can only output a small number of emotional states. [27] and [28] demonstrate that, with the two valence-arousal parameters located in a continuous space, subtle changes of image emotion can be captured better than in earlier works.

In this work, we address the realistic rumor detection scenario faced by social media platforms. We define a rumor detection dataset as C = {C_1, C_2, ..., C_i, ..., C_N}, where C_i is the i-th post of C and N is the number of posts. Each post C_i ∈ C is associated with a ground-truth label y_i ∈ {0, 1}. Each post C_i = (T_i, V_i, Comment_i) is a tuple consisting of text T_i, image V_i, and a set of n comments Comment_i = {comment_j}_{j=1}^{n}. Our task is to classify a post into a class defined by the specific dataset, such as the veracity labels Real/Fake.

For modeling rumor detection, we use visual emotion and text emotion jointly to construct a multimodal dual emotion method that better learns the similarity and difference between multimodal publisher emotion and social emotion. It is combined with a rumor detector that learns image semantics and text features, and finally predicts the veracity of rumors. Furthermore, the proposed multimodal dual emotion method can be added as a plug-in to existing state-of-the-art multimodal rumor detectors to improve their performance on the rumor detection task. Figure 2 shows the framework of the proposed multimodal dual emotion rumor detection method. This section details the proposed Multimodal Dual Emotion modeling method and how it is combined with the rumor detector.

Because rumor publishers sometimes do not instill much emotion into the sentences of a rumor, but rather instill more emotion-arousing information into the published images and videos, a multimodal publisher sentiment feature that combines the visual sentiment feature and the text sentiment feature is proposed.

1) Publish Text Emotion: Five sentiment features are used in this part, including the sentiment category, lexicons, intensity, score and other auxiliary features.
The sentiment category, intensity, score and lexicon features provide overall information, while the other auxiliary features provide word- and symbol-level information. Consider the i-th blog post text T_i.

a) Sentiment category and sentiment score: The sentiment category is the probability of each of the 8 sentiments contained in the given text, namely anger, anticipation, disgust, fear, joy, sadness, surprise and trust. For a given text T_i and sentiment classifier f(·), f(T_i) is the sentiment category prediction for T_i. Therefore, the sentiment category feature is TE_classify^{T_i} = f(T_i). Assuming that the dimension of the sentiment category feature is D_f, TE_classify^{T_i} ∈ R^{D_f}. In addition, the sentiment score is a score for each of the 8 sentiments contained in the given text. Compared with the sentiment categories, the sentiment score describes the degree of each emotion more precisely and expresses the positive or negative polarity of the whole text for each sentiment. For a given text T_i and sentiment score computation method f_score(·), f_score(T_i) is the sentiment score prediction. Therefore, the sentiment score feature is TE_score^{T_i} = f_score(T_i). Assuming that the dimension of the sentiment score is D_score, TE_score^{T_i} ∈ R^{D_score}.

b) Sentiment lexicons and sentiment intensity: An existing study [29] demonstrates that emotion expression can be described by modeling the specific emotion-bearing words in the text. Therefore, rumor sentiments are extracted using sentiment lexicons annotated by experts. In this paper, it is assumed that the given text T_i covers n sentiments T_emo = {e_1, e_2, ..., e_n}, and that each sentiment e_i ∈ T_emo is provided with an annotated sentiment dictionary Φ_{e_i} = {φ_{e_i,1}, φ_{e_i,2}, ..., φ_{e_i,L_e}} of length L_e. Sentiment scores are aggregated over every sentiment word in the text in order to model the sentiment. For each sentiment word of a given text T_i, a sentiment word score S(w_r, e_i) is computed (cf. Eq. 1), where w_r is the r-th word of text T_i and adverb(w_r) represents the score contribution of negation words and degree adverbs, computed as Eq. 2, where deny(w_r) indicates whether there is a negative modifier (cf. Eq. 3) and degree(w_r) represents the degree score of the modified sentiment word. After computing the score of each emotional word of each sentence, the word scores are accumulated to obtain the sentiment lexicon score S(e_i) (cf. Eq. 4) corresponding to emotion e_i ∈ T_emo. Finally, all the computed sentiment dictionary scores are concatenated to obtain the sentiment lexicon feature TE_lexicon^{T_i} ∈ R^{D_lexicon}, whose dimension is D_lexicon (cf. Eq. 5).

In addition, in order to compute a more fine-grained sentiment dictionary score, a distinction between different levels of sentiment words is added to the sentiment dictionary. For instance, the word "sad" has a higher intensity than the word "blue". Therefore, the emotional words of each degree are manually graded in the dictionary, and corresponding rating scores Grade(w_r, e_i) are assigned; an illustrative sketch of this lexicon-based scoring is given below.
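As an illustration of the lexicon-based scoring described above (Eqs. 1-6), the following Python sketch aggregates a per-emotion score from sentiment words, negation words, degree adverbs and manually graded intensities. The lexicon structures, the one-word modifier window and the specific modifier values are assumptions for illustration; the paper's exact formulas are not reproduced here.

```python
from typing import Dict, List, Set

def lexicon_emotion_scores(
    tokens: List[str],
    lexicons: Dict[str, Set[str]],        # emotion e_i -> set of words in its dictionary Φ_{e_i}
    negations: Set[str],                  # negation words, e.g. "not", "never"
    degree_adverbs: Dict[str, float],     # degree adverb -> weight, e.g. {"very": 1.5}
    grades: Dict[str, float] = None,      # optional word -> manually assigned intensity grade
) -> Dict[str, float]:
    """Aggregate a score S(e_i) for every emotion e_i in the spirit of Eqs. 1-6: each sentiment
    word contributes a score adjusted by a preceding negation or degree adverb, optionally
    weighted by its manually graded intensity."""
    scores = {emotion: 0.0 for emotion in lexicons}
    for r, word in enumerate(tokens):
        for emotion, words in lexicons.items():
            if word not in words:
                continue
            modifier = 1.0                                   # adverb(w_r) in Eq. 2
            if r > 0 and tokens[r - 1] in negations:         # deny(w_r) in Eq. 3
                modifier = -1.0
            elif r > 0 and tokens[r - 1] in degree_adverbs:
                modifier = degree_adverbs[tokens[r - 1]]     # degree(w_r)
            grade = grades.get(word, 1.0) if grades else 1.0  # Grade(w_r, e_i)
            scores[emotion] += modifier * grade              # accumulate S(w_r, e_i) into S(e_i)
    return scores

# Toy usage with a hypothetical one-emotion lexicon
print(lexicon_emotion_scores(
    tokens="this is not very sad".split(),
    lexicons={"sadness": {"sad", "blue"}},
    negations={"not", "never"},
    degree_adverbs={"very": 1.5},
))
```

Only the directly preceding token is inspected here; a real implementation would use a larger negation window and dictionary-specific weights.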
The emotional intensity feature is then computed by weighting the scores S_grade(e_i) as Eq. 6. Finally, the sentiment intensity feature TE_grade^{T_i} ∈ R^{D_grade} is obtained, where D_grade is its assumed dimension (cf. Eq. 7).

c) Other Auxiliary Features: Besides the four features mentioned above, in order to further mine emotional information that is not explicitly expressed in social media, auxiliary features are introduced to capture the emotions behind blog posts and comments, including emoji expressions, punctuation marks and letter case. In addition, the frequencies of emotional words and personal pronouns are introduced so that the model can learn users' word-use preferences and thereby capture further emotional cues. In fact, social media is full of non-word symbols used to express emotions, such as ":)" for happiness and ":(" for sadness, and punctuation such as "?" is also a way of expressing emotion. Finally, the other auxiliary feature TE_auxiliary^{T_i} ∈ R^{D_auxiliary} is obtained, where D_auxiliary is the assumed dimension of the other auxiliary features.

d) Text Sentiment: The five sentiment features are concatenated to obtain the text sentiment TE^{T_i} of the multimodal publisher emotion, as Eq. 8.

2) Publish Visual Emotion: In order to spread rumors quickly, rumor publishers attach impressive images to the rumors. Such images usually carry more extreme emotions, further deepening the emotional color of the rumors. Therefore, in order to extract the image emotion in a rumor, a module referred to as the Visual Emotion Extractor is designed. Several studies on visual emotion [30], [31] demonstrate that the emotional color of an image is jointly conveyed by its high-level and low-level features. High-level features manifest as the object and semantic features in the image, while low-level features manifest as colors and textures. Therefore, the Visual Emotion Extractor learns image emotion from three kinds of features: semantic segmentation, object features and low-level features.

a) Semantic segmentation: As a high-level feature, the semantic content of different image regions plays a crucial role in how computers learn the emotion of images. As shown in Figure 1, the two-headed snake and the mutant dog make people feel terrified, and the terrifying sky background also affects people's emotions; if the background in Figure 1 were replaced by a clear blue sky or a blue sea, it would evoke different emotions. Therefore, understanding the semantic information is very important. [32] study the different parts of an image by dividing its pixels into 150 categories in order to compute the semantic features of each region. These categories cover both high-level object and semantic features and low-level features such as color, so they are used as one part of the image emotion computation in the Visual Emotion Extractor. This part takes a rumor image as input, uses ResNet50 as the encoder and the pyramid pooling module as the decoder, and computes the semantic feature VF_semantic^{V_i} = f_semantic(V_i). Assuming that the dimension of the semantic feature vector is D_semantic, then VF_semantic^{V_i} ∈ R^{D_semantic}; a small sketch of this computation follows.
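The paper does not give the exact form of f_semantic, so the sketch below shows one plausible way to turn per-pixel predictions over the 150 semantic categories into a fixed-length feature. A recent torchvision segmentation model is used as a stand-in for the ResNet50 encoder with pyramid pooling decoder described above; the category-frequency histogram is an assumption, not the authors' released code.

```python
import torch
from torchvision.models.segmentation import fcn_resnet50

# Stand-in segmentation backbone; the paper describes a ResNet50 encoder with a pyramid
# pooling decoder trained on the 150-class ADE20K label set, assumed available here.
seg_model = fcn_resnet50(weights=None, num_classes=150).eval()

def semantic_feature(image: torch.Tensor) -> torch.Tensor:
    """One possible VF_semantic: the fraction of pixels assigned to each of the
    150 semantic categories of the input image."""
    with torch.no_grad():
        logits = seg_model(image.unsqueeze(0))["out"]          # (1, 150, H, W) class scores
    labels = logits.argmax(dim=1)                              # (1, H, W) predicted category map
    hist = torch.bincount(labels.flatten(), minlength=150).float()
    return hist / hist.sum()                                   # 150-d category-frequency vector

feat = semantic_feature(torch.randn(3, 256, 256))              # random image as a placeholder
```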
b) Object Feature: [27] compute the degree of correlation between the V-A values in an emotional image dataset and the emotion of the objects in the images. The results demonstrate that the emotion of the objects in an image is highly correlated with the emotion of the image; in other words, the objects strongly affect the mood of the image. Based on this observation, an object feature extraction part is added to the Visual Emotion Extractor. More precisely, a rumor image V_i is used as input, and the object feature VF_object^{V_i} = f_vgg16(V_i) is extracted by a VGG16 network pre-trained on ImageNet [33]. Assuming that the dimension of the object feature vector is D_object, then VF_object^{V_i} ∈ R^{D_object}.

c) Low-level Feature: The low-level features used in this paper refer to the color of the image. An existing study [34] demonstrates that the color of an image can be used to change the emotion it evokes. Color alone cannot drastically change the emotion, which is why it is treated as a low-level feature, but it remains a key factor for image emotion prediction [35], [36]. Consequently, the mean RGB value is extracted as the basic color feature. Furthermore, saturation and brightness factors are added to the low-level features, since they can directly affect valence, arousal and dominance (VAD) [37]. In this experiment, the 512-dimensional GIST descriptor is used to capture the image color layout, and the 59-dimensional local binary pattern (LBP) descriptor is used to capture the image texture, finally yielding the low-level feature VF_low^{V_i} ∈ R^{D_low}, where D_low is the assumed dimension of the low-level feature.

d) Visual Emotion Extractor: Finally, the feature vector is obtained by concatenating the semantic features VF_semantic^{V_i}, object features VF_object^{V_i} and low-level features VF_low^{V_i} of the rumor image V_i, and the final VAD value is computed by fully connected layers. After pre-training on the IESN image emotion dataset [38], the network parameters of the image emotion extraction module are obtained and used to extract image emotion. In addition, in order for the Visual Emotion Extractor to learn the slight difference between the images in rumors and the images in the IESN dataset, and to align the visual emotion features with the text emotions, an additional fully connected layer is added after the penultimate fully connected layer. It is used to fine-tune the network parameters, so that the module generalizes better to image emotion extraction in the rumor domain, and finally the visual emotion feature is obtained as Eq. 9. The dimension of the visual emotion feature is assumed to be d.

3) Multimodal Publish Emotion: In order to obtain the multimodal publisher sentiment MPE^{C_i} of the blog post C_i, the text sentiment TE^{T_i} is combined with the image sentiment VE^{V_i}, as Eq. 10, where λ is the weight of the different modal emotions obtained during training.

The social emotion feature is obtained from the comments Comment_i = {comment_1, comment_2, ..., comment_n} of blog post C_i. The text emotion feature described in Section 4.1.1 is computed for each comment, and each comment's sentiment feature TE^{comment_j} is obtained.
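Before the per-comment emotion features are aggregated into the social emotion, the following sketch summarizes the Visual Emotion Extractor described in part d) above. Only the GIST (512) and LBP (59) dimensions follow the paper; all other layer sizes, the 150-d semantic histogram and the 64-d emotion dimension are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VisualEmotionExtractor(nn.Module):
    """Concatenate the semantic, object and low-level features of a rumor image and map them
    to a visual emotion vector plus a VAD value (the VAD head is used for IESN pre-training)."""

    def __init__(self, d_semantic=150, d_object=4096, d_low=574, d_emotion=64):
        # d_low = 512 (GIST) + 59 (LBP) + 3 (mean RGB) under the assumptions of this sketch
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(d_semantic + d_object + d_low, 512), nn.ReLU(),
            nn.Linear(512, d_emotion), nn.ReLU(),   # extra layer fine-tuned on rumor images
        )
        self.vad_head = nn.Linear(d_emotion, 3)      # valence, arousal, dominance

    def forward(self, f_semantic, f_object, f_low):
        h = self.shared(torch.cat([f_semantic, f_object, f_low], dim=-1))
        return h, self.vad_head(h)                   # visual emotion feature VE, VAD value

# Example forward pass with random stand-in features for one image
extractor = VisualEmotionExtractor()
ve, vad = extractor(torch.randn(1, 150), torch.randn(1, 4096), torch.randn(1, 574))
```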
In order to preserve the integrity of the comment sentiment, the sentiment feature vectors of all comments TE^{comment_j} are concatenated into the comment sentiment feature vector TE^{Comment_i} of the blog post (cf. Eq. 11). Max pooling and average pooling over the comment sentiment feature vector TE^{Comment_i} are then used to obtain the extreme sentiment feature TE_max^{Comment_i} and the average sentiment feature TE_average^{Comment_i}, respectively. Finally, they are concatenated to obtain the social emotion feature TE_social^{C_i} of post C_i (cf. Eq. 12-Eq. 14).

In order to model the difference between the publisher emotion and the social emotion, the difference between the multimodal publisher emotion and the social emotion, referred to as the Multimodal Emotion Gap MEG^{C_i}, is computed as Eq. 15, where MEG^{C_i} ∈ R^{2d}. The network measures the difference between the dual emotions through this multimodal emotion gap. Finally, the multimodal publisher emotion, the social emotion and the multimodal emotion gap are concatenated to obtain the multimodal dual emotion. Since images leave a stronger impression on readers and play a more important role in rumor detection [12], [13], the visual emotion is also concatenated into the multimodal dual emotion MDE^{C_i}, as Eq. 16, where MDE^{C_i} ∈ R^{6d}.

The multimodal dual emotion can be attached to an existing rumor detector. In this paper, VGG19 pre-trained on ImageNet is used to extract the high-dimensional features VGG19(V_i) of rumor images, and Bi-LSTM is used to learn the semantic features BiLSTM(T_i) of the text. Finally, the multimodal dual emotion is concatenated with the image feature and the text feature and fed into a multilayer perceptron (MLP) with softmax to obtain the prediction result ŷ, as Eq. 17-Eq. 18.

In this section, the datasets used in the experiments, including the real-world social media dataset and the image emotion dataset used to pre-train the Visual Emotion Extractor, are first presented. The experiment settings are then provided, and the performance of the proposed model is compared with that of existing SOTA methods on the rumor detection task.

A. Datasets
1) IESN: In order to pre-train the network parameters of the Visual Emotion Extractor, a public and reliable image emotion dataset with VAD labels collected from social media is needed. IESN [38], which comprises 21,066,920 images with 10 sentiment labels collected from Flickr, is used. [38] assign each image 8 emotion categories and continuous VAD values by combining the expected emotion and the actual emotion of every user across all relevant images. In the experiment, 59,200 images are used for training. The number of images for each emotion is shown in Table 1.

TABLE I: IESN dataset (number of training images per emotion)
amusement 7400 | awe 7400 | contentment 7400 | excitement 7400
anger 7400 | disgust 7400 | fear 7400 | sadness 7400

2) Fakeddit: In order to meet the requirement that each post provides text, an image and comments, the real-world Fakeddit dataset [39] is used. The Fakeddit data comes from Reddit, a social news and discussion website and one of the top 20 websites in the world by traffic. The data cover the period from March 19, 2008 to October 24, 2019. A subset of Fakeddit is selected for the experiments, and the detailed statistics of the dataset are shown in Table 2.
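Before turning to the experimental settings, the following sketch recaps how the multimodal dual emotion (Eqs. 10-16) and the detector head (Eqs. 17-18) described in Section 4 fit together. All feature dimensions are illustrative, and because the exact form of Eq. 15 is not fully recoverable from the text, element-wise differences against the max- and mean-pooled social emotion are used here as one plausible reading.

```python
import torch
import torch.nn as nn

def multimodal_dual_emotion(text_emotion, visual_emotion, comment_emotions, lam=0.5):
    """Assemble MDE (Eqs. 10-16). All inputs are 1-D tensors of the same size d
    (the paper aligns dimensions with fully connected layers); `lam` stands in for
    the learned modality weight."""
    publisher = lam * text_emotion + (1.0 - lam) * visual_emotion       # Eq. 10
    stacked = torch.stack(comment_emotions)                             # (n_comments, d)
    social_max, social_mean = stacked.max(dim=0).values, stacked.mean(dim=0)
    social = torch.cat([social_max, social_mean])                       # Eqs. 11-14, (2d,)
    gap = torch.cat([publisher - social_max, publisher - social_mean])  # Eq. 15, (2d,)
    return torch.cat([publisher, social, gap, visual_emotion])          # Eq. 16, (6d,)

class DetectorHead(nn.Module):
    """MLP + softmax head of Eqs. 17-18: image feature, text feature and the multimodal
    dual emotion are concatenated and classified; the hidden size is illustrative."""
    def __init__(self, d_image=256, d_text=64, d_emotion=384, n_classes=2):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_image + d_text + d_emotion, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )
    def forward(self, image_feat, text_feat, dual_emotion):
        logits = self.mlp(torch.cat([image_feat, text_feat, dual_emotion], dim=-1))
        return torch.softmax(logits, dim=-1)                            # Eq. 18

# Example with random stand-in features for one post with five comments
mde = multimodal_dual_emotion(torch.randn(64), torch.randn(64),
                              [torch.randn(64) for _ in range(5)])
y_hat = DetectorHead()(torch.randn(1, 256), torch.randn(1, 64), mde.unsqueeze(0))
```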
In the experiments, the text sentiment classifier is a pretrained model provided by NVIDIA. In order to compute the sentiment intensity feature described in Section 4.1.1, the sentiment score model in NLTK, which measures the sentiment score of a text, is used. In addition, the emoji lexicon used in the computation of the other auxiliary features is derived from Wikipedia, and covers most emoji symbols together with the meaning and intensity of the emotions they express. For the corpora, the NRC Sentiment Dictionary [41] and the NRC Sentiment Intensity Dictionary [40] are used to extract the sentiment lexicon and sentiment intensity features, respectively. For word embeddings, 200-dimensional GloVe [42] vectors are used; the feature vector of each word is obtained from GloVe pre-trained in an unsupervised manner. The outputs of the second-to-last layer of the 19-layer VGGNet trained on ImageNet are used as visual features, with a dimension of 4096. During training, the VGG weights are not fine-tuned, in order to reduce the computational load and improve training efficiency. A fully connected layer is attached after the last layer of VGG19; in order to reduce the image features and prevent them from overwhelming the text features, its output dimension is set to 256. In the rumor detector, a Bi-LSTM with a hidden dimension of 32 is used to extract the text features. After the Visual Emotion Extractor, two fully connected layers are attached in order to align the dimensions of the image emotion features and the text emotion features; their output dimensions are 64 and 300, respectively. A batch size of 32 instances is used when training the whole network. The model is trained for 100 epochs with a learning rate of 1e-3, with early stopping to report the results, and the ReLU nonlinear activation function is used. To prevent overfitting, an L2 regularizer is applied to the model; different weights were tried and a weight of 0.01 was finally chosen. The loss is computed with cross-entropy.

In order to verify the efficiency of the multimodal dual emotion feature, baseline models are chosen from both the sentiment feature and the rumor detector aspects.

1) Sentiment Features: Dual Emotion is chosen as the baseline to demonstrate the effect of the proposed multimodal dual emotion and the improvement that image emotion brings to rumor detection:
• Dual Emotion: an emotional feature [19] that extracts the emotional scores of the text and of the comments and computes the difference between them as the textual emotional feature of the news. Experiments show that this method is the most effective sentiment feature for rumor detection.

2) Rumor Detector: In order to demonstrate that the multimodal dual emotion can enhance the performance of rumor detectors, the combination of Bi-LSTM and VGG is chosen as the most basic rumor detector, while EANN and MVAE are chosen as experimental baselines:
• BiLSTM + VGG19: Bi-LSTM has been shown to be effective for fake news detection [1]. At the same time, a large number of rumor detection studies have shown that VGG19 can extract the rumor-related features of images well [13], [14]. Therefore, a combination of these two networks is used to test whether the multimodal dual emotion features can improve it.
• EANN [15]: the Event Adversarial Neural Network (EANN) consists of three main components: a multimodal feature extractor, a fake news detector and an event discriminator. The multimodal feature extractor extracts the textual and visual features from posts and, together with the fake news detector, learns a discriminative representation for detecting fake news. The event discriminator is responsible for removing event-specific features. Fake news can also be detected using only two components, the multimodal feature extractor and the fake news detector; for a fair comparison, this variant of EANN without the event discriminator is used in the experiments. The parameters are kept consistent with the original paper.
• MVAE [16]: the Multimodal Variational Autoencoder (MVAE) consists of three parts: an encoder, a decoder and a fake news detector. The encoder extracts multimodal features from the textual and visual information, and the fake news detector classifies posts as fake news. The parameters are also kept consistent with the original paper.

Table 3 shows the results of the baselines and the proposed method. More precisely, the accuracy, macro F1 and per-class F1 scores of each rumor detector with the help of different modal emotion features are reported. It can be clearly seen that the proposed multimodal dual emotion brings a good improvement to the rumor detectors. On the Fakeddit dataset, the multimodal combination of Bi-LSTM and VGG already improves over text semantics alone, which confirms the important role of visual information in rumor detection. It can also be seen that every rumor detector improves with the help of Dual Emotion or Multimodal Dual Emotion, and the improvement brought by Dual Emotion and Multimodal Dual Emotion differs across detectors. For the rumor detectors in Table 3, Dual Emotion improves the accuracy by 1.10%, 1.21% and 1.23%, respectively. The Multimodal Dual Emotion improves the combination of Bi-LSTM and VGG the most, increasing its accuracy from 82.2% to 85.9%. It can be clearly deduced that the Multimodal Dual Emotion improves the rumor detectors more than the baseline sentiment feature does, which preliminarily proves that image emotion has a non-negligible positive impact on the rumor detection task.

In order to further compare the importance of image emotion for rumor detection, the rumor detector part is removed in a second experiment. That is, the semantic features of text and images are ignored, and the prediction results obtained by feeding Multimodal Dual Emotion and Dual Emotion into an MLP are compared. The MLP is the same as in the previous experiments. The results, shown in Table 4, indicate that Multimodal Dual Emotion is better than the Dual Emotion feature when relying only on emotion features for the rumor detection task, which further demonstrates the role of image emotion features in rumor detection.

In this paper, the multimodal rumor detection task is explored. In order to make up for the neglect of image emotion in existing rumor detection methods, it is demonstrated that image emotion has a positive effect on rumor detection. A novel multimodal emotion feature, which can be added as an extension to existing rumor detectors, is also proposed. The Multimodal Dual Emotion improves the performance of existing rumor detectors.
In addition, the comparative experiments show that image emotion features have a strong positive impact on the rumor detection task. This study also brings a novel idea, exploring multimodal emotion, to the field of rumor detection. In future work, we aim to further study multimodal emotional feature fusion in rumor detection.

Funding from the Chongqing Municipal Education Commission of Science and Technology Research Project (KJZD-K202114401) is gratefully acknowledged.

REFERENCES
Vroc: Variational autoencoder-aided multi-task rumor classifier based on text
False rumor of explosion at white house causes stocks to briefly plunge; ap confirms its twitter feed was hacked
Facebook will remove misinformation about coronavirus (2020)
Automatic rumor detection on microblogs: A survey
Rumor cascades
Q&A: The novel coronavirus outbreak causing covid-19
Detection and resolution of rumours in social media: A survey
Information credibility on twitter
Sentiment aware fake news detection on online social networks
Multi-view learning with distinguishable feature fusion for rumor detection
Exploring the role of visual content in fake news detection
On the role of images for analyzing claims in social media
Multimodal fusion with recurrent neural networks for rumor detection on microblogs
Eann: Event adversarial neural networks for multi-modal fake news detection
Mvae: Multimodal variational autoencoder for fake news detection
The science of fake news
The psychology of fake news
Mining dual emotion for fake news detection
Sentiment analysis for fake news detection
A framework for big data analytics in commercial social networks: A case study on sentiment analysis and fake review detection for marketing decision-making
Robust image sentiment analysis using progressively trained and domain transferred deep networks
Fakeflow: Fake news detection by modeling the flow of affective information
Affective image classification using features inspired by psychology and art theory
From pixels to sentiment: Fine-tuning cnns for visual sentiment prediction
Robust image sentiment analysis using progressively trained and domain transferred deep networks
Building emotional machines: Recognizing image emotions through deep neural networks
Cross-modal image sentiment analysis via deep correlation of textual semantic
Emotion detection from text and speech: a survey
Image color transfer to evoke different emotions based on color combinations
Joint image emotion classification and distribution learning via deep convolutional neural
Semantic understanding of scenes through the ade20k dataset
Imagenet: A large-scale hierarchical image database
Image recoloring with valence-arousal emotion model
Learning visual emotion representations from web data
Pdanet: Polarity-consistent deep attention network for fine-grained visual emotion regression
Norms of valence, arousal, and dominance for 13,915 english lemmas
Predicting personalized image emotion perceptions in social networks
Fakeddit: A new multimodal benchmark dataset for fine-grained fake news detection
Understanding emotions: A dataset of tweets to study interactions between affect categories
Crowdsourcing a word-emotion association lexicon
Glove: Global vectors for word representation