key: cord-0508253-cv4w3ok6
authors: Liang, Chaoqi; Zhang, Yu; Li, Xinyuan; Zhang, Jinyu; Yu, Yongqi
title: FuDFEND: Fuzzy-domain for Multi-domain Fake News Detection
date: 2022-05-08
sha: 75d9d039e48f3a19f41b2569552124cba571c963
doc_id: 508253
cord_uid: cv4w3ok6

On the Internet, fake news exists in various domains (e.g., education, health). Since news in different domains has different features, researchers have recently begun to use a single domain label for fake news detection. This emerging field is called multi-domain fake news detection (MFND). Existing works show that using a single domain label can improve the accuracy of a fake news detection model. However, previous works have two problems. First, they ignore that a piece of news may have features from several domains; a single domain label captures only the features of the news in one particular domain, which may reduce the performance of the model. Second, their models cannot transfer domain knowledge to other datasets that have no domain labels. In this paper, we propose a novel model, FuDFEND, which addresses these limitations by introducing a fuzzy inference mechanism. Specifically, FuDFEND uses a neural network to fit the fuzzy inference process, which constructs a fuzzy domain label for each news item. Then, the feature extraction module uses the fuzzy domain label to extract the multi-domain features of the news and obtain the total feature representation. Finally, the discriminator module uses the total feature representation to decide whether the news item is fake. The results on Weibo21 show that our model works better than the model using only a single domain label. In addition, our model transfers domain knowledge better to the Thu dataset, which has no domain labels.

With the development of the Internet, social media platforms such as Sina Weibo and Twitter have become the main source of information.
Fake news is also widely spread on social platforms. Fake news on social media is usually a short text, a picture or a short video that can be understood in a few seconds. At the same time, fake news incites people's emotions and stimulates users to forward it, so it spreads widely. Tavernise [1] showed that important social events were affected by moderated fake news campaigns. The spread of fake news may result in public panic and social dislocation. The quality of content on social media platforms has suffered greatly due to the spread of fake news, misinformation and unverifiable facts [2]. Therefore, research into fake news detection on social media is meaningful today.

At present, there are two approaches to social media fake news detection: the social background detection method and the content-based detection method [3]. The social background method studies users' social network structure, users' personal information, microblog forwarding and reply relationships, and rumor propagation patterns. The content-based method examines the text, voice and video carried by microblog news. For example, Mouratidis et al. [4] analyze linguistic features to distinguish between fake news and real news. Our work focuses on the content-based method.

On the Internet, fake news arises in many different domains. The task of detecting fake news in multiple domains is called Multi-domain Fake News Detection (MFND). There are two main challenges in MFND. First, the performance of detection techniques generally drops when news records come from different domains (e.g., politics, entertainment); a possible explanation is the rather unique content and style of each domain [5]. For example, during the 2020 US election, political fake news and Covid19 fake news were widespread simultaneously.
It is difficult to detect both political fake news and Covid19 fake news at the same time, because the features of political fake news and Covid19 fake news differ or even conflict. Thus, a variety of previous works [6, 7, 8, 9, 10] focused on rumor detection in a single domain. But herein lies another problem: in a single domain, there may be too little data to train a good model. Therefore, Nan et al. [11] proposed a Multi-domain Fake News Detection Model (MDFEND), which uses a single domain label to solve the above two problems. A single domain label states that a piece of news belongs to a certain domain, such as science and technology, education or health. The model receives the news content and the single domain label as input. Using these data, the model can extract the features common to fake news across all domains and can distinguish the features specific to a certain domain through the domain label.

We build on that work [11] by recognizing that it has two problems. (i) A piece of news may have features of several domains. An example is shown in Table 1: this news item has features of three domains. A single domain label cannot help the model handle news with such multi-domain features. (ii) Their model cannot be transferred to datasets without domain labels. Therefore, we propose a novel model, the Fuzzy-domain Fake News Detection Model (FuDFEND). FuDFEND is an improved model based on the MDFEND proposed in [11]. The improvement is made by introducing a fuzzy inference mechanism into the model, which addresses the above two problems. The fuzzy inference mechanism constructs a fuzzy domain label for each news item. Compared with a single domain label, a fuzzy domain label describes the domain features of news better and therefore helps the model extract the multi-domain features of news. We demonstrate this with an experiment on Weibo21.
In addition, FuDFEND has better transfer ability than MDFEND. We illustrate this through an experiment on the Thu dataset. The contributions of this paper are summarized as follows:
• We propose a novel model, the Fuzzy-domain Fake News Detection Model (FuDFEND), which extracts the multi-domain features of news content using the fuzzy domain label generated by the fuzzy inference mechanism. The results on Weibo21 show that FuDFEND works better than MDFEND, the model using only a single domain label.
• To describe the multi-domain features of news, we introduce the fuzzy inference mechanism to the multi-domain fake news detection task. The fuzzy domain label constructed by the fuzzy inference mechanism describes the multi-domain features of news more accurately.
• We solve the problem that the model cannot transfer to a dataset without domain labels. We evaluate FuDFEND on the Thu dataset, which has no domain labels. Experimental results show that FuDFEND transfers domain knowledge better by using the fuzzy inference mechanism.

Researchers have come up with a number of ways to detect fake news. In [12], the authors proposed an ensemble classification model for fake news detection. Their model obtains relevant features from a fake news dataset and then uses an ensemble model to classify the extracted features, but this work is based only on news text. In [13], the authors build a model to study whether fake news can be distinguished from mainstream news by writing style, and whether fake news can be detected by writing style alone. Rawat et al. [2] proposed a method that automatically collects evidence online for the fake news detection task. For each piece of news, they collect evidence and generate summaries of it as an additional input to the model to help assess the news text. In [14], the authors found that images in fake and real news have different distribution patterns and statistically distinctive patterns.
It reveals that images in real news are more diverse and much denser than those in fake news. Therefore, they extract image features from the visual characteristics and overall statistics of the images in news events. Alonso-Bartolome et al. [15] used an early fusion method to fuse text and image information for rumor detection. In [16], an innovative RNN with an attention mechanism (att-RNN) is proposed for effective multimodal feature fusion. In [17], the authors propose a Multi-domain Visual Neural Network (MVNN) framework consisting of three main parts to mix all the features together.

Social Background Detection Method. This line of research mainly focuses on propagation patterns, publisher-news relations and user-news interactions. In [18], the authors examined numerous features related to the user, linguistic, network and temporal characteristics of rumors. They studied the spread patterns of rumors over time and the ability to track the precise changes in the predictive power of rumor features. However, the number of events (111 rumors and non-rumors) may not be sufficient for summarizing the prediction performance, and the algorithm that takes every feature into account did not achieve good results in short observation periods. Shu et al. [19] examined the correlation of publisher bias, news stance and relevant user engagement. They studied the novel problem of exploiting social context for fake news detection and proposed a tri-relationship embedding framework, TriFN, which models relations and interactions between publishers, news and users simultaneously for fake news classification. However, it requires social context information to be included in the data. Meanwhile, social relations may differ significantly between domains, so targeted detection in different domains is a promising concept. Alrubaian et al.
[20] proposed a credibility analysis system consisting of a reputation-based model, a feature ranking algorithm, a credibility evaluation classifier engine and a user expertise model, to assess the accuracy of information on Twitter so that it can stop the promotion of disinformation.

In [5], the authors introduced two new English fake news datasets (FakeNewsAMT, Celebrity) covering six news domains and analyzed how their model performs in different news domains. Nan et al. [11] constructed Weibo21, a multi-domain fake news dataset in Chinese. Weibo21 contains more than 9000 pieces of real and fake news from nine domains, and each piece of news was manually annotated with a domain label. They also designed a Multi-domain Fake News Detection Model (MDFEND) for Weibo21. The model can extract the features common to fake news across all domains and can distinguish the features specific to a certain domain through the single domain label. However, this model relies on manually annotated datasets: besides the news text and the real/fake label, it needs the domain category of each news item as input, which it uses to select the corresponding vector in the domain matrix and compute the weight of each domain expert (TextCNN). Obtaining datasets that match the model therefore requires a lot of effort. In the following, we propose our improvement strategy on the basis of this model.

FuDFEND is an improved model based on the MDFEND proposed in [11]. The improvement is made by introducing a fuzzy inference mechanism into the model. The fuzzy inference mechanism includes two modules: the Membership Function and the Gate. The overall framework of FuDFEND is shown in Figure 1. The working process of the model is as follows. The news text is fed into BERT [21, 22] to obtain a sequence of word embeddings $W = [w_{[CLS]}, w_1, w_2, \ldots, w_n, w_{[SEP]}]$.
$W$ is input into a mixture of experts to extract the features of different domains. $W$ is also input into the Membership Function to obtain the fuzzy domain label. Then, the Gate takes the fuzzy domain label as input and generates the weight scores. The outputs of the experts are weighted by the weight scores and summed to obtain the total feature representation. The classifier module uses the total feature representation to decide whether the news item is fake.

In traditional set theory, a set is defined by a definite condition. For example, we may define the tall set as the people whose height is at least 190cm; everyone else belongs to the short set. A person who is 189cm tall is very close to 190cm but still belongs to the short set. Obviously, this traditional way of describing sets is very crude. To describe sets more accurately, L. A. Zadeh proposed the concept of fuzzy sets in 1965. A fuzzy set is characterized by a Membership (characteristic) Function which assigns to each object a membership grade between zero and one [23]. As shown in Figure 2, we can use a Membership Function to describe the tall set and the short set. A person who is 189cm tall belongs to the tall set with a membership grade of 0.9 and to the short set with a membership grade of 0.1.

A piece of news may have features of different domains. Consider the example "The new generation of iPhone has added a number of new technologies such as face recognition, which are popular with consumers. The next day, Apple shares rose sharply." This news item has features of both the science domain and the finance domain. Therefore, we need fuzzy sets to measure more precisely the domains that a news item belongs to. We show how to use a neural network to fit the Membership Function for the sets of news domains.

The Membership Function consists of a GRU, a multi-layer perceptron (MLP) and a softmax function. It generates a nine-dimensional membership grade vector, which we call the fuzzy domain label.
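As an illustration of the tall/short example above, the following Python sketch contrasts a crisp set with a fuzzy one. The linear ramp and its breakpoints (180cm and 190cm) are assumptions chosen only so that a height of 189cm receives the grades 0.9 and 0.1 quoted in the text; the actual curve in Figure 2 may differ, and the function names are hypothetical.

```python
def crisp_tall(height_cm: float, threshold: float = 190.0) -> bool:
    """Classical (crisp) set: a person is either in the tall set or not."""
    return height_cm >= threshold


def fuzzy_tall(height_cm: float, low: float = 180.0, high: float = 190.0) -> float:
    """Fuzzy set: membership grade in [0, 1] given by a linear ramp.

    `low` and `high` are illustrative breakpoints, not values from the paper.
    """
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)


def fuzzy_short(height_cm: float) -> float:
    """Complementary grade, so that the tall and short grades sum to one."""
    return 1.0 - fuzzy_tall(height_cm)


if __name__ == "__main__":
    h = 189.0
    # Crisp set: 189cm falls just under the threshold, so the person is not tall.
    print(crisp_tall(h))
    # Fuzzy sets: the same person is mostly tall and slightly short.
    print(round(fuzzy_tall(h), 2), round(fuzzy_short(h), 2))
```

In FuDFEND the hand-written ramp is replaced by a learned function, and the two grades become nine, one per news domain.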
During training, we train it as a classifier for news domains. At inference time, we regard it as a Membership Function. The specific process is as follows. For a piece of news, we feed its word embeddings $W$ into a GRU. The output of the GRU is fed into an MLP whose output is a nine-dimensional vector. A softmax function normalizes the output of the MLP, which yields a nine-dimensional vector whose entries sum to 1. Each entry of the vector is the membership grade of the news item in one of the nine news domains (Science, Military, Education, Disasters, Politics, Health, Finance, Entertainment, Society). We denote the Membership Function as $M(\cdot;\theta)$, where $\theta$, $\theta_1$ and $\theta_2$ are the parameters of the Membership Function, the GRU and the MLP, respectively. $d$ represents the fuzzy domain label:

$$d = M(W;\theta) = \mathrm{softmax}(\mathrm{MLP}(\mathrm{GRU}(W;\theta_1);\theta_2))$$

Table 1. The single domain label of this news item is "Society". The content of this news item is: "# Dad said his daughter is not a vegetative person but a sleeping beauty # when I saw this topic, I was moved by the story of Mizuho in The House Where the Mermaid Sleeps. Mizuho's mother always believed that Mizuho could hear what she said. She dressed and dressed Mizuho every day. Mizuho was like falling asleep. Of course, at the end of the book, when choosing between life and death, she chose "living" to donate Mizuho's heart. This is the first time I realized the human body."

We use a Mixture-of-Experts [24, 25, 26] to extract features from the news content. Each expert network in our model is a TextCNN [27]. Each expert has its own area of expertise and is good at extracting the features of a certain domain. An expert network is denoted by $\Psi_i(\cdot;\phi_i)$ $(1 \le i \le T)$, where $T$ is a hyperparameter representing the number of experts, $r_i$ denotes the output of the $i$-th expert and $\phi_i$ are the trainable parameters of that expert:

$$r_i = \Psi_i(W;\phi_i)$$

Because different experts extract the features of different domains, we input the fuzzy domain label into the Domain Gate to obtain the weight score. The weight score consists of $T$ positive real numbers, one for each of the $T$ experts.
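The path from the MLP output to the weighted sum of expert outputs can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the gate is taken to be a single linear layer followed by softmax (the text only states that the weight score consists of T positive real numbers), and all numbers, the value T = 3 and the function names are made up for illustration. The ninth domain name, Society, is taken from the single domain label of the Table 1 example.

```python
import math
from typing import List

DOMAINS = ["Science", "Military", "Education", "Disasters", "Politics",
           "Health", "Finance", "Entertainment", "Society"]


def softmax(scores: List[float]) -> List[float]:
    """Normalise scores into positive values that sum to one."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate(fuzzy_label: List[float], weights: List[List[float]]) -> List[float]:
    """Hypothetical gate: one linear layer (a T x 9 matrix) followed by softmax."""
    scores = [sum(w * d for w, d in zip(row, fuzzy_label)) for row in weights]
    return softmax(scores)


def aggregate(expert_outputs: List[List[float]], weight_score: List[float]) -> List[float]:
    """Weighted sum of the T expert feature vectors."""
    dim = len(expert_outputs[0])
    return [sum(a * r[k] for a, r in zip(weight_score, expert_outputs))
            for k in range(dim)]


if __name__ == "__main__":
    mlp_output = [0.2, -1.0, 0.1, 0.0, -0.5, 1.5, -0.3, 1.2, 2.0]  # made-up scores
    fuzzy_label = softmax(mlp_output)  # nine membership grades that sum to one
    for name, grade in zip(DOMAINS, fuzzy_label):
        print(f"{name:13s} {grade:.3f}")

    T = 3  # made-up number of experts
    gate_weights = [[0.1 * (i + j) for j in range(9)] for i in range(T)]
    expert_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-d expert features
    weight_score = gate(fuzzy_label, gate_weights)
    print(weight_score, aggregate(expert_outputs, weight_score))
```

Because every domain contributes to the gate input in proportion to its membership grade, a news item with high grades in several domains can draw on several experts at once, which a one-hot single domain label cannot express.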
Then we use the weight score to aggregate the features extracted by the experts. We denote the Domain Gate as $G(\cdot;\psi)$, where $\psi$ are the trainable parameters of the Domain Gate, $d$ is the fuzzy domain label and $a$ represents the weight score:

$$a = G(d;\psi)$$

Through the weight score, we aggregate the outputs $r_i$ of the $T$ experts to obtain the total feature vector $v$:

$$v = \sum_{i=1}^{T} a_i r_i$$

The total feature vector is input into a binary classifier, which is an MLP with a sigmoid output layer. This gives the model's predicted value $\hat{y}$, which lies between zero and one and is the probability that the news is fake. The larger $\hat{y}$ is, the more likely the model is to conclude that the news is fake:

$$\hat{y} = \mathrm{sigmoid}(\mathrm{MLP}(v;\omega))$$

where $\omega$ represents the trainable parameters of the MLP. We use $y$ to represent the ground-truth label and $\hat{y}$ to represent the predicted value. We employ the Binary Cross-Entropy Loss for training:

$$L = -\left( y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \right)$$

FuDFEND is an improved model based on the MDFEND proposed in [11]. The improvement is made by introducing a fuzzy inference mechanism into the model. To show the effect of the fuzzy inference mechanism, we compare FuDFEND with the baseline methods mentioned in [11], especially MDFEND, which has no fuzzy inference mechanism.

In this section, we introduce the two datasets we use. Weibo21 was released by Nan et al. [11] in 2021. It is a Chinese multi-domain fake news dataset. It consists of 4,488 pieces of fake news and 4,640 pieces of real news, covering nine different news domains: Science, Military, Education, Disasters, Politics, Health, Finance, Entertainment and Society. The number of real and fake news items in each domain is shown in Table 2.

[Table 3]

The other dataset we use is the Chinese Rumor Dataset released by thunlp [28, 29]. The news in this dataset has no domain labels. The dataset consists of two parts. One is called rumors_V170613 [28] and includes 31669 rumors. The other is named CEO_Dataset [29] and includes 1538 pieces of fake news and 1849 pieces of real news.
We found that some rumors in CEO_Dataset coincide with the data in Weibo21, so we randomly selected 1560 pieces of fake news from rumors_V170613 and combined them with the real news in CEO_Dataset to form a new dataset, which we call the Thu dataset. We performed zero-shot learning on the Thu dataset to test the model's capability for domain knowledge transfer.

BERT [21, 22] is fixed during training. The dimension of BERT's output embedding vectors is 768. The maximum sentence length is 170. We employ the Adam [30] optimizer for training. As mentioned previously, we trained the Membership Function as a nine-class classifier of domain labels. We randomly selected 70% of the data from Weibo21 as the training set and used the remainder as the validation set. The mini-batch size is 32 and the learning rate is 5e-4. After 7 epochs of training with the cross-entropy loss function, the F1-score on the validation set reaches 82.31%. We take the Membership Function from the 7th epoch because it has the highest F1-score. When training FuDFEND, the Membership Function is fixed. The mini-batch size is 128 and the learning rate is 5e-4.

Nan et al. [11] have shown the effectiveness of their multi-domain fake news detection model (MDFEND). The results in Table 4 further support our idea: FuDFEND detects fake news better than MDFEND, which shows that the fuzzy domain label generated by the fuzzy inference mechanism helps the model extract features for the fake news detection task.

The news in the Thu dataset has no domain labels. Because FuDFEND has a Membership Function module, it only needs the news content as input and does not need a domain label. To demonstrate the transfer learning capability of our model, we use MDFEND as a baseline and train it with the hyperparameters provided in [11]. MDFEND requires a domain label, so we use the Membership Function to assign domain labels to the dataset.
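The text does not spell out how the output of the Membership Function is turned into the single label that MDFEND expects. One natural reading, assumed here and not confirmed by the source, is to keep the domain with the highest membership grade. The sketch below illustrates that assumption; the function name and the example grades are hypothetical, and the ninth domain name, Society, is taken from the single domain label of the Table 1 example.

```python
from typing import Sequence

DOMAINS = ["Science", "Military", "Education", "Disasters", "Politics",
           "Health", "Finance", "Entertainment", "Society"]


def hard_domain_label(fuzzy_label: Sequence[float]) -> int:
    """Return the index of the domain with the highest membership grade.

    Assumption: the single domain label is the argmax of the fuzzy label.
    Ties are resolved in favour of the first domain in DOMAINS.
    """
    if len(fuzzy_label) != len(DOMAINS):
        raise ValueError("expected one membership grade per domain")
    return max(range(len(fuzzy_label)), key=lambda i: fuzzy_label[i])


if __name__ == "__main__":
    # Made-up fuzzy domain label for a news item touching Health and Society.
    fuzzy_label = [0.05, 0.02, 0.03, 0.10, 0.05, 0.30, 0.05, 0.15, 0.25]
    index = hard_domain_label(fuzzy_label)
    print(index, DOMAINS[index])
```

Under this reading the baseline sees only the strongest domain of each item, while FuDFEND keeps all nine grades, which is the difference the comparison is meant to isolate.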
The results are shown in Table 5. They show that FuDFEND has better transfer ability than MDFEND, which has no fuzzy inference mechanism.

Previous work has demonstrated that using a single domain label can effectively improve the performance of rumor detection models. However, for news containing multi-domain features, models using a single domain label do not synthesize multi-domain features well. In this work, we propose FuDFEND and provide a set of experiments to show that: 1) The fuzzy domain label portrays the multi-domain features of news content more accurately, so it helps the model extract them better. 2) The fuzzy inference mechanism helps the model transfer domain knowledge to datasets without domain labels.

[1] As fake news spreads lies, more readers shrug at the truth. The New York Times
[2] Automated Evidence Collection for Fake News Detection
[3] Fake News Detection Tools and Methods -- A Review
[4] Deep Learning for Fake News Detection in a Pairwise Textual Input Schema
[5] Automatic Detection of Fake News. Association for Computational Linguistics
[6] Information Credibility on Twitter
[7] Detection and Analysis of 2016 US Presidential Election Related Rumors on Twitter
[8] Prominent Features of Rumor Propagation in Online Social Media
[9] Content Representation for Microblog Rumor Detection. Intelligent Systems and Computing
[10] Detecting rumors from microblogs with recurrent neural networks
[11] MDFEND: Multi-domain Fake News Detection
[12] An ensemble machine learning approach through effective feature extraction to classify fake news
[13] A Stylometric Inquiry into Hyperpartisan and Fake News
[14] Novel Visual and Statistical Image Features for Microblogs News Verification
[15] Multimodal Fake News Detection
[16] Multimodal Fusion with Recurrent Neural Networks for Rumor Detection on Microblogs. ACM
[17] Exploiting Multi-domain Visual Information for Fake News Detection
[18] Rumor detection over varying time windows
[19] Beyond News Contents: The Role of Social Context for Fake News Detection
[20] A Credibility Analysis System for Assessing Information on Twitter
[21] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
[22] Revisiting Pre-Trained Models for Chinese Natural Language Processing
[23] Fuzzy sets
[24] Adaptive Mixtures of Local Experts
[25] Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts
[26] Learning to Expand Audience via Meta Hybrid Experts and Critics for Recommendation and Advertising
[27] Convolutional Neural Networks for Sentence Classification
[28] Statistical and semantic analysis of rumors in Chinese social media
[29] CED: Credible Early Detection of Social Media Rumors
[30] Adam: A Method for Stochastic Optimization