key: cord-0877772-u2827zr4
authors: Shelke, Sushila; Attar, Vahida
title: Rumor detection in social network based on user, content and lexical features
date: 2022-03-07
journal: Multimed Tools Appl
DOI: 10.1007/s11042-022-12761-y
sha: faeae43cb29e48559204fa52929199db502a4cc1
doc_id: 877772
cord_uid: u2827zr4

The emergence of social networks has led to wider and faster diffusion of news than conventional news channels allow. Verifying this information is challenging because of the massive volume of content on a social network. Unverified information can be a rumor or fake news that causes damage to individuals and organizations, with a harmful impact on society. Therefore, it is vital to combat rumor diffusion to minimize these adverse effects. Despite vigorous efforts to deal with this issue, researchers have mainly focused on the temporal dynamics of posts together with other features, such as user, network and content-based features, and these approaches demonstrate moderate accuracy. The time-series features are associated with an event, which suppresses other informative features related to each individual post. There is scope for improvement in accuracy, so this paper focuses on post-wise features, namely user-based, content-based and lexical features, along with post sequences. We propose a framework that uses these features and combines two deep learning models. Word embedding is used with a bidirectional long short-term memory (BiLSTM) network and combined with post-wise features using a multilayer perceptron (MLP), which improves accuracy. Experiments on a real-world Twitter dataset demonstrate a notable improvement in accuracy compared to state-of-the-art approaches.

Effective strategies are required to fight the spread of such news, which builds fear and anxiety in society. Many specialized fact-checking websites, such as Politifact, Snopes and FactCheck [13], work to debunk rumors and fake news. There are also crowdsourcing-based fact-checking efforts on platforms like Twitter [40] and Facebook [12]. The rapid circulation of stories can create chaos within society if not handled early, and in the case of time-dependent events the consequences can be severe. Verifying news through manual effort is time-consuming. Recognizing rumors and rumor sources [34] can help control rumor dissemination.

Several surveys have reviewed rumor detection. The survey in [6] broadly categorizes features into content-based (semantic and lexical) and context-based (user and network-based) features. Rumor detection has been framed as a classification problem divided into four modules: detection, tracking, stance classification and veracity classification of rumors [43]. Reviewers have also classified the approaches into machine learning-based (ML) and deep learning-based (DL) techniques. Many researchers started with ML-based methods to solve the rumor detection problem [18]. However, manual feature selection in the ML approach is tedious and labor-intensive. Therefore, researchers moved towards DL-based approaches to overcome the problems of ML classification. This research focuses on deep learning-based strategies. A review of DL-based methods is given in [2], which presents a detailed analysis of the datasets used, the various deep learning architectures and the open challenges in rumor detection. The limitations identified in that review involve the collection or selection of benchmarked datasets, the size and quality of data, and the choice of DL architectures and relevant features.

This paper argues that previous deep learning research focused mainly on the text and temporal features of posts. Although a few researchers combined time-dependent characteristics with other features, such as user and content-based features, they used aggregate or fractional values for those features. These aggregate values ignore many essential components associated with an individual post. This research uses significant post-wise features from several categories (user-based, content-based, lexical and post-based) with different deep learning models.

The contributions of this research are summarized below.

- We have collected a real-world dataset of rumor and non-rumor events from Twitter.
- We have identified essential features from different categories, namely user, content-based and lexical.
- We have proposed a hybrid deep learning model combining bidirectional LSTM (BiLSTM) and multilayer perceptron (MLP) models.
- We have comprehensively analyzed experimental results on real-world datasets and compared them with state-of-the-art deep learning-based rumor detection approaches.

Section 2 presents a literature review of rumor detection approaches that use deep learning frameworks. Section 3 explains the problem definition of rumor detection, the data collection and the methodology. Section 4 discusses the experimental results and is followed by a conclusion at the end of the paper.

The problem studied in this research is a classification problem of a kind used in many applications, such as email classification [30, 31], sentiment classification [38] and fake news detection [19]. The work in this paper focuses on deep learning-based approaches. According to the deep learning models used, we divide the techniques into those based on recurrent neural networks (RNN), those based on convolutional neural networks (CNN), and combinations of different models, referred to as hybrid models.

- RNN-based approach: An RNN is a neural network with recurrent connections that processes variable-length sequential data, such as time-series data. It was first applied to rumor detection by Ma et al. [25]. They extended the basic RNN model with gated memory units, namely Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), of which GRU performed well. Chen et al. [8] proposed that words related to a rumor should receive more attention, using a multilayer LSTM with a deep attention model. The model applies soft attention to recurring distinct features with a specific focus and produces hidden representations of the posts. Guo et al. [17] proposed a similar attention-based technique for a hierarchical word-post-subevent network using bidirectional LSTM. They used propagation features, such as the average number of reposts and comments, along with user and post-based features. Chen et al. [9] proposed an unsupervised learning model that treats rumor detection as an anomaly detection problem on Sina Weibo, a Chinese microblogging website. They used microblog features (such as question mark count, sentiment score and picture count) and post features (such as length, count of likes and URL count). The experimental results show that their method attains an accuracy of 92% and an F1 score of 89%. A self-learning semi-supervised deep learning model, which uses a bidirectional LSTM (BiLSTM) with a trust network layer, is applied to the FakeNewsNet dataset [35] and shows an F1 score of 88% [22].

- CNN-based approach: Yu et al. [42] found that RNN models are not suitable for early detection of rumors with limited input data, so they proposed a CNN-based approach for misinformation identification (CAMI). This model can extract essential features from the input sequence and performs well. Asghar et al. [3] proposed rumor identification based only on text features using a BiLSTM-CNN model, together with a web-based interface for rumor detection. Their method showed an accuracy of 86%. Lin et al. presented a recurrent CNN combined with a bidirectional GRU and an attention network, which helps capture the vital information at the word level and learn the temporal features [23]. They also use signal words from the text along with fractional user-based and content-based features.

- Hybrid models: Ruchansky et al. [27] proposed a model for fake news detection based on the text of an article, the temporal activity of user responses and the source users propagating it. They put forward a hybrid model that integrates features from all three categories to obtain a more precise classification. A recommender system determines a user's genuine interest based on user involvement [32]; therefore, user characteristics are vital. Song et al. [37] proposed a CNN-based model for credible early detection of rumors, in which feature vectors are extracted in each interval using a CNN and fed to an RNN. They also suggest that other features related to the user profile and propagation patterns can improve rumor detection. Liu et al. [24] proposed a model for early rumor detection that combines RNN and CNN to capture temporal patterns of global and local user features along the propagation paths. They used user-related characteristics such as user registration age, geo-enabled status and verified status. Ma et al. [26] proposed a Generative Adversarial Network (GAN) for rumor detection, using data based on information campaigns and promotion. Kumar et al. proposed a model for sentiment analysis of social network users that applies various deep learning models, such as CNN and long short-term memory (LSTM, a variant of RNN), with ensemble and attention mechanisms [21]. They used GloVe (global vectors for word representation) for word embedding.

Beyond these categories of deep learning-based approaches, researchers have used ensemble learning for rumor detection and transfer learning for fake news classification. Kotteti et al. [20] used an ensemble that combines RNN, GRU and LSTM models with various layers in the neural network. Kaliyar et al. [19] proposed FakeBERT, a transfer learning model that combines BERT (Bidirectional Encoder Representations from Transformers) with a deep convolutional neural network (CNN) using different kernel sizes and filters. The experimental results of the FakeBERT model on a fake news dataset show an accuracy of 98.90%.

In the literature, most researchers targeted temporal and text features. When considering temporal features, they rely on aggregated features, which hides the importance of post-wise features. Table 1 compares deep learning-based methods, most of which focus on textual and temporal characteristics. For text-based features, the most common models are RNN and LSTM. Table 1 also shows how features from various categories are used, which contributes to refining the detection accuracy for rumor and non-rumor events. Table 2 compares the performance of benchmarked methods in terms of accuracy and F1 score on a real-world Twitter dataset. This table also lists the text representation method used in each work. The most commonly used text representation in previous work is term frequency-inverse document frequency (TF-IDF), whereas more recent work uses word2vec and GloVe vectors. CNN-based methods are mainly used for early detection of rumors or fake news. Table 2 shows that the hybrid model performs best, with an accuracy of 89% [27]. In Table 2, the methods in [8, 27, 37] report either overall accuracy or cross-validation results. From the literature, we conclude that there is scope for improvement in the accuracy of rumor detection.

Given the advantage of hybrid models reported in the literature, this research combines the BiLSTM model with a multilayer perceptron (MLP) as a hybrid deep learning model. It explores different attributes from the user, content-based and lexical categories and from the text of a post. Features from each category are listed in section 3.

This section presents the problem definition, data collection and pre-processing, feature selection and the methodology followed using a deep learning model.

Rumor detection in social networks is formally presented as follows: the event-wise sequence of posts is given as input to the proposed model, which identifies whether the event is a rumor or a non-rumor. An event is any condition or incident that happens around us and is reported through news or messages, such as news related to bomb blasts, political statements or targeted organizations.

This research focused on the Twitter microblogging website. The data collection for rumors involves identifying rumor and non-rumor events from debunking sites, collecting data related to each event from Twitter and finally, cleaning data. This section presents the entire process of data curation.

We have identified rumor and non-rumor events from the debunking sites www.snopes.com and www.politifact.com. These sites give details of story circulation and the evidence for each news item. Figure 1 shows the recognition of a rumor event from snopes.com with a rating of False [16], and Fig. 2 shows the determination of a non-rumor event from the Politifact site with a news status of "Mostly True" [4]. Together, Figs. 1 and 2 illustrate event identification from these websites. The tweets were collected for each event from 1st March 2020 to 31st March 2020. Twitter's paid premium 30-day search endpoint, which returns tweets from the last 30 days, was used during the above period. The data for each event is collected by writing different search queries. Figure 3 shows the data collected for a news item by altering the keywords in the search queries, highlighted in bold.

Examples of the finalized rumor and non-rumor events are listed in Tables 3 and 4. These tables give the event headline, the count of posts for each event and the date of data collection. The statistics for the real-world data collected from Twitter are given in Table 5. 78% of events are identified from politifact.com and 22% from snopes.com. Besides the dataset we formed, we also use the publicly available Twitter dataset [28], constructed by [25] for rumor detection. Due to Twitter's policy, only tweet ids are given in that dataset for rumor and non-rumor events. Therefore, we extracted the tweet for each tweet id in JSON format. This research focuses on English posts only. Twitter does not return data for some events, for reasons such as the user no longer existing or being suspended. During data pre-processing, events with a single post and events in non-English languages are removed from the dataset. The final count of events is therefore 986, and the total number of posts is 267,708. The detailed steps of data pre-processing are explained further below.

The data extracted from Twitter contains URLs, hashtags, mentions, emoticons, special characters, etc., which must be pre-processed so that the cleaned text can be used as input to our model. The text is prepared by removing URLs, hashtags, mentions, emoticons, punctuation and non-ASCII characters using Python's regular expression library, re. Duplicate posts are removed from the dataset. We expand the contractions present in the tweet, such as "can't" to "cannot" and "don't" to "do not". Finally, the text is converted to lowercase. The detailed text pre-processing function is shown in Fig. 4. These clean tweets are used to extract lexical features and word embeddings. The content-based features are extracted from the original tweets.

Fig. 1 Identification of event as rumor from snopes.com
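The cleaning steps described above can be sketched with the standard re module as follows. This is a minimal illustration, not the function from Fig. 4: the helper names, the small contraction map and the exact regular expressions are assumptions, and the text is lowercased first (rather than last) so that the contraction map only needs lowercase keys.

```python
import re

# Small illustrative contraction map (assumed; the paper's full list is not given).
CONTRACTIONS = {"can't": "cannot", "don't": "do not", "won't": "will not", "it's": "it is"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # web links
TAG_RE = re.compile(r"[@#]\w+")                 # mentions and hashtags
NON_ASCII_RE = re.compile(r"[^\x00-\x7F]+")     # emoji and other non-ASCII characters
PUNCT_RE = re.compile(r"[^\w\s]")               # punctuation, including ASCII emoticons
SPACE_RE = re.compile(r"\s+")

def clean_tweet(text):
    """Return a lowercased tweet without URLs, tags, punctuation or non-ASCII text."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    text = text.replace("\u2019", "'")          # normalise curly apostrophes
    for short, full in CONTRACTIONS.items():    # expand before stripping punctuation
        text = text.replace(short, full)
    text = NON_ASCII_RE.sub(" ", text)
    text = PUNCT_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()

def deduplicate(posts):
    """Drop exact duplicate posts while keeping the original order."""
    seen = set()
    unique = []
    for post in posts:
        if post not in seen:
            seen.add(post)
            unique.append(post)
    return unique
```

Expanding contractions before stripping punctuation matters here, because the apostrophe is what identifies a contraction.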

Text data must be encoded in numeric form before it is input to a deep neural network. Based on the approximate average length of all posts, we set the maximum sequence length to 100. The Tokenizer separates each post into tokens and builds a word dictionary. This word dictionary is used to convert each post into a sequence of integers using the texts_to_sequences function. Padding is applied after the post so that all sequences have the maximum length. The embedding layer takes the sequence of a post as input and transforms each word into an n-dimensional word embedding vector, which captures the meaning of words in a post. The output of this embedding layer is passed as input to the BiLSTM model.

We use Empath [14] to analyze the text into lexical features, similar to Linguistic Inquiry and Word Count (LIWC). It provides a total of 190 lexical features. Table 6 shows the identified features from each category. We extracted 8 features from the user category, 12 from the content-based category and 190 from the lexical category. Principal component analysis (PCA) is applied as a dimensionality reduction technique to the 190 lexical features. The optimum number of components is determined using the cumulative explained variance graph in Fig. 5, which leads to 125 principal components. These features are normalized using a standard scaler and fed to a multilayer perceptron (MLP), one type of deep neural network.
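Two steps from this part of the pipeline can be illustrated with a short pure-Python sketch: integer encoding with padding after the post, and reading a component count off a cumulative explained variance curve. The function names are hypothetical, the code imitates the behaviour of a tokenizer rather than calling Keras, and the variance threshold is a free parameter because the paper does not state the value behind its choice of 125 components.

```python
from collections import Counter

MAX_LEN = 100  # maximum sequence length stated in the text

def build_vocab(posts):
    """Map each word to an integer id, most frequent first; 0 is kept for padding."""
    counts = Counter(word for post in posts for word in post.split())
    return {word: idx for idx, (word, _) in enumerate(counts.most_common(), start=1)}

def encode(post, vocab):
    """Convert a cleaned post into a list of integer ids, skipping unknown words."""
    return [vocab[word] for word in post.split() if word in vocab]

def pad_post(seq, max_len=MAX_LEN):
    """Truncate to max_len, then append zeros after the post to reach max_len."""
    seq = list(seq)[:max_len]
    return seq + [0] * (max_len - len(seq))

def components_for_variance(explained_ratios, threshold):
    """Smallest number of leading components whose cumulative ratio reaches threshold."""
    total = 0.0
    for k, ratio in enumerate(explained_ratios, start=1):
        total += ratio
        if total >= threshold:
            return k
    return len(explained_ratios)  # threshold never reached: keep every component
```

In practice the explained variance ratios would come from a fitted PCA over the standardized 190 lexical features; here they are simply passed in as a list.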

For feature selection in the user and content-based categories, features are examined through boxplots and heatmaps. Figure 6 shows the boxplot of the value distribution of user registration age; it can be seen that users sending genuine posts have been on Twitter much longer than users sending fake posts. Figure 7 shows the difference between the correlation matrices of user and content-based features for rumor and non-rumor posts. It can be observed that a few feature pairs (retweet count and follower count, verified user and follower count) are highly correlated in non-rumor data.
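A comparison of correlation matrices of this kind can be sketched as below. The paper does not say which correlation coefficient underlies Fig. 7, so Pearson correlation is an assumption, as are the function names and the dictionary-of-columns layout.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (columns must not be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def correlation_matrix(columns):
    """Pairwise correlations for a dict that maps feature name to a list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

def correlation_difference(rumor_columns, non_rumor_columns):
    """Entry-wise difference: non-rumor correlation minus rumor correlation."""
    rumor = correlation_matrix(rumor_columns)
    non_rumor = correlation_matrix(non_rumor_columns)
    return {a: {b: non_rumor[a][b] - rumor[a][b] for b in rumor[a]} for a in rumor}
```

A large positive entry in the difference marks a feature pair that is more strongly correlated in non-rumor posts than in rumor posts.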

This section presents the details of the deep learning models implemented in this research, including existing models such as word embedding, BiLSTM and MLP, and the newly proposed models, which are explained below:

The dataset contains several tweets twt, and every tweet twt consists of a sequence of n words, i.e., $t_1, t_2, \ldots, t_n$. Each word $t_i$ is transformed into an embedding vector $w_i \in E^{m}$, called a word embedding. The Keras embedding layer is used in this research. The input to the embedding layer is a two-dimensional matrix, also known as the word embedding matrix, represented by $E^{l \times m}$, where $l$ is the tweet's length and $m$ is the dimension of word

We use a variant of RNN, the bidirectional LSTM model (BiLSTM), which involves a forward LSTM and a backward LSTM. The LSTM selectively forgets part of the historical data through three gates (input gate, forget gate and output gate), adds the

The MLP is a deep neural network (DNN) used to learn the post-wise features from the user, lexical and content-based categories. A feature vector of size 200 is given as input to the MLP. The feature vectors from the MLP and the BiLSTM are combined and provided as input to a densely connected layer in the hybrid model. The Lex_PCA and UCL_PCA models are run on the MLP only.
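The fusion step can be written compactly as follows. The symbols are notation assumed here rather than taken from the paper: $h_{\mathrm{BiLSTM}}$ and $h_{\mathrm{MLP}}$ are the feature vectors produced by the two branches, $[\cdot\,;\cdot]$ denotes concatenation, $W$ and $b$ are the weights and bias of the densely connected layer, and $g$ is its activation function.

```latex
h = [\, h_{\mathrm{BiLSTM}} \,;\, h_{\mathrm{MLP}} \,], \qquad z = g(W h + b)
```

Under this reading, the dense layer sees text-sequence information and post-wise features side by side, so it can weight one against the other when classifying an event.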

Based on the above-mentioned existing models, we have implemented BiLSTM_Embed, Lex_PCA, UCL_PCA and BiLSTM_UCL models. Figure 10 represents a summary of the BiLSTM_UCL model.

This section presents the dataset used, baseline approaches, experimental environment, evaluation metrics and experimental analysis on various deep learning models.

The proposed method is evaluated on real-world and benchmarked Twitter datasets. The real-world data collected from Twitter is much smaller than the benchmarked dataset. Therefore, we combined the benchmarked and real-world data to obtain an extended dataset. Table 7 gives the details of the benchmarked and extended real-world datasets and the actual data sizes used in the experimental evaluation. The original benchmarked data thus grows by 7% event-wise and by 32% post-wise. Data availability after extraction varies across papers because some posts are no longer available at the time of data extraction from Twitter. We split the dataset into 70% for training, 20% for testing and 10% for validation.
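A 70/20/10 split of this kind can be sketched as follows. Whether the paper splits by event or by post, and whether it shuffles with a fixed seed, is not stated; the function name, the event-level split and the seed value are assumptions.

```python
import random

def split_events(events, train=0.7, test=0.2, seed=42):
    """Shuffle reproducibly, then cut into train, test and validation parts."""
    items = list(events)
    random.Random(seed).shuffle(items)      # local generator keeps the split repeatable
    n_train = int(len(items) * train)
    n_test = int(len(items) * test)
    train_part = items[:n_train]
    test_part = items[n_train:n_train + n_test]
    validation_part = items[n_train + n_test:]  # the remainder, roughly 10%
    return train_part, test_part, validation_part
```

For the 986 events reported above, this yields 690 training, 197 test and 99 validation events.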

The following baseline algorithms are used to compare and evaluate the proposed deep learning models. HAS-BLSTM [17] uses a hierarchical attention model for social information with three hierarchy levels (word, post and subevent) and the BiLSTM model for rumor detection.

CAMI [42] finds that a CNN model can extract the significant features from a post sequence and that such models are suitable for early detection of rumors.

CSI [27] proposed a model for fake news classification using features based on the text of an article, the temporal characteristics of user reply and origin users broadcasting it. They combined the RNN model with a deep neural network (DNN) by integrating features from all three categories to get a more accurate rumor classification.

The implementation environment includes Spyder (Anaconda) as the scientific Python development environment, the tweepy library to access and collect real-world Twitter data, and Keras with TensorFlow for deep learning. For the GPU environment, we used the Google Colab cloud service.

For a comprehensive evaluation, we adopt Accuracy, Precision, Recall and F1 score as evaluation metrics, defined in Eqs. (1), (2), (3) and (4). The confusion matrix summarizes predicted results against actual results, as shown in Fig. 11, where R stands for rumor and NR for non-rumor. Accuracy is the fraction of correct predictions over all predictions. Precision is the number of correct positive results divided by the number of positive results predicted by the classifier. Recall is the number of correct positive results divided by the number of all relevant samples.
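In terms of confusion-matrix counts, the standard definitions of these four metrics are given below. The symbols are notation assumed here: with rumor (R) taken as the positive class, $TP$ and $TN$ are correctly predicted rumor and non-rumor samples, and $FP$ and $FN$ are non-rumor samples predicted as rumor and rumor samples predicted as non-rumor. The numbering follows the order in which the metrics are listed in the text.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \tag{1}\\
\text{Precision} &= \frac{TP}{TP + FP} \tag{2}\\
\text{Recall}    &= \frac{TP}{TP + FN} \tag{3}\\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}
\end{align}
```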

This section explains the optimal hyperparameters used to set up the different models and the experimental analysis. Table 8 shows the optimal hyperparameters used in the experiment, including the parameters of the various deep learning models, the activation function and the loss function. The performance of various models is evaluated to finalize the proposed hybrid model. We use binary_crossentropy as the loss function and Adagrad as the optimizer. The models are trained with a batch size of 32. The hyperparameters used for Adagrad are a learning rate of 1e-1 and an epsilon of 1e-07. Early stopping with a patience of 3 and dropout of 0.5 are used to avoid overfitting. Because of early stopping, the actual number of epochs varies from 10 to 50, although the models are configured to run for 100 epochs. Each model is checked using the learning curves of accuracy and loss on the training and validation data. Figure 12 shows the learning curves of accuracy and loss for the BiLSTM_UCL model.
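The effect of early stopping with a patience of 3 can be illustrated with a small pure-Python sketch. It assumes that validation loss is the monitored quantity, which the paper does not state, and the function name is hypothetical; in the actual experiments this logic would be handled by the deep learning framework.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training stops for a given loss history."""
    best = float("inf")
    waited = 0                      # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:  # no improvement for `patience` epochs in a row
                return epoch
    return len(val_losses)          # ran through every epoch without triggering
```

For example, with validation losses 0.9, 0.7, 0.6, 0.61, 0.62, 0.63 the best value occurs at epoch 3 and training stops at epoch 6, which is how a 100-epoch budget can end after far fewer epochs.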

Initially, PCA is applied to the lexical features and the resulting 125 principal components are given as input to an MLP; this Lex_PCA model shows an accuracy of 91%. UCL_PCA is an MLP model that takes 145 features as input: 12 from the content-based category, 8 from the user category and 125 from the lexical category after applying PCA. UCL stands for User-Content-Lexical, and the model trained with these features shows an accuracy of 93%. The BiLSTM model takes posts with word embedding as input to the bidirectional LSTM and reaches 95% accuracy. This model has 5 dense layers in MLP. The final model, BiLSTM_UCL, combines the outputs of the previous two models and shows an accuracy of 97%. Table 9 compares the experimental results of the various deep learning models on the real-world and benchmarked datasets. Because the data collected from Twitter is much smaller than the benchmarked dataset, the proposed method is evaluated on the extended dataset. Table 9 shows a significant improvement in precision and recall for rumor events, from 0.90 to 0.96. The results are also similar on the benchmarked and extended datasets. Table 10 shows that combining the BiLSTM_Embed and UCL_PCA models improves the accuracy, giving 97% for the BiLSTM_UCL model. Among previous work, CSI [27] shows the highest accuracy, 89%, which the Lex_PCA model also shows. The baseline results in Table 10 are taken from the results reported in the related research papers for similar datasets and methodology. The proposed method is tested on the benchmarked dataset. Figure 13 compares the proposed model with existing models in terms of accuracy. Figure 14 shows the improvement across all implemented models, where accuracy improves from 89% to 97%.
The overall experimental results show that the proposed BiLSTM_UCL model substantially improves the accuracy of rumor detection. Table 9 presents the performance of the proposed models on two datasets of different sizes, with similar precision, recall and F1 score on both. Deep learning models are best suited to large datasets, and the experiments in this research demonstrate that the proposed method is scalable.

Fig. 12 The learning curve of accuracy and loss for the BiLSTM_UCL model

The computational complexity of NN models is analyzed in terms of multiplications per recovered output by Freire et al. in [15]. The parameters considered for an MLP are [batch, $n_s$, $n_i$], where batch is the batch size, $n_s$ is the memory size and $n_i$ is the feature count. Considering Lex_PCA, an MLP-based model with 125 features and 3 dense layers with $nd_1$, $nd_2$ and $nd_3$ neurons in the respective dense layers, the computational complexity (CC) of the MLP model can be given in Eq. (5) as:

Where a, b and c represent contributions from the input, hidden and an output layer of MLP.

A limitation of this research is that the real-world data collected is relatively small; therefore, we extended the benchmarked dataset by combining the collected real-world data with the existing dataset. Although the results are evaluated on the benchmarked dataset and against baseline algorithms, the dataset is not entirely available due to Twitter's policy. In previous work, methods are evaluated on both Sina Weibo and Twitter datasets. This research, however, is assessed only on the Twitter dataset and focuses only on English posts.

The diffusion of rumors and their impact on society is a massive problem in current social networks. To combat this, we have proposed rumor detection using essential post-wise features. Whereas previous work gave more importance to text and temporal features and showed moderate accuracy, this paper focuses on text, user, content-based and lexical features. The BiLSTM with word embedding, combined with the MLP model over these features, improves the accuracy. The experimental results are compared with state-of-the-art approaches and show a good improvement in accuracy. This research also collected real-world data from Twitter and evaluated the experiment on both real-world and benchmarked datasets. Lexical features with PCA components show an accuracy of 89%. Successive improvements across the proposed models led to the final combined BiLSTM_UCL model with significant features from the selected categories, which demonstrates an accuracy of 97%.

In the future, we plan to extend the same approach with temporal features and attention models. An attention model can be used to identify the significant attributes among the lexical features and could replace the feature selection method. For temporal characteristics, the word count of each post can be used to convert variable-length posts into fixed-length posts. Future research may also use multimedia-based features (such as image count, presence of multimedia content, whether the image in the post is real (is_real_image), and presence of a video link) to verify the news.

Conflict of interest The research work presented here has not been submitted to, nor under review, at another journal or other publishing venue. All authors have participated in conception and design, analysis and interpretation of the data, drafting the article or revising it critically for important intellectual content, and approval of the final version.

After COVID-19 vaccine, blood or plasma donation not allowed

Deep learning-based rumor detection on microblogging platforms: a systematic review

Exploring deep neural networks for rumor detection

Available: PolitiFact | Bill Gates warned in 2015 that we were unprepared for an infectious virus

Verifying information with multimedia content on twitter: a comparative study of automated approaches

A survey on fake news and rumour detection techniques

Information credibility on twitter

Call attention to rumors: deep attention based recurrent neural networks for early rumor detection

Unsupervised rumor detection based on users' behaviors using neural networks

COVID-19 killed fewer people than the flu

COVID-19 Vaccine Cause Herpes

Empath: understanding topic signals in large-scale text

Performance versus complexity study of neural network equalizers in coherent optical systems

Gargling with salt water or Vinegar 'eliminate' the COVID-19 coronavirus from the throat

Available: Will Gargling with Salt Water or Vinegar 'Eliminate' the COVID-19 Coronavirus? | Snopes

Rumor detection with hierarchical social attention network

Rumor detection on social networks: a sociological approach

FakeBERT: fake news detection in social media with a BERT-based deep learning approach

Ensemble deep learning on time-series representation of tweets for rumor detection in social media

Fake news detection using deep learning models: A novel approach

A novel self-learning semi-supervised deep learning network to detect fake news on social media

Rumor detection with hierarchical recurrent convolutional neural network

Early detection of fake news on social media through propagation path classification with recurrent and convolutional networks

Detecting rumors from microblogs with recurrent neural networks

Detect rumors on twitter by promoting information campaigns with generative adversarial learning

CSI: a hybrid deep model for fake news detection

The spread of low-credibility content by social bots

ML-EC2: an algorithm for multi-label email classification using clustering

Comparative study of classification algorithms for spam email detection

Personalized recommendation system with user interaction based on LMF and popularity model

Source detection of rumor in social network -a review

Origin identification of a rumor in social network

Fakenewsnet: a data repository with news content, social context, and spatiotemporal information for studying fake news on social media

Social network statistics

CED: credible early detection of social media rumors

Email sentiment classification using lexicon-based opinion labeling

Lee C, van den Bosch A (2017) Exploring lexical and syntactic features for language variety identification

A convolutional approach for misinformation identification

Detection and resolution of rumours in social media: a survey
