What goes on inside rumour and non-rumour tweets and their reactions: A Psycholinguistic Analyses

Sabur Butt*, Shakshi Sharma*, Rajesh Sharma, Grigori Sidorov, Alexander Gelbukh
* Both authors contributed equally to this research.
2021-11-09

Abstract. In recent years, the problem of rumours on online social media (OSM) has attracted much attention. Researchers have investigated it from two main directions: first, the descriptive analysis of rumours, and second, proposing techniques to detect (or classify) rumours. In the descriptive line of work, where researchers have tried to analyse rumours using NLP approaches, there isn't much emphasis on psycholinguistic analyses of social media text. These kinds of analyses on rumour case studies are vital for drawing meaningful conclusions to mitigate misinformation. For our analysis, we explored the PHEME-9 rumour dataset (consisting of 9 events), including source tweets (both rumour and non-rumour categories) and response tweets. We compared the rumour and non-rumour source tweets and then their corresponding reply (response) tweets to understand how they differ linguistically for every incident. Furthermore, we also evaluated whether these features can be used for classifying rumour vs. non-rumour tweets through machine learning models. To this end, we employed various classical and ensemble-based approaches. To filter out the highly discriminative psycholinguistic features, we explored the SHAP AI Explainability tool. To summarise, this research contributes by performing an in-depth psycholinguistic analysis of rumours related to various kinds of events.

The credibility of information is the most decisive issue on social media, as the unmoderated nature of social media text has resulted in several cases of misinformation spreading [11]. In this work, we focus on rumour, a specific kind of misinformation whose authenticity has not been verified [57]. In the past few years, researchers have analysed rumours from two different directions, which can be divided into descriptive analyses of rumours and the detection of rumours using a variety of machine learning and deep learning techniques [16, 31, 46]. Despite these apparently robust techniques, the increasing tendency to give rise to rumours motivates the development of systems that, by gathering and analysing the collective judgements of users [29], are able to reduce the spread of rumours by accelerating the sense-making process [10]. In particular, linguistics and natural language processing researchers have taken the onus to study how users have discussed rumours and to understand the psycho-linguistic attributes connected to rumour spreading and detection [17, 47]. Scientific studies that aim to understand the malicious intentions behind spreading rumours through the psychological processes involved in the use of language can help in the textual classification and behaviour analyses of users. See Section 2 for more details. To aid in the mitigation of misinformation, in this work, we performed an analysis of rumour vs.
non-rumour tweets using psycholinguistic approaches; psycholinguistics is the study of the interrelation between linguistic factors and psychological aspects. It should be noted that we do not consider user-level features and focus solely on textual information. To be specific, we used psycholinguistic attributes that tend to convey the latent (hidden) meaning of the text. However, the patterns of these attributes cannot be determined from individual instances; they require aggregated supervised data processed with computational psycho-linguistic methods and statistical evidence. To the best of our knowledge, this is the first study to use psycholinguistic features to conduct an in-depth analysis of rumour and non-rumour tweets. We also investigated how effectively the characteristics of psycholinguistic analyses can assist classification algorithms in predicting rumour and non-rumour tweets. As a step further, we also looked at the "why and what" part, that is, filtering which features are more important in the identification of rumours. Specifically, we investigate the following research questions:

RQ 1: Is there a difference in psycho-linguistic characteristics between rumour and non-rumour source tweets?
RQ 2: How can psycho-linguistic features be used to differentiate between the reactions that rumour and non-rumour tweets attract?
RQ 3: Does the contribution of psycho-linguistic features vary from event to event, or do they remain consistent across all events?
RQ 4: Can we exploit these features for classifying rumour and non-rumour tweets using machine learning models?
RQ 5: Which psycho-linguistic features are highly discriminative for identifying the rumour and non-rumour classes in the classification task?

To answer these research questions, we used the PHEME-9 dataset, consisting of 9 different events (Section 3). We performed psycho-linguistic studies to address RQ1, RQ2 and RQ3, and extracted four types of features from the dataset: LIWC, Readability, SenticNet, and Emotions. All of these features provided us with more insight and perspective into the rumour and non-rumour tweets. We calculated the statistical significance and mean values of every psycho-linguistic feature for rumour source tweets, non-rumour source tweets, reactions to rumour tweets, and reactions to non-rumour tweets, respectively. The statistical significance tests helped us assess the difference between the rumour and non-rumour psycho-linguistic features. We explain the methodology in Section 4 and the results of our analysis in Section 5. For RQ4 and RQ5, we further used machine learning classifiers (classical as well as ensemble-based) on every event to evaluate the effectiveness of the psycholinguistic features in classifying tweets into rumour vs. non-rumour classes. Lastly, we use SHAP, an AI explainability tool based on Shapley values, to identify the contribution each feature makes to the model (Section 6). Finally, we conclude this article with some future work in Section 7. Psychological perspectives have been used in several studies to find a correlation with conspiracy theories. Douglas et al. [12] wrote extensively on the psychological factors and divided them into existential (e.g., desire for control), social (e.g., desire to maintain a positive image of the self or group) and epistemic (e.g., desire for understanding) factors. The authors explained that these factors contribute to the popularity of conspiracy theories.
Another study [26] discovered that conspiracies arise from the need for uniqueness. Similarly, in the light of the COVID-19 pandemic, researchers [6] have correlated a high level of conspiracy thinking among parents with the delay of vaccination in children. While there are studies that have attempted to discover the correlation of psycholinguistics with fake news [47] and conspiracy [41] in general, we narrow it down to the understanding of rumour. Various NLP studies have been published on rumour detection in text. These studies mostly revolve around the data collection, techniques and features suitable for identifying rumour text. Other than PHEME, which is the benchmark dataset for rumour detection and is used in this study, popular text-based datasets include RUMDECT [30], SNAP data [55], CrisisLexT26 [36], MULTI [21], KWON [25] and RUMOUREVAL [11]. Among the components of rumour detection, rumour has been explored [52] in the context of rumour tracking, rumour stance classification, rumour detection, and rumour veracity classification. We analysed the most commonly used techniques for rumour tasks and observed the use of traditional machine learning (TML) methods with selective feature engineering, deep learning (DL) models, and hybrid models. In [58], the authors introduced context-aware rumour detection using a sequential classifier to detect rumours from tweets divided into five news events. In an outbreak of stories, identifying the emergency is also important for the timely detection of credible information. The work [54] used an unsupervised algorithm to label tweets as credible or incredible and identified the urgency of news verification using supervised machine learning methods and multiple features (content, author, and diffusion). The rumour detection task can be topic-level [56] or post-level [11], where the task is to identify whether a topic is relevant to the text or whether the post contains a rumour, respectively. The trend in rumour detection has shifted quickly from the topic level towards the post-level understanding of rumour and hence needs insightful analyses for understanding the linguistic and psycho-linguistic attributes of rumour. The need to understand the explainability aspect of rumours stems from the need for early detection. Early detection of a rumour is essential to mitigate the harm, and the study [31] segregated rumours using a propagation tree. They used recursive neural networks to classify tweets into false rumour, non-rumour, unverified rumour and true rumour. A Recurrent Neural Network (RNN)-based provenance-aware approach was proposed [13], combining textual information and provenance information to enhance the results. To tackle cases where provenance details were missing, they used a fusion of text and provenance information. Among the machine learning models [2, 16, 21, 42], Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), Decision Tree (DT), Logistic Regression (LR), and XGBoost have been used repeatedly. Deep learning models (Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Recurrent Neural Network (RNN)) [24, 35, 44] and transformer-based models (XLNet, BERT, RoBERTa, DistilBERT, and ALBERT) [9, 45] have often given state-of-the-art results in multiple rumour tasks.
Psycholinguistic features in rumour detection studies have involved the use of Long Short-Term Memory (LSTM) networks with LIWC features, i.e., swear words and personal pronouns [50], and emotions [53]. The study [53] showed the existence of false rumours that triggered fear, disgust and surprise in the rumour replies, while [19] proposed an LSTM-based model with emotions to check the credibility of articles. The role of user profiles was evaluated in [48], which showed that implicit (age, location) and explicit (follower count, status count) features contributed to fake news detection. In addition, they showed how the combination of these features with psycho-linguistic characteristics (LIWC, writing styles) can be effective. Another study [18] proposed a CNN-based model combined with personality traits (Big Five) and LIWC to discriminate between fake news spreaders and fact checkers. Similarly, the authors of [17] gave a comparative analysis over various profile, linguistic and psychological characteristics (Big Five personality traits, LIWC, emotions) and used a CNN-based classifier with word embeddings and psycho-linguistic characteristics for classification. Contrary to these studies, our main focus is to provide insights into the source as well as reaction tweets, comparing different events, as every event triggers a different insight. Moreover, we discuss the explainability part in the context of which of these features contribute most to the rumour or non-rumour classes as well as their reactions. It should be noted that none of these papers addresses the explainability aspects of psycholinguistics in rumour tweets and reaction tweets. This study has been carried out using the PHEME dataset [23], which consists of nine events, wherein all the events are breaking news. To be specific, each event contains the source tweets, which are divided into rumour and non-rumour source tweets. Similarly, every source tweet has a set of reaction tweets. Each reaction is triggered by either a rumoured or a non-rumoured source tweet, and the reactions are accordingly grouped into rumoured reaction tweets and non-rumoured reaction tweets, respectively. However, it should be noted that the reactions have no ground truth to be labelled as rumour or non-rumour; they are simply reactions in the form of replies. The dataset initially comprised five events [58] but was later extended to nine events to bring more variety in context. Understanding the events behind rumours in the text also helps us understand the linguistic differences, since these incidents were rife with rumours and gained significant media attention. The nine incidents are the Charlie Hebdo shooting, the Sydney siege, the Ferguson unrest, the Ottawa shooting, the Germanwings plane crash, Putin missing, Prince playing in Toronto, the Gurlitt art collection, and the Ebola/Essien hoax.
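The PHEME-9 release is distributed as a folder hierarchy of events, each split into rumour and non-rumour threads, with every thread holding one source tweet and its reactions as Twitter JSON files. Below is a minimal loading sketch under stated assumptions: the root path is hypothetical, and the exact folder names follow the "all-rnr-annotated-threads" layout and may differ slightly between PHEME releases (e.g., source-tweet vs. source-tweets).

```python
import json
from pathlib import Path

# Hypothetical local path to the PHEME-9 release; folder names follow the
# "all-rnr-annotated-threads" layout and may differ in other releases.
ROOT = Path("all-rnr-annotated-threads")

def load_event(event_dir):
    """Yield (label, source_text, reaction_texts) for every thread of one event."""
    for label in ("rumours", "non-rumours"):
        for thread in sorted((event_dir / label).iterdir()):
            # One JSON file per source tweet; "source-tweet" (singular) in older releases.
            src_file = next((thread / "source-tweets").glob("*.json"))
            source = json.loads(src_file.read_text())
            reactions = [json.loads(p.read_text())
                         for p in (thread / "reactions").glob("*.json")]
            yield label, source["text"], [r["text"] for r in reactions]

for event_dir in sorted(ROOT.iterdir()):
    if event_dir.is_dir():
        threads = list(load_event(event_dir))
        print(f"{event_dir.name}: {len(threads)} threads")
```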
We combined various psycholinguistic features such as LIWC, SenticNet [8], readability indexes, and emotions to bring out true insights about user patterns. In computerized text analyses, Linguistic Inquiry and Word Count (LIWC) [37] is a gold standard for understanding the linguistic aspects of motivations, thoughts, feelings and personality. SenticNet is used to derive concept-level sentiment analyses from text. Readability features indicate how easy a text is to interpret, depending on its unique attributes. Finally, emotions show us the true nature of the feelings, which cannot be captured by plain polarity and sentiment detection. We first thoroughly pre-processed the noisy Twitter data and then extracted the previously mentioned features from it. The LIWC features were extracted through the LIWC (2015) program, which records all punctuation and words. We used the Textatistic 1 Python library to extract readability features, including Gunning Fog, Flesch Reading Ease, Flesch-Kincaid, Simple Measure of Gobbledygook (SMOG) and Dale-Chall. It is important to keep the period in a sentence to extract readability features; hence, we only removed hashtags, user mentions, emojis and URLs. SenticNet features were calculated using the SenticNet API 2, and the sentic words were combined to obtain the phrase features. For SenticNet features, we used word lemmatization and removed all punctuation, URLs, user mentions, hashtags, and custom stopwords (excluding negation words) during pre-processing. Emotions were calculated using a DistilRoBERTa base model [43] trained on a combination of multiple emotion datasets for English, predicting Ekman's 6 basic emotions plus a neutral class, which can be tested on the HuggingFace API 3. To verify that the differences between the features (indicated in Section 4.1) of the rumour and non-rumour classes are significant, we employed the Kolmogorov-Smirnov (KS) test 4. This is a nonparametric test used to compare two distributions; in our case, the two distributions correspond to rumour and non-rumour features, as explained in Section 5. Next, in Section 6, we used these features as inputs to machine learning models to evaluate whether they are good enough to classify a tweet as rumour or non-rumour. Furthermore, we utilized the SHAP explainability tool 5, which indicates which features are more relevant than others when making predictions.
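As a concrete illustration of this pipeline, the sketch below extracts one emotion feature and one readability feature per tweet and then runs the two-sample KS test on the rumour vs. non-rumour distributions. The emotion checkpoint name is an assumption (any Ekman-style classifier on the HuggingFace hub works the same way), and rumour_texts / nonrumour_texts are assumed to be lists of pre-processed tweet strings.

```python
from scipy.stats import ks_2samp
from textatistic import Textatistic
from transformers import pipeline

# Ekman's 6 emotions + neutral; the exact checkpoint name is an assumption.
emotion = pipeline("text-classification",
                   model="j-hartmann/emotion-english-distilroberta-base",
                   top_k=None)  # return scores for all seven classes

def extract_features(text):
    """One emotion score and one readability score per tweet."""
    feats = {d["label"]: d["score"] for d in emotion(text)[0]}
    feats["fleschkincaid_score"] = Textatistic(text).fleschkincaid_score
    return feats

# rumour_texts / nonrumour_texts: assumed lists of cleaned tweets (periods kept).
rumour_fear = [extract_features(t)["fear"] for t in rumour_texts]
nonrumour_fear = [extract_features(t)["fear"] for t in nonrumour_texts]

# Two-sample Kolmogorov-Smirnov test: a small p-value means the rumour and
# non-rumour feature distributions differ significantly.
stat, p = ks_2samp(rumour_fear, nonrumour_fear)
print(f"fear: KS statistic = {stat:.3f}, p-value = {p:.4f}")
```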
In this section, using psycholinguistic analysis, we studied the characteristics of rumour and non-rumour source tweets and the corresponding reactions to those tweets. Different events show us insightful information on the nature of rumour categories and how the social and psychological meaning of words can change in every scenario. In addition, we calculated statistical significance tests to validate that the differences between the two classes (rumour and non-rumour) were not due to random chance. We used the mean value to evaluate the overall influence of the individual features for every class. LIWC features tell us about the psychometric properties of the tweets. Tables 2 and 3 show the results of individual events for source tweets and reaction tweets, respectively. The tables clearly show a significant difference between rumour and non-rumour tweets based on LIWC categories. Table 4, on the other hand, is an aggregated representation of the significant differences for LIWC features in source tweets and their reactions. 5.1.1 Linguistic Processes. These processes include text-related analysis. In particular, Word Count (WC) tells us about the engagement and domination of a user in a conversation. In deceptive scenarios, word count needs to be balanced: the text must be descriptive enough to be convincing, yet too many words can reveal inaccuracies. In every event, word count proved to be statistically significant in either the source tweets or the reaction tweets. The average word count (AWC) across all events was 20 for rumour source tweets and 14.63 for rumoured reply tweets. While the average word count of rumoured and non-rumoured source tweets was almost the same, the AWC of non-rumoured reply tweets was found to be higher than that of rumoured reply tweets. One possible reason for this is that source tweets needed more convincing and engagement and also attracted more non-rumoured reply tweets denying the rumoured claims. The function words (Tables 2 and 3, Row 2) in LIWC include total pronouns, impersonal pronouns, articles, prepositions, auxiliary verbs, common adverbs, conjunctions and negations. Among the personal pronouns (I, s/he, they, we), we can see that the mean in rumour source tweets is 1, against 3.20 in non-rumour source tweets. Pronouns give us a lot of insight into the personality of users, as they indicate how users communicate with each other [1, 3, 49] and what the intent of the conversation is [5]. We found the mean scores of 1st person singular (I), 1st person plural (we), 2nd person (you), 3rd person singular (s/he) and 3rd person plural (they) all to be higher in non-rumour source tweets. The same trend was seen in the rumour and non-rumour reaction tweets, where the use of personal pronouns was significantly higher overall; the non-rumour reactions still had a higher mean. Although function words were not collectively significant in reaction tweets, when we divided the significance by event (Table 3), we saw 5 out of 8 events showing a significant difference between non-rumour and rumour reaction tweets. Use of prepositions (to, with, above) indicates concern with precision [34, 39] and was found to be higher in rumour source tweets (μ = 11) compared to rumour reaction tweets (μ = 7.81). Negation words psychologically correlate with inhibition [38, 51] and were higher in rumour reaction tweets, with a mean value of 2.02 compared to 1 in rumour source tweets. In both cases, non-rumour tweets had more negation words than rumour tweets. Similarly, other attributes such as conjunctions (μ = 2 in RS, μ = 2.39 in NRS), common adverbs (μ = 2 in RS, μ = 2.35 in NRS), auxiliary verbs (μ = 4 in RS, μ = 4.71 in NRS) and impersonal pronouns (μ = 1 in RS, μ = 2.34 in NRS) all had lower means in rumour source tweets. Impersonal pronouns, auxiliary verbs, conjunctions and negation words were used more in reaction tweets, where non-rumour reactions had a higher mean than rumour reactions. 5.1.2 Psychological Processes. Among the psychological processes, we analyzed affective processes (positive emotions, negative emotions), social processes (female references, male references, family, friends), cognitive processes (insight, causation, discrepancy, tentativeness, certainty, differentiation), perceptual processes (see, hear, feel), biological processes (body, sexual, ingestion, health), drives (affiliation, achievement, power, reward, risk), time orientations (past focus, present focus, future focus), relativity (motion, space, time) and personal concerns (work, leisure, home, money, religion). Though affective processes are discussed extensively in Section 5.4, the LIWC statistics showed positive emotion words (μ = 1 in RS, μ = 2.16 in NRS) and negative emotion words (μ = 2 in RS, μ = 3.19 in NRS), breaking down into more anxiety, anger and sadness in the non-rumour source tweets. The rumour (μ = 7.034) and non-rumour (μ = 7.624) reaction statistics show a similar story, with higher affective processes in non-rumour tweets (Table 2). Social words correlate with social concerns [33, 34, 50] and social support [27]; their distributions across the two classes can be observed in Table 2. Cognitive processes give us insight into the reasoning and differences in the thought processes of the authors [1, 4, 28, 50].
Cognitive processes showed a significant difference in most events for both reaction and source tweets (Tables 2 and 3, Row 5). The events of Putin and Prince had no significant difference in cognitive processes due to the nature of the rumour (no supporting claim). Cognitive processes were also used more by the users in non-rumour source and reaction tweets. Except for the tentative words (maybe, perhaps), which were used more in the rumour source tweets and reaction tweets, all subcategories of cognitive processes had a higher mean in the non-rumour source and reaction tweets. Perceptual processes tell us about the sensory experiences in the text, including seeing, hearing and feeling related words. We saw that the perceptual processes (Tables 2 and 3, Row 6) did not create much impact in the individual scenarios, except in Putin's case in the source tweets, where the category was very relevant to the scenario of Putin being absent since his last sighting. Similarly, biological processes (body, health, sexual, ingestion) were also very scenario-specific, being significant in Sydney's case, which was about a hostage scenario and an act of terrorism. We conclude that biological processes (Tables 2 and 3, Row 7) become significant where the incident is directly related to health, the body, sex, etc. Time orientations among the psychological processes gave us good insight into the rumour and non-rumour scenarios, where rumour source tweets (Table 2, Row 13) were more past-focused (μ = 3 in RS, μ = 1.52 in NRS) and non-rumour source tweets were more present-focused (μ = 6.42 in NRS, μ = 5 in RS). Reactions (Table 3, Row 13) to non-rumour and rumour tweets followed the same trend. Relativity (area, bend, exit) includes motion (arrive, car, go), space (down, in, thin) and time (end, until, season) related words. Relativity, in general, proved to be less significant in the reactions (Table 3, Row 9) and significant in five out of eight scenarios in the source tweets (Table 2, Row 9). Though relativity is more heavily used in rumour source tweets and their reactions, an important observation is that motion-related words were more present in the non-rumour source tweets and in the reactions to rumour tweets. The combination of a higher mean of negation words and motion-related words in the non-rumour source tweets shows an attempted correction of the direction of the conversation. People show their personal concerns in the reactions and sources of tweets, where we can see an emphasis on work, money, leisure, home, religion and death related words. Personal concerns were significant in some events (Tables 2 and 3, Row 12), i.e., Charlie, Sydney and Ferguson, in the source tweets. However, this can be case-dependent, as all these events were linked to violence and reported abuse. In the incidents where this mattered, we saw a higher mean of work, death and leisure related words in the rumoured cases, whereas home, money and religion related words were used more in the non-rumoured source tweets. Informal language (Tables 2 and 3, Row 10) on social media is expected in general, and the data presented a similar picture, with non-rumoured source tweets having the higher mean (μ = 5.32 in NRS, μ = 5 in RS). The reactions to tweets also had a high mean of informal words in general (μ = 4.024 in NRS, μ = 4.051 in RS). Drives (Tables 2 and 3, Row 8) are the motivational factors, and in rumour detection they can reveal a lot about the authors' motives/agendas behind the tweets.
Drives include affiliation (ally, friend), power (superior, bully), reward (prize, benefit), risk (danger, doubt), and achievement (win, success) related words. Drives had a significant impact on the differentiation between non-rumour and rumour sources. Breaking down the reactions to rumour and non-rumour tweets, drives also played a vital role in differentiating how the motivations of users differed in every scenario, influenced by the rumour and non-rumour sources. Further narrowing down revealed that rumoured sources had higher means for reward, risk and power related words, whereas non-rumour source tweets had higher means for affiliation and achievement related words. Punctuations (Tables 2 and 3, Row 11) such as question marks and assents show how people communicate with each other. Punctuations can also show us attempted effort at explanation or emphasis through indicators such as apostrophes and parentheses. Punctuations did not have much impact in the reactions to rumour and non-rumour tweets, but played some part in differentiating the rumour and non-rumour source tweets. We observed that non-rumour source tweets have relatively higher means of periods, commas, semi-colons, question marks, apostrophes and parentheses. The reactions, in general, had a higher mean of punctuation for both rumour and non-rumour. Other Grammar. The other grammar category (Tables 2 and 3, Row 14) is the only category that is statistically significant in both rumour and non-rumour source tweets and their reactions (Table 4, Row 14). The grammar category of LIWC includes common verbs, common adjectives, comparisons, interrogatives and quantifiers. Common verbs can explain the temporal focus of the tweets, along with common adjectives that identify the actions. In the non-rumour and rumour source tweets, common adjectives show almost similar means and hence do not contribute to differentiation. The reactions to the rumour and non-rumour tweets, however, used many more common adjectives and verbs. Non-rumour source tweets and the reactions to non-rumour source tweets had a higher mean of interrogative and comparison words compared to rumour source tweets and their reactions, signifying that users questioned the non-rumour tweets more than the rumour tweets. Reactions of both categories also showed a greater mean of quantifiers. Readability allows us to differentiate between a text that is easy to comprehend and a text that is complicated and requires a high level of education or intelligence to understand. Of the many readability scores used to evaluate text, we considered the most popular tests for evaluating tweets. Tables 5 and 6 show the significant differences in readability scores between rumour and non-rumour tweets and their reactions for the various events. The tables present the Flesch score [14], Flesch-Kincaid score [22], Gunning-Fog score [20], SMOG score [32], and Dale-Chall score [15]. Although we found no collectively significant difference in readability scores, we found many instances of significance individually when divided per event. It would be fair to say that Twitter rumour spreaders and their responders engage in casual conversations designed to target the masses for rapid spreading of news, which is different from formal news platforms. SenticNet features ∈ [−1, 1] give us a commonsense understanding of the text by translating the hourglass of emotions [7, 40] into statistical values. We considered the sentic values (Aptitude, Pleasantness, Attention, and Sensitivity) and the polarity associated with each concept.
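A hedged sketch of how per-word sentics can be queried and averaged into phrase-level features is shown below. It uses the community senticnet PyPI package; since the names of the four sentic dimensions vary across SenticNet releases, the result is treated as an opaque mapping rather than hard-coding the dimension names.

```python
from senticnet.senticnet import SenticNet

sn = SenticNet()

def phrase_sentics(words):
    """Average the sentic dimensions of the in-vocabulary words of a phrase."""
    totals, n = {}, 0
    for word in words:
        try:
            sentics = sn.sentics(word)  # mapping: dimension name -> value in [-1, 1]
        except KeyError:                # out-of-vocabulary concept: skip it
            continue
        n += 1
        for dim, value in sentics.items():
            totals[dim] = totals.get(dim, 0.0) + float(value)
    return {dim: total / n for dim, total in totals.items()} if n else {}

print(phrase_sentics(["hostage", "fear", "police"]))
print(sn.polarity_value("hostage"))  # coarse polarity label for a single concept
```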
We observed a shift of emotions throughout the tweets, giving a mix of SenticNet values. The aggregated statistical significance can be seen in Table 7. Table 9 shows significance in many individual scenarios, such as the Sydney case, where all SenticNet features had significant differences among reactions. Reactions to rumour and non-rumour tweets gave us negative mean pleasantness (μ = -0.01907 in R, μ = -0.0258 in NR), weighing more towards non-rumour tweets, along with sensitivity (μ = 0.0756 in R, μ = 0.0853 in NR), polarity (μ = 0.0373 in R, μ = 0.0383 in NR) and aptitude (μ = 0.05901 in R, μ = 0.0681 in NR). Attention-related emotions (interest, anticipation and vigilance) were seen more in rumour reactions (μ = 0.04763 in R, μ = 0.0389 in NR), clearly showing how people are more attentive towards rumoured content that is designed to draw attention. The statistics indicate that SenticNet values for non-rumour reactions draw more on emotions like annoyance, anger, acceptance, trust, grief and sadness compared to rumour reactions. It should be noted that the emotions triggered depend on the scenario and the drive of the rumour spreaders. Table 10 shows the emotion percentages across the rumour and non-rumour categories. Fear and sadness are the two most instigated emotions in the rumour tweets. The reactions to rumours showed that the percentages of fear and sadness were converted into anger and surprise, with neutral being the most prevalent emotion. Non-rumour source tweets had the highest percentage of neutrality, followed by fear, anger and sadness. The reactions to non-rumour source tweets had a lower percentage of fear and greater percentages of neutrality, anger and surprise. We can see patterns of rumoured tweets trying to use negative emotions to instil fear among people, and as a reaction, many people felt fear compared to the reactions to non-rumour sources in general. It should be noted that the majority of the rumoured incidents in this study were related to some sort of tragedy; however, the distribution of the same news in rumoured and non-rumoured forms shows the extent of negative and positive emotions used to achieve the potential motives. Table 11 gives a bird's-eye view of the analyses section. We partitioned the rumour and non-rumour source tweets and their reactions based on feature impact and mean. The analysis is further strengthened by the SHAP plots and explanations given in Section 6. In this section, we investigate how effective the four types of features (discussed in the previous section) are at distinguishing between the rumour and non-rumour categories (classes). We use these features as inputs to various machine learning (ML) models, both classical and ensemble-based, to categorise each tweet as rumour or non-rumour. Each event in the dataset is passed through 30 ML models individually. Moreover, we train two separate models based on two types of inputs, that is, source tweet features and reply tweet features. As previously stated, if a source tweet is labelled as a rumour, all of its reply tweets are also labelled as rumours, and vice versa. This aids in determining whether the individual features of source and reply tweets are distinct enough to classify the tweet into one of the two classes. We split the source tweets (psycho-linguistic features) dataset into an 80% train and 20% test set for each event. Further, we applied a ten-fold cross-validation technique on the train set, which resulted in 10 train-fold sets. Following that, each ML model is trained on each train-fold set and then evaluated on the 20% test set. We also apply an oversampling technique where the dataset is imbalanced. The same procedure is followed for reply tweets. We used the default hyper-parameters of the ML models provided by the scikit-learn 6 package, as sketched below.
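A minimal sketch of this per-event setup, including the SHAP attribution step discussed in the next paragraph, is given below. X is assumed to be a pandas DataFrame of psycho-linguistic features with string labels y; the choice of random oversampling and the positive-class index in the SHAP output are assumptions.

```python
import shap
from imblearn.over_sampling import RandomOverSampler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold, train_test_split

# X: DataFrame of psycho-linguistic features for one event; y: "rumour"/"non-rumour".
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Oversample the minority class in the training portion only.
X_train, y_train = RandomOverSampler(random_state=42).fit_resample(X_train, y_train)

# Train on each of the ten folds' training portions; evaluate on the held-out 20%.
scores, model = [], None
for train_idx, _ in StratifiedKFold(n_splits=10, shuffle=True,
                                    random_state=42).split(X_train, y_train):
    model = RandomForestClassifier().fit(X_train.iloc[train_idx],
                                         y_train.iloc[train_idx])  # default hyper-parameters
    scores.append(f1_score(y_test, model.predict(X_test), pos_label="rumour"))
print(f"mean F1 over folds: {sum(scores) / len(scores):.3f}")

# SHAP attributions for the last trained model; for classifiers, shap_values is
# typically a list with one array per class (index 1 assumed to be "rumour").
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values[1], X_test)
```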
Due to space constraints, we only report the results of the best performing model, the Random Forest model, in Table 12. Each of the eight events is evaluated on four metrics: Accuracy, Precision, Recall, and F1 Score. The table shows that the results for source tweets are better than (6 cases) or equal to (2 cases) the results for reply tweets, implying that source tweets are more important than reply tweets for classifying rumour and non-rumour tweets. Next, the SHAP explainability AI tool is utilized to evaluate the contribution of each feature to the classification task. Specifically, by computing the average marginal contribution of each feature, this tool helps determine the significant features. Figure 1 shows the significant features for the rumour class in descending order. Due to space limits, we only show the SHAP plots of the Charlie Hebdo event, for both source and reply tweets, in Figures 1a and 1b respectively. It can be noted that for the source tweets, fear is the highest contributing attribute, whereas for reply tweets the relativ (relativity) attribute contributes the most. We also illustrate the features that have a positive and negative impact on the class. In particular, for the fear attribute, the red-coloured samples (high feature values) lie mostly on the right side of the x-axis and outnumber the blue-coloured samples, which indicates a positive impact on the rumour class. This means that the higher the value of the fear attribute, the better the chances of predicting the rumour class. This is intuitive, as the fear emotion is high in rumour tweets. The features surprise, personal, sadness, dalechall_score, relativ, summary, attention, drives, fleschkincaid_score, grammar, WC, time, and percept all have a positive impact on the rumour class, as seen in Figure 1a. We also noticed that the SHAP plot of reply tweets (Figure 1b) shares most features with the SHAP plot of source tweets (Figure 1a) but adds a few different ones: sensitivity, affect, informal, and bio, all of which have a positive influence on the rumour class. The remaining attributes in both figures have a negative impact, meaning that as the value of these attributes decreases, the likelihood of correctly predicting the rumour class increases. We noticed a similar trend in other events involving terrorism or killings, such as the Sydney siege and the Germanwings crash. In comparison, in non-terrorist events such as Ferguson, an event about a protest, we observed (not shown due to space limits) that more features have a positive influence with respect to the rumour class than in the Charlie Hebdo event, including language, fleschkincaid_score, cogproc, surprise, sadness, personal, anger, gunningfog_score, disgust, WC, function, drives, social, aptitude, bio, neutral, fear, time, polarity, attention, percept, and smog_score. One possible explanation could be that the nature of the event was about speculation and future occurrences, and was driven by fear. The purpose of this research was to perform an in-depth analysis of the psycholinguistic side of the rumour task.
We discovered a substantial difference between rumour and non-rumour psycho-linguistic source features, as well as between reply features. We found that rumour source tweets used more past-related words and prepositions and contained drives (motivations) related to reward, risk, and power. Similarly, non-rumour source tweets had higher means for features such as affective processes, cognitive processes (insight, causation, discrepancy, certainty, differentiation), present-related words, and informal language, and had drives related to affiliation and achievement. The highest percentage of neutrality was found in non-rumour source tweets and non-rumour reactions, whereas rumour source tweets were driven by fear and grief, and their reactions invited anger and fear. Attention-related emotions (interest, anticipation, and vigilance) were more prevalent in rumour reactions, according to SenticNet. Readability had a considerable impact on the majority of events. We also explored the effectiveness of these features in predicting rumours. Specifically, we discovered that the ensemble-based Random Forest model outperformed the other models used for all events. As machine learning models are black-box in nature, we utilised the SHAP AI explainability tool to look for the features that are more important than others. This helped in understanding which features contribute the most to classifying the tweets into one of the two categories. For future work, we plan to work in multiple directions. One possible direction is to examine these features from a user-level perspective, which would aid in comprehending the human psychology of rumour spreaders. Another possible extension is to include more psycho-linguistic features, such as morphological features and referential cohesion, in our future study.

References

[1] Talk to me: Foundations for successful individual-group interactions in online communities
[2] Grigori Sidorov, and Alexander Gelbukh. 2021. CIC at CheckThat! 2021: Fake news detection using machine learning and data augmentation
[3] Telling losses: Personality correlates and functions of bereavement narratives
[4] Physical and psychological effects of written disclosure among sexual abuse survivors
[5] Conflict, arousal, and curiosity
[6] Parent psychology and the decision to delay childhood vaccination
[7] The hourglass of emotions
[8] Sentic API: a common-sense based API for concept-level sentiment analysis
[9] Transformer-Based Language Model Fine-Tuning Methods for COVID-19 Fake News Detection
[10] Pheme: Veracity in Digital Social Networks
[11] Geraldine Wong Sak Hoi, and Arkaitz Zubiaga
[12] The psychology of conspiracy theories
[13] Provenance-based rumor detection
[14] Simplification of Flesch reading ease formula
[15] A new readability yardstick
[16] Rumor detection of Sina Weibo based on SDSMOTE and feature selection
[17] Detection of conspiracy propagators using psycho-linguistic characteristics
[18] The role of personality and linguistic patterns in discriminating between fake news spreaders and fact checkers
[19] Leveraging emotional signals for credibility detection
[20] The fog index after twenty years
[21] Multimodal fusion with recurrent neural networks for rumor detection on microblogs
[22] Derivation of new readability formulas (automated readability index, fog count and flesch reading ease formula) for navy enlisted personnel
[23] PHEME dataset for Rumour Detection and Veracity Classification
[24] Ensemble Deep Learning on Time-Series Representation of Tweets for Rumor Detection in Social Media
[25] Prominent features of rumor propagation in online social media
[26] I know things they don't know! Social Psychology
[27] Feedback for guiding reflection on teamwork practices
[28] Expressing health experience through embodied language
[29] Hawkes processes for continuous time sequence classification: an application to rumour stance classification in twitter
[30] Detecting rumors from microblogs with recurrent neural networks
[31] Rumor detection on twitter with tree-structured recursive neural networks
[32] SMOG grading - a new readability formula
[33] Gender differences in language use: An analysis of 14,000 text samples
[34] Lying words: Predicting deception from linguistic styles
[35] A comprehensive low and high-level feature analysis for early rumor detection on twitter
[36] What to expect when the unexpected happens: Social media communications across crises
[37] The development and psychometric properties of LIWC2015
[38] Linguistic predictors of adaptive bereavement
[39] Words of wisdom: language use over the life span
[40] The nature of emotions: Human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice
[41] Psycholinguistic Markers of COVID-19 Conspiracy Tweets and Predictors of Tweet Dissemination
[42] Leveraging the implicit structure within social media for emergent rumor detection
[43] DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
[44] Earlier detection of rumors in online social networks using certainty-factor-based convolutional neural networks
[45] Automatic Fake News Detection with Pre-trained Transformer Models
[46] Identifying possible rumor spreaders on twitter: A weak supervised learning approach
[47] dEFEND: Explainable fake news detection
[48] Understanding user profiles on social media for fake news detection
[49] How do hostile and emotionally overinvolved relatives view relationships?: What relatives' pronoun use tells us
[50] The psychological meaning of words: LIWC and computerized text analysis methods
[51] Linguistic style matching and negotiation outcome
[52] A review on rumour prediction and veracity assessment in online social network
[53] The spread of true and false news online
[54] Information credibility on twitter in emergency situation
[55] Patterns of temporal variation in online media
[56] Emerging rumor identification for social media with hot topic detection
[57] Detection and resolution of rumours in social media: A survey
[58] Learning reporting dynamics during breaking news for rumour detection in social media