authors: Sheng, Qiang; Cao, Juan; Bernard, H. Russell; Shu, Kai; Li, Jintao; Liu, Huan
title: Characterizing Multi-Domain False News and Underlying User Effects on Chinese Weibo
date: 2022-05-06
DOI: 10.1016/j.ipm.2022.102959
False news that spreads on social media has proliferated over the past years and has led to multi-aspect threats in the real world. While there are studies of false news on specific domains (like politics or health care), little work compares false news across domains. In this article, we investigate false news across nine domains on Weibo, the largest Twitter-like social media platform in China, from 2009 to 2019. The newly collected data comprise 44,728 posts in the nine domains, published by 40,215 users and reposted over 3.4 million times. Based on the distributions and spreads of the multi-domain dataset, we observe that false news in domains that are close to daily life, like health and medicine, generated more posts but diffused less effectively than those in other domains like politics, and that political false news had the most effective capacity for diffusion. The widely diffused false news posts on Weibo were associated strongly with certain types of users (by gender, age, etc.). Further, these posts provoked strong emotions in the reposts and diffused further with the active engagement of false-news starters. Our findings have the potential to help design false news detection systems in suspicious news discovery, veracity prediction, and display and explanation. The comparison of the findings on Weibo with those of existing work demonstrates nuanced patterns, suggesting the need for more research on data from diverse platforms, countries, or languages to tackle the global issue of false news. The code and new anonymized dataset are available at https://github.com/ICTMCG/Characterizing-Weibo-Multi-Domain-False-News. Social media are now long established as a daily source of news in many countries around the world, Western or Eastern, developed or developing (Mitchell, Simmons, Matsa and Silver, 2018; Tang, Huang and Wu, 2020). These platforms facilitate the distribution of reliable news and false news (including fake news) alike. The problems caused by false and fake news on social media are widely documented. They include threats to the economy (El-Boghdady, 2013), to social order (Wang and Li, 2011; Chen, 2020), to politics (Fisher, Cox and Hermann, 2016), and to physical security (Gowen, 2018; BBC, 2020). Efforts to mitigate the spread of false news by researchers in social, political, and computer science include exploring the characteristics of false news (Vosoughi, Roy and Aral, 2018; Grinberg, Joseph, Friedland, Swire-Thompson and Lazer, 2019; Shu, Wang and Liu, 2018; Del Vicario, Bessi, Zollo, Petroni, Scala, Caldarelli, Stanley and Quattrociocchi, 2016), detecting false news using machine learning techniques (Castillo, Mendoza and Poblete, 2011; Ma, Gao, Mitra, Kwon, Jansen, Wong and Cha, 2016; Jin, Cao, Guo, Zhang and Luo, 2017; Shu, Cui, Wang, Lee and Liu, 2019a), and developing automatic detection and verification systems (Zhou, Cao, Jin, Xie, Su, Chu, Cao and Zhang, 2015; Popat, Mukherjee, Strötgen and Weikum, 2018a; Cui, Shu, Wang, Lee and Liu, 2019).
Among these efforts, empirical studies characterizing false news are fundamental both to revealing the phenomenon and to guiding the design of detection methods. Existing empirical studies have examined false news either in general or in a specific domain, such as politics (Grinberg et al., 2019; Guess, Nagler and Tucker, 2019; Shu et al., 2018), science (Del Vicario et al., 2016), health (Ghenai and Mejova, 2018), and entertainment (Shu et al., 2018). While comparisons of false news across diverse domains are rare and limited (Nan, Cao, Zhu, Wang and Li, 2021; Silva, Luo, Karunasekera and Leckie, 2021), one of the findings of Vosoughi et al. (2018) suggests the need for domain-level spread analysis: on Twitter, political false news had a more effective capacity for diffusion than false news in any other domain (it traveled farther and reached more people than non-political false news). But so far, how false news in other domains spreads remains under-explored, which is important for highlighting and positioning the influence of false news in different domains and for guiding the design of detection systems. In this article, we use a new dataset of 44,728 false social media posts from Weibo, the largest Twitter-like social media platform in China, to investigate the capacity for diffusion of false and fake news posts in nine domains: Society & Life, Health & Medicine, Education & Examinations, Disasters & Accidents, Culture & Sports & Entertainment, Politics, Finance & Business, Science & Technology, and Military. We then explore how user characteristics (such as gender, age, and account type) are related to the spread process and how user emotions and behaviors affect the spread of these false posts. Our contributions are as follows: • Capacity for diffusion. We find that false news in life-unrelated domains generated fewer stories but diffused more effectively than that in life-related domains. Of the nine domains, political false news had the most effective capacity for diffusion. • User effects. We characterize the user effects of the widely diffused false stories: they were more likely to engage male, older, or verified users. Further, they provoked strong emotions in the reposts and diffused further with the active engagement of false-news starters. • Methodology. This work introduces multi-domain analysis, a new perspective for understanding the phenomenon of false news. We design rules to rank the domain-level capacity for diffusion and then observe the user effects with statistical, linguistic, and semantic measurements. • Data. We collect a multi-domain false news dataset from Chinese Weibo, which contains 44,728 false stories in nine domains from 2009 to 2019. To the best of our knowledge, it is the largest dataset, covering the longest period, for false news research on Chinese social media. The recent attention to this field is largely due to so-called fake news going viral during the 2016 U.S. presidential election (Holan, 2016). The term "fake news" and other related concepts including rumor, misinformation, disinformation, and false news are used interchangeably in published studies to describe the social media posts we are working with. We rely here on the definitions of Zafarani, Zhou, Shu and Liu (2019) and related work, and define false news as any story or claim with a false assertion, whatever the intention. A rumor refers to an unverified and instrumentally relevant statement of information spread among people (Shu, Sliva, Wang, Tang and Liu, 2017) and is not germane to our research interest in verified inaccurate information.
Fake news and disinformation refer to intentionally false information (Allcott and Gentzkow, 2017; Vosoughi et al., 2018; Shu et al., 2017; Zafarani et al., 2019), but the true intention of creators is hard to know in ex post collection. Misinformation is a broader term, since it includes any inaccurate post (Mohseni, Ragan and Hu, 2019), and therefore could not be fully covered by our data collection method. False news includes disinformation as well as well-intentioned but untrue news stories, which fits the scope of our work. The openness and freedom of access on social media provide researchers with observable, naturally occurring, large-scale data without conducting individual-level interviews or in-lab experiments. Researchers in this arena first collect news posts from the official data interfaces or webpage parsers of social media platforms and label the posts as true or false according to ratings from reliable fact-checking organizations (e.g., Snopes in the United States and Jiaozhen in China). Then the data, along with social contexts, are analyzed using statistical methods to obtain new findings. According to Zhang and Ghorbani (2020), four major components are involved in false news: creator/spreader, target victims, news content, and social context. As news content and social context are often closely related to fake news detection, we will detail these components along with the detection methods. Here, we introduce research on the spread of false news and on the users involved. Research on the spread of false news examines sharing behavior and its further influence, especially around important events such as elections and pandemics. Allcott and Gentzkow (2017) found that pro-Trump fake stories were more widely shared on Facebook than pro-Clinton ones before the 2016 U.S. presidential election. Baptista and Gradim (2020) found that fake news was more likely to be shared but received fewer reactions than real news before the 2019 Portuguese election. The network analysis by Memon and Carley (2020) suggested that misinformation on COVID-19 spreads in denser and more organized communities than true information. Instead of focusing on a specific event, our first research question is inspired by Vosoughi et al. (2018), who measured the spread of false news in diverse domains on Twitter from 2006 to 2017 and found that political false news spread faster and deeper than non-political false news. We extend this research to comparing across the nine domains noted above: RQ1) Are there differences in the capacity for diffusion of false news in different domains? We measured the capacity for diffusion of all domains and showed which domain(s) of false news warrant monitoring for real-time detection and mitigation. The creators and spreaders play important roles in false news spread. Yang, Liu, Yu and Yang (2012) found that verified accounts with large numbers of friends were unlikely to post false rumors. Rampersad and Althiyabi (2020) found that age has a strong influence on the acceptance of fake news in Saudi Arabia. Grinberg et al. (2019) analyzed respondents' Facebook sharing history during the 2016 U.S. presidential campaign and found strong user effects: older users shared more fake posts, and the super-sharers of fake news sources were disproportionately female and unverified. We are interested in what these user effects look like when observed across domains: RQ2) How are demographic factors, like gender, age, and verification status, related to the spread of false news in different domains?
Extending this to nine domains can help us understand, at a more granular level, the kinds of users who tend to engage in false news, and could advance mitigation by guiding the initial filtering of users susceptible to false news in specific domains. Though manual fact-checking by journalists or independent organizations is still the most prevalent way to debunk false news, automatic detection of false news (i.e., predicting the veracity of a given post) with machine-learning techniques has become a promising direction due to its expected high computing efficiency and low labor costs. The methods can be divided into two genres: knowledge-based and appearance-based (Sheng, Zhang, Cao and Zhong, 2021b). Knowledge-based methods for predicting veracity start by collecting evidence and then applying reasoning, but the sources of evidence are diverse. Comment-based methods employed crowd wisdom for prediction (Ruchansky, Seo and Liu, 2017; Shu et al., 2019a; Tian, Liu, Yang, Lyu, Zhang and Fang, 2020). For claims that have previously been fact-checked, debunking articles are matched against news posts (Shaar, Babulkov, Da San Martino and Nakov, 2020; Vo and Lee, 2020; Sheng, Cao, Zhang, Li and Zhong, 2021a; Mansour, Elsayed and Al-Ali, 2022). The scope of evidence has been broadened to evidential web articles (Popat, Mukherjee, Yates and Weikum, 2018b; Wu, Rao, Yang, Wang and Nazir, 2020; Wu, Rao, Zhang, Zhao and Nazir, 2021) or contemporary mainstream news (Sheng, Cao, Zhang, Li, Wang and Zhu, 2022). Instead of obtaining in-the-wild knowledge, recent works leveraged entity background information obtained from knowledge graphs (Cui, Seo, Tabar, Ma, Wang and Lee, 2020; Zhang, Fang, Qian and Xu, 2019; Hu, Yang, Zhang, Zhong, Tang, Shi, Duan and Zhou, 2021). For multi-modal scenarios, entity knowledge is important to bridge text-image semantics (Xue, Wang, Tian, Li, Shi and Wei, 2021; Qi, Cao and Sheng, 2021b; Qi, Cao, Li, Liu, Sheng, Mi, He, Lv, Guo and Yu, 2021a; Li, Sun, Yu, Tian, Yao and Xu, 2021). These methods can provide accurate and explainable evidence, but face issues of source credibility and scalability. Instead of focusing on what the publisher says, appearance-based methods focus more on how false news looks different from true news. The differences are captured from multiple perspectives such as content styles (Przybyla, 2020; Zhu, Sheng, Cao, Li, Wang and Zhuang, 2022), emotional signals (Zhang, Cao, Li, Sheng, Zhong and Shu, 2021), user credibility (Shu, Zhou, Wang, Zafarani and Liu, 2019b), and audiences' behaviors, e.g., liking, reposting, and commenting (Shu et al., 2019a). Shu et al. (2019b) utilized user profiles that contain metadata on personal pages and inferred demographic features to detect fake news. Our findings on RQ2 can clarify the effects of some key demographic features across the domains and indicate the application areas of such methods. In terms of emotional signals, several independent works (Ajao, Bhowmik and Zargari, 2019; Zhang et al., 2021; Solovev and Pröllochs, 2022) found statistically significant differences between fake and real news. For instance, Vosoughi et al. (2018) calculated emotion vectors for reply tweets based on an emotion word lexicon and found that false rumors inspired replies expressing greater surprise and disgust. Solovev and Pröllochs (2022) found that COVID-19 misinformation is more likely to go viral than truthful information, especially when the original posts express contempt, anger, and disgust.
Zhang et al. (2021) performed significance tests between real and fake news on Chinese Weibo using a diverse emotion feature set of the contents and comments and showed that the emotion signals statistically correlate with news veracity. The emotional features were then used to improve the performance of text-based fake news detectors (Sheng et al., 2021b). Our third research question focuses on the role of these emotional signals: RQ3) How are the emotional signals related to the spread of false news among the domains? Our findings provide a new understanding of the effects of emotional signals at the domain level. Behavior-based methods model the propagation network, in which nodes are connected based on user behaviors, and capture the unique diffusion patterns of false news as predictors (Ma et al., 2016; Ma, Gao and Wong, 2017; Rosenfeld, Szanto and Parkes, 2020; Lu and Li, 2020; Song, Yang, Chen, Tu, Liu and Sun, 2019; Naumzik and Feuerriegel, 2022). In this article, we study the only behavior a user can employ to enlarge the size of a cascade, i.e., reposting. The fourth research question, then, is: RQ4) What did the engaged users do that promoted the spread of false news in each domain? We expect to characterize reposting behaviors in domains where posts generally diffuse widely and to find the key users that promote the diffusion. Overall, our work on RQ1 measures the spread of false news in different domains, and the remaining research questions (RQ2, RQ3, and RQ4) examine the relationship between user engagement and the spread of various kinds of false news. We use data from Weibo because of its richness, its comparability with Twitter, and its accessibility. Weibo has been providing a microblogging service in China since August 2009 (Wikipedia, 2020), and is now the largest microblogging platform in the world. On Weibo, users post or repost content in a variety of domains. Since Weibo has a role similar to that of Twitter or Facebook in the United States, which are the main sources of data in related works (Del Vicario et al., 2016; Vosoughi et al., 2018; Grinberg et al., 2019; Guess et al., 2019; Bovet and Makse, 2019), a partially aligned comparison between Weibo and U.S. platforms can be performed. Almost all previous studies of Weibo data for empirical analysis (Liu, Jin, Shen and Cheng, 2017; Zhao, Zhao, Sano, Levy, Takayasu, Takayasu, Li, Wu and Havlin, 2020) and detection (Ma et al., 2016; Jin et al., 2017) used false news data collected from the Weibo Community Management Center (hereafter, Center), an official platform that deals with user-reported violations of Weibo regulations. Reported posts that contain false information are fact-checked and made public by the platform. However, two biases affect which posts are reported: • Exposure bias. Posts from influential users (e.g., celebrities) are exposed more frequently, enhancing the collective wisdom for finding inaccuracy. In contrast, posts from little-known users with similar content may not be noticed and reported as false. • Selection bias. The Weibo platform began operation in August 2009, but the reporting system started in 2012; the first false post was reported on May 29, 2012. The lack of reported false news during the first three years may reduce our confidence that the results reflect the overall situation. Moreover, we observe that users usually report posts related to their interests or reputations.
The last concern is that Weibo does not accept reports that media accounts have published false information (Weibo, 2020), so the Center data ignore false posts from media accounts. Thus, false news with little influence, with no clearly involved user, or published by media accounts may not be well covered in the Center data. To adjust for these biases, we extended the dataset by tracing back from debunking posts or web articles from fact-checking sources (as Twitter researchers did, e.g., Wang, 2017; Shu, Mahudeswaran, Wang, Lee and Liu, 2020), including Zhuoyaoji, Weibo Piyao, Liuyanbaike, Jiaozhen, the China Joint Internet Rumor-Busting Platform, Jiangning Police Online, CCTV News, and others. This process introduces multiple sources committed to debunking in different domains, which neutralizes the selection bias, and the search-and-sift step preserves false news posts both popular and less popular, which tackles the exposure bias. The process was as follows: We crawled data from the two aforementioned sources. For the Center data, we crawled the accessible false news posts from the judging webpages. For the data traced from debunking information, we automatically extracted the query terms with our designed rules (e.g., extract text in quotation marks). For those not matching the rules, the authors manually extracted the query terms. Next, we searched the selected terms on the Weibo Search Engine and manually sifted out debunked false news from the returned result lists. We double-checked each false news post and dropped mistakenly sifted debunking posts and posts whose judging evidence did not match. We crawled the contents, publication date, and user profiles (specifically, gender, age, and verification status) of each original post and each repost. While we cannot guarantee the thoroughness of the data collected from the two sources above, the coverage and richness of the dataset are the best achievable under the limitations of data access. We end up with 44,728 false news posts, of which 24,690 are from the Center and 20,038 from the Weibo Search Engine, ranging from August 2009 to August 2019. Acquisition. To assign a domain tag to each post in the dataset, we worked out the domain list and the classification criteria. Considering the appropriateness of granularity and the congruence with the collected data, we identified nine domains: five daily-life-related domains (Society & Life, Health & Medicine, Education & Examinations, Disasters & Accidents, and Culture & Sports & Entertainment) and four daily-life-unrelated domains (Politics, Finance & Business, Science & Technology, and Military). We may use the first word of each domain as its short name for brevity. In the following analysis, the first five domains are classified as daily-life-related (that is, they cover events that have an impact on a person's daily life), while the others are daily-life-unrelated (hereafter, life-related and life-unrelated). Human Annotation. Because most debunking posts on Weibo lacked domain tags, we gathered 26 human annotators (graduate students) to code all posts into the nine domains. Following existing research and the aforementioned fact-checking sites, we assigned only one domain tag to each post according to its main content. For posts related to multiple tags, we labeled them by their key elements of interest. Consider the post: "A Chinese military singer on active service named Dawei Jiang has been naturalized in the United States." The post could be multi-labeled, as Dawei Jiang is related to both the military (a member of the Chinese Army) and entertainment (a famous singer).
However, his identity as a member of the Chinese Army is the focus of this event, so the post was included in the domain Military in our dataset, not Entertainment. In our workflow for annotation, the first author carefully labeled a randomly sampled subset of data containing posts in all domains (about 500 posts). Before the formal annotation, all annotators participated in a pilot annotation test. We showed 100 posts (selected from the first-author-annotated subset) and asked the annotators to assign domain labels. Annotators proceeded to the formal annotation only if their hit rate exceeded 0.8. As there could be different false news posts related to the same event (e.g., an earthquake), we first ran a K-means clustering based on TF-IDF vectors on the whole dataset. Then we split the annotation batches within each cluster so that (ideally) news posts in one batch were more likely to be in the same domain. An annotator could scan a whole batch on the same annotation page to speed up the process. The Cohen's Kappa coefficient was 0.76, which indicates good agreement. The first author carefully annotated the remaining posts, including those that were skipped during the annotation and those raising inter-annotator disagreement. We also randomly checked the posts in the same batches as the skipped or disagreed-on posts to improve the data quality. The Weibo dataset consists of 44,728 false news posts, 24,690 from the Center and 20,038 from the Weibo Search Engine, ranging from August 2009 (when Weibo started operation) to August 2019. The posts were published by 40,215 users and reposted ~3.4 million times. For each post, the contents, publication date, repost lists, and user profiles are attached. Note that Weibo allows a user to repost an original post multiple times with or without comments or replying text, so the repost lists are analogous to, but not identical to, those on Twitter. Table 1 shows the domain-level distribution of false news posts on Weibo. The life-related domains are dominant on Weibo (79.8%), which is in line with the finding that many false posts on Weibo are related to people's general concerns, most of which are life-related (Xiao and Chen, 2020). Political false news accounts for only 5.7% on Weibo. This is quite different from U.S. Twitter data, where politics has the largest share and more posts come from life-unrelated domains such as business and science. These differences may yield statistical findings different from existing ones based on Western data, which we explore in the following sections. To answer RQ1), we need to define and calculate the domain-level capacity for diffusion. Before that, we introduce how to represent a cascade and measure its capacity for diffusion. Then we evaluate the domains by aggregating cascade-level scores. The spread of a news story on social media forms a cascade, with the original post (published by the original user, i.e., the starter) and reposts connected by reposting. However, the raw data do not provide tree-structured cascades. We exemplify a cascade with Figure 1, using placeholder names for the anonymized users. If a Weibo user B reposts an original post from user A that reported a hostess's sudden death on a live show and says "She is so young!", this repost will be displayed as "B: she is so young!" (by default, no original post follows). If user C reposts this repost and says nothing, the repost of the repost will be "C: //@B: she is so young!". We pre-process this double-slash format into a tree structure by string splitting.
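To make the pre-processing step concrete, below is a minimal sketch (not the authors' released code) of how such double-slash repost strings could be split and linked to their parents; the field names `user` and `text`, the helper functions, and the toy texts are hypothetical.

```python
import re

def split_repost_chain(repost_text):
    """Split 'own comment //@B: older comment //@A: oldest comment' into
    (user, comment) hops, newest first; the hop without '@user:' is the
    reposting user's own comment."""
    hops = []
    for segment in repost_text.split("//"):
        m = re.match(r"\s*@(\S+?)\s*[:：]\s*(.*)", segment)
        if m:
            hops.append((m.group(1), m.group(2).strip()))
        else:
            hops.append((None, segment.strip()))
    return hops

def parent_of(repost, starter):
    """The parent of a repost is the first user mentioned after '//';
    a repost with no '//@user:' part directly reposts the starter's post."""
    hops = split_repost_chain(repost["text"])
    for user, _ in hops[1:]:
        if user is not None:
            return user
    return starter

# Toy cascade mirroring Figure 1 (texts are made up): starter A is reposted
# by B, by A itself, and by C (who reposts B's repost).
reposts = [
    {"user": "B", "text": "she is so young!"},
    {"user": "A", "text": "repost"},
    {"user": "C", "text": "//@B: she is so young!"},
]
edges = [(r["user"], parent_of(r, "A")) for r in reposts]
print(edges)  # [('B', 'A'), ('A', 'A'), ('C', 'B')]
```

In practice one would link reposts by post IDs rather than usernames, since a user may repost the same cascade multiple times.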
As the cascades are tree-structured, key attributes of trees can straightforwardly serve as measurements of the diffusion of false news posts. Here, we use widely used indicators of tree structure, namely size, maximum depth, and maximum breadth, to indicate, respectively, how many times Weibo users participated, how sustained the discussion was, and how many engagements were individually triggered. Further, we consider a fourth indicator, the number of engaged users, to count unique users. Here are the definitions and illustrations (the users A, B, and C refer to the example in Figure 1): • Size: The number of posts in a cascade. Because there is only one original post in a cascade, the size equals the number of reposts plus one. In Figure 1, A's original post is reposted by B, C, and A itself, so the size of this cascade is 4. • Maximum depth: The number of posts on the longest reposting path from the original one in a cascade. In practice, we recorded the depth of each repost in a cascade, and the maximum of those recorded values is exactly the maximum depth. The maximum depth in Figure 1 is 3, i.e., A → B → C. • Maximum breadth: The maximum number of posts at any depth in a cascade. In practice, we obtain the frequency of each depth in a cascade and use the maximum of the frequencies as the maximum breadth. Since there are two reposts at depth 2, the maximum breadth in Figure 1 is 2. • Number of engaged users: The number of users engaging in a cascade, i.e., those having published at least one (re)post in the cascade. This indicator is 3 in Figure 1, as A, B, and C have engaged in it. We aggregated the cascade-level results to obtain the domain-level capacity for diffusion. For each domain, we first drew the Complementary Cumulative Distribution Function (CCDF) of cascades for each of the four indicators, as shown in Figure 2. Next, we calculated the areas under the CCDFs and summed them up (each area was normalized across the nine domains for its indicator). The sum of normalized areas scores the domain-level capacity for diffusion, as shown in Table 2. With the scores, we obtained the rankings of the capacity for diffusion. With one exception (false news in Science & Technology), false news in life-unrelated domains has a more effective capacity for diffusion than that in life-related ones. That is, some life-related false posts were not as influential as those in life-unrelated domains. Notably, of the nine domains, cascades of political false news are the largest (Figure 2(d)). In other words, despite the difference in the quantity of political false news on Weibo and Twitter, the capacity for diffusion of political false news is highly similar on both platforms. Engaging with a story requires users to have a personal interest and to react to some immediate feelings about the story. It also, perforce, involves interaction with other users. To answer RQ2, RQ3, and RQ4, we explored these user effects based on user characteristics, emotions, and behaviors. We focus on three basic and accessible user attributes: gender, age, and account type. Gender. Figure 3 shows the gender distribution of false-news starters for each of the nine major domains. The gender ratio for all users on Weibo is male : female = 57 : 43 (Sina Weibo Data Center, 2019). This is shown by the vertical white line in Figure 3. Age. To filter out users with unreliable ages, we simply ignored posts from users who claimed to be under 6 or over 100 years old. We also excluded the verified organizational users.
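As a small illustration, this filtering step could look like the sketch below, assuming a pandas DataFrame of starters with hypothetical columns `age` and `account_type` (where `"verified_org"` marks verified organizational accounts):

```python
import pandas as pd

def filter_starters_for_age_analysis(starters: pd.DataFrame) -> pd.DataFrame:
    """Keep starters with plausible self-reported ages (6 to 100 inclusive)
    and drop verified organizational accounts, as described above."""
    plausible_age = starters["age"].between(6, 100)
    not_verified_org = starters["account_type"] != "verified_org"
    return starters[plausible_age & not_verified_org]
```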
Figure 4 shows the age distribution of false-news starters for each of the nine major domains. During the 2016 U.S. presidential election campaign, users over 65 were more likely to share articles of political false news (Guess et al., 2019). We label this a seniors-attracted tendency. In the Weibo data, we observed a similar but less pronounced seniors-attracted tendency for posts on Politics and Military matters. Account type. Verified users are generally regarded as more credible (Sina Weibo Data Center, 2016). Although, as described in Section 3.2, we did our best to collect less popular false posts (which were more likely published by unverified users with fewer followers), verified users spread false news posts beyond their proportion among all account types. This corresponds to a similar finding on Twitter. Among the verified users, individual users published more false posts than organizational users across all domains except Science & Technology. In a sense, verified individuals, who were influential (that is, generally had more followers than unverified ones) but lacked professional information sources and a rigorous editorial process, contributed the most to the spread of false news. In contrast, while verified organizational users generally have advantages in their access to information sources and have more incentives to control the quality of published content, they still published more than 10% of false news posts. To evaluate the organizational users' ability to distinguish false stories, we analyzed their belief in what they reposted (we did not analyze the original posts here because they could only provide evidence of users' inability to tell truth from falsehood, not the opposite). Specifically, we focus on the role of users who represent six types of verified organizations: police, government (gov.), media, schools, companies (biz., excluding media run as companies), and social organizations (social org.). Police here refers to the government-run public security departments; we separated Police from Government because one of its functions is to defend cyberspace security (State Council, P.R.C., 2014), including combating fake news (e.g., punishing false rumor spreaders after the Tianjin blast (Xinhua News Agency, 2015)). These organizations are generally credible in the eyes of Chinese citizens, especially those that are state-run (Lu, Jiang, Lu, Naaman and Wigdor, 2020), so posts labeled as coming from these organizations are followed by many users. We evaluated the content added in a repost by an organizational user as an indicator of whether it believed the false news story being passed on. If no disbelief is expressed, we infer that the user believes the story and is motivated to help spread it. In contrast, if the added content expresses disbelief or doubt, then we infer that the user reposts as a way of mitigating the potential misconceptions of others regarding the veracity of the story. We classified the reposts of the engaged organizational users into five classes: believe, debunk, do not believe (for short, DNB), doubt, and unknown (i.e., neutral or unrelated content). A repost with no added content is labeled as believe by default, except when the user reposts a debunking repost. As Table 5 shows, 85.0% of all reposts from users associated with one of the six organization types (n = 18,057) expressed belief in the false news being reposted, with a range from 70.9% for users associated with the media to 88.4% for users associated with companies.
Summing debunk, DNB, and doubt, the media had by far the highest rate of disbelief (26.0%), close to the disbelief rate (23.1%) found on Twitter for news organizations (Li, Liu, Fang, Nourbakhsh and Shah, 2016). By comparing the proportions of the nine domains' posts in the fooling-organization list and in all false posts (see Table 6), we found that organizational users were slightly more likely to be fooled by false news on Politics, Finance & Business, and Society & Life. At 26.0% disbelief, however, even the most skeptical of the users (those associated with the media) were highly likely to be taken in by false news posts. This result is not in line with the finding that journalists are more likely to deny false rumors on Twitter during three crisis events (Starbird, Dailey, Mohamed, Lee and Spiro, 2018). We argue that institutions like the Police and the Media, which generally enjoy solid reputations among citizens for their authority (Metzger, 2007), especially in China, are not skillful at maintaining the reliability of what they repost. This may facilitate the diffusion of false news instead of containing it. Inspired by the studies of emotional signals above, we evaluated the emotions in the publishers' contents and in the engaged users' responses, respectively, in each domain. We adopted the affective lexicon ontology database (Xu, Lin, Pan, Ren and Chen, 2008) curated by Dalian University of Technology, China. This database comprises 27,467 Chinese words, each of which is manually classified into one of seven emotion types with an intensity: joy, like, anger, sadness, fear, disgust, and surprise. This list represents a Chinese adaptation of the original list of six universal emotion types from Ekman (Ekman, 1972; Ekman, Friesen, O'Sullivan, Chan, Diacoyanni-Tarlatzis, Heider, Krause, LeCompte, Pitcairn, Ricci-Bitti et al., 1987); it adds "like" to that list. For each (re)post, we first segmented the text using the Chinese lexical analyzer THULAC (Sun, Chen, Zhang, Guo and Liu, 2016), and then recorded the intensity of the corresponding emotion if a word matched an entry in the affective lexicon ontology (Xu et al., 2008). We used a seven-dimensional vector (denoted as e) to record the intensities of each kind of emotion and then normalized each entry in the vector by the sum of intensities. For example, e = [0, 0, 0.2, 0, 0.5, 0.3, 0] means that the text expressed 20% anger, 50% fear, and 30% disgust. We finally ranked the nine domains according to the average intensity of each emotion. The left and middle parts of Figure 5 show the domain-emotion heatmaps with the intensity ranks of the original posts and the reposts, respectively. We obtained the overall ranking by averaging out the emotional intensities of all the original posts (or reposts) across domains and compared it with the ranking of capacity for diffusion (Section 4.2) in the right part of Figure 5. The overall emotional intensity of the reposts was more related to the capacity for diffusion (ρ = 0.61, p = 0.08) than that of the original posts (ρ = 0.05, p = 0.90). This indicates that while the original posts may not be that emotional, they are good at provoking the emotions of engaged reposting users. At the domain level, false news on Politics and Finance & Business showed weak emotion in the original posts but inspired strong emotion in the reposts, while the opposite was the case for posts on Military, Science & Technology, and Health & Medicine.
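For concreteness, here is a minimal sketch of the lexicon-based emotion scoring described above; the segmenter callable and the lexicon format are simplified assumptions rather than the exact pipeline (the study used THULAC and the Dalian affective lexicon ontology), and the names `EMOTIONS`, `segment`, and `lexicon` are hypothetical.

```python
import numpy as np

EMOTIONS = ["joy", "like", "anger", "sadness", "fear", "disgust", "surprise"]

def emotion_vector(text, segment, lexicon):
    """Return a normalized 7-dim emotion intensity vector for one (re)post.

    segment: callable splitting Chinese text into words (e.g., a THULAC wrapper)
    lexicon: dict mapping word -> (emotion_name, intensity), a simplified view
             of the affective lexicon ontology
    """
    e = np.zeros(len(EMOTIONS))
    for word in segment(text):
        if word in lexicon:
            emotion, intensity = lexicon[word]
            e[EMOTIONS.index(emotion)] += intensity
    total = e.sum()
    # e.g., [0, 0, 0.2, 0, 0.5, 0.3, 0] reads as 20% anger, 50% fear, 30% disgust
    return e / total if total > 0 else e
```

Domain-level emotion rankings would then follow from averaging such vectors over all original posts (or reposts) in each domain.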
At the emotion level, false news in the domains with a more effective capacity for diffusion provoked more disgust, anger, and like in the reposts (ρ = 0.88, 0.70, and 0.60, respectively). Vosoughi et al. (2018) found that false news inspired responses expressing greater surprise and greater disgust on Twitter, and Han, Cha and Lee (2020) found that anger contributed to the spread of COVID-19 misinformation. Here we had consistent findings on disgust and anger, but surprise was almost unrelated to the capacity for diffusion (ρ = −0.13 in the original posts, 0.10 in the reposts). Please refer to Table 11 and Table 12 in Appendix A for details. We conclude that, while false news items may provoke strong emotions like disgust and anger, and therefore achieve a high capacity for diffusion, the content of the original posts does not need to be emotional. Besides the statistical observation of reposting (Section 4.2), we focus here on how specific reposting behaviors promoted the spread. In our data, we observed two special behaviors that could significantly increase the size of cascades and thus promote the spread: (1) Reposting with high frequency. One might repeatedly repost an original post to maximize its visibility in different time periods and towards different user groups; and (2) Reposting as a reply (i.e., replying to an engaged user while reposting). One might argue with other users by replying again and again to a (re)post and thereby promote the spread of a story. To measure the two behaviors, we designed the following measurements: • Cascade concentration score: the proportion of reposts that are not the reposting user's first engagement in a cascade. It evaluates the users' level of engagement. The equation is c = (# reposts that are not a user's first engagement) / (# reposts), where c is the score and # is short for "the number of". • Number of replies: the number of reposts that start with "Reply to @user", where user represents the user being replied to. It evaluates the level of interaction among the engaged users. Table 7 shows the cascade concentration score and the number of replies in the nine domains. Reposts of false news in the domains where posts were generally widely diffused were concentrated and interactive. At 2.60, political false news had the highest concentration score among the nine domains. That is, to achieve the same number of reposts, political false news needed fewer engaged accounts on average than did false news in other domains. There are 0.793 replies per cascade on average across all domains, but 1.881 for military false news, 2.4 times the overall average and 7.2 times the 0.259 figure for false news on Science & Technology. Highly concentrated and interactive spread of false news suggests that there might be some users playing a special role in promoting the spread. Here we investigate the role of cascade starters (i.e., the users publishing the original posts) because publishers usually tend to attract more people and to persuade reposters to believe the content. Counting only cascades with at least one repost, political false news has the highest starter engagement of all nine domains. Fully 18.89% of cascade starters in Politics reposted their original post, and the top 1.87% of starters reposted at least 10 times. Starters in other domains also engaged in cascades, but their engagement is not comparable to that in Politics (Table 8). Though starters' reposting essentially promoted the spread, there might be starters who reposted to correct false claims instead of convincing others.
However, as far as we know, no method can be directly applied to understand starters' real motivations with only digitally collected social media data. Here, applying the results of the content analysis on the posts of organizations, if a starter reposts a false news story without expressing any disbelief, we take that as prima facie evidence of the starter's motivation. For starters who reposted their original posts, we classified their reposts into four classes, merging the category of debunking into do not believe (DNB) because we could not expect the starters to debunk effectively. In 2,280 labeled cascades (5.1% of the total, 10.7% of those reposted), only 12.5% of starters reposted at least once to express disbelief (DNB and doubt) regarding the content of the original posts, indicating that they might not have posted false news deliberately at the start of a cascade. The remaining starters, who always reposted with belief, tried to promote the spread of false news. At the domain level, the disbelief rate (here, the proportion of starter-engaged cascades in which the starters show disbelief) was low for false news on Politics, Military, and Science & Technology (from 15.09% to 18.00%), while Education & Examinations, Disasters & Accidents, and Culture & Sports & Entertainment had high disbelief rates (from 36.51% to 42.86%) (Table 9). When a starter continues reposting the false news post it published with little disbelief, it may be prompting other users to discuss the story, thus achieving the goal of spreading. By observing 333 reposts without disbelief by the highly engaged starters, we found that these starters often repost by thanking users who share the same opinion, replying to comments that challenge the story, or using techniques of neutralization like denial of responsibility (Sykes and Matza, 1957). Examples are shown in Table 10. Table 10. Examples of the main types of starters' reposts without disbelief. The text in boldface represents the latest repost from the starter at that moment. We replace the three attached pictures in the original post with text here and anonymize all usernames. The original post is omitted due to the space limit, and the reposts are translated into English. • Thanking the users who share the same opinion. Example cascade: "Thanks for your comments! // @UA: Reply to @Starter: As a national civil servant, he raped a woman by taking advantage of his position, causing her to be pregnant and give birth to a daughter. Want to get away with it? Take my advice: You'd better be responsible for the mother and daughter! // @Starter: Repost // (The original post)". Explanation: User UA expressed agreement with the original post. As a reply, the starter reposted and thanked UA. • Replying to a comment that challenges the story. Example cascade: "The girl took those photos, just in case! // @UB: How could she take photos when being drunk? // @UC: // @UD: Repost // (The original post)". Explanation: User UB questioned how the attached photos, which recorded the rape, were taken. Then, the starter gave its (unconvincing) explanation. • Using a technique of neutralization. Explanation: The starter was reported for publishing false news because the attached photos were from another story. The starter then reposted and tried to shift the responsibility (to Baidu) and divert readers' attention, using the technique of neutralization named denial of responsibility.
This provides an alternative hypothesis about why false news in domains that comprise a small proportion of the total, like Politics and Military, have effective capacity for diffusion: The stories may inspire the starters to actively promote the spread process. Our research is to understand the role of domain in the spread of false news. We performed our research in two steps: We first measured the capacity for diffusion of false news in each domain (RQ1), and then explored the related factors in user characteristics, emotions, and behaviors (RQ2, RQ3, and RQ4, respectively). In this section, we first answer the RQs and analyze them based on our key findings in the context of existing studies. Next, we introduce how our findings can help improve practical systems on this issue and recommend several future research directions. The capacity for diffusion of false news varied from domain to domain. False news in life-unrelated domains diffused more effectively than that in life-related ones and political false news had the most effective capacity for diffusion. However, the ranking of capacity for diffusion and that of the amount are quite different: Life-related false posts are more than life-unrelated ones and Politics ranked the third to last, which is aligned with the finding that Chinese users did not perceive much political false news and concerned with false news relevant to their daily life or well-being . The discrepancy can be explained in two ways. On the one hand, ordinary users hardly have the source of life-unrelated rumors beyond social media. Even if they do, they may be too cautious to publish them with concern about the possible punishments . On the other hand, reposting is a behavior with much fewer consequences because the platform only punishes who publishes false stories (the root user of a cascade) (Weibo, 2012) . RQ2) How are the demographic factors, like gender, age, and verification status, related to the spread of false news in different domains? We found a slight age and gender effect: Male, older users were more likely to publish false news in domains where posts were generally widely diffused, while female, younger users were more related to others. As to the account type, the proportion of false news posts published by verified users and the reposts they led to largely exceeded its proportion in all users. Verified organizational users mostly reposted false information with belief, indicating their inability to recognize the falsehood. Dangers arose since verified users, individual or organizational, lacked fact-checking ability that matched their influence. Generally, an organizational account is managed by human teams (mostly employees in the public relationship or marketing departments). In this sense, the "inability" may be caused by two factors: the employee's carelessness and the absence of an internal editorial process. A recent case confirmed our assumption to some extent: A governmental employee mistakenly used the official account of the local earthquake agency to repost an entertainer-related post because the employee forgot to change to the personal account (Jiemian News, 2020) . Without a necessary internal editorial process before (re)posting a message through a verified organizational account, these accounts will inevitably face a risk of encountering reputation crises. RQ3) How are the emotional signals related to the spread of false news among the domains? 
We found that the emotional signals in user responses (reposts) were more related to the capacity for diffusion of false news than those in the contents (original posts). On the one hand, with high-arousal emotions, users tend to repost or comment, leading the posts to go viral (Berger and Milkman, 2012; Chuai and Zhao, 2020). On the other hand, it is unnecessary to use emotional language to arouse readers' strong emotions. For example, false stories related to controversial objects or persons could provoke emotional responses even with no emotional words included. RQ4) What did the engaged users do that promoted the spread of false news in each domain? User engagements were more concentrated and interactive in false news cascades in domains where posts were generally well-diffused. Among the engaged users, false news starters were more proactive in interacting with other users. Most starters' reposts were meant to attract more people to read or to convince skeptical engaged users, not to debunk and mitigate the spread, especially in the domains where posts were generally widely diffused. This finding connects false news spread with the arising of controversy: the behaviors of engaged users promote the spread of false news by making the discussion more controversial. The findings in this study confirm or contradict the conclusions of existing studies, as summarized below: 1) We derived a similar diffusion capacity ranking (life-unrelated ">" life-related) on Chinese Weibo data as that on English Twitter data, though the distribution of false news posts is quite different (e.g., Twitter > Weibo in terms of %Politics). This validates that false news in domains such as Politics and Finance is consistently more likely to incite readers to repost, regardless of the context of languages and perception levels. 2) We provide new evidence that male, older users were more likely to publish political false news, as Grinberg et al. (2019) found before the 2016 U.S. presidential election, but based on a much longer period and without targeting a specific political event. However, this trend did not hold across the whole picture of nine domains. We found that female, younger users contributed more to domains such as Health, Society, and Education. This indicates that the connection between false news publishing and user gender or age is not constant but depends on the domain. 3) We observed the counter-intuitive phenomenon that verified users were actually vulnerable to false news. Our results showed that verified users often failed in judging the veracity of social media posts and thus were not as credible as expected. This warns us to reevaluate the role of verified users in false news detection (Shu et al., 2019b; Yang et al., 2012; Lu and Li, 2020). 4) We found that strong emotions in reposts, rather than in original posts, were more related to false news spread. Unlike existing empirical works that observed either original posts (Solovev and Pröllochs, 2022) or reposts for differentiating true and false news, we relate both to false news spread across domains for comparison. By comparing with existing findings, we found that emotions such as anger and disgust may consistently serve as motivating factors for the spread of false news. 5) We highlighted the properties of well-diffused false news cascades: concentrated, interactive, and starter-proactive.
Unlike existing network-based methods, which focus on community properties (e.g., Jin, Cao, Zhang and Luo, 2016; Zhou and Zafarani, 2019), we provide a new observation of reposting behavior itself. Typically, a false news detection system (Zhou et al., 2015; Cui et al., 2019; Samarinas, Hsu and Lee, 2021) has the following procedures: • Suspicious news discovery: Collects news posts that are suspicious as candidates (Zubiaga, Aker, Bontcheva, Liakata and Procter, 2018; Hassan, Arslan, Li and Tremayne, 2017). This is often formulated as a ranking task (Shaar, Hasanain, Hamdan, Ali, Haouari, Nikolov, Kutlu, Kartal, Alam, Da San Martino, Barrón-Cedeño, Miguez, Beltrán, Elsayed and Nakov, 2021). • News veracity prediction: Uses fake news detection methods (mostly from multiple perspectives) to predict news veracity. • Display and explanation: Shows the abnormal elements (e.g., the propagation network, questions from the comments, and contradictions with known facts) to explain to users why the news might be false. Our findings indicate the necessity of dividing and conquering false news in different domains within the detection system. The suggestions are as follows: 1) For suspicious news discovery, prioritize news from domains where posts are generally well-diffused. In practice, the step to "find out" candidate news often takes more resources than the veracity prediction; thus, a careful design of the initial filtering strategy is important for maintaining good scalability and efficiency. Unfortunately, very few works provide guidance. We reveal that life-unrelated false news has a more effective capacity for diffusion than life-related false news. Thus, a system with limited computing resources should prioritize suspicious posts in domains such as Politics, Military, and Finance & Business, or set a higher frequency of fetching news in these domains. 2) For news veracity prediction, integrate models' outputs with awareness of domains. Existing systems tend to use a single model or the integration of a model set for all news posts. Our findings on the differences among domains suggest the challenge of "one solution for all domains." For example, many health-related false news posts had weak diffusion, and thus few engaged users, so propagation-based or user-based models may fail to capture useful signals and judge them incorrectly. In this sense, a wise solution is to fact-check against external knowledge bases (Cui et al., 2020). In contrast, political false news posts may spread widely while the truth remains unknown for now; prediction based on propagation networks would be more practical. 3) For display and explanation, benchmark the displayed properties within the same domain. To enrich the result page, existing systems often list the properties of the given post, such as the attributes of engaged users. We argue that this alone does not help the audience understand what those numbers imply. Given that the user effects were quite different among domains, we suggest that these properties should be benchmarked against the statistics of all historical posts in the same domain, for example, by adding a note like "over x% of false news in [domain]" below the number of verified users engaged. Highlighting the role of domains in false news research. Our findings on the spread and user effects of false news in nine domains uncovered that there exist both common and unique features of false news in different domains.
This indicates that some findings in a specific domain or event may not generalize to other domains, and that findings on mixed-domain data may ignore the unique characteristics of less popular domains. Therefore, to have a clear picture of false news, considering the role of domains in future research is highly recommended. Providing a partial solution to infer beliefs and motivations with limited data. We analyzed the comments in the reposts of organizational users and starters to infer their beliefs (and belief shifts, if applicable) regarding false news posts. The results unveiled the organizational users' lack of ability to recognize false news and most starters' willingness to promote the spread. Though our comment-based inference is only effective when textual or behavioral signals exist, it provides a way to infer users' minds and motivations when no further psychological information is available. We analyzed the shortcomings of existing Weibo false news datasets and proposed that retrieving false news posts scattered across the platform is important to mitigate the selection and exposure biases of data collection. In this way, we collected a new Weibo false news dataset containing false events excluded from the Center data. Our bias-mitigated collection method can be a reference for future work. Mixed factors. Along with the report on false news on Weibo, we have compared some of our findings with existing ones on Twitter and Facebook. Although we found the interesting phenomenon that political false news on both Weibo and Twitter had the most effective capacity for diffusion, attracted similar user groups in terms of age and gender, and provoked similar emotions, differences existed in other aspects such as the amounts and some emotions. Our comparison suggests that although there are more relevant studies, the findings based on U.S. social media data are not general enough to serve as a global proxy. Indeed, multiple variables such as country (China vs. the U.S.), language (Chinese vs. English), and platform (Weibo vs. Twitter/Facebook) led to the differences in several ways. However, the respective effects of sociocultural backgrounds, language use, and platform management are hard to measure from these data. To understand the influence of these factors, in-lab experiments with variable control or data from more diverse platforms are needed. We combined statistical analysis and content understanding to obtain meaningful results, some of which were strongly related to users' internal states, such as starters' and organizational users' beliefs about false news. Our analysis may be limited by the availability of explicit textual signals and by annotators' understanding of the texts. Considering that conducting a user survey or interview after what would be, for the original users, up to 12 years of activity is unfeasible, follow-up surveys or in-the-moment interviews (e.g., just after a user is told of having published a false news post) are potential methods to further observe the phenomena reported in this retrospective study. We performed an analysis of multi-domain false news on Weibo from 2009 to 2019. On Weibo, political false news, though few in number, has the most effective capacity for diffusion. Broadly, life-unrelated domains have a more effective capacity for diffusion than life-related domains on Weibo, though the number of posts in the latter exceeds that of the former (79.8% vs. 20.2%, respectively).
Our observations of user effects show that a widely diffused false news post on Weibo is associated strongly with certain types of users (male, older, or verified users), provokes strong emotions in the repost list, and evokes more replies in a limited group with the starter's promotion. However, the gender and age effects were mostly due to the news consumption preferences of gender and age groups. Based on our findings, we highlight the roles that the domain plays in practical false news detection systems. We made suggestions on the pipeline design, including suspicious news discovery, veracity prediction, and proper display. Our findings also point to issues for further research on false news in China and other countries: First, in addition to political false news, we need more research in other domains, because some findings in Politics may not apply to others. Second, we advocate for more focus on users who have special roles, like starters and verified users. Third, false news on social media is a global issue. Comparing false news on the U.S. and Chinese platforms is a start, but it is clear from our analysis that no single platform can serve as a global template for understanding and further mitigating false news. More work on diverse platforms will help us determine which features of false news are common across countries and languages and which are unique. This, in turn, will help us all in facing the challenge of false news effectively.
Appendix A. Average intensity of each domain in seven types of emotions, measured on the original posts. Spearman coefficients between the emotions of the original posts and the capacity for diffusion in the nine domains are: 0.40 (Disgust), -0.23 (Like), 0.30 (Anger), 0.20 (Sadness), -0.13 (Surprise), -0.33 (Joy), and -0.65 (Fear).
References
Tencent Rumor Governance Report
The spread of true and false news online
Sentiment aware fake news detection on online social networks
Social media and fake news in the 2016 election
Online disinformation on Facebook: the spread of fake news during the Portuguese 2019 election
Bangladesh lynchings: Eight killed by mobs over false child abduction rumours
Influence of fake news in Twitter during the 2016 US presidential election
Information credibility on Twitter
Coronavirus rumors trigger irrational behaviors among Chinese netizens
Anger makes fake news viral online
DETERRENT: Knowledge guided graph attention network for detecting healthcare misinformation
dEFEND: A system for explainable fake news detection
The spreading of misinformation online
Universal and cultural differences in facial expression of emotion
Universals and cultural differences in the judgments of facial expressions of emotion
Market quavers after fake AP tweet says Obama was hurt in White House explosions. The Washington Post
Fake cures: User-centric modeling of health misinformation in social media
As mob lynchings fueled by WhatsApp messages sweep India, authorities struggle to combat fake news
Fake news on Twitter during the 2016 US presidential election
Less than you think: Prevalence and predictors of fake news dissemination on Facebook
Anger contributes to the spread of COVID-19 misinformation
Toward automated fact-checking: Detecting check-worthy factual claims by ClaimBuster
Lie of the year: Fake news
Compare to the knowledge: Graph neural fake news detection with external knowledge
Official microblog of Gansu Huixian earthquake agency apologized for publishing information relevant to Karry Wang and Jackson Yee
Multimodal fusion with recurrent neural networks for rumor detection on microblogs
News verification by exploiting conflicting social viewpoints in microblogs
Entity-oriented multi-modal alignment and fusion network for fake news detection
User behaviors in newsworthy rumors: A case study of Twitter
Do rumors diffuse differently from non-rumors? A systematically empirical analysis in Sina Weibo for rumor identification
GCAN: Graph-aware co-attention networks for explainable fake news detection on social media
The government's dividend: Complex perceptions of social media misinformation in China
Detecting rumors from microblogs with recurrent neural networks
Detect rumors in microblog posts using propagation structure via kernel learning
Did I see it before? Detecting previously-checked claims over Twitter
Characterizing COVID-19 misinformation communities using a novel Twitter dataset
Making sense of credibility on the web: Models for evaluating online information and recommendations for future research
Publics globally want unbiased news coverage, but are divided on whether their news media deliver
Open issues in combating fake news: Interpretability as an opportunity
MDFEND: Multi-domain fake news detection
Detecting false rumors from retweet dynamics on social media
CredEye: A credibility lens for analyzing and explaining misinformation
DeClarE: Debunking fake news and false claims using evidence-aware deep learning
Capturing the style of fake news
Improving fake news detection by using an entity-enhanced framework to fuse diverse multimodal clues
Semantics-enhanced multi-modal fake news detection
Fake news: Acceptance by demographics and culture on social media
A kernel of truth: Determining rumor veracity on Twitter by diffusion pattern alone
CSI: A hybrid deep model for fake news detection
Improving evidence retrieval for automated explainable fact-checking
That is a known lie: Detecting previously fact-checked claims
Overview of the CLEF-2021 CheckThat! lab task 1 on check-worthiness estimation in tweets and political debates
Zoom out and observe: News environment perception for fake news detection
Article reranking by memory-enhanced key sentence matching for detecting previously fact-checked claims
Integrating pattern- and fact-based fake news detection via model preference learning
dEFEND: Explainable fake news detection
FakeNewsNet: A data repository with news content, social context, and spatiotemporal information for studying fake news on social media
Fake news detection on social media: A data mining perspective
Understanding user profiles on social media for fake news detection
The role of user profiles for fake news detection
Embracing domain differences in fake news: Cross-domain fake news detection using multimodal data
Moral emotions shape the virality of COVID-19 misinformation on social media
CED: Credible early detection of social media rumors
Engage early, correct more: How journalists participate in false rumors online during crisis events
Ministry of Public Security
THULAC: An efficient lexical analyzer for Chinese
Techniques of neutralization: A theory of delinquency
Annual Report on Development of New Media in China
QSAN: A quantum-probability based signed attention network for explainable false information detection
Where are the facts? Searching for fact-checked information to alleviate the spread of fake news
Radiation fears prompt panic buying of salt
"Liar, liar pants on fire": A new benchmark dataset for fake news detection
Weibo community management regulations (trial)
Evidence-aware hierarchical interactive attention networks for explainable claim verification
Category-controlled encoder-decoder for fake news detection
Misinformation in the Chinese Weibo
197 punished for spreading rumors about stock market, Tianjin blast
Constructing the affective lexicon ontology
Detecting fake news by exploring the consistency of multimodal data
Automatic detection of rumor on Sina Weibo
Fake news research: Theories, detection strategies, and open problems
Multi-modal knowledge-aware event memory network for social media rumor detection
Mining dual emotion for fake news detection
An overview of online fake news: Characterization, detection, and discussion
Fake news propagate differently from real news even at early stages of spreading
Proceedings of the 24th International Conference on World Wide Web
Network-based fake news detection: A pattern-driven approach
Generalizing to the future: Mitigating entity bias in fake news detection
Detection and resolution of rumours in social media: A survey
Acknowledgments. The authors thank Carole Bernard, Xirong Li, and Amrita Bhattacharjee for their proofreading and feedback on the manuscript.