key: cord-0573672-rny7jcxb title: The Evolution of Rumors on a Closed Platform during COVID-19 authors: Andrea W Wang; Jo-Yu Lan; Chihhao Yu; Ming-Hung Wang (Information Operations Research Group, Department of Information Engineering and Computer Science, Feng Chia University) date: 2021-04-28 In this work we looked into a dataset of 114 thousand suspicious messages collected from the most popular closed messaging platform in Taiwan between January and July 2020. We propose a hybrid algorithm that can efficiently cluster a large number of text messages according to their topics and narratives. That is, we obtain groups of messages that lie within a limited degree of content alteration of one another. By applying the algorithm to the dataset, we were able to examine the content alterations and temporal dynamics of each particular rumor over time. Through qualitative case studies of three COVID-19-related rumors, we found that key authoritative figures were often misquoted in false information, and that this was an effective way to increase a false message's popularity. In addition, fact-checking was not effective in stopping misinformation from gaining attention; the popularity of a false message was more often driven by major societal events and effective content alterations. Online social media has democratized content. By creating a direct path from content producers to consumers, the power to produce and share information has been redistributed from a limited few to the general population. However, social media platforms have also given rise to the proliferation of misinformation and enabled the fast dissemination of unverified rumors [24] [14] [8]. In 2020, the COVID-19 pandemic put the world in crisis, both physically and psychologically.
Simultaneously, a myriad of unverified information flowed through social media and online outlets. The situation was so severe that the World Health Organization declared it an infodemic in February 2020 [26]. According to studies, rumors and claims promoting erroneous health practices can have long-lasting effects on physical and psychological health, and they even interfered with the control of COVID-19 in various parts of the world [2] [23]. In light of the infodemic, several investigations have examined COVID-19 misinformation from various angles. Topics included, but were not limited to, the types and contents of COVID-19 misinformation [27] [5], the spread and prevalence of rumors on social media platforms [7], [13], [10], [27], [19], [20], the consequences of misinformation [6], and the application of machine learning algorithms to rumor analyses [21] [11]. However, the majority of these studies focused on data collected from public social media platforms such as Twitter, Facebook, or Weibo. Explorations of closed messaging platforms, such as WhatsApp, WeChat, or LINE, remain extremely scarce. While popular social media platforms are indeed important venues for studying online behaviours and expressions, closed platforms remain an integral place to look at, given their more private settings. Our contribution to the current research is threefold. First, by investigating COVID-19 messages on LINE, we add to the limited research on COVID-19 rumors on closed messaging platforms [17] [18]. According to the Taiwan Communication Survey in 2018, 98.5% of people in Taiwan used LINE as their primary messaging tool, making LINE the most popular instant messaging platform in Taiwan.
1 We looked into a dataset of 114,124 suspicious messages reported by LINE users in Taiwan between January 2020 and July 2020. Second, we propose an efficient algorithm that clusters a large number of text messages according to their topics and narratives without having to decide the number of groups beforehand. The result is a set of clusters in which each cluster contains only messages that are within limited alterations of each other; thus, each cluster corresponds to one specific rumor. Third, using the results of the algorithm, we were able to examine the dynamics of each particular rumor over time. To the best of our knowledge, we are the first to study not only how the content of a specific COVID-19 rumor evolved over time but also the interaction between content change and popularity. We found that some forms of content alteration were successful in aiding the spread of false information. The major findings of this work are threefold: 1. False COVID-19 rumors resurfaced multiple times, each time with some degree of content alteration. 2. Fact-checking did not effectively alleviate the spread of COVID-19-related false information; the popularity of rumors was more influenced by major societal events. 3. Key authoritative figures were often falsely mentioned or quoted in misinformation, and this practice helped increase a message's popularity. This paper is organized as follows: we introduce our data in Section 3; we present the proposed algorithm for clustering text data in Section 4 and subsequently compare it with other clustering techniques in Section 5.2; finally, we review three high-volume pieces of COVID-19 false information in Section 5.3. We discuss and conclude this work in Sections 6 and 7. In the following sections, we use "clusters" and "groups" interchangeably. We describe a group of suspicious messages as one rumor, since messages belonging to the same group are seen as one narrative, and we refer to rumors that are verified false as misinformation or false information.
From the inception of the pandemic, several survey studies revealed that people relied on social media to gather COVID-19 information and guidelines [15] [16]. Misinformation on social media has since been a keen interest of the research community, and efforts have been put into studies of true and false rumors on social media [19]. For example, Cinelli et al. compared feedback on reliable and questionable information across five platforms, including Twitter, YouTube, and Gab. The study showed that users on the less regulated platform, Gab, responded to questionable information four times as much as those on the more reliable ones; YouTube users were more attracted to reliable content, and Twitter users reacted to both kinds more equally [7]. Gallotti et al. looked at how much unreliable information Twitter users were exposed to across countries. While the level of exposure was country-dependent, they revealed that exposure to unreliable information decreased globally as the pandemic aggravated [10]. Machine learning and deep learning techniques have been used to study the topics and sentiments of COVID-19 misinformation [1]. For example, Jelodar et al. used Latent Dirichlet Allocation to extract topics from 560 thousand COVID-19 Twitter posts and then used an LSTM neural network to classify the sentiments of posts [11]. By applying the Structural Topic Model and the Walktrap algorithm, Jo et al. classified questions and answers from South Korea's largest online forum and discovered that questions related to COVID-19 symptoms and related government policies revealed the most fear and anxiety [12]. Furthermore, by employing a multimodal deep neural network for demographic inference and the VADER model for sentiment analysis, Zhang et al. performed a cross-sectional study of Twitter users. They found that older people exhibited more fear and depression toward COVID-19 than their younger counterparts, and that females were generally less concerned about the pandemic [28].
Previous investigations of rumors indicated that individuals are more likely to believe questionable statements after seeing them repeatedly [4] [3], and that rumors become more powerful after being shared multiple times [9]. Most studies only look at the broad topics of misinformation: some compared reliable versus unreliable information [7] [10] [27], and others employed natural language processing techniques to reduce thousands of social media posts into 10 to 20 topic groups [1] [11] [12] [7]. Shih et al. instead investigated the content changes and temporal diffusion patterns of 17 popular political rumors on Twitter [22]. They found that false rumors came back repeatedly, usually becoming more extreme and intense in wording, while true information did not resurface at all. To the best of our knowledge, there has not been a similar study of COVID-19 rumors. In Taiwan, LINE users can voluntarily forward suspicious messages to fact-checking LINE bots such as Cofacts 2 or MyGoPen 3. The bots archive the messages and check them against their existing databases; if a message has been fact-checked, the bots reply with the fact-checked results. We obtained a dataset of 210,221 suspicious messages forwarded by LINE users to a fact-checking LINE bot between January and July 2020. The dataset included rumors related to COVID-19 as well as other topics. Before clustering, we preprocessed each message with the following steps: 1. Removed non-Simplified and non-Traditional Chinese characters. 2. Tokenized with Jieba 4. 3. Removed tokens that are Chinese stopwords. In the following sections, we focus on longer texts: we only looked at the 114,124 messages having at least 20 tokens. The character distribution is presented in Table 1. Along with the text content of each reported message, we also obtained the report time of each message and a unique identifier for the LINE user that reported it.
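The three preprocessing steps above can be sketched as follows. This is a minimal illustration: the paper used Jieba for tokenization, but to keep the example self-contained a per-character split stands in for it, and the stopword list here is a tiny illustrative sample rather than a real Chinese stopword list.

```python
import re

# Unified CJK ideograph range covers both Simplified and Traditional characters.
CJK = re.compile(r"[\u4e00-\u9fff]")
STOPWORDS = {"的", "了", "是"}  # tiny illustrative sample, not a real list

def preprocess(message: str) -> list[str]:
    chars = CJK.findall(message)                      # 1. keep only Chinese characters
    tokens = chars                                    # 2. tokenize (Jieba in the paper)
    return [t for t in tokens if t not in STOPWORDS]  # 3. drop Chinese stopwords
```

In the paper's pipeline, step 2 would be `jieba.lcut(...)` over the filtered text; the per-character stand-in only keeps this sketch dependency-free.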
Note that the user identifiers we received were scrambled; therefore, it was not possible for us to attribute any message back to an actual LINE user.

                                          Alphabets  Tokens
Min       24      24      0      0       0          20
Median    233     145     7      2       38         58
Max       10012   8132    3252   7014    5532       2971

Table 1: Character components of messages having at least 20 tokens. "Others" include characters such as punctuation marks and emojis.

In this section, we describe our problem and the proposed clustering algorithm. Following the terminology of natural language processing, we use "document" in this section to refer to one message in our dataset. Given a set of n documents, we would like to group them into m clusters, where each cluster is made up of documents that are very similar in their usage of terms, differing only within a limited degree of text alteration. Intuitively, we want each cluster to contain documents that talk about the same thing in the same way. Note that m is unknown beforehand. For example, given two documents A and B, they should be in the same cluster if the overlapping terms of A and B constitute a large part of both A and B. However, if the overlapping terms make up a large part of A but not of B, then they should be in different clusters, because B is made up of A plus some other terms. Formally, we define the terms of a document to be its token set after tokenization, and the distance between two documents A and B to be

d(A, B) = 1 − |tok(A) ∩ tok(B)| / max(|tok(A)|, |tok(B)|),

where tok(·) is the set of tokens of a document and |·| is the number of elements in a set. Given the tokenized document set D_T, a train portion p, and a distance threshold λ, the algorithm proceeds as follows: 1. Select p × |D_T| elements from D_T, denoted as D_T_p, and let the unselected rest be the set D_T_q. 2. Compute the pairwise distance matrix M over the elements of D_T_p, using the distance defined above. 3. Feed M into hierarchical clustering with distance threshold λ. We get back a sequence of numbers L_p, where (L_p)_i is the label of element (D_T_p)_i; elements with the same label are in the same cluster. 4. Since the label values themselves do not carry meaning, relabel them so they are all non-negative whole numbers.
Denote the updated label set as L_p. 5. Train a K-Nearest Neighbors classifier K on the training set (D_T_p, L_p), and then use K to predict the labels of D_T_q; denote the predictions as L_q. 6. Combine L_p and L_q into the output L. Output: L, where the i-th element, denoted (L)_i, is the label of (D_T)_i. Note that the label value itself does not carry any meaning; however, elements of D_T with the same label belong to the same cluster. We randomly selected 50,000 messages from the dataset and used the pure hierarchical clustering algorithm to cluster them. The messages were separated into 7,401 groups. The largest group had 1,082 messages, and the smallest contained only 1. There were 5,231 groups with only one message, meaning the remaining 44,796 messages were separated into 2,170 groups. There were 12 groups with at least 500 messages. We opted for precision, recall, and F-score as evaluation metrics. In the sense of information retrieval, precision is the number of correct results returned divided by all results returned; high precision means the predictions are highly relevant. Recall measures the number of correct results returned divided by the total number of correct results; high recall corresponds to the completeness of the returned results. Note that simply by returning all documents, one could achieve 100% recall, but at very low precision. Therefore, precision and recall must be taken together to judge the quality of a classification. F-score, defined as the harmonic mean of precision and recall, is one such measure that combines the two. We compared speed and performance among four models: 1. Hierarchical clustering only (clustering); the result from this model is considered the ground truth. 2. The cluster-classification model (hybrid); this is our proposed algorithm. 3. Latent Dirichlet Allocation (LDA). 4. PCA followed by K-Means (pca+kmeans). Throughout the experiments we used distance threshold λ = 0.6.
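The cluster-then-classify procedure above can be sketched as follows. This is a minimal sketch under assumptions: SciPy's average-linkage hierarchical clustering and a multi-hot token encoding for the K-Nearest Neighbors step stand in for whatever implementations the paper used, and the toy parameters are ours.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.neighbors import KNeighborsClassifier

def token_distance(a: set, b: set) -> float:
    # Small only when the shared tokens dominate BOTH documents (max denominator).
    return 1.0 - len(a & b) / max(len(a), len(b))

def hybrid_cluster(token_sets, p=0.4, lam=0.6, seed=0):
    n = len(token_sets)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = max(2, int(p * n))          # clustering needs at least 2 points
    train_idx, rest_idx = idx[:n_train], idx[n_train:]   # steps 1: D_T_p / D_T_q
    train = [token_sets[i] for i in train_idx]

    # Step 2: pairwise distance matrix over the training sample.
    m = len(train)
    dm = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            dm[i, j] = dm[j, i] = token_distance(train[i], train[j])

    # Steps 3-4: hierarchical clustering, cut at distance threshold lam.
    labels_train = fcluster(linkage(squareform(dm), method="average"),
                            t=lam, criterion="distance")

    # Step 5: KNN needs vectors, so encode token sets as multi-hot vectors.
    vocab = {t: k for k, t in enumerate(sorted(set().union(*token_sets)))}
    def vec(s):
        v = np.zeros(len(vocab))
        for t in s:
            v[vocab[t]] = 1.0
        return v

    # Step 6: combine training labels and KNN predictions into one output.
    labels = np.empty(n, dtype=int)
    labels[train_idx] = labels_train
    if len(rest_idx) > 0:
        knn = KNeighborsClassifier(n_neighbors=1)
        knn.fit(np.array([vec(s) for s in train]), labels_train)
        labels[rest_idx] = knn.predict(
            np.array([vec(token_sets[i]) for i in rest_idx]))
    return labels
```

The speed-up comes from computing pairwise distances only over the p-fraction of the data; the remaining documents are assigned by the (much cheaper) classifier.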
Both LDA and pca+kmeans require a predefined number of groups, which does not really fit our purposes. However, for the sake of comparison, we used the number of groups output by the clustering model as input to both. Suppose the input is a tokenized set of k documents D_T, and the clustering model puts the k documents into n groups (g_1, g_2, ..., g_n), where g_1 is the group with the largest number of documents and g_n the least. Another model M puts D_T into m groups (l_1, l_2, ..., l_m). We calculated the precision, recall, and F-score of model M as follows: for each group g_i, find the l_k that has the most overlapping elements with g_i; calculate the precision p_k, recall r_k, and F-score f_k of l_k by comparing it with g_i; and accumulate the scores (e.g., r ← r + r_k) over all groups. In each experiment, we ran 5 iterations; in each iteration, we randomly selected k messages from our dataset and obtained one precision and recall value, and we used the results of the 5 iterations to calculate confidence intervals. As shown in Figure 3, the hybrid model greatly reduced the time required, especially when p was 0.6 or less. Furthermore, its performance metrics remained above 99% across levels of p (Figures 4, 5, 6). This shows that the hybrid model's group assignments were very complete (measured by recall) and that the K-Nearest Neighbors classification did not introduce many errors into each group (measured by precision). From Table 3, we observed that LDA was much slower than the other models; furthermore, its precision was very low, meaning that the predicted groups could contain many false positives. On the other hand, pca+kmeans was 10 times slower than clustering; while its precision was comparable to that of the hybrid method, its recall was only 73%, showing that pca+kmeans would miss many transformations of a message. We used the hybrid method with train portion p = 0.4 and distance threshold λ = 0.6 to cluster the whole set of 114 thousand messages.
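The group-matching evaluation described above can be sketched as follows. The unweighted averaging over ground-truth groups is an assumption on our part, since the source gives only a fragment of the procedure.

```python
# For each ground-truth group g_i, find the predicted group l_k with the largest
# overlap, score l_k against g_i, and average the per-group scores.
def match_and_score(truth_groups, pred_groups):
    p_sum = r_sum = f_sum = 0.0
    for g in truth_groups:
        best = max(pred_groups, key=lambda l: len(l & g))  # most overlapping l_k
        tp = len(best & g)
        p = tp / len(best) if best else 0.0   # relevant share of the prediction
        r = tp / len(g)                       # covered share of the truth group
        f = 2 * p * r / (p + r) if p + r else 0.0
        p_sum += p
        r_sum += r
        f_sum += f
    n = len(truth_groups)
    return p_sum / n, r_sum / n, f_sum / n
```

For example, with truth groups {0,1,2}, {3,4} and predicted groups {0,1}, {2,3,4}, the first truth group matches {0,1} (precision 1, recall 2/3) and the second matches {2,3,4} (precision 2/3, recall 1).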
The messages were separated into 12,260 groups. Among those, 8,529 groups had only one message; the remaining 105,595 messages were therefore separated into 3,731 groups. The largest group had 2,546 messages, and there were 15 groups with at least 1,000 elements. We present the statistics of group sizes in Table 4 (mean, std, max, Q3, Q2, min). In this section we present some high-volume suspicious messages related to COVID-19, obtained from the clustering described in Section 5.2. "Academician Zhong Nan-Shan emphasized repeatedly: 'Do not go outside! Wait until at least the Lantern Festival to assess the situation of the epidemic.' Be warned that even if you're cured, you will suffer for the rest of your life. This is a plague worse than SARS. The side effects of the drugs are more severe... This is a war, not a game... There is no outsider in this war..." First of all, the time-sensitive information in the message evolved with time. At its early stage, "Lantern Festival" (February 8th, 2020) was spotted in the majority of messages. However, on February 18th, we spotted the first message that replaced "Lantern Festival" with "March". Then, after March 10th, the majority of reported messages used "Mid-Autumn Festival" (June 25th, 2020). Secondly, effort was put into emphasizing the authoritativeness of the figure the message was quoted from. The first form of this message started with a quotation from the Mainland Academician Zhong Nan-Shan, who gained fame during the SARS pandemic in 2003 5. Other titles, such as "Expert in Pandemic from Mainland China" or "Expert in Coronavirus", were also observed in some transformations. Later, on February 18th, an age was first seen in the message: "Expert in Coronavirus from Mainland China, 78-year-old Academician Zhong Nan-Shan, emphasized...". From March 10th through March 31st, almost every message included the age. Then, starting from April 1st, every reported message had Zhong replaced by Chen Shih-chung.
As the Director of Taiwan's Central Epidemic Command Center (CECC), Chen's popularity skyrocketed during the pandemic through his daily press conferences. This was also when we observed the highest peaks of reported messages. Due to the prevalence of this message on the web and on closed platforms, the Ministry of Health and Welfare and the CECC sent out a press release and a Facebook post 6 7 on April 2nd, reminding the public that this was false information. Nevertheless, this did not stop another viral spread of the same message at the end of a four-day holiday in Taiwan, during which crowds were seen at every tourist attraction on the island. For days, people worried that the long weekend would lead to another outbreak of the pandemic, which explains why a message bearing the key topic "do not go out" became a big hit. In this case we looked at messages that promoted drinking salt water to prevent the coronavirus. In fact, we investigated two messages and the combination of the two (Table 7). We first observed Message (B) in our dataset on March 16th. Over the course of its evolution, several medical personnel, such as the Director of The Veteran Hospital or Dr. Wang of Tung Hospital (who is, in fact, an orthopedist), were misquoted, showing the use of authoritative power to spread this piece of false medical information. The highest peak was on March 27th, when 265 documents were reported. Around the same time, a small number of copies of Message (A) were also circulating; however, it did not get as much attention as Message (B) before the two messages merged into one on March 27th and went viral shortly after, on March 30th (orange line in Figure 8). In fact, Message (B) was fact-checked rather early by the Taiwan FactChecking Center 8, which announced it as misinformation on March 19th 9; however, this did not stop the piece from misquoting doctors and continuing to spread.
As a matter of fact, several translations of Message (A+B) were reported in April, including but not limited to English, Indonesian, Filipino, and Tibetan. The lifespan of this "drink salted water" message was rather long: another famous fact-checking platform in Taiwan, MyGoPen 10, released an article disproving this false medical advice again in October 2020 11, seven months after it was first seen in our dataset. "This is 100% accurate information... Why did we see a huge decline of confirmed cases in China during the last few days? They simply forced their citizens to rinse their mouths with salted water 3 times a day and then drink water for 5 minutes. The virus would attack throats before the lungs, and when getting in touch with salted water, the virus would die or get destroyed in the lungs. This is the only way to prevent the spread of COVID-19. There is no need to buy medicine as there is nothing effective on the market. Before reaching the lungs, the Novel Coronavirus would survive in throats for four days. At this stage, people would experience sore throats and start coughing. If one can drink as much warm water with salt and vinegar, the virus could be destroyed. Share this information to save people's lives." A third high-volume message was also fact-checked 12; however, the fact-check did not keep it from getting attention. The content started in an authoritative tone, announcing "We are at the most critical period of COVID-19", and then provided a list of "do's and don'ts". While some suggestions made medical sense in terms of hygiene, others did not 13. The message did not state explicitly what the critical period referred to; however, taking the listed "guidelines" into account, we can deduce that it hinted at the "critical period to prevent community spread". Community spread (社區感染) is a phase in a pandemic when, for many people who tested positive in an area, it cannot be determined how they got infected 14.
It is not hard to imagine that people would be concerned and worried about this significant phase, in which the risk of getting infected greatly increases. In fact, we observed that such concerns co-occurred with the spread of this message in February. On February 15th, 2020, Taiwan's Central Epidemic Command Center (CECC) reported that a taxi driver, infected by a person who had traveled back from China, had tested positive for the virus. He died on the same day and became the first death case in Taiwan. Over the next four days, four of his family members also tested positive, forming the first COVID-19 cluster in Taiwan. During that time, people's concern about community spread was looming; the Google Trends volume for the search term "社區感染 (community spread)" sharply increased on February 16th (Figure 10). Also, during this period, the number of reported messages sharply increased (Figure 9). Content-wise, as in the first two cases, authorities, especially medical personnel, were invoked in several versions of the same message to "endorse" the content (Table 9). We spotted a major revision of the message on February 12th, six days after the first report, in which the 18 bullets were pruned to 14 and strong words were softened to a gentler tone. Last but not least, the message added a signature, "Regards from Medical Association", on its last line. This became the most widespread version afterwards. Out of the 394 documents reported on Feb

(Table 9 excerpt, translated from Chinese: "...stop going to hair salons. 11. Hang worn clothes (coats, trousers) outside alone for 2 hours before bringing them home. 12. Stop wearing jewelry. 13. Whenever you touch money, ...")

Similar to the findings of [25], we found that fact-checking did not effectively alleviate the spread of false information; the popularity of rumors was more associated with major societal events or content changes. In addition to the above three case studies, we went through five other COVID-19-related rumors and manually identified common patterns of textual change in their propagation.
First of all, we observed that key authoritative figures were often (falsely) mentioned or quoted. For example, COVID-19 rumors often invoked medical figures, such as doctors or the head of the CECC. In addition, it was quite common to observe messages carrying a line or two of disclaimer expressing uncertainty about the truthfulness of the forwarded message; for example, "The following is for your reference only, I do not guarantee the truthfulness of the message" (以下謹提供參考不代表是否正確) was seen in some messages during propagation. Many messages also included Simplified Chinese characters or terms that are rarely used in Taiwan: for example, while people in Taiwan refer to the SARS pandemic as "SARS", a large number of messages used "非典", a term more popular in China. We also noticed messages that were merges of previously independent ones, and messages that included translations into non-Chinese languages. These characteristics could serve as rules for discovering possible false information as an early-detection mechanism. Although we identified these characteristics manually this time, it is quite possible to employ techniques such as natural language processing to recognize these textual changes automatically in the future, making it possible to build an automatic early-warning system for misinformation that does not require fact-checking by professionals. This study has several limitations. First, the data was collected from people's reports; it was therefore impossible to infer the true distribution of messages without making assumptions. That is, if we saw more health-related misinformation in our data, it did not necessarily mean that more health-related rumors were circulating on the platform; it could also be that people were more alert and skeptical about the truthfulness of health-related information. In addition, we only looked at text messages, so information distributed visually or in audio was not covered.
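As a sketch of how such textual characteristics could feed an automatic early-warning system, the rules below flag a message for Simplified-only characters, forwarding disclaimers, and authority mentions. The phrase lists and the sample of Simplified-only characters are hypothetical illustrations, not taken from the paper.

```python
# Hypothetical rule-based flags inspired by the patterns described above.
SIMPLIFIED_ONLY = set("对国说专业时间头发东")          # tiny illustrative sample
DISCLAIMER_PHRASES = ("僅供參考", "不代表是否正確")     # "for reference only", etc.
AUTHORITY_CUES = ("醫師", "院長", "專家", "指揮中心")   # doctor, director, expert, CECC

def suspicion_flags(message: str) -> dict:
    """Return which rumor-like characteristics a message exhibits."""
    return {
        "simplified_chars": any(c in SIMPLIFIED_ONLY for c in message),
        "disclaimer": any(p in message for p in DISCLAIMER_PHRASES),
        "authority_mention": any(p in message for p in AUTHORITY_CUES),
    }
```

A production system would of course need curated phrase lists and a proper Simplified/Traditional character mapping, but even simple substring rules like these can triage reported messages before human fact-checkers see them.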
Lastly, our algorithm for grouping messages does not work well with short texts. In this paper, we analyzed COVID-19-related rumors on a closed messaging platform, LINE. We proposed a clustering algorithm that substantially reduced the required computational time and enabled us to investigate the evolution of text messages; indeed, it enables the research community to perform large-scale studies of the evolution of text messages at the message level rather than the topic level. Similar to what [22] discovered in its study of 17 political rumors, we found that false COVID-19 rumors tend to resurface multiple times, even after being fact-checked, and with different degrees of content alteration. Furthermore, the messages often falsely quoted or mentioned authoritative figures, and this practice helped the rumors reach broader audiences. The resurfacing patterns also seemed to be influenced by major societal events and content changes. However, each peak of popularity did not last long, often without a good explanation of how one wave of propagation ended. To the best of our knowledge, this is one of the few works that study COVID-19 misinformation on closed messaging platforms, and the first to study the textual evolution of COVID-19-related rumors during propagation. We hope this work sparks further studies of rumor propagation patterns.

References
[1] Top concerns of tweeters during the COVID-19 pandemic: infoveillance study
[2] Disaster Medicine and Public Health Preparedness
[3] Rumors and health care reform: Experiments in political misinformation
[4] The validity effect: A search for mediating variables
[5] Types, sources, and claims of COVID-19 misinformation
[6] The causes and consequences of COVID-19 misperceptions: Understanding the role of news and social media
[7] The COVID-19 social media infodemic
[8] The spreading of misinformation online
[9] Rumor psychology: Social and organizational approaches
[10] Assessing the risks of 'infodemics' in response to COVID-19 epidemics
[11] Deep sentiment classification and topic discovery on novel coronavirus or COVID-19 online discussions: NLP using LSTM recurrent neural network approach
[12] Online information exchange and anxiety spread in the early stage of the novel coronavirus (COVID-19) outbreak in South Korea: structural topic model and network analysis
[13] Coronavirus Goes Viral: Quantifying the COVID-19 Misinformation Epidemic on Twitter. Cureus 12.3
[14] The science of fake news
[15] Attitude Toward Protective Behavior Engagement During COVID-19 Pandemic in Malaysia: The Role of E-government and Social Media
[16] Knowledge and awareness regarding spread and prevention of COVID-19 among the young adults of Karachi
[17] Analysing Public Opinion and Misinformation in a COVID-19 Telegram Group Chat
[18] COVID-19 Related Misinformation on Social Media: A Qualitative Study from Iran
[19] COVID-19 infodemic: More retweets for science-based information on coronavirus than for false information
[20] An exploratory study of COVID-19 misinformation on Twitter
[21] Rumor Detection of COVID-19 Pandemic on Online Social Networks
[22] The diffusion of misinformation on social media: Temporal pattern, message, and source
[23] Impact of rumors and misinformation on COVID-19 in social media
[24] The spread of true and false news online
[25] The elusive backfire effect: Mass attitudes' steadfast factual adherence
[26] Novel Coronavirus (2019-nCoV)
[27] Prevalence of low-credibility information on Twitter during the COVID-19 outbreak
[28] Understanding Concerns, Sentiments, and Disparities Among Population Groups During the COVID-19 Pandemic Via Twitter Data Mining: Large-scale Cross-sectional Study