key: cord-0603053-9vwg2jly
authors: Cui, Hao; Kertész, János
title: Attention dynamics on the Chinese social media Sina Weibo during the COVID-19 pandemic
date: 2020-08-10
journal: nan
DOI: nan
sha: bf32f5647aef51af2780c9e6e14f06d679227537
doc_id: 603053
cord_uid: 9vwg2jly

COVID-19 was first detected in Hubei province of China and has had a severe impact on life in the country since then. We investigate how this epidemic has influenced attention dynamics on the biggest Chinese microblogging website Sina Weibo in the period December 16, 2019 - April 17, 2020. We focus on the real-time Hot Search List (HSL), which provides the ranking of the 50 most popular hashtags based on the amount of Sina Weibo searches on them. We show how specific events, measures and developments during the epidemic affected the emergence of new hashtags and the ranking on the HSL. A significant increase of COVID-19 related hashtags started to occur on the HSL around January 20, 2020, when the transmission of the disease between humans was announced. Very rapidly a situation was reached in which COVID-related hashtags occupied 30-70% of the HSL, albeit with changing content. We give an analysis of how the hashtag topics changed during the investigated time span and conclude that there are three periods separated by February 12 and March 12. In period 1, we see strong topical correlations and clustering of hashtags; in period 2, the correlations are weakened, without a clustering pattern; in period 3, we see some clustering, though not as strong as in period 1. To quantify the dynamics of the HSL we measured the lifetimes of hashtags on the list and the rank diversity at given ranks.
Our observations indicate attention diversification since the COVID-19 outbreak in Mainland China, a higher rank diversity at the top 15 ranks of the HSL due to the COVID-19 related hashtags, and a drastic attention decay shortly after the outbreak followed by a slower decay over a longer period.

In our times of information deluge the dynamics of public attention is of eminent importance from many aspects, including education, politics, marketing and governance. On the new media the flow of information has dramatically accelerated, often leading to rapidly changing public attention. At the same time these media provide unprecedented possibilities to study attention dynamics [1, 2] as they produce Big Data open for investigation. The microblogging service Twitter [3] is particularly suited to provide the basis for quantitative studies on the dynamics of public attention as the content of the messages is available [4]. Accordingly, Twitter data have been used to identify classes of dynamical collective attention [5], to investigate party-related activity and its predictive power for elections [6], to model the related attention dynamics [7], and to study the relationship between public attention and social emotions [8]. Public attention becomes a focal issue in times of crises like pandemics. As early as 2010, four years after it was launched, Twitter was shown to be an adequate, real-time content, sentiment, and public attention trend-tracking tool [9] and was used to study rapidly evolving public sentiment with respect to the H1N1 epidemic [10]. The analysis of tweets made it possible to quantify the difference between attention and fear and their distance-dependence in the case of the Ebola epidemic [11]. Even for the present COVID-19 pandemic, the first Twitter studies on public attention have appeared [12, 13], mainly focusing on the perception of policies by the public.
During a critical time of the Spring Festival travel rush, Wuhan, the capital city of Hubei province, was reported to be the first COVID-19 epicenter. Twitter is blocked in China, but its local substitute, Sina Weibo, is very popular [14]; it is therefore natural to use data from Weibo for purposes similar to those for which Twitter data have been used in other countries. Posts on Sina Weibo are predominantly in Chinese, which creates a language barrier; however, scientists have already recognized that this microblogging service provides important insight into the functioning of Chinese society [15, 16]. Recently some studies have appeared dealing with the reaction of Sina Weibo to COVID-19, analyzing, e.g., the propagation of situational information [17]. In this study we focus on the attention dynamics in the period of COVID-19 using the Hot Search List (HSL) of Sina Weibo. This is a ranking of hashtags updated every minute, created according to an algorithm in which the number of searches on the hashtags is dominant. This ranking provides a proxy for the attention preferences of the Weibo users, enabling the quantification of their dynamics, which reflects the changes in attention due to events and measures.

Ranking is present in many fields of today's world from sports to universities, from wealthy individuals to purchasable goods. Recently, ranking dynamics has been studied widely from sports [18, 19, 20] to scientists, journals or companies [18]. There are stable rankings (like word frequencies) with little or no changes in the ranks and there are volatile ones with vivid dynamics (mentions of Twitter hashtags) [18]. Clearly, the Weibo HSL belongs to the latter with rich dynamic properties, which gives insight into the changes in the attention of the Weibo users. The Weibo HSL provides rich data about the public attention and its dynamics in China.
Based on these data, we have been able to identify different periods in the pandemic and could follow how the attention of the population shifted from one group of topics to another. The paper is organized as follows: in Section 2, we provide background information on the Sina Weibo real-time Hot Search List (HSL) and the methodologies for quantifying attention dynamics. In Section 3, we present our results on attention dynamics, correlations between different types of hashtags, and attention decay. In Section 4, we discuss and summarize the results.

Sina Weibo is the biggest Chinese microblogging website, with MAU (monthly active users) reaching 550 million and DAU (daily active users) 241 million in March 2020 [21]. Instead of using one hashtag at the beginning as on Twitter, topics on Weibo are enclosed in double hashtags, one at the beginning and one at the end of the topic description, for example, #Pneumonia of unknown cause detected in Wuhan#. In this paper, we use hashtag as a synonym of topic, and we refer to a hashtag as the content contained within the double #s. A hashtag becomes popular at a given time as it is used in many tweets and gains a large number of searches, likes and discussions by the users. The Hot Search List on Weibo is a section that displays the 50 most popular hashtags in real time. The hashtags on the HSL, together with their ranks and search volume indexes, are updated every minute [22], and new popular hashtags may emerge while others vanish. The search volume index is a comprehensive measure which takes into account multiple dimensions such as the number of searches in Sina Weibo and the quality of the user accounts involved in the search, with the aim of preventing manipulated fake popularity [23]. The third and sixth ranks on the HSL are sometimes occupied by promoted advertisements labeled with the character "荐" [24] (meaning recommendation).
We took data from the Weibo HSL to study attention dynamics as it captures vibrant real-time changes of public attention. Because one or two commercial advertisements may appear at the third and the sixth ranks, and in order to get a constant number of non-advertisement hashtags on the HSL at each timestamp, we removed all the hashtags labeled with "荐", re-ranked the original HSL and took the top 48 hashtags for each timestamp. All later references to the HSL in this paper mean the re-ranked HSL with 48 ranks. We collected the data on the HSL every 5 minutes from December 16, 2019 to April 17, 2020. There are in total 26022 hashtags and 9120 of them are related to aspects of COVID-19. To relate social media contents with the real-life pandemic situation in Mainland China, we collected the daily number of infections, deaths, and recoveries from the official website of the National Health Commission of China [25]. In the following subsection we explain how we identified the different categories of hashtags.

Fig. 1 shows the number of daily infections, deaths and recoveries in Mainland China. The numbers of daily infections and deaths have a sharp peak on February 12 due to the adoption of new diagnostic criteria [26]. The decreasing trend of daily infections since the peak turned to increasing after March 13, as a result of the rising number of imported coronavirus cases from abroad [27]. We will argue that there are three periods to be distinguished after the outburst of COVID-19 around January 19, separated by the maximum and local minimum of the daily number of infections on February 12 and March 12, respectively. The public attention towards COVID-19 is believed to change with the real-world pandemic situation.
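The re-ranking step described above (dropping entries labeled "荐" and keeping the top 48) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `rerank_snapshot` and the `(hashtag, label)` input layout are assumptions made for the example.

```python
from typing import List, Tuple

AD_LABEL = "荐"  # label that, per the text, marks promoted entries


def rerank_snapshot(
    snapshot: List[Tuple[str, str]], keep: int = 48
) -> List[Tuple[int, str]]:
    """Remove promoted entries from one HSL snapshot and re-rank the rest.

    `snapshot` is a hypothetical layout: (hashtag, label) pairs in the
    original rank order, where label == AD_LABEL marks an advertisement.
    Returns (new_rank, hashtag) pairs for the first `keep` organic hashtags.
    """
    organic = [tag for tag, label in snapshot if label != AD_LABEL]
    return [(rank, tag) for rank, tag in enumerate(organic[:keep], start=1)]


# Toy snapshot with 50 entries and advertisements at ranks 3 and 6.
snapshot = [(f"tag{i}", AD_LABEL if i in (3, 6) else "") for i in range(1, 51)]
reranked = rerank_snapshot(snapshot)
```

Truncating to 48 ranks gives every timestamp the same list length whether zero, one or two advertisements were present.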
To study the public attention towards COVID-related information, we first extracted hashtags which encompass all aspects of COVID-19 and classified them based on geographic regions and the exposure order under the pandemic into three categories: Mainland China, East Asia outside of Mainland China and Other Countries outside of East Asia. With a focus on COVID-hashtags related to Mainland China, we manually classified them based on semantic meanings into the following seven sub-categories. The Bad News category comprises hashtags on confirmed infections and deaths in different regions of Mainland China as well as shortages of essential supplies. The Good News category consists of news on cases of recovery, sufficiency of supplies, and decrease in daily infections or deaths. The Regulations category consists of authority responses of national, regional, institutional laws, rules and regulations associated with public behavior during the pandemic. The Life Influence category contains hashtags that reflect the pandemic influence on the aspects of citizen lives. The Front Lines category includes hashtags related to the lives of front line workers (mainly doctors and nurses) and their interactions with patients in hospitals. The Science category incorporates scientific understandings of the virus properties, vaccine development, and ways for public protection given by authoritative doctors. The Supports category takes into account hashtags on worldwide donations and emotional supports. All the classifications were made by human decisions due to the syntactic-semantic complexity of Chinese language. For ambiguous cases which contain information of more than one category, our classifications were based on the focus of the main subject. The Mainland China sub-categories are summarized in Table 1 together with examples. The full list of COVID-related hashtags is available in the dataset, which we have made public (see declaration at the end of the paper). 
To further understand how the Mainland China related COVID-hashtags are correlated with each other and with the daily number of infections/deaths/recoveries in the three separate time periods, we measured the Pearson's correlations between the seven series of the daily number of new hashtags in each of the sub-categories defined above, together with the three series of the daily number of infections/deaths/recoveries. The correlations of these ten time series are calculated using the percentage change between the current and the prior element instead of the actual value, in order to reduce the effect of the trend, which can cause spurious correlations. For time series category $X = \{X_{t_i} : t_i \in T, i = 1, 2, \ldots, n\}$ and category $Y = \{Y_{t_i} : t_i \in T, i = 1, 2, \ldots, n\}$, where $T$ is the time index set, the Pearson's correlation is calculated using the percentage change series

$\tilde{X} = \left\{ \frac{X_{t_i} - X_{t_{i-1}}}{X_{t_{i-1}}} : i = 2, \ldots, n \right\}$

and the analogously defined series $\tilde{Y}$.

One natural measure of social media attention towards a topic category is the quantity of the related hashtags. The growth pattern of the cumulative number of hashtags on the HSL with time reflects the dynamics of the public attention. We separately measured the growth of the cumulative number of all hashtags and of all COVID-related hashtags that ever appeared on the HSL in our observation period. To understand how much of the HSL is occupied by COVID-information at each timestamp, we constructed the historical ratio trajectory of the COVID-related hashtags on the HSL since the first COVID-hashtag #武汉发现不明原因肺炎# (#Pneumonia of unknown cause detected in Wuhan#) appeared on December 31, 2019.

The lifetime duration of a hashtag on the HSL indicates its ability to obtain persistent attention from the public. We quantified the duration (continuous existence on the HSL) of a hashtag with $\tau$:

$\tau = \tau_1 - \tau_0$,

where $\tau_0$ is the timestamp of the first and $\tau_1$ is the timestamp of the last appearance of a hashtag on the HSL. We compared the duration of the hashtags across various categories and different time scopes.
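As an illustration of the duration measure, the sketch below computes the time between the first and last appearance of each hashtag from a stream of sampled observations. The function name `durations_hours` and the `(timestamp, hashtag)` input layout are assumptions for the example; it does not check for gaps in a hashtag's presence.

```python
from datetime import datetime
from typing import Dict, Iterable, Tuple


def durations_hours(
    observations: Iterable[Tuple[datetime, str]]
) -> Dict[str, float]:
    """Return, per hashtag, last-appearance minus first-appearance in hours.

    `observations` is a hypothetical layout: one (timestamp, hashtag) pair
    for every hashtag seen on the list at every sampled timestamp.
    """
    first: Dict[str, datetime] = {}
    last: Dict[str, datetime] = {}
    for ts, tag in observations:
        if tag not in first or ts < first[tag]:
            first[tag] = ts
        if tag not in last or ts > last[tag]:
            last[tag] = ts
    return {
        tag: (last[tag] - first[tag]).total_seconds() / 3600.0 for tag in first
    }


# Toy data sampled every 5 minutes.
obs = [
    (datetime(2020, 1, 20, 10, 0), "a"),
    (datetime(2020, 1, 20, 10, 5), "a"),
    (datetime(2020, 1, 20, 10, 10), "a"),
    (datetime(2020, 1, 20, 10, 5), "b"),
]
tau = durations_hours(obs)
```

With 5-minute sampling the resolution of each duration is limited to the sampling interval, so a hashtag seen only once gets a duration of zero.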
We compared the duration of the hashtags before the outbreak on January 19, all COVID-related hashtags, and non-COVID hashtags after the outbreak. To ensure complete life cycles of the hashtags, we took all hashtags whose first arrivals on the HSL are between December 19, 2019 and January 18, 2020 as the sample for hashtags before the pandemic, which includes 6161 in total. Similarly, we took all COVID-hashtags whose first arrivals are no later than April 14, with a total number of 8808. For the non-COVID hashtags after the outbreak, we took a random sample of all non-COVID hashtags with the same size as the COVID sample. Hashtags that reappeared after disappearing from the HSL were excluded from our calculation. To understand the overall variation of attention towards COVID-hashtags with time, we investigated the daily value of their cumulative average duration. We denote by $D_j$ the cumulative average of duration from December 31, 2019 (day 0) until day $j$. $D_j$ is calculated as follows:

$D_j = \frac{1}{|S(j)|} \sum_{i=0}^{j} \sum_{\alpha} d_i^{\alpha}$,

where $d_i^{\alpha}$ is the duration of hashtag $\alpha$ whose first appearance was on day $i$, the inner sum runs over all such hashtags, and $S(j)$ is the set of all the hashtags whose first appearance is in the interval $[0, j]$.

The changes in the ranking patterns of the hashtags at different time periods reflect the general public attention dynamics. Rank diversity [20], which measures the number of different hashtags occupying a given rank over a given length of time, gives overall information on the total dynamical trend of the hashtags on the HSL. Rank diversity is known to give characteristic profiles for different types of systems; e.g., in open systems (where only the top part of the competing items is ranked) it behaves differently from closed systems (where all the items are ranked). We compared the rank diversity at the 48 ranks on the HSL before the outbreak and during the different periods after the outbreak, with and without COVID-19 hashtags.
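Rank diversity as described above (the number of different hashtags seen at a given rank over a time window) can be sketched as follows. The function name `rank_diversity` and the snapshot layout are assumptions for the example, and the optional normalization by the number of snapshots is one common convention rather than something the text specifies.

```python
from collections import defaultdict
from typing import Dict, Sequence, Set


def rank_diversity(
    snapshots: Sequence[Sequence[str]], normalize: bool = False
) -> Dict[int, float]:
    """Count the distinct hashtags observed at each rank over all snapshots.

    `snapshots` is a hypothetical layout: one ranked list of hashtags per
    timestamp, with index 0 holding rank 1. If `normalize` is True, the
    count is divided by the number of snapshots (an assumed convention).
    """
    seen: Dict[int, Set[str]] = defaultdict(set)
    for ranked in snapshots:
        for rank, tag in enumerate(ranked, start=1):
            seen[rank].add(tag)
    n = len(snapshots)
    return {
        rank: (len(tags) / n if normalize and n else float(len(tags)))
        for rank, tags in sorted(seen.items())
    }


# Toy window of three snapshots with two ranks each.
window = [["a", "b"], ["a", "c"], ["d", "c"]]
raw = rank_diversity(window)
norm = rank_diversity(window, normalize=True)
```

A hashtag that sits at one rank for many consecutive snapshots adds only one to that rank's count, which is why long-lived entries lower the diversity at their rank.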
The public attention towards a hashtag can also be indicated by its highest rank during its lifetime on the HSL. The highest rank of a hashtag reveals its best achievement when competing for attention with the other hashtags. We studied the highest rank distribution of the classified COVID-hashtags and compared the results with the hashtags before the outbreak as well as the non-COVID hashtags after the outbreak (SI). To understand the overall variation of the highest rank of COVID-hashtags with time, we investigated the daily trajectory of their cumulative average highest rank. We denote by $H_j$ the cumulative average of the highest rank from December 31, 2019 (day 0) until day $j$. $H_j$ is calculated as follows:

$H_j = \frac{1}{|S(j)|} \sum_{i=0}^{j} \sum_{\alpha} h_i^{\alpha}$,

where $h_i^{\alpha}$ is the highest rank of hashtag $\alpha$ whose first appearance was on day $i$, the inner sum runs over all such hashtags, and $S(j)$ is the set of all the hashtags whose first appearance is in the interval $[0, j]$.

The cumulative number of new hashtags on the HSL grows approximately linearly (see Fig. 2 (A)), indicating a nearly constant attention capacity and need for news of the users. Closer inspection shows, however, that the rate of new hashtags decreases between January 10 and February 12, followed by an increased rate until March 28, after which the original slope of 225 ± 4 new hashtags/day sets in. We attribute this change in the slope to the effect of COVID-related hashtags. The first COVID-related hashtag appeared on the HSL on December 31, 2019, followed by only a few in the following week. As the first death occurred on January 11, the second one appeared on the HSL on January 16, and more infected cases were detected in other cities in China as well as in the surrounding Asian countries, rumours and fear about the unknown pneumonia were permeating society, and the number of daily COVID-related hashtags started to increase rapidly on January 19. On January 20, Chinese authorities announced to the public that the new coronavirus is transmissible between humans.
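The cumulative averages introduced in the methods (the cumulative average duration and the cumulative average highest rank) share the same structure: the mean of a per-hashtag quantity over all hashtags that first appeared up to a given day. A minimal sketch, assuming a hypothetical `(first_day, value)` record layout and the illustrative function name `cumulative_average`:

```python
from typing import List, Optional, Sequence, Tuple


def cumulative_average(
    records: Sequence[Tuple[int, float]], n_days: int
) -> List[Optional[float]]:
    """Running mean of `value` over hashtags first seen on days 0..j.

    `records` is a hypothetical layout: one (first_day, value) pair per
    hashtag, where value is its duration or its highest rank. Entry j of
    the result is None if no hashtag has appeared by day j.
    """
    totals = [0.0] * n_days
    counts = [0] * n_days
    for day, value in records:
        if 0 <= day < n_days:
            totals[day] += value
            counts[day] += 1

    out: List[Optional[float]] = []
    run_total, run_count = 0.0, 0
    for j in range(n_days):
        run_total += totals[j]
        run_count += counts[j]
        out.append(run_total / run_count if run_count else None)
    return out


# Two hashtags first seen on day 0, one on day 2.
series = cumulative_average([(0, 4.0), (0, 2.0), (2, 6.0)], n_days=3)
```

Because every hashtag carries equal weight, days with many new short-lived hashtags pull the average down quickly.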
From our point of view the period until January 19 can be considered as pre-COVID. During that time at most three COVID-related hashtags per day occurred on the HSL and the cumulative number of different hashtags on the HSL grew approximately linearly with an unaltered slope (see Fig. 2 (A)). Around January 19 the number of COVID-related hashtags started growing and, at the same time, the overall growth of the total number of hashtags slightly decreased, indicating that the new COVID-related hashtags stay longer on the HSL than those before the outbreak. This results in a decrease of the total number of new hashtags per unit time on the HSL. After January 19, a rapid increase can be observed in the number of COVID-hashtags (see the inset of Fig. 2 (A)). This finally also has an effect on the total cumulative number of hashtags, resulting in an increased slope in Fig. 2 (A).

Fig. 2 (B) shows the cumulative number of geographically categorized COVID-hashtags with Mainland China, East Asia outside of Mainland China, and Other Countries outside of East Asia as categories. The Mainland China category starts to rise rapidly from January 19, reaches a peak in the following week, and then gradually drops with a few rebounds. The second peak and the decline of the Mainland China category are intertwined with the trajectory of the Other Countries category in mid-March. The East Asia category remains at a relatively low level throughout the pandemic. COVID-19 was first observed in East Asia, with Mainland China being the hardest-stricken region, followed by places with growing infections such as South Korea, the Diamond Princess cruise ship and Japan. The epicenter of COVID-19 later shifted to Europe and the rest of the world as the situation eased in East Asia. The results depicted in Fig. 2 (B) follow these events closely, confirming the role of the real-time HSL on Weibo as a reflection of the real world.
Unsurprisingly, the upward and downward trend periods of Mainland China and Other Countries coincide with Fig. 2 (C), where the ratio of COVID-related hashtags on the HSL at each timestamp is displayed. The swift third peak on April 4 in Fig. 2 Fig. 3 Fig. 3 (A) (C) (E), paired with their correlation matrices with daily infections, deaths, and recoveries in Fig. 3 (B) (D) (F). As noted above, we have identified three periods in the investigated time interval: the first period is January 19 - February 11, separated by the huge peak in Fig. 1 from the second one (February 12 - March 12). The third period (March 13 - April 17) is separated from period 2 by the second vertical line, where the number of new infections has a local minimum (Fig. 1 inset). In Table 1, we show the number of hashtags related to Mainland China in the different categories for the three periods. In Fig. 3, we show that the daily emergence of the categorized COVID-hashtags is dominated in the first two periods by Bad News, with increasing and decreasing trends in period 1 and period 2, respectively. In period 3, the categories Regulations, Life Influence, and Front Lines receive more attention than the rest of the categories. Here the consistently high values in Regulations and Life Influence could result from the worsening world pandemic situation along with the rise of imported infected cases in Mainland China, necessitating the establishment of measures to handle it. The categories of the Mainland China COVID-hashtags move with the number of infections and deaths in the world. The patterns of the Pearson's correlation matrix of the ten time series reflect a temporal structure with the three periods. Fig. 3 (B) shows a positive correlation block structure.
There are strong correlations between New Death, Regulations, Science, and Bad News (upper left block) as well as between Supports, Good News and Front Lines (lower right block), and there is considerable anti-correlation between the two blocks. Fig. 3 (D) (period 2) exhibits much weaker correlations; in fact, very few elements of the matrix reach values beyond the noise level (see SI). Exceptions are new strong correlations between New Death and Front Lines, as well as between Bad News and Front Lines. In the third period (Fig. 3 (F)) the block structure again becomes more pronounced, though not as pronounced as in the first period. Note that the categories had to be rearranged in order to achieve this structure. The major change is that Supports/Front Lines and Life Influence/Regulations have exchanged positions. In period 1, Bad News (mainly infections and deaths) about domestic cases in Mainland China was flooding in; this led to the urgent establishment of regulations, which in turn influenced daily life. In period 3, the domestic situation was under control; therefore, the Bad News in Mainland China was mainly caused by the worsening international situation (infections/deaths and Chinese citizens coming back from abroad). The Regulations and the corresponding Life Influence hashtags on these issues were then no longer strongly associated with domestic deaths. In period 3, the assisting front line doctors were gradually going back home after finishing their work and people expressed their gratitude to them, so that Front Lines and Supports were moving together.

What is the effect of COVID-19 on the ranking dynamics? Fig. 4 shows a comparison of the rank diversity at the top 48 ranks taking non-COVID and COVID hashtags in different periods. Striking differences are observed between the rank diversity plots before and after the outbreak. As Fig. 4 (A) suggests, the rank diversity plot before the outbreak was approximately linear with moderate fluctuations.
A clear gap emerges in the rank diversity after rank 15 in Fig. 4 (B) during the COVID period. We recognize resemblances in the rank diversity plots before the outbreak and after the outbreak considering only non-COVID hashtags, except for the strange drops at ranks 29 and 34 in Fig. 4 (C). Comparing Fig. 4 (D) with Fig. 4 (B), the gap after rank 15 is larger in the rank diversity plot considering only COVID-related hashtags. The rank diversity plots for hashtags in period 1 surpass those of period 2 and period 3 with both non-COVID and COVID hashtags, as depicted in Fig. 4 (C) and (D), while the difference is much higher in the latter case. Fig. 4 gives evidence that the COVID-hashtags cause the gap in the rank diversity plot after the outbreak. Taking the normalized rank diversity plot before the outbreak as a reference, a higher normalized rank diversity at a certain rank position represents a higher number of unique occurrences within the observation period, so that the COVID-related hashtags in the top 15 ranks change faster (with higher frequency) than normal. One possible explanation is that the COVID-hashtags kept emerging with higher frequency than before the outbreak and people paid much attention to these new hashtags. Additionally, when the flood of hashtags contained similar information, such as the new infections and deaths in different cities or provinces of China, the public interest towards individual hashtags could drop quickly, resulting in a higher number of unique hashtags at certain ranks in unit time on the HSL. This effect of higher rank diversity for higher ranks seems to be amplified by the algorithm, leading to the observed gap. Strange drops of rank diversity at ranks 29 and 34 can also be seen on our plots in Fig. 4. As provided in the SI, there are hashtags that stay at ranks 29 and 34 for an unusually long time and then disappear from the HSL, indicating algorithmic intervention from Weibo.
As one of the most popular and influential social media in China, Weibo might shoulder the responsibility during the global public health emergency to keep people informed about related news in China and around the globe, by means of changing the algorithm towards COVID-hashtags to promote crucial news and keep them updating in the top 15 positions and leave the list at rank 29 or 34. Our methods are sensitive enough to demonstrate this type of interventions. Therefore our observations reflect a combination of both spontaneous attention dynamics from the public and the controlled effects from Sina Weibo. Rank diversity captures attention dynamics from the point of view of the overall dynamical rank movements of the hashtags on the HSL. It is interesting to follow the dynamics also from the aspect of the individual hashtags. The average highest rank of a category of hashtags on a given day is characteristic to the attention paid to that category. (Note, of course, that getting to the HSL expresses already considerable attention.) Similarly, the average duration is another measure of attention. However, in the latter case it should be mentioned that short duration can be caused by decaying attention to the general topic (in this case the hashtag is likely to be replaced by another from a different topic) or because of the heavy stream of new hashtags of the same topic. How do the average highest rank and average duration accumulate with time? As Fig. 5 (A) shows, the cumulative average highest rank, H j is initially at a top rank, indicating that the first few hashtags about the unknown pneumonia received a huge amount of attention from the public. As more COVID-related hashtags occurred, H j becomes lower, with a rapid change at the beginning and a slower change later, separated by around January 30. This is due to the rapidly increasing number of COVID-related hashtags and the limited number of ranks on HSL. In Fig. 
5 (B), the first peak of the cumulative average duration, D j, is on January 8, when a hashtag appeared reporting that eight patients infected by the unknown pneumonia had recovered and left the hospital. D j then first decreases and then increases again, reaching the second peak on January 22, after which the increasing number of daily new hashtags with short durations started to play a greater role than the few hashtags with long durations. The fast decay of D j in the period between January 22 and February 18 (see the inset in Fig. 5 (B)) was fitted by an exponential function with α = 4.13 h, β = 0.31 h/day, γ = 5.72 h. On February 18, hashtags about positive changes in the COVID situation started to appear on the HSL. After that, D j exhibits a slower and longer decay.

In this work, we have studied the public attention dynamics on the biggest Chinese microblogging website Sina Weibo under the influence of the COVID-19 pandemic. We provide a novel approach to study and quantify the attention dynamics in terms of ranking dynamics, taking advantage of the real-time Hot Search List on Weibo. We have identified three periods within the investigated time interval and analyzed the attention dynamics on different COVID-related categories within them. We have compared the behavior of the hashtags before the outbreak with COVID-related and non-COVID hashtags after the outbreak. We have observed differences in the attention dynamics on the Weibo HSL in the different periods. First, the public attention is mainly driven by the infection and death situations in Mainland China, with mainly domestic cases at the beginning and internationally imported cases later. The attention variation follows major worldwide events. Second, the public attention on Weibo towards COVID-19 is diversified into different sub-categories after the outbreak on January 19, 2020, with varying correlation patterns in the three phases. Third, the attention decays as the situation in China gets better.
The cumulative average duration follows an exponential decay after the attention peak in the initial phase of the pandemic. The situation in China is interrelated with the world pandemic situation, which keeps changing, so that the decay of public attention on the Chinese social media Weibo is not a clear-cut case. Fourth, the rank diversity at the top 15 ranks is higher than normal due to COVID-hashtags. The reason can be that Weibo treats COVID-hashtags with different algorithms from normal ones, or possibly a combined influence of both the Weibo algorithm and the spontaneous preference of the public for COVID-related information.

Besides exploring the attention dynamics on the Chinese social media Sina Weibo, we also studied the cumulative growth of all topics and all the COVID-topics on the Twitter trending list in the United States. As is shown in the supplementary material Fig. SI1 (A), the cumulative number of all the Twitter trending topics in the United States is almost perfectly linear. The time period in which the cumulative number of all COVID-topics on the Twitter trending list increases is in accordance with the rising period of the number of hashtags in the Other Countries category in Fig. 2 (B). The similarity of the results on the Sina Weibo HSL and Twitter trending reflects that both platforms are influenced similarly by the major events worldwide during the COVID-19 pandemic. Although the Twitter trending list has more daily new topics than the Weibo HSL, the number of COVID-topics on Twitter is much smaller. The topics on Twitter are generally shorter and have a broader meaning, for example, #QuarantineLife, while the hashtags on Weibo are more detailed, for example, #小区窗台演唱会庆祝解除隔离# (#Community windowsill concert to celebrate the lifting of quarantine#), contributing to the large number of diverse hashtags on Weibo.
It should be emphasized that both for Sina Weibo and Twitter the lists are produced by unknown algorithms, and in the case of Sina Weibo we have been able to pinpoint direct interventions by the provider in the ranking. However, the level of detail of the Weibo HSL, its fixed length and the fact that the HSL is the same for all users seem to make the Weibo HSL more suitable for studying attention dynamics through ranking than Twitter, as Twitter trending lists have no fixed length and can be personalized.

Sina Weibo is the largest microblogging site in China, where Twitter, the most popular service of this kind worldwide, does not operate. It is a natural idea to try to compare our observations made on Sina Weibo with Twitter attention dynamics. Unfortunately, Twitter has no statistics comparable to the HSL. Instead, Twitter has a service that reports the most retweeted hashtags during the last 24 hours, updated every minute and broken down by country [1]. We have chosen to study the US tweets. Categorization of tweets has been widely investigated [2, 3], including recent attempts to analyze the impact of COVID-related topics [4] on Twitter by analyzing the sentiments towards 10 words related to COVID. Twitter even created a "COVID-19 stream" [5] to promote this type of research. In spite of these, a direct comparison of our results on Sina Weibo with Twitter is hindered by a number of factors, including the different characters of the listings, the different roles hashtags play in these services and the differences due to the scripts. Nevertheless, we tried to capture at least the overall trends (see Fig. SI1). As Fig. SI1 (B) shows, the number of COVID-topics on the Twitter trending list first grows very slowly and then starts to increase dramatically from late February 2020. The rate of COVID-related topics is, however, much smaller in the Twitter list than in that of Sina Weibo.
To understand how the time series of daily new hashtags in the different categories move together, and whether there are blocks of categories that co-move, we present the correlation matrices of the ten time series in the three periods after the outbreak. To assess the significance of the correlations we apply a null model, created by shuffling the times of the individual values, which smears out the correlations. Because the time series are finite, the null model has a non-zero background noise level, denoted by Z, against which the measured real correlations can be compared. Z is calculated by correlating 500 shuffled time series for each of the 10 categories. We observed that all pairs have similar standard deviations, between about 0.16 and 0.2, and we take the uniform value Z = 0.2. In Fig. SI2 we show only those correlation matrix elements C_ij for which |C_ij| > Z. The figure shows the different Mainland China topical categories and their thresholded correlations in the three pandemic phases. In Fig. SI2 (B) most of the correlations are beyond the threshold, while in Fig. SI2 (D) very few are. In Fig. SI2 (F), although some values in the upper left and lower right corners are beyond the threshold, they are much weaker than in Fig. SI2 (B).

We showed in Fig. 4 of the main paper that the gap between the top 15 ranks and the remaining ranks in the rank diversity plot after the outbreak is caused by the COVID-hashtags. To further understand the properties of the COVID-hashtags and how they influenced the HSL hashtag dynamics, we compared the highest rank and duration distributions of the different COVID-categories with those of the non-COVID hashtags before and after the outbreak. Fig. SI3 shows a detailed comparison of the highest rank and duration of the categorized Mainland China COVID-hashtags on the Weibo Hot Search List (HSL), before and after the COVID-19 outbreak. As Fig. SI3 (A) shows, most of the categories have a median highest rank close to 15. The Science and Bad News categories are generally ranked higher than the other categories. The median highest rank of the non-COVID hashtags after the outbreak is the same as that of the hashtags before the outbreak (rank 19), while the median highest rank of the COVID-hashtags is higher than both (rank 16). Fig. SI3 (B) shows the lifetime duration of the different categories. The median duration of most of the categories is less than 3.5 hours. The Science category has the longest duration among all categories. Non-COVID hashtags after the outbreak (3.95 hours) and hashtags before the outbreak (3.80 hours) have similar duration distributions. The COVID-hashtags generally have a shorter duration (3.21 hours) than the non-COVID hashtags.

In the main paper, we saw strange drops in the rank diversity plot at ranks 29 and 34 after the outbreak. This implies that the number of unique hashtags that occurred at these ranks in a given time interval is smaller than usual, so there should be hashtags that stay there for an unusually long time. Here we present examples of normal and abnormal hashtag rank trajectory plots, and verify that there are hashtags that stay at certain ranks, such as ranks 29 and 34, on the HSL for a strangely long time without any fluctuation. Table SI1. The abnormal rank plots are likely due to algorithmic intervention by Sina Weibo.
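Returning to the correlation analysis of Fig. SI2, the shuffling null model and the thresholding at Z can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind the paper: it uses Pearson correlations computed with NumPy, shuffles both series of a pair independently, and takes the standard deviation of the shuffled correlations as the noise level. The function names `null_correlation_std` and `threshold_correlations` and the synthetic Poisson counts are hypothetical.

```python
import numpy as np


def null_correlation_std(x, y, n_shuffles=500, seed=0):
    """Standard deviation of the Pearson correlation between time-shuffled copies of x and y."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    null = np.empty(n_shuffles)
    for i in range(n_shuffles):
        # Shuffling the time order destroys temporal co-movement but keeps the values.
        null[i] = np.corrcoef(rng.permutation(x), rng.permutation(y))[0, 1]
    return null.std()


def threshold_correlations(series, z):
    """Correlation matrix of the rows of `series`, with entries |C_ij| <= z replaced by NaN."""
    c = np.corrcoef(series)
    return np.where(np.abs(c) > z, c, np.nan)


# Synthetic stand-in for daily new-hashtag counts: 10 categories over 30 days (assumed sizes).
rng = np.random.default_rng(42)
series = rng.poisson(lam=5, size=(10, 30)).astype(float)
series[1] = series[0] + rng.poisson(lam=1, size=30)  # make one pair co-move on purpose

noise_level = null_correlation_std(series[2], series[3])
thresholded = threshold_correlations(series, z=0.2)
```

For uncorrelated series of length n the noise level is roughly 1/sqrt(n - 1), so series of a few weeks give values of the same order as the 0.16 to 0.2 range quoted above. The threshold passed to `threshold_correlations` is the uniform Z = 0.2 from the text.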
Novelty and collective attention
The dynamics of public attention: Agenda-setting theory meets big data
Dynamical Classes of Collective Attention in Twitter
Twitter-based analysis of the dynamics of collective attention to political parties
Model for twitter dynamics: Public attention and time series of tweeting
Interplay between public attention and public emotion toward multiple social issues on twitter
Pandemics in the age of twitter: Content analysis of tweets during the 2009 h1n1 outbreak
The use of twitter to track levels of disease activity and public concern in the U.S. during the influenza A H1N1 pandemic
Too far to care? measuring public attention and fear for ebola using twitter
COVID-19 media textual analysis. A dashboard for media monitoring
Understanding the perception of COVID-19 policies by mining a multilanguage Twitter dataset
An Introduction to Sina Weibo: Background and Status Quo
Weibo communication and government legitimacy in China: a computer-assisted analysis of Weibo messages on two 'mass incidents'. Information
Networked framing between source posts and their reposts: an analysis of public opinion on China's microblogs. Information
Characterizing the propagation of situational information in social media during COVID-19 epidemic: A case study on Weibo
Dynamics of ranking processes in complex systems
A new method for comparing rankings through complex networks: Model and analysis of competitiveness of major european soccer leagues
Generic temporal features of performance rankings in sports and games
Weibo Reports First Quarter 2020 Unaudited Financial Results
An Introduction to Sina Weibo for Journalists
Common Questions on the Rules of Real-time Hot-Search-List, Hot-Message-List and Hot-Topic-List
National Health Commission of People's Republic of China
China confirms 15152 new coronavirus cases, 254 additional deaths
China reports 99 new virus cases, majority imported
Classifying trending topics: A typology of conversation triggers on twitter
Twitter trending topic classification
Exploring Coronavirus Twitter Trends
COVID-19 stream

Thanks are due to Márton Karsai and Tiago Peixoto for suggestions.

Additional file 1. Supplementary information (PDF 2.4 MB)

The datasets supporting the conclusions of this article are available in the Attention Dynamics Sina Weibo COVID19 repository, https://github.com/cuihaosabrina/Attention_Dynamics_Sina_Weibo_COVID19

The authors declare that they have no competing interests.

Funding: JK acknowledges partial support from the H2020 project SoBigData++ (ID: 871042).

Authors' contributions: HC and JK conceived the idea and designed the study. HC carried out the data collection; HC and JK did the data analysis. Both authors drafted the manuscript, and read and approved the final manuscript.