key: cord-0010490-wyo08uqg
authors: nan
title: Information Diffusion on Social Media During Natural Disasters
date: 2018-01-11
journal: IEEE Trans Comput Soc Syst
DOI: 10.1109/tcss.2017.2786545
sha: 276249381f8998bcdee56166802344f2583ef0de
doc_id: 10490
cord_uid: wyo08uqg

Social media analytics has yielded new quantitative insights into human activity patterns. Many applications of social media analytics, from pandemic prediction to earthquake response, require an in-depth understanding of how these patterns change when humans encounter unfamiliar conditions. In this paper, we select two earthquakes in China as the social context on Sina-Weibo (or Weibo for short), the largest Chinese microblog site. After proposing a formalized Weibo information flow model to represent the spread of information on Weibo, we study this spread from three main perspectives: individual characteristics, the types of social relationships between interacting participants, and the topology of real interaction networks. The quantitative analyses lead to the following conclusions. First, the shadow of Dunbar’s number is evident in the “declared friends/followers” distributions, and the number of each participant’s friends/followers who also participated in the earthquake information dissemination shows a typical power-law distribution, indicating a rich-gets-richer phenomenon. Second, an individual’s number of followers is the most critical factor in user influence, and strangers are very important forces for disseminating real-time news after an earthquake. Third, the two types of real interaction networks share the scale-free and small-world properties, but with a looser organizational structure. In addition, correlations between different influence groups indicate that, compared with other online social media, the discussion on Weibo is mainly dominated and influenced by verified users.

about human activity patterns, such as influencer identification [2] [3] [4] [5] , network topology measurement [6] , [7] , trust analysis [8] [9] [10] , social hot spot tracing [11] [12] [13] , and the dynamics of information spread [14] [15] [16] . However, many applications, from pandemic prediction to earthquake response, require an understanding of how these patterns change when humans encounter unfamiliar conditions [17] , [18] . Especially for China, which suffers from frequent natural disasters, understanding how the behaviors of hundreds of millions of Web users change is very important. The empirical study of the human flesh search (HFS) [19] , [20] , for one, provided quantitative insights into these collective responses of Web users in China. Inspired by previous research on Web users' collective responses, we choose two empirical cases in China: the Yi'liang (2012) and Ya'an (2013) earthquakes. What makes these cases additionally useful is that many densely populated areas in mainland China, such as Sichuan and Fujian, are earthquake-prone and have often suffered severe damage from earthquakes.

At present, there are two main types of studies on earthquake topics: detecting seismic waves and enhancing rescue efforts. The former focuses on how to improve the accuracy of earthquake magnitude forecasting or to issue warnings as early as possible [21] [22] [23] [24] , as in the Did You Feel It system. The latter studies [25] [26] [27] explore ways to cope with earthquake relief and postearthquake reconstruction and to improve the mental health status of rescuers. This paper focuses on the latter effort from a social network perspective, with a particular focus on information diffusion and social networking behaviors.

In social network studies, the above-mentioned focus [25] [26] [27] can be seen as a nonlinear superposition of a multitude of social interaction networks, where nodes represent individuals and edges capture a variety of different social relations. However, after further study, a group of researchers represented by Huberman et al. [28] found that social interactions within Twitter cannot be inferred directly from a declared relationship set of friends and followers: many users interact with few other people in their declared relationship network [28] [29] [30] [31] . The key problem is that the structure of the underlying interaction network is not visible and must be inferred from the flow of information between individuals, which poses a serious challenge to our efforts to understand how the structure of the network affects social dynamics and the spread of information [32] . Take the tweet reposting connections shown in Fig. 1 as an example of an underlying interaction network: 1) user-D has four followers, B, C, F, and E, but only C and F repost the message created by D; 2) user-F reposts the tweet created by user-D through the intermediary user-C; 3) user-C participates more than once; and 4) the tweet created by user-D is also reposted by user-H, who has neither a friend nor a follower relationship with user-D.

There are two ways of describing these features in previous studies. The conventional method distinguishes whether users interact with each other by adding extra attributes to the nodes or edges of declared relationship networks [17] , [18] , [33] . Fig. 1(c) is a typical application of this method, which attempts to describe Fig. 1(a) and (b) together. However, it cannot depict features III and IV. The other solution is to describe all kinds of user relationships or interactions using a multilayer/multirelational network [34] , [35] . Likewise, this method cannot express features II and III explicitly, and models of this type lack the parallel development of specific analysis methods to exploit the information hidden between the layers.

In response to the above-mentioned problem, we propose a formalized definition of the Weibo information flow (WIF) and apply it to the empirical analysis of a Sina-Weibo data set on the topic of earthquakes. The goal of our work is to provide a way to extract the underlying hidden social network that more clearly represents the information cascade and the actual interactions among users after an earthquake. We then address the following questions about the hidden social networks behind the online information flow.

1) What languages do participants use? What are the demographics of participant users? Are there regional features? What are the differences before and after the earthquake?
2) What are the common features of influential users? What are the key factors in user influence?
3) When a user reposts, does he/she repost any tweet that attracts him/her, or only tweets posted by his/her friends? Also, for users whose messages are reposted, does the reposting take place solely among their followers? Are there any differences in information diffusion patterns before and after the earthquake?
4) Compared with early Web2.0 online communications, does the Sina-Weibo platform have unique features that include social and information dissemination functions?
5) Are there any correlations between the type of relationship between interacting users and the topological features of user networks?

The organization of this paper is as follows. Section II introduces the data set and gives a formal definition of the WIF model. Section III presents the main body of this paper and consists of three subsections. Section III-A includes the empirical results for eight individual attributes: the configured language of a personal page, gender, location, the number of followers, the number of friends, the number of posted tweets, the number of times being reposted, and the correlation among three user rankings. Section III-B analyzes the proportion of each type of user relationship between interacting users and discusses the differences in the reposting patterns of each interacting user group by user relationship. Section III-C uses social network analysis to unveil the topological properties of two types of underlying interaction networks, which are extracted by the WIF model, and analyzes the correlation between the influential user groups of the two networks. Section IV closes this paper with remarks on future work.

The Yi'liang earthquake struck on September 7, 2012, and the Ya'an earthquake struck on April 20, 2013. Occurring in China within a short period of each other, these two natural disasters sparked discussions among Sina-Weibo users. We collected the tweet repost data related to the two disasters through the Sina-Weibo public application programming interface (API). Registered users of the API can obtain the relevant permissions as long as their identities are verified by the Sina-Weibo user authentication. During the empirical analyses, we used the MySQL database management system for data extraction and cleansing and the Cytoscape toolkit to analyze the network topology [36] . All of the figures are plotted with MATLAB.

As presented in Table I , the data set consists of two parts, which provide the contrast between before and after an earthquake. Yi'liang county in China was hit by a 5.7-magnitude earthquake on September 7, 2012; the part titled After Yi'liang Earthquake contains the earthquake-related information generated on Sina-Weibo during the seven months after the earthquake. The part titled Before Ya'an Earthquake contains the daily information exchanged on Sina-Weibo regarding Ya'an during the week before April 20, 2013, when a 7.0-magnitude earthquake occurred. The natural object of comparison would be a "before Yi'liang earthquake" data set, but the amount of relevant data is too small for empirical analysis.

Each original tweet record consists of five parts: the original tweet, the original user, the repost tweets and their corresponding users, the repost tree, and the relationship between users. Fig. 2 shows how the total number of these tweets and retweets varies over time, with five high peaks in the After Yi'liang Earthquake data set. These peaks may be related to sensitive events taking place during those periods. The following social events might offer some clues.

To describe and retrieve information spread on Weibo precisely, a formal definition of WIF is proposed as the following quadruple:

$$\mathrm{WIF} = \langle \mathrm{WUS}, \mathrm{TS}, \mathrm{RRS}, \mathrm{TRTS} \rangle$$

where WUS is the Weibo user set consisting of participant Weibo users, TS is the tweet set consisting of original tweets and retweets, RRS is the repost relationship set consisting of the citing relationships between tweets, and TRTS is the tweet repost tree set consisting of the repost trees of each original tweet.

For each Weibo user WU ∈ WUS, gender = 0/1, where "0" denotes male and "1" denotes female. pNum ∈ ProvincesNumList denotes the city where the current user lives (see the Sina-Weibo city code list). lang ∈ {zh-cn, zh-tw, zh-hk, en} represents the language that the current user has configured on his/her Weibo page, where "zh-cn" denotes the simplified Chinese characters used in Mainland China, "zh-tw" and "zh-hk" denote the traditional Chinese characters used in Taiwan and Hong Kong, respectively, and "en" denotes English. followerSet is the set of Weibo users who follow the current user. friendSet is the set of Weibo users whom the current user follows. For the followerSet/friendSet of each participant, the actual data set mentioned in Table I records only a number, i.e., the number of his/her followers/friends. For those followers/friends who also participated in the earthquake information dissemination by posting/reposting Sina-Weibo messages, however, there are detailed records in the data set. sCount is the number of tweets posted by the current user.

Each tweet T ∈ TS is a six-tuple $T = \langle flag, user, repostCount, time, text, ot \rangle$, where flag = 0/1 ("0" indicates that the current tweet is an original tweet and "1" indicates that it is a retweet), user ∈ WUS is the user who posted the current tweet, repostCount ∈ N is the number of times the current tweet has been reposted, time is the time when the current tweet was posted, text is the content body of the current tweet, and ot ∈ TS denotes the original tweet of the current tweet when flag = 1; otherwise, it is null.

where $st \in \{T \mid (T \in TS) \wedge (T.repostCount > 0)\}$ denotes the tweet that was reposted. The multilayer reposting phenomenon occurs regularly: for example, user-A posts the ot first, then user-B posts $rt_B$ by reposting ot, and $rt_B$ is reposted by others, too. Therefore, the st is either an original tweet or a retweet.

Fig. 3 (a) shows three types of user relationships among Weibo users, where a directed edge indicates that the starting user follows the target user. Here, we refer to a pair of users linked by bidirectional edges as bi-friends if they both follow each other, and we refer to a pair of users as strangers if there are no directed edges between them. In this paper, it is assumed that if user-B reposts a tweet posted by user-A, then the tweet information flows from user-A to user-B.
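The relationship types just described can be illustrated with a minimal Python sketch. It assumes that the declared follow relation is available as a set of (follower, followee) pairs; the function name `relationship` and the sample users are hypothetical and are not part of the data set.

```python
def relationship(a, b, follows):
    """Classify the declared relationship between users a and b.

    `follows` is a set of (x, y) pairs meaning "x follows y", i.e. a
    directed edge from the starting user x to the target user y.
    """
    a_follows_b = (a, b) in follows
    b_follows_a = (b, a) in follows
    if a_follows_b and b_follows_a:
        return "bi-friend"      # bidirectional edge
    if a_follows_b:
        return "a follows b"    # b is a friend of a; a is a follower of b
    if b_follows_a:
        return "b follows a"    # a is a friend of b; b is a follower of a
    return "stranger"           # no directed edge in either direction


# Hypothetical follow edges, for illustration only.
follows = {("B", "A"), ("A", "C"), ("C", "A")}
for pair in [("A", "B"), ("A", "C"), ("B", "C")]:
    print(pair, relationship(pair[0], pair[1], follows))
```

Keeping the two one-directional cases distinct matters here because the direction of the follow edge, combined with the direction of information flow, is what separates the types of repost relationship.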

Combining the direction of information flow between two users with their relationship type yields five types of RRs [see Fig. 3]. The following equations are the symbolic descriptions of the five types of RRs:

$TRT = \langle OT, RTS, RRS^*, size \rangle$, where $OT \in \{T \mid (T \in TS) \wedge (T.flag = 0)\}$ and $RRS^* \subseteq \{RR \mid (RR \in RRS) \wedge (RR.RT \in RRS^*)\}$.

This subsection focuses on the following questions. How are users' languages distributed? What are the demographics of participant users? Are there regional features? What are the differences before and after the earthquake? What are the typical features of influential users?

1) Basic Analysis: The demographics, language, and geographic distribution of online users are important measurement indicators in application fields such as disaster relief and disease surveillance and control [21] , [37] , [38] . On Sina-Weibo, a spurt of earthquake-related tweets emerged as soon as the earthquake occurred. Where did these active users come from? What are their social backgrounds? Our measurement of WU.gender, WU.lang, and WU.pNum shows the following: 1) there is a roughly equal proportion of males and females among participant users; 2) users living in the Chinese Mainland, Hong Kong, Macao, and Taiwan are the force that made the number of tweets peak almost instantaneously, which mainly reflects the regional nature of Sina-Weibo; and 3) a few foreign users participated by posting or reposting related tweets, but nearly all of them live in China.

Fig. 4 shows the geographic distribution of participant users before and after the earthquake. For the aftermath of the Yi'liang earthquake, Fig. 4 (b) shows that Beijing, Shanghai, and Guangdong topped the ranking by number of participant users, and that the proportion of active users located in these three areas is 27.83%; these three areas are the political center, the financial center, and the province with the largest economy, respectively. Only 3.5% of active users are located in the local province (Yunnan). Unfortunately, for the period before the earthquake, there are too few data to analyze, probably because Yi'liang is a very remote county in Yunnan province. Therefore, we replaced it with Ya'an, which is a popular tourist destination (the same substitution is used in later comparisons). We then found in Fig. 4 (a) that the local province (Sichuan) ranked first in the number of active users posting on common Ya'an topics, with a share of 26.33%, while the combined share of Beijing, Shanghai, and Guangdong still reached 29.25%.

For participants after the Yi'liang earthquake, Fig. 5 shows the distributions of four individual attributes: the number of friends (WU.frCount), the number of followers (WU.foCount), and the numbers of followers/friends who also participated in the earthquake information dissemination by posting/reposting Sina-Weibo messages (WU.sCount-P and WU.repostsCount-P), which are counted by the following equations:

$$WU.frCount\text{-}p(wu) = |\{u \mid u \in wu.friendSet \wedge u \in WUS\}|$$

$$WU.foCount\text{-}p(wu) = |\{u \mid u \in wu.followerSet \wedge u \in WUS\}|$$

The red fitting lines use the general power model $frequency(X = x) = c \cdot x^{-\tau}$ with 95% confidence bounds, where $c, \tau \in \mathbb{R}^+$ and $x \in \mathbb{N}^+$. Fig. 5 shows the distributions of four individual attributes of participant users. The sudden spikes in Fig. 5(a)-(c) are very obvious. However, we believe that these spikes have nothing to do with social networks but result from the constraints of user privileges on Sina-Weibo. For example, the upper limit of WU.frCount has three levels: 2000, 2500, and 3000 for nonmembers, ordinary members, and VIP members, respectively (see http://vip.weibo.com/privilege).
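As an illustration of the power model used for the fitting lines, the following Python sketch estimates c and τ by ordinary least squares on log-log axes. This is a simplified stand-in, not the MATLAB fitting procedure with confidence bounds used for the figures; the function name `fit_power_law` and the synthetic data are assumptions made for the example.

```python
import math


def fit_power_law(xs, freqs):
    """Estimate (c, tau) in freq = c * x**(-tau) by least squares in log-log space.

    Points with a non-positive x or frequency are skipped because their
    logarithm is undefined.
    """
    pts = [(math.log(x), math.log(f)) for x, f in zip(xs, freqs) if x > 0 and f > 0]
    n = len(pts)
    mean_x = sum(px for px, _ in pts) / n
    mean_y = sum(py for _, py in pts) / n
    sxx = sum((px - mean_x) ** 2 for px, _ in pts)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in pts)
    slope = sxy / sxx                      # slope of the fitted line equals -tau
    intercept = mean_y - slope * mean_x    # intercept equals log(c)
    return math.exp(intercept), -slope


# Synthetic, noise-free data generated from c = 5000 and tau = 2.2.
xs = list(range(1, 201))
freqs = [5000.0 * x ** -2.2 for x in xs]
c_hat, tau_hat = fit_power_law(xs, freqs)
print(c_hat, tau_hat)
```

On real count data, a log-log regression is sensitive to the noisy tail, so the estimated exponent should be read as indicative only.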

Another common feature found in Fig. 5 is the segmented shape of the distributions. This is true in Fig. 5(a) and (b) , where the sudden changes near 130 are consistent with Dunbar's number. The underlying argument is that human communities are much larger than those of other primates and hence require more time to be devoted to social maintenance activities. However, there is an upper limit on the amount of time that can be dedicated to social demands, which sets an upper limit on social group size [41] , [42] . It is known that humans have the cognitive capacity to maintain about 150 stable social relationships. With the successive emergence of different types of large social networking services, the topic has once again come up for discussion [43] , [44] . Many researchers have investigated through empirical studies how tools such as Facebook and Twitter have changed our capacity to handle social connections [45] [46] [47] . Here, Fig. 5(a) and (b) also shows the shadow of Dunbar's number on Sina-Weibo.

In addition, compared with the distribution of broadly defined "friends/followers" in Fig. 5(a) and (b) , which is influenced by the Sina-Weibo user privileges, the number of each participant's "actual friends/followers" in Fig. 5(c) and (d) shows a typical power-law distribution. From this, we can see that unevenness is a universal phenomenon of social networks, even when a social network service imposes limiting conditions, such as the constraints of user privileges on Sina-Weibo described earlier.

2) User Influence: The number of relevant tweets is the most immediate indication of the popularity of posts in the Weibo space. The number of times a user is reposted offers an intuitive view of the original poster's influence on public opinion. For popularity, engagement/productivity, and influence [5] , [40] , [48] , three types of user rankings are made by the following $R_T$, $R_F$, and $R_{RT}$:

For the top 20 users of each ranking, the comparison shows that entertainers form the most popular group and that most users who post tweets actively come from the grassroots, but nearly all users whose tweets regularly receive widespread reposting are news/charity organizations or entertainers. The rest of this subsection quantifies the correlations between the three rankings by the generalized-τ model [49] :

where $R_1$ and $R_2$ are two ordered rankings of equal length $k$, and $K_{r_1,r_2}(R_1, R_2) = 1$ if 1) $r_1, r_2 \notin R_1 \cap R_2$, with $r_1$ appearing in only one ranking and $r_2$ in the other ranking; 2) $r_1$ is ranked higher than $r_2$ and only $r_2$ appears in the other ranking; or 3) $r_1$ and $r_2$ are in both rankings in the opposite order. Otherwise, $K_{r_1,r_2}(R_1, R_2) = 0$. In particular, $K_{r_1,r_2}(R_1, R_2) = K_{r_2,r_1}(R_1, R_2)$. In addition, the normalized distance $K$ is used, computed as follows [50]:

The range of $K(R_1, R_2)$ is from 0 to 1. $K(R_1, R_2) = 0$ means complete disagreement, and $K(R_1, R_2) = 1$ means complete agreement. Fig. 6 shows that as $k$ increases, the distance between any two rankings settles at a relatively stable level, but the three stable levels vary widely.

1) The strongest association, between $R_F$ and $R_{RT}$, allows us to infer that a huge fan base may indeed be transformed into real influence on the public.
2) The weak but stable association between $R_T$ and $R_{RT}$ lets us deduce that active participation may indeed help to enhance user influence, but only to a very limited extent.
3) There is little or no association between $R_T$ and $R_F$, so it can be inferred that active grassroots participants seldom overlap with celebrities.

Therefore, it can be seen that Weibo is a place shared by the general public and celebrities, which makes Weibo an excellent example of grassroots media. The rare overlap makes the metamorphosis from grassroots activist to celebrity difficult. It is true that close contact with celebrities can make overnight success possible, though we have not found the principles behind this [51] .

Fig. 7. Three main tweet push services on Sina-Weibo. (a) For any two users, User 1 and User 2, if User 2 follows User 1, then User 1 is the friend of User 2 and User 2 is the follower of User 1; in addition, tweets posted by User 1 are shown to User 2 in real time. (b) All tweets beginning with the string "@User 2" are shown to User 2. (c) Every user can read the "hot messages" that Sina-Weibo picks out from all tweets of a certain period by their popularity.
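The pairwise penalty rules stated for the generalized-τ model can be sketched in Python as follows. The three penalty cases follow the rules as worded in the text. The normalization K = 1 - (total penalty) / k² is an assumption, chosen so that K = 1 means complete agreement and K = 0 complete disagreement as described, because the normalizing formula itself is not reproduced here. The function names are hypothetical.

```python
from itertools import combinations


def _one_missing(pos_full, pos_other, a, b):
    """Both items are in one ranking; exactly one of them is in the other."""
    present, absent = (a, b) if a in pos_other else (b, a)
    # Rule 2: penalize when the item missing from the other ranking is
    # ranked higher (smaller index) than the item that does appear there.
    return 1 if pos_full[absent] < pos_full[present] else 0


def pair_penalties(r1, r2):
    """Sum the 0/1 penalties over all unordered pairs of ranked items."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    total = 0
    for a, b in combinations(sorted(set(pos1) | set(pos2)), 2):
        both1 = a in pos1 and b in pos1
        both2 = a in pos2 and b in pos2
        if both1 and both2:
            # Rule 3: both items are in both rankings, in opposite order.
            if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0:
                total += 1
        elif both1 and (a in pos2 or b in pos2):
            total += _one_missing(pos1, pos2, a, b)
        elif both2 and (a in pos1 or b in pos1):
            total += _one_missing(pos2, pos1, a, b)
        elif not both1 and not both2:
            # Rule 1: one item is only in r1 and the other is only in r2.
            total += 1
        # Remaining case (both in one ranking, neither in the other): no penalty.
    return total


def normalized_k(r1, r2):
    """Assumed normalization: 1 means identical, 0 means disjoint rankings."""
    k = len(r1)
    if len(r2) != k:
        raise ValueError("rankings must have equal length")
    return 1 - pair_penalties(r1, r2) / k ** 2


print(normalized_k(["a", "b", "c"], ["a", "b", "c"]))
print(normalized_k(["a", "b", "c"], ["d", "e", "f"]))
print(normalized_k(["a", "b", "c"], ["c", "b", "a"]))
```

Under this assumed normalization, two disjoint top-k lists incur a penalty for each of the k² cross pairs, which is what drives K to exactly 0.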

From a sociological perspective, friendship is a trust relationship built through repeated games, and the quality of online social relationships has been receiving increased attention [30] , [31] , [52] . For our earthquake discussion case, Fig. 7 lists the three main information push services on Sina-Weibo. This section focuses on the following questions. Who actually reposts the tweets that the original user has posted? When a user reposts, does he/she repost tweets that he/she is interested in, or only tweets posted by his/her friends? Also, for users who get reposted, does the reposting take place only among their followers? Are there any differences in information diffusion patterns before and after the earthquake?

1) Structure of Weibo Information Flow: For the five types of RR in Fig. 3 , the butterfly-shaped structure in Fig. 8 shows their shares in the total amount of RRs before and after the earthquake, respectively, computed as follows:

where $RRS_i = \{RR \mid RR \in RRS \wedge RR.type = i\}$, $i \in \{I, II, III, IV, V\}$.
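A minimal Python sketch of this share computation is given below. It assumes that the share of type i is the size of $RRS_i$ divided by the size of RRS, which is how the text describes it; the function name `rr_type_shares` and the sample type labels are hypothetical.

```python
from collections import Counter

RR_TYPES = ("I", "II", "III", "IV", "V")


def rr_type_shares(rr_types):
    """Return the share of each RR type in the total number of RRs.

    `rr_types` holds one type label per repost relationship.
    """
    counts = Counter(rr_types)
    total = len(rr_types)
    return {t: counts[t] / total for t in RR_TYPES}


# Hypothetical type labels for eight repost relationships.
sample = ["III", "IV", "III", "II", "IV", "III", "I", "V"]
shares = rr_type_shares(sample)
for rr_type in RR_TYPES:
    print(rr_type, f"{100 * shares[rr_type]:.2f}%")
```

Because every RR has exactly one type, the five shares always sum to 1, which is a useful sanity check on reported percentages.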

For the WIF structure before the Ya'an earthquake, Fig. 8 (a) shows the following. First, $RR_{type=III}$ from $WU_0$ to $WU_3$ has the largest share of all RRs, 43.04%, which benefits from the push service in Fig. 7(a) . Second, there is no declared relationship between $WU_0$ and $WU_4$, but the share of $RR_{type=IV}$ between them reaches 37.23%, which may be largely due to the push service in Fig. 7(c) . It can also be seen that, compared with user popularity, the content of tweets is equally important for their popularity. Third, the strongest declared friendship exists between $WU_0$ and $WU_2$, and the share of $RR_{type=II}$ between them is 17.85%, which can be considered a mirror of the contact pattern among friends in real life. Fourth, the share of $RR_{type=V}$ is 1.438%; unfortunately, we found no logical explanation for these behaviors. Fifth, $RR_{type=I}$ from $WU_0$ to $WU_1$ has the smallest share, 0.4420%. Most users did not pay attention to tweets posted by their followers at all. There is no information push service from followers to friends, but in the real world, too, there are only a limited number of efficient ways to attract the attention of celebrities. So this is what we can see of the real world through the lens of online media.

Fig. 8(b) shows the WIF structure after the Yi'liang earthquake, and Table II lists the comparison of these two butterfly maps. We can see that the proportion of $RR_{type=IV}$ from $WU_0$ to $WU_4$ rises remarkably, growing by more than half.

Fig. 8. Structure of Sina-Weibo information spread before and after the earthquake, respectively. Notes: the five parts marked I-V correspond to the five types of RRs in Fig. 3 , and the percentages are their shares in the total amount of all RRs. (a) Week before the Ya'an earthquake. (b) Seven months after the Yi'liang earthquake.

In this subsection, the following three conclusions can be drawn. First, Section III-A has already shown that very few Weibo-big-Vs are able to attract wide attention; in other words, an overwhelming majority of Weibo users play the role of followers. Given the dramatic contrast between the proportions of $RR_{type=I}$ and $RR_{type=III}$, Sina-Weibo seems like a natural platform for worshipping idols. Second, given the dramatic contrast between the proportion of $RR_{type=II}$ and those of $RR_{type=III}$ and $RR_{type=IV}$, Weibo users pay more attention to receiving interesting information than to making friends. This may be what makes Weibo different from other online social platforms. Third, followers are indeed the major driving force for the spread of information on Weibo, but at the same time, reposts from strangers are equally crucial. Particularly in public emergencies, strangers are likely to be in the vanguard of discussing and distributing news in real time. So, from the point of view of information distribution, in addition to a user's follower count, it is also important to examine whether the tweet is selected as a hot message to be pushed to all other users (see http://hot.weibo.com/).

This subsection presents the distributions of individual reposting times in each type of RRS for the WIF structure after the Yi'liang earthquake [ Fig. 8(b) ]. In Fig. 9(f) , the statistical sample set is the set of all RRs. A point $(k, p_k)$ in the scatter plot means that the users whose tweets have been reposted $k$ times by others make up $p_k$ percent of all participating users, and the mapping between $k$ and $p_k$ is given by (11), where count(wu) = 1 if the following equation is true and count(wu) = 0 otherwise:

The five statistical sample sets in Fig. 9(a)-(e) are $RRS_V$, $RRS_{IV}$, $RRS_{III}$, $RRS_{II}$, and $RRS_I$, respectively. So, a point $(k, p_k)$ in these five scatter plots means that there is the following mapping between $k$ and $p_k$:

where $i \in \{I, II, III, IV, V\}$, and count(wu) = 1 if the following equation is true and count(wu) = 0 otherwise:
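The mapping between k and p_k can be sketched in Python as follows. Since the defining equations are not reproduced here, the sketch assumes the reading given in the text: p_k is the percentage of users in the sample whose tweets were reposted exactly k times. The function name `repost_count_distribution` and the sample counts are hypothetical.

```python
from collections import Counter


def repost_count_distribution(times_reposted):
    """Map each k to p_k, the percentage of users reposted exactly k times.

    `times_reposted` maps a user id to the number of times that user's
    tweets were reposted within the chosen sample set (all RRs, or the
    RRs of a single type).
    """
    n_users = len(times_reposted)
    users_per_k = Counter(times_reposted.values())
    return {k: 100.0 * n / n_users for k, n in sorted(users_per_k.items())}


# Hypothetical per-user repost counts.
times_reposted = {"u1": 1, "u2": 1, "u3": 2, "u4": 5}
distribution = repost_count_distribution(times_reposted)
print(distribution)
```

Restricting `times_reposted` to the repost relationships of one type gives the per-type variant; only the sample set changes, not the counting rule.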

All six distributions in Fig. 9 follow a power law. This is consistent with the distributions of the number of times being cited and of citing others in an empirical study of HFS. However, the three similar power-law slope values (τ = 1.68, 1.75, and 1.84) in the HFS study are comparable only to the distribution of $RR_{type=IV}$ (τ = 1.757) and are far below those of the other five distributions (τ = 3.127, 2.225, 2.677, 3.882, and 2.138). The nature underlying a power-law distribution is unevenness, and the higher the slope value is, the more severe the imbalance becomes. In the situation here, this means that the users participating in $RR_{type=IV}$ maintain the same interaction pattern as users of ordinary online social forums. However, there might be a serious structural imbalance in the interactions of $RR_{type=I}$, $RR_{type=II}$, $RR_{type=III}$, and $RR_{type=V}$, which may be the reason why Weibo is so different, churning out a string of "Weibo-big-Vs," i.e., Weibo users with mass followings whose identities have been verified by Sina. These features have to do with the social role partitioning [ Fig. 3(a) ] and the role-based push services (Fig. 7) on Sina-Weibo.

Research by Huberman et al. [28] found that the driver of Twitter usage is a sparse and hidden network of connections underlying the "declared" set of friends and followers [53] ; that is, in terms of network density, there is a gap between the real interaction network and the declared user network. Additionally, other researchers have found that social interaction exists among various types of users and extends far beyond acquaintances [54] . From the perspective of network topology, beyond the network density, are there any differences in other topological properties? In addition, what are the key factors behind user influence? To answer these questions, we first introduce a method to extract the hidden interaction network from the friends and followers network on the basis of the WIF model. Afterward, we present the measurement results and provide the corresponding explanations.

The TRT structure is the key to extracting real interaction networks. For the 3096 TRTs in the WIF after the Yi'liang earthquake, Fig. 10(a) shows their size distribution, where the largest TRT consists of 102 526 retweets, and Fig. 10(b) shows the depth distribution of all RRs, where the maximum value is 85.

According to the definition of WIF presented in Section II-B, the symbolic representation of the WIF instance in Fig. 11 is listed as follows:

where $WUS = \{wu_1, wu_2, wu_3, wu_4, wu_5, wu_6, wu_7, wu_8, wu_9, wu_{10}, wu_{11}, wu_{12}\}$.

$TS = \{ot_1, ot_2, ot_3, rt_1, rt_2, rt_3, rt_4, rt_5, rt_6, rt_7, rt_8, rt_9, rt_{10}, rt_{11}, rt_{12}, rt_{13}\}$, and
$ot_1 = \langle 0, wu_5, 0, -, -, null \rangle$, $ot_2 = \langle 0, wu_9, 3, -, -, null \rangle$, $ot_3 = \langle 0, wu_1, 10, -, -, null \rangle$,
$rt_1 = \langle 1, wu_{10}, 1, -, -, ot_2 \rangle$, $rt_2 = \langle 1, wu_4, 0, -, -, ot_2 \rangle$, $rt_3 = \langle 1, wu_{11}, 0, -, -, ot_2 \rangle$,
$rt_4 = \langle 1, wu_2, 2, -, -, ot_3 \rangle$, $rt_5 = \langle 1, wu_2, 5, -, -, ot_3 \rangle$, $rt_6 = \langle 1, wu_{12}, 0, -, -, ot_3 \rangle$,
$rt_7 = \langle 1, wu_3, 1, -, -, ot_3 \rangle$, $rt_8 = \langle 1, wu_6, 1, -, -, ot_3 \rangle$, $rt_9 = \langle 1, wu_7, 2, -, -, ot_3 \rangle$,
$rt_{10} = \langle 1, wu_7, 0, -, -, ot_3 \rangle$, $rt_{11} = \langle 1, wu_6, 0, -, -, ot_3 \rangle$, $rt_{12} = \langle 1, wu_8, 0, -, -, ot_3 \rangle$, $rt_{13} = \langle 1, wu_3, 0, -, -, ot_3 \rangle$.

$RRS = \{rr_1, rr_2, rr_3, rr_4, rr_5, rr_6, rr_7, rr_8, rr_9, rr_{10}, rr_{11}, rr_{12}, rr_{13}\}$, and
$rr_1 = \langle ot_2, rt_1, 1, 1 \rangle$, $rr_2 = \langle ot_2, rt_2, 1, 4 \rangle$, $rr_3 = \langle rt_1, rt_3, 2, 4 \rangle$,
$rr_4 = \langle ot_3, rt_4, 1, 3 \rangle$, $rr_5 = \langle ot_3, rt_5, 1, 3 \rangle$, $rr_6 = \langle ot_3, rt_6, 1, 4 \rangle$,
$rr_7 = \langle rt_4, rt_7, 2, 3 \rangle$, $rr_8 = \langle rt_5, rt_8, 2, 2 \rangle$, $rr_9 = \langle rt_5, rt_9, 2, 4 \rangle$,
$rr_{10} = \langle rt_7, rt_{10}, 3, 4 \rangle$, $rr_{11} = \langle rt_8, rt_{11}, 3, 5 \rangle$, $rr_{12} = \langle rt_9, rt_{12}, 3, 4 \rangle$, $rr_{13} = \langle rt_9, rt_{13}, 3, 4 \rangle$.

$TRTS = \{trt_1, trt_2, trt_3\}$, and
$trt_1 = \langle ot_1, null, null, 1 \rangle$,
$trt_2 = \langle ot_2, \{rt_1, rt_2, rt_3\}, \{rr_1, rr_2, rr_3\}, 4 \rangle$,
$trt_3 = \langle ot_3, \{rt_4, rt_5, rt_6, rt_7, rt_8, rt_9, rt_{10}, rt_{11}, rt_{12}, rt_{13}\}, \{rr_4, rr_5, rr_6, rr_7, rr_8, rr_9, rr_{10}, rr_{11}, rr_{12}, rr_{13}\}, 11 \rangle$.
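The internal consistency of this instance can be checked with a short Python sketch. It encodes the repost relationships and repostCount values listed for the instance and rebuilds the repost trees from them. Two readings are assumptions made for the check: that the third field of each rr is the depth of the repost in its tree, and that repostCount counts all direct and indirect reposts of a tweet, which is what the listed numbers appear to follow. The variable and function names are hypothetical.

```python
from collections import defaultdict

# (source tweet, retweet, depth, type) for rr_1 .. rr_13 of the instance.
RRS = [
    ("ot2", "rt1", 1, 1), ("ot2", "rt2", 1, 4), ("rt1", "rt3", 2, 4),
    ("ot3", "rt4", 1, 3), ("ot3", "rt5", 1, 3), ("ot3", "rt6", 1, 4),
    ("rt4", "rt7", 2, 3), ("rt5", "rt8", 2, 2), ("rt5", "rt9", 2, 4),
    ("rt7", "rt10", 3, 4), ("rt8", "rt11", 3, 5), ("rt9", "rt12", 3, 4),
    ("rt9", "rt13", 3, 4),
]

# repostCount field of every tweet in TS, as listed in the instance.
REPOST_COUNT = {
    "ot1": 0, "ot2": 3, "ot3": 10,
    "rt1": 1, "rt2": 0, "rt3": 0, "rt4": 2, "rt5": 5, "rt6": 0, "rt7": 1,
    "rt8": 1, "rt9": 2, "rt10": 0, "rt11": 0, "rt12": 0, "rt13": 0,
}

# Build the repost trees: each source tweet points to its direct reposts.
children = defaultdict(list)
for source, retweet, _depth, _rr_type in RRS:
    children[source].append(retweet)


def descendants(tweet):
    """Number of direct and indirect reposts below `tweet`."""
    return sum(1 + descendants(child) for child in children[tweet])


# Tweets whose listed repostCount differs from the rebuilt tree.
mismatches = [t for t, n in REPOST_COUNT.items() if descendants(t) != n]
# Tree size = the original tweet itself plus all of its reposts.
tree_sizes = {ot: 1 + descendants(ot) for ot in ("ot1", "ot2", "ot3")}
print(mismatches, tree_sizes)
```

The rebuilt tree sizes can be compared directly with the size field of the three TRTs in the instance.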

The "declared" online social network is still the common empirical object of existing empirical studies, where nodes represent users and an edge between nodes indicates a declared social relationship [6] [7] [8] , [39] , [48] . Given the prevalence of the Internet water army, this paper is only interested in how participants collaborated with each other and in their relationship types. In addition, we found that more than a third of the retweets in TRTS link only to the original tweet without any citations relating to other retweets, such as $rt_2$ and $rt_6$ in Fig. 11(a) . We denote these types of retweets as casual nodes and the corresponding participants as casual participants. Although casual nodes help spread information (the total number of reposts is an important factor in tweet rank), those nodes did not contribute to the actual collaboration activities. Therefore, we excluded casual nodes and analyzed the remaining repost behavior, which involved a total of 249 237 retweets.

Fig. 12 (a) shows one type of real interaction network extracted from the WIF instance in Fig. 11 , which is named the friendship-based reposting cooperation network (FRCN). The cartoon figures represent distinct participants, and the directed edges indicate the type of social relationship between them [corresponding to the black directed edges in Fig. 3(a) ]. The extraction process needs to visit all directed edges in Fig. 11(b) , retaining only the edges, and the participants linked by those edges, that match the following condition. That is, if there is a pair of users $(wu_i, wu_j)$ linked by edges in FRCN, then in Fig. 11(a) there is at least one RR that makes the following formula true:

Another type of real interaction network is shown in Fig. 12 (b); it is named the stranger reposting cooperation network (SRCN), where the cartoon figures represent distinct participants and the directed edges indicate the information flow between them (corresponding to the blue directed edges in Fig. 3) . The extraction process needs to visit all RRs where type = IV in Fig. 11(b) . If there is at least one RR matching the following condition, then we add the corresponding pair of users $(wu_k, wu_l)$ and one directed edge from $wu_k$ to $wu_l$ into SRCN:
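A possible reading of the SRCN extraction is sketched below in Python, using the WIF instance of Fig. 11. The matching condition is not reproduced here, so the sketch assumes the following: casual retweets (reposts of an original tweet that are not reposted themselves) are dropped first, the type value 4 encodes type IV, and each remaining type-IV RR contributes a directed edge from the author of the reposted tweet to the author of the retweet. The resulting edge set is illustrative and is not claimed to match Fig. 12(b).

```python
# Author of every tweet in the instance (tweet id -> user id).
AUTHOR = {
    "ot1": "wu5", "ot2": "wu9", "ot3": "wu1",
    "rt1": "wu10", "rt2": "wu4", "rt3": "wu11", "rt4": "wu2", "rt5": "wu2",
    "rt6": "wu12", "rt7": "wu3", "rt8": "wu6", "rt9": "wu7", "rt10": "wu7",
    "rt11": "wu6", "rt12": "wu8", "rt13": "wu3",
}

# (source tweet, retweet, depth, type) for rr_1 .. rr_13 of the instance.
RRS = [
    ("ot2", "rt1", 1, 1), ("ot2", "rt2", 1, 4), ("rt1", "rt3", 2, 4),
    ("ot3", "rt4", 1, 3), ("ot3", "rt5", 1, 3), ("ot3", "rt6", 1, 4),
    ("rt4", "rt7", 2, 3), ("rt5", "rt8", 2, 2), ("rt5", "rt9", 2, 4),
    ("rt7", "rt10", 3, 4), ("rt8", "rt11", 3, 5), ("rt9", "rt12", 3, 4),
    ("rt9", "rt13", 3, 4),
]

STRANGER_TYPE = 4  # assumed numeric encoding of type IV

# Tweets that were reposted by someone.
reposted = {source for source, _rt, _depth, _type in RRS}

# Casual retweets: direct reposts of an original tweet with no reposts of their own.
casual = {
    rt for source, rt, _depth, _type in RRS
    if source.startswith("ot") and rt not in reposted
}

# Directed edges of the sketched SRCN: information flows from the author
# of the reposted tweet to the author of the retweet.
srcn_edges = {
    (AUTHOR[source], AUTHOR[rt])
    for source, rt, _depth, rr_type in RRS
    if rr_type == STRANGER_TYPE and rt not in casual
}
srcn_nodes = {user for edge in srcn_edges for user in edge}
print(sorted(casual), sorted(srcn_edges))
```

Storing edges in a set collapses repeated reposts between the same pair of users into a single edge, in line with the "at least one RR" wording of the extraction rule.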

Finally, on the basis of the WIF after the Yi'liang earthquake, the FRCN consisted of 128 865 nodes and 159 379 edges, and the SRCN consisted of 85 978 nodes and 104 410 edges. This shows again that Sina-Weibo combines the traditional society of acquaintances with a society of strangers.
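The casual-node filter described above can be illustrated with a minimal sketch. The record layout (`Retweet`, `rid`, `parent`) is hypothetical and is not the WIF notation of this paper. We also assume here that a retweet is casual when it links directly to the original tweet and is not cited by any other retweet; this is one reading of the definition given in the text.

```python
from typing import List, NamedTuple, Optional, Tuple


class Retweet(NamedTuple):
    """Hypothetical retweet record (not the paper's WIF notation)."""
    rid: str               # retweet identifier
    user: str              # participant who posted the retweet
    parent: Optional[str]  # rid of the cited retweet; None = cites only the original tweet


def split_casual(retweets: List[Retweet]) -> Tuple[List[Retweet], List[Retweet]]:
    """Split retweets into (casual, remaining) under the assumed definition."""
    # Every retweet that is cited by at least one other retweet.
    cited = {r.parent for r in retweets if r.parent is not None}
    casual, remaining = [], []
    for r in retweets:
        # Assumption: casual = links only to the original tweet AND is never cited.
        if r.parent is None and r.rid not in cited:
            casual.append(r)
        else:
            remaining.append(r)
    return casual, remaining


example = [
    Retweet("rt1", "u1", None),   # cited by rt3, so it takes part in a chain
    Retweet("rt2", "u2", None),   # casual
    Retweet("rt3", "u3", "rt1"),
    Retweet("rt6", "u6", None),   # casual
]
casual_nodes, remaining_nodes = split_casual(example)
```

Only the `remaining` retweets would then be passed to the FRCN and SRCN extraction steps.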

2) Topology Measurement of FRCN and SRCN: Table III lists a comparison of topological properties between FRCN, SRCN, and HFS; some corresponding distributions are presented in Fig. 13.

TABLE III: TOPOLOGICAL PROPERTIES COMPARISON OF THE HFS, THE FRCN, AND THE SRCN

The values of N_C and N_G in Table III

The comparison of ρ in Table III shows that the network densities of FRCN and SRCN are both much lower than those of the HFS groups; the interaction groups in Sina-Weibo are looser and more disorganized than traditional online social forums, such as HFS communities. This loose structure allows Weibo users to gather quickly and disperse quickly.
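As a rough check on the order of magnitude of ρ, the density can be recomputed from the node and edge counts reported above. The sketch assumes the standard directed-graph definition ρ = m / (n(n − 1)); the definition used for Table III is not restated in this section, so the values may differ from the table by a constant factor.

```python
def directed_density(num_nodes: int, num_edges: int) -> float:
    """Density of a simple directed graph: edges / possible ordered pairs.

    Assumption: this is the textbook definition, not necessarily the one
    used to compute rho in Table III.
    """
    if num_nodes < 2:
        return 0.0
    return num_edges / (num_nodes * (num_nodes - 1))


# Node and edge counts reported in the text for the Yi'liang WIF.
rho_frcn = directed_density(128_865, 159_379)
rho_srcn = directed_density(85_978, 104_410)
print(f"FRCN density ~ {rho_frcn:.2e}, SRCN density ~ {rho_srcn:.2e}")
```

Under this definition both networks have densities on the order of 1e-5, which is consistent with the description of a loose organizational structure.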

All of the degree distributions in Fig. 13(c) and (f) follow a power law, meaning that FRCN and SRCN are both scale-free networks. This also means that there are a few users whose tweets always receive an overwhelming public response (i.e., huge hubs), while the appeal of most other users is severely limited. Regarding the scale-free property in social networks, the research team of Albert-László Barabási holds that it is a consequence of a decision-based queuing process: when individuals execute tasks based on some perceived priority, a heavy-tail phenomenon emerges [55]. We do not rule out this possibility, because the number of reposts is indeed an important factor for a tweet in the real-time push service shown in Fig. 7(c). However, one thing is certain: without the strong appeal of hub users in either FRCN or SRCN, few messages could spread widely.
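One common way to quantify a power-law claim of this kind is to estimate the exponent by maximum likelihood. The sketch below uses the continuous-data estimator α = 1 + n / Σ ln(x_i / x_min); it is an illustration only, since this section does not state how the distributions in Fig. 13 were fitted. The names `powerlaw_alpha_mle` and `sample_powerlaw`, the cutoff `xmin`, and the synthetic data are all assumptions. For integer degrees the continuous formula is only an approximation.

```python
import math
import random
from typing import List, Sequence


def powerlaw_alpha_mle(values: Sequence[float], xmin: float) -> float:
    """Continuous maximum-likelihood estimate of the power-law exponent.

    Only observations with x >= xmin are used (the assumed tail region).
    """
    tail = [x for x in values if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum <= 0.0:
        raise ValueError("need tail observations strictly above xmin")
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(alpha: float, xmin: float, n: int, rng: random.Random) -> List[float]:
    """Draw synthetic power-law samples by inverse-transform sampling."""
    # 1 - rng.random() lies in (0, 1], so the power is always defined.
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


# Synthetic check: recover a known exponent from generated data.
rng = random.Random(0)
synthetic = sample_powerlaw(alpha=2.5, xmin=1.0, n=20_000, rng=rng)
alpha_hat = powerlaw_alpha_mle(synthetic, xmin=1.0)
```

Applied to real degree sequences, the choice of `xmin` matters a great deal, and a goodness-of-fit test would be needed before concluding that a power law is the best description.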

The comparison of C, avg.D, S, and d in Table III shows that FRCN and SRCN are both small-world networks. The avg.D values of FRCN and SRCN are both below the avg.D of HFS, and their C values are at least an order of magnitude lower than those of HFS, indicating that most users are not friends of one another. However, the values of S and d in Table III and the results in Fig. 13(g) and (h) both show that the low avg.D has not increased the social distance, i.e., Weibo information can always flow from one user to another through a small number of hops. In the SRCN, for example, each user has fewer than three adjacent users on average, but the typical distance between randomly chosen users still remains below nine hops. Another finding is that the shortest-path lengths of FRCN follow a negative binomial distribution.
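The two quantities discussed above can be sketched as follows. The average degree is taken as 2m/n, which is an assumption because the exact definition of avg.D is not restated here, and the mean shortest-path length is computed by breadth-first search from every node. The adjacency-list format and the toy graph are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable


def average_degree(num_nodes: int, num_edges: int) -> float:
    """Mean number of incident edges per node (assumed definition: 2m / n)."""
    return 2 * num_edges / num_nodes


def mean_shortest_path(adj: Dict[Hashable, Iterable[Hashable]]) -> float:
    """Mean hop count over all ordered pairs (source, target) with a path."""
    total_hops, reachable_pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # standard breadth-first search
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total_hops += sum(dist.values())
        reachable_pairs += len(dist) - 1
    return total_hops / reachable_pairs if reachable_pairs else 0.0


# Average degree from the SRCN counts reported in the text.
srcn_avg_degree = average_degree(85_978, 104_410)

# Toy undirected path graph a-b-c-d, stored as symmetric adjacency lists.
toy = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
toy_mean_path = mean_shortest_path(toy)
```

With the reported SRCN counts, 2m/n is about 2.4, in line with the statement that each user has fewer than three adjacent users on average. For networks of this size, running a search from every node is costly, so sampling a subset of source nodes is a common shortcut.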

Finally, building on the analysis in Fig. 6, the rest of this section uses the generalized Kendall-τ model again to examine how far the influential groups in FRCN and SRCN converge. FRCN and SRCN are two real interaction networks that emerge in entirely different ways, so we ask in particular whether the users at the top of R_F and R_RT keep the same powerful influence in both FRCN and SRCN. The following two formulas rank the users of FRCN and SRCN, respectively, and the results are shown in Fig. 14.

The total number of times a tweet is reposted is the most concrete indication of user influence. The comparison between the two curves K(R_RT, FRCN_R_RT) and K(R_RT, SRCN_R_RT) in Fig. 14(a) indicates that the influential users in FRCN cater to popular taste better than those in SRCN. The curve of K(SRCN_R_RT, FRCN_R_RT) in Fig. 14(a) shows that the influential groups in FRCN and SRCN do converge, but with limited overlap, especially for large-scale and widespread reposting. In addition, the two curves in Fig. 14(b) are at a similar level, which suggests that the two influential groups in FRCN and SRCN have at least one thing in common: both have a massive following. Therefore, whether a tweet gets large-scale notice is likely related to the number of followers its publisher has.
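The K(·, ·) comparisons above rely on a generalized Kendall-τ distance between top-k lists. A minimal sketch of the unnormalized distance with penalty parameter p, following the case analysis in "Comparing top k lists" from the reference list, is given below. The function name `kendall_topk`, the default p = 0.5, and the toy user lists are assumptions. Any normalization used for the curves in Fig. 14 is not reproduced here.

```python
from itertools import combinations
from typing import Hashable, Sequence


def kendall_topk(list1: Sequence[Hashable], list2: Sequence[Hashable], p: float = 0.5) -> float:
    """Unnormalized generalized Kendall-tau distance between two top-k lists.

    Each list is ordered from most to least influential and has no duplicates.
    """
    rank1 = {u: i for i, u in enumerate(list1)}
    rank2 = {u: i for i, u in enumerate(list2)}
    union = list(list1) + [u for u in list2 if u not in rank1]
    distance = 0.0
    for a, b in combinations(union, 2):
        in1 = a in rank1 and b in rank1
        in2 = a in rank2 and b in rank2
        if in1 and in2:
            # Both users in both lists: penalize opposite relative order.
            if (rank1[a] - rank1[b]) * (rank2[a] - rank2[b]) < 0:
                distance += 1
        elif in1 and (a in rank2 or b in rank2):
            # Both in list1, only one in list2: that one is implicitly ahead in list2.
            present, absent = (a, b) if a in rank2 else (b, a)
            if rank1[present] > rank1[absent]:
                distance += 1
        elif in2 and (a in rank1 or b in rank1):
            # Symmetric case with the roles of the lists swapped.
            present, absent = (a, b) if a in rank1 else (b, a)
            if rank2[present] > rank2[absent]:
                distance += 1
        elif in1 or in2:
            # Both in one list, neither in the other: order unknown, penalty p.
            distance += p
        else:
            # One user appears only in list1, the other only in list2.
            distance += 1
    return distance


d_example = kendall_topk(["a", "b", "c"], ["b", "a", "d"])
```

In the toy call, the swapped pair (a, b) and the pair (c, d), whose members appear in different lists, each contribute one unit, giving a distance of 2. Smaller values mean that the two rankings agree more closely on who the influential users are.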

Weibo has become ubiquitous in people's everyday lives in China. Both Sina-Weibo and Tencent-Weibo had more than five million users in early May 2013, and their open platforms (including the data API service) have been improved constantly. Although the very existence of user records raises serious privacy concerns, these records also create a historic opportunity: they offer, for the first time, unbiased data of unparalleled detail on the behavior of not one but millions of individuals.

In this paper, we performed a comprehensive analysis of Weibo information diffusion during earthquakes. We found that symbolic representation applied to the WIF model is indeed a feasible choice for the empirical study of human behavior based on online social media data sets. In retrospect, our primary inspiration came from the description mechanism of concepts and relationships in ontology theory. The main feature of this idea is that it can give a formal expression for the data structure and the analysis process (such as the extraction of FRCN and SRCN).

However, the structure of social networks is only a starting point. When people talk about the "connectedness" of a social network, in general, they are really talking about two related issues. One is who is linked to whom; and the other is the fact that each individual's actions have implicit consequences for the outcomes of everyone in the system [56] . In fact, Fig. 8 has given us some intuition that there is probably a serious structural imbalance between the "declared" relationship network and the real interaction network [57] [58] [59] . In addition, to measure public perceptions in emergencies, many researchers have worked extensively on the evolution of public opinion during information dissemination based on Twitter [60] , which allows for many interesting directions for future work.

DARPA Network Challenge Project Report

Time-critical social mobilization

Identification of influential spreaders in complex networks

Everyone's an influencer: Quantifying influence on Twitter

Reality check for the Chinese microblog space: A random sampling approach

Measurement of microblogging network

Understanding Sina Weibo online social network: A community approach

Analyzing user relationships in Weibo networks: A Bayesian network approach

Recommender systems

Empirical analysis of online social networks in the age of Web 2.0

Artificial inflation: The real story of trends and trend-setters in Sina Weibo

Tweeting political dissent: Retweets as pamphlets in FreeIran, FreeVenezuela, Jan25, SpanishRevolution and OccupyWallSt

Identifying influential Twitter users in the 2011 Egyptian revolution

Social contagion: An empirical study of information spread on Digg and Twitter follower graphs

What stops social epidemics?

Predicting the temporal dynamics of information diffusion in social networks

Collective response of human populations to large-scale emergencies

Transforming earthquake detection

A study of the human flesh search engine: Crowd-powered expansion of online knowledge

Understanding crowd-powered search groups: A social network perspective

Earthquake shakes Twitter users: Real-time event detection by social sensors

A sensitive twitter earthquake detector

Earthquake: Twitter as a distributed sensor system

Tweet analysis for real-time event detection and earthquake reporting system development

Information balance between transmitters and receivers based on the twitter after great East Japan earthquake

Multi-level functionality of social media in the aftermath of the great East Japan earthquake

Information sharing on Twitter during the 2011 catastrophic earthquake

Social networks that matter: Twitter under the microscope. First Monday

Does social contact matter?: Modelling the hidden Web of trust underlying Twitter

Are our online 'Friends

Who says what to whom on twitter

Inferring networks of diffusion and influence

Social influence locality for modeling retweeting behaviors

Community structure in time-dependent, multiscale, and multiplex networks

Combinatorial analysis of multiple networks

Cytoscape: A software environment for integrated models of biomolecular interaction networks

Geographic distribution of Staphylococcus aureus causing invasive infections in Europe: A molecular-epidemiological analysis

Geography of Twitter networks

Measurement and analysis of online social networks

Influential users in social networks

Neocortex size as a constraint on group size in primates

Social networks: Human social networks

Too Many Friends?

Three Questions for Robin Dunbar

Modeling users' activity on twitter networks: Validation of Dunbar's number

Sharing the joke: The size of natural laughter groups

Is Dunbar's number up?

What is Twitter, a social network or a news media?

Comparing top k lists

Agreeing to disagree: Search engines and their public interfaces

Experimental study of inequality and unpredictability in an artificial cultural market

Twitter under crisis: Can we trust what we RT?

Information contagion: An empirical study of the spread of news on Digg and Twitter social networks

The quality of online social relationships

The origin of bursts and heavy tails in human dynamics

Networks, Crowds, and Markets: Reasoning About a Highly Connected World

Social balance on networks: The dynamics of friendship and hatred

The slashdot zoo: Mining a social network with negative edges

Signed networks in social media

Pandemics in the age of Twitter: Content analysis of tweets during the 2009 H1N1 outbreak