key: cord-0035672-pqwcoure authors: Hsiao, Ping-Nan title: A Social Network Model Based on Topology Vision date: 2009 journal: Complex Sciences DOI: 10.1007/978-3-642-02469-6_20 sha: fdc36d693dd916fa35982f178731faaa33a84145 doc_id: 35672 cord_uid: pqwcoure There are many researchers proposed social network models in recent years, and most of them focus on clustering coefficient property of a small-world network and power law degree distribution of a scale-free property. In social network topology, we observed the network is consisted of many nodes with small connectivity and a few high-degree nodes. In the small connectivity part, there are many nodes which have only one degree. Most of past social network models can not generate this part. In this paper, we proposed a social network model based on topology vision and with tunable high hub connectivity. At the same time, we suggested a new characteristic of social network, condensed clustering coefficient, to replace the original clustering coefficient. Finally, this study also includes the analysis of real social network data. Social networks are a description of relationship between people. The relationships include friends, co-authorships, human sexual relations, and other interactions between people. In this study, an individual person is as a node and the relationship between two persons as an edge. If there is any relationship between two persons, one edge is added between the two nodes. Therefore, a social network is constructed. Recent research about social network analysis [1] [2] [3] [4] [5] [6] focused on observing clustering coefficient property, and power law degree distribution phenomena. In this paper, we proposed a social network model based on topology vision and with tunable high hub connectivity. In last ten years, many social network models were proposed [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] . The social network model has been used for the simulation of epidemiological transmission [17] [18] , rumor spreading [19] [20] , opinion formation [21] and the spread of computer viruses in email network [22] . Since social networks can be found in many aspects of our lives, it is worth to investigate how they work precisely. In this paper, some background knowledge is in Section 2. In Section 3, the data of real social network from Internet is analyzed. In Section 4, a social network model is proposed, which both have scale-free property and high clustering coefficient. In addition, the network model contains tunable high-hub connectivity property. The simulation results are shown in Section 5. Conclusions and our discussion for future works are presented in Section 6. In this chapter, some characteristics of network topology which are used in this study are introduced. A network consists of a set of nodes and many edges between the nodes. In this paper, the network is simplified as a connected network with nonweighed and non-directed edge. The average shortest path is the average of shortest path between all pair nodes in the network. The average shortest path (sometimes called average distance) between network nodes, denoted by L, is calculated by the formula. where δ(i,j) is the distance of shortest path between node i and node j, and N is the number of nodes in the network. The L indicates the average distance to communicate from one node to the other. The power law degree distribution is a property of a scale-free network. A scale-free network is inspired by empirical studies such as the Internet backbone network [23] , web documents link network [24] , and email network [25] [26] . The main characteristic property of a scale-free network is that it consists of a few large connected nodes and many nodes with small connectivity. In 1999, Barabási and Albert [27] proposed the BA model to descript the scale-free network. The network of BA model grows by adding one new node at a time, denoted by t, which randomly connects to m nodes already in the network. The connecting probability depends on the node degree it was connected to. For node i with node degree k i , the connecting probability Π(k i ) is denoted by The connectivity distribution pr(k) of the resulting network, which is defined by the probability of a node having k edges. The connectivity distribution for scale-free networks is a decaying function of k's power following This phenomenon is sometimes called power law distribution and γ is the tail index. The BA model for scale-free network construction with a preferential attachment strategy creates a scale-free network with tail index γ=3. In 1998, Watts and Strogatz (WS) [5] proposed the small-world phenomena. The small-world network is between regular and random network. In the small-world network, the clustering coefficient is large, and the average shortest path is small. The clustering coefficient describes how close among the neighbors of a node. The formula of calculating clustering coefficient of node i is the ratio of actual number of edges connecting these nodes in its k neighbors to the number of edges in a fully connected network of the k nodes, denoted by C i , where E i is the number of edges in the sub-network formed by node i and its E i neighbors. The clustering coefficient of the entire network is defined as the average of all nodes. In a social network, there are some nodes which only have one degree. We defined the O is the ratio of how many nodes which only have one degree to the total nodes in the network. If there are more nodes which only have one degree, the O will be higher. For the nodes which only have one degree, their clustering coefficient will be zero, since the E i in (4) will be zero. Because we did not include the self-loop in shortest path L in (1) which is zero, we consider the condensed clustering coefficient C', which is also not to include the C i which is zero. For the C', the formula is In 2005, Lun Li et al. proposed the s-metric. In a society, the famous people represent high-degree nodes in a social network. The s-metric measures the level of high-degree nodes connectivity. Let g be an undirected graph with edge-set ε, and let degree of node i be d i , then the metric is defined as and the s max is the maximum value of s(g). The value of s(g)/s max , denoted by S, describes the level of a large hub connected to another large hub, and the more highhub nodes are connected, the higher value of S is. We collected the real network data from the Opel Astra Club (OAC) in Taiwan [28] at June 2008. This is a forum based website. Most of users in the website own and drive an Opel Astra car. They can post articles to forum, upload photos to album or send private messages to other users. We analyzed their private message database. If user A sent a message to user B, and user B also sent a message to user A, in addition, they kept the message in their inbox and not deleted, then we added one edge between node A and node B. In the website, the total number of the private message database was 21,691, and total number of users which had used private messages was 1,378. After deleting duplicated records, the construct connected network was consisted of 835 nodes and 2,223 edges. 1 shows the degree distribution plot of the OAC network. In this figure, we can see clearly the degree distribution following a straight line, which indicates distinguished scale-free property. Table 1 shows the results of the average distance L, clustering coefficient C, tail index γ and s(g)/s max value of the OAC network. The tail index γ is below 3, which indicates scale-free property. The clustering coefficient is much higher than corresponding random network, which indicates that there are many neighbor nodes have the same neighbor nodes. In topology vision, we found that there are many nodes which only have one degree in social networks, which will appear in the position which is 1 of x-axis of degree distribution. In BA model, it cannot generate any nodes which only have one degree. At the same time, we reviewed many social network papers which have presenting degree distribution of real network; most of them have lots of nodes which only have one degree. Therefore, we decide to design a model to fit it. Our model is based on BA model, and then modified for clustering coefficient property and high-hub connectivity. In the real social network data, there are lots of nodes which degree number is one, but if we use the BA model to construct a network, every node has at least m edges. There are not any nodes which have only one degree unless the m is 1, but this will cause the average degree 2m is limited to 2. Therefore, we modified the growing stage of BA model. The algorithm of this model is described as follows: Step1: Initialize stage: Add m nodes into the network, and add one node which connects to the m nodes by m edges. There are m+1 nodes and m edges in the network in this stage. Step2: Growing stage: We defined four functions at this stage: First Attach (FA): The edge is connected between the new adding node and the exist nodes of the network which follow preferential attachment; Friend Function (FF): The edge is connected between the two unlinked neighbor nodes of previous link nodes. The two neighbor nodes have weight by its degree number in connecting condition; s max Function (SF): The edge is connected between the two unlinked nodes which follow s max property; Random Function (RF): The edge is connected between any two unlinked nodes which are random chosen. After initialize stage, we have (N-(m+1)) loops to add nodes and edges. Let total degree number be K, therefore the average edge, denoted by E loop , at every loop should be added is At every loop, it begins with adding one new node and connecting to the network by preferential attachment, which is FA. Other edges connect between two nodes which are FF or SF or RF by parameter f and x. The three functions are included in the Attach_edge function of our algorithm. The parameter f stands for clustering coefficient part and parameter x represents high-hub connectivity part. In addition, we define a variable β to check whether need to do once more edge attaching, since the E loop may be not integer, but the times number of a for-loop is integer. Our model is demonstrated as the following pseudo codes: If (!Is_attach(x)) RF; Else SF; } We used our model to simulate the social network for comparison with the OAC network experimental results. The results of simulation are shown in Table 2 . In this table, some cells are colored in gray to indicate that their values are 5% smaller or bigger than those in empirical data. In the simulation of the OAC, we used parameters f=0.10 to 0.25 and x=0.05 to 0.30. In clustering coefficient C, the suitable f is 0.15, and x is from 0.15 to 0.30. In condensed clustering coefficient C', the suitable f is 0.20, and x is from 0.05 to 0.20. In shortest path L and s-metric S, all parameters are suitable. The tail index γ of the OAC is 1.439, but our model only generates 1.52 to 1.68 under the parameters. The one degree in the network O of the OAC is 0.339, but our model only generates 0.19 Fig. 3 . The clustering coefficient C with node N=2000, degree k=4 to 9, parameter f=0.5, and parameter x=0 to 1 to 0.24 under the parameters. The reason is that we use only two parameters to tune our social network, and that is a little difficult to control six characteristics. In this simulation, we choice f=0. 15 and x=0.20 to plot the degree distribution as Fig. 2 , and it demonstrates scale-free property. Furthermore, we used other parameters to simulate the social network. The parameters in Fig. 3 , 4, 5, 6 and 7 are node N=2000, degree k=4 to 9, and parameter f=0.5. Fig. 3 shows the clustering coefficient relation with parameter x=0 to 1, and when x is large, C is large. The reason is that, when x is large, there will generate more high-degree nodes. The more high-degree nodes cause higher clustering coefficient since the neighbor nodes of high-degree nodes maybe become their neighbors' neighbor nodes. Fig. 4 demonstrates the tail index relationship with the same parameters of Fig.3 . The more x makes, the less γ is. The reason is that, the large x is, the larger hubnodes become. The more large hub-nodes make, the less tail index γ is. Fig. 5 shows the s-metric relationship with the same parameters of Fig. 3 , and this is absolutely for large x made large S. Fig. 6 displays the one degree in the network relationship with the same parameters of Fig.3 . The more x makes, the more O is. Since the more x makes more hub-nodes, and cause the other nodes have less degree, and the more O is. Fig. 7 demonstrates the condensed clustering coefficient relationship with the same parameters of Fig. 3 . This figure is like Fig. 3 , and then divides by (1-O) . That presents the clustering coefficient eliminates the nodes which only have one degree. Fig. 7 . The condensed clustering coefficient C' with node N=2000, degree k=4 to 9, parameter f=0.5, and parameter x=0 to 1 In this paper we proposed a social network model based on topology vision, and try to define a new characteristic which is condensed clustering coefficient. The condensed clustering coefficient did not include the nodes which only have one degree, since their value of clustering coefficient is zero. The social network analysis can be used in business for improving knowledge creation and sharing, and city planning for designing spaces to support human interaction [31] [32] . Besides, it can also be used for the simulation of epidemiological transmission [17] [18] , the rumor spreading [19] [20], opinion formation [21] , and computer virus dissemination [22] . Our model is compared with the OAC network which is formed by forum, and fit it approximately. Therefore, we think that our model is suitable in simulation of rumor spreading and opinion formation. Using social network model to simulate epidemiological disease spreading allows us to predict its damage under different measures. We believe that this study constitutes a valuable contribution to understanding the phenomena of social network. Classes of Small-World Networks Clustering and Preferential Attachment in Growing Networks The Web of Human Sexual Contacts The Structure of Scientific Collaboration Networks Collective Dynamics of 'Small-World' Networks Evolution of the Social Network of Scientific Collaborations Highly Clustered Scale-Free Networks Growing Scale-Free Networks with Tunable Clustering Growing Scale-Free Networks with Small-World Behavior Geography in a Scale-Free Network Model Structure of a Large Social Network Models of Social Networks Based on Social Distance Attachment Evolving Scale-Free Network Model with Tunable Clustering A Spatial Model for Social Networks A Model for Social Networks Modeling Email Communications. IEICE Transactions on Information and Systems E87-D Epidemiology, Transmission Dynamics and Control of SARS: the 2002-2003 Epidemic Network Theory and SARS: Predicting Outbreak Diversity Dynamics of Rumor Spreading in Complex Networks The Spread of Gossip in American Schools Phase Transitions as a Persistent Feature of Groups with Leaders in Models of Opinion Formation Email Networks and the Spread of Computer Viruses Heavy-Tailed Probability Distributions in the World Wide Web Internet: Diameter of the World-Wide Web Heavy Tail Distribution in Email Network Scale-Free Topology of E-mail Networks Coordination and Cooperation in Local, Random and Small World Networks: Experimental Evidence Scaling Exponents and Clustering Coefficients of a Growing Random Network IT to Support Knowledge Sharing in Communities, Towards a Social Capital Analysis A Bird's-Eye View: Using Social Network Analysis to Improve Knowledge Creation and Sharing