Are We in Agreement? Benchmarking and Reliability Issues between Social Network Analytic Programs

Philip J. Murphy, Middlebury Institute of International Studies, Monterey, CA, USA
YuFei Wang, Middlebury Institute of International Studies, Monterey, CA, USA
Karen T. Cuenco, Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, USA

Abstract

Reliability and validity are key concerns for any researcher. We investigate these concerns as they apply to social network analysis programs. Six well-used and trusted programs were compared on four common centrality measures (degree, betweenness, closeness, and eigenvector) under a variety of network topographies. We identify notable inconsistencies between programs that may not be apparent to the average user. Specifically, each program may have implemented a variant of a given measure without informing the user of its characteristics. This presents an unnecessary obfuscation for analysts seeking measures that are best suited to the idiosyncrasies of their data, and for those comparing results between programs. Under such a paradigm, the terms in use within the social network analysis community become less precise over time and diverge from the original strength of network analysis: clarity.

Acknowledgements

The authors would like to thank Elma Paulauskaite and Maizy Cuenco for their help in putting together this research. This work was funded in part through a grant from the National Institutes of Health: Grant number NIDCR 1R03DE020127.

Correspondence concerning this work should be addressed to Philip J. Murphy, Middlebury Institute of International Studies, 460 Pierce St., Monterey, CA 93940, USA; pjmurphy@miis.edu; phone: 831-647-4600; fax: 831-647-6693.

Introduction

An important part of the appeal of social network analysis originates from its incorporation of mathematical descriptions of social relations. This has made it possible to provide clear and unambiguous definitions for concepts relating to relational structures. The clarity of communication that resulted from this intersection of mathematics and the social sciences has been credited with much of the field's early growth (Freeman 1984). As Freeman relates:

[From] the start, contributions to social network analysis were often couched in mathematical terms. The relative precision of these mathematical treatments gave social networks an advantage. Because of that precision, the network field did not generate the same kinds of quibbles and misunderstandings over terms and concepts that lead to conflict in fields that are wedded to a natural language. (2004)

The mathematical core of social network analysis has delivered the dual benefits of precision and flexibility. Equations are clear to the point that those interested in the topic could build upon one another's work with minimal need for clarification. Yet mathematical definitions are also general enough to allow for their application in a variety of relational contexts.
The structural measures that form the core of social network analysis have thereby proven to be compelling across a variety of contexts and interests. The rapid proliferation in where and how social network analysis has been applied (Otte and Rousseau 2002, Freeman 2011) is testament to the scalability of the field.

The diverse purposing of social network analysis has been mirrored by a corresponding proliferation in the number and variety of software packages available today for network analysis. Although software developers and programmers have put a great deal of effort into producing network analytic software suited to a wide variety of needs and applications, no single piece of software is generally applicable to every situation. Software packages have been optimized for efficiency, analytic variety, analytic specificity, ease of use, specialized data handling, greater capacity for visualization, and for terminology and concepts tailored to a particular end-user. Software also differs in style of user interface, method of reporting, and even the default methods for scaling output.

As the available packages continue to diversify, their content is also converging. This raises the question of whether the analytic functions offered across programs are truly equivalent and exchangeable. Are the names used to identify each function explicit in what they identify, or do they refer only to a generalized class of functions?

Naming conventions are important. The developers of network analytic software face decisions about how to implement a particular analytic function. Some developers may choose to handle common topological features of social networks (e.g., loops, multiple components) by default, while others may choose a stricter interpretation of how the measure or algorithm should perform. Under such a paradigm, the terms in use within the social network analysis community become less precise over time and diverge from the original strength of network analysis: clarity. A measure or algorithm may differ by implementation in order to address some given scenario or feature of network topology, and it may therefore bear unique attributes that constitute a trade-off at some level. It is therefore valuable to both analysts and the social network analysis community for any such differences to be made explicit, or systematized.

The issue of whether two software implementations produce the same measures is especially important when using two programs in concert. In such situations, consistency of output indicates that the user is introducing a minimum of variability when moving from one program to another. The equivalency of network metrics matters because small variations in basic measures may translate into large differences in more complex algorithms. However, procedural dissimilarities in programs' calculations of measures can be difficult to identify and frequently lack documentation.

Differences in how various software programs provide output constitute a barrier to assessing consistency of measures from program to program. The variety of default output styles (e.g., raw scores, normalized scores, scalar-multiplied output) makes it difficult to visually compare raw output. Even if there were no meaningful difference in the information provided in the output, the empirical differences that are evident on casual inspection make such a judgment difficult to establish. The numbers may look different, but in many cases the user would never know just by visual inspection.
Our primary focus is an assessment of inter-program reliability and three related questions. Are the various software packages producing consistently equivalent results? If not, how do they differ? Under what conditions, if any, do the centrality outputs diverge?

To assess inter-program reliability, we focused on the basic building block of network analysis: node centrality. Specifically, the investigation involved the four most commonly applied centrality measures: degree, betweenness, closeness, and eigenvector (Valente et al. 2008). Such measures are often fundamental to social network analysis. Here, we evaluate and report on the consistency of basic measures of node centrality across various software platforms in standardized simulations.

Materials and Methods

Six software packages for social network analysis were compared in terms of their calculations of four basic measures of node centrality in each of four networks. We selected popular network analytic software that is either self-contained (UCINET, Pajek, ORA, and Gephi) or available as packages through the CRAN R archive (sna and igraph) (Table 1, below).

Table 1: Analytic interfaces used in this study

    Program         Version       Source
    UCINET          6.564         http://www.analytictech.com/
    Pajek           4.03          http://pajek.imfm.si/
    ORA-NetScenes   3.0.9.9.20    http://www.casos.cs.cmu.edu/projects/ora/
    Gephi           0.8.2         http://gephi.org/
    sna             2.3-2         http://www.statnet.org/
    igraph          0.7.1         http://igraph.org/

Measures

The "big four" centrality measures (degree, betweenness, closeness, and eigenvector centrality) were calculated using each analytic interface. (Pajek does not include a measure titled eigenvector centrality; for undirected networks, Pajek's "hubs and authorities" measure is analogous to eigenvector centrality and was used for the purpose of comparison.) These measures were selected because they are basic measures that are frequently used alone for analytic inference, as well as functioning as constituent parts of more complex algorithms. Variability in these measures may lead to downstream variation in analytic results for more complex algorithms. Some programs produce output in multiple formats (e.g., raw, normalized, scaled). When possible, similarly scaled output was compared (Table 2, below).

Table 2: Output scaling

    Program   Degree       Closeness            Betweenness   Eigenvector
    Gephi     Raw          Normalized average   Raw           Scaled (max = 1)
    Pajek     Raw          Normalized           Normalized    Normalized
    UCINET    Raw          Average              Raw           Normalized
    ORA       Normalized   Normalized           Normalized    Normalized
    sna       Raw          Normalized           Raw           Normalized
    igraph    Raw          Normalized           Raw           Scaled (max = 1)

Centrality measures were deemed to be optimally correspondent if the Pearson correlation coefficient comparing two sets of centrality values was 1.0000. The closer the correlation is to this optimally consistent value, and the more closely scatterplots lie along a 45° line, the better the centrality values concur between software calculations. Optimal consistency is invariant to scale and magnitude differences in raw centrality values. If the correlation fell below 1.0000, there was suggestive evidence that these measures lacked consensus across software.

Scatterplots offer additional insight for comparing similar measures that employ different scales. If any nodes are of particular interest to the analyst, then potential variations in their measurement may become very important. Deviations in the measurement of a small number of nodes within a large network could still occur within the correlation threshold that we selected. Scatterplots were therefore used to identify and characterize variation in measurement, and to assess whether such variation, if present, is a singular anomaly (e.g., differences in floating-point arithmetic) or a patterned deviation (i.e., errors that arise from differences in the assumptions behind how a measure should be calculated).
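This comparison procedure is straightforward to reproduce. The following minimal sketch in R illustrates it for degree centrality using the two R packages evaluated here; the graph and its size are illustrative, and the function names follow current igraph releases (the 0.7.1 release tested above used dotted equivalents such as graph.adjacency()).

    # Compare degree centrality as computed by sna and igraph on one network.
    library(sna)
    library(igraph)

    adj <- sna::rgraph(35, tprob = 0.1, mode = "graph")            # random undirected graph
    g   <- graph_from_adjacency_matrix(adj, mode = "undirected")   # same network, igraph form

    deg_sna    <- sna::degree(adj, gmode = "graph")   # sna operates on adjacency matrices
    deg_igraph <- igraph::degree(g)                   # igraph operates on graph objects

    # A Pearson correlation of 1.0000 indicates optimal correspondence,
    # regardless of differences in scaling between programs.
    cor(deg_sna, deg_igraph, method = "pearson")

    # Points falling on a straight line indicate agreement; off-diagonal
    # groups of points suggest patterned differences in assumptions.
    plot(deg_sna, deg_igraph)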
Datasets

Network data (graphs) of variable size and modality were generated to compare centrality measures across software packages. Undirected one-mode [small (n = 35) and moderately large (n = 2000)] and two-mode [small (n1 = 10, n2 = 25) and moderately large (n1 = 300, n2 = 1500)] network datasets were generated. Initially, both the one-mode and two-mode networks contained smaller disconnected components (e.g., isolates and/or other small components) in addition to a large main component (Table 3). One-mode networks also contained loops. New networks were created by removing loops, removing smaller components, or both, from the initial network in order to model a variety of conditions. This resulted in twelve networks: both large and small networks that contain loops, disconnected components, both, or neither, as appropriate for one- or two-mode networks.

Table 3: Data used for reliability comparisons

    Data             Nodes in Main   Nodes in Smaller   Number of   Max. Number   Average
                     Component       Components         Loops       of Nodes      Degree
    Small One-mode   29              6                  6           35            3.5
    Large One-mode   1876            112                60          2000          3.0
    Small Two-mode   (10, 21)        4                  NA          (10, 25)      3.7
    Large Two-mode   (300, 1815)     185                NA          (300, 1700)   3.7

Each dataset was designed to be well within the data handling limits of each of the software packages that we evaluated. Most of the programs tested were limited mainly by concerns such as network density and size, in addition to the processor speed and the amount of available memory of a given computer. All are capable of handling networks into the tens of thousands of nodes, with some capable of handling networks into the millions of nodes.

Centrality measures were calculated using all six programs under a variety of conditions on all four networks, where applicable. (Loops were not considered a feature consistent with the definition of two-mode networks, which consist of two sets of nodes with ties between but not within each node set; the two-mode networks were therefore not evaluated under conditions with loops.) All graphs were undirected, with no multiple edges. These graphs were analyzed under multiple conditions: 1) loops (edges that recursively link a node to itself), but no disconnected components present; 2) disconnected components, but no loops present; 3) both loops and disconnected components present; and 4) a reference graph with no loops or disconnected components. For a more detailed description of the procedures used for each software program, see Appendix 1.
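Although the exact generation procedure is not detailed above, the construction can be sketched in R with igraph. The sizes, edge counts, and loop placements below are illustrative placeholders, not the study's actual parameters (older igraph releases used erdos.renyi.game() rather than sample_gnm()).

    library(igraph)

    # A main component plus deliberate complications, mirroring the
    # conditions described above.
    g_main  <- sample_gnm(n = 29, m = 50)        # large main component
    g_extra <- sample_gnm(n = 6,  m = 4)         # smaller disconnected components
    g       <- disjoint_union(g_main, g_extra)   # graph with disconnected components

    # Add loops (self-edges) for the loop conditions.
    g_loops <- add_edges(g, c(1, 1, 2, 2, 3, 3))  # loops on nodes 1, 2, and 3

    # Derive the reference graph: strip loops, keep only the main component.
    g_ref <- simplify(g_loops, remove.loops = TRUE, remove.multiple = FALSE)
    comp  <- components(g_ref)
    g_ref <- induced_subgraph(g_ref, which(comp$membership == which.max(comp$csize)))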
Results

Our findings, presented in brief form in Table 4, demonstrate that differences between analytic programs exist on each measure, with the notable exception of betweenness centrality. Results are presented below by measure and, within each measure, by network condition, in a manner that highlights some of the most common or notable differences between programs. Consistency was considered to be "high" when no notable difference arose, "medium" when the output from one or two software implementations differed from the others, and "low" when the output from more than two software implementations differed from the others. For the sake of brevity, many of the cases where all programs demonstrated high consistency in the measures produced ("High" in Table 4) are not discussed, but may be noted in the table. For all measures except eigenvector, differences in output were generally more pronounced in smaller networks.

Table 4: Consistency of output by centrality type and network conditions. Each cell gives 1-mode / 2-mode results; DC = disconnected components. High = completely consistent; Medium = one or two programs vary from the others; Low = more than two programs offer unique results.

    Measure       No DC, no loops   DC only         Loops only    DC and loops
    Betweenness   High / High       High / High     High / NA     High / NA
    Degree        High / Medium     High / Medium   Low / NA      Low / NA
    Eigenvector   Medium / Medium   Medium / Low    Medium / NA   Low / NA
    Closeness     Medium / Medium   Low / Low       Medium / NA   Low / NA

Closeness Centrality

Closeness centrality measures showed the least measurement variability in ideal networks (i.e., no loops or disconnected components) and in networks containing loops but no disconnected components. In networks containing no loops or disconnected components, plots indicated that calculations of closeness centrality were consistent between Pajek, sna, igraph, ORA, and UCINET, but only when UCINET's closeness was calculated using Freeman (1979) normalization. UCINET and Gephi also correspond when closeness measures in UCINET are reported as summed or averaged distances. In this condition, both UCINET and Gephi produce output for Freeman closeness in which smaller values indicate shorter average distances from a particular node to all others in the graph, yielding negative correlations with the normalized output of other programs (small graph r = -0.9903, large graph r = -0.9883). UCINET also offers an "Average Reciprocal Distance" (ARD) measure that corresponds more closely with other programs (small graph r = 0.9850, large graph r = 0.9990).
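The sign flips reported here follow directly from the scalings involved: summed (or averaged) geodesic distance is a "farness" score that decreases as a node becomes more central, while Freeman-normalized closeness and reciprocal-distance variants increase. A brief illustrative sketch in R, using igraph for the distance matrix; the graph itself is arbitrary:

    library(igraph)

    g    <- sample_gnm(n = 35, m = 80)
    comp <- components(g)
    g    <- induced_subgraph(g, which(comp$membership == which.max(comp$csize)))

    d       <- distances(g)               # geodesic distance matrix
    farness <- rowSums(d)                 # summed distances: smaller = more central
    freeman <- (vcount(g) - 1) / farness  # Freeman (1979) normalized closeness

    recip       <- 1 / d                  # reciprocal distances
    diag(recip) <- 0                      # a node's distance to itself is excluded
    ard         <- rowSums(recip) / (vcount(g) - 1)   # average reciprocal distance

    cor(farness, freeman)   # strongly negative, as with the UCINET/Gephi output above
    cor(ard, freeman)       # strongly positive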
In two-mode networks, neither UCINET nor Gephi produced results that corresponded with other programs (Figure 1). UCINET, the only one of these programs to include a closeness measure designed explicitly for use with two-mode data, produced a bifurcated plot in both large and small two-mode networks, though the effect is more pronounced in the larger networks (see also Figure 3). While the numeric values differ from those seen in degree centrality, the split-line pattern was similar to that observed in two-mode data without loops but with disconnected components included, and is attributable to UCINET's distinctive treatment of two-mode output.

Figure 1: Scatterplot matrix comparing closeness centrality output for a large, two-mode network. Pearson's correlation coefficients between programs are provided above the diagonal.

Networks that contain disconnected components, but no loops, resulted in the greatest disparities in closeness centrality measurements. Although all software tested cited Freeman as the reference for its centrality measure, only sna closely implemented Freeman's (1979) approach, and it therefore produced no centrality values when disconnected components were included in the graph, as expected under that approach. All other tested software generated closeness centrality values, as did sna when disconnected components were not present. Although UCINET (as of version 6.452) no longer provides a warning that analyzing a disconnected graph with Freeman's closeness centrality measure is technically inappropriate, it does require the user to select between options for handling the undefined distances introduced by disconnected components. Of the software that produced output under these conditions, values were disparate, and only igraph and ORA were consistent with one another (Figure 2).

In networks with both loops and disconnected components, there was a similar disparity of closeness centrality measures as seen in graphs with disconnected components but no loops. The same pairs of consistent and inconsistent software values observed in the graphs with disconnected components were observed with loops added to the data.

Figure 2: Scatterplot matrix comparing output for closeness centrality in a small, one-mode network. Pearson's correlation coefficients between programs are provided above the diagonal. Note that the sna package for R does not produce measures between disconnected components, resulting in correlation values listed as "NA".

Degree Centrality

In networks with no loops, degree centrality was consistent across software in one-mode networks, with the exception of UCINET. A similar pattern was observed for two-mode networks. For these two-mode data, UCINET values fell into two distinct groups in the plots contrasting UCINET with other software. The data are positively correlated, but some stratification is present. This pattern was similar for both large and small two-mode networks, though the effect is more pronounced in the larger networks (Figure 3). UCINET normalizes output for nodes in each mode individually, an aspect that differentiates it from the other tested programs when handling two-mode data. Such differences between UCINET and other programs are eliminated if the network is converted to bipartite form and analyzed as a one-mode network, as shown in the sketch below. The measurement of degree centrality was consistent among all programs in networks that contained disconnected components with no loops.

Figure 3: A variety of solutions are possible when analyzing two-mode networks in UCINET. Top row: scatterplots of UCINET's degree (r = 0.3713) and closeness (r = -0.0134) output using the two-mode centrality procedure, compared with other analytic packages; all other packages performed identically. Bottom row: when the network is transformed into a bipartite format, UCINET calculates as for a one-mode network, and results are analogous to those of other packages. Closeness centrality for the bipartite version was calculated using Freeman normalization in UCINET.
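The bipartite conversion just mentioned amounts to embedding the two-mode incidence matrix in a larger one-mode adjacency matrix. A minimal sketch in R; the incidence matrix here is randomly generated for illustration:

    # Convert a two-mode incidence matrix A (n1 x n2) into the equivalent
    # bipartite one-mode adjacency matrix [[0, A], [t(A), 0]].
    A <- matrix(rbinom(10 * 25, 1, 0.15), nrow = 10, ncol = 25)

    bipartite_adj <- rbind(
      cbind(matrix(0, nrow(A), nrow(A)), A),      # no ties within the first mode
      cbind(t(A), matrix(0, ncol(A), ncol(A)))    # no ties within the second mode
    )

    # The (n1 + n2) x (n1 + n2) matrix can be passed to any one-mode routine,
    # e.g., sna::degree(bipartite_adj, gmode = "graph").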
One-mode networks containing loops generated the greatest variability in measures of degree centrality across software (Figure 4). For both small and large networks with loops, calculations of degree made without modification of the data structure were consistent only between UCINET and ORA, between igraph and Pajek, and between sna and Gephi (see Figure 4 for an example in a small network). In networks with both loops and disconnected components, the patterns were essentially the same as those observed for one-mode networks with loops only; no new patterns were apparent.

The variations in output stem from how each program handles loops. Program defaults counted each loop as two edges (Pajek and igraph), counted each loop as one edge (UCINET and ORA), or ignored loops entirely (Gephi and sna). Note that the two R packages (sna and igraph) differ in their default treatment of loops, with igraph defaulting to include loops and sna defaulting to ignore them in calculations. When the sna package was instructed to include loops (diag = TRUE) in the calculation of degree centrality, sna counted each loop as one edge and the output was consistent with that of UCINET and ORA. When made to count loops, Gephi counted each loop as two arcs, consistent with Pajek and igraph.

Figure 4: Scatterplot matrix comparing degree centrality output for a small, one-mode network containing loops. Pearson's correlation coefficients between programs are provided above the diagonal.
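The three conventions are easy to verify directly in the two R packages. A small sketch follows; the graph is a toy example, and the expected scores in the comments reflect the behaviors reported above rather than guarantees about every release.

    library(sna)
    library(igraph)

    # Four nodes, one ordinary edge (1-2), and one loop (3-3).
    g   <- make_graph(c(1, 2, 3, 3), n = 4, directed = FALSE)
    adj <- matrix(0, 4, 4)
    adj[1, 2] <- adj[2, 1] <- 1
    adj[3, 3] <- 1

    igraph::degree(g, loops = TRUE)    # igraph/Pajek convention: the loop adds 2 to node 3
    igraph::degree(g, loops = FALSE)   # loop ignored, as in the Gephi and sna defaults
    sna::degree(adj, gmode = "graph", diag = FALSE)  # sna default: loop ignored
    sna::degree(adj, gmode = "graph", diag = TRUE)   # loop counted once, matching UCINET and ORA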
Eigenvector Centrality

Eigenvector centrality was inconsistent across software packages and network types. In networks with no loops or disconnected components, eigenvector centrality measures were inconsistent between Gephi and the other programs in moderately large, but not small, one-mode networks. Changing the default number of iterations in Gephi's eigenvector centrality routine from 100 to 1,000,000 greatly improved the consistency of measures between programs; however, a small disparity remained for one-mode networks (r = 0.9901).

In two-mode networks, igraph, ORA, Gephi, and Pajek's "2-mode important vertices" function produced results that were largely consistent with UCINET's two-mode eigenvector centrality (Figure 5). Pajek's "hubs and authorities" measure (designed for one-mode networks) and sna produced results that are consistent with one another (not shown). In large two-mode networks, however, the output from Gephi was again characterized by some small disparities (Figure 5).

Figure 5: Scatterplot matrix comparing eigenvector centrality output for a large, two-mode network. Pearson's correlation coefficients between programs are provided above the diagonal. Pajek output for this plot was calculated using "important vertices", a two-mode generalization of hubs and authorities.

Networks containing loops, but lacking disconnected components, produced additional variability in measures of eigenvector centrality across software. As observed for degree centrality, the correlation between programs' centrality values was high; however, a separate set of points formed a group off the diagonal. Eigenvector centralities calculated in large, one-mode networks corresponded between UCINET, ORA, igraph, and the "hubs and authorities" measure in Pajek. The result was similar in small one-mode networks. A correspondence between sna (which defaults to ignoring loops) and Gephi was also noted (Figure 6).

The sna "evcent" function offers two additional options for calculating eigenvector centrality. With loops included (the diag = TRUE argument), evcent yielded eigenvector scores that correlated perfectly (r = 1.0) with all other software except Gephi. However, when the presence of loops was combined with the more robust calculation of eigenvector centrality (diag = TRUE, use.eigen = TRUE) described in the user manual, the resulting eigenvector was inversely correlated with the others (sna : other packages, r = -1.0; sna : Gephi, r = -0.89). The variability in eigenvector centrality scores was noted in both large and small networks, but was more pronounced in the former.

Figure 6: Scatterplot matrix of eigenvector centrality output for a small, one-mode network with loops. Pearson's correlation coefficients between programs are provided above the diagonal. Note that the calculations in sna shown here were run using the default argument (diag = FALSE); for additional variations, consult the text.

The igraph package produced eigenvector output that differed slightly from other programs in networks that contain disconnected components, but no loops (small networks r = 0.9890). The disparity was much less pronounced in large networks (r = 0.9997).

The above patterns of inconsistency in calculating eigenvector centrality persisted in networks with both loops and disconnected components. In small networks of this type, Pajek, UCINET, and ORA produced measures that were consistent with one another. Similarly, sna and Gephi produced nearly identical output in small networks with both loops and disconnected components. In larger networks, however, the similarities between sna and Gephi diminished; only Pajek, UCINET, and ORA were highly consistent (see Figure 7).

Figure 7: Scatterplot matrix comparing eigenvector centrality output for a moderately large, one-mode network containing loops and disconnected components. Pearson's correlation coefficients between programs are provided above the diagonal.
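Two of the eigenvector anomalies above have simple numerical explanations. First, eigenvector centrality is commonly computed by power iteration, so a low iteration cap (such as Gephi's default of 100) can return an unconverged, program-dependent vector. Second, an eigenvector is only defined up to sign, so a direct eigendecomposition (as with sna's use.eigen = TRUE) can legitimately return the negated vector, which would appear as the r = -1.0 correlations reported above. A sketch in R; the adjacency matrix is a toy example:

    # Eigenvector centrality by power iteration, with an explicit iteration cap.
    power_iteration <- function(adj, iters = 100) {
      x <- rep(1, nrow(adj))
      for (i in seq_len(iters)) {
        x <- adj %*% x
        x <- x / sqrt(sum(x^2))   # re-normalize at each step
      }
      as.vector(x)
    }

    adj <- matrix(c(0, 1, 1, 0,
                    1, 0, 1, 1,
                    1, 1, 0, 0,
                    0, 1, 0, 0), nrow = 4)   # small symmetric adjacency matrix

    # Too few iterations can leave the vector unconverged; more iterations
    # converge toward the principal eigenvector.
    round(power_iteration(adj, iters = 2), 4)
    round(power_iteration(adj, iters = 1000), 4)

    # A direct eigendecomposition may return v or -v; both are valid
    # eigenvectors, so correlations of exactly -1.0 can arise.
    ev <- eigen(adj)$vectors[, 1]
    cor(power_iteration(adj, iters = 1000), ev)   # +1 or -1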
Betweenness Centrality

Measurement of betweenness centrality was virtually unaffected by the various network conditions evaluated (i.e., loops, disconnected components). Measures were consistent across the tested packages on every dataset (see Figure 8 for an example). The one very slight exception was in UCINET's two-mode measures: UCINET differed slightly from other programs in measuring betweenness in the small two-mode network (r = 0.9996, accompanied by slight jitter in scatterplots; not shown). However, no differences were apparent when the same network was converted to bipartite form and the one-mode variation of the betweenness measure was used instead. For more examples of the differences between UCINET's two-mode measures and other programs' approaches to two-mode networks, see Figure 3.

Figure 8: Scatterplot matrix comparing betweenness centrality output for a large, one-mode network. Pearson's correlation coefficients between programs are provided above the diagonal.

Discussion

This study was designed to examine basic reliability concerns among a selection of popular tools in use within the social network analysis community. Specifically, we investigated whether some popular software packages were producing equivalent results. We found variability that brings to light disagreement, sometimes substantial, over how four concepts of node centrality should be measured. The programs under consideration were able to produce the same output only under a very narrow set of conditions. Disagreements over how these measures should be operationalized manifested as networks departed from the ideal reference graphs that contained no loops or disconnected components. Such variability precludes the ability to seamlessly port data and/or exchange measures between programs, and makes it essential for the user to have access to evaluations that highlight differences between the default, and available, options for various measures when using two or more programs in concert. Within the social network analysis community, the differing assumptions behind the various measurement variations unnecessarily cloud communication between users of different programs and leave enough doubt in the minds of new entrants as to whether the community has unified its language. Below, we discuss in greater detail our interpretation of the results, the implications of our findings for the average user, and the implications for the social network analysis community.

General findings

By employing hierarchical subsets of network conditions, we isolated measure differences under specific conditions. The use of varying network conditions was intended to better reflect the range of network data likely to be encountered. Conditions in the undirected networks ranged from the "ideal" of reference data (no loops or disconnected components) to scenarios commonly encountered when analyzing social networks, namely loops, disconnected components, and the combination of the two.

In general, centrality measures for reference graphs were largely consistent. This may be taken to imply that programs are implementing the same, or very similar, algorithms for the offered measures, albeit mainly in the absence of features that may complicate calculation, such as loops and disconnected components. A notable inconsistency did, however, arise in the analysis of the two-mode reference networks. Of the tested software, only UCINET offered measures tailored specifically for two-mode networks. Correspondingly, analytic results reveal a bifurcated pattern when comparing calculations of degree and closeness in UCINET to those of other software packages, along with slight differences in betweenness measures in small two-mode networks. No other programs demonstrated a pattern corresponding to that of UCINET when measuring degree, closeness, or betweenness. When measuring eigenvector centrality, however, three additional programs (igraph, ORA, and Gephi) evince a bifurcated pattern that corresponds very closely with UCINET (Figure 9).
Figure 9: Scatterplot matrices comparing degree and eigenvector output for a two-mode network. Neither network contains loops or disconnected components. Pearson's correlation coefficients between programs are provided above the diagonal.

There was a surprising amount of inconsistency in the most basic measure: degree centrality. Programs' definitions identified degree centrality as either the number of neighbors adjacent to a node or the number of edges incident upon a node. However, many programs did not provide a citation for this measure. Among those that do, the Freeman (1979) definition is employed, which does not account for a topological feature that is common in biological, corporate, citation, and other networks: self-referencing, or loops. Freeman implements a variation on Nieminen (1974), which accounts for the number or proportion of other nodes that are adjacent to a particular node, but not for nodes that are adjacent to themselves (i.e., loops). The evaluated programs defaulted to three different methods for dealing with loops in a graph, revealing the variation in degree centrality calculations. This disparity between definitions of what constitutes a loop, and of how its effect should be measured in such a conceptually simple calculation, suggests the need for the community of social network analysts to strengthen naming and measurement standards. Such a process may reduce error in interpretations resulting from what are actually different measures residing under the same name.

The problem of different measures residing under the same name is exemplified by eigenvector centrality. Although most programs identify this measure as eigenvector centrality, the presence of loops in a network reveals slight differences in how it is operationalized. In essence, eigenvector centrality extends degree centrality by weighting each node's score by its neighbors' scores (Bonacich 1972). Like degree, it is affected by loops. Five of the tested programs cite Bonacich (1972, or 1987) in calculations of this measure, and one, Pajek, employs the analogous "hubs and authorities" (Kleinberg 1999). As with degree centrality, the presence of loops in a network reveals which programs operationalized this measure using the same or substantially similar assumptions.

Three programs (UCINET, ORA, and Pajek) were consistent with one another in measuring eigenvector centrality in all three variations of one-mode networks. This is notable because one of the three, Pajek, provides "hubs and authorities", which generates two independently scaled measurement vectors, of which the authorities vector was consistent with eigenvector centrality in the other two programs. This is the one case where a centrality measurement that differed from the classic citation was identified under a different name.

Perhaps the most conspicuous case of inconsistent calculation is closeness centrality in the presence of disconnected components. All programs evaluated referenced Freeman's (1979) measure, with the exception of Pajek, which cites Sabidussi (1966), as cited in Freeman. However, it quickly becomes apparent that the question of how to operationalize closeness centrality is neither agreed upon nor settled where disconnected components are concerned.
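The disagreement is easy to reproduce with the two R packages. In the sketch below (a toy five-node graph with two components), igraph returns a numeric score for every node (the release tested above substituted the number of nodes for the undefined distances, though its handling has varied across versions), while sna, as reported above, declines to score a disconnected graph; tryCatch() is used so that the script records rather than halts on sna's response.

    library(sna)
    library(igraph)

    # Two components: a path 1-2-3 and a separate dyad 4-5.
    g   <- make_graph(c(1, 2, 2, 3, 4, 5), n = 5, directed = FALSE)
    adj <- as_adjacency_matrix(g, sparse = FALSE)

    igraph::closeness(g)   # numeric output despite the undefined distances

    # sna follows Freeman (1979) strictly; capture its warning or error
    # instead of letting it stop the script.
    tryCatch(sna::closeness(adj, gmode = "graph"),
             warning = function(w) w, error = function(e) e)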
The original formula for closeness centrality cannot function with disconnected data, since the distance between disconnected components is undefined (Freeman 1979). Any means of dealing with disconnected data, with the possible exception of running calculations only within each individual component, is therefore a later variation on the Freeman formulation. Isolates and other disconnected components, which can be common network features in some areas, frequently present an obstacle to communicating the results of this measure. A wide array of alternate measures has since been proposed to allow the calculation of closeness with disconnected data (e.g., Borgatti 2006, Dangalchev 2006, Opsahl 2010, Wei et al. 2011). Amid such proliferation, however, it is unclear which forms have been incorporated in the software yielding results for disconnected datasets.

Only the R package sna produced an error message without numerical results, rather than closeness values, as stipulated in Freeman (1979), because it treats the distances between disconnected components as infinite. There is also a stern admonishment in the package documentation against calculating closeness centrality in networks with disconnected components. All other software produced closeness calculations without requiring that smaller components first be removed.

The analysis of disconnected networks using closeness produced widely varied output. Correspondingly, all tested software provided some means for dealing with disconnected components. In most software, the method for defining the distance between disconnected components was built into the measure. ORA and the R package igraph appear to default to substituting the number of nodes for undefined distances, whereas Gephi appears to report undefined distances as zero and to omit disconnected nodes from calculation. Pajek sets undefined distances to zero and calculates closeness only within each component.

Of the software tested, UCINET offers the most options for applying closeness centrality to one-mode networks with disconnected components. The user may choose one of four options for dealing with the distances between disconnected components: (1) substitute the number of nodes in the graph for the undefined distance; (2) substitute the maximum observed distance plus one (the default setting); (3) treat undefined distances as missing and assign no value to isolates; or (4) set undefined distances to zero and calculate closeness only within each component. Those four options, combined with three options for scaling output (summed distances, averaged distances, and Freeman normalization), present the user with 10 combinations of options for calculating closeness in cases with disconnected components (the option of treating undefined distances as missing is scaled in only one manner).
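To make the substitution rules concrete, the sketch below implements three of them over an igraph distance matrix. The function name and structure are illustrative approximations of the options described above, not UCINET's actual code; the within-component option is omitted because it additionally rescales by component size.

    library(igraph)

    closeness_by_rule <- function(g, rule = c("n", "maxplus1", "missing")) {
      rule <- match.arg(rule)
      d <- distances(g)                      # Inf marks undefined distances
      if (rule == "n")        d[is.infinite(d)] <- vcount(g)
      if (rule == "maxplus1") d[is.infinite(d)] <- max(d[is.finite(d)]) + 1
      if (rule == "missing")  d[is.infinite(d)] <- NA
      diag(d) <- NA                          # exclude each node's distance to itself
      (vcount(g) - 1) / rowSums(d, na.rm = TRUE)   # Freeman normalization
    }

    g <- make_graph(c(1, 2, 2, 3, 4, 5), n = 5, directed = FALSE)
    closeness_by_rule(g, "n")          # substitute the number of nodes
    closeness_by_rule(g, "maxplus1")   # substitute the maximum distance + 1
    closeness_by_rule(g, "missing")    # drop undefined distances from the sum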
Aside from differences in how output was scaled, there was essentially no variation in the calculation of betweenness centrality. This is perhaps unsurprising, as there is little room for interpretation in the definition of this measure. Betweenness, a normalized count of the number of times that a node appears on the shortest path between any two other nodes (Freeman 1977), will generally be unaffected by disconnected components and loops: loops do not create new geodesics (shortest paths), and the absence of paths between disconnected components does not complicate geodesic counts.

It bears repeating that the programs tested displayed relatively little variation when analyzing reference graphs, those with neither loops nor disconnected components. What variation was present in the centrality output from the reference graphs appears more likely to have arisen from differences of opinion on preferred methods of calculation in situations that vary from the ideal of connected one-mode graphs with non-recursive edges.

Implications for the field user

The reference graphs make it clear that, with only a few exceptions, those responsible for developing and maintaining each of these programs have done an admirable job of benchmarking their programs against others and correcting unintentional software differences. However, network topology that diverges from the "ideal" reference graph reveals a great deal of disparity in the analytic assumptions built into the software used to calculate such measures. The resulting lack of understanding and agreement within the social network analysis community puts both analysts and the field itself at a disadvantage by introducing unnecessary noise into analyses and communication.

Certainly, for those analysts whose data are similar to our reference graphs (i.e., no loops, no isolates or other disconnected components), the low variability in measurement definition and implementation is good news. The lack of variation in the output for the four reference graphs indicates that the programs used in this study agree in their standards for the calculation of basic centrality measures under the most basic and favorable conditions. The differences that resulted from other conditions, however, underscore the importance of an analyst's familiarity with their choice of software, and with the software used by those whose work they wish to use as an analytic benchmark. A good deal of care should be exercised to verify the precise method of calculation being applied and the settings, including defaults, that were employed for those calculations.

Centrality measures typically form the foundation of an analysis, and if their implementation varies, more complex algorithms that involve one or more of these centrality measures may produce results that magnify this variability. Unfortunately, the variation in the measurement of centrality values between programs remains somewhat opaque. Measurement disparities were observed even when the terms and citations used to identify the measures were identical between programs.

Clearly, analyzing the same network in the same way, using different software, can produce divergent results. If the implementation of a given centrality measure differs from one program to another, the two implementations are at best two different variants of the same measure. If such is the case, it will aid the analyst to know which variant they are utilizing.
With its variety of measures and selections, UCINET goes the furthest of the software tested in identifying which variant of a particular measure it employs, naming particular variants of the same measure according to their originators or an intuitive description of their function. Only a few of the tested analytic tools were consistently explicit about the equations used to produce all four measures. The clarity in communication that has characterized the development of methods in the field of social network analysis is less evident in software operationalization. This presents a threat to the validity of how those measures are employed. Definitional differences between programs exist and are not readily apparent to the average user. Although variations in centrality calculations hold the potential to increase the validity of a particular measure when applied in the appropriate context, such differences are frequently masked from the general user, resulting in an increased potential for the misapplication of measures.

The present research has highlighted the value of knowing what one is getting into when considering new analytic software, and the importance of thoroughly vetting the topology of the network being analyzed. Add to that the tendency for most software packages to have some provision for porting data and/or measures between packages. The disparity detected in the measures available in the current analytic packages indicates that such practices should be undertaken with caution, especially in cases where graphs contain loops, isolates, or disconnected components. If the basic measures differ between packages, then it may be inadvisable to use the two packages together in an analysis that involves those measures.

Implications for the social network analysis community

The lack of consensus over how to operationalize the most common node centrality measures suggests some ontological variability within the social network analysis community. Disagreements over how various centrality measures should be operationalized would not be troubling if they were apparent to the user. But the differences highlighted above are far from clear to the average user. Lack of agreement over how to operationalize a measure is masked when a variety of approaches share a single name. The situation is exacerbated when software documentation does not clarify precisely which approach has been implemented.

The debate over how various network measures should be calculated is rich, and as old as the field itself. The community's openness to new variations on established methods provides flexibility and a healthy diversity of analytic options. However, the advantages of such wealth are substantially diminished when the same measure is operationalized differently in each analytic package. Although a shared lexicon of terms and concepts exists within the social network analysis community, those terms and concepts are only generally, not explicitly, applied. The programs used to perform social network analysis are disparate enough to create idiosyncratic analytic results. The interfaces of the analytic packages vary greatly and do not always default to the most commonly used variants of each measure.
Without easily accessible equivalence of measurement assumptions and nomenclature between programs, the assumption of equivalence and portability of centrality measures in network analysis does not hold. The increased variability of centrality results may in turn affect more complex algorithms that incorporate these basic measures. These basic differences could be resolved with agreed-upon defaults and naming conventions for variants of a particular measure or algorithm.

It is important that the variants of each measure be identified as distinct variations on a centrality (or other measurement) theme. It is not enough to identify a measure generally as "closeness centrality" if it varies from the basic measure identified by Freeman (or Sabidussi), as most do. Instead, the measure should be explicitly identified as a particular variant in order to better emphasize its unique attributes and trade-offs.

Explicit descriptions of measurements can be essential to proper analysis. To draw an example from another analytic field: post-hoc tests for pairwise comparisons of means following analysis of variance have been developed to address variations in hypothesis testing, trade-offs between power and error, unequal sample sizes, and unequal variances (for a discussion of 22 post-hoc tests, see Kirk 1995). Although the more casual user may find such a selection daunting, the strength in this diversity of options is that the user may better consider and tailor their analytic selections. Additionally, and perhaps more importantly, small differences between variations on a measure become a feature, rather than an obstacle, when a particular form of a measurement is explicitly named.

Naming conventions are important. If a measure or algorithm differs from others in order to address some given scenario or feature of network topology, then it bears unique attributes that more often than not constitute a trade-off at some level. It is far better for both the analyst and the community for these differences to be made plain. Of the tested programs, UCINET appears to have gone the furthest in giving attribution to the different variants of each measure, though both R packages benefit from the explicit nature of specifying a measurement in code, as illustrated below. This aids the analyst by further clarifying differences between analytic approaches, and it aids the community in establishing the reliability of measurements between programs, since such clarity makes it much easier to directly compare results from different programs.

It is not necessary for each program to offer every available option, though several have clearly taken steps in that direction. It is likely to be much more helpful to the social network analysis community at large if the measures and algorithms that a program offers are fully identified for appropriate application of their properties. Proper identification will simplify discourse and improve the communication of methods. Such improvements in communication within the field also translate into increases in measurement validity when a measure is identified as a specific variant, rather than as merely belonging to a general category or class of measures.
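In the R packages, for instance, the variant computed is fixed by the arguments supplied, so reporting the full call removes ambiguity about which measure was used. A brief illustrative sketch; the data and argument values are placeholders:

    library(sna)

    adj <- rgraph(35, tprob = 0.1, mode = "graph")   # illustrative undirected graph

    deg <- sna::degree(adj,
                       gmode   = "graph",   # treat edges as undirected
                       diag    = FALSE,     # exclude loops: the Freeman/Nieminen variant
                       rescale = FALSE)     # raw, unnormalized scores

    # Publishing the call itself (e.g., in an appendix) documents exactly
    # which "degree centrality" the reported values represent.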
Lastly, it should be noted that a lack of agreement within the community on something as fundamental as naming conventions hints at an arbitrariness that is surprising, given the care and rigor of those who have established and expanded the field of social network analysis. Freeman (1984, 2004) has repeatedly made a compelling case that clarity and precision in communication, as facilitated by mathematical notation, set social network analysis apart from similar fields that rely more on natural language for clarification. The benefits of such precision are, however, often frustratingly beyond the reach of those using social network analysis software.

Further, as researchers from other fields continue to adapt and adopt social network analytic methods, the use of standard, specific terms for the available analytic options provides a clarity that helps newcomers see how their fields can benefit from adopting a network analytic approach. The converse is not the case: imprecise definitions need not constitute an invitation for some within the "hard sciences" to forego established network analysis methods in favor of feigning to invent them for themselves. Although co-option will likely continue, there has been an increase in the number of new entrants to social network analysis who give proper attribution (Freeman 2011). Clarity of communication will reinforce social network analysis as a mature and growing field, and de-emphasize the perception of it as a general perspective or a mere category of tools (Knoke and Yang 2008, Snowden 2005).

In most cases, it is possible to force the centrality output for a network containing loops or disconnected components to be relatively consistent across all six platforms employed above. But such actions frequently require transformations or other preprocessing, and those steps are seldom stipulated, since there is no agreed-upon definition of exactly which mathematical approach constitutes each type of centrality. What we feel is needed is the clarity that comes with definitional consistency between programs: dissimilar means to a particular end should be clearly identified up front.
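The preprocessing in question typically reduces a network to the reference topology on which the tested programs already agree. A minimal sketch in R with igraph; the graph is a toy example:

    library(igraph)

    # A network with a loop on node 3, a separate dyad (4-5), and an isolate (6).
    g <- make_graph(c(1, 2, 2, 3, 3, 1, 3, 3, 4, 5), n = 6, directed = FALSE)

    # Remove loops, then restrict the graph to its largest component.
    g_clean <- simplify(g, remove.loops = TRUE, remove.multiple = FALSE)
    comp    <- components(g_clean)
    g_main  <- induced_subgraph(g_clean,
                                which(comp$membership == which.max(comp$csize)))

    # Centralities computed on g_main should now agree across programs, at
    # the cost of silently changing which network is actually being measured.
    igraph::degree(g_main)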
The "correct" measure is the one best suited to handle the idiosyncrasies of the data an analyst holds. For the analyst to make this assessment, they first need to know the topology of the network they are analyzing, and then specifically how a measure is meant to operate and the assumptions underlying it. A more complete approach includes asking which variations of the measure are available, the strengths and limitations of each version, and how reliably one or more programs produce accurate measures. We have identified program and inter-program reliability issues under varied conditions. Similar comparisons between other programs, and under different conditions, are strongly recommended when weighing whether to use two or more analytic programs in conjunction with one another. Further evaluations of inter-program reliability will benefit from adding more types of variation: e.g., directed graphs, density variations, clusterability variations. Ongoing research on this topic will remain important as new entrants continue to discover the scalability and utility of the tools and concepts of social network analysis for deciphering increasingly diverse networks with complex topological features.

References

Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., & Sorensen, D. (1999). LAPACK Users' Guide (3rd ed.). Philadelphia, PA: SIAM.

Batagelj, V., & Mrvar, A. (2003). Pajek – Analysis and Visualization of Large Networks. University of Ljubljana, Slovenia. Available at: http://pajek.imfm.si/doku.php?id=download

Bastian, M., Heymann, S., & Jacomy, M. (2009). Gephi: an open source software for exploring and manipulating networks. International AAAI Conference on Weblogs and Social Media. Available at: https://gephi.org

Becker, R.A., Chambers, J.M., & Wilks, A.R. (1988). The New S Language. Pacific Grove, CA: Brooks/Cole.

Bonacich, P. (1972). Factoring and weighting approaches to status scores and clique identification. Journal of Mathematical Sociology, 2, 113–120.

Bonacich, P. (1987). Power and centrality: A family of measures. American Journal of Sociology, 92, 1170–1182.

Borgatti, S.P., Everett, M.G., & Freeman, L.C. (2002). Ucinet for Windows: Software for Social Network Analysis. Harvard, MA: Analytic Technologies. Available at: http://www.analytictech.com/ucinet.htm

Borgatti, S.P. (2006). Identifying sets of key players in a network. Computational, Mathematical and Organizational Theory, 12(1), 21–34.

Butts, C.T. (2007). sna: Tools for Social Network Analysis. Version 2.2. Irvine, CA. Available at: http://erzuli.ss.uci.edu/R.stuff

Carley, K., & Reminga, J. (2004). ORA: Organization Risk Analyzer. Center for Computational Analysis of Social and Organizational Systems, Carnegie Mellon University, Pittsburgh, PA. Available at: http://www.casos.cs.cmu.edu/projects/ora

Csardi, G., & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695. Available at: http://igraph.sourceforge.net

Dangalchev, C. (2006). Residual closeness in networks. Physica A, 365(2), 556–564.

Doreian, P. (2001). Causality in social network analysis. Sociological Methods & Research, 30(1), 81–114.

Freeman, L.C. (1977). A set of measures of centrality based on betweenness. Sociometry, 40, 35–41.

Freeman, L.C. (1979). Centrality in social networks: Conceptual clarification. Social Networks, 1, 215–239.

Freeman, L.C. (1984). Turning a profit from mathematics: the case of social networks. Journal of Mathematical Sociology, 10, 343–360.

Freeman, L.C. (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. North Charleston, SC: BookSurge.

Freeman, L.C. (2011). The development of social network analysis: with an emphasis on recent events. In J. Scott & P.J. Carrington (Eds.), The Sage Handbook of Social Network Analysis (pp. 26–39). Thousand Oaks, CA: Sage.

Kirk, R.E. (1995). Experimental Design: Procedures for the Behavioral Sciences (3rd ed.). Pacific Grove, CA: Brooks/Cole.

Kleinberg, J. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604–632.

Knoke, D., & Yang, S. (2008). Social Network Analysis (2nd ed.). Los Angeles, CA: Sage.

Nieminen, J. (1974). On centrality in a graph. Social Science Research, 2, 371–378.

Opsahl, T. (2010, March 20). Closeness centrality in networks with disconnected components. Tore Opsahl [blog].
Retrieved November 21, 2011, from http://toreopsahl.com/2010/03/20/closeness-centrality-in-networks-with-disconnected-components/

Otte, E., & Rousseau, R. (2002). Social network analysis: a powerful strategy, also for the information sciences. Journal of Information Science, 28, 441–453.

Reminga, J., & Carley, K. (2003). Measures for ORA (Organization Risk Analyzer). CASOS Working Papers. Retrieved November 21, 2011, from http://www.casos.cs.cmu.edu/publications/papers/reminga_2003_ora.pdf

Sabidussi, G. (1966). The centrality index of a graph. Psychometrika, 31, 581–603.

Smith, B.T., Boyle, J.M., Dongarra, J.J., Garbow, B.S., Ikebe, Y., Klema, V., & Moler, C.B. (1976). Matrix Eigensystem Routines – EISPACK Guide. Lecture Notes in Computer Science, 6. Berlin: Springer-Verlag.

Snowden, D. (2005). From atomism to networks in social systems. The Learning Organization, 12(6), 552–562.

Stokman, F.N., & Doreian, P. (1997). Evolution of social networks: processes and principles. In P. Doreian & F.N. Stokman (Eds.), Evolution of Social Networks (pp. 233–250). Amsterdam: Gordon and Breach.

Valente, T.W., Coronges, K., Lakon, C., & Costenbader, E. (2008). How correlated are network centrality measures? Connections, 28(1), 16–26.

Wei, W., Pfeffer, J., Reminga, J., & Carley, K. (2011). Handling Weighted, Asymmetric, Self-Looped and Disconnected Networks. CASOS Technical Report CMU-ISR-11-113. Retrieved November 21, 2011, from http://www.casos.cs.cmu.edu/publications/papers/CMU-ISR-11-113.pdf

Wikipedia (2011). Eigenvector centrality. In: Centrality. Retrieved November 21, 2011, from http://en.wikipedia.org/wiki/Betweenness#Eigenvector_centrality

Wilkinson, J.H. (1965). The Algebraic Eigenvalue Problem. New York, NY: Oxford University Press.