© 2018 Authors. This work is licensed under the Creative Commons 
Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/)

CONNECTIONS
Issue 1 | Vol. 38
Article | DOI: 10.21307/connections-2018-002

Techniques: Dichotomizing a Network

Abstract
This techniques guide provides a brief answer to the question: How 
to choose a dichotomization threshold? We propose a two-step 
approach to selecting a dichotomization threshold. We illustrate the 
approach using two datasets and provide instructions on how to 
carry it out in R and UCINET.

Keywords
Techniques, Dichotomization.

There are many reasons to dichotomize valued 
network data. It might be for methodological rea-
sons, for example, in order to use a graph-theoret-
ic concept such as a clique or an n-clan, or to use 
methods such as ERGMs or SAOMs, which large-
ly assume binary data1. There is also the matter of 
visualizing networks, where fewer ties often yield a 
considerably more readable picture. It could also be 
for theoretical reasons, for example, in order to distinguish 
between positive and negative ties: tie 
strength or valence is often captured using a single 
scale, which then needs to be dichotomized in order 
to match the theory. Finally, we might be engaging 
in a certain kind of data smoothing: we have collect-
ed data at fine levels of differences in the strength of 
tie, but are not confident that small differences are 
meaningful. We have greater confidence in a few big 
buckets such as strong and weak than in 100 gradu-
ations of strength.

Whatever the reason, if we are going to dichotomize, 
the question is at what level we should dichotomize. 
In some cases, the choice is guided 
by theoretical meaningfulness and the research de-
sign. For example, suppose respondents are asked 
to rate others on a scale of 1 = do not know them, 
2 = acquaintance, 3 = friend, and 4 = family. We see 
there is a loose gradation from “does not know” to 
“knows well”; however, categories 3 and 4 represent 
not so much degrees of closeness as different 
kinds of social relations. The choice of which to use 
is determined by the research question. A similar ex-
ample is provided by questions that ask for a range 
of effects from negative to positive. If respondents 
are asked to rate others on a scale of 1 = dislike a 
lot, 2 = dislike somewhat, 3 = neither like nor dislike, 
4 = like somewhat, and 5 = like a lot, for many analyses, 
it will make sense to choose a cutoff of >3 or 
>4 for positive ties and <3 or <2 for negative ties. 
Note that in both of the last examples, we are still 
confronted with a choice of two values to choose 
from. In addition, if the scale points are more ambig-
uous than the ones above, or if the data are counts 
or rankings, then there is likely no a priori way of de-
ciding where to dichotomize.
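
As a minimal sketch of the theory-driven case, assume a small matrix of ratings on the 1 to 5 like/dislike scale described above. The ratings and the object names (ratings, positive, negative) are invented for illustration, and the cutoffs >3 and <3 are just one of the defensible choices.

```r
# Hypothetical ratings among four people (toy data, not from any study):
# 1 = dislike a lot ... 3 = neutral ... 5 = like a lot; 0 on the diagonal
ratings <- matrix(c(
  0, 5, 2, 3,
  4, 0, 1, 3,
  2, 2, 0, 5,
  3, 4, 4, 0
), nrow = 4, byrow = TRUE,
dimnames = list(c("A", "B", "C", "D"), c("A", "B", "C", "D")))

# Positive ties: ratings above the neutral midpoint
positive <- (ratings > 3) * 1
# Negative ties: ratings below the neutral midpoint
negative <- (ratings < 3) * 1
# The diagonal carries no rating, so clear it in both networks
diag(positive) <- 0
diag(negative) <- 0

sum(positive)  # number of positive ties
sum(negative)  # number of negative ties
```

Neutral ratings end up in neither network, which is the point of treating the two ends of the scale separately.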

Here, we propose a two-step approach to di-
chotomizing. Step 1 is to simply dichotomize at every 
level (or a collection of k bins) and examine the net-
work produced at each level. Step 2 is to use simple 
analytics in order to obtain an informed rationale for a 
specific dichotomization threshold that makes sense 
for a given data set.

Step 1

For step 1, input your valued network into your favorite 
network data management software and dichotomize 
at every level of the scale (see insert for information 
about how to do this in R and in UCINET). We rec-
ommend always spending some time visualizing the 

Stephen P. Borgatti1* and  
Eric Quintane2

1University of Kentucky, Gatton 
College of Business & Economics, 
Lexington, KY.
2School of Management, The 
University of Los Andes, Bogota, 
Colombia.

*E-mail: sborgatti@uky.edu.

1There are, of course, many methods that do not require di-
chotomization. For example, we do not need to dichotomize 
in order to measure eigenvector centrality, nor to apply the 
relational event model (Butts, 2008).



Table 1. One-mode DGG Women by Women network projection.

          EV LA TH BR CH FR EL PE RU VE MY KA SY NO HE DO OL FL
EVELYN     8  6  7  6  3  4  3  3  3  2  2  2  2  2  1  2  1  1
LAURA      6  7  6  6  3  4  4  2  3  2  1  1  2  2  2  1  0  0
THERESA    7  6  8  6  4  4  4  3  4  3  2  2  3  3  2  2  1  1
BRENDA     6  6  6  7  4  4  4  2  3  2  1  1  2  2  2  1  0  0
CHARLOTTE  3  3  4  4  4  2  2  0  2  1  0  0  1  1  1  0  0  0
FRANCES    4  4  4  4  2  4  3  2  2  1  1  1  1  1  1  1  0  0
ELEANOR    3  4  4  4  2  3  4  2  3  2  1  1  2  2  2  1  0  0
PEARL      3  2  3  2  0  2  2  3  2  2  2  2  2  2  1  2  1  1
RUTH       3  3  4  3  2  2  3  2  4  3  2  2  3  2  2  2  1  1
VERNE      2  2  3  2  1  1  2  2  3  4  3  3  4  3  3  2  1  1
MYRNA      2  1  2  1  0  1  1  2  2  3  4  4  4  3  3  2  1  1
KATHERINE  2  1  2  1  0  1  1  2  2  3  4  6  6  5  3  2  1  1
SYLVIA     2  2  3  2  1  1  2  2  3  4  4  6  7  6  4  2  1  1
NORA       2  2  3  2  1  1  2  2  2  3  3  5  6  8  4  1  2  2
HELEN      1  2  2  2  1  1  2  1  2  3  3  3  4  4  5  1  1  1
DOROTHY    2  1  2  1  0  1  1  2  2  2  2  2  2  1  1  2  1  1
OLIVIA     1  0  1  0  0  0  0  1  1  1  1  1  1  2  1  1  2  2
FLORA      1  0  1  0  0  0  0  1  1  1  1  1  1  2  1  1  2  2

Figure 1: DGG Women by Women dataset dichotomized above 1.



networks, which can be very informative regarding 
the emergence of clusters at certain levels of dichot-
omization. For example, consider Davis et al.’s (1941) 
women-by-events data (often referred to as the Davis 
data set or the DGG data). We construct a 1-mode 
women-by-women network by multiplying the original 
by its transpose. The result is shown in Table 1.
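
The projection step can be sketched on a tiny example. The attendance matrix below is hypothetical (three women, four events) and the object names are ours; the point is only to show what multiplying a two-mode matrix by its transpose produces.

```r
# Hypothetical women-by-events attendance matrix (1 = attended)
attend <- matrix(c(
  1, 1, 1, 0,
  1, 1, 0, 0,
  0, 0, 1, 1
), nrow = 3, byrow = TRUE,
dimnames = list(c("W1", "W2", "W3"), paste0("E", 1:4)))

# Cell (i, j) of the product counts the events that i and j both attended
co <- attend %*% t(attend)
co

# The diagonal counts each woman's own attendance, as in Table 1
diag(co)
```

Whether the diagonal is kept (as in Table 1) or set to zero before dichotomizing is a separate decision; it affects any statistic that is computed over all cells.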

If we dichotomize at >1 and visualize, we get  
Figure 1.

If we dichotomize at >2, we get Figure 2.
And if we dichotomize at >3, we get Figure 3.
Thus, the successive dichotomizations reveal 
a 2-group structure, which is illuminating2. In other 
networks, successive dichotomization confirms a 
core/periphery structure. For example, the BKFRAT 
data set (Bernard et al., 1980) gives the number of 
times each pair of actors was seen interacting by an  
observer. The values range from 0 to 51. If we dichot-
omize at > 0, we get Figure 4.

If we dichotomize at > 2, we get Figure 5.
Dichotomizing at > 4, we get Figure 6.
Dichotomizing at > 6, we get Figure 7.
And so on. Core-periphery structures have a kind 

of self-similarity property where the main component 
always looks the same regardless of what level of  
dichotomization produced it.
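
Visual inspection can be complemented by a simple count of components at each level. The following is a sketch in base R on a hypothetical six-node valued network with two tight clusters; the matrix and the helper count_components() are ours and are not part of any package.

```r
# Hypothetical symmetric valued network (toy data)
w <- matrix(c(
  0, 5, 4, 1, 0, 0,
  5, 0, 5, 0, 1, 0,
  4, 5, 0, 2, 0, 0,
  1, 0, 2, 0, 4, 5,
  0, 1, 0, 4, 0, 4,
  0, 0, 0, 5, 4, 0
), nrow = 6, byrow = TRUE)

# Count connected components of a binary symmetric adjacency matrix
# using a breadth-first search from every node not yet labelled
count_components <- function(adj) {
  n <- nrow(adj)
  label <- rep(0, n)
  k <- 0
  for (start in seq_len(n)) {
    if (label[start] != 0) next
    k <- k + 1
    label[start] <- k
    queue <- start
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      nb <- which(adj[v, ] == 1 & label == 0)
      label[nb] <- k
      queue <- c(queue, nb)
    }
  }
  k
}

thresholds <- 1:max(w)
components <- sapply(thresholds, function(level) {
  count_components((w >= level) * 1)
})
data.frame(threshold = thresholds, components = components)
```

In this toy network the count moves from one component to two and then to three as the threshold rises, which is the kind of jump that signals a subgroup structure worth inspecting in a drawing.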

Step 2 (three approaches)

Now, successive dichotomizations are informative, 
but our original question was about choosing a sin-
gle dichotomization that would be used in all further 
analyses, which is where step 2 becomes important. 
For step 2, we present three potential approaches. 
The first will horrify some people. This approach is to 
choose the level of dichotomization that maximizes 
your results. For example, suppose you are predicting 
managers’ performance as a function of between-
ness centrality. For each possible level of dichoto-
mization, you measure betweenness centrality and  
regress performance on betweenness, along with 
any control variables. The level of dichotomization 
that yields the highest R² is the one you choose.
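
A minimal sketch of this loop in base R follows. Everything in it is simulated toy data, the variable names (performance, tenure) are hypothetical, and degree is used as a stand-in for betweenness so that the example needs no network package; in practice one would substitute a betweenness routine from the package of one's choice.

```r
set.seed(1)  # toy data only; every value below is simulated
n <- 12

# Hypothetical symmetric valued network with tie strengths 0..5
w <- matrix(sample(0:5, n * n, replace = TRUE), n, n)
w[lower.tri(w)] <- t(w)[lower.tri(w)]
diag(w) <- 0

# Hypothetical outcome and one control variable
performance <- rnorm(n)
tenure <- rnorm(n)

r2_by_level <- sapply(1:max(w), function(level) {
  binary <- (w >= level) * 1
  # Degree is a simple stand-in here, NOT betweenness
  centrality <- rowSums(binary)
  fit <- lm(performance ~ centrality + tenure)
  summary(fit)$r.squared
})

r2_by_level
which.max(r2_by_level)  # level with the highest R-squared
```

Because the outcome here is pure noise, whichever level comes out on top is meaningless, which is a small demonstration of the overfitting risk.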

As we said, some people (scientists, statisticians, 
and people of good character) will be horrified3. There 
is definitely a danger of overfitting. The predictions 
work really well for this one data set, but perhaps not 

Figure 2: DGG Women by Women dataset dichotomized above 2.

2However, this should not be taken as definitive. Various 
normalizations of the data, as well as bipartite representa-
tions, tend to show a third smaller subgroup. See Freeman 
(2003) for a related discussion.

3On the other hand, these same people are happy to use 
regression to find the optimal coefficients to show a relationship 
between their explanatory variable and a dependent 
variable. Perhaps we should ask them to choose the coefficients 
a priori on the basis of strong theory.
4Of course, if you have these other datasets on hand, then 
you could pick the level of dichotomization that yields the 
highest average R² across all datasets. The same applies 
if you have multiple DVs and IVs: you pick the level of 
dichotomization that gives the best results across all datasets, 
DVs, and IVs.



for others4. The other issue is that the particular 
dichotomization value that scores highest may be an 
odd value that you cannot explain. For example, sup-
pose we carry out this procedure and get the results 
shown in Table 2.

Clearly, we would choose 5, but how to make 
sense of these results? They rise and fall with no rhyme 
or reason. In this case, we would strongly advise 
against taking this approach. On the other hand, if the 
results were something like those presented in Table 3, 
we would be comforted by the underlying regularity 
and feel good about choosing 5, even though 
we might be hard-pressed to explain why medium 
density worked best.
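
The contrast between the two tables can be checked mechanically. The sketch below copies the R² values from Tables 2 and 3 and tests whether each profile rises to a single peak and then falls; the helper is_single_peaked() is ours, and single-peakedness is only one possible way to formalize "underlying regularity".

```r
# R-squared values by dichotomization level, copied from Tables 2 and 3
levels_tried <- 1:9
r2_table2 <- c(0.05, 0.29, 0.02, 0.01, 0.31, 0.06, 0.11, 0.02, 0.23)
r2_table3 <- c(0.05, 0.09, 0.12, 0.23, 0.31, 0.27, 0.22, 0.15, 0.07)

# TRUE when the values never fall before the maximum and never rise after it
is_single_peaked <- function(x) {
  peak <- which.max(x)
  rising <- all(diff(x[seq_len(peak)]) >= 0)
  falling <- all(diff(x[peak:length(x)]) <= 0)
  rising && falling
}

levels_tried[which.max(r2_table2)]  # 5
is_single_peaked(r2_table2)         # FALSE: erratic profile
levels_tried[which.max(r2_table3)]  # 5
is_single_peaked(r2_table3)         # TRUE: smooth rise and fall
```

Both tables point to level 5, but only the second profile would give us any confidence in that choice.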

A slightly less controversial version of this ap-
proach might be to choose the dichotomized  
version of your network that maximizes the repli-
cation of results from past studies. For example, 
we know from past studies that actors with higher 
levels of self-monitoring are more likely to receive 

Figure 4: BKS FRATERNITY dataset dichotomized above 0.

Figure 3: DGG Women by Women dataset dichotomized above 3.



more friendship nominations. We could choose 
the dichotomization threshold that maximizes the 
relationship between self-monitoring and new 
friendship nominations, even if the test of our hypothesis 
has to do with betweenness centrality and 
performance.

That was the first approach. The second approach 
is less controversial. Dichotomization, by its very  
nature, is a distortion of the data5. Where once you 
had nuance, you now have just ‘tie’ and ‘no tie.’ 
This does violence to your data. The question is, how 

Figure 5: BKS FRATERNITY dataset dichotomized above 2.

5Clearly, in some cases, distorting the data is what we are 
looking for, for example, when distinguishing between neg-
ative and positive ties. In this case, we should not expect 
the dichotomized data to preserve the properties of the 
original dataset and we should either use a theory- or 
literature-driven approach or revert to approach 1.

Figure 6: BKS FRATERNITY dataset dichotomized above 4.



much? Suppose, as in an analysis of variance, you 
predicted your valued data from your dichotomized 
data. Some cutoff values are going to yield better predictions 
than others. Here is an example using the Davis, 
Gardner, and Gardner women-by-women data. In 
Table 4, the first column is the dichotomization 
value. For example, value 4 means that the data were 
dichotomized at ≥ 4. Dichotomizing at ≥ 4 results in a 
network with 48 ties, which corresponds to a density 
of 0.16. The interesting part is the correlation column, 
which achieves its maximum at ≥ 3 (correlation 0.81). 
The correlation refers to the correlation between the 
original valued matrix and the dichotomized matrix. 
A correlation of 0.81 is extremely high. Yes, the data  

Table 2. R-square of models predicting performance using betweenness 
centrality at different levels of dichotomization.

Dichot. level   R²
1               0.05
2               0.29
3               0.02
4               0.01
5               0.31
6               0.06
7               0.11
8               0.02
9               0.23

Table 3. R-square of models predicting performance using betweenness 
centrality at different levels of dichotomization.

Dichot. level   R²
1               0.05
2               0.09
3               0.12
4               0.23
5               0.31
6               0.27
7               0.22
8               0.15
9               0.07

Figure 7: BKS FRATERNITY dataset dichotomized above 6.



are distorted by dichotomizing, but the dichotomized 
matrix still retains a very high resemblance to the origi-
nal data. We have chosen a level of dichotomization that 
does the least violence to the original data (Table 4).
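
The least-violence criterion can be sketched in base R as follows. The matrix is typed in from Table 1; the object names (women, off, stats) are ours. Using a plain Pearson correlation over the off-diagonal cells is an assumption about how the correlations in Table 4 were obtained, although it appears to agree with them.

```r
# Co-attendance counts typed in from Table 1
labs <- c("EV", "LA", "TH", "BR", "CH", "FR", "EL", "PE", "RU",
          "VE", "MY", "KA", "SY", "NO", "HE", "DO", "OL", "FL")
women <- matrix(c(
  8,6,7,6,3,4,3,3,3,2,2,2,2,2,1,2,1,1,
  6,7,6,6,3,4,4,2,3,2,1,1,2,2,2,1,0,0,
  7,6,8,6,4,4,4,3,4,3,2,2,3,3,2,2,1,1,
  6,6,6,7,4,4,4,2,3,2,1,1,2,2,2,1,0,0,
  3,3,4,4,4,2,2,0,2,1,0,0,1,1,1,0,0,0,
  4,4,4,4,2,4,3,2,2,1,1,1,1,1,1,1,0,0,
  3,4,4,4,2,3,4,2,3,2,1,1,2,2,2,1,0,0,
  3,2,3,2,0,2,2,3,2,2,2,2,2,2,1,2,1,1,
  3,3,4,3,2,2,3,2,4,3,2,2,3,2,2,2,1,1,
  2,2,3,2,1,1,2,2,3,4,3,3,4,3,3,2,1,1,
  2,1,2,1,0,1,1,2,2,3,4,4,4,3,3,2,1,1,
  2,1,2,1,0,1,1,2,2,3,4,6,6,5,3,2,1,1,
  2,2,3,2,1,1,2,2,3,4,4,6,7,6,4,2,1,1,
  2,2,3,2,1,1,2,2,2,3,3,5,6,8,4,1,2,2,
  1,2,2,2,1,1,2,1,2,3,3,3,4,4,5,1,1,1,
  2,1,2,1,0,1,1,2,2,2,2,2,2,1,1,2,1,1,
  1,0,1,0,0,0,0,1,1,1,1,1,1,2,1,1,2,2,
  1,0,1,0,0,0,0,1,1,1,1,1,1,2,1,1,2,2
), nrow = 18, byrow = TRUE, dimnames = list(labs, labs))

# Logical mask that drops the diagonal from every calculation
off <- row(women) != col(women)

stats <- t(sapply(1:max(women[off]), function(level) {
  binary <- (women >= level) * 1
  c(level = level,
    correlation = cor(women[off], binary[off]),
    ties = sum(binary[off]),
    density = mean(binary[off]))
}))
stats

# Level whose dichotomized matrix correlates best with the valued matrix
stats[which.max(stats[, "correlation"]), "level"]
```

The level ≥ 0 row of Table 4 is left out because a matrix of all ones has no variance, so the correlation is undefined there.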

Interestingly, ≥ 3 is the level just below the one at 
which the network splits into two large components 
(along with four isolates). At ≥ 4, the network looks 
as shown in Figure 8.

The third approach is theory based, and can be 
harder to implement. There are certain cases where 
we can use the emergent properties of the dichot-
omized networks themselves in order to identify the 
correct dichotomization threshold, just like when we 
noticed the appearance of clusters while visually inspecting 
different dichotomization thresholds in the 
DGG data. As an example, let us consider an ap-
proach proposed by Freeman (2003) to distinguish 
between weak and strong ties. In his piece on the 
strength of weak ties, Granovetter (1973) argues that 
an important characteristic of strong ties is that if A is 
strongly tied to B, and B is strongly tied to C, then A is 
likely to be at least weakly tied to C. In his analysis of 
the DGG data, Freeman (2003) refers to Granovetter’s 
transitivity rule as g-transitivity. A data set is perfectly 
g-transitive if there are no violations of g-transitivity. 
Given a valued data set (and selecting a value such 
as zero as an indicator of no ties), Freeman’s proposal 
is to dichotomize the data set at every possible cutoff 
and calculate the number of violations of g-transitivity  
at each level. The lowest cutoff with an acceptable 
number of violations (such as zero) identifies the 
strong ties. For example, applied to the Davis women 
data, we get Table 5.

The table shows that at ≥ 4, the number of g- 
transitive triples is 160 and the number of intransitive  
triples is 0. Hence, ties 4 or above are strong ties, and 
ties < 4 but > 0 are weak ties.
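
The g-transitivity count can be sketched in base R. The counting rule is our assumption, since the text does not spell it out: we count ordered triples (a, b, c) with a different from c in which a-b and b-c are both strong, and call a triple intransitive when a and c have no tie at all. The function name gtrans_counts() is ours. With the Table 1 data this convention gives 160 and 0 at level 4.

```r
# Co-attendance counts typed in from Table 1
women <- matrix(c(
  8,6,7,6,3,4,3,3,3,2,2,2,2,2,1,2,1,1,
  6,7,6,6,3,4,4,2,3,2,1,1,2,2,2,1,0,0,
  7,6,8,6,4,4,4,3,4,3,2,2,3,3,2,2,1,1,
  6,6,6,7,4,4,4,2,3,2,1,1,2,2,2,1,0,0,
  3,3,4,4,4,2,2,0,2,1,0,0,1,1,1,0,0,0,
  4,4,4,4,2,4,3,2,2,1,1,1,1,1,1,1,0,0,
  3,4,4,4,2,3,4,2,3,2,1,1,2,2,2,1,0,0,
  3,2,3,2,0,2,2,3,2,2,2,2,2,2,1,2,1,1,
  3,3,4,3,2,2,3,2,4,3,2,2,3,2,2,2,1,1,
  2,2,3,2,1,1,2,2,3,4,3,3,4,3,3,2,1,1,
  2,1,2,1,0,1,1,2,2,3,4,4,4,3,3,2,1,1,
  2,1,2,1,0,1,1,2,2,3,4,6,6,5,3,2,1,1,
  2,2,3,2,1,1,2,2,3,4,4,6,7,6,4,2,1,1,
  2,2,3,2,1,1,2,2,2,3,3,5,6,8,4,1,2,2,
  1,2,2,2,1,1,2,1,2,3,3,3,4,4,5,1,1,1,
  2,1,2,1,0,1,1,2,2,2,2,2,2,1,1,2,1,1,
  1,0,1,0,0,0,0,1,1,1,1,1,1,2,1,1,2,2,
  1,0,1,0,0,0,0,1,1,1,1,1,1,2,1,1,2,2
), nrow = 18, byrow = TRUE)
diag(women) <- 0  # self-ties play no role in triples

gtrans_counts <- function(valued, level) {
  strong <- (valued >= level) * 1   # candidate strong ties
  present <- (valued > 0) * 1       # ties of any strength
  # two_paths[a, c] = number of b with a-b strong and b-c strong
  two_paths <- strong %*% strong
  diag(two_paths) <- 0              # require a != c
  possible <- sum(two_paths)
  trans <- sum(two_paths * present)
  c(level = level, trans = trans, intrans = possible - trans)
}

res <- t(sapply(7:1, function(level) gtrans_counts(women, level)))
res

# Lowest level with zero violations (Freeman's strong-tie cutoff)
min(res[res[, "intrans"] == 0, "level"])
```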

Combining this with our previous approach, we 
might summarize the situation as follows. Dichoto-
mizing at ≥ 3 optimally identifies ties of any kind in 
terms of the least-violence criterion, and maintains a 
single large component (plus isolates). Dichotomizing 
at ≥ 4 identifies strong ties, which strongly fragment 
the network. The latter is useful for sharply outlining 
a subgroup structure, while the former enables the 
calculation of measures that require connected 
networks (aside from isolates) (Figure 9).

Table 4. Z-score, correlation, number of ties and density of the DGG dataset at 
different dichotomization levels.

Value  Z-score  Correlation  Ties  Density
7        3.352     0.271887     2  0.006536
6        2.667     0.646625    16  0.052288
5        1.983     0.666829    18  0.058824
4        1.298     0.781314    48  0.156863
3        0.613     0.811928    92  0.300654
2       −0.072     0.720115   190  0.620915
1       −0.756     0.457341   278  0.908497
0       −1.441                306  1.000000

Table 5. Number of g-transitive and intransitive triples in the DGG dataset at 
different dichotomization levels.

Value  Trans  Intrans
7          0        0
6         26        0
5         30        0
4        160        0
3        526        4
2      2,032       44
1      3,786      292
0      4,448      448



It is worth noting that Freeman’s approach need 
not be limited to maximizing g-transitivity. On theoretical 
grounds, we may identify a specific mechanism that 
organizes ties. For example, we may see a status 
mechanism such as the Matthew effect in which nodes 
that already have a lot of ties tend to attract even more 
ties. Now, to dichotomize valued data, we choose the 
cutoff that maximizes the extent to which there are just 
a few nodes with many ties and a great many nodes 
with few ties. Alternatively, we might choose the cutoff 
to maximize the level of transitivity in the network.
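
One possible way to operationalize the Matthew-effect criterion is to measure how unequally ties are distributed at each cutoff, for instance with a Gini coefficient of degree. This is our assumption rather than a measure proposed in the text; the toy network and the helper gini() below are hypothetical.

```r
# Hypothetical valued network: node 1 has strong ties (3) to everyone,
# all other pairs have weak ties (1)
w <- matrix(1, nrow = 5, ncol = 5)
w[1, ] <- 3
w[, 1] <- 3
diag(w) <- 0

# Gini coefficient of a non-negative vector (0 = perfectly equal)
gini <- function(x) {
  if (mean(x) == 0) return(NA_real_)
  sum(abs(outer(x, x, "-"))) / (2 * length(x)^2 * mean(x))
}

levels_tried <- 1:max(w)
concentration <- sapply(levels_tried, function(level) {
  degree <- rowSums(w >= level)
  gini(degree)
})

data.frame(level = levels_tried, gini = concentration)
levels_tried[which.max(concentration)]  # first level with the most unequal degrees
```

At level 1 every node has the same degree, so the coefficient is zero; from level 2 upward only the hub keeps its ties and the inequality appears.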

Conclusion

This “How to” guide on dichotomization is intend-
ed to provide guidance on how to find a suitable 
threshold for dichotomization for social network 
data. We propose that in all cases, we should start 
by creating multiple versions of the dichotomized 
network at every possible value of the threshold and 
inspect them visually. Then, we suggest three separate 
approaches in order to choose (and justify the 
choice of) a single threshold based on (i) maximizing 
expected results, (ii) minimizing distortions, and 
(iii) identifying specific emergent properties in the 
network.

Figure 8: DGG Women by Women dataset dichotomized at 4.

Figure 9: DGG Women by Women dataset dichotomized at 3. Strong ties in bold.

References
Bernard, H., Killworth, P. and Sailer, L. 1980. Informant 
accuracy in social network data IV. Social Networks 
2: 191–218.

Butts, C.T. 2008. A relational event framework for 
social action. Sociological Methodology 38: 155–200.

Davis, A., Gardner, B. B. and Gardner, M. R. 1941. 
Deep South. Chicago: The University of Chicago 
Press.

Freeman, L.C. 2003. Finding social groups: a 
meta-analysis of the southern women data, in 
Breiger, R., Carley, K., and Pattison, P. (Eds), Dynamic 
social network modeling and analysis: workshop 
summary and papers, Committee on Human Factors, 
National Research Council: 39–45, National Academies 
Press.

Granovetter, M. 1973. The strength of weak ties. 
American Journal of Sociology 78: 1360–1380.



Figure A1: Screenshot of Netdraw.

Addendum 2 – UCINET

To visualize successive dichotomizations in UCINET, 
one opens the valued data as usual and presses the 
+ sign in the rels tab at right to raise the level of 
dichotomization by one unit (see Figure A1).

This can also be done in the command line interface 
(CLI) as follows:

->d1 = dichot(women ge 1)
->d2 = dichot(women ge 2)
->d3 = dichot(women ge 3)
Etc.

Addendum 1 – R Script

# Import the Davis data set into R, assuming that it is already in a
# tab-delimited text file, for example exported from UCINET.
# read.csv() is part of base R, so no extra package is needed here.
davis <- as.matrix(read.csv("davis.txt", sep = "\t", row.names = 1))

# Create a one-mode network by multiplying the original matrix by its transpose
davisonemode <- davis %*% t(davis)
diag(davisonemode) <- 0

# Dichotomize the network at all values
maxval <- max(davisonemode)
davisonemodedic <- array(dim = c(nrow(davisonemode), ncol(davisonemode), maxval))
for (i in 1:maxval) {
  davisonemodedic[, , i] <- ifelse(davisonemode >= i, 1, 0)
}

# Visualize all networks (sna loads the network package, which provides as.network)
library(sna)
par(mfrow = c(4, 2))
for (i in 1:maxval) {
  plot(as.network(davisonemodedic[, , i]))
}

# Correlation between original network and dichotomized networks,
# and some descriptive statistics
stats <- array(dim = c(maxval, 4))
colnames(stats) <- c("Threshold", "Correlation", "Num of 1s", "Density")
for (i in 1:maxval) {
  stats[i, 1] <- i
  # testval is the observed graph correlation returned by qaptest
  stats[i, 2] <- qaptest(list(davisonemode, davisonemodedic[, , i]),
                         gcor, g1 = 1, g2 = 2)$testval
  stats[i, 3] <- sum(davisonemodedic[, , i])
  stats[i, 4] <- stats[i, 3] / (nrow(davisonemode) * (nrow(davisonemode) - 1))
}

stats



Figure A2: Screenshot of UCINET’s Interactive Dichotomization routine’s results.

In addition, the network could be drawn after each 
step:

->draw d1
->draw d2
Etc.

To compute the correlation between an original 
data set and successive dichotomizations of it, we 
can use UCINET’s Transform|Interactively Dichoto-
mize procedure. Figure A2 below shows this proce-
dure applied to the DGG women data.

Finally, to execute Freeman’s strong-weak-null tie 
decomposition based on g-transitivity, we can use 
UCINET’s command line interface (CLI) as shown in 
Table A1.

Table A1. G-transitivity decomposition command line instruction and output in 
UCINET.

->dsp gtrans(women)

              1         2         3           4
Level     Trans   Intrans  Possible  Prop Trans
-----  --------  --------  --------  ----------
7             0         0         0
6            26         0        26       1
5            30         0        30       1
4           160         0       160       1
3           526         4       530       0.992
2         2,032        44     2,076       0.979
1         3,786       292     4,078       0.928
0         4,448       448     4,896       0.908