Beyond Node Embedding: A Direct Unsupervised Edge Representation Framework for Homogeneous Networks

Sambaran Bandyopadhyay (IBM Research & IISc, Bangalore, sambband@in.ibm.com), Anirban Biswas (Indian Institute of Science, Bangalore, anirbanb@iisc.ac.in), Narasimha Murty (Indian Institute of Science, Bangalore, mnm@iisc.ac.in), and Ramasuri Narayanam (IBM Research, Bangalore, ramasurn@in.ibm.com)

Abstract. Network representation learning has traditionally been used to find lower dimensional vector representations of the nodes in a network. However, there are very important edge driven mining tasks of interest to the classical network analysis community, which have mostly been unexplored in the network embedding space. For applications such as link prediction in homogeneous networks, the vector representation (i.e., embedding) of an edge is derived heuristically just by using simple aggregations of the embeddings of the end vertices of the edge. Clearly, this method of deriving edge embeddings is suboptimal, and there is a need for a dedicated unsupervised approach for embedding edges by leveraging edge properties of the network. Towards this end, we propose the novel concept of converting a network to its weighted line graph, which is ideally suited for finding the embeddings of the edges of the original network. We further derive a novel algorithm to embed the line graph by introducing the concept of collective homophily. To the best of our knowledge, this is the first direct unsupervised approach for edge embedding in homogeneous information networks that does not rely on node embeddings. We validate the edge embeddings on three downstream edge mining tasks. Our proposed optimization framework for edge embedding also generates a set of node embeddings, which are not just aggregations of edges. Further experimental analysis shows the connection of our framework to the concept of node centrality.

1 Introduction

Network representation learning (also known as network embedding) has gained significant interest over the last few years. Traditionally, network embedding [22, 12, 28] maps the nodes of a homogeneous network (where nodes denote entities of similar type) to lower dimensional vectors, which can be used to represent the nodes. It has been shown that such continuous node representations outperform conventional graph algorithms [2] on several node based downstream mining tasks such as node classification and community detection.

Edges are also important components of a network. From the point of view of downstream network mining analytics, there are plenty of network applications - such as computing edge betweenness centrality [20] and information diffusion [24] - which heavily depend on the information flow in the network. Compared to the conventional downstream node embedding tasks (such as node classification), these tasks are more complex in nature. But, similar to node based analytics, there is significant scope to improve the performance of these tasks in a continuous lower dimensional vector space. Thus, it makes sense to address these problems in the context of network embedding via direct representation of the edges of a network. As a first step in this direction, it is important to design dedicated edge embedding schemes and validate the quality of those embeddings on some basic edge-centric downstream tasks.
Figure 1: Edge Visualization (panels: (a) Synthetic Graph, (b) node2vec, (c) line2vec). (a) We created a small synthetic network with two communities, so there are three types of edges: green (or red) edges with both end points belonging to the green (or red, respectively) community, and blue edges with end points belonging to two different communities. (b) node2vec embeddings (8 dimensional) of the edges obtained by taking the average of the embeddings of the end vertices, visualized with t-SNE. (c) Direct edge embeddings (8 dimensional) obtained by line2vec, visualized with t-SNE. Clearly, line2vec is superior, as it visually separates the edge communities, compared to the conventional way of aggregating node embeddings to obtain edge representations.

In the literature, there are indirect ways to compute the embedding of an edge in an information network. For tasks like link prediction, where a classifier needs to be trained on both positive (existing) and negative (non-existing) edge representations, a simple aggregation function [12] such as the vector average or the Hadamard product has been applied to the representations of the two end vertices to derive the vector representation of the corresponding edge. Typically, node embedding algorithms use the homophily property [18] by respecting different orders of node proximities in a network. As the inherent objective functions of these algorithms are focused on the nodes of the network, using an aggregation function on these node embeddings to get the edge embedding can be suboptimal. We demonstrate the shortcoming of this approach in Figure 1, where the visualization of the edge embeddings derived by aggregating node embeddings (taking the average of the two end nodes) from node2vec [12] on a small synthetic graph does not maintain the edge community structure of the network. In contrast, a direct edge embedding approach, line2vec, to be proposed in this paper, completely adheres to the community structure, as edges of different types are visually segregated in the corresponding t-SNE plot shown in Figure 1(c). So there is a need to develop algorithms for directly embedding edges (i.e., not via aggregating node embeddings) in information networks. We address this research gap in this paper in a natural way. Following are the contributions:

• We propose a novel edge embedding framework, line2vec, for homogeneous social and information networks. To the best of our knowledge, this is the first work to propose a dedicated unsupervised edge embedding scheme which avoids aggregation of the end node embeddings.
• We exploit the concept of the line graph for edge representation by converting the given network to a weighted line graph. We further introduce the concept of collective homophily to embed the line graph and produce the embeddings of the edges of the given network.
• We conduct experiments on three edge-centric downstream tasks. Though our approach is proposed for embedding edges, we further show that a set of robust node embeddings, which are not just aggregations of edges, is also generated in the process.
• We experimentally discover a non-trivial connection between the classical concept of node centrality and the optimization framework of line2vec. The source code of line2vec is available at https://bit.ly/2kfiS2l to ease the reproducibility of the results.
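For concreteness, the indirect strategy used by existing methods (and for the node2vec baseline in Figure 1(b)) reduces to the sketch below; it assumes a dictionary node_emb of precomputed node embeddings, and the function name is illustrative rather than part of any released code.

```python
import numpy as np

def aggregate_edge_embeddings(edges, node_emb, how="average"):
    """Indirect edge embeddings obtained by aggregating the two end-node
    embeddings, as in [12]; the approach this paper argues is suboptimal
    compared to a direct edge embedding."""
    edge_emb = {}
    for u, v in edges:
        xu, xv = node_emb[u], node_emb[v]
        if how == "average":
            edge_emb[(u, v)] = (xu + xv) / 2.0
        elif how == "hadamard":
            edge_emb[(u, v)] = xu * xv
        else:
            raise ValueError("unknown aggregation: " + how)
    return edge_emb
```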
Though edge centric network mining tasks such as edge centrality, network diffusion and link prediction can benefit from edge embeddings, applying edge embeddings to tackle them is non-trivial and needs a separate body of work. For example, finding central edges in the network amounts to detecting a subset of points in the embedding space which are diverse with respect to each other and represent a majority of the other points. We leave these to be addressed in future work.

2 Related Work and Research Gaps

Node embedding in information networks has received great interest from the research community. We refer the readers to the survey article [33] for a comprehensive treatment of network embedding and cite only some of the more prominent works in this paragraph. DeepWalk [22] and node2vec [12] are two node embedding approaches which employ different types of random walks to capture the local neighborhood of a node and maximize the likelihood of the node context. Struc2vec [23] is another random walk based strategy which finds similar embeddings for nodes which are structurally similar. A deep autoencoder based node embedding technique (SDNE) that preserves structural proximity is proposed in [31]. Different types of node embedding approaches for attributed networks are also present in the literature [35, 3, 9]. A semi-supervised graph convolutional network based node embedding approach is proposed in [14] and further extended in GraphSAGE [13], which learns node embeddings with different types of neighborhood aggregation methods on attributes. Recently, node embedding approaches based on semi-supervised attention networks [28], on maximizing mutual information [29], and in the presence of outliers [4] have been proposed.

Compared to the above, representing edges in information networks is significantly less mature. Some preliminary works exist which use random walks on edges for community detection in networks [15] or to classify large-scale documents into large-scale hierarchically-structured categories [11]. [1] focuses on the asymmetric behavior of the edges in a directed graph for deriving node embeddings, but it represents a potential edge just by a scalar which determines its chance of existence. [25, 30] derive embeddings for different types of edges in a heterogeneous network, but their proposed methods essentially use an aggregation function inside the optimization framework to generate edge embeddings from node embeddings. For knowledge bases, embedding entities and relation types in a low dimensional continuous vector space [5, 7, 10] has been shown to be useful, but several fundamental concepts of graph embedding, such as homophily, are not directly applicable there. [19] proposes a dual-primal GCN based semi-supervised node embedding approach which first aggregates edge features by convolution, and then learns node embeddings by employing graph attention on the incident edge features of a node. To the best of our knowledge, [36] is the only work which proposes a supervised approach, based on adversarial training and an auto-encoder, purely for edge representation learning in homogeneous networks. But their framework needs a large amount of labelled edges to train the GAN, which makes it restrictive for real world applications. Hence, in this paper, we propose a task-independent, unsupervised, dedicated edge embedding framework for homogeneous information networks to address these research gaps.
3 Problem Description

An information network is typically represented by a graph G = (V, E, W), where V = {v_1, v_2, ..., v_n} is the set of nodes (a.k.a. vertices), each representing a data object, and E ⊆ {(v_i, v_j) | v_i, v_j ∈ V} is the set of edges. We assume |E| = m. Each edge e ∈ E is associated with a weight w_{v_i,v_j} > 0 (1 if G is unweighted), which indicates the strength of the relation. The degree of a node v, denoted d_v, is the sum of the weights of its incident edges. N(v) is the set of neighbors of the node v ∈ V. For the given network G, edge representation learning is to learn a function f : e ↦ x ∈ R^K, i.e., it maps every edge e ∈ E to a K dimensional vector called the edge embedding, where K < m. These edge embeddings should preserve the underlying edge semantics of the network, as described below.

Edge Importance: Not all the edges in a network are equally important. For example, in a social network, millions of fans can be connected to a movie star, but any two fans of a movie star may not be similar to each other. So this type of connection is weaker compared to an edge which connects two friends who individually have a much smaller number of connections [16].

Edge Proximity: Edges which are close to each other in terms of their topology or semantics should have similar embeddings. Similar to the concepts of node proximities [31], it is easy to define first and higher order edge proximities via the incidence matrix.

4 Solution Approach: line2vec

We propose an elegant solution (referred to as line2vec) to embed each edge of the given network. First, we map the network to a weighted line graph, where each edge of the original network is transformed into a node. Then we propose a novel approach for embedding the nodes of the line graph, which essentially provides the edge embeddings of the original network. For simplicity of presentation, we assume that the given network is undirected. Nevertheless, the approach can trivially be generalized to directed graphs.

4.1 Line Graph Transformation

Given an undirected graph G = (V, E), the line graph L(G) is the graph such that each node of L(G) is an edge in G, and two nodes of L(G) are neighbors if and only if their corresponding edges in G share a common endpoint vertex [32]. Formally, L(G) = (V_L, E_L), where V_L = {(v_i, v_j) : (v_i, v_j) ∈ E} and E_L = {((v_i, v_j), (v_j, v_k)) : (v_i, v_j) ∈ E, (v_j, v_k) ∈ E}. Figure 2 shows how to convert a graph into its line graph [8]. Hence the line graph transformation induces a bijection l : e ↦ v from the set of edges of the given graph to the set of nodes of the line graph, where for every e ∈ E there exists a v ∈ V_L, and if two edges e_i, e_j ∈ E are adjacent, there is a corresponding edge e ∈ E_L in the line graph.

Figure 2: Transformation process of a graph into its line graph. (a) An information network G. (b) Each edge in the original graph has a corresponding node in the line graph; here the green edges represent the nodes in the line graph. (c) For each adjacent pair of edges in G there exists an edge in L(G); the dotted lines here are the edges in the line graph. (d) The line graph L(G) of the graph G.
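The unweighted transformation of Section 4.1 is available off the shelf in networkx; below is a minimal sketch on a toy graph in the spirit of Figure 2 (the variable names are ours, not from the released code).

```python
import networkx as nx

# Toy graph: node 1 has degree 3, so its three incident edges will
# form a triangle in L(G).
G = nx.Graph([(1, 2), (1, 3), (1, 4), (2, 3)])

# Every edge of G becomes a node of L(G); two such nodes are adjacent
# iff the corresponding edges of G share an endpoint (Section 4.1).
L = nx.line_graph(G)

print(L.number_of_nodes())  # 4, one node per edge of G
print(L.number_of_edges())  # 5: nodes 1, 2, 3 of G contribute 3 + 1 + 1 edges (d(d-1)/2 each)
```

line2vec then re-weights these line-graph edges, as described next in Section 4.2.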
4.2 Weighted Line Graph Formation

We propose to construct a weighted line graph for our problem even if the original graph is unweighted. These weights help the random walk in the later stage of line2vec to focus more on the relevant nodes in the line graph. It is evident from Section 4.1 that a node of degree k in the original graph G produces k(k-1)/2 edges in the line graph L(G). Therefore, high degree nodes in the original graph may get over-represented in the line graph. Often many of these incident edges are not that important to the concerned node in the given network, but they can potentially change the movement frequency of a random walk in the line graph. We follow a simple strategy to overcome this problem. The goal is to ensure that the line graph not only reflects the topology of the original graph G (which is guaranteed by the Whitney graph isomorphism theorem [32] in almost all cases), but also that the dynamics of the graph are not affected by the transformation process. The edge weights are defined to facilitate a random walk on L(G), as described in Section 4.3.1. Intuitively, if we start a random walk from a node v_{ij} ≡ (v_i, v_j) ∈ L(G) and want to traverse to v_{jk} ≡ (v_j, v_k) ∈ L(G), then it is equivalent to selecting the node v_j ∈ G from (v_i, v_j) and moving to v_k ∈ G. If G is undirected, we define the probability of choosing v_j to be proportional to d_{v_i} / (d_{v_i} + d_{v_j}). Here, d_{v_i} and d_{v_j} are the degrees of the end point nodes of the edge (v_i, v_j), and an edge is in general more important to the endpoint node having the lower degree than to the endpoint with the higher degree [16]. Then, selecting v_k is proportional to the weight of the edge e_{jk} ≡ (v_j, v_k) ∈ E. Hence, for any two adjacent edges e_{ij} ≡ (v_i, v_j) and e_{jk} ≡ (v_j, v_k), we define the weight of the edge (e_{ij}, e_{jk}) of the line graph L(G) as follows:

w(e_{ij}, e_{jk}) = \frac{d_i}{d_i + d_j} \times \frac{w_{jk}}{\sum_{r \in N(v_j)} w_{jr} - w_{ij}}    (1)

This completes the formation of the weighted line graph from any given network.
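A direct transcription of Eq. 1 is straightforward. The sketch below assumes the network is held as a networkx Graph whose edges may carry a 'weight' attribute (defaulting to 1 for unweighted graphs) and that the two input edges are oriented as (v_i, v_j) and (v_j, v_k), sharing v_j; the function name is ours.

```python
import networkx as nx

def line_edge_weight(G, e_ij, e_jk):
    """Weight of the line-graph edge between two adjacent edges of G,
    following Eq. (1). Degrees are weighted degrees, i.e. the sum of
    incident edge weights, as defined in Section 3."""
    v_i, v_j = e_ij
    v_j2, v_k = e_jk
    assert v_j == v_j2, "the two edges must share their middle endpoint v_j"

    d_i = G.degree(v_i, weight="weight")
    d_j = G.degree(v_j, weight="weight")
    w_ij = G[v_i][v_j].get("weight", 1.0)
    w_jk = G[v_j][v_k].get("weight", 1.0)
    # total weight incident on v_j, excluding the edge we arrived on
    denom = sum(G[v_j][r].get("weight", 1.0) for r in G.neighbors(v_j)) - w_ij

    return (d_i / (d_i + d_j)) * (w_jk / denom)
```

Applying this to every pair of adjacent edges of G populates the edge weights of L(G).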
4.3 Embedding the Line Graph

Here we propose a novel approach to embed the nodes of the line graph. The line graph is a special type of graph which comes with some nice properties. Below is one important observation that we exploit in embedding the line graph.

Lemma 1 Each (non-isolated) node in the graph G induces a clique in the corresponding line graph L(G).

Proof 1 Let us assume that a (non-isolated) node v in the graph G has n_v edges connected to it. These n_v edges are neighbors of each other. Hence, in the corresponding line graph L(G), each of these edges is mapped to a node, and each of these nodes is connected to all the other n_v - 1 nodes. Thus there is a clique of size n_v induced in the line graph by node v.

This can be visualized in Fig. 2, where node 1 in (a), with degree 3, induces a clique of size 3, consisting of the nodes (1,2), (1,3) and (1,4), in the corresponding line graph in (d). Lemma 1 is interesting because it tells us that the nodes of the line graph exhibit a collective property, rather than just a pairwise property. To clarify, in the given network, two nodes are pairwise connected by an edge, but in the line graph, a group of nodes forms a clique. Pairwise homophily [18], which has been the backbone of many standard embedding algorithms [31], is not sufficient for embedding the line graph. Hence we propose a new concept, 'collective homophily', applicable to the line graph. We explain it below.

Figure 3: Collective homophily ensures that the embeddings of the edges which are connected via a common node in the network stay within a sphere of small radius.

4.3.1 Collective Homophily and Cost Function Formulation

We emphasize that all the nodes which are part of a clique in the line graph should be close to each other in the embedding space. One way to enforce collective homophily is to introduce a sphere (of small radius R) in the embedding space and ensure that the embeddings of the nodes (in the line graph) which are part of a clique remain within the sphere. Hence, any two embeddings within a sphere are at most 2R apart from each other. The concept is explained in Fig. 3. The smaller the radius R, the closer the embeddings of neighboring edges are to each other, and hence the better the enforcement of collective homophily. Note that a sum of pairwise homophily losses in the embedding space may lead to some pairs being very close to each other while others remain quite far apart. So, we formulate the objective function to embed the (weighted) line graph as follows.

Let us introduce some notation. Bold face letters like u (or v) denote a node in the line graph L(G), which can also be denoted by u_{uv} when the correspondence with the edge (u, v) ∈ E in the original graph G is required. Normal face letters like u, v denote nodes in the given graph. x_v ∈ R^K (equivalently x_{uv}) denotes the embedding of the node v_{uv} in the line graph (or of the edge (u, v) ∈ E).

To map the nodes of the line graph to vectors, first we want to preserve different orders of node proximities in the line graph. For this, a truncated random walk based sampling strategy S is used to provide a set of nodes N_S(v) as context for a node v in the network. Here we employ the random walk proposed by [12], which balances between the BFS and DFS search strategies in the graph. As the generated line graph is weighted, we consider the weights of the edges while computing the node transition probabilities. Let X denote the matrix with each row being the embedding x_v of a node v of the line graph. Assuming conditional independence of the nodes, we seek to maximize (w.r.t. X) the log likelihood of the context of a node:

\sum_{v \in V_L} \log P(N_S(v) \mid x_v) = \sum_{v \in V_L} \sum_{v' \in N_S(v)} \log P(v' \mid x_v)

Each of the above probabilities can be represented using the standard softmax function parameterized by the dot product of x_{v'} and x_v. As usual, we approximate the computationally expensive denominator of the softmax function using a negative sampling strategy \bar{N}(v) for any node v. The above equation, after simple algebraic manipulations, leads to maximizing the following:

\sum_{v \in V_L} \Big[ \sum_{v' \in N_S(v)} x_{v'} \cdot x_v - |N_S(v)| \log \Big( \sum_{\bar{v} \in \bar{N}(v)} \exp(x_{\bar{v}} \cdot x_v) \Big) \Big]    (2)

Next, we implement the concept of collective homophily as proposed above. Each node u ∈ V (in the original network) induces a clique in the line graph (Lemma 1). An edge (u, v) ∈ E corresponds to the node v_{uv} ∈ V_L in the line graph. So we want all the nodes of the form v_{uv} ∈ V_L, with v ∈ N(u) (the neighbors of u), to belong to a sphere centered at c_u ∈ R^K and of radius R_u. As collective homophily suggests that the embeddings of these nodes must be close to each other, we minimize the sum of all such radii. Together with Eq. 2, this gives the final cost function of line2vec:

\min_{X, R, C} \sum_{v \in V_L} \Big[ |N_S(v)| \log \Big( \sum_{\bar{v} \in \bar{N}(v)} \exp(x_{\bar{v}} \cdot x_v) \Big) - \sum_{v' \in N_S(v)} x_{v'} \cdot x_v \Big] + \alpha \sum_{u \in V} R_u^2
such that  ||x_{uv} - c_u||_2^2 \le R_u^2, \forall v \in N(u), \forall u \in V;  and  R_u \ge 0, \forall u \in V    (3)

Here, α is a positive weight factor. The constraint ||x_{uv} - c_u||_2^2 ≤ R_u^2 ensures that nodes of the form x_{uv} belong to the sphere of radius R_u centered at c_u. We use R and C to denote the sets of all such radii and centers, respectively.
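The context sets N_S(v) above come from truncated random walks on the weighted line graph. As a simplified sketch, the walk below chooses the next line-graph node with probability proportional to the Eq. 1 weights; the actual line2vec implementation uses the p/q-biased second-order walk of node2vec [12], so treat this first-order version only as an illustrative stand-in.

```python
import random

def weighted_random_walk(LG, start, walk_length):
    """One truncated walk on the weighted line graph LG; transition
    probabilities are proportional to the line-graph edge weights."""
    walk = [start]
    for _ in range(walk_length - 1):
        cur = walk[-1]
        nbrs = list(LG.neighbors(cur))
        if not nbrs:
            break
        weights = [LG[cur][n].get("weight", 1.0) for n in nbrs]
        walk.append(random.choices(nbrs, weights=weights, k=1)[0])
    return walk

# A corpus of such walks, several per line-graph node, provides the
# contexts N_S(v) fed to the skip-gram term of Eqs. (2)-(3).
def build_corpus(LG, walks_per_node=10, walk_length=80):
    return [weighted_random_walk(LG, v, walk_length)
            for v in LG.nodes() for _ in range(walks_per_node)]
```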
4.3.2 Solving the Optimization

Equation 3 is a non-convex constrained optimization problem. We use the penalty function technique [6] to convert it into an unconstrained optimization problem as follows:

\min_{X, R, C} \sum_{v \in V_L} \Big[ |N_S(v)| \log \Big( \sum_{\bar{v} \in \bar{N}(v)} \exp(x_{\bar{v}} \cdot x_v) \Big) - \sum_{v' \in N_S(v)} x_{v'} \cdot x_v \Big] + \alpha \sum_{u \in V} R_u^2 + \lambda \sum_{u \in V} \sum_{v \in N(u)} g(||x_{uv} - c_u||_2^2 - R_u^2) + \sum_{u \in V} \gamma_u \, g(-R_u)    (4)

Here the function g : R → R is defined as g(t) = max(t, 0). It imposes a penalty on the cost function in Eq. 4 when the argument of g is positive, i.e., when a constraint of Eq. 3 is violated. We use a linear penalty g(t) because its gradient does not vanish even when t → 0+. To solve the unconstrained optimization in Eq. 4, we use stochastic gradient descent, computing gradients w.r.t. each of X, R and C, and take a subgradient of g(t) at t = 0. All the penalty parameters λ and γ_u corresponding to the penalty functions are positive. When there is any violation of a constraint (or a sum of constraints), the corresponding penalty parameter is increased to impose a larger penalty. We give more importance to the constraints of the type R_u ≥ 0, as their violation may change the intuition of the solution; so we use a separate penalty parameter γ_u for each of them, which makes it possible to impose a different penalty on each such constraint. One can show that, under appropriate assumptions, any convergent subsequence of solutions to the unconstrained penalized problems must converge to a solution of the original constrained problem [6]. Very small values of the penalty parameters might lead to violation of the constraints, while very large values would make the gradient descent algorithm oscillate. So, we start with smaller values of λ and the γ_u's and keep increasing them until all the constraints are satisfied or the gradients become too large, causing abrupt changes of the function value. Note that, theoretically, some of the constraints in Eq. 3 may still be violated, but experimentally we found them satisfied to a large extent (Section 5). In the final solution, x_v gives the vector representation of node v of the line graph, which is essentially the embedding of the corresponding edge in the original network.
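For illustration, the penalty terms of Eq. 4 can be evaluated as follows; the skip-gram term is omitted, and the dictionary-based layout of X, C and R is our own assumption rather than the data layout of the released implementation.

```python
import numpy as np

def g(t):
    """Linear penalty g(t) = max(t, 0) of Eq. (4)."""
    return max(t, 0.0)

def penalty_terms(X, C, R, incident, alpha, lam, gamma):
    """Radius and constraint-violation terms of Eq. (4).

    X        : dict edge (u, v) -> embedding x_uv (np.ndarray of shape (K,))
    C        : dict node u -> sphere centre c_u
    R        : dict node u -> sphere radius R_u (scalar)
    incident : dict node u -> list of edges (u, v) incident on u in G
    alpha    : positive weight factor of Eq. (3)
    lam      : spherical-constraint penalty parameter (lambda)
    gamma    : dict node u -> penalty parameter for the R_u >= 0 constraint
    """
    radius_term = alpha * sum(R[u] ** 2 for u in R)
    sphere_term = lam * sum(
        g(float(np.sum((X[e] - C[u]) ** 2)) - R[u] ** 2)
        for u, edges in incident.items() for e in edges)
    nonneg_term = sum(gamma[u] * g(-R[u]) for u in R)
    return radius_term + sphere_term + nonneg_term
```

In line2vec these terms are minimized jointly with the skip-gram part by SGD, with λ and the γ_u's increased over the iterations as described above.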
4.4 Key Observations and Analysis

Both of the edge embedding properties mentioned in Section 3 are preserved in the construction and embedding of the weighted line graph. In particular, if two edges have a common incident node in the original network, the corresponding two nodes in the transformed line graph are neighbors. Also, two edges having similar neighborhoods in the original network lead to two nodes having similar neighborhoods in the transformed line graph. The random walk and collective homophily preserve both the pairwise and the collective node proximities of the line graph in the embedding space. Thus, different orders of edge proximities of the original network are captured well in the edge embeddings. Further, the construction of the edge weights of the line graph (Sec. 4.2) ensures that the underlying importance of the edges of the original network is preserved in the transformed line graph, and hence in the embeddings obtained through the truncated random walk.

Time Complexity: Edge embedding is computationally harder than node embedding, as the number of edges in a real life network is larger than the number of nodes. From Lemma 1, each node u in the original network induces a clique of size d_u (the degree of u in G). Hence the total number of edges in the line graph is m_L = Σ_{u∈V} C(d_u, 2) = Σ_{u∈V} d_u(d_u - 1)/2 ≤ |V|d^2, where d is the maximum degree of a node in the given network. So, the construction of the line graph takes O(|V|d^2) time. Next, we use an alias table for fast generation of the corpus of node sequences by the random walks in O(m_L log m_L) = O(|V|d^2 log(|V|d)) time, assuming the number of random walks on the line graph, the maximum length of a random walk, the context window size and the number of negative samples per node to be constants, as they are hyper-parameters of the skip-gram model. Then, the first term (under the sum over the nodes in V_L) of Eq. 4 can be computed in O(|V_L|) = O(|E|) time. Next, the term weighted by α can be computed in O(|V|) time. For the term weighted by λ, we need to visit each node in V and, for each such node, its neighbors in the original graph, which takes a total of O(|E|) time. The last term of Eq. 4 can be computed in O(|V|) time. As we use penalty methods to solve it, the runtime of solving Eq. 4 is O(|E| + |V|). Hence the total runtime complexity of line2vec is O(|V|d^2 log(|V|d)). So in the worst case (e.g., a nearly complete graph), the runtime complexity is O(|V|^3 log|V|). But for most real life social networks, the maximum degree can be considered a constant (i.e., it does not grow with the number of nodes); hence, for them, the runtime complexity is O(|V| log|V|).

5 Experimental Evaluation

We conduct detailed experiments on three downstream edge centric network mining tasks and thoroughly analyze the proposed optimization framework of line2vec.

5.1 Design of Baseline Algorithms

Unsupervised direct edge embedding for information networks is itself a novel problem. Existing approaches only aggregate the embeddings of the two end nodes to find the embedding of an edge. So, as baselines, we only consider the publicly available implementations of a set of popular unsupervised node embedding algorithms which can work using only the link structure of the graph: DeepWalk, node2vec, SDNE, struc2vec and GraphSAGE (official unsupervised implementation for un-attributed networks). We have considered different types of node aggregation methods, such as taking the average, the Hadamard product, or vector norms of the two end node embeddings [12], to generate the edge embeddings for the baseline algorithms. It turns out that the average aggregation method performs the best among them. So we report the performance of the baseline methods with average node aggregation, where the embedding of an edge (u, v) is computed by taking the average of the node embeddings of u and v.

5.2 Datasets Used and Setting Hyper-parameters

We used five real world publicly available datasets for the experiments. A summary of the datasets is given in Table 1. For Zachary's karate club and the Dolphin social network (http://www-personal.umich.edu/~mejn/netdata/), there are no ground truth community labels given for the nodes. So we use a modularity based community detection algorithm and label the nodes based on the communities they belong to. For Cora, Pubmed (https://linqs.soe.ucsc.edu/data) and MSA [26], the ground truth node communities are available. The ground truth edge labels are derived as follows. If an edge connects two nodes of the same community (intra community edge), the label of that edge is the common community label. If an edge connects nodes of different communities (inter community edge), then that edge is not considered for calculating the accuracy of the downstream tasks. Note that all the edges (both intra and inter community) are considered for learning the edge embeddings. We also provide the sizes of the generated weighted line graphs in Table 1.
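Since L(G) is much larger than G, it is useful to know its size before materializing it. The counts follow from the construction of Section 4.1 (|V_L| = |E| and, by Lemma 1, each node u contributes d_u(d_u - 1)/2 line-graph edges); the helper below is a small sketch of ours and, for simple graphs, it should reproduce the L(G) sizes reported in Table 1.

```python
import networkx as nx

def line_graph_size(G):
    """Number of nodes and edges of L(G) for a simple graph G,
    computed from the degree sequence without building L(G)."""
    n_nodes = G.number_of_edges()                            # |V_L| = |E|
    n_edges = sum(d * (d - 1) // 2 for _, d in G.degree())   # |E_L|
    return n_nodes, n_edges
```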
Table 1: Summary of the datasets used.
Dataset | #Nodes | #Edges | #Edge-Labels | #Nodes in L(G) | #Edges in L(G)
Zachary's Karate club | 34 | 78 | 3 | 78 | 528
Dolphin social network | 62 | 159 | 4 | 159 | 923
Cora | 2708 | 5278 | 7 | 5278 | 52301
Pubmed | 19717 | 44327 | 3 | 44327 | 699385
MSA | 30101 | 204926 | 3 | 204926 | 6149555

Note that line graphs are still extremely sparse in nature, which enables the use of efficient data structures and computation for sparse graphs here.

We set the parameter α in Eq. 3 to 0.1 in the experiments. At that value, the two components of the cost function in Eq. 3 contribute roughly the same to the total cost in the first iteration of line2vec. The dimension (K) of the embedding space is set to 8 for the Karate club and Dolphin social networks, as they are small in size, and to 128 for the other three larger datasets (for all the algorithms). For faster convergence of SGD, we set the initial learning rate higher and decrease it over the iterations. We vary the penalty parameters in Eq. 4 over the iterations, as discussed in Section 4.3.2, to ensure that the constraints are satisfied at large.

5.3 Penalty Errors of line2vec Optimization

We show the values of the two different penalty errors (i.e., the constraint violation errors of the penalty method based optimization) over the iterations of line2vec in Figure 5. For all the datasets, the total spherical error Σ_{u∈V} Σ_{v∈N(u)} g(||x_{uv} - c_u||_2^2 - R_u^2) converges to a small value very close to zero, and the non-negative error Σ_{u∈V} g(-R_u) remains zero. This means that almost all the constraints of the line2vec formulation are satisfied in the final solution.

Figure 5: Both the spherical error Σ_{u∈V} Σ_{v∈N(u)} g(||x_{uv} - c_u||_2^2 - R_u^2) and the non-negative error Σ_{u∈V} g(-R_u) in the penalty function based optimization of line2vec converge to zero very fast on all the datasets (panels: (a) Spherical Error, (b) Non-negative Error).

5.4 Downstream Edge Mining Tasks

Edge visualization: It is important to understand whether the edge embeddings are able to separate the communities visually. We use the embeddings of the edges as input in R^K and use t-SNE [17] to plot the edge embeddings in a 2 dimensional space. Fig. 4 shows the edge visualizations by line2vec, along with the baseline algorithms, on the Cora dataset. Note that line2vec is able to visually separate the communities well compared to all the other baselines. The same trend was observed in Fig. 1 for the small synthetic network. Line2vec, being a direct approach for edge embedding via collective homophily, outperforms all the baselines, which aggregate node embeddings to generate the embeddings of the edges.

Figure 4: Edge visualization on the Cora dataset (panels: (a) DeepWalk, (b) node2vec, (c) SDNE, (d) struc2vec, (e) GraphSAGE, (f) line2vec). Different colors represent different edge communities.

Edge Clustering: Like node clustering, edge clustering is also important to understand the flow of information within and between the communities. For clustering the embeddings of the edges, we apply the KMeans++ algorithm. To evaluate the quality of clustering, we use unsupervised clustering accuracy [34], which considers different permutations of the labels and chooses the assignment giving the best possible accuracy. Figure 6a shows that line2vec outperforms all the baselines for edge clustering on all the datasets. DeepWalk and node2vec also perform well among the baselines.

Multi-class Edge Classification: We use only 10% of the edges with ground truth labels (as generated in Section 5.2) as the training set, because getting labels is expensive in networks. A logistic regression classifier is trained on the edge embeddings generated by the different algorithms. The performance on the test set is reported using the Micro F1 score. Figure 6b shows that line2vec is better than or highly competitive with the state-of-the-art embedding algorithms. node2vec and DeepWalk follow line2vec closely. On the Dolphin dataset, node2vec outperforms line2vec marginally. The performance of line2vec on edge classification again shows the superiority of a direct edge embedding scheme over the node aggregation approaches.
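A minimal sketch of the evaluation protocol of this subsection follows, assuming the edge embeddings are given as a matrix X (one row per labelled intra-community edge) with integer labels y in {0, ..., k-1}; the helper names are ours, and the exact experimental settings (seeds, solver options) are not specified in the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def clustering_accuracy(y_true, y_pred):
    """Unsupervised clustering accuracy [34]: the best one-to-one
    mapping between cluster ids and labels (Hungarian algorithm)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = int(max(y_true.max(), y_pred.max())) + 1
    cost = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1
    rows, cols = linear_sum_assignment(-cost)  # maximise matched pairs
    return cost[rows, cols].sum() / len(y_true)

def evaluate_edge_embeddings(X, y, n_clusters, train_frac=0.1, seed=0):
    """Edge clustering (KMeans++) and edge classification (logistic
    regression trained on 10% labelled edges, Micro F1 on the rest)."""
    pred = KMeans(n_clusters=n_clusters, random_state=seed).fit_predict(X)
    cluster_acc = clustering_accuracy(y, pred)

    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_frac, stratify=y, random_state=seed)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    micro_f1 = f1_score(y_te, clf.predict(X_te), average="micro")
    return cluster_acc, micro_f1
```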
5.5 Ablation Study of line2vec

The idea of line2vec is to embed the line graph for generating the edge embeddings of a given network. There are two main novel components in line2vec: first, the construction of the weighted line graph; and second, more importantly, the concept of collective homophily proposed on the weighted line graph. In this subsection, we show the incremental benefit of each component through a small edge visualization experiment on the Dolphin dataset, as shown in Fig. 7. We use node2vec (N2V) as the starting point because the skip-gram objective component of line2vec (L2V) is similar to node2vec. Though visually there is not much difference between Sub-figures 7a and 7b, there is some improvement when we apply node2vec on the weighted line graph (without using collective homophily) in Sub-fig. 7c. Finally, the superiority of line2vec due to using collective homophily on the weighted line graph is clear from Sub-fig. 7d. Thus, both the novel components of line2vec have their incremental benefits for the overall algorithm.

Figure 7: Edge visualization on the Dolphin dataset by t-SNE (panels: (a) N2V, (b) N2V+LG, (c) N2V+WLG, (d) L2V). Edge embeddings are obtained (a) by using node2vec on the input graph and then taking the average of the end node embeddings for each edge, (b) by using node2vec on an unweighted (conventional) line graph, (c) by using node2vec on our proposed weighted line graph, and (d) by line2vec. Clearly, there is an incremental improvement in quality from using the weighted line graph and then collective homophily, as reflected in (c) and (d) respectively.
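The edge visualizations of Figures 1, 4 and 7 follow the usual recipe of projecting the K-dimensional edge embeddings to two dimensions with t-SNE [17] and colouring points by edge community; a small sketch is given below (the matplotlib styling is our own assumption, not a detail specified in the paper).

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_edge_embeddings(X, labels, title="edge embeddings"):
    """2-D t-SNE projection of edge embeddings, coloured by community."""
    xy = TSNE(n_components=2, random_state=0).fit_transform(X)
    plt.scatter(xy[:, 0], xy[:, 1], c=labels, cmap="tab10", s=8)
    plt.title(title)
    plt.axis("off")
    plt.show()
```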
5.6 Parameter Sensitivity of line2vec

Figure 8 shows the sensitivity of line2vec with respect to the hyper-parameter α (in Eq. 3) on the Karate and Dolphin datasets. We show the variation of performance for edge classification (Micro F1 score) and edge clustering (unsupervised accuracy). From the figure, one can observe that optimal performance in most of the cases is obtained when the value of α is between 0.05 and 0.1. Around these values, the losses from the two components of line2vec in Eq. 3 are close to each other. For our other experiments, we fix α = 0.1 for all the datasets.

Figure 8: Sensitivity of line2vec with respect to the hyper-parameter α (in Eq. 3) on the Karate and Dolphin datasets (panels: (a) Edge Classification, (b) Edge Clustering): variation of performance for edge classification (Micro F1 score) and edge clustering (unsupervised accuracy).

5.7 Interpretation of c_u as Node Embedding

line2vec is dedicated to direct edge embedding in information networks. Lemma 1 suggests that each node in the given network G induces a clique in the line graph L(G). Based on the concept of collective homophily, corresponding to a node u in G, the clique in the line graph is enclosed by a sphere centered at c_u ∈ R^K (Eq. 3). Intuitively, the center acts as a point which is close to the embeddings of all the nodes in the clique induced by u (or equivalently, all the edges incident on u in G). Hence the role of this center in the embedding space is similar to the role of the node with respect to its adjacent edges in the graph. This motivates us to consider c_u as the node embedding of u ∈ V in G. If (u, v) ∈ E, then the edge embedding of (u, v) should be close to both c_u and c_v, which in turn pulls c_u and c_v close to each other. Thus, node proximities are also captured in c_u. We use clustering of the nodes (a.k.a. community detection) of the given network to validate the quality of the node embeddings obtained from the centers of the line2vec optimization. We use KMeans++ clustering, as before, on the set of points c_u, ∀u ∈ V, and validate the clustering quality using unsupervised accuracy [4]. Figure 6c shows that line2vec, though designed specifically for edge embedding, performs remarkably well on a node based mining task. Specifically, for the Karate and Dolphins networks, the gain over the best of the baselines is significant. This result is interesting because, while aiming to find edge embeddings, we also generate a set of efficient node embeddings, which are not just aggregations of the incident edges.

Figure 6: Performance comparisons (panels: (a) Edge Clustering, (b) Edge Classification, (c) Node Clustering): (a) edge clustering with KMeans++, (b) Micro F1 score of edge classification, (c) node clustering with KMeans++, where we use c_u as the embedding of the node u in the given network.

5.8 Connection of Node Centrality with R_u

This subsection analyzes the interpretation of the radius R_u of the sphere enclosing the clique induced by node u ∈ V in the embedding space. When a node u has a small number of incident edges, and its neighbors are very close to each other in the embedding space (e.g., they are all from the same sub-community), a small radius R_u is enough to enclose all the edges incident on u. But when the neighbors of the node u are diverse in nature, the corresponding edges would also differ in terms of strength and semantics. For example, an influential researcher may be directly connected to many other researchers in a research network, but only a few of them can be direct collaborators. Hence, a larger sphere is needed to enclose the clique in the line graph induced by such a node. This intuition connects the radius R_u of a sphere in the embedding space of the line graph to the centrality [27] of the node u in the given network. A node which is loosely connected (i.e., has few or very similar neighbors) in the network is less central, and a node which is strongly connected (many or a diverse set of neighbors) is considered highly central.

As real life networks are noisy [4], we first experiment with a small synthetic graph, shown in Figure 9, to illustrate the connection between R_u and the centrality of the node u ∈ V. It has three communities, and there is a central (red colored) node connecting all the communities. Each community has three sub-communities, which are connected via the green colored nodes. The degree of each node in this network is kept roughly the same. We use closeness centrality [21], which is used widely in the network analysis literature. The closeness centrality of the nodes is plotted in Fig. 9(b). The nodes on the y-axis are sorted based on their closeness centrality values and, as expected, the red node tops the list as it is well connected to all the communities, followed by the green nodes, with the yellow nodes placed at the bottom. We run line2vec on this synthetic graph and plot the radius R_u for each node u in Fig. 9(c). Here also, the nodes are sorted in the same order as in sub-figure 9(b).
As one can see, the red node has the highest value of the radius. As this node is connected to a diverse set of nodes in the network, it needs a larger sphere to enclose the induced clique in the line graph. We also observe that most of the green nodes have higher values of R_u than the yellow nodes. The correlation coefficient between the closeness centrality and the radius R_u is 0.56. A more prominent trend can be observed for betweenness centrality [27], where the correlation coefficient with the radius R_u is 0.86.

For all the real-world datasets, we show the correlation of R_u with the two centrality metrics over all the nodes in Table 2. The high positive correlation between them indicates that the radius R_u of a node is roughly proportional to the centrality of the node u in the network. However, a detailed analysis is required to see the scope of introducing a new type of node centrality based on the values of R_u.

Figure 9: Relationship between the radius R_u associated with each node and closeness centrality in a synthetic graph. (a) shows the structure of the synthetic network. (b) shows the closeness centrality of the nodes, where on the Y-axis the nodes are sorted based on their centrality values. (c) shows R_u for all the nodes; the nodes on the Y-axis of (c) are sorted in the same order as in (b). The colors of the lines in (b) and (c) correspond to the three different types of nodes (colored accordingly) in (a). This figure also shows the high overlap between the top few nodes in both lists.

Table 2: Pearson correlation coefficient (CC) values between the radius R_u and the centrality values of the nodes for different networks. The centrality measures considered here are betweenness and closeness centrality.
Dataset | Karate | Dolphins | Cora | Pubmed | MSA
Betweenness CC | 0.81 | 0.66 | 0.29 | 0.26 | 0.35
Closeness CC | 0.68 | 0.78 | 0.79 | 0.59 | 0.72

6 Discussion and Future Work

We proposed a novel unsupervised, dedicated edge embedding framework for homogeneous information and social networks. We convert the given network to a weighted line graph and introduce the concept of collective homophily to embed the weighted line graph. Our framework is quite generic: the skip-gram based component in the objective function of line2vec can easily be replaced with any other approach, such as graph convolution on the weighted line graph. Besides, we also plan to extend this methodology to heterogeneous information networks and knowledge bases. There are several edge centric applications in networks. This work, being the first towards direct edge embedding, can serve as a basis for solving some of them in the context of network embedding and help move network representation learning beyond node embedding.

REFERENCES

[1] Sami Abu-El-Haija, Bryan Perozzi, and Rami Al-Rfou, 'Learning edge representations via low-rank asymmetric projections', in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1787–1796. ACM, (2017).
[2] Lada A Adamic and Eytan Adar, 'Friends and neighbors on the web', Social Networks, 25(3), 211–230, (2003).
[3] Sambaran Bandyopadhyay, Harsh Kara, Aswin Kannan, and M Narasimha Murty, 'Fscnmf: Fusing structure and content via non-negative matrix factorization for embedding information networks', arXiv preprint arXiv:1804.05313, (2018).
[4] Sambaran Bandyopadhyay, N Lokesh, and M Narasimha Murty, 'Outlier aware network embedding for attributed networks', in Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pp. 12–19, (2019).
[5] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko, 'Translating embeddings for modeling multi-relational data', in Advances in Neural Information Processing Systems, pp. 2787–2795, (2013).
[6] Kurt Bryan and Yosi Shibberu, 'Penalty functions and constrained optimization', Dept. of Mathematics, Rose-Hulman Institute of Technology, http://www.rosehulman.edu/~bryan/lottamath/penalty.pdf, (2005).
[7] Muhao Chen and Chris Quirk, 'Embedding edge-attributed relational hierarchies'. SIGIR, (2019).
[8] Tim S Evans and Renaud Lambiotte, 'Line graphs of weighted networks for overlapping communities', The European Physical Journal B, 77(2), 265–272, (2010).
[9] Hongchang Gao and Heng Huang, 'Deep attributed network embedding', in IJCAI, volume 18, pp. 3364–3370, (2018).
[10] Zheng Gao, Gang Fu, Chunping Ouyang, Satoshi Tsutsui, Xiaozhong Liu, Jeremy Yang, Christopher Gessner, Brian Foote, David Wild, Ying Ding, et al., 'edge2vec: Representation learning using edge semantics for biomedical knowledge discovery', BMC Bioinformatics, 20(1), 306, (2019).
[11] Mohammad Golam Sohrab, Toru Nakata, Makoto Miwa, and Yutaka Sasaki, 'Edge2vec: Edge representations for large-scale scalable hierarchical learning', Computación y Sistemas, 21(4), 569–579, (2017).
[12] Aditya Grover and Jure Leskovec, 'node2vec: Scalable feature learning for networks', in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864. ACM, (2016).
[13] Will Hamilton, Zhitao Ying, and Jure Leskovec, 'Inductive representation learning on large graphs', in Advances in Neural Information Processing Systems, pp. 1025–1035, (2017).
[14] Thomas N Kipf and Max Welling, 'Semi-supervised classification with graph convolutional networks', arXiv preprint arXiv:1609.02907, (2016).
[15] Suxue Li, Haixia Zhang, Dalei Wu, Chuanting Zhang, and Dongfeng Yuan, 'Edge representation learning for community detection in large scale information networks', in International Workshop on Mobility Analytics for Spatio-temporal and Social Data, pp. 54–72. Springer, (2017).
[16] David Liben-Nowell and Jon Kleinberg, 'The link-prediction problem for social networks', Journal of the American Society for Information Science and Technology, 58(7), 1019–1031, (2007).
[17] Laurens van der Maaten and Geoffrey Hinton, 'Visualizing data using t-sne', Journal of Machine Learning Research, 9(Nov), 2579–2605, (2008).
[18] Miller McPherson, Lynn Smith-Lovin, and James M Cook, 'Birds of a feather: Homophily in social networks', Annual Review of Sociology, 27(1), 415–444, (2001).
[19] Federico Monti, Oleksandr Shchur, Aleksandar Bojchevski, Or Litany, Stephan Günnemann, and Michael M Bronstein, 'Dual-primal graph convolutional networks', arXiv preprint arXiv:1806.00770, (2018).
[20] M.E.J. Newman, Networks: An Introduction, Oxford University Press, Oxford, UK, 2010.
[21] Tore Opsahl, Filip Agneessens, and John Skvoretz, 'Node centrality in weighted networks: Generalizing degree and shortest paths', Social Networks, 32(3), 245–251, (2010).
[22] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena, 'Deepwalk: Online learning of social representations', in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710. ACM, (2014).
[23] Leonardo FR Ribeiro, Pedro HP Saverese, and Daniel R Figueiredo, 'struc2vec: Learning node representations from structural identity', in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 385–394. ACM, (2017).
[24] E. Rogers, Diffusion of Innovations, Free Press, New York, USA, 1995.
[25] Yu Shi, Qi Zhu, Fang Guo, Chao Zhang, and Jiawei Han, 'Easing embedding learning by comprehensive transcription of heterogeneous information networks', in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2190–2199. ACM, (2018).
[26] Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-june Paul Hsu, and Kuansan Wang, 'An overview of microsoft academic service (mas) and applications', in Proceedings of the 24th International Conference on World Wide Web, pp. 243–246. ACM, (2015).
[27] Oskar Skibski, Talal Rahwan, Tomasz P Michalak, and Makoto Yokoo, 'Attachment centrality: An axiomatic approach to connectivity in networks', in Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, pp. 168–176. International Foundation for Autonomous Agents and Multiagent Systems, (2016).
[28] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio, 'Graph attention networks', in International Conference on Learning Representations, (2018).
[29] Petar Veličković, William Fedus, William L Hamilton, Pietro Liò, Yoshua Bengio, and R Devon Hjelm, 'Deep graph infomax', in International Conference on Learning Representations, (2019).
[30] Janu Verma, Srishti Gupta, Debdoot Mukherjee, and Tanmoy Chakraborty, 'Heterogeneous edge embedding for friend recommendation', in European Conference on Information Retrieval, pp. 172–179. Springer, (2019).
[31] Daixin Wang, Peng Cui, and Wenwu Zhu, 'Structural deep network embedding', in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1225–1234. ACM, (2016).
[32] H. Whitney, 'Congruent graphs and the connectivity of graphs', American Journal of Mathematics, 54(1), 150–168, (1932).
[33] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S Yu, 'A comprehensive survey on graph neural networks', arXiv preprint arXiv:1901.00596, (2019).
[34] Junyuan Xie, Ross Girshick, and Ali Farhadi, 'Unsupervised deep embedding for clustering analysis', in International Conference on Machine Learning, pp. 478–487, (2016).
[35] Cheng Yang, Zhiyuan Liu, Deli Zhao, Maosong Sun, and Edward Y Chang, 'Network representation learning with rich text information', in IJCAI, pp. 2111–2117, (2015).
[36] Yang Zhou, Sixing Wu, Chao Jiang, Zijie Zhang, Dejing Dou, Ruoming Jin, and Pengwei Wang, 'Density-adaptive local edge representation learning with generative adversarial network multi-label edge classification', in 2018 IEEE International Conference on Data Mining (ICDM), pp. 1464–1469. IEEE, (2018).