key: cord-0538596-qrk4s2nd
authors: Valentini, Giorgio; Casiraghi, Elena; Cappelletti, Luca; Ravanmehr, Vida; Fontana, Tommaso; Reese, Justin; Robinson, Peter
title: Het-node2vec: second order random walk sampling for heterogeneous multigraphs embedding
date: 2021-01-05
journal: nan
DOI: nan
sha: bb489fa4df4f20334cc0ecb75e592d4278371d66
doc_id: 538596
cord_uid: qrk4s2nd

We introduce a set of algorithms (Het-node2vec) that extend the original node2vec node-neighborhood sampling method to heterogeneous multigraphs, i.e. networks characterized by multiple types of nodes and edges. The resulting random walk samples capture both the structural characteristics of the graph and the semantics of the different types of nodes and edges. The proposed algorithms can focus their attention on specific node or edge types, allowing accurate representations also for underrepresented types of nodes/edges that are of interest for the prediction problem under investigation. These rich and well-focused representations can boost unsupervised and supervised learning on heterogeneous graphs.

In the field of biology, medicine, social science, economy, and many other disciplines, the representation of relevant problems through complex graphs of interrelated concepts and entities motivates the increasing interest of the scientific community towards Network Representation Learning . Indeed by learning low-dimensional representations of network vertices that reflect the network topology and the structural relationships between nodes, we can translate the non-Euclidean graph representation of nodes and edges into a fully Euclidean embedding space that can be easily ingested into vector-based machine learning algorithms to efficiently carry out network analytic tasks, ranging from vertex classification and edge prediction to unsupervised clustering, node visualization and recommendation systems [Grover and Leskovec, 2016 , Zhang et al., 2016 , Martinez et al., 2017 , Wang et al., 2017 .

To this aim, in the past decade most of research efforts focused on homogeneous networks, by proposing matrix factorization-based methods [Natarajan and Dhillon, 2014] , random walk based methods [Perozzi et al., 2014, Grover and Leskovec, 2016] , edge modeling methods [Tang et al., 2015b] , Generative Adversarial Nets , and deep learning methods [Cao et al., 2016 , Hamilton et al., 2017 .

Nevertheless, the highly informative representation provided by graphs that include different types of entities and relationships is the reason behind the development of increasingly complex networks, also including Knowledge Graphs [Dai et al., 2020] , sometimes referred as multiplexheterogeneous networks [Valdeolivas et al., 2019] , or simply as heterogeneous networks [Dong et al., 2020] (Section 3), where different types of nodes and edges are used to integrate and represent the information carried by multiple sources of information. Following this advancements, Heterogeneous Network Representation Learning (HNRL) algorithms have been recently proposed to process such complex, heterogeneous graphs [Dong et al., 2020] .

The core issue with HNRL is to simultaneously capture the structural properties of the network and the semantic properties of the heterogeneous nodes and edges; in other words, we need node and edge type-aware embeddings that can preserve both the structural and the semantic properties of the underlying heterogeneous graph.

In this context, from an algorithmic point of view, two main lines of research have recently emerged, both inspired by homogeneous network representation learning [Dong et al., 2020] : the first one leverages results obtained by methods based on the "distributional hypothesis" 1 , firstly exploited to capture the semantic similarity of words [Mikolov et al., 2013] , and then extended to capture the similarity between graph nodes [Grover and Leskovec, 2016] ; the second one exploits neural networks specifically designed to process graphs, using e.g. convolutional filters [Kipf and Welling, 2017] , and more generally direct supervised feature learning through graph convolutional networks [Hamilton et al., 2017] .

The methods of the first research line share the assumption that nodes having the same structural context or being topologically close in the network (homophily) are also close in the embedding space. Some of those methods separately process each homogeneous networks included in the original heterogeneous graph. As an example, in [Tang et al., 2015a ] the heterogeneous network is firstly projected into several homogeneous bipartite networks; then, an embedding representing the integrated multi-source information is computed by a joint optimization technique combining the skip-gram models individually defined on each homogeneous graph. A similar decomposition is initially applied in [Zitnik and Leskovec, 2017] , where the original heterogeneous graph is split into a set of hierarchically structured homogeneous graphs. Each homogeneous graph is then processed through node2vec [Grover and Leskovec, 2016] , and the embedding of the heterogeneous network is finally obtained by using recursive regularization, which encourages the different embeddings to be similar to their parent embedding. Another approach in this context constraints the random walks used to collect node contexts for the embeddings into specific meta-paths: the walker can step only between pre-specified pairs of vertices, thus better capturing the structural and semantic charateristics of the nodes [Dong et al., 2017] . Other related approaches combine vertex pair embedding with meta-path embeddings [Park et al., 2019] , or improves the heterogeneous Spacey random walk algorithm by imposing meta-paths, graphs and schema constraints [He et al., 2019] .

Differently from the "distributional hypothesis" approach that usually applies shallow neural networks to learn the embeddings, graph neural networks-based (GNN) approaches apply deep neural-network encoders to provide more complex representations of the underlying graph [Wu et al., 2020] . By this approach the deep neural network recursively aggregates information from neighborhoods of each node, in such a way that the node neighborhood itself defines a computation graph that learns how to propagate information across the graph to compute the node features [Hamilton et al., 2017 , Gilmer et al., 2017 .

As it often happens for the distributional approach, the usual strategy used by GNN to deal with heterogeneous graphs is to decompose them into its homogeneous components. For instance, Relational Graph Convolutional Networks [Schlichtkrull et al., 2018] maintain distinct weight matrices for each different edge type, or Heterogeneous Graph Neural Networks , apply first level Recurrent Neural Networks (RNN) to separately encode features for each type of neighbour nodes, and then a second level RNN to combine them. Also Decagon [Zitnik et al., 2018] , which has been successfully applied to model polypharmacy side effects, uses a graph decomposition approach by which node embeddings are separately generated by edge type and the resulting computation graphs are then aggregated. Other approaches add meta-path edges to augment the graph or learns attention coefficients that weight the importance of different types of vertices [Chen et al., 2018] .

The drawbacks of all the aforementioned GNN approaches is that some relations may not have sufficient occurrences, thus leading to poor relation-specific weights in the resulting GNN. To overcome this problem, an Heterogeneous GNN [Hu et al., 2020] that uses the Transformer-like selfattention architecture have been recently proposed.

Despite the impressive advancements achieved in recent years by all the aforementioned methods (distributional approaches and GNN-based approaches), they both show drawbacks and limitations.

Indeed, methods based on the "distributional hypothesis", which base the embeddings on the random neighborhood sampling, usually rely on the manual exploration of heterogeneous structures, i.e. they require human-designed meta-paths to capture the structural and semantic dependencies between the nodes and edges of the graph. This requires human intervention and non-automatic preprocessing steps for designing the meta-paths and the overall network scheme. Moreover, similarly to Heterogeneous GNN, in most cases they treat separately each type of homogeneous networks extracted from the original heterogeneous one and are not able to focus on specific types of nodes or edges that constitute the objective of the underlying prediction task (e.g. prediction of a specific edge type).

For what regards GNN, an open issue is represented by their computational complexity, which is exacerbated by the intrinsic complexity of heterogeneous graphs, thus posing severe scaling limitations when dealing with big heterogeneous graphs. Moreover in most cases heterogeneous GNN models use different weight matrices for each type of edge or node, thus augmenting the complexity of the learning model. Some GNN methods augment the graphs by leveraging human-designed meta paths, thus showing the same limitation of distributional approaches, i.e. the need of human intervention and non-automatic pre-processing steps.

To overcome some of these drawbacks we propose a general framework to deal with complex heterogeneous networks, in the context of the previously discussed "distributional hypothesis" randomwalk based research line. The proposed approach, that we named Het-node2vec to remark its derivation from the classical node2vec algorithm [Grover and Leskovec, 2016] , can process heterogeneous multi-graphs characterized by multiple types of nodes and edges and can scale up with big networks, due to its intrinsic parallel nature. Het-node2vec does not require manual exploration of heterogeneous structures and meta-paths to deal with heterogeneous graphs, but directly models the heterogeneous graph as a whole, without splitting the heterogeneous graph in its homogeneous components. It can focus on specific edges or nodes of the heterogeneous graph, thus introducing a sort of "attention" mechanism [Bahdanau et al., 2015] , conceptually borrowed from the deep neural network literature, but realized in an original and simple way in the world of random walk visits of Figure 1 : A step (at time t) of a 2 nd order random walk in a homogeneous graph presented in [Grover and Leskovec, 2016] . At time t − 1 the random walk was in X(t − 1) = r, and has just moved from node r to node v (image taken from [Grover and Leskovec, 2016] ). Then, the probability of moving to any neighboring node is given by αw v,x , where w v,x is the edge weight, and α depends on a "return" parameter, p, and on an "outward" parameter q.

heterogeneous graphs. Our proposed approach is particularly appropriate when we need to predict edge or node types that are underrepresented in the heterogeneous network, since the algorithm can focus on specific types of edges or nodes, even when they are largely outnumbered by the other types. At the same time the proposed algorithms learn embeddings that are aware of the different types of nodes and edges of the heterogeneous network and of the topology of the overall network.

In the next section we summarize node2vec, since Het-node2vec can be considered as its extension to heterogeneous graphs. Then, we present Het-node2vec declined in three flavours to process respectively graphs having Heterogeneous Nodes and Homogeneous Edges (HeNHoE-2vec), Homogeneous Nodes and Heterogeneous Edges (HoNHeE-2vec), and full Heterogeneous graphs having both Heterogeneous Nodes and Edges, through HeNHeE-2vec, a more general model that integrates HeNHoE-2vec and HoNHeE-2vec.

Leskovec's node2vec classical algorithm applies a 2 nd order random walk to obtain samples in the neighborhood of a given node [Grover and Leskovec, 2016] .

Let X(t) be a random variable representing the node v ∈ V visited at time (step) t during a random walk (RW) across a graph G = (V, E). Considering a 2 nd order RW we are interested at estimating the probability P of a transition from X(t) = v to X(t + 1) = x, given that node r was been visited in the previous step of the RW, i.e. X(t − 1) = r:

According to Leskovec (see Fig.1 ), this probability can be modeled by a (normalized) transition matrix Π with elements:

where w v,x is the weight of the edge (v, x) ∈ E and α p,q is defined as:

where:

• d r,x is the "distance" from node r to node x, that is the length of the shortest path from r to x, whereby d ∈ {0, 1, 2};

• p is called the return (inward) hyper-parameter, and controls the likelihood of immediately revisiting a node in the walk;

• q is the in-out hyper-parameter which controls the probability of exploring more distant parts of the graphs.

The advantage of this parametric "biased" RW is that by tuning p and q parameters we can both leverage the homophily (through a Depth-First Sampling (DFS)-like visit) and the structural (through a Breath-First Sampling (BFS)-like visit) characteristics of the graph under study. More precisely:

• setting p < min(1, q) promotes the tendency of the RW 2 to backtrack a step, thus keeping the walk local by returning to the source node r;

• parameter q allow us to simulate a DFS (q < 1), thus capturing the homophily characteristics of the node, or a BFS (q > 1), thus capturing the structural characteristics of the node.

Het-node2vec introduces 2 nd order RWs that are node and edge type-aware, in the sense that the RW are biased according to the different types of nodes and edges in the graph. This is accomplished by introducing "switching" parameters that control the way the RW can move between different node and edge types, thus adding a further bias to the RW towards specific types of nodes and edges. From this standpoint our approach resembles the method proposed in [Valdeolivas et al., 2019] , where "jumps" between different types of edges or nodes (inspired to Levy flights [Guo et al., 2016 , Riascos and Mateos, 2012 ) are stochastically run across an heterogeneous graph. Nevertheless, the authors focused on a different problem using a different algorithm: they proposed a first order random walk with restart in heterogeneous multi-graphs to predict node labels, while our approach performs biased second order random walks for graph representation learning purposes.

Het-node2vec adopts a sort of "attention" mechanism [Velickovic et al., 2018] in the world of RW, in the sense that RW samples are concentrated on the parts of the graph most important for the problem under investigation, while other less important parts received less attention, i.e. they are visited less intensely by the RW algorithm.

In this section we introduce three algorithms that extend node2vec to heterogeneous networks: The algorithms differ in the way the "biased" random walk is implemented, but share the common idea that the random walk can switch to nodes of different type (HeNHoE-2vec) or can switch to edges of different type (HoNHeE-2vec) or can switch to both nodes and edges of different type (HeNHeE-2vec).

In particular HoNHeE-2vec and HeNHeE-2vec can manage multigraphs, i.e. graphs characterized by multiple edges of different types between the same two nodes.

It is worth noting that HeNHoE-2vec and HoNHeE-2vec are special cases of HeNHeE-2vec.

Figure 2: A heterogeneous multigraph with nodes and edges of different types. Different colors are used to represent node and edges types. Multiple types of edges may connect the same pair of nodes. Fig. 2 represents a heterogeneous network with different types of nodes and edges. Different colours represent different node and edge types. The proposed algorithms are able to manage heterogeneous networks and multigraphs, i.e. graphs where the same pair of nodes may be connected by multiple types of edges. For didactic purposes, we show a graph with four different subgraphs in Figure 2 to illustrate the requirements for the three algorithms.

G 1 is a multigraph having nodes of the same type (cyan nodes) but edges of different types (edges having different colours) and multiple edges may connect the same pair of nodes: for this subgraph we can apply HoNHeE-2vec, since this algorithm can manage homogeneous nodes and heterogeneous edges.

G 2 is a graph having different types of nodes, but the same type of edges (green edges): for this subgraph we can apply HeNHoE-2vec, since this algorithm can manage heterogeneous nodes and homogeneous edges.

G 3 is a multigraph having both different types of nodes and edges, and the same pair of nodes may be connected by multiple edges: for this subgraph we can apply HeNHeE-2vec, since this algorithm can manage heterogeneous nodes and heterogeneous edges.

Finally G 4 is a graph with both homogeneous nodes and edges (violet nodes and red edges): for this subgraph the classical node2vec suffices, since this algorithm can be applied to graphs having homogeneous nodes and edges.

Note that the overall graph depicted in Fig. 2 can be managed with HeNHeE-2vec, since this algorithm generalizes both node2vec, HoNHeE-2vec and HeNHoE-2vec.

3.2 HeNHoE-2vec. Heterogeneous Nodes Homogeneous Edges to vector algorithm

This algorithm extends node2vec by specifying the probabilities for the RW to switch between nodes with different types, adding the possibility to switch to another node x o having a type different from the source node v s . One idea for the algorithm is that if we have a total of n node types, we can either define the probability of switching between any two pairs of types, or we can define up to n − 2 types as 'special' and specify the probability of switching from a non-special node to a special node or vice versa, and analogously with edges. In most cases, we would define one category as 'special'. More precisely, given a function that returns the type of each node φ :

, using a switching parameter s to modulate the probability to move toward a heterogeneous node. Note that the subscript s stands for same type of the source node v s , while o stands for other type (see Fig. 3 ). More precisely we have:

• r s or r o : the node visited before v s (preceding node) having (r s ) the same type as v s , i.e. The transition matrix used to decide where to "move" next is computed by introducing a "switch" parameter s, used in the calculation of α p,q,s :

Eq. 3 shows an implementation of a 2 nd order RW for both the situations in which: a) at step t + 1 we move to a node having the same type of v s (φ(v) = φ(x)); b) at step t + 1 we move toward a node having a different type (φ(v) = φ(x)).

To better understand the dynamic of a 2 nd order RW in a heterogeneous network we consider two distinct cases (Fig. 3) :

a. At step t − 1 we start from a node of the same type of X(t) = v s , and hence X(t − 1) = r s ; b. At step t−1 we start from a node having type different from X(t) = v s , and hence X(t−1) = r o These two cases in a 2 nd order RW are mutually exclusive in the sense that obviously for a fixed t we may have either X(t − 1) = r s or X(t − 1) = r o

The first case (X(t − 1) = r s ) is depicted in Fig. 3 (Left) . When walking in the homogeneous graph which v s belong to, the α originally defined in [Grover and Leskovec, 2016 ] are used to weight edges toward x s ; when we move to another type of node, all the corresponding edges starting from v s are weighted through the switching parameter s (eq. 3).

The second case (X(t − 1) = r o ) is depicted in Fig. 3 (Right). In this situation the node visited at step t − 1 has a type different from that of v s : for instance node X(t − 1) = r o is a pink node while node node X(t − 1) = v s is green, and the α p,q,s values vary according to eq. 3).

The general idea of the proposed approach is to use the classical node2vec when moving inside a node homogeneous sub-graph; however, we consider the possibility to switch to other sub-graphs having different type of nodes, based on the value of an additional switching hyper-parameter s.

Essentially, if s < 1, a switch between heterogeneous sub-graphs is encouraged. Moreover if q < 1 Depth-First-Search sampling (DFS) is encouraged, where consecutively sampled nodes tend to have an increasing distance from the starting node.

Conversely, if s > 1, switches are discouraged and the RW tends to more cautiously stay in the same homogeneous graph. This especially happens when also q > 1, encouraging a Breadth-First visit.

Summarizing, with respect to the original node2vec setting, that considers the interplay between the parameter q that controls the depth-first or breadth-first like steps, and the p "return" parameter that controls the probability of returning to to the previous node, HeNHoE-2vec introduces the novel s "switching" parameter that allows us a node type-aware RW actoss the graph. 

A single new parameter s allows us to maintain the memory about the diversity of types between the preceding node and the node where to move. We name this modality of switching between different types of nodes as simple switching. Nevertheless note that, using only one s parameter for all the switches between different node types, we loose the specificity of switching between two specific types of nodes of the overall heterogeneous network.

Multiple switching To overcome this problem we can introduce different s ij for each switch from type V i to type V j (and viceversa). This approach, that we name multiple switching can introduce more control on the specificity of the switching process, but at the expenses of a major complexity of the resulting model. Indeed if we have n different types of nodes we can introduce till to n 2 = O(n 2 ) different node switching parameters.

Special node switching To avoid the complexity induced by multiple switching parameters, and to focus on specific node types, we can define a subset of the n node types as special and we could simply specify the probability of switching from a non-special node to a special node or vice versa. This can be easily accomplished by relabeling the node types included in the special set as "special" and those not included as "non special": in this way we can bring back to the simple switching algorithm (eq. 3) with only two types of nodes defined. It is easy to see that this special node switching is a special case of the multiple switching: for instance if we have four node types and V 1 nodes are special and the other ones non special, we can set s 12 = s 13 = s 14 = c, and by setting c < 1 we can focus on switching to node of type V 1 .

Versus specific switching We note that we have considered only undirected edges. With an edge (x, y) and φ(x) = φ(y), say x ∈ V 1 , and y ∈ V 2 , we have the same switching parameter s or s 12 when both moving from x to y and from y to x . This could be fine, but in this way we cannot capture the versus specific switching. For instance, if we would like to focus on V 1 nodes and we are in y ∈ V 2 , by setting s << 1 we improve the probability to switch to x ∈ V 1 . But if the neighbours of x are nodes of type V 1 , it is likely that at the next step we will come back to y. In this way we cannot specifically focus the RW on nodes of type V 1 . To overcome this problem we could define different switching values s 12 < s 21 , thus modeling in different way the switch from V 1 to V 2 (s 12 ) with respect to the switch from V 2 to V 1 (s 21 ). This approach could be also applied to special nodes, by setting two different switching parameters for moving from special to non special and from non special to special nodes.

The value for the hyper-parameter s (or s i,j ) should be "tuned" in order to obtain "good" node/edge embeddings, that is embeddings exploiting the topology of the overall heterogeneous network. This could be done by Bayesian optimization/grid search/random search. If we set s < 1 or s > 1 we can respectively encourage or discourage the visit of different types of nodes-On the contrary, by setting s = 1 we in practice transform a heterogeneous network to a homogeneous one, and eq. 3 reduces itself to the original 2 nd order random walk of the Leskovec's node2vec. Moreover if we set p = 1 and q = 1 we obtain a 1 st order random walk. If we finally set s = p = q = 1 we obtain a 1 st order random walk in a homogeneous network, since we treat nodes in the same way without considering their different types.

We may have at most six different α p,q,s values associated to the edges that connect X(t) = v s to either nodes of the same type or nodes of different type of v s , namely X(t + 1) = x, coming either from a node X(t − 1) = r s or X(t − 1) = r o .

These α values are used to compute the unnormalized transition probability matrix Π associated with the heterogeneous network. For instance, looking at Fig. 3 , the unnormalized probability of moving from v s to the node x 2 s of the same black subgraph is: P X(t + 1) = x 2 s |X(t) = v s , X(t − 1) = r o = π x 2 s ,vs,ro = α p,q,s w vs,x 2 s = 1 q w vs,x 2 s As another example, the unnormalized probability of moving from v s to a node x 2 o belonging to a different subgraph is:

As a final example, the unnormalized probability of coming back to node r o is:

To normalize the transition probabilities π to normalized probabilitiesπ, we can divide the unnormalized transition probabilities for the sum of the unnormalized ones connecting v s to its neighbours N (v s ):π

x,vs,r = v∈N (vs) π x,vs,r π v,vs,r Figure 4 : 2 nd order RW according to the HoNHeE-2vec algorithm. The node v represents the node visited at step t, while r the node visited at step t − 1. Red edges represent switching edges. The weights over the edges represent the β p,q,e values according to eq. 6.

This algorithm explicitly considers multigraphs, and can be applied to graphs with homogeneous nodes and heterogeneous edges, including edges of different types between the the same two nodes. We can define a function ψ : E → Σ E , where Σ E represent different edge types. We say that we have an "edge switch" in a 2 nd order random walk when, having X(t − 1) = r, X(t) = v, X(t + 1) = x, the edge (r, v) has different type than (v, x), i.e. ψ(r, v) = ψ(v, x).

Looking at Fig. 4 , to model the 2 nd order transition probabilities of moving from v to one of its neighbouring nodes x, i.e. P (X(t + 1) = x|X(t) = v, X(t − 1) = r) = π vx = β p,q,e w vx

where w vx is the weight of the edge (v, x), we need to define the coefficient β p,q,e that biases the RW according to the three parameters < p, q, e >. Basically the algorithm is similar to HeNHoE-2vec, but this time a new parameter e controls the probability of "edge switching" instead of "node switching" and can model the multigraph case with homogeneous nodes.

In Fig. 4 red edges refer to an edge switch, i.e. ψ(r, v) = ψ(v, x), and the corresponding values of β are shown. Depending on the distance of the node x from r and on the possible "switch" to another type of edge, we may have six different values of β and of the resulting unnormalized transition probabilities π v,x (see Fig. 4) :

The notation π (vx) using round brackets around the edge (v, x) indicates that in the step from v to x an edge switch has occurred.

Summarizing, considering a network with homogeneous nodes and heterogeneous edges G = (V, E, ψ), with ψ : E → Σ E , then the 2 nd order transition probabilities:

where e is the "edge switching" parameter, can be modeled by computing β p,q,e in the following way:

In eq. 6 the "if ψ(r, v) = ψ(v, x)" branch manages a 2 nd order RW when no edge switch is performed: in practice this is the classical node2vec algorithm. The else branch manages the switch to another type of edge instead, i.e. the case when ψ(r, v) = ψ(v, x). The e > 0 parameter controls the probability of "switching" between two different types of edges: low values of e improve the probability to switch in the random walk from ad edge of a given type to an edge of a different type.

By setting e = 1, we treat the heterogeneous edges as homogeneous and in practice we reduce the algorithm to the classical node2vec. The meaning of the p and q parameter is the same of node2vec.

Analogously to HeNHoE-2vec, we may introduce multiple e parameters, one for each pair of types of edges, to model switches from a specific type of edge to another specific type. This can introduce more control in the biased random walk, but at the expenses of a more complex model.

To reduce complexity and to focus on on specific edge types, we can subdivide the in "special" and "non special", as just discussed for the case of heterogeneous nodes (Section 3.2.2) This algorithm can be applied to the synthetic lethality link prediction problem O 'Neil et al. [2017] , when we have homogeneous nodes and heterogeneous edges. In this use case, we would define SLI edges as 'special'.

3.4 HeNHeE-2vec. Heterogeneous Nodes Heterogeneous Edges to vector algorithm

This algorithm can be applied to multigraphs G = (V, E, φ, ψ) with φ : V → Σ V and ψ : E → Σ E having different types of nodes and edges. The general idea behind this algorithm is to allow us to "switch" between both different types of nods (as in the HeNHoE-2vec), and also between different types of edges (as in the HoNHeE-2vec).

To this end we need 4 parameters, i.e. p, q, the original node2vec parameters that control BFS and DFS-like visit of the graph and the probability to come back to the previous node X(t − 1), and two additional parameters s and e that control the probability to switch respectively to another type of node or edge.

Looking at Fig. 5 , v s represents the node visited by the RW at the first step of a 2 nd order RW, i.e. X(t) = v s . For node visited at the previous step, there are two possibilities: we can start from a node X(t − 1) = r s having the same type of v s (Fig. 5, Left) , or from a node X(t − 1) = r o having a different type of v s (Fig. 5, Right) :

At step t + 1 we can move back from v s to r s if X(t − 1) = r s or to r o if X(t − 1) = r o , or we can move to a a node x of the same type of v s at distance 1 (x 1 s ) or at distance 2 (x 2 s ) from r s or r o , or we can alternatively move to a node x having different type of v s , at distance distance 1 (x 1 o ) or at distance 2 (x 2 o ) from r s or r o : 1. P (X(t + 1) = x|X(t) = v, X(t − 1) = r) = π vx = γ p,q,s,e w v,x considering that from v s we can switch to a different type of node or edge at distance 0, 1 or 2 from the node visited at step t − 1, we may have twelve possibilities:

1. π vsrs = 1 p w vsrs 2. π (vsrs) = 1 pe w vsrs 3. π vsro = 1 ps w vsro 4. π (vsro) = 1 pse w vsro 5. π vsx 1 s = w vsx 1 s 6. π (vsx 1 s ) = 1 e w vsx 1

We recall that the notation π (vx) means the probability of moving from node v to x, having an edge switching. For instance the sixth equation of the previous table means:

Summarizing, considering a network with heterogeneous nodes and heterogeneous edges G = (V, E, φ, ψ), with φ : V → Σ V and ψ : E → Σ E , then 2 nd order transition probabilities P (X(t + 1) = x|X(t) = v, X(t − 1) = r) = π vx = γ p,q,s,e w v,x where s is the "node switching" and e the "edge switching" parameters, can be modeled by computing γ p,q,s,e through the HeNHeE-2vec algorithm:

(8)

HeNHeE-2vec is a general algorithm to perform 2 nd order RW in heterogeneous multigraphs, characterized by the following features:

• can be applied to multigraphs having multiple types of nodes and edges • can explore the graph according to a continuum between a BFS and a DFS visit (by tuning the q parameter)

• can control the probability of walking a step back to the original source node (by the p parameter)

• can control the probability to switch to different types of nodes (by the s parameter)

• can control the probability to switch to different types of edges (by the e parameter)

HeNHeE-2vec is a generalization of the the previously presented algorithms. Indeed:

• if we set e = 1 we obtain HeNHoE-2vec.

• if we set s = 1 we obtain HoNHeE-2vec.

• if we set e = 1 and s = 1 we obtain node2vec.

Analogously to HeNHoE-2vec and HoNHeE-2vec, we may introduce multiple s and e switching parameters, one for each pair of different types of nodes and edges. This approach on the one hand allow us to better control the biased RW, but on the other hand introduces a more complex model, with a large number of parameters to be tuned. Also in this case, analogously to Section 3.2.2, we can introduce "special" nodes and edges.

The algorithm is inherently parallel since we can generate multiple samples of 2 nd order RW in general heterogeneous networks starting at the same time from each node of the graph.

The complexity of a single 2 nd order RW step for single node of the graph is O(1). To perform one step for each node of the graph, the complexity is O(n) with n = |V |. To perform a random walk of length k, the complexity is O(kn), and to perform m RW of length k for each node the complexity is O(kmn). If km << n then the complexity is O(n), and if km n, then the complexity is O(n 2 ).

As an example of possible relevant applications of HeNHeE-2vec, we could consider the analysis of KG-COVID-19 [Reese et al., 2021] , or the analysis of the heterogeneous Monarch network [Shefchek et al., 2020] .

We proposed Het-node2vec an algorithmic framework to run biased 2 nd order random walks on heterogeneous networks, to obtain embeddings that are node and edge-type aware. The proposed approach can be viewed as an extension of the classical node2vec algorithm, since the random walk is further biased, according to the type of nodes and edges. This is accomplished through simple "switching" mechanisms, controlled by appropriated "switching parameters", by which a random walk can stochastically move between different types of nodes and edges, and can focus on specific types of nodes and edges that are of particular interest for the specific learning problem under investigation. The algorithms are general enough to be applied to general unsupervised and supervised problems involving graph representation learning, ranging from edge and node label prediction to graph clustering, considering different possible application domains, ranging from biology and medicine to economy, social sciences and more in general for the analysis of complex heterogeneous systems.

Neural machine translation by jointly learning to align and translate

Deep neural networks for learning graph representations

Heterogeneous graph neural networks for malicious account detection

Lfgcn: Levitating over graphs with levy flights

A survey on knowledge graph embedding: Approaches, applications and benchmarks

Metapath2vec: Scalable representation learning for heterogeneous networks

Heterogeneous network representation learning

Meaning and linguistic analysis. Language

Neural message passing for quantum chemistry

Node2vec: Scalable feature learning for networks

Levy random walks on multiplex networks

Inductive representation learning on large graphs

Hetespaceywalk: A heterogeneous spacey random walk for heterogeneous information network embedding

Association for Computing Machinery

Semi-supervised classification with graph convolutional networks

A survey of link prediction in complex networks

Distributed representations of words and phrases and their compositionality

Inductive matrix completion for predicting gene-disease associations

Synthetic lethality and cancer

Task-guided pair embedding in heterogeneous network

Deepwalk: Online learning of social representations

Kg-covid-19: A framework to produce customized knowledge graphs for covid-19 response. Patterns, page 100155

Long-range navigation on complex networks using lévy random walks

Modeling relational data with graph convolutional networks

The monarch initiative in 2019: an integrative data and analytic platform connecting phenotypes to genotypes across species

Pte: Predictive text embedding through large-scale heterogeneous text networks

Republic and Canton of Geneva, CHE, 2015b. International World Wide Web Conferences Steering Committee

Random walk with restart on multiplex and heterogeneous biological networks

Graph attention networks. ArXiv

Learning graph representation with generative adversarial nets

Ensemble-spotting: Ranking urban vibrancy via poi embedding with multi-view spatial graphs

Community preserving network embedding

Heterogeneous graph attention network

A comprehensive survey on graph neural networks

Heterogeneous graph neural network

Homophily, structure, and content augmented network representation learning

Network representation learning: A survey

Predicting multicellular function through multi-layer tissue networks

Modeling polypharmacy side effects with graph convolutional networks