key: cord-0043196-5akx3o58 authors: Chen, Hsi-Wen; Shuai, Hong-Han; Wang, Sheng-De; Yang, De-Nian title: Quality-Aware Streaming Network Embedding with Memory Refreshing date: 2020-04-17 journal: Advances in Knowledge Discovery and Data Mining DOI: 10.1007/978-3-030-47426-3_35 sha: 1f99ae34fd23e4a4155ad00b86806884937fd585 doc_id: 43196 cord_uid: 5akx3o58

Static network embedding has been widely studied to convert sparse structure information into a dense latent space. However, the majority of real networks are continuously evolving, and deriving the whole embedding for every snapshot is computationally intensive. To avoid recomputing the embedding over time, we explore streaming network embedding with two goals: 1) to efficiently identify the nodes required to update their embeddings under multi-type network changes, and 2) to carefully revise the embeddings to maintain transduction over different parts of the network. Specifically, we propose a new representation learning framework, named Graph Memory Refreshing (GMR), to preserve both global and local structural information efficiently. We prove that GMR maintains the consistency of embeddings for isomorphic structures (crucial for network analysis) better than existing approaches do. Experimental results demonstrate that GMR outperforms the baselines while requiring much less time.

Low-dimensional vector representation of nodes in large-scale networks has been widely applied to a variety of domains, such as social media [13], molecular structure [7], and transportation [9]. Previous approaches, e.g., DeepWalk [13], LINE [16], and SDNE [20], are designed to reduce the sparse structure information to a dense latent space for node classification [13], link prediction [16], and network visualization [21]. However, the above embedding schemes were not designed for evolving networks. Popular networks tend to evolve over time, e.g., the average number of friends increased from 155 in 2016 to 338 in 2018 [8]. Ephemeral social networks, like Snapchat for short-term conversations, may disappear within weeks. However, retraining the whole embedding for each snapshot is computationally intensive for a massive network. Therefore, streaming network embedding, which quickly updates and generates new embeddings, is a desirable option.

Different from dynamic network embeddings [12, 21] that analyze a sequence of networks to capture temporal patterns, streaming network embedding updates the embedding from only the changed part of the network. Efficient streaming network embedding faces four main challenges. 1) Multi-type changes. Dynamic changes of networks, with insertions and deletions of nodes and edges, are usually frequent and complex. It is thus important to derive the new embedding in minimum time to timely reflect the new network status. 2) Evaluation of affected nodes. Updating the embeddings of only the nodes neighboring the changed part ignores the ripple effect on the remaining nodes. It is crucial to identify the nodes whose embeddings require updates and to ensure that nodes with similar structures share similar embeddings. 3) Transduction. When a network changes significantly, it is difficult to keep the local proximity between the changed part and the remaining part of the network. It is also important to reflect the change in the global structure. 4) Quality guarantee.
For streaming embeddings based on neural networks (usually regarded as a black box), it is challenging to provide theoretical guarantees on the embedding quality.

To address the above challenges, this paper proposes a new representation learning approach, named Graph Memory Refreshing (GMR). GMR first derives the new embedding of the changed part by decomposing the loss function of Skip-Gram to support multi-type changes. It then carefully evaluates the ripple-effect area and ensures correctness with a globally structure-aware selection strategy, named hierarchical addressing, which efficiently identifies and updates the affected nodes, using beam search to avoid overfitting. To effectively support streaming data, our idea is to interpret the update of embeddings as a memory network with two controllers, a refreshing gate and a percolation gate, that tailor the embeddings from the structural aspect and maintain transduction. GMR then updates the embeddings according to the streaming information of the new network and the stored features (i.e., memory) of the current network to avoid recomputing the embedding of the whole network. Moreover, GMR aims to both preserve the global structural information and maintain the embeddings of isomorphic structures, i.e., to ensure that nodes with similar local structures share similar embeddings. This property is essential for the correctness of network analysis based on network embeddings [18]. We theoretically prove that GMR preserves the consistency of embeddings for isomorphic structures better than the existing approaches do. The contributions of this paper are summarized as follows.

- GMR explores streaming network embedding with quality guarantees. The hierarchical addressing, refreshing gate, and percolation gate efficiently find and update the affected nodes under multi-type changes.
- We prove that the GMR embedding preserves isomorphic structures better than the existing approaches. According to our literature review, this is the first theoretical analysis for streaming network embedding.
- Experimental results show that GMR outperforms the baselines by at least 10.5% for link prediction and node classification in a much shorter time.

Static network embedding has attracted a wide range of attention. Laplacian Eigenmaps [1] and IsoMaps [17] first constructed the adjacency matrix and then solved the matrix factorization, but this approach does not scale to massive networks. After Skip-Gram [11] was demonstrated to be powerful for representation learning, DeepWalk [13] and node2vec [5] employed random walks to learn network embeddings, while LINE [16] and SDNE [20] preserved the first-order and second-order proximity. GraphSAGE [6] and GAT [19] generated node representations in an inductive manner by mapping and aggregating node features from the neighborhood. In addition, a recent line of research learns the embeddings from a sequence of networks over time to find temporal behaviors [12, 21]. However, these approaches focus on capturing the temporal changes rather than on efficiency, since they recompute the embeddings of the whole network instead of updating only the changed part. Another line of recent research studies dynamic embedding without retraining. However, the SVD-based approach [22] has difficulty supporting large-scale networks [5].
Besides, [10] supports only edge insertion and ignores edge deletion, and the consistency of the embeddings for globally isomorphic structures is not ensured. Compared with the above research and [3], the proposed GMR is the only one that provides a theoretical guarantee on the embedding quality (detailed later). It also more accurately preserves both the global structural information and the consistency of the embeddings.

In this section, we present the definitions for streaming network embeddings. A dynamic network $G$ is a sequence of networks $G = \{G_1, \cdots, G_T\}$ over time, where $G_t = (V_t, E_t)$ is the network snapshot at timestamp $t$. $\Delta G_t = (\Delta V_t, \Delta E_t)$ represents the streaming network with the changed part, where $\Delta V_t$ and $\Delta E_t$ are the sets of vertices and edges inserted or deleted between $t$ and $t+1$. Let $z_{i,t}$ denote the streaming network embedding that preserves the structural property of $v_i \in G_t$ at timestamp $t$. The streaming network embeddings are derived by $\Phi^s = (\phi^s_1, \cdots, \phi^s_{t+1}, \cdots, \phi^s_T)$, where $\phi^s_{t+1}$ updates the node embedding $z_{i,t+1}$ at timestamp $t+1$ according to $z_t$ and $\Delta G_t$, i.e.,

$$z_{i,t+1} = \phi^s_{t+1}(z_t, \Delta G_t).$$

In other words, the inputs of the streaming network function are the embeddings at the current time and the changed part of the network. In contrast, for [12, 21], given a dynamic network $G$, the embedding is derived by a sequence of functions $\Phi = (\phi_1, \cdots, \phi_{t+1}, \cdots, \phi_T)$, where $\phi_{t+1}$ maps the node $v_i$ to the $d$-dimensional embedding $z_{i,t+1}$ at timestamp $t+1$, i.e., $z_{i,t+1} = \phi_{t+1}(v_i, G_{t+1})$. Therefore, the inputs are the whole networks at the current and next timestamps.

In the following, we present the problem studied in this paper. Given a streaming network with $\Delta V_t$ and $\Delta E_t$ as the sets of vertices and edges inserted or deleted between $t$ and $t+1$, the goal is to find the streaming network embedding and derive the corresponding embedding quality to ensure that nodes with similar structures share similar embeddings. Later in Sect. 5, we formally present and theoretically analyze the quality of the embedding with a new metric, named the isomorphic retaining score. Moreover, we prove that the proposed GMR preserves the structures better than other state-of-the-art methods in Theorem 1.

In this section, we propose Graph Memory Refreshing (GMR) to support multi-type embedding updates, to identify the affected nodes required to update the embeddings by hierarchical addressing, and to ensure that nodes with similar structures share similar embeddings. To effectively support streaming data, we leverage the controllers (refreshing and percolation gates) of memory networks [4] to refresh the memory (update the embedding) according to the current state (the current embedding) and the new input (streaming network). For each node $v_i$, the Skip-Gram model predicts the context nodes $v_j \in N(v_i)$ and maximizes the log probability

$$\max \sum_{v_j \in N(v_i)} \log p(v_j \mid v_i). \qquad (4.1)$$

However, it is computationally intensive to derive the above probabilities for all nodes. Therefore, the probabilities are approximated by negative sampling [11],

$$\log p(v_j \mid v_i) \approx \log \sigma(z_j^{\top} z_i) + \sum_{n=1}^{K} \mathbb{E}_{v_n \sim P_N(v_i)}\left[\log \sigma(-z_n^{\top} z_i)\right], \qquad (4.2)$$

where $\sigma(x) = 1/(1+e^{-x})$ is the sigmoid function, $z_i$ and $z_j$ are respectively the embedding vectors of $v_i$ and $v_j$, $P_N(v_i)$ is the noise distribution for negative sampling, and $K$ is the number of negative samples. The two terms respectively model the observed neighborhoods and the negative samples (i.e., node pairs without an edge) drawn from distribution $P_N(v_i)$. However, Eq. (4.2) focuses on only the edge insertion. To support the edge deletion, the second part of Eq.
(4.2) is revised to consider the unpaired negative samples and the deletions as follows:

$$\log \sigma(z_j^{\top} z_i) + \sum_{n=1}^{K} \mathbb{E}_{v_n \sim P_N(v_i)}\left[\log \sigma(-z_n^{\top} z_i)\right] + \alpha \sum_{(v_i, v_k) \in D} \log \sigma(-z_k^{\top} z_i), \qquad (4.3)$$

where $D$ is the set of deleted edges, and $\alpha$ is required to be greater than 1 because the samples from $D$ usually provide more information than the unpaired negative samples from $P_N(v_i)$. Note that node deletion is handled by removing all incident edges of the node, while adding a node with new edges is regarded as edge insertion.

Fig. 1. (a) Construction of the addressing tree, t = 1. (b) Search for the most affected nodes for $v_4$ on the addressing tree, t = 2.

For streaming network embedding, previous computationally intensive approaches [4] find the embeddings of all nodes by global addressing. A more efficient way is to update only the neighboring nodes of the changed part with local addressing [10]. However, the ripple-effect area usually has an arbitrary shape (i.e., it includes not only the neighboring nodes). Therefore, instead of extracting the neighboring nodes with heuristics, hierarchical addressing systematically transforms the original network into a search tree that is aware of the global structure for the efficient identification of the affected nodes whose embeddings must be updated. Hierarchical addressing has the following advantages. 1) Efficient. It can be regarded as a series of binary classifications (on a tree), whereas global addressing and local addressing are multi-class classifications (on a candidate list). Therefore, the time complexity to consider each node in $\Delta V_t$ is reduced from $O(|V_t|)$ to $O(k \log |V_t|)$, where $k$ is the number of search beams (explained later). 2) Topology-aware. It carefully examines the graph structure to evaluate the proximity and maintain the isomorphic structure, i.e., to ensure that nodes with similar structures share similar embeddings. This property is essential for the correctness of network analysis with network embeddings [18].

Specifically, hierarchical addressing first exploits graph coarsening to build an addressing tree for the efficient search of the affected nodes. Graph coarsening includes both first-hop and second-hop collapsing: first-hop collapsing preserves the first-order proximity by merging two adjacent nodes into a supernode; second-hop collapsing aggregates the nodes with a common neighbor into a supernode, where the embedding of the supernode is averaged from its child nodes [2]. Second-hop collapsing is prioritized because it can effectively compress the network into a smaller tree. The network is accordingly transformed into an addressing tree with each node $v \in V_t$ as a leaf node. Afterward, for each node $v_i \in \Delta V_t$, we search for the node $v_j \in V_t$ sharing the highest similarity with $v_i$ as the first affected node for $v_i$ by comparing their cosine similarity [4] along the addressing tree. For each node in the tree, if the left child node shares a greater similarity with $v_i$, the search continues on the left subtree; otherwise, it searches the right subtree. The similarity search ends when it reaches the leaf node with the highest similarity to $v_i$, and any node in $V_t$ (not only the neighbors of $v_i$) is thereby allowed to be extracted. In other words, hierarchical addressing enables GMR to extract affected nodes located in different parts of the network (not necessarily close to $v_i$), whereas previous approaches [3, 10, 21] update only the neighboring nodes of $v_i$.
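As an illustration of the procedure above, the following is a minimal sketch of an addressing tree with beam-search descent, assuming cosine similarity and supernode embeddings averaged from the child embeddings as described. All names are illustrative rather than taken from the authors' implementation, and the toy tree merely reuses the embeddings of Fig. 1; the printed similarity values need not match the figure, which may apply further normalization.

```python
import numpy as np

class TreeNode:
    """Node of the addressing tree: leaves are original vertices,
    internal nodes are supernodes produced by graph coarsening."""
    def __init__(self, embedding, children=(), vertex=None):
        self.embedding = np.asarray(embedding, dtype=float)
        self.children = list(children)
        self.vertex = vertex  # vertex id for leaves, None for supernodes

def supernode(children):
    """Collapse child nodes into a supernode whose embedding is the
    average of the child embeddings (first/second-hop collapsing)."""
    return TreeNode(np.mean([c.embedding for c in children], axis=0), children)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def beam_search(root, query, k=3, h=0.8):
    """Descend the addressing tree while keeping only the k candidates
    most similar to `query`; return leaves whose similarity exceeds h."""
    frontier, leaves = [root], []
    while frontier:
        expanded = []
        for node in frontier:
            if node.children:
                expanded.extend(node.children)  # expand a supernode
            else:
                leaves.append(node)             # reached a leaf
        # prune the frontier to the k most promising candidates (the "beams")
        frontier = sorted(expanded,
                          key=lambda n: cosine(n.embedding, query),
                          reverse=True)[:k]
    return [(leaf.vertex, cosine(leaf.embedding, query))
            for leaf in leaves if cosine(leaf.embedding, query) >= h]

# Toy tree of Fig. 1: u12 = supernode(v1, v2), u123 = supernode(u12, v3).
v1 = TreeNode((0.4, 0.4), vertex="v1")
v2 = TreeNode((0.2, 0.8), vertex="v2")
v3 = TreeNode((0.8, 0.1), vertex="v3")
root = supernode([supernode([v1, v2]), v3])
print(beam_search(root, np.asarray([0.3, 0.2]), k=2))  # affected nodes for v4
```

Pruning the frontier to k beams is what reduces the per-node search cost to $O(k \log |V_t|)$ while still allowing nodes far from the changed part to be reached.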
Afterward, hierarchical addressing extracts the top-1 result for each node in $\Delta V_t$ as the initially affected nodes (more will be included later), where nodes with a similarity smaller than a threshold $h$ are filtered out. To avoid being trapped in a local minimum, hierarchical addressing can also extract the top-$k$ results at each iteration with beam search.

Figure 1 presents an example of hierarchical addressing with embedding dimension 2. At timestamp t = 1 (Fig. 1(a)), we construct the addressing tree by first merging nodes $v_1$ and $v_2$ into supernode $u_{12}$ through second-hop collapsing. The embedding of $u_{12}$ is $0.5 \cdot (0.4, 0.4) + 0.5 \cdot (0.2, 0.8) = (0.3, 0.6)$. Afterward, $v_3$ and $u_{12}$ are merged into $u_{123}$ through first-hop collapsing, and $u_{123}$ is the root of the tree. At t = 2 (Fig. 1(b)), if a new node $v_4$ with embedding $(0.3, 0.2)$ is linked to $v_1$, we identify the affected nodes with beam search ($k = 2$), starting from the root $u_{123}$. First, we insert $v_3$ and $u_{12}$ into the search queue of size 2 (since $k = 2$) to compare the similarity of $v_4$ with that of $v_3$ and $u_{12}$. Both $u_{12}$ and $v_3$ are then popped from the queue because $v_1$ and $v_2$ have higher similarity, i.e., the top-2 results (0.78 and 0.98), compared with 0.73 for $v_3$.

After identifying the nodes required to update the embeddings by hierarchical addressing, a simple approach is to update the embeddings of those affected nodes with a constant shift [6, 20]. However, a streaming network with a topology change on only a subset of nodes usually leads to different shifts for nodes in distinct locations. Moreover, updating only the nodes extracted from hierarchical addressing is insufficient to ensure the consistency of embeddings for nodes with similar structures when the embeddings are tailored independently. To effectively support streaming data, inspired by the gating mechanism in GRU [4], we parameterize the update of the embedding according to the current embedding and the incoming streaming network. Specifically, GMR decomposes the update procedure into two controller gates: a refreshing gate $g_r$ and a percolation gate $g_p$. For each node $v_j$ selected by hierarchical addressing for each $v_i \in \Delta V_t$, the refreshing gate first updates the embedding of $v_j$ according to the new embedding of $v_i$, and the percolation gate then updates the embedding of every neighbor $v_k$ of $v_j$ from the new embedding of $v_j$. The refreshing gate quantifies the embedding update for $v_j$ from an incoming stream (i.e., a one-to-one update), while the percolation gate transduces the embedding of $v_j$ to its neighborhood (i.e., a one-to-many update) to better preserve the local structure. The two gates are the cornerstones of maintaining the isomorphic structure, as proved later in Theorem 1. To update the embedding of $v_j$, i.e., updating $z_{j,t}$ to $z_{j,t+1}$, we first define a shared function $a_r$ to find the refreshing coefficient $\rho_r$, which represents the correlation between the embedding of $v_j$ and the new embedding of $v_i$, i.e., $\rho_r = a_r(z_{i,t+1}, z_{j,t})$. The refreshing gate selects the correlation function [19] as the shared function $a_r$ to extract the residual relation [19] between the two embeddings, instead of directly adopting a constant shift as was done in previous work. Here $a_r \in \mathbb{R}^{2d}$ is a shift projection, and $\rho_r$ is derived by $a_r^{\top}[z_{i,t+1} \,\|\, z_{j,t}]$, where $\|$ is the vector concatenation operation.
After this, we regulate the refreshing coefficient $\rho_r$ into $[0, 1]$ by a sigmoid function, $g_r = \sigma(\rho_r)$, to provide a non-linear transformation. Therefore, $g_r$ quantifies the extent to which $z_{i,t+1}$ affects $z_{j,t}$:

$$z_{j,t+1} = (1 - g_r)\, z_{j,t} + g_r\, z_{i,t+1}. \qquad (4.4)$$

Thereafter, the percolation gate revises the embeddings of the neighbor nodes of $v_j$ to ensure the consistency of the embeddings for nodes with similar structures. The percolation gate learns another sharable vector $a_p \in \mathbb{R}^{2d}$ and finds the percolation coefficient $\rho_p = a_p^{\top}[z_{j,t+1} \,\|\, z_{k,t}]$ to quantify the extent to which $v_j$ affects $v_k$. Similarly, we regulate $\rho_p$ by $g_p = \sigma(\rho_p)$ to update $z_{k,t}$ as follows:

$$z_{k,t+1} = (1 - g_p)\, z_{k,t} + g_p\, z_{j,t+1}. \qquad (4.5)$$

Therefore, when the refreshing and percolation gates are 0, the streaming network is ignored. In contrast, when both gates become 1, the previous snapshot embedding is dropped accordingly. In summary, the refreshing and percolation gates act as decision makers that learn the impact of the streaming network on different nodes. For the percolation gate, when node $v_j$ is updated, the percolation gate tailors the embedding of each $v_k \in N_1(v_j)$ by evaluating the similarity of $v_j$ and $v_k$ according to the embeddings $z_k$ and $z_j$. If $v_j$ and $v_k$ share many common neighbors, the percolation value of $(v_j, v_k)$ increases to draw $z_k$ and $z_j$ closer to each other. The idea is similar for the refreshing gate. Note that $a_r$ and $a_p$ are both differentiable and can be trained in an unsupervised setting by maximizing the objective in Eq. (4.3). The unsupervised loss can also be replaced or augmented by a task-oriented objective (e.g., cross-entropy loss) when labels are provided. We alternately update the embeddings (i.e., $z_{i,t}$ and $z_{j,t}$) and the correlation parameters (i.e., $a_r$ and $a_p$) to achieve better convergence.

Figure 2 illustrates an example of updating the node $v_3$. After the embedding of $v_3$ is updated from $(0.8, 0.1)$ to $(0.9, 0.1)$, GMR uses the percolation gate to transduce the embedding to the neighboring nodes (i.e., $v_1$, $v_2$, and $v_4$) to preserve the local structure. Since $v_1$ shares more common neighbors ($v_4$) with $v_3$ than $v_2$ does (none), the values of the percolation gate for $v_1$ and $v_2$ are 0.8 and 0.5, respectively. The embeddings of nodes $v_1$ and $v_2$ become $(0.80, 0.16) = 0.2 \cdot (0.4, 0.4) + 0.8 \cdot (0.9, 0.1)$ and $(0.55, 0.45) = 0.5 \cdot (0.2, 0.8) + 0.5 \cdot (0.9, 0.1)$ through the percolation gate from $v_3$, respectively. Therefore, the relative distances between $z_3$ and $z_1$ and between $z_3$ and $z_2$ can be maintained.

The quality of a network embedding can be empirically evaluated through network analysis tasks, e.g., link prediction [16] and node classification [13], since network embedding is unsupervised learning without knowing the ground truth. In contrast, when the network analysis task is unknown a priori, it is important to theoretically analyze the quality of the network embedding. To achieve this goal, we first define isomorphic pairs and prove that the embeddings of isomorphic pairs are the same in GMR. This property has been regarded as a very important criterion to evaluate the quality of network embedding [18], because nodes with similar structures must share similar embeddings. Moreover, the experimental results in Sect. 6 show that a higher quality leads to better performance on task-oriented metrics.

Definition 4 (Isomorphic Pair). Any two different nodes $v_i$ and $v_j$ form an isomorphic pair if the sets of their first-hop neighbors $N_1(\cdot)$ are the same.

Lemma 1. If $(v_i, v_j)$ and $(v_j, v_k)$ are isomorphic pairs, then $(v_i, v_k)$ is also an isomorphic pair. Proof: According to Definition 4, $N_1(v_i) = N_1(v_j) = N_1(v_k)$; hence $(v_i, v_k)$ is also an isomorphic pair.
Lemma 2. After the embedding algorithm converges, $(v_i, v_j)$ is an isomorphic pair if and only if $z_i = z_j$.

Proof: We first prove the sufficient condition. If $(v_i, v_j)$ is an isomorphic pair with $z_i \neq z_j$, the probability of $v_i$ predicting the context nodes is not equal to that of $v_j$ (Eq. (4.1)). Therefore, there exists a better solution that makes $z_i$ and $z_j$ equal, contradicting the condition that the algorithm has converged. For the necessary condition, if $z_i = z_j$ but $(v_i, v_j)$ is not an isomorphic pair, then since the probabilities are equal and the algorithm has converged, $N(v_i)$ should be identical to $N(v_j)$ by Eq. (4.1), contradicting that $(v_i, v_j)$ is not an isomorphic pair. The lemma follows.

As proved in [14], network embedding algorithms can be unified into the factorization of an affinity matrix. Therefore, nodes with the same first-hop neighborhood have the same embedding when the decomposition ends. Based on Lemma 2, we define the isomorphic retaining score as follows.

Definition 5 (Isomorphic Retaining Score). The isomorphic retaining score, denoted as $S_t$, is the average cosine similarity over every isomorphic pair in $G_t$, $S_t \in [-1, 1]$. Specifically,

$$S_t = \frac{1}{|\xi_t|} \sum_{(v_i, v_j) \in \xi_t} s_{ij,t}, \qquad (5.1)$$

where $s_{ij,t}$ is the cosine similarity between $z_{i,t}$ and $z_{j,t}$, and $\xi_t$ is the set of isomorphic pairs in $G_t$. In other words, the embeddings of any two nodes $v_i$ and $v_j$ with the same structure are more consistent with each other if $s_{ij,t}$ is close to 1 [18]. The experimental results in the next section show that higher isomorphic retaining scores lead to better performance in terms of 1) the AUC score for link prediction and 2) the Macro-F1 score for node classification. The following theorem proves that GMR retains the isomorphic structure better than other Skip-Gram-based approaches, e.g., [5, 13, 16], under edge insertion. Afterward, the time complexity analysis is presented.

Theorem 1. Under edge insertion, the isomorphic retaining score of GMR is at least that of the other Skip-Gram-based approaches [5, 13, 16]. Proof: Due to the space constraint, Theorem 1 is proved in the online version.

Time Complexity. In GMR, the initialization of the addressing tree involves $O(|V_1|)$ time. For each $t$, GMR first updates the embeddings of $\Delta V_t$ in $O(|\Delta V_t| \log |\Delta V_t|)$ time. After this, hierarchical addressing takes $O(k|\Delta V_t| \log |V_t|)$ time to identify the affected nodes. Notice that it requires $O(|\Delta V_t| \log |V_t|)$ time to update the addressing tree. To update the affected nodes, the refreshing and percolation gates respectively involve $O(1)$ and $O(d_{max})$ time for one affected node, where $d_{max}$ is the maximum node degree of the network. Therefore, updating all the affected nodes requires $O(k d_{max} |\Delta V_t|)$ time, and the overall time complexity of GMR is $O(k d_{max} |\Delta V_t| + k|\Delta V_t| \log |V_t|)$, while retraining the whole network requires $O(|V_t| \log |V_t|)$ time at each timestamp. Since $k$ is a small constant, $d_{max} \ll |V_t|$, and $|\Delta V_t| \ll |V_t|$, GMR is faster than retraining.
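Since the isomorphic retaining score $S$ is reported for every method in the experiments below, a minimal sketch of its computation may help. It follows Definition 4 and the averaged form of Eq. (5.1) as reconstructed above; the function names and the toy graph are illustrative, not from the authors' code.

```python
import numpy as np
from itertools import combinations

def isomorphic_pairs(adj):
    """All pairs of nodes whose first-hop neighbor sets are identical
    (Definition 4). `adj` maps each node to its set of neighbors."""
    return [(i, j) for i, j in combinations(sorted(adj), 2)
            if adj[i] == adj[j]]

def retaining_score(emb, pairs):
    """Isomorphic retaining score S_t: mean cosine similarity over the
    isomorphic pairs (Eq. (5.1)); NaN when the graph has no such pair."""
    sims = [float(emb[i] @ emb[j] /
                  (np.linalg.norm(emb[i]) * np.linalg.norm(emb[j])))
            for i, j in pairs]
    return float(np.mean(sims)) if sims else float("nan")

# Toy graph: v1 and v2 both have the neighbor set {v3}, so (v1, v2)
# is an isomorphic pair and their embeddings should be similar.
adj = {"v1": {"v3"}, "v2": {"v3"}, "v3": {"v1", "v2"}}
emb = {"v1": np.array([0.4, 0.4]),
       "v2": np.array([0.2, 0.8]),
       "v3": np.array([0.8, 0.1])}
print(retaining_score(emb, isomorphic_pairs(adj)))  # one pair: (v1, v2)
```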
To evaluate the effectiveness and efficiency of GMR, we compare GMR with the state-of-the-art methods on two tasks, i.e., link prediction and node classification. For the baselines, we compare GMR with 1) Full, which retrains the whole network with DeepWalk [13]; 2) Change [3], which takes only the changed part as the samples for DeepWalk; 3) GraphSAGE [6], which derives the embeddings by graph inductive learning; 4) SDNE [20], which extends the auto-encoder model to generate the embeddings of new nodes from the embeddings of their neighbors; 5) CTDNE [12], which performs a biased random walk on the dynamic network; 6) DNE [3], which updates only one affected node; 7) SLA [10], which handles only node/edge insertion; and 8) DHPE [22], an SVD-based method built on matrix perturbation theory. The default $\alpha$, $h$, $k$, $d$, batch size, and learning rate are 1, 0.8, 3, 64, 16, and 0.001, respectively. Stochastic gradient descent (SGD) with Adagrad is adopted to optimize the loss function.

For link prediction, three real datasets [15] of streaming networks are evaluated: Facebook (63,731 nodes, 1,269,502 edges, and 736,675 timestamps), Yahoo (100,001 nodes, 3,179,718 edges, and 1,498,868 timestamps), and Epinions (131,828 nodes, 841,372 edges, and 939 timestamps). Facebook and Epinions contain both edge insertions and deletions, represented by "i j -1 t" for removing edge (i, j) at timestamp t; Yahoo lacks deletions since it is a message network. The concatenated embedding $[z_i \,\|\, z_j]$ of a pair $(v_i, v_j)$ is employed as the feature to predict the link by logistic regression; at time $t$, we predict the new edges for time $t+1$ (excluding the edges incident to the nodes arriving at time $t+1$). Table 1 reports the AUC [5], the isomorphic retaining score $S$ in Eq. (5.1), and the running time of the different methods. For Full, due to the high computational cost of retraining the network at every timestamp, we partition all timestamps into 50 parts [23] and aggregate the network changes within each part.

The results show that the proposed GMR achieves the best AUC among all streaming network embedding algorithms. Compared with the other state-of-the-art baselines, GMR improves the AUC by at least 17.1%, 15.7%, and 11.3% on Facebook, Yahoo, and Epinions, respectively. Besides, the AUC of GMR is close to that of Full (1.7% less on Facebook, 0.6% more on Yahoo, and 2.2% less on Epinions), while its running time is only 4.7% of Full's. Moreover, GraphSAGE shows relatively weak performance since it cannot preserve the structural information without node features. The running time of SDNE is 2.1× that of GMR due to the processing of the deep structure, while the AUC of SDNE is at least 12.5% lower than that of GMR on all datasets. Compared with the other streaming network embedding methods (e.g., DNE, SLA, and DHPE), GMR achieves at least a 10.8% improvement because the embeddings of the other methods are updated without considering the global topology. In contrast, GMR selects the affected nodes by the globally structure-aware hierarchical addressing, and the selected nodes are not restricted to nearby nodes. Furthermore, GMR outperforms the baselines on the isomorphic retaining score since it percolates the embeddings to preserve the structural information. Note that the isomorphic retaining score $S$ is highly correlated with the AUC (correlation coefficient 0.92), demonstrating that it is indeed crucial to ensure the embedding consistency for nodes with similar structures.
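For reference, the evaluation protocol described above (concatenated embeddings fed to logistic regression, scored by AUC) can be sketched as follows, assuming scikit-learn is available. The function names are illustrative, and the loading of the datasets, the timestamp-based split, and the sampling of non-edges as negatives are omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def edge_features(emb, pairs):
    """Concatenated embedding [z_i || z_j] for each candidate pair."""
    return np.stack([np.concatenate([emb[i], emb[j]]) for i, j in pairs])

def link_prediction_auc(emb, train_pairs, train_labels,
                        test_pairs, test_labels):
    """Fit logistic regression on the edges observed up to time t and
    score the candidate edges of time t+1 by AUC."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(edge_features(emb, train_pairs), train_labels)
    scores = clf.predict_proba(edge_features(emb, test_pairs))[:, 1]
    return roc_auc_score(test_labels, scores)
```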
For node classification, we compare the different approaches on BlogCatalog [16] (10,132 nodes, 333,983 edges, and 39 classes), Wiki [5] (2,405 nodes, 17,981 edges, and 19 classes), and DBLP [22] (101,253 nodes, 223,810 edges, 48 timestamps, and 4 classes). DBLP is a real streaming network extracted from the paper citation network of four research areas from 1970 to 2017. BlogCatalog and Wiki are adopted from previous research [3] to generate the streaming networks. The learned embeddings are employed to classify the nodes according to the labels. Cross-entropy is adopted as the loss function for classification with logistic regression. We randomly sample 20% of the labels for training and 80% for testing, and the average results over 50 runs are reported.

Table 2 demonstrates that GMR outperforms Change by 27.1% in terms of Macro-F1 [13], and it is close to Full with a 20.7× speed-up. The Macro-F1 scores of GraphSAGE and SDNE are at least 40% worse than that of GMR, indicating that GraphSAGE and SDNE cannot adequately handle multi-type changes in dynamic networks. Moreover, GMR achieves a larger improvement on BlogCatalog than on DBLP because the density (i.e., the average degree) of BlogCatalog is larger, enabling the hierarchical addressing of GMR to exploit more structural information when updating multiple nodes. For DBLP, GMR also achieves performance close to Full. It is worth noting that the isomorphic retaining score $S$ is also positively related to Macro-F1. We further investigate the percentages of isomorphic pairs with the same label on the different datasets. The results show that 88%, 92%, and 97% of the isomorphic pairs share the same labels on BlogCatalog, Wiki, and DBLP, respectively. Therefore, it is crucial to maintain the consistency between isomorphic pairs, since the similar embeddings of isomorphic pairs are inclined to be classified with the same labels.

In this paper, we propose GMR for streaming network embedding, featuring hierarchical addressing, a refreshing gate, and a percolation gate to preserve the structural information and consistency. We also prove that the embeddings generated by GMR are more consistent than those of the current network embedding schemes under insertion.
The experimental results demonstrate that GMR outperforms the state-of-the-art methods in link prediction and node classification. Moreover, multi-type updates with the beam search improve GMR in both the task-oriented scores and the isomorphic retaining score. Our future work will extend GMR to support multi-relational knowledge graphs.
References
[1] Laplacian eigenmaps and spectral techniques for embedding and clustering
[2] HARP: hierarchical representation learning for networks
[3] Dynamic network embedding: an extended approach for skip-gram based network embedding
[4] Neural Turing machines
[5] node2vec: scalable feature learning for networks
[6] Inductive representation learning on large graphs
[7] Mol2vec: unsupervised machine learning approach with chemical intuition
[8] SNAP datasets: Stanford large network dataset collection
[9] Multi-layered network embedding
[10] Real-time streaming graph embedding through local actions
[11] Distributed representations of words and phrases and their compositionality
[12] Continuous-time dynamic network embeddings
[13] DeepWalk: online learning of social representations
[14] Network embedding as matrix factorization: unifying DeepWalk, LINE, PTE, and node2vec
[15] The network data repository with interactive graph analytics and visualization
[16] LINE: large-scale information network embedding
[17] A global geometric framework for nonlinear dimensionality reduction
[18] VERSE: versatile graph embeddings from similarity measures
[19] Graph attention networks
[20] Structural deep network embedding
[21] Dynamic network embedding by modeling triadic closure process
[22] High-order proximity preserved embedding for dynamic networks
[23] Online learning to rank in stochastic click models