key: cord-0204831-95mdr46y authors: Gao, Fei; Zhang, Yan; Zhang, Jiang title: NEDMP: Neural Enhanced Dynamic Message Passing date: 2022-02-14 journal: nan DOI: nan sha: 146bc5cfc9843eda3ee8bf7c71dbd37677f31179 doc_id: 204831 cord_uid: 95mdr46y

Predicting stochastic spreading processes on complex networks is critical in epidemic control, opinion propagation, and viral marketing. We focus on the problem of inferring the time-dependent marginal probabilities of states for each node, which collectively quantify the spreading results. Dynamic Message Passing (DMP) has been developed as an efficient inference algorithm for several spreading models, and it is asymptotically exact on locally tree-like networks. However, DMP can struggle in diffusion networks with many local loops. We address this limitation by using Graph Neural Networks (GNN) to learn the dependency among messages implicitly. Specifically, we propose a hybrid model in which the GNN module runs jointly with the DMP equations. The GNN module refines the aggregated messages in the DMP iterations by learning from simulation data. We demonstrate numerically that, after training, our model's inference accuracy substantially outperforms DMP under various network structures and dynamics parameters. Moreover, compared to purely data-driven models, the proposed hybrid model has better generalization ability for out-of-training cases, profiting from the dynamics priors explicitly utilized in the hybrid model. A PyTorch implementation of our model is at https://github.com/FeiGSSS/NEDMP.

Stochastic spreading processes on complex networks are widely used for modeling epidemic spreading, information propagation, and the transmission of social behaviors. Given the initial states, accurately predicting the spreading is of primary importance in many domains. Because of the stochastic nature of these processes, it is more reasonable to predict the marginal probability of each state for each node, as shown in Figure 1. In addition to being an accurate description of the spreading, the predicted marginal probabilities can also be used for inferring the source of spreading (Zhu & Ying, 2016), maximizing influence by selecting a fixed-size set of initially activated nodes (Kempe, Kleinberg, & Tardos, 2015), and determining an optimal set of nodes to immunize (Pastor-Satorras & Vespignani, 2002).

Dynamic Message Passing (DMP) (Karrer & Newman, 2010; Shrestha & Moore, 2014; Shrestha, Scarpino, & Moore, 2015; A. Lokhov, Mézard, & Zdeborová, 2015), a special case of Belief Propagation (BP) on time trajectories, is an efficient algorithm for inferring the marginal probabilities of stochastic spreading processes on graphs. For spreading processes with unidirectional dynamics (e.g., SIR, SEIR), DMP is exact on tree graphs and asymptotically exact on locally tree-like graphs, and its computational complexity is linear in the number of edges and spreading time steps. It typically yields accurate estimates on real sparse networks for a large class of spreading dynamics (A. Lokhov et al., 2015). Therefore, as an analytical inference machine, it has been used for inferring patient zero (A. Lokhov, Mézard, Ohta, & Zdeborová, 2014), reconstructing dynamics parameters (A. Lokhov, 2016; Wilinski & Lokhov, 2021), optimal deployment of resources (A. Lokhov & Saad, 2017), and functional immunization of networks (S. Li et al., 2020).
However, as with other Belief Propagation algorithms, the fundamental assumption of DMP is the independence of neighboring messages, which limits its ability to capture high-order interdependencies; that is, DMP can struggle in graphs with short local loops. As illustrated in Figure 2, because of the local loops, DMP fails to approximate the marginal probabilities even in an example graph with only four nodes. Unfortunately, handling the dependence of messages in those loops analytically is extremely challenging (Cantwell & Newman, 2019). The success of Graph Neural Networks (GNNs) in modelling complex pair-wise interactions (Fetaya et al., 2018; Cranmer et al., 2020; Bapst et al., 2020) motivates us to model the dependence of messages in local loops using trainable GNNs. Specifically, GNNs can be used to learn a refined aggregation operation for messages beyond the naive independence assumption used in DMP.

Several works have applied GNNs to improve the estimation accuracy of Belief Propagation algorithms. Gated recurrent GNNs are used in (Yoon et al., 2019) as an end-to-end trainable inference algorithm for binary Markov Random Fields, and the learned GNNs outperform BP on loopy graphs. More recent works focus on integrating the advantages of data-driven and model-based methods. Instead of using a scalar damping factor, which was proposed to accelerate the convergence of BP, Kuck et al. (2020) replace the constant scalar factor with a trainable neural network layer. The resulting hybrid model converges much faster than scalar damping while returning estimates of comparable quality. The methods in (Satorras & Welling, 2021; Liang & Meyer, 2021) aim to improve the estimation accuracy of BP for discrete and continuous random variables, respectively. Since the inaccuracy of BP comes from its oversimplified aggregation of neighboring messages, these methods incorporate GNNs into the BP iterations as a trainable aggregation block that refines the aggregated messages by learning the complex dependence of messages from supervision data. The resulting hybrid model runs BP and GNNs jointly and benefits from the combination of physics priors and data-driven neural networks. However, no work has applied GNNs to improve DMP for better estimation of the marginal probabilities of spreading processes on graphs.

Inspired by (Yoon et al., 2019; Satorras & Welling, 2021), in this work, instead of analytically modeling the complex dependence of messages in local loops, we use a GNN module to learn it from simulation data. To utilize the dynamical priors encoded in the DMP equations, the GNN module is used only to refine the inaccurate aggregation operation in the DMP iteration, while the exact iteration rules of DMP remain unchanged. The resulting hybrid model, which we call Neural Enhanced Dynamic Message Passing (NEDMP), runs DMP and the GNN jointly and benefits from the complementarity of model-based and data-driven components. For better training, we design a penalty term that enforces the monotonicity of the predicted probabilities, consistent with the physical prior of the dynamics. To verify the effectiveness of the hybrid model, we conduct experiments on marginal probability inference for the popular Susceptible-Infected-Recovered (SIR) epidemic model over various graph structures and dynamics parameters.
The main contributions can be summarized as follows: (1) we propose a customized GNN that runs on the line graph; (2) we propose a hybrid model for marginal inference, which runs the GNN and DMP jointly; (3) we design a dynamics-inspired penalty term to train the model; and (4) we conduct experiments on the marginal probability inference problem over various graphs and dynamics parameters. The results show that all the proposed models outperform DMP, and that the hybrid model, NEDMP, generalizes better than purely data-driven models.

Dynamic message passing has been derived as an inference algorithm for a wide range of diffusion processes. Without loss of generality, we use the SIR model to illustrate spreading processes on networks as well as the marginal probability inference problem. For the discrete SIR model on a diffusion graph $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of interactions, each node takes one of three states at any given time: susceptible (S), infected (I), or recovered (R). Let $\sigma_i^t \in \{S, I, R\}$ be the state of node $i$ at time step $t$; the state transitions follow

$$P(\sigma_i^{t+1} = I \mid \sigma_i^t = S) = 1 - \prod_{j \in N_i} \big(1 - \beta_{ji}\,\delta_{\sigma_j^t, I}\big), \qquad P(\sigma_i^{t+1} = R \mid \sigma_i^t = I) = \gamma_i,$$

where $N_i$ is the set of neighbors of node $i$, $\beta_{ji} \in [0, 1]$ is the infection rate of edge $(j \to i)$, $\gamma_i \in [0, 1]$ is the recovery rate of node $i$, and $\delta$ is the Kronecker function. Define the marginal probabilities of node $i$ at time $t$ as $P_i(t) = [P_S^i(t), P_I^i(t), P_R^i(t)]$, where $P_x^i(t) = \mathrm{Prob}(\sigma_i^t = x)$. We now formulate the problem as follows.

Marginal Probability Inference (MPI). For the SIR model on a directed graph $G = (V, E)$ with infection rates $\{\beta_{ij}\}_{(i \to j) \in E}$, recovery rates $\{\gamma_i\}_{i \in V}$, and initial infectious nodes $S \subseteq V$, the goal is to infer the marginal probabilities conditioned on $\beta$, $\gamma$, and $S$:

$$\{P_i(t)\}_{i \in V,\ 0 \le t \le T} \quad \text{given} \ (G, \beta, \gamma, S),$$

where $T$ is a preset time step or the convergence time, and $P_S^i(t) + P_I^i(t) + P_R^i(t) = 1$. In general, the exact computation of the marginals above is #P-hard (Wang, Chen, & Wang, 2012). Figure 1 visualizes marginals evolving over time on an example graph. We define the MPI problem based on the SIR model only for convenience; the definition generalizes easily to other spreading models on graphs, such as the Independent Cascade model (Kempe et al., 2015).

Figure 1: Visualization of the marginal probabilities over time; we use the SI model for illustration. Given a diffusion network with infected nodes (in darkest color) at t = 0, the problem aims to compute $\{P_I^i(t)\}_{i \in V,\ t \ge 1}$, i.e., the time-dependent marginal probabilities (indicated by color) of being infected for each node. The values are obtained via $10^6$ Monte Carlo simulations (a minimal simulation sketch is given below).

Dynamic message passing is the state-of-the-art method for inferring time-dependent marginal probabilities in stochastic processes such as SIR. Utilizing the unidirectionality of the SIR model, the DMP equations for SIR are rigorously derived (A. Lokhov et al., 2015) from the dynamic cavity method (also known as Belief Propagation, BP) on nodes' time trajectories. It has been proved (A. Y. Lokhov & Saad, 2019) that DMP is exact on trees and graphs without short loops. We briefly introduce the DMP equations for the SIR model here; the detailed derivation is presented in the Appendix. We begin by defining three intermediate dynamic variables:

• $\theta^{j \to i}(t)$: the probability that the disease has not spread through the edge $(j \to i)$ up to time $t$;
• $P_S^{i \to k}(t)$: the probability that $\sigma_i^t = S$ when node $i$ ignores infection from its neighbor $k$;
• $\phi^{j \to i}(t)$: the probability that the disease has not spread through the edge $(j \to i)$ up to time $t$ and node $j$ is infected at time $t$ (i.e., $\sigma_j^t = I$).
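Before turning to the DMP equations built from these variables, the following Monte Carlo sketch makes the MPI ground truth concrete: it estimates the time-dependent marginals by directly simulating the SIR transition rules above. This is our own illustration, not the authors' released code; the function name `simulate_sir_marginals` and its interface are ours.

```python
import numpy as np

def simulate_sir_marginals(adj, beta, gamma, seeds, T, n_runs=10000, rng=None):
    """Estimate P_S, P_I, P_R for every node and time step by Monte Carlo.

    adj   : list of neighbor lists of the diffusion graph G
    beta  : dict (j, i) -> infection rate of edge j -> i
    gamma : sequence of per-node recovery rates
    seeds : iterable of initially infected nodes
    T     : number of time steps
    """
    rng = rng or np.random.default_rng(0)
    n = len(adj)
    S, I, R = 0, 1, 2
    counts = np.zeros((T + 1, n, 3))                 # accumulates state indicators
    for _ in range(n_runs):
        state = np.full(n, S)
        state[list(seeds)] = I
        counts[0, np.arange(n), state] += 1
        for t in range(1, T + 1):
            new_state = state.copy()
            for i in range(n):
                if state[i] == S:
                    # each currently infected neighbor j fails to infect i with prob 1 - beta[j, i]
                    p_not = np.prod([1.0 - beta[(j, i)]
                                     for j in adj[i] if state[j] == I] or [1.0])
                    if rng.random() > p_not:
                        new_state[i] = I
                elif state[i] == I and rng.random() < gamma[i]:
                    new_state[i] = R
            state = new_state
            counts[t, np.arange(n), state] += 1
    return counts / n_runs                           # estimated marginals P_x^i(t)
```

With the number of runs on the order of $10^5$–$10^6$, this matches the protocol used in the paper to generate the reference marginals.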
Using the variables above, the marginal probabilities of the SIR model can be approximated as

$$P_S^i(t) = P_S^i(0) \prod_{j \in N_i} \theta^{j \to i}(t), \qquad P_R^i(t) = P_R^i(t-1) + \gamma_i\, P_I^i(t-1), \qquad P_I^i(t) = 1 - P_S^i(t) - P_R^i(t).$$

The value of $\theta^{j \to i}(t)$ is computed by the closed iteration

$$\theta^{j \to i}(t) = \theta^{j \to i}(t-1) - \beta_{ji}\, \phi^{j \to i}(t-1),$$
$$\phi^{j \to i}(t) = (1 - \beta_{ji})(1 - \gamma_j)\, \phi^{j \to i}(t-1) + \big[P_S^{j \to i}(t-1) - P_S^{j \to i}(t)\big],$$
$$P_S^{j \to i}(t) = P_S^j(0) \prod_{k \in N_j \setminus i} \theta^{k \to j}(t),$$

with the initial values

$$\theta^{j \to i}(0) = 1, \qquad \phi^{j \to i}(0) = \delta_{\sigma_j^0, I}, \qquad P_S^{j \to i}(0) = P_S^j(0).$$

Following the equations above, we can recursively compute the marginal probabilities of the SIR model.

Like other BP algorithms, DMP builds on the assumption that the graph structure is a tree, i.e., that messages from different neighbors are independent of each other. This allows DMP to take the simplest approach, a plain product of incoming messages, to aggregate the messages from neighbors, as in Equations (26) and (27). However, this assumption also prevents DMP from handling the high-order dependencies between neighboring messages introduced by local loops. As illustrated in Figure 2, when executing DMP on the simple graph, $\theta^{1 \to 3}(t)$ and $\theta^{2 \to 3}(t)$ are no longer independent for $t \ge 2$, since both are rooted in $\theta^{0 \to 1}(t)$ and $\theta^{0 \to 2}(t)$ through the blue and red paths, respectively. Thus, the marginal probabilities estimated using Equations (26) and (27) are no longer accurate. Analytically extending DMP to high-order neighborhood structures is very challenging (Cantwell & Newman, 2019). Therefore, a reasonable choice is to use a data-driven approach to learn a more appropriate message aggregation function that can capture the complex dependencies between messages.

This section introduces our hybrid model, in which DMP iterates jointly with a GNN module. The GNN module is designed to refine the aggregated messages in the iteration process of DMP. Notice that DMP for SIR is essentially a message-passing process on the line graph $L = (N_L, E_L)$ with non-backtracking adjacency, i.e., $N_L = E$ and $E_L = \{(i \to j) \to (j \to k)\}_{i, j, k \in V,\ i \ne k}$. To enable better integration of DMP and GNNs, we first define a special case of GNNs on the line graph $L$. Specifically, we customize and extend Gated Graph Neural Networks (Y. Li, Tarlow, Brockschmidt, & Zemel, 2016) on graph $L$.

Mathematically, at every time step $t > 0$, each node $(i \to j)$ in graph $L$ is associated with a hidden state $h_{i \to j}(t) \in \mathbb{R}^D$. In our scenario, node $(i \to j)$ also receives a time-varying input $x_{i \to j}(t) \in \mathbb{R}^F$. We combine the input with the hidden state to form the message $\tilde h_{i \to j}(t)$:

$$\tilde h_{i \to j}(t) = \phi_m\big(h_{i \to j}(t) \oplus x_{i \to j}(t)\big),$$

where $\oplus$ is the concatenation operator. We then aggregate the incoming messages for the target node $(i \to j)$:

$$\hat h_{i \to j}(t) = \phi_a\Big(\textstyle\sum_{k \in N_i \setminus j} \phi_e\big(\tilde h_{k \to i}(t)\big)\Big).$$

Finally, we update the hidden state of every node $(i \to j)$ based on the aggregated message and the current state via a gated recurrent unit (GRU):

$$h_{i \to j}(t+1) = \mathrm{GRU}\big(h_{i \to j}(t),\ \hat h_{i \to j}(t)\big).$$

The functions $\phi_m$, $\phi_e$, and $\phi_a$ are nonlinear functions mapping their inputs to $\mathbb{R}^D$, instantiated as multilayer perceptrons (MLPs) with rectified linear units (ReLU) as the activation function. The parameters of $\phi_m$, $\phi_e$, $\phi_a$, and the GRU are shared by all nodes $(i \to j)$ and all time steps $t$. The equations above define one iteration of the GNN from time step $t$ to $t+1$ on the line graph $L$ (one such update step is sketched in code below). To achieve time-dependent prediction, we can iterate these equations and feed $\{h_{i \to j}(t)\}_{t > 0}$ to a nonlinear readout function producing $\hat y_i(t)$ and $\hat y_{i \to j}(t)$, the node-wise and edge-wise predictions on graph $G$.

As discussed in Section 2.2, the inference inaccuracy of DMP is due to its inability to capture high-order interdependencies, whereas graph neural networks are good at capturing high-order relations (Wu et al., 2019). To overcome this limitation of DMP while preserving its precise, physically inspired part, we propose a hybrid model of GNNs and DMP, which is an extension of NEBP (Satorras & Welling, 2021). Specifically, the hybrid model runs the GNN and DMP jointly on the line graph $L$.
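As referenced above, one update step of the line-graph GNN of Section 3.1 can be sketched in PyTorch as follows. This is our own minimal rendering, not the released implementation: the class name `LineGraphGNNStep` and its interface are ours, and the exact placement of $\phi_e$ and $\phi_a$ around the sum reflects our reading of the equations above.

```python
import torch
import torch.nn as nn

class LineGraphGNNStep(nn.Module):
    """One iteration h(t) -> h(t+1) of the line-graph GNN (illustrative sketch)."""
    def __init__(self, d_hidden, d_input):
        super().__init__()
        def mlp(d_in):
            return nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, d_hidden))
        self.phi_m = mlp(d_hidden + d_input)   # combine hidden state with input x
        self.phi_e = mlp(d_hidden)             # transform each incoming message
        self.phi_a = mlp(d_hidden)             # transform the aggregated message
        self.gru = nn.GRUCell(d_hidden, d_hidden)

    def forward(self, h, x, nb_src, nb_dst):
        # h: [|E|, D] hidden states of line-graph nodes, i.e. directed edges of G
        # x: [|E|, F] time-varying inputs, e.g. the DMP messages theta(t)
        # (nb_src[m], nb_dst[m]) enumerate non-backtracking pairs (k->i) -> (i->j), k != j
        msg = self.phi_m(torch.cat([h, x], dim=-1))          # combined messages
        agg = torch.zeros_like(h)
        agg.index_add_(0, nb_dst, self.phi_e(msg)[nb_src])   # sum over incoming edges
        return self.gru(self.phi_a(agg), h)                  # GRU update -> h(t+1)
```

In the hybrid model described next, a module of this kind receives the DMP messages $\theta(t)$ as its inputs $x(t)$, and a readout over the hidden states produces the time-dependent predictions.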
The GNN module preserves and updates a hidden state for each node of $L$ by incorporating the messages from the DMP iterations; in return, it outputs a refinement of the aggregated messages in DMP. We name this model Neural Enhanced Dynamic Message Passing (NEDMP); it benefits from the complementary strengths of the two components. We adopt the GNN variant introduced in Section 3.1, denoted $\Phi$, and provide the messages $\{\theta^{i \to j}(t)\}$ from DMP as the inputs $\{x_{i \to j}(t)\}$ to $\Phi$. The hidden states $\{h_{i \to j}(0)\}$ are initialized from the initial conditions and dynamics parameters of the spreading process.

Since we aim to refine the aggregated messages $\bar P_S^i(t)$ and $\bar P_S^{i \to k}(t)$ in the DMP iteration, it is reasonable to integrate those values into $\Phi$ as the prediction baseline. Therefore, we modify the readout function in $\Phi$ so that it produces $\hat y_i(t)$ and $\hat y_{i \to k}(t)$ through a nonlinear function $\phi_R$ mapping its inputs to $[0, 1]^2$, instantiated as a multilayer perceptron (MLP) with a Sigmoid activation. Utilizing the readouts of module $\Phi$, we finally refine the messages by applying an affine transformation of the aggregated DMP messages parameterized by the readouts. The refined $P_S^i(t)$ and $P_S^{i \to k}(t)$ are fed back to DMP for the next iteration. The whole framework of NEDMP is visualized in Figure 3.

We can obtain the ground-truth marginal probabilities $q_i(t) = [q_S^i(t), q_I^i(t), q_R^i(t)]$ by extensive Monte Carlo simulations. Therefore, the simplest way to optimize the proposed model NEDMP is to minimize the cross-entropy (CE) loss

$$\mathcal{L}_{\mathrm{CE}} = -\sum_{t=1}^{T} \sum_{i \in V} \sum_{x \in \{S, I, R\}} q_x^i(t)\, \log P_x^i(t), \qquad (20)$$

where $T$ is the preset time length or the convergence time step, and $P^i(t)$ denotes the probabilities predicted by the model. It is worth noting that the dynamics of the SIR model ensure the monotonicity of each node's marginals, i.e., $P_S^i(t+1) \le P_S^i(t)$ and $P_R^i(t+1) \ge P_R^i(t)$, which the sole objective in Equation (20) may fail to capture. Thus, we add a regularization term to enforce this constraint:

$$\mathcal{L}_{\mathrm{mono}} = \sum_{t=1}^{T} \sum_{i \in V} \Big[\max\big(0,\ P_S^i(t) - P_S^i(t-1)\big) + \max\big(0,\ P_R^i(t-1) - P_R^i(t)\big)\Big].$$

We combine the regularization with the cross-entropy loss using a factor $\lambda$ as the final objective:

$$\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \lambda\, \mathcal{L}_{\mathrm{mono}}. \qquad (22)$$

We conduct experiments on a set of synthetic and real networks to evaluate the performance of the proposed model NEDMP on the MPI problem for the SIR model. Specifically, in Section 4.1 we evaluate the performance of all methods on diverse sets of graphs and dynamics parameters, where the training and testing sets come from the same distribution. In Section 4.2, we evaluate the generalization ability of the proposed models on instances whose graph structure or dynamics parameters lie outside the training distribution.

Data Generation. For training and testing in all experiments, we obtain the ground-truth marginal probabilities by averaging over $10^5$ Monte Carlo simulations.

Baselines. We compare three methods on the MPI problem for the SIR model:

• DMP (A. Lokhov et al., 2014): an efficient inference algorithm for the SIR model, asymptotically exact on locally tree-like networks.
• GNN (Yoon et al., 2019): this work proposed two purely data-driven models for inferring marginal probabilities in probabilistic graphical models. Since the variant msg-GNN (similar to the model proposed in Section 3.1) increases the computational cost with no improvement in accuracy, we adapt the variant node-GNN to our problem (details in the Appendix).
• NEDMP: a hybrid model in which the DMP and GNN modules run jointly. The GNN module is designed to improve the accuracy of the messages in DMP by learning the high-order dependencies from simulation data.

Evaluation Metrics. The averaged L1 error is used as the performance metric for all experiments:

$$L_1 = \frac{1}{3\,|V|\,T} \sum_{t=1}^{T} \sum_{i \in V} \sum_{x \in \{S, I, R\}} \big|P_x^i(t) - q_x^i(t)\big|.$$

Training Details. We optimize the proposed models with the loss function in Eq. (22) with $\lambda = 5$. The maximum time step is $T = 30$.
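For concreteness, the objective in Eq. (22) can be sketched in PyTorch as follows. This is our own rendering (the function name `nedmp_loss` and the tensor layout are ours, not from the released code), and the ReLU penalty on consecutive time steps is one natural way to implement the monotonicity regularizer described above.

```python
import torch
import torch.nn.functional as F

def nedmp_loss(pred, target, lam=5.0, eps=1e-8):
    """Cross-entropy between predicted and Monte-Carlo marginals plus a
    penalty on violations of SIR monotonicity (P_S non-increasing,
    P_R non-decreasing). pred, target: [T+1, N, 3], states ordered (S, I, R)."""
    ce = -(target * torch.log(pred + eps)).sum(dim=-1).mean()
    # monotonicity violations between consecutive time steps
    dS = pred[1:, :, 0] - pred[:-1, :, 0]        # should be <= 0
    dR = pred[1:, :, 2] - pred[:-1, :, 2]        # should be >= 0
    mono = F.relu(dS).mean() + F.relu(-dR).mean()
    return ce + lam * mono
```

Here `pred` and `target` hold the model's marginals and the Monte Carlo ground truth for all nodes and time steps, and `lam=5.0` corresponds to the value of $\lambda$ used in the experiments.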
We use the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.01 and a batch size of 1. The learning rate is reduced by a factor of 0.5 whenever learning stagnates, and we use early stopping with a patience of 15. The dimension of all hidden states is D = 32. Each dataset in the following experiments is split 6:2:2 into training, validation, and testing sets.

The complexity of MPI comes from two parts: the topology of the graph and the values of the dynamics parameters, i.e., β, γ, and S. It is therefore necessary to train and test the methods over different graph structures and parameter ranges. In this subsection, the training and testing sets are generated from the same distribution of β, γ, and S. We also evaluate the methods on several real networks.

Graph Structures. As shown in Figure 4, we choose eight types of graph structure with the number of nodes |N| ≈ 12. For each structure, we generate 200 samples for training and testing. Each sample has one randomly chosen node as the initial infectious seed, and dynamics parameters drawn from β ∼ U(0.4, 0.6) and γ ∼ U(0.2, 0.5). After training, we run each method on the testing graphs and plot the inferred marginal probabilities at time step T, paired with the ground truth, for each method in Figure 4. The first row of Figure 4 confirms the inaccuracy of DMP on loopy graphs, as well as the fact that DMP upper-bounds the true influence (A. Y. Lokhov & Saad, 2019). The flexibility and inference power of both GNN and NEDMP are validated by the bottom two rows, which show nearly exact inference on all kinds of graphs.

Dynamics Parameters. With the structure fixed as an n-regular graph, we vary the dynamics parameters β, γ, and S around the tipping point to evaluate the performance of the proposed models, as shown in Figure 5. For each combination of parameters, we generate 200 samples for training and testing. When the parameters approach either end of the x-axis, the diffusion runs for too few steps to form loopy paths, resulting in trivial MPI instances; this is why the error curve of DMP is bell-shaped. The two trained models, GNN and NEDMP, perform similarly and both outperform DMP over the whole parameter range. As the parameters approach the region with loopier diffusion paths, the performance of the learning-based models degrades much more slowly than that of DMP.

Real Diffusion Networks. We also train and test all methods on six real diffusion networks (Rossi & Ahmed, 2015); all graphs are preprocessed as undirected. Again, we generate 200 samples for each graph as the training and testing set. For each sample, two randomly selected nodes are set as the initially infected nodes, and the dynamics parameters are sampled from β ∼ U(0, 0.3) and γ ∼ U(0.1, 0.4). As before, within the training distribution GNN and NEDMP perform comparably, and both outperform DMP in terms of the L1 metric. The averaged L1 errors (mean ± standard deviation) on the six networks are:

GNN:   0.028 ± 0.033  0.030 ± 0.039  0.021 ± 0.037  0.018 ± 0.042  0.032 ± 0.044  0.033 ± 0.048
NEDMP: 0.027 ± 0.036  0.034 ± 0.044  0.024 ± 0.043  0.023 ± 0.054  0.048 ± 0.064  0.032 ± 0.044

It is well known that neural network models can suffer from poor generalization. Combining hand-engineering and end-to-end learning in a hybrid model that benefits from their complementary strengths is one way to break this limitation (Shlezinger, Whang, Eldar, & Dimakis, 2020). The proposed NEDMP is such a hybrid model, while GNN is a purely data-driven model.
To better understand the generalization ability of these two kinds of models, we conduct experiments on structure generalization and parameter generalization.

Structure Generalization. We freeze the models trained on specific graph structures in Section 4.1 and then test them on all eight graph-structure datasets. For instance, a GNN model trained on tree graphs is tested on all graph structures, and the eight testing L1 errors are visualized as the first row of Figure 7(a). All the off-diagonal cells in Figure 7(a,b) are the generalization errors of the corresponding model. The average generalization errors are 0.128 and 0.035 for GNN and NEDMP, respectively. The superiority of NEDMP in generalization confirms the strength of the hybrid model.

Dynamics Parameters Generalization. We consider model generalization over β, γ, and S, with the graph structure fixed as a Watts-Strogatz graph. As shown in Figure 6, for each parameter spectrum we train GNN and NEDMP on a small range of parameter values (bounded by the two vertical dashed lines). The trained models are then tested on the whole parameter spectrum; the out-of-distribution results are presented in Figure 6. As expected, GNN degrades rapidly as the parameters move away from the training range. Surprisingly, NEDMP maintains almost the same performance outside the training distribution and even outperforms DMP for all unseen parameter values. NEDMP is so robust to the parameters because its GNN module does not interact with the parameters directly: the parameters are processed by the well-established update equations of the DMP module, which enables the hybrid model to generalize to unseen values.

In this work, we propose a hybrid model, Neural Enhanced Dynamic Message Passing (NEDMP), which runs DMP and a GNN jointly, for the marginal probability inference problem of the SIR model. We test its inference performance on diverse sets of graph structures and dynamics parameters, and the results show that, after training, the proposed model significantly outperforms DMP on all graph structures within the training distribution. The generalization ability of the proposed model is evaluated by inference on graph structures and parameters outside the training distribution. The results show that NEDMP generalizes much better than the purely data-driven model, since it incorporates the informative dynamics-based prior from the DMP module. This work is also a demonstration of the emerging field of hybrid physics-guided machine learning (Rai & Sahu, 2020): we use physics priors (DMP) to guide the design of the neural network and to regularize the training process. Incorporating physics priors and graph neural networks more tightly is a promising direction for future work. As demonstrated in the experiments, NEDMP can be an efficient estimator for spreading processes on graphs, which makes it potentially useful for forecasting and controlling epidemics or rumor spreading on social networks. However, since NEDMP relies on the line graph of the spreading network, it is computationally expensive to apply to large social networks. Reducing the complexity of NEDMP is a prerequisite for realistic scenarios and is left for future work.

We follow (A. Lokhov et al., 2015) to introduce the DMP equations for the SIR model, along with the physical meaning of the intermediate dynamic variables.
We start with the update rules for $P_R^i(t)$ and $P_I^i(t)$, which follow directly from the SIR model:

$$P_R^i(t) = P_R^i(t-1) + \gamma_i\, P_I^i(t-1), \qquad P_I^i(t) = 1 - P_S^i(t) - P_R^i(t).$$

Let $\theta^{j \to i}(t)$ be the probability that the disease has not spread through the edge $(j \to i)$ up to time $t$. Then $P_S^i(t)$ is updated by Eq. (26), which is exact on trees and approximate on general graphs:

$$P_S^i(t) = P_S^i(0) \prod_{j \in N_i} \theta^{j \to i}(t). \qquad (26)$$

Before deriving the closed form of $\theta^{j \to i}(t)$, we introduce two useful intermediate dynamic variables:

• $P_S^{i \setminus k}(t)$: the probability that $\sigma_i^t = S$ when node $i$ ignores all infection from its neighbor $k$;
• $\phi^{j \to i}(t)$: the probability that the disease has not spread through the edge $(j \to i)$ up to time $t$ and node $j$ is infected at time $t$ (i.e., $\sigma_j^t = I$).

By excluding node $k$ from $N_i$ in Eq. (26), we have

$$P_S^{i \setminus k}(t) = P_S^i(0) \prod_{j \in N_i \setminus k} \theta^{j \to i}(t). \qquad (27)$$

Utilizing the variable $\phi^{j \to i}(t-1)$, we obtain the update rule for $\theta^{j \to i}(t)$:

$$\theta^{j \to i}(t) = \theta^{j \to i}(t-1) - \beta_{ji}\, \phi^{j \to i}(t-1).$$

The change of $\phi^{j \to i}$ comes from two parts: (1) when node $j$ is infected at time $t-1$, it either recovers with probability $\gamma_j$ or infects node $i$ with rate $\beta_{ji}$ in the next time step, so it remains counted in $\phi^{j \to i}$ with probability $(1 - \beta_{ji})(1 - \gamma_j)$; (2) when node $j$ is susceptible at time $t-1$, it becomes infected with probability $P_S^{j \setminus i}(t-1) - P_S^{j \setminus i}(t)$. Therefore, the update rule for $\phi^{j \to i}(t)$ is

$$\phi^{j \to i}(t) = (1 - \beta_{ji})(1 - \gamma_j)\, \phi^{j \to i}(t-1) + \big[P_S^{j \setminus i}(t-1) - P_S^{j \setminus i}(t)\big].$$

To complete the recursion, the initial values are

$$\theta^{j \to i}(0) = 1, \qquad \phi^{j \to i}(0) = \delta_{\sigma_j^0, I}, \qquad P_S^{i \setminus k}(0) = P_S^i(0).$$

Graph Neural Networks (GNNs) model pair-wise interactions by implementing a trainable recurrent message-passing mechanism on the graph structure, with three main steps: Message Passing, Update, and Readout. Let $m_u(t-1)$ be node $u$'s hidden state after $t-1$ GNN iterations. In the next iteration, the Message Passing step first aggregates all information from $u$'s neighborhood $N_u$ into a message vector $m_{\to u}(t)$; node $u$ then updates its hidden state to $m_u(t)$ by combining $m_u(t-1)$ and $m_{\to u}(t)$ in the Update step. The updated hidden states are finally fed into a task-specific Readout function $R$ for node-wise predictions at step $t$.

The marginal probabilities depend on the nodes' initial infection status $S \in \{0, 1\}^{|V|}$, the node attributes $\gamma \in [0, 1]^{|V|}$, and the edge attributes $\beta \in [0, 1]^{|E|}$. Therefore, we first embed these attributes into vectors:

$$X_0^i = \phi_n(S_i \oplus \gamma_i), \qquad E_0^{ij} = \phi_e(\beta_{ij}),$$

where $\oplus$ is the concatenation operator, $\phi_n : \mathbb{R}^2 \to \mathbb{R}^D$ and $\phi_e : \mathbb{R} \to \mathbb{R}^D$ are nonlinear functions shared by all nodes and all edges respectively, $D$ is the preset hidden dimension, $X_0 \in \mathbb{R}^{|V| \times D}$, and $E_0 \in \mathbb{R}^{|E| \times D}$. The details of the proposed specific GNN for MPI are as follows:

• Step 1: Hidden state initialization. Unlike (Yoon et al., 2019), which initializes the hidden states to zeros, we initialize the hidden states from the node attributes, $m_i(0) = \phi_{\mathrm{init}}(X_0^i)$, where $\phi_{\mathrm{init}}$ is a nonlinear function mapping its input to $\mathbb{R}^D$.

• Step 2: Message Passing. For node $i$, the incoming message $m_{\to i}(t)$ is a summation over the adjacent hidden states. To make the messages explicitly dependent on the initial conditions, we concatenate $E_0$ and $X_0$ to the corresponding hidden states before the summation:

$$m_{\to i}(t) = \phi_2\Big(\textstyle\sum_{j \in N_i} \phi_1\big(m_j(t-1) \oplus X_0^j \oplus E_0^{ji}\big)\Big),$$

where $\phi_1$ and $\phi_2$ are nonlinear functions mapping their inputs to $\mathbb{R}^D$.

• Step 3: Update. Before updating, we concatenate the incoming message with the target node's initial attributes, in imitation of Eq. (27) in DMP, to filter the incoming message. We then update the node's hidden state with a gated recurrent unit (GRU):

$$m_i(t) = \mathrm{GRU}\big(m_i(t-1),\ \phi_3(m_{\to i}(t) \oplus X_0^i)\big),$$

where $\phi_3$ is a trainable nonlinear function.

• Step 4: Readout. At each time step $t$ (i.e., the $t$-th GNN layer), the predicted marginals for node $i$ are computed by a Softmax readout $R$:

$$\hat P^i(t) = \mathrm{Softmax}\big(\phi_4(m_i(t) \oplus X_0^i)\big),$$

where $\phi_4$ is a trainable nonlinear function. The concatenation of the hidden state with $X_0^i$ imitates Eq. (26) in DMP.
Steps 2-4 are repeated until the termination condition is satisfied. In our implementation, all the nonlinear functions $\phi_n$, $\phi_e$, $\phi_{\mathrm{init}}$, and $\phi_{1,2,3,4}$ are MLPs with ReLU activations, and $\phi_{1,2,3,4}$ are shared across all time steps (GNN layers).

C. The benefits of using NEDMP in the within-distribution case. Obtaining the ground truth as training labels for large graphs is computationally expensive, which calls for a sample-efficient model. Compared to the purely data-driven baseline GNN, the proposed model NEDMP is a physics-informed hybrid model in which the DMP module provides strong regularization; NEDMP is therefore much more sample-efficient than the baseline. For two real networks, we increase the number of training samples from 3 to 52 and plot the validation errors of NEDMP and GNN in Figures 8 and 9. NEDMP achieves satisfactory error on both networks with limited training samples, while GNN requires more training data. Using NEDMP is therefore the better choice when obtaining training data is expensive.

We implement our model using PyTorch and PyTorch-Geometric. We train and test our model on Ubuntu 18.04.5 with an Intel Xeon Gold 6248 CPU and an NVIDIA Tesla V100 GPU.
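For completeness, the DMP recursion for SIR derived in the Appendix above can be written compactly as follows. This is a minimal NumPy sketch under our own naming (the function `dmp_sir` and its interface are ours), not the released PyTorch implementation; it follows the update rules and initial conditions stated above.

```python
import numpy as np

def dmp_sir(adj, beta, gamma, seeds, T):
    """DMP marginals for discrete-time SIR (sketch of the Appendix equations).

    adj   : list of neighbor lists of G
    beta  : dict (j, i) -> infection rate on edge j -> i
    gamma : sequence of per-node recovery rates
    seeds : initially infected nodes; returns P of shape [T+1, N, 3], states (S, I, R)
    """
    n = len(adj)
    gamma = np.asarray(gamma, dtype=float)
    edges = [(j, i) for i in range(n) for j in adj[i]]         # directed edges j -> i
    P = np.zeros((T + 1, n, 3))
    P[0, :, 0] = 1.0                                            # everyone susceptible ...
    P[0, list(seeds), 0] = 0.0                                  # ... except the seeds,
    P[0, list(seeds), 1] = 1.0                                  # which start infected
    theta = {e: 1.0 for e in edges}                             # theta(0) = 1
    phi = {(j, i): P[0, j, 1] for (j, i) in edges}              # phi(0) = delta(sigma_j^0, I)
    Pc = {(j, i): P[0, j, 0] for (j, i) in edges}               # cavity P_S^{j\i}(0)
    for t in range(1, T + 1):
        theta = {(j, i): theta[(j, i)] - beta[(j, i)] * phi[(j, i)] for (j, i) in edges}
        Pc_new = {(j, i): P[0, j, 0] * np.prod([theta[(k, j)] for k in adj[j] if k != i] or [1.0])
                  for (j, i) in edges}
        phi = {(j, i): (1 - beta[(j, i)]) * (1 - gamma[j]) * phi[(j, i)]
                       + (Pc[(j, i)] - Pc_new[(j, i)])
               for (j, i) in edges}
        Pc = Pc_new
        P[t, :, 0] = [P[0, i, 0] * np.prod([theta[(j, i)] for j in adj[i]] or [1.0])
                      for i in range(n)]
        P[t, :, 2] = P[t - 1, :, 2] + gamma * P[t - 1, :, 1]    # recovered
        P[t, :, 1] = 1.0 - P[t, :, 0] - P[t, :, 2]              # infected
    return P
```

A practical implementation would vectorize these edge loops; the dictionaries keyed by directed edges are kept only to stay close to the notation $\theta^{j \to i}$, $\phi^{j \to i}$, and $P_S^{j \setminus i}$ used above.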
References

Bapst et al. (2020). Unveiling the predictive power of static structure in glassy systems.
Cantwell & Newman (2019). Message passing on networks with loops.
Cranmer et al. (2020). Discovering symbolic models from deep learning with inductive biases. arXiv.
Fetaya et al. (2018). Neural relational inference for interacting systems.
Karrer & Newman (2010). Message passing approach for general epidemic models. Physical Review E.
Kempe, Kleinberg, & Tardos (2015). Maximizing the spread of influence through a social network.
Kingma & Ba (2015). Adam: A method for stochastic optimization.
Kuck et al. (2020). Belief propagation neural networks.
S. Li et al. (2020). Functional immunization of networks based on message passing.
Y. Li, Tarlow, Brockschmidt, & Zemel (2016). Gated graph sequence neural networks.
Liang & Meyer (2021). Neural enhanced belief propagation for cooperative localization.
A. Lokhov (2016). Reconstructing parameters of spreading models from partial observations.
A. Lokhov, Mézard, Ohta, & Zdeborová (2014). Inferring the origin of an epidemic with a dynamic message-passing algorithm. Physical Review E.
A. Lokhov, Mézard, & Zdeborová (2015). Dynamic message-passing equations for models with unidirectional dynamics. Physical Review E.
A. Lokhov & Saad (2017). Optimal deployment of resources for maximizing impact in spreading processes.
A. Y. Lokhov & Saad (2019). Scalable influence estimation without sampling. arXiv.
Pastor-Satorras & Vespignani (2002). Immunization of complex networks. Physical Review E.
Rai & Sahu (2020). Driven by data or derived through physics? A review of hybrid physics guided machine learning techniques with cyber-physical system (CPS) focus.
Rossi & Ahmed (2015). The network data repository with interactive graph analytics and visualization.
Sanchez-Gonzalez et al. (2020). Learning to simulate complex physics with graph networks.
Satorras & Welling (2021). Neural enhanced belief propagation on factor graphs.
Shlezinger, Whang, Eldar, & Dimakis (2020). Model-based deep learning.
Shrestha & Moore (2014). A message-passing approach for threshold models of behavior in networks. Physical Review E.
Shrestha, Scarpino, & Moore (2015). A message-passing approach for recurrent-state epidemic models on networks. Physical Review E.
Wang, Chen, & Wang (2012). Scalable influence maximization for independent cascade model in large-scale social networks.
Wilinski & Lokhov (2021). Prediction-centric learning of independent cascade dynamics from partial observations.
Wu et al. (2019). A comprehensive survey on graph neural networks.
Yoon et al. (2019). Inference in probabilistic graphical models by graph neural networks.
Zhu & Ying (2016). Information source detection in the SIR model: A sample-path-based approach.

Acknowledgments. We are grateful for the support of the "Save 2050 Project," which is sponsored by Swarma Club and X-Order, and we also thank Muyun Mou and Jing Liu for technical support and insightful discussions.