key: cord-0043109-5gndpj2d
authors: Li, Hui; Li, Hui; Bhowmick, Sourav S.
title: BRUNCH: Branching Structure Inference of Hybrid Multivariate Hawkes Processes with Application to Social Media
date: 2020-04-17
journal: Advances in Knowledge Discovery and Data Mining
DOI: 10.1007/978-3-030-47426-3_43
sha: b566ef73beaf7b88aee39ff62dc1bc0a87c00ba9
doc_id: 43109
cord_uid: 5gndpj2d

Multivariate Hawkes processes (MHPs) are a class of point processes where an arrival in one dimension can affect the future arrivals in all dimensions. Existing MHPs are associated with homogeneous link functions. However, in reality, different dimensions may exhibit different temporal characteristics. In this paper, we augment MHPs by incorporating heterogeneous link functions, referred to as hybrid MHPs, to capture the temporal characteristics in different dimensions. Since the branching structure can be utilized to equivalently represent MHPs, we propose a novel model called BRUNCH via intensity-driven Chinese Restaurant Processes (intCRP) to identify the optimal branching structure of hybrid MHPs. Furthermore, we relax the constraint on the shapes of triggering kernels in MHPs. We develop a Monte Carlo-based inference algorithm called MEDIA to infer the branching structure. Experiments on real-world datasets demonstrate the superior performance of BRUNCH and its usefulness in social media applications.

Multivariate Hawkes processes (MHPs) are a class of point processes with mutually exciting components to model sequences of discrete events in continuous time, where an arrival in one dimension can affect future arrivals in all dimensions [5, 6] . Recently, MHPs have emerged in multiple fields to capture mutual excitation between dimensions, including high frequency trading [1] , social influence analysis [16] and computational biology [13] . However, these MHPs are limited to a specific scenario where a past arrival can only excite the occurrence of future arrivals, and the corresponding link functions 1 are linear (i.e., linear MHPs). However, in reality, inhibitory arrivals and non-additive aggregation of effects from past arrivals are present in several application domains [10] . For instance, negative feedbacks of online consumers may inhibit others' purchasing behaviours. Consequently, MHPs associated with nonlinear link functions, namely nonlinear MHPs [10] , have been proposed where effects from past events encompass both excitation and inhibition.

Prior work on MHPs also assume that all dimensions take the same link function i.e., either linear or nonlinear MHPs (homogeneous MHPs). That is, all dimensions follow roughly the same temporal characteristics. Note that a dimension may record actions of a user (e.g., tweet, retweet, comment, share) in social media, behavior of a customer (e.g., purchase, comment, return) in online shopping websites, and so on. In reality, different dimensions may exhibit different temporal characteristics. For example, in Twitter, one individual (i.e., dimension) may be extremely interested in a popular topic and interact with her followers frequently whereas another individual may take little interest in that topic and seldom respond to his followers. In such scenario, the cumulative influence from recent events on the former individual is clearly different from the one on the latter. Hence, homogeneous MHPs are insufficient to capture such diverse temporal characteristics. To address this problem, in this paper we augment MHPs by incorporating heterogeneous link functions, referred to as hybrid MHPs, allowing us to cope with diverse impact of past events on future events in different dimensions.

The cluster Poisson process interpretation [7] of MHPs separates the events into two categories, namely, immigrants and offspring. The offspring events are triggered by past events, while the immigrants arrive independently and thus do not have a existing parent event. Offsprings are structured into clusters associated with each immigrant event. This is called the branching structure [8] , which is an useful representation of MHPs in various applications. For example, in social influence analysis it can construct the narrative of information diffusion to pave the way for strategies to encourage or limit individual behaviors [14] . Additionally, the branching structure is widely utilized as a strategy in the maximum likelihood estimation of MHPs [16] . Unfortunately, such cluster Poisson process representation can only be applied to linear MHPs due to the mutually exciting assumption. Nonlinear MHPs cover both mutual excitation and mutual inhibition stochastically. Consequently, existing approaches based on the cluster Poisson process representation cannot be adopted to infer the branching structure of hybrid MHPs. In this paper, we infer the branching structure of hybrid MHPs regardless of the shapes of the triggering kernel functions 2 .

We propose a novel probabilistic model called BRUNCH (Branching stRUcture iNferenCe of Hybrid multivariate Hawkes processes) to reveal the branching structure of hybrid MHPs without assuming homogeneity of link functions or shapes of triggering kernel functions (Sect. 3). It is important for our probabilistic model to emphasize the following two features mirrored by the 

Notation Definition

Collection of event links t il → t ik Event link from event t il to event t ik C Collection of cluster links s → g Cluster link from cluster s to cluster g P ik Events triggered by t ik I(B, C) Collection of cascades

The cascade the event t ik belongs to event sequences in MHPs: (a) the chronological order of events is nonexchangeable; and (b) the triggering relations could distribute within or across dimensions stochastically. To this end, we propose intensity-driven Chinese Restaurant Process (intCRP), a novel extension of classical CRP [3] in which the random seating assignment of the customers depends on the triggering kernels between them. In particular, intCRP has a nested structure -inner intCRP to explore the possible triggering relations among events occurring in one dimension (i.e., event links), and outer intCRP to identify the collection of triggering relations between all events and their parents (if any) across dimensions (i.e., cluster links). Obviously, the changes to the triggering relations within and across dimensions are highly coupled, i.e., the inner intCRP and outer outCRP are strongly interlaced.

Since there are countably infinite sets of triggering relations, we propose a novel inference approach called MEDIA (MontE Carlo-baseD Inference Approach) that leverages the triggering nature of MHPs to sample event links and cluster links alternatively (Sect. 4). Finally, we apply BRUNCH on real-world social media datasets, and the experimental study in Sect. 5 demonstrates its superior performance and usefulness. Formal algorithms and proofs of theorems and lemmas appear in [9] . List of key symbols used in this paper is given in Table 1 .

In this section, we introduce relevant concepts for understanding this paper.

The conditional intensity function of the i-th dimension for an M -dimensional MHP takes the following form [10] :

where μ i > 0 is the base intensity capturing the arrival rate of exogenous events independent of historical events. The term M j=1 t jl <t α ij φ ij (t−t jl ) represents the accumulation of endogenous intensity caused by history [4] . The coefficient α ij measures the influence from the j-th dimension to the i-th dimension, allowing mutual excitation (α ij > 0) and mutual inhibition (α ij < 0). The triggering kernel function φ ij (t − t jl ) quantifies the triggering effect from the event t jl (i.e., t jl denotes the l-th arrival occurring in the j-th dimension) to the occurrence rate of the i-th dimension. Most of the existing work use predefined kernel functions with unknown parameters, such as the exponential kernels [16] and the power-law kernels [15] . The link function F i : R → R + , recognizes the triggering pattern of the i-th dimension over the historical events. The linear MHPs [6] is the case F i (x) = x with nonnegative α ij for each dimension, while nonlinear MHPs apply various F i (x) to guarantee the positive intensities. In hybrid MHPs, we allow each dimension to take a personalized link function to capture diverse temporal characteristics in real-world scenarios.

Recall that the events in MHPs are classified as either immigrants or offsprings. An immigrant event arrives independently of other events, while an offspring event is triggered by a previous event. In the sequel, we refer to an immigrant together with its offsprings as a cascade. The collection of triggering relations in cascades is called the branching structure [8] . The cluster Poisson processes [7] could equivalently represent the branching structure of linear MHPs. Briefly, each immigrant starts one cascade, which consists of offspring events of the 1 st , 2 nd , 3 rd , · · · generations, controlled by the endogenous intensity in Eq. 2.1 [11] . Due to the non-linear link functions, the above branching structure representation based on the cluster Poisson processes is inapplicable for nonlinear MHPs and hybrid MHPs.

While modeling the sequences X via hybrid MHPs, the corresponding branching structure could be represented by the variable set

Our objective is to infer (Z ik , P ik ) for each event. Consider Fig. 1(a) . Directed links sketch an example of branching structure. Z 11 = t 11 indicates the event t 11 is an immigrant, and Z 12 = t 11 shows t 11 triggers t 12 . Z 32 = t 31 and Z 33 = t 31 leads to P 31 = {t 32 , t 33 }, denoting that t 31 triggers t 32 and t 33 successively.

While modeling the asynchronous time-stamped event sequences via hybrid MHPs, we aim to reveal the underlying branching structure. Although the traditional Chinese Restaurant Process (CRP) [3] provides a flexible class of distributions that is amenable for modeling dependencies between elements, the exchangeability assumption here is problematic for elements with temporal dependencies. This is because events in MHPs occur at different time points, and are nonexchangeable. In addition, in our problem setting the influence that determines the pairwise dependencies between events is not homogeneous within and across different sequences. Intuitively, it is natural to quantify the influence via the triggering kernels in Eq. 2.1. In order to tackle these issues, we present the BRUNCH model, which is based on intensity-driven Chinese restaurant processes (intCRP), a new variant of CRP that allows a number of intensity-driven distributions as priors on triggering relations between events.

The events in one cascade could stem from different dimensions. So, beyond the triggering relations obtained from single-sequence intCRP, we need to capture the cross-sequence triggering relations among events. To this end, BRUNCH allows us to identify the possible dependencies among multi-dimensional event sequences in hybrid MHPs. Specifically, it presents a class of prior distributions over branching structure according to intCRP, which has nested structure, inner intCRP and outer intCRP. Briefly, the inner intCRP identifies the possible triggering relations in each sequence independently, which are referred to as event links. Linked events in each sequence form one cluster. Subsequently, outer intCRP captures the potential cross-sequence triggering relations, referred to as cluster links, which connects parent events with children from cross-sequence clusters.

We resort to the Chinese Restaurant metaphor to describe the generative process of event links and cluster links in BRUNCH. Imagine a collection of event sequences as a collection of restaurants, and the events in each sequence as customers entering a restaurant. The linked events in each sequence compose one cluster, and such clusters correspond to tables. Figure 1(b) illustrates the process. Note that in traditional CRP, the probability of a customer sitting at a table is computed from the number of other customers already sitting at that table.

In each sequence, if one event is an immigrant, there exists one self-link with itself; otherwise, there exists a triggering dependency for the event. That is, if event t il generates event t ik , the link from t il to t ik is created spontaneously. The inner intCRP assigns event link t il → t ik in a biased way, according to the following cluster-specific distribution:

where self-affinity ρ i = μ i yields new immigrant events, and larger self-affinity favors more cascades. A i lk describes how the affinity between a pair of events affects the probability for t il triggering t ik . In accordance with the propagation characteristics described by hybrid MHPs, child events tend to be generated by preceding events with stronger triggering effect. We define A i lk as the product of two parts:

It determines the probability to link with events that are at most W timespan away, and disregards the historical events as time progresses. Intuitively, the possible links existing between events that are far away from each other are negligibly rare.

which decays the probability of connecting events along with the timespan to the current one. Consequently, the event links B i = {t il → t ik |k = 1, 2, · · · , N i (t), l ∈ {1, 2, · · · , k}} in i-th dimensional sequence assign events into clusters, where two events are assigned to the same cluster if one is reachable from the other by traversing the directed links. Once the event link t il → t ik is confirmed, we can obtain the branching structure Z ik = t il within sequences.

Accordingly, the collection of event links B = {B 1 , B 2 , · · · , B M } will divide the event sequences X into clusters, denoted by C(B). By involving the mutualtriggering kernels, we could measure the pairwise affinity between cross-sequence events. Hence, given two clusters, s and g, the outer intCRP assigns the cluster link s → g according to the following cascade-specific distribution:

where t je = min{t jl |t jl ∈ g}. Only if one event (i.e., t ik ) in cluster s generates the earliest event in cluster g, there exists a cluster link s → g. In particular, self-loop cluster link is non-existent. Once the cluster link s → g is determined, we could construct the equivalent branching structure P ik = P ik {t je }. In summary, we construct the complete branching structure K = {(Z ik , P ik )} via scanning the obtained event links B and cluster links C.

Intuitively, a collection of cluster links will divide clusters into cascades. We represent one cascade as one set of events. Let Z ik denote the cascade associated with event t ik , and Z = {Z ik |i = 1, 2, · · · , M; k = 1, 2, · · · , N i } records the final cascade assignments. Notice that events t ik and t jl belong to one cascade (i.e., Z ik = Z jl ) if and only if they are reachable via combinations of event links and cluster links. Given the collection of event links B and cluster links C, we denote I(B, C) as the final collection of cascades. We initialize the Hawkes likelihood parameters Θ 1:

where γ are the hyperparameters, and π(γ) is the collection of distributions.

The central goal of BRUNCH is to infer the posterior distribution of the latent links (B, C), given a collection of time-stamped events. It places a prior distribution over a combinatorial number of possible event links and cluster links, according to inner intCRP (Eq. 3.1) and outer intCRP (Eq. 3.2), respectively. Intuitively, applying Bayes's Theorem, the posterior distribution takes the form Pr(B, C|X) = Pr(X,B,C)

Pr(X) . Unfortunately, we cannot compute Pr(X). Hence, the posterior inference of links is intractable. To address this, in the following section we present a strategy that approximately infers the posterior distribution.

As the number of event links and cluster links varies with the observed events, we need to undertake Bayesian inference over a link set of unknown cardinality. Moreover, changes over event links may induce subsequent changes to other event links and current cluster links. To this end, we propose the Monte Carlo-based inference approach (MEDIA) that leverages the triggering nature of hybrid MHPs to sample event links and cluster links. We aim to construct a Markov chain whose stationary distribution is the target posterior distribution Pr(B, C|X). The state of the chain is represented by (B, C) , a collection of links over event sequences. Furthermore, event links B and cluster links C are strongly coupled, that is, sampling event links could trigger a chain of merges and splits to the current structure, as shown in Fig. 2 . In view of this, we design the Monte Carlo-based inference approach, which involves two key phases: (a) sampling event links B via a Metropolis-Hastings rule, which could possibly bring changes to current cluster links C, and then (b) update cluster links C via a Gibbs sampler. We elaborate on them in turn.

After sampling one event link, the reconstructed links are denoted as (B,C). The iterative procedure runs until approaching the stationary distribution as follows:

1. Using the current links B, sample a candidate link setB from the transition probability q(B,C|X, B, C); 2. SampledB leads to cluster linksC. Calculate the acceptance probability η(B,C|X, B, C) for the candidate setB, Notice that we are considering the ratio of Pr(B, C|X) under two different structures, so the denominator Pr(X) is eliminated.

Based on Theorem 4.1, we can guarantee the resulting Markov chain converges to a stationary distribution uniquely [2] . BRUNCH provides a joint distribution for a collection of events and current links as following:

Pr(X, B, C) = Pr(B) Pr(C|B) Pr(X|B, C) (4.1)

If one event is an offspring, it has only one parent event. So once the affinity functions in inner intCRP are predefined, the generated event links are conditionally independent. Furthermore, event links divide all events into clusters. Hence, once the event links and the affinity functions in outer intCRP are predefined, the generated cluster links are conditionally independent. That is to say, event links B are conditionally independent given (ρ 1:M , A 1:M ), and cluster links C are conditionally independent given B, thereby causing the independent cascades. As a consequence, the joint distribution on events and links equals to:

Pr(X, B, C|ρ 1:M , A 1:M , γ) =

Pr(s → g|B) (4. 2) The activities belonging to cascade I are represented as t Z=I , hence,

where the parameter set Θ could be drawn from pre-determined distributions associated with the hyper-parameters γ. Hence, the above integral is tractable. Based on the hybrid MHPs associated with conditional intensity (Eq. 2.1), we derive the conditional probability density [8] that an event occurs at time t ik is:

where the integral t ik 0 λ i (s|Θ 1:M )ds is not always analytically integrable w.r.t. various link functions of hybrid MHPs. To address the issue, we propose the approximation:

We describe how replacing an event link affects the current links (B, C), and there are four cases:

1. Split. If adding event link t ik → t ik breaks the existing link t il → t ik and divides t ik and t il into two clusters, it shows that event t ik is an offspring. Moreover, there are no existing clusters to be merged after sampling event link t ik → t il . Thus, the new cluster including t il has the same incoming cluster links as the previous cluster containing both t ik and t il . We assume that the incoming links are independently linked to the new cluster with equal probability. New cluster s including t ik collects its outgoing cluster links C s according to the distribution Pr(C s |X,B, C −s ) where C −s represents the set of cluster links excluding the ones s → g, g ∈ C(B). Then, the transition probability is:

where |C t ik ∪ C t il | records the number of incoming cluster links for the old cluster containing both t ik and t il . Calculate Pr(t ik → t il |ρ i , A i ) according to Eq. 3.1.

Adding event link t ik → t ik breaks existing link t il → t ik . Moreover, the cluster including event t ik and the cluster including event t il are merged after sampling event link t ik → t il . Thus, the outgoing cluster links of new cluster retain the outgoing cluster links of the old cluster including t il , and the incoming cluster links of new cluster combine the incoming cluster links connecting to the cluster including t ik and the cluster including t il . The transition probability is:

Also, the incoming cluster links are assumed to be assigned to the new merged cluster equally. When the sampling link t ik → t il merges two clusters from one cascade, q(t ik → t il ) = Pr(t ik → t il |ρ i , A i ). Otherwise, when t ik → t il merges two clusters from different cascades, it combines the two cascades. Hence, , γ) , and X Z 1 =Z ik represents the events in the cascade Z ik excluding the events in the cascade Z il . Similarly, W 2 = Pr(X Z 2 =Z il |Z ik = Z il , γ), and X Z 2 =Z il denotes the events in the cascade Z il excluding the events in the cascade Z ik .

After adding event link t ik → t ik , if there is no new cluster to appear, it shows that event t ik is an immigrant. Moreover, sampling event link t ik → t il causes two existing clusters to be merged. Hence, the transition probability is:

Considering the two merged clusters come from one cascade or two cascades, q(t ik → t il ) is analogous to the analysis of q 2 (B,C|X, B, C).

There is no new cluster after setting t ik → t ik , and also no merge after sampling event link t ik → t il . In this case, the corresponding transition probability is:

Obviously, after sampling each event link, the resulting split (i.e., case 1) and merge (i.e., case 3) are inverse of each other, meanwhile, the other two cases (i.e., split and merge vs. no change) are inverse transform. Accordingly, by taking the inverse pairs, we derive the corresponding acceptance ratios η (B,C|X, B, C) = min{τ, 1}, where the Hastings ratio τ = Pr(X,B,C)q(B,C|X,B,C)

Pr(X,B,C)q(B,C|X,B,C) . As aforementioned, sampling an event link may lead four possible changes to current links (B, C). Hence, the corresponding Hastings ratio is:

• A single offspring event becomes an immigrant and the previous cluster is split into two clusters. Thus the candidate partition structure (B,C) is generated.

The transitions corresponding to q 1 and q 3 are the inverse of each other. Substituting Eq. 4.1, 4.5 and 4.8, the Hastings ratio works out to be

Pr(X|B, C −s ) Pr(X|B, C) (4.9) When the offspring event changes its parent event from one cluster to another cluster under the condition that the two clusters locate in different cascades, the reverse transition q 3 leads to the merger of two cascades. Consequently, the transition probability is:

The corresponding transition probability becomes:

• A single offspring event switches to an immigrant, and sampling a new event link leads to the combination of two local clusters. In this case, the transitions corresponding to q 2 and q 4 are the inverse of each other. If the two merged clusters come from one cascade, We keep all event links in an adjacency matrix S ∈ R n×n (n = M i=1 N i (t)), wherein rows and columns are indexed by ordered events from all sequences, value 1 or 0 is recorded in entry (t ik , t jl ) according to whether t ik triggers t jl or not. The formal description of MEDIA, is given in [9] . The time complexity of MEDIA is O(LM 3 n 2 a n c ) [9] .

In this section, we investigate the performance of our model and inference algorithm and report the key results. All experiments are performed on a machine with 16 GB RAM with Intel(R) Core(TM) E5-1620V2 CPU@3.70 GHz processor running on Windows 8.1 Pro.

Recall that there is no existing work that infers the branching structure of nonlinear MHPs. Hence, we are confined to compare our proposed frameworks to techniques that infer the branching structure of linear MHPs in [12, 17] . In particular, we consider the following strategies for our study. (a) Cluster-L: Based on the alternative representation of the linear Hawkes process in terms of cluster Poisson processes, [12, 17] propose the cluster-based method. As mentioned earlier, BRUNCH is not sensitive to shapes of the triggering kernel functions in hybrid MHPs. Hence, we adopt an exponential kernel function φ ij (t) = exp(−β ij t)(β ij > 0) in the experiments (see [9] for the performance associated with other kernel functions). For each dimension, the hyper-parameter γ is sampled by a uniform distribution U(0, 10). The base intensity μ is set varying over dimensions and is sampled from U(0, γ), then the coefficient α ij is sampled from N (0, γ 2 ), and the decay parameter β ij has the form of β ij = c * α ij where c is sampled from U(0, γ). The initial parameters Θ 1:M = (μ, α, φ) need satisfy the stability and uniqueness conditions (see details in [9] ). Additionally, we set the time window size to W = 12 h. Inferring Branching Structure. We convert the branching structure to a binary matrix S as mentioned in Sect. 4. Given the estimated links (B,C), we updateS according toB, and then derive the across-dimensional branching structure K fromC before filling in the finalS. Afterwards, comparing the estimated matrixS with the ground truth S, we evaluate the effectiveness for all the aforementioned strategies in terms of F1-Score. Figures 3(a) and 3(b) plot the results. Clearly, our proposed model BRUNCH with MEDIA outperforms the baseline Cluster-L, obtaining higher inference F1-Score. That is, the triggering relations among events (i.e., branching structure of MHPs) identified by the MEDIA approach are more reliable. While applying the same inferring procedure MEDIA, hybrid MHPs (mixing exponential MHPs and linear MHPs) show superior inference performance compared to other alternatives. In particular, MEDIA-H is superior to MEDIA-L and MEDIA-E. This further verifies the justifiability of our proposed hybrid MHPs.

We compare the convergence rate of our proposed techniques in Fig. 3(c) on Facebook data. The results on Twitter are qualitatively similar (see [9] ). Clearly, MEDIA-H is of higher likelihood than MEDIA-L and MEDIA-E. This further validates the usefulness of our hybrid MHPs. Figures 3(d) plots the scalability of our algorithms with increasing events on Facebook data. The results are qualitatively similar on Twitter (see [9] ). We run the inference methods on different sizes of datasets (i.e., slice different percentages of events in datasets as input data for BRUNCH). Observe that the average inference time of MEDIA stabilizes with increasing number of events. Since the number of event links and cluster links grows significantly as events increases, we expect the average runtime of MEDIA becomes relatively stable when more than 70% input data are utilized.

In this paper, we propose a novel probabilistic model called BRUNCH to infer the branching structure of hybrid MHPs. It bridges a significant chasm between hybrid MHPs (nonlinear MHPs as well) and branching structure inference. We handle the inferencing procedure via the MEDIA method, which provides a heuristic to make coordinated changes to both event links within clusters and cluster links within cascades. Empirically, our model demonstrates good performance and application potential in the real world.

Hawkes model for price and trades high-frequency dynamics

Pattern Recognition and Machine Learning

Distance dependent Chinese restaurant processes

Shaping social activity by incentivizing users

Point spectra of some mutually exciting point processes

Spectra of some self-exciting and mutually exciting point processes

A cluster process representation of a self-exciting process

Hawkes processes with stochastic excitations

Brunch: branching structure inference of hybrid multivariate Hawkes processes with application to social media

The neural Hawkes process: A neurally self-modulating multivariate point process

Perfect simulation of Hawkes processes

Bayesian inference for Hawkes processes

Goodness-offit tests and nonparametric adaptive estimation for spike train analysis

A stochastic differential equation framework for guiding online user activities in closed loop

Modeling high frequency data using hawkes processes with power-law kernels

Learning social infectivity in sparse low-rank networks using multi-dimensional Hawkes processes

Learning triggering kernels for multi-dimensional Hawkes processes

Acknowledgments. This work is partly supported by the National Natural Science Foundation of China (No. 61672408, 61972309) and National Engineering Laboratory of China for Public Safety Risk Perception and Control by Big Data (PSRPC).