Conditional Uniformity and Hawkes Processes
Andrew Daw
February 10, 2022

Abstract. Classic results show that the Hawkes self-exciting point process can be viewed as a collection of temporal clusters, where exogenously generated initial events give rise to endogenously driven descendant events. This perspective provides the distribution of a cluster's size through a natural connection to branching processes, but this is irrespective of time. Insight into the chronology of a Hawkes process cluster has been much more elusive. Here, we employ this cluster perspective and a novel adaptation of the random time change theorem to establish an analog of the conditional uniformity property enjoyed by Poisson processes. Conditional on the number of epochs in a cluster, we show that the transformed times are jointly uniform within a particular convex polytope. Furthermore, we find that this polytope leads to a surprising connection between these continuous state clusters and parking functions, discrete objects central in enumerative combinatorics and closely related to Dyck paths on the lattice. In particular, we show that uniformly random parking functions constitute hidden spines within Hawkes process clusters. This yields a decomposition that is valuable both methodologically and practically, which we demonstrate through application to the popular Markovian Hawkes model and through proposal of a flexible and efficient simulation algorithm.

The hallmark of the Hawkes self-exciting point process is that each event generates its own stream of offspring events, so that the history of the process becomes an endogenous driver of its own future activity (Hawkes 1971). In this way, the model can be viewed as a collection of clusters of events, where each cluster is initiated by some exogenous activity and then filled by the progeny descending from that original event. This structure appears throughout the many applications of the Hawkes process, observed in the virality of social media and the contagion of financial risk, and employed in the prediction of seismological activity and the spread of COVID-19 (e.g. Ogata 1998, Aït-Sahalia et al. 2015, Bertozzi et al. 2020, Nickel and Le 2020; see also https://ai.facebook.com/blog/using-ai-to-help-health-experts-address-the-covid-19-pandemic/). This cluster-based perspective was first provided by Hawkes and Oakes (1974), and it has served as a cornerstone for analysis of the stochastic process. A particularly useful consequence is that, through a connection to Poisson branching processes, we are granted comprehensive time-agnostic insight into the model. That is, the distribution of the size of the cluster, meaning the number of events it contains, is well understood and available in closed form. In this paper, we address a question that is elementary, related, yet evasive: how long will a cluster last? The duration of a Hawkes process cluster, meaning the time elapsed from the first epoch to the last, must be at least as important in application as the cluster size; examples of longing to know when an endogenously driven activity will cease surely cannot be far from mind. However, despite the ease of access to the distribution of the cluster size, this answer has remained elusive.
While formalizing the cluster-based definition, Hawkes and Oakes (1974) bounded the mean cluster duration and obtained an integral equation that must be satisfied by its cumulative distribution function (CDF), but acknowledged that it would only "be solved in principle by repeated numerical integration." Møller and Rasmussen (2005) tightened the bound on the mean of the duration and showed that the duration itself could be stochastically dominated by an exponential random variable under some tail conditions on the excitation function, but they also remarked that the distribution is "unknown even for the simplest examples of Hawkes processes." They too develop a numerical approximation of the CDF, and this informs their approach for introducing a perfect simulation procedure for the Hawkes process, which is the primary focus of that work. Chen (2021) is also devoted to perfect simulation of the Hawkes process, particularly in steady-state, and applies this to a single server queue driven by Hawkes arrivals. By comparison to Møller and Rasmussen (2005), Chen (2021) leverages exponential tilting to produce a more efficient algorithm. Because exponential tilting with respect to the cluster duration is challenging, the author instead constructs an upper bound that is almost surely longer than the duration itself. It is important to note that simulation of Hawkes process clusters is actually a subroutine of the Chen (2021) algorithm (and likewise for Møller and Rasmussen 2005), and this is explicitly described in the extension to multivariate Hawkes processes in Chen and Wang (2020). In fact, each of these works simulates the cluster through the structure provided by the Hawkes and Oakes (1974) definition. Several other works have sought headway into the cluster duration outside the context of simulation. Graham (2021) also bounds the duration of the cluster, invoking the tail conditions and results from Møller and Rasmussen (2005) for inequalities on the mean and on the CDF. Here, the author uses the bounds to prove regenerative properties for the Hawkes process as a whole. Relatedly, Costa et al. (2020) also show renewal results, but only for excitation functions with a bounded support condition. Additionally, Reynaud-Bouret and Roy (2007) provide non-asymptotic tail bounds of the cluster duration, and Brémaud et al. (2002) bound the tail of the duration in order to bound the rate of convergence to equilibrium in the Hawkes process overall. However, none of these prior works characterize the mean duration explicitly, let alone its distribution. Here, we address the fundamental pursuit of the cluster duration through a surprising connection from Hawkes processes to parking functions, a family of random objects that are rooted in enumerative combinatorics. In a comprehensive survey of combinatorial results, Yan (2015) remarked that the parking function lies "in the center of combinatorics and appear[s] in many discrete and algebraic structures." In this paper, we find that parking functions are also the hidden spines of Hawkes process clusters. While this bridge from discrete to continuous space may be unexpected, the parking function itself is truly not a lonely concept. Originally introduced in Konheim and Weiss (1966), the classic context is an ordered vector of preferences over a row of k parking spaces such that, if k drivers proceed left to right and take only their preferred space or the next available to the right of it, all cars will be able to park.
In enumerative combinatorics, this simple concept has been connected to many other interesting objects. For example, Riordan (1969) linked parking functions and labeled trees, while Stanley (1997) connected parking functions and non-crossing partitions. Stanley and Pitman (2002) then extended this to plane trees and to the associahedron, as well as to a family of polytopes closely related to the one that arises here. Parking functions have also been of use in the analysis of polynomials, such as in Carlsson and Mellit (2018). Highly relevant to our following study is the bijection between parking functions and labeled Dyck paths, or, equivalently, between sorted parking functions and Dyck paths (see e.g. Section 13.6 in Yan 2015). Many other combinatorial relationships are available in great detail from Yan (2015). Quite recently, the parking function has received considerable attention as a random object, for example by Diaconis and Hicks (2017) and Kenyon and Yin (2021). To the best of our knowledge, Diaconis and Hicks (2017) provide the first probabilistic perspective of the parking function. Motivated by the idea of a uniformly random parking function (from the collection of all those considering the same number of spaces), the authors explore the joint distribution of the full vector of preferences and the marginal distribution of the first preference. Additionally, Diaconis and Hicks (2017) also conduct an asymptotic study of parking functions as the number of cars (or, equivalently, spaces) grows large, yielding a connection to Brownian excursion processes in the limit. In the combinatorics literature, there are many generalizations of the parking function in which the cardinality of spaces exceeds the cardinality of preferences, and Kenyon and Yin (2021) adapt this generalization to the stochastic model. These generalized parking functions are again uniformly random within each combination of space and preference sizes, and Kenyon and Yin (2021) study both the marginal distribution of a single coordinate for the generalized parking function and the covariance between two coordinates for the classical notion. Our connection between Hawkes processes and parking functions joins a family of work descended from the random time change theorem for point processes. The classical random time change theorem (e.g. Theorem 7.4.I of Daley and Vere-Jones 2003) provides that if the integral of the process intensity is evaluated at the arrival epochs of a simple, adapted (not necessarily Poisson) point process, these transformed points will form a unit rate Poisson process. This integral of the intensity is called the compensator. This beautiful result can be traced back to Meyer (1971), with the corresponding characterization of the Poisson on the half-line originating with Watanabe (1964). There is a deep theory surrounding this theorem, such as the elegant martingale proofs given in Brémaud (1975) or Brown and Nair (1988). A rich collection of these ideas is available in Brémaud (1981), as are many demonstrations of the potency of martingales for point process models. However, it has since been seen that this result need not require the most advanced of techniques, as Brown et al. (2002) showed that this attractive idea can be explained with elementary arguments, essentially reducing the proof to calculus. We will make use of this simplicity here. In addition to elegance, the random time change is also quite useful.
For example, in the simulation of point processes, if one can compute and invert the compensator function, Algorithm 7.4.III of Daley and Vere-Jones (2003) shows that one can obtain a point process with the corresponding intensity function by transforming arrival epochs of a unit rate Poisson process according to the inverse of the compensator. For the Hawkes process, the application of this idea can be traced to Ozaki (1979), and inverting the compensator is essentially the idea behind the exact simulation procedure in Dassios and Zhao (2013), even if it is not described as such. As we will see, our focus on the Hawkes cluster sets us apart. The simulation method that arises from our analysis differs from these Hawkes compensator inversion methods by sampling the points collectively from a polytope, rather than iteratively in the sample path. In statistics, the random time change theorem leads to methods for evaluating the fit of point process models on data (see, e.g., Ogata 1988, Brown et al. 2005, Kim and Whitt 2014). The idea of these techniques is that, by transforming a complicated, possibly time-varying and/or path-dependent point process to a unit rate Poisson process, one can more easily observe and quantify exceedances from confidence intervals for the model. Both the simulation and statistical techniques are perhaps most often used for non-stationary Poisson processes, where one can immediately petition to the classical Poissonian idea of conditional uniformity. Here, parking functions will allow us to extend this notion to the clusters within Hawkes processes. Our nearest predecessors in random time change theory are likely Giesecke and Tomecek (2005) and Ding et al. (2009). Both of these papers are interested in using this concept to build stochastic models. Giesecke and Tomecek (2005) offer somewhat of a converse to the random time change theorem, inverting from unit rate Poisson epochs to construct a more general point process. Ding et al. (2009) use a similar idea to construct point processes for financial contexts by converting from a pure birth process. By comparison, in this work we will not construct a point process generally, but rather uncover structure specific to the Hawkes process that becomes visible only when the process is transformed akin to the random time change theorem. Furthermore, it is fundamental to our approach that we are inspecting the cluster while conditioning on its size. Our pursuit of the times within a Hawkes cluster is framed by the knowledge of how many times there will be, and thus our methodology is indebted conceptually to both the random time change theorem and the conditional uniformity typically enjoyed by the Poisson process. This size-conditioning actually constitutes a departure from the typical random time change theorem assumptions in subtle yet important ways, as we will discuss in detail. This brings us to our contributions and to the organization of the remainder of the paper. Our methodological goal in this work is to use one of the most important results for the Hawkes process, Hawkes and Oakes (1974)'s cluster definition, and one of the most important results for point processes in general, the random time change theorem, to create an analog of one of the most powerful tools for the Poisson process, conditional uniformity (Lemma 2). This is powered by an unexpected connection between two notable stochastic models, Hawkes process clusters and random parking functions.
In linking the continuous time point process to this discrete random vector, we will find that parking functions uncover the full chronology of Hawkes clusters (Theorem 1). Through this, we are brought back to our original goal, which is to understand the cluster duration. We organize this analysis as follows. In Section 2, we review the key background concepts, providing precise definitions and summarizing important results from the literature. Then, in Section 3, we invoke the compensator transform of the Hawkes process and adapt the random time change theorem for our setting. Our primary result, in which we formalize the Hawkes-process-parking-function connection, is shown in Section 4. To demonstrate its usefulness, we apply the techniques to the popular Markovian Hawkes process in Section 5, where we use conditional uniformity and parking functions to prove an explicit equivalence between the cluster duration and a random sum of conditionally independent exponential random variables (Theorem 2). In a second application of the parking function decomposition, in Section 6 we describe the resulting simulation algorithm, which may be of importance to applications in rare event simulation thanks to its ability to explicitly condition on the cluster size (Algorithm 1). Finally, in Section 7 we conclude.

To begin, let us briefly review the definitions and prior results that set the stage for our analysis. In particular, we will provide formal definitions of the two focal objects in this work, the Hawkes process cluster and the parking function, while also touching on related topics like branching processes, the Borel distribution, Poisson conditional uniformity, and Dyck paths. Originally introduced by Hawkes (1971), the Hawkes point process N_t and intensity λ_t are defined such that

P(N_{t+δ} − N_t = 1 | F_t) = λ_t δ + o(δ)  and  P(N_{t+δ} − N_t > 1 | F_t) = o(δ),   (1)

where F_t is the natural filtration on the underlying probability space (Ω, F, P) generated by the point process N_t, and where the conditional intensity function λ_t is given by

λ_t = λ + Σ_{i : A_i < t} g(t − A_i),   (2)

with {A_i | i ∈ Z} as the increasing sequence of arrival epochs in the point process N_t. The function g : R_+ → R_+ governs the excitement generated upon each arrival epoch, and thus is often referred to as the excitation kernel or excitation function. On the other hand, λ ≥ 0 is commonly called the baseline intensity, as it drives an underlying stream of exogenous arrivals. By construction in (2), every event epoch increases the intensity upon its occurrence by g(0) and then generally s time units later by g(s), hence earning the process its hallmark trait of "self-excitement." For this reason, it is the history of the process that drives its future activity. One of the most powerful tools for analysis of Hawkes processes has been an alternate definition of the model, originally provided and proved to be equivalent to (1) and (2) in Hawkes and Oakes (1974). This formulation, frequently referred to as the cluster-based definition, bears a style similar to branching processes. In a first-level stream, initial events are generated according to a Poisson process at rate λ. Then, in a secondary stream for each initial event, a progeny cluster is generated independently of all other clusters and arrivals. Starting with the initial event and for each successive arrival in the cluster, direct offspring of that event are generated according to an inhomogeneous Poisson process with rate g(t) for t time units elapsed since the given arrival epoch. This repeats with generating descendants of the offspring themselves until no further arrivals occur in the cluster, and extinction is guaranteed if ρ = ∫_0^∞ g(t) dt < 1.
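To make the branching construction concrete, the following minimal sketch simulates a single cluster directly from the Hawkes and Oakes (1974) definition just described. It is illustrative only, not the algorithm proposed later in this paper; the helper name simulate_cluster and the exponential-kernel usage in the final lines are our own assumptions, relying on the facts that each event's offspring count is Pois(ρ) and that each offspring delay has density g(·)/ρ.

```python
import numpy as np

def simulate_cluster(rho, sample_delay, rng):
    """Simulate one Hawkes cluster by the Hawkes-Oakes branching definition.

    rho          -- branching ratio, the integral of the excitation kernel (< 1)
    sample_delay -- callable drawing one offspring delay, with density g(.)/rho
    Returns the sorted arrival epochs, beginning with the initial event at time 0.
    """
    epochs, frontier = [0.0], [0.0]
    while frontier:
        parent = frontier.pop()
        # Each event has a Pois(rho)-distributed number of direct offspring,
        # whose delays from the parent are i.i.d. with density g(.)/rho.
        for _ in range(rng.poisson(rho)):
            child = parent + sample_delay(rng)
            epochs.append(child)
            frontier.append(child)
    return np.sort(np.array(epochs))

rng = np.random.default_rng(seed=1)
# Example: exponential kernel g(t) = a*exp(-b*t), so rho = a/b and delays are Exp(b).
a, b = 0.5, 1.0
cluster = simulate_cluster(a / b, lambda r: r.exponential(1.0 / b), rng)
```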
One of the foremost benefits of this alternate definition of the Hawkes process is that it makes very clear the two types of arrivals: the baseline-generated stream and the excitement-generated clusters that spawn off of it. Because the baseline-generated stream is a Poisson process, its behavior is well understood, and so the cluster-based definition allows us to focus on the impact of the self-excitement. This is where the focus of this paper will lie. To dedicate our attention to the structure of the self-excitement, we will narrow the primary definition and isolate the clusters. That is, let us essentially mirror Equations (1) and (2) but with a time 0 initial arrival (A_0 = 0) and no baseline stream (λ = 0).

Definition 1 (Hawkes Process Cluster). For t ≥ 0, we will henceforward take (λ_t, N_t) as the cluster-specific intensity-and-point-process pair with N_t functioning as in Equation (1) and with the intensity being simply

λ_t = Σ_{i=0}^{N_t − 1} g(t − A_i),   (3)

where A_0 = 0 without loss of generality. Supposing ρ = ∫_0^∞ g(t) dt < 1, the cluster can be fully characterized by its arrival epochs 0 = A_0 < A_1 < ⋯ < A_{N−1}, where N = lim_{t→∞} N_t is the cluster size and τ = A_{N−1} the cluster duration.

One can think of the cluster initializing with a time zero arrival from the exogenous baseline stream, and thus the above definition focuses on endogenously driven activity. The stability assumption ρ < 1 traces back to Hawkes' original work, and it provides that the size N is finite almost surely and that lim_{t→∞} λ_t = 0 almost surely (Hawkes 1971). The Hawkes and Oakes (1974) alternate definition of the model already provides perspective of the cluster's descendant structure irrespective of time, and this describes the cluster's size with clarity. Because each event spawns its own offspring Poisson process with time-varying rate g(t), its expected total number of offspring events is ρ, and moreover the total number of direct offspring of any one event is Pois(ρ) distributed. This means that the family tree becomes a Poisson branching process and thus the total progeny will be Borel distributed (Feller 2008). That is, the size N ∼ Borel(ρ) has probability mass function

P(N = k) = e^{−ρk} (ρk)^{k−1} / k!,   (4)

for all k ∈ Z_+. While this clear understanding of the cluster size is helpful, on the surface it only tells us a limited part of the cluster's story. However, we will see here that it provides a key to the chronology as well. We aim to leverage the time-agnostic perspective of the cluster's size to understand the times at which the events occur within it. Our approach is rooted in a concept like the conditional uniformity property provides for Poisson processes. The classical notion of conditional uniformity states that for a Poisson process with a homogeneous arrival rate, the joint distribution of the epochs A_1 < A_2 < ⋯ < A_k given that there were k arrivals in the interval [0, t) is equivalent to the joint distribution of the order statistics of k i.i.d. uniform random variables on [0, t). Alternately stated, the conditional law of the epochs is the uniform distribution on the set

U = {x ∈ R^k | 0 ≤ x_1 < x_2 < ⋯ < x_k < t}.

The volume of U is t^k/k!, and hence the joint density of this random vector is k!/t^k for all x ∈ U, as the generalized uniform distribution on a polytope means that all points in that polytope have density given by the inverse of the volume. It is of course straightforward to sample from U, as one can simply generate k i.i.d. Uni(0, 1) random variables, sort them, and multiply by t. This creates a handy way of sampling Poisson processes, and the idea extends to time inhomogeneous Poisson processes as well. For a Poisson process with time-varying arrival rate given by f : R → R_+ with F(t) = ∫_0^t f(s) ds, it can be simulated on an interval [0, t) where f(·) > 0 by generating the number of epochs according to Pois(F(t)), sampling the yielded number of standard uniform random variables, sorting them, and returning each arrival epoch as F^{−1}(F(t) · U_{(i)}), where U_{(i)} is the i-th smallest. This idea can be seen to follow simply from the inverse transform sampling method and the likelihood function of the time-varying Poisson process, but it can also be seen as a consequence of the random time change theorem, as we will discuss in Section 3.
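As a small sketch of this recipe (with an arbitrary example rate, f(s) = 2s, chosen by us only because F and its inverse are explicit):

```python
import numpy as np

def sample_nhpp(F, F_inv, t, rng):
    """Inhomogeneous Poisson sample on [0, t) via conditional uniformity:
    draw the count from Pois(F(t)), then map sorted uniforms through F^{-1}."""
    k = rng.poisson(F(t))
    u = np.sort(rng.uniform(size=k))
    return F_inv(F(t) * u)

rng = np.random.default_rng(seed=2)
# Example: f(s) = 2s on [0, 3), so F(s) = s^2 and F^{-1}(y) = sqrt(y).
epochs = sample_nhpp(lambda s: s ** 2, np.sqrt, 3.0, rng)
```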
Well-known in combinatorics, a k-step Dyck path is a non-decreasing trail of points on the lattice from (0, 1) to (k, k) such that the height of the path never exceeds the integer ceiling of its interval. For reference and intuition's sake, Figure 1 shows all 3-step Dyck paths. Dyck paths have myriad connections to other combinatorial objects, and many of these connections follow from the fact that they are enumerated by the Catalan numbers (e.g. Stanley 1999). We will let D_k be the set of all k-step Dyck paths. For convenience downstream, our convention will be to record the Dyck path as the vector containing the integer values that are the y-coordinate for each integer interval of the x-coordinate in the lattice path, omitting the terminus. That is, the two 2-step paths are recorded as [1, 1] and [1, 2], according to the height of the path over each unit interval of the x-axis.

Figure 1: The collection of all 3-step Dyck paths.

Closely related to Dyck paths are parking functions, which will serve as a focal point throughout this paper, and these can be defined through a parsimonious condition.

Definition 2 (Parking Function). For k ∈ Z_+, π ∈ Z_+^k is a parking function of length k if and only if it is such that, when sorted such that π_{(1)} ≤ ⋯ ≤ π_{(k)}, π_{(i)} ≤ i for each i ∈ {1, …, k}.

Parking functions earn their name from the following intuitive context. Suppose k cars arrive in successive fashion to a strip of k parking spaces, labeled 1 through k. Each car i has a preferred space π_i. If spot π_i is available, then the car will park there; otherwise, they will take the next available space after π_i, meaning greater than π_i. Hence, the constraints π_{(i)} ≤ i for all 1 ≤ i ≤ k ensure that all cars will be able to park. For our context, perhaps the most valuable property of parking functions will be that they can be viewed as labeled Dyck paths, or, equivalently, that a sorted parking function is a Dyck path (see, e.g., Section 13.6 of Yan 2015). That is, every k-step Dyck path is also a parking function of length k, but moreover every parking function of length k can be seen to be a permutation of a k-step Dyck path. We will let PF_k for each k ∈ Z_+ be the set of all length-k parking functions, and it will be valuable for us to both enumerate these and review an elegant proof of this enumeration.

Lemma 1. For each k ∈ Z_+, the number of parking functions of length k is |PF_k| = (k + 1)^{k−1}.

The proof is due to Pollak, but it was recorded and published by contemporaries (e.g. Riordan 1969, Foata and Riordan 1974), and it is also available on p. 836 in Yan (2015).
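Before turning to Pollak's proof, a minimal illustration of Definition 2 may help fix ideas; the function name here is ours, not the paper's.

```python
def is_parking_function(pi):
    """Definition 2: after sorting, the i-th smallest preference is at most i."""
    return all(p <= i for i, p in enumerate(sorted(pi), start=1))

assert is_parking_function([4, 1, 3, 1, 2])   # sorts to [1, 1, 2, 3, 4]
assert not is_parking_function([2, 3, 3])     # sorts to [2, 3, 3]; 2 > 1 fails
```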
Proof (Pollak's Circle Argument). Suppose that there are instead k + 1 parking spaces, and let π̃ ∈ {1, …, k + 1}^k be any k-length preference vector for these spaces. Let the cars progress one-by-one so that the i-th car is the i-th to pick and suppose, as before, that each car attempts to park in their preferred spot π̃_i and, if unavailable, takes the next space open after it. However, suppose now that the spaces are arranged in a circle, so that if a car has made it to space k + 1 without parking, it will start again at space 1. Hence, one can instead think of π̃ as an element of (Z/(k+1)Z)^k. By the pigeonhole principle, all k cars will be able to park and there will be exactly one space remaining, some ℓ ∈ Z/(k+1)Z. If ℓ = k + 1, then π̃ is a parking function, and if not, it can be converted to one by defining π = (π̃ − ℓ) mod (k + 1), effectively rotating the empty spot to align with space k + 1. Since there are k + 1 possible rotations that would convert to the same parking function, the cardinality of the set of parking functions is 1/(k+1) of the cardinality of the set of preference vectors, which is (k + 1)^k. □

Figure 2: Example conversion of a preference vector drawn from {1, …, k + 1}^k (here, π̃ = [2, 5, 1, 5, 6]) to a parking function of length k (π = [4, 1, 3, 1, 2]) by rotating the empty space to position k + 1.

A demonstration of this argument and the concept of rotating the empty space is shown in Figure 2. This construction along the circle leads to a group theoretic sampling of uniformly random parking functions that is used in both Diaconis and Hicks (2017) and Kenyon and Yin (2021), and we will employ this later in our proposed Hawkes cluster simulation algorithm in Section 6.
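Pollak's argument doubles as a sampler: draw a uniform preference vector on (Z/(k+1)Z)^k, park the cars around the circle, and rotate so the empty space lands at k + 1. A sketch follows, where the circular-lot bookkeeping is our own implementation detail.

```python
import numpy as np

def uniform_parking_function(k, rng):
    """Draw a uniformly random length-k parking function via Pollak's circle."""
    pref = rng.integers(1, k + 2, size=k)        # uniform preferences on {1,...,k+1}
    occupied = np.zeros(k + 1, dtype=bool)       # k + 1 spaces arranged in a circle
    for p in pref:
        spot = p - 1
        while occupied[spot]:
            spot = (spot + 1) % (k + 1)          # wrap from space k+1 back to space 1
        occupied[spot] = True
    ell = int(np.flatnonzero(~occupied)[0]) + 1  # label of the single empty space
    # Rotate the preferences so that the empty space aligns with space k + 1.
    return (pref - ell - 1) % (k + 1) + 1

rng = np.random.default_rng(seed=3)
pi = uniform_parking_function(5, rng)            # e.g., recovers Figure 2 when ell = 4
```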
To connect conditional uniformity and Hawkes processes and to begin understanding this relationship, we first need to define the compensator of the process, a transformation of the stochastic process. Let G(t) = ∫_0^t g(u) du. Then, the continuous time compensator of the Hawkes cluster intensity is given by

Λ(t) = ∫_0^t λ_s ds = Σ_{i=0}^{N_t − 1} G(t − A_i).

By construction, Λ(t) is a continuous and increasing function. In general, the difference of a point process and its compensator, N_t − Λ(t), is a martingale, which can be seen from the Doob-Meyer decomposition (see, e.g., Lemma 7.2.V in Daley and Vere-Jones 2003), and this hints at some of the elegant martingale approaches to the random time change theorem. In the case of the Hawkes process, we can see that the process is deterministic between its arrival epochs, and so we can capture the full information of the sample path through the discrete time compensator process evaluated only at these points in time:

Λ_k = Λ(A_k) = Σ_{i=0}^{k−1} G(A_k − A_i),

for k ∈ Z_+, where Λ_k is only defined for clusters of size at least k + 1, so that A_k < ∞. Here, we can see that not only is the sequence increasing by definition since λ_t > 0 as long as there is activity in the cluster, we can also recall that lim_{t→∞} G(t) = ρ = ∫_0^∞ g(u) du, and so Λ_{k−1} < Λ_k < kρ. The compensator sets the stage for us to invoke and extend the classical random time change theorem.

Lemma 2. Given that there are N = k + 1 events in the cluster in total, including the initial time 0 event, the conditional joint density of the transformed epochs is

f_{Λ | N}(x | k + 1) = (k + 1)! / ((k + 1)ρ)^k,

for all k ∈ N and all x ∈ P_k, for Λ as the vector of the compensator points and the polytope

P_k = {x ∈ R_+^k | x_i < x_{i+1} ∀ 1 ≤ i ≤ k − 1, x_i < iρ ∀ 1 ≤ i ≤ k}.

Proof. From the classical random time change theorem, e.g. Equation (2.14) of Brown et al. (2002), the unconditioned joint density of the first k compensator points is

f_Λ(x) = e^{−x_k}, for 0 < x_1 < ⋯ < x_k,

where Λ_i < iρ for all 1 ≤ i ≤ k is both implied and required whenever there are at least k offspring events. Then, the conditional density can be expressed

f_{Λ | N}(x | k + 1) = f_Λ(x) P(N = k + 1 | Λ = x) / P(N = k + 1).

Now, given that there have been k epochs (excluding the initial time 0 event) so far up to time A_k, the event that there will be exactly k epochs in the cluster in total is equivalent to the event that there will be no epochs after time A_k. Given the history of the process up to time A_k (as contained in the natural filtration F_{A_k}), the probability that N_{[A_k,∞)} = 0 is given by

P(N_{[A_k,∞)} = 0 | F_{A_k}) = exp(−∫_{A_k}^∞ λ_s ds),

since Equation (1) shows that the Hawkes process is conditionally non-stationary Poisson given its history. Because the intensity λ_t must be strictly positive on [0, A_k], there is a unique and deterministic bijection between (A_1, A_2, …, A_k) and (Λ_1, Λ_2, …, Λ_k). Hence, because the collection of arrival epochs completely specifies the sample path of the process, so does the collection of compensator points. Therefore, we have

P(N = k + 1 | Λ_1, …, Λ_k) = exp(−∫_{A_k}^∞ λ_s ds),

which by the definition of the compensator further implies that

P(N = k + 1 | Λ_1, …, Λ_k) = exp(−(lim_{t→∞} Λ(t) − Λ_k)) = e^{−((k+1)ρ − Λ_k)},

since each of the k + 1 events contributes ρ to the compensator in the limit. Recalling that N ∼ Borel(ρ), we can substitute this probability mass function for P(N = k + 1) and achieve the stated result. □

Let us briefly contrast Lemma 2 with the classical version of the random time change theorem. It is well-established that the seminal result holds for any simple, adapted point process, and that of course includes the Hawkes process we study here. So, it is not that we are newly extending to self-exciting processes; rather, the novelty is in the use of conditioning. Specifically, in this lemma we obtain the conditional joint density of the compensator points given the size of the cluster. Because this conditioning is on a tally collected at the end of time, we depart from some of the common assumptions of the classical random time change theorem. That is, the statement of the theorem typically contains an assumption that Λ(t) is not bounded almost surely (such as in Theorem 7.4.I in Daley and Vere-Jones 2003), but here we have discussed that each Λ_i is strictly less than iρ. In fact, P(lim_{t→∞} Λ(t) = (k + 1)ρ | N = k + 1) = 1. By direct consequence, we also do not require an infinite sequence of epochs on the positive real half-line (as in Proposition 7.4.IV or Theorem T16 in, respectively, Daley and Vere-Jones 2003, Brémaud 1981). Similarly, there is also often an assumption that λ_t > 0 for all t in the time interval of the transform (as in Brown et al. 2002), but here we are explicitly assuming that the intensity converges to 0: P(lim_{t→∞} λ_t = 0 | N = k + 1) = 1. These changes are subtle in assumption but important in consequence, as they shift the connection's end result from the Poisson to the uniform distribution instead. The key takeaway from Lemma 2 is that the conditional density is constant: it has no dependence on the values of the compensator points. Thus, for a cluster with k post-initial events (N = k + 1), any collection of compensator points satisfying 0 < Λ_1 < Λ_2 < ⋯ < Λ_k with Λ_i < iρ is equally likely to occur. As used in the proof of Lemma 2, the compensator vector entirely characterizes the cluster, so we now have access to the full sample path through these conditionally uniform points. This also shows that any two Hawkes processes with the same ρ will have equivalent distributions of compensator points; neither the size of the cluster nor the conditional joint density of the compensators depends on g(·) outside of ρ.
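These constraints are easy to confirm numerically. The sketch below reuses our illustrative simulate_cluster helper from Section 2 under an exponential kernel (again only an example choice), computes the compensator points, and checks that every transformed cluster lands in P_k.

```python
import numpy as np

def compensator_points(epochs, G):
    """Map epochs 0 = A_0 < A_1 < ... to Lambda_k = sum_{i<k} G(A_k - A_i)."""
    A = np.asarray(epochs)
    return np.array([np.sum(G(A[k] - A[:k])) for k in range(1, len(A))])

a, b = 0.5, 1.0                                # exponential kernel g(t) = a*exp(-b*t)
rho = a / b
G = lambda x: rho * (1.0 - np.exp(-b * x))
rng = np.random.default_rng(seed=4)
for _ in range(1000):
    cluster = simulate_cluster(rho, lambda r: r.exponential(1.0 / b), rng)
    lam = compensator_points(cluster, G)
    # Ordered and bounded: 0 < Lambda_1 < ... < Lambda_k with Lambda_i < i * rho.
    assert np.all(np.diff(lam) > 0)
    assert np.all(lam < rho * np.arange(1, len(lam) + 1))
```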
Still, the establishment of conditional uniformity is not out of the blue. Setting aside the departure from typical random time change theorem assumptions, the conditional uniformity in Lemma 2 is not overly surprising given the well known transformation to a unit-rate Poisson process. However, what is special for this result is the polytope on which the transformed points lie uniformly distributed. Recalling the review of Poisson conditional uniformity in Section 2, we know that the only constraints on the Poisson arrival epochs are that each point is less than the next. However, Lemma 2 shows that in addition to that, each compensator point in the Hawkes cluster is constrained to be less than the product of its index and ρ. Hence, the interesting piece of this conditional uniformity that remains to be understood is determining what exactly it means to be uniformly random on this convex polytope P_k. This leads us to our first main result. Lemma 2 gave us the family of polytopes {P_k | k ∈ N} upon which the cluster's compensator points are conditionally uniform. What we will find now is that this implies that parking functions provide a hidden spine to these transformed arrival epochs. Theorem 1 shows that uniformly random parking functions actually provide a partition for these polytopes from which the compensators can be drawn directly through standard uniforms. Hence, this decomposition elucidates the structure within self-excitement and unites Hawkes processes with parking functions.

Theorem 1. Let k ∈ N. Given that the Hawkes process cluster is comprised of N = k + 1 events, the compensator transformed arrival epochs Λ_1, …, Λ_k are such that

[Λ_1, …, Λ_k] =_D ρ · [(π − U)_{(1)}, …, (π − U)_{(k)}],

where (π − U)_{(1)} ≤ ⋯ ≤ (π − U)_{(k)}, with π ∈ PF_k as a length-k parking function drawn uniformly at random and U ∈ [0, 1)^k as k i.i.d. standard uniform random variables.

We will prove Theorem 1 across a series of subsidiary statements. For simplicity and without loss of generality, let us consider the unit polytope

P̃_k = {x ∈ R_+^k | x_i < x_{i+1} ∀ 1 ≤ i ≤ k − 1, x_i < i ∀ 1 ≤ i ≤ k},

which maps to P_k through the bijection x → ρx for all k. To begin to establish our methodology of sampling from P̃_k, we will first view P̃_k as a union over sub-polytopes with measure 0 intersections. Specifically, let us distill each point in P̃_k as lying on intervals bounded by integer steps in each coordinate direction. Then, any point in P̃_k can be seen to lie in one of these subspaces. To organize this decomposition of P̃_k, we will draw in the first combinatorial object discussed in Section 2: Dyck paths. In Proposition 1, we use the collection of k-step Dyck paths to partition P̃_k into |D_k| = C_k = (1/(k+1))·(2k choose k) disjoint subspaces. As the brief proof demonstrates, the result itself is straightforward, but it will be of use in structuring the uniform sampling from the polytope.

Proposition 1. For each k ∈ Z_+, the unit polytope P̃_k can be decomposed through the set of Dyck paths as P̃_k = ⋃_{d ∈ D_k} P̃_{k,d}, where

P̃_{k,d} = {x ∈ R_+^k | x_i < x_{i+1} ∀ 1 ≤ i ≤ k − 1, d_i − 1 ≤ x_i < d_i ∀ 1 ≤ i ≤ k},

for each Dyck path d ∈ D_k.

Proof. By definition, the sets P̃_{k,d} are mutually disjoint across all paths d. Because the ordering constraint x_i < x_{i+1} is shared by every set in the Dyck path union and by P̃_k, we only must check that the space constraints agree. Here, we can quickly observe that 1 ≤ d_i ≤ i for each i ∈ {1, …, k}, so d_i − 1 ≤ x_i < d_i implies 0 ≤ x_i < i; conversely, any x ∈ P̃_k lies in P̃_{k,d} for the path given by d_i = ⌊x_i⌋ + 1, which is non-decreasing with d_i ≤ i by the constraints defining P̃_k. □

The three dimensional polytope P̃_3 and its Dyck path decomposition are shown in Figure 3. In some sets in the Dyck path partition, the ordering constraints will be superfluous given the integer ranges on which each coordinate lives.
For example, at any dimension k the largest volume subset will be the hypercube P̃_{k,[1,2,…,k]} = {x ∈ R_+^k | i − 1 ≤ x_i < i ∀ 1 ≤ i ≤ k}, in which the ordering constraints are not needed. By comparison, the smallest volume subset will correspond to the "lowest" Dyck path [1, 1, …, 1], which yields P̃_{k,[1,1,…,1]} = {x ∈ [0, 1)^k | x_i < x_{i+1} ∀ 1 ≤ i ≤ k − 1}. This structure is also intimately linked to the value of the decomposition. The benefit of Proposition 1 is that it is simple and intuitive to sample uniformly within a given subspace from the Dyck path partition.

Figure 3: The three dimensional polytope (left) and its decomposition into the five sub-regions indexed by Dyck paths (right).

That is, a uniform sample from the hypercube is simply given by U_i ∼ Uni(i − 1, i), drawn independently for each i. On the other hand, a uniform sample from the [1, 1, …, 1] subspace is given by the order statistics of k independent Uni(0, 1) random variables. This generalizes cleanly. To sample uniformly from the region given by the Dyck path d ∈ D_k, draw k independent Uni(0, 1) random variables, say U_1, …, U_k, and return the order statistics of d_1 − U_1, …, d_k − U_k as the sample. This is again straightforward to show.

Proposition 2. For k ∈ Z_+ and d ∈ D_k, let X_d ∈ R_+^k be equivalent in distribution to the order statistics of d − U, where U ∈ [0, 1)^k is a vector of k i.i.d. standard uniform random variables. Then, X_d ∼ Uni(P̃_{k,d}).

Proof. The two distributions share a sample space by definition. Because the elements of U are independent, distinct values of d yield independent elements of X_d. When values are repeated in d, the corresponding elements of X_d are equivalent to shifted standard uniform order statistics. Hence, for κ_i(d) = |{j | d_j = i}|, the density of X_d is f(x_1, …, x_k) = ∏_{i=1}^k κ_i(d)!, which is precisely the inverse of the volume of P̃_{k,d}. □

Hence, we see that the Dyck path indexing provides a way of encoding which points share an interval and which do not. The remaining piece for using this decomposition to uniformly sample from P̃_k is to find a distribution over the Dyck paths, so that we draw from P̃_k by first selecting a set from the partition and then drawing a point from within it. Because Dyck paths are enumerated by the Catalan numbers, we know that there will be C_k = (1/(k+1))·(2k choose k) sets within the partition of P̃_k. However, each set should not be equally likely; rather, the likelihood of each set should be proportional to its volume. This is clear at k = 2: P̃_{2,[1,2]}, a square comprised of 0 ≤ x_1 < 1 and 1 ≤ x_2 < 2, should be twice as likely as P̃_{2,[1,1]}, the half-square given by 0 ≤ x_1 < x_2 < 1. At k = 3, there is one cube (P̃_{3,[1,2,3]}), three half-cubes (P̃_{3,[1,1,2]}, P̃_{3,[1,1,3]}, and P̃_{3,[1,2,2]}), and one sixth-cube (P̃_{3,[1,1,1]}), as shown in Figure 3. Since the total volume of P̃_3 is 8/3, the respective probabilities for each of the five regions should be 3/8, 3/16, 3/16, 3/16, and 1/16. In what is of beautiful consequence for our problem, parking functions yield precisely the correct weighting of Dyck paths. Recalling from Section 2 that there is a bijection between sorted parking functions and Dyck paths, the suite of proportions of length-k parking functions that after sorting are equal to each k-step Dyck path yields exactly the desired distribution over the partitions of P̃_k.
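Proposition 2 thus gives a one-line sampler for any fixed cell of the partition, sketched below with Dyck paths in the vector notation above; the weighting over cells is exactly what the next proposition supplies.

```python
import numpy as np

def sample_in_cell(d, rng):
    """Uniform draw from the unit-polytope cell indexed by Dyck path d
    (Proposition 2): the order statistics of d - U for i.i.d. uniforms U."""
    return np.sort(np.asarray(d, dtype=float) - rng.uniform(size=len(d)))

rng = np.random.default_rng(seed=5)
x = sample_in_cell([1, 1, 2], rng)   # a point in one of the half-cubes of Figure 3
```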
Hence, to properly select the partition subset from which to sample in P̃_k, we can simply sample the Dyck paths according to the fractions of equivalent parking functions, or, even more directly, we can merely draw a parking function uniformly at random, sort it, and select the corresponding subset of the polytope. We prove this now in Proposition 3.

Proposition 3. For k ∈ Z_+, let π ∈ PF_k be a uniformly random parking function of length k, and let π↑ be the parking function sorted increasingly. Likewise, let Λ ∈ R_+^k be selected uniformly at random from P̃_k. Then, for any Dyck path d ∈ D_k,

P(Λ ∈ P̃_{k,d}) = P(π↑ = d) = k! / ((k + 1)^{k−1} ∏_{i=1}^k κ_i(d)!),   (11)

where κ_i(d) = |{j | d_j = i}| counts the number of occurrences of i within a Dyck path d.

Proof. Let d ∈ D_k be arbitrary. We will prove the result by showing that P(Λ ∈ P̃_{k,d}) and P(π↑ = d) are both equal to the right-most side of Equation (11). For the latter, the parking functions that sort to d are exactly the distinct permutations of d, of which there are k!/∏_{i=1}^k κ_i(d)!, and so dividing by |PF_k| = (k + 1)^{k−1} from Lemma 1 shows that Equation (11) gives the true distribution of π↑. Now, for P(Λ ∈ P̃_{k,d}), let us recall that Lemma 2 implies that the density of Λ on P̃_k is k!/(k + 1)^{k−1}. Hence, the probability that Λ lies in d's partition set is

P(Λ ∈ P̃_{k,d}) = (k!/(k + 1)^{k−1}) ∫_{P̃_{k,d}} dx_k ⋯ dx_1.

If d_{i−1} < d_i < d_{i+1}, then neither the range of integration nor the integrand of the integral with respect to x_i will depend on any of the other variables of integration, hence this integration is separable. Moreover, the integrals over x_i and x_j are not separable if and only if d_i = d_j. Therefore, we can re-express the integral as a product over the distinct values appearing in d, in which the κ_i(d) coordinates sharing the value i contribute the volume of an ordered set of points within a unit interval,

∫_{P̃_{k,d}} dx_k ⋯ dx_1 = ∏_{i=1}^k 1/κ_i(d)!,

where an empty term is taken to equal 1 by convention. In this simple form we can quickly see

P(Λ ∈ P̃_{k,d}) = k! / ((k + 1)^{k−1} ∏_{i=1}^k κ_i(d)!),

which yields the stated result for P(Λ ∈ P̃_{k,d}). □

Theorem 1 now follows directly.

Proof of Theorem 1. By Lemma 2, we have that the vector of transformed epochs Λ = [Λ_1, …, Λ_k] is jointly uniform over the polytope, i.e. Λ ∼ Uni(P_k). Equivalently, Λ/ρ ∼ Uni(P̃_k). Then, by the decomposition into disjoint subsets in Proposition 1, for d ∈ D_k we further have that Λ/ρ | (Λ/ρ ∈ P̃_{k,d}) ∼ Uni(P̃_{k,d}). By Proposition 2, the order statistics of d − U are also uniform within P̃_{k,d}. Now, Proposition 3 establishes that P(Λ/ρ ∈ P̃_{k,d}) = P(π↑ = d) for a uniformly random parking function π, and thus the order statistics of π − U and Λ/ρ are equivalent in their uniform distribution on P̃_k. □

In the remainder of this paper, we will demonstrate some of the advantages of the parking function decomposition of the Hawkes process clusters. We start by showcasing the analytical value of the conditional uniformity through application to what is perhaps the most widely used Hawkes excitation kernel. Within this section, let us now assume that g(x) = αe^{−βx} for 0 < α < β. Under this kernel, the intensity becomes a Markov process (Oakes 1975). Much of the popularity of the exponential kernel follows from its frequent tractability, as we will see here. Let us start by expressing the compensator for the exponential kernel. Given g(x) = αe^{−βx}, we see that G(x) = ρ(1 − e^{−βx}) for ρ = α/β, and thus

Λ_k = Σ_{i=0}^{k−1} G(A_k − A_i) = kρ − ρ Σ_{i=0}^{k−1} e^{−β(A_k − A_i)}.   (12)

There is a great deal of information hidden in this expression. For example, if we let λ_{A_k^−} = Σ_{i=0}^{k−1} α e^{−β(A_k − A_i)} be the value of the intensity immediately before the cluster's k-th event, then we can observe that this quantity is merely an affine transformation of the corresponding compensator point Λ_k. That is,

λ_{A_k^−} = β(kρ − Λ_k).   (13)

An interesting consequence of this relationship is that, after shifting and normalizing, the pre-event intensity values will satisfy the same distributional properties seen in Lemma 2 and Theorem 1. Hence, the conditional uniformity property is even stronger for this kernel.
Given the number of events in the cluster, here we find the stronger and even more surprising implication that the intensity sample paths are conditionally uniform. The connection to parking functions also carries over naturally, as Equation (12) shows that the pre-event intensities are affine transformations of the compensator points, and hence of the sorted differences ρ(π − U) in Theorem 1. This tractability also allows us direct access to the distribution of the cluster's duration, as the arrival times and inter-arrival times can be easily obtained from the compensator points. Letting S_k = A_k − A_{k−1} be the inter-arrival times of the cluster, we can see that

Λ_k = kρ − ρ e^{−βS_k} Σ_{i=0}^{k−1} e^{−β(A_{k−1} − A_i)} = kρ − e^{−βS_k}(kρ − Λ_{k−1}),

and thus each inter-arrival time is given by

S_k = (1/β) ln((kρ − Λ_{k−1}) / (kρ − Λ_k)).   (14)

Hence, we also immediately have that the k-th inter-arrival time can be written

S_k =_D (1/β) ln((k − (π − U)_{(k−1)}) / (k − (π − U)_{(k)})),

for all k ∈ Z_+, with the conventions Λ_0 = 0 and (π − U)_{(0)} = 0. This convenient form enables us to analyze this paper's most sought after single quantity: the cluster duration τ = A_{N−1}. This brings us to our second main result, in which we provide a parsimonious characterization of the distribution of the duration, showing that τ is equivalent to a sum of conditionally independent exponential random variables. Again, a uniformly random parking function appears as a hidden spine of the Hawkes process cluster.

Theorem 2. Let N ∼ Borel(ρ) for ρ = α/β, and, given N, let π be a uniformly random parking function of length N − 1. Then, the duration of a cluster in the Markovian Hawkes process satisfies

τ =_D (1/β) Σ_{i=1}^{N−1} T_{π,i},

where T_{π,i} ∼ Exp(i + 1 − Σ_{j=N−i}^{N−1} κ_j(π)) are conditionally independent given π, with κ_i(π) = |{j | π_j = i}|.

Proof. Let us first find an expression for the conditional Laplace-Stieltjes transform (LST) directly from Lemma 2, and then interpret it through the lens of Theorem 1. If N = 1, then the cluster contains only the initial time-0 event. Thus, P(τ = 0 | N = 1) = 1 and E[e^{−θτ} | N = 1] = 1. Then, for k ≥ 1, we can see through applying Equation (14) that the conditional LST can be found through the integral

E[e^{−θτ} | N = k + 1] = ((k + 1)!/((k + 1)ρ)^k) ∫_{P_k} ∏_{i=1}^k ((iρ − Λ_i)/(iρ − Λ_{i−1}))^{θ/β} dΛ_k ⋯ dΛ_1.

Now, for reference, let us note that for c ≥ 0, a > 0, and m ∈ N we can obtain the following definite integral directly from the binomial theorem and integration by parts:

∫_0^a y^c (y + 1)^m dy = Σ_{j=0}^m (m choose j) a^{c+j+1}/(c + j + 1).   (16)

To make use of this for the conditional LST, let us substitute y_i = i − Λ_i/ρ for each index of integration Λ_i. After iterative application of (16), the exponent on the next index of integration will depend on the exponent of the previous. This leaves us with a nested sum in which the bounds of summation, rather than the bounds of integration, depend on the magnitude of the prior term, with one summand per vector ℓ in an index set L_{k−1}. By decomposing the binomial coefficients into the underlying factorial terms, cancelling numerators with preceding denominators, and moving the remaining pieces to the product, we can pull (k − 1)! from the leading (k + 1)! and recognize a resulting multinomial coefficient. That is, after cancellation, the remaining denominators contain factorials of terms that will sum to k − 1, and thus with the leading (k − 1)! the multinomial arises. To simplify further, we can now multiply and divide by ℓ_{k−1} + 1, allowing us to re-index the product and absorb the leading k into the multinomial, yielding a form which we can now interpret. First, let us observe that L_{k−1} can be viewed as the set of differences between every k-step Dyck path and the diagonal y = x line. This is implied directly from the constraints defining the set. Following this, the multinomial coefficient can then be viewed as counting the number of length-k parking functions that can be formed using these integer differences between the Dyck paths and the y = x line. That is, the multinomial coefficient counts the number of parking functions with ℓ_1 − 1 k's, ℓ_1 + ℓ_2 − 2 (k−1)'s, and so on. Using the κ_i(π) notation, we can then re-express the conditional Laplace-Stieltjes transform as a sum over all parking functions:

E[e^{−θτ} | N = k + 1] = (1/(k + 1)^{k−1}) Σ_{π ∈ PF_k} ∏_{i=1}^k (i + 1 − Σ_{j=k+1−i}^{k} κ_j(π)) / (i + 1 − Σ_{j=k+1−i}^{k} κ_j(π) + θ/β).

Here, we have used the pattern above to substitute i − Σ_{j=k+1−i}^{k} κ_j(π) for ℓ_i. Because there are (k + 1)^{k−1} parking functions of length k, this can also be viewed as an expectation over a uniformly sampled parking function. That is,

E[e^{−θτ} | N = k + 1] = E[∏_{i=1}^k (i + 1 − Σ_{j=k+1−i}^{k} κ_j(π)) / (i + 1 − Σ_{j=k+1−i}^{k} κ_j(π) + θ/β)],

where the expectation is taken relative to π. By recognizing that the Laplace-Stieltjes transform of an exponential random variable with rate r, evaluated at θ/β, is r/(r + θ/β), we find

E[e^{−θτ} | N = k + 1] = E[exp(−(θ/β) Σ_{i=1}^k T_{π,i})],

where now the expectation is taken relative to both π and the random variables T_{π,i}. Hence, removing the conditioning on N = k + 1, we achieve the stated equivalence. □

As direct corollaries of this result, we can see how the distribution of objects like the duration, the intensities, and the full arrival epoch sequence depend on the parameters α and β. For example, by consequence of Theorem 2, if 0 < α_1 < β_1 and 0 < α_2 < β_2 with α_1/β_1 = α_2/β_2 are parameter pairs defining two Markovian Hawkes models with respective duration random variables τ_1 and τ_2, then β_1 τ_1 and β_2 τ_2 are equivalent in distribution. Similarly said, for ρ = α/β held fixed, Theorem 2 shows that the only dependence τ has on β is through the 1/β coefficient. This realization can actually be traced back to Lemma 2, implying that this same proportionality can be extended to any arrival epoch, not only the final one. Conditioning yields similar insights even when comparing different values of ρ. For example, the conditional LST found in the proof shows that β_1 τ_1 | N_1 = k is equivalent in distribution to β_2 τ_2 | N_2 = k for all k ∈ N, even if α_1/β_1 ≠ α_2/β_2.
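Theorem 2 translates directly into a two-stage sampler for the duration, sketched below. It assumes the uniform_parking_function helper from our earlier sketch and a Borel sampler built on the branching interpretation of the cluster size; both helper names are ours.

```python
import numpy as np

def sample_borel(rho, rng):
    """Total progeny of a Poisson(rho) branching process is Borel(rho)."""
    total, frontier = 1, 1
    while frontier > 0:
        frontier = rng.poisson(rho * frontier)   # offspring of the current generation
        total += frontier
    return total

def sample_duration(alpha, beta, rng):
    """Sample the Markovian cluster duration tau via Theorem 2."""
    n = sample_borel(alpha / beta, rng)
    if n == 1:
        return 0.0                               # the cluster is only the initial event
    k = n - 1
    pi = uniform_parking_function(k, rng)
    kappa = np.bincount(pi, minlength=k + 1)     # kappa[j] = |{i : pi_i = j}|
    # T_{pi,i} has rate i + 1 - sum_{j=N-i}^{N-1} kappa_j(pi), with N - 1 = k.
    rates = np.array([i + 1 - kappa[k + 1 - i: k + 1].sum() for i in range(1, k + 1)])
    return rng.exponential(1.0 / rates).sum() / beta

rng = np.random.default_rng(seed=6)
durations = [sample_duration(0.5, 1.0, rng) for _ in range(10000)]
```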
In addition to analytic results, the conditional uniformity and parking function decomposition also enable us to propose a novel simulation algorithm for Hawkes process clusters. This new procedure draws upon the combinatorial structure and group theoretic perspectives of parking functions, mirroring what we have reviewed in Section 2. As we will see, this new algorithm offers competitive efficiency relative to widely used procedures in the literature. However, like in the preceding analysis demonstration, the most important implication of this methodology again lies in the conditioning. The method we propose first generates the size of the cluster and then generates the arrival epochs accordingly, and we will discuss how this leads to natural applications for rare event simulation and importance sampling.

Algorithm 1: Simulation of a Hawkes Process Cluster
Input: Integrated excitation kernel G(x) = ∫_0^x g(u) du, with ρ = ∫_0^∞ g(u) du.
Output: Cluster of N self-excited arrival epochs 0 = A_0 < A_1 < ⋯ < A_{N−1}.
1. Generate the total number of events, N ∼ Borel(ρ).
2. Generate the conditionally uniform compensator points, Λ ∈ R^{N−1}:
   (a) Draw a uniformly random preference vector π̃ ∈ {1, …, N}^{N−1}.
   (b) Convert π̃ into a uniformly random parking function π ∈ PF_{N−1} by parking along the circle and rotating the empty space to position N, as in the proof of Lemma 1.
   (c) Draw i.i.d. U_i ∼ Uni(0, 1) for i ∈ {1, …, N − 1} and set Λ = ρ · sort(π − U).
3. Return A_1, …, A_{N−1} as the unique solution to the system of equations

Λ_k = Σ_{i=0}^{k−1} G(A_k − A_i), for k ∈ {1, …, N − 1}.

We present the pseudo-code of our procedure now in Algorithm 1. Let us walk through the idea of this method. The algorithm terminates for any Hawkes process with ρ < 1, and it uses G(·) and ρ as arguments. In the first step, the size of the cluster N is sampled from the Borel distribution described in Equation (4). All of the remaining steps will then use N as if it were a parameter. Steps 2 and 3 presume that N > 1; whenever Step 1 yields N = 1, the simulation trivially completes with A_0 = 0. The second step uses Theorem 1 to generate the vector of compensator transformed arrival epochs, Λ, by generating a uniformly random parking function, π. Specifically, Steps 2(a)-(b) implement Pollak's circle argument to produce π, and Step 2(c) converts it into a uniform draw from the polytope P_{N−1}. The third step then inverts the compensator transform to recover the arrival epochs themselves.

Proposition 4. For any excitation kernel with ρ = ∫_0^∞ g(u) du < 1, Algorithm 1 produces an exact sample of a Hawkes process cluster as given in Definition 1.

The proof of Proposition 4 follows directly from Theorem 1 and the proof of Lemma 1.
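For the exponential kernel, Step 3 is available in closed form through Equation (14), so Algorithm 1 can be sketched end to end as follows; the sketch reuses our sample_borel and uniform_parking_function helpers from above, and the function name is again our own.

```python
import numpy as np

def simulate_cluster_pf(alpha, beta, rng):
    """Algorithm 1 for g(t) = alpha*exp(-beta*t): returns the sorted cluster epochs."""
    rho = alpha / beta
    n = sample_borel(rho, rng)                     # Step 1: N ~ Borel(rho)
    if n == 1:
        return np.zeros(1)
    k = n - 1
    pi = uniform_parking_function(k, rng)          # Steps 2(a)-(b): Pollak's circle
    lam = rho * np.sort(pi - rng.uniform(size=k))  # Step 2(c): Lambda = rho*sort(pi - U)
    # Step 3 in closed form via Equation (14):
    # S_k = (1/beta) * log((k*rho - Lambda_{k-1}) / (k*rho - Lambda_k)).
    idx = np.arange(1, k + 1)
    lam_prev = np.concatenate(([0.0], lam[:-1]))
    inter = np.log((idx * rho - lam_prev) / (idx * rho - lam)) / beta
    return np.concatenate(([0.0], np.cumsum(inter)))

rng = np.random.default_rng(seed=7)
epochs = simulate_cluster_pf(0.5, 1.0, rng)
```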
One of the benefits of this algorithm is that it is modular, in the sense of the word as it might be used by enthusiasts of high fidelity audio equipment or designers of building construction. That is, it can be built from the pieces one prefers. Each of the three primary steps can be conducted through one's selected method, swapping a style of one component out for another approach as desired. For example, our Step 2 uses Pollak's circular parking argument to generate π, which aligns with Kenyon and Yin (2021). However, one could just as readily use a similar idea described in Diaconis and Hicks (2017), where the preference vector is simply iteratively increased by 1 modulo N until it is a valid parking function. Similarly, one could forgo Theorem 1 and instead sample from the polytope P_k in Step 2 through use of Markov Chain Monte Carlo (MCMC) methods like Metropolis-Hastings or hit-and-run algorithms (see, e.g., Chen et al. 2018, and references therein). Likewise, our implementations of Algorithm 1 have generated N in Step 1 directly through the probability mass function in (4), but one could, for example, instead simulate a Poisson branching process to yield the same distribution. This speaks to another advantage to which we have already alluded: the potential for application in rare event simulation and importance sampling. Algorithm 1 untangles the cluster distribution and the cluster size. That is, this method allows one to specify a particular value or, more generally, a target distribution of the cluster size in Step 1 and then generates a collection of cluster arrival epochs matching this desired cardinality in Steps 2 and 3. We expect this to be of particular practical value, as the Borel distribution might be called nearly or asymptotically heavy tailed. As ρ → 1, Equation (4) yields a valid probability distribution on the positive integers with no finite moments. Hence, it is quite natural for a Hawkes process with ρ near 1 to experience clusters of extreme size, and Algorithm 1 provides a controlled way of simulating and evaluating these scenarios. To the best of our knowledge, no prior Hawkes process simulation algorithm allowed for user selection of the cluster size, as this is a direct consequence of the conditional uniformity we have explored throughout this paper.

Table 1: Observed run times (seconds) of Hawkes process cluster simulation algorithms across various choices of excitation kernels (2^20 replications).

To measure the efficiency and performance of Algorithm 1, in Table 1 we compare the observed run times of this procedure against prior approaches from the literature. Specifically, we consider two prior algorithms: the classical non-stationary Poisson clustering process perspective provided by Hawkes and Oakes (1974) and the exact simulation method by Dassios and Zhao (2013). Let us note that other well known Hawkes process simulation methods, such as the thinning procedure from Lewis and Shedler (1979) and Ogata (1981) or Algorithm 7.4.III from Daley and Vere-Jones (2003), cannot be applied to generation of clusters alone, as they implicitly rely on the presence of a continual baseline stream.
Hawkes and Oakes (1974)'s method is essentially exactly like the corresponding definition: for each arrival, simulate its own descendants according to a non-stationary Poisson process with rate given by the kernel g(·), and repeat this process until there are no more descendants. The generality of this procedure has made it quite popular; it is a backbone subroutine of the perfect Hawkes process simulation algorithms in Møller and Rasmussen (2005), Chen (2021), and Chen and Wang (2020). The Dassios and Zhao (2013) procedure (Algorithm 3.1 in that paper) only applies to the exponential kernel case, and it relies on that Markov assumption. At each step, a weighted coin is flipped to decide if there will be another arrival given the current value of the intensity. If there will be another, then the time and intensity are updated accordingly, and otherwise the simulation terminates. We conduct the simulation in Table 1 with two families of excitation kernels, the Markovian exponential decay kernel and the heavy-tailed power law kernel, which is often called "Omori's law" in seismology (Ogata 1998). Specifically, we consider a series of kernels in each family with increasing mean cluster size: g(x) = (4^m − 1)e^{−4^m x} for the exponential, which will have mean cluster size 4^m, and g(x) = (2^m − 1)/(2^m + x)^2 for the power law, which will have mean cluster size 2^m. All three algorithms can be used to simulate clusters under the exponential kernel, but only Hawkes and Oakes (1974) can be compared to Algorithm 1 outside of the Markovian setting. Table 1 shows that the speed of Algorithm 1 is broadly competitive with the literature and superior in many cases. In particular, in the case of the exponential kernel, the parking function simulation of the Hawkes process proves to be more efficient than either alternative as the clusters grow large. Here, Section 5 shows how Step 3 can be solved in closed form, and so the efficiency of generating one Borel distributed random variable and one random parking function outpaces the performance of either simulating a non-stationary Poisson process many times in Hawkes and Oakes (1974) or performing many iterative computations of the intensity and the next event epoch in Dassios and Zhao (2013). When the clusters are mostly small, though, the set-up of the circular parking function sampling is likely less efficient than Dassios and Zhao (2013)'s simple steps, and so Algorithm 1 only outperforms Hawkes and Oakes (1974) in such settings. For general decay kernels, closed form solutions to Step 3 may not be available, and one might instead turn to root-finding methods, like what we have done here for the power law kernel. Here we make relatively naive use of Newton's method, and while the efficiency of the Borel and parking function generations helps Algorithm 1, it is clear that its performance is bottlenecked by the Newton calculations. Thus, as the cluster size increases, our method should eventually be outpaced by the Hawkes and Oakes (1974) approach. Hence, careful design of Step 3 of Algorithm 1 is an interesting future direction for this work, and it seems likely that more efficient approaches are available when tailoring to a particular kernel. As we have discussed, though, we anticipate the primary benefit of this algorithm to lie in the conditioning on N, since the size distribution's propensity for large values creates an opportunity for rare event simulation.
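As a sketch of the root-finding variant of Step 3 just discussed, the routine below sequentially solves Λ_k = Σ_{i<k} G(A_k − A_i) with a safeguarded Newton iteration; the power-law parameterization mirrors the experiments above, the safeguard is our own implementation choice, and the helper names from the earlier sketches are again ours.

```python
import numpy as np

def invert_compensator(lam, G, g):
    """Step 3 of Algorithm 1 for a general kernel: sequentially solve
    Lambda_k = sum_{i<k} G(A_k - A_i) for A_k with Newton's method."""
    A = [0.0]
    for target in lam:
        prev = np.array(A)
        t = A[-1] + 1.0                            # initial guess past the last epoch
        for _ in range(100):
            h = np.sum(G(t - prev)) - target       # increasing in t, so the root is unique
            if abs(h) < 1e-12:
                break
            t -= h / np.sum(g(t - prev))           # the derivative is the intensity
            t = max(t, A[-1] + 1e-9)               # safeguard: stay right of A_{k-1}
        A.append(t)
    return np.array(A)

# Power-law ("Omori") example with m = 3: g(x) = (2^m - 1)/(2^m + x)^2.
c = 2.0 ** 3
g = lambda x: (c - 1.0) / (c + x) ** 2
G = lambda x: (c - 1.0) * x / (c * (c + x))
rho = (c - 1.0) / c
rng = np.random.default_rng(seed=8)
n = sample_borel(rho, rng)                         # helpers from the earlier sketches
if n > 1:
    pi = uniform_parking_function(n - 1, rng)
    lam = rho * np.sort(pi - rng.uniform(size=n - 1))
    epochs = invert_compensator(lam, G, g)
```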
In this paper, we have seen that uniformly random parking functions constitute hidden spines of Hawkes process clusters, providing a decomposition structure for the conditionally uniform compensator transform of the cluster's arrival epochs. Hence, from the cluster size and a random parking function, we can faithfully reconstruct the full sequence of events in the cluster. We have also demonstrated the impact of this connection to both analysis and simulation. For the former, we have found a surprising distributional equivalence between the duration of the Markovian Hawkes cluster and a random sum of exponential random variables. For the latter, we have demonstrated that the resulting sampling methodology offers both efficiency and, perhaps more importantly, fine control over the experiment through cluster-size-conditioning. Much of our discussion of the connection between Hawkes clusters and parking functions has centered around what the discrete object can do for the continuous, but we can also see that there is some partial reciprocity available. That is, it is an immediate consequence of Theorem 1 that, for the arrival epochs drawn from any Hawkes process cluster, the shuffled ceiling of the compensator points will be a parking function, where only the length of the parking function will be dependent on the parameters of the Hawkes process. This underscores a point that we have mentioned only briefly, which is that Lemma 2 shows how one Hawkes process cluster can readily be transformed to another cluster driven by an entirely different excitation kernel so long as the values of ρ match. Hence, one could replace Steps 1 and 2 in Algorithm 1 with simulating the cluster according to some efficient Hawkes process procedure, such as Dassios and Zhao (2013), and then transform to the true targeted excitation function in Step 3. In a similar notion, one could also petition to the classical random time change theorem in place of Lemma 2 and replace Steps 1 and 2 with a unit rate Poisson process. That is, one could run a unit rate Poisson process until the first index i ≥ 1 such that the Poisson arrival epoch T_i exceeds iρ, and then return times 1 through i − 1 as the compensator points and proceed to Step 3. Analogously, by the closure of the exponential distribution when multiplying by a constant, one could also run a Poisson process at rate ρ and then compare T_i just to the index i. This is close to the idea of Algorithm 7.4.III from Daley and Vere-Jones (2003), but the above proposal would terminate with an absorption event for the end of the cluster. By comparison, it is inherent to Algorithm 7.4.III that the point process continues indefinitely and that the compensator grows without bound, as otherwise "the final interval is then infinite and so cannot belong to a unit-rate Poisson process" (Daley and Vere-Jones 2003). In numerical experiments, we found that the speed of this alternate method is essentially identical to the Dassios and Zhao (2013) algorithm. This makes sense, because they are mostly doing the same thing: generating exponentials until an absorption event occurs. Hence, there could be some efficiency gains over our algorithm when ρ is not large, but regardless, like Dassios and Zhao (2013), this approach does not offer the control over the support and distribution of N like Algorithm 1 does.
In the broader perspective of the Hawkes process literature, we have studied the same model as contemporary works like Chen (2021) and Graham (2021): a linear Hawkes process (relative to the non-linear generalization introduced by Brémaud and Massoulié 1996), univariate (relative to the mutually exciting version actually dating back to Hawkes 1971), with general excitation kernel (relative to kernel-specific works like Dassios and Zhao 2013). For generalizations of these results, our initial suspicion is that multiple dimensions is the most promising next step, as analogs of much of the background fundamentals we have drawn upon are available. In particular, the random time change still holds in higher dimensions, where we can build from results both for general processes (e.g. Brown and Nair 1988) and for Hawkes processes specifically (e.g. Embrechts et al. 2011). One can also leverage similar time-agnostic multi-type branching process perspectives for the cluster size distribution (e.g. Good 1960). In fact, a proof much like that of Lemma 2 can create conditionally uniform analogs of the two styles of multivariate random time change in Embrechts et al. (2011). The remaining challenge arises in converting from conditionally uniform compensator points to the arrival epochs in each sub-stream, as the compensator system of equations may admit multiple solutions. Hence, we are quite interested in this future direction, particularly given the variety of recent interest in these models, such as in Aït-Sahalia et al. (2015), Nickel and Le (2020), Daw et al. (2021), and Karim et al. (2021).

Finally, to that end, let us emphasize that this paper's analysis has not been a purely theoretical exercise. Our original motivation was to study Hawkes process clusters in an operational context inspired by the data and application in Daw et al. (2021), but we found the methodological cupboard not yet full enough for the questions we sought to answer. Hence, we are quite interested in returning to these problems with the ability to address the cluster duration and chronology now in hand. Building from the model of Daw et al. (2021), the Hawkes cluster duration can be seen as the length of a co-produced service exchange. Understanding this distribution allows us to evaluate how the history-dependent service process will impact the service system overall, leading to new possibilities in the analysis of natural queueing theoretic questions like staffing and routing decisions. This is of foremost intrigue to us as a direction of future research, and we believe that the conditional uniformity property and its parking function decomposition provide valuable insight into such problems, which we look forward to pursuing.
References
Modeling financial contagion using mutually exciting jump processes.
The challenges of modeling and forecasting the spread of COVID-19.
An extension of Watanabe's theorem of characterization of Poisson processes over the positive real half line.
Point processes and queues: martingale dynamics.
Stability of nonlinear Hawkes processes. The Annals of Probability.
Rate of convergence to equilibrium of marked Hawkes processes.
The time-rescaling theorem and its application to neural spike train data analysis.
Statistical analysis of a telephone call center: A queueing-science perspective.
A simple proof of the multivariate random time change theorem for point processes.
A proof of the shuffle conjecture.
Perfect sampling of Hawkes processes and queues with Hawkes arrivals.
Perfect sampling of multivariate Hawkes processes. 2020 Winter Simulation Conference (WSC).
Fast MCMC sampling algorithms on polytopes.
Renewal in Hawkes processes with self-excitation and inhibition.
An introduction to the theory of point processes: Volume I: Elementary theory and methods.
Exact simulation of Hawkes process with exponentially decaying intensity. Electronic Communications in Probability 18.
The co-production of service: modeling service times in contact centers using Hawkes processes.
Probabilizing parking functions.
Time-changed birth processes and multiname credit derivatives.
Multivariate Hawkes processes: an application to financial data.
An introduction to probability theory and its applications.
Mappings of acyclic and parking functions.
Dependent events and changes of time.
Generalizations to several variables of Lagrange's expansion, with applications to stochastic processes.
Regenerative properties of the linear Hawkes process with unbounded memory.
Spectra of some self-exciting and mutually exciting point processes.
A cluster process representation of a self-exciting process.
Exact and asymptotic analysis of general multivariate Hawkes processes and induced population processes.
Parking functions: From combinatorics to probability.
Are call center and hospital arrivals well modeled by nonhomogeneous Poisson processes?
An occupancy discipline and applications.
Simulation of nonhomogeneous Poisson processes by thinning.
Démonstration simplifiée d'un théorème de Knight.
Perfect simulation of Hawkes processes.
Learning multivariate Hawkes processes at scale.
The Markovian self-exciting process.
On Lewis' simulation method for point processes.
Statistical models for earthquake occurrences and residual analysis for point processes.
Space-time point-process models for earthquake occurrences.
Maximum likelihood estimation of Hawkes' self-exciting point processes.
Some non-asymptotic tail estimates for Hawkes processes.
Ballots and trees.
Parking functions and noncrossing partitions.
A polytope related to empirical distributions, plane trees, parking functions, and the associahedron.
On discontinuous additive functionals and Lévy measures of a Markov process.
Parking functions. Handbook of enumerative combinatorics.

Acknowledgments
We are grateful for valuable comments and suggestions from David Eckman, Shane Henderson, Sam Gutekunst, and Galit Yom-Tov.