key: cord-0154520-zsr5qdn7 authors: Holbrook, Andrew J.; Ji, Xiang; Suchard, Marc A. title: From viral evolution to spatial contagion: a biologically modulated Hawkes model date: 2021-03-03 journal: nan DOI: nan sha: eed28bad4a33dcb0a0305bfadd73bbb7a92aa911 doc_id: 154520 cord_uid: zsr5qdn7 Mutations sometimes increase contagiousness for evolving pathogens. During an epidemic, scientists use viral genome data to infer a shared evolutionary history and connect this history to geographic spread. We propose a model that directly relates a pathogen's evolution to its spatial contagion dynamics -- effectively combining the two epidemiological paradigms of phylogenetic inference and self-exciting process modeling -- and apply this emph{phylogenetic Hawkes process} to a Bayesian analysis of 23,422 viral cases from the 2014-2016 Ebola outbreak in West Africa. The proposed model is able to detect individual viruses with significantly elevated rates of spatiotemporal propagation for a subset of 1,610 samples that provide genome data. Finally, to facilitate model application in big data settings, we develop massively parallel implementations for the gradient and Hessian of the log-likelihood and apply our high performance computing framework within an adaptively preconditioned Hamiltonian Monte Carlo routine. The COVID-19 pandemic has demonstrated the need for new scientific tools for the analysis and prediction of viral contagion across human landscapes. The mathematical characterization of the complex relationships underlying pathogen genetics and spatial contagion stands as a central challenge of 21st century epidemiology. We approach this task by unifying two distinct probabilistic approaches to viral modeling. On the one hand, Bayesian phylogenetics (Sinsheimer et al., 1996; Yang and Rannala, 1997; Mau et al., 1999; Suchard et al., 2001) uses genetic sequences from a limited collection of viral samples to integrate over high-probability shared evolutionary histories in the form of phylogenies or family trees. On the other hand, Traditional Bayesian phylogenetics Hawkes processes Observational limit N in low thousands N in high tens-of-thousands Biological insight Evolutionary history None Genetic sequencing Required Not required Spatiotemporal data Not required Required Geographic spread Not modeled Modeled Large-scale transport Does not induce bias Induces bias Table 1 : Comparison of two probabilistic modeling paradigms within viral epidemiology, the combination of which represents a new tool for Bayesian phylogeography. self-exciting, spatiotemporal Hawkes processes (Reinhart, 2018) model spatial contagion by allowing an observed event to increase the probability of additional observations nearby and in the immediate future. Both modeling paradigms come with their own advantages. For Bayesian phylogenetics, the past twenty years have witnessed a slew of high-impact scientific studies in viral epidemiology (Rambaut et al., 2008; Smith et al., 2009; Faria et al., 2014; Gire et al., 2014; Dudas et al., 2017; Boni et al., 2020) and the rise of powerful computing tools facilitating inference from expressive, hierarchical models of phylogenies and evolving traits (Ronquist et al., 2012; Suchard et al., 2018) . Unfortunately, the number of evolutionary trees to integrate over explodes with the number of viral samples analyzed (Felsenstein, 1978) , so Bayesian phylogenetic analyses typically restrict to a relatively small number of viral samples, at most totaling a few thousand. The fact that viral cases that undergo genetic sequencing usually represent a small subset of the total case count exacerbates this issue. Thus, failure to detect phylogenetic clades that represent novel strains on account of computational and surveillance limitations always remains a possibility. Until now, these weaknesses have also held for the sub-discipline of Bayesian phylogeography, which attempts to relate viral evolutionary histories to geographic spread as represented by (typically Brownian) phylogenetic diffusions. These models describe viral spread through either discretized (Lemey et al., 2009) or continuous (Lemey et al., 2010) space, but both approaches induce their own form of bias (Holbrook et al., 2020) . In the face of these shortcomings, Bayesian phylogeography needs new tools for directly modeling spatial contagion. Hawkes processes (Hawkes, 1971a (Hawkes, ,b, 1972 (Hawkes, , 1973 (Hawkes, , 2018 are widely applicable point process models for generally viral or contagious phenomena, such as earthquakes and aftershocks (Hawkes, 1973; Ogata, 1988; Zhuang et al., 2004; Fox et al., 2016) , financial stock trading (Bacry et al., 2015; Hawkes, 2018) , viral content on social media (Rizoiu et al., 2017; Kobayashi and Lambiotte, 2016) , gang violence (Mohler et al., 2013; Mohler, 2014; Loeffler and Flaxman, 2018; Park et al., 2019; Holbrook et al., 2021) and wildfires (Schoenberg, 2004) . Unsurprisingly, Hawkes processes are natural models for the contagion dynamics of biological viruses as well. Kim (2011) uses spatiotemporal Hawkes processes (Reinhart, 2018) , which model viral cases as unmarked events in space and time, to model the spread of avian influenza virus (H5N1). Meyer et al. (2014) incorporate power laws to describe spatial contagion dynamics and model meningococcal disease in Germany from 2001 to 2008. While Rizoiu et al. (2018) do not model epidemiological data, they do draw connections between epidemiological SIR models and Hawkes processes, showing that the rate of events in the SIR model is equal to that of a finite-population Hawkes model. Kelly et al. (2019) apply a temporal nonparametric Hawkes process to the 2018-2019 Ebola outbreak in the Democratic Republic of the Congo and successfully generate accurate disease prevalence forecasts. Chiang et al. (2020) model COVID-19 cases and deaths in the U.S. at the county level using spatially indexed mobility and population data to modify the process conditional intensity. Most recently, Bertozzi et al. (2020) compare the performance of a temporal Hawkes process model with temporally evolving conditional intensity to that of SIR and SEIR models for modeling regional COVID-19 case dynamics. Because such Hawkes processes do not involve genetic information, one may apply the model to a much larger collection of cases, i.e., those for which a timestamp and spatial coordinates are available. Moreover, recent successes in scaling Hawkes process inference to a big data setting enable inference from observations numbered in the high tens-of-thousands (Holbrook et al., 2021; Yuan et al., 2021) . This ability to interface with an order of magnitude more cases represents a major benefit of the Hawkes process in comparison to the Bayesian phylogenetic paradigm, but the trade-off is that conclusions drawn from a Hawkes process analysis are devoid of explicit biological insight. It is possible for the model to attribute selfexciting dynamics to nearby viral cases that are only distantly related on the phylogenetic tree. Finally, these processes do not immediately account for viral spread through large-scale transportation networks (Brockmann and Helbing, 2013; Holbrook et al., 2020) but attribute events resulting from such contagion to a 'background' process. In the following, we construct a Bayesian hierarchical model that allows both modeling approaches to support each other. This model ( Figure 1 ) learns phylogenetic trees that describe the evolutionary history of the subset of observations that yield genetic sequencing and uses this history to inform the distribution of a latent relative rate or productivity Schoenberg, 2020) for each virus in this limited set. In turn, these virus-specific rates modify the rate of self-excitation of a spatiotemporal Hawkes process describing the contagion of all viruses, sequenced or not. We use a Metropolis-within-Gibbs strategy to jointly infer all parameters and latent variables of our phylogenetic Hawkes process and overcome the O(N 2 ) computational complexity of the Hawkes process likelihood by incorporating the modified likelihood in the hpHawkes open-source, high-performance computing library (Holbrook et al., 2021) available at https://github.com/suchard-group/hawkes. Within the same library, we also develop multiple parallel computing algorithms for the loglikelihood gradient and Hessian with respect to the model's virus-specific rates. GPU-based implementations of these gradient and Hessian calculations score hundred-fold speedups over single-core computing and help overcome quadratic complexity in the context of an adaptively preconditioned Hamiltonian Monte Carlo (Neal, 2011) . These speedups prove useful in our analysis of 23,422 viral cases from the 2014-2016 Ebola outbreak in West Africa. We develop the phylogenetic Hawkes process and its efficient inference in the following sections. Importantly, our proposed hierarchical model integrates both sequenced and unsequenced viral case data, representing a significant and clear contribution insofar as: 1. the percentage of confirmed viral cases sequenced during an epidemic is often in the single digits (Wadman, 2021) ; 2. and previous phylogeographic models have failed to leverage additional information provided by geolocated, unsequenced case data. We address this shortcoming by constructing a new hierarchical model that both models all spatiotemporal data with a Hawkes process (Section 2.1) and allows an inferred evolutionary history in the form of a phylogenetic tree to influence dependencies between relative rates of contagion (Section 2.2) for the small subset of viral cases for which genome data are available. We believe that this approach is altogether novel. Hawkes processes (Hawkes, 1971a (Hawkes, ,b, 1972 (Hawkes, , 2018 constitute a useful class of inhomgeneous Poisson point processes (Daley and Jones, 2003) for which individual events contribute to an increased rate of future events. Spatiotemporal Hawkes processes (Reinhart, 2018) are marked Hawkes processes with spatial coordinates for marks (Daley and Jones, 2003) . We are interested in spatiotemporal Hawkes processes with infinitesimal rate where x ∈ R D , t ∈ R + and the subscript n indicates that the usual triggering function g(·, ·) takes on different forms depending on some characteristic associated with event n. These non-negative, monotonically non-increasing, event-indexed triggering functions additively contribute to ξ(·, ·), the self-excitatory rate component, and encourage this rate to increase after each observed event. Here, µ(·) is the background rate and only depends on spatial position x. Conditioned on observations (x n , t n ), n = 1, . . . , N , we specify the rate components where τ x > 0 and h > 0 are the background and self-excitatory spatial lengthscales, µ 0 > 0 and θ 0 > 0 are the background and self-excitatory weights, 1/ω > 0 is the self-excitatory temporal lengthscale, φ(·) is the D-dimensional standard normal probability density function, and the background rate's indicator function prevents a trivial maximum at τ x → 0 (Habbema et al., 1974; Robert, 1976) . The inclusion of θ n > 0 for n = 1, . . . , N within the self-excitatory rate marks a major departure from similar model specifications in Loeffler and Flaxman (2018) ; Holbrook et al. (2021) . These 'degrees of contagion' or 'productivities' Schoenberg, 2020) allow different events to contribute differently to the overall self-excitatory rate of the process: the larger the θ n , the higher the rate directly following event n (Figure 1, right) . Following the connection of the Hawkes process with exponential triggering function to a discrete time SIR model (Rizoiu et al., 2018) , Bertozzi et al. (2020) refer to these quantities as a reproduction number. In the following, we refer to θ n as the event-specific, case-specific or virus-specific rate for the nth event, case or viral observation. Denoting θ = (θ 1 , . . . , θ N ) T , the likelihood of observing data (X, t) = ((x 1 , t 1 ), ..., (x N , t N )) T is (Daley and Jones, 2003) The choice of R D for integration domain is popular and often necessary but assumes complete observation over the entirety of R D (Schoenberg, 2013) . The resulting integral may be written (Appendix A) We reference these formulas while outlining our inference strategy in Section 2.3 and detailing our massively parallel algorithms for calculating the log-likelihood gradient and Hessian with respect to event-specific rates θ in Appendix C. We describe our biologically modulated joint prior on event-specific rates θ 1 , . . . , θ N in the next section. We use standard Bayesian phylogenetics hierarchical approaches (Suchard et al., 2003) to model a single molecular sequence alignment S containing sequences from M ≤ N evolutionarily related viruses. Let M denote the ordered index set with cardinality |M| = M containing every number within the set {1, . . . , N } that corresponds to an observed virus for which genome data are present. In the following, we number the elements within M as m 1 , m 2 , . . . , m M . Moreover, we make use of the set M + with cardinality |M + | = 2M − 1, satisfying M ⊂ M + and containing elements m 1 , . . . , m 2M −1 . Our primary object of interest is the phylogenetic tree G (Figure 1 , left) defined as a bifurcating, directed graph with M terminal degree-1 nodes (ν m 1 , . . . , ν m M ) that correspond to the tips of the tree (or Figure 1 : The phylogenetic Hawkes process relates the evolutionary history of a virus to the rate at which subsequent viral cases occur nearby. Left: a phylogenetic tree characterizes the evolution of a viral strain. Middle: a Brownian motion 'along the tree' describes the evolution of infinitesimal rates as a function of branch lengths and tree topology (Section 2.2). Right: virus-specific rates additively contribute to the rate of occurrence for future cases (Section 2.1). , a root degree-2 node ν m 2M −1 and edge weights (w m 1 , . . . , w m 2M −2 ) that encode the elapsed evolutionary time between nodes. Here, each w m communicates the expected number of molecular substitutions per site, which is itself the product between the real time duration and the evolutionary rate arising from a molecular clock model. For example, we use a relaxed molecular clock model (Drummond et al., 2006b ) that allows for substitution rates to flexibly vary accross branches (Section 3.2). One may either know G a priori or endow it with a prior distribution parameterized by some vector φ. Suchard et al. (2001) and Suchard et al. (2018) develop the joint distribution p(S, φ, G) in detail. We assume that the event-specific rates θ defined within our Hawkes model take the form (Figure 1 , middle) and that the elements of vector z = (z m 1 , . . . , z m M ) T follow a Brownian diffusion process along the branches of G (Cavalli-Sforza and Edwards, 1967; Felsenstein, 1985; Lemey et al., 2010) . Here, f (·) is some fixed vector function and the inclusion of the linear term β T f (t n ) helps control for global trends resulting from extrinsic events such as mass quarantine or travel restrictions. Under the Brownian process, the latent value of a child node ν c in tree G is normally distributed about the value of its parent node ν pa(c) with variance w c × σ 2 , where σ 2 gives the dispersal rate after controlling for correlation in values that are shared by descent through the phylogenetic tree G. We further posit that the latent value of the root node ν m 2M −1 is a priori normally distributed with mean 0 and variance τ 0 × σ 2 . The vector z is then multivariate normally distributed Cybis et al. (2015) and has probability density function where V G = {v nm } is a symmetric, positive definite, block-diagonal M × M matrix with structure dictated by G. Defining d F (u, v) to be the sum of edge-weights along the shortest path between nodes u and v in tree G, the diagonal elements v mm = τ 0 + d F (ν m 2M −1 , ν mm ) are the elapsed evolutionary time between the root node and tip node m m , and off-diagonal are the evolutionary time period between the root node and the most recent common ancestor of tip nodes m n and m m . Due to the complexity of the phylogenetic Hawkes process and the large number of viruses we seek to model, we must use advanced statistical, algorithmic and computational tools to infer the posterior distribution We do so using a random-scan Metropolis-with-Gibbs scheme, in which we compute key quantities with the help of adaptively preconditioned Hamiltonian Monte Carlo (HMC) (Neal, 2011) , dynamic programming and parallel computing on cutting-edge graphics processing units (GPU). We must evaluate p(z | σ 2 , G) to sample G. The bottleneck within the evaluation of Equation (2) (Cavalli-Sforza and Edwards, 1967; Pearl, 1982) . Similar tricks render inference for φ linear in M , and Fisher et al. (2021) extend Pybus et al. (2012) to compute gradients with respect to φ. Finally, implementing these algorithms on GPUs would lead to additional speedups ), but the computational bottleneck we face when applying the phylogenetic Hawkes process arises from the Hawkes process likelihood and its gradients. Sampling the Hawkes process parameters µ 0 , τ x , θ 0 , ω, h, β and event-specific rates θ requires evaluation of the likelihood L(X, t|µ 0 , τ x , θ 0 , θ, ω, h) or its logarithm. Unfortunately, the double summation of Equation (1) results in an O(N 2 ) computational complexity that makes repeated likelihood evaluations all but impossible for the number of observations considered in this paper. We therefore use the high-performance computing framework of Holbrook et al. (2021) to massively parallelize likelihood evaluations in the context of univariate, adaptive Metropolis-Hastings proposals for parameters µ 0 , τ x , θ 0 , ω, h and β. On the other hand, inference for the M -vector z requires more than fast univariate proposals, so we opt for HMC to sample from its high-dimensional posterior. Even in high dimensions, HMC efficiently generates proposal states by simulating a physical Hamiltonian system that renders the target posterior distribution invariant. Here, we follow standard procedure and specify the system with total energy where π(z) is the density of the marginal posterior for z, p is a Gaussian distributed 'momentum' variable with density ξ(p|M), and M is the system mass matrix and the covariance of p. We again follow standard HMC procedure and use the leapfrog algorithm (Leimkuhler and Reich, 2004) to integrate Hamilton's equationṡ But there is no free lunch: simulation of the physical system requires repeated evaluations of the log-likelihood gradient, and these evaluations may become burdensome in big data contexts. Indeed, the gradient of the Hawkes process log-likelihood of Equation (1) with respect to θ becomes computationally onerous for large N and M . The gradient with respect to a single event-specific rate θ m takes the form where the summation and λ n are both of complexity O(N ). Thus, computing the entire vector ∂ /∂θ = (∂ /∂θ 1 , . . . , ∂ /∂θ M ) T requires time O(N M ). Worse still, due to the multiscale nature of the posterior for the relative rates (Figure 4) , we find it necessary to precondition the Hamiltonian dynamics by specifying a diagonal mass matrix with elements Specifically, we maintain a running average of Hessians calculated at a fixed interval and use this as our preconditioner M, thus maintaining asymptotic unbiasedness of Monte Carlo estimates (Haario et al., 2001) . Just as with the gradient, the summation and λ n are both of complexity O(N ), and the resulting complexity for the entire Hessian is O(N M ). To overcome these rate-limiting steps, we develop massively parallel central processing unit (CPU) and GPU implementations of both the gradient and the Hessian. Although our GPU-based implementations are fastest (Section 3.1), our CPU implementations are competitive, making use of both multi-core processing and SIMD (single instruction, multiple data) vectorization (Holbrook et al., 2020) . Regardless of implementation, all of our high-performance software remains freely available for public use. [Left] Multiplicative speedups over single-threaded advanced vector extensions (AVX) vectorization for single-threaded non-vectorized and streaming SIMD extensions (SSE), multithreaded AVX and many-core GPU processing for 25,000 randomly generated data points. [Right] Seconds per gradient and Hessian calculations for multi-threaded AVX and GPU implementations from 10 to 90 thousand data points. We use the Bayesian evolutionary analysis by sampling trees (BEAST) software package (Suchard et al., 2018) , a popular tool for viral phylogenetic inference that implements MCMC methods to explore p(S, φ, G) and p(z | σ 2 , G) (Cybis et al., 2015) under a range of evolutionary models. In writing this paper, we have contributed to the open-source, stand-alone library hpHawkes http://github.com/suchard-group/hawkes for computing the spatiotemporal Hawkes process log-likelihood (Equation 1), its gradient (Equation 4) and its Hessian (Equation 5). hpHawkes integrates into BEAST with the help of an application programming interface (API). Within hpHawkes, we combine C++ code with which standard compilers generate vectorized CPU-specific instructions and OpenCL kernels that allow for GPU-specific optimization. Finally, we have used the Rcpp package (Eddelbuettel and François, 2011) to make the same massive parallelization speedups available to users of the R programming language. 4). For the GPU results, we use an NVIDIA Quadro GV100, which has 5,120 CUDA cores (at 1.13 GHz) and reaches an (unboosted) 2.9 teraflops peak double-precision floating point performance (or 5.8 teraflops for fused operations such as fused multiply-add). We use a Linux machine with two 26-core Intel Xeon Gold processors (2.1 GHz) for CPU results. Each physical core supports 2 threads or logical cores, and the machine achieves a peak performance of 874 gigaflops with double-precision floating point enhanced with AVX vectorization (again, double this for fused operations). Based on peak double-precision floating point operations, our a priori expectation is for fully parallelized GPU-based gradient evaluations to to be roughly 3.3 times faster than 104-threaded AVX evaluations on the CPU. On the left of Figure 2 , we compare relative efficiency for GPU and various CPU implementations of the log-likelihood gradient and Hessian for 25,000 simulated data points using single-threaded AVX computing (15.7 and 16.3 seconds per gradient and Hessian evaluations) as baseline. Using SSE or non-vectorized single-threaded computing results in 1.5-and 2.2fold slowdowns for the gradient and 1.3-and 2.1-fold slowdowns for the Hessian. Sticking with AVX processing, we see diminishing returns as we increase the number of threads. For the both the gradient and the Hessian, the 14-54-and 104-thread AVX implementations are roughly 13, 33 and 41 times faster than single-threaded AVX. Agreeing with our a priori expectations, the GPU implementation is 140.4 times faster than single-threaded AVX and 3.5 times faster than 104-threaded AVX for the gradient and 120.0 times faster than single-threaded AVX and 2.9 times faster than 104-threaded AVX for the Hessian. The right of Figure 2 demonstrates the O(N 2 ) computational complexity for the same gradient and Hessian evaluations by varying the number of data points from 10,000 to 90,000. While parallelization does not overcome this quadratic scaling, it does reduce computational costs for finite observation counts. During the 2014-2016 outbreak in Guinea, Sierra Leone and Liberia, Ebola viral fever resulted in over 28,000 known cases and 11,000 known deaths (World Health Organization, 2015) . First reports of the virus in Guinea emerged during March of 2014 (Baize et al., 2014) . At around the same time, viral cases with the same Guinean origin (Gire et al., 2014) 1,367 (of 1,610 ) RNA-sequenced viral samples for which date/location data are available. Unsurprisingly, the largest relative rates occur within or nearby major clusters of events. Adjusting for downward trends in case data with a negative coefficient β (Table 2) allows detection of higher relative rates after peak outbreak (late 2014) including a jump in infections mid-2015 (Figure 3 ). the end of 2014 did case numbers begin to abate in most areas due to control measures. By March of 2015 sustained transmission of the virus only continued in western Guinea and western Sierra Leone (Dudas et al., 2017) . Figure 3 shows the spatiotemporal distribution of the majority of known Ebola virus cases during the epidemic. Using our high performance computing framework, we apply the phylogenetic Hawkes process to the analysis of 23,422 viral cases. Dudas et al. (2017) provide a total of 1,610 cases furnishing genomic sequencing, 1,367 of which come with date and location data (https: //github.com/ebov/space-time). We supplement this sequenced data with 21,811 date and location pairs from unsequenced cases documented by the World Health Organization (https://apps.who.int/gho/data/node.ebola-sitrep). The precision of the spatial data is district or county level. To leverage spatial information as much as practically possible within our Hawkes model, we assume the locations follow a Gaussian distribution at district population centroids and with variance guaranteeing a 95% probability of the case occurring within the circle of equal area to the district and centered at the population centroid. We then integrate over uncertainty with respect to these locations by periodically sampling new locations according to the assumed Gaussian distribution throughout the MCMC run and with a period of roughly 100 iterations. That said, sensitivity analyses show that model inference is robust to fixing randomly generated locations for the entire MCMC chain. We make the combined data and documentation for our entire BEAST analysis available within the single file Final.xml and place this as well as other project scripts together at the repository https://github.com/suchard-group/EBOVPhyloHawkes. In addition to the software mentioned in the previous section, we make use of the ggplot2 and ggmap R packages for data and results visualization (Wickham, 2016; Kahle and Wickham, 2013) . For the phylogenetic prior specification p(S, φ, G), we follow the phylogeographic analysis of Dudas et al. (2017) and use a mixture of 1,000 phylogenetic trees obtained as highprobability posterior samples from their purely phylogenetic analysis of the 1,610 sequenced Figure 6 : 95% credible intervals and posterior means for virus-specific rates θ corresponding to the subset of 1,610 sequenced viruses that come with date/location data and therefore appear in the Hawkes process module. We call those 183 intervals which do not include 1 'significant'. Of these, 177 are above and 6 below 1. Appendix B has labels, locations and times for each. viral samples. In that preceding Bayesian analysis, Dudas et al. (2017) combine an HKY+Γ 4 substitution model prior for molecular evolution (Hasegawa et al., 1985; Yang, 1994) , a relaxed molecular clock prior on rates (Drummond et al., 2006a) , a non-parametric coalescent 'Skygrid' prior on effective population size dynamics (Gill et al., 2013 ) and a continuous time Markov chain reference prior for overall rate (Ferreira and Suchard, 2008) . We assume a priori that the background lengthscale τ x follows a diffuse inverse gamma distribution with shape 1 and scale 10, where distance units are latitudinal and longitudinal degrees. An inverse gamma distribution with shape and scale parameters equal to 2 and 0.5 for both h and 1/ω encodes our beliefs that self-excitatory dynamics occur at finer spatiotemporal scales, where years are the temporal units. We upweight self-excitatory dynamics by giving θ 0 and µ 0 gamma priors with shape parameters 1 and 2 and scale parameters 0.001 and 2, respectively. We absorb τ 0 into σ and place a tight inverse gamma prior on 1/σ with shape and scale parameters of 2 and 0.5. Finally, we set f (t n ) = t n and place a normal prior on the univariate coefficient β with mean 0 and standard deviation of 10. We find all parameters robust to prior specification due to the large number of observations considered. We generate 100 million MCMC samples according to the routine outlined in Section 2.3 and discard the first 500 thousand as burn-in. Using our parallel computing algorithms and a single NVIDIA GV100 GPU (Section 3.1), the routine requires 6.77 hours to generate 1 million samples and 28 days to generate all 100 million samples. Figure 4 shows the distribution of effective samples sizes (ESS) across all model parameters and illustrates some of the challenges facing any MCMC routine for the phylognetic Hawkes model. Namely, the posterior distribution is high-dimensional, multimodal, multiscale and has complex correlation structures. Table 2 shows posterior means and 95% highest posterior density (HPD) credible intervals for the phylogenetic Hawkes process parameters. The posterior mean for the spatial bandwidth τ x of the Hawkes background process is 194 km (147, 243) , allowing the model to incorporate and adapt to large scale geographic movement. On the other hand, the Hawkes process self-excitatory spatial bandwidth h has a posterior mean of 7.4 km (7.1, Figure 7 : Posterior mean virus-specific relative rates color the posterior maximum clade credibility tree of phylogeny G, a few subtrees of which potentially demonstrate elevated contagiousness. 7.6), indicating the smaller local scale for which the model attributes viral contagion. The self-excitatory temporal bandwidth 1/ω has a posterior mean of 29.8 days (28.5, 30.9), indicating the timescale for which the model attributes the same viral contagion. The normalized self-excitatory weight θ 0 /(θ 0 + µ 0 ) indicates the proportion of events the model attributes to self-excitatory (compared to background) dynamics and has a posterior mean of 0.96 (0.95, 0.97). The posterior mean of the self-excitatory rate's temporal trend coefficient β is -2.22 (-2.37, -2.06) indicates that, for every additional year and ceteris paribus, one should expect a multiplicative decrease of 1 − exp(−2.22) × 100 ≈ 90% to the process self-excitatory rate. In this way, the model adjusts for downward trends arising from epidemiological control (e.g., mass quarantine and travel restrictions) and controls for these factors when inferring virus-specific relative rates. Next, we consider posterior inference of the virus-specific rates θ for those viral observations that provide RNA sequences. When interpreting these results, it is important to understand that the phylogenetic Hawkes process implicitly assumes that such samples spark nearby contagion as described by spatial and temporal bandwidths h and 1/ω. Recall that the posteriors for these two parameters concentrate at over 7 km and 4 weeks, respectively. Figure 5 depicts the relationship between posterior mean values of θ and the spatiotemporal distribution of corresponding viruses. Since these rates represent multiplicative factors of the global self-excitatory weight θ 0 , a null value would be 1. Posterior means range from approximately -2.5 to 7.5 and increasingly vary as a function of time. As one might expect, the highest rates appear near or within larger clusters. Thanks to the negative temporal trend coefficient β and the increase of uncertainty with time, larger rate values do obtain for some viral cases occurring in 2015, despite following after peak epidemic. Figure 6 features posterior means and 95% intervals for the same virus-specific rates. Only a small subset of 183 rate intervals do not include 1. Of these, 177 have lower bound greater than 1, and 6 have upper bound below 1. We interpret all 103 of the corresponding viruses as having statistically significantly increased or decreased contagiousness. Virus-specific labels, locations and times for all 183 appear in Appendix B. Finally, Figure 7 shows how these posterior rates organize as a function of the inferred posterior maximum clade credibility tree G. Generally speaking, shorter branch lengths indicate larger effective populations of viruses, while larger branch lengths indicate smaller. For example, the structure of the bottom subtree reflects this intuition as branches are short with many splits during peak outbreak in the second half of 2014 but become mostly long in late 2014 and the remainder of 2015. According to the phylogenetic Brownian process model outlined in Section 2.2, virus-specific rate values are more highly correlated to one another when closely located to one-another on the the phylogenetic tree. It is plausible that these correlations allow the phylogenetic Hawkes model to infer higher rates for some strains that survive late into the epidemic despite dropping case counts. The model attributes some of the highest values to strains appearing in Coyah, Conakry and Kindia, Guinea, in late 2014 and early 2015. Interestingly, the model also attributes its lowest values to cases in Kono, Sierra Leone, in early 2015. Taken together, Figure 7 and the significant virus lists of Appendix B may provide helpful leads for epidemiologists searching for variants with heightened relative rates of contagion. We propose the phylogenetic Hawkes process, a Bayesian hierarchical model that relates viral spatial contagion to molecular evolution by uniting the two epidemiological paradigms of selfexciting point process and phylogenetic modeling. Due to difficulties in scaling the model to larger numbers of observations, we advance a computing strategy that combines HMC, dynamic programming and massive parallelization for key inferential bottlenecks. Finally, we apply our novel model and high performance computing framework to the analysis of over 23,000 viral cases arising from the 2014-2016 Ebola outbreak in West Africa, and Ebola strains and subtrees with plausibly higher degrees of contagiousness reveal themselves. Unfortunately, the current model will fail when applied to the analysis of a global pandemic due to the dominant role of non-local, large scale transportation networks in propagating viral spread (Holbrook et al., 2020) . We are particularly interested in developing extensions to the phylogenetic Hawkes process that leverage recent advances in scaling highdimensional multivariate Hawkes processes (Nickel and Le, 2020) and applying the resulting multivariate phylogenetic Hawkes process to the analysis of global pandemics. In this context, each additional dimension will represent an additional country or province. Prodigious computational challenges are inevitable, and we suspect that non-trivial GPU implementations will be necessary for big data applications. Moving beyond inference, a major question is whether the phylogenetic Hawkes process can be useful for prediction of spatial contagion and dynamics. Here, recent neural network extensions of the Hawkes process might prove useful (Mei and Eisner, 2017; Zuo et al., 2020; Zhang et al., 2020) , but it is unclear what forward simulation of phylogenetic branching dynamics would look like in the context of a Hawkes process. Moreover, generating samples from the posterior predictive distribution of a Hawkes process would be extremely time consuming when one is conditioning on millions of posterior samples. To work around this, one could perhaps parallelize over fixed parameter settings the algorithm of Dassios and Zhao (2011) for simulating Hawkes processes when the temporal triggering function is exponential. Such an implementation would require efficient use of parallel pseudo-random number generators (Salmon et al., 2011) . Without loss of generality, we consider the temporal Hawkes process with constant background rate. To compute the likelihood (Equation 1), we must calculate the integral Λ(t N ) = θ n e −ω (t n+1 −t n ) − e −ω (tn−t n ) , and we further simplify the double summation in the following. θ n e −ω(t N −tn) − 1 . Proof. Proceeding by induction, the assertion is trivial for N = 1. If it is true for some N > 0, this implies that θ n e −ω (t n+1 −t n ) − e −ω (tn−t n ) = N −1 n=1 θ n e −ω(t N −tn) − 1 . It follows that The following observed viruses have relative rates for which the 95% credible intervals have lower bound greater than 1. 1 EBOV|EM_COY_2015_013857||GIN|Forecariah|2015-03-18 2 EBOV|EM_COY_2015_013731||GIN|Coyah|2015-03-14 3 EBOV|EM_COY_2015_014261||GIN|Forecariah|2015-04-02 4 EBOV|IPDPFHGINSP_GUI_2015_5339||GIN|Conakry|2015-04-08 5 EBOV|KG12||GIN|Boke|2015-05-27 6 EBOV|EM_COY_2015_017091||GIN|Boke|2015-05-28 7 EBOV|EM_COY_2015_016483||GIN|Boke|2015-05-13 8 EBOV|EM_COY_2015_016743||GIN|Boke|2015-05-19 9 EBOV|KG88||GIN|Boke|2015-06-19 10 EBOV|KG90||GIN|Boke|2015-06-19 11 EBOV|KG87||GIN|Boke|2015-06-19 12 EBOV|KG80||GIN|Boke|2015-06-18 13 EBOV|KG35||GIN|Boke|2015-06-08 14 EBOV|GUI_CTS_2015_0052||GIN|Boke|2015-06-25 4 EBOV|EM_079578|KR817201|GIN|Gueckedou|2014-04-18 5 EBOV|EM_079587|KR817202|GIN|Gueckedou|2014-04-22 6 EBOV|EM_079514|KR817197|GIN|Gueckedou|2014-04-10 C Parallelized gradient algorithms Algorithms 1 and 2 present instructions for computing the Hawkes process log-likelihood gradient (Equation 4 ) with respect to the M -vector θ of event-specific rates. Figure 2 shows resulting speedups for both Algorithm (1) and Algorithm (2) on a CPU and GPU, respectively. Hawkes processes in finance Emergence of zaire ebola virus disease in guinea The challenges of modeling and forecasting the spread of covid-19 Evolutionary origins of the sars-cov-2 sarbecovirus lineage responsible for the covid-19 pandemic The hidden geometry of complex, network-driven contagion phenomena Phylogenetic analysis. models and estimation procedures Hawkes process modeling of covid-19 with mobility leading indicators and spatial covariates. medRxiv Assessing phenotypic correlation through the multivariate phylogenetic latent liability model An Introduction to the Theory of Point Processes: Elementary Theory of Point Processes A dynamic contagion process Relaxed phylogenetics and dating with confidence Relaxed phylogenetics and dating with confidence Virus genomes reveal factors that spread and sustained the ebola epidemic Rcpp: Seamless R and C++ integration The early spread and epidemic ignition of hiv-1 in human populations The number of evolutionary trees Phylogenies and the comparative method Bayesian analysis of elapsed times in continuoustime markov chains Relaxed random walks at scale Spatially inhomogeneous background rate estimators and uncertainty quantification for nonparametric hawkes point process models of earthquake occurrences Fast likelihood calculations for comparative analyses Improving bayesian population dynamics inference: a coalescent-based model for multiple loci Genomic surveillance elucidates ebola virus origin and transmission during the 2014 outbreak An adaptive metropolis algorithm A stepwise discriminant analysis program using density estimation Dating of the human-ape splitting by a molecular clock of mitochondrial dna Spectra of some mutually exciting point processes with associated variables Cluster models for earthquakes-regional comparisons Point spectra of some mutually exciting point processes Spectra of some self-exciting and mutually exciting point processes Hawkes processes and their applications to finance: a review A linear-time algorithm for Gaussian and non-Gaussian trait evolution models Massive parallelization boosts big bayesian multidimensional scaling Scalable bayesian inference for self-excitatory stochastic processes applied to big american gunfire data ggmap: Spatial visualization with ggplot2 Real-time predictions of the 2018-2019 ebola virus disease outbreak in the democratic republic of the congo using hawkes point process models Spatio-temporal point process models for the spread of avian influenza virus (H5N1) Tideh: Time-dependent hawkes process for predicting retweet dynamics Bayesian phylogeography finds its roots Phylogeography takes a relaxed random walk in continuous space and time Is gun violence contagious? a spatiotemporal test Bayesian phylogenetic inference via markov chain monte carlo methods The neural Hawkes process: A neurally self-modulating multivariate point process. Pages 6754-6764 in Power-law models for infectious disease spread Marked point process hotspot maps for homicide and gun crime prediction in chicago Modeling and estimation of multi-source clustering in crime and security data MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo 2 Learning multivariate hawkes processes at scale Statistical models for earthquake occurrences and residual analysis for point processes Investigating clustering and violence interruption in gang-related violent crime data using spatial-temporal point processes with covariates Reverend Bayes on inference engines: A distributed hierarchical approach Unifying the spatial epidemiology and molecular evolution of emerging epidemics The genomic and epidemiological dynamics of human influenza a virus A review of self-exciting spatio-temporal point processes and their applications A tutorial on hawkes processes for events in social media Sir-Hawkes: Linking epidemic models and Hawkes processes to model diffusions in finite populations. Pages 419-428 in On the choice of smoothing parameters for parzen estimators of probability density functions Mrbayes 3.2: efficient bayesian phylogenetic inference and model choice across a large model space Parallel random numbers: as easy as 1, 2, 3. Pages 1-12 in Testing separability in spatial-temporal marked point processes Facilitated estimation of etas Nonparametric estimation of variable productivity hawkes processes A recursive point process model for infectious diseases Bayesian hypothesis testing of four-taxon topologies using molecular sequence data Origins and evolutionary genomics of the 2009 swine-origin h1n1 influenza a epidemic Hierarchical phylogenetic models for analyzing multipartite sequence data Many-core algorithms for statistical phylogenetics Bayesian selection of continuous-time Markov chain evolutionary models Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10 United states rushes to fill void in viral sequencing ggplot2: Elegant Graphics for Data Analysis Who: Ebola situation report Maximum likelihood phylogenetic estimation from dna sequences with variable rates over sites: approximate methods Bayesian phylogenetic inference using dna sequences: a markov chain monte carlo method Fast estimation of multivariate spatiotemporal hawkes processes and network reconstruction Self-attentive hawkes process Analyzing earthquake clustering features by using stochastic reconstruction Transformer hawkes process The research leading to these results has received funding through National Institutes of Health grants K25 AI153816, U19 AI135995 and R01 AI153044 and National Science Foundation grant DMS1264153. We gratefully acknowledge support from NVIDIA Corporation and Advanced Micro Devices, Inc. with the donation of parallel computing resources used for this research. The following observed viruses have relative rates for which the 95% credible intervals have upper bound less than 1.1EBOV|EM_079413|KR817184|GIN|Gueckedou|2014-03-31 2EBOV|EM_079414|KR817185|GIN|Gueckedou|2014-03-31 3EBOV|EM_079542|KR817199|GIN|Gueckedou|2014-04-12Algorithm 1 Parallel evaluation of Hawkes process log-likelihood gradient: Makes use of multiple central processing unit (CPU) cores and loop vectorization to calculate Hawkes process log-likelihood gradient with respect to event-specific rates θ. When using double-precision floating point, this algorithm may use either SSE or AVX vectorization to make J = 2-or 4-long jumps. We denote the number of CPU cores as B. Symbols , λ and Λ appear in Equations (1) and (4).1: Compute rates λ 1 , . . . , λ N : a:copy xn, tn to cache i:m: copy x n :(n +J) , t n :(n +J) to cache n:calculate δ nn : δ n(n +J−1) vectorized multiplication p:calculate λ nn : λ n(n +J−1) vectorized evaluation q:λn ← λn + λ nn : λ n(n +J−1) vectorized addition r:n ← n + J s: Algorithm 2 Parallel evaluation of Hawkes process log-likelihood gradient:Computes the log-likelihood gradient with respect to event-specific rates θ using multiple levels of parallelization on a graphics processing unit (GPU). In this paper, we specify B = 128 for the size of the GPU work groups. Symbols , λ and Λ appear in Equations (1) and (4).