key: cord-0925617-1r6ecytl
authors: Park, Jaewoo; Chang, Won; Choi, Boseung
title: An interaction Neyman–Scott point process model for coronavirus disease-19
date: 2021-12-07
journal: Spat Stat
DOI: 10.1016/j.spasta.2021.100561
sha: d76213fa772725320f20808d99e5ff827303f81d
doc_id: 925617
cord_uid: 1r6ecytl

With rapid transmission, the coronavirus disease 2019 (COVID-19) has led to over three million deaths worldwide, posing significant societal challenges. Understanding the spatial patterns of patient visits and detecting local cluster centers are crucial to controlling disease outbreaks. We analyze COVID-19 contact tracing data collected from Seoul, which provide a unique opportunity to understand the mechanism of patient visit occurrence. Analyzing contact tracing data is challenging because patient visits show strong clustering patterns, while cluster centers may have complex interaction behavior. Cluster centers attract each other at mid-range distances because other cluster centers are likely to appear in nearby regions. At the same time, they repel each other at too small distances to avoid merging. To account for such behaviors, we develop a novel interaction Neyman–Scott process that regards the observed patient visit events as offsprings generated from a parent cluster center. Inference for such models is challenging since the likelihood involves intractable normalizing functions. To address this issue, we embed an auxiliary variable algorithm into our Markov chain Monte Carlo. We fit our model to several simulated and real data examples under different outbreak scenarios and show that our method can describe the spatial patterns of patient visits well. We also provide useful visualizations that can inform public health interventions for infectious diseases, such as social distancing.

Caused by the transmission of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the coronavirus disease 2019 was first reported in December 2019 in Wuhan, Hubei province, China (WHO, 2020) . By May 2021, there have been about 150 million confirmed cases of COVID-19, with more than three million deaths. The disease spreads more quickly than influenza, 5 mainly through close contact with infected people (CDC, 2020) . Human-to-human transmission is most common for COVID-19, primarily via respiratory droplets and aerosols from an infected person. Thus, contact tracing, which records the travel history of patients, can be an effective way to gain insights into disease prevalence. Spatial point process models for contact tracing data can be useful to study the probabilistic mechanism of the spread of diseases. Here, we propose a new 10 point process model that can provide useful epidemiological information such as a warning system for local hotspots.

We analyze the locations visited by confirmed patients in Seoul, South Korea. We refer to a "confirmed patient" as a person who tested positive through the polymerase chain reaction (PCR) test, which is considered to be the standard procedure to identify COVID-19 patients because of its 15 high accuracy. The visited locations are first identified through patient interviews and then verified through other measures. The data set contains the thorough contact histories of confirmed patients, collected through a rigorous contact tracing protocol (Park et al., 2020) , and hence provides a unique source for point process modeling. We describe the details of the contact tracing protocol in Section 2. 20 We regard each visit in the contact tracing data as an event in the point process model. Since such events are spatially clustered, a natural way to model this is the Neyman-Scott point process (Neyman & Scott, 1952) , which considers observed events as offsprings generated around unobserved parents (cluster centers). In the epidemiological context, we can regard a parent point as a cluster center of disease and offsprings as the corresponding clusters (i.e., patient visits). 25 Several computational methods have been developed for inference for point processes, including a minimum contrast method (Diggle, 2013) and pseudolikelihood approximation (Guan, 2006; Diggle et al., 2010) . However, such approaches are sensitive to the choice of tuning parameters and may not be accurate in the presence of strong spatial dependence among points. Through simulation studies, Mrkvička et al. (2014) report that Bayesian estimation is the most precise for the Neyman-Scott extensions for Neyman-Scott processes have been developed. Waagepetersen (2007) ; Mrkvička et al. (2014) ; Mrkvička & Soubeyrand (2017) study inference for inhomogeneous Neyman-Scott point processes. Yau & Loh (2012) considers a special case where the parent process exhibits repulsion. Albert-Green et al. (2019) generalizes the Neyman-Scott process by allowing parents to follow the log-Gaussian Cox process (Møller et al., 1998) . However, these existing models may not be 40 appropriate for COVID-19 data because of the complex interactions of cluster centers. Unobserved local cluster centers attract each other because other cluster centers are likely to appear in nearby communities.

However, an interaction point process only with pairwise attraction is not well defined in general (Moller & Waagepetersen, 2003) due to clumping behavior; the integral of the intensity of the 45 process becomes infinite because adding a new point always increases intensity in the nearby area, which in turn encourages the generation of another point. This creates a loop of positive feedback, leading to infinite intensity in the target area. Therefore, more sophisticated models are often used to consider both attractive and repulsive interactions (Geyer, 1999; Goldstein et al., 2015) . By adapting these models, we allow local cluster centers to attract each other at mid-range distances, 50 while make them repel each other at too small distances for a valid model specification.

In this manuscript, we propose a novel parametric approach that exhibits interaction between local clusters. This new approach can infer model parameters that provide guidelines for controlling the spread of disease. Our model allows us to detect unobserved cluster centers, thereby providing a practical warning system for coronavirus hotspots. Recently, Goldstein et al. (2015) proposed 55 an attraction-repulsion point process model by extending the Strauss process (Strauss, 1975 ) that only allows repulsion behavior. Russell et al. (2016) uses such a process to model different types of interactions among animals, including herding behavior and collision avoidance. Adopting this idea, we incorporate the attraction-repulsion (Goldstein et al., 2015) process into the parent process. Our model includes an intractable normalizing function, posing computational and inferential challenges.

MCMC algorithm for this model and provide implementation details. In Section 4, we apply our methods to both simulated and real COVID-19 data sets. We show that our methods can detect the sources of cluster centers and describe the spatial patterns of patient visits. Furthermore, we provide a disease risk map that can give important epidemiological interpretations. In Section 5, we conclude the paper with a discussion and summary. 

In this section, we provide the background of our contact tracing data and summarize the results from the exploratory data analysis.

We study the COVID-19 contacting tracing data in Seoul, which disclose the detailed locations 75 visited by confirmed patients. When a patient is confirmed to be infected by a screening clinic through a PCR test, he or she is immediately transported to a hospital dedicated to treating infectious disease. Once patients are admitted, they are interviewed by an epidemiological investigator, and their visited location history is thoroughly recorded. For asymptomatic cases, the travel history since a possible exposure to an infection source is recorded. For symptomatic cases, the travel 80 history from two days before the symptoms' onset is recorded. The collected tracing information is anonymized and disclosed for two weeks on local government websites and then archived. This code follows the "Response Guidelines to Prevent the Spread of COVID-19 (for local government)" (KDCA, 2020) . The contact tracing data sets used in this study were collected directly from these websites. Table 1 shows the locations visited by multiple confirmed patients. For example, the first row in Table 1 indicates that a patient was confirmed as infected by a screening clinic on March 7. However, the confirmation dates for the other patients are different because Table 1 provides records from multiple patients. Note that we do not use any personal identification information, including the patient ID in the record for each visited location, to avoid possible privacy violations. spatial patterns of patient visits, we regard each visit (coordinate) as a realization of a spatial point process.

In this paper, at any given time point, we fit a point process model based on the contact tracing data accumulated over the last two weeks. According to the basic data analysis for the early stages 95 of the COVID-19 spread in South Korea and China, two weeks is the mean period of recovery from infection (Ki, 2020; Choi & Ki, 2020) . Two weeks is also the period for epidemic investigations conducted by the local health authority for all confirmed cases. Therefore, when we construct a warning system for COVID-19 at a certain time, modeling patient visits in the last two weeks would be the most useful. In particular, we focus on three different scenarios that end on March 19th,

April 2nd, and April 15th, which can be considered as severe, moderate, and mild outbreak cases, respectively, in terms of overall visit numbers. Our goal here is to examine how our model works and provide important epidemiological insights for these different scenarios with various disease spreading patterns.

Spatial point processes provide a natural solution to model the spatial patterns of the locations visited by confirmed patients. Here, we provide the motivation for a new interaction point process model with an exploratory data analysis. Let X = {x 1 , · · · , x n } be the realization of the point process over the bounded spatial domain S ∈ R 2 . The pair correlation function (PCF) is useful for 5 J o u r n a l P r e -p r o o f Journal Pre-proof exploring point process observations (Stoyan & Stoyan, 1994) . The PCF is defined as

where |S| is the area of the spatial domain. Here, Ripley's K(r) is the expected density of the points within distance r. Under complete spatial randomness, K(r) = πr 2 , which results in g(r) = 1.

g(r) > 1 indicates that points have a tendency to cluster at a distance r (attraction), while g(r) < 1

indicates that points tend to remain apart at a distance r (repulsion). Figure 2 illustrates the spatial pattern in the severe case period (ending on March 19th) and its 110 PCF. We observe that patient visits are spatially clustered (g(r) > 1). Therefore, a point process model for the data should capture such behavior. The Neyman-Scott process (Neyman & Scott, 1952 ) is widely used to study spatially aggregated point patterns. Furthermore, the Neyman-Scott process can detect cluster centers, which can be useful for identifying local hotpots. Consider the Neyman-Scott process X = ∪ c∈C X c , where X c is the offspring and C is the parent (cluster centers).

Given parent process C, offspring X c , c ∈ C follows an independent Poisson process with intensity αk(u − c, ω). Here, ω controls the spread of offsprings around their parent and α determines the expected number of offsprings per each cluster. With the Gaussian kernel k(u − c, ω) with mean c and variance ω 2 , X is called the Thomas process (Thomas, 1949) . The basic Neyman-Scott process models the unobserved parent process C as a simple Poisson process.

The Neyman-Scott process is appropriate for modeling COVID-19 data because the locations visited by confirmed patients, X c , are clustered around the unobserved center C. From this, we can detect the local cluster centers of COVID-19. However, C may also have complex spatial interactions. Naturally, cluster centers tend to clump together because other cluster centers are likely to exist in a nearby region. However, a purely attractive pairwise interaction process is not 125 well defined (Moller & Waagepetersen, 2003) in general because of the degeneracy issue; when a new point is added to an existing set of points, the probability of having another point in the nearby area will increase. This creates a loop of positive feedback and the process will create an infinite number of points; as a result, a process only with pairwise attraction will have an infinite integrated intensity over the study area. To prevent this phenomenon, incorporating repelling behavior at too 130 small r in the model is necessary. The basic Neyman-Scott process cannot describe such behavior because the basic model considers an independent Poisson process for C. In the following section, 6 we add another layer of complexity to the basic Neyman-Scott process. This new model can provide epidemiological interpretations for understanding local clusters of COVID-19.

In this section, we propose a new Neyman-Scott process by incorporating interaction behavior into C. This new parametric approach can detect unobserved cluster centers of COVID-19 and describe their patterns. Based on this, we can provide a warning system for higher-risk regions.

Consider the realization of Neyman-Scott point process X = {x 1 , · · · , x n } in the bounded plane S ⊂ R 2 (Seoul domain). This indicates the observed locations visited by confirmed patients. Let C = {c 1 , · · · , c m } be their unobserved parent process. In our context, each c i is the center of the kernel that captures the local intensity of point processes. The kernel centers tend to remain apart (repulsion) at small distances to avoid the degeneracy issue. At mid-range distances, they tend to clump together (attraction) because other cluster centers are likely to appear in nearby areas.

Cluster centers become independent at sufficiently large distances. To describe such patterns, we model the parent in the Neyman-Scott process as an attraction-repulsion interaction point process (Goldstein et al., 2015) . The locations of the unobserved center of the parent process is modeled

where D ij denotes the Euclidean distance between two parent points c i and c j . Z(κ, θ 1 , θ 2 ) is a normalizing function that makes f (C|κ, θ 1 , θ 2 ) as a valid probability density function. The interaction function φ(·) can be defined as

Here, D 1 , D 2 are set to make the interaction function φ(D) continuously differentiable (Goldstein Let f (X|C, α, ω) be the density of the patient visits with respect to the unit Poisson point process in |S|. Given the local cluster centers (parents), the locations visited by confirmed patients can be modeled as

We use a symmetric Gaussian kernel with a center at each c i and variance ω 2 . This results in a higher intensity of patient visits around the cluster centers. In our context, α controls the expected number of confirmed patients for each c i and ω controls the width of cluster events activity.

The proposed model in the previous section has a hierarchical structure. Unobserved local cluster centers (parent process) follow the spatial interaction process defined in (1) and (2). Observed patient visits (offsprings) are distributed around the unobserved parents with Gaussian kernels.

The Bayesian framework is useful for such hierarchical models, as it can easily quantify the model parameters' uncertainty using MCMC. With priors p(Θ), the joint posterior distribution is

Although (4) (1) can be written as

Here, the calculation of normalizing function Z(κ, θ 1 , θ 2 ) requires integration over spatial domain S, 155 which is infeasible. Inference for such models is challenging because an intractable Z(κ, θ 1 , θ 2 ) is a function of the model parameters. The resulting posterior (4) is referred to as a doubly-intractable distribution (Murray et al., 2006) .

Several computational methods have been proposed to address this issue. By assuming the conditional independence of points, Besag (1974) proposed pseudolikelihood that does not have 160 intractable normalizing functions. Due to its convenience, pseudolikelihood approximations are often used in many point process applications (e.g., Diggle et al., 2010; Tamayo-Uria et al., 2014) .

However, it is well known that such approximations are unreliable when there is strong dependence among points (Mrkvička et al., 2014) . Geyer & Thompson (1992) (DMH) is the most practical method for point process models. DMH can avoid the calculation of Z(κ, θ 1 , θ 2 ) and alleviate memory issues, which can be problematic for adaptive algorithms (Atchade et al., 2008; Liang et al., 2016) . In the following section, we formulate a DMH that can carry out Bayesian inference for an interaction Neyman-Scott point process model.

Here, we describe an MCMC algorithm for the attraction-repulsion Neyman-Scott point process model. Consider the model parameters

2 } and latent parent process C (t) in (4) at the t-th iteration. We update the parameters successively. The offspring parameters J o u r n a l P r e -p r o o f Journal Pre-proof

with a joint prior density p(α (t) , ω (t) ). We can update parent parameters from

2 ) (see Section 4.1 for how we choose the priors in our problem). Compared with updating the offspring parameters, updating the parent parameters is challenging because of the intractable normalizing function in (5). Intractable Z(κ, θ 1 , θ 2 ) leads to the intractable acceptance probability in MCMC as

for the proposed parameters κ , θ 1 , and θ 2 . To avoid the direct evaluation of the intractable normalizing functions in α, we incorporate double Metropolis-Hastings (DMH) (Liang, 2010) into our MCMC algorithm. Instead of evaluating the intractable normalizing functions in α, the DMH sampler generates an auxiliary variable A from (1) via a birth-death MCMC (Moller & Waagepetersen, 2003) . We provide the details of the birth-death MCMC in the supplementary material. For given a proposed parent parameters (κ , θ 1 , θ 2 ), the birth-death MCMC iteration is repeated m times to generate an auxiliary variable A. Theoretically, we can simulate A exactly as m → ∞; of course, m should be finite in practice. We observe that the acceptance rate of each birth-death iteration is 0.87 on average, which is natural because we propose to change only a single point (birth, death, or move). In DMH (Liang, 2010) , the last state of the resulting birth-death MCMC (i.e., mth realization) is regarded as an exact draw from f (C|κ , θ 1 , θ 2 ). Following the recommendation in

Liang (2010) 

, 1 . (7) This does not include intractable terms because the introduction of A modifies the original densities in (6) by multiplying h(A|κ (t) , θ

2 ) into the numerator and h(A|κ , θ 1 , θ 2 )/Z(κ , θ 1 , θ 2 ) into the denominator. If the simulated auxiliary variable A resembles the parent process C, the proposed parent parameters (κ , θ 1 , θ 2 ) are likely to be accepted. Therefore, the realistic simulation of A is important for accepting the parent parameters. However, the auxiliary variable does not 180 affect the acceptance of the other components (offspring parameters, C) because the conditional posterior distributions of the other components are not dependent on intractable Z(κ, θ 1 , θ 2 ).

Finally, we obtain C (t+1) from

using a birth-death MCMC. Note that stationary distribution of the birth-death MCMC algorithm for updating C (t+1) is different from that of generating the auxiliary variable A in DMH. The stationary distribution for generating the auxiliary variable follows (1) (see the supplementary material for details). Our MCMC algorithm is summarized in Algorithm 1.

In this section, we apply our approach to simulated and real data examples. The computer code for this is implemented in R and C++ using the Rcpp and RcppArmadillo packages (Eddelbuettel et al., 2011). 190 

To complete the posterior specification we now explicitly define the prior distributions. In the Neyman-Scott process, there is a strong dependence between α and ω. The same observed pattern can be regarded as the points generated from a small number of parents, with each having a large number of offsprings (large α, large ω), or they may come from a large number of parents, with 195 each having a small number of offsprings (small α, small ω). To avoid such identifiability issues, we 11 J o u r n a l P r e -p r o o f Journal Pre-proof Algorithm 1 MCMC for an attraction-repulsion Neyman-Scott process

2 } and C (t) at t-th iteration.

Part 1: Update the offspring parameters: α (t) , ω (t) .

Propose α , ω ∼ q(·|α (t) , ω (t) ) and accept it with probability

Part 2: Update the parent parameters:

2 ).

Generate an auxiliary variable A from the probability model using birth-death MCMC:

Accept it with probability

2 )p(κ , θ 1 , θ 2 )q(κ , θ 1 , θ 2 |κ (t) , θ

Part 3: Update the unobserved parent process: C (t) .

) using birth-death MCMC J o u r n a l P r e -p r o o f Journal Pre-proof need to set the upper and lower bounds for them using uniform priors (Møller & Waagepetersen, 2007; Kopeckỳ & Mrkvička, 2016) .

Here, we focus on modeling cluster centers at the Dong scale (the smallest administrative district). To this end, we set the lower and upper bounds for ω based on the average size of a 200 Dong. This encourages that patient visits from different Dongs come from different cluster centers. Therefore, we set the range of ω to [ |S|/70, |S|/25]. We assume that each cluster center (parent) generates at least 3 and at most 30 patient visits (offsprings) in a Dong on average. This choice of priors can avoid identifiability issues in the Neyman-Scott point process while maintaining interpretability and modeling flexibility.

The prior range of κ needs to be determined in consideration of the overall intensity and offspring intensity α. The overall intensity is around 6 × 10 −7 , computed by dividing the average number of patient visits within two weeks by the area of Seoul |S|. The average number of patient visits is computed over about 16 weeks (02/20/2020-06/11/2020). Since κ controls for the intensity of the parent process and α controls for the average number of offsprings per each parent, κ × α should 210 be around the overall intensity 6 × 10 −7 . By specifying the prior of κ to be uniform with the range [1 × 10 −10 , 1 × 10 −6 ], κ × α belongs to the range [3 × 10 −10 , 3 × 10 −5 ] so that the resulting range can include the overall intensity 6 × 10 −7 .

Similar to ω, we set a uniform prior for θ 2 whose upper and lower bounds are proportional to the average size of a Dong. We use a uniform prior with the range [ |S|/70, |S|/25] so that θ 2 215 gives the location of the peak value within a Dong. To allow for attraction, we set the prior of θ 1 to be uniform with the range [1, 3], as in Goldstein et al. (2015) .

We now conduct a simulation study to validate our method. We first simulate the parent process C from (1) with the true parent parameters κ, θ 1 and θ 2 . We then simulate the offsprings X from 220 (3) centered on each parent point with the true offspring parameters α and ω. We consider three different scenarios here: In Scenario 1, we emulate COVID-19 data with a severe outbreak. This has numerous local cluster centers (m = 129) and each of them also generates a large number of offsprings. This results in n = 798 patient visits in total. In Scenario 2, we consider a moderate outbreak. This has fewer cluster centers (m = 93) and each generates a moderate number of consider a mild outbreak in which we have the smallest number of cluster centers (m = 56) and patient visits (n = 239). Table 2 indicates that the true parameter values used in all the scenarios are estimated with reasonable accuracy. We also compare the true and recovered parent points in Figure 4 . We 230 obtain the recovered parent points from the last iteration of the MCMC algorithm (i.e., C (t) in Algorithm 1). We observe that our method can detect cluster centers well as there are good agreements between the true and fitted parent points in all the scenarios.

We repeat the simulation 100 times for each scenario to investigate whether the credible intervals can appropriately cover the true parameters. Table 3 indicates that coverage of the parameters in 235 the different scenarios is reasonably close to the 95% nominal rate. Note that obtaining an exact frequentist coverage of Bayesian credible intervals can be problem-specific. It might largely depend on the choice of priors or can only be achieved with an increasing sample size. A full-fledged study of these properties for an interaction Neyman-Scott process would be an interesting research direction. 

Parameter Interpretation. We now apply our method to the real observational data described in Section 2.1. Algorithm 1 takes about 2.5 hours for 100,000 MCMC iterations. The parameter estimation results for the time periods that end at March 19th, April 2nd, and April 15th are summarized in Table 4 . These three different periods can be regarded as severe, moderate, and 245 mild outbreaks, respectively, in terms of the number of patient visits. The average number of offspring points per each parent (α) and intensity of the parent process (κ) change as the overall number of patient visits changes; that is, more overall events lead to higher α and κ values. The width of the Gaussian kernel controlled by ω, on the other hand, stays similar. For April 15th, when the total number of patient visits is relatively small (n = 190), α takes a lower value as well.

The estimated values of the remaining two parameters, θ 1 and θ 2 describe the interaction between the parent points and show how clusters of the events are distributed in Seoul. The distance for the maximum interaction, determined by θ 2 , increases as the number of parents reduces, reflecting a sparser distribution of parent events. Nevertheless, from the θ 1 estimates (which are significantly greater than 1), the parent points in all the periods show clear attraction at the lo-255 cation given by θ 2 (around 550m). This implies that different event clusters tend to appear near each other rather than independently. Figure 5 indicates that parent points are generated nearby the observed locations visited by confirmed patients (i.e., offsprings in the Neyman-Scott process).

Note that parents are unobserved points; therefore, we do not have information about the types of places for parents' coordinates. Instead, we can calculate the closest data point from a parent 260 point and identify the place type in that observed point. We find that several parent points occur nearby points of interest (within 800m), such as supermarkets, Internet cafes, churches, and subway stations. This highlights the fact that our approach might be suitable for detecting local sources of Visualization of COVID-19 Risk. From the estimated values of α, κ, and ω, we can create an intensity map of patient visit events. Since local cluster centers cause rapid community transmission, it is essential to avoid higher risk locations in daily life. Understanding the probabilistic mechanism occurring patient visits and detecting local cluster centers (hotspots) are key to manage disease outbreaks. Letα andω be the posterior mean of α and ω, respectively. Given the parent points from the last iteration of the MCMC, denoted asĉ 1 , · · · ,ĉ m , the spatial intensity function for the offspring points is given by

for any x ∈ S. This map can be used to identify high-risk areas where the intensity exceeds a 265 certain threshold set by administrative decision. Figure 6 illustrates the intensity map J(x) and

high-risk areas defined as J(x)/14 > 1.427 km 2 , which corresponds to roughly one patient per 1.427 km 2 each day. Here we divide J(x) by 14 to convert the intensity computed for two weeks into the intensity for one day. The area 1.427 km 2 is the average size of each Dong in Seoul.

The hotspots shown in Figure 6 present different types of spreading events. For example, the risk in the community. The latter, on the other hand, did not receive media coverage considering privacy protection and the local community was largely unaware of the risk. Our approach may provide a way to warn local people without revealing information about the specific contagious individual. In summary, the visualization allows authorities to identify contagion risk hotspots and help them set up public health interventions while maintaining privacy protection.

Visualization of Risk Boundaries. Our approach can also provide a way to generate "risk boundaries" within a short time period (e.g., one day), assuming that the event distribution within that period can be approximated by a model based on the previous two weeks. Such boundaries are potentially useful for issuing public health advisories. The left column of Figure 7 shows that the 16 patient visits occurred on April 3rd were mostly concentrated within areas with high intensity g(x), 285 given by the existing parent points (ĉ 1 , ...,ĉ m ), but there are also points located slightly away from the high intensity areas. However, there are no points that are clearly far away from the existing high intensity areas either. Therefore, these points can be viewed as offspring events from new parents that are created near existing parents.

The right column of Figure 7 shows the risk boundaries that consider both the occurrence of 290 new parent points and the range of offspring densities created by them. Specifically, risk boundaries are represented as the orange circles, whose centers are the existing parent pointsĉ 1 , . . . ,ĉ m . The radii are given as θ 2 + 1.96ω, the sum of the distance that maximizes the interaction between the parent points (θ 2 ) and one side of the 95% interval for the offspring density (1.96ω). As Figure 7 shows, these risk boundaries indicate how far new events can reach given the existing parent points.

In this manuscript, we proposed a novel interaction Neyman-Scott point process for modeling COVID-19 data. Our model can be used to study the spatial patterns of patient visits and detect cluster centers in local communities by considering location-based interactions. To address the intractable normalizing functions involved in the posterior, we embed DMH (Liang, 2010) into 300 our MCMC algorithm. We apply our approach to COVID-19 data sets pertaining to Seoul, and draw meaningful epidemiological conclusions based on the parameter estimates. Furthermore, we provide easy-to-interpret disease risk maps from the fitted results. Through simulation studies, we

show that our method can provide reasonably accurate parameter estimates and detect true cluster centers.

Here, we focus on developing a new cluster point process model by regarding each patient visit as a realization from the process. Our models allow both attraction and repulsion for the cluster centers, which have values for the different modeling forms of interactions in the Neyman-Scott point process. Note that there are other spatial models that can be applied to contact tracing data. Examples include dynamic models for movement tracks of animals (Russell et al., 310 2016) and spatial gradient methods for modeling the spread of invasive species (Goldstein et al., 2019) . Adopting such methods for studying COVID-19 contact tracing data could provide another interesting epidemiological insight. Following the standard Neyman-Scott point process, we model the parent intensity parameter κ as a constant. One can generalize this by embedding a spatially J o u r n a l P r e -p r o o f Journal Pre-proof varying intensity function (i.e., κ(x)) into our model. As in the well-established mechanistic point 315 process framework (Diggle, 2013; González et al., 2016) , we can represent κ(x) as a function of various socio-demographic covariates (e.g., population sizes). Adopting such covariates into our model would be an interesting future research avenue.

Our method can be useful for planning public health interventions such as deciding social distancing levels. Social distancing is necessary to prevent the spread of infectious diseases, especially 320 for people in risk groups for severe COVID-19. However, raising the social distancing level could adversely affect the local economy. Therefore, it is essential to measure the degree of COVID-19 risk: how dangerous is the current coronavirus outbreak? Based on our approach, we can provide some guidelines for deciding the appropriate distancing levels to minimize economic damage. We q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q qq q qq q q q q q q q q q q q q q q q q q q q q Figure 6 (left column) and the risk boundaries defined as circles, whose centers are the existing sampled parent pointsĉ 1 , . . . ,ĉm and the radii are defined as θ 2 + 1.96ω (right column). The black dots in both columns show the events observed on the day following the last date of each period. data examples in the order of α, ω, κ, θ 1 , θ 2 (rows). Columns indicate severe, moderate, mild cases, respectively. 100,000 MCMC samples are generated.

A hierarchical point process with application to storm cell modelling

Bayesian computation for statistical models with intractable normalizing constants

Spatial interaction and the statistical analysis of lattice systems

How COVID-19 spreads

Estimating the reproductive number and the outbreak size of covid-19 in korea

Statistical analysis of spatial and spatio-temporal point patterns

Partial-likelihood analysis of spatio-temporal pointprocess data

Rcpp: Seamless R and C++ integration

Likelihood Inference for Spatial Point Processes. Stochastic Geometry: Likelihood and computation

Constrained Monte Carlo maximum likelihood for dependent data

An attraction-repulsion 380 point process model for respiratory syncytial virus infections

Quantifying spatiotemporal variation of invasion spread

Spatio-temporal point process statistics: a review

A composite likelihood approach in fitting spatial point process models

Responseguidelines to prevent the spread of covid-19 (local government)

Epidemiologic characteristics of early cases with 2019 novel coronavirus (2019-ncov) 390 disease in korea

On the Bayesian estimation for the stationary Neyman-Scott point processes

A double Metropolis-Hastings sampler for spatial models with intractable normalizing constants

An adaptive exchange algorithm for sampling from distributions with intractable normalizing constants

An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants

Log Gaussian Cox processes. Scandinavian journal of statistics

Statistical inference and simulation for spatial point processes

Modern statistics for spatial point processes. Scandi-405 navian

Two step estimation for Neyman-Scott point process with inhomogeneous cluster centers

On parameter estimation for doubly inhomogeneous cluster point processes

MCMC for doubly-intractable distributions

A theory of the spatial distribution of galaxies

Bayesian inference in the presence of intractable normalizing functions

Contact tracing during coronavirus disease outbreak

Dynamic models of animal movement with spatial point process interactions

Fractals, random shapes, and point fields: methods of geometrical statistics volume

A model for clustering

Modelling of the spatio-temporal distribution of rat sightings in an urban environment

A generalization of Poisson's binomial limit for use in ecology

An estimating function approach to inference for inhomogeneous Neyman-Scott processes

Novel Coronavirus -China

A generalization of the Neyman-Scott process

Birth-death samplers (Moller & Waagepetersen, 2003) are often used to generate point processes given the model parameters. We propose adding a new point (birth), removing an existing point (death), or moving a point with equal probability. To generate an auxiliary variable in DMH, we use a birth-death sampler whose stationary distribution is f (C|κ, θ 1 , θ 2 ). Algorithm 2 summarizes 355 this step. In our full MCMC algorithm (Algorithm 1 in the manuscript), we update C, given all the model parameters. Here, we use a birth-death sampler whose stationary distribution is f (X|C, α, ω)f (C|κ, θ 1 , θ 2 ). Algorithm 3 summarizes this step.

Given the parent parameters κ, θ 1 , θ 2 and the point pattern C = {c 1 , · · · , c m }.Birth step: add a point with probability 1/3 Propose a new point ξ uniformly over the spatial domain S:, 1Death step: remove a point with probability 1/3Remove an existing point η ∈ {c 1 , · · · , c m }:Move step: move a point with probability 1/3Propose a new point ξ and remove an existing point η:Accept it with probabilityAlgorithm 3 Birth-death MCMC for updating C in the full modelGiven the parent parameters α, ω, κ, θ 1 , θ 2 and the point pattern C = {c 1 , · · · , c m }.Birth step: add a point with probability 1/3Propose a new point ξ uniformly over the spatial domain S:Death step: remove a point with probability 1/3Remove an existing point η ∈ {c 1 , · · · , c m }:Accept it with probabilityMove step: move a point with probability 1/3Propose a new point ξ and remove an existing point η: C = C ∪ {ξ} \ {η}Accept it with probability α = min f (X|C , α, ω)f (C |κ, θ 1 , θ 2 ) f (X|C, α, ω)f (C|κ, θ 1 , θ 2 ) , 1