key: cord-0072869-48dznxd1
authors: Caballero, Yolanda; Giraldo, Ramón; Mateu, Jorge
title: A spatial randomness test based on the box-counting dimension
date: 2022-01-05
journal: Adv Stat Anal
DOI: 10.1007/s10182-021-00434-4
sha: fda4b85285fbe61cf4449227ebade2cebcb9ad6e
doc_id: 72869
cord_uid: 48dznxd1

Statistical modelling of a spatial point pattern often begins by testing the hypothesis of spatial randomness. Classical tests are based on quadrat counts and distance-based methods. Alternatively, we propose a new statistical test of spatial randomness based on the fractal dimension calculated through the box-counting method, which provides an inferential perspective in contrast to the more common descriptive use of this method. We also develop a graphical test based on the log–log plot used to calculate the box-counting dimension. We evaluate the performance of our methodology by conducting a simulation study and analysing a COVID-19 dataset. The results reinforce the good performance of the method, which arises as an alternative to the more classical distance-based strategies.

Spatial statistics is the branch of statistics that deals with the modelling of realisations of spatially indexed stochastic processes (Schabenberger and Gotway, 2017). This field covers three acknowledged areas: geostatistics, areal data, and spatial point patterns (Cressie, 1991). The last one concerns the analysis of the spatial distribution of locations of events such as earthquakes, landslides or forest [...]

The notion of a fractal dimension was introduced by Mandelbrot (1967), who used it as an indicator of surface roughness. A shape with a higher fractal dimension is rougher than one with a lower dimension. Many methods exist for estimating the fractal dimension. Box-counting, R/S analysis and the variation method can be used for this purpose (Breslin and Belward, 1999). Fractal dimension and its estimation using the box-counting method have been used in different fields of statistics. We can find contributions, among other statistical contexts, in time series analysis (Kopytov et al., 2016), clustering analysis (Bones et al., 2016), principal components analysis (Mo and Huang, 2010) and geostatistics (Vidal et al., 2010). As mentioned before, we show how these concepts can be used in point pattern analysis from an inferential perspective. We then develop our test based on the box-counting dimension of a spatial point pattern. We also propose a graphical test in the line of the classical graphical tests based on the G, F or K functions. We evaluate the performance of our methodology by conducting a simulation study through three known spatial structures that can be generated using the library spatstat (Baddeley et al., 2015) in R (R Core Team, 2020). In all cases, the results are consistent with those found by using the functions G, F and Ripley's K (Diggle, 2003).

The paper is organised as follows. Section 2 introduces the box-counting methodology. Section 3 presents the proposed test, an illustration through simulated and real data, and a power study. Section 4 describes a graphical approach to test for CSR (also based on the box-counting dimension). Section 5 shows an application of the method to a real data set of COVID-19 cases in Cali, Colombia. The article ends with a brief discussion and suggestions for further research.

The hypothesis of complete spatial randomness (CSR) for a spatial point pattern asserts that the number of events in any region follows a Poisson distribution with a given mean count per uniform subdivision. The events of a pattern are independently and uniformly distributed over space. In other words, the events are equally likely to occur anywhere and do not interact with each other. Here, we use uniform in the sense of following a uniform probability distribution across the study region, not in the sense of "evenly" dispersed across the study region. There are no interactions amongst the events, and the intensity of events does not vary over the plane. Thus, the independence assumption would be violated if the existence of one event either encouraged or inhibited the occurrence of other events in the neighbourhood. In this sense, CSR acts as a benchmark hypothesis to distinguish between randomness and clustering or regularity due to some form of interaction.

A fractal is a non-regular geometric shape with the same degree of non-regularity at all scales. It can be treated as a self-similar structure in the sense that even an indefinitely small part of a shape is geometrically similar to the whole (Debnath, 2006). The fractal dimension is a ratio providing a statistical index of complexity comparing how the details in a pattern change with the scale at which they are measured (Falconer, 2004). The dimension of self-similar fractals is given by

$$D_s = \frac{\log M}{\log(1/\delta)}, \tag{1}$$

where $M$ is the number of self-similar pieces and $\delta$ is a scale factor, such that $M\delta^{D_s} = 1$. In Eq. (1), log corresponds to the logarithm to the base 10. We use this same notation throughout the paper to be consistent with the related published literature. The use of $D_s$ in Eq. (1) is quite limited in practice. An alternative is the box-counting method (Liebovitch and Toth, 1989). Suppose the object of interest is covered with a number $\Gamma(\delta)$ of non-overlapping squares with sides of length $\delta$. The box-counting estimation of the fractal dimension (hereinafter box-counting dimension) is given by (Addison, 1997)

$$D = \lim_{\delta \to 0} \frac{\log \Gamma(\delta)}{\log(1/\delta)}. \tag{2}$$

In practice, $D$ in (2) is calculated as the slope of a linear regression between $\log(\Gamma(\delta))$ and $\log(1/\delta)$. Given a number of values $\delta_i$ ($i = 1, 2, 3, \ldots$), $D$ is defined by means of the linear model (see Addison (1997))

$$\log \Gamma(\delta_i) = \beta_0 + D \log(1/\delta_i) + \varepsilon_i, \tag{3}$$

with intercept $\beta_0$ and error term $\varepsilon_i$.
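The regression in (3) can be illustrated with a minimal sketch. The function names (`box_count`, `ols_slope`, `box_counting_dimension`), the choice of grid sizes and the two test sets are illustrative assumptions and are not part of the original methodology; the sketch only assumes that the object is a finite set of points in the square $[0, k]^2$ and that $\delta_i = k/i$.

```python
import math
import random

def box_count(points, i, k=1.0):
    """Gamma(delta_i): occupied cells when [0, k]^2 is split into an i x i grid."""
    occupied = set()
    for x, y in points:
        # min(...) keeps points lying exactly on the upper edge inside the grid
        col = min(int(x * i / k), i - 1)
        row = min(int(y * i / k), i - 1)
        occupied.add((row, col))
    return len(occupied)

def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def box_counting_dimension(points, grid_sizes, k=1.0):
    """Estimate D as the slope of log10 Gamma(delta_i) on log10(1/delta_i)."""
    xs = [math.log10(i / k) for i in grid_sizes]  # log(1/delta_i), delta_i = k/i
    ys = [math.log10(box_count(points, i, k)) for i in grid_sizes]
    return ols_slope(xs, ys)

# Two non-fractal reference sets: a densely sampled segment and a densely filled square
line = [(t / 5000, t / 5000) for t in range(5000)]
rng = random.Random(1)
filled = [(rng.random(), rng.random()) for _ in range(20000)]
sizes = [2, 4, 8, 16]
print(box_counting_dimension(line, sizes))    # close to 1
print(box_counting_dimension(filled, sizes))  # close to 2
```

The slope is only meaningful over grid sizes at which the set is sampled densely enough; with sparse point patterns the log-log relation flattens out, which is the behaviour exploited later in the paper.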

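We now show how the box-counting dimension given in (3) can be adapted to the context of spatial point patterns and used to test the hypothesis of CSR. Under CSR, the number of events $N(A)$ in a square $A$, with area $|A|$ and sides of length $k$ (without loss of generality, we can take $k = 1$), is Poisson distributed with mean $\lambda|A|$, where $\lambda$ is the constant intensity of the point process; that is, the probability function of the number of events in $A$ is

$$P(N(A) = x) = \frac{(\lambda|A|)^x \exp(-\lambda|A|)}{x!}, \quad x = 0, 1, 2, \ldots \tag{4}$$

From (4), we have

$$P(N(A) = 0) = \exp(-\lambda|A|), \qquad P(N(A) > 0) = 1 - \exp(-\lambda|A|). \tag{5}$$

Assume the original square $A$ is divided into $i^2$ non-overlapping squares $A_j$, $j = 1, \ldots, i^2$, with sides of length $\delta_i = 1/i$, $i = 1, 2, 3, \ldots$ (see Fig. 1).

Under the CSR condition, $\mathrm{E}(N(A)) = \lambda|A| = \mu$, the mean of a homogeneous Poisson process. Since $|A_j| = |A|/i^2$, from Eq. (5) $P(N(A_j) = 0) = \exp(-\lambda|A_j|) = \exp(-\mu/i^2)$ and consequently $P(N(A_j) > 0) = 1 - \exp(-\mu/i^2)$.

Define the random variable $\Gamma(\delta_i)$, $i = 1, 2, \ldots$, as the number of squares of side $\delta_i$ containing at least one event; that is, $\Gamma(\delta_i)$ corresponds to the number of squares required to cover the point pattern (see Figs. 1 and 2). This variable can be defined as

$$\Gamma(\delta_i) = \sum_{j=1}^{i^2} I(N(A_j) > 0), \tag{6}$$

where $I(\cdot)$ is the indicator function.

The expected value of $\Gamma(\delta_i)$ in (6) is

$$\mathrm{E}(\Gamma(\delta_i)) = i^2 \left(1 - \exp(-\mu/i^2)\right).$$

Note that in order to define $\mathrm{E}(D)$ in (3), it is required to find $\mathrm{E}(\log(\Gamma(\delta_i)))$. Using the first-order Taylor expansion of $\log \Gamma(\delta_i)$ around $\mathrm{E}(\Gamma(\delta_i))$, we have

$$\mathrm{E}(\log \Gamma(\delta_i)) \approx \log \mathrm{E}(\Gamma(\delta_i)) = \log\left(i^2 \left(1 - \exp(-\mu/i^2)\right)\right). \tag{7}$$

Taking expectation in (3) and using (7), we have under CSR, with $\delta_i = 1/i$,

$$\mathrm{E}(\log \Gamma(\delta_i)) \approx 2 \log(1/\delta_i) + \log\left(1 - \exp(-\mu \delta_i^2)\right). \tag{8}$$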

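Fig. 1 The square $A$ with sides of length 1 is divided into $3^2 = 9$ non-overlapping squares $A_j$ with sides of length $\delta_3 = 1/3$

When $\mu \to \infty$ in (8), we have

$$\mathrm{E}(\log \Gamma(\delta_i)) = 2 \log(1/\delta_i), \tag{9}$$

that is, a straight line with slope 2. In general, if $A$ (Fig. 1) is a square with side length $k \neq 1$, assuming again that $\lambda|A| = \mu$, we then have $\delta_i = k/i$, $\lambda|A_j| = \mu/i^2$ and

$$\mathrm{E}(\Gamma(\delta_i)) = i^2 \left(1 - \exp(-\mu/i^2)\right) = \frac{k^2}{\delta_i^2} \left(1 - \exp(-\mu \delta_i^2 / k^2)\right).$$

Using again the first-order Taylor expansion of $\log \Gamma(\delta_i)$ around $\mathrm{E}(\Gamma(\delta_i))$, we have

$$\mathrm{E}(\log \Gamma(\delta_i)) \approx \log \mathrm{E}(\Gamma(\delta_i)).$$

Then, under CSR, the expected value of the fractal dimension for a square of side $k$ calculated with the box-counting method is defined by the linear model

$$\mathrm{E}(\log \Gamma(\delta_i)) \approx 2 \log(k) + 2 \log(1/\delta_i) + \log\left(1 - \exp(-\mu \delta_i^2 / k^2)\right). \tag{10}$$

Note that $\lambda|A| = \mu$ in (10) is usually unknown. Based on just one realisation of a homogeneous Poisson process, we can then estimate $\mu$ by $n$ (the number of points of the observed point pattern) to estimate $\mathrm{E}(D)$. Taking $\lim_{\mu \to \infty}$ in (10) we obtain

$$\mathrm{E}(\log \Gamma(\delta_i)) = 2 \log(k) + 2 \log(1/\delta_i). \tag{11}$$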

The functional relationship between $\mathrm{E}(\log(\Gamma(\delta_i)))$ and $\log(1/\delta_i)$ defined in Eq. (10) allows us to characterise the behaviour of $\mathrm{E}(D)$. $\mathrm{E}(\log(\Gamma(\delta_i)))$ depends on $k$ (the side length of the original square) and on $\mathrm{E}(N(A)) = \mu$ (the expected number of events of the spatial point pattern in $A$). Given $\mu$, the shape of the curves does not change (Fig. 3). Note that the greater the value of $k$, the more the curve is shifted to the left. Likewise, given a fixed $k$, the effect of $\mu$ is reflected on the maximum of $\mathrm{E}(\log(\Gamma(\delta_i)))$. The greater $\mu$, the greater the value at which $\mathrm{E}(\log(\Gamma(\delta_i)))$ becomes constant (Fig. 4).
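The theoretical curve can be checked numerically with a short sketch. It assumes the CSR expectation $\mathrm{E}(\Gamma(\delta_i)) = i^2(1 - \exp(-\mu/i^2))$ with $\delta_i = k/i$ and $\mu$ the expected number of events; the helper names, the Poisson sampler (the standard library has none, so unit-rate exponential arrivals are counted) and the simulation sizes are illustrative assumptions.

```python
import math
import random

def expected_log_gamma(i, mu):
    """log10 of E[Gamma(delta_i)] = i^2 * (1 - exp(-mu / i^2)) under CSR."""
    return math.log10(i * i * (1.0 - math.exp(-mu / (i * i))))

def abscissa(i, k=1.0):
    """log10(1 / delta_i) with delta_i = k / i."""
    return math.log10(i / k)

def poisson(rng, mu):
    """Poisson(mu) draw: number of unit-rate exponential arrivals in [0, mu]."""
    n, t = 0, rng.expovariate(1.0)
    while t <= mu:
        n += 1
        t += rng.expovariate(1.0)
    return n

def simulated_mean_log_gamma(i, mu, reps=1000, seed=1):
    """Monte Carlo mean of log10 Gamma(delta_i) for CSR patterns on the unit square."""
    rng = random.Random(seed)
    total, used = 0.0, 0
    for _ in range(reps):
        cells = set()
        for _ in range(poisson(rng, mu)):
            cells.add((int(rng.random() * i), int(rng.random() * i)))
        if cells:  # the logarithm is undefined for an empty pattern
            total += math.log10(len(cells))
            used += 1
    return total / used

mu = 100
for i in (1, 5, 10, 20):
    print(i, round(abscissa(i), 3),
          round(expected_log_gamma(i, mu), 3),
          round(simulated_mean_log_gamma(i, mu), 3))
```

Changing `k` only moves the abscissa by $-\log k$, while the ordinate depends on `i` and `mu` alone; for large `i` the ordinate approaches $\log \mu$, the plateau described above.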

The minimum number of boxes required to cover the point pattern is obtained when $i = 1$ (initial square). In this case $\log(1/\delta_1) = \log(1/k)$. The ordinate for this value is $\mathrm{E}(\log(\Gamma(\delta_1))) = \log(1 - \exp(-\mu))$. On the other hand, the maximum number of partitions (corresponding to the minimum size of $\delta$) is found when the expected number of events in $A_j$ is $\lambda|A_j| = 1$. Under this scenario, we have $i = \sqrt{\mu}$, $\delta_i = k/\sqrt{\mu}$ and

$$\log(1/\delta_i) = \tfrac{1}{2}\log(\mu) - \log(k), \qquad \mathrm{E}(\log(\Gamma(\delta_i))) \approx \log(\mu) + \log(1 - \exp(-1)) = \log(\mu) - 0.199.$$

Fig. 3 Relation between $\mathrm{E}(\log(\Gamma(\delta_i)))$ and $\log(1/\delta_i)$, $i = 1, 2, 3, \ldots, 100$, when the initial square has sides of length $k$ (0.01, 0.1, 1, 10, 100) and the expected number of events is $\lambda|A| = \mu = 100$. Black points at each curve correspond to coordinates $(\log(1/k), \log(1 - \exp(-\mu)))$ and $(\tfrac{1}{2}\log(\mu) - \log(k), \log(\mu) - 0.199)$, respectively (see text for explanations on these values). The slopes of the dashed lines define $\mathrm{E}(D_1)$ (see Eq. 15). At each case $\mathrm{E}(D_1) = 1.80$. Black lines (slope 2) correspond to $\mathrm{E}(D_1) = 2$ (limit when $\mu \to \infty$)

Fig. 4 Relation between $\mathrm{E}(\log(\Gamma(\delta_i)))$ and $\log(1/\delta_i)$, $i = 1, 2, 3, \ldots, 100$, according to the expected number of points of the pattern ($\mu$), when the initial square has sides of length $k = 1$. Black points at each curve correspond to coordinates $(\log(1/k), \log(1 - \exp(-\mu)))$ and $(\tfrac{1}{2}\log(\mu) - \log(k), \log(\mu) - 0.199)$, respectively (see text for explanations on these values). The slopes of the dashed lines define $\mathrm{E}(D_1)$ (see Eq. 15). These are, respectively, 1.60 ($\mu = 10$), 1.80 ($\mu = 100$), 1.87 ($\mu = 1000$) and 1.90 ($\mu = 10000$). In general, when $\mu \to \infty$, $\mathrm{E}(D_1) \to 2$ (black line)

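The log-log plots (Figs. 3 and 4) show a multifractal behaviour, i.e. the dependence between $\log \Gamma(\delta_i)$ and $\log(1/\delta_i)$ is non-linear. The box-counting dimension $D$ in (3) is usually calculated with the portion of the data that allows a linear model to be fitted (see, for example, Kenkel, 2013; Mou and Wang, 2014; Vega et al., 2015; Jaquette and Schweinhart, 2013). This option might not be appropriate to discriminate between the different types of spatial point patterns. In this context, it is important to take into account the minimum and maximum values of the log-log curves. Thus, here, we propose to characterise the relationship between $\log \Gamma(\delta_i)$ and $\log(1/\delta_i)$ by the slope of the straight line joining the two points that correspond to these minimum and maximum values. We denote this slope as $\mathrm{E}(D_1)$ instead of $\mathrm{E}(D)$ to emphasise that we do not employ the traditional linear fitting used in box-counting estimation. Under CSR, we have

$$\mathrm{E}(D_1) = \frac{y_2 - y_1}{x_2 - x_1} = \frac{\log(\mu) - 0.199 - \log(1 - \exp(-\mu))}{\tfrac{1}{2}\log(\mu)}, \tag{15}$$

where $(x_1, y_1) = (\log(1/k), \log(1 - \exp(-\mu)))$ and $(x_2, y_2) = (\tfrac{1}{2}\log(\mu) - \log(k), \log(\mu) - 0.199)$.

From Eq. (15), $\lim_{\mu \to \infty} \mathrm{E}(D_1) = 2$ (see Fig. 4).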
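As a quick numerical check, the two-point slope can be evaluated directly. The sketch below assumes the slope is taken between the points $(\log(1/k), \log(1 - \exp(-\mu)))$ and $(\tfrac{1}{2}\log\mu - \log k, \log\mu - 0.199)$ given in the figure captions, with base-10 logarithms; the function name `expected_d1` is an illustrative choice. Because $\log k$ cancels in the denominator, the result does not depend on $k$.

```python
import math

def expected_d1(mu):
    """Two-point slope E(D_1) under CSR for an expected count mu (base-10 logs)."""
    y1 = math.log10(1.0 - math.exp(-mu))                    # ordinate at i = 1
    y2 = math.log10(mu) + math.log10(1.0 - math.exp(-1.0))  # log(mu) - 0.199
    x_diff = 0.5 * math.log10(mu)                           # x2 - x1; log(k) cancels
    return (y2 - y1) / x_diff

for mu in (10, 100, 1000, 10000):
    print(mu, round(expected_d1(mu), 2))
```

The printed values agree with the slopes quoted for Fig. 4 (1.60, 1.80, 1.87 and 1.90), and the function increases slowly towards 2 as `mu` grows.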

In practice with real data, $\mu$ in Eq. (15) is unknown. In this case, in order to test for CSR, we can take $\hat{\mu} = n$, with $n$ the number of points of the observed pattern; namely, we assume that $N(A) \sim \text{Poisson}(\lambda|A| = n)$. In this scenario, the expected value of $D_1$ under CSR is defined as

$$\hat{\mathrm{E}}(D_1) = \frac{\log(n) - 0.199 - \log(1 - \exp(-n))}{\tfrac{1}{2}\log(n)}, \tag{16}$$

and its estimation is given by

$$\hat{D}_1 = \frac{\hat{y}_2 - \hat{y}_1}{x_2 - x_1}, \tag{17}$$

where $x_1$, $x_2$, and $\hat{y}_1$ are defined similarly as in (16), and $\hat{y}_2$ is calculated from the scatter plot between $\log \Gamma(\delta_i)$ and $\log(1/\delta_i)$. Specifically, $\hat{y}_2$ is the ordinate corresponding to the abscissa $\tfrac{1}{2}\log(n) - \log(k)$, with $k$ the side length of the square. In practice, some mathematical interpolation procedure (linear, polynomial, etc.) can be required to calculate $\hat{y}_2$.

By way of illustration, we show the results found with a simulation from $N(A) \sim \text{Poisson}(\mu = \lambda|A| = 100)$, $|A| = 1$. Figure 5 shows the spatial distribution of $n = 114$ simulated events in the unit square, the number of events per cell for each one of the three partitions $\delta_i = 1/i$, $i = 5, 10$, and $15$, and the value of $\Gamma(\delta_i)$ at each case.

We observe that the smaller the size of the partition, the greater the number of boxes without events (the number of boxes with zeros). Calculating $\log(1/\delta_i)$ and $\log \Gamma(\delta_i)$ for $i = 1, \ldots, 20$, we obtain the log-log scatter plot (white circles) shown in Fig. 6. Its behaviour, as expected, is similar to the theoretical log-log curve $\log(1/\delta_i)$ versus $\mathrm{E}(\log(\Gamma(\delta_i)))$ under CSR (red line). The black points in this plot are the coordinates used to calculate the expected box-counting dimension $\hat{\mathrm{E}}(D_1)$ under the null hypothesis (Eq. 16). In this case $\hat{\mathrm{E}}(D_1) = 1.806$. The intersection of the blue lines corresponds to the coordinate $(x_2, \hat{y}_2)$ ($\hat{y}_2$ is found by linear interpolation between the two nearest values), which is replaced in Eq. (17) to find the estimated box-counting dimension ($\hat{D}_1 = 1.784$).

Generating $m$ simulations from $N(A) \sim \text{Poisson}(\lambda|A| = n = 114)$ and repeating the procedure described above, we can find $m$ estimations $\hat{D}_1$ under the null hypothesis of CSR. A value of $\hat{D}_1$ at the extreme of the tail of the null distribution would indicate that the spatial randomness hypothesis should be rejected. Analogously, defining $B = \hat{\mathrm{E}}(D_1) - \hat{D}_1$, we reject the randomness hypothesis if this value is at the extreme of the tail of the corresponding null distribution. Using $B$ may be preferable because in all cases (regardless of the type of pattern considered), zero will be the reference value of the centre of the distribution (see Fig. 7). This is illustrated in Sect. 3.1 with a simulation study. The procedure to test for CSR based on $\hat{\mathrm{E}}(D_1)$, $\hat{D}_1$, and $B$ described above is summarised in Algorithm 1, which presents schematically the steps required to perform the spatial randomness test using the box-counting method and shows how to estimate the corresponding p value.

Fig. 6 Black circles are the coordinates used to calculate the box-counting dimension ($\hat{\mathrm{E}}(D_1)$) under the null hypothesis (Eq. (16)). The intersection of the blue lines corresponds to the coordinate $(x_2, \hat{y}_2)$ used to obtain the estimated box-counting dimension ($\hat{D}_1$) (Eq. (17)). $\hat{y}_2$ is calculated by linear interpolation of the two nearest points $(\log(1/\delta_{10}), \log \Gamma(\delta_{10}))$ and $(\log(1/\delta_{11}), \log \Gamma(\delta_{11}))$
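The testing procedure described above can be sketched as follows for a pattern on the unit square ($k = 1$). This is only an illustration under stated assumptions and not a transcription of Algorithm 1: the two-sided Monte Carlo p value, the number of simulations `m = 499`, the function names and the Poisson sampler are choices made for the example. The expected dimension is the two-point slope with $\mu$ replaced by $n$, $\hat{y}_2$ is obtained by linear interpolation between the two grid sizes bracketing $\sqrt{n}$, and $B$ is the expected minus the estimated dimension.

```python
import math
import random

LOG_TERM = math.log10(1.0 - math.exp(-1.0))  # about -0.199

def poisson(rng, mu):
    """Poisson(mu) draw: number of unit-rate exponential arrivals in [0, mu]."""
    n, t = 0, rng.expovariate(1.0)
    while t <= mu:
        n += 1
        t += rng.expovariate(1.0)
    return n

def box_count(points, i):
    """Occupied cells of an i x i grid on the unit square."""
    return len({(min(int(x * i), i - 1), min(int(y * i), i - 1)) for x, y in points})

def expected_d1(n):
    """Expected two-point slope under CSR with mu estimated by n."""
    y1 = math.log10(1.0 - math.exp(-n))
    return (math.log10(n) + LOG_TERM - y1) / (0.5 * math.log10(n))

def estimated_d1(points):
    """Two-point slope from the observed log-log curve (needs at least 2 points)."""
    n = len(points)
    x2 = 0.5 * math.log10(n)                 # abscissa where lambda * |A_j| = 1
    lo = math.isqrt(n)                       # grid sizes bracketing sqrt(n)
    xa, xb = math.log10(lo), math.log10(lo + 1)
    ya = math.log10(box_count(points, lo))
    yb = math.log10(box_count(points, lo + 1))
    y2 = ya + (yb - ya) * (x2 - xa) / (xb - xa)   # linear interpolation
    y1 = math.log10(1.0 - math.exp(-n))
    return (y2 - y1) / x2                    # x1 = log10(1/k) = 0 for k = 1

def b_statistic(points):
    return expected_d1(len(points)) - estimated_d1(points)

def csr_test(points, m=499, seed=0):
    """Observed B and a two-sided Monte Carlo p value (assumed convention)."""
    rng = random.Random(seed)
    n = len(points)
    b_obs = b_statistic(points)
    b_sim = []
    for _ in range(m):
        nj = max(2, poisson(rng, n))         # guard: the slope needs >= 2 points
        sim = [(rng.random(), rng.random()) for _ in range(nj)]
        b_sim.append(b_statistic(sim))
    upper = (1 + sum(b >= b_obs for b in b_sim)) / (m + 1)
    lower = (1 + sum(b <= b_obs for b in b_sim)) / (m + 1)
    return b_obs, min(1.0, 2.0 * min(upper, lower))

# Strongly clustered toy pattern: 100 points confined to a corner of the unit square
rng = random.Random(42)
clustered = [(0.2 * rng.random(), 0.2 * rng.random()) for _ in range(100)]
b_obs, p_value = csr_test(clustered)
print(b_obs, p_value)
```

A large positive `b_obs` with a small p value points to clustering; a value in the lower tail would point to inhibition.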

As an initial review of the goodness of fit of the test proposed in Sect. 3, we present the results of a Monte Carlo simulation study to describe the test behaviour under the three types of point structures generally considered in point pattern analysis. [...], which corresponds to the dataset cells, widely known and used as an example in many works on point patterns (Ripley, 1977, 1981; Diggle, 1983). Note that in the case of the Matérn process, we change the usual notation of one of its parameters to avoid confusion with the notation in Eq. (16). The intensity of the Matérn cluster process is the product of the intensity of the parent points and the mean number of points per cluster (Waagepetersen, 2007), and the level of aggregation is determined by the parameter r. For fixed values of the other two parameters, the aggregation level increases when r decreases (Fig. 8).

We particularly looked for simulations of size 42 to generate the point patterns under randomness and clustering (top left and centre left of Fig. 7) so that the results were more easily comparable with those of the cells pattern (which has 42 events). The distributions of the statistic $B$ on the right of Fig. 7 were generated assuming a fixed $n$, although the results do not change significantly if an unconditional simulation is considered. The functions rpoispp and rMatClust of the spatstat library (Baddeley et al., 2015) of R (R Core Team, 2020) were used to simulate the random and clustered patterns. The point pattern cells is also available in spatstat. We apply the methodology presented in Sect. 3 to test the hypothesis of CSR with each one of these datasets. Employing the $n$ values in Table 1 and Eqs. (16) and (17), we calculate $\hat{\mathrm{E}}(D_1)$, $\hat{D}_1$, and $B$ for each one of the patterns in Fig. 7 (Table 1).

A quick inspection of the results in Table 1 reveals that the value of $\hat{D}_1$ found with the point pattern simulated under CSR (top left of Fig. 7) is very close to the expected value $\hat{\mathrm{E}}(D_1)$ under complete spatial randomness, while in the other two cases, $\hat{D}_1$ is relatively far from this value of reference (below when the pattern is clustered and above if it is inhibitory). The same information is obtained from the $B$ statistic. (In this case, the reference is zero.) The value of $B$ under the Poisson process is close to zero, while the $B$ values of the Matérn cluster and cells patterns are far from zero (above when the pattern is clustered and below if it is inhibitory). The distribution of the statistic $B$ under the null hypothesis was estimated by generating 500 simulations from $N(A) \sim \text{Poisson}(\lambda|A| = 42)$ (see the histograms in the right panel of Fig. 7); that is, for $j = 1, \ldots, 500$, we obtained $\hat{D}_{1j}$ and $B_j = \hat{\mathrm{E}}(D_1)_j - \hat{D}_{1j}$. A kernel density estimation (Sheather, 2004) of the $B$ distribution is also obtained at each case (red curves in the right panel of Fig. 7). We use a Gaussian kernel, and the bandwidth is defined using Silverman's rule (Sheather, 2004). Note in Fig. 7 that we obtain three different distributions of $B$ under randomness. Only one of these distributions could have been used. However, to present the results in more detail, we include three sets of independent simulations. Using the $B_j$, $j = 1, \ldots, 500$, and the function quantile of the library stats of R (R Core Team, 2020), the percentiles $B_{0.025}$ and $B_{0.975}$ of the $B$ distribution (black dashed lines in Fig. 7) were calculated. The null hypothesis of CSR is rejected at each case if the $B$ value is lower than the estimated percentile $B_{0.025}$ or greater than $B_{0.975}$. The histograms and kernel density estimates (red curves) in Fig. 7 suggest that the distributions of $B$ under CSR are symmetric around zero. A large value of $B$ (in the upper tail of the distribution of $B$) will indicate that the pattern under study is clustered. On the contrary, a very low value of $B$ (lower tail of the distribution of $B$) will give evidence that the process of interest follows an inhibition model.

Two aspects are noted from Table 1 and Fig. 7. On the one hand, the $B$ value calculated with the point pattern simulated under randomness ($-0.007$) (dashed blue line in the top right panel of Fig. 7) is around the centre of the null distribution, i.e. as expected, the test indicates that there is no evidence to reject the null hypothesis of CSR. On the other hand, for the Matérn cluster point pattern (centre panels of Fig. 7) and the cells pattern, the $B$ values (Table 1) are on the tails of the corresponding distributions under randomness (on the right in the case of the Matérn cluster process and the opposite for the inhibition pattern (Fig. 7)); that is, these indicate that the hypothesis of randomness should be rejected. In summary, the plots on the right panel of Fig. 7 show that the proposed test (based on $B$ or $\hat{D}_1$) came to the correct decision in all three cases. If the hypothesis of spatial randomness is rejected, the tail in which $B$ falls indicates whether the pattern is clustered or inhibitory. From Table 1, it is important to note that conditional on $n$ there is a value of reference ($\hat{\mathrm{E}}(D_1)$) for the randomness hypothesis. The simulation-based distributions allow us to establish whether the estimate $\hat{D}_1$ is significantly different from this value. The value of $B$ allows measuring the strength of inhibition or clustering. The smaller or larger (further from zero) $B$ is, the greater the degree of inhibition or clustering, respectively, of the point pattern under consideration.

Table 1 Expected number of events ($\mu$), number of events recorded ($n$), expected box-counting dimension conditional to $n$ ($\hat{\mathrm{E}}(D_1)$), and estimates ($\hat{D}_1$ and $B$) for each one of the three types of point patterns considered

We generate realisations of a Matérn cluster process defined by three parameters: the intensity of the parent points, the cluster radius r, and the mean number of points per cluster (Waagepetersen, 2007). The method used generates a uniform Poisson point process of "parent" points. Then, each parent point is replaced by a random cluster of "offspring" points, the number of points per cluster being Poisson distributed, and their positions being placed independently and uniformly inside a disc of radius r centred on the parent point (Waagepetersen, 2007). We use the function rMatClust of the library spatstat (Baddeley et al., 2015) to generate the simulations. For six selected values of r (0.1, 0.2, 0.4, 0.6, 0.8, and 1.0), one resulting simulated process is shown in Fig. 8. From these plots, it is clear that the smaller r, the greater the aggregation, and therefore the more evidence to reject the hypothesis of spatial randomness. On the contrary, if r increases, the configuration of points looks more similar to a realisation of a random process under CSR. With this result in mind, in order to estimate the rejection probability of the test under different levels of spatial aggregation, we carried out a simulation study considering a more extensive set of values of r between 0.1 and 1 (0.1, 0.15, 0.20, ..., 0.90, 0.95, 1). Point patterns with a high level of aggregation are initially generated (using small r values), and then (increasing r) we simulate others with point configurations similar to those obtained under spatial randomness. The procedure used is analogous to that described in Algorithm 1. Specifically, for each r value, the rejection probability of the CSR hypothesis is estimated using the iterative procedure given in Algorithm 2.

The rejection probabilities for each r are shown in Table 2. According to the values from this table, it is clear that there is an inverse relationship between r (column 1) and the probability of rejecting the null hypothesis (column 13). The lower the r value, the greater $P(\text{Reject } H_0)$, i.e. the more evident the spatial aggregation, the greater the rejection probability of the complete spatial randomness hypothesis. On the contrary, when the value of r tends to one, the corresponding rejection probabilities of the randomness hypothesis tend to zero. We include in Table 2 the first 10 values of $B$ (of the total of 500) with the corresponding associated empirical p values. It is clear from these values that there is (in general) a transition in the $B$ values. When r is small (r = 0.1, 0.15, 0.20), the values of $B$ tend to be relatively large, and therefore, the simulation-based p values are close to zero, while when r is large (r = 0.9, 0.95, 1.00) the opposite occurs: the values of $B$ tend to be relatively small (close to zero or even negative), and consequently, the corresponding empirical p values are greater than the significance level $\alpha$. The table results suggest that the proposed test is unbiased, i.e. the power of the test increases when the level of spatial aggregation increases.
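The power study can be mimicked on a small scale with the following sketch. It is not Algorithm 2: the parameter values (`parent_intensity = 10`, `mean_offspring = 10`), the number of replicates, the two-sided Monte Carlo p value and all function names are assumptions made for the example, and the Matérn cluster simulator is a plain re-implementation of the parent and offspring construction described above (parents simulated on a window enlarged by r) rather than a call to spatstat.

```python
import math
import random

LOG_TERM = math.log10(1.0 - math.exp(-1.0))  # about -0.199

def poisson(rng, mu):
    """Poisson(mu) draw: number of unit-rate exponential arrivals in [0, mu]."""
    n, t = 0, rng.expovariate(1.0)
    while t <= mu:
        n += 1
        t += rng.expovariate(1.0)
    return n

def box_count(points, i):
    return len({(min(int(x * i), i - 1), min(int(y * i), i - 1)) for x, y in points})

def b_statistic(points):
    """B = expected minus estimated two-point slope on the unit square."""
    n = len(points)
    x2 = 0.5 * math.log10(n)
    lo = math.isqrt(n)
    xa, xb = math.log10(lo), math.log10(lo + 1)
    ya = math.log10(box_count(points, lo))
    yb = math.log10(box_count(points, lo + 1))
    y2 = ya + (yb - ya) * (x2 - xa) / (xb - xa)
    y1 = math.log10(1.0 - math.exp(-n))
    expected = (math.log10(n) + LOG_TERM - y1) / x2
    return expected - (y2 - y1) / x2

def matern_cluster(rng, parent_intensity, r, mean_offspring):
    """Matérn cluster pattern observed on the unit square."""
    side = 1.0 + 2.0 * r                      # enlarged window avoids edge deficit
    points = []
    for _ in range(poisson(rng, parent_intensity * side * side)):
        px, py = -r + side * rng.random(), -r + side * rng.random()
        for _ in range(poisson(rng, mean_offspring)):
            rad = r * math.sqrt(rng.random())  # uniform position inside the disc
            ang = 2.0 * math.pi * rng.random()
            x, y = px + rad * math.cos(ang), py + rad * math.sin(ang)
            if 0.0 <= x < 1.0 and 0.0 <= y < 1.0:
                points.append((x, y))
    return points

def rejects_csr(rng, points, m=99, alpha=0.05):
    n = len(points)
    b_obs = b_statistic(points)
    sims = []
    for _ in range(m):
        nj = max(2, poisson(rng, n))          # guard: the slope needs >= 2 points
        sims.append(b_statistic([(rng.random(), rng.random()) for _ in range(nj)]))
    upper = (1 + sum(b >= b_obs for b in sims)) / (m + 1)
    lower = (1 + sum(b <= b_obs for b in sims)) / (m + 1)
    return min(1.0, 2.0 * min(upper, lower)) <= alpha

def rejection_probability(r, reps=40, seed=0, parent_intensity=10.0, mean_offspring=10.0):
    """Share of simulated Matérn patterns for which CSR is rejected."""
    rng = random.Random(seed)
    rejected = used = 0
    while used < reps:
        pts = matern_cluster(rng, parent_intensity, r, mean_offspring)
        if len(pts) < 2:
            continue                          # too few points to compute a slope
        used += 1
        rejected += rejects_csr(rng, pts)
    return rejected / reps

power_small_r = rejection_probability(0.05)
power_large_r = rejection_probability(1.0)
print(power_small_r, power_large_r)
```

With these illustrative settings the rejection share for the tight clusters should be much higher than for `r = 1.0`, in line with the inverse relationship between r and the rejection probability reported above; the exact figures depend on the assumed parameters and are not comparable with Table 2.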

In the analysis of spatial point patterns, the test for CSR is often based on graphical methods. Generally, the distribution functions of the event-event distance (function G; Clark and Evans, 1954), the point-event distance (function F; Bartlett, 1964), and the number of events encountered up to a given distance of any particular event (Ripley's K function; Ripley, 1977) are employed for this purpose. These functions are typically inspected by plotting the empirical function calculated from the data, together with the theoretical function of the homogeneous Poisson process with the same average intensity (Baddeley et al., 2015). To assess the statistical significance of deviations between the observed and theoretical functions, it is required to know the expected variability when the pattern is completely random. To this purpose, simulated realisations under CSR are generated, and pointwise envelopes based on the minimum and maximum are calculated. In this section, we show how the log-log plot defined in Sect. 2.2 can be used as an alternative to the functions G, F, and K to test graphically for CSR.

The steps to define the graphical test based on the log-log plot are the following. Initially, calculate the log-log plot defined in Sect. 2.2 with the observed dataset. Then simulate $m$ realisations from $N(A) \sim \text{Poisson}(\lambda|A| = n)$, and for each simulated point pattern obtain the log-log plot. From the generated $m$ curves, define pointwise envelopes as in the case of the G, F, and K functions mentioned above. We illustrate the use of the log-log function based on the same point patterns considered in Sect. 3.1 (Fig. 7). The results obtained are compared with those found with the G, F, and K functions. We use the library spatstat (Baddeley et al., 2015) to generate the envelopes. Specifically, the functions Gest, Fest, Kest and envelope of the same library were used for carrying out the graphical tests. In all cases, the simulations to obtain the envelopes were conditioned to have the same number of events as the original point pattern ($n = 121$ (random), $n = 42$ (regular), and $n = 156$ (clustered)). Figures 9, 10 and 11 show the corresponding envelopes (grey shading) for the G (top left), F (top right), K (bottom left) and log-log functions (bottom right) generated from the point patterns in Fig. 7.

The results obtained with the log-log function are in all cases in accordance with those given by the functions G, F, and K (Figs. 9, 10 and 11); that is, the estimated log-log function is inside the envelopes in the case of the Poisson pattern (Fig. 9) and outside the envelopes in the cases of clustering and inhibition (cells) (Figs. 10 and 11, respectively). From an empirical point of view, we can note that the log-log plot has the same performance as the traditional G, F, and K functions. The log-log function has an analogous interpretation to the F function. There is clustering when the estimated function is below the envelopes and inhibition when it is above (Figs. 10 and 11). The results based on the log-log plot are also consistent with those described in Sect. 3.1. Recall that under inhibition, the estimated box-counting dimension ($\hat{D}_1$) is greater than expected under randomness ($\hat{\mathrm{E}}(D_1)$), or the opposite if the process is clustered ($\hat{D}_1 < \hat{\mathrm{E}}(D_1)$). A similar result can be identified from Figs. 10 and 11. The log-log plot for the pattern cells (Fig. 11) is above the envelopes, that is, it is greater than the expected log-log curve under CSR. Likewise, we can see in Fig. 10 (Matérn cluster process) that the estimated log-log function (black line) is below the envelopes, that is, the log-log plot for a clustered point pattern is lower than expected under CSR. These results suggest a direct relationship between these two approaches.
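The steps of the graphical test can be sketched in a few lines. The sketch assumes a pattern on the unit square, simulations conditioned on the observed number of events, pointwise minimum and maximum envelopes, and grid sizes $i = 1, \ldots, 20$; the function names, `m = 99` and the toy clustered pattern are illustrative assumptions rather than the settings behind Figs. 9 to 11.

```python
import math
import random

def box_count(points, i):
    """Occupied cells of an i x i grid on the unit square."""
    return len({(min(int(x * i), i - 1), min(int(y * i), i - 1)) for x, y in points})

def loglog_curve(points, grid_sizes):
    """Ordinates log10 Gamma(delta_i); the abscissas are log10(i)."""
    return [math.log10(box_count(points, i)) for i in grid_sizes]

def csr_envelope(n, grid_sizes, m=99, seed=0):
    """Pointwise min/max envelopes from m CSR patterns with exactly n events."""
    rng = random.Random(seed)
    curves = []
    for _ in range(m):
        sim = [(rng.random(), rng.random()) for _ in range(n)]
        curves.append(loglog_curve(sim, grid_sizes))
    lower = [min(c[j] for c in curves) for j in range(len(grid_sizes))]
    upper = [max(c[j] for c in curves) for j in range(len(grid_sizes))]
    return lower, upper

# Toy clustered pattern: 100 events confined to a small sub-square
rng = random.Random(7)
grid_sizes = list(range(1, 21))
clustered = [(0.5 + 0.1 * rng.random(), 0.5 + 0.1 * rng.random()) for _ in range(100)]
observed = loglog_curve(clustered, grid_sizes)
lower, upper = csr_envelope(len(clustered), grid_sizes)

# Grid sizes at which the observed curve leaves the envelope
below = [i for i, o, lo in zip(grid_sizes, observed, lower) if o < lo]
above = [i for i, o, up in zip(grid_sizes, observed, upper) if o > up]
print("below envelope at i =", below)
print("above envelope at i =", above)
```

An observed curve that falls below the lower envelope suggests clustering and one that rises above the upper envelope suggests inhibition, mirroring the reading of the F function.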
Spatial statistics has emerged as a helpful tool in epidemiology to describe the spatial and spatio-temporal spread and incidence of different pathogens. This area of statistics is commonly used today in the study of the spread of COVID-19, a disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (Chhikara et al., 2020). Spatial statistics allows an understanding of how the COVID-19 outbreak is spatially distributed (Ramírez-Aldana et al., 2020). Studying the spatial behaviour (at the local and regional level) of the spread of COVID-19 is essential for the formulation of control and mitigation measures by government and health authorities. For this reason, there has been a growing number of academic and scientific works related to the spatial modelling of its spread patterns (Kang et al., 2020; Miller et al., 2020). In this section, we show how the methodology given in Sect. 3 can be used for this purpose.

In particular, we apply the proposed test to a dataset corresponding to COVID-19 cases recorded in March 2020 in the metropolitan area of Cali city, located in the southwest region of Colombia (Fig. 12). The virus was confirmed to have reached Colombia in March 2020. Between March 2 and 31, 2020, there were 443 reports of COVID-19 infections in Cali. As input for our analysis, we take 405 spatial coordinates corresponding to the residence locations of the infected people in this municipality. We exclude the duplicate coordinates. (The infections of several people in the same place are considered a single event.) Figure 13 shows the spatial distribution of the events in this month. The southernmost part of the city is rural and unpopulated, so we carry out the analysis by delimiting the perimeter to the inhabited area.

Observing both the right panel in Fig. 12 and the point pattern in Fig. 13, we identify zones with a high burden of cases. The most significant aggregation of cases is found in the south of the city. However, other minor hot spots are located to the west, the east, and the north. A detailed description in this respect is given in Cuartas (2020). Based on the coordinates of the spatial point pattern in Fig. 13, we estimated the functions G, F, and K (Fig. 14). The three plots are concordant and confirm the above; they allow us to conclude that the specific pattern of COVID-19 cases in Cali city during the first month of the pandemic was clustered. We also found the distribution (under CSR) of the statistic $B$ defined in Sect. 3 (the top left panel of Fig. 14) and its value calculated with the point pattern in Fig. 13, $B = \hat{\mathrm{E}}(D_1) - \hat{D}_1 = 0.1549$ (dashed blue line in Fig. 14). The $B$ value is on the right tail of the distribution. Consequently, it indicates that the null hypothesis of CSR must be rejected. (This is the same conclusion given by the classic graphical tests.)

We have analysed just one dataset of COVID-19 cases. The four strategies allow us to reach the same conclusion. However, there are implicit advantages in using the method based on the box-counting estimation. On the one hand, we have a p value (see Algorithm 2 for its estimation), which allows a conclusive decision. (Sometimes the graphical tests do not.) On the other hand, using $B$ (or equivalently $\hat{D}_1$), the point pattern under study is characterised with just one value. This opens the door to the application of many traditional techniques (regression, ANOVA, longitudinal data analysis, time series, etc.) in those situations in which there is a collection of point patterns to be analysed simultaneously (obtained, for example, in different periods or under various experimental conditions).

We have proposed a test to evaluate the hypothesis of complete spatial randomness based on the fractal dimension and its estimation by the box-counting methodology. A graphical test is also derived. Using simulated point patterns under randomness, inhibition and clustering, we found that the two approaches perform well. The results are concordant and coherent with those obtained employing classical graphical tests (G, F, and K functions). The graphical interpretation of the proposed test is similar to that obtained with the F function. The tests are not based on distances, and therefore, it is not necessary to consider the edge effect. A simulation study was carried out to show the behaviour of the proposed test under the null hypothesis (randomness) and the classical alternatives (inhibition and clustering). The simulation results were satisfactory. A detailed study of the power of the test was also conducted. This allows us to conclude that the test performs well under different levels of clustering. An advantage of the methodology considered is that a statistic is calculated ($\hat{D}_1$ or equivalently $B$), which allows summarising the information of the point pattern in just one value. This can be useful from many inferential perspectives, for example, for modelling spatio-temporal point patterns or comparing groups of point patterns through ANOVA.

Fractals and Chaos: an illustrated course

Case studies in spatial point process modeling

Hybrids of Gibbs point process models and their implementation

Spatial point patterns: methodology and applications with R. Chapman and Hall/CRC

Hierarchical modeling and analysis for spatial data

The spectral analysis of two-dimensional point processes

Clustering multivariate data streams by correlating attributes using fractal dimension

Fractal dimensions for rainfall time series

Distance to nearest neighbor as a measure of spatial relationships in populations

Corona virus SARS-CoV-2 disease COVID-19: infection, prevention and clinical advances of the prospective chemical drug therapeutics

Statistics for spatial data

SARS-coV-2 spatio-temporal analysis in Cali. Colombia

An introduction to the theory of point processes

A brief historical introduction to fractals and fractal geometry

Statistical analysis of spatial point patterns

Statistical analysis of spatial point patterns

Statistical analysis of spatial and spatio-temporal point patterns

Fractal geometry: mathematical foundations and applications

Advances in the implementation of the box-counting method of fractal dimension estimation

Spatial statistics and modeling

Métodos del Registro de Cáncer en Cali. Colombia

Statistical analysis and modelling of spatial point patterns

Fractal dimension estimation with persistent homology: a comparative study

Spatial epidemic dynamics of the COVID-19 outbreak in China

Sample size requirements for fractal dimension estimation

An improved brown's method applying fractal dimension to forecast the load in a computing cluster for short time series

A fast algorithm to determine fractal dimensions by box counting

Fractal dimension of well logging curves associated with the texture of volcanic rocks

How long is the coast of Britain? Statistical self-similarity and fractional dimension

The fractal geometry of nature

Spatial analysis of global variability in Covid-19 burden

Fractal-based intrinsic dimension estimation and its application in dimensionality reduction

Statistical inference and simulation for spatial point processes

Spatial data analysis in ecology and agriculture using R

Spatial analysis of COVID-19 spread in Iran: insights into geographical and structural transmission determinants at a province level

Modelling spatial patterns

Spatial statistics

Modelling spatial patterns

Statistical methods for spatial data analysis

Density estimation

Environmental monitoring network characterization and clustering. Geostatistics, machine learning and Bayesian maximum entropy, advanced mapping of environmental data

Multifractal portrayal of the Swiss population

Fractal dimension and geostatistical parameters for soil microrelief as a function of cumulative precipitation

Handbook of spatial point-pattern analysis in ecology

An estimating function approach to inference for inhomogeneous Neyman-Scott processes

Acknowledgements This work is part of the research project "Modelación Espacio-Temporal del Covid-19 en Colombia" financed by Dirección de Investigación of Universidad Nacional de Colombia. We thank the epidemiological surveillance group of the Secretary of Health of Cali for providing us with the analysed information.