key: cord-1055616-hwihjsau
authors: Wang, Xia; Washington, Dorcas; Weber, Georg F.
title: Complex Systems Analysis Informs on the Spread of COVID-19
date: 2021-01-06
journal: bioRxiv
DOI: 10.1101/2021.01.06.425544
sha: 32984d4448d0f5c5864df83e4f9fba706674b321
doc_id: 1055616
cord_uid: hwihjsau

The non-linear progression of new infection numbers in a pandemic poses challenges to the evaluation of its management. The tools of complex systems research may aid in attaining information that would be difficult to extract by other means. To study the COVID-19 pandemic, we utilize the reported new cases per day for the globe, nine countries and six US states through October 2020. Fourier and univariate wavelet analyses inform on periodicity and extent of change. Evaluating time-lagged data sets of various lag lengths, we find that the autocorrelation function, average mutual information and box counting dimension represent good quantitative readouts for the progression of new infections. Bivariate wavelet analysis and return plots give indications of containment versus exacerbation. Homogeneity or heterogeneity in the population response, uptick versus suppression, and worsening or improving trends are discernible, in part by plotting various time lags in three dimensions. The analysis of epidemic or pandemic progression with the techniques available for observed (noisy) complex data can aid decision making in the public health response.

phase. The time-averaged cross-wavelet power provides a summarized view of the shared periods, the corresponding power and the statistical significance. Cross-wavelet plots may mark areas as significant because one series swings widely, rather than because the two series share a joint period. To avoid this false positive readout, it is more appropriate to examine wavelet coherence, which is analogous to the coefficient of correlation.
Wavelet coherence has a value range between 0 and 1, and it shows statistical significance only in areas where the two series actually share jointly significant periods.

Return plots: From the total numbers of new infections, we generated return plots with increasing lags, plotting daily changes x(t+1), …, x(t+7) versus x(t) and weekly changes x(t+14), …, x(

Here, the characteristic of dimension is that it specifies the rate at which the number of increments varies with scale size. We calculated the box counting dimension after binning the 2-dimensional return plots with various lags into 16 x 16 squares.

Average mutual information: The average mutual information (ami) represents a non-linear correlation function, which indicates how much common information is shared by the measurements of x(t) and x(t+n). The average mutual information was calculated with the mutual function of the R package tseriesChaos, which estimates the mutual information index for a specified number of lags. The joint probability distribution function is estimated with a simple bi-dimensional density histogram.

Embedding dimension: Using the R package nonlinearTseries, we first applied the timeLag function to determine the optimal time lag τ based on the average mutual information, and then the estimateEmbeddingDim function to assess the optimal embedding dimension m. The optimal set of regressors related to x(t) is then x(t-τ), …, x(t-(m-1)τ), x(t-mτ).

Across countries, a wide spectrum of measures was taken to curb the spread of SARS-CoV-2. This resulted in a range of very different progression curves when graphing the numbers of new infections over time (Figure 1). India, Brazil, Sweden, Italy and the United States have been considered hard-hit, each for its own internal reasons. France, Germany, South Korea and, over a long period, Poland had tighter control and a less aggressive spread. All curves display close to linear ramp-up phases, followed by more or less irregular oscillations.
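The box counting step described in the methods (binning a 2-dimensional return plot into squares and relating the number of occupied squares to the scale) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the shared axis scaling, the set of grid sizes (2, 4, 8, 16) and the least-squares fit are assumptions made for illustration, and the example series is synthetic.

```python
import math

def return_plot(series, lag):
    """Pair each value x(t) with the lagged value x(t + lag)."""
    return [(series[t], series[t + lag]) for t in range(len(series) - lag)]

def occupied_boxes(points, n_bins):
    """Count cells of an n_bins x n_bins grid that contain at least one point."""
    values = [v for point in points for v in point]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid division by zero for constant data
    cells = set()
    for x, y in points:
        i = min(int((x - low) / span * n_bins), n_bins - 1)
        j = min(int((y - low) / span * n_bins), n_bins - 1)
        cells.add((i, j))
    return len(cells)

def box_counting_dimension(points, grid_sizes=(2, 4, 8, 16)):
    """Least-squares slope of log(occupied boxes) against log(grid size)."""
    log_k = [math.log(k) for k in grid_sizes]
    log_n = [math.log(occupied_boxes(points, k)) for k in grid_sizes]
    mean_k = sum(log_k) / len(log_k)
    mean_n = sum(log_n) / len(log_n)
    num = sum((a - mean_k) * (b - mean_n) for a, b in zip(log_k, log_n))
    den = sum((a - mean_k) ** 2 for a in log_k)
    return num / den

# Synthetic example (not data from the study): a linear ramp yields a
# straight-line return plot, so the estimate should be close to 1.
ramp = list(range(200))
print(box_counting_dimension(return_plot(ramp, lag=7)))
```

In this sketch a return plot that collapses onto a line yields a value near 1, while a plot that fills the plane yields a value near 2, which is the sense in which the dimension separates steady progression from disordered fluctuation.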
The levels of success at suppressing the new infection rates diverged among countries, and several are experiencing a second peak.

Wavelet methodology aids in studying periodic phenomena in time series, particularly in the presence of potential frequency changes over time. For cross-country evaluations, all graphs were plotted on the same scale (Figure 2A). Each country was also plotted on its own scale (Figure 2B). The univariate analysis of the time course for the countries under study shows the prominence of the recent upswing in France (heat intensity on the right margin of the graph). By contrast, management was comparatively more successful in Italy, Germany, Poland and South Korea through October 2020. India, Brazil, Sweden, and the United States display cyclical fluctuations of various durations, none of which have been contained. A period of 7 days is prominent in the fluctuations of most countries, which may reflect real cyclicity or weekly reporting habits. The worldwide data are displayed in Figure S1.

For cross-country comparisons, we converted the total numbers of new infections to new infection rates per 10,000 members of the population (Figure 3A). Complex systems can also be analyzed with Fourier analysis. We first plotted Fourier power spectra versus frequency for the rates of new infections (Figure 3B). Spectral density range (high in Brazil, low in South Korea) and frequency distribution provide a readout for infectious spread. The spectral density of the normalized rates (identically scaled y-axes) (Figure 3C) confirmed good management of the pandemic spread in Germany, Poland, and South Korea (and to some degree in Italy). Despite the progressive increase in the numbers of infections in India, on a population basis, control has apparently not been lost through October 2020. By contrast, the power spectra for Brazil, Sweden, and France are reflective of potentially adverse developments.
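The two steps described above, conversion of daily counts to rates per 10,000 inhabitants and computation of a Fourier power spectrum, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' analysis: the function names are hypothetical, the mean is removed before the transform, the power is scaled by 1/n (scaling conventions differ between software packages), and the input series and population figure are synthetic.

```python
import cmath
import math

def rate_per_10k(new_cases, population):
    """Convert daily case counts to cases per 10,000 inhabitants."""
    return [10_000 * count / population for count in new_cases]

def periodogram(series):
    """Return (frequency, power) pairs for frequencies k/n, k = 1 .. n//2.

    The mean is subtracted first, so the zero-frequency term is omitted.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append((k / n, abs(coefficient) ** 2 / n))
    return spectrum

def dominant_period(series):
    """Period (in days) of the frequency with the highest power."""
    frequency, _ = max(periodogram(series), key=lambda pair: pair[1])
    return 1.0 / frequency

# Synthetic daily counts with a weekly cycle (not data from the study).
cases = [1000 + 200 * math.sin(2 * math.pi * day / 7) for day in range(140)]
rates = rate_per_10k(cases, population=5_000_000)  # hypothetical population
print(dominant_period(rates))
```

Dividing by population rescales the power but leaves the shape of the spectrum unchanged, which is why identically scaled y-axes are needed when spectral densities are compared across countries.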
The United States display an anomaly, a periodic behavior with a prominent cycle of around 100 days.

To gain a better understanding of the dynamics, with which disease spread occurs, we

This is the case for South Korea, Germany and Italy. High cross-wavelet power around a periodicity of 7 days is reflective of poor control.

To generate informative return plots, we utilized 3 dimensions, which allows for the visualization of two lags from x(t) (or from a later start point) and may reveal the pattern of an attractor. In this depiction, a rapid increase or decrease in new infections is reflected in a close to straight line and oscillations generate a near-toroid attractor, while successful management shrinks the torus and moves it closer to the origin. Initially, we evaluated multiple time delays. Most discriminating were x(t)/x(t+7)/x(t+14), x(t+3)/x(t+7)/x(t+14), and x(t+5)/x(t+14)/x(t+28) (Figure 4B). The progressive increase in new cases over the time period in India is reflected in a predominantly linear curve on each scale. The wide fluctuations in Brazil generate a largely disordered appearance. Disorder is also apparent in Sweden. France initially managed the pandemic well, but is experiencing a dramatic upswing, which obscures order. Cyclical patterns, suggesting the outlines of attractors, are apparent in the USA, Italy, Germany, and South Korea (where most data points are concentrated near the origin). Poland initially displayed a well-contained attractor, but the recent substantial upswing in new infections is reflected in a linear progression from there (for separate analyses of the two phases, see Figure S3).

We also calculated the embedding dimensions for the lagged data (Figure 4C). Germany has the highest embedding dimension of 10, followed by Poland with 9. Several countries have an embedding dimension of 7, including Brazil, Sweden, the USA and South Korea. Italy and France have an embedding dimension of 5.
India is unusual due to its longer lag period of 24 days. When the lag period is set at 7 days, the embedding dimension of India is also equal to 7. For the worldwide data, the calculated embedding dimension is 7 with a time lag of 1 (not shown).

The autocorrelation of two data strings with short time lags is expected to be high Figure 5C ).

Within the USA, individual states have encountered a rather wide range of progression phenotypes in the spread of new COVID-19 infections (Figure 6). This is due to variations in international connectedness and population density (reflected in the early peaks in the

We normalized the new infection numbers to rates per 10,000 inhabitants (Figure 8A). Figure 8B shows the periodogram for the 6 states under investigation with frequencies between 0 and 0.10 (the graph is almost flat for the higher frequencies). There are clearly heterogeneous patterns among these states. New York and Massachusetts display steadily decreasing spectral density values from the longest period to around 1-2 weeks (corresponding to a frequency range around 0.07-0.14). Florida and Texas share similar patterns, with a few low spikes in their periodograms after the 3 highest ones. The graph for California flattens out after the lowest three frequencies, with the longest period (the whole series) having the highest value. Ohio's pattern is distinct, with fluctuating values from the longest periods through around 5-6 weeks. The Fourier power spectrum for the infection rates (Figure 8C) indicates periodic patterns similar to those in the periodograms of Figure 8B. These patterns are less prominent due to the adjustment to the same y-axis scale (the scale reflects the magnitude of the positive rates, the shape shows the evolution of the disease).

We conducted bivariate wavelet analysis on the time-lagged data (Figure 9A and Figure S4).
The shared synchronicity segments between x(t) and x(t+n) can be grouped into shorter periods (around 7 days) and longer periods (approximately 3 weeks, 1 month, 2 months). New Figure 10A ).

Up to a maximum lag of 49 days, the average mutual information for the 6 US states under study ranges between 1.0 and 2.0. Overall, all states show a slightly decreasing pattern except for California, which remains relatively level at a value of 2.0 (Figure 10B). Unexpectedly, the box counting dimension (Figure 10C) is less discerning than it was for the evaluation across countries. This may be due to the much lower power conveyed by smaller population sizes.

In the present investigation we find that the analysis tools for observed complex data can aid in the interpretation of pandemic spread across communities. Difficulties in analyzing the nonlinear patterns of infectious disease spread may be tamed by applying the tools of complex systems research. The approach can reveal patterns where a simple time course of new cases does not. Further, non-linear analysis allows the study of various facets of the process, depending on whether the starting data are new cases, hospitalizations, deaths or other readouts. Maps can be generated and evaluated for their fractal dimensions [11]. The operational approximation of Lyapunov exponents may be meaningful, although they were largely uninformative for the present study (Supplemental Figure S5).

dimensions are shown, from left to right, for x(t)/x(t+7)/x(t+14), x(t+3)/x(t+7)/x(t+14), and x(t+5)/x(t+14)/x(t+28). Each state under investigation has its own row.
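The lag-dependent readouts used throughout, the autocorrelation and the average mutual information between x(t) and x(t+n), can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions rather than a reimplementation of the R packages named in the methods: the function names are hypothetical, the joint distribution is estimated with a plain 16 x 16 histogram (the bin count is an assumption), the result is expressed in nats, and the example series is synthetic.

```python
import math

def lagged_pairs(series, lag):
    """Pair x(t) with x(t + lag) for all t where both exist."""
    return list(zip(series[:len(series) - lag], series[lag:]))

def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((x - mean) ** 2 for x in series)
    covariance_sum = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance_sum / variance_sum

def average_mutual_information(series, lag, partitions=16):
    """Histogram estimate of the mutual information of x(t) and x(t + lag)."""
    low, high = min(series), max(series)
    span = (high - low) or 1.0  # constant series fall into a single bin

    def bin_of(x):
        return min(int((x - low) / span * partitions), partitions - 1)

    pairs = [(bin_of(a), bin_of(b)) for a, b in lagged_pairs(series, lag)]
    total = len(pairs)
    joint, first, second = {}, {}, {}
    for i, j in pairs:
        joint[(i, j)] = joint.get((i, j), 0) + 1
        first[i] = first.get(i, 0) + 1
        second[j] = second.get(j, 0) + 1
    ami = 0.0
    for (i, j), count in joint.items():
        p_joint = count / total
        p_product = (first[i] / total) * (second[j] / total)
        ami += p_joint * math.log(p_joint / p_product)
    return ami

# Synthetic series with a weekly cycle on a rising trend (not study data).
series = [t + 30 * math.sin(2 * math.pi * t / 7) for t in range(210)]
for lag in (1, 7, 14, 28):
    print(lag, autocorrelation(series, lag), average_mutual_information(series, lag))
```

Because the histogram estimate depends on the number of bins and on the length of the series, values from this sketch are comparable across lags within one series but should not be expected to match the output of any particular package.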
[1] Apollo's Arrow: The Profound and Enduring Impact of Coronavirus on the Way We Live
[2] Early Stage Machine Learning-Based Prediction of US County Vulnerability to the COVID-19 Pandemic: A Machine Learning Approach
[3] Modelling the initial epidemic trends of COVID-19 in Italy
[4] Spread of Infectious Disease Modeling and Analysis of Different Factors on Spread of Infectious Disease Based on Cellular Automata
[5] Power-law distribution in the number of confirmed COVID-19 cases
[6] Analysis of Observed Chaotic Data
[7] COVID-19 outbreak: Migration, effects on society, global environment and prevention
[8] On the temporal spreading of the SARS-CoV-2
[9] State-level variation of initial COVID-19 dynamics in the United States
[10] Computational Wavelet Analysis. R package version 1.1
[11] An analysis of COVID-19 spread based on fractal interpolation and fractal dimension

GFW is supported by NIH grant CA224104.