key: cord-0142139-nl0rlhv4 authors: Liang, Yingjie; Guan, Peiyao; Wang, Shuhong; Qiu, Lin title: Classification of COVID-19 anomalous diffusion driven by mean squared displacement date: 2021-06-10 journal: nan DOI: nan sha: 4c437fc5795d2134123314ad9afe10da031d73fc doc_id: 142139 cord_uid: nl0rlhv4

In this study, we classify the anomalous diffusion of COVID-19 in two categories of countries, the top four countries in terms of total cases and four randomly selected countries, based on the mean squared displacement (MSD) of the daily new cases. The COVID-19 diffusion is a stochastic process, and the daily new cases are regarded as the displacements of diffusive particles. The diffusion environment of COVID-19 in each country is heterogeneous, so the underlying dynamic process is anomalous diffusion. The calculated MSD is a power law function of time, and the power law exponent is not a constant but varies with time. The power law exponents are estimated using the bi-exponential model and the long short-term memory (LSTM) network. The bi-exponential model, frequently used in magnetic resonance imaging (MRI), can quantify the power law exponent and makes prediction straightforward. The LSTM network predicts the power law exponent with much better accuracy than the bi-exponential model. Because it does not depend on a specific mathematical formula, the LSTM network is more flexible and is preferred for predicting the power law exponent. The diffusion process of COVID-19 can be classified based on the power law exponent, and more specific evaluations and suggestions can then be proposed to governments to help control the spread of COVID-19.

On 29 July 2020, more than 16 million cases of coronavirus disease 2019 (COVID-19) had been confirmed, including 0.66 million deaths and 10 million recoveries [1]. The COVID-19 pandemic is a global threat to our societies and must be tackled through the joint efforts of communities across all fields and disciplines.
Scientists have fought the COVID-19 pandemic from different perspectives [2] [3] [4]. One active research topic is mathematical and physical modeling to explain and predict the occurrence and temporal evolution of the spread of COVID-19 [5] [6] [7] [8] [9]. In this study, we try to quantify the COVID-19 dynamics from the perspective of diffusion. In every country, the total cases increase and the daily new cases fluctuate over time, depending on conditions such as the spread environment, control measures, and the incubation period. To interpret these two behaviors from a statistical mechanics perspective, we regard the COVID-19 dynamics as the diffusion of particles across the regions of a country. The COVID-19 diffusion is considered a stochastic process. The diffusion environment of COVID-19 in each country is heterogeneous, so the underlying dynamic process is not Brownian motion but anomalous diffusion. The mean squared displacement (MSD) provides a common denominator for anomalous diffusion; in anomalous diffusion it is no longer a linear function of time but follows the power law [10]

$$\langle x^2(t) \rangle \propto t^{\alpha}, \tag{1}$$

where α is the power law exponent. When α = 1, the governing stochastic process is classical Brownian motion; if 0 < α < 1, the process is sub-diffusion; and if α > 1, the process is super-diffusion. Anomalous diffusion has been studied extensively with many popular models [11] [12], specifically the continuous time random walk (CTRW) models [13], the fractional derivative models [14], the Hausdorff derivative models [15], and the nonlinear partial differential equation (PDE) models, i.e., time- or space-dependent diffusion models [16]. By determining the value of the power law exponent α in Eq. (1), the COVID-19 diffusion can be classified. For a diffusion process with a single scale, the power law exponent in Eq. (1) is a constant, which fully captures the statistical properties when no transient phenomena are present [17].
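The classification by Eq. (1) can be made concrete with a short sketch. It assumes α is read off as the least-squares slope of ln MSD against ln t (a standard way to estimate a power law exponent, not necessarily the authors' procedure), and it uses a hypothetical tolerance `tol` around α = 1 to decide what counts as normal diffusion. The function names and the synthetic data are illustrative only.

```python
import math

def power_law_exponent(times, msd):
    """Least-squares slope of ln(MSD) versus ln(t): the alpha in MSD ~ t**alpha."""
    xs = [math.log(t) for t in times]
    ys = [math.log(m) for m in msd]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    return cov / var

def classify_diffusion(alpha, tol=0.05):
    """Map alpha to a regime; tol is a hypothetical band treated as alpha = 1."""
    if abs(alpha - 1.0) <= tol:
        return "normal diffusion"
    return "sub-diffusion" if alpha < 1.0 else "super-diffusion"

def regime_sequence(alphas, tol=0.05):
    """Ordered list of distinct regimes visited by a time-varying alpha."""
    regimes = []
    for alpha in alphas:
        label = classify_diffusion(alpha, tol)
        # Append only when the regime changes, so consecutive runs collapse.
        if not regimes or regimes[-1] != label:
            regimes.append(label)
    return regimes

# Usage with synthetic, illustrative data (not the paper's data).
times = [1, 2, 3, 4, 5, 6]
msd = [2.0 * t ** 1.5 for t in times]
alpha = power_law_exponent(times, msd)
print(round(alpha, 6), classify_diffusion(alpha))          # 1.5 super-diffusion
print(regime_sequence([0.4, 0.7, 1.2, 1.3, 1.01, 0.98]))
# ['sub-diffusion', 'super-diffusion', 'normal diffusion']
```

Applying `power_law_exponent` over a sliding window of days would give a time-varying α, and `regime_sequence` then summarizes transitions such as sub-diffusion to super-diffusion to normal diffusion.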
However, for a diffusion process with multiple temporal or spatial scales, the power law exponent in Eq. (1) is no longer a constant but fluctuates with time or space [18] [19]. The physical meaning of a variable α in Eq. (1) has been investigated in depth in the context of variable-order and distributed-order fractional derivative diffusion models [20] [21]. Here we focus on the pattern of the power law exponent for COVID-19. Two approaches are used to fit and predict the power law exponents: the bi-exponential model [22], which is frequently used in magnetic resonance imaging (MRI) and can quantify anomalous diffusion affected by fast and slow diffusion processes [23] [24], and the LSTM network [25] [26], a popular deep learning method. The diffusion processes of COVID-19 in the selected countries are then investigated and classified based on the MSD. In this study, the COVID-19 diffusion processes in two categories of countries are investigated over the period from February 15th to July 3rd. The parameters of the bi-exponential model for each country are estimated using the Curve Fitting Toolbox in MATLAB. Using the Deep Learning Toolbox in MATLAB, the LSTM network is trained on 80% of the power law exponent series for each country, and the remaining 20% is predicted. The network uses 288 hidden units. In the training options, the solver is 'adam' and the number of training epochs is 250 [25]. To prevent the gradients from exploding, the gradient threshold is set to 1. The root mean square errors (RMSEs) of the simulation results are calculated for both the bi-exponential model and the LSTM network. The daily new cases of the US, Brazil, Russia and India from February 15th to July 3rd are given in Fig. 1(a), and the corresponding MSDs are shown in Fig. 1(b). Fig. 1(c) shows the values of the power law exponent α in the MSD fitted by the bi-exponential model. Fig.
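The bi-exponential model is fitted here with MATLAB's Curve Fitting Toolbox. The sketch below assumes the form α(t) = a·exp(b·t) + c·exp(d·t), which matches the four-parameter "exp2" library model in that toolbox; whether the authors used exactly this parameterization is an assumption. It only evaluates an already fitted parameter set, including at days beyond the observed window. The fitting step itself is not shown, and the parameter values are hypothetical, not those of Table A1 or A2.

```python
import math

def bi_exponential(t, a, b, c, d):
    """Assumed four-parameter form alpha(t) = a*exp(b*t) + c*exp(d*t)."""
    return a * math.exp(b * t) + c * math.exp(d * t)

def predict_alpha(params, days):
    """Evaluate a fitted (a, b, c, d) tuple on any list of days, e.g. future ones."""
    a, b, c, d = params
    return [bi_exponential(day, a, b, c, d) for day in days]

# Hypothetical parameters for illustration only.
params = (0.6, 0.004, 0.4, -0.05)
future_days = range(140, 145)
print([round(v, 3) for v in predict_alpha(params, future_days)])
```

Because the formula is explicit, extending a prediction only requires evaluating it at later days. The trade-off is that a sum of two exponentials cannot follow repeated peaks and valleys, which is the limitation reported for the US.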
1(d) illustrates the values of the power law exponent α predicted by the LSTM network. The parameters of the bi-exponential model are given in Table A1 in the Appendix. As Fig. 1(c) shows, for the US, Brazil and India the diffusion processes transit from sub-diffusion to super-diffusion, whereas for Russia the COVID-19 diffusion process transits from sub-diffusion to normal diffusion, which accounts for the fluctuations of the daily new cases in Fig. 1(a). The MSD in Fig. 1(b) and the values of α in Fig. 1(c) increase in similar patterns as the number of days grows. More specifically, it can be observed from Fig. 1(a) In Fig. 1(c), except for the US, the fitted curves of α are almost consistent with the patterns of the real cases. For the US, the increasing values of α cannot be estimated very accurately by the bi-exponential model because of the two peak-and-valley periods in the curve of daily new cases. Based on the results in Fig. 1(c), the bi-exponential model can quantify the values of α and makes prediction easy through an explicit mathematical formula. However, although it readily captures the overall pattern of α, its accuracy in estimating α over time is limited, as the RMSE for the US in Table 1 shows. It can be seen from Fig. 1(d) that the LSTM predictions for the remaining 20% of the power law exponent series, based on the other 80%, match the real values of α well. The RMSE for each country in Table 1 is much smaller than that of the bi-exponential model. Compared with the bi-exponential model, the LSTM network is more flexible and is preferred for predicting α, since it does not depend on a specific mathematical formula. Thus, the LSTM network is an alternative strategy for predicting the patterns of α for the four countries.
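The evaluation protocol described here, training on 80% of each country's α series and scoring predictions on the rest by RMSE, can be sketched as follows. The split keeps time order rather than shuffling, so the held-out part is genuinely in the "future". The "repeat the last value" forecaster is only a hypothetical stand-in for the LSTM, so the example runs without a deep learning library, and the α values are invented.

```python
import math

def chronological_split(series, train_fraction=0.8):
    """Keep time order: the first part trains, the tail is held out for prediction."""
    cut = int(round(len(series) * train_fraction))
    return series[:cut], series[cut:]

def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(total / len(observed))

# Illustrative alpha series (hypothetical values, not from the paper).
alphas = [0.50, 0.55, 0.62, 0.70, 0.81, 0.90, 1.00, 1.08, 1.15, 1.20]
train, test = chronological_split(alphas)
naive_forecast = [train[-1]] * len(test)   # placeholder model: repeat last value
print(len(train), len(test), round(rmse(naive_forecast, test), 4))
```

Swapping the placeholder for any trained model leaves the split and the RMSE unchanged, which keeps the comparison between the bi-exponential fit and the LSTM on equal footing.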
The daily new cases of Spain, the UK, Canada and Singapore from February 15th to July 3rd are given in Fig. 2(a). The MSDs of the daily new cases are shown in Fig. 2(b). Fig. 2(c) gives the values of the power law exponent α fitted by the bi-exponential model, whose parameters are provided in Table A2 in the Appendix. The values of α predicted by the LSTM network are displayed in Fig. 2(d). Table 2 provides the RMSEs for the bi-exponential model and the LSTM network. In Fig. 2(a), the daily new cases of Spain, the UK, Canada and Singapore stay below ten thousand at the highest peak of the curves and below two thousand in the current state, which differs from the four countries in Fig. 1(a). The corresponding MSD curves in Fig. 2(b) quickly reach a stable state and remain there for a very long period. The values of α in Fig. 2(c) indicate that the COVID-19 diffusion processes in Canada and Singapore are sub-diffusion. For the UK the process transits from sub-diffusion to normal diffusion, whereas for Spain it transits from sub-diffusion to super-diffusion and then to normal diffusion, which quantifies the trends of fluctuations in the daily new cases given in Fig. 2(a). From Fig. 2(d), it is also found that the LSTM network predicts α for the remaining 20% of the data very well based on the other 80% of the real values of α, and the RMSE for each country in Table 2 is very small, lower than that of the bi-exponential model.

To predict the patterns of the power law exponent α, two methods are used: the bi-exponential model, a simple traditional mathematical form frequently used in MRI, and the long short-term memory network, a popular recurrent neural network method. The results show that the bi-exponential model can quantify α and makes prediction easy through an explicit mathematical formula.
The accuracy of the bi-exponential model is limited, and the physical mechanism of the fast and slow diffusion processes should be considered further in the next step. The LSTM network predicts α for the remaining 20% of the data very well based on the other 80% of the real values, with better accuracy than the bi-exponential model. The LSTM network is also more flexible and is preferred for predicting α, since it does not depend on a specific mathematical formula. Based on the values of α, the diffusion process of COVID-19 can be classified, and more specific evaluations and suggestions will be proposed to governments to help control the spread of COVID-19. The effect of control measures, e.g., lockdown, can be quantified through the values of α, which should approach zero when the sub-diffusion of COVID-19 is controlled in a very stable state that lasts a very long time.

Tables A1 and A2 provide the parameters of the bi-exponential model for the COVID-19 anomalous diffusion in the eight countries. The long short-term memory (LSTM) network is a kind of recurrent neural network (RNN) that can capture long-term temporal correlations in time series. The topology of an LSTM with one cell is shown in Fig. A1, which is adapted from Ref. [25]. More details can be found in [25] [26].

Fig. A1 Topology of LSTM network with one cell [25].

An epidemiological forecast model and software assessing interventions on COVID-19 epidemic in China
Risk assessment of novel coronavirus COVID-19 outbreaks outside China
The lockdown of Hubei province causing different transmission dynamics of the novel coronavirus (2019-nCov) in Wuhan and Beijing
Applicability of time fractional derivative models for simulating the dynamics and mitigation scenarios of COVID-19
A fractional-order SEIHDR model for COVID-19 with inter-city networked coupling effects
Preliminary prediction of the basic reproduction number of the Wuhan novel coronavirus 2019-nCoV
Application of the ARIMA model on the COVID-2019 epidemic dataset
Modeling and prediction of COVID-19 in Mexico applying mathematical and computational models
Fractional motions
The random walk's guide to anomalous diffusion: a fractional dynamics approach
Models of anomalous diffusion in crowded environments
A continuous time random walk approach to transient flow in heterogeneous porous media
Using spectral and cumulative spectral entropy to classify anomalous diffusion in Sephadex™ gels
A fractal derivative model for the characterization of anomalous diffusion in magnetic resonance imaging
Similarity solutions for solute transport in fractal porous media using a time-and-scale-dependent dispersivity
On random walks and entropy in diffusion-weighted magnetic resonance imaging studies of neural tissue
Subdiffusion and the cage effect studied near the colloidal glass transition
Distributed order Hausdorff derivative diffusion model to characterize non-Fickian diffusion in porous media
Use of a variable-index fractional-derivative model to capture transient dispersion in heterogeneous media
The distributed-order fractional diffusion-wave equation of groundwater flow: Theory and application to pumping and slug tests
Separation of collagen-bound and porous bone-water longitudinal relaxation in mice using a segmented inversion recovery zero-echo-time sequence
Bi-exponential diffusion signal decay in normal appearing white matter of multiple sclerosis
Short-term building load forecast based on a data-mining feature selection and LSTM-RNN method
Spatiotemporal traffic flow prediction with KNN and LSTM

The work described in this paper was supported by the National Natural Science

The bi-exponential model generalizes the classical exponential model and has four parameters to be determined:

$$\alpha(t) = a e^{b t} + c e^{d t}$$

The bi-exponential model is often used to describe signal decay in MRI, where the weights satisfy a + c = 1, and b and d are the slow and fast diffusion coefficients, respectively [24]. The mono-exponential model is not suitable for the COVID-19 anomalous diffusion but is feasible for normal diffusion.

The LSTM is applied to a time series of data length L through a sequence of steps [24], which include the input-gate and output-gate computations. Here H is the number of hidden layers, M is the number of memory cells, $\omega_{ij}$ is the weight of the connection from unit i to unit j, $I_j^t$ is the network input to unit j at time t, $v_j^t$ is the value after the activation function in the same unit, and $s_c^t$ is the state of the cell at time t; the gates share one activation function, and g and h are respectively the activation functions of the input and output of the cell.
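To make the gate computations concrete, here is a minimal sketch of one time step of the widely used LSTM cell with input, forget and output gates and no peephole connections, for a single unit with scalar input. It assumes sigmoid gate activations and tanh for the cell input and output activations (the roles of g and h); these choices, the weight layout in `w`, and all names are assumptions for illustration, not the paper's MATLAB network, which has 288 hidden units.

```python
import math

def sigmoid(x):
    """Logistic function used here for all three gates."""
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, w):
    """One time step of a single-unit LSTM cell with scalar input and state.

    `w` maps each name to (input weight, recurrent weight, bias):
    'i' input gate, 'f' forget gate, 'o' output gate, 'g' cell candidate.
    """
    def pre(name):
        wx, wh, b = w[name]
        return wx * x + wh * h_prev + b

    i = sigmoid(pre("i"))        # input gate: how much new content enters
    f = sigmoid(pre("f"))        # forget gate: how much old state is kept
    o = sigmoid(pre("o"))        # output gate: how much state is exposed
    g = math.tanh(pre("g"))      # candidate cell input (activation g)
    c = f * c_prev + i * g       # new cell state
    h = o * math.tanh(c)         # new output (activation h applied to the state)
    return h, c

# Run the cell over a short sequence with hypothetical weights.
weights = {"i": (0.5, 0.1, 0.0), "f": (0.3, 0.2, 1.0),
           "o": (0.4, 0.1, 0.0), "g": (0.8, 0.3, 0.0)}
h, c = 0.0, 0.0
for x in [0.5, 0.6, 0.7]:
    h, c = lstm_cell_step(x, h, c, weights)
print(round(h, 4), round(c, 4))
```

The forget gate is what lets the cell carry information over long spans of a series such as α(t); a bias near 1 on that gate, as in the example weights, keeps most of the previous state at each step.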