key: cord-0134602-j1uy3t6i authors: Ebeigbe, Donald; Berry, Tyrus; Norton, Michael M.; Whalen, Andrew J.; Simon, Dan; Sauer, Timothy; Schiff, Steven J. title: A Generalized Unscented Transformation for Probability Distributions date: 2021-04-05 journal: nan DOI: nan sha: 04bde65feb1ee4a51232752ddf3429a765dae4b8 doc_id: 134602 cord_uid: j1uy3t6i The unscented transform uses a weighted set of samples called sigma points to propagate the means and covariances of nonlinear transformations of random variables. However, unscented transforms developed using either the Gaussian assumption or a minimum set of sigma points typically fall short when the random variable is not Gaussian distributed and the nonlinearities are substantial. In this paper, we develop the generalized unscented transform (GenUT), which uses 2n+1 sigma points to accurately capture up to the diagonal components of the skewness and kurtosis tensors of most probability distributions. Constraints can be analytically enforced on the sigma points while guaranteeing at least second-order accuracy. The GenUT uses the same number of sigma points as the original unscented transform while also being applicable to non-Gaussian distributions, including the assimilation of observations in the modeling of infectious diseases such as coronavirus (SARS-CoV-2) causing COVID-19. For many dynamic systems in practice, linearity is a reasonable assumption. For others, system non-linearities cause methods based on linear models to perform poorly. Most nonlinear systems can behave approximately linearly over small operation ranges. The extended Kalman filter (EKF) is one of the most widely used Kalman filter for nonlinear dynamic systems. The EKF employs a linear approximation of the nonlinear system around a nominal state trajectory [1] , [2] , [4] . However, for highly nonlinear systems, linear approximations can introduce errors that can lead to divergence of the state estimate. To address the drawbacks of the EKF, several well-known state estimators such as the ensemble Kalman filter [5] [6] [7] [8] , the unscented Kalman filter (UKF) [9] , [10] , and the particle filter [1] , [11] have been developed. Although the particle filter can give better performance than the UKF, this comes at the cost of a higher computational effort. In some applications, the improved performance might not be worth the additional computational costs [1] . The UKF is a nonlinear filter that uses the unscented transformation to approximate the mean and covariance of a Gaussian random variable [9] , [12] . The unscented transform uses the intuition that with a fixed number of parameters it should be easier to approximate a Gaussian distribution than it is to approximate an arbitrary nonlinear function or transformation [9] . It produces sets of vectors called sigma points that capture the moments of the standard Gaussian distribution. The UKF uses the generated sigma points to obtain estimates of the states and the state estimation error covariance. The UKF has been used to generate distributions which improve the performance of a particle filter [13] , [14] . It has also been employed to improve the performance of the EnKF [15] . Despite the several types of sigma points that exist in the literature [16] , [17] , a majority of them that were not developed using the Gaussian assumption do not try to match the skewness or kurtosis of a random variable, thereby ensuring only second-order accuracy. The need to effectively monitor, predict, and control the spread of infectious disease has led to the application of numerous state estimation techniques. The EKF [18] , [19] and the particle filter [20] have been used to estimate the parameters of the measles virus transmission dynamics from real data. The ensemble adjustment Kalman filter (EAKF) has been employed in the forecasting of influenza [21] and dengue fever [22] . Several infectious disease such as Ebola [23] , HIV [24] , and neonatal sepsis [25] have seen implementation of different Kalman filters. More recently, the outbreak of the novel coronavirus (SARS-CoV-2) causing COVID-19 has led to concerted efforts to properly understand its transmission and offer policy guidelines that can mitigate its spread. Recent efforts have employed the iterated EAKF to assimilate daily observations in the modeling of COVID-19 [26] . Distributions such as Poisson, negative-binomial, and binomial are typically used for modeling infectious disease from count data. Additionally, the number of patients arriving at a hospital or a testing center can be modeled by a Poisson distribution whose rate is proportional to the infected population. Although the use of standard Kalman filters in infectious disease estimation and prediction under the Poisson assumption can be justified with the fact that a Poisson distribution with a large rate can be approximated by a Gaussian distribution of the same mean and variance, the approximation breaks down when the rates are small [27] . The usage of Kalman filters to assimilate data generated by the transformation of random variables from different probability distributions revealed a fundamental mismatch in the application of the filters -the accuracy of the filter is reduced if the Gaussian assumption is not satisfied and the nonlinearities are high. This led to the development of unscented transforms that can account for some higher-order moment information such as the skewness and kurtosis [28] [29] [30] [31] [32] . The unscented transforms can be grouped into two categories: the ones that employ 2n + 1 sigma points [28] [29] [30] and the ones that use more than 2n + 1 sigma points [31] , [32] . First, we consider those that use 2n+1 sigma points. In [28] , an unscented transform was developed to match the average marginal skewness and kurtosis. The method however did not match the true skewness and true kurtosis for each element of the random vector. In [29] , a randomized unscented transform was used in the development of a filter for non-Gaussian systems. Although the method uses a stochastic integration rule to solve state and measurement statistics, the sigma points are generated under the Gaussian assumption. In [30] , an unscented transform was developed to capture the skewness of a random vector. However, the method assumes a closed skew normal distribution in its development. All preceding methods that use 2n + 1 sigma points either apply only to special distributions or can capture at most the average skewness and kurtosis. Now we consider those that use more than 2n + 1 sigma points. In [31] , an unscented transform was developed to match the first four moments of Gaussian random variables. In [32] , a higher order unscented transform was developed to match the skewness and kurtosis tensors with high accuracy. The method uses an approximate CANDECOMP/PARAFAC (CP) tensor decomposition to generate its sigma points. However, depending on the dimension of the problem and the error tolerance level in approximating the skewness and kurtosis tensors, this method can require significant computational costs. This is because the sequence of vectors and constants used in the approximate CP method can significantly increase when the error tolerance level is made small. All preceding methods that use more than 2n + 1 sigma points either applied to only to special distributions or had significantly higher complexity and computational cost. For an n-dimensional random vector, 2n + 1 sigma points generally employs 2n 2 +3n+1 free parameters (2n+1 weights and 2n 2 + n constants that define the coordinates of the sigma points). Trying to match the mean, covariance, skewness, and kurtosis imposes n, O(n 2 ), O(n 3 ), and O(n 4 ) constraints respectively. In principle, it is impossible to match all these moments using only 2n + 1 sigma points. The zero skewness nature of the Gaussian distribution made it possible to use 2n + 1 sigma points to accurately match up to the skewness in [9] . The presence of the O(n 3 ) skewness and O(n 4 ) kurtosis constraints are what prompted researchers to look beyond 2n + 1 sigma points. However, we note that matching the mean and covariance constraints of any random vector using 2n + 1 sigma points still leaves n 2 + 2n + 1 free parameters. These residual parameters have been underutilized in capturing as much information as possible about the components of the skewness and kurtosis tensors when the random variable is not Gaussian. One instance where the residual parameters were leveraged was in the capturing of the average marginal skewness and kurtosis, which only represents a total of 2 constraints [28] . In this paper, we develop the generalized unscented transform (GenUT) which is able to adapt to the unique statistics of most probability distributions. We use the intuition that employing sigma points more suitable to the inherent distributions of a random vector can lead to a more accurate propagation of means and covariances. Our method uses 2n + 1 sigma points that not only accurately matches the mean and covariance matrix, but also takes advantage of the additional free parameters to accurately match the diagonal components of the skewness tensor and kurtosis tensor of most random vectors. We employ n 2 +n 2 + 3n constraints in total; n for the mean, n 2 +n 2 for the covariance, n for the diagonal components of the skewness tensor, and n for the diagonal components of the kurtosis tensor. This total falls within the 2n 2 + 3n + 1 free parameters available. While more parameters remain, the diagonal components of the skewness and kurtosis tensors are the most significant. In comparison to [28] [29] [30] [31] , our method gives a general way to accurately match the diagonal components of the skewness and kurtosis tensors of most random vectors. In comparison to [32] , our method uses fewer sigma points which is crucial for larger system dimensions. In comparison to the standard unscented transform, we acquire the most significant higher moment information of most probability distributions with the same number of sigma points. In Section II, we discuss the problems that arise when the Gaussian assumption is employed in the unscented transform. In Section III, we develop the GenUT sigma points that can capture certain properties of most probability distributions, such as its mean, covariance, skewness, and kurtosis. In Section IV, we show that our sigma points are accurate in approximating the mean, covariance, and diagonal components of the skewness and kurtosis tensors. In Section V, we address constraints and show that imposing constraints can at least maintain second-order accuracy. In Section VI, we evaluate the accuracy of the GenUT sigma points in propagating means and covariances of nonlinear transformations of arbitrarily distributed random vectors and we give several examples that demonstrate its effectiveness when compared against other unscented transforms. We discuss the conclusions in Section VII. We analyze the performance of unscented transforms that were motivated by the Gaussian statistics [9] , [12] . We will show how linearization approximations, via Taylor series expansion of a nonlinear transformation of a random vector x evaluated about its meanx, introduces errors in the propagation of means and covariances. We will see that errors can be introduced in the propagation of means and covariances beyond the second order when used to approximate a nonlinear function λ(x) of a possibly non-Gaussian distributed random vector x ∈ R n . Definition 1. Let x ∈ R n be a random vector. We define the meanx ∈ R n , covariance P ∈ R n×n , skewness tensor S ∈ R n×n×n , and kurtosis tensor K ∈ R n×n×n×n as (1) The sample mean and sample covariance of the nonlinear transformation y ∈ R n given by can be calculated as follows [9] . 1) Calculate the 2n + 1 sigma points given by 1 is the ith column of (n + κ)P , w i is the weight associated with the ith sigma point, and κ is a free parameter 2 We typically set κ = n − 3 to minimize the fourth-order moment mismatch. 2) Pass the sigma points through the known nonlinear function to get the transformed sigma points 3) Evaluate the sample mean of the transformed sigma pointsȳ 4) Evaluate the sample covariance of the transformed sigma points A. Accuracy in Approximating the True Mean Applying a Taylor series expansion of λ(x) about its mean x, we show in Appendix A-A that the true mean of y = λ(x) is given as The analytical expression for the approximated mean from [9] is given as Comparing the above equation with the true mean of (9), we notice the following problems about the sigma points developed using the Gaussian assumption 1) The odd-powered moments in the approximation of the true mean are always zero due to their symmetry. This introduces significant approximation errors in situations where the odd-powered moments of the distribution of x are non-zero and the transformation y = λ(x) is highly nonlinear. 2) The fourth-order term fails to capture a part of the true kurtosis even when the optimal value of κ = n − 3 is selected because of the Gaussian assumption. We also note that errors in approximating the mean beyond the second order occur not only for sets of 2n + 1 sigma points existing in the literature, but also for sets of n + 1 sigma points [10] , [33] -this is because they do not account for the skewness and kurtosis of x when it is not Gaussian distributed. The true covariance matrix, which was evaluated in Appendix A-B, is given as where we have used the notation xx T = x[· · · ] T . The analytical expression for the approximated covariance matrix from [9] is given as Comparing the above equation with the true covariance matrix of (11), we notice similar issues that were pointed out in approximating the mean -the approximation is only accurate up to the second order when x is not Gaussian distributed. All the odd-powered moments are zero because of the symmetric nature of the sigma points, while the fourth-powered moment is also inaccurate because of the Gaussian nature of the sigma points. As with the mean approximation, errors in the covariance matrix approximation are introduced beyond the second order not only for sets of 2n + 1 sigma points existing in the literature, but also for sets of n + 1 sigma points. III. GENERALIZED UNSCENTED TRANSFORM For a random vector x ∈ R n , we develop sigma points that can accurately capture the mean, covariance matrix, and the diagonal components of both the skewness tensor and the kurtosis tensor. This is done by selecting sigma point distributions that have the flexibility to either be symmetric when x is symmetrically distributed or be asymmetric when x is asymmetrically distributed. Assumption 1. The random vector x follows a probability distribution with finite moments. We reduce the problem of approximating x to the problem of approximating a user-specified arbitrarily distributed random vector z ∈ R n with zero mean and unit variance, whose higher-order moments are functions of the higher-order moments of x. We write where √ P is the matrix square root of P , √ P √ P T = P Definition 2. Let x be a vector, P be a square matrix, and k be some positive integer. We define the element-wise product (Hadamard product) as , such that We also define the element-wise division (Hadamard division) as . We develop sigma points that match the first three moments of z in a single dimension, and then constrain those points to match the fourth moment of z. For a one-dimensional distribution, we will show how to select sigma points such that the first four moments satisfy To capture the first three moments in a single dimension, three points are used: the first point lies at the origin with a weight of w 0 ; the second point lies at a distance −u from the origin with a weight of w 1 ; the third point lies at a distance v from the origin with a weight of w 2 . Therefore, in onedimension, we use the following 3 sigma points where w 0 , w 1 , and w 2 are the weights for the respective sigma points. A visual representation of our sigma points in one dimension is shown in Fig. 1 . Obeying the moments of z and the fact that the sum of all weights should equal 1, we write From (15), we see that w 1 = v u w 2 . Rewriting (16) using (17) gives We designate u as the free parameter while assuming that From (14) and (18), we see that the weights are given as We note that the free parameter u can be selected to match the fourth moment of z. We now attempt to satisfy the fourth moment constraint given by Eliminating Using the relationships , the above equation reduces to The solution to the above quadratic equation is where v is given in (20) . The equations for w 1 , w 2 , and w 0 remain unchanged. Remark 1. We note that the sigma points described above, which accurately capture the kurtosis when constrained, were designed for when the state has a dimension of 1. This implies that z, P , S, K ∈ R 1 . In the next section, we extend this to multiple dimensions. For an n-dimensional vector z, we develop a set of sigma points that accurately matches its mean and covariance matrix, while accurately matching the diagonal components of the skewness tensor. Furthermore, by constraining the sigma points, we show that we can accurately match the diagonal components of the kurtosis tensor. We note that for an independent random vector, accurately matching the diagonal components of the skewness tensor implies an accurate matching of the entire skewness tensor. Definition 3. We define the vectorsS ∈ R n andK ∈ R n which contain the diagonal components of the skewness tensor and kurtosis tensor respectively, such that For a multi-dimensional distribution, we will show how to select the 2n + 1sigma points such that the first four moments satisfy where I ∈ R n×n is the identity matrix. Remark 2. Due to the positive definiteness of the covariance matrix P ∈ R n×n , it is always invertible. A visual representation of our sigma points for a twodimensional distribution is shown in Fig. 2 . Our first point lies at (0, 0) with a weight of w 0 . Our second point lies on the coordinate axes a distance −u 1 from the origin with a weight of w 1 . Our third point lies on the coordinate axes a distance −u 2 from the origin with a weight of w 2 . Our fourth point lies on the coordinate axes a distance v 1 from the origin with a weight of w 3 . Our fifth point lies on the coordinate axes a distance v 2 from the origin with a weight of w 4 . Therefore, our unscented transform uses the following 2n+1 sigma points Obeying the moments of z, we write where 1 ∈ R n is a vector of ones. From (26), we see that w = w v u. Rewriting (27) and (28) gives Selecting u > 0 as the free parameters, we get Therefore, from (25) and (29), we see that To match the diagonal components of the kurtosis tensor, we need to satisfy Solving the above equation results in constrained values for u, such that It can be shown from (13) that the algorithm for selecting the 2n + 1 sigma points for any random vector x is given in Algorithm 1. We recall from (31) the constraint u > 0 exists. Applying this constraint on (34), we see that The inequality in (35) -at least for a one-dimensional caseagrees with the findings by Pearson in [34] that for probability distributions, the standardized kurtosis always exceeds the squared of the standardized skewness. If the inequality in (35) were violated, then (34) becomes infeasible, which in turn requires the free parameter u > 0 in (31) to be selected such that v > 0 -although this eliminates the accuracy in matching the diagonal components of the kurtosis tensor, the sigma points are still able to accurately match the diagonal components of the skewness tensor. There might be concerns that v in (31) might be negative whenever the term is negative. If (35) is satisfied, then selecting u using (34) leads to v > 0. Alternatively, arbitrarily selecting u such that u > Algorithm 1 can be used to create sigma points that can match up to the kurtosis if (35) is satisfied. For example, we want to prescribe some arbitrary mean, variance, skewness, and kurtosis for a random variable x that is not from any known probability distribution. Randomly selecting the mean x, variance P , and skewness S asx = 0.1, P = 0.2, and S = −0.5 respectively, we can use Algorithm 1 to match them exactly. However, we can not randomly select a kurtosis K and expect to match it. The selection of the kurtosis K must satisfy (35), so for this example, we require K > −0.5 2 0.2 = 1.25. Prescribing a kurtosis value of K = 1.3 satisfies (35). Now using Algorithm 1, we see that w 0 = 0.2, w 1 = 0.0286, w 2 = 0.7714, u = 5.8055, and v = 0.2153. The sample mean, sample covariance, sample skewness, and sample kurtosis exactly match their true prescribed values. We show how to calculate the sample statistics in Section IV. We use the moment generating function (MGF) M (t) to evaluate the mean and higher-order central moments of a probability distribution. For any random variable x [35] , its MGF and n-th moment are given by We also use the gamma notation The first four moments of 10 different probability distributions can be found in Table I . We demonstrate the accuracy of our sigma points in approximating any random vector x ∈ R n . Theorem 1. Let x ∈ R n be any random vector with mean x and covariance matrix P , skewness tensor S, and kurtosis tensor K. The following statements are true for the 2n + 1 sigma points be defined as shown in Algorithm 1. 1) The sample mean, Proof. For our proof, we introduce diagonal matrices U , V ∈ R n×n such that U = diag(u) and V = diag(v). In matrix form, we evaluate the sample mean aŝ Negative Binomial N B(r, p) is the ith column of the matrix square root of P . 5 Calculate the weights ; Note : To match the diagonal components of the kurtosis tensor, select Algorithm 1: Sigma Points for the Generalized Unscented Transform because 2n i=0 w i = 1 and w v = w u. We see that the sample mean equals the actual mean. Evaluating the sample covariance matrix, we get because w u 2 + w v 2 = 1 is the diagonal of diag(w )U 2 +diag(w )V 2 . We see that the sample covariance matrix equals the actual covariance matrix. DefiningŜ ∈ R n as a vector containing the diagonal components of the sample skewness tensor such that S = [Ŝ 111 ,Ŝ 222 , · · · ,Ŝ nnn ] T we can evaluate the diagonal components of the sample skewness tensor aŝ We see that our sigma points accurately match the diagonal components of the skewness tensor. Finally, definingK ∈ R n as a vector containing the diagonal components of the sample kurtosis tensor such that we can evaluate the diagonal components of the sample kurtosis tensor aŝ We see that our sigma points accurately match the diagonal components of the kurtosis tensor. Theorem 1 shows that our sigma points in Algorithm 1 can accurately approximate the mean and covariance of any random vector, as well as the diagonal components of the skewness and kurtosis tensors -this makes it applicable to a wide variety of applications. V. CONSTRAINED SIGMA POINTS Noting that several physical systems require some constraints on their states or parameters, we show how our sigma points can be constrained while at least maintaining secondorder accuracy. We require the sigma points to be constrained such that where a ∈ R n and b ∈ R n are the lower bounds and upper bounds respectively. We note that our sigma points of Algorithm 1 can violate some state constraints despite being able to accurately capture the mean and covariance of a random vector, as well as the diagonal components of its skewness and kurtosis tensors. This might make them inapplicable in situations/models that only permit constrained values. For example, in applications that assume a Poisson distribution for the states, such as count data, the states are usually positive by default and can never be negative. When our sigma point of Algorithm 1 is applied, the positive constraint on an independent random vector can be violated. We demonstrate this using the following example. Example 1. We generate sigma points for an independent Poisson random vector x such that wherex is the mean, P is the covariance matrix, andS andK are vectors containing the diagonal components of the skewness tensor and kurtosis tensor respectively. Using Algorithm 1, we see that w 0 = 0. We see from Example 1 that despite the accuracy of the sample statistics, the sigma points χ [1] and χ [2] both had a negative value which do not satisfy the non-negativity of Poisson draws. Corollary 1. If the bound a