key: cord-0134602-j1uy3t6i
authors: Ebeigbe, Donald; Berry, Tyrus; Norton, Michael M.; Whalen, Andrew J.; Simon, Dan; Sauer, Timothy; Schiff, Steven J.
title: A Generalized Unscented Transformation for Probability Distributions
date: 2021-04-05
journal: nan
DOI: nan
sha: 04bde65feb1ee4a51232752ddf3429a765dae4b8
doc_id: 134602
cord_uid: j1uy3t6i

The unscented transform uses a weighted set of samples called sigma points to propagate the means and covariances of nonlinear transformations of random variables. However, unscented transforms developed using either the Gaussian assumption or a minimum set of sigma points typically fall short when the random variable is not Gaussian distributed and the nonlinearities are substantial. In this paper, we develop the generalized unscented transform (GenUT), which uses 2n+1 sigma points to accurately capture up to the diagonal components of the skewness and kurtosis tensors of most probability distributions. Constraints can be analytically enforced on the sigma points while guaranteeing at least second-order accuracy. The GenUT uses the same number of sigma points as the original unscented transform while also being applicable to non-Gaussian distributions, including the assimilation of observations in the modeling of infectious diseases such as coronavirus (SARS-CoV-2) causing COVID-19.

For many dynamic systems in practice, linearity is a reasonable assumption. For others, system non-linearities cause methods based on linear models to perform poorly. Most nonlinear systems can behave approximately linearly over small operation ranges. The extended Kalman filter (EKF) is one of the most widely used Kalman filter for nonlinear dynamic systems. The EKF employs a linear approximation of the nonlinear system around a nominal state trajectory [1] , [2] , [4] . However, for highly nonlinear systems, linear approximations can introduce errors that can lead to divergence of the state estimate.

To address the drawbacks of the EKF, several well-known state estimators such as the ensemble Kalman filter [5] [6] [7] [8] , the unscented Kalman filter (UKF) [9] , [10] , and the particle filter [1] , [11] have been developed. Although the particle filter can give better performance than the UKF, this comes at the cost of a higher computational effort. In some applications, the improved performance might not be worth the additional computational costs [1] .

The UKF is a nonlinear filter that uses the unscented transformation to approximate the mean and covariance of a Gaussian random variable [9] , [12] . The unscented transform uses the intuition that with a fixed number of parameters it should be easier to approximate a Gaussian distribution than it is to approximate an arbitrary nonlinear function or transformation [9] . It produces sets of vectors called sigma points that capture the moments of the standard Gaussian distribution. The UKF uses the generated sigma points to obtain estimates of the states and the state estimation error covariance. The UKF has been used to generate distributions which improve the performance of a particle filter [13] , [14] . It has also been employed to improve the performance of the EnKF [15] . Despite the several types of sigma points that exist in the literature [16] , [17] , a majority of them that were not developed using the Gaussian assumption do not try to match the skewness or kurtosis of a random variable, thereby ensuring only second-order accuracy.

The need to effectively monitor, predict, and control the spread of infectious disease has led to the application of numerous state estimation techniques. The EKF [18] , [19] and the particle filter [20] have been used to estimate the parameters of the measles virus transmission dynamics from real data. The ensemble adjustment Kalman filter (EAKF) has been employed in the forecasting of influenza [21] and dengue fever [22] . Several infectious disease such as Ebola [23] , HIV [24] , and neonatal sepsis [25] have seen implementation of different Kalman filters. More recently, the outbreak of the novel coronavirus (SARS-CoV-2) causing COVID-19 has led to concerted efforts to properly understand its transmission and offer policy guidelines that can mitigate its spread. Recent efforts have employed the iterated EAKF to assimilate daily observations in the modeling of COVID-19 [26] . Distributions such as Poisson, negative-binomial, and binomial are typically used for modeling infectious disease from count data. Additionally, the number of patients arriving at a hospital or a testing center can be modeled by a Poisson distribution whose rate is proportional to the infected population. Although the use of standard Kalman filters in infectious disease estimation and prediction under the Poisson assumption can be justified with the fact that a Poisson distribution with a large rate can be approximated by a Gaussian distribution of the same mean and variance, the approximation breaks down when the rates are small [27] .

The usage of Kalman filters to assimilate data generated by the transformation of random variables from different probability distributions revealed a fundamental mismatch in the application of the filters -the accuracy of the filter is reduced if the Gaussian assumption is not satisfied and the nonlinearities are high. This led to the development of unscented transforms that can account for some higher-order moment information such as the skewness and kurtosis [28] [29] [30] [31] [32] . The unscented transforms can be grouped into two categories: the ones that employ 2n + 1 sigma points [28] [29] [30] and the ones that use more than 2n + 1 sigma points [31] , [32] .

First, we consider those that use 2n+1 sigma points. In [28] , an unscented transform was developed to match the average marginal skewness and kurtosis. The method however did not match the true skewness and true kurtosis for each element of the random vector. In [29] , a randomized unscented transform was used in the development of a filter for non-Gaussian systems. Although the method uses a stochastic integration rule to solve state and measurement statistics, the sigma points are generated under the Gaussian assumption. In [30] , an unscented transform was developed to capture the skewness of a random vector. However, the method assumes a closed skew normal distribution in its development. All preceding methods that use 2n + 1 sigma points either apply only to special distributions or can capture at most the average skewness and kurtosis.

Now we consider those that use more than 2n + 1 sigma points. In [31] , an unscented transform was developed to match the first four moments of Gaussian random variables. In [32] , a higher order unscented transform was developed to match the skewness and kurtosis tensors with high accuracy. The method uses an approximate CANDECOMP/PARAFAC (CP) tensor decomposition to generate its sigma points. However, depending on the dimension of the problem and the error tolerance level in approximating the skewness and kurtosis tensors, this method can require significant computational costs. This is because the sequence of vectors and constants used in the approximate CP method can significantly increase when the error tolerance level is made small. All preceding methods that use more than 2n + 1 sigma points either applied to only to special distributions or had significantly higher complexity and computational cost.

For an n-dimensional random vector, 2n + 1 sigma points generally employs 2n 2 +3n+1 free parameters (2n+1 weights and 2n 2 + n constants that define the coordinates of the sigma points). Trying to match the mean, covariance, skewness, and kurtosis imposes n, O(n 2 ), O(n 3 ), and O(n 4 ) constraints respectively. In principle, it is impossible to match all these moments using only 2n + 1 sigma points. The zero skewness nature of the Gaussian distribution made it possible to use 2n + 1 sigma points to accurately match up to the skewness in [9] . The presence of the O(n 3 ) skewness and O(n 4 ) kurtosis constraints are what prompted researchers to look beyond 2n + 1 sigma points. However, we note that matching the mean and covariance constraints of any random vector using 2n + 1 sigma points still leaves n 2 + 2n + 1 free parameters. These residual parameters have been underutilized in capturing as much information as possible about the components of the skewness and kurtosis tensors when the random variable is not Gaussian. One instance where the residual parameters were leveraged was in the capturing of the average marginal skewness and kurtosis, which only represents a total of 2 constraints [28] .

In this paper, we develop the generalized unscented transform (GenUT) which is able to adapt to the unique statistics of most probability distributions. We use the intuition that employing sigma points more suitable to the inherent distributions of a random vector can lead to a more accurate propagation of means and covariances. Our method uses 2n + 1 sigma points that not only accurately matches the mean and covariance matrix, but also takes advantage of the additional free parameters to accurately match the diagonal components of the skewness tensor and kurtosis tensor of most random vectors. We employ n 2 +n 2 + 3n constraints in total; n for the mean, n 2 +n 2 for the covariance, n for the diagonal components of the skewness tensor, and n for the diagonal components of the kurtosis tensor. This total falls within the 2n 2 + 3n + 1 free parameters available. While more parameters remain, the diagonal components of the skewness and kurtosis tensors are the most significant. In comparison to [28] [29] [30] [31] , our method gives a general way to accurately match the diagonal components of the skewness and kurtosis tensors of most random vectors. In comparison to [32] , our method uses fewer sigma points which is crucial for larger system dimensions. In comparison to the standard unscented transform, we acquire the most significant higher moment information of most probability distributions with the same number of sigma points.

In Section II, we discuss the problems that arise when the Gaussian assumption is employed in the unscented transform. In Section III, we develop the GenUT sigma points that can capture certain properties of most probability distributions, such as its mean, covariance, skewness, and kurtosis. In Section IV, we show that our sigma points are accurate in approximating the mean, covariance, and diagonal components of the skewness and kurtosis tensors. In Section V, we address constraints and show that imposing constraints can at least maintain second-order accuracy. In Section VI, we evaluate the accuracy of the GenUT sigma points in propagating means and covariances of nonlinear transformations of arbitrarily distributed random vectors and we give several examples that demonstrate its effectiveness when compared against other unscented transforms. We discuss the conclusions in Section VII.

We analyze the performance of unscented transforms that were motivated by the Gaussian statistics [9] , [12] . We will show how linearization approximations, via Taylor series expansion of a nonlinear transformation of a random vector x evaluated about its meanx, introduces errors in the propagation of means and covariances. We will see that errors can be introduced in the propagation of means and covariances beyond the second order when used to approximate a nonlinear function λ(x) of a possibly non-Gaussian distributed random vector x ∈ R n . Definition 1. Let x ∈ R n be a random vector. We define the meanx ∈ R n , covariance P ∈ R n×n , skewness tensor S ∈ R n×n×n , and kurtosis tensor K ∈ R n×n×n×n as

(1)

The sample mean and sample covariance of the nonlinear transformation y ∈ R n given by

can be calculated as follows [9] . 1) Calculate the 2n + 1 sigma points given by 1

is the ith column of (n + κ)P , w i is the weight associated with the ith sigma point, and κ is a free parameter 2 We typically set κ = n − 3 to minimize the fourth-order moment mismatch. 2) Pass the sigma points through the known nonlinear function to get the transformed sigma points

3) Evaluate the sample mean of the transformed sigma pointsȳ

4) Evaluate the sample covariance of the transformed sigma points

A. Accuracy in Approximating the True Mean

Applying a Taylor series expansion of λ(x) about its mean x, we show in Appendix A-A that the true mean of y = λ(x) is given as

The analytical expression for the approximated mean from [9] is given as

Comparing the above equation with the true mean of (9), we notice the following problems about the sigma points developed using the Gaussian assumption 1) The odd-powered moments in the approximation of the true mean are always zero due to their symmetry. This introduces significant approximation errors in situations where the odd-powered moments of the distribution of x are non-zero and the transformation y = λ(x) is highly nonlinear.

2) The fourth-order term fails to capture a part of the true kurtosis even when the optimal value of κ = n − 3 is selected because of the Gaussian assumption.

We also note that errors in approximating the mean beyond the second order occur not only for sets of 2n + 1 sigma points existing in the literature, but also for sets of n + 1 sigma points [10] , [33] -this is because they do not account for the skewness and kurtosis of x when it is not Gaussian distributed.

The true covariance matrix, which was evaluated in Appendix A-B, is given as

where we have used the notation xx T = x[· · · ] T . The analytical expression for the approximated covariance matrix from [9] is given as

Comparing the above equation with the true covariance matrix of (11), we notice similar issues that were pointed out in approximating the mean -the approximation is only accurate up to the second order when x is not Gaussian distributed. All the odd-powered moments are zero because of the symmetric nature of the sigma points, while the fourth-powered moment is also inaccurate because of the Gaussian nature of the sigma points. As with the mean approximation, errors in the covariance matrix approximation are introduced beyond the second order not only for sets of 2n + 1 sigma points existing in the literature, but also for sets of n + 1 sigma points.

III. GENERALIZED UNSCENTED TRANSFORM For a random vector x ∈ R n , we develop sigma points that can accurately capture the mean, covariance matrix, and the diagonal components of both the skewness tensor and the kurtosis tensor. This is done by selecting sigma point distributions that have the flexibility to either be symmetric when x is symmetrically distributed or be asymmetric when x is asymmetrically distributed. Assumption 1. The random vector x follows a probability distribution with finite moments.

We reduce the problem of approximating x to the problem of approximating a user-specified arbitrarily distributed random vector z ∈ R n with zero mean and unit variance, whose higher-order moments are functions of the higher-order moments of x. We write where √ P is the matrix square root of P , √ P √ P T = P Definition 2. Let x be a vector, P be a square matrix, and k be some positive integer. We define the element-wise product (Hadamard product) as , such that

We also define the element-wise division (Hadamard division) as .

We develop sigma points that match the first three moments of z in a single dimension, and then constrain those points to match the fourth moment of z. For a one-dimensional distribution, we will show how to select sigma points such that the first four moments satisfy

To capture the first three moments in a single dimension, three points are used: the first point lies at the origin with a weight of w 0 ; the second point lies at a distance −u from the origin with a weight of w 1 ; the third point lies at a distance v from the origin with a weight of w 2 . Therefore, in onedimension, we use the following 3 sigma points

where w 0 , w 1 , and w 2 are the weights for the respective sigma points. A visual representation of our sigma points in one dimension is shown in Fig. 1 . Obeying the moments of z and the fact that the sum of all weights should equal 1, we write

From (15), we see that w 1 = v u w 2 . Rewriting (16) using (17) gives

We designate u as the free parameter while assuming that

From (14) and (18), we see that the weights are given as

We note that the free parameter u can be selected to match the fourth moment of z. We now attempt to satisfy the fourth moment constraint given by

Eliminating

Using the relationships

, the above equation reduces to

The solution to the above quadratic equation is

where v is given in (20) . The equations for w 1 , w 2 , and w 0 remain unchanged.

Remark 1. We note that the sigma points described above, which accurately capture the kurtosis when constrained, were designed for when the state has a dimension of 1. This implies that z, P , S, K ∈ R 1 .

In the next section, we extend this to multiple dimensions.

For an n-dimensional vector z, we develop a set of sigma points that accurately matches its mean and covariance matrix, while accurately matching the diagonal components of the skewness tensor. Furthermore, by constraining the sigma points, we show that we can accurately match the diagonal components of the kurtosis tensor. We note that for an independent random vector, accurately matching the diagonal components of the skewness tensor implies an accurate matching of the entire skewness tensor.

Definition 3. We define the vectorsS ∈ R n andK ∈ R n which contain the diagonal components of the skewness tensor and kurtosis tensor respectively, such that For a multi-dimensional distribution, we will show how to select the 2n + 1sigma points such that the first four moments satisfy

where I ∈ R n×n is the identity matrix.

Remark 2. Due to the positive definiteness of the covariance matrix P ∈ R n×n , it is always invertible.

A visual representation of our sigma points for a twodimensional distribution is shown in Fig. 2 . Our first point lies at (0, 0) with a weight of w 0 . Our second point lies on the coordinate axes a distance −u 1 from the origin with a weight of w 1 . Our third point lies on the coordinate axes a distance −u 2 from the origin with a weight of w 2 . Our fourth point lies on the coordinate axes a distance v 1 from the origin with a weight of w 3 . Our fifth point lies on the coordinate axes a distance v 2 from the origin with a weight of w 4 . Therefore, our unscented transform uses the following 2n+1 sigma points

Obeying the moments of z, we write

where 1 ∈ R n is a vector of ones. From (26), we see that w = w v u. Rewriting (27) and (28) gives

Selecting u > 0 as the free parameters, we get

Therefore, from (25) and (29), we see that

To match the diagonal components of the kurtosis tensor, we need to satisfy

Solving the above equation results in constrained values for u, such that

It can be shown from (13) that the algorithm for selecting the 2n + 1 sigma points for any random vector x is given in Algorithm 1. We recall from (31) the constraint u > 0 exists. Applying this constraint on (34), we see that

The inequality in (35) -at least for a one-dimensional caseagrees with the findings by Pearson in [34] that for probability distributions, the standardized kurtosis always exceeds the squared of the standardized skewness. If the inequality in (35) were violated, then (34) becomes infeasible, which in turn requires the free parameter u > 0 in (31) to be selected such that v > 0 -although this eliminates the accuracy in matching the diagonal components of the kurtosis tensor, the sigma points are still able to accurately match the diagonal components of the skewness tensor.

There might be concerns that v in (31) might be negative whenever the term

is negative. If (35) is satisfied, then selecting u using (34) leads to v > 0. Alternatively, arbitrarily selecting u such that u >

Algorithm 1 can be used to create sigma points that can match up to the kurtosis if (35) is satisfied. For example, we want to prescribe some arbitrary mean, variance, skewness, and kurtosis for a random variable x that is not from any known probability distribution. Randomly selecting the mean x, variance P , and skewness S asx = 0.1, P = 0.2, and S = −0.5 respectively, we can use Algorithm 1 to match them exactly. However, we can not randomly select a kurtosis K and expect to match it. The selection of the kurtosis K must satisfy (35), so for this example, we require K > −0.5 2 0.2 = 1.25. Prescribing a kurtosis value of K = 1.3 satisfies (35). Now using Algorithm 1, we see that w 0 = 0.2, w 1 = 0.0286, w 2 = 0.7714, u = 5.8055, and v = 0.2153. The sample mean, sample covariance, sample skewness, and sample kurtosis exactly match their true prescribed values. We show how to calculate the sample statistics in Section IV.

We use the moment generating function (MGF) M (t) to evaluate the mean and higher-order central moments of a probability distribution. For any random variable x [35] , its MGF and n-th moment are given by

We also use the gamma notation

The first four moments of 10 different probability distributions can be found in Table I .

We demonstrate the accuracy of our sigma points in approximating any random vector x ∈ R n . Theorem 1. Let x ∈ R n be any random vector with mean x and covariance matrix P , skewness tensor S, and kurtosis tensor K. The following statements are true for the 2n + 1 sigma points be defined as shown in Algorithm 1.

1) The sample mean,

Proof. For our proof, we introduce diagonal matrices U , V ∈ R n×n such that U = diag(u) and V = diag(v). In matrix form, we evaluate the sample mean aŝ 

Negative Binomial N B(r, p) 

is the ith column of the matrix square root of P . 5 Calculate the weights ;

Note : To match the diagonal components of the kurtosis tensor, select

Algorithm 1: Sigma Points for the Generalized Unscented Transform because 2n i=0 w i = 1 and w v = w u. We see that the sample mean equals the actual mean. Evaluating the sample covariance matrix, we get

because w u 2 + w v 2 = 1 is the diagonal of diag(w )U 2 +diag(w )V 2 . We see that the sample covariance matrix equals the actual covariance matrix. DefiningŜ ∈ R n as a vector containing the diagonal components of the sample skewness tensor such that S = [Ŝ 111 ,Ŝ 222 , · · · ,Ŝ nnn ] T we can evaluate the diagonal components of the sample skewness tensor aŝ

We see that our sigma points accurately match the diagonal components of the skewness tensor. Finally, definingK ∈ R n as a vector containing the diagonal components of the sample kurtosis tensor such that

we can evaluate the diagonal components of the sample kurtosis tensor aŝ

We see that our sigma points accurately match the diagonal components of the kurtosis tensor.

Theorem 1 shows that our sigma points in Algorithm 1 can accurately approximate the mean and covariance of any random vector, as well as the diagonal components of the skewness and kurtosis tensors -this makes it applicable to a wide variety of applications.

V. CONSTRAINED SIGMA POINTS Noting that several physical systems require some constraints on their states or parameters, we show how our sigma points can be constrained while at least maintaining secondorder accuracy.

We require the sigma points to be constrained such that

where a ∈ R n and b ∈ R n are the lower bounds and upper bounds respectively.

We note that our sigma points of Algorithm 1 can violate some state constraints despite being able to accurately capture the mean and covariance of a random vector, as well as the diagonal components of its skewness and kurtosis tensors. This might make them inapplicable in situations/models that only permit constrained values. For example, in applications that assume a Poisson distribution for the states, such as count data, the states are usually positive by default and can never be negative. When our sigma point of Algorithm 1 is applied, the positive constraint on an independent random vector can be violated. We demonstrate this using the following example. Example 1. We generate sigma points for an independent Poisson random vector x such that

wherex is the mean, P is the covariance matrix, andS andK are vectors containing the diagonal components of the skewness tensor and kurtosis tensor respectively. Using Algorithm 1, we see that w 0 = 0. We see from Example 1 that despite the accuracy of the sample statistics, the sigma points χ [1] and χ [2] both had a negative value which do not satisfy the non-negativity of Poisson draws. Corollary 1. If the bound a <x is violated after implementing Algorithm 1, then enforcing the constraint a <x leads to accuracy in capturing only the mean, covariance matrix, and the diagonal components of the skewness tensor.

Proof. Lower bounding x will require redefining the variable u such that (34) is no longer satisfied. Theorem 1 establishes that violating (34) ensures an inaccurate approximation of the diagonal components of the kurtosis tensor.

Corollary 2. If the boundx < b is violated after implementing Algorithm 1, then enforcing eitherx < b or a <x < b leads to accuracy in capturing only the mean and covariance matrix.

Proof. Both cases,x < b or a <x < b, require the redefinition of the variable v. This means that (31) will no longer be satisfied. Theorem 1 establishes that violating (31) ensures an inaccurate approximation of the diagonal components of skewness and kurtosis tensor.

To enforce constraints on the sigma points, we introduce a slack parameter θ ∈ (0, · · · , 1) which is a user selected constant. Using θ, we now redefine the free parameters u i and v i for i ∈ {1, · · · , n} as

where |.| denotes the absolute value, and the sigma points get closer to their constraints as θ → 1. We note that the equations for w and w are unchanged. We note that enforcing constrains on the sigma points results in a loss of accuracy in capturing the diagonal components of at least the kurtosis tensor. The constrained sigma point algorithm is given in Algorithm 2. We now show a benefit of Algorithm 2 in the following example. Note : θ ∈ (0, · · · , 1) is a user defined constant. The sigma points get closer to their constraints as θ → 1. We see from Example 2 that using Algorithm 2 ensures that the sigma points are always positive while ensuring accuracy in approximating the true mean and covariance of a random vector, as well as capturing the diagonal components of the skewness tensor. However, the ability to exactly capture the diagonal components of the kurtosis tensor is lost. A graphical representation of Examples 1 and 2 is shown in Fig. 3 where we plot the sigma points and the covariance.

We analyze the performance of our new sigma point algorithm when they undergo nonlinear transformations. We will show how linearization approximations, via Taylor series expansion of a nonlinear transformation of a random vector x evaluated about its meanx, introduce errors in the propagation of means and covariances. In Appendix A, we evaluated the true mean and true covariance of a random vector, as well as the approximated mean and approximated covariance. We see that although errors are introduced beyond the third order when approximating a nonlinear transformation of a random vector, these errors are minimized because of our ability to match the diagonal components of the skewness and kurtosis tensors. We also see that errors are introduced beyond the third order when the random vector is independent.

We will see that errors can be introduced in the propagation of means and covariances beyond the second order when sigma points developed under the Gaussian assumption [9] , [12] , [36] are used to approximate the nonlinear function λ(x) when x is an independent random vector. We note that the nonlinear transformation y ∈ R n is given by

where E[x] =x.x, P ,S, andK, we evaluate the sample mean and covariance of the nonlinear transformation of (45) using Algorithm 1.

For our comparison, we use the scaled unscented transform of [36] , which is denoted as UT for the remainder of this paper, and the higher order sigma point unscented transform (HOSPUT) of [28] . The scaling of the UT was selected to match a Gaussian distribution. We do not compare against the sigma points in [29] [30] [31] [32] because they either use a Gaussian assumption, a closed skew normal distribution, or more than 2n + 1 sigma points. The sample mean and sample covariance can be evaluated using (6)- (8) .

Defining x as a random variable that can follow any of the probability distributions given in the Table I , we evaluate the sample mean and covariance of two nonlinear transformations: a quadratic function of the random variable y = 3x + 2x 2 , and a trigonometric function of the random variable y = sin(x). We also use 10 5 Monte Carlo draws from the different probability distributions. The true mean and covariance of the quadratic function can be easily evaluated using the raw moments of x up to its fourth order. The true mean and covariance of the trigonometric function can be evaluated using their characteristic functions. A comparison between the accuracy of the GenUT, UT, 10 5 Monte Carlo draws, and HOSPUT in approximating the true mean and true covariance of the nonlinear transformations for the different probability distributions is shown in Tables II-V. For the quadratic function, we see that both the GenUT and HOSPUT gave an exact approximation of the true mean and true covariance for all the probability distributions while the UT was only accurate in approximating the true mean when the probability distribution was not Gaussian. This is because the GenUT and HOSPUT are accurate up to the fourth order moments when the random variable x has a dimension of 1. Although the 10 5 Monte Carlo draws gave relatively good approximations, they were not as accurate as the GenUT.

For the trigonometric function, we see that the GenUT, HOSPUT, and UT were unable to give exact approximations of the true mean and true covariance in most cases because the Taylor series expansion of λ(x) has terms beyond the fourth order. The GenUT and HOSPUT were more accurate than the UT for all the non-Gaussian probability distributions because they are both accurate up to the fourth order while the UT is accurate up to the second order. The 10 5 Monte Carlo draws sometimes gave better accuracy than the GenUT because of the random nature of its draws. A box plot of the accuracy of the GenUT, UT, and several Monte Carlo draws of different sizes is shown in Fig. 4 for the trigonometric function. We do not include the HOSPUT because it gives the same performance as the GenUT when a single random variable is transformed. We see that a significant number of Monte Carlo draws is needed to achieve the accuracy of the GenUT when approximating the mean. A significant number of Monte Carlo draws gives better accuracy in approximating the variance.

We examine the performance of the GenUT, HOSPUT, and UT in approximating the true mean and covariance of x

GenUT UT MC HOSPUT a nonlinear transformation of different random variables such that

We calculate the true mean and true covariance of y using 10 We see that for the nonlinear transformation, the GenUT gave the lowest percentage error when approximating the elements of the mean and covariance matrix. The UT gave the worst performance because it was unable to account for the non-Gaussian distributed nature of the random variable x. The HOSPUT performed worse than the GenUT because, when the problem dimension exceeds 1, it is only able to match the average values of the diagonal elements of the skewness and kurtosis tensors.

We consider an SIR (susceptible-infectious-recovered) infectious disease model given by the difference equation [37] 

where β is the infection rate, γ is the recovery rate, and N = S k +I k +R k . Using the conservation principle S +I +R = N , we reduce the model of (47) to

We note that by defining x = Poisson I k R k T , we can rewrite the above equation as

where x i is the ith element of the vector x.

We examine the performance of the GenUT, HOSPUT, and UT in approximating the true mean and covariance of (48). We use the parameters I k = 10, R k = 2, β = 1.5, γ = 0. We see that the GenUT gave the least approximation error of the true covariance matrix. The inability of the GenUT to exactly match the true covariance matrix is because the GenUT is only able to capture the diagonal components of the skewness and kurtosis tensors.

In this paper we have developed the generalized unscented transform (GenUT) that is capable of adapting to the unique statistics of an arbitrarily distributed random variable. We showed that due to its ability to match the diagonal elements of the skewness and kurtosis tensors of most random vectors using 2n+1 sigma points, the GenUT is preferable to and more accurate than unscented transforms that were either developed using the Gaussian assumption or were developed without any probability distribution in mind.

In terms of ease of implementation, we demonstrated that like the unscented transform originally developed in [12] which uses 2n + 1 sigma points, the GenUT uses the same number of sigma points. When compared against unscented transforms that employ more than 2n + 1 sigma points, the GenUT is characterized by a lower computational cost due to its lower number of sigma points that scales linearly with the problem dimension.

In terms of performance, the GenUT and unscented transforms that use 2n + 1 sigma points developed under the Gaussian assumption give the same performance when the random variable is Gaussian distributed. However, when the random variable or random vector is not Gaussian distributed, the GenUT gives better accuracy in the propagation of means and covariances. Additionally, we also showed that the GenUT formulation makes it easy to analytically enforce constraints on the sigma points while still guaranteeing at least a secondorder accuracy, which makes it appealing in models that permit only constrained values for random variables or parameters.

For uncertainty quantification, estimation, or prediction applications, when compared to existing unscented transforms, the GenUT gives the most accuracy that can be gotten by employing 2n + 1 sigma points. This accuracy will have more significant consequences if the nonlinearities are strong and the problem dimension is large. The GenUT can be applied to any filter that uses linear or nonlinear transformations of random variables. The MATLAB ® source code used to generate the results in this paper is available at [38] .

We derive analytical expressions for the true mean and covariance when we take the Taylor series expansion of the nonlinear function y = λ(x) where x is a random vector.

Applying Taylor series expansion aroundx, wherex = x −x, we write the true mean of y as

where Dxλ is the total differential of λ(x) when perturbed around a nominal valuex byx. We note that

Using (50), we can evaluate the true mean of (49) as

where

The true covariance of y is given as

Evaluating the expression y −ȳ, we write

Substituting (53) into (52) gives

We note that we can write the first term in the above equation as

Using (50) and (55), we can rewrite the true covariance matrix of (54) as

where we have used the notation xx T = x[· · · ] T . APPENDIX B APPROXIMATION OF MEANS AND COVARIANCES USING THE GENERALIZED UNSCENTED TRANSFORM We analytically show the accuracy in capturing the true mean and true covariance of y = λ(x) when using our 2n + 1 sigma points. We also show that our sigma point transformations give improved accuracy by capturing the diagonal components of the skewness and kurtosis tensors. We definẽ χ [i] = χ [i] −x while recalling that C = P P T . We note that

The approximated mean is given aŝ

Using (57), we can evaluate the above equation aŝ 

where 2n i=1 w iχjiχki = P jk , 2n i=1 w iχjiχkiχli =Ŝ jkl and 2n i=1 w iχjiχkiχliχmi =K jklm . In the Section IV, we already showed that we can accurately capture the diagonal components of the skewness and kurtosis tensors becauseŜ jkl = S jkl whenever j = k = l and K jklm = K jklm whenever j = k = l = m. Therefore, by comparing (58) with the true mean of (51), we can see that our sigma points improves on the accuracy of propagating the mean of a nonlinear transformation.

The approximated covariance can be evaluated using the expression

Substituting (60) into (59) and multiplying out gives

For the first term in (61),

Using (57) and (62), we can rewrite the approximated covariance matrix of (61) as

Comparing (63) with the true covariance of (56), we can see that our sigma points improves on the accuracy of propagating the covariance of a nonlinear transformation because we are able to accurately capture the diagonal components of the skewness and kurtosis tensors.

Optimal state estimation: Kalman, H infinity, and nonlinear approaches

Constrained state estimation using the unscented kalman filter

Kalman filtering based on the maximum correntropy criterion in the presence of non-gaussian noise

Some relations between extended and unscented kalman filters

Sequential data assimilation with a nonlinear quasigeostrophic model using monte carlo methods to forecast error statistics

Data assimilation using an ensemble kalman filter technique

An ensemble adjustment kalman filter for data assimilation

Adaptive ensemble kalman filtering of non-linear systems

A general method for approximating nonlinear transformations of probability distributions

Reduced sigma point filters for the propagation of means and covariances through nonlinear transformations

Monte carlo filter and smoother for non-gaussian nonlinear state space models

Consistent debiased method for converting between polar and cartesian coordinate systems

Better proposal distributions: Object tracking using unscented particle filter

The unscented particle filter

Ensemble kalman filter with the unscented transform

Kalman filtering with state constraints: a survey of linear and nonlinear algorithms

Optimized selection of sigma points in the unscented kalman filter

Assessment of the 2010 global measles mortality reduction goal: results from a model of surveillance data

Tracking measles infection through non-linear state space models

Time series analysis via mechanistic models

Forecasting seasonal outbreaks of influenza

Superensemble forecasts of dengue outbreaks

Analysis of bias in an ebola epidemic model by extended kalman filter approach

Using the kalman filter and dynamic models to assess the changing HIV/AIDS epidemic

A poisson kalman filter to control the dynamics of neonatal sepsis and postinfectious hydrocephalus

Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2)

Simple formula for the distortions in a gaussian representation of a poisson distribution

A new unscented kalman filter with higher order moment-matching

Randomized unscented transform in state estimation of non-gaussian systems: Algorithms and performance

A skewed unscented kalman filter

High-order unscented transformation based on the bayesian learning for nonlinear systems with non-gaussian noises

A higher order unscented transform

A systematization of the unscented kalman filter theory

Mathematical contributions to the theory of evolution.-XIX. Second supplement to a memoir on skew variation

Probability, random variables, and stochastic processes

Unscented filtering and nonlinear estimation

Modeling infectious diseases in humans and animals

Generalized unscented transform MATLAB ® source code