key: cord-0963748-2su7hrgi
authors: nan
title: Data Modeling With Polynomial Representations and Autoregressive Time-Series Representations, and Their Connections
date: 2020-06-08
journal: IEEE Access
DOI: 10.1109/access.2020.3000860
sha: 865321ac003a7c7aee6f6c07f4ed76d5c9069395
doc_id: 963748
cord_uid: 2su7hrgi

Two of the data modelling techniques, polynomial representation and time-series representation, are explored in this paper to establish their connections and differences. All theoretical studies are based on uniformly sampled data in the absence of noise. This paper proves that all data from an underlying polynomial model of finite degree q can be represented perfectly by an autoregressive time-series model of order q and a constant term µ as in equation (2). Furthermore, all polynomials of degree q are shown to give rise to the same set of time-series coefficients of specific forms, with the only possible difference being in the constant term µ. It is also demonstrated that time-series with either non-integer coefficients or integer coefficients not of the aforementioned specific forms represent polynomials of infinite degree. Six numerical explorations, with both generated data and real data, including UK and US data on the current Covid-19 incidence, are presented to support the theoretical findings. It is shown that all polynomials of degree q can be represented by an all-pole filter with q repeated roots (or poles) at z = +1. Theoretically, all noise-free data representable by a finite order all-pole filter, whether they come from finite degree or infinite degree polynomials, can be described exactly by a finite order AR time-series; if the values of polynomial coefficients are not of special interest in a data modelling task, one may use time-series representations for data modelling.

Interest in data science has been growing extremely fast in the twenty-first century. As well as attracting interest from many different subject areas, data science is being integrated into a diverse range of industries and agencies (e.g., health, transport, energy, government, and society). Strictly, a time-series refers to a series of data points ordered in time. Very commonly, a time-series consists of data points equally separated in time. Of course, the analytics created for time-series data can generally be applied to a sequence of data equally separated in space (e.g., images) or in some other domain. There are many types of time-series models, including autoregressive models.

Although there are many types of time-series models, an alternative, and the earliest, way to model data is polynomial regression. Polynomial regression models are generally fitted with the Least-squares method to obtain estimated values of the polynomial coefficients. Legendre published the Least-squares method in 1805 [1], and Gauss published it in 1809 and later in 1823 [2]. In 1815 Gergonne wrote a paper on "The application of the method of least squares to the interpolation of sequences" [3]; [4] is an English translation by Stigler of this paper, which was originally written in French. In the last 120 or so years, polynomial regression has contributed greatly to the development of regression analysis [5]-[7].

The associate editor coordinating the review of this manuscript and approving it for publication was Datong Liu.

Although there are other ways to model data, this paper focuses on polynomial representation and autoregressive time-series representation. There has been extensive research in time-series data representation [8]-[12]. For example, the main goal of time-series analysis in econometrics, geophysics, meteorology, quantitative finance, seismology, and statistics is prediction or forecasting [13]-[20]. In communication engineering, control engineering, and signal processing, on the other hand, time-series analysis is used for signal detection and estimation [21]-[28]. It is also used for clustering, classification, and prediction or forecasting in data mining, machine learning, and pattern recognition [29]-[34]. Mathematical modelling and time-series analysis are fundamental to many fields; a couple of very recent examples can be found in [35], [36].

In polynomial representations, observed data are a function of time (or some other variable). Except for a constant or a straight line, this function represents a non-linear relationship between time (or the other variable) and the observed data, even though the model is linear in its parameters. In autoregressive (AR) time-series representations, on the other hand, observed data are a linear function of some of the earlier data, so the model is linear in both data and parameters. Although both are used for data modelling, there are some fundamental differences. Hence, this paper explores many questions around polynomial and autoregressive representations with a view to establishing their connections and differences. Two of these questions are:

1) Can all finite degree polynomials be expressed as finite order time series? If the answer is affirmative, what is the underlying relationship?

2) Can all finite order autoregressive time-series be represented as finite order polynomials?

This study is in the context of real-valued, uniformly sampled, noise-free data. The paper presents the following original results: 1) All polynomials of degree 1 (linear), degree 2 (quadratic), and degree 3 (cubic) can be represented as autoregressive time-series of order 1, order 2, and order 3, respectively, each with a constant term. This is illustrated in section II. 2) All polynomials of degree 3 can be represented by AR time-series with the same set of coefficient values, differing at most in the value of the constant term. This observation is also true for polynomials of degree 1 and of degree 2. This is presented in section II. 3) All polynomials of finite degree q can be represented as AR time-series of order q with a constant. This can be found in section III. 4) All polynomials of degree q can be represented by AR time-series with one set of coefficient values, differing at most in the value of the constant term. This is demonstrated in section III. 5) The corresponding time-series coefficients are integers of specific forms, which are derived in section III. 6) Numerical explorations with both generated data and real data, including current Covid-19 incidence data from the UK and the US, are presented in section IV. 7) Whilst all finite degree polynomials can be represented by finite order AR time-series, the converse is not true: there are infinitely many finite order AR time-series that cannot be represented by finite degree polynomials. Furthermore, all finite order AR time-series with either non-integer coefficients or integer coefficients not of the aforementioned specific forms represent polynomials of infinite degree. This is shown in section V. 8) Section VI shows that all polynomials of degree q can be represented by an all-pole filter with q repeated roots (or poles) at z = +1.
Thus, any noise-free data representable by a finite order all-pole filter, whether they come from finite degree or infinite degree polynomials, can be described exactly by a finite order AR time-series.

Given a set of uniformly sampled real-valued data points in discrete time, these may be represented by a polynomial or a time-series. A polynomial of degree N in continuous time can take the following form

$$ y(t) = \sum_{i=0}^{N} c(i)\, t^{i}. $$

For uniformly sampled discrete time, the continuous time t is represented as t = nT, where n is an integer and T is the sampling period. In this scenario, the above equation can be rewritten as

$$ y(nT) = \sum_{i=0}^{N} c(i)\, (nT)^{i}. \tag{1} $$

On the other hand, an autoregressive time-series model of order q, AR(q), can be written as

$$ y(n) = \sum_{i=1}^{q} a(i)\, y(n-i) + \mu \tag{2} $$

and may be used to represent the set of uniformly sampled data points in discrete time.

In this subsection, data representation by a linear polynomial and an AR time-series is explored. For any linear polynomial, N has the value of 1 in equation (1), so that y(nT) = c(0) + c(1)nT and y(nT − T) = c(0) + c(1)(nT − T). Subtracting the second expression from the first gives y(nT) = y(nT − T) + c(1)T. By removing T from the indices, this can be written as y(n) = y(n − 1) + c(1)T. Comparing this with equation (2) for AR(q), it is clear that q = 1, a(1) = 1, and µ = c(1)T. Therefore, the following can be concluded:

• Every linear polynomial, i.e., of degree 1, can be perfectly represented by an AR(1) time-series.

• Every linear polynomial will have the same value of the coefficient in time-series, i.e., a (1) = 1.

• The constant term in the time-series is given by µ = c (1) T .

• This implies that every linear polynomial with different values of c(0) but the same value of c(1) will have the identical AR(1) representation, i.e., with the same values of a(1) and µ.

VOLUME 8, 2020

B. QUADRATIC POLYNOMIAL

In this subsection, data representation by a quadratic polynomial and an AR time-series is explored. For any quadratic polynomial, N has the value of 2 in equation (1). Thus, it follows from equation (1) that

$$ y(nT) = c(0) + c(1)\,nT + c(2)\,(nT)^{2}, \tag{3} $$
$$ y(nT-T) = c(0) + c(1)\,(nT-T) + c(2)\,(nT-T)^{2}, \tag{4} $$
$$ y(nT-2T) = c(0) + c(1)\,(nT-2T) + c(2)\,(nT-2T)^{2}. \tag{5} $$

Using equations (3) and (4), one can write

$$ y(nT) - y(nT-T) = c(1)\,T + c(2)\,(2nT^{2} - T^{2}), \tag{6} $$

and, using equations (4) and (5), one can write

$$ y(nT-T) - y(nT-2T) = c(1)\,T + c(2)\,(2nT^{2} - 3T^{2}). \tag{7} $$

Now, using equations (6) and (7), one finds

$$ y(nT) - 2y(nT-T) + y(nT-2T) = 2c(2)\,T^{2}. \tag{8} $$

Therefore,

$$ y(nT) = 2y(nT-T) - y(nT-2T) + 2c(2)\,T^{2}. \tag{9} $$

By removing T from the indices, equation (9) can be written as y(n) = 2y(n − 1) − y(n − 2) + 2c(2)T^2. Comparing this with equation (2) for AR(q), it is clear that q = 2, a(1) = 2, a(2) = −1, and µ = 2c(2)T^2. Therefore, the following can be concluded:

• Every quadratic polynomial, i.e., of degree 2, can be perfectly represented by an AR(2) time-series.

• Every quadratic polynomial will have the same coefficient values in time-series, i.e., a (1) = 2 and a (2) = −1.

• The constant term in the time-series is given by µ = 2c(2)T^2.

• This implies that every quadratic polynomial with different values of c (0) and c (1) but the same value of c (2) will have the identical AR(2) representation, i.e., with the same values of a (1) , a (2), and µ.
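The invariance in the last bullet can be checked with exact rational arithmetic. The following Python sketch is a minimal illustration, assuming T = 1/2 and two arbitrary quadratics chosen only for this example (they do not come from the paper): both share c(2) but differ in c(0) and c(1), and both should satisfy the same AR(2) recursion with a(1) = 2, a(2) = −1, and µ = 2c(2)T^2. The helper names are hypothetical.

```python
from fractions import Fraction as F

def sample_poly(coeffs, T, ns):
    """Evaluate y(nT) = sum_i c(i) (nT)^i exactly for each integer n in ns."""
    return [sum(c * (n * T) ** i for i, c in enumerate(coeffs)) for n in ns]

def ar2_residuals(y, mu):
    """Residuals of y(n) - [2 y(n-1) - y(n-2) + mu]; all zero if AR(2) fits exactly."""
    return [y[k] - (2 * y[k - 1] - y[k - 2] + mu) for k in range(2, len(y))]

T = F(1, 2)                      # hypothetical sampling period
c2 = F(3)                        # shared leading coefficient c(2)
quad_a = [F(1), F(-4), c2]       # c(0), c(1), c(2) (illustrative values)
quad_b = [F(-7), F(5, 3), c2]    # different c(0), c(1), same c(2)

mu = 2 * c2 * T ** 2             # mu = 2 c(2) T^2, as derived above
ns = range(-5, 6)
res_a = ar2_residuals(sample_poly(quad_a, T, ns), mu)
res_b = ar2_residuals(sample_poly(quad_b, T, ns), mu)
print(mu, all(r == 0 for r in res_a), all(r == 0 for r in res_b))
```

Exact `Fraction` arithmetic is used so that a zero residual reflects the algebra rather than floating-point luck.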

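In this subsection, data representation by a cubic polynomial and an AR time-series is explored. For any cubic polynomial, N has the value of 3 in equation (1). Thus, it follows from equation (1) that

$$ y(nT) = c(0) + c(1)\,nT + c(2)\,(nT)^{2} + c(3)\,(nT)^{3}, \tag{10} $$
$$ y(nT-T) = c(0) + c(1)\,(nT-T) + c(2)\,(nT-T)^{2} + c(3)\,(nT-T)^{3}, \tag{11} $$
$$ y(nT-2T) = c(0) + c(1)\,(nT-2T) + c(2)\,(nT-2T)^{2} + c(3)\,(nT-2T)^{3}, \tag{12} $$
$$ y(nT-3T) = c(0) + c(1)\,(nT-3T) + c(2)\,(nT-3T)^{2} + c(3)\,(nT-3T)^{3}. \tag{13} $$

Using equations (10) and (11), one can obtain

$$ y(nT) - y(nT-T) = c(1)\,T + c(2)\,(2n-1)\,T^{2} + c(3)\,(3n^{2}-3n+1)\,T^{3}, \tag{14} $$

and, using equations (11) and (12), one can obtain

$$ y(nT-T) - y(nT-2T) = c(1)\,T + c(2)\,(2n-3)\,T^{2} + c(3)\,(3n^{2}-9n+7)\,T^{3}. \tag{15} $$

Now, using equations (14) and (15), one obtains

$$ y(nT) - 2y(nT-T) + y(nT-2T) = 2c(2)\,T^{2} + 6c(3)\,(n-1)\,T^{3}. \tag{16} $$

Using equations (12) and (13), one can write

$$ y(nT-2T) - y(nT-3T) = c(1)\,T + c(2)\,(2n-5)\,T^{2} + c(3)\,(3n^{2}-15n+19)\,T^{3}. \tag{17} $$

Now, using equations (15) and (17), one can write

$$ y(nT-T) - 2y(nT-2T) + y(nT-3T) = c(2)\big[(2n-3)-(2n-5)\big]T^{2} + c(3)\big[(3n^{2}-9n+7)-(3n^{2}-15n+19)\big]T^{3}. \tag{18} $$

Thus,

$$ y(nT-T) - 2y(nT-2T) + y(nT-3T) = 2c(2)\,T^{2} + 6c(3)\,(n-2)\,T^{3}. \tag{19} $$

Combining equations (16) and (19), one obtains

$$ y(nT) - 3y(nT-T) + 3y(nT-2T) - y(nT-3T) = 6c(3)\,T^{3}. $$

Therefore,

$$ y(nT) = 3y(nT-T) - 3y(nT-2T) + y(nT-3T) + 6c(3)\,T^{3}. \tag{20} $$

By removing T from the indices, this can be written as y(n) = 3y(n − 1) − 3y(n − 2) + y(n − 3) + 6c(3)T^3. This can be described by AR(q), provided q = 3, a(1) = 3, a(2) = −3, a(3) = 1, and µ = 6c(3)T^3. Therefore, the following can be concluded: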

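• Every cubic polynomial, i.e., of degree 3, can be perfectly represented by an AR(3) time-series.

• Every cubic polynomial will have the same coefficient values in time-series, i.e., a(1) = 3, a(2) = −3, and a(3) = 1.

• The constant term in the time-series is given by µ = 6c(3)T^3.

• This implies that every cubic polynomial with different values of c(0), c(1), and c(2) but the same value of c(3) will have the identical AR(3) representation, i.e., with the same values of a(1), a(2), a(3), and µ.

The summary of the exposition so far is that all polynomials of degree 1 (linear), of degree 2 (quadratic), and of degree 3 (cubic) can be perfectly represented as AR time-series of orders 1, 2, and 3, respectively. Furthermore, for each degree of polynomial, all the time-series coefficients have predefined values and are specific integers, while the constant term µ has a predefined form that depends on the leading coefficient of the polynomial, the degree of the polynomial, and the sampling period. These and more specific details can be found in Table 1 above.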
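The pattern across the three subsections can be tabulated programmatically. The Python sketch below assumes the general forms that section III establishes, a(i) = (−1)^{i+1} C(q, i) with C(q, i) the binomial coefficient and µ = q! c(q) T^q, and simply lists them for q = 1, 2, 3; the function name `ar_from_poly_degree` is hypothetical. Its output should agree with the linear, quadratic, and cubic results derived above.

```python
from math import comb, factorial

def ar_from_poly_degree(q):
    """Return AR(q) coefficients a(1..q) and the multiplier m in mu = m * c(q) * T**q."""
    a = [(-1) ** (i + 1) * comb(q, i) for i in range(1, q + 1)]
    return a, factorial(q)

for q in (1, 2, 3):
    a, m = ar_from_poly_degree(q)
    print(f"degree {q}: a = {a}, mu = {m} c({q}) T^{q}")
```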

In section II it has been demonstrated that all polynomials of degree 1 (linear), of degree 2 (quadratic), and of degree 3 (cubic) can be perfectly represented as autoregressive time-series of orders 1, 2, and 3, respectively. In this section the exploration is generalised to polynomials of every finite degree. In section II it was found that, for q = 1, 2, and 3, the degree of the polynomial and the order of the corresponding AR time-series are identical. In the following, a discrete-time polynomial of degree q of the form

$$ y(nT) = \sum_{j=0}^{q} c(j)\,(nT)^{j} \tag{21} $$

is considered in seeking a corresponding autoregressive time-series model of order q, AR(q). The time-series in equation (2) can be rewritten as

$$ y(nT) = \sum_{i=1}^{q} a(i)\, y(nT - iT) + \mu. \tag{22} $$

Now it is conjectured that

$$ a(i) = (-1)^{i+1} \binom{q}{i} \tag{23} $$

for i = 1, 2, . . . , q. Using this conjecture and equation (21), equation (22) can be written as

$$ y(nT) - \mu = \sum_{j=0}^{q} c(j) \sum_{i=1}^{q} (-1)^{i+1} \binom{q}{i} (nT - iT)^{j}. \tag{24} $$

In the above double summation, it is instructive and revealing to consider different values of j separately.

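For the particular case of j = 0, the right-hand side of equation (24) can be written as

$$ c(0) \sum_{i=1}^{q} (-1)^{i+1} \binom{q}{i}. $$

The relation 0.154.6 on page 4 of [37], for q ≥ n ≥ 1 and 0^0 ≡ 1, can be adapted to

$$ \sum_{i=0}^{q} (-1)^{i} \binom{q}{i}\, i^{\,n-1} = 0. \tag{25} $$

Using this relation, for q ≥ n = 1,

$$ \sum_{i=1}^{q} (-1)^{i+1} \binom{q}{i} = 1. \tag{26} $$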

Therefore, for the case of j = 0, the right-hand side of equation (24) is c (0).

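For the case of j = 1, the right-hand side of equation (24) can be written as

$$ c(1) \sum_{i=1}^{q} (-1)^{i+1} \binom{q}{i} (nT - iT) = c(1)\,nT \sum_{i=1}^{q} (-1)^{i+1} \binom{q}{i} - c(1)\,T \sum_{i=1}^{q} (-1)^{i+1} \binom{q}{i}\, i. \tag{27} $$

Using equation (25), for q ≥ n > 1, one can write

$$ \sum_{i=1}^{q} (-1)^{i+1} \binom{q}{i}\, i^{\,n-1} = 0. \tag{28} $$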

Using equations (26) and (28) in equation (27), it is found that the right-hand side of equation (24), for the case of j = 1, is equal to c (1) nT .

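Now the case of j = 2 is considered. The right-hand side of equation (24) can be written as

$$ c(2)\left[ (nT)^{2} \sum_{i=1}^{q} (-1)^{i+1} \binom{q}{i} - 2nT^{2} \sum_{i=1}^{q} (-1)^{i+1} \binom{q}{i}\, i + T^{2} \sum_{i=1}^{q} (-1)^{i+1} \binom{q}{i}\, i^{2} \right]. $$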

Using equations (26) and (28) in the previous expression for the right-hand side of equation (24), for the case of j = 2, the right-hand side is found to be equal to c(2)(nT)^2.

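Similarly, for each value of j up to j = q − 1, expanding (nT − iT)^j with the binomial theorem shows that the right-hand side of equation (24) is equal to

$$ c(j) \sum_{k=0}^{j} \binom{j}{k} (nT)^{j-k} (-T)^{k} \sum_{i=1}^{q} (-1)^{i+1} \binom{q}{i}\, i^{k} = c(j)\,(nT)^{j}. $$

The k = 0 term gives c(j)(nT)^j by equation (26). Every term with k ≥ 1 equates to zero according to equation (28), which is valid if the top range of the summation is larger than or equal to the power of i plus one; for the largest power, k = j = q − 1, this condition reads q ≥ (q − 1) + 1.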

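However, when j = q, there is a term of the form

$$ c(q)\,(-T)^{q} \sum_{i=1}^{q} (-1)^{i+1} \binom{q}{i}\, i^{q}. $$

For this term, equation (28) is not valid since the top range of the summation, i.e., q, is neither larger than nor equal to the power of i plus one, i.e., (q + 1). To deal with the case of j = q, the relation 0.154.4 on page 4 of [37] (stated there for n ≥ 0 and 0^0 ≡ 1) can be adapted to

$$ \sum_{i=0}^{q} (-1)^{i} \binom{q}{i}\, i^{q} = (-1)^{q}\, q!. \tag{29} $$

Thus,

$$ \sum_{i=1}^{q} (-1)^{i+1} \binom{q}{i}\, i^{q} = -(-1)^{q}\, q!. \tag{30} $$

Thus, for the case of j = q, the right-hand side of equation (24) can be written as

$$ c(q) \sum_{k=0}^{q} \binom{q}{k} (nT)^{q-k} (-T)^{k} \sum_{i=1}^{q} (-1)^{i+1} \binom{q}{i}\, i^{k}. $$

Using equation (26), the first term (k = 0) is c(q)(nT)^q. All the terms in the middle (1 ≤ k ≤ q − 1) are zero by virtue of equation (28). Using equation (30), the last term (k = q) is found to be c(q)(−T)^q(−(−1)^q q!), which is equal to −c(q)T^q q!. Adding all the results for j = 0, 1, . . . , q, one obtains

$$ y(nT) - \mu = \sum_{j=0}^{q} c(j)\,(nT)^{j} - q!\, c(q)\, T^{q} = y(nT) - q!\, c(q)\, T^{q}. $$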

Therefore, all noise-free data from uniformly sampled polynomials of finite degree q can be perfectly represented by an autoregressive time-series model of order q such that

$$ a(i) = (-1)^{i+1} \binom{q}{i}, \quad i = 1, 2, \ldots, q, $$

and

$$ \mu = q!\, c(q)\, T^{q}. $$
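Both the identities behind this proof and its conclusion can be checked numerically for small q. The Python sketch below is a sanity check, not a proof. `signed_sum` evaluates S(q, k) = Σ_{i=1}^{q} (−1)^{i+1} C(q, i) i^k exactly; it should equal 1 for k = 0 (equation (26)), 0 for 1 ≤ k ≤ q − 1 (equation (28)), and −(−1)^q q! for k = q (equation (30)). `ar_reproduces_poly` then samples an arbitrary degree-q polynomial with rational coefficients at t = nT and confirms, in exact arithmetic, that the AR(q) recursion with the coefficients and constant above reproduces every sample. The helper names, the random coefficients, and T = 1/3 are illustrative assumptions.

```python
import random
from fractions import Fraction as F
from math import comb, factorial

def signed_sum(q, k):
    """S(q, k) = sum_{i=1}^{q} (-1)^(i+1) C(q, i) i^k, in exact integers."""
    return sum((-1) ** (i + 1) * comb(q, i) * i ** k for i in range(1, q + 1))

def identities_hold(q):
    """Check equations (26), (28) and (30) for one value of q."""
    return (signed_sum(q, 0) == 1
            and all(signed_sum(q, k) == 0 for k in range(1, q))
            and signed_sum(q, q) == -((-1) ** q) * factorial(q))

def ar_reproduces_poly(q, T, n_samples=40, seed=0):
    """Sample a random degree-q polynomial at t = nT and test the AR(q) recursion exactly."""
    rng = random.Random(seed)
    c = [F(rng.randint(-9, 9), rng.randint(1, 5)) for _ in range(q)]
    c.append(F(rng.choice([-3, -2, -1, 1, 2, 3]), rng.randint(1, 5)))  # c(q) != 0
    y = [sum(cj * (n * T) ** j for j, cj in enumerate(c)) for n in range(n_samples)]
    a = [(-1) ** (i + 1) * comb(q, i) for i in range(1, q + 1)]
    mu = factorial(q) * c[q] * T ** q
    # y(n) = sum_i a(i) y(n - i) + mu must hold wherever all q lags exist
    return all(y[n] == sum(a[i - 1] * y[n - i] for i in range(1, q + 1)) + mu
               for n in range(q, n_samples))

print([identities_hold(q) for q in range(1, 9)])
print([ar_reproduces_poly(q, T=F(1, 3)) for q in range(1, 8)])
```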

In this section some explorations are carried out for different types of data sources to illustrate a few themes. In reality, all real data have uncertainties; therefore, it is important to study sensitivities to degrees and types of uncertainties. Yet, in these explorations all generated data are error-free. Here the objectives are to underpin some theoretical results and to generate some intuitions from precise data and theoretical results, and not to get distracted into studying effects of noise interference. Two applications to real data, the current Covid-19 data from the UK and the US, are clearly not noise-free but are offered as real examples.

In the first four of these explorations, N data are generated. These are then modelled by polynomials as in equation (21) and by time-series as in equation (2). When considering a polynomial of degree p, the first (p + 1) data are used to evaluate the (p + 1) coefficients of this polynomial; this works because the data are error-free. On the other hand, when considering a time-series of order q, the first 2q data are used to evaluate the q coefficients of this time-series, since each of the q equations for n = q + 1, . . . , 2q needs the q preceding data values.

Here N data are generated from a polynomial of degree 3,

This is a finite degree polynomial with no steady state. For each value of the degree p of the polynomial from p = 1, . . . , 9, the first (p + 1) data are used to calculate the (p + 1) coefficients of the polynomial. Using these polynomial coefficients, the remaining (N − p − 1) data values are predicted; these are labelled as yp(n) for n = (p + 2), . . . , N . Similarly, for each value of the time-series order of q from q = 1, . . . , 9, the first 2q data are used to calculate the q coefficients of the time-series. Using these coefficients, the remaining (N − 2q) data values are predicted; these are labelled as yt(n) for n = (2q + 1), . . . , N .
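The fitting and prediction procedure above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' code: the cubic used here, y(n) = 2 − n + 0.5n^2 + 0.25n^3 on n = 0, . . . , 34, is a hypothetical stand-in (the coefficients of the cubic in this case are not given in this text); the AR fit has no constant term, as in the sine-wave fit y(n) = a(1)y(n − 1) + a(2)y(n − 2) described in section V; and predictions are one-step-ahead from the preceding true values.

```python
import numpy as np

def fit_poly_exact(n, y, p):
    """Solve for c(0..p) from the first p+1 samples (exact for error-free data)."""
    V = np.vander(n[: p + 1], p + 1, increasing=True)
    return np.linalg.solve(V, y[: p + 1])

def fit_ar_exact(y, q):
    """Solve y(k) = sum_i a(i) y(k-i) for k = q..2q-1, i.e. using the first 2q samples."""
    A = np.array([y[k - q:k][::-1] for k in range(q, 2 * q)])
    return np.linalg.solve(A, y[q:2 * q])

def rms(e):
    return float(np.sqrt(np.mean(np.square(e))))

n = np.arange(0, 35, dtype=float)
y = 2 - n + 0.5 * n**2 + 0.25 * n**3          # hypothetical cubic test signal

p = 3
c = fit_poly_exact(n, y, p)
yp = np.vander(n[p + 1:], p + 1, increasing=True) @ c   # predictions for the remaining samples
rms_poly = rms(yp - y[p + 1:])

results = {}
for q in (3, 4):
    a = fit_ar_exact(y, q)
    # one-step-ahead prediction of samples 2q..N-1 from the q preceding true values
    yt = np.array([a @ y[k - q:k][::-1] for k in range(2 * q, len(y))])
    results[q] = (a, rms(yt - y[2 * q:]))

print(rms_poly, results[4][0], results[4][1], results[3][1])
```

Without a constant term, the cubic needs order 4 rather than 3: the fitted AR(4) coefficients come out close to (4, −6, 4, −1), the homogeneous form whose characteristic polynomial (z − 1)^4 has all four roots at z = 1.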

For the same values of (p + 1) and q, (N − p − 1) data values are predicted for the polynomial and (N − 2q) data values for the time-series. The RMS prediction error (polynomial) is depicted in Figure 1a) as a function of (p + 1), while the RMS prediction error (time-series) is shown in Figure 1b) as a function of q. The prediction error at (p + 1) = 4 is (6.6 × 10^−13 ± 2.7 × 10^−12), while the prediction error at q = 4 is (7.2 × 10^−10 ± 1.6 × 10^−9); both are extremely small. Figure 2a) shows the data versus the time index, while Figure 2b) depicts the prediction errors versus the time index for (p + 1) = 4 (polynomial in red) and for q = 4 (time-series in green). The results confirm that these data from a finite degree polynomial can be equally well described by both polynomial and time-series representations.

Here N data are generated from a sine wave y (n) = sin(2π n/16), for n = −17 : 1 : 17.

This represents an infinite degree polynomial and has no steady state, but its values are bounded between −1 and +1. The procedures for calculating the (p + 1) coefficients of the polynomial and the q coefficients of the AR time-series are the same as described in Case I earlier. The procedures for calculating the prediction error (polynomial) and the prediction error (time-series) have also been described in Case I.

The RMS prediction error (polynomial) is depicted in Figure 3a) as a function of (p + 1), while the RMS prediction error (time-series) is shown in Figure 3b) as a function of q. The error at (p + 1) = 1 is (6.9 ± 3.8), while the error at q = 2 is (2.5 × 10^−18 ± 1.4 × 10^−15). Also, the error at (p + 1) = 6 is (422 ± 614), while the error at q = 2 is (1.3 × 10^−16 ± 1.4 × 10^−15). Figure 4a) shows the data versus the time index, while Figure 4b) depicts the prediction errors versus the time index for (p + 1) = 1 (polynomial in red) and for q = 2 (time-series in green). The results confirm that these data from a sine wave are extremely well described by a time-series representation of only order 2; there is a theoretical reason for this (see section V for an explanation). This time-series representation is also far better than any finite degree polynomial representation.

Here N data are generated from a non-polynomial y(n) = 1 − 2n − 3n(n − 1) + (0.5)^n, for n = −17 : 1 : 17.

This represents an infinite degree polynomial and has no steady state. The procedures for calculating the (p + 1) coefficients of the polynomial and the q coefficients of the time-series are the same as described in Case I earlier. The procedures for calculating the prediction error (polynomial) and the prediction error (time-series) have also been described in Case I.

The RMS prediction error (polynomial) is depicted in Figure 5a) as a function of (p + 1), while the RMS prediction error (time-series) is shown in Figure 5b) as a function of q. The prediction error at (p + 1) = 3 is (6.6 × 10^6 ± 4.9 × 10^6), while the prediction error at q = 4 is (−1.2 × 10^−8 ± 8.7 × 10^−8). Also, the prediction error at (p + 1) = 5 is (7.3 × 10^7 ± 8.9 × 10^7), while the prediction error at q = 4 is (−1.2 × 10^−8 ± 8.7 × 10^−8). Figure 6a) shows the data versus the time index, while Figure 6b) depicts the prediction errors versus the time index for (p + 1) = 5 (polynomial in red) and for q = 4 (time-series in green). Thus, these data from a non-polynomial are significantly better described by a time-series representation of only order 4; the theoretical reason can be found in section V. The RMS (time-series) is also many orders of magnitude smaller than the RMS (any finite degree polynomial).
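Why order 4 suffices can be seen from the characteristic roots. The Python sketch below is a minimal illustration, assuming NumPy and the Case I procedure (a homogeneous AR(4) without a constant term, fitted to the first 8 samples). It should recover coefficients close to 3.5, −4.5, 2.5, −0.5, i.e., the expansion of (z − 1)^3(z − 0.5): three roots at z = 1 for the quadratic part 1 − 2n − 3n(n − 1) and one root at z = 0.5 for (0.5)^n. These coefficients are non-integers, in line with the argument of section V.

```python
import numpy as np

n = np.arange(-17, 18, dtype=float)
y = 1 - 2 * n - 3 * n * (n - 1) + 0.5 ** n     # Case III signal

q = 4
# First 2q samples: y(k) = sum_i a(i) y(k - i) for k = q..2q-1 (no constant term)
A = np.array([y[k - q:k][::-1] for k in range(q, 2 * q)])
a = np.linalg.solve(A, y[q:2 * q])

# Characteristic polynomial z^4 - a(1) z^3 - a(2) z^2 - a(3) z - a(4)
roots = np.roots(np.concatenate(([1.0], -a)))
print(np.round(a, 6), np.round(np.sort(roots.real), 3))
```

The triple root at z = 1 is numerically sensitive, so the computed roots scatter slightly around 1 even though the coefficients themselves are recovered accurately.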

Here N data are generated from an inverse polynomial y (n) = 1/(1 + 0.2n), for n = 0 : 1 : 34.

This represents an infinite degree polynomial. It has neither a finite degree polynomial representation nor a finite order time-series representation. The procedures for calculating the (p + 1) coefficients of the polynomial and the q coefficients of the time-series are the same as described in Case I earlier, as are the procedures for calculating the prediction error (polynomial) and the prediction error (time-series).

The RMS prediction error (polynomial) is depicted in Figure 7a) as a function of (p + 1), while the RMS prediction error (time-series) is shown in Figure 7b) as a function of q. The prediction error at (p + 1) = 3 is (8.2 ± 6.7), while the prediction error at q = 3 is (−4.7 × 10^−4 ± 1.7 × 10^−4). Also, the prediction error at (p + 1) = 7 is (476 ± 617), while the prediction error at q = 7 is (−1.1 × 10^−7 ± 9.5 × 10^−8). Figure 8a) shows the data versus the time index, while Figure 8b) depicts the prediction errors versus the time index for (p + 1) = 3 (polynomial in red) and for q = 3 (time-series in green). The results confirm that these data from an inverse polynomial are significantly better described by an AR time-series representation than by a finite degree polynomial representation, with an RMS error several orders of magnitude smaller.

This is an example of using real data from a current global Covid-19 epidemic as it is unfolding. The dataset represents cumulative daily confirmed cases of Covid-19 infections in the UK. This dataset is publicly available [38] . On 01 April 2020 there were 61 data (i.e., N = 61) covering the period from 31 January 2020 to 31 March 2020. Thus y(n) for n = 1 : 1 : 61 represents the cumulative daily confirmed cases of Covid-19 infections in the UK.

Of these 61 data, the first 50 data are used for estimating the free parameters and the last 11 data are used for forecasting. For a polynomial of degree p, the first 50 data are used to estimate the (p + 1) coefficients of the polynomial using the Moore-Penrose inverse. By adopting equation (21), the first 50 data can be described by the matrix equation

$$ Y = A\,C, \quad Y = \begin{bmatrix} y(1) \\ \vdots \\ y(50) \end{bmatrix}, \quad C = \begin{bmatrix} c(0) \\ \vdots \\ c(p) \end{bmatrix}, \quad A = \begin{bmatrix} 1 & T & \cdots & T^{p} \\ \vdots & \vdots & & \vdots \\ 1 & 50T & \cdots & (50T)^{p} \end{bmatrix}. \tag{31} $$

Thus, Y is a column vector of size 50 × 1, C is a column vector of size (p + 1) × 1, and A is a matrix of size 50 × (p + 1). Now,

$$ C = A^{+} Y = \left(A^{\mathsf{T}} A\right)^{-1} A^{\mathsf{T}} Y. \tag{32} $$

Using these estimated polynomial coefficients from equation (32), all 61 data are calculated using

$$ Y_{P} = B\,C, \tag{33} $$

where B is a matrix of size 61 × (p + 1) whose n-th row is [1, nT, (nT)^2, . . . , (nT)^p] for n = 1, 2, . . . , 61. Similarly, for a time-series of order q, the first 50 data are used to estimate the q coefficients of the time-series. Each of these data values depends on the coefficients and earlier data values. As all data values are error-prone, Total Least Squares, which takes account of errors in both the dependent and independent variables, is more appropriate than ordinary Least Squares, which takes account of errors only in the dependent variables. It is not known a priori whether the data can be represented by a finite degree polynomial or a finite order time-series. The RMS error at (p + 1) = 5 is 5142, while the RMS error at q = 2 is 539. Clearly, the time-series representation is much more accurate. The RMS error at (p + 1) = 6 is 1711, much smaller than at lower values of p, but it is still much larger than the one from the time-series representation. To get a better idea of the fit (and not the predictions), Figure 10 plots data values (y) at n = 2, 3, . . . , 40 in blue, the corresponding fitted values (yp) in red according to the polynomial representation at (p + 1) = 6, and the corresponding fitted values (yt) in green according to the autoregressive time-series representation of order 2.

It is clear that the polynomial representation picks up the trend of the later data values, but it completely fails for the first half of the data values. On the other hand, this autoregressive time-series of order 2 picks up the trend over the whole range of the data values. The results confirm that the UK Covid-19 data are significantly better described by an AR time-series of order 2 (with a lower RMS error) than by a polynomial of degree 5 (and of other degrees).
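The two estimators used in this case can be sketched in Python with NumPy. This is a minimal sketch, not the authors' code: the polynomial coefficients follow equations (31) to (33) via `numpy.linalg.pinv`, and the AR(q) coefficients use the standard SVD-based Total Least Squares solution, which is assumed here because the text does not give its implementation. T = 1 (one day), the function names, and the noise-free test sequences are hypothetical; the UK series from [38] is not reproduced.

```python
import numpy as np

def poly_fit_pinv(y_fit, p, T=1.0):
    """Equations (31)-(32): C = A^+ Y with A[n-1, j] = (nT)^j, n = 1..len(y_fit)."""
    n = np.arange(1, len(y_fit) + 1) * T
    A = np.vander(n, p + 1, increasing=True)
    return np.linalg.pinv(A) @ y_fit

def poly_eval(C, n_total, T=1.0):
    """Equation (33): Y_P = B C with B[n-1, j] = (nT)^j, n = 1..n_total."""
    n = np.arange(1, n_total + 1) * T
    return np.vander(n, len(C), increasing=True) @ C

def ar_fit_tls(y_fit, q):
    """TLS estimate of a(1..q) in y(k) = sum_i a(i) y(k-i) (no constant term)."""
    X = np.array([y_fit[k - q:k][::-1] for k in range(q, len(y_fit))])
    Z = np.column_stack([X, y_fit[q:]])          # augmented matrix [X | y]
    _, _, Vt = np.linalg.svd(Z)
    v = Vt[-1]                                   # right singular vector of the smallest singular value
    return -v[:q] / v[q]                         # v is proportional to [a(1..q), -1]

# Hypothetical noise-free checks on 50 samples: a cubic and an AR(2) sequence.
n = np.arange(1, 51, dtype=float)
y_poly = 5 + 2 * n - 0.3 * n**2 + 0.01 * n**3
C = poly_fit_pinv(y_poly, 3)

y_ar = np.empty(50)
y_ar[:2] = [1.0, 1.1]
for k in range(2, 50):
    y_ar[k] = 1.9 * y_ar[k - 1] - 0.95 * y_ar[k - 2]
a = ar_fit_tls(y_ar, 2)
print(np.round(C, 6), np.round(a, 6))
```

With noisy real data the smallest singular value of [X | y] is no longer zero, and the TLS solution corresponds to the smallest joint perturbation of X and y that makes the equations consistent.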

This is another example of using real data. The dataset represents cumulative daily confirmed cases of Covid-19 infections in the US. This dataset is publicly available [39] . On 04 April 2020 there were 25 data (i.e., N = 25) covering the period from 10 March 2020 to 03 April 2020. Thus y(n) for n = 1 : 1 : 25 represents the cumulative daily confirmed cases of Covid-19 infections in the US.

Of these 25 data, the first 15 data are used for estimating the free parameters and the last 10 data are used for forecasting. For a polynomial of degree p, the first 15 data are used to estimate the (p + 1) coefficients of the polynomial using the Moore-Penrose inverse in much the same way as for Case V above. Using these estimated polynomial coefficients, all 25 data are calculated in a similar manner to Case V, so that YP is a column vector of size 25 × 1. Similarly, for a time-series of order q, the first 15 data are used to estimate the q coefficients of the time-series, again using Total Least Squares because each data value depends on the coefficients and on earlier, error-prone data values. It is not known a priori whether the data can be represented by a finite degree polynomial or a finite order time-series. The RMS error at (p + 1) = 4 is 15272, while the RMS error at q = 3 is 6533. In Figure 11a), the first 15 values are from the fit and the last 10 values are predictions. To get a closer look at the predictions, Figure 11b) shows the last 10 data values (y) in blue, the 10 predicted values (yp) in red according to the polynomial representation at (p + 1) = 4, and the 10 predicted values (yt) in green according to the autoregressive time-series representation of order 3. RMS errors increase for other choices of (p + 1). Clearly, the time-series representation is much more accurate.

Looking for better results with a higher degree polynomial, the RMS error at (p + 1) = 5 is found to be 34692, which is significantly larger than the value from the time-series representation at q = 2. Figures 12a) and 12b) for (p + 1) = 5 can be described in the same way as Figure 11 for (p + 1) = 4. The results confirm that these US Covid-19 data are significantly better described by an AR time-series of order 3 than by a polynomial of degree 3 (and of other degrees). Table 2 provides a summary of these six cases. Data from a polynomial of finite degree can be represented equally well by a finite degree polynomial and by a finite order time-series with specific integer coefficients, while data from other sources are represented significantly better by time-series representations. In many cases, finite order time-series can theoretically represent data from infinite degree polynomials extremely well. Therefore, whenever knowledge of the polynomial coefficients is not necessary in an application, one may choose to use a time-series representation.

It has been demonstrated in sections II and III that all data from polynomials of finite degree q can be perfectly represented by a time-series of order q, if µ is not zero. The coefficients of such time-series are always integers of a specified form. The following demonstrates what time-series with other forms of coefficients (either non-integers or integers of different forms) represent.

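Equation (2) is called non-homogeneous if µ in equation (2) is not zero [40], [41]. Then equation (2) can be combined with its equivalent form

$$ y(n+1) = \sum_{i=1}^{q} a(i)\, y(n+1-i) + \mu \tag{34} $$

to obtain (by eliminating µ)

$$ y(n+1) = \sum_{i=1}^{q+1} \left[ a(i) - a(i-1) \right] y(n+1-i), \tag{35} $$

with a(0) ≡ −1 and a(q + 1) ≡ 0. This is a homogeneous equation. The corresponding characteristic polynomial has (q + 1) roots, i.e., z(1), z(2), . . . , z(q + 1). When these roots are distinct,

$$ y(n) = \sum_{k=1}^{q+1} b(k)\, z(k)^{n}. \tag{36} $$

On the other hand, when there are repeated roots, the solution is different. For only two repeated roots, e.g., z(1) = z(2) = z,

$$ y(n) = \left[ b(1) + b(2)\, n \right] z^{n} + \sum_{k=3}^{q+1} b(k)\, z(k)^{n}. \tag{37} $$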

Each of the two solutions in equations (36) and (37) describes a polynomial of infinite degree. Hence, finite order time-series with other forms of coefficients (either non-integers or integers of different forms) represent polynomials of infinite degree.
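Equation (35) is easy to apply mechanically. The Python sketch below (assuming NumPy; the helper names and the non-integer example a = (1.5, −0.56) are hypothetical) builds the homogeneous coefficients a(i) − a(i − 1) and the roots of the corresponding characteristic polynomial. For the AR(3) of a cubic, all four roots sit at z = 1; for the non-integer example, z = 1 appears alongside roots at 0.7 and 0.8, whose terms b(k)z(k)^n in equation (36) are not finite degree polynomials in n.

```python
import numpy as np

def homogeneous_coeffs(a):
    """Equation (35): coefficients a(i) - a(i-1), i = 1..q+1, with a(0) = -1, a(q+1) = 0."""
    ext = np.concatenate(([-1.0], np.asarray(a, dtype=float), [0.0]))
    return ext[1:] - ext[:-1]

def char_roots(h):
    """Roots of z^(q+1) - sum_i h(i) z^(q+1-i)."""
    return np.roots(np.concatenate(([1.0], -h)))

h_cubic = homogeneous_coeffs([3, -3, 1])      # AR(3) of any cubic, equation (20)
h_other = homogeneous_coeffs([1.5, -0.56])    # hypothetical non-integer AR(2)
print(h_cubic, np.round(char_roots(h_cubic), 3))
print(h_other, np.round(np.sort(char_roots(h_other).real), 6))
```

A root at z = 1 always appears, because the coefficients a(i) − a(i − 1) telescope to a(q + 1) − a(0) = 1, so the characteristic polynomial vanishes at z = 1.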

In case II above, N data were generated from a sine wave y (n) = sin(2π n/16), for n = −17 : 1 : 17.

Fitting y(n) = a(1)y(n − 1) + a(2)y(n − 2) to the first 4 data values, it was found that a(1) = 1.8478 and a(2) = −1.000. These give rise to the characteristic polynomial z^2 − 1.8478z + 1 = 0. The two roots are z(1) = 0.9239 + 0.3827j and z(2) = 0.9239 − 0.3827j. As these two roots are distinct, the solution is given by y(n) = b(1)z(1)^n + b(2)z(2)^n.
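This fit can be reproduced in a few lines. The Python sketch below (assuming NumPy; variable names are illustrative) solves the two equations given by the first four samples and should recover a(1) = 2cos(π/8) ≈ 1.8478 and a(2) = −1, with both roots on the unit circle at angles ±π/8, i.e., at the frequency 2π/16 of the sine wave.

```python
import numpy as np

n = np.arange(-17, 18)
y = np.sin(2 * np.pi * n / 16)

# y(n) = a(1) y(n-1) + a(2) y(n-2) for the 3rd and 4th samples, using only the first 4 values
A = np.array([[y[1], y[0]],
              [y[2], y[1]]])
a = np.linalg.solve(A, y[2:4])

roots = np.roots([1.0, -a[0], -a[1]])   # z^2 - a(1) z - a(2)
print(np.round(a, 4), np.round(roots, 4), np.round(np.abs(roots), 6))
```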

This is yet another example of how a finite order time-series (in this case of order 4) can represent perfectly this polynomial of infinite degree.

In this section a connection between polynomials and all-pole filters is demonstrated.

It is well known that AR time-series models can be realised with all-pole filters. It has already been proven in Section III that all polynomials of finite degree q can be represented by AR time-series of order q (as in equation (2)). Using the z-transform, equation (2) can be written as

$$ Y(z) = \frac{U(z)}{1 - \sum_{i=1}^{q} a(i)\, z^{-i}}, $$

where U(z) denotes the z-transform of the constant input sequence µ,

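and it has been proven in Section III that a(i) = (−1)^{i+1} C(q, i), with C(q, i) the binomial coefficient, for i = 1, 2, . . . , q. So, the denominator polynomial can now be written as

$$ 1 - \sum_{i=1}^{q} (-1)^{i+1} \binom{q}{i} z^{-i} = \sum_{i=0}^{q} (-1)^{i} \binom{q}{i} z^{-i} = \left(1 - z^{-1}\right)^{q}. $$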

Therefore, all polynomials of finite degree q map onto z = 1 in the z-plane through the q repeated roots of this denominator.
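The denominator identity can be confirmed by comparing coefficient vectors. In the Python sketch below (assuming NumPy; the function names are hypothetical), `denominator_from_ar` lists the coefficients of 1 − Σ a(i) z^{−i} in powers z^0, z^{−1}, . . . , z^{−q} using the Section III coefficients, and `one_minus_zinv_power` expands (1 − z^{−1})^q by repeated convolution with [1, −1].

```python
import numpy as np
from math import comb

def denominator_from_ar(q):
    """Coefficients of 1 - sum_i a(i) z^-i, with a(i) = (-1)^(i+1) C(q, i)."""
    a = [(-1) ** (i + 1) * comb(q, i) for i in range(1, q + 1)]
    return np.array([1] + [-ai for ai in a])

def one_minus_zinv_power(q):
    """Coefficients of (1 - z^-1)^q by repeated convolution with [1, -1]."""
    d = np.array([1])
    for _ in range(q):
        d = np.convolve(d, [1, -1])
    return d

for q in range(1, 7):
    assert np.array_equal(denominator_from_ar(q), one_minus_zinv_power(q))
print(denominator_from_ar(3), one_minus_zinv_power(3))
```

Multiplying (1 − z^{−1})^q by z^q gives (z − 1)^q, which makes the q repeated poles at z = 1 explicit.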

All roots on the unit circle other than z = −1 and z = 1 are complex. For a real-valued time-series, complex roots come in complex conjugate pairs. Consider just one such pair for illustration, i.e., z(1) = e^{−jθ} and z(2) = e^{+jθ}. Thus,

$$ y(n) = b(1)\, z(1)^{n} + b(2)\, z(2)^{n} = b(1)\, e^{-j\theta n} + b(2)\, e^{+j\theta n}. $$

Since y(n) is real-valued, either b(1) = b(2) ≡ b and y(n) = 2b cos(θn), or b(1) = −b(2) ≡ −b/j and y(n) = 2b sin(θn). Therefore, each pair of complex conjugate roots represents either a cosine or a sine, which can be described by an AR time-series of order 2 instead of a polynomial of infinite degree. The corresponding time-series coefficients are a(1) = 2cos(θ) and a(2) = −1.

Again, for a real-valued time-series, complex roots come in complex conjugate pairs. Consider just one pair for illustration, i.e., z(1) = βe^{−jθ} and z(2) = βe^{+jθ}, with 0 < β < 1. In this case, y(n) = b(1)z(1)^n + b(2)z(2)^n = b(1)β^n e^{−jθn} + b(2)β^n e^{+jθn}. Since y(n) is real-valued, either b(1) = b(2) ≡ b and y(n) = 2bβ^n cos(θn), or b(1) = −b(2) ≡ −b/j and y(n) = 2bβ^n sin(θn). Therefore, each pair of complex conjugate roots represents either a damped cosine or a damped sine, which can be described by an AR time-series of order 2 instead of a polynomial of infinite degree. The corresponding time-series coefficients are a(1) = 2βcos(θ) and a(2) = −β^2.
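The order-2 claim can be checked directly on sampled signals. The Python sketch below (assuming NumPy; β = 0.9 and θ = 0.4 are hypothetical values) applies the recursion y(n) = 2βcos(θ)y(n − 1) − β^2 y(n − 2) to both a damped cosine and a damped sine and reports the largest residual; setting β = 1 covers the undamped case of the previous paragraph.

```python
import numpy as np

beta, theta = 0.9, 0.4                      # hypothetical damping factor and angle
a1, a2 = 2 * beta * np.cos(theta), -beta ** 2
n = np.arange(40)
signals = {"damped cosine": beta ** n * np.cos(theta * n),
           "damped sine": beta ** n * np.sin(theta * n)}

max_resid = {}
for name, y in signals.items():
    # residual of y(n) - [a(1) y(n-1) + a(2) y(n-2)] for n = 2..39
    resid = y[2:] - (a1 * y[1:-1] + a2 * y[:-2])
    max_resid[name] = float(np.max(np.abs(resid)))
print(max_resid)
```

The residuals sit at floating-point rounding level because the recursion follows from the identity cos(A + B) + cos(A − B) = 2cos(A)cos(B) (and its sine counterpart).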

Let −1 < z(1), z(2), z(3) < 1 be the three distinct roots of the denominator polynomial. Then y(n) = b(1)z(1)^n + b(2)z(2)^n + b(3)z(3)^n. This can be described by an AR time-series of order 3 rather than a polynomial of infinite degree.

On the other hand, if −1 < z(1) = z(2) = z(3) ≡ z < 1 are three repeated roots of the denominator polynomial, then y(n) = z^n[b(1) + b(2)n + b(3)n(n − 1)]. This is another example of a finite order AR time-series representing data that require a polynomial of infinite degree.

The two lessons are: 1) All polynomials of degree q can be represented by an all-pole filter with q repeated roots (or poles) at z = +1. 2) Data representable by finite order all-pole filters, whether they are from finite degree or infinite degree polynomials, can be described by a finite order AR time-series.
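Lesson 1 can be illustrated by running the all-pole filter with q poles at z = +1 from rest and driving it with the constant term µ. The sketch assumes zero initial conditions ($y(n) = 0$ for $n < 0$) and the binomial coefficient form obtained from $(1 - z^{-1})^q$; under those assumptions the output equals $\mu\binom{n+q}{q}$, a polynomial of degree q in n. The function name and parameter values are hypothetical.

```python
from math import comb


def step_response_repeated_unit_poles(q, mu, length):
    """Run y(n) = sum_k a(k) y(n-k) + mu from rest, where a(k) come from
    (1 - z^-1)^q, i.e. q repeated poles at z = +1 (assumed form)."""
    a = [(-1) ** (k + 1) * comb(q, k) for k in range(1, q + 1)]
    y = []
    for n in range(length):
        acc = mu  # constant term acts as the driving input
        for k in range(1, q + 1):
            if n - k >= 0:  # zero initial conditions: y(n) = 0 for n < 0
                acc += a[k - 1] * y[n - k]
        y.append(acc)
    return y


q, mu = 3, 2  # illustrative values (assumed)
y = step_response_repeated_unit_poles(q, mu, 10)
print(y)  # [2, 8, 20, 40, 70, 112, 168, 240, 330, 440]
```

The output matches $2\binom{n+3}{3}$, a cubic in n, so three poles at z = +1 plus a constant reproduce a degree-3 polynomial.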

Two data modelling techniques are polynomial representation and time-series representation. In this paper, all theoretical studies exploring their connections and differences have been based on uniformly sampled data in the absence of errors. It has been proven that all data from an underlying polynomial model of finite degree q as in equation (21) can be represented perfectly by either a polynomial of degree q or an autoregressive time-series of order q and a constant term. Also, it has been proven that all polynomials of degree q can be described by the same set of time-series coefficients, with the only possible difference being in the constant term µ as in equation (2). These time-series coefficients are integers of a specific form. It was also demonstrated that time-series with either non-integer coefficients or integer coefficients not of the specific form represent polynomials of infinite degree. Explorations in four cases with generated data and in two cases with real data demonstrated that, while finite-degree polynomial and finite-order time-series representations are equally good for data following finite-degree polynomial forms, finite-order autoregressive time-series representations offer significant advantages in modelling data from other sources. All polynomials of degree q can be represented by an all-pole filter with q repeated roots (or poles) at z = +1.

Theoretically, all data representable by a finite-order all-pole filter, whether they come from finite-degree or infinite-degree polynomials, can be described by a finite-order AR time-series. If the values of the polynomial coefficients are not needed in an application, one may choose finite-order time-series representations, as they are more general than finite-degree polynomial representations.
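In practice, the AR coefficients and the constant would be estimated from data. The sketch below uses ordinary least squares via the normal equations; this is an assumption of the illustration, and the paper's own estimation procedure may differ. It is applied to noise-free data (a damped cosine plus an offset, with illustrative parameter values) that no finite-degree polynomial represents exactly. An AR(2) model with a constant recovers $a(1) = 2\beta\cos\theta$, $a(2) = -\beta^2$ and $\mu = c\,(1 - a(1) - a(2))$, where c is the offset, up to rounding. The function name is hypothetical.

```python
import math


def fit_ar_with_constant(y, q):
    """Least-squares estimate of a(1..q) and mu in
    y(n) = a(1) y(n-1) + ... + a(q) y(n-q) + mu  (normal equations; assumed method)."""
    rows = [[y[n - k] for k in range(1, q + 1)] + [1.0] for n in range(q, len(y))]
    rhs = y[q:]
    p = q + 1
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    v = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(p)]
    # Gaussian elimination with partial pivoting on M x = v
    for col in range(p):
        piv = max(range(col, p), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p):
                M[r][c] -= f * M[col][c]
            v[r] -= f * v[col]
    # back substitution
    x = [0.0] * p
    for i in reversed(range(p)):
        x[i] = (v[i] - sum(M[i][c] * x[c] for c in range(i + 1, p))) / M[i][i]
    return x[:q], x[q]


# Noise-free damped cosine plus an offset: not a finite-degree polynomial in n
beta, theta, offset = 0.9, 0.7, 1.0  # illustrative values (assumed)
y = [beta ** n * math.cos(theta * n) + offset for n in range(60)]
a, mu = fit_ar_with_constant(y, 2)
# Expected: a[0] = 2*beta*cos(theta), a[1] = -beta**2, mu = offset*(1 - a[0] - a[1])
```

Plain-Python elimination keeps the example dependency-free; for higher orders or noisy data, a library least-squares solver based on an orthogonal factorisation would be the more robust choice.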


He is a Distinguished Visiting Professor at Xi'an Jiaotong University, China, and an Adjunct Professor with the University of Calgary, Canada. In 1983, he co-discovered the three fundamental particles known as W⁺, W⁻, and Z⁰ (with the UA1 Team at CERN), providing the evidence for the unification of the electromagnetic and weak forces, for which the Nobel Committee for Physics awarded the prize to his two team leaders for their decisive contributions.

The author acknowledges Dr C Liu for formatting the manuscript and Dr D A Nandi for supplying the US Covid-19 data.