key: cord-0173232-wtnz5b72
authors: Huang, Whitney K.; Chung, Yu-Min; Wang, Yu-Bo; Mandel, Jeff E.; Wu, Hau-Tieng
title: Airflow recovery from thoracic and abdominal movements using Synchrosqueezing Transform and Locally Stationary Gaussian Process Regression
date: 2020-08-11
journal: nan
DOI: nan
sha: e1687e42df5cf087c7c2a870ec7aa531a6de40c3
doc_id: 173232
cord_uid: wtnz5b72

The airflow signal encodes rich information about the respiratory system. While the gold standard for measuring airflow is a spirometer with an occlusive seal, this is not practical for ambulatory monitoring of patients. Advances in sensor technology have made measurement of the motion of the thorax and abdomen feasible with small, inexpensive devices, but estimation of airflow from these time series is challenging. We propose to use a nonlinear-type time-frequency analysis tool, the synchrosqueezing transform, to properly represent the thoracic and abdominal movement signals as features, which are then used to recover the airflow by a locally stationary Gaussian process. We show that, using a dataset that contains respiratory signals under normal sleep conditions, an accurate prediction can be achieved by fitting the proposed model in the feature space in both the intra- and inter-subject setups. We also apply our method to a more challenging case, where subjects under general anesthesia underwent transitions from pressure support to unassisted ventilation, to further demonstrate the utility of the proposed method.

Breathing is an integrated physical activity involving different anatomical structures that mechanically transfer gases between the alveoli and the environment. The recorded breathing activity is called the respiratory (or breathing) signal, which provides clinicians with information for decision making. Numerous sensors have been described, including the spirometer, piezo-sensor, electrocardiogram (ECG), photoplethysmogram (PPG), and infrared video, to name but a few [32, 7], to directly or indirectly acquire respiratory signals from different physiological aspects, ranging from airflow, thoracic movement (THO) and abdominal movement (ABD), impedance, and physiological variability, to light spectrum. Sensing modalities vary in their intrusiveness, cost, and the richness of the physiological data they present. For example, limited respiratory data can be extracted from the ECG or PPG, including the respiratory rate [7] and tidal volume [39], but not the expiration information. The expiration information can, however, be obtained from the spirometer. On the other hand, different sensors might be applicable to different monitoring scenarios. While the gold standard for measuring airflow is a spirometer with an occlusive seal, using it for continuous ambulatory monitoring is not practical [18]; ambulatory monitoring is usually carried out without the resources of a hospital. While recording the chest/abdominal girth for monitoring of respiratory effort may require periodic recalibration to remain accurate, it can be performed continuously. A system that permits accurate tracking of minute ventilation from inexpensive sensors mounted on the chest and abdomen would be useful if it could provide a "pulse oximeter for the lungs", that is, the airflow signal. Motivated by the dramatically increased demand for telemedicine, for example, detection of the onset of COVID-19, opioid overdose, or asthma attacks due to worsening air pollution, it is useful to be able to extract the airflow signal from easy-to-install sensors that are suitable for home care.
The signals from different sensors are in general correlated, as they are monitoring a single integrated system. To this end, we examine the ability to predict airflow using ABD and THO from the physiological perspective. Under normal physiology, the cross-sectional area and longitudinal length change of the rib cage are directly related to the lung volume. Since the THO sensor estimates the cross-sectional area of the thoracic cage, and the ABD sensor estimates the increased abdominal girth due to displacement by the lung and the diaphragm movement, ABD and THO together provide information about the airflow signal. If the lung volume is adequately represented by ABD and THO, a model relating these should accurately predict airflow. However, it is difficult to model airflow from THO and ABD due to the time-varying nonlinear nature of the physiology. To our knowledge, there are limited studies in this direction, except [33]. To handle this challenging problem, the authors in [33] decomposed the ABD and THO signals by the Empirical Mode Decomposition (EMD) [26], and demonstrated that a regression model built on the decomposed empirical modes can yield higher accuracy in predicting spirometer flow than a regression model with time-domain features. While the results are encouraging and exciting, there are several shortcomings. First, non-trivial trial and error is needed to make EMD work, and only in-sample predictions were carried out. Second, how EMD maps ABD and THO to any underlying physical property of the system is unclear. Indeed, so far a theoretical support of EMD is still missing, and hence we have limited understanding of the physical meaning of the decomposed empirical modes. This fact limits further development based on EMD from the scientific research perspective. Due to the above limitations and its clinical importance, it is desirable to extend the study in [33] and to develop a theoretically solid prediction model that recovers the airflow signal of a subject from ABD and THO alone.

To this end, we shall examine the decomposition idea considered in [33]. Note that physiological signals, such as airflow, ABD and THO, often exhibit complicated oscillatory patterns, which contain crucial information about a person's health. It is a natural idea to decompose a complicated oscillatory signal into simple ingredients as a new representation, and to analyze the signal by analyzing each ingredient in this new representation. EMD is the approach considered in [33] to achieve this goal. While there are various ways to represent an oscillatory signal, some representations are more suitable than others. To appreciate this fact, recall that most vibrating objects have more than one resonant oscillation. Usually, a harmonic is defined as a resonant oscillation whose frequency is an integer multiple of the fundamental frequency. Physically, different harmonics play different roles and integrate to describe the characteristic vibrational mode or the standing wave pattern of the signal. In other words, harmonics encode the intrinsic mechanism of vibration. Mathematically, consider a toy oscillatory signal s(t), where s is a 1-periodic function (i.e., s(t + 1) = s(t) for all t). Clearly, at time t, the signal can be represented as s(t), or equivalently as a vector $[\alpha_1 \cos(2\pi t + \beta_1), \ldots, \alpha_l \cos(2\pi l t + \beta_l), \ldots, \alpha_L \cos(2\pi L t + \beta_L)] \in \mathbb{R}^L$, where s(t) satisfies the Fourier series expansion $s(t) = \sum_{l=1}^{L} \alpha_l \cos(2\pi l t + \beta_l)$. We call this vector-valued representation the harmonic representation of the oscillatory signal. The harmonic representation captures the detailed oscillatory behavior down to the harmonic level, and hence provides more information about the mechanism of vibration.
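To make the harmonic representation concrete, the following minimal Python sketch (ours, for illustration only; the study itself used Matlab) recovers the amplitudes $\alpha_l$ and phases $\beta_l$ of a synthetic 1-periodic, non-sinusoidal signal from one period of samples via the FFT; the signal, sample size, and harmonic count are arbitrary illustrative choices.

```python
import numpy as np

# One period of a synthetic, non-sinusoidal 1-periodic signal, sampled at n points.
n = 512
t = np.arange(n) / n
s = 1.0 * np.cos(2 * np.pi * 1 * t + 0.3) \
  + 0.5 * np.cos(2 * np.pi * 2 * t + 1.1) \
  + 0.2 * np.cos(2 * np.pi * 3 * t - 0.7)

# For a 1-periodic signal sampled over exactly one period, the l-th positive
# frequency bin of the FFT carries the l-th harmonic: alpha_l = 2|S[l]|, beta_l = arg S[l].
S = np.fft.rfft(s) / n
L = 3
alpha = 2 * np.abs(S[1:L + 1])   # harmonic amplitudes alpha_1, ..., alpha_L
beta = np.angle(S[1:L + 1])      # harmonic phases beta_1, ..., beta_L

print(np.round(alpha, 3))        # -> [1.0, 0.5, 0.2]
print(np.round(beta, 3))         # -> [0.3, 1.1, -0.7]
```

The vector formed by $(\alpha_l, \beta_l)$ is exactly the harmonic representation described above; for nonstationary signals, SST (Sec. 2.2) plays the role of this FFT step locally in time.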
Moreover, this harmonic representation holds for signals as complicated and nonstationary as the respiratory signals (see Fig. 1 for an example). We mention that the empirical modes of respiratory signals decomposed by EMD in [33] are in general not harmonic representations. Motivated by the relationship between the harmonic representation and the underlying mechanism, in this study we propose to use Gaussian process (GP) regression with the harmonic representations as covariates to predict the flow signal. We propose to determine the harmonic representations of ABD and THO using the nonlinear-type time-frequency (TF) analysis tool, the synchrosqueezing transform (SST). We hypothesize that this approach works well during slowly evolving changes in ventilation. However, its performance in tracking "catastrophic events" might not be as good. Compared with the approach used in [33], the covariates determined by SST are interpretable and have theoretical support.

The class of GP models has been widely used in spatial and spatio-temporal statistics to interpolate spatial and space-time data [12, 41, 2, 13, 44]; in the design and analysis of computer experiments to build a statistical emulator of a computationally expensive physics-based simulator [37, 14, 27, 38, 8]; and in the general regression context [36, 34, 45], where a flexible model is needed to learn the regression function. The effectiveness of a GP regression model critically depends on the choice of covariates. We consider the recently developed nonlinear-type TF analysis tool, SST, to convert ABD and THO to harmonic representations before conducting GP regression. SST handles a complicated and nonstationary oscillatory time series by manipulating the phase of its short-time Fourier transform (STFT) [46] or continuous wavelet transform [15]. Its theoretical and statistical properties have been extensively studied [9, 40], and it has been applied to analyze several biomedical time series. For its application to respiratory signals, see [31] for an automatic annotation system for sleep apnea events that analyzes ABD and THO, and [50] for an instantaneous respiratory rate estimator from the ECG for patients with atrial fibrillation. We refer interested readers to [48] for a recent review article.

The remainder of this paper is structured as follows. In Sec. 2, we provide background on SST and GP regression. We mention that combining SST and GP regression to predict airflow from ABD and THO, while making the computational load tractable by developing a locally stationary GP model (see the details in Sec. 2.3.3 and Sec. 3), is the main novelty of this study. The details of our model fitting procedure are described in Sec. 3. To assess the performance of the proposed model and methods, we apply our proposal to two real databases, which are described in Sec. 4. The setup for the prediction is also described in Sec. 4. Our data analysis results are presented in Sec. 5. We conclude with a discussion of the implications of these results and highlight some future extensions.

We begin with some background for our proposed model. First, the adaptive non-harmonic model (ANHM) is introduced to represent oscillatory signals.
We then present SST, which we will apply to convert ABD and THO to our covariates (i.e., input features) for the regression task. Next, a quick overview of the GP models is given. In the next section, SST and GP will be combined to carry out the predictions and quantify the associated prediction uncertainties.

2.1. The adaptive non-harmonic model. A primary characteristic of respiratory signals, or general biomedical signals, is oscillation. This kind of signal usually repeats a certain pattern, where the strengths (amplitude) and periods (inverse of frequency) from one cycle to the next are usually not fixed. Moreover, the oscillatory pattern is usually non-sinusoidal, and can change with time as well. Another characteristic is that one biomedical signal might contain more than one oscillatory component. One can appreciate these features by looking at the respiratory signal of interest in this study (see Fig. 1). Due to the aforementioned nonstationarity in both amplitude and frequency, it is challenging to model respiratory signals by the commonly used time series statistical models, like the SARMA (seasonal autoregressive and moving average) model [4, 5], the TBATS (Exponential smoothing state space model with Box-Cox transformation, ARMA errors, Trend and Seasonal components) model [16], the seasonal-trend decomposition (STL) [10], etc. Due to its clinical importance, the adaptive non-harmonic model (ANHM) was proposed in [47] to model such nonstationary oscillatory time series. Below, we review the ANHM.

Fix $\epsilon > 0$. A signal x satisfies the $\epsilon$-ANHM if it can be represented by

$x(t) = \sum_{l=1}^{L} A_l(t)\, s_l(\phi_l(t)) + \varepsilon(t),$

where x is the recorded signal, which includes L oscillatory components, and the l-th oscillatory component, $A_l(t) s_l(\phi_l(t))$, satisfies the following conditions.

(1) $\phi_l$ describes the phase, which is assumed to be monotonically increasing, $C^2$, and to satisfy $|\phi_l''(t)| \le \epsilon\, \phi_l'(t)$ for all $t \in \mathbb{R}$. Since $\phi_l$ is assumed to be monotonically increasing, $\phi_l'$ is positive. It describes how fast the l-th component oscillates at each time.

(2) $A_l(t)$ describes the amplitude of the l-th oscillatory component, which is assumed to be $C^1$, positive, and to satisfy $|A_l'(t)| \le \epsilon\, \phi_l'(t)$ for all $t \in \mathbb{R}$. The assumptions $|\phi_l''(t)| \le \epsilon \phi_l'(t)$ and $|A_l'(t)| \le \epsilon \phi_l'(t)$ for all $t \in \mathbb{R}$ indicate that the l-th oscillatory component $A_l(t) s_l(\phi_l(t))$ can be well approximated, locally around any time $t_0$, by $A_l(t_0)\, s_l\big(\phi_l(t_0) + \phi_l'(t_0)(t - t_0)\big)$.

(3) $s_l$ describes the "oscillatory pattern", which is assumed to be a mean-zero 1-periodic function such that its first Fourier mode is non-zero and $\|s_l\|_{L^2([0,1])} = 1$.

(4) $\varepsilon$ is the noise that contaminates the signal, which is assumed to be a stationary random process with a short-range dependence structure. See [9] for details. While it is possible to consider a more general noise structure, it is out of the scope of this paper, so we focus on this noise structure.

Figure 1. A typical airflow signal (gray) and its respiratory component (blue) and hemodynamic component (red). The hemodynamic component is usually called the cardiogenic artifact. The ECG signal is also shown at the bottom for comparison (black). It is clear that some breathing cycles take a longer time and oscillate more strongly than others. The two green vertical arrows indicate two cycles of different lengths. It is also clear that the oscillatory patterns of both components are not sinusoidal (magenta boxes) and vary from one cycle to the other. Note that the cardiogenic artifact cycles coincide with the spikes shown in the ECG signal, indicated by the six blue arrows.
We call $A_l(t)$ the amplitude modulation (AM), $\phi_l'(t)$ the instantaneous frequency (IF), and $s_l$ the wave-shape function of the l-th oscillatory component. Take the airflow signal in Fig. 1 as an example. One would need L = 2 to model the THO signal. The first oscillatory component is the breathing activity, which has a lower frequency and larger amplitude, and the second oscillatory component is the cardiogenic artifact, which has a higher frequency and smaller amplitude. Note that the inhalation time is shorter than the exhalation time under normal physiology, so the oscillatory pattern $s_1$ is asymmetric, and hence it is clear that $s_1$ cannot be sinusoidal. For the airflow signal, the wave-shape function is usually of mean zero, since the inhaled air volume is the same as the exhaled air volume in each breathing cycle. Moreover, the oropharyngeal structure, the airway, and the environment all impact the flow pattern [19]. So, overall, the wave-shape function of the airflow signal can be viewed as an integration of the whole pulmonary system. However, as has been extensively discussed in [29], the ANHM might not adequately describe the complicated features of physiologic signals, and we need to reconsider what the wave-shape function is.

We start with a mathematical discussion. Take $A(t) s(\phi(t))$ as an example. By a direct Fourier series expansion, either pointwise or in the weak sense depending on the regularity of s, we have

$A(t)\, s(\phi(t)) = \sum_{k \ge 1} a_k A(t) \cos(2\pi k \phi(t) + \alpha_k), \qquad (1)$

where $a_1 A(t) \cos(2\pi \phi(t) + \alpha_1)$ is the fundamental component, and $a_k A(t) \cos(2\pi k \phi(t) + \alpha_k)$, $k \ge 1$, is the k-th multiple (or harmonic) of the fundamental component. In other words, a non-sinusoidal oscillation can be viewed as a summation of many sinusoidal oscillations, and the frequencies of those sinusoidal oscillations are integer multiples of the frequency of the fundamental component. To see why the ANHM is not good enough, note that, as can be visualized in Fig. 1, the oscillatory pattern of the airflow signal is not fixed from time to time. This variation could be quantified by the deviation of each multiple [29]. Thus, in [29], it is suggested to generalize the ANHM to the following:

$x(t) = \sum_{l=1}^{L} \sum_{k=1}^{\infty} a_{l,k}(t) \cos\big(2\pi \phi_{l,k}(t)\big) + \varepsilon(t), \qquad (2)$

where $a_{l,k}(t) > 0$ and $\phi_{l,k}'(t) > 0$ so that $|\phi_{l,k}'(t)/\phi_{l,1}'(t) - k| \le \epsilon$. Clearly, $a_{l,k}(t)$ and $\phi_{l,k}(t)$ are generalizations of $a_k A(t)$ and $k\phi(t) + \alpha_k$ in (1), respectively. In this work, we consider (2) to model the different respiratory signals, and we call $\{a_{l,k}(t), \phi_{l,k}(t)\}_{k \in \mathbb{N}}$ the harmonic representation of the l-th oscillatory component at time t. Last but not least, we emphasize that this model is not only suitable for the respiratory signal, but can also be used to model several other oscillatory time series.

2.2. Synchrosqueezing transform (SST). Since the harmonic representations of THO and ABD contain their key physiological features, in the following we describe how one can use SST to extract these features. In short, SST is produced by applying a reassignment rule to the STFT. Let us first consider the STFT. With a chosen window function h (e.g., a Gaussian kernel function centered at the origin), the STFT is defined as

$V_f^{(h)}(t, \xi) = \int f(\tau)\, h(\tau - t)\, e^{-i 2\pi \xi (\tau - t)}\, d\tau, \qquad (3)$

where $t \in \mathbb{R}$ indicates time and $\xi \in \mathbb{R}$ indicates frequency. Typically, $|V_f^{(h)}(t, \cdot)|^2$ is referred to as the spectrogram of the signal f at time t, since it represents the power spectrum of the truncated signal $f(\cdot) h(\cdot - t)$ around t.
Second, SST is defined by modifying the STFT:

$S_f^{(h,\upsilon)}(t, \xi) = \int V_f^{(h)}(t, \eta)\, \delta\big(\xi - \Omega_f^{(h,\upsilon)}(t, \eta)\big)\, d\eta, \qquad (4)$

where $\xi \ge 0$ and $\delta$ means the Dirac measure, and the reassignment rule $\Omega_f^{(h,\upsilon)}$ is determined by

$\Omega_f^{(h,\upsilon)}(t, \eta) = \eta - \frac{1}{2\pi} \Im\!\left(\frac{V_f^{(Dh)}(t, \eta)}{V_f^{(h)}(t, \eta)}\right)$ when $|V_f^{(h)}(t, \eta)| > \upsilon$, and $\Omega_f^{(h,\upsilon)}(t, \eta) = -\infty$ otherwise. $\qquad (5)$

Here, $Dh(t)$ is the derivative of the chosen window function h, $\Im$ means the imaginary part, and $\upsilon > 0$ gives a threshold so as to avoid instability in the computation when $|V_f^{(h)}(t, \eta)|$ is small. In effect, the STFT coefficients are reallocated along the frequency axis to the IF estimate $\Omega_f^{(h,\upsilon)}$, so that each oscillatory component, and each of its harmonics, is concentrated around a curve in the TF representation. Integrating $S_f^{(h,\upsilon)}$ over a narrow frequency band of width 2b around such a curve recovers the associated component:

$a_{1,k}(t) \cos\big(2\pi \phi_{1,k}(t)\big) \approx \Re\!\left[\frac{2}{h(0)} \int_{|\xi - \hat\phi_{1,k}'(t)| \le b} S_f^{(h,\upsilon)}(t, \xi)\, d\xi\right]. \qquad (6)$

As a result, with the estimated IF, $\hat\phi_{1,k}'$, of the respiratory signal, one is able to recover $a_{1,k}(t) \cos(2\pi \phi_{1,k}(t))$, and hence the desired harmonic representation $\{a_{1,k}(t), \phi_{1,k}(t)\}$. We refer readers to a non-mathematical tutorial [49] and to [48] for a recent review summarizing its clinical applications.

In practice, the continuous signal f is regularly sampled over a discrete set of time points with sampling interval $\Delta_t > 0$. The sampling rate is hence $f_s = \Delta_t^{-1}$. Suppose the recording starts at time t = 0. Write the uniformly sampled signal as a column vector $\mathbf{f} \in \mathbb{R}^N$, where N is the number of sample points and the $\ell$-th entry of $\mathbf{f}$ is $f(\ell \Delta_t)$. Take M so that 2M is the chosen number of pixels in the frequency axis of the TF representation. The TF representation of $\mathbf{f}$ determined by the STFT, a matrix $\mathbf{V}_f \in \mathbb{C}^{N \times (M+1)}$, is then evaluated by directly discretizing the STFT formula (3). The TF representation of $\mathbf{f}$ determined by SST is also evaluated by a direct discretization of the SST formula (4), and is denoted $\mathbf{S}_f^{(\upsilon)} \in \mathbb{C}^{N \times (M+1)}$. The IF estimation is then carried out by extracting the dominant curve in the TF representation,

$c^* = \arg\max_{c} \sum_{m=1}^{N} \log\big|\mathbf{S}_f^{(\upsilon)}(m, c(m))\big| - \lambda \sum_{m=2}^{N} |c(m) - c(m-1)|^2, \qquad (7)$

where the maximization is over curves $c : \{1, \ldots, N\} \to Z_n$, $Z_n = \{1, 2, \ldots, M+1\}$, and $\lambda \ge 0$. Here, c indicates a curve in the TF representation, $|c(m) - c(m-1)|^2$ quantifies the regularity of the extracted curve, and $\lambda$ is the penalty term controlling the regularity of the curve c. Based on the robustness property of SST [9], the extracted curve $c^*$ is a robust estimator of the IF of the strongest IMT function. The reconstruction formula (6) for the k-th IMT function at time $t = l\Delta_t$ is discretized into

$\hat a_{1,k}(l\Delta_t) \cos\big(2\pi \hat\phi_{1,k}(l\Delta_t)\big) \approx \Re\!\left[\frac{2 \Delta_\xi}{h(0)} \sum_{j \in B} \mathbf{S}_f^{(\upsilon)}(l, j)\right], \qquad (8)$

where $\Delta_\xi > 0$ is the discretization bin in the frequency axis and B is the set of frequency indices associated with the k-th harmonic. In practice, since $\epsilon$ is usually unknown, B could be chosen to be $B = \{\, j : |j \Delta_\xi - \hat\phi_{1,k}'(l\Delta_t)| \le b \,\}$, where b > 0 is the frequency range chosen by the practitioner. One would take b sufficiently small so that $\hat\phi_{1,k}'(l\Delta_t) - b$ and $\hat\phi_{1,k-1}'(l\Delta_t) + b$ are well separated. It is clear that the amplitude of a harmonic is directly related to the intensity of its associated curve in the TF representation.

Figure 2. The extracted features, $x_l$, from both the abdominal and thoracic movement signals are shown at the bottom; they form the input covariates of the Gaussian process used to predict the airflow signal.
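Before turning to the GP regression background, the following self-contained Python sketch illustrates the discrete pipeline just described (STFT with a Gaussian window, reassignment, squeezing, and band reconstruction of the first few harmonics). It is our simplified illustration, not the authors' Matlab implementation: in particular, the penalized curve extraction (7) is replaced by a per-frame argmax over a fixed breathing-frequency band, and the window length, threshold, and band edges are illustrative choices.

```python
import numpy as np

def sst_features(x, fs, win_sec=8.0, n_harm=4, band_hz=0.05, fmax=2.0):
    """Toy STFT-based synchrosqueezing: per-sample amplitude and phase estimates
    of the first n_harm harmonics of the dominant oscillatory component of x."""
    n = len(x)
    half = int(win_sec * fs) // 2
    m = 2 * half + 1                               # window length in samples
    tt = (np.arange(m) - half) / fs
    h = np.exp(-tt ** 2 / 32.0)                    # Gaussian window, h(0) = 1
    dh = -tt / 16.0 * h                            # its derivative Dh
    freqs = np.arange(m // 2 + 1) * fs / m         # STFT frequency grid (Hz)
    kmax = int(np.searchsorted(freqs, fmax))       # keep 0..fmax Hz only
    dxi = fs / m                                   # frequency bin width
    amp = np.zeros((n, n_harm))
    phase = np.zeros((n, n_harm))
    xp = np.pad(np.asarray(x, dtype=float), half, mode="reflect")

    for i in range(n):
        seg = xp[i:i + m]
        # STFT with the window centered at sample i (roll puts the center at index 0)
        Vh = np.fft.rfft(np.roll(seg * h, -half))[:kmax]
        Vdh = np.fft.rfft(np.roll(seg * dh, -half))[:kmax]
        good = np.abs(Vh) > 1e-8                   # threshold upsilon
        # reassignment rule (5): frequency to which each coefficient is moved
        omega = np.full(kmax, -1.0)
        omega[good] = freqs[:kmax][good] - np.imag(Vdh[good] / Vh[good]) / (2 * np.pi)
        # synchrosqueezing (4): accumulate coefficients at their reassigned bins
        S = np.zeros(kmax, dtype=complex)
        dest = np.clip(np.round(omega / dxi).astype(int), 0, kmax - 1)
        np.add.at(S, dest[good], Vh[good])
        # fundamental IF: per-frame argmax in a breathing band (0.1-0.6 Hz);
        # the paper uses the penalized curve extraction (7) instead
        lo, hi = np.searchsorted(freqs[:kmax], [0.1, 0.6])
        k0 = lo + int(np.argmax(np.abs(S[lo:hi])))
        b = max(1, int(round(band_hz / dxi)))
        for k in range(1, n_harm + 1):
            kk = min(k * k0, kmax - 1)             # bin of the k-th harmonic
            z = 2.0 / m * S[max(kk - b, 0):kk + b + 1].sum()   # band reconstruction (8)
            amp[i, k - 1] = np.abs(z)              # estimate of a_{1,k}(t_i)
            phase[i, k - 1] = np.angle(z) / (2 * np.pi)        # phi_{1,k}(t_i) mod 1
    return amp, phase
```

With the respiratory signals resampled at 10 Hz, calling `sst_features(abd, fs=10.0)` returns, for each sample, the estimated amplitude and (wrapped) phase of the first four harmonics, which are the ingredients of the covariates described in Sec. 3.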
In this work we use GP models to predict the airflow from the harmonic representations of THO and ABD (e.g., see Fig. 2) extracted by SST. Here, a brief background on the GP model is given. First, we describe the main elements of a GP, namely the mean function and the covariance function, and how these modeling elements are chosen in this work. We then discuss how we handle the computational and modeling issues that arise when applying the GP model to long and nonstationary signals by using a locally stationary GP model. The interested reader is referred to [45] for a more detailed treatment of various topics of GP and its usage in machine learning, and to [21] for recent developments in computer experiments.

2.3.1. Nonparametric regression with Gaussian processes. A GP is a stochastic process such that every finite collection of its random variables jointly follows a multivariate normal distribution. The analytical tractability of the GP makes it a powerful tool for conducting prediction and inference for a function. To fix the notation, we assume that there are $N \in \mathbb{N}$ pairs of predictor and response in the training dataset. Let $Y := \{y_t\}_{t=1}^{N}$ be the set of measurements of the response of interest, which is the airflow signal in this study. The features of the THO and ABD signals extracted by SST, the harmonic representations, will be treated as the covariates, and we write them as $X := \{x_t\}_{t=1}^{N} \subset \mathcal{X}$, where $\mathcal{X}$ lies in a finite-dimensional Euclidean space. We call $\mathcal{X}$ the harmonic representation space. Consider the following regression model:

$y_t = g(x_t) + \epsilon_t,$

where $g : \mathcal{X} \to \mathbb{R}$ is an unknown nonlinear function that relates the airflow to THO and ABD represented in the harmonic representation space, and $\epsilon_t$ is the white error term following a zero-mean normal distribution with constant variance $\tau^2$. When there is no danger of confusion, we drop the subscript t in the rest of this section. The GP will be used to nonparametrically model the regression function g and facilitate the ultimate prediction task. From the Bayesian viewpoint, we consider a GP as a prior for the unknown regression function g.

2.3.2. Mean and covariance functions. The specification of a GP is completed by the mean function $m(x) = E(g(x))$, where $x \in \mathcal{X}$, and the covariance function $K(x, x') = \mathrm{Cov}(g(x), g(x'))$. The common GP modeling practice is to use a relatively simple mean structure (e.g., a low-order polynomial of x in spatial prediction; a constant or even zero mean function when emulating computer simulations in computer experiments and in non-parametric regression in machine learning) while letting the covariance function absorb most of the local structure. For a given covariance function, the resulting covariance matrix for the predictors X is positive-definite. There exist some families of parametric covariance functions that are commonly used. In general, one could also design a valid covariance function using Bochner's Theorem [see 20, p. 208]. Additional assumptions on the covariance function are typically made to facilitate the inference, specifically stationarity and isotropy. These two assumptions can be thought of as translational and rotational invariance properties, so that $K(x, x')$ only depends on the "distance" between x and x' regardless of their "locations", i.e., $K(x, x') = C(h)$, where C is a valid isotropic covariance function and h is an appropriate distance between x and x' in the context of interest. Once the mean and covariance functions have been specified, the response Y follows a multivariate normal distribution with the chosen mean $m = \{m(x_t)\}_{t=1}^{N} \in \mathbb{R}^N$ and covariance matrix $\Sigma_{\theta, \sigma^2}$, whose (i, j)-th entry is $C_{\theta, \sigma^2}(\|x_i - x_j\|)$, where $i, j = 1, \cdots, N$, $\sigma^2$ is the marginal variance, and $\theta$ includes the parameters of the correlation function. More specifically, under the GP regression model, we have

$Y \sim \mathcal{N}\big(m, \Sigma_{\theta, \sigma^2} + \tau^2 I_N\big).$

In this study, we assume the mean function is an unknown constant in the harmonic representation space; that is, $m(x) = \mu$ for $x \in \mathcal{X}$. We use the Matérn class covariance function [41]:

$C(h) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{\sqrt{2\nu}\, h}{\rho}\right)^{\nu} K_{\nu}\!\left(\frac{\sqrt{2\nu}\, h}{\rho}\right),$

where $h \ge 0$, $\rho > 0$ is the range parameter, $\Gamma(\nu)$ is the gamma function, and $K_\nu$ is the modified Bessel function of the second kind. In this work, we restrict our attention to a few fixed $\nu$ values ($\nu = 0.5, 1.5, \infty$), which should give us some flexibility in determining the "smoothness" of the regression function of interest. Note that $\nu = 0.5$ and $\infty$ correspond to the exponential and squared exponential covariances, respectively. We use the maximum likelihood method to estimate the GP parameters $(\mu, \sigma^2, \rho, \tau^2)$.
Specifically, the maximum likelihood estimates (MLE), denoted by $(\hat\mu, \hat\sigma^2, \hat\rho, \hat\tau^2)$, are obtained by finding the maximizer of the log-likelihood function

$\ell(\mu, \sigma^2, \rho, \tau^2) = -\frac{N}{2} \log(2\pi) - \frac{1}{2} \log\det\big(\Sigma_{\theta, \sigma^2} + \tau^2 I_N\big) - \frac{1}{2} (y - \mu \mathbf{1})^{\top} \big(\Sigma_{\theta, \sigma^2} + \tau^2 I_N\big)^{-1} (y - \mu \mathbf{1}).$

For $x_0 \in \mathcal{X}$, a harmonic representation that may or may not be in the training set X, $g(x_0)$ can be estimated using the conditional distribution formula for the multivariate normal distribution. Specifically, we use the plug-in estimator

$\hat g(x_0) = \hat\mu + \hat k \big[\hat\Sigma + \hat\tau^2 I\big]^{-1} (y - \hat\mu \mathbf{1}),$

where $\hat k$ is the row vector of estimated covariances between $g(x_0)$ and $g(x_1), \ldots, g(x_N)$, and $\hat\Sigma$ is the estimated covariance matrix over the training covariates. Note that this is the best linear unbiased estimator (BLUE) under the squared loss. This estimator can be expressed as a linear predictor in terms of y, where the coefficients are determined by the estimated GP parameters using the same set of y. Some simulation studies [53, 52] suggested that this "empirical" or "estimated" BLUE works well even with small N. Additionally, the GP model allows for quantifying the prediction uncertainty using the estimated conditional variance $\widehat{\mathrm{Var}}(\hat g(x_0)) = \hat\sigma^2 - \hat k \big[\hat\Sigma + \hat\tau^2 I\big]^{-1} \hat k^{\top}$.

2.3.3. Locally stationary GP. One obstacle to the direct use of the GP with the likelihood-based method is the computation of the log-likelihood function needed for parameter estimation. Specifically, the maximum likelihood estimation here requires computing the determinant and the matrix inverse repeatedly, which can be computationally prohibitive when n is large due to the $O(n^3)$ computation and $O(n^2)$ memory storage (see [25] for a recent case study on analyzing large spatial datasets and references therein, or [1] for a fast method under some specific covariance structures). In this study, more than 200,000 data points are analyzed to predict the airflow signal. The solution we propose is to fit the GP model sequentially and "locally" in the harmonic representation space. Specifically, for each prediction window, we form the training data using the nearby historical observations, where these nearby observations are the nearest neighbors (NN) in terms of the Euclidean distance in the harmonic representation space. By using a small number of NNs with a small prediction window in time, the size of the training set can be controlled so that each prediction can be computed fairly quickly. It is worth pointing out that our proposal here is motivated by the locally stationary GP in the literature [23, 24, 22, 28]. This approach not only allows for computational tractability but also provides a means of handling possible nonstationarities.
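The following minimal Python sketch shows one such local fit: for a query covariate, take the K nearest historical neighbors, estimate (µ, σ², ρ, τ²) by maximum likelihood with the exponential covariance (Matérn ν = 0.5), and return the plug-in conditional mean and standard deviation. The function name, optimizer, starting values, and default K are our illustrative choices, not the authors' implementation; in the actual procedure (Algorithm 1), the neighbors of all query points in a 30-second window are pooled and a single GP is fitted per window.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist

def local_gp_predict(x0, X_hist, y_hist, K=50):
    """Constant-mean GP with exponential covariance, fitted by MLE on the K
    nearest historical neighbors of x0; returns plug-in predictive mean and sd."""
    idx = np.argsort(np.linalg.norm(X_hist - x0, axis=1))[:K]
    X, y = X_hist[idx], y_hist[idx]
    D = cdist(X, X)                                    # pairwise Euclidean distances
    I = np.eye(len(y))

    def neg_loglik(p):                                 # negative log-likelihood (up to a constant)
        mu, s2, rho, t2 = p[0], np.exp(p[1]), np.exp(p[2]), np.exp(p[3])
        C = s2 * np.exp(-D / rho) + t2 * I             # exponential covariance + nugget
        try:
            Lc = np.linalg.cholesky(C)
        except np.linalg.LinAlgError:
            return 1e10
        r = np.linalg.solve(Lc, y - mu)
        return np.sum(np.log(np.diag(Lc))) + 0.5 * r @ r

    p0 = np.array([y.mean(), np.log(y.var() + 1e-6),
                   np.log(np.median(D) + 1e-6), np.log(0.1 * y.var() + 1e-6)])
    p = minimize(neg_loglik, p0, method="Nelder-Mead").x
    mu, s2, rho, t2 = p[0], np.exp(p[1]), np.exp(p[2]), np.exp(p[3])

    C = s2 * np.exp(-D / rho) + t2 * I
    k = s2 * np.exp(-np.linalg.norm(X - x0, axis=1) / rho)
    mean = mu + k @ np.linalg.solve(C, y - mu)         # plug-in conditional mean
    var = s2 - k @ np.linalg.solve(C, k)               # plug-in conditional variance
    return mean, np.sqrt(max(var, 0.0))
```

The diffusion-based variant (LocDBGP; see step (3) of Algorithm 1) would additionally replace the training covariance matrix C by $D^{-1/2} C D^{-1/2}$, with D the diagonal matrix of row sums of C.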
Curse of dimensionality is a well-known issue in data analysis. Another commonly encountered challenge is the nonlinearity of the data structure. In our work, we encounter both challenges in the harmonic representations X. To handle these challenges, a common approach is to assume that the covariates are supported on, or near, some lower-dimensional nonlinear manifold; that is, in our work, to assume that X is located on a low-dimensional manifold. It is thus reasonable to respect and explore such a lower-dimensional nonlinear structure. While there have been several variations in the GP modeling context, taking the manifold structure into account is still in its infancy [51, 6, 30, 35]. Noticing the potential of combining manifold learning techniques and GP regression, it was recently proposed in [17] to consider the symmetrized graph Laplacian (GL) as the covariance structure. Such a consideration was motivated by the fact that the GL links the random walk on the dataset to the underlying nonlinear structure via its eigenstructure. In [17], the authors showed the spectral convergence of the GL in the $L^\infty$ sense and proposed to take the symmetrized GL to form the covariance matrix. Theoretically, it would encode the geometric and topological information in the GP regression. While this new idea has not been extensively examined, for comparison purposes we also consider this diffusion-based approach in this work.

We now detail our model fitting procedure. There are two main steps. First, ABD and THO are represented in the harmonic representation space by SST. Then, a locally stationary GP is applied to predict the airflow signal. The proposed model is called SST-LocGP (SST-LocDBGP if the diffusion-based GP is used), and it is summarized in Algorithm 1. In the remainder of this paper, for simplicity, we drop the prefix SST and refer to these two models as LocGP and LocDBGP, respectively.

In the first step, the THO and ABD signals are represented as in (2), namely

$x^{(\mathrm{ABD})}(t) = \sum_{k} a_{1,k}^{(\mathrm{ABD})}(t) \cos\big(2\pi \phi_{1,k}^{(\mathrm{ABD})}(t)\big) + \varepsilon^{(\mathrm{ABD})}(t),$

and similarly for $x^{(\mathrm{THO})}(t)$. Suppose the airflow, ABD and THO are sampled at $\xi_0$ Hz. In the preprocessing step, local regression using weighted linear least squares and a second-degree polynomial model is applied to remove the trend. Based on this model and the preprocessing, at each time $t_l := l \Delta_t$, where $\Delta_t = 1/\xi_0$ and $l \in \mathbb{N}$, (7) and (8) are applied to estimate $\hat a_{1,k}^{(\mathrm{ABD})}(t_l)$ and $\hat\phi_{1,k}^{(\mathrm{ABD})}(t_l)$ (and likewise for THO), which are stacked into the covariate vector $x_l$. Here, $\hat a_{1,k}^{(\mathrm{ABD})}(t_l)$, $\cos(2\pi \hat\phi_{1,k}^{(\mathrm{ABD})}(t_l))$ and $\sin(2\pi \hat\phi_{1,k}^{(\mathrm{ABD})}(t_l))$ jointly represent the k-th harmonic of ABD. Note that the sine and cosine functions are simultaneously taken into account since they fully capture the monotonically increasing phase function $\phi_{1,k}^{(\mathrm{ABD})}$. With $x_l$, Takens' lag map [43] is applied to incorporate the temporal structure via $x_{l,\mathrm{lag}} := [x_{l-9}, \ldots, x_l] \in \mathbb{R}^{240}$, which will be used as the covariates for the GP regression. Since in most cases the orders of magnitude of the amplitude and the phase are different and their units are not comparable, $x_{l,\mathrm{lag}}$ is normalized so that all components in the harmonic representation space are on the same scale [11]. Normalization is needed here because the Euclidean distance in the harmonic representation space is used when fitting the GP. To simplify the notation, the same notation $x_{l,\mathrm{lag}}$ is used to denote the normalized $x_{l,\mathrm{lag}}$. A sketch of this covariate construction is given below.

With $x_{l,\mathrm{lag}}$ and the concurrent airflow signal $y(t_l)$, we are in a position to describe our LocGP fitting procedure. First, the airflow signal is divided into non-overlapping time windows $\{I_j\}$, $j \in \mathbb{N}$. For a given window $I_j$ and a given time $t_{l_0} \in I_j$, to predict the airflow signal $g(t_{l_0})$, the training data are constructed by taking the union of the $K \in \mathbb{N}$ NNs, in the harmonic representation space, of all time points in $I_j$, where those NNs are chosen from the training pool (i.e., the "past" time windows $I_{j'}$ with $j' < j$). Here, K is a tuning parameter. After obtaining the MLE of the GP for each time window, we predict the airflow signal using the plug-in estimate of the conditional mean at each time point. The conditional standard deviation at each time point is used to quantify the pointwise prediction uncertainty. To examine the improvement obtained by using the GP, we also fit a locally stationary linear regression, but without the lag map (to avoid numerical instability due to the strong temporal dependence of the SST features). We refer to this benchmark model as LocLm.
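As a concrete illustration of the first step, the sketch below builds $x_{l,\mathrm{lag}}$ from per-sample harmonic amplitudes and phases (e.g., the outputs of the `sst_features` sketch above). The stacking order, the per-column z-scoring, and the function name are our illustrative choices; in particular, in the actual procedure the standardization would be computed from the training pool.

```python
import numpy as np

def harmonic_covariates(amp_abd, ph_abd, amp_tho, ph_tho, n_lag=10):
    """Stack amplitude, cos(2*pi*phase) and sin(2*pi*phase) of the first four
    harmonics of ABD and THO (24 features per sample), apply a lag map of
    length n_lag, and z-score each column."""
    blocks = []
    for a, p in ((amp_abd, ph_abd), (amp_tho, ph_tho)):   # each array is (n_samples, 4)
        blocks += [a, np.cos(2 * np.pi * p), np.sin(2 * np.pi * p)]
    x = np.hstack(blocks)                                  # (n_samples, 24)

    # Takens' lag map: row l collects [x_{l-n_lag+1}, ..., x_l], giving 240 covariates
    n = x.shape[0]
    x_lag = np.hstack([x[i:n - n_lag + 1 + i] for i in range(n_lag)])

    # put amplitude and phase features on the same scale (Euclidean distance is used later)
    x_lag = (x_lag - x_lag.mean(axis=0)) / (x_lag.std(axis=0) + 1e-12)
    return x_lag                                           # (n_samples - n_lag + 1, 240)
```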
Algorithm 1.
Inputs: subject s and the j-th time window $I_j$; $\{x_{l,\mathrm{lag}}\}$, where $l \in I_j$ (i.e., the covariates in the testing window); K, the number of NNs; the prediction type.
(1) If prediction = intra-subject, search the K-NNs from the training pool $\mathcal{F} = \{(y^{s}_{l}, x^{s}_{l,\mathrm{lag}})\}_{l \in I_{j'},\, j' < j}$, where $(y^{s}_{l}, x^{s}_{l,\mathrm{lag}})$ are the flow and harmonic representation pairs of the s-th subject; if prediction = inter-subject, then the training pool is $\mathcal{F} = \bigcup_{s' \ne s} \{(y^{s'}_{l}, x^{s'}_{l,\mathrm{lag}})\}$.
(2) Take the union of the K-NNs of all harmonic representations within the j-th window to form the training set $T_j$ and its corresponding response set $R_j$.
(3) Construct the covariance matrix $\Sigma$ from $T_j$. If DB is considered, construct the covariance matrix as $D^{-1/2} \Sigma D^{-1/2}$, where D is a diagonal matrix with $D_{i,i} = \sum_{j} (\Sigma)_{i,j}$.
(4) Estimate the GP parameters $(\mu, \sigma^2, \rho, \tau^2)$ by MLE from $T_j$ and $R_j$. Denote the estimated parameters as $(\hat\mu, \hat\sigma^2, \hat\rho, \hat\tau^2)$.
Outputs:
• Point estimate: $\hat\mu + \hat k [\hat\Sigma + \hat\tau^2 I_n]^{-1} (y - \hat\mu \mathbf{1})$.
• Standard deviation: $\sqrt{\mathrm{diag}\big(\hat\sigma^2 - \hat k [\hat\Sigma + \hat\tau^2 I_n]^{-1} \hat k^{\top}\big)}$.

Databases. The proposed method is examined by applying it to two datasets. The first dataset consists of 5 clinical subjects evaluated for sleep apnea, whose whole-night standard polysomnography recordings were used. The recordings were collected from the sleep center at Mackay Memorial Hospital (MMH), Taipei, Taiwan. The institutional review board of MMH approved the study protocol (No. 18MMHIS142e). These 5 subjects were diagnosed to be free of sleep apnea (i.e., the apnea-hypopnea index is less than 5), with recording lengths ranging from 6 to 7.5 hours. THO and ABD were recorded by piezo-electric bands and airflow was measured using nasal pressure, both at a sampling rate of 100 Hz. All signals were acquired by a biosignal amplifier system from Embla (NeuroLite, Belp, Switzerland). These recordings will be integrated into the Taiwan Integrated Database for Intelligent Sleep (TIDIS) project, so we refer to this database as TIDIS. To focus on evaluating our proposed model LocGP and to avoid the complicated signal quality issues commonly encountered in biomedical signals, these recordings were visually screened and confirmed to be only mildly contaminated by artifacts. The second database is from the Perelman School of Medicine, University of Pennsylvania (UPenn hereafter). The UPenn dataset consists of a single subject used in [33], with 829.3 seconds of 120 Hz signals: flow measured by a spirometer, and ABD and THO signals measured by respiratory inductance plethysmography (RIP). The airflow prediction for the UPenn dataset represents a more challenging setting, where we need to predict the paradoxical breathing caused by modulation of respiration during transitions from spontaneous to pressure support ventilation under general anesthesia.

Evaluation. We carry out both intra-subject and inter-subject predictions. In the intra-subject setup, we take the subject's historical data to train the prediction model, which is then used to predict the airflow signal in the future. In the inter-subject setup, we build the prediction model from a group of subjects, and apply the trained prediction model to predict the airflow signal of any subject not in that training group. The following steps are common to both setups. First, the airflow, THO and ABD are resampled at 10 Hz (i.e., $\Delta_t = 1/10$). For the STFT and SST, $h(t) = e^{-t^2/32}$ is chosen as the window. The frequency range of the SST is set to be from 0 to 2 Hz, the frequency resolution is $10^{-4}$ Hz, and b, the frequency range in (8), is chosen to be 0.05. For the curve extraction, $\lambda = 0.3$ is chosen.
Since the prediction is performed in a window-by-window fashion, the window size is set to 30 seconds, which is consistent with the epoch length commonly used in sleep stage classification. For the intra-subject prediction, the first time window is excluded, as some historical information is needed for the prediction. To determine the training data, the number of NNs at each time point is chosen to be K = 3. The exponential covariance function is chosen when fitting LocGP. Define the root mean square error (RMSE) of the predicted airflow signal by $\|\hat y - y\|_{L^2(I_j)}$, where $\hat y$ and y are the predicted and true airflow signals, respectively, and $I_j$ is the j-th window. Due to the inter-subject variability, there is an inevitable global phase shift between the predicted airflow signal and the true airflow signal when the inter-subject prediction is carried out. The global phase shift is therefore estimated and removed by a global alignment before carrying out our evaluation of the inter-subject prediction. With the RMSE, over each window $I_j$, the RMSE reduction is defined as the relative reduction of the RMSE achieved by the prediction; it is our primary metric to assess the prediction performance. A large RMSE reduction indicates good prediction performance. While the window-based RMSE reduction index is a desirable metric given the nonstationary nature of the respiratory signal, it does not distinguish errors in different frequency bands. Since we want to recover the detailed information encoded in the airflow signal as accurately as possible, we further evaluate the performance in the high-frequency region. To this end, the differentiated predicted airflow signal, $\hat y'$, is evaluated against the differentiated true airflow signal, $y'$, where a 6th-order Butterworth low-pass filter is applied to obtain these differentiated signals, and the corresponding RMSE reduction is computed. We call this index the differentiation-RMSE reduction. In the following experiments, the computations were carried out using macOS Mojave 10.14.6, a 3.2 GHz Intel Core i7, and 32 GB 2667 MHz DDR4 memory, with Matlab version 9.8.0.1323502 (R2020a).
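For concreteness, the window-wise metrics can be computed as in the sketch below (ours, in Python rather than the Matlab environment above). The baseline used inside `rmse_reduction` (the RMSE of a trivial zero prediction of the detrended flow) and the Butterworth cutoff are assumptions made for illustration, not the paper's exact definitions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def window_rmse(y_hat, y):
    return np.sqrt(np.mean((y_hat - y) ** 2))

def rmse_reduction(y_hat, y):
    # assumed baseline: RMSE of predicting zero flow for the detrended signal
    return 1.0 - window_rmse(y_hat, y) / window_rmse(np.zeros_like(y), y)

def diff_rmse_reduction(y_hat, y, fs=10.0, cutoff_hz=1.0):
    # differentiate, low-pass with a 6th-order Butterworth filter, then compare
    b, a = butter(6, cutoff_hz / (fs / 2.0), btype="low")
    dy_hat = filtfilt(b, a, np.diff(y_hat) * fs)
    dy = filtfilt(b, a, np.diff(y) * fs)
    return rmse_reduction(dy_hat, dy)
```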
Fig. 3 shows a 200-second segment of detrended signals from the TIDIS database (top three panels) and a segment of detrended signals of the same length from the UPenn database (bottom three panels). The airflow, ABD and THO signals all oscillate at the same rate, but with different oscillatory morphologies. Unlike in the TIDIS database, we can clearly see a dramatic amplitude variation of the airflow, ABD and THO signals in the UPenn database.

5.1. TIDIS. We first summarize the results for the intra-subject prediction. In Fig. 4, the boxplots of the RMSE reduction for the intra-subject prediction over all windows are shown for the subjects with the best, average and worst prediction performance. Overall, LocGP and LocDBGP with the Matérn covariance function (ν = 1.5) and the lag map perform the best (except for Subject 4). However, this choice leads to less well calibrated prediction uncertainty. Therefore, we will report the results using the exponential covariance function. It is also clear from the boxplots that a good amount of improvement is achieved compared with LocLm. Next, we focus on Subject 5, which represents the "average" case among all 5 subjects in the TIDIS dataset. In the top panel of Fig. 5, the RMSE reduction over consecutive 30-second windows is shown. The predictions are not accurate over the first few windows due to the limited historical signals. However, the predictive power improves as the observational history increases.

Physiologically, the prediction performance may depend on the sleep stage. According to the American Academy of Sleep Medicine [3], there are five sleep stages: wake (W), rapid eye movement (R), and non-rapid eye movement, which can be further divided into N1, N2, and N3. See the top panel of Fig. 5 for an example of the whole-night sleep dynamics. It is a physiological fact that the respiratory activity during N2 and N3 is more stable than that during W, R and N1. Therefore, we would expect a higher RMSE reduction during N2 and N3. To confirm this hypothesis, we divide all 30-second windows into two groups. The first group contains those windows that have the same sleep stage as their following windows, and the second group contains those windows whose sleep stage differs from that of their following windows. In other words, the second group contains windows with sleep stage jumps. The medians of the overall RMSE reduction over all windows of the 5 subjects with (without) sleep stage jumps are 0.845 (0.885), 0.789 (0.834), 0.677 (0.696), 0.745 (0.766) and 0.724 (0.754) for N3, N2, N1, REM and W, respectively. As expected, when there are changes in the sleep stage, the RMSE reduction is lower, and the RMSE reductions are usually high when the sleep stage is N2 or N3. In order to illustrate the quality of the predictions, three examples that represent the "best", "average" and "poor" cases of Subject 5 are plotted in the bottom panel of Fig. 5. Here, it is clear that LocDBGP and LocGP both predict the airflow well and their fits are nearly identical. The "poor" case shown here presents a typical example of how artifacts deteriorate the prediction performance.

Next, the performance of the inter-subject prediction is evaluated. Here, the signals from Subjects 1 and 2 are used to establish the prediction model for the flow signals, and the model is applied to Subjects 3, 4, and 5. The global alignments to correct the global phase shift are 0.1 second forward, 0.1 second backward and 1 second backward for Subjects 3, 4 and 5, respectively. Fig. 6 shows the boxplots of the RMSE reduction over all windows. We also add the corresponding intra-subject prediction for comparison to contrast the inter-subject and intra-subject predictions. The median differentiation-RMSE reduction for LocGP is also evaluated. As in the assessment of the intra-subject prediction, we examine the "average" case, Subject 3, shown in Fig. 7. In the top subplot of Fig. 7, it is clear that, compared with the intra-subject prediction, the predictions in the first few windows are reasonably good. This is because, in the inter-subject prediction, we do not need to accumulate historical data before establishing a reasonably good prediction model. Lastly, we zoom in on several 10-second time windows of Subject 3 to both qualitatively and quantitatively assess the inter-subject predictions. In the bottom subplot of Fig. 7, three windows associated with the "best", "average", and "poor" prediction performances are shown. Despite the less impressive numerical performance in terms of RMSE reduction, the "best" and the "average" cases capture the flow signals reasonably well, and the overall oscillatory morphology is well captured. The "poor" case here again demonstrates that, as would be expected, the prediction performance degrades when the respiratory signals are contaminated by artifacts.

5.2. UPenn. The intra-subject prediction for the UPenn data is demonstrated here.
This database represents a more challenging paradigm, in that we need to adapt to the paradoxical breathing caused by modulation of respiration during transitions from spontaneous to pressure support ventilation under general anesthesia. The top subfigure of Fig. 8 shows the RMSE reduction of the prediction using LocGP and LocDBGP, respectively. Both fits are based on the exponential covariance function and the normalized SST features with the lag map. We observe that both LocDBGP and LocGP perform reasonably well, except for those windows in which dramatic changes in the oscillatory pattern occur (i.e., around the vertical gray lines, and in the middle of the third segment). The results suggest that the GP model tends to be more robust against sudden changes of the signal compared with the linear regression model, as these drops in RMSE reduction are smaller. The median coverage rates of LocGP and LocDBGP are 0.950 and 0.825, and the median differentiation-RMSE reduction is 0.6794 for both. A closer look into how LocGP and LocDBGP fit during three "representative" windows is shown in the bottom subfigures of Fig. 8. The left panel represents the "best" case, where the algorithm performs well, especially LocDBGP, with an RMSE reduction close to 1. The middle panel shows an "average" case, where LocDBGP captures the general flow pattern while LocGP somewhat underestimates it.

Figure 6. Boxplots of the RMSE reduction of all windows for intra-subject and inter-subject prediction. The "best", "average" and "worst" ones, Subjects 5, 3 and 4, are shown in the left, middle and right panels, respectively. In each sub-panel, the intra- and inter-subject results are shown on the left and right, respectively. As in Fig. 4, red boxplots are for LocLm, light blue boxplots are for the Matérn ν = 1.5 covariance, and green boxplots are for the exponential covariance.

In the supplementary materials, we provide more numerical results, including results with different covariance functions, different choices of K, skipping the diffusion, and skipping the normalization.

In this work, the LocGP and LocDBGP models are proposed to predict the airflow signal from the simultaneously recorded ABD and THO signals. This problem is challenging due to the need to handle the complicated non-stationary oscillation of the respiratory signals and the nonlinear relationship among those signals. The main novelty of this work is twofold. First, we propose a novel harmonic representation of the respiratory signals obtained by applying SST. Second, we develop a locally stationary GP, LocGP, to model the nonlinear relationship among the airflow, ABD and THO signals. The LocGP is enriched by capturing the geometric structure through the diffusion idea, which leads to LocDBGP. The LocGP not only enables computational scalability but also provides a means to handle different types of non-stationarity. We evaluate our method in both the intra-subject and inter-subject setups and obtain encouraging results. To our knowledge, this is the first algorithm able to handle such a challenge. Compared with the EMD algorithm considered in [33], the new harmonic representation is both theoretically well founded and interpretable. We expect that this representation has the potential to further evaluate the respiratory dynamics of patients with different diseases. The proposed model may also be useful in estimating other physiologic signals, such as arterial blood pressure, from nonintrusive surface sensors. The study has several limitations.
First, the number of components to use in SST is arbitrarily chosen to be four, as this setting provides good reconstructions of ABD and THO. It would be interesting to investigate how to determine the optimal number of components. The curve extraction algorithm (7) is critical for constructing the harmonic representation. While it works reasonably well in this work, we notice that it is also a source of error. Specifically, a fixed λ that works well for some segments of recordings might not work that well for other segments. We will explore solutions in our upcoming clinical work. Second, we only consider the simple covariance structure with the Euclidean distance of the standardized harmonic representation when fitting the GP. In general, other forms of covariance function can be used, for example the product form or the additive form. These general forms might have their own benefits, particularly when the amplitude and frequency are of different importance. How to construct a more sensible covariance function is another future direction. Third, the inter-subject variability is not handled in this study. We simply take the first two subjects (according to chronological order) to construct the prediction model and apply it to the remaining subjects. It would be beneficial to take the inter-subject variability into account when constructing a prediction model that can more sensibly and effectively "borrow strength" across individuals to achieve a better inter-subject prediction. Fourth, the recordings in the TIDIS database were confirmed visually to be relatively little impacted by artifacts. Since artifacts would inevitably reduce the prediction performance, to apply the established algorithm in the clinical setup, a proper signal quality index is needed so that we can properly identify respiratory signal segments of poor quality. To avoid distracting from the focus of this paper, we postpone the development of such a robust procedure to our future work. Last but not least, it should be emphasized that the breathing pattern can be controlled voluntarily. For example, when we are talking, the flow signal might deviate significantly. When a subject is in some pathological situations, like hemothorax or pneumothorax, the relationship between the rib cage volume and the lung volume is different, and the above-mentioned physiological relationship might not hold. In this work, these voluntary activities and pathological situations are not discussed. These important topics will be explored in our future work.

Appendix A. We show several examples in Fig. 9 to demonstrate that SST can efficiently decompose ABD and THO into their harmonics.

Figure 9. The reconstructed ABD (green) and THO (blue) signals of three segments from Subject 2 in the TIDIS dataset. The reconstruction is from the first 4 harmonics decomposed by SST.

Appendix B. The choice of covariates: SST versus time series. An example is provided to demonstrate the prediction improvement obtained by using harmonic representation features rather than time series features. Fig. 10 shows in-sample (left panel) and out-of-sample (right panel) predictions for linear regression using harmonic representation features as covariates (blue lines, LmSST, which is LocLm in the main article) and time series as covariates (pink lines, LmXY).
The main message here is that linear regression with harmonic representation features can often achieve fairly reasonable prediction performance, and the improvement over the time-domain linear regression is usually substantial.

Figure 10. Airflow predictions of the UPenn data for the training and testing periods using linear regression with harmonic representation features and with time-domain features as covariates, respectively.

In Fig. 11, the prediction results with the same training and testing sets, but with Gaussian process (GP) regression with the exponential covariance function, are shown. Here, the GP fitting is carried out using both harmonic representation features (green) and time-domain features (red). A couple of observations follow: 1) GP usually achieves a good in-sample prediction, especially with harmonic representation features; 2) the GP out-of-sample prediction using time-domain features as covariates can perform substantially worse than linear regression with harmonic representation features (blue).

Figure 11. As in Fig. 10, but here the comparison of predictions using linear regression with harmonic representations as covariates (blue), GP with time-domain features as covariates (red), and GP with harmonic representations as covariates (green), respectively, is shown.

An important conclusion here is that using harmonic representation features to represent the oscillatory signals in the regression analysis usually leads to good prediction performance, even with a simple model such as linear regression. A combination of SST and GP further improves the prediction accuracy.

The effect on the prediction of standardizing the harmonic representation space is examined here. After the standardization of the harmonic representation space, all components of the harmonic representation are on the same scale. Note that the scale of the amplitude components is usually different from that of the phase components, which are always between -1 and 1. The comparison is shown in Fig. 12.

Figure 12. RMSE reduction of each subject with (SD) and without (Base) standardization of the harmonic representation features when fitting the GP using the exponential (Exp) covariance (upper) and the Matérn ν = 1.5 (Mat32) covariance (lower), respectively.

In the main text of the manuscript, we standardize the harmonic representation features for the training and testing datasets separately. In Fig. 13, it is shown that the prediction performance further improves if we apply a single standardization to the whole dataset, including both the training and testing datasets. Note that standardizing the training and testing datasets separately is what can be carried out in a real-world application.

Figure 13. Boxplots of RMSE reduction when applying the standardization separately ("Sep") and all together ("All") for the exponential (green) and Matérn ν = 1.5 (light blue) covariances.

A summary of the prediction performance using different covariance functions when fitting LocGP is provided here. The results suggest that the GP with the Matérn ν = 1.5 covariance tends to perform slightly better than the exponential and squared exponential covariances in terms of RMSE reduction. However, the associated prediction uncertainty, evaluated by the empirical coverage rate, tends to be the worst (a bit less than 80% coverage rate for its 95% confidence interval). It is also worth pointing out that, despite being commonly used when performing non-parametric regression with a GP, the squared exponential covariance performs the worst in all the cases we have examined.
Figure 14. Boxplots of RMSE reduction when using the exponential (Exp) covariance (green), the Matérn ν = 1.5 (Mat32) covariance (light blue), and the squared exponential (sqExp) covariance (pink) with standardized harmonic representation features and the lag map.

When fitting both LocGP and LocDBGP, one of the tuning parameters is K, the number of NNs used to form the training set. A sensitivity analysis with K = 1, 3, and 50 is carried out on the UPenn dataset. Fig. 15 indicates that the prediction performance is relatively insensitive to K, except for the first half of segment 2, where the prediction using K = 50 is worse than the predictions with K = 1 and 3. This result is not surprising: since there are few epochs during the transition period, it is better to consider a small K. The results here also suggest that K = 3 is a reasonable choice. We also examine how the empirical coverage rate (ECP) depends on the number of NNs used. Fig. 16 suggests that the ECP might decrease with K, where a coverage higher than the nominal rate (i.e., 0.95) with K = 1 may suggest that the interval is too wide and therefore the inference is not "sharp".

Table 3. The relative frequency of each sleep stage conditioned on transition periods for each TIDIS subject.

Due to the space limitation in the main article, we only provide the RMSE reduction time series over all consecutive 30-second windows for Subject 2. Here we plot the RMSE reduction time series for each subject.

Due to the space limitation in the main article, we only provide the RMSE reduction time series over all consecutive 30-second windows for Subject 3. Here we plot the RMSE reduction time series for Subjects 4 and 5.

Appendix I. TIDIS inter-subject prediction RMSE with all combinations of two training subjects. Here we present the RMSE reduction for the inter-subject prediction of each subject using any two of the other subjects as the training data, so there are $\binom{4}{2} \times 5 = 6 \times 5 = 30$ different combinations. As explained in the main text, here we apply the global alignment to account for possible phase shifts. For most cases these shifts are relatively small (within 0.2 second), except for all the cases for Subject 5, where the phase shifts are either 0.9 or 1.0 second backward, and for Subject 2 using Subjects 4 and 5 as the training set, where the phase shift is 0.9 second forward. The number of NNs used here is 1 rather than 3.

References
[1] Fast direct methods for Gaussian processes.
[2] Hierarchical modeling and analysis for spatial data.
[3] The AASM manual for the scoring of sleep and associated events: rules, terminology and technical specifications.
[4] Time series analysis: forecasting and control.
[5] Time series: theory and methods.
[6] Manifold Gaussian processes for regression.
[7] Extraction of respiratory signals from the electrocardiogram and photoplethysmogram: technical and physiological determinants.
[8] Analysis methods for computer experiments: how to assess and what counts? Statistical Science.
[9] Non-parametric and adaptive modelling of dynamic periodicity and trend with heteroscedastic and dependent errors.
[10] STL: a seasonal-trend decomposition.
[11] Local regression models.
[12] Statistics for spatial data.
[13] Statistics for spatio-temporal data.
[14] Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments.
[15] Synchrosqueezed wavelet transforms: an empirical mode decomposition-like tool. Applied and Computational Harmonic Analysis.
[16] Forecasting time series with complex seasonal patterns using exponential smoothing.
[17] Diffusion based Gaussian process regression via heat kernel reconstruction.
[18] Comparative provocation test of respiratory monitoring methods.
[19] Airflow shape is associated with the pharyngeal structure causing OSA.
[20] The theory of stochastic processes I.
[21] Surrogates: Gaussian process modeling, design, and optimization for the applied sciences.
[22] Local Gaussian process approximation for large computer experiments.
[23] Lognormal and moving window methods of estimating acid deposition.
[24] Local prediction of a spatio-temporal process with an application to wet sulfate deposition.
[25] A case study competition among methods for analyzing large spatial data.
[26] The empirical mode decomposition and the Hilbert spectrum for nonlinear and nonstationary time series analysis.
[27] Bayesian calibration of computer models.
[28] Locally stationary spatio-temporal interpolation of Argo profiling float data.
[29] Wave-shape function analysis.
[30] Extrinsic Gaussian processes for regression and classification on manifolds.
[31] Sleep apnea detection based on thoracic and abdominal movement signals of wearable piezo-electric bands.
[32] Recent development of respiratory rate measurement technologies.
[33] Hilbert-Huang transform yields improved minute volume estimates from respiratory inductance plethysmography during transitions to paradoxical breathing.
[34] Bayesian learning for neural networks.
[35] Intrinsic Gaussian processes on complex constrained domains.
[36] Curve fitting and optimal design for prediction.
[37] Design and analysis of computer experiments.
[38] The design and analysis of computer experiments.
[39] An optimized method for estimating the tidal volume from intracardiac or body surface electrocardiographic signals: implications for estimating minute ventilation.
[40] Inference of synchrosqueezing transform: toward a unified statistical analysis of nonlinear-type time-frequency analysis.
[41] Interpolation of spatial data: some theory for kriging.
[42] Approximating likelihoods for large spatial data sets.
[43] Detecting strange attractors in turbulence.
[44] Spatiotemporal Statistics with R.
[45] Gaussian processes for machine learning.
[46] Adaptive analysis of complex data sets.
[47] Instantaneous frequency and wave shape functions (I).
[48] Current state of nonlinear-type time-frequency analysis and applications to high-frequency biomedical signals. Current Opinion in Systems Biology.
[49] A new approach to complicated and noisy physiological waveforms analysis: peripheral venous pressure waveform as an example.
[50] Using synchrosqueezing transform to discover breathing dynamics from ECG signals.
[51] Bayesian manifold regression.
[52] Mean squared prediction error in the spatial linear model with estimated covariance parameters.
[53] A comparison of spatial semivariogram estimators and corresponding ordinary kriging predictors.

This work is part of the Taiwan Integrated Database for Intelligent Sleep (TIDIS) project, supported by the Ministry of Science and Technology 109-2119-M-002-014- and NCTS, Taiwan.