key: cord-0656507-1yay69kq
authors: Sun, Chenxi; Hong, Shenda; Song, Moxian; Li, Hongyan
title: A Review of Deep Learning Methods for Irregularly Sampled Medical Time Series Data
date: 2020-10-23
journal: nan
DOI: nan
sha: ae27afd8505feb4db0e018645a6cb3c4672e660b
doc_id: 656507
cord_uid: 1yay69kq

Irregularly sampled time series (ISTS) data have irregular temporal intervals between observations and different sampling rates across sequences. ISTS data commonly appear in healthcare, economics, and geoscience. In the medical domain in particular, the widely used Electronic Health Records (EHRs) contain abundant irregularly sampled medical time series (ISMTS) data. Developing deep learning methods on EHR data is critical for personalized treatment, precise diagnosis, and medical management. However, it is challenging to apply deep learning models directly to ISMTS data. On the one hand, ISMTS data exhibit both intra-series and inter-series relations, so both local and global structures should be considered. On the other hand, methods should balance task accuracy against model complexity while retaining generality and interpretability. Many existing works have tried to solve these problems and have achieved good results. In this paper, we review these deep learning methods from the perspectives of technology and task. Under the technology-driven perspective, we divide them into two categories: missing data-based methods and raw data-based methods. Under the task-driven perspective, we also divide them into two categories: data imputation-oriented and downstream task-oriented. For each category, we point out its advantages and disadvantages. Moreover, we implement some representative methods and compare them on four medical datasets with two tasks. Finally, we discuss the challenges and opportunities in this area.

Time series data have been widely used in practical applications such as health [1], geoscience [2], sales [3], and traffic [4]. Time series prediction, classification, and representation have attracted increasing attention, and many efforts have been made to address these problems in the past few years [4, 5, 6, 7].

Most models assume that time series data are evenly sampled and complete. In the real world, however, time series observations usually have non-uniform time intervals between successive measurements. Three reasons can cause this characteristic: 1) missing data caused by broken sensors, failed data transmissions, or damaged storage; 2) sampling machines that do not have a constant sampling rate; and 3) different time series coming from different sources with various sampling rates. We call such data irregularly sampled time series (ISTS) data. ISTS data naturally occur in many real-world domains, such as weather/climate [2], traffic [8], and economics [3].

In the medical environment, irregularly sampled medical time series (ISMTS) are abundant: the widely used electronic health records (EHRs) contain a large amount of ISMTS data. EHRs are real-time, patient-centered digital versions of patients' paper charts. They provide opportunities to develop advanced deep learning methods that improve healthcare services and save lives by assisting clinicians with diagnosis, prognosis, and treatment [9]. Many works based on EHR data have achieved good results, for example in mortality risk prediction [10, 11], disease prediction [12, 13, 14], concept representation [15, 16], and patient typing [17, 16, 18].

Because of the special characteristics of ISMTS, the most important step is establishing suitable models for them, which is especially challenging in medical settings.

Different tasks need different adaptation methods. Data imputation and prediction are the two main tasks: imputation is a processing step during modeling, while prediction is the downstream task that serves the final goal, and the two may be intertwined. Standard techniques such as mean imputation [19], singular value decomposition (SVD) [20], and k-nearest neighbours (kNN) [21] can impute data. However, they can still leave a large gap between the imputed and the true data distributions, and they cannot by themselves perform downstream tasks such as mortality prediction. Linear regression (LR) [22], random forests (RF) [23], and support vector machines (SVM) [24] can make predictions but fail on ISTS data.

State-of-the-art deep learning architectures have been developed for both supervised and unsupervised tasks related to imputation and prediction. Recurrent neural networks (RNNs) [25, 26, 27], auto-encoders (AEs) [28, 29], and generative adversarial networks (GANs) [30, 31] have achieved good performance in medical data imputation and medical prediction, thanks to the learning and generalization abilities that their complex nonlinearity provides. They can perform the prediction and imputation tasks separately, or perform both at once by combining network structures.

Existing deep learning methods understand the characteristics of ISMTS data in different ways. We summarize these understandings as the missing data-based perspective and the raw data-based perspective. The first perspective [1, 32, 33, 34, 35] treats irregular series as having missing data and solves the problem through more accurate imputation. The second perspective [17, 36, 37, 38, 39] focuses on the structure of the raw data itself and models ISMTS directly by exploiting the irregular time information. Neither view is uniformly superior to the other.

Either way, the data relations must be captured comprehensively for effective modeling. We identify two relations in ISMTS: intra-series relations (relations among data within a time series) and inter-series relations (relations among data in different time series). All existing works model one or both of them. These relations correspond to the local and global structures of the data, which we introduce in Section 3.

In addition, the same method may perform differently on different EHR datasets. For example, the real-world MIMIC-III [40] and CINC [41] datasets record multiple diseases. Records of different diseases have distinct data characteristics, and the prediction results of general methods [1, 17, 33, 35] vary across disease-specific datasets. Thus, many existing methods model records of a specific disease, such as sepsis [42], atrial fibrillation [43, 44], and kidney disease [45], and have improved prediction accuracy.

The rest of the paper is organized as follows. Section 2 gives basic definitions and abbreviations. Section 3 describes the features of ISMTS from two viewpoints: intra-series and inter-series. Sections 4 and 5 introduce related works from the technology-driven and task-driven perspectives. Within each perspective, we group the methods into specific categories and analyze their merits and demerits. Section 6 compares some methods experimentally on four medical datasets with two tasks. Sections 7 and 8 discuss the challenges and opportunities for modeling ISMTS data and conclude the paper.

The summary of abbreviations is in Table 1 .

A typical EHR dataset consists of information on a number of patients, including demographic and in-hospital information. In-hospital information has the hierarchical patient-admission-code form shown in Figure 1. Each patient may have several admission records, since he or she could be hospitalized several times. The codes include diagnoses, lab values, and vital sign measurements.

Each record $r_i$ consists of many codes, including a set of static diagnosis codes $d_i$ and a set of dynamic vital sign codes $x_i$. Each code has a time stamp $t$.

EHRs contain many ISMTS for two reasons: 1) one patient can have multiple admissions, and 2) one admission can contain multiple time series records. The multiple admission records of a patient have different time stamps. Because of health status dynamics and other unpredictable factors, a patient visits hospitals at varying intervals [46]. For example, in Figure 1, March 23, 2006, July 11, 2006, and February 14, 2011 are the patient's admission times. The interval between the 1st and 2nd admissions is a few months, while the interval between the 2nd and 3rd admissions is nearly 5 years. Each time series within an admission, such as blood pressure, also has varying time intervals. As shown for Admission 2 in Figure 1, the sampling time is not fixed. Different physiological variables are examined at different times as symptoms change, and not every test is measured regularly during an admission. When a symptom worsens, the corresponding variables are examined more frequently; when it disappears, they are no longer examined.

Without loss of generality, we discuss only univariate time series; multivariate time series can be modeled in the same way.

Definition 2 specifies three important elements of ISMTS: the value x, the time t, and the time interval δ. Some missing value-based works (introduced in Section 4) additionally use a masking vector m ∈ {0, 1} to indicate missing values.
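As a minimal sketch, the raw-data view of a univariate ISMTS can be built by sorting (time, value) pairs and taking successive differences of the time stamps. The function name `to_triples` and the convention δ = 0 for the first observation are illustrative assumptions, not taken from Definition 2:

```python
from typing import List, Tuple


def to_triples(observations: List[Tuple[float, float]]) -> Tuple[List[float], List[float], List[float]]:
    """Split (time, value) pairs into values x, time stamps t and intervals delta.

    The first interval is set to 0 by convention (an assumption)."""
    obs = sorted(observations)  # sort by time stamp
    t = [time for time, _ in obs]
    x = [value for _, value in obs]
    delta = [0.0] + [t[i] - t[i - 1] for i in range(1, len(t))]
    return x, t, delta


# Blood-pressure-like readings at irregular hours (illustrative values).
x, t, delta = to_triples([(0.0, 120.0), (1.0, 118.0), (3.0, 125.0), (27.0, 130.0)])
print(delta)  # [0.0, 1.0, 2.0, 24.0]
```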

3 Characteristics of irregularly sampled medical time series

The medical measurements are frequently correlated both within streams and across streams. For example, the value of blood pressure of a patient at a given time could be correlated with the blood pressure at other times, and it could also have a relation with the heart rate at that given time. Thus, we will introduce ISMTS's irregularity in two aspects: 1) intra-series and 2) inter-series.

Intra-series irregularity refers to the irregular time intervals between neighboring observations within a stream. For example, as shown in Figure 1, the blood pressure time series has intervals of 1 hour, 2 hours, and even 24 hours. Large intervals add a time sparsity factor [46]. There are two existing ways to handle irregular time intervals: 1) fixing an interval and treating the time points without data as missing data, and 2) modeling the time series directly and treating the irregular time intervals as information. The first way requires a function to impute missing data [47, 48]. For example, some RNNs [1, 49, 34, 50, 51, 52] impute sequence data effectively by considering order dependency. The second way usually feeds the irregular time intervals to the model as inputs. For example, some RNNs [17, 36] apply a time decay to the order dependency, which weakens the relation between neighbors separated by long intervals.
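The first way can be sketched with a hypothetical helper `discretize`. It assumes an hourly grid and a fixed observation horizon, and it keeps the last observation when several fall into one bin; real pipelines may aggregate differently:

```python
import math
from typing import List, Optional


def discretize(times: List[float], values: List[float], period: float, horizon: float):
    """Place irregular observations on a regular grid with step `period`.

    Returns the grid (None marks a missing point) and a 0/1 mask. Observations
    outside [0, horizon) are ignored; if several fall into one bin, the last wins."""
    n_bins = math.ceil(horizon / period)
    grid: List[Optional[float]] = [None] * n_bins
    for time, value in sorted(zip(times, values)):
        if not 0.0 <= time < horizon:
            continue
        grid[int(time // period)] = value
    mask = [0 if v is None else 1 for v in grid]
    return grid, mask


grid, mask = discretize([0.0, 1.0, 3.0, 27.0], [120.0, 118.0, 125.0, 130.0],
                        period=1.0, horizon=28.0)
print(sum(mask), len(mask))  # 4 28
```

Only 4 of the 28 hourly bins are observed here. The remaining 24 become missing points that an imputation function must fill.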

Inter-series irregularity is mainly reflected in the different sampling rates of different time series. For example, as shown in Figure 1, vital signs such as heart rate (ECG data) have a high sampling rate (seconds), while lab results such as pH (PH data) are measured infrequently (days) [53, 54]. There are two existing ways to handle multiple sampling rates: 1) treating the data as a multivariate time series, and 2) processing multiple univariate time series separately. The first way aligns the variables of different series along the same time axis and then solves the resulting missing data problem [55]. The second way models the different time series simultaneously and then designs fusion methods [38].
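The first way (alignment into a multivariate series) can be sketched by taking the union of all time stamps. The helper `align` and the toy values are hypothetical, and hours stand in for the seconds-versus-days contrast in the text:

```python
from typing import Dict, List, Optional


def align(series: Dict[str, Dict[float, float]]):
    """Align univariate series on the union of their time stamps.

    `series` maps a variable name to {time: value}. Entries that a variable did
    not record at a given time become None, i.e. missing data."""
    stamps = sorted({t for obs in series.values() for t in obs})
    names = sorted(series)
    rows: List[List[Optional[float]]] = [[series[n].get(t) for n in names] for t in stamps]
    return stamps, names, rows


# Heart rate every hour vs. a single pH measurement (illustrative values, hours).
stamps, names, rows = align({
    "heart_rate": {0.0: 80.0, 1.0: 82.0, 2.0: 85.0, 3.0: 83.0},
    "pH": {2.0: 7.38},
})
for t, row in zip(stamps, rows):
    print(t, row)
```

The low-rate variable leaves three of the eight aligned cells empty, which is the missing data problem that the first way must then solve.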

Numerous related works can model ISMTS data. We categorize them from two perspectives, 1) technology-driven and 2) task-driven, and describe each category in detail.

From the technology-driven perspective, we divide existing works into two categories: 1) the missing data-based perspective and 2) the raw data-based perspective. The specific categories are shown in Figure 2. The missing data-based perspective assumes that every time series has uniform time intervals, and time points without data are treated as missing data points. As shown in Figure 5a, converting irregular time intervals to regular ones produces missing data. The missing rate $r_{\text{missing}}$ measures the degree of missingness at a given sampling rate $r_{\text{sampling}}$:

$$r_{\text{missing}} = \frac{\#\ \text{time points with missing data}}{\#\ \text{time points}} \tag{3}$$
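A minimal sketch of Eq. (3). It assumes the data have already been placed on a regular grid and summarized by a 0/1 mask, and the function `missing_rates` is illustrative:

```python
from typing import List


def missing_rates(mask: List[List[int]]):
    """Compute per-variable and overall missing rates from a 0/1 mask.

    mask[d][k] = 1 if variable d is observed at grid point k, else 0.
    Implements r_missing = (# time points with missing data) / (# time points)."""
    per_var = [row.count(0) / len(row) for row in mask]
    total = sum(row.count(0) for row in mask) / sum(len(row) for row in mask)
    return per_var, total


per_var, total = missing_rates([
    [1, 1, 1, 1, 1, 1, 1, 1],  # e.g. heart rate, recorded at every step
    [0, 0, 1, 0, 0, 0, 0, 1],  # e.g. a lab value, recorded twice
])
print(per_var, total)  # [0.0, 0.75] 0.375
```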

The ISMTS in real-world EHRs have a severe missing data problem. For example, Luo et al. [56] gathered statistics on the CINC2012 dataset [10, 57]. Over time, the maximum missing rate at each time stamp is always higher than 95%, most variables have missing rates above 80%, and the mean missing rate is 80.67%, as shown in Figure 3a. Three other real-world EHR datasets, MIMIC-III [40], CINC2019 [58, 41], and COVID-19 [59], are also affected by missing data, as shown in Figures 3b, 3c, and 3d. From this viewpoint, existing methods either impute the missing data or model the missingness information directly.

The raw data-based perspective uses irregular data directly. Its methods do not fill in missing data to make the sampling regular; instead, they treat the irregular timing itself as valuable information. As shown in Figure 5b, the time stamps remain irregular and the time intervals are recorded. Irregular time intervals and multiple sampling rates are, respectively, the intra-series and inter-series characteristics introduced in Section 3. Both are very common in EHR databases. For example, the CINC2019 dataset is relatively clean but still has more than 60% of samples with irregular time intervals, and only 1.28% of samples in the MIMIC-III dataset have the same sampling rate. From this viewpoint, methods usually integrate features of the varying time intervals into the model inputs, or design models that can process samples with different sampling rates.

Methods from the missing data-based perspective convert ISMTS into equally spaced data. They [60, 61, 62] discretize the time axis into non-overlapping, hand-designed intervals, which introduces missing data. Missing values damage the temporal dependencies of sequences [56] and make it infeasible to apply many existing models directly, such as linear regression [63] and recurrent neural networks (RNNs) [64]. As shown in Figure 4, because of missing values, the second valley of the blue signal is not observed and cannot be inferred by simply relying on these basic models [63, 64]. Yet valleys in blood pressure are significant for ICU patients as indicators of sepsis [65], a leading cause of mortality in the ICU [66]. Thus, missing values have an enormous impact on data quality, resulting in unstable predictions and other unpredictable effects [67]. Many prior efforts have been dedicated to models that handle missing values in time series. They can be divided into two categories: 1) two-step approaches and 2) end-to-end approaches.

Two-step approaches ignore or impute missing values and then perform downstream tasks on the preprocessed data. A simple solution is to omit the missing data and analyze only the observed data, but this can discard a large amount of useful data [1]. The core of these methods is therefore how to impute the missing data.

Some basic methods fill in values by smoothing, interpolation [68], or splines [69], but they cannot capture correlations between variables or complex patterns. Other methods estimate missing values by spectral analysis [70], kernel methods [71], or the expectation-maximization (EM) algorithm [72]. However, their simple reasoning and restrictive model assumptions make the imputation inaccurate. More recently, with the rapid development of deep learning, deep learning-based methods have achieved higher accuracy than traditional ones. Deep learning-based imputation methods are mainly built on RNNs and GANs.

A substantial literature uses RNNs to impute missing data in ISMTS. RNNs take sequence data as input. Recursion proceeds in the direction of sequence evolution, and all units are chained together. This structure lets them process sequence data by learning order dynamics. In an RNN, the current state $h_t$ is determined by the previous state $h_{t-1}$ and the current input $x_t$, i.e., $h_t = f(h_{t-1}, x_t)$.
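For concreteness, a common form of this recurrence is the Elman-style update $h_t = \tanh(W_h h_{t-1} + W_x x_t + b)$. The NumPy sketch below assumes this form with random weights. It is not the specific recurrence of any cited method:

```python
import numpy as np


def rnn_step(h_prev, x_t, W_h, W_x, b):
    """One Elman-style recurrent update: h_t = tanh(W_h h_{t-1} + W_x x_t + b)."""
    return np.tanh(W_h @ h_prev + W_x @ x_t + b)


rng = np.random.default_rng(0)
hidden, inputs = 4, 2
W_h = rng.normal(scale=0.5, size=(hidden, hidden))
W_x = rng.normal(scale=0.5, size=(hidden, inputs))
b = np.zeros(hidden)

h = np.zeros(hidden)
sequence = [np.array([1.0, 0.5]), np.array([0.2, -0.3]), np.array([0.0, 1.0])]
for x_t in sequence:  # the same cell is applied step by step along the sequence
    h = rnn_step(h, x_t, W_h, W_x, b)
print(h.shape)  # (4,)
```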

RNNs can be combined with basic methods such as EM [73] and linear models (LR) [74]. These methods first estimate the missing values and then use the reconstructed data streams as inputs to a standard RNN. However, EM imputes missing values using only the synchronous relationships across data streams (inter-series relations), not the temporal relationships within streams (intra-series relations). LR, in contrast, interpolates missing values using only the temporal relationships within each stream (intra-series relations) and ignores relationships across streams (inter-series relations). Meanwhile, most RNN-based imputation methods, such as the simple recurrent network (SRN) and LSTM, which Kim et al. [35] showed to be effective for imputing medical data, also learn incomplete relations because they consider only intra-series relations.

Chu et al. [49] noticed the difference between these two relations in ISMTS data and designed the multi-directional recurrent neural network (M-RNN) for both imputation and interpolation. M-RNN operates forward and backward along the intra-series direction in an interpolation block and across the inter-series direction in an imputation block. The interpolation is implemented with a Bi-RNN structure, denoted as the function Φ, and the imputation with fully connected layers, denoted as the function Ψ. The final objective function is the mean squared error between the real data and the estimated data.

Here x, m, and δ denote the data value, mask, and time interval defined in Definition 2; we do not repeat these definitions below. Bi-RNN is the bidirectional RNN [75], an RNN structure with forward and backward chains. It has two hidden states for each time point, one for each order, which are concatenated or summed into the final value at that time point. Unlike the basic Bi-RNN, M-RNN lags the timing of inputs into the hidden layers in the forward direction and advances it in the backward direction.

However, M-RNN drops the relations between missing variables and treats the estimated values as constants that cannot be sufficiently updated.

To solve this problem, Cao et al. [34] proposed bidirectional recurrent imputation for time series (BRITS), which predicts missing values with bidirectional recurrent dynamics. In this model, missing values are regarded as variables of the model graph and receive delayed gradients in both the forward and backward directions with consistency constraints, which makes the estimation of missing values more accurate. BRITS updates the predicted missing data with a combined objective L made of three errors: those of the history-based estimation $\hat{X}$, the feature-based estimation $\hat{Z}$, and the combined estimation $\hat{C}$. It therefore considers not only the relations between missing and known data but also the relations among missing data that M-RNN ignores.
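The combination step can be sketched as below. This is a simplified illustration under stated assumptions. The weight β is passed in rather than learned from the mask and time intervals, and each of the three terms uses a masked mean absolute error. The two estimates are also supplied as fixed arrays instead of being produced by recurrent and regression layers:

```python
import numpy as np


def masked_mae(x, est, m):
    """Mean absolute error over observed entries only (m == 1)."""
    return np.sum(np.abs(x - est) * m) / max(np.sum(m), 1.0)


def combine_and_score(x, m, x_hat, z_hat, beta):
    """Combine history- and feature-based estimates and sum three masked errors."""
    c_hat = beta * z_hat + (1.0 - beta) * x_hat  # combined estimate
    loss = masked_mae(x, x_hat, m) + masked_mae(x, z_hat, m) + masked_mae(x, c_hat, m)
    x_complete = m * x + (1.0 - m) * c_hat  # keep observed values, fill missing ones
    return x_complete, loss


x = np.array([1.0, 0.0, 3.0])      # 0.0 is a placeholder for the missing entry
m = np.array([1.0, 0.0, 1.0])
x_hat = np.array([1.5, 2.0, 2.5])  # history-based estimate (e.g. from an RNN)
z_hat = np.array([0.5, 2.4, 3.5])  # feature-based estimate (e.g. a regression)
x_complete, loss = combine_and_score(x, m, x_hat, z_hat, beta=0.5)
print(x_complete, loss)
```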

However, BRITS does not take both inter-series and intra-series relations into account, whereas M-RNN does.

GANs are deep learning models that train generative models through an adversarial process [76]. From a game-theoretic perspective, GAN training can be seen as a minimax two-player game [77] between a generator G and a discriminator D.
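For reference, the value function of the standard GAN, which GAIN later modifies, is

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]$$

where $p_{\text{data}}$ is the data distribution and $p_z$ is the noise prior fed to the generator.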

However, typical GANs require fully observed data during training. In response, Yoon et al. [78] proposed generative adversarial imputation nets (GAIN). Unlike the standard GAN, its generator receives both noise Z and the mask M as input, and this masking mechanism makes it possible to use data with missing values as input. GAIN's discriminator judges, component by component, which entries are real (observed) and which are fake (imputed). In addition, a hint mechanism H gives the discriminator additional information in the form of a hint vector. GAIN accordingly modifies the objective $\min_G \max_D V(D, G)$ of the basic GAN.
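The data preparation around GAIN's generator and hint can be sketched in NumPy. The noise range U(0, 0.01), the choice of hiding exactly one component in the hint, and the helper names are assumptions for illustration. The networks themselves are omitted, and `x_bar` stands in for a generator output:

```python
import numpy as np


def gain_inputs(x, m, rng):
    """Build a GAIN-style generator input and hint vector for one sample.

    x: data with arbitrary placeholders at missing entries; m: 0/1 mask."""
    z = rng.uniform(0.0, 0.01, size=x.shape)  # noise for the missing entries
    x_tilde = m * x + (1.0 - m) * z           # generator sees observed data + noise
    b = np.ones_like(m)
    b[rng.integers(len(m))] = 0.0             # hide one component from D
    h = b * m + 0.5 * (1.0 - b)               # hint: 0.5 means "not revealed"
    return x_tilde, h


def combine(x, m, x_bar):
    """Keep observed entries and take the generator output x_bar for missing ones."""
    return m * x + (1.0 - m) * x_bar


rng = np.random.default_rng(0)
x = np.array([0.3, 0.0, 0.8, 0.0])
m = np.array([1.0, 0.0, 1.0, 0.0])
x_tilde, h = gain_inputs(x, m, rng)
x_hat = combine(x, m, x_bar=np.array([0.1, 0.5, 0.9, 0.4]))  # stand-in generator output
print(h, x_hat)
```

The discriminator would receive `x_hat` and `h` and try to recover `m`. The hint tells it the true mask everywhere except at the hidden component.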

To improve GAIN, Camino et al. [79] used multiple inputs and multiple outputs for the generator and the discriminator, splitting the variables by means of dense layers connected in parallel for each variable.

Zhang et al. [80] designed Stackelberg GAN based on GAIN to impute missing medical data with improved computational efficiency. Stackelberg GAN can generate more diverse imputed values by using multiple generators instead of a single one and by applying an ensemble of the standard GAN losses of all generator-discriminator pairs.

The main goal of the above two-step methods is to estimate the missing values in the converted time series, that is, after irregularly sampled features have been converted into features with missing data. In medical settings, however, the ultimate goal is to carry out medical tasks such as mortality prediction [10, 11] and patient subtyping [17, 16, 18]. Two separate steps may lead to suboptimal analyses and predictions [81], because the missing patterns are not effectively exploited for the final tasks. Thus, some studies solve the downstream tasks directly rather than filling in missing values first.

End-to-end approaches perform the downstream tasks directly by modeling the time series with missing data. The core objective is prediction, classification, or clustering. In this type of method, data imputation is an auxiliary task or not a task at all.

Lipton et al. [13] demonstrated a simple strategy: using a basic RNN to cope with missing data in sequential inputs and taking the RNN output as the final features for prediction. Building on this idea, they addressed multilabel classification of diagnoses given clinical time series and found that RNNs can make remarkable use of binary indicators for missing data, significantly improving AUC and F1. Thus, in follow-up work [33], they went beyond heuristic imputation and modeled missingness directly as a first-class feature.
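A minimal sketch of the missingness-as-feature idea. It assumes zero imputation plus one binary indicator per variable, which is only one of several simple variants, and the helper name is hypothetical:

```python
import numpy as np


def with_missing_indicators(x):
    """Zero-fill NaNs and append a binary missingness indicator per variable.

    x: array of shape (time, variables) with NaN for missing entries.
    Returns an array of shape (time, 2 * variables) to feed an RNN."""
    m = np.isnan(x).astype(float)       # 1 where the value is missing
    filled = np.nan_to_num(x, nan=0.0)  # simple zero imputation
    return np.concatenate([filled, m], axis=1)


x = np.array([[80.0, np.nan],
              [82.0, 7.38],
              [np.nan, np.nan]])
features = with_missing_indicators(x)
print(features)
```

Because the indicators are separate input channels, the RNN can learn from when and how often a variable was measured, not only from its values.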

Similarly, Che et al. [1] also use RNNs to make medical predictions directly. To handle missing data, they designed a masking vector as the missingness indicator. In this approach, the value x, the time interval δ, and the mask m are used together to impute the missing data x*. Missing data are first replaced with mean values, and a feedback loop then updates the imputed values, which serve as input to a standard RNN for prediction.

They also proposed GRU-Decay (GRU-D) to model EHR data for medical prediction with trainable decays. The decay rate γ weighs the correlation between a missing value $x_t$ and the other available data (the last observed value and the empirical mean of the variable).
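The input-decay step of GRU-D can be sketched as γ = exp(−max(0, w·δ + b)), followed by x̂ = m·x + (1 − m)(γ·x_last + (1 − γ)·x_mean). The sketch assumes per-variable (diagonal) decay weights w and b that would normally be learned. Here they are fixed by hand to show the effect of the interval δ:

```python
import numpy as np


def grud_input(x, m, delta, x_last, x_mean, w, b):
    """GRU-D-style input decay for one time step (per-variable weights).

    gamma = exp(-max(0, w * delta + b)); missing entries are replaced by a mix of
    the last observation and the empirical mean, weighted by gamma."""
    gamma = np.exp(-np.maximum(0.0, w * delta + b))
    return m * x + (1.0 - m) * (gamma * x_last + (1.0 - gamma) * x_mean)


x = np.array([0.0, 0.0])         # current values (placeholders; both missing)
m = np.array([0.0, 0.0])
delta = np.array([0.5, 24.0])    # hours since each variable was last observed
x_last = np.array([120.0, 120.0])
x_mean = np.array([100.0, 100.0])
w, b = np.array([0.1, 0.1]), np.array([0.0, 0.0])  # assumed values; normally learned
x_hat = grud_input(x, m, delta, x_last, x_mean, w, b)
print(x_hat)
```

The two variables share the same last value and mean. The one observed 0.5 hours ago is imputed close to its last value (about 119.0), while the one last observed 24 hours ago decays toward the mean (about 101.8).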

In the same study, the authors plotted the Pearson correlation coefficients between variable missing rates and the labels on the MIMIC-III dataset. They observed that the missing rates are correlated with the labels, which demonstrates the usefulness of missingness patterns for prediction tasks.
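The analysis can be reproduced in spirit on synthetic data. The sketch below generates hypothetical masks in which one variable is measured more densely for positive labels and another is unrelated to the label. It then correlates each variable's missing rate with the label using `numpy.corrcoef`:

```python
import numpy as np


def per_variable_corr(masks, labels):
    """Pearson correlation between each variable's missing rate and the label.

    masks: (samples, variables, time) 0/1 array; labels: (samples,) array."""
    rates = 1.0 - masks.mean(axis=2)  # (samples, variables) missing rates
    return np.array([np.corrcoef(rates[:, d], labels)[0, 1]
                     for d in range(rates.shape[1])])


rng = np.random.default_rng(0)
labels = np.tile([0.0, 1.0], 100)                   # 200 synthetic stays
p_obs = np.stack([np.where(labels == 1, 0.8, 0.3),  # var 0: denser if label is 1
                  np.full(200, 0.5)], axis=1)       # var 1: unrelated to the label
masks = (rng.random((200, 2, 48)) < p_obs[:, :, None]).astype(float)
corr = per_variable_corr(masks, labels)
print(corr)
```

Variable 0 shows a strong negative correlation (denser measurement, lower missing rate for positive labels), while variable 1 stays near zero.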

However, the above models [1, 63, 34, 37, 33] are limited to local information about ISMTS (the empirical mean or the nearest observation). For example, GRU-D assumes that a missing variable can be represented as a combination of its last observed value and its mean value. Global structure and statistics are not directly considered. The local statistics become unreliable when consecutive observations are missing (as in Figure 4) or when the missing rate rises.

Tang et al. [32] recognized this problem and designed LGnet, which explores global and local dependencies simultaneously. LGnet uses GRU-D to model the local structure, capturing intra-series relations, and a memory module to model the global structure, learning inter-series relations. The memory module G has L rows and captures the global temporal dynamics for missing values through the variable correlations a. In addition, an adversarial training process enhances the modeling of the global temporal distribution.

Instead of pre-discretizing ISMTS and processing the resulting sequences with missing data, one can construct models that receive ISMTS directly as input. The intuition behind the raw data-based perspective comes from the characteristics of the raw data itself: the intra-series and inter-series relations. The intra-series relation of ISMTS is reflected in the irregular time intervals between neighboring observations within one series. The inter-series relation is reflected in the different sampling rates of different time series. Accordingly, the two subcategories are 1) irregular time interval-based approaches and 2) multi-sampling rate-based approaches.

In EHRs setting, the time lapse between successive elements in patient records can vary from days to months, which is the characteristic of irregular time intervals in ISMTS. A better way to handle it is to model the unequally spaced data using time information directly.

Basic RNNs process only uniformly spaced longitudinal data, assuming that the time differences within a sequence are equal. Applying traditional RNNs to irregularly spaced records may therefore lead to suboptimal performance.

Time-aware LSTM (T-LSTM) [17] applies a memory discount based on the elapsed time to capture irregular temporal dynamics, adjusting the previous cell memory $C_{t-1}$ of the basic LSTM to a new memory $C^*_{t-1}$.
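A NumPy sketch of the memory adjustment. It assumes the T-LSTM decomposition into a short-term part $C^S_{t-1} = \tanh(W_d C_{t-1} + b_d)$ and a long-term remainder $C_{t-1} - C^S_{t-1}$, with only the short-term part discounted by $g(\Delta t) = 1/\log(e + \Delta t)$. The LSTM gates that follow are omitted:

```python
import numpy as np


def discount_memory(c_prev, delta_t, W_d, b_d):
    """T-LSTM-style memory adjustment applied before the usual LSTM gates.

    Splits the previous cell memory into a short-term part (a tanh layer) and a
    long-term remainder, then discounts only the short-term part by
    g(delta_t) = 1 / log(e + delta_t)."""
    c_short = np.tanh(W_d @ c_prev + b_d)
    c_long = c_prev - c_short
    g = 1.0 / np.log(np.e + delta_t)
    return c_long + g * c_short


rng = np.random.default_rng(0)
c_prev = rng.normal(size=3)
W_d, b_d = rng.normal(size=(3, 3)), np.zeros(3)
for dt in (0.0, 24.0, 24.0 * 30):  # same memory, growing elapsed time (hours)
    print(dt, discount_memory(c_prev, dt, W_d, b_d))
```

With no elapsed time, g = 1 and the memory passes through unchanged. As the gap grows, the adjusted memory moves toward its long-term component.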

However, T-LSTM is a purely irregular time interval-based method only when the ISMTS is univariate. For multivariate ISMTS, it must first align the multiple time series and fill in missing data, so the missing data problem has to be solved again. The study did not specify its filling strategy and used simple imputation, such as mean values, during data preprocessing.

For multivariate ISMTS and the alignment problem, Tan et al. [36] proposed an end-to-end dual-attention time-aware gated recurrent unit (DATA-GRU) to predict patients' mortality risk. DATA-GRU uses a time-aware GRU structure, T-GRU, similar to T-LSTM. In addition, the authors propose a strategy for the multivariate data alignment problem. When aligning different time series into multiple dimensions, previous missing data approaches such as GRU-D [1] and LGnet [32] assign equal weights to observed and imputed data, ignoring that imputed values are less reliable than actual observations. DATA-GRU tackles this with a novel dual-attention structure that jointly considers data quality and medical knowledge. The structure consists of unreliability-aware attention $\alpha_u$ with reliability score c and symptom-aware attention $\alpha_s$.

Furthermore, the attention structure makes DATA-GRU interpretable through its embeddings, a property urgently needed in medical tasks.

Instead of using RNNs to learn the order dynamics in ISMTS, Bahadori et al. [37] proposed methods for analyzing multivariate clinical time series that are invariant to temporal clustering. Events in EHRs may appear together in a single admission or be dispersed over multiple admissions. For example, the authors postulated that whether a series of blood tests is completed at once or in rapid succession should not alter predictions. They therefore designed a data augmentation technique, temporal coarsening, that exploits temporal-clustering invariance to regularize deep neural networks optimized for clinical prediction tasks. Moreover, they proposed a multi-resolution ensemble (MRE) model that takes the coarsened inputs to improve predictive accuracy.
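One possible reading of temporal coarsening is sketched below. Event time stamps are snapped down to a coarser resolution so that events close in time share a time stamp, and each training example is coarsened at a randomly chosen resolution. The resolutions and helper names are assumptions, not the authors' settings:

```python
import random
from typing import List, Sequence, Tuple


def coarsen(events: List[Tuple[float, str]], resolution: float) -> List[Tuple[float, str]]:
    """Snap event times (hours) down to multiples of `resolution`, so events that
    happened close together collapse onto the same time stamp."""
    return sorted((resolution * (t // resolution), code) for t, code in events)


def augment(events: List[Tuple[float, str]], rng: random.Random,
            resolutions: Sequence[float] = (1.0, 6.0, 24.0)) -> List[Tuple[float, str]]:
    """Data augmentation: coarsen a training example at a randomly chosen resolution."""
    return coarsen(events, rng.choice(resolutions))


events = [(0.2, "glucose"), (0.9, "potassium"), (5.5, "sodium"), (30.0, "glucose")]
print(coarsen(events, 6.0))
```

At a 6-hour resolution, the three early tests collapse onto the same time stamp. An invariant model should make the same prediction for both versions of the record.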

Modeling only the irregular time intervals of intra-series relations ignores the multiple sampling rates of inter-series relations. Moreover, modeling inter-series relations is also a way of capturing the global structure of ISMTS.

The above RNN-based methods in the irregular time interval-based category consider only local order dynamics. Although LGnet [32] integrates global structures, it incorporates the information from all time points into an interpolation model, which is redundant and poorly adaptive. Some models can also learn the global structure of time series, such as the basic Kalman filter [82] and the deep learning-based deep Markov model [83]. However, these models mainly process time series that each have a stable sampling rate.

Che et al. [39] focused on modeling multi-rate multivariate time series and proposed a multi-rate hierarchical deep Markov model (MR-HDMM) for healthcare forecasting and interpolation tasks. MR-HDMM learns a generative model and an inference network with auxiliary connections and learnable switches. Its latent hierarchical structure is reflected in the states/switches s, and the joint probability p factorizes over the latent layers z. For the first time step, the factorization is:

$$p(x_1, z_1, s_1 \mid z_0) = p(x_1 \mid z_1)\, p(z_1, s_1 \mid z_0)$$

These structures capture the temporal dependencies and the data generation process. Similarly, Binkowski et al. [84] presented an autoregressive framework for regression tasks on ISMTS data. Its core idea is roughly similar to that of MR-HDMM.

However, these methods consider the different sampling rates between series but ignore the irregular time intervals within each series. They assume that each time series has a stable sampling rate (uniform time intervals). To obtain uniform intervals, they must use forward or linear interpolation, which again omits the global structure. Gaussian processes can be used to build global interpolation layers for multi-rate data, and Li et al. [85] and Futoma et al. [86] used this technique. For multivariate time series, however, designing covariance functions is challenging, and the computation is complicated and expensive.
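To make the cost concrete, here is a minimal GP regression sketch in NumPy that interpolates one irregular series onto a regular grid with a squared-exponential kernel. The kernel, length scale, noise level, and zero-mean, unit-variance prior are illustrative assumptions. The Cholesky factorization is the O(n³) step that grows expensive, and a multivariate version would also need cross-series covariance functions:

```python
import numpy as np


def rbf(a, b, length_scale):
    """Squared-exponential covariance (unit variance) between two time vectors."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_interpolate(t_obs, y_obs, t_grid, length_scale=2.0, noise=1e-2):
    """Posterior mean and variance of a zero-mean, unit-variance GP on t_grid."""
    K = rbf(t_obs, t_obs, length_scale) + noise * np.eye(len(t_obs))
    K_s = rbf(t_grid, t_obs, length_scale)
    L = np.linalg.cholesky(K)  # O(n^3) in the number of observations
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = 1.0 - np.sum(v ** 2, axis=0)  # prior variance is 1
    return mean, var


# Irregular blood-pressure-like readings (hours), standardized before fitting.
t_obs = np.array([0.0, 1.0, 3.0, 7.0])
y_raw = np.array([120.0, 118.0, 125.0, 130.0])
mu, sd = y_raw.mean(), y_raw.std()
t_grid = np.arange(0.0, 8.0, 1.0)
mean, var = gp_interpolate(t_obs, (y_raw - mu) / sd, t_grid)
print(np.round(mean * sd + mu, 1))
print(np.round(var, 3))
```

The posterior variance is small at observed times and grows in the gap between hours 3 and 7, which is the kind of uncertainty that a global interpolation layer can pass on to the predictor.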

Shukla and Marlin [38] designed a fully modular interpolation-prediction network (IPN). IPN has an interpolation network that accommodates the complexity of ISMTS data and provides multi-channel output by modeling three kinds of information: broad trends χ, transients τ, and local observation frequencies λ. These are computed with a low-pass interpolation θ, a high-pass interpolation γ, and an intensity function λ.
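A simplified sketch of the three channels follows. It uses squared-exponential weights $\exp(-\alpha (r - t_j)^2)$ at reference times r, with a small α for the low-pass trend and a larger α for the high-pass interpolant. Transients are taken as the high-pass output minus the trend, and the intensity as the sum of the high-pass weights. These choices are assumptions and may differ from IPN's exact parameterization; for example, IPN's cross-channel interpolation is omitted:

```python
import numpy as np


def rbf_channel(t_obs, x_obs, ref, alpha):
    """Kernel-weighted average of observations at reference times `ref`.

    Returns the interpolated values and the summed weights (an intensity)."""
    w = np.exp(-alpha * (ref[:, None] - t_obs[None, :]) ** 2)
    intensity = w.sum(axis=1)
    values = (w @ x_obs) / np.maximum(intensity, 1e-12)  # guard against empty regions
    return values, intensity


def ipn_style_channels(t_obs, x_obs, ref, alpha_low=0.1, alpha_high=2.0):
    """Stack trend, transients and intensity as a (3, len(ref)) array."""
    trend, _ = rbf_channel(t_obs, x_obs, ref, alpha_low)           # low-pass: wide kernel
    sharp, intensity = rbf_channel(t_obs, x_obs, ref, alpha_high)  # high-pass: narrow kernel
    return np.stack([trend, sharp - trend, intensity])


t_obs = np.array([0.0, 0.5, 1.0, 6.0])  # dense early, sparse later
x_obs = np.array([120.0, 118.0, 125.0, 130.0])
ref = np.arange(0.0, 8.0, 1.0)           # regular grid for the prediction network
channels = ipn_style_channels(t_obs, x_obs, ref)
print(channels.shape)  # (3, 8)
```

The stacked output is regularly spaced, so a standard prediction network can consume it. The intensity channel records how densely each region was observed.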

IPN also has a prediction network that operates on the regularly partitioned output of the interpolation module. Besides capturing data relationships from multiple perspectives, IPN makes up for the lack of modularity in [1] and avoids the complexity of the Gaussian process interpolation layers in [85, 86].

Modeling ISTS data aims to achieve two main tasks: 1) Missing data imputation and 2) Downstream tasks. The specific categories are shown in Figure 7 .

Missing data imputation is of practical significance: as machine learning research has grown more active, obtaining large amounts of complete data has become an important issue. However, it is almost impossible to obtain complete data in the real world, for reasons such as lost records. In many cases, time series with missing values are deemed useless and discarded, resulting in substantial data loss. Incomplete data also have adverse effects when learning a model [76].

Basic methods such as interpolation [68], kernel methods [71], and the EM algorithm [72, 73] were proposed long ago. With the popularity of deep learning in recent years, most new methods are implemented with artificial neural networks (ANNs). One of the most popular models is the RNN [64], which can capture long-term temporal dependencies and use them to estimate missing values. Existing works [67, 34, 35, 33, 87, 88] have designed several special RNN structures to handle missingness and have achieved good results. Another popular model is the GAN [89], which generates plausible fake data through adversarial training. GANs have been successfully applied to face completion and sentence generation [90, 88, 91, 92]. Building on these data generation abilities, some studies [56, 76, 78, 93] have applied GANs to time series data generation, incorporating sequence information into the process.

Downstream tasks generally include prediction, classification, and clustering. For ISMTS data, medical prediction (such as mortality prediction, disease classification and image classification) [10, 12, 13, 14, 11], concept representation [15, 16] and patient subtyping [17, 16, 18] are the three main tasks. Downstream task-oriented methods compute missing values and perform the downstream task simultaneously. This is expected to avoid suboptimal analyses and predictions that arise when imputation is separated from the final task and missing patterns are therefore not effectively exploited [81]. Most methods [1, 32, 17, 36, 38, 39, 37, 33] use deep learning to achieve higher accuracy on these tasks.
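One common way to perform imputation and the downstream task simultaneously is to optimize a single objective that sums a task loss and a reconstruction loss on observed entries. The NumPy function below is a generic, hypothetical illustration of such an objective. It uses binary cross-entropy for a mortality-style label plus a masked squared error, weighted by an assumed hyper-parameter `lam`. It is not the loss of any specific method cited above.

```python
import numpy as np


def joint_loss(y_true, y_prob, x_true, x_imputed, observed_mask, lam=1.0):
    """Hypothetical joint objective: task loss + lam * reconstruction loss.

    y_true, y_prob: binary labels and predicted probabilities, shape (B,).
    x_true, x_imputed, observed_mask: arrays of matching shape; the
    reconstruction term only uses entries that were actually observed.
    """
    eps = 1e-7
    p = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    task = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    sq_err = np.where(observed_mask, (x_true - x_imputed) ** 2, 0.0)
    recon = sq_err.sum() / max(int(observed_mask.sum()), 1)
    return task + lam * recon
```

Because both terms share one gradient signal, the imputed values are shaped by what helps the prediction. Two-step pipelines that impute first and predict afterwards lack this coupling.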

In this section, we apply the above methods to four datasets and two tasks, and analyze the methods through the experimental results.

Four datasets were used to evaluate the performance of the baselines. The CINC2012 dataset [10] consists of records from 12,000 ICU stays and includes 4,000 multivariate clinical time series. All patients were adults admitted for a wide variety of reasons to cardiac, medical, surgical, and trauma ICUs. Each record is a multivariate time series of roughly 48 hours and contains 41 variables, such as albumin, heart rate, and glucose.

The CINC2019 dataset [41] is publicly available and comes from two hospitals; it contains 30,336 patient admission records and 2,359 records of diagnosed sepsis cases. It is a set of multivariate time series with 40 related features: 8 vital signs, 26 laboratory values and 6 demographic variables. The time interval is 1 hour. The sequence length is between 8 and 336, and 29,414 records have lengths less than 60.

The COVID-19 dataset [59] was collected between 10 January and 18 February 2020 at Tongji Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China. It contains 375 patients with 6,120 blood sample records as the training set, 110 patients with 757 records as the test set, and 80 characteristics.

The experiments cover two tasks: 1) mortality prediction and 2) data imputation. The mortality prediction task uses the time series from the 48 hours before the onset time in the above four datasets. The imputation task uses 8 features (following the method in [39]), from which 10% of the observed measurements are eliminated; the eliminated values serve as the new ground truth.
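The hold-out protocol for the imputation task can be sketched as follows. This is a minimal NumPy illustration that rests on assumptions the text does not state: values are stored in a (time, feature) array with `np.nan` marking unmeasured entries, and the 10% are drawn uniformly at random from the observed entries. `hold_out_observed` is a hypothetical helper name.

```python
import numpy as np


def hold_out_observed(values, frac=0.1, seed=0):
    """Hide a random fraction of the *observed* entries of `values`.

    values: array with np.nan for entries that were never measured.
    Returns (corrupted copy, boolean mask of held-out entries).
    """
    rng = np.random.default_rng(seed)
    observed = np.argwhere(~np.isnan(values))  # indices of real measurements
    n_hold = int(round(frac * len(observed)))
    chosen = observed[rng.choice(len(observed), size=n_hold, replace=False)]
    held_out = np.zeros(values.shape, dtype=bool)
    held_out[tuple(chosen.T)] = True
    corrupted = values.copy()
    corrupted[held_out] = np.nan  # the original values here become the ground truth
    return corrupted, held_out
```

A model imputes `corrupted`, and its error is then measured only at positions where `held_out` is true. Those are the only missing entries whose true values are known.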

For RNN-based methods, we fix the dimension of the hidden state at 64. For GAN-based methods, the series inputs are also encoded with an RNN structure. For the final prediction, all methods use one 128-dimensional FC layer and one 64-dimensional FC layer. All methods use the Adam optimizer [94] with α = 0.001, β1 = 0.9 and β2 = 0.999. We use the learning rate decay $\alpha_{\text{current}} = \alpha_{\text{initial}} \cdot \gamma^{\text{global\_step}/\text{decay\_steps}}$ with decay rate γ = 0.98 and decay steps = 2000. Five-fold cross-validation is used for both tasks.

Method | AUC-ROC (mean ± std) on each of the four datasets
[33] | 0.809 ± 0.014 | 0.800 ± 0.016 | 0.825 ± 0.024 | 0.945 ± 0.004
LSTM [35] | 0.812 ± 0.009 | 0.805 ± 0.010 | 0.829 ± 0.019 | 0.945 ± 0.005
GRU-D [1] | 0.829 ± 0.003 | 0.818 ± 0.009 | 0.835 ± 0.013 | 0.965 ± 0.003
M-RNN [49] | 0.827 ± 0.005 | 0.820 ± 0.011 | 0.842 ± 0.010 | 0.959 ± 0.003
BRITS [34] | 0.833 ± 0.002 | 0.819 ± 0.012 | 0.839 ± 0.013 | 0.959 ± 0.002
T-LSTM [17] | 0.817 ± 0.004 | 0.804 ± 0.010 | 0.831 ± 0.014 | 0.963 ± 0.003
DATA-GRU [36] | 0.832 ± 0.006 | 0.822 ± 0.012 | 0.851 ± 0.012 | 0.961 ± 0.003
LGnet [32] | 0.833 ± 0.003 | 0.822 ± 0.013 | 0.843 ± 0.013 | 0.956 ± 0.002
IPN [38] | 0.831 ± 0.003 | 0.824 ± 0.009 | 0.844 ± 0.015 | 0.960 ± 0.003
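The learning-rate schedule used in the experimental setup can be written as a one-line function. This sketch assumes continuous (non-staircase) decay, because the text does not say whether the exponent is rounded down. `decayed_lr` is a hypothetical name.

```python
def decayed_lr(global_step, initial_lr=0.001, decay_rate=0.98, decay_steps=2000):
    """Exponential decay: initial_lr * decay_rate ** (global_step / decay_steps)."""
    # Continuous exponent; a staircase variant would use global_step // decay_steps.
    return initial_lr * decay_rate ** (global_step / decay_steps)
```

With these settings the rate shrinks by 2% every 2,000 steps. After 2,000 steps it is 0.00098, and it decays smoothly in between.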

The prediction results were evaluated by the area under the receiver operating characteristic curve (AUC-ROC). The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) as the decision threshold varies. TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives.

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{TN + FP} \tag{19}$$

We evaluate the imputation performance in terms of the mean squared error (MSE). For the ith item, $x_i$ is the real value and $\hat{x}_i$ is the predicted value; N is the number of missing values:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \hat{x}_i\right)^2$$

Table 1 shows the performance of the baselines on the mortality prediction task. Each of the two categories of technology-driven methods has its own merits, but irregularity-based methods work relatively well: missing data-based methods have 2/4 top-1 results and 2/5 top-2 results, while irregularity-based methods have 2/4 top-1 results and 3/5 top-2 results.
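The evaluation metrics can be computed directly. The NumPy sketch below is an illustration, not the evaluation code used for the experiments. `tpr_fpr` evaluates equation (19) at a single threshold. `auc_roc` uses the rank (Mann-Whitney) interpretation of the area under the ROC curve instead of integrating the curve. `masked_mse` averages squared errors over the held-out entries only. All function names are hypothetical.

```python
import numpy as np


def tpr_fpr(y_true, y_score, threshold):
    """Equation (19) at one decision threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_score) >= threshold
    tp = np.sum(y_pred & y_true)
    fn = np.sum(~y_pred & y_true)
    fp = np.sum(y_pred & ~y_true)
    tn = np.sum(~y_pred & ~y_true)
    return tp / (tp + fn), fp / (tn + fp)


def auc_roc(y_true, y_score):
    """Probability that a random positive outscores a random negative (ties count 1/2),
    which equals the area under the ROC curve."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def masked_mse(truth, imputed, held_out):
    """MSE over the N held-out entries only."""
    diff = truth[held_out] - imputed[held_out]
    return np.mean(diff ** 2)
```

The pairwise form of `auc_roc` uses O(P·N) memory. It suits small validation sets; for large ones, a sort-based implementation is preferable.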

Regarding whether the two kinds of series relations are considered, the methods that take both inter-series and intra-series relations (that is, both global and local structures) into account perform better; IPN, LGnet, and DATA-GRU achieve relatively good results. The methods also behave differently across datasets. For example, because COVID-19 is a small dataset, unlike the other three, relatively simple methods perform better on it; T-LSTM does well here but not on the other three datasets. Table 2 shows the imputation results of the baselines. Imputation is better on the Sepsis and COVID-19 datasets, perhaps because the time series in these two datasets come from patients suffering from the same disease. That is probably also why these two datasets yield relatively better results in the prediction task. Table 3 shows the performance of a basic RNN model on the mortality prediction task when it is trained on each baseline's imputed data. In contrast to Table 2, the RNN-based methods perform better here: they have 4/5 top-1 results, while GAN-based methods have 1/5. The reason may be that the RNN-based approaches integrate the downstream task when imputing, so the data they generate is better suited to the final prediction task.

Based on the analysis of technologies and the experimental results, in this section we discuss the ISMTS modeling task from three perspectives: 1) the imputation task versus the prediction task, 2) intra-series versus inter-series relations, or equivalently local versus global structure, and 3) missing data versus raw data. The conclusions on the approaches in this survey are summarized in Table 5.

Based on the above perspectives, we summarize the challenges as follows.

How to balance imputation with prediction? Different kinds of methods suit different tasks: GANs favor imputation, while RNNs favor prediction. However, in medical settings this conclusion does not always hold across datasets. For example, RNNs generate missing data better than GANs on the COVID-19 dataset, and two-step methods based on GANs are no worse for mortality prediction than using RNNs directly. Therefore, it seems difficult to obtain a general and effective modeling method in medical settings; the method should be chosen according to the specific task and the characteristics of the dataset.

How to handle the intra-series and inter-series relations of ISMTS? In other words, how to trade off local structure against global structure. In the ISMTS format, a patient has several time series of vital signs connected to disease diagnoses or the probability of death. When these time series are seen as one multivariate data sample, intra-series relations are reflected in longitudinal and horizontal dependencies. The longitudinal dependencies include the sequence order and context, time intervals, and decay dynamics. The horizontal dependencies are the relations between different dimensions. Inter-series relations are reflected in the patterns of the time series of different samples.

However, when these time series are seen as separate samples of one patient, the relations change. Intra-series relations become the dependencies between values observed at different time steps in a univariate ISMTS, so the characteristics of the different time intervals must be taken into account. Inter-series relations become the relations between the patterns of different patients' samples and between different time series of the same vital sign.

At the structural level, modeling intra-series relations is essentially local, while modeling inter-series relations is global. It is not yet clear which relations and which structures lead to better results. Modeling both local and global structures seems to perform better in mortality prediction, but the resulting methods are more complex and do not generalize across datasets.

How to choose the modeling perspective, missing data-based or irregularity-based? Both kinds of methods have advantages and disadvantages. Most existing works are missing data-based, and methods for estimating missing data have existed for a long time [95]. From the missing data-based perspective, the discretization interval length is a hyper-parameter that needs to be determined. If the interval is large, less data is missing, but several values may fall into the same interval; if the interval is small, more data becomes missing. Intervals with no values hamper performance, while intervals with too many values need an ad hoc method for choosing among them. Meanwhile, missing data-based methods have to interpolate new values, which may artificially introduce dependencies that do not occur naturally.

... | Low applicability for multivariate data; incomplete data relation.
Multi-sampling rates-based [38, 39, 110, 111] | No artificial dependency; no data imputation. | Implementation complexity; data generation pattern assumptions.
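The interval-length trade-off can be seen by binning one series at two resolutions. The NumPy sketch below is illustrative. Averaging values that fall into the same bin is only one of the ad hoc choices mentioned above, and `discretize` is a hypothetical helper.

```python
import numpy as np


def discretize(timestamps, values, interval, horizon):
    """Bin one irregularly sampled series into fixed-width intervals.

    Values sharing a bin are averaged (one ad hoc choice among many);
    empty bins become np.nan. Also reports the fraction of empty bins and
    how many values were merged away by collisions.
    """
    n_bins = int(np.ceil(horizon / interval))
    idx = np.minimum((np.asarray(timestamps) // interval).astype(int), n_bins - 1)
    sums = np.bincount(idx, weights=values, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        binned = sums / counts  # 0 / 0 -> nan marks a missing bin
    missing_rate = np.mean(counts == 0)
    collisions = int(np.sum(np.maximum(counts - 1, 0)))
    return binned, missing_rate, collisions
```

For observations at hours 0.2, 0.5, 1.1 and 5.9 over a 6-hour horizon, a 1-hour grid leaves half of the bins empty but merges only one value. A 3-hour grid has no empty bins but merges two values away.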

Over-imputation may cause an explosion in data size, and the pursuit of multivariate data alignment may lose the dependencies present in the raw data. Thus, irregularity-based methods, which learn directly from sparse and irregularly sampled multivariate time series without any imputation, are of particular interest.

However, although raw data-based methods have the merit of introducing no artificial dependencies, they often fail to achieve the desired results and require complex designs with large numbers of parameters. Irregular time intervals-based methods are not complex, as they can be implemented simply by injecting time-decay information. But on specific tasks such as mortality prediction, these methods seem not as good as expected (as the experiments section indicates). Meanwhile, for multivariate time series, these methods have to align values across dimensions, which leads to the missing data problem again. Multi-sampling rates-based methods do not cause missing data. However, processing multiple univariate time series at the same time requires more parameters and is not friendly to batch learning. Moreover, modeling each entire univariate series may require assumptions about the data generation model.
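Injecting time-decay information can be as simple as the input decay of GRU-D [1]. There, a missing entry decays from its last observed value toward the empirical mean as the time gap δ grows, with γ = exp(−max(0, wδ + b)). In GRU-D, w and b are learned. The NumPy sketch below fixes them (w = 0.5, b = 0) purely for illustration, and the function names are hypothetical.

```python
import numpy as np


def decay_factor(delta, w, b):
    # gamma = exp(-max(0, w * delta + b)): 1 for a zero gap, tends to 0 for long gaps
    return np.exp(-np.maximum(0.0, w * delta + b))


def decayed_input(x, mask, delta, x_last, x_mean, w=0.5, b=0.0):
    """Observed entries (mask == 1) pass through unchanged; a missing entry is
    replaced by a blend of its last observed value and the empirical mean,
    shifting toward the mean as the gap delta grows."""
    gamma = decay_factor(delta, w, b)
    return np.where(mask == 1, x, gamma * x_last + (1 - gamma) * x_mean)
```

This shows why such methods are lightweight: the decay adds only a few parameters per variable. Within each variable the values still live on a shared time grid, so aligning the dimensions reintroduces missing entries.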

Given complex patient states, the number of interventions, and real-time requirements, data-driven approaches that learn from EHRs are highly desirable for helping clinicians.

Although some difficulties remain unsolved, deep learning methods do show a better ability to model medical ISMTS data than basic methods. Basic methods cannot model ISMTS completely: interpolation-based methods [68, 69] exploit only the correlation within each series, imputation-based methods [72, 96] exploit only the correlation among different series, and matrix completion-based methods [97, 98] assume that the data is static and ignore its temporal component. Deep learning methods learn data structures through parameter training, and many basic methods can be integrated into the design of neural networks. The deep learning methods introduced in this survey largely address the problems of the basic methods and have achieved state-of-the-art performance in medical prediction tasks, including mortality prediction, disease prediction, and admission stay prediction. Therefore, deep learning models for ISMTS data have broad prospects in medical tasks.

The deep learning methods mentioned in this survey, both RNN-based and GAN-based, suffer from poor interpretability [99, 100], whereas clinical settings prefer interpretable models. Although this defect is difficult to overcome because of the models' characteristics, some researchers have made progress. For example, the attention-like structures used in [12, 14] can provide explanations for medical predictions.

This survey introduced a kind of data: irregularly sampled medical time series (ISMTS). In the context of medical settings, we described the characteristics of ISMTS. Then, we investigated the relevant methods for modeling ISMTS data and classified them from a technology-driven perspective and a task-driven perspective. For each category, we described the subcategories in detail and presented the implementation of each specific model. Based on imputation and prediction experiments, we analyzed the advantages and disadvantages of several methods and drew conclusions. Finally, we summarized the challenges and opportunities of the ISMTS modeling task.

Recurrent neural networks for multivariate time series with missing values

Convolutional LSTM network: A machine learning approach for precipitation nowcasting

Restful: Resolution-aware forecasting of behavioral time series data

Tensorized lstm with adaptive shared memory for learning trends in multivariate time series

Clustering and classification for time series data in visual analytics: A survey

Time2graph: Revisiting time series modeling with dynamic shapelets

Adversarial unsupervised representation learning for activity time-series

Revisiting spatial-temporal similarity: A deep learning framework for traffic prediction

Deep EHR: A survey of recent advances in deep learning techniques for electronic health record (EHR) analysis

Predicting in-hospital mortality of icu patients: The physionet/computing in cardiology challenge 2012

HOLMES: health online model ensemble serving for deep learning models in intensive care units

Dipole: Diagnosis prediction in healthcare via attention-based bidirectional recurrent neural networks

Learning to diagnose with LSTM recurrent neural networks

RETAIN: an interpretable predictive model for healthcare using reverse time attention mechanism

Multi-layer representation learning for medical concepts

Mime: Multilevel medical embedding of electronic health records for predictive healthcare

Patient subtyping via time-aware lstm networks

Deep computational phenotyping

A survey of methodologies for the treatment of missing values within datasets: limitations and benefits

Singular value decomposition and least squares solutions

An efficient nearest neighbor classifier algorithm based on pre-classify. Computer Science

Simple linear regression in medical research

Predicting disease risks from highly imbalanced data using random forest

A modified svm classifier based on rs in medical disease prediction

Alzheimer's Disease Neuroimaging Initiative. RNN-based longitudinal analysis for diagnosis of alzheimer's disease

Estimating brain connectivity with varying-length time lags using a recurrent neural network

On clinical event prediction in patient treatment trajectory using longitudinal electronic health records

Bidirectional recurrent auto-encoder for photoplethysmogram denoising

A deep learning method based on hybrid auto-encoder model

Research and application progress of generative adversarial networks

An accurate saliency prediction method based on generative adversarial networks

Joint modeling of local and global temporal dynamics for multivariate time series forecasting with missing values

Directly modeling missing data in sequences with rnns: Improved classification of clinical time series

BRITS: bidirectional recurrent imputation for time series

Recurrent neural networks with missing information imputation for medical examination data prediction

DATA-GRU: dual-attention time-aware gated recurrent unit for irregular multivariate time series

Temporal-clustering invariance in irregular healthcare time series. CoRR, abs

Interpolation-prediction networks for irregularly sampled time series

Hierarchical deep generative models for multi-rate multivariate time series

MIMIC-III, a freely accessible critical care database. Scientific Data

Early prediction of sepsis from clinical data: The physionet/computing in cardiology challenge

An intelligent warning model for early prediction of cardiac arrest in sepsis patients

K-margin-based residual-convolution-recurrent neural network for atrial fibrillation detection

Opportunities and challenges of deep learning methods for electrocardiogram data: A systematic review

Risk prediction for chronic kidney disease progression using heterogeneous electronic health record data and time series analysis

Learning from irregularly-sampled time series: A missing data perspective. CoRR, abs

Time series analysis : forecasting and control

Forecasting in multivariate irregularly sampled time series with missing values. CoRR, abs

Estimating missing data in temporal data streams using multi-directional recurrent neural networks

Long short-term memory

Empirical evaluation of gated recurrent neural networks on sequence modeling

Temporal belief memory: Imputing missing data during RNN training

Survey of clinical data mining applications on big data in health informatics

Analysis of incomplete and inconsistent clinical survey data

Modeling irregularly sampled clinical time series

Multivariate time series imputation with generative adversarial networks

Physiobank, physiotoolkit, and physionet: Components of a new research resource for complex physiologic signals

Early prediction of sepsis from clinical data -the physionet computing in cardiology challenge

An interpretable mortality prediction model for covid-19 patients

UA-CRNN: uncertainty-aware convolutional recurrent neural network for mortality risk prediction

A hybrid residual network and long short-term memory method for peptic ulcer bleeding mortality prediction

RAIM: recurrent attentive and intensive model of multimodal patient monitoring data

Linear regression with censored data

A learning algorithm for continually running fully recurrent neural networks

Arterial blood pressure during early sepsis and outcome

Hospital deaths in patients with sepsis from 2 independent cohorts

Data cleaning: Overview and emerging challenges

The effects of the irregular sample and missing data in time series analysis

Wavelet methods for time series analysis. (book reviews)

Comparison of correlation analysis techniques for irregularly sampled time series

Multiple imputation using chained equations. issues and guidance for practice

Pattern classification with missing data: a review. Neural Computing and Applications

A solution for missing data in recurrent neural networks with an application to blood glucose prediction

Speech recognition with missing data using recurrent neural nets

Framewise phoneme classification with bidirectional lstm and other neural network architectures

A survey of missing data imputation using generative adversarial networks

Stable and improved generative adversarial nets (GANS): A constructive survey

GAIN: missing data imputation using generative adversarial nets

Improving missing data imputation with deep generative models. CoRR, abs

Medical missing data imputation by stackelberg gan

Strategies for handling missing data in electronic health record derived data

Kalman Filtering and Neural Networks

Hidden Markov and other models for discrete-valued time series

Autoregressive convolutional neural networks for asynchronous time series

A scalable end-to-end gaussian process adapter for irregularly sampled time series classification

Learning to detect sepsis with a multitask gaussian process RNN classifier

Doctor AI: predicting clinical events via recurrent neural networks

Generative face completion

Generative adversarial nets

Ambientgan: Generative models from lossy measurements

Approximation and convergence properties of generative adversarial learning

Seqgan: Sequence generative adversarial nets with policy gradient

Learning from incomplete data with generative adversarial networks

Adam: A method for stochastic optimization

A study of handling missing data methods for big data

Multiple imputation for nonresponse in surveys

Spectral regularization algorithms for learning large incomplete matrices

Temporal regularized matrix factorization for high-dimensional time series prediction

Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. online

Interpretability of machine learning-based prediction models in healthcare

Iterative robust semi-supervised missing data imputation

Medical time-series data generation using generative adversarial networks

Unsupervised online anomaly detection on irregularly sampled or missing valued time-series data using LSTM networks. CoRR, abs

Kernels for time series with irregularly-spaced multivariate observations. CoRR, abs

TimeAutoML: Autonomous representation learning for multivariate irregularly sampled time series

A distributed descriptor characterizing structural irregularity of EEG time series for epileptic seizure detection

A bio-statistical mining approach for classifying multivariate clinical time series data observed at irregular intervals

Automatic classification of irregularly sampled time series with unequal lengths: A case study on estimated glomerular filtration rate

Mcpl-based FT-LSTM: medical representation learning-based clinical prediction model for time series events

A comparison between discrete and continuous time bayesian networks in learning from clinical time series data with irregularity

Multi-resolution networks for flexible irregular time series modeling (multi-fit)