key: cord-0213648-azd22dw8 authors: Gupta, Ashish; Gupta, Hari Prabhat; Biswas, Bhaskar; Dutta, Tanima title: Approaches and Applications of Early Classification of Time Series: A Review date: 2020-05-06 journal: nan DOI: nan sha: 8a09ba5f179fbaa6b46391884a7fe2572add55c7 doc_id: 213648 cord_uid: azd22dw8

Early classification of time series has been extensively studied for minimizing class prediction delay in time-sensitive applications such as healthcare and finance. A primary task of an early classification approach is to classify an incomplete time series as soon as possible with some desired level of accuracy. Recent years have witnessed several approaches for early classification of time series. As most of these approaches address different aspects of the early classification problem, a thorough review of the existing solutions is needed to establish the current status of the area. These solutions have demonstrated reasonable performance in a wide range of applications including human activity recognition, gene expression based health diagnostics, and industrial monitoring. In this paper, we present a systematic review of current literature on early classification approaches for both univariate and multivariate time series. We divide the existing approaches into four exclusive categories based on their proposed solution strategies: prefix based, shapelet based, model based, and miscellaneous approaches. We also discuss the applications of early classification in many areas including industrial monitoring, intelligent transportation, and medicine. Finally, we provide a quick summary of the current literature with future research directions.
Due to the advancement of energy-efficient, small, and low-cost embedded devices, time series data has received unprecedented attention in several fields of research, such as healthcare [1]-[3], finance [4], [5], speech and activity recognition [6]-[8], and others [9]-[11]. There exists an inherent temporal dependency among the attributes (or data points) of a time series, which allows researchers to analyze the behavior of a process over time. In addition, a time series naturally lends itself to visualizing the structure (or shape) of the data [12]. Building on these properties, numerous data mining algorithms have been developed to study various aspects of time series such as indexing [13], forecasting [14], [15], clustering [16], and classification [17]. Indexing algorithms focus on speeding up the search for a query time series in a large dataset. Forecasting algorithms attempt to predict future data points of a time series [14]. Clustering algorithms aim to partition unlabeled time series instances into a suitable number of groups based on their similarities [16]. Finally, classification algorithms attempt to predict the class label of an unlabeled time series by learning a mapping between training instances and their labels [17], [18].

Time Series Classification (TSC) has remained a topic of great interest since the availability of labeled dataset repositories such as UCR [19] and UCI [20]. As a consequence, a large number of TSC algorithms have emerged, introducing efficient and cutting-edge strategies for distinguishing classes. The authors in [18], [21]-[23] focused on instance-based learning, where a similarity score is computed between a testing time series and each training instance, and the class label of the training instance with maximum similarity is assigned to the testing time series.
Dynamic Time Warping (DTW) [24] and its variations [18], [23], combined with 1-Nearest Neighbor (1-NN), have been extensively used as similarity measures in instance-based TSC algorithms. Another family of TSC algorithms [25]-[28] focused on finding the most discriminatory subsequences (called shapelets) of time series. A class label is identified by the presence of one or more of its shapelets in the testing instance. Next, dictionary based algorithms [29], [30] and ensemble approaches [31], [32] have also demonstrated significant progress in time series classification. Finally, we also found some deep learning based TSC algorithms, which are summarized in [33].

The main objective of TSC algorithms is to optimize the accuracy of classification by using the complete time series. However, in time-sensitive applications such as gas leakage detection [34], earthquake [35], and electricity demand prediction [36], it is desirable to classify a time series as early as possible. A classification approach that aims to classify an incomplete time series is referred to as early classification [37]-[39]. Xing et al. [37] stated that earliness can only be achieved at the cost of accuracy. This indicates that the main challenge for early classification approaches is to balance two conflicting objectives, i.e., accuracy and earliness. One of the first known approaches for early classification of time series was proposed in [40]; thereafter, several researchers have put their efforts in this direction and published a wide range of research articles at renowned venues. After an exhaustive search, we found a minor survey on early classification approaches in [41], which included only a handful of approaches and did not provide any categorization.

A Multivariate Time Series (MTS) mainly consists of multiple correlated time series collected for a single event over a specified duration. Some of the important application scenarios are illustrated in Fig.
1 and are discussed below:

1) Human activity: Early classification of human activities refers to the identification of an ongoing activity before its complete execution [8], [42]-[45]. Such early classification helps to minimize the response time of the system and in turn improves the user experience [8]. The researchers in [8], [42], [43] utilized MTS to classify various human activities such as walking, running, sitting, climbing stairs, and eating.

2) Gene expression: It corresponds to an MTS that contains crucial information about the biological condition of humans. Gene expression data has been used to study viral infection in patients, drug response to a disease, and patient recovery from a disease [46]-[48]. Early classification of gene expression time series significantly lowers the consequences of disease.

3) Electrocardiogram (ECG): It is a time series of electrical signals generated by the activity of the heart. An ECG time series is usually recorded by placing multiple electrodes on the chest of the patient. Early classification of ECG [3], [49], [50] helps to diagnose an abnormal heartbeat at the earliest, which reduces the risk of heart failure.

4) Industrial monitoring: With the advancements in sensor technology, monitoring industrial processes with sensors has become convenient and effortless. The sensors generate time series that must be classified to determine the status of the operation. In chemical industries, even a minor chemical leak can have hazardous effects on the health of crew members [34]. Early classification not only reduces health risks but also minimizes maintenance cost by ensuring smooth operation at all times. Some notable industrial applications of early classification are gas leakage detection [34], fault mode identification [51], wafer quality [3], [50], and hydraulic system monitoring [52].
5) Intelligent transportation: As modern vehicles are equipped with several sensors, it becomes easy to monitor the behavior of the driver, the road surface condition, the inside-outside environment, etc. by using the generated sensory data. An early classification algorithm is presented in [10] to classify the type of road surface by using sensors such as accelerometer, light, and temperature sensors. Such early classification of road surface helps to choose an alternative path if the surface condition is poor, i.e., bumpy or rough.

In the absence of a thorough review of early classification approaches, a researcher needs an enormous amount of effort to identify a potential research gap for future work. We therefore carry out a comprehensive survey of current literature on early classification approaches and propose a useful categorization for better understanding of the status of the area. This paper presents a systematic review of early classification approaches for both univariate and multivariate time series data. We categorize the various approaches into four broad groups based on the type of strategy they follow for early classification. Fig. 2 illustrates the categorization hierarchy with groups and their subgroups.

The next section discusses the fundamentals of early classification approaches and their categorization into four broad groups as shown in Fig. 2. The first group (i.e., prefix based) includes those approaches that utilize the prefixes of time series for achieving earliness. Section III discusses the prefix based approaches in detail. The second group (i.e., shapelet based) of approaches uses key shapelets (subsequences of time series) for reliable prediction of the class label of an incomplete time series. Shapelet based early classification approaches are reviewed in Section IV. Another set of approaches is included in the third group (i.e., model based) and discussed in Section V.
The model based approaches develop a mathematical model for optimizing the balance between earliness and the desired level of accuracy. Next, the fourth group includes the miscellaneous approaches that do not meet the inclusion criteria of the aforementioned categories. These approaches are discussed in Section VI. Finally, Section VII summarizes the review along with some promising research directions for further work. We also provide a nomenclature table for quick reference of the abbreviations and notations used in this paper.

In this section, we first discuss the fundamentals of time series that are prerequisite for a sound understanding of the various early classification approaches. Later, we explain the categories into which the approaches are grouped.

This subsection defines the notations and terminologies used in this paper.

1) Time series: It is defined as a sequence of T ordered observations typically taken at equally spaced time intervals [53], where T denotes the length of the complete time series. A time series is denoted as $X^d = \{X_1, X_2, \cdots, X_T\}$, where d is the dimension and $X_i \in \mathbb{R}^d$ for $1 \le i \le T$. If d = 1, the time series is referred to as univariate, otherwise as multivariate. If a time series is a dimension (or part) of an MTS, it can be referred to as a component [8], [10]. In general, a time series is univariate unless it is explicitly mentioned as multivariate.

2) Time series classification: It refers to the prediction of the class label of a time series by constructing a classifier using a labeled training dataset [17]. Let D be a training dataset that consists of N instances, given as N pairs of time series X and their class labels y. The time series classifier learns a mapping function $H : X \rightarrow y$. The classifier H can predict the class label of a testing time series $X' \notin D$ only if it is complete, i.e., the length of $X'$ should be the same as the length of the training instances [54].
3) Early classification of time series: According to [55], early classification is an extension of time series classification with the ability to classify an unlabeled incomplete time series. In other words, an early classifier $H_t$ is able to classify a testing time series with only t data points, where $t \le T$. Early classification is desirable in applications where data collection is costly or a late prediction has hazardous consequences [10], [56]. Intuitively, an early classifier can make a more informed decision about the class label if more data points are available in the testing time series [57], but this delays the decision. Therefore, researchers have focused on optimizing the accuracy of prediction with minimum delay (or maximum earliness). Further, early classification of time series is analogous to a case of missing features, with the constraint that the features are missing only because the data points are not yet available [57]. Such unavailability of data points makes the time series incomplete, and one has to wait for more data points to make it complete. In the context of early classification, a test time series can be referred to as an incomplete or incoming time series. Fig. 3 illustrates an early classification framework for predicting the class label of an incoming time series $X'$.

4) Earliness: It is an important measure to evaluate the effectiveness of early classification approaches. Let t be the number of data points of the testing time series that are used by the early classifier. The authors in [8] defined the earliness as $E = \frac{T - t}{T} \times 100$, where T is the length of the complete time series. Earliness is also called timeliness [58].

5) Prefix: In [38], the prefix of a time series X is defined as the subsequence $X[1, t] = X[1], X[2], \cdots, X[t]$, where t denotes the length of the prefix. Moreover, the training dataset is said to be in prefix space if it contains only the prefix of each time series with its associated class label.
6) Shapelet: It is defined as a quadruple $S = (s, l, \delta, y)$, where s is any subsequence of a time series of length l, $\delta$ is the distance threshold, and y is the associated class label [46], [59]. The distance threshold $\delta$ is learned using the training instances and is used to decide whether the shapelet matches any subsequence of the testing time series.

7) Interpretability: It mainly refers to how convincing the classification results are to the domain experts. In healthcare applications, the adaptability of any early classification approach heavily relies on its interpretability [39]. The authors in [3], [39], [48], [59] assert that a short segment of a time series is more convincing and helpful than the time series itself if that segment contains class discriminatory patterns.

8) Reliability: It expresses the guarantee that the probability of the early predicted class label of an incomplete time series meets a user-specified threshold [57], [58]. Reliability is a crucial parameter to ensure a minimum required accuracy in early classification. It is also termed uncertainty estimate or confidence measure in different studies [56], [59], [60].

The literature contains a large number of early classification approaches for time series data. These approaches address problems from a wide range of research areas including healthcare [46]-[48], [61]-[63], human activity recognition [8], [43], [44], and industry [34], [51], [64]. After a comprehensive survey, we found that Univariate Time Series (UTS) has attracted more researchers than MTS. This is due to the following reasons: i) an MTS has complicated relationships between its dimensions (time series), ii) an MTS may have redundancy in its dimensions, which could misguide the classifier, and iii) a classifier finds it challenging to handle MTS data due to the curse of dimensionality.
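As a small illustration of the earliness and prefix definitions given in this subsection, the following Python sketch computes both quantities. The function names `earliness` and `prefix` are hypothetical and chosen only for this example; the sketch assumes a univariate series stored as a plain list.

```python
def earliness(t, T):
    """Earliness E = (T - t) / T * 100 for a decision made after t of T points."""
    if not 1 <= t <= T:
        raise ValueError("t must satisfy 1 <= t <= T")
    return (T - t) / T * 100.0


def prefix(series, t):
    """Prefix X[1, t]: the first t data points of the series."""
    if not 1 <= t <= len(series):
        raise ValueError("t must satisfy 1 <= t <= len(series)")
    return series[:t]


# A decision taken after 25 of 100 data points leaves 75% of the series unseen.
e = earliness(25, 100)             # 75.0
p = prefix([3, 1, 4, 1, 5], 2)     # [3, 1]
```

A classifier that waits for the complete series (t = T) has an earliness of 0, which is the case that ordinary time series classification corresponds to.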
This work categorizes the various early classification approaches into meaningful groups, for better understanding of their differences and similarities. We believe that one of the most meaningful ways to categorize these approaches is by the strategy they adopt to achieve earliness. We broadly categorize the early classification approaches for time series (both univariate and multivariate) into four major groups as shown in Fig. 2, and the papers included in the different groups are given in Table I.

1) Prefix based early classification: The strategy is to learn a minimum prefix length of time series using the training instances and then classify a testing time series using its prefix of the learned length. During training, a set of T classifiers (one for each prefix space) is constructed and then checked for the stability of the relationship between the results in prefix space and in full-length space. The classifier that achieves a desired level of stability with the minimum prefix length is considered the early classifier, and the corresponding prefix length is called the Minimum Prediction Length (MPL) [38], [54], [65] or Minimum Required Length (MRL) [8], [10], [66]. This early classifier is able to classify an ongoing time series as soon as MPL data points are available.

2) Shapelet based early classification: A family of early classification approaches [3], [39], [46]-[48], [50], [59], [62], [63], [67], [68] focused on obtaining a set of key shapelets from the training dataset and utilized them as class discriminatory features of time series. As there exists a huge number of shapelets in the training dataset, the different approaches attempted to select only those shapelets that can provide maximum earliness and can uniquely manifest the class label. These selected shapelets are matched with the ongoing testing time series, and the class label of the best matched shapelet is assigned to the time series.
3) Model based early classification: Another set of early classification approaches [34], [42], [44], [49], [55], [58], [69] proposed mathematical models based on conditional probabilities. The approaches obtain these conditional probabilities by either fitting a discriminative classifier or using generative classifiers on the training data. A decision or stopping rule is designed to ensure the reliability of the early prediction of the class label. Some of these early classification approaches have also developed a cost based trigger function for making a reliable prediction.

4) Miscellaneous approaches: The early classification approaches that do not fit into any of the above mentioned categories are included here. Some of these approaches employed deep learning techniques [70], [71], reinforcement learning [72], and so on [40], [51].

Table I: Papers included in the different groups of early classification approaches.
Prefix based early classification
  MPL computation using RNN: [38], [54], [65]
  MPL computation using posterior probabilities: [8], [10], [45], [52], [60], [66]
Early classification using shapelets
  Key shapelets selection using utility measure: [39], [46], [47], [59], [62], [68]
  Key shapelets selection using clustering: [3], [48], [50], [63], [67], [73]
Model based early classification
  Using discriminative classifier: [34], [49], [51], [74]-[77]
  Using generative classifier: [42]-[44], [56]-[58], [69]
Miscellaneous approaches
  With tradeoff: [71], [72]
  Without tradeoff: [40], [51], [70]

C. Statistical evaluation of early classifier

One of the most useful techniques for statistical evaluation of an early classifier is proposed in [78]. As early classifiers address two conflicting objectives (i.e., earliness and accuracy) together, comparing the statistical significance of one early classifier against another becomes more challenging. In [78], the authors therefore employed two well known statistical methods, the Wilcoxon signed-rank test [79] and the Pareto optimum [80], for evaluating early classifiers on many UCR datasets [19].
The evaluation technique uses the Wilcoxon signed-rank test for independent comparison, where it compares two early classifiers on each objective separately on the same dataset. Further, it uses the Pareto optimum, under which an early classifier is said to be statistically better than another if it is superior on one objective without degrading on the other.

This section discusses the prefix based early classification approaches in detail. The key idea is to learn a stable prefix length of each time series during training and then utilize these lengths for classifying an incomplete time series during testing. One of the first notable prefix based early classification approaches was proposed in [37]. The authors in [37] introduced two methods, Sequential Classification Rule (SCR) and Generalized Sequential Decision Tree (GSDT), for early classification of symbolic sequences. For a given training dataset, the SCR method first extracts a large number of sequential rules from prefix spaces of different lengths and then selects the top-k rules based on their support and prediction accuracy. These selected rules are used as the early classifier. The GSDT method also extracts sequential rules, but it aims to find rules with smaller length and higher earliness.

Next, we split the prefix based approaches into two groups according to their MPL computation methods. In the first group, the approaches [38], [54], [65] developed the concept of Reverse Nearest Neighbor (RNN) to compute the MPL of time series. In the second group of approaches [8], [10], [60], [66], the authors employed a probabilistic classifier to first obtain posterior class probabilities and then utilized these probabilities for MPL computation.

We first discuss the concept of RNN for time series data and then describe the approaches that have used RNN for MPL computation. Let D be a labeled time series dataset with N instances of length T.
According to [38], the RNN of any time series $X \in D$ is the set of time series in D that have X among their nearest neighbors. It is mathematically given as

$$RNN^t(X) = \{X' \in D \mid X \in NN^t(X')\},$$

where t denotes the length of the time series in prefix space and t = T for dataset D in full-length space. Fig. 4 illustrates an example of RNN with a dataset of six time series $X_1, X_2, \cdots, X_6$. An arrow from $X_i$ to $X_j$ represents that $X_j$ is the nearest time series of $X_i$ based on the given distance measure. It is easy to see that a time series can also have an empty RNN. To compute the MPL of any time series X, the authors in [38] compute $RNN^t(X)$ in the prefix spaces of all lengths. The MPL of X is set to t if the following conditions are satisfied:

$$RNN^{t'}(X) = RNN^T(X) \neq \emptyset \quad \text{for all } t \le t' \le T, \qquad RNN^{t-1}(X) \neq RNN^T(X).$$

Here, the first condition checks the stability of the RNN using prefixes of X with length t or more, and the second ensures that t is the smallest such length.

Xing et al. [38] developed two different algorithms, Early 1-NN and Early Classification of Time Series (ECTS), for UTS data. The Early 1-NN algorithm computes the MPL of each time series of the training dataset using 1-NN. These computed MPLs are first arranged in ascending order and then used for early classification of an incoming testing time series $X'$. Let m be the least value of the computed MPLs. As soon as the number of data points in $X'$ becomes equal to m, Early 1-NN starts classification of $X'$ using the prefix space of length m. It first computes the 1-NN of $X'$ with m data points as follows

$$NN^m(X') = \arg\min_{X \in D_{mpl}} dist(X'[1, m], X[1, m]),$$

where $D_{mpl}$ is the set of those time series of the training dataset D whose MPL is at most m, and the $dist(\cdot)$ function computes the Euclidean Distance (ED) between two time series. If $NN^m(X')$ consists of more than one time series of the training dataset, then the most dominant class label is assigned to $X'$. If $NN^m(X')$ is empty, then the classifier waits for more data points in $X'$ and repeats the above process. Early 1-NN has two major drawbacks: i) each time series can have a different MPL and ii) the computed MPLs are short and not robust enough due to the overfitting problem of 1-NN.
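The RNN-based MPL computation and the Early 1-NN prediction loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of [38]: the function names are hypothetical, ties between equally near neighbors are all kept, and a series whose full-length RNN is empty is assumed to receive an MPL of T.

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(data, i, t):
    """Indices of the nearest neighbors of series i in prefix space t (ties kept)."""
    dists = {j: euclid(data[i][:t], data[j][:t])
             for j in range(len(data)) if j != i}
    best = min(dists.values())
    return {j for j, d in dists.items() if d == best}


def rnn(data, i, t):
    """Reverse nearest neighbors: series that have series i among their NNs."""
    return {j for j in range(len(data))
            if j != i and i in nearest_neighbors(data, j, t)}


def mpl(data, i):
    """Smallest prefix length from which the RNN of series i stays equal to
    its full-length RNN."""
    T = len(data[i])
    full = rnn(data, i, T)
    if not full:                 # assumption: empty full-length RNN gives MPL = T
        return T
    k = T
    for t in range(T - 1, 0, -1):
        if rnn(data, i, t) != full:
            break
        k = t
    return k


def early_1nn(data, labels, mpls, stream):
    """Return (label, t) at the first prefix length t for which some training
    series has MPL <= t; wait (continue) while no series is eligible."""
    for t in range(1, len(stream) + 1):
        eligible = [j for j in range(len(data)) if mpls[j] <= t]
        if not eligible:
            continue
        dists = {j: euclid(stream[:t], data[j][:t]) for j in eligible}
        best = min(dists.values())
        votes = [labels[j] for j, d in dists.items() if d == best]
        return max(set(votes), key=votes.count), t   # majority label among ties
    return None, len(stream)


train = [[1, 1, 1], [1, 1, 2], [1, 5, 5], [1, 5, 6]]
labels = ["a", "a", "b", "b"]
mpls = [mpl(train, i) for i in range(len(train))]     # [2, 2, 2, 2]
result = early_1nn(train, labels, mpls, [1, 5, 6])    # ('b', 2)
```

In this toy dataset all series share the same first data point, so the RNN only becomes stable from the second point onward, and the test series is classified after two of its three data points.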
To overcome these drawbacks, the ECTS algorithm [38] first clusters the time series based on their similarities in full-length space. It employed agglomerative hierarchical clustering [81] with single linkage. The agglomerative clustering is parameterized by a minimum support threshold to avoid the overfitting issue. Later, ECTS computes only one MPL for each cluster, to obtain a more generalized set of MPLs for reliable early classification of an incomplete time series. In [54], the authors presented an extension of ECTS, called Relaxed ECTS, to find shorter MPLs. Relaxed ECTS relaxes the stability condition of RNN while computing MPLs for the clusters. To compute the MPL of any cluster, Relaxed ECTS requires only a subset of its time series to have a stable RNN instead of all of them. It also speeds up the learning process.

In [65], the authors proposed an MTS Early Classification based on PAA (MTSECP) approach, where PAA stands for the Piecewise Aggregate Approximation method [82]. MTSECP first applies a center sequence method [83] to transform each MTS instance of the dataset into a UTS and then reduces the length of the transformed UTS by using the PAA method. Let $X^d$ denote an MTS with d components and X denote its corresponding transformed UTS. The j-th data point of X, where $1 \le j \le T$, is obtained using the center sequence method. Next, MTSECP [65] represents X using the PAA method. Finally, the PAA representations of the MTS training instances are used to compute class-wise MPLs by utilizing RNN.

• Remarks: Learning MPLs using RNN is one of the simplest ways to achieve earliness in classification. However, the approaches including Early 1-NN, ECTS, and Relaxed ECTS deal with UTS data and cannot be easily extended to MTS. MTSECP, in turn, is proposed for early classification of MTS, but it works on the transformed UTS instead. This indicates that MTSECP does not utilize the correlation among the different dimensions of an MTS.
The correlation helps to capture class identifiable information from multiple dimensions together.

Apart from RNN, some researchers have also utilized posterior probabilities for MPL computation of time series. This group of early classification approaches computes a class discriminative MPL for each class label of the dataset. For a given training dataset, these approaches fit a probabilistic classifier in the prefix space of length t, where $1 \le t \le T$. The probabilistic classifier provides posterior class probabilities for each time series of the training dataset. The class with the highest posterior probability is then used to compute the accuracy of the probabilistic classifier on the training data in the prefix space of length t. Finally, the class discriminative MPL for class label y is set to t if

$$A_y^t \ge \alpha \cdot A_y^T,$$

where $A_y^t$ and $A_y^T$ are the training accuracies for class y in the prefix space of length t and the full-length space of length T, respectively. The parameter $\alpha$ denotes the desired level of accuracy of the early classification, with $0 < \alpha \le 1$. From the literature, we found that the Gaussian Process (GP) classifier [84] has been the most preferred probabilistic classifier for early classification of time series [8], [10], [55], [60], [66]. Fig. 5 shows an example of class discriminative MPLs for five different classes, i.e., $y_1, y_2, \cdots, y_5$, along the progress of a time series. The MPL of any class $y_i$ is basically the timestamp of the time series after which the class $y_i$ can be discriminated from the other classes of the dataset. In addition, a threshold parameter must also be learned along with the MPLs to check the reliability of a prediction [60].

Mori et al. [60] proposed an Early Classification framework based on DIscriminativeness and REliability (ECDIRE) of the classes over time. ECDIRE employed a GP classifier to compute the class discriminative MPLs. It also assigned thresholds to each class label to ensure the reliability of predictions.
Such thresholds are computed from the two highest posterior probabilities that are obtained by applying the GP classifier on the training dataset. Let $p_1^t(X)$ and $p_2^t(X)$ denote the first and second highest probabilities for a training time series X using its prefix of length t, respectively. The threshold for any class y is then computed as $\min_{X \in D_y} \left( p_1^t(X) - p_2^t(X) \right)$, where $D_y$ consists of the time series that are correctly classified into class y by the GP classifier. These computed thresholds are used to check the reliability of the predicted class label during classification of an incomplete time series. In addition, the authors in [60] also conducted a case study on early identification of bird species by using their chirping sounds. Additionally, they analyzed the statistical significance of ECDIRE using two widely used tests from [79].

In [66], the authors utilized a concept of game theory for early classification of Indian rivers by using time series of water quality parameters such as pH value, turbidity, and dissolved oxygen. They first formulate an optimization problem involving accuracy and earliness, and then solve it by proposing a game model. Such optimization helps to compute the class-wise MPLs while maintaining the desired level of accuracy $\alpha$.

The authors in [10], [45], [52] attempted to classify an incoming MTS as early as possible with at least $\alpha$ accuracy. The main focus of these works is to handle a special type of MTS that is collected by sensors of different sampling rates in a fixed period of time. In order to classify such an incoming MTS, the proposed approaches [10], [45], [52] first estimate the class-wise MPLs for each component (i.e., time series) of the MTS separately. Later, the approaches [10], [45] developed a class forwarding method to classify an incoming MTS early by using the computed MPLs. On the other hand, the approach in [52] proposed a divide and conquer based method to handle components of different sampling rates during classification of an incoming MTS.
These approaches [10], [45], [52] implicitly utilize the correlation among the components and thus maximize the earliness of the classifier during class label prediction. Apart from this, the authors in [10] also employed a Hidden Markov Model (HMM) [85] during prediction for further improvement in earliness. Finally, they evaluated the proposed approach on classifying the type of road surface by using sensor-generated MTS data.

Gupta et al. [8] extended the concept of early classification to MTS with faulty or unreliable components. They proposed a Fault-tolerant Early Classification of MTS (FECM) approach to classify an ongoing human activity by using its MTS collected from unreliable sensors. FECM first identifies the faulty components using an autoregressive integrated moving average model [86] whose parameters are learned from the training instances. Later, these faulty components are removed from the MTS and only the reliable components are used for classification. During training, FECM employed a GP classifier and k-means clustering for estimating the MPLs for each component separately. A utility function is developed to optimize the tradeoff between accuracy $A_t$ and earliness E. The accuracy $A_t$ is computed using the confusion matrix obtained by applying k-means clustering on the training instances. The MPL of each time series X is then computed, and FECM later employs a kernel density estimation method [87] for estimating the class-wise MPLs, which are used for the classification of an ongoing human activity.

• Remarks: This group of approaches has covered diverse aspects of time series such as components of MTS with different sampling rates [10], [52], faulty components [8], and the application of a game theory model for optimizing the tradeoff [66]. It is also observed that the approaches [8], [10], [52] focused on utilizing the correlation that may exist among the components of an MTS.
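The class-wise MPL rule used by this group of approaches can be sketched as below. The sketch assumes that the per-class training accuracies in every prefix space have already been obtained from some probabilistic classifier, and that the MPL of class y is the smallest t whose accuracy reaches $\alpha$ times the full-length accuracy. The second function assumes that an ECDIRE-style reliability threshold is the minimum gap between the two highest posterior probabilities over correctly classified training series. All function names and the toy numbers are hypothetical.

```python
def class_mpls(acc, alpha):
    """acc[y] lists the training accuracy of class y for prefix lengths 1..T.
    Returns the smallest t per class with acc[y][t-1] >= alpha * acc[y][T-1]."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    mpls = {}
    for y, curve in acc.items():
        target = alpha * curve[-1]
        # The last prefix length always satisfies the condition, so next() succeeds.
        mpls[y] = next(t for t, a in enumerate(curve, start=1) if a >= target)
    return mpls


def reliability_threshold(posteriors):
    """posteriors: posterior probability vectors (each with at least two classes)
    of the correctly classified training series of one class at one prefix length.
    Assumed rule: minimum difference between the two largest probabilities."""
    gaps = []
    for p in posteriors:
        top = sorted(p, reverse=True)
        gaps.append(top[0] - top[1])
    return min(gaps)


acc = {
    "walk": [0.2, 0.5, 0.8, 0.9, 0.9],
    "run": [0.6, 0.85, 0.9, 0.95, 1.0],
}
mpls = class_mpls(acc, alpha=0.9)   # {'walk': 4, 'run': 3}
theta = reliability_threshold([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]])
```

At prediction time, a label y would only be emitted once the incoming series has at least `mpls[y]` data points and the gap between its two highest posterior probabilities is at least the threshold learned for y.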
This section presents a detailed review of the approaches that have used shapelets for early classification of time series. The authors in [27], [53] successfully implemented the idea of shapelets for time series classification, which motivated many researchers to utilize shapelets for achieving earliness in classification. Moreover, shapelets improve the interpretability of the classification results [39], [48], [59], which enhances the adaptability of the proposed approach for real world applications such as health informatics and industrial process monitoring.

In the early classification approaches [3], [46], [48], [50], [59], [67], the authors focused on extracting a set of perfect shapelets (called key shapelets) from the given training dataset. Ideally, a perfect shapelet is powerful enough to distinguish all the time series of one class from the time series of other classes. However, it is impractical to find such perfect shapelets. Researchers therefore put their efforts towards developing a proper criterion that can provide a set of effective shapelets (if not perfect) for early classification [48], [50], [63], [68].

For a given training dataset, the early classification approaches first extract all possible subsequences (segments) of the time series with different lengths and then evaluate the quality and earliness of these subsequences to obtain a set of key shapelets. To compute the distance threshold $\delta$ of any shapelet $S = (s, l, \delta, y)$, the distances are calculated between the subsequence s and each time series of the training dataset. The distance between s and a time series X is computed as

$$d = \min_{x \sqsubseteq X} dist(s, x),$$

where the symbol $\sqsubseteq$ is used to select a subsequence x of length l from the set of all subsequences of X. The authors of the existing approaches [3], [46], [48], [50], [59], [67] have developed different methods for computing the distance threshold $\delta$ of shapelet S by using its distances from each training time series.
Later, these shapelets are filtered based on their utility to obtain the most useful (key) shapelets. Finally, these key shapelets are used for early classification of an incomplete time series. An example of early classification using shapelets is illustrated in Fig. 6, where $X'$ is an incoming time series which is to be classified using key shapelets. The class label of the shapelet $S$ is assigned to $X'$ if the distance $d$ between $X'$ and $S$ is less than its pre-computed threshold $\delta$. The shapelet based approaches can be further divided into two groups based on the key shapelet selection method.

The authors of [39] were the first to address the early classification problem using shapelets. They developed an approach called Early Distinctive Shapelet Classification (EDSC), which utilizes local distinctive subsequences as shapelets (or features) for early classification of time series. EDSC consists of two major steps: feature extraction and feature selection. In the former step, it first finds all local distinctive subsequences in the training dataset and then computes a distance threshold $\delta$ for each subsequence. EDSC uses a threshold computation strategy in which it finds a Best Match Distance (BMD) for each shapelet by computing the distances from that shapelet to each training time series. EDSC employs two methods, kernel density estimation [87] and Chebyshev's inequality [88], to compute the threshold $\delta$ of each shapelet based on the distribution of BMDs. Next, in the feature selection step, the authors select key shapelets based on their utility. In EDSC, the utility of a shapelet $S$ is computed using its precision $P$ and weighted recall $R_w$. The precision $P(S)$ captures the class-distinctive ability of the shapelet on the training dataset. On the other hand, the weighted recall $R_w(S)$ captures the earliness and frequency of the shapelet in the training instances.
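The matching step described above (assign the label of a shapelet as soon as its distance to the incoming series falls below its threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of EDSC or any other reviewed approach: the series is univariate, the distance is Euclidean, key shapelets are given as hypothetical `(subsequence, threshold, label)` tuples, and the first matching shapelet decides the label.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_early(stream, key_shapelets):
    """Consume data points one at a time and stop at the first shapelet match.

    stream        : iterable of numeric data points (the incoming series)
    key_shapelets : list of (subsequence, threshold, label) tuples (assumed format)
    Returns (label, t), where t is the number of points consumed, or
    (None, t) if the stream ends without any match.
    """
    prefix = []
    for t, value in enumerate(stream, start=1):
        prefix.append(value)
        for s, delta, label in key_shapelets:
            if len(prefix) < len(s):
                continue  # not enough points yet for this shapelet
            # Earlier windows were already tested at earlier time steps,
            # so only the window ending at the newest point is new.
            window = prefix[-len(s):]
            if euclidean(s, window) < delta:
                return label, t
    return None, len(prefix)


if __name__ == "__main__":
    shapelets = [([1, 2, 3], 0.5, "A"), ([5, 5], 0.5, "B")]
    print(classify_early([0, 1, 2, 3, 5, 5], shapelets))
```

Checking only the newest window at each step keeps the per-point cost proportional to the total length of the key shapelets, which is one simple way to make the matching incremental.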
Additionally, the authors of [39] discussed a heuristic technique to speed up the key shapelet selection process of EDSC.

As EDSC does not provide any estimate of certainty when deciding the class label of an incoming time series, Ghalwash et al. [59] presented an extension of EDSC with an additional uncertainty estimate. They named their approach Modified EDSC with Uncertainty (MEDSC-U). The uncertainty estimate indicates the confidence level with which the prediction decision is made; if it is less than a user-defined confidence level, the decision may be delayed even after a shapelet is matched.

In [46], the authors utilized shapelets for early classification of gene expression data. A Multivariate Shapelets Detection (MSD) method is proposed to classify an incoming MTS by extracting the key shapelets from the training dataset. MSD finds several multivariate shapelets from all dimensions of the MTS with the same start and end points. It computes an information gain based distance threshold for each multivariate shapelet to facilitate matching with the incoming MTS. In addition, the authors formulated a weighted information gain based utility measure to select the key shapelets and to prune needless shapelets in the process.

Ghalwash et al. [47] pointed out two major limitations of MSD: i) shapelets must have the same start and end points in all dimensions of the MTS, and ii) it is unable to handle the common problem of varied response rates in clinical data. To overcome these limitations, a hybrid Early Classification Model (ECM) is presented in [47] by combining a generative model (HMM) with a discriminative model. At first, several HMM classifiers are trained over short segments of time series to learn the distribution of patterns in the training data. Next, these trained HMM models generate an array of log likelihood values for the disjoint shapelets of a time series.
This array of likelihood values is passed as features for training Support Vector Machines (SVM). For an incoming MTS, when the number of arrived data points becomes equal to the length of the shortest segment, the respective HMM models generate a set of log likelihood values, which is given as input to the SVM to estimate probability scores of the possible classes. If a probability score is higher than a confidence threshold, ECM assigns the respective class label to the MTS; otherwise, it waits for more data points. The authors also proposed an extension of ECM in [62], which aims to find the relevant length of segments so that the HMM models can leverage the temporal dependencies in patient-specific time series.

Lin et al. [68] developed a Reliable EArly ClassificaTion (REACT) approach for MTS in which some of the components are categorical and others numerical. REACT first discretizes the categorical time series and then generates their shapelets along with those of the numerical time series. It employs the concept of Equivalence Classes Mining [89] to avoid a large number of redundant shapelets. This concept also helps to retain distinctive shapelets in the process. Apart from that, the authors proposed pruning techniques to further reduce the redundant shapelets. Later, REACT uses an information gain based utility measure for selecting the key shapelets. Let $D$ be a training dataset with $N$ MTS instances belonging to $k$ different classes. REACT first defines the entropy of the dataset $D$ as

$$\mathrm{Entropy}(D) = -\sum_{i=1}^{k} \frac{n_i}{N} \log \frac{n_i}{N}$$

where $n_i$ is the number of instances in class $y_i$. Let $D_S$ be a sub-dataset that consists only of those instances of $D$ in which the shapelet $S$ appears as a subsequence. In other words, $D_S$ includes $X$ if $d \le \delta$, where $d$ can be computed using Eq. 12. REACT measures the utility of shapelet $S$ using an equation in which the parameter $\omega \ge 1$ controls the significance of information with respect to the earliness of the shapelet. Another term, $W_{sup}(S)$, is referred to as weighted support, which is similar to the weighted recall $R_w(S)$ as given in Eq.
13. Due to the large number of shapelets, REACT incurs high computational overhead. The authors therefore implemented REACT with parallel computing and executed it on a GPU based system, which improved the efficiency of REACT to a great extent.

• Remarks: EDSC is the first approach to adopt the idea of shapelets for early classification, but it deals with UTS only. Its modified version, MEDSC-U, is also limited to UTS. However, both of these approaches have successfully drawn the attention of many researchers towards shapelet based early classification of MTS. Gene expression classification has remained a common interest in the shapelet based approaches that are proposed for MTS and discussed in this group.

He et al. [50] attempted to solve the imbalanced class problem in ECG classification, where training instances in the abnormal class are far fewer than in the normal class. They addressed this problem in the framework of early classification of MTS and proposed a solution called Early Prediction on Imbalanced MTS (EPIMTS). At first, EPIMTS extracts all possible subsequences (candidate shapelets) of different lengths from each component of the MTS separately. Unlike MSD [46], the extracted candidate shapelets are univariate and thus need not have the same start and end points in all dimensions. These candidate shapelets are clustered using the Silhouette Index method [90]. Later, the shapelets in the clusters are ranked according to a Generalized Extended F-Measure (GEFM), and the shapelet with the maximum rank is used to represent the respective cluster. For a shapelet $S$, GEFM is computed from its earliness $E$, precision $P$, and recall $R$, whose importance is controlled by the weight parameters $w_0$, $w_1$, and $w_2$, respectively. In EPIMTS, GEFM worked well for measuring the quality of shapelets. Finally, key (core) shapelets are selected from each component of the MTS based on the ranking of the obtained clusters.
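The cluster-then-rank selection described above can be sketched as follows. The exact GEFM formula is not reproduced in this text, so the sketch assumes a weighted harmonic-style combination of earliness, precision, and recall, 1 / (w0/E + w1/P + w2/R); this form, the dictionary layout of a candidate, and the function names are assumptions made for illustration only. Cluster assignments are taken as given, since the clustering step itself is not shown.

```python
def gefm(E, P, R, w0=1.0, w1=1.0, w2=1.0):
    """ASSUMED scoring form: 1 / (w0/E + w1/P + w2/R).

    E, P, R are earliness, precision, and recall in (0, 1]; the weights
    control their relative importance. Non-positive inputs score 0.
    """
    if min(E, P, R) <= 0:
        return 0.0
    return 1.0 / (w0 / E + w1 / P + w2 / R)


def cluster_representatives(candidates, cluster_ids, weights=(1.0, 1.0, 1.0)):
    """Pick the best-scoring candidate shapelet of every cluster.

    candidates  : list of dicts with keys "E", "P", "R" (hypothetical layout)
    cluster_ids : cluster label of each candidate, same order as candidates
    Returns {cluster_id: index of the representative candidate}.
    """
    best = {}
    for idx, (cand, cid) in enumerate(zip(candidates, cluster_ids)):
        score = gefm(cand["E"], cand["P"], cand["R"], *weights)
        if cid not in best or score > best[cid][0]:
            best[cid] = (score, idx)
    return {cid: idx for cid, (_, idx) in best.items()}


if __name__ == "__main__":
    cands = [
        {"E": 0.9, "P": 0.9, "R": 0.9},
        {"E": 0.5, "P": 0.5, "R": 0.5},
        {"E": 0.2, "P": 0.8, "R": 0.8},
    ]
    print(cluster_representatives(cands, [0, 0, 1]))
```

Under the assumed form, a shapelet that is weak on any one of the three criteria receives a low score, which is the usual motivation for harmonic-style measures over simple averages.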
In [48], the authors proposed an approach called Interpretable Patterns for Early Diagnosis (IPED) for studying viral infection in humans using their gene expression data. Similar to MSD, IPED extracts multivariate candidate shapelets from the training MTS, but it allows a multivariate shapelet to have different start and end points in different dimensions. IPED computes an information gain based distance threshold for each shapelet. The authors formulated an optimization problem to find the relevant components of the MTS, and the key shapelets are selected from these components only. For each relevant component, the candidate shapelets are clustered into $k$ groups (the total number of classes in the dataset), and a key shapelet is then selected from each group. IPED finds such key shapelets by optimizing a logistic loss over the training instances.

A major drawback of the MSD [46], EPIMTS [50], and IPED [48] approaches is that they do not incorporate the correlation among the shapelets of different components of the MTS during classification. Such a correlation helps to improve the interpretability of the shapelets. To overcome this drawback, the authors in [3], [67] developed an approach called Mining Core Features for Early Classification (MCFEC), where the core features are the key shapelets. MCFEC first obtains candidate shapelets from each component independently and then discovers the correlation among the shapelets of different components to enhance their interpretability. Later, the key shapelets are selected using the Silhouette Index method [90] based on their ranking computed using GEFM (given in Eq. 15). For classification of an incomplete MTS, MCFEC employs a Query By Committee (QBC) [91] strategy in which a class label is first predicted for each component of the MTS, and the class label that appears in the majority is assigned to the incomplete MTS.
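The component-wise voting step described above can be sketched in a few lines. This is a hedged illustration rather than the MCFEC implementation: it assumes that each component contributes either a predicted label or `None` (no shapelet matched yet), and it returns `None` on a tie so that the caller can wait for more data points, which is a design choice of this sketch.

```python
from collections import Counter


def majority_label(component_predictions):
    """Combine per-component predictions into one label by majority vote.

    component_predictions : list with one entry per MTS component; each entry
                            is a class label, or None if that component has
                            not produced a prediction yet (assumed convention)
    Returns the most frequent label, or None if nothing was predicted or the
    two most frequent labels are tied.
    """
    votes = Counter(p for p in component_predictions if p is not None)
    if not votes:
        return None
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: defer the decision
    return ranked[0][0]


if __name__ == "__main__":
    print(majority_label(["walk", "run", "walk"]))
```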
The authors in [73] presented a Confident Early Classification framework for MTS with interpretable Rules (CECMR), where key shapelets are extracted using the concepts of local extremum points and turning points of a time series $X$, both of which are defined on individual points $X[t]$, where $1 \le t \le T$. CECMR first discovers interpretable rules from the sets of candidate shapelets and then estimates the confidence of each rule to select the key shapelets. The correlation among the components of the MTS is also incorporated in CECMR.

Recently, shapelets have been adopted for estimating an appropriate time to transfer a patient to the Intensive Care Unit (ICU) using the MTS data of physiological signs [63]. The authors in [63] proposed a Multivariate Early Shapelet (MEShapelet) approach to estimate this time. In this case, the measurements for different physiological signs are not recorded at the same interval, which generates an asynchronous MTS whose components may have different lengths. MEShapelet first extracts candidate asynchronous multivariate shapelets and then computes a time tolerance threshold for each extracted candidate shapelet. This threshold limits the deviation among the dimensions of a shapelet. Later, the key shapelets are filtered using clustering and are used to construct two information gain based classifiers, a decision tree and a random forest. The authors collected ICU data of 2127 patients to examine the effectiveness of MEShapelet.

• Remarks: Clustering of candidate shapelets has proven to be a good way to select more distinctive key shapelets than those selected by a utility measure. However, the credit goes to GEFM, which ranks the shapelets based on their distinctiveness, earliness, and frequency. The approaches [3], [67], [73] utilize the correlation among the components of the shapelets, which improved their earliness and interpretability to a great extent.
It is interesting to note that MEShapelet [63] is able to classify even an asynchronous MTS of patient ICU data.

This section discusses the model based early classification approaches for time series data. Unlike prefix based or shapelet based approaches, the model based approaches [55]-[58], [76] formulate a mathematical model to optimize the tradeoff between earliness and reliability of prediction. Most of these approaches aim to design a decision or stopping rule using conditional probabilities. These conditional probabilities are either generated by generative classifiers or computed by fitting a discriminative classifier on the training dataset. Further, there exist some approaches [42]-[44], [69] which do not incorporate the reliability parameter in the model but still provide significant earliness.

Mathematically, generative classifiers estimate the joint probability distribution $p(X, y)$ from the given labeled instances, where $X$ and $y$ denote the time series and its label, respectively. These classifiers use Bayes' rule to make predictions by generating the conditional probability $p(y|X)$, and thus they learn the true distribution of the classes [92]. During classification, a testing instance is first modeled using the learned distribution and then classified by comparing its model with the models of the training instances. On the other hand, discriminative classifiers calculate the conditional probability $p(y|X)$ by mapping the input instances to their labels. These classifiers attempt to learn a decision boundary between classes [92], which helps to classify the testing instance. We divide the model based approaches into the following two groups based on the type of classifier adopted.

In [34], the authors developed an ensemble model based early classification approach to recognize the type of a gas using an incomplete 8-dimensional time series generated by a sensor-based nose.
The ensemble model consists of a set of classifiers with a reject option, which allows them to express their doubt about the reliability of the predicted class label. The probabilistic classifier assigns a class label $y'$ to an incomplete time series $X'$ using the posterior class probabilities. If $p(y'|X')$ is close to 1/2, the classifier can choose the reject option to express its doubt about the class label $y'$. A threshold $0 \le \tau < 1$ is used to decide whether the classifier should choose the reject option or not. The classifiers are arranged serially along the progress of the time series to facilitate prediction using a small portion of the data points. If sufficient data points have not arrived, prediction is carried out again by the next classifier when another portion of data points has arrived. This process is repeated until a majority of the classifiers are confident enough about the predicted class label. The decision to choose the reject option also depends on the cost of data collection time.

Another work [74] focused on minimizing response time to obtain earliness in classification. This work developed an empirical risk function that minimizes the risk associated with early prediction and thus optimizes the response time and earliness with confidence.

Dachraoui et al. [49] proposed a non-myopic early classification approach, where the term non-myopic means that at each time step $t$ the classifier estimates an optimal time $\tau^*$ in the future when a reliable prediction can be made. For an incomplete time series $X'_t$ with $t$ data points, the optimal time $\tau^*$ is calculated as the value of $\tau \ge 0$ that minimizes $f_\tau(X'_t)$, where the function $f_\tau(X'_t)$ estimates an expected cost for the future time step $t + \tau$. In the formulation of the expected cost function, $c_i$ denotes a cluster obtained after clustering the training instances into $k$ clusters and $Y$ is the set of all class labels (i.e., $k$) of the dataset.
The term $p(c_i|X'_t)$ computes the membership probability of $X'_t$ in cluster $c_i$, and $p_{t+\tau}(y'|y, c_i)$ estimates the posterior probabilities of the classes using the training data. The other terms, $C(y'|y)$ and $C(t+\tau)$, represent the cost of misclassification and the expected cost of classification that may be incurred after $\tau$ steps, respectively. The formulated cost function in Eq. 18 works as a trigger function to decide whether sufficient data has arrived in $X'$ for making a reliable prediction. If $\tau^* = 0$, the classifier is allowed to make a prediction about the class label of $X'$. As the cost function also uses the arrived data points of the incomplete time series for estimating the optimal time, the proposed approach is adaptive as well.

The authors in [75] pointed out two weaknesses of [49]: i) the assumption of low intra-cluster variability, which is impractical when obtaining membership probabilities using clustering, and ii) clustering is carried out with complete time series, which may impact the estimation of the optimal time. In [75], two different algorithms (NoCluster and 2Step) are introduced to overcome these weaknesses while preserving the adaptive and non-myopic properties. The expected cost function in the NoCluster algorithm involves the term $\frac{1}{1 + e^{-\lambda \delta_i^t}}$, where $\delta_i^t$ is the distance between $X'_t$ and a training time series $X_i$. Next, $E_{t+\tau}(X_i)$ computes the expected cost on a per time series basis and is expressed in terms of $p_{t+\tau}(y'|X_i^t)$, which can be obtained by applying a probabilistic classifier on the training data. In the 2Step algorithm, the authors build a set of classifiers for achieving earliness and a set of regressors for maintaining the non-myopic property.

Mori et al. [76] proposed an EarlyOpt framework in which a separate probabilistic classifier is constructed for each step of the time series. They formulated a stopping rule using the two highest posterior probabilities obtained from the classifiers.
The main objective of EarlyOpt is to minimize the cost of prediction by satisfying the stopping rule. EarlyOpt employs two widely used discriminative classifiers, i.e., GP and SVM. In another work [55], the authors developed two different stopping rules using the class-wise posterior probabilities. These stopping rules also include some real-valued parameters, which are optimized using Genetic algorithms [93]. In addition to the stopping rules, the authors developed three cost functions, with 0-norm and 1-norm, to optimize the accuracy and earliness of the classification.

The authors in [77] introduced a two-tier early classification approach (TEASER) based on the master-slave paradigm. In the first tier, a slave classifier first computes posterior probabilities for each class label of the dataset and then constructs a feature vector for each training time series. Let $p_{1:k}(X)$ be a set of $k$ posterior probabilities obtained for a training time series $X$. The feature vector for $X$ is formed from these probabilities together with $y(X)$, the most probable class label, and $\Delta_X$, the difference between the first and second highest posterior probabilities. The feature vector is passed to a master classifier. In the second tier, the authors employ a one-class classifier (e.g., oc-SVM [94]) as the master classifier to check the reliability of the probable class label.

• Remarks: In this group, we found two interesting approaches [49], [75] that address the early classification problem with a different property, known as non-myopic. However, the computational complexity of such approaches during classification is very high.

The authors in [56], [57] formulated a decision rule to classify an incomplete test time series with some pre-defined reliability. They employed Gaussian Mixture Model estimation and joint Gaussian estimation for estimating the distribution of incomplete time series by modeling the complete time series of the training dataset as random variables.
Two classifiers, linear SVM and Quadratic Discriminant Analysis (QDA) [95], were adopted with the formulated decision rule to provide a desired level of reliability (or accuracy) in early classification. The authors in [58] also employed QDA to classify an incomplete time series with a desired level of reliability. The proposed approach (called Early QDA) assumes that the training time series have a Gaussian distribution, which makes it easy to estimate the parameters (i.e., mean and covariance) from training instances.

Antonucci et al. [69] developed a generative model based approach for early recognition of Japanese vowel speakers using their speech time series data. The proposed approach employs an imprecise HMM (iHMM) [96] to compute the likelihood of intervals of the incoming time series with respect to the training instances. It uses the expectation maximization algorithm to infer the parameters without using the observations of the state variables. For reliable prediction, a class label is assigned to the incoming time series only if the ratio of the two highest likelihoods is greater than a predefined threshold value.

Li et al. [42] employed a stochastic process, called the Point Process model, to capture the temporal dynamics of different components of MTS. They proposed a Multilevel Discretized Marked Point Process (MD-MPP) approach for early classification of MTS. MD-MPP models the temporal dynamics of each component independently and then computes sequential cues to capture the temporal order of events that have occurred over time among the components. The authors also incorporated the correlation among the components of MTS by using a variable-order Markov model. Another point process based approach, called Dynamic Marked point Process with Prediction by Partial Matching (DMP+PPM), is presented in [43]. DMP+PPM captures the temporal dynamics of three-dimensional observations of human actions.
It also incorporates temporal dependencies among different human joints during classification of an ongoing action. Finally, the authors in [44] proposed a complex activity recognition framework for mobile platforms, named Simultaneous complex activities Recognition and Action sequence Discovering (SimRAD). It incorporates two probabilistic models, one for action sequences and another for complex activities. The probabilistic models estimate the distribution parameters from training data and use them for inferring the class label of an incomplete MTS corresponding to an ongoing activity.

• Remarks: Generative classifier based early classification approaches are more complicated than those based on discriminative classifiers. Moreover, these approaches depend heavily on estimating the data distribution by fitting stochastic processes, which makes them difficult to understand and degrades the interpretability of their results.

This section covers other early classification approaches which do not meet the inclusion criteria of the other categories. A primary objective of every early classification approach is to build a classifier that can provide earliness while maintaining a desired level of reliability or accuracy. However, there exist some approaches [40], [51], [70] which are able to classify an incomplete time series but without ensuring reliability; in other words, they do not attempt to optimize the tradeoff between accuracy and earliness.

The authors in [72] introduced a reinforcement learning based early classification framework using a Deep Q-Network (DQN) [97] agent. The framework uses a reward function to balance accuracy and earliness. It also includes a suitable set of states and actions for the observations of the training time series.
The DQN agent learns an optimal decision making strategy during training, which helps it to pick a suitable action after receiving an observation in the incoming time series during testing. In another work [71], the authors developed a deep neural network based early classification framework that focuses on optimizing the tradeoff by estimating the stopping decision probabilities at all time stamps of the time series. The authors formulated a new loss function to compute the loss of the classifier if a class label $y$ is predicted for an incomplete time series $X'_t$ with $t$ data points. The loss at time $t$ combines a classification loss $L_a(\cdot)$ and an earliness loss $L_e(\cdot)$, with a tradeoff parameter $\beta$ controlling their relative weight. The authors implemented the framework using a set of Long Short-Term Memory (LSTM) layers and a single convolutional layer along with the new loss function.

• Remarks: Recently, the researchers in [71], [72] have successfully employed reinforcement learning and deep learning techniques for early classification. These approaches have opened a new direction for further research.

One of the first works to mention early classification of time series is presented in [40]. Though this work aims to classify an incomplete time series, it does not attempt to optimize the tradeoff between reliability and earliness. The authors in [40] divide the time series into intervals and then treat each interval as a predicate. They use only the available predicates in the classification and ignore the unavailable ones to achieve earliness.

The authors in [51] applied a Case-Based Reasoning (CBR) method for early classification of faults in a simulated dynamic system. CBR employs a k-NN classifier to classify a fault using an incomplete time series. The simulation studies showed that the CBR method achieved significant earliness of around 30%-50% but without ensuring the desired level of reliability or accuracy. Recently, Huang et al.
[70] proposed a Multi-Domain Deep Neural Network (MDDNN) based early classification framework for MTS. MDDNN employs two widely used deep learning techniques, Convolutional Neural Network (CNN) [108] and LSTM [109]. It first truncates the training MTS up to a fixed time step and then gives it as input to a CNN layer, which is followed by another CNN layer and an LSTM layer. Frequency domain features are also calculated from the truncated MTS and given as input to a similar framework of CNN-CNN-LSTM layers. The output features from both frameworks are passed to a fully connected layer. Finally, the fully connected layer along with a softmax function is applied to obtain the class assignment probabilities for the given incomplete MTS.

• Remarks: The approaches [40], [51] are quite old and did not focus on optimizing the tradeoff, which is a primary objective of early classification. However, these approaches built a foundation for the concept of early classification. Further, we also found a recent work [70] without tradeoff optimization in which deep learning models are exploited to achieve an adequate level of earliness.

With the presented categorization of early classification approaches, one can get a quick understanding of the notable contributions that have been made over the years. After reviewing the literature, we found that most of the early classification approaches appeared after ECTS [38]. Although some early classification approaches (e.g., [40], [51]) attempted to achieve earliness well before ECTS, they did not maintain a desired level of accuracy or reliability of the class prediction, which is a primary criterion of a true early classifier. We therefore included such approaches in the "without tradeoff" group of the miscellaneous category. Further, we present a summary of all the categorized approaches in Table II to give a quick overview of the various categories with their included papers.
Moreover, this table provides details about the employed classifiers and the datasets that have been used for experimental evaluation. Additionally, one can easily separate the approaches based on the type of time series (i.e., UTS or MTS). We make the following points after a thorough review of the early classification approaches:

• Prefix based approaches are easy to understand and have provided satisfactory results on various UCR and UCI datasets. In these approaches, the 1-NN classifier for UTS and the GP classifier for MTS have been a common choice for learning MPLs.
• The majority of the approaches that use shapelets have focused on early classification of gene expression data (i.e., MTS) and are thus suitable for medical applications. As doctors may be reluctant to adopt an approach without interpretable results, the primary objective of these approaches was to obtain key shapelets that can exclusively represent all the time series of one class. Such shapelets are easy to interpret by linking them with the patient's disease.
• Model based early classification approaches are difficult to understand as they involve complicated statistical methods for developing the stopping rule. From Table II, we observe that SVM has been a widely employed classifier to obtain conditional probabilities for evaluating the stopping rule or trigger function.
• Recently, the researchers in [70], [71] have shown an interest in deep learning models for early classification of time series and have achieved promising results for both UTS and MTS.

Challenges: Despite the promising results of prefix based and model based approaches, end users may not prefer [20] these approaches for medical applications due to the lack of interpretability in the classification results. Moreover, model based approaches are sophisticated and thus can be used only as a black box.
On the other hand, the shapelet based approaches are good for medical applications but impose heavy computations, as a huge number of candidate shapelets have to be extracted from the training instances.

• Research directions: In spite of the several existing early classification approaches, there remain some promising areas for further research, as discussed below:
• One of the most promising research directions is to incorporate interpretability into the prefix based approaches without imposing heavy computations.
• Imbalanced distribution of time series among the classes is a common problem in applications where some classes have far fewer instances than other classes. Only EPIMTS [50] has focused on this problem, by using shapelets. This indicates scope for better solutions through other types of approaches.
• A few recent studies have employed deep learning models, such as LSTM and CNN, in the framework of early classification. This also opens a new direction towards enhancing the interpretability of neurons in the models, which in turn will improve the adaptability of the approach.
• In addition, the deep learning based early classification framework can be extended to incorporate the correlation among the components of MTS while classifying an incoming MTS. Such correlation is expected to improve the early classification results.
Accuracy of classifier using t data points
S: A shapelet with quadruple (s, l, δ, y)
s: A subsequence of time series of length l
δ: Distance threshold of a shapelet
NN_t(X): Nearest neighbors of X_t in D
RNN_t(X): Set of reverse nearest neighbors of X_t in D

[1] Efficient motif discovery for large-scale time series in healthcare
[2] Automated change-point detection of EEG signals based on structural time-series analysis
[3] Early classification on multivariate time series with core features
[4] A prediction approach for stock market volatility based on time series data
[5] Financial time series segmentation based on turning points
[6] Multiobjective time series matching for audio classification and retrieval
[7] Multivariate time-series classification using the hidden-unit logistic model
[8] A fault-tolerant early classification approach for human activities using multivariate time series
[9] Real-time change point detection with application to smart home time series data
[10] An early classification approach for multivariate time series of on-vehicle sensors in transportation
[11] Time series analysis and its applications: with R examples
[12] Time-series data mining
[13] Fast, scalable, and accurate algorithms for time-series analysis
[14] A survey on forecasting of time series data
[15] A review on time series forecasting techniques for building energy consumption
[16] Time-series clustering: a decade review
[17] The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances
[18] Efficient classification of long time series by 3-D dynamic time warping
[19] The UEA & UCR Time Series Classification Repository
[20] UCI machine learning repository
[21] Experimental comparison of representation methods and distance measures for time series data
[22] Time series classification with ensembles of elastic distance measures
[23] Addressing big data time series: Mining trillions of time series subsequences under dynamic time warping
[24] Using dynamic time warping to find patterns in time series
[25] A fast shapelet selection algorithm for time series classification
[26] Fast classification of univariate and multivariate time series through shapelet discovery
[27] Time series shapelets: a novel technique that allows accurate, interpretable and fast classification
[28] Fast shapelets: A scalable algorithm for discovering time series shapelets
[29] Bag of recurrence patterns representation for time-series classification
[30] Dictionary-based compression for long time-series similarity
[31] Time-series classification with COTE: the collective of transformation-based ensembles
[32] Time series classification with HIVE-COTE: The hierarchical vote collective of transformation-based ensembles
[33] Deep learning for time series classification: a review
[34] Classifiers with a reject option for early time-series classification
[35] A distributed multi-sensor machine learning approach to earthquake early warning
[36] Early classification of individual electricity consumptions
[37] Mining sequence classifiers for early prediction
[38] Early prediction on time series: A nearest neighbor approach
[39] Extracting interpretable features for early classification on time series
[40] Boosting interval-based literals: Variable length and early classification
[41] A literature survey of early time series classification and deep learning
[42] Early classification of ongoing observation
[43] Early recognition of 3D human actions
[44] Predicting complex activities from ongoing multivariate time series
[45] Early classification approach for multivariate time series using sensors of different sampling rate
[46] Early classification of multivariate temporal observations by extraction of interpretable shapelets
[47] Early classification of multivariate time series using a hybrid HMM/SVM model
[48] Extraction of interpretable multivariate patterns for early diagnostics
[49] Early classification of time series as a non myopic sequential decision making problem
[50] Early prediction on imbalanced multivariate time series
[51] Early fault classification in dynamic systems using case-based reasoning
[52] A divide-and-conquer-based early classification approach for
multivariate time series with different sampling rate components in iot Time series shapelets: a new primitive for data mining Early classification on time series Early classification of time series by simultaneously optimizing the accuracy and earliness Classifying with confidence from incomplete information Early time series classification with reliability guarantee Reliable early classification of time series Utilizing temporal patterns for estimating uncertainty in interpretable early decision making Reliable early classification of time series based on discriminating the classes over time Toward the early diagnosis of neonatal sepsis and sepsis-like illness using novel heart rate analysis Patient-specific early classification of multivariate observations Asynchronous multivariate time series early prediction for icu transfer Traffic classification on the fly Early classification of multivariate time series based on piecewise aggregate approximation Game theory based early classification of rivers using time series data Early classification on multivariate time series Reliable early classification on multivariate time series with numerical and categorical attributes Early classification of time series by hidden markov models with set-valued parameters Multivariate time series early classification using multi-domain deep neural network End-to-end learning for early classification of time series A deep reinforcement learning approach for early classification of time series Confidence-based early classification of multivariate time series with multiple interpretable rules Minimizing response time in time series classification Cost-aware early classification of time series Early classification of time series from a cost minimization point of view Teaser: Early and accurate time series classification Evaluation protocol of early classifiers over multiple data sets Statistical comparisons of classifiers over multiple data sets Pareto-based multiobjective machine 
learning: An overview and case studies Cluster aggregate inequality and multi-level hierarchical clustering Dimensionality reduction for fast similarity search in large time series databases Piecewise aggregate representations and lower-bound distance functions for multivariate time series Gaussian Processes for Machine Learning A tutorial on hidden markov models and selected applications in speech recognition Time series analysis: forecasting and control Kernel density classification and boosting: an l 2 analysis Probability, random variables, and stochastic processes. Tata McGraw-Hill Education Mining and ranking generators of sequential patterns Silhouettes: a graphical aid to the interpretation and validation of cluster analysis Query by committee On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence Estimating the support of a high-dimensional distribution Bayesian quadratic discriminant analysis Robust classification of multivariate time series by imprecise hidden markov models Human-level control through deep reinforcement learning River Dataset Ntu rgb+d: A large scale dataset for 3d human activity analysis Gene expression signatures diagnose influenza and other symptomatic respiratory viral infections in humans Transcription-based prediction of response to ifnβ using supervised computational methods Physiobank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals Plaid: a public dataset of high-resoultion electrical appliance measurements for load identification research: demo abstract Appliance consumption signature database and recognition test protocols Motion Capture Dataset View invariant human action recognition using histograms of 3d joints Imagenet classification with deep convolutional neural networks Long short-term memory