key: cord-0043237-muivhebw authors: Almutairi, Faisal M.; Kanatsoulis, Charilaos I.; Sidiropoulos, Nicholas D. title: Tendi: Tensor Disaggregation from Multiple Coarse Views date: 2020-04-17 journal: Advances in Knowledge Discovery and Data Mining DOI: 10.1007/978-3-030-47436-2_65 sha: a8b3d74cc9815d172cd5cf549aa3098a75a2ab4d doc_id: 43237 cord_uid: muivhebw
Multidimensional data appear in various interesting applications, e.g., sales data indexed by stores, items, and time. Oftentimes, data are observed aggregated over multiple data atoms, and thus exhibit low resolution. Temporal aggregation is most common, but many datasets are also aggregated over other attributes. Multidimensional data, in particular, are sometimes available in multiple coarse views, aggregated across different dimensions, especially when sourced by different agencies. For instance, item sales can be aggregated temporally, and over groups of stores based on their location or affiliation. However, data in finer granularity significantly benefit forecasting and data analytics, prompting increasing interest in data disaggregation methods. In this paper, we propose Tendi, a principled model that efficiently disaggregates multidimensional (tensor) data from multiple views, aggregated over different dimensions. Tendi employs coupled tensor factorization to fuse the multiple views and provide recovery guarantees under realistic conditions. We also propose a variant of Tendi, called TendiB, which performs the disaggregation task without any knowledge of the aggregation mechanism. Experiments on real data from different domains demonstrate the high effectiveness of the proposed methods.
Low-resolution data, aggregated over multiple data indices, are found in the databases of diverse applications, e.g., economics [8], health care [15], education [5], and smart grid systems [6], to name a few.
The most common type of aggregation is temporal aggregation; for example, the GDP quarterly national accounts are aggregated over months. Aggregation over other dimensions is also common, such as geographically (e.g., population of New York by county) or according to a defined affiliation (e.g., number of students by major). The latter is known in the economics literature as contemporaneous aggregation. The different types of aggregation are often combined. For instance, the number of foreigners who visited different US states in 2019 can be aggregated in time, location (states), and affiliation (nationality). Aggregated data offer data summarization, which serves multiple purposes, including scalability, reduced communication cost, and privacy. On the other hand, a plethora of data mining and machine learning tasks require high-resolution (disaggregated) data. Analysis results can differ substantially when using aggregated versus disaggregated data in many application domains, such as economics [8], education [5], and supply chains [20]. This has motivated numerous works on developing algorithms for data disaggregation. The task of data disaggregation, in general, boils down to finding a solution to a system of linear equations $\mathbf{U}\mathbf{x} = \mathbf{y}$, where $\mathbf{y}$ is the vector of aggregated observations, $\mathbf{x}$ is the target disaggregated series, and $\mathbf{U}$ is the aggregation matrix that maps the target series to the aggregated measurements. In practical settings, the linear system is under-determined, as the number of observations is often significantly smaller than the length of the target series, resulting in an ill-posed problem. To tackle the problem, disaggregation techniques exploit side information or domain knowledge [2, 14] in an attempt to overdetermine the problem and enhance the disaggregation accuracy. Common prior models imposed on the target high-resolution data include smoothness, periodicity [14], non-negativity, and sparsity over a given dictionary [2].
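To make the under-determination concrete, here is a small NumPy sketch with toy, hypothetical numbers (not from the paper): eight weekly values are summed into two four-week totals, and the minimum-norm least-squares solution (the idea behind the LS baseline used later) reproduces the aggregates exactly while missing the true series.

```python
import numpy as np

# Toy, hypothetical data: 8 weekly values aggregated into 2 four-week totals.
x_true = np.array([3.0, 5.0, 4.0, 6.0, 10.0, 8.0, 9.0, 11.0])
w = 4                        # number of weeks per aggregate
I = x_true.size              # length of the target series
I_u = I // w                 # number of aggregated observations

# 'Fat' aggregation matrix U (I_u x I): row m sums weeks m*w, ..., m*w + w - 1.
U = np.kron(np.eye(I_u), np.ones((1, w)))
y = U @ x_true               # aggregated observations

# U has rank I_u < I, so U x = y has infinitely many solutions.
# The minimum-norm least-squares solution spreads each total evenly.
x_ls = np.linalg.pinv(U) @ y

print(y)                     # [18. 38.]
print(x_ls)                  # [4.5 4.5 4.5 4.5 9.5 9.5 9.5 9.5]
print(np.allclose(U @ x_ls, y), np.allclose(x_ls, x_true))  # True False
```

Any vector in the null space of $\mathbf{U}$ can be added to `x_ls` without changing the aggregates, which is exactly why priors or additional views are needed.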
The main issue with these approaches is that they impose application-specific constraints and therefore cannot be generalized to different disaggregation tasks in a straightforward manner. Moreover, it is unclear whether the assumed models are identifiable (i.e., whether an optimal solution of the model is guaranteed to be the true disaggregated data), especially when the true data do not exactly follow the imposed constraints. Identifiability is important because it assures correct recovery under certain reasonable conditions. In the present context, identifiability has not received the attention it deserves, likely because guaranteed recovery is considered mission impossible under realistic conditions. An interesting special case of disaggregation arises when data are aggregated over more than one dimension. This is a popular research problem in business and economics, going back to the 1970s [4]. In this case, temporally and contemporaneously aggregated views of the data are available. For instance, we may be interested in estimating the quarterly Gross Regional Product (GRP) values for the regions of a country, given: 1) the annual GRP per region (temporal aggregates), and 2) the GDP quarterly national accounts (contemporaneous aggregates) [16]. Another notable example appears in healthcare, where data are collected by national, regional, and local government agencies, health and scientific organizations, insurance companies, and other entities, and are often aggregated in many dimensions (e.g., temporally, geographically, or by group of hospitals), often to preserve privacy [15]; see Sect. 2.2 for another example. Algorithms have been developed to integrate the multiple aggregates in the disaggregation process [4, 16]. The majority of them leverage linear regression models with priors and require additional information to perform the disaggregation task.
In this paper, we study the multiview disaggregation task using a tensor decomposition approach, which provably converts the ill-posed problem to an identifiable one. Our work is inspired by the following question: Is disaggregation possible when the data are: 1) multidimensional, and 2) observed by different agencies via diverse aggregation mechanisms? This problem is well motivated by the ubiquitous presence of data with three or more dimensions, also known as tensors, in a large number of applications. It is also very common for aggregation to happen in more than one dimension, as in the examples above. Informally, the problem is defined as follows:
- Given: two (or more) observations of a multidimensional dataset, each representing a view of the data aggregated in one (or more) dimensions (e.g., temporal and contemporaneous aggregates).
- Recover: the data in high resolution (disaggregated) in all dimensions.
We propose Tendi: a principled model for fusing the multiple aggregates of multidimensional data. The proposed approach represents the target high-resolution data as a tensor and models it using the canonical polyadic decomposition (CPD), which reduces the number of unknowns while capturing correlations and higher-order statistical dependencies across dimensions. Tendi employs a coupled CPD approach and estimates the low-rank factors of the target data to perform the disaggregation task. In this way, the originally ill-posed disaggregation problem is transformed into an over-determined one by leveraging the uniqueness properties of the CPD. Tendi can disaggregate even in the challenging scenario where the views are doubly aggregated, i.e., a view is aggregated in two dimensions. We also propose an algorithm (called TendiB) that handles the disaggregation task in cases where the aggregation pattern is unknown.
As a result, the proposed framework not only provides a disaggregation algorithm but also gives insights that can potentially be exploited in creating accurately retrievable data summaries for database applications. Along the same lines, our work provides insights into when aggregation does not preserve anonymity: with the aid of another view of aggregated data, estimating individual-level data accurately is possible, as we show in this work, even without knowing the aggregation pattern. This leads to privacy violations if data are aggregated to preserve anonymity. Experiments on real data from different applications show that Tendi is very effective and significantly improves on the accuracy of the baselines. In summary, the contributions of our work are as follows:
- Formulation: we formally define the multidimensional data disaggregation task from multiple views, aggregated across different dimensions, and provide an efficient algorithm.
- Identifiability: the considered model can provably transform the original ill-posed disaggregation problem to an identifiable one.
- Effectiveness: Tendi recovers real data accurately and reduces the disaggregation error of the best baseline by up to 48%.
- Blind disaggregation: the proposed model works very effectively, even when the aggregation mechanism is unknown (TendiB).
Notation: $\mathbf{x}$, $\mathbf{X}$, and $\mathcal{X}$ denote a vector, a matrix, and a tensor, respectively; $\mathbf{X}_{(n)}$ is the mode-$n$ matricization of $\mathcal{X}$; $\|\cdot\|_F$ is the Frobenius norm; and $[\![\cdot]\!]$ denotes the Kruskal operator, e.g., $\mathcal{X} \approx [\![\mathbf{A}, \mathbf{B}, \mathbf{C}]\!]$. $\mathbf{X}^T$ is the transpose of $\mathbf{X}$, and $\mathrm{vec}(\cdot)$ is the vectorization operator for a matrix $\mathbf{X}$ or a tensor $\mathcal{X}$. Finally, $\circ$, $\odot$, and $\circledast$ denote the outer, Khatri-Rao, and Hadamard (element-wise) products, respectively. Tensors are multidimensional arrays indexed by three or more indices, $(i, j, k, \ldots)$. A third-order tensor $\mathcal{X} \in \mathbb{R}^{I\times J\times K}$ has three modes, with columns $\mathcal{X}(:, j, k)$, rows $\mathcal{X}(i, :, k)$, and fibers $\mathcal{X}(i, j, :)$.
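Moreover, $\mathcal{X}(i, :, :)$, $\mathcal{X}(:, j, :)$, and $\mathcal{X}(:, :, k)$ denote the $i$th horizontal, $j$th lateral, and $k$th frontal slabs (slices) of $\mathcal{X}$, respectively; refer to [13, 17] for more background on tensors. A rank-one third-order tensor $\mathcal{X} \in \mathbb{R}^{I\times J\times K}$ results from the outer product of three vectors, i.e., $\mathcal{X} = \mathbf{a} \circ \mathbf{b} \circ \mathbf{c}$, with $\mathcal{X}(i, j, k) = \mathbf{a}(i)\mathbf{b}(j)\mathbf{c}(k)$. The CPD expresses a tensor as a sum of $R$ rank-one tensors, $\mathcal{X} = \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r = [\![\mathbf{A}, \mathbf{B}, \mathbf{C}]\!]$, where $\mathbf{A} = [\mathbf{a}_1, \ldots, \mathbf{a}_R] \in \mathbb{R}^{I\times R}$, $\mathbf{B} \in \mathbb{R}^{J\times R}$, and $\mathbf{C} \in \mathbb{R}^{K\times R}$. A striking property of the CPD is that it is essentially unique (the rank-one components $\mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r$ are unique or, equivalently, $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$ can be identified up to a common column permutation and scaling) under mild conditions [3]. The CPD can also be expressed using the matricized (unfolded) tensors as $\mathbf{X}_{(1)} = \mathbf{A}(\mathbf{C} \odot \mathbf{B})^T$, $\mathbf{X}_{(2)} = \mathbf{B}(\mathbf{C} \odot \mathbf{A})^T$, and $\mathbf{X}_{(3)} = \mathbf{C}(\mathbf{B} \odot \mathbf{A})^T$, where $\mathbf{X}_{(1)} \in \mathbb{R}^{I\times JK}$, $\mathbf{X}_{(2)} \in \mathbb{R}^{J\times IK}$, and $\mathbf{X}_{(3)} \in \mathbb{R}^{K\times IJ}$ are the mode-1, mode-2, and mode-3 unfoldings of $\mathcal{X}$, respectively. Mode product: the mode-$n$ product multiplies a tensor by a matrix along one particular mode; e.g., the mode-1 product of a matrix $\mathbf{U} \in \mathbb{R}^{I_u\times I}$ and a tensor $\mathcal{X} \in \mathbb{R}^{I\times J\times K}$ corresponds to multiplying every column $\mathcal{X}(:, j, k)$ of the tensor by $\mathbf{U}$. Similarly, the mode-2 (mode-3) product corresponds to multiplying every row (fiber) of $\mathcal{X}$ by a matrix. Mode products can also be expressed in terms of unfolded tensors: multiplying by a matrix $\mathbf{U}$ in the $n$th mode is denoted $\mathcal{Y} = \mathcal{X} \times_n \mathbf{U} \iff \mathbf{Y}_{(n)} = \mathbf{U}\mathbf{X}_{(n)}$, where "$\times_n$" is the product over the $n$th mode; see Fig. 1 for an illustration. An important observation is that mode products can be absorbed in the CPD of the tensor, i.e., $\mathcal{X} \times_1 \mathbf{U} \times_2 \mathbf{V} \times_3 \mathbf{W} = [\![\mathbf{U}\mathbf{A}, \mathbf{V}\mathbf{B}, \mathbf{W}\mathbf{C}]\!]$ (see Fig. 1).
Given a set of low-resolution observations $\mathbf{y} \in \mathbb{R}^{I_u}$ (e.g., monthly) of a time series $\mathbf{x} \in \mathbb{R}^{I}$, the goal of the time series disaggregation problem is to estimate the series $\mathbf{x}$ at a higher resolution (e.g., weekly). This can be cast as a linear inverse problem $\mathbf{y} = \mathbf{U}\mathbf{x}$, where $\mathbf{U} \in \mathbb{R}^{I_u\times I}$ is a 'fat' aggregation matrix that maps the variables in $\mathbf{x}$ to the observations in $\mathbf{y}$. In this work, we consider the case where the target high-resolution data are multidimensional (a tensor). The different dimensions represent the physical dimensions of the data, e.g., time stamps, locations, etc.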
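The notation above maps directly onto NumPy. The sketch below is our own illustration (the helpers `khatri_rao`, `cpd_full`, `unfold`, and `mode_product` are hypothetical names, not functions from the paper or from Tensorlab): it builds a rank-$R$ tensor from its factors, checks the three unfolded CPD identities, and checks that a mode-1 product is absorbed into the first factor. The unfoldings use column-major (Fortran) index ordering, which is the ordering under which $\mathbf{X}_{(1)} = \mathbf{A}(\mathbf{C}\odot\mathbf{B})^T$ holds.

```python
import numpy as np

def khatri_rao(P, Q):
    """Khatri-Rao product P ⊙ Q: column r is kron(P[:, r], Q[:, r])."""
    R = P.shape[1]
    return np.einsum("mr,nr->mnr", P, Q).reshape(-1, R)

def cpd_full(A, B, C):
    """Kruskal operator [[A, B, C]] = sum_r a_r ∘ b_r ∘ c_r."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def unfold(X, n):
    """Mode-n unfolding X_(n); the remaining indices are ordered column-major."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1, order="F")

def mode_product(X, M, n):
    """X x_n M: multiply every mode-n fiber of X by the matrix M."""
    return np.moveaxis(np.tensordot(M, X, axes=(1, n)), 0, n)

rng = np.random.default_rng(0)
I, J, K, R = 6, 5, 4, 3
A, B, C = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))
X = cpd_full(A, B, C)

# Unfolded CPD: X_(1) = A (C ⊙ B)^T, X_(2) = B (C ⊙ A)^T, X_(3) = C (B ⊙ A)^T.
assert np.allclose(unfold(X, 0), A @ khatri_rao(C, B).T)
assert np.allclose(unfold(X, 1), B @ khatri_rao(C, A).T)
assert np.allclose(unfold(X, 2), C @ khatri_rao(B, A).T)

# Mode product in unfolded form, Y_(1) = U X_(1), and absorption X x_1 U = [[UA, B, C]].
U = rng.random((2, I))
Y = mode_product(X, U, 0)
assert np.allclose(unfold(Y, 0), U @ unfold(X, 0))
assert np.allclose(Y, cpd_full(U @ A, B, C))
```

The absorption identity is what lets an aggregated view be modeled with the same factors as the high-resolution tensor, only with one factor premultiplied by the aggregation matrix.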
For simplicity of exposition, we focus on three-dimensional data in our formulation and algorithm; however, the proposed method can handle higher-order data as well. Specifically, let $\mathcal{X} \in \mathbb{R}^{I\times J\times K}$ be the target high-resolution third-order tensor. In the considered problem, we are given two sets of observations, each aggregated over one or more different dimensions, which is common when data are reported by different agencies, resulting in multiple views of the same information. The key insight is that the given aggregates can be modeled as mode product(s) of $\mathcal{X}$ with an aggregation matrix in particular mode(s). To see this, consider a tensor $\mathcal{X} \in \mathbb{R}^{4\times 2\times 2}$: a set of observations aggregated over the first mode can be expressed as the mode-1 product $\mathcal{Y} = \mathcal{X} \times_1 \mathbf{U}$, where each row of the 0/1 aggregation matrix $\mathbf{U} \in \{0, 1\}^{I_u\times 4}$ marks the horizontal slices $\mathcal{X}(i, :, :)$ that are summed. The same idea applies when the aggregation is over the second (third) mode, using the mode-2 (mode-3) product. The major challenge in data disaggregation is that the number of available aggregated observations is much smaller than the number of variables, resulting in an under-determined, ill-posed problem. This is the case even when more than one set of aggregates is available. Before defining the problem formally, we explain the concept with a retail sales example. Two sources of data are used to forecast future demand in retail sales: 1) store-level data, commonly aggregated in time (temporal aggregate $\mathcal{Y}_t$); and 2) historical orders by the retailers' Distribution Centers (DC orders), aggregated over their multiple stores (contemporaneous aggregate $\mathcal{Y}_c$). Both store-level and DC orders data are used for demand forecasting, and store-level data in particular are vital for predicting future orders [20]. Hence, many retailers share data with their suppliers to assist in the forecasting task and to avoid shortage or excess inventory [9]. In a more restricted scenario, the second source collects sales of each category of items rather than of each item individually.
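As a concrete, hypothetical instance of the $4\times 2\times 2$ example, suppose the first mode indexes four stores and that stores 1-2 and 3-4 are reported as two groups. The sketch below assumes this particular grouping (it is our own choice, not taken from the paper), builds the 0/1 matrix $\mathbf{U}$, forms $\mathcal{Y} = \mathcal{X}\times_1\mathbf{U}$ with `np.einsum`, and confirms the unfolded form $\mathbf{Y}_{(1)} = \mathbf{U}\mathbf{X}_{(1)}$.

```python
import numpy as np

# Hypothetical 4 x 2 x 2 tensor (stores x items x weeks) with entries 0, ..., 15.
X = np.arange(16, dtype=float).reshape(4, 2, 2)

# Assumed grouping: stores {1, 2} -> group 1, stores {3, 4} -> group 2.
U = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])

# Mode-1 product: Y[u, j, k] = sum_i U[u, i] * X[i, j, k].
Y = np.einsum("ui,ijk->ujk", U, X)

# Same result through the mode-1 unfolding (column-major ordering): Y_(1) = U X_(1).
X1 = X.reshape(4, -1, order="F")
Y1 = Y.reshape(2, -1, order="F")
assert np.allclose(Y1, U @ X1)

print(Y.shape)              # (2, 2, 2)
print(Y[:, :, 0].tolist())  # [[4.0, 8.0], [20.0, 24.0]]
```

The aggregated view has half as many entries as $\mathcal{X}$, so recovering $\mathcal{X}$ from $\mathcal{Y}$ alone is under-determined.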
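The question that arises is whether we can fuse these sources to reconstruct high-resolution data in the store, item, and time dimensions. Formally, we are interested in the following problem. Given: the temporal aggregate $\mathcal{Y}_t = \mathcal{X} \times_3 \mathbf{W}$ and the contemporaneous aggregate $\mathcal{Y}_c = \mathcal{X} \times_1 \mathbf{U} \times_2 \mathbf{V}$; recover: the high-resolution tensor $\mathcal{X}$. We tackle this problem using a coupled low-rank factorization model, as we explain next. Coupled factorization techniques are commonly used to fuse information when data share common dimension(s), for different tasks, e.g., link prediction [7], demand forecasting [21], context-aware recommendation [1], medical imaging [10], and remote sensing [11]. Closest to our work is the approach in [11], which employs a coupled CPD to fuse a hyperspectral image with a multispectral image to produce an image with high spatial and spectral resolution. To our knowledge, this work is the first to propose coupled tensor factorization for data disaggregation applications. Tendi builds upon two basic principles. The first is that the target tensor $\mathcal{X} \in \mathbb{R}^{I\times J\times K}$ admits a CPD model ($\mathcal{X} \approx [\![\mathbf{A}, \mathbf{B}, \mathbf{C}]\!]$). The second is that the available aggregates, $\mathcal{Y}_t$ and $\mathcal{Y}_c$, result from mode product(s) of $\mathcal{X}$ with aggregation matrices in particular mode(s). In particular, $\mathcal{Y}_t = \mathcal{X} \times_3 \mathbf{W}$ and $\mathcal{Y}_c = \mathcal{X} \times_1 \mathbf{U} \times_2 \mathbf{V}$, so that $\mathcal{Y}_t = [\![\mathbf{A}, \mathbf{B}, \mathbf{W}\mathbf{C}]\!]$ and $\mathcal{Y}_c = [\![\mathbf{U}\mathbf{A}, \mathbf{V}\mathbf{B}, \mathbf{C}]\!]$. Tendi learns the factor matrices $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ by applying a coupled CPD model to the available aggregates; Fig. 2 illustrates the high-level picture of Tendi. Specifically, we propose the following formulation:
$$\min_{\mathbf{A}, \mathbf{B}, \mathbf{C}} \; \mathcal{L} = \big\|\mathcal{Y}_t - [\![\mathbf{A}, \mathbf{B}, \mathbf{W}\mathbf{C}]\!]\big\|_F^2 + \big\|\mathcal{Y}_c - [\![\mathbf{U}\mathbf{A}, \mathbf{V}\mathbf{B}, \mathbf{C}]\!]\big\|_F^2. \quad (2)$$
Additional aggregated views can be handled in a similar fashion. Problem (2) is non-convex and NP-hard in general. To tackle it, we employ a block coordinate descent (BCD) approach and update the three factors in an alternating fashion, as summarized in Algorithm 1. The gradient of the loss function $\mathcal{L}$ w.r.t. $\mathbf{A}$ is
$$\nabla_{\mathbf{A}}\mathcal{L} = -2\big(\mathbf{Y}_{t(1)} - \mathbf{A}((\mathbf{W}\mathbf{C}) \odot \mathbf{B})^T\big)((\mathbf{W}\mathbf{C}) \odot \mathbf{B}) - 2\mathbf{U}^T\big(\mathbf{Y}_{c(1)} - \mathbf{U}\mathbf{A}(\mathbf{C} \odot (\mathbf{V}\mathbf{B}))^T\big)(\mathbf{C} \odot (\mathbf{V}\mathbf{B})),$$
where $\mathbf{Y}_{t(1)}$ and $\mathbf{Y}_{c(1)}$ are the mode-1 unfoldings of $\mathcal{Y}_t$ and $\mathcal{Y}_c$. Using the properties of the Khatri-Rao product, the space and time complexity of computing $(\mathbf{C} \odot (\mathbf{V}\mathbf{B}))^T(\mathbf{C} \odot (\mathbf{V}\mathbf{B}))$ can be reduced via the element-wise (Hadamard) product $(\mathbf{C}^T\mathbf{C}) \circledast (\mathbf{B}^T\mathbf{V}^T\mathbf{V}\mathbf{B})$, and similarly $((\mathbf{W}\mathbf{C}) \odot \mathbf{B})^T((\mathbf{W}\mathbf{C}) \odot \mathbf{B}) = (\mathbf{C}^T\mathbf{W}^T\mathbf{W}\mathbf{C}) \circledast (\mathbf{B}^T\mathbf{B})$ [19].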
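A minimal NumPy sketch of the BCD idea behind Algorithm 1, assuming formulation (2) is the unweighted sum of the two coupled fitting terms $\|\mathcal{Y}_t - [\![\mathbf{A},\mathbf{B},\mathbf{W}\mathbf{C}]\!]\|_F^2 + \|\mathcal{Y}_c - [\![\mathbf{U}\mathbf{A},\mathbf{V}\mathbf{B},\mathbf{C}]\!]\|_F^2$. Each block update takes one gradient step with exact line search; because the loss is quadratic in each factor, the optimal step has the closed form $\alpha = \|\mathbf{G}\|_F^2 / (2\sum_t \|\mathbf{P}_t\mathbf{G}\mathbf{M}_t^T\|_F^2)$. This is our own illustration, not the authors' code: it uses random initialization instead of the CPD-based initialization with Tensorlab described next, and the function names (`ls_step`, `tendi`, `loss`) are hypothetical.

```python
import numpy as np

def khatri_rao(P, Q):
    return np.einsum("mr,nr->mnr", P, Q).reshape(-1, P.shape[1])

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1, order="F")

def cpd_full(A, B, C):
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def ls_step(F, terms):
    """One gradient step with exact line search on sum ||Y - P F M^T||_F^2.

    `terms` holds (Y_unfolded, P, M) triples: P is the aggregation matrix
    (or identity) acting on the factor F, M is the matching Khatri-Rao product.
    """
    G = sum(-2.0 * P.T @ (Y - P @ F @ M.T) @ M for Y, P, M in terms)
    denom = sum(np.linalg.norm(P @ G @ M.T) ** 2 for _, P, M in terms)
    if denom == 0.0:                        # zero gradient: block already optimal
        return F
    alpha = 0.5 * np.linalg.norm(G) ** 2 / denom
    return F - alpha * G

def loss(Yt, Yc, U, V, W, A, B, C):
    return (np.linalg.norm(Yt - cpd_full(A, B, W @ C)) ** 2
            + np.linalg.norm(Yc - cpd_full(U @ A, V @ B, C)) ** 2)

def tendi(Yt, Yc, U, V, W, R, iters=300, seed=0):
    rng = np.random.default_rng(seed)
    I, J, K = U.shape[1], V.shape[1], W.shape[1]
    A, B, C = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))
    Yt1, Yt2, Yt3 = (unfold(Yt, n) for n in range(3))
    Yc1, Yc2, Yc3 = (unfold(Yc, n) for n in range(3))
    history = [loss(Yt, Yc, U, V, W, A, B, C)]
    for _ in range(iters):
        # Y_t = [[A, B, WC]] and Y_c = [[UA, VB, C]], unfolded in each mode.
        A = ls_step(A, [(Yt1, np.eye(I), khatri_rao(W @ C, B)),
                        (Yc1, U, khatri_rao(C, V @ B))])
        B = ls_step(B, [(Yt2, np.eye(J), khatri_rao(W @ C, A)),
                        (Yc2, V, khatri_rao(C, U @ A))])
        C = ls_step(C, [(Yt3, W, khatri_rao(B, A)),
                        (Yc3, np.eye(K), khatri_rao(V @ B, U @ A))])
        history.append(loss(Yt, Yc, U, V, W, A, B, C))
    return A, B, C, history

# Hypothetical synthetic test: 8 stores -> 4 groups, 12 weeks -> 4 periods, V = I.
rng = np.random.default_rng(1)
I, J, K, R = 8, 6, 12, 2
A0, B0, C0 = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))
U = np.kron(np.eye(4), np.ones((1, 2)))
V = np.eye(J)
W = np.kron(np.eye(4), np.ones((1, 3)))
Yt = cpd_full(A0, B0, W @ C0)               # = X x_3 W
Yc = cpd_full(U @ A0, V @ B0, C0)           # = X x_1 U x_2 V
A, B, C, history = tendi(Yt, Yc, U, V, W, R)

# Gram matrix of a Khatri-Rao product via the cheaper Hadamard form.
M = khatri_rao(C, V @ B)
assert np.allclose(M.T @ M, (C.T @ C) * ((V @ B).T @ (V @ B)))

X0 = cpd_full(A0, B0, C0)
print(np.linalg.norm(X0 - cpd_full(A, B, C)) ** 2 / np.linalg.norm(X0) ** 2)
```

Exact line search makes every block update non-increasing in the loss; how close the final estimate gets to $\mathcal{X}$ in a given run depends on the initialization and the number of iterations.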
The updates of the factors $\mathbf{B}$ and $\mathbf{C}$ can be derived similarly using the mode-2 and mode-3 unfoldings of the tensors, respectively. The step-size parameters $\alpha$, $\beta$, and $\gamma$ in Algorithm 1 are chosen by exact line search (steps 1, 3, and 5 in Algorithm 1). The initialization step in Algorithm 1 is crucial to the disaggregation accuracy. Thus, we propose the following initialization. If $\mathcal{Y}_c$ is aggregated in two modes, we initialize with (4). If $\mathcal{Y}_c$ is aggregated in only one mode, i.e., $\mathbf{V} = \mathbf{I}$, then $\mathbf{B}$ is common to the two aggregated tensors, and we can use the CPD of either one to get two "disaggregated" factors. In this case, we initialize with (4) if $I > K$, and with (5) otherwise. This way, we obtain an initial guess for all the factors. We use the MATLAB-based package Tensorlab to compute the CPD in the initialization step. The computational complexity of each step in Algorithm 1 boils down to matrix multiplications that are dominated by $\mathcal{O}\big((I_uJ_vK + IJK_w)R\big)$. Since $R$ is very small relative to the size of the tensors in many real datasets, the complexity is linear in the number of observations in $\mathcal{Y}_t$ and $\mathcal{Y}_c$. In most practical applications, the aggregation details are known. However, there exist cases with limited or no information on how the data are aggregated (i.e., $\mathbf{U}$, $\mathbf{V}$, and $\mathbf{W}$ are unknown). This happens in privacy-sensitive domains such as healthcare [15], where hospital records are aggregated to protect the privacy of patients. For such cases, we propose TendiB (Tendi with Blind disaggregation) to obtain the factors of the disaggregated tensor ($\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$):
$$\min_{\mathbf{A}, \tilde{\mathbf{A}}, \mathbf{B}, \mathbf{C}, \tilde{\mathbf{C}}} \; \big\|\mathcal{Y}_t - [\![\mathbf{A}, \mathbf{B}, \tilde{\mathbf{C}}]\!]\big\|_F^2 + \big\|\mathcal{Y}_c - [\![\tilde{\mathbf{A}}, \mathbf{B}, \mathbf{C}]\!]\big\|_F^2 + \mu\big\|\mathbf{1}^T\tilde{\mathbf{C}} - \mathbf{1}^T\mathbf{C}\big\|_2^2, \quad (6)$$
where $\tilde{\mathbf{A}} = \mathbf{U}\mathbf{A}$ and $\tilde{\mathbf{C}} = \mathbf{W}\mathbf{C}$ are treated as separate variables since we do not know $\mathbf{U}$ and $\mathbf{W}$. This results in a more challenging problem than (2), as the number of variables increases while the number of equations stays the same. Another challenge is that, if we omit the third term in (6), there is a scaling ambiguity between the factors of the two tensors.
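To see why the third term in (6) matters, the sketch below evaluates the three terms at the true factors and at a rescaled solution. It is our own illustration and assumes (6) consists of the two coupled fitting terms plus $\mu$ times the squared difference between the column sums of $\tilde{\mathbf{C}}$ and $\mathbf{C}$, with $\mathbf{V} = \mathbf{I}$; the function name `tendib_terms` and all sizes are hypothetical. Rescaling $\mathbf{A}$ by 2 and $\tilde{\mathbf{C}}$ by 1/2 keeps both fitting terms at zero while doubling the reconstructed $[\![\mathbf{A},\mathbf{B},\mathbf{C}]\!]$; only the column-sum term detects the error.

```python
import numpy as np

def cpd_full(A, B, C):
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def tendib_terms(Yt, Yc, A, At, B, C, Ct, mu):
    """The three terms of the assumed TendiB objective; At ~ UA and Ct ~ WC."""
    fit_t = np.linalg.norm(Yt - cpd_full(A, B, Ct)) ** 2
    fit_c = np.linalg.norm(Yc - cpd_full(At, B, C)) ** 2
    sums = mu * np.sum((Ct.sum(axis=0) - C.sum(axis=0)) ** 2)  # column sums
    return fit_t, fit_c, sums

rng = np.random.default_rng(0)
I, J, K, R = 8, 5, 12, 2
A, B, C = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))
U = np.kron(np.eye(4), np.ones((1, 2)))   # hypothetical: 8 stores -> 4 groups
W = np.kron(np.eye(4), np.ones((1, 3)))   # non-overlapping: 12 weeks -> 4 periods
Yt, Yc = cpd_full(A, B, W @ C), cpd_full(U @ A, B, C)

# True factors: all three terms vanish (up to rounding), since 1^T W C = 1^T C.
print(tendib_terms(Yt, Yc, A, U @ A, B, C, W @ C, mu=100.0))

# Rescaled factors: the data fit is still perfect, but [[2A, B, C]] = 2X is wrong.
terms = tendib_terms(Yt, Yc, 2 * A, U @ A, B, C, (W @ C) / 2, mu=100.0)
print(terms)
```

The column-sum term relies on each column of $\mathbf{W}$ containing exactly one 1 (non-overlapping periods that cover every time tick), so that $\mathbf{1}^T\mathbf{W} = \mathbf{1}^T$.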
To overcome this, we observe that the temporal aggregation $\mathbf{W}$ in most aggregated data is non-overlapping and covers all the time ticks. This means that the respective column sums of $\mathbf{C}$ and $\tilde{\mathbf{C}}$ should be equal (since $\mathbf{1}^T\mathbf{W} = \mathbf{1}^T$). We exploit this observation by adding the last term in (6), thereby resolving the scaling ambiguity. To solve (6), we adopt an Alternating Optimization (AO) procedure, described in Algorithm 2. The updates of $\mathbf{A}$, $\tilde{\mathbf{A}}$, and $\mathbf{B}$ amount to solving overdetermined linear systems, and those of $\mathbf{C}$ and $\tilde{\mathbf{C}}$ boil down to solving a Sylvester equation. The Sylvester equation is a special form of linear system of equations that can be handled efficiently [12]. To initialize the variables in Algorithm 2, we compute CPD($\mathcal{Y}_c$) to get $\tilde{\mathbf{A}}$, $\mathbf{B}$, and $\mathbf{C}$. To get an initial estimate of $\tilde{\mathbf{C}}$, we exploit the fact that the temporal aggregates are sums over consecutive time stamps in most real data. As such, we sum every $w = \mathrm{round}(K/K_w)$ consecutive rows of $\mathbf{C}$ (in the experiments, we make sure that the true and estimated temporal aggregations do not align). As mentioned earlier, the disaggregation task is an ill-posed inverse problem. Modeling the data with the CPD allows us to provably transform the ill-posed disaggregation problem into an identifiable one. In other words, the optimal solutions of (2) and (6) are guaranteed to be unique under mild conditions and identify the original high-resolution tensor almost surely. Formally, identifiability is established in Proposition 1.
Proposition 1. Let $\mathcal{X} = [\![\mathbf{A}, \mathbf{B}, \mathbf{C}]\!] \in \mathbb{R}^{I\times J\times K}$ with rank $R$. Also, let $\mathcal{Y}_t = \mathcal{X} \times_3 \mathbf{W} \in \mathbb{R}^{I\times J\times K_w}$ and $\mathcal{Y}_c = \mathcal{X} \times_1 \mathbf{U} \times_2 \mathbf{V} \in \mathbb{R}^{I_u\times J_v\times K}$ be the two aggregated observations. Assume that $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ are drawn from some absolutely continuous distribution, and that $(\mathbf{A}^*, \mathbf{B}^*, \mathbf{C}^*)$ is an optimal solution to problem (2) or (6). Then, under mild conditions on $R$, $[\![\mathbf{A}^*, \mathbf{B}^*, \mathbf{C}^*]\!] = \mathcal{X}$ almost surely.
The proof is relegated to a journal version of this work due to space limitations; it leverages the uniqueness properties of the CPD.
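The Sylvester structure of the $\mathbf{C}$-update can be made explicit. Assuming (6) has the three-term form $\|\mathcal{Y}_t - [\![\mathbf{A},\mathbf{B},\tilde{\mathbf{C}}]\!]\|_F^2 + \|\mathcal{Y}_c - [\![\tilde{\mathbf{A}},\mathbf{B},\mathbf{C}]\!]\|_F^2 + \mu\|\mathbf{1}^T\tilde{\mathbf{C}} - \mathbf{1}^T\mathbf{C}\|_2^2$, fixing $\tilde{\mathbf{A}}$, $\mathbf{B}$, $\tilde{\mathbf{C}}$ and setting the gradient with respect to $\mathbf{C}$ to zero gives $(\mu\mathbf{1}\mathbf{1}^T)\mathbf{C} + \mathbf{C}(\mathbf{M}^T\mathbf{M}) = \mathbf{Y}_{c(3)}\mathbf{M} + \mu\mathbf{1}(\mathbf{1}^T\tilde{\mathbf{C}})$ with $\mathbf{M} = \mathbf{B}\odot\tilde{\mathbf{A}}$, which `scipy.linalg.solve_sylvester` (Bartels-Stewart) solves directly; the $\tilde{\mathbf{C}}$-update has the same form. The sketch also includes the row-summing initialization of $\tilde{\mathbf{C}}$. Both helpers (`update_C`, `init_Ct`) are hypothetical names, and this is not the authors' implementation.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def khatri_rao(P, Q):
    return np.einsum("mr,nr->mnr", P, Q).reshape(-1, P.shape[1])

def update_C(Yc3, At, B, Ct, mu):
    """argmin_C ||Yc3 - C (B ⊙ At)^T||_F^2 + mu ||1^T Ct - 1^T C||_2^2.

    Zero gradient <=> (mu 1 1^T) C + C (M^T M) = Yc3 M + mu 1 (1^T Ct).
    """
    K = Yc3.shape[0]
    M = khatri_rao(B, At)
    lhs = mu * np.ones((K, K))
    rhs = Yc3 @ M + mu * np.outer(np.ones(K), Ct.sum(axis=0))
    return solve_sylvester(lhs, M.T @ M, rhs)   # solves lhs X + X (M^T M) = rhs

def init_Ct(C, Kw):
    """Sum every w = round(K / Kw) consecutive rows of C into Kw rows.

    Assumes contiguous, non-overlapping periods; the last period absorbs
    any leftover rows.
    """
    K = C.shape[0]
    w = max(1, round(K / Kw))
    groups = np.minimum(np.arange(K) // w, Kw - 1)
    W_hat = np.zeros((Kw, K))
    W_hat[groups, np.arange(K)] = 1.0
    return W_hat @ C

rng = np.random.default_rng(0)
Iu, J, K, Kw, R = 4, 5, 12, 4, 2
At, B, C_true = rng.random((Iu, R)), rng.random((J, R)), rng.random((K, R))
Yc3 = C_true @ khatri_rao(B, At).T      # mode-3 unfolding of [[At, B, C_true]]
Ct = init_Ct(C_true, Kw)                # column sums match those of C_true
C_new = update_C(Yc3, At, B, Ct, mu=100.0)
print(np.allclose(C_new, C_true))       # True: both terms vanish at C_true
```

With $\mathbf{M}$ of full column rank, $\mathbf{M}^T\mathbf{M}$ has positive eigenvalues while $\mu\mathbf{1}\mathbf{1}^T$ has eigenvalues $0$ and $\mu K$, so no pair of eigenvalues sums to zero and the Sylvester equation has a unique solution.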
In our experiments, we observed that the tested data approximately exhibit a low-rank structure, which supports the applicability of our identifiability conditions. We evaluate Tendi using the following publicly available online datasets:
- DFF: retail sales data from Dominick's Finer Foods (DFF), a former grocery store chain in Chicago. The DFF data were collected by the James M. Kilts Center, University of Chicago Booth School of Business. We create two ground-truth category-specific (stores × items × weeks) tensors $\mathcal{X} \in \mathbb{R}^{I\times J\times K}$ containing the number of sold items of 50 different types of cheese (CHE) and fabric softeners (FSF). We choose these two categories because they have different statistics, i.e., different sparsity and standard deviation (SD), to thoroughly examine the disaggregation performance. In addition, we form a (stores × items × weeks) tensor containing items from 10 different categories combined, 50 items from each (denoted DFF in Table 1). The DFF data contain the geographical locations of the stores, which we use to aggregate stores into groups.
- Crime: the number of crime incidents in the City of Chicago from 2001 to present, marked with beats (police geographical areas) and codes indicating the crime types. We form a (locations (by beat) × crime types × months) tensor.
- Walmart: weekly sales for a number of departments in 45 Walmart stores. A (stores × departments × weeks) tensor is created from these data. The square-footage of the stores is available, and we use it to aggregate the stores.
- Weather: daily weather observations from 49 stations in Australia. These data include 17 different variables, e.g., min/max temperature, humidity, etc. We form a (stations × variables × days) tensor of daily observations for one year.
The data we aim to disaggregate are created from the datasets summarized, with their statistics, in Table 1, and are represented by $\mathcal{X} \in \mathbb{R}^{I\times J\times K}$.
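The aggregated views are obtained by grouping indices, e.g., stores into areas by location or weeks into longer periods. A minimal sketch, with hypothetical random labels and sizes rather than the actual DFF grouping, builds a 0/1 aggregation matrix from a label vector and applies it as a mode product; the helper `grouping_matrix` is our own name. Because every index belongs to exactly one group, the aggregation preserves totals.

```python
import numpy as np

def grouping_matrix(labels, n_groups):
    """0/1 matrix G (n_groups x len(labels)) with G[g, i] = 1 iff labels[i] == g."""
    labels = np.asarray(labels)
    G = np.zeros((n_groups, labels.size))
    G[labels, np.arange(labels.size)] = 1.0
    return G

rng = np.random.default_rng(0)
I, J, K = 93, 50, 48                    # hypothetical stores x items x weeks
X = rng.poisson(2.0, size=(I, J, K)).astype(float)

# Contemporaneous view: hypothetical store-to-area labels (16 areas).
area = rng.permutation(np.arange(I) % 16)
U = grouping_matrix(area, 16)
Yc = np.einsum("ui,ijk->ujk", U, X)     # Y_c = X x_1 U

# Temporal view: consecutive 4-week periods (a simplification of calendar months).
W = grouping_matrix(np.arange(K) // 4, K // 4)
Yt = np.einsum("wk,ijk->ijw", W, X)     # Y_t = X x_3 W

print(Yc.shape, Yt.shape)               # (16, 50, 48) (93, 50, 12)
print(Yc.size / X.size, Yt.size / X.size)
```

The printed ratios give the fraction of the original number of entries that each view retains, which is how the aggregation level of the experiments can be quantified.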
We examine the performance in two different scenarios:
1) Scenario A: we are given the temporally aggregated tensor $\mathcal{Y}_t = \mathcal{X} \times_3 \mathbf{W}$ (i.e., aggregated in the third dimension) and the contemporaneously aggregated tensor $\mathcal{Y}_c = \mathcal{X} \times_1 \mathbf{U}$, aggregated in the first mode (stores/locations dimension); and
2) Scenario B: we observe $\mathcal{Y}_t$ as in Scenario A; however, the contemporaneous aggregate is aggregated in the first and second dimensions (double aggregation), i.e., $\mathcal{Y}_c = \mathcal{X} \times_1 \mathbf{U} \times_2 \mathbf{V}$.
The difficulty of the problem also depends on the aggregation level, i.e., the number of data points (e.g., weeks) in one sum. Fewer aggregated samples result in more challenging problems, and we test the performance using different aggregation levels. We evaluate the performance of Tendi using the Normalized Disaggregation Error, $\mathrm{NDE} = \|\mathcal{X} - \hat{\mathcal{X}}\|_F^2 / \|\mathcal{X}\|_F^2$, where $\hat{\mathcal{X}}$ is the estimated data. We compare the performance to state-of-the-art approaches in the time series disaggregation literature, as well as to methods developed to fuse multiple views of multidimensional data for different tasks (the CMTF baseline). To the best of our knowledge, our work is the first to perform disaggregation of multidimensional data from multiple views. Mean: assumes that the constituent data atoms (entries of $\mathcal{X}$) contribute equally to their aggregated samples; the final estimate of Mean is the average of the estimates from the temporal and contemporaneous aggregates. LS: this baseline is inspired by [16]; however, [16] uses additional information that is not available in our context. Therefore, we find the minimum-norm solution to the least-squares criterion relating $\mathrm{vec}(\mathcal{X})$ to $\mathrm{vec}(\mathcal{Y}_t)$ and $\mathrm{vec}(\mathcal{Y}_c)$. H-Fuse: [14] constrains the solution of the LS baseline above to be smooth, i.e., it penalizes large differences between adjacent time ticks. HomeRun: [2] solves for $\mathrm{vec}(\mathcal{X})$ in the frequency domain.
Specifically, it searches for the vector $\mathbf{s}$ such that $\mathbf{s} = \mathbf{D}\,\mathrm{vec}(\mathcal{X})$, where $\mathbf{D}$ is a matrix containing the Discrete Cosine Transform basis. CMTF: [18] is a coupled low-rank matrix factorization of the matricized tensors. CP: fits a CPD model to the ground-truth tensor $\mathcal{X}$ using Tensorlab; then, $\hat{\mathcal{X}}$ is reconstructed from the learned factors (this gives a lower bound on the achievable NDE). In the experiments, we set $\mu = 100$ and choose $R$ for Tendi (and the CP baseline) based on Proposition 1. For CMTF, we perform a grid search and show results with the best $R$. We run 10 iterations of the CPD in the initialization step of Algorithms 1 and 2 using Tensorlab, followed by 10 iterations of Tendi (or TendiB). We test Scenario A with two aggregation levels on four datasets (CHE, FSF, Walmart, and Weather), as shown in Fig. 3. The aggregation levels with the CHE and FSF data are: 1) moderate aggregation ("mod agg"): weeks are aggregated into months in $\mathcal{Y}_t$, and 93 stores are divided into 16 areas in $\mathcal{Y}_c$; and 2) high aggregation ("high agg"): quarterly samples (every 12 weeks) in $\mathcal{Y}_t$, and stores divided into only 9 areas in $\mathcal{Y}_c$. We conclude from the CHE and FSF results that Tendi is more robust than all baselines when aggregation is aggressive (only a few samples are available). For instance, with "high agg", the numbers of available samples in $\mathcal{Y}_t$ and $\mathcal{Y}_c$ are only 8.56% and 9.68% of the original size, respectively. In this case, the NDE of the second-best baseline is 1.89x (1.81x) the error of Tendi on the CHE (FSF) data; the best baseline is CP, which is a lower bound on the achievable NDE. Moreover, TendiB, which does not have access to the aggregation matrices, works remarkably well: it reduces the NDE of the second-best baseline, which uses the aggregation information, by 37.77% (30.98%) with the "high agg" level on the CHE (FSF) data.
With the Walmart data in Fig. 3(c), "mod agg" means that weeks are aggregated into months in $\mathcal{Y}_t$ and the 45 stores are clustered into 15 groups in $\mathcal{Y}_c$, whereas "high agg" means weeks → quarterly samples in $\mathcal{Y}_t$ and 45 stores → 9 groups in $\mathcal{Y}_c$. CMTF works slightly better when the aggregation is moderate, which can be explained by the fact that departments (the second mode of the Walmart data) do not exhibit high correlation levels, so the advantage of a tensor model over a matricized-tensor model is not obvious. However, Tendi works markedly better with aggressive aggregation, even without using the aggregation information (TendiB). With the Weather data (which have 93.30% zeros) in Fig. 3(d), "mod agg" corresponds to daily weather observations averaged into weekly samples in $\mathcal{Y}_t$ and the 49 stations clustered into 13 groups in $\mathcal{Y}_c$; in the "high agg" level, days → months in $\mathcal{Y}_t$ and 49 stations → 7 groups in $\mathcal{Y}_c$. Although CMTF and H-Fuse work better with this dataset than with the other data, Tendi still improves on their error, especially with "high agg". HomeRun is excluded from Fig. 3(d) because it imposes non-negativity. CMTF works better with this dataset because the second mode is small ($J = 17$), so the advantage of a tensor model over a matricized-tensor model is less clear. H-Fuse works well because it imposes smoothness, and weather data are well suited to such a constraint. Although TendiB does not work as well as with the other data, it still has a smaller error than the simple baselines (Mean and LS), especially with aggressive aggregation. The CP error is invisible in Fig. 3(d) because it is close to zero.
In Scenario B, $\mathcal{Y}_c$ is doubly aggregated in two dimensions: stores and items, or crime locations and types. We test the performance on the DFF and Crime data and compare with the Mean, CMTF, and CP baselines; we omit the other baselines as they run out of memory. The difficulty (i.e., the level of aggregation) increases as we move from case (a) to case (c) in Fig. 4.
With the DFF data, these levels are: a) weeks → months in $\mathcal{Y}_t$, and 93 stores → 16 areas in $\mathcal{Y}_c$, with no aggregation over the items; b) weeks → months in $\mathcal{Y}_t$, and 93 stores → 16 areas and 500 items → 50 categories in $\mathcal{Y}_c$; and c) weeks → quarters (12 weeks) in $\mathcal{Y}_t$, and 93 stores → 16 areas and 500 items → 20 categories in $\mathcal{Y}_c$. Tendi significantly improves on the disaggregation accuracy of the baselines with the DFF data in Fig. 4(a), even with double aggregation and few available samples. With the Crime data in Fig. 4(b), the aggregation levels are: a) months → quarters in $\mathcal{Y}_t$, and 304 locations → 61 areas and 388 types → 78 categories in $\mathcal{Y}_c$; b) months → quarters in $\mathcal{Y}_t$, and 304 locations → 31 areas and 388 types → 39 categories in $\mathcal{Y}_c$; and c) months → bi-yearly samples in $\mathcal{Y}_t$, and 304 locations → 16 areas and 388 types → 20 categories in $\mathcal{Y}_c$. The Crime dataset is challenging, as it has 91.56% zero values and a small SD. The naive mean (Mean) has a relatively large NDE even with the moderate aggregation of case (a), which indicates that the task is difficult. Although CMTF performs slightly better at the first two levels, Tendi becomes superior with extreme aggregation.
In this work, we proposed a novel framework for fusing multiple aggregated views of multidimensional data. The proposed method leverages the properties of tensors to estimate the low-rank factors of the target data in higher resolution. The assumed model provably transforms a highly ill-posed problem into an identifiable one. Experimental results show that the proposed algorithm is very effective, even with aggressive aggregation.
The contributions of our work are summarized as follows: 1) Formulation: we formally defined the problem of multidimensional data disaggregation from views aggregated in different dimensions; 2) Identifiability: the considered tensor model provably converts a highly ill-posed problem into an identifiable one; 3) Effectiveness: Tendi reduces the disaggregation error of the competing alternatives by up to 48% on real data; and 4) Unknown aggregation: TendiB works even when the aggregation mechanism is unknown.
References:
[1] Context-aware recommendation-based learning analytics using tensor and coupled matrix factorization
[2] HomeRun: scalable sparse-spectrum reconstruction of aggregated historical data
[3] On generic identifiability of 3-tensors of small rank
[4] Best linear unbiased interpolation, distribution, and extrapolation of time series by related series
[5] The effects of student-faculty interactions on minority students' college grades: differences between aggregated and disaggregated data
[6] Privacy-preserving data aggregation in smart metering systems: an overview
[7] Link prediction in heterogeneous data via generalized coupled tensor factorization
[8] Aggregated versus disaggregated data in regression analysis: implications for inference
[9] Forecasting with temporally aggregated demand signals in a retail supply chain
[10] Tensor completion from regular sub-Nyquist samples
[11] Hyperspectral super-resolution: a coupled tensor factorization approach
[12] Fast algorithms for the Sylvester equation AX − XB^T = C
[13] Tensor decompositions and applications
[14] H-Fuse: efficient fusion of aggregated historical data
[15] Ludia: an aggregate-constrained low-rank reconstruction algorithm to leverage publicly released health data
[16] On estimating contemporaneous quarterly regional GDP
[17] Tensor decomposition for signal processing and machine learning
[18] A convex formulation for hyperspectral image superresolution via subspace-based regularization
[19] Multi-aspect streaming tensor completion
[20] Creating order forecasts: point-of-sale or order history?
[21] Temporal regularized matrix factorization for high-dimensional time series prediction