MLCTR: A Fast Scalable Coupled Tensor Completion Based on Multi-Layer Non-Linear Matrix Factorization
Ajim Uddin, Dan Zhou, Xinyuan Tao, Chia-Ching Chou, Dantong Yu
2021-09-04

Firm earnings prediction plays a vital role in investment decisions, dividend expectations, and share prices. It often involves multiple tensor-compatible datasets with non-linear multi-way relationships, spatiotemporal structures, and different levels of sparsity. Current non-linear tensor completion algorithms tend to learn noisy embeddings and incur overfitting. This paper focuses on the embedding-learning aspect of the tensor completion problem and proposes a new multi-layer neural network architecture for tensor factorization and completion (MLCTR). The network architecture entails multiple advantages: a series of low-rank matrix factorization (MF) building blocks to minimize overfitting, interleaved transfer functions in each layer for non-linearity, and by-pass connections to reduce the vanishing gradient problem and increase the depth of the network. Furthermore, the model employs Stochastic Gradient Descent (SGD) based optimization for fast convergence in training. Our algorithm is highly efficient for imputing missing values in the EPS data. Experiments confirm that our strategy of incorporating non-linearity in the factor matrices performs impressively in embedding learning and end-to-end tensor models, and outperforms approaches that introduce non-linearity only when reconstructing tensors from the factor matrices.

Tensor completion algorithms are mostly based on two representative low-rank tensor factorization models: CANDECOMP/PARAFAC (CP) [Harshman et al., 1970] and Tucker [Tucker, 1966]. These approaches attempt to identify low-rank factor matrices from the observed entries and then reconstruct the target tensor from these factor matrices. Low-rank tensor completion thus boils down to a two-step problem with two objectives: representation learning (factorization) and subsequent multi-way relationship prediction (reconstruction).

Figure 1: Tensor completion on the Earnings Per Share (EPS) data. The accuracy of tensor completion with low-rank decomposition degrades severely as tensor sparsity increases. In contrast, our algorithm with coupled tensor factorization maintains accuracy even with 99.13% of the values missing.

Over the years, a number of low-rank tensor completion methods have been proposed [Acar et al., 2011a, Gandy et al., 2011, Song et al., 2017, Liu et al., 2018, Wu et al., 2019b, Liu et al., 2019]. These existing algorithms suffer from two mutually exclusive problems in achieving the two objectives. First, linear low-rank algorithms [Acar et al., 2011a, Gandy et al., 2011, Song et al., 2017] attain low-rank embedding matrices via Singular Value Decomposition (SVD) but fail to capture the multi-way (non-linear) relationships that are common in real-world tensor applications. The lack of multi-way relationship modeling results in suboptimal performance in tensor completion and downstream prediction [Fang et al., 2015, He et al., 2014, Zhe et al., 2016].
Second, algorithms with nonlinear reconstruction (e.g., [Liu et al., 2018, Wu et al., 2019b, Liu et al., 2019]) focus on nonlinear relationship learning among factors and use "kernel tricks" or neural network layers to represent the embedding factors and multi-way relationships. When multi-way relationship learning becomes dominant, it ignores the structure of the input signals, places no constraints on the factor matrices, and attempts to encode all of the information signal into the relationships. The lack of data structure in the embedding matrices leads to low-quality embeddings that are prone to noise, variance, and overfitting.

In this paper, we design a multi-layer matrix factorization neural network for coupled tensor reconstruction (MLCTR). MLCTR learns distributed representations effectively for prediction and multi-way relationship tasks. Unlike existing nonlinear tensor completion methods, it avoids the difficult trade-off between the two tensor completion objectives and introduces non-linearity in the embedding learning step. It explicitly employs multi-layer matrix factorization for the factor matrices and uses nonlinear transfer functions in each layer, thereby learning the highly complex structures and relationships among the hidden variables. In addition, to avoid vanishing gradients in the deep architecture, we use by-pass connections following [He et al., 2016]. The resulting architecture has lower reconstruction error (Fig. 1) and generates high-quality embedding matrices (Fig. 2).

Figure 2: In CP and MLCTR factorization, almost all information is forced to pass through the embedding matrices, whereas in CoSTCo the CNN in the later part also captures a significant portion of the tensor information, resulting in less informative embedding matrices.

Figure 1 also shows the sparsity problem in tensor completion: as the number of missing observations increases, the accuracy decreases significantly. With our proposed model, we can easily mitigate this problem by augmenting sparse datasets with auxiliary data. The literature suggests that auxiliary information from a secondary dataset significantly improves tensor completion accuracy [Narita et al., 2012, Kim et al., 2017, Acar et al., 2011b, Bahargam and Papalexakis, 2018]. We take advantage of the data structures shared among multiple tensors, apply a tensor integration mechanism as appropriate to reduce the associated computation cost, and scale up MLCTR to factorize two or more coupled sparse tensors simultaneously. Our coupled tensor completion algorithm uses a modified objective function for element-wise reconstruction and SGD optimization.

To confirm the MLCTR algorithm's superiority, we evaluate it on finance datasets and three other commonly used public datasets, including climate and point-of-interest (POI) data. The experimental results reveal the consistency and reliability of our model. The main contributions of our paper are as follows:
• We develop a novel nonlinear coupled tensor completion model based on multi-layer matrix factorization, nonlinear deep neural networks, and by-pass connections to efficiently learn both the embedding matrices and the nonlinear interactions between the embedded vectors.
• The learned embeddings encode latent data structures and patterns and provide high-quality distributed representations for downstream machine learning tasks.
• We propose the first-ever SGD-based nonlinear coupled tensor completion algorithm that is fast and scalable.
• We introduce the by-pass connection to mitigate the gradient diminishing problem in networks of great depth.

Most existing tensor completion algorithms [Gandy et al., 2011, Song et al., 2017, Acar et al., 2011a] are developed from classical CP [Harshman et al., 1970] and Tucker factorization [Tucker, 1966]. The low-rank approach is not always precise and often fails to capture the nonlinear interactions frequently found in real-world applications. To capture real-world nonlinear relationships, the authors of [Liu et al., 2018, Wu et al., 2019b] replace the multi-linear operation with multi-layer perceptrons (MLP), and the authors of [Liu et al., 2019] propose a convolutional neural network-based architecture. Nevertheless, these works try to learn the low-rank representation of a single tensor and do not consider any auxiliary information to improve the factor matrices. In [Narita et al., 2012, Kim et al., 2017], the authors introduce regularization from auxiliary data and demonstrate performance improvements. In recent years, coupled matrix-tensor factorization (CMTF) has also gained broad interest [Acar et al., 2011b, Bahargam and Papalexakis, 2018]. CMTF factorizes a higher-order tensor with a related matrix in a coupled fashion. Unlike CMTF, our approach is a coupled tensor factorization for sparse data where both datasets are higher-order tensors. Several coupled tensor factorization approaches are available [Khan et al., 2016, Genicot et al., 2016, Wu et al., 2019a]; however, these models are not designed for sparse data and require full observations in both tensors. In comparison, MLCTR relaxes the constraint of complete observations and captures non-linearity in both target tensors.

Expected EPS conveys vital information about firms' future cash flows and is one of the critical inputs for security pricing [Lee and So, 2017]. The current industry benchmark averages across all available analysts' forecasts for each firm in each quarter. However, studies suggest that this straightforward average forecast has several drawbacks: it may contain systematic bias [Ramnath et al., 2008, Bradshaw et al., 2012] and fails to incorporate additional information from markets, firm characteristics, analyst features, and crowdsourcing [Bradley et al., 2017, Ball and Ghysels, 2018]. To address these problems, the authors of [Bradley et al., 2017] assign different weights to analysts based on their past performance. Further efforts [Corredor et al., 2019, Bradshaw et al., 2012] show that models combining accounting characteristics with analysts' EPS forecasts generate better earnings predictions than the widely adopted time-series models do. Our work contributes a novel data mining approach, MLCTR, to explore a new avenue for analyzing financial data, especially with tensor representation and missing-value imputation.

Tensor factorization can be formulated as a two-step paradigm: embedding learning and subsequent relationship modeling. In contrast to the majority of tensor algorithms, which focus on non-linear relationship modeling in the second step, MLCTR concentrates on the first, embedding-learning step and explicitly guides the networks to learn a representative vector for each entity. MLCTR learns the embedding matrices and the multi-way interactions among the embeddings with the same stack of networks. The approach has a connection to the kernel-based support vector machine: the well-known Radial Basis Function (RBF) kernel is essentially an infinite sum over polynomial kernels, each of which can be further expanded into linear dot products in the polynomial space.
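For concreteness, the expansion behind that statement is the standard identity (our notation, not the paper's):

$$
K_{\mathrm{RBF}}(x,y)=\exp\!\left(-\gamma\|x-y\|^{2}\right)=e^{-\gamma\|x\|^{2}}\,e^{-\gamma\|y\|^{2}}\sum_{n=0}^{\infty}\frac{(2\gamma)^{n}}{n!}\,\langle x,y\rangle^{n},
$$

i.e., an infinite weighted sum of polynomial kernels $\langle x,y\rangle^{n}$, each of which equals a linear dot product $\langle\Psi_{n}(x),\Psi_{n}(y)\rangle$ in the corresponding monomial feature space.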
The RBF kernel thus implicitly defines a high-dimensional transform Ψ_RBF such that K_RBF(x, y) = ⟨Ψ_RBF(x), Ψ_RBF(y)⟩. An appropriate embedding transformation Ψ approximates non-linear kernels with linear dot products among embedding vectors and thus greatly simplifies the downstream relationship learning. The embedding learning algorithm starts with a random signal and incrementally adds new information into the embedding vectors through the multi-layered network architecture in Figure 3.

We adopt multi-layer matrix factorization to construct the factor matrices of the tensor, thereby learning meaningful embeddings. Given a factor matrix U ∈ R^{d_1×r}, U = [u_1, u_2, ..., u_{d_1}] is a collection of d_1 embeddings with r dimensions. The factor matrix holds the feature vectors of d_1 entities. These feature vectors presumably have structure and are generated from H hidden variables in l different groups (clusters). For simplicity, we assume that each group uniformly has h = H/l hidden variables. For example, image features in real space might originate from different groups of hidden variables: frequency bands, pose features, color features, expression features, and identity features. Based on this assumption, we further decompose U into two hidden matrices P and Q and learn the feature grouping structure simultaneously as follows:

U = P Q^T = \sum_{j=0}^{l-1} P^{(j)} Q^{(j)T},    (3.1)

where P^{(j)} = [p_{jh}, p_{jh+1}, ..., p_{jh+h-1}] and Q^{(j)} = [q_{jh}, q_{jh+1}, ..., q_{jh+h-1}]. When the group information is known, we explicitly arrange the order of the hidden variables and obtain the group-aware matrix factorization shown on the right-hand side of Eqn 3.1. In most cases, the group information and hidden variables are unknown; nevertheless, they can be extracted by our proposed multi-layer matrix factorization networks.

Figure 3: In the multi-layer network architecture for learning embeddings, we use by-pass connections to create very deep networks (up to 34 layers in this example) to learn complex data structures. The multi-way relationships among all factors are modeled by the linear dot product. We can also use MLP or convolutional neural networks and trade off the complexity between the embedding layers and the relationship-modeling layers. This architecture mitigates the overfitting problem by adding structural constraints to the high-dimensional embedding.

The multi-layer matrix factorization has a fundamental connection to signal processing: Q^{(j)} consists of the base (loading) vectors of a transformation (for example, a Fourier or spectral transformation), and P^{(j)} holds the loading scores of U on the base matrix Q^{(j)}, i.e., the row vectors in U are weighted sums of the base signals in different frequency bands. We do not assume any prior knowledge of the bands of hidden variables and treat the data as signals from different frequency bands. Here, each rank of the matrix factorization P and Q represents one frequency band. We use the j-th layer of the neural network in Figure 3 to learn the P^{(j)} and Q^{(j)} in Eqn 3.1 and attempt to extract the h related frequencies in the same band simultaneously. The dimensionality of U is the same at every layer (U^{(j)} ∈ R^{d_i×r} and U^{(j−1)} ∈ R^{d_i×r}), which helps to remove noise and learn a meaningful signal without forcing the model to compress the available information. Given the complicated relationships embedded in U, this multi-layer approach partitions the learning into l frequency bands. Each frequency band corresponds to a network layer in Figure 3.
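As a small numerical sketch of this group-wise factorization (the linear version, before transfer functions are added), the following assembles a factor matrix U as the sum of l block products; all shapes and the random initialization are illustrative choices, not the paper's settings, and each Q block is stored already transposed with shape (h, r):

```python
import numpy as np

d1, r = 500, 32      # number of entities and embedding rank (illustrative)
l, h = 4, 8          # l hidden-variable groups, h = H / l variables per group

rng = np.random.default_rng(0)
# One low-rank block per group: P[j] (d1 x h) holds loading scores,
# Q[j] (h x r) holds the base (loading) vectors of that "frequency band".
P = [rng.normal(scale=0.1, size=(d1, h)) for _ in range(l)]
Q = [rng.normal(scale=0.1, size=(h, r)) for _ in range(l)]

# Linear multi-layer factorization: each layer adds one band's contribution,
# so every row of U is a weighted sum of the base signals across bands.
U = np.zeros((d1, r))
for j in range(l):
    U += P[j] @ Q[j]
```

The non-linear version introduced next wraps each band's contribution in a transfer function and keeps the running sum as a by-pass (residual) path.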
This design eases the complexity associated with a single layer and avoids having to model the entire signal all at once. We introduce non-linear transfer functions σ (e.g., ReLU, ELU, and sigmoid) into the factor matrix U and its corresponding multi-layer neural network of Eqn 3.1, and rewrite the non-linear matrix factorization at each layer j as:

U^{(j)} = σ(U^{(j−1)} + σ(P^{(j)} Q^{(j)T})),    (3.2)

where the input matrix at layer 0 is zero and the input to the multi-linear dot product on the right-hand side of Figure 3 is U^{(out)} = U^{(l−1)}. Figure 3 shows that P^{(j)} and Q^{(j)} are the trainable parameters, and their products pass through non-linear transfer functions before and after being added into the forward path from the lower layer to the higher layer. The network has a by-pass connection from the layer input directly to the layer output and applies element-wise matrix addition to implement identity mapping. Similar to ResNet [He et al., 2016], the by-pass connection design does not increase the number of neurons and mitigates the gradient vanishing and explosion problems that often occur in networks of great depth. The multiple layers of matrix factorization and by-pass connections greatly enhance the modeling capacity for non-linearity while incurring no higher training error than networks without them.

Nearly all tensor completion algorithms, including our proposed MLCTR, suffer from the cold-start problem and an extremely low signal-to-noise ratio (SNR) [Acar et al., 2012]. In our finance application in particular, the EPS dataset has a very high fraction of missing values, i.e., 99%. The time, analyst, and firm latent factors learned from the EPS tensor alone are less informative because of the excessive number of missing values. To recover the critical missing signal, we introduce additional data synergistic to the EPS tensor to be imputed. The firm fundamentals share the same time and firm dimensions with EPS and provide complementary information for any firm in EPS, including key performance indicators and firm characteristics. In the coupled tensor framework, MLCTR enforces the same time and firm factor matrices during factorization. The information propagates from the dense tensor (firm accounting fundamentals) to the sparse tensor (analysts' EPS forecasts) by coupling the firm and time factors in tensor factorization and completion.

Algorithm 1: MLCTR Coupled
Input: Tensor X ∈ R^{d1×d2×d3} and tensor Y ∈ R^{d1×d2×d4} to be completed; rank of the tensor decomposition r; rank of the matrix factorization h; number of network layers l; index sets of observed entries Ω_X in X and Ω_Y in Y.
Output: Updated factor matrices U^{(0)}, V^{(0)}, W^{(0)}, T^{(0)} and hidden matrices P^{(j)} and Q^{(j)} (j = 0, ..., l − 1).
  Initialize all hidden matrices P^{(j)} and Q^{(j)} for all layers.
  Repeat:
    Update all U^{(0)}, V^{(0)}, W^{(0)} and the associated P^{(j)} and Q^{(j)} based on the chain rule and Eqn. 3.5.
  Until the maximum number of epochs is reached or early stopping is triggered.

Figure 4 describes the MLCTR system for coupled tensors. It factorizes two tensors, X and Y, each of which has three factor matrices: the two common factor matrices U and V, plus the unique matrix T for X and W for Y. We assume all factor matrices have the same rank r. Figure 4 shows a simple linear dot product for the multi-way relationship. Alternatively, we add an MLP between the embedding learning layer and the output layer to model additional non-linear relationships.
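To make the coupled design concrete, here is a minimal PyTorch sketch of the pieces described above: one residual embedding network per mode (Eqn 3.2), the time and firm networks shared between the two tensors, CP-style dot-product reconstruction, and one SGD step over observed entries only (the loop of Algorithm 1). All class names, layer counts, and initialization scales are illustrative assumptions, not the authors' implementation, and the exact placement of the transfer functions follows our reading of Eqn 3.2.

```python
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    """Multi-layer MF embedding for one tensor mode: d entities, rank r."""
    def __init__(self, d, r, layers=4, h=8):
        super().__init__()
        self.P = nn.ParameterList([nn.Parameter(0.1 * torch.randn(d, h)) for _ in range(layers)])
        self.Q = nn.ParameterList([nn.Parameter(0.1 * torch.randn(h, r)) for _ in range(layers)])

    def forward(self):
        U = 0.0
        for P, Q in zip(self.P, self.Q):
            # by-pass connection: each layer adds one band's non-linear contribution
            U = torch.relu(U + torch.relu(P @ Q))
        return U  # shape (d, r)

class MLCTRCoupled(nn.Module):
    """X ~ (time, firm, analyst) and Y ~ (time, firm, fundamental); time/firm factors shared."""
    def __init__(self, d_time, d_firm, d_analyst, d_fund, r=32):
        super().__init__()
        self.U = EmbeddingNet(d_time, r)     # shared time factors
        self.V = EmbeddingNet(d_firm, r)     # shared firm factors
        self.T = EmbeddingNet(d_analyst, r)  # factors unique to X
        self.W = EmbeddingNet(d_fund, r)     # factors unique to Y

    def predict_x(self, i, j, k):
        # CP-style three-way dot product over the rank dimension
        return (self.U()[i] * self.V()[j] * self.T()[k]).sum(-1)

    def predict_y(self, i, j, k):
        return (self.U()[i] * self.V()[j] * self.W()[k]).sum(-1)

def train_step(model, opt, batch_x, batch_y, lam=1.0):
    """One SGD step on mini-batches of observed entries from Omega_X and Omega_Y."""
    (ix, jx, kx, vx), (iy, jy, ky, vy) = batch_x, batch_y   # index tensors and observed values
    loss = ((model.predict_x(ix, jx, kx) - vx) ** 2).mean() \
           + lam * ((model.predict_y(iy, jy, ky) - vy) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

In use, an optimizer such as torch.optim.SGD(model.parameters(), lr=1e-2) would be stepped over mini-batches drawn from the mixed index sets Ω_X and Ω_Y, so the per-epoch cost is linear in the number of observed entries rather than in the size of the dense tensors.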
In the experiments, we refer to the variant that uses the simple dot product as MLCTR and the variant with MLP layers between the embedding layer and the output layer as MLCTR (MLP).

The objective for the coupled tensors is to minimize the mean square error of the two tensor factorizations:

L = ‖X − ⟦U^{(out)}, V^{(out)}, T^{(out)}⟧‖_F^2 + λ ‖Y − ⟦U^{(out)}, V^{(out)}, W^{(out)}⟧‖_F^2,    (3.3)

where ⟦·,·,·⟧ denotes the tensor reconstructed from the factor matrices in CP form, λ is the hyper-parameter that adjusts the relative importance of the two coupled tensors, and the factor matrices T^{(out)}, U^{(out)}, V^{(out)}, W^{(out)} are the outputs of the multi-layer embedding learning networks. Traditional CMTF requires complete tensors for factorization and incurs long computation times for large tensors. In this paper, we instead perform element-wise tensor reconstruction from the observation sets Ω for fast convergence and revise the objective function as follows, so that it is compatible with any deep neural network platform:

L = Σ_{ijk} 1_{Ω_X}(ijk) (x_{ijk} − ⟨u_i^{(out)}, v_j^{(out)}, t_k^{(out)}⟩)^2 + λ Σ_{ijk} 1_{Ω_Y}(ijk) (y_{ijk} − ⟨u_i^{(out)}, v_j^{(out)}, w_k^{(out)}⟩)^2,    (3.4)

where 1_{Ω_X}(ijk) is an indicator function and ⟨u, v, t⟩ = Σ_{c=1}^{r} u_c v_c t_c denotes the three-way inner product. It is straightforward to implement Eqn 3.4 with Stochastic Gradient Descent (SGD): we first mix the training data from Ω_X and Ω_Y, randomly choose one mini-batch of training samples, and calculate the mean square error defined in Eqn 3.4. This mixing strategy employs the indicator functions so that every observation is treated uniformly, thereby enabling parallel processing of the samples in the same mini-batch. Eqn 3.4 is essentially multi-task learning and is therefore highly scalable, allowing multiple tensors to be factorized simultaneously. For any single training sample, the gradient of either the first or the second term in Eqn. 3.4 is zero, because the sample belongs to only one of Ω_X and Ω_Y. Considering a sample with index (i, j, k) ∈ Ω_X, we calculate the corresponding gradient of L with respect to the embeddings and update the relevant parameters as follows:

∂L/∂u_i^{(out)} = −2 (x_{ijk} − ⟨u_i^{(out)}, v_j^{(out)}, t_k^{(out)}⟩) (v_j^{(out)} ⊙ t_k^{(out)}),    (3.5)

where ⊙ is the Hadamard (element-wise) product of two vectors. The gradients for the other factor matrices V, W, and T have formulas identical to Eqn 3.5; we show only the parameters in matrix U. The gradient is back-propagated through the network layers in Figures 3 and 4. Algorithm 1 shows the pseudo-code of SGD-based MLCTR.

We conduct two experiments on four datasets to evaluate our algorithm: 1) efficiency in tensor completion, in both time and accuracy, compared to other state-of-the-art tensor completion techniques, and 2) the ability to factorize sparse coupled tensors while learning meaningful factor matrices. To alleviate the overfitting problem, we try several regularization methods, including Lasso, Ridge, and ElasticNet, and call these models Resnet-L1, Resnet-L2, and Resnet-Elastic. We compare our algorithm with CPWOPT [Acar et al., 2011a], the benchmark low-rank sparse tensor completion method; P-Tucker [Oh et al., 2018], a scalable Tucker model with a fully parallel row-wise updating rule; and CoSTCo [Liu et al., 2019], the CNN-based state-of-the-art nonlinear tensor completion method. To evaluate performance, we use three metrics: RMSE, MAE, and MAPE.
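For reference, all three metrics are computed only over the held-out observed entries; a small sketch follows (function and variable names are ours, not the paper's):

```python
import numpy as np

def completion_metrics(y_true, y_pred, eps=1e-8):
    """RMSE / MAE / MAPE over the observed test entries only."""
    err = y_pred - y_true
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err) / (np.abs(y_true) + eps))  # eps guards near-zero targets
    return rmse, mae, mape
```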
We apply MLCTR to the SafeGraph Foot Traffic data. SafeGraph collects cellphone GPS location data from a panel of cellphone users when a set of installed apps is in use; the data are available for free to academics studying COVID-19 (https://www.safegraph.com/covid-19data-consortium). These cellphone GPS location data are supplied at the daily level for residents of each Census Block Group (CBG). In the following experiments, we collect 95,509,754 records belonging to the five boroughs of New York State (the Bronx, Brooklyn, Manhattan, Queens, and Staten Island) for the 2019 sample period. We use these records to construct a third-order tensor describing the three-way relationship among origin CBG, destination CBG, and date, with shape 6439 × 6439 × 365. To ensure a reasonable data distribution, we apply a log transformation and use grid search to find the proper base, which is 10.

In addition, we test the efficiency of our algorithm on two commonly used public datasets. The first is the climate data used in [Lozano et al., 2009, Liu et al., 2010]; the dataset has 18 climate agents at 125 locations from 1992-2002. The second is a real point-of-interest (POI) dataset used in [Li et al., 2015]: Foursquare check-ins made in Singapore between Aug. 2010 and Jul. 2011, comprising 194,108 check-ins made by 2,321 users at 5,596 POIs. Using two different processing schemes, we develop two tensor representations of the POI dataset, POI and POI-3D. For POI, we follow the approach used in [Liu et al., 2019] and represent the tensor as (user id, poi id, location id). The first two dimensions, user id and poi id, are available in the data; in [Liu et al., 2019], the authors created the third dimension, location id, by splitting the POIs into 1600 location clusters based on their latitude and longitude. Hence, both the second and third modes represent location information, and for each location id we have different poi ids, resulting in an unnecessarily large tensor. In POI-3D, we overcome this limitation by incorporating the time information available in the data and replacing location id with time, representing the tensor as (user id, poi id, time). We divide the 24 hours of the day into 12 groups of 2-hour intervals. Incorporating time information helps us learn a better latent representation of a user's probability of visiting a specific POI at a specific time.

We normalize the datasets to zero mean and unit variance. For the EPS and climate data, we use an 80/20 train-test split, with 10% of the training data as the validation set and early stopping if the validation loss does not improve for 10 epochs. For both POI datasets, we use the train-validation-test split of [Li et al., 2015]. The tensor shapes, the number of observed entries for each dataset, and the hyper-parameters are reported in Table 1.

Tensor Completion
We factorize two sparse tensors, analysts' EPS forecasts (quarter, firm, analyst) and firm fundamentals (quarter, firm, fundamental), together with the objective function of Eqn. 3.4. For imputing missing values, coupled tensor factorization produces much higher accuracy than single-tensor factorization. As reported in Table 2, MLCTR (coupled) outperforms CPWOPT by 49% (rank = 30, the best-performing CPWOPT) and CoSTCo by 34% (rank = 40, the best-performing CoSTCo). The benefit of coupled tensor completion beyond single tensor completion is captured by the performance improvement between MLCTR and MLCTR (coupled). With rank 40, MLCTR (coupled) outperforms MLCTR (MLP) by 16% (RMSE), 15% (MAE), and 18% (MAPE). The proposed MLCTR algorithm is also robust to an increasing number of missing values. As shown in Figure 1, even with 99% missing values, our algorithm can still impute missing values accurately, outperforming CoSTCo by 37%. MLCTR is also less sensitive to rank: Figure 5b shows that with higher ranks, MAPE declines smoothly for all three versions of MLCTR.
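As a concrete illustration of the POI-3D construction described in the setup above, the following sketch maps raw check-ins to a sparse (user id, poi id, 2-hour time bin) count tensor in coordinate form; the column names and the pandas-based schema are hypothetical, not the actual Foursquare processing code:

```python
import numpy as np
import pandas as pd

def build_poi3d(checkins: pd.DataFrame):
    """checkins: columns user_id, poi_id, timestamp (hypothetical schema)."""
    df = checkins.copy()
    for col in ("user_id", "poi_id"):
        df[col] = df[col].astype("category").cat.codes              # map raw ids to 0..d-1
    df["time_bin"] = pd.to_datetime(df["timestamp"]).dt.hour // 2   # 12 bins of 2 hours
    counts = df.groupby(["user_id", "poi_id", "time_bin"]).size().reset_index(name="n")
    idx = counts[["user_id", "poi_id", "time_bin"]].to_numpy()      # observed indices (COO)
    val = counts["n"].to_numpy(dtype=float)
    val = (val - val.mean()) / val.std()                            # zero mean, unit variance
    return idx, val
```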
Single Tensor Completion
MLCTR is not only effective for coupled tensor completion; the technique of using residuals to further factorize the latent factors also learns better embeddings for a single tensor. To demonstrate this generalization of MLCTR, we conduct analyses on three public datasets. On climate forecasting, MLCTR outperforms CPWOPT, P-Tucker, and CoSTCo in all three performance metrics (Table 2). At rank 30, MLCTR (MLP) outperforms CPWOPT by 24%, P-Tucker by 20%, and CoSTCo by 15% in RMSE. For both POI datasets, the data sparsity is extremely high: with only 0.0005% (POI) and 0.09% (POI-3D) of the entries observed, CPWOPT with gradient descent does not converge, so we do not report CPWOPT results for the POI data. On POI, at rank 30, MLCTR (MLP) outperforms P-Tucker by 31% and CoSTCo by 13% in MAPE, whereas CoSTCo is better only at rank 10 in RMSE. For POI-3D, CoSTCo outperforms the simple MLCTR in some performance metrics; however, MLCTR (MLP) still outperforms CoSTCo at higher ranks (30 and 40) by a significant margin.

As shown in Fig. 2, the factor matrices learned by MLCTR are much more informative than those from other nonlinear tensor factorization models. To further understand the learned factors, we visualize the factor matrices learned from coupled tensor factorization on the analysts' EPS forecast and firm fundamentals data. Fig. 6 shows the cosine similarity between "quarters" learned from the time factor matrix; the temporal patterns in the time factors are clearly visible. Fig. 7 uses t-distributed stochastic neighbor embedding (t-SNE) to plot the spectral clustering based on the firm latent factors. MLCTR learns meaningful embeddings for firms according to their size, service type, and the client groups they serve.

4.5 Running Time Comparison
MLCTR uses low-rank MF and by-pass connections for learning latent factors; thus, it learns the embedding matrices much faster than other nonlinear algorithms such as CoSTCo and P-Tucker. MLCTR takes the indices of the observed values as input variables; therefore, its cost is linear in the number of available observations rather than in the size of the target tensor. Figure 8 shows the running time of each algorithm on each dataset at different ranks. The reported elapsed time for each algorithm is with early-stopping criteria. The time complexity of MLCTR is also linear in the rank and does not increase drastically at higher ranks.

The proposed MLCTR algorithm divides tensor factorization and completion into two interleaved modules: the first learns the rank-r embeddings, and the second models the multi-way relationships among the embeddings of the participating entities. The majority of related work focuses on the latter: for an N-th-order tensor, N-way linear models (including CP and Tucker decompositions) and nonlinear kernels (RBF, polynomial) are employed to model the relationships and minimize the mean square error between the observations and the predicted values. Tensor rank is the key parameter in factorizing a tensor; the common practice is to perform a grid search for an appropriate rank r. A small rank r incurs large bias in the tensor analysis, while a high rank r leads to overfitting [Liu et al., 2019]. The overfitting problem is primarily due to the many unconstrained hidden variables in the rank-r embeddings, which must be regularized to minimize variance. The standard l1 and l2 regularizations only add local constraints of smoothness and sparsity on the embeddings and might not be sufficient for our problem.
Inspired by signal processing theory, we introduce structure, base constraints, and global regularization into the embedding space. We argue that high-quality embedding learning mitigates the complexity of the second module for relationship modeling, so that a simple linear dot product as in CP, or a shallow MLP, is sufficient in the algorithmic implementation. In this paper, we apply an innovative approach that shifts the learning towards the embedding module, ensuring its central role in the tensor algorithm and easing relationship learning. With high-quality embeddings, many multi-way relationships can be efficiently modeled by the CP tensor algorithm or simple MLP networks. We implement MLCTR using multi-layer neural networks in which each layer performs a low-rank matrix factorization for the embedding matrices. Experiments show that our algorithm works exceptionally well for both single-tensor and coupled-tensor factorization and completion, and is less sensitive to tensor rank, robust to noise, and fast to converge during training.

References
Scalable tensor factorizations for incomplete data
Coupled matrix factorization with sparse factors to identify potential biomarkers in metabolomics
Automated earnings forecasts: Beat analysts or combine and conquer
Before an analyst becomes an analyst: Does industry experience matter?
A reexamination of analysts' superiority over time-series forecasts of annual earnings
The role of sentiment and stock characteristics in the translation of analysts' forecasts into recommendations
Personalized tag recommendation through nonlinear tensor factorization using Gaussian kernel
Tensor completion and low-n-rank tensor recovery via convex optimization
Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multimodal factor analysis
Deep residual learning for image recognition
DuSK: A dual structure-preserving kernel for supervised tensor learning with applications to neuroimages
Bayesian multi-tensor factorization
Discriminative and distinct phenotyping by constrained tensor factorization
Uncovering expected returns: Information in analyst coverage proxies
Rank-GeoFM: A ranking based geographical factorization method for point of interest recommendation
NeuralCP: Bayesian multiway data analysis with neural tensor decomposition
CoSTCo: A neural tensor completion model for sparse tensors
Tensor completion for estimating missing values in visual data
Learning temporal causal graphs for relational time-series analysis
Spatial-temporal causal modeling for climate change attribution
The financial analyst forecasting literature: A taxonomy with suggestions for further research
Some mathematical notes on three-mode factor analysis
Improved coupled tensor factorization with its applications in health data analysis