key: cord-0058985-9lo37kk0
authors: Ortega-Bustamante, M. C.; Hasperué, W.; Peluffo-Ordóñez, D. H.; Paéz-Jaime, M.; Marrufo-Rodríguez, I.; Rosero-Montalvo, P.; Umaquinga-Criollo, A. C.; Vélez-Falconi, M.
title: Introducing the Concept of Interaction Model for Interactive Dimensionality Reduction and Data Visualization
date: 2020-08-19
journal: Computational Science and Its Applications - ICCSA 2020
DOI: 10.1007/978-3-030-58802-1_14
sha: a27173dfb203f118f6c892e4110cf6a9fee7369c
doc_id: 58985
cord_uid: 9lo37kk0

This letter formally introduces the concept of interaction model (IM), which has been used either directly or tangentially in previous works but never formally defined. Broadly speaking, an IM consists of the use of a mixture of dimensionality reduction (DR) techniques within an interactive data visualization framework. The rationale for creating an IM is the need to simultaneously harness the benefits of several DR approaches so as to reach a data representation that is intelligible and/or fitted to the user's criterion. As a remarkable advantage, an IM naturally provides a generalized framework for designing both interactive DR approaches and ready-to-use data visualization interfaces. In addition to a comprehensive overview of the basics of data representation and dimensionality reduction, the main contribution of this manuscript is the elegant definition of the concept of IM in mathematical terms.

Very often, dimensionality reduction (DR) is an essential building block for designing both machine learning systems and information visualization interfaces [1, 2]. In simple terms, DR consists of finding a low-dimensional representation of the original (said to be high-dimensional) data while keeping a criterion of either data structure preservation or class separability. Recent analyses have shown that DR should attempt to reach two goals: first, to ensure that data points that are neighbors in the original space remain neighbors in the embedded space; and second, to guarantee that two data points are shown as neighbors in the embedded space only if they are neighbors in the original space. In the context of information retrieval, these two goals can be seen as precision and recall measures, respectively. Despite being clearly conflicting, the compromise between precision and recall defines the performance of a DR method. Furthermore, since DR methods are often developed under predetermined design parameters and a pre-established optimization criterion, they still lack properties such as user interaction and controllability. These properties are characteristic of information visualization procedures.

The field of data visualization (DataVis) is aimed at developing graphical ways of representing data so that information can be more usable and intelligible for the user [3]. One can then intuit that DR can be improved by importing some properties of DataVis methods; this is, in fact, the premise on which this research is based. This emergent research area can be referred to as interactive dimensionality reduction for visualization. Its main goal is to link the field of DR with that of DataVis, in order to harness the special properties of the latter within DR frameworks. In particular, the properties of controllability and interactivity are of great interest, as they should make DR outcomes significantly more understandable and tractable for the (not necessarily expert) user. These two properties give the user the freedom to select the best way of representing data.
In other words, the goal of this work is to develop a DR framework that facilitates an interactive and quick visualization of data representations, so as to make DR outcomes more intelligible and to allow users to modify the views of data according to their needs in an affordable fashion. In this connection, this letter formally introduces the concept of interaction model (IM) as a key tool for both interactive DR and DataVis. Even though the term interaction model has been used, directly or tangentially, in previous works [4-8], it has not been formally defined; this paper aims to fill that void. In general terms, the concept of IM refers to a mixture of dimensionality reduction (DR) techniques within an interactive data visualization framework. The rationale for creating an IM is the need to simultaneously harness the benefits of several DR approaches so as to reach a data representation that is intelligible and/or fitted to the user's criterion. As a remarkable advantage, an IM naturally provides a generalized framework for designing both interactive DR approaches and ready-to-use data visualization interfaces. That said, the main contribution of this manuscript is the elegant definition of the concept of IM in mathematical terms. It also overviews some basics of data representation and dimensionality reduction from a matrix algebra point of view. Special interest is given to spectral and kernel-based DR methods, which can be generalized by kernel principal component analysis (KPCA) and readily incorporated into a linear IM.

The remainder of this manuscript is organized as follows: Sect. 2 states the mathematical notation for the main variables and operators. Section 3 presents a short overview of basic concepts and introductory formulations for DR, with special interest in KPCA. In Sect. 4, we formally define the concept of IM as well as its particular linear version, and we outline the use of DR-based DataVis. Finally, some concluding remarks are gathered in Sect. 5.

To introduce further concepts and describe some mathematical developments, let us consider the following notation. Let X ∈ R^{N×D} be the input data matrix (also called observed space) holding N data points or samples described by a D-dimensional feature set, such that

    X = (x_1, ..., x_N)^T, where x_i ∈ R^D and i ∈ {1, ..., N}.

Likewise, let us consider a lower-dimensional matrix Y ∈ R^{N×d}, so-named embedded space,

    Y = (y_1, ..., y_N)^T, with y_i ∈ R^d,

such that d < D. Also, let us suppose that there exists an unknown high-dimensional representation space Φ ∈ R^{D_h×N}, with D_h ≫ D, in which calculating the inner product should improve the representation and visualization of the resultant embedded data in contrast to that obtained directly from the observed data. Hence the need for a kernel representation arises, so as to calculate the dot product in the unknown high-dimensional space. Let φ(·) be a function that maps data from the original dimension to a higher one, such that

    φ: R^D → R^{D_h}, x_i ↦ φ(x_i).

Therefore, the matrix Φ can be expressed as

    Φ = (φ(x_1), ..., φ(x_N)).
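To make the mapping φ(·) and its link to the kernel representation concrete, consider the following minimal sketch; it is our own illustration, not part of the original manuscript. It uses the degree-2 homogeneous polynomial kernel k(x, y) = (x^T y)^2, whose explicit feature map φ(x) stacks all products x_i x_j into a D²-dimensional vector, so the kernel returns the inner product in that higher-dimensional space without ever constructing φ(x).

```python
import numpy as np

rng = np.random.default_rng(0)
D = 3
x, y = rng.standard_normal(D), rng.standard_normal(D)

def phi(x):
    """Explicit feature map of the degree-2 homogeneous polynomial kernel.

    Maps x in R^D to the D^2-dimensional vector of all products x_i * x_j,
    a small stand-in for the (possibly much larger) space Phi of the paper.
    """
    return np.outer(x, x).ravel()

# Inner product computed explicitly in the high-dimensional space ...
explicit = phi(x) @ phi(y)
# ... equals the kernel evaluated in the original D-dimensional space.
kernel = (x @ y) ** 2
assert np.isclose(explicit, kernel)
```

For kernels such as the Gaussian one, the feature space is infinite-dimensional, which is precisely why the dot product must be computed through k(·, ·) rather than through an explicit φ(·).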
Data representation is a broad term coined by some authors to refer, by and large, to either data transformation or feature extraction. The former consists of transforming data into a new version intended to fulfill a specific goal [9]. The latter, meanwhile, is somewhat of a data remaking, in which the input data undergo a morphological deformation or projection (also called rotation) following a certain transformation criterion [10]. Also, data representation may refer to yielding a new representation of the data, that is, a new data matrix serving as an alternative to the originally given one; for instance, a dissimilarity-based representation of the input data [11]. An exhaustive review on data representation is presented in [12].

Dimensionality reduction (DR) [13] is the most widely used data representation approach. Broadly, DR is intended to embed a high-dimensional space X into a lower-dimensional space Y. More technically, DR can be seen as a data transformation through a certain operator T{·} of the form

    T{·}: R^{N×D} → R^{N×d},

such that Y = T{X}. Such a transformation operator can be ruled by different criteria. Figure 1 depicts the effect of DR on a toy dataset, the so-called Swiss roll (N = 3000 data points, D = 3 dimensions).

DR Approaches: For instance, pioneering approaches such as principal component analysis (PCA) and classical multidimensional scaling (CMDS) optimize the reduction in terms of variance and distance preservation criteria, respectively [14]. More sophisticated methods attempt to capture the data topology through an undirected, weighted, data-driven graph, formed by nodes located at the geometric coordinates given by the data points, together with a non-negative similarity (also called affinity, or Gram) matrix holding the pairwise edge weights. Such data-topology-based criteria have been addressed by both spectral [15] and divergence-based [16] methods. The similarity matrix can represent either a weighting factor for pairwise distances, as happens in Laplacian eigenmaps and locally linear embedding [17, 18], or a probability distribution, as is the case of methods based on divergences such as stochastic neighbour embedding [19]. In this letter, we give special interest to the spectral approaches, more specifically to the so-named kernel PCA.

Kernel PCA (KPCA): Now, let us recall the high-dimensional space Φ defined in Sect. 2, and let us also consider a lower-rank reconstructed matrix Φ̂ and a dissimilarity function δ(·, ·). The matrix Φ̂ is obtained from a lower-rank base whose full-rank version spans Φ, and it minimizes δ(Φ, Φ̂). To keep all the KPCA conditions, the following developments are carried out under the assumption that Φ is centered. Let us define a d-dimensional base W ∈ R^{D_h×d}, such that W = (w^(1), ..., w^(d)) and W^T W = I_d, where w^(ℓ) ∈ R^{D_h}, ℓ ∈ {1, ..., d}, and I_d is the d-dimensional identity matrix. Since d < D, we can say that the base W is lower-rank. Given this, the low-dimensional space can be calculated by means of a linear projection, as follows:

    Y = Φ^T W.

Thus, from the previous expression, the reconstruction can be written as

    Φ̂ = W Y^T = W W^T Φ.

Define δ(Φ, Φ̂) = ||Φ − Φ̂||_F^2, where ||·||_F denotes the Frobenius norm. By design, a squared distance is used so that the problem has a dual version that can be readily expressed as a quadratic form. Accordingly, we can pose the following primal optimization problem:

    min_W ||Φ − W W^T Φ||_F^2   subject to   W^T W = I_d,

which can be expressed as a dual problem of the form

    max_W tr(W^T Φ Φ^T W)   subject to   W^T W = I_d,

having as a feasible solution for W and Y the eigenvectors associated with the d largest eigenvalues of Φ Φ^T and Φ^T Φ, respectively. This result, known as the optimal low-rank representation theorem, is widely discussed and demonstrated in [15].

Kernel Trick: Furthermore, by following Mercer's condition, or the so-called kernel trick, we can introduce a kernel function k(·, ·) that estimates the inner product φ(x_i)^T φ(x_j). By gathering all the pairwise kernel values, we can write a kernel matrix K ∈ R^{N×N} as

    K = [k_ij], where k_ij = k(x_i, x_j) and i, j ∈ {1, ..., N}.
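As a complement, the following minimal sketch (our own, with hypothetical function and variable names) solves the dual problem numerically from a kernel matrix alone: the embedded space Y is recovered from the d leading eigenvectors of the centered kernel K = Φ^T Φ, without ever forming Φ.

```python
import numpy as np

def kpca(K, d):
    """Embed N points into d dimensions from an N x N kernel matrix K.

    K plays the role of Phi^T Phi; its d leading eigenvectors give a
    feasible solution Y of the dual problem above. A common convention,
    followed here, scales each eigenvector by the square root of its
    eigenvalue.
    """
    N = K.shape[0]
    # Double-center the kernel, enforcing the assumption that Phi is centered.
    J = np.eye(N) - np.ones((N, N)) / N
    Kc = J @ K @ J
    # Eigendecomposition of the symmetric centered kernel.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    # eigh returns eigenvalues in ascending order; keep the d largest.
    idx = np.argsort(eigvals)[::-1][:d]
    lam, V = eigvals[idx], eigvecs[:, idx]
    return V * np.sqrt(np.clip(lam, 0, None))

# Usage with a simple Gaussian (RBF) kernel on random data:
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / (2 * np.median(sq)))
Y = kpca(K, d=2)  # 100 x 2 embedded space
```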
As mentioned above, such a kernel matrix captures the data topology and may be considered a similarity matrix.

Quite intuitively, one can infer that the premise underlying the use of DR for DataVis purposes is to make the information of a high-dimensional dataset directly intelligible by displaying it in a representation of three or fewer dimensions. Besides, incorporating interactivity into the DR technique itself, or into DR-based DataVis interfaces, enables users (even non-expert ones) to select a method or tune its parameters in an intuitive fashion. Herein, interaction is considered the ability to readily incorporate the user's criterion into the stages of the data exploration process; in this case, DR is the stage of interest. In particular, by interactivity we refer to the possibility of tuning parameters or selecting methods within an interactive interface. As traditionally done in previous works [8], the interaction consists of a mixture of functions or elements representing DR techniques. In the following, we formally define the concept of IM:

Definition 1 (Interaction model). Let f_1, ..., f_M be a set of M functions or elements, each one representing a DR technique, and let α = (α_1, ..., α_M) be a set of weighting factors to be provided by the user. An interaction model is a mixture f̂ = IM(f_1, ..., f_M; α) of such functions, parameterized by the factors α.

Then, the IM can itself be understood as a suitable building block for powering data analysis systems with interactive, dimensionality-reduction-driven visualization stages, as depicted in Fig. 2. As a particular case of Definition 1, we can define the linear IM as follows:

Definition 2 (Linear interaction model). A linear IM (LIM) is an IM whose mixture is a weighted sum of its functions, that is, f̂ = Σ_{m=1}^{M} α_m f_m.

As explained in [20], spectral DR methods are susceptible to being represented as kernel matrices. Also, it is demonstrated in [15] that, when incorporated into a KPCA algorithm, such kernel matrices reach the same low-dimensional spaces as those obtained by the original DR methods. Let us consider the following kernel representations for three well-known spectral DR approaches:

Classical Multidimensional Scaling (CMDS): The CMDS kernel can be expressed in terms of the double-centered, squared Euclidean distance matrix D ∈ R^{N×N}:

    K_CMDS = -(1/2) (I_N - 1_N 1_N^T / N) D (I_N - 1_N 1_N^T / N),

where the ij entry of D is given by d_ij = ||x_i - x_j||_2^2, I_N is the N-dimensional identity matrix, and 1_N is an N-dimensional all-ones vector.

Laplacian Eigenmaps (LE): Since KPCA is a maximization of the high-dimensional covariance represented by a kernel, LE can be represented as the pseudo-inverse of the graph Laplacian L:

    K_LE = L†,

where L = D - S, S is a similarity matrix, and D = Diag(S 1_N) is the degree matrix.

Locally Linear Embedding (LLE): A kernel for LLE can be approximated from a quadratic form in terms of the matrix W holding linear coefficients that sum to one and optimally reconstruct the original data matrix X. Defining a matrix M ∈ R^{N×N} as M = (I_N - W)(I_N - W^T), and λ_max as the largest eigenvalue of M, the kernel matrix for LLE takes the form

    K_LLE = λ_max I_N - M.

To explain how to use the IM for DR purposes, consider the kernel matrices from Sect. 4.3 above, setting M = 3 as well as f_1 = K^(1) = K_CMDS, f_2 = K^(2) = K_LE, and f_3 = K^(3) = K_LLE. Then, the linear IM applied to spectral DR methods using kernel representations can be expressed as follows:

    K̂ = Σ_{m=1}^{3} α_m K^(m) = α_1 K_CMDS + α_2 K_LE + α_3 K_LLE.

To add the effect of interactivity, the values of the weighting factors α can be provided through a user-friendly and intuitive-to-use interface, such as those reviewed in [8]. Finally, the low-dimensional space Y is calculated by applying KPCA on K̂:

    Y = KPCA(K̂),

where KPCA(·) means that the KPCA problem stated in Sect. 3 is solved with the kernel matrix given in its argument. The first notion of this approach (even though with no formal definition) is presented in [4]. Figure 3 graphically explains the use of the LIM for DR techniques (DRT) following a KPCA approach.
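To illustrate the whole pipeline, the sketch below (a simplified, hypothetical implementation reusing the kpca routine from the previous sketch) builds the three kernel matrices, mixes them linearly with user-chosen weighting factors α, and embeds the mixture with KPCA. The graph-construction choices, a Gaussian similarity for LE and regularized k-nearest-neighbour weights for LLE, are our own assumptions for the demo, not prescriptions of the paper.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Matrix of squared Euclidean distances d_ij = ||x_i - x_j||^2."""
    sq = (X ** 2).sum(1)
    return sq[:, None] + sq[None, :] - 2 * X @ X.T

def k_cmds(X):
    """CMDS kernel: negative half of the double-centered distance matrix."""
    N = X.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N
    return -0.5 * J @ pairwise_sq_dists(X) @ J

def k_le(X, sigma=1.0):
    """LE kernel: pseudo-inverse of the graph Laplacian L = D - S.
    S is a simple Gaussian similarity (an assumption for this demo)."""
    S = np.exp(-pairwise_sq_dists(X) / (2 * sigma ** 2))
    np.fill_diagonal(S, 0.0)
    L = np.diag(S.sum(1)) - S
    return np.linalg.pinv(L)

def k_lle(X, k=10, reg=1e-3):
    """LLE kernel: lambda_max * I - M, with M = (I - W)(I - W^T)."""
    N = X.shape[0]
    W = np.zeros((N, N))
    d2 = pairwise_sq_dists(X)
    for i in range(N):
        nbrs = np.argsort(d2[i])[1:k + 1]      # k nearest neighbours of x_i
        Z = X[nbrs] - X[i]                     # centred neighbourhood
        G = Z @ Z.T + reg * np.trace(Z @ Z.T) * np.eye(k)
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()               # reconstruction weights sum to 1
    M = (np.eye(N) - W) @ (np.eye(N) - W.T)
    lam_max = np.linalg.eigvalsh(M).max()
    return lam_max * np.eye(N) - M

# Linear IM: mixture of the three kernels, weights supplied by the user
# (e.g., via the sliders of an interactive interface).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
alpha = np.array([0.5, 0.3, 0.2])
K_hat = alpha[0] * k_cmds(X) + alpha[1] * k_le(X) + alpha[2] * k_lle(X)
Y = kpca(K_hat, d=2)   # kpca as defined in the earlier sketch
```

In an interactive setting, only the last two lines need to be re-run when the user moves the weights α, since the three kernels can be precomputed once.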
In this work, we have elegantly defined the concept of the so-named interaction model (IM). Such a definition opens the possibility of developing, in a more formal fashion, new interactive data visualization approaches based on mixtures of dimensionality reduction techniques. In future work, we will explore and/or develop novel kernel representations arising from other dimensionality reduction methods, as well as IM approaches that enable users to readily incorporate their knowledge and expertise into data exploration and visualization.

References

1. Discriminative globality and locality preserving graph embedding for dimensionality reduction
2. Multi-scale similarities in stochastic neighbour embedding: reducing dimensionality while preserving both local and global structure
3. Interactive Data Visualization: Foundations, Techniques, and Applications
4. Geometrical homotopy for data visualization
5. Interactive interface for efficient data visualization via a geometric approach
6. Interactive data visualization using dimensionality reduction and similarity-based representations
7. Data visualization using interactive dimensionality reduction and improved color-based interaction model
8. Interactive visualization interfaces for big data analysis using combination of dimensionality reduction methods: a brief review
9. Cross-company customer churn prediction in telecommunication: a comparison of data transformation methods
10. Unsupervised relevance analysis for feature extraction and selection: a distance-based approach for feature relevance
11. Dissimilarity-based representation for radiomics applications
12. An overview on data representation learning: from traditional feature learning to recent deep learning
13. Nonlinear Dimensionality Reduction
14. Modern Multidimensional Scaling: Theory and Applications
15. Generalized kernel framework for unsupervised spectral methods of dimensionality reduction
16. Short review of dimensionality reduction methods based on stochastic neighbour embedding
17. Laplacian eigenmaps for dimensionality reduction and data representation
18. MLLE: modified locally linear embedding using multiple weights
19. Stochastic neighbor embedding
20. A kernel view of the dimensionality reduction of manifolds

Acknowledgments. The authors acknowledge the research project "Desarrollo de una metodología de visualización interactiva y eficaz de información en Big Data" (Development of an interactive and efficient methodology for information visualization in Big Data), supported by Agreement No. 180 of November 1st, 2016, by VIPRI from Universidad de Nariño. The authors also thank the valuable support given by the SDAS Research Group (www.sdas-group.com).