Approach to Dimensionality Reduction
Imagine you have a video, or a sequence of images. You can think of this as a three-dimensional object, a tensor, where you have columns, rows, and time. Multilinear subspace learning is a method for untangling the various factors that create this data and, in the process, reducing its complexity. It’s like taking a tangled mess of wires and finding the core threads, making it manageable. [1] [2] [3] [4] [5]
This dimensionality reduction can be applied to a tensor that holds a collection of observations, each one vectorized (flattened into a single vector), or to observations that are treated as matrices and concatenated into a larger data tensor. [1] [6] [7] Think of data such as images (two- or three-dimensional arrays), video sequences (three- or four-dimensional), or hyperspectral cubes (three- or four-dimensional). These are all examples where such a multilinear approach can be employed.
The process involves mapping data from a very high-dimensional vector space down to a series of lower-dimensional vector spaces. This is achieved through something called a multilinear projection. [4] When the original data retains its organizational structure, like matrices or higher-order tensors, their representations are calculated by performing linear projections not just in the usual sense, but along the columns, rows, and even along the "fibers" (which are essentially lines running through the higher dimensions). [6]
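To make "projections along the columns, rows, and fibers" concrete, here is a minimal NumPy sketch. The sizes, the ranks (8, 6, 5), and the random projection matrices are purely illustrative and are not taken from any of the cited works: a single third-order sample, such as a short grayscale video, is reduced by multiplying one small matrix along each mode rather than by flattening it into one enormous vector first.

```python
import numpy as np

# A single observation kept in its natural form: rows x columns x frames.
video = np.random.rand(64, 48, 30)

# One small projection matrix per mode; the ranks 8, 6, and 5 are arbitrary.
U_rows = np.random.rand(8, 64)
U_cols = np.random.rand(6, 48)
U_time = np.random.rand(5, 30)

# Project along the row, column, and temporal fibers in one multilinear step.
core = np.einsum('ai,bj,ck,ijk->abc', U_rows, U_cols, U_time, video)

print(video.shape, '->', core.shape)   # (64, 48, 30) -> (8, 6, 5)
```

One payoff is the parameter count: the three small matrices together hold 8·64 + 6·48 + 5·30 = 950 numbers, whereas a single linear projection of the vectorized sample to the same 8·6·5 = 240 features would need 64·48·30·240 ≈ 22 million.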
These multilinear subspace learning algorithms are essentially the more complex cousins of traditional linear subspace learning methods. Think of principal component analysis (PCA), independent component analysis (ICA), linear discriminant analysis (LDA), and canonical correlation analysis (CCA). Multilinear methods take these ideas and expand them to handle data with more than just two dimensions.
Background
Multilinear methods may or may not be causal in nature: they can be designed to uncover the causal factors behind data formation and to perform causal inference, or they can be used as straightforward regression methods from which no causal conclusions are drawn. It’s a matter of intent and design.
Linear subspace learning algorithms, the ones we're more familiar with, are excellent for dimensionality reduction when your data only has one primary factor influencing it. But they tend to falter when faced with datasets where multiple factors are at play, interacting in complex ways.
This is where multilinear subspace learning steps in. It can be applied to data where observations have been flattened into vectors and then organized into a tensor, allowing for dimensionality reduction that is sensitive to the causal relationships within the data. [1] Alternatively, these methods can be used to reduce redundancies, both horizontally and vertically, regardless of any underlying causal factors. This happens when the observations are treated like "matrices" – a collection of independent column or row observations – and then assembled into a tensor. [8] [9]
Algorithms
Here’s where it gets… interesting. There are several flavors of multilinear subspace learning, each with its own approach to tackling multi-dimensional data.
Multilinear Principal Component Analysis
Historically, what we now call multilinear principal component analysis was referred to as "M-mode PCA." This term was actually coined by Peter Kroonenberg. [10]
More recently, in 2005, Vasilescu and Terzopoulos introduced the specific terminology of Multilinear PCA. [11] They did this to create a clearer distinction between different multilinear tensor decompositions. Some methods focus on computing second-order statistics for each mode of the data tensor, [1] [2] [3] [12] [13] while others, like the subsequent work on Multilinear Independent Component Analysis, [11] delve into higher-order statistics for each mode. Essentially, MPCA is an expansion of the familiar PCA.
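As a rough illustration of what computing "second-order statistics for each mode" involves, the sketch below forms the mode-n scatter matrix of a batch of centered samples; MPCA-style methods take its leading eigenvectors as that mode's projection matrix. The shapes, ranks, and function names here are assumptions for illustration, not a reference implementation.

```python
import numpy as np

def mode_n_scatter(samples, mode):
    """Mode-n scatter (second-order statistics) of a batch of tensors.

    `samples` has shape (num_samples, I1, I2, I3). Each sample is centered,
    unfolded along `mode`, and the unfoldings' outer products are summed.
    """
    centered = samples - samples.mean(axis=0)
    dim = samples.shape[mode + 1]
    scatter = np.zeros((dim, dim))
    for x in centered:
        unfolding = np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)
        scatter += unfolding @ unfolding.T
    return scatter

# A toy batch of 100 samples, each a 16 x 12 x 8 tensor.
batch = np.random.rand(100, 16, 12, 8)

for mode in range(3):
    S = mode_n_scatter(batch, mode)
    eigvals, eigvecs = np.linalg.eigh(S)
    # Keep, say, the 4 leading eigenvectors as this mode's projection matrix.
    U = eigvecs[:, -4:][:, ::-1].T
    print(f"mode {mode}: scatter {S.shape}, projection {U.shape}")
```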
Multilinear Independent Component Analysis
This is pretty much what it sounds like: an extension of ICA designed to work with the complexities of multilinear data. [11]
Multilinear Linear Discriminant Analysis
This is the multilinear version of LDA. It comes in a few forms:
- TTP-based: This includes methods like Discriminant Analysis with Tensor Representation (DATER) and General tensor discriminant analysis (GTDA). [14]
- TVP-based: Here we find Uncorrelated Multilinear Discriminant Analysis (UMLDA). [15]
Multilinear Canonical Correlation Analysis
Another extension, this time for CCA. Again, there are variations:
- TTP-based: Tensor Canonical Correlation Analysis (TCCA) [16]
- TVP-based: Multilinear Canonical Correlation Analysis (MCCA) [17], and Bayesian Multilinear Canonical Correlation Analysis (BMTF). [18]
Now, a word on these TTP and TVP terms.
- A TTP (tensor-to-tensor projection) projects a high-dimensional tensor to a lower-dimensional tensor of the same order, using N projection matrices for an Nth-order tensor. The projection can be carried out in N steps, each involving a tensor-matrix multiplication, and the steps are interchangeable in order. [19] This is essentially an extension of the higher-order singular value decomposition [19] into the realm of subspace learning, and its roots can be traced back to the Tucker decomposition from the 1960s. [20] (A short sketch contrasting TTP and TVP follows this list.)
- A TVP (tensor-to-vector projection) projects a high-dimensional tensor to a low-dimensional vector; it is also referred to as a rank-one projection. Because it maps a tensor to a vector, a TVP can be viewed as multiple projections from the tensor to a scalar: a TVP to a P-dimensional vector consists of P such projections. Each projection to a scalar is an elementary multilinear projection (EMP), in which the tensor is projected to a single point using N unit projection vectors, one per mode. Thus, a TVP of a tensor object to a vector in a P-dimensional vector space consists of P EMPs (see the sketch below). This projection is an extension of the canonical decomposition, [21] also known as the parallel factors (PARAFAC) decomposition. [22]
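For concreteness, here is a hedged sketch contrasting the two projection types on one random tensor. The shapes and ranks, the use of a truncated higher-order SVD to supply the TTP matrices, and the random unit vectors for the EMPs are all illustrative choices; actual MSL algorithms learn these quantities from training data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((64, 48, 30))            # one third-order observation

def unfold(tensor, mode):
    """Mode-n unfolding: the mode-n fibers become the columns of a matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_n_product(tensor, matrix, mode):
    """Multiply every mode-n fiber of `tensor` by `matrix`."""
    other = [s for i, s in enumerate(tensor.shape) if i != mode]
    result = matrix @ unfold(tensor, mode)
    return np.moveaxis(result.reshape([matrix.shape[0]] + other), 0, mode)

# --- TTP: tensor to an (8 x 6 x 5) tensor -------------------------------
# Here the N projection matrices come from a truncated higher-order SVD:
# the leading left singular vectors of each mode's unfolding.
ranks = (8, 6, 5)
ttp_matrices = []
for mode, r in enumerate(ranks):
    U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
    ttp_matrices.append(U[:, :r].T)

core = X
for mode, P in enumerate(ttp_matrices):   # N mode-wise multiplications,
    core = mode_n_product(core, P, mode)  # interchangeable in order
print("TTP:", X.shape, "->", core.shape)  # (64, 48, 30) -> (8, 6, 5)

# --- TVP: tensor to a P-dimensional vector ------------------------------
# Each elementary multilinear projection (EMP) uses one unit vector per
# mode and yields a single scalar; P EMPs give a P-dimensional feature.
# Random unit vectors stand in for vectors a real algorithm would learn.
P = 10
features = np.empty(P)
for p in range(P):
    u = [v / np.linalg.norm(v) for v in (rng.standard_normal(d) for d in X.shape)]
    features[p] = np.einsum('ijk,i,j,k->', X, *u)
print("TVP:", X.shape, "->", features.shape)  # (64, 48, 30) -> (10,)
```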
Typical Approach in MSL
Dealing with these multilinear methods often means you have N sets of parameters to figure out, one for each mode. The solution for one set usually depends on the others (unless you're in the simple, linear case where N=1). This dependency leads to a common approach: an iterative, suboptimal procedure. [23]
Here’s the general idea:
- First, you initialize the projections in each mode. Think of it as setting up initial guesses.
- Then, for each mode, you fix the projections in all the other modes and solve for the projection in the current mode.
- You repeat this process, optimizing mode by mode, for a set number of iterations or until the solutions stop changing significantly (convergence).
This iterative, alternating optimization strategy is borrowed from the alternating least squares method used in multi-way data analysis. [10]
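Below is a sketch of this alternating procedure. An MPCA-like variance-maximizing step is used for each mode purely as a stand-in criterion (the text above describes the loop structure, not this particular objective), and the initialization, ranks, and convergence test are likewise illustrative.

```python
import numpy as np

def mode_n_product(tensor, matrix, mode):
    """Multiply every mode-n fiber of `tensor` by `matrix`."""
    other = [s for i, s in enumerate(tensor.shape) if i != mode]
    result = matrix @ np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)
    return np.moveaxis(result.reshape([matrix.shape[0]] + other), 0, mode)

def alternating_msl(samples, ranks, max_iters=10, tol=1e-6):
    """Alternating (mode-by-mode) optimization of N projection matrices.

    samples: array of shape (num_samples, I1, ..., IN), assumed centered.
    ranks:   target dimension per mode.
    """
    n_modes = samples.ndim - 1
    # Step 1: initialize each mode's projection (here: a truncated identity).
    projections = [np.eye(r, samples.shape[m + 1]) for m, r in enumerate(ranks)]
    prev_objective = -np.inf

    for _ in range(max_iters):
        for mode in range(n_modes):
            # Step 2: fix the other modes' projections and partially project.
            partial = samples
            for other in range(n_modes):
                if other != mode:
                    partial = np.stack([mode_n_product(x, projections[other], other)
                                        for x in partial])
            # Solve for this mode: leading eigenvectors of its scatter matrix.
            unfoldings = [np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)
                          for x in partial]
            scatter = sum(u @ u.T for u in unfoldings)
            _, eigvecs = np.linalg.eigh(scatter)
            projections[mode] = eigvecs[:, -ranks[mode]:][:, ::-1].T
        # Step 3: stop when the captured variance stops improving.
        objective = float(np.trace(scatter))
        if abs(objective - prev_objective) < tol * max(abs(prev_objective), 1.0):
            break
        prev_objective = objective
    return projections

# Toy usage: 50 centered samples of size 16 x 12 x 8, projected to 4 x 3 x 2.
rng = np.random.default_rng(0)
data = rng.random((50, 16, 12, 8))
data -= data.mean(axis=0)
Us = alternating_msl(data, ranks=(4, 3, 2))
print([U.shape for U in Us])   # [(4, 16), (3, 12), (2, 8)]
```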
Code
If you're inclined to actually do this, there are some resources available:
- The MATLAB Tensor Toolbox from Sandia National Laboratories.
- The MPCA algorithm itself, written in Matlab (includes MPCA+LDA).
- The UMPCA algorithm, also in Matlab (comes with data).
- The UMLDA algorithm, again in Matlab (and includes data).
Tensor Data Sets
For those who like to get their hands dirty with actual data, here are some examples of tensor datasets you might encounter:
- 3D gait data (third-order tensors): 128x88x20 (21.2M), 64x44x20 (9.9M), and 32x22x10 (3.2M). Each tensor represents a single gait (walking) sequence.
See Also
If you're still lost, or perhaps want to dive even deeper into the abyss, you might find these related topics useful: