Classical Multidimensional Scaling
This post is about a dimensionality reduction method I used in my master’s thesis.
Introduction
Before we start to talk about classical Multidimensional Scaling (MDS), let’s first consider one simple problem from a middle school math textbook: given the coordinates of two cities (Berlin and Hamburg, for example), how do we find the squared distance between them?
It’s quite simple, right? Assume the coordinate of Berlin is $a = (a_1, a_2)$ and the coordinate of Hamburg is $b = (b_1, b_2)$, and denote the squared distance by $d^2(a, b)$. Then for the Euclidean distance we have

$$d^2(a, b) = (a_1 - b_1)^2 + (a_2 - b_2)^2$$
Now let’s go further into the $m$-dimensional case: how about the squared distance between point $a = (a_1, \dots, a_m)$ and point $b = (b_1, \dots, b_m)$? The formula now becomes

$$d^2(a, b) = \sum_{k=1}^{m} (a_k - b_k)^2$$
If we have $n$ points, then we need an $n \times n$ distance matrix $D = [d_{ij}^2]$ to describe the squared distances between each pair of points.
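To make this forward direction concrete, here is a small NumPy sketch that builds $D$ from a coordinate matrix; the three city coordinates and the helper name `squared_distance_matrix` are made up purely for illustration:

```python
import numpy as np

def squared_distance_matrix(X):
    """Return the n x n matrix of squared Euclidean distances
    between the rows of the n x m coordinate matrix X."""
    z = np.sum(X**2, axis=1, keepdims=True)  # squared norm of each point
    # d_ij^2 = ||x_i||^2 - 2 x_i . x_j + ||x_j||^2
    return z - 2 * X @ X.T + z.T

# Made-up 2-D coordinates for three cities.
X = np.array([[0.0, 0.0],   # city A
              [3.0, 4.0],   # city B
              [6.0, 8.0]])  # city C
D = squared_distance_matrix(X)
print(D[0, 1])  # 25.0, since d(A, B)^2 = 3^2 + 4^2
```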
For now, nothing special: I just went from the two-point distance case to the general case using a distance matrix. The basic idea is the same, which is to calculate distances, or find the distance matrix, given coordinates.
Now consider the reverse situation: we know the distance matrix; is it possible to find the coordinates of each point? If yes, then how?
Classical MDS can address this problem.
Definition
According to Wikipedia [3], MDS is a means of visualizing the level of similarity of individual cases of a dataset. An MDS algorithm aims to place each object in an $N$-dimensional space such that the between-object distances are preserved as well as possible.
The definition from Florian Wickelmaier’s paper [1] is similar. The data for MDS are called proximities, which indicate the overall similarity or dissimilarity of the objects. An MDS program looks for a spatial configuration of the objects, so that the distances between the objects match their proximities as closely as possible.
In our city-distance example, the proximity is the distance matrix, and classical MDS will give us the spatial configuration of the cities, that is, the coordinates of each city.
Classical MDS is also known as Principal Coordinates Analysis (PCoA).
Classical MDS
Given

An $n \times n$ matrix $D = [d_{ij}^2]$ of squared Euclidean distances between $n$ points.
Goal

The $n \times m$ coordinate matrix $X$, whose $i$-th row $x_i^T$ contains the coordinates of point $i$.
Assumption
The coordinate matrix $X$ has column means equal to 0; we may assume this since distances don’t change under translations.
Solution
Firstly, we define two $n$-element column vectors: $z$, whose entries are the squared norms $z_i = x_i^T x_i$, and the all-ones vector $\mathbf{1} = (1, \dots, 1)^T$.

Then we have

$$d_{ij}^2 = (x_i - x_j)^T (x_i - x_j) = x_i^T x_i - 2\, x_i^T x_j + x_j^T x_j$$

So the squared distance matrix can be decomposed into 3 parts:

$$D = z \mathbf{1}^T - 2\, X X^T + \mathbf{1} z^T$$
Now we need to introduce one special matrix called the centering matrix

$$C = I_n - \frac{1}{n} \mathbf{1} \mathbf{1}^T$$

where $n$ is the number of elements in the vector $\mathbf{1}$ and $I_n$ is the $n \times n$ identity matrix.
The property of the centering matrix we are going to use here is that by the multiplication $CX$, the mean from each of the columns in $X$ will be removed, and by the multiplication $XC$, the mean from each of the rows in $X$ will be removed.
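A quick check of this property in NumPy (the test matrix `X` below is arbitrary made-up data):

```python
import numpy as np

def centering(n):
    """The n x n centering matrix C = I - (1/n) * 1 1^T."""
    return np.eye(n) - np.ones((n, n)) / n

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 9.0]])

print((centering(3) @ X).mean(axis=0))  # CX removes column means -> [0. 0.]
print((X @ centering(2)).mean(axis=1))  # XC removes row means -> [0. 0. 0.]
```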
The result of $-\frac{1}{2} C D C$ is

$$B := -\frac{1}{2}\, C D C = -\frac{1}{2}\, C \left( z \mathbf{1}^T - 2\, X X^T + \mathbf{1} z^T \right) C = X X^T$$
Because

- The result of $C \mathbf{1}$ or $\mathbf{1}^T C$ is a vector, and all elements in this vector are $0$, so the terms $C z \mathbf{1}^T C$ and $C \mathbf{1} z^T C$ vanish.
- The mean from each of the columns in $X$ is 0, and therefore $C X = X$, which leaves $C X X^T C = X X^T$.
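As a numerical sanity check of the identity $-\frac{1}{2} C D C = X X^T$, here is a short sketch under the stated zero-column-mean assumption (random points; the variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
X = rng.standard_normal((n, 2))
X -= X.mean(axis=0)                  # enforce zero column means

z = np.sum(X**2, axis=1, keepdims=True)
D = z - 2 * X @ X.T + z.T            # D = z 1^T - 2 X X^T + 1 z^T

C = np.eye(n) - np.ones((n, n)) / n  # centering matrix
B = -0.5 * C @ D @ C

print(np.allclose(C @ np.ones(n), 0))  # C1 is the zero vector
print(np.allclose(C @ X, X))           # CX = X for centered X
print(np.allclose(B, X @ X.T))         # B = X X^T
```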
Apparently, $B$ is a symmetric matrix, and by using eigendecomposition we will obtain

$$B = V \Lambda V^T = \left( V \Lambda^{1/2} \right) \left( V \Lambda^{1/2} \right)^T$$
$V$ is the matrix whose columns are the eigenvectors of the matrix $B$.

$\Lambda$ is the diagonal matrix whose entries are the eigenvalues of $B$.

$\Lambda^{1/2}$ is the result of the element-wise square root of $\Lambda$.
Finally, the problem of finding the coordinate matrix $X$ now becomes finding the eigenvalues and eigenvectors of $B$: comparing the factorization above with $B = X X^T$ gives $X = V \Lambda^{1/2}$.
Depending on the desired output dimension $k$ (usually $2$ or $3$ for visualization), we take the $k$ largest eigenvalues and the corresponding eigenvectors, and we obtain

$$X_k = V_k \Lambda_k^{1/2}$$

where $\Lambda_k$ is the diagonal matrix of the $k$ largest eigenvalues and $V_k$ holds the corresponding eigenvectors.
Negative eigenvalues are simply ignored as error in classical MDS.
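Putting all the steps together, a minimal NumPy sketch of the whole procedure might look as follows; the function name `classical_mds` and its interface are my own choices, not from [1]:

```python
import numpy as np

def classical_mds(D, k=2):
    """Recover a k-dimensional configuration from an n x n matrix D
    of squared Euclidean distances (classical MDS / PCoA)."""
    n = D.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * C @ D @ C                  # double centering: B = X X^T
    eigvals, eigvecs = np.linalg.eigh(B)  # eigendecomposition of symmetric B
    order = np.argsort(eigvals)[::-1]     # eigenvalues in decreasing order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    lam = np.maximum(eigvals[:k], 0)      # k largest; clip negatives (error)
    return eigvecs[:, :k] * np.sqrt(lam)  # X_k = V_k Lambda_k^{1/2}

# Recover 2-D coordinates from the squared distance matrix D built earlier.
X_hat = classical_mds(D, k=2)
```

Note that `X_hat` reproduces the original configuration only up to rotation and reflection (and it is centered at the origin by construction), since distances are invariant under these transformations.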
Summary
Classical MDS gives us an analytical solution for finding coordinates given a distance matrix. The limitation of classical MDS is also obvious: since it assumes Euclidean distances, it is not applicable to direct dissimilarity ratings [3].
For the choice of the output dimension $k$, Sibson (1979) suggests that the sum of the eigenvalues in $\Lambda_k$ should approximate the sum of all eigenvalues in $\Lambda$, so that small negative eigenvalues cancel out small positive eigenvalues [2].
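One possible way to turn this suggestion into code, reusing the eigenvalues of $B$ from above; the tolerance `tol` is a parameter I introduce for illustration, not something prescribed in [2]:

```python
import numpy as np

def choose_dimension(eigvals, tol=0.99):
    """Pick the smallest k such that the k largest eigenvalues sum to
    (roughly) the sum of all eigenvalues, letting small negative and
    small positive eigenvalues cancel out."""
    eigvals = np.sort(eigvals)[::-1]  # decreasing order
    partial = np.cumsum(eigvals)
    total = eigvals.sum()             # sum over *all* eigenvalues
    # The last partial sum equals total, so a match always exists.
    return int(np.argmax(partial >= tol * total)) + 1
```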
References
[1] Florian Wickelmaier. An Introduction to MDS. Aalborg University, 2003.
[2] Ingwer Borg and Patrick Groenen. Modern Multidimensional Scaling: Theory and Applications. Springer.
[3] Wikipedia: Multidimensional scaling, https://en.wikipedia.org/wiki/Multidimensional_scaling