Classical Multidimensional Scaling

This post is about a dimensionality reduction method I used in my master’s thesis.

Introduction

Before we start talking about classical Multidimensional Scaling (MDS), let’s first consider a simple problem from a middle school math textbook: given the coordinates of two cities (Berlin and Hamburg, for example), how do we find the squared distance between them?

It’s quite simple, right? Assume the coordinates of Berlin are $(x_1, y_1)$ and the coordinates of Hamburg are $(x_2, y_2)$, and denote their squared distance by $d_{12}^2$. Then for the Euclidean distance we have

$$d_{12}^2 = (x_1 - x_2)^2 + (y_1 - y_2)^2$$

Now let’s go further to the $m$-dimensional case: what is the squared distance between a point $x_i = (x_{i1}, \dots, x_{im})$ and a point $x_j = (x_{j1}, \dots, x_{jm})$? The formula now becomes

$$d_{ij}^2 = \sum_{k=1}^{m} (x_{ik} - x_{jk})^2 = (x_i - x_j)^\top (x_i - x_j) = x_i^\top x_i - 2\,x_i^\top x_j + x_j^\top x_j$$
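
To make the identity concrete, here is a quick numerical check (a minimal NumPy sketch; the two sample points are made up purely for illustration):

```python
import numpy as np

# Two arbitrary points in m = 3 dimensions (values chosen only for illustration).
x_i = np.array([1.0, 2.0, 3.0])
x_j = np.array([4.0, 0.0, -1.0])

# Squared distance as a sum of squared coordinate differences.
d_sq_direct = np.sum((x_i - x_j) ** 2)

# The same quantity via the inner-product expansion.
d_sq_expanded = x_i @ x_i - 2 * x_i @ x_j + x_j @ x_j

print(d_sq_direct, d_sq_expanded)  # both print 29.0
```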

If we have $n$ points, then we need an $n \times n$ distance matrix $D$ to describe the squared distances between each pair of points.

$$D = \begin{bmatrix} d_{11}^2 & d_{12}^2 & \cdots & d_{1n}^2 \\ d_{21}^2 & d_{22}^2 & \cdots & d_{2n}^2 \\ \vdots & \vdots & \ddots & \vdots \\ d_{n1}^2 & d_{n2}^2 & \cdots & d_{nn}^2 \end{bmatrix}$$
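
The forward direction, coordinates to squared-distance matrix, follows directly from this definition. A minimal NumPy sketch (the helper name `squared_distance_matrix` and the toy coordinates are my own choices, for illustration; later snippets in this post reuse them):

```python
import numpy as np

def squared_distance_matrix(X):
    """Pairwise squared Euclidean distances between the rows of an (n, m) array X."""
    diff = X[:, None, :] - X[None, :, :]   # shape (n, n, m): x_i - x_j for every pair
    return np.sum(diff ** 2, axis=-1)      # shape (n, n): entries d_ij^2

coords = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])   # toy 2-D coordinates
print(squared_distance_matrix(coords))
# [[ 0.  9. 16.]
#  [ 9.  0. 25.]
#  [16. 25.  0.]]
```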

So far, nothing special: I have just moved from the two-point case to the general case using a distance matrix. The basic idea is the same, namely to calculate distances, or build the distance matrix, given the coordinates.

Now consider the reverse situation: we know the distance matrix; is it possible to find the coordinates of each point? If yes, then how?

Classical MDS can address this problem.

Definition

According to Wikipedia [3], MDS is a means of visualizing the level of similarity of individual cases of a dataset. An MDS algorithm aims to place each object in an $N$-dimensional space such that the between-object distances are preserved as well as possible.

The definition from Florian Wickelmaier’s paper [1] is similar. The data for MDS are called proximities, which indicate the overall similarity or dissimilarity of the objects. An MDS program looks for a spatial configuration of the objects such that the distances between the objects match their proximities as closely as possible.

In our city-distance example, the proximity is the distance matrix, and classical MDS will give us the spatial configuration of the cities, i.e. the coordinates of each city.

Classical MDS is also known as Principal Coordinates Analysis (PCoA).

Classical MDS

Given

The $n \times n$ matrix $D = [d_{ij}^2]$ of squared distances between $n$ points.

Goal

The $n \times m$ coordinate matrix $X$, whose $i$-th row is $x_i^\top$.

Assumption

The coordinate matrix $X$ has column means equal to 0; this loses no generality, since distances don’t change under translation.

Solution

Firstly, we define two $n$-element column vectors $z$ and $\mathbf{1}$:

$$z = \begin{bmatrix} x_1^\top x_1 \\ \vdots \\ x_n^\top x_n \end{bmatrix}, \qquad \mathbf{1} = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}$$

Then we have

$$d_{ij}^2 = x_i^\top x_i - 2\,x_i^\top x_j + x_j^\top x_j = z_i - 2\,x_i^\top x_j + z_j$$

The distance matrix can therefore be decomposed into three parts:

$$D = z\mathbf{1}^\top - 2\,XX^\top + \mathbf{1}z^\top$$
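
This decomposition is easy to verify numerically; a sketch reusing `squared_distance_matrix` and `coords` from the snippet above:

```python
import numpy as np

z = np.sum(coords ** 2, axis=1, keepdims=True)   # column vector with entries x_i^T x_i
ones = np.ones((coords.shape[0], 1))             # the all-ones column vector

D_decomposed = z @ ones.T - 2 * coords @ coords.T + ones @ z.T
print(np.allclose(D_decomposed, squared_distance_matrix(coords)))  # True
```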

Now we need to introduce a special matrix called the centering matrix $C_n$:

$$C_n = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$$

where $n$ is the number of elements in the vector $\mathbf{1}$.

The property of the centering matrix we are going to use here is that the multiplication $C_n A$ removes the mean from each of the columns of a matrix $A$, and the multiplication $A C_n$ removes the mean from each of the rows of $A$.
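
A small demonstration of this property (the matrix `A` below is arbitrary, chosen only for illustration):

```python
import numpy as np

n = 4
C = np.eye(n) - np.ones((n, n)) / n                 # centering matrix C_n

A = np.arange(16, dtype=float).reshape(n, n) ** 2   # an arbitrary 4x4 matrix
print((C @ A).mean(axis=0))   # column means of C_n A: all ~0
print((A @ C).mean(axis=1))   # row means of A C_n: all ~0
```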

The result of $-\frac{1}{2} C_n D C_n$ is

$$B = -\frac{1}{2}\, C_n D C_n = XX^\top$$

Because

  1. The result of $C_n \mathbf{1}$ (or $\mathbf{1}^\top C_n$) is a vector whose elements are all $0$, so the terms $z\mathbf{1}^\top$ and $\mathbf{1}z^\top$ vanish under double centering.
  2. The mean of each of the columns of $X$ is 0, so $C_n X = X$ and $X^\top C_n = X^\top$.
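
A quick numerical check of $B = XX^\top$, continuing the toy example from above (the coordinates are centered first so that the zero-column-mean assumption holds):

```python
import numpy as np

Xc = coords - coords.mean(axis=0)        # center the toy coordinates
D = squared_distance_matrix(Xc)          # distances are invariant under this shift

n = Xc.shape[0]
C = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * C @ D @ C                     # double centering

print(np.allclose(B, Xc @ Xc.T))         # True
```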

Clearly, $B$ is a symmetric matrix, and by using eigendecomposition we obtain

$$B = V \Lambda V^\top = \left(V \Lambda^{1/2}\right)\left(V \Lambda^{1/2}\right)^\top = XX^\top$$

$V$ is the matrix whose columns are the eigenvectors of matrix $B$.

$\Lambda$ is a diagonal matrix whose entries are the eigenvalues of $B$.

$\Lambda^{1/2}$ is the result of the element-wise square root of $\Lambda$.

Finally, the problem of finding the coordinate matrix now becomes finding the eigenvalues and eigenvectors of $B$.

Depending on the desired output dimension $k$ (usually $k < m$, e.g. 2 or 3 for visualization), we take the $k$ largest eigenvalues and their corresponding eigenvectors, and obtain the $n \times k$ coordinate matrix $X$:

$$X = V_k \Lambda_k^{1/2}$$

Negative eigenvalues are simply ignored as error in classical MDS.
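
Putting the whole procedure together, here is a minimal NumPy sketch of classical MDS (my own naming, not an optimized implementation; the usage example reuses `squared_distance_matrix` and `coords` from above):

```python
import numpy as np

def classical_mds(D, k=2):
    """Recover a k-dimensional configuration from an (n, n) squared-distance matrix D."""
    n = D.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * C @ D @ C                    # double centering: B = X X^T

    eigvals, eigvecs = np.linalg.eigh(B)    # eigenvalues of symmetric B, ascending
    idx = np.argsort(eigvals)[::-1][:k]     # indices of the k largest eigenvalues
    lam = np.clip(eigvals[idx], 0.0, None)  # ignore negative eigenvalues as error
    return eigvecs[:, idx] * np.sqrt(lam)   # X = V_k Lambda_k^{1/2}

# Recover 2-D coordinates from the toy distance matrix built earlier. The result can
# differ from the original coords by rotation, reflection, and translation, but the
# pairwise distances are preserved.
X_hat = classical_mds(squared_distance_matrix(coords), k=2)
print(np.allclose(squared_distance_matrix(X_hat), squared_distance_matrix(coords)))  # True
```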

Summary

Classical MDS gives us an analytical solution for finding coordinates given a distance matrix. The limitation of classical MDS is also obvious: since it assumes Euclidean distances, it is not applicable to direct dissimilarity ratings [3].

For the choice of the output dimension $k$, Sibson (1979) suggests that the sum of the eigenvalues in $\Lambda_k$ should approximate the sum of all eigenvalues in $\Lambda$, so that small negative eigenvalues cancel out small positive eigenvalues [2].
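
One way to apply this criterion in practice is to look at the cumulative sums of the eigenvalues of $B$; a sketch, with `eigenvalue_profile` a hypothetical helper built on the same double centering as above:

```python
import numpy as np

def eigenvalue_profile(D):
    """Eigenvalues of B in descending order, as a guide for choosing k."""
    n = D.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * C @ D @ C
    return np.linalg.eigvalsh(B)[::-1]

eigvals = eigenvalue_profile(squared_distance_matrix(coords))
# Pick the smallest k whose leading eigenvalues already account for (roughly) the total.
print(np.cumsum(eigvals) / eigvals.sum())
```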

References

[1] Florian Wickelmaier, An Introduction to MDS, Aalborg University, 2003

[2] Ingwer Borg, Patrick Groenen, Modern Multidimensional Scaling: Theory and Applications, Springer

[3] https://en.wikipedia.org/wiki/Multidimensional_scaling

Written on March 8, 2018