Classical Multidimensional Scaling
This post is about a dimensionality reduction method I used in my master’s thesis.
Introduction
Before we start to talk about classical Multidimensional Scaling (MDS), let’s first consider one simple problem from a middle school math textbook: given the coordinates of two cities (Berlin and Hamburg, for example), how do we find the squared distance between them?
It’s quite simple, right? Assume the coordinate of Berlin is $a = (a_1, a_2)$ and the coordinate of Hamburg is $b = (b_1, b_2)$, and denote the squared distance by $d^2(a, b)$. Then for the Euclidean distance we have

$$d^2(a, b) = (a_1 - b_1)^2 + (a_2 - b_2)^2$$
Now let’s go further into the $m$-dimensional case: how about the squared distance between point $a = (a_1, \dots, a_m)$ and point $b = (b_1, \dots, b_m)$? The formula now becomes

$$d^2(a, b) = \sum_{k=1}^{m} (a_k - b_k)^2$$
If we have $n$ points, then we need an $n \times n$ distance matrix $D = [d_{ij}^2]$ to describe the squared distances between each pair of points.
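To make this forward direction concrete, here is a small NumPy sketch that builds $D$ from a coordinate matrix; the three city coordinates and the helper name `squared_distance_matrix` are made up purely for illustration:

```python
import numpy as np

def squared_distance_matrix(X):
    """Return the n x n matrix of squared Euclidean distances
    between the rows of the n x m coordinate matrix X."""
    z = np.sum(X**2, axis=1, keepdims=True)  # squared norm of each point
    # d_ij^2 = ||x_i||^2 - 2 x_i . x_j + ||x_j||^2
    return z - 2 * X @ X.T + z.T

# Made-up 2-D coordinates for three cities.
X = np.array([[0.0, 0.0],   # city A
              [3.0, 4.0],   # city B
              [6.0, 8.0]])  # city C
D = squared_distance_matrix(X)
print(D[0, 1])  # 25.0, since d(A, B)^2 = 3^2 + 4^2
```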
For now, nothing special: I just went from the two-point distance case to the general case using a distance matrix. The basic idea is the same, which is to calculate distances, or find the distance matrix, given coordinates.
Now consider the reverse situation: we know the distance matrix; is it possible to find the coordinates of each point? If yes, then how?
Classical MDS can address this problem.
Definition
According to Wikipedia [3], MDS is a means of visualizing the level of similarity of individual cases of a dataset. An MDS algorithm aims to place each object in an $N$-dimensional space such that the between-object distances are preserved as well as possible.
The definition from Florian Wickelmaier’s paper [1] is similar. The data for MDS are called proximities, which indicate the overall similarity or dissimilarity of the objects. An MDS program looks for a spatial configuration of the objects, so that the distances between the objects match their proximities as closely as possible.
In our city-distance example, the proximity is the distance matrix, and classical MDS will give us the spatial configuration of the cities, that is, the coordinates of each city.
Classical MDS is also known as Principal Coordinates Analysis (PCoA).
Classical MDS
Given

An $n \times n$ matrix $D = [d_{ij}^2]$ of squared Euclidean distances between $n$ points.
Goal

The $n \times m$ coordinate matrix $X$, whose $i$-th row $x_i^T$ contains the coordinates of point $i$.
Assumption
The coordinate matrix $X$ has column means equal to 0; we may assume this since distances don’t change under translations.
Solution
Firstly, we define two $n$-element column vectors: $z$, whose entries are the squared norms $z_i = x_i^T x_i$, and the all-ones vector $\mathbf{1} = (1, \dots, 1)^T$.

Then we have

$$d_{ij}^2 = (x_i - x_j)^T (x_i - x_j) = x_i^T x_i - 2\, x_i^T x_j + x_j^T x_j$$

So the squared distance matrix can be decomposed into 3 parts:

$$D = z \mathbf{1}^T - 2\, X X^T + \mathbf{1} z^T$$
Now we need to introduce one special matrix called the centering matrix

$$C = I_n - \frac{1}{n} \mathbf{1} \mathbf{1}^T$$

where $n$ is the number of elements in the vector $\mathbf{1}$ and $I_n$ is the $n \times n$ identity matrix.
The property of the centering matrix we are going to use here is that by the multiplication $CX$, the mean from each of the columns in $X$ will be removed, and by the multiplication $XC$, the mean from each of the rows in $X$ will be removed.
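A quick check of this property in NumPy (the test matrix `X` below is arbitrary made-up data):

```python
import numpy as np

def centering(n):
    """The n x n centering matrix C = I - (1/n) * 1 1^T."""
    return np.eye(n) - np.ones((n, n)) / n

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 9.0]])

print((centering(3) @ X).mean(axis=0))  # CX removes column means -> [0. 0.]
print((X @ centering(2)).mean(axis=1))  # XC removes row means -> [0. 0. 0.]
```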
The result of $-\frac{1}{2} C D C$ is

$$B := -\frac{1}{2}\, C D C = -\frac{1}{2}\, C \left( z \mathbf{1}^T - 2\, X X^T + \mathbf{1} z^T \right) C = X X^T$$
Because

- The result of $C \mathbf{1}$ or $\mathbf{1}^T C$ is a vector, and all elements in this vector are $0$, so the terms $C z \mathbf{1}^T C$ and $C \mathbf{1} z^T C$ vanish.
- The mean from each of the columns in $X$ is 0, and therefore $C X = X$, which leaves $C X X^T C = X X^T$.
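As a numerical sanity check of the identity $-\frac{1}{2} C D C = X X^T$, here is a short sketch under the stated zero-column-mean assumption (random points; the variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
X = rng.standard_normal((n, 2))
X -= X.mean(axis=0)                  # enforce zero column means

z = np.sum(X**2, axis=1, keepdims=True)
D = z - 2 * X @ X.T + z.T            # D = z 1^T - 2 X X^T + 1 z^T

C = np.eye(n) - np.ones((n, n)) / n  # centering matrix
B = -0.5 * C @ D @ C

print(np.allclose(C @ np.ones(n), 0))  # C1 is the zero vector
print(np.allclose(C @ X, X))           # CX = X for centered X
print(np.allclose(B, X @ X.T))         # B = X X^T
```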
Apparently, $B$ is a symmetric matrix, and by using eigendecomposition we will obtain

$$B = V \Lambda V^T = \left( V \Lambda^{1/2} \right) \left( V \Lambda^{1/2} \right)^T$$
$V$ is the matrix whose columns are the eigenvectors of the matrix $B$.

$\Lambda$ is the diagonal matrix whose entries are the eigenvalues of $B$.

$\Lambda^{1/2}$ is the result of the element-wise square root of $\Lambda$.
Finally, the problem of finding the coordinate matrix $X$ now becomes finding the eigenvalues and eigenvectors of $B$: comparing the factorization above with $B = X X^T$ gives $X = V \Lambda^{1/2}$.
Depending on the desired output dimension $k$ (usually $2$ or $3$ for visualization), we take the $k$ largest eigenvalues and the corresponding eigenvectors, and we obtain

$$X_k = V_k \Lambda_k^{1/2}$$

where $\Lambda_k$ is the diagonal matrix of the $k$ largest eigenvalues and $V_k$ holds the corresponding eigenvectors.
Negative eigenvalues are simply ignored as error in classical MDS.
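Putting all the steps together, a minimal NumPy sketch of the whole procedure might look as follows; the function name `classical_mds` and its interface are my own choices, not from [1]:

```python
import numpy as np

def classical_mds(D, k=2):
    """Recover a k-dimensional configuration from an n x n matrix D
    of squared Euclidean distances (classical MDS / PCoA)."""
    n = D.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * C @ D @ C                  # double centering: B = X X^T
    eigvals, eigvecs = np.linalg.eigh(B)  # eigendecomposition of symmetric B
    order = np.argsort(eigvals)[::-1]     # eigenvalues in decreasing order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    lam = np.maximum(eigvals[:k], 0)      # k largest; clip negatives (error)
    return eigvecs[:, :k] * np.sqrt(lam)  # X_k = V_k Lambda_k^{1/2}

# Recover 2-D coordinates from the squared distance matrix D built earlier.
X_hat = classical_mds(D, k=2)
```

Note that `X_hat` reproduces the original configuration only up to rotation and reflection (and it is centered at the origin by construction), since distances are invariant under these transformations.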
Summary
Classical MDS gives us an analytical solution for finding coordinates given a distance matrix. The limitation of classical MDS is also obvious: since it assumes Euclidean distances, it is not applicable to direct dissimilarity ratings [3].
For the choice of the output dimension $k$, Sibson (1979) suggests that the sum of the eigenvalues in $\Lambda_k$ should approximate the sum of all eigenvalues in $\Lambda$, so that small negative eigenvalues cancel out small positive eigenvalues [2].
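One possible way to turn this suggestion into code, reusing the eigenvalues of $B$ from above; the tolerance `tol` is a parameter I introduce for illustration, not something prescribed in [2]:

```python
import numpy as np

def choose_dimension(eigvals, tol=0.99):
    """Pick the smallest k such that the k largest eigenvalues sum to
    (roughly) the sum of all eigenvalues, letting small negative and
    small positive eigenvalues cancel out."""
    eigvals = np.sort(eigvals)[::-1]  # decreasing order
    partial = np.cumsum(eigvals)
    total = eigvals.sum()             # sum over *all* eigenvalues
    # The last partial sum equals total, so a match always exists.
    return int(np.argmax(partial >= tol * total)) + 1
```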
References
[1] Florian Wickelmaier. An Introduction to MDS. Aalborg University, 2003.
[2] Ingwer Borg and Patrick Groenen. Modern Multidimensional Scaling: Theory and Applications. Springer.
[3] Wikipedia: Multidimensional scaling, https://en.wikipedia.org/wiki/Multidimensional_scaling