This documents an informal experiment to see how lapbin.m, a Matlab function for the Laplacian algorithm with binary weights, behaves when its graph is built with varying numbers of nearest neighbors.
(Note that for lapbin.m to run, you also need the file L2_distance.m by Roland Bunschoten.)
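For orientation, the core of the binary-weights Laplacian algorithm can be sketched in Python. This is an illustration of the general technique (k-nearest-neighbor graph, binary weights, generalized eigenproblem), not the actual lapbin.m code:

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def laplacian_eigenmap_binary(data, dim, k):
    """Embed `data` (n x d) into `dim` dimensions using a binary-weight
    k-nearest-neighbor graph. A sketch of the algorithm, not lapbin.m itself."""
    n = data.shape[0]
    D = cdist(data, data)                      # pairwise distances
    W = np.zeros((n, n))
    idx = np.argsort(D, axis=1)[:, 1:k + 1]   # k nearest neighbors, self excluded
    for i in range(n):
        W[i, idx[i]] = 1.0                     # binary weights
    W = np.maximum(W, W.T)                     # symmetrize the graph
    deg = W.sum(axis=1)
    L = np.diag(deg) - W                       # graph Laplacian
    # generalized eigenproblem  L v = lambda * Deg * v
    vals, vecs = eigh(L, np.diag(deg))
    # drop the trivial constant eigenvector (eigenvalue ~ 0)
    return vecs[:, 1:dim + 1], vals[1:dim + 1]

# toy usage: embed a noisy circle sampled in 3d back into 2d
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 200)
pts = np.c_[np.cos(t), np.sin(t), 0.1 * rng.standard_normal(200)]
E, V = laplacian_eigenmap_binary(pts, 2, 10)
print(E.shape)   # (200, 2)
```

The call signature mirrors lapbin(data, dim, k) from the experiments below, but the internals here are only an assumption about what such a function does.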
The test was quite simple: create several points in 2d, map them to 3d with some smooth function, then map them back to 2d with the Laplacian algorithm and see what happens.
The original data was created by randomly sampling from a Gaussian mixture model with centers/means at (0,0), (0,5), (5,0), and (5,5). The covariance of each Gaussian was the 2x2 identity matrix. This was created with the makegaussmixnd.m and plotcol.m functions:
ppm = [400 400 400 400]; % points per mixture
[data2,labels] = makegaussmixnd([0 0; 0 5; 5 0; 5 5], 1, ppm);
plotcol(data2, ppm, 'rgbk');
And then printing the resulting plot with the printt.m function:
The resulting image files original_5x5_400.eps and original_5x5_400.jpg look like this:
There are 400 points in each of the four clusters above.
The data was then converted from 2 to 3 dimensions with the mapping (x,y) -> (x cos 2x, y, x sin 2x). Matlab code for this:
X = data2(:,1); Y = data2(:,2);
data3 = [X.*cos(X*2) Y X.*sin(X*2)];
plotcol(data3, ppm, 'rgbk');
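The same lift, written as a Python sketch (the names mirror the Matlab snippet above). A useful property worth noting: the first and third coordinates satisfy sqrt((x cos 2x)^2 + (x sin 2x)^2) = |x|, so the map winds the x-axis into a spiral while leaving y untouched:

```python
import numpy as np

def lift(data2):
    """Map 2d points (x, y) to (x*cos(2x), y, x*sin(2x)),
    mirroring the Matlab snippet above."""
    x, y = data2[:, 0], data2[:, 1]
    return np.column_stack([x * np.cos(2 * x), y, x * np.sin(2 * x)])

# the radius in the (1st, 3rd) coordinate plane equals |x|,
# so the spiral preserves distance from the y-axis
pts = np.array([[0.0, 1.0], [5.0, 2.0]])
print(lift(pts))
```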
This results in the following picture, made using the Partiview files points3d.cf, points3d.speck (created by just saying in Matlab
speck = [data3 labels']; save points3d.speck speck -ascii
and then adding the line 'datavar 0 lab' at its start) and points3d.cmap.
Now we can map this back to 2 dimensions. This can be done with
[E2,V2]=lapbin(data3,2,10); % using 10 nearest neighbors
And get this (eps)
We could also map it to 3 dimensions, even though we are in 3 dimensions already.
If we just plot the first two components, we get pretty much the same picture as before (as theory says... but have a look at the last few lines of lapbin.m). It is not quite the same picture, though, since the leigs function is randomized and the gradient descent (or equivalent) it uses runs differently each time.
The top six eigenvalues are 0.0130, 0.0275, 0.0697, 0.0850, 0.0975, and 0.1429.
This does not look like the original data set. And there is information in the third component as well. Below is what happens when that component is included too.
All the above computations used 10 nearest neighbors to build the Laplacian algorithm's graph. What if we use 5 instead? Then we get this:
Note the blue section at the end of the black tail. Not good. If we use 20 instead, then the graph bridges the twist in the manifold (in 3d); again, not good: it is not being local enough. On the other hand, the 10-nearest-neighbor version was doing this as well.
Use the binary-weights version of the Laplacian algorithm with care.