Partiview for Machine Learning
A Matlab Interface
by Dinoj Surendran, with many thanks to Stuart Levy for patiently answering several questions!

This has been superseded. Go to the Ndaona Project Page instead.
This page is updated regularly. Reload every time. On IE, that can mean adding 'index.html' to the url and then reloading. Go figure.

If Partiview proves useful for you, Stuart and I would appreciate it a great deal if you cited the paper Visualizing High Dimensional Datasets Using Partiview from the Proceedings of INFOVIS 2004.

The documentation on this page is for those of you who like step-by-step instructions and want to know what's going on. If you prefer learning by example, just read the installation instructions and then head to this worked-out example with some face recognition data.
Here are some notes that you could live without, on the assumption that living with them can't be too bad either.
If the above documentation doesn't help, try reading the Partiview User's Guide by Brian Abbott (it's very readable) or the Partiview Reference Manual by Peter Teuben and Stuart Levy.
Installation
Raw 3d data

The general format for using makepv.m is to type in Matlab
makepv (POS,'blah',param1,value1,param2,value2,...,paramN,valueN);
The first two arguments are required. The remaining arguments occur in parameter-value pairs. The pairs can be in any order. POS is a P x 3 matrix (P x D in general, more on that later) representing the 3d positions of P points. For example
X = rand(100,3);
makepv(X,'blah');
This creates a visualization of a hundred random points, all colored white. Actually, that's a simplification. What actually happens is that a series of files of the form blah* is created. One of these is called blah.cf. If the partiview binary (a.k.a. executable) is in the current Matlab directory, the binary then runs on blah.cf (which calls the other files blah*), producing a visualization that you can play with. Partiview is a data viewer; it has nothing to do with Matlab. If the binary/executable is elsewhere, the files blah* will be produced but no viewer will come up. Either way, you can always run partiview on blah.cf later.
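One way to start the viewer later from inside Matlab is a call to system. This is only a sketch: it assumes an executable named partiview is on your system path (otherwise give its full path) and that blah.cf sits in the current directory.

```matlab
% Sketch: run partiview on a .cf file that makepv wrote earlier.
% Assumption: 'partiview' is the name of the executable on your system path.
cffile = 'blah.cf';                        % file written by makepv(X,'blah')
cmd = sprintf('partiview %s', cffile);     % command line handed to the OS
if exist(cffile, 'file')
    status = system(cmd);                  % blocks until the viewer is closed
    if status ~= 0
        warning('partiview exited with status %d', status);
    end
else
    fprintf('%s not found; run makepv first.\n', cffile);
end
```

The same command line, typed in a shell from the directory holding the blah* files, should work outside Matlab too.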
The files blah* created by makepv are placed in the same directory that Matlab is running in. If you want to place them in a different directory, place the name of that directory in some variable, say mydir and send that in as well:
makepv (POS,'blah','OUTDIR',mydir);
For example, if the above statement was preceded by the line mydir = 'urbana', then blah.cf and other blah* files are created in the subdirectory urbana of the current directory. If mydir = '../urbana', then they would have been created in the subdirectory urbana of the current directory's parent directory. If the directory already exists, the files will be created automatically, overwriting any files of the same name already present there.

The rest of this documentation describes parameter-value pairs for representing things like the groups and colors and pictures of your points.

Plotting Pictures at Points

This is probably the #1 feature separating Partiview from existing software. Suppose you have images for each of your P points. Each image has to be an SGI file. (You can convert more common picture formats to SGI format using free packages like GIMP or ImageMagick.) First, create a P-element cell array pict (say) with the name of each picture. If you don't have a picture for a particular data item, leave it blank, i.e. set pict{i} = ''. In that case nothing will appear for the i-th point.

Once you use pictures, directory structure is important. Pictures must be placed in a subdirectory of the directory OUTDIR. (By default, OUTDIR is the current Matlab working directory.) For example, if your pictures are in the directory specified by the variable projectdir, then say
makepv(POS,'blah','PIC',pict,'OUTDIR',projectdir);
This creates the files blah* in the directory specified by projectdir. Caveat: Regardless of the size of your picture, Partiview will squish it to be a square.

Coloring by Attributes

You have to do this in two steps: supplying a C x 3 matrix of color values (if you don't, one will be provided for you randomly) and supplying a P x 1 (simplest case) vector whose elements specify the color of each point. Suppose the C x 3 matrix is called colmap and the P x 1 vector is called coloring. Then say
makepv(POS,'blah',...,'ATTRIB',coloring,'COLORS',colmap,...);
If you're happy with the default coloring, use
makepv(POS,'blah',...,'ATTRIB',coloring);
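As a small illustration of the two inputs, the sketch below colors 100 random points by which third of the x-range they fall in. It assumes that the entries of coloring are integer indices 1..C into the rows of colmap, and that colmap holds RGB values between 0 and 1. Neither convention is spelled out above, so check them against your copy of makepv.m.

```matlab
P = 100;
POS = rand(P,3);                      % random 3d positions in the unit cube
colmap = [1 0 0; 0 1 0; 0 0 1];       % C = 3 colors (assumed RGB in [0,1])
% Assumed convention: coloring(p) is the row of colmap used for point p.
coloring = min(floor(POS(:,1)*3) + 1, 3);   % 1, 2 or 3 depending on x
if exist('makepv', 'file')            % only call the interface if installed
    makepv(POS, 'blah', 'ATTRIB', coloring, 'COLORS', colmap);
end
```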
Suppose you want to be able to color by any of A attributes. You can do this with the commands above by making 'coloring' a P x A matrix instead of a P x 1 vector, with each of the A columns holding a different attribute to color by. By default the first column is used to give colors at startup. You can type in the command window "color 1" to color by the second column, "color 2" by the third, etc. If A is large, you probably don't want to have to remember which attribute corresponds to which column, especially with the off-by-one thing. In that case you can create a cell array, 'names' say, with the names of the A attributes.
makepv(POS,'blah',...,'ATTRIB',coloring, ..., 'ATTRIBNAMES',names,...);
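For instance, a P x A attribute matrix and its names could be assembled as follows. The attribute values are made up, and the sketch again assumes that each column holds small positive integers that index colors.

```matlab
P = 6;
POS = rand(P,3);
age    = [1 2 3 1 2 3]';         % hypothetical age bracket (1..3)
sex    = [1 2 1 2 1 2]';         % hypothetical coding (1..2)
weight = [2 2 1 3 3 1]';         % hypothetical weight class (1..3)
coloring = [age sex weight];     % P x A, one column per attribute
names = {'age','sex','weight'};  % names{k} describes column k
if exist('makepv', 'file')       % only call the interface if installed
    makepv(POS, 'blah', 'ATTRIB', coloring, 'ATTRIBNAMES', names);
end
```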
So if, for example, A is 3, and names = {'age','sex','weight'} then you can say 'color weight' instead of 'color 2' in the command window. All this assumes that you want all your groups to be colored the same way.

Partitioning your data into groups

If you are happy with all your data being in one group, ignore this section; the default values are fine for you. Suppose you want to partition your P points into G groups. If you create a P x 1 vector gps whose entries are integers from 1 to G, then the p-th point will be in the gps(p)-th group. Then say
makepv(POS,'blah',...,'GROUPS',gps, ...);
Now your data will be organized into the G groups, with a button for each group so you can turn groups on and off easily. These buttons will be labelled 'g1','g2',... etc. Perhaps you don't want that. In that case, create a cell array gpnames such that the name of the i-th group is gpnames{i}. Then say
makepv(POS,'blah',...,'GROUPS',gps,...,'GROUPNAMES',gpnames,...);
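If your labels are not already integers from 1 to G (say they are class labels such as 0 to 9, or arbitrary codes), unique can produce both the gps vector and a set of names. A sketch, where rawlabels is a hypothetical vector with one label per point:

```matlab
rawlabels = [0 7 7 3 0 3 3]';          % hypothetical class labels
[vals, ~, gps] = unique(rawlabels);    % gps(p) in 1..G indexes into vals
gps = gps(:);                          % force a P x 1 column
G = numel(vals);
gpnames = cell(G,1);
for g = 1:G
    gpnames{g} = sprintf('class%d', vals(g));   % 'class0', 'class3', 'class7'
end
POS = rand(numel(gps),3);
if exist('makepv', 'file')             % only call the interface if installed
    makepv(POS, 'blah', 'GROUPS', gps, 'GROUPNAMES', gpnames);
end
```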
You don't have to specify every group name. For example, say you have 5 groups, and you only have names for the second and fourth. Then you could define gpnames as
gpnames = cell(5,1);
gpnames{2} = 'yoyo';
gpnames{4} = 'ma';
Your groups will then be named 'g1','yoyo','g3','ma', and 'g5'.

Drawing graphs

Partiview has limited graph drawing capability. We do not recommend Partiview as a dedicated graph viewing package. But for some machine learning applications (e.g. viewing the nearest-neighbor graph of Isomap, Laplacian Eigenmaps or Locally Linear Embedding (LLE), overlaid on the points embedded in 3d), this functionality is adequate. Suppose you have an m x 2 matrix E representing m edges, where the i-th edge joins points E(i,1) and E(i,2). Then say
makepv(POS,'blah',...,'EDGES',E,...)
Now the first group will consist of all edges, so that their visibility can be toggled. At the moment, all edges are colored the same. Partiview itself can draw edges in any color and (to some extent) thickness. Future versions of this interface will account for that.

Added Feb 22: If your m x 2 matrix E represents directed edges, with the i-th edge going from point E(i,1) to E(i,2), then say
makepv(POS,'blah',...,'DIREDGES',E,...)
It draws arrows at a fifth of the way across the edge.

Labelling points

Suppose you want to label your P points with text. Place your labels in a P-element cell array. For example, if your labels are in array names, just say
makepv(POS,'blah',...,'LABELS',names,...)
Even some regular matlabbers don't use cell arrays, so here's an example of how to create them. Suppose P=4 and your names are Beatlic. You can define names using
names{1} = 'john';
names{2} = 'paul';
names{3} = 'george';
names{4} = 'starkeyravingmad';
or
names = {'john','paul','george','starkeyravingmad'};
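When P is large you will usually build the cell array in a loop rather than by hand. A sketch, with a made-up 'pt' prefix and made-up class numbers:

```matlab
P = 5;
names = cell(P,1);                     % preallocate a P-element cell array
for i = 1:P
    names{i} = sprintf('pt%d', i);     % 'pt1', 'pt2', ...
end
% Numeric labels (e.g. class numbers) can be converted the same way:
classes = [3 1 4 1 5]';
classnames = cell(P,1);
for i = 1:P
    classnames{i} = num2str(classes(i));
end
```

Either array can then be passed as the value of the LABELS parameter.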
Note that if your data is partitioned into groups, the labels will be too. Once in partiview, labels can be turned off by typing 'labels off' (or 'gall labels off' to turn labels off in all groups). Typing 'labels on' does the obvious too. There is also a button at the top of the partiview menu for toggling label visibility. Labels do not have to be unique identifiers. Problem: labels appear at fixed points, so you can't use them if you are using the Linear Projections feature to view more than three dimensions at a time.

Storing additional information for data points

Sometimes you have information, in the form of text strings, associated with each data point that you want to be able to look up. Place these in a P-element cell array, say pointinfo, and type
makepv(POS,'blah',...,'INFO',pointinfo,...)
Now when you're viewing your data in partiview, you can move your mouse over a point and press the P key. In the command menu, data for that point will appear, including its informative text string. You can also use this feature for labels that are too long and unwieldy, or if you want to use the next feature.

Linear Projections

Stuart added this feature in mid/late 2004, so it's not yet available in the Partiview binary available from Hayden. It is available in the binaries and source code available from the NCSA.

Viewing more than 3 dimensions in 3d space is a perennial problem, and several techniques (with varying degrees of bogosity) have been proposed to deal with it. Partiview's is pretty basic in comparison. But useful. If your data points are in D > 3 dimensions, you can define your three spatial dimensions to be any weighted combination of the D dimensions. In other words, if your positions are in a P x D matrix M, then the weights can be represented as a D x 3 matrix W, and you can view the matrix M*W. Partiview allows you to specify an arbitrary W and change the view instantly, without having to restart the program. Suppose your positions are in a P x D matrix M. Then say
makepv(M,'blah',...)
Partiview will start up with data in the positions given by the first three of the D dimensions. Now you can specify the W matrix. The user interface for this is text-only for now. If you want something prettier, find us a grant! (Seriously.)

The first thing to remember is that the D dimensions are indexed 0,1,...,D-1. The second thing to remember is that the three spatial dimensions that you view the data in are called wx, wy and wz. We illustrate the commands with the D = 5 case. In other words, the matrix W specifies a linear mapping from dimensions <0 1 2 3 4> to <wx wy wz>. Suppose the weights are the numbers below (all the letters are really numbers), with one row each for wx, wy and wz and one column per input dimension, i.e. the transpose of the D x 3 matrix W:
a b c d e
f g h i j
k m n o p
You would type the command (note the "add" in addition to "warp")
gall add warp -wx a,b,c,d,e -wy f,g,h,i,j -wz k,m,n,o,p
If you have only one group, then you can leave out the word 'gall'. If the initial dimensions have zero weights, you can specify the first nonzero dimension. For example, if f=k=m=n=0, then you could say this:
gall add warp -wx a,b,c,d,e -wy 1:g,h,i,j -wz 3:o,p
(Note to partiview users who know what warp really does: if D>3, makepv.m writes the first three columns of the speck file twice.) If W is sparse, you can specify dimension:coefficient pairs. For example, if the only nonzero entries in W are a, e, h and p, you could say
gall add warp -wx 0:a,4:e -wy 2:h -wz 4:p
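If you already have W as a D x 3 matrix in Matlab, the command string can be generated rather than typed by hand. The sketch below uses the dimension:coefficient form for every nonzero entry, remembering that Partiview counts dimensions from 0. The values in W are made up, strjoin needs a reasonably recent Matlab (or Octave), and the sketch assumes no column of W is entirely zero.

```matlab
% D x 3 weight matrix (D = 5); column k holds the weights for wx, wy, wz.
W = [0.5 0 0;
     0   0 0;
     0   1 0;
     0   0 0;
     0.5 0 2];
flags = {'-wx', '-wy', '-wz'};
cmd = 'gall add warp';
for k = 1:3
    idx = find(W(:,k))';               % Matlab (1-based) rows with nonzero weight
    terms = arrayfun(@(d) sprintf('%d:%g', d-1, W(d,k)), idx, ...
                     'UniformOutput', false);   % d-1: Partiview is 0-based
    cmd = [cmd ' ' flags{k} ' ' strjoin(terms, ',')];
end
disp(cmd)   % gall add warp -wx 0:0.5,4:0.5 -wy 2:1 -wz 4:2

M = rand(10, 5);                       % hypothetical P x D positions
V = M * W;                             % the P x 3 positions this warp displays
```

The printed line can be pasted into the Partiview command window.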
To see what the current matrix W is, type 'warp' in the command window.

Passive Stereo (GeoWall)

To produce data that is viewable on a GeoWall, send in the SCREEN parameter with value 'geowall'.
makepv(POS,'blah',...,'SCREEN','geowall',...)
The default eye separation is 0.005; you can change this to, say, 0.01, with
makepv(POS,'blah',...,'SCREEN','geowall',...,'EYESEP',0.01,...)
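To tie the parameters described so far together, here is a sketch of a single call that uses several of them on synthetic data. The data and all variable names are made up for illustration, and the sketch assumes the pairs can be combined freely, as the statement that pairs may come in any order suggests.

```matlab
P = 60;  G = 3;
gps = repmat((1:G)', P/G, 1);                    % P x 1 group index, 20 per group
POS = randn(P,3) + 4*[gps==1, gps==2, gps==3];   % shift each group along its own axis
labels = cell(P,1);
info   = cell(P,1);
for p = 1:P
    labels{p} = sprintf('p%d', p);
    info{p}   = sprintf('point %d, group %d', p, gps(p));
end
E = [(1:P-1)' (2:P)'];                           % m x 2 edge list: a chain of points
if exist('makepv', 'file')                       % only call the interface if installed
    makepv(POS, 'demo', 'GROUPS', gps, 'GROUPNAMES', {'a','b','c'}, ...
           'LABELS', labels, 'INFO', info, 'EDGES', E);
end
```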
Using Partiview

Navigation

The trick is to hold the mouse button down while moving. Navigation is inertia-based (this is a feature, not a bug!), so the scene keeps moving when you let go of the mouse button, depending on how fast you were moving the mouse when you let go. Drag with the left mouse button to rotate, and with the right mouse button to zoom. To translate, hold down the CTRL key and drag with the right mouse button. Alternatively, press the 'f' key, move around with the left mouse button pressed, then press the 'o' key to return to the original setting of the left-mouse-move command.

Using the Command Window

Partiview starts up with two windows: one for the model being visualized, and a command menu.
You can type commands in the command window. To get a cursor there, click in the bottom line of the window ("Cmd:...") or press TAB. Minimize the command window by pressing the "_" button near its top right corner. Clicking on the "x" button removes the window completely, so you probably don't want to do that.

Getting Pictures

Type in the command window
snapshot
And you'll find in the current Matlab directory a file of the form snap.000.ppm (on Windows) or snap.000.sgi (on *nix) that has a screenshot of the current picture displayed in partiview. The next time you type the command, the picture snap.001.ppm/sgi will be created, and so on. The next time you run Partiview, the snap counter goes back to 000.

You may want this picture to not have the command window in it. To do that, close or minimize the command window, and then press TAB and type (blindly) snapshot. This results in the same picture without the menu, or with the menu in a corner that you can crop out.

You can only make movies on Linux, for technical reasons. Here's documentation on that at the Cosmus site.

Modifying parameters

This interface estimates values for Partiview parameters, but you may still need to modify them for your nefarious needs.

Axes

To see the current length of the axes, type in the command window
censize
The second-bottom part of the command window now displays the length of censize, for example
censize 10.134 (interest-marker size)
Ignore the "interest-marker size" thing. This means that each half-axis is 10.134 units. The units are pixels, but you'll only ever use them relatively. So for example, if you want to change the length of the axes to 20, type in the command window
censize 20
To turn the axes off, type
censize 0
To multiply or divide the size of the axes by, say, 5, type
censize *5
censize /5
I find censize a useful way to find out the dimensions of what I'm looking at, when I don't remember/know the stats of the original matrices.

Changing image size

While makepv tries to guess how large you want your images, it is not at all guaranteed to succeed. You may want to make images larger or smaller. Making them smaller also makes it easier on your graphics card. To halve the size of all images, say
gall polysize /2
To multiply the size of all images by 1.5, say
gall polysize *1.5
The 'gall' makes the command apply to all groups that your data is in. Otherwise it only applies to the current group, which is displayed somewhere in the top left quadrant of the command menu.

Turning points on/off

This can be done by typing in the command window
gall points
If you specifically want them turned off, type
gall points off
To do this just for individual groups, use g* instead of gall. For example, to only turn on points for groups g2 and g5, type
g2 points on
g5 points on
To do the same with the images (also called polygons), type poly instead of points in the commands above.

Examples with Real Data

USPS Handwritten Digits

Get usps.zip and uncompress it in your current Matlab directory. This places usps.mat there and creates a subdirectory usps that has pictures (sgi files) of 2007 digits. usps.mat has data for the 2007 USPS test points, and embeddings of the points in 8 or 10 dimensions (see the sizes below) using Principal Components Analysis, Laplacian Eigenmaps, Landmarks-Isomap, and Local Linear Embedding.

EDU>> whos
  Name          Size           Bytes  Class
  Wiso          2007x2007     393376  double array (sparse)
  Wlapbin       2007x2007     292576  double array (sparse)
  Wlle          2007x2007     393376  double array (sparse)
  X             2007x256     4110336  double array
  Yiso          2007x8        128448  double array
  Ylapbin       2007x10       160560  double array
  Ylle          2007x10       160560  double array
  Ypca          2007x10       160560  double array
  digitnames    1x10             684  cell array
  images        2007x1        182430  cell array
  labels        2007x1         16056  double array
  picid         2007x1        150318  cell array

To see how well Principal Components Analysis does, use one of the two commands below. You should get a view of the 2007 points colored by digit class. The colors will differ from run to run, since they are randomly chosen each time.
makepv (Ypca,'uspspca','GROUPS',labels);
makepv (Ypca,'uspspca','OUTDIR','usps','GROUPS',labels);
The first command places the files uspspca* in the current directory. The second places them in the directory usps instead. The second is recommended, since hiding file clutter is A Good Thing. A problem with the present setup is that you don't know the identity of each point. It would be nice to be able to click on a point and have information about it appear. Partiview supports this to a limited extent. You have to create a cell array with P elements that identify the P data points. The workspace you loaded in usps.mat has such an array, called picid. (In case you are curious, it was created with the command for i=1:2007, picid{i}=sprintf('test%d',i);end.)
makepv (Ypca,'uspspca','OUTDIR','usps','GROUPS',labels,'INFO',picid);
Now if you move the mouse over a point and press the 'p' key (don't use quotes), you'll see some information appear in the command window about that point, including its 3d position and the value in picid{i}, where i is the index of the point.

To see how well Principal Components Analysis does, but with pretty pictures, you need to have .sgi files with a picture for each point. If you unzipped usps.zip as instructed, then these pictures should already be in the subdirectory usps of the current directory, and have their names in the cell array images.
makepv (Ypca,'uspspca','OUTDIR','usps','GROUPS',labels,'INFO',picid,'PIC',images);
This produces a visual in which, if you zoom in, you see the pictures. You'll also decide that they are too large; therefore, move your mouse to the bottom row of the command window (or just press TAB) and type
gall polysize /2
The pictures should now be half their previous size.

Some algorithms (Isomap, Laplacian Eigenmaps, LLE, etc) compute a nearest-neighbor graph that you might want to overlay on the image. In the current workspace these graphs are stored in the 2007 x 2007 sparse matrices Wiso, Wlapbin and Wlle. To see the graphs overlaid on the positions in 3d, send them in with the parameter EDGES, e.g.
makepv (Ylapbin,'usps_laplacian','OUTDIR','usps','GROUPS',labels,'PIC',images,'INFO',picid,'EDGES',Wlapbin);
makepv (Yiso,'usps_isomap','OUTDIR','usps','GROUPS',labels,'PIC',images,'INFO',picid,'EDGES',Wiso);
makepv (Ylle,'usps_lle','OUTDIR','usps','GROUPS',labels,'PIC',images,'INFO',picid,'EDGES',Wlle);
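The description of EDGES earlier speaks of an m x 2 edge list, whereas the calls above pass sparse 2007 x 2007 matrices directly. Should you need to go from one form to the other (for example, to filter edges before plotting), a sketch of the conversion, using a tiny made-up adjacency matrix in place of Wiso, Wlapbin or Wlle, is:

```matlab
% Small symmetric adjacency matrix standing in for a nearest-neighbor graph.
W = sparse([1 2 2 3 1 4], [2 1 3 2 4 1], 1, 4, 4);
[i, j] = find(triu(W, 1));      % upper triangle: each undirected edge once
E = [i j];                      % m x 2: edge k joins points E(k,1) and E(k,2)
disp(E)
```

For a directed graph you would skip triu and take [i, j] = find(W) instead, then pass the result with DIREDGES.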
Yale Face Recognition Data

Another dataset is yale.zip. Unzip it in the current Matlab directory as well; this creates a file yale.mat in the current directory and a subdirectory yalefaces that contains pictures of 166 faces, each a 64 x 64 image. In Matlab, type "load yale.mat". There are now several matrices here. We are concerned with the 166 x 7 matrices Ypca, Yiso, Ylap, Ylle, which have the 166 faces embedded in R^7 using Principal Components Analysis, Isomap, Laplacian Eigenmaps, and LLE. If you type in Matlab
makepv (Yiso,'blah','OUTDIR','yalefaces','GROUPS',labels,'PIC',images,'EDGES',Wiso);
then you get a visualization of the faces at their Isomap positions, with the neighborhood graph overlaid. You can of course substitute Ylap/Wlap or Ylle/Wlle for Yiso/Wiso to see how well other manifold algorithms do.
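To produce all three visualizations in one go, the substitution can be done in a loop. This sketch assumes that yale.mat contains labels, images and the Y*/W* pairs named above, and the output names yale_iso etc. are made up. If the partiview binary is in the current directory, each call will bring up a viewer.

```matlab
methods = {'iso', 'lap', 'lle'};         % suffixes of the Y*/W* variables
outnames = cell(size(methods));
for k = 1:numel(methods)
    outnames{k} = ['yale_' methods{k}];  % hypothetical output names
end
if exist('yale.mat', 'file') && exist('makepv', 'file')
    S = load('yale.mat');                % struct allows dynamic field access
    for k = 1:numel(methods)
        Y = S.(['Y' methods{k}]);        % e.g. S.Yiso
        W = S.(['W' methods{k}]);        % e.g. S.Wiso
        makepv(Y, outnames{k}, 'OUTDIR', 'yalefaces', 'GROUPS', S.labels, ...
               'PIC', S.images, 'EDGES', W);
    end
end
```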