Partiview: a tool for visually analyzing the performance of dimensionality reduction and clustering algorithms

Dinoj Surendran

Department of Computer Science, University of Chicago

Stuart Levy

Experimental Technologies Group, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign

This will be a demo at this year's NIPS conference.

Abstract

Partiview is a free, open source viewer for exploring data in three or more dimensions. It is fast: a user can smoothly interact with million-point scatterplots on a laptop. Data can be represented by colored points, text labels, images, or a combination of these.

This demo shows how Partiview can be used via a Matlab interface to visualize the results of dimensionality reduction algorithms (e.g. Laplacian Eigenmaps, LLE) and kernel methods in several domains, including handwritten digit recognition, computational linguistics, face recognition, and bioinformatics.

If the data has an underlying visual representation (e.g. MNIST digits), each datum can be represented by its original image. This is useful for qualitative error analysis (and for making cute demos).

For data in R^N, the user can specify an N x 3 matrix so that the data's 3d positions are linear combinations of their N coordinates. The matrix can be changed in real time.
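As a sketch of what the N x 3 matrix does (plain Python; the helper names `project_to_3d` and `selection_matrix` are hypothetical and are not Partiview's own code or file format), each point's 3d position is the row vector of its N coordinates multiplied by the matrix. Picking three of the N coordinates, as some other viewers do, is the special case where each column of the matrix holds a single 1.

```python
# Hypothetical helpers illustrating projection of R^N data to 3d with an
# N x 3 matrix; this is not Partiview's actual interface.

def project_to_3d(points, matrix):
    """Return xyz for each point, where xyz[j] = sum_i point[i] * matrix[i][j]."""
    n = len(matrix)
    if any(len(row) != 3 for row in matrix):
        raise ValueError("matrix must be N x 3")
    projected = []
    for p in points:
        if len(p) != n:
            raise ValueError("point dimension must equal the number of matrix rows")
        projected.append(tuple(sum(p[i] * matrix[i][j] for i in range(n))
                               for j in range(3)))
    return projected


def selection_matrix(n, dims):
    """Build the N x 3 matrix that simply picks coordinates dims[0..2] as x, y, z."""
    m = [[0.0] * 3 for _ in range(n)]
    for j, d in enumerate(dims):
        m[d][j] = 1.0
    return m


points = [(1.0, 2.0, 3.0, 4.0), (0.5, -1.0, 0.0, 2.0)]

# Special case: use coordinates 3, 0 and 1 as x, y and z.
picked = project_to_3d(points, selection_matrix(4, (3, 0, 1)))

# General case: every spatial axis is a weighted blend of all four coordinates.
blend = [[1.0, 0.0, 0.5],
         [0.0, 1.0, 0.5],
         [0.5, 0.5, 0.0],
         [0.0, 0.0, 1.0]]
blended = project_to_3d(points, blend)
```

Changing the matrix and re-running the projection is all it takes to get a new 3d view, which is why the matrix can be swapped interactively.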

Points can have several attributes associated with them. The user can color points by any attribute and change the coloring attribute instantly. Groups of points can be turned on and off based on their attribute values.
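A minimal sketch of attribute-based coloring and group toggling, assuming points are stored as Python dicts of attributes; `color_by`, `visible`, and the attribute names are hypothetical. Each value is scaled linearly onto the entries of a user-supplied color map, so the same code handles discrete labels (such as digit class) and continuous ones (such as brightness).

```python
# Hypothetical sketch: color points by a chosen attribute, and hide groups.

def color_by(points, attr, colormap, lo=None, hi=None):
    """Return one color per point by indexing `colormap` with the value of `attr`."""
    values = [p[attr] for p in points]
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    last = len(colormap) - 1
    colors = []
    for v in values:
        t = min(max((v - lo) / span, 0.0), 1.0)       # position in [0, 1], clamped
        colors.append(colormap[int(t * last + 0.5)])  # nearest colormap entry
    return colors


def visible(points, attr, hidden):
    """Keep only points whose `attr` value is not in the `hidden` set."""
    return [p for p in points if p[attr] not in hidden]


points = [
    {"digit": 1, "brightness": 0.2},
    {"digit": 7, "brightness": 0.9},
    {"digit": 1, "brightness": 0.5},
]
BLUE, GREEN, RED = (0, 0, 255), (0, 255, 0), (255, 0, 0)
cmap = [BLUE, GREEN, RED]

by_brightness = color_by(points, "brightness", cmap)  # continuous attribute
by_digit = color_by(points, "digit", cmap)            # discrete attribute
shown = visible(points, "digit", hidden={1})          # turn all the 1's off
```

Switching the coloring attribute only recomputes one color per point, which is cheap enough to do instantly even for large datasets.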

Demos are downloadable from http://people.cs.uchicago.edu/~dinoj/vis.

Extended Abstract

Partiview is a plotting tool that we have found very useful for exploring datasets and for presenting machine learning research to others. It runs on Windows, Linux and OS X, and is free software under a BSD-style license.

What's novel

  1. Existing 3d plotting tools only let you place dots, possibly colored and labelled, at specified 3d coordinates. With Partiview you can also place arbitrary images and make them always face the viewer. This is especially useful if the data has a natural visual representation, as in handwritten digit recognition, face recognition, or galaxy classification.

    The number of pictures you can display at a time depends on how much graphics memory you have. For example, 64 MB, which is available on several recent laptops (especially Dell's), is enough to display over 5000 pictures of handwritten digits.

  2. Plotting data with N dimensions in a 3-dimensional space is a perennial problem. Existing programs like XGVis allow you to arbitrarily choose 3 of the N dimensions and have the plot change accordingly. Partiview allows you to specify an arbitrary real N x 3 matrix that defines each of the 3 spatial coordinates as weighted combinations of the N available coordinates.

    When dealing with datasets with tens or hundreds of dimensions, we first reduce them with a method like Laplacian Eigenmaps to, say, eight dimensions, and then use Partiview to try different matrices. There is no need to restart Partiview each time the matrix changes; the display updates in real time.
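Images that always face the viewer are usually drawn with a generic graphics technique called billboarding. The sketch below illustrates that technique, not Partiview's actual code: each image is drawn as a square spanned by the camera's current right and up vectors, so it stays parallel to the screen however the data is rotated. The function name `billboard_corners` is hypothetical, and the camera vectors are assumed to be unit length, taken for example from the rows of a world-to-camera rotation.

```python
# Generic billboarding sketch (not Partiview's implementation): build a square
# that lies in the plane of the camera's right and up vectors.

def billboard_corners(center, cam_right, cam_up, size):
    """Return 4 corners (counterclockwise from bottom-left) of a size x size square
    centered at `center`; cam_right and cam_up are assumed to be unit vectors."""
    h = size / 2.0

    def corner(sr, su):
        # Offset the center by +-h along the right vector and +-h along the up vector.
        return tuple(c + sr * h * r + su * h * u
                     for c, r, u in zip(center, cam_right, cam_up))

    return [corner(-1, -1), corner(1, -1), corner(1, 1), corner(-1, 1)]


# Camera looking down -z: the image lies in a plane of constant z.
front = billboard_corners((0.0, 0.0, 5.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 2.0)

# Camera turned 90 degrees about y (now looking down -x): the same image
# re-orients to lie in the plane x = 0, again facing the viewer.
side = billboard_corners((0.0, 0.0, 0.0), (0.0, 0.0, -1.0), (0.0, 1.0, 0.0), 2.0)
```

Recomputing the corners every frame from the current camera orientation is what keeps each digit or logo readable while the scatterplot spins.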

What's not necessarily novel, but very useful

  1. Partiview can handle hundreds of thousands of points at a time, even on laptops.
  2. There are often several types of labels associated with each datapoint. You can use any label, discrete or continuous, as an index into a color map of your choice, and then switch in real time which label determines the colors of the points.
  3. Partiview supports several kinds of stereo, including red-cyan, chromadepth, and side-by-side. Side-by-side stereo allows it to be used on a GeoWall (http://www.geowall.org), a cheap (under $10K) one-wall CAVE that is within the reach of many institutions. Stereo can be quite useful for visualizing data in 3-dimensional space.
  4. Partiview can draw lines of different colors between points. Unfortunately, the lines' end points are still fixed in 3d space rather than derived from a combination of the N dimensions; that's something we have to add.
  5. You can take a series of snapshots in Partiview, and thus make animations and movies.
  6. Partiview can be launched from PowerPoint, which is very useful for presentations.
  7. You can group data points, and turn groups on and off with the press of a button.
  8. You can label data points with text and turn the labels on and off, or click on a point to make its label appear.
  9. Navigation is inertia-based, so you can leave the data moving (it's easier to get a sense of depth when points are moving) without having to touch the keyboard/mouse. Very useful for presentations.
  10. You can turn the x-y-z axes on and off, and make them whatever size you want.
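Red-cyan stereo is the easiest of these stereo modes to sketch. The usual approach, shown here as a generic illustration rather than Partiview's code, renders the scene twice from two eye positions offset along the camera's right vector, then keeps the red channel of the left-eye image and the green and blue channels of the right-eye image (matching glasses with the red lens over the left eye). The names `eye_positions` and `anaglyph` are hypothetical, images are assumed to be lists of rows of (r, g, b) tuples, and the asymmetric view frusta a careful renderer would also use are omitted.

```python
# Generic red-cyan anaglyph sketch; not Partiview's implementation.

def eye_positions(camera_pos, cam_right, eye_separation):
    """Split one camera position into left and right eye positions."""
    half = eye_separation / 2.0
    left = tuple(p - half * r for p, r in zip(camera_pos, cam_right))
    right = tuple(p + half * r for p, r in zip(camera_pos, cam_right))
    return left, right


def anaglyph(left_img, right_img):
    """Red from the left-eye image, green and blue from the right-eye image."""
    return [[(l[0], r[1], r[2]) for l, r in zip(lrow, rrow)]
            for lrow, rrow in zip(left_img, right_img)]


left_eye, right_eye = eye_positions((0.0, 0.0, 10.0), (1.0, 0.0, 0.0), 0.5)
combined = anaglyph([[(200, 10, 20)]], [[(5, 100, 150)]])
```

Through red-cyan glasses, each eye then sees only its own rendering, and the horizontal disparity between the two produces the depth effect.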

What's problematic

The user interface, particularly for the N x 3 matrix mentioned above, is unintuitive. There are plans to improve it, and the documentation also needs work.

User Experience

Many demos are available, so we have chosen just a few that show the kinds of things Partiview can be used for. In each case the user can zoom, translate, and spin the data in real time on the laptop. They can also view the data in red-cyan stereo; we will bring glasses along.

  1. Handwritten Digit Recognition: 5000 digits from the MNIST dataset processed by the Laplacian Eigenmaps algorithm (Belkin & Niyogi, 2002). Colors correspond to each digit type. You can turn all the 1's off and on with a single button, ditto all the 2's, 3's etc.

    This demo, which can be downloaded for Windows and Linux from http://people.cs.uchicago.edu/~dinoj/vis/digits , has reached the semifinals of the NSF's 2004 Science and Engineering Visualization Challenge, Interactive Media Category. It was created for, and using data provided by, Mikhail Belkin, Department of Computer Science, University of Chicago.

  2. College Football Clustering: In United States college football, there are 119 teams divided into several groups called conferences. Each year, teams play other teams, usually, but not necessarily, in the same conference, and two teams in the same conference need not play each other in a given year. Question: can a clustering algorithm work out which teams are in the same conference? This demo shows how well the Laplacian Eigenmaps algorithm succeeds at this task, looking at its first 8 components. (We should also try this dataset with other algorithms for comparison.)

    To interest the general public, we represent each football team by a picture of its logo.

  3. Galaxy Classification: Here the data consists of galaxies found by the Sloan Digital Sky Survey (SDSS). We can see how well a dimensionality reduction algorithm separates the data by coloring the points according to different attributes of the galaxies, such as orientation, apparent magnitude, and absolute magnitude. Many of the galaxies are also represented by their SDSS images. This was created as part of a research project recently begun by Mark Subbarao (Department of Astronomy and Astrophysics, University of Chicago; and Department of Astronomy, Adler Planetarium and Science Museum) and Dinoj Surendran.
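The digit and football demos both rely on Laplacian Eigenmaps (Belkin & Niyogi, 2002). Below is a minimal numpy sketch of its core step, assuming a symmetric, nonnegative weight matrix W is already given and the graph is connected: solve L y = λ D y, with D the diagonal matrix of row sums of W and L = D - W, and keep the eigenvectors for the smallest nonzero eigenvalues. The source does not say how the football graph was built; the toy graph here (two hypothetical four-team "conferences" joined by a single game) is purely illustrative, and the nearest-neighbor graph construction that Belkin & Niyogi use for data such as MNIST is omitted. The function name `laplacian_eigenmap` is hypothetical.

```python
import numpy as np


def laplacian_eigenmap(W, k):
    """Embed the nodes of a connected graph with weights W into k dimensions."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)              # node degrees (assumed all positive)
    L = np.diag(d) - W             # graph Laplacian
    # Turn the generalized problem L y = lambda D y into a symmetric one:
    # D^{-1/2} L D^{-1/2} v = lambda v, then recover y = D^{-1/2} v.
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    Y = d_inv_sqrt[:, None] * eigvecs
    return Y[:, 1:k + 1]           # drop the trivial constant eigenvector


# Hypothetical toy graph: teams 0-3 all play each other, teams 4-7 all play
# each other, and a single cross-conference game links team 3 with team 4.
W = np.zeros((8, 8))
for group in (range(0, 4), range(4, 8)):
    for i in group:
        for j in group:
            if i != j:
                W[i, j] = 1.0
W[3, 4] = W[4, 3] = 1.0

Y = laplacian_eigenmap(W, 2)
```

On this toy graph the first embedding coordinate takes one sign for the first conference and the opposite sign for the second; feeding several such coordinates into Partiview's N x 3 matrix is how a demo like the football one can be explored.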

Why such a tool exists

Partiview is the little brother of Virtual Director, a program Stuart Levy and others wrote to make high-end astronomy visualizations. Partiview shares many of its components and is thus an industrial-strength viewer. It was initially used for astronomy visualizations only. Dinoj Surendran learnt about it while working with an astronomer (Mark Subbarao) on making a model of the universe with data from the Sloan Digital Sky Survey, and began using it for machine learning applications. Stuart has added some features based on Dinoj's requests, e.g. the N x 3 matrix, and continues to do so.
