Partiview: a tool for visually analyzing the performance of dimensionality reduction and clustering algorithms
Dinoj Surendran
Department of Computer Science, University of Chicago
Stuart Levy
Experimental Technologies Group, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
This will be a demo at this year's NIPS conference.
Abstract
Partiview is a free open source viewer for exploring data in three
or more dimensions. It is fast - a user can smoothly interact with
million-point scatterplots on a laptop. Data can be represented by
colored points, text labels, images, or a combination thereof.
This demo shows how Partiview can be used via a Matlab interface to
visualize the results of dimensionality reduction algorithms
(e.g. Laplacian Eigenmaps, LLE) and kernel methods in several
domains, including handwritten digit recognition, computational
linguistics, face recognition, and bioinformatics.
If the data has an underlying visual representation (e.g. MNIST
digits) each datum can be represented by its original image. This
is useful for qualitative error analysis (and making cute demos).
For data in R^N, the user can specify an N x 3
matrix so that each point's 3d position is a weighted combination of
its N coordinates. The matrix can be changed in real time.
Points can have several attributes associated with them. The user
can color points by any attributes, and change the coloring
attribute instantly. Groups of points can be turned on and off
based on their attribute values.
Demos are downloadable from http://people.cs.uchicago.edu/~dinoj/vis.
Extended Abstract
Partiview
is a plotting tool that we have found very useful for exploring
datasets, and for presenting machine learning research to others. It
runs on Windows, Linux and OS X, and is freeware under a BSD-ish
license.
What's novel
- Existing 3d plotting tools only allow you to place colored, labelled
dots at specified 3d coordinates. With Partiview you can also place
arbitrary images, and you can always make them face the viewer. This
is especially useful if the data has a natural visual representation,
as in handwritten digit recognition, face recognition, galaxy
classification, etc.
The number of pictures you can display at a
time depends on how much graphics memory you have. 64 MB, which is
available on several recent laptops (especially Dell's), allows you to
display over 5000 pictures of handwritten digits, for example.
- Plotting data with N dimensions in a 3-dimensional space is a
perennial problem. Existing programs like XGVis allow you to
arbitrarily choose 3 of the N dimensions and have the plot change
accordingly. Partiview allows you to specify an arbitrary real N x 3
matrix that defines each of the 3 spatial coordinates as weighted
combinations of the N available coordinates.
What we do when dealing with datasets with tens or hundreds of
dimensions is reduce them with a method like Laplacian Eigenmaps to,
say, eight dimensions, and then use Partiview to try different
matrices from then on. You don't have to restart Partiview every time
you change matrices either; it changes in real time.
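The arithmetic behind the N x 3 matrix can be sketched in a few lines of Python. This is only an illustration of the weighted-combination idea described above, not Partiview's own code or command syntax; the function name project_to_3d and the example matrices are hypothetical.

```python
def project_to_3d(points, matrix):
    """Map N-dimensional points to 3d positions using an N x 3 weight matrix.

    Hypothetical helper for illustration only: output coordinate k of a
    point is the sum over i of point[i] * matrix[i][k].
    """
    n = len(matrix)
    if any(len(row) != 3 for row in matrix):
        raise ValueError("matrix must have exactly 3 columns")
    projected = []
    for p in points:
        if len(p) != n:
            raise ValueError("point dimension must match the number of matrix rows")
        projected.append(
            tuple(sum(p[i] * matrix[i][k] for i in range(n)) for k in range(3))
        )
    return projected


# A 0/1 matrix reproduces the "pick 3 of the N dimensions" behavior:
# here x, y, z are dimensions 0, 2 and 3 of 4-dimensional data.
select = [
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# A general matrix mixes dimensions: x is the average of dimensions 0 and 1.
blend = [
    [0.5, 0.0, 0.0],
    [0.5, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

data = [(1.0, 2.0, 3.0, 4.0)]
picked = project_to_3d(data, select)
mixed = project_to_3d(data, blend)
```

Choosing three of the N dimensions is thus a special case of the matrix approach, which is why a single real-valued matrix covers both uses.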
What's not necessarily novel, but very useful
- Partiview can handle hundreds of thousands of points at a time,
even on laptops.
- There are often several types of labels associated with each
datapoint. You can use the labels (whether they are discrete or
continuous) as indices into color maps (that you can specify) and
then, in realtime, change the colors of the points according to
whatever label you want.
- Partiview supports several kinds of stereo, including red-cyan,
chromadepth, and side-by-side. The last allows it to be used on a
GeoWall (http://www.geowall.org), which is a cheap (under $10K)
one-wall CAVE that is within the reach of several institutions. Stereo
can be quite useful when visualizing data in 3-dimensional
space.
- Partiview permits lines of different colors to be drawn between
points. Unfortunately, the lines' end points are still fixed in 3d
space rather than defined by a combination of the N dimensions; that's
something we have to add.
- You can take a series of snapshots in Partiview, and thus make animations and movies.
- Partiview can run off Powerpoint, which is very useful for
presentations.
- You can group data points, and turn groups on and off with the
press of a button.
- You can label data points with text, and turn the labels on and
off. You can also click on a point to make its label appear.
- Navigation is inertia-based, so you can leave the data moving
(it's easier to get a sense of depth when points are moving) without
having to touch the keyboard/mouse. Very useful for presentations.
- You can turn the x-y-z axes on and off, and make them whatever
size you want.
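Two items in the list above, coloring points by a chosen attribute and switching groups of points on and off, amount to simple lookups. The Python sketch below illustrates the idea under stated assumptions: points are plain dictionaries of attribute values, and the names color_by_attribute and visible_points are hypothetical. It does not reproduce Partiview's commands or file format.

```python
def color_by_attribute(points, attribute, colormap):
    """Return one color per point, using the attribute as a colormap index.

    Hypothetical helper: attribute values (discrete or continuous) are
    rescaled linearly onto the colormap's index range.
    """
    if not points:
        return []
    values = [p[attribute] for p in points]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    last = len(colormap) - 1
    return [colormap[int((v - lo) / span * last + 0.5)] for v in values]


def visible_points(points, attribute, hidden_values):
    """Keep only the points whose attribute value is not switched off."""
    return [p for p in points if p[attribute] not in hidden_values]


# Assumed example data: each point carries two attributes.
colormap = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
points = [
    {"digit": 1, "score": 0.0},
    {"digit": 7, "score": 0.5},
    {"digit": 7, "score": 1.0},
]

by_score = color_by_attribute(points, "score", colormap)
by_digit = color_by_attribute(points, "digit", colormap)  # recolor by another attribute
without_sevens = visible_points(points, "digit", {7})     # turn the 7's off
```

Recoloring only rebuilds the list of colors and leaves the point positions untouched, which is consistent with the attribute switch being instantaneous.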
What's problematic
The user interface, particularly for the Nx3 matrix mentioned above, is non-intuitive. There are plans to improve the user interface. Documentation needs to be improved.
User Experience
Several demos are available, so we are choosing just a few to show
the kinds of things Partiview can be used for. In each case the user
can zoom, translate, and spin the data in real time on the laptop. They
can also look at the data in red-cyan stereo; we'll bring glasses
along.
- Handwritten Digit Recognition: 5000 digits from the MNIST dataset
processed by the Laplacian Eigenmaps algorithm (Belkin & Niyogi,
2002). Colors correspond to each digit type. You can turn all the 1's
off and on with a single button, ditto all the 2's, 3's etc.
This demo, which can be downloaded for Windows and Linux from http://people.cs.uchicago.edu/~dinoj/vis/digits, has reached the
semifinals of the NSF's 2004 Science and Engineering Visualization
Challenge, Interactive Media Category. It was created for, and using
data provided by, Mikhail Belkin, Department of Computer Science,
University of Chicago.
- College Football Clustering: In United States college
football, there are 119 football teams divided into several groups,
called conferences. Each year, teams play other teams, usually, but not
necessarily, in the same conference. Two teams in the same conference
may not play each other in a particular year. Question: can a
clustering algorithm work out which teams are in the same conference?
This demo shows how well the Laplacian Eigenmaps algorithm succeeds at
this task, looking at its first 8 components. (I suppose I should try
the same dataset using some other algorithms for comparison purposes.)
For the sake of interesting the general public, we represent each
football team by a picture of its logo.
- Galaxy Classification: Here the data is of galaxies found
by the Sloan Digital Sky Survey, and we can see how well a
dimensionality reduction algorithm classifies the data, coloring the
different points according to different attributes of the galaxies,
such as orientation, apparent magnitude, absolute magnitude, etc. We
also have many of the galaxies represented by pictures found by the
SDSS. This was created as part of a research project recently begun
by Mark Subbarao (Department of Astronomy and Astrophysics, University
of Chicago; and Department of Astronomy, Adler Planetarium and Science
Museum) and Dinoj Surendran.
Why such a tool exists
Partiview is the little brother of Virtual Director, a program
Stuart Levy and others wrote to make high-end astronomy
visualizations. Partiview uses many of the same parts, and is thus an
industrial strength viewer. It was initially used for astronomy
visualizations only. Dinoj Surendran learnt about it while working
with an astronomer (Mark Subbarao) on making a model of the universe
with data from the Sloan Digital Sky Survey, and began using it for
machine learning applications. Stuart has added some features based on
Dinoj's requests, e.g. the N x 3 matrix, and continues to do so.
Bibliography
Levy, Stuart. Interactive
3-D Visualization of Particle Systems with Partiview, in
Astrophysical Supercomputing Using Particles (Proceedings of
International Astronomical Union Symposium Vol 208), 2001.