RESEARCH INTERESTS
I am broadly interested in the problem of human and machine
intelligence. To make progress, I examine it through the twin windows
of learning and language. In many ways, learning is the centerpiece of
intelligence: it is presumably what distinguishes ``intelligent'' from
``pre-programmed'' behavior. Language is interesting in part because
it is almost certainly learnt. Together they provide a testbed of
challenging problems whose solution would shed light on the nature of
intelligence and lead to significant applications. My specific research
program therefore focuses on learning and language --- how each works
in the human and how each can be replicated in a machine. (Note: I use
language in the broad sense, to include the speech, language, and
hearing that together form the complete human oral communication
system.)
In my view, exciting research leading to insight and consequent
innovation will arise from the interaction of three points of view:
theory, to provide computational and mathematical characterizations;
science, to provide an understanding of how intelligent systems work
in the human; and engineering, to translate the insights into useful
real-world systems. At another level, given the unique problem of human and
machine intelligence, progress will come from the union of computer
science, cognitive science, and neuroscience. Consequently, I strive
to keep abreast of developments in all three and strike a balance
between them. With this point of view, there are several research
strands I wish to follow:
RESEARCH TOPICS: QUESTIONS AND DIRECTIONS
Machine Learning and Information Extraction
My research in this area is motivated by the following basic
questions. What are the fundamental limits of learning machines? Does
an understanding of these limits direct us towards the construction of
tractable learning paradigms? Can one use these learning paradigms to
explain learning in the human or develop learning techniques for
real-world applications? Do these learning paradigms allow us to
usefully extract information from large amounts of partially organized
data collected from the real world? I list below some of the specific
research directions that I think are important and in which I have
made progress:
Fundamental Limits
With F. Girosi, I have obtained theoretical results on the fundamental
limits of neural network learning. The results are quite general and
illustrate the two fundamental sources of error in learning: the
limited representational capacity of the hypothesis class, and the
limited amount of data. Characterizing the tension between these two
sources of error gives us a way to choose neural networks of the
right complexity for a given learning problem. More generally, the
results allow us to trade off the complexity of one's model against
the amount of data available, leading to learning paradigms like the
support vector machines that I have investigated with V. Vapnik.
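The tension between the two sources of error can be written
schematically as follows. This is a generic textbook-style sketch, not
the specific bound obtained with Girosi, and all symbols are notation
assumed here for illustration: f_0 is the target function, H_n a
hypothesis class with n parameters, f_n the best approximation to f_0
within H_n, and \hat{f}_{n,l} the function actually learned from l
examples.

```latex
% Assumed notation (not from the original text):
%   f_0            target function
%   f_n            best approximation to f_0 in the class H_n
%   \hat{f}_{n,l}  function learned in H_n from l examples
% The inequality is the triangle inequality for the chosen norm.
\[
  \| f_0 - \hat{f}_{n,l} \|
  \;\le\;
  \underbrace{\| f_0 - f_n \|}_{\text{approximation error}}
  \;+\;
  \underbrace{\| f_n - \hat{f}_{n,l} \|}_{\text{estimation error}}
\]
```

The first term shrinks as the class H_n grows richer, while the second
typically grows with n for a fixed number of examples l and shrinks as
l grows, which is why n has to be matched to the amount of data.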
Novel Algorithms and Paradigms
One way to reduce the informational complexity of learning is by
active learning --- a mechanism of learning by choosing information
selectively. In a series of papers (some jointly with K. K. Sung), I
have considered various formulations of the problem, theoretically
derived the conditions under which such techniques are likely to work
for function learning and pattern classification, and developed
applications to object detection and image retrieval. This direction
is of crucial importance in the intelligent retrieval of information
from large knowledge repositories where one has to derive intelligent
ways of sampling the target space. It is also closely related to the
general theme of incorporating prior knowledge usefully in machine
learning tasks.
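The idea of choosing information selectively can be illustrated on a
toy problem. The sketch below is a standard textbook example and not
one of the formulations from the papers mentioned above: it assumes a
one-dimensional threshold classifier on [0, 1] and a noise-free
labeling oracle, and all function names are hypothetical. The active
learner queries the midpoint of its current uncertainty interval; the
passive learner receives points drawn uniformly at random.

```python
import random


def make_oracle(threshold):
    """Return a noise-free labeler: 1 if x >= threshold, else 0."""
    def oracle(x):
        return 1 if x >= threshold else 0
    return oracle


def active_learn_threshold(oracle, n_queries):
    """Query the midpoint of the interval known to contain the threshold."""
    lo, hi = 0.0, 1.0  # invariant: lo < threshold <= hi
    for _ in range(n_queries):
        mid = (lo + hi) / 2.0
        if oracle(mid) == 1:
            hi = mid  # label 1 means threshold <= mid
        else:
            lo = mid  # label 0 means threshold > mid
    return lo, hi


def passive_learn_threshold(oracle, n_queries, rng):
    """Shrink the same interval using uniformly random query points."""
    lo, hi = 0.0, 1.0  # invariant: lo < threshold <= hi
    for _ in range(n_queries):
        x = rng.random()
        if oracle(x) == 1:
            hi = min(hi, x)
        else:
            lo = max(lo, x)
    return lo, hi
```

In this toy setting each active query halves the uncertainty interval,
so its width after n queries is 2^(-n), whereas the interval left by
random sampling depends on where the samples happen to fall.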
Recently, in joint work with N. K. Karmarkar, I have developed a
framework for unsupervised learning that is tractable in the sense
that all algorithms within the framework provably converge to the
globally optimal solution in polynomial time. This property is rare,
since most frameworks use gradient-descent type learning schemes
(backprop or EM) that converge only to locally optimal solutions. A
provably correct algorithm for clustering has been derived within the
framework and extensions to various other kinds of learning problems
are being considered. Applications in speech, vision, and data mining
are being developed.
The Human Language System
There are two aspects of the human language system that fascinate me:
(1) that it is learnt, and (2) that it has two very different
manifestations, in the physical world as speech and in the mental
world as language. How the child might move from the highly variable,
continuous acoustic stream that it receives to the structured, discrete,
symbolic representations of language poses some of the deepest
unsolved scientific questions of our time. How we might get a computer
to do the same presents some of the greatest technological challenges
that we face. I list below some of the research directions I have
chosen to concentrate on; each topic has some connection to
learning and/or recognition that I think is interesting.
Speech Recognition and Perception
Work with Victor Zue attempted to characterize speaker variability and
incorporate articulatory constraints in speech recognition. More
recently, with a variety of people at Bell Laboratories, I have been
exploring alternative techniques for speech recognition. This is
motivated by good reason to believe that the lexicon is organized in
terms of distinctive features and that acoustic cues for these
features are distributed non-uniformly in the time-frequency plane.
The research program has several sub-components, including frameworks
for the robust and accurate detection of distinctive features and for
integrating the asynchronous outputs of such feature detectors to form phonetic
hypotheses. We are proceeding on these issues in parallel using
techniques from machine learning, linguistic representations, and
signal processing to construct a perceptually motivated approach
towards speech recognition that seems promising at this point.
Language Acquisition
This is the classic learning problem that humans solve --- they learn
their native language. I have developed (jointly with R. C. Berwick)
algorithms for the acquisition of syntax and analyzed the
informational complexity of syntactic learning problems. However,
syntax is only a small part of the language acquisition story: the
child receives continuous speech input, from which it has to uncover
the phonetic inventory, the phonological rules, the lexicon, and so
on. I am currently examining ways in which this sort of information
can be extracted with a focus on acquiring phonetic and phonological
knowledge. Progress would lead to computers that can automatically
learn language directly from the speech signal --- much as humans do.
Language Evolution
A twist to the whole language acquisition story is provided by the
fact that if children truly attained the language of the parental
generation perfectly, then languages would be transmitted perfectly
from generation to generation with no change. This, however, is not
the case: languages are known to change over time. By considering a
population of language learners and taking ensemble averages over the
population, one can derive models of language change. Such models are
the evolutionary consequences of language learning. This has developed
into an extremely promising direction of research and suggests a
computational framework within which various aspects of historical
linguistics and language evolution can be studied --- something that
was not possible before. In addition to the obvious applications to
historical linguistics, there are strong algorithmic connections to
genetic algorithms, artificial life, populations of interacting
agents, computational economic agents and the like that I would like
to explore further to shed light on the general theme of the
interaction of learning with evolution.
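To make concrete how a learning rule induces a model of language
change, here is a minimal sketch under strong simplifying assumptions
that are mine and not taken from the work described above: two
competing languages L1 and L2, an effectively infinite population, and
a hypothetical cue-based learner that adopts L1 only if it hears at
least one unambiguous L1 cue among the sentences it receives. All
names and parameters are illustrative.

```python
def next_generation(alpha, cue_rate, n_sentences):
    """Fraction of L1 speakers in the next generation.

    alpha       -- fraction of L1 speakers in the current generation
    cue_rate    -- assumed probability that an L1 sentence carries a cue
    n_sentences -- assumed number of sentences each child hears
    """
    # A random sentence is an L1 cue with probability cue_rate * alpha,
    # so a child hears no cue at all with the probability below.
    p_no_cue = (1.0 - cue_rate * alpha) ** n_sentences
    # In an infinite population, the fraction of children acquiring L1
    # equals the probability that a single child acquires it.
    return 1.0 - p_no_cue


def simulate(alpha0, cue_rate, n_sentences, n_generations):
    """Iterate the population update and return the whole trajectory."""
    trajectory = [alpha0]
    for _ in range(n_generations):
        trajectory.append(
            next_generation(trajectory[-1], cue_rate, n_sentences)
        )
    return trajectory
```

In this toy model the slope of the update at alpha = 0 is
cue_rate * n_sentences. When that product is below 1, L1 dies out even
from a large majority; when it is above 1, L1 spreads from a small
minority to a stable share of the population. The learning rule alone
therefore determines the direction of change.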
APPLICATION AREAS
My research portfolio, spanning as it does the fields of learning,
language, and vision, will result in many applications in the
following two areas of technology:
Multimodal, Intelligent, Adaptive, Human-Computer Interaction
Ultimately, we want to be able to build multimodal computer systems
that interact with humans and learn from such interactions. My
research on learning, vision, and language is aimed at this eventual
goal. In the process of understanding the fundamental principles that
would underlie the construction of such human-computer interaction
systems, various shorter-term applications can be conceived,
e.g. audio-visual speech recognition; combining speech recognition
with natural language processing, leading to spoken language systems;
learning to recognize salient audio-visual cues that are correlated
with end-user objectives; and combining handwriting recognition with
language modeling.
Analysis and Retrieval of Large Knowledge Repositories
Increasingly, we are being forced to deal with huge amounts of
data --- large databases arising from linguistic corpora, image
databases, internet browsing, neural data from fMRI and multiple
electrode studies and so on. We will need to understand the structure
of such data sets, store them, and retrieve them intelligently. There
is therefore a natural nexus with techniques that lie on the boundary
of computer science and statistics, which is precisely where modern
computational learning resides, and I expect applications to emerge
from my work in this area.