At some point in the future we're going to want little robots to help
us in our everyday lives. After all, wouldn't it be nice to have a
robot wash your dishes, throw out the trash, and cook dinner? I've been
thinking about the types of knowledge an agent explicitly needs in
order to act appropriately. I don't believe we'll
structure our homes around robots; rather, robots will have to adapt
to our environments.
A nice thing about our environments is that we need to impose some sort of
structure on them. So when you go looking for pancake mix in a grocery
store and you see a sign saying "maple syrup", you might think pancake
mix could be nearby. That's an example of knowing the organization of
an environment. In designing sensors to process images I go one step
further and make a bunch of assumptions concerning the visual images.
For example, light comes from above, shelves always have items, etc.
You can take a look at some of these images in the simulated store.
I'm exploring the ways in which people use general principles to
function intelligently in man-made environments. For example, you
know how to get a drink of water in a house even if you've never been
there before.
To explore my ideas more concretely, I built an agent, Shopper,
which shops in the domain of grocery stores. Its simplest task is to
find grocery items in GroceryWorld.
In order to do this, I first identified two types of useful
information: structural and perceptual regularities.
Structural regularities
Structural regularities are the social norms to which a culture
subscribes so that people can get along without much trouble. Driving
is a good example because it's an activity created by people, replete
with rules and regulations. Regular violation of a rule---say running
through a stop light---downgrades everyone's performance on the road.
But regular adherence to the rule upgrades everyone's performance.
Kitchens are also a good example. The cupboards, refrigerator, and
drawers have customary uses which people learn and follow. For
example, looking for a fork anywhere other than a waist-high drawer
can be a futile effort. But specifically looking for a fork in a
waist-high drawer can significantly improve performance. Essentially,
structural regularities are the rules of thumb a society follows.
Deviation from these rules often causes more harm than good.
A designer can use structural regularities to construct a plan library
which constitutes the agent's operational knowledge for accomplishing
tasks. For the task of finding a fork in a kitchen, there is no
guarantee that a fork will be in a waist-high drawer, but it is a likely
location. Instead of considering all the conceivable places a fork
could be, an agent should first consider the appropriate places for a
fork. In turn, a designer can write a plan which prescribes a course
of action to search for a fork in a drawer first. After finding a
fork, the agent can then fetch and return forks to the same drawer.
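The "likely places first" idea can be sketched in a few lines of Python. This is only an illustration, not Shopper's actual plan library: the table LIKELY_PLACES, the function find_item, and the kitchen layout are all hypothetical names made up for the example.

```python
# Hypothetical sketch: all names and locations here are illustrative
# assumptions, not part of Shopper's actual plan library.

# Structural regularity: customary storage places for an item, likeliest first.
LIKELY_PLACES = {
    "fork": ["waist-high drawer", "dish rack", "dishwasher"],
}


def find_item(item, kitchen, likely_places=LIKELY_PLACES):
    """Search customary places first, then fall back to every other place.

    `kitchen` maps a place name to the set of items stored there.
    Returns (place, places_checked); place is None if the item is not found.
    """
    preferred = [p for p in likely_places.get(item, []) if p in kitchen]
    remaining = [p for p in kitchen if p not in preferred]
    checked = 0
    for place in preferred + remaining:
        checked += 1
        if item in kitchen[place]:
            return place, checked
    return None, checked


kitchen = {
    "refrigerator": {"milk"},
    "cupboard": {"plates"},
    "waist-high drawer": {"fork", "spoon"},
}
print(find_item("fork", kitchen))
```

Nothing is guaranteed here: if the fork is not in a customary place, the search simply falls back to checking everywhere else.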
Perceptual regularities
Perceptual regularities are invariant properties an environment
has with respect to perception. For example, only stop signs have an
octagonal shape on a road. They are also the only signs which are,
aside from text, all red. A driving system which needs to know whether
there is a stop sign at an intersection might only need to look for
a red octagonal shape on the right side of the road. Here, an
octagonal shape and a predominantly red and white color region are
simple features which can be cheaply detected by a vision system,
allowing a designer to construct a simple yet effective mechanism.
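As a rough illustration of how cheap such a check can be, here is a minimal Python sketch. It assumes some earlier stage has already produced a corner count for a candidate shape and the list of RGB pixels inside it; the function names and every threshold are hypothetical choices for the example, not values from a real vision system.

```python
# Hypothetical sketch: the thresholds and function names are assumptions
# chosen for illustration only.

def red_fraction(pixels):
    """Fraction of (r, g, b) pixels that are predominantly red."""
    if not pixels:
        return 0.0
    red = sum(1 for r, g, b in pixels if r > 150 and g < 100 and b < 100)
    return red / len(pixels)


def looks_like_stop_sign(corner_count, pixels, min_red=0.6):
    """Cheap test: an eight-cornered shape whose region is mostly red."""
    return corner_count == 8 and red_fraction(pixels) >= min_red


# A toy region: mostly red pixels plus a few white ones for the lettering.
region = [(200, 20, 20)] * 8 + [(255, 255, 255)] * 2
print(looks_like_stop_sign(8, region))
```

The test only works because of the regularity: on a road, nothing else is expected to be both octagonal and mostly red.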
Initially, I was just trying to get Shopper to look at signs and
search aisles for a sought item. In addressing this problem, I
roughly divided work into action selection and
perception. Action selection has to do with figuring out what
to do now in order to accomplish tasks. Perception has to do
with figuring out what to look at, and where to look.
Action and perception
In response, I built a simple plan execution system based on the
Runner and RAPs systems. Essentially,
Shopper executes hierarchical plans which check the state of the world
and recursively enable other plans. For example, the basic search
plan for Shopper is to go to the end of an aisle, and then start
moving across aisles until it sees a relevant sign. If Shopper ends
up back where it started, it knows that that plan didn't work. But if
a sign is seen---let's say Shopper is looking for AppleJacks and it
sees a cereal sign---then Shopper will activate an "aisle search" plan
which will have it look to the left and to the right while traveling
down the aisle searching for AppleJacks. As Shopper looks to the left
and the right, it uses its vision routines to interpret the images
it's seeing. For example, it'll first figure out where the shelves
are, find color regions on top of the shelf close to AppleJacks, and
then try template matching. I hooked the perception and action
selection mechanisms together so that Shopper could do a primitive
form of shopping. This work was written up and published at IJCAI-95.
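The shape of that search can be sketched as two nested plans in Python. This is a toy stand-in, not the Runner or RAPs machinery: the store representation, the function names, and the idea of passing in the sign category to look for are all assumptions made for the example, and set membership stands in for the vision routines.

```python
# Hypothetical sketch: a toy store and two plans. The data layout and
# names are assumptions; real perception is replaced by set lookups.

def aisle_search(aisle, item):
    """Travel down the aisle, checking the left and right shelves at each step."""
    for position, (left, right) in enumerate(aisle["shelves"]):
        if item in left:
            return position, "left"
        if item in right:
            return position, "right"
    return None


def find_in_store(store, item, category):
    """Move across aisles until a relevant sign is seen, then enable aisle_search."""
    for index, aisle in enumerate(store):
        if category in aisle["signs"]:
            found = aisle_search(aisle, item)
            if found is not None:
                return (index,) + found
    return None  # every aisle passed without success: the plan didn't work


store = [
    {"signs": {"bread"}, "shelves": [({"rye"}, {"bagels"})]},
    {"signs": {"cereal"},
     "shelves": [({"Cheerios"}, set()), (set(), {"AppleJacks"})]},
]
print(find_in_store(store, "AppleJacks", "cereal"))
```

The result is an (aisle index, position in aisle, side) triple, or None when the top-level plan fails.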
Navigation
The next ability I wanted Shopper to have was the ability to make a
map of its environment so that it didn't have to search in the future.
One thing about humans is that we're predictable: we often buy a fixed
set of goods such as milk, cereal, butter, laundry detergent, apples,
etc. So, it makes sense to remember where we find these. This way we
don't have to keep searching all the time, which is simply less
efficient. Searching every time is sort of like having to drive with a
map instead of just knowing where to go.
One problem with current map-making research is that it often assumes
that the goal of the robot is to make a map. That's not true for
Shopper. It has to search for an item, and when it finds it, it then
needs to remember. Making a map of the store first is possible, but
people don't do that and why should a robot have to? In order to come
up with a solution, we need to address these problems:
- path planning
- route following
- location identification
- map making
- task
Previous research has often held one of these five "constant" while
addressing the other four. My response to these problems was to have
Shopper make a map incrementally while it was searching for items.
Essentially, the idea is to have a passive mapper monitor Shopper's
actions (both physical and perceptual) and note when Shopper is at an
intersection or has found something. This approach works reasonably
well and was published at AAAI-96.
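A minimal Python sketch of a passive mapper might look like the following. The event format, the class name PassiveMapper, and the place labels are hypothetical; the only point is that the mapper never chooses actions, it just watches the stream of events the search produces.

```python
# Hypothetical sketch: the event dictionaries and all names are
# assumptions made for illustration.

class PassiveMapper:
    """Builds a map as a side effect of watching the agent's events."""

    def __init__(self):
        self.intersections = []   # places noted as intersections, in order seen
        self.item_locations = {}  # item name -> place where it was found

    def observe(self, event):
        kind = event["kind"]
        if kind == "at_intersection":
            if event["place"] not in self.intersections:
                self.intersections.append(event["place"])
        elif kind == "found_item":
            self.item_locations[event["item"]] = event["place"]
        # Every other kind of event is ignored.


events = [
    {"kind": "at_intersection", "place": "aisle 1 end"},
    {"kind": "look", "direction": "left"},
    {"kind": "at_intersection", "place": "aisle 2 end"},
    {"kind": "found_item", "item": "AppleJacks", "place": "aisle 2"},
]
mapper = PassiveMapper()
for event in events:
    mapper.observe(event)
print(mapper.item_locations)
```

Because the mapper is only an observer, the map grows incrementally while the agent gets on with its shopping task.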
Noticing opportunities
Earlier I mentioned Shopper using perceptual regularities. Relying on
perceptual regularities encourages the design of specialized
mechanisms. While specialized mechanisms are simple and, in general,
very efficient, they are not general. In the same way that forks
could be stored in a refrigerator, there could be times when the world
presents circumstances which a specialized mechanism can't handle
because we didn't anticipate them.
Shopper doesn't solve the hardest problem of expecting the unexpected.
It does, however, expect the not-quite-unexpected: it uses its map of
the environment to predict opportunities. This way it can get a
handle on the perception necessary, plus know when to take advantage
of an opportunity.
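One way to picture predicting opportunities from a map is the Python sketch below. It is an assumption about how such a prediction could work, not a description of Shopper's mechanism: predict_opportunities, the map contents, and the route are all invented for the example.

```python
# Hypothetical sketch: one possible way to predict opportunities from a
# remembered map. All names and data are illustrative assumptions.

def predict_opportunities(item_locations, shopping_list, route):
    """Return {place: [items]} for wanted items the map places along the route."""
    opportunities = {}
    for place in route:
        items = sorted(
            item for item in shopping_list
            if item_locations.get(item) == place
        )
        if items:
            opportunities[place] = items
    return opportunities


item_locations = {"milk": "aisle 1", "butter": "aisle 1", "AppleJacks": "aisle 3"}
shopping_list = {"milk", "butter", "apples"}
route = ["aisle 1", "aisle 2"]
print(predict_opportunities(item_locations, shopping_list, route))
```

Items the map knows nothing about, like the apples here, produce no prediction, so they still have to be found by searching.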
Here are some common questions I get about this research project.
- Can't you automate this?
- Well, of course. For every specific instance of everyday life, we
can engineer the environment to be more accommodating to our
artificial friends. But the question misses the point: We want
agents who can live in our everyday world---not the other
way around!
- What about these vision routines? They're very specific for the
task. Do they really say anything about general vision?
- Yes they do. You can view these routines as computing the minimal
information necessary to perform the task. Of course, when
conditions substantially change, they won't work. But for every
actual instance where a routine fails, I believe a
new mechanism could be created and substituted. If this
hypothesis is true, then it poses a new way of approaching
general vision. But it also raises a different sort of problem;
namely, deciding which mechanism to use at any one time.
Last modified 07 November 1997