Motivation

At some point in the future we're going to want little robots to help us in our everyday lives. After all, wouldn't it be nice to have a robot wash your dishes, throw out the trash, and cook dinner? I've been thinking about the types of knowledge an agent explicitly needs in order to act appropriately. I believe we won't structure our homes around robots; rather, robots will have to adapt to our environments.

A nice thing about our environments is that we need to impose some sort of structure on them. So when you go looking for pancake mix in a grocery store and you see a sign saying "maple syrup", you might think pancake mix could be nearby. That's an example of knowing the organization of an environment. In designing sensors to process images I go one step further and make a number of assumptions about the visual images: for example, that light comes from above and that shelves always have items. You can take a look at some of these images in the simulated store.

Project Description

I'm exploring the ways in which people use general principles to function intelligently in man-made environments. For example, you know how to get a drink of water in a house even if you've never been there before.

To explore my ideas more concretely, I built an agent, Shopper, which shops in the domain of grocery stores. Its simplest task is to find grocery items in GroceryWorld. To make this possible, I first identified two types of useful information: structural and perceptual regularities.

Structural regularities

Structural regularities are the social norms to which a culture subscribes so that people can get along without much trouble. Driving is a good example because it's an activity created by people, replete with rules and regulations. Regular violation of a rule, say running a stop light, degrades everyone's performance on the road, while regular adherence improves it. Kitchens are also a good example. The cupboards, refrigerator, and drawers have customary uses which people learn and follow. For example, looking for a fork anywhere other than a waist-high drawer can be a futile effort, while looking specifically in a waist-high drawer can significantly improve performance. Essentially, structural regularities are the rules of thumb a society follows. Deviation from these rules often causes more harm than good.

A designer can use structural regularities to construct a plan library which constitutes the agent's operational knowledge for accomplishing tasks. For the task of finding a fork in a kitchen, there is no guarantee that a fork will be in a waist-high drawer, but it is a likely location. Instead of considering all the conceivable places a fork could be, an agent should first consider the appropriate places for a fork. In turn, a designer can write a plan which prescribes searching a drawer for a fork first. After finding a fork, the agent can then fetch forks from, and return them to, the same drawer.
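
As a rough illustration of the "appropriate places first" idea (a sketch only, not Shopper's actual plan library), the Python below orders a search so that customary locations come before everything else. The table CUSTOMARY_LOCATIONS, the functions search_order and find_item, and the contains callback are all hypothetical names invented for this example.

```python
from typing import Callable, Optional

# Hypothetical table of customary storage places, most likely first.
# The entries are illustrative assumptions, not data from Shopper.
CUSTOMARY_LOCATIONS = {
    "fork": ["waist-high drawer", "dish rack", "dishwasher"],
    "milk": ["refrigerator"],
}


def search_order(item: str, all_locations: list[str]) -> list[str]:
    """Return customary places first, then every other place as a fallback."""
    locations = list(all_locations)
    preferred = [loc for loc in CUSTOMARY_LOCATIONS.get(item, []) if loc in locations]
    rest = [loc for loc in locations if loc not in preferred]
    return preferred + rest


def find_item(
    item: str,
    all_locations: list[str],
    contains: Callable[[str, str], bool],
) -> Optional[str]:
    """Look in each place in turn; `contains` stands in for actually looking."""
    for location in search_order(item, all_locations):
        if contains(location, item):
            return location
    return None  # no guarantee the item is anywhere at all


kitchen = ["refrigerator", "cupboard", "waist-high drawer", "dish rack"]
print(search_order("fork", kitchen))
```

The fallback to the remaining locations reflects the point that a regularity is a likely location rather than a guarantee: the customary places only change the order of the search, not its coverage.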

Perceptual regularities

Perceptual regularities are invariant properties an environment has with respect to perception. For example, on a road only stop signs have an octagonal shape. They are also the only signs which are, aside from the text, all red. A driving system which needs to know whether there is a stop sign at an intersection might only need to look for a red octagonal shape on the right side of the road. Here, an octagonal shape and a predominantly red and white color region are simple features which can be cheaply detected by a vision system, allowing a designer to construct a simple yet effective mechanism.
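
To make the stop-sign example concrete, here is a minimal sketch of such a specialized check. It assumes some upstream step has already segmented a candidate region and counted the corners of its outline; the Region type, the color thresholds, and the min_red and min_red_or_white parameters are assumptions chosen for illustration, not values from any real driving system.

```python
from dataclasses import dataclass


@dataclass
class Region:
    corner_count: int                    # corners of the outline (assumed given)
    pixels: list[tuple[int, int, int]]   # RGB values inside the region


def is_red(rgb: tuple[int, int, int]) -> bool:
    r, g, b = rgb
    return r > 150 and g < 90 and b < 90   # illustrative threshold


def is_white(rgb: tuple[int, int, int]) -> bool:
    return min(rgb) > 200                  # illustrative threshold


def looks_like_stop_sign(
    region: Region, min_red: float = 0.6, min_red_or_white: float = 0.9
) -> bool:
    """Cheap test: octagonal outline over a mostly red (plus white) region."""
    if region.corner_count != 8 or not region.pixels:
        return False
    total = len(region.pixels)
    red = sum(1 for p in region.pixels if is_red(p))
    white = sum(1 for p in region.pixels if is_white(p))
    return red / total >= min_red and (red + white) / total >= min_red_or_white


sign = Region(8, [(200, 20, 20)] * 80 + [(255, 255, 255)] * 20)
print(looks_like_stop_sign(sign))
```

The check is cheap precisely because it leans on the regularity: it never reads the text on the sign, and it would be fooled by any other red octagon, which the environment is assumed not to contain.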

Research Problems

Initially, I was just trying to get Shopper to look at signs and search aisles for a sought item. In addressing this problem, I roughly divided work into action selection and perception. Action selection has to do with figuring out what to do now in order to accomplish tasks. Perception has to do with figuring out what to look at, and where to look.

Action and perception

In response, I built a simple plan execution system based on the Runner and RAPs systems. Essentially, Shopper executes hierarchical plans which check the state of the world and recursively enable other plans. For example, the basic search plan for Shopper is to go to the end of an aisle, and then start moving across aisles until it sees a relevant sign. If Shopper ends up back where it started, it knows that the plan didn't work. But if a relevant sign is seen (say Shopper is looking for AppleJacks and it sees a cereal sign), then Shopper will activate an "aisle search" plan which has it look to the left and to the right while traveling down the aisle searching for AppleJacks. As Shopper looks to the left and the right, it uses its vision routines to interpret the images it's seeing. For example, it will first figure out where the shelves are, then find color regions on top of a shelf that are close to the colors of AppleJacks, and then try template matching. I hooked the perception and action selection mechanisms together so that Shopper could do a primitive form of shopping. This work was written up and published at IJCAI-95.
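
The control flow of the search plan described above can be sketched in a few lines. This is a toy model under stated assumptions, not the Runner- or RAPs-based system itself: the Aisle and World types, the ITEM_CATEGORY table, and the rule that Shopper moves on to the next aisle when an aisle search comes up empty are all invented for the example, and reading a sign or a shelf is reduced to a lookup instead of a vision routine.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import Optional

# Hypothetical knowledge linking items to the sign they should appear under.
ITEM_CATEGORY = {"AppleJacks": "cereal", "FrootLoops": "cereal", "milk": "dairy"}


@dataclass
class Aisle:
    signs: set[str]
    left: list[str]    # items on the left shelf, in walking order
    right: list[str]   # items on the right shelf, in walking order


@dataclass
class World:
    aisles: list[Aisle]
    position: int = 0  # index of the aisle whose end Shopper is standing at


def aisle_search(world: World, item: str) -> Optional[int]:
    """Sub-plan: walk down the current aisle, looking left and right."""
    aisle = world.aisles[world.position]
    for step, (on_left, on_right) in enumerate(zip_longest(aisle.left, aisle.right)):
        if item in (on_left, on_right):
            return step
    return None


def find_item(world: World, item: str) -> Optional[tuple[int, int]]:
    """Top-level plan: move across aisles until a relevant sign is seen."""
    if not world.aisles:
        return None
    category = ITEM_CATEGORY.get(item)
    start = world.position
    while True:
        if category in world.aisles[world.position].signs:
            step = aisle_search(world, item)      # enable the sub-plan
            if step is not None:
                return (world.position, step)
        world.position = (world.position + 1) % len(world.aisles)
        if world.position == start:
            return None                           # back at the start: plan failed


store = World([
    Aisle({"dairy"}, ["milk", "butter"], ["yogurt", "cheese"]),
    Aisle({"cereal"}, ["Cheerios", "granola", "oatmeal"], ["Wheaties", "cornflakes", "AppleJacks"]),
    Aisle({"cleaning"}, ["detergent"], ["bleach"]),
])
print(find_item(store, "AppleJacks"))
```

The point of the hierarchy is that the expensive sub-plan (walking an aisle and running vision routines on every shelf) is only enabled once a cheap check on the world state, the sign, says it is worthwhile.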

Navigation

The next ability I wanted Shopper to have was the ability to make a map of its environment so that it wouldn't have to search in the future. One thing about humans is that we're predictable: we often buy a fixed set of goods such as milk, cereal, butter, laundry detergent, apples, etc. So it makes sense to remember where we find these. This way we don't have to keep searching all the time. Searching every time is simply less efficient, sort of like driving with a map instead of just knowing where to go.

One problem with current map-making research is that it often assumes that the goal of the robot is to make a map. That's not true for Shopper. It has to search for an item, and when it finds the item, it needs to remember where. Making a map of the store first is possible, but people don't do that, so why should a robot have to? In order to come up with a solution, we need to address these problems:

path planning
route following
location identification
map making
task

Previous research has often held one of these five "constant" while addressing the other four. My response to these problems was to have Shopper make a map incrementally while it searches for items. Essentially, the idea is to have a passive mapper monitor Shopper's actions (both physical and perceptual) and note when Shopper is at an intersection or when it finds something. It works reasonably well and was published at AAAI-96.
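
One way to picture a passive mapper is as an observer that is handed every action Shopper takes and keeps only what is worth remembering. The sketch below assumes a simple event format (a kind, a place, and an optional detail); the PassiveMapper class, the event names "at_intersection" and "found_item", and the place labels are hypothetical and are not taken from the published system.

```python
from dataclasses import dataclass, field


@dataclass
class PassiveMapper:
    """Builds a map as a side effect of watching the agent act."""
    intersections: list[str] = field(default_factory=list)
    item_locations: dict[str, str] = field(default_factory=dict)

    def observe(self, kind: str, place: str, detail: str = "") -> None:
        if kind == "at_intersection":
            if place not in self.intersections:
                self.intersections.append(place)
        elif kind == "found_item":
            self.item_locations[detail] = place
        # Every other action (turning, looking, moving) is ignored.


mapper = PassiveMapper()
# A hypothetical trace of physical and perceptual actions during one search.
trace = [
    ("move", "front of store"),
    ("at_intersection", "aisle 1 end"),
    ("look_left", "aisle 1"),
    ("at_intersection", "aisle 2 end"),
    ("found_item", "aisle 2", "AppleJacks"),
]
for event in trace:
    mapper.observe(*event)
print(mapper.item_locations)
```

Because the mapper never issues actions of its own, the search task stays in charge of where Shopper goes, and the map simply grows with each shopping trip.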

Noticing opportunities

Earlier I mentioned Shopper using perceptual regularities. Relying on perceptual regularities encourages the design of specialized mechanisms. While specialized mechanisms are simple and, in general, very efficient, they are not general. In the same way that forks could be stored in a refrigerator, there could be times when the world presents circumstances which a specialized mechanism can't handle because we didn't anticipate them.

Shopper doesn't solve the hardest problem of expecting the unexpected. It does, however, expect the not-quite-unexpected: it uses its map of the environment to predict opportunities. This way it can get a handle on the perception necessary, plus know when to take advantage of an opportunity.
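
One possible reading of "using the map to predict opportunities" is sketched below: given the route Shopper is about to follow and the item locations it has remembered, work out in advance which wanted items it should expect to pass, so that the matching vision routines need only be active in those places. The function predict_opportunities and its arguments are hypothetical, and this is an interpretation of the idea, not a description of Shopper's mechanism.

```python
def predict_opportunities(
    route: list[str],
    item_locations: dict[str, str],
    shopping_list: list[str],
) -> dict[str, list[str]]:
    """Map each place on the route to the wanted items expected there."""
    wanted = set(shopping_list)
    expected_by_place: dict[str, list[str]] = {}
    for place in route:
        expected = sorted(
            item
            for item, location in item_locations.items()
            if location == place and item in wanted
        )
        if expected:
            expected_by_place[place] = expected
    return expected_by_place


remembered = {"milk": "aisle 1", "AppleJacks": "aisle 2", "bleach": "aisle 3"}
route = ["aisle 1", "aisle 2", "aisle 3"]
print(predict_opportunities(route, remembered, ["AppleJacks", "milk"]))
```

Places with no expected items are left out of the result, which is what would let the agent skip the perceptual work there.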

Questions & Answers

Here are some common questions I get about this research project.

Can't you automate this?
Well, of course. For every specific instance of everyday life, we can engineer the environment to be more accommodating to our artificial friends. But the question misses the point: We want agents who can live in our everyday world---not the other way around!

What about these vision routines? They're very specific to the task. Do they really say anything about general vision?
Yes, they do. You can view these routines as computing the minimal information necessary to perform the task. Of course, when conditions change substantially, they won't work. But for every actual instance where a routine might fail, I believe a new mechanism could be created and substituted. If this hypothesis is true, then it suggests a new way of approaching general vision. But it also raises a different sort of problem: deciding which mechanism to use at any one time.


Last modified 07 November 1997