## CSPP 56553 - Artificial Intelligence Winter 2004 Homework #3: Due February 4, 2004

### Goals

Through this assignment you will:
• Consider issues in the design of memory-based systems.
• Explore the effect of different expansion strategies on information retrieval

### Background

Previous homeworks have explored the representation and manipulation of knowledge to produce inferences. Using mechanisms such as rule-based systems and constraint satisfaction, we have seen how knowledge about the world and about specific problems can be encoded in different ways. Further, we have seen how the choices of knowledge representation can be manipulated in different ways. However, all these techniques share the fundamental requirement that the knowledge be captured manually by some knowledge engineer to enable later automatic inference. This homework begins our exploration of general techniques that can use the regularities in observed data to acquire functions from inputs vectors to output values or classifications. In other words, we will explore systems that can automatically learn from examples. We begin with two well-known and widely used learning techniques that learn functions from labeled training data: nearest neighbor and identification trees.

### Problem 1

#### Picking Moves

You've studied both alpha-beta minimax game play and nearest neighbor learning. An impatient friend suggests that all that search is simply too complicated and unnecessary since there are many on-line game servers that record the moves of a game. Your friend suggests that rather than doing alpha-beta you should simply download all the games from a server and use a memory-based technique. For any given game position (e.g. board position in a board game like chess or hand of cards in a card game), find a move in the server's collection that matches that position exactly and then do the move made by the player in the recorded game. You argue that things, unfortunately, aren't that simple. Give three problems with your friend's approach to game-playing. Please keep your answers brief (2-3 lines each).

### Problem 2

#### Information Retrieval

Formulate four (4) detailed queries in a domain you know well. You will use these queries to test your intuition in the questions below.

#### Part 1

Expand the queries developed about to include all of the morphological variants of each query word. Submit these expanded queries to two of your favorite search engines. Does such morphological processing seem warranted?

#### Part 2

Using WordNet, an online semantic ontology/thesaurus, expand your queries to include all the synonyms of each word in the original queries aboce. Report on the results of submitting the queries to the two search engines used above.

#### Part 3

Again using WordNet, expand your queries to include only those synonyms that are appropriate for the senses of the terms you intended in the original query. Submit these expanded queries to a set of search engines and compare the results to those in the previous exercise.

#### Part 4

Word sense disambiguation - picking the right senses of synonyms for expansion - seems to have little effect on retrieval effectiveness in settings where long queries are used. Suggest reasons why this might be the case.

#### Part 5: Relevance Feedback

In class, we suggested that you could use information from the set of documents retrieved in an initial query to improve the query formulation by adding new terms.

Positive evidence

Consider the case where you select five documents that you believe to be relevant to use as a source of terms to add to the query. Which terms would you select? Consider values that we suggested should contribute to the weight of query terms. Explain why.

Positive and Negative evidence

Now assume that you have identified 5 non-relevant documents that were nevertheless ranked highly by the first pass of the retrieval system, in addition to the 5 relevant documents chosen above. How would you modify your expansion process given this new evidence? Justify your answer.

### Problem 3

#### Building a Recommender System

You would like to build a book recommender system similar to that on Amazon.com. Based on purchasing patterns - since it's hard to get people to submit actual ratings, you would like to identify similar shoppers, and recommend to them books that were purchased other similar purchasers.

Formulate input feature vectors for training and classification based on book categories and purchasing patterns. Given five example vectors.

Describe the type of labels you assign as classifications - aka the output.

Describe the similarity function for identifying the nearest neighbor.

Some people have similar tastes in book types, but some may be much more voracious readers than others. Does this present a problem for your similarity measures? If not, why not? If it does, explain how to modify your measure to over this problem.