Version: 4.1.1.1

CMSC 15100: Lab 3

In this lab (credit: Adam Shaw) you will write a program that performs a linear regression analysis on a given dataset and plots the result as an image.

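Linear regression is an idea borrowed from statistics. Roughly speaking, it is a method for going from this:

[Figure: scatter plot of a dataset]

to this:

[Figure: the same scatter plot with the best-fit line drawn through it]

where the line it draws is the best-fit line: the single straight line that, taken over the whole dataset, passes as close to the points as possible.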

For this exercise, use the language Beginning Student with List Abbreviations and add the image.ss teachpack.

1 Data definitions

Write data definitions for the data that will be useful in this lab: some representation of a point that has x- and y-coordinates (we’ll call it point in the rest of this lab), some representation of a list of points (which we’ll call a dataset), and a representation of a line in slope-intercept form (i.e., a line is defined by a slope m and an intercept b in the equation y = mx + b), which we’ll call an equation.

Also write examples and templates for each of these kinds of data.

(Note: you do not necessarily need to define structures for each of these. All you need to do is write down how they will be represented in your program, which might be a structure you define, a pre-existing structure, or something else entirely.)
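
As one possible starting point, here is a minimal sketch of such data definitions. It assumes that a point is represented with the built-in posn structure and that an equation is a structure you define yourself; the names equation, P1, P2, DATA1, LINE1, and dataset-fn are only illustrative, and other representations are just as acceptable.

```racket
;; A point is a (make-posn number number), using the built-in posn structure.
;; A dataset is either
;;   - empty, or
;;   - (cons point dataset)
;; An equation is a (make-equation number number),
;; standing for the line y = m*x + b.
(define-struct equation (m b))

;; Examples (hypothetical names)
(define P1 (make-posn 0 0))
(define P2 (make-posn 1 1.5))
(define DATA1 (list P1 P2))
(define LINE1 (make-equation 2 1)) ; the line y = 2x + 1

;; Template for a function that consumes a dataset:
;; (define (dataset-fn ds)
;;   (cond
;;     [(empty? ds) ...]
;;     [else (... (posn-x (first ds)) ... (posn-y (first ds)) ...
;;                (dataset-fn (rest ds)) ...)]))

(check-expect (posn-x P1) 0)
(check-expect (equation-m LINE1) 2)
(check-expect (length DATA1) 2)
```

Using posn saves a define-struct, while a separate equation structure keeps slopes and intercepts from being confused with coordinates.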

2 Calculating best fit

Write the function

  best-fit-line : dataset -> equation

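The slope m of the best-fit line for a particular dataset is given by the standard least-squares formula

  m = (n Σxy - Σx Σy) / (n Σx² - (Σx)²)

The intercept b of the best-fit line for a particular dataset is given by the formula

  b = (Σy - m Σx) / n

In these formulas, n is the number of points in the dataset, x and y refer to the x- and y-coordinates of the points in the dataset, and the notation Σx means "the sum of the x-coordinates of each point in the dataset." Likewise, Σxy is the sum of the products of each point's x- and y-coordinates, and Σx² is the sum of the squares of the x-coordinates.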

To make things a little simpler, you may assume that all denominators are always non-zero.
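
To check the arithmetic by hand before writing any functions, you might work one tiny example. The sketch below assumes the usual least-squares formulas, m = (n Σxy - Σx Σy) / (n Σx² - (Σx)²) and b = (Σy - m Σx) / n, and uses the dataset {(0, 1), (1, 3), (2, 5)}, whose points all lie on y = 2x + 1. The EX- constant names are made up for this illustration.

```racket
;; Worked example for the dataset {(0, 1), (1, 3), (2, 5)}.
(define EX-N 3)                                   ; number of points
(define EX-SUM-X (+ 0 1 2))                       ; sum of x       = 3
(define EX-SUM-Y (+ 1 3 5))                       ; sum of y       = 9
(define EX-SUM-XY (+ (* 0 1) (* 1 3) (* 2 5)))    ; sum of x*y     = 13
(define EX-SUM-XX (+ (sqr 0) (sqr 1) (sqr 2)))    ; sum of x squared = 5

;; slope: (3*13 - 3*9) / (3*5 - 3*3) = 12/6
(define EX-M
  (/ (- (* EX-N EX-SUM-XY) (* EX-SUM-X EX-SUM-Y))
     (- (* EX-N EX-SUM-XX) (sqr EX-SUM-X))))

;; intercept: (9 - 2*3) / 3
(define EX-B
  (/ (- EX-SUM-Y (* EX-M EX-SUM-X)) EX-N))

(check-expect EX-M 2)
(check-expect EX-B 1)
```

Because the three points are exactly collinear, the best-fit line recovers the line they lie on, which makes the expected answers easy to verify.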

Break best-fit-line into small helper functions. For reference, my version uses seven helper functions, none of which is longer than three lines.
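
To give a feel for how small these helpers can be, here are two possible ones. This is only a sketch: it assumes a point is a posn and a dataset is a list of posns, and the names sum-x and sum-xy are hypothetical. Your own decomposition may look quite different.

```racket
;; sum-x : dataset -> number
;; adds up the x-coordinates of all points in the dataset
(define (sum-x ds)
  (cond
    [(empty? ds) 0]
    [else (+ (posn-x (first ds)) (sum-x (rest ds)))]))

;; sum-xy : dataset -> number
;; adds up the product x*y for every point in the dataset
(define (sum-xy ds)
  (cond
    [(empty? ds) 0]
    [else (+ (* (posn-x (first ds)) (posn-y (first ds)))
             (sum-xy (rest ds)))]))

(check-expect (sum-x empty) 0)
(check-expect (sum-x (list (make-posn 1 2) (make-posn 3 4))) 4)
(check-expect (sum-xy (list (make-posn 1 2) (make-posn 3 4))) 14)
```

Both functions follow the list template directly, so the remaining sums can be written the same way.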

3 Plotting points and lines

Write the function

  plot-dataset : dataset number number number number -> image

that makes an image of the given dataset on a graph that has the given minimum and maximum x and y coordinates. Then write the function

  plot-regression : dataset number number number number -> image

that does the same thing as plot-dataset but also draws the best-fit line on the graph. For this second task, use the function add-line provided by the image.ss teachpack.

The outputs should look something like the images in the introduction, generated from this dataset: {(0, 0), (1, 1.5), (1.5, 0.5), (2, 2.5), (2.9, 1), (3, 3.5), (4, 3.5), (4.5, 2), (4.5, 4)}.

There is a subtlety here: when plotting data it’s traditional for the x-axis to point to the right as x-coordinates get larger and for the y-axis to point up as y-coordinates get larger. But in computer graphics generally, and in the image.ss teachpack specifically, the y-axis points down as the y-coordinates get larger. Furthermore, if you want your graph to be big enough to see, you’ll probably want the point (1,1) to be more than one pixel away from the point (1,2).
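
One way to handle both the flipped y-axis and the scaling is to convert graph coordinates to pixel offsets with a pair of small functions. The sketch below is only an illustration: the constant PIXELS-PER-UNIT, its value of 50, and the names x->pixel and y->pixel are assumptions, not part of the lab.

```racket
;; PIXELS-PER-UNIT : how many pixels one unit on either axis occupies
;; (an arbitrary choice for this sketch)
(define PIXELS-PER-UNIT 50)

;; x->pixel : number number -> number
;; distance in pixels from the left edge of a graph whose
;; smallest x-coordinate is x-min
(define (x->pixel x x-min)
  (* PIXELS-PER-UNIT (- x x-min)))

;; y->pixel : number number -> number
;; distance in pixels from the top edge of a graph whose
;; largest y-coordinate is y-max; subtracting from y-max flips the axis
(define (y->pixel y y-max)
  (* PIXELS-PER-UNIT (- y-max y)))

(check-expect (x->pixel 1 0) 50)
(check-expect (y->pixel 1 5) 200)
(check-expect (y->pixel 2 5) 150)
```

With these definitions, (1,1) and (1,2) land 50 pixels apart, and the point with the larger y-coordinate ends up nearer the top of the image.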

A good way to make this problem easier is to define a function that produces a background image for the given minimum and maximum x- and y-coordinates, and puts its pinhole in the position where the point (0, 0) would be plotted.
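
A minimal sketch of such a background function follows. It assumes the rectangle, put-pinhole, pinhole-x, and pinhole-y functions of the image.ss teachpack (available as htdp/image), a white background, and a scale of 50 pixels per unit; the names BG-SCALE and background are hypothetical.

```racket
;; Not needed if the image.ss teachpack has already been added in DrScheme.
(require htdp/image)

;; BG-SCALE : pixels per unit on both axes (an assumed value)
(define BG-SCALE 50)

;; background : number number number number -> image
;; a blank graph covering x-min..x-max and y-min..y-max, with its
;; pinhole placed where the point (0, 0) would be plotted
(define (background x-min x-max y-min y-max)
  (put-pinhole
   (rectangle (* BG-SCALE (- x-max x-min))
              (* BG-SCALE (- y-max y-min))
              'solid 'white)
   (* BG-SCALE (- 0 x-min))   ; pixels from the left edge to x = 0
   (* BG-SCALE y-max)))       ; pixels from the top edge to y = 0

(check-expect (pinhole-x (background -1 4 0 5)) 50)
(check-expect (pinhole-y (background -1 4 0 5)) 250)
```

Placing the pinhole at the origin means later drawing steps can be expressed relative to (0, 0) rather than to the corner of the image.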

4 Submitting your work

Submit your work by Friday night using the hand-in tool built into DrScheme. If you worked with a partner, you must submit twice – once under each partner’s name.