Introduction, ComSci 221, U. Chicago

Introduction

Lecture Notes for Com Sci 221, Programming Languages

Last modified: Fri Jan 27 15:13:25 1995

Nature of the course

There are at least 4 different courses that are taught in different CS departments under the title of "Programming Languages":

how to program in several different languages
survey of the history and nature of several languages
how to implement programming languages
conceptual issues in programming languages

Com Sci 221 is the 4th sort of course. We will study the nature of programming languages, what they can and should be, rather than precisely what they are and how to use them. We will use Pascal, C, Scheme, ML, and Prolog as examples to illustrate important concepts, but these languages and their programming techniques are not the focus of our study.

Why do I teach a concepts course, instead of programming techniques, survey or implementation?

I like it
it's the U. Chicago egghead thing to do
the concepts last longer than the rest

Computing techniques in general, and programming languages in particular, are still changing so quickly that detailed knowledge of a given programming language becomes obsolete quickly. I disapprove of the idea of measuring progress in computing by the number of different programming languages that you "know." There is so much that is useful to know about computing, and new stuff is invented so fast, that premature immersion in detailed information is wasteful. The right approach to computing is to learn the concepts of problem solving, and to learn how to look up and understand the relevant details as you need them. To start using a new programming language, a good computer practitioner needs a manual and a few hours of study, not a quarter course at a liberal arts college.

So, in Com Sci 221, we will try to understand the fundamental structural issues that lead programming languages to be what they are. The course will not prepare you thoroughly to program in particular languages. But, it will give you the foundational insights to understand the programming languages of the future, when they are invented. It will not teach you to write interpreters and compilers, but it will prepare you to take a course, or read a book, on programming language implementation.

Multiple views of a programming language

There are 3 different points of view from which to consider a programming language:

designer (the inventor of the language)
implementor (the one who makes the interpreter or compiler work)
user (the one who writes programs in the language)

Programming language courses can take any one of these points of view. Com Sci 221 takes all 3. This makes life harder for you, but it's the only way to really understand the structural concepts in programming languages. Programming languages are what they are because all 3 of these types of people must deal with them successfully. The difficulty of this course does not lie in the inherent complexity of any of the ideas and skills that I expect you to understand. Rather, it lies in the need to switch flexibly and quickly between different approaches to a problem. Not only must you be able to consider in rapid succession the 3 points of view, but you must also apply simple mathematical tools, socially determined conventions, and intuitive common sense, to each problem. Sometimes doing math is not as hard as knowing when to do math, and what math to do.

Once a programming language is designed and implemented, and in principle only the user needs to be involved with it, you might think that life becomes simpler. Not really. The problem is that users must do many things with programs. Introductory programming courses focus attention on writing programs, but that is only a small part of the activity involving a program, which includes:

writing the original program
executing it on useful inputs
reading it to understand what it does
testing it to find errors
correcting the errors
modifying it to adapt to changes in the world and our needs
adapting and extending it to provide more useful services
porting it to different computers and languages

End Wednesday 4 January

Begin Friday 6 January
Furthermore, these activities go on under conditions that vary along many different dimensions, including:

program size (dealing with a 1000-page program is completely different from dealing with a 12-line program)
frequency of use (a program that is used millions of times should meet different standards from one that is used once or a few times)
who uses it (writing programs for use by strangers is different from writing for your colleagues, your boss, yourself)
how efficient it needs to be (sometimes, speed requirements conflict with other requirements)
the consequences of errors (a program for a life-support system should meet different standards from a program to play backgammon)
stability of the requirements (a program to compute orbits of astronomical bodies might not have to change a lot; a program to figure income tax must change each year)
who modifies it (a program that must be modified by others must be particularly understandable)

If you start multiplying out the numbers of significantly different possibilities in each of these dimensions, you'll see why a lot of imagination is required on your part to take the tiny examples we will have time for in this course, and extrapolate the insights that they provide into general understanding of the world of programming languages.
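To get a feel for that multiplication, here is a rough sketch in Python. It assumes, purely for illustration, that each of the seven dimensions listed above has just three significantly different settings. That figure of three is an assumption, not a number from these notes, and the names `DIMENSIONS`, `SETTINGS_PER_DIMENSION`, and `ACTIVITIES` are hypothetical.

```python
from itertools import product

# The seven dimensions listed above.
DIMENSIONS = [
    "program size",
    "frequency of use",
    "who uses it",
    "efficiency needed",
    "consequences of errors",
    "stability of the requirements",
    "who modifies it",
]

# ASSUMPTION: three significantly different settings per dimension.
SETTINGS_PER_DIMENSION = 3

# The eight activities listed earlier (writing, executing, reading, ...).
ACTIVITIES = 8

# Enumerate every combination of settings across all dimensions.
conditions = list(product(range(SETTINGS_PER_DIMENSION), repeat=len(DIMENSIONS)))

print(len(conditions))               # 3 ** 7 = 2187
print(len(conditions) * ACTIVITIES)  # 2187 * 8 = 17496
```

Even with so coarse an assumption, there are thousands of distinct situations, and a tiny classroom example sits in just one of them.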

Introductory programming courses, due to their limited time and resources, are inherently deceitful. Many instructors try to teach programming techniques that are appropriate for moderate-sized (tens of pages) programs, heavily used by strangers over a long period of time, with a high cost for errors, a modest need for efficiency, and the potential for many modifications made in the future by strangers. This is a sensible choice, since it requires a very disciplined style of programming (it's easier to relax discipline in later work than to tighten it), yet programs of that size can still be written by individuals in a few days. And, fairly disciplined programming is advantageous in typical "real world" applications, where the cost of maintaining programs is much greater than the cost of writing them initially. Unfortunately, the actual experience you usually have with homework exercises is quite different. You write tiny programs (1 or 2 pages), use them once yourself to produce sample executions, and feel no incentive for reliability and only a slight incentive for efficiency. The programs are never modified, and you throw them away immediately after they are graded. Even in the "real world" it is hard to create incentives for disciplined programming, since the original creator of a program is often not the one who must maintain it.

Most students notice intuitively that the disciplined techniques espoused by the instructor are not really useful for just getting the assignments done. It is quite difficult to develop the vivid imagination required to see the utility of disciplined programming techniques in a more realistic setting than the completion of a homework assignment. Another hard thing about Com Sci 221 is that I require a lot of that sort of imagination. All of the homework programs will be really tiny examples, solving toy problems with only an oblique connection to the "real world." Rather than exercising programming ability, they will illustrate key structural concepts in the shortest possible way, by focussing on the special qualities that make one language essentially different from another. In spite of their tininess, the homework programs will be difficult because

the problems will be peculiar
you won't have time to really "learn" each language; rather, you will have the manual beside you, and fly by the seat of your pants
I will demand that you use your programs, along with explanatory short essays, in order to explain something interesting about programming languages, not just to produce a particular silly output

A mind-opening paradox

Here is a simple thought experiment, intended only to open your mind to the fact that obvious facts about programs are often false.

Axiom: A correct program is better than an incorrect one.

An axiom is a self-evident proposition, and it's hard to imagine anything much more self-evident than the proposition that a correct program is better than an incorrect one. Like many axioms, this one is false.

Consider the rather silly toy problem of printing out the vowels in the Roman alphabet (I'm already asking you to use your imagination to scale the structural qualities of tiny toy programs up to practical "real world" programs that would be too big to discuss in class). Here is a correct program to solve this problem, using Pascal notation:

n := ord('A');
for i := 4 to 8 do
begin
  write(chr(n));
  n := n + 2 * (i div 2)
end

Believe it or not, this program works. Now, consider the incorrect program:

write('A');
write('E');
write('O');
write('U');

Which is more valuable? (Remember, you are imagining that these programs are really much longer, so the work involved in writing them is substantial, rather than trivial.)

Although the first program is correct, it's hard to see why. If it were solving a problem for which you did not already know the answer (which is usually the case for useful programs), then it would be very difficult to develop confidence in the program. If an error were introduced by accident, it would be quite difficult to correct.
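One way to develop that confidence is to trace the loop by hand. The sketch below is a Python transcription of the first program (Python is a choice made here for illustration, not one of the course's languages, and the name `obscure_vowels` is hypothetical). It records, for each value of `i`, the character written and the increment then added to `n`.

```python
def obscure_vowels():
    """Transcription of the first (correct but obscure) Pascal program."""
    written = []
    steps = []                     # (i, character written, increment added to n)
    n = ord('A')
    for i in range(4, 9):          # Pascal: for i := 4 to 8
        increment = 2 * (i // 2)   # Pascal: 2 * (i div 2)
        written.append(chr(n))
        steps.append((i, chr(n), increment))
        n = n + increment
    return ''.join(written), steps


result, steps = obscure_vowels()
for i, ch, increment in steps:
    print(i, ch, increment)
print(result)
```

The increments come out as 4, 4, 6, 6, 8, and the first four happen to match the gaps between A, E, I, O, and U in the character code. Nothing in the program text says so. You have to work it out, and that is exactly the difficulty.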

Although the second program is incorrect, it's obvious what it does, and very easy to correct the one small error: the missing write('I'). The resulting program is clear, and we can be pretty confident that it's now correct. Finally, imagine that we are told to reinterpret the problem so that 'Y' is also treated as a vowel. The second program is easy to modify for this purpose. It is hard to imagine modifying the first program for any reason.
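The contrast can be made concrete with a small sketch, again transcribed into Python for illustration. The function names and the `with_y` and `last_i` parameters are hypothetical knobs added here, not part of either original program. The explicit program gains 'Y' with one added line. The most naive change to the arithmetic program, running the loop one step further, does not produce 'Y' at all.

```python
def explicit_vowels(with_y=False):
    """The second program, with the missing 'I' restored."""
    letters = ['A', 'E', 'I', 'O', 'U']
    if with_y:
        letters.append('Y')        # the entire modification: one more write
    return ''.join(letters)


def arithmetic_vowels(last_i=8):
    """The first program; last_i is a hypothetical knob for the loop bound."""
    written = []
    n = ord('A')
    for i in range(4, last_i + 1):
        written.append(chr(n))
        n = n + 2 * (i // 2)
    return ''.join(written)


print(explicit_vowels())             # AEIOU
print(explicit_vowels(with_y=True))  # AEIOUY
print(arithmetic_vowels(last_i=9))   # AEIOU]
```

After 'U' the arithmetic program adds 8, but 'Y' is only 4 characters past 'U', so the extra iteration lands on ']' instead. Getting 'Y' out of it means rethinking the formula, not adding a line.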

What is structure?

In this course, we focus attention on the structure of programs and the languages that they are written in. What the heck is "structure"? Webster says:

structure: 1: manner of building, constructing, or organizing

The phrase "structure of a program" sounds as though the structure is inherent in the program itself. That may be true for physical objects, but for abstract objects such as programs it doesn't make much sense. Notice that Webster's definition attributes structure, not to an object, but to the way in which it is constructed or organized. Consider that the same final result may be constructed or organized in more than one way. For programs and other abstract objects, we need to go even farther than Webster in attributing structure, not to an object, but to the way that object relates to other things.

For our purposes, the structure of a program is given by the operations that we can perform on the program, and the relation of the program to other objects of interest. For example, the relation of the program to its well-formed parts is clearly an important aspect of structure. But, the relation of a program to the other programs that it can be transformed into, and the amount of effort required for such transformations, are also crucial. In the second vowel-printing example above, the ease of transforming the program into a correct one is a key aspect of its structure. Also, the relation of a program to readable English text is a structural aspect that is important to clarity. Later in the course, when we study program typing, we will find that the relation of a correct program to similar syntactically incorrect programs may be important, and that the presence of the syntactically incorrect programs may even improve the structural properties of the correct one.

I have deliberately not attempted a precise definition of program structure, but merely discussed its nature in a general way. In effect, the interesting content of this course is the discovery of which interpretations of the word "structure" are most fruitful in various contexts. In a particular precisely defined context, it is possible to give a precise definition of structure, but we will never inhabit any single precisely defined context long enough to make that worthwhile. If you are mathematically inclined, you might wish to ponder the way in which algebra formalizes the concept of structure (if not, you may ignore this passage with impunity). The signature of a class of algebras determines the operations and relations that are relevant in that class. Isomorphism is structural equivalence, and homomorphism shows that one structure is completely represented by another.
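For the mathematically inclined, here is a small illustration of the algebraic view. The example is not from these notes: it is a standard one, sketched in Python under the assumption that we look at a signature with one binary operation and one constant. Strings with concatenation and the empty string form one algebra of that signature, and integers with addition and zero form another. String length maps the first into the second while respecting both the operation and the constant, so it is a homomorphism. The helper name `preserves_structure` is hypothetical, and checking a handful of samples is a spot check, not a proof.

```python
def preserves_structure(h, samples):
    """Spot-check that h maps (str, +, "") into (int, +, 0) homomorphically."""
    if h("") != 0:                       # the constant must map to the constant
        return False
    for x in samples:
        for y in samples:
            if h(x + y) != h(x) + h(y):  # the operation must be preserved
                return False
    return True


samples = ["", "C", "ML", "Pascal", "Scheme", "Prolog"]

print(preserves_structure(len, samples))                     # True
print(preserves_structure(lambda s: len(s) ** 2, samples))   # False
print(len("Scheme") == len("Prolog"))                        # True
```

Length is a homomorphism but not an isomorphism: "Scheme" and "Prolog" are different strings with the same length, so the map keeps the additive structure while forgetting which string it started from. Squaring the length breaks the structure entirely, since the square of a sum is not the sum of the squares.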

End Friday 6 January