
Basic structural concepts in programming languages

Lecture Notes for Com Sci 221, Programming Languages

Last modified: Fri Jan 27 15:11:19 1995



Begin Friday 13 January

How "high-level" can a programming language be?

Very roughly, progress in programming languages is measured by the use of higher and higher "levels" of languages: languages that allow the structures of programs to correspond more closely to the structures of the problems that they solve, and to be less constrained by the structures of the machines that execute them. In the bad old days, there was an overemphasis on run-time efficiency. That is, we assumed that we knew exactly what a program should do, and a number of ways to make it do so, and the action was in making it do so as quickly as possible, using as little memory as possible. This assumption favored programming languages that followed the structure of the underlying machines very closely. Such programming languages were also relatively easy to design and implement. By now, it is clear that the cost of using computer software is determined overwhelmingly by the cost of testing, debugging, modifying, and otherwise maintaining programs, rather than by the cost of running them, or even of writing them in the first place.

Higher-level programming languages, if well designed and implemented, make the truly costly part of the software business more efficient. There is often some cost in run-time efficiency, but overall it is negligible compared to the savings in human labor. Mature implementations of high-level languages even beat human-coded machine language for run-time efficiency in many cases. It is easy to be intimidated by a person, posing as one of the cognoscenti of computing, who denigrates careful structuring and claims that "practical" code in the "real world" is arcane and full of intricate and abstruse tricks to gain efficiency. In the overwhelming majority of cases, the only "practical" aspect of overly clever code is that it allows the programmer to earn more salary while accomplishing less. Good code is code that explains itself, not code that requires specialized expertise to understand.

It may seem that the proper future of computing will make all programming languages obsolete, by allowing people to merely describe what they want from a program, and applying some sort of "automatic programming" to do the hard work. There is good reason for skepticism. Many years ago, the problem of "automatic programming" was declared to be solved. The solution was assembly language. In my opinion, while the amount of detail worked out automatically by a programming language implementation will become more and more impressive, and the pointlessly difficult or tedious parts of programming will be removed, whatever language is used for communicating with computing machines will be recognized as different from the language in which we ask for an ice cream cone. Conceivably, it will not be called a "programming language," and conceivably it will be expressed in a natural language form, such as English. But the sort of English used to communicate specialized problems precisely will necessarily be a somewhat special and precise sort of English. It is the complexity of the problems that we wish to solve with computers, and the precision that we demand in the solutions, that make programming difficult. As more and more sophisticated problems are solved, and put in the can, we will expect programmers to accomplish more, rather than expecting programming to be easier.

A small example of structure

Perhaps we need a concrete example of the meaning of "structure" in a program, to ground all of the abstract discussion. Recall from the introductory lecture that the structure of a program resides in the relations of the program to other things. Now, consider the problem of summing lines of integers, solved in schematic pseudo-Pascal in two different ways:


while there is another line of input do
  sum := 0;
  while there is more input on this line do
    read(next);
    sum := sum + next
  end while;
  write(sum)
end while

Summing lines of integers, method A
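
For concreteness, the schema of method A might be realized in standard Pascal roughly as follows. This is only a sketch: it assumes that each line of input holds integers separated by blanks, with no trailing blanks after the last integer (standard Pascal's read skips line boundaries while hunting for the next number, which would defeat the eoln test).

program sumlinesA(input, output);
  { Sum the integers on each line of input, writing each }
  { sum as soon as its line has been read.               }
var
  sum, next: integer;
begin
  while not eof do
  begin
    sum := 0;
    while not eoln do
    begin
      read(next);
      sum := sum + next
    end;
    readln;    { move past the end-of-line marker }
    writeln(sum)
  end
end.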

read in all input to a 2-dimensional array;
sum each row to a 1-dimensional array;
write the sums

Summing lines of integers, method B

If we focus on structure as inherent in the text of a program, we might describe method A as "nested while loops, iterating through lines and integers on each line, with output in the outer loop" and method B as "sequential reading, summing, and writing" (of course, a more detailed description of B would indicate nested loops in the reading section). The preceding descriptions tell us what the programs look like, but little of value in using them. More important structural qualities of the two programs include:

method A produces each sum as soon as the corresponding line of input has been read, so it can be used interactively;

method B separates reading, calculation, and writing into sections that can be modified independently, for example to change the input format, or to compute products instead of sums.

That is, the important structural properties of A and B have to do with the modifications that can easily be made to solve different problems, rather than with the textual look of the particular solutions to the given problem. The relational/modificational properties are related to the textual ones, but not completely explained by them. For example, the interactive potential of method A arises, not from the presence of nested while loops per se, but from the fact that the control structure for the calculation follows the structure of the input. The flexibility of method B with respect to different substructures arises, not from the sequential nature per se, but from the fact that input, calculation, and output are separated into independently modifiable subsections (called "modules" in programming languages jargon).
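
To see the separation into modules concretely, here is a corresponding sketch of method B in standard Pascal. The bounds maxlines and maxcols are invented for the illustration, forced by standard Pascal's fixed-size arrays; the three commented sections correspond to the three lines of the schema above, and each can be changed without touching the others.

program sumlinesB(input, output);
const
  maxlines = 100;    { assumed bound on the number of lines   }
  maxcols  = 20;     { assumed bound on the integers per line }
var
  a: array [1..maxlines, 1..maxcols] of integer;
  len: array [1..maxlines] of integer;    { integers on each line }
  sum: array [1..maxlines] of integer;
  nlines, i, j: integer;
begin
  { input module: read all input into a 2-dimensional array }
  nlines := 0;
  while not eof do
  begin
    nlines := nlines + 1;
    len[nlines] := 0;
    while not eoln do
    begin
      len[nlines] := len[nlines] + 1;
      read(a[nlines, len[nlines]])
    end;
    readln
  end;
  { calculation module: sum each row into a 1-dimensional array }
  for i := 1 to nlines do
  begin
    sum[i] := 0;
    for j := 1 to len[i] do
      sum[i] := sum[i] + a[i, j]
  end;
  { output module: write the sums }
  for i := 1 to nlines do
    writeln(sum[i])
end.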

Basic structural terminology regarding programming languages

Dimensions of progress in programming languages

Now, shift your attention back to the progress of programming languages from "low-level" languages that tie the structure of a program to the structure of a machine, toward "high-level" languages that allow the structure of a program to mimic the structure of the problem that it solves. Starting with the machine language of a von Neumann/RAM ideal machine, "higher-level" programming languages have so far tended to progress along two different dimensions:

Execution model
the basic capabilities of the language for manipulating values and for controlling the flow of execution
Organizational model
the structural capabilities of the language for combining and modifying modular pieces of program code

Exercise: Reconcile the 2 dimensions above with the 4 points given in the text under "What Languages Provide" on pages 11-12.

New execution models are often understood as "rebuilding the machine": using the given machine to simulate one with a more powerful set of datatypes, primitive operations, and control structures. For example, an ideal machine might have only integers as data values; only addition, subtraction, and multiplication as primitive operations; and only gotos and conditionals with one-statement then clauses as control structures. Such a machine may be used to implement a language with characters (represented by their numerical positions in a collating sequence), division (implemented as a short program involving addition and subtraction), and loops and nestable conditionals (implemented by combinations of restricted conditionals and gotos).

New organizational models involve facilities for programmers to define new names and associate them with locations, datatypes, operations, values, etc. Certain constructs provide some progress in both dimensions. For example, Algol procedure definitions extend the simple execution model with recursion as a control structure, and also extend the organizational model by allowing programmers to define new operations and give them names, so that they can subsequently be used as if they were primitive operations.
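
To illustrate the "rebuilding" of division mentioned above, here is a hypothetical sketch, written in Pascal but using only the restricted repertoire for the calculation itself (addition, subtraction, gotos, and conditionals with one-statement then clauses). It assumes a nonnegative dividend and a positive divisor.

program quotient(input, output);
label 1, 2;
var
  dividend, divisor, q: integer;
begin
  read(dividend, divisor);    { assumes dividend >= 0, divisor > 0 }
  q := 0;
1:
  if dividend < divisor then goto 2;    { one-statement then clause }
  dividend := dividend - divisor;
  q := q + 1;
  goto 1;
2:
  writeln(q)    { q is the quotient }
end.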

Compilers and interpreters

In order for a newly designed programming language to be used, it must be somehow implemented in terms of previously implemented languages. Every chain of language implementations must lead eventually to a machine language, which is implemented by the hardware (wiring and physical components) in a machine, rather than by software. There are two basic types of programming language implementation in common use today:

compilers
translate programs from a new language (the source language) into a previously implemented language (the object, or target, language)
interpreters
read programs in the source language and execute their intended effects directly
In principle, these categories are ambiguous, but in practice the vast majority of implementations are naturally classified as compilers (with little bits of interpretation included) or interpreters (with little bits of compiling included). In the world of natural languages, a compiler is analogous to a translator (some people call programming language compilers translators). A programming language interpreter does not correspond very directly to a normal activity in natural languages. Imagine someone who understands English using detailed instructions, written in English, to allow them to cook according to a recipe written in French. A fragment of the instructions might say, "if the next word in the French recipe is 'mélangez', then stir with a spoon." That's sort of what's in a programming language "interpreter."
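
To make the analogy concrete, here is a minimal sketch of an interpreter's central fetch-and-dispatch loop, for a hypothetical one-line command language invented for this illustration: '+' and '-' apply the next number to a running total, and 'p' prints the total (so the input +5-3p prints 2). It assumes the input is well formed.

program tinyinterp(input, output);
var
  cmd: char;
  total, n: integer;
begin
  total := 0;
  while not eoln do
  begin
    read(cmd);          { fetch the next "instruction" }
    case cmd of         { dispatch on its meaning      }
      '+': begin read(n); total := total + n end;
      '-': begin read(n); total := total - n end;
      'p': writeln(total)
    end
  end
end.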

Generally, a compiler invests a certain effort in the initial translation of a program, in order to be able to execute the program repeatedly. Execution requires only the object code: no further reference to the source is needed. When a program is to be executed repeatedly, and efficiency is important, a compiler is usually used for the implementation. Interpreters are sometimes faster for a single execution of a program (although a large number of iterations of a loop body may easily throw the advantage to a compiler, even for a single execution). Interpreters are usually easier to write than compilers, and it is usually easier to provide debuggers and other useful programming tools with interpreters. All of these general rules have exceptions, though. So far in the history of computing, compilers are the dominant method for producing commercial software, but interpreters are used for most experimental implementations of radically new programming languages.

When a program is compiled and executed, the total work and resources are often analyzed into

compile-time
things that happen only during the compilation of the object code from the source
run-time
things that happen each time the program (or a relevant fragment of the program) is executed
The word "time" in both of these phrases refers to the different regions in time during which the two activities go on, not to time as a resource to be conserved. So, it is perfectly sensible to discuss "run-time space consumption" vs. "compile-time space consumption." Generally, compilers try to use cleverness at compile time to avoid extra work at run time.
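
A small illustration (with invented names): given the declarations below, a compiler can perform "constant folding," evaluating rows * cols + 1 once at compile time, so that the object code simply stores 1921 and performs no multiplication at run time.

program folding(output);
const
  rows = 24;
  cols = 80;
var
  cells: integer;
begin
  cells := rows * cols + 1;    { folded at compile time to  cells := 1921 }
  writeln(cells)
end.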

End Friday 13 January
%<----------------------------------------------
Begin Monday 16 January

Qualities of programming languages and their implementations

Here comes another ridiculously long list. These lists are not things to try to memorize. They have no pretensions at being complete or definitive. But, it is important to think briefly and incisively about the items on the lists, to get an idea of the broad scope of the issues that affect programming languages. So, here is a list of a few of the qualities that affect the value of particular programming languages and their implementations:

expressive power
the ease with which the structures of problems may be represented in the language
readability
the ease with which people can understand and check programs
modifiability
the ease with which programs can be adapted to changed requirements
efficiency of execution
the time and space consumed by programs at run time
efficiency of implementation
the ease of building a good compiler or interpreter for the language
portability
the ease of moving programs and implementations between machines
availability of tools
the quality of the debuggers, editors, and libraries that accompany an implementation

Whew! This is probably the last silly list for this course. We cannot possibly critique even a tiny fraction of the relevant issues in judging the quality of a programming language design or implementation. But, it is important to think about the broad scope of those qualities, in order to focus attention on particular ones. Even the research professionals in this area have a very difficult time focussing their thinking. When confronted with the whole package of a programming language implementation, it is hard to avoid judging the entire quality of the idea from one or two often superficial impressions in the areas that affect our first naive uses of the language.

Such a superficially holistic approach is often a valid way to choose a particular language to use right now for a short-term task. But, it is no way whatsoever to try to understand the underlying structural concepts that can lead to fundamental progress in programming languages. For example, when you use Prolog you may observe that many aspects of the notation in which programs are written are incredibly awkward. That is a true, but uninteresting, observation about Prolog. The thing to learn from Prolog is the potential power of its radically different basic computational mechanism. You will understand how to think about programming languages when you are able to critique separately and independently those parts of design and implementation that are in fact separate and independent. Research progress has tended to remove dependencies that were long assumed. For example, when Algol was first promulgated, recursive procedures were seen as a cute programming tool for quick and inefficient programming, but inherently slow to execute, and never suitable when speed is important. Since then, recursion has in fact been made quite fast, and in many contexts may be used quite casually.

In this course, we will focus on the suitability of programming languages to compute particular functions from input to output with modest efficiency. We will occasionally consider interactive tasks (where a particular interleaving of input and output is required), but not real-time problems (where the precise timing of inputs and outputs is crucial). We will take a particular interest in the structural issues that make programs easy to write, understand, and modify. There is nothing fundamentally special about these characteristics, but they are where the most interesting progress has been made in programming language design so far.