\chapter{Equational Logic Programming}{Michael J.~O'Donnell}
\label{cha:equational-logic-programming}
\newcounter{eqnex}\newcounter{oveqnex}\newcounter{seqeqnex}
\newcounter{opteqnex}\newcounter{trickyeqnex}
\vspace{2ex}
This text is a preprint of Chapter 2 in the {\em Handbook of Logic in
Artificial Intelligence and Logic Programming,} volume 5 on {\em Logic
Programming,} Dov Gabbay editor, Oxford University Press. Donald
W.~Loveland, Gopalan Nadathur, and David J.~Sherman served as official
readers, and provided useful feedback on drafts of the chapter.
\section[Introduction]{Introduction to Equational Logic Programming}
\label{sec:intro}
\subsection{Survey of Prerequisites}
\label{sec:prerequisites}
\subind{equational logic programming|(}\subind{logic programming!equational|(}
Sections~\ref{sec:equational-logic}
and~\ref{sec:functional-programming} of the chapter `Introduction:
Logic and Logic Programming Languages' are crucial prerequisites to
this chapter. I summarize their relevance below, but do not repeat
their content.
Logic programming languages in general are those that compute by
deriving semantic consequences\subind{semantic consequence}
\subind{consequence!semantic}
of given formulae in order to answer questions. In equational logic
programming languages, the formulae are all equations expressing
postulated properties of certain functions, and the questions ask for
equivalent normal forms\subind{normal form} for given terms.
Section~\ref{sec:equational-logic} of the `Introduction \ldots'
chapter gives definitions of the models of equational logic, the
semantic consequence relation $$\mathbold{T}\Emodels(t_1\formeq t_2)$$
(\mbox{$t_1\formeq t_2$} is a semantic consequence of the set
$\mathbold{T}$ of equations, see
Definition~\ref{def:equation-semantic-system}), and the question
answering relation $$(\normalize t_1,\ldots,t_i\quantsep
t)\answers_{\formeq}(t\formeq s)$$\notind{$\normalize$ (normal form
question)} (\mbox{$t\formeq s$} asserts the equality of $t$ to the
normal form $s$, which contains no instances of $t_1,\ldots,t_i$, see
Definition~\ref{def:equational-query-system}). Since this chapter is
entirely about Equational Logic, we drop the subscripts and write
$\models$\notind{$\models$ (semantic entailment relation)} for
$\Emodels$\notind{$\Emodels$ (equational semantic entailment
relation)} and $\answers$\notind{$\answers$ (answer relation)} for
$\answers_{\formeq}$\notind{$\answers_{\formeq}$ (equational answer
relation)}. The composed relation $$(\normalize
t_1,\ldots,t_i\quantsep t)\answers\mathbold{T}\models(t\formeq s)$$
(\mbox{$t\formeq s$} is a semantically correct answer to the
question $(\normalize t_1,\ldots,t_i\quantsep t)$ for knowledge
$\mathbold{T}$, see Definition~\ref{def:semantically-correct-answer})
means that $s$ is a normal form---a term containing no instances of
$t_1,\ldots,t_i$---whose equality to $t$ is a semantic consequence of
the equations in $\mathbold{T}$. Equational logic programming
languages in use today all take sets $\mathbold{T}$ of equations,
prohibited forms $t_1,\ldots,t_i$, and terms $t$ to normalize, and
they compute normal forms $s$ satisfying the relation above.
Section~\ref{sec:functional-programming} of the `Introduction \ldots'
chapter explains how different equational languages variously
determine $\mathbold{T}$, $t_1,\ldots,t_i$, and $t$ from the language
design, the program being executed, and the input. An alternate style
of equational logic programming, using questions of the form
\mbox{$(\solve x_1,\ldots,x_i\quantsep t_1\formeq t_2)$}
\notind{$\solve$ (equation solving question)}
that ask for substitutions for $x_1,\ldots,x_i$
solving the equation \mbox{$(t_1\formeq t_2)$}, is very attractive for
its expressive power, but much harder to implement efficiently (see
Section~\ref{sec:equation-solving}).
There is a lot of terminological confusion about Equational Logic
Programming. First, many in the Prolog community use `logic' to mean
the First-Order Predicate Calculus (FOPC), while I stick closer to the
dictionary meaning of logic\subind{logic!dictionary definition}, in
which FOPC is one of an infinity of possible logical systems. Those
who identify logic with FOPC often use the phrase `equational logic
programming' to mean some sort of extension of Prolog using equations,
such as logic programming in FOPC with equality.\subind{equational
logic programming!as extended Prolog} In this chapter, `equational
logic programming' means the logic programming of pure equational
logic, as described in the chapter `Introduction: Logic and Logic
Programming Languages.'
A second source of confusion is that many equational logic programming
languages have been invented under different labels. Lisp\subind{Lisp}
\cite{McCarthy}\auind{McCarthy, J.}, APL \cite{Iverson}\auind{Iverson,
K. E.}, Red languages\subind{Red languages} \cite{Backus-red}\auind{Backus, J.},
Functional Programming languages\subind{functional programming}
\cite{Backus-turing,Hudak-Peyton-Jones-Wadler}, \auind{Backus,
J.}\auind{Hudak, P.}\auind{Peyton Jones, S. L.} \auind{Wadler, P.} many
dataflow\subind{dataflow} languages
\cite{Wadge-Ashcroft,Pingali-Arvind-1,Pingali-Arvind-2},
\auind{Wadge, W. W.}\auind{Ashcroft, E. A.}
\auind{Pingali, K.}\auind{Arvind} and languages for algebraic
specification of abstract data\-types\subind{algebraic
specification}\subind{abstract datatype}\subind{datatype!abstract}
\subind{specification!algebraic}
\cite{Futatsugi-Goguen-Jouannaud-Meseguer,Guttag-Horning,Wand}
\auind{Futatsugi, K.}\auind{Goguen, J. A.}\auind{Jouannaud, J.-P.}
\auind{Meseguer, J.}\auind{Guttag, J. V.}\auind{Horning, J. J.}
\auind{Wand, M.}
are all forms of equational logic programming
languages, although they are seldom referred to as such. This chapter
focusses on a generic notion of equational logic programming, rather
than surveying particular languages.
\subsection{Motivation for Programming with Equations}
\label{sec:motivation}
\subind{equational logic programming!vs.\ functional|(}
\subind{functional programming!vs.\ equational|(} From a programmer's
point of view, an equational logic programming language is the same
thing as a functional programming\subind{functional programming}
language \cite{Backus-turing}\auind{Backus, J.}. The advantages of
functional programming languages are discussed in
\cite{Hudak,Bird-Wadler,Field-Harrison}---equational
\auind{Bird, R.}\auind{Wadler, P.}\auind{Field, A. J.}\auind{Harrison, P. G.}
logic programming languages offer essentially the same advantages to
the programmer. Functional programming and equational logic
programming are different views of programming, which provide
different ways of designing and describing a language, but they yield
essentially the same class of possible languages. The different styles
of design and description, while they allow the same range of
possibilities, influence the sense of naturalness of different
languages, and therefore the relative importance of certain features
to the designer and implementer. The most important impact of the
equational logic programming view on language design is the strong
motivation that it gives to implement {\em lazy,} or {\em
demand-driven,} computation. \subind{evaluation!lazy vs.\
strict}\subind{lazy evaluation!vs.\ strict} \subind{strict
evaluation!vs.\ lazy}
In the conventional view of functional programming, computation is the
{\em evaluation\/}\subind{evaluation} of an input term in a {\em
unique\/} model associated with the programming language. This view
makes it very natural to evaluate a term of the form
$f(s_1,\ldots,s_n)$ by first evaluating all of the arguments
$s_i$, and then applying the function denoted by $f$ to the
values of the arguments. If the attempt to evaluate one of the
arguments leads to infinite computation, then the value of that
argument in the model is said to be an object called
`undefined'\subind{undefined!adjective vs.\ noun} (the word is used
here as a {\em noun,} although the dictionary recognizes it only as an
adjective), and typically denoted by the symbol $\bot$.\notind{$\bot$
(undefined value)} But, since the value $\bot$ is indicated by the
behavior of infinite computation, there is no chance to actually apply
the function denoted by $f$ to it, so that every function is forced to
map $\bot$ to $\bot$. Such functions are called {\em strict\/}
functions.\subind{strict function}
Early functional programming languages required all primitive
functions to be strict, except for the conditional function $\cond$.
The normal way to evaluate a term of the form
$\cond(s,t,u)$ is to evaluate $s$, then use its
value to determine which of $t$ or $u$ to evaluate, omitting
the other. The function denoted by $\cond$ is thus not strict, since
for example the value of $\cond(\true,0,\bot)$ is $0$ rather than
$\bot$. Only Backus\auind{Backus, J.} seems to have been annoyed by
the inconsistency between the nonstrictness of the conditional
function and the strictness of all other primitives. He proposed a
strict conditional,\subind{conditional!strict vs.\ nonstrict}
recovering the selective behavior of the nonstrict conditional through
a higher-order coding\subind{higher-order programming} trick
\cite{Backus-turing}. In effect, he took advantage of the nearly
universal unconscious acceptance of a nonstrict interpretation of
function {\em application,} even when the function to be applied is
strict.\subind{function application!as nonstrict operation}
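Backus's trick can be made concrete with a small sketch. The following Python fragment (an illustration under assumed names, not Backus's FP notation) passes the two branches to a strict conditional as zero-argument functions; since application of the selected function is deferred until after the selection, the rejected branch is never executed.

```python
def strict_cond(b, f, g):
    # A strict conditional: both arguments f and g are fully evaluated
    # (they are just function values), but neither branch *body* runs
    # until the chosen function is applied.
    return f if b else g

def safe_inverse(x):
    # Selecting between two zero-argument functions and then applying
    # the chosen one recovers nonstrict behavior: the rejected branch
    # (here a division by zero) is never executed.
    return strict_cond(x != 0, lambda: 1.0 / x, lambda: float("inf"))()

print(safe_inverse(4))   # 0.25
print(safe_inverse(0))   # inf
```

The nonstrictness here lives entirely in function application: evaluating a `lambda` expression builds a closure without running its body.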
In the equational logic programming view, computation is the derivation
of an equivalent normal form for an input term using the information
given by a set of equations describing the symbols of the programming
language. The equivalence of input to output holds in {\em all\/} of
the infinitely many models of those equations. This view makes it very
natural to apply equations involving $f$ to derive an equivalent form
for $f(s_1,\ldots,s_n)$ at any time, possibly before all
possible derivation has been performed on the arguments $s_i$. The
natural desire for {\em completeness\/}\subind{completeness!of
programming system}\subind{programming system!completeness} of an
implementation requires that infinite computation be avoided whenever
possible. Notice that the equational logic programming view does not
{\em assign\/} a value $\bot$\notind{$\bot$ (undefined value)} denoting
`undefined' (the noun) to a term with infinite computational
behavior.\subind{undefined!adjective vs.\ noun} In fact, in each
individual model all functions are total. Rather, we might observe that
a term is {\em undefined\/} (the word is now an adjective, as approved
by the dictionary) if there is no equivalent term suitable for output,
although each model of the given equations assigns it some value. So,
equational logic programming leads naturally to computational behavior
that is not strict---in fact, a logically complete implementation of
equational logic programming must make functions as unstrict as
possible. The preference for nonstrictness comes from regarding
undefinedness as our inability to discover the value of a function,
rather than the inherent lack of a semantic value.
The contrast between strict and nonstrict treatments of functions is
best understood by comparing the conventional implementation of $\cond$,
$\true$ and $\false$ to that of $\cons$, $\car$ and $\cdr$ in
Lisp\subind{Lisp!strict vs.\ lazy|(}.\subind{conditional!vs.\ selection|(}
\begin{example}\notind{$\cons$ (list constructor)}
\notind{$\true$}\notind{$\false$}
\notind{$\car$ (first projection)}
\notind{$\cdr$ (second projection)}
\label{exa:cond-vs-cons}
The following equations define the relationship between $\cond$,
$\true$, and $\false$: $$\mathbold{T}_{\cond}=\{(\cond(\true,x,y)\formeq
x),\;(\cond(\false,x,y)\formeq y)\}$$ Similarly the following
equations define the relationship between $\cons$, $\car$, and $\cdr$:
$$\mathbold{T}_{\cons}=\{(\car(\cons(x,y))\formeq
x),\;(\cdr(\cons(x,y))\formeq y)\}$$ These equations were given,
without explicit restriction, in the earliest published definition of
Lisp \cite{McCarthy}\auind{McCarthy, J.}.
\end{example}
Notice the formal similarity between $\mathbold{T}_{\cond}$ and
$\mathbold{T}_{\cons}$ in Example~\ref{exa:cond-vs-cons} above. In
both cases, two equations provide a way to select one of the two
subterms denoted by the variables $x$ and $y$. In
$\mathbold{T}_{\cond}$, the selection is determined by the first
argument to $\cond$, in $\mathbold{T}_{\cons}$ it is determined by the
function symbol applied to the term headed by $\cons$. Yet in all
early Lisp implementations $\cons$ is evaluated strictly, while
$\cond$ is not. The equation $(\car(\cons(0,s))\formeq 0)$ is a
logical consequence of $\mathbold{T}_{\cons}$, even when $s$ leads to
infinite computation, so a complete implementation of equational logic
programming must {\em not\/} treat $\cons$ strictly.
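The nonstrict treatment of $\cons$ can be sketched directly. In the following Python fragment (an illustration, not an actual Lisp implementation; the thunk encoding is an assumption of the sketch), $\cons$ stores unevaluated zero-argument functions, so $\car(\cons(0,s))$ yields $0$ even when $s$ denotes a divergent computation.

```python
def cons(x, y):
    # Nonstrict cons: store the (possibly unevaluated) thunks x and y
    # without forcing either of them.
    return ("cons", x, y)

def car(p):
    # car(cons(x, y)) = x: force only the first component.
    return p[1]()

def cdr(p):
    # cdr(cons(x, y)) = y: force only the second component.
    return p[2]()

def diverge():
    # A computation that would never terminate if forced.
    while True:
        pass

# car(cons(0, s)) = 0 even though s diverges, because the lazy cons
# never forces its second argument.
print(car(cons(lambda: 0, diverge)))   # 0
```

A strict $\cons$ would force both thunks on construction, and the call above would loop forever instead of printing $0$.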
In the Lisp and functional programming communities, nonstrict
evaluation of functions other than the conditional is called {\em lazy
evaluation.}\subind{lazy evaluation} The power of lazy evaluation as a
programming tool is discussed in
\cite{Friedman-Wise,Henderson-Morris,Henderson,Hudak,Bird-Wadler,Field-Harrison,O'Donnell-MIT}.
\auind{Friedman, D.}\auind{Wise, D.}\auind{Henderson, P.}
\auind{Morris, J. H.}\auind{Bird, R.}\auind{Wadler, P.}
\auind{Field, A. J.}\auind{Harrison, P. G.}
Lazy evaluation is {\em demand-driven\/}---computation is performed
only as required to satisfy demands for output. So, the programmer may
define large, and even infinite,\subind{infinite data
structures}\subind{data structures!infinite} data structures as
intermediate values, and depend on the language implementation to
compute only the relevant parts of those structures. In particular,
lazily computed lists behave as {\em streams\/}
\cite{Karlsson,Hudak-Sundaresh},
\auind{Karlsson, K.}\auind{Hudak, P.}\auind{Sundaresh, R. S.}
allowing a straightforward encoding of pipelined coroutines in a
functional style.
\subind{Lisp!strict vs.\ lazy|)}\subind{conditional!vs.\ selection|)}
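Python generators give a rough analogue of lazily computed lists behaving as streams (only an analogue: generators are stateful, unlike pure lazy lists). The sketch below builds an infinite stream and a pipelined consumer; only the demanded prefix is ever computed.

```python
from itertools import islice

def naturals():
    # An "infinite list" of naturals, produced on demand.
    n = 0
    while True:
        yield n
        n += 1

def squares(xs):
    # A pipelined coroutine: consumes one stream, produces another.
    for x in xs:
        yield x * x

# Only the demanded prefix of the infinite structure is computed.
print(list(islice(squares(naturals()), 5)))   # [0, 1, 4, 9, 16]
```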
Many modern implementations of functional programming languages offer
some degree of lazy evaluation, and a few are now uniformly lazy.
But, in the functional programming view, lazy evaluation is an
optional added feature to make programming languages more powerful.
The basic denotational semantic approach to functional programming
makes strictness very natural to describe, while denotational
semantics for lazy evaluation seems to require rather sophisticated
use of domain theory to construct models with special values
representing all of the relevant nonterminating and
partially-terminating behaviors of terms \cite{Winksel}\auind{Winskel,
G.}. In the equational logic programming view, lazy evaluation is
required for logical completeness, and strict evaluation is an
arbitrary restriction on derivations that prevents certain answers
from being found.
\subind{error!value vs.\ quality|(}
The functional and equational views also diverge in their treatments
of certain terms that are viewed as pathological. From the functional
programming view, pathological terms seem to require specialized
logical techniques treating errors as values, and even new types of
models called {\em error algebras\/}\subind{error
algebra}\subind{algebra!error} \cite{Goguen}\auind{Goguen, J. A.}. For
example, in a language with stacks, the term \mbox{$\pop(\sempty)$} is
generally given a value which is a token denoting the erroneous
attempt to pop an empty stack. Given a set of equations, equational
logic programming provides a conceptual framework, based on
well-understood traditional concepts from mathematical logic, for
prescribing completely the computational behavior of terms. The
judgement that a particular term is pathological is left to the
consumer of that answer, which might be a human reader or another
program. For example, the term \mbox{$\pop(\sempty)$} need not be
evaluated to an error token: it may be output as a normal form, and
easily recognized as a pathological case by the consumer. Or, an
explicit equation \mbox{$\pop(\sempty)\formeq e$} may be added to the
program, where $e$ gives as much or as little detailed information
about the particular error as desired.
\subind{error!value vs.\ quality|)}
So, for the programmer there is nothing to choose between lazy
functional programming and equational logic programming---these are
two styles for describing the same programming languages, rather than
two different classes of programming languages. To the language
designer or implementor, the functional programming view provides a
connection to a large body of previous work, and offers some
sophisticated tools for the thorough description of the processing of
erroneous programs and the use of varying degrees of strictness or
laziness. The equational logic programming view offers a deeper
explanation of the logical content of computations, a way of defining
correctness of the computation of answers independently of the
classification of programs as correct or erroneous, and a strong
motivation for uniformly lazy evaluation. It also connects
equational/functional programming to other sorts of logic programming
in a coherent way, which may prove useful to future designs that
integrate equational/functional programming with other styles.
\subind{equational logic programming!vs.\ functional|)}
\subind{functional programming!vs.\ equational|)}
\subsection{Outline of the Chapter}
\label{sec:outline}
The next part of this chapter is primarily concerned with problems in
the implementation of equational logic programming and some
interesting variants of it. Those problems arise at four very
different levels of abstraction---logic, strategy, algorithm, and
code. At the level of pure logic, Section~\ref{sec:eq-proof-system}
discusses two different formal systems of proof for Equational
Logic---inferential proof and term rewriting proof---and argues that
the latter is logically weaker in general, but more likely to provide
efficient computation for typical equational programs. The {\em
confluence property\/} of sets of equations is introduced, and shown
to be a useful way of guaranteeing that term rewriting proof can
succeed. Next, Section~\ref{sec:proof-strategies} treats high-level
strategic questions in the efficient search for a term rewriting proof
to answer a given question. The crucial problem is to choose the next
rewriting step out of a number of possibilities, so as to guarantee
that all correct answers are found, and to avoid unnecessary steps.
Then, Section~\ref{sec:algorithms} discusses the design of efficient
algorithms and data structures for finding and choosing rewriting
steps, and for representing the results of rewriting.
Section~\ref{sec:code} contains a brief description of the
conventional machine code that a compiler can generate based on these
algorithms and data structures. Section~\ref{sec:parallel} discusses
briefly some of the problems involved in parallel implementation of
equational logic programming. Finally, Section~\ref{sec:extensions}
treats several possible extensions to the functionality of equational
logic programming and the problems that arise in their semantics and
implementation.
\section[Proof Systems]{Proof Systems for Equational Logic}
\label{sec:eq-proof-system}
\subind{proof system!equational|(}\subind{equation!proof system|(}
The basic idea in implementations of equational logic programming is to
search for a proof that provides a correct answer to a given question.
The basic idea behind proofs in Equational Logic is that the equation
\mbox{$t_1\formeq t_2$} allows $t_1$ and $t_2$ to be used
interchangeably in other formulae. As in
Definition~\ref{def:proof-system}, of the chapter `Introduction: Logic
and Logic Programming Languages,' \mbox{$\mathbold{T}\hypos D\proves
F$} means that $D$ is a correct proof of the formula $F$ from
hypotheses in $\mathbold{T}$. \mbox{$\mathbold{T}\hyproves F$} means
that there exists a proof of $F$ from hypotheses in $\mathbold{T}$. In
this chapter, subscripts on the generic symbols $\hypos\:\proves$ and
$\hyproves$ are omitted whenever the particular proof system is clear
from the context.
In Sections~\ref{sec:inferential-proof} and~\ref{sec:rewrite-proof},
we consider two different styles of equational proof. Inferential
proofs derive equations step by step from other equations. Term
rewriting proofs use equations to transform a given term into a
provably equivalent term by substituting equals for equals.
\subsection{Inferential Proofs}
\label{sec:inferential-proof}
\subind{inferential proof!equational|(}\subind{equation!inferential proof|(}
In order to explore a variety of approaches to proving equations, we
first define generic concepts of rules of inference and proofs using
rules, and then consider the power of various sets of rules.
\subind{inference rule!equational}\subind{rule of inference!equational}
\subind{equation!inference rule}
\begin{definition}\label{def:inference-rule}
Let\/ the set $\mathbold{V}$ of variables, the sets\/ $\mbox{\bf
Fun}_i$ of $i$-ary function symbols, and the set $\mathbold{T}_P$ of
terms, be the same as in Definition~\ref{def:FOPC-formula} of the
`Introduction \ldots' chapter,
Section~\ref{sec:examples-logic-programming}, and let the set of\/
{\em equational formulae,} or simply\/ {\em equations,} be
\mbox{$\mathbold{F}_{\formeq}=\{t_1\formeq t_2\setsep
t_1,t_2\in\mathbold{T}_P\}$}, as in
Definition~\ref{def:equation-formula} in
Section~\ref{sec:equational-logic} of the chapter `Introduction: Logic
and Logic Programming Languages.'
An\/ {\em equational rule of inference\/} is a binary relation $${\cal
R}\subseteq\powerset{{\boldscript{F}_{\formeq}}}\times\mathbold{F}_{\formeq}$$ When\/
\mbox{$\mathbold{T}\mathrel{{\cal R}}F$}, we say that $F$ {\em follows
from\/} $\mathbold{T}$ by rule\/ ${\cal R}$. Members of $\mathbold{T}$
are called\/ {\em hypotheses\/} to the application of the rule, and
$F$ is the\/ {\em conclusion.} When \mbox{$\emptyset\mathrel{{\cal
R}}F$}, we call $F$ a\/ {\em postulate.} (It is popular now to call a
postulated formula $F$ an\/ {\em axiom,} although the dictionary says
that an axiom must be self-evident, not just
postulated.)\subind{postulate!vs.\ axiom}\subind{axiom!vs.\ postulate}
Rules of inference are usually presented in the form
$$\derive{H_1,\ldots,H_m}{C}$$ where $H_1,\ldots,H_m$ are schematic
descriptions of the hypotheses, and $C$ is a schematic description of
the conclusion of an arbitrary application of the rule. Notice that
the union of rules of inference is itself a rule.
The set of\/ {\em inferential equational proofs\/} is
\mbox{$\eqproofs=\mathbold{F}_{\formeq}^+$}, the set of nonempty
finite sequences of equations. Given a rule of inference ${\cal R}$,
the proof relation\/\notind{$\hypos\:\proves_{\cal R}$ (proof by
rule)} $$\hypos\:\proves_{\cal R}\subseteq
\powerset{{\boldscript{F}}_{\formeq}}\times
\eqproofs\times\mathbold{F}_{\formeq}$$
is defined by
\begin{center}
$\mathbold{T}\hypos\langle F_0,\ldots,F_m \rangle\proves_{\cal R} F$ if and
only if\/ \mbox{$F_m=F$} and,\\ for all\/ $i\leq m$, one of the
following cases holds:
\end{center}
\begin{enumerate}
\item\label{enu:eq-hypothesis} \mbox{$F_i\in\mathbold{T}$}
\item\label{enu:rule-application} There exist\/
\mbox{$j_1,\ldots,j_n<i$} such that\/ $F_i$ follows from\/
\mbox{$\{F_{j_1},\ldots,F_{j_n}\}$} by rule\/ ${\cal R}$
\end{enumerate}
\end{definition}
\begin{proposition}
\label{pro:rewrite-inference}
Every term rewriting proof may be translated into an inferential proof
of the same equation: if\/ \mbox{$\mathbold{T}\hypos\langle
u_0,\ldots,u_n\rangle\trproves(s\formeq t)$}, then there is a\/ $D$
such that\/ \mbox{$\mathbold{T}\hypos D\infproves(s\formeq t)$}.
The proof is by induction on\/ $n$. When\/ \mbox{$n=0$}, we have\/
\mbox{$s=t$}, and a single application of the Reflexive rule suffices.
When\/ \mbox{$n>0$}, since a nonempty prefix of a term
rewriting proof is also a term rewriting proof, we have\/
\mbox{$\mathbold{T}\hypos\langle
u_0,\ldots,u_{n-1}\rangle\trproves(s\formeq u_{n-1})$}. By the
induction hypothesis, there is a\/ $D$ such that \mbox{$\mathbold{T}\hypos
D\infproves(s\formeq u_{n-1})$}. It is easy to extend\/ $D$ to\/
$D^\prime$ so that\/ \mbox{$\mathbold{T}\hypos
D^\prime\proves(s\formeq t)$}, by adding the following steps:
\begin{itemize}
\item the appropriate equation from\/ $\mathbold{T}$;
\item a sequence of applications of the Instantiation rule to
produce the appropriate instance of the equation above;
\item one application of the Substitution rule to produce
\mbox{$u_{n-1}\formeq t$};
\item one application of the Transitive rule to produce
\mbox{$s\formeq t$}.
\end{itemize}
Since inferential proof is sound, it follows that term rewriting proof
is also sound.
\end{proposition}
\begin{example}
\label{exa:rewrite-inference}
Let\/ \mbox{$\mathbold{T}=\{f(a,f(x,y))\formeq f(y,x),\;g(x)\formeq
x\}$}.
$$\langle f(g(a),f(g(b),c)),\;f(a,f(g(b),c)),\;f(c,g(b)),\;f(c,b)\rangle$$
is a term rewriting proof of
$$f(g(a),f(g(b),c))\formeq f(c,b)$$
from\/ $\mathbold{T}$. The corresponding inferential proof from the
induction in Proposition~\ref{pro:rewrite-inference} is given below.
Line numbers are added on the left, and rules cited on the right, for
clarity: formally the proof is just the sequence of equations. The
key occurrences of the terms in the term rewriting sequence above are
boxed to show the correspondence.
$$\begin{array}{rrrcllr}
& \langle \\
1 & &f(g(a),f(g(b),c))&\formeq& \fbox{$f(g(a),f(g(b),c))$}&, &
{\em Reflexive} \\
2 & &g(x)&\formeq& x&, & {\em Hypothesis} \\
3 & &g(a)&\formeq& a&, & {\em Instantiation,}2 \\
4 & &f(g(a),f(g(b),c))&\formeq& f(a,f(g(b),c))&, & {\em Substitution,}3 \\
5 & &f(g(a),f(g(b),c))&\formeq& \fbox{$f(a,f(g(b),c))$}&, &
{\em Transitive,}1,4 \\
6 & &f(a,f(x,y))&\formeq& f(y,x)&, & {\em
Hypothesis} \\
7 & &f(a,f(g(b),y))&\formeq& f(y,g(b))&, & {\em Instantiation,}6 \\
8 & &f(a,f(g(b),c))&\formeq& f(c,g(b))&, & {\em Instantiation,}7 \\
9 & &f(a,f(g(b),c))&\formeq& f(c,g(b))&, & {\em Substitution,}8 \\
10 & &f(g(a),f(g(b),c))&\formeq& \fbox{$f(c,g(b))$}&, &
{\em Transitive,}5,9 \\
11 & &g(x)&\formeq& x&, & {\em Hypothesis} \\
12 & &g(b)&\formeq& b&, & {\em Instantiation,}11 \\
13 & &f(c,g(b))&\formeq& f(c,b)&, & {\em Substitution,}12 \\
14 & &f(g(a),f(g(b),c))&\formeq& \fbox{$f(c,b)$}& & {\em Transitive,}10,13
\\
& & && & \rangle
\end{array}$$
Steps 5, 9, and 11 above are redundant (they reproduce the results
already obtained in steps 4, 8, and 2), but the systematic procedure in
the induction of Proposition~\ref{pro:rewrite-inference} includes them for
uniformity.
\end{example}
So, a term rewriting proof is a convenient and natural shorthand
for an inferential proof.
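The rewriting in Example~\ref{exa:rewrite-inference} can be carried out mechanically. The following Python sketch is an illustration only: terms are encoded as nested tuples, the variable names are hard-coded in \verb|VARS|, the function names are assumptions of the sketch, and an outermost (root-first, left-to-right) strategy is assumed. It applies the two equations of the example left to right and records the term rewriting proof as the sequence of intermediate terms.

```python
# Terms are nested tuples like ("f", t1, t2); strings are variables
# (if listed in VARS) or constants.
VARS = {"x", "y"}

def match(pat, term, sub):
    # Try to extend substitution `sub` so that applying it to `pat`
    # yields exactly `term`; return None if no such extension exists.
    if isinstance(pat, str) and pat in VARS:
        if pat in sub:
            return sub if sub[pat] == term else None
        return {**sub, pat: term}
    if isinstance(pat, str) or isinstance(term, str):
        return sub if pat == term else None
    if pat[0] != term[0] or len(pat) != len(term):
        return None
    for p, t in zip(pat[1:], term[1:]):
        sub = match(p, t, sub)
        if sub is None:
            return None
    return sub

def subst(term, sub):
    # Apply a substitution to a term.
    if isinstance(term, str):
        return sub.get(term, term)
    return (term[0],) + tuple(subst(t, sub) for t in term[1:])

def rewrite(term, rules):
    # One outermost (root-first, then left-to-right) rewriting step;
    # None if `term` is a normal form.
    for lhs, rhs in rules:
        sub = match(lhs, term, {})
        if sub is not None:
            return subst(rhs, sub)
    if not isinstance(term, str):
        for i in range(1, len(term)):
            new = rewrite(term[i], rules)
            if new is not None:
                return term[:i] + (new,) + term[i + 1:]
    return None

def normalize(term, rules):
    # Record the whole term rewriting proof: the sequence of terms.
    seq = [term]
    while (nxt := rewrite(seq[-1], rules)) is not None:
        seq.append(nxt)
    return seq

rules = [(("f", "a", ("f", "x", "y")), ("f", "y", "x")),
         (("g", "x"), "x")]
start = ("f", ("g", "a"), ("f", ("g", "b"), "c"))
for step in normalize(start, rules):
    print(step)   # four terms, ending with ('f', 'c', 'b')
```

Run on the example, this produces exactly the four-term sequence shown above, ending in the normal form $f(c,b)$.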
Not every inferential proof corresponds to a term rewriting proof.
First, the proofs corresponding to term rewriting sequences do not use
the Symmetric rule. This represents a serious incompleteness in term
rewriting proof. Section~\ref{sec:confluence} shows how restrictions
on equational hypotheses can avoid the need for the Symmetric rule,
and render term rewriting complete for answering certain normal form
questions.
\begin{example}
\label{exa:no-symmetry}
\subind{symmetric rule!missing in term rewriting}
Let\/ \mbox{$\mathbold{T}=\{a\formeq b,\;c\formeq b,\;c\formeq d\}$}.
\mbox{$\mathbold{T}\models(a\formeq d)$}, and\/
\mbox{$\mathbold{T}\infhyproves(a\formeq d)$}, by one application of
the Symmetric rule and two applications of the Transitive rule. But,
there is no term rewriting sequence from\/ $a$ to\/ $d$, nor from\/
$d$ to\/ $a$, nor from\/ $a$ and\/ $d$ to a common form equal to both.
\end{example}
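The failure in Example~\ref{exa:no-symmetry} can be checked exhaustively. In the following Python sketch (an illustration with assumed names; the terms are atomic constants, so each equation, used left to right only, applies at the root), the sets of terms reachable from $a$ and from $d$ by rewriting never meet.

```python
def reachable(term, rules):
    # All terms derivable from `term` by zero or more rewriting steps.
    # Terms here are atomic constants, so rules apply only at the root.
    seen, frontier = {term}, [term]
    while frontier:
        t = frontier.pop()
        for lhs, rhs in rules:
            if t == lhs and rhs not in seen:
                seen.add(rhs)
                frontier.append(rhs)
    return seen

# The three equations of the example, each used left to right only.
T = [("a", "b"), ("c", "b"), ("c", "d")]
print(sorted(reachable("a", T)))               # ['a', 'b']
print(sorted(reachable("d", T)))               # ['d']
# Empty intersection: no common form, so rewriting cannot derive a = d.
print(reachable("a", T) & reachable("d", T))   # set()
```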
Second, term rewriting proofs limit the order in which the
Instantiation, Substitution, and Transitive rules are applied. This
second limitation does not affect the deductive power of the proof system.
\begin{proposition}
\label{pro:rewrite-symmetric}
Let\/ \mbox{$\mathbold{T}=\{l_1\formeq r_1,\ldots,l_n\formeq
r_n\}$} be a set of equations. Let\/
\mbox{$\mathbold{T}^R=\{r_1\formeq l_1,\ldots,r_n\formeq l_n\}$}.
$\mathbold{T}^R$ is the same as\/ $\mathbold{T}$ except that the left
and right sides of equations are interchanged---equivalently,
$\mathbold{T}^R$ contains the results of applying the Symmetric rule
to the equations in $\mathbold{T}$.
For all equations\/ \mbox{$(s\formeq t)$}, if \mbox{$\mathbold{T}
\infhyproves(s\formeq t)$} (equivalently, if
\mbox{$\mathbold{T}\models(s\formeq t)$}) then
\mbox{$\mathbold{T}\cup\mathbold{T}^R\trhyproves(s\formeq t)$}.
The proof of the proposition, given in more detail
in \cite{O'Donnell-Springer}\auind{O'Donnell, M. J.},
works by permuting the steps in an arbitrary
inferential proof of\/ \mbox{$s\formeq t$} into the form:
\begin{enumerate}
\item hypotheses;
\item applications of the Symmetric rule;
\item applications of the Instantiation rule;
\item applications of the Substitution rule;
\item applications of the Transitive rule.
\end{enumerate}
The Reflexive rule is needed only in the degenerate case when\/
\mbox{$s=t$} ($s$ and $t$ are the same term). In this form, it is easy
to represent each of the applications of the Transitive rule as
concatenating two term rewriting sequences. The crucial quality of
the permuted form of the proof is that all uses of the Instantiation
rule come before any use of the Transitive and Substitution rules.
\end{proposition}
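Proposition~\ref{pro:rewrite-symmetric} can be illustrated on Example~\ref{exa:no-symmetry}. In the Python sketch below (atomic constants only, so rules apply at the root; the names are assumptions of the sketch), adjoining the Symmetric images $\mathbold{T}^R$ makes $d$ reachable from $a$ by term rewriting alone.

```python
def reachable(term, rules):
    # All terms derivable from `term` by zero or more rewriting steps
    # (atomic constants only, so rules apply only at the root).
    seen, frontier = {term}, [term]
    while frontier:
        t = frontier.pop()
        for lhs, rhs in rules:
            if t == lhs and rhs not in seen:
                seen.add(rhs)
                frontier.append(rhs)
    return seen

T = [("a", "b"), ("c", "b"), ("c", "d")]
TR = [(r, l) for (l, r) in T]          # the Symmetric images of T

print("d" in reachable("a", T))        # False: T alone cannot reach d
print("d" in reachable("a", T + TR))   # True: a -> b -> c -> d
```

The successful sequence uses $b\rightarrow c$, the reversed form of \mbox{$c\formeq b$}, exactly the step that term rewriting with $\mathbold{T}$ alone forbids.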
\subind{term rewriting!vs.\ inferential proof|)}
\subind{proof!length/search tradeoff|(}
The implementor of a logic programming system often faces a tradeoff
between the cost of an individual proof, and the cost of the search
for that proof. The discipline of term rewriting can be very
advantageous in reducing the number of possible steps to consider in
the search for a proof to answer a question, but it increases the
lengths of proofs in some cases. Section~\ref{sec:ground-sharing}
shows how clever uses of Instantiation\subind{instantiation rule}
sometimes reduce the length of a proof substantially compared to term
rewriting proofs. Efficient implementations of programming languages
have not yet succeeded in controlling the costs of search for a proof
with the more sophisticated approaches to Instantiation, so term
rewriting is the basis for almost all implementations.
\subind{proof!length/search tradeoff|)}
\subind{proof system|)}\subind{equation!proof system|)}
\subind{term rewriting!proof|)}\subind{equation!rewriting proof|)}
\subsection[The Confluence Property]
{The Confluence Property and the Completeness of Term Rewriting}
\label{sec:confluence}
\subind{confluence|(}\subind{Church-Rosser property|(}
Term rewriting is often much more efficient than an undisciplined
search for an equational proof. But, for general sets $\mathbold{T}$
of equational hypotheses, term rewriting is not complete, due to its
failure to apply the Symmetric rule.\subind{symmetric rule} It is
tempting, then, to use each equation in both directions, and take
advantage of the completeness result of
Proposition~\ref{pro:rewrite-symmetric}. Unfortunately, known
techniques for efficient term rewriting typically fail or become
inefficient when presented with the reversed forms of equations. So,
we find special restrictions on equations that imply the completeness
of term rewriting for the answering of particular normal form
questions. The {\em confluence property,} also called the {\em
Church-Rosser property,} provides the key to such restrictions.
\begin{definition}
\label{def:confluence}
Let\/ $\rightarrow$ be a binary relation, and\/ $\rightarrow^*$ be its
reflexive-transitive closure. $\rightarrow$ is\/ {\em confluent\/} if
and only if, for all $s$, $t_1$, $t_2$ in its domain such that
\mbox{$s\rightarrow^*t_1$} and \mbox{$s\rightarrow^*t_2$}, there exists
a $u$ such that \mbox{$t_1\rightarrow^*u$} and
\mbox{$t_2\rightarrow^*u$} (see Figure~\ref{fig:confluence} B).
\end{definition}
\begin{figure}[hbtp]
\begin{center}
\begin{tabular}{c@{\hspace{5ex}}c@{\hspace{5ex}}c}
\setlength{\unitlength}{.1 cm}
\begin{picture}(20,20)(-10,-20)
\put(0,0){ \makebox(0,0){$s$} }
\put(-2,-2){ \vector(-1,-1){6} }
\put(2,-2){ \vector(1,-1){6} }
\put(-10,-10){ \makebox(0,0){$t_1$} }
\put(-8,-12){ \vector(1,-1){6} }
\put(-7,-17){ \makebox{$\ast$} }
\put(10,-10){ \makebox(0,0){$t_2$} }
\put(8,-12){ \vector(-1,-1){6} }
\put(7,-17){ \makebox{$\ast$} }
\put(0,-20){ \makebox(0,0){$u$} }
\put(0,-20){ \circle{3} }
\end{picture}
&
\setlength{\unitlength}{.1 cm}
\begin{picture}(20,20)(-10,-20)
\put(0,0){ \makebox(0,0){$s$} }
\put(-2,-2){ \vector(-1,-1){6} }
\put(-7,-4){ \makebox{$\ast$} }
\put(2,-2){ \vector(1,-1){6} }
\put(7,-4){ \makebox{$\ast$} }
\put(-10,-10){ \makebox(0,0){$t_1$} }
\put(-8,-12){ \vector(1,-1){6} }
\put(-7,-17){ \makebox{$\ast$} }
\put(10,-10){ \makebox(0,0){$t_2$} }
\put(8,-12){ \vector(-1,-1){6} }
\put(7,-17){ \makebox{$\ast$} }
\put(0,-20){ \makebox(0,0){$u$} }
\put(0,-20){ \circle{3} }
\end{picture}
&
\setlength{\unitlength}{.1 cm}
\begin{picture}(20,20)(-10,-20)
\put(0,0){ \makebox(0,0){$s$} }
\put(-2,-2){ \vector(-1,-1){6} }
\put(2,-2){ \vector(1,-1){6} }
\put(-10,-10){ \makebox(0,0){$t_1$} }
\put(-8,-12){ \vector(1,-1){6} }
\put(10,-10){ \makebox(0,0){$t_2$} }
\put(8,-12){ \vector(-1,-1){6} }
\put(0,-20){ \makebox(0,0){$u$} }
\put(0,-20){ \circle{3} }
\end{picture}
\\[3ex]
A. Local confluence & B. Confluence & C. One-step confluence
\end{tabular}
\end{center}
The circle around $u$ indicates that it is existentially quantified;
the uncircled $s$, $t_1$, $t_2$ are universally quantified.
\caption{\label{fig:confluence}Confluence and related properties.}
\end{figure}
Two similar properties that are very important
in the literature are {\em local confluence,} which is weaker than
confluence, and {\em one-step confluence,} which is stronger than
confluence.
\begin{definition}[\cite{Newman}\auind{Newman, M. H. A.}]
\label{def:local-confluence}
\subind{local confluence}\subind{confluence!local}
Let\/ $\rightarrow$ be a binary relation, and\/ $\rightarrow^*$ be its
reflexive-transitive closure. $\rightarrow$ is\/ {\em locally confluent\/} if
and only if, for all $s$, $t_1$, $t_2$ in its domain such that
\mbox{$s\rightarrow t_1$} and \mbox{$s\rightarrow t_2$}, there exists
a $u$ such that \mbox{$t_1\rightarrow^*u$} and
\mbox{$t_2\rightarrow^*u$} (see Figure~\ref{fig:confluence} A).
\end{definition}
While confluence guarantees that divergent term rewriting sequences
may always be rewritten further to a common form, local confluence
guarantees this only for single step term rewriting sequences.
\begin{definition}[\cite{Newman}\auind{Newman, M. H. A.}]
\label{def:one-step-confluence}
\subind{one-step confluence}\subind{confluence!one-step}
Let\/ $\rightarrow$ be a binary relation.
$\rightarrow$ is\/ {\em one-step confluent\/} if
and only if, for all $s$, $t_1$, $t_2$ in its domain such that
\mbox{$s\rightarrow t_1$} and \mbox{$s\rightarrow t_2$}, there exists
a $u$ such that \mbox{$t_1\rightarrow u$} and
\mbox{$t_2\rightarrow u$} (see Figure~\ref{fig:confluence} C).
\end{definition}
While confluence guarantees that divergent term rewriting sequences may
always be rewritten further to a common form, one-step confluence
guarantees that for single step divergences, there is a single-step
convergence.
\begin{proposition}
\label{pro:three-confluences}
One-step confluence implies confluence implies local confluence.
The first implication is a straightforward induction on the number of
steps in the diverging rewrite sequences. The second is trivial.
\end{proposition}
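These properties, and the strictness of the implications between them, can be checked directly on finite relations. The following Python sketch (an illustration under an assumed representation of relations as sets of pairs, not part of the chapter's formalism) encodes the standard counterexample \mbox{$a\leftarrow b\leftrightarrow c\rightarrow d$}, which is locally confluent but not confluent.

```python
def rt_closure(rel, domain):
    """Reflexive-transitive closure of a finite relation, as a set of pairs."""
    closure = {(x, x) for x in domain} | set(rel)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in rel:
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def confluent(rel, domain):
    """Every divergence s ->* t1, s ->* t2 rejoins at some u."""
    star = rt_closure(rel, domain)
    return all(any((t1, u) in star and (t2, u) in star for u in domain)
               for s in domain
               for t1 in domain if (s, t1) in star
               for t2 in domain if (s, t2) in star)

def locally_confluent(rel, domain):
    """Only single-step divergences s -> t1, s -> t2 need rejoin."""
    star = rt_closure(rel, domain)
    return all(any((t1, u) in star and (t2, u) in star for u in domain)
               for (s, t1) in rel for (s2, t2) in rel if s == s2)

# a <- b <-> c -> d: locally confluent, but a and d are distinct normal forms.
domain = {'a', 'b', 'c', 'd'}
rel = {('b', 'a'), ('b', 'c'), ('c', 'b'), ('c', 'd')}
```

Here $b$ and $c$ rewrite to each other, so every one-step divergence rejoins, yet $a$ and $d$ are distinct normal forms reachable from $b$, so confluence fails.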
\subsubsection{Consequences of Confluence.}
\label{sec:consequences-confluence}
When the term rewriting relation $\onerewrite{\boldscript{T}}$ for a
set $\mathbold{T}$ of equations has the confluence property, term
rewriting is sufficient for deriving all logical consequences of
$\mathbold{T}$, in the sense that \mbox{$\mathbold{T}\models(s\formeq
t)$} implies that $s$ and $t$ rewrite to some common form $u$.
\begin{proposition}[\cite{Curry-Feys}]
\label{pro:common-reduct}
Let\/ $\mathbold{T}$ be a set of equations, and let\/
$\onerewrite{\boldscript{T}}$ be the term rewriting relation for\/
$\mathbold{T}$ (Definition~\ref{def:rewriting-sequence}). If
$\onerewrite{\boldscript{T}}$ is confluent, then for all terms\/
$s$ and\/ $t$ such that\/ \mbox{$\mathbold{T}\models(s\formeq t)$}, there
is a term\/ $u$ such that\/ \mbox{$\mathbold{T}\trhyproves(s\formeq u)$}
and\/
\mbox{$\mathbold{T}\trhyproves(t\formeq u)$}.
The proof of the proposition is an elementary induction on the length
of an inferential proof $D$ such that \mbox{$\mathbold{T}\hypos
D\infproves(s\formeq t)$}.
\end{proposition}
So, confluent term rewriting is nearly complete, in the sense that
every logical consequence \mbox{$s\formeq t$} of a set of equations
$\mathbold{T}$ may be derived by choosing an appropriate term $u$, and
finding two term rewriting proofs and a trivial inferential proof as
follows:
\begin{enumerate}
\item \mbox{$\mathbold{T}\trhyproves(s\formeq u)$}
\item \mbox{$\mathbold{T}\trhyproves(t\formeq u)$}
\item \mbox{$\{s\formeq u,\;t\formeq u\}\infhyproves(s\formeq t)$}
trivially, by one application of Symmetry and one application of Transitivity.
\end{enumerate}
The near-completeness of confluent term rewriting leads to its use in
theorem proving \cite{Knuth-Bendix,Loveland}.
\auind{Knuth, D. E.}\auind{Bendix, P.}\auind{Loveland, D. W.}
For equational logic programming, term rewriting can answer all normal
form queries in a confluent system, when the prohibited terms in
normal forms are all the left-hand sides of equations.
\begin{proposition}
\label{pro:confluent-normalization}
\subind{normal form!completeness of rewriting}\subind{completeness!rewriting}
Let\/ \mbox{$\mathbold{T}=\{l_1\formeq r_1,\ldots,l_m\formeq
r_m\}$} be a set of equations, with confluent term rewriting
relation\/ $\onerewrite{\boldscript{T}}$, and let $t$ be any term. If
$$(\normalize l_1,\ldots,l_m\quantsep
t)\answers\mathbold{T}\models(t\formeq s)$$ then $$(\normalize
l_1,\ldots,l_m\quantsep t)\answers\mathbold{T}\trhyproves(t\formeq
s)$$
The proof is elementary. By confluence, $t$ and\/ $s$ rewrite to a
common form\/ $u$. Since\/ $s$ is a normal form, it
is not rewritable, and must be the same as\/ $u$.
\end{proposition}
So, for equations $\mathbold{T}$ with confluent rewriting relation, term
rewriting based on $\mathbold{T}$ is sufficient for answering all queries
requesting normal forms that prohibit left-hand sides of equations in
$\mathbold{T}$. From now on, a {\em normal form\/} will mean a normal form
for the left-hand sides of whatever set of equations we are discussing
(see Definition~\ref{def:equational-query-system} in the chapter
`Introduction: Logic and Logic Programming Languages' for the general
concept of {\em normal form\/}).
The most famous consequence of the confluence property is uniqueness
of normal forms.
\begin{proposition}
\label{pro:unique-normal-forms}
\subind{normal form!unique}
Let\/ \mbox{$\mathbold{T}=\{l_1\formeq r_1,\ldots,l_m\formeq
r_m\}$} be a set of equations, with confluent term rewriting
relation\/ $\onerewrite{\boldscript{T}}$. If $$(\normalize
l_1,\ldots,l_m\quantsep t)\answers\mathbold{T}\models(t\formeq s_1)$$
and $$(\normalize l_1,\ldots,l_m\quantsep
t)\answers\mathbold{T}\models(t\formeq s_2)$$ then\/ \mbox{$s_1=s_2$}
($s_1$ and $s_2$ are the same term).
The proof is elementary. By confluence, $s_1$ and\/ $s_2$ rewrite to a
common form\/ $u$. Since\/ $s_1$ and\/ $s_2$ are normal forms, they
are not rewritable, and must be the same as\/ $u$.
\end{proposition}
So, equational logic programs using confluent systems of equations
have uniquely defined outputs. This is an interesting property
to note, but it is not essential to the logic programming
enterprise---logic programs in FOPC are allowed to have indeterminate
answers (Section~\ref{sec:fopc} of the `Introduction' chapter), and
this freedom is often seen as an advantage. In efficient equational logic
programming, confluence is required for the completeness of term
rewriting, and uniqueness of answers is an accidental side-effect that
may be considered beneficial or annoying in different applications.
Confluence, in effect, guarantees that the order of applying rewrite
steps cannot affect the normal form. In
Section~\ref{sec:proof-strategies} we see that the order of
application of rewrite rules {\em can\/} affect the efficiency with
which a normal form is found, and in some cases whether or not the
unique normal form is found at all.
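To make normalization concrete, here is a minimal Python sketch of a term rewriter (the nested-tuple representation, the example rules for addition of unary numerals \mbox{$+(0,y)\formeq y$}, \mbox{$+(s(x),y)\formeq s(+(x,y))$}, and the leftmost-outermost strategy are all illustrative assumptions, not constructs from this chapter). The rules form a confluent, constructor-orthogonal system, so the normal form reached is unique.

```python
# Terms are tuples ('symbol', arg1, ...); variables are bare strings.
RULES = [
    (('+', ('0',), 'y'), 'y'),                         # +(0, y) = y
    (('+', ('s', 'x'), 'y'), ('s', ('+', 'x', 'y'))),  # +(s(x), y) = s(+(x, y))
]

def match(pat, term, env):
    """Extend env so that pat instantiated by env equals term, or None."""
    if isinstance(pat, str):                 # pattern variable
        if pat in env:
            return env if env[pat] == term else None
        return {**env, pat: term}
    if isinstance(term, tuple) and term[0] == pat[0] and len(term) == len(pat):
        for p, t in zip(pat[1:], term[1:]):
            env = match(p, t, env)
            if env is None:
                return None
        return env
    return None

def subst(pat, env):
    """Instantiate a pattern with the bindings in env."""
    if isinstance(pat, str):
        return env[pat]
    return (pat[0],) + tuple(subst(p, env) for p in pat[1:])

def rewrite_once(term):
    """One leftmost-outermost rewrite step, or None if term is in normal form."""
    for lhs, rhs in RULES:
        env = match(lhs, term, {})
        if env is not None:
            return subst(rhs, env)
    for i, sub in enumerate(term[1:], 1):
        new = rewrite_once(sub)
        if new is not None:
            return term[:i] + (new,) + term[i+1:]
    return None

def normalize(term):
    """Rewrite until no rule applies."""
    while True:
        new = rewrite_once(term)
        if new is None:
            return term
        term = new
```

For example, `normalize(('+', ('s', ('s', ('0',))), ('s', ('0',))))` computes \mbox{$2+1$} and yields the numeral for $3$.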
\subsubsection{Testing for Confluence.}
\label{sec:testing-confluence}
\begin{proposition}
\label{pro:confluence-undecidable}
Confluence is an undecidable property of finite sets of equations.
The proof is straightforward. Given an arbitrary Turing Machine ${\cal
M}$, modify ${\cal M}$ so that, if it halts, it does so in the special
configuration $I_f$. Encode configurations (instantaneous descriptions)
of ${\cal M}$ as terms (just let the tape and state symbols be unary
function symbols), and provide rewriting rules to simulate the
computation of ${\cal M}$. So far, we have a system of equations in
which an arbitrary encoding of an initial configuration $I_0$ rewrites
to $I_f$ if and only if ${\cal M}$ halts on $I_0$. Choose a new symbol
$a$ not used in encoded configurations, and add two more equations:
$I_0\formeq a$ and $I_f\formeq a$. The extended system is confluent
if and only if ${\cal M}$ halts on $I_0$.
\end{proposition}
For practical purposes in programming language
implementations, we need a {\em sufficient\/} condition for confluence
that is efficient to test.
\paragraph{Orthogonality.}
\label{sec:orthogonality}
\subind{orthogonal rewrite rules|(}\subind{regular rewrite rules|(}
A particularly useful sort of condition for guaranteeing confluence is
{\em orthogonality,} also called {\em regularity\/} (but not connected
in any sensible way to the regular languages). Orthogonality is a set
of restrictions on rewrite rules insuring that they do not interfere
with one another in certain pathological ways. We consider three
versions of orthogonality. {\em Rewrite-orthogonality\/} insures that
the rewrites performed by the rules do not interfere, while the
stronger condition of {\em rule-orthogonality\/} prohibits even the
appearance of interference based on an inspection of the left-hand
sides of the rules, and ignoring the right-hand sides. {\em
Constructor-orthogonality\/} is an even stronger and simpler syntactic
condition that guarantees rule-orthogonality. In other
literature on term rewriting, `orthogonality' and `regularity' refer
to the stronger form, rule-orthogonality.
\begin{definition}
\label{def:rewrite-orthogonal}
\subind{left-linear}\subind{linear!left-}\subind{nonambiguous}
Let\/ \mbox{$\mathbold{T}=\{l_1\formeq r_1,\ldots,l_n\formeq
r_n\}$} be a set of equations. $\mathbold{T}$ is\/ {\em
rewrite-orthogonal\/} if and only if the following conditions hold:
\begin{enumerate}
\item (Nontrivial) No left-hand side\/ $l_i$ of an equation\/
\mbox{$l_i\formeq r_i$} in\/ $\mathbold{T}$ consists entirely of a variable.
\item (Rule-like) Every variable in the right-hand side\/ $r_i$ of an
equation\/ \mbox{$l_i\formeq r_i$} in\/ $\mathbold{T}$ occurs in the
left-hand side\/ $l_i$ as well.
\item (Left-linear) No variable occurs more than once in the
left-hand side\/ $l_i$ of an equation\/ \mbox{$l_i\formeq r_i$} in\/
$\mathbold{T}$.
\item (Rewrite-Nonambiguous)
Let\/ $l_i$ and\/ $l_j$ be left-hand sides of equations in\/
$\mathbold{T}$, and let $s$ be a term with a single occurrence of a new
variable\/ $y$ (not occurring in any equation of $\mathbold{T}$). If
$$\subst{\subst{t_1,\ldots,t_m}{x_1,\ldots,x_m}{l_i}}{y}{s}=
\subst{t^\prime_1,\ldots,t^\prime_n}{x^\prime_1,\ldots,x^\prime_n}{l_j}$$
then either\/ $s$ is an instance of\/ $l_j$, or
$$\subst{\subst{t_1,\ldots,t_m}{x_1,\ldots,x_m}{r_i}}{y}{s}=
\subst{t^\prime_1,\ldots,t^\prime_n}{x^\prime_1,\ldots,x^\prime_n}{r_j}$$
\end{enumerate}
In clause 4 the nested substitution may be hard to read.
$$\subst{\subst{t_1,\ldots,t_m}{x_1,\ldots,x_m}{r_i}}{y}{s}$$ is the
result of substituting $t_1,\ldots,t_m$ for $x_1,\ldots,x_m$ in $r_i$,
to produce $r'_i=\subst{t_1,\ldots,t_m}{x_1,\ldots,x_m}{r_i}$, then
substituting $r'_i$ for $y$ in $s$. Clause 4 is best
understood by considering an example where it fails. The set of
equations $\{f(g(v,w),x)\formeq a,\;g(h(y),z)\formeq b\}$ is
rewrite-ambiguous because, in the term $f(g(h(c),d),e)$, there is an
instance of $f(g(v,w),x)$ and an instance of $g(h(y),z)$, and the two
instances share the symbol $g$. Furthermore, $f(g(h(c),d),e)$ rewrites
to $a$ using the first equation, and to a different result, $f(b,e)$,
using the second equation.
\end{definition}
Nontriviality and the Rule-like property are required in order for the
interpretation of the equations as term rewriting rules to make much
sense. Left-linearity is of practical importance because the
application of a rule with repeated variables on the left-hand side
requires a test for equality. Non-left-linear systems also fail to be
confluent in rather subtle ways, as shown in
Example~\ref{exa:nonorthogonal} below. Rewrite-nonambiguity says that
if two rewriting steps may be applied to the same term, then they are
either completely independent (they apply to disjoint sets of symbols),
or they are equivalent (they produce the same result).
Example~\ref{exa:nonorthogonal} below shows more cases of
rewrite-ambiguity and its consequences.
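Conditions 1--3 are purely syntactic and easy to check mechanically. The following Python sketch assumes the illustrative nested-tuple representation of terms, with variables as bare strings (an assumption for this example, not the chapter's notation):

```python
def variables(t):
    """All variable occurrences in a term, with repetitions, left to right."""
    if isinstance(t, str):
        return [t]
    return [v for arg in t[1:] for v in variables(arg)]

def check_basic_conditions(rules):
    """Conditions 1-3 of rewrite-orthogonality:
    nontrivial, rule-like, and left-linear."""
    for lhs, rhs in rules:
        if isinstance(lhs, str):                    # 1: lhs is a bare variable
            return False
        lvars = variables(lhs)
        if not set(variables(rhs)) <= set(lvars):   # 2: rhs introduces variables
            return False
        if len(lvars) != len(set(lvars)):           # 3: repeated lhs variable
            return False
    return True

# Addition of unary numerals satisfies all three conditions.
peano = [
    (('+', ('0',), 'y'), 'y'),
    (('+', ('s', 'x'), 'y'), ('s', ('+', 'x', 'y'))),
]
# An equality test seq(x, x) = true is not left-linear.
nonlinear = [(('seq', 'x', 'x'), ('true',))]
```

The non-left-linear equality test here is the same kind of rule as the $\seq$ equation in Example~\ref{exa:nonorthogonal} below.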
One simple way to insure rewrite-nonambiguity is to prohibit all
interference between left-hand sides of rules.
\begin{definition}[\cite{Klop-term},\cite{Klop-thesis}\auind{Klop, J. W.}]
\label{def:rule-orthogonal}
\subind{nonambiguous}\subind{nonoverlapping}
Let\/ \mbox{$\mathbold{T}=\{l_1\formeq r_1,\ldots,l_n\formeq r_n\}$} be
a set of equations. $\mathbold{T}$ is\/ {\em rule-orthogonal\/} if and only if\/
$\mathbold{T}$ satisfies conditions 1--3 of
Definition~\ref{def:rewrite-orthogonal} above, and also
\begin{enumerate}
\item[$4^\prime$](Rule-Nonambiguous)
Let\/ $l_i$ and\/ $l_j$ be left-hand sides of equations in\/
$\mathbold{T}$, and let $s$ be a term with a single occurrence of a new
variable\/ $y$ (not occurring in any equation of $\mathbold{T}$). If
$$\subst{\subst{t_1,\ldots,t_m}{x_1,\ldots,x_m}{l_i}}{y}{s}=
\subst{t^\prime_1,\ldots,t^\prime_n}{x^\prime_1,\ldots,x^\prime_n}{l_j}$$
then either\/ $s$ is an instance of\/ $l_j$, or\/ \mbox{$s=y$} and\/
\mbox{$i=j$}.
\end{enumerate}
\end{definition}
Rule-nonambiguity says that if two rewriting steps may be applied to
the same term, then they are either completely independent, or they
are identical (the same rule applied at the same place). Notice that
rule nonambiguity depends only on the left-hand sides of equations,
not the right-hand sides. In fact, only the Rule-like condition of
rule-orthogonality depends on right-hand sides.
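Because rule-nonambiguity depends only on left-hand sides, it can be tested mechanically by searching for overlaps: a non-variable subterm of one left-hand side that unifies with a renamed copy of another left-hand side, excluding a rule's trivial overlap with itself at the root. The following Python sketch illustrates the idea under the assumed nested-tuple term representation; the unification omits the occurs-check, which is safe here because left-hand sides are left-linear.

```python
def walk(t, env):
    """Dereference a variable through the current bindings."""
    while isinstance(t, str) and t in env:
        t = env[t]
    return t

def unify(s, t, env):
    """First-order unification without occurs-check (safe for
    left-linear left-hand sides). Returns bindings or None."""
    s, t = walk(s, env), walk(t, env)
    if s == t:
        return env
    if isinstance(s, str):
        return {**env, s: t}
    if isinstance(t, str):
        return {**env, t: s}
    if s[0] != t[0] or len(s) != len(t):
        return None
    for a, b in zip(s[1:], t[1:]):
        env = unify(a, b, env)
        if env is None:
            return None
    return env

def rename(t, tag):
    """Tag every variable, so two rules share no variables."""
    if isinstance(t, str):
        return t + tag
    return (t[0],) + tuple(rename(a, tag) for a in t[1:])

def nonvar_subterms(t):
    if isinstance(t, str):
        return
    yield t
    for a in t[1:]:
        yield from nonvar_subterms(a)

def has_overlap(rules):
    """True if some non-variable subterm of one left-hand side unifies
    with another left-hand side (trivial root self-overlap excluded)."""
    for i, (li, _) in enumerate(rules):
        for j, (lj, _) in enumerate(rules):
            li_r, lj_r = rename(li, '_1'), rename(lj, '_2')
            for s in nonvar_subterms(li_r):
                if i == j and s == li_r:
                    continue
                if unify(s, lj_r, {}) is not None:
                    return True
    return False

# Combinatory logic (see Example below): no overlaps.
sk_rules = [
    (('@', ('@', ('@', ('S',), 'x'), 'y'), 'z'),
     ('@', ('@', 'x', 'z'), ('@', 'y', 'z'))),
    (('@', ('@', ('K',), 'x'), 'y'), 'x'),
]
# Homomorphism plus left identity: the subterm g(x, y) of f(g(x, y))
# unifies with g(i, z), so the rules overlap.
hom_rules = [
    (('f', ('g', 'x', 'y')), ('g', ('f', 'x'), ('f', 'y'))),
    (('g', ('i',), 'z'), 'z'),
]
```

A full test would combine `has_overlap` with the checks for conditions 1--3 to decide rule-orthogonality.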
\begin{definition}
\label{def:left-similar}
\subind{left-similar}
Two systems of equations are\/ {\em left-similar\/} if the multisets
of left-hand sides of equations are the same, except for renaming of
variables.
\end{definition}
\begin{proposition}
\label{pro:left-similar-orthogonal}
A set\/ $\mathbold{T}$ of equations is rule-orthogonal if and only if
\begin{itemize}
\item $\mathbold{T}$ satisfies the rule-like restriction, and
\item every rule-like set of equations left-similar to $\mathbold{T}$
is rewrite-orthogonal.
\end{itemize}
\end{proposition}
That is, rule-orthogonality holds precisely when rewrite-orthogonality
can be guaranteed by the forms of the left-hand sides alone,
independently of the right-hand sides.
An even simpler way to insure rule-nonambiguity is to use a {\em
constructor\/} system, in which symbols appearing leftmost in rules
are not allowed to appear at other locations in left-hand sides.
\begin{definition}
\label{def:constructor-orthogonal}
\subind{constructor}
Let\/ \mbox{$\mathbold{T}=\{l_1\formeq r_1,\ldots,l_n\formeq r_n\}$} be
a set of equations. $\mathbold{T}$ is\/ {\em constructor-orthogonal\/} if
and only if\/ $\mathbold{T}$ satisfies conditions 1--3 of
Definition~\ref{def:rewrite-orthogonal} above, and the symbols of the system partition into two disjoint sets---the set $\mathbold{C}$ of\/ {\em constructor symbols,} and the set\/ $\mathbold{D}$ of\/ {\em defined symbols,} satisfying
\begin{enumerate}
\item[$4^{\prime\prime}$](Symbol-Nonambiguous)
\begin{itemize}
\item Every left-hand side of an equation in\/ $\mathbold{T}$ has the
form\/ $f(t_1,\ldots,t_m)$, where\/ $f\in\mathbold{D}$ is a defined symbol,
and\/ $t_1,\ldots,t_m$ contain only variables and constructor symbols
in\/ $\mathbold{C}$. \item Let\/ $l_i$ and\/ $l_j$ be left-hand sides of
equations in\/ $\mathbold{T}$. If there exists a common instance\/ $s$
of\/ $l_i$ and\/ $l_j$, then\/ $i=j$.
\end{itemize}
\end{enumerate}
\end{definition}
In most of the term-rewriting literature, `orthogonal' and `regular'
both mean {\em rule-orthogonal.} It is easy to see that constructor
orthogonality implies rule-orthogonality, which implies
rewrite-orthogonality. Most functional programming languages have
restrictions equivalent or very similar to constructor-orthogonality.
Orthogonal systems of all varieties are confluent.
\begin{proposition}
\label{pro:orthogonal-confluent}
Let\/ $\mathbold{T}$ be a constructor-, rule- or rewrite-orthogonal set of
equations. Then the term rewriting relation\/
$\onerewrite{\boldscript{T}}$ is confluent.
Let $\rightarrow$ be the rewrite relation that is to be proved
confluent. The essential idea of these, and many other, proofs of
confluence is to choose another relation $\rightarrow'$ with the
one-step confluence property (Definition~\ref{def:one-step-confluence}),
whose reflexive-transitive closure is the same as the
reflexive-transitive closure of $\rightarrow$. Since confluence is
defined entirely in terms of the reflexive-transitive closure,
$\rightarrow$ is confluent if and only if
$\rightarrow'$ is confluent. $\rightarrow'$ is confluent because
one-step confluence implies confluence. To prove confluence of
orthogonal systems of equations, the appropriate $\rightarrow'$ allows
simultaneous rewriting of any number of disjoint subterms.
Theorem~10.1.3 of the chapter `Equational Reasoning and Term-Rewriting
Systems' in Section 10.1 of Volume~1 of this handbook is the
rule-orthogonal portion of this proposition, which is also proved
in~\cite{Huet-Levy,Klop-term}\auind{Huet, G.}\auind{L\'evy,
J.-J.}\auind{Klop, J. W.}. The proof for rewrite-orthogonal systems has
never been published, but it is a straightforward
generalization. \cite{O'Donnell-Springer}\auind{O'Donnell, M. J.}
proves a version intermediate between rule-orthogonality and
rewrite-orthogonality.
\end{proposition}
In fact, for nontrivial, rule-like, left-linear systems,
rule-nonambiguity captures precisely the cases of confluence that
depend only on the left-hand sides of equations.
\begin{proposition}
\label{pro:left-similar-confluent}
A nontrivial, rule-like, left-linear set\/ $\mathbold{T}$ of equations
is rule-nonambiguous if and only if, for every set of equations\/
$\mathbold{T}^\prime$ left-similar to\/ $\mathbold{T}$,
$\onerewrite{\mbox{$\boldscript{T}^\prime$}}$ is confluent.
($\limplies$) is a direct consequence of
Propositions~\ref{pro:orthogonal-confluent}
and~\ref{pro:left-similar-orthogonal}. ($\limplied$) is
straightforward. In a rule-ambiguous system, simply fill in each
right-hand side with a different constant symbol, not appearing on any
left-hand side, to get a nonconfluent system.
\end{proposition}
In the rest of this chapter, we use the term `orthogonal' in
assertions that hold for both rewrite- and rule-orthogonality. To get
a general understanding of orthogonality, and its connection to
confluence, it is best to consider examples of nonorthogonal systems
and investigate why they are not confluent, as well as a few examples
of systems that are not rule-orthogonal, but are rewrite-orthogonal,
and therefore confluent.
\begin{example}
\label{exa:nonorthogonal}\addtocounter{eqnex}{1}
The first example, due to Klop \cite{Klop-thesis}\auind{Klop, J. W.}, shows the
subtle way in which non-left-linear systems may fail to be confluent.
Let $$\mathbold{T}_{\theeqnex}=\{\seq(x,x)\formeq\true,\;
f(x)\formeq\seq(x,f(x)),\;
a\formeq f(a)\}$$ $\seq$ represents an equality test, a very useful
operation to define with a non-left-linear equation. Now
$$f(a)\onerewrite{{\boldscript{T}_{\theeqnex}}}\seq(a,f(a))
\onerewrite{{\boldscript{T}_{\theeqnex}}}\seq(f(a),f(a))
\onerewrite{{\boldscript{T}_{\theeqnex}}}\true$$ and also
$$f(a)\onerewrite{{\boldscript{T}_{\theeqnex}}}f(f(a))
\onerewrite{{\boldscript{T}_{\theeqnex}}}f(\seq(a,f(a)))
\onerewrite{{\boldscript{T}_{\theeqnex}}}f(\seq(f(a),f(a)))
\onerewrite{{\boldscript{T}_{\theeqnex}}}f(\true)$$ $\true$ is in
normal form, and\/ \mbox{$f(\true)$} rewrites infinitely as
$$f(\true)\onerewrite{{\boldscript{T}_{\theeqnex}}}\seq(\true,f(\true))
\onerewrite{{\boldscript{T}_{\theeqnex}}}\seq(\true,\seq(\true,f(\true)))
\onerewrite{{\boldscript{T}_{\theeqnex}}}\cdots$$ The system is not
confluent, because the attempt to rewrite\/ \mbox{$f(\true)$} to\/
$\true$ yields an infinite regress with\/
\mbox{$f(\true)\onerewrite{{\boldscript{T}_{\theeqnex}}}\seq(\true,f(\true))$}.
Notice that\/ $\onerewrite{{\boldscript{T}_{\theeqnex}}}$ has unique
normal forms. The failure of confluence involves a term with a normal
form, and an infinite term rewriting sequence from which that normal
form cannot be reached. Non-left-linear systems that satisfy the
other requirements of rule-orthogonality always have unique normal
forms, even when they fail to be
confluent \cite{Chew-nonlinear}\auind{Chew, L. P.}. I conjecture
that this holds for rewrite-orthogonality as well.
\addtocounter{eqnex}{1}
A typical rewrite-ambiguous set of equations is
$$\mathbold{T}_{\theeqnex}=\{c(a,y)\formeq a,\;c(x,b)\formeq b\}$$ $c$
represents a primitive sort of\/ {\em nondeterministic choice\/}
operator. $\mathbold{T}_{\theeqnex}$ violates condition ($4^\prime$) because
$$\subst{\subst{b}{y}{c(a,y)}}{w}{w}=c(a,b)=\subst{a}{x}{c(x,b)}$$ but
$$\subst{\subst{b}{y}{a}}{w}{w}=a\neq b=\subst{a}{x}{b}$$
$\onerewrite{{\boldscript{T}_{\theeqnex}}}$ is not confluent, as\/
\mbox{$c(a,b)\onerewrite{{\boldscript{T}_{\theeqnex}}}a$} by the first
equation, and\/
\mbox{$c(a,b)\onerewrite{{\boldscript{T}_{\theeqnex}}}b$} by the second
equation, but\/ $a$ and\/ $b$ are in normal form.
By contrast, consider the set
\notind{$\mathbold{T}_{\sor+}$ (positive parallel or)}
\subind{parallel or!positive}\subind{positive parallel or}
$$\mathbold{T}_{\sor+}=\{\sor(\true,y)\formeq\true,\;
\sor(x,\true)\formeq\true\}$$ of equations defining the\/ {\em positive
parallel or\/} operator. Although\/ $\mathbold{T}_{\sor+}$ is
rule-ambiguous, it is\/ {\em rewrite-}nonambiguous:
$$\subst{\subst{\true}{y}{\sor(\true,y)}}{w}{w}=\sor(\true,\true)=
\subst{\true}{x}{\sor(x,\true)}$$ and\/ $w$ is not an instance of\/
\mbox{$\sor(x,\true)$}, but the corresponding right-hand sides are
both\/ $\true$:
$$\subst{\subst{\true}{y}{\true}}{w}{w}=\true=\subst{\true}{x}{\true}$$
$\mathbold{T}_{\sor+}$ is rewrite-orthogonal, so\/
$\onerewrite{{\boldscript{T}_{\sor+}}}$ is confluent.
A more subtle example of a rewrite-orthogonal set of equations that is
rule-ambiguous is the\/ {\em negative parallel or:}
\notind{$\mathbold{T}_{\sor-}$ (negative parallel or)}
\subind{parallel or!negative}\subind{negative parallel or}
$$\mathbold{T}_{\sor-}=
\{\sor(\false,y)\formeq y,\;\sor(x,\false)\formeq x\}$$
Although
$$\subst{\subst{\false}{y}{\sor(\false,y)}}{w}{w}=\sor(\false,\false)=
\subst{\false}{x}{\sor(x,\false)}$$ and\/ $w$ is not an instance of\/
\mbox{$\sor(x,\false)$}, the substitution above unifies the
corresponding right-hand sides as well:
$$\subst{\false}{y}{y}=\false=\subst{\false}{x}{x}$$
$\mathbold{T}_{\sor-}$ is rewrite-orthogonal, so\/
$\onerewrite{{\boldscript{T}_{\sor-}}}$ is confluent.
\addtocounter{eqnex}{1}\setcounter{oveqnex}{\value{eqnex}}
Another type of rewrite-ambiguous set of equations is
$$\mathbold{T}_{\theeqnex}=\{f(g(x,y))\formeq
g(f(x),f(y)),\;g(i,z)\formeq z\}$$ These equations express the fact
that\/ $f$ is a\/ {\em homomorphism\/} for\/ $g$ (i.e., $f$ {\em
distributes\/} over\/ $g$), and that\/ $i$ is a\/ {\em left
identity\/} for\/ $g$. The left-hand sides of the two equations
overlap in\/ \mbox{$f(g(i,z))$}, with the symbol\/ $g$ participating
in instances of the left-hand sides of both equations. Condition (4)
is violated, because
$$\subst{\subst{y}{z}{g(i,z)}}{w}{f(w)}=f(g(i,y))=
\subst{i}{x}{f(g(x,y))}$$ but\/ \mbox{$f(w)$} is not an instance of\/
\mbox{$f(g(x,y))$}. $\onerewrite{{\boldscript{T}_{\theeqnex}}}$ is
not confluent, as\/
\mbox{$f(g(i,i))\onerewrite{{\boldscript{T}_{\theeqnex}}}g(f(i),f(i))$}
by the first equation, and\/
\mbox{$f(g(i,i))\onerewrite{{\boldscript{T}_{\theeqnex}}}f(i)$} by the
second equation, but both\/ \mbox{$g(f(i),f(i))$} and\/ \mbox{$f(i)$}
are in normal form. While the previous examples of ambiguity involved
two rules applying to precisely the same term, the ambiguity in
$\mathbold{T}_{\theeqnex}$ comes from two overlapping applications of
rules to a term and one of its subterms. Some definitions of
orthogonality/regularity treat these two forms of ambiguity
separately.
\addtocounter{eqnex}{1}
By contrast, consider the set
$$\mathbold{T}_{\theeqnex}=\{f(g(x,y))\formeq f(y),\;g(i,z)\formeq
z\}$$ Although $\mathbold{T}_{\theeqnex}$ is rule-ambiguous, it is\/
{\em rewrite-}nonambiguous:
$$\subst{\subst{y}{z}{g(i,z)}}{w}{f(w)}=f(g(i,y))=
\subst{i}{x}{f(g(x,y))}$$ and\/ \mbox{$f(w)$} is not an instance
of\/ \mbox{$f(g(x,y))$}, but the corresponding right-hand sides yield
$$\subst{\subst{y}{z}{z}}{w}{f(w)}=f(y)=\subst{i}{x}{f(y)}$$
$\mathbold{T}_{\theeqnex}$ is rewrite-orthogonal, so\/
$\onerewrite{{\boldscript{T}_{\theeqnex}}}$ is confluent.
\addtocounter{eqnex}{1}
Condition (4) may also be violated by a single self-overlapping
equation, such as $$\mathbold{T}_{\theeqnex}=\{f(f(x))\formeq g(x)\}$$ The
left-hand side\/ $f(f(x))$ overlaps itself in\/ $f(f(f(x)))$, with the
second instance of the symbol\/ $f$ participating in two different
instances of\/ $f(f(x))$. Condition (4) is violated, because
$$\subst{f(f(x))}{y}{f(y)}=f(f(f(x)))= \subst{f(x)}{x}{f(f(x))}$$
but\/ \mbox{$f(y)$} is not an instance of\/ \mbox{$f(f(x))$}.
\mbox{$\onerewrite{{\boldscript{T}_{\theeqnex}}}$} is not confluent,
as\/
\mbox{$f(f(f(a)))\onerewrite{{\boldscript{T}_{\theeqnex}}}g(f(a))$}
and\/
\mbox{$f(f(f(a)))\onerewrite{{\boldscript{T}_{\theeqnex}}}f(g(a))$},
but both\/ \mbox{$g(f(a))$} and\/ \mbox{$f(g(a))$} are in normal form.
\addtocounter{eqnex}{1} A final example of overlapping left-hand sides
is $$\mathbold{T}_{\theeqnex}=\{f(g(a,x),y)\formeq a,\;g(z,b)\formeq
b\}$$ The left-hand sides of the two equations overlap in\/
\mbox{$f(g(a,b),y)$}, with the symbol\/ $g$ participating in instances
of the left-hand sides of both equations. Condition (4) is violated,
because $$\subst{\subst{a}{z}{g(z,b)}}{w}{f(w,y)}=f(g(a,b),y)=
\subst{b}{x}{f(g(a,x),y)}$$ but\/ \mbox{$f(w,y)$} is not an instance
of\/ \mbox{$f(g(a,x),y)$}.
\mbox{$\onerewrite{{\boldscript{T}_{\theeqnex}}}$} is not confluent,
as\/ \mbox{$f(g(a,b),c)\onerewrite{{\boldscript{T}_{\theeqnex}}}a$} by
the first equation, and\/
\mbox{$f(g(a,b),c)\onerewrite{{\boldscript{T}_{\theeqnex}}}f(b,c)$} by
the second equation, but both\/ $a$ and\/ \mbox{$f(b,c)$} are in
normal form.
The equations for combinatory logic
\notind{$\mathbold{T}_{SK}$ (combinatory logic)}
\subind{combinator calculus!defining equations}
$$\mathbold{T}_{SK}=\{@(@(@(S,x),y),z)\formeq
@(@(x,z),@(y,z)),\;@(@(K,x),y)\formeq x\}$$ are rule-orthogonal,
but not constructor-orthogonal, since the symbol $@$ (standing for
application of a function to an argument) appears leftmost and also in
the interior of left-hand sides. In more familiar notation,
$@(\alpha,\beta)$ is written\/ $(\alpha\beta)$, and leftward
parentheses are omitted, so the equations look like $$Sxyz\formeq
xz(yz),\;Kxy\formeq x$$ Many functional programming languages vary
the definition of constructor-orthogonality to allow pure applicative
systems (the only symbol of arity greater than zero is the apply
symbol $@$) in which the {\em zeroary\/} symbols ($S$ and $K$ in the
example above) are partitioned into defined symbols and constructors.
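The two combinator equations can be animated directly. A minimal Python sketch (the tuple representation, with `'@'` for application and atoms such as `('a',)` as extra constants, and the leftmost-outermost strategy are assumptions made for illustration):

```python
# Terms: ('S',), ('K',), atoms like ('a',), and applications ('@', f, x).
def step(t):
    """One leftmost-outermost reduction step; None if t is in normal form."""
    if t[0] == '@':
        f, x = t[1], t[2]
        # S x y z -> x z (y z)
        if f[0] == '@' and f[1][0] == '@' and f[1][1] == ('S',):
            a, b, c = f[1][2], f[2], x
            return ('@', ('@', a, c), ('@', b, c))
        # K x y -> x
        if f[0] == '@' and f[1] == ('K',):
            return f[2]
        for i in (1, 2):
            s = step(t[i])
            if s is not None:
                return t[:i] + (s,) + t[i+1:]
    return None

def reduce_term(t):
    """Reduce to normal form (may diverge on terms with no normal form)."""
    while (s := step(t)) is not None:
        t = s
    return t
```

For example, \mbox{$SKKa$} reduces as \mbox{$SKKa\rightarrow Ka(Ka)\rightarrow a$}, so `reduce_term` applied to the encoding of \mbox{$SKKa$} returns the atom `('a',)`.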
The equations for addition of Horner-rule form polynomials in the
symbolic variable $V$ ($V$ is a variable in the polynomials, but is
treated formally as a constant symbol in the equations) are
\notind{$\mathbold{T}_{\mathord{\mbox{\it poly}}}$ (Horner-rule addition)}
\subind{addition of polynomials}\subind{Horner-rule polynomials}
\subind{polynomial!Horner rule}
$$\begin{array}{rccc}
\mathbold{T}_{\mathord{\mbox{\it poly}}}= &
\{ & +(+(w,*(V,x)),+(y,*(V,z)))\formeq+(+(w,y),*(V,+(x,z))), \\
& & +(w,+(y,*(V,z)))\formeq+(+(w,y),*(V,z)), \\
& & +(+(w,*(V,x)),y)\formeq+(+(w,y),*(V,x)) & \}
\end{array}$$
This system is rule-orthogonal, but not constructor-orthogonal,
because the symbols $+$ and $*$ appear leftmost and also in the
interior of left-hand sides. In the more familiar infix form for $+$
and $*$, the equations look like
$$\begin{array}{c}
(w+V*x)+(y+V*z)\formeq(w+y)+V*(x+z),\\
w+(y+V*z)\formeq(w+y)+V*z,\\
(w+V*x)+y\formeq(w+y)+V*x
\end{array}$$
No natural variation on the definition of constructor-orthogonality
seems to allow these equations. The only obvious way to simulate
their behavior with a constructor-orthogonal system is to use two
different symbols for addition, and two different symbols for
multiplication, depending on whether the operation is active in adding
two polynomials, or is merely part of the representation of a
polynomial in Horner-rule form.
\end{example}
Although the polynomial example above shows that some natural sets of
equations are rule-orthogonal but not constructor-orthogonal, Thatte
has an automatic translation from rule-orthogonal to
constructor-orthogonal systems \cite{Thatte}\auind{Thatte, S.} showing that
in some sense the programming power of the two classes of systems is
the same. I still prefer to focus attention on the more general forms
of orthogonality, because they deal more directly with the intuitive
forms of equations, and because I believe that improved equational
logic programming languages of the future will deal with even more
general sets of equations, so I prefer to discourage dependence on the
special properties of constructor systems.
\subind{orthogonal rewrite rules|)}\subind{regular rewrite rules|)}
\paragraph{Knuth-Bendix Methods.}
\label{sec:knuth-bendix}
\subind{Knuth-Bendix method|(}\subind{completion (Knuth-Bendix)|(}
Although overlapping left-hand sides of equations {\em may\/} destroy
the confluence property, there are many useful equational programs
that are confluent in spite of overlaps. In particular, the equation
expressing the associative property has a self-overlap, and equations
expressing distributive or homomorphic properties often overlap with
those expressing identity, idempotence, cancellation, or other
properties that collapse a term. These overlaps are usually benign,
and many useful equational programs containing similar overlaps are in
fact confluent.
\begin{example}
\subind{associative|(}\subind{distributive|(}
\label{exa:associative-distributive}\addtocounter{eqnex}{1}
Consider the singleton set $$\mathbold{T}_{\theeqnex}=\{g(x,g(y,z))\formeq
g(g(x,y),z)\}$$ expressing the\/ {\em associative\/} law for the
operator\/ $g$. This equation has a self-overlap, violating condition
(4) of rewrite-orthogonality (Definition~\ref{def:rewrite-orthogonal}) because
$$\subst{g(x,g(y,z))}{u}{g(w,u)}=g(w,g(x,g(y,z)))=
\subst{w,x,g(y,z)}{x,y,z}{g(x,g(y,z))}$$ but the corresponding
right-hand sides disagree:
$$\begin{array}{c}\subst{g(g(x,y),z)}{u}{g(w,u)}= g(w,g(g(x,y),z))\\
\neq \\ g(g(w,x),g(y,z))=
\subst{w,x,g(y,z)}{x,y,z}{g(g(x,y),z)}\end{array}$$ Nonetheless,
$\onerewrite{{\boldscript{T}_{\theeqnex}}}$ is confluent. For
example, while
$$g(a,g(b,g(c,d)))\onerewrite{{\boldscript{T}_{\theeqnex}}}g(a,g(g(b,c),d))$$
and
$$g(a,g(b,g(c,d)))\onerewrite{{\boldscript{T}_{\theeqnex}}}g(g(a,b),g(c,d))$$
by different applications of the equation, the two results rewrite to
a common normal form by
$$g(a,g(g(b,c),d))\onerewrite{{\boldscript{T}_{\theeqnex}}}g(g(a,g(b,c)),d)
\onerewrite{{\boldscript{T}_{\theeqnex}}}g(g(g(a,b),c),d)$$ and
$$g(g(a,b),g(c,d))\onerewrite{{\boldscript{T}_{\theeqnex}}}
g(g(g(a,b),c),d)$$
\addtocounter{eqnex}{1}
Consider also the set $$\mathbold{T}_{\theeqnex}=\{f(g(x,y))\formeq
g(f(x),f(y)),\;g(i,z)\formeq z,\;f(i)\formeq i\}$$ expressing the
distribution of\/ $f$ over\/ $g$, and the fact that\/ $i$ is a left
identity for\/ $g$ and a fixed point for\/ $f$. The first and second
equations overlap, violating condition (4) of rewrite-orthogonality,
because
$$\subst{g(i,z)}{w}{f(w)}=f(g(i,z))=\subst{i,z}{x,y}{f(g(x,y))}$$ but
the corresponding right-hand sides disagree:
$$\subst{z}{w}{f(w)}=f(z)\neq
g(f(i),f(z))=\subst{i,z}{x,y}{g(f(x),f(y))}$$ Nonetheless,
$\onerewrite{{\boldscript{T}_{\theeqnex}}}$ is confluent. For
example, while
$$f(g(i,a))\onerewrite{{\boldscript{T}_{\theeqnex}}}g(f(i),f(a))$$ by
the first equation and
$$f(g(i,a))\onerewrite{{\boldscript{T}_{\theeqnex}}}f(a)$$ by the
second equation, the first result rewrites to the second, which is in
normal form, by
$$g(f(i),f(a))\onerewrite{{\boldscript{T}_{\theeqnex}}}g(i,f(a))
\onerewrite{{\boldscript{T}_{\theeqnex}}}f(a)$$ Notice that\/
\mbox{$\mathbold{T}_{\theeqnex}=\mathbold{T}_{\theoveqnex}\cup\{f(i)\formeq
i\}$}, and that confluence failed for\/ $\mathbold{T}_{\theoveqnex}$
(Example~\ref{exa:nonorthogonal}).
\subind{associative|)}\subind{distributive|)}
\end{example}
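The reconvergence in the first part of Example~\ref{exa:associative-distributive} can be checked mechanically. The following Python sketch (the tuple encoding of terms and the function names are mine, chosen only for illustration) applies the associative rule \mbox{$g(x,g(y,z))\formeq g(g(x,y),z)$} leftmost-outermost until no redex remains, and confirms that the two divergent rewrites of \mbox{$g(a,g(b,g(c,d)))$} reach the same normal form:

```python
def rewrite_assoc(t):
    """Apply g(x,g(y,z)) -> g(g(x,y),z) once, outermost-first; None at normal form."""
    if isinstance(t, tuple) and t[0] == 'g':
        _, l, r = t
        if isinstance(r, tuple) and r[0] == 'g':   # redex at the root
            return ('g', ('g', l, r[1]), r[2])
        for i, sub in ((1, l), (2, r)):            # otherwise descend
            new = rewrite_assoc(sub)
            if new is not None:
                return t[:i] + (new,) + t[i+1:]
    return None

def normalize(t):
    while (new := rewrite_assoc(t)) is not None:
        t = new
    return t

t1 = ('g', 'a', ('g', ('g', 'b', 'c'), 'd'))   # one rewrite of g(a,g(b,g(c,d)))
t2 = ('g', ('g', 'a', 'b'), ('g', 'c', 'd'))   # the other rewrite
assert normalize(t1) == normalize(t2) == ('g', ('g', ('g', 'a', 'b'), 'c'), 'd')
```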
Experience with equational logic programming suggests that most
naively written programs contain a small number of benign overlaps, which are
almost always similar to the examples above.
\subind{overlap!typical}
An efficient test for confluence in the presence of such overlaps
would be extremely valuable.
The only known approach to proving confluence in spite of overlaps is
based on the Knuth-Bendix procedure \cite{Knuth-Bendix}\auind{Knuth,
D. E.}\auind{Bendix, P.}. This procedure relies on the fact that local
confluence (Definition~\ref{def:local-confluence}) is often easier to
verify than confluence, and that local confluence plus termination
imply confluence.
\begin{proposition}[\cite{Newman}\auind{Newman, M. H. A.}]
\label{pro:local-termination-confluence}
If\/ $\rightarrow$ is locally confluent, and there is no infinite
sequence\/ \mbox{$s_0\rightarrow s_1\rightarrow\cdots$}, then\/
$\rightarrow$ is confluent.
The proof is a simple induction on the number of steps to normal form.
\end{proposition}
Unfortunately, a system with nonterminating rewriting sequences
may be locally confluent, but not confluent.
\begin{example}
\label{exa:local-not-confluent}\addtocounter{eqnex}{1}
$\mathbold{T}_1$ of Example~\ref{exa:nonorthogonal} is locally confluent,
but not confluent.
Consider also the set of equations $$\mathbold{T}_{\theeqnex}=\{a\formeq
b,\;b\formeq a,\;a\formeq c,\;b\formeq d\}$$
$\onerewrite{{\boldscript{T}_{\theeqnex}}}$ is locally confluent, but
not confluent. Notice how confluence fails due to the two-step
rewritings\/ \mbox{$a\onerewrite{{\boldscript{T}_{\theeqnex}}}b
\onerewrite{{\boldscript{T}_{\theeqnex}}}d$} and\/
\mbox{$b\onerewrite{{\boldscript{T}_{\theeqnex}}}a
\onerewrite{{\boldscript{T}_{\theeqnex}}}c$} (see
Figure~\ref{fig:local-not-confluent}).
\begin{figure}[hbtp]
\begin{center}
\setlength{\unitlength}{.1 cm}
\begin{picture}(40,10)(-20,-10)
\put(-7,0){ \makebox(0,0){$a$} }
\put(7,0){ \makebox(0,0){$b$} }
\put(-17,-10){ \makebox(0,0){$c$} }
\put(17,-10){ \makebox(0,0){$d$} }
\put(-9,-2){ \vector(-1,-1){6} }
\put(9,-2){ \vector(1,-1){6} }
\put(-3,0){ \vector(1,0){8} }
\put(3,0){ \vector(-1,0){8} }
\end{picture}
\end{center}
\caption{\label{fig:local-not-confluent}
$\mathbold{T}_{\theeqnex}$ is locally confluent, but not confluent.}
\end{figure}
\addtocounter{eqnex}{1}
Another example, without a rewriting loop, is the set of equations
$$\mathbold{T}_{\theeqnex}=\{f(x)\formeq g(h(x)),\;g(x)\formeq
f(h(x)),\;f(x)\formeq c,\;g(x)\formeq d\}$$
$\onerewrite{{\boldscript{T}_{\theeqnex}}}$ is locally confluent, but
not confluent. Again, confluence fails due to the two-step
rewritings\/
\mbox{$f(x)\onerewrite{{\boldscript{T}_{\theeqnex}}}g(h(x))
\onerewrite{{\boldscript{T}_{\theeqnex}}}d$} and\/
\mbox{$g(x)\onerewrite{{\boldscript{T}_{\theeqnex}}}f(h(x))
\onerewrite{{\boldscript{T}_{\theeqnex}}}c$} (see
Figure~\ref{fig:local-not-confluent-inf}).
\begin{figure}[hbtp]
\begin{center}
\setlength{\unitlength}{.1 cm}
\begin{picture}(100,80)(-50,-80)
\put(0,0){ \makebox(0,0){$f(a)$} }
\put(0,-20){ \makebox(0,0){$g(h(a))$} }
\put(0,-40){ \makebox(0,0){$f(h(h(a)))$} }
\put(0,-60){ \makebox(0,0){$g(h(h(h(a))))$} }
\put(-40,-80){ \makebox(0,0){$c$} }
\put(40,-80){ \makebox(0,0){$d$} }
\put(0,-4){ \vector(0,-1){12} }
\put(0,-24){ \vector(0,-1){12} }
\put(0,-44){ \vector(0,-1){12} }
\put(0,-64){ \vector(0,-1){6} }
\put(0,-74){ \makebox(0,0){$\vdots$} }
\put(8,-64){ \vector(2,-1){24} }
\put(-4,-44){ \vector(-1,-1){32} }
\put(3,-24){ \vector(2,-3){34} }
\put(-2,-4){ \vector(-1,-2){36} }
\put(-20,-74){ \makebox(0,0){$\vdots$} }
\put(20,-74){ \makebox(0,0){$\vdots$} }
\end{picture}
\end{center}
\caption{\label{fig:local-not-confluent-inf}
$\mathbold{T}_{\theeqnex}$ is locally confluent, but not confluent.}
\end{figure}
\end{example}
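For the first, finite system of Example~\ref{exa:local-not-confluent}, both properties can be verified by exhaustive search. The Python sketch below (the encoding of the relation as a dictionary is mine) checks that every one-step divergence rejoins, while the term\/ $a$ nevertheless reaches the two distinct normal forms\/ $c$ and\/ $d$:

```python
# The finite rewriting relation a->b, b->a, a->c, b->d of the example above.
step = {'a': {'b', 'c'}, 'b': {'a', 'd'}, 'c': set(), 'd': set()}

def reachable(x):
    """All terms reachable in zero or more steps (the relation ->>)."""
    seen, frontier = {x}, {x}
    while frontier:
        frontier = {z for y in frontier for z in step[y]} - seen
        seen |= frontier
    return seen

# Local confluence: whenever x -> y1 and x -> y2, some z is reachable from both.
locally_confluent = all(reachable(y1) & reachable(y2)
                        for x in step for y1 in step[x] for y2 in step[x])
assert locally_confluent

# Confluence fails: a reaches the two distinct normal forms c and d.
normal_forms_from_a = {y for y in reachable('a') if not step[y]}
assert normal_forms_from_a == {'c', 'd'}
```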
The Knuth-Bendix procedure examines overlaps one at a time to see
whether they destroy the local confluence property. Consider a pair of
equations\/ \mbox{$l_1\formeq r_1$} and
\mbox{$l_2\formeq r_2$} whose left-hand sides overlap---i.e., there is
a term\/ \mbox{$s\neq y$} with one occurrence of\/ $y$ such that
$$\subst{\subst{t_1,\ldots,t_m}{x_1,\ldots,x_m}{l_1}}{y}{s}=
\subst{t^\prime_1,\ldots,t^\prime_n}{x^\prime_1,\ldots,x^\prime_n}{l_2}$$
but\/ $s$ is not an instance of\/ $l_2$. For each\/ $s$, $l_1$ and
$l_2$, use the smallest\/ $t_1,\ldots,t_m$ and\/
$t^\prime_1,\ldots,t^\prime_n$ that satisfy this equation. The
results of rewriting the instance of\/ $s$ above in two different ways,
according to the overlapping instances of equations, are
\mbox{$c_1=\subst{\subst{t_1,\ldots,t_m}{x_1,\ldots,x_m}{r_1}}{y}{s}$}
and \mbox{$c_2=\subst{t^\prime_1,\ldots,t^\prime_n}
{x^\prime_1,\ldots,x^\prime_n}{r_2}$}. The pair \mbox{$\langle
c_1,c_2\rangle$} is called a {\em critical pair\/}.\subind{critical
pair} A finite set of equations generates a finite set of critical
pairs, since only finitely many terms $s$ can be compatible with some
$l_2$ without being an instance of $l_2$. The procedure checks all
critical pairs to see if they rewrite to a common normal form. If so,
the system is locally confluent.
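The critical-pair computation just described can be sketched in Python. The encoding below (tuples for applications, strings for variables, all names mine) omits the occurs check and handles only the self-overlap of a single rule, so the trivial overlap of the rule at its own root is skipped. Applied to the associative rule of Example~\ref{exa:associative-distributive}, it recovers the critical pair displayed there:

```python
def is_var(t):
    return isinstance(t, str)

def walk(t, s):                      # follow variable bindings in substitution s
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(t1, t2, s):                # extended substitution, or None on clash
    t1, t2 = walk(t1, s), walk(t2, s)
    if t1 == t2:
        return s
    if is_var(t1):
        return {**s, t1: t2}         # occurs check omitted in this sketch
    if is_var(t2):
        return {**s, t2: t1}
    if t1[0] != t2[0] or len(t1) != len(t2):
        return None
    for a, b in zip(t1[1:], t2[1:]):
        s = unify(a, b, s)
        if s is None:
            return None
    return s

def apply_s(t, s):
    t = walk(t, s)
    return t if is_var(t) else (t[0],) + tuple(apply_s(a, s) for a in t[1:])

def subterms(t, pos=()):             # non-variable subterms with their positions
    if is_var(t):
        return
    yield pos, t
    for i, a in enumerate(t[1:], 1):
        yield from subterms(a, pos + (i,))

def replace(t, pos, new):
    if not pos:
        return new
    i = pos[0]
    return t[:i] + (replace(t[i], pos[1:], new),) + t[i+1:]

def rename(t, mark):                 # rename variables apart
    return t + mark if is_var(t) else (t[0],) + tuple(rename(a, mark) for a in t[1:])

def self_critical_pairs(l, r):
    l2, r2 = rename(l, "'"), rename(r, "'")
    pairs = []
    for pos, sub in subterms(l):
        if pos == ():                # skip the trivial root overlap with itself
            continue
        s = unify(sub, l2, {})
        if s is not None:
            pairs.append((apply_s(r, s), apply_s(replace(l, pos, r2), s)))
    return pairs

assoc_l = ('g', 'x', ('g', 'y', 'z'))      # g(x, g(y, z))
assoc_r = ('g', ('g', 'x', 'y'), 'z')      # g(g(x, y), z)
[(c1, c2)] = self_critical_pairs(assoc_l, assoc_r)
assert c1 == ('g', ('g', 'x', "x'"), ('g', "y'", "z'"))   # g(g(x,x'), g(y',z'))
assert c2 == ('g', 'x', ('g', ('g', "x'", "y'"), "z'"))   # g(x, g(g(x',y'), z'))
```

Normalizing both members of this critical pair with the rule yields the same right-associated term, which is the local-confluence check of the Knuth-Bendix test for this overlap.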
\begin{proposition}[\cite{Huet}\auind{Huet, G.}]
\label{pro:knuth-bendix}
Let\/ $\mathbold{T}$ be a set of equations. If for every critical pair\/
\mbox{$\langle c_1,c_2\rangle$} of\/ $\mathbold{T}$ there is a term\/ $d$
such that\/ \mbox{$c_1\rewrites{\boldscript{T}}d$} and
\mbox{$c_2\rewrites{\boldscript{T}}d$}, then\/
$\onerewrite{\boldscript{T}}$ is locally confluent.
\end{proposition}
This proposition, and the Knuth-Bendix method, apply even to
non-left-linear sets of equations. For example, the local confluence
of $\mathbold{T}_1$ in Example~\ref{exa:nonorthogonal} may be proved by
inspecting all critical pairs.
When some critical pair cannot be rewritten to a common form, the
Knuth-Bendix procedure tries to add an equation to repair that failure
of local confluence. For equational logic programming, we would like
to use just the part of the procedure that checks local confluence,
and leave it to the programmer to decide how to repair a failure.
Although, in principle, the search for a common form for a critical
pair might go on forever, in practice a very shallow search suffices.
I have never observed a natural case in which more than two rewriting
steps were involved. Unfortunately, many useful equational programs
have nonterminating term rewriting sequences, so local confluence is
not enough. The design of a variant of the Knuth-Bendix procedure
that is practically useful for equational logic programming is an open
topic of research---some exploratory steps are described in
\cite{Chen-O'Donnell}\auind{Chen, Y.}\auind{O'Donnell, M. J.}. A
number of methods for proving termination are known\subind{termination
of rewriting}
\cite{Dershowitz-termination,Guttag-Kapur-Musser},
\auind{Dershowitz, N.}\auind{Guttag, J. V.}\auind{Kapur, D.}
\auind{Musser, D.} which might be applied to portions of an
equational program even if the whole program is not terminating, but
we have no experience with the practical applicability of these
methods. If the rewriting of the terms $c_1$ and $c_2$ in a critical
pair to a common form $d$ (see Proposition~\ref{pro:knuth-bendix})
takes no more than one rewriting step (this is {\em one-step
confluence,} Definition~\ref{def:one-step-confluence}), then we get
confluence and not just local confluence. Rewrite-orthogonal systems
are those whose critical pairs are all trivial---the members of the
pair are equal, and so the reduction to a common form takes zero
steps. Unfortunately, all of the important examples so far of
confluent but not rewrite-orthogonal equational programs have the
basic structure of associativity or distributivity (see
Example~\ref{exa:associative-distributive}) and require two rewriting
steps to resolve their critical pairs.
The sets of equations in Example~\ref{exa:associative-distributive}
pass the Knuth-Bendix test for local confluence, and a number of
well-known techniques can be used to prove that there is no infinite
term rewriting sequence in these systems. But, we need to recognize
many variations on these example systems, when they are embedded in
much larger sets of equations which generate some infinite term
rewriting sequences, and no completely automated method has yet shown
practical success at that problem (although there are special
treatments of commutativity and\subind{commutative-associative rewriting}
associativity \cite{Baird-Peterson-Wilkerson,Dershowitz-Hsiang-Josephson-Plaisted}).
\auind{Baird, T.}\auind{Peterson, G.}\auind{Wilkerson, R.}
\auind{Dershowitz, N.}\auind{Hsiang, J.}\auind{Josephson, N.}
\auind{Plaisted, D. A.}
On the other hand, in practice naturally constructed systems of
equations that are locally confluent are almost always confluent.
Surely someone will find a useful and efficient formal criterion to
distinguish the natural constructions from the pathological ones of
Example~\ref{exa:local-not-confluent}.
\subind{Knuth-Bendix method|)}\subind{completion (Knuth-Bendix)|)}
\section[Proof Strategies]{Term Rewriting Proof Strategies}
\label{sec:proof-strategies}
\subind{proof strategy!equational|(}\subind{equation!proof strategy|(}
\subind{term rewriting!strategy|(}
Given an orthogonal set of equations \mbox{$\mathbold{T}=\{l_1\formeq
r_1,\ldots,l_m\formeq r_m\}$}, or any set with confluent term
rewriting relation $\onerewrite{\boldscript{T}}$, we may now answer
all questions of the form \mbox{$(\normalize l_1,\ldots,l_m\quantsep
t)$} by exploring term rewriting sequences starting with $t$.
Confluence guarantees that if there is an answer, some term rewriting
sequence will find it (Proposition~\ref{pro:confluent-normalization}).
Furthermore, confluence guarantees that no finite number of term
rewriting steps can be catastrophic, in the sense that if
\mbox{$s\rewrites{\boldscript{T}}t$} and if $s$ rewrites to a
normal form, then $t$ rewrites to the same normal form. Confluence,
however, does not guarantee that no infinite term rewriting sequence
can be catastrophic.
\begin{example}
\label{exa:nonterminating-confluent}\addtocounter{eqnex}{1}
Consider the set of equations
$$\mathbold{T}_{\theeqnex}=\{\car(\cons(x,y))\formeq x,\;a\formeq
f(a)\}$$ The first equation is the usual definition of the \/ $\car$
(left projection) function of Lisp, the second is a silly example of
an equation leading to infinite term rewriting.
$\mathbold{T}_{\theeqnex}$ is orthogonal, so
$\onerewrite{{\boldscript{T}_{\theeqnex}}}$ is confluent. But\/
\mbox{$\car(\cons(b,a))$} rewrites to the normal form\/ $b$, and also
in the infinite rewriting sequence\/
\mbox{$\car(\cons(b,a))\onerewrite{{\boldscript{T}_{\theeqnex}}}
\car(\cons(b,f(a)))\onerewrite{{\boldscript{T}_{\theeqnex}}}\cdots$}.
\end{example}
Notice that, due to confluence, no matter how far we go down the
infinite term rewriting sequence
\mbox{$\car(\cons(b,a))\onerewrite{{\boldscript{T}_{\theeqnex}}}
\car(\cons(b,f(a)))\onerewrite{{\boldscript{T}_{\theeqnex}}}\cdots$}, one
application of the first equation leads to the normal form $b$.
Nonetheless, a naive strategy might fail to find that normal form by
making an infinite number of unfortunate rewrites. In fact, the usual
recursive evaluation techniques used in Lisp and other term-evaluating
languages correspond to term rewriting strategies that choose infinite
sequences whenever possible. A breadth-first search of all possible
rewriting sequences is guaranteed to find all normal forms, but at the
cost of a lot of unnecessary work.
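The contrast between the recursive-evaluation (innermost) behaviour and an outermost choice on \mbox{$\car(\cons(b,a))$} from Example~\ref{exa:nonterminating-confluent} can be sketched in Python (the term encoding is mine; the innermost run, which never terminates, is cut off after ten steps to show that the term only grows):

```python
def rewrite(t, outermost):
    """One step of car(cons(x,y)) -> x or a -> f(a); None if t is in normal form."""
    def at_root(t):
        if t == ('a',):
            return ('f', ('a',))
        if t[0] == 'car' and t[1][0] == 'cons':
            return t[1][1]
        return None
    if outermost:                              # try the root first ...
        r = at_root(t)
        if r is not None:
            return r
    for i, sub in enumerate(t[1:], 1):
        r = rewrite(sub, outermost)
        if r is not None:
            return t[:i] + (r,) + t[i+1:]
    return None if outermost else at_root(t)   # ... or last (innermost)

term = ('car', ('cons', ('b',), ('a',)))
assert rewrite(term, outermost=True) == ('b',)   # outermost: normal form at once

t = term
for _ in range(10):                              # innermost only feeds a -> f(a)
    t = rewrite(t, outermost=False)
expected = ('a',)
for _ in range(10):
    expected = ('f', expected)
assert t == ('car', ('cons', ('b',), expected))  # car(cons(b, f(f(...f(a)...))))
```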
\subind{confluence|)}\subind{Church-Rosser property|)}
For efficient implementation of equational logic programming, we need
strategies for choosing term rewriting steps so that
\begin{itemize}
\item a small number of term rewriting sequences is explored,
preferably only one;
\item if there is a normal form, it is found, preferably by the
shortest or cheapest sequence possible.
\end{itemize}
Sequencing in the lambda calculus has already received some
theoretical study under the title of {\em one-step
strategies\/} \cite{Barendregt}\auind{Barendregt, H. P.}.
\subsection{Complete and Outermost Complete Rewriting Sequences}
\label{sec:complete-outermost}
In orthogonal systems of equations, there are two useful results on
strategies that are guaranteed to find normal forms. The formal
notation for stating these results precisely is somewhat involved (see
the chapter `Equational Reasoning and Term Rewriting Systems' in
Volume~1), so I only give rough definitions here. The concepts in this
section can be extended to nonorthogonal systems, but in some cases
there are very subtle problems in the extensions, and they have never
been treated in the literature.
\begin{definition}[\cite{Huet-Levy}\auind{Huet, G.}\auind{L\'evy, J.-J.}]
\label{def:redex-residual}
\subind{residual}\subind{redex}
A\/ {\em redex\/} is an occurrence of an instance of a left-hand side
of an equation in a term. An\/ {\em outermost\/} redex is one that is
not nested inside any other redex. When\/ $\alpha$ is a redex in\/
$s$, and \mbox{$s\rewrites{\boldscript{T}}t$}, the\/ {\em
residuals\/} of\/ $\alpha$ in\/ $t$ are the redexes in\/ $t$ that
correspond in the obvious way to\/ $\alpha$ in\/ $s$---they are
essentially explicit copies of\/ $\alpha$, except that some rewriting
step may have rewritten a subterm of\/ $\alpha$, so that some copies may
be modified. All residuals of\/ $\alpha$ are occurrences of instances
of the same left-hand side as\/ $\alpha$.
\end{definition}
\begin{example}
\label{exa:redex-residual}\addtocounter{eqnex}{1}
Consider the rule-orthogonal set of equations
$$\mathbold{T}_{\theeqnex}=\{f(x)\formeq g(x,x),\;g(h(x),y)\formeq
h(x),\;a\formeq b\}$$ The term\/
\mbox{$g(f(f(h(a))),f(h(a)))$} has five redexes: two
occurrences each of the terms\/ $a$ and\/ \mbox{$f(h(a))$}, and one
occurrence of\/ \mbox{$f(f(h(a)))$}. The latter two are both
instances of the left-hand side\/ \mbox{$f(x)$} of the first equation.
The leftmost occurrence of\/ \mbox{$f(h(a))$} is nested inside\/
\mbox{$f(f(h(a)))$}, so it is not outermost. Each occurrence of\/ $a$
is nested inside an occurrence of\/ \mbox{$f(h(a))$}, so neither is
outermost. The rightmost occurrence of\/ \mbox{$f(h(a))$}, and the
sole occurrence of\/ \mbox{$f(f(h(a)))$}, are both outermost redexes.
In the rewriting sequence below, the leftmost occurrence of\/
\mbox{$f(h(a))$}, and its residuals in each succeeding term, are boxed.
\begin{eqnarray*}
g(f(\,\fbox{$f(h(a))$}\,),f(h(a)))
& \onerewrite{{\boldscript{T}_{\theeqnex}}}
& g(g(\,\fbox{$f(h(a))$}\,,\,\fbox{$f(h(a))$}\,),f(h(a))) \\
& \onerewrite{{\boldscript{T}_{\theeqnex}}}
& g(g(\,\fbox{$f(h(b))$}\,,\,\fbox{$f(h(a))$}\,),f(h(a))) \\
& \onerewrite{{\boldscript{T}_{\theeqnex}}}
& g(g(g(h(b),h(b)),\,\fbox{$f(h(a))$}\,),f(h(a))) \\
& \onerewrite{{\boldscript{T}_{\theeqnex}}}
& g(g(h(b),\,\fbox{$f(h(a))$}\,),f(h(a))) \\
& \onerewrite{{\boldscript{T}_{\theeqnex}}} & g(h(b),f(h(a))) \\
& \onerewrite{{\boldscript{T}_{\theeqnex}}} & h(b)
\end{eqnarray*}
Notice how the leftmost occurrence of\/ \mbox{$f(h(a))$} in the first
term of the sequence is copied into two occurrences in the second, due
to the rewriting of a redex in which it is nested.
The first of these is changed to\/ \mbox{$f(h(b))$} in the third term
of the sequence, but it is still a residual of the original leftmost\/
\mbox{$f(h(a))$}. In the fourth term of the sequence,
\mbox{$f(h(b))$} is rewritten, eliminating one of the residuals. In the
sixth term, the remaining residual, still in the form\/
\mbox{$f(h(a))$}, is eliminated due to rewriting of a redex in which it
is nested. Another occurrence of\/ \mbox{$f(h(a))$} remains, but it
is a residual of the rightmost occurrence of that term in the first
term of the sequence.
\end{example}
In general, a residual\/ $\alpha$ of a redex is eliminated when\/
$\alpha$ is rewritten (or, in rewrite-orthogonal systems, when a redex
overlapping $\alpha$ is rewritten). $\alpha$ is copied zero, one, or
more times (zero times eliminates the residual) when another redex in
which\/ $\alpha$ is nested is rewritten. $\alpha$ remains the same when
another redex disjoint from\/ $\alpha$ is rewritten. Finally, $\alpha$
is modified in form, but remains an instance of the same left-hand
side, when another redex nested inside\/ $\alpha$ is rewritten.
\subind{complete rewriting sequence|(}\subind{fair rewriting sequence|(}
\subind{rewriting!fair|(}
\subind{term rewriting!complete|(}
\subind{term rewriting!fair|(}
In orthogonal systems, the nontrivial, rule-like, and nonambiguous
qualities of equations (restrictions 1, 2, 4 or $4^\prime$ of
Definition~\ref{def:rewrite-orthogonal} or~\ref{def:rule-orthogonal})
guarantee that a given redex may be rewritten in precisely one way. So,
a term rewriting strategy need only choose a redex to rewrite at each
step. The most obvious way to ensure that all normal forms are found
is to rewrite every redex fairly.
\begin{definition}[\cite{O'Donnell-Springer}\auind{O'Donnell, M. J.}]
\label{def:complete-fair-sequence}
A finite or infinite term rewriting sequence\/ \mbox{$t_0\rightarrow
t_1\rightarrow\cdots$} is\/ {\em complete\/} (also called\/ {\em
fair\/}) if and only if, for every\/ $i$ and every redex\/ $\alpha$
in\/ $t_i$, there exists a\/ \mbox{$j>i$} such that\/ $t_j$ contains
no residual of\/ $\alpha$.
\end{definition}
A complete term rewriting sequence is fair to all redexes, in the
sense that every redex $\alpha$ (or its residuals, which are
essentially the later versions of the redex) eventually gets
eliminated, either by rewriting $\alpha$ (with rewrite-orthogonality,
a redex overlapping $\alpha$), or by making zero copies of $\alpha$
while rewriting another redex in which $\alpha$ is nested. Complete
term rewriting sequences are maximal, in the sense that every term
produced by any other rewriting sequence rewrites to some term of the
complete sequence. Since nothing rewrites beyond a normal form,
complete sequences produce a normal form whenever there is one.
\begin{proposition}[\cite{O'Donnell-Springer}\auind{O'Donnell, M. J.}]
\label{pro:complete-sequence-maximal}
Let\/ $\mathbold{T}$ be an orthogonal set of equations, let\/
\mbox{$t_0\onerewrite{\boldscript{T}}t_1
\onerewrite{\boldscript{T}}\cdots$} be a complete rewriting
sequence, and let\/ $s$ be any term such that\/
\mbox{$t_0\rewrites{\boldscript{T}}s$}. There exists an\/ $i$ such
that\/ \mbox{$s\rewrites{\boldscript{T}}t_i$}. In particular, if\/
$s$ is in normal form, then\/ \mbox{$t_i=s$}.
\end{proposition}
\subind{complete rewriting sequence|)}\subind{fair rewriting sequence|)}
\subind{rewriting!fair|)}
\subind{term rewriting!complete|)}
\subind{term rewriting!fair|)}
\subind{outermost rewriting sequence|(}
\subind{rewriting!outermost|(}
\subind{term rewriting!outermost|(}
Computing a single complete term rewriting sequence is generally
cheaper than searching breadth-first among a number of sequences, but
fair rewriting strategies (such as the strategy of adding new redexes
to a queue, and rewriting all residuals of the head redex in the queue)
typically perform a substantial number of superfluous rewriting steps,
and can easily waste an exponentially growing amount of work in some
cases. Since a residual $\alpha$ of a redex may only be eliminated by
rewriting $\alpha$, or some redex inside which $\alpha$ is nested, we
need only be fair to the outermost redexes in order to be sure of
finding normal forms.
\begin{definition}[\cite{O'Donnell-Springer}\auind{O'Donnell, M. J.}]
\label{def:outermost-sequence}
A finite or infinite term rewriting sequence\/ \mbox{$t_0\rightarrow
t_1\rightarrow\cdots$} is\/ {\em outermost complete\/} or\/ {\em
outermost fair\/} (called\/ {\em eventually outermost\/}
in \cite{O'Donnell-Springer}\auind{O'Donnell, M. J.}) if and only if,
for every\/ $i$ and every outermost redex\/ $\alpha$ in\/ $t_i$, there exists
a\/ \mbox{$j>i$} such that the unique residual of\/ $\alpha$ in\/
$t_{j-1}$ is either eliminated by rewriting in\/ $t_j$, or is no
longer outermost in\/ $t_j$ (equivalently, no residual of\/ $\alpha$
is outermost in\/ $t_j$).
\end{definition}
Since, for the least $j$ satisfying the definition above, $\alpha$
remains outermost from $t_i$ through $t_{j-1}$ and cannot be copied,
there is no loss of generality in requiring the residual of $\alpha$
in $t_{j-1}$ to be unique.
\begin{proposition}[\cite{O'Donnell-Springer}\auind{O'Donnell, M. J.}]
\label{pro:outermost-termination}
Let\/ $\mathbold{T}$ be a rule-orthogonal set of equations, let\/
\mbox{$t_0\onerewrite{\boldscript{T}}t_1\onerewrite{\boldscript{T}}\cdots$}
be an outermost complete rewriting sequence, and let\/ $s$ be a
(unique) normal form for $t_0$. There exists an\/ $i$ such that\/
\mbox{$t_i=s$}.
\cite{O'Donnell-Springer} proves this proposition for a form of
orthogonality intermediate between rule- and rewrite-orthogonality.
I conjecture that the proof generalizes to rewrite-orthogonality.
\end{proposition}
The requirement that $\mathbold{T}$ be orthogonal, and not just
confluent, is essential to Proposition~\ref{pro:outermost-termination}.
\begin{example}
\label{exa:confluent-not-outermost}\addtocounter{eqnex}{1}
Consider the set of equations
$$\mathbold{T}_{\theeqnex}=\{f(g(x,b))\formeq b,\;g(x,y)\formeq
g(f(x),y),\;a\formeq b\}$$ These equations are confluent, but not
rewrite-orthogonal, since the left-hand sides of the first and second
equations overlap in\/ \mbox{$f(g(x,b))$}, but the corresponding
right-hand sides yield $b\neq f(g(f(x),b))$. The natural outermost
complete rewriting sequence starting with \mbox{$f(g(b,a))$} is the
infinite one
$$f(g(b,a))\onerewrite{{\boldscript{T}_{\theeqnex}}}f(g(f(b),a))
\onerewrite{{\boldscript{T}_{\theeqnex}}}f(g(f(f(b)),a))
\onerewrite{{\boldscript{T}_{\theeqnex}}}\cdots$$ But
\mbox{$f(g(b,a))$} rewrites to normal form by
$$f(g(b,a))\onerewrite{{\boldscript{T}_{\theeqnex}}}f(g(b,b))
\onerewrite{{\boldscript{T}_{\theeqnex}}}b$$
The problem is that rewriting the nonoutermost redex $a$ to $b$ creates
a new outermost redex for the first equation {\em above\/} the
previously outermost one for the second equation. This leapfrogging
effect allows an inner redex to kill an outer one indirectly, by
creating another redex even closer to the root. There should be some
interesting conditions, weaker than rewrite-orthogonality, that
prohibit this leapfrogging effect and guarantee outermost termination
for confluent systems.
\end{example}
The obvious way to generate outermost-complete rewriting sequences is
to alternate between finding all outermost redexes, and rewriting them
all. The order in which the outermost redexes are rewritten is
irrelevant since they are all disjoint and cannot cause copying or
modification of one another. Unfortunately, this strategy often
generates a lot of wasted work. For example, consider a system
containing the equations $\mathbold{T}_{\cond}$ for the conditional
function from Example~\ref{exa:cond-vs-cons}
$$\mathbold{T}_{\cond}=\{\cond(\true,x,y)\formeq
x,\;\cond(\false,x,y)\formeq y\}$$ In a term of the form
\mbox{$\cond(r,s,t)$}, there will usually be outermost redexes in all
three of $r$, $s$, and $t$. But, once $r$ rewrites to either $\true$ or
$\false$, one of $s$ and $t$ will be thrown away, and any rewriting in
the discarded subterm will be wasted. The ad hoc optimization of
noticing when rewriting of one outermost redex immediately causes
another to be nonoutermost sounds tempting, but it probably introduces
more overhead in detecting such cases than it saves in avoiding
unnecessary steps. Notice that it will help the $\cond$ example only
when $r$ rewrites to $\true$ or $\false$ in one step. So, we need
some further analysis to choose which among several outermost redexes
to rewrite.
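For $\cond$, the obvious demand-driven choice is to work on the first argument before touching either branch, since the first argument must be rewritten in any case. The Python sketch below (a strategy written for illustration, not taken from any implementation discussed here) never visits the discarded branch:

```python
RULES = {'true': 1, 'false': 2}   # cond(true,x,y) -> x ; cond(false,x,y) -> y

def normalize(t):
    if not isinstance(t, tuple):
        return t                                 # constants are normal forms
    if t[0] == 'cond':
        c = normalize(t[1])                      # the condition is always needed
        if c in RULES:
            return normalize(t[RULES[c] + 1])    # keep only the chosen branch
        return ('cond', c, normalize(t[2]), normalize(t[3]))
    return (t[0],) + tuple(normalize(a) for a in t[1:])

# The dead branch is never examined, however large it is:
term = ('cond', ('cond', 'true', 'false', 'true'), 'dead_branch', 'live')
assert normalize(term) == 'live'
```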
\subind{outermost rewriting sequence|)}
\subind{rewriting!outermost|)}
\subind{term rewriting!outermost|)}
\subsection{Sequentiality Analysis and Optimal Rewriting}
\label{sec:sequentiality}
\subind{sequential term rewriting|(}\subind{sequentiality analysis|(}
For rewrite-orthogonal systems of equations in general, it is
impossible to choose reliably a redex that {\em must\/} be rewritten
in order to reach normal form, so that there is no risk of wasted
work.
\begin{example}
\label{exa:parallel-or-problem}
\subind{parallel or!positive|(}\subind{positive parallel or|(}
\notind{$\mathbold{T}_{\sor+}$ (positive parallel or)}
Let\/ $\mathbold{T}_c$ be equations defining some general-purpose
programming system, such as Lisp. The forms of the particular
equations in\/ $\mathbold{T}_c$ are not important to this example,
merely the fact that they are powerful enough for general-purpose
programming. Assume that in the system\/ $\mathbold{T}_c$ there is an
effective way to choose a redex that must be rewritten to reach normal
form (this is the case for typical definitions of Lisp). Now, add
the\/ {\em positive parallel-or\/} equations
$$\mathbold{T}_{\sor+}=\{\sor(\true,x)\formeq
\true,\;\sor(x,\true)\formeq\true\}$$
and consider the system\/ $\mathbold{T}_c\cup\mathbold{T}_{\sor+}$.
For an arbitrary given term \mbox{$\sor(s,t)$}, we would like to choose
either $s$ or $t$ to rewrite first. If $s$ rewrites to $\true$,
but $t$ does not, then it is crucial to choose $s$, else work (possibly
infinitely much work) will be wasted. Similarly, if $t$ rewrites to
$\true$, but $s$ does not, it is crucial to choose $t$. If {\em
neither\/} $s$ nor $t$ rewrites to $\true$, then both must be rewritten
to normal form in order to normalize the whole term, so we may choose
either. If {\em both\/} $s$ and $t$ rewrite to $\true$ then, ideally,
we would like to choose the one that is cheapest to rewrite, but
suppose that we are satisfied with either choice in this case also.
Suppose that we have an effective way to choose $s$ or $t$ above.
Then, we have a {\em recursive separation\/} of the terms
\mbox{$\sor(s,t)$} in which\/ $s$ rewrites to\/ $\true$ and\/ $t$ has
no normal form from those in which\/ $t$ rewrites to\/ $\true$ and\/
$s$ has no normal form. Such a separation is known to be impossible.
(It would lead easily to a computable solution of the halting
problem. See \cite{Machtey-Young}\auind{Machtey, M.}\auind{Young, P.}
for a discussion of {\em recursive inseparability\/}\subind{recursively
inseparable}.) So, we cannot decide effectively whether to rewrite
redexes in\/ $s$ or in\/ $t$ without risking wasted work.
The case where\/ {\em both\/} $s$ and $t$ rewrite to\/ $\true$ poses
special conceptual problems for sequentiality theory. Although it is
necessary to rewrite\/ {\em one\/} of\/ $s$ or\/ $t$ in order to reach
the normal form\/ $\true$, it is neither necessary to rewrite\/ $s$,
nor necessary to rewrite\/ $t$. The criterion of choosing a redex
that\/ {\em must\/} be rewritten fails to even {\em define\/} a next
rewriting step mathematically, and the question of computability does
not even arise. Notice that this latter case is problematic for\/
$\mathbold{T}_{\sor+}$ alone, without the addition of $\mathbold{T}_c$.
\end{example}
The difficulty in Example~\ref{exa:parallel-or-problem} above appears
to depend on the unifiability of the left-hand sides of the two
equations in $\mathbold{T}_{\sor+}$, which is allowed in rewrite-orthogonal
systems, but not rule-orthogonal systems.
\subind{parallel or!positive|)}\subind{positive parallel or|)}
A more subtle example, due to Huet and L\'evy, shows that
rule-orthogonality is still not sufficient for effective sequencing.
\begin{example}[\cite{Huet-Levy}\auind{Huet, G.}\auind{L\'evy, J.-J.}]
\label{exa:tricky-nonsequential}\addtocounter{eqnex}{1}
\setcounter{trickyeqnex}{\value{eqnex}}
Replace the parallel-or equations of
Example~\ref{exa:parallel-or-problem} by the following:
$$\mathbold{T}_{\theeqnex}=\{f(x,a,b)\formeq c,\;f(b,x,a)\formeq
c,\;f(a,b,x)\formeq c\}$$ and consider the system
$\mathbold{T}_c\cup\mathbold{T}_{\theeqnex}$. Given a term of the
form\/ \mbox{$f(r,s,t)$}, we cannot decide whether to rewrite redexes
in\/ $r$, in\/ $s$, or in\/ $t$ without risking wasted work, because
we cannot separate computably the three cases
\begin{itemize}
\item $r\rewrites{}a$ and\/ $s\rewrites{}b$
\item $r\rewrites{}b$ and\/ $t\rewrites{}a$
\item $s\rewrites{}a$ and\/ $t\rewrites{}b$
\end{itemize}
Unlike the parallel-or example, it is impossible for more than one of
these three cases to hold. There is always a mathematically
well-defined redex that\/ {\em must\/} be rewritten in order to reach
a normal form, and the problem is entirely one of choosing such a
redex effectively. In fact, for sets\/ $\mathbold{T}$ of equations such
that $\mathbold{T}\cup\mathbold{T}_{\theeqnex}$ is\/ {\em terminating\/} (every
term has a normal form), the choice of whether to rewrite\/ $r$, $s$,
or\/ $t$ in\/ $f(r,s,t)$ is effective, but usually unacceptably
inefficient.
\end{example}
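The combinatorial obstacle in Example~\ref{exa:tricky-nonsequential} can be isolated: each rule constrains a different pair of argument positions of\/ $f$, the three pairs intersect pairwise, yet no single position is constrained by all three rules, so no fixed argument can safely be examined first. A small Python check (the encoding of the constraints is mine):

```python
rules = [
    {2: 'a', 3: 'b'},   # f(x, a, b) -> c constrains positions 2 and 3
    {1: 'b', 3: 'a'},   # f(b, x, a) -> c constrains positions 1 and 3
    {1: 'a', 2: 'b'},   # f(a, b, x) -> c constrains positions 1 and 2
]
constrained = [set(r) for r in rules]

# No argument position is constrained by every rule ...
assert set.intersection(*constrained) == set()
# ... yet any two rules share a constrained position, so any fixed choice of
# position to evaluate first is wrong for some rule that may apply:
assert all(constrained[i] & constrained[j]
           for i in range(3) for j in range(3) if i != j)
```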
So, some further analysis of the form of equations beyond checking for
orthogonality is required in order to choose a good redex to rewrite
next in a term rewriting sequence. Analysis of equations in order to
determine a good choice of redex is called {\em sequentiality
analysis.}
\subsubsection{Needed Redexes and Weak Sequentiality}
\label{sec:needed-redex}
\subind{needed redex|(}\subind{redex!needed|(}
The essential ideas for sequentiality analysis in term rewriting are
due to Huet and L\'evy \cite{Huet-Levy}\auind{Huet, G.}\auind{L\'evy,
J.-J.}, based on a notion of sequential predicate\subind{sequential
predicate} by Kahn
\cite{Kahn-Plotkin}\auind{Kahn, G.}\auind{Plotkin, G.}. A redex that
may be rewritten without risk of wasted work is called a {\em needed
redex.}
\begin{definition}[\cite{Huet-Levy}\auind{Huet, G.}\auind{L\'evy, J.-J.}]
\label{def:needed-redex}
Given an orthogonal set\/ $\mathbold{T}$ of equations and a term\/ $t_0$, a
redex\/ $\alpha$ in $t_0$ is a\/ {\em needed redex\/} if and only if,
for every term rewriting sequence\/
\mbox{$t_0\onerewrite{\boldscript{T}}t_1
\onerewrite{\boldscript{T}}\cdots\onerewrite{\boldscript{T}}t_m$},
either
\begin{itemize}
\item there exists an\/ $i$, \mbox{$1\leq i\leq m$} such that
a residual of\/ $\alpha$ is rewritten in the step\/
\mbox{$t_{i-1}\onerewrite{\boldscript{T}}t_i$}, or
\item $\alpha$ has at least one residual in\/ $t_m$.
\end{itemize}
\end{definition}
A needed redex is a redex whose residuals can never be completely
eliminated by rewriting other redexes. So, the rewriting of a needed
redex is not wasted work, since at least one of its residuals has to
be rewritten in order to reach normal form. Huet and L\'evy defined
needed redexes only for terms with normal forms, but the
generalization above is trivial. A system is {\em weakly
sequential\/} if there is always a needed redex to rewrite.
\begin{definition}
\label{def:weakly-sequential}
\subind{sequential!weakly}
A rewrite-orthogonal set of equations is\/ {\em weakly sequential\/}
if and only if every term that is not in normal form contains at least
one needed redex. A set of equations is\/ {\em effectively weakly
sequential\/} if and only if there is an effective procedure that
finds a needed redex in each term not in normal form.
\end{definition}
The word `sequential' above is conventional, but may be misleading to
those interested in parallel computation. A weakly sequential system is
not {\em required\/} to be computed sequentially---typically there is
great opportunity for parallel evaluation. Rather, a weakly sequential
system {\em allows\/} sequential computation without risk of wasted
rewriting work. In this respect `sequentializable' would be a more
enlightening word than `sequential.'
The parallel-or system $\mathbold{T}_{\sor+}$ of
Example~\ref{exa:parallel-or-problem} is rewrite-orthogonal, but not
weakly sequential, because the term
$$\sor(\sor(\true,a),\sor(\true,a))$$ has two redexes, neither of
which is needed, since either can be eliminated by rewriting the
other, then rewriting the whole term to $\true$. Rule-orthogonality
guarantees weak sequentiality.
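The failure of neededness in the parallel-or term can be checked concretely. The following Python sketch is an informal illustration only, assuming the positive parallel-or rules \mbox{$\sor(\true,x)\formeq\true$} and \mbox{$\sor(x,\true)\formeq\true$}; it represents terms as nested tuples and plays out both rewriting sequences, each of which reaches $\true$ without ever rewriting the other inner redex:

```python
# Terms as nested tuples; 'true' and 'a' are constants.
# Assumed rules: or(true, x) = true and or(x, true) = true.

def rewrite_root(t):
    """Apply a parallel-or rule at the root, if one matches."""
    if isinstance(t, tuple) and t[0] == 'or' and 'true' in (t[1], t[2]):
        return 'true'
    return t

t = ('or', ('or', 'true', 'a'), ('or', 'true', 'a'))  # two inner redexes

# Rewrite the left inner redex, then the root: the right redex vanishes.
left_first = rewrite_root(('or', rewrite_root(t[1]), t[2]))

# Rewrite the right inner redex, then the root: the left redex vanishes.
right_first = rewrite_root(('or', t[1], rewrite_root(t[2])))

# Each sequence reaches true without touching the other inner redex,
# so neither inner redex is needed.
assert left_first == 'true' and right_first == 'true'
```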
\begin{proposition}[\cite{Huet-Levy}\auind{Huet, G.}\auind{L\'evy, J.-J.}]
\label{pro:rule-orthogonal-weakly-sequential}
A nontrivial, rule-like, and left-linear set of equations
(Definition~\ref{def:rewrite-orthogonal}) is weakly sequential if and
only if it is rule-orthogonal.
The proof of $(\limplied)$ is in \cite{Huet-Levy}\auind{Huet,
G.}\auind{L\'evy, J.-J.}. It involves a search through all
rewriting sequences (including infinite ones), and does not yield an
effective procedure. $(\limplies)$ is straightforward, since when two
redexes overlap neither is needed.
\end{proposition}
Proposition~\ref{pro:rule-orthogonal-weakly-sequential} above shows
that no analysis based on weak sequentiality can completely
sequentialize systems whose confluence derives from
rewrite-orthogonality, or from a Knuth-Bendix analysis.
Section~\ref{sec:extended-sequencing} discusses possible extensions of
sequentiality beyond rule-orthogonal systems.
The system $\mathbold{T}_c\cup\mathbold{T}_{\thetrickyeqnex}$ of
Example~\ref{exa:tricky-nonsequential} is rule-orthogonal, and
therefore weakly sequential. For example, in a term of the form
$f(r,s,t)$, where $r\rewrites{}a$ and $s\rewrites{}b$, both $r$ and $s$
contain needed redexes. The subsystem $\mathbold{T}_{\thetrickyeqnex}$, without
the general-purpose programming system $\mathbold{T}_c$, is {\em
effectively\/} weakly sequential, but only because it is terminating.
I conjecture that effective weak sequentiality is undecidable for
rule-orthogonal systems.
\subind{needed redex|)}\subind{redex!needed|)}
\subsubsection[Strongly Needed Redexes]
{Strongly Needed Redexes and Strong Sequentiality}
\label{sec:strongly-needed-redex}
\subind{strongly need redex|(}\subind{redex!strongly needed|(}
The uncomputability of needed redexes and the weak sequential property
are addressed analogously to the uncomputability of confluence: by
finding efficiently computable sufficient conditions for a redex to be
needed, and for a system to be effectively weakly sequential. A
natural approach is to ignore right-hand sides of equations, and
detect those cases of needed redexes and effectively weakly sequential
systems that are guaranteed by the structure of the left-hand sides.
To this end we define $\omega$-rewriting, in which a redex is replaced
by an arbitrary term.
\begin{definition}[\cite{Huet-Levy}]
\label{def:omega-rewriting}
\subind{omega-rewrite}\subind{rewriting!omega-}
\notind{$\onerewritem{\boldscript{T}}{\omega}$
(one step $\omega$-rewrite relation)}
Let\/ \mbox{$\mathbold{T}=\{l_1\formeq r_1,\ldots,l_n\formeq
r_n\}$} be a rule-orthogonal set of equations.
A term\/ $s_1$ {\em $\omega$-rewrites to\/} $s_2$ {\em by\/} $\mathbold{T}$ (written
$s_1\onerewritem{\boldscript{T}}{\omega}s_2$) if and only if there is a
term\/ $t$, a variable\/ $x$ with exactly one occurrence in\/ $t$, an
instance\/ $l_i^\prime$ of a left-hand side\/
$l_i$ in\/ $\mathbold{T}$, and a term\/ $r$ such that\/
\mbox{$s_1=\subst{l_i'}{x}{t}$} and\/
\mbox{$s_2=\subst{r}{x}{t}$}. That is, $s_2$ results from
finding exactly one instance of a left-hand side of an equation in\/
$\mathbold{T}$ occurring as a subterm of $s_1$, and replacing it with an
arbitrary term. The definition of residual
(Definition~\ref{def:redex-residual}) generalizes naturally to
$\omega$-rewriting.
\end{definition}
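A single $\omega$-rewriting step can be sketched operationally. The Python fragment below is a hedged illustration, restricted to linear patterns (variables written as strings beginning with `?'); it finds one instance of a left-hand side and replaces it by an arbitrary term, ignoring right-hand sides exactly as the definition does. The left-hand side used in the demonstration is hypothetical:

```python
# One omega-rewriting step: locate one instance of a left-hand side
# and replace it by an ARBITRARY term, ignoring right-hand sides.
# Terms are nested tuples ('f', arg1, ...).

def match(pattern, term):
    """Is term an instance of pattern?  (Linear patterns only.)"""
    if isinstance(pattern, str) and pattern.startswith('?'):
        return True                      # a variable matches any term
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        return (pattern[0] == term[0] and len(pattern) == len(term) and
                all(match(p, t) for p, t in zip(pattern[1:], term[1:])))
    return pattern == term

def omega_step(term, lhs, replacement):
    """Replace the leftmost-outermost instance of lhs by `replacement`."""
    if match(lhs, term):
        return replacement
    if isinstance(term, tuple):
        for i in range(1, len(term)):
            new = omega_step(term[i], lhs, replacement)
            if new is not term[i]:
                return term[:i] + (new,) + term[i+1:]
    return term

# With the hypothetical left-hand side f(a,b,x), the inner redex in
# f(f(a,b,c), a, b) may omega-rewrite to any term at all, say 'Z'.
t = ('f', ('f', 'a', 'b', 'c'), 'a', 'b')
assert omega_step(t, ('f', 'a', 'b', '?x'), 'Z') == ('f', 'Z', 'a', 'b')
```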
Now, a {\em strongly needed redex\/} is defined analogously to a
needed redex, using $\omega$-rewriting instead of rewriting.
\begin{definition}[\cite{Huet-Levy}\auind{Huet, G.}\auind{L\'evy, J.-J.}]
\label{def:strongly-needed-redex}
Given a rule-orthogonal set\/ $\mathbold{T}$ of equations and a term\/
$t_0$, a redex\/ $\alpha$ in $t_0$ is {\em strongly needed\/} if and
only if, for every $\omega$-rewriting sequence\/
\mbox{$t_0\onerewritem{\boldscript{T}}{\omega}t_1
\onerewritem{\boldscript{T}}{\omega}\cdots
\onerewritem{\boldscript{T}}{\omega}t_m$}, either
\begin{itemize}
\item there exists an\/ $i$, \mbox{$1\leq i\leq m$} such that
a residual of\/ $\alpha$ is rewritten in the step\/
\mbox{$t_{i-1}\onerewritem{\boldscript{T}}{\omega}t_i$}, or
\item $\alpha$ has at least one residual in\/ $t_m$.
\end{itemize}
\end{definition}
Because of rule-orthogonality, the property of being strongly needed
depends only on the location of a redex occurrence, and not on its
internal structure. So, we call an arbitrary occurrence in a term {\em
strongly needed\/} if and only if a redex substituted in at that
location is strongly needed. \cite{Huet-Levy}\auind{Huet,
G.}\auind{L\'evy, J.-J.} defines {\em strong indexes,} and shows
that they determine exactly the strongly needed redexes. It is easy
to see that every strongly needed redex is needed, and outermost.
And, it is easy to detect whether a given redex is strongly needed
(see Section~\ref{sec:algorithms} and \cite{Huet-Levy}). A
system of equations is strongly sequential if there is always a
strongly needed redex to be rewritten, except in a normal form term.
\begin{definition}[\cite{Huet-Levy}\auind{Huet, G.}\auind{L\'evy, J.-J.}]
\label{def:strongly-sequential-equations}
\subind{strongly sequential}\subind{sequential!strongly}
A rule-orthogonal set of equations is\/ {\em strongly sequential\/} if and
only if every term that is not in normal form contains at least one
strongly needed redex.
\end{definition}
It is obvious that every strongly sequential system is effectively
weakly sequential, but the converse does not hold.
\begin{example}
\label{exa:sequential-not-strongly}
The system $\mathbold{T}_{\thetrickyeqnex}$ of
Example~\ref{exa:tricky-nonsequential}, although it is effectively
weakly sequential, is\/ {\em not\/} strongly sequential.
\mbox{$f(f(a,b,c),f(a,b,c),f(a,b,c))$} $\omega$-rewrites to\/
\mbox{$f(f(a,b,c),a,b)$}, which is a redex, so the first
redex \mbox{$f(a,b,c)$} is not strongly needed. Similarly,
\mbox{$f(f(a,b,c),f(a,b,c),f(a,b,c))$} $\omega$-rewrites to the
redexes \mbox{$f(b,f(a,b,c),a)$} and \mbox{$f(a,b,f(a,b,c))$}, so the
second and third redexes are not strongly needed. All three redexes
are weakly needed.
\addtocounter{eqnex}{1}
By contrast, consider the strongly sequential system
$$\mathbold{T}_{\theeqnex}=\{f(g(b,b),h(x,b))\formeq b,\;
f(g(c,x),h(c,c))\formeq c,\;h(x,d)\formeq d,\;a\formeq b\}$$ In
the term
\mbox{$f(g(a,a),h(a,a))$}, the first and last occurrences of $a$ are
strongly needed, but the second and third are not.
\end{example}
Notice that $\omega$-rewriting allows different redexes that are
occurrences of instances of the same left-hand side to be rewritten
inconsistently in different $\omega$-rewriting steps. Such
inconsistency is critical to the example above, where in one case
\mbox{$f(a,b,c)$} $\omega$-rewrites to $a$, and in another case it
$\omega$-rewrites to $b$.\subind{omega-rewrite}\subind{rewriting!omega-}
\notind{$\onerewritem{\boldscript{T}}{\omega}$
(one step $\omega$-rewrite relation)}
Strong sequentiality is independent of the right-hand sides of equations.
\begin{proposition}
\label{pro:left-similar-sequential}
\subind{left-similar}
If\/ $\mathbold{T}_1$ and\/ $\mathbold{T}_2$ are left-similar (see
Definition~\ref{def:left-similar}), and\/ $\mathbold{T}_1$ is strongly
sequential, then so is\/ $\mathbold{T}_2$.
The proof is straightforward, since\/ $\mathbold{T}_1$ and\/ $\mathbold{T}_2$
clearly have the same\/ $\omega$-rewriting relations
(Definition~\ref{def:omega-rewriting}).
\end{proposition}
It is {\em not\/} true, however, that a system is strongly sequential
whenever all left-similar systems are weakly sequential. The system
of Example~\ref{exa:tricky-nonsequential} and all left-similar systems
are weakly sequential, but not strongly sequential. But in that case,
no given redex is needed in all of the left-similar systems. An
interesting open question is whether a redex that is needed in all
left-similar systems must be strongly needed.
For finite rule-orthogonal sets of equations, strong sequentiality is
decidable.
\begin{proposition}[\cite{Huet-Levy,Klop-Middledorp}\auind{Huet,
G.}\auind{L\'evy, J.-J.}\auind{Klop, J. W.}\auind{Middeldorp, A.}]
\label{pro:strong-sequentiality-decidable}
Given a finite rule-orthogonal set\/ $\mathbold{T}$ of equations, it is
decidable whether\/ $\mathbold{T}$ is strongly sequential.
The details of the proof are quite tricky, but the essential idea is
that only a finite set of terms, with sizes limited by a function of
the sizes of left-hand sides of equations, need to be checked for
strongly needed redexes.
\end{proposition}
In developing the concept of {\em strongly needed\/} redexes and
connecting it to the concept of {\em weakly needed\/} redexes, Huet
and L\'evy define the intermediately powerful concept of an {\em
index.}\subind{index} Roughly, an index is a needed redex that can be
distinguished from other redexes just by their relative positions in a
term, without knowing the forms of the redexes themselves
\cite{Huet-Levy}\auind{Huet, G.}\auind{L\'evy, J.-J.}. Every index is
a weakly needed redex, but not vice versa. {\em Strong\/} indexes are
equivalent to strongly needed redexes. A system in which every term
not in normal form has at least one index is called {\em sequential.}
The precise relation between sequentiality in all left-similar
systems, and strong sequentiality, is an interesting open problem.
All of the sequentiality theory discussed in this section deals with
sequentializing the process of rewriting a term to normal form.
Example~\ref{exa:parallel-head-normalize} in
Section~\ref{sec:inc-inf-IO} shows that even strongly sequential
systems may require a parallel evaluation strategy for other purposes,
such as a complete procedure for rewriting to head-normal form.
\subind{strongly need redex|)}\subind{redex!strongly needed|)}
\subsubsection{Optimal Rewriting}
\label{sec:optimal}
\subind{term rewriting!optimal|(}
From a naive point of view, the natural strategy of rewriting a
strongly needed redex at each step does not lead to minimal-length
rewriting sequences ending in normal form. The problem is that the
rewriting of a strongly needed redex may cause another needed (but not
strongly needed) redex to be copied arbitrarily many times. Since
strongly needed redexes are always outermost, they are particularly
likely to cause such copying.
\begin{example}
\label{exa:sequential-not-optimal}\addtocounter{eqnex}{1}
\setcounter{opteqnex}{\value{eqnex}}
In the strongly sequential system of equations
$$\mathbold{T}_{\theeqnex}=\{f(x)\formeq g(x,x)\}$$ given the
initial term
\mbox{$f(f(a))$}, both redexes, \mbox{$f(f(a))$} and \mbox{$f(a)$},
are needed, but only the outermost one is strongly needed. By
rewriting a strongly needed redex at each step, we get the 3-step
sequence
$$f(f(a))\onerewrite{{\boldscript{T}_{\theeqnex}}}g(f(a),f(a))
\onerewrite{{\boldscript{T}_{\theeqnex}}}g(g(a,a),f(a))
\onerewrite{{\boldscript{T}_{\theeqnex}}}g(g(a,a),g(a,a))$$ But, there
is a 2-step sequence
$$f(f(a))\onerewrite{{\boldscript{T}_{\theeqnex}}}f(g(a,a))
\onerewrite{{\boldscript{T}_{\theeqnex}}}g(g(a,a),g(a,a))$$ which does
not rewrite the unique strongly needed redex in the first step.
\end{example}
It is easy to construct further examples in which the number of steps
wasted by rewriting strongly needed redexes is arbitrarily large.
\begin{proposition}
\label{pro:no-effective-optimal}
Given an arbitrary strongly sequential system of equations, and a
term, there is no effective procedure to choose a redex at each step
so as to minimize the length of the rewriting sequence to normal form.
I am not aware of a treatment of this point in the literature. The
basic idea is to use the equations\addtocounter{eqnex}{1}
$$\mathbold{T}_{\theeqnex}=\{f(x,y)\formeq g(x,x,h(y)),\;a\formeq
b,\;g(x,x,0)\formeq 0\}$$ similar to $\mathbold{T}_{\theopteqnex}$
of Example~\ref{exa:sequential-not-optimal}, and add additional
equations to define $h$ as the evaluator for Lisp or some other
general-purpose computing system. Now, start with a term of the form
$f(a,p)$, where $p$ is an arbitrary program. If evaluation of $p$ halts
with value $0$ (that is, if $h(p)$ rewrites to $0$), then an optimal
rewriting sequence rewrites $f(a,p)$ to $g(a,a,h(p))$ in the first
step, and never rewrites the $a$s. If evaluation of $p$ halts with any
value other than $0$, then an optimal sequence rewrites $a$ to $b$ in
the first step (else it must be rewritten twice later). An effective
method for choosing the first step would yield a recursive separation
of the programs that halt with output $0$ from those that halt with
output $1$, which is known to be
impossible~\cite{Machtey-Young}\auind{Machtey, M.}\auind{Young, P.}.
\subind{recursively inseparable}
\end{proposition}
Notice that, when there is a normal form, breadth-first search over
all rewriting sequences yields a minimal sequence, although at great
expense. But no effective procedure can both choose some redex in
all cases (even in the absence of a normal form) and minimize the
number of rewriting steps whenever there is a normal form.
\subind{sharing|(}
The uncomputability of minimal-length rewriting strategies in
Proposition~\ref{pro:no-effective-optimal} sounds discouraging. The
number of rewriting steps is {\em not,} however, a good practical
measure of the efficiency of a sequencing strategy. Given equations,
such as \mbox{$f(x)\formeq g(x,x)$} in
Example~\ref{exa:sequential-not-optimal}, with more than one
occurrence of the same variable $x$ on the right-hand side, normal
sensible implementations do {\em not\/} make multiple copies of the
subterm substituted for that variable. Rather, they use multiple
pointers to a single copy. Then, only one actual computing step is
required to rewrite all of the apparent multiple copies of a redex
within that substituted subterm. So, in
Example~\ref{exa:sequential-not-optimal}, the strategy of choosing a
strongly needed redex actually leads to only two steps, from
\mbox{$f(f(a))$} to \mbox{$g(f(a),f(a))$}, and then directly to
\mbox{$g(g(a,a),g(a,a))$}. The normal form is represented in practice
with only one copy of the subterm \mbox{$g(a,a)$}, and two pointers to
it for the two arguments of the outermost $g$. If we charge only one
step for rewriting a whole set of shared redexes, then rewriting
strongly needed redexes is optimal.
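The sharing argument above can be made concrete. In the following Python sketch, an informal model rather than an actual implementation, nodes are mutable, the right-hand side of \mbox{$f(x)\formeq g(x,x)$} reuses one node for both occurrences of $x$, and the shared redex \mbox{$f(a)$} is then contracted once for both apparent copies:

```python
# Sharing via mutable nodes: both arguments of g(x,x) point to ONE
# node, so rewriting that node in place updates every shared copy
# in a single actual computing step.

class Node:
    def __init__(self, sym, *args):
        self.sym, self.args = sym, list(args)
    def replace_with(self, other):        # contract this redex in place
        self.sym, self.args = other.sym, other.args

a = Node('a')
inner = Node('f', a)                      # the subterm f(a)
root = Node('f', inner)                   # the term f(f(a))

# Step 1: rewrite the strongly needed outermost redex f(f(a)),
# sharing the single node `inner` between both arguments of g.
root.replace_with(Node('g', inner, inner))

# Step 2: contract the shared redex f(a) once; both apparent copies
# in g(f(a),f(a)) are updated simultaneously through sharing.
inner.replace_with(Node('g', a, a))

# The normal form g(g(a,a),g(a,a)) is reached in two actual steps,
# with one shared copy of the subterm g(a,a).
assert root.sym == 'g' and root.args[0] is root.args[1]
assert root.args[0].sym == 'g' and root.args[0].args[0].sym == 'a'
```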
\begin{proposition}
\label{pro:sequential-optimal}
Consider multiple-rewriting sequences, in which in one step all of the
shared copies of a redex are rewritten simultaneously. Given a
strongly sequential set of equations and a term, the strategy of
rewriting at each step a strongly needed redex and all of its shared
copies leads to normal form in a minimal number of steps.
\end{proposition}
This proposition has never been completely proved in print. I claimed
a proof \cite{O'Donnell-Springer}\auind{O'Donnell, M. J.}, but it
contained a fatal error \cite{Berry-Levy,O'Donnell-letter}\auind{Berry,
G.}\auind{L\'evy, J.-J.}. The hard part of the proposition---that
the rewriting of a strongly needed redex is never a wasted step---was
proved by Huet and L\'evy \cite{Huet-Levy}\auind{Huet,
G.}\auind{L\'evy, J.-J.}. The remaining point---that rewriting a
strongly needed redex never causes additional rewriting work later in
the sequence---seems obvious, but has never been treated formally in
general. L\'evy \cite{Levy}\auind{L\'evy, J.-J.} treated a similar
situation in the lambda calculus, but in that case there is no known
efficient implementation technique for the sequences used in the
optimality proof. Although the formal literature on optimal rewriting
is still incomplete, and extensions of optimality theory to systems
(such as the lambda calculus) with bound variables are extremely
subtle, for most practical purposes Huet's and L\'evy's work justifies
the strategy of rewriting all shared copies of a strongly needed redex
at each step. Optimality aside, rewriting strategies that always
choose a strongly needed redex are examples of {\em one-step
normalizing strategies,} which provide interesting theoretical
problems in combinatory logic and the lambda
calculus \cite{Barendregt}\auind{Barendregt, H. P.}.
\subind{sharing|)}
\subind{term rewriting!optimal|)}
\subsubsection{Extensions to Sequentiality Analysis}
\label{sec:extended-sequencing}
Proposition~\ref{pro:rule-orthogonal-weakly-sequential} seems to rule
out efficient or optimal sequential rewriting for rewrite-orthogonal
systems. A closer look shows that the definition of
weakly needed redexes and weak sequentiality is inappropriate for
rewrite-orthogonal systems. When two redexes from a
rewrite-orthogonal system overlap, we get the same result by rewriting
either one. So, there is no need for a sequential strategy to choose
between them, and we might as well allow an arbitrary selection. This
observation suggests a more liberal concept of needed redex.
\begin{definition}
\label{def:rewrite-needed-redex}
Given a rewrite-orthogonal set\/ $\mathbold{T}$ of equations and a term\/
$t_0$, a redex\/ $\alpha$ in $t_0$ is a\/ {\em rewrite-needed redex\/}
if and only if, for every term rewriting sequence\/
\mbox{$t_0\onerewrite{\boldscript{T}}t_1
\onerewrite{\boldscript{T}}\cdots\onerewrite{\boldscript{T}}t_m$},
either
\begin{itemize}
\item there exists an\/ $i$, \mbox{$1\leq i\leq m$} such that
a residual of\/ $\alpha$ is rewritten in the step\/
\mbox{$t_{i-1}\onerewrite{\boldscript{T}}t_i$}, or
\item there exists an\/ $i$, \mbox{$1\leq i\leq m$} such that
a redex overlapping a residual of\/ $\alpha$ is rewritten in the
step\/
\mbox{$t_{i-1}\onerewrite{\boldscript{T}}t_i$}, or
\item $\alpha$ has at least one residual in\/ $t_m$.
\end{itemize}
\end{definition}
This is the same as Definition~\ref{def:needed-redex} of needed redex,
except that when one redex is reduced, we give credit to all redexes
that overlap it. We generalize Definition~\ref{def:weakly-sequential}
with the new version of needed redexes.
\begin{definition}
\label{def:weakly-rewrite-sequential}
An orthogonal set of equations is\/ {\em weakly rewrite-sequential\/}
if and only if every term that is not in normal form contains at least
one rewrite-needed redex. A set of equations is\/ {\em effectively
weakly rewrite-sequential\/} if and only if there is an effective
procedure that finds a rewrite-needed redex in each term not in normal
form.
\end{definition}
\subind{overlap|(}
Three sorts of overlaps between left-hand sides of equations have
different impacts on weak rewrite-sequentiality. Recall
(Definition~\ref{def:rewrite-orthogonal}) that the problematic
overlaps occur when there is a term $s$, and left-hand sides $l_i$
and $l_j$, such that
$$\subst{\subst{t_1,\ldots,t_m}{x_1,\ldots,x_m}{l_i}}{y}{s}=
\subst{t^\prime_1,\ldots,t^\prime_n}{x^\prime_1,\ldots,x^\prime_n}{l_j}$$
Rewrite-nonambiguity requires that either $s$ is an
instance of $l_j$, or the corresponding right-hand sides $r_i$ and
$r_j$ satisfy
$$\subst{\subst{t_1,\ldots,t_m}{x_1,\ldots,x_m}{r_i}}{y}{s}=
\subst{t^\prime_1,\ldots,t^\prime_n}{x^\prime_1,\ldots,x^\prime_n}{r_j}$$
\begin{enumerate}
\item Sometimes the structure of the inner term $l_i$ is entirely subsumed
by the structure of the outer term $l_j$---that is, the substituted
terms \mbox{$t^\prime_1,\ldots,t^\prime_n$} are trivial, and
$$\subst{\subst{t_1,\ldots,t_m}{x_1,\ldots,x_m}{l_i}}{y}{s}=l_j$$ In
this case, the equation \mbox{$l_j\formeq r_j$} is redundant, since
every possible application of it can be accomplished by applying
\mbox{$l_i\formeq r_i$} instead.
\item Sometimes the structure of the inner term $l_i$ extends below
the structure of the outer term $l_j$---that is, the substituted terms
\mbox{$t_1,\ldots,t_m$} are trivial, and $$\subst{l_i}{y}{s}=
\subst{t^\prime_1,\ldots,t^\prime_n}{x^\prime_1,\ldots,x^\prime_n}{l_j}$$
Overlaps of this sort appear not to destroy weak rewrite-sequentiality.
\item Otherwise, neither set of substituted terms
\mbox{$t_1,\ldots,t_m$} nor \mbox{$t^\prime_1,\ldots,t^\prime_n$} is
trivial. This is the interesting case. Weak rewrite-sequentiality
may hold or not, depending on the extent to which redexes in
substituted subterms are copied or eliminated by the right-hand sides.
\end{enumerate}
Figure~\ref{fig:overlap-types} illustrates the three types of overlap
with suggestive pictures.
\begin{figure}[hbtp]
\begin{center}
\setlength{\unitlength}{.1 cm}
\begin{picture}(110,45)(-55,-45)
\put(-40,-40){ \makebox(0,0){Type 1} }
\put(-40,-5){ \makebox(0,0){$l_j$} }
\put(-40,0){ \circle*{1} }
\put(-40,0){ \line(-1,-2){15} }
\put(-40,0){ \line(1,-2){15} }
\put(-55,-30){ \line(1,0){30} }
\put(-40,-20){ \makebox(0,0){$l_i$} }
\put(-40,-15){ \circle*{1} }
\put(-40,-15){ \line(-1,-2){5} }
\put(-40,-15){ \line(1,-2){5} }
\put(-45,-25){ \line(1,0){10} }
\put(0,-40){ \makebox(0,0){Type 2} }
\put(0,-5){ \makebox(0,0){$l_j$} }
\put(0,0){ \circle*{1} }
\put(0,0){ \line(-1,-2){12} }
\put(0,0){ \line(1,-2){12} }
\put(-12,-24){ \line(1,0){24} }
\put(0,-15){ \makebox(0,0){$l_i$} }
\put(0,-10){ \circle*{1} }
\put(0,-10){ \line(-1,-2){10} }
\put(0,-10){ \line(1,-2){10} }
\put(-10,-30){ \line(1,0){20} }
\put(40,-40){ \makebox(0,0){Type 3} }
\put(40,-5){ \makebox(0,0){$l_j$} }
\put(40,0){ \circle*{1} }
\put(40,0){ \line(-1,-2){15} }
\put(40,0){ \line(1,-2){9} }
\put(25,-30){ \line(2,1){24} }
\put(40,-14){ \makebox(0,0){$l_i$} }
\put(40,-9){ \circle*{1} }
\put(40,-9){ \line(-1,-2){6} }
\put(40,-9){ \line(1,-2){10} }
\put(50,-29){ \line(-2,1){16} }
\end{picture}
\end{center}
\caption{\label{fig:overlap-types}The three types of left-hand-side overlap.}
\end{figure}
\begin{example}
\label{exa:rewrite-sequential}\addtocounter{eqnex}{1}
Consider the set $$\mathbold{T}_{\theeqnex}=\{f(g(h(x)))\formeq
f(h(x)),\;g(x)\formeq x\}$$ The overlap here is of type 1, since
$$\subst{\subst{h(x)}{x}{g(x)}}{y}{f(y)}=f(g(h(x)))$$ The first
equation is redundant since it is essentially a special case of the
second.
\addtocounter{eqnex}{1}
Next, consider $$\mathbold{T}_{\theeqnex}=\{f(g(x),a)\formeq
f(x,a),\;g(h(x))\formeq h(x)\}$$ The overlap here is of type 2, since
$$\subst{g(h(x))}{y}{f(y,a)}=f(g(h(x)),a)=\subst{h(x)}{x}{f(g(x),a)}$$
\mbox{$\mathbold{T}_{\theeqnex}$} is weakly rewrite-sequential. In a term
of the form\/ \mbox{$f(s,t)$}, in which\/
\mbox{$s\onerewrite{{\boldscript{T}_{\theeqnex}}}g(h(s^\prime))$} and
also\/ \mbox{$t\onerewrite{{\boldscript{T}_{\theeqnex}}}a$},
it is always safe to rewrite\/ $t$ first, since by rewriting\/ $s$
to\/ \mbox{$g(h(s^\prime))$} and then rewriting this to\/
\mbox{$h(s^\prime)$}, we cannot eliminate the redexes in\/ $t$.
\addtocounter{eqnex}{1}
Now, consider $$\mathbold{T}_{\theeqnex}=\{f(g(a,x))\formeq
f(a),\;g(x,a)\formeq a\}$$ The overlap here is of type 3, since
$$\subst{\subst{a}{x}{g(x,a)}}{y}{f(y)}=f(g(a,a))=\subst{a}{x}{f(g(a,x))}$$
is the smallest substitution showing the overlap, and neither
substitution is trivial. \mbox{$\mathbold{T}_{\theeqnex}$} is not weakly
rewrite-sequential, since the term\/
\mbox{$f(g(g(a,a),g(a,a)))$} has the two rewrite sequences
$$f(g(g(a,a),g(a,a)))\onerewrite{{\boldscript{T}_{\theeqnex}}}f(g(a,g(a,a)))
\onerewrite{{\boldscript{T}_{\theeqnex}}}f(a)$$ which shows that the
rightmost occurrence of\/ \mbox{$g(a,a)$} is not needed and
$$f(g(g(a,a),g(a,a)))\onerewrite{{\boldscript{T}_{\theeqnex}}}f(g(g(a,a),a))
\onerewrite{{\boldscript{T}_{\theeqnex}}}f(a)$$ which shows that the
leftmost occurrence of\/ \mbox{$g(a,a)$} is not needed.
\addtocounter{eqnex}{1}
Modify the previous example slightly, by changing the right-hand sides:
$$\mathbold{T}_{\theeqnex}=\{f(g(a,x))\formeq f(x),\;g(x,a)\formeq x\}$$
The result is still a type 3 overlap, but the system is weakly
rewrite-sequential, since the redexes that are not needed immediately
in the creation of an outer redex are preserved for later rewriting.
\subind{positive parallel or}\subind{parallel or!positive}
\notind{$\mathbold{T}_{\sor+}$ (positive parallel or)}
The positive parallel-or equations in\/ $\mathbold{T}_{\sor+}$ of
Examples~\ref{exa:nonorthogonal} and~\ref{exa:parallel-or-problem}
give another example of a type 3 overlap where weak
rewrite-sequentiality fails. On the other hand, the negative
parallel-or equations of\/ $\mathbold{T}_{\sor-}$ in
Example~\ref{exa:nonorthogonal} have type 2 overlap, but they are
weakly rewrite-sequential. In a term of the form\/ \mbox{$\sor(s,t)$} where\/
\mbox{$s\onerewrite{{\boldscript{T}_{\sor-}}}\false$} and\/
\mbox{$t\onerewrite{{\boldscript{T}_{\sor-}}}\false$}, it is safe
to rewrite either\/ $s$ or\/ $t$ first, since the redexes in the
other, unrewritten, subterm are preserved for later rewriting.
\end{example}
\subind{overlap|)}
Theories extending sequentiality analysis, through concepts such as
weak rewrite-sequentiality, are open topics for research. I
conjecture that weak rewrite-sequentiality is an undecidable property
of rewrite-orthogonal systems, and that the natural concept of strong
rewrite-sequentiality has essentially the same properties as strong
sequentiality, except for allowing type 2 overlaps. Optimality is very
subtle in these extensions,\subind{term rewriting!optimal} since the
amount of sharing may vary depending on which of two overlapping
redexes is reduced. More interesting and powerful extensions of
sequentiality will require analysis of right-hand sides to deal with
type 3 overlaps. Such analysis should be related in interesting ways
to {\em strictness analysis\/} in functional programming languages
\cite{Mycroft,Hughes-strict}\auind{Mycroft, A.} \auind{Hughes, R. J. M.},
which detects partial strictness properties of defined
functions. {\em Abstract interpretation\/}
\cite{Abramsky-Hankin,Cousot}\auind{Abramsky, S.}\auind{Hankin, C.}\auind{Cousot, P.}\auind{Cousot, R.}
provides a promising approach to sequentiality analysis based on
right-hand sides.
Extensions of useful sequentiality analysis to systems whose
confluence is established by variations on the Knuth-Bendix procedure
will require the concept of residual to be generalized so that the
residual of a redex $\alpha$ may be an arbitrarily long rewriting {\em
sequence\/} used in resolving a critical pair involving $\alpha$.
Variations on sequentiality analysis for incremental and parallel
implementations of equational logic programming are discussed in
Sections~\ref{sec:parallel} and~\ref{sec:extensions}, respectively.
\subind{term rewriting!strategy|)}
\subind{proof strategy!equational|)}\subind{equation!proof strategy|)}
\subind{sequential term rewriting|)}\subind{sequentiality analysis|)}
\section[Algorithms and Data Structures]
{Algorithms and Data Structures to Implement Equational Languages}
\label{sec:algorithms}
The basic idea of implementing equational logic programming for
strongly sequential systems is straightforward. Represent terms as
linked structures with sharing, in the time-honored style of
Lisp \cite{McCarthy-Abrahams-Edwards-Hart-Levin,McCarthy}.\subind{Lisp}
\auind{McCarthy, J.}\auind{Abrahams, P. W.}\auind{Edwards, D. J.}
\auind{Hart, T. P.}\auind{Levin, M. I.}
At every step, find a strongly needed redex and rewrite it, halting if
and when the sequence ends with a normal form. A lot of work is required
to reduce these basic ideas to efficient practice. At the abstract level
of algorithm and data-structure design, the problem breaks naturally into
three components: a data structure to represent terms, a pattern-matching
and sequencing algorithm to find strongly needed redexes, and a driving
procedure to invoke the pattern-matcher/sequencer, perform the chosen
rewrites, and incorporate the results into the term data structure.
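The driving procedure can be sketched as a simple loop over the other two components. In the Python fragment below, `find_redex` and `contract` are assumed interfaces standing in for the pattern-matcher/sequencer and the rewriting step; the tiny instantiation for the single rule \mbox{$f(x)\formeq x$} is purely illustrative:

```python
def normalize(term, find_redex, contract):
    """Drive rewriting: pick a redex, contract it, repeat until none is left."""
    while True:
        redex = find_redex(term)          # None signals a normal form
        if redex is None:
            return term
        term = contract(term, redex)      # perform the chosen rewrite

# Illustrative instantiation for the rule f(x) = x on nested tuples.
# find_f returns the path (a tuple of argument positions) to the
# leftmost-outermost f-redex, or None if there is none.
def find_f(t):
    if isinstance(t, tuple):
        if t[0] == 'f':
            return ()                     # redex at the root
        for i in range(1, len(t)):
            p = find_f(t[i])
            if p is not None:
                return (i,) + p
    return None

def contract_f(t, path):
    if not path:
        return t[1]                       # f(x) -> x
    i = path[0]
    return t[:i] + (contract_f(t[i], path[1:]),) + t[i+1:]

assert normalize(('f', ('f', 'a')), find_f, contract_f) == 'a'
```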
\subsection{Data Structures to Represent Terms}
\label{sec:representing-terms}
\subind{term!data structures|(} The natural data structure for terms
is a linked structure in a heap,\subind{heap} with
sharing\subind{sharing} allowed. Each occurrence of a symbol $f$ in a
term is represented by a node of storage containing $f$ and pointers
to its arguments. Sharing is accomplished by allowing several
different argument pointers to point to the same node. There are a
number of optimizations that coalesce small nodes, or break large
nodes into linked sequences, that have been explored in the literature
on Lisp compilers \cite{Bobrow-Clark}\auind{Bobrow, D.}\auind{Clark, D.}.
In this section, we consider data structures at an abstract level with
precisely one symbol per heap node, and assume that such optimizations
are applied at a lower level of implementation.
\subsubsection[Conceptual Model]{A Conceptual Model for Term Data Structures}
\label{sec:model-term-structures}
Some useful techniques for implementing equational logic programming
require more than the linked heap structures representing terms. For
example, it is sometimes better to represent the rewriting of $s$ to $t$
by a link from the head node of the representation of $s$ pointing to the
head node of $t$, rather than by an actual replacement of $s$ by
$t$. This representation still uses a heap, but the heap now represents a
portion of the infinite rewriting graph for a starting term, rather than
just a single term at some intermediate stage in rewriting to normal
form. Other techniques involve the memoing\subind{memo} of intermediate
steps to avoid recomputation---these require more efficient table lookup
than may be achieved with a linked heap. Fortunately, there is a single
abstract data structure that subsumes all of the major proposals as
special cases, and which allows a nice logical interpretation
\cite{Sherman-lazy}\auind{Sherman, D. J.}. This data structure is best
understood in terms of three tables representing three special sorts of
functions.
\begin{definition}
\label{def:term-tables}
For each\/ \mbox{$i\geq 0$} let\/ $\Fun_i$ be a countably infinite set
of\/ {\em function symbols of arity\/ $i$}. The\/ $0$-ary function
symbols in\/ $\Fun_0$ are called\/ {\em constant symbols.}
$\mathbold{T}^0_P$ is the set of ground terms (terms without variables)
constructed from the given function symbols (see
Definition~\ref{def:FOPC-formula} of the chapter `Introduction: Logic
and Logic Programming Languages').
Let\/ $\mathbold{P}$ be a countably infinite set. Members of\/
$\mathbold{P}$ are called\/ {\em parameters,}\subind{parameter} and are
written\/ $\alpha,\beta,\ldots$, sometimes with subscripts. Formally,
parameters behave just like the\/ {\em variables\/} of
Definition~\ref{def:FOPC-formula}, but their close association with heap
addresses later on makes us think of them somewhat differently.
An\/ {\em $i$-ary signature\/}\subind{signature} is a member of\/
$\Fun_i\times\mathbold{P}^i$. The signature\/ \mbox{$\langle
f,\langle\alpha_1,\ldots,\alpha_i\rangle\rangle$}, is normally denoted
by\/ \mbox{$f(\alpha_1,\ldots,\alpha_i)$}. $\Sig$\notind{$\Sig$
(signatures)} denotes the set of signatures of all arities.
Let\/ $\bfnil$\notind{$\bfnil$ (meta-nil for term tables)} be a symbol
distinct from all function symbols, parameters, and signatures.
A\/ {\em parameter valuation\/}\subind{parameter valuation} is a function\/
\mbox{$\val:\mathbold{P}\funcarrow\Sig\cup\{\bfnil\}$}.
\notind{$\val$ (parameter valuation)}
A\/ {\em parameter replacement\/}\subind{parameter replacement} is a function\/
\mbox{$\repl:\mathbold{P}\funcarrow\mathbold{P}\cup\{\bfnil\}$}.
\notind{$\repl$ (parameter replacement)}
A\/ {\em signature index\/}\subind{signature index} is a function\/
\mbox{$\ind:\Sig\funcarrow\mathbold{P}\cup\{\bfnil\}$}.
\notind{$\ind$ (signature index)}
A parameter valuation, parameter replacement, or signature index is\/
{\em finitely based\/}\subind{finitely based} if and only if its value
is\/ $\bfnil$ for all but a finite number of arguments.
\end{definition}
The conventional representation of a term by a linked structure in a
heap\subind{heap} may be understood naturally as a table representing a
finitely based parameter valuation. The parameters\subind{parameter} are
the heap addresses, and the signatures\subind{signature} are the possible
values for data nodes.
\mbox{$\val(\alpha)$} is the signature stored at address $\alpha$.
But, we may also think of parameters as additional $0$-ary symbols, of
signatures as terms of height $1$ built from parameters, and of the
function $\val$ as a set of formulae asserting equalities between
parameters and signatures. Each value
\mbox{$\val(\alpha)=f(\beta_1,\ldots,\beta_i)\neq\bfnil$} of the
valuation function represents the formula \mbox{$\alpha\formeq
f(\beta_1,\ldots,\beta_i)$}. When $\val$ represents the contents of
a heap with the head symbol of a term $t$ stored at address $\alpha$,
then \mbox{$\alpha\formeq t$} is a logical consequence of the
equations represented by $\val$.
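As a concrete, if informal, illustration, a finitely based parameter valuation may be realized as a finite table. The following Python sketch (the encoding is illustrative and belongs to no particular implementation) writes parameters as strings, a signature \mbox{$f(\beta_1,\ldots,\beta_i)$} as a tuple, and lets an absent dictionary key stand for $\bfnil$:

```python
# A finitely based parameter valuation as a dict: absent keys stand for
# the meta-value bfnil.  An entry  alpha -> (f, b1, ..., bi)  encodes the
# equation  alpha = f(b1, ..., bi).
val = {"a0": ("f", "a1", "a2"),
       "a1": ("g", "a3", "a2"),
       "a2": ("f", "a3", "a4"),
       "a3": ("a",),
       "a4": ("a",)}

def as_equation(alpha, sig):
    """Render one point of the valuation's graph as an equation."""
    f, params = sig[0], sig[1:]
    rhs = f if not params else "%s(%s)" % (f, ",".join(params))
    return "%s = %s" % (alpha, rhs)
```

Here \mbox{\tt as\_equation("a0", val["a0"])} yields the string \mbox{\tt "a0 = f(a1,a2)"}, the first of the equations represented by the valuation of the example that follows.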
\begin{example}
\subind{parameter valuation|(}\notind{$\val$ (parameter valuation)|(}
\label{exa:parameter-valuation}
Consider the finitely based parameter valuation $\val$ given by
Table~\ref{tab:parameter-valuation}.
\begin{table}[btp]
\caption{\label{tab:parameter-valuation}
Parameter valuation representing \mbox{$f(g(a,f(a,a)),f(a,a))$}.}
$$\begin{array}{r@{\hspace{1em}\mapsto\hspace{1em}}l}
\hline
\alpha_0 & f(\alpha_1,\alpha_2) \\
\alpha_1 & g(\alpha_3,\alpha_2) \\
\alpha_2 & f(\alpha_3,\alpha_4)\\
\alpha_3 & a \\
\alpha_4 & a \\
\hline
\end{array}$$
\end{table}
All values of $\val$ not shown in the table are $\bfnil$. The linked
heap structure associated with $\val$ is shown in
Figure~\ref{fig:val-heap}.
\begin{figure}[btp]
\begin{center}
\setlength{\unitlength}{.1 cm}
\begin{picture}(60,60)(-30,-60)
\put(-12,-3){ \makebox(0,0){$\alpha_0$} }
\put(-9,-6){ \framebox(6,6){$f$} }
\put(-3,-6){ \framebox(6,6){\relax} }
\put(0,-3){ \circle*{1} }
\put(0,-3){ \vector(-1,-1){12} }
\put(3,-6){ \framebox(6,6){\relax} }
\put(6,-3){ \circle*{1} }
\put(6,-3){ \vector(1,-4){6} }
\put(-30,-21){ \makebox(0,0){$\alpha_1$} }
\put(-27,-24){ \framebox(6,6){$g$} }
\put(-21,-24){ \framebox(6,6){\relax} }
\put(-18,-21){ \circle*{1} }
\put(-18,-21){ \vector(1,-2){12} }
\put(-15,-24){ \framebox(6,6){\relax} }
\put(-12,-21){ \circle*{1} }
\put(-12,-21){ \vector(3,-1){18} }
\put(0,-33){ \makebox(0,0){$\alpha_2$} }
\put(3,-36){ \framebox(6,6){$f$} }
\put(9,-36){ \framebox(6,6){\relax} }
\put(12,-33){ \circle*{1} }
\put(12,-33){ \vector(-1,-1){12} }
\put(15,-36){ \framebox(6,6){\relax} }
\put(18,-33){ \circle*{1} }
\put(18,-33){ \vector(1,-1){12} }
\put(-12,-51){ \makebox(0,0){$\alpha_3$} }
\put(-9,-54){ \framebox(6,6){$a$} }
\put(24,-51){ \makebox(0,0){$\alpha_4$} }
\put(27,-54){ \framebox(6,6){$a$} }
\end{picture}
\end{center}
\caption{\label{fig:val-heap}
Linked structure representing \mbox{$f(g(a,f(a,a)),f(a,a))$}.}
\end{figure}
The set of equations represented by $\val$ is $$\{\alpha_0\formeq
f(\alpha_1,\alpha_2),\;\alpha_1\formeq
g(\alpha_3,\alpha_2),\;\alpha_2\formeq
f(\alpha_3,\alpha_4),\;\alpha_3\formeq a,\;\alpha_4\formeq a\}$$
Logical consequences of these equations include
\mbox{$\alpha_2\formeq f(a,a)$}, \mbox{$\alpha_1\formeq
g(a,f(a,a))$}, and \mbox{$\alpha_0\formeq f(g(a,f(a,a)),f(a,a))$}.
\end{example}
It is useful to have a notation for the term represented explicitly by
a parameter valuation at a particular parameter.
\begin{definition}
\label{def:extended-parameter-valuation}
Let\/ $\val$ be a parameter valuation. The partial function\/
\mbox{$\val^*:\mathbold{P}\funcarrow\mathbold{T}^0_P$} is defined
inductively by\notind{$\val^*$ (extended parameter valuation)}
$$\val^*(\alpha)=f(\val^*(\beta_1),\ldots,\val^*(\beta_i))$$ when
$\val(\alpha)=f(\beta_1,\ldots,\beta_i)$. Notice that the case where\/
$f$ is a constant symbol, and therefore\/ \mbox{$i=0$}, provides a basis
for the inductive definition. If a value of\/ $\bfnil$ is encountered
anywhere in the induction, or if the induction fails to terminate because
of a loop, $\val^*$ is undefined.
\end{definition}
When \mbox{$\val^*(\alpha)$} is well-defined, the equation
\mbox{$\alpha\formeq\val^*(\alpha)$} is always a logical consequence
of the equations represented by $\val$.
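The inductive definition of $\val^*$ translates directly into a recursive read-back procedure. The sketch below (Python, with the same illustrative encoding as before: parameters as strings, absent keys as $\bfnil$) returns {\tt None} wherever $\val^*$ is undefined, detecting loops by tracking the parameters currently being expanded:

```python
# A sketch of the partial function val*: read back the ground term stored
# at a parameter, returning None when bfnil or a loop is encountered.
def val_star(val, alpha, _visiting=frozenset()):
    if alpha in _visiting:              # a loop: val* is undefined here
        return None
    sig = val.get(alpha)
    if sig is None:                     # bfnil encountered: undefined
        return None
    f, params = sig[0], sig[1:]
    args = []
    for beta in params:
        t = val_star(val, beta, _visiting | {alpha})
        if t is None:
            return None
        args.append(t)
    return (f, *args)

# The valuation of the example above, representing f(g(a,f(a,a)),f(a,a)).
val = {"a0": ("f", "a1", "a2"), "a1": ("g", "a3", "a2"),
       "a2": ("f", "a3", "a4"), "a3": ("a",), "a4": ("a",)}
```

Note that the visited set is carried per path, so shared subterms (such as $\alpha_2$ above) are read twice rather than mistaken for loops.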
\subind{parameter valuation|)}\notind{$\val$ (parameter valuation)|)}
Optimized implementations of equational logic programming languages
sometimes find it more efficient to link together nodes representing
left- and right-hand sides of equations, rather than to actually
perform rewriting steps. Such linking can be implemented by an
additional pointer in each node of a heap structure. The information
in these links is naturally represented by a parameter replacement
function. Given a set $\mathbold{T}$ of equations, it first appears that we
should think of the function $\repl$ as a set of formulae asserting
that one term rewrites to another---that is,
\mbox{$\repl(\alpha)=\beta\neq\bfnil$} represents the formula
\mbox{$\val^*(\alpha)\rewrites{{\boldscript{T}}}\val^*(\beta)$}.
But, further rewriting steps on subterms of the term represented by
$\alpha$ may invalidate such a relation. There are also
implementations that make efficient use of data structures for which
$\val^*$ is ill defined. So, the proper logical interpretation of
\mbox{$\repl(\alpha)=\beta$} as a formula is merely
\mbox{$\alpha\formeq\beta$}. An efficient implementation
manipulates $\val$ and $\repl$ so that $\beta$ is in some way a better
starting point for further rewriting than $\alpha$. The precise sense
in which it is better varies among different implementations. $\val$
and $\repl$ together yield a set of terms for each parameter, all of
which are known to be equal. The set \mbox{$\val^*_{\repl}(\alpha)$}
\notind{$\val^*_{\repl}$ (extended parameter valuation/replacement)|(}
defined below is the set of terms that may be read by starting at
$\alpha$ and following links in $\val$ and $\repl$.
\begin{definition}
\label{def:replaced-valuation-set}
Let\/ $\val$ be a parameter valuation and let\/ $\repl$ be a parameter
replacement. The function\/
\mbox{$\val^*_{\repl}:\mathbold{P}\funcarrow\powerset{{\boldscript{T}}^0_P}$}
is defined so that\/
\mbox{$\val^*_{\repl}(\alpha)$} is the least set satisfying
\begin{enumerate}
\item If\/ \mbox{$\val(\alpha)=f(\beta_1,\ldots,\beta_i)$}, then
$$\val^*_{\repl}(\alpha)\supseteq\{f(t_1,\ldots,t_i)\setsep
t_1\in\val^*_{\repl}(\beta_1)\land\cdots\land
t_i\in\val^*_{\repl}(\beta_i)\}$$
\item If\/ \mbox{$\repl(\alpha)\neq\bfnil$}, then\/
\mbox{$\val^*_{\repl}(\alpha)\supseteq\val^*_{\repl}(\repl(\alpha))$}
\end{enumerate}
\end{definition}
In the presence of loops, \mbox{$\val^*_{\repl}(\alpha)$} may be
infinite. Even without loops, its size may be exponential in the size
of the data structure representing $\val$ and $\repl$. The power of
such data structures derives from this ability to represent large sets
of equivalent terms compactly. When \mbox{$\val^*(\alpha)$} is well
defined, \mbox{$\val^*(\alpha)\in\val^*_{\repl}(\alpha)$}.
\notind{$\val^*_{\repl}$ (extended parameter valuation/replacement)|)}
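For finitely based, loop-free tables, \mbox{$\val^*_{\repl}(\alpha)$} can be enumerated by a direct recursion on the two clauses of the definition. The sketch below (Python; absent dictionary keys stand for $\bfnil$, and the tables are the illustrative ones of the example that follows) combines argument sets by Cartesian product for clause 1 and unions in the replacement's set for clause 2:

```python
from itertools import product

# Enumerate val*_repl(alpha) for loop-free val and repl.
def val_set(val, repl, alpha):
    terms = set()
    sig = val.get(alpha)
    if sig is not None:                 # clause 1: combine argument sets
        arg_sets = [val_set(val, repl, b) for b in sig[1:]]
        for args in product(*arg_sets):
            terms.add((sig[0], *args))
    if alpha in repl:                   # clause 2: follow the repl link
        terms |= val_set(val, repl, repl[alpha])
    return terms

# Tables for f(f(a,b),b) rewritten with f(f(x,y),z) = g(x,z) and b = c.
val  = {"a0": ("f", "a1", "a3"), "a1": ("f", "a2", "a3"),
        "a2": ("a",), "a3": ("b",), "a4": ("g", "a2", "a3"),
        "a5": ("c",)}
repl = {"a0": "a4", "a3": "a5"}
```

On this data \mbox{\tt val\_set(val, repl, "a0")} yields six ground terms, exhibiting the exponential compression mentioned above: two independent $\repl$ choices at $\alpha_3$ multiply through the shared structure.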
Another special member of \mbox{$\val^*_{\repl}(\alpha)$} is particularly
interesting---the one reached by following $\repl$ links as much as
possible.
\begin{definition}
\label{def:replaced-valuation}
\subind{parameter valuation!replaced|(}\subind{replaced parameter valuation|(}
\notind{$\val^{\mbox{\it max}}_{\repl}$ (replaced parameter valuation)|(}
Let\/ $\val$ be a parameter valuation and let\/ $\repl$ be a parameter
replacement. The partial function\/ \mbox{$\val^{\mbox{\it
max}}_{\repl}:\mathbold{P}\funcarrow\mathbold{T}^0_P$}
is defined inductively by
$$\val^{\mbox{\it max}}_{\repl}(\alpha)=\val^{\mbox{\it
max}}_{\repl}(\repl(\alpha))$$ when\/
\mbox{$\repl(\alpha)\neq\bfnil$}.
$$\val^{\mbox{\it max}}_{\repl}(\alpha)= f(\val^{\mbox{\it
max}}_{\repl}(\beta_1),\ldots,\val^{\mbox{\it max}}_{\repl}(\beta_i))$$
when\/ \mbox{$\repl(\alpha)=\bfnil$} and\/
$\val(\alpha)=f(\beta_1,\ldots,\beta_i)$. As with\/ $\val^*$,
$\val^{\mbox{\it max}}_{\repl}$ is undefined if\/ $\bfnil$ is
encountered as a value of\/ $\val$, or if the induction fails to
terminate because of a loop.
\end{definition}
\begin{example}
\label{exa:val-repl}\addtocounter{eqnex}{1}
Consider the finitely based parameter valuation\/ $\val$ and parameter
replacement\/ $\repl$ given by Table~\ref{tab:val-repl}. All values
of\/ $\val$ and\/ $\repl$ not shown in the tables are\/ $\bfnil$.
These tables represent some of the consequences of the equations\/
$$\mathbold{T}_{\theeqnex}=\{f(f(x,y),z)\formeq g(x,z),\;b\formeq c\}$$
when used to rewrite the term\/ \mbox{$f(f(a,b),b)$}.
\begin{table}[btp]
\caption{\label{tab:val-repl}
Parameter valuation and replacement.}
$$\begin{array}{r@{\hspace{1em}\mapsto\hspace{1em}}lc@{\hspace{6em}}
r@{\hspace{1em}\mapsto\hspace{1em}}l}
\hline
\multicolumn{2}{c}{\val} && \multicolumn{2}{c}{\repl} \\ \hline
\alpha_0 & f(\alpha_1,\alpha_3) && \alpha_0 & \alpha_4 \\
\alpha_1 & f(\alpha_2,\alpha_3) && \alpha_1 & \bfnil \\
\alpha_2 & a && \alpha_2 & \bfnil \\
\alpha_3 & b && \alpha_3 & \alpha_5 \\
\alpha_4 & g(\alpha_2,\alpha_3) && \alpha_4 & \bfnil \\
\alpha_5 & c && \alpha_5 & \bfnil \\
\hline
\end{array}$$
\end{table}
The linked heap structure associated with $\val$ and $\repl$ is shown
in Figure~\ref{fig:val-repl-heap}. The rightmost link in each node
$\alpha$ points to \mbox{$\repl(\alpha)$}.
\begin{figure}[btp]
\begin{center}
\setlength{\unitlength}{.1cm}
\begin{picture}(100,70)(-50,-70)
\put(-35,-3){ \makebox(0,0){$\alpha_0$} }
\put(-32,-6){ \framebox(6,6){$f$} }
\put(-26,-6){ \framebox(6,6){\relax} }
\put(-23,-3){ \circle*{1} }
\put(-23,-3){ \vector(-1,-2){6} }
\put(-20,-6){ \framebox(6,6){\relax} }
\put(-17,-3){ \circle*{1} }
\put(-17,-3){ \vector(1,-4){13} }
\put(-14,-6){ \framebox(6,6){\relax} }
\put(-11,-3){ \circle{1} }
\put(-11,-3){ \vector(1,0){19} }
\put(-47,-21){ \makebox(0,0){$\alpha_1$} }
\put(-44,-24){ \framebox(6,6){$f$} }
\put(-38,-24){ \framebox(6,6){\relax} }
\put(-35,-21){ \circle*{1} }
\put(-35,-21){ \vector(0,-1){36} }
\put(-32,-24){ \framebox(6,6){\relax} }
\put(-29,-21){ \circle*{1} }
\put(-29,-21){ \vector(1,-2){18} }
\put(-26,-24){ \framebox(6,6){\relax} }
\put(-23,-21){ \makebox(0,0){$\bfnil$} }
\put(-41,-63){ \makebox(0,0){$\alpha_2$} }
\put(-38,-66){ \framebox(6,6){$a$} }
\put(-32,-66){ \framebox(6,6){\relax} }
\put(-29,-63){ \makebox(0,0){$\bfnil$} }
\put(-12,-63){ \makebox(0,0){$\alpha_3$} }
\put(-9,-66){ \framebox(6,6){$b$} }
\put(-3,-66){ \framebox(6,6){\relax} }
\put(0,-63){ \circle{1} }
\put(0,-63){ \vector(1,0){14} }
\put(13,-3){ \makebox(0,0){$\alpha_4$} }
\put(16,-6){ \framebox(6,6){$g$} }
\put(22,-6){ \framebox(6,6){\relax} }
\put(25,-3){ \circle*{1} }
\put(25,-3){ \vector(-1,-1){54} }
\put(28,-6){ \framebox(6,6){\relax} }
\put(31,-3){ \circle*{1} }
\put(31,-3){ \vector(-1,-2){27} }
\put(34,-6){ \framebox(6,6){\relax} }
\put(37,-3){ \makebox(0,0){$\bfnil$} }
\put(19,-63){ \makebox(0,0){$\alpha_5$} }
\put(22,-66){ \framebox(6,6){$c$} }
\put(28,-66){ \framebox(6,6){\relax} }
\put(31,-63){ \makebox(0,0){$\bfnil$} }
\end{picture}
\end{center}
\caption{\label{fig:val-repl-heap}
Linked structure showing $\val$ and $\repl$ links.}
\end{figure}
By following links from\/ $\alpha_0$ in the table we can construct the
six ground terms in\/ \mbox{$\val^*_{\repl}(\alpha_0)$}:
\mbox{$\val^*(\alpha_0)=f(f(a,b),b)$},
\mbox{$f(f(a,b),c)$}, \mbox{$f(f(a,c),b)$}, \mbox{$f(f(a,c),c)$},
\mbox{$g(a,b)$}, and\/ \mbox{$\val^{\mbox{\it
max}}_{\repl}(\alpha_0)=g(a,c)$}. Every equality between these terms
is a logical consequence of\/ $\mathbold{T}_{\theeqnex}$, and all of these
equalities may be read immediately from the data structure by
following links from\/ $\alpha_0$.
\end{example}
\subind{parameter valuation!replaced|)}\subind{replaced parameter valuation|)}
\notind{$\val^{\mbox{\it max}}_{\repl}$ (replaced parameter valuation)|)}
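The maximal reading through $\repl$ links also admits a short recursive sketch. The Python fragment below (same illustrative encoding: parameters as strings, absent keys as $\bfnil$, {\tt None} for ``undefined'') follows $\repl$ as far as possible before consulting $\val$; on the tables of the example above it reads back \mbox{$g(a,c)$}:

```python
# A sketch of val^max_repl: follow repl links maximally, then read the
# signature through val, recursing on the arguments.
def val_max(val, repl, alpha, _visiting=frozenset()):
    while alpha in repl:                # follow repl links as far as possible
        if alpha in _visiting:
            return None                 # a repl loop: undefined
        _visiting = _visiting | {alpha}
        alpha = repl[alpha]
    if alpha in _visiting:
        return None                     # a val loop: undefined
    sig = val.get(alpha)
    if sig is None:
        return None                     # bfnil encountered: undefined
    args = [val_max(val, repl, b, _visiting | {alpha}) for b in sig[1:]]
    if any(a is None for a in args):
        return None
    return (sig[0], *args)

# The tables of the example above.
val  = {"a0": ("f", "a1", "a3"), "a1": ("f", "a2", "a3"),
        "a2": ("a",), "a3": ("b",), "a4": ("g", "a2", "a3"),
        "a5": ("c",)}
repl = {"a0": "a4", "a3": "a5"}
```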
\subind{signature index|(}
A prime weakness of data structures based on parameter valuations and
replacements is that both functions require a parameter as argument.
Given a newly constructed signature, there is no direct way, other than
searching through the parameter valuation, to discover whether
information on that signature is already available. Signature indexes are
intended precisely to allow a newly constructed signature to be
translated to an equivalent parameter. While finitely based parameter
valuations and replacements are normally implemented by direct memory
access, using parameters as addresses, the number of possible signatures
is generally too great to allow such an implementation of a finitely
based signature index. General-purpose table-look-up methods are used
instead, usually hash tables \cite{Knuth-3}\auind{Knuth, D.
E.}.\subind{hash table} A typical application of a hash-table
representation of a signature index is the {\em hashed cons\/}
optimization in Lisp\subind{Lisp}\subind{hashed cons}
\cite{Spitzen-Levitt-Robinson}, \auind{Spitzen, J. M.}\auind{Levitt,
K. N.}\auind{Robinson, L.} where every newly constructed node is
looked up in a hash table to see whether it already exists in the
heap---if it does, the existing node may be shared instead of creating
another copy. The most obvious use of a signature index
$\ind$, such as the hashed cons application, requires that whenever
\mbox{$\ind(f(\beta_1,\ldots,\beta_i))=\alpha\neq\bfnil$}, then
\mbox{$\val(\alpha)=f(\beta_1,\ldots,\beta_i)$}; that is, $\val$ is a
partial inverse to $\ind$. It may be advantageous in some cases to
let \mbox{$\ind(f(\beta_1,\ldots,\beta_i))$} be a parameter known to
be equal to \mbox{$f(\beta_1,\ldots,\beta_i)$}. The proper logical
interpretation of
\mbox{$\ind(f(\beta_1,\ldots,\beta_i))=\alpha\neq\bfnil$} is merely
the formula \mbox{$f(\beta_1,\ldots,\beta_i)\formeq\alpha$}. So,
$\ind$ provides the same type of logical information as $\val$, but
allows access to that information through a signature argument,
instead of a parameter argument.
\subind{signature index|)}
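The hashed-cons discipline just described is easy to sketch, again taking dicts for $\val$ and $\ind$ with absent keys standing for $\bfnil$ (the generated parameter names are illustrative, and a real implementation would of course use heap addresses and a hash table):

```python
import itertools

_fresh = ("n%d" % i for i in itertools.count())  # illustrative parameter names

# Build a node for a signature only when ind does not already supply a
# parameter with that value; otherwise share the existing node.
def hashed_cons(val, ind, sig):
    alpha = ind.get(sig)
    if alpha is not None and val.get(alpha) == sig:
        return alpha                # the signature already has a node: share
    alpha = next(_fresh)
    val[alpha] = sig                # store the new node ...
    ind[sig] = alpha                # ... and index its signature
    return alpha

val, ind = {}, {}
leaf = hashed_cons(val, ind, ("a",))
node = hashed_cons(val, ind, ("f", leaf, leaf))
```

A second request for the signature $a$ returns the same parameter as the first, so equal subterms are shared rather than copied.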
\subsubsection[Logical Interpretation]
{Logical Interpretation of Term Data Structures}
\label{sec:interpret-term-structures}
\subind{term!data structures!logical interpretation|(}
Each point in the graph of a parameter valuation $\val$, a parameter
replacement $\repl$, and a signature index $\ind$ represents an
equation. An entire data structure consisting of finitely based
functions $\val$, $\repl$, and $\ind$ represents a suitably quantified
conjunction of these equations. For definiteness, suppose that inputs
and outputs are rooted at the parameter $\alpha_0$. Then the logical
meaning of $\val$, $\repl$, and $\ind$ is the conjunction of all their
equations, with all parameters {\em except\/} $\alpha_0$ existentially
quantified.
\begin{definition}
\label{def:logical-interpretation}
Let\/ \mbox{$\val,\repl,\ind$} be a parameter valuation, a parameter
replacement, and a signature index, respectively, all three finitely
based. Let\/ \mbox{$\alpha_1,\ldots,\alpha_n$} be all of the
parameters\/ $\beta$ occurring in the finite basis of the domain of\/
$\val$ or of $\repl$ (\mbox{$\val(\beta)\neq\bfnil$} or\/
\mbox{$\repl(\beta)\neq\bfnil$}), or in the range of\/ $\repl$, or as a
component of a signature in the finite basis of the domain of\/ $\ind$
or in the range of\/ $\val$, except for the input/output parameter\/
$\alpha_0$. The\/ {\em logical interpretation\/} of\/
\mbox{$\val,\repl,\ind$} is the formula $F_{\val,\repl,\ind}$ defined
by $$F_{\val,\repl,\ind}=(\exists\alpha_1,\ldots\alpha_n\quantsep G)$$
where $G$ is the conjunction of all the equations
\begin{enumerate}
\item \mbox{$\beta\formeq f(\gamma_1,\ldots,\gamma_i)$} where\/
\mbox{$\val(\beta)= f(\gamma_1,\ldots,\gamma_i)$}
\item \mbox{$\beta\formeq\gamma$} where\/ \mbox{$\repl(\beta)=\gamma$}
\item \mbox{$f(\beta_1,\ldots,\beta_i)\formeq\gamma$} where\/
\mbox{$\ind(f(\beta_1,\ldots,\beta_i))=\gamma$}
\end{enumerate}
\end{definition}
\begin{example}
\label{exa:logical-data-structure}
Consider the finitely based parameter valuation\/ $\val$ and parameter
replacement\/ $\repl$ discussed in Example~\ref{exa:val-repl}, and
shown in Table~\ref{tab:val-repl} and Figure~\ref{fig:val-repl-heap}.
If\/ $\alpha_0$ is the parameter used for the root of the input and
output, then the logical interpretation of\/ $\val$ and $\repl$ is
$$\begin{array}{rl}
\exists\alpha_1,\ldots,\alpha_5\quantsep &
\alpha_0\formeq f(\alpha_1,\alpha_3)\land\alpha_1\formeq f(\alpha_2,\alpha_3)\land
\alpha_2\formeq a\land\alpha_3\formeq b\land \\
& \alpha_4\formeq g(\alpha_2,\alpha_3)\land\alpha_5\formeq c\land\alpha_0\formeq \alpha_4\land
\alpha_3\formeq \alpha_5
\end{array}$$
\end{example}
The interpretation of parameter valuations, parameter replacements,
and signature indexes as encodings of existentially quantified
conjunctions of equations makes it much easier to ensure correctness
of proposed algorithms for manipulating data structures based on these
functions. The essential idea is that a transformation of a data
structure from a state representing $\val_0$, $\repl_0$, $\ind_0$ to
one representing $\val_1$, $\repl_1$, $\ind_1$ in the computation of
an equational program $\mathbold{T}$ is logically permissible if and only
if the formula represented by $\val_1$, $\repl_1$, $\ind_1$ is a
logical consequence of $\mathbold{T}$ plus the formula represented by
$\val_0$, $\repl_0$, $\ind_0$. So, an evaluation algorithm may take
input $s$, start in a state where \mbox{$\val^{\mbox{\it
max}}_{\repl}(\alpha_0)=s$}, and apply permissible transformations to reach a
state where \mbox{$t\in\val^{\mbox{\it max}}_{\repl}(\alpha_0)$}, for
some normal form $t$. The permissibility of the transformations
guarantees that $t$ is a correct answer.
The interpretation of the term data structures as sets of formulae
applies the concepts of logic programming to the implementation of a
logic programming language. A similar use of logical concepts within the
implementation of a logic programming language occurs in recent
presentations of {\em unification\/}\subind{unification!as equation
transformation} algorithms (used in the implementation of Prolog) as processes
that derive solutions to equations, where every intermediate step
represents a new and simpler set of equations to be solved
\cite{Kapur-Krishnamoorthy-Narendran,Martelli-Montanari}.
\auind{Kapur, D.}\auind{Krishnamoorthy, M. S.}\auind{Narendran, P.}
\auind{Martelli, A.}\auind{Montanari, U.}
The view of unification as transformation of equations to be solved
clarifies the correctness of a number of clever and efficient
algorithmic techniques.
Similarly, the logical view of term data structures shows immediately
that a wide range of transformations of such data structures are
correct, leaving an implementer free to choose the ones that appear to
be most efficient. In order to represent new terms during a
computation, we must be careful not to introduce spurious information
about the input/output parameter $\alpha_0$, nor to postulate the
solvability of nontrivial equations. The concept of {\em
reachability,} defined below, may be used to enforce both of these
conditions.
\begin{definition}
\label{def:reachable}
\subind{reachable parameter}\subind{parameter!reachable}
The set of parameters\/ {\em reachable\/} from a given parameter\/
$\alpha$ is defined inductively by
\begin{enumerate}
\item $\alpha$ is reachable from\/ $\alpha$
\item If\/ $\beta$ is reachable from\/ $\alpha$, and one of the
following holds as well
\begin{enumerate}
\item \mbox{$\val(\gamma)=f(\ldots,\beta,\ldots)$}
\item \mbox{$\val(\beta)=f(\ldots,\gamma,\ldots)$}
\item \mbox{$\repl(\gamma)=\beta$}
\item \mbox{$\repl(\beta)=\gamma$}
\item \mbox{$\ind(f(\ldots,\gamma,\ldots))=\beta$}
\item \mbox{$\ind(f(\ldots,\beta,\ldots))=\gamma$}
\end{enumerate}
then\/ $\gamma$ is reachable from\/ $\alpha$
\end{enumerate}
\end{definition}
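Reachability is just connectivity in the undirected graph induced by the three tables: each of the clauses 2(a)--2(f) links a parameter to a signature component, a $\repl$ partner, or an $\ind$ entry, in both directions. The following sketch (same illustrative dict encoding as before) makes this explicit with a plain graph search:

```python
# Compute the set of parameters reachable from alpha.
def reachable(val, repl, ind, alpha):
    edges = {}
    def link(x, y):
        edges.setdefault(x, set()).add(y)
        edges.setdefault(y, set()).add(x)
    for beta, sig in val.items():       # clauses (a) and (b)
        for gamma in sig[1:]:
            link(beta, gamma)
    for beta, gamma in repl.items():    # clauses (c) and (d)
        link(beta, gamma)
    for sig, beta in ind.items():       # clauses (e) and (f)
        for gamma in sig[1:]:
            link(beta, gamma)
    seen, stack = {alpha}, [alpha]
    while stack:                        # standard depth-first search
        for y in edges.get(stack.pop(), ()):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen
```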
Permissible transformations include
\subind{term!data structures!transformations}
\begin{enumerate}
\item Building new terms: when\/ $\alpha$ is not reachable from any of\/
\mbox{$\alpha_0,\beta_1,\ldots,\beta_i$}, reassign\/ \mbox{$\val(\alpha)\assign
f(\beta_1,\ldots,\beta_i)$}. (This allows both bottom-up and
top-down construction of directed-acyclic representations of terms,
using names completely unconnected to\/ $\alpha_0$. More liberal
conditions than the unreachability of\/ $\alpha$ from\/
\mbox{$\alpha_0,\beta_1,\ldots,\beta_i$} are possible, but they are
somewhat complex to define.)
\item Representing rewriting in $\repl$: when some\/
\mbox{$s\in\val^*_{\repl}(\alpha)$}
rewrites to some\/ \mbox{$t\in\val^*_{\repl}(\beta)$}, reassign\/
\mbox{$\repl(\alpha)\assign\beta$}.
\item Rewriting in\/ $\val$: when\/ \mbox{$\repl(\alpha)\neq\bfnil$},
reassign\/ \mbox{$\val(\alpha)\assign\val(\repl(\alpha))$}.
\item Compressing paths in\/
$\repl$: when\/ \mbox{$\repl(\repl(\alpha))\neq\bfnil$}, reassign\/
\mbox{$\repl(\alpha)\assign\repl(\repl(\alpha))$}.
\item Rewriting arguments in\/ $\val$: when\/
\mbox{$\val(\alpha)=f(\beta_1,\ldots,\beta_i)$} and\/
\mbox{$\repl(\beta_j)\neq\bfnil$}, reassign\/
\mbox{$\val(\alpha)\assign f(\beta_1,\ldots,\repl(\beta_j),\ldots,\beta_i)$}.
\item Opportunistic sharing: when\/
\mbox{$\ind(\val(\alpha))\neq\bfnil$} and\/
\mbox{$\ind(\val(\alpha))\neq\alpha$}, reassign\/
\mbox{$\repl(\alpha)\assign\ind(\val(\alpha))$}.
\item Indexing: when\/ \mbox{$\val(\alpha)=f(\beta_1,\ldots,\beta_i)$},
reassign \mbox{$\ind(f(\beta_1,\ldots,\beta_i))\assign\alpha$}.
\item Garbage collecting: reassign\/
\mbox{$\val(\alpha)\assign\bfnil$}, and/or\/
\mbox{$\repl(\alpha)\assign\bfnil$}, and/or\/
\mbox{$\ind(f(\ldots))\assign\bfnil$}.
(This is always {\em permissible,} because it only erases assertions.
It is\/ {\em desirable\/} only when the information removed
is not useful for the remainder of the computation.)
\end{enumerate}
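Two of these transformations, path compression (4) and argument rewriting (5), can be sketched as in-place updates of the dict encoding used earlier (absent keys standing for $\bfnil$; the guards are exactly the side conditions above, so neither update changes \mbox{$\val^{\mbox{\it max}}_{\repl}(\alpha_0)$}):

```python
# Transformation 4: repl(alpha) := repl(repl(alpha)), shortening repl chains.
def compress_path(repl, alpha):
    beta = repl.get(alpha)
    if beta is not None and repl.get(beta) is not None:
        repl[alpha] = repl[beta]

# Transformation 5: replace the j-th argument of val(alpha) by its repl link.
def rewrite_argument(val, repl, alpha, j):
    sig = val.get(alpha)
    if sig is not None and repl.get(sig[j + 1]) is not None:
        sig = list(sig)
        sig[j + 1] = repl[sig[j + 1]]
        val[alpha] = tuple(sig)

repl = {"a0": "a1", "a1": "a2"}
compress_path(repl, "a0")               # repl["a0"] becomes "a2"
val   = {"b0": ("f", "b1", "b2")}
repl2 = {"b2": "b3"}
rewrite_argument(val, repl2, "b0", 1)   # val["b0"] becomes ("f", "b1", "b3")
```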
A straightforward evaluator scans \mbox{$\val^{\mbox{\it
max}}_{\repl}(\alpha_0)$} to find an instance of a left-hand side $s$
of an equation rooted at some node $\beta$, uses transformation 1
repeatedly to build a copy of the corresponding right-hand side $t$ in
free nodes rooted at $\gamma$, then links $\beta$ to $\gamma$ by
transformation 2. Transformations 3--5 allow the same term
\mbox{$\val^{\mbox{\it max}}_{\repl}(\alpha_0)$} to be scanned more
efficiently, by reducing the number of $\repl$ links that must be
followed. Transformations 6 and 7 are used for optimizations such as {\em
hashed cons, congruence closure,} and {\em memo functions\/} to avoid
re-evaluating the same subterm when it is constructed repeatedly in a
computation. Section~\ref{sec:drivers} discusses several
optimizations using transformations 3--7.
The existential quantification of all parameters other than $\alpha_0$
is required for the logical correctness of transformation 1 (term
building), which changes the meaning of some existentially quantified
parameter, without affecting any assertion about $\alpha_0$. 2--8 are
permissible independent of the quantification. Notice that only
transformation 2 depends on the given equational program, and only 2
adds information to the data structure. 1 is needed to build new
terms required by 2, but by itself 1 does not change the logical
assertions about $\alpha_0$. 3--8 can only {\em reduce\/} the
information given by $\val$ and $\repl$, but they may improve the
efficiency of access to information that is retained. 3--7 never
change \mbox{$\val^{\mbox{\it max}}_{\repl}(\alpha_0)$}.
Transformation 8 is normally applied only to nodes that are
inaccessible from $\alpha_0$, in which case 8 also preserves
\mbox{$\val^{\mbox{\it max}}_{\repl}(\alpha_0)$}.
The logical interpretation introduced in
Definition~\ref{def:logical-interpretation} seems to be the most
natural one for explaining currently known techniques for implementing
equational logic programming, but others are possible, and might lead
to useful extensions of, or variations on, the current sort of
equational logic programming. Let \mbox{$\alpha_1,\ldots,\alpha_n$} be
all of the parameters used in \mbox{$\val,\repl,\ind$}, other than the
input/output parameter $\alpha_0$, and let $G$ be the conjunction of
all the equations represented by points in the graphs of
\mbox{$\val,\repl,\ind$}, just as in
Definition~\ref{def:logical-interpretation}. Two other interesting
logical interpretations worthy of study are
\begin{enumerate}
\item
\mbox{$(\exists\alpha_0,\alpha_1,\ldots,\alpha_n\quantsep
G)\limplies(\exists\alpha_1,\ldots,\alpha_n\quantsep G)$}
\item \mbox{$\forall\alpha_0,\alpha_1,\ldots,\alpha_n\quantsep
(G\limplies(\mathord{\mbox{{\it input\/}}}\formeq\alpha_0))$}, where
$\mathord{\mbox{{\it input\/}}}$ is a new $0$-ary symbol
\end{enumerate}
The first proposal above allows introduction of arbitrary structures
not connected to the input/output variable $\alpha_0$. The second one
allows for solution of equations, where every solution to the output
is guaranteed to be a solution to the input problem as well.
\subind{term!data structures!logical interpretation|)}
\subind{term!data structures|)}
\subsection{Pattern-Matching and Sequencing Methods}
\label{sec:pattern-matching}
\subind{pattern-matching|(}\subind{sequencing of equational evaluation|(}
Given a representation of a term, the implementation of an equational
logic program must perform {\em pattern-matching\/} to discover
instances of left-hand sides of equations that may be rewritten, and
must apply {\em sequencing\/} techniques to determine which of several
such redexes to rewrite. These two conceptual tasks appear to be
inextricably connected, so it is best to provide a single procedure to
do both---that is, to find the next appropriate redex to rewrite in a
given term. In order to find and choose a redex in a term, a
procedure traverses the term until it has gathered enough information
about the symbols in the term to make its choice. The full details of
pattern-matching and sequencing methods are too long for this chapter.
So, I describe two basic approaches to the problem and illustrate them
by examples. The reader who needs more detail should
consult
\cite{Huet-Levy,Hoffmann-O'Donnell-POPL1,Hoffmann-O'Donnell-Strandh,O'Donnell-MIT,Klop-term,Klop-Middledorp}.
\auind{Huet, G.}\auind{L\'evy, J.-J.}\auind{Hoffmann, C. M.}
\auind{O'Donnell, M. J.}\auind{Klop, J. W.}\auind{Middeldorp, A.}
\auind{Strandh, R. I.}
\subsubsection[$\Omega$-Terms]
{Representing Sequencing Information by $\Omega$-Terms}
\label{sec:omega-terms}
A natural way to represent partial information about a term is to
present the known structure as a term, with a special symbol
($\Omega$) representing unknown portions.
\begin{definition}
\label{def:omega-term}
\subind{omega term}\notind{$\mathbold{T}^{\Omega}_{P}$ ($\Omega$-terms)}
\notind{$\Omega$ (unknown subterm)}
The set\/ $\mathbold{T}^{\Omega}_{P}$ of\/ {\em $\Omega$-terms\/} is
defined in the same way as the set\/ $\mathbold{T}_P$ of terms
(Definition~\ref{def:FOPC-formula} of the chapter `Introduction:
Logic and Logic Programming Languages'),
except the new constant $\Omega$ is added to the set\/ $\mbox{\bf
Fun}_0$ of symbols of arity\/ $0$.
An\/ $\Omega$-term\/ $s$ is\/ {\em less defined than\/} an\/
$\Omega$-term\/ $t$, written\/ $s\sqsubseteq t$, if and only if\/ $s$
is the result of replacing zero or more subterms of\/ $t$ by\/
$\Omega$.\notind{$\sqsubseteq$ (less defined relation on $\Omega$-terms)}
\mbox{$s\sqcap t$} denotes the greatest lower bound of\/ $s$ and\/ $t$.
\notind{$\sqcap$ (greatest lower bound of $\Omega$-terms)}
When\/ $\Omega$-terms\/ $s$ and\/ $t$ have a common upper bound (that
is, when there is a\/ $u$ such that\/ \mbox{$s\sqsubseteq u$} and\/
\mbox{$t\sqsubseteq u$}), we write\/ $s\uparrow t$.
\end{definition}
\notind{$\uparrow$ (consistency relation on $\Omega$-terms)}
The $\Omega$ in an $\Omega$-term behaves formally much like a variable,
except that each occurrence of the same symbol $\Omega$ is treated as
a {\em unique\/} variable, occurring only at that location.
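The relations $\sqsubseteq$, $\sqcap$, and $\uparrow$ all admit direct structural recursions. In the sketch below (Python, with {\tt None} playing the role of $\Omega$ and a term written as a tuple \mbox{$(f,t_1,\ldots,t_i)$}; the encoding is illustrative):

```python
# Omega-terms: None is Omega, otherwise a tuple (symbol, arg1, ..., argi).
def less_defined(s, t):
    """s is less defined than t: s is t with some subterms replaced by Omega."""
    if s is None:
        return True
    if t is None or s[0] != t[0] or len(s) != len(t):
        return False
    return all(less_defined(a, b) for a, b in zip(s[1:], t[1:]))

def glb(s, t):
    """The greatest lower bound of s and t, which always exists."""
    if s is None or t is None or s[0] != t[0] or len(s) != len(t):
        return None
    return (s[0], *(glb(a, b) for a, b in zip(s[1:], t[1:])))

def compatible(s, t):
    """s and t have a common upper bound."""
    if s is None or t is None:
        return True
    return (s[0] == t[0] and len(s) == len(t)
            and all(compatible(a, b) for a, b in zip(s[1:], t[1:])))
```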
There is an elegant and simple procedure, called {\em melting,} for
\subind{melting!$\Omega$-terms|(}
computing all of the possible effects of $\omega$-rewriting on an
$\Omega$-term.
\begin{definition}
\label{def:melting}
\notind{$\patt$ (left-hand side patterns in equations)}
The function\/ \mbox{$\patt:\mathbold{T}_P\rightarrow\mathbold{T}^{\Omega}_P$} is
defined by
$$\patt(t)=\subst{\Omega,\ldots,\Omega}{x_1,\ldots,x_m}{t}$$ where\/
\mbox{$x_1,\ldots,x_m$} are all of the variables occurring in\/ $t$.
Extend\/ $\patt$ to sets of equations
\mbox{$\mathbold{T}=\{l_1\formeq r_1,\ldots,l_n\formeq r_n\}$}
by $$\patt(\mathbold{T})=\{\patt(l_1),\ldots,\patt(l_n)\}$$
Given a set\/ $\mathbold{T}$ of equations, and an\/ $\Omega$-term\/
\mbox{$s\in\mathbold{T}^{\Omega}_P$}, $s$ is transformed into\/
\mbox{$\melt^{\boldscript{T}}_{\omega}(s)$} by the following extremely
\notind{$\melt^{\boldscript{T}}_{\omega}$ (melting $\Omega$-terms)|(}
simple polynomial time procedure:
\begin{enumerate}
\item If\/ \mbox{$s=\subst{u}{x}{t}$}, where\/ \mbox{$u\neq\Omega$}
and\/ \mbox{$u\uparrow p$} for some\/ \mbox{$p\in\patt(\mathbold{T})$},
then replace\/ $s$ by\/ \mbox{$\subst{\Omega}{x}{t}$}.
\item Repeat (1) above until it is not applicable.
\end{enumerate}
\end{definition}
If $s$ is an $\Omega$-term representing current information about a
term $t$ that we are rewriting to normal form, then
\mbox{$\melt^{\boldscript{T}}_{\omega}(s)$} represents precisely the
information in $s$ that is guaranteed to hold for all $t'$ such that
\mbox{$t\rewritesm{\boldscript{T}}{\omega}t'$}. By marking an
occurrence with a new inert symbol, and melting, we can determine
whether $\omega$-rewriting may eliminate that occurrence without
rewriting it---that is, we determine whether the occurrence is
strongly needed.
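Melting itself may be sketched as a single bottom-up recursion: melt the arguments first, then replace the resulting subterm by $\Omega$ if it is compatible with some pattern. Since replacing a subterm by $\Omega$ only makes enclosing subterms more likely to be compatible, the bottom-up pass reaches the fixpoint of the procedure in the definition. The Python fragment below ({\tt None} for $\Omega$, terms as tuples, as in the previous sketch) uses the patterns of \mbox{$f(f(x,y),z)\formeq g(x,z)$} and \mbox{$b\formeq c$}:

```python
def compatible(s, t):
    """s and t (with None as Omega) have a common upper bound."""
    if s is None or t is None:
        return True
    return (s[0] == t[0] and len(s) == len(t)
            and all(compatible(a, b) for a, b in zip(s[1:], t[1:])))

def melt(patterns, s):
    """Melt bottom-up: a subterm compatible with some pattern becomes Omega."""
    if s is None:
        return None
    melted = (s[0], *(melt(patterns, a) for a in s[1:]))
    if any(compatible(melted, p) for p in patterns):
        return None
    return melted

# patt(T) for T = { f(f(x,y),z) = g(x,z),  b = c }
patterns = [("f", ("f", None, None), None), ("b",)]
```

On these patterns the term \mbox{$f(f(a,b),b)$} melts all the way to $\Omega$ (every part of it may be rewritten away), whereas \mbox{$f(a,a)$} melts to itself.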
\begin{proposition}
\label{pro:melting}
\subind{strongly needed redex}\subind{redex!strongly needed}
Let\/ $s$ be an\/ $\Omega$-term, and let\/ $\alpha$ be an occurrence in\/
$s$. Let\/ $s'$ be the result of replacing $\alpha$ in $s$ with a new
constant symbol, $\bullet$.\notind{$\bullet$ (marker for strongly needed
redex)} $\alpha$ is strongly needed if and only if\/
\mbox{$\melt^{\boldscript{T}}_{\omega}(s)$} contains an occurrence of\/
$\bullet$.
\end{proposition}
\subind{melting!$\Omega$-terms|)}
\notind{$\melt^{\boldscript{T}}_{\omega}$ (melting $\Omega$-terms)|)}
Huet and L\'evy \cite{Huet-Levy}\auind{Huet, G.}\auind{L\'evy, J.-J.}
propose a pattern-matching and sequencing technique that accumulates
an $\Omega$-term $s_\alpha$ (initially just $\Omega$), representing
information about the subterm at a particular node $\alpha$ in a term
$t$. At each step, they choose a strongly needed occurrence of
$\Omega$ in $s_\alpha$, and extend $s_\alpha$ by reading the symbol at
the corresponding subterm occurrence in $t$. The information in
$s_\alpha$ may be used to control pattern-matching and sequencing as
follows:
\begin{enumerate}
\item If there is a pattern \mbox{$p\in\patt(\mathbold{T})$} such that
\mbox{$p\sqsubseteq s_\alpha$}, then $\alpha$ is a redex occurrence
(in strongly sequential systems we always get\/ \mbox{$s_\alpha=p$},
but it is better to think of the more general case).
\item If there is no pattern \mbox{$p\in\patt(\mathbold{T})$} such that
\mbox{$s_\alpha\uparrow p$}, then $\alpha$ is not currently a redex
occurrence.
\item If \mbox{$\melt^{\boldscript{T}}_{\omega}(s_\alpha)\neq\Omega$},
then $\alpha$ will never be a redex occurrence.
\item It is safe to query the symbol in $t$ corresponding to a
strongly needed occurrence of $\Omega$ in $s_\alpha$, and to rewrite
any redex occurring there, since it must be strongly needed.
\end{enumerate}
In cases 2 and 3, Huet and L\'evy move to a proper descendant
$\alpha'$ of $\alpha$ corresponding to the largest subterm $s'$ of $s_\alpha$,
containing the newly read symbol, such that \mbox{$s'\uparrow p$} for
some \mbox{$p\in\patt(\mathbold{T})$} (because of strong sequentiality, in
fact \mbox{$s'\sqsubseteq p$}). Then they let \mbox{$s_{\alpha'}=s'$} and
proceed at $\alpha'$. In case 1, the fact that we have reached
$\alpha$ by safe moves implies that the redex is strongly needed. So,
they rewrite, and continue processing at an appropriate ancestor of
$\alpha$, rereading whatever symbols have been changed by rewriting at
$\alpha$.
\begin{figure}[btp]
\begin{center}
\setlength{\unitlength}{.1 cm}
\begin{picture}(110,60)(-55,-60)
\put(-40,-3){ \makebox(0,0){$f$} }
\put(-42,-5){ \vector(-1,-1){6} }
\put(-38,-5){ \vector(1,-1){6} }
\put(-50,-13){ \makebox(0,0){$\Omega$} }
\put(-30,-13){ \makebox(0,0){$f$} }
\put(-32,-15){ \vector(-1,-1){6} }
\put(-28,-15){ \vector(1,-1){6} }
\put(-40,-23){ \makebox{$f$} }
\put(-20,-23){ \makebox{$a$} }
\put(-42,-25){ \vector(-1,-1){6} }
\put(-38,-25){ \vector(1,-1){6} }
\put(-50,-33){ \makebox{$\Omega$} }
\put(-30,-33){ \makebox{$a$} }
\put(10,-3){ \makebox(0,0){$f$} }
\put(8,-5){ \vector(-1,-1){6} }
\put(12,-5){ \vector(1,-1){6} }
\put(0,-13){ \makebox(0,0){$b$} }
\put(20,-13){ \makebox(0,0){$f$} }
\put(18,-15){ \vector(-1,-1){6} }
\put(22,-15){ \vector(1,-1){10} }
\put(10,-23){ \makebox{$f$} }
\put(34,-27){ \makebox{$f$} }
\put(8,-25){ \vector(-1,-1){6} }
\put(12,-25){ \vector(1,-1){6} }
\put(32,-29){ \vector(-1,-1){6} }
\put(36,-29){ \vector(1,-1){6} }
\put(0,-33){ \makebox{$b$} }
\put(20,-33){ \makebox{$b$} }
\put(24,-37){ \makebox{$b$} }
\put(44,-37){ \makebox{$f$} }
\put(42,-39){ \vector(-1,-1){6} }
\put(46,-39){ \vector(1,-1){6} }
\put(34,-47){ \makebox(0,0){$f$} }
\put(54,-47){ \makebox(0,0){$b$} }
\put(32,-49){ \vector(-1,-1){6} }
\put(36,-49){ \vector(1,-1){6} }
\put(24,-57){ \makebox(0,0){$b$} }
\put(44,-57){ \makebox(0,0){$b$} }
\end{picture}
\end{center}
\caption{\label{fig:omega-terms}
Pictures of terms for sequencing example}
\end{figure}
\begin{example}
\label{exa:omega-sequencing}
\addtocounter{eqnex}{1}
Consider the strongly sequential system $$\begin{array}{c}
\mathbold{T}_{\theeqnex}=\{f(x,f(f(y,a),a))\formeq a,\;b\formeq a\} \\[3ex]
\patt(\mathbold{T}_{\theeqnex})=\{f(\Omega,f(f(\Omega,a),a)),\;b\}
\end{array}$$
and search for a strongly needed redex in the term
$$t=f(b,f(f(b,b),f(b,f(f(b,b),b))))$$
with occurrences\/ $\Lambda$ (the root), $1$ (leftmost\/ $b$), $2$
(\mbox{$f(f(b,b),f(b,f(f(b,b),b)))$}), $2.1$ (leftmost
\mbox{$f(b,b)$}), $2.1.1$, $2.1.2$ (second and third\/ $b$s), $2.2$
(\mbox{$f(b,f(f(b,b),b))$}), $2.2.1$ (fourth\/ $b$), $2.2.2$
(\mbox{$f(f(b,b),b)$}), $2.2.2.1$ (rightmost \mbox{$f(b,b)$}),
$2.2.2.1.1$, $2.2.2.1.2$, $2.2.2.2$ (fifth, sixth and seventh\/ $b$s).
The pattern\/ \mbox{$f(\Omega,f(f(\Omega,a),a))$}, and the term\/ $t$,
are shown pictorially in Figure~\ref{fig:omega-terms}.
The occurrence of\/ $\Omega$ chosen for expansion at each step is
underlined, and the subterm replacing it is shown in a box.
\begin{itemize}
\item Start at the root of\/ $t$, with\/ \mbox{$s_\Lambda=\underline{\Omega}$}.
\begin{itemize}
\item There is only one choice, so read the\/ $f$ at the root, and
expand to\/ \mbox{$s_\Lambda=\fbox{$f(\Omega,\underline{\Omega})$}$}.
\item Only the rightmost\/ $\Omega$ is strongly needed, so read the
corresponding symbol, and expand to\/
\mbox{$s_\Lambda=f(\Omega,\fbox{$f(\Omega,\underline{\Omega})$})$}.
\item Only the rightmost\/ $\Omega$ is strongly needed,
so expand to\/
\mbox{$s_\Lambda=
f(\Omega,f(\Omega,\fbox{$f(\Omega,\Omega)$}))$}.
\item The new\/ $\Omega$-term is incompatible with left-hand side
patterns (the\/ $f$ conflicts with an\/ $a$ in the pattern), so we
find the largest subterm containing the box
that is compatible with a pattern. That subterm is the second
principal subterm at $2$,
\mbox{$s'=f(\Omega,\fbox{$f(\Omega,\Omega)$})$},
so initialize\/ $s_2$ to\/ $s'$ and continue at\/ $2$.
\end{itemize}
\item Now process the second principal subterm of\/ $t$,
\mbox{$t_2=f(f(b,b),f(b,f(f(b,b),b)))$}, with
\mbox{$s_2=f(\Omega,f(\Omega,\underline{\Omega}))$}.
\begin{itemize}
\item Only the rightmost\/ $\Omega$ is strongly needed,
so expand to\/
\mbox{$s_2=f(\Omega,f(\Omega,\fbox{$f(\Omega,\Omega)$}))$}.
\item The new\/ $\Omega$-term is incompatible with the patterns
(the\/ $f$ conflicts with an\/ $a$ in the pattern), so we find\/
\mbox{$s'=f(\Omega,f(\Omega,\Omega))$}, initialize\/ $s_{2.2}$ to\/
$s'$, and continue at\/ $2.2$.
\end{itemize}
\item Now process the subterm\/ \mbox{$t_{2.2}=f(b,f(f(b,b),b))$} at\/
$2.2$, with\/ \mbox{$s_{2.2}=f(\Omega,f(\Omega,\underline{\Omega}))$}.
\begin{itemize}
\item Only the rightmost\/ $\Omega$ is strongly needed, so expand
to\/ \mbox{$s_{2.2}=f(\Omega,\fbox{$f(\Omega,\underline{\Omega})$})$}.
\item Only the rightmost\/ $\Omega$ is strongly needed,
so expand to\/
\mbox{$s_{2.2}=f(\Omega,f(\Omega,\fbox{$b$}))$}.
\item The new\/ $\Omega$-term is incompatible with the patterns
(the\/ $b$ conflicts with an\/ $a$ in the pattern), so we find\/
\mbox{$s'=b$}, initialize\/ $s_{2.2.2.2}$ to\/
$s'$, and continue at\/ $2.2.2.2$.
\end{itemize}
\item Now process the subterm\/ \mbox{$t_{2.2.2.2}=b$} at\/ $2.2.2.2$,
with\/ \mbox{$s_{2.2.2.2}=b$}. \mbox{$b\sqsubseteq s_{2.2.2.2}$},
and\/ \mbox{$b\in\patt(\mathbold{T}_{\theeqnex})$}, so this subterm
is a strongly needed redex. Rewrite it to\/ $a$, yielding
$$t=f(b,f(f(b,b),f(b,f(f(b,b),a))))$$
and continue at\/ $2.2$, using the last value for\/ $s_{2.2}$ before
reading\/ $2.2.2.2$:
\mbox{$s_{2.2}=f(\Omega,f(\Omega,\underline{\Omega}))$}.
\begin{itemize}
\item Expand the fourth\/ $\Omega$ again, but this time we read an\/
$a$ instead of a\/ $b$, yielding\/
\mbox{$s_{2.2}=f(\Omega,f(\underline{\Omega},\fbox{$a$}))$}.
\item Only the rightmost\/ $\Omega$ is strongly needed, so expand to\/
\mbox{$s_{2.2}=f(\Omega,f(\fbox{$f(\Omega,\underline{\Omega})$},a))$}.
\item Only the rightmost\/ $\Omega$ is strongly needed, so expand to\/
\mbox{$s_{2.2}=f(\Omega,f(f(\Omega,\fbox{$b$}),a))$}.
\item Again the new\/ $\Omega$-term is incompatible with the
patterns, and further processing at\/ $2.2.2.1.2$ rewrites the\/
$b$ to\/ $a$, yielding
$$t=f(b,f(f(b,b),f(b,f(f(b,a),a))))$$
We continue at\/ $2.2$, with the last version of\/ $s_{2.2}$ before
reading\/ $2.2.2.1.2$. Extend\/
\mbox{$s_{2.2}=f(\Omega,f(f(\Omega,\underline{\Omega}),a))$}
again, this time to
\mbox{$s_{2.2}=f(\Omega,f(f(\Omega,\fbox{$a$}),a))$}.
\item Now\/ \mbox{$f(\Omega,f(f(\Omega,a),a))\sqsubseteq s_{2.2}$},
and\/
\mbox{$f(\Omega,f(f(\Omega,a),a))\in\patt(\mathbold{T}_{\theeqnex})$},
so we have a strongly needed redex, which we rewrite to\/ $a$, yielding
$$t=f(b,f(f(b,b),a))$$
\end{itemize}
\item Continue at\/ $2$ with the last\/ $\Omega$-term before reading\/
$2.2$: \mbox{$s_2=f(\Omega,\underline{\Omega})$}.
\begin{itemize}
\item Expand the rightmost\/ $\Omega$ again, but this time we read an\/
$a$ instead of an\/ $f$, yielding\/
\mbox{$s_2=f(\Omega,\fbox{$a$})$}.
\item The new\/ $\Omega$-term is incompatible with the patterns, and
there is no compatible subterm, so we move back toward the root.
\end{itemize}
\item Continue at\/ $\Lambda$ with the last $\Omega$-term before
reading\/ $2.2$:
\mbox{$s_\Lambda=f(\Omega,f(\Omega,\underline{\Omega}))$}.
\begin{itemize}
\item This time we read an\/ $a$ instead of an\/ $f$, yielding:
\mbox{$s_\Lambda=f(\Omega,f(\underline{\Omega},\fbox{$a$}))$}.
\item Only the rightmost\/ $\Omega$ is strongly needed, so expand to\/
\mbox{$s_\Lambda=f(\Omega,f(\fbox{$f(\Omega,\underline{\Omega})$},a))$}.
\item Only the rightmost\/ $\Omega$ is strongly needed, so expand to\/
\mbox{$s_\Lambda=f(\Omega,f(f(\Omega,\fbox{$b$}),a))$}.
\item As before, the $\Omega$-term is incompatible with the
patterns; further processing at\/ $2.1.2$ discovers that\/
$b$ is a redex, and rewrites it to\/ $a$, yielding
$$t=f(b,f(f(b,a),a))$$
Then we continue at\/ $\Lambda$ with\/
\mbox{$s_\Lambda=f(\Omega,f(f(\Omega,\underline{\Omega}),a))$}
again, this time reading an\/ $a$ and extending to\/
\mbox{$s_\Lambda=f(\Omega,f(f(\Omega,\fbox{$a$}),a))$}.
\item Now we have a redex pattern\/
\mbox{$f(\Omega,f(f(\Omega,a),a))$} at the root. We rewrite to
$a$ and start over at the root.
\end{itemize}
\item In general, there might be more computation at the root, but in
this example we immediately get\/
\mbox{$s_\Lambda=a$}, which is incompatible with the patterns. Since
there is nowhere left to go, we have a normal form.
\end{itemize}
\end{example}
Huet and L\'evy use $\Omega$-terms in a pattern-matching and
sequentializing method that succeeds for every strongly sequential
system of equations. Several similar but less general methods have
been proposed in order to simplify the method in special cases
\cite{O'Donnell-MIT,Strandh-thesis,Strandh-classes,Durand-Salinier,Durand,Ramakrishnan-Sekar}.
\auind{O'Donnell, M. J.}\auind{Strandh, R. I.}\auind{Durand, I.}
\auind{Salinier, B.}\auind{Ramakrishnan, I. V.}\auind{Sekar, R. C.}
\subsubsection[Sets of Subpatterns]
{Representing Sequencing Information by Sets of Subpatterns}
\label{sec:subpattern-sets}
\subind{subpattern set!in pattern-matching/sequencing|(}
Another natural approach to pattern-matching and sequencing is to
focus on the left-hand sides of equations, and represent information
about a term according to the ways that it does and does not match
portions of those left-hand sides.
\begin{definition}
\label{def:subpattern-set}
Let\/ $\mathbold{T}$ be a set of equations. $\mathbold{U}_{\boldscript{T}}$ is
the set of all subterms of members of \mbox{$\patt(\mathbold{T})$}.
\notind{$\mathbold{U}_{\boldscript{T}}$ (universal subpattern set)}
A\/ {\em subpattern set\/} for\/ $\mathbold{T}$ is a subset\/
\mbox{$B\subseteq\mathbold{U}_{\boldscript{T}}$}.
\end{definition}
Subpattern sets may be used to present information about
pattern-matching and sequencing in several ways:
\begin{enumerate}
\item A {\em match set\/} is a subpattern set containing all of the
subpatterns known to match at a particular node in a term.
\subind{match set}\subind{subpattern set!match}
\item A {\em possibility set\/} is a subpattern set containing all of
the subpatterns that might match at a particular node in a term, as
the result of $\omega$-rewriting the proper subterms of that node.
\subind{possibility set}\subind{subpattern set!possibility}
\item A {\em search set\/} is a subpattern set containing subpatterns
to search for at a particular node, in order to contribute to a redex.
\subind{search set}\subind{subpattern set!search}
\end{enumerate}
Notice that match sets and possibility sets always contain $\Omega$
(except in the unusual case when $\mathbold{T}$ has no occurrence of a
variable), because everything matches $\Omega$. Search sets never
contain $\Omega$, since it is pointless to search for something that
is everywhere; they always contain \mbox{$\patt(\mathbold{T})$}, since
finding an entire redex occurrence is always useful.
In order to control pattern-matching and sequencing with subpattern
sets, associate a match set $M_\alpha$ and a possibility set
$P_\alpha$ with each node in a term $t$. Initially,
\mbox{$M_\alpha=\{\Omega\}$} and \mbox{$P_\alpha=\mathbold{U}_{\boldscript
T}$} for all $\alpha$. At all times the subterm at $\alpha$ will
match every subpattern in $M_\alpha$, but no subpattern that is not in
$P_\alpha$. That is, \mbox{$[M_\alpha,P_\alpha]$} is an interval in the lattice of
subpattern sets in which the true current description of the subterm
at $\alpha$ always resides. At every visit to a node $\alpha$, update
$M_\alpha$ and $P_\alpha$ based on the symbol $f$ at $\alpha$ and the
information at the children $\alpha_1,\ldots,\alpha_n$ of $\alpha$ as
follows:
$$\begin{array}{ccl}
M_\alpha & \assign & M_\alpha\cup\\
\multicolumn{3}{r}{\{f(m_1,\ldots,m_n)\in\mathbold{U}_{\boldscript
T}\setsep m_1\sqsubseteq
m'_1,\ldots,m_n\sqsubseteq m'_n\mbox{ for some
}m'_1\in M_{\alpha_1},\ldots,m'_n\in M_{\alpha_n}\}} \\[3ex]
P'_\alpha & \assign & P_\alpha\cap\\
\multicolumn{3}{r}{(\{f(p_1,\ldots,p_n)\in\mathbold{U}_{\boldscript{T}}\setsep
p_1\sqsubseteq p'_1,\ldots,p_n\sqsubseteq p'_n\mbox{ for some
}p'_1\in P_{\alpha_1},\ldots,p'_n\in P_{\alpha_n}\}\cup\{\Omega\})} \\[3ex]
P_\alpha & \assign & \left\{\begin{array}{ll}P'_\alpha & \mbox{if
}P'_\alpha\cap\patt(\mathbold{T})=\emptyset \\
\mathbold{U}_{\boldscript{T}} & \mbox{otherwise}\end{array}\right.
\end{array}$$
$P'_\alpha$ represents the set of all subpatterns that may match at
node $\alpha$, as the result of $\omega$-rewriting at {\em proper\/}
descendants of $\alpha$. Notice the similarity between the calculation
of $P_\alpha$ and the process of melting---$\mathbold{U}_{\boldscript T}$
above plays the role of $\Omega$ in melting.
Similarly, keep a search set $S_\alpha$ at each node that is visited
in the traversal, entering at the root with
\mbox{$S_\Lambda=\patt(\mathbold{T})$}. When moving from a node $\alpha$
to the $i$th child $\alpha.i$ of $\alpha$, update the search set at
$\alpha.i$ by
$$
\begin{array}{ccl}
S_{\alpha.i} & \assign & \patt(\mathbold{T})\cup\\
\multicolumn{3}{r}{\{s\setsep s\mbox{ is the }i\mbox{th principal subterm of a
member of }S_{\alpha}\cap P'_\alpha\}}
\end{array}
$$
The information in $M_\alpha$, $P_\alpha$, $S_\alpha$, and the corresponding
information at the children $\alpha_1,\ldots,\alpha_n$ of $\alpha$, may
be used to control pattern-matching and sequencing as follows:
\begin{enumerate}
\item If \mbox{$M_\alpha\cap\patt(\mathbold{T})\neq\emptyset$}, then
$\alpha$ is a redex occurrence.
\item If \mbox{$P_\alpha\cap\patt(\mathbold{T})=\emptyset$}, then $\alpha$
will never be a redex occurrence.
\item If \mbox{$M_\alpha\cap S_\alpha\neq\emptyset$}, then $\alpha$
contributes to a possible redex at one of its ancestors.
\item If \mbox{$P_\alpha\cap S_\alpha=\emptyset$}, then the symbol at
$\alpha$ will never contribute to a redex occurrence.
\item If there is no $\Omega$-term \mbox{$s\in S_\alpha$} whose $i$th
principal subterm is in $M_{\alpha.i}$, then it is safe to move to
$\alpha.i$ and process it further---in particular, any redex
occurrence reached by safe moves is strongly needed.
\end{enumerate}
Hoffmann and I proposed a pattern-matching and sequencing method based
on Match/Possibility/Search sets. The method appears heuristically to
succeed on most naturally defined orthogonal systems, but Jiefei Hong
noticed that there are
some strongly sequential systems for which it fails.
\begin{example}
\label{exa:mps-failure}
\addtocounter{eqnex}{1}
Consider the strongly sequential system
$$\mathbold{T}_{\theeqnex}=\{f(g(a,x),a)\formeq d,\;f(g(x,a),b)\formeq
e,\;c\formeq a\}$$
and search for a strongly needed redex in the term\/
\mbox{$f(g(c,c),c)$} with occurrences\/ $\Lambda$ (the root), $1$ (the
subterm \mbox{$g(c,c)$}), \mbox{$1.1$}, \mbox{$1.2$}, and $2$ (the
three occurrences of $c$, from left to right).
\begin{itemize}
\item Initially, we search at $\Lambda$, with
$$\begin{array}{cc}
S_\Lambda=\patt(\mathbold{T}_{\theeqnex})=
\{f(g(a,\Omega),a),\;f(g(\Omega,a),b),\;c\} &
M_\Lambda=\{\Omega\} \\[3ex]
\multicolumn{2}{c}{P_\Lambda=\mathbold{U}_{{\boldscript{T}}_{\theeqnex}}=
\{f(g(a,\Omega),a),\;f(g(\Omega,a),b),\;c,
\;g(a,\Omega),\;g(\Omega,a),\;a,\;b,\;\Omega\}}
\end{array}$$
Update\/ $M_\Lambda$ and $P_\Lambda$ based on\/
\mbox{$M_1=M_2=\{\Omega\}$},
\mbox{$P_1=P_2=\mathbold{U}_{{\boldscript{T}}_{\theeqnex}}$},
and the symbol\/ $f$ at the root---in this case\/ $M_\Lambda$ and
$P_\Lambda$ do not change, and
$$
P'_\Lambda=\{f(g(a,\Omega),a),\;f(g(\Omega,a),b),\;\Omega\}
$$
From condition (5) above, it is safe to
move to either of\/ $1$ or\/ $2$.
\end{itemize}
Suppose that we choose arbitrarily to move to\/ $1$.
\begin{itemize}
\item Now, we search at\/ $1$, with
$$\begin{array}{cc}
\multicolumn{2}{c}{S_1=\{f(g(a,\Omega),a),\;f(g(\Omega,a),b),\;c,\;
g(a,\Omega),\;g(\Omega,a)\}} \\[3ex]
M_1=\{\Omega\} &
P_1=\mathbold{U}_{{\boldscript{T}}_{\theeqnex}}
\end{array}$$
Update\/ $M_1$ and\/ $P_1$ to
$$\begin{array}{cc}
M_1=\{\Omega\} & P_1=P'_1=\{g(a,\Omega),\;g(\Omega,a),\;\Omega\}
\end{array}$$
The possibility set decreases in this case, because\/ there is no
left-hand side with $g$ at the root. Now by condition (5), it is\/
{\em not\/} safe to move to the first child of\/ $1$, $1.1$, because\/
\mbox{$g(\Omega,a)\in S_1$} has\/ $\Omega$ as its first principal
subterm, and\/ \mbox{$\Omega\in M_{1.1}$}. Similarly, it is not safe
to move to\/ $1.2$, because of\/ \mbox{$g(a,\Omega)\in S_1$}. The
method suggested in \cite{Hoffmann-O'Donnell-POPL1}
\auind{Hoffmann, C. M.}\auind{O'Donnell, M. J.} is not capable of backing
up to parents and siblings of\/ $1$ in order to decide which of the
children of\/ $1$ to visit first.
\end{itemize}
On the other hand, suppose we choose arbitrarily to move from\/ $\Lambda$
to\/ $2$ instead of\/ $1$.
\begin{itemize}
\item Now, we search at\/ $2$, with
$$\begin{array}{cc}
\multicolumn{2}{c}{S_2=
\{f(g(a,\Omega),a),\;f(g(\Omega,a),b),\;c,\;a,\;b\}} \\[3ex]
M_2=\{\Omega\} & P_2=\mathbold{U}_{\boldscript{T}}
\end{array}$$
Update\/ $M_2$ and\/ $P_2$ to
$$\begin{array}{ccc}
M_2=\{c,\;\Omega\} & P'_2=\{c,\;\Omega\} & P_2=\mathbold{U}_{\boldscript{T}}
\end{array}$$
Since\/ \mbox{$c\in M_2\cap\patt(\mathbold{T})$}, there is a redex, which
is strongly needed since we found it by safe moves, and we rewrite\/
$c$ to\/ $a$, yielding\/ \mbox{$f(g(c,c),a)$}. Recalculate $M_2$ and
$P_2$ to
$$\begin{array}{cc}
M_2=\{a,\;\Omega\} & P_2=P'_2=\{a,\;\Omega\}
\end{array}$$
\mbox{$a\in M_2\cap S_2$}, so continue back at\/ $\Lambda$ trying to
complete a redex using the\/ $a$ at\/ $2$.
\item Search again at\/ $\Lambda$; update\/ $M_\Lambda$, $P'_\Lambda$ and\/
$P_\Lambda$ again. $M_\Lambda$ and $P_\Lambda$ do not change, but now
$$
P'_\Lambda=\{f(g(a,\Omega),a),\;\Omega\}
$$
It is still safe to move to\/ $1$.
\item Search at\/ $1$ with
$$\begin{array}{cc}
\multicolumn{2}{c}
{S_1=\{f(g(a,\Omega),a),\;f(g(\Omega,a),b),\;c,\;g(a,\Omega)\}} \\[3ex]
M_1=\{\Omega\} & P_1=\mathbold{U}_{\boldscript{T}}
\end{array}$$
Notice that information from\/ $2$ has led to a smaller search set,
not containing\/ \mbox{$g(\Omega,a)$}. Update\/ $M_1$ and\/ $P_1$ to
$$\begin{array}{cc}
M_1=\{\Omega\} & P_1=P'_1=\{g(a,\Omega),\;g(\Omega,a),\;\Omega\}
\end{array}$$
This time it is safe to move to\/ $1.1$, but not to\/ $1.2$, so we
choose the former.
\item Search at\/ $1.1$ with
$$\begin{array}{cc}
\multicolumn{2}{c}{S_{1.1}=
\{f(g(a,\Omega),a),\;f(g(\Omega,a),b),\;c,\;a\}} \\[3ex]
M_{1.1}=\{\Omega\} & P_{1.1}=\mathbold{U}_{\boldscript{T}}
\end{array}$$
The results are the same as in the search at\/ $2$ above---$c$
rewrites to\/ $a$ and we continue back at\/ $1$.
\item Search again at\/ $1$, update\/ $M_1$, $P_1$ again to
$$\begin{array}{cc}
M_1=\{g(a,\Omega),\;\Omega\} & P_1=P'_1=\{g(a,\Omega),\;\Omega\}
\end{array}$$
\mbox{$g(a,\Omega)\in M_1\cap S_1$}, so continue back at\/ $\Lambda$.
\item Search one more time at\/ $\Lambda$, recalculating\/ $M_\Lambda$,
$P_\Lambda$ to
$$\begin{array}{ccc}
M_\Lambda=\{f(g(a,\Omega),a),\;\Omega\} &
P'_\Lambda=\{f(g(a,\Omega),a),\;\Omega\} &
P_\Lambda=\mathbold{U}_{\boldscript{T}}
\end{array}$$
\mbox{$f(g(a,\Omega),a)\in M_\Lambda\cap\patt(\mathbold{T})$}, so there is
a redex at\/ $\Lambda$. Rewrite it, yielding\/ $d$. Update\/
$M_\Lambda$, $P_\Lambda$ to
$$\begin{array}{cc}
M_\Lambda=\{\Omega\} & P_\Lambda=P'_\Lambda=\{\Omega\}
\end{array}$$
\mbox{$P_\Lambda\cap\patt(\mathbold{T})=\emptyset$}, so the\/ $d$ at\/
$\Lambda$ will never change. Since\/ $\Lambda$ now has no children,
we are done, and\/ $d$ is a normal form.
\end{itemize}
$\mathbold{T}_{\theeqnex}$ shows that for some strongly sequential systems,
the success of the subpattern set analysis depends on the choice of
traversal order. In the strongly sequential
system\addtocounter{eqnex}{1}\setcounter{seqeqnex}{\value{eqnex}}
$$\begin{array}{rcl}
\mathbold{T}_{\theeqnex} & = & \{f(g(a,x),g(a,x))\formeq b,\;
f(g(x,a),h(a,x))\formeq c, \\[1.5ex]
& & f(h(a,x),g(x,a))\formeq d,\;
f(h(x,a),h(x,a))\formeq e\}
\end{array}$$
{\em no\/} order of traversal allows the subpattern set analysis to
succeed. The only way to sequence with\/ $\mathbold{T}_{\theeqnex}$ is to
visit {\em both\/} of the children of an\/ $f$ before descending to
grandchildren. The Huet-L\'evy method allows this breadth-first
behavior, but the Hoffmann-O'Donnell method does not.
\end{example}
\subind{subpattern set!in pattern-matching/sequencing|)}
\subsubsection{Applying the Sequencing Techniques}
\label{sec:applying-sequencing}
Neither $\Omega$-terms nor subpattern sets should be computed
explicitly at run time. Rather, at compile time we compute the finite
set of values that can possibly occur while executing a given system
of equations $\mathbold{T}$, along with tables of the operations
required to update them at run time. It is natural to think of
$\Omega$-terms and subpattern sets as components of the state of a
finite automaton, and the tables of operations as a representation of
the transition graph. A key problem is that the number of states may
grow exponentially in the size of $\mathbold{T}$.
\cite{Hoffmann-O'Donnell-pattern}\auind{Hoffmann, C. M.}
\auind{O'Donnell, M. J.} analyzes the number of match sets\subind{match set}
associated with a set of patterns. There is no published analysis of
the number of $\Omega$-terms, nor of possibility and search sets, but
it appears that all three may grow at an exponential rate also. Most
implementations of equational logic programming appear to use methods
that are equivalent to highly restricted forms of the $\Omega$-term
method.
The Huet-L\'evy $\Omega$-term method is the only published method that
succeeds for all strongly sequential systems. The only published
method using subpatterns fails in cases, such as
$\mathbold{T}_{\theseqeqnex}$, where it is necessary to back up before
reaching the leaves of a pattern, in order to carry information about
sequencing to a sibling. I conjecture that a simple variation in the
algorithm of \cite{Hoffmann-O'Donnell-POPL1}\auind{Hoffmann, C.
M.}\auind{O'Donnell, M. J.} will succeed on all strongly sequential
systems. Besides the superior generality of the Huet-L\'evy algorithm,
$\Omega$-term methods have the advantage of a simple notation, making
it relatively easy to discover restrictions that control the number of
possible values. On the other hand, subpattern sets express the
significance of information for pattern matching purposes in a
particularly transparent way. The subpattern set method also has the
advantage of separating clearly the information that is passed up the
tree (match and possibility sets) from the information that is passed
down the tree (search sets). Match and possibility sets may be
associated with nodes in a heap representation of a tree, taking
advantage of sharing in the heap. Search sets and $\Omega$-terms
depend on the path by which a node was reached, so they must be stored
on a traversal stack, or otherwise marked so that they are not shared
inappropriately.
\subind{pattern-matching|)}\subind{sequencing of equational evaluation|)}
\subsection[Driving Procedures]{Driving Procedures for Term Rewriting}
\label{sec:drivers}
\subsubsection[Recursive Schema]{A Recursive Schema for Lazy
Evaluation}
\label{sec:recursive-schema}
\subind{recursive evaluation|(}\subind{evaluation!recursive schema|(}
It is natural to drive the conventional strict
evaluation of a term by a recursive procedure, $\eval$, shown in
Figure~\ref{fig:eval} below. For each function symbol $f$ that appears
in a term, there is a predefined procedure $\bar{f}$ to compute the
value of that function on given arguments.
\begin{figure}[hbtp]
\begin{center} \begin{tabbing}
mm\=mm\=mm\=mm\=mm\=mm\=mm\=mm\=mm\=mm\=\kill {\bf Procedure}
\mbox{$\eval(t)$}\+\\ Let \mbox{$t=f(t_1,\ldots,t_n)$};\`(1)\\ {\bf
For} \mbox{$i\assign 1,\ldots,n$} {\bf do}\`(2)\+\\
\mbox{$v_i\assign\eval(t_i)$}\`(3)\-\\ {\bf end for};\\ {\bf Return}
\mbox{$\bar{f}(v_1,\ldots,v_n)$}\`(4)\-\\ {\bf end procedure} $\eval$
\end{tabbing} \end{center} \caption{\label{fig:eval}Recursive
schema for strict evaluation.}
\subind{evaluation!recursive schema!strict}
\subind{normal form!recursive evaluation schema!strict}
\notind{$\eval$ (strict normalization procedure)}
\end{figure}
The simplicity of recursive evaluation is appealing, and it has a
direct and transparent correspondence to the semantic definition of
the value of a term (Definition~\ref{def:FOPC-valuation} of the
chapter `Introduction: Logic and Logic Programming Languages'). It is
straightforward to let the value of a term be its normal form, and use
the recursive $\eval$ above to find the normal form of its input. But,
the structure of the procedure is heavily biased toward strict
evaluation. Even the conditional function $\cond$ requires a special
test between lines (1) and (2) to avoid evaluating both branches. Lazy
functional languages have been implemented with conventional recursive
evaluation, by adding new values, called {\em suspensions, thunks,} or
{\em closures\/}\subind{suspension}\subind{thunk}\subind{closure} to
encode unevaluated subterms
\cite{Peyton-Jones,Bloss-Hudak-Young}.\auind{Peyton Jones, S. L.}
\auind{Bloss, A.}\auind{Hudak, P.}\auind{Young, J.} The overhead of
naive implementations of suspensions led to early skepticism about the
performance of lazy evaluation. Clever optimizations of suspensions
have improved this overhead substantially, but it is still arguably
better to start with an evaluation schema that deals with lazy
evaluation directly.
For orthogonal systems of equations, a function symbol that
contributes to a redex occurrence at a proper ancestor cannot itself
be at the root of a redex. So, a term may be normalized by a
recursive procedure that rewrites its argument only until the root
symbol can never be rewritten.
\begin{definition}
\label{def:head-normal}
\subind{head-normal form|(}\subind{normal form!head-|(}
Let\/ $\mathbold{T}$ be a set of equations, and\/ $t$ be a term. $t$ is a\/
{\em (weak) head-normal form\/} for\/ $\mathbold{T}$ if and only if there
is no redex\/ $u$ such that\/ \mbox{$t\rewrites{\boldscript{T}} u$}.
$t$ is a\/ {\em strong head-normal form\/} for\/ $\mathbold{T}$ if and
only if there is no redex\/ $u$ such that\/
\mbox{$t\rewritesm{\boldscript{T}}{\omega}u$}.
\end{definition}
Head normality is undecidable, but strong head normality is easy to
detect by melting.
\begin{proposition}
\label{prom:melt-head-normal}
\subind{melting!head-normal form}
Let\/ $t$ be a term. $t$ is a strong head-normal form for\/ $\mathbold{T}$
if and only if\/
\mbox{$\melt^{\boldscript{T}}_{\omega}(t)\neq\Omega$}.
Let\/ $t$ be an\/ $\Omega$-term. The following are equivalent:
\begin{itemize}
\item $u$ is a strong head-normal form for\/ $\mathbold{T}$, for all\/
\mbox{$u\sqsupseteq t$}
\item \mbox{$\melt^{\boldscript{T}}_{\omega}(t)\neq\Omega$}.
\end{itemize}
\end{proposition}
\subind{head-normal form|)}\subind{normal form!head-|)}
The procedure $\heval$ in Figure~\ref{fig:head-normalize} below
rewrites its argument only to strong head-normal form, instead of all
the way to normal form.
\begin{figure}[hbtp]
\subind{evaluation!recursive schema!head-normal}
\subind{head-normal form!recursive evaluation schema}
\notind{$\heval$ (head-normalization procedure)}
\notind{$\rhs$ (right-hand side construction procedure)}
\begin{center}
\begin{tabbing}
mm\=mm\=mm\=mm\=mm\=mm\=mm\=mm\=mm\=mm\=\kill
{\bf Procedure} \mbox{$\heval(t)$}\+\\
{\bf While} $t$ is not in strong head-normal form {\bf do}\`(1)\+\\
{\bf If} $t$ is a redex {\bf then}\`(2)\+\\
\mbox{$t\assign\rhs(t)$}\`(3)\-\\ {\bf else}\+\\
Let \mbox{$t=f(t_1,\ldots,t_n)$};\`(4)\\
Choose \mbox{$i\in[1,n]$} that is safe to process;\`(5)\\
\mbox{$t_i\assign\heval(t_i)$};\`(6)\\
\mbox{$t\assign f(t_1,\ldots,t_n)$}\`(7)\-\\
{\bf end if}\-\\ {\bf end while};\\
{\bf Return} $t$\`(8)\-\\
{\bf end procedure} $\heval$
\end{tabbing} \end{center}
\caption{\label{fig:head-normalize}Recursive schema to find
head-normal form.}
\end{figure}
The test whether $t$ is in strong head-normal form (1), the test
whether $t$ is a redex (2), and the choice of a safe child $i$ (5) may
be accomplished by any pattern-matching and sequencing method that
succeeds on the given system of equations. The procedure $\rhs$
\notind{$\rhs$ (right-hand side construction procedure)}
called by $\heval$ above builds and returns the right-hand side instance
corresponding to a given redex. $\rhs$ contains the information about
function symbols analogous to that contained in the collection of
$\bar{f}$s used by the strict evaluator $\eval$. In a detailed
implementation, pattern-matching and sequencing require some additional
data structures, and possibly some additional values returned by
$\heval$.
In order to produce a normal form for $t$, $\heval$ must be called
recursively on the subterms of $t$, as shown by the procedure $\norm$
in Figure~\ref{fig:normalize} below.
\begin{figure}[hbtp]
\subind{evaluation!recursive schema!lazy}
\subind{lazy evaluation!recursive schema}
\subind{normal form!recursive evaluation schema!lazy}
\notind{$\norm$ (lazy normalization procedure)}
\begin{center}
\begin{tabbing}
mm\=mm\=mm\=mm\=mm\=mm\=mm\=mm\=mm\=mm\=\kill
{\bf Procedure} \mbox{$\norm(t)$}\+\\
\mbox{$t\assign\heval(t)$};\`(1)\\
Let \mbox{$t=f(t_1,\ldots,t_n)$};\`(2)\\
{\bf For} \mbox{$i\assign 1,\ldots,n$} {\bf do}\`(3)\+\\
\mbox{$t_i\assign\norm(t_i)$}\`(4)\-\\
{\bf end for};\\
\mbox{$t\assign f(t_1,\ldots,t_n)$};\`(5)\\
{\bf Return} $t$\`(6)\-\\
{\bf end procedure} $\norm$
\end{tabbing}
\end{center}
\caption{\label{fig:normalize}Recursive schema to normalize a term
lazily.}
\end{figure}
Section~\ref{sec:extensions} shows how other schemes for calling
$\heval$, more sophisticated than $\norm$, may be used to generalize
the behavior of input and output in equational logic programming.
\subind{recursive evaluation|)}\subind{evaluation!recursive schema|)}
\subsubsection[Term Data Structure]{Using the Term Data Structure in
Rewriting}
\label{sec:using-data-structure}
\subind{term!data structures|(}
\notind{$\val^{\mbox{\it max}}_{\repl}$ (replaced parameter valuation)|(}
\notind{$\val$ (parameter valuation)|(}
\notind{$\repl$ (parameter replacement)|(}
\notind{$\ind$ (signature index)|(}
The basic recursive schema of $\heval$ may manipulate the tables
$\val$, $\repl$ and $\ind$ in a variety of different ways, with
various advantages and disadvantages in performance. In most cases,
particularly in implementations of orthogonal systems of equations,
the pattern-matching and sequencing method operates on
\mbox{$\val^{\mbox{\it max}}_{\repl}(\alpha_0)$}, where\/ $\alpha_0$
is a heap address that initially represents the root of the input
term. \mbox{$\val^{\mbox{\it max}}_{\repl}(\alpha_0)$}
is typically the closest thing to a normal form for the original input
that can be read easily from $\val$, $\repl$ and $\ind$, so we achieve
the best progress by rewriting it further. It is also clearly correct to
run the pattern-matcher/sequencer on another term in
\mbox{$\val^*_{\repl}(\alpha_0)$}. I am not aware of any current
implementation that does this, but it is likely to be necessary in
complete implementations of nonconfluent systems and equation-solving
systems (Section~\ref{sec:equation-solving}), which will presumably need
to explore several rewriting sequences in parallel. The details of access
to $\val$ and $\repl$ for pattern-matching/sequencing fit into the
implementation of the conditions tested in lines (1) and (2) of
$\heval$\notind{$\heval$ (head-normalization procedure)}, and in the
choice of $i$ in line (5). If the calculation of \mbox{$\val^{\mbox{\it
max}}_{\repl}(\alpha_0)$} follows $\repl$ links, it is usually desirable
to compress the paths using transformations (4) and (5) from
Section~\ref{sec:interpret-term-structures}, which speeds up future
calculations of \mbox{$\val^{\mbox{\it max}}_{\repl}(\alpha_0)$}. The
benefits of transformations (4) and (5) are essentially the same as the
benefits of path compression\subind{path compression} in the UNION/FIND
algorithm \cite{Aho-Hopcroft-Ullman-design}
\auind{Aho, A. V.}\auind{Hopcroft, J. E.}\auind{Ullman, J. D.}
(think of $\repl$ as representing a partition of terms into
equivalence classes of terms that have been proved equal).
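As a concrete (if informal) illustration, the path compression involved
can be sketched in Python, with the $\repl$ table as a dictionary
mapping each heap address to its replacement (here {\tt None} stands
for $\nil$); the sketch is my illustration, not part of the chapter's
development:

```python
# Informal sketch (not the chapter's code): the repl table as a Python
# dictionary, with None standing for nil.  Following a chain of
# replacement links and redirecting every node on it to the final
# representative is exactly path compression in UNION/FIND.

def follow_repl(repl, alpha):
    """Return the end of alpha's repl chain, compressing the path."""
    rep = alpha
    while repl.get(rep) is not None:   # find the representative
        rep = repl[rep]
    node = alpha
    while repl.get(node) is not None:  # redirect the whole chain to it
        repl[node], node = rep, repl[node]
    return rep

repl = {1: 2, 2: 3, 3: 4, 4: None}
assert follow_repl(repl, 1) == 4
assert repl[1] == 4 and repl[2] == 4   # chain compressed
```

After the call, every address on the chain points directly at the
representative, so later calculations of \mbox{$\val^{\mbox{\it
max}}_{\repl}$} follow only one link.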
The implementation of $\rhs$,
\notind{$\rhs$ (right-hand side construction procedure)}
called in line (3) of $\heval$,\notind{$\heval$ (head-normalization procedure)}
is the only place where updating of $\val$, $\repl$ and $\ind$ is required.
Suppose that we are applying the equation \mbox{$l\formeq r$} to
rewrite the term $t$, represented in the heap with root node $\alpha$.
The most obvious implementation of $\rhs$ builds all of the
nonvariable structure of $r$ in newly allocated heap nodes in $\val$,
using pointers to pre-existing structure for the variable
substitutions (this is essentially what a conventional Lisp
implementation does). Let $\beta$ be the root node of the constructed
representation of the appropriate instance of $r$. Once the instance
of $r$ is built, $\beta$ may be copied into $\alpha$.
\subind{sharing|(}
Unfortunately, when $r$ consists of a single variable, as in
\mbox{$\car(\cons(x,y))\formeq x$}, $\beta$ may have already been
shared extensively, and this copying loses an obvious opportunity for
further sharing \cite{O'Donnell-Springer}\auind{O'Donnell, M. J.}. The
natural alternative to copying $\beta$ is to assign
\mbox{$\repl(\alpha)\assign\beta$} (represent the rewriting step by a
link, instead of rewriting in place). This avoids any loss of sharing,
and if path compression is applied in calculating
\mbox{$\val^{\mbox{\it max}}_{\repl}(\alpha_0)$}, it appears to have
an acceptable cost in overhead for access to the heap. In many
programs, a more sophisticated implementation of $\rhs$ can save an
exponentially growing amount of wasted work by sharing identical
subterms of $r$. Many natural programs have such identical subterms of
right-hand sides, and in practice this makes the difference between
efficiency and infeasibility often enough to be well worth the effort.
The cost of detecting identical subterms of right-hand sides using the
tree isomorphism algorithm \cite{Aho-Hopcroft-Ullman-design}
\auind{Aho, A. V.}\auind{Hopcroft, J. E.}\auind{Ullman, J. D.}
is quite modest, and can be borne entirely at compile time.
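The compile-time detection can be sketched informally in Python
(canonical hashing of subterms here stands in for the cited tree
isomorphism algorithm; terms are nested tuples, and all names are
invented for the illustration):

```python
# Informal illustration (canonical hashing in place of the cited tree
# isomorphism algorithm).  Terms are nested tuples ('f', sub, ...);
# equal subterms receive equal ids, so a compiler can emit rhs code
# that builds each repeated subterm once and shares the node.

def hash_cons(term, table):
    """Assign a canonical id to term; equal subterms get equal ids."""
    key = (term[0],) + tuple(hash_cons(a, table) for a in term[1:])
    return table.setdefault(key, len(table))

# e.g. a right-hand side pair(f(x), f(x)) with a repeated subterm f(x):
table = {}
rhs = ('pair', ('f', ('x',)), ('f', ('x',)))
ids = [hash_cons(sub, table) for sub in rhs[1:]]
assert ids[0] == ids[1]   # both occurrences of f(x) map to one node
```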
\subind{term!data structures|)}
\notind{$\val^{\mbox{\it max}}_{\repl}$ (replaced parameter valuation)|)}
\notind{$\val$ (parameter valuation)|)}
\notind{$\repl$ (parameter replacement)|)}
\subsubsection[Ground Sharing]{Dynamic Exploitation of Sharing and
Memoing of Ground Terms}
\label{sec:ground-sharing}
\notind{$\rhs$ (right-hand side construction procedure)|(}
\notind{$\heval$ (head-normalization procedure)|(}
There are several ways that the signature index $\ind$ may be used to
improve sharing dynamically and opportunistically at run time (these
are examples of transformations (6) and (7) of
Section~\ref{sec:interpret-term-structures}). A further sophistication
of $\rhs$ builds the structure of the instance of $r$ from the leaves
and variable instances up, and for each constructed signature
\mbox{$f(\alpha_1,\ldots,\alpha_n)$} it checks
\mbox{$\ind(f(\alpha_1,\ldots,\alpha_n))$}. If the result is $\nil$,
$\rhs$ builds a new heap node $\beta$ containing signature
\mbox{$f(\alpha_1,\ldots,\alpha_n)$}, and updates $\ind$ by
\mbox{$\ind(f(\alpha_1,\ldots,\alpha_n))\assign\beta$}. If the result
is \mbox{$\gamma\neq\nil$} it shares the pre-existing representation
rooted at $\gamma$. This technique is essentially the same as the {\em
hashed $\cons$\/} optimization in Lisp implementations
\cite{Spitzen-Levitt-Robinson}.
\notind{$\rhs$ (right-hand side construction procedure)|)}
\auind{Spitzen, J. M.}\auind{Levitt, K. N.}\auind{Robinson, L.}
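An informal Python sketch of this check (with the $\val$ and $\ind$
tables as dictionaries, and an invented name for the allocation
procedure) follows; it is an illustration only, not the chapter's
algorithm:

```python
# Informal sketch with invented names: val maps heap addresses to
# signatures, ind maps signatures back to addresses.  Before allocating
# a node for f(a1,...,an), consult ind; a hit means an equal term is
# already in the heap and can be shared (Lisp's hashed cons).

val, ind = {}, {}

def make_node(f, *args):
    sig = (f,) + args
    beta = ind.get(sig)
    if beta is not None:        # signature already present: share it
        return beta
    beta = len(val)             # otherwise allocate a fresh address
    val[beta] = sig
    ind[sig] = beta
    return beta

b = make_node('cons', make_node('0'), make_node('nil'))
c = make_node('cons', make_node('0'), make_node('nil'))
assert b == c                   # the two cons(0,nil) terms share a node
```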
There are more places where it may be valuable to check the signature
index for opportunistic sharing. Immediately after line (6) of $\heval$,
the signature at the root $\alpha$ of the heap representation of $t$ may
have changed, due to rewriting of $t_i$ and path-compression. A
sophisticated implementation may check \mbox{$\ind(\val(\alpha))$}, and
if the result is \mbox{$\beta\neq\nil,\alpha$}, reassign
\mbox{$\repl(\alpha)\assign\beta$}, so that further path compression
will replace links pointing to $\alpha$ by links to $\beta$, thus
increasing sharing. If \mbox{$\ind(\val(\alpha))=\nil$}, then
reassign \mbox{$\ind(\val(\alpha))\assign\alpha$}. Finally, when
$\heval$ is called recursively on a subterm rooted at the heap address
$\alpha$, the signature at $\alpha$ may be modified as the result of
rewriting of a shared descendant of $\alpha$ that was visited
previously by a path not passing through $\alpha$. So, the same
checking of $\ind$ and updating of $\repl$ or $\ind$ may be performed
in step (1) of $\heval$, after any implicit path compression, but
before actually testing for strong head-normal form.
The opportunistic sharing techniques described above have a nontrivial
cost in overhead added to each computation step, but in some cases
they reduce the number of computation steps by an exponentially
growing amount. See \cite{Sherman-lazy}\auind{Sherman, D. J.} for more
details on opportunistic sharing, and discussion of the tradeoff
between overhead on each step and the number of steps. The method that
checks new opportunities for sharing at every node visit as described
above is called {\em lazy directed congruence closure.}
\subind{lazy directed congruence closure}
\subind{congruence closure!lazy directed}
An even more
aggressive strategy, called {\em directed congruence closure\/}
\subind{directed congruence closure|(}\subind{congruence closure!directed|(}
\cite{Chew-congruence}\auind{Chew, L. P.}, makes extra node visits
wherever there is a chance that a change in signature has created new
sharing opportunities, even though the pattern-matcher/sequencer has
not generated a need to visit all such nodes (i.e., when the signature
at a shared node changes, directed congruence closure visits all
parents of the node, many of which might never be visited again by
$\heval$). Directed congruence closure may reduce the number of steps
in lazy directed congruence closure by an exponential amount, but the
added overhead appears heuristically large, and changes the structure
of the implementation significantly from the recursive schema of
$\heval$.
Opportunistic sharing strategies are generalizations of the idea of
{\em memo functions\/}\subind{memoing}
\cite{Keller-Sleep,Mitchie,Field-Harrison}.
\auind{Keller, R. M.}\auind{Sleep, M. R.}\auind{Michie, D.}
\auind{Field, A. J.}\auind{Harrison, P. G.}
Most memoing techniques limit themselves to remembering equalities of
the form \mbox{$f(t_1,\ldots,t_n)\formeq u$}, where the $t_i$s and $u$
are all in normal form. Lazy directed congruence closure and its
variants can remember partial evaluations as well, as can the {\em
lazy memo functions\/} of Hughes\auind{Hughes, J.} \cite{Hughes}.
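The limitation of conventional memoing can be seen in a standard
Python sketch (an illustration only): the memo table records only
equalities whose arguments and results are fully evaluated normal
forms, never partial evaluations.

```python
# Conventional memoing, illustrated: the table remembers only
# equalities fib(n) = u where n and u are fully evaluated normal
# forms; partially evaluated terms are never remembered.

memo = {}

def fib(n):
    if n not in memo:
        memo[n] = n if n < 2 else fib(n - 1) + fib(n - 2)
    return memo[n]

assert fib(30) == 832040      # linear, not exponential, number of steps
assert len(memo) == 31        # one remembered equality per argument
```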
There is a nice theoretical characterization of the memoing power of
directed congruence closure.
\begin{proposition}[\cite{Chew-congruence}\auind{Chew, L. P.}]
\label{pro:directed-congruence-closure}
At any step in the process of rewriting a term, there is a set\/
$\mathbold{G}$ of ground instances of equations that have been applied
in rewriting. Using directed congruence closure, we never apply a
ground instance that is a semantic consequence of $\mathbold{G}$.
\end{proposition}
Not only does directed congruence closure avoid ever doing the same
rewriting step twice in different contexts, it only performs a new
rewriting step when a new substitution of terms for variables in the
given equations is absolutely necessary to derive that step.
\notind{$\ind$ (signature index)|)}
\notind{$\heval$ (head-normalization procedure)|)}
\subind{directed congruence closure|)}\subind{congruence closure!directed|)}
\subsubsection[Nonground Sharing]{Sharing and Memoing Nonground
Terms---Paramodulation}
\label{sec:nonground-sharing}
\subind{memoing|(}
\subind{paramodulation|(}\subind{sharing!nonground|(}
\subind{nonground sharing|(}\subind{lambda sharing|(}\subind{sharing!lambda|(}
The next natural step beyond directed congruence closure is sharing/memoing
nonground terms. Abstractly in theorem-proving language, this amounts
to applying {\em paramodulation\/}
\cite{Robinson-Wos,Loveland,Gallier}
\auind{Robinson, G. A.}\auind{Wos, L.}\auind{Loveland, D. W.}
\auind{Gallier, J. H.} to derive new
equations with variables, instead of using demodulation to merely
rewrite a given term. Paramodulation takes two equations
\mbox{$p\formeq\subst{r}{x}{q}$} and \mbox{$s\formeq t$} and
substitutions $\sigma_r$, $\sigma_s$ of terms for variables such that
\mbox{$r\sigma_r=s\sigma_s$}, and derives the new equation
\mbox{$p\sigma_r\formeq\subst{t\sigma_s}{x}{q\sigma_r}$}, which
follows from the first two by instantiation, substitution, and
transitivity. Paramodulation sometimes improves the lengths of proofs by
exponentially growing amounts compared to pure term
rewriting \cite{Loveland}.
\begin{example}
\label{exa:paramodulation}
\addtocounter{eqnex}{1}
Consider the set\/ $\mathbold{T}_{\rev}$ of equations defining list
reversal (\/$\rev$) and appending an element to the end of a list
(\/$\append$).
$$\begin{array}{rcllr}
\mathbold{T}_{\rev} & = & \{ &
\rev(\nil)\formeq\nil, \\
& & & \rev(\cons(x,y))\formeq\append(x,\rev(y)), \\
& & & \append(x,\nil)\formeq\cons(x,\nil), \\
& & & \append(x,\cons(y,z))\formeq\cons(y,\append(x,z)) & \}
\end{array}$$
The number of rewriting steps required to append an element to a list
of\/ $i$ elements is\/ $i+1$. But, consider applying paramodulation
instead. One step of paramodulating the last equation in\/
$\mathbold{T}_{\rev}$ with itself yields\/
\mbox{$\append(w,\cons(x,\cons(y,z)))\formeq
\cons(x,\cons(y,\append(w,z)))$}, which moves the appended element\/
$w$ past the first\/ $2$ elements in the list. The new equation paramodulates
again with\/
\mbox{$\append(x,\cons(y,z))\formeq\cons(y,\append(x,z))$} to move
the appended element past\/ $3$ list elements, but paramodulating the
new equation with itself does even better, moving the appended element
past\/ $4$ list elements. It is straightforward to produce a sequence
of equations, each by paramodulating the previous one with itself, so
that the\/ $i$th equation in the sequence moves an appended element
past\/ $2^i$ list elements. So, paramodulation reduces the number of
steps involved in normalizing\/ \mbox{$\append(\alpha,\beta)$},
where\/ $\beta$ is a list of\/ $i$ elements, from\/ $i+1$ to\/
\mbox{${\cal O}(\log i)$}.
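Written out explicitly (with variables renamed), the first few
equations of this doubling sequence are:
$$\begin{array}{rcl}
\append(x,\cons(y,z)) & \formeq & \cons(y,\append(x,z)) \\
\append(w,\cons(x,\cons(y,z))) & \formeq & \cons(x,\cons(y,\append(w,z))) \\
\append(v,\cons(w,\cons(x,\cons(y,\cons(z,u))))) & \formeq &
\cons(w,\cons(x,\cons(y,\cons(z,\append(v,u)))))
\end{array}$$
Each equation after the first results from paramodulating its
predecessor with itself at the inner\/ $\append$ subterm, and moves the
appended element past twice as many list elements as its predecessor.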
The improvement in normalizing\/ \mbox{$\rev(\alpha)$} is less
dramatic, but perhaps more useful in practice. If\/ $\alpha$ has\/ $i$
elements, then rewriting requires\/ \mbox{$\Omega(i^2)$} steps
to normalize\/ \mbox{$\rev(\alpha)$}, because it involves appending to
the end of a sequence of lists of lengths\/ $0,1,\ldots,i$. But, we
may produce a sequence of equations, each by paramodulating the
previous one with
$$\append(x,\cons(y,z))\formeq\cons(y,\append(x,z))$$ so that
the\/ $i$th equation in the sequence appends an arbitrary element to
a list of length\/ $i$. The whole sequence requires only $i$ steps of
paramodulation, so a list of\/ $i$ elements may be reversed by\/
\mbox{${\cal O}(i)$} paramodulation steps.
\end{example}
There are several problems in reducing the ideas on paramodulation
suggested in Example~\ref{exa:paramodulation} to useful practice.
First, we need a way to control paramodulation---that is, to choose
useful steps of paramodulation to perform (equivalently, useful
derived nonground equations to save and use along with the originally
given program equations for further evaluation steps). Sequential
strategies for lazy evaluation appear to solve this problem in
principle. Whenever a sequential rewriting process discovers that a
ground term $s$ rewrites in one step to a ground term $t$,
and existing links in the $\repl$ table rewrite $t$ further to
$u$, ground sharing techniques such as lazy directed congruence
closure in effect save the derived ground equation
\mbox{$s\formeq u$} for later use as a single step (this is
accomplished by setting \mbox{$\repl(\alpha)\assign\gamma$} where
\mbox{$t=\val^*_{\repl}(\alpha)$} and
\mbox{$u=\val^*_{\repl}(\gamma)$}). A method
for nonground sharing should save instead the nonground equation
\mbox{$s'\formeq u'$}, where $s'$ is the
generalization of $s$ containing all of the symbol instances in
$s$ that were scanned by the sequential rewriting process in
order to rewrite $s$ to $u$, with unique variables replacing
the unscanned portions of $s$. The appropriate generalization
$u'$ of $u$ is easy to derive from the same information that
determines $s'$. The equation \mbox{$s'\formeq u'$}
may be derived by paramodulation, using the same equations that were
used to rewrite $s$ to $u$. This strategy for driving
paramodulation by the same sequentializing process that drives
rewriting accomplishes the improvement of list reversal from quadratic
to a linear number of inference steps in
Example~\ref{exa:paramodulation} above.
A second problem is to deal efficiently with the more general sets of
equations that we get by adding the results of paramodulation to the
original program equations. Notice that, even when the original
program is orthogonal, there are overlaps\subind{overlap} between the original
equations and the results of paramodulation. When several different
equations apply to the same subterm, we probably should choose the
most specific one, but that may not always be uniquely determined (see
Figure~\ref{fig:overlap-types} in
Section~\ref{sec:extended-sequencing} for the different sorts of
overlap). At least in Example~\ref{exa:paramodulation} above, choosing
the most specific applicable equation accomplishes the improvement of
list appending from linear to logarithmic.
Finally, we need an efficient data structure for representing sets of
nonground equations, with a good way to add the new equations that
result from paramodulation. Suffix trees\subind{suffix tree}
\cite{Aho-Hopcroft-Ullman-design} \auind{Aho, A. V.}\auind{Hopcroft,
J. E.}\auind{Ullman, J. D.} appear to provide a good basis for
representing dynamically increasing sets of patterns and/or equations
\cite{Strandh-pattern}\auind{Strandh, R. I.}, but it is not at all
clear how to generalize these ideas efficiently to terms instead of
strings (a good dissertation project: implement string rewriting with as
much sharing as possible at both ends of a substring---this is
essentially nonground sharing with only unary functions). A more detailed
examination of Example~\ref{exa:paramodulation} above reveals that,
although the number of abstract steps required to reverse a list of
length $i$ using paramodulation is only \mbox{${\cal O}(i)$}, each
abstract step involves an equation with a larger left-hand side than the
one before. So, while the number of steps reduces, the cost of
pattern-matching in each step increases, and the total cost for reversal
remains \mbox{$\Omega(i^2)$}. The exponential improvement in appending to the
end of a list still yields a net win, but it is not clear how often
such exponential improvements will arise in practice. In order to
decrease the total cost of reversal to \mbox{${\cal O}(i)$}, we need a
way to make the pattern matching process incremental, so that the work
done to match a smaller pattern may be reused in matching a larger
pattern derived from it. I cannot find a reason why such efficient
incremental pattern matching is impossible, but no known technique
accomplishes it today.
Another starting point for research on nonground sharing is the work
on sharing in the lambda calculus
\cite{Staples-two,Lamping,Kathail,Gonthier-Abadi-Levy}.\auind{Staples, J.}
\auind{Lamping, J.}\auind{Gonthier, G.}\auind{Abadi,
M.}\auind{Kathail, V.}
\auind{L\'evy, J.-J.}
Some surprisingly powerful data structures have been discovered already,
but the overhead of using them is still not understood well enough to
determine their practical impact.
\subind{sharing|)}
\subind{paramodulation|)}\subind{sharing!nonground|)}
\subind{nonground sharing|)}\subind{lambda sharing|)}\subind{sharing!lambda|)}
\subind{memoing|)}
\subsubsection{Storage Allocation}
\label{sec:storage-allocation}
\subind{storage allocation|(}\subind{garbage collection|(}
Essentially all of the known ideas for automatic storage allocation
and deallocation are applicable to the $\val$/$\repl$/$\ind$ heap
\cite{Cohen,Appel}\auind{Cohen, J.}\auind{Appel, A.}. Unfortunately,
one of the most attractive methods in current practice---generational
garbage collection\subind{garbage collection!generational}
\subind{generational garbage collection} \cite{Lieberman-Hewitt}---has
\auind{Lieberman, H.}\auind{Hewitt, C.} been analyzed on the assumption
that it is rare for old nodes in the heap to point to much more recently
allocated ones. The $\repl$ links in lazy implementations of equational
logic programming appear to violate that assumption. I know of no
experimental or theoretical study of the efficiency of generational
garbage collection with lazy evaluation. The aggressive sharing
techniques also call into question the normal definition of `garbage'
nodes as those that {\em cannot\/} be reached starting from some set of
directly accessible root nodes. In the discussion above, the directly
accessible nodes are $\alpha_0$ and all of the nodes entered in the
$\ind$ table. With the more aggressive approaches to sharing, every node
in the heap is usually accessible from nodes in $\ind$, so there is no
garbage according to the strict definition. On the other hand, nodes that
are inaccessible from $\alpha_0$ (the root of the term being rewritten to
normal form) contain information that may be recalculated from the
$\alpha_0$-accessible nodes and the equational program, so, while they
are not useless garbage, they are not {\em essential\/} to the
calculation of a final result. I am not aware of any published literature
on deallocation of useful but inessential nodes in a heap, which might be
called `rummage sale' instead of `garbage collection.' Philip Wadler
proposed that a garbage-collection/rummage-sale
\subind{rummage sale!vs.\ garbage collection}
\subind{garbage collection!vs.\ rummage sale}
procedure might also perform rewriting steps that reduce the size of the
given term. Such a technique appears to provide the space-saving benefits
of strictness analysis\subind{strictness analysis}
\cite{Mycroft,Hughes-strict}, \auind{Mycroft, A.}\auind{Hughes, R. J. M.}
avoiding both the compile-time cost and the nonuniformity in the
run-time control. Research is needed on the practical impact of
various heuristic strategies for looking ahead in a rewriting sequence
in search of a smaller version of the current term.
\subind{storage allocation|)}\subind{garbage collection|)}
\section{Compiling Efficient Code from Equations}
\label{sec:code}
\subind{compiler!equational logic programming|(}
Conventional approaches to compiling functional programs, following
early Lisp\subind{Lisp} compilers, associate a block of machine code with each
function symbol $f$, and use the recursive schema for $\eval$ in
Figure~\ref{fig:eval}, with some representation of {\em suspensions\/}
\subind{suspension}\subind{evaluation}
\notind{$\eval$ (strict normalization procedure)}
to encode partially evaluated terms as a new sort of value
\cite{Peyton-Jones}\auind{Peyton Jones, S. L.}.
A natural and less explored alternative is to associate a block of
machine code with each state in a finite automaton whose states are
derived from an analysis of the program using $\Omega$-terms or
subpattern sets \cite{Bondorf,Durand-Sherman-Strandh}.
\auind{Bondorf, A.}\auind{Durand, I.} \auind{Sherman, D.
J.}\auind{Strandh, R. I.} The state-based approach has no explicit
representation of suspensions, although in effect terms that it must
build in the heap are implicit suspensions. And, a naive
implementation of the state-based approach leads to more and smaller
recursive procedures than the symbol-based approach. Of course,
sophisticated symbol-based compilers may optimize by compiling
multiple specialized versions of a single function for different sorts
of arguments, and sophisticated state-based compilers may optimize by
inlining some of the recursive calls. The {\em supercombinator\/}
\subind{supercombinator}
method \cite{Hughes-super}\auind{Hughes, R. J. M.} introduces new
symbols internal to the implementation, reducing the dependence of
symbol-based compiling on the actual symbols in the source program.
After such optimizations, it is not clear whether symbol-based
compiling and state-based compiling will actually generate different
code. The function symbols appear to be conceptually tied to the
programmer's view of her program, while the automaton states appear to
be conceptually tied to the computational requirements of the program.
I conjecture that in the long run the state-based approach will prove
more convenient for achieving the best run-time performance, while the
symbol-based approach will maintain a more intuitive connection
between program structure and performance.
Both the function-based and state-based methods compile the information
in the set of equations constituting a program, but they both treat the
input term and all of the intermediate terms produced in rewriting to
normal form as static data structures, which must be interpreted by the
compiled equational code. The Tigre\subind{Tigre} system
\cite{Koopman-Lee,Koopman}\auind{Koopman, P. J.}\auind{Lee, P.} by
contrast compiles each function symbol {\em in the term being
evaluated\/} into a block of code---the entire term becomes a
self-modifying program that reduces itself to normal form. Tigre compiles
only a particular extension of the combinator calculus; an interesting
generalization would be a Tigre-like compiler for an arbitrary
orthogonal system.
\subind{abstract machines!equational logic programming|(}
\subind{equational logic programming!abstract machines|(}
\subind{intermediate code!equational logic programming|(}
\subind{equational logic programming!intermediate code|(} Several
abstract machine languages have been proposed as intermediate
languages for equational compilers, or as real machine languages for
specialized term-rewriting machines. Landin's SECD machine
\subind{SECD machine} \cite{Landin}\auind{Landin, P. J.} is designed
to support evaluation of lambda terms. The G Machine\subind{G machine}
\cite{Johnsson,Augustsson,Burn-Peyton-Jones-Robson}
\auind{Johnsson, T.}\auind{Augustsson, L.}
\auind{Burn, G. L.}\auind{Peyton Jones, S. L.}\auind{Robson, J. D.}
is intended as a specialized machine
language or intermediate language for term rewriting in general.
Equational Machine (EM)\subind{EM code} code
\cite{Durand-Sherman-Strandh,Strandh-thesis}
\auind{Durand, I.}\auind{Sherman, D. J.}\auind{Strandh, R. I.} is designed to
support state-based compiling of equational programs, and the
optimizations that seem most important in that approach. Several
other abstract machine proposals are in
\cite{Cardelli-functional,Cardelli-compiling,Cousineau-Curien-Mauny}
\auind{Cardelli, L.}\auind{Cousineau, G.}\auind{Curien, P.-L.}
\auind{Mauny, M.}. Turner proposes to compile equational systems into
the combinator calculus,\subind{combinator calculus!compiling to}
using the well-known capability of the combinator calculus to encode
substitution in the lambda calculus
\cite{Turner}\auind{Turner, D. A.}.
Then, an implementation of the combinator calculus, fine tuned
for performance, can support essentially all equational logic
programming languages. The impact of such translations of one
rewriting system to another is ill understood---in particular the
effects on sharing and parallelism (see Section~\ref{sec:parallel})
are quite subtle. The Warren Abstract Machine (WAM)\subind{WAM!term
representation} \subind{Warren abstract machine!term representation}
\cite{Warren-abstract}\auind{Warren, D. H. D.}, intended as an
intermediate language for Prolog compilers, has also been used in
theorem provers as a representation for terms---$t$ is represented by
WAM code to unify an arbitrary input with $t$. I am not aware of any
published study of the applicability of such a WAM encoding of terms
to functional or equational logic programming.
\subind{compiler!equational logic programming|)}
\subind{abstract machines!equational logic programming|)}
\subind{equational logic programming!abstract machines|)}
\subind{intermediate code!equational logic programming|)}
\subind{equational logic programming!intermediate code|)}
\section{Parallel Implementation}
\label{sec:parallel}
\subind{parallel evaluation|(}
One of the primary early motivations for studying functional
programming languages was the apparent opportunities for parallel
evaluation \cite{Backus-turing}\auind{Backus, J.}. So, it is ironic
that most of the effort in functional and equational logic programming
to date involves sequential implementation. A variety of proposals for
parallel implementation may be found in
\cite{Peyton-Jones}\auind{Peyton Jones, S. L.}.
\subind{strictness analysis!parallel|(}
\subind{sequentiality analysis!parallel|(}
Both strictness analysis and sequentiality analysis are used primarily to
choose a sequential order of computation that avoids wasted steps. An
important open topic for research is the extension of sequentiality
analysis to support parallel computation. Huet and L\'evy's sequentiality
analysis is already capable of identifying more than one strongly needed
redex in a term. A parallel implementation might allocate processors
first to the strongly needed redexes, and then to other more speculative
efforts. It appears that sequentiality analysis can be generalized rather
easily (although this has not been done in print) to
identify\subind{strongly needed redex set} strongly needed {\em sets\/}
of redexes, where no individual redex is certain to be needed, but at
least one member of the set is needed. For example, with the positive
parallel-or equations in\/ $\mathbold{T}_{\sor+}$ of
Examples~\ref{exa:nonorthogonal} and~\ref{exa:parallel-or-problem}, it is
intuitively clear that if $\alpha$ is needed in $t$, and $\beta$ is
needed in $u$, then at least one of $\alpha$ and $\beta$ must be
rewritten in \mbox{$\sor(t,u)$} (although neither is {\em needed\/}
according to the formal Definition~\ref{def:needed-redex}). Further
research is required on the practical impact of heuristic strategies for
allocating parallel processors to the members of strongly needed sets. It
is natural to give priority to the singleton sets (that is, to the
strongly needed redexes), but it is not clear whether a set of size $2$
should be preferred to one of size $3$---perhaps factors other than the
size of the set should be considered. Strongly needed redex sets are
essentially {\em disjunctive\/} assertions about the need for
redexes---more general sorts of boolean relations may be useful (e.g.,
either {\em all\/} of $\alpha_1,\ldots,\alpha_m$ or {\em all\/} of
$\beta_1,\ldots,\beta_n$ are needed).
Unfortunately, since strongly needed redexes are all outermost,
sequentiality analysis as known today can only help with parallelism
between different arguments to a function. But, one of the most useful
qualities of lazy programming is that it simulates a parallel
producer-consumer relationship between a function and its arguments.
It seems likely that much of the useful parallelism to be exploited in
equational logic programming involves parallel rewriting of nested
redexes. An analysis of nonoutermost needed redexes
\subind{needed redex!nonoutermost}\subind{redex!needed!nonoutermost}
appears to require the sort of abstract interpretation that is used in
strictness analysis \cite{Mycroft,Hughes-strict}---it
\auind{Mycroft, A.}\auind{Hughes, R. J. M.}
certainly will depend on right-hand sides as well as left-hand sides
of equations. Unfortunately, most of the proposed intermediate
languages for compiling equational programs are inherently sequential,
and a lot of work is required to convert current sequential compiling
ideas to a parallel environment.
\subind{combinator calculus!compiling to|(}
The idea of compiling everything into combinators may not be useful for
parallel implementation. The known translations of lambda terms into
combinators eliminate some apparent parallelism between application of a
function and rewriting of the definition of the function (although no
direct implementation is known to support all of this apparent
parallelism either). Only very preliminary information is available on
the inherent {\em parallel\/} power of rewriting systems: even the
correct definition of such power is problematic
\cite{O'Donnell-MIT}\auind{O'Donnell, M. J.}.
\subind{combinator calculus!compiling to|)}
\subind{strictness analysis!parallel|)}
\subind{sequentiality analysis!parallel|)}
At a more concrete level, there are a lot of problems involved in
parallelizing the heap-based execution of evaluation/rewriting
sequences. A data structure, possibly distributed amongst several
processors, is needed to keep track of the multiple locations at which
work is proceeding. If any speculative work is done on redexes that
are not known to be needed, there must be a way to kill off processes
that are no longer useful, and reallocate their processing to
useful processes (although aggressive sharing through the signature
index confuses the question of usefulness of processes in the same way
that it confuses the usefulness of heap nodes).
\subind{sharing!parallel|(}
Sharing presents another challenge. Several different processors may
reach a shared node by different paths. It is important for data
integrity that they do not make simultaneous incompatible updates, and
important for efficiency that they do not repeat work. But, it is
incorrect for the first process to lock all others completely out of
its work area. Suppose we are applying a system of equations
including \mbox{$\car(\cons(x,y))\formeq x$}. There may be a node
$\alpha$ in the heap containing the signature
\mbox{$\cons(\beta_1,\beta_2)$} shared between two parents, one of
them containing \mbox{$\cons(\alpha,\delta)$} and the other containing
\mbox{$\car(\alpha)$}. A process might enter first through the $\cons$
parent, and perform an infinite loop rewriting inside $\beta_2$. If a
second process enters through the $\car$ node, it is crucial to allow
it to see the $\cons$ at $\alpha$, to link to $\beta_1$, and depending
on the context above possibly to continue rewriting at $\beta_1$ and
below.
Abstractly, we want to lock node/state pairs, and allow
multiple processes to inspect the same node, as long as they do so in
different states, but when a process reaches a node that is already
being processed {\em in the same state\/} it should wait, and allow
its processor to be reassigned, since any work that it tries to do
will merely repeat that of its predecessor. It is not at all clear how
to implement such a notion of locking with acceptable overhead.
Perhaps a radically different approach for assigning work to
processors is called for, that does not follow the structure of
current sequential implementations so closely. For example, instead of
the shared-memory approach to the heap in the preceding discussion,
perhaps different sections of the heap should be assigned to
particular processors for a relatively long period, and demands for
evaluation should be passed as messages between processors when they
cross the boundaries of heap allocation.
\subind{sharing!parallel|)}
\subind{parallel evaluation|)}
\section{Extensions to Equational Logic Programming}
\label{sec:extensions}
\subsection{Incremental Infinite Input and Output}
\label{sec:inc-inf-IO}
\subind{incremental input/output|(}\subind{infinite input/output|(}
\subind{input/output!incremental|(}\subind{input/output!infinite|(}
\subind{lazy evaluation|(}
The logic programming approach to answering normal form queries leads
naturally to the use of lazy evaluation to guarantee completeness.
But, once we commit to lazy evaluation, the basic scheme of normal
form queries seems too limited. Inside an equational computation, it
is natural to conceive of procedures consuming and producing
potentially infinite terms incrementally, in the manner of
communicating coroutines. It is perfectly sensible to define objects
that correspond intuitively to infinite terms, as long as only finite
portions of these infinite terms are required to produce output. It is
annoying that the full flexibility of incremental demand-driven
computation in the internal computation is not available at the
input/output interface. Aside from the loss of input/output
programming power per se, such a discrepancy between internal and
external communication tends to encourage large, unwieldy, monolithic
programs, and discourage the collections of small, separate, modular
programs that are more desirable in many ways.
\subind{modularity!equational logic programming}
\subind{lazy evaluation|)}
The desired abstract behavior of a demand-driven incremental I/O
interface is essentially clear. The consumer of output demands the
symbol at the root of the output. The equational program computes
until it has produced a head-normal form equivalent to the input, then
it outputs the root symbol of that head-normal form, and recursively
makes the principal subterms available to the consumer of output, who
can demand symbols down different paths of the output tree in any
order. During the computation of a head-normal form, the equational
program generates demands for symbols in the input. The producer of
input is responsible only for providing those input symbols that are
demanded. In this way, both input and output terms are treated
incrementally as needed, and each may be infinite.
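The demand-driven interface just described can be modeled with thunks. In the sketch below (an illustration under assumed representations, not the protocol of any particular implementation), a possibly infinite term is a root symbol paired with zero-argument functions that, when demanded, yield the principal subterms; the consumer forces only the thunks along the path it explores:

```python
# A (possibly infinite) term is a pair (symbol, [thunks]); each thunk,
# when demanded, yields a principal subterm.  Names are illustrative.

def cons(head, tail_thunk):
    return ('cons', [lambda: head, tail_thunk])

def ones():
    # The infinite term cons(1, cons(1, ...)), produced incrementally:
    # each demand for the tail builds only one more cons cell.
    return cons(('1', []), ones)

def demand(term, path):
    # Follow a path of argument positions, forcing only the thunks on
    # that path, and return the root symbol found at the end.
    for i in path:
        term = term[1][i]()
    return term[0]
```

Only the symbols actually demanded are ever computed, so the consumer may explore the infinite output tree down different paths in any order, exactly as the abstract interface requires.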
A direct generalization of equational logic programming by extending
the set of terms to include infinite ones is semantically problematic.
\begin{example}
\label{exa:infinite-output}
Consider the system of equations\addtocounter{eqnex}{1}
$$\mathbold{T}_{\theeqnex}=\{a\formeq f(c,a),\;b\formeq f(c,f(c,b))\}$$
The natural infinite output to expect from input\/ $a$ is\/
\mbox{$f(c,f(c,\ldots))$}. The input\/ $b$ produces the same infinite output. So,
if in our semantic system \mbox{$f(c,f(c,\ldots))$} is a generalized
term, with a definite value (no matter what that value is), then\/
\mbox{$\mathbold{T}_{\theeqnex}\models(a\formeq b)$}. But, according to the
semantics of Definition~\ref{def:equation-semantic-system} in
Section~\ref{sec:equational-logic} of the chapter `Introduction: Logic
and Logic Programming Languages,'
\mbox{$\mathbold{T}_{\theeqnex}\not\Emodels(a\formeq b)$}, because
there are certainly models in which the equation $x\formeq f(c,x)$ has
more than one solution. For example, interpret $f$ as integer addition,
and $c$ as the identity element zero. Every integer $n$ satisfies
$n=0+n$.
\end{example}
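The countermodel in the example can be checked directly. The few lines below (a trivial sanity check, nothing more) interpret $f$ as integer addition and $c$ as zero, and confirm that every candidate integer solves \mbox{$x\formeq f(c,x)$}, so the equation forces no unique value on the would-be infinite term:

```python
# Countermodel check: interpret f as integer addition and c as its
# identity element zero.  Every integer n satisfies n = f(c, n), so
# the equation x = f(c, x) has many solutions in this model, and an
# infinite term f(c, f(c, ...)) is not forced to a unique value.
f = lambda x, y: x + y
c = 0

solutions = [n for n in range(-5, 6) if n == f(c, n)]
assert solutions == list(range(-5, 6))   # every candidate is a solution
```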
Even if we replace the second equation with the exact substitution of
$b$ for $a$ in the first, that is \mbox{$b\formeq f(c,b)$}, it does
not follow by standard equational semantics that \mbox{$a\formeq b$}.
With some difficulty we may define a new semantic system, with a
restricted set of models in which all functions have sufficient
continuity properties to guarantee unique values for infinite
terms. But it is easy to see that the logical consequence relation for
such a semantic system is not semicomputable (recursively enumerable).
It is always suspect to say that we understand a system of logic
according to a definition of meaning that we cannot apply effectively.
Instead, I propose to interpret outputs involving infinite terms as
abbreviations for infinite conjunctions of formulae in the First-Order
Predicate Calculus (FOPC) (such infinite conjunctions, and infinite
disjunctions as well, are studied in the formal system called
$L_{\omega_1,\omega}$\subind{infinitary logic}
\notind{$L_{\omega_1,\omega}$ (infinitary con/disjunctions)}
\cite{Karp}\auind{Karp, C. R.}).
\begin{definition}
\label{output-term-conjunction}
\subind{conjunctive equation}
The\/ {\em First-Order Predicate Calculus with Equality\/}
($\mbox{FOPC}_{\formeq}$) is the result of including the equality symbol\/
$\formeq$ in the set\/ $\mbox{\bf Pred}_2$ of binary predicate symbols, and
combining the semantic rules for FOPC (Definitions
\ref{def:FOPC-semantic-system}--\ref{def:FOPC-formula},
Section~\ref{sec:fopc} of the chapter `Introduction: Logic and Logic
Programming Languages') and Equational Logic (Definitions
\ref{def:equation-formula}--\ref{def:equation-semantic-system},
Section~\ref{sec:equational-logic} of the `Introduction \ldots'
chapter) in the natural way (a model for $\mbox{FOPC}_{\formeq}$
satisfies the restrictions of a model for FOPC and the restrictions of
a model for Equational Logic).
$L_{\omega_1,\omega,\formeq}$ is the extension of $\mbox{FOPC}_{\formeq}$ to allow
countably infinite conjunctions and disjunctions in formulae. Let\/
$\mathbold{F}_{P,\omega_1,\omega,\formeq}$ be the set of formulae in\/
$L_{\omega_1,\omega,\formeq}$. Extend the semantic system for
$\mbox{FOPC}_{\formeq}$ to a semantic system for $L_{\omega_1,\omega,\formeq}$ by
the following additional rules for infinite conjunctions and
disjunctions:
\begin{enumerate}
\item $\rho_{\tau,\nu}(\bigwedge\{A_1,A_2,\ldots\})=1\mbox{ if and only if }
\rho_{\tau,\nu}(A_i)=1\mbox{ for all }i\geq 1$
\item $\rho_{\tau,\nu}(\bigvee\{A_1,A_2,\ldots\})=1\mbox{ if and only if }
\rho_{\tau,\nu}(A_i)=1\mbox{ for some }i\geq 1$
\end{enumerate}
That is, an infinite conjunction is true precisely when all of its
conjuncts are true; and an infinite disjunction is true precisely when
at least one of its disjuncts is true.
A set\/ $T$ of terms is a\/ {\em directed set\/} if and only if, for
\subind{directed set}
every two terms\/ \mbox{$t_1,t_2\in T$}, there is a term\/
\mbox{$t_3\in T$} such that\/ $t_3$ is an instance of\/ $t_1$ and also
an instance of\/ $t_2$.
Let\/ $U$ be a countable directed set of linear terms. For each\/
\mbox{$u\in U$}, let\/ $\vec{y}_u$ be a list of the variables
occurring in\/ $u$. Let\/ $t$ be a finite ground term. The\/ {\em
conjunctive equation\/} of\/ $t$ and\/ $U$ is the infinite
conjunction
$$\bigwedge\{(\exists\vec{y}_u\quantsep (t\formeq u))\setsep u\in U\}$$
$\mathbold{T}^\omega_P$ is the set of finite and infinite terms.
The\/ {\em term limit\/} of a directed set\/ $U$ (written\/
\mbox{$\lim(U)$}) is the possibly infinite term resulting from
overlaying all of the terms\/ \mbox{$u\in U$}, and substituting the
new symbol\/ $\bot$ for any variables that are not overlaid by
nonvariable symbols.
\subind{term limit}\notind{$\lim$ (term limit)}
Since members of $U$ are pairwise consistent, every location in the
limit gets a unique symbol, or by default comes out as $\bot$. To see
more rigorously that the term limit of $U$ is well defined, construct a
(possibly transfinite) {\em chain,} $\langle t_0\sqsubseteq
t_1\sqsubseteq\cdots\rangle$. First, $t_0=\bot$. Given $t_{\alpha}$,
choose (axiom of choice) an $s\in U$ such that $s\not\sqsubseteq
t_{\alpha}$, and let $t_{\alpha+1}$ be the overlay of $s$ with
$t_{\alpha}$. At limit ordinals $\lambda$, let $t_{\lambda}$ be the
limit of the chain of terms preceding $\lambda$. With a lot of
transfinite induction, we get to a $t_{\beta}$ such that, for all $s\in
U$, $s\sqsubseteq t_{\beta}$, at which point the chain is finished.
$\lim(U)$ is the limit of that chain.
The\/ {\em canonical set\/} of an infinite term\/ $t$ (written\/
\mbox{$\appr(t)$}) is the set of all finite linear terms\/ $t'$
(using some arbitrary canonical scheme for naming the variables) such
that\/ $t$ is an instance of $t'$. \mbox{$\appr(t)$} is a directed
set, and\/ \mbox{$\lim(\appr(t))=t$}.
\subind{canonical set}\notind{$\appr$ (canonical set of approximations)}
A careful formal definition of infinite terms and limits is in
\cite{Kenneway-Klop-Sleep,Dershowitz-Kaplan-Plaisted},
\auind{Kenneway, J. R.}\auind{Klop, J. W.}\auind{Sleep, M. R.}
\auind{Dershowitz, N.}\auind{Kaplan, S.}\auind{Plaisted, D. A.}
but an intuitive appreciation suffices for this section.
\end{definition}
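For a finite directed set, the term limit can be computed by literally overlaying the members. In the sketch below (an illustration of the definition under an assumed representation: terms are nested tuples, with `None` standing for a variable, and for $\bot$ in the limit), overlaying two consistent terms keeps the more defined symbol at each location:

```python
# Sketch of the term limit of a directed set.  A term is a nested
# tuple (symbol, arg1, ..., argn); None stands for a variable, and
# for the symbol ⊥ in the limit.

def overlay(s, t):
    # Keep the more defined symbol at each location.  Members of a
    # directed set are pairwise consistent, so head symbols of two
    # non-None terms at the same location must agree.
    if s is None:
        return t
    if t is None:
        return s
    assert s[0] == t[0], "members of a directed set are consistent"
    return (s[0],) + tuple(overlay(a, b) for a, b in zip(s[1:], t[1:]))

def term_limit(terms):
    # For a *finite* directed set this computes lim(U) exactly; for an
    # infinite one, lim(U) is the limit of such overlays along a chain.
    acc = None
    for t in terms:
        acc = overlay(acc, t)
    return acc
```

For an infinite directed set the same overlay, iterated along the chain constructed in the definition, yields the (possibly infinite) limit term.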
When the finite input $t$ produces the infinite output
\mbox{$\lim(U)$}, instead of interpreting the output as the equation
\mbox{$t\formeq\lim(U)$}, interpret it as the conjunctive equation of
$t$ and $U$. Notice that a {\em chain\/}\subind{chain} (a sequence
\mbox{$u_1,u_2,\ldots$} such that $u_{i+1}$ is an instance of $u_i$)
is a special case of a directed set. The infinite outputs for finite
inputs may be expressed by chains, but the generality of directed sets
is needed with infinite outputs for infinite inputs below.
Infinite inputs require a more complex construction, since they
introduce universal quantification that must be nested appropriately
with the existential quantification associated with infinite output.
Also, different orders in which a consumer explores the output may
yield different sequences of demands for input, and this flexibility
needs to be supported by the semantics.
\begin{definition}
\label{def:i-o-term-conjunction}
\subind{conjunctive equation}
Let\/ $T$ and\/ $U$ be directed sets of linear terms such that no
variable occurs in both\/ \mbox{$t\in T$} and\/ \mbox{$u\in U$}. For
each term\/ \mbox{$t\in T$} and\/ \mbox{$u\in U$}, let\/ $\vec{x}_t$
and\/ $\vec{y}_u$ be lists of the variables occurring in\/ $t$ and\/
$u$, respectively. Let\/ \mbox{$f:T\funcarrow\powerset{U}$} be a
function from terms in\/ $T$ to directed subsets of\/ $U$, such
that when\/ $t_2$ is an instance of\/ $t_1$, then every member of\/
\mbox{$f(t_2)$} is an instance of every member of\/ \mbox{$f(t_1)$}.
The\/ {\em conjunctive equation\/} of\/ $T$ and\/ $U$ by\/ $f$ is the
infinite conjunction
$$
\bigwedge\{(\forall\vec{x}_{t}\quantsep\exists\vec{y}_u\quantsep
(t\formeq u))\setsep t\in T\mbox{ and }u\in f(t)\}$$
\end{definition}
\begin{definition}
\label{def:inc-eq-query-system}
Let\/ \mbox{$s_1,\ldots,s_i\in\mathbold{T}_P$} be terms. A term\/ $u$ is
an\/ {\em incremental normal form\/} for\/ \mbox{$\{s_1,\ldots,s_i\}$}
if and only if no nonvariable subterm of\/ $u$ unifies with a term
in\/ \mbox{$\{s_1,\ldots,s_i\}$}. Equivalently, $u$ is an incremental
normal form if and only if\/ $u$ is a normal form and every
substitution of normal forms for variables leaves\/ $u$ in normal
form. Equivalently, $u$ is an incremental normal form if and only if
melting preserves\/ $u_\Omega$, where\/ $u_\Omega$ results from
substituting\/ $\Omega$ for each variable in\/ $u$.
Let\/ $\cnormalize$ be a new formal symbol. Let
$$\mathbold{Q}^\omega_{\formeq}=\{(\cnormalize s_1,\ldots,s_i\quantsep
t_\omega)\setsep
t_\omega\in\mathbold{T}^\omega_P,s_1,\ldots,s_i\in\mathbold{T}_P\}$$
Define the relation\/
\mbox{$\answers^\omega_{\formeq}\subseteq\mathbold{Q}^\omega_{\formeq}\times
\mathbold{F}_{P,\omega_1,\omega,\formeq}$} by
$$(\cnormalize s_1,\ldots,s_i\quantsep
t_\omega)\answers^\omega_{\formeq} A$$ if and only if there exists a
directed set\/ $U$ and a monotonic (in the instance ordering
$\sqsubseteq$) function
\mbox{$f:\appr(t_{\omega})\funcarrow\powerset{U}$} such that
\begin{enumerate}
\item $A$ is the conjunctive equation of\/ \mbox{$\appr(t_\omega)$} and\/
$U$ by\/ $f$
\item $u$ is in incremental normal form, for all\/ \mbox{$u\in U$}
\end{enumerate}
\end{definition}
Now\/ \mbox{$\langle\mathbold{F}_{P,\omega_1,\omega,\formeq},
\mathbold{Q}^\omega_{\formeq},\answers^\omega_{\formeq}\rangle$}
is a query system representing the answers to questions of the form
`what set of incremental normal forms for\/ $s_1,\ldots,s_i$ is
conjunctively equal to\/ $t$?' In an implementation of equational
logic programming based on $\cnormalize$, the consumer of output
generates demands causing the computation of some $u\in U$. Some of
the symbols in $u$ are demanded directly by the consumer, others may
be demanded by the system itself in order to get an incremental normal
form. The implementation demands enough input to construct
\mbox{$t\in\appr(t_\omega)$} such that
\mbox{$u\in f(t)$}. By modelling the input and output as directed
sets, rather than sequences, we allow enough flexibility to model all
possible orders in which the consumer might demand output. The function
$f$ is used to model the partial synchronization of input with output
required by the semantics of equational logic. Unfortunately, the
trivial output $y,y,\ldots$, with term limit $\bot$, always satisfies
1--2 above. The most useful implementation would provide {\em
consequentially strongest\/} answers
(Definition~\ref{def:strongest-answer}, Section~\ref{sec:query-systems}
of the chapter `Introduction: Logic and Logic Programming
Languages')---that is, it would demand the minimum amount of input
semantically required to produce the demanded output. If
consequentially strongest answers appear too difficult to implement, a
more modest requirement would be that \mbox{$\lim(U)$} is maximal among
all correct answers---that is, all semantically correct output is
produced, but not necessarily from the minimal input. The techniques
outlined above do not represent sharing information, whether within the
input, within the output, or between input and output. Further
research is needed into semantic interpretations of sharing
information.
\begin{example}
\label{exa:inc-query}
\addtocounter{eqnex}{1}
Consider the system
$$\mathbold{T}_{\theeqnex}=\{f(g(x,y))\formeq h(h(g(x,y)))\}$$
and the infinite input term
$$t_{\omega}=f(g(t_{\omega},t_{\omega}))$$
and the query
$$\cnormalize f(g(x,y))\quantsep t_{\omega}$$
The natural desired answer $A$ to this query is the conjunction of
{\renewcommand{\arraystretch}{1.5}
$$\begin{array}{llrcl}
\forall x_1\quantsep & \exists y_1\quantsep & x_1 & \formeq & y_1 \\
\forall x_1\quantsep & \exists y_1\quantsep & f(x_1) & \formeq & y_1 \\
\forall x_1,x_2\quantsep & \exists
y_1,y_2\quantsep & f(g(x_1,x_2)) & \formeq & h(h(g(y_1,y_2))) \\
\forall x_1,x_2\quantsep & \exists
y_1,y_2\quantsep & f(g(f(x_1),x_2)) & \formeq & h(h(g(y_1,y_2))) \\
\forall x_1,x_2,x_3\quantsep & \exists
y_1,y_2,y_3\quantsep & f(g(f(g(x_1,x_2),x_3))) & \formeq &
h(h(g(h(h(g(y_1,y_2))),y_3))) \\
\forall x_1,x_2\quantsep & \exists
y_1,y_2\quantsep & f(g(x_1,f(x_2))) & \formeq & h(h(g(y_1,y_2))) \\
\forall x_1,x_2,x_3\quantsep & \exists
y_1,y_2,y_3\quantsep & f(g(x_1,f(g(x_2,x_3)))) & \formeq &
h(h(g(y_1,h(h(g(y_2,y_3)))))) \\
& & & \vdots
\end{array}$$
}
$A$ represents the infinite output
$t'_{\omega}=h(h(g(t'_{\omega},t'_{\omega})))$.
$U$ does not contain all approximations to $t'_{\omega}$, but only the
ones that have $g$s as the rootmost non-$\bot$ symbols. The monotone function
mapping partial inputs to the portions of output that they determine is:
{\renewcommand{\arraystretch}{1.5}
$$\begin{array}{rcl}
\bot & \mapsto & \bot \\
f(\bot) & \mapsto & \bot \\
f(g(\bot,\bot)) & \mapsto & h(h(g(\bot,\bot))) \\
f(g(f(\bot),\bot)) & \mapsto & h(h(g(\bot,\bot))) \\
f(g(f(g(\bot,\bot)),\bot)) & \mapsto & h(h(g(h(h(g(\bot,\bot))),\bot))) \\
f(g(\bot,f(\bot))) & \mapsto & h(h(g(\bot,\bot))) \\
f(g(\bot,f(g(\bot,\bot)))) & \mapsto & h(h(g(\bot,h(h(g(\bot,\bot)))))) \\
& \vdots
\end{array}$$
}
A natural approach to representing sharing information would be to use
$\forall x_1,x_2\quantsep f(g(x_1,x_2))\formeq
h(h(g(x_1,x_2)))$ instead of $\forall x_1,x_2\quantsep\exists
y_1,y_2\quantsep f(g(x_1,x_2))\formeq h(h(g(y_1,y_2)))$, etc., but
this idea has yet to be explored.
\end{example}
It is simple in principle to modify an implementation of equational
logic programming to deal incrementally with infinite input queries
and output answers in the query system
$\mathbold{Q}^\omega_{\formeq}$. The practical details of such an
implementation are interesting and challenging. The {\em tours\/}
protocol \cite{Rebelsky-tree,Rebelsky-thesis}\auind{Rebelsky, S. A.}
provides one design for incremental input and output of infinite
terms. Implementations based on the recursive scheme for $\heval$
\notind{$\heval$ (head-normalization procedure)} in
Figure~\ref{fig:head-normalize} may be adapted relatively simply to
incremental input and output, simply by replacing the scheme for
$\norm$ in Figure~\ref{fig:normalize} by a scheme that issues calls to
$\heval$ only as symbols are demanded by the consumer of output.
Unfortunately, such adaptations are not normally complete, because of
sequentializing problems. The trouble is that conjunctive behavior
(which is easy to sequentialize) in the application of a rule
corresponds to disjunctive behavior (which is impossible to
sequentialize) in the production of strong head-normal forms, and vice
versa.
\begin{example}
\label{exa:parallel-head-normalize}
Consider the following equation defining the {\em negative
sequential-or:}
$$\mathbold{T}_{\sor-s}=\{\sor(\false,\false)\formeq\false\}$$ This
equation causes no problem in sequentializing the rewriting of a term
to a finite normal form, since it requires\/ {\em both\/} arguments
to\/ $\sor$. But, \mbox{$\sor(s,t)$} is a strong head-normal form if
and only if {\em either\/} of\/ $s$ or\/ $t$ is a strong head-normal
form\/
\mbox{$\neq\false$}. So, it is not safe to rewrite\/ $s$, because of
the possibility that\/ $s$ has no strong head-normal form, while\/ $t$
rewrites to strong head-normal form\/ \mbox{$u\neq\false$}. For symmetric
reasons, it is not safe to rewrite\/ $t$. Only a parallel rewriting of
both\/ $s$\/ and $t$ is complete, and such parallel rewriting is very
likely to waste steps.
By contrast, the positive parallel-or equations\/ $\mathbold{T}_{\sor+}$ of
Examples~\ref{exa:nonorthogonal} and~\ref{exa:parallel-or-problem}
prevent sequential computation of a finite normal form, but pose no
problem to sequential head-normalization of\/ \mbox{$\sor(s,t)$} by
rewriting\/ $s$ and\/ $t$---{\em both\/} arguments must rewrite to
strong head-normal forms\/ \mbox{$\neq\true$} in order to head-normalize\/
\mbox{$\sor(s,t)$}.
\end{example}
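The completeness problem in the example can be simulated. In the sketch below (a toy model, not an implementation: head-normalization attempts on the two arguments are represented as generators that yield `None` while working and then a head symbol, or work forever), fair round-robin interleaving of both arguments finds the strong head-normal form that no sequential strategy can safely commit to:

```python
def diverge():
    # An argument with no strong head-normal form: it works forever.
    while True:
        yield None

def converge_to(sym, steps=3):
    # An argument that reaches a strong head-normal form with root
    # symbol sym after a few rewriting steps.
    for _ in range(steps):
        yield None
    while True:
        yield sym

def sor_head_normal(s_steps, t_steps, limit=1000):
    # Fair (round-robin) rewriting of both arguments of sor(s, t):
    # sor(s, t) is strong head-normal as soon as *either* argument
    # reaches a strong head-normal form other than 'false'.  Rewriting
    # only s, or only t, would be incomplete, since the other argument
    # alone may justify head-normality.
    for _ in range(limit):
        for arg in (s_steps, t_steps):
            sym = next(arg)
            if sym is not None and sym != 'false':
                return True    # sor(s, t) is strong head-normal
    return False               # undecided within the step limit
```

The wasted steps are visible in the simulation: when the second argument suffices, every step spent on the diverging first argument is thrown away, which is exactly the cost that makes parallel rewriting unattractive.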
So, to achieve complete implementations of equational logic
programming with incremental input and output, we must solve the
sticky problems of efficient implementation of parallel rewriting,
even if the parallelism is simulated by multitasking on a sequential
processor. In the meantime, sequential and incomplete implementations
are likely to be practically useful for many problems---the
incompleteness appears to be no more serious than the incompleteness
of Prolog, which is widely accepted.
Beyond the incremental evaluation of an input that increases over
time, there is an even more challenging problem to evaluate a {\em
changing\/} input. In principle, changing input may be reduced to
increasing input by representing the entire edit history of a term as
another term \cite{Rebelsky-thesis}\auind{Rebelsky, S. A.}. In
practice, there is a lot of research required to achieve practical
efficiency. A highly efficient general scheme for re-evaluation after
a change in input will be extremely valuable. There are several
techniques known for re-evaluation in special structures, such as
attributed parse trees \cite{Demers-Reps-Teitelbaum,Pugh-Teitelbaum},
\auind{Demers, A.}\auind{Reps, T.}\auind{Teitelbaum, T.}\auind{Pugh, W.}
but none approaches the generality or fine-grainedness of equational
re-evaluation. A recent result for the $\lambda$-calculus that I have not
had a chance to review is in~\cite{Field-Teitelbaum}\auind{Field, J.}.
Powerful memo techniques seem to be crucial to efficient re-evaluation
of a changed input term. In some sense efficient re-evaluation is
precisely the intelligent reuse of partial results from the previous
evaluation. I conjecture that nonground memoing will be especially
valuable for this problem.
\subind{incremental equational program}
Finally, we should study implementations of equational logic
programming in which the set of equations is changing. Changing sets
of equations arise in rapid recompiling during program development, in
applications that use equations as inputs rather than as programs, and
in methods based on nonground memoing or paramodulation
(Section~\ref{sec:nonground-sharing}). Very little is known about this
problem---all that I am aware of is a technique for incremental
maintenance of a finite automaton for pattern
matching \cite{Strandh-pattern}\auind{Strandh, R. I.}.
\subind{incremental input/output|)}\subind{infinite input/output|)}
\subind{input/output!incremental|)}\subind{input/output!infinite|)}
\subsection{Solving Equations}
\label{sec:equation-solving}
\subind{equation solving|(}
Queries of the form \mbox{$(\solve x_1,\ldots,x_n\quantsep s\formeq
t)$} (Definition~\ref{def:solve-equation},
Section~\ref{sec:equational-logic} of the chapter `Introduction: Logic
and Logic Programming Languages') require the solution of the equation
\mbox{$s\formeq t$} for the values of $x_1,\ldots,x_n$. This appears
to be much more difficult than finding normal forms, and to offer more
programming power. In particular, it appears that solving equations is
the essential problem in the most natural approach to combining the
useful qualities of Prolog and lazy functional programming. Notice
that $\solve$ queries are quite
\notind{$\solve$ (equation solving question)}
similar to the $\what$ queries processed by Prolog
(Section~\ref{sec:fopc} of the `Introduction \ldots' chapter). A
combination of the methods for solving equations with the techniques
of Prolog should provide a way of answering $\what$ queries in the
First-Order Predicate Calculus with Equality.\notind{$\what$ (`what'
question)}
Much of the theory of term rewriting falls short of dealing with the
problems posed by equation solving. Confluence appears to help---if
the given set $\mathbold{T}$ of equations is confluent, then we may seek an
instance $s'$ of $s$ and $t'$ of $t$, and a term $u$, such that
\mbox{$s'\rewrites{\boldscript{T}}u$} and
\mbox{$t'\rewrites{\boldscript{T}}u$}, ignoring the symmetric rule and
applying equations only from left to right. But, for a complete
implementation, it is not sufficient to rewrite needed redexes. That
is, it may be that there is a $u$ such that
\mbox{$s'\rewrites{\boldscript{T}}u$} and
\mbox{$t'\rewrites{\boldscript{T}}u$}, but no such $u$ may be reached
by rewriting only needed redexes (Definition~\ref{def:needed-redex},
Section~\ref{def:needed-redex}). Outermost complete rewriting
(Definition~\ref{def:outermost-sequence},
Section~\ref{sec:complete-outermost}) is similarly
insufficient. Complete rewriting sequences
(Definition~\ref{def:complete-fair-sequence}) come a bit closer. If
$s'$ and $t'$ have a common form, then a complete rewriting sequence
starting with $s'$ is guaranteed eventually to produce a term $u$ such
that \mbox{$t'\rewrites{\boldscript{T}}u$}. But, another complete
rewriting sequence starting with $t'$ is not guaranteed to produce the
same $u$. Finally, even if we generate rewriting sequences from $s'$
and $t'$ that both contain $u$, it is not clear how to synchronize the
sequences so that $u$ appears in both of them at the same time.
No thoroughly satisfactory method is known for a complete
implementation of equation solving in an arbitrary strongly sequential
system. But, enough is known to make two observations that are
likely to be useful in such an implementation.
\begin{enumerate}
\item Suppose we are solving \mbox{$f(s_1,\ldots,s_m)\formeq
g(t_1,\ldots,t_n)$}, where \mbox{$f\neq g$}. It suffices to rewrite
only needed redexes until the head symbols agree. Unfortunately, it is
hard to know whether to rewrite \mbox{$f(s_1,\ldots,s_m)$} or
\mbox{$g(t_1,\ldots,t_n)$} or both, but within each we can use the
same sort of sequentializer that rewrites a single term to normal
form.
\item There are only two cases in which it is helpful to instantiate a
variable:
\begin{enumerate}
\item when the instantiation creates a redex at a proper ancestor of
the variable, which we immediately rewrite (so, in the situation
of (1) above, the redex must be needed);
\item when the instantiation unifies corresponding subterms of $s$
and $t$.
\end{enumerate}
In each case, we should substitute the most general (smallest)
term that achieves the redex or the unification.
\end{enumerate}
Jayaraman noticed (1) \cite{Jayaraman}\auind{Jayaraman, B.}, and used
it to implement a system called EqL for solving equations
\mbox{$s\formeq t$} only in the case where instances $s'$ of $s$ and
$t'$ of $t$ reduce to a common form consisting entirely of constructor
symbols in a constructor-orthogonal system
(Definition~\ref{def:constructor-orthogonal},\subind{constructor}
Section~\ref{sec:orthogonality}). For this purpose, it suffices to
rewrite $s'$ and $t'$ to head-normal forms. If the head symbols agree,
recursively solve equations between the corresponding arguments,
otherwise there is no solution. It should be straightforward to
generalize EqL to find solutions in normal form in strongly sequential
rewrite- and rule-orthogonal systems. If we want all solutions, even
those that are not in normal form, we must somehow explore in parallel
the attempt to rewrite the head symbol of $s'$ and the attempt to
solve the equation without head rewriting one or both of them (and a
similar parallelism for $t'$). This introduces all of the problems of
parallelism discussed in Section~\ref{sec:parallel}, and it probably
limits or eliminates the ability to do path compression on $\repl$
links (Section~\ref{sec:drivers}), since it is not sufficient to work
only on the most rewritten form of a term (\mbox{$\val^{\mbox{\it
max}}_{\repl}(\alpha_0)$}, where $\alpha_0$ is the root of the
heap representation of the term being rewritten).
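The decomposition step of the EqL-style method can be sketched in isolation. The code below (a deliberately simplified illustration: it works over constructor terms only, omits the head-normalization of $s'$ and $t'$ that the real method performs first, and omits the occurs check) solves an equation by recursive decomposition, binding each variable to the most general term that achieves the unification:

```python
# Simplified sketch of EqL-style equation solving over constructor
# terms only.  Terms are tuples (symbol, arg1, ..., argn); variables
# are strings starting with '?'.  The head-normalization step of the
# real method is omitted, as is the occurs check.

def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def resolve(t, binding):
    # Chase variable bindings to the current representative term.
    while is_var(t) and t in binding:
        t = binding[t]
    return t

def bind(binding, var, term):
    new = dict(binding)
    new[var] = term
    return new

def solve(s, t, binding):
    s, t = resolve(s, binding), resolve(t, binding)
    if s == t:
        return binding
    if is_var(s):
        return bind(binding, s, t)   # most general instantiation
    if is_var(t):
        return bind(binding, t, s)
    if s[0] != t[0] or len(s) != len(t):
        return None                  # head symbols disagree: no solution
    # Head symbols agree: recursively solve equations between the
    # corresponding arguments.
    for a, b in zip(s[1:], t[1:]):
        binding = solve(a, b, binding)
        if binding is None:
            return None
    return binding
```

In the full method, each disagreement of head symbols triggers rewriting to head-normal form before decomposition, rather than immediate failure.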
Heuristically, it appears that the majority of the individual steps in
solving an equation are unlikely to create such parallelism. But, EqL's
uniform requirement of finding a solution in normal form is likely to
introduce the same huge inefficiencies in some cases as strict
evaluation (notice that the individual evaluations of $s'$, $t'$ to
normal form are lazy, but the equation solution is not lazy, as it
may take steps that are required to reach normal form, but not to
solve the equation). If these heuristic observations are accurate,
even an implementation with rather high overhead for parallelism may
be very valuable, as long as the cost of parallelism is introduced
only in proportion to the actual degree of parallelism, rather than as
a uniform overhead on sequential evaluation as well.
Among the theoretical literature on term rewriting, the most useful
material for equation solving will probably be that on {\em
narrowing\/}\subind{narrowing} \cite{Fay}\auind{Fay, M.} (see the
chapter `Equational Reasoning and Term Rewriting Systems' in Volume~1),
which is a careful formalization of observation (2) above. Information
on sequencing narrowing steps seems crucial to a truly efficient
implementation: some new results are in \cite{Antoy-Echahed-Hanus}.
\auind{Antoy, S.}\auind{Echahed, R.}\auind{Hanus, M.}
\subind{equation solving|)}
\subsection{Indeterminate Evaluation in Subset Logic}
\label{sec:indeterminate}
\subind{nonconfluent term rewriting|(}\subind{term rewriting!nonconfluent|(}
\subind{indeterminate evaluation|(}\subind{evaluation!indeterminate|(}
Until now, I have considered only quantified conjunctions of equations
as formulae, else I cannot claim to be using equational logic. The
discipline of equational logic constrains the use of term rewriting
systems, and leads us to insist on confluent systems with lazy
evaluation. Lazy evaluation appears to be mostly a benefit, but the
restriction to confluent systems is annoying in many ways, and it
particularly weakens the modularity of equational logic programming
languages, since orthogonality depends on the full textual structure
of left-hand sides of equations, and not just on an abstract notion of
their meanings. As well as improving the techniques for guaranteeing
confluence, we should investigate nonconfluent term rewriting. Since,
without confluence, term rewriting cannot be semantically complete for
equational logic, we need to consider other logical interpretations of
term rewriting rules.
\subind{subset logic programming|(}
A natural alternative to equational logic is {\em subset logic\/}
\cite{O'Donnell-bordeaux}. Subset logic has the same terms as
equational logic, and the same formulae except that they use the
backward subset relation symbol $\supseteq$ instead of $\formeq$
(Definition~\ref{def:equation-formula},
Section~\ref{sec:equational-logic} of the chapter `Introduction: Logic
and Logic Programming Languages'). Subset logic semantics are the same
as equational (Definition~\ref{def:equation-semantic-system},
Section~\ref{sec:equational-logic} of the `Introduction \ldots'
chapter), except that every term represents a {\em subset\/} of the
universe of values instead of a single value, and function symbols
represent functions from subsets to subsets, extended
pointwise from functions on individual values (i.e.,
\mbox{$f(S)=\bigcup\{f(\{x\})\setsep x\in S\}$}). Subset logic is
complete with the {\em reflexive, transitive,} and {\em
substitution\/} rules of equality, omitting the {\em symmetric\/}
rule (Definition~\ref{def:equational-rules},
Section~\ref{sec:inferential-proof}). So, term rewriting from left to
right is complete for subset logic, with no restrictions on the rules.
Technically, it doesn't matter whether the left side of a rule is a
subset of the right, or {\em vice versa,} as long as the direction is
always the same. Intuitively, it seems more natural to think of
rewriting as producing a subset of the input term, since then a term
may be thought of as denoting a set of possible answers. Thus, a
single line of a subset logic program looks like \mbox{$l\supseteq
r$}. Note that normal forms do not necessarily denote singleton
sets, although it is always possible to construct models in which they
do.
Subset logic programming naturally supports programs with
indeterminate answers. When a given input $s$ rewrites to two
different normal forms $t$ and $u$, subset logic merely requires that
\mbox{$s\supseteq t$} and \mbox{$s\supseteq u$} are true, from
which it does not follow that \mbox{$t\formeq u$} is true, nor
\mbox{$t\supseteq u$}, nor \mbox{$u\supseteq t$}. While
equational logic programming extends naturally to infinite inputs and
outputs, without changing its application to finite terms, such
extension of subset logic programming is more subtle. If only finite
terms are allowed, then infinite computations may be regarded as a
sort of failure, and a finite normal form must be found whenever one
exists. If incremental output of possibly infinite normal forms is
desired, then there is no effective way to give precedence to the
finite forms when they exist. The most natural idea seems to be to
follow all possible rewriting paths, until one of them produces a
stable head symbol (that is, the head symbol of a strong head-normal
form) for output. Whenever such a symbol is output, all rewriting
paths producing different symbols at the same location are
dropped. Only a reduction path that has already agreed with all of the
symbols that have been output is allowed to generate further output.
The details work out essentially the same as with infinite outputs for
equational programs, merely substituting $\supseteq$ for $\formeq$. It
is not at all clear whether this logically natural notion of
commitment to a symbol, rather than a computation path, is useful. I
cannot find a natural semantic scheme to support the more conventional
sort of commitment to a computation path instead of an intermediate
term, although a user may program in such a way that multiple
consistent computation paths never occur and the two sorts of
commitment are equivalent.
\subind{subset logic programming|)}

Efficient implementation of nonconfluent rewriting presents a very
interesting challenge. It is obviously unacceptable to explore naively
the exponentially growing set of rewriting sequences. A satisfying
implementation should take advantage of partial confluent behavior to
prune the search space down to a much smaller set of rewriting
sequences that is still capable of producing all of the possible
outputs. The correct definition of the right sort of partial
confluence property is not even known. It is {\em not\/} merely
confluence for a subset of rules, since two rules that do not
interfere with one another may interfere differently with a third, so
that the order in which the two noninterfering rules are applied may
still make a difference to the outcome. Pattern-matching and
sequencing techniques must be generalized as well, and a good data
structure designed to represent simultaneously the many different
rewritings under consideration as compactly as possible. A good
complete implementation of nonconfluent rewriting will probably allow
the prohibited terms in normal forms
(Definition~\ref{def:equational-query-system},
Section~\ref{sec:equational-logic} of the chapter `Introduction: Logic
and Logic Programming Languages') to be different from the left-hand
sides of rules. Sharing of equivalent subterms becomes problematic
since it may be necessary to reduce two different occurrences of the
same subterm in two different ways. The difficulties in discovering a
complete implementation of subset logic programming would be well
rewarded, since this style of programming can capture the useful
indeterminate behavior of Prolog, while avoiding the repeated
generation of the same solution by slightly different paths that slows
some Prolog programs down substantially.
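To make the naive approach concrete, here is a toy sketch in Python (not part of the chapter; the term representation, the variable convention, and every function name are invented for illustration) that enumerates all rewriting sequences and collects every reachable normal form, assuming rewriting terminates:

```python
# Toy nonconfluent rewriting by naive, exhaustive search (illustrative
# only -- the exploration is exactly the exponential one that the text
# deems unacceptable in practice).
# Terms are strings (lowercase = variable, uppercase = constant) or
# tuples ('head', arg1, ..., argn).

def match(pattern, term, bindings=None):
    """Match `pattern` against `term`; return variable bindings or None."""
    bindings = {} if bindings is None else bindings
    if isinstance(pattern, str) and pattern.islower():   # a variable
        if pattern in bindings and bindings[pattern] != term:
            return None
        bindings[pattern] = term
        return bindings
    if (isinstance(pattern, tuple) and isinstance(term, tuple)
            and len(pattern) == len(term) and pattern[0] == term[0]):
        for p, t in zip(pattern[1:], term[1:]):
            if match(p, t, bindings) is None:
                return None
        return bindings
    return bindings if pattern == term else None

def substitute(term, bindings):
    """Replace variables in `term` by their bindings."""
    if isinstance(term, str):
        return bindings.get(term, term)
    return tuple([term[0]] + [substitute(t, bindings) for t in term[1:]])

def rewrites(term, rules):
    """Yield every term reachable from `term` in one rewriting step."""
    for lhs, rhs in rules:                # redexes at the root
        bindings = match(lhs, term)
        if bindings is not None:
            yield substitute(rhs, bindings)
    if isinstance(term, tuple):           # redexes inside arguments
        head, *args = term
        for i, arg in enumerate(args):
            for new_arg in rewrites(arg, rules):
                yield (head, *args[:i], new_arg, *args[i + 1:])

def normal_forms(term, rules):
    """All normal forms reachable from `term` (assumes termination)."""
    successors = list(rewrites(term, rules))
    if not successors:
        return {term}
    result = set()
    for s in successors:                  # exponential duplication here
        result |= normal_forms(s, rules)
    return result
```

With the indeterminate rules `[(('f','x','y'), 'x'), (('f','x','y'), 'y')]`, the call `normal_forms(('f','A','B'), rules)` returns `{'A', 'B'}`: both normal forms are produced, and the same normal forms are rediscovered along many different rewriting sequences, which is precisely the duplication of work that a serious implementation must prune.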
Even an incomplete implementation of nonconfluent rewriting might be
very valuable. In the spirit of logic programming, such an
implementation should depend on inherent structural properties of the
rules in a program to determine which of a set of possible rewriting
steps is chosen when the redex patterns overlap. From the point of
view of implementation, the most natural approach appears to be to
select an outermost connected component of overlapping redex patterns,
but to rewrite within the component from innermost out. It is not
clear how to resolve the ambiguous case where two redex patterns have
the same root, nor has anyone proposed a sensible semantic explanation
based on logic programming for any principle for choosing between
conflicting rules.

Jayaraman has a different concept of subset logic programming
\cite{Jayaraman-subset}.\subind{subset logic programming}
\auind{Jayaraman, B.}
Meseguer's {\em rewriting logic\/}\subind{rewriting logic} has the same
rules as my subset logic, with a different semantic justification
\cite{Meseguer-multiparadigm}.\auind{Meseguer, J.}
\subind{indeterminate evaluation|)}\subind{evaluation!indeterminate|)}
\subind{nonconfluent term rewriting|)}\subind{term rewriting!nonconfluent|)}
\subsection{Relational Rewriting}
\label{sec:relational-rewriting}
\subind{relational rewriting|(}\subind{rewriting!relational|(}
\subind{predicate logic programming without functors|(}
Equation solving can lead to a combination of Prolog and functional
programming through logic programming in the First-Order Predicate
Calculus with Equality. Subset logic programming, implemented with
nonconfluent term rewriting, can lead to an implementation of
Predicate Calculus programming in a sort of indeterminate functional
programming. Another intriguing way to reconcile some of the good
qualities of Prolog and functional programming is relational logic
programming. Relational logic programming can be approached as a
generalization of terms to systems of relational constraints, or as a
purification of Prolog by removing the function symbols (called
`functors' in Prolog).
A relational formula is a FOPC formula with no function symbols (this
prohibition includes zeroary functions, or constants). The natural
analogue to a term in relational logic is a conjunction of atomic
formulae, which may be represented as a hypergraph (like a graph, but
with hyperedges that may touch any number of nodes, not necessarily
two), with nodes standing for variables and hyperedges representing
predicate symbols. Two fundamental problems stand
in the way of a useful implementation of relational logic programming
through hypergraph rewriting.
\begin{enumerate}
\item We need an efficient implementation of hypergraph rewriting. The
pattern-matching problem alone is highly challenging, and no practical
solution is yet known.
\item We need a semantic system that makes intuitive sense, and also
supports computationally desirable methods of hypergraph rewriting.
\end{enumerate}
A first cut at (2) might use the semantic system for FOPC, and express
rewrite rules as formulae of the form
$$\forall\vec{x}\quantsep\exists\vec{y}\quantsep(A_1\land\cdots\land
A_m)\limplied(B_1\land\cdots\land B_n)$$ These clauses differ from the
Horn clauses of Prolog in two ways. The quantification above is
$\forall\vec{x}\quantsep\exists\vec{y}\quantsep$, whereas Prolog
clauses are universally quantified. And, the consequence of the
implication above is a conjunction, whereas Horn clauses allow at most
one atomic formula in the consequence, and even general clauses allow a disjunction,
rather than a conjunction. Notice that, with Prolog-style
all-universal quantification, the implication distributes over
conjunctions in the consequence (\mbox{$(A_1\land A_2)\limplied B$} is
equivalent to
\mbox{$(A_1\limplied B)\land(A_2\limplied B)$}). But, if existentially
quantified variables from $\vec{y}$ appear in $A_1,\ldots,A_m$, the
distribution property does not hold.
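For example (an illustration with invented predicate symbols, not
drawn from the surrounding text), the $\forall\exists$ implication
$$\forall x\quantsep\exists y\quantsep(A_1(x,y)\land
A_2(x,y))\limplied B(x)$$ requires a {\em single\/} witness for $y$
satisfying both $A_1$ and $A_2$, whereas the conjunction
$$(\forall x\quantsep\exists y\quantsep A_1(x,y)\limplied
B(x))\land(\forall x\quantsep\exists y\quantsep A_2(x,y)\limplied
B(x))$$ permits a different witness for each conjunct, so the two are
not equivalent.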
Since the consequence and hypothesis of our formula are both
conjunctions of atomic formulae, they may be represented by
hypergraphs, and it is natural to read the formula as a hypergraph
rewrite rule. A natural sort of hypergraph rewriting will be sound for
deriving universally quantified implications from $\forall\exists$
implications, as long as the existentially quantified variables
$\vec{y}$ correspond to nodes in the graph that participate {\em
only\/} in the hyperedges representing $A_1,\ldots,A_m$. The
universally quantified variables in $\vec{x}$ correspond to nodes that
may connect in other ways as well as by hyperedges representing
$A_1,\ldots,A_m$, so they are the interface nodes to the unrewritten
part of the hypergraph. Interestingly, the final result of a
computation on input $I$ is an implication $\forall\vec{w}\quantsep
I\limplied O$. This implication does not give a solution to the goal
$I$, as in Prolog. Rather, it rewrites the goal $I$ into a (presumably
simpler or more transparent) goal $O$, such that every solution to $O$
is also a solution to $I$. This is formally similar to functional and
equational logic programming, in that the class of legitimate outputs
is a subclass of the legitimate inputs---in Prolog inputs are formulae
and outputs are substitutions. Constraint logic programming
\subind{constraint logic programming}\subind{logic programming!constraint}
\cite{Jaffar-Lassez,Lassez}\auind{Jaffar, J.}\auind{Lassez, J.-L.} is perhaps
heading in this direction.

The semantic treatment suggested above is not very satisfactory, as it
rules out lazy evaluation, and even storage reclamation (an
intermediate part of a hypergraph that becomes disconnected from the
input and output cannot be thrown away, even though it cannot affect
the nature of a solution, since if any part of the hypergraph is
unsolvable, the whole conjunction is unsolvable). Also, the logically
sound notion of unification of the left-hand side of a rule with a
portion of a hypergraph in order to rewrite it seems far too liberal
with this semantic system---without the single-valued quality of
functions, additional relations can be postulated to hold anywhere. I
conjecture that the first problem can be solved by a semantic system
in which every unquantified formula is solvable, so that a
disconnected portion of the hypergraph may be discarded as
semantically irrelevant. One way to achieve this would be to use some
sort of measured universe \cite{Cohn-measure}\auind{Cohn, D. L.}, and
require every atomic predicate to hold for all but a subset of measure
$0$---then every finite intersection of atomic predicates
(equivalently, the value of every finite conjunction of atomic
formulae) has a solution. Such a semantic system is not as weird as it
first seems, if we understand the universe as a space of {\em
concepts,} rather than real objects, and think of each predicate
symbol as giving a tiny bit of information about a concept. We can
conceive of every combination of information (such as a green unicorn
with a prime number of feet that is a large power of $2$), even though
most such combinations never enter our notion of reality. Although the
question of solvability becomes trivial in this semantic system,
implications between solutions still have interesting content. The
problem of limiting unifications in a useful way appears more
difficult. I suggest that nonclassical `relevant' interpretations of
implication
\subind{relevance logic}\subind{relevant implication}
\subind{implication!relevant}
\cite{Anderson-Belnap}\auind{Anderson, A.
R.}\auind{Belnap, N. D.} are likely to be helpful.
\subind{equational logic programming|)}\subind{logic programming!equational|)}
\subind{relational rewriting|)}\subind{rewriting!relational|)}
\subind{predicate logic programming without functors|)}