\documentstyle[12pt]{article}
\newcommand{\derives}{\vdash}
\newcommand{\notderives}{\not\:\derives}
\newcommand{\locderives}{\mathrel{\derives_L}}
\newcommand{\notlocderives}{\mathrel{\notderives_L}}
\newcommand{\globderives}{\mathrel{\derives_G}}
\newcommand{\limplies}{\Rightarrow}
\newcommand{\sunion}{\cup}
\newcommand{\englishprop}[1]{\mathord{\mbox{\it #1\/}}}
\newcommand{\searchres}{\mathord{\mbox{\it Search\/}}}
\newcommand{\consistent}{\mathord{\mbox{\it Consistent\/}}}
\newcommand{\BF}{\englishprop{Typical birds fly}}
\newcommand{\PNF}{\englishprop{Penguins do not fly}}
\newcommand{\GB}{\englishprop{George is a bird}}
\newcommand{\GP}{\englishprop{George is a penguin}}
\newcommand{\GF}{\englishprop{George flies}}
\newcommand{\GNF}{\englishprop{George does not fly}}
\newtheorem{example}{Example}
\newtheorem{definition}{Definition}
\title{Against Nonmonotonic Logic\\(DRAFT)}
\author{Michael J. O'Donnell\thanks{Supported in part by NSF grant no.
CCR-9016905}}
\date{8 April, 1993}
\begin{document}
\maketitle
\begin{abstract}
  The {\em monotonicity\/} property of inference in conventional
  formal systems of logic (${\bf S}\derives A$ and ${\bf
  S}\subseteq{\bf T}$ implies ${\bf T}\derives A$) appears to prohibit
  certain crucial steps in practical intuitive reasoning.  For
  example, reasoning by default (using assertions that are assumed to
  hold until explicitly contradicted) appears to be impossible in
  monotonic systems.  Many crucial steps in practical reasoning
  certainly require nonmonotonic behaviors in inference procedures.
  But, I argue that it is highly misleading to describe these
  nonmonotonic behaviors in terms of nonmonotonic inference relations
  $\derives$.  Rather, inference procedures should use
  metatheoretic operators, with conventionally monotonic
  inference relations, to support nonmonotonic behavior.
\end{abstract}

\section{Defeasible Reasoning Requires Nonmonotonicity}

Most formal systems for logical reasoning are essentially definitions
of a relation $\derives$, called {\em logical inference.} When ${\bf
T}$ is a set of propositions expressed in the language used by such a
formal system, and $A$ is another such proposition, \mbox{${\bf
T}\derives A$} means that the formal system allows $A$ to be inferred
from hypotheses in ${\bf T}$. In the vast majority of formal systems
studied by logicians, the relation $\derives$ is required to be
monotonic in its left argument.
\begin{definition}
A logical inference relation\/ $\derives$ is\/ {\em monotonic\/} if and
only if, for all sets of propositions\/ ${\bf S}$ and\/ ${\bf T}$, and
for all propositions\/ $A$, if\/ \mbox{${\bf S}\derives A$} and\/
\mbox{${\bf S}\subseteq{\bf T}$}, then\/ \mbox{${\bf T}\derives A$}.
\end{definition}
Certain practical considerations appear to argue in favor of formal
systems with {\em nonmonotonic\/} logical inference relations. I will
sketch these practical considerations, but argue nonetheless that they
are best accommodated by logical inference relations that are
conventionally monotonic.

Practical reasoning often involves {\em
defeasible\/}~\cite{defeasible} (retractable) steps. In particular,
there are often {\em default\/} assumptions~\cite{default} about
typical cases, that are accepted until specific reasons to reject them
are found. The example of concluding that a given bird flies until
learning that the bird is a penguin has achieved chestnut status.
\begin{example}[Penguins~\cite{penguins}]
  Consider a formal language with formulae expressing at least the
  following six propositions
  $$
  \begin{array}{rccl}
    \{&\BF,&\PNF,\\
    & \GB, & \GP,\\
    &\GF,&\GNF&\}
  \end{array}
  $$ Given knowledge of the set of three propositions $${\bf
  T}_1=\{\BF,\;\PNF,\;\GB\}$$ it seems quite rational to suppose that
  George is in fact a\/ {\em typical\/} bird, and conclude that $\GF$.
  That relation between a set of propositions and another proposition
  is often expressed as
  $${\bf T}_1\derives\GF$$ But, given the additional knowledge that
  $\GP$, it seems irrational to conclude that $\GF$.  Letting $${\bf
  T}_2={\bf T}_1\cup\{\GP\}$$ this relation is expressed as
  $${\bf T}_2\notderives\GF$$ But
  $${\bf T}_1\subseteq{\bf T}_2$$ so the appropriate formal system
  of logic in which to perform practical reasoning about birds,
  penguins, and ability to fly seems to be nonmonotonic.
\end{example}
The desire to have a reasoning procedure that assumes typical
conditions until finding evidence of a special case seems to prohibit
monotonicity in the penguin example above.  But, a closer analysis
reveals more than one formal site for the nonmonotonicity of the
procedure.

\section{Two Formal Sites for Nonmonotonic Behavior}

I believe that the penguin example, and many similar examples, provide
a strong case for nonmonotonic behavior in a reasoning procedure.
That is, there are certainly cases in practice where the discovery of
new knowledge requires a retraction of old assertions. But, the
explication of this nonmonotonic behavior can be attached to at least
two different sites in a formal system of reasoning: the inference
relation $\derives$, and the formulae themselves. I will argue for the
latter choice, but first I will reject some particular forms for doing
so, in spite of their initial appeal.

In principle, nonmonotonic behavior of procedures using a monotonic
inference relation may be explained by weakening the conclusions of
inference.  For example, instead of \mbox{${\bf T}_1\derives\GF$}, we
might have $${\bf T}_1\derives\englishprop{George is a typical
case}\limplies\GF$$ or $${\bf T}_1\derives\englishprop{Special
case}\lor\GF$$ Nonmonotonic behavior is produced by introducing a
large or even infinite set of propositions, such as
$\englishprop{George is a typical case}$, as {\em default\/}
assumptions, then removing them as soon as counterevidence appears.
The extra clauses in conclusions serve as switches to enable or
disable the real conclusions, such as $\GF$, by respectively including
or excluding the default assumptions.

The basic idea of weakening conclusions to maintain monotonicity may
be refined~\cite{weakening}, but I believe that most reasoners rightly
reject this complication of the form of final conclusions.  Since the
normal mode of assertion of a proposition $A$ in everyday life
implicitly means something like, {\em ``As well as can be determined,
$A$ holds,''} it seems inappropriate to introduce explicit formal
disclaimers into every proposition. The best known formal systems for
mathematical reasoning (the {\em Classical First-Order Predicate
Calculus\/} heads the list, but the same holds for higher-order
classical logics, set theories, and formalizations of intuitionistic
or constructive reasoning) all treat the formal assertion of a
proposition $A$ as meaning something like, {\em ``$A$ is absolutely
true, with no possibility of error.''} But, nowhere in popular usage,
not even in mathematical journals, is that formal notion actually
practiced. If it were, Russell's antinomy would have been a
catastrophe rather than a stimulus to further research. So, I reject
the weakening of conclusions by explicit formal disclaimers, since it
forces a detailed accounting for assumptions that are made uniformly
and universally in the practice of language, and thereby obscures the
natural structure of practical assertions of propositions.

Rather than explaining nonmonotonic behavior by weakening the
statements of the conclusions of inference, we need to strengthen the
hypotheses. An obvious possibility is to add the same sorts of
propositions that are used to weaken the conclusions, yielding $${\bf
T}_1\sunion\{\englishprop{George is a typical case}\}\derives\GF$$ or
the more transparent $${\bf T}_1\sunion\{\englishprop{George is not a
flightless bird}\}\derives\GF$$ The trouble is that assertions of
typicality, such as $\englishprop{George is a typical case}$, do not
represent direct intuitive judgements, but rather are themselves the
results of sophisticated epistemological reasoning. Instead of
begging the question of how one decides to assume $\englishprop{George
is a typical case}$, we should reduce that judgement to concepts that
can be referred more directly to observations and intuition. The
use of $\englishprop{George is not a flightless bird}$ avoids the
explicit reference to typicality, but even more obviously begs the
question of how we establish the hypothesis that $\englishprop{George is not a
flightless bird}$, which is essentially as hard as the final
conclusion $\GF$.

Another way of augmenting the hypotheses is to use a formal operator
$\searchres$, where for propositions $A$ and sets ${\bf T}$ of
propositions, $\searchres(A,{\bf T})$ means that {\em a reasonably
diligent search for known propositions relevant to $A$ yields only
${\bf T}$.} Then, the inference that $\GF$ is represented by
$$\searchres(\GF,{\bf T}_1)\derives\GF$$ while $${\bf
T}_1\notderives\GF$$ This approach captures the idea that the
information in ${\bf T}_1$ alone does {\em not\/} allow us to conclude
rationally that $\GF$; rather, the fact that a diligent search for
knowledge bearing on $\GF$ came up with nothing more than ${\bf T}_1$
is required to support the conclusion.  $\searchres$ is unusual in
that it asserts something about the very inference procedure that is
considering propositions built up from $\searchres$, so the operator
is {\em metatheoretic\/} and {\em reflective\/}~\cite{reflection}, in
some ways analogous to the {\em negation as failure\/} used in certain
implementations of Prolog~\cite{negation-as-failure}.

$\searchres$ can trigger nonmonotonic behavior, since the addition of
new information to the knowledge base, or even the discovery of more
information that was already in it, can change $\searchres(A,{\bf T})$
from true to false. In the penguin example, once we discover the
proposition $\GP$ among our knowledge, the metatheoretic proposition
\mbox{$\searchres(\GF,{\bf T}_1)$} becomes false, and the conclusion
$\GF$ can no longer be drawn, because the hypothesis required to infer
it does not hold. \mbox{$\searchres(\GF,{\bf T}_2)$} becomes true at
this point, but it does {\em not\/} support the inference that $\GF$.

Further refinement of these ideas may show how to avoid the annoying
repetition of $\GF$ on both sides of $\derives$, or may reveal that
search keys have a different type than propositions, with some useful
relations determining which search keys are relevant to inferring
which propositions.  One natural operator to consider is the {\em
consistency\/} operator $\consistent$~\cite{consistent}.  When ${\bf
T}$ is a set of propositions, \mbox{$\consistent({\bf T})$} asserts
that ${\bf T}$ is internally consistent (noncontradictory).  Using
$\consistent$, we might approach the penguin problem by allowing
$${\bf T}_1\sunion\{\consistent({\bf
T}_1\sunion\{\GF\})\}\derives\GF$$ In a reasonable system,
$$\consistent({\bf T}_1\sunion\{\GF\})$$ holds, but
$$\consistent({\bf T}_2\sunion\{\GF\})$$ may not hold.  In spite of
the attractive conceptual simplicity of reasoning using $\consistent$,
the huge cost of testing consistency, and the lack of apparent
flexibility for dealing with conflicting defaults, suggest that we
need an operator that refers somehow to the results of the limited
searches for knowledge carried out by practical inference procedures,
rather than the more abstract notion of consistency in a logical
system.  On the other hand, some sort of {\em resource-limited\/} test
of consistency might be the right basis for introducing
nonmonotonicity.  Much further thought is required to determine
whether expressions describing resource limitations need to be
included as arguments to the operator, or whether they may be fixed or
quantified out in a standard way.

In the next two sections, I argue that the use of a nonmonotonic
logical inference relation $\derives$ is highly misleading, and {\em
not\/} the appropriate way to support nonmonotonic behavior in a
procedure for reasoning.  Rather, I argue that nonmonotonic behavior
should be supported by including, in the hypotheses of a logical
inference, an explicit formal representation of the metatheoretic fact
that certain propositions contain all of the knowledge relevant to a
given problem discovered by a reasonably diligent search. I use the
form $\searchres(A,{\bf T})$, where $A$ is a proposition and ${\bf T}$
is a finite set of propositions, to represent such metatheoretic
assertions. I do {\em not\/} believe that this form is exactly the
right one in practice, but it has enough of the essential elements to
serve for the current comparative argument.

\section{Two Interpretations of the Logical Inference Relation}

The crux of my argument depends on choosing between two
interpretations of the logical inference relation denoted by
$\derives$, one of which allows the relation to be nonmonotonic, and
the other of which demands monotonicity.  Roughly, ${\bf T}\derives A$
means that, given knowledge of the propositions in ${\bf T}$, it is
rational to infer $A$ as well.  Two different ways of making this
rough idea more precise yield radically different results for
practical reasoning.
\begin{definition}
Let $A$ be a proposition, and ${\bf T}$ be a set of propositions.
$${\bf T}\locderives A$$ means that, whenever an agent knows each of
the propositions in ${\bf T}$, the agent may rationally conclude that
$A$ holds as well (the subscript $L$ stands mnemonically for
``local'').  $${\bf T}\globderives A$$ means that, whenever an agent
knows each of the propositions in ${\bf T}$, and the agent knows that
${\bf T}$ includes at least all the results of some reasonably
diligent search for all available information relevant to judging the
correctness of $A$, then the agent may rationally conclude that $A$
holds as well (the subscript $G$ stands for ``global'').
\end{definition}
The only distinction between $\locderives$ and $\globderives$ relevant
to my discussion is in the different modes of asserting ${\bf T}$---as
merely some set of known propositions in one case, or a set containing
all known relevant propositions in the other case.  The use of
propositions rather than formulae, and sets ${\bf T}$ rather than
some other structures of propositions or formulae, makes no
difference to the present discussion.  By ``knowledge'' I mean
reasonably reliable information that is relatively easily accessible
to an agent.  In particular, I do not assume that all knowledge is
true.  Some readers may wish to substitute ``rational and informed
belief'' for ``knowledge.''

Since knowing each of the formulae in a set ${\bf T}$ implies knowing
each of the formulae in every subset of ${\bf T}$, $\locderives$ must
be monotonic. $\globderives$ is typically nonmonotonic, since as we
make the set ${\bf T}$ {\em larger,} the information that ${\bf T}$
contains all available relevant knowledge becomes {\em weaker,} and
therefore supports {\em fewer\/} inferences. In principle, logicians
may choose to study any interesting relation between propositions and
sets of propositions that they please.  But, by focusing attention on
an infelicitously chosen relation, and referring to it as ``logical
inference,'' we may mislead those who are trying to apply formalisms
to practical problems.  An intuitively well grounded focus of
attention may clarify thinking. In the next section, I analyze the
concrete operations performed by a reasoning agent, and argue that
$\locderives$ is a useful relation for studying such agents, while
$\globderives$ is not.

\section{The Case for Monotonic Logical Inference, with Nonmonotonic
Procedures}
\label{sec:the-case}

The choice of an inference relation such as $\locderives$ or
$\globderives$ as the basis for studying practical reasoning should
depend on the following properties of a typical useful reasoning
agent.
\begin{enumerate}
\item The agent has access to a huge set of propositions representing
  knowledge about the world, through some combination of queries to an
  internal database, observations of the world, and receipt of
  communications from other agents.
\item Access to available knowledge takes time, and consumes valuable
  resources, in a manner that is generally monotonic in some reasonable
  measure of the size of the set of propositions accessed.
\item The agent must do essentially all of its reasoning from small
  subsets of propositions selected from the available knowledge.
\item The agent must often interleave inference and access to
  knowledge---that is, an inference from knowledge accessed so far
  will often be required in order to determine how to search for
  further useful knowledge.
\end{enumerate}
From these properties of a reasoning agent, it follows that the
individual steps by which inferred propositions are added to current
knowledge are of the form
\begin{itemize}
\item from some subset ${\bf T}$ of current knowledge, find a
  proposition $A$ such that ${\bf T}\locderives A$, and add $A$ to
  current knowledge.
\end{itemize}
Property 4 requires $\locderives$ here, instead of $\globderives$,
since some inferences must be made before a reasonably diligent search
is completed.  $\globderives$ essentially forces a reasoner into a
two-phase procedure of search followed by inference.

In principle, propositions might be stratified hierarchically, so that
whenever a judgement about the truth of $A$ is required for a
reasonably diligent search for all information relevant to $B$, then
no judgement about $B$ is required for a reasonably diligent search
for all information relevant to $A$.  Then $\globderives$ might
conceivably work, if we require ${\bf T}$ to include all the
propositions found by the latest search.  That is, the two-phase
procedure using $\globderives$ may be repeated to simulate
interleaving of search and inference.  But, experience doesn't suggest
such a natural stratification, and it seems pointlessly expensive to
require completion of a reasonably diligent search in the large number
of cases where the inference does {\em not\/} require it semantically.

Certainly, $\globderives$ is an interesting relation, and it is
normally a stronger inference tool ({\em not\/} a logically stronger
relation), in the sense that there are normally ${\bf T}$ and $A$ such
that ${\bf T}\globderives A$, but ${\bf T}\notlocderives A$, while the
reverse will not happen with conventional modes of assertion.  So, in
those cases where an appropriate search has been completed, and all of
its results included in ${\bf T}$, an agent should be allowed to add
$A$ such that ${\bf T}\globderives A$.  But, the agent must {\em
notice\/} the fact that a diligent search is complete (this is
necessary even if {\em all\/} inference steps use $\globderives$,
since the agent must decide when to stop gathering members of ${\bf
T}$, and start applying $\globderives$), so it seems natural to encode
the act of noticing completion of the search as a metatheoretical
proposition added to current knowledge.  This leads to the encoding of
$${\bf T}\globderives A$$ in a form such as $$\searchres(A,{\bf
T})\locderives A$$ Certainly, the syntactic form for representing the
proposition $\searchres(A,{\bf T})$ can be refined to be compatible
with some useful data structure for noticing search completion, so
this encoding need not reduce the efficiency of a reasoning procedure
originally based directly on $\globderives$.

So, it appears that the information carried by $\locderives$ is
essential in the design of practical reasoning procedures, and that
the information in $\globderives$ may be encoded into $\locderives$ by
adding appropriate metatheoretic operators, which construct assertions
that certain sets of propositions contain all of the results of certain
searches, to our formal language.  On the other hand, I doubt that
$\locderives$ can be defined efficiently and naturally from
$\globderives$.  An obvious candidate definition is
\begin{itemize}
\item ${\bf T}\locderives A$ if and only if ${\bf T}'\globderives A$
  for all ${\bf T}'\supseteq{\bf T}$.
\end{itemize}
A naive application of this definition leads to an outrageously
expensive implementation of $\locderives$.  It seems unlikely that
there is an efficient general implementation.  Furthermore, it is not
even clear that the definition is extensionally correct in practically
interesting formal systems of logic.  The {\em only if\/} part
probably holds in most reasonable systems.  Monotonicity of
$\locderives$ requires that \mbox{${\bf T}'\locderives A$}.  When
${\bf T}'\locderives A$ we normally expect that ${\bf T}'\globderives
A$ as well, since the in the second case the hypothesis carries the
information given directly by propositions in ${\bf T}'$ {\em as well
as\/} the information that ${\bf T}'$ contains all results of some
appropriate search.  But, the {\em if\/} part seems to depend in a
delicate way on the range of quantification of extended sets ${\bf
T}'$ of propositions, which varies depending on the generality of the
formal language in use.

I do {\em not\/} claim that the particular syntactic form of
\mbox{$\searchres(A,{\bf T})$} is the right one for practical
reasoning. But, this form illustrates some of the crucial properties of
the right solution.
\begin{enumerate}
\item Inferences using default reasoning must have {\em
metatheoretic\/} hypotheses, since the basis for using a default
assumption is the inability to discover a counterargument, rather than
some direct observation about the world outside of a reasoning agent.
\item Some sort of search parameter must be involved in such
metatheoretic hypotheses, since all practical searches are guided by
some description of a goal, and the results of search depend
critically on this description. The $A$ in \mbox{$\searchres(A,{\bf
T})$} is a very naive presentation of this search parameter.
\item Some sort of description of the results of search must be
involved in such metatheoretic hypotheses, since it is the absence of
a counterindication in those results that enables the default
assumption. The ${\bf T}$ is a very naive presentation of the results
of the search.
\item Although the extensional contents of the knowledge base available to a
reasoning agent determines the truth or falsehood of such
metatheoretic hypotheses, the contents of that knowledge base must
{\em not\/} be mentioned explicitly in the formal presentation of the
hypotheses, since it is too big. The whole point of a formal logical
language is to represent the information actually manipulated by a
reasoning agent, which should normally be much smaller than the world
about which the agent is reasoning. For comparative discussion of the
results of reasoning from different knowledge bases, some description
of the knowledge base may be treated as an additional parameter to the
logical inference relation $\derives$, but it must be understood that
the reasoning agent has no direct access to it, only to the formal
propositions that result from searching it. In particular, useful
definitions of $\derives$ must not depend on qualities of the
knowledge base other than those derived by the search procedures used
by the reasoning agent. If several different knowledge bases are
available simultaneously to a single agent, the {\em name\/} of a
knowledge base to be searched may be modelled as part of the search
parameter mentioned in item 2, but the {\em contents\/} of the
knowledge base must not be an explicit parameter.
\end{enumerate}

\section{The Hard Problem Remains}

Of course, arguments of the sort advanced above do not solve any hard
technical problems.  The hard problem to solve in automating practical
reasoning is the problem of reacting to {\em updates\/} (changes in
knowledge)~\cite{update-problem}.  That problem is essentially the
same at a technical level, whether we use a monotonic inference
relation $\locderives$ or a nonmonotonic $\globderives$.  In
conventional monotonic systems, without metatheoretic operators, the
addition of new knowledge leads only to the {\em addition\/} of new
inferences.  The hard part of the update problem is how to handle {\em
retraction\/} of previously accepted knowledge.  There is no clear
consensus even on the proper form in which to express retractions.

Both nonmonotonic inference relations, such as $\globderives$, and
metatheoretic operators, such as $\searchres$, seem to add a
complication to the problem of changing knowledge, since they both
lead to cases where {\em added\/} knowledge requires conclusions to be
{\em retracted.} I conjecture, though, that once the basic problem of
managing retraction of knowledge is solved, the extra interaction of
additions and retractions introduced by nonmonotonicity will become
much clearer.  I propose that, by representing nonmonotonic reasoning
behavior through metatheoretic operators, such as $\searchres$, with
the monotonic inference relation $\locderives$, we can improve the
flexibility and transparency by which future breakthroughs in the
update problem are applied to a wide range of interesting domains of
reasoning.

While the 4 properties of useful reasoning agents mentioned in
Section~\ref{sec:the-case} support the use of formal systems that are
conventional to the extent of being monotonic, they by no means
support the particular choices of classical or intuitionistic
reasoning. In fact, an additional property of reasoning agents argues
strongly against such choices:
\begin{enumerate}
\setcounter{enumi}{4}
\item Some of the knowledge provided to the agent is wrong, and
possibly even contradictory.
\end{enumerate}
Both classical and intuitionistic logic trivialize in the presence of
contradictory information---{\em everything\/} follows logically from
any contradiction, and the mere fact that classically or
intuitionistically \mbox{${\bf T}\derives A$} does {\em not\/} make it
safe for a reasoning agent to conclude $A$ given knowledge of each
proposition in ${\bf T}$. Even in the absence of contradiction,
classical and intuitionistic logic seem to be too sensitive to errors,
due to their conceptual foundation on the assertion of absolute truth
rather than of rational belief based on fallible information, although
I can find no rigorous discussion of this sensitivity in the
literature.

An obvious fix is to have the agent test the consistency of ${\bf T}$
before applying logical inference.  Unfortunately, consistency
checking ranges from intractably expensive to fundamentally
impossible, so the time and resource for consistency checking would
dominate that for real reasoning steps. Rather, property 5 requires
the use of {\em paraconsistent\/} formal
systems~\cite{paraconsistent}, such as {\em relevance
logics\/}~\cite{relevance} instead of classical or intuitionistic
formal systems. Further research is required to illuminate the choice
of a particular paraconsistent formal system for a particular
reasoning agent. The right paraconsistent formal system can reduce the
harm done by errors and contradictions in the hypotheses used by a
reasoning agent, allowing useful reasoning to go on during the
typically long time between the introduction of an error or
contradiction and its detection. But, they still do not address the
hard problem of retracting information when an error is finally
detected.

\end{document}
