I was eating my baloney sandwich for lunch at the University of Chicago when Stuart Kurtz proposed that the time had come to publish computer science research online. We had known for some time that online publishing was inevitable in the near future. Stuart observed that the widespread availability of FTP for acquiring articles, PostScript printers and viewers for reading articles, and LaTeX for producing articles, made the whole thing possible, right now. ``Right now'' right then was 1989. Stuart Kurtz, Janos Simon, and I began planning what is right now, now 1997, the Chicago Journal of Theoretical Computer Science. CJTCS has been published on the World Wide Web by MIT Press since 1995.
The Journal of Online Publishing has kindly invited me to share what I know about online scholarly publishing. I realize that I still know essentially nothing. Essentially nothing about online publishing has stayed constant long enough to determine anything useful by experiment. The formats, distribution methods, financial arrangements, are all unstable. So, instead I will describe my experience with the design and operations of CJTCS and another MIT journal, the Journal of Functional and Logic Programming. Then, I will explain the opinions that my experience with these journals has stimulated. Someday, these opinions will be replaced by knowledge, but not right now.
The development of systems for online scholarly publishing is like a huge multiplayer semicooperative game of Tetris. Funny-shaped pieces come at us in an order that we cannot control, we are required to place each one promptly, and our success depends on the smoothness of the final surface. Actually, it's worse than that, since the shapes of pieces that we have already placed can change. Every decision must yield a workable system right now, while taking us down a path toward the best possible stable system at some unknown time in the future. Disagreements about decisions arise at least as much from different notions of the path as they do from different notions of the goal. Our final success depends at least as much on our emotional commitment to make things work, whatever the future brings, as it does on the details of our incremental choices. We are mired in medias res among ephemeral details, trying always to steer for a truly productive stable system that we cannot foresee reliably. We are piloting our publications toward an unseen world 50 years ahead, while our visibility is 1 week. This is what makes it frustrating, and also what makes it fun.
The reason for moving scholarly publishing online is technological. But, the main activity in advancing the state of online publishing is coalition building. There are more plausible attitudes on the subject than there are participants in the enterprise, and interesting results come from groups that can co-operate on a common project in spite of their disagreements on details and reasons. The analysis below represents my personal understanding of the online publishing enterprise. My colleagues in CJTCS, JFLP, and at the MIT Press, helped me reach this understanding, and we co-operate to publish the journals, but probably nobody else agrees with all of my points.
In order to design online publication, we need to reconsider publication in general, to expose issues that we may have taken for granted during the long dominance of printed publication. For starters, we should identify as well as possible the reason for publishing. Many people are served by the conventional printed publication of a scholarly article:
In the long run, I expect to see scholarly publishing evolve toward
the best possible service to readers, and I want that long run to come
as soon as possible. There are cogent arguments, however, for a focus
on 1-3. Stevan Harnad
I think that reader-centered publication has an ethical edge over
other models, as long as it really works. I expect authors, editors,
and publishers to succeed in their own endeavors by catering to the
interests of readers. But, each of these activities---writing,
editing, publishing, and reading---is essential to the success of
scholarship. It is still possible that the best way to serve readers
is to focus on providing incentives to the other players, for example to
authors as suggested by Harnad. My bets are on the reader-centered
model; if I win it will not be because of abstract principles that
favor this model, but because all of the details of scholarly
publishing work out well in this way.
Having decided to focus on service to readers, it is still not obvious
how to best serve them. At first, it seems that we must predict the
behavior of readers over the effective lifetime of an article, which
is at least decades and possibly centuries, in order to maximize their
satisfaction. I believe that such prediction is impossible, and that
attempts to cater to detailed needs of readers are harmful: instead of
serving their actual needs, we will impose constraints on their
behavior. Rather than trying to anticipate readers' needs, I propose
that we provide them with as much information as possible, and the
greatest flexibility possible to use that information in whatever ways
they like. This may sound even harder than serving specific
anticipated needs, but in computing and informational enterprises,
there are sensible principles that tend to maximize flexibility.
In choosing real estate, the three most important issues are location,
location, and location. Location is most important because it is the
quality of real estate that cannot be changed by
construction, renovation, or zoning legislation. Similarly, in
designing computing/information systems for the long term, the three
most important issues are binding time, binding time, and binding
time. The ``binding time'' of a datum is the time at which its value
is recorded permanently, or ``bound,'' in the system. Although the
buzz phrase specifies time, it is just
as important who chooses the binding. The key to flexible
systems is to delay binding time for each datum until there is no
further reason to change, or until it is crucial to establish
uniformity. For example, early versions of the UNIX operating system
used to allow the system installer to bind the time zone and the
choice to follow or ignore daylight savings, but had the starting and
ending dates for DST compiled into the system. This seemed great until
Congress changed those dates. And, those who carry modern laptop
computers on trips may be annoyed by the assumption that machines don't
change their time zones.
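The DST story can be sketched in a few lines of code. This is only an illustration of early versus late binding, not a reconstruction of the UNIX code: the function names, the rule format, and all of the dates are made up for the example.

```python
from datetime import date

# Early binding: the DST window is fixed when the code is written.
# The dates are illustrative only, not the historical UNIX values.
def is_dst_early(day):
    return date(day.year, 4, 26) <= day < date(day.year, 10, 25)

# Late binding: the rule is data supplied at call time, so a change in the
# law means editing a table rather than rebuilding the system.
def is_dst_late(day, rule):
    start_month, start_day = rule["start"]
    end_month, end_day = rule["end"]
    start = date(day.year, start_month, start_day)
    end = date(day.year, end_month, end_day)
    return start <= day < end

# Two hypothetical rule tables: before and after a legislative change.
old_rule = {"start": (4, 26), "end": (10, 25)}
new_rule = {"start": (3, 8), "end": (11, 1)}

mid_march = date(1997, 3, 15)
print(is_dst_early(mid_march))           # bound at build time, cannot follow the change
print(is_dst_late(mid_march, old_rule))  # same answer under the old table
print(is_dst_late(mid_march, new_rule))  # follows the new table with no code change
```

The point is who gets to choose the binding: in the second function it is whoever maintains the table, not whoever compiled the system.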
So, I considered the right binding times for different qualities of a
published article. It is important to bind the content of
each article when it is published, as well as those aspects of style
that are subject to criticism and discussion by a variety of readers
in the future. The value of a published article is diluted severely if
readers cannot refer with confidence to a uniform notion of the
content. But, the form of display of an article is subject to
the particular needs and tastes of the reader, and should be bound
only at the moment of reading. Even the word ``display'' is too
restrictive: readers may wish to process articles through
automatic systems for information retrieval and analysis, they may
wish to import formulae into symbolic math utilities, and they are almost
certain to invent ways of using the information in articles that are
inconceivable today. So, the fundamental principle in the design of
CJTCS is to provide readers with the most direct presentation
of the textual contents of an article that we can arrange, and leave
them free to use that presentation however they like. This flexibility
is unlikely to be noticed much in the first few years, but should be a
big win in the long run.
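A toy example may make the division of binding times concrete. It assumes a hypothetical structural record for an article (the element names and the sample content are invented here, and have nothing to do with the actual CJTCS source format). The content is fixed once; each reader binds a use for it at the moment of reading.

```python
import textwrap

# Content bound at publication time: an immutable, purely structural record.
# Element names ("title", "para", "formula") are hypothetical.
ARTICLE = (
    ("title", "A Sample Theorem"),
    ("para", "We state a bound on the running time of the algorithm."),
    ("formula", r"T(n) \le c \, n \log n"),
    ("para", "The proof follows by induction on n."),
)

def as_plain_text(article, width=40):
    """One reader's choice: wrapped plain text for a narrow window."""
    lines = []
    for kind, body in article:
        if kind == "title":
            lines += [body.upper(), ""]
        elif kind == "formula":
            lines.append("    " + body)
        else:
            lines += textwrap.wrap(body, width)
    return "\n".join(lines)

def formulae(article):
    """Another reader's choice: no display at all, just the formulae,
    ready to hand to some symbolic math utility."""
    return [body for kind, body in article if kind == "formula"]

print(as_plain_text(ARTICLE))
print(formulae(ARTICLE))
```

Neither function was anticipated by the record itself, which is the whole idea: the publisher fixes the content, and readers invent the uses.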
We publish articles because the act of publication adds value. I find
four basic sources of value in publication:
Back to 1989. Stuart and Janos and I decided that our project should
be guided by three general principles:
These principles, and other considerations, led to concrete decisions
about the nature of our journal:
The requirement of manpower in item 4, and of institutional support in
item 6, delayed us several years. We were unwilling to start entirely
on volunteer effort and hope. We might have chosen differently had our
topic been suitable for plain textual articles. It is perfectly
sensible for volunteer editors to critique style and content of
English text. And, plain ASCII files can be distributed widely so that
archiving becomes the distributed responsibility of a large number of
libraries and readers. In the old print regime, publishers did little
or nothing about archiving: they shipped printed volumes and left
archiving to the libraries. The difficulty of our highly mathematical
material requires more sophisticated publication formats than plain
ASCII. Editing for good format and layout requires skills and a type
of work associated with professionals rather than volunteers. These
considerations convinced us that we needed a modest budget for
editorial and production.
When we considered archival permanence, we thought at first about the
longevity of material media, such as CDs, magnetic tapes, disks. It is
very difficult to get reliable information on these points. We soon
realized that the deterioration of materials was not the key problem,
anyway. Online bits are always being copied between disks and other
storage devices. Merely by having two or three repositories in
different cities, we could be very secure against physical
disasters. Our real problem was obsolescence of data formats. There is
no way to guarantee in advance that a database will be converted into
new formats when the old ones lose support. Security against format
obsolescence hinges on the commitment of an institution to doing
whatever is necessary, when it is necessary, without advance knowledge
of the specific steps. So, we needed a reliable institutional
sponsor. Over time, we thought of other uses for such a sponsor. For
example, we needed someone to hold copyrights and/or licenses, to
ensure our permanent rights to distribute articles.
At first, we sought funding from a government agency, or foundation,
to subsidize a few years of journal operations while arranging
permanent funding for university sponsorship of the journal. We still
find the idea of direct university sponsorship very attractive for the
long run. Imagine that all university libraries took their large
periodicals budgets, and diverted them from subscription fees, to the
direct subsidy of important journals. Each university could sponsor a
modest set of journals, and give away the articles for free
online. The total cost could be substantially less than the cost of
giving articles to publishers and then buying them back. Access to
scholarly information would improve. Those publishers flexible enough
to adapt to the new regime could offer their editing and production
skills on a contract basis. The risk of journal failures would be
borne by universities instead of publishers. But, before we found
willing sponsors for our favored model, MIT Press took the initiative
in proposing to operate the journal. We found this offer irresistible,
and decided to edit a journal for publication by MIT Press. I consider
MIT Press' enthusiastic and open-minded attitude to be far more
important than the fact that it is a publisher, rather than a
university or library.
We decided to focus on theoretical computer science, mainly because we
had the expertise and contacts to form a good editorial board in that
area. Also, TCS is a mature field with a number of good conferences
and printed journals, but the extreme competition for conference
positions and multiyear delays for journal publication suggest that
there is room for another venue for articles. The presence of more
conventional printed journals is important, so that we can send
unsuitable authors elsewhere instead of compromising our methods in
order to serve everybody.
Online publishing should lead us to question almost every aspect of
the journal business, even though we resolve some of those questions
by sticking to tradition. We wondered what should determine the
boundaries of a journal. If it is the format and access method, then
the scope of a journal should be as broad as possible.
Springer-Verlag's Journal of Universal Computer Science, and the
British Computer Society's Computer Journal try to cover all of CS. The
New England Complex Systems Institute's InterJournal
is even more ambitious, with no limitation on the scope of topics,
although only three areas are represented so far. There is a good
chance that, in the long run, there will be a small number of
comprehensive collections of scholarly articles, divided according to
irreconcilable differences regarding format, access methods, power
politics, and ego. These collections will have relatively independent
editorial boards for each of a large number of topics. There is no
compelling case that each of these topics needs a creative name and a
separate identity as a "Journal." We considered seriously founding a
journal for all of CS, with an initial topic area in theoretical
CS. We decided that we could not expand the set of topics fast enough
to maintain credibility in the broader scope. But, perhaps
CJTCS will merge someday into a much larger and broader
journal.
I have argued at some length elsewhere that the
definitive archival format of articles published online
should present the textual structure of articles as transparently as
possible, leaving details of the display, such as typographical
layout, to readers. Layouts that are attractive to large numbers of
readers should be provided when convenient as a derivative service,
but the definitive copy of an article should be a structural source
format.
We decided very early, as a corollary of our commitment to flexibility
for future readers, to publish textual structure rather than
typography. But, what is an appropriate format for textual structure?
SGML (Standard Generalized Markup Language), of which HTML (HyperText
Markup Language) is a derivative, was designed mainly to represent
textual structure. Although SGML has many flaws as a structural format,
it is the best default choice today, since so many people are
supporting it. The Astrophysical Journal, published by the University
of Chicago Press, uses SGML as the definitive source format for
articles.
The topic of CJTCS ruled out SGML as a definitive
publication format. Research articles in theoretical CS require a lot
of complex mathematical formulae, which cannot be read efficiently
without a high quality of typographical formatting. Nobody has yet
provided a general-purpose SGML or HTML viewer with
an acceptable presentation of mathematical formulae. Most articles in
theoretical CS are written with the LaTeX and
AMSLaTeX typographical languages, based on the TeX
system for typesetting text and mathematics. LaTeX and
AMSLaTeX were designed to support authors through a series of
drafts and revisions, with high quality printed copy as the final
product. They were never intended as source formats for publication.
Fortunately, the best way to support an author through the revision of
a series of drafts is to use a source format representing textual
structure. Leslie Lamport, the creator of LaTeX, explains this
point in the
CJTCS could not survive even the briefest startup without
excellently readable presentations of articles, since no author would
submit a paper to an illegible journal. In order to provide acceptable
presentation instantly, with the best feasible structural information
for future use, we decided to publish LaTeX source, but to
edit it carefully to conform to strict standards of our own. To
support these standards, I provided a free collection of macros, called
CJstruct,
and defined a disciplined subset of LaTeX using those macros
to present structure as clearly and unambiguously as I could
manage. CJstruct does very little to determine typographical
layout. Rather, it translates a roughly SGML-like structure
into the commands defined by existing LaTeX styles. The small
amount of typographical content in CJstruct provides default
layouts for a few structural elements, such as journal head matter,
that are not supported by existing styles. We provide a standard
preformatted version of each article, in a style agreeable to the
majority of current readers. But, readers in the future have the power
to vary all of the elements of typographical style in the article. The
obvious variations involve individual favored fonts, alternate page
sizes, large type for readers with vision impairments. But, the most
important uses of flexibility are the ones that nobody does yet, but
will do decades from now.
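The translation idea can be sketched as a lookup table from structural elements to the commands of an existing style. The element names and templates below are hypothetical stand-ins, chosen only for illustration; they are not the real CJstruct macros, and the real package is written in LaTeX rather than Python.

```python
# Hypothetical structural elements mapped onto ordinary LaTeX commands.
# These are NOT the actual CJstruct macro names.
DEFAULT_STYLE = {
    "section": r"\section{{{text}}}",
    "theorem": r"\begin{{theorem}}{text}\end{{theorem}}",
    "para": "{text}\n",
}

# A reader's variation: same structure, different typography for headings.
LARGE_TYPE_STYLE = dict(DEFAULT_STYLE, section=r"\section*{{\Huge {text}}}")

def to_latex(elements, style=DEFAULT_STYLE):
    """Translate (kind, text) pairs into LaTeX using the given style table."""
    return "\n".join(style[kind].format(text=text) for kind, text in elements)

source = [
    ("section", "Introduction"),
    ("para", "We prove a lower bound."),
    ("theorem", "Every comparison sort needs $\\Omega(n \\log n)$ comparisons."),
]

print(to_latex(source))
print(to_latex(source, LARGE_TYPE_STYLE))
```

Swapping the table changes the layout while the structural source stays untouched, which is roughly the freedom the preformatted version leaves open to readers.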
The Astrophysical Journal solved a very similar problem in
reconciling presentation of mathematics with good textual structure,
in a substantially different way. AJ uses SGML as
the definitive source format, and uses its own software to convert
this source automatically to LaTeX and other formats. In a
static sense, the AJ solution is better than
CJTCS's. So why do we do it differently? AJ is a huge
journal, and substantial resources were spent on professional
programmers to provide the conversion software, which works only on
AJ's particular material. CJTCS's programming staff
consists of my ``free'' time. AJ devotes substantial
editorial resources to re-entering authors' manuscripts in the correct
form. CJTCS's corresponding resource is currently me, in the
process of transferring to one staffer at MIT Press, who has many
other tasks in hand besides CJTCS editing. AJ's
mathematical demands are strong, but not nearly as severe as
CJTCS's. Finally, AJ is not trying to present any
internal structure in mathematical formulae, only the typographical
layout information. At some point in the future, I expect
CJTCS to convert to a better structural format, which may be
a derivative of SGML, but only when someone else is
providing a general-purpose display method with high-quality
presentation of formulae. AJ deserves commendation for trying
the immediate SGML path. It allowed them to convert very
efficiently from pure print to online presentation in
HTML. Their conversion programs may develop into the basis
for a general-purpose SGML formula formatter, or they may
merely provide useful data to whoever creates that formatter.
The Journal of Functional and Logic Programming, also published by MIT
Press, decided to use CJstruct, and the same standards for
published LaTeX source. This sort of bandwagon effect is
crucial to the long-term success of our methods, since most of the
development work required for one journal ports to the other. The
bandwagon will grow, perhaps by attracting more users of
CJstruct in the near term, certainly by merging with other,
independently derived methods in the long term.
When we first conceived CJTCS, anonymous FTP was the
normal method for distributing information over the Internet, and we
assumed that it would be the main method for the journal. Then, I
discovered Gopher, and spent substantial effort organizing a
Gopher interface, which was obsoleted by the World Wide Web
about the time the journal started publishing. Now, World Wide Web
access is the norm. We support anonymous FTP too,
particularly for automatic mirroring operations. The good news is
that the change from FTP to HTTP through
Gopher made no difference whatsoever to the journal
operation, and further changes in network protocols are likely to be
equally transparent. Rather than committing to a specific network
distribution protocol, we are committed to tracking whatever protocol
everybody is using. Since both MIT Press and the University of Chicago
need to run the latest protocols for other reasons, there is very
little cost to the journal for this commitment.
As an auxiliary service, MIT will print hard copies of articles and
mail them for a charge. This has no impact on operations, since it is
run as a printing service, and does not affect the production of the
definitive published articles.
Library subscribers often ask what they need to do to ``acquire''
CJTCS articles, now that they have subscribed. The
acceptance, cataloging, and shelving of newly received paper materials
is a substantial job for libraries in the print regime. What are the
analogous activities in the online regime? There aren't any, and that
is good news. Since the World Wide Web is so successful at delivering
information directly from the producer to the reader, libraries do not
need to mediate readers' access to journal articles. The libraries'
role with an online journal is to help local readers find the
journal. That requires an entry in the (preferably online) catalog,
and a pointer to the journal's URL. No work whatsoever is called for
by the library to receive individual articles, issues, or
volumes. Certain libraries will choose to mirror certain journals to
improve network efficiency, but the article-by-article effort in such
mirroring is entirely automated.
We intend that CJTCS will support itself primarily through
institutional subscriptions, which are priced at $125 per year. A
grant from the Mellon Foundation is helping with the startup
explorations. For the first three years of operations, we have
provided articles to subscribers by FTP and HTTP
(World Wide Web) with no technical access restrictions. There is some
controversy over whether subscribers are offended by the
possibility that nonsubscribers are looking at articles for
free. Since academic libraries have always viewed themselves as
stewards of the literature, more than as institutions gaining
advantages over competing libraries by controlling access to strategic
documents, I do not expect them to care about access restrictions. The
lack of technical access restrictions, such as passwords, makes life
simpler for subscribers, and it allows the sort of good-faith browsing
by nonsubscribers that goes on in book stores. There is no method for
controlling online access to information that does not make legitimate
access less convenient in order to prevent illegitimate access. Since
the journal and its subscribers are not harmed directly by unpaid
access, I have opposed access controls. The freedom from controls
makes volunteer mirroring particularly easy, and paves the way for our
inclusion in broader document collections in the future.
We do not yet have enough subscriptions to support operations. I
suspect that we are not producing enough articles to attract
subscribers, and that we have diverted to production work a lot of
effort that should go into explaining the journal to
libraries. There are many other theories about the low initial
subscriptions. We will never know the real reason, but I hope that we
will eliminate it whatever it is.
Starting in 1998, CJTCS will restrict access by Internet
address to subscribing institutions and individuals. I opposed this
step, but considered it not important enough to argue about in the
midst of more serious issues. Address restriction is a lot less
annoying than passwords. It makes access completely transparent from
the right Internet addresses, but creates a large problem for
subscribers who normally come in from private Internet access
providers instead of from their home institutions. I expect that the
access restrictions will have a minor negative impact on
subscriptions, rather than the positive impact that motivates this
step. I also expect that, having solved the more serious problems of
insufficient articles and subscribers, we will eventually reverse
this step, due to competition with a large unified archive of freely
accessible articles in computer science that should arise in the near
future. We will see.
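For readers who have not met address restriction, here is a minimal sketch of the kind of check involved. It is not the MIT Press implementation, which I do not describe here; the network list is hypothetical and uses address blocks reserved for documentation.

```python
import ipaddress

# Hypothetical subscriber networks (documentation-only address blocks).
SUBSCRIBER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),     # e.g. a subscribing campus
    ipaddress.ip_network("198.51.100.0/25"),  # e.g. a subscribing library
]

def is_subscriber(client_address):
    """Return True if the request comes from a subscribing network."""
    addr = ipaddress.ip_address(client_address)
    return any(addr in net for net in SUBSCRIBER_NETWORKS)

print(is_subscriber("192.0.2.17"))   # on campus: transparent access
print(is_subscriber("203.0.113.5"))  # same person at home, via a private provider
```

The second call shows the problem described above: the check sees only the address, so a legitimate subscriber connecting through a private access provider looks exactly like a nonsubscriber.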
Independently of the question of technical restrictions on network
access, CJTCS remains committed to providing the most liberal
possible license to subscribers. The only restrictions are those
required to ensure fair attribution to authors and the journal, and to
prohibit use in a
competing commercial product. In particular, subscribers have access
to current articles and back issues; they may read articles online
from MIT Press, the University of Chicago, or any mirror site; they
may form their own mirrors or private archives and preserve copies of
articles after their subscriptions expire; they may display and print
copies in whatever formats they like; and they may use any information
processing and retrieval software that they like. I believe that the
unbundling of information from the methods for displaying and
processing it is absolutely crucial for the
effectiveness of the network in academia. In the long term, readers
should be able to choose their favorite software for display and other
processing, and use that software on all the types of information that
they read, rather than dealing with specialized interfaces that are
coupled tightly to particular sources of information.
In its third year of operation, here's how CJTCS processes
scholarly articles, right now.
No, there is not a space missing in the section title. Experience with
online publishing is thoroughly inconclusive, and it would be harmful
rather than helpful to draw definitive conclusions. Instead, here are
some personal predictions, some of which might be right.
Serving Readers Through Flexibility
The Story
Principle 1 is essential to attract the support of authors, volunteer
editors, and subscribers. Principle 2 secures my interest in the
project. Principle 3 tries to keep us sane.
Looking for a Sponsor
Choosing a Topic
Choosing a Publication Format
Distribution Methods
Support Through Subscriptions
Journal Operations
We have published 12 articles (far too few) by this process. Some of
the inefficiencies are quite glaring, but the whole process is
unstable, and it doesn't always make sense to fix problems that are
going to change anyway. The repeated FTP transmissions are
not as bad as they look, since good user interfaces under
Emacs make this transmission look almost the same as just
storing a file locally. Much of my production work is being
transferred to new technical staff at MIT Press, so the pattern of
communication will change before we could improve the current form. In
principle, all work should be done on a single definitive copy of an
article, using RCS to control the revisions from a single
site, and CVS if people at different sites work on the same
article. Improving interaction with the copy editor is a bit
harder. Printing and expressing of unmarked hard copy can be solved by
transmitting PostScript. It would be easy to create
LaTeX macros to display the copy editor's markup. But, it is
difficult to provide a user interface for data entry that competes in
speed with pen and paper markup.
Inconclusion
What the online publishing enterprise needs now is not one person who
knows everything right. Even if such a person existed, we'd surely
ignore her. Rather, we need a lot of people, full of energy and good
will, to keep a grip on the controls and dodge the obstacles as they
appear. The sound of a successful online publishing enterprise is
something like this: "Oops." "Aaagh!" "Hmmm." "Whew." Repeat until
things settle down.