DRAFT IN PROGRESS
Last modified 15 August 1997

Piloting an Electronic Journal, Visibility 1 Week

Introduction (details versus concepts)

In Medias Res

I was eating my baloney sandwich for lunch at the University of Chicago when Stuart Kurtz proposed that the time had come to publish computer science research online. We had known for some time that online publishing was inevitable in the near future. Stuart observed that the widespread availability of FTP for acquiring articles, PostScript printers and viewers for reading articles, and LaTeX for producing articles, made the whole thing possible, right now. ``Right now'' right then was 1989. Stuart Kurtz, Janos Simon, and I began planning what is right now, now 1997, the Chicago Journal of Theoretical Computer Science. CJTCS has been published on the World Wide Web by MIT Press since 1995.

Digging Our Way Out

The Journal of Online Publishing has kindly invited me to share what I know about online scholarly publishing. I realize that I still know essentially nothing. Essentially nothing about online publishing has stayed constant long enough to determine anything useful by experiment. The formats, distribution methods, and financial arrangements are all unstable. So, instead, I will describe my experience with the design and operation of CJTCS and another MIT Press journal, the Journal of Functional and Logic Programming. Then, I will explain the opinions that my experience with these journals has stimulated. Someday, these opinions will be replaced by knowledge, but not right now.

The development of systems for online scholarly publishing is like a huge multiplayer semicooperative game of Tetris. Funny-shaped pieces come at us in an order that we cannot control, we are required to place each one promptly, and our success depends on the smoothness of the final surface. Actually, it's worse than that, since the shapes of pieces that we have already placed can change. Every decision must yield a workable system right now, while taking us down a path toward the best possible stable system at some unknown time in the future. Disagreements about decisions arise at least as much from different notions of the path as they do from different notions of the goal. Our final success depends at least as much on our emotional commitment to make things work, whatever the future brings, as it does on the details of our incremental choices. We are mired in medias res among ephemeral details, trying always to steer for a truly productive stable system that we cannot foresee reliably. We are piloting our publications toward an unseen world 50 years ahead, while our visibility is 1 week. This is what makes it frustrating, and also what makes it fun.

Design of the Chicago Journal of Theoretical Computer Science

The reason for moving scholarly publishing online is technological. But, the main activity in advancing the state of online publishing is coalition building. There are more plausible attitudes on the subject than there are participants in the enterprise, and interesting results come from groups that can co-operate on a common project in spite of their disagreements on details and reasons. The analysis below represents my personal understanding of the online publishing enterprise. My colleagues in CJTCS, JFLP, and at the MIT Press, helped me reach this understanding, and we co-operate to publish the journals, but probably nobody else agrees with all of my points.

Publishing in Support of Reading

In order to design online publication, we need to reconsider publication in general, to expose issues that we may have taken for granted during the long dominance of printed publication. For starters, we should identify as well as possible the reason for publishing. Many people are served by the conventional printed publication of a scholarly article:

  1. The author gains prestige and credentials for employment and promotion.
  2. The editor gains similar prestige.
  3. The publisher makes money from the sale of the journal.
  4. Readers learn from the article.
  5. The scholarly community advances itself.
  6. Scholarship contributes to the overall quality of life in the world.
I choose 4, the act of reading, as the strategic step which should drive the publishing business. The main focus of my work in designing CJTCS is providing the greatest possible service to readers. Why? 5-6 are more fundamental sources of value, but they are too general, applying to all aspects of scholarship as well as to publishing. 1-3 are derivative issues. In principle, we could change or eliminate the activities of publishers, editors, and authors, if there were other ways to deliver equal or better enlightenment to readers.

In the long run, I expect to see scholarly publishing evolve toward the best possible service to readers, and I want that long run to come as soon as possible. There are cogent arguments, however, for a focus on 1-3. Stevan Harnad models scholarly publishing as a service to authors, who want to share their ideas for both altruistic and selfish reasons. Publishers naturally view the scholarly enterprise as an engine for supporting their own activities. The publishers' view is inherently no better or worse than the author-centered view, although the latter is easier to promote in a scholarly community made up essentially of authors. I haven't seen a proposal for editor-centered publication, but perhaps there is one.

I think that reader-centered publication has an ethical edge over other models, as long as it really works. I expect authors, editors, and publishers to succeed in their own endeavors by catering to the interests of readers. But, each of these activities---writing, editing, publishing, and reading---is essential to the success of scholarship. It is still possible that the best way to serve readers is to focus on providing incentives to the other players, for example to authors as suggested by Harnad. My bets are on the reader-centered model; if I win it will not be because of abstract principles that favor this model, but because all of the details of scholarly publishing work out well in this way.

Serving Readers Through Flexibility

Having decided to focus on service to readers, it is still not obvious how best to serve them. At first, it seems that we must predict the behavior of readers over the effective lifetime of an article, which is at least decades and possibly centuries, in order to maximize their satisfaction. I believe that such prediction is impossible, and that attempts to cater to detailed needs of readers are harmful: instead of serving their actual needs, we will impose constraints on their behavior. Rather than trying to anticipate readers' needs, I propose that we provide them with as much information as possible, and the greatest flexibility possible to use that information in whatever ways they like. This may sound even harder than serving specific anticipated needs, but in computing and informational enterprises, there are sensible principles that tend to maximize flexibility.

In choosing real estate, the three most important issues are location, location, and location. Location is most important because it is the quality of real estate that cannot be changed by construction, renovation, or zoning legislation. Similarly, in designing computing/information systems for the long term, the three most important issues are binding time, binding time, and binding time. The ``binding time'' of a datum is the time at which its value is recorded permanently, or ``bound,'' in the system. Although the buzz phrase specifies time, it is just as important who chooses the binding. The key to flexible systems is to delay binding time for each datum until there is no further reason to change, or until it is crucial to establish uniformity. For example, early versions of the UNIX operating system allowed the system installer to bind the time zone and the choice to follow or ignore daylight saving time (DST), but had the starting and ending dates for DST compiled into the system. This seemed great until Congress changed those dates. And, those who carry modern laptop computers on trips may be annoyed by the assumption that machines don't change their time zones.
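
The DST example can be made concrete with a small sketch. The Python below is purely illustrative and is not taken from any UNIX source: the names DstRule, is_dst_early_bound, and is_dst_late_bound, and the specific dates, are assumptions chosen for the example. It contrasts a rule that is bound when the program is written with one that is bound when the program runs.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DstRule:
    """Hypothetical DST rule: first and last day of daylight time."""
    start: date
    end: date


def is_dst_early_bound(day: date) -> bool:
    # Early binding: the dates are fixed when the code is written.
    # A change in the law requires editing and reinstalling the program.
    return date(day.year, 4, 26) <= day <= date(day.year, 10, 25)


def is_dst_late_bound(day: date, rule: DstRule) -> bool:
    # Late binding: the rule is data supplied at run time, so whoever
    # maintains the rule can change it without touching the code.
    return rule.start <= day <= rule.end


# Illustrative dates only; they are not offered as a record of any law.
old_rule = DstRule(start=date(1987, 4, 26), end=date(1987, 10, 25))
new_rule = DstRule(start=date(1987, 4, 5), end=date(1987, 10, 25))

day = date(1987, 4, 10)
print(is_dst_early_bound(day))            # stuck with the compiled-in dates
print(is_dst_late_bound(day, old_rule))   # same answer as the early-bound version
print(is_dst_late_bound(day, new_rule))   # follows the updated rule
```

The two functions compute the same thing; the only difference is when, and by whom, the dates are chosen.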

So, I considered the right binding times for different qualities of a published article. It is important to bind the content of each article when it is published, as well as those aspects of style that are subject to criticism and discussion by a variety of readers in the future. The value of a published article is diluted severely if readers cannot refer with confidence to a uniform notion of the content. But, the form of display of an article is subject to the particular needs and tastes of the reader, and should be bound only at the moment of reading. Even the word ``display'' is too restrictive: readers may wish to process articles through automatic systems for information retrieval and analysis, they may wish to import formulae into symbolic math utilities, and they are almost certain to invent ways of using the information in articles that are inconceivable today. So, the fundamental principle in the design of CJTCS is to provide readers with the most direct presentation of the textual contents of an article that we can arrange, and leave them free to use that presentation however they like. This flexibility is unlikely to be noticed much in the first few years, but should be a big win in the long run.

We publish articles because the act of publication adds value. I find four basic sources of value in publication:

  1. Certification. By publishing an article, we certify that it meets some standards of quality imposed by the editors.
  2. Standardization.
  3. Distribution.
  4. Archiving.

The Story

Back to 1989. Stuart and Janos and I decided that our project should be guided by three general principles:

  1. provide a substantial service to the scholarly community;
  2. perform a significant experiment in online publishing; and
  3. don't try too many innovations at once.
Principle 1 is essential to attract the support of authors, volunteer editors, and subscribers. Principle 2 secures my interest in the project. Principle 3 tries to keep us sane.

These principles, and other considerations, led to concrete decisions about the nature of our journal:

  1. We would publish original research in computer science, starting with theoretical computer science.
  2. We would start a new journal, published online only. There is value both in starting fresh, and in gradually converting existing print journals. We decided to make CJTCS an experiment in total commitment to the network. We had more freedom to do this with a new journal, since we had no legacy commitments to constrain us.
  3. We would apply conventional peer review. The key resource that is conserved by peer review is readers' attention. Online publication presents no compelling reason to change this. The network may provide opportunities to improve the process of selecting articles for publication, but we had too many other innovations to work on, and decided to be conservative here.
  4. We would provide substantial editing of accepted manuscripts. Neither the textual content of authors' manuscripts, nor the markup structure, nor the typographical layout, is usually good enough to support the full range of future uses for online information.
  5. We would focus on publishing conventional articles in a new medium, then add more value incrementally through variations and auxiliary services.
  6. We would plan from the beginning for permanent archival value. Since online materials are more likely to be lost through obsolescence of formats than through deterioration of materials, the key to archival value is the commitment of stable institutions.

Looking for a Sponsor

The requirement of manpower in item 4, and of institutional support in item 6, delayed us several years. We were unwilling to start entirely on volunteer effort and hope. We might have chosen differently had our topic been suitable for plain textual articles. It is perfectly sensible for volunteer editors to critique style and content of English text. And, plain ASCII files can be distributed widely so that archiving becomes the distributed responsibility of a large number of libraries and readers. In the old print regime, publishers did little or nothing about archiving: they shipped printed volumes and left archiving to the libraries. The difficulty of our highly mathematical material requires more sophisticated publication formats than plain ASCII. Editing for good format and layout requires skills and a type of work associated with professionals rather than volunteers. These considerations convinced us that we needed a modest budget for editorial and production.

When we considered archival permanence, we thought at first about the longevity of material media, such as CDs, magnetic tapes, disks. It is very difficult to get reliable information on these points. We soon realized that the deterioration of materials was not the key problem, anyway. Online bits are always being copied between disks and other storage devices. Merely by having two or three repositories in different cities, we could be very secure against physical disasters. Our real problem was obsolescence of data formats. There is no way to guarantee in advance that a database will be converted into new formats when the old ones lose support. Security against format obsolescence hinges on the commitment of an institution to doing whatever is necessary, when it is necessary, without advance knowledge of the specific steps. So, we needed a reliable institutional sponsor. Over time, we thought of other uses for such a sponsor. For example, we needed someone to hold copyrights and/or licenses, to ensure our permanent rights to distribute articles.
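
The physical half of the archiving problem is the easy half, and it is easy to sketch. The Python below is a minimal illustration, not a description of how CJTCS or MIT Press checks its copies: the function names and the site names are hypothetical. It compares the copies of one article held at several repositories and reports any repository that disagrees with the majority.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of an article's bytes, in hexadecimal."""
    return hashlib.sha256(data).hexdigest()


def find_divergent(copies: dict[str, bytes]) -> list[str]:
    """Name the repositories whose copy differs from the majority copy."""
    digests = {site: digest(data) for site, data in copies.items()}
    counts: dict[str, int] = {}
    for d in digests.values():
        counts[d] = counts.get(d, 0) + 1
    majority = max(counts, key=counts.get)
    return sorted(site for site, d in digests.items() if d != majority)


# Hypothetical repositories holding the same article; one copy is damaged.
copies = {
    "site_a": b"definitive article source",
    "site_b": b"definitive article source",
    "site_c": b"definitive article sourc",
}
print(find_divergent(copies))
```

Nothing comparable can be written for format obsolescence, since the necessary conversion steps are not known in advance. That is why the commitment of an institution matters more than the choice of storage medium.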

At first, we sought funding from a government agency, or foundation, to subsidize a few years of journal operations while arranging permanent funding for university sponsorship of the journal. We still find the idea of direct university sponsorship very attractive for the long run. Imagine that all university libraries took their large periodicals budgets, and diverted them from subscription fees, to the direct subsidy of important journals. Each university could sponsor a modest set of journals, and give away the articles for free online. The total cost could be substantially less than the cost of giving articles to publishers and then buying them back. Access to scholarly information would improve. Those publishers flexible enough to adapt to the new regime could offer their editing and production skills on a contract basis. The risk of journal failures would be borne by universities instead of publishers. But, before we found willing sponsors for our favored model, MIT Press took the initiative in proposing to operate the journal. We found this offer irresistible, and decided to edit a journal for publication by MIT Press. I consider MIT Press' enthusiastic and open-minded attitude to be far more important than the fact that it is a publisher, rather than a university or library.

Choosing a Topic

We decided to focus on theoretical computer science, mainly because we had the expertise and contacts to form a good editorial board in that area. Also, TCS is a mature field with a number of good conferences and printed journals, but the extreme competition for conference positions and multiyear delays for journal publication suggest that there is room for another venue for articles. The presence of more conventional printed journals is important, so that we can send unsuitable authors elsewhere instead of compromising our methods in order to serve everybody.

Online publishing should lead us to question almost every aspect of the journal business, even though we resolve some of those questions by sticking to tradition. We wondered what should determine the boundaries of a journal. If it is the format and access method, then the scope of a journal should be as broad as possible. Springer-Verlag's Journal of Universal Computer Science, and the British Computer Society's Computer Journal try to cover all of CS. The New England Complex Systems Institute's InterJournal is even more ambitious, with no limitation on the scope of topics, although only three areas are represented so far. There is a good chance that, in the long run, there will be a small number of comprehensive collections of scholarly articles, divided according to irreconcilable differences regarding format, access methods, power politics, and ego. These collections will have relatively independent editorial boards for each of a large number of topics. There is no compelling case that each of these topics needs a creative name and a separate identity as a "Journal." We considered seriously founding a journal for all of CS, with an initial topic area in theoretical CS. We decided that we could not expand the set of topics fast enough to maintain credibility in the broader scope. But, perhaps CJTCS will merge someday into a much larger and broader journal.

Choosing a Publication Format

I have argued at some length elsewhere that the definitive archival format of articles published online should present the textual structure of articles as transparently as possible, leaving details of the display, such as typographical layout, to readers. Layouts that are attractive to large numbers of readers should be provided when convenient as a derivative service, but the definitive copy of an article should be a structural source format.

We decided very early, as a corollary of our commitment to flexibility for future readers, to publish textual structure rather than typography. But, what is an appropriate format for textual structure? SGML (Standard Generalized Markup Language), of which HTML (HyperText Markup Language) is a derivative, was designed mainly to represent textual structure. Although SGML has many flaws as a structural format, it is the best default choice today, since so many people are supporting it. The Astrophysical Journal, published by the University of Chicago Press, uses SGML as the definitive source format for articles.

The topic of CJTCS ruled out SGML as a definitive publication format. Research articles in theoretical CS require a lot of complex mathematical formulae, which cannot be read efficiently without a high quality of typographical formatting. Nobody has yet provided a general-purpose SGML or HTML viewer with an acceptable presentation of mathematical formulae. Most articles in theoretical CS are written with the LaTeX and AMSLaTeX typographical languages, based on the TeX system for typesetting text and mathematics. LaTeX and AMSLaTeX were designed to support authors through a series of drafts and revisions, with high quality printed copy as the final product. They were never intended as source formats for publication.

Fortunately, the best way to support an author through the revision of a series of drafts is to use a source format representing textual structure. Leslie Lamport, the creator of LaTeX, explains this point in the Why LaTeX? section of his book on LaTeX. In particular, he explains the deficiencies of pictorial formats, which are often praised as ``WYSIWYG'' (``What you see is what you get''), but which might be better described by ``What you see is all you've got.'' LaTeX and AMSLaTeX have two types of gross deficiencies as publication formats, in spite of their structural slant:

  1. Even sloppy structure is good enough for revising drafts, so the structural qualities of LaTeX and AMSLaTeX are not laid out at all clearly.
  2. Many values, such as the numbers of sections, theorems, and other textual units, are properly bound at publication time. Since LaTeX and AMSLaTeX support prepublication revision of drafts, these items are all unbound in the source, and computed during the automatic typographical layout, subject to stylistic decisions that may be varied in the execution of the layout program. For standardization of the text, these items all need to be bound in a definitive published version. (On the other hand, it is a good thing that pagination, line breaking, page breaking, page numbering, and similar layout values, are left unbound in the source, since these are properly left to the readers' control).
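
The second deficiency can be illustrated with a small sketch. The Python below is not CJstruct and does not reproduce its macros; the names Unit and bind_numbers, and the numbering scheme, are hypothetical. It takes a draft in which section and theorem numbers are still left to counters, and produces a published form in which each number is stored explicitly, while nothing about line or page breaks is recorded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Unit:
    """Hypothetical textual unit: a section or a theorem."""
    kind: str                     # "section" or "theorem"
    title: str
    number: Optional[str] = None  # None means "not yet bound"


def bind_numbers(draft: list[Unit]) -> list[Unit]:
    """Return a copy of the draft with every number fixed in the data.

    Assumed scheme for this sketch: sections are numbered 1, 2, ...;
    theorems are numbered within their section as 1.1, 1.2, ...
    """
    published = []
    section = 0
    theorem = 0
    for unit in draft:
        if unit.kind == "section":
            section += 1
            theorem = 0
            number = str(section)
        else:
            theorem += 1
            number = f"{section}.{theorem}"
        published.append(Unit(unit.kind, unit.title, number))
    return published


draft = [
    Unit("section", "Introduction"),
    Unit("theorem", "Main result"),
    Unit("section", "Proofs"),
    Unit("theorem", "Key lemma"),
    Unit("theorem", "Corollary"),
]

for unit in bind_numbers(draft):
    print(unit.number, unit.kind, unit.title)
```

Once the numbers are part of the published data, a reader who reformats the article in a different style still sees the same theorem numbers that other readers cite.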

CJTCS could not survive even the briefest startup without excellently readable presentations of articles, since no author would submit a paper to an illegible journal. In order to provide acceptable presentation instantly, with the best feasible structural information for future use, we decided to publish LaTeX source, but to edit it carefully to conform to strict standards of our own. To support these standards, I provided a free collection of macros, called CJstruct, and defined a disciplined subset of LaTeX using those macros to present structure as clearly and unambiguously as I could manage. CJstruct does very little to determine typographical layout. Rather, it translates a roughly SGML-like structure into the commands defined by existing LaTeX styles. The small amount of typographical content in CJstruct provides default layouts for a few structural elements, such as journal head matter, that are not supported by existing styles. We provide a standard preformatted version of each article, in a style agreeable to the majority of current readers. But, readers in the future have the power to vary all of the elements of typographical style in the article. The obvious variations involve individual favored fonts, alternate page sizes, and large type for readers with vision impairments. But, the most important uses of flexibility are the ones that nobody does yet, but will do decades from now.
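
The idea of translating structure into layout commands, with the style chosen by the reader, can be sketched in a few lines. The Python below is only an analogy: it is not CJstruct, the element names and both styles are invented for the example, and a real article has far richer structure. The article is stored as structural elements with no layout, and each style is a separate table that says how to present each element.

```python
from typing import Callable

# A style maps each structural element name to a presentation function.
Style = dict[str, Callable[[str], str]]

# Hypothetical structural source: (element, text) pairs with no layout.
article = [
    ("title", "An Example Article"),
    ("section", "Introduction"),
    ("paragraph", "Structure is recorded; layout is not."),
]

# One style emits ordinary LaTeX commands for an existing document class.
latex_style: Style = {
    "title": lambda t: "\\title{" + t + "}",
    "section": lambda t: "\\section*{" + t + "}",
    "paragraph": lambda t: t,
}

# Another style, chosen by a different reader, emits plain text.
plain_style: Style = {
    "title": lambda t: t.upper(),
    "section": lambda t: "== " + t + " ==",
    "paragraph": lambda t: t,
}


def render(source: list[tuple[str, str]], style: Style) -> str:
    """Present the same structural source under a reader-chosen style."""
    return "\n".join(style[element](text) for element, text in source)


print(render(article, latex_style))
print(render(article, plain_style))
```

The published source never changes; only the style table does. A style written decades later can present the same article without any help from the journal.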

The Astrophysical Journal solved a very similar problem in reconciling presentation of mathematics with good textual structure, in a substantially different way. AJ uses SGML as the definitive source format, and uses its own software to convert this source automatically to LaTeX and other formats. In a static sense, the AJ solution is better than CJTCS's. So why do we do differently? AJ is a huge journal, and substantial resources were spent on professional programmers to provide the conversion software, which works only on AJ's particular material. CJTCS's programming staff consists of my ``free'' time. AJ devotes substantial editorial resources to re-entering authors' manuscripts in the correct form. CJTCS's corresponding resource is currently me, in the process of transferring to one staffer at MIT Press, who has many other tasks in hand besides CJTCS editing. AJ's mathematical demands are strong, but not nearly as severe as CJTCS's. Finally, AJ is not trying to present any internal structure in mathematical formulae, only the typographical layout information. At some point in the future, I expect CJTCS to convert to a better structural format, which may be a derivative of SGML, but only when someone else is providing a general-purpose display method with high-quality presentation of formulae. AJ deserves commendation for trying the immediate SGML path. It allowed them to convert very efficiently from pure print to online presentation in HTML. Their conversion programs may develop into the basis for a general-purpose SGML formula formatter, or they may merely provide useful data to whoever creates that formatter.

The Journal of Functional and Logic Programming, also published by MIT Press, decided to use CJstruct, and the same standards for published LaTeX source. This sort of bandwagon effect is crucial to the long-term success of our methods, since most of the development work required for one journal ports to the other. The bandwagon will grow, perhaps by attracting more users of CJstruct in the near term, certainly by merging with other, independently derived methods in the long term.

Distribution Methods

When we first conceived CJTCS, anonymous FTP was the normal method for distributing information over the Internet, and we assumed that it would be the main method for the journal. Then, I discovered Gopher, and spent substantial effort organizing a Gopher interface, which was obsoleted by the World Wide Web about the time the journal started publishing. Now, World Wide Web access is the norm. We support anonymous FTP too, particularly for automatic mirroring operations. The good news is that the change from FTP to HTTP through Gopher made no difference whatsoever to the journal operation, and further changes in network protocols are likely to be equally transparent. Rather than committing to a specific network distribution protocol, we are committed to tracking whatever protocol everybody is using. Since both MIT Press and the University of Chicago need to run the latest protocols for other reasons, there is very little cost to the journal for this commitment.

As an auxiliary service, MIT Press will print hard copies of articles and mail them for a charge. This has no impact on operations, since it is run as a printing service, and does not affect the production of the definitive published articles.

Library subscribers often ask what they need to do to ``acquire'' CJTCS articles, now that they have subscribed. The acceptance, cataloging, and shelving of newly received paper materials is a substantial job for libraries in the print regime. What are the analogous activities in the online regime? There aren't any, and that is good news. Since the World Wide Web is so successful at delivering information directly from the producer to the reader, libraries do not need to mediate readers' access to journal articles. The libraries' role with an online journal is to help local readers find the journal. That requires an entry in the (preferably online) catalog, and a pointer to the journal's URL. No work whatsoever is called for by the library to receive individual articles, issues, volumes. Certain libraries will choose to mirror certain journals to improve network efficiency, but the article-by-article effort in such mirroring is entirely automated.

Support Through Subscriptions

We intend that CJTCS will support itself primarily through institutional subscriptions, which are priced at $125 per year. A grant from the Mellon Foundation is helping with the startup explorations. For the first three years of operations, we have provided articles to subscribers by FTP and HTTP (World Wide Web) with no technical access restrictions. There is some ongoing controversy over whether subscribers are offended by the possibility that nonsubscribers are looking at articles for free. Since academic libraries have always viewed themselves as stewards of the literature, more than as institutions gaining advantages over competing libraries by controlling access to strategic documents, I do not expect them to care about access restrictions. The lack of technical access restrictions, such as passwords, makes life simpler for subscribers, and it allows the sort of good-faith browsing by nonsubscribers that goes on in book stores. There is no method for controlling online access to information that does not make legitimate access less convenient in order to prevent illegitimate access. Since the journal and its subscribers are not harmed directly by unpaid access, I have opposed access controls. The freedom from controls makes volunteer mirroring particularly easy, and paves the way for our inclusion in broader document collections in the future.

We do not yet have enough subscriptions to support operations. I suspect that we are not producing enough articles to attract subscribers, and that we have diverted to production work a lot of effort that should go into explaining the journal to libraries. There are many other theories about the low initial subscriptions. We will never know the real reason, but I hope that we will eliminate it whatever it is.

Starting in 1998, CJTCS will restrict access by Internet address to subscribing institutions and individuals. I opposed this step, but considered it not important enough to argue about in the midst of more serious issues. Address restriction is a lot less annoying than passwords. It makes access completely transparent from the right Internet addresses, but creates a large problem for subscribers who normally come in from private Internet access providers instead of from their home institutions. I expect that the access restrictions will have a minor negative impact on subscriptions, rather than the positive impact that motivates this step. I also expect that, once we have solved the more serious problems of insufficient articles and subscribers, we will eventually reverse this step, because of competition from a large unified archive of freely accessible articles in computer science that should arise in the near future. We will see.
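
The trade-off in address restriction shows up even in a toy version. The Python below is a minimal sketch and makes no claim about how MIT Press implements the restriction: the function name is hypothetical, and the networks are reserved documentation ranges standing in for subscribing campuses.

```python
import ipaddress

# Hypothetical subscriber networks. These are documentation-only address
# ranges, used here as stand-ins for the networks of subscribing campuses.
SUBSCRIBER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_subscriber(address: str) -> bool:
    """Grant access only when the request comes from a subscriber network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in SUBSCRIBER_NETWORKS)


print(is_subscriber("192.0.2.17"))    # a reader on a subscribing campus
print(is_subscriber("203.0.113.5"))   # the same reader, via a private provider
```

The check knows only where a request comes from, not who is making it. That is what makes it invisible on campus, and it is also why a legitimate subscriber connecting from elsewhere is turned away.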

Independently of the question of technical restrictions on network access, CJTCS remains committed to providing the most liberal possible license to subscribers. The only restrictions are those required to ensure fair attribution to authors and the journal, and to prohibit use in a competing commercial product. In particular, subscribers have access to current articles and back issues; they may read articles online from MIT Press, the University of Chicago, or any mirror site; they may form their own mirrors or private archives and preserve copies of articles after their subscriptions expire; they may display and print copies in whatever formats they like; and they may use any information processing and retrieval software that they like. I believe that the unbundling of information from the methods for displaying and processing it is absolutely crucial for the effectiveness of the network in academia. In the long term, readers should be able to choose their favorite software for display and other processing, and use that software on all the types of information that they read, rather than dealing with specialized interfaces that are coupled tightly to particular sources of information.

Journal Operations

In its third year of operation, here's how CJTCS processes scholarly articles, right now.

  1. The author submits an article to an editor in LaTeX source form, sometimes with figures in PostScript or other pictorial formats. Authors' LaTeX source varies in typographical and structural quality, but never meets our publication standards, and is not expected to.
  2. The editor takes advice from referees (typically 1-3 of them) regarding the quality of the article. The identities of the referees are not revealed to authors.
  3. Usually the editor requires some revisions. Depending on the nature of the revisions, there may be a new round of refereeing. In principle, this may repeat indefinitely, but 1-2 rounds is typical.
  4. If the article meets the editor's standards, then it is accepted for publication, and the revised LaTeX source is sent to the managing editor (me).
  5. I make sure that the article is basically printable, and I may send it back to the authors for minor revisions, e.g. if they have typed in the bibliography as text, instead of providing BibTeX source.
  6. I transmit the article by FTP to MIT Press.
  7. MIT Press prints the article, and expresses a copy to an independent copy editor, working on contract.
  8. The copy editor expresses a marked up copy to MIT Press.
  9. MIT Press expresses the copy edited article to me.
  10. I enter the copy editor's changes, and convert the article to publication format.
  11. I send queries to the author by electronic mail. Most of these queries involve minor textual issues from the copy editor. But, since I understand the content of the articles, I sometimes find mathematical problems overlooked in the refereeing. I provide the author with printable copy for reference, but not with source, since I want to control all edits at this stage.
  12. The author responds to my queries, and makes other comments on the production editing. In principle, this step may iterate, but in practice it almost never does.
  13. I make (usually small) changes based on the author's responses to queries and other comments.
  14. I transmit the article by FTP to MIT Press.
  15. MIT Press prints the article, and expresses to the copy editor for proofreading.
  16. The copy editor expresses a marked up copy to MIT Press.
  17. MIT Press expresses the proofreading markup to me.
  18. I enter final changes from proofreading. Usually, I have no more correspondence with the authors. But, if something unusual comes up in proofreading, or if the authors express a desire to see the results before giving final approval, I send to them again.
  19. I transmit the final version by FTP to MIT Press.
  20. MIT Press reviews the final article, and approves it for publication.
  21. MIT Press and I, in parallel, enter the article on our servers.
We have published 12 articles (far too few) by this process. Some of the inefficiencies are quite glaring, but the whole process is unstable, and it doesn't always make sense to fix problems that are going to change anyway. The repeated FTP transmissions are not as bad as they look, since good user interfaces under Emacs make this transmission look almost the same as just storing a file locally. Much of my production work is being transferred to new technical staff at MIT Press, so the pattern of communication will change before we could improve the current form. In principle, all work should be done on a single definitive copy of an article, using RCS to control the revisions from a single site, and CVS if people at different sites work on the same article. Improving interaction with the copy editor is a bit harder. Printing and expressing of unmarked hard copy can be solved by transmitting PostScript. It would be easy to create LaTeX macros to display the copy editor's markup. But, it is difficult to provide a user interface for data entry that competes in speed with pen and paper markup.

Inconclusion

No, there is not a space missing in the section title. Experience with online publishing is thoroughly inconclusive, and it would be harmful rather than helpful to draw definitive conclusions. Instead, here are some personal predictions, some of which might be right.

What the online publishing enterprise needs now is not one person who knows everything right. Even if such a person existed, we'd surely ignore her. Rather, we need a lot of people, full of energy and good will, to keep a grip on the controls and dodge the obstacles as they appear. The sound of a successful online publishing enterprise is something like this: "Oops." "Aaagh!" "Hmmm." "Whew." Repeat until things settle down.