APS Reports

Published in the Bulletin of the American Physical Society Vol. 36, No. 4, p. 1119 (1991)

Report of the APS task force on electronic information systems

INTRODUCTION

The Task Force on Electronic Information Systems was formed in November 1988 by APS President, Professor Val Fitch. The charge to the Task Force was the following:

The American Physical Society recognizes that the new information technology offers the Society an unprecedented challenge and opportunity to further the mission of advancing and diffusing the knowledge of physics. The rapidly decreasing cost and increasing use of personal computers, optical and electronic data storage and delivery and the use of these new developments by other suppliers of scientific information make it absolutely necessary that the Society move to utilize these technologies with all deliberate speed. The Society therefore requests the Task Force to:

  1. Review the present use of electronic and optical storage, and the organization of the information for most efficient retrieval and delivery, and to project the prospects for the use of these systems in the next decade.
  2. Review the technology presently available and that which is likely to become available in the next decade for supplying physics information.
  3. Develop a strategy and time scale for the Society to utilize these new information technologies in order to distribute to the physics community most effectively, information now published in APS journals.
  4. Develop a plan for integrating the new technologies into the present APS editorial and journal-production systems. The Task Force should state its opinion on whether or not published journals will eventually be replaced by electronic systems or will both published journals and electronic information systems be necessary for the foreseeable future.
  5. Develop plans for developing desirable new modes of presenting physics information to both the physics community and the wider technological community.
  6. Suggest a plan for a workable method of charging costs to producers and users which will represent the value of the information delivered and will generate sufficient income to enable the total information system to be self-supporting.

The Task Force members are listed in Appendix A. The members include representatives of information-oriented industrial laboratories (AT&T Bell Laboratories, Xerox, IBM), universities (Princeton, Virginia), and some national laboratories with heavy electronic information requirements (Lawrence Berkeley Laboratory, Fermi National Accelerator Laboratory). The members themselves brought expertise in hardware, software, network management, data handling, information management, and on-line information usage. The Task Force was assisted by APS management (Miriam Forman), editors (Peter Adams, George Basbas, and Gene Wells) and publishing personnel (Peggy Judd and Peggy Sutherland). Their contributions to the work of the Task Force were critically important.

The Task Force held six meetings between December 1988 and March 1990. These included extensive interviews with personnel involved in all levels of production of APS and AIP journals; with representatives of a database firm (Dialog), of scholarly societies (American Mathematical Society, American Chemical Society, and Institute of Electrical and Electronics Engineers) involved in experiments on information dissemination on-line and with Compact Disc-Read-Only Memories (CD-ROMs), of the Association of Research Libraries, and of the Defense Technical Information Center; and with information managers (Xerox PARC and LBL). We also had extensive opportunities to see and discuss some advanced hardware and software at AT&T Bell-Holmdel and Xerox PARC. The meetings are summarized briefly in Appendix B. In addition, members of the Task Force met with the APS Council and with the APS Publications Committee. During the meetings, the Task Force learned a great deal about the efforts of other societies and of industry to utilize electronic technologies to manage and disseminate scientific information. These discussions and presentations have been most helpful to the Task Force in developing this report.

The thinking of the Task Force has evolved significantly during the preparation of this report. The initial focus was on the delivery of the APS literature in a way that would give the physicist more convenient access to the information in the journals. Compact optical or magnetic data-storage systems would permit the physicist to keep large collections of physics literature in a personal computer or workstation. A physicist would be able to search not only the titles, authors, and abstracts, but also the full text of articles using powerful search algorithms. This alone could have a significant impact on his or her work.

The Task Force has concluded, however, that this is not sufficient. Just as a library cannot serve the needs of the physicist by providing only The Physical Review, an electronic information system must provide access to articles in more than one or a few journals. The Society, in planning its own information system, must take into account the efforts of other societies and publishers and must, where possible, adhere to standards that will make its journals available as part of the more-general scientific literature. We believe that the Society should adopt as a goal a National Physics Database. This, we hope, will be the first step toward integrating all the world's scientific literature into an electronic information system.

It is not possible to present a detailed plan to accomplish this ambitious goal because we cannot predict in detail what technologies will evolve to address issues that this project would raise. The Society must plan to evolve gradually from the present, always guided by a long-term vision of a worldwide electronic information system for physics (or all science), while it monitors progress in a number of crucial areas. In addition, the Society, together with the AIP, will have strong reasons to take a leadership role in moving toward this vision.

In developing our recommendations for short-term activities, we distinguish between two classes of project: those that help the Society accomplish its mission in the short/medium term, and those that move the Society toward what we see as the long-term goal, to be part of a full "physics information system." It is clear that not all short-term measures (e.g., delivering journals on CD-ROM or articles by fax) are steps toward the long-term goal.

In this report we begin by reviewing the present situation within the physics community. In Sec. II we describe the physicist's current electronic information environment, the developing role of electronic communications, the information currently available on-line in physics and closely related sciences, and some on-going experiments in the use of CD-ROMs. In Sec. III we review the current status of hardware and software systems and describe the network infrastructure that is evolving within the United States to link the scientific community. In Sec. IV we present our vision for where the physics community will be in 20 or 30 years, "Vision 2020." We recognize that the vision will not be realized soon or in exactly the form we describe. It is useful, however, to provide a focus for our short-term recommendations. In Sec. V we discuss some of the many issues and challenges which are raised by this vision. Finally, we turn to a plan for the Society to provide its own information system and to work towards the goal we describe. In Sec. VI we outline such a plan, present a set of short-term and medium-term recommendations, and discuss the financial implications of these recommendations. Our conclusions are summarized in Sec. VII.

We realize that the details of this program will evolve. We are convinced, however, that an important aspect of this evolution will be the continuing education of the physics community regarding the power of electronic information systems and the potential impact that they can have on the conduct of scientific research. It is our hope that this report will contribute to that education.

II. PRESENT STATUS OF ELECTRONIC INFORMATION SERVICES AND ORIENTATION OF USER COMMUNITY

In order to chart a sensible course into the electronic age, it is useful to assess the present situation in the physics community and the broader scientific community as it relates to electronic submissions, preprint exchanges, forums, on-line databases, databases on portable media, and other electronic information resources. There is, in fact, an entire sociology of electronic information developing within these communities. It is still fairly heterogeneous, with different fields of physics following somewhat different patterns and the different sciences showing even more variation, but there are some common trends that have important implications for the future. We shall highlight a few of these.

We shall also try, in this section, to survey the present electronic landscape as seen by the working physicist. This survey, combined with the assessment of technological issues presented in Sec. III, leads us to the vision described in Sec. IV, and the challenges discussed in Sec. V. The electronic information systems that we mention range from the experimental to the highly developed, cover a wide range of subject areas, use several different technologies, and exhibit a variety of management structures. Together they represent a rich and growing set of examples whose careful study can suggest many ways for the APS to proceed, and pitfalls for the APS to avoid. In Appendix C we give more-detailed information on some of these systems.

A. The Individual's Immediate Environment

Probably the most ubiquitous feature of the current electronic environment in physics is the use of word processors to produce and edit papers. Even physicists who have otherwise shunned the use of computers in their research have learned how to log on and use the word processor on their local VAX or desktop personal computer. This fact alone has enormous implications. Not only is the text of physics papers available ab initio in digital form, but also many of the physicists who write papers are plugged into a computing environment that includes network connections, electronic mail service, on-line databases, and other computer-based services, in addition to word processors.

This proliferation of informational equipment in the individual's immediate environment has led to a profound change in the sociology of physics, a change that has taken place rather rapidly and was driven primarily by new technology. It is interesting (and perhaps humbling) to speculate on whether a task force report such as this, if written in the mid-1970s (before the advent of inexpensive VLSI microcomputers), would have accurately foreseen the present situation. Such reflections certainly invite the prediction that the technological revolution anticipated for networks (see Sec. III C) will drive an equally profound sociological change, one that will lead to the world envisioned in Sec. IV.

Although electronic word processing certainly increases the power to create printed documents, it also opens the possibility of exchanging the full text of documents electronically and of submitting this text to journals as computer code (thus obviating the journals' need to typeset their articles). Yet the development of word processing has been as chaotic as it has been rapid. The variety of ways in which word processing is performed is so large (a host of stand-alone word processors and an even larger variety of word-processing programs run on PCs and mainframes) that it creates serious problems. For example, recipients of electronic text can usually print it out only if it is in one or (occasionally) a few formatting languages, so that most publishers must turn a deaf electronic ear to most authors.

The effect of this babel of formatting languages on publishing can be dramatic. The history of our own Physical Review is illustrative. In 1979, Physical Review began allowing authors to submit, in addition to hard copy for refereeing, an electronic file or "compuscript" (first on magnetic tape, later on a floppy disk or UNIX-to-UNIX over telephone lines) for direct typesetting of galleys. Until 1987, this compuscript had to be in the troff formatting language, a restriction that undoubtedly explains why the number of these compuscripts, after rising in the first few years, leveled off at roughly 50 per year, less than 0.5% of the total number of papers.

In 1987, Physical Review began to accept compuscripts for typesetting in a second formatting language, TEX (plain TEX and variants like LATEX). The arrival rate of these author-prepared compuscripts immediately began to grow, a growth that was enhanced when the APS developed and began to promote its own TEX variant called REVTEX and when, in 1989, the APS began to accept BITNET submissions in any of the TEX variants. The differences between LATEX or REVTEX and plain TEX have now proved crucial: although compuscripts in the former usually require only light editing to produce Physical Review galleys, those in plain TEX require such time-consuming modifications that it has proved simpler to print them out and rekey them in troff as if they had never been in electronic form. And even for REVTEX and LATEX compuscripts, one-third cannot be used to produce galleys, for one reason or another (often the nonavailability of authors for the consultation that is still necessary).

The number of electronic compuscripts submitted either in the initial phase for refereeing or in the later phase for typesetting is now rising rapidly and has already eclipsed the rate for papers with troff compuscripts. For the near future, one expects this number to increase to 20-30% of all papers, but not higher. The majority of papers are likely to remain nonelectronic because it appears that only one-quarter of papers submitted to the Physical Review are originally prepared with one of the TEX family of formatting languages. [In a 1987 APS survey of authors of 497 Physical Review papers, 49% of whom responded, Peggy Sutherland of the APS found that only 44% of these authors (or their staffs) ever used a TEX language and that only half the papers by that 44% (i.e., 22% of the responding authors) were originally prepared using a TEX language. Similarly, in a survey of 2800 authors of papers submitted to European physics journals, reported by van Herwijnen and Sens (see Ref. 1), 39% responded, but the papers of only 22% of these had been prepared with a TEX language.]

Furthermore, although plain TEX and its offshoots LATEX and REVTEX are closely related and the relative proportion of Physical Review submissions in the latter two is now increasing, their differences are so important that many proponents of plain TEX may be unwilling to switch. A contrasting example is that of the heavily edited magazines of the IEEE's Computer Society. In these magazines, where the text is largely free of equations, submissions in any formatting language are accepted, and the electronic submission rate is now roughly 95%, resulting in an estimated saving of 35% in printing costs. At the other extreme, the journal Complex Systems requires that all submissions be electronic and in LATEX. We do not know what success this policy has had, but if further study shows it to have succeeded, the APS should find out why.

B. Electronic Mail

Another feature of everyday life for physicists has been the rapid increase in use, both in frequency and scope, of electronic mail services, "e-mail." This has been made possible by the expansion of the networks such as BITNET, DECnet, and ARPANET (see Sec. III C). And for physicists not having direct access to such networks, the AIP now provides BITNET access through its PINET facility, which, in turn, is accessed with a modem and phone-line connection to the commercial TELENET network. This e-mail capacity has been used for a number of purposes:

Short informal communications.
E-mail has at least two advantages over telephone communication: it leaves documentation, and the resulting computer files are machine-searchable.
Nonlocal collaborations.
E-mail facilitates a rapid exchange of data and of manuscripts in preparation. The latter requires a common formatting language and even some similarity of computer environments that is often absent. For example, this Task Force, thoroughly immersed in the possibilities of e-mail and electronic text processing, was constantly confronted with files created at one site that couldn't be correctly formatted at another or that would even produce different characters at different sites or different characters on screen and on paper at the same site. The exchange of full text has only become convenient with the recent advent of high-baud-rate networks. (At 9600 bits/second, the transmission of a 20-page paper with 65-character lines and 24-line pages takes less than 30 seconds.) It is clear that, when far higher data-transfer rates become widely available (see Sec. III C), the electronic exchange of manuscripts and even the mass e-mailing of preprints will increase greatly. This development may significantly alter the nature of physics collaborations and of the role of preprints, already the principal mode for rapid "publication" in very active fields.
Submissions for publication.
The electronic submissions to Physical Review discussed above, originally possible only with magnetic tape or on floppy disks, are now possible and encouraged via e-mail. For some authors this is only a mild convenience and a saving of a few days' time. For others, in more remote parts of the world from which mail service may be a matter of several weeks, the time saving can be very important.
Referee reports.
Physical Review now encourages referee reports via BITNET. This speeds the refereeing process and satisfies many referees who are symbiotic with computers, but is possible only because such reports are relatively free of equations, tables and figures.
Conferences.
For some conferences involving hundreds of people, the fraction of attendees having BITNET addresses has been sufficiently large that everything including initial announcements, submission of abstracts, coordination of arrival times, etc., has been handled by BITNET, with conventional paper mailings serving primarily as a backup. (This is especially true for computer-oriented meetings such as the International Conference on Computing in High Energy Physics, but even major physics conferences now use e-mail heavily. In addition, most conferences have found it necessary to provide facilities so that attendees can send and receive e-mail from the meeting.)

An interesting by-product of this increase in electronic communication is the blurring of distinctions between private communication (the "informal literature") and published articles (the "formal literature"). Whereas the level of formality has usually been defined to some extent by the medium (e.g., word-of-mouth and hand-written notes are informal communications, mailed-out preprints are more formal, and bound typeset journals are the most formal), the coalescing of the various media into the one electronic medium removes that source of distinction and forces the physics community to make its distinctions in other ways.
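
The transmission-time estimate quoted above under nonlocal collaborations can be checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes 8 bits per character, a line rate of 9600 bits per second, and no protocol overhead, none of which the report states explicitly. The function name and parameters are hypothetical.

```python
def transmission_seconds(pages, lines_per_page, chars_per_line,
                         bits_per_char=8, line_rate_bps=9600):
    """Time to send a plain-text manuscript over a serial link.

    Assumes every line is full and ignores start/stop bits,
    error correction, and network overhead.
    """
    total_bits = pages * lines_per_page * chars_per_line * bits_per_char
    return total_bits / line_rate_bps


# The report's example: a 20-page paper, 24 lines per page, 65 characters per line.
paper_seconds = transmission_seconds(pages=20, lines_per_page=24, chars_per_line=65)
print(f"{paper_seconds:.0f} s")  # 26 s
```

Under these assumptions the paper amounts to 31,200 characters and takes 26 seconds, consistent with the "less than 30 seconds" quoted in the text. Adding start and stop bits (10 bits per character) would push the estimate slightly above 30 seconds.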

C. Bulletin Boards and Forums

The least formal line of communication, most pervasive in the lay community, is the computer bulletin board or forum. Such facilities, in which users can write as well as read, have been adopted by the physics community in at least two known cases and probably many more: high-temperature superconductivity and cold fusion. Such bulletin boards are usually available through some dial-up service, or may be maintained on-line for users of some institutional network. The Superconductivity Information Service of the Department of Energy (SIS/DOE) provides a high-Tc bulletin board, for example. We have made no systematic study of such bulletin boards, either of their extent or usefulness. It is clear from a casual reading that the contributions, which are totally unrefereed, may often be of doubtful value.

D. On-line Databases and Information Systems

Among all the aspects of electronic information systems we have considered, the use of on-line electronic databases has the most far-reaching potential for altering the way physicists conduct their research. With the rapid advance of technology, a plethora of electronic databases has emerged. The thresholds have been reached for the data-storage, retrieval, and transmission capacities needed to make on-line acquisition of full text (without figures) an option, although such databases are still rare. A database with full text and quality figures is some way off. Here we mention a few of the bibliographic and textual databases and information systems, to give an indication of the range of possibilities. We give the details about these and several other databases, systems, and hosts in Appendix C.

The databases available in physics, chemistry, and related fields differ from one another in several respects:

  • The range of primary sources (journals, books, etc.) they cover. Most of the bibliographic databases cover published literature in certain subject areas; some include things like patents (CA SEARCH, the Chemical Abstracts Search); some are devoted to preprints (the Preprint Database of SIS/DOE and HEP/SPIRES/SLAC, the High Energy Physics Database at the Stanford Linear Accelerator Center).
  • The information they give about each entry. Most bibliographic databases give abstracts, which have been encoded during publication. SIS encodes the abstracts of preprints, whereas HEP/SPIRES omits the abstracts of its preprints. For INSPEC, new abstracts of articles in journals of the AIP and its member societies must be supplied because the IEE (the Institution of Electrical Engineers) refused to pay for the original author abstracts; these abstracts do appear in the competitive database PHYS, for which the AIP is a producer.
  • The producers - the organizations that provide the information that forms the database. Until now the producers of bibliographical databases have been the offshoots of scholarly organizations that publish indexes on paper (IEE, Chemical Abstracts Service of the ACS) or on tape (AIP). In a database with full text (Chemical Journals Online), the producers are the publishers of the original journals (ACS, Royal Society of Chemistry, John Wiley & Sons).
  • The hosts - the organizations that store, market, and provide access to the information in a database. The same database may be available from different hosts, with different modes of access, different search programs, different down-loading features, and different prices. In several cases, the producer and host are the same organization (the DOE for its databases and systems; SLAC for HEP/SPIRES) or almost the same organization (STN International, a consortium of which the American Chemical Society is one of the three members, for databases produced largely by the ACS or its offshoot, the Chemical Abstracts Service). In other cases the producer and host are entirely different.
  • The charges. The charges to readers, which may vary from host to host, are presumably set by the host; the compensation to the producer (and thence, to the original sources) is presumably negotiated and may vary in surprising ways. We have not investigated this thicket in detail.

The Society must review these issues before it begins to move toward creating the recommended Physics Database.

E. Searching Databases

In addition to the issue of data-handling capacity, an equally serious concern regarding the usefulness of a large, unified, full-text database is the need for an efficient searching strategy, one that returns most of the relevant articles and weeds out most of the irrelevant. The issues here are complex and in many respects similar to those relating to artificial intelligence.

In preparing this report we had an opportunity to sample a number of databases, both bibliographic and full text. In the case of purely bibliographic databases like HEP/SPIRES, the intelligent use of boolean operations could sometimes result in a reasonably efficient search strategy. This, of course, depends on how specific one's original intentions are. In a purely bibliographic database (i.e., title, author, and publication information), it is often impossible to do a useful subject search based on the text, since the subject information contained in the title is extremely limited. (In fact, it is our experience that such databases are primarily used to find a specific article whose author or title is already known.) If keywords or subject codes are provided (by authors, editors, or indexers), things can improve greatly, but such descriptions tend to become obsolete as the terminology and popular subjects to which the paper is relevant change.
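
The Boolean searching described above can be illustrated with a minimal sketch. It builds a word index over a handful of invented title-and-author records and combines terms with set operations. The records, the `author:` prefix, and the function names are hypothetical and do not reproduce the query syntax of HEP/SPIRES or of any other host.

```python
from collections import defaultdict

# Hypothetical bibliographic records: title and authors only, no abstracts.
RECORDS = {
    1: {"authors": ["Smith"], "title": "Flux pinning in high-Tc superconductors"},
    2: {"authors": ["Jones", "Smith"], "title": "Lattice gauge theory on parallel computers"},
    3: {"authors": ["Lee"], "title": "Critical currents in superconductors"},
    4: {"authors": ["Jones"], "title": "Monte Carlo methods for gauge theory"},
}


def build_index(records):
    """Map each title word and each author name to the set of record ids."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        for word in rec["title"].lower().split():
            index[word].add(rec_id)
        for author in rec["authors"]:
            index["author:" + author.lower()].add(rec_id)
    return index


INDEX = build_index(RECORDS)


def term(t):
    """Record ids matching a single term (empty set if the term is unknown)."""
    return INDEX.get(t.lower(), set())


# Boolean operators expressed as set operations.
hits_and = term("superconductors") & term("author:Smith")  # AND
hits_or = term("superconductors") | term("gauge")          # OR
hits_not = term("gauge") - term("author:Smith")            # AND NOT

print(hits_and, hits_or, hits_not)
print(term("vortices"))  # empty: the word appears in no title
```

The last query shows the limitation noted in the text. A reader interested in vortices would probably want the flux-pinning paper, but a title-only index cannot find it because the word never appears in a title.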

The storage of abstracts and text obviously provides, in principle, a large amount of subject information, but it requires much more sophisticated software to make use of that information efficiently. Indeed, one would like to query the database as follows: "Give me all the articles on this particular subject that I would find interesting and useful in my research." In practice, this requires a delicate balance of context-sensitive filters and probably a much-more-interactive searching program, e.g., one that asks questions of the user as well as vice versa, and thereby encourages the user to refine his or her search along lines that have proven efficient in the past.

For the most part, software for such interactive searching does not yet exist. Thus, a large full-text physics database, if such existed, would not be fully exploited without major new software developments. The search problem is being studied widely. An example, using ten years of full-text chemistry literature, is in the Chemistry Online Retrieval Experiment (CORE), a joint project of the ACS, Bellcore, the Online Computer Library Center, and Cornell University. The purpose of this experiment is to find ways of integrating figures with text and to study search methods and the man-machine interface, with Cornell chemists acting as the test subjects. A more detailed description is given in Appendix E.

F. Databases on CD-ROMs and Other Portable Media

The recent development of CD-ROMs, and especially the drop in their production costs associated with the large production capacity developed for the music industry, has led to a sudden proliferation of databases on this medium. (The setup costs for manufacturing a single CD-ROM are about $2,000; the cost for additional discs is about $2/disc.)

A CD-ROM holds roughly 500 Mbytes (more than a year of Physical Reviews, minus figures and without index), so it will hold all kinds of databases heretofore widely available only in print (e.g., encyclopedias, bibliographic reference books like Books-in-Print, etc.). To read a CD-ROM, one needs only a CD-ROM reader costing $500-$1000. To search the database, one needs search software and an index for the CD-ROM, both of which usually come on the CD-ROM itself.
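
The cost and capacity figures above lend themselves to a rough calculation. The sketch below treats the $2,000 as a fixed setup cost added to $2 for every disc pressed, takes 500 Mbytes to mean 500 million bytes, and assumes one byte per character on a plain-text page of 24 lines of 65 characters. These are illustrative assumptions, and the function names are hypothetical.

```python
def cost_per_copy(copies, setup_dollars=2000.0, per_disc_dollars=2.0):
    """Average manufacturing cost of one disc in a pressing run."""
    return (setup_dollars + per_disc_dollars * copies) / copies


def text_pages_per_disc(capacity_bytes=500_000_000,
                        lines_per_page=24, chars_per_line=65):
    """Whole plain-text pages that fit on one disc at one byte per character."""
    return capacity_bytes // (lines_per_page * chars_per_line)


for run in (100, 1000, 10000):
    print(run, "copies:", round(cost_per_copy(run), 2), "dollars each")

print(text_pages_per_disc(), "plain-text pages per disc")
```

For runs of a thousand or more the setup cost is nearly negligible, which is why the per-disc price dominates. A typeset journal page carries far more text than a 24-line manuscript page, so the page count here is an upper bound on plain-text pages and says nothing about figures or index space.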

In recent years, a host of databases of special interest to scientists have become available on CD-ROMs. We list some of these in Appendix D. One characteristic of these is that none are truly massive (requiring several CD-ROMs) and all are essentially static, i.e., requiring little updating or updatable quite simply with issuance of a replacement CD-ROM.

It is tempting to want to solve the shelf-space problem of libraries, and introduce some machine-searching capability at the same time, by putting scientific journals on CD-ROMs, but several problems come immediately to mind. Here are some, a few with partial answers.

  • Is the CD-ROM machine-readable or bit-mapped? Only machine-readable text can be searched, but at present there is a babel of formatting languages behind the printed literature. There are also ways to combine machine-readable text and bit-mapped illustrations (e.g., using PostScript as mentioned in Sec. III B), but there is no standard way. The IEE/IEEE/UMI experiment (see below) is trying a hybrid solution: bit-mapping the journal pages, but having machine-readable bibliographic information and abstracts for searching.
  • If the current literature is to be made available on CD-ROMs, the entire year's English-language physics literature might occupy only 15 CD-ROMs, so that they would come out only once every three or four weeks, a long delay. If less than the whole world's literature were available, the discs would appear even less often. A "solution" found by The American Mathematical Society (see below) is to issue discs more often, with each disc covering a wider time span, but with substantial overlap from one disc to the next. The discs are on lease and must be returned.
  • If it were possible to have the entire literature in machine-readable form on CD-ROMs, the reading of this collection could be severely stressed by even a small number of would-be readers (imagine a library in which only one person at a time could browse all the physics journals that have arrived in the last three or four weeks!). The obvious solution is multiple copies and several CD-ROM readers. The time-sharing of single discs is not possible because of access speeds and non-random access, although the information on a CD-ROM can be stored or cached on servers with magnetic disks.
  • The collection, if not accessible through a local-area network, would still have the remoteness quality of an on-paper library, not a full step into the electronic age. Such local-area networks are clearly coming, but the more users the greater the previous problem becomes.
  • Although each CD-ROM would carry its own index, the searching of the entire library, especially context-sensitive searching, would be impossible, given the slow rate of switching from one CD-ROM to another.
  • If the CD-ROM is a replacement for on-paper journals, what would that do to the production cost per copy of the paper version, as the demand drops? And if it is not a replacement, then what is it? Imaginative pricing (low price if added to a paper subscription, high price if not) is being tried by the AMS, which finds CD-ROMs more competition for its Math/Sci Online than for Mathematical Reviews.
  • CD-ROMs would have fewer of these problems if they were used only for archiving, year by year, with the current issues continuing on paper, but then they would be an intelligent replacement for microfilm.
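
The delivery-delay problem raised in the list above follows directly from the disc count. As a rough sketch, assuming discs are issued only when full and that literature arrives at a steady rate over a 52-week year, the interval between discs is simply the year divided by the number of discs. The function name and the one-quarter coverage figure are hypothetical.

```python
def weeks_between_discs(discs_per_year, weeks_per_year=52):
    """Interval between releases if each disc is issued only when full."""
    return weeks_per_year / discs_per_year


# The report's estimate: about 15 discs for a year of English-language physics.
full_literature = weeks_between_discs(15)

# Hypothetical publisher covering one quarter of that literature.
quarter_coverage = weeks_between_discs(15 * 0.25)

print(f"full literature: {full_literature:.1f} weeks between discs")
print(f"one quarter:     {quarter_coverage:.1f} weeks between discs")
```

Fifteen discs a year gives an interval of about three and a half weeks, matching the "three or four weeks" in the text, while a publisher with a quarter of the literature would fill a disc only every three months or so. Issuing overlapping discs more often, as the AMS does, trades disc capacity for timeliness.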

We have alluded to two "experiments" on using CD-ROMs now in progress that may give some useful answers. We describe them briefly here, provide more detail in Appendix E, and leave a detailed study of these experiments to a specialist in electronic information systems. (In Sec. VI we recommend that the APS hire a specialist to follow on-going experiments and develop the Society's plans in more detail.)

IEEE/IEE Publications Ondisc
The first experiment involves all the journals, standards, and conference proceedings published (about one-third of those sponsored) by the IEE and IEEE, which are being bit-mapped onto CD-ROMs called "image discs." The citations, including abstracts, for all these articles as found in the IEE's INSPEC database (see Appendix C) are being put on CD-ROMs called index discs, which will serve as a machine-searchable index to the articles. This product will come to about 24 image discs and 12 index discs per year. University Microfilms will produce the CD-ROMs, install the appropriate hardware, supply the necessary software, and market the product to libraries. The discs will be leased, not sold, so that libraries, to ensure long-term archives, must continue to purchase the on-paper version. A detailed report on this experiment is given in Appendix E. [During the writing of this report we became aware (see Ref. 2) of the similar Adonis project, in which a consortium of European publishers (Blackwells, Elsevier, Pergamon, and Springer) plans to distribute to subscribers a CD-ROM per week carrying the bit-mapped images of more than 400 scientific journals at something like 5000 typical pages per disc, the result of a two-year trial. Details like indexing, searching, pricing, etc., are not available to us at this writing, but this experiment also bears close watching.]

We are tempted to make a few observations.

  • The publications involved contain only 26% of the published electrotechnological literature, which limits the search window to much less than the full literature, a serious drawback.
  • The bit-mapping option may be viewed as temporary, a way to start up with the literature one has rather than to wait until one has it all in one formatting language (or a small number of languages) and one knows what to do with the figures.
  • The searching of bibliographical information and abstracts is claimed to be nearly as efficient as searching full text, given the current search algorithms, but is it? And would it be if the search algorithm were smarter?
The consensus of the Task Force is that this experiment is premature, except possibly as a replacement for microfilm and as an intelligent copier for students, but we shall see.
Math/Sci Disc
The American Mathematical Society is now making its Math/Sci Online database available on lease to would-be subscribers. This database is far smaller than the IEEE one (about one CD-ROM every two years), so the frequency-of-delivery and indexing problems are entirely different. Also, just as full text is available on-line with machine-formattable TEX code, it is also available on Math/Sci Disc. The necessary software to give formatted output is supplied with the service. It is unclear to the AMS what they will do when they have filled their first disc (not yet, but soon).

G. Conclusion

We are now at a time of true revolution in the communication of scientific information. Much of the hardware for the individual's environment is available; the rest is on the way, as we shall see. The practice of using this hardware is becoming more widely spread every day. Wide-area networks of the necessary capacity and speed are further away, but much is on the drawing board, as we shall see in Sec. III C. Major problems include an agreement on the form and standards for this communication and methods for searching the masses of information becoming available, methods that are more efficient (and more sophisticated) than anything now available.

III. IMPORTANT TECHNOLOGIES

A. Hardware in Use Today

The following discussion is not meant to be exhaustive or highly detailed as to all the systems and their specifications. The purpose is to introduce the reader to a broad range of what is generally being used and generally available and to emphasize the main features and limitations as they pertain to electronic publishing of journals that are currently being printed.

For storing and retrieving documents today, there is a variety of magnetic and optical systems and media. Optical systems and media, which usually have the benefits of permanence and ruggedness over magnetic systems and media, are either write-once read-many (WORM) disks and tapes, erasable disks (readable and writeable), or read-only disks such as mass produced CD-ROM discs and analog video discs. Magnetic media are either floppy disks, fixed platter disks (Winchester-type drives), or tapes, all of which are inherently erasable/reusable.

Disadvantages of optical systems and media, in general, are their lack of standards, cost of the actual media, slowness in writing and verifying, and slow access speed to random data (due to movement of the optical read/write mechanism). Disadvantages of magnetic systems and media include vulnerability of the media/system to failure (disk crashes, accidental erasure, media damage) and their cost per byte of storage. Therefore, the choice of either optical or magnetic systems and media is strongly dependent on the actual application (number of users, cost and amount of distribution, speed of information retrieval/storage, integrity of data, etc.). It should be mentioned that the lifetime of most current optical media (WORM, read-only, or erasable) is finite, on the order of years. This is due to the ingassing, or diffusion into the medium, of atmospheric gases or contaminants that attack the reflective or transmissive layer used in the optical medium.

Magnetic systems and media that are currently used include 3.5-in., 5.25-in., 8-in., and 14-in. Winchester-type disk drives with capacity ranging from 10 megabytes (Mb) to over 1.2 gigabytes (Gb); 3.5-in., 5.25-in., and 8-in. floppy disk drives with capacity ranging from 256 kilobytes (Kb) to 10 Mb; 0.5-in. tape in various configurations (reels and cartridges) with capacity ranging from 10 Mb to over 300 Mb; 0.25-in. tape in cartridge form with capacity from 60 to 300 Mb; and the new 8-mm and 4-mm helical-scan tape systems, which store 2.3 and 1.2 Gb, respectively.

Optical systems and media that are currently used include 5.25-in., 8-in., 12-in., and 14-in. WORM discs with capacity ranging from 200 Mb to about 3.2 Gb per side; erasable discs, mainly 5.25 in., with capacity ranging from 200 to 600 Mb; and mass-produced (i.e., stamped) read-only discs such as 5 1/4-in. compact-disc-read-only-memory (CD-ROM) discs and analog video discs in 8-in. and 12-in. sizes, with capacity ranging from 600 Mb to over 1 Gb. A new and exciting optical storage technology is based on so-called digital paper (see Ref. 3) from ICI, which is a WORM medium being used in tape form and floppy form. The tape form of digital paper can store up to one terabyte (Tb) on a 12-in. tape reel and the floppy form up to 1 Gb on a 5.25-in. floppy disk. In addition, optical discs are being configured into so-called "jukebox" systems that offer disc selection through mechanical means similar to musical jukeboxes.

With the data stored on some medium (either magnetic or optical), remote access or transmission is typically required. This is usually accomplished through the use of local-area networks (LANs), which offer up to 10 Mbit/sec access rates, wide-area networks (WANs), which offer up to 1.5 Mbit/sec access rates, and dial-up modem connections, which offer up to 9.6 Kbit/sec access rates. LANs are typically configured on Ethernet with various protocols such as DECnet and Internet. One of the most popular and widely used collections of networks is based on the Internet protocol and consists of a variety of loosely coupled networks such as the ARPANET, MILNET, and NSFNET. This collection of networks is actually a WAN because the networks span a great physical distance and are limited by lower-speed data rates (typically 56 Kbit/sec or 1.5 Mbit/sec) in their interconnections. Bandwidth or access speed to users on these networks is highly dependent on the number of users and/or traffic on the network and is rarely the full bandwidth of the network. Therefore, as the use of these networks increases, the effective access speed or transmission rate to any individual user has to decrease. On the other hand, modem connectivity guarantees that the user will always have a fixed rate of access or transmission speed. Because the analog telephone network is the most widespread and widely used network in the country, modem connectivity offers a universal mechanism for remote computer access and data transmission. The upgrade of the current analog telephone network to the digital version called Integrated Services Digital Network (ISDN) will occur over the next ten years and allow such access or transmission at speeds from 64 Kbit/sec up to 1.5 Mbit/sec.

Most of the commercial databases that are currently available (such as DIALOG, STN, EASYNET, BRS) are usually accessed via direct modem connection (through dial-up lines) or through equivalent modem ports on one of the commercial private connecting networks (such as TELENET or TYMNET). Modem access (even at ISDN rates of 64 Kbit/sec) greatly limits the amount and type of information that can be accessed or interactively manipulated. Thus, for universal or generic access to databases, on-line journals, references, or data, consideration must be given to the access method available to most of the users of the information.
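
To give a feeling for these limits, the following minimal sketch estimates how long one page takes to transmit at the access rates quoted above. The page sizes (about 5 Mbit for a page bit-mapped at 300 dpi, about 5 Kb for its ASCII text) are the estimates given in Sec. III B; the function and variable names are illustrative assumptions, and protocol overhead, compression, and network contention are ignored.

```python
# Idealized transmission time for one page at the access rates quoted in the text.
# Page sizes are the rough estimates from Sec. III B; all names are illustrative.

BITMAP_PAGE_BITS = 5_000_000      # one page bit-mapped at 300 dpi, black and white
TEXT_PAGE_BITS = 5_000 * 8        # about 5 Kb of ASCII text

RATES_BIT_PER_SEC = {
    "dial-up modem (9.6 Kbit/sec)": 9_600,
    "ISDN, low end (64 Kbit/sec)": 64_000,
    "WAN (1.5 Mbit/sec)": 1_500_000,
    "LAN (10 Mbit/sec)": 10_000_000,
}


def transfer_seconds(size_bits, rate_bit_per_sec):
    """Time to send size_bits at the full nominal rate, with no overhead."""
    return size_bits / rate_bit_per_sec


for name, rate in RATES_BIT_PER_SEC.items():
    text_s = transfer_seconds(TEXT_PAGE_BITS, rate)
    image_s = transfer_seconds(BITMAP_PAGE_BITS, rate)
    print(f"{name:30s} text page {text_s:8.3f} s   bit-mapped page {image_s:8.1f} s")
```

Under these assumptions a bit-mapped page needs nearly nine minutes over a 9.6 Kbit/sec modem but about half a second over a 10 Mbit/sec LAN, while the text of the same page passes through the modem in a few seconds.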

A fast-growing method of transmitting and receiving printed material is facsimile or "fax." This method scans a printed sheet of paper and produces a black-and-white digital bilevel image. However, the image produced, although good for textual material (though inefficient for just text storage), cannot be multilevel gray or color images. Resolutions of fax images vary from 200 to 400 dpi using well-established international standards for the analog or digital transmission (with compression) of the data.

In the area of electronic publishing, there are currently several projects that are using CD-ROM to deliver information. There is the IEEE trial project discussed in Sec. II and Appendix E, which is storing fax images of all the pages of the IEEE journals on CD-ROM with limited indexing capability. Special workstations or terminals are needed to retrieve and display the journal pages for interactive use. Another project using CD-ROM is the "Computer Library," which is a collection of full pages (text-only) of the previous 12 months of about 20 computer magazines plus abstracts of articles in about 100 other magazines. The information in this collection is fully reverse indexed (i.e., the CD-ROM contains a fully inverted index to all the information on the disc). As we have seen, the American Mathematical Society has been producing the Math/Sci Disc, which is a collection of their database records for articles in two of their publications. This Math/Sci Disc contains the complete journal articles, but no figures or images. Appendix D contains a representative list of some current CD-ROM-based databases and information sources.
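
For readers unfamiliar with the term, a fully inverted index maps every word to the set of documents that contain it, so that a search never has to scan the text itself. The sketch below is a minimal illustration of the idea only; the sample records, identifiers, and function names are hypothetical and are not taken from any of the products described above.

```python
from collections import defaultdict

# Hypothetical sample records; a real disc would hold many thousands of pages.
DOCUMENTS = {
    "a1": "Optical storage on CD-ROM discs",
    "a2": "Magnetic storage and tape systems",
    "a3": "Optical fiber networks",
}


def build_inverted_index(documents):
    """Map each lower-cased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for raw in text.lower().split():
            word = raw.strip(".,;:()\"'")
            if word:
                index[word].add(doc_id)
    return index


def search(index, *terms):
    """Return the ids of documents that contain every one of the given terms."""
    if not terms:
        return set()
    hits = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*hits)


INDEX = build_inverted_index(DOCUMENTS)
print(sorted(search(INDEX, "optical", "storage")))
```

The price of this speed is storage: the index is built once, when the disc is mastered, and occupies space on the disc alongside the text it indexes.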

B. Software

As indicated in Sec. II, the use of word-processing equipment and software has increased significantly in the physics community, and most documents submitted for publication are produced electronically with a word processor, personal computer, workstation, or mainframe.

Until recently, however, there was little if any need for standardization among the various word-processing systems. The only output from such a word-processing system was the printed page, with all the printed pages plus figures submitted by mail. The journals or conferences would enter all the text manually or through optical-character-recognition (OCR) equipment.

Today there is a full range of word-processing equipment and software ranging from simple text manipulation to typesetting programs capable of equations, tables, and graphs to full-page-layout systems that manipulate the entire page, including figures and images. The next generation of products will support multimedia documents that will have audio, video, animation, graphics, figures, and text all cohesively linked and self-contained.

Recently, however, the demand for electronic submission of documents and the widespread use of computer networking have resulted in the need for standardization of the layout or formatting of the submitted documents. The history of the APS's attempt to set standards within the physics community, starting with troff, moving to plain TEX, and then to LATEX and REVTEX, has been recounted in subsection II A. Both TEX and troff offer the ability to control the textual placement and style of documents and, with additional procedures called macros, can format or lay out or organize the document logically (paragraphs, headings, pages, references, chapters, titles, etc.). The APS has prepared its own standard library of macros that maintains the current appearance and style of the APS journals and has named it REVTEX. To encourage electronic submission of APS manuscripts in REVTEX, the APS is distributing to potential authors the REVTEX library of macros. Other publishers have constructed their own sets of TEX macros that are specific to their journals or needs. One example is the American Mathematical Society (AMS) collection of macros called AMSTEX, which is oriented toward equations, tables, and text (no figures or images) because most AMS papers can be presented in that form. Even though TEX has become a de facto standard for text preparation and style, there are already specific implementations of TEX (such as REVTEX and AMSTEX) that are journal or application specific. This leads to an interchange-of-documents problem. In addition, there is a difficulty because neither TEX nor troff includes standards for figures or images, and in fact, figures are usually still added to the text by editors in the preparation of the final document.

A number of efforts are underway to develop a higher level of standardization for document layout and text preparation to facilitate document interchange and logical consistency. The most active is the Standard Generalized Markup Language (SGML), which is an ISO standard and has been accepted/required by the Department of Defense (DoD) and other Federal agencies. SGML is widely used in Europe, especially at CERN. In fact CERN has proposed SGML as a standard for all journal submissions in Europe. An alternative ISO standard is the Office Document Architecture (ODA) with a related standard format called Office Document Interchange Format (ODIF).

Although SGML is not widely used here in the United States, its use has been mandated by the DoD and other Federal agencies for all their current and future documentation, to bring consistency and standardization to their large volume of documentation.

The problem of how to include figures and images (color, gray scale, black and white) within text-preparation systems is significant. One popular method (due to its use in facsimile) is the so-called bit-mapped scan of the image or figure. These bit-mapped scans of the images and figures are limited to black-and-white renditions of the original images and figures and are included in documents exactly as scanned. Extensions to this method allow gray scale and/or color at the expense of greater storage requirements. As an example of storage requirements for a bit-map-scanned document consider the following:

An 8 1/2 x 11 in. page has a usable area of about 6 x 9 in., or about 50-60 square inches. Most current scanners have a resolution of 200-400 dots per inch (dpi). If the document is scanned at 300 dpi (to be consistent with current laser-printer resolution), the usable area would require about 5 Mbit of storage, or about 500 Kb. The ASCII code storage for the text alone on a typical page is about 5 Kb. A single-page figure or image would thus occupy 100 times the storage of a single text page. This is only for a black-and-white image. For gray scale, multiple bits are needed, which increases the storage by a factor of 8. For color, the storage is increased by a factor of 24. Even with compression, the storage occupied by the figures is a substantial part of the total document storage. (Appendix F contains a summary of useful numbers related to storage needs.)
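
The arithmetic behind these estimates can be reproduced with a few lines of code. The sketch below assumes a usable area of exactly 6 x 9 in. scanned at 300 dpi, with 1, 8, or 24 bits per dot for black-and-white, gray-scale, and color images; the function and variable names are illustrative assumptions, and no compression is applied.

```python
# Uncompressed storage for one bit-mapped page, following the estimate in the text.
# Assumed inputs: 6 x 9 in. usable area, 300 dpi, 5 Kb of ASCII text per page.

TEXT_PAGE_BYTES = 5_000


def bitmap_bits(width_in, height_in, dpi, bits_per_dot=1):
    """Number of bits needed to store a scanned area without compression."""
    return int(width_in * dpi) * int(height_in * dpi) * bits_per_dot


BILEVEL_BITS = bitmap_bits(6, 9, 300)                  # black and white
GRAY_BITS = bitmap_bits(6, 9, 300, bits_per_dot=8)     # 8-bit gray scale
COLOR_BITS = bitmap_bits(6, 9, 300, bits_per_dot=24)   # 24-bit color

for label, bits in [("black and white", BILEVEL_BITS),
                    ("gray scale", GRAY_BITS),
                    ("color", COLOR_BITS)]:
    size_bytes = bits // 8
    ratio = size_bytes / TEXT_PAGE_BYTES
    print(f"{label:16s} {bits / 1e6:7.2f} Mbit  {size_bytes / 1e3:8.1f} Kb  "
          f"{ratio:7.0f} x text page")
```

Without rounding, the black-and-white page comes to 4.86 Mbit, or roughly 600 Kb and about 120 times the text of the same page, which agrees with the rounded figures in the text to within their stated precision.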

Another popular method for generating or including figures or images is through the use of the page-description language (PDL) PostScript. This language allows for the inclusion of bit-map images or the generation of figures and graphics through its language description. Though the PostScript language is mainly intended to drive printers or display devices (terminals and workstations) through a concise language description of the document with its text, graphics, figures, and images (which the printer or display-device hardware translates into a form the local hardware can use), it is also being used for document storage and retrieval because of its popularity and proliferation.

C. Networking

In Sec. II we indicated that networking and especially e-mail are increasingly important to the physics community. The use of networks today falls into three major areas: BITNET, DECnet, and the Internet. In the future two new initiatives will have a significant impact on computing and networking for physicists. These are the conversion to the International Standards Organization's Open Systems Interconnection standards (ISO/OSI) and the development of a National Research and Education Network (NREN) as part of the Federal High Performance Computing Program.

In this subsection we shall describe briefly the three major networks used by the physics community, and shall then discuss the ISO/OSI standards and NREN.

1. BITNET

BITNET began as a spontaneous effort to connect research communities. The acronym stands for the "Because-It's-Time" NETwork. Today BITNET serves more than 1500 computers in the United States and more than 2500 world-wide through connections to other networks such as the European Academic Research Network (EARN) and NetNorth in Canada. BITNET provides access to countries from the Far East to the Middle East. Today, BITNET might better be defined as the "Because-It's-There" NETwork.

Despite its popularity, however, the routing of BITNET connections and the choice of protocols do not lend themselves to developing a robust, high-performance network. As the network grew, additional universities attached themselves by leasing a line to a nearby node, creating a "daisy chain" of universities that does not reflect the traffic of the network. In addition, the network uses a "store-and-forward" mechanism so that all messages are moved sequentially through all nodes in the network along the path between the two end points for the message. This means that a single point of failure can hold up a message for days, sometimes weeks, at a time. Nevertheless, BITNET continues to be very popular for transmission of e-mail.
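
The effect of a single failed node on a store-and-forward chain can be illustrated with a small model. The sketch below is a deliberately simplified illustration, not a description of the actual BITNET protocols; the node names and the function are hypothetical, and a message is assumed to have exactly one path between its end points.

```python
# Simplified store-and-forward delivery along a daisy chain of nodes.
# Node names are hypothetical; each message has only one possible path.

CHAIN = ["univ_a", "univ_b", "univ_c", "univ_d"]


def deliver(path, down_nodes):
    """Pass a message hop by hop along path.

    Returns (delivered, holder): holder is the destination on success, or the
    last node that accepted the message, where it waits until the next node
    comes back into service.
    """
    holder = path[0]
    for next_node in path[1:]:
        if next_node in down_nodes:
            return False, holder      # message is stored here and waits
        holder = next_node            # message is forwarded one hop
    return True, holder


print(deliver(CHAIN, down_nodes=set()))         # all nodes in service
print(deliver(CHAIN, down_nodes={"univ_c"}))    # one intermediate node is down
```

With every node in service the message reaches the last node in the chain; with one intermediate node out of service it stops at the preceding node and stays there, however long the outage lasts, because the chain offers no alternative route.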

The network provides only limited capability in other areas. It does not, in general, support file transfer, except in response to e-mail messages to an "information server." This approach has been used by the SPIRES group at SLAC to provide information from the particle-physics database at SLAC.

2. DECnet

DECnet is a proprietary protocol developed by Digital Equipment Corporation (DEC) for use on its PDP11 and VAX computer systems. More recently, it has been implemented on a number of other computer systems as well, but its widespread popularity is a result of the extensive use of VAX/VMS computers in physics departments and other science departments at universities and laboratories throughout the world.

The DECnet system provides a full range of networking functions including e-mail, file transfer, file sharing, and distributed computing. The major use of DECnet is, however, for e-mail and file transfer. The BITNET mail has been implemented using DECnet as a carrier for providing BITNET on VAX/VMS systems.

DECnet for physics in the United States evolved in a similar way to BITNET. In the high-energy-physics community the links were established to further specific programs at SLAC and LBL, and later at Fermilab, Argonne, and Brookhaven. University groups associated with programs at those laboratories leased a line to that lab. A backbone network architecture evolved through the volunteer efforts of the network managers at the various labs. Similar evolution occurred in the Space Sciences program, giving rise to the Space Physics Analysis Network (SPAN). More recently, other groups of scientists have utilized the backbone and have added lines to their university. The physics DECnet network now reaches more than 20,000 computers worldwide and is the largest DECnet network outside Digital Equipment Corporation.

The continuing support of DECnet has been given a high priority by the DOE and is part of the Energy Sciences Network (ESNET). Following the backbone topology, DOE has installed high-speed lines (T1 or 1.5 Mbit/sec) throughout the United States. These lines carry DECnet and, in addition, Internet traffic (TCP/IP) discussed below. This network will continue to satisfy the needs of the physics community, but DEC is committed to moving the DECnet protocols into compliance with the ISO/OSI standards. Within the next 12-18 months the DECnet network can be expected to be a part of the networking mainstream.

3. Internet

In the 1960s the Defense Advanced Research Projects Agency (DARPA) began a program to develop and demonstrate computer networking based on packet switching. This effort resulted in the ARPANET, the first packet-switched network. Later, DARPA supported the development of a set of procedures and rules for addressing and routing messages across independent networks. The DoD has adopted these "Internet Protocols" as standards for all its packet-switched data communications.

The National Science Foundation (NSF) has used the same internet technology to build a national network, NSFNET. This network consists of a high-speed backbone (1.5 Mbit/sec) to connect many regional or mid-level networks as well as the NSF supercomputer centers. The regional networks are managed locally and serve large parts of states (e.g., Bay Area Regional Research Network, BARRNET, in Northern California), states (e.g., New York State Educational Research Network, NYSERNET), or multistate regions (e.g., Southern Universities Research Association Network, SURANET).

The DOE and National Aeronautics and Space Administration (NASA) have developed and continue to evolve networks that support their programmatic needs and that adhere to the Internet standards. These are the Energy Science Network (ESNET) and the NASA Science Internet (NSI). The interconnected set of networks that uses the Internet Protocols is referred to as the Internet. The interagency networking efforts will be further enhanced with the advent of the National Research and Education Network (NREN) discussed below.

4. ISO/OSI

The International Standards Organization (ISO) has, for a number of years, been working on a new level of protocols for Open Systems Interconnection (OSI). These protocols are based on a seven-tiered model of networking from the application layer (user interaction with network resources) to the physical layer (describing the pattern of bits on the interconnection medium). The definition of the standard is essentially complete, and vendors are developing products that meet it.

The move to the OSI protocol standard will be taking place over the next few years and will provide enhanced connectivity and interoperability between a variety of computer systems from various vendors. In particular, the distinction between DECnet and the Internet Protocols (TCP/IP) will disappear.

The use of ISO/OSI protocols will become more important for the physics community as a result of the introduction of new products and in response to pressure from the United States Government, which has adopted a policy of supporting the migration to ISO/OSI through its GOSIP (Government Open Systems Interconnection Profile) directive. This migration will take several years to complete.

5. The National Research and Education Network

In response to a series of studies by the Federal Coordinating Council on Science, Engineering and Technology (FCCSET) in 1986-87, the Office of Science and Technology Policy (OSTP) developed a Strategy for Research and Development in High Performance Computing. This program proposed four areas of effort: high-performance computer systems, advanced software and algorithms, networking, and basic research and education. This strategy formed the basis for a bill submitted by Senator Gore and, more recently, has resulted in a program plan for the Federal High Performance Computing Program developed by the OSTP.

The National Research and Education Network is seen as a three-stage program. The three stages will proceed in parallel.

  • Stage 1 upgrades the trunk lines in existing backbone networks of the participating agencies to T1 speeds (1.5 Mbit/sec). The agency networks will remain distinct and individually funded, but will be interconnected to permit interagency communication. Most of Stage 1 will be complete in mid-1990.
  • Stage 2 coalesces the physically distinct backbone networks into a single backbone with shared T3 trunks (45 Mbit/sec). The agency networks will remain logically separate to ensure that the security, integrity, and resource requirements are met. The backbone will support high-speed connections to hundreds of institutions via links to mid-level or regional networks. Stage 2 will begin in early 1990 and is expected to be complete in 1993.
  • Stage 3 is the research, development, and implementation phase that will result in a shared network with multi-gigabit-per-second trunks. The objectives for this phase exceed the reach of current technology. The new technologies that will be developed are expected to drive products and applications well into the next century. Pilot projects to test new technologies will begin in 1991. Gigabit networks are expected to be in widespread use by 1996.

The participating agencies are the NSF, DARPA, DOE, NASA, and the Department of Health and Human Services. The NSF is the lead agency for implementing and operating Stage 1 and Stage 2 networks. DARPA is the lead agency for the advanced research required in Stage 3.

The planned network will provide access to computing and information resources throughout the country. It will have a major impact on the conduct of scientific research and will significantly influence education in the United States. It is this network, coupled with modern workstations, that will permit us to realize the vision for physics information described in the next section.

IV. VISION FOR THE YEAR 2020

A. Introduction

Once physicists communicated their results to one another by letter, and the small number of physicists could keep in touch with the whole world of physics in that way-but that time is long gone.

More recently, physicists could go to occasional, small meetings or conferences, meet directly with many of the principal workers in a broad field, and have discussions of the most informal and speculative kind. Or, they could scan the tables of contents of most of the physics journals, be aware of essentially all that was published that was of interest to them, and read most of what was especially interesting. That day is now gone, too.

Today, meetings devoted to a narrow field may be as frequent as one a week (in high-temperature superconductivity, the worst example) and attract as many as several thousand participants. In addition, the size of a year's physics literature is doubling nearly each decade; almost no one has time to read it; scanning the tables of contents is an arcane art practiced successfully by only a few; and the canonical question occurring in almost every discussion among physicists is "who has time to read?" It is simply no longer possible either to be aware of new developments or to find out about much old knowledge using conventional methods.

The explosion in the printed literature has further ramifications. Although its volume is doubling nearly every ten years, the cost to libraries is increasing much more rapidly. This is because, with the proliferation of specialized journals by commercial publishers, the average cost per printed character is skyrocketing (see Refs. 4 and 5). Little wonder that the community of librarians is becoming restless, if not downright rebellious, from the pressure this is putting not only on their shelves but also on their budgets.

Thus a revolution in how new scientific developments are communicated, both informally and formally, and how old scientific knowledge is stored, searched, and recalled is inevitable (see Ref. 6). Furthermore, the coming of storage media with unimaginable capacity, communication networks of equally unimaginable transmission rates, and personal computers and workstations with the power of yesterday's mainframes, will facilitate this revolution. The scientific community will switch largely, if not totally, to electronic media and networks in the next few decades.

In this section we give a vision of where we might be in the year 2020. The description is necessarily speculative and vague, but we don't think it is unduly optimistic; the seeds for most of what we describe are already planted. If anything, this asymptotic state may be reached by the year 2000 or 2010. By 2020 we may have moved on in unpredictable ways.

Electronic information technology will affect the physics community in many ways, but the most revolutionary and dramatic will be on the physics "literature" and the way physicists send and receive information. We focus first on this area and at the end of this section indicate a few other directions in which we may be taken.

In this section we describe a worldwide electronic information system devoted to physics (possibly to all science). This description covers several technical aspects of this system:

  • The hardware and software environments
  • The nature of the new "literature"
  • Producing the literature
  • Disseminating the literature-the Physics Information System
  • Using the literature
  • The "old" literature-of the preelectronic age
  • Novel forms of the literature - preprints, comments/discussions

The actual path by which the physics community reaches this asymptotic state will depend on the rate of technological development, the way the community of physicists and the institutions serving that community respond to this development, and the lessons learned from each step along the way.

We defer to Sec. V a discussion of nontechnical issues like the financial and administrative aspects. These, more than the technical aspects, will confront the APS itself with a host of challenges and opportunities whose resolution could have a profound effect on the asymptotic state.

B. Environment- Hardware/Software

Although there will be enormous variation among the facilities available to physicists in different kinds of institutions (public, nonprofit, and commercial) and in different parts of the world, we believe the overwhelming majority of physicists will have terminals or workstations connected to local or remote servers. These workstations will have color-graphic resolution comparable with today's best on-paper printing and computing power that exceeds today's largest mainframes.

The workstations will be linked either to local-area servers or directly to remote servers (possibly via mainframes). These workstations and the local and remote servers will all provide computing and information resources, and all these resources, accessible in seconds wherever they reside, will appear the same to the user.

The remote servers will be linked to form a single worldwide network having a hierarchical structure in analogy with the highway system with its superhighways, state and regional highways, local roads, and driveways. In fact a three-tiered structure for these networks is already beginning to emerge, as we have seen in subsection III C. The network of 2020 will employ fiber-optic cables in which a single fiber will be able to transmit at 50 Tbit/sec, allowing a central or regional server with a high degree of parallelism to transmit simultaneously over hundreds of channels at 100 Gbit/sec for each.
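
The figures in this paragraph can be checked with simple arithmetic. The sketch below, whose names are illustrative assumptions, computes how many 100 Gbit/sec channels fit within one 50 Tbit/sec fiber and how long a single channel would need for one bit-mapped page of about 5 Mbit (the estimate from Sec. III B), ignoring all overhead.

```python
# Capacity check for the projected fiber and channel rates quoted in the text.
# The 5 Mbit page size is the Sec. III B estimate; all names are illustrative.

FIBER_BIT_PER_SEC = 50 * 10**12      # 50 Tbit/sec per fiber
CHANNEL_BIT_PER_SEC = 100 * 10**9    # 100 Gbit/sec per channel
BITMAP_PAGE_BITS = 5 * 10**6         # one bit-mapped page

channels_per_fiber = FIBER_BIT_PER_SEC // CHANNEL_BIT_PER_SEC
page_seconds = BITMAP_PAGE_BITS / CHANNEL_BIT_PER_SEC

print(f"channels per fiber: {channels_per_fiber}")
print(f"one page on one channel: {page_seconds * 1e6:.0f} microseconds")
```

One fiber at the projected rate would thus carry 500 such channels, consistent with the "hundreds of channels" mentioned above.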

Storage of information will be possible at less than 10^-4 times the cost of storing on paper and at such large densities that all the English-language physics literature ever produced could be stored on a few 12-in. reels of digital paper (at 1 Tb per 12-in. reel).

Software standards will be in place for dealing with a wide variety of tasks-text, equations, graphics, bit-mapped illustrations, hypertext scripting, multimedia output, computational algorithms, etc.-in a wide variety of documents, so that individuals working in the inevitable multitude of computer languages will be able to communicate easily with one another, whatever the nature of the document.

C. Scientific Information in 2020

In this subsection we shall describe the way scientific information in 2020 is likely to appear and to be produced, disseminated, and used.

1. The Forms of the Literature

Today we think of most physics information as what can be displayed on the printed page, a black-and-white page at that. The information comes in papers that are grouped into journals or proceedings, in books, in collections of numerical data that are usually represented by tables or graphs, and in some cases in programs that can be installed in one's computer to do certain tasks.

In the year 2020, all these concepts (if not the words) will have much more general meaning:

  • A "paper" or "book," to be called simply a "document," will be a block of computer code with any of the following features:
    • It will drive graphic terminals to produce high-quality printed text and graphic output that may be three-dimensional, colored, and dynamic, and that can be viewed in arbitrary orientations, with arbitrary magnification, and at variable speeds-all selected by the user.
    • When connected to additional output devices, it will produce elaborate multimedia displays.
    • It will put users in an interactive mode, allowing them to construct complex, multibranched paths through the document according to their needs-so-called hypertext in a "scripted" document.
    • It will allow users to obtain numerical data from a document in forms that are immediately usable in their own computer environments.
    • It will provide users with programs to evaluate expressions and solve equations occurring in the document.
    • It will allow users to bring to the screen, at the press of a key, other documents (or particular sections of such documents) to which the document under scrutiny refers.
    • It will allow users to make comments on the document.
  • Paper editions of journals, books, etc. will still exist, but on-paper versions will be limited to presenting only restricted portions of some documents (e.g., no multimedia, no hypertext, etc.) just as on-line services today can present only restricted portions of the papers appearing in print (e.g., no figures or displayed equations can be seen in the on-line version of the publications of the ACS). As the percentage of such documents increases and electronic networks, workstations, etc., become more widely available, dissemination by paper will diminish. We don't imagine paper disappearing completely, but 2020 is a long time away.
  • Numerical data and computer programs will also exist as independent entities, in forms that are transportable and therefore immediately usable.

2. Producing the Literature

For the preparation and submission of documents, the author will use a WYSIWYG (What You See Is What You Get) editor to create text files and to integrate graphics and tables. Other material (data, equations, and their solving algorithms, etc.), material on other media, and the "script" for reading a hypertextual document will also be prepared by the author. For all these parts of a document, the author will have available a variety of languages, all of which conform to certain internationally accepted standards so that editors, referees, and readers can all "read" these documents, perhaps with readily available translators to languages resident on their own workstations or servers. With the additional power that such documents offer over traditional papers, and with the freedom to submit in a variety of computer languages, the circulation of drafts among collaborators and the submission of documents to journals will almost all be done electronically. However, experience suggests that the versatility described will, in fact, be necessary before the number of nonelectronic submissions becomes as small as the number of electronic submissions is today.

One can even imagine the day when machine translation of natural languages would be so advanced that papers could be submitted (and even read) in a natural language of the author's (reader's) choice. The software developments that will make this possible will occur independently of any efforts of the APS, however, and so are not our concern.

Journal staffs, proceedings editors, book publishers, etc., will continue to exist and perform work similar to what they do today. Their work will be considerably expedited, however, by electronic submissions (if authors can use familiar languages that conform rigorously to international standards). Communications with referees, the subsequent back-and-forth with the authors, and the proofreading by authors of changes made by editors will all be electronic, hence accelerated and simplified.

The publication process will be completed by the journal or book publisher who will produce blocks of code for each document (and additional code for printing on-paper journals, when appropriate) together with keywords and/or other indexing features where appropriate. These blocks of code could conceivably be supplied directly to institutional libraries and even to individual users, possibly document-by-document and on-line. More likely, the code will be supplied to a world Physics Information System, which we describe below.

Although journal staffs or their equivalent will still be required, the importance (and distinctive character) of journals per se, as opposed to individual documents, will depend on a number of policy questions: how universal the searching is, whether users (local libraries, individuals) will still have to subscribe to entire journals to access full documents in those journals, how users are charged, how producers (publishers) are paid, etc. The answers given to these policy questions will have a profound effect on the sociology and economics of journal production and on the use of scientific information generally. We discuss possible answers in Sec. V.

3. Disseminating the Literature-The Physics Information System

Today most of the physics literature is disseminated to the ultimate consumers, the physicists, on paper via the intermediary of institutional libraries (although, as we have seen, some of the chemistry and mathematics literature is available on-line and some of the mathematics and electrical engineering literature is being disseminated via portable media such as CD-ROMs).

Although the dissemination on paper via institutional libraries could continue, we think the dominant mode will be via a single electronic physics library, or Physics Database, which will be the heart of a worldwide Physics Information System. This mammoth electronic database will contain all published books, papers, conference abstracts and proceedings, numerical data, computer programs, etc. This contrasts with current electronic databases (e.g., those provided by STN, DIALOG, BRS), in which the contents of one database may not be in another (e.g., most of the text of articles published by the ACS is in an STN database, but only bibliographical information is in any BRS or DIALOG database) and in which, at least in physics, chemistry, and mathematics, the full document is never available.

The Physics Database, continually updated, will be available to users on-line from a central server or regional servers. Equally available will be software for searching, outputting, usage monitoring, and charging. Although we speak of a single Physics Database, we do not exclude the possibility that there may be more than one, just as there are now at least two on-line databases of physics abstracts and even more systems to access them.

The entire Physics Database will be searchable at one time. This contrasts with the present nonelectronic libraries, which are highly fragmented and are searchable only within small domains of the literature (individual books, individual journals, only bibliographical information, only citations, etc.) and/or time (a few years, just one year, just a portion of a year).

The proprietor of this Physics Information System could have two responsibilities: the physical operation (hardware, software, networking) and the administration (making and implementing the policy decisions discussed in Sec. V). These could both be borne by the same organization, or the physical operation could be contracted out. The proprietor could be a government agency (e.g., the NSF, a new National Library of Science paralleling the National Library of Medicine, the Office of Scientific and Technical Information of the DOE), an existing consortium of not-for-profit institutions (e.g., STN International, which, we recall, is a joint venture of the Chemical Abstracts Service of the ACS, FIZ Karlsruhe, and the Japan Information Center of Science and Technology), a new consortium of scholarly societies (e.g., APS, AIP, European Physical Society, Physical Society of Japan), a commercial database proprietor (e.g., BRS, DIALOG), a consortium of commercial physics publishers, or some combination of these. It is clear that the policies and practices of the proprietors could have a profound impact on the physics community and on physics itself.

The Physics Database could ultimately be connected with databases in other sciences to form a single Science Database to which users of the Physics Information System, of a possible Chemistry Information System, etc., would all have access. The proprietors of the Physics Information System, in preparation for such an eventuality, should take steps along the way to ensure compatibility of the different systems and could take the lead in forming the Science Information System.

The Physics Database will likely be available via portable media too. The database together with periodic updates (to the data, which would be easy, and to the index, which might be harder) could be supplied to users on CD-ROMs or some kind of tapes, under lease or sale, presumably (but not necessarily) by the same proprietor that operates the Physics Information System. We expect that CD-ROMs and tapes will be used in different ways.

  • On CD-ROMs, the data would come with the use and charging software needed for direct use in workstations that serve individuals (in private offices or in open rooms) or that act as servers for local-area networks (as part of a group or institutional electronic library), in the spirit of the current IEEE-IEE-UMI experiment. In this mode, the resulting electronic database would resemble a present institutional library, although it would be far more compact (solving the horrendous storage problem), more easily searchable, and, possibly, remotely accessible. Considering that the literature might amount to 50-100 CD-ROMs per year, this medium/system might not have the access speed to serve many users simultaneously or to call up quickly a chain of citations or the ability to view constantly updated files (e.g., citation indexes, preprints, bulletin boards, and forums), all advantages offered by the on-line Physics Information System.
  • On tapes (magnetic, digital paper), the data and index could be used for uploading onto a mainframe computer that would act as a server for an entire institution (a use for which the AMS is now beginning to supply tapes of the database previously available as Math/Sci Disc). The tapes could even be used to supply mainframes that act as regional servers and are managed either by the same database proprietor or possibly by a group of users in the region.

Individual publishers could also continue to deliver their code to traditional printing companies whose machines would, in all probability, be operated directly from this code.

4. Using the Literature

Electronic information systems, despite their great power and convenience, may be used much less than one would expect unless certain threshold barriers are made very low. Of course, such barriers, even if not lowered, will appear lower to future users than they do now, but the speed of adaptation of the physics community to the electronic information age will depend sensitively on these barrier heights, something the APS must keep constantly in mind. By 2020, one can expect the use of the Physics Database to have the following features:

  • Accessing-To lower the threshold for using the system, access to the Physics Database will be possible in at least three modes, all with a standardized procedure, short access times, and minimal costs:
    • At the highest level, through the Physics Information System, the Physics Database will be searchable, readable, and downloadable from a central server. The user will also be able to interact with the documents in the database through this server and possibly write in the database.
    • Large portions of this database may also be available on-line from a local server, having been transferred to that server either electronically or via a portable medium.
    • Smaller segments, possibly customized for each particular user, will be available to individuals for use at their powerful workstations.
  • Searching-The enormousness of the Physics Database will require search programs that are easy to use and very efficient, finding a high percentage of what is relevant and very little that is not. By 2020 there should be extremely good search algorithms, since there is now much work going on in this area. Among the variety of algorithms that are possible, some will involve indexing documents by bibliographic information, descriptors, keywords or symbols, and citations; this indexing will be done partially by computer and partially with human assistance from editors, authors, other experts in the field, and/or professional indexers. Other algorithms will involve indexing of all substantive words (or their word stems) and symbols, indexing that can be done solely by computer. With some algorithms, the user will search using the usual boolean operations in combination with related operations like "with," "adjacent," etc.; with others, the search will be more interactive, using the computer as an expert system in sophisticated ways. Although the search algorithms supplied by the database proprietor may be designed for the particular database, or may be chosen to be the same as for other databases managed by that proprietor, users served by a local server may use algorithms that are well known in their own user community, an important barrier-lowering strategy.
  • Downloading-In present central databases, downloading of a significant portion of a document is usually done on hard copy and transmitted by mail. In the Physics Database of 2020, downloading from a remote server will almost certainly be over the network, with the user interacting with the document(s) via a local terminal or workstation. When the nature of the document permits, it might also be printed on paper; however, with time, documents will become less and less printable, so that downloading will increasingly be into computer files rather than onto paper.
  • Awareness services-Users will be able to order services to keep them aware of new entries into the database, services that automatically provide each user with titles, authors, perhaps abstracts, and possibly full documents-all matching profiles created by the user. These, and downloaded documents, could come with their own indexing information, so that each user could develop his own, private, fully indexed, electronic library.
  • Librarians-Despite the attractiveness of being able to search in a powerful and interactive way, many scientists will prefer not to be the actual searchers. This has been the frequent experience of information managers (and librarians, where they are available to do the searching). This will define an important role for the electronic-age librarian.
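The computer-only indexing of word stems mentioned under Searching, together with boolean and "adjacent" operations, can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any existing system: the crude stemmer, the function names, and the three sample titles are all hypothetical.

```python
from collections import defaultdict


def stem(word):
    """Very crude stemmer (an assumption): lowercase, strip punctuation and a suffix."""
    word = word.lower().strip(".,;:()\"'")
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(documents):
    """Map each word stem to {document id: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for position, token in enumerate(text.split()):
            s = stem(token)
            if s:
                index[s][doc_id].append(position)
    return index


def docs_with(index, word):
    """Set of documents containing the stem of `word`."""
    return set(index.get(stem(word), {}))


def adjacent(index, first, second):
    """Documents in which `second` immediately follows `first`."""
    a = index.get(stem(first), {})
    b = index.get(stem(second), {})
    return {
        doc_id
        for doc_id in a.keys() & b.keys()
        if any(p + 1 in b[doc_id] for p in a[doc_id])
    }


# Hypothetical sample titles.
documents = {
    "doc1": "Superconducting films at high temperatures",
    "doc2": "High energy scattering of neutrons",
    "doc3": "Temperature dependence of neutron scattering in films",
}
index = build_index(documents)

both = docs_with(index, "neutron") & docs_with(index, "scattering")   # AND
either = docs_with(index, "films") | docs_with(index, "neutrons")     # OR
excluded = docs_with(index, "films") - docs_with(index, "neutron")    # AND NOT
phrase = adjacent(index, "neutron", "scattering")                     # "adjacent"
```

Because positions are stored alongside document identifiers, the same index serves both the set-based boolean operations and the positional "adjacent" operation.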

D. Literature of the Pre-electronic Age

Publishers will also be encouraged to supply already-published documents to the Physics Database, possibly with only the titles, authors, abstracts, and citations in machine-readable form, the body of the text being available as bit-mapped pages. Keywords and other indexing features would also be supplied, possibly with the aid of computers reading the abstracts. The day may come when full-text search algorithms have so improved, and the cost of transforming to machine-readable form has so decreased, that the APS, other publishers, and even the database proprietors may wish to convert all bit-mapped text to machine-readable form; but that day is not yet foreseeable. We can only urge that experiments like that of IEEE/IEE/UMI, which will be facing this problem weekly, and technical developments in full-text searching and character readers all be watched closely.

E. Novel Forms of "Informal Literature"

Two forms of more informal literature, on which the electronic age will have special impact, deserve separate consideration: preprints and comments/discussions.

1. Preprints

Preprints are already a form of the literature (if not the formal literature); in some fields like high-energy physics and high-temperature superconductivity, they are perhaps the dominant form. But their accessibility is haphazard, depending on the mailing lists of authors, except in a few cases where an attempt is made to centralize their collection and the dissemination of their titles, authors, and abstracts (e.g., the SPIRES facility at SLAC in high-energy physics and the SIS facility of the DOE at Oak Ridge in high-Tc superconductivity). When most of the documents are submitted in machine-readable form, it is conceivable that the journals could support a database of these preprints (or include them in the single Physics Database). Such a database, being just as accessible and searchable as the formal literature, would be far more accessible and searchable than the preprints of today, even in fields that SPIRES and SIS now serve. The electronic availability of preprints could even formalize and give significance to an optional "presubmission phase" in the publication process that would have the following characteristics, some of which are proposed in a provocative column by Rogers and Hurt (Ref. 10).

  • The preprints would be available on-line, as preprints (i.e., unrefereed), but only with author approval.
  • Each preprint would have an associated data field reserved for direct electronic comments entered by readers.
  • The preprint itself would be removed from the database after a prescribed period, terminating the presubmission phase and ensuring a lasting importance for the actual publication.
  • Revisions would then be made by the author before the formal submission.
  • The original presubmission version, the readers' comments, and the final submitted version would all be available to the referees for their refereeing process.
  • Final publication would be marked by a reappearance of the accepted document in the on-line database.

There are many serious questions of policy that would have to be resolved before such a database could be implemented, e.g., the impact on the standard refereed literature, on the quality of preprints if authors ever became widely contented with such nonrefereed "publication," however transitory, the acknowledgments to be given commenters, etc. It is important now simply to note the possibilities.

2. Comments/Discussions

The current method of publishing scientific literature allows for very little comment, discussion, subsequent updating, etc. The electronic age would certainly make this possible, even easy. This might start as informal (i.e., unrefereed) comments on documents (or on other comments). Just as with documents, comments could refer to more than one document or comment, and each comment could also contain computer-supplied pointers to later comments and documents in the database that cite the particular comment. Comments could also refer to (and be indexed by) keywords, subjects, etc., rather than just the specific documents to which they refer. In this way, an entire web of referencing, both forward and backward in time, could develop that could involve both the published literature and informal contributions. The user could still restrict his searching to the more formal literature, but could now have access to a back-and-forth discussion of a given document or subject as well.
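The computer-supplied forward pointers described above amount to inverting the backward references that each entry already carries. The following minimal sketch illustrates the idea; the entry identifiers, the "paper-"/"comment-" naming scheme, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical entries: each id maps to the earlier ids it refers to (backward links).
refers_to = {
    "paper-A": [],
    "paper-B": ["paper-A"],
    "comment-1": ["paper-A"],
    "comment-2": ["comment-1", "paper-B"],
}


def forward_pointers(refers_to):
    """Invert backward references: for each entry, list the later entries citing it."""
    cited_by = defaultdict(list)
    for source, targets in refers_to.items():
        for target in targets:
            cited_by[target].append(source)
    return dict(cited_by)


def discussion_thread(entry, cited_by):
    """All entries that cite `entry` directly or indirectly, breadth-first."""
    seen = []
    queue = list(cited_by.get(entry, []))
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(cited_by.get(current, []))
    return seen


cited_by = forward_pointers(refers_to)
thread = discussion_thread("paper-A", cited_by)

# Restricting the search to the formal literature, as the text allows.
formal_only = [entry for entry in thread if entry.startswith("paper-")]
```

Following `refers_to` moves backward in time and following `cited_by` moves forward, so the two mappings together give the web of referencing in both directions.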

The desirability and details of including such comments in the database involve serious policy questions, e.g., the willingness of scientists to expose to broad public view what they may now confine to private discussions with trusted friends, the priority that would be claimable for contributions of this nature, the impact this would have on the willingness of scientists to participate, etc. As with preprints, it is sufficient now to note the possibilities.

F. Functions Other Than Producing the Literature

There are many other areas in which new electronic information technologies may play a role. Areas that come immediately to mind include the following:

  • Meetings/Conferences/Workshops - the preliminary announcements, the scheduling of talks and handling of housing arrangements, and all the publications before, during, and after the meeting.
  • Information for nontechnical and semitechnical audiences-for the press, for students, teachers at lower levels, and even for members and subscribers in fields not their own (e.g., Physics Today).
  • Educational services for students from kindergarten to the colleges and universities.

The Task Force has not concerned itself with developments in these areas because it was felt that the future there is unpredictable-the applications of new technologies in these areas will grow naturally with time, depending on the wit and imagination of many practitioners, but with no clear role for the APS.

V. ADMINISTRATIVE AND FINANCIAL CONSIDERATIONS: A CHALLENGE TO THE APS

Until now we have concerned ourselves with a host of technical aspects (hardware, software, networks, systems) as if they were the only, or dominant, aspects of the physicists' informational environment in 2020. Although developments in all these areas will be necessary, and inevitable, the policies on organization, pricing, payment of publishers, availability, and use will all have profound effects on this informational environment. And in these areas, a particular line of development is not inevitable; the APS can have a very important influence.

The possibilities in each of these areas are so varied that it may be premature to try to analyze their ultimate impact now. Nevertheless, some of these possibilities are also so important that they deserve mention at this early date, if only to alert the APS to the importance that its own leadership can have in determining how physicists interact with each other and with the scientific literature in 2020. We do this by discussing a few important aspects and the possible impact of each.

A. The Physics Information System as Consumer and Producer

Today, in the physics-information marketplace, the individual physicist may be the ultimate consumer of the published literature, but the institutional libraries are the dominant consumers economically: most of the payments to publishers come from the libraries. As consumers, the libraries have much greater power than have individuals, but this power is still diffuse and hence weak. The producers in the marketplace are the publishers, and although they are a varied lot, some have (and exert) considerable economic power. This is because of the very inelastic (i.e., price-insensitive) demand for a journal, once it contains some minimum number of important articles. This explains how there can be one group of journals providing, on average, 1/20 as many words per dollar as another significant number of journals, a ratio that drops to 1/100 when the impact per word per dollar is considered (see Refs. 4 and 5).

In 2020, the Physics Information System will function as the ultimate middleman, both as a powerful consumer (being a massive library) and as a powerful producer (supplying the whole world of physicists and/or their institutions). If there is only one such system, its power in both roles will be enormous. Therefore, it will have to exercise this power with great care and consideration for all parties and for the good of physics. But will it? That may depend on who operates the Physics Information System.

Some cautionary implications: The concentration of so much power in one group has certain drawbacks. These include bad decisions with no recourse, unhealthy attempts to influence these decisions, Justice Department actions to oppose such a concentration, etc. A detailed study of the medical field, and the National Library of Medicine in particular, where this kind of effort is further along, is in order.

B. A Single Physics Information System-Or Several?

The alert reader may wonder why there won't be several physics information systems, with competition among them diluting the influence of any system. There are, after all, at least three hosts for physics bibliographic information (BRS, DIALOG, STN).

The answer revolves around a basic difference between bare bibliographic information and full text, or even abstracts. The former cannot be copyrighted; the latter can. This difference is already reflected in online databases. The STN database PHYS has the original abstracts of all papers from the journals of the AIP and its member societies, whereas the INSPEC database on BRS and DIALOG, unable to reach agreement with the AIP on terms for supplying these abstracts, has substituted abstracts of its own, written by others on commission by the IEE, INSPEC's producer. The situation with full text is more extreme, because there can be no substitute papers. Thus, only STN carries the full text of ACS articles, and, not surprisingly, STN is, in part, an ACS enterprise.

C. A Partial, If Not Total, Physics Database?

One can easily imagine a physics information system in which the documents from some publishers are fully available and documents from others are not. An analogous situation exists now with STN's Chemical Journals Online, which carries full text (minus figures, etc.; see Appendix C) of the chemistry journals of the ACS, Royal Society of Chemistry, and John Wiley, but no other chemistry journals. Another analogy is AIP's SPIN, which carries advanced bibliographic information on articles in only AIP and AIP-related journals. These partial databases have some value, but hardly in proportion to their size. When users search they want to search broadly and all at once. If they are not just searching but reading and/or downloading, limitations on the range of readable papers may be less of a drawback ("some is better than none"). But in 2020, searching and reading will be more strongly interacting activities, in which case documents that are missing from a database will be at a disadvantage. If the database has a significant part of the literature, so that it becomes widely and frequently used, the pressure on publishers of the rest of the literature to have their journals included will be strong.

D. A National Library of Science-Not Just of Physics?

Today, there is a National Library of Medicine. By 2020, a National Library of Science, rather than databases limited to just physics or chemistry, is probably inevitable. The establishment of such a facility would, of course, change many of the administrative and financial issues discussed here. Such a national library could well emerge from the experiences that the ACS is gaining with its involvement in STN and the APS and AIP would gain with a Physics Information System. The actual formation of such a library will require real leadership, leadership that the APS and AIP, working closely with the ACS, could provide. The ease with which this library is formed will also depend on the success of the enterprises in the separate fields of physics and chemistry and on the degree to which these separate enterprises develop in technically compatible ways.

E. The Document, Not the Journal, Becomes Primary

Today, from an economic point of view, and even informationally, the journal, not the paper, is the primary entity. Subscriptions are to journals, not papers; to read papers, one must have an entire issue of a journal; to make themselves aware of papers, physicists will browse certain journals, but not others. Furthermore, to the extent that journals have a geographic base and physicists interact individually with others in their own geographic area, journals tend to enhance the geographic compartmentalization of physics; physicists are more likely to submit papers to "their" journals and to know of and cite papers in these journals. Thus the existence of a journal per se, as opposed to the set of papers it contains, has a strong influence on what physicists are likely to read.

In 2020, when individual searches and awareness services can search worldwide, when they are ubiquitous and easy to use, and when all documents are easy to browse, the document itself rather than the journal will have the stronger influence on what the physicist is aware of. Furthermore, if all documents are equally available, and at comparable cost, the document will have the stronger influence on what the physicist reads or interacts with. This realignment of roles is certain to affect the prices that end users or their libraries pay for the literature and the way collections of documents we now call journals are produced.

Some cautionary implications: If documents become fundamental, journals could tend to lose their identities. This could reduce the standards to which a journal aspires and increase the pressure for economy in production. The results could include a drop in staff morale, possibly a loss of volunteer editors or lower quality of editorial work, possibly a loss of referees and a degradation of the peer-review process, and a lower quality of the final product, the published documents. If this happens widely, that would be most undesirable.

F. Monitoring Usage and Charging Users

Today, it is clear that a journal sold to a library is used much more than one sold to an individual. Hence, one has a complicated multitiered structure of prices to reflect this difference. But the system is very crude; libraries serving 200 physicists and libraries serving five must pay the same.

In 2020, the usage of the Physics Information System could be monitored in exquisite detail: the number, kind, and extent of uses (searches, inspections, studies, downloadings, other interactions, etc.) could all be recorded. Usage by accessing a local server (itself fed with magnetic tapes or CD-ROMs) would be more difficult to monitor, but not impossible. Charging that takes this detailed usage information into account, perhaps not for each individual but statistically for each user's institution, would not be crude, and it could be far fairer.
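A minimal sketch of such usage-based charging, aggregated by institution rather than by individual, might look like the following. The per-use rates, the institution names, and the log entries are purely hypothetical; the report proposes no actual figures.

```python
from collections import Counter, defaultdict

# Hypothetical per-use rates in arbitrary units (an assumption, not from the report).
RATES = {"search": 1, "inspection": 2, "download": 5}

# Hypothetical usage log: (institution, kind of use).
usage_log = [
    ("Large University", "search"),
    ("Large University", "download"),
    ("Large University", "search"),
    ("Small College", "inspection"),
]


def tally_usage(log):
    """Count uses of each kind, grouped by institution."""
    counts = defaultdict(Counter)
    for institution, kind in log:
        counts[institution][kind] += 1
    return counts


def compute_charges(counts, rates):
    """Charge each institution according to its recorded usage."""
    return {
        institution: sum(rates[kind] * n for kind, n in uses.items())
        for institution, uses in counts.items()
    }


counts = tally_usage(usage_log)
charges = compute_charges(counts, RATES)
```

Under this scheme a library serving many active physicists pays more than one serving few, in contrast to the flat institutional price described above.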

Another implication: Usage monitoring could replace some of the current methods for evaluating physicists (e.g., number of citations or of letters in Physical Review Letters).

G. Ability to Pay

Today, no real attempt is made to charge institutions (or individuals) according to their ability to pay. Not only is this ability difficult to estimate, but also widely different charges could lead to efforts to circumvent the differences. (Certainly the big price differential between individuals and libraries has led to many public displays of "private" journal collections.)

In 2020, it might not be inappropriate to make the price structure vary from one region of the world to another; in supplying individuals, other parameters besides geography could also enter in determining this structure. The Physics Information System could easily handle this.

H. Paying Publishers

Today, the economics of journal production are extremely varied. There are very general journals and highly specialized journals; there are nonprofit publishers, commercial publishers, and even something akin to "vanity" publishers; there are journals with large page charges, some with none, and some that pay authors; there are journals where the editors serve gratis, and journals where editors are paid. And, as we have noted, the ratio of price per word per unit of impact between some "high-priced" journals and some "low-priced" journals exceeds 100.

In 2020, a single Physics Database could have a significant influence in reducing this ratio. Payment to publishers could be at flat rates; it could be based on monitored usage of the documents in their journals; and it could be negotiated with each publisher to take into account other factors in the publisher's own situation (societal, commercial, etc.). The possibilities could also be nightmarish: one could imagine a situation in which the publisher and author would negotiate a "price" (positive or negative) for each document, much as is now frequently done for books. Such a situation could in turn have an effect on what is written and how it is written. The possibilities are enormous, and their implications are not all desirable. Here, we have simply mentioned a few.

I. Conclusions

It is clear that the world of physics is on the verge of a revolution, a revolution that is driven by technology, but whose true nature will be determined by the response of the world scientific community. The revolution will change what and how physicists read, how they become aware of what they read, and even what "read" means. Institutionally, the revolution will change the very nature of libraries, will reduce the importance of journals, and could have important financial consequences for both. This revolution will most likely lead to a National Library of Science, or even a World Library of Science. The way all this comes about will have a profound impact not only on physicists but also on physics. In this process, the APS, in close collaboration with the AIP, will want to play a fundamental role. This is the challenge that the vision of 2020 presents to the APS today.

VI. PLAN TO REALIZE THE VISION

A. Goals

In developing a strategy for Electronic Information Systems, the APS should have the following goals:

  • A National Physics Database
  • A National Physics Information System
  • A World Physics Database or Information System
  • A World Scientific Database or Information System

The development of a World Scientific Information System will take many years and will require the cooperation of all scientific publishers. We believe it is essential that the APS, working with the AIP and its other member societies, take a leadership role in bringing together the professional societies so that we can realize this final goal as soon as possible, and so that the World Scientific Information System will be organized to provide maximum benefit to the scientific community and thus to science itself.

In the meantime, however, the Society must plan its own activities to ensure that it can satisfy the needs of its members. We turn now to a discussion of strategy.

B. Strategy

The Society has, in the past, taken a leadership role in using new technologies to improve publication procedures. Recently, however, that lead has been taken by other societies. Although the APS will not take on major new projects, we feel that it is important that the Society continue to apply new technologies and help to define the direction for scientific publication. We suggest the following components of an APS strategy.

  • The Society must modernize and enhance its computer capacity and improve its network link to provide the capability of dealing with a large volume of electronic submissions and electronic reviews. It should encourage authors to use electronic submission and should increase the use of electronic reviews.
  • The Society should closely track developments in the industry and in other societies to ensure that its efforts are consistent to the extent possible.
  • The Society should ensure that its activities in this area are reviewed on a regular basis by knowledgeable members of the APS and of other societies and organizations.
  • The Society should encourage projects to learn about techniques for full-text searching and how physics data can be used effectively with advanced search techniques. It should encourage the physics community to use new electronic techniques for information exchange and recovery.

In planning its activities, the APS must recognize that some of the short-term and medium-term items will not move the Society toward its goal of a World Scientific Information System. The steps we take now, however, will help build the expertise and the facilities that will be needed to develop our vision in collaboration with other professional societies.

C. Recommendations

In this subsection we describe short-term and medium-term projects that will significantly improve our handling of electronic information and that will begin to develop the foundation for the World Scientific Information System.

Upgrade computing equipment
The APS computing facility consists of a VAX/780 computer, a technology that is now over 10 years old. Since that computer was introduced, there have been major changes in the computing industry, and the typical desktop computer has more capacity than the VAX. To deal with increasing numbers of electronic submissions and to increase the use of modern publishing software, the APS should begin a program of upgrading the main computer system and adding workstations that are connected to it. Funds should be committed each year to continue to acquire new equipment.
Improve network connectivity
The computer system should be connected to the New York State Education and Research Network (NYSERNET) to permit access from the Internet and to provide higher speed and better reliability. The BITNET connection should be retained as long as necessary to handle e-mail that cannot be carried by the Internet.
Have someone monitor industry, other societies, pilot projects
We believe that the Society will need a new person to monitor projects in other societies and to plan the APS program. We suggest a job description in Appendix H. It may be that some of these responsibilities will be taken by existing staff. If that is the case, we feel that a new staff member will be necessary to ensure that the staff have adequate time to devote to these new responsibilities. An important aspect of the communication with other journals is better collaboration with the AIP. Since the APS and AIP share facilities and, together, account for approximately 30% of the physics literature in English, they can significantly influence the development of electronic information systems.
Investigate and test new publication products
This will be one of the major responsibilities of a member of the publishing staff, either a new hire or someone assigned from another area. There are a variety of products for page layout and embedded graphics, as well as new standards like SGML. The Society will need some expertise in these areas to make decisions about new acquisitions and to ensure that what they do is compatible with other societies.
Have the Publications Committee monitor progress
The APS Publications Committee should review the development of new technologies and the progress of projects in other societies.
Ensure that the Publications Committee includes experts in electronic publishing
To evaluate progress in these areas, the Publications Committee should have some expertise in these areas or should establish a technical subcommittee. The APS may need to modify procedures for naming committees to ensure that some members of the committee have this expertise.
Work with other journals to increase availability in databases of physics information, including full text and graphics
Begin an effort to make the physics community aware of the potential of electronic journals
This should include special sessions or demonstrations at meetings. Part of this effort should be directed at increasing the number of electronic journal submissions.
Encourage special document projects using the new technologies
This could be done by a competition with awards and demonstrations of winning entries at meetings.

D. Financial Impact

The major financial impact of these recommendations is the addition of a new staff member. The new hire may, in fact, take on responsibilities of an existing staff member, so as to permit that person to concentrate on monitoring ongoing projects and investigating new technologies.

The replacement of the VAX/780 will require approximately $250K. This replacement must be planned carefully, so that there is an opportunity to move existing programs and files to the new system. The Society should budget $50K-100K per year to continue to introduce new hardware and software into the publication office. Once the conversion to a new system is complete, the costs of hardware maintenance should decrease.

The connection to the Internet, through NYSERNET, should cost approximately $20K-30K. The continuing maintenance of the link will be about $5K.
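The figures quoted in this subsection can be tallied with a short sketch. It makes two assumptions that the text does not state outright: that the link-maintenance figure is an annual cost, and that the new staff position, for which no amount is given, is left out of the totals. The constant and function names are illustrative.

```python
# All amounts in thousands of dollars, taken from the figures quoted above.
VAX_REPLACEMENT = 250
NETWORK_SETUP = (20, 30)       # (low, high) estimate for the NYSERNET connection
ANNUAL_EQUIPMENT = (50, 100)   # (low, high) yearly hardware and software budget
ANNUAL_LINK_MAINTENANCE = 5    # assumed here to be a per-year cost


def one_time_cost():
    """Return the (low, high) range of one-time costs."""
    return (VAX_REPLACEMENT + NETWORK_SETUP[0],
            VAX_REPLACEMENT + NETWORK_SETUP[1])


def recurring_cost():
    """Return the (low, high) range of yearly costs, excluding staff."""
    return (ANNUAL_EQUIPMENT[0] + ANNUAL_LINK_MAINTENANCE,
            ANNUAL_EQUIPMENT[1] + ANNUAL_LINK_MAINTENANCE)


low, high = one_time_cost()
print(f"One-time: ${low}K-{high}K")
low, high = recurring_cost()
print(f"Per year (excluding staff): ${low}K-{high}K")
```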

VII. CONCLUSIONS

In this section we wish to summarize the principal points of this report and emphasize the conclusions they imply.

We have defined a single, long-term goal, a World Scientific Information System, which we have called our Vision 2020 and have described in detail in Sec. IV. This vision has several principal features:

  • All the world's formal scientific literature is available, on-line, to scientific workers throughout the world, from a world scientific database. Various forms of less formal literature, like preprints in a "prepublication phase," comments and discussion on the published literature, and a variety of possible forums, could all be included.
  • The "documents" of this database would go well beyond the articles and books of the present on-paper age: they would provide the "reader" with an interactive environment including hypertext, instantaneous referencing both backward and forward in time, algorithms for evaluating expressions and solving equations occurring in the document, all manner of graphics capability, and even multimedia output, etc.
  • Individual documents, rather than electronic versions of journals, would become the fundamental entities, a result of broad and efficient searching algorithms. These algorithms will have much greater power and efficiency than current keyword-searching algorithms.
  • The database, stored in a few central places or possibly in many regional or local servers, would be accessed over networks having transmission rates of the order of terabits/sec, many orders of magnitude faster than the best current dial-up lines (9600 bit/sec).
  • Access would be via workstations with graphic terminals that far surpass current technology.
  • Parts of the database would be portable, and individual scientists could create their own local images of selected parts of the database.
  • The economics of both producing and consuming the scientific literature could be radically different from what it is today, and this could have a profound impact on current producers (editorial staffs and publishers) and consumers (libraries, their readers, and their institutions). These and related issues are discussed also in Sec. V and are summarized below.
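As a rough check on the "many orders of magnitude" in the network item above, the following sketch compares a terabit/sec link with a dial-up line, taking the dial-up rate to be 9600 bits per second. The one-megabyte document size is a hypothetical figure chosen for illustration, and the transfer times are idealized: they ignore protocol overhead and latency.

```python
import math

DIALUP_BPS = 9600      # assumed dial-up rate, bits per second
TERABIT_BPS = 1e12     # one terabit per second


def orders_of_magnitude(fast_bps, slow_bps):
    """Return log10 of the speed ratio between two links."""
    return math.log10(fast_bps / slow_bps)


def transfer_seconds(size_bits, rate_bps):
    """Idealized transfer time, ignoring protocol overhead and latency."""
    return size_bits / rate_bps


# Hypothetical document: one megabyte of text and graphics.
DOCUMENT_BITS = 8 * 1_000_000

ratio = orders_of_magnitude(TERABIT_BPS, DIALUP_BPS)
print(f"speed ratio: about 10^{ratio:.0f}")
print(f"dial-up: {transfer_seconds(DOCUMENT_BITS, DIALUP_BPS) / 60:.1f} minutes")
print(f"terabit: {transfer_seconds(DOCUMENT_BITS, TERABIT_BPS) * 1e6:.0f} microseconds")
```

On these assumptions the gap is about eight orders of magnitude, which is the difference between waiting several minutes for one document and retrieving it effectively instantaneously.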

The process by which we get to this vision will for the most part be evolutionary, with evolutions required in several different areas: hardware, software, networks, and the attitudes and habits of the working scientists, editors, librarians, and other individuals in the scientific-information community. All of these evolutions have much to accomplish, but each will have its own distance to go, will encounter its own problems, and will proceed at its own speed. In Secs. II and III we have described where we stand in these various areas and have attempted to describe where we are going, at least in the near future. Here we summarize some of the features of these evolutions, and the possible roles for the APS. We consider the different areas in order of what is likely to be increasing difficulty.

  • The hardware evolution, described in Sec. III A (workstations, graphics terminals, storage media, central servers), is probably furthest along, is least likely to provide obstacles to achieving the vision, and, because it is out of the APS's control, will offer the fewest challenges to the APS. The APS must, however, upgrade its own hardware and continue to remain near the leading edge in the various technologies, if it is to play a leading role in achieving the vision.
  • The network evolution may be next furthest along. Plans within the United States, described in Sec. III C, call for gigabit/sec networks that are widespread by 1996. Terabit/sec transmission will be technically feasible not much later. Networks at such speeds, on a worldwide scale, and with something like uniform coverage, will take much longer, but they should come eventually. Economics and/or politics rather than technology will limit the rate of network evolution. The APS can at best play the role of an intelligent consumer, getting to the forefront of network technologies, providing feedback, testimony, and various forms of leadership wherever it seems possible and useful.
  • The software evolution has further to go. Software for text, graphics, "page" makeup, hypertext and multimedia, widely-adopted standards in all these areas, translators between different languages conforming to these standards, and powerful and efficient searching algorithms: all these could provide a more serious obstacle to reaching the ultimate goal. In this area, the APS must first get its own house in order, by choosing a text standard, making REVTEX or some equivalent conform to the standard, and then by participating actively in attempts to define standards in all the other areas. Some of this is discussed in Sec. III B.
  • Still more difficult are the socio-psychological issues, those centering around individuals like scientists, editors, referees, and librarians. Big changes in traditional attitudes and methods will be required, and many of these may be slow to come. An example of this, revealed in at least two surveys, is the small fraction of papers that are prepared for publication in any single computer language or set of related languages, even when the scientist-authors normally use one of these languages. Another example is the slow rate at which the editorial process at Physical Review is making its own transformation from paper to electronic. Here, the APS can do more. It can begin to reexamine its own methods, accelerate its own conversion to an electronic operation, and when its equipment is ready to reap the harvest, it can do much to promote the use of languages conforming to a single standard in document preparation (beginning with REVTEX), it can promote the use of electronic methods for submission of those documents directly into the author-prepared program, and it can complete its own conversion to electronic methods in document submission, refereeing, copy-editing, author-correcting, and page makeup. As the scientist community itself gets converted to this electronic culture, services involving full documents, like reprint delivery from the APS/AIP journals and on-line availability in commercial databases, will also become attractive to an alert APS.

The most difficult obstacles, discussed in Sec. V, are in the administrative and financial areas and offer the APS its most serious challenge. How these obstacles are overcome, and under whose leadership, could have a profound influence on the ultimate form of the World Scientific Information System: what it will be, when it will come, even if there will be one. In this area, the process is less likely to be evolutionary, more likely to involve some decisive actions by key players. Although the APS, together with the AIP, occupies a natural position to lead the transition at least to a Physics Information System within the United States and probably throughout the world, it is not the kind of activity in which the APS finds itself comfortable. Also, the competition from other sources with somewhat different goals will be intense, because the form of the ultimate system will have enormous influence. In Sec. V, we have highlighted several issues:

  • The dominant role of such a system (and its operators) when it acts, in a certain sense, as the ultimate middleman.
  • Whether there is likely to be one or several such systems.
  • Whether all or only some publishers will be represented, and the implications of this for users and publishers.
  • Whether the system should be for just physics or for all of science.
  • Why documents rather than journals will become the primary entities, and the implications of this revolution.
  • How the ability to monitor usage could affect the economics of journal production and literature consumption.
  • How information will be paid for, which could lead to much more equitable arrangements.
  • How publishers will be paid, which could have a profound effect on the publishing industry, and thus on the availability of scientific information itself.

Recognizing the unpredictability of the detailed steps in each of the areas leading to the vision, we have, in Sec. VI, recommended several steps for the immediate and short term that the APS should take. We have also suggested various strategic steps like having a "tracker" to follow technological developments, assuring expertise in this area on the Publications Committee, working towards standards, etc. But if there is one overriding strategic recommendation that is implied, it is to keep an eye firmly on the vision and take whatever leadership initiatives seem appropriate to lead the U.S. physics community, and ultimately the world scientific community, toward that vision.

References

  1. E. van Herwijnen and J. C. Sens, Streamlining Publishing Procedures, Europhysics News 20, 171 (1989).
  2. John Maddox, Towards the Electronic Journal?, Nature 344, 287 (1990).
  3. Anthony D. Vanker, Digital Paper: Mass Storage Revolution?, Laserdisk Professional, January, 38 (1989).
  4. H. H. Barschall and J. R. Arrington, Cost of Physics Journals: A Survey, Bull. Am. Phys. Soc. 33 (7) (1988).
  5. Heinz H. Barschall, The Cost-Effectiveness of Physics Journals, Phys. Today, July, 55 (1988).
  6. G. Feinberg, Making Knowledge Accessible: The CUSL Project, Columbia University preprint.
  7. J. Barg et al. (Editors), Directory of Online Databases, Cuadra/Elsevier.
  8. Mary Holland, IEEE/IEE on CD-ROM: A Review from a Beta Test, CD-ROM Librarian, February, 34 (1990).
  9. Information Technology and the Conduct of Research: The User's View, Report of The Panel on Information Technology and the Conduct of Research, of the Committee on Science, Engineering and Public Policy, National Academy Press, Washington, DC, 1989.
  10. S. Rogers and C. Hurt, How Scholarly Communication Should Work in the 21st Century, The Chronicle of Higher Education, October 18, A56 (1989).
  11. Brian Kirk, Jumping On-Line with Your DataBase, Association Management, August, 171 (1988).

The Appendices to the report are provided separately.

Links to several related pages on the web as of January 1997:

  • UNESCO conference on Electronic Publishing in Science, Paris, February 1996.
  • The Los Alamos e-Print Archives, which provide a world-wide unrefereed database of full-text papers in physics, greatly resemble the Physics Information System described above.
  • SPIRES is now on the web, providing preprint information in high energy physics, citation statistics, and other related information. However, the "Superconductivity Information Service" (SIS) seems to have disappeared in the interim, or at least now has no conspicuous web site.
  • November issue of the APS News, discussing the current status of electronic projects related to the APS publications (see also the home page for the Physical Review journals).

The other information sources cited in the report (from the IEEE, OCLC, AMS, etc.) have evolved significantly and a synopsis would be beyond the scope of this brief addendum...

Contact apsmith@aps.org if you notice errors in this document. See the translation notes for details on how it was made available online.