APS Reports
Published in the Bulletin of the American Physical Society Vol. 36, No. 4, p. 1119 (1991)
Report of the APS task force on electronic information systems
INTRODUCTION
The Task Force on Electronic Information Systems was formed in November 1988 by APS President, Professor Val Fitch. The charge to the Task Force was the following: The American Physical Society recognizes that the new information technology offers the Society an unprecedented challenge and opportunity to further the mission of advancing and diffusing the knowledge of physics. The rapidly decreasing cost and increasing use of personal computers, optical and electronic data storage and delivery, and the use of these new developments by other suppliers of scientific information make it absolutely necessary that the Society move to utilize these technologies with all deliberate speed. The Society therefore requests the Task Force to:
The Task Force members are listed in Appendix A. The members include representatives of information-oriented industrial laboratories (AT&T Bell Laboratories, Xerox, IBM), universities (Princeton, Virginia), and some national laboratories with heavy electronic information requirements (Lawrence Berkeley Laboratory, Fermi National Accelerator Laboratory). The members themselves brought expertise in hardware, software, network management, data handling, information management, and on-line information usage. The Task Force was assisted by APS management (Miriam Forman), editors (Peter Adams, George Basbas, and Gene Wells), and publishing personnel (Peggy Judd and Peggy Sutherland). Their contributions to the work of the Task Force were critically important. The Task Force held six meetings between December 1988 and March 1990. These included extensive interviews with personnel involved in all levels of production of APS and AIP journals; with representatives of a database firm (Dialog), of scholarly societies (American Mathematical Society, American Chemical Society, and Institute of Electrical and Electronics Engineers) involved in experiments on information dissemination with Compact Disc-Read-Only Memories (CD-ROMs) and on-line, of the Association of Research Libraries, and of the Defense Technical Information Center; and with information managers (Xerox PARC and LBL). We also had extensive opportunities to see and discuss some advanced hardware and software at AT&T Bell-Holmdel and Xerox PARC. The meetings are summarized briefly in Appendix B. In addition, members of the Task Force met with the APS Council and with the APS Publications Committee. During the meetings, the Task Force learned a great deal about the efforts of other societies and of industry to utilize electronic technologies to manage and disseminate scientific information. These discussions and presentations have been most helpful to the Task Force in developing this report.
The thinking of the Task Force has evolved significantly during the preparation of this report. The initial focus was on the delivery of the APS literature in a way that would give the physicist more convenient access to the information in the journals. Compact optical or magnetic data-storage systems would permit the physicist to keep large collections of physics literature in a personal computer or workstation. A physicist would be able to search not only the titles, authors, and abstracts, but also the full text of articles using powerful search algorithms. This alone could have a significant impact on his or her work. The Task Force has concluded, however, that this is not sufficient. Just as a library cannot serve the needs of the physicist by providing only The Physical Review, an electronic information system must provide access to articles in more than one or a few journals. The Society, in planning its own information system, must take into account the efforts of other societies and publishers and must, where possible, adhere to standards that will make its journals available as part of the more-general scientific literature. We believe that the Society should adopt as a goal a National Physics Database. This, we hope, will be the first step toward integrating all the world's scientific literature into an electronic information system. It is not possible to present a detailed plan to accomplish this ambitious goal because we cannot predict in detail what technologies will evolve to address the issues that this project would raise. The Society must plan to evolve gradually from the present, always guided by a long-term vision of a worldwide electronic information system for physics (or all science), while it monitors progress in a number of crucial areas. In addition, the Society, together with the AIP, will have strong reasons to take a leadership role in moving toward this vision.
In developing our recommendations for short-term activities, we distinguish between two classes of project: those that help the Society accomplish its mission in the short or medium term, and those that move the Society toward what we see as the long-term goal, to be part of a full "physics information system." It is clear that not all short-term measures (e.g., delivering journals on CD-ROM or articles by fax) are steps toward the long-term goal. In this report we begin by reviewing the present situation within the physics community. In Sec. II we describe the physicist's current electronic information environment, the developing role of electronic communications, the information currently available on-line in physics and closely related sciences, and some on-going experiments in the use of CD-ROMs. In Sec. III we review the current status of hardware and software systems and describe the network infrastructure that is evolving within the United States to link the scientific community. In Sec. IV we present our vision for where the physics community will be in 20 or 30 years, "Vision 2020." We recognize that the vision will not be realized soon or in exactly the form we describe. It is useful, however, to provide a focus for our short-term recommendations. In Sec. V we discuss some of the many issues and challenges that are raised by this vision. Finally, we turn to a plan for the Society to provide its own information system and to work towards the goal we describe. In Sec. VI we outline such a plan, present a set of short-term and medium-term recommendations, and discuss the financial implications of these recommendations. Our conclusions are summarized in Sec. VII. We realize that the details of this program will evolve.
We are convinced, however, that an important aspect of this evolution will be the continuing education of the physics community regarding the power of electronic information systems and the potential impact that they can have on the conduct of scientific research. It is our hope that this report will contribute to that education.
II. PRESENT STATUS OF ELECTRONIC INFORMATION SERVICES AND ORIENTATION OF USER COMMUNITY
In order to chart a sensible course into the electronic age, it is useful to assess the present situation in the physics community and the broader scientific community as it relates to electronic submissions, preprint exchanges, forums, on-line databases, databases on portable media, and other electronic information resources. There is, in fact, an entire sociology of electronic information developing within these communities. It is still fairly heterogeneous, with different fields of physics following somewhat different patterns and the different sciences showing even more variation, but there are some common trends that have important implications for the future. We shall highlight a few of these. We shall also try, in this section, to survey the present electronic landscape as seen by the working physicist. This survey, combined with the assessment of technological issues presented in Sec. III, leads us to the vision described in Sec. IV, and the challenges discussed in Sec. V. The electronic information systems that we mention range from the experimental to the highly developed, cover a wide range of subject areas, use several different technologies, and exhibit a variety of management structures. Together they represent a rich and growing set of examples whose careful study can suggest many ways for the APS to proceed, and pitfalls for the APS to avoid. In Appendix C we give more-detailed information on some of these systems.
A. The Individual's Immediate Environment
Probably the most ubiquitous feature of the current electronic environment in physics is the use of word processors to produce and edit papers. Even physicists who have otherwise shunned the use of computers in their research have learned how to log on and use the word processor on their local VAX or desktop personal computer. This fact alone has enormous implications. Not only is the text of physics papers available ab initio in digital form, but also many of the physicists who write papers are plugged into a computing environment that includes network connections, electronic mail service, on-line databases, and other computer-based services, in addition to word processors. This proliferation of informational equipment in the individual's immediate environment has led to a profound change in the sociology of physics, a change that has taken place rather rapidly and was driven primarily by new technology. It is interesting (and perhaps humbling) to speculate on whether a task force report such as this, if written in the mid-1970s (before the advent of inexpensive VLSI microcomputers), would have accurately foreseen the present situation. Such reflections certainly invite the prediction that the technological revolution anticipated for networks (see Sec. III C) will drive an equally profound sociological change, one that will lead to the world envisioned in Sec. IV. Although electronic word processing certainly increases the power to create printed documents, it also opens the possibility of exchanging the full text of documents electronically and of submitting this text to journals as computer code (thus obviating the journals' need to typeset their articles). Yet the development of word processing has been as chaotic as it has been rapid.
The variety of ways in which word processing is performed is so large (a host of stand-alone word processors and an even larger variety of word-processing programs run on PCs and mainframes) that it creates serious problems. For example, recipients of electronic text can usually print it out only if it is in one or (occasionally) a few formatting languages, so that most publishers must turn a deaf electronic ear to most authors. The effect of this babel of formatting languages on publishing can be dramatic. The history of our own Physical Review is illustrative. In 1979, Physical Review began allowing authors to submit, in addition to hard copy for refereeing, an electronic file or "compuscript" (first on magnetic tape, later on a floppy disk or UNIX-to-UNIX over telephone lines) for direct typesetting of galleys. Until 1987, this compuscript had to be in the troff formatting language, a restriction that undoubtedly explains why the number of these compuscripts, after rising in the first few years, leveled off at roughly 50 per year, less than 0.5% of the total number of papers. In 1987, Physical Review began to accept compuscripts for typesetting in a second formatting language, TEX (plain TEX and variants like LATEX). The arrival rate of these author-prepared compuscripts immediately began to grow, a growth that was enhanced when the APS developed and began to promote its own TEX variant called REVTEX and when, in 1989, the APS began to accept BITNET submissions in any of the TEX variants. The differences between LATEX or REVTEX and plain TEX have now proved crucial: although compuscripts in the former usually require only light editing to produce Physical Review galleys, those in plain TEX require such time-consuming modifications that it has proved simpler to print them out and rekey them in troff as if they had never been in electronic form. 
And even for REVTEX and LATEX compuscripts, one-third cannot be used to produce galleys, for one reason or another (often the nonavailability of authors for the consultation that is still necessary). The number of electronic compuscripts submitted either in the initial phase for refereeing or in the later phase for typesetting is now rising rapidly and has already eclipsed the rate for papers with troff compuscripts. For the near future, one expects this number to increase to 20-30% of all papers, but not higher. The majority of papers are likely to remain nonelectronic because it appears that only one-quarter of papers submitted to the Physical Review are originally prepared with one of the TEX family of formatting languages. [In a 1987 APS survey of authors of 497 Physical Review papers, 49% of whom responded, Peggy Sutherland of the APS found that only 44% of these authors (or their staffs) ever used a TEX language and that only half the papers by that 44% (i.e., 22% of the responding authors) were originally prepared using a TEX language. Similarly, in a survey of 2800 authors of papers submitted to European physics journals, reported by van Herwijnen and Sens (see Ref. 1), 39% responded, but the papers of only 22% of these had been prepared with a TEX language.] Furthermore, although plain TEX and its offshoots LATEX and REVTEX are closely related and the relative proportion of Physical Review submissions in the latter two is now increasing, their differences are so important that many proponents of plain TEX may be unwilling to switch. A contrasting example is that of the heavily edited magazines of the IEEE's Computer Society. In these magazines, where the text is largely free of equations, submissions in any formatting language are accepted, and the electronic submission rate is now roughly 95%, resulting in an estimated saving of 35% in printing costs.
At the other extreme, the journal Complex Systems requires that all submissions be electronic and in LATEX. We do not know what success this policy has had, but if further study shows it to have succeeded, the APS should find out why.
B. Electronic Mail
Another feature of everyday life for physicists has been the rapid increase in use, both in frequency and scope, of electronic mail services, or "e-mail." This has been made possible by the expansion of networks such as BITNET, DECnet, and ARPANET (see Sec. III C). For physicists who do not have direct access to such networks, the AIP now provides BITNET access through its PINET facility, which, in turn, is accessed with a modem and phone-line connection to the commercial TELENET network. This e-mail capacity has been used for a number of purposes:
C. Bulletin Boards and Forums
The least formal line of communication, most pervasive in the lay community, is the computer bulletin board or forum. Such facilities, in which users can write as well as read, have been adopted by the physics community in at least two known cases and probably many more: high-temperature superconductivity and cold fusion. Such bulletin boards are usually available through some dial-up service, or may be maintained on-line for users of some institutional network. The Superconductivity Information Service of the Department of Energy (SIS/DOE) provides a high-Tc bulletin board, for example. We have made no systematic study of such bulletin boards, either of their extent or usefulness. It is clear from a casual reading that the contributions, which are totally unrefereed, may often be of doubtful value.
D. On-line Databases and Information Systems
Among all the aspects of electronic information systems we have considered, the use of on-line electronic databases has the most far-reaching potential for altering the way physicists conduct their research. With the rapid advance of technology, a plethora of electronic databases has emerged. The thresholds have been reached for the data-storage, retrieval, and transmission capacities needed to make on-line acquisition of full text (without figures) an option, although such databases are still rare. A database with full text and quality figures is some way off. Here we mention a few of the bibliographic and textual databases and information systems, to give an indication of the range of possibilities. We give the details about these and several other databases, systems, and hosts in Appendix C. The databases available in physics, chemistry, and related fields differ from one another in several respects:
The Society must review these issues before it begins to move toward creating the recommended Physics Database.
E. Searching Databases
In addition to the issue of data-handling capacity, an equally serious concern regarding the usefulness of a large, unified, full-text database is the need for an efficient searching strategy, one that returns most of the relevant articles and weeds out most of the irrelevant. The issues here are complex and in many respects similar to those relating to artificial intelligence. In preparing this report we had an opportunity to sample a number of databases, both bibliographic and full text. In the case of purely bibliographic databases like HEP/SPIRES, the intelligent use of boolean operations could sometimes result in a reasonably efficient search strategy. This, of course, depends on how specific one's original intentions are. In a purely bibliographic database (i.e., title, author, and publication information), it is often impossible to do a useful subject search based on the text, since the subject information contained in the title is extremely limited. (In fact, it is our experience that such databases are primarily used to find a specific article whose author or title is already known.) If keywords or subject codes are provided (by authors, editors, or indexers), things can improve greatly, but such descriptions tend to become obsolete as the terminology and popular subjects to which the paper is relevant change. The storage of abstracts and text obviously provides, in principle, a large amount of subject information, but it requires much more sophisticated software to make use of that information efficiently. Indeed, one would like to query the database as follows: "Give me all the articles on this particular subject that I would find interesting and useful in my research." In practice, this requires a delicate balance of context-sensitive filters and probably a much-more-interactive searching program, e.g., one that asks questions of the user as well as vice versa, and thereby encourages the user to refine his or her search along lines that have proven efficient in the past. For the most part, software for such interactive searching does not yet exist. Thus, a large full-text physics database, if such existed, would not be fully exploited without major new software developments. The search problem is being studied widely. An example, using ten years of full-text chemistry literature, is the Chemistry Online Retrieval Experiment (CORE), a joint project of the ACS, Bellcore, the Online Computer Library Center, and Cornell University. The purpose of this experiment is to find ways of integrating figures with text and to study search methods and the man-machine interface, with Cornell chemists acting as the test subjects. A more detailed description is given in Appendix E.
F. Databases on CD-ROMs and Other Portable Media
The recent development of CD-ROMs, and especially the drop in their production costs associated with the large production capacity developed for the music industry, has led to a sudden proliferation of databases on this medium. (The setup costs for manufacturing a single CD-ROM are about $2,000; the cost for additional discs is about $2/disc.) A CD-ROM holds roughly 500 Mbytes (more than a year of Physical Reviews, minus figures and without index), so it will hold all kinds of databases heretofore widely available only in print (e.g., encyclopedias, bibliographic reference books like Books-in-Print, etc.). To read a CD-ROM, one needs only a CD-ROM reader costing $500-$1000. To search the database, one needs search software and an index for the CD-ROM, both of which usually come on the CD-ROM itself. In recent years, a host of databases of special interest to scientists have become available on CD-ROMs.
We list some of these in Appendix D. One characteristic of these is that none are truly massive (requiring several CD-ROMs) and all are essentially static, i.e., requiring little updating or updatable quite simply with issuance of a replacement CD-ROM. It is tempting to want to solve the shelf-space problem of libraries, and introduce some machine-searching capability at the same time, by putting scientific journals on CD-ROMs, but several problems come immediately to mind. Here are some, a few with partial answers.
We have alluded to two "experiments" on using CD-ROMs now in progress that may give some useful answers. We describe them briefly here, provide more detail in Appendix E, and leave a detailed study of these experiments to a specialist in electronic information systems. (In Sec. VI we recommend that the APS hire a specialist to follow on-going experiments and develop the Society's plans in more detail.)
G. Conclusion
We are now at a time of true revolution in the communication of scientific information. Much of the hardware for the individual's environment is available; the rest is on the way, as we shall see. The practice of using this hardware is becoming more widespread every day. Wide-area networks of the necessary capacity and speed are further away, but much is on the drawing board, as we shall see in Sec. III C. Major problems include an agreement on the form and standards for this communication and methods for searching the masses of information becoming available, methods that are more efficient (and more sophisticated) than anything now available.
III. IMPORTANT TECHNOLOGIES
A. Hardware in Use Today
The following discussion is not meant to be exhaustive or highly detailed as to all the systems and their specifications. The purpose is to introduce the reader to a broad range of what is generally being used and generally available and to emphasize the main features and limitations as they pertain to electronic publishing of journals that are currently being printed. For storing and retrieving documents today, there is a variety of magnetic and optical systems and media. Optical systems and media, which usually have the benefits of permanence and ruggedness over magnetic systems and media, are either write-once read-many (WORM) disks and tapes, erasable disks (readable and writeable), or read-only disks such as mass-produced CD-ROM discs and analog video discs. Magnetic media are either floppy disks, fixed platter disks (Winchester-type drives), or tapes, all of which are inherently erasable/reusable. Disadvantages of optical systems and media, in general, are their lack of standards, cost of the actual media, slowness in writing and verifying, and slow access speed to random data (due to movement of the optical read/write mechanism).
Disadvantages of magnetic systems and media include vulnerability of the media/system to failure (disk crashes, accidental erasure, media damage) and their cost per byte of storage. Therefore, the choice of either optical or magnetic systems and media is strongly dependent on the actual application (number of users, cost and amount of distribution, speed of information retrieval/storage, integrity of data, etc.). It should be mentioned that the lifetime of most current optical media (WORM, read-only, or erasable) is finite, on the order of years. This is due to the ingassing, or diffusion into the medium, of atmospheric gases or contaminants that attack the reflective or transmissive layer used in the optical medium. Magnetic systems and media that are currently used include 3.5-in., 5.25-in., 8-in., and 14-in. Winchester-type disk drives with capacity ranging from 10 megabytes (Mb) to over 1.2 gigabytes (Gb); 3.5-in., 5.25-in., and 8-in. floppy disk drives with capacity ranging from 256 kilobytes (Kb) to 10 Mb; 0.5-in. tape in various configurations (reels and cartridges) with capacity ranging from 10 Mb to over 300 Mb; 0.25-in. tape in cartridge form with capacity from 60 to 300 Mb; and the new 8-mm and 4-mm helical-scan tape systems, which store 2.3 and 1.2 Gb, respectively. Optical systems and media that are currently used include 5.25-in., 8-in., 12-in., and 14-in. WORM discs with capacity ranging from 200 Mb to about 3.2 Gb per side; erasable discs, mainly 5.25 in., with capacity ranging from 200 to 600 Mb; and mass-produced (i.e., stamped) read-only discs such as 5 1/4-in. compact-disc-read-only-memory (CD-ROM) discs and analog video discs in 8-in. and 12-in. sizes with capacity ranging from 600 Mb to over 1 Gb. A new and exciting optical storage technology is based on so-called digital paper (see Ref. 3) from ICI, which is a WORM medium being used in tape form and floppy form. The tape form of digital paper can store up to one terabyte (Tb) on a 12-in.
tape reel and the floppy form up to 1 Gb on a 5.25-in. floppy disk. In addition, optical discs are being configured into so-called "jukebox" systems that offer disc selection through mechanical means similar to musical jukeboxes. With the data stored on some medium (either magnetic or optical), remote access or transmission is typically required. This is usually accomplished through the use of local-area networks (LANs), which offer up to 10 Mbit/sec access rates, wide-area networks (WANs), which offer up to 1.5 Mbit/sec access rates, and dial-up modem connections, which offer up to 9.6 Kbit/sec access rates. LANs are typically configured on Ethernet with various protocols such as DECnet and Internet. One of the most popular and widely used collections of networks is based on the Internet protocol and consists of a variety of loosely coupled networks such as the ARPANET, MILNET, and NSFNET. This collection of networks is actually a WAN because the networks span a great physical distance and are limited by lower-speed data rates (typically 56 Kbit/sec or 1.5 Mbit/sec) in their interconnections. Bandwidth or access speed to users on these networks is highly dependent on the number of users and/or traffic on the network and is rarely the full bandwidth of the network. Therefore, as the use of these networks increases, the effective access speed or transmission rate to any individual user has to decrease. On the other hand, modem connectivity guarantees that the user will always have a fixed rate of access or transmission speed. Because the analog telephone network is the most widespread and widely used network in the country, modem connectivity offers a universal mechanism for remote computer access and data transmission. The upgrade of the current analog telephone network to the digital version called Integrated Services Digital Network (ISDN) will occur over the next ten years and allow such access or transmission at speeds from 64 Kbit/sec up to 1.5 Mbit/sec.
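To make the access rates quoted above concrete, the following minimal sketch estimates how long one payload takes to move at each nominal rate. The 500,000-byte payload is an assumption, chosen to match the rough size this report estimates for a single scanned page image; the function and variable names are illustrative only, and the calculation ignores protocol overhead and network contention, so real transfers would be slower.

```python
# Nominal peak rates quoted in the text, in bits per second.
RATES_BITS_PER_SEC = {
    "local-area network": 10_000_000,  # 10 Mbit/sec
    "wide-area network": 1_500_000,    # 1.5 Mbit/sec
    "ISDN (lowest rate)": 64_000,      # 64 Kbit/sec
    "dial-up modem": 9_600,            # 9.6 Kbit/sec
}

# Assumed payload: one scanned black-and-white page, about 500 kilobytes.
PAGE_IMAGE_BYTES = 500_000


def transfer_seconds(size_bytes, rate_bits_per_sec):
    """Ideal transfer time: payload size in bits divided by the line rate."""
    return size_bytes * 8 / rate_bits_per_sec


for name, rate in RATES_BITS_PER_SEC.items():
    seconds = transfer_seconds(PAGE_IMAGE_BYTES, rate)
    print(f"{name:20s} {seconds:8.1f} s")
```

Under these idealized assumptions a page image that crosses a local-area network in under a second needs about a minute at 64 Kbit/sec and roughly seven minutes over a 9.6 Kbit/sec modem.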
Most of the commercial databases that are currently available (such as DIALOG, STN, EASYNET, BRS) are usually accessed via direct modem connection (through dial-up lines) or through equivalent modem ports on one of the commercial private connecting networks (such as TELENET or TYMNET). Modem access (even at ISDN rates of 64 Kbit/sec) greatly limits the amount and type of information that can be accessed or interactively manipulated. Thus, for universal or generic access to databases, on-line journals, references, or data, consideration must be given to the access method available to most of the users of the information. A fast-growing method of transmitting and receiving printed material is facsimile or "fax." This method scans a printed sheet of paper and produces a black-and-white digital bilevel image. The image produced, although good for textual material (though inefficient for storing text alone), cannot represent multilevel gray or color images. Resolutions of fax images vary from 200 to 400 dpi using well-established international standards for the analog or digital transmission (with compression) of the data. In the area of electronic publishing, there are currently several projects that are using CD-ROM to deliver information. There is the IEEE trial project discussed in Sec. II and Appendix E, which is storing fax images of all the pages of the IEEE journals on CD-ROM with limited indexing capability. Special workstations or terminals are needed to retrieve and display the journal pages for interactive use. Another project using CD-ROM is the "Computer Library," which is a collection of full pages (text-only) of the previous 12 months of about 20 computer magazines plus abstracts of articles in about 100 other magazines. The information in this collection is fully indexed (i.e., the CD-ROM contains a fully inverted index to all the information on the disc).
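The idea of a fully inverted index, and of the boolean searches it supports, can be sketched in a few lines. This is a toy illustration, not a description of how the Computer Library disc or any commercial system is built: the three document titles are invented, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map every word to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search_all(index, *terms):
    """Boolean AND: ids of documents containing every term."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def search_any(index, *terms):
    """Boolean OR: ids of documents containing at least one term."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))


# Invented sample records, for illustration only.
docs = {
    1: "Optical storage on CD-ROM",
    2: "Magnetic tape storage systems",
    3: "Network access to CD-ROM databases",
}
index = build_inverted_index(docs)

print(search_all(index, "storage", "rom"))   # {1}
print(search_any(index, "tape", "network"))  # {2, 3}
```

Because the index is built once and stored with the data, each query reduces to a few set operations rather than a scan of the full text, which is why a read-only disc can carry its own index.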
As we have seen, the American Mathematical Society has been producing the Math/Sci Disc, which is a collection of their database records for articles in two of their publications. This Math/Sci Disc contains the complete journal articles, but no figures or images. Appendix D contains a representative list of some current CD-ROM-based databases and information sources.
B. Software
As indicated in Sec. II, the use of word-processing equipment and software has increased significantly in the physics community, and most documents submitted for publication are produced electronically with a word processor, personal computer, workstation, or mainframe. Until recently, however, there was little if any need for standardization among the various word-processing systems. The only output from such a word-processing system was the printed page, with all the printed pages plus figures submitted by mail. The journals or conferences would enter all the text manually or through optical-character-recognition (OCR) equipment. Today there is a full range of word-processing equipment and software ranging from simple text manipulation to typesetting programs capable of equations, tables, and graphs to full-page-layout systems that manipulate the entire page, including figures and images. The next generation of products will support multimedia documents that will have audio, video, animation, graphics, figures, and text all cohesively linked and self-contained. Recently, however, the demand for electronic submission of documents and the widespread use of computer networking have resulted in the need for standardization of the layout or formatting of the submitted documents. The history of the APS's attempt to set standards within the physics community, starting with troff, moving to plain TEX, and then to LATEX and REVTEX, has been recounted in subsection II A.
Both TEX and troff offer the ability to control the textual placement and style of documents and, with additional procedures called macros, can format or lay out or organize the document logically (paragraphs, headings, pages, references, chapters, titles, etc.). The APS has prepared its own standard library of macros that maintains the current appearance and style of the APS journals and has named it REVTEX. To encourage electronic submission of APS manuscripts in REVTEX, the APS is distributing to potential authors the REVTEX library of macros. Other publishers have constructed their own sets of TEX macros that are specific to their journals or needs. One example is the American Mathematical Society (AMS) collection of macros called AMSTEX, which is oriented toward equations, tables, and text (no figures or images) because most AMS papers can be presented in that form. Even though TEX has become a de facto standard for text preparation and style, there are already specific implementations of TEX (such as REVTEX and AMSTEX) that are journal or application specific. This leads to an interchange-of-documents problem. In addition, there is a difficulty because neither TEX nor troff includes standards for figures or images, and in fact, figures are usually still added to the text by editors in the preparation of the final document. A number of efforts are underway to develop a higher level of standardization for document layout and text preparation to facilitate document interchange and logical consistency. The most active is the Standard Generalized Markup Language (SGML), which is an ISO standard and has been accepted/required by the Department of Defense (DoD) and other Federal agencies. SGML is widely used in Europe, especially at CERN. In fact CERN has proposed SGML as a standard for all journal submissions in Europe.
An alternative ISO standard is the Office Document Architecture (ODA) with a related standard format called Office Document Interchange Format (ODIF). Although SGML is not widely used here in the United States, its use has been mandated by the DoD and other Federal agencies for all their current and future documentation, to bring consistency and standardization to their large volume of documentation. The problem of how to include figures and images (color, gray scale, black and white) within text-preparation systems is significant. One popular method (due to its use in facsimile) is the so-called bit-mapped scan of the image or figure. These bit-mapped scans of the images and figures are limited to black-and-white renditions of the original images and figures and are included in documents exactly as scanned. Extensions to this method allow gray scale and/or color at the expense of greater storage requirements. As an example of storage requirements for a bit-map-scanned document consider the following: An 8 1/2 x 11 page (inches) has a usable area of about 6 x 9 inches, or roughly 50-60 square inches. Most current scanners have a resolution of 200-400 dots per inch (dpi). If the document is scanned at 300 dpi (to be consistent with current laser-printer resolution), the usable area would require about 5 Mbit of storage, or about 500 Kb. The ASCII code storage for the text alone on a typical page is about 5 Kb. A single-page figure or image would thus occupy 100 times the storage of a single text page. This is only for a black-and-white image. For gray scale, multiple bits are needed, which increases the storage by a factor of 8. For color, the storage is increased by a factor of 24. Even with compression, the storage occupied by the figures is a substantial part of the total document storage. (Appendix F contains a summary of useful numbers related to storage needs.)
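The storage estimate above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not part of the original report: the function name `scanned_page_bits` and the constant `TEXT_PAGE_BYTES` are hypothetical, and the inputs assumed are the report's own (a usable area of 6 x 9 inches, 300 dpi, about 5 kilobytes of ASCII text per page, and 1, 8, or 24 bits per dot for black-and-white, gray scale, and color). With these exact inputs the black-and-white page comes to 4.86 Mbit, which the text rounds to about 5 Mbit.

```python
def scanned_page_bits(width_in=6.0, height_in=9.0, dpi=300, bits_per_pixel=1):
    """Uncompressed size, in bits, of a bit-mapped scan of the usable page area."""
    pixels = (width_in * dpi) * (height_in * dpi)
    return round(pixels * bits_per_pixel)


# Assumption taken from the text: about 5 kilobytes of ASCII per typical page.
TEXT_PAGE_BYTES = 5_000

if __name__ == "__main__":
    for label, depth in [("black-and-white", 1), ("gray scale", 8), ("color", 24)]:
        bits = scanned_page_bits(bits_per_pixel=depth)
        size_bytes = bits / 8
        ratio = size_bytes / TEXT_PAGE_BYTES
        print(f"{label:>15}: {bits / 1e6:8.2f} Mbit, "
              f"{size_bytes / 1e3:9.1f} kilobytes, "
              f"about {ratio:.0f} times a text page")
```

The ratio for the black-and-white case comes out somewhat above 100 because the sketch does not round the page size down to 500 kilobytes; the order of magnitude agrees with the figure quoted in the text.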
Another popular method for generating or including figures or images is through the use of the page-description language (PDL) PostScript. This language allows for the inclusion of bit-map images or the generation of figures and graphics through its language description. Though the PostScript language is mainly intended to drive printers or display devices (terminals and workstations) through a concise language description of the document with its text, graphics, figures, and images (which the printer or display-device hardware translates into a form the local hardware can use), it is also being used for document storage and retrieval because of its popularity and proliferation.

C. Networking

In Sec. II we indicated that networking and especially e-mail are increasingly important to the physics community. The use of networks today falls into three major areas: BITNET, DECnet, and the Internet. In the future two new initiatives will have a significant impact on computing and networking for physicists. These are the conversion to the International Standards Organization's Open Systems Interconnection standards (ISO/OSI) and the development of a National Research and Education Network (NREN) as part of the Federal High Performance Computing Program. In this subsection we shall describe briefly the three major networks used by the physics community, and shall then discuss the ISO/OSI standards and NREN.

1. BITNET

BITNET began as a spontaneous effort to connect research communities. The acronym stands for the "Because-It's-Time" NETwork. Today BITNET serves more than 1500 computers in the United States and more than 2500 world-wide through connections to other networks such as the European Academic Research Network (EARN) and Nordnet in Canada. BITNET provides access to countries from the Far East to the Middle East. Today, BITNET might better be defined as the "Because-It's-There" NETwork.
Despite its popularity, however, the routing of BITNET connections and the choice of protocols do not lend themselves to developing a robust, high-performance network. As the network grew, additional universities attached themselves by leasing a line to a nearby node, creating a "daisy chain" of universities that does not reflect the traffic of the network. In addition, the network uses a "store-and-forward" mechanism so that all messages are moved sequentially through all nodes in the network along the path between the two end points for the message. This means that a single point of failure can hold up a message for days, sometimes weeks, at a time. Nevertheless, BITNET continues to be very popular for transmission of e-mail. The network provides only limited capability in other areas. It does not, in general, support file transfer, except in response to e-mail messages to an "information server." This approach has been used by the SPIRES group at SLAC to provide information from the particle-physics database at SLAC.

2. DECnet

DECnet is a proprietary protocol developed by Digital Equipment Corporation (DEC) for use on its PDP11 and VAX computer systems. More recently, it has been implemented on a number of other computer systems as well, but its widespread popularity is a result of the extensive use of VAX/VMS computers in physics departments and other science departments at universities and laboratories throughout the world. The DECnet system provides a full range of networking functions including e-mail, file transfer, file sharing, and distributed computing. The major use of DECnet is, however, for e-mail and file transfer. The BITNET mail has been implemented using DECnet as a carrier for providing BITNET on VAX/VMS systems. DECnet for physics in the United States evolved in a similar way to BITNET. In the high-energy-physics community the links were established to further specific programs at SLAC and LBL, and later at Fermilab, Argonne, and Brookhaven.
University groups associated with programs at those laboratories leased a line to that lab. A backbone network architecture evolved through the volunteer efforts of the network managers at the various labs. Similar evolution occurred in the Space Sciences program, giving rise to the Space Physics Analysis Network (SPAN). More recently, other groups of scientists have utilized the backbone and have added lines to their university. The physics DECnet network now reaches more than 20,000 computers worldwide and is the largest DECnet network outside Digital Equipment Corporation. The continuing support of DECnet has been given a high priority by the DOE and is part of the Energy Sciences Network (ESNET). Following the backbone topology, DOE has installed high-speed lines (T1, or 1.5 Mbit/sec) throughout the United States. These lines carry DECnet and, in addition, Internet traffic (TCP/IP) discussed below. This network will continue to satisfy the needs of the physics community, but DEC is committed to moving the DECnet protocols into compliance with the ISO/OSI standards. Within the next 12-18 months the DECnet network can be expected to be a part of the networking mainstream.

3. Internet

In the 1960s the Defense Advanced Research Projects Agency (DARPA) began a program to develop and demonstrate computer networking based on packet switching. This effort resulted in the ARPANET, the first packet-switched network. Later, DARPA supported the development of a set of procedures and rules for addressing and routing messages across independent networks. The DoD has adopted these "Internet Protocols" as standards for all its packet-switched data communications. The National Science Foundation (NSF) has used the same internet technology to build a national network, NSFNET. This network consists of a high-speed backbone (1.5 Mbit/sec) to connect many regional or mid-level networks as well as the NSF supercomputer centers.
The regional networks are managed locally and serve large parts of states (e.g., Bay Area Regional Research Network, BARRNET, in Northern California), states (e.g., New York State Educational Research Network, NYSERNET), or multistate regions (e.g., Southern Universities Research Association Network, SURANET). The DOE and National Aeronautics and Space Administration (NASA) have developed and continue to evolve networks that support their programmatic needs and that adhere to the Internet standards. These are the Energy Science Network (ESNET) and the NASA Science Internet (NSI). The interconnected set of networks that uses the Internet Protocols is referred to as the Internet. The interagency networking efforts will be further enhanced with the advent of the National Research and Education Network (NREN) discussed below.

4. ISO/OSI

The International Standards Organization (ISO) has, for a number of years, been working on a new level of protocols for Open Systems Interconnection (OSI). These protocols are based on a seven-tiered model of networking from the presentation layer (user interaction with network resources) to the physical layer (describing the pattern of bits on the interconnection medium). The definition of the standard is essentially complete, and vendors are developing products that meet it. The move to the OSI protocol standard will be taking place over the next few years and will provide enhanced connectivity and interoperability between a variety of computer systems from various vendors. In particular, the distinction between DECnet and the Internet Protocols (TCP/IP) will disappear. The use of ISO/OSI protocols will become more important for the physics community as a result of the introduction of new products and in response to pressure from the United States Government, which has adopted a policy of supporting the migration to ISO/OSI through its GOSIP (Government Open Systems Interconnection Profile) directive.
This migration will take several years to complete.

5. The National Research and Education Network

In response to a series of studies by the Federal Coordinating Council on Science, Engineering and Technology (FCCSET) in 1986-87, the Office of Science and Technology Policy (OSTP) developed a Strategy for Research and Development in High Performance Computing. This program proposed four areas of effort: high-performance computer systems, advanced software and algorithms, networking, and basic research and education. This strategy formed the basis for a bill submitted by Senator Gore and, more recently, has resulted in a program plan for the Federal High Performance Computing Program developed by the OSTP. The National Research and Education Network is seen as a three-phase program. The three phases will proceed in parallel.
The participating agencies are the NSF, DARPA, DOE, NASA, and the Department of Health and Human Services. The NSF is the lead agency for implementing and operating Stage 1 and Stage 2 networks. DARPA is the lead agency for the advanced research required in Stage 3. The planned network will provide access to computing and information resources throughout the country. It will have a major impact on the conduct of scientific research and will significantly influence education in the United States. It is this network, coupled with modern workstations, that will permit us to realize the vision for physics information described in the next section.

IV. VISION FOR THE YEAR 2020

A. Introduction

Once physicists communicated their results to one another by letter, and the small number of physicists could keep in touch with the whole world of physics in that way-but that time is long gone. More recently, physicists could go to occasional, small meetings or conferences, meet directly with many of the principal workers in a broad field, and have discussions of the most informal and speculative kind. Or, they could scan the tables of contents of most of the physics journals, be aware of essentially all that was published that was of interest to them, and read most of what was especially interesting. That day is now gone, too. Today, meetings devoted to a narrow field may be as frequent as one a week (in high-temperature superconductivity, the worst example) and attract as many as several thousand participants. In addition, the size of a year's physics literature is doubling nearly each decade; almost no one has time to read it; scanning the tables of contents is an arcane art practiced successfully by only a few; and the canonical question occurring in almost every discussion among physicists is "who has time to read?" It is simply no longer possible either to be aware of new developments or to find out about much old knowledge using conventional methods.
The explosion in the printed literature has further ramifications. Although its volume is doubling nearly every ten years, the cost to libraries is increasing much more rapidly. This is because, with the proliferation of specialized journals by commercial publishers, the average cost per printed character is skyrocketing (see Refs. 4 and 5). Little wonder that the community of librarians is becoming restless, if not downright rebellious, from the pressure this is putting not only on their shelves but also on their budgets. Thus a revolution in how new scientific developments are communicated, both informally and formally, and how old scientific knowledge is stored, searched, and recalled is inevitable (see Ref. 6). Furthermore, the coming of storage media with unimaginable capacity, communication networks of equally unimaginable transmission rates, and personal computers and workstations with the power of yesterday's mainframes, will facilitate this revolution. The scientific community will switch largely, if not totally, to electronic media and networks in the next few decades. In this section we give a vision of where we might be in the year 2020. The description is necessarily speculative and vague, but we don't think it is unduly optimistic; the seeds for most of what we describe are already planted. If anything, this asymptotic state may be reached by the year 2000 or 2010. By 2020 we may have moved on in unpredictable ways. Electronic information technology will affect the physics community in many ways, but the most revolutionary and dramatic will be on the physics "literature" and the way physicists send and receive information. We focus first on this area and at the end of this section indicate a few other directions in which we may be taken. In this section we describe a worldwide electronic information system devoted to physics (possibly to all science). This description covers several technical aspects of this system:
The actual path by which the physics community reaches this asymptotic state will depend on the rate of technological development, the way the community of physicists and the institutions serving that community respond to this development, and the lessons learned from each step along the way. We defer to Sec. V a discussion of nontechnical issues like the financial and administrative aspects. These, more than the technical aspects, will confront the APS itself with a host of challenges and opportunities whose resolution could have a profound effect on the asymptotic state.

B. Environment - Hardware/Software

Although there will be enormous variation among the facilities available to physicists in different kinds of institutions (public, nonprofit, and commercial) and in different parts of the world, we believe the overwhelming majority of physicists will have terminals or workstations connected to local or remote servers. These workstations will have color-graphic resolution comparable with today's best on-paper printing and computing power that exceeds today's largest mainframes. The workstations will be linked either to local-area servers or directly to remote servers (possibly via mainframes). These workstations and the local and remote servers will all provide computing and information resources, and all these resources, accessible in seconds wherever they reside, will appear the same to the user. The remote servers will be linked to form a single worldwide network having a hierarchical structure in analogy with the highway system with its superhighways, state and regional highways, local roads, and driveways. In fact a three-tiered structure for these networks is already beginning to emerge, as we have seen in subsection III C.
The network of 2020 will employ fiber-optic cables in which a single fiber will be able to transmit at 50 Tbit/sec, allowing a central or regional server with a high degree of parallelism to transmit simultaneously over hundreds of channels at 100 Gbit/sec for each. Storage of information will be possible at less than 10^-4 times the cost of storing on paper and at such large densities that all the English-language physics literature ever produced could be stored on a few 12-in. reels of digital paper (at 1 Tb per 12-in. reel). Software standards will be in place for dealing with a wide variety of tasks-text, equations, graphics, bit-mapped illustrations, hypertext scripting, multimedia output, computational algorithms, etc.-in a wide variety of documents, so that individuals working in the inevitable multitude of computer languages will be able to communicate easily with one another, whatever the nature of the document.

C. Scientific Information in 2020

In this subsection we shall describe the way scientific information in 2020 is likely to appear and to be produced, disseminated, and used.

1. The Forms of the Literature

Today we think of most physics information as what can be displayed on the printed page, a black-and-white page at that. The information comes in papers that are grouped into journals or proceedings, in books, in collections of numerical data that are usually represented by tables or graphs, and in some cases in programs that can be installed in one's computer to do certain tasks. In the year 2020, all these concepts (if not the words) will have much more general meaning:
2. Producing the Literature

For the preparation and submission of documents, the author will use a WYSIWYG (What You See Is What You Get) editor to create text files and to integrate graphics and tables. Other material (data, equations, and their solving algorithms, etc.), material on other media, and the "script" for reading a hypertextual document will also be prepared by the author. For all these parts of a document, the author will have available a variety of languages, all of which conform to certain internationally accepted standards so that editors, referees, and readers can all "read" these documents, perhaps with readily available translators to languages resident on their own workstations or servers. With the additional power that such documents offer over traditional papers, and with the freedom to submit in a variety of computer languages, the circulation of drafts among collaborators and the submission of documents to journals will almost all be done electronically. However, experience suggests that the versatility described will, in fact, be necessary before the number of nonelectronic submissions becomes as small as the number of electronic submissions is today. One can even imagine the day when machine translation of natural languages would be so advanced that papers could be submitted (and even read) in a natural language of the author's (reader's) choice. The software developments that will make this possible will occur independently of any efforts of the APS, however, and so are not our concern. Journal staffs, proceedings editors, book publishers, etc., will continue to exist and perform work similar to what they do today. Their work will be considerably expedited, however, by electronic submissions (if authors can use familiar languages that conform rigorously to international standards).
Communications with referees, the subsequent back-and-forth with the authors, and the proofreading by authors of changes made by editors will all be electronic, hence accelerated and simplified. The publication process will be completed by the journal or book publisher who will produce blocks of code for each document (and additional code for printing on-paper journals, when appropriate) together with keywords and/or other indexing features where appropriate. These blocks of code could conceivably be supplied directly to institutional libraries and even to individual users, possibly document-by-document and on-line. More likely, the code will be supplied to a world Physics Information System, which we describe below. Although journal staffs or their equivalent will still be required, the importance (and distinctive character) of journals per se, as opposed to individual documents, will depend on a number of policy questions: how universal the searching is, whether users (local libraries, individuals) will still have to subscribe to entire journals to access full documents in those journals, how users are charged, how producers (publishers) are paid, etc. The answers given to these policy questions will have a profound effect on the sociology and economics of journal production and on the use of scientific information generally. We discuss possible answers in Sec. V.

3. Disseminating the Literature - The Physics Information System

Today most of the physics literature is disseminated to the ultimate consumers, the physicists, on paper via the intermediary of institutional libraries (although, as we have seen, some of the chemistry and mathematics literature is available on-line and some of the mathematics and electrical engineering literature is being disseminated via portable media such as CD-ROMs).
Although the dissemination on paper via institutional libraries could continue, we think the dominant mode will be via a single electronic physics library, or Physics Database, which will be the heart of a worldwide Physics Information System. This mammoth electronic database will contain all published books, papers, conference abstracts and proceedings, numerical data, computer programs, etc. This contrasts with current electronic databases (e.g., those provided by STN, DIALOG, BRS), in which the contents of one database may not be in another (e.g., most of the text of articles published by the ACS is in an STN database, but only bibliographical information is in any BRS or DIALOG database) and in which, at least in physics, chemistry, and mathematics, the full document is never available. The Physics Database, continually updated, will be available to users on-line from a central server or regional servers. Equally available will be software for searching, outputting, usage monitoring, and charging. Although we speak of a single Physics Database, we do not exclude the possibility that there may be more than one, just as there are now at least two on-line databases of physics abstracts and even more systems to access them. The entire Physics Database will be searchable at one time. This contrasts with the present nonelectronic libraries, which are highly fragmented and are searchable only within small domains of the literature (individual books, individual journals, only bibliographical information, only citations, etc.) and/or time (a few years, just one year, just a portion of a year). The proprietor of this Physics Information System could have two responsibilities: the physical operation (hardware, software, networking) and the administration (making and implementing the policy decisions discussed in Sec. V). These could both be borne by the same organization, or the physical operation could be contracted out.
The proprietor could be a government agency (e.g., the NSF, a new National Library of Science paralleling the National Library of Medicine, the Office of Scientific and Technical Information of the DOE), an existing consortium of not-for-profit institutions (e.g., STN International, which, we recall, is a joint venture of the Chemical Abstracts Service of the ACS, FIZ Karlsruhe, and the Japan Information Center of Science and Technology), a new consortium of scholarly societies (e.g., APS, AIP, European Physical Society, Physical Society of Japan), a commercial database proprietor (e.g., BRS, DIALOG), a consortium of commercial physics publishers-or some combination of these. It is clear that the policies and practices of the proprietors could have a profound impact on the physics community and on physics itself. The Physics Database could ultimately be connected with databases in other sciences to form a single Science Database to which users of the Physics Information System, of a possible Chemistry Information System, etc., would all have access. The proprietors of the Physics Information System, in preparation for such an eventuality, should take steps along the way to ensure compatibility of the different systems and could take the lead in forming the Science Information System. The Physics Database will likely be available via portable media too. The database together with periodic updates (to the data, which would be easy, and to the index, which might be harder) could be supplied to users on CD-ROMs or some kind of tapes, under lease or sale, presumably (but not necessarily) by the same proprietor that operates the Physics Information System. We expect that CD-ROMs and tapes will be used in different ways.
Individual publishers could also continue to deliver their code to traditional printing companies whose machines would, in all probability, be operated directly from this code.

4. Using the Literature

Electronic information systems, despite their great power and convenience, may be used much less than one would expect unless certain threshold barriers are made very low. Of course, such barriers, even if not lowered, will appear lower to future users than they do now, but the speed of adaptation of the physics community to the electronic information age will depend sensitively on these barrier heights, something the APS must keep constantly in mind. By 2020, one can expect the use of the Physics Database to have the following features:
D. Literature of the Pre-electronic Age

Publishers will also be encouraged to supply already-published documents to the Physics Database, possibly with only the titles, authors, abstracts, and citations in machine-readable form, the body of the text being available as bit-mapped pages. Keywords and other indexing features would also be supplied, possibly with the aid of computers reading the abstracts. The day may come when full-text search algorithms have so improved, and the cost of transforming to machine-readable form has so decreased, that the APS, other publishers, and even the database proprietors may wish to convert all bit-mapped text to machine-readable form; but that day is not yet foreseeable. We can only urge that experiments like that of IEEE/IEE/UMI, which will be facing this problem weekly, and technical developments in full-text searching and character readers all be watched closely.

E. Novel Forms of "Informal Literature"

Two forms of more informal literature, on which the electronic age will have special impact, deserve separate consideration: preprints and comments/discussions.

1. Preprints

Preprints are already a form of the literature (if not the formal literature); in some fields like high-energy physics and high-temperature superconductivity, they are perhaps the dominant form. But their accessibility is haphazard, depending on the mailing lists of authors, except in a few cases where an attempt is made to centralize their collection and the dissemination of their titles, authors, and abstracts (e.g., the SPIRES facility at SLAC in high-energy physics and the SIS facility of the DOE at Oak Ridge in high-Tc superconductivity). When most of the documents are submitted in machine-readable form, it is conceivable that the journals could support a database of these preprints (or include them in the single Physics Database).
Such a database, being just as accessible and searchable as the formal literature, would be far more accessible and searchable than the preprints of today-even in fields that SPIRES and SIS now serve. The electronic availability of preprints could even formalize and give significance to an optional "presubmission phase" in the publication process that would have the following characteristics, some of which are proposed in a provocative column by Rogers and Hurt (Ref. 10).
There are many serious questions of policy that would have to be resolved before such a database could be implemented, e.g., the impact on the standard refereed literature, on the quality of preprints if authors ever became widely contented with such nonrefereed "publication," however transitory, the acknowledgments to be given commenters, etc. It is important now simply to note the possibilities.

2. Comments/Discussions

The current method of publishing scientific literature allows for very little comment, discussion, subsequent updating, etc. The electronic age would certainly make this possible, even easy. This might start as informal (i.e., unrefereed) comments on documents (or on other comments). Just as with documents, comments could refer to more than one document or comment, and each comment could also contain computer-supplied pointers to later comments and documents in the database that cite the particular comment. Comments could also refer to (and be indexed by) keywords, subjects, etc., rather than just the specific documents to which they refer. In this way, an entire web of referencing, both forward and backward in time, could develop that could involve both the published literature and informal contributions. The user could still restrict his searching to the more formal literature, but could now have access to a back-and-forth discussion of a given document or subject as well. The desirability and details of including such comments in the database involve serious policy questions, e.g., the willingness of scientists to expose to broad public view what they may now confine to private discussions with trusted friends, the priority that would be claimable for contributions of this nature, the impact this would have on the willingness of scientists to participate, etc. As with preprints, it is sufficient now to note the possibilities.

F. Functions Other Than Producing the Literature

There are many other areas in which new electronic information technologies may play a role. Areas that come immediately to mind include the following:
The Task Force has not concerned itself with developments in these areas because it was felt that the future there is unpredictable-the applications of new technologies in these areas will grow naturally with time, depending on the wit and imagination of many practitioners, but with no clear role for the APS.

V. ADMINISTRATIVE AND FINANCIAL CONSIDERATIONS: A CHALLENGE TO THE APS

Until now we have concerned ourselves with a host of technical aspects-hardware, software, networks, systems-as if they were the only, or dominant, aspects of the physicists' informational environment in 2020. Although developments in all these areas will be necessary-and inevitable-the policies on organization, pricing, payment of publishers, availability, and use will all have profound effects on this informational environment. And in these areas, a particular line of development is not inevitable; the APS can have a very important influence. The possibilities in each of these areas are so varied that it may be premature to try to analyze their ultimate impact now. Nevertheless, some of these possibilities are also so important that they deserve mention at this early date, if only to alert the APS to the importance that its own leadership can have in determining how physicists interact with each other and with the scientific literature in 2020. We do this by discussing a few important aspects and the possible impact of each.

A. The Physics Information System as Consumer and Producer

Today, in the physics-information marketplace, the individual physicist may be the ultimate consumer of the published literature, but the institutional libraries are the dominant consumers economically: most of the payments to publishers come from the libraries. As consumers, the libraries have much greater power than have individuals, but this power is still diffuse and hence weak.
The producers in the marketplace are the publishers, and although they are a varied lot, some have (and exert) considerable economic power. This is because of the very inelastic (i.e., price-insensitive) demand for a journal, once it contains some minimum number of important articles. This explains how there can be one group of journals providing, on average, 1/20 as many words per dollar as another significant number of journals, a ratio that drops to 1/100 when the impact per word per dollar is considered (see Refs. 4 and 5).

In 2020, the Physics Information System will function as the ultimate middleman, both as a powerful consumer (being a massive library) and as a powerful producer (supplying the whole world of physicists and/or their institutions). If there is only one such system, its power in both roles will be enormous. Therefore, it will have to exercise this power with great care and consideration for all parties and for the good of physics. But will it? That may depend on who operates the Physics Information System.

Some cautionary implications: The concentration of so much power in one group has certain drawbacks. These include bad decisions with no recourse, unhealthy attempts to influence these decisions, Justice Department actions to oppose such a concentration, etc. A detailed study of the medical field, and the National Library of Medicine in particular, where this kind of effort is further along, is in order.

B. A Single Physics Information System - Or Several?

The alert reader may wonder why there won't be several physics information systems, with competition among them diluting the influence of any system. There are, after all, at least three hosts for physics bibliographic information (BRS, DIALOG, STN). The answer revolves around a basic difference between bare bibliographic information and full text, or even abstracts. The former cannot be copyrighted; the latter can. This difference is already reflected in online databases.
The STN database PHYS has the original abstracts of all papers from the journals of the AIP and its member societies, whereas the INSPEC database on BRS and DIALOG, unable to reach agreement with the AIP on terms for supplying these abstracts, has substituted abstracts of its own, written by others on commission by the IEE, INSPEC's producer. The situation with full text is more extreme, because there can be no substitute papers. Thus, only STN carries the full text of ACS articles, and, not surprisingly, STN is, in part, an ACS enterprise.

C. A Partial, If Not Total, Physics Database?

One can easily imagine a physics information system in which the documents from some publishers are fully available and documents from others are not. An analogous situation exists now with STN's Chemical Journals Online, which carries full text (minus figures, etc.; see Appendix C) of the chemistry journals of the ACS, Royal Society of Chemistry, and John Wiley, but no other chemistry journals. Another analogy is AIP's SPIN, which carries advanced bibliographic information on articles in only AIP and AIP-related journals. These partial databases have some value, but hardly in proportion to their size. When users search they want to search broadly and all at once. If they are not just searching but reading and/or downloading, limitations on the range of readable papers may be less of a drawback ("some is better than none"). But in 2020, searching and reading will be more strongly interacting activities, in which case documents that are missing from a database will be at a disadvantage. If the database has a significant part of the literature, so that it becomes widely and frequently used, the pressure on publishers of the rest of the literature to have their journals included will be strong.

D. A National Library of Science - Not Just of Physics?

Today, there is a National Library of Medicine.
By 2020, a National Library of Science, rather than databases limited to just physics or chemistry, is probably inevitable. The establishment of such a facility would, of course, change many of the administrative and financial issues discussed here. Such a national library could well emerge from the experiences that the ACS is gaining with its involvement in STN and the APS and AIP would gain with a Physics Information System. The actual formation of such a library will require real leadership, leadership that the APS and AIP, working closely with the ACS, could provide. The ease with which this library is formed will also depend on the success of the enterprises in the separate fields of physics and chemistry and on the degree to which these separate enterprises develop in technically compatible ways.

E. The Document, Not the Journal, Becomes Primary

Today, from an economic point of view, and even informationally, the journal, not the paper, is the primary entity. Subscriptions are to journals, not papers; to read papers, one must have an entire issue of a journal; to make themselves aware of papers, physicists will browse certain journals, but not others. Furthermore, to the extent that journals have a geographic base and physicists interact individually with others in their own geographic area, journals tend to enhance the geographic compartmentalization of physics; physicists are more likely to submit papers to "their" journals and to know of and cite papers in these journals. Thus the existence of a journal per se, as opposed to the set of papers it contains, has a strong influence on what physicists are likely to read.

In 2020, when individual searches and awareness services can search worldwide, when they are ubiquitous and easy to use, and when all documents are easy to browse, the document itself rather than the journal will have the stronger influence on what the physicist is aware of.
Furthermore, if all documents are equally available, and at comparable cost, the document will have the stronger influence on what the physicist reads or interacts with. This realignment of roles is certain to affect the prices that end users or their libraries pay for the literature and the way collections of documents we now call journals are produced.

Some cautionary implications: If documents become fundamental, journals could tend to lose their identities. This could reduce the standards to which a journal aspires and increase the pressure for economy in production. The results could include a drop in staff morale, possibly a loss of volunteer editors or lower quality of editorial work, possibly a loss of referees and a degradation of the peer-review process, and a lower quality of the final product, the published documents. If this happens widely, that would be most undesirable.

F. Monitoring Usage and Charging Users

Today, it is clear that a journal sold to a library is used much more than one sold to an individual. Hence, one has a complicated multitiered structure of prices to reflect this difference. But the system is very crude; libraries serving 200 physicists and libraries serving five must pay the same.

In 2020, the usage of the Physics Information System could be monitored in exquisite detail: the number, kind, and extent of uses (searches, inspections, studies, downloadings, other interactions, etc.) could all be recorded. Usage by accessing a local server (itself fed with magnetic tapes or CD-ROMs) would be more difficult to monitor, but not impossible. Charging that takes this detailed usage information into account, perhaps not for each individual but statistically for each user's institution, would not be crude, and it could be far fairer.

Another implication: Usage monitoring could replace some of the current methods for evaluating physicists (e.g., number of citations or of letters in Physical Review Letters).

G. Ability to Pay

Today, no real attempt is made to charge institutions (or individuals) according to their ability to pay. Not only is this ability difficult to estimate, but also widely different charges could lead to efforts to circumvent the differences. (Certainly the big price differential between individuals and libraries has led to many public displays of "private" journal collections.)

In 2020, it might not be inappropriate to make the price structure vary from one region of the world to another; in supplying individuals, other parameters besides geography could also enter in determining this structure. The Physics Information System could easily handle this.

H. Paying Publishers

Today, the economics of journal production are extremely varied. There are very general journals and highly specialized journals; there are nonprofit publishers, commercial publishers, and even something akin to "vanity" publishers; there are journals with large page charges, some with none, and some that pay authors; there are journals where the editors serve gratis, and journals where editors are paid. And, as we have noted, the ratio of price per word per unit of impact between some "high-priced" journals and some "low-priced" journals exceeds 100.

In 2020, a single Physics Database could have a significant influence in reducing this ratio. Payment to publishers could be at flat rates; it could be based on monitored usage of the documents in their journals; and it could be negotiated with each publisher to take into account other factors in the publisher's own situation (societal, commercial, etc.).

The possibilities could also be nightmarish: one could imagine a situation in which the publisher and author would negotiate a "price" (positive or negative) for each document, much as is now frequently done for books. Such a situation could in turn have an effect on what is written and how it is written. The possibilities are enormous, and their implications are not all desirable.
Here, we have simply mentioned a few.

I. Conclusions

It is clear that the world of physics is on the verge of a revolution, a revolution that is driven by technology, but whose true nature will be determined by the response of the world scientific community. The revolution will change what and how physicists read, how they become aware of what they read, and even what "read" means. Institutionally, the revolution will change the very nature of libraries, will reduce the importance of journals, and could have important financial consequences for both. This revolution will most likely lead to a National Library of Science, or even a World Library of Science. The way all this comes about will have a profound impact not only on physicists but also on physics. In this process, the APS, in close collaboration with the AIP, will want to play a fundamental role. This is the challenge that the vision of 2020 presents to the APS today.

VI. PLAN TO REALIZE THE VISION

A. Goals

In developing a strategy for Electronic Information Systems, the APS should have the following goals:
The development of a World Scientific Information System will take many years and will require the cooperation of all scientific publishers. We believe it is essential that the APS, working with the AIP and its other member societies, take a leadership role in bringing together the professional societies so that we can realize this final goal as soon as possible, and so that the World Scientific Information System will be organized to provide maximum benefit to the scientific community and thus to science itself. In the meantime, however, the Society must plan its own activities to ensure that it can satisfy the needs of its members. We turn now to a discussion of strategy.

B. Strategy

The Society has, in the past, taken a leadership role in using new technologies to improve publication procedures. Recently, however, that lead has been taken by other societies. Although the APS will not take on major new projects, we feel that it is important that the Society continue to apply new technologies and help to define the direction for scientific publication. We suggest the following components of an APS strategy.
In planning its activities, the APS must recognize that some of the short-term and medium-term items will not move the Society toward its goal of a World Scientific Information System. The steps we take now, however, will help build the expertise and the facilities that will be needed to develop our vision in collaboration with other professional societies.

C. Recommendations

In this subsection we describe short-term and medium-term projects that will significantly improve our handling of electronic information and that will begin to develop the foundation for the World Scientific Information System.
D. Financial Impact

The major financial impact of these recommendations is the addition of a new staff member. The new hire may, in fact, take on responsibilities of an existing staff member, so as to permit that person to concentrate on monitoring ongoing projects and investigating new technologies. The replacement of the VAX/780 will require approximately $250K. This replacement must be planned carefully, so that there is an opportunity to move existing programs and files to the new system. The Society should budget $50K-100K per year to continue to introduce new hardware and software into the publication office. Once the conversion to a new system is complete, the costs of hardware maintenance should decrease. The connection to the Internet, through NYSERNET, should cost approximately $20K-30K. The continuing maintenance of the link will be about $5K.

VII. CONCLUSIONS

In this section we wish to summarize the principal points of this report and emphasize the conclusions they imply. We have defined a single, long-term goal, a World Scientific Information System, which we have called our Vision 2020 and have described in detail in Sec. IV. This vision has several principal features:
The process by which we get to this vision will for the most part be evolutionary, with evolutions required in several different areas: hardware, software, networks, and the attitudes and habits of the working scientists, editors, librarians, and other individuals in the scientific-information community. All of these evolutions have much to accomplish, but each will have its own distance to go, will encounter its own problems, and will proceed at its own speed. In Secs. II and III we have described where we stand in these various areas and have attempted to describe where we are going, at least in the near future. Here we summarize some of the features of these evolutions, and the possible roles for the APS. We consider the different areas in order of what is likely to be increasing difficulty.
The most difficult obstacles, discussed in Sec. V, are in the administrative and financial areas and offer the APS its most serious challenge. How these obstacles are overcome, and under whose leadership, could have a profound influence on the ultimate form of the World Scientific Information System: what it will be, when it will come, even whether there will be one. In this area, the process is less likely to be evolutionary, more likely to involve some decisive actions by key players. Although the APS, together with the AIP, occupies a natural position to lead the transition at least to a Physics Information System within the United States and probably throughout the world, it is not the kind of activity in which the APS finds itself comfortable. Also, the competition from other sources with somewhat different goals will be intense, because the form of the ultimate system will have enormous influence. In Sec. V, we have highlighted several issues:
Recognizing the unpredictability of the detailed steps in each of the areas leading to the vision, we have, in Sec. VI, recommended several steps for the immediate and short term that the APS should take. We have also suggested various strategic steps, such as having a "tracker" to follow technological developments, assuring expertise in this area on the Publications Committee, working towards standards, etc. But if there is one overriding strategic recommendation that is implied, it is to keep an eye firmly on the vision and take whatever leadership initiatives seem appropriate to lead the U.S. physics community, and ultimately the world scientific community, toward that vision.

References
The Appendices to the report are provided separately. Links to several related pages on the web as of January 1997:
The other information sources cited in the report (from the IEEE, OCLC, AMS, etc.) have evolved significantly and a synopsis would be beyond the scope of this brief addendum...

