When a document is "born digital," how long can it be expected to live? Will the information found on the web last week still be around next year? How about 100 years from now? For the users of scholarly journals -- and the librarians who maintain them -- these are important questions. More and more of the journals that once existed only in paper form are being published in electronic editions. But what happens if the publisher of a journal goes out of business 10 years from now, or decides that the electronic edition is unprofitable and shuts down its web site? Paper journals take up a lot of shelf space, and librarians want to know when it will be safe for libraries to discard their paper copies in favor of digital versions.
With a $150,000, one-year planning grant from the Andrew W. Mellon Foundation, Cornell University Library will explore the idea of creating permanent digital archives for scholarly journals, with the goal of setting up a pilot archive of agricultural journals. The effort -- called "Project Harvest" -- follows in the footsteps of Project Euclid, a Mellon-funded venture by Cornell and Duke University in the online publication of math journals.
"One of the things that libraries do is make sure that the literature of the present is available for the future," says Peter Hirtle, co-director of the Cornell Institute for Digital Collections. "We want to investigate how to preserve literature that is now being distributed in electronic form." Hirtle will serve as the project coordinator. Sarah Thomas, the Carl A. Kroch University Librarian, will be principal investigator.
The project will hire a full-time staff member to negotiate agreements with journal publishers for the inclusion of their journals in the archive. "We hope that the negotiations will lead to the development of a model agreement that other publishers could readily accept," Thomas says.
The planning will consist mostly of answering a long list of questions, including:
o Will the collection be a "living archive" that scholars can access, or a "dark archive" that simply preserves journals in case they are needed in the future?
o Will the scholarly community be confident that the archive will be available in the future? Should there be a procedure for "certifying" an archive to assure users that it is reliable?
o Should everything be converted to one standard format, or should the formats used by individual publishers be retained? Tentatively, the project's plans call for one copy in each journal's usual format, plus another copy in a commonly supported format.
o How do librarians ensure that stored material will be readable as technology evolves? It's now a given that some sort of "migration" procedure must be built in so that documents are copied from old formats to new ones when the old formats start to become obsolete.
o What assurances are there that digital texts will not be altered? Publishers of electronic material have the right to change what they have online, but librarians want to preserve the original content, just as they do with paper publications.
o Should there be multiple copies in different locations? Cornell is a partner with Stanford University and HighWire Press in a program called LOCKSS -- "Lots of Copies Keeps Stuff Safe" -- that other land-grant universities might use to mirror a Cornell archive. Alternatively, several universities could each maintain a different part of an archive, spreading the workload.
o Who will pay for long-term maintenance of the archive? Do publishers pay to be included? Do users pay for access?
Cornell will draw on its already extensive experience in creating and preserving digital documents. The library has digitized and made available to scholars a wide variety of historical documents and is engaged in research on preserving digital information. It also has completed several projects involving negotiations with scholarly publishers, including TEEAL (The Essential Electronic Agricultural Library), which makes agricultural journals available to Third World scholars on CD-ROM. Project Harvest will create a development team with representatives from a small pilot group of interested publishers. Later, other publishers will be invited to participate.
Next year Cornell hopes to secure further funding to purchase hardware and create the actual archive. By that time, according to Thomas, "We will have modeled the architecture for a long-term repository based on the best thinking in the digital preservation community tempered by the realities of what our publisher/partners are willing to accept."
Related World Wide Web sites: The following sites provide additional information on the projects and institutions mentioned in this news release. Some might not be part of the Cornell University community, and Cornell has no control over their content or availability.
o Cornell Institute for Digital Collections: http://cidc.
o Project Euclid: http://projecteuclid.
o The Andrew W. Mellon Foundation: http://www.