Preservation

Digital texts offer opportunities for storage, distribution, access, analysis and forms of interaction and use that are not possible with print texts. However, the same malleability and replicability that make digital text so versatile also increase its vulnerability. Unlike printed books, which are self-sufficient textual objects, digital texts depend on machines and software, so our ability to use them is subject to transient and unstable technologies (Beagrie & Jones, 2002). Because digital texts are more easily altered than print texts, it is also more difficult to ensure their integrity and authenticity (Lavoie & Gartner, 2005). The vulnerability of digital texts, and its implications for the long-term use of textual artifacts, are widely recognized in the library and archive communities. International initiatives in North America and elsewhere study digital preservation and develop guidelines for it (Giaretta, 2006; Waters & Garrett, 1996); examples include PLANETS, CASPAR and the Digital Curation Centre. Their goal is not only to preserve the texts themselves, but “to preserve the information integrity; that is, to define and preserve those features of an information object that distinguish it as a whole and singular work” (Waters & Garrett, 1996).

The main challenge in preserving the texts themselves, and in ensuring that they remain available to future generations, is technological obsolescence. A complex layer of technologies mediates between digital texts and their users. These technologies (storage devices, data formats, interaction and viewing applications) have a limited life expectancy (Lavoie & Gartner, 2005). Digital storage media break down surprisingly quickly, and versions and types of applications and systems change rapidly with technological advances (Beagrie & Jones, 2002; Lazinger, 2001). Several approaches to this problem have been developed, and any preservation strategy needs to take them into account.

To combat instability, stored data need to be backed up regularly and may periodically be “refreshed”, i.e. copied onto fresh storage media. Large data archives should be stored at multiple sites and in more than one format (Linden, Martin, Masters, & Parker, 2005). To combat obsolescence, both emulation and migration can be used (Hedstrom, Lee, Olson, & Lampe, 2006). Emulation is the use of current technology to mimic the older systems needed to access legacy data (Granger, 2000). Migration is the practice of periodically moving data onto new systems to keep up with current technology. Both practices can be effective; however, it is essential that they be well documented and follow clear standards and methods, because both can alter digital artifacts, either through data corruption or through changes in the structures or capabilities of the systems.
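As a minimal sketch of refreshing with an integrity check, assuming local directories stand in for separate storage sites, the Python below copies a file to each target location and compares SHA-256 checksums so that corruption introduced during the copy is caught rather than silently propagated. The function names `sha256_of` and `refresh`, and the choice of SHA-256, are illustrative assumptions, not part of any cited guideline.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, targets: list[Path]) -> str:
    """Copy `source` to every target location and verify each copy.

    Returns the verified checksum so it can be stored as fixity metadata;
    raises ValueError if any copy's checksum differs from the source's.
    """
    expected = sha256_of(source)
    for target in targets:
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies the bytes and file timestamps
        if sha256_of(target) != expected:
            raise ValueError(f"fixity check failed for {target}")
    return expected


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "ebook.xml"
        original.write_text("<book>Chapter 1</book>", encoding="utf-8")
        # Two hypothetical "sites" standing in for separate storage locations.
        copies = [root / "site_a" / "ebook.xml", root / "site_b" / "ebook.xml"]
        print(refresh(original, copies))
```

Returning the checksum lets the same value be recorded as preservation metadata and rechecked on later refresh cycles. Note that re-reading a just-written file may be served from the operating system's cache, so a real archive would also rerun fixity checks on a schedule rather than relying only on the check at copy time.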

This susceptibility to change is a serious challenge for digital preservation initiatives, as it is difficult to ensure that texts are actually what they claim to be and that they have not been altered in any way. Even if interactivity and changes to texts are desirable in current and future use scenarios, it is important to be able to preserve earlier versions of texts and to understand how they change over time. The principal approach to this problem has been to develop models and standards for archiving and describing digital texts. The Open Archival Information System (OAIS) reference model is an ISO standard (2002) that describes the environment, functional components and information objects of systems designed to preserve digital materials. The key concept in this model is that objects are treated as information packages, which include both the content and the representation information needed to make that content understandable (Caplan, 2006). Although this is not explicit in the model, such a framework can be used to design “trusted repositories” that preserve the integrity of digital texts. Implicit in OAIS and related standards (such as the records management standard ISO 15489) is the need for preservation metadata, “information about the technical environment in which records are created and exist” (Duff, 2003). Preservation metadata serve as documentation for digital texts and may cover a number of areas (Lavoie & Gartner, 2005):

Nested within the broad OAIS model are more specific frameworks for preservation metadata. The Metadata Encoding and Transmission Standard (METS) provides an encoding and structural standard for metadata elements that is useful in shaping OAIS information packages. METS is an empty container that can include structural metadata for complex objects. It is also a wrapper for administrative, descriptive, and structural metadata, and it defines a metadata exchange syntax (Day, 2005; Duff, 2003). A set of core metadata elements that can be used within the METS framework is defined by the PREMIS Data Dictionary (2005), which also provides guidelines for developing preservation metadata schemas.
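To make the wrapper idea concrete, the Python sketch below uses the standard library's `xml.etree.ElementTree` to build a minimal METS skeleton for one digital text: a `dmdSec` holding an inline Dublin Core title, an `amdSec` whose `digiprovMD` points by reference to an external PREMIS file, a `fileSec` entry carrying a SHA-256 checksum, and a `structMap` tying these together. The identifiers (`DMD1`, `PROV1`, `FILE1`), the file name `premis.xml` and the function `build_mets` are hypothetical, and the output is not validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"
for prefix, uri in (("mets", METS_NS), ("xlink", XLINK_NS), ("dc", DC_NS)):
    ET.register_namespace(prefix, uri)  # controls prefixes on serialization

HREF = f"{{{XLINK_NS}}}href"


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace (Clark notation)."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(title: str, file_href: str, sha256: str) -> ET.Element:
    """Build a minimal, unvalidated METS skeleton for one digital text."""
    root = ET.Element(m("mets"))

    # Descriptive metadata: a Dublin Core title wrapped inline.
    dmd = ET.SubElement(root, m("dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, m("mdWrap"), MDTYPE="DC")
    xml_data = ET.SubElement(wrap, m("xmlData"))
    ET.SubElement(xml_data, f"{{{DC_NS}}}title").text = title

    # Administrative metadata: provenance kept in an external PREMIS file
    # (hypothetical name) and referenced rather than embedded.
    amd = ET.SubElement(root, m("amdSec"), ID="AMD1")
    prov = ET.SubElement(amd, m("digiprovMD"), ID="PROV1")
    ET.SubElement(prov, m("mdRef"),
                  {"LOCTYPE": "URL", "MDTYPE": "PREMIS", HREF: "premis.xml"})

    # File inventory: the content file plus a fixity value for later checks.
    file_sec = ET.SubElement(root, m("fileSec"))
    grp = ET.SubElement(file_sec, m("fileGrp"), USE="master")
    file_el = ET.SubElement(grp, m("file"), ID="FILE1", ADMID="PROV1",
                            CHECKSUM=sha256, CHECKSUMTYPE="SHA-256")
    ET.SubElement(file_el, m("FLocat"), {"LOCTYPE": "URL", HREF: file_href})

    # Structural map: links the logical unit (the book) to its file
    # and to its descriptive metadata.
    struct = ET.SubElement(root, m("structMap"))
    div = ET.SubElement(struct, m("div"), TYPE="book", DMDID="DMD1", LABEL=title)
    ET.SubElement(div, m("fptr"), FILEID="FILE1")
    return root


if __name__ == "__main__":
    doc = build_mets("Example e-book", "ebook.xml", "0" * 64)
    print(ET.tostring(doc, encoding="unicode"))
```

Referencing the PREMIS record with `mdRef` keeps the sketch short; METS equally allows embedding such metadata inline with `mdWrap`, as done here for the descriptive section.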

However, no preservation metadata schema can be developed in the abstract. One of the major conceptual challenges of digital preservation is determining what needs to be preserved, given that digital texts can be defined at different levels: as data bits, text, structure, format, intellectual meaning, interpretations and so on (Caplan, 2006). Furthermore, the decision as to which features determine the integrity of a text rests not on universal standards but on current and future user and task contexts (Ross, 2002). For example, some digital texts are fluid by nature, so a fixed ordering of their parts may not be necessary to preserve their integrity (Steemson, 2002). It is therefore important to consider digital preservation at the outset of any project involving digital texts, so that the creators of both the texts and the infrastructure can participate in determining the key elements for preservation. These decisions should be made within a data model capable of accommodating diverse types of primary sources and preservation metadata, and that model should be based on an understanding of user roles, tasks and use contexts. The technology infrastructure should protect the integrity of digital texts and support the concept of self-documenting objects by providing mechanisms for both automatic and manual creation of metadata.
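The idea of self-documenting objects with both automatic and manual metadata can be sketched as follows. Technical properties (file size, SHA-256 checksum, a MIME type guessed from the file extension, a capture timestamp) are extracted by software, descriptive or judgement-based fields are supplied by people, and a check reports which project-required fields are still empty. `PreservationRecord`, `extract_automatic` and all field names are hypothetical; a real project would map such fields onto an agreed schema such as PREMIS rather than ad hoc dictionary keys.

```python
import hashlib
import mimetypes
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path


def extract_automatic(path: Path) -> dict:
    """Derive technical metadata that software can capture without help."""
    data = path.read_bytes()
    mime, _encoding = mimetypes.guess_type(path.name)  # based on extension only
    return {
        "filename": path.name,
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "mime_type": mime or "application/octet-stream",
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }


@dataclass
class PreservationRecord:
    """Hypothetical record pairing automatic and human-supplied metadata."""
    automatic: dict
    manual: dict = field(default_factory=dict)

    def missing(self, required: set[str]) -> set[str]:
        """Return required fields that are absent or left empty."""
        merged = {**self.automatic, **self.manual}
        filled = {k for k, v in merged.items() if v not in (None, "")}
        return required - filled


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        text = Path(tmp) / "chapter1.txt"
        text.write_text("Chapter 1", encoding="utf-8")
        record = PreservationRecord(
            automatic=extract_automatic(text),
            manual={"creator": "A. Author", "significant_properties": ""},
        )
        # Hypothetical project rule: these fields must be filled before ingest.
        print(record.missing({"sha256", "creator", "significant_properties"}))
```

Keeping the required-field set outside the record reflects the point above that what counts as essential depends on the user and task context, so each project can supply its own list.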

Digital preservation issues should be considered from the outset of the HCI-Book project in order to develop conceptual and physical models of the e-book and e-book repositories that are consistent with emerging standards for archival information systems and preservation metadata. In particular, we need to take a broad perspective when identifying the features of an e-book that are essential to its integrity, and begin to design an infrastructure capable of documenting and preserving those features. This work should be coordinated with current initiatives, such as the PLANETS project, that are focusing on these questions.
