Structured Glossary of Technical Terms


3.3. Storage Technology

Storage Technology refers to the technology used to store the images or information obtained through the use of some form of Capture Technology (3.2). This includes the medium used for storage (3.3.1), the compression methodology used to minimize the amount of storage medium employed (3.3.2), the format used to program the image or information onto the medium (3.3.3), the encoding methods used to represent any interpretation of the stored information (3.3.4), and the useful life of the storage medium (3.3.5).

3.3.1. Storage Medium

3.3.1.1 Paper (see 1.1.1)

3.3.1.2 Microform (see 1.1.2)

3.3.1.3 Video (see 1.1.3)

3.3.1.4 Film (see 1.1.4)

3.3.1.5 Audio (see 1.1.5)

3.3.1.6 Digital Electronic

A family of storage devices in which information or data are represented by a series of quantized changes to the surface of the storage medium, the quanta being recorded or modified by electronic means. There are two main classes in this category: magnetic devices, in which recording alters the magnetic state of a coated surface in response to the electronic digital signal, and reading senses the surface using reading heads conceptually similar to those used in common tape recorders; and optical devices, in which the optical properties of a coated surface are altered (in one such technology, submicrometer-sized holes are recorded and read by laser beams focused electronically onto the spot). Each recorded quantum normally corresponds to a recorded "1" or a recorded "0", that is, to a bit (from "binary digit"); all data and information are constructed from these basic building blocks.

Such devices are further classified according to whether they are read/write devices (that is, information may be written onto the device and read from the device, and the information can be modified as many times as desired), read only memory (ROM) devices (that is, prerecorded information can be read from the device, but the information cannot be modified), or write-once-read-many (WORM) devices (that is, information may be written once by the user onto the device, but thereafter it can only be read). Most optical devices are either read only or WORM devices, but magneto-optical devices, which combine magnetic and optical technologies, are read/write devices (3.3.1.6.5).

Typically, magnetic devices offer higher performance in terms of access time to a given segment of recorded information and of transfer rate of that information to the host device. Optical devices, however, are generally more economical in terms of storage capacity. Magnetic technologies have a longer history than optical technologies, so more is known about, for example, their useful life (see 3.3.5). Both technologies seem to be following similar cost/performance curves, with performance parameters doubling in capability approximately every two to three years (except for access times, which are improving much more slowly) and cost per bit halving about every two to three years.
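The cost/performance trend described above compounds quickly. The short Python sketch below projects it forward, assuming, as the paragraph does, a fixed halving or doubling period; the function names and the normalized starting values of 1.0 are illustrative only, and real price curves are not this smooth.

```python
def projected_cost_per_bit(initial_cost: float, years: float, halving_period: float) -> float:
    """Cost per bit after `years`, assuming it halves every `halving_period` years."""
    return initial_cost * 0.5 ** (years / halving_period)


def projected_capability(initial_value: float, years: float, doubling_period: float) -> float:
    """Performance figure after `years`, assuming it doubles every `doubling_period` years."""
    return initial_value * 2 ** (years / doubling_period)


if __name__ == "__main__":
    # Hypothetical starting point: cost and capability both normalized to 1.0.
    for period in (2.0, 3.0):
        cost_10y = projected_cost_per_bit(1.0, 10, period)
        cap_10y = projected_capability(1.0, 10, period)
        print(f"period={period} years: cost x{cost_10y:.3f}, capability x{cap_10y:.1f}")
```

With a two-year period, ten years yields a factor of 2^5 = 32; with a three-year period, the factor is about 10.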

Both technologies are further classified according to whether they are random access devices (such as disk storage devices) or serial access devices (such as tape storage devices). With random access devices, information stored at any point can be accessed directly (much as the tone arm of a phonograph can be placed at any point on a record); with serial access devices, information can only be accessed by passing over any information recorded ahead of it on the medium (as in winding through a tape on a tape recorder to arrive at a particular passage).
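The difference between the two access modes can be modelled by counting how many other records must be passed over to reach a given one. This is a deliberately simplified sketch: it ignores seek time, rotational delay, and rewinding, and the function names are hypothetical.

```python
from typing import Sequence


def serial_access(records: Sequence[str], target: int) -> tuple[str, int]:
    """Return the target record and how many records were passed over first.

    Models a tape: reading starts at the beginning and must move past
    every record recorded ahead of the target.
    """
    passed = 0
    for index, record in enumerate(records):
        if index == target:
            return record, passed
        passed += 1
    raise IndexError(target)


def random_access(records: Sequence[str], target: int) -> tuple[str, int]:
    """Return the target record directly, passing over no other records.

    Models a disk: the head is positioned at the target location.
    """
    return records[target], 0


if __name__ == "__main__":
    recs = [f"record-{i}" for i in range(1000)]
    print(serial_access(recs, 999))  # must pass 999 records first
    print(random_access(recs, 999))  # reached directly
```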

3.3.1.6.1 Magnetic Disk

A rotating circular plate with a magnetized surface on which information may be stored as a pattern of polarized spots on concentric or spiral recording tracks. These plates, or platters, are usually stacked several to a disk drive. The platters may or may not be removable; in high-performance disk drives they are usually not removable. Magnetic disks are read/write devices (3.3.1.6). Some removable magnetic disks of lower capacity are known as floppy disks, since the recording medium was originally made of flexible plastic.

3.3.1.6.2 Magnetic Tape

A plastic, paper, or metal tape that is coated or impregnated with magnetizable iron oxide particles on which information is stored as a pattern of polarized spots. Tapes are read using magnetic tape drives. Access times with magnetic tapes are slower than those of correspondingly priced disks, since tapes are serial access devices, but tapes are almost always removable, so the information can be stored off-line, making tapes [25] useful for archival storage (but see 3.3.5).

3.3.1.6.3 Optical Disk

A rotating circular plate on which information is stored as submicrometer-sized holes that are recorded and read by laser beams focused on the disk. This includes the class of CD-ROM devices, which use the same 4 3/4" (120 mm) diameter format as audio CD recordings. A CD-ROM is usually read by inserting the disk into a CD-ROM drive. Other typical formats use 12" or 14" diameter disks, but standards for these are lacking. The larger disks are usually read by inserting them into optical jukebox devices, which, as their name suggests, select and load disks automatically. Even when a disk is mounted, access times for optical disks are typically relatively slow because of the lag time needed to "spin up" the disk. However, the cost per stored bit is extremely low. Error rates may also be higher than for magnetic technologies. Optical disks are therefore most useful where the stored data contain an abundance of redundant information, as is the case with scanned document pages: on viewing the data, the eye is unlikely to be troubled by a single tiny dot of the wrong shade of grey among an ocean of dots. See also the discussion of magneto-optical devices (3.3.1.6.5). Conversely, magnetic devices excel in the recording of encoded text (see 3.3.4.2), but may be expensive to use for the storage of images even when compressed (3.3.2).
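The argument about redundancy can be made concrete by flipping a single bit to simulate one recording error. In this Python sketch (the sample word and pixel value are arbitrary choices), one error in encoded text changes a character outright, while the same error in the low-order bit of an 8-bit grey pixel shifts one dot's shade by 1 part in 256. Even a high-order error affects only one pixel among millions on a scanned page.

```python
def flip_bit(value: int, bit: int) -> int:
    """Invert one bit of an integer, simulating a single recording error."""
    return value ^ (1 << bit)


# Encoded text: one flipped bit turns one character into a different one.
text = bytearray(b"preservation")
text[0] = flip_bit(text[0], 1)
print(text.decode("ascii"))  # rreservation

# Scanned image: one flipped low-order bit in an 8-bit grey pixel
# changes its shade by 1 part in 256.
pixel = 200
print(pixel, "->", flip_bit(pixel, 0))  # 200 -> 201
```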

3.3.1.6.4 Optical Tape

An emerging class of technology that combines the advantages and disadvantages of tape (3.3.1.6.2) with those of optical recording technology (3.3.1.6.3). Its chief advantage may lie in very low cost per bit of storage, but at present it suffers from relatively high error rates.

3.3.1.6.5 Magneto-Optical Disk

Disks that combine magnetic and optical technologies. To record data, a laser heats a small element of the recording layer in the presence of an applied magnetic field, which sets the element's magnetic alignment. When the magnetic field is aligned one way, a "1" is recorded; when the field is reversed, a "0" is recorded. The data are read by reflecting a lower-intensity laser beam off the surface; the polarization of the reflected light varies according to the alignment of the element. Unlike regular optical disks, magneto-optical disks are read/write devices, with performance characteristics somewhere between those of magnetic disks and optical disks in terms of access times, transfer rates, and storage capacity.
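The write/read cycle described above can be sketched as a toy model of a single recording element. This is a simplification under stated assumptions: the class name is hypothetical, polarization is reduced to a signed number in arbitrary units, and the rule that the applied field has no effect unless the laser heats the element is an assumption made here to show why recorded data stay stable between rewrites.

```python
class MagnetoOpticalCell:
    """Simplified, hypothetical model of one magneto-optical recording element."""

    def __init__(self) -> None:
        self.magnetization = -1  # -1 represents a recorded "0", +1 a recorded "1"

    def write(self, field_up: bool, laser_heating: bool) -> None:
        # Assumption: at normal temperature the element resists the applied
        # field; only when heated by the laser does it take on the field's direction.
        if laser_heating:
            self.magnetization = 1 if field_up else -1

    def read(self) -> int:
        # A low-intensity laser is reflected; the sign of the polarization
        # rotation follows the element's alignment. Reading does not alter it.
        polarization_rotation = 0.5 * self.magnetization  # arbitrary units
        return 1 if polarization_rotation > 0 else 0


if __name__ == "__main__":
    cell = MagnetoOpticalCell()
    cell.write(field_up=True, laser_heating=True)
    print(cell.read())  # 1
    cell.write(field_up=False, laser_heating=True)  # rewrite: read/write device
    print(cell.read())  # 0
```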

3.3.2. Compression

Compression refers to the extent to which the encoded form of the preserved or reformatted document has been modified to reduce the amount of space it requires on the storage medium. The technique takes advantage of the great redundancy present in much recorded data, particularly in image documents (3.1.5.1). Storage savings by factors of ten or more may readily be achieved, depending upon the scanning resolution and methodology employed (3.2.3), the type of material being scanned, and the particular compression method used. Without compression, storage requirements grow rapidly, as the square of the scanning resolution (3.2.3); with effective compression methods, storage requirements can be constrained to grow almost linearly with the scanning resolution. This is because increasing the scanning resolution increases data redundancy, and compression eliminates or reduces this redundancy. Thus, the greater the redundancy of information contained in the scanned material, the more compression is possible; continuous tone photographs, for example, often contain large amounts of redundant information. Compression is an important factor in the economics and efficacy of digital preservation.
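The square-law growth of uncompressed storage can be checked with simple arithmetic. The sketch below assumes an 8.5 x 11 inch page scanned bilevel (1 bit per pixel); the page size and resolutions are illustrative assumptions, not figures from this glossary.

```python
def uncompressed_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int = 1) -> int:
    """Size of an uncompressed raster scan of a page, in bytes (rounded up)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return (pixels * bits_per_pixel + 7) // 8


if __name__ == "__main__":
    for dpi in (200, 300, 400, 600):
        size = uncompressed_bytes(8.5, 11, dpi)
        print(f"{dpi} dpi: {size / 1_000_000:.2f} MB")
```

At 300 dpi the page needs 2550 x 3300 = 8,415,000 bits, or about 1.05 MB; doubling the resolution to 600 dpi multiplies this by four.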

3.3.2.1 Uncompressed

No compression has occurred.

3.3.2.2 Reversibly Compressed

Compression has been performed in such a way that the process can, if required, be reversed and the original recovered without loss of information. Also known as "lossless".
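Lossless compression can be demonstrated with Python's standard zlib module. Note that zlib implements DEFLATE, a general-purpose method used here for convenience; it is not one of the CCITT methods. The synthetic page (blank scanlines with a line of marks every 40 rows) is a hypothetical stand-in for a scanned text page.

```python
import zlib

# Synthetic bilevel page (hypothetical): each byte packs 8 pixels, 0x00 = 8 white pixels.
width_bytes = (2550 + 7) // 8  # about 2550 pixels across, as at 300 dpi
blank_line = bytes(width_bytes)
half = width_bytes // 2
text_line = bytes(half) + b"\xff\x0f\xf0" * 20 + bytes(width_bytes - half - 60)
page = b"".join(text_line if row % 40 == 0 else blank_line for row in range(3300))

compressed = zlib.compress(page, 9)
restored = zlib.decompress(compressed)

assert restored == page  # lossless: every bit comes back
print(f"{len(page)} bytes -> {len(compressed)} bytes "
      f"(ratio about {len(page) / len(compressed):.0f}:1)")
```

The exact ratio depends on the content; a real scan, with noise and more text, would compress less than this idealized page.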

3.3.2.2.1 CCITT Group Compression

Compression standards defined by the International Telegraph and Telephone Consultative Committee (Comité Consultatif International Téléphonique et Télégraphique), notably the Group 3 and Group 4 facsimile standards for bilevel images (Recommendations T.4 and T.6).
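CCITT Group 3 one-dimensional coding (Recommendation T.4) describes each scanline as alternating runs of white and black pixels, each line starting with a white run (possibly of zero length). The sketch below shows only that run-length step; the Modified Huffman code tables that T.4 uses to turn run lengths into bits are omitted, and the function names are hypothetical.

```python
def run_lengths(scanline: list[int]) -> list[int]:
    """Encode a bilevel scanline (0 = white, 1 = black) as alternating run lengths.

    The first run is always white; a line that starts with black
    therefore begins with a zero-length white run.
    """
    runs: list[int] = []
    current = 0  # colour of the run being counted
    count = 0
    for pixel in scanline:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current = pixel
            count = 1
    runs.append(count)
    return runs


def expand(runs: list[int]) -> list[int]:
    """Reverse run_lengths: rebuild the scanline exactly (lossless)."""
    scanline: list[int] = []
    colour = 0
    for length in runs:
        scanline.extend([colour] * length)
        colour ^= 1  # runs alternate white, black, white, ...
    return scanline


if __name__ == "__main__":
    line = [0] * 10 + [1] * 3 + [0] * 5
    print(run_lengths(line))  # [10, 3, 5]
```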

3.3.2.2.2 Reversible Textual Compression

If sufficiently complete, representing a document in whole or in part as formatted text (3.1.5.2.2) may constitute a form of reversible compression. The use of a markup language (3.3.4.3) is also a form of reversible textual compression. See also 3.3.4.

3.3.2.2.3 Page Description Language Compression (PDL)

See 3.3.4.4

3.3.2.2.4 Other Compression Standards or Algorithms

Refers to other compression standards, de facto standards, or algorithms.

3.3.2.3 Irreversibly Compressed

Compression has occurred so that the process cannot be precisely reversed. The original cannot be recovered without loss of information.
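A simple way to see irreversibility is to reduce the precision of grey levels. In this sketch (a generic quantization example, not any particular compression standard), 8-bit grey values are cut to 4 bits, so each group of 16 neighbouring shades collapses to a single value and the original cannot be recovered.

```python
def quantize(levels: list[int], bits: int) -> list[int]:
    """Keep only the top `bits` bits of each 8-bit grey level (irreversible)."""
    shift = 8 - bits
    return [(v >> shift) << shift for v in levels]


if __name__ == "__main__":
    original = [200, 201, 202, 203, 15, 16]
    reduced = quantize(original, 4)
    print(reduced)  # [192, 192, 192, 192, 0, 16]
    # Four distinct shades now share one value: nothing records which was which.
```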

3.3.2.3.1 Irreversible Textual Compression

Representing a document in whole or in part as unformatted or partially formatted text (3.1.5.2) may constitute a form of irreversible compression. The content of the text is retained, but one or more of its font style, font size, or positioning on the page is lost.

3.3.3. Storage Format

As used in information storage and retrieval, Format or Storage Format refers to the actual representation of the stored data on the storage medium, that is, the specific way in which the data are encoded or programmed onto the medium. Classifying such methodologies is beyond the scope of this document. Indeed, for the most part, and particularly for digital electronic storage technologies, there are few general standards accepted by all or most manufacturers. The implication is that access to the information stored on the medium depends upon specific software supplied by the manufacturer, software that may become obsolete with the passage of time. As a result, stored information may need to be reformatted or transferred to newer storage media periodically in order to remain accessible with current software and technology.

3.3.4. Encoding Method

Encoding Method refers to the extent to which the information content of the document has been interpreted and encoded, rather than merely recorded. Such interpretation may be beneficial for a number of reasons: as a means of achieving reversible compression (3.3.2.2), for the construction of document indices to facilitate searching and access (3.4.1), or for efficient distribution of the information across data networks (3.5.5). For example, a document that has merely been scanned as a bit-mapped image (3.1.5.1) has not been encoded (3.3.4.1), even though faithful "digital pictures" of its pages have been obtained. If the images of the document text are later interpreted through internal character recognition (3.2.5), then the digital representation has been textually encoded (3.3.4.2).
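One benefit of textual encoding is that it makes word-level indexing possible; a bit-mapped image has no words to index. The following sketch builds a minimal inverted index (word to pages) from textually encoded pages. The page identifiers, sample text, and function names are hypothetical, and practical indexing systems add much more, such as stemming and ranking.

```python
import re
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the set of page identifiers containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_id)
    return dict(index)


def search(index: dict[str, set[str]], *words: str) -> set[str]:
    """Pages containing every query word (empty set if any word is absent)."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    pages = {
        "p1": "Acid paper deteriorates.",
        "p2": "Microfilm on polyester base.",
        "p3": "Acid-free paper lasts longer.",
    }
    idx = build_index(pages)
    print(sorted(search(idx, "paper")))  # ['p1', 'p3']
```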

3.3.4.1 No Encoding

No interpretation of the information contained in the original document has occurred. If the document were originally scanned using a digital image scanner (3.2.3), then the document in this instance is generally stored in some image format (3.1.5.1), compressed or not (3.3.2). If portions of the document were originally scanned using optical character recognition (3.2.4), then those portions will be stored as either formatted or unformatted text (3.1.5.2).

3.3.4.2 Textual Encoding

The text contained in the original document has been interpreted so that each character has a separate representation (see 3.1.5.2). Such interpretation may have occurred at the time of scanning if an optical character recognition device is used (3.2.4), or later using internal character recognition (3.2.5) programs applied to documents in image format (3.1.5.1). Such textual interpretation may result in either unformatted or formatted text, depending upon the degree of sophistication of the device or program. Recognition accuracy may also be limited.
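Since recognition accuracy may be limited, it is often measured by comparing recognized text against a verified transcription. The sketch below uses one common measure, character accuracy based on Levenshtein edit distance; the choice of measure and the function names are assumptions rather than anything this glossary prescribes.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def character_accuracy(ground_truth: str, recognized: str) -> float:
    """1 - (edit distance / length of ground truth), floored at 0."""
    if not ground_truth:
        return 1.0 if not recognized else 0.0
    return max(0.0, 1 - edit_distance(ground_truth, recognized) / len(ground_truth))


if __name__ == "__main__":
    # A typical recognition confusion: "i" read as "l".
    print(f"{character_accuracy('preservation', 'preservatlon'):.3f}")
```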

3.3.4.3 Markup Language Encoding

A computer markup language is a means of describing, for an electronically stored document, the complete positioning, format, and style of text and image segment representations (3.1.5) within the document. When combined with textual representation, it is a means of achieving fully formatted text (3.1.5.2.1). When combined with relevant image information about document graphics material (if any), it may be a means of achieving fully reversible compression (3.3.2.2) of the document. An example of a markup language is SGML (Standard Generalized Markup Language), an international standard (ISO 8879) that has been adopted by the United States Government and by many publishers.

3.3.4.4 Page Description Language Encoding

A computer language in which segments of text and images are compactly described with respect to form, orientation, size, density, and other characteristics, for purposes of economical transmission across networks and between host devices and output devices such as printers. Page Description Languages are a form of compression (3.3.2) as well as a form of encoding.
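To illustrate the economy of a page description, the sketch below compares a minimal PostScript program (PostScript being one well-known page description language, chosen here as an example) with an uncompressed 300 dpi bilevel bitmap of an 8.5 x 11 inch page. The comparison is illustrative only: it ignores the font data that the printer or interpreter must already hold.

```python
# A minimal PostScript page description: one line of text in 12-point Times.
page_description = """%!PS
/Times-Roman findfont 12 scalefont setfont
72 720 moveto
(Structured Glossary of Technical Terms) show
showpage
"""

# The same page as an uncompressed bilevel bitmap at 300 dpi (8.5 x 11 inches).
bitmap_bytes = (int(8.5 * 300) * (11 * 300) + 7) // 8

pdl_bytes = len(page_description.encode("ascii"))
print(f"page description: {pdl_bytes} bytes; bitmap: {bitmap_bytes} bytes")
```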

3.3.5. Useful Life

Useful Life refers to the archival quality of the storage medium. It usually means the period of time during which there is no unacceptable loss of the information stored on the medium and during which the storage medium remains usable for its intended purpose.

The longevity of paper varies considerably depending upon its method of manufacture and conditions of storage (see 1.5). Unless it is produced to meet standards for permanence (1.5.1), paper may last anywhere from a few years to hundreds of years. Most paper produced since the middle of the nineteenth century has a useful life of less than 100 years. Paper produced to meet archival standards should last several hundred years. Film, provided it is manufactured, processed, and stored according to archival standards, appears to have a useful life well in excess of 500 years. Videotape appears to be extremely vulnerable and to have a relatively short life of a few decades.

Digital electronic storage media have a useful life projected to range from a few years to over 100 years. The longer projections have not been confirmed by experience but are based on laboratory stress tests. Such media, however, become obsolete for other reasons long before their physical properties render them useless (see, for example, 3.3.3). It becomes economically and functionally infeasible to maintain the information on the original medium of capture, since it is far cheaper to transfer the information periodically to newer, higher-density, and cheaper technologies. Concerns also exist regarding the possibility of modifying digitally encoded documents, particularly when read/write devices (3.3.1.6) are used (modification is essentially not possible with read only or write-once-read-many technologies (3.3.1.6)), and regarding other issues of security.
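One common safeguard during periodic transfers, assumed here rather than drawn from this glossary, is a fixity check: a cryptographic digest recorded when the data are captured is recomputed before and after each copy, so that modification or degradation is detected. The sketch uses Python's hashlib; the function names and the in-memory "copy" are hypothetical stand-ins for real media.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest used as the document's recorded fingerprint."""
    return hashlib.sha256(data).hexdigest()


def migrate(data: bytes, recorded_digest: str) -> bytes:
    """Copy data to a 'new medium' (here just a new bytes object) and verify it.

    Raises ValueError if the source no longer matches its recorded digest
    (it was modified or degraded) or if the copy differs from it.
    """
    if sha256_hex(data) != recorded_digest:
        raise ValueError("source does not match its recorded digest")
    copy = bytes(bytearray(data))  # hypothetical stand-in for writing to newer media
    if sha256_hex(copy) != recorded_digest:
        raise ValueError("copy does not match the recorded digest")
    return copy


if __name__ == "__main__":
    document = b"scanned page image bytes"
    digest = sha256_hex(document)  # recorded at capture time
    for generation in range(1, 6):  # e.g. five successive migrations
        document = migrate(document, digest)
    print("all migrations verified")
```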

The implications of periodic recopying for libraries are quite far-reaching. Libraries are not used to having to maintain their inventory by periodic recopying, even though such practices are quite common in data centers. Indeed, the recent impetus of preservation may have caused some librarians to rethink their position in this regard, although librarians still tend to think in terms of periods of centuries rather than having (or wanting) to recopy every few years. Such considerations may either hinder the adoption of digital technologies or eventually cause some rethinking of the underlying economics of librarianship.

Further implications are discussed in the Introduction.
