The Commission on Preservation and Access



3.4. Access Methodology or Technology

Access Methodology or Technology refers to the means of selecting information from among all the information that is stored.

3.4.1. Indexed Access

A Document Index is a systematically ordered file of objects [26] that refer to a collection of documents or to specific parts of those documents. It is organized to facilitate searching the collection in order to select single documents or groups of documents. Such document indices may be stored on different media depending upon how they are to be used.

3.4.1.1 Via Catalog

Access via a file of bibliographic records that describe the documents contained in a collection. The records are created according to specific and uniform principles of construction and under the control of an authority file. The file is usually organized in a systematic manner to facilitate access and document selection. Catalogs historically have been implemented in card files, but increasingly such card files are retroactively and prospectively giving way to computerized data files (1.2.10), which patrons may access and search using computer workstations (3.6.2.6) and data networks (3.5.5). Such computer-based catalogs are increasing in sophistication to support complex queries, including Boolean queries, which support logical searching (e.g., all the works of fiction written in Albania and published between 1890 and 1919 by authors whose last name begins with the letter "L").
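The Boolean query given as an example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (title, author_last_name, genre, country, year) and the sample records are invented for the sketch and do not reflect any real catalog format or holdings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogRecord:
    # Hypothetical fields; a real bibliographic record carries many more.
    title: str
    author_last_name: str
    genre: str
    country: str
    year: int


def boolean_query(records):
    """Return records that satisfy every condition (a logical AND):
    fiction, written in Albania, published 1890-1919, author name starts with L."""
    return [
        r for r in records
        if r.genre == "fiction"
        and r.country == "Albania"
        and 1890 <= r.year <= 1919
        and r.author_last_name.upper().startswith("L")
    ]


# Invented sample records, for illustration only.
CATALOG = [
    CatalogRecord("Sample Novel A", "Lleshi", "fiction", "Albania", 1905),
    CatalogRecord("Sample Novel B", "Marku", "fiction", "Albania", 1910),
    CatalogRecord("Sample History C", "Leka", "history", "Albania", 1900),
    CatalogRecord("Sample Novel D", "Luca", "fiction", "Italy", 1895),
    CatalogRecord("Sample Novel E", "Lala", "fiction", "Albania", 1925),
]

matches = boolean_query(CATALOG)
for record in matches:
    print(record.title)
```

Each condition narrows the result set, so only the record that passes all four tests is selected. An OR or NOT condition could be added in the same way by changing the logical operators.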

3.4.1.2 Via Abstract

Access via a summary of the document. Most often, the summary is of a contribution to a journal (1.2.6) or other periodical (1.3.2). Such a summary is usually without interpretation or criticism, and may contain a bibliographic reference (or pointer) to the original document. A collection of document abstracts may be used for purposes of search and selection (e.g., Chemical Abstracts, published by the American Chemical Society and also available in digital electronic form).

3.4.1.3 Via Table of Contents

Access via a list of the parts contained in a document, such as chapter titles or articles in a periodical, with references by page number or other locator to the starting point of each part, usually listed in order of appearance. Collections of tables of contents may also be used for search and selection purposes.

Other parts of documents that may be used for search and selection purposes include:

3.4.1.4 Via List of Figures, Tables, Maps or Other Illustrations

Access via a list of those parts of a document that are figures, tables, maps, or other illustrations, with location reference by page number or other locator, usually ordered by location of appearance within the document. Figures, tables, maps, etc. may be listed separately. Usually, in a document, these lists follow the Table of Contents in some order.

3.4.1.5 Via Preface

Access via a note preceding the body of a document that usually states the origin, purposes, and scope of the work(s) contained in the document and may include acknowledgements of assistance. When written by someone other than the author(s) of the document, the preface is more properly termed a foreword.

3.4.1.6 Via Introduction

Access via the material that heads the body of a document and that provides an overview of the work that follows, or other introductory material to the text.

3.4.1.7 Via Index

Access via a systematically ordered collection of words or other terms or objects [27] contained within a document, with references by page number or other locator to the placement of the object within the document for purposes of accessing the object. The index is usually placed last in a document.

3.4.1.8 Via Citation

Access via reference to a document or to a part of a document, such as an article in a journal (1.2.6). A bibliography is a collection of citations directed to a specific purpose, such as a subject bibliography or a bibliography of citations appended to a journal article.

3.4.2. Full (or Partial) Document Access

In Full Document (or full text) searching, the full text of a collection of documents is stored, and the entire text of all or portions of the documents is searched for specific character strings, usually combined with some Boolean logical searching capabilities. This requires that the document be textually encoded (3.3.4.2), either because it was initially created that way or, perhaps more likely in the context of preservation, because the textual encoding was obtained from scanned document images (3.1.5.1) with internal character recognition (3.2.5). Thus, for example, a search may look for all documents in the collection published by a given author or set of authors between certain dates and containing the text "all that glitters." Full text searching is normally implemented on computers. For other than small collections of documents, a given search may be very costly in terms of computer processing time.
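The example search above might be sketched as follows in Python. This is a minimal sketch, not a description of any particular system: the StoredDocument fields, the full_text_search function, and the sample documents are all assumptions made for illustration. Note that the function reads every candidate document in full, which is why the cost grows with the size of the collection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredDocument:
    # Hypothetical structure for a textually encoded document.
    doc_id: str
    author: str
    year: int
    text: str


def full_text_search(collection, phrase, authors, first_year, last_year):
    """Return ids of documents by one of the given authors, published within
    the date range, whose text contains the phrase (case-insensitive)."""
    needle = phrase.casefold()
    hits = []
    for doc in collection:  # every document is examined in turn
        if doc.author not in authors:
            continue
        if not (first_year <= doc.year <= last_year):
            continue
        if needle in doc.text.casefold():  # scan of the whole text
            hits.append(doc.doc_id)
    return hits


# Invented sample collection, for illustration only.
COLLECTION = [
    StoredDocument("d1", "Author A", 1901, "It is said that all that glitters is not gold."),
    StoredDocument("d2", "Author A", 1950, "All that glitters may mislead."),
    StoredDocument("d3", "Author B", 1905, "Nothing here glitters at all."),
    StoredDocument("d4", "Author C", 1903, "All that glitters, again."),
]

result = full_text_search(COLLECTION, "all that glitters", {"Author A", "Author B"}, 1900, 1910)
print(result)
```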

3.4.2.1 Via Inverted Text File Index

Inverted Text Files (or other similar techniques) are often used as a compromise between indexed and full text searching. A file of words (Keyword), phrases (Key Phrase), or other text objects contained in a given collection of stored documents is created from an initial analysis of the full text, together with locators showing where every instance of the word, phrase, or other object can be found within the stored documents. In use, the full text is not searched for all occurrences of the object [28]; instead, the inverted file itself efficiently gives pointers to the locations. Constructing such an inverted file, however, may be expensive for large collections of documents, as may adding new words or other objects [29] to the file at a later date. Furthermore, the file is only as useful as the care given to choosing the objects it contains.
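A minimal Python sketch of an inverted file follows. It assumes that the indexed objects are single lowercase words and that a locator is a pair of document identifier and word position; the function names and the two sample documents are invented for illustration. Real systems choose their objects and locators with more care, as noted above.

```python
import re
from collections import defaultdict


def build_inverted_index(documents):
    """Analyze the full text once and map each word to a list of
    (doc_id, word_position) locators."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        words = re.findall(r"[a-z0-9']+", text.lower())
        for position, word in enumerate(words):
            index[word].append((doc_id, position))
    return dict(index)


def lookup(index, word):
    """Return the locators for a word without rescanning any document."""
    return index.get(word.lower(), [])


# Invented sample documents, for illustration only.
DOCUMENTS = {
    "doc1": "All that glitters is not gold",
    "doc2": "Gold is stored in the vault",
}

INDEX = build_inverted_index(DOCUMENTS)
gold_locations = lookup(INDEX, "gold")
print(gold_locations)
```

Building the index requires one pass over all the text, which is the up-front expense; each later lookup is a single dictionary access rather than a scan of the collection.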

3.4.3. Compound Document Access

Compound documents are documents that contain both textually encoded information and information encoded in other forms, including image (see 3.3.4). Techniques are being developed for expanding the concept of text searching to the searching of full compound documents, including those containing image objects [30]. A full glossary of such techniques, however, would be premature and is beyond the scope of this document.
