We do two types of digital reformatting: providing digital surrogates, and doing true reformatting.
With respect to videotapes, motion picture film, records, digital originals, and so forth: although these media aren't the subject of this talk, I want to point out that many of the issues I discuss apply to them as well, or at least have close analogues. For these media, however, we are more likely to be doing true reformatting (not just providing digital surrogates), because the original carriers are so susceptible to deterioration.
Our first priority, when undertaking a digital reformatting project, is not to create a new problem as serious as the one we are trying to solve.
To do this we look at three areas:
Media Integrity
Format Obsolescence
Information Fidelity
On the positive side, with digital files we can get around the problem of media instability, because we can make perfect "clones" of the digital information, unlike the copies we make in the analog world, which often suffer signal loss. So there are some fairly straightforward responses to digital media instability: regularly refreshing the host media, and keeping redundant copies (backups). In addition, with compact discs, for example, we can monitor something like the Block Error Rate (BLER), which measures the number of data blocks per second that have one or more bad symbols. That way, when we see the BLER rise too high, we can refresh the host.
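To make the point about perfect clones concrete, here is a minimal sketch of how a refresh cycle can be verified bit for bit; the file paths and the choice of SHA-256 are assumptions for illustration, not our actual procedure.

    import hashlib

    def sha256_of(path):
        """Compute a fixity value (SHA-256 digest) for one file."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    # After copying a master file to fresh media, confirm the clone is
    # bit-for-bit identical to the original -- something no analog
    # duplication process can guarantee.
    original = sha256_of("/masters/stanford_family_001.tif")    # hypothetical path
    refreshed = sha256_of("/refreshed/stanford_family_001.tif")
    assert original == refreshed, "refresh produced a corrupt copy"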
But I want to point out that for our unique materials, for example the scans of photographs of the Stanford family, the Preservation department does not recommend compact discs. When I started, we had a model inherited from the days of microform reformatting, in which master copies were deposited in salt mines, and our intention at that point was to send CDs of our TIFFs to Kansas. As I said just a moment ago, microfilm has a much higher life expectancy, and it would be a mistake to follow this model with digital media. This brings me to the second area.
A rather obvious example of format obsolescence from recent history would be a 5¼" floppy containing files in some old proprietary format such as XYWrite. From this example we can see that format obsolescence has two parts: the physical carrier (the 5¼" disk, which requires a working drive), and the logical file format (the XYWrite encoding, which requires software that can interpret it).
Of course this implies an ongoing reformatting program as well, but file formats based on open standards are far more stable than media formats, which depend on functioning hardware. Also, the effort required to reformat data files does not scale directly with the number of files, as it would in a media reformat. Imagine a large number of files, say 100,000 TIFFs. To reformat the whole lot we probably only have to execute one "batch" command, whereas a reformatting project of 100,000 floppy disks is almost inconceivable, given the labor cost of hand-feeding that many disks into a drive, to say nothing of the tedium that would result.
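As a sketch of that one-command claim, here is a minimal batch migration using the Pillow imaging library; the directory names and the choice of PNG as the target format are assumptions for the example only, not a recommendation.

    from pathlib import Path
    from PIL import Image  # Pillow imaging library

    src = Path("/masters/tiff")   # hypothetical source directory
    dst = Path("/masters/png")    # hypothetical target directory
    dst.mkdir(parents=True, exist_ok=True)

    # One loop migrates the entire collection, whether it holds ten
    # files or 100,000 -- the operator effort is essentially constant.
    for tif in src.glob("*.tif"):
        with Image.open(tif) as im:
            im.save(dst / (tif.stem + ".png"), format="PNG")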
Our experiments with SRM 1010a led us to conclude that this was too subjective a method of evaluating resolving power. It also led us to ask, "How much resolution is too much?" There is a point where one scans in so much detail that one can clearly see the shapes of the silver particles themselves, but we've calculated that we are nowhere near this resolution. What we discovered is that scanner manufacturers treat DPI as a selling point and make excessive, although legally accurate, claims. The key is that they state how many samples per inch they are capable of taking, but they don't say how big those samples are.
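To give a rough sense of why grain-level resolution is so far off, here is a back-of-the-envelope calculation; the 600 samples-per-inch setting and the roughly one-micron grain size are illustrative assumptions, not measurements from our lab.

    # Sample pitch at a typical scanning resolution, versus the size
    # of a photographic silver grain (on the order of a micron).
    microns_per_inch = 25_400
    samples_per_inch = 600                      # assumed scan setting
    sample_pitch = microns_per_inch / samples_per_inch
    print(f"each sample spans ~{sample_pitch:.0f} microns")   # ~42 microns

    grain_size = 1.0                            # assumed grain size, microns
    print(f"pitch is ~{sample_pitch / grain_size:.0f}x the grain size")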
Fortunately for us, the FBI had already determined the need for objective analysis of visual sharpness, and they developed a system using the Modulation Transfer Function (MTF) and a sine wave target. We are now starting follow-up studies with the program and target that the FBI developed to measure the quality of fingerprint scanners. What we hope to get out of this is, first, a better understanding of where the "sweet spot" is, the point at which we get maximum resolving power without generating additional false information, and second, a better tool for evaluating future acquisitions of digital imaging devices.
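For those unfamiliar with the measure, here is a minimal sketch of how modulation transfer is computed from a scanned sine wave target; the intensity values are invented for illustration, and this is the textbook definition rather than the FBI program's actual code.

    def modulation(i_max, i_min):
        """Michelson modulation (contrast) of a sine pattern."""
        return (i_max - i_min) / (i_max + i_min)

    # Modulation of the physical target at one spatial frequency,
    # and of the same patch as rendered in the scanned image.
    target_mod = modulation(i_max=230.0, i_min=25.0)    # assumed values
    scanned_mod = modulation(i_max=180.0, i_min=75.0)   # assumed values

    # MTF at this frequency: the fraction of the target's contrast the
    # scanner preserved. Repeating this across a range of frequencies
    # traces out the full MTF curve.
    mtf = scanned_mod / target_mod
    print(f"MTF = {mtf:.2f}")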
But the creation of the file is only half the story. Seeing doesn't take place on the printed page or on a computer monitor; it takes place in the mind. Seeing, especially seeing color, is an event. We all know this because we have seen that the color of a building looks quite different in late afternoon sunlight than it does at noon, and that an article of clothing that appears to be one color when viewed indoors in artificial light appears to be a different color when seen outside in natural light.
In the first half of the last century an international committee was formed to develop standards for describing color. The CIE, the Commission Internationale de l'Éclairage, or International Commission on Illumination, established, among other things, a standard observer, a sort of theoretical viewing event in which many of the variables that affect viewing can be factored out, and the ISO, the International Organization for Standardization, adopted the work of the CIE. More recently, the International Color Consortium (ICC) defined a file format for characterizing imaging devices in accordance with the work of the CIE. When we scan an image, we supply an ICC profile of the capturing device, which essentially describes how the device differs from the standard observer. An end user can then take the digital image, its ICC profile, and an ICC profile for their output device (either a monitor or a printer), and hope to faithfully recreate a seeing event similar to the CIE standard observer viewing the original photograph.
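As a sketch of how an end user might apply those profiles, here is a minimal example using Pillow's ImageCms module; the file name is hypothetical, and a real workflow would use the measured profile of the viewer's own display rather than the generic sRGB profile created here.

    import io
    from PIL import Image, ImageCms

    im = Image.open("stanford_family_001.tif")   # hypothetical scan

    # The scanner's ICC profile, embedded in the file at capture time,
    # describes how the device differs from the standard observer.
    device_profile = ImageCms.ImageCmsProfile(
        io.BytesIO(im.info["icc_profile"]))

    # Stand-in for the viewer's display profile.
    display_profile = ImageCms.createProfile("sRGB")

    # Transform the image from device color to display color so the
    # on-screen viewing event approximates seeing the original.
    transform = ImageCms.buildTransform(
        device_profile, display_profile, "RGB", "RGB")
    corrected = ImageCms.applyTransform(im, transform)
    corrected.save("stanford_family_001_srgb.tif")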
I wanted to finish by talking about metadata, which is tied up with all three areas I have discussed. The ICC profiles I just mentioned, for example, are important metadata components, but metadata is also essential for avoiding data loss due to media integrity problems or format obsolescence. Apparently, older records for both the general collection and special collections use very broad terms like "motion picture," which could mean 16 mm color polyester film, 35 mm B&W acetate film, or even PAL-format VHS. When we determine that we need to reformat all the U-matic videotapes, how do we locate them? Similarly, in the digital realm, if we have holdings on CD we need to know that, as well as whether they are ISO 9660, use the Joliet extensions, or follow the Red Book specification. We also need to know whether the files are encoded as TIFF, GIF, or PDF. General information like "computer file" is not going to be enough for us to manage all of the reformatting and data migration tasks we face.
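As a sketch of the kind of granularity we need, here is a hypothetical holdings record and query; the field names and values are invented for illustration and do not reflect our actual catalog schema.

    from dataclasses import dataclass

    @dataclass
    class HoldingRecord:
        call_number: str
        carrier: str       # e.g. "U-matic videotape", "CD-R"
        filesystem: str    # e.g. "ISO 9660 + Joliet", "Red Book audio"
        file_format: str   # e.g. "TIFF"; "n/a" for analog carriers

    holdings = [
        HoldingRecord("MV-0001", "U-matic videotape", "n/a", "n/a"),
        HoldingRecord("CD-0042", "CD-R", "ISO 9660 + Joliet", "TIFF"),
    ]

    # With carrier-level metadata, locating every U-matic tape for a
    # reformatting campaign is a one-line query; with a record that
    # says only "motion picture" or "computer file", it is a shelf read.
    umatics = [h for h in holdings if h.carrier == "U-matic videotape"]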
LE (life expectancy), as defined in ANSI/NAPM IT9.13-1996: "The length of time that information is predicted to be retrievable in a system under extended term storage conditions."