Re: [ARSCLIST] long range file storage
The Sound Directions project at Indiana University and Harvard
University is also investigating the use of checksums (MD5 and SHA-1,
for example). We will investigate data integrity checking both for
interim storage (within our archives) and for long-term mass storage.
We're just getting started, but I expect we will have data on
performance, implications for workflow, etc. next year.
www.dlib.indiana.edu/projects/sounddirections/
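
As a rough illustration of what computing such checksums involves, the
Python sketch below reads a file once and produces both an MD5 and a
SHA-1 digest. The function name `file_digests` and the 1 MiB chunk size
are assumptions made for this example, not details of the Sound
Directions workflow.

```python
import hashlib


def file_digests(path, algorithms=("md5", "sha1"), chunk_size=1024 * 1024):
    """Return {algorithm: hex digest} for the file at `path`.

    The file is read once in fixed-size chunks, so large audio files
    do not need to fit in memory. (Hypothetical helper for illustration.)
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Feeding every chunk to both hashers means each additional algorithm
costs CPU time but no extra disk reads, which is likely to matter when
measuring performance on large files.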
----------
Mike Casey
Coordinator of Recording Services
Archives of Traditional Music
Indiana University
(812) 855-8090
-----Original Message-----
From: Association for Recorded Sound Discussion List
[mailto:ARSCLIST@xxxxxxx] On Behalf Of Damien Moody
Sent: Monday, August 15, 2005 7:03 AM
To: ARSCLIST@xxxxxxxxxxxx
Subject: Re: [ARSCLIST] long range file storage
My department is investigating the use of checksums for the National
Audio Visual Conservation Center. Checksums are a good way to validate
files. Our concern here, though, is that our archive will be so large
and process so much data that we may not be able to create and compare
checksums for every file, perhaps only for one to a few percent of them.
We may just have to accept a certain amount of file corruption risk. But we'll
surely continue investigating ways to ensure file integrity for
extremely large archives.
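
One way to picture checking only a small percentage of files is a random
spot check. The Python sketch below is a minimal illustration, not the
Library of Congress approach: the names `sha1_of` and `spot_check`, the
in-memory `manifest` dictionary mapping each path to its stored SHA-1
digest, and the 1% default are all assumptions.

```python
import hashlib
import random


def sha1_of(path, chunk_size=1024 * 1024):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    hasher = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def spot_check(manifest, fraction=0.01, rng=None):
    """Verify a random sample of files against their stored digests.

    manifest: dict of {path: expected SHA-1 hex digest} (assumed format).
    fraction: share of files to check in this pass (at least one file).
    Returns the sampled paths that are unreadable or no longer match.
    """
    rng = rng or random.Random()
    paths = sorted(manifest)
    if not paths:
        return []
    sample_size = min(len(paths), max(1, round(len(paths) * fraction)))
    failed = []
    for path in rng.sample(paths, sample_size):
        try:
            ok = sha1_of(path) == manifest[path]
        except OSError:
            ok = False  # a missing or unreadable file counts as a failure
        if not ok:
            failed.append(path)
    return failed
```

Under this sampling scheme a given damaged file escapes any single pass
with probability of about 1 - fraction. With fraction = 0.01 it would
still be undetected after 100 independent passes roughly 37% of the time
(0.99^100), which gives one way to quantify the corruption risk being
accepted.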
Damien J. Moody
Information Technology Services
Library of Congress
>>> seubert@xxxxxxxxxxxxxxxx 08/11/05 11:04 PM >>>
We are currently creating parallel preservation copies, both online and
on optical media, but eventually I see us phasing out the physical
media. Before we do that, one thing I feel is necessary and that we have
been discussing is the integrity of the data stored online. Once the
data goes onto disk, there is no practical way to manually make sure
that files haven't become corrupted over time, during a backup and
restore process, or during a migration from one system to another. We've
discussed using checksum files created upon ingest that would be
periodically and automatically compared against the files to ensure that
nothing has become corrupted. In case of corruption, the original file
could be restored from tape. I've noticed that the audio files in the
Internet Archive have associated checksum files so you can make sure
that the file you have downloaded is identical to the original. I don't
know if they also use these to ensure data integrity over the long term.
Has anybody looked into this further or implemented this for archiving
audio files?
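
The ingest-time checksum files and periodic comparison described above
could be sketched as follows in Python. This is a hedged illustration
only: the JSON manifest format, the function names `write_manifest` and
`verify_manifest`, and the choice of SHA-1 are assumptions, not a
description of any existing archive's system.

```python
import hashlib
import json
import os


def sha1_of(path, chunk_size=1024 * 1024):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    hasher = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def write_manifest(root, manifest_path):
    """At ingest: record a digest for every file under `root`.

    Keep `manifest_path` outside `root` so the manifest does not
    end up listing itself on a later run.
    """
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha1_of(full)
    with open(manifest_path, "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2, sort_keys=True)
    return manifest


def verify_manifest(root, manifest_path):
    """Periodically: recompute digests and report files needing restore.

    Returns {relative path: "missing" or "changed"}; empty means all good.
    """
    with open(manifest_path, encoding="utf-8") as f:
        manifest = json.load(f)
    problems = {}
    for rel_path, expected in manifest.items():
        full = os.path.join(root, rel_path)
        if not os.path.exists(full):
            problems[rel_path] = "missing"
        elif sha1_of(full) != expected:
            problems[rel_path] = "changed"
    return problems
```

A scheduled job could call `verify_manifest` after each backup, restore,
or migration, and any path it reports would be a candidate for
restoration from tape.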
David Seubert
UCSB