Re: [ARSCLIST] Digitizing libraries



This is a very interesting post; just one very quick comment. I have been a consultant for the Library of Congress for about 5 years now, and I can tell you for sure that those quotations of space are just, well, silly. Since the Library does not even have a full accounting of exactly how large the collection is, and because it grows every minute (literally), these "estimates" really have no basis in fact. The Library's collection includes many more types of objects than books. And even if you just consider the books, they are in many different languages, and what about the pictures in the books? There are illuminated manuscripts. In the National Audio Visual Conservation Center being built in Culpeper, Virginia, the estimate is that many terabytes a day will be generated in the transfer of analog carriers. So the planning is in petabytes on an annual basis.

As far as Ultra HD: that is an experimental format that has been developed by NHK and is not a broadcast format, and to the best of my knowledge it is primarily a research and development project.

I would have loved to attend the lecture on the St. Catherine digitization project; I am personally very interested in that project. One comment about the operators used for digitization, and this is from someone who has supervised many people who have that precise job. There are many different types of materials, and they require correspondingly different skill levels. One of the overarching issues is not only the operator skill level, but also the standards set for a project and the quality control and system (by system I mean the ENTIRE system) used to maintain them. While humans are great at certain things, they are not great at everything, and as a matter of fact the task of digitization is often a boring job on which it is difficult to maintain concentration. Sure, if all the material were fascinating and we all had perfect days at work and at home, and if humans could maintain a verifiable (meaning measurable) level of focus, that would be great. But, for example, for a human to really be able to note material condition on a second-by-second basis is just not possible. So after one has done this for a long time, you get to the point where you look elsewhere to ensure quality control and consistency of work over time. Generally that means systems that are either automated or semi-automated to assist the operator.

I have not had time to comment on the threads on the Digital Black Hole, but I have been saying for MANY years that the cost of storage is only a small part of the full cost of maintaining a digital (or analog) archive. As time goes on, the proportion of the cost of storage relative to other costs will continue to shrink. This should not be a big surprise to anyone, really. It may be a paradigm shift, but it is not all that different than what has been happening in the IT field for many decades. One of the "good news" items in all this is that because computers are so widely used, there is tremendous purchasing power that continues to drive prices down, and most expect that trend to continue. Technology tends to get smaller, better, faster, and cheaper over time. Society is making more AV "stuff" than ever before; we need to continue the shift to an IT environment so that we can manage it all.



Jim Lindner

Email: jim@xxxxxxxxxxxxxxxxx

  Media Matters LLC.
  SAMMA Systems LLC.
  450 West 31st Street 4th Floor
  New York, N.Y. 10001

eFax (646) 349-4475
Mobile: (917) 945-2662
Office: (212) 268-5528

www.media-matters.net
Media Matters LLC. is a technical consultancy specializing in archival audio and video material. We provide advice and analysis to media archives that apply the beneficial advances in technology to collection management.


www.sammasystems.com
SAMMA Systems provides tools and products that implement and optimize the advances in modern technology with established media preservation and access practices.



On Dec 13, 2006, at 9:56 AM, Karl Miller wrote:


"Steven C. Barr(x)" <stevenc@xxxxxxxxxxxxxx> wrote:

>If this implies what I suspect it may...it gets me thinking about
>a further possibility! Since sound files for the most part start out
>in digital form...and image files (as well as possibly text files,
>such as books...!) can be converted to digital files by
>scanning...how long will it be before libraries are converted to
>institutions with huge multi-disc servers...Will future libraries be
>measured in terabytes (or whatever follows those...?!)...

There was a time when the text content of the Library of Congress was used as a point of reference to convey the size of a terabyte. As I recall reading some years ago, it was stated that if the entire text of all of the material in the Library of Congress were converted to ASCII, it would require about 4 or 5 terabytes of storage. Most recently, Wikipedia suggests it is about 20 terabytes.


Also from Wikipedia...one hour of uncompressed "ultra" high def video takes approximately 11.5 terabytes. While it might not be time for us to think in terms of zettabytes or yottabytes, we might need to think in terms of petabytes.
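
To put the figures quoted above side by side, here is a rough sketch of the arithmetic. The 20 terabyte and 11.5 terabyte-per-hour numbers are the ones cited in this message; the use of decimal units (1 petabyte = 1,000 terabytes) and the function names are assumptions made only for illustration.

```python
# Back-of-envelope comparison of the storage figures quoted in the message.
# Assumption: decimal units, 1 PB = 1,000 TB.
TB_PER_PB = 1_000

LOC_TEXT_TB = 20             # quoted estimate for Library of Congress text
ULTRA_HD_TB_PER_HOUR = 11.5  # quoted figure for uncompressed "ultra" high def


def hours_per_petabyte(tb_per_hour: float) -> float:
    """Hours of video that fit in one petabyte at the given data rate."""
    return TB_PER_PB / tb_per_hour


def hours_equivalent(size_tb: float, tb_per_hour: float) -> float:
    """Hours of video that occupy the same space as size_tb terabytes."""
    return size_tb / tb_per_hour


if __name__ == "__main__":
    print(round(hours_per_petabyte(ULTRA_HD_TB_PER_HOUR), 1))             # 87.0
    print(round(hours_equivalent(LOC_TEXT_TB, ULTRA_HD_TB_PER_HOUR), 2))  # 1.74
```

On these numbers, under two hours of such video would occupy as much space as the quoted text estimate, and a petabyte would hold roughly 87 hours.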

As to the role of libraries in all of this: many libraries outsource their digital storage. It makes sense in that they don't have the infrastructure to deal with it...level of salaries, expertise, hardware, etc. Computer providers have all of the above, so they will be (and already are) our libraries.

Two nights ago I attended a lecture...there is a project to digitize the approximately 4,500 volumes of the library of the Monastery of St. Catherine at the base of Mount Sinai. They still have lots of money to raise, but they estimate it will take 5 workstations about 5 years to do the job. OK, we aren't talking regular books, we are talking fragile material...perhaps not unlike dealing with a glass-based lacquer with some cracking...well, I would guess the lacquer would be more problematic. So, it will take time...my guess is that it will take them much longer than they estimate.

In short, who is going to do all of this work? Who is going to train the people? The presenter said they plan to use some of the local Bedouins to do the job...I am reminded of those who would use work study students to do audio transfers...I am reminded of our library director who places no value on the skill sets required to do audio reformatting.

The folks working on the St. Catherine's project are having to design their own scanning workstations...with an estimated cost of about $150,000 a workstation...then the cost of salaries of those doing the work...insurance, meetings, training, etc. Who has such a large checkbook for something that might be of interest to Biblical scholars around the world...how many biblical scholars are there? What will be the final cost, per scholar, of scanning those 4,500 books?
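
The "cost per scholar" question can be framed as simple arithmetic. The sketch below uses only the figures given in this message (4,500 volumes, 5 workstations, about 5 years, about $150,000 per workstation). It covers workstation hardware only; salaries, insurance, training, and the number of scholars are not stated, so the scholar count is left as a parameter. The function names are hypothetical.

```python
# Rough arithmetic on the St. Catherine's figures quoted in the message.
# Hardware only: salaries, insurance, training, etc. are not given.
VOLUMES = 4_500
WORKSTATIONS = 5
YEARS = 5
COST_PER_WORKSTATION_USD = 150_000


def hardware_cost(workstations: int, unit_cost: float) -> float:
    """Total workstation cost in dollars."""
    return workstations * unit_cost


def volumes_per_workstation_year(volumes: int, workstations: int, years: float) -> float:
    """Throughput each workstation must sustain to finish on schedule."""
    return volumes / (workstations * years)


def cost_per_scholar(total_cost: float, scholars: int) -> float:
    """Cost per interested scholar; the scholar count is the reader's own estimate."""
    return total_cost / scholars


if __name__ == "__main__":
    total = hardware_cost(WORKSTATIONS, COST_PER_WORKSTATION_USD)
    print(total)                                                          # 750000
    print(volumes_per_workstation_year(VOLUMES, WORKSTATIONS, YEARS))     # 180.0
    print(round(total / VOLUMES, 2))                                      # 166.67
```

So the stated plan implies about 180 volumes per workstation per year and roughly $167 of hardware per volume before any labor is counted.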

My guess is that they will never finish the project. The presenter also spoke enthusiastically about scanning two other major libraries. Of course there is much to be said for the planned imaging technology which will be applied to these pages...the reading of texts which had been written below the most readable texts, those older texts having been washed off in order to reuse the parchment, being a major consideration.

I referred him to Jonas Palm's "Digital Black Hole."

While I am just thinking out loud...I wonder, by the time such projects are done, what will be the state of the files of the first pages scanned...will that data be error free...will we have changed file formats...will our indexing modalities be the same...will our imaging technology have evolved to provide us with even greater clarity? Of course these are concerns which those of us in audio preservation have considered from the first time we were able to reformat.

For me, there are some fascinating questions. When is a library not a library? My answer is, when the information is digitized. When it is digitized it becomes magnetic storage in a computing facility. Hence, libraries are now becoming coffee bars, cafes, lounges, movie theaters, etc. Ah, now it all makes sense to me!

And, if you want a book, you go to Barnes and Noble or amazon.com or abebooks.com or...

Karl


