Re: [AV Media Matters] Answers to Some Technical Questions - WARNING - LONG A
At 04:45 PM 07/05/2001 +0000, James Lindner wrote:
I LOVE "The Three Cs!" That was a great explanation! I agree with Jim
Wheeler that everyone should save a copy of that message!
I think we all are in agreement that the compression on DigiBeta is about
as benign as it comes. Perhaps the only other compressed format that is
cleaner is Ampex DCT but that is such a pricey fringe format that it's hard
to recommend its use at all--although I've used it on some projects for
very specific reasons.
I've interspersed some inline comments below on the file-based archiving discussion.
>There have been some comments about having a file-based system instead of
>a tape-based system. This is an involved discussion but a few
>comments. The jump between a tape based format and a file-based format
>is not simple and NOT necessarily transparent.
This is definitely true. If I somehow implied that it was simple, I
apologize. My original point in raising this option was that if someone is
planning to transfer a huge amount of data from one format to another, a
digital archive is perhaps a better solution than yet another dedicated
digital tape format.
>In the [snipped] example above [relating to error correction and
>concealment] we saw how the data goes through several stages. This is not
>necessarily the same with file based systems, meaning that file based
>systems generally do not have the concealment phase. One could argue that
>they don't have to, but depending on a number of things, concealment of
>errors in picture information is important.
Going back to your original example of the paycheck (is it $100 or $1000,
with no one happy either way if the computer decides it's $550): it appears
that the data industry--banking, stock markets, CAD designers, legal and
medical references, and other digital library and archive
infrastructures--has solved this problem. As one colleague on the AES
Technical Committee for Archives, Restoration, and Digital Libraries, Dr.
Henry Gladney, is fond of pointing out when I ask him similar questions,
this is a well-known problem of data center management, and managing to
preserve all bits is common today. One must make the serious leap from
being a shelf-based archive to being an information technology (IT) data
center to successfully adopt and maintain the digital archive.
Whether or not this is a plus is TBD. We certainly know that IT operations
are not 100% foolproof, but many users have very respectable records in
this area.
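The data-center practice Dr. Gladney alludes to is usually implemented as
periodic fixity checking: record a digest for every file at ingest, then
re-hash on a schedule and flag anything whose bits have drifted. A minimal
sketch (the file names, manifest shape, and choice of SHA-1 are my own
illustrative assumptions, not any particular vendor's system):

```python
import hashlib

def fixity(path, chunk=1 << 20):
    """Compute a SHA-1 digest of a file in chunks, the way a data
    center might audit stored assets for silent corruption."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            h.update(block)
    return h.hexdigest()

def audit(manifest):
    """manifest: dict mapping path -> digest recorded at ingest.
    Returns the list of files whose bits no longer match."""
    return [p for p, d in manifest.items() if fixity(p) != d]
```

With a scheme like this, the "$550 paycheck" simply cannot pass an audit
unnoticed; the mismatched digest tells you the bits changed, even if it
cannot tell you which value was right.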
>Most file based systems at
>this time use lossy compression, which most in the archival circles feel
>is a problem and most in the professional broadcasting circles have
>accepted for many years now. I am not going to step in the middle of
>this one at the moment; let's say that there is room for several points
>of view, but the important thing is that while file based systems
>certainly have many advantages, they also have several disadvantages on a
>practical basis at this particular moment in technology.
I think it's safe to say that any compression system that works on a
dedicated-format tape can also be written to a file. In fact, DigiBeta uses
the same basic tape system as Sony's DTF (I hope I have the correct
acronym; my alphabet soup is cloudy like the rest of LA today). The two use
different, but related, tape mechanisms and totally different electronic
systems. It's still bits on the tape. Think of Audio CDs and Data CD-ROMs
for a simplistic example.
The amount of loss in any compression scheme is related to the quality of
the compression scheme and to the ratio of the input to output data rates.
Program material also plays a role. Noise in the original plays a great
role. Certain program materials are easier to compress than others. Schemes
which code single frames achieve poorer compression ratios for the same
perceived picture quality than systems that look at multiple frames at a
time, although the latter present greater challenges for reconstruction and
frame-accurate editing.
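To make the input-to-output ratio point concrete, here is the arithmetic
with round illustrative numbers (these are ballpark figures of my own, not
exact format specifications; ITU-R 601 SD video is roughly 270 Mbit/s on
the wire, with something like 170 Mbit/s of active picture):

```python
def compression_ratio(input_mbps, output_mbps):
    """Ratio of input to output data rate. All else being equal,
    a higher ratio means more bit reduction and therefore more
    opportunity for visible loss."""
    return input_mbps / output_mbps

# Illustrative comparisons, assuming ~170 Mbit/s of active picture:
#   mild, DigiBeta-class reduction:  ~170 -> ~90 Mbit/s, about 2:1
#   DV25-class reduction:            ~170 -> ~25 Mbit/s, about 7:1
#   browse-proxy-class reduction:    ~170 ->  ~4 Mbit/s, about 40:1
```

The point of the sketch is only that a 2:1 intraframe scheme and a 40:1
proxy are not remotely the same archival proposition, whatever the codec.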
> so one can't unfortunately simply say "not putting it on a file based
>system is a…"
I agree! I think the mistake is not fully evaluating file-based,
data-domain solutions when one has a large archive to migrate. Doing it now
is certainly at the leading if not the bleeding edge of the curve, but
making a dedicated-format tape copy without invoking a file-based structure
just means that you'll have to transfer the entire archive with some degree
of manual intervention in the future. We need to think of the 100+ year
solution not the 20 year solution in some instances. In other instances,
solving the problem for 20 years is more than adequate.
>In fact most of the systems that I have seen that are in
>actual use in the real world that ARE file based are really hybrid…
By hybrid, I am assuming that Jim means a hybrid of disk and tape, and that
is certainly part of a digital archive or digital library infrastructure.
There are software solutions that address this hierarchical storage
management (HSM) model. Both EMC, through their acquisition of Avalon, and
IBM, through their relationship with Tivoli, are major players in this field.
These systems also allow for completely automated migration from one
storage format to another. They also monitor media usage and can schedule
media cloning after a certain number of passes, for example.
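The pass-counting behavior can be pictured with a toy model like the one
below (the class names, threshold, and API are entirely my own invention
for illustration; they are not the EMC/Avalon or Tivoli interfaces):

```python
class TapeVolume:
    """Toy model of an HSM pass counter: after a set number of read
    passes, the volume is flagged for cloning to fresh media."""

    def __init__(self, volume_id, clone_after=500):
        self.volume_id = volume_id
        self.clone_after = clone_after  # hypothetical wear threshold
        self.passes = 0

    def record_pass(self):
        """Called each time the volume is mounted and read."""
        self.passes += 1

    def needs_clone(self):
        return self.passes >= self.clone_after

def clone_schedule(volumes):
    """Return the IDs of volumes due for migration to fresh media."""
    return [v.volume_id for v in volumes if v.needs_clone()]
```

The real systems do this invisibly in the background; the value for an
archive is that media wear becomes a managed number instead of a guess.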
There is another view of hybrid and this has several sub-groups. One
example is a system that needs to provide browsing of any given asset for a
large number of users. A prevalent solution to this challenge of bandwidth
is to provide a lower-bitrate (but still frame-accurate) proxy of the
higher bitrate archival copy. This works for "born digital" assets that are
in the bit domain from their inception as well as for retrospectively
ingested analog and dedicated-format digital video tapes. Where this model
falls short is the archivists' concern that no current scanning technology
is suitable for the preservation of 35mm and wider film. However, that does
not preclude yet another level in the HSM where the original optical film
is also cross-referenced as the ultimate parent. In this case the digital
archive could also contain multiple levels of quality at varying bitrates,
including the browse proxy and conceivably a high-resolution digital copy
(24P?) suitable for production in any current or near-term future video
standard. The original film could then be (figuratively if not literally)
put in the deep freeze and only accessed when someone wanted to make an
IMAX presentation print (or something like that, but slightly less extreme).
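That multi-level model is easy to picture as a parent/child record: one
asset, one ultimate parent (the film element), and several derivative
copies at different bitrates. A sketch, with field names of my own
choosing:

```python
from dataclasses import dataclass, field

@dataclass
class AssetLevel:
    label: str     # e.g. "browse proxy", "24P master"
    mbps: float    # nominal bitrate of this copy
    online: bool   # True if on disk/tape in the HSM, False if on the shelf

@dataclass
class Asset:
    title: str
    parent: str                        # ultimate parent, e.g. "35mm OCN"
    levels: list = field(default_factory=list)

    def best_online(self):
        """Highest-bitrate copy that can be served without pulling
        the original film out of the deep freeze."""
        online = [l for l in self.levels if l.online]
        return max(online, key=lambda l: l.mbps, default=None)
```

A browse request gets the proxy; a production request gets the best online
master; only the IMAX-print case ever touches the parent film.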
>I personally feel that file based will be the way to go but
>at this particular moment there are really no agreed upon standards,
>although there is movement in that direction. Even the metadata
>standards are not quite set in "stone" as of yet. You might say who
>cares?
We all care! There are strong movements afoot to standardize--in an
extensible manner--many of these areas. The one thing that I believe is
critical is that whatever bit-reduction schemes are accepted must be
well-documented and have their decoding schemes available in software (even
if a hardware accelerator is used in practical systems). This will ensure
that the signal can be decoded at an arbitrary point in the future.
The Society of Motion Picture and Television Engineers (SMPTE)--with wide
support--has developed a universal metadata dictionary (see
http://www.smpte-ra.org) and an organization to manage it. This supports
the Key-Length-Value (KLV) file structure that is eminently extensible.
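KLV's extensibility comes from how regular the structure is: a 16-byte
SMPTE key, a BER-encoded length, then the value, repeated. A minimal
sketch of the short-form/long-form length handling (the key bytes in the
test are dummies, not real dictionary entries):

```python
def klv_encode(key16, value):
    """Pack one KLV triplet: 16-byte key, BER length, value bytes."""
    assert len(key16) == 16
    n = len(value)
    if n < 128:                          # BER short form: one length byte
        length = bytes([n])
    else:                                # BER long form: count byte + length
        body = n.to_bytes((n.bit_length() + 7) // 8, "big")
        length = bytes([0x80 | len(body)]) + body
    return key16 + length + value

def klv_decode(buf):
    """Unpack one triplet; returns (key, value, bytes_consumed)."""
    key, i = buf[:16], 16
    first = buf[i]
    i += 1
    if first < 128:
        n = first
    else:
        count = first & 0x7F
        n = int.from_bytes(buf[i:i + count], "big")
        i += count
    return key, buf[i:i + n], i + n
```

Because a reader that doesn't recognize a key can still skip exactly
`length` bytes, unknown metadata passes through untouched, which is what
makes the scheme so friendly to future extension.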
MPEG-7 is coming along as a metadata standard, and some of the same people
working on the MPEG-7 committees are also involved with the Advanced
Authoring Format (AAF, http://www.aafassociation.org), so we are hoping
for some harmonization there. Additional harmonization work--and I am
peripherally involved in this one--between AAF and AES-31 has been initiated.
If metadata is retained in a granular-enough form, then transcoding tables
MAY take care of many of the issues. If simple transcoding fails, then more
sophisticated parsing may solve the translation problems. We need to be
careful as we enter metadata that we don't make it difficult to transcode
and/or parse. We also have to go back to Lewis Carroll (pardon me if I
butcher the quote, which is actually from "Through the Looking-Glass") and
address Humpty Dumpty's concept that a word "means just what I choose it to
mean, neither more nor less." We need to bridge
the gap between archivists in our field and library scientists and other
linguistics-oriented professionals who understand how words relate to other
words. For example, if I'm looking for something in Florida, and the
location field says "Southeast" the system needs to know that Florida is a
member of the superset "Southeast." Yes, there are many subtle but thorny
issues that need to be addressed, but we should not be afraid of them and
sit and wait for someone else to take the first step. I think the
technology initiatives that are underway now will support digital libraries
for rich media collections.
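The Florida/Southeast example is really just a walk up a broader-term
hierarchy of the kind library scientists build into thesauri. A minimal
sketch (the thesaurus entries here are invented for illustration):

```python
# Invented broader-term thesaurus, for illustration only
BROADER = {
    "Miami": "Florida",
    "Florida": "Southeast",
    "Southeast": "United States",
}

def matches(term, query):
    """True if `term` equals `query` or falls under it in the
    broader-term hierarchy, so a search for 'Southeast' finds
    records whose location field says 'Florida'."""
    while term is not None:
        if term == query:
            return True
        term = BROADER.get(term)
    return False
```

The hard part, of course, is not the lookup but agreeing on the hierarchy;
that is exactly where the archivists and the linguistics-oriented
professionals need each other.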
>It is very important because you should NOT assume that one digital file
>format would necessarily be perfect when migrated to another. This is a
>very major point. Anyone who has migrated data over the years will have
>war stories to tell you about trying to migrate word processing files
>from some old relic to a new old relic… It is not necessarily a
>perfect thing, particularly when there is compression involved. So just
>because video or audio information is saved as data on a file system
>somewhere does NOT necessarily mean that it is safe or that it can
>necessarily be migrated without any problems. Maybe it can, maybe it can't.
I think that with some of the guidelines that I've provided above and the
intent to make it happen properly, that this will be less of an issue. For
example, if we choose the DigiBeta (assuming there to be a software
decoder), MPEG-2, or DV compression formats, then it should be fairly easy
to keep around a decoder that will functionally decode these in the future. In
fact, the AAF file format allows for the embedding of such a codec so it's
already there in the "briefcase" of the file. Obviously transcoding the
essence of any file from one compression standard to another will have all
of the same problems in a file-based system as it will in a
dedicated-format tape-based system. The question of how such a decoder
can be long-lived is addressed in the work done by Raymond Lorie of IBM and
his software concepts for making not only data but programs survive into
the future. While it may not be practical to create a "Universal Virtual
Machine" it still seems like it should be practical to continue to migrate
dedicated software "viewers" for the few formats that will actually be
used. Certainly that will be easier than maintaining the 72 or however many
formats of videotape that Jim has so well documented on his slide-rule handout!
>I will go back to work now…
I think I'd better, too, although I discuss this around work as well. There
ARE issues of economic models and how this all works, but these, too, will
come to be resolved at some point. Thanks, Jim, for a great post and some
interesting discussion!
(these are my own opinions and do not necessarily reflect the opinions of
my employer)