Re: [ARSCLIST] .wav file content information - chunks

To: ARSCLIST@xxxxxxxxxxxx
Subject: Re: [ARSCLIST] .wav file content information - chunks
From: dave nolan <DaveNolanAudio@xxxxxxxxxxxxx>
Date: Thu, 24 Mar 2005 12:09:48 -0500
Comments: To: Association for Recorded Sound Discussion List <ARSCLIST@loc.gov>
In-reply-to: <200503240500.j2M5ShcG006234@sun8.loc.gov>
Message-id: <BE685E8C.1072%DaveNolanAudio@earthlink.net>
Reply-to: Association for Recorded Sound Discussion List <ARSCLIST@xxxxxxx>
User-agent: Microsoft-Entourage/10.1.0.2006
Hello all - 

I am not enough of a "computer bits" whiz to know the real bit-by-bit blood
and guts of BWF metadata, but...

As best I can understand from the EBU BWF standards, a BWF file can carry
several "chunks" of metadata. The main one is the "bext" chunk, which
contains the internationally agreed-upon minimum metadata; files holding
MPEG audio also carry an "mext" chunk.

Additionally there appears to be a way to define chunks that are unique to
the needs of the originating client/institution:

******************************************

(from 
http://www.ebu.ch/en/technical/publications/userguides/bwf_user_guide.php):

What is a "chunk"?
A "chunk" is a self contained collection of data in a RIFF file. It contains
a header, which gives its type and length, followed by data arranged in
fixed or variable length fields.
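
As a rough illustration (not from the EBU guide), the header-plus-data layout
described above can be walked with a few lines of Python. This is only a
sketch: it assumes a well-formed RIFF/WAVE stream with little-endian 32-bit
lengths and odd-length chunks padded to an even boundary, and the names
iter_chunks and make_chunk are made up for the example.

```python
import io
import struct

def iter_chunks(stream):
    """Yield (chunk_id, data) for each top-level chunk in a RIFF/WAVE stream."""
    riff, _size, form = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    while True:
        header = stream.read(8)          # 4-byte type + 4-byte length
        if len(header) < 8:
            break                        # end of file
        chunk_id, length = struct.unpack("<4sI", header)
        data = stream.read(length)
        if length % 2:
            stream.read(1)               # skip the pad byte after odd-length data
        yield chunk_id.decode("ascii"), data

def make_chunk(chunk_id, data):
    """Build one chunk (helper used only for the demo below)."""
    pad = b"\x00" if len(data) % 2 else b""
    return struct.pack("<4sI", chunk_id, len(data)) + data + pad

# Tiny in-memory demo file; the chunk contents are dummy bytes, not real audio.
body = b"WAVE" + make_chunk(b"bext", b"demo!") + make_chunk(b"data", b"\x00\x00")
demo = struct.pack("<4sI", b"RIFF", len(body)) + body

for chunk_id, data in iter_chunks(io.BytesIO(demo)):
    print(chunk_id, len(data))
```

For a real file, open it in binary mode and pass the file object to
iter_chunks instead of the in-memory demo.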

What is the extra chunk in the BWF?
The "Broadcast extension" chunk, coded "bext", is contained in all BWF
files. It contains the minimum information expected to be needed by all
applications in broadcast production. It contains information on the title,
origination, date, time, etc. of the audio content. A BWF file containing
MPEG audio data also includes a further extra chunk "mext".

Should it be BEXT or bext, MEXT or mext?
Fairly late in the development, it was decided that the lower case was
correct i.e. <bext> and <mext> not <BEXT> and <MEXT>. It appears that
IBM/Microsoft originally intended to use upper case for registered chunks.
However it seems that there have been no new chunks registered since 1994,
according to the documentation on the Microsoft website.

******************************************

Can I add extra information or metadata about my programme to a BWF file?
You can add any valid chunk to a BWF file. However, the extra chunks will
only be interpreted by applications which are programmed to do so. Other
applications will ignore the contents of these chunks. The EBU intends to
register a limited number of extra chunks for specific application in
broadcasting. The current types are given below.
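
To make the "add any valid chunk" idea concrete, here is a minimal sketch (my
own, not from the EBU guide) that appends an institution-specific chunk to
the end of a WAVE file held in memory and rewrites the RIFF size field. The
chunk ID "arch" and its payload are hypothetical, and the sketch assumes the
input is a well-formed file whose existing chunks are already padded.

```python
import struct

def add_chunk(wav_bytes, chunk_id, payload):
    """Return a copy of wav_bytes with one extra chunk appended."""
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    if len(chunk_id) != 4:
        raise ValueError("chunk IDs are exactly four bytes")
    pad = b"\x00" if len(payload) % 2 else b""      # keep chunks word-aligned
    chunk = struct.pack("<4sI", chunk_id, len(payload)) + payload + pad
    body = wav_bytes[8:] + chunk                     # "WAVE" + old chunks + new chunk
    # The RIFF size field counts everything after the first 8 bytes.
    return b"RIFF" + struct.pack("<I", len(body)) + body

# Minimal stand-in for a real file: a "data" chunk holding two zero bytes.
wav = (b"RIFF" + struct.pack("<I", 4 + 8 + 2) + b"WAVE"
       + struct.pack("<4sI", b"data", 2) + b"\x00\x00")

tagged = add_chunk(wav, b"arch", b"shelf=A12")       # hypothetical chunk and payload
print(len(wav), len(tagged))
```

Applications that do not know the new chunk type should simply skip it, which
is the behaviour the guide describes.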

What information do I put in the "bext" and "mext" chunks, and what form
should it have?
The "bext" and the "mext" chunks contain various fields of data. Many of
these fields are fully defined in the specification. For example, the Date
field in the "bext" chunk and the SoundInformation field in the "mext" chunk
have defined formats. For other fields, such as OriginatorReference, the
specification only covers the type of data and the length. However the EBU
Members have developed recommended formats for the data in some of these
fields. See below.
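
For anyone who wants to see what "fields of data" means at the byte level,
the sketch below unpacks the fixed part of a "bext" chunk. The field sizes
are as I recall them from EBU Tech 3285 (256-byte Description, 32-byte
Originator and OriginatorReference, 10-byte date, 8-byte time, 64-bit
TimeReference stored as two 32-bit halves, Version, 64-byte UMID, 190
reserved bytes) and should be checked against the specification; later
versions reuse part of the reserved area. The sample values are invented.

```python
import struct

# Fixed-length part of the bext chunk (assumed layout, 602 bytes in total).
BEXT_FIXED = struct.Struct("<256s32s32s10s8sIIH64s190s")

def _text(raw):
    """Decode a NUL-padded ASCII field."""
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace").strip()

def parse_bext(data):
    """Turn the raw bytes of a bext chunk into a dictionary of fields."""
    (desc, originator, ref, date, time,
     tr_low, tr_high, version, _umid, _reserved) = BEXT_FIXED.unpack_from(data)
    return {
        "Description": _text(desc),
        "Originator": _text(originator),
        "OriginatorReference": _text(ref),
        "OriginationDate": _text(date),
        "OriginationTime": _text(time),
        "TimeReference": (tr_high << 32) | tr_low,   # samples since midnight
        "Version": version,
        "CodingHistory": _text(data[BEXT_FIXED.size:]),  # variable-length tail
    }

# Invented sample chunk: one hour past midnight at 48 kHz.
sample = BEXT_FIXED.pack(
    b"Summitbriefing", b"Reporter", b"REF0001", b"2005-03-24", b"12:09:48",
    48000 * 3600, 0, 1, b"", b"",
) + b"A=PCM,F=48000,W=16,M=stereo\r\n"

info = parse_bext(sample)
print(info["Description"], info["OriginationDate"], info["TimeReference"])
```

The raw bytes passed to parse_bext would normally come from whatever code
reads the chunks out of the file.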

When a BWF file is imported from another software system, how should the
contents of the "bext" chunk be treated?
This depends on whether the audio software works with a file structure, or
is a system connected to a database. Different designs must be used in each
case.

If the receiving system is based on a file structure, but with no database,
selected fields from the "bext" chunk of a file incoming from another system
should be displayed in a pop-up window.

In an audio software system that works with a database, the audio files are
indexed and the metadata contained in the "bext" chunk is stored in and
retrieved from the database. For an incoming file, the contents of the
various fields of the "bext" chunk are interpreted, and the corresponding
fields of the database in the receiving system are updated accordingly.
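
A minimal sketch of that database case, using Python's built-in sqlite3
module: the table name, the column names and the choice of
OriginatorReference as the key are all my own assumptions, not anything the
EBU guide prescribes. Importing the same file again updates the existing row
rather than adding a second one.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS audio_files (
    unique_id        TEXT PRIMARY KEY,  -- OriginatorReference
    title            TEXT,              -- Description
    originator       TEXT,              -- Originator
    origination_date TEXT,              -- OriginationDate
    origination_time TEXT,              -- OriginationTime
    time_reference   INTEGER,           -- TimeReference (samples)
    coding_history   TEXT               -- CodingHistory
)
"""

def open_db(path=":memory:"):
    """Open (or create) the hypothetical metadata database."""
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db

def import_bext(db, bext):
    """Insert or update one row from a dictionary of bext fields."""
    db.execute(
        "INSERT OR REPLACE INTO audio_files VALUES (?, ?, ?, ?, ?, ?, ?)",
        (bext["OriginatorReference"], bext["Description"], bext["Originator"],
         bext["OriginationDate"], bext["OriginationTime"],
         bext["TimeReference"], bext["CodingHistory"]),
    )
    db.commit()

# Invented sample record.
sample = {
    "OriginatorReference": "REF0001", "Description": "Summitbriefing",
    "Originator": "Reporter", "OriginationDate": "2005-03-24",
    "OriginationTime": "12:09:48", "TimeReference": 172800000,
    "CodingHistory": "A=PCM,F=48000,W=16,M=stereo",
}
db = open_db()
import_bext(db, sample)
print(db.execute("SELECT unique_id, title FROM audio_files").fetchall())
```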

How should the metadata from the "bext" chunk be stored in a user's
database?
Below are some examples of how manufacturers have inserted the information
from the "bext" chunk into the fields of their databases. The examples are
mainly taken from network based radio on-air systems with simple two-track
editing facilities:

Field
Comments

 Description
The 256 characters can be named "Title" or similar in the database. This
field holds the working name of the file. For instance "Summitbriefing".
This field should not be the name of the file.

 Originator
This field can, for instance, be the name of the reporter or the producer of
the file, or the artist or orchestra, if it is a music recording. This field
can have the name "Reporter", "Producer", "Client" or "Artist" or similar in
the database, depending on the most common use for a majority of files in
the production area where the database is used.

 OriginatorReference
A format for this field is described below. This long string could be kept
as a separate Unique ID field in the database. It should not be the name of
the file.

 OriginationDate
This field should be the date of the creation of the audio file. When the
actual original recording is being made, the date is retrieved from the
computer's clock and stored in this field.

 OriginationTime
This field should be treated similarly to the date. The time is that
retrieved from the real-time clock in the computer exactly when the
recording begins. This will make it possible to seek files based on date and
time-of-day. The accuracy depends on the stability of the real-time clock in
the computer used for recording the file. Date and time should be inserted
into the corresponding fields in the database. OriginationTime is not
necessarily the same information as the "time" field in the file directory.

 TimeReference
This field is a count from midnight in samples to the first sample of the
audio sequence. This number can be used for time code generation if the
audio section of the computer being used for recording the file has a very
accurate and time-stable sampling frequency. This feature might be omitted
from BWF files that are not used with accompanying video. If used, this
number can be transferred to and from the database.

 CodingHistory
A format for this field has been agreed but is not compulsory. The strings
for each stage in the coding history can be extracted and kept in field(s)
of the database. When the next copy of the file is being made, these fields
can be retrieved from the database and used to generate the coding history
field of the new "bext" chunk.
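
Two of the fields above lend themselves to a short worked example (again my
own sketch, not EBU text). The first function turns a TimeReference sample
count into a time of day, which needs the sampling frequency from the file's
format chunk. The second splits a CodingHistory string into one dictionary
per stage, assuming the comma-separated "A=...,F=...,W=...,M=...,T=..." line
format that I recall from EBU R98, with one line per stage and no commas
inside the free-text part.

```python
def time_reference_to_clock(samples, sample_rate):
    """Convert samples-since-midnight to ("HH:MM:SS", leftover_samples)."""
    seconds, leftover = divmod(samples, sample_rate)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}", leftover

def parse_coding_history(text):
    """Split a CodingHistory string into a list of {key: value} stages."""
    stages = []
    for line in text.splitlines():          # handles CR/LF line endings
        line = line.strip()
        if not line:
            continue
        stage = {}
        for item in line.split(","):
            key, sep, value = item.partition("=")
            if sep:                          # ignore anything without "="
                stage[key.strip()] = value.strip()
        stages.append(stage)
    return stages

# Invented sample values: 12:09:48 plus half a second at 48 kHz.
samples = 48000 * (12 * 3600 + 9 * 60 + 48) + 24000
print(time_reference_to_clock(samples, 48000))

history = ("A=PCM,F=48000,W=16,M=stereo,T=original\r\n"
           "A=MPEG1L2,F=48000,B=256,M=stereo\r\n")
for stage in parse_coding_history(history):
    print(stage)
```

When a new copy is made, the stored stages could be joined back into lines
and a new line appended for the latest processing step.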

**************************************

I think that even small archives should be able to move to a "sensible
data migration plan" as the cost of hard drives continues to plummet,
leaving audio CDs or data CDs/DVDs as access copies only.  Tape-based and
writable CD/DVD copies are turning out to have SO many problems with
longevity past 10 years that it seems that they are NOT good long-term
archival storage media.

I believe the best solution will be a decent XML-based system that allows
the entry of bext/mext metadata, user definition of institution-specific
metadata chunks, a decent GUI to parse the XML for metadata
entry/modification, and an automatic export/import of the metadata to and
from common database packages such as Microsoft Office or Filemaker Pro.
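
Just to show the sort of thing I have in mind, here is a very rough sketch
using Python's standard xml.etree.ElementTree module. The element names
(bwfMetadata, bext, institution, field) are invented for the example and do
not come from any agreed schema; a real tool would need a proper schema and
validation.

```python
import xml.etree.ElementTree as ET

def bext_to_xml(fields, extra=None):
    """Serialise bext fields (and optional institution-specific ones) to XML."""
    root = ET.Element("bwfMetadata")                 # hypothetical root element
    bext = ET.SubElement(root, "bext")
    for name, value in fields.items():
        ET.SubElement(bext, name).text = str(value)
    if extra:
        inst = ET.SubElement(root, "institution")    # user-defined metadata
        for name, value in extra.items():
            ET.SubElement(inst, "field", name=name).text = str(value)
    return ET.tostring(root, encoding="unicode")

def xml_to_bext(xml_text):
    """Read the bext fields back out of the XML as a dictionary of strings."""
    root = ET.fromstring(xml_text)
    return {element.tag: element.text or "" for element in root.find("bext")}

# Invented sample values.
fields = {
    "Description": "Summitbriefing",
    "Originator": "Reporter",
    "OriginationDate": "2005-03-24",
    "OriginationTime": "12:09:48",
}
document = bext_to_xml(fields, extra={"ShelfLocation": "A12"})
print(document)
print(xml_to_bext(document) == fields)
```

From there, getting the same dictionary into or out of a database is
ordinary import/export work.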

I have a friend who is a high muckity-muck in the web design/XML/library
finding aids world - it seems there is enough of a need for this software to
be written that his firm might just take it on...  Or, we could go the
SourceForge open-source route...

dave nolan
NYC

p.s. - everyone have fun in Austin...

 
> Date:    Wed, 23 Mar 2005 18:10:15 -0500
> From:    "Richard L. Hess" <ArcLists@xxxxxxxxxxxxxxx>
> Subject: Re: .wav file content information
> 
> Hello, John,
> 
> Yes, I hope we're not boring the majority of the list!
> 
> I'm afraid I'm going to have to cut this dialogue short at this point as
> I'm headed out the door to go to the ARSC conference -- and see some family
> and friends as well as do some errands along the way both going and coming.
> The most important one is Friday, seeing my 89 year old Dad in
> Pennsylvania. On the return, I pick up 24 channels of Dolby A, some logging
> recorders, and a Sony DASH digital player. I'll be at the mercy of dialup
> hotel networks for the next two weeks. A few have wireless which should be
> better.
> 
> At 03:57 PM 3/23/2005, John Spencer wrote:
>> I agree that this is a useful (and hopefully not too boring!) dialogue.
>> 
>> Let me hurl a few softballs back, and please, do understand that I agree
>> fundamentally with what you are saying.  As they say, "the devil is in the
>> details".
> 
> Oh yes! Definitely in the details. I was trying to provide a broad overview
> rather than get into devilish details.
> 
> 
>> I truly believe this is a "crisis" for small archives, as the lack of
>> funding means that structured metadata gets pushed to the back of the bus
>> (or worse, OFF the bus).
> 
> That is definitely the case. The CD-R preservation route is the only thing
> that they can afford. The minute I start talking to archives about managed
> data storage, many (not all) archivists' eyes seem to glaze over. One of
> the things I'm looking for in these cases is an IT department that the
> archive can piggy-back onto. It's imperative that we get the mindset away
> from CDs on the shelf or hard drives on the shelf. Overall, when you
> include administrative (IT services) costs, it is far more cost effective
> to dump 2 TB of data into a 20 TB IT department than try to manage a
> separate 2 TB store (numbers are semi-random, but 2 TB of oral history is a
> fair amount).
> 
>>> At 12:40 PM 3/23/2005, John Spencer wrote:
>>>> Also, we've built these tools for our internal use, it's
>>>> certainly not that hard.
>>> 
>>> Right, but I think Scott addressed that and what we're trying to do here.
>>> Mounting heads and aligning tape machines isn't that hard for me, but lots
>>> of people don't do it themselves. Writing the software would be harder
>>> for me.
>> 
>> Understood and agreed.  We have a number of data projects underway where the
>> archive is doing the "real work" (the actual transfers) and we're helping
>> out with the IT issues.
> 
> This might be useful to learn more about--if you're coming to ARSC we
> should try and sit down and talk about your services in this regard.
> 
>>> I think these tools are intended for smaller archives and people like me.
>>> Larger operations will require you to use the rigorous tools that they
>>> develop internally or purchase with rights management.
>> 
>> Here I must disagree.  If I were to share my collection of files with
>> another institution (small or large), I would have a problem if all present
>> metadata were modifiable.  DRM or not, the core information should not be
>> easily changeable.
> 
> This is all a matter of degree. The essence is modifiable unless we
> completely lock the file. If the essence is modifiable, then the metadata
> will be as well. So, now we talk about degree. I do not see the
> modification as something that is done on a regular basis. I see these
> tools used much more for the creation and reading of metadata than
> modifying. The metadata I see embedded in files is not the type that should
> be modified.
> 
> 
>> This is another area of concern for me.  How can we assume that SANiP has
>> their metadata fields laid out in the same manner as ACHCN (Aboriginal
>> Cultural Heritage Centre of Nowhere in Particular)?  Sounds like there might
>> be some re-keying (or re-mapping, or crosswalks) of data, which is not my
>> favorite scenario.  The more times we re-type the same information, the
>> greater chance for error.  Are we talking about MARC records, DC metadata,
>> etc.?  The use of XML should remove many of these obstacles, but the same
>> cannot be said for those using Excel 95 to collect metadata!
> 
> No, I was always assuming that there was a structure that would be mappable
> either via field names (as used in Excel 95 etc.) or to be more modern, via
> XML.
> 
> I don't know what structured metadata system makes the most sense. I've
> been specifically avoiding that area of study for the moment. Yes in the
> generic sense of MARC that is what I had in mind, but the specific LoC MARC
> fields leave something to be desired for audio -- at least what I've seen.
> 
> 
>> I do agree that some metadata should reside in the header, as you could
>> always open the file up in hexadecimal and read it.  At our office, we call
>> this "catastrophic metadata" (or "CYA" metadata).  However, I'm somewhat
>> unsure of your meaning of "tied together".  Are you referring to
>> 1) a wrapper that can be opened automatically (like MXF), or
>> 2) the metadata and audio files reside on the same physical carrier, or
>> 3) all of the metadata would be in the BWF header?
> 
> "Tied together" means that there is one entity that is passed from A to B
> with essence and metadata.
> 
> (1) Yes MXF is an approach, but so is BWF as I understand it. Other than
> the semantic difference of wrapper vs. file, isn't what we're talking about
> with BWF and MXF very similar? Actually, I've been a fan of AAF for a long
> time--I wish it gained more traction.
> 
> (2) is an invitation to trouble IMHO.
> (3) Yes, that is what I'm talking about -- using BWF in a way similar to MXF.
> 
> 
>> Also, I was under the impression that many smaller archives don't have
>> "digital storage systems", hence the transitional migration to Gold CD-R (as
>> evidenced by various discussions on this list).
> 
> See above - yes, but it has to change.
> 
> Note, some snippage happened. Presumably anyone interested has the earlier
> posts as well.
> 
> Cheers,
> 
> Richard
> 
> Richard L. Hess                     email: richard@xxxxxxxxxxxxxxx
> Vignettes Media                     web:   http://www.richardhess.com/tape/
> Aurora, Ontario, Canada             (905) 713 6733     1-877-TAPE-FIX
> 
> ------------------------------
> 
> Date:    Wed, 23 Mar 2005 18:11:55 -0600
> From:    John Spencer <js@xxxxxxxxxxxxxxxxxxxxxxxx>
> Subject: Re: .wav file content information
> 
> Richard,
> 
> I will be at the ARSC Conference, and I am presenting a topic Sat. afternoon
> regarding digital archives.
> 
> I'll try and include some slides about a project currently underway.
> 
> John
> --
> John Spencer
> http://www.bridgemediasolutions.com/
> 
> 
>> From: "Richard L. Hess" <ArcLists@xxxxxxxxxxxxxxx>
>> Reply-To: Association for Recorded Sound Discussion List <ARSCLIST@xxxxxxx>
>> Date: Wed, 23 Mar 2005 18:10:15 -0500
>> To: ARSCLIST@xxxxxxxxxxxx
>> Subject: Re: [ARSCLIST] .wav file content information
>> 
>> Yes, I hope we're not boring the majority of the list!
>> 
>> I'm afraid I'm going to have to cut this dialogue short at this point as
>> I'm headed out the door to go to the ARSC conference
> 
> ------------------------------
> 
> End of ARSCLIST Digest - 21 Mar 2005 to 23 Mar 2005 (#2005-70)
> **************************************************************