
Re: [ARSCLIST] A/B testing: another approach



It seems we are re-creating (albeit in a more civilised manner) the Great Debate in audio between Objectivists and Subjectivists.

http://www.stereophile.com/news/050905debate/

I believe tests have been performed where users are allowed to take the ABX box to their home system and switch as often or as little as they want, and take as long or as short as they want (weeks, for example). The results were actually less accurate than in standard "short" ABX tests (perhaps because of the well-documented effect of one getting "used" to the sound of certain equipment).

I believe my Marantz sounds better than my Yamaha amplifier, but lately I have begun to suspect that the way the Marantz looks and feels, as well as my own beliefs and expectations, affect my perception. The act of listening is a combination of the ear and the brain, but the auditory part of the brain is not isolated from other stimuli, so it seems logical to think that they may affect how I hear. So in a way, it does sound different -- in a very real way. But take away the visual stimulus, and it ceases to sound different.

As professionals, however, we have to be concerned with subjectivity and accurate transfers. So my own ideal strategy when choosing a piece of equipment would be:

1 - If there are metric differences (e.g. lower measured harmonic distortion), then I will use the equipment with "best specs" I can afford, even if I cannot tell the difference; else,

2 - If there are peer-reviewed, well-designed scientific studies (of any sort, not just ABX) that show preferences (or differences) perceived among a large or important enough group, I will use the perceived "better" piece; else,

3 - I will try to determine any perceived differences in my own setup via blind testing.

This is my ideal algorithm, but of course the three steps affect each other in practice (e.g., am I willing to spend $3000 extra on an amplifier that shows a marginally better THD number, even if I know no one can ever consistently tell the difference?). It still seems pretty rational.
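The three-step strategy above can be sketched as a small decision function. The field names, thresholds, and example data below are purely hypothetical illustrations, not anything from this thread:

```python
# A sketch of the three-step selection strategy described above.
# All field names and the example data are hypothetical.

def choose_equipment(a, b):
    """Return the preferred of two gear candidates, following the
    three rules: measured specs first, then published listening
    studies, then one's own blind test."""
    # 1 - Prefer measurably better specs (lower THD stands in for
    #     "best specs"; affordability is assumed already checked).
    if a["thd_percent"] != b["thd_percent"]:
        return a if a["thd_percent"] < b["thd_percent"] else b
    # 2 - Else fall back to peer-reviewed listening-study preferences.
    if a["study_preference_votes"] != b["study_preference_votes"]:
        return a if a["study_preference_votes"] > b["study_preference_votes"] else b
    # 3 - Else fall back to one's own blind-test score.
    return a if a["own_blind_test_score"] >= b["own_blind_test_score"] else b

amp_x = {"name": "Amp X", "thd_percent": 0.005,
         "study_preference_votes": 12, "own_blind_test_score": 7}
amp_y = {"name": "Amp Y", "thd_percent": 0.005,
         "study_preference_votes": 20, "own_blind_test_score": 5}

print(choose_equipment(amp_x, amp_y)["name"])  # → Amp Y (rule 2 decides)
```

With equal specs, the published-study preference breaks the tie before one's own blind test is ever consulted, which matches the order of the three rules.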

This has been an interesting discussion. Thank you.

Marcos

Malcolm Davidson wrote:
For many people, the brain's ability to perceive subtle differences is
severely limited when the samples are played sequentially. It is difficult
to remember the "reference" track while listening to the other one in the
moment. For large differences it is much easier, so a sequential test does
have some validity when comparing, say, 96 kHz/24-bit, CD, and MP3.

When we did all the SDMI (Secure Digital Music Initiative) watermark
evaluations, we were not only evaluating the watermarking technologies but
also (unintentionally) evaluating the abilities of the participants. These
varied widely among all the so-called experts. With a certain amount of
training, people can improve their listening skills. We observed
individuals who were perceived as "golden ears" but could not pick out a
watermark, while some participants who had no reputation as expert
listeners could pick out certain watermarks with ease.

Any type of comparative listening test is highly subjective, for the brain
can pick out subtleties that we are not skilled at measuring. For example,
bit-for-bit identical streams (separate files on a hard disk) have
consistently been judged different by expert listeners, due in part to the
buffering and speed matching of the data as it is read off the disk. This
changes the small amount of jitter in the digital stream, which
subsequently alters the noise floor of the D/A converter and slightly
influences the spatial imaging of the stereo field. However, many people
might describe it differently, and it might not necessarily be a bad
thing. Given the complexity of the "supply chain" to the end user, how
closely does the final product replicate the actual recording experience,
and do people care and are they willing to pay for it?
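The jitter mechanism described above can be put into rough numbers. A common back-of-the-envelope figure (not from the original post) is that for a full-scale sine of frequency f reproduced with rms clock jitter t_j, the jitter-induced error sits near 20·log10(2π·f·t_j) dB relative to the signal:

```python
import math

def jitter_noise_db(freq_hz, jitter_s):
    """Approximate jitter-induced error level, in dB relative to a
    full-scale sine at freq_hz, given rms clock jitter in seconds.
    Rule-of-thumb estimate: 20*log10(2*pi*f*t_j)."""
    return 20 * math.log10(2 * math.pi * freq_hz * jitter_s)

# 100 ps of rms jitter on a 10 kHz tone lands around -104 dB:
# near a 16-bit noise floor, so small jitter changes can plausibly
# shift the converter's noise floor without being grossly audible.
print(round(jitter_noise_db(10_000, 100e-12), 1))  # → -104.0
```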

Malcolm Davidson

----- Original Message ----- From: "Matthew Barton" <mbarton@xxxxxxx>
To: <ARSCLIST@xxxxxxxxxxxx>
Sent: Monday, January 28, 2008 11:30 AM
Subject: Re: [ARSCLIST] A/B testing: another approach



It wasn't a scientific experiment--just an engineer having a bit of
fun--though he seems to have had a point to make, if not a detailed
hypothesis to test. I just thought the structure was interesting, and
worth considering if our goal is to develop a listening test or tests.
Perhaps the thing that I find most interesting here is that it involved
a real-time, uninterrupted listening experience of A, B, C, and D.
Perhaps the brain does respond differently to such a listening
experience.

Matthew Barton
MBRS
The Library of Congress
101 Independence Ave., SE
Washington, DC 20540-4696
202-707-5508
email: mbarton@xxxxxxx

Marcos Sueiro Bal <mls2137@xxxxxxxxxxxx> 1/28/2008 10:40:58 AM:
Matt,

This is an interesting link, but as a scientific experiment it does not
seem very useful: What is the hypothesis? How are we quantifying it? A
statement such as "my fellow listeners appeared to be equally
uncomfortable" does not seem conducive to analysis.

If the highs were perceived not to be "as silky smooth" -- in other
words, if the differentiating factor was identified after just one
listen of a short passage -- should not the same listener be able to
correctly identify such a difference in a blind test? Logic seems to
indicate that he should, but perhaps the brain works in mysterious
ways.

Incidentally, it seems that not all ABX tests have concluded that
listeners are less sensitive than we thought. I was told in school that
most average Joes can hear at most a difference of 1 dB, but a group of
5 listeners in an ABX test perceived differences of 0.4 dB 93% of the
time (note: this is not a peer-reviewed paper, and this is from the ABX
web page, so it is not conclusive evidence).

http://www.provide.net/~djcarlst/abx_lvl.htm
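Results like these are easy to sanity-check statistically. Under the null hypothesis of pure guessing, each ABX trial is a fair coin flip, so the chance of scoring at least k correct out of n trials is a binomial tail sum. The trial counts below are hypothetical, since the page's raw numbers aren't quoted here:

```python
from math import comb

def abx_p_value(n_trials, n_correct):
    """Probability of getting at least n_correct answers right out of
    n_trials by pure guessing (one-sided binomial test, p = 0.5)."""
    return sum(comb(n_trials, k)
               for k in range(n_correct, n_trials + 1)) / 2 ** n_trials

# Hypothetical session: 14 correct out of 15 trials (~93%).
# Guessing alone produces a score that good well under 0.1% of the time.
print(f"{abx_p_value(15, 14):.5f}")  # → 0.00049
```

This is why even a short ABX run can be conclusive: a 93%-correct result is wildly improbable under chance once the trial count reaches double digits.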

Cheers,

marcos

Matthew Barton wrote:
Here's a link to an article from the October issue of Stereophile, in
which an interesting approach to blindfold testing is described:

http://www.stereophile.com/asweseeit/1007awsi/index.html

This is not an analog vs. digital article, and I'm not endorsing the
test or its results, or any conclusions in the article, but I think the
approach is interesting. Instead of an A:B comparison, in which
listeners first heard A and then B and were asked for opinions, this
engineer created a composite patchwork of different formats using a
repetitive passage from a recent recording of Handel's Messiah. He
didn't tell his audience that this is what they would be hearing:

"It turned out that we'd been unwittingly involved in a blind
listening
test. The DVD-A was a ringer. Philip had chosen a Handel chorus in
which
the same music is heard four times. He had prepared four versions of
the
chorus—the original 24-bit/88.2kHz data transcoded straight from
the
DSD master; a version sample-rate–converted and decimated to
16/44.1
CD data; an MP3 version at 320kbps; and, finally, an MP3 version at
192kbps—and spliced them together in that order. The last three
versions had been subsequently upsampled back to 24/88.2 so that the
DAC's performance would not be a variable. The peak and average
levels
were the same for all four versions; the only difference we would
hear
would be the reductions in bandwidth and resolution. "-- from
"Watching
the Detectives," by John Atkinson, Stereophile, October, 2007.

We can all argue about the specs here, but the most interesting thing
to me is that the changes in the audio unfolded over four iterations of
the same passage of music in the same recording. Listeners were not
asked to use their memory of recording A to appraise recording B, or
vice versa. They heard (or did not hear) the changes as part of a
continuous listening experience.


Matthew Barton
MBRS
The Library of Congress
101 Independence Ave., SE
Washington, DC 20540-4696
202-707-5508
email: mbarton@xxxxxxx


