Comparing Speakers and Level Matched A/B Comparisons

December 7 2016, 06:00
When I review speakers, I make a point of running my continuous (20-second) averaged room/power curves (moving a microphone slowly through a 1 x 1 x 5’ zone at head level at the listening couch) with an AudioControl SA-3051 RTA. This gives me a good idea of just how effectively a speaker pair can deliver flat power to a typical listening area. It continues to amaze me that a simple measurement such as this can do such a good job of dissecting the spectral balance of a group of speakers in a good room. Invariably, speakers that do well with this measurement sound more realistic to me than those that do not. At least this is the case with really good recordings.

In addition, to validate my measurement discoveries, I always perform level-matched A/B comparisons between the units under test and some “reference” models I keep on hand. (The primary measurements are nearly always done in one room, and the primary comparisons are nearly always done in another, to weed out both negative and positive room-related anomalies.) Because those reference systems are pretty good, and also because I attempt to compare speakers with similar conceptual approaches to good sound and similar placement requirements, A/B evaluations may do more to make a review worthwhile than the measured curves. Certainly, they are as important as the latter.

I have also pointed out in some of my reviews that when I do level-matched comparisons of stereo pairs, I try to set up the systems AB – AB style, as opposed to AB – BA. (In this case, each of the two units in one pair is A and each of the two units in the other pair is B.) I also position them on a mid-distance soundstage that is reasonably well centered in relation to my seating position. This arrangement keeps the units in each pair the same distance apart, although it does make it tricky to determine precisely and quickly which pair has the more exacting center focus.

While some critics make a big deal out of center focus and precise imaging stability in the middle of the soundstage, I consider enveloping spaciousness, breadth, and frontal blending to be more important. However, there is no doubt that good central focus can be very important with solo performances, particularly those dealing with the voice. In any case, when listening for proper center focus, my AB – AB arrangement only requires me to shift my head somewhat to the left or right when doing the A/B switching. On the other hand, when listening for spaciousness, breadth, and frontal blending, I can listen critically, enjoy the sound, and sit still.

OK, so we know that placing speakers for comparison work is tricky, and involves compromises. But what about the above-noted level-matching issue? Just how tricky is it to level-match a pair of speakers for a decent A/B comparison?

Random Noise
In the old days, I remember reading about how some reviewers would level-match speaker pairs being compared by feeding each amplifier a single-frequency signal (often 1kHz) and then using an SPL meter to get them equally loud at that point. (No matter what approach you use, you need two stereo amps and a line-level switching feature to do level-matched stereo-speaker comparisons, with at least one amp having level controls.) While this seemed OK on paper, the problem is that it did not consider the often very different response curves exhibited by each pair of speakers.

One speaker pair might have a peak at that frequency and the other might be flat or even have a dip there, and the result would be a significant mismatch in their overall adjustment levels. Once matched up that way, one pair might sound considerably louder than the other with musical source materials, with the result being that the listener would probably consider it more dynamic, more revealing of detail at frequency extremes, and just plain more transparent sounding than its less-loud playing competition.
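
To put rough numbers on that, here is a minimal Python sketch of the arithmetic. The two response curves are invented purely for illustration (one pair with a 3dB peak at 1kHz, the other with a 2dB dip there), and the function name is hypothetical.

```python
# Hypothetical response curves in dB relative to each pair's own midrange,
# keyed by frequency in Hz. These numbers are invented for illustration.
speaker_a = {250: 0.0, 500: 0.0, 1000: 3.0, 2000: 0.0, 4000: 0.0}   # peak at 1 kHz
speaker_b = {250: 0.0, 500: 0.0, 1000: -2.0, 2000: 0.0, 4000: 0.0}  # dip at 1 kHz

def tone_match_gain_db(ref, other, freq_hz=1000):
    """Gain (dB) applied to `other` so a tone at freq_hz reads the same as on `ref`."""
    return ref[freq_hz] - other[freq_hz]

# Matching on the 1 kHz tone alone forces pair B up by this amount.
gain_b = tone_match_gain_db(speaker_a, speaker_b)

# Away from 1 kHz both pairs were equally loud to begin with, so the
# whole "correction" shows up there as a level mismatch.
off_peak_gap = (speaker_b[500] + gain_b) - speaker_a[500]

print(f"Gain applied to B after the 1 kHz tone match: {gain_b:+.1f} dB")
print(f"B now plays {off_peak_gap:+.1f} dB relative to A at 500 Hz")
```

With these made-up curves, the tone match leaves pair B running 5dB hot everywhere except at the one frequency that was actually measured.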

(Note that intentional mismatching of this kind was also sometimes used, and no doubt may often still be used, by unscrupulous dealers eager to make sure that a pair they wanted to sell performed more impressively than a pair set up to be a reticent, muffled, and anemic-sounding straw man.)

Anyway, there is a problem with using a single-frequency tone to level-match speakers. (Note that when level-matching amps for comparison purposes, this approach should work OK, because decent amps typically exhibit very flat frequency-response curves, at least in the midrange.) Consequently, conscientious reviewers (and sales clerks, and enthusiasts wanting to know the facts) will use a broadband random-noise signal such as pink or white noise when setting up speakers for level-matched comparisons.

Now, you would think that using random noise along with an SPL meter would work like a charm. After all, the meter reads the total, average, wide bandwidth output of the speakers, and so it should allow you to match up levels and get things just right. Unfortunately, this may not always (or even normally) be the case. One problem is that, besides exhibiting sharp peaks, speaker pairs being compared may have broad-bandwidth bulges and slopes in their response curves, with the spread and tilt of the bulges and slopes differing considerably.

For example, I recently auditioned a pair of speakers that were reasonably flat in the midrange and treble, but which had a considerably elevated bass range. (This was no doubt the result of a voicing decision on the part of the designer.) If a system like that were given a random-noise feed (particularly with broadband pink noise) and were level-matched against a system that had flatter response in the bass range, the bass-heavy system would tend to sound less loud with standard program source materials. Even if both systems had reasonably similar bass outputs in the middle- and upper-bass range, but one had further extension into the low-bass range, doing a random-noise level matching with an SPL meter would give a false impression of relative levels.

That is, the person doing the level adjusting would back off the amp gain for the bass-heavy (or extended-bass) speaker, because the SPL meter would be reading the average broadband output. While the bass-heavy system might have its problems (the one I auditioned and decided not to waste time reviewing did), it certainly did not deserve to have them highlighted by a mismatching job during the setup procedure for the A/B comparison.
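
The size of that back-off can be estimated with a short Python sketch. It assumes an idealized, unweighted meter that simply power-sums octave bands (real meters usually apply A or C weighting), and the bass-heavy response figures are invented for illustration.

```python
import math

# Octave-band centers (Hz). With pink noise every octave band gets equal
# input energy, so a speaker's response in dB doubles as its per-band output.
bands = [31.5, 63, 125, 250, 500, 1000, 2000, 4000, 8000, 16000]
flat_speaker = [0.0] * len(bands)
# Hypothetical pair with the three lowest octaves elevated by 6 dB.
bass_heavy = [6.0, 6.0, 6.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

def broadband_db(band_levels_db):
    """Power-sum the band levels, as an unweighted meter would (an assumption)."""
    total_power = sum(10 ** (level / 10) for level in band_levels_db)
    return 10 * math.log10(total_power)

# Gain the operator ends up applying to the bass-heavy pair so that both
# pairs give the same broadband meter reading.
trim_db = broadband_db(flat_speaker) - broadband_db(bass_heavy)

# After the trim, this is where its midrange sits relative to the flat pair.
midrange_after_trim = bass_heavy[bands.index(1000)] + trim_db

print(f"Trim applied to the bass-heavy pair: {trim_db:+.2f} dB")
print(f"Its 1 kHz level relative to the flat pair: {midrange_after_trim:+.2f} dB")
```

Under these assumptions the bass-heavy pair gets turned down by a little under 3dB, and its midrange and treble go down by the same amount, which is why it would tend to sound less loud with ordinary program material.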

The same kind of thing, of course, can happen when one pair of speakers in an A/B comparison is somewhat bright in the treble compared to the second pair. Indeed, here things can become even more tricky, because if one system is flat out to 20kHz (unusual) and the other rolls off somewhat above 10kHz (not unusual at all), we would again have problems matching average SPL readouts fairly with random noise and a meter. So, it appears that single-frequency inputs and random-noise inputs can both be problematic when trying to level-match speaker pairs for A/B comparisons. 

So, what do we do about it? Well, my solution involves measuring each pair with my RTA and then adjusting levels so that the two curves overlap, particularly in the midrange, as much as possible. While this cannot work to perfection (particularly with speakers exhibiting gross frequency-response anomalies), it certainly allows the listener to adjust levels in such a way that the performance advantages (or disadvantages) of each speaker pair are presented in the most workable way possible.
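
For readers who would rather compute the overlap than eyeball it, the following Python sketch shows one way to turn two sets of RTA readings into a single gain offset. Averaging the band-by-band differences is my stand-in for "overlapping the curves" (it minimizes the squared dB gap over the chosen bands); the readings, the 300Hz to 3.5kHz midrange window, and the function name are all assumptions made for illustration.

```python
# Hypothetical RTA readings in dB SPL, keyed by band center frequency (Hz).
pair_a = {100: 78.0, 200: 76.0, 400: 75.0, 800: 75.5,
          1600: 74.5, 3150: 75.0, 6300: 73.0, 12500: 70.0}
pair_b = {100: 84.0, 200: 79.0, 400: 77.5, 800: 77.0,
          1600: 77.0, 3150: 76.5, 6300: 76.0, 12500: 74.0}

def midrange_offset_db(ref, other, lo_hz=300, hi_hz=3500):
    """Gain for `other` that best overlaps `ref` across the midrange window.

    The mean of the per-band differences is the single offset that
    minimizes the squared dB gap over those bands.
    """
    diffs = [ref[f] - other[f] for f in ref if f in other and lo_hz <= f <= hi_hz]
    if not diffs:
        raise ValueError("no shared bands inside the midrange window")
    return sum(diffs) / len(diffs)

offset = midrange_offset_db(pair_a, pair_b)
print(f"Adjust pair B by {offset:+.1f} dB")

# What remains after the adjustment is the genuine difference in balance.
for freq in sorted(pair_a):
    residual = pair_b[freq] + offset - pair_a[freq]
    print(f"{freq:>6} Hz  B minus A after matching: {residual:+.1f} dB")
```

The residuals outside the midrange window are left alone on purpose: those are the bass and treble differences the A/B comparison is supposed to reveal, not something the level matching should hide.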

For those who do not have access to a 1/3-octave RTA, there is still a nifty solution: the human ear. Yep, rather than let a mindless electromechanical device such as an SPL meter play misleading games with a random-noise input, you can simply listen to the random-noise source and then adjust levels so the speakers appear to be reproducing the input equally loud. This is not a perfect solution, but neither is my RTA approach, and it certainly has the potential to work better than the above-noted single-frequency approach or the broadband random noise approach in combination with an SPL meter.

Two More Points
First, while pink noise is de rigueur when performing RTA measurements, in most cases you will get more mileage with the “ear-use” approach if you employ a white-noise source. Remember, pink noise delivers equal energy per octave, while white noise delivers equal energy per unit of bandwidth (that is, per hertz), so its energy per octave rises with frequency. Consequently, with both SPL measurements and with the use of the ear, a pink-noise source will have an impact on the perceived bass/treble balance of the systems. It is hard to set up levels this way, even by ear, and so using white noise (which to the ear emphasizes mid-frequency energy better) makes it easier to match things up with two pairs of speakers.
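
The difference between the two noise types is easy to see by integrating their idealized spectra over octave bands. This Python sketch assumes textbook definitions (white noise with constant power density, pink noise with density proportional to 1/f) and uses hypothetical function names; it says nothing about how any particular noise generator actually behaves.

```python
import math

def octave_energy(center_hz, kind):
    """Integrate an idealized power spectral density across one octave band."""
    lo, hi = center_hz / math.sqrt(2), center_hz * math.sqrt(2)
    if kind == "white":   # density constant with frequency: integral is hi - lo
        return hi - lo
    if kind == "pink":    # density proportional to 1/f: integral is ln(hi / lo)
        return math.log(hi / lo)
    raise ValueError(f"unknown noise kind: {kind}")

def level_re_1k_db(center_hz, kind):
    """Octave-band level in dB relative to the 1 kHz octave of the same noise."""
    return 10 * math.log10(octave_energy(center_hz, kind) / octave_energy(1000, kind))

for center in (125, 250, 500, 1000, 2000, 4000, 8000):
    print(f"{center:>5} Hz  white {level_re_1k_db(center, 'white'):+6.1f} dB"
          f"  pink {level_re_1k_db(center, 'pink'):+6.1f} dB")
```

Viewed octave by octave, the pink column stays level while the white column climbs about 3dB with each doubling of frequency, which is why the two sources put such different weight on the bass and treble regions.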

Second, when you want to do things precisely, it might be a good idea to level-match each side one at a time instead of doing it globally with the speakers in each pair operating together. That way, anomalies resulting from one of the amps possibly having slightly unbalanced outputs that would impact left-right soundstage balance would be eliminated. (One- or two-dB channel-balance differences are not unusual at all with amps, particularly with models whose level controls only look equally adjusted or whose balance controls are not truly centered at the vertical mark.) Heck, it might even be possible for both amps to be slightly biased in opposite directions, which would make the soundstaging differences even worse. (Note, for the same reason, I also suggest doing this kind of “one channel at a time” balancing job when A/B comparing amplifiers that are feeding one pair of speakers.)

Anyway, the bottom line is that you need to get those speaker levels as similar as possible when doing A/B comparisons. And it is also important to get the systems physically set up in such a way that neither is at a disadvantage when it comes to soundstage spread and the ability to deliver a balanced presentation.

Do this with two really good pairs of speakers and you might be shocked to discover just how alike they can sound at times. Screw up the level matching and placement and you might be equally shocked at just how different the two groupings sound—even if all four are identical models.

Interestingly, several decades ago speaker designer (and head man at Acoustic Research Corporation, and father of the acoustic-suspension woofer and dome tweeter and midrange drivers) Edgar Villchur did a series of live-versus-recorded demonstrations with AR-3 speakers and a string quartet. The recordings were very carefully made outdoors, so as to eliminate any hall reverb from the recorded sound (the microphone placement was tedious and demanding, because of radiation-pattern artifacts with live instruments), and the demos involved the quartet and recording basically being compared A/B style in several different ways, with the quartet sometimes pretending to play while the recording produced the sound. Needless to say, the levels were carefully matched, and most of the people in the audiences were hard pressed to detect differences. (Some of those detectable differences involved analog tape hiss.)

This tells me two things. First, Villchur knew how to impressively demonstrate his speakers, which went a long way toward making AR as successful as it was in the 1960s. Second, with proper equalization (the treble output of the speakers, which normally had a shallow rolloff above the midrange, was boosted about 5dB by means of the preamp tone control) and careful placement, really good speakers have been able to deliver subjectively perfect sound for decades.

You can also read what Edgar Villchur wrote about “Loudspeaker Testing and Measurement” here:

This article was originally published in audioXpress, January 2010.