Back in the 1960s, after crossover distortion was tamed in the better solid-state amplifiers, many serious audiophiles remained convinced that tube amps sounded better than any solid-state ones. Then in the 1970s, after Matti Otala and others had demonstrated the effects, cause, and prevention of slewing-induced distortion, the percentage of serious audiophiles who preferred tube amps may have declined slightly, but it certainly did not drop to zero.
The Great Debate
The first tests to determine the cause of the difference in sound, if any, were mostly performed by engineers, who concluded that since the frequency response and distortion performance of the best tube and the best solid-state amplifiers were comparable, there must not be any difference (see Photo 1). Thus began the so-called “Great Debate.” I believe we are now in a position to put this debate to rest.
First, I must affirm, as mentioned in a previous article, that subjective amplifier judgments are, by nature, individual perceptions. Every perception is a fact to the person perceiving. Thus, if you tell me you can hear a difference between two amplifiers, I have to believe you. If I can hear no difference, then perhaps you are a more critical listener than I am. If neither I nor any of a dozen other skilled listeners can hear a difference, then you have indeed heard a subjective difference, but since the perceived difference is not a shared reality, we cannot say there is an objective difference.
If you cannot prove in a properly designed double-blind test that you can repeatedly hear a difference, then we must conclude that the difference you do hear does not proceed from physical, and therefore determinable, causes.
The Gold Standard
In last month’s article, I discussed blind and double-blind testing. Both assume that the equipment under test is presented in a pair: two amplifiers or two speakers and so forth are being compared. In a blind amplifier test, the test subject (i.e., the person providing opinions about the equipment under test) does not know which amplifier is playing at a given time, in order to avoid the effects of extraneous variables (e.g., manufacturer preferences or finish details).
Long ago, experimenters found that test operators (e.g., set-up technicians or other persons involved with the experiment) would sometimes unintentionally provide clues via facial expression or body language that could give away the identity of the equipment under test. Thus, double-blind tests, in which no one involved in the test operation knows which piece of equipment is being used at a given time, were developed. Double-blind tests are the gold standard for any tests involving human perception.
Some people do not like double-blind tests, believing themselves to be immune to hidden biases. However, there is no evidence to prove that even the best of intentions enable a person to avoid subconsciously tilting his/her test responses in a preferred direction. (Perhaps a Vulcan could?)
Resistance to double-blind testing is not limited to those who believe tube amplifiers sound better, although some of those listeners still offer objections to the elaborate test protocols. Some people firmly believe that any well-designed IC-based amplifier sounds better than any hollow-state amplifier and still claim that double-blind testing is unnecessary for them. Perhaps the most convincing account for me personally came from an individual who told me he made a change that was expected to make an amplifier sound better, but he was surprised to find that it sounded worse. No details of the test protocol were given. Of course, this amounts to hearsay, and thus cannot be considered scientific evidence.
Distortion Measurements
In 1977, the British magazine Hi-Fi News and Record Review published an article by Jean Hiraga detailing sensitive distortion measurements made in Japan on a variety of tube and solid-state amplifiers that operated well below clipping. The article was reprinted by audioXpress in the March, 2004 issue. This article showed conclusively that differences in the distortion spectra of excellent amplifiers do exist, but no effort was made to correlate these measured differences with perception.
In the March, 1980 issue of High Fidelity magazine, “The Great Ego-Crunchers: Equalized Double-Blind Tests” by Daniel Shanefield was published. This article directly addressed the perception issue. Shanefield mentions the division of audiophiles into “golden ears” who insisted that they could hear differences in amplifiers and “nonbelievers,” who constitute the majority. He makes the statement: “In the next few years (after 1970) several small-circulation magazines espoused the golden-ear point of view, though they often disagreed with each other about which components were truly excellent and even changed their minds drastically from issue to issue.”
Shanefield’s conclusion was that audible differences among good-to-excellent amplifiers do indeed exist, but that if the frequency response of all amplifiers under test was equalized to be flat within 0.25 dB, no perceptible difference remained. He includes the somewhat startling detail that when he compared three Dynaco 400 samples, frequency response differences of a few tenths of a decibel did exist, and the amplifiers did sound different. His experimental protocol ensured that the amplifiers were operated well below clipping. Shanefield’s tests were subsequently replicated by several members of the Boston Audio Society.
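To put the 0.25-dB figure in perspective, a level difference in decibels can be converted to a voltage or power ratio. The short Python sketch below is an illustration added here, not part of Shanefield's procedure, and the function names are hypothetical.

```python
def db_to_voltage_ratio(level_db: float) -> float:
    """Convert a level difference in dB to a voltage (amplitude) ratio."""
    return 10 ** (level_db / 20)


def db_to_power_ratio(level_db: float) -> float:
    """Convert a level difference in dB to a power ratio."""
    return 10 ** (level_db / 10)


if __name__ == "__main__":
    for level_db in (0.1, 0.25, 1.0):
        v = db_to_voltage_ratio(level_db)
        p = db_to_power_ratio(level_db)
        print(f"{level_db:4.2f} dB -> voltage x{v:.4f}, power x{p:.4f}")
```

By this arithmetic, 0.25 dB corresponds to a voltage difference of roughly 3%, which gives a sense of how closely the amplifiers had to be matched.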
A/B/X Tests
In October 1991, David Clark of DLC Design presented the paper, “Ten Years of A/B/X Testing” at the 91st Convention of the Audio Engineering Society. A/B/X amplifier tests compare an unknown amplifier “X” with two known amplifiers, “A” and “B.” The test subject’s goal is to determine whether “X” is “A” or “B.” For example, an excellent amplifier can be used as “A,” and a medium-grade amplifier as “B.” The test subject can switch at will among “A,” “B,” and “X,” but the switches may be connected so that position “X” is actually amplifier “A”.
If the subject can reliably identify “X” as “A,” then clearly the difference between “A” and “B” is perceptible to him. If not, we cannot conclude that there are physical causes for the differences that some listeners perceive under less-controlled conditions. A/B/X tests are usually double blind, but they do not require that the brand and model of the equipment under test be kept from the listeners, since neither the listener nor the test operator knows which amplifier is connected as “X.”
Test subjects are not permitted to communicate with each other. In an A/B/X test, the listener chooses when to flip the switch, allowing whatever amount of time he feels is needed to properly identify the unit under test as “A” or “B.”
A test may span several listening sessions, if the listener so chooses, or may be finished quickly if the listener is confident he has identified “X” as “A” or “B” in a short time.
A/B/X testing excels at finding perceptible differences, if any exist, but is not designed to establish levels of accuracy or preferability. The A/B/X test itself was compared with long-term listening as a method of identifying a calibrated 2.5% total harmonic distortion (THD) component that was added to a musical signal. The Audiophile Society acted as the “golden ears.” The Southwestern Michigan Woofer and Tweeter Marching Society (SMWTMS) acted as the “engineers.”
Neither group could identify the distortion at a 5% confidence level in long-term listening tests. However, using A/B/X testing, the SMWTMS not only correctly proved the audibility of the distortion in 45 min. of testing, but also correctly identified a lower amount of distortion. In the complete series of tests, THD was found to be audible at 4% using big-band jazz music, 2% using flute music, and 0.4% using a sine wave. The spectrum of the harmonic distortion was not specified in Clark’s AES paper (AES Preprint 3167).
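The 5% criterion mentioned above can be made concrete with a one-sided binomial calculation. The Python sketch below assumes that each A/B/X trial is independent and that a listener who hears no difference guesses correctly half the time. The trial counts are hypothetical and are not taken from Clark's paper, and the function names are my own.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` right in `trials` by pure guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must lie between 0 and trials")
    # Sum the upper tail of a binomial distribution with p = 0.5.
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials


def min_correct_for_significance(trials: int, alpha: float = 0.05) -> int:
    """Smallest score whose guessing probability is at or below `alpha`."""
    for correct in range(trials + 1):
        if abx_p_value(correct, trials) <= alpha:
            return correct
    raise ValueError("too few trials to reach the requested level")


if __name__ == "__main__":
    for trials in (10, 16, 25):  # hypothetical session lengths
        need = min_correct_for_significance(trials)
        p = abx_p_value(need, trials)
        print(f"{trials} trials: need {need} correct (p = {p:.4f})")
```

Under these assumptions, a listener needs 12 correct answers out of 16 trials before guessing becomes an unlikely explanation at the 5% level.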
Clark added a note in his 1991 paper, based on a private conversation with Thomlinson Holman (the “TH” of THX). Holman has found that a number of professional power amplifiers do distort audibly when driving highly reactive loads (e.g., some theater speakers) when playing explosive movie sounds. The cause could well be that under such severe load conditions, the amps’ power supplies experience instantaneous drops in voltage. This condition would be easy to identify using an oscilloscope.
Audio Pro’s Challenge
Audio professional Richard Clark (note: this is a different Clark), originally a believer that different amplifiers sound different, set up a $10,000 challenge: anyone who, by listening only, can identify which of two amplifiers is which, under rules he has established, will receive the prize. The index for all the tests and results can be found at http://tom-morrow-land.com/tests/ampchall/index.htm. The rules primarily cover minimum-quality levels for participating amplifiers, level matching, and so forth, all of which are essential to the validity of any comparative test.
In the years since the challenge was first offered, most large groups have obtained accuracy of 49–51%, which is essentially the result one would expect to occur by chance. Smaller groups have never gotten more than 60% correct. In any statistical sampling, small test groups are more likely to deviate from chance results. As the test population is increased, test results converge toward a specific value. For a random process, a larger test population is likely to converge toward chance results.
The fact that Clark found more nearly chance results when measuring larger groups is itself a strong indicator that the test subjects’ responses were random, not ordered as would be the case if there were perceptible differences between the amplifiers being tested. These are averaged scores for the groups. No individual has ever reached 65% correct. These results do not permit us to say no person can ever hear differences between two good amplifiers, but they do strongly indicate that any such differences must not be very robust.
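The way chance scores tighten around 50% as the number of trials grows can be sketched with the normal approximation to the binomial distribution. The Python example below is only an illustration: the trial counts are hypothetical, not Richard Clark's actual group sizes, and the function name is my own.

```python
from math import sqrt


def chance_band(trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate percent-correct range that pure guessing produces about 95% of the time.

    Uses the normal approximation: the standard deviation of the
    proportion correct under 50/50 guessing is 0.5 / sqrt(trials).
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    half_width = z * 0.5 / sqrt(trials)
    low = max(0.0, 0.5 - half_width) * 100
    high = min(1.0, 0.5 + half_width) * 100
    return low, high


if __name__ == "__main__":
    for trials in (25, 100, 2500):  # hypothetical totals of pooled trials
        low, high = chance_band(trials)
        print(f"{trials:5d} trials: guessing gives about {low:.1f}% to {high:.1f}% correct")
```

With 25 trials the band runs from roughly 30% to 70%, while with 2,500 trials it narrows to roughly 48% to 52%. A small group scoring 60% and a large group scoring 49–51% are therefore both consistent with guessing.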
No test such as David Clark’s or Richard Clark’s can ever show that there are no perceptible differences in amplifier sound. Proving the nonexistence of anything is philosophically problematic because we can truthfully say only that any experiment did not find such-and-such a thing. However, we cannot perform all possible experiments.
As an analogy, we cannot say there are no white crows, because we cannot look everywhere at once. If we postulate the nonexistence of white crows, the person who finds a single example will prove us wrong. So far, however, no person has demonstrated publicly in a scientific fashion that he can reliably distinguish between the sound of two good amplifiers with identical frequency response and low noise driven below clipping. And yet the perception remains that there are real differences in amp sound. As a consultant in acoustics and sound/video system design, I regularly encounter people who assume I must have a vacuum-tube stereo system “because everyone knows they’re better.”
It is true that almost all tube power amplifiers have a very slight high-frequency rolloff (tenths of a dB), but no test has convincingly shown that such a small deviation from flat is perceptible. (If it is, and makes the sound better, should we all add inexpensive RC low-pass filters to our amplifiers to improve the sound so they sound like tube amps?)
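For readers curious about the size of filter this implies, the rolloff of a simple first-order RC low-pass filter is easy to calculate. The Python sketch below uses the standard first-order response and ignores source and load impedances. The 0.3-dB target at 20 kHz and the 10-kΩ resistor are hypothetical values chosen only for illustration, as are the function names.

```python
from math import log10, pi, sqrt


def rc_rolloff_db(freq_hz: float, corner_hz: float) -> float:
    """Attenuation in dB of a first-order low-pass filter at freq_hz."""
    return 10 * log10(1 + (freq_hz / corner_hz) ** 2)


def corner_for_rolloff(freq_hz: float, rolloff_db: float) -> float:
    """Corner (-3 dB) frequency that yields rolloff_db of attenuation at freq_hz."""
    if rolloff_db <= 0:
        raise ValueError("rolloff_db must be positive")
    return freq_hz / sqrt(10 ** (rolloff_db / 10) - 1)


def capacitor_for_corner(resistance_ohm: float, corner_hz: float) -> float:
    """Capacitance in farads for a given series resistance and corner frequency."""
    return 1 / (2 * pi * resistance_ohm * corner_hz)


if __name__ == "__main__":
    corner = corner_for_rolloff(20_000.0, 0.3)       # hypothetical target
    cap = capacitor_for_corner(10_000.0, corner)     # hypothetical 10-kilohm resistor
    print(f"corner frequency: {corner / 1000:.1f} kHz")
    print(f"capacitor: {cap * 1e12:.0f} pF")
    print(f"check at 20 kHz: {rc_rolloff_db(20_000.0, corner):.2f} dB")
```

A rolloff of 0.3 dB at 20 kHz works out to a corner frequency in the neighborhood of 75 kHz, well above the audio band.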
The Search Continues
There are still serious researchers who are trying to find the elusive ingredient to tube sound. In October, 2011, Shengchao Li of Potomac, MD, presented a paper, “Why Tube Amps Have Fat Sound While Solid-State Amplifiers Don’t,” to the Audio Engineering Society’s 131st convention in New York.
The paper was reviewed by two qualified anonymous reviewers. Li begins from the assumption that tube amps do indeed sound different. He proceeds to explain that the differences arise from output-tube nonlinearities, amplifier output impedance, and output-transformer nonlinearity resulting from the core material’s B-H curve. These nonlinearities, Li says, interact to reduce the low-frequency output of the amplifier under some conditions. The speakers, whose nonlinearity is worst at low frequencies, thus have less signal at those frequencies and thus produce less distortion. In this manner, some low-frequency output is traded for reduced low-frequency distortion.
The first two mechanisms suggested by Li have been discussed in earlier Hollow-State columns. The third may be significant in low-feedback amplifier designs, but in more typical Williamson designs, transformer nonlinearities are largely compensated by negative feedback. Certainly all three mechanisms will be exacerbated at levels approaching or exceeding clipping. It would be instructive to see if a low-feedback tube amplifier could be identified in A/B/X testing, and just what level of overdriving is necessary to permit objective identification of any amplifier.
Choosing a Side
Let us now step bravely into the hornets’ nest. After four decades of testing by a number of very capable scientists, no evidence has been published that shows any objective difference among the sound of good-to-excellent audio amplifiers operated well below clipping, if the frequency responses are equalized within 0.25 dB of flat. Does this indicate there is no “tube sound”? It does not, for several reasons.
First, almost no home-music listener (or even recording studio engineer) equalizes the amplifiers flat within 0.25 dB. Virtually no speaker pairs are matched that closely all across the passband. And a flat response is not always everyone’s first choice. Hi-fi systems of the 1950s usually had “scratch” and “rumble” filters to remove the sound of artifacts on vinyl recordings. These typically applied a 3-dB/octave high cut above 8 kHz, and low cut below 80 Hz, respectively. Presumably they were included because equipment manufacturers found that their customers wanted and used them: at least under some conditions, listeners did not prefer flat response.
Second, a significant number of audio amplifiers are made for instrument amplification. The distortion used intentionally with electric guitars is well-known. I have also played with professional keyboardists who preferred tube-type Hammond organs because of the “growl” they produce at high volumes. This growl comes from distortion, largely intermodulation distortion, in the tube power amplifiers. A smaller percentage of audio listeners, but still a non-negligible number, prefer some distortion even in their music reproduced from recordings.
Naturally, the distortion spectrum is quite important to such people. As shown in a previous column and in Hiraga’s 1977 tests, that spectrum is very different for a tube amplifier versus a solid-state amplifier. It is a safe generalization to say that most people prefer tube distortion over the distortion of a solid-state amplifier. Most probably prefer triode tube distortion specifically.
Third, there are conditions under which distortion, even if not desired, occurs in an audio chain. It can happen when a person is listening to music at high levels, especially if low-efficiency speakers and/or underpowered amplifiers are being used. I believe this occurrence is the rule rather than the exception for many serious music listeners. Not only rock music, but also symphonic music, pipe organ music, and big-band jazz require a lot of amplifier power in order to play through typical home-stereo speakers at realistic levels without clipping.
Aside from amplifiers for music listening, another place where “tube sound” is much hyped is in microphone preamps for audio recording. In this application, it is not uncommon for the microphone to put out surprisingly high voltages. Capacitor microphone cartridges, especially, are almost immune to clipping, so when exposed to high sound pressures, they can produce output voltages close to 1,000 times their normal output levels. Under these conditions, preamp clipping is pretty much inevitable. Again, the distortion spectrum is very important, and most recording engineers prefer the spectrum provided by tube preamps.
So in this columnist’s codgerly opinion, there is indeed a “tube sound” under some conditions, and many excellent technicians, engineers, and audiophiles find it preferable for specific applications. It is not, however, scientifically accurate to claim that hollow-state amplifiers are better or worse than — or even perceptibly different from — solid-state ones for all applications.
Purely audio considerations aside, many audiophiles prefer the aesthetics of a “warm,” artistically designed (perhaps handcrafted) amplifier over the usual “high-tech” appearance of most solid-state amplifiers. And many of us enjoy the “legacy” feel of the equipment setup when operating vacuum tubes are visible. These are valid reasons to buy hollow-state equipment, if it appeals to you.
Test for Yourself
If you are still not satisfied about the topic of “tube sound,” you are invited to take part in an online test. The test has two parts. In the first part, you will be invited to use quality headphones (please, not computer speakers) to listen to a pair of .wav files of a short clip played by the lead guitarist of Darrell Harwood and the Coolwater Band. One file was made by recording directly to digital from the guitar, then playing the recording through a solid-state power amplifier adjusted to produce an acceptable amount of distortion for proper artistic effect. The other file uses the same digital recording, played through a tube amplifier adjusted in the same way.
Both recordings were made using the same speaker and cabinet, and were recorded using a type 1 calibrated measurement microphone with a tensioned stainless steel diaphragm. The same physical setup was used for both recordings, and the levels of the recordings were carefully matched. You may play the files as many times as you wish, and will then be requested to send an e-mail stating which file, “A” or “B,” sounded better to you. If you choose to include your opinion as to which is the tube-amp recording and which is the solid-state recording, please do that as well. The purpose of this part of the test is to illustrate the sound of the different distortion spectra of tube versus solid-state amplifiers.
The second part of the test consists of three .wav recordings of a brief clip of classical music. There is a reference clip made directly from the CD, a clip of the selection being played through a McIntosh hollow-state power amplifier, and a clip of the selection being played through a commercial solid-state power amplifier. Both of the “amplifier” recordings were made using an Evenstar Pro Peregrin studio monitor speaker, recorded using the measurement microphone described above, with the same physical setup. No equalization was applied to either amplifier, and the amplifiers were operated well below clipping. The reference (“X”) straight-through recording is identified, and you are asked to listen to the “A” and “B” recordings, determine which sounds more like the reference, and e-mail your results.
These tests can be found online at www.edcsound.com/amptest. Instructions for participating are included on the website.
Part 1 of this article: Differences in Amp Sound: How Do We Find the Truth?