A Look at Soundbar Audio Signal Processing

December 12 2018, 10:10
In this article, we will take a look at what is trending in audio signal processing for the poster child of integrated electronics speakers… the soundbar.

As an audio consumer product category, soundbars remain a top seller, having replaced receivers and their associated passive speaker systems, which have sadly been reduced to legacy product categories. More than 10 million soundbars are sold annually but most deliver barely more performance than a table radio. Entry-level soundbars typically have very little power, a bit of rudimentary spatial processing, and a woofer posing as a subwoofer.

The Samsung surround soundbar that I use in my bedroom has a power supply rated at 40W, so with maybe 80% real-world Class D efficiency, and allowing for the power drawn by the signal processing, the amplifier output power is probably about 5W per channel. With more than 10 million units sold each year, I think 50-plus million soundbars in use are due for an upgrade. Aside from better components overall and a reasonably higher price point, there are a number of truly outstanding signal processing algorithms that can provide a step-change in immersive spatiality, freedom from overload, clearer dialog, and more credible bass.
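The back-of-the-envelope power estimate above can be reproduced in a few lines. This is only a rough sketch: the 40W supply and 80% efficiency come from the text, while the 5W drawn by the signal processing and the six amplifier channels are assumptions chosen for illustration, and the function name is hypothetical.

```python
def per_channel_power(supply_watts, amp_efficiency, dsp_watts, channels):
    """Rough continuous output power available per amplifier channel."""
    amp_input = supply_watts - dsp_watts      # what is left after the DSP/SoC
    amp_output = amp_input * amp_efficiency   # remainder after Class D losses
    return amp_output / channels


# 40 W supply and 80% efficiency are from the article;
# the 5 W DSP draw and the 6 channels are assumed values.
print(round(per_channel_power(40, 0.80, 5, 6), 2))  # 4.67
```

With those assumed values, the result lands near the 5W per channel guessed above.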
The Bose Soundbar 700 uses a custom digital signal processor with Bose’s PhaseGuide technology for projecting discrete sound where there are no speakers, along with ADAPTiQ room EQ compensation and voice assistant support.

Potential Improvements
To start, a soundstage wider than the soundbar enclosure itself is desirable. Most entry-level soundbars already include some sort of spatial processing, typically a simple anti-phase stereo image expanding circuit (e.g., SRS, Waves MaxxStereo, or something similar).
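The generic anti-phase idea can be sketched as follows: each output channel gets a scaled, inverted copy of the opposite channel. This is a textbook illustration only, not the proprietary SRS or Waves algorithms, and the function name and the `amount` parameter are hypothetical.

```python
def widen(left, right, amount=0.5):
    """Expand a stereo image by mixing in the opposite channel in anti-phase."""
    out_left, out_right = [], []
    for l, r in zip(left, right):
        out_left.append(l - amount * r)    # inverted right channel into left
        out_right.append(r - amount * l)   # inverted left channel into right
    return out_left, out_right


# Center (identical) content is attenuated, side (opposite) content is boosted.
print(widen([1.0], [1.0]))    # ([0.5], [0.5])
print(widen([1.0], [-1.0]))   # ([1.5], [-1.5])
```

The trade-off is visible in the two printed cases: whatever is common to both channels, which usually includes dialog, drops in level as the image gets wider.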

The more sophisticated surround sound decoding delivers, in some cases, height as well as horizontal surround localization. Examples aside from Fraunhofer’s IIS include Dolby Atmos, Qualcomm 3D Audio Studio (actually another type of IIS), DTS:X, Dirac Panorama, and others.

While the limited spatial processing in most soundbars is a function of low cost, there may also be intellectual property issues. Typically, the low-end soundbar manufacturers use a simple chip solution, or the processing is embedded in the system-on-a-chip (SoC) from the selected semiconductor provider.

Dolby was not always paid all the royalties it was owed by its Chinese licensees, which forced the company to be more particular about whom it would work with. An unintended consequence was that many of the less established (I am being tactful here) soundbar OEMs/ODMs did not make the grade. These vendors had to develop their ODM models using alternative spatial processing. A visit to Best Buy will show how few soundbars offer Dolby processing. Today, the balance between receivers and soundbars has flipped, and 10 million soundbars are sold a year versus about 1 million receivers. Dolby Atmos, Fraunhofer MPEG-H, Dirac Panorama, and DTS:X are all immersive sound processing technologies that deliver the shock and awe needed for consumers to ditch their inferior soundbars.
The MPEG-H TV Audio System is designed to work with today’s broadcast and streaming equipment. The object-based system includes interactive and immersive audio that enables viewers to adjust the sound mix to their preferences and improves the realism of sound.

Object-Based Audio Immersive Sound
The key attributes of this agile surround solution give the designer a scalable and flexible number of playback channels regardless of the program source, while still properly conveying to listeners the spatial attributes intended by the sound designer at the studio.
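To illustrate why an object-based mix scales across playback layouts, here is a minimal sketch that renders one sound object, described only by its horizontal angle, to whatever speakers are present. It uses plain pairwise constant-power panning, which is an assumption made for illustration and not the MPEG-H renderer. The function name and the convention that negative angles are to the left are hypothetical, and speaker angles are assumed to be distinct.

```python
import math


def render_object(azimuth_deg, speaker_azimuths):
    """Return one gain per speaker for a sound object at azimuth_deg.

    Pans between the two adjacent speakers with a constant-power law;
    objects outside the array are clamped to the nearest end speaker.
    """
    spk = sorted(speaker_azimuths)
    gains = {a: 0.0 for a in spk}
    if azimuth_deg <= spk[0]:
        gains[spk[0]] = 1.0
    elif azimuth_deg >= spk[-1]:
        gains[spk[-1]] = 1.0
    else:
        for lo, hi in zip(spk, spk[1:]):
            if lo <= azimuth_deg <= hi:
                frac = (azimuth_deg - lo) / (hi - lo)
                gains[lo] = math.cos(frac * math.pi / 2)
                gains[hi] = math.sin(frac * math.pi / 2)
                break
    # Report gains in the order the caller listed the speakers.
    return [gains[a] for a in speaker_azimuths]


# The same object at 15 degrees, rendered to two different layouts.
stereo = render_object(15, [-30, 30])
three = render_object(15, [-30, 0, 30])
print(stereo)
print(three)
```

The program material carries only the object and its position; the gains are worked out at playback time, so the same content can feed two drivers or a dozen.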

Soundbars have not achieved the acceptance in Asia or Europe that they have in the US market, but China’s AVS 3D Audio Task Group has just chosen Fraunhofer’s MPEG-H as the codec solution for its 3D object-based immersive audio standard for 4K UHD broadcast. Fraunhofer is a huge European research group that has developed all sorts of audio technology, with projects ranging from surround algorithms (e.g., MPEG-H) to diamond tweeter diaphragms. The home theater and high-end stereo market in China is not thriving, and perhaps the timing is right for high-grade soundbars there also.

This next-generation audio codec is already on the air in South Korea, as part of the country’s broadcasting trials based on the ATSC 3.0 standard, and the immersive and interactive features of Fraunhofer’s MPEG-H enhanced viewers’ audio experience during the 2018 Winter Olympics in Pyeongchang, Korea. MPEG-H is part of the ATSC 3.0 and the DVB-UHD television standards, and is also suitable for over-the-top (OTT) content.

Sennheiser has demonstrated Fraunhofer’s IIS system in its high-end soundbar prototype at a number of audio shows and it is impressive. I know engineers should not use themselves as a marketing test subject, but I would pay $1,500 for achieving high-fidelity home-theater sound in the soundbar form factor. And aside from using decent speakers, the only way to get there is through signal processing. 
Sennheiser’s AMBEO soundbar boasts immersive 5.1.4 sound jointly developed with Fraunhofer. The AMBEO soundbar captures knowledge of room size and reflective surfaces and achieves deep bass response down to 30 Hz without the need for a subwoofer.

What About AutoEQ and Augmented Response Processing?
The surround sound receiver business is, as with the passive speakers that connect to receivers, in a sad state. About a decade ago, 10 million surround receivers were sold per year but only 1 million soundbars. Back then, many if not most of the surround receivers (from Pioneer, Denon, Marantz, Kenwood, Onkyo, Sony, etc.) offered autoEQ functions. The end user or system integrator set up an inexpensive measurement mic, hit the EQ button, and “calibrated” the sound system to the seating position and the room. Audyssey was one of the big players that licensed its AutoEQ. Audyssey’s ingenious approach borrowed the receiver’s DSP, normally used for Dolby decoding, to run the EQ measurement process. Once calculated, the resulting response settings required little DSP horsepower, so most of the millions of instructions per second (MIPS) of computing power could be turned back over to the Dolby decoding algorithms. Aside from Audyssey, Trinnov and Dirac also offer this type of room correction. More recently, Even, Hearezanz, and Mimi have introduced “augmented hearing processes” that include the listener’s head as a component in their compensation correction. While originally intended for headphones, where the room acoustics are negligible, this type of software can work for speaker systems (and the lowly soundbar), on the limited condition of one listener in a specific listening position.
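The core of any autoEQ step can be sketched as comparing a measured response with a target and deriving a bounded correction per band. This is a simplified illustration and not the Audyssey, Trinnov, or Dirac method. The function name, the band values, and the boost and cut limits are all assumptions.

```python
def eq_correction(measured_db, target_db, max_boost_db=6.0, max_cut_db=12.0):
    """Per-band gain (dB) that moves the measured response toward the target."""
    gains = []
    for measured, target in zip(measured_db, target_db):
        gain = target - measured
        # Limit the correction: a deep room null cannot be filled by boosting.
        gain = max(-max_cut_db, min(max_boost_db, gain))
        gains.append(gain)
    return gains


# Hypothetical measurement at the seating position, relative to a flat target.
measured = [-10.0, 3.0, 0.0, -2.0]
target = [0.0, 0.0, 0.0, 0.0]
print(eq_correction(measured, target))  # [6.0, -3.0, 0.0, 2.0]
```

The boost limit is a deliberate design choice: trying to fill a 10 dB room null with 10 dB of gain mostly burns amplifier headroom, which an entry-level soundbar does not have to spare.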

Another “killer app” for soundbar signal processing that has yet to launch is home video conferencing. This extends the home-theater flat screen and soundbar beyond playback to full duplex speakerphone functionality. Limes Audio, known in the conferencing industry for its full duplex processing used by several vendors, developed a full duplex soundbar solution that it planned to license to the soundbar industry. Limes Audio focused on full duplex in connection with far-field voice control/activation. Alas, too tasty not to be eaten by a bigger fish, Limes Audio was swallowed up by Google’s skunk works in January 2017, never to be heard of again. But the processing power needed for immersive spatial audio can be unleashed for full duplex acoustic echo canceling when the phone rings, as well as for stable and accurate voice command from the couch to adjust the blaring TV.
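Acoustic echo canceling of the kind mentioned above is commonly built around an adaptive filter that learns the path from the loudspeaker to the mic and subtracts the predicted echo. The sketch below uses a basic normalized LMS (NLMS) filter as a stand-in. It is not the Limes Audio algorithm, the simulated two-tap echo path is invented for the demo, and a real product would also need double-talk handling for when both parties speak at once.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the loudspeaker echo from the mic signal."""
    weights = [0.0] * taps
    history = [0.0] * taps          # recent far-end samples, newest first
    cleaned = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate   # mic signal with the estimated echo removed
        power = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / power for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned


# Demo with a made-up echo path: the mic hears delayed, attenuated playback.
rng = random.Random(0)
far_end = [rng.uniform(-1, 1) for _ in range(2000)]
mic = [
    0.6 * (far_end[n - 1] if n >= 1 else 0.0)
    + 0.3 * (far_end[n - 3] if n >= 3 else 0.0)
    for n in range(2000)
]
cleaned = nlms_echo_cancel(far_end, mic)
echo_power = sum(s * s for s in mic[-200:])
residual_power = sum(s * s for s in cleaned[-200:])
print(residual_power < 0.01 * echo_power)
```

Real rooms have echo tails hundreds of milliseconds long, so a practical filter needs thousands of taps rather than eight, which is where the spare processing power comes in.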
Tymphany and voice solutions specialist XMOS demonstrated a new Alexa Built-In soundbar at Amazon’s exhibit during IFA 2018 in Berlin, Germany.

70 dB Mics
I never use voice command user interfaces; in my many experiences, they still come up short. Okay, when used in a quiet space on a boom mic, they often work, but even just a short distance away (e.g., the mic located on an earphone) or used walking on a city street, all bets are off. Surely, most words said to Siri, Cortana, or Alexa are foul language, and even the most courteous users are still mostly telling them to shut up and go away. While voice recognition algorithms have a way to go, another aspect that will be part of the solution is a lower mic noise floor.

Microelectromechanical systems (MEMS) mics have come a long way, from a 50 dB signal-to-noise ratio (SNR) at the start in 2001, to typically 60 dB five years ago, to 65 dB for decent examples today. Yet 75 dB is really needed for consistent far-field voice command results (with the talker 10 feet from the mic). Also, as the SNR improves, the number of mics in the array can shrink, which reduces cost and complexity (it is not just the mics but also the number of codecs, etc.).
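The link between per-mic signal-to-noise ratio and array size can be made concrete with the idealized rule that summing N mics improves the ratio by 10·log10(N) dB. That rule assumes the speech adds coherently and each mic's self-noise is uncorrelated with the others; it ignores room noise, so real arrays will do worse. The function names are hypothetical.

```python
import math


def array_snr_db(mic_snr_db, num_mics):
    """Ideal SNR of a summed array of identical mics with uncorrelated self-noise."""
    return mic_snr_db + 10 * math.log10(num_mics)


def mics_needed(mic_snr_db, target_snr_db):
    """Smallest ideal array size that reaches the target SNR."""
    ratio = 10 ** ((target_snr_db - mic_snr_db) / 10)
    # Round first so floating-point fuzz does not add a spurious extra mic.
    return max(1, math.ceil(round(ratio, 9)))


print(mics_needed(65, 75))  # 10 mics at 65 dB each
print(mics_needed(69, 75))  # 4 mics at 69 dB each
print(mics_needed(75, 75))  # 1 mic
```

Under these idealized assumptions, moving from 65 dB to 69 dB parts cuts the array from ten mics to four, which is the cost and complexity saving described above.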

Knowles, Infineon, Vesper, and other MEMS mic vendors have announced or promised 69 dB mics with roadmaps for a bit more. Further down the road, sensiBel with its optical MEMS mic and GraphAudio with graphene capacitive MEMS technology may reach 80 dB or higher.

Kopin’s Whisper Voice Extraction processor promises (and seems to deliver) far greater speech recognition accuracy in noisy environments and dramatically improved far-field voice quality in smartphones, wearables, or just from the couch to the soundbar. Developed for the voice interface in fighter jet helmets, it pre-processes the speech, which results in significant improvements in recognition rates and a more natural-sounding voice for the far-end listener.

Audience, now the Knowles Intelligent mic group, has integrated advanced speech processing into the MEMS mic itself. With the lion’s share of the voice command business, Sensory’s TrulyHandsfree Voice Control technology offers multiple phrase technology that recognizes, analyzes, and responds to dozens of keywords. It recognizes phrases even when embedded in sentences and surrounded by noise. Well… sometimes anyway. With voice-controlled user interfaces becoming a must-have function for everything from smartphones to Bluetooth earphones and smart wireless speakers to soundbars, the pressure is on for robust and stable solutions.

Streaming video programming seems to have too much level variation from program to program, so automatic level control is a necessary component, along with dynamic range compression; both are offered by most of the signal processing providers. The lack of synchronization between processed video and the audio, as well as wireless speaker and Bluetooth headphone latency, are other aspects that require adjustability in audio-for-video applications.
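A very simple automatic level control can be sketched as a slowly moving gain that steers each block of audio toward a target loudness. This is an illustration only and not any vendor's algorithm. The function name, the RMS target, the gain ceiling, and the smoothing factor are assumed values, and blocks are assumed to be non-empty.

```python
import math


def auto_level(blocks, target_rms=0.1, max_gain=8.0, smoothing=0.9):
    """Scale each block of samples so its level drifts toward target_rms."""
    gain = 1.0
    out = []
    for block in blocks:
        rms = math.sqrt(sum(s * s for s in block) / len(block))
        # Hold the current gain during digital silence instead of dividing by zero.
        wanted = min(max_gain, target_rms / rms) if rms > 0 else gain
        gain = smoothing * gain + (1 - smoothing) * wanted   # slow gain change
        out.append([s * gain for s in block])
    return out


# A loud program (0.5) and a quiet one (0.01), each as 100 constant blocks.
loud = auto_level([[0.5] * 100 for _ in range(100)])
quiet = auto_level([[0.01] * 100 for _ in range(100)])
print(round(loud[-1][0], 3), round(quiet[-1][0], 3))
```

The gain ceiling matters for quiet material: without it, the controller would keep raising the level of near-silent passages and bring up the noise floor with them.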

The soundbar is destined to remain the de facto audio solution and will progressively migrate to respectable quality through a combination of quality speaker design along with a strong measure of signal processing. VC

This article was originally published in Voice Coil, October 2018