Imagine watching a crime drama on TV. The detective interrogates a suspect as suspense-filled music plays in the background. The back-and-forth dialog, however, is hard to understand because of the music’s volume. A similar dilemma happens while watching a live sports broadcast, where the commentary drowns out the stadium atmosphere.
TV stations have to deal with issues like these all the time. They have to find a compromise in the audio mix that suits a diverse audience. What would change the game is enabling viewers to change the audio playback to their own preferences. The MPEG-H TV Audio System, substantially developed by Fraunhofer IIS, is designed to offer new interactive options for the viewer as well as deliver immersive 3D sound. The format is integrated in the ATSC 3.0 and DVB television standards. It has been on air in South Korea since May 2017 as part of the world’s first regular terrestrial UHDTV service. MPEG-H, however, is not only a new audio codec, but a complete system — from the microphone to the loudspeaker — and opens up new options for creatives as well as new sound dimensions for viewers.
MPEG-H Audio: Creative Freedom Through Channels, Objects, and Ambisonics
MPEG-H offers three dimensions of efficient audio transmission: channels, objects, and ambisonics. Channel-based audio ensures, among other things, that all conventional content can be broadcast using MPEG-H. Additionally, the capabilities of channel-based audio are enhanced by the introduction of height channels, enabling the playback of true 3D immersive sound. For usage in ATSC 3.0 and DVB, the number of available loudspeaker channels in the MPEG-H TV Audio System is limited to 12. That way, common formats such as 5.1+4 and 7.1+4 are possible. For the Japanese market, the MPEG standard allows for a variant that supports the 22.2 system proposed in Japan.
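The channel limit can be checked with simple arithmetic. The following is a minimal sketch, assuming that a layout label of the form "main.LFE+height" is counted by summing its three parts; the function name and the constant are illustrative and not part of any MPEG-H API.

```python
MAX_TV_CHANNELS = 12  # limit quoted above for use in ATSC 3.0 and DVB


def channel_count(layout: str) -> int:
    """Count loudspeaker channels in a label such as '5.1+4' or '22.2'.

    Assumed convention: main channels + LFE channels + height channels.
    """
    base, _, height = layout.partition("+")
    main, _, lfe = base.partition(".")
    return int(main) + int(lfe or 0) + int(height or 0)


for label in ("5.1+4", "7.1+4", "22.2"):
    count = channel_count(label)
    fits = count <= MAX_TV_CHANNELS
    print(f"{label}: {count} channels, within TV limit: {fits}")
```

Under this convention, 5.1+4 and 7.1+4 stay within the 12-channel limit, while 22.2 exceeds it, which is consistent with the separate variant mentioned for the Japanese market.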
The second dimension is audio objects, which include interactive elements or dynamic sound effects, such as a helicopter flying “above” the viewers. Objects are not tied to a specific playback setup, but are defined via their positions in the room. Supplementary metadata, which describe the characteristics of each object, are also transmitted inside the bitstream. With the help of such metadata, the object is reproduced over the loudspeakers in the playback setup. Another advantage of audio objects is their ability to be manipulated by the viewer within boundaries defined by the content providers.
For example, a viewer can switch between different languages or choose from several commentators for sports broadcasts. The transmission of music and effects in a channel-based “sound bed” along with dialog and commentary tracks as objects enables TV broadcasters to efficiently provide different commentary options — including in multiple languages — since the sound bed only needs to be transmitted once instead of separately for each language.
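The saving can be illustrated with a small calculation. The sketch below compares the number of audio signals needed when every language gets its own complete channel-based mix with the number needed for one shared sound bed plus one dialog object per language. The 5.1 bed (six channels), the mono dialog objects, and the function names are assumptions chosen for illustration, not figures from the MPEG-H specification.

```python
def signals_separate_mixes(bed_channels: int, languages: int) -> int:
    """Every language carries its own full channel-based mix."""
    return bed_channels * languages


def signals_bed_plus_objects(bed_channels: int, languages: int) -> int:
    """One shared bed plus one (assumed mono) dialog object per language."""
    return bed_channels + languages


BED_CHANNELS = 6  # assumed 5.1 sound bed
for languages in (1, 2, 4):
    separate = signals_separate_mixes(BED_CHANNELS, languages)
    shared = signals_bed_plus_objects(BED_CHANNELS, languages)
    print(f"{languages} language(s): {separate} signals vs. {shared} signals")
```

With a single language the object-based approach carries one signal more, but the advantage grows with every additional language or commentary option.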
Another benefit of using objects for the speech elements in a broadcast is that the volume and, if desired, the position of each object can be manipulated. For the first time, this enables users to individually optimize the mix of dialog and music/effects for their specific listening situation. This option, known as “Dialogue Enhancement,” also makes it easier for viewers with hearing impairments to understand the TV broadcast. On mobile devices, the mix can be further adapted to surrounding noise conditions.
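The principle behind adjusting the dialog level can be sketched in a few lines. The example assumes a mono sound bed and a mono dialog object given as lists of samples, with the user's adjustment expressed in decibels; the function names are hypothetical, and this is not the processing performed by an actual MPEG-H decoder.

```python
def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def mix_with_dialog_gain(bed, dialog, dialog_gain_db):
    """Add the dialog object to the bed after scaling it by the user's gain."""
    gain = db_to_linear(dialog_gain_db)
    return [b + gain * d for b, d in zip(bed, dialog)]


bed = [0.25, -0.5]     # music/effects samples (assumed mono)
dialog = [0.5, 0.25]   # dialog object samples (assumed mono)

original_mix = mix_with_dialog_gain(bed, dialog, 0.0)
enhanced_mix = mix_with_dialog_gain(bed, dialog, 6.0)  # dialog raised by 6 dB
print(original_mix)
print(enhanced_mix)
```

Because the dialog arrives as a separate object, only its contribution changes; the music and effects in the bed remain untouched.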
Ambisonics, the third dimension, is a mathematical representation of the sound field. As such, it is independent of the playback setup. This makes Ambisonics an excellent option for applications that require manipulation of the audio content (e.g., Virtual and Augmented Reality). The order of the Ambisonic representation, First Order Ambisonics (FOA) or Higher Order Ambisonics (HOA), defines the spatial resolution of the signal: the higher the order, the more precise the localization in the sound field. The MPEG-H TV Audio System supports Ambisonic representations up to sixth order. Combining Ambisonics with objects makes it possible to work with a low Ambisonic order: Ambisonics carries the sonic background atmosphere, while audio objects carry the sound elements that need to be localized exactly.
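The cost of a higher order becomes clear from the number of signals involved. For a full three-dimensional Ambisonic representation of order N, the sound field is described by (N + 1)² component signals. The short sketch below tabulates this for the orders mentioned above; the function name is illustrative, and the figures say nothing about how MPEG-H codes these signals for transmission.

```python
def ambisonic_channels(order: int) -> int:
    """Number of component signals in a full 3D Ambisonic set of a given order."""
    if order < 0:
        raise ValueError("Ambisonic order must be non-negative")
    return (order + 1) ** 2


for order in range(1, 7):
    print(f"Order {order}: {ambisonic_channels(order)} signals")
```

First order needs 4 signals, while sixth order needs 49, which is one reason why pairing a low-order representation with a few objects is attractive.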
MPEG-H Used in Production
MPEG-H Audio has been on the air since May 2017 as the sole audio codec of South Korea’s terrestrial UHDTV service and was most recently used to broadcast the 2018 Winter Olympics in Pyeongchang. This means that the complete production and transmission chain was fully realized using MPEG-H. The respective equipment is readily available, including broadcast encoders from various manufacturers, audio monitoring and authoring units for the production of metadata in live broadcasts, and various post-production plug-ins for Digital Audio Workstations (DAWs). In the studio and production ecosystem, metadata is transmitted via SDI: the audio tracks are carried as uncompressed PCM, and the metadata travels in the “Control Track” format on the 16th SDI channel.
In the last few years, the team of sound engineers at Fraunhofer IIS gained significant knowledge and expertise in recording and producing content with the new opportunities offered by MPEG-H (see Photo 2). This knowledge has been utilized in several TV productions in collaboration with European broadcasters, such as a documentary by one of the major public TV stations in Germany. The know-how in creating immersive and interactive content is also being passed on to other sound engineers at various workshops. In South Korea, Fraunhofer IIS operates a training and demo facility where courses and demonstrations for the Korean TV stations are offered.
MPEG-H Playback
The biggest challenge for MPEG-H playback is making the interactive features user-friendly and easily accessible. Therefore, the MPEG-H TV Audio System comes with two levels of interaction. The first level includes broadcaster-set presets that offer a selection of useful alternatives to the original mix.
These may include raised dialog volume, access to a different sports commentator, or the isolated stadium atmosphere. The second level enables users to fine-tune the various objects they can individually manipulate. The extent to which the mix can be manipulated by the user is determined by the presets, which are defined via metadata from the broadcaster.
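The two levels of interaction can be modeled as a small data structure. The sketch below assumes that each preset stores the broadcaster's gain for every object (level one) plus an optional range within which the user may fine-tune it (level two). All class names, field names, and values are hypothetical and do not reflect the actual MPEG-H metadata syntax.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GainRange:
    """Broadcaster-defined limits for user adjustment, in dB."""
    min_db: float
    max_db: float

    def clamp(self, requested_db: float) -> float:
        return max(self.min_db, min(self.max_db, requested_db))


@dataclass
class Preset:
    """Level one: a broadcaster-set mix. Level two: allowed fine-tuning."""
    name: str
    gains_db: Dict[str, float]
    user_ranges: Dict[str, GainRange] = field(default_factory=dict)

    def adjust(self, obj: str, requested_db: float) -> float:
        """Return the gain actually applied to an object after a user request."""
        allowed = self.user_ranges.get(obj)
        if allowed is None:
            # The object is not user-adjustable in this preset.
            return self.gains_db[obj]
        return allowed.clamp(requested_db)


dialog_plus = Preset(
    name="Dialog+",
    gains_db={"dialog": 6.0, "bed": 0.0},
    user_ranges={"dialog": GainRange(0.0, 12.0)},
)

print(dialog_plus.adjust("dialog", 9.0))   # within the allowed range
print(dialog_plus.adjust("dialog", 20.0))  # limited to the broadcaster's maximum
print(dialog_plus.adjust("bed", -10.0))    # bed stays at the preset value
```

Keeping the limits inside the preset mirrors the idea that the broadcaster, not the receiver, decides how far a viewer may depart from the intended mix.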
MPEG-H-enabled TV sets have been on the market since March 2017 from manufacturers such as Samsung and LG. While the interactive features can be enjoyed on every device with MPEG-H support—from large UHD TVs to small smartphones—the reproduction of true 3D sound requires additional hardware. Most TV viewers will, however, be reluctant to install a 3D loudspeaker setup in their living rooms, since it requires numerous loudspeakers and cables along with specific technical knowledge for setting them up.
A more consumer-friendly solution to 3D sound reproduction at home comes in the form of a 3D soundbar. The first 3D soundbars capable of playing back MPEG-H content are expected to enter the market later in 2018, including one from the German company Sennheiser, which debuted its MPEG-H-enabled soundbar prototype at CES 2018 to rave reviews (see Photo 3).
3D Audio Rendering
The most recent Fraunhofer development in this field takes care of rendering 3D sound in a tailored way: The upHear Immersive Audio Virtualizer is a codec-agnostic audio post-processing technology that enables the reproduction of high-quality 3D sound from any input source in soundbars or TV sets.
Fraunhofer upHear combines sophisticated signal analysis with optimal use of the device hardware for signal reproduction (see Figure 1). The upHear algorithm is individually tuned by experienced Fraunhofer sound engineers for the various soundbar product categories. This approach has, among other achievements, led to the development of an effective soundbar design that reproduces immersive audio without the need for additional satellite surround loudspeakers and subwoofers.
Thanks to intelligent sound distribution in the room, an upHear-enabled soundbar can achieve an immersive audio quality that’s almost indistinguishable from discrete 3D loudspeaker setups. upHear can process immersive audio formats as well as legacy surround and stereo content, which is spatially enhanced with the help of the integrated upmix. The result is a 3D audio consumer experience that is convenient and easy to set up, without compromising on high-quality sound reproduction.
MPEG-H Audio is the perfect partner for upHear, as it enables efficient transmission of high-quality immersive audio through TV broadcasts or streaming services at bit rates that are typically used today for 5.1 surround transmissions alone.
With MPEG-H and associated technologies, Fraunhofer continues the drive to give consumers greater freedom in the way they interact with audio content along with the most convenient way to enjoy immersive sound. Such advancements in audio technology break through the limitations that made enjoying a favorite broadcast a somewhat passive experience. As consumers realize the advancements now available at their fingertips, it is reasonable to expect even more tools and capabilities to be delivered soon by innovators such as Fraunhofer.
More information on the MPEG-H Audio System is available at www.mpegh.com.
This article was originally published in audioXpress, June 2018
About the Author
Stefan Meltzer studied electrical engineering at the Friedrich-Alexander University in Erlangen, Germany. In 1990, he joined the Fraunhofer Institute for Integrated Circuits (IIS) in Erlangen, Germany. After working in the field of IC design for several years, Stefan became the project leader for the development of the WorldSpace Satellite Broadcasting System in 1995 and in 1998 of the XM Satellite Radio broadcasting system. In 2000, he joined Coding Technologies in Nuremberg as Vice President for business development, Germany. After Coding Technologies was acquired by Dolby Labs, Stefan joined Iosono as CTO in April 2008. In January 2010, Stefan started to work as an independent technology consultant with the main focus on audio and multimedia. In this role, he supported Fraunhofer IIS in the business development and marketing activities within the TV broadcast market. In March 2019, he was appointed Chief Business Development Manager at Fraunhofer IIS.
Resources
R. Bleidt and J. Martins, “Considerations for the MPEG-H Audio Standard,”
http://www.audioxpress.com/article/Standards-Review-Considerations-for-the-MPEG-H-Audio-Standard
Fraunhofer IIS, www.iis.fraunhofer.de/audio
upHear, www.uphear.com