For the Love of Music: Loudness Normalization

August 21 2024, 14:10
Wild theories about level normalization "destroying sound" continue to reappear, as if decades of loudness normalization efforts had been for nothing. This article aims to set the facts straight regarding loudness concepts, standards, and implementation. Loudness normalization not only tremendously improves the listening experience for consumers, but also plays a part in controlling sound exposure and preventing hearing loss.

 

From the early days of sound recording, compressors, limiters, and de-essers have been used to keep up the level, making the most of the dynamic range of tape recorders and cutting machines. In listening tests, loudness is a typical confounder because a difference of only a fraction of a decibel is enough to systematically bias results in favor of the louder example. At normal listening levels, everybody likes louder, so what is the problem? 

Digital recording was introduced around 1980, and within a decade dynamic compressors were just as active as before, while the overall level of music started creeping up. New digital processors, however, could push absolute level higher than ever, effectively turning the pop music industry into level junkies with no regard for details such as the sampling theorem or co-existence with other genres.

At the height of the music loudness wars, digital audio was saved from a death spiral by a handful of companies, broadcasters, and labs working together to develop an open, global standard for setting the level reliably and transparently between tracks and programs. 

The ever-increasing compression of music had started spreading to other fields of audio, but it literally became pointless overnight, when any upside of squashing tracks and adding more than 10% of irreversible distortion to them vanished. Consequently, loudness normalization has been a tremendous benefit to listeners, streaming providers, and broadcasters: content from different eras or genres can be interleaved without listeners having to constantly adjust the “volume.”

Without loudness normalization, lowest-common-denominator practice would have kept churning without moderation. Most content (music, broadcast, drama, film, and gaming) would have been destroyed by now, and manufacturers would have been designing with gain structures that made it impossible to break free of an appalling situation. The fine potential of digital audio would have been wasted, and everybody would merely be obsessed with keeping average level close to 0dBFS.

It took an international, rational intervention to stop this death spiral, and we need to remind ourselves of that from time to time. Even in audioXpress, hearsay about loudness normalization finds its way into articles. This was recently the case, so here are some more accurate details and references on the venerable loudness standards.

Level and Loudness Measurement
In the late 1990s, it became obvious that digital audio had a systemic level problem, and it was documented how standard music production procedures led to massive amounts of distortion being generated in mastering, lossy codecs, sample rate converters, and DA converters inside consumer equipment [1]. In addition to new music releases being unpredictable in how much downstream distortion they would generate, classic albums also degraded with every remastering and every new “deluxe” version [2].

Good people at ITU-R wisely decided not only to improve the way peak level was measured, but also to develop a new metric estimating perceived loudness. The two would replace the quasi-peak meters typically in use at the time, and provide a relevant answer when normalizing programs, commercials, and tracks for broadcast.

The pioneering work culminated 20 years ago when loudness models from different labs and companies were compared, and a relatively simple calculation proposed by Gilbert Soulodre of CRC in Canada performed as well as or better than complex and patent-prone submissions from around the world [3]. In the following years, Gilbert’s model was verified in several independent studies, based on thousands of assessments of music, speech, nature, and effect sounds by hundreds of different people [4]. The model, known as “Leq(RLB),” was also extended from its original mono topology to stereo, which included an update of its original frequency weighting.

The final frequency weighting got the designation “K”, and the resulting standards, ITU-R BS.1770 and BS.1771, now cover loudness and true-peak measurement in audio formats from mono to 22.2. In the loudness calculation, channels are summed in the power domain, thereby largely mimicking how loudspeakers behave in the real world. Sample-peak measurement tends to underestimate the peak level burden of a downstream path, sometimes by 6dB or more. Its replacement, true-peak measurement, is more accurate and not so easy to fool. However, true-peak measurement was included in the standards for one reason only: to avoid overload.
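To illustrate why sample peaks underestimate the real peak, the sketch below estimates true peak by upsampling 4x and taking the largest interpolated value. It is a simplification, not the BS.1770 meter: the Hann-windowed sinc interpolator, its length, and the function names are assumptions chosen for readability (the standard specifies its own interpolation filter), and positions too close to either end of the signal are skipped.

```python
import math


def sample_peak_db(x):
    """Largest absolute sample value, in dBFS."""
    peak = max(abs(s) for s in x)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def true_peak_db(x, oversample=4, half_taps=16):
    """Estimate the inter-sample peak in dBFS by 4x windowed-sinc interpolation.

    Assumed interpolator (Hann-windowed sinc), not the BS.1770 filter.
    Positions closer than `half_taps` samples to either end are skipped.
    """
    peak = 0.0
    for n in range(half_taps, len(x) - half_taps):
        for k in range(oversample):
            t = n + k / oversample  # fractional position between samples n and n+1
            acc = 0.0
            for m in range(n - half_taps + 1, n + half_taps + 1):
                d = t - m
                sinc = 1.0 if d == 0 else math.sin(math.pi * d) / (math.pi * d)
                hann = 0.5 + 0.5 * math.cos(math.pi * d / half_taps)
                acc += x[m] * sinc * hann
            peak = max(peak, abs(acc))
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


# A sine at a quarter of the sample rate, phased so every sample lands at +/-0.707
tone = [math.sin(math.pi / 2 * i + math.pi / 4) for i in range(200)]
print(f"sample peak {sample_peak_db(tone):.2f} dBFS, "
      f"true peak {true_peak_db(tone):.2f} dBTP")
```

For this tone, the sample peak reads about -3 dBFS while the interpolated peak comes out close to 0 dBFS: the signal reaches full scale between samples, which is exactly the kind of hidden overload a downstream converter has to cope with.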

The exemplary International Telecommunication Union (ITU) standards are practical, universal, free to use, simple to implement, and rigorously tested. It is quite a challenge to find audio examples where the measurement disagrees much with an average person listening at normal playback level. BS.1770 and BS.1771 are therefore at the core of countless regional measurement and normalization standards, in Australia, Brazil, Canada, China, Europe, India, Japan, Korea, the United States, and elsewhere. Loudness is a rare case of global agreement on a single measurement method. The only minor ambiguity is the unit used to express Loudness, which follows ISO conventions in some regions (LUFS) but refers to the K-weighting elsewhere (LKFS). Both units denote the same quantity, so -20 LUFS, for instance, is exactly the same as -20 LKFS.
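As an indication of how compact the calculation is, here is a minimal pure-Python sketch of BS.1770 integrated loudness at 48 kHz: two biquads for the K-weighting, mean-square power per channel, a weighted power-domain sum across channels, and two-stage gating over 400 ms blocks with 75% overlap (an absolute gate at -70 LUFS, then a relative gate 10 LU below the level of the blocks that passed it). Per the standard, left, right, and center channels are weighted 1.0, surround channels 1.41, and LFE is left out. Only the 48 kHz filter coefficients are included, the function names are illustrative, and a real meter should be checked against the standard's conformance material.

```python
import math

# K-weighting at fs = 48 kHz: ((b0, b1, b2), (a1, a2)) as published in ITU-R BS.1770
PRE_FILTER = ((1.53512485958697, -2.69169618940638, 1.19839281085285),
              (-1.69065929318241, 0.73248077421585))
RLB_FILTER = ((1.0, -2.0, 1.0),
              (-1.99004745483398, 0.99007225036621))


def biquad(x, coeffs):
    """Second-order IIR section, direct form I, with a0 normalized to 1."""
    (b0, b1, b2), (a1, a2) = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for s in x:
        y = b0 * s + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x1, x2 = s, x1
        y1, y2 = y, y1
        out.append(y)
    return out


def block_loudness(power):
    """Loudness in LUFS (= LKFS) of a channel-weighted mean-square power."""
    return -0.691 + 10 * math.log10(power)


def integrated_loudness(channels, weights, fs=48000):
    """Gated integrated loudness of equal-length channels (lists of floats).

    weights: per-channel gain, e.g. 1.0 for L/R/C, 1.41 for surrounds (omit LFE).
    """
    if fs != 48000:
        raise ValueError("this sketch only carries the 48 kHz coefficients")
    hop = fs // 10  # 100 ms hop, so 400 ms blocks overlap by 75 %
    k_weighted = [biquad(biquad(ch, PRE_FILTER), RLB_FILTER) for ch in channels]
    n_hops = len(k_weighted[0]) // hop
    # Channel-weighted energy of each 100 ms segment, summed in the power domain
    energy = [sum(g * sum(s * s for s in ch[i * hop:(i + 1) * hop])
                  for g, ch in zip(weights, k_weighted))
              for i in range(n_hops)]
    # Mean-square power of each 400 ms gating block (four consecutive segments)
    blocks = [sum(energy[i:i + 4]) / (4 * hop) for i in range(n_hops - 3)]
    # Absolute gate at -70 LUFS
    passed = [p for p in blocks if p > 0 and block_loudness(p) > -70.0]
    if not passed:
        return float("-inf")
    # Relative gate 10 LU below the loudness of the absolutely gated blocks
    relative_gate = block_loudness(sum(passed) / len(passed)) - 10.0
    gated = [p for p in passed if block_loudness(p) > relative_gate]
    return block_loudness(sum(gated) / len(gated))
```

A full-scale 997 Hz sine in a single channel should read about -3.01 LUFS, because the -0.691 constant offsets the K-weighting gain at that frequency. Feeding the same sine to two channels adds 3.01 dB, as expected when channels are summed in the power domain.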
 
Figure 1: Four tracks played with and without Loudness normalization; arrows indicate the annoying level jumps that occur without it.

Loudness Normalization
Loudness normalization is the principle of assigning one static gain value to each track, program or commercial, to minimize level jumps between them during playback. Audio platforms revolve around a specific target Loudness level, not only in broadcast, but also in OTT (Netflix, HBO, etc.), podcast, car audio, gaming, music streaming, and others.

If a streaming platform has chosen a reasonable target level, or if it only ever applies negative normalization gain, no dynamics processing is needed during distribution. Broadcast and OTT providers use a target level of -23 LUFS or lower, while music streaming typically uses a target level in the -12 to -20 LUFS range.

Figure 1 shows the same four music tracks played unchecked and with Loudness normalization, using a target level of -20 LUFS and negative normalization gain only. Studies find that 95% of listeners want to adjust the level if loudness increases by 5dB from one track to the next, or if it decreases by 8dB [5]. The example is nowhere near a worst case, yet with unchecked playback, listener annoyance is still triggered at every transition (black arrows).
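The gain rule and the annoyance thresholds quoted above fit in a few lines. The normalization gain is simply target minus measured loudness, clamped to zero when only negative gain is allowed. The +5dB and -8dB thresholds come from [5]; treating them as inclusive limits, the -20 LUFS default, the function names, and the example loudness values are assumptions for illustration (they are not the tracks in Figure 1).

```python
def normalization_gain(loudness_lufs, target_lufs=-20.0, negative_only=True):
    """Static gain (dB) for a whole track: target minus measured loudness."""
    gain = target_lufs - loudness_lufs
    return min(gain, 0.0) if negative_only else gain


def annoying_transitions(levels, rise_db=5.0, drop_db=8.0):
    """Indices of track changes where most listeners would reach for the volume:
    a loudness rise of rise_db or more, or a drop of drop_db or more."""
    return [i for i in range(1, len(levels))
            if levels[i] - levels[i - 1] >= rise_db
            or levels[i - 1] - levels[i] >= drop_db]


# Hypothetical measured loudness of four consecutive tracks, in LUFS
tracks = [-8.0, -16.0, -7.0, -24.0]
played = [lufs + normalization_gain(lufs) for lufs in tracks]
print("unchecked:", annoying_transitions(tracks))
print("normalized:", played, annoying_transitions(played))
```

Played unchecked, every transition of these hypothetical tracks is flagged. After negative-only normalization, the first three land on -20 LUFS, the quiet fourth stays at -24 LUFS, and the remaining 4dB drop is below the 8dB threshold.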

A target level in the -10 to -14 LUFS range makes Loudness normalization much less helpful, because well-produced tracks tend to either play too softly, or they may be subjected to dubious dynamics processing during playback. In technical terms, if the Peak-to-Loudness Ratio (PLR) of a track is lower than the platform’s headroom (the distance between the target level and the highest permitted peak level), Loudness normalization works as intended, and no further dynamics processing is needed. PLR examples of typical content are shown in Figure 2.
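The headroom condition reduces to simple arithmetic: normalizing a track to the target moves its true peak to target + PLR, so the track passes untouched if its PLR does not exceed the distance between the peak ceiling and the target. The -1 dBTP ceiling below is an assumed value, not a figure from this article, since platforms choose their own; the function names are illustrative.

```python
ASSUMED_CEILING_DBTP = -1.0  # assumed peak ceiling; platforms set their own


def peak_after_normalization(plr_db, target_lufs):
    """Normalizing loudness to target_lufs moves the true peak to target + PLR."""
    return target_lufs + plr_db


def fits_headroom(plr_db, target_lufs, ceiling_dbtp=ASSUMED_CEILING_DBTP):
    """True if the normalized track stays at or below the ceiling without any
    dynamics processing, i.e. PLR <= headroom = ceiling - target."""
    return peak_after_normalization(plr_db, target_lufs) <= ceiling_dbtp


for target in (-14.0, -20.0):
    for plr in (16.0, 7.5):  # acoustic-style vs. heavily squashed track
        print(f"target {target} LUFS, PLR {plr} dB: fits = {fits_headroom(plr, target)}")
```

A track with a PLR of 16dB, typical of acoustic music, would peak at +2 dBTP if raised to -14 LUFS, so it must either be limited or left playing softer than its neighbors; at a -20 LUFS target it fits with room to spare. A squashed track with a PLR of 7.5dB fits even at -14 LUFS.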

Compared to other degradation happening to a track during distribution (e.g., an unknown version, lossy data reduction, peak clipping, dynamics, or other processing), the impact of Loudness normalization, even at the source, is entirely negligible. If Loudness normalization happens late in the chain, it does not affect audio quality at all. Applied in the output amplifier, Loudness normalization actually improves the resolution and dynamic range of the entire system.
 
Figure 2: Peak-to-Loudness Ratio (PLR) of typical content, shown as the height of each blue bar. 2020 pop is the appalling outlier.
Normalization is usually performed track by track or program by program, but two special cases are also prevalent: 1) If a service only carries music, normalization may be set to “album,” where the loudest track of an album decides the normalization gain applied to all tracks of that album; or 2) if a service only carries speech-based content, normalization may be based on the level of a program’s speech alone, disregarding all other sounds.
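The difference between track and album mode can be shown with a hypothetical three-track album measured at -12, -18, and -15 LUFS. In album mode, a single gain derived from the loudest track is applied to all of them, so the level relationships chosen in mastering survive; in track mode, every track lands on the target. The -20 LUFS target and negative-only gain mirror the Figure 1 example, and the function names are illustrative.

```python
def track_gains(loudness, target_lufs=-20.0, negative_only=True):
    """Track mode: each track gets its own gain so it lands on the target."""
    gains = [target_lufs - lufs for lufs in loudness]
    return [min(g, 0.0) for g in gains] if negative_only else gains


def album_gains(loudness, target_lufs=-20.0, negative_only=True):
    """Album mode: the loudest track sets one gain shared by every track."""
    gain = target_lufs - max(loudness)
    if negative_only:
        gain = min(gain, 0.0)
    return [gain] * len(loudness)


album = [-12.0, -18.0, -15.0]  # hypothetical album, LUFS per track
print("track mode:", [lufs + g for lufs, g in zip(album, track_gains(album))])
print("album mode:", [lufs + g for lufs, g in zip(album, album_gains(album))])
```

In album mode the quieter tracks stay 6 and 3 LU below the loudest one, as the artist intended, instead of being pulled up to the same level.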

In a recent European Broadcasting Union (EBU) publication, however, it is recommended not to let speech level drop more than 5 LU below Program Loudness [6]. The speech-only measurement method is also not ideal for leveling promos or commercials, so these are normalized based on Program Loudness.

PLR in Music Streaming
Figure 3 shows the PLR of the most popular music tracks over the past 60 years. PLR in acoustic music, human voice, and music reference tracks generally measures between 14dB and 20dB. Pink noise or a vacuum cleaner has a PLR of around 10dB, so an average PLR of 7.5dB in 2023 is a sign that severe squashing is still widespread in pop music. “As It Was” by Harry Styles took the prize as the worst Top-10 production of 2023, with a PLR of a measly 5.2dB. 
 
Figure 3: Average PLR in pop music over the past 60 years. The graph represents more than 7000 tracks. Adapted from Ortner’s work [2] and extended by the author from 2011 onward, based on the official UK singles charts from each year.
Besides the irony that pop productions recorded with today’s high-resolution converters and DAWs end up with lower PLR and worse audio quality than cassette tapes in 1975, another major issue has also developed in music distribution: we simply no longer know what we are listening to. Streaming services often carry just the latest (read: worst) version of a track, and some squash it even further. Just to be sure it is dead, I presume. At times, the Apple Music Store (AMS) has more versions, but you cannot tell which might be good before buying.

At the 2023 Audio Engineering Society (AES) convention in New York, we played two versions of Toto’s “Africa” from AMS (see Figure 4). HDMI audio in the conference room was unreliable, so we had to make do with MacBook speakers from the podium. Even so, the first eight to 10 rows could easily tell the difference, and everybody preferred the high-PLR (older) version.

George Massenburg, who sat at the podium, spotted the squashed version after a few seconds, even though the tiny MacBook speakers were pointed away from him. The small loudspeakers did not even reveal one of the most annoying side-effects of squashing classic tracks: kick drum and bass notes get stretched in time by a dynamics processor, because the level is already maxed out.

Notice how hyper-compression often affects the low-frequency timing and temporal integrity of a track. Because the audio, imaging, and timing of the music get distorted by hyper-compression, it is dishonest to release new versions of music at high bitrates where PLR is intentionally low. Quality-conscious music streaming services might instead let listeners select tracks based on PLR value, or simply always suggest the version with the highest PLR.
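Suggesting the least squashed version is straightforward once PLR is known, since PLR is the true-peak level minus the integrated loudness. The sketch assumes each version's true peak and loudness have already been measured; the Version class, its labels, and the numbers in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Version:
    """One release of a track, with hypothetical pre-measured values."""
    label: str
    true_peak_dbtp: float
    loudness_lufs: float

    @property
    def plr_db(self) -> float:
        """Peak-to-Loudness Ratio: true-peak level minus integrated loudness."""
        return self.true_peak_dbtp - self.loudness_lufs


def suggest_version(versions):
    """Suggest the least squashed release: the one with the highest PLR."""
    return max(versions, key=lambda v: v.plr_db)


versions = [Version("remaster", 0.0, -7.0), Version("original", -3.0, -18.5)]
for v in versions:
    print(f"{v.label}: PLR {v.plr_db:.1f} dB")
print("suggested:", suggest_version(versions).label)
```

With these made-up numbers, the remaster has a PLR of 7dB, below even pink noise, while the original measures 15.5dB and is the one suggested.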

To check your own collection, MusicTester measures and auditions tracks at high audio quality, completely free. It is available at http://musictester.net/demo. Enable Loudness normalization to hear the real difference between tracks, or between versions of the same track.
 
Figure 4: MusicTester showing the five measurements indicated per track. It also plays linear or lossy files.

Loudness and Personal Players
Health standards to prevent hearing loss from the use of personal players are transitioning from electro-acoustic gain-capping to sound exposure estimation. As a result, the damage potential of content is now determined by sound energy rather than by the position of the “volume” control. If you are playing quiet or high-PLR songs, the control can be turned up higher than if the songs are hyper-compressed [7].

Loudness metrics are based on sound power, which is a component of sound exposure. Besides satisfying users by reducing level jumps between tracks, Loudness normalization keeps sound exposure in check and helps the user select a sustainable “volume” setting (e.g., for an 8-hour flight).
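Sound exposure estimation rests on the equal-energy principle: exposure is the squared A-weighted sound pressure multiplied by time, so every 3 dB increase halves the permissible listening time. The sketch below assumes a weekly allowance of 1.6 Pa²h, which corresponds to 80 dB(A) for 40 hours and is, to our understanding, the default adult limit in ITU-T H.870; check the current standards before relying on that figure. The names are illustrative.

```python
import math

P_REF_PA = 20e-6  # reference sound pressure (0 dB SPL)
# Assumed weekly allowance: 1.6 Pa^2*h, i.e. 80 dB(A) for 40 hours
WEEKLY_ALLOWANCE_PA2H = 1.6


def exposure_pa2h(level_dba, hours):
    """Equal-energy sound exposure: squared A-weighted pressure times duration."""
    pressure_pa = P_REF_PA * 10 ** (level_dba / 20)
    return pressure_pa ** 2 * hours


def max_steady_level_dba(hours, allowance=WEEKLY_ALLOWANCE_PA2H):
    """Highest steady level that would use exactly `allowance` in `hours`."""
    return 10 * math.log10(allowance / (hours * P_REF_PA ** 2))


print(f"40 h: {max_steady_level_dba(40):.1f} dB(A)")
print(f"8 h flight: {max_steady_level_dba(8):.1f} dB(A)")
print(f"10 dB louder uses {exposure_pa2h(90, 1) / exposure_pa2h(80, 1):.0f}x the dose")
```

Under this assumed allowance, a steady level of about 87 dB(A) for an 8-hour flight would use up the whole week's budget, and a track playing 10 dB louder at the same “volume” setting spends it ten times faster.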

Every personal player sold in Europe is mandated to perform sound exposure estimation, and to warn the user about potential danger [8]. Consequently, there is no place for squashed music or commercials to hide, even if upstream Loudness normalization is mediocre or non-existent.

Furthermore, the International Electrotechnical Commission (IEC), International Telecommunication Union (ITU), and World Health Organization (WHO) recommend the same methodology and the same sound exposure limits globally. In essence, this protects hearing health while reducing the risk of excellent recordings being destroyed further. Harry Styles can’t be saved, but future artists might be. aX


Resources
[1] S. H. Nielsen and T. Lund, “Overload in Signal Conversion,” Audio Engineering Society (AES) 23rd International Conference, Elsinore, Denmark (2003).
[2] R. M. Ortner, “Je lauter desto bumm! – The Evolution of Loud,” Master’s Thesis, Donau-Universität Krems, Austria (2012).
[3] G. Soulodre, “Evaluation of Objective Loudness Meters,” Audio Engineering Society (AES) 116th International Convention, Berlin, Germany (2004).
[4] E. Skovenborg and S. H. Nielsen, “Evaluation of Different Loudness Models with Music and Speech Material,” Audio Engineering Society (AES) 117th International Convention, San Francisco, CA (2004).
[5] E. Skovenborg and T. Lund, “Loudness Descriptors to Characterize Wide Loudness-Range Material,” Audio Engineering Society (AES) 127th International Convention, New York, NY (2009).
[6] F. Camerer et al., “Loudness Normalization of Cinematic Content,” EBU Recommendation R128 s4, Geneva, Switzerland (2023).
[7] T. Lund, “Prevention of Hearing Loss From the Use of Personal Music Players,” Audio Engineering Society (AES) 58th International Conference, Aalborg, Denmark (2015).
[8] EN/IEC 62368-1, “Requirements for Personal Music Players, Safety Standard,” Bruxelles, Belgium (2023).

Audio Engineering Society Audio Loudness Web Portal

Related Standards and Recommendations
AES TD1008: Recommendations for Loudness of Internet Audio Streaming and On-Demand Distribution (2021).
ATSC A/85: Techniques for Establishing and Maintaining Audio Loudness for Digital Television (2013).
ANSI/CTA-2075: Loudness Standard for Over the Top Television and Online Video Distribution for Mobile and Fixed Devices (2020).
EBU R128: Loudness Normalisation and Permitted Maximum Level of Audio Signals. Recommendation (2023).
EN/IEC 62368-1: Audio/video, information and communication technology equipment. Safety requirements (2023).
ITU-R BS.1770: Algorithms to measure audio programme loudness and true-peak audio level (2023).
ITU-R BS.1771: Requirements for loudness and true-peak indicating meters (2012).
ITU-T H.870: Guidelines for safe listening devices/systems (2022).

This article was originally published in audioXpress, July 2024.
About Thomas Lund
Thomas Lund is the author of papers on human perception, envelopment, loudness, sound exposure, and true-peak level. He contributes to audio standardization, is a researcher at Genelec OY, and convenor of the working group TC108X/WG3 under the European Commiss...
