There are many challenges when designing audio products for hostile environments such as the outdoors, confined spaces and noisy backgrounds. For these applications, the two most important areas to focus on are quality sound playback and enhanced voice pickup. Each of these categories involves multiple factors to consider for your particular application, so let’s look at them more closely.
Sound Playback
Sound playback is the audio that comes out of your device. The factors that compromise sound quality differ for each application; for example, a confined space will have different problems from an open one. Here are the areas you should consider:
Lack of bass. If your product is going to be used outdoors, there are no room boundaries to reinforce low frequencies, and you have to account for the difference in directionality between bass and treble: bass radiates in nearly all directions, while treble is increasingly concentrated along the driver’s axis. Making sure your users hear a balanced frequency spectrum requires careful placement of the drivers and use of nearby reflective surfaces. To accomplish this, consider a full-range driver facing toward the intended listener, or, if reflective surfaces are known and constant, position one of the drivers to reflect off that surface toward the intended listener. Placing the product near walls or in a corner typically increases bass output.
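To put rough numbers on the bass/treble directionality and on the bass boost that nearby surfaces provide, the sketch below uses two textbook approximations: a driver behaves like a rigid circular piston that starts to beam above the frequency where ka = 1 (k is the wavenumber, a the radius), and each halving of the solid angle a small source radiates into adds about 6 dB at low frequencies. Both are idealizations; the driver sizes are hypothetical examples, and treating half the nominal diameter as the radiating radius is a simplification.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 °C

def beaming_frequency_hz(radius_m):
    """Frequency at which ka = 1 for a rigid circular piston of radius a.
    Well below it the driver radiates almost equally in all directions;
    well above it the output narrows toward the driver's axis."""
    return SPEED_OF_SOUND / (2 * math.pi * radius_m)

def boundary_gain_db(solid_angle_sr):
    """Idealized low-frequency SPL gain relative to free space (4*pi sr)
    when a small source radiates into a smaller solid angle.
    Valid only when the boundaries are much closer than a wavelength."""
    return 20 * math.log10(4 * math.pi / solid_angle_sr)

if __name__ == "__main__":
    # Hypothetical driver sizes; radius taken as half the nominal diameter
    for name, radius in (("50 mm driver", 0.025), ("165 mm driver", 0.0825)):
        print(f"{name}: starts to beam above ~{beaming_frequency_hz(radius):.0f} Hz")
    placements = (
        ("outdoors, free field", 4 * math.pi),
        ("on the ground or against one wall", 2 * math.pi),
        ("floor plus one wall", math.pi),
        ("room corner, three surfaces", math.pi / 2),
    )
    for name, omega in placements:
        print(f"{name}: +{boundary_gain_db(omega):.1f} dB")
```

In this idealized model, a product used outdoors gives up the roughly 6 to 18 dB of low-frequency reinforcement that floors, walls and corners can provide, which is one reason bass sounds thin outside.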
Loss of treble clarity. Suppose you want to put a speaker inside a couch. The limited space or non-optimal driver direction can cause a loss of treble, which reduces voice clarity and makes music sound dull. To solve this, try adding a high-frequency driver or boosting the high frequencies with a digital signal processor (DSP).
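One way to add high-frequency energy with a DSP is a high-shelf filter. The sketch below computes a second-order (biquad) high shelf using the widely used formulas from Robert Bristow-Johnson’s Audio EQ Cookbook and applies it in Direct Form I. The 4 kHz corner, +6 dB boost and 48 kHz sample rate are hypothetical placeholders; real values would come from measuring the installed speaker.

```python
import cmath
import math

def high_shelf_coeffs(fs, f0, gain_db, slope=1.0):
    """RBJ Audio EQ Cookbook high-shelf biquad, normalized so a[0] == 1."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / 2 * math.sqrt((amp + 1 / amp) * (1 / slope - 1) + 2)
    k = 2 * math.sqrt(amp) * alpha
    b0 = amp * ((amp + 1) + (amp - 1) * cos_w0 + k)
    b1 = -2 * amp * ((amp - 1) + (amp + 1) * cos_w0)
    b2 = amp * ((amp + 1) + (amp - 1) * cos_w0 - k)
    a0 = (amp + 1) - (amp - 1) * cos_w0 + k
    a1 = 2 * ((amp - 1) - (amp + 1) * cos_w0)
    a2 = (amp + 1) - (amp - 1) * cos_w0 - k
    return [b0 / a0, b1 / a0, b2 / a0], [1.0, a1 / a0, a2 / a0]

def magnitude_db(b, a, fs, f):
    """Gain of the biquad in dB at frequency f (Hz)."""
    z_inv = cmath.exp(-2j * math.pi * f / fs)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20 * math.log10(abs(num / den))

def biquad(b, a, samples):
    """Filter a sequence of samples (Direct Form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

if __name__ == "__main__":
    fs = 48_000
    b, a = high_shelf_coeffs(fs, f0=4000.0, gain_db=6.0)  # hypothetical tuning
    for f in (100, 1000, 4000, 10000, 20000):
        print(f"{f:6d} Hz: {magnitude_db(b, a, fs, f):+.2f} dB")
    step = biquad(b, a, [1.0] * 1000)
    print(f"settled DC gain: {step[-1]:.4f}")  # low frequencies pass unchanged
```

With these settings the response stays flat at low frequencies, reaches +3 dB at the 4 kHz corner (the cookbook shelf’s midpoint) and approaches +6 dB toward the top of the band, so voice and music regain presence without changing the bass.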
Distortion in highly vibrating systems. Any audio system can produce mechanical vibrations that add unwanted sounds. As system power increases, these vibrations need to be addressed with structural methods, and damping materials can also reduce them. Another method is to use digital signal processing to remove the offending frequencies from the incoming signal.
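For the DSP approach, one common tool is a narrow notch filter centered on the offending frequency, so that the drive signal excites the resonance less strongly. This sketch again uses the Audio EQ Cookbook biquad form; the 180 Hz resonance and Q of 10 are hypothetical values that would normally come from a vibration or rattle measurement.

```python
import cmath
import math

def notch_coeffs(fs, f0, q):
    """RBJ Audio EQ Cookbook notch biquad, normalized so a[0] == 1."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b = [1 / a0, -2 * cos_w0 / a0, 1 / a0]
    a = [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]
    return b, a

def magnitude(b, a, fs, f):
    """Linear gain of the biquad at frequency f (Hz)."""
    z_inv = cmath.exp(-2j * math.pi * f / fs)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return abs(num / den)

if __name__ == "__main__":
    fs = 48_000
    b, a = notch_coeffs(fs, f0=180.0, q=10.0)  # hypothetical cabinet resonance
    for f in (100, 170, 180, 190, 1000):
        print(f"{f:5d} Hz: gain {magnitude(b, a, fs, f):.3f}")
```

A notch only reduces how strongly the signal excites a resonance; rattles from loose parts still need the structural and damping fixes described above.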
Portability. Increases in the energy and power density of modern batteries have improved the performance of portable playback devices. New speaker drivers have been developed to handle the higher available power. Digital (Class-D) amplifiers have also greatly increased power output without the need for heatsinks, because their high efficiency wastes little power as heat. These two technology advancements have enabled the current generation of portable devices to play louder and longer.
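The efficiency argument can be made concrete with simple arithmetic: heat dissipated is P_out × (1/η - 1), and if the amplifier were the only load, runtime is battery energy divided by input power. The efficiencies used here (50% and 90%) are illustrative assumptions for a linear and a Class-D amplifier at high output, not measured values, and the 20 Wh battery and 10 W output are hypothetical.

```python
def heat_w(output_w, efficiency):
    """Power wasted as heat: input power minus delivered output power."""
    return output_w / efficiency - output_w

def runtime_h(battery_wh, output_w, efficiency):
    """Idealized runtime with the amplifier as the only load at constant output."""
    return battery_wh / (output_w / efficiency)

if __name__ == "__main__":
    battery_wh, output_w = 20.0, 10.0  # hypothetical product
    for label, eff in (("linear (assumed 50%)", 0.5), ("Class-D (assumed 90%)", 0.9)):
        print(f"{label}: {heat_w(output_w, eff):.1f} W of heat, "
              f"{runtime_h(battery_wh, output_w, eff):.1f} h at full output")
```

At the same output, the higher-efficiency amplifier in this example dissipates about a ninth of the heat (1.1 W instead of 10 W), which is why a dedicated heatsink often becomes unnecessary, and it runs about 80% longer on the same battery.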
Noise suppression. With the addition of a microphone, the playback system can also perform noise suppression. The microphone captures the environmental noise, an Active Noise Cancellation (ANC) algorithm computes an “anti-noise” signal from it, and the speaker plays this out-of-phase signal to cancel portions of the environmental noise.
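The out-of-phase principle also shows why ANC works best at low frequencies. If the anti-noise is an exact inverted copy but arrives τ seconds late, a tone at frequency f is reduced to 2|sin(πfτ)| of its original amplitude, because sin(ωt) - sin(ω(t - τ)) has amplitude 2|sin(ωτ/2)|. The 50 µs total latency is a hypothetical figure and perfect amplitude matching is assumed; practical systems also have to model the acoustic path from speaker to listener.

```python
import math

def anc_residual_db(freq_hz, latency_s):
    """Level of noise plus a late inverted copy, relative to the noise alone.
    sin(wt) - sin(w(t - tau)) has amplitude 2*|sin(w*tau/2)| = 2*|sin(pi*f*tau)|."""
    residual = 2 * abs(math.sin(math.pi * freq_hz * latency_s))
    return 20 * math.log10(residual) if residual > 0 else float("-inf")

if __name__ == "__main__":
    latency = 50e-6  # hypothetical end-to-end processing plus acoustic delay
    for f in (50, 100, 300, 1000, 3000, 5000):
        print(f"{f:5d} Hz: {anc_residual_db(f, latency):+6.1f} dB")
```

With this latency a 100 Hz tone drops by about 30 dB but a 1 kHz tone only by about 10 dB, and above roughly 3.3 kHz (where πfτ exceeds π/6) the anti-noise actually adds energy.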
Voice Pickup (Communication & Smart Assistant Commands)
In hostile sound environments, it can also be difficult for voice agents to pick up commands. Sounds that human listeners naturally tune out, such as wind and background noise, are extremely difficult for voice agents to ignore. Here are some key areas to consider when building your smart product:
Single or multiple interfering audio sources. When was the last time you thought about background noise? The human brain is the best audio processor ever developed, and it filters most of that noise out without our noticing. Voice assistants do not have the luxury of a “brain” with built-in audio processing. If you analyze a full-frequency recording of a “quiet” modern environment, you will see a large amount of energy across the entire spectrum. This energy needs to be removed before any speech algorithm can successfully process the audio, and many techniques have been developed to pre-process the signal or remove the unwanted energy:
Beamforming: The use of two or more microphones to form a beam or a sensitive direction while reducing the sensitivity in other directions.
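Blind Source Separation: Separating a mixture of sounds into its individual sources using only the microphone signals, without prior knowledge of the sources or of how they were mixed; for example, isolating a talker from HVAC noise. A simpler technique is frequency filtering: passing the frequency range of the human voice while rejecting energy outside it, such as low-frequency HVAC rumble.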
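A minimal delay-and-sum beamformer, the simplest form of beamforming, illustrates the idea: each microphone’s signal is delayed so that sound from the chosen (“steered”) direction lines up across the array before summing, while sound from other directions adds with mismatched phase. The two-microphone, 5 cm array and the 2 kHz test tone are hypothetical, a far-field (plane-wave) source is assumed, and the function returns the array’s normalized response at a single frequency rather than processing audio.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s

def delay_and_sum_gain(freq_hz, source_deg, steer_deg, num_mics=2, spacing_m=0.05):
    """Normalized response (1.0 = full sensitivity) of a uniform linear array.
    Angles are measured from broadside (perpendicular to the array)."""
    omega = 2 * math.pi * freq_hz
    total = 0j
    for m in range(num_mics):
        pos = m * spacing_m
        arrival = pos * math.sin(math.radians(source_deg)) / SPEED_OF_SOUND
        steering = pos * math.sin(math.radians(steer_deg)) / SPEED_OF_SOUND
        # After the steering delay, the remaining phase depends on the mismatch
        total += cmath.exp(-1j * omega * (arrival - steering))
    return abs(total) / num_mics

if __name__ == "__main__":
    for angle in (0, 30, 60, 90):
        g = delay_and_sum_gain(2000.0, angle, steer_deg=0.0)
        print(f"source at {angle:2d} deg: {20 * math.log10(g):+.1f} dB")
```

With only two closely spaced microphones the rejection is modest (about 4 dB at 90° for this tone, and less at lower frequencies). Adding microphones, widening the spacing within the limit set by spatial aliasing (under half a wavelength at the highest frequency of interest) or using adaptive beamformers increases it.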
Reverb signals. Another issue for voice processing is voice reflections from the environment. Once again, the human brain naturally suppresses most of these echoes, but a voice assistant has to be programmed to remove what is called the “long tail” of reverberant energy in the audio signal. There are many DSP algorithms for removing reverb energy from the microphone signal.
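How long is the “long tail”? A classic first estimate is Sabine’s reverberation formula, RT60 ≈ 0.161·V/A, the time for reverberant energy to decay by 60 dB, where V is the room volume in m³ and A is the total absorption (each surface area times its absorption coefficient) in m². This sketch does not remove reverb; it only sizes the problem. The room dimensions and absorption coefficients are hypothetical.

```python
def sabine_rt60_s(volume_m3, surfaces):
    """Sabine estimate of RT60 in seconds.
    surfaces: iterable of (area_m2, absorption_coefficient) pairs."""
    absorption_m2 = sum(area * coeff for area, coeff in surfaces)
    return 0.161 * volume_m3 / absorption_m2

def tail_level_db(t_s, rt60_s):
    """Reverberant level t seconds after the sound stops (exponential decay)."""
    return -60.0 * t_s / rt60_s

if __name__ == "__main__":
    # Hypothetical 5 m x 4 m x 2.5 m living room
    surfaces = [
        (20.0, 0.10),  # floor
        (20.0, 0.30),  # ceiling
        (45.0, 0.05),  # four walls (18 m perimeter x 2.5 m)
    ]
    rt60 = sabine_rt60_s(5 * 4 * 2.5, surfaces)
    print(f"RT60 ~ {rt60:.2f} s")
    print(f"Tail 200 ms after a word ends: {tail_level_db(0.2, rt60):.1f} dB")
```

In this example a word’s reverberation is still only about 15 dB down 200 ms after it ends, long enough to overlap the following syllables; that smearing is what dereverberation algorithms aim to reduce.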
Acoustic Echo Cancellation. Voice processing must also contend with sound the device itself generates. For example, a smart speaker could be playing music, which can saturate the device’s own microphone and drown out the user’s voice. DSP can handle this by using the original audio signal as a reference, estimating how it reaches the microphone, and subtracting that estimated echo from the microphone signal.
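The reference-and-subtract approach is usually implemented with an adaptive filter, because the echo that reaches the microphone is a filtered (delayed, colored) version of the playback signal rather than an exact copy. The sketch below uses the normalized least-mean-squares (NLMS) algorithm, a common choice for this, to learn the echo path and subtract the estimated echo. The three-tap echo path, filter length and step size are hypothetical, and the simulation includes only low-level microphone noise with no near-end talker, so it shows the easy case (double-talk handling is omitted).

```python
import math
import random

def nlms_aec(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Cancel the echo of `far_end` from `mic`; returns (residual, weights)."""
    w = [0.0] * taps   # adaptive estimate of the loudspeaker-to-mic path
    x = [0.0] * taps   # recent far-end samples, newest first
    residual = []
    for ref, d in zip(far_end, mic):
        x = [ref] + x[:-1]
        echo_est = sum(wi * xi for wi, xi in zip(w, x))
        e = d - echo_est                      # mic minus estimated echo
        step = mu * e / (eps + sum(xi * xi for xi in x))
        w = [wi + step * xi for wi, xi in zip(w, x)]
        residual.append(e)
    return residual, w

def simulate(n=4000, seed=1):
    rng = random.Random(seed)
    echo_path = [0.6, -0.3, 0.1]  # hypothetical room/enclosure response
    far_end = [rng.gauss(0.0, 1.0) for _ in range(n)]  # stand-in for music
    mic = [sum(h * far_end[i - k] for k, h in enumerate(echo_path) if i >= k)
           + rng.gauss(0.0, 1e-3)  # low-level microphone noise
           for i in range(n)]
    return far_end, mic

if __name__ == "__main__":
    far_end, mic = simulate()
    residual, w = nlms_aec(far_end, mic)
    tail = slice(-1000, None)
    erle = 10 * math.log10(sum(v * v for v in mic[tail]) /
                           sum(v * v for v in residual[tail]))
    print(f"echo return loss enhancement: {erle:.1f} dB")
    print("learned path:", [round(v, 3) for v in w[:4]])
```

Normalizing the step by the reference signal’s energy keeps adaptation speed roughly independent of playback volume, which matters when the same device plays both quiet and loud content.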
Whether you’re designing a product for the outdoors, a noisy space, a shower or a moving vehicle, correct placement of microphones and precise tuning of your voice algorithms can help provide accurate voice capture. As with any complex system, the overall solution is only as strong as its weakest component.
Bruce Ryan, Director of Engineering, Harman Embedded Audio
This article was originally published in The Audio Voice newsletter (#344), September 2021.