"Sonos is committed to delivering new experiences that effortlessly connect listeners to the content they love,” says Joseph Dureau, Vice President, Voice Experience, Sonos. "One of the most natural ways to connect to your music is with your voice, but when we speak to our customers, we hear that privacy concerns mean many are choosing not to use voice control. Created purely for listening on Sonos and designed with privacy at its core, Sonos Voice control delivers the Sonos app experience using only your voice."
Sonos Voice Control works on every voice-capable Sonos speaker, processing requests entirely on the device. No audio or transcript is sent to the cloud, stored, listened to or read by anyone. Available on new voice-capable products and as a free update for existing users, Sonos Voice Control is compatible with Sonos Radio, Apple Music, Amazon Music, Deezer, and Pandora at launch, with more services and markets to follow.
According to the company, the on-device voice control system was optimized to find the music users want to listen to. "Local processing delivers faster response times, and effortless follow-ups. All you need is one “Hey Sonos” and you can follow up without the need for additional wake-words. Just like the Sonos app, you can control music and speakers in any room, easily move music around the home, save and like your favorite songs to your personal music library and more," the company explains.
The on-device voice control development effort for Sonos speakers was led by Alice Coucke, Head of Machine Learning Research, Voice Experience; Joseph Dureau, Vice President, Voice Experience; David Leroy, Director, Voice Experience Machine Learning; and Sébastien Maury, Senior Director, Voice Experience for Sonos. In a detailed article about the effort, the team explains how they addressed the multiple challenges of speech understanding, including speech enhancement, wake word detection, automatic speech recognition, and natural language understanding. The core development took place in Paris, at Sonos' first product and engineering site in Europe, which resulted from the 2019 acquisition of Snips, a company that specialized in local, offline embedded voice assistants and interfaces.
Sonos confirms that the team overcame significant, long-identified challenges in voice control applications that go beyond the variability of far-field voice capture conditions, language variations, and the training of models suited to running effectively on-device. As the team explains, one of the key challenges in performing speech recognition directly on Sonos speakers was running all the operations needed to process user voice requests in real time, using the computing and memory resources dedicated to Sonos Voice Control.
"We are very proud to have been able to deploy it on every model of voice-enabled Sonos speakers ever sold, including the most limited Sonos One Gen 1 with only a fraction of a Cortex-A9 CPU with 1GB of RAM available. It required handcrafted software optimizations for each target speaker to make sure we maximize the compute resource usage. This challenge has been achieved by optimizing a trade-off between accuracy and computational efficiency when designing the acoustic model, and by contextualizing the language model and the natural language understanding component to the music domain, in order both to reduce their size and increase their in-domain accuracy.
"We also developed our own deep neural network inference library to better control the inference process, especially matrix multiplication operations, to improve real time performance. We managed to keep the latency of Sonos Voice Control on par with cloud-based voice assistants, despite significant constraints on the hardware environment."
And of course, the team had to overcome the specific challenges that derive from music catalog management. While the phrasings and formulations used to request music are limited in variety, this use case is still very complex and involves an immense vocabulary of constantly evolving multilingual music entities. To make sure that common content is correctly decoded, regardless of the language of origin of a given song name, each music entity is processed through a curation procedure.
"On top of a common catalog of the most popular music entities, we also locally include each user’s personal library in their own voice engine. This personal content is favored over any other content in the resolution strategy in order to maximize the content coverage for everyone. The resulting music catalog is updated regularly."
Finally, the machine learning models were tuned to adapt to each Sonos speaker’s acoustic properties, from portable all-in-one players to home-theater soundbars, and to the acoustic environments typical of a Sonos user’s home: the command being given several meters away from the speaker, with external background noise (e.g. conversations or TV) or music played by the speaker itself (self-sound).
"To do so without large scale personal data collection, we rely heavily on scripted audio collection and simulations of large numbers of acoustic environments to artificially generate data representative of the production environment. These simulations are based on randomly generated virtual acoustic rooms with random speaker locations, sound pressure level adjustment, and addition of external noise and self-sound with controlled intensity," they detail.
The new Sonos Voice Control is therefore able to understand the nuance of human communication and respond to natural commands like “turn it up!”. And after a careful search, Sonos chose award-winning actor Giancarlo Esposito, best known for his roles in Breaking Bad, Better Call Saul and The Mandalorian, to deliver a familiar voice for US customers. With careful recording, advanced processing and mastering, the voice is natural and unobtrusive, yet confident and engaging. Sonos’ first voice will be joined by others over time as Sonos continues to expand the experience to new people and places.
Sonos Voice Control will be available in the US starting June 1, and in France later in 2022, with additional markets to follow.
New Sonos Ray Soundbar
Together with the new voice control announcement, Sonos also extended its home theater line with an all-new compact and affordable soundbar, alongside three fresh new colors for its popular ultraportable Bluetooth smart speaker, the Sonos Roam. Ray will be available globally on June 7 for $279, while Roam’s three new colors are out beginning today, for $179 each.
“Homes have become movie theaters, fitness studios, gaming hubs and so much more, all supported by a streaming era that is no longer exclusive to just TV, music and film,” says Patrick Spence, CEO of Sonos. “Ray makes it easier than ever to enhance those listening experiences, thanks to its smaller size and impressive sound.”
While affordable, the new Ray soundbar features acoustic innovations that enable it to deliver balanced sound, crisp dialogue and solid bass. Using custom-designed waveguides to project sound from wall to wall, and advanced processing, the soundbar accurately positions elements throughout the room so the user feels at the center of the story. A new bass reflex system with a proprietary design delivers convincing bass, while custom acoustics precisely harmonize mid and high-range frequencies.
Like all Sonos speakers, Ray was tuned with the input of the Sonos Soundboard, and users can tune the sound even further with Trueplay to create the ideal listening experience for any room. Sonos’ Speech Enhancement ensures even greater clarity, while Night Sound reduces the intensity of loud effects in order not to disturb anyone else at home.
The new soundbar is compatible with all streaming services, and users can easily control Ray with an existing TV remote, the Sonos app, Apple AirPlay 2 and more. Adding a pair of Sonos Ones to the home theater setup enables surround sound, while connecting any Sonos speaker enables multi-room listening.
www.sonos.com