NXP Speeds Development for Voice Applications with Voice Intelligent Technology and New Training Tools

April 7 2022, 00:50

NXP Semiconductors has announced Voice Intelligent Technology (VIT), a comprehensive, local voice control software package with online training tools, free for developers and manufacturers regardless of end-application production volumes. Based on deep learning, VIT is a ready-to-use library that provides a far-field audio front end supporting up to three microphones, an always-on wake word engine, and a voice command engine. NXP’s free online tools support custom wake words and voice commands through simple text entry, with no need for voice recordings.

The latest addition to NXP's voice enablement portfolio, VIT is a comprehensive voice control software solution available as a ready-to-use library in the MCUXpresso Software Development Kit (SDK). NXP launched the VIT solution to inspire developers to invent new applications for local, low-cost, on-device voice control. The software allows developers to easily train their own commands without the specialty tools or audio recordings required by other solutions. Because VIT software is royalty-free, it can be scaled to mass production in edge device applications at no cost to developers.

Implementing reliable, on-device voice control can be extremely challenging. Developers must not only select the appropriate hardware but also navigate complicated speech processing software. This frequently means managing an audio front-end beamformer alongside separate wake-word and voice command engines, often from different software vendors.

Based on state-of-the-art deep learning and speech recognition technologies, NXP's VIT software provides a complete far-field audio front end (AFE) supporting up to three microphones, as well as speech processing software, which includes an AFE beamformer, a separate always-on wake-word engine and a voice command engine, along with online tools to generate custom wake word and voice command models.

VIT uses state-of-the-art deep learning technology to help developers create and program voice command vocabularies. The VIT tool maps user-entered text commands to phoneme sequences and generates a downloadable model file for the target device software. Speech commands are processed using ML and deep learning technologies to create the neural network model.
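As a rough illustration of the text-to-phoneme step described above, the following Python sketch maps typed commands to phoneme sequences. It is not NXP's tool or model format: the `TOY_LEXICON` table, the ARPAbet-style phoneme symbols, and the function names `command_to_phonemes` and `build_vocabulary` are all hypothetical, chosen only to show the general idea of turning text entry into phoneme sequences without voice recordings.

```python
# Hypothetical illustration only: this is NOT NXP's VIT tool or its model format.
# A tiny hand-written lexicon stands in for a real pronunciation dictionary.
TOY_LEXICON = {
    "hey": ["HH", "EY"],
    "turn": ["T", "ER", "N"],
    "on": ["AA", "N"],
    "off": ["AO", "F"],
    "lights": ["L", "AY", "T", "S"],
}


def command_to_phonemes(command: str) -> list[str]:
    """Map a typed command to a flat phoneme sequence, word by word."""
    phonemes = []
    for word in command.lower().split():
        if word not in TOY_LEXICON:
            # A real tool would fall back to grapheme-to-phoneme rules here.
            raise KeyError(f"no pronunciation for {word!r} in toy lexicon")
        phonemes.extend(TOY_LEXICON[word])
    return phonemes


def build_vocabulary(commands: list[str]) -> dict[str, list[str]]:
    """Build a command vocabulary: each command text keyed to its phonemes."""
    return {command: command_to_phonemes(command) for command in commands}


vocabulary = build_vocabulary(["turn on lights", "turn off lights"])
for text, phonemes in vocabulary.items():
    print(text, "->", " ".join(phonemes))
```

In a production tool, a table like this would be only the first stage; the phoneme sequences would then be used to generate the model file that is downloaded to the target device.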

The VIT far-field AFE supports different microphone topologies with no tuning required, as well as local voice command recognition with on-device processing. With VIT’s text-to-model approach, it’s easy to make custom versions of wake words and commands.

The VIT software is available on several popular NXP i.MX edge processing platforms based on Arm Cortex-M7 and M33, Cadence Xtensa HiFi 4 and Fusion F1 cores. It is currently supported on i.MX RT500 MCUs with M33, DSP and GPU cores; i.MX RT600 MCUs with M33 and DSP cores; i.MX RT1060 MCUs with an M7 core; i.MX RT1160 MCUs with M7 and M4 cores; and i.MX RT1170 MCUs running at up to 1 GHz with M7 and M4 cores. Support for other products is planned for the future.

“Voice is the interface of choice for many smart technologies, including those in smart homes, smart cities and smart factories. By reducing the complexity of voice application development, we’ve made it easier and faster to bring new, on-device voice control to market,” says Joe Yu, Vice President and General Manager, IoT Edge Processing Product Line, NXP.

The VIT library is delivered ready to use in the MCUXpresso SDK, and the online training tool is available from NXP.
www.nxp.com