The Impact of MEMS Speakers in Audio

February 4 2020, 05:10
The consumer electronics industry at large has been able to digitize and shrink most device components and electronics. One of the last remaining barriers is the speaker, which remains comparatively heavy, bulky, and restrictive. This month, we explore new microspeaker technologies and the challenges they face in transforming the consumer electronics industry the way microelectromechanical systems (MEMS) did when they displaced electret condenser microphones (ECMs). MEMS is the technology of very small devices, usually consisting of a micro-sensor/transducer and an application-specific integrated circuit (ASIC) that processes data or, in the case of MEMS speakers, handles amplification and codec functions.
Three examples of MEMS speaker chips: the Audio Pixels speaker chip, the USound MEMS speaker, and the new Fraunhofer MEMS speaker chip.

Back in January 2015, I gave a talk on MEMS loudspeakers at the Association of Loudspeaker Manufacturing & Acoustics (ALMA) International’s Symposium and Expo (AISE) and followed that discussion with an article in the pages of Voice Coil. Five years later, the time is ripe for an updated survey of MEMS audio technology: what is just now reaching the market and what is coming. This month’s focus centers on microspeakers and earphones, with MEMS microphones coming up next.

We will examine MEMS devices in general, provide a basic explanation of how the various types of MEMS speakers work, and look at the commercialization challenges ahead. As the first MEMS speakers are now becoming viable commercial products, we also need to consider their practical applications, unit costs, and acoustical strengths and weaknesses.

So now it is reality check time: could MEMS loudspeakers signal the end of speakers as we know them? As MEMS speakers are just starting to evolve into viable commercial products, what might be the impact on the speaker industry, practical applications, projected unit costs, and acoustical strengths and weaknesses? For decades, MEMS microphones were all show and no go. Yet progressively over the last few years, they have come to dominate the mobile audio device market. On the other hand, ATCO’s hypersonic array speaker was also supposed to take over the industry, but that was more of a stock market exercise, and today it is only a boutique application.

NXT’s distributed mode loudspeakers (DML) also supposedly signaled the “end of the world for existing speaker technology,” but even 20 years later this flat-panel topology’s current proponents are still fighting for market share. The Tymphany linear array transducers (LATs) of a decade ago showed yet another path to sound reproduction. Currently, Tymphany is successfully producing conventional but well-executed speaker designs and no longer offers LATs.

With these past “not-quite game changers” still fresh in our memories, how might MEMS speakers fit into the speaker industry? It is clear that speakers are going to be a tough application for MEMS technology as speakers need piston area and excursion to move air. The verdict is MEMS speakers might take some time to come to fruition.
This is a cross-section of an electrostatic IEM tweeter developed by Sonion, which designs and manufactures cutting-edge audio components and provides complete solutions to its customers who then manufacture hearing aids, in-ear earphones and hearables/wearables.

A Bit of MEMS History
MEMS development over the last three decades has been slow and painful. The semiconductor industry’s favorite joke regarding MEMS development roadmaps is that they are calculated in dog years (seven times that of human years). However, MEMS devices became practical when they could be manufactured with high yields using integrated circuit (IC) fabrication and device packaging processes.

MEMS devices include microphones, accelerometers, vibration/shock sensors (e.g., burglar alarms and airbag sensors), gyros and now microspeakers and earphone transducers. The implementation of MEMS speakers is daunting compared to mics due to the far higher excursion requirements. Yet even the promise of MEMS microphones was slow to be achieved, with many development teams in the 1990s eventually giving up. Venture capital investments in MEMS mic startups rarely reached successful outcomes as the investors just did not have the staying power to keep pouring funds into research and process control.

There are quite a few steps in MEMS fabrication, and getting high yields on every step always seemed to be another development phase away. It took more than 20 years to produce the first billion MEMS microphones and two years to produce the second billion; production now runs at about 1 billion units per month. Today, MEMS microphones totally dominate smartphones, tablets, laptops, portable media players, speech recognition systems, personal computers, surveillance cameras, 3D cameras, radars, anti-theft alarms, headphones, smart speakers, music recorders, and various smart home voice command appliances, including air conditioners, refrigerators, and service robots.

Back to the MEMS Microspeaker
The microspeaker and earphone driver market is about $10 billion annually. Just considering the work needed to shift production lines, even automated speaker production lines, over to semiconductor foundries is mind-boggling. The titans of microspeaker manufacturing typically have about 50,000 employees, while MEMS foundries producing similar quantities of devices have staffs of fewer than 500. (Yes, the wafers from the foundry will still need to be “packaged,” but a few zeros in workforce numbers are still lopped off.) With the rising cost of salaries in China, MEMS microspeakers will have a dramatic impact on staffing along with other far-reaching implications. And it is not just the fabrication of the transducers; there is also the promise of automated pick-and-place of MEMS speakers for surface-mount technology (SMT) board stuffing rather than hand soldering of billions of speakers. Let’s ponder practical applications, projected unit costs, and acoustical strengths and weaknesses.

While MEMS microphones have taken the lion’s share of the microphone market, why are microspeaker transducers almost 100% electro-dynamic (magnetic structure with a voice coil)? The 800 lb. gorilla blocking MEMS speakers is “pumping power.” While the micro-mechanism in a MEMS mic only needs enough movement to respond to the acoustic signal, MEMS speakers need to move air. But even MEMS mics have so little excursion capability that the acoustic overload point (AOP) is a serious consideration in spec’ing MEMS mics.

Some MEMS mics will latch up (the diaphragm will stick to the plates) if the device they are mounted in is dropped or even if a car door is slammed. With conventional speakers, the physics of sound output comes down to the volume of air displaced: the Xmax (excursion) times the piston area. The typical smartphone speaker diaphragm footprint is 10 mm × 15 mm and has about 0.5 mm Xmax peak excursion. The air-moving power of MEMS speakers is significantly less than that of even the lowest-performance microspeakers.
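The excursion-times-area relationship can be made concrete with a quick calculation. The sketch below uses the smartphone figures quoted above and treats the whole 10 mm × 15 mm footprint as a rigid piston, which is an optimistic simplification since the effective radiating area is smaller in practice. The function names and the 5 mm³ per-device figure are purely illustrative assumptions, not data for any real MEMS part.

```python
import math

def volume_displacement_mm3(width_mm, length_mm, xmax_mm):
    """Peak volume of air displaced by a rigid rectangular piston, in mm^3."""
    piston_area_mm2 = width_mm * length_mm
    return piston_area_mm2 * xmax_mm

def devices_needed(target_mm3, per_device_mm3):
    """How many identical small transducers match a target displacement."""
    return math.ceil(target_mm3 / per_device_mm3)

# Smartphone microspeaker figures from the text: 10 mm x 15 mm, 0.5 mm Xmax.
phone_speaker_mm3 = volume_displacement_mm3(10.0, 15.0, 0.5)

# Hypothetical small transducer displacing 5 mm^3 (assumed for illustration).
array_size = devices_needed(phone_speaker_mm3, 5.0)

print(f"Smartphone speaker: {phone_speaker_mm3:.1f} mm^3 peak displacement")
print(f"Hypothetical 5 mm^3 devices needed to match it: {array_size}")
```

Under these assumptions the conventional part displaces 75 mm³ per stroke, so any lower-displacement technology has to make up the difference with more devices or a different operating principle.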

In every case of the unique transducers surveyed here, output is minuscule, and outside of applications in in-ear monitors (IEMs) or hearing aids, they must be used in multiples. USound describes MEMS speakers as the “LED of the acoustics,” and the size and configuration of the array would be application-specific. Multiple speakers means multiple cost. Many of the transducers here are made from wafers, which are sliced and diced and then packaged into complete speakers, much like MEMS mics. Wafer costs could be $500 to more than $1,000 depending on the process and diameter of the wafer. Using advanced math, if you need two (or four or half a dozen) MEMS devices for your application, the costs of both the wafer and the packaging start to add up quickly.
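To show how quickly the array cost scales, here is a rough back-of-the-envelope sketch. Only the $1,000 wafer cost comes from the range quoted above; the 200 mm wafer diameter, 5 mm × 5 mm die, 80% yield, and $0.10 packaging cost are assumptions chosen for illustration. The gross-die estimate is a commonly used approximation that subtracts an edge-loss term from the wafer area divided by the die area.

```python
import math

def gross_dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Common approximation: usable area term minus an edge-loss term."""
    radius = wafer_diameter_mm / 2.0
    area_term = math.pi * radius ** 2 / die_area_mm2
    edge_term = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return math.floor(area_term - edge_term)

def cost_per_good_die(wafer_cost, wafer_diameter_mm, die_area_mm2, yield_fraction):
    """Wafer cost spread over the dies that survive fabrication."""
    good_dies = int(gross_dies_per_wafer(wafer_diameter_mm, die_area_mm2) * yield_fraction)
    return wafer_cost / good_dies

def array_cost(n_devices, die_cost, packaging_cost_each):
    """Total cost of an application that needs n packaged devices."""
    return n_devices * (die_cost + packaging_cost_each)

# Assumed values (hypothetical except the wafer cost, which is in the quoted range).
die_cost = cost_per_good_die(
    wafer_cost=1000.0, wafer_diameter_mm=200.0, die_area_mm2=25.0, yield_fraction=0.8
)

for n in (1, 2, 4, 6):
    print(f"{n} device(s): ${array_cost(n, die_cost, packaging_cost_each=0.10):.2f}")
```

The point is the linear multiplier rather than the absolute numbers: every additional device in the array repeats both the die cost and the packaging cost.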

Conventional wisdom among speaker engineers holds that to achieve the required sound levels and bass response, you need a large enough diaphragm moving far enough. Some of the new contenders point out that their sound production technology does not follow the conventional physics of moving-diaphragm transducers. Just a caveat here: perhaps the rules are different, but they may come with a new set of problems.

The “Holy Grail” for these alternative MEMS speaker technologies is to become the next smartphone microspeaker. There are more than 1.6 billion mobile phones produced each year, each with at least two microspeakers: a receiver and a speakerphone transducer. Less “pumping power” is required for headphones than for speakerphone applications, still less for earphones (and hearing aids), and even less for earphone “tweeters.”

MEMS speakers promise to be ideal receivers for in-canal hearing aids and implantable hearing devices (i.e., cochlear and auditory brainstem implants). These applications require a very small “air pumping volume” for adequate acoustic output due to the enclosed duct and close proximity to the middle ear. A more ambitious step is in-canal IEM earphones, which require not much more acoustic output than implantable transducers. Between these two applications, IEM tweeter transducers are another application (many IEM earphones are two-way or more designs using balanced armature drivers).

The first MEMS speakers have already reached the market. There are a handful of MEMS solutions that replace the conventional voice coil actuator with MEMS mechanisms, while others are not precisely MEMS, but all are relevant for earphone and microspeaker applications. Each of these designs has significant development and manufacturing barriers to mass acceptance and productization.

Piezo Speakers
One of the promising technologies is the piezo speaker, which already has a long history in the speaker industry. Motorola’s ceramic horn tweeters were used in “prosumer” speakers by the millions for decades. Piezo microspeakers have very low profiles, which are highly desirable for smartphones, and there have been a half dozen short-lived piezo microspeakers. The challenge has always been the limited excursion along with the lack of adequate bottom-end or even lower-midrange output. The redeeming aspect of piezo transducers is that while the excursion of the ceramic element is limited, this can be somewhat addressed with larger and thinner ceramics, and because the force of the ceramic element is high, a cantilever can be used to increase excursion. Now for our survey of these next-generation devices, including an overview of their technology and development status.
In 2019, Austrian start-up USound brought their first MEMS microspeaker to market, enabling the company to target opportunities in wearables, headsets and embedded speakers.

USound is a fabless audio semiconductor company offering piezo silicon speakers based on MEMS technology. USound was able to overcome the limitation of traditional piezo transducers, and with its innovative MEMS concept has proven that it can generate relatively large displacements. USound has developed and shipped several hundred thousand of what it believes are the smallest and first MEMS loudspeakers in the world.

USound’s co-founder and CTO, Andrea Rusconi, pointed out that the company’s major selling points, confirmed by customers, are form factor and weight along with reflow solder compatibility. Reflow soldering was one major motivation for the breakthrough of MEMS microphones in consumer electronics. USound’s solution with its MEMS processes (including microelectronics-grade packaging) works well not only for speaker manufacturing but also at the product level, because reflow soldering of the speaker enables audio modules with integrated electronics (e.g., connectivity and sensors) to be manufactured on SMD lines. Another major advantage of USound’s MEMS loudspeakers is their flexibility, with different versions for in-ear and also speaker applications.

USound microspeakers are currently offered for smartphones, earbuds, audio modules for augmented reality and virtual reality glasses, and numerous consumer wearables, as well as 3D surround sound headphones. Together with production partners STMicroelectronics, Flex and AT&S, USound has implemented a global semiconductor supply chain.

TDK, best known for its sensors and electronic components, based its PiezoListen microspeakers on a haptic device. Twelve layers of piezoelectric material are stacked so that displacement and maximum sound level are increased, with response down to 200 Hz. The speaker is intended for consumer products such as tablet computers and TVs and comes in two types: a “wide-range” type and a “high-range” type. The wide-range type’s bandwidth is 400 Hz to 20 kHz. Though it does not quite reach the low-end response of conventional speakers, it is adequate for tablets and laptops.

Ultrasonic heterodyne sound generators have had their proponents and commercial audio designs from ATCO, Holosonics (Spot Light), and others, but these have been shoebox to ceiling tile size implementations. Work continues on MEMS implementations of ultrasonic heterodyne, ultrasonic shutter modulation, and digital sound reconstruction for microspeakers. Questions remain on achieving signal reproduction integrity, issues with the high levels of ultrasonics generated to achieve adequate audio levels, and attaining usable low-end frequency response.

Audio Pixels
Audio Pixels is one of the pioneers in the development and production of MEMS digital speaker chip technology. Its approach generates sound directly from a digital audio stream. Audio Pixels holds patents in the fields of electromechanical structures, pressure generation, acoustic wave generation and control, signal processing, and packaging. Its silicon chip can be used either as a stand-alone speaker or cascaded in any multiples of the same chip to achieve required performance specifications.

This modular paradigm is comparable to “parametric speakers” such as phased arrays, or to using more transducers to increase the dynamic range. Audio Pixels’ Digital Sound Reconstruction (DSR) technique is based on a theory introduced by Bell Labs in the 1930s. It was originally a secure “digital” speech vocoder for military communications, with a “digital speaker” to reconstruct the speech. The sound wave is generated from the summation of discrete pulses that are produced by an array of pressure-generating micro-transducers. Within each transducer is an array of identical elements fine-tuned to a particular frequency. As with analog speakers, different frequencies are produced by varying the timing of the motion. Proof-of-concept work continues to progress. Audio Pixels has partnered with Sony as one of its MEMS foundry partners and with ICsense for the ASIC design.
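The idea of building a waveform from the summed output of many identical elements can be illustrated with a toy model. This is not Audio Pixels’ algorithm; it is a minimal sketch assuming that each element contributes one equal unit of pressure per sample period and that the contributions add linearly, ignoring pulse shape, element timing, and acoustics. All names and the 1,024-element array size are hypothetical.

```python
import math

def elements_to_fire(sample, n_elements):
    """Map a sample in [-1.0, 1.0] to the number of identical elements driven."""
    clipped = max(-1.0, min(1.0, sample))
    return round((clipped + 1.0) / 2.0 * n_elements)

def reconstruct(samples, n_elements):
    """Sum the unit contributions per sample period, rescaled to [-1.0, 1.0]."""
    output = []
    for sample in samples:
        active = elements_to_fire(sample, n_elements)
        output.append(2.0 * active / n_elements - 1.0)
    return output

# One cycle of a 1 kHz tone sampled at 48 kHz (assumed test signal).
tone = [math.sin(2.0 * math.pi * 1000.0 * n / 48000.0) for n in range(48)]

approx = reconstruct(tone, n_elements=1024)
worst_error = max(abs(a - b) for a, b in zip(tone, approx))
print(f"Worst-case error with 1024 elements: {worst_error:.6f}")
```

In this simplified picture the amplitude resolution is set by the number of elements, which is one way to read the claim that cascading more chips extends the achievable performance.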

GraphAudio licensed the graphene audio work and patents from Lawrence Berkeley National Laboratory in 2016 for development of commercialized audio products. GraphAudio has developed an electrostatic driver where the pure graphene diaphragm functions as part of the “motor.” Its initial products are earphones using a graphene diaphragm sandwiched between electrodes. When the electric field between the electrodes oscillates with the audio signal, it causes the graphene to vibrate in a physical analogy to the electrical signal, and this generates sound. It’s essentially an electrostatic speaker, but graphene is used instead of a metalized polymer film diaphragm. Also in development are a studio microphone and a super-wideband measurement mic.
GraphAudio has developed an electrostatic driver where the pure graphene diaphragm functions as part of the “motor.” Here is an exploded view of an 8 mm speaker assembly.

Graphene diaphragms are very thin and light with a small spring constant, so that the air itself damps their motion. The symmetrical push-pull electrostatic drive has been the core technology of the finest audiophile headphones and speakers and studio microphones. The ability to power graphene earphones and speakers using conventional mobile battery power expands their application beyond just the boutique end of the market. Batteries work for the DC bias since they need to source only voltage and virtually no current. Since the power is tiny, there is no need for high current and small batteries suffice. Demonstration earphones have been produced with audiophile-quality results.

Hedging its bets, Fraunhofer, the German research institute, is developing both piezo and capacitive (electrostatic) all-silicon MEMS speakers. Its CMOS-compatible MEMS speaker is based on electrostatic bending actuators. Future work will focus on increased SPL and reduced distortion through optimized actuator design. Concurrently, development continues on a piezoelectric MEMS speaker with concentrically cascaded lead zirconate titanate actuators, making it the first integrated two-way MEMS speaker.

The speaker is designed to operate without a closed membrane to improve acoustic performance, energy efficiency, and manufacturability. Extensive finite element analysis studies revealed an SPL of more than 79 dB at a distance of 10 cm at 500 Hz for a device 1 cm² in size operated at 30 V. At higher frequencies, larger SPL values are calculated, enabling a flat frequency response with 89 dB for frequencies above 800 Hz. Based on this concept, the first speaker prototypes have been fabricated.

Sonion’s electrostatic IEM tweeter (electret) is designed for a smoother, cleaner sound in the higher frequencies than traditional balanced armature IEMs using standard tweeters. The Sonion electrostatic super-tweeter produces high frequencies from 7 kHz upward. The driver comprises a specifically arranged dual electret cartridge that lowers symmetric distortion, combined with a miniature transformer. This enables electrostatic performance in IEMs without the usual separate power supply for stepping up the voltage and supplying bias to the driver. The result is stunning audio quality with crystal clear undistorted sound that goes well beyond the limits of human hearing. The dimensions of the electrostatic tweeter are 3.55 mm × 3.55 mm × 2.54 mm (32 mm³). A single version is also available and measures 3.55 mm × 3.55 mm × 1.27 mm (16 mm³).

xMEMS Labs
xMEMS Labs, a California MEMS startup, has developed a MEMS speaker initially for earphone applications. It promises transducers of small size and low power consumption, with a scalable design in which the application’s SPL requirement defines the number and arrangement of “speaker cells.” Specifically, a handful of cells may be sufficient for earbuds, but smartphones may require more. xMEMS claims the ability to reach frequencies as low as 20 Hz at half the size or less of a conventional dynamic microspeaker. While xMEMS has not yet revealed specifics on its MEMS speaker technology, it is developing a complete MEMS process that integrates the membrane and actuator, reducing manufacturing complexity and making it well suited to high-volume MEMS manufacturing. If you are curious, check out its patent on an “Air Pulse Generating Element and Sound Producing Device.”

This MEMS speaker survey is just the tip of the iceberg, as I know of other initiatives that are still in stealth mode. But as with MEMS microphone development, many of these efforts will dead-end, at least until the technology infrastructure catches up.

This article was originally published in Voice Coil, December 2019.
Original Title: The Coming Impact of MEMS Audio in 2020.
The article was edited from the original version.