For most people, wearing glasses or contact lenses all day long is perfectly normal. These aids are lightweight, they do not need to be recharged, and they do not leave the user feeling cut off from the surroundings.
For earbuds, this is certainly different. They are often bulkier, heavier, and less comfortable to wear, so they are not suited to being worn all day. Even worse, current earbud technology quickly drains the battery. This limits how much smart functionality, Internet connectivity, and privacy earbuds can offer. A glance at a typical teardown (Figure 1) reveals that the battery and the speaker take up most of the space inside these designs.
In any case, the push for better earbud technologies has recently gained substantial momentum. Improved batteries are on the agenda. Luckily, advances in battery technology are currently driven by the megatrends of electromobility and green energy storage. Batteries for mobile communication will continue to become smaller accordingly.
However, there are tight limits on battery capacity when it comes to earbuds: Because the battery sits inside the ear canal, the total energy it stores must not be large enough to pose a safety hazard. For this reason, more power-efficient wireless solutions and less power-hungry speech recognition platforms are key items on the R&D roadmap. In fact, all next-generation earbud components must improve: They must shrink in size, become much more power efficient, and comply with tough cost targets.
This clearly calls for innovation in microspeaker technology.
Pure Silicon Microelectromechanical Systems (MEMS) Transducers
Silicon technology is the modern world’s epitome of miniaturization and price competitiveness. Silicon derives its commercial success from replacing complex mechanical assembly processes with lithography, a process conceptually as simple as taking a photograph. Thousands of devices are made in one go on a single silicon wafer, a kind of silvery disc. The result is unprecedented product quality, manufacturing productivity, and decline in cost per unit. Is this also an option for microspeakers?
The mantra here is MEMS devices: transducers made purely from silicon. MEMS devices are ubiquitous in modern life. They tell our smartphones to rotate the picture from portrait to landscape when needed, and they trigger the airbags in our cars. Last but not least, they have become today’s dominant microphone technology. The idea of a MEMS microspeaker has attracted talent around the globe for over a decade now. But making this dream come true is not easy.
Let us look, for example, at the size of a MEMS microspeaker chip. A headset designer probably wants a sound pressure level (SPL) of, say, 120dB to have enough system budget for active noise control. A silicon speaker that uses a diaphragm needs a large enough chip surface to generate this SPL. But the cost of processing one silicon wafer is roughly the same, be it a wafer full of microphone chips or full of speaker chips.
A larger chip size means fewer chips per wafer and thus a higher cost per chip. A typical MEMS microphone chip has an area of 1mm². So, a MEMS microspeaker of, say, 50mm² is likely to be 50 times as expensive as a MEMS microphone! Commercially attractive MEMS microspeakers should, therefore, be much smaller, which is hardly compatible with the use of a silicon membrane.
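The cost argument can be put into numbers with a rough sketch. It assumes a fixed processing cost per wafer and ignores edge losses and yield. The wafer diameter and the wafer cost below are illustrative placeholders, not figures from this article.

```python
import math

def chips_per_wafer(wafer_diameter_mm: float, chip_area_mm2: float) -> int:
    """Upper bound on chip count: wafer area divided by chip area (no edge loss, no yield)."""
    wafer_area_mm2 = math.pi * (wafer_diameter_mm / 2) ** 2
    return int(wafer_area_mm2 // chip_area_mm2)

def cost_per_chip(wafer_cost: float, wafer_diameter_mm: float, chip_area_mm2: float) -> float:
    """Wafer processing cost spread evenly over all chips on the wafer."""
    return wafer_cost / chips_per_wafer(wafer_diameter_mm, chip_area_mm2)

WAFER_DIAMETER_MM = 200.0  # assumption: illustrative wafer size
WAFER_COST = 1000.0        # assumption: arbitrary cost units per processed wafer

mic_cost = cost_per_chip(WAFER_COST, WAFER_DIAMETER_MM, 1.0)       # 1mm² microphone chip
speaker_cost = cost_per_chip(WAFER_COST, WAFER_DIAMETER_MM, 50.0)  # 50mm² speaker chip
print(f"speaker/microphone cost ratio: {speaker_cost / mic_cost:.1f}")
```

Under these assumptions the ratio comes out close to 50 whatever wafer cost is chosen, because the wafer cost cancels out and only the chip areas matter.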
There are potentially several ingenious ways to cut through the surface area dilemma. A very attractive strategy is depicted in Figure 2. The idea is to use the chip volume rather than the chip surface! An earbud only needs to displace a certain volume of air to generate the desired sound pressure level in the ear canal. There is no need to be concerned about the physics of sound radiation.
An air displacement as small as 0.5mm³ is enough for 120dB, and a typical wafer is 0.75mm thick. That is what makes the idea of using the wafer volume so attractive. The novel design, therefore, is to virtually cut the speaker diaphragm into small strips and mount them upright in the chip volume for lateral movement. When two of these lamellas move toward each other, they squeeze the air enclosed between them out through air vents suitably positioned in the top and bottom silicon layers enclosing the device. That makes a perfect acoustic dipole as required for a tiny earbud!
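The order of magnitude of the required air displacement can be checked with a small sketch based on adiabatic compression in a sealed cavity, where the pressure change is p = γ·P₀·ΔV/V. The occluded ear canal volume of 2cm³ is an assumption made for illustration only, and the model holds only at frequencies where the wavelength is much larger than the ear canal.

```python
import math

P_REF = 20e-6      # Pa, reference pressure for dB SPL
P_ATM = 101_325.0  # Pa, static atmospheric pressure
GAMMA = 1.4        # adiabatic index of air

def spl_to_rms_pressure(spl_db: float) -> float:
    """Convert a sound pressure level in dB SPL to RMS pressure in Pa."""
    return P_REF * 10 ** (spl_db / 20)

def displaced_volume_mm3(spl_db: float, cavity_volume_cm3: float) -> float:
    """RMS volume change needed in a sealed cavity: dV = V * p / (gamma * P_atm)."""
    p_rms = spl_to_rms_pressure(spl_db)
    cavity_m3 = cavity_volume_cm3 * 1e-6
    dv_m3 = cavity_m3 * p_rms / (GAMMA * P_ATM)
    return dv_m3 * 1e9  # m³ -> mm³

EAR_CANAL_CM3 = 2.0  # assumption: illustrative occluded ear canal volume

dv_rms = displaced_volume_mm3(120.0, EAR_CANAL_CM3)
print(f"RMS displacement:  {dv_rms:.2f} mm³")
print(f"peak displacement: {dv_rms * math.sqrt(2):.2f} mm³")
```

With this assumed cavity volume, the result is a few tenths of a cubic millimeter, the same order of magnitude as the 0.5mm³ quoted above.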
Lamella Movement
Now what makes the lamellas move? This is a challenge of its own! Employing electrostatic forces certainly provides scope for excellent audio quality without the need for any audio signal preprocessing. High-end headsets are designed this way. However, existing electrostatic headsets require hundreds of volts to drive them. That is not an option for earbuds! It would be neither acceptable from a product safety point of view nor compatible with the performance of a small battery.
The root cause of the voltage dilemma is the difficulty of designing an elastic suspension strong enough to tame electrostatic forces. Think of two electrically charged and elastically suspended plates, used to generate sound pressure, that face each other at a certain distance. The electrostatic force between them quickly gains strength as the surfaces approach each other. This happens so rapidly that once the plates have closed about one-third of the initial distance, the balance of the elastic and the electrostatic forces collapses. An elastic suspension cannot stop this. A large initial electrode distance is therefore usually required to generate enough sound output. But a large distance between the charged surfaces means that a large drive voltage of hundreds of volts is inevitable.
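The one-third rule can be reproduced with a minimal numerical sketch of the textbook model: an ideal parallel-plate capacitor, a linear spring, and no fringing fields. The gap, stiffness, and electrode area below are hypothetical values chosen only to run the calculation. They do not describe any actual device.

```python
EPS0 = 8.854e-12  # F/m, vacuum permittivity

def equilibrium_voltage(x: float, gap: float, stiffness: float, area: float) -> float:
    """Voltage at which the spring force k*x balances the electrostatic force
    eps0*A*V^2 / (2*(gap - x)^2) for a plate deflected by x."""
    return (2 * stiffness * x * (gap - x) ** 2 / (EPS0 * area)) ** 0.5

def pull_in(gap: float, stiffness: float, area: float, steps: int = 100_000):
    """Scan deflections between 0 and gap; return (x, V) where the equilibrium voltage peaks.
    Above this voltage no stable equilibrium exists and the plates snap together."""
    best_x, best_v = 0.0, 0.0
    for i in range(1, steps):
        x = gap * i / steps
        v = equilibrium_voltage(x, gap, stiffness, area)
        if v > best_v:
            best_x, best_v = x, v
    return best_x, best_v

GAP_M = 10e-6         # assumption: illustrative electrode gap
STIFFNESS_N_M = 10.0  # assumption: illustrative spring constant
AREA_M2 = 1e-8        # assumption: illustrative electrode area

x_pi, v_pi = pull_in(GAP_M, STIFFNESS_N_M, AREA_M2)
print(f"pull-in deflection / gap: {x_pi / GAP_M:.3f}")
print(f"pull-in voltage: {v_pi:.1f} V")
```

The deflection ratio lands at one-third regardless of the stiffness and area chosen. In this idealized model the pull-in voltage scales with the gap to the power of 3/2, which is why large gaps go hand in hand with large drive voltages.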
A design solving the voltage dilemma is depicted in Figure 3. The lamellas shown in Figure 2 are in fact stacks of three electrically insulated electrodes each. The tiny gaps of roughly 2.5μm between the electrodes reduce the drive voltage requirement to only a few volts. Still, it remains true that the maximum stroke of the electrode movement is restricted to about one-third of the gap. That is only about 800nm!
The curved geometric shape of the lamellas depicted in Figure 3 mechanically converts the minute electrode movement into the required large stroke of the lamellas. That is the secret of this design. Essentially, it is a smart mechanical lever mechanism.
The design principles presented so far enable a small form factor and a low-voltage drive with a technology that uses only pure silicon, requires no specialty materials or manufacturing processes, and is compatible with wafer-level assembly. The remaining topic to be addressed is power consumption.
Power Consumption
Conceptually, electrostatic speakers do not dissipate any energy at all. Any electrical energy required to compress the air in the ear canal is released from the device back into the drive electronics once the audio signal commands the speaker to expand the air again: Effectively, the electrostatic microspeaker operates entirely on reactive power (i.e., power that is not dissipated). Power, of course, means energy used per unit of time.
Unfortunately, a major share of the reactive power, if not a multiple of it, will be dissipated as heat by the audio amplifier. And the series resistance of the battery worsens this power dissipation, even when the average power draw is low. This is the challenge.
Basically, all MEMS designs aspiring to microspeaker markets are, from an electrical engineering point of view, capacitors. This means that the reactive power doubles when the frequency doubles and quadruples when the voltage doubles. There is little that can be done regarding the frequency and the voltage range. The frequency range is defined by the required audio performance, and even if 120mV is enough to generate 80dB SPL, a perfectly linear electrostatic transducer needs 12V for a 120dB SPL output.
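Both scaling statements follow from treating the speaker as an ideal capacitor, whose reactive power is Q = 2π·f·C·V², and from the fact that 40dB corresponds to a factor of 100 in sound pressure. The short sketch below checks them. The capacitance and the 1V, 1kHz baseline are placeholders chosen for illustration, and the function names are not taken from any library.

```python
import math

def reactive_power_var(capacitance_f: float, v_rms: float, freq_hz: float) -> float:
    """Reactive power of an ideal capacitor under sine drive: Q = 2*pi*f*C*V_rms^2."""
    return 2 * math.pi * freq_hz * capacitance_f * v_rms ** 2

def voltage_for_spl(v_ref: float, spl_ref_db: float, spl_target_db: float) -> float:
    """Drive voltage of a perfectly linear transducer: voltage scales with sound pressure."""
    return v_ref * 10 ** ((spl_target_db - spl_ref_db) / 20)

C_F = 1e-9  # assumption: placeholder capacitance, cancels out in the ratios below

base = reactive_power_var(C_F, 1.0, 1000.0)
print("frequency doubled:", reactive_power_var(C_F, 1.0, 2000.0) / base)
print("voltage doubled:  ", reactive_power_var(C_F, 2.0, 1000.0) / base)
print("volts for 120dB:  ", voltage_for_spl(0.120, 80.0, 120.0))
```

The two ratios come out as factors of two and four, and the 120mV needed for 80dB scales to 12V at 120dB.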
Yet, for a miniaturized MEMS audio system this is still manageable, provided the electrical currents required to charge and discharge the electrical capacitance of the speaker are small. An attractive earbud audio reproduction system should have a total power dissipation of well below 3mW for a typical use case. Essentially, this puts a cap on the total electrical capacitance of the speaker chip. It should be significantly below 1nF. Note that the basic physics of sound generation (i.e., the adiabatic air compression in the ear canal) would, in principle, allow for capacitances a factor of 30 lower.
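How a dissipation budget translates into a capacitance cap can be sketched as follows. This is a rough illustration, not the derivation behind the figures above. It assumes a sine drive at a single frequency, treats the 12V figure as an RMS value, and introduces a hypothetical loss factor for how many times the reactive power the amplifier turns into heat.

```python
import math

def reactive_power_va(capacitance_f: float, v_rms: float, freq_hz: float) -> float:
    """Reactive power of an ideal capacitor under sine drive: Q = 2*pi*f*C*V_rms^2."""
    return 2 * math.pi * freq_hz * capacitance_f * v_rms ** 2

def max_capacitance_f(budget_w: float, v_rms: float, freq_hz: float, loss_factor: float) -> float:
    """Largest capacitance for which loss_factor * Q stays within the dissipation budget."""
    return budget_w / (loss_factor * 2 * math.pi * freq_hz * v_rms ** 2)

BUDGET_W = 3e-3   # dissipation budget, taken from the text
V_RMS = 12.0      # assumption: the 12V full-scale figure treated as RMS
FREQ_HZ = 1000.0  # assumption: single illustrative drive frequency

for loss_factor in (1.0, 3.0):  # assumption: hypothetical amplifier loss multiples
    c_max = max_capacitance_f(BUDGET_W, V_RMS, FREQ_HZ, loss_factor)
    print(f"loss factor {loss_factor:.0f}: C_max = {c_max * 1e9:.1f} nF")
```

With these placeholder values the cap lands in the low-nanofarad range, which is at least consistent with the order of magnitude stated above. A real audio signal spreads its energy over many frequencies and rarely runs at full scale, so the numbers should be read as an illustration only.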
The additional capacitance is required to provide the energy needed to move the silicon parts inside the MEMS microspeaker. Fortunately, there are various apt MEMS design principles for keeping it small (e.g., reducing the total mass of elastically deformed silicon parts, avoiding fast-moving silicon valves, and using electrical surface charges instead of electrical space charges).
MEMS Speaker Development at Arioso Systems
Arioso Systems GmbH is developing high-fidelity microspeaker designs for mobile audio reproduction that aim to be far superior in size and power efficiency. They are based on standard silicon technology, with the potential to claim cost leadership in mass markets.
The idea is to use only CMOS-compatible materials and manufacturing processes, enabling swift technology adoption by the market. Meeting tight cost targets calls for small microspeaker chip sizes. This is achieved by Arioso’s approach of using the chip volume rather than surface-mounted silicon diaphragms. An artist’s view of an earbud design using Arioso’s technology is shown in Figure 4.
Arioso’s technology aims at wafer-level microspeaker assembly (Figure 5), largely avoiding costly mechanical assembly steps at the level of the individual speaker chips. The targeted electrical capacitance of a microspeaker that produces a sound pressure level of 120dB from a 10mm² chip is below 1nF. This restricts the reactive electrical currents to a level compatible with small charge pumps that need no external passive components, and it enables the use of a novel, proprietary, and very power-efficient audio amplifier technology for driving the speaker. The targeted battery drain of the entire audio reproduction system is well below 3mW for typical use cases.
Arioso Systems is a spin-off of the Fraunhofer Institute for Photonic Microsystems, Dresden. The company seeks to reach its key technological targets mid-2021, followed by a Series-A investment round. Currently, Arioso Systems is working with selected market participants to perfect its technology and identify market opportunities.
This article was originally published in audioXpress, January 2021
About the Authors
Hermann Schenk holds a PhD in theoretical physics from Bonn University. He was founder and managing director of Covion Organic Semiconductors GmbH, commercializing OLED materials. After the sale of Covion to Merck KGaA, he served as managing director of Freiberger Compound Materials GmbH, a leading supplier of III/V-Semiconductors. Currently, he is co-founder and managing director of Arioso Systems GmbH, dedicated to the development, manufacturing, and sales of MEMS μSpeaker technology. His recent research contributions center on novel modeling techniques for MEMS systems.
Lutz Ehrig studied Mechatronics in Dresden, Germany, and Acoustics in Aalborg, Denmark. In 2009, he joined the Fraunhofer IDMT in Ilmenau, Germany, as a research assistant in electroacoustics. In 2015, he joined a consulting engineering office as a project manager and worked for two years on room acoustics and environmental noise. Afterward, he worked as a research assistant at the Fraunhofer IPMS in Dresden, where he participated in the development of a CMOS-compatible MEMS microspeaker. In 2019, he co-founded Arioso Systems and joined the company in 2020. He is responsible for Technical Marketing and Business Development.