Arm Newsroom Blog

Sensory Removes Barriers to Entry for Endpoint Voice Control

Sensory Inc. executive Joseph Murphy explores the obstacles on the path to full natural language control of our home devices, and the ways Sensory is addressing them.
By Joe Murphy, Executive, Sensory Inc.

“Defrost one pound of frozen chicken.” I have no clue what magical button sequence on my microwave will get me there, but it sure is easy to just say it. “Start a load of whites at 10pm tonight.” I’m not even going to try to dial that in to my washing machine—if I did, I’d undoubtedly end up with a pile of shrunken, off-color laundry.

Today, voice control around the home is mostly limited to the smart speaker and the smartphone. Yet just look around your home and I’m sure you’ll find at least two or three appliances whose user interface is a confusing mashup of buttons, lights, switches, text and beeps.

These appliances offer multiple layers of functionality and advanced convenience options, but no one knows how to access them. Voice and natural language will unlock the advanced functionality these devices already support: you’ll speak to them in your natural language, and they’ll understand and act on your requests.

In one form or another, Sensory has supported voice-forward and voice-first products for more than 20 years, building long experience in voice technologies such as wake word detection, embedded speech recognition and voice biometrics. We’re now more committed than ever to a future of voice control because there are so many opportunities beyond smart speakers and mobile phones.

If the microwave example I opened with seems far-fetched, I have some news: It’s already available to preorder on Amazon. Watch the video below to see how we did it.

That microwave may seem cutting-edge, but to me and my colleagues at Sensory, voice and natural language have always made far more sense as a way to ask machines to do what we want. So why don’t we see more products taking advantage of voice commands? What are the barriers to entry, and how are we addressing them?

Barrier 1: Voice adds cost

Cost is a primary concern for brand owners. Adding voice capabilities to a device typically increases the bill of materials (BOM) because of the components and technologies required to support the voice experience.

Consider a typical smart speaker, which relies on a direct line to the cloud via Wi-Fi, an applications processor and an advanced audio front end. Adding all of that to an appliance can quickly get expensive.

Sensory’s TrulyNatural speech recognizer is embedded on device: all processing is performed at the endpoint. With support for tens of thousands of natural language commands, there’s no need to process the audio it captures in the cloud. Embedded speech recognition also alleviates any privacy concerns about recordings in the cloud.

We’ve also reduced BOM significantly by removing the applications processor. While most natural language engines require an applications processor and an operating system, a recent breakthrough from Sensory enables us to run our speech recognition on an Arm Cortex-M7.

Low-cost, energy-efficient Cortex-M cores are already embedded in tens of billions of consumer devices. Being able to run natural language recognition on these microcontrollers opens up a whole new range of products to voice control.

Barrier 2: Voice is hard to develop for

There’s no doubt that developing custom voice grammars and language models requires a very specific set of skills and a talented team. However, Sensory recently launched VoiceHub, an online portal that brings those skills to the masses. VoiceHub enables developers to create custom voice grammars without the need for their very own machine learning team.

Sensory’s VoiceHub is almost a WYSIWYG interface for designing voice control commands. Developers can log in and create prototype grammars for free. Proofs of concept, MVPs and demos that once took weeks can now be created in hours, and no previous programming skills are required to develop complex grammars with hundreds or thousands of voice commands.

The capability to run Sensory’s speech recognition on Arm Cortex-M based microcontrollers has already been integrated into Sensory’s VoiceHub. Developers in VoiceHub can create edge-based artificial intelligence (AI) voice models and download them directly to their Arm-based developer board. Developing voice control with Sensory and Arm just got a lot easier.

VoiceHub – Fast, Free, Flexible Voice Development

Barrier 3: Voice is confusing for customers to set up

The final barrier applies to most smart home products, not just voice control: getting smart home devices set up can be a real hassle for consumers, and the initial setup is often the biggest pain point.

Most smart devices require a smartphone app and sometimes multiple apps for setup. Then a confusing mix of Wi-Fi, Bluetooth, ZigBee and other protocols can leave consumers feeling like they need to hire a system administrator just to turn the lights on.

If the neighborhood ever loses power or internet connectivity, it can take time to get everything provisioned again.

Sensory speech recognition is performed at the endpoint—so even if your Internet is offline, it’s ready to go, out of the box. Just plug in the device and start talking.

What’s Next for Voice?

Enabling the voice user interface is in Sensory’s DNA. We will continue to push the envelope on accuracy, language support, and memory size. Sensory’s technologies are well positioned to support the next wave of voice user interfaces and enable voice control on endpoint devices that were previously only touch-based and all too often impossible to understand.

And thanks to Arm Cortex-M microcontrollers, we can guarantee that we’ll be able to speak to this new wave of devices in our natural language, in the knowledge that even without a connection to the Internet, all of that compute is happening on the device itself.

Cortex-M7: The Highest Performance Cortex-M Processor

The Cortex-M processor series is designed to enable developers to create cost-sensitive and power-constrained solutions for a broad range of devices. Highly energy efficient and designed for mixed-signal devices, Cortex-M7 is the highest-performance member of the family. Its DSP capability and flexible system interfaces make it suitable for a wide variety of applications, from automotive and medical applications to sensor fusion and the Internet of Things (IoT).


Any re-use permitted for informational and non-commercial or personal use only.

Editorial Contact

Brian Fuller and Jack Melling