April 14, 2021

Sensory Removes Barriers to Entry for Endpoint Voice Control

Sensory Inc. executive Joseph Murphy explores the obstacles in the path to full natural language control of our home devices, and the ways Sensory is mitigating them

By Joe Murphy, Executive, Sensory Inc.

“Defrost one pound of frozen chicken.” I have no clue what magical button sequence on my microwave will get me there, but it sure is easy to just say it. “Start a load of whites at 10pm tonight.” I’m not even going to try to dial that in to my washing machine—if I did, I’d undoubtedly end up with a pile of shrunken, off-color laundry.

Today, voice control around the home is mostly limited to the smart speaker and the smartphone. Yet look around your home and I am sure you will find at least two or three appliances whose user interface is a confusing mashup of buttons, lights, switches, text and beeps.

These appliances offer multiple layers of functionality and advanced convenience options, but no one knows how to access them. Voice and natural language will unlock the advanced functionality already supported by such devices: you’ll speak to them in your natural language, and they’ll understand and act on your requests.

In one form or another, Sensory has been supporting voice-forward and voice-first products for more than 20 years, working on voice technologies such as wake word detection, embedded speech recognition and voice biometrics. We’re now more committed than ever to a future of voice control because there are so many opportunities beyond smart speakers and mobile phones.

If the microwave example I opened with seems far-fetched, I have some news: It’s already available to preorder on Amazon. Watch the video below to see how we did it.

That microwave may seem cutting-edge, but to me and my colleagues at Sensory, voice and natural language have always made much more sense as a way to ask machines to do what we want. So why don’t we see more products taking advantage of voice commands? What are the barriers to entry, and how are we mitigating them?

Barrier 1: Voice adds cost

Cost is a primary concern for brand owners. Adding voice capabilities to a device typically increases the bill of materials (BOM) because of the components and technologies required to support the voice experience.

Consider the average smart speaker that relies on a direct line to the cloud via Wi-Fi, an applications processor and an advanced audio front end. Adding all that to an appliance can quickly get expensive.

Sensory’s TrulyNatural speech recognizer is embedded on device: all processing is performed at the endpoint. With support for tens of thousands of natural language commands, there’s no need to process the audio it captures in the cloud. Embedded speech recognition also alleviates any privacy concerns about recordings in the cloud.

We’ve also reduced the BOM significantly by removing the applications processor. While most natural language engines require an applications processor and an operating system, a recent breakthrough from Sensory has enabled us to run our speech recognition on Arm Cortex-M7.

Low-cost, energy-efficient Cortex-M cores are already embedded in tens of billions of consumer devices. Our ability to run natural language on these microcontrollers opens up a whole new range of products for voice control.

Barrier 2: Voice is hard to develop for

There’s no doubt that developing custom voice grammars and language models requires a very specific set of skills and a talented team. However, Sensory recently launched VoiceHub, an online portal that brings those skills to the masses. VoiceHub enables developers to create custom voice grammars without the need for their very own machine learning team.

Sensory’s VoiceHub is almost a WYSIWYG interface for designing voice control commands. Developers can log in and create prototype grammars for free. Proofs of concept, MVPs and demos that once took weeks can now be created in hours. No previous programming skills are required to develop complex grammars with hundreds or thousands of voice commands.
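For readers new to the term, a voice grammar is essentially a mapping from phrases a user might say to an intent and its parameters. The short Python sketch below illustrates that idea on text that has already been transcribed. It is only an illustration and is not VoiceHub's format or Sensory's API: the `GRAMMAR` table, the `Command` type, the `parse_command` function and the slot names are all hypothetical, and a real embedded recognizer works on audio rather than on strings.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Command:
    """A recognized intent plus the slot values captured from the phrase."""
    intent: str
    slots: dict


# Hypothetical toy grammar: intent name -> pattern with named slots.
# These patterns are assumptions made for illustration only.
GRAMMAR = {
    "defrost": re.compile(
        r"^defrost (?P<amount>\w+) (?P<unit>pounds?|ounces?) "
        r"of (?:frozen )?(?P<food>[\w ]+)$"
    ),
    "start_wash": re.compile(
        r"^start a load of (?P<load>whites|colors|delicates) "
        r"at (?P<time>\d{1,2}(?::\d{2})?\s?(?:am|pm))(?: tonight)?$"
    ),
}


def parse_command(utterance: str) -> Optional[Command]:
    """Return the first grammar entry that matches the utterance, or None."""
    # Normalize case and drop trailing sentence punctuation before matching.
    text = utterance.lower().strip().rstrip(".!?")
    for intent, pattern in GRAMMAR.items():
        match = pattern.match(text)
        if match:
            return Command(intent, match.groupdict())
    return None


if __name__ == "__main__":
    print(parse_command("Defrost one pound of frozen chicken."))
    print(parse_command("Start a load of whites at 10pm tonight."))
    print(parse_command("Play some jazz"))
```

Each phrase the grammar covers resolves to a structured command that appliance firmware could act on, while anything outside the grammar is rejected. Scaling that table to hundreds or thousands of phrasings is the part a grammar design tool is meant to take off the developer's hands.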

The capability to run Sensory’s speech recognition on Arm Cortex-M based microcontrollers has already been integrated into Sensory’s VoiceHub. Developers in VoiceHub can create edge-based artificial intelligence (AI) voice models and download them directly to their Arm-based developer board. Developing voice control with Sensory and Arm just got a lot easier.

VoiceHub – Fast, Free, Flexible Voice Development

Barrier 3: Voice is confusing for customers to set up

The final barrier applies to most smart home products, not just voice control: initial setup can be a real pain point for consumers.

Most smart devices require a smartphone app and sometimes multiple apps for setup. Then a confusing mix of Wi-Fi, Bluetooth, ZigBee and other protocols can leave consumers feeling like they need to hire a system administrator just to turn the lights on.

If the neighborhood ever loses power or internet connectivity, it can take time to get everything provisioned again.

Sensory speech recognition is performed at the endpoint—so even if your Internet is offline, it’s ready to go, out of the box. Just plug in the device and start talking.

What’s Next for Voice?

Enabling the voice user interface is in Sensory’s DNA. We will continue to push the envelope on accuracy, language support, and memory size. Sensory’s technologies are well positioned to support the next wave of voice user interfaces and enable voice control on endpoint devices that were previously only touch-based and all too often impossible to understand.

And thanks to Arm Cortex-M microcontrollers, we can guarantee that we’ll be able to speak to this new wave of devices in our natural language, in the knowledge that even without a connection to the Internet, all of that compute is happening on the device itself.

Cortex-M7: The Highest Performance Cortex-M Processor

The Cortex-M processor series is designed to enable developers to create cost-sensitive and power-constrained solutions for a broad range of devices. Highly energy efficient and designed for mixed-signal devices, Cortex-M7 is the highest-performance member of the family. Its DSP capability and flexible system interfaces make it suitable for a wide variety of applications, from automotive and medical applications to sensor fusion and the Internet of Things (IoT).

Any re-use permitted for informational and non-commercial or personal use only.

Editorial Contact

Brian Fuller and Jack Melling

editorial@arm.com
