Arm Ethos-U85 NPU: Unlocking Generative AI at the Edge with Small Language Models
As artificial intelligence evolves, there is growing excitement about running AI workloads on embedded devices using small language models (SLMs).
Arm’s recent demo, inspired by Microsoft’s “Tiny Stories” paper and Andrej Karpathy’s TinyLlama2 project, showcases endpoint AI’s potential for IoT and edge computing: a small language model trained on 21 million stories generates text. In the demo, a user inputs a sentence, and the system extends it into a children’s story.
Our demo featured Arm’s Ethos-U85 NPU (Neural Processing Unit) running a small language model on embedded hardware. While large language models (LLMs) are more widely known, there is growing interest in small language models due to their ability to deliver solid performance with significantly fewer resources and lower costs, making them easier and cheaper to train.
Implementing a Transformer-Based Small Language Model on Embedded Hardware
Our demo showcased the Arm Ethos-U85 as a small, low-power platform capable of running generative AI, and highlighted that small language models can perform well within narrow domains. Although TinyLlama2 models are far simpler than the larger models from companies like Meta, they are ideal for demonstrating the Ethos-U85’s AI capabilities, making them a great fit for endpoint AI workloads.
Developing the demo involved significant modeling work, including the creation of a fully integer int8 (and mixed int8/int16) TinyLlama2 model, which was converted to a fixed-shape TensorFlow Lite format compatible with the Ethos-U85’s constraints.
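As an illustration of this deployment flow, a quantized, fixed-shape .tflite model is typically compiled for the NPU with Arm’s Vela compiler. The model filename and the accelerator configuration below are assumptions for illustration, not details taken from the demo:

```shell
# Hypothetical invocation: compile a fixed-shape, fully quantized TFLite
# model for an Ethos-U85 configuration (filename and MAC count assumed).
vela tinyllama2_int8.tflite --accelerator-config ethos-u85-512
```

Vela maps the model’s operators onto the NPU and reports which ones fall back to the CPU, which is why a fully integer, fixed-shape model matters for this target.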
Our quantization approach showed that fully integer language models can preserve strong accuracy and output quality. By quantizing activations, normalization functions, and matrix multiplications, we eliminated floating-point computation entirely, which is costly in silicon area and energy, both key concerns for constrained embedded devices.
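The integer-only arithmetic can be illustrated with the standard affine quantization scheme (a scale and a zero-point per tensor). The values below are a minimal sketch for illustration; the actual model’s parameters are chosen per tensor by the converter:

```python
# Minimal sketch of affine int8 quantization, the scheme used by fully
# integer TFLite models: q = round(x / scale) + zero_point, clamped to int8.

def quantize(x, scale, zero_point):
    """Map a float value to an int8 code."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))  # clamp to the int8 range

def dequantize(q, scale, zero_point):
    """Recover an approximate float value from the int8 code."""
    return (q - zero_point) * scale

# Example: an activation range of [-4, 4) mapped onto 256 int8 codes.
scale = 8.0 / 256      # width of one quantization step
zero_point = 0
q = quantize(1.5, scale, zero_point)
x = dequantize(q, scale, zero_point)
print(q, x)            # prints: 48 1.5
```

On hardware like the Ethos-U85, matrix multiplications then run entirely on the int8 (or int16) codes, with scales folded into integer rescaling steps, which is what removes the need for floating-point units.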
The Ethos-U85 ran a language model on an FPGA platform at only 32 MHz, achieving text generation speeds of 7.5 to 8 tokens per second—matching human reading speed—while using just a quarter of its compute capacity. In a real system-on-chip (SoC), performance could be up to ten times faster, significantly enhancing speed and energy efficiency for AI processing at the edge.
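The cycle budget implied by these figures can be sanity-checked with simple arithmetic. The inputs below restate numbers from the text; the per-token cycle count is derived, not measured:

```python
# Back-of-the-envelope check of the quoted demo figures.
clock_hz = 32e6        # FPGA clock used in the demo
tokens_per_s = 8.0     # upper end of the quoted 7.5-8 tokens/s
cycles_per_token = clock_hz / tokens_per_s
print(f"{cycles_per_token:.0f} cycles per token")  # prints: 4000000 cycles per token

# The text states roughly a quarter of compute capacity was used and that a
# real SoC could be up to 10x faster; both figures are restated, not measured.
soc_tokens_per_s = tokens_per_s * 10
print(f"~{soc_tokens_per_s:.0f} tokens/s plausible in silicon")
```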
The children’s story-generation feature used an open-source implementation of Llama2, with the demo running on TFLite Micro with an Ethos-U NPU backend. Most of the inference logic was written in C++ at the application level, and tuning the context window improved narrative coherence, ensuring smooth, AI-driven storytelling.
Adapting the Llama2 model to run efficiently on the Ethos-U85 NPU required carefully balancing performance and accuracy within the hardware’s limits. The mixed int8 and int16 quantization demonstrates the potential of fully integer models, encouraging the AI community to optimize generative models for edge devices and expand neural network accessibility on power-efficient platforms like the Ethos-U85.
Showcasing the Power of the Arm Ethos-U85
Scalable from 128 to 2048 MAC (multiply-accumulate) units, the Ethos-U85 delivers a 20% power-efficiency improvement over its predecessor, the Ethos-U65. A standout feature of the Ethos-U85 is its native support for transformer networks, a capability earlier Ethos-U generations lacked.
The Ethos-U85 enables seamless migration for partners using previous Ethos-U NPUs, allowing them to capitalize on existing investments in Arm-based machine learning tools. Developers are increasingly adopting the Ethos-U85 for its power efficiency and high performance.
In silicon, the Ethos-U85 can reach 4 TOPS (trillions of operations per second) in its 2048-MAC configuration. The demo, however, used a smaller 512-MAC configuration on an FPGA to run the 15-million-parameter TinyLlama2 small language model at just 32 MHz.
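The 4 TOPS figure is consistent with each MAC unit performing two operations per cycle (a multiply and an accumulate) at roughly a 1 GHz silicon clock; the clock rate here is our assumption for illustration, not a figure from the text:

```python
# Sanity check of the peak-throughput figure. Only the MAC count comes from
# the text; the ~1 GHz silicon clock is an assumed round number.
macs = 2048
ops_per_mac_per_cycle = 2        # one multiply + one accumulate
clock_hz = 1e9                   # assumed silicon clock
peak_ops = macs * ops_per_mac_per_cycle * clock_hz
print(peak_ops / 1e12, "TOPS")   # prints: 4.096 TOPS
```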
This capability highlights the potential for embedding AI directly into devices. The Ethos-U85 effectively handles such workloads even with limited memory (320 KB of SRAM for caching and 32 MB for storage), paving the way for small language models and other AI applications to thrive in deeply embedded systems.
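A quick size estimate shows why a 15-million-parameter model fits this memory budget, assuming roughly one byte per weight for int8 and ignoring quantization metadata and activation buffers:

```python
# Rough weight-storage estimate for the 15M-parameter int8 model.
params = 15_000_000
bytes_per_param = 1                    # int8 weights: ~1 byte each (assumption)
weight_bytes = params * bytes_per_param
print(weight_bytes / 2**20, "MiB of weights")  # roughly 14.3 MiB

flash_bytes = 32 * 2**20               # 32 MB storage quoted in the text
assert weight_bytes < flash_bytes      # weights fit in storage with headroom
```

The 320 KB of SRAM is then used as a working cache for activations and weight streaming rather than holding the whole model.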
Bringing Generative AI to Embedded Devices
Developers need better tools to navigate the complexities of AI at the edge, and Arm is addressing this with the Ethos-U85 and its support for transformer-based models. As edge AI becomes more prominent in embedded applications, the Ethos-U85 is enabling new use cases, from small language models to advanced vision tasks.
The Ethos-U85 NPU delivers the performance and power efficiency required for innovative, cutting-edge solutions. Building on the direction set by the “Tiny Stories” paper, our demo marks a meaningful step in bringing generative AI to embedded devices, demonstrating how readily small language models can be deployed on the Arm platform.
Arm is opening new possibilities for Edge AI across a wide range of applications, positioning the Ethos-U85 to power the next generation of intelligent, low-power devices.
Read how Arm is accelerating real-time processing for edge AI applications in IoT with ExecuTorch.
Any re-use permitted for informational and non-commercial or personal use only.