
Introducing Armv9 Scalable Matrix Extension for AI Innovation on the Arm CPU

Armv9 architecture feature offers significant performance uplifts for AI and ML-based applications.
By Arm Editorial Team

Arm is the technology foundation that runs artificial intelligence (AI) everywhere across every technology touchpoint. This is made possible by our industry-leading architecture that enables a wide range of computational workloads across the billions of diverse devices worldwide.

Arm has a relentless focus on fast-paced architectural evolution that prepares our industry-leading ecosystem for future technology trends and ever-changing compute requirements. While the astronomical rise of AI may feel like a recent phenomenon, Arm has been laying the groundwork for AI innovation for the past two decades: from the Armv7 architecture, which introduced the Advanced Single Instruction Multiple Data (SIMD) Extension as Arm’s first foray into machine learning (ML) workloads, to today’s Armv9 architecture, which incorporates features that accelerate and protect advanced generative AI workloads, such as large language models (LLMs), on the Arm CPU.

The Scalable Matrix Extension, known as SME, is one such innovative feature designed to meet the needs of today’s AI and ML workloads that are growing in complexity and power. In addition to accelerating today’s AI, SME provides flexibility on the Arm architecture to manage ever-evolving generative AI workloads.

What is Scalable Matrix Extension? And what are its features?

SME is an Instruction Set Architecture (ISA) extension introduced in the Armv9-A architecture, which accelerates AI and ML workloads and enables improved performance, power efficiency and flexibility for AI and ML-based applications running on the Arm CPU. This is achieved through the following features:

  • Enabling a significant increase in matrix and vector processing throughput and efficiency on the Arm CPU;  
  • Maximizing the reuse of data loaded in registers by introducing outer-product instructions that reduce memory bandwidth pressure (a rough sketch of this outer-product approach follows this list);
  • Expanding compressed user data in registers, so that input-element throughput increases without increasing the memory load bandwidth;
  • Supporting a wide range of storage and compute data types, making it a flexible solution for many current and future use cases; and
  • Permitting an implementation to select a Streaming Vector Length (SVL) between 128 and 2048 bits, so that matrix-matrix multiply throughput scales with the square of the SVL (SVL²).
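
To make the outer-product idea concrete, here is a minimal sketch in plain C (not SME instructions or intrinsics): the output tile is built up as a sum of rank-1 updates, so every element loaded from the input matrices is reused across an entire row or column of the accumulator, which is how memory bandwidth pressure is reduced.

```c
#include <stddef.h>

/* Minimal sketch of outer-product accumulation, the idea behind SME's
 * ZA accumulator: C (M x N) += A (M x K) * B (K x N), computed as a sum
 * of K rank-1 (outer-product) updates. Each element of A and B loaded
 * here is reused across a whole row or column of the output tile.
 * Plain C for illustration only, not SME instructions or intrinsics. */
void matmul_outer_product(float *c, const float *a, const float *b,
                          size_t m, size_t n, size_t k)
{
    for (size_t p = 0; p < k; ++p) {              /* one outer product per step */
        for (size_t i = 0; i < m; ++i) {
            float a_ip = a[i * k + p];            /* element from column p of A */
            for (size_t j = 0; j < n; ++j) {
                c[i * n + j] += a_ip * b[p * n + j];  /* row p of B */
            }
        }
    }
}
```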

SME2 builds on SME by adding multi-vector instructions that allow the architectural state (the ZA array) to be reused for both matrix and vector operations, along with higher-throughput vector processing capabilities. This balances vector and matrix acceleration while compressing AI data formats to reduce memory bandwidth and save power. SME2 also enables flexible on-the-fly dequantization and the ability to decompress 2-bit and 4-bit weights to save memory bandwidth. These capabilities matter for generative AI workloads that are increasingly complex and power-hungry, and they support Arm’s wider commitment to tackling AI’s insatiable energy needs.
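
As an informal illustration of what decompressing low-bit weights involves (not the architectural encoding or any SME2 instruction), the C sketch below unpacks two 4-bit weights from each byte and scales them back to floating point; the packing order, zero point and single scale factor are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch of dequantizing packed 4-bit weights (two per byte)
 * back to float. The nibble order, zero point and per-tensor scale are
 * assumptions for this example; real models often use per-block scales,
 * and SME2 performs this kind of expansion in registers rather than in C. */
void dequant_u4_to_f32(float *out, const uint8_t *packed,
                       size_t count, float scale, int zero_point)
{
    for (size_t i = 0; i < count; ++i) {
        uint8_t byte = packed[i / 2];
        /* assumed layout: low nibble first, then high nibble */
        int q = (i % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
        out[i] = scale * (float)(q - zero_point);
    }
}
```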

What are the key use cases of SME and SME2?

SME accelerates various types of AI and ML workloads, such as generative AI and classical ML networks, as well as computer vision (CV). This is possible because SME can handle matrix-by-matrix, matrix-by-vector and multi-vector operations, as well as the pre- and post-processing stages required for ML execution. We anticipate that SME will benefit a variety of different AI use cases in different markets, including:

  • Applications that combine ML and classical CV/DSP approaches, for example cinematic photography, media processing, driver monitoring, digital cockpit, audio processing, advanced driver assistance systems (ADAS) at L2+ and real-time voice assistants.
  • Use cases that make use of small language models and LLMs, for example chatbots, conversation summarization and virtual assistants.

Explaining vector processing, matrix processing and quantization

To understand how SME operates, it is important to explain the different AI-based processing techniques that it enables and the benefits that SME and the Armv9 architecture provide each individual technique. These include:

  • Vector processing;
  • Matrix processing;
  • Matrix multiplications; and
  • Quantization.

What is vector processing?

In the context of AI and ML, vectors represent one-dimensional arrays of values and data points that typically encode features, inputs, or weights in neural networks. Vector processing is commonly used in modern AI frameworks and libraries like TensorFlow and PyTorch. By leveraging this approach, AI algorithms can efficiently handle complex computations and process large datasets more quickly, leading to faster training times and improved performance. SME includes vector instructions that perform calculations on multiple values in parallel, instead of processing each value sequentially, to significantly speed up many aspects of AI computation.
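
As a rough mental model (plain C, not SVE2 or SME instructions), the dot product below is written so that each loop iteration handles four lanes at once, mirroring how a vector unit applies a single instruction to multiple elements in parallel; a compiler targeting the Arm vector extensions would map this kind of loop onto vector instructions whose width follows the hardware’s vector length.

```c
#include <stddef.h>

/* Sketch of the vector-processing idea: a dot product where each loop
 * iteration handles four "lanes" at once, mirroring how a vector unit
 * applies one instruction to multiple elements in parallel. Plain C for
 * illustration; assumes n is a multiple of 4 for brevity. */
float dot_product(const float *x, const float *y, size_t n)
{
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (size_t i = 0; i < n; i += 4) {
        acc[0] += x[i + 0] * y[i + 0];
        acc[1] += x[i + 1] * y[i + 1];
        acc[2] += x[i + 2] * y[i + 2];
        acc[3] += x[i + 3] * y[i + 3];
    }
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```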

What is matrix processing?

Matrices, which are two-dimensional arrays of values and data points, play a crucial role in various AI techniques, including ML and deep learning. Matrix processing through SME involves performing operations on these matrices to improve the performance and efficiency of core AI-based workloads including linear algebra operations, like matrix multiplication, and neural networks.
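
To make this concrete, the sketch below computes one fully connected neural-network layer as a matrix-vector product plus a bias vector; the layer shape and names are illustrative assumptions rather than any particular framework’s API.

```c
#include <stddef.h>

/* Illustrative dense (fully connected) layer: out = W * in + bias, where
 * W is (out_features x in_features), stored row-major. This matrix-vector
 * product is the kind of operation that matrix processing, and SME,
 * accelerates. Names and shapes are example assumptions only. */
void dense_layer(float *out, const float *w, const float *in,
                 const float *bias, size_t out_features, size_t in_features)
{
    for (size_t i = 0; i < out_features; ++i) {
        float sum = bias[i];
        for (size_t j = 0; j < in_features; ++j) {
            sum += w[i * in_features + j] * in[j];
        }
        out[i] = sum;
    }
}
```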

Where are matrix multiplications present? And what do they improve?

Matrix multiplications are an important part of AI and ML-based workloads, as well as other computing workloads, such as scientific simulations and CV. The matrix-matrix multiply operation is becoming increasingly important for AI acceleration on CPUs and benefits significantly from SME. The Arm architecture has evolved over time, gaining features that improve the performance and efficiency of these operations. For example:

  • Armv7 added the Advanced SIMD Extension, which is also known as the Arm NEON™ instructions.
  • Armv8.4-A includes support for 8-bit integer DOT product instructions.
  • Armv8.6-A includes support for in-vector integer and floating-point matrix-multiply instructions for various data types as well as including the new BFloat16 data type.
  • Armv9.0-A includes the Scalable Vector Extension 2 (SVE2) for digital signal processing (DSP), media and general-purpose vectorization.
  • Armv9.2-A introduces SME.

What is quantization?

Quantization involves reducing the precision of numerical values, typically from a floating-point representation to a fixed-point representation. The process is used with SME to make AI and ML models more efficient by reducing their memory footprint, memory bandwidth and computational complexity, which is important for compute-intensive generative AI workloads. This means they can be deployed on resource-constrained devices, such as smartphones and other mobile devices, embedded systems, and IoT devices.
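
A minimal sketch of the idea, assuming symmetric per-tensor quantization of 32-bit floats to 8-bit integers (one common scheme among several):

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Symmetric per-tensor quantization of float32 values to int8. The scale
 * maps the largest absolute value onto the int8 range, so each value is
 * stored in 8 bits instead of 32, cutting memory footprint and bandwidth
 * at the cost of some precision. One common scheme, shown for illustration. */
void quantize_f32_to_i8(int8_t *out, float *scale_out,
                        const float *in, size_t n)
{
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float a = fabsf(in[i]);
        if (a > max_abs)
            max_abs = a;
    }
    float scale = (max_abs > 0.0f) ? (max_abs / 127.0f) : 1.0f;
    for (size_t i = 0; i < n; ++i) {
        float q = roundf(in[i] / scale);
        if (q > 127.0f) q = 127.0f;
        if (q < -127.0f) q = -127.0f;
        out[i] = (int8_t)q;   /* dequantize later as approximately out[i] * scale */
    }
    *scale_out = scale;
}
```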

How long has Arm been adding AI-based features to the architecture?

Arm has been working on adding AI-based features, specifications, and instructions to our architecture for the past two decades. The Armv7 architecture, which was first released in 2003, added the Advanced SIMD Extension, also known as the Arm NEON™ instructions. NEON™ treats registers as one-dimensional vectors of elements of the same data type, with instructions operating on multiple elements simultaneously. The Armv8 architecture then added a range of AI-based specifications and instructions, including dot product instructions, in-vector matrix multiply instructions and BFloat16 support. It also improved the Advanced SIMD Extension by doubling the number of vector registers and adding double-precision floating-point support. All these improvements and additions were designed to accelerate AI and ML performance in response to evolving AI workloads. The Armv9 architecture incorporates all these features, specifications and instructions, alongside SVE2, SME and the new SME2.

What are the core benefits of SME?

SME on the Armv9 architecture significantly improves the processing of existing AI and ML workloads on the Arm CPU, leading to faster, more responsive user experiences across various AI-powered devices and applications. It also accelerates a range of applications that use matrix arithmetic, such as DSP, scientific computing, augmented reality (AR), virtual reality (VR) and imaging, in all of which AI and ML play an increasingly important role.

Just as Arm CPUs can run a wide variety of neural networks in many different data formats, SME provides the flexibility to respond to evolving AI and ML workloads and requirements that are growing in complexity. This ensures that the Arm architecture will remain relevant for the most important compute workloads in the fast-moving age of AI and beyond. Looking ahead, we will continue to add more AI capabilities to the instruction set for the benefit of Arm’s industry-leading ecosystem, so our partners can deliver improved performance, innovative features and scalability for their AI-based solutions.

AI-based architecture innovation on Arm

SME exemplifies Arm’s continuous architectural innovation. As AI continues to evolve and grow, SME will ensure that new power-hungry generative AI workloads are processed efficiently on Arm CPUs, leading to better AI-based experiences across the billions of Arm-powered devices. This will ensure that the world’s AI continues to be built on Arm.

Learn more about SME in this Arm Community blog and also how to apply SME in this Programmers Guide.


