Why the Right Software Approach is Vital to AI Innovation

As Mark Hambleton, Arm’s SVP of Software, says in the Arm Silicon Reimagined report: “The future of AI development relies on the synergy between software and hardware.”
However, the big challenge, as set out in a new Arm-sponsored CIO report, is that developer workflows are often fragmented. This means developers are unable to move as quickly as they would like when creating and scaling new AI applications.
At Arm, we recognize the importance of software in fulfilling AI’s true potential. We work from the foundational architecture and across the stack to simplify AI development and enable seamless performance acceleration for new AI applications and workloads.
How Armv9 Architecture Accelerates AI and ML Workloads
Arm continuously evolves its architecture, which acts as the interface between hardware and software. Today, the Armv9 architecture is the modern technology foundation for markets from cloud to edge, including smartphones, datacenters, high-performance computing, and automotive applications.
Arm updates the architecture with new features in each generation; recent additions include the Scalable Matrix Extension (SME) and Scalable Vector Extension 2 (SVE2), which are critical for accelerating generative AI and common machine learning (ML) workloads across all applications. SME brings advanced matrix processing capabilities into the common instruction set, allowing developers to achieve strong performance in their AI applications and then seamlessly migrate across ecosystems. This creates greater possibilities to run more AI workloads on more hardware, while offering an improved user experience.
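To make the matrix-processing idea concrete, here is a minimal, illustrative sketch in plain Python of the kind of kernel SME is designed to accelerate: a matrix multiply expressed as a sum of outer products, the formulation that maps naturally onto SME's tile-based outer-product instructions. The code itself is ordinary Python for clarity; on Armv9 hardware, it is compilers and optimized libraries that emit the actual SME/SVE2 instructions for this pattern.

```python
def matmul_outer_product(A, B):
    """Multiply A (m x k) by B (k x n) as a sum of k rank-1 outer
    products -- the accumulation pattern that SME's outer-product
    instructions compute in hardware over a tile accumulator."""
    m, k, n = len(A), len(B), len(B[0])
    # Accumulator, analogous in spirit to SME's tile storage.
    C = [[0.0] * n for _ in range(m)]
    for p in range(k):
        # One outer product: column p of A with row p of B.
        for i in range(m):
            for j in range(n):
                C[i][j] += A[i][p] * B[p][j]
    return C

# Small demo: (2x3) @ (3x2)
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
B = [[7.0,  8.0],
     [9.0, 10.0],
     [11.0, 12.0]]
print(matmul_outer_product(A, B))  # [[58.0, 64.0], [139.0, 154.0]]
```

The point of the outer-product formulation is that each step touches one column of A and one row of B, updating the whole accumulator at once, which is exactly the shape of work a matrix engine can parallelize.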
Why the CPU Remains the Preferred Platform for AI Development
These architectural features are built into Arm CPUs, which have emerged as the target platform of choice for software developers. CPUs are widely adopted from cloud to edge, and they serve as an immediate target for most AI inference workloads, which already run across billions of devices, from today’s smartphones to cloud datacenters worldwide. By targeting the CPU, developers can run a broader range of software, in a greater variety of data formats, without building multiple versions of their code for specialist NPUs.
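As an illustration of the "variety of data formats" point, the sketch below shows symmetric int8 quantization, one common way model weights are shrunk from 32-bit floats so inference runs efficiently on general-purpose CPUs. This is plain Python with hypothetical function names, not an Arm API; it only demonstrates the format conversion itself.

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats in [-max_abs, max_abs]
    onto integers in [-127, 127] using a single scale factor."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [v * scale for v in q]

weights = [0.52, -1.3, 0.07, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# q holds small integers; `restored` approximates the originals
# to within half a quantization step (scale / 2).
```

A CPU that supports narrow integer arithmetic can process such int8 tensors directly, which is part of why one CPU target can serve models stored in many different formats.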
CPUs offer developers the consistency that they value, avoiding the fragmentation and inefficiencies associated with bespoke hardware solutions. As Hambleton noted in the Silicon Reimagined report: “Interoperability across AI frameworks is a critical concern for developers. This is why developers frequently default to CPU back-ends, as their ubiquity ensures broader compatibility.”
There are other factors beyond architectural advancement that are helping to scale AI workloads. In the CIO report, Nick Horne, Arm’s VP of ML Engineering, states that AI has evolved from requiring enormous models in the cloud to smaller, more efficient models that can run at the edge – on devices. He says: “Now you can get excellent models that provide great results running on the device in your pocket, and in some cases entirely on the CPU.”
How Arm’s Open Source Collaboration Empowers AI Developers
Arm works extensively with the open source community to democratize AI and create opportunities for developers to easily access the latest architectural features and performance across hardware from a broad range of Arm ecosystem partners.
Horne highlights the benefits of this approach to developers in the CIO report. He says: “Working with open source AI frameworks with good hardware abstraction minimizes the loss of flexibility.” This helps developers to avoid being tied down to a specific piece of hardware, cloud service provider or software platform.
What Is Arm Kleidi and How Does It Accelerate AI Workloads?
Arm Kleidi is a great example of these benefits in action. Kleidi includes developer enablement technologies, resources and micro-kernel libraries providing effortless AI workload acceleration for models running on Arm CPUs. As Kleidi libraries are integrated into the most popular open source AI frameworks and runtimes, including MediaPipe from Google, ExecuTorch and PyTorch from Meta, and llama.cpp, these performance optimizations require no additional work from developers, saving time, effort and costs. Kleidi is now integrated across all markets that Arm covers, including mobile, cloud, datacenter, automotive, and IoT.
How Arm’s Ecosystem Partnerships Enable Scalable AI Deployment
On a broader level, Arm works across our industry-leading software ecosystem with various partners to deploy AI securely and safely at scale. For example, our partnership with GitHub on GitHub runners enables developers to test and deploy trained models more efficiently in the cloud. More recently, the Arm extension for GitHub Copilot offers developers a fully integrated, native Arm workflow, with accurate code generation, test case creation and bug fixing.
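As a sketch of what targeting Arm in CI can look like, the hypothetical GitHub Actions workflow below runs a project's tests on an Arm-based hosted runner. The runner label shown follows GitHub's publicly documented Arm runner images; the job steps and file names are illustrative assumptions, and you should check current GitHub documentation for the labels available to your plan.

```yaml
# Illustrative workflow: run tests on an Arm64 GitHub-hosted runner.
name: arm-ci
on: [push]
jobs:
  test:
    runs-on: ubuntu-24.04-arm   # Arm64 (aarch64) hosted runner label
    steps:
      - uses: actions/checkout@v4
      - run: uname -m           # expect "aarch64" on an Arm runner
      - run: python -m pip install -r requirements.txt
      - run: python -m pytest
```

Because the build and test steps are the same as on any other runner, switching CI to Arm is largely a matter of changing the `runs-on` label.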
Arm is also committed to frictionless software development through various initiatives that simplify and accelerate the deployment of low-level software and firmware. Initiatives such as Linaro OneLab, Trusted Firmware and PSA Certified foster collaboration and provide blueprints for secure software deployment and support in the rapidly advancing spheres of edge AI and high-performance IoT. In the automotive industry, the Arm-founded SOAFEE (Scalable Open Architecture for Embedded Edge) initiative delivers a standards-based framework for software re-use at scale to accelerate development cycles. This supports the unprecedented demand for AI in software-defined vehicle (SDV) applications, while enhancing driver experiences.
Why Open Standards Are Key to AI Innovation
Finally, a lack of standardized practices can hinder innovation and create future complexity for developers. Open standards let developers and researchers transition seamlessly between platforms, and allow them to focus on the training, quantization, and deployment work that adds value to the ongoing innovation of models.
How Arm is Accelerating and Future-Proofing AI Development
For AI to reach its full potential, the software development process needs to be accelerated, open and streamlined. Arm’s technologies and supporting ecosystem help to future-proof AI development by focusing on open standards, hardware abstraction and compatibility with evolving frameworks. This approach allows developers to seamlessly create and deploy their AI applications, models and workloads at scale across diverse hardware with enhanced performance, building better software on Arm for the age of AI.
Takeaways
- SME and SVE2 are Armv9 architecture extensions designed to accelerate generative AI and ML workloads across cloud, edge, and device environments.
- SME enables efficient matrix processing, allowing developers to achieve strong AI performance and easily migrate workloads across different hardware ecosystems.
- SME2 builds on SME with higher throughput, real-time processing improvements, and better power efficiency for advanced mobile AI use cases.
Frequently Asked Questions
What are SME and SVE2 in the Armv9 architecture?
They are extensions that boost AI and ML performance by accelerating matrix and vector operations directly on the CPU.
How does SME improve AI application performance?
SME delivers scalable matrix processing and efficient memory use, enabling AI workloads to run smoothly across different hardware platforms.
What advantages does SME2 add over SME?
SME2 increases throughput, improves real-time AI task performance, and enhances energy efficiency for mobile and edge devices.