Arm Newsroom Blog

From Cloud to Edge, Why Arm is Built for Scaling Your AI Stack 

By Arm Editorial Team

As AI rapidly expands across datacenters, devices, and everything in between, the real challenge isn’t building intelligent computing; it’s building the infrastructure required to scale it.

AI is not a single-layer problem; it’s a sprawling ecosystem shaped by the world’s biggest technology leaders. Across this trillion‑dollar transformation, one architecture keeps emerging: Arm.

AI Datacenters Defined by Industry Giants, Powered by Arm 

Alongside Arm, the biggest names in AI—NVIDIA, AWS, Microsoft, Google, Oracle, and OpenAI—are collectively driving next‑generation datacenter buildouts. Estimates place AI infrastructure investments in the trillions of dollars, fueled by demands for training, inference, and cost‑efficient scale. 

By 2025, half of the compute shipped to top hyperscalers is projected to be Arm‑based. AWS (Graviton), Google Cloud (Axion), and Microsoft Azure (Cobalt) all now deploy Arm-based chips for cloud infrastructure, enabling significant energy and cost savings as well as scalability. NVIDIA’s Grace CPU, built on Arm Neoverse, anchors its Grace Blackwell AI superchip, which has seen 3.6 million units ordered by the top four U.S. hyperscale cloud providers alone. In fact, over 1 billion Arm Neoverse CPUs have now been shipped into datacenters, underscoring the architecture’s central role in this global buildout.

Across the most advanced AI datacenter stacks, Arm is the common denominator, enabling scalability, efficiency, and adaptable performance where older architectures fall short. 

In fact, Arm delivers unmatched price-performance and power-efficiency: 

  • NVIDIA’s Grace‑Hopper Superchip yields up to 8X faster model training and 4.5X LLM inference performance versus x86 systems¹. 
  • Google’s Axion offers up to 3X better recommender performance², 2.5X higher inference performance, and 64% cost savings compared to x86³.
  • As of December 2024, over 50% of EC2 capacity is built on AWS Graviton⁴. 

Moreover, recent analysis from the consulting firm Signal65 shows that the Arm Neoverse-based AWS Graviton4 chips are not only leading the competition on price-performance, but also significantly outpacing comparable x86 offerings from AMD and Intel on overall performance across enterprise workloads. For example, Signal65’s benchmarking tests showed Graviton4 delivering up to 168% better large language model (LLM) inference performance and 220% higher price-performance than AMD, while also beating Intel in networking throughput by 53% and in machine learning (ML) training speed by 34%. These results underscore Arm’s architectural advantage across both AI and general compute tasks.
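To make percentages like these easier to compare, the short Python sketch below converts an "N% better" figure into a multiplier and computes price-performance as throughput divided by hourly price. It assumes the usual reading in which "168% better" means 2.68 times the baseline; the article does not spell out Signal65's exact methodology. The function names and the throughput and price inputs are hypothetical, chosen only for illustration, and are not benchmark data.

```python
def percent_better_to_multiplier(percent_better: float) -> float:
    """Convert an 'N% better' claim into a 'times the baseline' multiplier."""
    return 1.0 + percent_better / 100.0


def price_performance(throughput: float, hourly_price: float) -> float:
    """Work delivered per dollar: throughput divided by hourly price."""
    if hourly_price <= 0:
        raise ValueError("hourly_price must be positive")
    return throughput / hourly_price


# Percentages quoted in the paragraph above, expressed as multipliers.
llm_inference = percent_better_to_multiplier(168)   # vs. AMD
price_perf = percent_better_to_multiplier(220)      # vs. AMD
networking = percent_better_to_multiplier(53)       # vs. Intel
ml_training = percent_better_to_multiplier(34)      # vs. Intel

# Hypothetical instances (made-up numbers, not from any benchmark):
# the candidate is both faster and cheaper than the baseline.
baseline = price_performance(throughput=100.0, hourly_price=2.0)
candidate = price_performance(throughput=150.0, hourly_price=1.5)
ratio = candidate / baseline

print(f"LLM inference: {llm_inference:.2f}x, price-performance: {price_perf:.2f}x")
print(f"Hypothetical price-performance ratio: {ratio:.2f}x")
```

The hypothetical pair shows why price-performance can improve faster than raw performance: a 1.5x throughput gain combined with a lower hourly price compounds into a larger per-dollar gain.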

AI from Cloud to Edge Needs New Compute

AI isn’t confined to datacenters; it’s expanding outward. Smartphones, PCs, and IoT devices – from low-power sensors to high-performance industrial applications – now demand on‑device generative AI, reshaping user experiences.

Arm is uniquely positioned here, too. The new Arm Lumex Compute Subsystem (CSS) platform for consumer devices unlocks real-time on-device AI use cases like assistants, voice translation, and personalization, with the new SME2-enabled Arm CPUs delivering up to 5x faster AI performance. Meanwhile, the world’s first Armv9 edge AI platform, which is optimized for edge AI workloads across IoT applications, enables on-device AI models of over one billion parameters.

Arm is powering a cloud-to-edge revolution, and it’s built to scale across that continuum. 

Software as the Differentiator, Arm Tools for an AI Era

In AI, hardware provides the foundation, but it’s software that defines the experience. As AI workloads scale in complexity and reach, developers need an ecosystem that can move as fast as their ambitions. This is where Arm’s unique advantage shines: a unified architecture supported by a robust, optimized software ecosystem that spans from cloud to edge. 

Arm’s massive developer base—now 22 million strong—benefits from an ecosystem where the same code, tools, and frameworks run seamlessly across devices, whether it’s datacenter-scale model training or real-time inference at the edge. This architectural consistency enables faster development, streamlined optimization, and wider deployment without redundant engineering effort. 

Key frameworks like PyTorch ExecuTorch, TensorFlow Lite, and MediaPipe are now deeply integrated and optimized for Arm-based systems via Arm KleidiAI, a lightweight, open-source optimization layer that activates Arm-optimized microkernels under the hood. That means developers can tap into performance enhancements automatically without modifying code, across everything from hyperscale cloud platforms to smartphones and embedded devices. 

For example, on Graviton4, KleidiAI delivers up to 2.5x faster time-to-first-token for Llama 3 compared with the baseline, while mobile implementations leveraging MediaPipe see performance boosts of up to 30% on models like Gemma 2B. Whether managing AI factories or deploying chatbots at the edge, the software experience is predictable, performant, and power efficient.
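For readers who want to check a time-to-first-token claim on their own hardware, here is a minimal Python sketch of how the metric is commonly measured: start a timer, request a token stream, and record when the first token arrives. The `fake_stream` generator is a hypothetical stand-in that only sleeps to mimic prefill and decode; it is not a KleidiAI, Llama, or framework API, and would be replaced by whatever streaming call your inference stack provides.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_token(stream: Iterable[str]) -> Tuple[float, float, int]:
    """Return (ttft_seconds, total_seconds, token_count) for a token stream."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in stream:
        if ttft is None:
            # First token has arrived: this interval covers prompt processing.
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    if ttft is None:
        raise ValueError("stream produced no tokens")
    return ttft, total, count


def fake_stream(prefill_s: float, per_token_s: float, n: int) -> Iterator[str]:
    """Hypothetical model stand-in: sleeps for prefill, then between tokens."""
    time.sleep(prefill_s)
    for i in range(n):
        if i:
            time.sleep(per_token_s)
        yield f"tok{i}"


def speedup(baseline_s: float, optimized_s: float) -> float:
    """How many times faster the optimized run is than the baseline run."""
    return baseline_s / optimized_s


ttft, total, count = time_to_first_token(fake_stream(0.05, 0.01, 5))
print(f"TTFT {ttft * 1000:.0f} ms, total {total * 1000:.0f} ms, {count} tokens")
```

Running the same measurement on a baseline build and an optimized build, then passing the two TTFT values to `speedup`, gives the kind of "up to Nx faster" ratio quoted above. Real comparisons should average several runs and keep the prompt, model, and instance type fixed.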

This kind of seamless, system-aware software enablement is what differentiates Arm’s approach. Developers aren’t left navigating fragmented stacks or doing backend rework. Instead, they inherit the benefits of an ecosystem that’s co-designed, hardware and software together, for AI performance and efficiency. 

In the AI era, where performance-per-watt is everything, Arm’s software ecosystem isn’t just keeping up; it’s meeting developers where they are and accelerating innovation.

The Backbone of AI at Scale 

AI is being forged at an unprecedented scale, from trillion-dollar datacenters to next-gen smartphones and in-vehicle systems. The architecture bridging these worlds is Arm. 

With hyperscaler adoption, flexible edge compute, and a vibrant, AI-ready software ecosystem, Arm stands as the backbone of AI infrastructure, today and tomorrow. 

Ready to learn more? Explore how Arm is powering the AI era at scale.

References 

¹ NVIDIA GH200 Grace Hopper Superchip Architecture 

² Unpacking Axion: Google Cloud’s Custom Arm-based Processor Built for the AI age 

³ Harness the Power of Retrieval-Augmented Generation with Arm Neoverse-powered Google Axion Processors 

⁴ AWS re:Invent 2024 – Monday Night Live with Peter DeSantis 


Any re-use permitted for informational and non-commercial or personal use only.
