Arm Newsroom Blog

From Cloud to Edge: How Arm is powering AI at scale

By Arm Editorial Team

Summary: What does it take to scale AI from the cloud to the edge? Arm makes it possible by enabling consistent AI development across datacenters, edge infrastructure, and devices. A unified architecture and a mature software ecosystem help developers deploy AI efficiently, wherever it runs.

As AI rapidly expands across datacenters, devices, and everything in between, the real challenge isn’t building intelligent computing; it’s building the infrastructure required to scale it.

AI is not a single-layer problem; it’s a sprawling ecosystem shaped by the world’s biggest technology leaders. Across this trillion-dollar transformation, one architecture keeps emerging: Arm.

How industry leaders are building AI datacenters on Arm

Alongside Arm, the biggest names in AI, including NVIDIA, AWS, Microsoft, Google, Oracle, and OpenAI, are collectively driving next-generation datacenter buildouts. Estimates place AI infrastructure investments in the trillions of dollars, fueled by demand for training, inference, and cost-efficient scale.

By 2025, half of the compute shipped to top hyperscalers is projected to be Arm-based. AWS (Graviton), Google Cloud (Axion), and Microsoft Azure (Cobalt) all now deploy Arm-based chips for cloud infrastructure, enabling significant energy and cost savings as well as scalability. NVIDIA’s Grace CPU, built on Arm Neoverse, anchors its Grace Blackwell AI superchip, which has seen 3.6 million units ordered by the top four U.S. hyperscale cloud providers alone. In fact, over 1 billion Arm Neoverse CPUs have now been shipped into datacenters, underscoring the architecture’s central role in this global buildout.

Across the most advanced AI datacenter stacks, Arm is the common denominator, enabling scalability, efficiency, and adaptable performance where older architectures fall short. 

In fact, Arm delivers unmatched price-performance and power-efficiency: 

  • NVIDIA’s Grace Hopper Superchip yields up to 8x faster model training and 4.5x faster LLM inference versus x86 systems¹.
  • Google’s Axion offers up to 3x better recommender performance², 2.5x higher inference throughput, and 64% cost savings compared to x86³.
  • As of December 2024, over 50% of EC2 capacity is built on AWS Graviton⁴. 

Moreover, recent analysis from the consulting firm Signal65 shows that Arm Neoverse-based AWS Graviton4 chips are not only leading the competition on price-performance but significantly outpacing comparable x86 offerings from AMD and Intel on overall performance across enterprise workloads. For example, Signal65’s benchmarking showed Graviton4 delivering up to 168% better large language model (LLM) inference performance and 220% higher price-performance than AMD, while also beating Intel in networking throughput by 53% and machine learning (ML) training speeds by 34%. These results underscore Arm’s architectural advantage across both AI and general compute tasks.
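As a point of method, price-performance in studies like this is simply work delivered per dollar. The sketch below shows the arithmetic with made-up numbers; the throughput figures and hourly prices are placeholders for illustration, not benchmark data.

    # Illustrative arithmetic only: how a price-performance comparison is
    # typically computed. All numbers below are made-up placeholders.
    def price_performance(tokens_per_s: float, usd_per_hour: float) -> float:
        """Work delivered per dollar: tokens/second per $/hour."""
        return tokens_per_s / usd_per_hour

    arm = price_performance(tokens_per_s=120.0, usd_per_hour=1.00)
    x86 = price_performance(tokens_per_s=100.0, usd_per_hour=1.60)

    # Relative advantage, expressed the way the blog quotes it ("X% higher").
    print(f"Price-performance advantage: {(arm / x86 - 1) * 100:.0f}%")  # 92%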

Why scaling AI from cloud to edge requires new compute architectures

AI isn’t confined to datacenters; it’s expanding outward. Smartphones, PCs, and IoT devices – from low-power sensors to high-performance industrial applications – now demand on-device generative AI, reshaping user experiences.

Arm is uniquely positioned here, too. The new Arm Lumex Compute Subsystem (CSS) platform for consumer devices unlocks real-time, on-device AI use cases like assistants, voice translation, and personalization, with new SME2-enabled Arm CPUs delivering up to 5x faster AI performance. Meanwhile, the world’s first Armv9 edge AI platform, optimized for edge AI workloads across IoT applications, enables on-device AI models of more than one billion parameters.
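For developers curious what “SME2-enabled” means in practice, here is a minimal sketch of runtime feature detection on Arm Linux. It assumes a kernel recent enough to report the sme2 flag on the Features line of /proc/cpuinfo; the flag name follows the upstream Linux convention, an assumption worth verifying on your target.

    # Hedged sketch: detect the SME2 feature on an Arm Linux device before
    # enabling an optimized inference path. Assumes the kernel reports the
    # 'sme2' flag in /proc/cpuinfo (upstream naming; verify on your target).
    def cpu_has_feature(flag: str) -> bool:
        try:
            with open("/proc/cpuinfo") as f:
                return any(
                    flag in line.split()
                    for line in f
                    if line.startswith("Features")
                )
        except OSError:
            return False  # no /proc/cpuinfo on non-Linux platforms

    if cpu_has_feature("sme2"):
        print("SME2 available: matrix-heavy AI kernels can use the fast path")
    else:
        print("SME2 not reported: fall back to NEON/SVE code paths")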

Arm is powering a cloud-to-edge revolution, and it’s built to scale across that continuum. 

Why software is the real differentiator in scaling AI

In AI, hardware provides the foundation, but it’s software that defines the experience. As AI workloads scale in complexity and reach, developers need an ecosystem that can move as fast as their ambitions. This is where Arm’s unique advantage shines: a unified architecture supported by a robust, optimized software ecosystem that spans from cloud to edge. 

Arm’s massive developer base, now 22 million strong, benefits from an ecosystem where the same code, tools, and frameworks run seamlessly across devices, whether it’s datacenter-scale model training or real-time inference at the edge. This architectural consistency enables faster development, streamlined optimization, and wider deployment without redundant engineering effort.
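As a minimal illustration of that consistency, the sketch below is an ordinary PyTorch script with nothing Arm-specific in it; the same file runs unchanged on an Arm-based cloud instance or an Arm laptop. The toy model is a stand-in, not a real workload.

    # A deliberately tiny portability sketch: this exact script runs
    # unchanged on an Arm cloud instance (reported as 'aarch64') and on an
    # Arm laptop ('arm64' on macOS). The model is a toy stand-in.
    import platform

    import torch
    import torch.nn as nn

    print("machine:", platform.machine())  # e.g. 'aarch64' or 'arm64'

    model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()

    with torch.inference_mode():
        logits = model(torch.randn(1, 512))

    print("output shape:", tuple(logits.shape))  # (1, 10)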

Key frameworks like PyTorch ExecuTorch, TensorFlow Lite, and MediaPipe are now deeply integrated and optimized for Arm-based systems via Arm KleidiAI, a lightweight, open-source optimization layer that activates Arm-optimized microkernels under the hood. That means developers can tap into performance enhancements automatically without modifying code, across everything from hyperscale cloud platforms to smartphones and embedded devices. 
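To make the “no code changes” point concrete, here is a standard TensorFlow Lite invocation. Assuming an Arm build whose XNNPACK delegate includes the KleidiAI micro-kernels, the optimized paths are picked up automatically; model.tflite is a placeholder for any converted model.

    # Nothing here is Arm-specific. On Arm builds where the default XNNPACK
    # delegate includes KleidiAI micro-kernels, the fast paths activate
    # automatically. 'model.tflite' is a placeholder for a converted model.
    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path="model.tflite")
    interpreter.allocate_tensors()

    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
    interpreter.invoke()

    print("output shape:", interpreter.get_tensor(out["index"]).shape)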

For example, on Graviton4, KleidiAI enables time-to-first-token for Llama 3 to run up to 2.5x faster than baseline, while mobile implementations leveraging MediaPipe see performance boosts of up to 30% on models like Gemma 2B. Whether managing AI factories or deploying chatbots at the edge, the software experience is predictable, performant, and power-efficient.
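For readers who want to sanity-check a time-to-first-token figure like the one above, the sketch below shows one hedged way to approximate it with Hugging Face transformers. The model ID is a placeholder that requires download and access, and the streamer yields decoded text, so this measures TTFT at the application level rather than the kernel level.

    # One way to approximate time-to-first-token (TTFT). The model ID is a
    # placeholder requiring access/download; any causal LM works the same way.
    import time
    from threading import Thread

    from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

    model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder model
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tok("Explain cloud-to-edge AI in one sentence.", return_tensors="pt")
    streamer = TextIteratorStreamer(tok, skip_prompt=True)

    start = time.perf_counter()
    Thread(target=model.generate,
           kwargs=dict(inputs, streamer=streamer, max_new_tokens=64)).start()

    first_chunk = next(iter(streamer))  # blocks until the first generated token
    print(f"TTFT: {time.perf_counter() - start:.2f}s; first chunk: {first_chunk!r}")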

This kind of seamless, system-aware software enablement is what differentiates Arm’s approach. Developers aren’t left navigating fragmented stacks or doing backend rework. Instead, they inherit the benefits of an ecosystem that’s co-designed, hardware and software together, for AI performance and efficiency. 

In the AI era, where performance-per-watt is everything, Arm’s software ecosystem isn’t just keeping up; it’s meeting developers where they are and accelerating innovation.

Where cloud-to-edge AI is being deployed today

AI is being forged at an unprecedented scale, from trillion-dollar datacenters to next-gen smartphones and in-vehicle systems. The architecture bridging these worlds is Arm. 

With hyperscaler adoption, flexible edge compute, and a vibrant, AI-ready software ecosystem, Arm stands as the backbone of AI infrastructure, today and tomorrow. 

Ready to learn more? Explore how Arm is powering the AI era at scale.

Frequently asked questions

Q: What is cloud to edge AI?

A: It refers to AI workloads that start in cloud datacenters and extend out to edge devices, placing compute close to where data is collected.

Q: How does Arm support scalable AI infrastructure?

A: Arm architecture powers hyperscaler cloud chips and efficient edge processors, with consistent software tooling to scale AI workloads end to end.

Q: What AI developer frameworks does Arm support?

A: Arm supports PyTorch (via ExecuTorch), TensorFlow Lite, MediaPipe, and optimized libraries via KleidiAI.

Q: Why build on Arm for edge AI?

A: Running AI on Arm-based edge devices minimizes latency, improves privacy, and reduces energy use compared with cloud-only deployments.

References 

¹ NVIDIA GH200 Grace Hopper Superchip Architecture

² Unpacking Axion: Google Cloud’s Custom Arm-based Processor Built for the AI age

³ Harness the Power of Retrieval-Augmented Generation with Arm Neoverse-powered Google Axion Processors

⁴ AWS re:Invent 2024 – Monday Night Live with Peter DeSantis 
