Summary: What does it take to scale AI from the cloud to the edge? Arm makes it possible by enabling consistent AI development across datacenters, edge infrastructure, and devices. A unified architecture and a mature software ecosystem help developers deploy AI efficiently, wherever it runs.
As AI rapidly expands across datacenters, devices, and everything in between, the real challenge isn’t building intelligent computing; it’s building the infrastructure required to scale it.
AI is not a single-layer problem; it’s a sprawling ecosystem shaped by the world’s biggest technology leaders. Across this trillion-dollar transformation, one architecture keeps emerging: Arm.
How industry leaders are building AI datacenters on Arm
Alongside Arm, the biggest names in AI (NVIDIA, AWS, Microsoft, Google, Oracle, and OpenAI) are collectively driving next-generation datacenter buildouts. Estimates place AI infrastructure investments in the trillions of dollars, fueled by demands for training, inference, and cost-efficient scale.
By 2025, half of the compute shipped to top hyperscalers is projected to be Arm-based. AWS (Graviton), Google Cloud (Axion), and Microsoft Azure (Cobalt) all now deploy Arm-based chips for cloud infrastructure, enabling significant energy and cost savings as well as scalability. NVIDIA’s Grace CPU, built on Arm Neoverse, anchors its Grace Blackwell AI superchip, which has seen 3.6 million units ordered by the top four U.S. hyperscale cloud providers alone. In fact, over 1 billion Arm Neoverse CPUs have now been shipped into datacenters, underscoring the architecture’s central role in this global buildout.
Across the most advanced AI datacenter stacks, Arm is the common denominator, enabling scalability, efficiency, and adaptable performance where older architectures fall short.
Arm delivers unmatched price-performance and power efficiency:
- NVIDIA’s Grace Hopper Superchip yields up to 8x faster model training and 4.5x higher LLM inference performance versus x86 systems¹.
- Google’s Axion offers up to 3x better recommender performance², 2.5x higher inference performance, and 64% cost savings compared to x86³.
- As of December 2024, over 50% of EC2 capacity is built on AWS Graviton⁴.
Moreover, recent analysis from the consulting firm Signal65 shows that Arm Neoverse-based AWS Graviton4 chips are not only leading the competition on price-performance but also significantly outpacing comparable x86 offerings from AMD and Intel on overall performance across enterprise workloads. For example, Signal65’s benchmarking tests showed Graviton4 delivering up to 168% better large language model (LLM) inference performance and 220% higher price-performance than AMD, while also beating Intel in networking throughput by 53% and machine learning (ML) training speeds by 34%. These results underscore Arm’s architectural advantage across both AI and general compute tasks.
Why scaling AI from cloud to edge requires new compute architectures
AI isn’t confined to datacenters; it’s expanding outward. Smartphones, PCs, and IoT devices, from low-power sensors to high-performance industrial applications, now demand on-device generative AI, reshaping user experiences.
Arm is uniquely positioned here, too. The new Arm Lumex Compute Subsystem (CSS) platform for consumer devices unlocks real-time on-device AI use cases like assistants, voice translation, and personalization, with the new SME2-enabled Arm CPUs delivering up to 5x faster AI performance. Meanwhile, the world’s first Armv9 edge AI platform, which is optimized for edge AI workloads across IoT applications, enables on-device AI models of over one billion parameters.
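Curious whether a given system exposes these newer matrix extensions? On aarch64 Linux, the kernel advertises CPU feature flags that can be checked directly. The sketch below is illustrative only: flag names such as sme and sme2 depend on kernel version and silicon, and it assumes the standard /proc/cpuinfo layout.

```python
# Illustrative sketch (assumes an aarch64 Linux host): recent kernels
# advertise CPU features such as "sme" and "sme2" in /proc/cpuinfo;
# exact flag names vary by kernel version and silicon.
def arm_cpu_features(path="/proc/cpuinfo"):
    with open(path) as f:
        for line in f:
            if line.startswith("Features"):
                # Format: "Features : fp asimd sve sme sme2 ..."
                return set(line.split(":", 1)[1].split())
    return set()

features = arm_cpu_features()
print("SME advertised: ", "sme" in features)
print("SME2 advertised:", "sme2" in features)
```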
Arm is powering a cloud-to-edge revolution, and it’s built to scale across that continuum.
Why software is the real differentiator in scaling AI
In AI, hardware provides the foundation, but itβs software that defines the experience. As AI workloads scale in complexity and reach, developers need an ecosystem that can move as fast as their ambitions. This is where Arm’s unique advantage shines: a unified architecture supported by a robust, optimized software ecosystem that spans from cloud to edge.
Arm’s massive developer base, now 22 million strong, benefits from an ecosystem where the same code, tools, and frameworks run seamlessly across devices, whether it’s datacenter-scale model training or real-time inference at the edge. This architectural consistency enables faster development, streamlined optimization, and wider deployment without redundant engineering effort.
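As a minimal illustration of that consistency, the PyTorch snippet below runs unchanged on an Arm-based cloud instance or an Arm edge device; only the reported machine string differs. The model is a trivial stand-in, not any particular production workload.

```python
# Minimal portability sketch: identical PyTorch code runs unchanged on
# aarch64 cloud instances (e.g. Graviton) and Arm edge devices.
import platform
import torch

print("Machine:", platform.machine())  # 'aarch64' on 64-bit Arm Linux

model = torch.nn.Linear(128, 10)       # trivial stand-in for a real model
x = torch.randn(1, 128)
with torch.inference_mode():
    y = model(x)
print("Output shape:", tuple(y.shape))
```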
Key frameworks like PyTorch ExecuTorch, TensorFlow Lite, and MediaPipe are now deeply integrated and optimized for Arm-based systems via Arm KleidiAI, a lightweight, open-source optimization layer that activates Arm-optimized microkernels under the hood. That means developers can tap into performance enhancements automatically without modifying code, across everything from hyperscale cloud platforms to smartphones and embedded devices.
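To see what "without modifying code" means in practice, consider the generic TensorFlow Lite snippet below. Nothing in it references Arm; on supported Arm hardware, the interpreter's default XNNPACK delegate can route operations to KleidiAI-optimized kernels on its own. The model file name is a placeholder for any converted .tflite model.

```python
# Hedged sketch: a completely generic TFLite inference script. On
# supported Arm hardware, KleidiAI-optimized kernels can be picked up
# automatically via the XNNPACK delegate; no Arm-specific code needed.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")  # placeholder file
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Feed a dummy input matching the model's expected shape and dtype.
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()

print("Output shape:", interpreter.get_tensor(out["index"]).shape)
```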
For example, on Graviton4, KleidiAI improves time-to-first-token for Llama 3 by up to 2.5x over the baseline, while mobile implementations leveraging MediaPipe see performance boosts of up to 30% on models like Gemma 2B. Whether managing AI factories or deploying chatbots at the edge, the software experience is predictable, performant, and power efficient.
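Time-to-first-token is also easy to measure for yourself. The sketch below uses the third-party llama-cpp-python bindings as one convenient way to stream tokens from a locally downloaded Llama 3 GGUF file; the model path and quantization level are assumptions, and absolute numbers will vary by instance type and build options.

```python
# Hedged sketch: measuring time-to-first-token (TTFT) with the
# third-party llama-cpp-python bindings on an Arm host.
import time
from llama_cpp import Llama

# Placeholder path: any locally downloaded Llama 3 GGUF file works.
llm = Llama(model_path="llama-3-8b-instruct.Q4_0.gguf")

start = time.perf_counter()
stream = llm.create_completion(
    "Summarize cloud-to-edge AI in one sentence.",
    max_tokens=64,
    stream=True,
)
next(stream)  # blocks until the first token arrives
print(f"TTFT: {time.perf_counter() - start:.3f}s")

for _ in stream:  # drain the remaining tokens
    pass
```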
This kind of seamless, system-aware software enablement is what differentiates Arm’s approach. Developers aren’t left navigating fragmented stacks or doing backend rework. Instead, they inherit the benefits of an ecosystem that’s co-designed, hardware and software together, for AI performance and efficiency.
In the AI era, where performance-per-watt is everything, Arm’s software ecosystem isn’t just keeping up; it’s meeting developers where they are and accelerating innovation.
Where cloud-to-edge AI is being deployed today
AI is being forged at an unprecedented scale, from trillion-dollar datacenters to next-gen smartphones and in-vehicle systems. The architecture bridging these worlds is Arm.
With hyperscaler adoption, flexible edge compute, and a vibrant, AI-ready software ecosystem, Arm stands as the backbone of AI infrastructure, today and tomorrow.
Ready to learn more? Explore how Arm is powering the AI era at scale.
Frequently asked questions
Q: What is cloud to edge AI?
A: Cloud-to-edge AI refers to AI workloads that start in cloud datacenters and extend out to edge devices, placing compute closer to where data is collected.
Q: How does Arm support scalable AI infrastructure?
A: The Arm architecture powers both hyperscaler cloud chips and efficient edge processors, with consistent software tooling that lets AI workloads scale end-to-end.
Q: What AI developer frameworks does Arm support?
A: Arm supports PyTorch (via ExecuTorch), TensorFlow Lite, MediaPipe, and optimized libraries via KleidiAI.
Q: Why build on Arm for edge AI?
A: Edge AI on Arm minimizes latency, boosts privacy, and reduces energy usage compared to cloud-only deployments, while letting developers reuse the same code and tools they use in the cloud.
References
¹ NVIDIA GH200 Grace Hopper Superchip Architecture
² Unpacking Axion: Google Cloud’s Custom Arm-based Processor Built for the AI Age
⁴ AWS re:Invent 2024 – Monday Night Live with Peter DeSantis