Why system architects now default to Arm in AI data centers
For more than a decade, cloud infrastructure scaled through abstraction. Standardized servers, virtualized resources, and software layers helped smooth over hardware differences. That model worked because typical workloads could tolerate the resulting inefficiency. AI cannot: it exposes the limits of legacy architectures across power delivery, thermal and compute density, memory bandwidth, and system-level performance.
Essentially, AI has redefined what “good” infrastructure looks like. In response, platform design is shifting beyond individual chips and servers to rack-level, scale-up systems engineered to grow efficiently under power and budget constraints. This shift is happening against the backdrop of expanding inference and agentic AI workloads that run continuously and accelerate demand for dense, always-on compute.
Futurum’s report “Arm at the Center of the AI and Data Center Revolution” frames this as a shift to “system-level harmony” where the key design question is how effectively platforms orchestrate compute across accelerators, CPUs, memory, networking, and software – not simply how much raw compute can be deployed.
That is why the industry is moving toward purpose-built rack-level system design, where platforms are engineered end-to-end around AI behavior, power volatility, and sustained utilization. Increasingly, system architects are revisiting foundational compute assumptions, and choosing Arm-based architectures to address the constraints shaping modern AI platforms.
AI forces a reset toward purpose-built rack-level systems
The biggest break is not that general-purpose, commodity infrastructure cannot run AI. It is that fragmented system design shows up as real cost at AI scale.
AI workloads are tightly coupled across compute, memory, networking, storage, and software. If CPUs fall behind, expensive accelerators wait. If power consumption and cooling headroom fluctuate, utilization drops. If the data pipeline, scheduling, and orchestration are not tuned to the platform, throughput becomes unpredictable. Peak performance still matters, but stability, performance per watt, and system balance matter more.
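To make the cost of an underpowered CPU layer concrete, here is a minimal back-of-envelope sketch in Python. The latency numbers are illustrative assumptions, not measurements from any specific system:

```python
# Illustrative model of accelerator utilization when each request needs
# CPU-side work (tokenization, batching, orchestration) before the
# accelerator can run. All latency numbers are assumed, not measured.

def accelerator_utilization(cpu_ms: float, gpu_ms: float,
                            overlap: float = 0.0) -> float:
    """Fraction of wall-clock time the accelerator stays busy.

    overlap is the fraction of CPU work hidden behind accelerator compute
    by pipelining (0.0 = fully serialized, 1.0 = fully hidden).
    """
    exposed_cpu_ms = cpu_ms * (1.0 - overlap)
    return gpu_ms / (exposed_cpu_ms + gpu_ms)

# A fast CPU path (2 ms exposed) vs. a slow one (10 ms exposed)
# in front of 8 ms of accelerator compute per request:
for cpu_ms in (2.0, 10.0):
    util = accelerator_utilization(cpu_ms, gpu_ms=8.0)
    print(f"CPU {cpu_ms:>4.1f} ms/request -> accelerator busy {util:.0%}")
```

Under these assumptions, a 5x slower CPU path in front of the same accelerator cuts accelerator utilization from 80 percent to roughly 44 percent – the “expensive accelerators wait” failure mode in miniature.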
Futurum notes that hyperscalers are structurally realigning toward architectures that can deliver exponential compute growth without exponential energy consumption. It cites Arm data indicating that nearly half of the compute shipped to top hyperscalers by the end of 2025 is expected to be Arm-based.
Architects are now pushing past the “headline benchmark” questions and into those that actually decide whether an AI platform holds up when running agentic AI and continuous inferencing workloads in real-world applications. For example:
- What happens under sustained load, hour after hour?
- How do power limits and thermals reshape the performance curve in the real world?
- How do the compute layers in a rack-level system keep accelerators continuously fed, minute by minute, not just on paper?
When efficiency, scalability, and system balance become first principles, it is natural to revisit the CPU foundation. That’s exactly why Arm – with our leading architecture and broad ecosystem – is central to this conversation.
In the data center, the Arm Neoverse platform is a core enabler of this transition, with major hyperscalers and AI leaders, including AWS, Google, Microsoft, and NVIDIA, building on and adopting Arm-based platforms. Arm’s model supports purpose-built system design while preserving consistency across platforms, ecosystem partners, and software. That flexibility matters when teams are designing tightly integrated platforms, but do not want a single path to be the only path.
Agentic AI and continuous inference reshape the economics of scale
AI workloads are changing, and infrastructure needs to support a wider range of profiles as AI and general-purpose compute workloads converge.
The center of gravity is shifting toward agentic AI, and that’s fundamentally an inference problem. Agentic systems don’t “serve one answer” and stop; they plan, call tools, retrieve data, validate results, and loop. That creates continuous inferencing patterns: steady, always-on token generation, more variability in request shapes, and much heavier orchestration and data movement around the accelerators.
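A minimal, self-contained sketch of that loop helps make the pattern concrete. The planner, retrieval, and validation functions below are trivial stubs standing in for real framework components, not any particular agent runtime:

```python
# Minimal sketch of an agentic loop. The "model" and "tools" here are
# stubs; in a real system, plan() is an LLM call, retrieve() is a
# retrieval pipeline, and validate() is structured-output checking.
from dataclasses import dataclass

@dataclass
class Step:
    action: str          # "retrieve" or "answer"
    payload: str

def plan(context: list[str]) -> Step:
    # Stub planner: retrieve once, then answer. A real agent would call an
    # LLM here, so every loop iteration is another inference request.
    if not any(item.startswith("doc:") for item in context):
        return Step("retrieve", "background facts")
    return Step("answer", f"answer based on {len(context)} context items")

def retrieve(query: str) -> str:
    return f"doc: results for {query!r}"   # stand-in for a retrieval pipeline

def validate(answer: str) -> bool:
    return "answer" in answer              # stand-in for output validation

def run_agent(goal: str, max_steps: int = 8) -> str:
    context = [goal]
    for _ in range(max_steps):             # plan -> act -> validate -> loop
        step = plan(context)
        if step.action == "retrieve":
            context.append(retrieve(step.payload))   # CPU-side data movement
        elif step.action == "answer" and validate(step.payload):
            return step.payload
    return "gave up after max_steps"

print(run_agent("summarize the quarterly report"))
```

The structural point is that a single user goal fans out into repeated plan, act, and validate iterations, each of which is another inference call plus CPU-side data movement.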
In agentic AI, CPUs are not just “support” actors; they act as the head node for an AI system. They coordinate the control plane, schedule and route work, manage IO, handle networking and storage services, enforce security, and keep the overall system balanced as models, contexts, and tool chains evolve.
Consider a service hosting a large language model (LLM) with hundreds or thousands of concurrent requests. Even when the accelerator is doing the core compute, the CPU is often responsible for request admission control, tokenization and pre-processing, batching and queueing decisions, orchestrating data movement, and coordinating the network and storage paths for model weights and KV cache behavior. With agentic workflows, that CPU-side work expands further, adding tool calling, retrieval pipelines, structured output validation, and multi-step scheduling that runs continuously, not just per-request.
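A simplified sketch of that CPU-side hot path, assuming a generic queue-and-batch design rather than any specific serving framework (the tokenizer and accelerator call are stubs), might look like this:

```python
# Simplified sketch of CPU-side serving work in front of an accelerator:
# admission control, tokenization, and micro-batching. The tokenizer and
# accelerator call are stubs; real systems use a serving framework.
import queue
import threading
import time

MAX_QUEUE = 256        # admission control: apply backpressure past this depth
MAX_BATCH = 32         # batch size decided on the CPU, not the accelerator
MAX_WAIT_S = 0.005     # latency budget for filling a batch

request_q: "queue.Queue[str]" = queue.Queue(maxsize=MAX_QUEUE)

def tokenize(text: str) -> list[int]:
    return [ord(c) for c in text]          # stand-in for a real tokenizer

def run_on_accelerator(batch: list[list[int]]) -> None:
    time.sleep(0.001)                      # stand-in for GPU/NPU execution

def batcher() -> None:
    while True:
        batch = [tokenize(request_q.get())]        # block for the first request
        deadline = time.monotonic() + MAX_WAIT_S
        while len(batch) < MAX_BATCH and time.monotonic() < deadline:
            try:
                batch.append(tokenize(request_q.get_nowait()))
            except queue.Empty:
                time.sleep(0.0005)                 # brief wait for stragglers
        run_on_accelerator(batch)                  # hand off a full batch

threading.Thread(target=batcher, daemon=True).start()
for i in range(100):                               # simulate incoming traffic
    request_q.put(f"request {i}")
time.sleep(0.1)                                    # let the batcher drain
```

Every decision in this loop – admission, batching depth, latency budget – runs on the CPU, which is why a slow CPU layer shows up directly as smaller batches and idle accelerator time.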
All this means the CPU layer matters more than many teams planned for. If the CPU cannot keep up with orchestration and data movement, accelerators sit idle, and expensive capacity becomes structurally stranded.
Arm momentum is visible in how the converged AI data center is being built
This is where Arm momentum is accelerating. Across the industry’s leading integrated AI systems, Neoverse-based CPUs are being selected for the orchestration layer supporting agentic, inference-heavy fleets, particularly where efficiency, predictable scaling, and broad deployment footprints are required.
Independent testing underscores the impact of a modern CPU foundation in these “AI-adjacent” workloads. In Futurum’s Signal65 benchmarking of AWS Graviton4 (Neoverse) against comparable AMD and Intel EC2 instances, the Neoverse-based Graviton4 instances delivered significantly higher performance and better price/performance across all tested workloads, including generative AI (Llama-3.1-8B), database (Redis), machine learning (XGBoost), and networking (Nginx).
Those results map directly to agentic AI data center realities: LLMs, retrieval layers, caching, web and API tiers, and classical ML components all sit on the hot path of agentic systems, and they scale better when the CPU layer is both fast and efficient.
The newest “rack-scale” AI systems are being designed around a purpose-built accelerator layer and an Arm-based CPU layer for orchestration, data movement, and agentic reasoning. NVIDIA’s portfolio, spanning Grace Hopper and Grace Blackwell systems, pairs NVIDIA GPUs with Grace CPUs built on Neoverse, while the company’s newest rack-scale platform, Vera Rubin NVL72, integrates 72 Rubin GPUs with 36 Arm-powered Vera CPUs in a system designed to drive down inference costs for interactive, deep-reasoning agentic AI.
AWS is making the same system-level move. The AWS Trainium3 UltraServer pairs Trainium3 accelerator chips with AWS Graviton CPUs, reinforcing the “converged” design pattern: matching accelerators with purpose-built, performance-per-watt-efficient CPUs that scale effectively.
Better choice is now a requirement, not a preference
AI systems evolve too fast for static architectures, so providing better choice to customers is a risk-management requirement.
System architects want:
- Platforms that can adapt across hardware generations, workload profiles, and deployment environments; and
- Software portability that reduces friction when systems change.
At the same time, they want to avoid optimizing so tightly around a single vendor path that future decisions become constrained when the model mix shifts, the footprint expands, or new requirements appear. That is especially important in the agentic era, where the “shape” of inference keeps changing: longer contexts, more tool use, more multimodal inputs, and more always-on workloads that reward efficiency and balance over peak-only design.
Arm is designed to preserve that portability while scaling system-level performance. The Arm architecture brings features that matter in modern AI infrastructure, backed by a robust software ecosystem. Arm Compute Subsystems (CSS) provide validated, infrastructure-grade building blocks that accelerate silicon development while still enabling differentiation and preserving choice across partners. Because the architecture is consistent across Arm-based platforms, migrating cloud workloads to Arm is straightforward. Meanwhile, at the software level, the Arm ecosystem helps teams move faster with a consistent foundation across environments and platforms, without rewriting everything that runs on it.
Agentic AI economics are reshaping CPU choices, and Arm Neoverse is the choice of leaders
System architects are increasingly shifting to Arm because it aligns with what purpose-built AI systems are demanding: power efficiency, scalability, and performance per watt. Efficiency matters because power and budgets are hard limits. System balance and CPU performance matter because stranded accelerator utilization is expensive. Consistency matters because AI infrastructure changes quickly and increasingly spans multiple environments.
In the converged, agentic AI data center, continuous inferencing turns those priorities into day-one requirements. Agentic systems don’t just need accelerators that can generate tokens; they need CPU-led orchestration that can sustain real utilization across networking, storage, scheduling, and security – continuously, efficiently, and at scale.
That’s the momentum behind Arm right now, with Neoverse acting as the CPU foundation for the agentic era. It is the head-node layer that keeps AI systems coherent, efficient, and ready for the future.