From cloud to edge: Why system-level intelligence is the foundation of AI
In the AI era, performance is increasingly bound by a broad set of system constraints – power, thermals, memory bandwidth, and data movement. These apply across the entire compute spectrum, from hyperscale gigawatt data centers to milliwatt edge devices.
As Futurum’s “Arm at the Center of the AI and Data Center Revolution” report states: “AI is not a single workload with a single ideal infrastructure. Instead, AI is a set of workloads that require a cohesive strategy to accommodate diverse requirements cost-effectively and at high performance.”
The rise of agentic AI – where models no longer respond to a single prompt, but operate as many autonomous agents that plan, reason, and execute different tasks – is accelerating this need. Instead of isolated inference calls, agentic AI systems generate continuous workflows involving memory retrieval, tool usage, and coordination across multiple models and services, placing sustained demands on compute, memory bandwidth, and system orchestration.
This reality is driving a fundamental shift in how computing infrastructure is designed. AI systems are no longer collections of independent chips. They are integrated machines, where CPUs, accelerators, memory, and networking operate as a coordinated whole. As a result, system-level intelligence is now the primary determinant of silicon performance, efficiency, and scalability.
Sustained, system-led performance – not performance at any cost
In modern AI data centers, customers aren’t buying “the best CPU” or “the fastest accelerator.” They’re optimizing for the most power-efficient rack, with a sharp focus on performance-per-watt under sustained, real-world workloads rather than brief, synthetic peak benchmarks. A rack today may draw anywhere from 50kW to over 300kW, forcing trade-offs that make per-socket “hero” performance numbers less relevant.
What matters instead is system balance:
- Moving data efficiently between computing components;
- Doing more useful work per watt at rack scale (see the sketch after this list); and
- Keeping CPUs, GPUs, NPUs, memory pools, and fabrics operating coherently as one system.
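To make “more useful work per watt at rack scale” concrete, here is a minimal back-of-envelope sketch in Python. Every figure and the tokens_per_joule helper are hypothetical, chosen only to illustrate why a rack with a lower peak rating can still deliver more useful work per joule when balanced data movement keeps its accelerators busy.

```python
# Illustrative back-of-envelope comparison of rack-level efficiency.
# All figures are hypothetical; they are not measured Arm or vendor data.

def tokens_per_joule(sustained_tokens_per_sec: float, rack_power_watts: float) -> float:
    """Useful work per unit of energy at rack scale (tokens per joule)."""
    return sustained_tokens_per_sec / rack_power_watts

# Rack A: higher peak throughput on paper, but stalls on data movement under load.
rack_a = tokens_per_joule(sustained_tokens_per_sec=1_200_000, rack_power_watts=300_000)

# Rack B: lower peak throughput, but balanced data movement keeps accelerators fed.
rack_b = tokens_per_joule(sustained_tokens_per_sec=900_000, rack_power_watts=150_000)

print(f"Rack A: {rack_a:.1f} tokens/joule")  # 4.0
print(f"Rack B: {rack_b:.1f} tokens/joule")  # 6.0
```

On these made-up numbers, the rack with less than half the power budget does 50% more useful work per joule, which is exactly the trade-off that makes per-socket “hero” numbers misleading.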
As Futurum notes, the industry conversation has shifted from: “How much raw compute can we deploy?” to “How intelligently can we orchestrate compute across diverse requirements and environments at a system level?”
This reframes the role of silicon. Accelerators provide raw throughput, but system orchestration – scheduling, memory management, security, and data movement – determines whether that throughput can be sustained at scale. Without that orchestration, even the best accelerators risk sitting idle: present in the rack but waiting on memory, networking, or control-plane bottlenecks.
The dynamic becomes even more pronounced with agentic AI workloads. When millions of software agents operate concurrently – querying data, invoking tools, generating outputs, and coordinating across services – compute demand becomes less “bursty” and more structural and continuous. Infrastructure must support sustained orchestration and data movement, not just peak model throughput.
Why CPUs matter more in the AI era
As AI models, workloads, and deployment environments diversify and evolve, CPUs are increasingly used as the AI head node – the system’s control plane – coordinating and orchestrating across the entire system.
In large-scale AI platforms, CPUs are responsible for:
- Dispatching and scheduling work across heterogeneous accelerators;
- Managing memory coherence, data locality, and host memory offload for workloads such as KV caches and vector databases;
- Handling pre- and post-processing tasks that sit outside pure matrix math; and
- Enforcing control-plane operations, security, and isolation across the system.
Essentially, while accelerators crunch the numbers that drive AI models, CPUs are what turn that compute into reliable, scalable, real-world value.
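As a rough illustration of this head-node role, the sketch below shows a simplified, CPU-side orchestration loop: pre-processing on the CPU, round-robin dispatch to accelerators, and spilling KV-cache state to host memory. It is a conceptual example only, assuming made-up Request, Accelerator, and HostKVCache types rather than any vendor’s real API.

```python
from collections import deque
from dataclasses import dataclass, field

# Conceptual model of a CPU "head node" coordinating accelerators.
# All class and function names here are illustrative, not a real framework API.

@dataclass
class Request:
    prompt: str
    tokens: list[int] = field(default_factory=list)

class HostKVCache:
    """KV-cache blocks spilled to host (CPU-attached) memory."""
    def __init__(self) -> None:
        self._store: dict[int, list[float]] = {}

    def offload(self, req_id: int, kv_block: list[float]) -> None:
        self._store[req_id] = kv_block

    def fetch(self, req_id: int) -> list[float]:
        return self._store.get(req_id, [])

class Accelerator:
    """Stand-in for a GPU/NPU; the matrix math itself is elided."""
    def __init__(self, name: str) -> None:
        self.name = name

    def run_inference(self, tokens: list[int], kv_block: list[float]) -> list[int]:
        return tokens + [len(kv_block)]  # placeholder for real model execution

def head_node_loop(requests: deque[Request], accels: list[Accelerator]) -> None:
    """CPU-side control plane: pre-process, schedule, manage host memory, post-process."""
    cache = HostKVCache()
    for req_id, req in enumerate(requests):
        tokens = [ord(c) % 256 for c in req.prompt]             # pre-processing on the CPU
        accel = accels[req_id % len(accels)]                    # simple round-robin dispatch
        kv_block = cache.fetch(req_id)                          # pull any host-resident KV state
        req.tokens = accel.run_inference(tokens, kv_block)      # accelerator does the math
        cache.offload(req_id, [float(t) for t in req.tokens])   # spill KV state back to host memory

head_node_loop(deque([Request("hello"), Request("world")]),
               [Accelerator("gpu0"), Accelerator("gpu1")])
```

In a real deployment, a loop like this is where scheduling policy, memory placement, and isolation decisions live, which is why CPU efficiency and system awareness matter even though the accelerators do the heavy math.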
The criticality of the CPU is validated by industry leaders. In a Bloomberg interview, NVIDIA’s founder and CEO Jensen Huang confirmed that the Arm-powered Vera CPU – which is part of the new Vera Rubin platform – would be available as a standalone offering. This is a clear signal of the CPU’s growing importance in AI system design.
Moreover, as AI infrastructure becomes more heterogeneous, the value of a flexible, power-efficient, system-aware CPU architecture rises sharply. This is where Arm’s role becomes clear. Futurum highlights this directly: “Specialized accelerators such as GPUs and TPUs are often paired with Arm CPUs to handle general control and data management duties without incurring high cost or power overhead.”
Arm and the rise of system-level infrastructure
Arm’s compute advantage rests on performance, efficiency, scalability, and a broad ecosystem. For system architects, choosing Arm translates to better risk management in a fast-moving AI landscape. These strengths map directly to the requirements of modern rack-level AI systems.
Across the world’s leading hyperscalers, this is already visible:
- AWS integrates Arm-based Graviton CPUs with Nitro DPUs and Trainium accelerators to optimize rack-level efficiency.
- Google’s TPU-based systems are increasingly matched with Google Axion processors – which integrate Arm CPU cores – to anchor orchestration and control.
- NVIDIA’s Grace, Grace Hopper, and upcoming Vera platforms pair GPUs with Arm-based CPUs and DPUs to deliver tightly integrated AI systems.
The Arm compute platform is being used as the system foundation that integrates accelerators, memory, and networking into coherent, power-efficient machines.
AI is becoming inference-first
While training can grab the headlines, inference is where AI scales. The rise of agentic AI further amplifies this shift, with agents operating continuously to execute sequences of inference steps rather than a single model call.
Across many roadmaps, inference workloads are expected to overtake training over the next decade, and they stress systems in different ways due to the following:
- Lower latency requirements;
- Higher sensitivity to memory bandwidth (see the sketch after this list);
- Sustained, always-on execution; and
- Tight power and thermal limits.
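To see why memory bandwidth in particular becomes the limiter, here is a rough, illustrative calculation for the decode phase of a large language model. The model size and bandwidth figures are hypothetical, and the estimate ignores batching and KV-cache traffic; it only shows the shape of the bottleneck.

```python
# Back-of-envelope estimate for bandwidth-bound token generation (decode phase).
# Hypothetical figures: a 70B-parameter model stored at 1 byte per weight.

model_bytes = 70e9 * 1        # ~70 GB of weights streamed per generated token
mem_bandwidth = 3.0e12        # 3 TB/s of accelerator memory bandwidth (hypothetical)

# If every new token requires reading the weights once, per-stream throughput
# is bounded by bandwidth divided by bytes moved per token.
tokens_per_sec = mem_bandwidth / model_bytes
print(f"~{tokens_per_sec:.0f} tokens/sec per stream")  # roughly 43
```

Adding faster matrix units changes this ceiling very little; moving data efficiently, and keeping it close to where it is used, is what raises it.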
These characteristics apply not just in the data center, but at the edge – in the consumer devices and IoT systems that are part of our everyday lives. As with the cloud, the same system-level principles apply to the edge:
- Performance depends on both acceleration and the efficiency of data movement across systems;
- Security relies on system-level coordination to enforce protection across workloads and memory domains; and
- Integration speed determines time-to-market.
Edge AI systems where acceleration is not tightly coupled with memory and interconnect quickly run into bandwidth, power, and software complexity constraints. Systems that tightly integrate CPUs and accelerators with memory and interconnect deliver more consistent performance, scale efficiently, and are easier for developers to target.
As Futurum observes: “Tasks that once required the cloud now execute locally, leveraging Arm’s power-efficient cores and integrated AI engines.”
System design at scale
As system complexity grows, integration and validation – not transistor design – become the dominant cost and risk. This is why the industry is moving toward pre-integrated compute subsystems and standardized system interfaces.
Arm’s Compute Subsystems (CSS) reflect this industry shift, with demand for these platforms continuing to grow. CSS offers a concrete path to purpose-built system design by providing pre-verified subsystems that reduce integration risk while enabling partner differentiation. Rather than delivering isolated IP blocks, these subsystems are system blueprints – CPUs, interconnects, coherency, and memory behavior engineered to work together from day one.
Moreover, Arm’s System IP portfolio – spanning interconnects, memory controllers, and coherency fabrics – enables partners to design complete, AI-optimized systems faster and with lower risk. This system-level foundation is increasingly critical as AI workloads stress bandwidth, latency, and power simultaneously.
Why this all matters
As AI evolves from isolated model inference toward agent-driven systems, the defining challenge of compute infrastructure becomes coordination: ensuring that diverse processors, memory systems, and networks operate as a coherent machine.
The winners in AI will be defined at the system level, by those who can:
- Build efficient systems under power constraints;
- Integrate heterogeneous compute without fragmentation;
- Move data with minimal energy cost; and
- Deliver security and performance as properties of the system, not add-on features.
As Futurum concludes: “The industry will be reshaped not only by who builds the most powerful chips, but by who creates the most integrated and efficient systems across the spectrum.”
By enabling specialization, efficiency, and optionality at the system level, Arm sits beneath an increasing share of the most scalable AI compute environments, from hyperscale data centers to edge devices.
The future of AI isn’t just faster silicon. It’s smarter, more efficient, and more scalable systems built on Arm.