From cloud to edge: Why system-level intelligence is the foundation of AI
In the AI era, performance is increasingly bound by a broad set of system constraints – power, thermals, memory bandwidth, and data movement. These apply across the entire compute spectrum, from hyperscale gigawatt data centers to milliwatt edge devices.
As Futurum’s “Arm at the Center of the AI and Data Center Revolution” report states: “AI is not a single workload with a single ideal infrastructure. Instead, AI is a set of workloads that require a cohesive strategy to accommodate diverse requirements cost-effectively and at high performance.”
The rise of agentic AI – where models no longer respond to a single prompt, but operate as many autonomous agents that plan, reason, and execute different tasks – is accelerating this need. Instead of isolated inference calls, agentic AI systems generate continuous workflows involving memory retrieval, tool usage, and coordination across multiple models and services, placing sustained demands on compute, memory bandwidth, and system orchestration.
This reality is driving a fundamental shift in how computing infrastructure is designed. AI systems are no longer collections of independent chips. They are integrated machines, where CPUs, accelerators, memory, and networking operate as a coordinated whole. As a result, system-level intelligence is now the primary determinant of silicon performance, efficiency, and scalability.
Sustained, system-led performance – not performance at any cost
In modern AI data centers, customers aren’t buying “the best CPU” or “the fastest accelerator.” They’re optimizing for the most power-efficient rack, with a sharp focus on performance-per-watt under sustained, real-world workloads rather than brief, synthetic peak benchmarks. A rack today may draw anywhere from 50kW to over 300kW, forcing trade-offs that make per-socket “hero” performance numbers less relevant.
What matters instead is system balance:
- Moving data efficiently between computing components;
- Doing more useful work per watt at rack scale (see the sketch after this list); and
- Keeping CPUs, GPUs, NPUs, memory pools, and fabrics operating coherently as one system.
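To make “more useful work per watt at rack scale” concrete, here is a minimal back-of-envelope sketch in Python. Every figure and the tokens_per_joule helper are hypothetical, chosen only to illustrate why a rack with a lower peak rating can still deliver more useful work per joule when balanced data movement keeps its accelerators busy.

```python
# Illustrative back-of-envelope comparison of rack-level efficiency.
# All figures are hypothetical; they are not measured Arm or vendor data.

def tokens_per_joule(sustained_tokens_per_sec: float, rack_power_watts: float) -> float:
    """Useful work per unit of energy at rack scale (tokens per joule)."""
    return sustained_tokens_per_sec / rack_power_watts

# Rack A: higher peak throughput on paper, but stalls on data movement under load.
rack_a = tokens_per_joule(sustained_tokens_per_sec=1_200_000, rack_power_watts=300_000)

# Rack B: lower peak throughput, but balanced data movement keeps accelerators fed.
rack_b = tokens_per_joule(sustained_tokens_per_sec=900_000, rack_power_watts=150_000)

print(f"Rack A: {rack_a:.1f} tokens/joule")  # 4.0
print(f"Rack B: {rack_b:.1f} tokens/joule")  # 6.0
```

On these made-up numbers, the rack with less than half the power budget does 50% more useful work per joule, which is exactly the trade-off that makes per-socket “hero” numbers misleading.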
As Futurum notes, the industry conversation has shifted from: “How much raw compute can we deploy?” to “How intelligently can we orchestrate compute across diverse requirements and environments at a system level?”
This reframes the role of silicon. Accelerators provide raw throughput, but system orchestration – scheduling, memory management, security, and data movement – determines whether that throughput can be sustained at scale. Without that orchestration, even the best accelerators risk sitting idle: present in the rack but waiting on memory, networking, or control-plane bottlenecks.
The dynamic becomes even more pronounced with agentic AI workloads. When millions of software agents operate concurrently – querying data, invoking tools, generating outputs, and coordinating across services – compute demand becomes less “bursty” and more structural and continuous. Infrastructure must support sustained orchestration and data movement, not just peak model throughput.
Why CPUs matter more in the AI era
As AI models, workloads, and deployment environments diversify and evolve, CPUs are increasingly used as the AI head node – the system’s control plane – coordinating and orchestrating across the entire system.
In large-scale AI platforms, CPUs are responsible for:
- Dispatching and scheduling work across heterogeneous accelerators;
- Managing memory coherence, data locality, and host memory offload for workloads such as KV caches and vector databases;
- Handling pre- and post-processing tasks that sit outside pure matrix math; and
- Enforcing control-plane operations, security, and isolation across the system.
Essentially, while accelerators crunch the numbers that drive AI models, CPUs are what turn that compute into reliable, scalable, real-world value.
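As a rough illustration of this head-node role, the sketch below shows a simplified, CPU-side orchestration loop: pre-processing on the CPU, round-robin dispatch to accelerators, and spilling KV-cache state to host memory. It is a conceptual example only, assuming made-up Request, Accelerator, and HostKVCache types rather than any vendor’s real API.

```python
from collections import deque
from dataclasses import dataclass, field

# Conceptual model of a CPU "head node" coordinating accelerators.
# All class and function names here are illustrative, not a real framework API.

@dataclass
class Request:
    prompt: str
    tokens: list[int] = field(default_factory=list)

class HostKVCache:
    """KV-cache blocks spilled to host (CPU-attached) memory."""
    def __init__(self) -> None:
        self._store: dict[int, list[float]] = {}

    def offload(self, req_id: int, kv_block: list[float]) -> None:
        self._store[req_id] = kv_block

    def fetch(self, req_id: int) -> list[float]:
        return self._store.get(req_id, [])

class Accelerator:
    """Stand-in for a GPU/NPU; the matrix math itself is elided."""
    def __init__(self, name: str) -> None:
        self.name = name

    def run_inference(self, tokens: list[int], kv_block: list[float]) -> list[int]:
        return tokens + [len(kv_block)]  # placeholder for real model execution

def head_node_loop(requests: deque[Request], accels: list[Accelerator]) -> None:
    """CPU-side control plane: pre-process, schedule, manage host memory, post-process."""
    cache = HostKVCache()
    for req_id, req in enumerate(requests):
        tokens = [ord(c) % 256 for c in req.prompt]             # pre-processing on the CPU
        accel = accels[req_id % len(accels)]                    # simple round-robin dispatch
        kv_block = cache.fetch(req_id)                          # pull any host-resident KV state
        req.tokens = accel.run_inference(tokens, kv_block)      # accelerator does the math
        cache.offload(req_id, [float(t) for t in req.tokens])   # spill KV state back to host memory

head_node_loop(deque([Request("hello"), Request("world")]),
               [Accelerator("gpu0"), Accelerator("gpu1")])
```

In a real deployment, a loop like this is where scheduling policy, memory placement, and isolation decisions live, which is why CPU efficiency and system awareness matter even though the accelerators do the heavy math.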
The criticality of the CPU is validated by industry leaders. In a Bloomberg interview, NVIDIA’s founder and CEO Jensen Huang confirmed that the Arm-powered Vera CPU – which is part of the new Vera Rubin platform – would be available as a standalone offering. This is a clear signal of the CPU’s growing importance in AI system design.
Moreover, as AI infrastructure becomes more heterogeneous, the value of a flexible, power-efficient, system-aware CPU architecture rises sharply. This is where Arm’s role becomes clear. Futurum highlights this directly: “Specialized accelerators such as GPUs and TPUs are often paired with Arm CPUs to handle general control and data management duties without incurring high cost or power overhead.”
Arm and the rise of system-level infrastructure
Arm’s compute advantage rests on performance, efficiency, scalability, and a broad ecosystem. For system architects, choosing Arm translates to better risk management in a fast-moving AI landscape. These strengths map directly to the requirements of modern rack-level AI systems.
Across the world’s leading hyperscalers, this is already visible:
- AWS integrates Arm-based Graviton CPUs with Nitro DPUs and Trainium accelerators to optimize rack-level efficiency.
- Google’s TPU-based systems are increasingly matched with Google Axion processors – which integrate Arm CPU cores – to anchor orchestration and control.
- NVIDIA’s Grace, Grace Hopper, and upcoming Vera platforms pair GPUs with Arm-based CPUs and DPUs to deliver tightly integrated AI systems.
The Arm compute platform is being used as the system foundation that integrates accelerators, memory, and networking into coherent, power-efficient machines.
AI is becoming inference-first
While training can grab the headlines, inference is where AI scales. The rise of agentic AI further amplifies this shift, with agents operating continuously to execute sequences of inference steps rather than a single model call.
Across many roadmaps, inference workloads are expected to overtake training over the next decade, and they stress systems in different ways due to the following:
- Lower latency requirements;
- Higher sensitivity to memory bandwidth (see the sketch after this list);
- Sustained, always-on execution; and
- Tight power and thermal limits.
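To see why memory bandwidth in particular becomes the limiter, here is a rough, illustrative calculation for the decode phase of a large language model. The model size and bandwidth figures are hypothetical, and the estimate ignores batching and KV-cache traffic; it only shows the shape of the bottleneck.

```python
# Back-of-envelope estimate for bandwidth-bound token generation (decode phase).
# Hypothetical figures: a 70B-parameter model stored at 1 byte per weight.

model_bytes = 70e9 * 1        # ~70 GB of weights streamed per generated token
mem_bandwidth = 3.0e12        # 3 TB/s of accelerator memory bandwidth (hypothetical)

# If every new token requires reading the weights once, per-stream throughput
# is bounded by bandwidth divided by bytes moved per token.
tokens_per_sec = mem_bandwidth / model_bytes
print(f"~{tokens_per_sec:.0f} tokens/sec per stream")  # roughly 43
```

Adding faster matrix units changes this ceiling very little; moving data efficiently, and keeping it close to where it is used, is what raises it.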
These characteristics apply not just in the data center, but at the edge – in the consumer devices and IoT systems that are part of our everyday lives. As with the cloud, the same system-level principles apply to the edge:
- Performance depends on both acceleration and the efficiency of data movement across systems;
- Security relies on system-level coordination to enforce protection across workloads and memory domains; and
- Integration speed determines time-to-market.
Edge AI systems where acceleration is not tightly coupled with memory and interconnect quickly run into bandwidth, power, and software complexity constraints. Systems that tightly integrate CPUs and accelerators with memory and interconnect deliver more consistent performance, scale efficiently, and are easier for developers to target.
As Futurum observes: “Tasks that once required the cloud now execute locally, leveraging Arm’s power-efficient cores and integrated AI engines.”
System design at scale
As system complexity grows, integration and validation – not transistor design – become the dominant cost and risk. This is why the industry is moving toward pre-integrated compute subsystems and standardized system interfaces.
Arm’s Compute Subsystems (CSS) reflect this industry shift, with demand for these platforms continuing to grow. CSS offers a concrete path to purpose-built system design by providing pre-verified subsystems that reduce integration risk while enabling partner differentiation. Rather than delivering isolated IP blocks, these subsystems are system blueprints – CPUs, interconnects, coherency, and memory behavior engineered to work together from day one.
Moreover, Arm’s System IP portfolio – spanning interconnects, memory controllers, and coherency fabrics – enables partners to design complete, AI-optimized systems faster and with lower risk. This system-level foundation is increasingly critical as AI workloads stress bandwidth, latency, and power simultaneously.
Why this all matters
As AI evolves from isolated model inference toward agent-driven systems, the defining challenge of compute infrastructure becomes coordination: ensuring that diverse processors, memory systems, and networks operate as a coherent machine.
The winners in AI will be defined at the system level, by those who can:
- Build efficient systems under power constraints;
- Integrate heterogeneous compute without fragmentation;
- Move data with minimal energy cost; and
- Deliver security and performance as properties of the system, not add-on features.
As Futurum concludes: “The industry will be reshaped not only by who builds the most powerful chips, but by who creates the most integrated and efficient systems across the spectrum.”
By enabling specialization, efficiency, and optionality at the system level, Arm sits beneath an increasing share of the most scalable AI compute environments, from hyperscale data centers to edge devices.
The future of AI isn’t just faster silicon. It’s smarter, more efficient, and more scalable systems built on Arm.