Arm Newsroom Podcast

Why AI infrastructure needs a new kind of processor

Arm's Eddie Ramirez explains why Arm launched the AGI CPU, how agentic AI is reshaping data centers, and why efficiency is becoming the defining metric for the AI era
The Arm Podcast · Arm Viewpoints: The Arm AGI CPU Launch and the Future of AI Infrastructure

Summary

The launch of the Arm AGI CPU marks one of the most significant milestones in Arm’s history. In this episode of Arm Viewpoints, Brian Fuller sits down with Eddie Ramirez to explore why Arm made the leap from IP and compute subsystems to deployment-ready silicon, and what that means for the future of AI infrastructure.

They discuss the rise of agentic AI, the growing importance of CPUs in orchestrating AI workloads, the power and efficiency challenges facing modern data centers, and how Arm is positioning itself to help organizations deploy AI at unprecedented scale. Along the way, Eddie shares behind-the-scenes insights from the AGI CPU launch event and offers a glimpse into how AI, cloud infrastructure, and physical intelligence may evolve over the next five years.

Speakers

Eddie Ramirez, VP of marketing, Arm Cloud AI Business Unit

Eddie leads a go-to-market team at Arm responsible for helping partners innovate and grow through the adoption of Arm-based solutions into data center, networking and edge markets. He also manages a team responsible for ecosystem development activities and is focused on creating a rich and vibrant ecosystem of hardware and software partners. Prior to Arm, Eddie held various executive leadership roles in product management, segment marketing and applications engineering. He has more than 20 years of experience in storage and networking having had successful campaigns at AMD, Marvell, Sandforce/LSI, Seagate and Western Digital. Eddie has bachelor’s degrees in both Management Science and Electrical Engineering from Massachusetts Institute of Technology (MIT).

Brian Fuller, Arm Editor-in-Chief, host

Brian Fuller is an experienced writer, journalist and communications/content marketing strategist specializing in both traditional publishing and emerging digital technologies. He has held various leadership roles, currently as Editor-in-Chief at Arm and formerly at Cadence Design Systems, Inc.

Prior to his content-marketing work inside corporations, he was a wire-service reporter and business editor before joining EE Times and spending nearly 20 years there in various roles, including editor-in-chief and publisher. He holds a B.A. in English from UCLA.

Transcript

Brian: [00:00:00] Hello, and welcome to another episode of the Arm Viewpoints podcast, where we explore topics at the intersection of AI and human imagination. I’m Brian Fuller, editor-in-chief at Arm. Today, we’re looking at one of the most significant moments in Arm’s history, the launch of the Arm AGI CPU, a move that marks a new chapter in how Arm shows up in the market, not just as an IP company, not just as a compute subsystem company, but as a provider of deployment-ready silicon for the AI era.

My guest is Eddie Ramirez, who had a front-row seat to the launch in San Francisco this spring, and to the years of work behind it. In our conversation, Eddie helps unpack why this moment matters not just for Arm, but for the future of AI infrastructure. We talk about why AI is becoming a system-level problem, why the rise of agentic AI is changing [00:01:00] the balance between CPUs and accelerators, and why efficiency has become one of the defining constraints in modern data centers.

At the center of it all is a simple but powerful idea. As AI moves from isolated models to coordinated agents, the infrastructure beneath it needs to change, and Arm is positioning itself to help make that transition faster, more efficient, and easier to deploy. Let’s jump right in. Eddie, welcome. March 24, 2026, probably not a day we’ll ever forget.

What was it like being there?

Eddie: It was probably the highlight of my career here at Arm. A lot of work that folks throughout Arm have been putting into the Phoenix program. It was finally our time to be able to go out and publicly talk about a very exciting product, a product that’s really a game changer for us as Arm in the way that we show up into the [00:02:00] marketplace.

And to be able to do that in San Francisco with so many of our partners supporting us, that was what really made the biggest difference, right? We had so many of our traditional IP partners, leaders of companies like Nvidia, Marvell, and Broadcom, who were all endorsing our new strategic direction.

And at the same time, we just had a wealth of partners that wanted to see how they could get their hands on an AGI CPU. And to bring that together in one place, the excitement, the buzz that we created and that was there in the room, it was really fantastic. It was definitely a once-in-a-career opportunity for me.

Brian: And for those of us who weren’t there but were watching the live stream, the energy crackled through the digital space. It was astonishing.

Eddie: You know what? Kudos to the marketing team. It was a great location. Yeah. Fort [00:03:00] Mason has a lot of history and then to be able to weave that into a narrative around Arm, having its own rich history as well.

But both of those being a launching point for something bigger. It was really great because from the outside of the event, you maybe couldn’t capture what really was gonna go on until you actually came inside, and the way that the marketing and events team had everything planned was great.

One small tidbit is, customers came in and socialized, and all of the demos and all of the systems that we had were hidden from them when they started the event. And it wasn’t until after they came out of the keynotes that suddenly they saw servers and racks and demos, and everything just reinforced how exciting a time this is, right?

And how we’ve really thought bigger. It’s not just the chip, it’s an entire platform that we’re making available to the market.

Brian: [00:04:00] Speaking of time, why did Arm believe this was the right moment to make this move, right? It’s been an IP and a subsystems company for three, four decades.

Eddie: I think it’s because our customers were demanding that we show up in a different way.

If I just go back to Meta, their new challenge is that they have to scale data centers at a much faster pace because of the need for AI computing. If I rewind to even a few years back, Meta would plan out their data centers in four to five-year cycles, ’cause it took that long to decide you were gonna build a new data center, then go off and build it, then go find the equipment that you needed, and then get it up and operational, right?

This was a five-year cycle. They are now trying to expedite that to two years or less. And so therefore, they need [00:05:00] solutions that are deployment-ready from day one. One data center campus is now five times the capacity of all the data centers they had before. These massive AI factories that are now being thought of and being built are just at a different scale than what our partners were thinking about before, and so they just need more deployment-ready solutions, right?

Where is the rack that you can just wheel in and I can get hundreds of AGI CPUs up and running? That’s really why we had to show up in a different way.

Brian: You’ve described, and I think you probably did this in your presentation at OCP this spring, you’ve described AI as a system-level problem, not just a compute problem.

What does that mean in practice?

Eddie: We tend to hear a lot about heterogeneous computing, right? That really the new paradigm is that you [00:06:00] pair a general-purpose processor with some sort of AI accelerator, and now you’re trying to coordinate the activity between both. At today’s AI factory scale, what customers are looking at is effectively how do I build racks of computing, racks of accelerators, and network them with high-speed networking, networking that also has intelligence behind it.

And so you’re seeing DPUs with Arm cores even in that networking equipment. And all of this has to get orchestrated because we’re also now looking at using AI to orchestrate workflows, to decide where agents get run, how to provision systems. And so it’s a really exciting time because the AI itself is being applied to how to manage these data centers and how to build them out and run them.

So it’s a really unique opportunity for us to think about not [00:07:00] just the CPU, but how does that entire rack get built, and also how does a cluster or a pod of different kinds of servers need to get deployed. That’s really the paradigm that we’re now in as companies are racing to build out compute capacity.

Brian: And how much of this is being driven by the rise of agentic AI workloads versus simply scaling existing AI models?

Eddie: This really changed in September and October of last year. We had for quite some time been saying that there would be this transition from AI computing that was being used to train the models to computing that was gonna be used for inference, right?

How do I actually do work with these AI models? And what really changed in Q4 of last year was that we finally had that killer app for agentic AI, and that was OpenClaw, right? [00:08:00] Oh, yeah. OpenClaw was released. It was a framework where people downloaded it and could start creating their own AI agents. In effect, you could create your own personal AI assistant using this framework, and it skyrocketed.

Effectively, we saw about 38 million monthly visitors in April to the GitHub page for OpenClaw. It received the most stars, which is a way of people saying that this project is super interesting or super important. Within a month, it became more starred than even Kubernetes or Linux.

Wow. So it just shows you this avalanche of interest ’cause you now had a killer app, and it really felt like a ChatGPT moment all over again, right? When ChatGPT- Yeah … was the moment you realized, wow, this is great. I can get lots of information out of these chatbots. I think the [00:09:00] same thing happened with OpenClaw, and suddenly people started to think about the fact that I can now automate a lot of workflows using agents, and that is a game changer in itself because if today most people interact with a chatbot, the limitation on how many AI queries get made is how fast you can type those queries.

Yeah. If you now put agents that are automating this, you’re gonna see 15x more inference queries because the agents are working 24/7, right? Yeah. They’re spawning off additional agents. So we think that the amount of inference queries is gonna spike up X, and that’s a real opportunity for more compute to help support this increase in inference and the inference that’s coming from these agents.

Brian: So what infrastructure assumptions start to break when AI systems stop behaving like [00:10:00] isolated models, like you’ve mentioned, and start behaving more like coordinated agents?

Eddie: That’s a great question because I think what we’re gonna find is that the way the data centers were originally designed, they were designed for passive tools, right?

Humans interacting with websites, humans entering data into databases. And now you’re gonna have systems that have some degree of autonomy, okay? And so that means that, one, we’re gonna find that they’re gonna utilize the hardware a lot more, right? We’re gonna start seeing bottlenecks in the network, bottlenecks in compute, bottlenecks between servers and a rack.

And that is one way that we’re gonna see that maybe the current type of deployment needs to change. But the other way is that it needs to be continuously able to adapt and orchestrate workflows. You are going to have agents that are [00:11:00] deciding where, you know, work gets done. That is really the change that I think many companies are now talking about, and it’s really gonna kinda fundamentally shape the way we think about data center design.

And every time you think about an agent orchestrating a workflow or an agent making a decision, most of that will run on CPU. Yeah. And so that’s the real opportunity for Arm as well, is that we are entering the CPU market with the AGI silicon right at a time when people are realizing they need way more cores in the data center- Yeah

to handle agentic AI.

Brian: Let’s pull on that thread a little bit more. So talk more about the role, the orchestration role that the CPU plays. What specifically is it coordinating inside these environments?

Eddie: So if you look at what typically was referred to as an AI server, a typical AI server up to [00:12:00] this point would’ve had one to two CPUs, and it would’ve had accelerators like GPUs.

And a very popular AI server is like a Grace Hopper or a Grace Blackwell, where you typically had one CPU for every two, in some cases four, accelerators. That CPU’s main job was to keep the GPUs or other accelerators fed with data, okay? So that was the main role.

Now in an agentic AI workload, that CPU is doing a lot more. You can have an agent that is actually trying to make a decision on a much smaller model, and then it’s passing that information to another agent. So we showed a demo at the Arm Everywhere event of an agent workflow in which one agent would go through your calendar, one agent would go through your emails, another agent would look through databases.

And so you would ask a question, and it would look through all of these [00:13:00] different data sets to create a response, all of that on CPU. And then at the very end, it goes off to a large language model to create a response using all of these data sets. And so the CPU was now running most of the time, and only at the very end did you go and actually access the GPU when you needed to go and look at that multi-billion-parameter model.

That’s the fundamental change that is happening in AI, and it is now creating the need for more CPU cores, lower latency, and more power-efficient architectures, because there’s now a different balance between what I would say is GPU accelerator versus CPU. And we’re not the only ones that are seeing that.

I think lots of folks in the industry are talking about that. That’s why there were so many exciting articles written about this sort of new wave of criticality for [00:14:00] CPUs.
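
Editor's illustration: the workflow Eddie describes, several lightweight agents gathering context on the CPU followed by a single call to a large model, can be sketched roughly as below. This is not code from the Arm demo. The agent functions, the in-memory data, and the `large_model` stub are all hypothetical stand-ins, and the counters only show where the work would land under that assumption.

```python
from dataclasses import dataclass

@dataclass
class Trace:
    """Counts where work happens in this toy workflow."""
    cpu_steps: int = 0
    accelerator_calls: int = 0

# Hypothetical data sources standing in for a calendar, a mailbox, and a database.
CALENDAR = ["Mon 10:00 design review", "Tue 14:00 partner call"]
EMAILS = ["Partner call moved to Tue 15:00", "Lunch menu for Friday"]
DATABASE = {"partner call": "Attendees: Ana, Raj"}

def calendar_agent(query, trace):
    trace.cpu_steps += 1  # lightweight filtering, assumed to run on CPU
    return [entry for entry in CALENDAR if query in entry.lower()]

def email_agent(query, trace):
    trace.cpu_steps += 1
    return [mail for mail in EMAILS if query in mail.lower()]

def database_agent(query, trace):
    trace.cpu_steps += 1
    return [value for key, value in DATABASE.items() if query in key]

def large_model(context, trace):
    # Stub for the one call to a multi-billion-parameter model on an accelerator.
    trace.accelerator_calls += 1
    return f"Summary of {len(context)} context items"

def answer(query):
    """Run every CPU-side agent first, then make one accelerator call at the end."""
    trace = Trace()
    context = []
    for agent in (calendar_agent, email_agent, database_agent):
        context.extend(agent(query, trace))
    return large_model(context, trace), trace

reply, trace = answer("partner call")
print(reply, trace)
```

In this toy run the three agents account for three CPU steps and the large model is invoked once, which mirrors the balance Eddie describes without claiming anything about real timings.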

Brian: With that crazy increase in inference and agentic workloads firing off back and forth constantly, that seems to put a premium, correct me if I’m wrong, on efficiency.

Talk a little bit about that.

Eddie: When you think of a data center, the limiting factor in a data center is power. When you first design a data center, you decide how much power is gonna come into the data center. That doesn’t change through the life of the data center, okay? And so what does change is that you put in equipment that is more efficient, that gives you more computing power than the previous equipment you had in there.

So you refresh a rack, and that new rack should be able to give you twice the compute without increasing your power budget. And that’s how you can scale more and more computing. Arm is super critical in achieving that. And [00:15:00] we were able to show that at the rack level with the AGI CPU, we could provide X the computing performance over an X-based rack.

And the reason for that is because at each CPU, we are the most efficient CPU in the data center. And so now that means that you could fit more CPUs into a rack, more cores into a rack, and more compute density. So that is the real TCO benefit of the AGI CPU, is that efficiency leads to more compute capacity.
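
Editor's illustration: the fixed-power-budget argument comes down to simple arithmetic, sketched below. Every number here is made up for illustration and is not an Arm or competitor specification. The model also ignores the power drawn by memory, networking, and cooling, so it is a simplification rather than a sizing tool.

```python
def cores_per_rack(rack_power_w, cpu_power_w, cores_per_cpu):
    """Cores that fit in a rack when the rack's power budget is the only limit."""
    cpus = rack_power_w // cpu_power_w  # whole CPUs that fit under the budget
    return cpus * cores_per_cpu

# Hypothetical figures: same rack budget, same core count, different power per CPU.
RACK_BUDGET_W = 30_000
baseline = cores_per_rack(RACK_BUDGET_W, cpu_power_w=500, cores_per_cpu=128)
efficient = cores_per_rack(RACK_BUDGET_W, cpu_power_w=300, cores_per_cpu=128)

print(baseline, efficient, round(efficient / baseline, 2))
```

With these assumed values, the lower-power CPU fits 100 sockets into the budget instead of 60, so the same rack power yields more cores. That is the sense in which efficiency turns into compute capacity.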

Brian: So the other thing that sort of blows my mind, having been in this industry for probably as long as you have, is that not only was the launch of Arm AGI CPU transformational in the company’s history, but Arm was already foundational across the hyperscale cloud infrastructure. This from a company that [00:16:00] for decades was the mobile CPU company.

How did we get there?

Eddie: One of the things we tried to educate folks on is that the AGI CPU isn’t the first time the Arm architecture has been in a CPU that folks can use. We started this sort of journey seven years back when we engaged with Amazon, and they wanted to build their own Arm-based CPU for their cloud data center, which was the Graviton family of processors, right?

We’re now on Graviton5. And the best part is that AWS has shown the same sort of benefits of power efficiency leading to TCO, leading to very unique rack designs, leading to more capacity that they’ve been able to get out of their data centers because of the Graviton processor. And now they’ve also shown that they can have tens of thousands of customers that can use Graviton.

They use it on a daily basis. In fact, [00:17:00] 50% of all of AWS’s server deployments are now Arm-based. So people and enterprises have now been using Arm on a consistent basis. They can use it in AWS’s cloud data centers. They can also use it in Microsoft’s and Google’s data centers. In China, we have Arm instances in Alibaba’s data centers as well.

So we talk a lot about how there’s already been 1.25 billion Arm cores deployed in cloud data centers today. So it’s no longer about, “Is Arm gonna work for me?” It’s now about, “Hey, can I get that same experience I have on Graviton? Can I get that in my own on-premise server fleet?” And up to now, there really wasn’t a solution for them to use to get the same experience as Graviton in their own data center.

The AGI CPU solves that problem. You’re effectively getting [00:18:00] Graviton5 because it’s the same Neoverse V3 platform that AWS built on and that we built the AGI CPU on. That’s super powerful. That means that if you write your workloads and deploy them in Graviton, you’ll get the same experience, the same code set, the same software can now be run on the AGI CPU.

That was different than in past years, and that’s really the ease of deployment that we are now making available, so you can have Arm regardless of how much you use the cloud versus how much you wanna keep servers in your own fleet, in your own control.

Brian: Let’s talk about that a little bit more. Rene stands up on the stage, holds up this beautiful chip. It’s a physical thing. It’s tangible. You talked earlier about showing the partners all the hardware after the keynotes were over, more tangible, cool stuff to look at. But the software ecosystem, somewhat unheralded, but [00:19:00] hugely important to this whole transition.

Talk about that a little bit.

Eddie: There’s been an evolution in our software ecosystem. And one of the biggest changes is that it’s easy to find software that now runs on Arm. We have an Arm ecosystem dashboard on our developer website that now tracks over 1,000 different software packages, both open source and what we call ISV software, companies’ software that runs on Arm natively, meaning you can go download a container, and it’s already ready to run on Arm.

You can go download different Linux software versions, and they run on Arm as well. And so it’s made ease of deployment much better than when we started the journey with Graviton 1 seven, eight years ago. We spent a lot of time working on the [00:20:00] software ecosystem.

So we had folks in CE software that were contributing to open source. We had a lot of folks in the BU working with companies to help them port their software. We used to run this Works on Arm program where we gave people free servers if they would port to Arm. And then most importantly, every time a cloud provider turned on an Arm instance, they also turned on people that would help with the software ecosystem effort.

The efforts that AWS has put in, that Google has put in, that Nvidia has put in to continue to improve the software ecosystem now mean that it is easy to run modern software, including AI frameworks.

Brian: So a lot of organizations now, thanks to the acceleration of AI and the proof points that have been seen all around the world, want hyperscale-style efficiency and flexibility, but they don’t wanna become silicon companies themselves. [00:21:00]

How does that reshape infrastructure demand?

Eddie: Even Meta, which has the size of a hyperscaler, came to us and said, “Arm, I’d rather you just build the chip.” Because it’s not necessarily a core piece of their business. And so we see this a lot, that in order for us to grow our overall market share, we had to offer a solution that didn’t require them to have an entire ASIC team.

And so we definitely saw that at Meta. We are seeing that at other partners as well, where they’re just not gonna invest in custom silicon. But yet they definitely wanna deploy Arm in their data center.

Brian: So earlier, you talked about how different data centers are today from the traditional data center of just a few years ago.

Look ahead for us three to five years. What’s a modern AI data center gonna look like then?

Eddie: What’s [00:22:00] really interesting is that we’re now starting to see a lot of these cloud providers, especially the newer ones, we call them neocloud providers. They sprang up, whether it was a CoreWeave, a Lambda, or companies like this that have gotten into the business because they were renting GPUs.

They would buy Nvidia gear and almost have a model of renting out GPUs. They’re all realizing that people are coming in and asking really just to be able to run and get tokens, AI tokens. Like, I don’t really want to go and set up an entire stack. I just really want tokens as a service.

And then on top of that, we’re starting to see that not every token that gets generated has the same value, okay? Meaning I may need tokens that have very low latency, and I’m willing to pay a premium if I get my work done faster. Or I may need something that runs [00:23:00] overnight, and it’s like a batch job, and I don’t really care how long it takes.

And so we’re starting to now see companies asking for tokens as a service, right? Arm, help me just light up tokens as a service. All I really want is an API, and suddenly I can start running queries on models, and I bring my own model. So I think what we’re starting to see is that we went from “Arm, give me IP” to “Arm, give me silicon” to “Arm, give me a rack,” and suddenly it’ll be “Arm, how do I get tokens as a service?”

So the level that we kinda move up the solution stack, I think will keep climbing, and that’s what’s super exciting for us, right? Is how do we continue to build products that make it easy for customers to get to the end goal, which is to start using AI effectively.
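
Editor's illustration: the idea that not every token has the same value, with some requests paying for low latency and others happy to run as overnight batch jobs, can be sketched as a two-tier priority queue. The tier names and the `TokenQueue` class are hypothetical and do not describe any Arm or cloud-provider API.

```python
import heapq
from itertools import count

# Hypothetical service tiers: a lower number is served first.
PRIORITY = {"low_latency": 0, "batch": 1}

class TokenQueue:
    """Serves low-latency requests before batch ones, FIFO within a tier."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker that preserves arrival order

    def submit(self, request_id, tier):
        heapq.heappush(self._heap, (PRIORITY[tier], next(self._order), request_id))

    def next_request(self):
        """Return the next request ID to serve, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

queue = TokenQueue()
queue.submit("nightly-report", "batch")
queue.submit("chat-1", "low_latency")
queue.submit("chat-2", "low_latency")
served = [queue.next_request() for _ in range(3)]
print(served)
```

The batch job was submitted first but is served last, which is the trade a customer makes when they do not care how long the work takes.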

Brian: So one more crystal ball question, and then I’ll let you get back to more important pursuits.

Five years from [00:24:00] now, what will seem obvious in hindsight that people still don’t fully grasp right now?

Eddie: I ask my team, “How many times did you use AI today?” What I tend to find is that there is this curve where you ask the new college grads, and they wonder why you’re even asking them that question, right?

But many of us who’ve been in the industry longer are starting to realize, “Hey, we’re probably not using it enough.”

I think in general we’re gonna see that in five years that’s not even a question we ask, right? And then on top of that, when we think of automation, I talked about how AI agents are helpful in automating workflows.

That automation is gonna extend to the physical AI plane, okay? So AI’s gonna enable robots and- Yeah … and we’re gonna in five years think that it was silly that we [00:25:00] never had thought of, why didn’t we have robots before? Because now they’re intelligent, and AI has helped make them reliable as well as intelligent.

And then we’re gonna have devices that can just automatically communicate, so you’re not gonna be tied to a cell phone or a different device because a lot of these agents are gonna be able to migrate to any device, and hopefully that means that those devices are running Arm, right?

Yeah. And they can seamlessly run work anywhere. So your agent that you kick off on your phone can run in your car, can run in the cloud, and the cloud can then ask a robot to also finish the job, in a sense. That’s what’ll be super exciting, right? And that’s really why the whole vision of Arm Everywhere, being in all of these places and having software run seamlessly in all these places, is gonna be a huge advantage for us.

I think we’ll see that vision really come together in five years.

Brian: Eddie, [00:26:00] I don’t know about you, but we are so fortunate to be living in this most consequential of times, at least in technology. Just astonishing. So thank you for your time. Thanks for giving us a little context.

Thanks for gazing into your crystal ball. Really appreciate it.

Eddie: Thank you, and a big thanks to everybody at Arm that’s making this happen, right? It takes a village, and it takes a lot of people willing to look out at what we can change and how we can do it, and how we can do it together and faster.

And so it’s an exciting place not just to be in technology, but to be here at Arm.

Brian: Absolutely.
