How Arm & NVIDIA Are Shaping the Future of AI and Data Centers
Summary
The data center landscape is undergoing a profound transformation as the industry works to deliver unprecedented performance and efficiency for AI and high-performance computing workloads. In this episode of the Arm Viewpoints podcast, Ian Finder from NVIDIA and Robbie Williamson from Arm share valuable insights into their collaboration, highlighting how this partnership is reshaping enterprise computing.
Breaking Free from Traditional Constraints
“NVIDIA can build all sorts of different things. We can innovate on microarchitecture… on I/O, on fabrics,” Finder says, describing how Arm’s standards-driven ecosystem enables NVIDIA to focus on innovation while maintaining software compatibility across platforms. This freedom allows NVIDIA to develop specialized solutions like the Grace CPU, with exceptional memory bandwidth for AI workloads, while allowing customers to use standard tools, operating systems, drivers, and more.
The result? A system that delivers what Ian describes as “2x the performance per watt of competing parts” while ensuring that containerized applications work seamlessly across environments. As Williamson puts it, “A binary running on an AWS cloud is going to run on NVIDIA Grace. You don’t have to do anything.”
Grace, Superchips, and Strategic Innovation
Williamson and Finder also discuss the NVIDIA Grace™ CPU Superchip, which packages two Grace SoCs with the industry’s first data center-class LPDDR memory for exceptional performance.
The conversation emphasizes the practical benefits for developers, with Finder noting that binaries running on AWS cloud will run on NVIDIA Arm platforms without recompilation. They explore how the Grace CPU is optimized for workloads that don’t map neatly onto GPU acceleration, such as ETL, graph analytics, and vector databases, while different Superchip configurations like Grace Hopper and Grace Blackwell offer flexibility in CPU-to-GPU ratios for various use cases.
Finder describes the benefit of Arm’s ecosystem: it allows each company to innovate in its areas of strength. Both speakers envision a future where organizations embrace multi-architecture environments strategically, starting with workloads that benefit most from the combined strengths of Arm’s efficiency and NVIDIA’s accelerated computing innovations.
For developers and organizations exploring these technologies, the message is clear: the future of high-performance computing combines the standardization and efficiency of Arm with NVIDIA’s innovation in accelerated computing, delivering unprecedented performance for tomorrow’s most demanding workloads.
Come see us at NVIDIA GTC next week or visit this page for more info on the collaboration.
Speakers

Ian Finder, Group Product Manager, NVIDIA
Ian Finder is a Group Product Manager at NVIDIA, where he is responsible for NVIDIA’s data center CPU product lines, including the Grace and Vera chips, which are available both as standalone compute products and with GPU acceleration in platforms like the Grace Hopper Superchip. He specializes in the development of “technically rewarding” products, including cutting-edge hardware and software solutions for complex computing challenges.
Before joining NVIDIA, Finder played a crucial role at Microsoft Azure, where he led the development of the N-series accelerated IaaS products. Notably, he spearheaded the creation of AI supercomputing infrastructure that supported some of OpenAI’s earliest scale-up training. His team’s work resulted in Azure’s first public cloud VM offering to rank among the top 10 supercomputers globally.
Earlier in his career, Finder worked on Microsoft’s Office 365 platform, where he architected a variety of cloud-scale, data-intensive services. He holds a BS in Computer Engineering from the University of Washington’s Paul G. Allen School of Computer Science & Engineering.

Robbie Williamson, Senior Director of Engineering, Infrastructure Line of Business, Arm

Brian Fuller, host
Host Brian Fuller is an experienced writer, journalist and communications/content marketing strategist specializing in both traditional publishing and evolving content-marketing technologies. He has held various leadership roles, currently as Editor-in-Chief at Arm and formerly at Cadence Design Systems, Inc. Prior to his content-marketing work inside corporations, he was a wire-service reporter and business editor before joining EE Times where he spent nearly 20 years in various roles, including editor-in-chief and publisher. He holds a B.A. in English from UCLA.
Transcript
Key Highlights:
- [00:01:00] Ian explains how Arm’s standards-driven ecosystem enables NVIDIA to innovate on CPU microarchitecture, IO, and fabrics while maintaining software compatibility.
- [00:02:48] Clarification of Grace vs. Grace Superchip, with the Superchip packaging two Grace SoCs together with industry-first data center-class LPDDR memory.
- [00:04:30] Discussion of NVIDIA’s custom Scalable Coherency Fabric providing twice the bisection bandwidth of competing products with 72 cores on a single NUMA domain.
- [00:06:40] Robbie highlights the seamless software compatibility where “a binary running on AWS cloud is going to run on NVIDIA Arm” without recompilation.
- [00:09:00] Explanation of optimized workloads for Grace CPU, including ETL, graph analytics, and vector databases that benefit from the architecture.
- [00:12:50] Ian discusses different Superchip configurations like Grace Hopper and Grace Blackwell, offering flexibility in CPU-to-GPU ratios for varied workloads.
- [00:16:00] Robbie describes Arm’s unique partnership model that creates a “crowdsourcing the best” approach to chip design.
- [00:27:00] Both speakers emphasize the future focus on allowing each company to innovate in their areas of strength within the multi-architecture data center environment.
Brian: [00:00:00] So Ian, Robbie, welcome. Thanks for joining us today. How are you guys doing?
Robbie: I’m doing great. It’s Friday. The weekend’s coming up and this is good for me.
Ian: Doing all right too. Can’t complain. Can’t complain. Thanks for having us on.
Brian: Yep. Yep. Thanks for your time. So there’s been a lot of buzz lately about how NVIDIA and Arm are reshaping the data center landscape together, particularly with the Grace CPU and the various CPU architectures.
So we want to do a little deep dive to better understand this collaboration. So let’s start at a high level. Give us an overview and I guess we’ll start with Ian. Give us an overview of how Arm and NVIDIA are collaborating on next generation computing solutions and why the world should care.
Ian: Yeah, I think it’s great to have a standards-driven ecosystem, which is what Arm is doing with things like Arm ServerReady, that allows us to [00:01:00] innovate on the CPU. We can build all sorts of different things. We can innovate on microarchitecture. If we choose to, we can innovate on IO, on fabrics.
We can choose where we want to add value. And that’s typically, like you’ll see with Grace Hopper, in an accelerated compute ecosystem, right? Where we built a CPU around exceptional memory bandwidth per watt, exceptional memory bandwidth per dollar, for AI/DL workloads that need that kind of memory bandwidth and that kind of connectivity into the CPU fabric.
And when we do all that, we have fundamentally a chip that can still run the same software as everyone else. And so we’ve moved into a world where we’re free to innovate on microarchitecture, on IO, on fabric, on needle movers, and we don’t have to break the mold on things like software compatibility, driver compatibility.
If someone takes a Docker container from AWS and they run it on Grace, it’s going to work great. And that’s a really powerful thing to be able to do. [00:02:00] And it’s something that historically, if you look at some of the great architectures of the past, frankly, the industry has struggled with, right?
You look at things, I’m not going to pick on anyone specific, but you can go back to the 80s and see this rich ecosystem of really great, varied CPU ideas that were totally incompatible with each other from a software ecosystem perspective. And now we don’t have that friction.
We can all build our best product for our best use case and share that ecosystem.
Brian: So Neoverse is a foundation for the Grace Superchip architectures. What makes it particularly well suited for that? And perhaps first, for the unfamiliar, could you describe the difference between Grace and the Grace Superchip?
Ian: Yeah. So Grace refers to our first-generation CPU product for the data center. Grace Superchip is a way that we package Grace. So Grace is the chip, Grace is the SoC. And when we have things like the Grace Superchip, that’s [00:03:00] effectively two Grace SoCs on a very fast socket-to-socket link, or chip-to-chip link.
Put onto a module with the industry’s first data center-class deployment of LPDDR memory. That’s a huge thing for us. And what Superchip gives you is it takes that Grace technology and packages it into something that embodies the essence of, let’s say, a high-end two-socket x86-like server: similar kinds of performance, similar kinds of memory bandwidth.
And that Superchip module, because of the design point we’ve used, because of the LPDDR memory, is actually in many cases 2x the performance per watt of competing parts. We’re able to do that at a system level, right? Your SoC is only as good as the system around it: you can build the most power-efficient SoC in the world, and if the memory takes a bunch of power, you’ve only moved the needle this much.
So Superchip was a way for us to get the technology of the memory, the packaging, the socket-to-socket link, the CPU into the market without having to build a whole new ecosystem of servers in one go. And I think what you’ll see [00:04:00] with other products we’ve announced, such as C1, is a more diverse set of form factors as the industry ecosystem gets on board with Grace: things like single-socket designs and chip-down designs that you’re starting to see as well.
So Superchip was a great go-to-market vehicle that allowed people to move quickly. And there was a question on Neoverse in there as well: in the Neoverse IP, we found a core that was, yeah, pretty good. A pretty good data center-class core. And for our use cases, as I mentioned, we like to pick where we innovate. There’s all these different places we can innovate. That’s the cool thing. We could do the microarchitecture, do the I/O. With Grace, we chose to innovate really on the fabric.
What we found is that for a lot of the workloads we wanted to target, we wanted high-bandwidth core-to-core communications, for the types of things that don’t necessarily move to the GPU, right? Sparse, branchy, memory-intensive workloads, lots of fork-join concurrency. So we built a totally custom NVIDIA fabric that we call our Scalable Coherency Fabric, [00:05:00] with twice the bisection bandwidth of competing products.
It’s 72 cores on a single NUMA (non-uniform memory access) domain. There are no sub-NUMA islands. This is a really great fabric. And by using Neoverse, it allowed us to spend our design cycles on building that fabric, which is partially how Grace was able to get the exceptional performance that it did.
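Ian’s single-NUMA-domain point is easy to check on a live system. Here is a minimal sketch, assuming a Linux host that exposes the standard sysfs NUMA layout; core and node counts will of course vary by machine:

```python
# Sketch: count NUMA nodes and the CPUs in each, to see whether all cores
# share one NUMA domain (as Ian describes for Grace) or are split into
# sub-NUMA islands. Linux-only; paths are the standard sysfs layout.
import glob
import os

def cpus_in(cpulist_path: str) -> int:
    # cpulist looks like "0-71" or "0-35,72-107"; expand the ranges and count.
    with open(cpulist_path) as f:
        text = f.read().strip()
    count = 0
    for part in text.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            count += int(hi) - int(lo) + 1
        else:
            count += 1
    return count

nodes = sorted(glob.glob("/sys/devices/system/node/node[0-9]*"))
print(f"online CPUs: {os.cpu_count()}, NUMA nodes: {len(nodes)}")
for node in nodes:
    print(f"  {os.path.basename(node)}: {cpus_in(os.path.join(node, 'cpulist'))} CPUs")
```

On a single Grace SoC you would typically expect one node covering all 72 cores, and on a Grace Superchip two nodes, one per SoC, rather than a patchwork of sub-NUMA islands.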
Brian: Robbie, we have some questions for you later on, but feel free to jump in here if you want.
Robbie: No, you’re good. You’re good. I have the same problem, so I totally understand. It just comes with the roles, right? But no, I think Neoverse, specifically the V series: the mindset of Arm going into that design was high-end, HPC-level, big compute, and it was a perfect marriage for NVIDIA to go after the V-series IP and then marry it with their expertise, in terms of the mesh and, obviously, the GPU. And I think the cool thing that happened was: yes, there are workloads that you absolutely need to move to the GPU, we all know this, especially from the AI [00:06:00] perspective, but being able to take full advantage of your general-purpose compute workloads in the same system is where you get the benefit of Neoverse.
Because it’s the same line that we’ve pounded on with the cloud partners and really iterated on and learned from, so that NVIDIA could take that IP and take advantage of what we’ve learned. I think the thing that Arm brings to the table is that, with all the hyperscalers, we’ve learned a lot.
You could put that into the IP so that NVIDIA can actually produce the highest-powered Arm server you can buy off the shelf right now, de facto. And again, just to reiterate the point around multi-architectures and the eighties and nineties, having worked at previous companies that did other architectures and lived that pain: it’s always the software. No matter how great the architecture is, if you don’t have that ecosystem around it, if developers have to recompile, if it doesn’t work, it doesn’t take off. I don’t care how great the hardware is. Now, I’m a software guy, of course I said that, but at the same time, it’s true.
And I think the cool part is that, as you said, it’s going to work. A binary running on an [00:07:00] AWS cloud is going to run on NVIDIA Arm. You don’t have to do anything. It may run better if you tweak it, but there’s no recompile, there’s no “oh, there’s a library mismatch.”
No, Arm is Arm now. And in the past it really wasn’t, and that was really painful if you’ve lived through the previous iterations of the Arm server adventure as we have.
Ian: Yeah. Yeah. One of the things I think people actually miss, that you touched on, Robbie, is that people always think about this at the application and kind of container level, which used to be, as you mentioned, kind of a wild west; that’s now a very easy story to tell. But I don’t think people realize how deep down into the stack that compatibility has actually gotten. If NVIDIA provides a driver for our ConnectX series NICs on Arm, that driver will run on a Grace platform.
It will run on any other Arm platform. If we write a PCI Express GPU driver, all the low-level systems code works the same way, all the way up into the workloads. And that’s [00:08:00] huge.
Robbie: Yes. Yes. Because in another prior life I was in the distro world of Linux, and having to have separate kernels for different Arm variants of servers is not ideal when the competition just had one, right?
So it was like, Oh yeah, exactly. Exactly.
Brian: So Robbie, you run Ironman triathlons, and when I think of the Grace CPU Superchip, I think Ironman triathlon. It’s more of a question for you: what kind of server workloads is the Grace CPU being optimized for these days?
Ian: I think what you have to do is you have to look at the incredible growth in accelerated computing.
For our customer base and for the kinds of workloads that people do on accelerated computing, AI/ML/DL workloads, we continue to be able to get, frankly, 2x year-over-year kinds of gains, through employing a bunch of optimizations that we can do in the parallel compute world, things like mixed precision, things like sparsity. And we’re able to drive incredible progress gen on gen in [00:09:00] GPU performance.
And when you do that, there’s these other workloads that people are doing near their GPUs, things like ETL, like graph analytics, vector databases, right? There’s this entire world of structured data that doesn’t always get the benefit of those immense parallel compute optimizations that go into the GPU.
And our goal is really to build a CPU for an AI-focused, AI-centric world. And so it’s great at things like breadth-first search algorithms; they love our fabric. They fan out, they fan down, they fan up, they fan down. You can get something like literally 2x the performance per socket versus some of the best competition out there with those types of workloads.
Same with things like ETL on structured data. And there’s a very conscious thing because it hey an unaccelerated lens on structured data. These are the [00:10:00] workloads that are going alongside large GPU deployments, but you can actually take it a step further and you can think about the potential now with again being able to have a standard CPU architecture but being able to have shared memory with the GPU coherent memory with the GPU.
Extremely high bandwidth, the GPU relatively low latency to the GPU. You can also start to see how the line between what a CPU workload and GPU workload is can become blurred a little bit, because now, you can do that portion that might be, branching on the CPU and they can hand a pointer up to the GPU.
It’s a very powerful programming technique. I think one of the things that people actually get wrong a lot of the time about coherency is they think of it as purely a performance thing, not a developer productivity thing. So these architectures, at their simplest point, go after the set of workloads that are on the CPUs today, workloads that scale as you build more GPUs.
And at a slightly more complex point, they provide a demarcation line where you can take that workload and [00:11:00] split the part that wants to be on the CPU from the part that wants to be on the GPU, and be very productive in how you split it. So it’s two worlds of what we’re doing with that, if that answers the question.
Yeah.
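A hedged sketch of the split Ian describes, in PyTorch because the same script runs unchanged on Arm and x86 hosts: the branchy, data-dependent filtering stays on the CPU, and only the surviving rows are handed to the GPU for the dense math. On a coherent platform like Grace Hopper that hand-off can avoid copies entirely; this portable version uses an explicit transfer and falls back to the CPU if no GPU is present:

```python
# Sketch: do the branchy, sparse part of a pipeline on the CPU, then hand
# the result to the GPU for the dense, parallel part. On coherent CPU+GPU
# platforms the hand-off can be (near) zero-copy; here it is an explicit .to().
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# CPU side: irregular, data-dependent filtering (a poor fit for the GPU).
records = torch.randn(1_000_000, 16)          # stand-in for ETL output
keep = (records[:, 0] > 0) & (records[:, 1].abs() < 1.5)
selected = records[keep]

# GPU side: dense matrix math on whatever survived the filter.
weights = torch.randn(16, 8, device=device)
scores = selected.to(device) @ weights
print(f"kept {selected.shape[0]} rows, scored on {device}")
```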
Brian: Yeah. Let’s talk a little bit more. Actually, I cut you off there, Robbie. You want to jump in?
Robbie: I was just going to add, yeah, I think he hit it spot on: what can’t the machine do now? It’s been optimized so well for AI/ML/DL, right? So yes, you’re taking full advantage of the GPU.
You have this amazing CPU, and now you can take advantage of that as well. And timing-wise, we’re getting close to MWC. NVIDIA is a great partner there; we have the AI-RAN story going on in our booth. Now I can run these machines at the edge and they’re doing more: I’m getting AI/ML services close to the end user, and I’m running the normal services that I’d normally run on the server, but now I don’t have to do two things.
It’s like a consolidation and optimization, getting the best of both worlds in a single box, right? Like you said, it’s very efficient now: a developer doesn’t have to go, oh, let me switch over to this server to do this, and oh, the ML stuff, I go over there. Now it’s all in the [00:12:00] same thing.
So it’s very, it’s way more efficient in terms of being able to deliver what you need to.
Brian: So let’s talk a little bit about Grace Hopper and Grace Blackwell, the super chips there. How do they handle different AI workloads, whether it’s training or inference?
Ian: Yeah. The first thing I would say on that is, earlier in the call we talked about the Grace Superchip, and I want to clarify our branding a little bit before I answer your question. The Grace Superchip we were talking about earlier is the CPU Superchip: the heart and soul of a two-socket compute platform on a module. There are other ways that we compose Superchips, and this will segue into the answer to your question as well.
Yeah. We have different compositions of things, like the Grace Hopper Superchip, where you saw a one-to-one coupling of a CPU and a GPU together with a certain amount of bandwidth between them, moving into the Grace Blackwell Superchip, a different module that has a different subscription ratio of CPU to GPU.
And that’s the neat thing about looking at this at a system level, [00:13:00] right? The Superchip is all about the system. We can actually play around with the topologies, with what we’re doing, to make different value propositions for people with different workload demands. Take a customer doing co-located inference, where they want to actually put the web server, the preprocessing, right there; if they’re looking at image data, they want to resize it, they want to scale it, something like that.
There are all sorts of different ways you can architect this. But that customer might prefer to co-locate some of their inferencing infrastructure right next to the GPU that’s doing it. They might be compelled to go to something like a Grace Hopper Superchip. Then you have something like the Grace Blackwell Superchip, and you’ll see the products we’ve announced today have a little bit of a different ratio of CPU to GPU power.
Those might skew toward training scenarios. But on this particular call, I don’t want to be too prescriptive about what these things are for, because I actually think the message of accelerated compute is all about people being able to be more inventive in how they want to place their work, how they want to divide their workloads, and having that Arm CPU, a very capable [00:14:00] CPU at a very low power, with great memory bandwidth, co-located with your GPU infrastructure.
You get to be a lot more creative. Hey, maybe I want to move some of these services that I was going to run somewhere else in the data center; maybe those move right next to the GPU. Maybe the workloads that I was running on my CPU start to split, and we move some of those to the GPU, because now we have coherent programming.
So I’m not, on this call, going to be really prescriptive about what topologies are for what workload, because I actually think there’s far more opportunity now than there was when the GPUs were traditionally at the end of a PCI Express link and you would copy data to them. That put a much more definitive set of rules around what topology was for what workload than what we have now.
Brian: Robbie, here’s one for you. We’ve heard a lot about the structure of these devices, the utility, the workloads, the flexibility. Walk us through how you see Arm’s architecture enabling the [00:15:00] customization and power-efficiency advantages in these solutions.
Robbie: Sure. Be forewarned, you’re asking the software guy to talk hardware, but I’ve worked at hardware companies my entire life, so I live the fine line. No, I think it’s funny, because Arm’s architectures have always come from a heritage of understanding power and performance, and the criticality of low power and high performance, right? We’ve been bred up from those roots, from where we started.
I would say that one of the big advantages of Arm is really the approach we take to licensing IP and our partner ecosystem, such that we’re able to learn so much and feed that into the products we provide: by working with all these various hyperscaler partners, understanding their critical workloads, what they’re trying to do, feeding that into our IP pipeline, and creating the products that come out of the Neoverse and V-series pipeline.
We’re able to learn, [00:16:00] to do something that other people can’t, because they’re not opening up their IP to these partners. So we’re learning, we’re getting the best of all these partners, and then we pull it together and offer it to a new one, such as NVIDIA. And then they can tweak it to their own needs.
By the way, this is what we learned. These are the workloads that we’ve already run in our pre-silicon projections; we know about these workloads. Oh, but you’re adding all these AI things? Awesome. Okay, so how do we work together there to make your product even better, based off of the data you feed us? I think that’s the real kicker: it’s the partnership, the ecosystem. Together we build these great products, and you’re benefiting from it. It’s like the whole open-source approach, but obviously it’s not open source; it’s crowdsourcing the best.
And then we deliver that to you to customize to your own unique needs. It’s the best of both worlds there. And so I think that’s what Arm really brings. And again, it’s based off of our heritage of power and performance: you can’t just run outright without any concern over power, especially now that these data centers get larger and larger.
Power consumption is important. And as [00:17:00] someone who has to sell it, NVIDIA is like, look, we definitely need to deliver performance, but we are conscientious about the power that it takes, and the cooling and everything else. So I think that’s what Arm really brings to this: that flexibility and crowdsourcing.
Brian: So Ian, how did the software guy do on that one?
Ian: I was going to say, yeah, I really like a couple of aspects of Robbie’s answer; they stick with me. And part of that, you guys can edit this out if you want, on this particular topic: in an age where the instruction set architecture sits underneath all the abstractions on top of it, I don’t believe there’s anything inherently power efficient about Arm as an ISA. And that’s a common misconception. People come up and they say Arm is inherently more power efficient, and it’s not. And that’s not a knock; that’s what I love about Robbie’s answer. Because what Arm does do is it frees us to put forth products with different value [00:18:00] props, particularly different value props around performance per watt.
It frees us to really look at our system design and figure out where to get all the power back, and put something forth in the market that is differentiated in those respects, without having to worry: how the heck are we going to boot this thing? Where am I going to get the driver for the NIC? How is someone going to run Python on it?
So I actually think it’d be great.
Robbie: Yeah. Let’s be clear: there are other architectures out there that also claim power efficiency, but their software ecosystem is not as robust. It’s the wild west that we’ve come from. And that’s ultimately where things start to break down, and where the work that Arm has done and NVIDIA has done in the ecosystem, all this work on the software side, pays off for everybody.
Brian: Okay. So Robbie, a software question for you. Let’s pretend I’m a developer and, for some amazing reason, I’m new to this whole world of NVIDIA and [00:19:00] AI and developing on NVIDIA devices and Arm devices. Where do I start?
What software tools and resources are available for me to get started?
Robbie: What’s not available for you? And to be honest, really, no joking aside, especially with NVIDIA and the CUDA ecosystem and all the work they’ve done, it’s a no-brainer in terms of where you go. And yes, there are tons of open-source tools as well; PyTorch, as I think Ian mentioned before. It’s more a question of what doesn’t work, and that’s a slim amount of things. And if something doesn’t, it’s probably on the way to being optimized and ported, and so forth.
Yeah. And access to the hardware has gotten a lot easier, with things like DIGITS and obviously the clouds; access was probably the bigger hurdle. There’s been so much work done by our partners in this space, like NVIDIA and Amazon; they’re great open-source contributors for Arm.
And because it now all works across the board, I don’t have to have [00:20:00] PyTorch for NVIDIA and PyTorch for Cobalt 100 on Azure and PyTorch for Amazon; it’s all the same binary, in terms of working. I think now the question really is: please find the stuff that doesn’t work, because we want to know about it.
But you shouldn’t really stumble, outside of some of the newer tools that are being developed. And now those are being developed with a multi-architecture mindset: it’s not just one architecture you need to support when you develop these tools. And now there’s CI/CD and GitHub runners in place for all of this to exist, right?
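To Robbie’s “it’s all the same binary” point, the only architecture-specific detail most developers ever see is the platform tag. A small illustrative script, assuming nothing beyond the standard library (torch is optional), that can run unchanged on an x86 or Arm CI runner and simply report what it landed on:

```python
# Sketch: report what this runner or instance actually is. The same script
# runs unchanged on Graviton, Cobalt, Grace, or an x86 box; only output differs.
import platform
import sysconfig

print("machine      :", platform.machine())        # e.g. 'aarch64' or 'x86_64'
print("platform tag :", sysconfig.get_platform())  # e.g. 'linux-aarch64'
print("python       :", platform.python_version())

try:
    import torch  # optional: confirm the framework story is the same, too
    print("torch        :", torch.__version__,
          "| CUDA available:", torch.cuda.is_available())
except ImportError:
    print("torch        : not installed")
```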
Brian: So let’s assume I’m hearing that all the cool kids are developing on Arm-based solutions, but I’m not right now. How do I migrate there? What’s your perspective on porting my applications over? How does that work?
Robbie: It depends on what you’re porting. In some cases, using a cloud instance is fine because you’re at such a high level: you don’t really need to know a lot about the hardware underneath, or you don’t need to do a lot of prescriptive [00:21:00] things. In other cases you may need an NVIDIA server, or the scaled-down version, DIGITS, which is pretty awesome, by the way. But I think it depends on what kind of developer you are.
If you’re an OS developer, if you’re a low-level library developer, you’re going to want bare-metal hardware, and you have options there: an Ampere server, or, if you’ve got big bucks, you can buy an NVIDIA Grace server, or now DIGITS is out there, which is like…
Ian: You’d be pleasantly surprised at our pricing.
Robbie: But I’ve bought the servers. And DIGITS, it’s awesome, right?
Like, I really get geared up for that. It’s like the Raspberry Pi of Grace, in a way. That’s pretty cool as a developer. And then obviously there are laptops coming in this space, and if those land, that’s another big push. So it’s all about access to hardware. But I think as a developer, it’s really about: where are you targeting?
Can you just run off a cloud instance? Then go do that; that’s probably the easiest thing financially for you. But again, a lot of people need to develop in a corner with no internet, on a plane, and then you need a device. There are Arm-based laptops coming, and again the DIGITS platform, which, [00:22:00] I love that type of hacky stuff, like Jetson Nanos and all that.
So I’m all about it. But yeah, Arm has been running this Works on Arm program; you can look that up, and there are a lot of different options from partners and ways to get access to their hardware, either remotely or in your hands. That was a problem, say, three or four years ago, but it’s gotten a lot easier in terms of getting access to the hardware to do the development you need.
Brian: So that FUD you hear out there, that porting to Arm is hard, is just that?
Robbie: That’s a tricky question. Porting software to any architecture is easy; having it run well is the hard part. And I’ve done this, I’ve ported across different endiannesses, right? You recompile, you port, and it’s pretty straightforward, unless you have some weird assembly language in there, or intrinsics.
But it’s not that easy in the full sense: you still have to then look at that software, and if you want it to perform well, really understand the architecture you’re running on. And that takes a little nuance. [00:23:00] There’s a lot of material out there that Arm provides, learn.arm.com, and NVIDIA provides a lot of materials.
To understand how to tune your applications. But that’s generally what you’re going to have to do anyway, even on x86 platforms, when you move from one to another, to really get the best performance out of it. But porting? Yeah. Go ahead and hit me.
Ian: That was exactly what I was going to say. Robbie’s giving you the point of view of someone who’s actually sitting there porting the stuff. Most of the people who think this whole “porting to Arm is hard” thing are generally enterprise customers or academic customers, things like that. And if you go look at their stack, all the key areas of growth in their infrastructure are free, open-source software: Linux, Apache Spark, Hadoop. It’s all FOSS stuff that exists, and someone has actually done that hard work that Robbie talked about, sat there on their laptop playing golf with all the code generation tools. And for [00:24:00] those tools, it really is just about the same level of tuning as you do on x86.
But if you’re running on, and I’m not picking on AMD here, if you’re running on an AMD platform, you have to worry about sub-NUMA islands, right? You get eight cores per chiplet. You have to worry about sub-NUMA islands, and in a lot of configuration cases you have to tune things: oh, where in the chip do my containers live?
That kind of stuff is actually, in some ways, harder for the end user to deal with, because the end user is the one making those deployment-type tuning decisions. And most of what our customers need, in terms of actually having good, optimized binaries on Arm, is there today.
That’s true for all of the free, open-source software ecosystem, and for ISVs, more and more every quarter: pretty much every ISV we talk to has a decent Arm build in house, whether they’re talking about it yet or not. And that’s tremendous to see. And the reason for that is high-performance Arm products entering the market in the last couple of years.
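Both answers come down to knowing what the machine underneath actually offers before you tune. A minimal, Linux-only sketch that reads /proc/cpuinfo to see which Arm ISA features the kernel reports (for example SVE or the dot-product extensions) before enabling an optimized code path; the feature names are the kernel’s own:

```python
# Sketch: check which Arm ISA features the kernel reports before picking an
# optimized code path. Linux-only; on x86 the same file lists x86 flags instead.
def cpu_features() -> set[str]:
    feats: set[str] = set()
    with open("/proc/cpuinfo") as f:
        for line in f:
            key, _, value = line.partition(":")
            if key.strip().lower() in ("features", "flags"):
                feats.update(value.split())
    return feats

features = cpu_features()
for wanted in ("sve", "sve2", "asimddp", "bf16"):
    print(f"{wanted:8s}: {'yes' if wanted in features else 'no'}")
```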
Robbie: Yeah. I think one thing I would add is also again, especially at the enterprise level, [00:25:00] you do run into legacy apps where maybe the developer left and they can’t,
Ian: That’s true. But I have a certain opinion here. I’ll go back to something you said, like “if you’ve got big bucks, you can buy Grace.” Grace is actually very inexpensive, and the reason it’s inexpensive is the performance it has on analytics workloads, on data-warehousing-type workloads. I promise you, it’s the least expensive thing you can buy in the long run. Absolutely. And those types of workloads are the engine of all the data center growth.
So just like today, you go to a large customer and they have a mainframe somewhere, and it’s running their line-of-business transactions and producing a dataset. And all of the research and analysis and insights they’re building from that enterprise data, those are growing in places like free and open-source ecosystems, or accelerated ecosystems, alongside the data.
At the end of the day, that thing runs a batch job and produces a bunch of data, and then the heavy compute, the [00:26:00] real work, happens. All of that is Arm-native today, and it’s growing. A lot of it is accelerated; even more of it is being accelerated. And it’s really the insights on the data sets being produced by those legacy applications that I believe are the real growth area.
Brian: So let’s look ahead, because we’re going to be bumping up on time here pretty soon. Where do you guys see the biggest opportunities for innovation in this partnership?
Ian: I think just continuing to offer a really solid foundation that frees our cycles to focus on our systems, on our hardware architecture, while still having something that the rest of the world can use.
Robbie: Yeah, I would agree. I think, as unsexy as it sounds: let us do the commodity part for NVIDIA, which is the CPU.
It’s great, but let them focus. [00:27:00] I think the more we can each focus on the parts that we do best, the better it is for both sides, in a sense. And as accelerated computing gets more widespread use, and as we approach the AGI and, you know, Skynet world, the fact that we’re allowing a partner like NVIDIA to innovate and iterate on the parts they care about most, while having a reliable core that they know is going to be there and that they can still push towards their needs, I think it’s just: keep doing what’s working.
Brian: Right, in that sense. So what should developers and organizations be thinking about as they plan their technology roadmaps? I’d like to oversimplify it and say it’s up and to the right, but that’s probably oversimplifying it.
What do you guys think?
Robbie: That’s a loaded one. Do developers really plan the technology roadmaps? Or is it, like, the total…
Brian: How about
Robbie: decision makers? I wish [00:28:00] developers did. Yeah, I’m sure some developers wish they did, right? Yeah.
I think: plan for an environment where multiple architectures are first-class citizens. You know, I’m not going to be naive and think x86 is going away. But you have to plan. As Arm becomes more prevalent, and accelerated workloads too, you need to really architect your solutions, your data centers, everything you do, around a true multi-architectural world. What does that mean?
CI/CD, maintenance, everything has to be thought through. And not a lot of folks are really living that yet. To be fair, x86 has dominated for so long in the data center that a lot of folks have never had to do this, unless you’re familiar with maybe IBM Power or mainframe, but those are niche areas, like finance and a few other things.
If so, you’ve lived that. But we’re dealing with partners now, as they adopt Arm internally and have their own Arm products, and they’re like, yeah, this isn’t a lift and shift, this is an expansion. And so now we’re having to reconsider the whole way we approach things. And I [00:29:00] think that’s just something we’ve really got to start thinking about.
You’re living in a world where you’re going to have to be aware of the underlying architecture to an extent, because you’re going to be placing workloads and so forth, but…
Ian: It’s also not as daunting as it sounds, because, yeah, that’s true, it’s not recompile everything and all of that.
You’re right, it’s just: be aware of it. You start where it makes sense. Oh, I’m doing all this accelerated compute, I’m doing all this analytics, I have a bunch of products that I’m using for that. Let’s buy some Arm compute servers. Let’s buy some Arm accelerated servers.
And maybe that partition can be an Arm-only partition. You’ve got a huge handful of vendors; you’ve got NVIDIA now selling CPU products. If you need more CPUs, buy more CPUs to go alongside your GPUs, and start to think of where you can group these birds of a feather together. Because supporting multiple architectures in your data center doesn’t mean that you’re doing, like, VDI on Arm today, right?
Robbie: Right. And every application has to work across every architecture? No, [00:30:00] you can be very selective about it, for sure.
Yeah. Yeah.
Ian: And I think what’s a recent development is that some of those pillars could easily be Arm-only, or even NVIDIA-only, pillars at this point, now that we have a data center-class CPU offering. Hey, we’re only maintaining this particular distro, this particular set of libraries, these particular binaries: that’s just going to be Grace, Grace CPU, Grace GPU. And the task becomes much less daunting when you don’t have to have multi-architecture support in every swim lane.
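One concrete way teams carve out the Arm-only “pillar” Ian describes is at the scheduler: in Kubernetes, for example, a workload can be pinned to arm64 nodes with the standard architecture label, so only the services you have chosen to migrate land on the Grace (or Graviton, or Cobalt) partition. A hedged sketch, written as a plain Python dict so it stays in the same language as the other examples; the image name and resource numbers are placeholders:

```python
# Sketch: a Kubernetes Pod manifest (as a Python dict) that pins one workload
# to arm64 nodes via the standard kubernetes.io/arch label. Everything else in
# the cluster can stay wherever it already runs.
import json

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "etl-batch-arm"},
    "spec": {
        "nodeSelector": {"kubernetes.io/arch": "arm64"},  # the Arm-only "pillar"
        "containers": [
            {
                "name": "etl",
                "image": "registry.example.com/etl-job:latest",  # placeholder image
                "resources": {"requests": {"cpu": "8", "memory": "32Gi"}},
            }
        ],
    },
}

print(json.dumps(pod, indent=2))  # pipe into `kubectl apply -f -` if desired
```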
Brian: All right. You gentlemen have been incredibly generous with your time. Thank you so much. We look forward to seeing you at GTC and best of luck this year. Thank you.
Robbie: Awesome.
Brian: Cheers.