Arm Newsroom Blog

Llama 4 Runs on Arm

Meta’s Llama 4 delivers impressive performance and seamless deployment with its Mixture of Experts (MoE) architecture on Arm's power-efficient platform.
By Arm Editorial Team

AI is moving faster—and getting smarter. Today’s open large language models are not only powerful but also designed with real-world deployment in mind: they’re lightweight, cost-efficient, and built to scale across billions of devices. In short, they’re ready for just about anything developers can imagine. 

The launch of Meta’s Llama 4 is a great example—especially when you consider what it can do on Arm-powered platforms. With its innovative Mixture of Experts (MoE) architecture, Llama 4 delivers impressive performance in areas like multimodal reasoning, tool use, and more. But what really makes it stand out is how easily it can be deployed in real-world scenarios—thanks in large part to Arm.

Optimized for performance, ready for deployment

Arm’s flexible, power-efficient compute platform lets Llama 4 run efficiently on Arm-based cloud infrastructure, so developers can deploy large language models with strong performance, lower power usage, and greater scalability across diverse cloud environments.

At a broader level, we’re seeing an interesting shift in the industry. While the push towards larger and more intelligent multimodal models continues, Llama 4 represents a rising trend toward smaller, practical models that enterprises and customers can run on their own infrastructure—whether in the cloud or on-premises. Llama 4, particularly the Scout model, is efficient, focused, and structured around agentic and MoE architectures that are exceptionally well-aligned with cost-efficient, scalable platforms like Arm. 

Since the release of Llama 2, Arm has worked to optimize model compatibility across its platforms, ensuring that developers and end users can efficiently deploy each new generation of Meta’s Llama models. Llama 4 Scout is a clear example of these optimizations in action, running seamlessly across the Arm ecosystem.

Llama 4 Scout is a milestone for Arm-based systems 

We’re proud to announce that Llama 4 Scout runs efficiently on Arm-based infrastructure. To validate this compatibility, we successfully deployed Llama 4 Scout on Arm-based Graviton4 using the open source inference engine llama.cpp. This straightforward deployment demonstrates that developers can seamlessly integrate advanced AI capabilities without needing specialized hardware or proprietary software. With vertically integrated frameworks like llama.cpp and general-purpose ML tools like PyTorch, the path to production is clear and accessible. 
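As a rough sketch of what that deployment looks like, the steps below build llama.cpp from source on an Arm-based (aarch64) instance such as AWS Graviton4 and run a quantized model with the llama-cli tool. The GGUF filename is a placeholder, not an official artifact name, and thread counts should be tuned to the instance.

```shell
# Sketch only: build llama.cpp on an aarch64 instance (e.g. AWS Graviton4).
# The build system auto-detects Arm features such as NEON and SVE.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j"$(nproc)"

# Run inference with a quantized Llama 4 Scout GGUF.
# NOTE: "llama-4-scout.Q4_0.gguf" is a placeholder filename; supply your own
# converted/quantized model file.
./build/bin/llama-cli \
  -m llama-4-scout.Q4_0.gguf \
  -p "Summarize the benefits of Mixture of Experts models." \
  -n 128 \
  -t "$(nproc)"
```

Because llama.cpp is a plain CMake project with no GPU or vendor-specific dependencies required, the same commands work across Arm-based instances from different cloud providers.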

Why Mixture of Experts (MoE) architecture is ideal for the Arm platform

  • Intelligent Efficiency: MoE models intelligently route inputs to specialized subnetworks, dynamically allocating computational resources. This adaptive approach naturally complements Arm’s renowned energy efficiency and resource-conscious workload management. 
  • Scalable by Design: Arm platforms, such as AWS Graviton, Google Axion and Microsoft Cobalt, offer scalable core counts and threading capabilities ideal for the parallel nature of MoE models, effectively managing workloads to maximize both throughput and overall efficiency. 
  • Optimized for Diverse Workloads: Arm’s architecture philosophy emphasizes performance and efficiency across varied applications, closely aligning with the MoE capability to compartmentalize and specialize tasks within subnetworks. 
  • Forward-Looking Alignment: Together, Arm platforms and MoE architectures represent a forward-thinking synergy, equipped to meet evolving demands for smarter, more resource-efficient AI solutions. 
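To make the first point concrete, here is a minimal, self-contained sketch of MoE top-k routing. The experts, gate weights, and input are toy values invented for illustration; they are not Llama 4’s actual architecture or parameters. The key property it demonstrates is that only the selected experts run, so per-token compute scales with k rather than with the total number of experts.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, gate_weights, top_k=2):
    """Route input x to the top_k experts chosen by a toy linear gate.

    experts: list of callables (stand-ins for specialized subnetworks)
    gate_weights: one scalar gate weight per expert (toy gate, not Llama 4's)
    """
    # Gate: score every expert for this input, then normalize.
    scores = softmax([w * x for w in gate_weights])
    # Select the top_k experts; only these are evaluated below, which is
    # where MoE's compute savings come from.
    ranked = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(scores[i] for i in chosen)
    # Weighted combination of the selected experts' outputs.
    return sum((scores[i] / norm) * experts[i](x) for i in chosen)

# Four toy "experts", each a different simple function of the input.
experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: x * x, lambda x: -x]
gate = [0.5, -0.2, 1.0, 0.1]

y = moe_forward(3.0, experts, gate, top_k=2)
```

With top_k=2, only two of the four experts execute per input; a real MoE layer applies the same idea per token across many large feed-forward experts, which maps naturally onto the high core counts of Arm-based cloud CPUs.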

Explore Llama 4 on Arm

We’re excited to invite developers and ecosystem partners to explore Llama 4 Scout on Arm—a powerful example of our shared commitment to open, collaborative AI. Running on Arm-based infrastructure like AWS Graviton, Llama 4 Scout delivers the performance, efficiency, and scalability needed for modern AI workloads. 

Discover the expansive potential of Arm-powered AI and help shape a smarter, more connected future—from cloud deployments all the way to the edge. 

Ready to get started? Explore the tools, connect with the community, and help shape a smarter, more connected future—powered by Arm.

