Arm Newsroom Blog

Gemma 4 on Arm: Accessible, immediate, optimized on-device AI to accelerate the mobile app experience

Gemma 4 on Arm brings fast, privacy-preserving, power-efficient AI directly onto Android devices, helping developers deliver richer real-time app experiences to billions of users without relying on the cloud.
By Alex Spinelli, SVP, AI and Developer Platforms and Services, Arm

Real-time assistance, seamless communication, and greater personalization are now baseline expectations for billions of smartphone users worldwide. Highly capable on-device AI that operates within the power envelope of modern smartphones is essential to delivering instant, intelligent experiences at scale, while unlocking AI’s future potential.

Google’s launch of Gemma 4 accelerates the ongoing shift to on-device AI, enabling developers to seamlessly access optimized performance and bring increasingly capable AI experiences directly into the apps people use every day. Unlocking these benefits at global smartphone scale depends on the underlying compute foundation, with one constant that is ubiquitous across the entire Android ecosystem: Arm.

What’s new for Gemma 4

Gemma 4 further advances on-device AI by delivering improved performance and efficiency, while expanding support for the kinds of multimodal experiences that matter most on Arm-based devices, including reasoning, agentic workflows, and vision-and-audio enabled use cases. With enhanced capabilities across text, audio*, and image, broader language support, and a foundation for real-time assistive experiences, it enables more responsive, context-aware interactions directly on-device without increasing memory footprint. 

Exploring Gemma 4 performance on Arm CPUs

In early Arm engineering tests, SME2 shows promising performance gains for running Gemma 4 workloads. Initial tests on the Gemma 4 E2B (Effective 2 Billion) model demonstrate an average 5.5x speedup in prefill (processing user input) and up to 1.6x faster decode (generating responses), highlighting the potential of Armv9 CPU innovations for on-device AI workloads. These engineering tests include upcoming patches to Google XNNPACK and Arm KleidiAI.
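To see how prefill and decode gains combine into end-to-end response time, here is a rough, illustrative calculation. Only the 5.5x and 1.6x factors come from the tests above; the token counts and baseline throughputs are assumptions chosen for the sketch, not measured Gemma 4 figures.

```python
# Hypothetical illustration of how prefill and decode speedups combine
# into end-to-end latency. Token counts and baseline throughputs are
# assumptions for this sketch, not measured figures.

def end_to_end_latency(prompt_tokens, output_tokens,
                       prefill_tps, decode_tps,
                       prefill_speedup=1.0, decode_speedup=1.0):
    """Return total seconds to process a prompt and generate a response."""
    prefill_s = prompt_tokens / (prefill_tps * prefill_speedup)
    decode_s = output_tokens / (decode_tps * decode_speedup)
    return prefill_s + decode_s

# Assumed workload: 512-token prompt, 128-token reply, with assumed
# baseline throughputs of 200 prefill tokens/s and 20 decode tokens/s.
baseline = end_to_end_latency(512, 128, prefill_tps=200, decode_tps=20)

# Same workload with the reported 5.5x prefill / 1.6x decode speedups.
with_sme2 = end_to_end_latency(512, 128, prefill_tps=200, decode_tps=20,
                               prefill_speedup=5.5, decode_speedup=1.6)

print(f"baseline: {baseline:.2f}s, with SME2: {with_sme2:.2f}s")
```

Under these assumed numbers, the prompt-processing stage shrinks from the dominant cost to a small fraction of total latency, which is why prefill acceleration is so noticeable in interactive use.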

As an early example of what is possible with these improvements, Envision, an accessibility-focused app for blind and low-vision users, evaluated an on-device approach for delivering more of its experience locally. Historically, Envision’s scene interpretation relied on cloud connectivity. In this prototype, Gemma 4 was evaluated running locally on Arm CPUs with SME2 capabilities, enabling users to capture a photo and receive a detailed scene description directly on-device without requiring a network connection or sending sensitive data off-device. 

These explorations on Arm CPUs highlight the broader flexibility of the Arm compute platform and the potential for continued innovation across CPU and heterogeneous compute pathways. 

The result is lower latency, stronger privacy, and more consistent user experiences regardless of connectivity conditions. This shift from cloud dependency to local inference is critical for mobile applications. It has the potential to reduce infrastructure costs for developers, improve reliability for users, and unlock new categories of real-time applications.

“Envision is excited to work with Arm and Google to bring powerful accessibility experiences directly onto smartphones. Running visual understanding models like Gemma 4 on-device on SME2-enabled Arm CPUs opens the door to reliable, low-latency scene description and visual Q&A for blind and low-vision users. For our community, the ability to access these capabilities offline is incredibly meaningful because it ensures the technology works wherever they are, while also improving privacy by keeping more processing on the device itself.” – Karthik Mahadevan, CEO, Envision

Envision is an early example of what’s possible when Gemma 4 meets the Arm compute platform at mobile scale. As more developers integrate Gemma 4, on-device AI will increasingly become the default architecture rather than the exception.

Why Arm matters for on-device AI at Android scale

The Armv9 architecture is the most secure, pervasive, and advanced ISA Arm has ever built. Arm Scalable Matrix Extension 2 (SME2) – a set of advanced CPU instructions in the Armv9 architecture – is a key technology, as it accelerates matrix-heavy AI workloads within the power envelope of smartphones. Already built into the Arm C1 CPUs integrated into the latest Android smartphones, SME2 unlocks higher sustained performance and improved efficiency.
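One common intuition for why matrix acceleration benefits prefill more than decode: prefill processes the whole prompt at once as batched matrix-matrix products, while decode generates one token at a time as matrix-vector products. A minimal operation-count sketch, using made-up layer dimensions rather than Gemma 4’s actual shapes:

```python
# Illustrative operation counts showing why prefill is the matrix-heavy
# stage: it multiplies a whole prompt's worth of activations against the
# weights at once, while decode handles one token per step. The
# dimensions below are made up for illustration, not Gemma 4's shapes.

def matmul_macs(m, k, n):
    """Multiply-accumulate count for an (m x k) @ (k x n) matmul."""
    return m * k * n

prompt_tokens, hidden, out = 512, 2048, 2048

# Prefill: one large matrix-matrix product covering all prompt tokens.
prefill_macs = matmul_macs(prompt_tokens, hidden, out)

# Decode: a matrix-vector product per generated token.
per_token_macs = matmul_macs(1, hidden, out)

print(prefill_macs, per_token_macs, prefill_macs // per_token_macs)
```

The batched prefill product exposes far more parallel multiply-accumulates per step, which is the work pattern matrix instructions like SME2 are designed to accelerate.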

Through Arm KleidiAI – Arm’s software acceleration layer integrated into leading runtime libraries, like Google’s XNNPACK, and frameworks, like Google LiteRT and MediaPipe – the benefits of SME2 are readily accessible to mobile developers, with no changes required to existing code, models, or deployment pipelines. As a result, developers automatically get out-of-the-box performance optimizations simply by targeting Arm-based Android devices built with SME2.

In practice, these software-level gains translate directly into better on-device experiences. Users benefit from faster responses, smoother sustained interactions, and more reliable on-device AI, all while maintaining battery life and thermal stability, even as models grow more capable.

“Delivering Gemma 4 efficiently across the Android ecosystem requires deep collaboration across hardware and software. Our work with Arm reflects a shared commitment to advancing on-device AI, combining the benefits of the Armv9 architecture and built-in acceleration technologies, like SME2, with the Android operating system to unlock greater performance and efficiency at scale. Together, we’re making it easier for developers to bring fast, responsive, and privacy-preserving AI experiences to our users, without needing to modify their existing applications.” – Sandeep Patil, Engineering Director, Android

Arm and Google: Building the future of on-device AI together

As more applications move AI on-device, Arm and Google are committed to supporting developers with accessible performance optimizations and clear guidance that help Gemma 4 accelerate application experiences across all Arm-based mobile devices.

The future of mobile AI will not be defined solely by larger models, but by how efficiently, securely, and pervasively they run at scale across the Android ecosystem. Through this collaboration, the benefits of on-device AI will be felt by billions of Android smartphone users worldwide.

*Audio is supported only on the E2B (Effective 2 Billion) and E4B (Effective 4 Billion) models.

