New Armv9 CPUs for Accelerating AI on Mobile and Beyond
The majority of today’s AI workloads on mobile can run on the Arm CPU. In the smartphone space, AI-enabled flagship smartphones built on Arm’s v9 CPU technologies are leading the way. These include the MediaTek Dimensity 9300-powered vivo X100 and X100 Pro smartphones, Samsung Galaxy S24 and Google Pixel 8, all of which deliver unprecedented opportunities for AI innovation.
As AI workloads continue to get more compute intensive and complex, Arm is laying the foundation for next-generation AI, with more performance, efficiency and features at the heart of our latest Armv9.2 CPU cluster. These benefits are scalable across a broad range of consumer devices, from flagship smartphones and AI PCs right through to mainstream mobile, XR and wearable devices as part of our commitment to enable AI everywhere.
The new additions to the Armv9 CPU portfolio include the Arm Cortex-X925 CPU for ultimate performance and Arm Cortex-A725 CPU for superior sustained performance. We have also refreshed the Arm Cortex-A520 for the best energy efficiency for low intensity workloads, as well as updating the DynamIQ Shared Unit (DSU-120) to provide lower power and area across Armv9.2 CPU cluster configurations. All these are integrated into the new Arm Compute Subsystems (CSS) for Client, which is Arm’s fastest ever compute platform for Android.
Cortex-X925: The highest Cortex-X performance uplift ever
Cortex-X925, previously codenamed Blackhawk, is redefining the computing performance trajectory by providing the largest year-over-year performance uplift in the history of Cortex-X. It delivers a 36 percent improvement in single-threaded (peak) performance (Geekbench 6.2 vs. 2023 Premium Android smartphone) and 46 percent better AI performance (time-to-first-token for Phi3 vs. the previous generation Arm Cortex-X4 CPU).
The power performance profile of Cortex-X925 means it delivers peak performance when it matters. This helps to elevate the user experience, with increased responsiveness across applications, generative AI workloads, web browsing, camera post-processing, video recording and AAA gaming.
These performance improvements are made possible by the best-in-class performance foundation of the Cortex-X925 and its trailblazing new microarchitecture. An optimized 3nm implementation of Cortex-X925 complemented by a premium subsystem and packaging will enable more than 30 percent higher performance scores on next-generation consumer devices. Improvements to the microarchitecture, including up to 3MB private L2 cache, provide enhanced configurability for CPU cluster implementations across a broad range of consumer devices.
As part of CSS for Client, we are co-designing and delivering CPU physical implementations. Working with leading foundry partners, we have enabled a tape-out ready Cortex-X925 physical implementation for 3nm. This helps our partners to unlock the full PPA (power, performance and area) benefits on the 3nm process, while shortening silicon development and deployment timelines through the high-volume production-ready silicon solution.
Cortex-A725: Delivering superior sustained performance
Arm’s Cortex-A700 line of CPUs provides a rich lineage of performance efficiency, and the Cortex-A725 is no different. Cortex-A725 is the workhorse for CPU workloads. Our engineering and design teams have provided targeted updates focusing on key AI and gaming use cases where superior sustained performance is required. This helps to deliver a 35 percent improvement in performance efficiency and a 25 percent improvement in power efficiency compared to Cortex-A720.
Improvements in the microarchitecture of Cortex-A725 enable these performance efficiency improvements. Just like the Cortex-X925, we provide an optimized implementation of Cortex-A725 on 3nm through Arm’s advanced physical implementation. We also offer an area-optimized implementation for mainstream consumer technology markets.
Refreshing Cortex-A520 and DSU-120
Cortex-A520 has been refreshed for CSS for Client to deliver the best energy efficiency, providing a 15 percent efficiency improvement compared to Cortex-A520 in TCS23 (Arm Total Compute Solutions 2023). The refreshed Cortex-A520 is made possible by an updated implementation and an advanced 3nm physical implementation.
As part of the new CSS for Client, the DSU-120 has been enhanced for next-generation use cases and consumer device experiences. The enhancements include new performance and efficiency features, new low-power modes and improvements for mainstream consumer devices, while maintaining the option of scaling up to 14 cores for high-performance use cases. Together they contribute to significant power reductions of 50 percent for typical workloads and 60 percent lower cache miss power across the CPU cluster, reducing leakage and improving the battery life of consumer devices. The new low-power modes, such as half slice power down and quick nap, together with the other enhancements, support a broad range of low- and high-intensity AI-based workloads, from biometrics and speech-to-text right through to AI smart camera, content creation and ML-based AAA gaming.
Arm’s most performant, efficient and versatile CPU cluster ever
These new and updated CPUs contribute to Arm’s CPU cluster configurations that offer unprecedented levels of performance, efficiency and versatility across a broad range of consumer devices. At a high level, the new CPU cluster offers 46 percent better AI performance compared to CPU clusters that utilize the previous generation Cortex-X4, leading to more responsive performance and sustained throughput. It offers a 30 percent improvement in key user experience indicators, which combine performance and power, compared to the TCS23 CPU cluster. This translates into faster applications and web browsing, superior AAA gaming and improved battery life.
The latest Arm CPU cluster also provides superior scaling for any consumer device segment. For example, it delivers best-in-class performance for PCs and laptops, with a 25 percent performance improvement compared to PC and laptop devices currently shipping. Meanwhile, the lower power and area compared to the previous generation DSU-120 in TCS23, alongside area and power optimizations through Cortex-A725 and Cortex-A520, provide a flexible portfolio of CPU cluster configurations for mainstream devices. This helps to bring performance and AI capabilities to a variety of lower cost consumer devices, ensuring that everyday device users can access advanced AI-based experiences.
Armv9 CPUs for next-gen AI experiences
The new Armv9.2 CPU cluster provides ultimate performance and user experiences for Android smartphones, PCs and laptops, and beyond. It delivers a complete package of real-world improvements, with each CPU component in the cluster covering a broad range of real-world use cases and workloads. For example, Cortex-X925 handles the ‘bursty’ workloads of launching applications and web browsing, Cortex-A725 provides the sustained performance required for common AI workloads and AAA gaming, and Cortex-A520’s high efficiency is best suited to light media, idle and background tasks. All these enhanced real-world experiences are scalable across all consumer technology segments, with the new Armv9 CPUs bringing more performance and greater AI capabilities to mainstream devices and everyday users, leading to a 30 percent improvement across key user experience indicators.
The consumer demand for technology is insatiable, as users spend more time on their devices and expect more advanced experiences. Whether that’s faster web browsing and applications or enhanced AAA gaming and generative AI workloads, the new Armv9 CPUs elevate all of these experiences through advanced compute capabilities that are defining the future of consumer technology.
Any re-use permitted for informational and non-commercial or personal use only.