Blog

October 1, 2024

Beyond the Newsroom: 10 Latest Innovations from Arm in September 2024

All the technological marvels in CPU performance, AI, and automotive technologies that happened at Arm.

By Arm Editorial Team

September 2024 has been another innovative month for us, showcasing various Arm innovations across various domains. From enhancing CPU performance and efficiency to revolutionizing AI software development and optimizing automotive microcontrollers, we continue to push the boundaries of technology and ensure the future of computing is built on Arm.

Here’s a summary of the top 10 technological developments at Arm this month:

Setting new performance and efficiency standards with Armv9 CPUs and SVE2

Each new generation of Arm CPUs gets faster and better, meeting the needs of modern computing tasks. Poulomi Dasgupta, Senior Manager of Consumer Computing, highlights how Armv9 CPUs and their exclusive SVE2 optimizations are part of the latest Arm technological advancements. They help enhance performance and efficiency for mobile devices and boosts HDR video decoding by 10% and image processing by 20%. This helps improve battery life and app performance for popular apps like YouTube and Netflix.

Likewise, Yibo Cai, Principal Software Engineer, explains how the SVMATCH instruction introduced with SVE2 speeds up multi-token searches, simplifying tasks like parsing CSV files. This further helps reduce the number of operations needed, leading to better performance, as seen in the optimized Sonic JSON decoder. A highlight of SVMATCH is that it helps enhance various software engineering tasks, making data processing faster and more efficient.

How Arm and Meta are transforming AI software development

Sy Choudhury, Director, AI Partnerships at Meta, explains how Arm and Meta are accelerating AI software development through open innovation and optimizing large language models (LLMs), like Llama, across data centers, smartphones, and IoT devices.

Learn more about the latest Kleidi integrations for PyTorch, ExecuTorch, and more, as well as how to unlock the true performance potential of LLMs with Arm’s cutting-edge innovations – all while simplifying model customization and deployment.

Faster PyTorch Inference using Kleidi on Arm Neoverse

PyTorch is a popular open-source library for machine learning. Ashok Bhat, Senior Product Manager, explains how Arm has improved PyTorch’s inference performance using Kleidi technology, integrated into the Arm Compute Library and KleidiAI library. This includes optimized kernels for machine learning (ML) tasks on Arm Neoverse CPUs.

These optimizations lead to significant performance improvements. For example, using torch.compile can achieve up to 2x better performance compared to Eager mode for various models. Additionally, new INT4 and INT8 kernels can enhance inference performance by up to 18x for specific models like Llama and Gemma. These advancements make PyTorch more efficient on Arm hardware, potentially reducing costs and energy consumption for machine learning tasks.

Advancing ASR Technology with Kleidi on Arm Neoverse N2

Automatic Speech Recognition (ASR) technology is widely used in applications like voice assistants, transcription services, call center analytics, and speech-to-text translation. Willen Yang, Senior Product Manager, and Fred Jin, Senior Software Engineer, introduce FunASR, an advanced toolkit developed by Alibaba DAMO Academy.

FunASR supports both CPU and GPU, with a focus on efficient performance on Arm Neoverse N2 CPUs. It excels in accurately understanding various accents and speaking styles. Using bfloat16 fastmath kernels on Arm Neoverse N2 CPUs, FunASR achieves up to 2.4 times better performance compared to other platforms, making it a cost-effective solution for real-world deployments.

Optimizing LLMs with Arm’s Kleidi Innovation

As Generative AI (GenAI) transforms business productivity, enterprises are integrating Large Language Models (LLMs) into their applications on both cloud and edge. Nobel Chowdary Mandepudi, Graduate Solutions Engineer, discusses how Arm’s Kleidi technology enhances PyTorch for running LLMs on Arm-based processors. This integration simplifies access to Kleidi technology within PyTorch, boosting performance.

The demo application demonstrates significant improvements, such as faster token generation and reduced costs. For example, the time to generate the first token is less than 1 second, and the decode rate is 33 tokens per second, meeting industry standards for interactive chatbots.

These optimizations make running LLMs on CPUs practical and effective for real-time applications like chatbots, leading to more efficient and cost-effective AI solutions. This benefits businesses and developers by reducing latency and operational costs.

Demonstrating AI performance uplifts with KleidiAI

Arm’s KleidiAI, integrated with ExecuTorch, enhances AI inference on edge devices. In a demo by Gian Marco Iodice, real-time inference of the Llama 3.1 8B parameter model is showcased on a mobile phone. This demo highlights KleidiAI’s capability to accelerate various AI models across billions of Arm-based devices globally.

KleidiAI uses optimized micro-kernels and advanced Arm CPU instructions like SMMLA and FMLA to deliver efficient AI performance without compromising speed or accuracy.

Meanwhile, Nobel Chowdary Mandepudi demonstrates how KleidiAI boosts AI performance in the cloud using AWS Graviton 4 instances. This demo highlights KleidiAI’s ability to drive AI performance on Arm-powered cloud servers while maintaining energy efficiency.

This side-by-side demo compares PyTorch inference with and without KleidiAI optimizations, showcasing significant improvements in efficiency and speed with the Llama 3.1 8B model. KleidiAI leverages advanced Arm instructions to accelerate generative AI workloads, enhancing text generation and prompt evaluation.

Arm’s growing ecosystem and server integration

The Arm ecosystem is expanding rapidly across all sectors, including Microsoft Copilot+ PCs, cloud services (AWS, Google, Microsoft), and automotive innovations like in-vehicle infotainment (IVI) and advanced driver assistance systems (ADAS). Steve Demski, Director of Product Marketing, highlights the integration of several hundred HPE ProLiant RL300 Gen11 servers, powered by Ampere Altra Max CPUs, into Arm’s Austin datacenter.

*Comparison of core density per rack using Arm or x86-based CPUs*

These high-performance, power-efficient servers support various workloads and align with Arm’s goal to run at least 50% of their on-premises EDA cluster infrastructure on Arm by 2024. This transition boosts productivity frees up space and power for future workloads like generative AI, and offers a lower cost-per-core, enabling more efficient budget allocation.

Simplifying automotive microcontrollers with EB tresos Embedded Hypervisor

The EB tresos Embedded Hypervisor by Elektrobit allows multiple virtual machines (VMs) to run on a single automotive microcontroller, supporting various operating systems and applications. Dr. Bruno Kleinert from Elektrobit explains how this technology optimizes resource use, enhances safety, and cuts costs.

The hypervisor technology is crucial for software-defined vehicles (SDVs), enabling flexible updates to vehicle functions. It also supports sustainable design by reducing hardware, cabling, weight, and energy use, with the technology touted to be ready for mass production in October 2024, and safety-approved versions expected in early 2025.

Boosting performance for AI and beyond with OpenRNG

OpenRNG is an open-source Random Number Generator (RNG) library that boosts performance for AI, scientific, and financial applications. Kevin Mooney, Staff Software Engineer, explains how it can replace Intel’s Vector Statistics Library (VSL) and supports various random number generators, including pseudorandom, quasirandom, and true random generators.

Bar chart showing the performance benefit of using OpenRNG instead of the C++ standard library. The height of each bar is the ratio of time spent generating random numbers with both libraries. Greater than 1 means OpenRNG was faster.

OpenRNG significantly improves performance, enhancing PyTorch’s dropout layer by up to 44 times and speeding up the C++ standard library by 2.7 times. OpenRNG is crucial for applications needing fast and reliable random number generation, like AI, gaming, and financial modeling, ensuring consistent results across systems.

Pioneering automotive safety with Arm Software Test Libraries

Integrating Arm’s Software Test Libraries (STLs) into automotive systems boosts safety and reliability, meeting ISO26262 standards. Andrew Coombes, Principal Automotive Software Product Manager, and ETAS explains how Arm STL can be used with Classic AUTOSAR to improve diagnostics, detect faults early, and offer flexible integration. Using Arm STLs with microcontroller hypervisors based on Arm architecture supports mixed-criticality systems and enhances fault mitigation. Meanwhile, achieving Functional Safety certification requires comprehensive strategies, as detailed in a joint white paper by Exida and Arm.

Optimizing video calls with artificial intelligence

While video conferencing is a ubiquitous tool for communication, it is not always a straightforward plug-and-play experience, as adjustments may be needed to ensure a good audio and video setup. Ayaan Masood, a Graduate Engineer, has developed a demo mobile app that uses a neural network model to improve video lighting in low-light conditions.

This app processes video frames in real time, providing smooth and clear visuals. It ensures a professional appearance during video calls, which is essential for remote work and social interactions. The success of this app highlights the potential of AI to solve everyday problems, paving the way for more AI-driven solutions in various fields.

By Arm Editorial Team

Article Text

Copy Text

Any re-use permitted for informational and non-commercial or personal use only.

Editorial Contact

Arm Editorial Team

editorial@arm.com

Subscribe to Blogs and Podcasts

Get the latest blogs & podcasts direct from Arm

Blog

Sep 05, 2024

Exploring 6 Key Arm Innovations for August 2024

Arm Editorial Team

Blog

Aug 01, 2024

Beyond the Newsroom: 10 Cutting-Edge Innovations at Arm in July 2024

Arm Editorial Team

Blog

Jul 01, 2024

Beyond the Newsroom: 13 Breakthroughs at Arm in June 2024

Arm Editorial Team

Blog

Jun 03, 2024

Beyond the Newsroom: Unpacking Arm’s 11 Key Innovations for May 2024

Arm Editorial Team

Media Information

Latest on X

; Arm @Arm ·

11 Feb 2021615315126211052

This National Apprenticeship Week, we’re proud to highlight Ayo Giwa, who completed a apprenticeship with us and has since moved into a graduate role, a brilliant example of where this pathway can lead. 👏 https://okt.to/p7em6M

Reply on Twitter 2021615315126211052 Retweet on Twitter 2021615315126211052 3 Like on Twitter 2021615315126211052 25 Twitter 2021615315126211052

; Arm @Arm ·

11 Feb 2021386234736624104

Advancing on-device AI starts with extending the CPU architecture to better support machine learning workloads.

With Exynos 2600, @SamsungDSGlobal adopts Arm Scalable Matrix Extension 2 (SME2), accelerating matrix operations on CPUs and expanding the role of CPU-based AI for

Reply on Twitter 2021386234736624104 Retweet on Twitter 2021386234736624104 25 Like on Twitter 2021386234736624104 185 Twitter 2021386234736624104

; Arm @Arm ·

10 Feb 2021325825786757570

Commodity infrastructure was built for a different era. 🦖

As AI training and inference scale, efficiency and system level optimization are becoming critical. Purpose-built platforms designed end-to-end are emerging as the new foundation for AI infrastructure.

Reply on Twitter 2021325825786757570 Retweet on Twitter 2021325825786757570 3 Like on Twitter 2021325825786757570 23 Twitter 2021325825786757570

; Arm @Arm ·

6 Feb 2019884565108511226

AI hype or real transformation? 🫧

On the latest episode of The Master Investor podcast with @WilfredFrost - Arm CEO Rene Haas breaks down where Arm sits in the AI value chain, how data centers and edge computing are evolving, and what today’s AI surge gets right and wrong.

Reply on Twitter 2019884565108511226 Retweet on Twitter 2019884565108511226 8 Like on Twitter 2019884565108511226 22 Twitter 2019884565108511226

; Arm @Arm ·

6 Feb 2019837004851065241

Where is on-device AI delivering real value today?

At a @GSMAi panel, Steve Raphael, Senior Director, Smartphone at Arm joined industry leaders to explore how on-device AI is shaping smartphones, wearables, PCs and enterprise devices.

Watch the full conversation here

Reply on Twitter 2019837004851065241 Retweet on Twitter 2019837004851065241 3 Like on Twitter 2019837004851065241 16 Twitter 2019837004851065241

; Arm @Arm ·

4 Feb 2019156568151867552

Celebrating a strong Q3 FYE26, with revenue exceeding $1B for the fourth consecutive quarter.

As AI continues to proliferate across industries & form factors, Arm serves as the common foundation—accelerating innovation with partners across the ecosystem: https://okt.to/bRZXv3

Reply on Twitter 2019156568151867552 Retweet on Twitter 2019156568151867552 18 Like on Twitter 2019156568151867552 214 Twitter 2019156568151867552

; Arm @Arm ·

2 Feb 2018415149267231187

The Arm MCP Server brings Arm cloud migration intelligence into your favorite AI assistant.

Analyze codebases, containers, and CI pipelines in minutes and move from uncertainty to clear guidance for Arm-based cloud migration.

👀: https://okt.to/Hnbpjr

Reply on Twitter 2018415149267231187 Retweet on Twitter 2018415149267231187 4 Like on Twitter 2018415149267231187 35 Twitter 2018415149267231187

Beyond the Newsroom: 10 Latest Innovations from Arm in September 2024

Setting new performance and efficiency standards with Armv9 CPUs and SVE2

How Arm and Meta are transforming AI software development

Faster PyTorch Inference using Kleidi on Arm Neoverse

Advancing ASR Technology with Kleidi on Arm Neoverse N2

Optimizing LLMs with Arm’s Kleidi Innovation

Demonstrating AI performance uplifts with KleidiAI

Arm’s growing ecosystem and server integration

Simplifying automotive microcontrollers with EB tresos Embedded Hypervisor

Boosting performance for AI and beyond with OpenRNG

Pioneering automotive safety with Arm Software Test Libraries

Optimizing video calls with artificial intelligence

Editorial Contact

Related

Exploring 6 Key Arm Innovations for August 2024

Beyond the Newsroom: 10 Cutting-Edge Innovations at Arm in July 2024

Beyond the Newsroom: 13 Breakthroughs at Arm in June 2024

Beyond the Newsroom: Unpacking Arm’s 11 Key Innovations for May 2024

Media Information

Company Overview & History

Arm Corporate Guidelines

Media Contacts

Latest on X