Accelerating Generative AI at the Edge on Arm with ExecuTorch Beta Release
News highlights:
- The combination of the Arm Compute Platform and the ExecuTorch framework enables smaller, optimized models for faster generative AI at the edge
- New quantized Llama models are ideal for on-device and edge AI applications on Arm, providing a reduced memory footprint with improved accuracy, performance and portability
- 20 million Arm developers can create and deploy more intelligent AI-based applications more quickly and at scale across billions of edge devices
To realize the true potential of AI, we need to make it accessible to the broadest range of devices and developers. By collaborating with the PyTorch team at Meta on the new ExecuTorch Beta release, we are fulfilling this mission, bringing AI and machine learning (ML) capabilities to billions of edge devices and millions of developers worldwide.
Generative AI improvements on the Arm Compute Platform with ExecuTorch and new quantized Llama 3.2 1B and Llama 3.2 3B models
The powerful combination of the ubiquitous Arm Compute Platform, which powers many of the world’s edge devices, with ExecuTorch, a PyTorch-native framework designed to deploy AI models on mobile and edge devices, is enabling smaller, optimized models, including the new quantized Llama 3.2 1B and 3B models. These quantized models are ideal for generative AI use cases on smaller devices, such as virtual chatbots, text summarization and AI assistants, as they offer a reduced memory footprint, higher accuracy, and greater performance and portability.
Developers can seamlessly integrate the new quantized models into their applications with no additional modifications or optimizations, saving time and resources. This empowers them to quickly create and deploy more intelligent AI-based applications at scale across a broad range of Arm-powered devices.
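The memory savings that make these models practical at the edge follow from simple arithmetic. The sketch below estimates weight storage for a 1B-class model at 16-bit versus 4-bit precision; the parameter count and the per-group scale overhead are illustrative assumptions on our part, not official Arm or Meta figures:

```python
# Back-of-the-envelope estimate of weight-storage savings from quantization.
# All numbers are illustrative assumptions, not official figures.

def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store n_params weights at the given effective bit width."""
    return n_params * bits_per_weight / 8

n_params = 1.24e9                    # rough parameter count of a "1B"-class model (assumption)
fp16 = weight_bytes(n_params, 16)    # half-precision baseline
int4 = weight_bytes(n_params, 4.5)   # 4-bit weights plus per-group scale overhead (assumption)

print(f"fp16 weights : {fp16 / 1e9:.2f} GB")
print(f"4-bit weights: {int4 / 1e9:.2f} GB")
print(f"reduction    : {fp16 / int4:.1f}x")
```

Even with the overhead of per-group scales included, the quantized weights fit in roughly a third of the memory, which is what brings these models within reach of memory-constrained edge devices.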
As with the new Llama 3.2 large language model (LLM) releases, Arm is optimizing AI performance through the ExecuTorch framework, so real-world generative AI workloads run faster on edge devices built on the Arm Compute Platform. Developers can access these enhancements from day one of the ExecuTorch Beta release.
Accelerating generative AI on mobile with KleidiAI integration
On mobile, Arm’s work with ExecuTorch means virtual chatbots, text generation and summarization, and real-time voice and virtual assistants all run at improved performance entirely on the device on the Arm CPU. This was achieved by integrating KleidiAI, which introduces micro-kernels optimized for 4-bit quantization, into ExecuTorch via XNNPACK, seamlessly speeding up the execution of AI workloads when running LLMs with 4-bit quantization on the Arm Compute Platform. For example, the prefill stage of the quantized Llama 3.2 1B model now runs 20 percent faster with the KleidiAI integration, delivering over 400 tokens per second for text generation on some Arm-based mobile devices. End users will benefit from quicker, more responsive AI-based experiences on their mobile devices.
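To make "4-bit quantization" concrete, here is a minimal sketch of one common scheme for the weights such micro-kernels consume: symmetric, groupwise quantization with one scale per group. KleidiAI’s actual packed format and kernels are more involved, so treat this purely as an illustration of the idea:

```python
# Illustrative 4-bit groupwise symmetric quantization (pure Python).
# This sketches the storage scheme only; KleidiAI's real packed layout
# and micro-kernels differ.

def quantize_4bit(weights, group_size=32):
    """Quantize a flat list of floats to int4 values with one scale per group."""
    qvals, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        amax = max(abs(w) for w in group) or 1.0
        scale = amax / 7.0                      # map the group into the int4 range
        scales.append(scale)
        qvals.extend(max(-8, min(7, round(w / scale))) for w in group)
    return qvals, scales

def dequantize_4bit(qvals, scales, group_size=32):
    """Reconstruct approximate floats from int4 values and group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(qvals)]

weights = [0.03 * ((i * 7) % 11 - 5) for i in range(64)]  # toy weight values
q, s = quantize_4bit(weights)
recon = dequantize_4bit(q, s)
max_err = max(abs(a - b) for a, b in zip(weights, recon))
print(f"max reconstruction error: {max_err:.4f}")
```

Each weight shrinks from 16 or 32 bits to 4, at the cost of a small, bounded rounding error per group; the micro-kernels then multiply directly against this compact representation, which is where the bandwidth and speed gains come from.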
Learn more about Arm support for ExecuTorch in mobile markets in this blog.
Accelerating real-time processing for edge AI applications in IoT
Meanwhile, in IoT markets, the ExecuTorch work will improve real-time processing for edge AI applications across a broad range of IoT devices, from smart home appliances and wearables to autonomous systems used in retail and industrial IoT. Being able to process more real-time AI tasks at the edge means IoT devices and applications can respond to their environments in milliseconds, which is crucial for safety and functionality.
ExecuTorch can be leveraged across Arm’s Cortex-A CPUs and Ethos-U NPUs to accelerate the development and deployment of edge AI applications. In fact, by combining ExecuTorch, the Arm Corstone-320 reference platform (also available as an emulated Fixed Virtual Platform) and the Arm Ethos-U85 NPU driver and compiler support into one package, developers can start creating their edge AI applications months before the platforms arrive on the market.
Learn more about Arm support for ExecuTorch in IoT markets in this blog.
More accessible, faster edge AI experiences
We believe that ExecuTorch has the potential to become one of the world’s most popular frameworks for efficient AI and ML development. By combining ExecuTorch with the ubiquitous Arm Compute Platform, we are accelerating the democratization of AI through new quantized models that empower developers to deploy their applications more quickly across more devices and bring more generative AI experiences to the edge.
Any re-use permitted for informational and non-commercial or personal use only.