Accelerating AI Developer Innovation Everywhere with New Arm Kleidi
In the ever-evolving, fast-paced age of AI, we are steadfast in supporting the millions of developers worldwide, ensuring they have access to the performance, tools, and software libraries needed to seamlessly create the next wave of stunning AI-enabled experiences.
This is why we are launching Arm Kleidi, a broad program of software and software community engagements for accelerating AI. The first of these is the Arm Kleidi libraries for popular AI frameworks, which let developers transparently access the outstanding AI capabilities of the pervasive Arm CPU, where most of the world’s AI inference workloads from cloud to edge already run today. Developers can leverage over 20 years of Arm architectural innovation that has consistently improved AI capabilities and performance, from the Armv7 architecture, which first introduced the Advanced Single Instruction Multiple Data (SIMD) Extension for machine learning (ML) workloads, to today’s Armv9 architecture, which incorporates features that accelerate and protect advanced generative AI workloads on the Arm CPU.
Featuring KleidiAI for all AI workloads and KleidiCV for best-in-class computer vision (CV) workloads on Arm CPUs across all tiers, the Kleidi libraries will be embedded directly into popular AI frameworks, with no action needed by developers. Developers can therefore frictionlessly tap the AI capabilities of the Arm CPU to build their AI-based applications quickly, at the highest possible performance, and across the broadest range of devices.
Accelerating AI
KleidiAI is our answer to the explosion in device types, neural networks, and inference engines. It is a collection of highly optimized AI kernels that deliver high performance in use cases such as generative AI. The beauty of KleidiAI is that rather than giving developers extra work, we are working directly with leading AI frameworks, including MediaPipe (via XNNPACK), llama.cpp, PyTorch (via ExecuTorch), and TensorFlow Lite (via XNNPACK), to integrate KleidiAI. This accelerates the development process and unlocks AI performance, giving developers performance by default so they can seamlessly create the best possible AI experiences. KleidiAI also provides forward-looking compatibility to ensure developers can take full advantage of future AI acceleration opportunities as we bring additional technologies to market.
The integration of KleidiAI is already translating into significant performance improvements for generative AI workloads. It accelerates time-to-first-token for Meta’s Llama 3 and Microsoft’s Phi-3 LLMs running on llama.cpp by 190 percent on the new Arm Cortex-X925 CPU compared with the reference implementation (llama.cpp without our Kleidi software optimizations). KleidiAI is so easy to integrate that it took Arm’s engineering teams less than 24 hours to measure this optimized performance for Llama 3. In addition, through the KleidiAI integration with MediaPipe via XNNPACK, which supports the Gemma open LLM on mobile, time-to-first-token for Gemma 2B improves by 25 percent on the Google Pixel 8 Pro smartphone.
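To make the metric above concrete, the following minimal sketch shows how time-to-first-token is typically measured: the clock starts when the prompt is submitted and stops when the first token is decoded. The `generate_tokens` function here is a hypothetical stand-in for an inference engine such as llama.cpp, not Arm's or Meta's actual code.

```python
import time

def generate_tokens(prompt):
    """Hypothetical stand-in for an LLM inference engine (e.g. llama.cpp
    bindings); yields tokens one at a time as they are decoded."""
    for token in prompt.split():
        time.sleep(0.01)  # simulate per-token decode latency
        yield token

def time_to_first_token(prompt):
    """Latency from submitting the prompt until the first token arrives:
    the metric the performance figures above refer to."""
    start = time.perf_counter()
    first_token = next(generate_tokens(prompt))
    return first_token, time.perf_counter() - start

token, ttft = time_to_first_token("The quick brown fox")
print(f"first token: {token!r}, time-to-first-token: {ttft:.3f}s")
```

Time-to-first-token matters for interactive experiences because it is the delay the user actually perceives before a response begins to appear.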
Finally, we are working with Unity on Sentis, its on-device AI inference engine that empowers game developers to create innovative, AI-driven gameplay experiences on all devices that support the Unity Game Engine. After integrating KleidiAI, Unity Sentis was able to enable int4 quantization, reducing model memory utilization by 72.5 percent and improving performance by 660 percent when running the Phi-2 LLM.
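A back-of-envelope estimate shows why int4 quantization yields savings of roughly this size. The sketch below assumes a common block-wise scheme (4-bit weights plus one fp16 scale per group of 32) compared against fp16 weights; the exact scheme Unity Sentis uses is not described in this post, so the numbers are illustrative only.

```python
def int4_savings(params, group_size=32):
    """Estimate memory reduction from fp16 weights to block-wise int4
    (4-bit weights plus one fp16 scale per group). Illustrative only:
    the actual quantization scheme may differ."""
    fp16_bytes = params * 2                                 # 2 bytes per weight
    int4_bytes = params * 0.5 + (params / group_size) * 2   # weights + scales
    return 1 - int4_bytes / fp16_bytes

# Phi-2 has roughly 2.7 billion parameters
reduction = int4_savings(2.7e9)
print(f"estimated memory reduction: {reduction:.1%}")
```

Under these assumptions the estimate lands near 72 percent, in the same range as the 72.5 percent figure reported above; real savings depend on group size, scale precision, and which tensors are quantized.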
For more information about KleidiAI, read this blog.
Accelerating CV
KleidiCV accelerates the CV pipelines used in many camera use cases. OpenCV, the world’s largest CV library, containing over 2,500 algorithms and supporting hundreds of thousands of developers, has already measured a typical performance uplift of 75 percent across a variety of image processing tasks from its KleidiCV integration. As part of our strategic software partnership with OpenCV, we are also bringing OpenCV Android builds to Maven Central, a repository of open-source software components and libraries for Java development, for the very first time.
For more information about KleidiCV, read this blog.
The benefits of AI on the CPU
Arm Kleidi focuses on accelerating AI capabilities on the CPU because, in most cases, AI workloads start out running on the CPU, making it the easiest path for developers to target. The more performant we make this path, the more likely developers are to keep using and targeting the CPU throughout the development process. Moreover, as LLMs become smaller and more efficient, a growing number of AI workloads will make sense to process on the CPU. The end result is a smoother, more seamless development process that optimizes the performance of developers’ AI workloads.
Building the future of AI on Arm
The introduction of Arm Kleidi re-emphasizes Arm’s role as the leading compute platform for on-device generative AI. It enables developers to access the exceptional AI performance of the Arm CPU across the widest array of hardware possible without the need to learn additional tools and skills. As we continuously innovate our leading-edge architecture for the next generation of AI, developers will have access to even greater, more advanced AI capabilities in the future. For the end-user, this means exceptional AI experiences that are faster, more intelligent, more interactive, more immersive, and more secure.
Arm Kleidi is just the start, with more libraries, compute kernels, and engine integrations planned for the future. Watch this space for further updates as we continue to build the future of AI on Arm.
Any re-use permitted for informational and non-commercial or personal use only.