KleidiAI Integration Brings AI Performance Uplifts to Google AI Edge’s MediaPipe
At the end of May 2024, we launched Arm Kleidi, a set of broad software deliverables and community engagements for accelerating AI across the developer ecosystem. The first of these deliverables is the Arm Kleidi Libraries for popular AI frameworks, featuring Arm KleidiAI for unleashing CPU performance across AI workloads.
Arm is working directly with leading AI frameworks on KleidiAI integrations, which are already proving successful, bringing significant performance improvements to today's generative AI workloads, including leading large language models (LLMs). For developers, the KleidiAI integrations are completely seamless and transparent: there are no additional tools or skills to learn, so developers can move faster and extract maximum performance for their AI-based applications.
The work has been well-received throughout the industry, notably from the world’s largest technology companies. As part of his COMPUTEX 2024 keynote, Arm CEO Rene Haas shared testimonial videos of executives from Google, Meta and Samsung Mobile talking about how KleidiAI will enable them to accelerate AI innovation across multiple markets.
KleidiAI integration with Google AI Edge
The impact of KleidiAI is becoming a reality: we have worked with Google AI Edge to integrate it into the MediaPipe framework, which is accelerated on the Arm CPU through XNNPACK and supports numerous LLMs, including the Gemma 2B LLM. Thanks to the KleidiAI integration, we have seen a 30 percent improvement in time-to-first-token when running our chatbot summarization demo on the Gemma 2B LLM on Samsung's Galaxy S24 smartphone (Exynos 2400), which is built on Arm CPU technologies.
The performance improvements relate to how many tokens are processed per second: with the KleidiAI integration, around 250 tokens are processed each second, making the demo far more responsive. These are exciting results with positive implications for AI and ML developers, and we invite them to test KleidiAI powering Google AI Edge's MediaPipe using our new Learning Path.
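To make the two figures above concrete, here is a minimal sketch of how the metrics mentioned in this article, time-to-first-token and tokens processed per second, can be derived from wall-clock timestamps of a generation run. All names here are hypothetical illustration helpers, not part of MediaPipe or KleidiAI.

```kotlin
// Hypothetical helper for the two metrics discussed in the article:
// time-to-first-token (prefill latency) and decode throughput.
data class GenerationMetrics(
    val timeToFirstTokenMs: Double, // latency until the first output token
    val tokensPerSecond: Double     // throughput after the first token
)

fun computeMetrics(
    startMs: Long,        // when the prompt was submitted
    firstTokenMs: Long,   // when the first output token arrived
    endMs: Long,          // when generation finished
    tokensGenerated: Int  // total output tokens produced
): GenerationMetrics {
    val ttft = (firstTokenMs - startMs).toDouble()
    val decodeSeconds = (endMs - firstTokenMs) / 1000.0
    // Tokens after the first one, divided by the decode window.
    val tps = if (decodeSeconds > 0) (tokensGenerated - 1) / decodeSeconds else 0.0
    return GenerationMetrics(ttft, tps)
}
```

A 30 percent improvement in time-to-first-token shows up directly in the first field; the "around 250 tokens in one second" figure corresponds to the second.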
Arm was one of the first to demonstrate that LLMs can run on the CPU. We expanded this work with Google through the chatbot demo described above: an application that uses the MediaPipe APIs and the XNNPACK CPU backend, which KleidiAI in turn accelerates. This performance shows what is possible for LLMs on the CPU, and how it can enable many real-world AI use cases, including chatbots, smart reply, and message summarization.
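For developers who want to try the same path, a sketch of what such an application looks like on Android using the MediaPipe LLM Inference API is shown below. The model path is illustrative, and the option setters reflect the public Tasks GenAI API as we understand it; check the current MediaPipe documentation before relying on them. Note that KleidiAI requires no code at all from the developer: it is picked up automatically inside XNNPACK.

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Sketch: run a prompt through an on-device LLM via MediaPipe.
// Requires an Android device and the MediaPipe Tasks GenAI dependency.
fun summarize(context: Context, prompt: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        // Illustrative path to a Gemma 2B model bundle pushed to the device.
        .setModelPath("/data/local/tmp/llm/gemma-2b-it-cpu-int4.bin")
        .setMaxTokens(512)
        .build()

    // On the CPU, inference runs through the XNNPACK backend, where the
    // KleidiAI integration accelerates the underlying matrix operations.
    val llm = LlmInference.createFromOptions(context, options)
    return llm.generateResponse(prompt)
}
```

The application code never references KleidiAI directly, which is what the article means by the integration being seamless and transparent.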
A series of firsts
This demo is part of a series of firsts for Arm and the wider AI developer ecosystem.
It is the first demonstration of Arm’s partnership with Google AI Edge on MediaPipe and XNNPACK to accelerate AI workloads for developers on the CPU. This is just the start of our work to bring best-in-class performance to XNNPACK, which is an open-source library of highly optimized neural network operators. As XNNPACK has over 7 billion third-party installs, the KleidiAI integration brings the best in AI performance on the CPU to the widest possible market.
“We are excited to support KleidiAI in Google AI Edge’s XNNPACK to accelerate AI workloads on current and future Arm CPUs. This allows AI developers to access existing and new Arm architecture features to deliver outstanding performance that will only improve over time,” said Matthias Grundmann, Google AI Edge Lead.
It is also the first in a series of Kleidi integrations happening over the coming months, in which Arm will enable many more LLMs to run as effectively and efficiently as possible on-device on the Arm CPU. By focusing on Kleidi integrations across the software ecosystem, Arm is making AI performance available across the broadest range of hardware and accessible to the broadest community of software developers who are building AI-based applications on Arm, for Arm.
Easy access to AI performance
For developers, the KleidiAI integrations accelerate the development process and unlock AI performance on the pervasive Arm CPU to deliver the very best AI-based experiences on devices. KleidiAI also works across all tiers of CPU that utilize our industry-leading architecture features, like Neon, SVE2 and the Scalable Matrix Extension (SME), enabling application developers to build portable software solutions.
As Rene Haas said in his COMPUTEX 2024 keynote: “If you don’t have something developers can get access to, the hardware is not going to do you much good.” Stay tuned for more Kleidi integrations and optimizations as we continue to build the future of AI on Arm.
Learn more about the new KleidiAI technical demo here.
Any re-use permitted for informational and non-commercial or personal use only.