Arm Newsroom Blog

New Arm KleidiAI Integration Accelerates Multimodal AI Experiences at the Edge With Alibaba’s Qwen Model 

Multimodal AI workloads at the edge improve by up to 57 percent, delivering exceptional user experiences, following Arm's collaboration with the open-source MNN framework developed by Alibaba
By Ronan Naughton, Director, Product Management, Client Line of Business, Arm

As part of the ongoing AI evolution, we are seeing the emergence of multimodal AI models that can process and understand multiple data sources, including text, images, audio, video and sensor data. However, deploying these advanced multimodal models on edge devices is challenging due to the power limitations and memory constraints of the hardware, and the complexity of processing multiple data types simultaneously.

Meeting this need is Arm Kleidi, which delivers seamless performance optimizations for all AI inference workloads running on Arm CPUs. KleidiAI, a lightweight suite of highly performant open-source Arm routines for accelerating AI, is already integrated into the latest versions of popular AI frameworks for the edge, including ExecuTorch, Llama.cpp, LiteRT via XNNPACK and MediaPipe. This means millions of developers can automatically benefit from AI performance optimizations with no additional effort needed.

Today, we are delighted to announce another successful KleidiAI integration following a close collaboration with MNN, a lightweight open-source deep learning framework developed and maintained by Alibaba. As a result of this collaboration, multimodal AI workloads are now running on mobile on the Arm CPU via Alibaba's instruction-tuned 2B-parameter Qwen2-VL-2B-Instruct model, which is designed for image understanding, text-to-image reasoning, and multimodal generation across multiple languages on edge devices.

Faster response times for multimodal AI use cases at the edge

Through the KleidiAI integration, Arm and MNN have measured accelerated performance for the Qwen2-VL-2B-Instruct model and faster response times across key multimodal AI use cases at the edge. These unlock enhanced user experiences across a range of customer-focused Alibaba applications, including chatbots for customer service queries and e-shopping applications with photo-to-goods search functions that help customers find the items they are looking for.

The faster response times for these use cases are made possible by a 57 percent performance improvement for prefill, the phase in which an AI model processes multi-source prompt inputs before generating a response, and a 28 percent performance improvement for decode, the phase in which the model generates text after processing a prompt. The KleidiAI integration also lowers the overall computational cost of multimodal workloads, so AI workloads are processed more efficiently at the edge. These performance and efficiency uplifts are available to the millions of developers who run their applications and workloads on the MNN framework and the other popular edge AI frameworks where KleidiAI is integrated.
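To see why separate prefill and decode uplifts matter, it helps to model how total inference time splits between the two phases. The sketch below is purely illustrative (it is not Arm, MNN, or Qwen code, and the per-token costs are hypothetical): prefill cost scales with the number of prompt tokens, decode cost with the number of generated tokens, and the reported 57 percent and 28 percent improvements are applied as speedup factors to each phase.

```python
# Illustrative cost model (hypothetical numbers, not Arm/MNN measurements):
# total inference time = prefill over the prompt + sequential decode.

def inference_cost(prompt_tokens: int, output_tokens: int,
                   prefill_speedup: float = 1.0,
                   decode_speedup: float = 1.0,
                   cost_per_token: float = 1.0) -> float:
    """Return a toy total cost in arbitrary time units."""
    prefill = prompt_tokens * cost_per_token / prefill_speedup
    decode = output_tokens * cost_per_token / decode_speedup
    return prefill + decode

# Baseline vs. the reported uplifts (57% faster prefill, 28% faster decode),
# for a hypothetical 512-token multimodal prompt and 128 generated tokens.
baseline = inference_cost(512, 128)
optimized = inference_cost(512, 128, prefill_speedup=1.57, decode_speedup=1.28)
print(f"baseline: {baseline:.1f} units, optimized: {optimized:.1f} units")
```

Because multimodal prompts (image plus text) tend to be long, the prefill phase dominates time-to-first-token, which is why a large prefill uplift translates directly into the faster perceived response times described above.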

Video: Arm KleidiAI and MNN demo

KleidiAI integration demo at MWC

At MWC, we will be showcasing the multimodal capabilities of the Qwen2-VL-2B-Instruct model via the new KleidiAI integration with MNN at the Arm booth in Hall 2 Stand I60. The demo highlights how the model understands diverse combinations of visual and text inputs before responding with a summary of what is in the image. All of this happens on the Arm CPU on smartphones built on MediaTek’s Arm-powered Dimensity 9400 mobile system-on-chip (SoC), including the vivo X200 Series.

A leap forward in multimodal AI experiences

The integration of Arm’s KleidiAI with the MNN framework for Alibaba’s Qwen2-VL-2B-Instruct model provides a significant step up in the user experience for multimodal AI workloads, all delivered at the edge on the Arm CPU. These outstanding experiences are available on mobile, with leading customer-facing applications already utilizing the benefits of KleidiAI. Looking ahead, KleidiAI’s seamless optimizations for AI workloads will continue to enable millions of developers to create more sophisticated multimodal experiences on edge devices. This will set the stage for the next wave of intelligent computing and an exciting leap forward in the ongoing evolution of AI.

Partner quotes

“We are pleased to see the collaboration between Alibaba Cloud’s large language model Qwen, Arm KleidiAI, and MNN. Integrating MNN’s on-device inference framework with Arm KleidiAI has significantly improved the latency and energy efficiency of Qwen. This partnership validates the potential of LLMs on mobile devices and enhances the AI user experience. We look forward to continued efforts in advancing on-device AI computing.” – Dong Xu, GM of Tongyi Large Model Business, Alibaba Cloud

“The technical integration between the MNN inference framework and Arm KleidiAI marks a major breakthrough in on-device acceleration. With joint optimization of the architecture, we have greatly improved the Tongyi LLM’s on-device inference efficiency, bridging the gap between limited mobile computing power and advanced AI capabilities. This achievement highlights our technical expertise and cross-industry collaboration. We look forward to continuing this partnership to enhance the on-device computing ecosystem, delivering smoother and more efficient AI experiences on mobile.” – Xiaotang Jiang, Head of MNN, Taobao and Tmall Group, Alibaba

Arm Kleidi

Find out more about Arm Kleidi and the benefits it brings to developers.


