First Arm KleidiCV Integration Accelerates Computer Vision Workloads on Mobile by 4x with OpenCV 4.11

The widespread growth of generative and multimodal AI workloads is increasing demand for computer vision (CV) technologies that interpret and analyze visual information from the real world. These technologies are used in many applications, such as face recognition, photo sorting, filter treatments, augmented reality, and more. However, CV applications can struggle to achieve optimal latency and processing speeds, especially on mobile devices that have memory, battery and processing power limitations.
This is where Arm KleidiCV, an open-source library of high-performance image processing functions optimized for the latest Arm CPUs, plays a vital role. It can be integrated into any CV framework to simplify and accelerate performance optimizations for CV workloads with no action needed by developers. For mobile, this matters because accelerating image processing is a crucial first step toward creating more lightweight, accurate and capable models for broader AI use cases, while also enabling CV features to run faster and preserving battery life for users.
Through a new integration with OpenCV, the world’s largest open-source CV library, KleidiCV performance accelerations will be available to millions of CV developers worldwide. The integration, which is live and accessible by default to Android users via OpenCV 4.11, is delivering unprecedented performance enhancements for CV applications on Arm-based devices in Android mobile markets.
The performance benefits of KleidiCV integration with OpenCV
OpenCV is the platform of choice for a global base of CV developers, with over 300,000 daily downloads of the OpenCV Python package. The KleidiCV integration into OpenCV, accessed via the Maven Repository, enables automatic performance enhancements for CV developers by mapping directly to the underlying Arm architecture and features, like Neon and SVE2, that offer a range of acceleration capabilities for CV workloads.
Arm launched KleidiCV last year alongside KleidiAI, a library that provides targeted kernels for integration into AI frameworks and seamlessly accelerates traditional ML and generative AI-based models on Arm CPUs. Initial benchmarking at the time of launch suggested a typical performance uplift of 75 percent for a variety of image processing tasks on OpenCV.
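To put the uplift figures in per-frame terms, the short sketch below converts a speedup into the fraction of the original processing time that remains. It assumes that a "75 percent uplift" means 1.75x throughput and that the headline "4x" means 4x throughput. The article does not define either term precisely, and the function names are illustrative only.

```python
def speedup_from_uplift(uplift_percent: float) -> float:
    """Convert a percentage uplift into a speedup factor.

    Assumption: a 75 percent uplift means 1.75x the baseline throughput.
    """
    return 1.0 + uplift_percent / 100.0


def time_fraction(speedup: float) -> float:
    """Fraction of the baseline processing time left after a given speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


if __name__ == "__main__":
    cases = [
        ("75 percent uplift", speedup_from_uplift(75.0)),
        ("4x uplift", 4.0),
    ]
    for label, speedup in cases:
        remaining = time_fraction(speedup)
        print(f"{label}: {speedup:.2f}x speedup, "
              f"{remaining:.1%} of the baseline time per task")
```

Under these assumptions, a task that took 10 ms at baseline would take roughly 5.7 ms with a 75 percent uplift and 2.5 ms with a 4x uplift.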
However, following the KleidiCV integration into OpenCV 4.11, we have observed a performance uplift of up to 4x, leading to faster CV computations and quicker response times for key image processing tasks used in object detection, object recognition and image segmentation. These tasks include:
- Blur, which enhances images for object detection by reducing high frequency details;
- Filter, which sharpens and smooths images;
- Rotation, which aligns images for object recognition; and,
- Resizing, which reduces the computational load when processing large images.
The key features and benefits of KleidiCV
KleidiCV simplifies the development process by automatically detecting the hardware it runs on and selecting the best implementation accordingly. This means developers can achieve maximum performance without needing to manually optimize their code. Other key features and benefits of KleidiCV include:
- Multithreading: Image processing work can be spread across multiple CPU cores, allowing for faster processing and improved performance.
- Broad applicability: KleidiCV supports a wide range of workloads, including image processing and resizing, making it relevant to applications across automotive, consumer tech and infrastructure markets.
- In-built security: Arm’s Security Development Lifecycle is embedded into the functions of KleidiCV.
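The automatic hardware detection described above can be pictured as a dispatch step that picks the most specialized routine the CPU supports and falls back to a portable one otherwise. The sketch below shows that general pattern only. It is not KleidiCV's actual code or API, and every name in it (the `invert_*` functions, `CANDIDATES`, `select_implementation`) is hypothetical. The "neon" and "sve2" variants are plain Python stand-ins for routines that would use those instruction sets in a native library.

```python
from typing import Callable, List, Set, Tuple

Image = List[List[int]]  # toy grayscale image: rows of 0..255 pixel values


def invert_generic(img: Image) -> Image:
    """Portable fallback used when no specialized feature is available."""
    return [[255 - p for p in row] for row in img]


def invert_neon(img: Image) -> Image:
    """Stand-in for a Neon-optimized routine (hypothetical)."""
    return [[255 - p for p in row] for row in img]


def invert_sve2(img: Image) -> Image:
    """Stand-in for an SVE2-optimized routine (hypothetical)."""
    return [[255 - p for p in row] for row in img]


# Ordered from most to least specialized; the first supported entry wins.
CANDIDATES: List[Tuple[str, Callable[[Image], Image]]] = [
    ("sve2", invert_sve2),
    ("neon", invert_neon),
]


def select_implementation(cpu_features: Set[str]) -> Callable[[Image], Image]:
    """Return the best available implementation for the given feature set."""
    for feature, implementation in CANDIDATES:
        if feature in cpu_features:
            return implementation
    return invert_generic


if __name__ == "__main__":
    # In a real library the feature set would come from querying the CPU;
    # here it is passed in by hand for illustration.
    chosen = select_implementation({"neon"})
    print(chosen.__name__, chosen([[0, 128, 255]]))
```

Because every variant returns the same result, the caller never needs to know which one was chosen. That property is what lets a library pick the implementation without developers changing their code.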
Enhancements and updates to OpenCV 4.11
The OpenCV 4.11 update brings several enhancements to its suite of tools and functionalities for CV workloads that complement the KleidiCV integration. These include:
- Improved DNN module, which provides initial support for 3D convolution networks and async inference with the InferenceEngine backend.
- Enhanced Calib3d module, which adds a new IPPE algorithm for solvePnP and pose refinement routines.
- Optimized universal intrinsics, which provide the AVX-512 implementation and other optimizations for better performance.
These updates, combined with KleidiCV’s optimizations, significantly enhance the capabilities of OpenCV, making it a powerful tool for developers.
Shaping the future of CV workloads
With the first integration of KleidiCV now complete, we are demonstrating how software optimizations unlock new CV performance and capabilities on Arm CPUs in mobile devices. This paves the way for accelerated CV workloads and models in markets beyond mobile, such as robotics, automotive and medical applications.
By leveraging the power of Arm CPUs and widespread use of OpenCV by developers, the OpenCV 4.11 KleidiCV integration delivers substantial performance enhancements for a wide range of CV applications. Through KleidiCV and our leading compute platform, we are shaping the future of CV in mobile and beyond.
Learning path on Arm KleidiCV performance optimizations
For more information about the KleidiCV integration with OpenCV and the performance optimizations that this enables, please visit the Learning Path in the link below.