Unlocking New Real-world Generative AI Use Cases on the Mobile CPU
In 2022, the first widely seen example of generative AI emerged through text-to-image generation in the cloud. The text prompt was "a photograph of an astronaut riding a horse", and the generative AI workload created an image of just that. While the image had some flaws, it showcased the awe-inspiring power and potential of generative AI workloads.
I remember thinking to myself at the time: "This is great, but could it ever be processed entirely on a mobile device, rather than in the cloud?"
Generative AI is (already) part of today’s smartphone experience
Fast-forward to today and it can. In fact, many generative AI workloads, like image generation and text summarization, that are now a common part of the modern smartphone experience are being processed at the edge – on the device. This is largely thanks to the computing capabilities of today's AI-enabled flagship smartphones and to the large language models (LLMs) behind generative AI becoming smaller and more efficient. These trends will continue, with generative AI set to be a part of every mobile application in the near future.
AI workloads start on the CPU
As we’ve talked about previously, AI on mobile starts on the CPU. It offers software flexibility and programmability for the world’s developers. On top of this, the ubiquity of the CPU, which features in every single digital consumer device on the planet, means developers can “write once, deploy everywhere” when creating their applications, ensuring they reach the widest number of users.
Earlier this year, we demonstrated a chatbot, deployed as a virtual teaching assistant for science and coding, running on the mobile CPU. The success of that demo led us to explore other practical generative AI mobile use cases that run on the Arm CPU and could be used every day by the average smartphone user. This led to the creation of three new demos: group chat summarization, voice note summarization and a real-time voice assistant. Like the chatbot demo, these run generative AI workloads entirely on the device, which provides privacy, latency and cost benefits compared to sending data to the cloud for processing.
The new generative AI demos
For me personally, group chat and voice note summarization are brilliant life hacks. Like most smartphone users, I can get inundated with various messages and voice notes from friends and family, so being able to use generative AI to summarize what’s been said is invaluable.
The group chat summarization demo quickly distills group chat messages between multiple participants down to the key points in an easily digestible format. While the demo showcases group chat messages, the same approach can be applied to other content, such as email summarization. This use case could also be multimodal, with pictures included as part of the summarization.
The voice note summarization demo shows how a speech-to-text model and an LLM can work together in a pipeline to transcribe and summarize voice notes sent to users: the speech-to-text model converts the voice note to text, and the LLM then summarizes that text. This demo is a real time-saver!
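The two-stage pipeline can be sketched in a few lines. This is a minimal, illustrative sketch only: the `transcribe` and `summarize` stubs below stand in for real on-device models (such as a speech-to-text model and an LLM), and all names and outputs here are hypothetical, not the actual demo code.

```python
# Hypothetical sketch of the voice note summarization pipeline:
# a speech-to-text stage feeds its transcript into an LLM summarization
# stage. The stubs below stand in for real on-device models.

def transcribe(audio: bytes) -> str:
    """Stand-in for an on-device speech-to-text model."""
    # A real implementation would run speech recognition on the audio.
    return "Pick up groceries at 5pm, then call Sam about the weekend trip."

def summarize(text: str) -> str:
    """Stand-in for an on-device LLM prompted to summarize."""
    # A real implementation would send a prompt such as
    # "Summarize this voice note: <transcript>" to the model.
    return "Groceries at 5pm; call Sam about the weekend trip."

def summarize_voice_note(audio: bytes) -> str:
    # Stage 1: convert the voice note to text.
    transcript = transcribe(audio)
    # Stage 2: distill the transcript down to the key points.
    return summarize(transcript)
```

The key design point is that the two stages are independent: either model can be swapped for a smaller or larger one without changing the pipeline.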
Like the previous chatbot demo, what's innovative about the real-time voice assistant demo is that it runs entirely in flight mode, with no network connection. This shows the capability of the Arm CPU to process generative AI workloads entirely on the device.
The demo uses whisper.cpp for automatic speech recognition, with the output then passed to an LLM module that uses Google AI Edge's MediaPipe to run the Gemma 2B model; there is also the option to use a Llama 3 model. Even with a model of only a few billion parameters, the demo achieves a real-time conversation, with a realistic voice and context retention.
Both the whisper block for speech recognition and the LLM block for generating the response integrate Arm KleidiAI, a collection of highly optimized AI kernels that deliver high performance for generative AI workloads. This use case could also extend to automotive applications for hands-free, voice-command driver-to-device interactions, such as asking for directions while driving, or to interactive dialogue with characters in gaming.
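The assistant's turn-by-turn loop can be sketched as follows. This is a hypothetical illustration, not the demo's actual code: `recognize_speech`, `generate_reply` and `speak` are stand-ins for the whisper.cpp, MediaPipe LLM and text-to-speech stages, and the message format is an assumption. It shows how keeping the full conversation history as the LLM's input is what provides context retention.

```python
# Illustrative sketch of one turn of the real-time voice assistant:
# ASR -> LLM (with running history) -> TTS, all on-device.
# All function names and data shapes here are stand-ins.

def recognize_speech(audio: bytes) -> str:
    """Stand-in for whisper.cpp-style automatic speech recognition."""
    return "What's the weather like tomorrow?"

def generate_reply(history: list) -> str:
    """Stand-in for an on-device LLM (e.g. Gemma 2B via MediaPipe)."""
    # A real implementation would feed the full history to the model;
    # that accumulated history is what gives context retention.
    last_user_turn = history[-1]["content"]
    return f"(reply to: {last_user_turn})"

def speak(text: str) -> None:
    """Stand-in for a text-to-speech stage producing the voice reply."""
    print(text)

def assistant_turn(audio: bytes, history: list) -> list:
    # One round trip, appending both sides of the exchange to the history.
    history = history + [{"role": "user", "content": recognize_speech(audio)}]
    reply = generate_reply(history)
    speak(reply)
    return history + [{"role": "assistant", "content": reply}]
```

Because the growing history is re-fed to the LLM each turn, follow-up questions like "and the day after?" can be resolved against earlier turns without any cloud round trip.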
For all three demos, we use AI-enabled flagship smartphones that adopt Armv9 CPU technologies in their chipsets, including the Google Pixel 8 and Pixel 8 Pro (Google Tensor G3 chipset), and the Xiaomi Redmi K60 Ultra and vivo X100 (MediaTek Dimensity 9300 chipset). The Armv9 CPU technologies integrate the latest architecture features for enhanced AI performance, including SVE2.
In the future, AI-enabled flagship smartphones built on Arm CPUs will utilize the Scalable Matrix Extension (SME) architectural feature, which accelerates AI workloads and enables improved performance, power efficiency and flexibility for AI-based applications running on the Arm CPU.
Looking to the future
While present day possibilities from generative AI are incredible, the future is likely to be even more exciting. In fact, I believe that we are scratching the surface of generative AI on mobile, particularly with image and video generation.
Recently, OpenAI showcased text-to-video generation, and Luma Labs demoed image-to-video generation. While both of these generative AI workloads are currently processed in the cloud, if we follow the current trajectory, there is no reason why they couldn't be processed on the mobile CPU in two years' time – just like the astronaut riding the horse!
Generative AI on mobile runs on Arm
With so many different use cases and workloads now possible, generative AI is cementing the smartphone's position as the center of personal and professional compute. This makes it a hugely exciting time for generative AI in the mobile space.
Through our ubiquitous CPU technologies, which feature in 99 percent of the world's smartphones, and our industry-leading mobile ecosystem, Arm is enabling these amazing possibilities.
As we continue to add more capabilities and architecture features to the Arm CPU, alongside unlocking yet more AI performance for developers through Arm Kleidi, Arm is the mobile platform for the future of AI.
AI runs on Arm
Learn how Arm is accelerating AI everywhere, from cloud to edge.