November 18, 2020
SiMa.ai Sets Sights on High Performance, Low Power Endpoint AI
SiMa.ai’s Kavitha Prasad explains the driving force behind the company’s endpoint AI SoC and the wide-ranging applications it enables
By Kavitha Prasad, VP Business Development and System Applications, SiMa.ai
The lifecycle of an endpoint AI device may span years, even decades. Those that go the distance will be capable of processing the machine learning (ML) algorithms of the future.
While we may not know what those algorithms will look like, we can be sure that they will be more complex and more demanding than the workloads we task endpoint AI devices with today.
Today's endpoint AI devices are capable of around 4 or 5 tera operations per second (TOPS) per watt. That's enough for basic ML routines, yet it pales next to the AI compute a data center can offer.
Reducing the power profile of endpoint AI
SiMa.ai began as an ambition to shrink this performance divide: to redefine the performance associated with endpoint AI today. Yet achieving anything close to cloud-like performance in an endpoint AI device would require a marked reduction in power consumption, or rather, a significant increase in TOPS per watt.
With this goal in mind, we developed the MLSoC™ (Machine Learning System on Chip) platform, targeting a peak of 10 TOPS per watt. Within an embedded power profile of 5 watts, our ML accelerator can achieve up to 50 TOPS. That's enough to run AI workloads that would traditionally require cloud performance on a passively cooled endpoint AI device.
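The arithmetic behind these figures is straightforward: the available compute budget is simply efficiency (TOPS per watt) multiplied by the power budget. A minimal sketch using the numbers quoted above (the function name is ours, for illustration only):

```python
def compute_budget_tops(efficiency_tops_per_watt: float, power_watts: float) -> float:
    """Peak ML compute available within a given power budget."""
    return efficiency_tops_per_watt * power_watts

# SiMa.ai's stated target: 10 TOPS/W within a 5 W embedded power profile
mlsoc_tops = compute_budget_tops(10, 5)      # 50 TOPS

# Typical endpoint AI device today: ~4 TOPS/W at the same 5 W
typical_tops = compute_budget_tops(4, 5)     # 20 TOPS
```

The same 5-watt envelope yields more than twice the compute, which is the divide the article describes.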
We designed our heterogeneous MLSoC to process the workloads our customers have already created, but also to be future-proofed for upcoming workloads none of us has identified yet. Unlike a data center, which can be upgraded as new iterations of components come to market, the hardware embedded within an endpoint AI device is fixed the day it's baked into silicon.
Our solution to this challenge combines traditional
compute IP from Arm with our own machine learning accelerator and dedicated
vision accelerator. As the market leader in low power compute, Arm IP was the
obvious choice as a secure platform upon which to build our MLSoC. We chose the
Arm Cortex-A65 CPU after working closely with our customers to define the
compute requirements for their applications: it was a decision very much based
on customer needs, from performance down to software toolchain.
While it’s capable of a wide range of ML workloads
such as natural language processing (NLP), SiMa.ai’s MLSoC is initially optimized
for computer vision applications. Computer vision is already central to many
endpoint AI use cases,
from traffic cameras to manipulating selfies—and we believe its use will only
increase in future applications such as high-end surveillance, crowd control
and thermal scanning.
Computer vision unlocks future complex use cases for endpoint AI
Combining the vision accelerator with the ML accelerator also ensures the MLSoC can handle complex workloads such as fusing data from multiple sensors. This enables it to play a role in autonomous systems, from consumer autonomous vehicles to autonomous robots in industrial IoT settings. We also foresee a role for the MLSoC in aerospace and defense.
Some of these complex autonomous workloads require more than 50 TOPS. That's why we've designed the MLSoC to be modular: by combining multiple machine learning accelerator mosaics via a proprietary interconnect, we can scale from 50 TOPS at 5 watts up to 400 TOPS at 40 watts.
Given that today's level 5 autonomous vehicle prototypes draw around 4 kilowatts, that's potentially a 100x reduction in power consumption, along with a much smaller physical hardware footprint and a reduced need for active cooling.
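The scaling claim can be checked with a few lines of arithmetic using only the figures quoted above (the mosaic count is our inference from those figures, not a disclosed specification):

```python
# Figures quoted in the text
BASE_TOPS, BASE_WATTS = 50, 5      # one ML accelerator mosaic at 5 W
MAX_TOPS, MAX_WATTS = 400, 40      # maximum modular configuration

# Inferred number of mosaics needed to reach the top configuration
mosaics = MAX_TOPS // BASE_TOPS            # 8

# Efficiency is preserved as the design scales
efficiency = MAX_TOPS / MAX_WATTS          # 10.0 TOPS/W

# Today's level 5 AV prototype vs. the maximum MLSoC configuration
prototype_watts = 4000
reduction = prototype_watts / MAX_WATTS    # 100.0x
```

Note that efficiency stays at 10 TOPS per watt at both ends of the range, so the modular scaling is linear in both compute and power.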
There's another good reason for reducing power consumption in devices that will soon fill our world in the millions. Many of the OEMs and customers we talk to are keenly focused on bringing down their power profile so they can become carbon neutral by 2030 or earlier. That's reason enough for us to want to design something low power.
I believe that MLSoCs will play a key role in enabling low-power AI in edge and endpoint devices. But I also know that it's not enough to simply license a solution benchmarked to achieve a certain number of TOPS.
Many of the solutions on the market today advertise their performance based on benchmarks such as ResNet-50. But quoting frames per second or TOPS per watt only matters if it is achievable under real-world conditions, that is, on our customers' workloads.
Our customers want one thing: development velocity, or how quickly they can get to market. They don't want to spend months in development cycles trying to achieve the performance they've been promised; they want to license a solution and then add their own secret sauce using simple, comprehensive tools.
We’re planning to tape out our MLSoC early next year, with a view to delivering engineering samples and potentially customer samples towards the end of next year. However, we’re already working very closely with customers to define and build their applications and map them to our hardware, and the software development kit (SDK) will be available to customers in advance.
This means they’ll be able to work through the flows, develop their applications and run simulations so that when the silicon becomes available it’s simply a case of compile-and-go.
And because MLSoC is grounded in Arm technology, our customers can be sure that they will have the software, tools and ongoing support they need to build not only the next generation but many subsequent generations of highly capable, low power AI devices.