Pushing AI to the Edge: A Conversation
Summary
BrainChip CMO Nandan Nayampally interviews Arm Fellow Ian Bratt about bringing more advanced AI capabilities to edge devices. They discuss the growing demand for on-device AI to handle privacy, latency, and scalability issues. Ian notes that while state-of-the-art models will continue to get larger, there is also constant work on optimization and creating smaller, more specialized models that can run efficiently at the edge.
They talk through challenges including enabling no-code development environments and getting enough quality data for training models that will reside on edge devices. Ian also touches on Arm’s efforts like the TOSA specification to help standardize and stabilize the fragmented edge AI ecosystem and where future innovation may be headed.
Speakers
Ian Bratt, Fellow, Senior Director of Technology, Arm
Ian Bratt is a Fellow and Distinguished Engineer at Arm, where he leads the Machine Learning Technology group within the ML business unit. Recently, Ian’s team defined the architecture for Arm’s family of Machine Learning Processors and has been responsible for multiple ML-related improvements to the Arm IP roadmap. Before working in machine learning, Ian worked as an architect on several generations of Arm Mali GPUs, during a high-growth period which culminated in Arm partners shipping over 1B Mali GPUs in 2016. Prior to Arm, Ian worked at the pioneering multicore startup Tilera. Ian has worked on NPUs, CPUs, GPUs, memory systems, and SoC architecture. He holds an S.M. from MIT and has 23 granted US patents.
Nandan Nayampally, CMO, BrainChip
Nandan Nayampally is the Chief Marketing Officer at BrainChip Holdings Ltd, a role in which he drives all aspects of marketing, product management, and business strategy for the company’s neuromorphic AI IP and its portfolio of Essential AI enabling technology solutions. He has over 25 years of success in building and growing technology businesses with industry-wide impact. Prior to joining BrainChip, Nayampally spent more than 15 years at Arm in various product marketing and product management leadership roles, eventually becoming vice president and general manager of Arm’s signature CPU group and the Client Line of Business. He also worked at Amazon, where he helped accelerate the adoption of Alexa Voice and other multimodal services into third-party devices.
Transcript
Moderator: This is the BrainChip podcast. Hear from our thought leaders about neuromorphic computing, beneficial AI, and how BrainChip’s Akida is pushing AI to the edge. This podcast is a place for investors, practitioners, and anyone interested in the future of AI.
Nandan Nayampally, BrainChip: Hello, and welcome to episode 30 of BrainChip’s “This is Our Mission” podcast.
I’m Nandan Nayampally, Chief Marketing Officer and Head of Product here at BrainChip. And today we’re going to explore intelligence at the edge with another esteemed guest, Ian Bratt, who’s a well-known name in AI. Ian is a Fellow and Senior Director of Technology for the Central Engineering Group at Arm.
Ian has an extensive background in all forms of compute, having worked in CPU, GPU, obviously in memory systems, but in particular as a lead technologist for Arm’s Neural Processing Initiative. Welcome Ian.
Ian Bratt, Arm: Thank you, Nandan. Thank you for having me.
Nandan: So it’s great to reconnect. I think Jem Davies was with us a year ago, so it’s become an annual ritual for us to work closely with Arm and compare notes on where things are trending. What I really want to do in this episode: you’ve been at the forefront of AI implementations across a broad set of market segments, and naturally edge computing is now coming into the spotlight.
And this is really where we believe, and I think you would too, that the scaling of AI and the delivery of intelligent services come together. Even leading lights like OpenAI’s Sam Altman mentioned earlier this year that large language models won’t always be a cloud-only focus; smaller, more specialized models will emerge rapidly.
Let’s use that as a starting point and see what you think about it.
Ian: Yeah, so I definitely want to echo your point about the explosion of edge AI. The way I think about AI from a compute perspective is that the demand is essentially insatiable, at all levels of the stack.
And so therefore, if you can do it on the edge, then it will be done on the edge. So there’s huge demand to enable significant AI workloads on the edge. I think Altman’s comment about seeing optimized and smaller models in the future is definitely true.
And if we look at the past history of model development, that roughly rings true, although I think there’s a bit of a nuance to it. What we have seen in the past is that there will be a breakthrough, a breakthrough for some new use case, and there will be a neural network that satisfies the requirements for that use case.
And then you see a phase of optimization, which makes the model much smaller, much more optimized, and more amenable for deployment on the edge, and deployment in general, right? If you can do it with less compute, what’s not to like? But I think that process is happening in conjunction with this ever-increasing model size.
So we see a big model, then we see an optimization, then we see a new breakthrough with an even bigger model, and then more optimization, and we keep seeing that repeat. On the vision side, Inception V3 was a big deal, and then MobileNet was significantly smaller with excellent performance. With large language models, we definitely saw some big improvements from the original GPT-type work to the smaller models. So I think that trend will continue: there’ll be a breakthrough and then an optimization phase. But that general trend of just more and more compute, I think, is still holding, with points along the way where engineers pick off and optimize the networks.
Nandan: I think that’s a fair observation, right? In fact, if we look at transformers four or five years ago, they were huge and they continue to get bigger, but you now have versions of them available for the edge. You’ve got vision transformers; you’ve got tiny vision transformers.
I’d say we’ll see similar types of optimization for specialized use cases, as you say, in the LLM space as well. Speaking of which, do you have any bets on what use cases will drive some of these optimizations?
Ian: I’ve always been a bit reticent to make specific bets, because the way I think of these use cases is that it’s not about some new shiny use case that maybe didn’t exist before. It’s really about making existing use cases better. On the edge, things like classic vision and audio processing have been around for a long time, but we’ve been very limited in what we can do.
But with this continued advancement of neural network technology and the enablement of the underlying platforms, we’re just going to be able to make those experiences much, much better, a step function in performance. So that’s the way I think about it. It’s a rising tide that lifts all boats.
Nandan: That’s a fair assessment, I think. But on the edge, if I look at it, keyword spotting and visual wake words were a starting point. Graduating to something more comprehensive in its ability to handle speech, even if it is a limited subset, seems like a natural next step, and adding vision as a broader capability rather than just wake words is possible to do on smaller models. But speaking of that type of trend:
obviously there’s a large set of customers trying to bring that, what should I call it, condensation of models down to the edge, and projects like Llama are trying to do similar things. Have you seen a lot of traction around that on Arm platforms? Clearly smartphones are a great place to start, but on broader edge devices,
do you see some of the trajectories around Llama coming down?
Ian: Yes. So the question is specifically about generative large language models being pushed to the edge. Absolutely. And I think there are multiple forces driving that from both directions.
One, from the edge perspective, for those use cases, maybe an auto-completer or some sort of chat experience on your edge device, latency is very important and there might be privacy concerns. So if you can do that on the edge, that’s great. On the other end, it takes significant compute and storage to run these models.
And if every single device on the planet is now using large language models a few times a second, and you send all of that to the cloud, there’s just nowhere near enough cloud to handle it. So those forces, really cooperative forces, the cloud actually being unable to handle all that compute plus the desire to have it on the edge, I think are working together to make that happen. So I think it’ll definitely happen.
Nandan: Yeah, that makes perfect sense, right? I think we’re beginning to see a consistent theme here, and almost everybody agrees on it. The rough numbers I had, and you may have better insight, were that a ChatGPT-based search is probably ten times the compute cost of a regular search. That alone puts a tax on it and, in fact, probably lines up with Altman’s statement that you’re going to have to do more on the edge to make it economically sustainable on both sides, not to mention, as you say, the privacy and security aspects and the real-time response aspect of it.
Ian: Yeah, I think that’s a great way to look at it. When you start thinking about all of the scenarios that could be improved with LLMs, like every time I’m writing an email, anytime I’m doing anything on my local laptop or on my phone, that could be calling into LLMs, and it’s just not scalable to send all of that to the cloud. That just won’t happen.
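To put a rough shape on that scaling argument, here is a back-of-the-envelope sketch in Python. Every number in it is hypothetical, chosen purely to illustrate why routing every on-device LLM call to the cloud runs into a wall; it is not a claim about actual device counts or model costs.

```python
# Hypothetical back-of-the-envelope: what cloud-only LLM inference would require
# if every edge device called a small LLM frequently. All figures are illustrative.
devices = 3e9                     # hypothetical number of connected devices
calls_per_device_per_day = 200    # autocomplete, email drafting, assistants...
flops_per_call = 2e12             # hypothetical cost of one small-LLM inference

daily_flops = devices * calls_per_device_per_day * flops_per_call

# A hypothetical cloud accelerator sustaining 1e14 FLOP/s around the clock:
accelerator_flops_per_day = 1e14 * 86_400
accelerators_needed = daily_flops / accelerator_flops_per_day

print(f"Total demand: {daily_flops:.2e} FLOPs/day")
print(f"Accelerators running 24/7: {accelerators_needed:,.0f}")  # ~139,000 under these assumptions
```

Changing any assumption shifts the exact figure, but the multiplicative structure, devices times call rate times per-call cost, is what makes cloud-only inference grow so quickly and pushes workloads toward the edge.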
Nandan: Since we are on this topic of kind of what’s the big buzz at the moment, what are your thoughts on multimodal edge AI?
Ian: I think that’s inevitable, but I don’t have a concrete idea of how it will manifest. If you think of edge AI systems, they are primarily sensing systems that are making some decisions, providing feedback, determining when to notify somebody of something or when to log an event.
And in those sensing systems, the more modalities of sensing, the better the result. So I think we will definitely see that trend continue in edge AI, but specifically how, I’m not quite sure. But yeah, I think that will happen.
Nandan: I think from an investment perspective, that’s where a lot of the focus and dollars are going, and how to make that more possible. Which also brings me to the point that the three V’s, as we’ve talked about, vision, voice, and vibration, are the minimum set. If you can do all three on an edge device, that is effectively where everybody is trying to go. Is that a fair statement?
And if so, what do you see as the key challenges there? And how are you addressing it at Arm, for example, both internally and through your partnerships and ecosystem?
Ian: Oh, boy. That’s a great question. Just to play it back so I make sure I understand: in a future where the voice, vibration, and vision aspects are working together in multimodal edge systems,
what are the key challenges and how do we enable that? I’ll take the enablement bit first, from the Arm perspective. A lot of it is around, I don’t want to say business as usual, because that maybe trivializes it, but it is around basically providing more energy-efficient compute, right?
That’s what the Arm ecosystem does well. The more capability you have in those envelopes, the more you will be able to express in those systems and the richer the problems you’ll be able to solve. I think the key challenges are probably around the practicalities
of data collection and designing your neural networks with the right data, such that they’re robust and can handle all sorts of different scenarios. So a lot of the challenges are really in the applications area, around data and application design.
Nandan: And I think this is a really good jumping-off point to the reality, which is that it’s software and ecosystem rather than the efficiency of the hardware itself, right? The systems are great, but it’s really the enablement of how those things come to market. Give us a little bit of insight on how you’re working with things like NVIDIA’s TAO on Cortex-M and the Ethos platform.
And I was also going to ask, what does that mean for the market? How can you actually help people do local no-code development and enable AI services or applications to move closer to the edge?
Ian: Yeah, that’s a great question, and thank you for mentioning NVIDIA TAO. Going back, I want to say almost six years ago, we started some work at Arm where the vision was a tool into which you would input your data and tell it, oh, I’m running on this edge device.
And then the tool would do some compute for a while and out would come an optimized neural network for that data, for that device, subject to the constraints of the device, so small and energy efficient. That was the vision of the work we started a long time ago.
And it’s so easy to articulate a vision like that, and we were so wrong about the state of the technology needed to enable that sort of tool. Yes, we could get some things to work, but it would take an army of PhDs turning knobs and cranking levers to get anything out. So that automated flow
turned out to be very hard to achieve, and I think a lot of companies have been realizing that. It’s just going a bit slower than we had expected, but it is happening. We’re now starting to see things in the ecosystem like NVIDIA TAO where, if you constrain the problem and say I’m just going to support these certain networks and these sorts of
optimizations in the search space, then yes, you can create some optimized models that might be better suited for the edge. So like everything in engineering, the real path forward is constraining it to a simpler case and providing those tools. And then over time we’ll get more and more towards that automated vision of the no-code, push-button flow that produces an optimized neural network.
But I think it will take a long time for that to happen. So I was definitely wrong about how quickly that would manifest.
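The constrained, tool-driven optimization Ian describes can be made concrete with a small sketch. This is not the NVIDIA TAO workflow itself, just one illustrative step, post-training quantization with the TensorFlow Lite converter, applied to a hypothetical toy model; the model architecture, shapes, and file name are placeholders.

```python
# A minimal sketch of shrinking a trained model for edge deployment using
# post-training quantization via the TensorFlow Lite converter.
import tensorflow as tf

# Hypothetical small vision model standing in for a network trained in the cloud.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(96, 96, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable default weight quantization
tflite_model = converter.convert()

with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model)

print(f"Optimized model size: {len(tflite_model) / 1024:.1f} KiB")
```

This is the "constrained" end of the spectrum: the search space is a fixed architecture and a fixed set of optimizations, rather than the fully automated, data-in/model-out tool Ian describes as the long-term goal.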
Nandan: I think it’s the famous mindset of overestimating what a technology can do today and underestimating what it can do in ten years, right?
With the applications, especially at the edge, we probably don’t even know the scope of what we’ll see in ten years, but we’re probably rushing too hard and assuming it’ll happen in one.
Ian: Yeah, exactly.
Nandan: And to that effect, in the podcast we did last week with Zach Shelby from Edge Impulse, a common partner of ours,
this was another real topic, right? How do you actually make the edge a first-class citizen in terms of AI development? How do you support what the cloud side, and at least what we’d call the network edge side, has already done in terms of enabling it with optimized modeling frameworks that work well and development environments that work well?
And I think Edge Impulse’s model is similar, right, enabling that and using it to their advantage. Now we come to the challenges: an AI model is only as good as the data that was used to train it, and this, I think, is probably a bigger challenge when you start thinking about the edge than it was for the cloud.
How do you see that happening? Is it a collective path of improving it, or are there known ways to solve that problem, considering that the types of models you run at the edge may be very different because they are very sensor-specific, and so on? I know this is a bit of a hand-wavy, high-level question, but I think there is a lot of challenge in making that work well.
Ian: Yeah, I agree. I think that’s a huge challenge. I’ve heard the phrase “data is the new oil,” and the data needed, like you said, for these edge devices is particularly hard to extract and refine. So I think that is probably an unsolved problem, or at least I’m sure lots of folks are trying to solve it right now.
But yeah, I think that’s very important, just as important and as challenging as actually deploying all of the AI on the edge. Getting that clean edge data is going to require lots of innovation and probably entrepreneurship to get the ecosystem to the point where that data is out there for people to use.
Nandan: That’s a very good point, right? Think about AI innovation at the edge not just in terms of hardware and models, but also tools that can help you build those models more effectively or train your next generation in a better way. And this kind of brings me to something from the old Arm vaults, where Mike Muller, the former CTO, said there’s no big data without little data.
There are similar aspects as you get much closer to the sensor: you need the sensor data, and the intelligence applied to it, to be utilized well there to make the collective higher intelligence, if you will, work better.
Ian: Yeah, definitely. I love that saying. And I think another interesting aspect of this angle is that even if you’re getting the data, it’s often noisy, or you’re generating it on maybe lower-cost sensors.
And so there’s always going to be a need in these platforms for more classical computing, just to process that data and get it ready before it feeds into the neural network. It’s easy to lose sight of that aspect of the pipeline, but it’s very important in these edge systems to think about it. We sometimes call it the neural network sandwich: the neural network is in the middle, but there’s often compute before and after it, to process the data and also to interpret the results.
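Here is a minimal sketch of that “neural network sandwich,” assuming a hypothetical audio keyword-spotting pipeline; the feature extraction, the stand-in network, and the smoothing step are illustrative placeholders rather than any specific Arm or BrainChip pipeline.

```python
# Classical pre-processing -> neural network -> classical post-processing.
import numpy as np

def preprocess(audio: np.ndarray, frame_len: int = 400) -> np.ndarray:
    """Classical DSP: frame the signal and take a log-magnitude spectrum."""
    n_frames = len(audio) // frame_len
    frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
    spectra = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    return np.log1p(spectra)

def neural_network(features: np.ndarray) -> np.ndarray:
    """Stand-in for the model in the middle (e.g. a tiny keyword spotter)."""
    rng = np.random.default_rng(0)
    weights = rng.standard_normal((features.shape[1], 3))  # 3 hypothetical classes
    logits = features @ weights
    return np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

def postprocess(probs: np.ndarray) -> int:
    """Classical logic again: smooth per-frame scores and pick a class."""
    smoothed = probs.mean(axis=0)
    return int(np.argmax(smoothed))

audio = np.random.default_rng(1).standard_normal(16000)
print(postprocess(neural_network(preprocess(audio))))
```

In a real edge system the “bread” (pre- and post-processing) often dominates the engineering effort and sometimes the compute, which is exactly the point raised next.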
Nandan: I think that’s an excellent segue. I do think, at least from what we are seeing, and I wanted to check whether you were seeing similar things, that in the neural network sandwich, as you call it, sometimes the bread slices are much bigger than the filling.
Ian: That’s right, and nobody likes a sandwich that’s all bread.
Nandan: How do you see the neural network getting smarter to try to reduce the amount of bread it needs? Do you see that as a trend happening across the industry? Because we certainly think there is more value in saying the less filtering I need to do before I feed data into my network, the better, and the less I need to post-process, the better; there are smarts going into networks to reduce filtering and preprocessing.
Ian: Yeah, I think that’s definitely something we’re seeing as well. You can think of the neural network as essentially eating into those classic algorithms, the pre- and post-processing. But in some scenarios the pre- and post-processing is sufficiently well understood;
there are sufficiently sophisticated classical algorithms that are very energy efficient, and we shouldn’t throw those away and redo the work with a very inefficient neural network just because we can. So it’s about finding the right trade-off.
Nandan: Yeah. Following that trend, what kind of form factors are you expecting? Let’s say I look further out, to 2030: do you expect devices that are going to run on years of battery life, say an embeddable device with enough intelligence to both
manage what it needs to and survive on very little, or even do energy harvesting? Do you think the models and the technology get to that point in the next five to ten years?
Ian: Yeah, I do. I think it’s going to require riding the innovation curves across all aspects of the stack, battery technology, transistor technology,
all of that coming together. And then in some of these sensor systems, they might be always on, but you can have a hierarchy of processing. You can have very rudimentary detection to determine whether or not you should wake up something a little more sophisticated, which could then determine whether or not you wake up an even more sophisticated device.
So I think that style of system can be very energy efficient in the long term, and if you look at the trends around technology scaling, then yeah, I think that will happen.
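A minimal sketch of the hierarchical wake-up style Ian describes, with hypothetical thresholds and stand-in detectors; a real system would replace each stage with progressively more capable hardware and models.

```python
# Cascaded always-on processing: a cheap check gates a small model, which
# gates the expensive model. All thresholds and stages here are placeholders.
from typing import Optional
import numpy as np

def cheap_energy_detector(frame: np.ndarray, threshold: float = 0.01) -> bool:
    """Stage 1: near-free check, e.g. signal energy above the noise floor."""
    return float(np.mean(frame ** 2)) > threshold

def small_keyword_model(frame: np.ndarray) -> bool:
    """Stage 2: placeholder for a tiny NN that confirms a keyword-like pattern."""
    return float(np.max(np.abs(frame))) > 0.5  # stand-in for real inference

def large_speech_model(frame: np.ndarray) -> str:
    """Stage 3: placeholder for the expensive model that only runs when woken up."""
    return "transcription of the detected utterance"

def process(frame: np.ndarray) -> Optional[str]:
    if not cheap_energy_detector(frame):   # always on, almost no power
        return None
    if not small_keyword_model(frame):     # woken occasionally
        return None
    return large_speech_model(frame)       # woken rarely, most capable

print(process(np.zeros(16000)))            # quiet frame -> None
print(process(np.ones(16000) * 0.8))       # loud frame -> reaches the big model
```

The design point is that the cheapest stage runs constantly while the expensive stages run rarely, which is what keeps the average power within an always-on budget.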
Nandan: And this kind of brings me to tinyML, which is effectively what you’re talking about.
TinyML started a few years ago, and it is basically always-on, very simple workloads only at the edge, and some of the MLPerf-type benchmarks started going down that path: I’m only going to do this for very small devices that can be always on and support it.
What do you see as the need, from the market perspective, to expand that? When you get to always-on, you then kick something else up a level, but it still needs to be energy efficient and do its thing, and maybe that kicks another level up in turn. Is there a set of benchmarks that you think the industry or a consortium needs to come up with that will help us understand the good use cases, as well as growing out of tiny into, let’s say, small ML?
Ian: Yeah, that’s a good question, and it’s funny you ask, because it reminds me of the early days of tinyML, when we had committees organizing the first workshop and spent hours arguing about what “tiny” means. Is it a hundred milliwatts, is it single milliwatts, is it a single watt? Hours and hours talking about how tiny is tiny.
And I think in the end we gave up and said people will know what tiny is. So I do think the benchmarks you’re talking about will be needed as we grow beyond tiny, but I think the ecosystem is on the right track in that a lot of the tinyML work reflects the status of what’s in silicon today and what people can actually take and benchmark.
And as those platforms improve in their capabilities, we will see the benchmark suites advance and expand in scope, following that trend.
Nandan: That’s a fair point, right? So effectively the benchmarks, or the consortia working towards them, will follow that trend. And extending that to the ecosystem side:
the challenge with edge AI is that it’s extremely fragmented. Do you see enough of a push in the industry for standardization to help with that? And if so, what are the key things that need to be solved to at least establish that baseline?
Ian: Yeah, that’s the million-dollar question. I completely agree that standardization around edge AI is something that is holding back the ecosystem. At the end of the day, it’s worth thinking about what good looks like, right? I think what good looks like is a data scientist using JAX or PyTorch in AWS or some other environment to develop their model.
They have some sort of automated or semi-automated tool that says, yes, this will run on your edge platform, or no, it won’t, make this change and then it will. That’s what good looks like. Actually building to that point is really the challenge for the community. And I think there’s huge demand, and because there’s demand, the gaps between what good looks like and where we are today will be filled by different companies in the ecosystem.
And I think leadership from companies like Arm and others, coming out and saying, hey, you should support these operators, these are the frameworks you should support, this is where you should put your effort, that leadership, coupled with natural development and entrepreneurship to plug those gaps,
will get us there. But again, I think that’s another one of those things that will take a while.
Nandan: And in fact that was my lead-in to the point that as soon as you get to the edge, especially in the embedded world, it’s about trying to find best of breed for specialized implementations, right?
You have your own sets of products for processing of all types, but as we’ve seen in some of our engagements, pairing, let’s say, a neural processor from BrainChip alongside a Cortex-M85 with its vector engines can create a pretty compelling use case as well. Arm has always been a leader of an ecosystem, providing technology as well as partnering and collaborating.
As you look at embedded in particular, IoT and the AIoT, or artificial intelligence of things, do you see this trend consolidating or actually growing in terms of the diversity of offerings?
Ian: I think it’s like any other new space, where we’re seeing an explosion in offerings and then we’ll see consolidation and convergence. Which part of that cycle we’re on now, the upslope or the downslope, I’m not sure, but it’s all a cycle and it’ll start again. And there are things that Arm is doing. We’ve been working for several years on a project called TOSA, the Tensor Operator Set Architecture, which is a stable layer; it’s actually a tensor dialect within MLIR (Multi-Level Intermediate Representation), a compiler infrastructure designed to accelerate the development of machine learning frameworks and applications.
So it’s a spec-based definition of the operators we think are important, and others are free to implement that specification in their NPUs. If the ecosystem can consolidate around standard layers like that, I think it will help a lot in terms of providing stability for edge AI, but it’ll take time for these things to develop.
Nandan: I think time has been running fast, so I wanted to ask you a more lighthearted, or arguably deeper, question, depending on how you take it.
Do you see a singularity in our lifetime? Do we get to AGI by 2050?
Ian: I do think we will get to AGI. I’m one of the folks who believe we will, and I’m a techno-optimist. There are two views of the future: there’s the dystopian Blade Runner view, and then there’s the Star Trek view, where there are replicators and it’s a world of abundance. I’m in that optimist camp, the Star Trek view of the future, so I’m looking forward to it.
Nandan: And for those listening who may not know, the term AGI stands for Artificial General Intelligence, and I’m going to toss the baton back to Ian to define it.
Ian: That’s a good question. One of the great questions is, once you’ve built such a thing, how do you actually know it’s AGI? But the idea is an artificial system that generalizes across many different tasks as well as, if not better than, a human,
and is indistinguishable from a human from a capability and generalizability perspective.
Nandan: Cool. I think that is a pretty positive note for those worried about the dystopian side of artificial intelligence, so I will take that as a plus. Thank you very much, Ian, for coming on, and I look forward to both collaborating and seeing what you guys do.
Ian: Thank you, Nandan. It was great talking to you. Thanks a lot.
Moderator: Thanks for listening to the BrainChip Podcast. Please remember to rate and review on your favorite podcast platform.