This video clearly demonstrates that specialized hardware is only as good as the software stack supporting it, making optimization the true differentiator in AI performance. It's a sobering reminder that raw silicon remains a bottleneck until the code is properly tuned to unlock its potential.
Deep Dive
CPU vs NPU: The Mind-Blowing Difference when properly configured
Okay, so I thought I'd do a follow-up video on the SpacemiT K3 RISC-V mini PC because the NPU on this, so the processor that deals with AI, is incredible. But you do need to be running the right optimized LLM to be able to get the performance. A bit like running a game and having the latest GPU drivers, the NPU is only used if the software is asking it to be used. And I've tried to run LLMs on Raspberry Pi.
But I don't think I've ever had it properly running the NPU, especially in the way that this works. So, this is running a much bigger large language model than I've run before, 30 billion parameters. I was running a 1.5 billion parameter model in this DeepSeek video on Raspberry Pi without the NPU.
And I've got the Pi AI hat, too, which is in this video. And it definitely took the load off the Raspberry Pi. But as in other people's tests, the Pi was pretty much as powerful as the NPU in running a large language model. But I wonder if it's more about how well it's configured. Let me explain it by closing this down.
Uh and if I open the terminal, I'll just run Ollama.
So, let's start it.
Let's get Htop running.
And we'll run a small model. So, I downloaded a smaller Gemma 3 4B just to check it was working.
And I'll show you what happens in real time.
So, you can see it booting up.
And we'll wait for it to be ready. So, it's ready now.
So, if I say hi, you'll see that the responses are very slow. And if you have a look at Htop, it's maxing out the CPU, but the NPU, which are these eight cores, isn't being used at all.
And the RAM is only at 5.07 gig. And you see, it's not even answered that question yet.
And this is on a 60 TOPS machine, but it's not using the NPU, so that doesn't really count.
There you go. See, and I'm leaving this bit in real time, so it's just about managed to say hi there. And it's obviously thinking about what else it's going to say to me.
I'll let it go for a little bit longer.
It almost feels like it's not working, though. So, again, 100% on the CPU. Oh, the exclamation mark has arrived now.
So, we can see how bad that is. Uh and if I do control C just to stop that, and I'm actually going to restart cuz I don't know what bits will still keep running.
But before I do that, let's see how many parameters Gemma 3 4B has.
So, 4 billion compared to 30 billion.
Actually, after I got it running, I found this video, and there's some really good information on the LLM and how you have to use their version, which is exactly my experience. So, if you use the SpacemiT version, it's just so fast. So, I'm going to restart.
So, I've got some notes here of how I got it to run.
So, I used Gemini to install it and to find the optimized version, and it worked. Uh it gave me the wrong instruction to launch it first of all, but I put the error into it, and it came out with the right one. Uh but we need to CD into the folder where it's downloaded to. And you can see it was downloaded here.
So, the Qwen 3 30 billion.
So, let's have another terminal.
I'm going to the folder, and we'll launch that from the folder. Let's just get rid of this.
So, the model takes longer to load.
Notice that the CPU usage is really moderate at the moment. The memory though, you can see it filling up.
And that's because it's putting the LLM into RAM and it's going to run it from RAM. So, that's obviously a massive performance boost.
But also, it's going to start using the NPU once it switches over to RAM.
So, again, the initial startup does take longer and I guess I'll leave this in real time. It'll get up to about 20 gig in the RAM.
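As a rough sanity check on that 20 gig figure, the memory needed for a model's weights can be estimated from the parameter count and the number of bits stored per weight. This is only a sketch: the bit widths below are assumptions for illustration, since the video does not say which quantization the optimized model uses, and real usage also includes context cache and runtime overhead.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def implied_bits_per_weight(params_billion: float, memory_gb: float) -> float:
    """Work backwards: how many bits per weight would fill memory_gb?"""
    return memory_gb * 8 / params_billion


if __name__ == "__main__":
    # Assumed bit widths, not taken from the video.
    for bits in (16, 8, 4):
        print(f"30B at {bits}-bit: {weight_memory_gb(30, bits):.1f} GB")
    # The roughly 20 GB seen in Htop, if it were all weights:
    print(f"implied bits per weight: {implied_bits_per_weight(30, 20):.2f}")
```

Under these assumptions, a 30 billion parameter model at 16 bits per weight would need about 60 GB, so a footprint near 20 GB suggests the weights are stored at a much lower precision.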
And we can see, if we look at temperature, the CPU usage isn't super high while it's doing this bit.
Uh and so, temperature-wise, what have we got? A maximum of 63° at the moment. Hasn't gone over that.
The fan hasn't really started running uh on the SpacemiT. I mean, it probably is. Yeah, it says it's running at 2,300 RPM.
Be interesting to see how fast it runs later on when the NPU is properly running. Right, so you can see it's launched.
And if you have a look at the memory, so you can see the memory usage is high.
If I start to ask it a question, so let's just say hi like we did with the other one.
It's just instant, and you can see the NPU is the bit that's working now.
So, it's not using very much of the CPU.
The RAM usage is very high because it's loaded it into there. And when you think about how long the other model took, which was a fraction of the size of this 30 billion parameter model, uh this is just instant.
I'm trying to think of the question I asked the other day. I put this question, "What lightweight 80s mountain bikes were around in the late 80s?" Which I think is a weird question to ask a local LLM because it hasn't got access to the internet, but you'll see how quick it is, it's just instant.
And I haven't had an LLM work like this before.
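To put numbers on "very slow" versus "instant", one option is to time the stream of tokens as it arrives. The sketch below is a generic helper, not something shown in the video: `measure_stream` and `fake_stream` are hypothetical names, and the fake generator only stands in for whatever streaming output a local LLM runtime provides.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_stream(tokens: Iterable[str]) -> Tuple[float, float, int]:
    """Return (seconds to first token, tokens per second, token count)."""
    start = time.perf_counter()
    first_token_s = None
    count = 0
    for _ in tokens:
        if first_token_s is None:
            first_token_s = time.perf_counter() - start
        count += 1
    total_s = time.perf_counter() - start
    rate = count / total_s if total_s > 0 else 0.0
    # If nothing arrived, report the total wait as the time to first token.
    return (first_token_s if first_token_s is not None else total_s), rate, count


def fake_stream(n: int, delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a model's streamed output."""
    for i in range(n):
        time.sleep(delay_s)
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, rate, count = measure_stream(fake_stream(5, 0.01))
    print(f"first token after {ttft:.3f}s, {rate:.1f} tokens/s over {count} tokens")
```

Running the same prompt through the CPU-only model and the NPU-optimized one, and comparing the two sets of numbers, would give a fairer comparison than watching the screen.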
So, the user is asking about lightweight mountain bikes from the late 80s. Let me start by recalling what I know about 80s mountain bikes. So, remember that the 80s were a time when mountain biking was becoming popular. And it goes on, but it starts to mention manufacturers, Mongoose, Scott. Uh I think in the other one it mentioned Specialized. And then they're talking about the type of weight that we'd be looking at, 25 to 30 lb.
And it just goes on. And it does such a good job considering it's a local language model.
I mean, I guess it's more for different questions than this. And then it's getting specific, so the Trek 8000 series comes to mind.
Giant TCR. And then it thinks that the Giant was a road bike.
Specialized Stumpjumper would be a good call.
And so, yeah, it's just really, really impressive. But it really shows how the model needs to be optimized for the NPU.
Otherwise, the NPU isn't going to be used. And you can see it's using about 80% of it. The temperature isn't high.
And if I have a look at my iPad, so it's currently using 20 W of power.
While it's working all this out. So, it's not going to get that hot.
So, it's not really putting a strain on the machine.
And it's kind of shown me that NPUs can be really good. I mean, I have liked the Raspberry Pi NPU with image recognition.
It certainly did improve that. And it was really fast and reactive. But I was also a bit disappointed at my Samsung laptop with a smashed screen, which has got one of the Snapdragon processors, one of the top end ones.
And loads of things just don't support the NPU. It's just sitting there waiting for something to ask for it to work.
But the software needs to ask for the NPU to be used. And when it is properly used like this, it just blew me away. I couldn't believe how fast it was. And it's still going as you can see. If I go full screen on this, look at how much information it's giving me. I should go through it really.
Look at 26-in wheels.
So, it knows little details like that.
So, Trek 820, Trek 830.
Stumpjumper. And it's a good summary, look.
So, I'm very, very impressed.
But if I stop that and go back to the browser. Hailo is the maker of the NPU for the Raspberry Pi.
And the 10H is the latest one I've got.
And I did see things about large language models being supported on it. I never found it for the 10H.
I always found things to do with cameras and image recognition. And if you ask AI, it will tell you that it's there.
But I've never found one that's properly optimized. And I'd really like to try.
So, if anybody knows of one that is, I'll have another look and see if I can find something on there. But I just couldn't find it. But knowing that the Raspberry Pi is 40 TOPS, this is 60 TOPS, it means that the Raspberry Pi could get closer to this. Obviously, the 16 gig is the highest RAM on a Raspberry Pi. But then the Hailo 10H has its own RAM on it. So, it's got 8 gig of dedicated RAM. But then if I say, "What LLMs are compatible?" So, I'm using online now. Depends entirely on whether you're using the native CPU or offloading the workloads to the AI hat.
Now, I've not managed this before, but I do need to try that. And it says that there are some 1.5 billion to 2 billion parameter models from the Hailo AI model zoo. But they're tiny compared to this 30 billion one. And because I'm quite new to AI, I hadn't realized it was going to make so much of a difference. So, super impressed with this. Really going to keep playing around with it. And I've now seen the value of an NPU. So, hopefully this helps. Thanks very much for watching.
Please like and subscribe.