Google's Gemma 4 is a locally-run AI model that uses a mixture of experts (MoE) architecture, where only a subset of parameters (3.8 billion out of 25.2 billion total) are activated during inference, making it significantly more efficient than traditional dense models. This optimization allows the model to achieve performance comparable to cloud-based models like Claude and ChatGPT while running locally on personal computers, eliminating ongoing subscription costs. The model supports 140 languages, multimodal reasoning (text, image, audio), and can be fine-tuned on custom datasets. It is released under the Apache 2.0 license, enabling commercial use. The model can be deployed using tools like Ollama, which provides a command-line interface and web UI for interacting with the local AI model.
Run Google's Gemma 4 Locally: Better than Claude for Coding?
Hi, it's Al here from Al's Geek Lab. I hope you're doing great today. Bit of a different video than usual. I thought I would look into the AI stuff which is going around at the moment. And AI, obviously, we've got everywhere. I've been using Claude. I've been using OpenAI's ChatGPT, of course. And various other AI tools.
I guess it's kind of this thing. If you're in a situation where you work in IT and you haven't been using AI, I guess I would start to feel a little bit worried around now, because before long you'll need to use AI to be part of the job in IT at all.
Like, it's going to get to that point where that will become the reality soon.
If you're a developer, my reckoning is that's happening already. If you're not a developer, things will be different depending on what sort of role you're in. But I reckon that no matter what you do in IT, unless you're running network cable, I guess, or working in a data center doing manual kind of labor, AI is going to affect your role to some extent whether you like it or not. And a lot of people would say that's kind of hard to swallow right now. A lot of people are saying, oh, there's this big AI bubble going on. And I think yes, to some extent there's probably a bit of a bubble going on, that people are just going crazy with it. I'm not one of those people. However, I find myself increasingly having to use AI as part of my day-to-day because there are parts of it which are really helpful. They are definitely force multipliers. I mean, this video for example: none of it is scripted. None of it is using AI. And I feel like people can really tell when something is AI and it's scripted and it's nonsense and it's slop. People have got used to that term, AI slop, and, you know, people can tell when something is genuine and when something is nonsense. Most people, anyway. And so I think that's important.
Anyway, that's not what today's video is about, cuz I'm not going to have that big discussion about AI. What I am going to talk about is a bit of a change in what's happening right now. So a lot of people, most of us, have been using things like ChatGPT or Claude or another model, basically.
Most of us are using those. And the big difference with those is that they go out to some cloud server and you pay generally to use that service. It's on somebody else's computer out there on the internet, the cloud, and then you ask it a question or you ask it to create an image or whatever it is you're wanting it to do. It returns to you with that answer and you pay consumption costs for that. Fair enough, you know, you're using somebody else's computer.
There have been a few models, like Llama by Meta, the people behind Facebook, for example, who have had an open source model for a wee while now. There are other models out there that are open source as well. And look, generally they've been goodish, and it meant that you can download the model onto your own computer, but they all work in the same sort of way. And I'm not going to get technical about this cuz I have no idea really how this works. A lot of people who are much cleverer than me know how this works. But basically, the models that have been around until now, including ChatGPT, including Claude, they're all using the same sort of methodology.
So when you type in a word, for example, any word, and then you keep typing your sentences, you just keep on typing.
Every single word that you type in has to be analyzed by this big AI model, this large language model. And so, if you think about that, it has to go through the large language model, which is billions of permutations or words inside this model. And it has to spin those cogs and think about what the right answer is. So, if you imagine all of that input, say you've put in 100 words or tokens, and it goes off and it tries the combinations against this database, I guess, of other words and goes, "Right, is this the right answer?" That's kind of how it works. So, if you think about autocorrect or Clippy, it's like that but on a massive, massive scale, right? It's trying to give you the answer back. So, it does all that, and that's very computationally hard, right? There's so much math involved in going, "Here's billions and billions of words and permutations," and getting that down to something which is probably feasible for you. Right? So, it's doing that. It costs a lot of compute, GPUs in this case as well. It's using all of that raw power and then returning to you, you know, a couple of sentences, maybe a paragraph or two.
So, that's all fair and good. What Google have come up with is basically a way of saying, "Right, you might have typed in 100 words, but actually out of that 100 words, I only need 10 of them to make my decision about what to pick out of the large language model." So, I think the model at the moment is, I think it's like 40 billion. There's a few models. Gemma 4 31 billion, right? And then there's 26 billion, and then there's a few other models, different sizes. In fact, this is them here, I think. So, 31 billion, 26 billion, E4B-it, and so forth. These are the smaller ones. And in fact, you can use those ones on your mobile phone. So, you can actually download a large language model onto your mobile phone and ask it to do things. It'll even do useful things, agentic things, like, you know, if you want it to switch your torch on on your mobile phone or open your email, it will do that sort of stuff, and that's pretty helpful. The bigger models over here, 26 billion and 31 billion, the performance of these, if you look at where it's at, it's up here, right? It's close to, you know, the top performing models on the internet now.
So, Gemini, GLM, Qwen, right?
DeepSeek. It's better than the DeepSeek 3.2 expert thinking model.
Right, so this is cutting-edge stuff, but you can run it on your own PC at home. Just think about that for a second. I'm paying for Claude, you know, the Max Pro plan at the moment, which is like, I don't know, 170 bucks a month, to help me with writing code. And it's great. I mean, it's a really useful model. It does most of the things I really need it to do.
Imagine if I didn't have to pay that money and I could run my own model at home within the restrictions of my own PC. So, they've really, really optimized the computation behind that model to give you something that will do what the big, high-power, high-compute models do, like Gemini, but also Claude and ChatGPT. So, it's optimized so well that instead of running it in a massive big GPU farm in a data center in the cloud somewhere, you can do pretty much the same on your own PC.
So, that in itself is, you know, a massive leap from where we have been in comparison to the likes of Llama and so forth, right? So, whether it's good enough now, that's what I'm going to see in this video. So, I'm literally going to do this for the first time, like, I'm just going to do it live, and I'm going to see how much I struggle and see if it's any good, and you're going to see it with me.
Regardless of whether this is as good as it claims it is, that doesn't really matter.
What is the case is that this opens up a whole new way of working for people at home, and even if this one isn't quite as good as it could be, it's easy to see where it will get to, because they just need to make a few more optimizations. So, this is majorly important, I think, for the next steps in AI, and Google have given this away for free.
Right? That's the important thing. And it's not just free as in, oh, there's some kind of scary license that sits behind it that you don't know that you can actually push on, where you can't use it commercially or something like that. No, you can. You literally can. It's using the Apache 2.0 license, which has been around for, what, decades now. It's very well known. You have completely free rein to commercialize that. You can, you know, wrap it all up in your own product. You can charge for that product. You can sell it on to somebody else. I think really the main thing about the Apache license is you have to keep the license inside your code. Like, you have to make sure that that part of it is Apache licensed, right?
So, you can see here that it's got agentic workflows, so that's actually doing things that are useful to you, you know, interacting with other apps, working on your mobile phone. Multimodal reasoning, so, you know, it says here developing applications with audio and visual understanding. You can get it to generate images and listen to your voice, not just text. It supports 140 languages. It does fine-tuning, so you can give it your own information, fine-tune it on your data sets. For example, I work in the world of cybersecurity, so I might want it to know all about the NIST Cybersecurity Framework version two.
I want it to be an expert on that particular framework, so I can just feed it the data from NIST and say, "Right, okay, you know everything about NIST now, off you go." So, it's really cool.
So, what I'm going to do now is I'm just going to step right into So, I press the download button up here.
Let's have a look and see what it says.
So, download model weights and then run, train, and deploy.
So, it says, like, you can use these. I have no idea. I mean, obviously Ollama, I understand, is a well-known open-source model. There's also Hugging Face, Kaggle, LM Studio, and then Docker. So, like, I've got absolutely no idea what I'm doing, but I know how to use Docker, you know, lots of people do. So, as you can see here, I click on the Docker one, it says Gemma 4, multimodal open AI models by Google, and it's showing you there that it is indeed Apache licensed. So, let's have a look here and see if there is a version that we can choose. I think here you can see the different tags. So, those will relate to the size of the model. And if we go back up to this image, you can see the largest models are here. So, the 26 billion model and the 31 billion model. Now, the difference between those is that they will have different requirements of your hardware. So, I've got 64 gig of RAM in this workstation that I have here. It's a fairly grunty machine, but actually a really average graphics card. So, it's an Nvidia, and it's a, what, a GeForce RTX 5060. So, not a really basic card, but unfortunately it only has 8 gig of VRAM, right? So, maybe I want to invest in a better graphics card at some point. A newish graphics card, but just not very much GPU RAM. So, I don't actually know which one I can run.
It doesn't seem like it's got a system requirements um or at least on that main page, which is kind of annoying.
Um but let's see if we can uh have a look here. Run efficiently on the edge.
Uh deploy to production enterprise.
Obviously, you can run this in the cloud. And again, that's what Google are hoping for. So, if you, you know, get it ready for production and you're like, "Okay, well, now I've got to run it on a server somewhere. I can't be running it on my own PC." So, they're hoping, I guess, that you'll choose Vertex AI, which is, you know, Google's AI system. And I guess it would all just run in the cloud seamlessly. It's easy to do. So, I guess that's their kind of plan behind that. That's how they're going to get money.
But, you don't have to use um Google Cloud. You can run it in any cloud. Um but, I guess it will just work flawlessly in the Google Cloud. You just deploy it to their system, and you don't have to even think about running it on a VM and and all the rest. So, let me just have a look here and see if there's anything about um the the requirements. I don't want to deploy it in my browser. That is something you can do. You can use WebGPU.
I don't want to do that, obviously.
So, you can see here, that's quite helpful. The E2B and the E4B show you the intended platform is a mobile device, and then the next one is mobile devices and laptops. The next one after that is the A4B, and if we go back to this one here, the A4B, which is this one here, it is very highly performant still, and is only slightly less performant. It's still, you know, it looks like about 1442 or 1441 in terms of its Elo score.
Um so, we might try that one. But, then, you know, this one here, you can actually run it across a cluster. We can have a look here in terms of um the size of the parameters. Gemma 4 A4B is um 48 gig in size.
So, I guess I'd be pushing my system right to the max. Going to have 64 gig.
And this one here is 58 gig in size.
And what I hadn't read up on was basically this page here, which talks about the different models. And it's talking here about the A4B, the 31B, and the difference between what are called dense models and mixture of experts models. Now, the ones that they're pushing at the moment, when we had a look at this here, are the mixture of experts models, and if we look back at this page, we can see the 26B A4B MoE. And dense models are effectively what we're used to. So a dense model is the classic thing that we have with ChatGPT and Gemini, the one that we usually use, and also Claude as well. So our ability here to download classic models, or dense models, is still available, that's not going away, but obviously the problem is that they have quite a taxing load and probably can't run on your desktop unless your desktop is an absolute behemoth. If we have a look at the different models and their capabilities as well, the modalities do change. You've got these two here which do it all, effectively: text, image, and audio. And this one only does text and images. So I think it's quite important to choose the model which suits your desires.
This is interesting though. When we talk about parameters here, you can see that they have total parameters in this model of 25.2 billion. All right, so that's the amount of total data that's available to it, the large language model, as far as I understand, and here's the active parameters. So when you ask it a question, it's only ever actually going to hit 3.8 billion of those 25.2 billion. That's the math, that's the ingenuity of this particular mixture of experts system. So it's basically never going to try and pull any more than 3.8 billion active parameters into your system RAM or your GPU RAM at any point in time, and that's what makes it much more feasible to run on your system. So, I'm learning as I go here, but that's what I think it's doing. Yeah, right. Okay, hope that makes sense. Just about makes sense to me.
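As a rough check on those numbers, here is a small sketch of the ratio of active to total parameters, using the 3.8 billion and 25.2 billion figures quoted above. The variable names are only for illustration. One caveat, as far as I understand it: the saving is mainly in compute per token, and the full set of weights usually still has to sit somewhere in RAM or VRAM.

```shell
# Figures quoted above for the 26B A4B mixture of experts model (in billions).
total_params_b=25.2
active_params_b=3.8

# Share of the parameters actually used for each token, as a percentage.
# LC_ALL=C keeps the decimal separator a dot regardless of locale.
active_pct=$(LC_ALL=C awk -v a="$active_params_b" -v t="$total_params_b" \
  'BEGIN { printf "%.1f", a / t * 100 }')

echo "Active per token: ${active_pct}% of all parameters"
```

So roughly one parameter in seven does any work for a given token, which is where the speed-up comes from.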
So, doing some quick back-of-a-cigarette-packet sort of math on this, just to make sure that I'm going to use the right model, which I already kind of figured out myself, but I just wanted to look at things here. I've got a Ryzen 9 9900X 12-core.
I've got DDR5 RAM and, you know, my memory usage is all good. But basically, what that tells me is that the model I want to use is that MoE, mixture of experts, model. To run any of these dense models, I'm going to need a much more powerful, not GPU, the GPU is probably fine, but the video RAM would be my limitation there, I dare say. So, I'd really need to have a bit more juice. So, I'm going to go on that basis.
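That back-of-the-packet sum can be sketched like this. It assumes the weights dominate memory use and that their size is roughly parameters times bits per weight. The 16-bit and 4-bit widths are assumptions for illustration, not figures from the Gemma pages, and a real download adds overhead on top.

```shell
# Rough weight footprint in GB: billions of parameters * bits per weight / 8.
weights_gb() {
  LC_ALL=C awk -v p="$1" -v bits="$2" 'BEGIN { printf "%.1f", p * bits / 8 }'
}

full_gb=$(weights_gb 25.2 16)    # assumed 16-bit weights
quant_gb=$(weights_gb 25.2 4)    # assumed 4-bit quantized weights

echo "26B A4B at 16-bit: ~${full_gb} GB, at 4-bit: ~${quant_gb} GB"
```

Either way the result is more than the 8 gig of VRAM on this card, so part of the model would have to live in system RAM.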
All right. So, I think I might have an idea what I'm doing.
Maybe. Maybe not. We're about to find out. All right. So, I figured out that I want to use this 26B A4B model. So, first of all, if you are like me, using an Nvidia card which has the capability of CUDA, which basically most modern ones do, then you can install the CUDA system. So, I'm not 100% sure why we're using Ollama, but as you saw earlier, when you click on the download button here, you've got a few ways to use it, right? So, these are the options. You get five options, basically. You can use Ollama or Hugging Face or whatever. So, I'm running Arch Linux here, however you say it, and there is a package in there called ollama-cuda. There's just ollama as well, of course, and of course the CUDA libraries as well. So, installing that first of all, and then once we've done that... CUDA, by the way, is just like the graphics, I mean, again, I'm probably making a total mess of this, but CUDA is, like, a rendering library from Nvidia. So, if you don't have an Nvidia card, it's not going to work, but basically it's the 3D rendering something or other, right?
Okay, so, many people out there are screaming at their screens going, "You don't know what you're talking about." I don't really, not when it comes to things like CUDA, um, cuz I've never used it before.
It looks like it installed there. Very strange that it didn't work the first time. Maybe it just needed refreshing or something like that, but in any case, with a -Syu, it installed. It said the CUDA libraries are now in /opt/cuda/bin. It says you need to source the path or restart your session. So, I'm assuming that having a new shell open is sufficient. It was quite a big download, by the way. It was like 4 gig or something. It was not small.
Now, um, what we can do on this webpage here, ollama.com/library/gemma-4, basically it's this is the Gemma 4 webpage for Ollama, right? And I think this is obviously we go back here just for a second. You can see all the different ways to install it. I was going to think that I'll install through Docker because I know Docker but I guess any one of these will do.
So, I had a look at this. Now that I've got Ollama installed through pacman there, or I guess it could be through Debian packages, through apt or something like that, right? So, there's plenty of ways to install Ollama, I'm sure. Obviously, you can just do it this way as well. Now, what I want to do is get Ollama to be enabled. I'll increase the size of this bit. systemctl enable ollama, and I'll do --now as well.
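Written out, the install and enable steps described here might look like the sketch below. It assumes Arch Linux with the ollama-cuda and cuda packages mentioned above and a systemd service called ollama. The function name is made up, and wrapping the commands in a function means nothing runs until you call it.

```shell
# Sketch for Arch Linux; package and service names are as described above.
install_ollama_arch() {
  # Full system upgrade plus the CUDA build of Ollama and the CUDA libraries.
  # --needed skips anything that is already installed and up to date.
  sudo pacman -Syu --needed ollama-cuda cuda

  # Enable the service at boot and start it immediately.
  sudo systemctl enable --now ollama
}
```

Calling install_ollama_arch in a terminal runs both steps. On other distributions the package manager line would differ.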
Okay, so that's enabled the service, and I'm going to pull the model that I want, which is obviously the Gemma 4 version that I want to pull down. So, yeah, I'm going to pull the 26B model, which is this one here. So, I think all I need to do is pull gemma4 and then you tell it which model you want. So, I think in this case it's 26B.
And again, I think we have a look down here somewhere on this page.
E4B 26B. Yeah, so mixture of experts for workstation models. That's basically where it is. You can see all the different models here. So, I'm going to do that.
See what it says. Looks like it's pulling what looks to be like a Docker container, but obviously, I'm sure that's not the case.
17 gigs worth of data. So, hopefully, I've got enough space on my hard drive.
I better just make sure that is the case. Um Uh no.
Maybe not. I need to free up some space.
Well, it is now many, many hours later.
What can I say? I've got to love Btrfs.
You know, it's just one of my favorite things ever. And snapshots on Btrfs. So, anyway, this is not a Btrfs video, but I am pulling the manifest, as you can see, once again, for the Gemma 4 26B model. And I will report back when I'm finished, but basically, the only thing that I need to do after that is effectively run it. Just kind of like with Docker: you do a docker pull, and then you do a docker run. This is an ollama pull and an ollama run. So, I'm going to do that. I'll fast forward here, and I'll come back to you in a moment. Okay. Yay, that downloaded. All right. So, back to the Ollama website, and it says here to run it. So, we will go with that. ollama run gemma4, and then we're running on the 26B, I think, was what I said. 26B, yeah.
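The pull-then-run pattern can be sketched as below. The tag gemma4:26b is an assumption pieced together from what is said in the video, so check the model page for the exact name. The function name is hypothetical.

```shell
# Assumed model tag; the real one is listed on the Ollama model page.
MODEL="gemma4:26b"

fetch_and_chat() {
  ollama pull "$MODEL"   # download the weights, much like docker pull
  ollama run "$MODEL"    # open the interactive prompt, much like docker run
}
```

Typing /bye at the interactive prompt exits the session, which is the "bye" tried a little later on.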
Here we go.
Send a message. Sweet. All right. So, if I scroll down here a bit past the benchmark information, what it recommends is to set some parameters.
So, I think this is what I'll do first of all, to set, you know, temperature, top P, top K, all the rest. So, set parameter temperature 1.0. Sweet. Okay, that was good. Set parameter top P to 0.95, and top K as well, to 64.
All right, cool.
Then I guess we could try and trigger it to think.
Uh This is where I'm not quite sure what I'm doing. So, set system prompt I don't know.
Think.
Like this. Is that what you have to do?
Like that? Okay, not sure. Um Set system message. So, now if I try and give it a message maybe that will work.
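Inside the interactive session these settings are entered as /set parameter temperature 1.0 and so on, and a system message with /set system. One way to avoid retyping them every time is an Ollama Modelfile. The sketch below writes one to a temporary directory using the values mentioned above. The model tag, the new model name gemma4-tuned, and the system message text are all assumptions for illustration.

```shell
# Scratch directory so the sketch does not touch any existing files.
workdir=$(mktemp -d)

# Modelfile: base model, the recommended sampling settings, and a system message.
cat > "$workdir/Modelfile" <<'EOF'
FROM gemma4:26b
PARAMETER temperature 1.0
PARAMETER top_p 0.95
PARAMETER top_k 64
SYSTEM """You are a helpful assistant running on a local workstation."""
EOF

# Hypothetical helper: registers the settings under a new local model name.
build_tuned_model() {
  ollama create gemma4-tuned -f "$workdir/Modelfile"
}
```

After build_tuned_model has run, ollama run gemma4-tuned would start a session with those settings already applied.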
So if I quit out of there, I'll say, "Is it bye?"
Bye.
Bye.
And then run this, but give it a prompt. So say, "Hello world."
Like that. Will that work?
Hey, it works. It works.
So, I guess I could give it a prompt as well. Um Tell me uh reason for life on Earth.
And then maybe I could then pipe that back.
Prompt. Like that. Does that work?
Oh, there is no single objective answer.
Look at this.
This is using the local machine. It is not uh online. It is using the LLM based from the system here. So, I could unplug my ethernet and it should be completely offline.
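Because everything is served from the local machine, the same prompt can also go to Ollama's HTTP API on localhost instead of the interactive prompt. This is a minimal sketch, assuming the default port 11434 and the assumed tag gemma4:26b. The function name is made up, and nothing is sent until it is called.

```shell
OLLAMA_URL="http://127.0.0.1:11434"   # Ollama's default local address
MODEL="gemma4:26b"                    # assumed model tag
PROMPT="Tell me a reason for life on Earth."

# JSON body for the /api/generate endpoint.
# "stream": false asks for a single reply instead of a token stream.
payload=$(printf '{"model": "%s", "prompt": "%s", "stream": false}' "$MODEL" "$PROMPT")

ask_local_model() {
  curl -s "$OLLAMA_URL/api/generate" -d "$payload"
}
```

Running ask_local_model with the network cable unplugged should still return an answer, since the request never leaves the machine.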
Um so, yeah, I mean, obviously, it will not just run in the in the command line like this. Um there is, you know, there's a web interface you could give this. So, this here docker container.
Let's have a look.
This might need internet access.
But, let's have a look at this. See if we can connect into this one. This should run on the localhost. So, stick it on port, what, 3000, I believe I set it there.
Yep, cool.
All right. Now, that should be running.
Yep, looks like it's up. Running on port 3000. Yeah, it should be. Oh.
It It It came alive.
Okay, get started. Open WebUI. Right, so this is a container that creates basically a ChatGPT-style interface for any large language model, not just Gemma 4. Let's have a look. Let's see.
So, yeah, first thing to do is set up the connection to to the model, and then I guess you run it from there. So, I'll figure that out and then get back to you. So, I thought, well, I'm using my own AI model, why don't I ask it how to do the job, right? Like, that seems seemed pretty obvious at the time.
So, I did. And it's given me some steps, which is really cool. So, it told me to edit the systemd unit for Ollama.
So, in Linux, I run Ollama as a system service. Yes, I do. Under the service section, I need to tell it to listen for all connections rather than just on 127.0.0.1.
So, that makes kind of sense.
Environment equals So, I guess this is a new line I want here. Edits below this comment will be discarded.
So, does that mean I want to put them in here?
And then Okay.
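The edit being described is a systemd drop-in that sets OLLAMA_HOST. When it is done through systemctl edit ollama, the new lines go above the "Edits below this comment will be discarded" marker. The sketch below does the same thing non-interactively. The file name override.conf and the function name are assumptions, and listening on 0.0.0.0 makes the API reachable from other machines on the network too.

```shell
# Drop-in contents: listen on every interface, not only 127.0.0.1.
dropin='[Service]
Environment="OLLAMA_HOST=0.0.0.0"'

apply_ollama_dropin() {
  dir=/etc/systemd/system/ollama.service.d   # standard drop-in directory for the unit
  sudo mkdir -p "$dir"
  printf '%s\n' "$dropin" | sudo tee "$dir/override.conf" >/dev/null
  sudo systemctl daemon-reload      # pick up the new drop-in
  sudo systemctl restart ollama     # restart so the setting takes effect
}
```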
So, I want Now, I want to tell Docker how to find the host machine. Probably want to stop that Docker container.
Okay.
There we go.
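For reference, starting the Open WebUI container so that it can find Ollama on the host might look like the sketch below. It assumes the image published by the Open WebUI project, port 3000 as used in the video, and Ollama on its default port 11434. The exact command run on screen is not shown, so treat this as one plausible version.

```shell
start_open_webui() {
  # -p 3000:8080     serve the web interface on http://localhost:3000
  # --add-host       lets the container reach the host as host.docker.internal
  # OLLAMA_BASE_URL  tells Open WebUI where the Ollama API is listening
  # -v               keeps chats and settings in a named volume
  docker run -d \
    --name open-webui \
    -p 3000:8080 \
    --add-host=host.docker.internal:host-gateway \
    -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
    -v open-webui:/app/backend/data \
    ghcr.io/open-webui/open-webui:main
}
```

Once the container is up, the model should appear under the connection settings in the web interface.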
Now, what's the last thing here?
Settings and connection. Look, there it's got a model. It has a model. So, does that mean I can just Any second now.
Yes.
Excellent.
And that's it. It looks like it worked.
I can't believe it. I am now running a local LLM which has the ability to chat back to me and do things like this. This is absolutely insane. I have hardly anything else running, and I thought I'd bring over btop here. I've been running it in the background whilst this has been going on. I just wanted to see how much RAM's being used. So, RAM actually is not much. You can see that overall, I mean, I say not much.
It's still using 15 gigs worth of memory, by far the most expensive thing running on the system, and I'm pretty much running nothing else but this. This is it. This is the whole thing. I'm running OBS Studio to record this video.
But yeah, memory-wise, of the 60-odd gigs available to the system, I'm using 22 gigs at the moment, 15 of those exclusively going to Ollama. And then obviously GPU-wise, it is running a little bit here. GPU 12, 13%, but then 7.3 gigs of the video RAM are being allocated to that already.
So, like, basically everything's going to Ollama. So, that's probably why I'm seeing potentially a little bit of instability. I noticed, for example, OBS Studio decided that it had had enough and couldn't record any more video. So, obviously when it's doing stuff, it's really churning through. Let's ask another question. Now, this is going to fail. It's not going to be happy with "What's the weather?"
It will fail, but then immediately you can see the GPU spike up. I mean, 3 seconds give me an answer which is absolutely valid for an offline model.
Um See if it says anything about whether it has access to the internet. I don't know what um code interpreter um integration is. I haven't played with any of this yet, but it's very very interesting. So, it's it's having a good old think about this one. CPU, GPU um reasonably high. So, 51% out of all these cores is burning away at the moment. GPU's not that big, but the memory obviously pretty full on.
But, how cool is this?
There you go.
It does not have direct access to the internet. It has a massive data set of information that it was trained on. No real-time browsing, a knowledge cutoff, and no live data. So, that's absolutely understandable and is exactly what I expected. I did have a look into this. So, in terms of this model that we have here on this system, the mixture of experts one, when it says supported modalities text and image, I had a look into that. So, when it says text, obviously you can see text is, you know, streaming, doing all the things. It's great. But image actually means it's a one-way thing. So, what you can do, as you can see here, is give it a page of a document: "Can you interpret what it is and summarize it for me?" It will take that JPEG or PNG or PDF or whatever, look at it, and then summarize it or do whatever you want with it. It will not give you image output like, you know, I don't know, Stable Diffusion or whatever. So, that's the difference. I think these guys up here, some of them do image generation as well. But it's just amazing that we can do this today. So, I'm going to see if I can turn this into, you know, a developer, and see what happens. I did notice up at the top of this page, for example, you can launch an application.
Launch Claude, model Gemma 4. I don't know what that does. I've got absolutely no idea, but here we go, Ollama.
Launch Claude, model Gemma 4.
Insane. No idea what it's doing now, but I'm assuming what it means is that the actual Claude Code application can then work with the Gemma 4 model as opposed to a Claude model. And so, you can decide, okay, I want to work today with the local model, not burn through my credits, and then maybe later on you could switch your model back to, you know, Opus 4.7 or whatever, and choose between the two.
I'll play with that and see how that goes. And of course, you can do the same for OpenCode, Codex by OpenAI, and the very dodgy OpenClaw.
But, you know, I'm scratching the surface here. You can see that I've This is I'm doing this kind of live. I'm just figuring it out as we go along. And I sometimes I like doing that those sort of videos where I'm just kind of showing the reality of things. I don't know what I'm doing. I'm just getting to grips with learning these things myself. But, hopefully that gives you an insight to the reality of it rather than um, you know, these shiny polished videos where you see people going, "Oh, yeah, you just do this and it's amazing and it's done." It's not like that.
Sometimes it takes time to figure things out, and what I've figured out is that it's very doable. It's not that hard. I really think that this webpage, the Gemma 4 webpage, is completely useless.
I have no idea how to install it using the download button on this webpage, like, absolutely nothing. It was only after reading up on the Ollama page that I saw basically Ollama seems to be the way that you get it working. Like, using Ollama is the way. So, just before I finish up this video, I wanted to show you what I used to get this little game running here. Obviously, I used Ollama with Gemma 4, and you can see this game working perfectly. In a second, I'm going to show you what happened when I used the Gemma 4 model with Claude Code.
It was a disaster as you can see.
My goodness.
There is a Pong game of sorts.
I don't really know how to play it.
Is there a No, there's a ball over there.
Uh I can't press anything.
I can see that you can move inwards and outwards, but I can't move up and down.
But this here is a perfectly functioning version of bat and ball, or Pong, or whatever you call it. And this was using a piece of software, open source again of course, called Aider, and it works just like Claude Code does. If you're familiar with Claude Code, it works at the terminal. You're not, you know, in an IDE or anything like that, but I guess there should be a way to plug it into an IDE if that's the way you run.
So, two things you need, which you probably have already, but in your .bashrc or equivalent, .zshrc or something, you need to tell it where your Ollama API base is, which is probably the same as this here. So, OLLAMA_API_BASE, and then pointing it to the localhost on the port that it's listening on. In this case, it was 11434.
And then Aider is the name of the software, which I'll show you in a second, but basically you just need to tell it which model to go to, otherwise it's got no idea which model to go to. But basically, yeah, once you start it up in a directory, you can tell it, "Make me a simple breakout game like the original Atari 2600 version, but for the web browser."
And then, yeah, it basically gets to work doing it. It's pretty cool. So, I'll just take you over to Aider on the web here just to show you the website. So, it's aider.chat, and you can see there an example of it working away in this sort of preview. And once it spins up, just bear in mind that the first time it runs, it's really, really slow.
It takes a long time to sort of, I don't know, just figure itself out. So, that first pass, when you give it that prompt, it's going to feel like forever, really, really sluggish. You can see here it's not that sluggish because I've already used it to make that Pong game. But yeah, really simple to get started with. You just, I think I installed it with pipx, or, yeah, pip. Yeah, you can see there.
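Those two pieces of setup might look like this in a shell profile. The address and port are the ones mentioned above. The model tag is an assumption, the ollama_chat/ prefix is how Aider addresses models served by Ollama, and the function name is made up.

```shell
# Goes in ~/.bashrc or ~/.zshrc: where Aider should look for the Ollama API.
export OLLAMA_API_BASE="http://127.0.0.1:11434"

# Assumed model tag, prefixed so Aider routes requests to Ollama.
AIDER_MODEL="ollama_chat/gemma4:26b"

# Hypothetical helper: start Aider in the current project directory.
start_aider() {
  aider --model "$AIDER_MODEL"
}
```

Running start_aider inside a project directory opens the Aider prompt, where a request like the breakout game one can be typed in.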
So, very very cool.
As I say, I did try with Claude Code through the Ollama example that was on the website, and it was pretty much a total disaster. It felt like going back to GPT-3.
It seemed to forget everything, so you could give it one instruction like make me a breakout game. It would go off and kind of make it, but it was pretty bad.
You'd have to copy and paste the source code into your text editor and save it.
Um and then the game had bugs in it.
Really didn't work at all. And then, you know, if you say, "How could you fix up that code for me?" It's like, "What code?"
So, it really didn't work very well.
But, as you can see this here is now trying to build up a game whilst it runs away, and I guess it will do soon. Um it's thinking about things a lot there.
Um but, that's it. It's really really very effective at um writing code, and you know, it seems to work very similar uh in in the gist of things to Claude code or Codex or something like that.
So, if you're wanting something to give you that experience, but using the Gemma 4 model, then check Aider out.
I hope this video has been helpful to you. Um it's been helpful to me.
I'll catch you on the next one. Until then, be excellent to each other, and thanks for watching. See you. Bye.