This course provides a pragmatic reality check on the hardware costs of local AI while offering a strategic bridge to cloud-based sovereignty. It is an essential blueprint for developers looking to reclaim their coding workflow from proprietary ecosystems.
Deep Dive
Prerequisite Knowledge
- No data available.
Where to go next
- No data available.
Deep Dive
Open Models Coding Essentials – Running LLMs Locally and in the Cloud CourseAdded:
Hey, this is Andrew Brown, your favorite tech instructor, bringing you another certification course here on free code camp. And this time it is the open models coding essentials, also known as the exp open model code 01. Uh, and the things that I want you to learn in this is to work with open models as we've already covered in other courses, every other provider. So the open stuff is just as important. So models like Gemma, GLM, Kimmy, and Quen, but we need to run them in somewhere like coding harnesses.
So we'll use ones we already know like cloud code and codeex but also ones that you may have not explored as there are a lot. So we will see what will work best for us. We need to run these models somewhere. So we will run them locally as well as in the cloud. And if you love content like this the best way to support it especially for the examro content is to purchase the uh optional certification and paid materials at exampro.co.
Um, and also remember to support your publisher free coamp to help deliver this free content. Uh, if you do not know me, um, I was previously CTO for over a decade and then I transitioned over as a tech educator here with over 50 tech courses covering basically everything. Um, and we're going to keep making courses there. But hopefully, uh, you like what you see and you'll continue on through here and I'll see you soon. Cha chia.
Hey folks, it's Andrew and before we jump into the videos, I just want to talk about why uh I wanted to make this course and what information that I feel that it's revealed at least for me.
Okay, so I've known that there's been multiple uh different coding harnesses and there's these Chinese models that supposedly can perform as well as cloud code or codecs or things like that. Um, but you know, I just wanted to see, you know, can we uh download these models, run them locally, run them in the cloud, do they perform well, and you know, where are the hard edges? Um, we're basically just building smoke tests here. So, we're building Flappy Bird apps. We're not doing anything exhaustive. If we really wanted to test these things, I would have to build out a um what we call a uh eval harness. So, the idea is that we'd make a list of things and test them against that. Didn't have time for that.
In fact, I couldn't really find one.
There's one on uh GitHub, but it's just not going to work very well and apply the stuff. And so, the frustrating part is that, you know, there's all these things and they say they're great and our tools are great, but there's no benchmark to to put them against. And I mean like practical benchmark. I don't mean like the the AI benchmark where they talk about intelligence of models and things like that. Um, so just understand that, you know, we're just getting our feet wet here and we're not going to have deep deep answers here for you. But let's take a look at what answers I do have for you here. So, first thing is that Gemma 4 had recently uh came out and I heard that it had a a small memory footprint and I have an RTX 4060. That's what Beo uh was allowing me to get. And so I thought, okay, maybe there's a model that I can actually run that has a good, um, output of tokens per second that could be used in a coding harness. Um, and surprisingly, uh, you know, Gemma 4 really does have a small, uh, memory footprint, and it actually does work, uh, quite decently as a coding harness. Surprisingly, not as good as, um, the other ones that specialize in it, but it did extremely well. Uh the only issue that I found out was that I to in order to really use it for local models, I would need a 32,000 context window and I would need something like 24 to 32 uh gigabytes of VRAM and I only have 8 gigabytes of VRAM and so it might be possible to run this locally uh if I had the right hardware, but I don't have the right hardware and so I couldn't really test locally um those capabilities. And I can run Quen on my uh computer, but it uses almost all of my VRAMm and its context window is not large enough. But if you don't have a 32 kil uh or 32,000 um size um context window with Gemma 4, it's just not going to work. It's just not going to work locally. And so I was out of options there. And then I had to quickly switch over to cloud models. Um but yeah, were there any models that I could actually run to drive code for me?
No.
Okay. What would it take? I don't know.
Uh it might take a uh what do they call it? It's the Apple Max Apple Max Metal 3 Studio or whatever. And you might need two of those. Um you might need two DJX Sparks on your desk. Um or you might need a combination of those things and one does one and one does the other. Or you need a custom rig. But everyone that is is has these things sitting on their desk and working sounds like they're spending $10,000. So BO is not going to give me $10,000. Maybe I know people buy the the this this content for this course. I can go buy one and evaluate it for us here, but it's just not in the cards here today. Uh which coding harness can we use to run open models? A lot surprisingly and uh models uh uh sorry coding harnesses that I did not expect that we could use like claude code and it worked really really well.
Okay, so that is something that's really interesting. The real question is like if you know that cloud code can run uh models from Ola Mloud, should you even bother with these other ones? And for me, the answer is probably not unless you care about seeing how the code works like uh uh PI uh coding agent or you need to adopt it across your org. So maybe you're using you want to use goose. The other ones that were like backed by startups like Kilo and Open Code and um uh Factory Droid I probably would never touch it like and they're they're nice but I just don't think there's any value in them to be honest.
I would probably just stick with Claude Code um or fiddle around with PI coding agent or write my own for fun which I probably will make my own coding harness just for learning um not necessarily as a as a serious contender. Uh which open models actually work as coding models?
Um, for me the best one was Kimmy 2.5.
Uh, GLM was okay. Uh, Miniax was okay.
Uh, Quen was awful. Gemma was surprisingly good for what it was. And if I if I had the hardware uh, locally, I'd probably run Gemma 4 because it's good enough. Uh, I have friends that say that they they with the right tooling, they can make some of these models work really really well. Like even Llama 3.2, but it's very specialized. So maybe I'm lacking some specialized knowledge, but they were also saying that it took them like a month to set up those workflows to get that uh that thing. So maybe the smaller models can work like Llama 3.2, but you have to put a lot of time into it and they're still using bigger models to drive the smaller models. So we don't have that time or that expertise, but maybe I'll pull in a guest instructor to help us one day. Uh if we can't run them locally, then where can we run them?
There's a lot of providers, but a lot of them are API token driven. And then the ones that are um subscription-based, they're generally with these providers and they're highly quantized and they're not very very good. Um and then the question is like do you just get them straight from China if they're that there? I couldn't figure how to subscribe to them like if and then also you're just paying for a single model and it's not really good one. So for me, Olama was the best option as you got a good range Olama cloud to be particular but you got a good range of it. the price was 2030 bucks was similar to other ones. Um, and they performed fine.
Ones that go under 20 $30 do not touch because they're highly quantized and they're not going to perform very well.
So there's no point in using them. If I had to use an API key driven one, I'd probably use open router. There's a lot of API key ones out there, but I want to do subscription. So you have something that you're more used to doing. Okay.
Uh, can we rent our own GPU compute and serve a model? I have uh service uh service quotas open for compute on AWS and other providers. I just don't have it. So I can't run them and even even if they approve my requests. Compute is so hard to get a hold of right now. I could probably not do it. There are other ones like coreweave but I have to go through sales team and I'm just it's taking forever for them to talk. I wanted to look at any scale but you still have to use ads or their own environment but they were too slow to get back to me. Uh llama is a possible option. Llama AI, but um when you go there, like it's a uh crapshoot. Sorry for the bad language, but it's a crapshoot in terms of what is available for their compute. So they might have an A100 that you can rent for at $3 an hour. Uh but then other times you can only pay $40 an hour for a large cluster. Um and so I could have made a video on that, but I would literally have to keep hitting the the roulette uh like on the reload every hour to find something, and it'd be very hard to replicate. and I don't know if it'd be worth our time, but I did want to go through that just to see what the experience was like. Um because you know at the end of the day it's like can I get this stuff or for my like if if I have the cape capeex spend at my organization or I want to have something on my desk can I do it? Um, and so, you know, hopefully that answers your questions, but um, yeah, I guess the main thing is like go through these videos, see what my experience is, and try it out for yourself. And, you know, just make sure that you have this in your back pocket as, you know, we should just keep all our paths open. Okay, chow chow.
All right, let's take a look at what possible open model selections that we have that we can work with. I'm going to break this into three categories. And really the last category is what we're going to use because there's confusion as to what you can use because there's gen generic models, there's coding optimized models, and there's ones that specifically work for coding harnesses really really well. So for general purpose models, these are good at all kinds of tasks. You've probably heard of these models before like Llama, Mistral, and Gemma. Gemma 4 um is recently uh released and it's very very good. Low memory footprint and surprisingly actually worked in most use cases. Um, it's not the best out of the models that are out here, but for one that has a very low footprint, it worked really really well. Um, it did have some issues with tool calling, but once we used the the cloud models, it had no problems with it. Uh, or some other fiddling with it. Then we uh we got it working. Okay.
Then there's uh coding optimized models.
These ones are specialized specifically for working with code. So they're fine-tuned on uh code data sets. They're better at writing uh and completing code and things like that. So this is where you have code llama, deepcoder, code stroll, gposs and quen 3.5. Technically Quen is a generic one, but it's also uh have a coding optimized one. Um and that I think that one's by Alibaba. And this one's really really popular, but I wouldn't say that it's very good anymore. Um I mean I if you want to put it into a managed tier, you can say it's like Haiku, Claude Haiku, but it has a problem where it just doesn't call tools and you have to do a lot of extra plumbing to make sure that it does call tools. and I just never got to experience this working properly and so I kind of gave up on it. Um, but people like it because it's very cost- effective and it's fast. Um, GPTOS OSS is um an open source version of GPT that's optimized and it sucks. We we did use it with um uh codeex because it's recommended but it's from last year so it's not really recommended. I did not touch code stroll. Deepseek coder is very old. Code llama is very very old.
So we're not working with those. I heard col lama is really really bad so there's no point. Then you have these uh coding harness optimized models. So these ones are tool use aare and that's the key thing here because if they're not tool use aware these things will just write out to the screen and not edit files and stuff like that and that's very frustrating when you make a large plan and then it doesn't do it right. And so we needed to reliably call tools be aware of tools. We needed to reliably create structured output, follow multi-step agentic instructions, not go off task and recover gracefully. And so the models that do that is Kimmy, GLM, Mini Max, GLM, and Quen. Quen not so great, but you know, maybe used it for years, you might disagree with me. I found Kimmy really, really, really, really, really, really good. Um, despite people saying that GLM and Miniax are better. Um, and so maybe they they just mean like when you're working with it uh uh long term and you're loading it up, some people say Kimmy would stall out, but it seemed like many times Kimmy would do better, but like maybe overall Miniax and GLM will be more consistent across the board for you. I don't know.
I'm not going deep enough uh and and creating evaluation harnesses to really really know as we're just basically doing a smoke test, building Flappy Birds, and just getting our feet wet.
Um, but I would just say that I was absolutely impressed with Kimmy and these other open models and I could absolutely see myself driving them every single day. Could they handle the most complex task that Claude Opus could handle? No. But um, for the cost and the reliability and the chance that you might be able to run this on your own hardware one day or if you have enough money to the cape the capeex cost to have this stuff sitting sitting on your desk, then it might be worth it for you or if you you know care about data sovereignty. Um but yeah the biggest challenge here was just uh open models that would invoke tool use. Okay but um yeah there we go and we will jump into it. Okay let us take a look at the possible coding harnesses that we can utilize.
Before we do I want to talk about where we are serving these models from. So, I'm going to be using Olama Cloud because it has a subscription service very very similar to your cloud subscription, your um OpenAI subscription, your Gemini subscription.
Um, if I wasn't using Olama Cloud as a subscription service, I'd probably use Open Router as I'm very familiar with that to get access via API keys. But I thought that this was a really good solution because we already use Olama uh to serve the model locally as it's the easiest way to do it. And so using Olama cloud was super super easy. Uh the subscription cost is $ 20 to $30. I never ran out. Um and so it also just integrates with everything. So I think that this is a really good option.
There's probably other subscription models out there. I just don't know about them. I'm not endorsed uh by old Lama cloud, but it just worked. Okay. Um and so it might surprise you that uh that you can use uh the main three with open models. So with Gemini CLI, you can integrate it with Gemma. However, I was not able to get this to work. it's in experimental mode with terrible documentation. Um, and so I just wasn't able to do this. I don't think you'd be able to serve it through Olama cloud.
Um, so this would be a model that would come from uh maybe Google Cloud Vert.ex AI or a local model, but I'm just having no luck with this, so I'm not even going to try. Um, codeex uh surprisingly integrates with cloud and um I think I had good use with Gemma 4. uh it recommends to use GPTO OSS which is just awfully old and terrible but it wasn't the best experience out of it. Cloud code did exceptionally well with all all the open models. The only model that ever seems to have problem is Quen uh the Quen model where half the time it doesn't call tool use. So there's not much we can do about that. Um but yeah, cloud code was just absolutely incredible and Ola integrating with these things are great. So you absolutely can just continue on using cloud code. In terms of things that aren't from the the main three providers, uh we have PI coding agent.
So this one is a purposely barebones open source coding harness made by a very opinionated uh developer and this person leaves out a lot of features like plan mode and tasks and MCP and it's because they have strong opinions that you don't need these things. I don't necessarily agree with that, but I did really like the transparency with this tool and it was very easy to open the guts of it and see what was going on with it. And so I think if I if I had to drive with one, I'd probably be uh using PI coding agent as my main driver. Um they have been uh the person that developed this was acquired by another for-profit company. And so this product is going under that company. So I don't know what will happen to this product in the future. Um but it is good and even if there's features you don't like, you can extend them, right? And it generally worked good enough. Okay. Goose CLI is the jankiest thing uh I've ever used, but it is part of the Linux Foundation.
So, you have a guarantee that it's going to be uh maintained and and properly open. If you're a large org, you're going to probably like this. Also, Goose is an entire ecosystem. So, it's not just the CLI, but they have like um a desktop app and probably like uh like an open claw, similar thing. So, when you have all those three things, you got to buy into something that's probably a good one to look at. but I couldn't get it to work with Olama. It's supposed to work with Olama cloud. It works with O Lama local. And so if I don't have a video for this, it's because I had too much pain with it and I couldn't give you a good example. Hila CLI is an entire ecosystem that is for-profit startup. Um it allows you to bring your own model, but I couldn't integrate this with Ola cloud, which was kind of sad.
Um I think that they really want you to use their um subscription service where they have their own models. Um, and so I never got to evaluate it because if I'm not using open models, I don't want to use it. And when I say open models, I mean one that I could host myself or comes from a subscription that that I choose and it's not like tied to that specific provider. Um, so I just wasn't that interested in doing that because I don't want to pay for like a 100 subscriptions here. Um, so I wasn't going to do that. Uh, Droid was interesting. you had to sign up just to use it. Where all these other ones with kilo, goose, pi, cloud code, codeex, if you're using the old models, you don't have to sign in anything. But like with factory, they want your information.
They're an entire entire ecosystem. I think there's a lot of marketing around their stuff to make things look more impressive than they are. So they have these things called droids, but they're really just agents. Um, and so I just think that they have a lot of opinionated um, scripts and skills and sub agents um, that just give you better results. But I did find that um, you know, it made it made like models like Kimmy that can already do really well um, it kind of made the results worse.
But I feel like maybe if you use Quen or these dumber models uh, that it that kind of tooling would help these less intelligent models and you probably get better spend. Their thing is they want you to buy into their hosted subscriptions inference just like Kilo.
So that's no surprise there. Open code's going to be very similar to the the the other two there. They're not an entire ecosystem. You can bring your own model.
Uh uh Llama O Lama worked perfectly with that. Um but their thing was more like they had a sleeker UI. Um but I would say that there was things I didn't like like when you had the plan mode it would stay in the plan mode and not know to switch out. They're trying to sell you on their own subscriptions. Um I know that their um subscription is cheap.
It's like five$5 dollars or $10, but their models are highly quantized. So they're not very good. So there's barely no reason to use their subscription. And so um you know even though they they all have their own subscriptions I feel like you should just use O Lama which is gonna give you the maximum flexibility to move between these because a lot of these things like like out of these the four out of four out of five are all backed by startups and then one day they're going to get acquired or they're going to vanish and you're going to lose that tool. This happened already. So like Windsurf and other companies. So that's something you should factor into.
I'd love to use goose but I just can't get it to work or I like the idea of goose but anyway you know worst case you could just use um cloud code the only challenge is that if the providers start blocking them so I know uh cloud code does not let you use their subscription everywhere else only with cloud code but they're not preventing you from using cloud for now so uh that is fine but yeah those are our choices and we will explore them all the best we can Okay.
Hey, this is Andrew. In this video, we're going to go ahead and get Oama set up. So, I already have this installed, and you can install it um with your on Windows on the Windows side or within WSL. If you're on Mac, you can obviously install it directly on Mac. Um and it works a little bit different, but I am going to install it onto WSL uh too, as that will be um the best way for me to work with it. uh they do have a cloud version. I do intend to use this at some point but I also want to use local models as I want to use local and then uh the cloud hosted ones as there's very limited things that I can do with the the compute that I have but we want to explore both options. Um and so for now we're just going to grab it for free. So if we go back to the home here there is a single line to go ahead and get installed here. And so down below here I am in WSL 2. So, Windows Subsystem Linux, it's just Ubuntu. And this one specifically is built to work with my Nvidia card. I do have an Nvidia video on how to how to create that in my Nvidia course. I might carry that over to here. And so, you might already have seen that video. Um, so if you are on Windows and you have an Nvidia graphics card, then you can follow along. If you have a different kind of graphics card, you'll have to do something. If you do not have the local compute, then you won't be able to do it. If you're on a Mac, um, then even, uh, M1s can run models. So, not very good, but you can try to use it just to experiment with to get an idea on how that works. But understand that I'm going to be using uh, Windows with an Nvidia graphics card is that's going to be very, very common.
The other method will be a Mac, but I just don't have a modern I have an M1. I don't have a a newer one, so we're kind of limited here what we can show. Okay.
But anyway, so we have this single line over here. Here, I'm going to go ahead and grab it. And I actually already have it installed, but I'm just going to go through the motions here as it will pretty much be the same. And um it looks like it didn't grab the whole line here for some reason. In fact, I don't remember there being an IRM. Uh so I'm going to go over to download. That's so bizarre. If we go to Linux, there we go.
So like Mac girl.
Oh, that's the Windows version. Okay.
And so I'm going to go ahead and grab that one here and we'll paste it in here. and it's going to go ahead and install the latest version. And this takes a little bit of time. I don't know why it does, but it takes a bit of time.
And we will hang out until it is done.
Okay. All right. Now, we are up to date.
We'll go ahead and check Olama status.
Okay. You can run into issues if you have Olama installed on your Windows side and on the WSL side. So, you might have to shut down the other one. Could also be running somewhere that you're not expecting it to be. But, just check out the version. The current one today is 20 uh uh 27. And if you do have any major issues, I would suggest to go over uh to the Olama GitHub, okay? Because you can run into issues with it. And just check out the issues tab if you get any particular errors. I think I was having issues with Gemma 4 um the other day with some particular integration.
And so I had to go over here and fiddle with something. I can't remember what it fixed. I think I just restarted my computer. But anyway, just consider that it's not always going to work perfectly.
And so the way Olama works and I've shown this in my geni essentials but we will show it again here just so that you have clear understanding is that there are models I'm trying to find here the models on here and you can pull different ones. So you have GLM 5.1 miniax gem 4 and just because you can pull them doesn't necessarily mean it's going to be able to run on your machine.
So use whatever your favorite provider is and tell them the resources you have. So, for me, I'm on a Windows machine and I have Nvidia graphics card.
So, I'm going to open up the Nvidia app here, and it will just say I have the um here in my rig, it will say that I have the Nvidia RTX 4060. That's the best that Baker would let me have. And so, I'll go here and just say like I have I have the GeForce uh RTX um 4060 with a Intel i9 14900 with 64 64 uh RAM Windows 11. What uh Olama models can I run comfortably?
Okay. And it will go ahead and do that.
Um and it might be a little bit out of date. I like I know what models I can run. Um, but basically what's important is like what can fit in the VRAMm. Okay, so whatever your VRAMm is. Um, and where it is here, I'm not exactly sure. It's always kind of a pain to find it and determine the information here. Um, but the point is is that there is uh VRAM and whatever can fit in there is what we'll be able to do. And actually, yeah, here it says 4060 VRAM and Olama model sizing. And so we'll give it a moment here to generate.
So, as it says, the standard is 8, and that's probably what I have. OLAM defaults to systems with under 24 GB of VRAM, so 4K context lengths. So, with a comfortable setup, 3 billion to 8 billion context models. Um, so anything 12 billion might be workable. And then here we run into other areas. So, looks like a 7 to 8 billion ones are going to be kind of okay for us. And if we're in a 3 to 7 billion, it's going to be a little bit nicer. So we'll go over to here um and we'll see what we have that we can work with. So um there is the GLM51. I believe that that is not going to fit.
Quen will absolutely fit the 3.5.
They are generally small. But if you go here, we are looking, let's go over here. I think it said context size.
Um but normally it's the parameters we're looking at. So here it says 8 billion parameter. Okay. Okay. And so if we pull it, it's going to pull the correct size one. It usually does. Um, and so we will go over to here and notice right off the bat we have cloud code, codeex, and other providers that we can run it in.
But I'm going to go ahead and grab this here for now. And we will go and um run.
We can just pull it if we want. So if I go ahead and just say pull, it will download that model. All right. Notice that it's 6.6 gigabytes. So these things are large. If you do not have a lot of room, you'll have to clear them out and you can periodically clear them out.
Sometimes older models get deprecated and so once they are gone, you might not be able to run them ever again. Um, so, you know, just be aware of that. You might want to save the weights somewhere if you really really rely on it. Um, but we will download this and then we'll just run it here and make sure we know how works and then we'll integrate it with something. Okay. All right. All right. So now we have that model uh set up. And so what I want to do is I want to have a way of monitoring um uh you know what's going on here with my machine. And so what I'm going to do is I'm going to open up because I need this more than once here.
I'm just going to open up another another VS Code here if it'll let me. I'll just back out here for two seconds. Um into another folder.
It doesn't let you open the same one.
I'll go back into our open models folder. And so in this one, you know, I'm still connected to the Nvidia Nvidia one here. We're going to go ahead and make a new terminal. Okay, we're gonna drag this on up. And there is a tool called Nvidia SMI. It's just part of Nvidia, so you should have it. And if you don't, or there's some additional instructions, like I have an Nvidia course, so nothing tricky there. But if we look here, sure why it says 10K.
There's no way. There's 10K files. We don't we shouldn't have a git agit directory in there. One second here.
That's going to bug me. I just don't want to stare at that anymore. One second. No, there's no docu directory.
So, I I guess I'll just have to stare at it. But anyway, that 10K just stresses me out. But here we can see Nvidia SMI.
So, uh we can see I think that's the VRAM right there. Um so it's like 8 gigabytes. Um and then you know we can see its usage, the fan, additional information. Uh, and this tool is okay, but there's actually a better one called um it's called N Vid Top. So, I'm just looking up um how to grab Nin top.
Nvidia top.
Nvidia top called Nvid. And here it is, NVtop. And so if we get this installed, we get a lot better information um dynamically. I don't know what all this other stuff is, but we'll go over to here and it's just pip uvx install or pip installer, however you want to do that.
Okay. Um I think you just go pip install nvtop like that. Okay. And so that would be a better way to get it. And so I'm just going to go ahead and run nvtop.
And so this is going to allow me to see as we're going here. And just also consider that I'm also recording. So there is some usage of memory here. Um and so I do have to consider that when running these models. So I do have that overhead. But here on the lefth hand side we have CPU. Um and then here on the right hand side we have GPU. So things might get offloaded to the CPU or stuff there. But if we know that we're utilizing our GPU, it should be showing up over here, right? But here you can see the memory usage and then the memory bandwidth um and then the its actual utility. So hopefully that is clear. Um and so we will go up over here. And by the way, if you don't know how to read this, just take a screenshot, feed it to chatpt or any provider that you like. So um now that we've pulled the model, let's go ahead and run uh serve. That's how you start up the server. And saying it's already in use because I'm I have it running somewhere somewhere else. I'm going to say lama stop.
Olama. I just type lama here. No, no, no. Olama. Uh help hold on the help as there should be a command to stop the server.
And this is always kind of a problem with lamas like it'll be running in the background. Um how do I stop an existing lama server running Linux?
You'd think the command would be right there, but I think we need like something to forcefully kill it. I mean, we could use pseudo uh pseudo system uh pseudo uh systemctl, but I don't know if that's actually how it's running. And so, we can go ahead and I I guess we could just check here. Status Olama.
Okay. And so, it is stopped. So, let's try this again. Olama serve.
It's only because I had this from before. Okay. And so, now that's what we needed. So, it spun up here. Um, I think I would have preferred to have it on the other screen, but that's fine.
And then so we'll go over to here. I say Olama run. Uh, what was the model we choose chose to pull? Well, we don't know. We can just say Olama list, I think. And so here we have Quin 3.5 and Gemma 4. You can see I download JMA 4.
And so if I want to run this one, I just say Olama run quen 3.5 latest. And it's going to now start up the model. If we go over to here, we should get some information that makes it very clear that it is running and being served. So, we're getting a 200, which is good. Um, so it is ready to receive information. So, can you uh write a single line code um for a hello world program in C? How about that? There you go. And so, it's off the races. And it's interesting like look how snappy it is, right? It's incredibly incredibly fast.
So you can kind of see how that's very interesting. But if we go up to here, look at my exhaustion of resources here and it's being like nearly utilized. So we're at 85%. Again, remember that I'm running OBS which is used to actually record capture our screen here, right?
And so we need to pay attention to our memory memory bandwidth and our memory usage. And you can also see things are being offloaded to the CPU.
Okay, so I think the top part says memory and the bottom is compute. At least that's what I think. And then over here, um I don't know. I think this might be the uh the the VRAMm, but like again, we can screenshot this.
Okay.
We don't know what we're looking at. And we can drop it in here and say, where is the VRAM in this image of NVIT top? Okay. And so I think it's probably uh right there. So in your Nvidia screenshot VRAMm GPU and it shows in a couple places. So it's used. Yeah. And so that's what I thought right the visual bar as well. So 86% memory. So here is our VRAM and then here [snorts] is our VRAM. I wish it literally say V RAM or V memory but they don't do that.
Um and so yeah. So it's this thing and then this thing or whatever. So anyway, it's working. It's off to the races. It's still going. I'm going to go ahead and stop that. Okay. And if we look over here, it's just idling. But notice um that this is still large because that's what it's holding within memory, right?
So, if we go ahead and we just uh quit this. So, we say quit uh or exit.
There we go. If we go back over to here, our memory should drop, right?
Is it going to drop?
It's not dropping.
We go over to here.
Yeah, I'm not sure why it's not dropping. So, we'll go over to here and maybe type lama help and we'll take a look here. Say lama stop uh the quen model list and we'll grab this lama stopwen.
So, now it immediately dropped. Okay.
So, that's what we're paying attention for. Go back over to here. We can see maybe that it stopped. I'm not sure. Um, but you can at least see the request coming in of of what you're sending.
Okay. And so, that's one there. Could I run Gemma? I don't know. That one um might hang our system here, but I'll try it. Okay. So, we'll go and say Olama list. And so, I obviously downloaded this. You just do the same thing. You just pull it. We say, "Oh, llama run whatever." In fact, if you do just this line, it'll actually pull it as well.
And so, we're going to start that up.
And I want to see how much that holds into memory.
Uh, Gemma is supposed to be extremely memory efficient. Um, so we might get better results here. Wow, look at that.
Look how efficient that is. Really? Wow, that's impressive. Okay. So, say, you know, write me write me a hello world uh uh program in C. Okay. So, we'll go ahead and do that. Look at that speed.
Look at that usage. Wow, that is low.
That is incredible.
Um, and so it'd be great if we could use Gemma. Gemma 4 is pretty new. Um, but the challenge with these models is like whether they'll end up using tool use and tool calls, which we'll discover as we try to work with these models. Um, obviously Quen, we're basically at the limits of its usage. So, I might have a hard time recording this stuff, but we will see. Um, but anyway, yeah, that's all I wanted to show in this video, but we're absolutely coming back to Llama as it's going to be the easiest way for us to serve and utilize models. But I will see you in the next one. Okay, chow chow.
Hey folks, it's Andrew and in this video what I want to do is use Olama to serve up a model through an existing coding harness. There are a lot of option out out there and I think we'll just start with the most obvious which will be cla um and I actually didn't know that you could serve it through cloud code until I was exploring models that I just don't know why I never noticed. Um, but if we go like over to here, notice that we can say like claude code uh gemma 4 here.
And we can also do codec. So we'll get cloud code installed. I mean I should already have it installed. Um, but just in case I don't, we'll go ahead and go through those motions there as it's not too hard to get cloud code installed. So I'm typing cloud code and we will go down here to claude code and we will look for the installation instructions.
So, here it is for Windows, Mac, and etc. Uh, again, I recommend that you use Linux as it's going to be the easiest way to work with this stuff. Uh, and that's what you really should be using.
So, we'll go ahead and install. I'm sure it's already installed. Um, and so it should not take too long to get going here. It's usually faster than that.
I'll pause here. There we go. Now, it's installed with the latest version. Um, and so normally you'd want to like log into Claude, right? So I type in Claude and normally you want to hook up your subscription. I'm already have mine here. So I'm just going to log out. I just want to show that I'm not using anything. So if I type in Claude now, right? It usually prompts you this first and asks for something. I'm just going to cancel this out here and type in clear. And what I'm going to do is I'm going to go back over to here. I'm going to grab this line where it specifies model Gemma. So it's like llama launch claude model Gemma. Now the question is do we need to have it already running?
Maybe. So I'm going to go ahead and just run it without it right now. And sure dark mode's fine. And we'll say enter.
And we'll go with recommended settings.
And I'm going to go over to here. I just want to see if any model is running.
Another way we can know is if we go over to here. Is there a model? So notice the memory footprint's really small making me think that the model isn't running.
We might have to run it separately. I kind of forget. But if we type in model, we now have Gemma 4, right? All right.
So, I'll just say hello. And I want to see if I get back a hello. And notice that Oh, now the model spun up. Okay.
So, maybe if it it if if it's not there, it invokes and then it spins it up.
Okay. And so, we have that. Excellent.
So, the question is, can I get it to actually do some kind of coding? And I'm just going to cancel out of here for a moment. Does that spin down the model?
No, it's still running. So, you got to be aware that the model might be running. Um, I think if we go back over to here, there might be a command. We didn't look at it earlier, but there Whoops. Lama help. So, there's probably a command to see what is actually running.
Uh, it doesn't show it, but anyway, that's fine. So, what I want to do is I just want to make a new folder in here.
I'll call it Hello World.
And all I want to do is see if I can get it to write. So this is something that I think we will fail on because um I think this is where we'll run into an issue.
But we will see uh what happens. Okay.
So I'm going to go ahead and uh hit up and get back to this command where we have Gemma 4. Okay. So now the context here is this hello world folder. Um and so what I wanted to do is write a new file. So can you create me a hello world.rb with puts hello world. Okay. Okay. And so I'm running that. I want to see if it does that or if it just outputs here.
The challenge with these things is just because you can load them up does not necessarily mean it's going to use the tools that are available to it. It might not even be aware of the tools that are available. And so notice that it did run. The model is running, but it never replied. It didn't do anything. It's strange. So go ahead and say, are you there?
I remember before I got code back, right?
And so again, it's it's thinking it's working.
Come on, Gemma. You can do it.
Now, that doesn't mean that we can't get these models to work, but I just want to show you the challenge. Here we go. I certainly can. So, here's the context.
If you're working in a terminal environment, great. And so I'm saying okay um okay you you are using clawed code and you have tool use to uh write to files please write to files okay um and I'm not sure like if as the conversation grows if the context will grow here and and the the memory bandwidth will increase or whatever but um I'm just trying to get it to invoke so they say I don't have direct access.
So, it does not know about this information. So, I'm going to see if there's some way that we can figure this out. I'm going to go over to chatbt because this is something that like I when I was using I just could not figure out how to get on the right uh to these.
So, just a moment here. All right. And so, here chatbt is suggesting a few things to us. We'll have to verify if it's true or not, but here it's saying that it it it it can support uh the calling, but it might not uh do that.
Um, and you know, could be a permissions issue. It says we do not need to manually teach cloud code what tools are in a normal sense. So here it says use the available cloud code tools. Do not describe just uh do not describe uh the change. Create a file named hello in the current directory. Hello. Then confirm it's written. So I'm going to go ahead and grab this prompt here. I'm going to clear this out. Let's give it a try. Be interesting if that helps it out.
But it's also saying that Olama somehow can help it um with tooling, but we'll see. I've yet to get it to write. So, this is where uh we're trying to experiment and see if we can get that result. If we can't, it's not a big deal, but it's, you know, it'd be nice.
And so, see, we just get this back and there's no response. So, you're in an agentic loop. When uh task requires file change, call the appropriate tool.
Try a model explicitly recommends for cloud code, right? So there's very specific ones or instead use four and that's something we will will test. Make sure your context length is large enough. Olama recommends at least 32,000 tokens for cloud code.
Okay. How do we know what it is set to?
Check whether cloud code is waiting on approval before writing the prompt.
Cloud codes doc say actions require permissions. Um use tools to create the test. do not explain, etc., etc. So, it could be a permissions issue. Um, and so we could rule that out very easily. Um, and I'm just gonna go over here and say because I don't have any example code near me, but I'll just say like, uh, what would be can you give me can you create me a permissions a permissions uh file setting for cloud code. so we can rule it out. So that'll be something we might want to do here.
We also might want to make a claw.md.
Uh no, we'll ignore it. But like that might be something we might want to do.
Just like put like use available tools do not describe the changes.
And so we're just waiting for that to generate. I've honestly forgotten how to make the files even though I made a a giant course. I just did codeex in Gemini. So I'm not remembering exactly what it is. So, I'm going to go look up uh the example of configuration file for claude.
I think it's like config.json. Let me see here. Oh, it's cloud.json. Okay. So, I believe we'll have a claw.json file. Hold on here.
Yeah, I want one that's project specific.
It's not listening to me.
But this one is quite verbose. So go ahead here and we will make a new file here. This will be um clog JSON.
Okay. And so that looks good where we can just allow all those and um What is the claw JSON file for a project? I think it's in theclude uh directory.
Sure, we can do that. That's fine as well. I don't mind that.
So, we'll make a new folder here called.cloud. My hands are getting cold.
I'm going to turn on my little heater.
You probably won't be able to hear it, so that's fine. Okay, now I'm starting to remember this. Settings local. Yep.
Settings JSON. Mhm.
So, we'll go ahead and we will cut this out. And so, we will rule this out and see if that helps us out at all. I'm going to go ahead and clear the context.
I'm going to go and just close out and then relaunch this.
And we will try this again.
And so, we'll see if this helps it write to the folder.
Okay, we'll give this a refresh. Still nothing.
Okay, let's let you intercept auto approve actions.
All right, we can try this as well.
Again, I don't think it's going to work, but we'll go ahead and try it because I think a lot of people would probably be interested in using Gemma 4 because it's so memory efficient. But if this doesn't work, we'll just go get a subscription as I'm not going to be able to run Quen um alongside it. We can try it in a separate video, but I suspect it might hang my video and so I don't want to try it in this exact video here. So now we have added this. They also said like increase the um context window.
So here Okay. What do we set this as?
Is it in our settings?
As I never really needed to do that before, so it seems to think that it's in there. We might have to look it up and confirm.
Yeah. Okay. So, it's controlled by O Lama. Oh, okay.
Um, well, what's what's the default that Olama would launch with?
Oh, that's really, really small.
So, 4K breaks clawed code. Okay, so we'll go here. Oh, it's PS. Okay, so we were trying to find that earlier and it didn't show us command. I should have just tried that. And the context is 4,96.
Yeah. So what we'll do?
Nope. [laughter] We'll grab this here.
I'm just trying to paste it. I don't know why it's refusing to copy here today. My hands are very very cold, so I'm having a hard time writing here. So, we need to stop this model. Um, so I'm going to go here and uh where is it running?
Because we're not running the model directly because it it's right here. But I'll go over to here and we'll say list.
I guess we can see right there. So, ola stop gemma 4 latest. I would think the context model would take up more of our memory. So, that might be another thing we need to consider. And so, it appears to be Whoops. It appears to be stop now here because it's smaller the VRAM. And so, I'm going to go over to here and I'm going to grab this line.
Yeah, we're just going to go with what it needs. And so now we've set the uh context there. We'll say lama run gemma for latest.
And so after that, we will confirm by looking at the context over here, wherever it is, the lama PS, it's still showing 4096.
Okay, so that did not work.
I'm going to stop this here. Exit. Quit.
And we will stop gemma 4.
And I want to check this. We'll say env this value. And so it is set.
Okay. And we'll go over here.
This did not change the size when I checked with PS. Okay. Okay. And so we'll just go here and say like llama context window size uh envir as I don't necessarily trust chatbt here and it's setting it there. We just go look at the documentation. So 24 GB is 4K context here. It says 24 to 48 Gbits VRAM is this. And so I don't think I have enough for this context. So, that's probably my problem here is that this is 32,000 context. And so, I'm just not going to be able um to have a context that size, right? So, I'll go back over to here.
I'm going to stop this. Can I stop? Can I even run a 32K context with my RTX 4060?
because I don't think I can based on that information.
Well, no, I just told you what it was.
It's right here, buddy.
Sorry. The the docs are right here.
But I'm pretty certain based on this information, it's going to be like, no, as that makes it pretty darn clear.
Sable is not saying what you're currently running. If you have enough VRAM, Olama can default to this. So, Olama still defaults to unless overridden. Even if your model supports 32, so even if your GPU is capable. So, here it's uh 8 gigabytes VRAM. So, 4K.
So, you cannot run a 32 VM. So, this will happen. And so, right off the bat, we are already at a dead end here with Gemma, but at least we know, right? And so, if we want to use Gemma, uh we'll have to serve it from somewhere. And so maybe we'll have to grab that from uh Google somehow, but I'm not sure because we kind of expect that to come through Olama. And the question is like if we go to Olama, they do have open uh open models, right? Um in their cloud offering and I just wanted to take a look at their model library.
And so you know, is this something that they serve up in the cloud?
they do have a cloud one. So maybe we can test this and then we'll have the correct context. And so you can see the context window here says 256. Um and so this might be the only way for us to run it. So you can see it's not that you can't run these models locally, but you're going to hit those limits with your VRAM. Um and it really comes down to that, right? And so like if you needed that VRAM, like what could actually run it? That would be the question. So if we go back over to here, uh you know what kind of hardware uh would give us enough uh VRAMm to actually run Gemma 4?
Like would it be max uh Mac uh M3 Max? Okay, there's also the uh um uh G DGX Spark, but DGX Spark normally has a lot of memory. So here it says what running on four actually means. So enough uh VRAM for the model weights, enough for the KV cache, enough B bandwidth so it's not painfully slow. So if it's this okay and we're running the four billion parameter okay so yeah obviously not for that type but possible right so still it's it's really really hard there is software like expo which can allow you to network them together um okay uh AI high, not the band. And so this is a way for you to like have multiple Mac minis and networking together. But you have to consider that there can be latency between the network. So if you took in my or taken my Nvidia course, you'll learn like why you want DJ DGX Sparks GGX Sparks here. Okay, so DGX Sparks um they have uh the Nvidia's uh NVL link in it and so it allows for extremely fast um transportation GP2 GP GPU to GPU transportation and so that makes it extremely valuable. The only challenge here is that okay well you might now have no latency between GPU to GPU when you're stacking them but then um you might have less VRAM as um these uh Mac Studios the the whatever number they are extremely extremely good with VRAM and so like okay what about what about two DGX sparks network together okay and it's interesting because if you go down this rabbit hole. They might suggest you get like a Max 3 and a thing and then network together. I'm not sure how you would do that. Um but again, unless we have this hardware sitting on my desk and I don't I can't answer this uh uh uh clearly, but again we are still going to test out um other stuff here.
So here each Spark has 128 unified memory of 200 Gbits per second. That's the thing that we need. Yeah. Via these um connect X nicks. It's the custom uh nyx that Nvidia makes with MVM NV link.
So two sparks are not the same thing as one box with 256 seamless memory. Uhhuh.
Uhhuh. And so it's all it's always like a it depends. It might work. And so again, if we don't have these on our desk, we're not going to know for certain. Okay? So maybe one day I'll rent them. All right? There's a place that I could rent them for. But for now, I cannot answer these questions for you.
It's just I'm limited with our resources here. But anyway, I think we got our answer here, which is like yeah, we need certain things to run the stuff. Um, so yeah, we'll move on and try to use cloud ones now. Okay, chiao.
Hey, this is Andrew. In this video, I want to see if we can use codecs along with specifically with Gemma. Uh again, I'm not saying that Gemma is the best recommended model, but if we can run it, it'd be very interesting to do that.
You'll notice that for each one, they are recommending very specific ones. So again, we just want to see if it can run it. So I'll go ahead and get Codex installed. So Codeex install instructions as it shouldn't be too difficult for us to get this installed.
So go ahead and grab Codeex here. We'll run it here.
And now it's installed. We're going to go back over to O Lama and in here I'm looking for how we can launch it. So just hyphen m for model. Okay. So I'm going to make a new folder here. This will be flappyird uh codeex gemma. Okay. I'll just make a new readme.md file. And this is just to give me that single file here. This is OOS.
To use codecs with O Lama, you have to use the OS flag. Okay, sure.
And just going to grab this model here.
I just want to see if it busts. Okay.
So, we'll go ahead. Whoops. We'll go ahead and we will copy this. We'll paste it.
Hit enter. The only thing is like they have recommended models, so they don't explain why they're recommended, right?
Right. So they'll be like, "Oh, use GPT OSS, the open source one, right?" Um, we'll go ahead and try this.
And I'm in the wrong directory.
So go Flappy Bird. Oh, Flappy Bird D here.
And we'll go ahead and uh run this command here and just see if it even works.
And so we'll go ahead and type in model.
And so it's not specifying here, but you can see the models up here. Gemma 4, Gemma 3. Okay. So I'm going to go ahead here and we'll just say uh create we'll put it in plan mode. Um, create me a Flappy Bird.
Um, single game, single page, uh, single page, single index HTML page, plain JavaScript and CSS, use WebGL. Okay, so we're just trying to test out the coding harness to see what kind of results we can get. Probably the hardest one would be to utilize Gemini CLI. In fact, they don't even list it here. Uh, and when you want to use Gemma, you I think or Gemini, you'd probably serve it up via Google's API and it's experimental. So, um, but anyway, this is actually giving good planning information, proposed plan. So, I'll start by exploring the directory.
There's nothing to explore. Um, I'm having trouble in the directory since it's from scratch input, etc., etc. Okay, great. Go make Whoops.
Just talking about the implementation.
And so getting good results back. Not to say that cloud code wouldn't have done this, but for whatever reason, we're getting back excellent results. Say great, go implement.
Okay. And we'll go ahead and hit enter.
And um maybe it's still making the plan up here. I'm not sure. But we will hang tight until it is completely done and it executes it. Okay. And so far it's still running in five minutes later. That doesn't mean that it won't come back here, but obviously um uh cloud code came back in a snappy result. Really to test this. We have to run it like three, four, five times. I don't know. Um, I'm also just kind of curious if there's anywhere where I can track my usage. I don't think that there is. As when I looked it up earlier, there was no information.
Um, oh yeah, there is. Here it is. Okay.
So, they have session usage. So, resets in 1 hour, weekly usage. Wow. Resets in 1 hour. That's generous. But maybe I'm not in the 5h hour window here. And apparently I've utilized next to nothing. Um, but it is going really, really slow here. So, you know, will it come back with anything? I do not know.
It looks like it just completed.
Um, which is not bad after five or six.
Oh.
Uhoh.
Oh, is that just the plan? I thought it implemented the plan. Hold on a second here. No, I said great. Go implement it.
I am in still in plan mode. Where's my Where's my code? So, yeah. Uh, see this is where I'd think I'd prefer it to execute things individualized.
So, I'm going to go ahead. I'm going to go and create a plan file here.
And I'm going to go ahead and type in clear.
All right. So, I mean, obviously, it did not execute um what we wanted it to do, but we still have uh the implementation here. So, I'm going to go just stop this. I'm going to re-enter it back here. And let's see if we can drive it based on the plan. Um I kind of wish that it would break up its tasks. So, maybe that might help it a bit. But maybe we just reference it.
Okay. Please implement this plan and we'll see if that works.
I mean, obviously it can use tool use, so not really concerned about that. Ah, there we go. And so now we we have a list. Um, and it's off to the races. I'm not sure if it would check box this off as it goes, but at least we know that it knows what it needs to do. And so we just want to observe the final output here. Okay.
All right. So, um, it didn't go and implement it. I mean, it made an updated plan. Okay, great. Go implement it, right? And so, we're trying to get it to go implement unless it was running this entire time.
Message submitted after next to call.
Okay. So maybe it is still um implementing.
Okay. And maybe we just have to wait here. I just can't really tell that it is processing. Normally it will show something that it is working, but maybe it is working. So maybe I just got to hang tight. All right. So we're still waiting several minutes and we're not getting execution. And so you know maybe this is where uh you know we are not going to get the results that we want.
Um, so I guess what we might want to do here is take Olama suggestion and try to use the recommended one for codecs.
Okay, we're just generally testing these out and seeing what works and what doesn't and making best observations.
Uh, nothing super exciting here, but uh, I'm going to go over to here and there's GPTO OS and they have the 120 billion parameter one here. So maybe we will give that a go. Um, and so we will go up to here.
We'll place that here.
And let's go take a look and see if that makes any kind of difference.
It shouldn't download them all because there's nothing to download, but I guess it's just uh rapidly downloading. Could not find bubble wrap on the path and uh install. So I actually haven't put bubble wrap in here. Bubble wrap is necessary if you want a sandbox environment. See the sandbox prerequisites. Not really concerned about it. So let's go ahead and just say can you implement this plan and let's see if the GPTOSS does any better.
Okay.
And you know while that is going I'm just kind of curious uh what this model is. I don't think I've seen this one before.
Okay, I'll just pause here. We'll wait for this to hang up. But over here, it's saying, you know, there's not a single official model name for OpenAR meta. It's labeled uh used by tooling.
So, GPT style open source. This usually refers to open way LLMs that mimic GPT behavior.
Okay. But which one is it?
Open Way LM hosted by that a host of mixtrol variants. Okay, so maybe it's based off a mixt.
And so we'll go over to here while this is executing in the background. It is working. Let's kind of observe what we have here.
Say GPT.
So it says OpenAI's openweight model designed for powerful open reasoning, versatility, etc., etc. Well, which is it? Here we go. So, Olama partners with OpenAI to bring the latest state-of-the-art, etc. Uh, the two models bring a whole new, etc., etc. Okay, so I don't see whether we have Mistl whatsoever in here. It sounds like this is this is something produced by OpenAI.
I'm not sure how old this is.
2025. Definitely not new.
Okay.
Well, hey, at least uh at least they're there.
Uh at least we can see what it's trying to do, but we'll just have to hang tight and see what happens. All right. So, after five minutes, we have Oh, no. Two minutes maybe. Feels like five minutes though. We have something generated.
We'll go back over to here. Give it a refresh.
And this is the codeex folder.
Okay.
Flappy Bird Codeex Gemma.
And I'm refreshing. I don't see any file there. I just to explicitly see if it's there. I just going to go into the Flappy Bird Codeex.
No, it's not there.
You never wrote a file here. You never wrote a file like Let's do a sanity check.
Create a in uh test.txt txt with uh bananas in it in the current directory, current folder.
Again, we're just seeing if it will correctly use tool use. It's kind of frustrating because after it runs that, right, it's like you don't want it to uh run again, right? [laughter] Um so here it is trying to attempt to write it.
Okay.
which is fine. So now we have our single text file. So I go back over to here and so um I'm going just clear context.
We'll say you know I want uh create a task list a list of tasks that need to be executed uh to complete this plan.
and write it to uh tasks.md.
So this might be the only way that we can kind of help it out. It would be nice to jump back to Gemma and take a look and see if um there are issues uh with just writing.
I need to read the repository to see what's already in the plan and then create task. Do you approve? Yes.
Okay. Let's see if we can get it to write to a file at least a larger file, but like a small one's not going to help us too much.
Hey, well, it's trying to create the task file, so that is good.
It's weird that it does like a cat or whatever. Um, but you know, as long as it's trying to make something. And so here we have a list. So created the task, a concise step-by-step list. Um, sure. Sure. I mean, like I'd probably give it a little bit bit better structure and and try to say like, hey, divide it up to other things, but um Okay, so I'm going to go here and maybe just like make it a task list task list create uh the test harness to verify, right?
Put this at the top here.
Okay. So, I'm going to clear this out.
Um, I want you to implement the uh task list.
The tasks in the task list implement one by one. Check them off as you go. um uh you know for full context C plan MD. Okay. And so hopefully that will help it. Um the big tr trouble here is like it's not going to clear context uh after each action and reload its information. Um but we'll see if it can kind of work this way. Sure, I'll implement it. Uh may I run commands with escalate permissions?
Yes. Yes. Like what other permissions do you need? You're good to go.
And so it won't exactly work, but we'll see how it goes.
All right. So, would you like to run the phone command? So, print fless functions. Sure. I'm not sure what it's trying to do, but we will let it. Yeah. Give a try there.
Mhm.
It's doing something.
But yeah, I just I don't know. These models might not be good enough to Well, again, depends on the other models we try out. Like Kimmy might do a lot better, but um some of these might uh might not be able to be up to the task for completing stuff. I was surprised Gemma actually completed four um cloud code, so at least that's really really good. Um but we will just hang out and see what happens. Okay, everybody's coming back with some stuff here. We'll go ahead and say yes.
Um, and did it check it off as it went?
Oh, yeah. It's going to edit it.
Uh, yeah. Okay, sure.
Apply patch placeholder. I don't know why it's running such strange functions, but it's trying.
Whoa, look at that. Did it really do that much?
Oh, maybe it did. I like how it doesn't seem like it did it one by one, though.
It looks like it just went ahead and did the whole thing. [laughter] But hey, at least it updated the task.
So, I appreciate that. Um, I mean, like you still have to do the harness.
So, unchecked items. Okay. So, okay. Go go go implement your harness.
Go implement the rest.
But hey, at least it was able to do it.
But is a is finicky, you know, like all right, so it's completing the next task.
But it'd be nice if like if again it was more incremental and and there was more verification of each step, but um I'm not sure what we would have to do to get it to listen.
Um but again, it's it's very slow, but I mean like you know, it's it's it's to be expected, right, for what we're doing.
But here, it's going to go ahead and update the list. And so now it's suggesting that everything is completed.
Uh technically we've basically moved over to the codeex one as the GMO one was not working the best way possible.
But let's go over here and we'll go and just open this in a new browser here. Go ahead and the game is broken.
Jumping pulse raises bird.
Okay. Well, it was a nice nice attempt, but you know that model is not a new model. So, I want to switch back over just to Gemma for a second because I'm curious.
I just want to see if it can just write a single file. So, can you uh update the text txt to have apples instead of bananas in it?
Okay.
and it's exploring the file. Can it edit it? I just just again simple test read.
Yeah, I can. Okay.
Um, so you know, I would just say this is like the this is the implementation done by um GPTO OS.
Well, I'm going to go back over to here and let's see if we can give it the similar prompt.
So, I'm just hitting up here.
I I want to make sure it has context.
I'm just going to go ahead and and just say like tasks. Oops. Tasks MD. Oh, it just auto completes it to that. Okay.
Well, that's fine. Then we'll go ahead and hit enter. Um, and so let's see, you know, if it can do the same same thing or or have a better chance of doing better.
But I'm hoping like this way we're kind of forcing it to say, hey, write the file and update it.
Um, but those plans are kind of frustrating because like if you have that plan, it doesn't do it. It sucks.
So yeah, it's built it checklist.
Doesn't look like it's going to do it individ individualistically. Maybe it will. Um, and even the plan, we could have probably wrote in it like, hey, checkbox these off as they go as opposed to placing it in the text there. Maybe higher priority would have actually been a lot smarter about doing it or repeating it would have been good. Um, but anyway, it's off to the races and we will see what happens and see if it actually produces a file. All right, we got a file edited. Also, another thing that I should consider is like is it attempting to read the other HTML file?
I don't think so because it would have told us if it did, right?
Um, so it does list the uh contents out, reads the task in the plan MD. So it catted out the files. You could tell it's not looking at the existing index file. Oh, look at that. It's checkboxing off as it goes. Cool. But is it actually checkboxing off in here? No, it's checkboxing off in whatever this is, which I guess is fine. Um, but only if it actually updates the file, right? So, okay. So, um, I don't know if you caught uh that because I paused and I have no idea how long this has been mucked up for, but I'm going to go ahead here. I was just talking to myself and it was all paused.
E, but the uh, it it finished the job.
And so, we'll drag open over the index one here. And you can see it works.
So, I'm impressed. So, Gemma works really, really well. I'm thinking maybe the reason we're not seeing it recommended by OLAM is just not much not much use with it because Gemma 4 is very uh very new. Um, obviously GPT OS did not do very well, but it's an older model. Um, so we're going to have to experiment with more models here. The question is is like if we were to serve our own model and is it even affordable for us to serve our own model from uh raw compute, what kind of performance would we get? And that's what I'm kind of interested in. Um I mean people from a data sovereignty perspective would care but I'm just thinking like okay what does it actually look like when we know what guaranteed compute that we have what these things would perform at?
That's a separate task here but we have more models to evaluate. So we'll continue on. Okay.
Hey this is Andrew and in this video we are going to utilize um Ola Mloud. So, I just went and I just signed up and andrew in my credit card for uh Ola Mloud for the pro edition. Um, again, it's just to test it out and so we can actually use a model that will work appropriately. Um, and there's obviously a lot of models. I seem to be very focused on demo, but we will give them all a try here. Um, but the idea is if we go down below, you can see that there is a 43 1 billion cloud parameter model that we can utilize. Uh, I want to make sure that I'm not running anything on my GPU here that I don't I don't expect it to really utilize the GPU, but I just want to make sure my GPU is uh done here. Okay. And I don't know if I need Ola to be running. I'm going to stop that uh for now. And I'm going to quit out of this here. And so let's go ahead and hit Oops. Clear. Clear.
And I obviously want to run the model here, but I want to run the cloud one. So, I'll just grab the cloud one. I'm going to assume that that's all we have to do is just put the name in here like this, and it will utilize it.
So, we'll go ahead and hit enter. Could not connect to Olama serve. So, I'm going to go back over to here. I'm just going to type the word Olama. I haven't really been doing that. But here you can see if we want to um utilize Olama through some particular thing but what I want to do what I want to do is see how we can get logged into cloud as I've never done that before. Uh is there like an O sign in? Okay. So we'll say lama sign in and we are already signed in. Okay.
Okay. So, I'll just do lama serve.
I didn't think we would have to do Oh, it's already in use. Uhhuh. So, I'll say lama.
So, it pseudo systemctlama status status lama. I don't know why it's the other way. I guess it kind of makes sense. And so, I'm just going to go here and Whoops. I'm going to say stop.
Sure doesn't look like it's running a and then we'll say llama serve.
Oh my goodness. I really find this frustrating. Um as we aren't running it anywhere. Oh, maybe we're running the model here. Exit.
Okay. And so I'm going to go back over to here. Say stop. We'll get the status.
And yeah, it should not be running.
Oh, llama serve.
And so this will happen from time to time. I just gonna go over to here and be like, how do I kill it? Because I'm obviously not running it. And this is a problem that I would run into.
If you really want, you can just restart the machine. Um, bug go ahead and do that. See, there's no process.
Kill lama. There we go. We'll try serve again. Olama serve.
Okay. So that's just something that we will have to fiddle with that there. So I don't have to look it up again. And we will hit up.
Nope, I don't want that one.
I don't want that one. No. Trying to find where we were running this before.
Okay. Well, I'll just have to copy the line again. It's fine. So we'll go over to here and I'm going to grab this entire line.
And I'll just put it here. And I'm going to grab Gemma 43 billion parameter cloud.
So now we should be able to run this.
My copy paste is just terrible. I don't know why.
And we'll hit enter. And so we'll check model. It says it's using the cloud one.
Say hello.
And so the idea here is it's supposed to be going out here. Notice that we're getting no member usage as it's supposed to be utilizing the cloud and replied back. Okay, excellent. So, are we in the hello world folder? Yes, we are. So, can you write uh hello worldrb with put hello world in this folder? Now, consider that we also have our permission set in this cloud file. So, if we remove those, we might want to consider that. So, I've created the hello world.rb and it instantly worked. So that's really really interesting. What if we were to just get rid of this file? So just say other because it won't load it that way.
And what if we were just to go in here and delete this out for now.
And so I'm going to launch it again, delete this file, and I want to hit up see if it can do it without that information.
Does it still work? It does. Okay, great. So, we don't need our settings.
Okay, we do not need our claw.md. And it's just working.
So, I think that if we were able to get a larger context, it probably would have worked. Um, but that's kind of interesting. So, I would like to extend this to a larger task now. So, I'm going to go back out here.
We'll say uh make a directory and we'll say um Flappy Bird because that's a very small scope project.
And we will go ahead and hit up here. Run it again. Can you We'll make a plan. Can you create me a Flappy Bird uh game that is That's a single HTML file with JavaScript and CSS inline using WebGL. Okay. So, we'll go ahead. I put it into plan mode. I did shift tab to do that. And so all we're trying to do is um test this experience out.
Again, we're using Gemma just because it's super efficient. But we will take a look at the other models as we do have other models available to us. And Gemma is not even the recommended model, right? So if you go over to um the documentation, it basically re recommends everything but Gemma, but it's nice to know that it works, you know? So, we go say O Lama uh claude code here. And if we scroll on down here, it will actually tell you like, hey, you should use these ones, right?
Like Kimmy, GLM, minmax, Quen, GLM, Quen. So, it's not even recommending that one there. But we're going to wait here and see what comes back. Okay. All right. So, after waiting two minutes, it comes back. Create Flappy Bird game ML.
Um, so it created a task. I'm not really sure if that would show up under our task. That makes no sense. I don't know why it just says that. So, we'll go.
Okay, great.
So, maybe um the plan mode's not the best choice with us here. And we'll go ahead. Whoops.
Great. Proceed to create the game. Okay. So, hopefully it understands what it is that we want to do there. But for whatever reason, plan mode, you know, I'm not sure. It doesn't seem like it really comes back there, but it it has that task here. So, it's clearly has set one, right? Um the single task. I don't know what happened to all our requirements. Um but we'll have to learn how to drive this stuff as we go here. Probably I would make a plan file and have a checklist and and give it individualized tasks to do to iterate over. That might work better. Um or you might have a larger model um a manage model and then you use one of these to work that there but we'll give it a moment here and see if it can even implement this simple simple task. Okay. All right is back. Let's take a look here and see what our results are. So it's gone ahead and created a Flappy Bird game. Okay. It is using GL shader. So it is using WebGL which is what I asked to do. So that is good.
Um, if we go over to here, only thing that we're not going to be able to track is our usage. That's the only hard part here. So, WebGL rendering, etc., etc., so I really wasn't expecting it to come back with anything. So, I'm actually surprised, but we will see if it actually works correctly.
So, we will bring it over. Oops.
I'm trying to get it in my browser here.
It's just refusing to open. Come on. There we go. So, we got a Flappy Bird game. Not like this is super complicated as all these models should be able to do this, right?
Okay. Well, it's Flappy Bird.
But, you know, again, that's a very very very very low bar, right? Um, something that would be really impressive to me is like, could it make Wolfenstein? Uh, so the newer Frontier models are able to make Wolfenstein 3D with um coding agents and so that would be a better uh test like with ray casting.
but it didn't make a very good plan. And so that's the only concern that I have is like it did not write out a plan. And so maybe I could be more explicit about it. Um so that would probably be a much better test, but I would call this done for now. So um you know, but we'll do a little bit more testing here before we move to other models. Um the other thing is that we might also want to just do a flappy bird within um this is Gemini. This is not sorry this is claude code um Gemma but I I just want to test this in other providers like codeex and stuff like that and also uh give it something a little bit harder to do. Okay. Um but yeah, we'll see you here again.
Hey, this is Andrew and we are continuing on testing out different models. Um, obviously there are lots of different um harnesses that we can use though I still think that we should persistently uh keep trying Cloud Code here. Um, so I'm just go over here cloud code Olama and I'm going to get this up on the screen here and let's just take a look at some of the other recommended models we have.
So we have Kimmy K2, GLM, Miniax, Quen.
Quen's not that good. GLM and then Quen 3.5. I mean, obviously these ones are ones that you can run locally, but we're going to focus on these cloud ones because, you know, I have limited compute. Um, I'm going to go over into um I mean, these are all basically Flappy Bird. I'm not sure as to why I did it this way, but we will go ahead and we'll just say Flappy Bird. Flappy Bird um Claude code Kimmy K2.
Okay, it's technically K25. Maybe just adjust that there.
Again, we're just doing a benchmark across these things to see what kind of results we can get.
Um, and so just go here. I'll grab this readme.md.
Whoops.
It'd be nice to do them in parallel, but I think we're limited based on the uh service that we have. And so I'm going to go ahead and grab this here. It's saying we don't need the OSS flag. So I'm going to get rid of that. Oh, that's also codecs, by the way. I don't want codeex. Sorry. I I don't need codeex again. It's fine. Uh so we'll go back over to here. Okay. And I mean the codex wasn't that bad. Like I really can't tell the difference. So, I'm not sure if using a a generic coding harness is going to really uh pay off in any way, but there are a few that we want to try like goose and stuff like that, but we won't do that just yet.
Okay, so we're going to go ahead here and we will switch out to um sorry, I know this is really finicky here, but we'll go ahead and grab this one here and we will now go ahead and run it and we'll give it the prompt. So, create me. Well, let's see if I can just hit up. Nope, I can't. So, I kind of feel like I should be saving this prompt so I'm not writing it constantly. So, prompt MD.
Create me create a plan for a uh create a plan for I guess I don't really need the prompt, but I I might need it later on. So, create a plan for Flappy Bird.
Come on. What happened? Can't type now.
for Flappy Bird game.
Single index HTML file, plain JavaScript and CSS, use WebGL.
Okay, so we'll go ahead copy this. We'll go down below, put it into plan mode, hit enter. Whoops, that's not what I wanted. That is not what I wanted. I'm going to just copy this. Paste. Hit enter. And so we will see how Kimmy 25 does. I mean, I feel like these are all going to pretty much perform the exact same way, but what we're looking for is like, are there any super dead stops? Um, is there anything, you know, crazy crazy wrong here, but we'll just continue on here. Okay. All right. So, right right here, we have invalid two parameters. So, it tried to call plan. It's having a little bit of issues with it. So, you know, I again, I don't know if this plan thing is is an issue for it. could be the case, but we will wait to see if we get any plan back here. Okay, so it does come back with a plan. Um, and with code, I'm not sure why we got code back, but is a is a detailed plan.
Looks pretty good. So, go down below here. We'll say yes, auto approve, and let it go ahead and implement it. Okay.
And it looks like it stopped. And so this is again a continuous problem that we're having where it will go ahead and make something and then stop. So here I'll just say plan MD because it does seem a little bit confused.
But I don't want to lose this plan. So just go ahead and copy it.
Go all the way down. Uh you didn't execute the plan.
We'll see if that helps at all. But really, it seems like breaking down the task would help it a bit. All right, we are back. And as I've uh as this has been generating out, I've been trying to see if I could build like an eval agent as it'd be nice to be able to run this against a bunch of tools because right now we're just kind of playing with it and seeing what happens. Uh but it seems to have come back here and says that it has created it. So we'll go ahead and take a look here. Um, and we will reveal it in Explorer.
Okay. The only problem with these tests is that, you know, they can be pretty brittle. So, go here. WHOA.
WOW, that's pretty good. Okay, this is like the best one so far. Uh, like just overall, I mean, again, we can't just say whether this is good or not. I mean, like it's good. It's good. Let's not kid ourselves. But um you know the question is really like evaluation of uh you know like is this model really optimized at Flappy Bird or is it uh good at all general programming tasks and we really do need to create some kind of criteria to test these things against but that'll take a little bit of time. So right now we're just playing with the models but I would say good job.
Uh you know KB 2.5 maybe we'll come back and use this again. That was impressive.
Cha chia.
Hey folks, this is Andrew. We are back and we are going to attempt in this one to use GLM5.
So, we're getting some interesting differences here. So far, Kimmy 2.5 has left me impressed.
We will see what these other ones can do.
Go ahead. We'll say alarm launch claude uh launch cloud model. Okay. And so we'll go ahead grab this here. Launch it up.
And luckily we saved the prompt over here. So go ahead paste it here. Put it into plan mode.
Making sure that we have uh this model selected. We do.
And we'll see how it does. Be back here in just a moment. All right. It's come back here with a plan. And so we'll go ahead here and just make ourselves because we obviously won't know our results are here if we do not save it.
Um, and so we'll go ahead just trying to copy that plan without also killing that there. And so we'll go over to GLM and we'll say plan.md.
Might as well just save our results so we can take a look at the plan. How's this plan look?
I mean, it's definitely more concise.
Looks fine to me. We'll go and say yes.
Auto approve.
We'll send it off here to the races and see what we get back.
Oh, okay. It looks like we are back and it has generated out uh this here. All these things take, you know, they're not super fast, but they're not uh I mean, no, they're slow. They're slow. But, um, at the very least, it's way better than, um, what I was actually expecting for, um, uh, performance-wise.
Well, let's go ahead here.
These open models. Go here. Drag this over. It looks the same. Huh.
Oh, you know why? I'm in the Kimmy folder. That's why. See Kimmy folder. I can be like, that's really weird if that's identical. So, I just need to back out here.
Okay.
And so now we have the actual index file.
Okay.
Nice. Nice aesthetics.
Wow, that is smooth. Why can't Gemma keep up with this? But okay, so so far the recommended models are performing, at least for Flappy Bird, pretty good.
But I will have to give them something harder to do.
um than this.
All right, but that's all I wanted to test here. And so we will move on uh to another model, but that's GLM, folks.
Hey, it's Andrew and we're continuing on building Flappy Birds. Um this time we're going to use Miniax. So that's another one I've heard good things about. Flappy Bird Claw code.
And here we'll just say Miniax M27.
Okay. And I'll make a new file in here.
Will just be readme.md.
And we will go over to here. I keep picking up codecs here. We'll go over to here to this one. Grab it over here. Drop it in place. We'll grab Miniaax.
I'm not looking forward to doing Quinn.
I know Quinn's not very good. I mean, it's probably good for its time, but not anymore. And so, we'll just CD back here. And we'll go to Flappy Bird Claude, uh, Miniax.
Hit enter.
And we'll give it the usual prompt here.
Put it into plan mode. Plan mode. Hit enter. And chill out. So here it has written a plan. We'll go ahead and take a look at um the plan here.
Plan MD.
Go ahead and copy that and we'll save it.
Um so we'll go ahead and say, can you go ahead and execute it? It's nice if it just goes straight to the end here, but so far the recommended models have been working splendidly. So, we will hang tight.
Okay. And so here it looks like it would have executed it. Um, it didn't write the file. Where is it?
Right index. Well, it's calling, right?
So, oh, no, it's right there. Never mind. Okay. And so, let's get this one open. Um, so I'll just reveal my finder here. Drag it into a browser.
Uh, and there we go. So, we'll hit space and it's broken.
Is it worth to try to help it fix it?
No, I don't care. But it either works or it doesn't work. Did it add its own validation step? Go over to here.
It had a verification step, but for whatever reason, it does not work. So, well, everyone's saying, "Oh, miniax is good." We could probably ask it to fix itself. We could try if I go back and just check this one more time.
Okay, one second here.
Just a moment.
Take a look. See what's wrong. Usually don't try to fix these, but from frame URL.
I mean, I don't think there's any major issue here. Let me take a look here.
safe. That's fine.
I don't think that's why it's failing.
But, you know, I don't think that these have vision vision in them, right? So, if I go here and drop this in, I don't think that's going to work.
Uh, control altv, shift v. I think I had a plugin for this, but it doesn't work.
So, I could go, hey, it doesn't work.
I know that's not very descriptive.
Let's just see if it can do anything to solve the index HTML file.
Well, hey, it has detected several bugs.
So, the main one is the pipes, the ground, the clouds are all quad buffered in the UVmapped. Uh, so they're sampling the wrong atlas regions. Let me write these properly. I suppose you could do that. Okay, rewrite them. That's not the problem that we're having though.
Oh, maybe it is working right now.
Sometimes it's confusing because it doesn't show that it's working.
So here I can't tell.
Okay, we'll just hang out here. All right, it's applied a bunch of fixes.
That doesn't necessarily mean it knows what was actually wrong with it. We'll go ahead and give it a refresh. And it still does not work. So, um, and we don't have any errors, per se. So, it's really hard to report back and just tell, hey, there's no birds.
Um, but I think like if there was a test suite of things to check, then that's where we could do some better luck. So, I would just say that unfortunately Miniax did not do as well as we hoped.
But that doesn't mean that it's bad. It just means in this particular use case it did not pass the simple Flappy Bird test. Okay. I'll see you in the next one. Okay.
Hey, this is Andrew and we have one more model to test. I expect this one to perform terribly um as it's actually one that I can run locally and it doesn't normally do a good job, but this is going to be the Gwen 3.5.
So, we'll go ahead here and we will make ourselves a new readme uh MD. We'll go ahead here and just grab this command so we can see it. and we'll go back and we'll grab Quen 3.5. Also, you have to consider that, you know, these models could be quantitized. So, maybe they do perform really well, but um the way that they have been modified or changed for Olama changes their performance. So, that is something that also really really matters when you are evaluating these things. We'll go ahead and we'll paste that in there. I'll go grab our prompt.
We'll place this into plan mode. Paste it in. Hit enter.
and we'll just wait a moment here. All right. So, we have a plan.
Go ahead and grab our plan.
Um, and we are now in uh where is our new folder? This is Quen Quen. Quen Quen. Quen, where's your folder? Oh, I spelled it wrong. It's uh Well, I'll fix the spelling before I publish it here. It's not Gwen, it's Gwen. Probably would have done better if it was spelled Quen uh Gwen, but we'll go go back over because people like Spider-Man, right? So, we'll go down all the way down the ground here and we will now auto approve it and we'll see how it does.
All right, so it is back and let's go take a look. It actually executed very very quickly. Um, surprisingly, extremely quickly. Did it actually write the file? No, it didn't write the file.
Oh, but this is just the plan, right?
No, we already we already saved the plan.
Oh, error writing the file.
Uh, okay. So, we'll go ahead here. And this is a problem with Quen and Gwen.
Half the time it doesn't work. So, um, you never wrote the file.
I did create an index HTML file for you.
The file was not already there.
So, I'm not sure what it's talking about. So, no. No. I created a file which is empty.
You need can you do a sanity check and just write hello world in the index html file. See if that helps it.
Okay. Uh so no I created the file and it just doesn't get it. So I mean querw is doing what I I thought it could do. So could maybe just be like can you output?
It seems like it's trying to read the file. It's really stuck.
Restore the code.
I'm not trying to rewind.
And so that one's just not going to work. Okay. So well that's Quen. We could probably tell it to like output the the code here, but if it can't write to the file, it's it's functionally useless to us to be honest. So yeah, there you go. We've evaluated all the ones that we can with cloud code. uh you know the question really is like is there a coding harness that is going to be better suited as claude code is obviously fine-tuned specifically for anthropic models um I don't want to have to run through every single coding agent to find out but like the only way to really test this stuff would be through an eval harness and the other problem is that all these other coding harnesses might not even use Olama you might have to integrate them with their plans and stuff like that so it makes it really really hard to determine which ones we want to use but you know my goal is just more like let's broadly explore these models and um you know like if we needed to self-host them how could we do that to get data sovereignty right or could we run these locally and get compute so this is really what I'm interested in um not so much comparing every single coding coding harness that's agnostic but hey at least we ran the gambit here okay this is in this video we're going to take a look at um PI agent coding or PI coding agent. I did try to do goose um like I don't know four or five times, but I simply cannot get it to work. So, I've given up uh trying to get goose to work. But we'll go ahead and we'll try out pi. So, it's at uh this domain coding agent.ai. And we'll go ahead and do a single install here. I'm going to create a new folder called pi uh pi example. And so, we will cd into the pi example here. And we will go ahead and globally install the pi coding agent.
So, we'll give that a moment um for this to get going. And so, now that is installed, let's go ahead and figure out how to utilize this. It's probably going to be something like Pi. But I know that Olama has a boot up for this. So, if we have it, we just go ahead enter and I can just choose the model I want.
And it's telling me to navigate over here to connect to Olama. So, I will go do that off screen here. And it just says connect device. I like that. That's going to make my life a lot easier.
Okay. And so now we are connected to pi um and it is installed. So there we go.
So we're inside of it. We got pi.ample.
So we'll just say create me a flappy bird game. Well, hold before we do that, let's do type for slash. So we have moss model.
Can we do that? Hold on. So we only have one model available.
Escape here. What other options do we have? settings, scoped, import, shared, session, change, log, fork, tree, new, compact, resume, reload. So, not a whole lot of stuff. Uh, could we get skills with this? I mean, to me, that would be the most valuable thing is getting skills. Let's go ahead and just say create me. Does it have a plan mode?
Shift tab. Switches to different levels of thinking. We'll leave it on medium.
If I do for/ plan, do we have a plan mode? We do not. So, very primitive, but let's go ahead. Create me a flappy bird.
Well, we have a prompt for it. I think we have it in Kimmy.
And we'll just say create me because there's no plan here. So we'll just go ahead and say say create a Flappy Bird game single index, etc., etc. And we'll let it go off to the races and see if it can do that. I mean, I suspect that should work. Um, but obviously there's just a lot of things we do not have. But I mean, maybe none of that really really matters. Uh because at the end of the day, what do you need? Compact, clear, plan mode, uh debug mode would be nice.
Working with skills would be nice, but PI uh coding agent is supposed to be extremely minimal on purpose. Um so I wonder if there'd be a way that we could get agent skills in here. So while this works, I'm going to be back and see if I can get agent skills. Okay, so while it is working in the background, seems like it is doing an okay job here. Let's see what there is. I I looked for um the docs and extensions and so we can see that there is custom providers.
It's not exactly what I want. I want extensions. That's where I went and it took me to the wrong spot. So what I'm looking for is skills. So here extensions are TypeScript modules that extend PI's behavior. Sounds great. So register tools that the LM can do. User interactions, custom UI components.
Uhhuh.
I just want skills.
That's all I'm looking for is agent skills. So, we'll type in agent skills.
It looks like there's already a folder for it. So, all I want is agent skills. So, that is something that we will need to find out.
So, let me go find out. I literally did a simple search and under bad logic, which I believe is the creator, we have PI skills. So, a collection of skills for the coding agent, but does it already have the skills functionality?
And that's what I don't know. This the person that creates it, right? Yeah, this is the person that creates PI coding agent.
Uhhuh.
Oh, here we go. So, we should just be able to put them in. So, maybe it's already built into the docs and we do not have to. PI coding agent. We do not have to do anything extra here. So, we'll go back over to their docs. The docs and I'm looking for skills. Oh, it's right here. Skills.
Okay. So, you just can drop skills in there and they should work.
And right off the bat, we're seeing Doom. Can it run Doom is always the most important thing here. Um, and it's following the standard format. So this should in theory work.
So when this is done we will test out that but we are just generating this out. Okay. All right. And so um I mean it looks like it was starting to produce something and so there was a chance that this could work. But for whatever reason we have a 400 developer is not one of the system assistant etc. So I'm not sure why it's doing that. But I guess the idea here is the message ro it pass the developer which is just something that does not exist. Um, but we do have we do have code. So, I'm kind of curious about this one problem with the um the format. But I'm going to go ahead and reveal this in the file explorer. I'm going to open it up in Chrome. I just want to see did we get any good results here.
Uh, yeah, we did. Yeah, it looks good.
Okay.
I mean, the pipes could be green. about the only issue here. Uh, it's crazy to think how much this game made when it first came out back, I don't know, it was like 2010 or nine and the creator then was like not happy with their success and then tried pulling the game and didn't wasn't happy with the money.
But anyway, um, the point is is that it is working, but I don't know why we got a 400 error. Um, so I'm going to go just head and ask chatbt for fun. So this is a pi coding agent and it errored out even though it did code. Why is it trying to use a developer role? Because I don't know why it would even try to do that as I've never seen developer role use system assist system whatever it's not a valid role for the open AI standard. Some newer frameworks including PI coding agent uh claude and internal tools use a developer role for higher priority systems. But here's the problem. The developer role is not part of the official API schema. So if the agent sends it back um so the build message is like this sends it back to this why it's still produced. Um so maybe there is a way to fix that. So like we'll just look up pi coding agent developer role because it comes down to the agent, right?
Um and just see if there's a plugin or something.
extension.
Okay. Is there any extension here for that?
So, I'm not sure. But the idea here is that it should be able we should be able to modify based on what that stuff is, but that's definitely not something that I can do much about uh right now. So, what we can do is maybe check out skills. And I really like the fact that they're doing Doom here. So here it says TUI or web UI resuming Doom. So here they are trying to play Doom in here.
Um and so here it looks like we can put our skills in our skills directory, my skills, whatever. So let's go ahead and just try this out. So, I'm gonna go ahead and make a new folder, say skills, and we'll call this new skill um Doom player. If we can even do that, it'd be kind of fun, but I don't think we can. And we'll say skill skill.md.
Um, and no, I think skills. No, I thought they had a um a front matter on them.
Maybe they don't. Let's go take a look at the standard here as I just keep forgetting what it is. And somewhere here we'll have our specification. Yes, it does. So here we should have front matter. We'll go ahead and paste that in here. So this would be like Doom player.
Um and so the idea here is uh you can play the game Doom by receiving input. Right? So we'll go ahead. Maybe that's a bad skill right now, but is there a way that we can run Doom in the browser?
That'd be a good question.
So, I'm going to go over here. I'm just going to use something a little bit more intelligent.
So, I want to create a agent skill using the open format to play uh Doom. Could a code can I run Doom in my terminal?
And is there a way a skill can uh play it?
Okay.
And maybe that is just an unrealistic expectation, but just because they saw it.
Um, but this one just actually launches Doom anytime it asks to do work. So maybe we're trying to do something else here.
And I'm just giving you a moment. Maybe this is just overkill because it would still have to control something. So, let's just make it a more boring realistic goal here. Um, and so I'm going to go ahead here and just make one that I usually make, which is to-do um update update skill. And in fact, we probably have an existing skill somewhere else. If I go to one of our other repositories, I'm going to go ahead and do that and just try to find one. Um, so I'm going to exam pro code offscreen here and I'm just looking for anything like maybe like my Gemini repo might have a skill. Um, so like in the simple to-do app and so is this a skill? Why does it say dots skill on there? That's kind of weird. Maybe it's in here.
Ah, here we go. To-do manager. And so we definitely have one here already. This is online. You can go grab it. You don't need me to give you a link here. It's all here. So, I'm going to grab this one. And Gemini is pretty good as it's pretty much vanilla in terms of its use.
And the to-do manager needs to be called uh this here.
Okay. And the other thing that we're missing here is the script.
Go back a step here and we go into scripts. I'm going to grab this here. Okay.
And we'll make a new folder here called scripts.
And this is called manage to-dos.
We've done my other courses. You've already done this. It's not a big deal.
Okay. So, we now have our script in place. Now, I just need an SQLite 3 database. I'm going to go back a few steps here. And we might already have one in this repository.
We do. So, I'm going to go ahead and download this uh download this raw file and then I'm going to just drag it into this project here, the PI example. And so now we have that there. So, I'm going just uh clear out of there and then hit back up and choose to launch with Pi.
And we're now in here. So, not save my skills available. So, manage or it's to-do manager. So, it's not autocompleting any skills here and I don't get that. So, I'm going to say, can you give me a list of my to-do uh to-dos in my uh SQLite database using the to-do manager skill? So, I'm being very explicit here because I don't know if it's going to automatically invoke it. And so, I'm just trying to help it out here.
And so, here it's kind of exploring our repo.
And there's a skills directory. Let me explore.
Okay. But it's not invoking the skill.
So that's what it should be doing.
So we will go back over to here. We'll take a look at the documentation and see what we have as our options.
Um so skill skills skills skill skills on demand abilities follow the agent skill standard. invoke a skill with skill for/skill colon name, but we didn't see that when it popped up. So, it did list stuff out here, which is fine. It did use the manager, but it didn't actually use the skill. So, I'm going to type in for/skill colon to-do manager add a uh add a task to the to-do list.
Pick up milk. Okay. And we'll see if it actually does that. The user wants me to add a new thing here and it is still not I can't tell if it's actually invoking the skill.
It is using it correctly.
Says not Gemini. So here Oh, so maybe there's something wrong in my script here.
Gemini. No, no Gemini there.
Oh, look at that. We have a bunch of Gemini stuff here.
Just take this out.
Skills.
Skills.
Skills.
Skills.
I like how it reasoned and it still it still technically used the skill, but I can't tell if it's actually invoking it.
Like it it's working, but not in the way that we want it to work. Right. So, let's go back here. place in the pi directory. Oh, okay. So, what we'll do, we'll go back here and I'll just change this to pi.
I mean, that makes sense. Um, that we go there. So, pi pi. I wish they would just have a generic folder for all this nonsense. Um, and so that seems to be what we needed here. We'll go ahead make a new folder here called.py. And then we will drag this folder in here. We'll say move. And then we'll just cancel out of here. And we'll say lama enter launchpie. That is launching up. And now we're in here. Oh, there we go. Okay.
So, add a new item to to-do list.
Pick up bread. Okay. And so, we'll go ahead and do that. And so, now it's directly invoking the skill. I mean, it has to And we'll give it a second here.
We'll just wait a moment. There we go.
And so our skill is working. And I would say that Pi uh you know it works as expected. The only thing I would like to have is like a plan mode. So I'm just wondering like PI coding agent plan mode. Can we get an extension for that?
As they say it's something that we can extend.
Um and so here it says add plan mode support.
Reasons are outlined in the readme. So there's a reasons why they might not do this. What I learned about building opinionated minimal coding agent. Oh, okay. This might be interesting. Let me find out why. Just a moment. Okay, so the reason it doesn't have a plan mode is the creator argues that you just don't need it. Um, and uh, that's interesting. But they also say you don't need MCP and you don't need this. So you don't need sub aents. Um, and so basically they're just, you know, not going to do them. and maybe someone will extend that functionality individually.
Um, which is fine, but I've just it's, you know, that's what they want to do, right? So, um, it's interesting um to read and see someone who has built out one. This is from, you know, November, but obviously there's been multiple changes here, so I'm not sure if they have been updating since then. Um, but hey, at least it works. And, you know, a lot of these things you don't necessarily need if you are uh smart about it, but you know, sometimes we're lazy. It's just nice to do have a plan mode. Okay, but hey, we tested out PI coding agent. It worked. Um, and so we will move on and maybe test out Droid.
Okay.
Hey, this is Andrew. In this video, let's go ahead and try out Factory Droid, which apparently has its own concept of droids, whatever those are supposed to be. Um, so we will go over to the Droid website and go ahead and install the CLI. So, here it is. Um, I'm just going to CD back a directory here and make a new one called U Droid example.
Okay. And we will CD into Droid here and curl that along. So, we'll go ahead and hit enter. And hopefully that it is installing Droid as this looks like they have an entire ecosystem. I just want the CLI.
Uh, am I getting the CLI here? Yeah, it is. Okay. So, it says CLI here. Run Droid to get started. Sure. But I'm going to actually launch it from Olama just to save me some trouble as that's been a much easier way to work with it these days. And we'll go ahead and choose Kimmy. That seems to be the one that I like. Okay, cool. Droid. Huh.
Yeah, great. I don't care. And so here to start using, we log in. But I don't Why would I need to log in?
Please log in your factory account to continue. Is that the only way that we can use it? Really? Okay. Well, I guess I can make an account. I don't really want to.
Um.
Uhhuh.
I was just hoping that we'd launch it and use it. It's interesting. Other providers don't like rope you into that stuff, but we'll go ahead and I'll just quickly make an account. If there are any spend traps and I'll just scrap this video, but if this video exists, you know that we're able to get through it.
Okay.
And yeah, Andrews or Sure. Yeah. I'm just going through the sign up process.
Again, I don't endorse any of these products. I'm just using them because there's some options here. We're just getting a little bit of uh explan exploration here. So, get 1 million free credits. Add a payment method. No, no, thank you. I do not need your uh payment credits right now. And maybe later on I'll consider it, but right now I'm not.
Install the CLI. We've already done that. And so, let's go ahead and attempt a login. Um and so we're waiting for authentication. So here we have this key. Uh we're just matching it up here.
So uh confirm the code is shown on your device. hgtdxm it is. Go ahead and confirm that. So now it is confirmed. Uh would you like to install the VS code extension? No, I do not want it. And so now we are in here.
So the question is what model do we currently have? And we can see that there are multiple models here. But right now we're on the Kimmy one. And so clearly they are providing their own uh one. So we have their Droid core which is whatever that is. Oh, droid core.
It's just GLM, Kimmy, and whatever. So they have various ones here. Okay. And other providers, which is fine. But I just want to stay on our own custom model here. And we'll say create me.
Well, hold on. Spec. Oh, so it has instead of plan, it has spec. So, we'll go ahead and we will go over to our prompt here. And I'm gonna hit this, paste that in, hit enter. Smooth experience with the editor. So, that's really, really nice. Um, again, don't care about that, but it is nice little animations, nice timing, showing us the timer down below. Um, really wants us to use the ID. I don't want an ID. I'm a I'm a developer. I don't need that. So we have our little task list going there. It was interesting. Pi agent code was like we don't need to do our task lists. Um and so we'll see what happens here. Uh while this is going c curious does it have like a status line? It does. So we have a status line that we can adjust but maybe we shouldn't fiddle with it while it's working. So be back here in just a moment. Okay. So now we should have a proposal. So if we go all the way to the top here. So here we have Flappy Bird single file implementation requirements are this architecture. This is really really clean. I wonder if there is something influencing this with the spec information because this is really clean output. Uh so we have core systems game loop input physics game state webgl renderer draw calls. We have a mermaid graph um game objects that it's considering the shaders. So, it's giving us like little little peaks of what it's going to do, its controls, its file structure, the estimated lines, which I don't really care about. Planned features. Looks good to me. Proceed with proposal.
Um, choose autonomy level.
High autonomy. Go for it. I think that's kind of suggesting like whether or not uh we should intervene and suggest what it should do, but I want to see what it can do on its own here. So we will let it give that a try. Okay. All right. It is back. And so it says a complete WebGL Flappy Bird game in a single file. Uh bringing back the results. So clearly there's something special about their uh spec driven framework here that is making this experience a lot smoother.
Um and so we'll go to our Droid example here. And we have our Flappy. I don't know why I didn't name it index HTML. I could have swore that we told it to call it index html. So, it did not listen, but that's fine.
Maybe the results are stellar. Okay, let's go over to here. Let's take a look.
Yeah, I mean, Kimmy produced much better results on its own. Um, but again, you know, maybe when you're working in a much larger complex project, you're going to have better guidance with their spectriven development, whatever it is that they are doing under the hood. So, that might be part of the cell with Droid. Uh, again, they have this concept of droids, but we will figure that out here in just a moment. One second. All right. And so, you know, just trying to see what Droid was suggesting here. you definitely experience this tiered autonomy uh system which is just going to just change whether it's going to ask you stuff. It looks like there's specialized droids instead of generalized ones and they're really trying to claim it based on their I guess their agentic loop and their feedback. Um it says so you know it looks okay. I mean I think I would probably prefer using pi coding agent. Um, even though there are some things I don't agree with it, but it's just the fact that you can exactly see what it's doing. Um, and obviously like in this particular case, I don't feel like we got the best result that we could have and um, Kimmy was able to do a better job without utilizing Droid.
The thing that um, I'm not certain about is like the droids. So, I imagine that they have existing skills or droids or let's up to droids in here. Manage custom droids. That's just agents, right?
And so here they have a few. Yeah. So I feel like it's just kind of a renaming of agents.
Um so yeah, it's fine. I mean, they already have create skills, so you can create a skill right off the bat inside of here.
Uh enter, manage, resume missions.
So fancy way of managing a task list. So yeah, to me this is really much looking like marketing. Um, again, still tight tool, but I don't know. I don't think I'm I'm that convinced here. But anyway, at least we got the experience. It still was a very smooth CLI experience. Um, but yeah, there we go.
Hey folks, it's Andrew. We're going to continue on here and I think we'll go ahead and try out Open Code. I'm not really a huge fan of um the creators of open code as um I know them and I'm just not a fan of them. But we will go ahead and just be fair and try and evaluate it here and see what we have. So we'll go ahead and grab uh the coding tools. I'm going to go ahead and grab it from curl. We'll go ahead and hit enter. Um and we'll go ahead and make a new folder here called open code CLI.
and we will wait until that is installed and see what we get. Looks like they have a whole interface here whizzing around. You know, it's fine. We'll just give it a moment. Okay, so now we are in here and I'm going to CD into our open CLI code directory. I kind of want to call it open open code example.
Okay, and we'll just CD back here. And I want to go ahead and use Olama. and we'll go over to launch open code and I want to choose KE 2.5.
And so here we are. It's a nice interface. Um I kind of wish I was already in uh the interface I expected to be in here. We have a plan and a build mode. So that is nice. I'm going to go into plan mode. We're going to go ahead and grab this line here. Paste it in. Enter. And so right off the bat, it's off to the races. On the right hand side, we have contacts and other information here. So, it's kind of like an expanded status line. Uh, and it's moving pretty fast.
So, I'm not sure why it's streaming so fast. It just it feels fast. Like, this is not faster than other providers, but there's something that makes it feel fast, which I'm not sure about. We do have a heads up on the context and usage. Am I even using the model that I think I'm using? I think I am. Model.
I am. Okay.
Over here we have open code. The only complaint about these models is that they're highly quantized. So even if you use the $5 or $10 subscription, people are um fussing about them, but I I couldn't tell you. I'm not going to go pay the $510 to find out as we have lama and I feel lama we can carry to other places. So what do we have here in our list? So we have um some thinking here.
HTML good.
implementation clarifying questions.
Okay, go execute.
So, go ahead. And now it's executing.
Yeah. And again, it looks nice, but I don't care about whether things look good. I want to see if they work well for me. But I did like the fact that when we launched it, it wasn't trying to force us to make an account like Droid was. I did not like that.
But the issue with these providers is that you know they have to build a moat around somewhere and uh you know it's going to be pushing their own subscriptions and things like that. Um and I'm just not I'm not sure like if if we're going to use open models I want to have something that is open like PI coding agent though that has now been acquired by a company so maybe won't be open forever. Um that or maybe we just need a minimal one that we can all kind of tweak and work with. Um so The results are done. And again, the colors are nice. Um, but we'll go over to here and take a look in our open code.
And where is our file? Oh, we have another plan.
Thought we just had our plan. Why we got a second plan here? Does this plan cover everything you need? Let me know.
And would you like start playing? Go execute. User wants me to go create the Flappy Bird game. they want it is here and then it goes over to here.
Proceed with this plan. Yes, I think maybe because we're in plan mode. Yes, please proceed. And maybe this is easy to fix. We'll just switch out of plan mode. We'll go ahead and try this again.
The user wants me to proceed to creating the flyby game. But wait, the the user said yes, please proceed. This seems like they want me to execute the plan.
However, the system clearly states that I'm in readonly mode. Well, why can't you switch? So, we'll go ahead and hit tab. That's fine. Okay. Please execute and we'll go ahead and try that.
Now, is this the span for um Olam? If it is, let's go take a look here because that might be useful. I think what I would like is if they had an editor where you could just kind of I mean, again, maybe Pyode can do that, but like you could put stuff up there. But we'll go over to uh settings.
and we'll take a look here.
And this is what my usage shows. So, I'm not really sure if this is my actual real usage, if that's my real tokens.
Um, so I cannot tell where the source is coming from. Okay.
Or and also, you know, is it using something else here? But anyway, it's going off and building. We'll we'll be back here in just a moment. Good management of to-dos. I do like how we have a to-do list here on the right hand side, but we will chill. Okay. All right. It's done. It did create the index html correctly and so here it is explaining in summary what it has done.
So that is good. Let's go ahead and reveal the syntal explorer and we'll go over here.
Oh, it's the Wow, that's different. It made a long looking one. Oh, because it stretches with the page. That's funny.
[laughter] We have a a long bird or not. We'll go ahead and tap and it works. So yeah, uh, good success so far. So I I like that. The only thing I would say that I just did not like about open code was the fact that you had to automatically to or do that toggle to get in those modes. Um, but yeah, I mean it looks fine. So you know, would I ever put more time into this? I don't know. Again, it looks more aesthetically nice as opposed to functionally useful. So again, so pi coding agent would probably be my preference as we can exactly see what is going on and that to me is probably more valuable as a developer than anything else. But again, you know, you you need to make the choice of what you want to use. But anyway, so that is open code.
Um, and there we
Related Videos
Agentforce NOW AMA: Build with React and Salesforce Multi-Framework
SalesforceDevs
490 views•2026-05-28
How agent o11y differs from traditional o11y — Phil Hetzel, Braintrust
aiDotEngineer
450 views•2026-05-28
WEB TECHNOLOGIES UNIT-2 | Degree 4th sem BCOM Computers web technologies unit-2 full explanation💯✅
LearnwithSahera
1K views•2026-05-29
More tests are always better? How to use AI to identify tests that bring little value
Alliance4Qualification
335 views•2026-05-29
Search Algorithms Explained in 60 Seconds! 🤖💨
samarthtuliofficial
218 views•2026-06-01
People of Game of Thrones using JavaScript DOM
AltCampus
296 views•2026-05-30
Introduction to Problem Solving Part - 1 | Lecture 1 | Intermediate DSA
ascensionix
107 views•2026-05-29
🚀 BCS613C Compiler Design | Module 1 to 5 Schema Evaluation 🔥 | VTU 6th Sem 💯 #VTU #bcs613c #exam
Pranavaa-y4y
104 views•2026-06-02











