This video demonstrates how to replace expensive Claude API subscriptions with local AI models by using Ollama to run open-source models like Qwen 3.5:9b, enabling a completely free and private AI coding workflow. The tutorial covers installing Claude Code, setting up Ollama with proper environment variables, downloading and running local models, and addressing context length limitations by creating custom model files with increased context windows (up to 64K). This approach eliminates ongoing API costs while maintaining the full functionality of Claude Code for development tasks.
Claude Code + Ollama = FULL LOCAL AI AGENT
In this video, I'm going to show you how to launch Claude Code with Qwen 3.5, an open-source model, using Ollama. You can see that we're using an open-source model: I asked it to clone a repo and understand the whole context, and it was able to clone the repo perfectly and do all sorts of things. We'll do other things like that and look at different experiments.
But let's move on to the issue we're facing right now. You can see Claude has this subscription, which costs about $17 per month, and there's the Max plan at $100 per month. Even though I have limits on my current session, if I start to use it, it runs out very quickly, and then I would like to use some free models to do my work. That is the motivation of this video: to use an open-source model like Qwen 3.5, running on Ollama on my local system, to run Claude Code. Let's see if that's even possible. And by the way, it is possible, and I'm going to show you how. So let's go ahead and start the show.
First of all, I go to a particular folder: I open up VS Code and open the folder there. Then I open a terminal and choose PowerShell because that's better to work with; it needs different permissions here. So we're inside PowerShell, and now let's start.
If you go to this link (I'll paste it here), we have docs.ollama.com, and there we have an integration for Claude Code. If you go to this integration, you have all the details of how to set up the connection. First and foremost, you need to install Claude Code. So I copy this command, go back, paste it in, and install Claude Code.
While this is installing, you can go ahead and pull the models that you need. You can use the models that run on Ollama's cloud as well, but as of today, when I tried this, you need to be on the paid plan. So what you can do instead is download a model to your local system, then say ollama launch claude and put in the name of the model. The model that worked for me is Qwen 3.5; you can try other models.
If I go down, you can see that we have all these sizes, and considering that my RAM is 16 GB, this was the model I found most suitable. So I copy this, then I go ahead and install Ollama as well; we need Ollama installed on our system, otherwise this won't work. To install Ollama, you can run this command in PowerShell. I go back, and you can see that the Claude Code installation is done. One thing I got stuck on: you can see this location, C:\Users\<your username>\.local\bin.
Take this path, from C: through to bin, and add it to the environment variables. Make sure it's there, otherwise this won't work. So we go to the environment variables; you can double-click here, or click once and choose Edit. Then click New, add the path, and move it up to the top. This is one way to add it. It's needed because the system needs access to the claude.exe file when we start up Claude Code. This time it doesn't show me any error, but when I did it for the first time, it specifically said that the path was not added to the environment variables. So do that step as well.
Now let's install Ollama. Let's quickly go ahead and run this. I already have it installed, but it's going to reinstall, maybe with some updates; we don't know. Next we are looking at this model, Qwen 3.5, which is an amazing model. So the download for Windows is done, 100%, and it's installing now. So the installation of Ollama is complete.
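The "add it and move it to the top" step can be sketched in a few lines of Python. This is only an illustration of what the Environment Variables dialog does to the PATH value: the function name `prepend_to_path` and the example directories are hypothetical, and the code only computes the new string rather than saving it to Windows.

```python
import os


def prepend_to_path(path_value: str, new_dir: str, sep: str = os.pathsep) -> str:
    """Return path_value with new_dir as the first entry, listed exactly once."""
    entries = [p for p in path_value.split(sep) if p]
    # Windows paths are case-insensitive, so compare in lower case and
    # drop any existing copy before putting the directory at the top.
    entries = [p for p in entries if p.lower() != new_dir.lower()]
    return sep.join([new_dir] + entries)


# Hypothetical example values; substitute your own user directory.
example = prepend_to_path(
    r"C:\Windows;C:\Tools",
    r"C:\Users\me\.local\bin",
    sep=";",
)
print(example)
```

Putting the directory first mirrors the "move it up to the top" step, so that the Claude Code executable in that folder is found before anything else with the same name.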
Now you can just run ollama and press Enter, and it will show you that it is alive. I'm going to stop this, but let's go ahead and run ollama list. You can see that I already had Ollama installed, and you can see all the models that I have. These cloud models are not running on my local system; they run on Ollama's cloud, and I need a paid subscription for them. They give a few turns for free, but ultimately you need a subscription. The model we are going to look at is this Qwen 3.5 9 billion. If you go back to the models page, you can see that this is the model. So I copy its name; it's written there that you just need to run ollama run with this model. If I go back and say ollama pull and the name of the model, it's going to pull the model from the Ollama storage. You can see that it's 6.6 GB, and it's a success. So the model is pulled, and you can see the model here.
This is the model. Let me quickly remove the other two models that are here so that I can show you exactly how it's done. First I remove Qwen 3.5 latest. This is an extra model, because the latest version and the 9 billion version are exactly the same thing: you have latest here and 9 billion here. So we can remove it. I'm extremely sorry, we need to run ollama rm, and that's done. There's another model I would like to remove, this 64K one, because it's something I want to show you. So I remove it, and it's gone now. I clear the screen and run ollama list, and now I have this single model, Qwen 3.5 9 billion, and that runs on my system. For example, I say ollama run with Qwen 3.5 9 billion, and this will run on my system. You can see that for 1 + 1 it's thinking, and it's giving me the response. Okay, so this model works. I can now close this with Ctrl+D to say bye, and then once again let me see the list of models that I have: Qwen 3.5 9 billion. Now I want to start Claude Code with this model. If I go back to the guidelines, we can see how to start it with a model: I'm going to say ollama launch claude and then the name of the model.
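The pull, list, remove, and run steps above can be collected into one small Python sketch that only builds and prints the commands, so nothing is downloaded or deleted. The `pull`, `list`, `rm`, and `run` subcommands are standard Ollama CLI commands; the tag spellings `qwen3.5:9b` and `qwen3.5:latest` are assumptions, so copy the exact names from the model page and from your own `ollama list` output.

```python
import shlex

# Assumed tag spelling; copy the exact name shown on the Ollama model page.
MODEL = "qwen3.5:9b"


def model_lifecycle(model: str, extra_to_remove: list[str]) -> list[list[str]]:
    """Build the command sequence from the walkthrough, in order."""
    steps = [
        ["ollama", "pull", model],  # download the model
        ["ollama", "list"],         # confirm it is installed
    ]
    # Remove models you no longer need (names here are placeholders).
    steps += [["ollama", "rm", name] for name in extra_to_remove]
    steps.append(["ollama", "run", model])  # open an interactive chat to test it
    return steps


for step in model_lifecycle(MODEL, ["qwen3.5:latest"]):
    print(shlex.join(step))
```

To actually execute a step, each list could be passed to `subprocess.run`; printing first makes it easy to check the model names before anything is removed.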
The name of the model is Qwen 3.5 9 billion, so just run this. This is going to launch Claude Code, and you can see that we are inside this folder. There's a quick safety check: is this a project you created or one you trust? Yes, I trust the project. So I say yes and press Enter.
Basically, it's ready. You can see our models here with /model: custom Sonnet model, custom Opus model, custom Haiku model. There are three models plus a custom model, and all four default to Qwen 3.5 9 billion, which is really great. So now we are using our local model.
Hi, how are you? You can see that it took about 42 seconds. If I don't have the screen recorder on, it takes less time, but again, that's 42 seconds for this reply using the local LLM, Qwen 3.5 9 billion. Now there's an important thing I want to tell you. Go to the command line: ollama list gives the list of LLMs that you have, and ollama ps shows the models that are running right now. You can see in ollama ps that Qwen 3.5 9 billion is running, with a size of 9.8 GB, and that it's split 45% on my CPU and 55% on my GPU. Now I want to focus on the context length. For running Claude Code and the activities we do inside Claude Code, like working on a big project, this context is very small, even though we know Qwen 3.5 9 billion has a very good context length.
You can see it has a 256K context length, but when we use it on our system, it's just 16,384. There is a way to increase the context length, or use more of the context length, in Ollama itself. For that, you need to create a Modelfile. I've talked about this in this video; it's a very old video. The file is named Modelfile, with no extension. Inside the Modelfile, I need to write FROM, followed by Qwen 3.5 9 billion, the model we're talking about, and then we need to put in some parameters.
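To put those numbers side by side, here is a small Python sketch of the arithmetic. It assumes "K" means 1,024 tokens, so 256K is 262,144 and 64K is 65,536; the helper names are made up for illustration.

```python
def k_to_tokens(k: int) -> int:
    """Convert a context size given in 'K' to tokens, assuming 1K = 1024."""
    return k * 1024


NATIVE = k_to_tokens(256)   # context length advertised for the model
DEFAULT = 16384             # context length seen when running locally
TARGET = k_to_tokens(64)    # context length we want to configure


def share_of_native(tokens: int) -> float:
    """Fraction of the model's advertised context that a setting uses."""
    return tokens / NATIVE


print(f"default: {DEFAULT} tokens, {share_of_native(DEFAULT):.2%} of native")
print(f"target:  {TARGET} tokens, {share_of_native(TARGET):.2%} of native")
```

Even the larger 64K setting uses only a quarter of the advertised window, which leaves room to go higher if your memory allows it.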
If you're not sure what to do, go to this link; I already have it. At docs.ollama.com, on the Modelfile page, you can see that we start FROM a model, the particular model that you're trying to edit, and then PARAMETER num_ctx and its value.
So we don't need this part. And the num_ctx value we are going to increase to 65536.
Then make sure the FROM line has the same model name, Qwen 3.5 9 billion. And that's done. So this is our Modelfile.
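A Modelfile with those two lines could be generated like this. It is a minimal sketch: `FROM` and `PARAMETER num_ctx` are the Modelfile instructions named above, while the base tag `qwen3.5:9b` is an assumed spelling and the helper functions are hypothetical. The demo writes into a temporary directory so it does not touch your project.

```python
import tempfile
from pathlib import Path


def build_modelfile(base_model: str, num_ctx: int) -> str:
    """Return Modelfile text that reuses base_model with a larger context window."""
    return f"FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n"


def write_modelfile(directory: str, base_model: str, num_ctx: int) -> Path:
    """Write the text to a file named 'Modelfile' (no extension) in directory."""
    target = Path(directory) / "Modelfile"
    target.write_text(build_modelfile(base_model, num_ctx), encoding="utf-8")
    return target


# Demo in a throwaway directory; the base model tag is an assumption.
with tempfile.TemporaryDirectory() as tmp:
    path = write_modelfile(tmp, "qwen3.5:9b", 65536)
    print(path.name)
    print(path.read_text(encoding="utf-8"), end="")
```

Passing `"."` as the directory would place the Modelfile in the current project folder, which is where the walkthrough keeps it.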
Now, using this Modelfile, we can create another model. I'm going to stop this here, clear the screen, and run ollama create for a new model. The name of the model is up to me, but I'm going to use the same name, 9 billion, and then add 64K context. Then I say -f for the file and pass the Modelfile. So this is the syntax to create a new model, and you can see that it has created the new model. We can see that model in ollama list as well; it will be at the top. You can see this model: 64K context is the same model. Basically it has been copied, taking twice the space, and you can delete the older one if you need. But it has a 64K context now.
Now you can start it up: ollama launch claude --model, and then the 64K model. Now you have a good context length. Look at the model here: we're using the 64K model, and now you have a much bigger context to work with. Now let's go ahead and clone a repo.
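The create and launch steps can be summarized as two commands, sketched here in Python as argument lists that are printed rather than executed. `ollama create NAME -f Modelfile` is the standard create syntax, and `ollama launch claude --model` follows the usage shown in this walkthrough. The model name `qwen3.5-9b-64k` is a placeholder; the name is your choice.

```python
import shlex


def create_cmd(new_name: str, modelfile: str = "Modelfile") -> list[str]:
    """Command that builds a new local model from a Modelfile."""
    return ["ollama", "create", new_name, "-f", modelfile]


def launch_cmd(model: str) -> list[str]:
    """Command that starts Claude Code backed by the given Ollama model."""
    return ["ollama", "launch", "claude", "--model", model]


# Placeholder name for the 64K-context variant; pick any name you like.
NEW_MODEL = "qwen3.5-9b-64k"

for cmd in (create_cmd(NEW_MODEL), ["ollama", "list"], launch_cmd(NEW_MODEL)):
    print(shlex.join(cmd))
```

Running `ollama list` between the two steps is a quick way to confirm the new model exists before launching Claude Code with it.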
Let's go to my GitHub and try to clone this repo, for example this VOID model. I copy the link, and I'm going to use my voice here: "Hey Claude, can you clone this repo and tell me the important things that I need to know? Also tell me the system requirements that I need to run this repo on my local system. If it can't run locally, what can I do? I have RunPod available as a VPS; I can use that as well. But tell me how to do that." Okay, and I paste in the repo link. Let's see. We have increased the context length to help it here; otherwise, you know, it gets tripped up partway through.
You can see that it is asking to run git clone on the repo. Really good.
Do you want me to proceed? Yes, and don't ask again for git clone. This is going to make a new folder here: you can see the VOID model repo being cloned now, and we have all of its files. It's going to go ahead and not just clone it but also try to answer what I've asked. While this is running, if you go to the document that I showed you, docs.ollama.com, there are some recommended models that you can use. The model that we've used right now was Qwen 3.5.
You can use Qwen 3.5 cloud, which is the model running on Ollama's cloud. You need a paid subscription for this. Sometimes you get some credits here and there, but again, to use it with Claude Code, you need a paid Ollama subscription.
Otherwise, you can use GLM 4.7 Flash as well. You can download these models to your local system or get them running on a VPS like RunPod; you can go ahead and check the link. Ultimately, you need a model running on Ollama, and then you can do this. You can also do this ollama launch with Kimi K2 cloud, and then you can connect with Telegram as well. There are different things you can do. Go ahead and try; if not, I will certainly make another video.
I think the model is thinking hard, just reading everything and trying to come up with a good answer. You can see that it cogitated for about 10 or 11 minutes, but again, the output is very good. So let's go ahead and see it. What is VOID? VOID is a video object removal model built on CogVideo that removes objects from video along with their physical interactions. The component requirement is about 60 GB for training and 30 to 50 GB for inference. Your hardware: 16 GB of RAM is sufficient, but inference uses GPU RAM, not system RAM. And the GPU is not enough: we need about a 40 GB GPU or an A100.
Okay, your 8 GB GPU is not enough; the 5-billion-parameter video model alone requires 10 to 14 GB of VRAM. Step one would be to pip install the requirements, and then git clone. Then download the model, open up the Python notebook, and use it. Okay, cool. We can see that we are running this Qwen 3.5 9 billion model with a 64K context window on our local system in Claude Code, and you can see it's amazing. By the way, you need to be on at least a Pro plan to use this. That's one thing. But I hope you enjoyed this video. In the next video, I'm going to show you how to connect Claude Code with some free models, like the OpenRouter free models.
OpenRouter is, again, a huge place where you can get models, paid models and free models, and it's really amazing. You can check out this video, where we will try to set up Claude Code with OpenRouter.