The video highlights the massive gap between architectural complexity and practical utility, where a 76GB VRAM requirement still fails to guarantee functional code. It is a sobering reminder that "agentic" capabilities often remain trapped behind prohibitive hardware barriers and inconsistent outputs.
Deep Dive
Prerequisite Knowledge
- No data available.
Where to go next
- No data available.
Deep Dive
Poolside Laguna XS.2: New Open Weight Coding Model Tested Locally with vLLMAdded:
We have a new player in the AI market, Poolside, who released two brand new spanking models yesterday. One is Laguna M.1 at 2.25 billion parameters, and the other one, Laguna X S.2, which we are going to install in this video at 33 billion total with only 3 billion activated per token.
This new model, X S.2, is the open weights release under Apache 2 free to use and modify commercially. This is a mixture of expert model built specifically for agentic coding.
Meaning, it's designed to write code, run tools, think between steps, and work on long horizon tasks autonomously.
The scoring on the benchmark is quite good, but, you know what? As is the spirit of this channel, we are not going to go on these benchmarks. We are going to install it locally, and then we will test it out. This is Fahad Mirza, and I welcome you to the channel. Please like the video and subscribe, and consider follow me on X if you're looking for AI updates.
Now, the thing is that Poolside, like most of these commercial companies, is pushing their own coding tooling. They have Pool, their terminal coding agent or Harnessed and Shimmer, which is their cloud sandbox, but great for their ecosystem. But, we are doing this with vLLM just to stay open source, not to get stuck with any sort of proprietary or any, um, you know, particular ecosystem product. The catch is that Laguna X S.2 support hasn't landed in official vLLM release yet. If you don't know what vLLM is, how to get it installed, I would highly, highly recommend go to my channel, search with vLLM, and watch this second video. Now, the bad thing, or I would say the really, really annoying thing is that I started doing their vLLM install, and I'll quickly show you.
So, this is where I did the vLLM install. You can't do a, you know, proprietary pip install vLLM. You have to build it from scratch, and then you have to, um, you know, copy that one.
You have to pull a new PR, and then build it. So, that, in short, took me around 3 hours to do end-to-end. I was just sitting there pressing enter, waiting that when it will finish, and it finished, uh, in a long time, as you can see here. Anyway, it finished, and then I'm just going to serve that Poolside Laguna X S.2 on my local system with this context length on my, uh, GPU card, which I'm quickly going to show you now. This is my Ubuntu system, and I have this NVIDIA H100 with 80 GB of VRAM, and this is where I am going to serve this model.
The model is being served. While that happens, if you're looking to rent a GPU on very affordable price, you can find the link to Vast Compute in video's description with a discount coupon code of 50% for a range of GPUs.
Uh, to be fair to Poolside, the model is also available on Ollama if you're interested in using this X S.2 over there.
Now, the model architecture looks very, very standard, but quite good. It has 40 layers in total, 30 using sliding window attention of just 512 tokens and 10 with full global attention with per head sigmoid gating. KV cache is quantized to FP8 to reduce memory footprint. It has native reasoning support with interleave thinking between tool calls and a 128K context window, which you just saw in my vLLM command.
256 experts with one shared expert and was trained using the Mo optimizer, uh, from Moonshot, of course, with async off policy agent RL.
So, looks pretty good on the surface, but let's try it out to see how it goes.
Model is being eventually served, and it has taken it more than 1 hour to just go through this where it has compiled that torch with this vLLM and the model.
So, if you're looking to deploy it locally, make sure that you have 5 and 6 hours, plenty of patience, and then, uh, you should be able to run it locally. So, right now it is being served. Let me take you to my, um, VM where I have just installed this open web UI, and you can see that now this Laguna X, uh, S.2 is being served.
So, this is, uh, let me quickly show you the VRAM.
And this is what the VRAM consumption looks like at the moment, around 76 gig of VRAM.
Let's go to our open web UI, and let me give it a coding prompt first.
So, I'm going to ask it to build me a real-world application with Python Flask with web socket for live updates, a code editor with syntax highlighting. So, basically, it's a real-time collaborative code editor with SQLite as backend, and it should give me everything as a single setup.sh file.
It has thought very, very briefly, and has gone right into the code, as you can see here.
And then it has given me some of the directions.
Let's wait for it to finish, and then we will see what exactly it has built. It has already built everything, very interesting. Okay, let me check it out.
I have just increased the context length of this model in the settings.
Uh, you can access it on the top right.
And, hopefully, this is going to produce the full code now.
It already has started.
And it has done. So, let me open it here, and then we will test it out. It was pretty quick.
Or maybe let it Let me open it in the browser. That will be easier to test out.
So, this is what it looks like in the browser. So, I'm just going to paste something here, and we will just add a comment here.
This is, uh, initial commit, something like that.
Add the comment.
And I'll just say I don't know why it is saying line number. I'll just say two.
Didn't add anything.
And it's not adding the comment here.
We'll say, "Okay, new review." New review also doesn't do anything.
So, I'll just add something here.
This is my comment.
There is no save button or anything here. Let's add here.
So, this is not working, as you can see.
Um, and we have tested this same prompt with a lot of models like Deep Seek V4 with Xiaomi in yesterday.
Like in these videos where we did the comparison, and most of the models were at least able to get a working application. Some of them really, really threw it out of the park. So, anyway, so this model has failed royally in this first coding test, but let's not make our impression with that just one test.
Let's do few more. I'm just going to open a new chat, and let's do few more.
Okay, let's do relatively simpler one.
I'm just going to ask it to build me this fully functional Kanban board in a self-contained HTML file, and then there should be some, uh, CSS, HTML, and that sort of stuff.
And if you're wondering, I'm using the same hyper parameter which they have mentioned in their model card, like top K is 20, and temperature is 0.7.
And the context window is increased now.
You can see it is building something.
Let's wait for it to finish.
And it has generated something. Let me download this.
And I'm just going to open it in front of you. It is opened. Let's add a card.
I'll just say, "My card one."
And testing.
Let's add the card. That is good.
Can I put it in progress? Yes, I can.
And then I can also drag it here. And the counter is increasing. This is pretty good. Let's add card here.
Progress.
Test. Add card. This is good.
Let's add one more.
First.
Test and this.
Let's add it like this. Yep. Can I just add like this?
That's good.
There you go.
So, can I move it directly here? Yes, I can.
And move it back here. So, this simple Kanban board, it is working fine. Also, you know, we can move things around.
So, I believe that, you know, don't do very high-fi coding test as we did in the first one, but this one looks pretty good. By the way, this model is more geared towards software engineering agentic coding, and that is where it is been optimized for. So, coding is, I guess, the main use case for this model.
And now, let's see if it can do some canvas rendering and build a simulation.
So, I'm asking it to build me a sand simulation in a single self-contained HTML file, where there will be, uh, all these material, water, fire, stone, and then it needs to, uh, you know, make sure that it is responsive when I click and drag on the canvas canvas, and then it should be running in a real time.
No external libraries. Let's see.
And it has created something. Let me download it and run it in front of you.
There you go. So, >> [clears throat] >> sorry.
I'm just clicking on sand.
Clicking here.
Let's click. Nothing happens. water It says, "Okay, drag and drop." Let's drag and drop. Nothing is happening.
And I'm just dragging dropping. I'm clicking on stone.
Nope.
You know, fire should spread, water should flow. But nothing is happening.
Let me reload it because Okay, now it works. So, this is sand.
You see?
Falls and piles up. This is good. It is See, it piles up.
I'm just piling it up. It is could be improved, but you are not not bad at all. You know what? I'm not being very fussy.
This works.
Now, let's see if it This is water.
Nope. Water doesn't work.
Fire doesn't work.
Stone, something happens, but really doesn't work. I'll just reload it.
It just reloads it with sand, and that is why it works. Once I click anywhere, it doesn't work.
Like you see in the stone, it doesn't work.
So, but I asked it to do all of it. But sand is working, which is our primary. I mean, we can be just good with the sand anyway.
So, look. This is the Laguna XS 0.2, their first model.
So, let's cut them some slack.
Hopefully, it will be much much better in the next iteration.
And but share your thoughts. What do you think about it? And please follow me on X if you're looking for AI updates, and consider becoming a member.
Thank you for all the support.
Related Videos
OpenHuman VS Hermes AI: Who Wins?
JulianGoldieSEO
285 viewsβ’2026-05-29
Long-Running Agents β Build an Agent That Never Forgets with Google ADK
suryakunju
142 viewsβ’2026-05-30
5 Mind Blowing Omni Uses Cases
PaulJLipsky
1K viewsβ’2026-06-02
This computer is made from real human brain cells. And you can buy it.
Talktmsmedia
3K viewsβ’2026-05-28
BREAKING: Microsoftβs New Image Generating Model Beat Out GPT 1.5 and Nano Banana 2
aimmediahouse
122 viewsβ’2026-06-03
I Made the Same Anime Fight Scene in Every AI Video Generator
NobleGooseAnime
295 viewsβ’2026-05-30
Nvidia Bets Big On AI PCs | New Chip To Power Windows Laptops | Technology | AI Updates | N18S
cnnnews18
3K viewsβ’2026-06-01
I Tested NEW Opus 4.8 on Four Projects (Updated LLM Leaderboard)
AICodingDaily
298 viewsβ’2026-05-29











