This video evaluates Qwable 3.6 27B, a fine-tuned Qwen model trained on Fable 5 reasoning data, demonstrating that while the model can generate code for basic applications like a 2D driving game and sand physics simulator, it struggles with complex tasks requiring algorithmic implementation (dungeon crawler) and efficient reasoning (HumanEval), ultimately performing worse than both the base Qwen model and Qwopus fine-tunes in the tested scenarios.
Deep Dive
Qwable 3.6 27B tested - 16GB Local LLM setup
Hey, welcome to Luke's DevLab. My name's Luke. I've been building software and working as an engineer for 8 years, and I'm here to help you cut through the hype and find out what AI can actually do for you. And in this video, we're going to be looking at Qwable. If you've been watching, you might be familiar with Qwopus, which is the Qwen model fine-tuned on data such as reasoning from Opus.
Well, as you can guess, this is going to be the Qwen model fine-tuned on data from Fable 5. So, this is the specific Qwable we're going to be using. As it says here, it's fine-tuned from Unsloth's Qwen 3.6 27B on a cleaned Fable 5 style reasoning and instruction dataset. And we're going to be using the 4-bit quant because that's the only option here, and that's GGUF format, so we can use that with llama.cpp.
So, in this video, we're going to be running through the same tests as we did in the previous video for the Qwopus 27B coder, so we can get a nice comparison.
And those tests would be the 2D driving game, the sand physics game, and the dungeon crawler. Now, this is a new one, which involves a lot of usage of algorithms without using any frameworks. And then, at the end, I'm hoping we can do the OpenAI HumanEval. I've done some initial runs on this with this model, but the model seems to spend all its thinking budget and not give the correct output. So, I'm working on it, and I'm hoping I can deliver that to you at the end.
And then, just a real quick look at the system specs I'm running. So, this is a separate system where the models are running inference, and this has got 16 gigs of VRAM and 32 gigs of DDR4, which we will need to offload to because the 27B cannot fit entirely in my 16 gigs of VRAM.
So, let's kick things off with the driving game. So, got the prompt here.
Feel free to pause the video if you want to have a quick read through it.
Otherwise, it will be linked down below at the GitHub in the description. And also, at that link, you'll be able to find model configs, including the one used in this video. So, let's give this over. And the tool I'm using to code in here is VS Code, and then we're using the Cline extension, which allows me to connect my llama.cpp model into it. So, we'll check in along the way if there's anything interesting to see. Otherwise, hopefully, we can check in at the end on the finished version.
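As a rough illustration of why a 4-bit 27B model does not sit comfortably in 16 gigs of VRAM, here is a back-of-the-envelope sketch. The 4.5 bits per weight figure is an assumption (it is what a simple 4-bit block format with per-block scales works out to; the video only says "4-bit"), the 4 GiB overhead for KV cache and buffers is a placeholder and not a measured value, and the function names are made up for this example.

```python
GIB = 2**30

def weights_gib(params_billion, bits_per_weight):
    """Approximate size of the quantized weights in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB

def spill_to_system_ram_gib(weights, overhead, vram):
    """How much has to live outside VRAM (0.0 if everything fits)."""
    return max(0.0, weights + overhead - vram)

# Assumption: about 4.5 bits per weight for the 4-bit quant.
w = weights_gib(27, 4.5)

# Assumption: 4 GiB of KV cache and compute buffers; the real number
# depends on the architecture and on the context length in use.
spill = spill_to_system_ram_gib(w, overhead=4.0, vram=16.0)

print(f"weights ~{w:.1f} GiB, spill to system RAM ~{spill:.1f} GiB")
```

Under these assumptions the weights alone take most of the card, so anything on top of them (context, buffers, whatever else is using the GPU) pushes part of the model out to system RAM, which is the offloading mentioned above.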
Right, it's been about an hour now on the driving game, and it's been a journey. So, we're going to run through how it's gone, and then I'll show you the finished result, because I feel like an hour is enough time for a model to have something working here.
So, we gave the initial prompt. It listed out all the things required. It understood the prompt. It made the checklist, and it started coding away. Then it reached this point of trying to verify the files, and it kept coming back with this huge context that was just not possible to fit in our 64K. This request alone was trying to come back with a 100K context. So, I looked at the game, and it had an error in the console. I tried to stop it doing this and just give it the error, so I could be like, "Hey, stop doing that. Just fix this error." I tried that multiple times, and it just kept repeating in this loop. So, essentially, I had to kill the task and create a new instance, and then I could start with, "Okay, fix this error. This is the problem."
It said it fixed it, and then we had a new error, and it started working away again. Then I said, "Okay, it works. The only problem is the background is blank. I can see obstacles, but there's no background to see." So, it went away working again. It kind of fell over trying to do some stuff here. It came back saying problem solved, but it wasn't. It was the same. And then it kept trying to load the server, check that it was running, do things that I feel like it can't do. So, I basically just told it, "Don't worry about the server. It is running. There are no console errors. You just need to fix the issue in the code and find out why the background is black."
So, after I instructed it to do that, it went and made changes, and now, instead of just a black background, it was a black background with labels on it that said things like trees and meadows. So, I told it that, and it worked away at it again, and now we have a background. But again, it's been an hour. I feel like it's been enough time.
So, this is what it's built. So, if you want to finish the race, there you go, done. Well, not a race, but if you want to reach the finish, there you go.
Press R.
Um so, let's drive around and see what it's done. So, these are the labels that I saw earlier. It was just these labels.
Nothing else.
Here's apparently the finish line.
Obviously, it's not.
And if we crash into one of these obstacles, it says resetting to start and you can't do anything. You can't press R to restart the game. That's it. You've got to reload the page.
So, yeah, it's not great. It's definitely worse than base Qwen. It's done a worse job than Qwopus, in my opinion, but not the Qwopus coder. That also did a terrible job, but the Qwopus 27B did a decent job.
And well, I guess if we want to give it credit for something, it's one of the only models or fine-tunes that's oriented the car in the right direction, so there you go, there's something. So, let's move on.
So, we're on to test number two, sand physics. As usual, pause the video if you'd like to read this or you can find it in the GitHub link down below. Let's give this prompt over and see how it does. Okay, it's just finishing up, giving us the summary now for the sand physics simulator. So, let's give it a second to see if it's going to run it for us or if it's going to give instructions on how to run it.
Although, there is a clickable link here to open the index file.
Okay, there it goes. It's just opened it for us. So, let's zoom in on that.
There we go.
Right, so visually different from previous tests. Normally, we have a box here to the right with the materials in.
It's also giving us some little icons as well.
I don't remember that being in the previous test, but let's see if it works. So, sand, yep.
That looks good.
Water, yep, that's good. So, in the last video, the Coder model broke when you switched material.
Although, it's looking a bit buggy there with the way the sand and the water are interacting with each other.
Let's build a wall. Huh, interesting. I can't build a wall through materials that already exist.
That was not present on the previous testing of this.
I wouldn't necessarily say that's wrong, though. I mean, it does kind of make sense that you can only build on top of what already exists, so yeah, that's okay. I'll not penalize it for that.
So, yeah, filling up and overflowing works.
Let's try the acid.
I'm not sure that's working.
It doesn't appear to... no, I am mistaken.
It is degrading. Okay, it's just not instant. It kind of takes time and burns through slowly, so yeah, that is quite interesting. So, if we fill this up with acid, we should be able to see it more clearly there.
So, there we go. Yeah, you can see it steps through, slowly going through.
Okay.
That's kind of satisfying to watch.
And we have an eraser.
Okay. Yep. Works quite nicely.
Um let's try brush size. So, bigger wall, yep.
More water. No, sorry.
More water. Yep.
Okay.
And then finally, clear.
All right.
Yeah, not too bad there. So, obviously the sand and water interaction was a little bit glitchy.
But other than that, I think it did quite a good job. And the way that acid behaves was quite interesting in this one, where it slowly degraded the other materials over time, whereas with the other models it was almost instantaneous. I wouldn't necessarily say it's right or wrong how it was done.
It's just interesting that it was different here. Also, the walls could not be placed through materials.
So, yeah, interesting. So, let's have a look at the code. I mean, there's nothing to see. I prompted it, it built the index file, and then job done. So, yeah, it was pretty painless. Just one shot. So, it did a good job on this task. So, let's move on to the next one.
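To make the acid behavior described above more concrete, here is a minimal sketch of one way to get "burns through slowly" instead of "instant" in a falling-sand grid: each tick, an acid cell only dissolves what is below it with some probability. This is not the code the model generated (that was not shown), and the cell symbols, the `step_acid` name and the `dissolve_chance` parameter are all made up for illustration.

```python
import random

EMPTY, SAND, WALL, ACID = ".", "s", "#", "a"

def step_acid(grid, dissolve_chance, rng):
    """Advance acid cells by one tick and return a new grid.

    Each acid cell looks at the cell directly below it:
      - empty: the acid falls into it
      - sand or wall: with probability dissolve_chance the acid and the
        material are both consumed; otherwise nothing happens this tick
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    # Scan bottom-up so an acid cell moves at most one row per tick.
    for r in range(rows - 2, -1, -1):
        for c in range(cols):
            if new[r][c] != ACID:
                continue
            below = new[r + 1][c]
            if below == EMPTY:
                new[r][c], new[r + 1][c] = EMPTY, ACID
            elif below in (SAND, WALL) and rng.random() < dissolve_chance:
                new[r][c], new[r + 1][c] = EMPTY, EMPTY
    return new

# Hypothetical usage: a column of acid sitting on a wall.
rng = random.Random(0)
grid = [[ACID], [ACID], [WALL], [WALL]]
for _ in range(20):
    grid = step_acid(grid, dissolve_chance=0.1, rng=rng)
```

With `dissolve_chance=1.0` this behaves like the near-instant acid from the other models, and lower values give the slow step-by-step burn seen in this test.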
So, we are on to the final browser-based coding challenge, and that is our dungeon crawler type game. So, this is an interesting one because the model has to implement ray casting and procedural generation in this test, and we give it some options of algorithms that it can use. So, as usual, feel free to pause if you'd like to read this, or you can find it in the GitHub and read it yourself. Otherwise, let's kick this off, and it'll be interesting to see how it does.
Okay, looks like our dungeon crawler is in the final stages of summarizing what it's done. All right, it's finished summarizing. So, usual behavior is it's going to open it for us. Otherwise, there is a clickable link, so I'll just give it a second. It is still running. So, let's see. Okay, looks like it's finished. All right, so let's just click this to open up. That doesn't work. Okay, one moment. Okay, here it is. Loading.
Can't open DevTools. Hmm, I can that way. It's not loading. DevTools is not even loading, so okay. Clearly, there's a problem here. We'll be back. All right, it's been almost an hour, and I'm going to call it on this one because we are getting nowhere, and this is seriously lagging my system. So, let's start from the beginning. We asked it to create the dungeon crawler. It went through the prompt, started working away. It keeps opening the browser.
I haven't connected MCP at this point, so it won't be able to do anything once the browser's open. I probably should do that because it would create more of an agentic workflow, but I don't think it's too important in these tests that I'm running. Anyway, we'll continue. So, it opens the browser, it tries to verify things, and says it's done. So, I open it, and it just loads forever, as we saw. Even DevTools wouldn't load, so I told it to fix it multiple times. I even got a pop-up in the code editor saying that an algorithm was running, and it had to be stopped for the sake of the system's performance. It keeps trying, and it says the server died. Yep, that's probably because it's just getting stuck in a loop. So, it keeps going, and then it tries to do a screen capture, it can't do that, and then I see it trying to load the browser and do things again, and I just kill it and say, "Look, I'm telling you, it's still not working." So, it carries on, and then it keeps loading the browser up, which is causing my system to lag so badly. So, I tell it, "Don't try and use the browser yourself. Tell me when you have a fix, I'll check it, and I'll let you know." So, it continues on, and then, after doing some more coding, what does it do? All verified, opening in the browser. So, it's not even following my previous instruction. So, I let it do what it wants. It opened the browser, said it was going to take a screenshot, and couldn't do that. So, I just told it, "Look, it's still not loading." And at this point, the browser is constantly opening and my system is lagging hard. So, we've got to the point now where we're done with this. It's just not made any progress since the initial prompt.
Over and over again, it's not following instructions. It's lagging my system, and there's nothing to see. And the last model, even the 27B coder in the previous video, which I considered not very good, managed this. So, yeah, we're going to call it on this one.
So, HumanEval. I did run this on this model last night because, as you can see, it took 5 and a half hours to run. As for the result, I'm kind of thinking it's not entirely the model's fault. I've seen this error a lot, where basically it's just reasoning too much and never giving us the answer. So, the score is 53% on here, and it's only passed 88. But I want to make sure that it is not a problem with the harness. So, I am going to run this again after some tweaking, but of course, it's going to take another 5 and a half hours. So, I'm going to see how it does on a second run of this, and hopefully I can bring you results that are maybe more true to what the model is capable of.
All right, I'm back. It took almost 6 hours to run and I'm getting a lot of no answers on here. And again, I think it's just burning up too much thinking. But there is a little bit of information to take from this. So, if we evaluate the questions it did answer, it scored 90%.
But again, didn't answer a lot. And this is running the same version of this test as the last model. So, yeah, I don't know. But I do know I probably am not going to tinker with this again and then wait another 6 hours. So, probably going to leave this test for now and continue refining it and testing it with more models and see how we get on with it.
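The two ways of scoring mentioned here (over every problem versus only over the problems that got an answer) are easy to mix up, so here is a small sketch of the difference. The `pass_rates` helper and the counts passed to it are hypothetical and only for illustration; the video does not report how many problems failed versus got no answer. The 164 is the size of the published HumanEval problem set, and using it as the denominator assumes the harness ran the full set.

```python
HUMANEVAL_PROBLEMS = 164  # size of the published HumanEval problem set

def pass_rates(passed, failed, no_answer):
    """Return (overall, answered_only) pass rates as fractions.

    overall       counts a missing answer as a failure
    answered_only ignores problems where no answer was produced
    """
    total = passed + failed + no_answer
    answered = passed + failed
    overall = passed / total if total else 0.0
    answered_only = passed / answered if answered else 0.0
    return overall, answered_only

# First run from the video: 88 passes, assumed to be out of all 164 problems.
first_run_overall = 88 / HUMANEVAL_PROBLEMS

# Made-up counts, only to show how far the two numbers can drift apart
# when a model often runs out of thinking budget.
overall, answered_only = pass_rates(passed=9, failed=1, no_answer=10)
```

A model that frequently produces no answer can look strong on the answered-only number while still scoring poorly overall, which matches the pattern described in the second run.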
So, that brings us to the end of this video on Qwable. So, what did you think?
Personally, I'm just going to say I wouldn't bother with this one. It didn't do great on the tests. It did not adhere to my prompt. Sometimes, it went around in circles a bunch. Yeah, I don't know.
I wouldn't bother with this one. I found Qwopus better than this. And with the 27B, I even found the base model better than that so far. So, yeah, that's it for this one. Coming up next, I'm going to do Gemma 12B, which was also trained on Fable as well as Composer from Cursor. I'm keen to see how that little 12B does. So, thanks for watching this video. Give it a like if you can. Always helps the channel. Get subscribed if you want to see more and I'll see you in the next one.
Bye.