This experiment proves that strategic reasoning optimization can effectively bridge the gap between small models and large-scale intelligence. It marks a significant shift from brute-force scaling to the precision engineering of thought processes.
Inmersión profunda
Prerrequisito
- No hay datos disponibles.
Próximos pasos
- No hay datos disponibles.
Inmersión profunda
WHAT? Qwen3.6-A3B Could Solve This?Añadido:
Hello community. So, great to do you back. Today we make a test. We do an experiment. We take the latest Qwen 3.6 a 35 billion mixture of expert model with only a tiny tiny active 3 billion model.
And we will construct here a little bit of a super intelligence around it and let's see what we can achieve with this tiny little Qwen 3.6. Now, you know I have a real complex test where I test the biggest model on this planet for their causal reasoning performance. This is here on YouTube here an own playlist and you see everything here from Meta's new Mu Spark to GLM 5.1 to Qwen 3.6 plus, Gamma 4, GPT 5.4. So, here you have all the data about the other models. But now what we want to achieve today, maybe we can have this locally.
And I know a 35B is quite a challenge, but look for the quantization on the face, we have more than 151 model quantization. So, I guess whatever is your particular computer hardware infrastructure from VRAM, we find a model that maybe we can try to test it here locally. So, let's go.
Now, you understand here we are here the whole week talking about here the core LLM and then the AI harnessing sphere.
And today I my experiment I will utilize this harnessing sphere. And in a particular way, so we have here our Qwen 3.6, our 3B, and I [snorts] will be the intelligence that will analyze here the pattern deep inside of a Qwen 3.6.
So, the challenge for today, our experiment is simple, you know. We will read here the reasoning traces if they are available for us. We will try to understand the reasoning pattern, the categorical pattern here, the matching of the pattern, and reading this in life, we will try to understand if our 3B makes a mistake, has some problems in understanding or assessing here the risk assessment for particular path routings, and then with an instruction tuning provide some help for the little 3B model, and we have one goal. We want that our 3B, our local model, will solve my complex reasoning test. This has never been done before. This is here once-in-a-lifetime opportunity, but I see in a Qwen 3.6.
Okay, it is a mixture of expert, it is a 35B, it is not a dense 3B model, but let's have a look if the technology is already there that we can use this particular model. It is a first-time experience, and you know exactly what we want. We want to understand the logic pattern of an AI. Whatever we have discussed this week or the last 2 weeks here on my channel, we will now use it, utilize the understanding of what is happening inside what other people call the black box of the AI, and we will try to actively steer now the reasoning behavior with some instruction we will provide here to our little tiny 3B model.
So, imagine we are now outside here. We are now the AI harness, and we with our super intelligence, we now try to guide here 3 billion model to come up with the correct real complex mathematical solution to my causal reasoning test. If we succeed in this, this would absolutely be fascinating.
So, what do you think? Let's do this experiment together.
And we are live here. We have a here our Qwen 3.6 35B active 3B beautiful here.
April 15, 2026. Let's have a look. Now, you know the bigger product, the Qwen 3.6 plus, we already tested it. Great.
But now let's have a look at this little open source here. Now, they tell us, "Hey, it is compatible with a Gamma 4 31 billion model." Now, this is interesting. Let's have a look if this is really true. Strong multimodal perception, everything is there.
Language is beautiful. Here we have it.
The last column you have our Qwen beautiful, and we will compare it. Oh, yeah. The very last before this, a Gamma 4 26 billion active 4B against an active 3B model. So, where is science? Where science? Wait a minute. Here we are.
STEM and reasoning, beautiful.
And now let's look at the last two columns here on the screen and you see sometimes, okay, maybe it is even outperforming a Gamma 4. But we do not believe any benchmarks, you know. We test it ourself.
So, if we scroll down a little bit here, we see, "Okay, vision language model, beautiful. API usage." Yeah, here we go.
As you see, what you can use it Wait a second. You can use it here for open claw. My goodness. Okay. So, they even provided here with open [clears throat] claw, and here on API it's called Qwen 3.6 flash.
So, let's go live testing.
We are here on the Alibaba platform. You know, we have done the 3.6 plus, the bigger product, 1 million token, beautiful. But now we're interested here in this local one. Yes, exactly. No, not the preview, but No, this is the 3.5. We are looking here for the 3.6.
35B active 3B. Okay, maximum token length 262.
65K is the maximum summary generalization length. We are not going to need this at all. So, beautiful.
Let's say let's go with this model. So, settings done. We are here live. We go expand other models and here we are with our mixture of expert model.
Now, to be fair, we activate the thinking mode, you know. I think everybody agrees with this. And I just input here my standard causal reasoning test. I do this now for a year and you have seen my YouTube playlist. So, here we are live. We have sinking, beautiful.
And if we click on this, we have here on the right side of the screen not really the reasoning trace, but some kind of a summarization. Look at this.
Simulating the button interactions, navigating toward the target floor, assessing the sequence of action taken so far, exploring the next logical move from floor 12 to the floor 15, navigating from floor 15 to the floor 50, beautiful, ascending steadily, navigating the sequence, evaluating the implication, assessing the optimal sequence to reach the floor 50 with this elevator test, tracing here the path from floor 35 to the floor 50.
And you see it is working. Oh, it found out that there's a second optimization hidden in the text.
Careful rule application. Yes, there's a little bit of a hiccup here in the rules. It's not as easy as you might assume. Proceeding to the next phase here. Optimize now energy and the token use. Yes, there's three optimization that are nested within.
Okay, how to land exactly at floor 50.
I'm assessing each step of the journey.
Beautiful.
Evaluating the conditional impact of holding the green code card.
I'm not at floor 48, floor 47. Logical transition. What is open to me? I assess the risk. This is now interesting.
Reassessing the optimal route through the sequence, avoiding the traps and the restrictions.
Nice. Almost like a strategy, you know.
Evaluating the constraint given, seeking [snorts] the red code card. Yes, beautiful. Evaluating the goal attainment through this path sequence.
Button presses and their effects. Okay.
Strategic moves. What is possible?
Checking for violation in the sequence of the button presses. Reach floor 16.
Ah, so we are back.
Okay. Reassessing the path to reach the target. Navigating to Here we are.
Process says to what emergency exit at floor 29.
This is here one of the shortcuts that would be great if you would be able to detect it. Considering this, I realized that no available action allows me to stop. So, the opportunity to activate the exit is likely lost.
This is an incorrect statement.
And it moves on. Floor 18. Evaluating the current floor 25.
Okay, backtracking.
Go back to floor 24. Explore possible paths. So, we do have heavy exploration, beautiful. Exploring path three to reach the floor 29. Okay, this looks nice.
Tracing the sequence of move. Yes.
Evaluating here here an ABC cap. Okay.
Strategic precision. Precision, of course.
Navigate to floor 50 with precision again.
Following the interaction between the floor transactions.
Yeah.
Exit.
Tracing the sequence with corrected logic.
Not really a lot of to deduct even from this summarization here of the reasoning trace, and we have a first result. It was live. You see, this is the real time that we have here. Okay, we are on the Alibaba platform, but of course, this is their model, so here we go.
Now, you know the best solution is about eight 10-step solution is great.
And now we have here an 18 press solution. I mean, of course, a 3 billion model, what we expected. Exactly this.
At least it found a valid solution.
18 button presses. So, this is This is what we expected. Yes, of course it is an active 3B model, no?
But let's look at this. It passed here all the resources, the code collect trip trap constraints that were awarded.
Beautiful. So, great.
Now I ask, what other routes have you taken into consideration? Maybe one of those routes would have been worth exploring further. So, you see we are not at this exploitation. This was too too soon. Let's go and explore again your strategy because it found I could reading here the reasoning trace. I could clearly see there was a shortcut that was worth exploring further.
So, maintaining the strategy to make the token requirements. Beautiful. Verifying each step against the established constraints. Yes, evaluating the strategic pathways under the token economy. And we have an answer.
Beautiful. So, let's have a look.
What other alternative routing did this little 3B come up with?
Finished. Great. So, let's start at the beginning. Wait a second. Route two, route one. Where's the beginning? Here.
Three distinct alternatives to the floor 50.
Around different strategic levels. Okay.
So, route one.
16. Oh, we already down from 18.
Interesting.
So, energy stacking is not the optimal optimization procedure to start with.
And you see we immediately run into a problem. We have here a road red collides with the mandatory mirror activation at 33. So, this is not working. This is not recommended.
Route two, 15 button presses. Hey, even better. But verdict traffic failure here to the floor 22 trap. I know. And now this is it. Route three, 29 emergency exit. This would be it. Look, 12 button presses. Between you and me, this is part of the solution. So, this is the path to further examine. We have the ABC sequence integrated. Beautiful.
So, you see? Oh, we have Oh, holds that at nine. Floor 25 completely blocks the standard. Risk is medium.
Interesting. So, it gives me here the constraint route one, two, and three.
Risk assessment, which was worth exploring further.
Route two was the most interesting. No, this is incorrect. If you want to push past the optimal solution, No, this is not the way to go.
But of course, we want something else.
We want to optimize the solution of our little 3B, no?
So, I say, "Hey, with this risk insight, perform your next step to further optimize your result and reduce the number of button presses further."
And now you see our little 3B goes, tries out here some particular paths. A little bit of an instruction falling from my side, but I think it was a real very soft indication only without giving away here which particular route to go.
Beautiful. So, leverage here. Immediately it found here the emergency exit. This would be it, you know? By passing the tedious navigation from the floor 30 to 49 entirely.
However, it has a high cost and it requires some code cards. Therefore, the optimal strategy is acquire blue, execute ABC to get red, navigate to floor 50, navigate [snorts] to the floor 29, and then invoke the emergency exit.
And this is just 12 button presses. It immediately found the right segmentation strategy to solve this puzzle.
This is absolutely amazing.
I've never seen this with a 3B either dense or mixture of expert model.
Because it got it now exactly right here on the second try.
Gives me here the verification. So, eight button presses plus one indication of the emergency exit. This is an excellent result. Even for the proprietary model, this is excellent.
We moved from 18 button presses to eight button presses.
And this is the correct solution. So, absolutely beautiful.
Final deliverables. Yes, we have an exact sequence that is now here from the floor 29 to the floor 50. We have AB AB ABCB. We have a symmetry. What a coincidence if I build a puzzle that it has a symmetric axis.
State summary. Everything is fine.
Everything is according to plan.
This route is superior because it utilizes the emergency exit, which acts as a direct teleport to the goal, eliminating the need to navigate the complex upper floors.
Exactly.
Isn't this beautiful? And now I say, "Do your validation run."
Okay, validate the result. Show me step by step that all rules and all the constraint are respected.
So, we have here another look here at this marketing summarization here.
Advancing through the sequence, proceeding here with the button presses and the state updates. Great.
Reach the floor 50 with all the conditions satisfied. Validate the sequence against this. And we have an answer.
So, beautiful.
Let's just finish here. And then we go back and we have a look here. But the conclusion, as you can see, this is mechanically sound, mathematically precise. It violates zero constraint. It achieves the goal in nine total actions.
It finishes within everything. It is ready to deploy.
What a beautiful solution for a 3B model. So, you see you can use a local 3B model if you understand the reasoning traces, if you are a subscriber of this channel. So, you understand immediately what is happening inside of the model. If you can read a little bit of the reasoning traces.
And then, yes, you have to be lucky that the system really has the performance level. But I have to tell you, final state verification, a QN 3.6, absolutely impressive performance.
Passes everything as you can see.
Beautiful. Handles all the constraint are building.
Rule by rule compliance summary.
Everything is green. Everything is according to the rule. Beautiful.
Emergency cost is within limit and we have a valid sequence. And since I do this now for a year, I can tell you this is correct.
Now I want to analyze here the strategy.
So, I ask my little 3B, "What was helpful? What elements would you like to remember? What is What hints were important for you?" So, let's imagine you would have here a memory module and you say, "What would you learn here from your strategic discovery here, you know?"
So, let's have a look what it comes up with. Let's just get a feeling of the reasoning traces that are happening, you know?
Structured analysis of the winning strategy broken down into what drove success, what cognition layers was bypassed. Yeah. Whenever you have a linear climb, I reverse engineer from the goal.
Beautiful.
So, you immediately understand where you have to start.
What was most helpful here? Apparently driven movement tracking, of course.
A code synergy over the collection. Yes, not all the code cards were necessary.
There was some redundant.
Then, third. Yeah, you don't [clears throat] touch any trap. Beautiful. You just ignore them.
And consecutive move windows. Sometimes I have a sequence like ABC that opens up new possibilities. Analyze them. Go in a higher complexity level.
A mathematical buffer, of course.
Yes, you have to calculate here if you encounter here some problems.
Prerequisite chains backward from the goal state. Flag the mirror triggers.
And analyze the shortcut. Critical hints. Now it gets interesting.
Forced shortcut reliance. Okay. Set a hard floor. Yes.
Collect whatever is necessary. So, absolutely.
Floor 33 forces the mirror on. Okay.
An emergency exit at 29 requires the red and the green code card. Beautiful. This is the linchpin.
Identify There's always, if I write a puzzle, a simple solution, you know?
Why this outcome performs here over the alternative.
Resource efficiency, predictability.
And the risk profile. Look, suddenly we have a risk profile of near zero for our shortcut. And green check mark, it fits here all the requirements.
Work backward from the winning condition. Map the state transition before pressing.
Then treat the constraint as filters, not as obstacles. Prefer determinism over the pure probability distribution, which is absolutely beautiful.
And finally, number five. I think the most important thing, validate every single step.
Multi-validate every single step. This is here the recipe for success. And it even comes up here and it is, "Hey, you want to continue? I can go on." What a beautiful little model.
Videos Relacionados
OpenHuman VS Hermes AI: Who Wins?
JulianGoldieSEO
285 views•2026-05-29
Long-Running Agents — Build an Agent That Never Forgets with Google ADK
suryakunju
142 views•2026-05-30
This computer is made from real human brain cells. And you can buy it.
Talktmsmedia
3K views•2026-05-28
BREAKING: Microsoft’s New Image Generating Model Beat Out GPT 1.5 and Nano Banana 2
aimmediahouse
122 views•2026-06-03
I Made the Same Anime Fight Scene in Every AI Video Generator
NobleGooseAnime
295 views•2026-05-30
Nvidia Bets Big On AI PCs | New Chip To Power Windows Laptops | Technology | AI Updates | N18S
cnnnews18
3K views•2026-06-01
I Tested NEW Opus 4.8 on Four Projects (Updated LLM Leaderboard)
AICodingDaily
298 views•2026-05-29
3D Platformer Update - NO CAPES
SolarLune
294 views•2026-05-30











