This video offers a practical reality check, showing that architecture matters as much as raw parameter count when choosing a local model for coding. It illustrates why a smaller dense model can be the better coding choice than a larger mixture-of-experts (MoE) model in local deployment, even though the MoE model runs faster.
Deep Dive
Testing Qwen 3.6 Locally on Easy, Medium, and Hard Coding Tasks
A few months ago, I tested the Qwen 3.5 model and I really liked it. Recently, a new version of this model, Qwen 3.6, was released, and today I decided to see whether I can use it as one of the main local models in my workflow. At first, I thought I would simply choose the model with the highest number of parameters, which would be the 35 billion parameter version.
But after taking a closer look, it turned out that these models differ not only in parameter count but also in architecture, so choosing the model size is not as straightforward as it may seem at first glance. The 35 billion parameter model uses a mixture-of-experts (MoE) architecture. This means that instead of one monolithic neural network, it uses a collection of specialized subnetworks, so-called experts, and only some of them are activated for each token. The 27 billion parameter model, on the other hand, uses the more traditional architecture everyone is familiar with, called dense.
In the dense setup, the entire model is always active and fully used to generate every token. I'm not going to dive into theoretical debates about which architecture is better. What interests me right now is the practical side of things, and from that perspective, I discovered two surprising things. The first is that the larger model, the 35 billion version, runs almost three times faster on my machine than the smaller 27 billion model. The second surprise was that, according to benchmark results published by the creators of these models, the smaller model performed better on software development tasks.
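To make the dense-versus-MoE distinction concrete, here is a toy Python sketch of top-k expert routing. It is not how Qwen 3.6 is implemented: the number of experts, the value of k, the router scores, and the scalar "experts" are all hypothetical simplifications. It only shows that a dense feed-forward layer (which can be written as the sum of all its hidden-unit blocks) uses every block for every token, while an MoE layer executes only the k experts its router selects.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Toy "expert": maps a scalar token feature to a scalar output. Real experts are
# feed-forward sub-networks acting on vectors; scalars keep the sketch readable.
Expert = Callable[[float], float]

def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dense_layer(x: float, blocks: Sequence[Expert]) -> Tuple[float, int]:
    """Dense: the layer output is the sum over all hidden-unit blocks,
    so every block (every parameter) is used for every token."""
    return sum(f(x) for f in blocks), len(blocks)

def moe_layer(x: float, experts: Sequence[Expert],
              router_logits: Sequence[float], k: int) -> Tuple[float, int]:
    """MoE: the router scores all experts, but only the top-k are executed and
    their outputs are mixed with softmax weights renormalised over those k."""
    top = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])
    return sum(w * experts[i](x) for w, i in zip(weights, top)), k

if __name__ == "__main__":
    # Eight hypothetical experts that simply scale their input by 1..8.
    experts = [lambda x, s=s: s * x for s in range(1, 9)]
    # Hypothetical router scores for a single token.
    logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
    print(dense_layer(1.0, experts))             # (36.0, 8): all 8 blocks run
    print(moe_layer(1.0, experts, logits, k=2))  # only 2 experts run
```

With k=2 of 8 experts, each token touches only a quarter of the expert weights in this toy layer (attention and other shared layers still run for every token in a real model). That lower per-token work is the intuition behind a larger MoE model generating tokens faster than a smaller dense one.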
Strong coding performance is exactly what I need, so going forward I'll be using the Qwen 3.6 27 billion parameter model with the dense architecture. Just to remind you, the difference between these models is not only the number of parameters but also their architecture, which explains these results. In particular, the MoE model activates only some of its experts for each token, so it does less computation per token, which is why it runs faster even though it is larger.
By the way, before I move on to experimenting with the model, there is one important point I want to discuss in more detail. It concerns local models in general. For me, local models are not a complete replacement for paid models. I see them as a complement to paid models for situations where I need to handle less complex tasks or when I don't want my data leaving my computer. So for me, the question is not whether Qwen can replace something like Opus or Gemini. The real question is this: how complex a task can I give Qwen 3.6 and still be confident that it will handle it successfully? To understand how well this model fits my needs, I prepared several test tasks with different levels of difficulty, starting from very simple ones that the model should handle easily, all the way up to fairly challenging tasks where we can see the real limits of this model.
By the way, my main computer right now is a MacBook, and it is not the best option for running local LLMs. So, I'll be running Qwen 3.6 on my desktop PC and accessing it over the local network from the MacBook. On the screen, you can see the specs of that desktop machine. If you also decide to run Qwen 3.6 yourself, the main thing to pay attention to is having a graphics card with as much video RAM as possible. That is the most important factor when running LLMs locally. So, I'm opening the Zed editor. That is where I'll be running today's experiments. I already have it configured to connect to the desktop PC where the model is running.
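Why video RAM is the deciding factor can be seen with a back-of-the-envelope estimate: the weights alone need roughly parameters × bits per weight / 8 bytes, and the KV cache for the context window comes on top of that. The sketch below assumes 4-bit quantization and a hypothetical layer and head configuration; the video does not state which quantization or runtime settings were used, so the numbers are illustrative only, and real runtimes add some overhead.

```python
def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """Approximate KV-cache size: keys and values (the factor 2) for every layer,
    KV head and token, each stored at bytes_per_value (2 bytes for fp16)."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value / 1e9

if __name__ == "__main__":
    # 27B parameters quantized to 4 bits per weight (an assumption, not from the video).
    w = weights_gb(27e9, 4)
    # Hypothetical architecture numbers, chosen only to illustrate the formula.
    kv = kv_cache_gb(n_layers=64, n_kv_heads=4, head_dim=128, context_tokens=32_768)
    print(f"weights ~ {w:.1f} GB, KV cache ~ {kv:.1f} GB, total ~ {w + kv:.1f} GB")
```

Under these assumptions the weights take about 13.5 GB and a 32K-token cache adds roughly 4.3 GB, so every extra gigabyte of VRAM translates fairly directly into a larger usable context window.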
By the way, my previous video was specifically about how to connect local models to the Zed editor. Now, I'm opening the project where I keep test tasks for evaluating different LLMs. I grouped them by difficulty, so we'll start with the easiest one. This task is about creating a simple web app for managing a to-do list. It should be a basic HTML page with the ability to view, add, and delete tasks, as well as mark them as completed. All right, the model has started thinking. While we wait for the result, I'll connect to the desktop through the terminal and check what kind of load this model creates under active use. So, ignore the top graph. That one belongs to the integrated GPU, which is not being used right now. The lower graph shows the load on the dedicated graphics card, the one running Qwen 3.6. As you can see, the model fits entirely into video RAM, and there is still enough space left for a large context window. The GPU is fully utilized. Excellent. Now, let me also check CPU usage. All right, everything looks good here, too. Memory and CPU are almost idle. The entire workload is on the graphics card, exactly as it should be. Perfect. I'm not going to show the full response generation process since it could take a while. I'll simply mention at the end how long it took. All right, Qwen says it has finished working on this project. Response generation and file creation took around 5 minutes.
Okay, let me check the file. At first glance, everything looks fine. Just as I requested, all the code is contained in a single file. Good. But now, let's test whether the code actually works. Opening it in the browser, it looks pretty good. Tasks can be added and deleted, and you can also mark tasks as completed. It seems that all functionality works correctly. Qwen handled this project successfully.
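The to-do task boils down to four operations on a list of tasks: view, add, delete, and toggle completion. The model produced HTML and JavaScript, and its code is not shown in the video; the Python sketch below only models that state logic for illustration, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List

@dataclass
class Task:
    id: int
    title: str
    done: bool = False

@dataclass
class TodoList:
    tasks: List[Task] = field(default_factory=list)
    # Stable, ever-increasing ids so deleting one task never changes another's id.
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def add(self, title: str) -> Task:
        title = title.strip()
        if not title:
            raise ValueError("task title must not be empty")
        task = Task(next(self._ids), title)
        self.tasks.append(task)
        return task

    def delete(self, task_id: int) -> None:
        self.tasks = [t for t in self.tasks if t.id != task_id]

    def toggle(self, task_id: int) -> None:
        """Mark a task completed, or back to open if it was already completed."""
        for t in self.tasks:
            if t.id == task_id:
                t.done = not t.done
                return
        raise KeyError(task_id)

    def view(self) -> List[str]:
        return [f"[{'x' if t.done else ' '}] {t.title}" for t in self.tasks]

if __name__ == "__main__":
    todo = TodoList()
    a = todo.add("Buy milk")
    b = todo.add("Write report")
    todo.toggle(a.id)
    todo.delete(b.id)
    print(todo.view())  # ['[x] Buy milk']
```

Identifying tasks by id rather than list position keeps delete and toggle unambiguous even after other tasks are removed, which is the same reason a web version would typically store an id on each rendered item.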
All right, now it's time for a more difficult task. I'm closing the folder with the simple tasks and opening the ones with medium difficulty. Here, I'm choosing my favorite challenge, a sorting algorithm visualizer. There are six sorting algorithms included. And just like with the first task, I'll ask the model to implement the project described in this file. Again, I'm not going to show the entire generation process because I think this time it will take much longer than the previous simple task. So instead, I'll jump straight to the results and tell you how much time it took. By the way, notice that this time I gave the model a fairly abstract project description without a specific implementation plan. In real projects, doing that is not ideal. You should first create a detailed plan or break the project down into smaller tasks. But this time, I intentionally did not do that because I wanted to see how the model would behave in what could be called a stressful situation for it.
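A visualizer like this usually separates the sorting algorithm from the drawing: the algorithm emits a stream of steps (comparisons and swaps), and the UI replays them with a delay controlled by the speed setting. The video does not show the generated code, so this Python generator for bubble sort is only a sketch of that idea; the event names and the delay variable are hypothetical.

```python
import time
from typing import Iterator, List, Tuple

Step = Tuple[str, int, int]  # (kind, i, j) where kind is "compare" or "swap"

def bubble_sort_steps(values: List[int]) -> Iterator[Step]:
    """Sort `values` in place, yielding each comparison and swap for animation.
    The list is mutated lazily, as the caller consumes the generator."""
    n = len(values)
    for end in range(n - 1, 0, -1):
        swapped = False
        for i in range(end):
            yield ("compare", i, i + 1)
            if values[i] > values[i + 1]:
                values[i], values[i + 1] = values[i + 1], values[i]
                yield ("swap", i, i + 1)
                swapped = True
        if not swapped:  # no swaps in a full pass: already sorted, stop early
            break

if __name__ == "__main__":
    data = [5, 1, 4, 2]
    delay_s = 0.0  # a speed slider would map to this delay between frames
    for kind, i, j in bubble_sort_steps(data):
        print(kind, i, j, data)  # a real visualizer would redraw the bars here
        time.sleep(delay_s)
    print(data)  # [1, 2, 4, 5]
```

Because each algorithm only yields steps, adding another of the six algorithms means writing one more generator while the animation loop, speed control, and element-count control stay unchanged.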
You can also take a look at how much power the desktop machine is consuming while the model is working. This is an important point that many people overlook: if the model is running almost constantly, your monthly electricity bill could increase quite a lot. All right, the model has finished and provided a detailed summary of what it completed. It took a little over 20 minutes. First, let me quickly inspect the generated file. All of the code is inside a single HTML file, just as requested. Altogether, it is nearly 1,000 lines long. But the most important thing now is to check whether all 1,000 lines actually work and whether there are any errors. I'm opening the file in the browser to verify that the model truly completed the task and didn't just generate code for the sake of it. Well then, I like the look of the page.
Everything appears clean and polished.
All six sorting algorithms are available. There are also extra controls for managing the visualization, and everything seems to work well. I can change the speed and the number of elements. Everything matches the original specification. Very good. It looks like the model successfully handled this medium-difficulty task as well. Now I'm deleting everything the model generated and moving on to the difficult tasks. I have two projects here, and today I'll focus on building a Kanban board. As the name suggests, this project requires creating a Kanban board for task management with cards, drag-and-drop functionality, and everything else users would normally expect from this kind of service. But since the project is fairly complex, I'll first create an implementation plan and break it down into separate tasks. And of course, I'll ask Qwen to help me with that. The model spent around 5 minutes generating the plan. Since the plan is written in Markdown, let me open it in an external program so it's easier to read.
All right, it looks like the plan the model created is quite detailed. It covers many different aspects of the future application. There are even code examples for important parts of the program. It also includes descriptions of how each part should work and how the components should interact with each other. Very impressive. At the end, there is a recommended implementation sequence, guidance on how to organize the code, what should be tested and how, and even a difficulty estimate for each stage. I'm impressed. The model did an excellent job with the planning phase.
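The core of a Kanban board is a small data model: ordered columns, each holding ordered cards, where a drag-and-drop move reduces to removing a card from one position and inserting it at another. The generated plan's code is not shown in the video, so this Python sketch and its names (Board, move_card, search) are hypothetical; it only illustrates the operations the final app is tested on later: adding cards and columns, moving cards, and searching.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Card:
    id: int
    title: str
    description: str = ""

@dataclass
class Board:
    columns: Dict[str, List[Card]] = field(default_factory=dict)  # dicts keep insertion order
    _next_id: int = 1

    def add_column(self, name: str) -> None:
        if name in self.columns:
            raise ValueError(f"column {name!r} already exists")
        self.columns[name] = []

    def add_card(self, column: str, title: str, description: str = "") -> Card:
        card = Card(self._next_id, title, description)
        self._next_id += 1
        self.columns[column].append(card)
        return card

    def _locate(self, card_id: int) -> Tuple[str, int]:
        for name, cards in self.columns.items():
            for idx, card in enumerate(cards):
                if card.id == card_id:
                    return name, idx
        raise KeyError(card_id)

    def move_card(self, card_id: int, to_column: str, to_index: int) -> None:
        """What a drop event does: remove the card, then insert it at the target slot."""
        src, idx = self._locate(card_id)
        card = self.columns[src].pop(idx)
        target = self.columns[to_column]
        target.insert(min(to_index, len(target)), card)

    def search(self, query: str) -> List[Card]:
        """Case-insensitive match on title or description, across all columns."""
        q = query.lower()
        return [c for cards in self.columns.values() for c in cards
                if q in c.title.lower() or q in c.description.lower()]

if __name__ == "__main__":
    board = Board()
    for name in ("To Do", "In Progress", "Done"):
        board.add_column(name)
    card = board.add_card("To Do", "Write tests")
    board.move_card(card.id, "In Progress", 0)
    print([c.title for c in board.columns["In Progress"]])  # ['Write tests']
    print([c.title for c in board.search("test")])           # ['Write tests']
```

Removing the card before computing the insert position also makes reordering within the same column work, since the target list no longer contains the moved card.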
All right. Now, let's see how Qwen 3.6 handles implementing that plan. Since the model thoughtfully divided the development process into separate phases, I'll also give it one phase at a time to implement, so it has an easier time staying organized and doesn't get confused. But because there are six phases and each one will likely take a significant amount of time, I'm not going to bore you with repetitive footage. Instead, I'll show you the final result and the total time it took to complete this project.
All right, all tasks are finally complete and the final file containing all the code has been generated. Just like in the previous projects, all of the code is contained in a single file.
Good. Completing all stages took around 1 hour in total. Whether that is a lot or a little is something everyone should decide for themselves based on their own workflow and the kind of tasks they need to solve. All right, now it's time for the final test to see whether a locally running Qwen 3.6 was able to handle this challenging final task. Launching the project in the browser. Oh, it looks great. I'm happy to see that everything opened without errors, at least so far.
Let me test the core functionality. I'll add a new card and move it around. So far, everything works correctly and without errors. It looks like everything that was required has been implemented.
You can add a new column. Search also works. Excellent. I'm impressed. For a relatively small model running locally, this is an outstanding result. I did not expect the model to succeed on the first try without any errors. I think I'll definitely continue using it on a fairly regular basis now, and we'll see how it performs on my real-world tasks. So, that's it for today. Write in the comments whether you have already tried this model and what you think about it.
If you enjoyed this video, leave a like and subscribe so you don't miss the next episodes. See you soon.