This video presents a comparative evaluation of seven open-source local language models (Mistral Medium 3.5, Nemotron Cascade 2, Devstral Small 2, Mistral Small 3.2, Devstral, DeepCoder, and Ministral 3). The models are tested on a practical Blazor Server application development task, with Qwen 3.6, the winner of the previous roundup, as the benchmark. All of the new models struggle with the task. Qwen 3.6:latest remains the most capable because it proactively identifies and fixes problems rather than delivering incomplete solutions. The models show varying levels of competence in file management, code generation, and error correction. Some hallucinate significantly or fail to complete tasks even after being prompted to continue. The hosts suggest that adding specialized skills or documentation may improve model performance for specific frameworks like Blazor.
Deep Dive
Code It with AI - Language Model Roundup 2 (ep. 30)
[music] Hey, and welcome back to Code It with AI.
Uh, this is episode 30. I'm Carl Franklin from App vNext. That's Jeff Fritz from Microsoft. Hey Jeff, what's up? Ah, you know, it's another wonderful day here on the East Coast, and I just learned I have to travel to a Microsoft event next week.
>> Oh, [laughter] it's probably going to be raining there.
>> Well, um, San Francisco, so we'll have fun.
>> Yeah. Okay. It'll be nice. All right.
So, this episode is a continuation of last week where we did the local language model roundup number one and promised to do more of them. So, this week, uh, we tested seven open-source models. Seven more.
>> Carl, should we get that voice sound effect to come in and play during this? "Previously on Code It with AI..."
>> That's right. Yeah. Old soap opera style.
>> There you go.
>> Um, so we left off last week with Qwen 3.6:latest being the winner.
>> Yeah.
>> Of all of them. That was my favorite anyway, >> the most capable at being able to answer and deliver content that we felt was workable >> and think, you know, as much as a model can think. It saw problems and fixed them proactively rather than just doing, you know, a half-ass version of what we asked it to do. So this week I decided to be a little more verbose in the documentation and include the conversations back and forth, because I didn't do that last week. I just kind of reported the findings, but if you look at the README here, you're going to see the conversations and how brain-damaged some of these things are. And [laughter] look, we know that these models are not going to be Claude Opus 4.6. They're not going to be GPT-5.
They're going to have some things they're good at and some things they're not, right? So that's the whole idea behind this language model roundup. Um, so take a look at my screen here. This is the actual README, and we're going to look at seven open-source models. It's the same test as last week, which is basically to flesh out a new Blazor Server app with a file manager page. The first one I tried was Mistral Medium 3.5 latest. Huge. It took up all available RAM, and it took about 5 minutes to start spitting out words, and then only one word every 5 seconds. It was just... Yeah.
Okay. Here. It doesn't work on my machine. Might work on yours. Again, my machine has 96 gigs of RAM.
>> Yeah.
>> And uh the Nvidia GeForce RTX 5090 graphics card with 32 gigs of VRAM.
>> Uh, so if you have more than that, go ahead and try this one. [laughter] >> Wow. It feels like a big enterprise database. We're going to take all your RAM and you might get something back from us.
>> Yeah. All right. So, Nemotron Cascade 2. And by the way, I picked these by going to the Ollama models page and searching for anything that had the word "code" in it, and basically looking for the non-cloud models, the local models that would fit inside, you know, my RAM.
>> So, did you cross reference it with that can I use page we were looking at?
Yeah, that's a good one. Let's go there.
Let's check that one out. So, the site is actually canirun.ai.
>> And it basically looks at your local machine and gives you a list of models that you can or can't run. And you know, this is kind of good, but I specifically wanted models that Ollama listed. However, look at the difference between my machine, which is what I'm recording on right now, and the Ollama machine. So on my machine, 23 run great.
Okay. 17 are too heavy. On my Ollama machine, 50 run great, 16 are too heavy, zero barely run, zero are a tight fit, one is decent, and 10 run well. So yeah, and look, Qwen is at the top. Qwen is, um, from Alibaba.
>> Yeah.
>> Produced by Alibaba.
>> All right. Nemotron Cascade 2. So I start every single one of them with "Tell me about this project," >> and it said (the ModelRoundup2 repository doesn't exist, because, well, you'll see later): "The repository is an ASP.NET Core web project (Razor components) created via GitHub Copilot. It builds a minimal web app: Program.cs registers Razor services and static asset handling, but no other source code is present." Hm. I said, "There is code in this project. Did you not find it? What UI framework is it using?" "The project is built with ASP.NET Core Blazor. The .csproj explicitly defines a Blazor app, and the source contains Razor components." So it found them. It's just that the first response was "there's no other code."
>> Now, that's weird.
>> It is weird, and a little brain-dead if you ask me. [laughter] So I went forward with it and said, "I want to create API-based file upload functionality." This is essentially the same prompt that I used last week, right?
>> Yeah.
>> Uh, and after a bit of stumbling, it created the files, but it put them in strange places. By stumbling I mean it kind of, you know, timed out, couldn't find files, threw the red "path not found" errors, and had to go find things.
>> Uhoh.
>> And this is what I told it.
>> "I told you to create a Files folder in the project folder to receive files. I also asked you to create a Models folder where you can put FileChunk.cs. Since it's not a component, it does not belong in Components." It actually put FileChunk.cs in Components/Shared.
>> And it also created a Pages folder underneath the top folder for Upload.razor, when we already have a Components/Pages folder where the existing pages are.
So it moved the model into the Models folder but completely removed the Upload.razor file. [laughter] So I said, "Where is the Upload.razor file?" It said the file should live at yada yada, which is true, "but the recent cleanup that removed the empty Pages folder also deleted the Upload.razor file you asked us to create earlier."
>> So what can we... Now, whereas Qwen would say, "I'm sorry, let me just put that back for you," now I'm getting a question: "Would you like me to recreate the Upload.razor file now?" Yes. Yes, I would. [laughter] >> Oh my god.
>> And then there was a build error. Even though the AI moved FileChunk into a Models folder, it didn't change the namespace from ModelRoundup2.Components, and the Upload page was looking for ModelRoundup2.Models, right?
So the fix was to change the model's namespace, but instead it focused on the Upload.razor file, creating five new build errors. So my conclusion: fail. [laughter] >> Yeah. Yeah.
>> Right. So having these kinds of interactions requires, you know, a certain level of competence. What I think is that this model right here, Nemotron Cascade 2, is probably good at C# and probably good at building, I don't know, who knows what non-Blazor applications, but for Blazor, no thanks. And to be clear to the viewers out there, we're just running these models directly. We're not trying to add any skills in. We're not giving them any additional training or information. We're looking at what these models know from the get-go.
>> Mhm. Yeah. [snorts] All right. So the next one is Devstral Small 2. I was impressed with the quick responses. I said, "Tell me about this project." It replied: "This is an ASP.NET Core Blazor web application. It's minimal, with standard Blazor component pages like Counter, Home, and Weather."
>> All right.
>> "Uses ASP.NET Core with Razor components.
Has Bootstrap for styling." So it was good, right? And it was quick.
>> Good. Okay. I like hearing that.
>> Yeah. I gave it the same prompt.
>> Mhm.
>> It created the files in the right place.
However, it said "Task completed" with 33 build errors.
Yeah. [laughter] >> Job complete. Job done.
>> Well done.
>> Well done. Yeah. So I said, "There are 33 build errors. The task is not completed until you get a clean build."
>> Okay.
>> And then it fixed the errors and the project ran, but it didn't have a link in the nav menu. I didn't want to just enter the path to the page, because then I lose all the context and everything else.
>> [snorts] >> So I said, "Please add a link in the nav menu to the Upload page so I can test it." It got a runtime error after I ran it, right? It fixed that, and when I clicked the link I got an error navigating to Upload: "Cannot provide a value for property 'Http'." So it essentially hadn't registered the HTTP service that it was injecting.
>> So I gave it the error. It runs now, but upload does nothing. On debugging, I discovered that a variable was null where it shouldn't have been.
So it fixed that bug, but got another error because the chunk size was bigger than 512 KB.
This is the same kind of stuff that was happening when I was using Qwen 3.6. It just assumed that it could call OpenReadStream on the file input, and if you do that in Blazor Server, it tries to load the whole file.
>> Yeah.
>> So I said, "Okay, we're still getting the file size issue, but the problem is that the code on line 59 uses the file input's OpenReadStream, which is where the error occurs. What we need to do is access the file by name in a regular stream, open it, read one chunk at a time, and then send that chunk." Right?
So, lots of cycles now, [laughter] resulting in a 400 response. And you know, I went over and over and over. It still doesn't work. It still doesn't work. Still doesn't work. Then I went to the controller and noticed that these were missing right at the top.
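The instruction above describes a common chunked-upload pattern: open the file as an ordinary stream, read a fixed-size piece, send it, and repeat until the end of the file. The episode's project is C#/Blazor. The following is a minimal, language-neutral sketch in Python of just that chunking loop, not the code from the episode. It makes several assumptions. `FileChunk` is a hypothetical payload (file name, byte offset, data, last-chunk flag) loosely modeled on the episode's FileChunk class. `send_chunk` is a hypothetical stand-in for the HTTP POST to the upload controller. The file is assumed to be reachable by path from the sending process. For context, Blazor's `IBrowserFile.OpenReadStream` takes a `maxAllowedSize` argument that defaults to 512,000 bytes, which is likely the limit the model kept hitting. The 256 KiB chunk size here is an arbitrary value below it.

```python
import io
import os
from dataclasses import dataclass
from typing import BinaryIO, Callable, Iterator

# Hypothetical chunk size: Blazor's IBrowserFile.OpenReadStream defaults to a
# 512,000-byte maxAllowedSize, and 256 KiB is simply a value comfortably below it.
CHUNK_SIZE = 256 * 1024


@dataclass
class FileChunk:
    """Hypothetical payload, loosely modeled on the FileChunk class from the episode."""
    file_name: str
    offset: int     # byte position of this chunk within the file
    data: bytes
    is_last: bool   # True only for the final chunk


def read_chunks(file_name: str, stream: BinaryIO,
                chunk_size: int = CHUNK_SIZE) -> Iterator[FileChunk]:
    """Yield the stream one chunk at a time instead of loading the whole file.

    The reader itself buffers at most two chunks (current and look-ahead).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    current = stream.read(chunk_size)
    while current:
        # Read one chunk ahead so we can tell whether `current` is the last one.
        following = stream.read(chunk_size)
        yield FileChunk(file_name, offset, current, is_last=not following)
        offset += len(current)
        current = following


def upload_file(path: str, send_chunk: Callable[[FileChunk], None]) -> int:
    """Open the file by path as a regular stream and hand each chunk to send_chunk.

    send_chunk is a hypothetical stand-in for whatever POSTs a chunk to the server.
    Returns the total number of bytes sent.
    """
    sent = 0
    with open(path, "rb") as f:
        for chunk in read_chunks(os.path.basename(path), f):
            send_chunk(chunk)
            sent += len(chunk.data)
    return sent


if __name__ == "__main__":
    # Usage: split 768,000 in-memory bytes into 256 KiB chunks.
    demo = io.BytesIO(bytes(range(256)) * 3000)
    for c in read_chunks("demo.bin", demo):
        print(c.offset, len(c.data), c.is_last)
    # prints:
    # 0 262144 False
    # 262144 262144 False
    # 524288 243712 True
```

Reading one chunk ahead is just one way to set the last-chunk flag. Sending the total file size with the first chunk, and letting the server compare offsets against it, would work as well.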
>> Okay, >> these attributes.
And this was indicative of the model's lack of thoroughness when it comes to finding issues. Before I found this, it basically said the API endpoint wasn't being hit and I should write a valid endpoint.
>> Really?
>> Really?
>> You wrote the endpoint and now you're telling me it should work against a valid endpoint.
>> Yeah, it should, but it doesn't.
>> Oh. Oh, Carl, I'm so sorry.
>> I know. So, I eventually gave up on this model because it made too many stupid mistakes.
All right. Mistral Small 3.2. Prompt: "Tell me about this project."
>> Yeah, >> it did. It did.
>> So far so good. Looks good.
>> So far so good. Yeah. Prompt: same as before. The result: "I'm powered by Mistral Small 3.2. Model ID blah blah blah. I understand you want to implement API-based file upload functionality in your Blazor application. Here's what I'll do." [snorts] And it told me what it was going to do. And then it exited.
So I said "Continue." "I will create the following for you. Blah blah blah blah blah. Let's confirm your choices. Would you like ModelRoundup2.Models.FileChunk? Would you like the Files directory in your project root?" Yes, for both.
And this response was "Asked user: which namespace would you like to use for the ModelRoundup namespace? Are you sure that you blah blah blah?" So it's just confirming everything that it's going to do, >> and it can't do it. It's, you know, "Error creating file." >> Then it said, "That's odd. Let's try this using PowerShell. I'll create the Models folder and then add a FileChunk file to it."
>> PowerShell, running the command, said: "To complete this task, I need a description of what you want to accomplish using PowerShell.
Please provide a description that explains how to achieve your goal using PowerShell." [laughter] It's trying to create a file. [laughter] I said, "Reread the prompt. Create a Files folder to receive files. Create a Models folder and put FileChunk.cs in it." And the response was blah blah blah, error. Finally: "I can't seem to create that file. Let me use PowerShell to make both folders and the file. I'll need to create the Files folder and the Models folder." And then it stopped, and I said "Continue." Do you see the pattern here?
Yeah. "I tried to create both folders and the FileChunk.cs file, but there was a permission issue. I'll retry with the correct path. Let me try this again." And then it stopped, and I said "Continue."
"I see the issue now. Let me manually create these files and folders, since it's failing for me using PowerShell.
Are you okay with me proceeding?" Yes.
>> What is manual?
>> Finally, I'm sorry. You're not using PowerShell.
>> I know. Yeah.
>> So the way I run it is I start PowerShell first and then run Copilot.
So, it's on top of PowerShell.
>> Mhm.
>> So, I gave up. It can't even create folders and files.
>> We saw this with another model last week.
>> Yeah, I think so. Yeah.
>> Yeah.
>> Okay. Devstral.
>> Devstral. "Tell me about this project."
"I'm powered by Devstral. Let me explore this project." Explore. "Analyzing the current project. I'm analyzing the project structure." It's like Woody Allen in Sleeper: "Don't you want to check the cell structure of the nose?" Yes.
Checking the cell structure. Check the cell structure. Okay. The cell structure has been checked. Thank you very much.
>> All right. So it basically ended with, you know, some JSON [clears throat] that described what its intent was, and then it failed.
So, no bueno, Devstral. DeepCoder: "Tell me about this project."
"400: registry.ollama.ai/library/deepcoder:latest does not support tools." Okay, that's good to know. Uh, Ollama says this is one that works with Ollama. So, Ministral 3 described the project fairly quickly and accurately. I gave it the prompt.
And it was very fast and inaccurate. It was also, [laughter] >> come on.
>> It failed really quickly. It was also too quick to declare victory after multiple build errors. I would not use this model. So >> Qwen 3.6:latest is still the best model so far, even over the, um, Qwen, what was it? 2.5, 3 billion, that we started with.
>> Yeah, >> that was pretty good, but it still didn't do what I asked it to. And >> okay, >> so Qwen 3.6:latest is the favorite so far. So, what do you think?
I think as we get a little further in here, maybe we need to put together a skill for our roundup and see if adding a skill will help, >> because these are all falling on their faces spectacularly.
>> Yeah. So we'll go back to the ones that can, say, create files, because that's >> that's a good start. That's a good start. [laughter] And we'll give them a skill and see how they do. Like a Blazor expert skill or something.
>> Yeah. Yeah. A little Blazor expert skill that knows a little bit about file chunking dynamics. Maybe we load in a little bit of the documentation. Not just reference it and tell it to go find the documentation out there. Let's actually copy in a piece of the file upload and file chunking documentation from Microsoft Learn, so it has it right there. And let's see if that helps in getting this working better.
>> Yeah. So long as the documentation is not JavaScript and it's Blazor and, you know, basic C# stuff.
>> Yep.
>> Yeah, we'll find it. Okay. Well, there you go. That's roundup number two. Of course, if you have any suggestions for models that we should test, or any suggestions for the next show or two, or just want to say hi, send them to us by email at codedwithiapp.com.
And who knows, we might give you a mug.
This is a .NET Rocks! mug, but it's still a mug. I would give you a Blazor Puzzle or Code It with AI mug. All right, see you.
Bye-bye. Catch you later, friends.
>> Thanks for watching. Make sure to check out blazortrain.com and blazorpuzzle.com for more great content. [music]