Install our extension to search inside any video instantly.

First Look at MiniMax M3: An Open Weights Model with Frontier Capabilities?
Added: 2026-06-02

341 views1017:54TonbisAIGarageOriginal Release: 2026-06-01

MiniMax M3 is an open-weights AI model claiming to be the first to combine frontier coding, agentic, and multimodal capabilities. The model uses sparse attention to scale context to 1 million tokens, enabling efficient processing of long documents. In testing, it demonstrated strong multimodal capabilities by accurately reading and analyzing a low-resolution NFT spreadsheet, identifying collection details and providing financial insights. The model also showed effective research and reasoning abilities by ranking JEPA papers and recommending which could be reproduced on limited hardware. While it struggled with complex front-end coding tasks requiring multiple technologies, its agentic tool calling and long-context research capabilities performed well, with pricing around 30 cents per million input tokens.

[00:00:00]So, just a couple hours ago, Miniax released uh their Miniax M3, their newest model, and there's a lot of excitement around this because they are known for their open weights models, and they're very popular in the uh local LM community. Their M2.7, I know, was very popular and very wellreceived. So, people are excited about the M3, but right now, as of this recording, they haven't actually released the weights yet. um and they haven't released the kind of tech report and all the technical details. So I believe we'll have to wait 10 days for that. But we do have the model itself to try out and use and Miniax is one of the the Chinese AI companies and they're claiming this is uh the first open weights model to combine three frontier capabilities. So coding and agentic frontier. So they're going to call this a coding model and for use in agents as well. You can see some of the benchmarks here. These are limited and we don't know exactly um the details of each of these benchmarks, but you can see on stuff like uh like SWE bench, which is for coding, um it's right up there below Opus 4.7, but slightly above GPT 5.5, which should be pretty incredible if that's actually true for an open open weights model.

[00:01:22]And you could see generally it's kind of in line with a lot of these that they're comparing with Opus 4.7 and GPT 5.5 slightly above in some areas. SVG Bench here um and slightly below in others.

[00:01:37]But we all know how benchmarks go. We actually have to see how it works to really see how it stacks up against these other other models. So like I said, we don't have all the tech details, so I'm not going to um be able to tell you about all that. We do know that they use minia max sparse attention which they say scales context to 1 million. I have other videos that go into more detail about sparse attention but very quickly sparse attention means that you don't let every token look at every other token when it's running. So instead the tokens only attend to very selected tokens. And there's a lot of different ways to do sparse attention.

[00:02:13]Not exactly sure how they did it in this model, but there's different like block or window patterns or using only nearby tokens. There's a lot of different methods, but the idea is that this kind of saves compute and it really helps with longer context windows so you don't have that deterioration that I know uh most of us have experienced with longer context.

[00:02:36]And lastly, they're claiming uh natively multimodal from step zero to be interesting thing to test out. So that's the info we have right now. Uh which is good. We can go right into the tasks and start testing it out. So I want to give it a couple different tasks. Uh give it a coding task. Uh try to see how it runs. We'll be running it in Hermes agent the whole time. So we'll be able to see how it does with tool calling and stuff from an agent. I'm going to give it a task that requires a large amount of context because it's going to be kind of reviewing and retrieving from a very large database. And I think I'll also do something mo multimodal, so it'll have to read a an image and kind of reason and process from there. So, let's get right into it. And if you like this video, please consider supporting the channel by joining Team Garage, which will give you early access to videos or Team Garage Max, where you will get exclusive videos each week, as well as many other perks. I'd like to continue providing as much free valuable content as possible, as well as better experiments on different hardware. So, these memberships will really make it possible. Uh, you can see it's on Open Router right now already. So, that's what we're going to be using it in.

[00:03:47]We'll be able to see how much the spend is. I believe they said, right? Yeah, they have some uh API pricing promotion 50% off standard use. I don't know. That might just be through their API. I don't know if that's through open router um because in Open Router it says um 30 cents per million input, 1.2 20 million output.

[00:04:12]That is pretty cheap. So that might be the discount price. You'll have to check exactly what the prices are when you're actually using this and watching this video. But for now on open matter, this is the pricing which is lower than like the most recent Quen model and like Kim uh KT 2.6. This is cheaper. Okay, let's start it up. Just say hello. Make sure everything's working properly.

[00:04:36]Yeah, you can see 1 million token context window. There we go. Hello, Miniax M3 here. Ready to help. What are you working on? So, the first thing is going to be a coding task. And this is kind of my go-to now, I guess, for model testing, especially well for coding models at least. I think it's a good task cuz it's visual and it seems hard.

[00:04:54]Like none of the models have gotten it like totally nailed it. There's usually some kind of issues with how they do it.

[00:05:00]But the the task is to build a single HTML file, no build step for a One Piece meets Star Wars game that has a full screen hero section with dark background, rotating 3D sphere in 3.js with a custom GLSL fragment shader that makes it look like a glowing planet. Uh, the sphere slowly rotates on the X-axis on scroll GSAP animates the sphere scaling down and fading out as a text headline fades in. The headline has frame motion stagger entrance and it works in Chrome. No mpm CDN imports only. So, that's going to be the prompt.

[00:05:31]Um, I think it's actually a pretty good coding test. It's turned out to be, like I said, nothing, no other models have nailed it. Kim Cape uh 2.6 did the best, I would say, but there were other models that had specific elements that were better than that, and seems difficult to one-shot because there's a lot of different technical front-end elements happening to it. So, we'll be able to compare it pretty easily.

[00:05:55]Okay. So, it's finished uh making it.

[00:06:01]Uh so, let's see what it looks like here. Beyond the ground line.

[00:06:08]Uh here it is. We're in the browser.

[00:06:11]Okay.

[00:06:13]Um it's a little pixelated, but it's definitely working. Uh it's orbiting.

[00:06:19]Has kind of a glow effect to it. So, that did a pretty good job.

[00:06:25]Uh, scroll down.

[00:06:29]It's a nice fade effect there. So, we should see a headline coming up.

[00:06:35]Don't see anything.

[00:06:38]What's this?

[00:06:40]The grand line ends. The outer rim begins. They wrote a whole like backstory here. Okay. So, it's not exactly I was hoping for like a a headline um like in like I had in the prompt. Um, they added a lot of uh backstory.

[00:06:57]The Yonko's throne, interest.

[00:07:01]Uh, so let's give it one more shot cuz it did all of that. It looks good. The planet itself is not as nice as as in other models. Let's give it one more shot. I'm telling you, there was no frame or motion style stagger to the headline.

[00:07:17]Also tried to improve the sphere. It looks a bit pixelated.

[00:07:21]Yeah, like let me give you some comparisons. Like this was Opus 4.7 and this is what Quen uh 3.7 did 3.7 Max. Uh but this did take three prompts to be fair.

[00:07:37]So let's give it one more prompt and see what it can pull off. Okay, this is the second second round. Uh it looks less pixelated, but that's cuz they just made the middle white. Uh let's scroll down here. does fade out and we don't get any don't get any uh stagger motion.

[00:08:00]Okay. Yeah. I mean they claim the first word should start to scroll at 30%. Last word lands at 75. Each word visibly cascades in from below. Uh which is not not true here. Yeah. And to be fair they don't uh boast anything about front end design.

[00:08:18]Um, but I was hoping for a little bit better performance. At least they got the the kind of fade and shrink, right?

[00:08:25]But the orb obviously isn't the sphere isn't very good. Um, and they don't have the headline even after two prompts. So, they have this little backstory at least though. Next, I I want to try the the multimodal test. So, this is a PNG file.

[00:08:40]It's just a spreadsheet of my old NFT collection, as you can see. I'm going to try to see if it can process any of the data here. Um, this is a very low resolution image, so it might be difficult.

[00:08:53]We'll see what it can do. So, I'm saying look at this picture. Try to read the names on the left side. Do research to try to decipher what the spreadsheet is about and give me three insights into the data. This will test its reading and also kind of researchability. It should, I know these are old. They're from 2021.

[00:09:11]But if it reads and researches all this, it should tell that these are NFTTS from from Salana.

[00:09:20]So, it's handling tool calling pretty well. Um, and you can see it did extract those names from the image.

[00:09:27]So, it's starting to search. So, it is making some some good progress here.

[00:09:34]Okay, it got back uh it got back um from the task and you can see it did correctly identify what this was just based on the names.

[00:09:43]Um this is a spreadsheet a per NFT cost basis and P&L ledgers for Salana NFT bag.

[00:09:52]Funny to call it a bag, but uh that is accurate. It was my portfolio back in 2021.

[00:09:59]is part of a broader Salana L1 L1 positions sheet NFTs are legacy 2021 line so that's actually pretty impressive remember um it was one of the opuses couldn't even read this it is a very low resolution image I think it was opus 4.7 couldn't even read this image so it did pretty good research it found soul punks was the first mint on Salana um and some details about the different NFT collections. Three insights and now it's going to break down the uh one position is carrying the whole portfolio. There's a single three NFT lot.

[00:10:41]Uh so it's giving me now kind of a financial breakdown.

[00:10:49]The 2021 Salana PFP meta is a graveyard here. It does have kind of a personality to it.

[00:10:56]I don't know how much this is Hermes, but I feel like this has more personality than uh what I'm usually using this, which which is codeex usually GBT 5.5.

[00:11:08]The ongoing status flag is doing real work. It's not active trades.

[00:11:12]Clearly learn to size down and you're not pretending the older bags are active. So bonus pattern I noticed the dates run uh August to October and the recent small bias cluster at the end.

[00:11:23]Looks like you took a fresh swing at NFTTS after the late 2024 Salana recovery, but with much tighter position sizing. So, I would say that's a pretty good outcome. It was able to correctly identify just from this image. It's a very small image um and just has names.

[00:11:39]There's no other kind of labels to it.

[00:11:42]See, there's a lot of different tags on the bottom, but there's nothing specific to it. It does have NFT up here. Um, but it was able to identify some facts about these different collections and give me kind of a financial analysis of this spreadsheet. So, that's pretty good on the multimodal task, I would say. So, the last task is going to be kind of a multi-step task. It's also going to be kind of running long context here. So I'm asking it to find every paper about JEPA models in the last 6 months. List them in terms of importance to the area of research and legitimacy. Summarize the top three only. Then decide which paper I could most easily reproduce all or part of on my RTX 3060 which would be most interesting and educational for my audience. So this is a multi-step task here asking it to do research and then reasoning uh listing summarizing and then another kind of reasoning or multi-step task I would say.

[00:12:42]So I'll be interested to see how it it does on this.

[00:12:48]Um, JEA models, if you watched my video from last week or my first, uh, the world model reproduction video, are kind of a unique type of model uh, originated from, uh, Yan Lun and they're different from LLMs. They take a different approach. They're usually used in world models and instead of predicting pixels, they try to use hidden latent states to kind of predict. It's an interesting area certainly. You can check out my video from from last week if you want to learn more.

[00:13:22]Let's see what M3 can do on this. I forgot to update on the the spend, but the first two tasks only took 30 cents.

[00:13:29]This one just started, so not too bad. 30 cents.

[00:13:37]Okay, it's done some of the research.

[00:13:39]Jeepa papers in the last six months ranked uh metafair uh these are all uh BJPA this is March 2026 bla these are all co-authored by uh Lun Eb Jeppa which was in February strong academic contributions novel angles code usually released the world model which was the one that I um my first video was about.

[00:14:19]So that's pretty good. It got 15 different uh research papers and the top three summaries here of these top three.

[00:14:35]Me read these over quickly. Okay, they all look look good. Um, they're all accurate to the the papers themselves.

[00:14:44]And then my recommendation for the RTX 3060, reproduce EBJA.

[00:14:48]And EBJA is the only one in the top tier that explicitly targets a single GPU in a few hours.

[00:14:55]So, the reasoning is sound. um the single GPU and doing it on a smaller model certainly fit the requirements that I was giving it and the other papers require heavier uh GPUs.

[00:15:10]And it's also telling me this is kind of the cleanest under the hood narrative uh because the cleanliness of the model itself.

[00:15:19]It's kind of a single best three segment video.

[00:15:23]It's kind of planning out my next video for me.

[00:15:27]Uh, if you want a sharper onep paper video instead of three segment do two rooms planning demo in isolation. That's pretty good. Um, it didn't take that long. We haven't even pushed the context that much. Uh, but it read through a large number of papers and was able to do a pretty good reasoning task on that.

[00:15:44]Uh, so that's a pretty good result, I would say.

[00:15:48]And that took five cents. So very cheap considering.

[00:15:52]Yeah. So, those are the three tasks I wanted to do. This is just kind of a first look at M3. It'll be more interesting when we actually have like the have the open weights and have all the information, the tech details. Um, if people are interested, I might do a followup about that when we can really look under the hood of the model. But for now, this is all we have. And in the task that we had, first look, it fumbled the first kind of front-end design coding task. Um, but like I said, it doesn't specifically boast front-end design skills. The multimodal skills were pretty pretty impressive. The reasoning and analysis from a very low res image was pretty pretty good. I would say in general through all three tasks, like the agent calling, I had no issues with tool calling or weird stuff like that. It all ran pretty smoothly in Hermes agent.

[00:16:44]Speedwise, I would say it's kind of middle middle tier. It didn't feel especially fast. It didn't feel especially slow. Um, it's obviously pretty reasonable in terms of pricing.

[00:16:54]But like I said, this might be a discount pricing right now. But even if that was double, that wouldn't be too bad. And the final like long research assignment and reasoning and planning task I gave it. I would say that was its best performance out of the three. It really did a did a good job. Like this is very usable research and planning.

[00:17:13]Like I could take this and now actually try to run this experiment in reproduction and kind of like a I don't know this is subjective but I feel like the writing is good. I don't know better than a lot of other models I work with. I feel like its writing is a lot clearer and it has a little bit of a personality to it which I wasn't expecting from this.

[00:17:37]So that's going to be it for my first look at Miniax M3. Let me know your thoughts. have you been trying out? What are you doing with it? Uh, let me know.

[00:17:46]If you like this video, please leave a like. Please, uh, leave a comment, subscribe, and I will see you in the next video. Thank you for watching.

#ai #minimax #minimax m3 #m3 #open source

Related Videos

Artificial Intelligence

OpenHuman VS Hermes AI: Who Wins?

JulianGoldieSEO

285 views•2026-05-29

Artificial Intelligence

Long-Running Agents — Build an Agent That Never Forgets with Google ADK

suryakunju

142 views•2026-05-30

Artificial Intelligence

This computer is made from real human brain cells. And you can buy it.

Talktmsmedia

3K views•2026-05-28

Artificial Intelligence

BREAKING: Microsoft’s New Image Generating Model Beat Out GPT 1.5 and Nano Banana 2

aimmediahouse

122 views•2026-06-03

Artificial Intelligence

I Made the Same Anime Fight Scene in Every AI Video Generator

NobleGooseAnime

295 views•2026-05-30

Artificial Intelligence

Nvidia Bets Big On AI PCs | New Chip To Power Windows Laptops | Technology | AI Updates | N18S

cnnnews18

3K views•2026-06-01

Artificial Intelligence

I Tested NEW Opus 4.8 on Four Projects (Updated LLM Leaderboard)

AICodingDaily

298 views•2026-05-29

Artificial Intelligence

3D Platformer Update - NO CAPES

SolarLune

294 views•2026-05-30

Trending

Computer Science

The Meta AI Hack Is a DISASTER

LowLevelTV

141K views•2026-06-03

The Casino Had Us Guessing All Day

VegasMatt

157K views•2026-06-03

The Dancing Plague...

HoodieGuyStories

1730K views•2026-05-30

The Fastest Way To Board A Plane 😮

zackdfilms

6504K views•2026-05-29