MiMo V2.5 Pro is a 1.2 trillion parameter Mixture of Experts (MoE) model with only 42 billion active parameters at any time, featuring a 1 million token context window and hybrid attention mechanism. Released under MIT license on April 27, 2026, it achieves 57.2 on SWE-bench Pro and 78.9 on SWE-bench verified, demonstrating superior long-horizon agent capabilities. The model uses 40-60% fewer tokens than Opus 4.6, Gemini 3.1 Pro, and GPT 5.4 to reach comparable scores, with pricing at $1 per million input tokens and $3 per million output tokens. It can autonomously complete complex tasks like building a full-featured video editor (8,192 lines of code across 1,868 tool calls) and a SysY compiler in Rust (4.3 hours, 60-72 tool calls), making it the strongest open-source option for serious agent work and long coding sessions.
Deep Dive
Prerequisite Knowledge
- No data available.
Where to go next
- No data available.
Deep Dive
Xiaomi MiMo V2.5 Pro: Open Source Model That Beats Opus 4.6 at 1/3 the CostAdded:
Xiaomi just dropped MiMo V2.5 Pro and it's one of those releases that actually deserves a closer look. Not because of marketing noise, but because the numbers and the demos behind it are kind of hard to ignore. So, I spent the last few days running it, comparing it with other open weight models, and pulling apart what it can actually do versus what it actually claims. Let me walk you through it. All right, so MiMo V2.5 Pro is a mixture of experts model with 1.2 trillion total parameters, but only 42 billion are active at any time. It has a 1 million token context window, uses a hybrid attention setup, and ships under the MIT license. So, fully open source, fully commercial use, weights are on hugging face right now. It was released on April 27th, 2026 and Xiaomi rolled it out across their AI studio, their API platform, and pretty much every surface they own. What makes this release different from the usual Chinese open source drop is the focus.
Xiaomi isn't really chasing the highest score on every leaderboard. They built this model for long horizon agent work, meaning the model has to stay coherent across thousands of tool calls without falling apart. And from what I've seen, that's where it actually delivers.
On SWE-bench Pro, it scores 57.2.
On SWE-bench verified, it lands at 78.9.
Terminal bench 2.0 sits at 68.4.
On the artificial analysis intelligence index, which is a composite score across reasoning, knowledge, math, and code, MiMo V2.5 Pro hits 54. For context, the median for open weight models in its class is around 30. So, it's clearly above the pack. On GSM8K, it nearly maxes out at 99.6.
On math, it does 86.2.
And on HLE, it scores 48.
For coding specifically, it sits very close to Claude Opus 4.6 on Xiaomi's internal coding bench and competes in the same range as Gemini 3.1 Pro and GPT 5.4 on most agent tasks.
But here's the part that actually matters for builders. Mimo uses 40 to 60% fewer tokens than Opus 4.6, Gemini 3.1 Pro, and GPT 5.4 to reach similar scores. On Claude Eval, it reaches 64% pass cubed using only around 70,000 tokens per trajectory. That's a real cost advantage.
The pricing reflects that, too. It's $1 per 1 million input tokens and $3 per 1 million output tokens. Compared to running Opus on a long agent loop, you're looking at a fraction of the cost for comparable output. Look at this SysY compiler written in Rust from scratch.
This is a Peking University compiler course project that takes a computer science major several weeks to finish.
Mimo V2.5 Pro completed it in 4.3 hours across 60-72 tool calls and scored a perfect 233 out of 233 on the hidden test suite. And the way it did it wasn't brute force. The first compile already passed 107 tests, which means the model designed the architecture correctly before writing a single test. When a refactor at turn 512 broke two tests, it diagnosed and recovered on its own.
Also, in another example, they asked it to build a full-featured video editor.
From a few prompts, the model produced a working desktop app with a multi-track timeline, clip trimming, crossfades, audio mixing, and a complete export pipeline. The final build was 8,192 lines of code written across 1,868 tool calls over 11.5 hours of fully autonomous work. No human in the loop, no patch-ups in between. That's not a snippet. That's a real product built end-to-end by a model. There's also a graduate-level analog circuit design demo where they wired the model into an NGSpice simulation loop, and it tuned a low-dropout regulator across six different metrics: phase margin, line regulation, PSRR, transient response, all of it. In about an hour, every spec was met, and four of the metrics improved by an order of magnitude over its first attempt. So, this isn't just a coding model. It can hold long, technical, multi-step engineering work without losing the thread. Okay, so let me show you what this looks like in practice. There's a few ways to use this model, and I'll walk through them quickly.
The easiest is the official Mimo Studio chatbot, which is free and runs in the browser.
Then there's the official API platform from Xiaomi if you're building something. Open router also serves it if you want a drop-in replacement for your existing setup. And if you want to just compare it against another model side by side, LLM Arena has it listed where you can run battles against models like Kimi K 2.6, GLM 5.1, DeepSeek V4 Pro, and others.
That's actually a great way to feel the difference without committing to anything.
For this walk-through, I'm going to do two demos: one 3D project and one front-end project. I'll skip the kitchen sink stuff because, honestly, after testing dozens of these prompts, two well-chosen tests tell you more than 10 superficial ones.
First demo. I'm prompting it to build a 3D solar system in 3.js with realistic orbital speeds, a working camera that can zoom and rotate, click to focus on each planet, and a small info card that pops up when you select one. So, this is a single-shot prompt, no iteration, and what comes back is a complete HTML file with proper lighting, textures, all eight planets with reasonable scale, smooth orbital motion, and the click handlers actually work.
The camera transitions are clean. The info cards have real data, distance from sun, orbital period, diameter. Mimo's version had better physics behavior, and the UI felt less templated. But, its planet textures looked flatter, and the camera controls were laggier. Second demo, front end. I asked it to build a Stripe-style pricing page for a fictional analytics product with three tiers, a feature comparison table, an annual versus monthly toggle that animates the prices, hover states on each card, and a small FAQ section at the bottom. The output came back in one file with proper Tailwind classes, good spacing, real typography hierarchy, and the toggle actually animates the numbers smoothly instead of just swapping them.
The hover states have subtle lift effects. The FAQ uses an accordion pattern that's not janky. When I ran the same exact prompt through GLM 5.1, the layout was fine, but the animations were stiff, and the typography looked generic. Mimo's version felt closer to what a real designer would ship. So, where does this leave us? If you're picking an open-source model right now for serious agent work or long coding sessions, Mimo V2.5 Pro is probably the strongest option you have. It's not always the highest scorer on every single benchmark. Kimmy K 2.6 still has an edge on certain pure coding tests.
DeepSeek V4 Pro has more raw breadth on knowledge benchmarks, but neither of them sustains 1,800 tool calls without falling apart. And neither of them ships a working video editor from a prompt.
The token efficiency thing is also a real deal for anyone running this in production. If you're building an agent that runs for hours, the difference between 70,000 tokens per trajectory and 150,000 tokens per trajectory adds up to actual money. And because the weights are open under MIT, you can host it yourself if you have the compute or just use the API at $1 per million input tokens. What I'd say is the ceiling on this release is the hardware floor to run it locally. 1.2 trillion parameters means you need a serious multi-GPU setup. So, most people will end up using it through the API or open router. And the 1 million token context, while real, still degrades a bit at the upper end.
Past 512K tokens, the BFS scores on graph walks drop from 0.56 to 0.37. So, it's not magic, but it's still much better than V2 Pro, which collapsed to zero at 1 million.
Overall, I think this is the most interesting open-source release of the year so far. Not because it tops every chart, but because it actually does the boring, hard, long horizon work without breaking. Try it through Mimo Studio if you just want to chat with it, hit open router or the API if you want to build.
And put it head-to-head with Kimmy or GLM on Arena if you want to see where it lands for your specific use case. All right, so that's it from the video and I hope you enjoyed it. If you did, please like this video and subscribe to the channel and I'll see you in the next video.
Related Videos
OpenHuman VS Hermes AI: Who Wins?
JulianGoldieSEO
285 views•2026-05-29
Long-Running Agents — Build an Agent That Never Forgets with Google ADK
suryakunju
142 views•2026-05-30
5 Mind Blowing Omni Uses Cases
PaulJLipsky
1K views•2026-06-02
This computer is made from real human brain cells. And you can buy it.
Talktmsmedia
3K views•2026-05-28
BREAKING: Microsoft’s New Image Generating Model Beat Out GPT 1.5 and Nano Banana 2
aimmediahouse
122 views•2026-06-03
I Made the Same Anime Fight Scene in Every AI Video Generator
NobleGooseAnime
295 views•2026-05-30
Nvidia Bets Big On AI PCs | New Chip To Power Windows Laptops | Technology | AI Updates | N18S
cnnnews18
3K views•2026-06-01
I Tested NEW Opus 4.8 on Four Projects (Updated LLM Leaderboard)
AICodingDaily
298 views•2026-05-29











