
Qwen 3.7 Max: The Model Beating Claude Opus (Nobody's Talking About)
Added: 2026-05-27

333 views · 9:36 · UniverseofAIz · Originally published: 2026-05-26

Qwen 3.7 Max marks a pivotal shift where Chinese models are no longer just catching up, but actively redefining the cost-to-performance ratio of frontier AI. This model proves that the competitive moat in Silicon Valley is shrinking as high-end agentic capabilities become globally commoditized.

[00:00:00]Alibaba dropped Qwen 3.7 Max and almost nobody's talking about it. Gemini 3.5 Pro looks set to ship with a new thinking mode that could shake up the leaderboard, and more Chinese labs are cutting their API prices this week.

[00:00:13]So, let's get into it. Let's get into something I think a lot of us are quietly sleeping on. Alibaba dropped Qwen 3.7 Max a few days ago, and from what I can tell, almost nobody outside the AI Twitter space is really talking about this model. And I get it: we've been busy with GPT 5.5, Opus 4.7, and the new Gemini 3.5 Flash model, so attention at the frontier right now is on the American labs. But the benchmarks are quite remarkable, and the deeper I dug into the technical post, the more I started thinking Qwen might be quietly turning into the lab that actually matters out of China, at least at the moment. To give you an example: on Terminal-Bench 2.0, the benchmark that basically simulates a real software engineer working in a sandboxed terminal, Qwen 3.7 Max scores 69.7, beating DeepSeek Pro Max at 67.9, Kimi K2.6 Thinking at 66.7, and Opus 4.6 Max at 65.4. That's the top result on the leaderboard for this test right now. SWE-Bench Pro is the same story: Qwen scores 60.6 and Opus 4.6 Max gets 57.3. And on MCP Atlas, a benchmark of realistic agentic tool use, Qwen 3.7 Max hits 76.4 versus 75.8 for Opus.

[00:01:28]So if you're a developer and you care about agent workflows, not just simple applications, this model is a great option. What's really crazy is that Alibaba also gave Qwen 3.7 Max a really hard coding problem, optimizing a GPU kernel, and then walked away. The model ran autonomously for 35 hours, made more than 1,000 tool calls, and achieved a 10 times geometric-mean speedup over the baseline. There was no human in the loop, just the model writing code, testing it, finding bottlenecks, then rewriting and testing again. For context, on the same task DeepSeek Pro topped out around 3.3 times, Kimi K2.6 around 5 times, GLM 5.1 around 7.3 times, and Qwen got to 10. So this model is definitely beating all the other Chinese labs at the moment. But the real question is how it stacks up against the newest models from the Western labs, because beating DeepSeek is one thing, but beating Opus 4.7 and GPT 5.5 is a whole different conversation. On the Artificial Analysis Intelligence Index, the standard composite score people use, GPT 5.5 leads with 60, while Qwen 3.7 Max and Claude Opus 4.7 tie at 57. So on raw intelligence, OpenAI is still at the top, while Qwen is tied with Opus, which is itself a strong accomplishment for Alibaba. It gets more interesting on the coding side. On Terminal-Bench 2.0, Qwen 3.7 Max scores 69.7 against 65.4 for Opus 4.6 Max, and on SWE-Bench Pro and MCP Atlas, Qwen also leads Opus.

[00:03:00]The newer Opus 4.7 is stronger than 4.6, but the agentic coding gap is still meaningful, and Qwen is clearly ahead of Opus 4.6 on all of these specific evaluations.
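The 10 times figure from the GPU-kernel run is a geometric mean of per-kernel speedups. The video does not break the task into individual kernels, so the function name and the timings below are hypothetical; this is only a minimal sketch of how such a metric is usually computed: divide baseline time by optimized time for each kernel, then average in log space.

```python
import math


def geometric_mean_speedup(baseline_times, optimized_times):
    """Geometric mean of per-kernel speedups (baseline time / optimized time).

    Hypothetical helper for illustration; not Alibaba's evaluation code.
    """
    if not baseline_times or len(baseline_times) != len(optimized_times):
        raise ValueError("need two non-empty timing lists of equal length")
    speedups = [b / o for b, o in zip(baseline_times, optimized_times)]
    # Averaging logs and exponentiating equals (s1 * s2 * ... * sn) ** (1 / n)
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# Hypothetical timings in milliseconds for three kernels (not from the source)
baseline = [8.0, 5.0, 40.0]
optimized = [2.0, 0.5, 1.6]
print(round(geometric_mean_speedup(baseline, optimized), 2))  # 10.0
```

Here the per-kernel speedups are 4, 10, and 25, so the geometric mean is 10 while the arithmetic mean would be 13. The geometric mean keeps a single outlier kernel from dominating the headline number.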

[00:03:10]Then there's Gemini 3.5 Flash, and this is where the pricing conversation matters. On Google's published benchmark table, Flash leads MCP Atlas at 83.6 and Toolathlon at 56.5, Opus 4.7 leads SWE-Bench Pro at 64.3, and GPT 5.5 leads Terminal-Bench 2.1 at 78.2. So Flash, which is a cheap mid-tier Google model, is already running ahead of the flagships on two tool-use benchmarks, and Pro, which is coming next month, hasn't even shipped yet. So where does that leave Qwen 3.7 Max? It's API-only, at $2.50 per million input tokens and $7.50 per million output tokens. For comparison, GPT 5.5 is $5 per million input tokens and $30 per million output. So Qwen is roughly half the price on input and one quarter on output. It also natively supports the Anthropic API protocol, so you can point Claude Code, OpenClaw, or any Anthropic-compatible harness directly at the Qwen endpoint as a drop-in replacement. If you're building agents, you can literally swap your endpoint tonight and try it yourself. Now the catch: the model is verbose. Artificial Analysis observed roughly 97 million tokens generated during its evaluation, far above the median of 24 million.

[00:04:19]Per-token prices add up faster than they look on paper, especially on long agentic runs. That's the trade-off.
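The price ratios and the verbosity trade-off can be checked with a few lines of arithmetic. This sketch uses only the per-million-token prices and token counts quoted in the video, assumes list prices with no caching or batch discounts, and treats all 97 million generated tokens as output tokens. The dictionary keys are illustrative labels, not real API model IDs.

```python
# Per-million-token list prices quoted in the video (USD).
# Keys are illustrative labels, not real API model IDs.
PRICES = {
    "qwen-3.7-max": {"input": 2.50, "output": 7.50},
    "gpt-5.5": {"input": 5.00, "output": 30.00},
}


def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload at list prices (no caching or discounts)."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# Price ratios: Qwen relative to GPT 5.5
print(PRICES["qwen-3.7-max"]["input"] / PRICES["gpt-5.5"]["input"])    # 0.5
print(PRICES["qwen-3.7-max"]["output"] / PRICES["gpt-5.5"]["output"])  # 0.25

# Verbosity: 97M generated tokens vs. the 24M median
print(run_cost("qwen-3.7-max", 0, 97_000_000))  # 727.5
print(run_cost("qwen-3.7-max", 0, 24_000_000))  # 180.0
print(run_cost("gpt-5.5", 0, 24_000_000))       # 720.0
```

At these prices, 97 million output tokens on Qwen 3.7 Max cost about $727.50, roughly what 24 million output tokens would cost at GPT 5.5's $30 rate ($720). This is an illustration of the trade-off, not a measured GPT 5.5 run: 24 million is the median Artificial Analysis reported, not GPT 5.5's own token count.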

[00:04:25]To sum up, I'm not saying Qwen has overtaken GPT 5.5 or Opus 4.7 on general intelligence and writing quality. OpenAI is obviously still ahead, and Anthropic is still the lab most developers trust for production. But six months ago, the framing was American labs at the frontier, Chinese labs on open weights, and DeepSeek as the cheap option. That framing is basically breaking now. Given Qwen's current release cadence and the gap it just opened on long-horizon agentic work, which is usually a clear signal that a lab is hitting a real groove, I think Qwen ends up being the third name in the frontier conversation by the end of the year, and DeepSeek ends up being the one trying to catch back up. Anyway, that's the model I think most people are sleeping on right now, so it's definitely worth keeping an eye on Alibaba. Before we continue, we just launched the Universe of AI newsletter. If you want to stay on top of AI news without having to hunt for it, the link is in the description.

[00:05:17]Don't miss out. There are also some early leaks of a new mode coming to Gemini 3.5 Pro, which we know is coming next month. Somebody was poking around the Gemini API this morning and got back a 400 error saying thinking level "extra high" is not supported. You usually only see that kind of error when a parameter exists in the API but isn't switched on yet. So Google appears to be preparing an extra-high thinking tier, and it lines up with the 3.5 Pro launch window next month. For context, the current Gemini 3 Pro thinking level parameter accepts low or high, and Gemini 3.1 Pro added a medium tier. No Gemini model has had an extra-high setting yet. Meanwhile, OpenAI shipped GPT 5.5 with five effort levels, including extra high, and that's the one leading the leaderboard at the moment. Right now on the Artificial Analysis Intelligence Index, GPT 5.5 at extra high leads at 60, GPT 5.5 at high is second at 59, and Claude Opus 4.7 Max and Gemini 3.1 Pro Preview are tied at 57. The leak itself is just an API string; we don't know how much extra high actually adds for Gemini. For reference, on the OpenAI side the jump from high to extra high on GPT 5.5 is one point on the index. So it's not a big jump on its own, but it could be enough to retake the leaderboard for a model that's already close.

[00:06:34]And Gemini Pro is already tied with Opus, right behind GPT 5.5. The other thing worth keeping in mind is the agentic side. Gemini 3.5 Flash already posted big gains on agentic evaluations and MCP tool calls. If extra-high thinking layers on top of that for the Pro model, you end up with something that's both reasoning-heavy and tool-capable in the same release. I won't be confident about exact numbers until the actual launch, but from what we know, 3.5 Pro is going to be a big deal for Google, because the 3.5 Flash release was a little underwhelming and people weren't super impressed with it. If 3.5 Pro can compete with GPT 5.5 and Opus 4.7 and retake the lead, Google might actually be back.

This is the second big API price cut from a Chinese lab in less than a week. Earlier today, Xiaomi posted that it's permanently dropping prices on the MiMo V2.5 series by up to 99% compared to previous pricing, effective tonight at 6. The actual numbers are pretty wild. MiMo V2.5 Pro, the flagship, is now 0.0036 per million tokens for cache-hit input, 0.435 for cache-miss input, and 0.87 for output. The cache-hit price is a 98 to 99% drop from where it was, and on the smaller V2.5 model, output is now 28 cents per million tokens. Xiaomi released MiMo V2.5 and V2.5 Pro in late April, both under the MIT license, making them open source. The Pro is a mixture-of-experts model with 1.02 trillion total parameters and 42 billion active. It's trained for agentic work, including long-horizon tasks spanning more than a thousand tool calls, and its benchmark numbers are competitive with top closed-source models: on SWE-Bench Pro, for example, it scores 57.2, and on Claw-Eval it sits at 63.8.
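Because cache-hit and cache-miss input are priced so differently, the effective input price depends on how much of each request is served from cache. A minimal sketch, assuming the MiMo V2.5 Pro figures quoted above (the video does not name the currency; presumably US dollars per million tokens). The hit rates and the 50M-input, 2M-output workload are hypothetical, as are the function names.

```python
# MiMo V2.5 Pro prices per million tokens, as quoted in the video
# (currency not stated there; presumably US dollars).
CACHE_HIT_INPUT = 0.0036
CACHE_MISS_INPUT = 0.435
OUTPUT = 0.87


def blended_input_price(hit_rate: float) -> float:
    """Effective input price per million tokens at a given cache-hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * CACHE_HIT_INPUT + (1.0 - hit_rate) * CACHE_MISS_INPUT


def request_cost(input_tokens: int, output_tokens: int, hit_rate: float) -> float:
    """Cost of a workload; hit_rate is an assumed fraction of cached input."""
    input_cost = input_tokens * blended_input_price(hit_rate)
    return (input_cost + output_tokens * OUTPUT) / 1_000_000


# Hypothetical agent workload: 50M input tokens, 2M output tokens
for rate in (0.0, 0.5, 0.9):
    print(f"hit rate {rate:.0%}: {request_cost(50_000_000, 2_000_000, rate):.2f}")
```

The blend is linear in the hit rate, so an agent loop that resends a long, mostly unchanged context is where the near-free cache-hit price would matter most.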

[00:08:24]And the timing here is what's interesting. DeepSeek made this exact move earlier this week, making the 75% discount on its Pro model permanent.

[00:08:32]Input dropped to 0.435 per million tokens and output to 0.87, which puts DeepSeek's Pro model at roughly 7 times cheaper than Anthropic and nearly 9 times cheaper than OpenAI on output tokens.
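A 75% discount means paying a quarter of the list price, so the pre-discount prices can be backed out from the new ones. This sketch assumes the quoted figures are US dollars per million tokens and compares output prices only against the $30 GPT 5.5 output rate quoted earlier in the video; no Anthropic price is given in the video, so that ratio is not computed. The function name is hypothetical.

```python
def implied_list_price(discounted_price: float, discount: float) -> float:
    """Undo a percentage discount: the buyer pays (1 - discount) of list price."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return discounted_price / (1.0 - discount)


DISCOUNT = 0.75           # 75% discount made permanent, per the video
DISCOUNTED_INPUT = 0.435  # per million input tokens
DISCOUNTED_OUTPUT = 0.87  # per million output tokens
GPT55_OUTPUT = 30.00      # GPT 5.5 output price quoted earlier in the video

list_input = implied_list_price(DISCOUNTED_INPUT, DISCOUNT)
list_output = implied_list_price(DISCOUNTED_OUTPUT, DISCOUNT)
print(f"implied list prices: input {list_input:.2f}, output {list_output:.2f}")

# How many times cheaper than GPT 5.5 on output, before and after the discount
print(f"vs GPT 5.5 at list price: {GPT55_OUTPUT / list_output:.1f}x")
print(f"vs GPT 5.5 discounted:    {GPT55_OUTPUT / DISCOUNTED_OUTPUT:.1f}x")
```

This gives implied list prices of 1.74 (input) and 3.48 (output). Against GPT 5.5's $30 output rate, the list price works out to about 8.6 times cheaper, close to the "nearly nine times" figure, while the discounted 0.87 works out to about 34.5 times cheaper, so the baseline a comparison uses changes the headline ratio considerably.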

[00:08:44]So in one week, two Chinese labs have permanently dropped flagship pricing into territory that Western labs basically can't match without losing money. This could be a sign that the cost structure of Chinese inference is shifting. DeepSeek hinted that its cut was tied to Huawei Ascend 950 chips shipping in volume, while Xiaomi simply attributed its cut to inference optimization. I'm a bit worried for the API margin business at OpenAI and Anthropic. If this becomes a pattern, and going by current rumors, more Chinese labs may follow this week.

[00:09:12]It's worth watching how this shifts the conversation around what builders are actually willing to pay for a frontier model. But that's it for today's video.

[00:09:20]Make sure you're subscribed to the channel, follow our new newsletter at universeai.behive.com, subscribe to our main channel, World of AI, and support us on X by following Universe of AIZ.

[00:09:32]Until then, I'll see you guys in the next video.

#qwen 3.7 max #qwen 3.7 #alibaba qwen #qwen vs claude opus #qwen beats claude
