Qwen 3.7 Plus is a multimodal AI agent developed by Alibaba's Qwen team that can process text, images, and video while carrying out tasks autonomously. Unlike models that only read text or only look at pictures, this agent can see screens, write and test its own code, call external tools, and complete complex workflows without human intervention. Key capabilities include iterative coding (writing, testing, and fixing code until it works), visual analysis (reading images and pointing to specific screen elements), and tool integration (accessing external information and services). An early preview of the model ranked 16th on Vision Arena, a public leaderboard where people vote on which AI reads images better, which suggests strong visual understanding. Note that Qwen 3.7 Plus reads and analyzes images and video but does not generate them; that functionality requires separate specialized models. This makes it a good fit for developers, visual analysts, and anyone working with mixed content types, while text-only users may find the text-only Qwen 3.7 Max model more appropriate.
Deep Dive
China's New Qwen 3.7 Plus is INSANE
China's new Qwen 3.7 Plus is insane.
What if one AI could see, think, code, and actually do the work? Not just talk about it. Not just answer your questions. Actually look at the screen and take action. China just dropped a model that does exactly that. And almost nobody's talking about it yet. Trust me, you'll want to see this one.
I'm the digital avatar of Julian Goldie, and I help people actually learn and use AI tools in their real work. Not just mess around with them. So today, I'm breaking down this brand new model, what it really does, and where it fits. And stick with me because there's one thing about it that most people are already getting wrong. I'll clear that up before the end. So let's get into it.
The model is called Qwen 3.7 Plus. It comes from Alibaba's Qwen team, and it just went live. Here's the simple version. Most AI models do one thing well. They read text, or they look at a picture. Qwen 3.7 Plus is built to do it all at once. It's what they call a multimodal agent.
That's a fancy word, so let me make it plain. Multimodal means it can take in different kinds of input: words, images, even video. Agent means it doesn't just answer. It can plan, take steps, and actually get a task done. One model. It sees, it thinks, it codes, it acts.
Right now you can use it through the API on Alibaba Cloud's Model Studio. There's also a sibling model called Qwen 3.7 Max, but that one is text-only. The Plus version is the one that can see.
So what can it actually do? Let me walk you through the real features because this is where it gets good. First, it's a coding agent, and not the kind that just spits out code and leaves you to fix it.
The Qwen team lists this out clearly. It can reason through a problem step-by-step. It can write and rewrite its own code. It can call outside tools. It can test its own work. And it can keep looping until the job is done. That last part is the big one. It doesn't stop at the first messy try. It checks the result and keeps going until it's right.
Second, it works across both a normal screen and a command line. So it can handle visual tasks and plain text tasks in the same flow without you switching tools.
Third, it's a visual agent. It can look at an image, understand what's on it, point to the right spot on the screen, and even search for answers based on what it's seeing. That seeing part is worth slowing down on because it changes how you work with it. When it points to the exact spot on the screen, you're never left guessing which button or which line it means. It shows you. And when it runs into something it doesn't recognize, it doesn't just shrug. It can look it up and answer using both what's right in front of it and what it digs up. So, the image alone doesn't have to carry the whole answer. It looks, fills in the gaps, and hands you something you can actually act on.
And fourth, it plays nice with other setups. You're not locked into one app. It can run inside different agent frameworks, so you can plug it into the workflow you already use instead of starting over.
Let me unpack that tool part for a second because it's easy to skip right past.
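As a rough illustration of the write, test, and fix loop described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: `scripted_model` is a hypothetical stand-in that replays fixed drafts instead of calling Qwen 3.7 Plus, and `run_checks` plays the role of an outside test tool. It is not how the Qwen team implements the agent.

```python
from typing import Callable, Optional, Sequence, Tuple

Check = Callable[[dict], None]
Propose = Callable[[Optional[str]], str]


def run_checks(source: str, check: Check) -> Optional[str]:
    """Stand-in for a test tool: run the draft, return an error message or None."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # load the draft code into a fresh namespace
        check(namespace)         # run the checks against what it defined
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def agent_loop(propose: Propose, check: Check, max_attempts: int = 5) -> Tuple[str, int]:
    """Ask for a draft, test it, and feed the failure back until the checks pass."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        source = propose(feedback)            # "write and rewrite its own code"
        feedback = run_checks(source, check)  # "test its own work"
        if feedback is None:
            return source, attempt
    raise RuntimeError(f"no passing draft after {max_attempts} attempts: {feedback}")


def scripted_model(drafts: Sequence[str]) -> Propose:
    """Hypothetical stand-in for the model: replays fixed drafts in order."""
    remaining = iter(drafts)

    def propose(feedback: Optional[str]) -> str:
        # A real agent would use the feedback to revise; this stub ignores it.
        return next(remaining)

    return propose


drafts = [
    "def add(a, b):\n    return a - b\n",  # first try has a bug
    "def add(a, b):\n    return a + b\n",  # second try fixes it
]


def check_add(namespace: dict) -> None:
    assert namespace["add"](2, 3) == 5, "add(2, 3) should be 5"


final_source, attempts = agent_loop(scripted_model(drafts), check_add)
print(f"passed on attempt {attempts}")  # prints "passed on attempt 2"
```

The point of the sketch is the shape of the loop: the first draft fails its check, the failure message becomes feedback, and the loop only stops once a draft passes or the attempt budget runs out.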
Calling outside tools means it doesn't have to do everything from memory. If a task needs some information or an action that lives outside the chat, it can reach for the right tool and fold the result back into the work. Pair that with the step-by-step thinking, and you get something that plans first and then moves instead of just throwing one guess at the wall and hoping it sticks.
On the vision side, an early preview of this model landed at number 16 on a public leaderboard called Vision Arena, where real people vote on which AI reads images better. That made Alibaba the number five lab for vision. Now, a leaderboard spot shows promise. It's not a promise, but it tells you this thing is serious about seeing.
Okay, so let me show you how I'd actually put this to work. So, the first thing I tried was building with it. I gave it this prompt: "Make an HTML website for my new AI Profit Boardroom community." This is right in its wheelhouse. I just described the site I want for the AI Profit Boardroom, and that coding loop does the building, the testing, and the cleanup for me. No babysitting.
Next, I tried something more specific. I gave it this prompt: "Create a single HTML page for a drop shipping profit calculator for AI Profit Boardroom with inputs for product cost, shipping fees, selling price, and monthly order volume." One page, one prompt, and it builds the calculator with those exact input fields working and ready. A hands-on tool like that is exactly the kind of thing I'd put in front of the AI Profit Boardroom so people have something to actually use, not just watch.
And here's where the seeing side really shines. I took a draft intro frame for the AI Profit Boardroom, a title card with the words AI Profit Boardroom on it, and I handed that image straight to Qwen 3.7 Plus. It reads what's actually on the frame, then it tells me what's working, what's hard to read, and what I'd want to change before I use it. That's the visual agent part.
It's not me guessing what looks off. It's the model looking at the real thing and giving me grounded feedback for the AI Profit Boardroom.
Now, remember that one thing I said people are getting wrong? Here it is. A lot of folks see the word multimodal and assume this model makes images and videos for you. It does not. Qwen 3.7 Plus understands images and video. It reads them. It does not create them. Alibaba's image and video makers are separate models. So, if someone tells you to use this one to spin up a flashy animated intro, that's just not what it is. Knowing that one fact saves you a ton of wasted time and a lot of frustration.
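Going back to the drop shipping profit calculator prompt from earlier: the arithmetic behind those four input fields can be sketched in a few lines of Python. The formulas are an assumption, since the video does not show what the generated page computes. This sketch treats shipping fees as a per-order cost paid by the seller, and the names `ProfitInputs`, `monthly_profit`, and `margin_percent` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfitInputs:
    """The four fields named in the prompt (names are hypothetical)."""
    product_cost: float    # what the seller pays per unit
    shipping_fees: float   # assumed to be a per-order cost paid by the seller
    selling_price: float   # what the customer pays per unit
    monthly_orders: int    # monthly order volume


def profit_per_order(inputs: ProfitInputs) -> float:
    """Assumed formula: selling price minus product cost minus shipping."""
    return inputs.selling_price - inputs.product_cost - inputs.shipping_fees


def monthly_profit(inputs: ProfitInputs) -> float:
    """Per-order profit multiplied by monthly order volume."""
    return profit_per_order(inputs) * inputs.monthly_orders


def margin_percent(inputs: ProfitInputs) -> float:
    """Per-order profit as a percentage of the selling price."""
    if inputs.selling_price <= 0:
        raise ValueError("selling_price must be positive")
    return 100 * profit_per_order(inputs) / inputs.selling_price


example = ProfitInputs(
    product_cost=8.0, shipping_fees=4.0, selling_price=20.0, monthly_orders=150
)
print(monthly_profit(example))  # 1200.0
print(margin_percent(example))  # 40.0
```

A quick hand check like this is also a simple way to follow the "test it on your own work" advice: run a few known numbers through the generated page and compare.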
So, who actually needs this? If you build things like web pages, simple tools, or little apps, this is for you. That coding loop alone is worth a look because it does the boring fix-it-yourself part for you. If you work with a lot of screenshots, charts, or documents, this is also for you because reading visuals is its strong suit. And if you're just tired of tools that answer but never actually do anything, this one is built to take action.
And there's one more group I'd point out. If your work mixes things together, a screenshot here, a block of text there, a bit of code to tie it all up, this is made for that. A lot of tools force you to split that across three different apps. This one keeps the whole thing in one flow.
But I'll be straight with you. If all you ever do is write plain text, you might not need the Plus version at all. The text-only Max model could be plenty. That's the honest answer, and I'd rather give you that than oversell it.
Here's where Qwen sits right now. The team has been shipping fast. They've got the text-only Max for heavy thinking, and now the Plus for anything with images or video. They're clearly leaning hard into agents, models that go do tasks, not just chat back.
And on top of that, the platform added safety guardrails, so the agent stays inside set limits when it runs commands or edits files. That matters a lot the moment you let an AI actually touch your work. The platform also has a piece that learns from real use. When the agent runs tasks for real, that feedback gets used to sharpen how it performs over time. So, this isn't a frozen tool that stays exactly the same. It's set up to keep getting better at the actual job.
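To make the idea of an agent staying inside set limits a bit more concrete, here is a small Python sketch of two checks a wrapper might run before letting an agent execute a command or edit a file. This is not Alibaba's guardrail implementation, which the video does not describe. The allowlist, the workspace path, and the function names are all hypothetical.

```python
import shlex
from pathlib import Path

# Hypothetical limits chosen for illustration only.
ALLOWED_COMMANDS = {"ls", "cat", "python", "pytest"}
WORKSPACE = Path("/tmp/agent-workspace")


def command_allowed(command_line: str) -> bool:
    """Allow a command only if its program name is on the allowlist.

    Meant for commands run without a shell (as an argument list), so
    operators such as && are treated as plain arguments, not chaining.
    """
    try:
        parts = shlex.split(command_line)
    except ValueError:  # for example, an unterminated quote
        return False
    return bool(parts) and parts[0] in ALLOWED_COMMANDS


def path_allowed(target: str, workspace: Path = WORKSPACE) -> bool:
    """Allow file edits only for paths that resolve inside the workspace."""
    root = workspace.resolve()
    resolved = (root / target).resolve()  # collapses ".." and absolute targets
    return resolved == root or root in resolved.parents


print(command_allowed("pytest -q"))     # True
print(command_allowed("rm -rf /"))      # False
print(path_allowed("notes/todo.txt"))   # True
print(path_allowed("../etc/passwd"))    # False
```

Checks like these only decide whether an action is permitted. What happens after a refusal (asking the user, logging, stopping the task) is a separate design choice.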
Now, if you're watching this and thinking, "Okay, but where do I even start with all of this?" That's exactly what we cover inside the AI Profit Boardroom. We walk through setting a model like Qwen 3.7 Plus up through the API step-by-step, so you're not stuck staring at a blank screen on day one. We run live coaching calls where you can bring your own setup and ask your questions in real time. There are walkthroughs for using it as a coding agent and as a visual agent, so you see it in action, not just in theory. And there's a clear roadmap, so you know what to try first, second, and third instead of guessing. If you're watching this video and you actually want to use this tool, not just hear about it, the AI Profit Boardroom is built for exactly this kind of thing.
Let me leave you with a few quick tips before you go. Tip one, start small. Give it one clear task and let it run its full loop before you pile more on top. Tip two, remember what it's for. It reads images and video. It does not make them. Use it where seeing and acting are the point. Tip three, test it on your own work before you trust it with anything big. A leaderboard score is a hint, not a guarantee. Tip four, a couple of things aren't public yet, like the price and how much it can hold at once. So, check the current details on Model Studio before you build something huge around it.
Okay, that's the full breakdown. If you want the full process, SOPs, and 100+ AI use cases like this one, join the AI Success Lab. Links in the comments and description. You'll get all the video notes from there, plus access to our community of 75,000 members who are crushing it with AI.
And if you're ready to actually go and try Qwen 3.7 Plus, here's the thing. The first time you wire it up through the API, you're going to hit little snags. A setting that's off, a prompt that doesn't land the way you hoped, a result you're not quite sure how to read. That's completely normal, and it's exactly where most people give up. Inside the AI Profit Boardroom, you don't have to figure it out alone. You get the coaching calls to unstick you, the tutorials that show you the exact steps, the roadmap that tells you what to do next, and the prompts that get you a real result a whole lot faster. So, if you want to go from just watching this to actually building with it, come join us at aiprofitboardroom.com.