Claude Opus 4.7 introduces a self-verification behavior: the model checks its own outputs before reporting back, which reduces the rounds of back-and-forth error correction needed during complex coding tasks. The benchmark gains are substantial. SWE-bench Pro rose from 53.4% to 64.3% (about 11 points) and SWE-bench Verified rose from 80.8% to 87.6% (about 7 points). The model also shows stronger visual reasoning, scoring 82.1% without tools and 91% with tools on a visual reasoning benchmark, compared to 69.1% and 84.7% respectively for Opus 4.6. These improvements make Opus 4.7 particularly valuable for complex engineering tasks where quality directly affects outcomes, while simpler repetitive work remains better suited to the more cost-effective Sonnet 4.6 model.
Deep Dive
A Month with Claude Opus 4.7 — Did Anthropic Actually Fix the Problem?
Just a month ago, AMD's senior director of AI publicly declared that Claude cannot be trusted to perform complex engineering. Days later, Anthropic shipped Opus 4.7 at the same price as 4.6. So, did it work? The benchmarks say partly yes. For example, it leads on SWE-bench Pro, the agentic coding gains are real, and serious teams are calling it the strongest coding model they have ever evaluated. But benchmarks are one thing.
So let's put Opus 4.7 to a real test: building a website with minimal guidance to see what it can actually produce.
All right. So let's walk through what actually changed in Opus 4.7 because there are several improvements here worth understanding before you use it.
Not every update affects every use case equally, and knowing which improvements are relevant to your work can change how you prompt it and what you can expect from it. Here is how Anthropic describes it directly: Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks. Users report being able to hand off their hardest coding work, the kind that previously needed close supervision, to Opus 4.7 with confidence. Opus 4.7 handles complex, long-running tasks with rigor and consistency, pays precise attention to instructions, and devises ways to verify its own outputs before reporting back.
All right. So the key phrase here is verifying its own outputs before reporting back. This is a new behavior: previous models would complete a task and hand it back, and now Opus 4.7 checks its own work first. For developers doing complex builds, that means fewer rounds of back and forth catching errors the model should have caught itself.
Now let's look at the numbers, because this chart is worth spending a moment on. The two agentic coding entries are where Opus 4.7 makes its biggest statement. On SWE-bench Pro, the hard real-world coding benchmark, Opus 4.7 scores 64.3% and Opus 4.6 was at 53.4%, which is roughly an 11-point jump. GPT-5.4 sits at around 57.7% and Gemini 3.1 Pro is at 54.2%, so Opus 4.7 leads the generally available models by a meaningful margin. On SWE-bench Verified, the broader verified coding benchmark, Opus 4.7 reaches 87.6% and Opus 4.6 was at 80.8%. Again, it is a clear step forward.
And this is not just a benchmark number. On one key benchmark, Claude Opus 4.7 resolves three times more production tasks than Opus 4.6, with double-digit gains in code quality and test quality. This is the kind of improvement that shows up in real daily work, not just controlled evaluations.
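To make the size of those coding gains concrete, here is a small sketch that works out the absolute and relative improvements from the scores quoted above. The function names are just illustrative; only the four scores come from the video.

```python
# Scores quoted in the video, in percent.
scores = {
    "SWE-bench Pro": {"Opus 4.6": 53.4, "Opus 4.7": 64.3},
    "SWE-bench Verified": {"Opus 4.6": 80.8, "Opus 4.7": 87.6},
}


def point_gain(old: float, new: float) -> float:
    """Absolute gain in percentage points, rounded to one decimal."""
    return round(new - old, 1)


def relative_gain(old: float, new: float) -> float:
    """Gain relative to the old score, in percent, rounded to one decimal."""
    return round((new - old) / old * 100, 1)


gains = {}
for name, s in scores.items():
    old, new = s["Opus 4.6"], s["Opus 4.7"]
    gains[name] = (point_gain(old, new), relative_gain(old, new))
    print(f"{name}: +{gains[name][0]} points ({gains[name][1]}% relative)")
```

The distinction matters when reading charts like this one: a gain in percentage points is not the same as a relative improvement, and the two benchmarks differ noticeably on both measures.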
The community feedback reflects this as well. After weeks of developers publicly saying Opus 4.6 had gotten worse on complex engineering tasks, the reception to 4.7 has been noticeably different. The consensus coming back is that the hard tasks feel like hard tasks again, not problems the model gives up on or half-solves. Now, Mythos Preview still leads in almost every category on this chart. But Mythos is not generally available, and Anthropic has been clear that it does not plan to make it publicly accessible anytime soon due to safety considerations. So Opus 4.7 is the best model you can actually use right now. One more thing worth mentioning here is the visual reasoning.
Opus 4.7 scores 82.1% without tools and 91% with tools on a visual reasoning benchmark. Opus 4.6 was at 69.1% and 84.7% respectively. That jump is pretty significant. For anyone working with design files, technical diagrams, or document analysis, that is not a minor improvement. It is a meaningful capability expansion. Those are the improvements that matter most in practice.
Now let's talk about what we are building today and put Opus 4.7 to work on a real project. Here's the thing about benchmarks: they may tell you what a model is capable of, but they do not tell you what it feels like to actually build with it. So let's find out. We're building a full marketing website for an example game called Celestra, a fantasy MMORPG. We are aiming to create something that requires the model to make real design and layout decisions, not just generate boilerplate. Anthropic says Opus 4.7 is more tasteful and creative when completing professional tasks, and a game website with a clear quality bar sounds like a pretty good test for that, right? So let's see if this holds up.
Here we are in Claude Code, and we kept the project folder intentionally simple. Just two files, that's it. The first one is the PRD, the product requirements document. Let me walk through it quickly. This is a full marketing site for an MMORPG with six sections in total: an animated hero section, a world introduction section with lore copy and a two-column layout, a classes section showing six playable roles, each with its own SVG illustration and hover animations, a core features grid, an Explore the Realms section with four world regions, and a news and blog section. There is also a community call to action and a footer. The design system is already defined in the PRD. The second file is a video asset for the hero section, the cinematic background that anchors the entire opening experience.
Now that we have that ready, let's start building and see what Opus 4.7 produces. Normally, for a build like this, we prepare prompts that break the implementation into multiple phases. That is the safe, controlled approach for complex projects. But today we are testing Opus 4.7, so we are going to hand over the full PRD and ask it to plan the entire build for us. First we let it think like a senior front-end architect, and then we execute from that plan. Here is what we are giving it: we are asking Claude to analyze the full PRD and produce a professional implementation plan before writing a single line of code. The plan needs to cover the project architecture, component hierarchy, section-by-section build strategy, animation strategy, reusable UI systems, asset structure, responsive approach, performance optimization, SEO structure, and a final phased execution roadmap.
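The exact prompt is not shown in full, so the following is only a sketch of how a plan-first prompt along those lines could be assembled. The list of plan topics comes from the description above; the wording of the prompt, the function name `build_planning_prompt`, and the placeholder PRD text are all assumptions for illustration.

```python
# Topics the plan is asked to cover, as listed in the video.
PLAN_SECTIONS = [
    "project architecture",
    "component hierarchy",
    "section-by-section build strategy",
    "animation strategy",
    "reusable UI systems",
    "asset structure",
    "responsive approach",
    "performance optimization",
    "SEO structure",
    "final phased execution roadmap",
]


def build_planning_prompt(prd_text: str) -> str:
    """Assemble a plan-first prompt. The wording is illustrative, not the video's."""
    bullets = "\n".join(f"- {item}" for item in PLAN_SECTIONS)
    return (
        "Act as a senior front-end architect. Analyze the PRD below and produce "
        "a professional implementation plan before writing any code.\n\n"
        f"The plan must cover:\n{bullets}\n\n"
        f"PRD:\n{prd_text.strip()}\n"
    )


# Placeholder PRD text; in practice this would be the contents of the PRD file.
prompt = build_planning_prompt("# Celestra marketing site\n(PRD contents go here)")
print(prompt)
```

Keeping the topic list separate from the prompt text makes it easy to add or drop a planning topic without rewriting the whole prompt.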
We are giving it minimal guidance beyond that, and the PRD is attached. The rest is up to Opus 4.7. But before we run this, let's make sure we are actually on the right model. We type the /model command in Claude Code and select Opus 4.7 from the list. Now we are on it, so let's paste the prompt and hit enter. This can take a moment, so just be patient.
Great. Claude is done, and here is the summary at the bottom. Let's scroll up and read through the full plan from the top. It looks like the response got cut off before we could see the full output from the beginning. That happens occasionally on longer generations in Claude Code. What we are going to do is ask Claude to save the implementation plan as a markdown file. That way we have the complete plan in one place and can reference it throughout the build.
This is just a minor task. It does not need Opus 4.7's reasoning, so we will switch down to Sonnet 4.6 for this step to save tokens and then come back to Opus 4.7 when we start the actual implementation. And here the implementation plan .md file is ready.
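The switching habit described here can be written down as a simple rule of thumb. This is only a sketch: the `Task` type and its `needs_deep_reasoning` flag are hypothetical, and the strings are the display names used in the video, not API model identifiers.

```python
from dataclasses import dataclass

# Display names as used in the video; these are not API model identifiers.
HEAVY_MODEL = "Opus 4.7"
LIGHT_MODEL = "Sonnet 4.6"


@dataclass
class Task:
    description: str
    needs_deep_reasoning: bool  # hypothetical flag, decided by whoever plans the work


def pick_model(task: Task) -> str:
    """Use the heavier model only where the reasoning is worth the tokens."""
    return HEAVY_MODEL if task.needs_deep_reasoning else LIGHT_MODEL


tasks = [
    Task("Plan the full build from the PRD", True),
    Task("Save the plan as a markdown file", False),
    Task("Implement the planned phases", True),
]

for task in tasks:
    print(f"{pick_model(task):<10} <- {task.description}")
```

In Claude Code the switch itself is done by hand with the /model command, so the value of a rule like this is mostly in deciding ahead of time which steps justify the larger model.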
Let's check it out first. The plan opens with the project architecture: Next.js App Router, components organized by section, a dedicated animations directory, a utilities folder, and a CMS data layer. It is clean and logical. The component hierarchy is laid out the way a senior engineer would do it as well: shared layout components at the top for the navigation bar and footer, then a scroll wrapper, section-level components below, and reusable building blocks within each. It is not just a flat file list. The animation strategy is solid too: GSAP ScrollTrigger for scroll-driven reveals and parallax, Framer Motion for micro-interactions, and explicit guidance on keeping motion elegant with no aggressive bouncing, which matches the PRD motion guidelines exactly. Reusable UI systems, asset optimization, and a mobile-first responsive approach are all covered. So performance thinking is built into the plan before a line of code exists.
Overall, the plan is pretty solid. Opus 4.7 reads the PRD accurately and produces an architectural response that reflects the brief rather than a generic Next.js boilerplate plan, and this is what we wanted to see, right? So we can give Claude the go signal to proceed with the implementation. This can take some time, so just let it run.
All right. After a couple of minutes, phase one is done. The foundation is scaffolded and the hero section is built. We could open the dev server now, but let's wait until all four phases are in place before reviewing. Let's give Claude the green light for phases two and three. Great. Phases two and three are done, and most of the content sections are now complete.
The news section, community call to action, and footer are the only remaining parts, and those are covered by phase four. Let's run it as well. Great.
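The phased build described above can be sketched as a small roadmap with a coverage check. Phase one and phase four follow what the video says; how phases two and three divide the middle sections is not stated, so that split is an assumption, as are the names used here.

```python
# Sections named in the PRD walkthrough.
SECTIONS = [
    "hero",
    "world introduction",
    "classes",
    "core features",
    "explore the realms",
    "news and blog",
    "community call to action",
    "footer",
]

# Phases 1 and 4 follow the video. The split between phases 2 and 3 is assumed.
PHASES = {
    1: ["foundation scaffold", "hero"],
    2: ["world introduction", "classes"],
    3: ["core features", "explore the realms"],
    4: ["news and blog", "community call to action", "footer"],
}


def uncovered(sections: list[str], phases: dict[int, list[str]]) -> list[str]:
    """Return the sections that no phase builds."""
    built = {item for items in phases.values() for item in items}
    return [s for s in sections if s not in built]


missing = uncovered(SECTIONS, PHASES)
print("Uncovered sections:", missing or "none")
```

A check like this is a cheap way to confirm that a generated roadmap actually accounts for every section in the PRD before any phase is run.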
After some time, phase four is done as well. Let's open the project in the terminal and run npm run dev to bring up the dev server. And here it is. The hero section looks pretty great. The cinematic video background is in, the headline and calls to action are positioned cleanly, and the overall opening impression lands pretty well. Now we can scroll through the rest of the site. There is a lot of white space. The sections are structurally correct and the layout is sound, but without a rich visual library behind it, the sections feel lighter than the PRD intended. That is not a failure of Opus 4.7. It is a resource constraint. We did not bring in Midjourney or additional asset generation tools for this build, and Claude did its best to compensate with gradient backgrounds and placeholder visuals. But the gap between the PRD's visual ambition and what an AI can generate without external assets is visible here. For a first pass with a single prompt and no additional tooling, this is a pretty solid foundation. The structure is right and the motion logic is in place. What it needs is visual depth. So let's refine it.
All right, so we are back in Claude Code, and here is the first refinement prompt.
We are flagging the overuse of white throughout the site and asking Claude to pull in more visual background elements. The text alignment is also inconsistent starting from the Your Calling section, and we want everything from there centered. The custom SVGs on the ark and in the Explore the Realms section are not working visually, so we are asking Claude to generate a proper image for the ark and replace the realms SVGs with a 3D interactive map. The core system cards need to be uniform in size so they align correctly, and there are some minor changes to the dev log entry as well. Let's run it and see what comes back.
Now it's finished, so let's check the dev server. We kept the hero section exactly as it was. The cinematic video background, the headline, and the calls to action were already working, and it sets the right tone for everything below it. But when we scroll down, it's immediately clear the refinement prompt made a real difference. The white space is gone, the sections have visual weight now, and the layout reads as intentional rather than just scaffolded. We did run another prompt to redesign some of the lower sections. But just like with the very first prompt, we gave Claude Code a lot of creative freedom: minimal constraints, some direction on what we wanted, and no handholding on the specific implementation choices. That matters because the output you are seeing is not the result of micromanaging every single element. It is Claude making real design decisions within a defined direction.
And honestly, it looks pretty good. The details feel intentional throughout, the color system is consistent, and the motion has the right weight. For something built in one sitting from a PRD and two refinement prompts, this is a pretty good result. Of course, it's not going to look like something a professional studio spent weeks on.
Think of what we built here as getting you about 60% of the way there, fast. The remaining 40% needs human intuition, design taste, and the details that can only come from someone who deeply understands your brand and your audience, and that part still requires a person. For something that genuinely feels custom and premium, at the level the best AI-built websites are hitting right now, you need more time, more resources, and more deliberate design decisions layered on top of what Claude produces.
But here's the point worth mentioning. Opus 4.7 produced this with the minimal guidance we gave it: no multi-day brief, no design team, no phased back and forth over weeks. Imagine what this workflow produces when you bring real assets, a refined PRD, dedicated time for iteration, and a clear creative vision behind every single prompt. That is the ceiling worth building towards, and Opus 4.7 just made it significantly more accessible to get there.
So after a month of using Opus 4.7, the verdict is clear: it is a real upgrade, not a marginal one. And the best part is that it is the same price as 4.6. The real question is where it fits in your workflow. It brings better performance on the hardest coding tasks, strong visual reasoning, and more creative and tasteful outputs. For the right situations, like complex builds, agentic workflows, and tasks where quality directly affects the outcome, Opus 4.7 is the obvious choice. Of course, for simpler repetitive work, Sonnet 4.6 is still the smarter pick, and that has not changed in the month since launch. Anthropic is not slowing down, and neither should you. So if you want more in-depth tutorials like this and to actually make more money with AI, feel free to join our Andy no code community.
You can find the links in the description below. And as always, if you found this video helpful, hit the like and subscribe buttons for more videos like this in the future. I'll see you in our next