Third-party AI harnesses like Cursor can extract more performance from models than the model companies' own first-party harnesses (Anthropic's Claude Code, OpenAI's Codex). The likely reason is that Cursor focuses on developer experience and has accumulated institutional knowledge about how users interact with many different models, whereas model providers primarily focus on improving the models themselves rather than optimizing the developer interface.
Deep Dive
Cursor is now better than Claude Code at running Opus 4.7
So, Anthropic trains Opus 4.7, so they know exactly how the model thinks.
They ship Claude Code, their own harness, and then Cursor, a third party with no privileged access, is able to extract more performance from Opus than Anthropic does. It's the same story with OpenAI, GPT-5.5, and Codex. I think it's mostly the second thing. Cursor has been fighting this battle for years now, since they were founded in 2022, trying to figure out every last detail of the developer experience, whereas Anthropic and OpenAI are really focused on building the best models, right? So Cursor has institutional knowledge across all models and how they work, and with that holistic understanding of how every model behaves in a coding workflow, you can build a better harness. So, yeah, it's super ironic that last week Elon bought, or has the option to buy, Cursor at a $60 billion valuation, and this week everyone's saying that Cursor is actually the best way to harness the capabilities of the models for development. Super interesting. It's probably all the user insights Cursor has generated over the years: how users interact with all these models, how those users get the best results from them, and then building that into their harness.
Whereas the big labs train their models on general knowledge; they're not training them specifically for coding use cases. Anthropic might be close to that, but still. I think Cursor has some unique insights there, and that's probably why, as this tweet says as well, they're outperforming the base models or the IDEs that Anthropic and OpenAI deliver to us.
>> If we look at different kinds of harnesses, even OpenClaw could be considered to have its own kind of harness. And when you have these harnesses inside the IDE, they work together to create new tools that let people work with Claude Code or with GPT-5.5. So I think this actually contradicts a lot of what we were disagreeing on a few weeks ago, which was whether IDEs have value in the future. I still personally believe they really do, because they can work not with just one model but with different models. An advantage of Cursor versus Claude Code is that Cursor could use GPT-5.5 for some tasks and Claude for others, and we now know the two are much closer than they used to be. So if you have one model focusing on the planning and one actually doing the coding, maybe that's where Cursor's gains are coming from. If you think about it, Cursor is at an advantage: when we talk about harnesses, we tend to think, okay, if you're using Claude, you're just using Claude. But they may be using other models behind the scenes that act as a harness, another layer on top. They can pull from different sources and decide, for example, that GPT-5.5 writes the prompt, which Claude Opus 4.7 then uses to actually run that command. So they may be batching together different models, and being a provider of many different models might be an advantage. I know Perplexity has talked about that a lot: they feel they're at an advantage because they can route requests efficiently based on what they think is the best way to get to the desired outcome.
>> Yeah, and on that note, I've also heard a lot of people say they use, for example, Gemini for planning, Opus for orchestration, Codex or GPT-5.5 for execution, and then another model to check whether the code is written correctly. Same with a Naval podcast on vibe coding that I listened to, I think two days ago, where he said that every time he pushes something to GitHub, an agent checks what was pushed and has three other models review the code written by, for example, Opus. So you see a lot of orchestration happening, and that could be happening under the hood at Cursor as well. It's very interesting how different models are now starting to talk to each other and check up on each other. People don't really trust one specific model anymore.
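The plan, execute, and cross-check pattern described here can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how Cursor or any lab actually implements it: each "model" is a hypothetical plain function from prompt to reply (real SDK clients would have to be wrapped to fit that shape), the prompts are made up, and accepting code on a strict majority of reviewer approvals is an assumption chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for a model client: any function that maps a prompt to a reply.
# Real clients (Anthropic, OpenAI, Google, Cursor) would need to be wrapped to fit this shape.
Model = Callable[[str], str]


@dataclass
class PipelineResult:
    plan: str
    code: str
    approvals: int
    reviewers: int

    @property
    def accepted(self) -> bool:
        # Assumed rule: accept only if a strict majority of reviewer models approve.
        return self.approvals * 2 > self.reviewers


def run_pipeline(task: str, planner: Model, executor: Model,
                 reviewers: List[Model]) -> PipelineResult:
    # Step 1: one model turns the task into a plan.
    plan = planner(f"Write a step-by-step plan for: {task}")
    # Step 2: a second model writes code from that plan.
    code = executor(f"Implement this plan:\n{plan}")
    # Step 3: independent models review the code; count replies that start with APPROVE.
    approvals = 0
    for review in reviewers:
        verdict = review(f"Answer APPROVE or REJECT for this code:\n{code}")
        if verdict.strip().upper().startswith("APPROVE"):
            approvals += 1
    return PipelineResult(plan, code, approvals, len(reviewers))


# Usage with offline stubs in place of real model calls (all hypothetical).
def stub_planner(prompt: str) -> str:
    return "1. Accept two numbers\n2. Return their sum"


def stub_executor(prompt: str) -> str:
    return "def add(a, b):\n    return a + b"


stub_reviewers = [lambda p: "APPROVE", lambda p: "approve, looks fine", lambda p: "REJECT"]

result = run_pipeline("add two numbers", stub_planner, stub_executor, stub_reviewers)
print(result.approvals, result.accepted)  # 2 True
```

Keeping every role behind the same function shape is what would let a harness swap which vendor's model plans, writes, or reviews without touching the pipeline logic.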
They're getting so good, and they're so convincing about how good they are, that we sometimes just forget to check.
And I think we always need to have some checks in place, and maybe that's the strength of Cursor, or of harnesses in particular. Something else Cursor announced, which I think is a big deal, is the Cursor SDK. Cursor tweeted, "We're introducing the Cursor SDK, so you can now build agents with the same runtime, harness, and models that power Cursor. Run agents from CI/CD pipelines, create automations from end-to-end workflows, or embed agents directly inside of your product."
Does this mean that anyone can now spin up a vibe coding tool, plug in Cursor, and let users work with the Cursor harness inside their product?
Ben, do you have any idea what this means for app development?
>> Well, look, initially Anthropic and OpenAI were just providers of the models, right? Then they came out with Codex and Claude Code and got into the IDE space, and this is Cursor's version of going that way.
So every company is going at it its own way, trying to spread the surface area at which it touches this whole software engineering landscape, right? Basically, Cursor is now competing directly with its own model suppliers. OpenAI wants to own the agentic layer; that's exactly what their Agents SDK is for. Anthropic wants to do the same with computer use; they launched the MCP standard and this whole ecosystem. Now Cursor wants to own that layer as well, and these three companies are deeply commercially entangled with each other. Cursor pays huge sums to OpenAI and Anthropic for all this inference, but now they're competing for the layer those labs have wanted to dominate.
So there's an interesting frontier-lab-versus-application-layer collision coming, and the Cursor SDK is the biggest leap Cursor has made in that direction.