GPT-5.5 successfully prioritizes direct execution over over-engineered reasoning, but its poor context retention makes it a brilliant tool with a very short memory. It is a significant step forward in efficiency that remains hampered by its own cautious and fragmented nature.
Deep Dive
GPT-5.5 is the best model ever made (but there's a catch)
So, I know I'm a little late on this one. 5.5 has been out for about a week now, but I really did want to make a video on this because I have a lot of feelings. I've actually been using this model for about 4 weeks. OpenAI was kind enough to give me early access to it. I have personally been loving this model.
There is a ton to talk about with how different it is. This is not a 0.1 increase. One of the things I've actually been complaining about a ton with this model is that I hate the name GPT 5.5. I know what they're going for there. Last year we had GPT 4.5 come out, and that was like the halfway point to GPT 5. I think it was a really, really big model with, I think, new pre-training or something like that. I don't know the exact details, but it was kind of the beginning of what would become GPT5. And that is definitely what we're dealing with here.
GPT 5.5 is the beginning of what will become six. And I know why it's not six.
It can't be six because if they had called this GPT6 the economy would have collapsed in on itself. The problem with just naming things is just the way humans work. It's dumb, but it's true.
The name carries a lot of weight, and GPT6 needs to be a cataclysmic model.
Like, it needs to be the model. GPT5 was really good, but it did not live up to the hype of GPT5. And if this was GPT6 and it didn't feel like a freely available Mythos model, if it didn't live up to the Anthropic Mythos hype, the economy would be [ __ ]. So, I get why they didn't do it. This is definitely not the final evolution of what this new model is, but it is a new model. This is not just a slight evolution on top of 5.4.
This is a fundamentally different experience and it is an experience that I personally love. Throughout my time using this model, I've pretty much entirely been using it on low reasoning.
This is something that they kind of keyed us into early on during the testing: hey, this model is really strong right out of the box. You don't need high reasoning anymore. You can use it on low reasoning or even no reasoning and it will actually perform really, really well. So, for most of my work, I've been using GPT 5.5 on low reasoning within Pi. We're going to talk about Pi a little bit later in this video, just cuz I've been having so much [ __ ] fun with it. But also, I've been posting this a lot, and I'm going to say this a lot in this video: I'm going to be very insufferable about the fact that you need to use GPT 5.5 on low reasoning.
The thing about old OpenAI models, like if you go back to GPT 5.4, is that the base model, the foundational pre-training layer, just the core corpus of knowledge, was kind of [ __ ] [ __ ]. Like, it was not very good. If you used GPT 5.4 on no reasoning or low reasoning, it was a very unimpressive model. Which is why everyone would default to GPT 5.4 on high reasoning on fast mode, because that would get you all of the benefits of OpenAI's RLing.
They're really, really good at reasoning. They know how to make amazing reasoning models. They did an incredible job on 5.4 and were able to take what was a pretty weak model and make it incredible through reasoning. The thing about GPT 5.5 is, like I said, this is a new model. This is new pre-training. And what that means is that the foundational base model is actually finally good. This is the thing that Anthropic has had over the other labs for a while. If you go back and watch my old videos from last fall, I was super hyped on Anthropic models. And I actually really liked using Anthropic models with low to no reasoning because the core base model was really, really good. And when you have a smart base model, you no longer need the reasoning.
And since 5.5 is smart, you can just get away with not giving it a ton of reasoning. And what the reasoning serves to do is give the model cycles to overthink things, because this model is incredibly smart. It is also very, I don't know what the right word for it is. I'm going to use aligned here, although maybe that word's kind of loaded at this point. It does what you tell it to a fault. Like, to an insane degree, it will do exactly what you tell it. And it is a little prone to overthinking things. I've noticed if you give it higher reasoning, it's going to do some weird [ __ ]. A good example of this is when Julius was trying out the model. He was having it remove a feature from T3 Code because there was some legacy thing that they didn't want anymore. So he had the model remove it.
It removed it correctly. All was good.
But then the model added [ __ ] regression tests to ensure that the feature was actually removed. It literally added in a test to make sure that when you run the function, it doesn't exist. That is a level of stupid that just [ __ ] sucks. That is the thing that 5.4 always did, where it was super, super thorough. It is that to an obscene degree that just, [ __ ] no. It's terrible. So, that's the kind of weird thing that it can end up doing if you just let it go off the rails with too much reasoning. But when you have it on low reasoning, this thing is insanely fast. It is really smart. It will do exactly what you tell it to, and I have personally been having an incredible time actually using it. The only thing I've been liking better than this model, though, is today's sponsor.
I've been building so much more over the last couple of months. These coding agents make building things so much more fun, so much more productive, at least until you need to deploy it. I'm in constant need of more VMs, more databases, more Redis instances, more storage buckets. All of these things constantly have to be hooked up to all of my different projects, and it just kind of sucks. Unless you're using today's sponsor, Railway. They are the best place to deploy pretty much anything. I have an insane amount of stuff already running on Railway. You can see this one, for example, has a TanStack Start app, a SvelteKit app, a normal Hono API, another SvelteKit app, and a Redis instance all running on Railway's bare metal VMs, which are absurdly fast. You can easily see what's going on with this really nice dashboard.
And on top of all of that, the pricing is insane. Even though Railway isn't a true serverless platform (you're getting long-running instances, which is a very powerful thing to have), you get the serverless benefit of only being billed when you're actually using the CPU. That means the days of overprovisioning and losing a bunch of money at night, when your CPU just isn't getting hit but you're paying for some super big CPU that you need during peak traffic, are gone with Railway. And on top of all of that, they are the best platform to let your coding agents actually deploy stuff. One of the biggest problems I've been running into is that it's really, really annoying to hook up all of these cloud services using these agents.
Railway just kind of solved it. Their agent directory has support for basically every single coding agent you would care about. Personally, I'm a Codex guy, so I have the Railway skill set up globally within my Codex instance. So anytime I want to deploy one of my apps or add a database or add some Redis or set up a storage bucket, I can just tell the agent to do it. It'll use Railway's really powerful CLI, get it all done, and it just kind of works.
There's really nowhere else I would recommend deploying in 2026. You should really go check out Railway at davis7.linkrailway.
So, before we talk about what I've been using it for, I do want to address the pricing, cuz if you just go look at their pricing page, it looks really bad. GPT 5.4 was $2.50 in, $15 out. GPT 5.5 is $5 in, $30 out. It is literally double the price of the old model, which again, on paper, looks really, really bad. You are just going to be doubling your cost, doubling your usage in Codex. You're just not going to get as much value out of the model. The price is getting squeezed, blah blah blah. I actually haven't found that to be happening nearly as much as I expected.
Yes, the core of the model is much more expensive, but it is also much more efficient. I found that, generally speaking, for the same exact tasks, I'm using probably about 50% or less of the tokens that I would have used with 5.4. Because again, remember, with this model, 90% of the time you should be using it on low reasoning. And when you're using it on low reasoning, it is incredibly efficient. It does not use that many tool calls. It's super accurate about only using the exact ones that you need, exactly when you need them. It doesn't think too much because it's on low reasoning. And since the base model is just naturally really smart, it doesn't need to go that long or that hard. As a result, you're not going to be accruing nearly as much cost as it feels like you might be.
I've been tracking this pretty closely within my Codex dashboard. Unfortunately, I can't show it cuz it has personal information in there. You just have to take my word for it that I've been going very hard on this in my coding agents: Pi, Codex, T3 Code, all that stuff. I've also been using it a ton on my OpenClaw instance because I have set that up. I'm sorry in advance for the psychosis that I have been subjected to via GStack. It's a thing. We'll talk about it later. But the thing is, I've been going very hard with this. I've done a fuckload of tokens with it, and my usage has never dropped below 90%. It just doesn't hit it at all because it's so [ __ ] efficient, and OpenAI doesn't really have a problem running the model.
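As a rough sanity check on the pricing argument above, here is a minimal sketch of the arithmetic. It assumes the quoted prices are USD per million tokens (the transcript does not state the unit), and the token counts are made-up numbers chosen only to illustrate the "about 50% of the tokens" claim. `Pricing` and `task_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    # Assumption: prices are USD per one million tokens.
    input_per_million: float
    output_per_million: float


GPT_5_4 = Pricing(input_per_million=2.50, output_per_million=15.00)
GPT_5_5 = Pricing(input_per_million=5.00, output_per_million=30.00)


def task_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one task under the given pricing."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


# Hypothetical task: the older model burns 1M input / 200k output tokens,
# and the newer model needs half as many of each for the same work.
old_cost = task_cost(GPT_5_4, 1_000_000, 200_000)
new_cost = task_cost(GPT_5_5, 500_000, 100_000)

print(f"5.4: ${old_cost:.2f}  5.5: ${new_cost:.2f}")
```

Under these assumptions the two costs come out equal: doubling the price while halving the tokens is exactly break-even, so anything below 50% of the tokens would make the newer model cheaper per task.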
It seems that way, at least. Reference this quote tweet from Tibo on one of the many Anthropic-shooting-themselves-in-the-foot posts that they've been doing lately, where he basically just said that Codex will continue to be available in both the free and Plus ($20 a month) plans, and that they have the compute and efficient models to support it. And that's just kind of what it seems like. It seems like they can run this model really well. If you go to OpenRouter and you look at the throughput for GPT 5.5 on Azure and on actual OpenAI servers, it looks really bad. Like 39 TPS, 11 TPS. I mean, this is just Azure things. What are you going to do? It's Azure.
Come on. But even on OpenAI, this looks really bad, and it's not been my experience at all. Especially on low reasoning. I don't know if they're just juicing it through the Codex stuff.
I haven't tried it too much over the API directly. But if I just do a quick thing here and give it a very simple prompt, like "what does this project actually do," and you just send it this prompt, watch it go. If you just look at this generation, it's really, really fast. I have a very naive TPS tracker that is vibe coded into my Pi instance, and it's averaging out to be about 93 tokens per second. It's kind of flying. This model does not feel like it is doing 30 tokens per second at [ __ ] all. It's quite quick.
There's another thing here that I didn't know if I was going to talk about, because I don't know if people are quite ready to hear this yet, but 5.5 with no reasoning is also extremely good. It's been a long time since any of us were using no-reasoning models. Reasoning models have kind of just become the default from every lab. So, it feels weird to turn reasoning off.
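For reference, a "very naive TPS tracker" like the one mentioned above can be sketched in a few lines. This is not the tracker from the video (that one lives inside a Pi setup and is not shown); `TpsTracker` and its methods are hypothetical names, and the clock is injectable only so the example can be checked deterministically.

```python
import time
from typing import Callable, Optional


class TpsTracker:
    """Naive tokens-per-second counter for a streamed response."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._start: Optional[float] = None
        self._tokens = 0

    def add(self, n_tokens: int) -> None:
        # The timer starts when the first chunk arrives, so time spent
        # waiting for the first token is deliberately ignored.
        if self._start is None:
            self._start = self._clock()
        self._tokens += n_tokens

    def tokens_per_second(self) -> float:
        if self._start is None:
            return 0.0
        elapsed = self._clock() - self._start
        return self._tokens / elapsed if elapsed > 0 else 0.0


# Demo with a fake clock: two chunks arrive, then two seconds pass.
fake_times = iter([0.0, 2.0])
tracker = TpsTracker(clock=lambda: next(fake_times))
tracker.add(100)
tracker.add(50)
measured = tracker.tokens_per_second()
```

Because it counts the first chunk's tokens but starts timing at that chunk, a tracker like this slightly overstates throughput, which is part of why such numbers should be read as rough.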
But I would recommend giving it a shot on this model. Is it what I would use for all my day-to-day coding? No. But this reply from Mario, who's the guy who created Pi (I trust him a lot on this stuff), said: "I am ready for that combo. That's how I make it bash my computers and it's glorious. Like a super smart and fast Haiku." That is actually kind of the thing that I've been feeling with this model. This post from Sunnil also sums up my thoughts pretty well: that 5.5 is just better than 4.7, referring to Opus. "Now for me, it's smarter, faster, doesn't try too much. I don't need xhigh or whatever. I prefer the shorter hops and letting me participate."
This, I think, is the most important piece here. If you really just like letting your models go off and do stuff in the background, not paying too much attention, and just writing some vague thing and letting it reason it out for you, this model's probably not going to feel nearly as good, because of some of the shortcomings it has, which we'll talk about in a second here. If you're the type who really likes babysitting these things more, paying more attention as the agents are actually doing stuff and directing them, if you look at a project, you see the thing you want to do, and you're like, "Okay, I have these three experiments I want to run. I want to do this, this, and this," this is a beautiful model for it. If you can define the outputs for it, it will work really, really well. They even mention this within the prompting guide: shorter, outcome-first prompts usually work better than process-heavy prompt stacks, because with a lot of the old models, you gave them a ton of direction on exactly how you wanted them to do things.
This one, you kind of just tell it the outcome and let it go do its actual thing. They also call out specifically that efficient reasoning means low and medium effort should be re-evaluated before escalating, and you should just be testing this within your day-to-day work. Generally speaking, at least until you've really got a good feeling for what should be used when, I would just default to the lower reasoning. And if it really can't do it, then you up the reasoning from there. Because the problem, as I mentioned earlier, is that it will just kind of overthink itself into weird, weird places. I remember I started a video about the model when we were recording the podcast episode about it last week when we were down in Miami. And he, at the end of it, compared the model to Grok Code Fast as an attempt to dunk on and bully me for it. Which, little did he know, I love Grok Code Fast.
Please give me Grok Code Fast V2. I do not care. But actually, this model does kind of have some weirdly similar characteristics to that model. It is, again, really fast. It really likes jumping to actually solving things. And the way you use it is very different from the old models. It's got that GPT accuracy, like I said, but I do want to talk about the big downside this thing has, and that is twofold.
One, the compaction in this thing is a lot weaker than it was in 5.4. 5.4 made Ralph loops pretty much completely obsolete. If you're not familiar with what a Ralph loop is, it's effectively just taking a plan you make with the model and splitting it up into, call it, seven sections. Each of those seven sections becomes a dedicated prompt, and then you make a little bash script that will trigger your coding agent, like Codex, in a while loop. So the first iteration will do step one, the second iteration will do step two, the third iteration will do step three. The point of this is to make sure that the model never overloads its context window: you only have it do bite-sized pieces and then have the model leave critical notes in between the steps. That was a really good solution for models that just couldn't handle compaction, would lose too much context, and would just kind of get stupid when it happened. 5.4 pretty much entirely solved this.
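The Ralph loop described above is a bash script in the original telling; the same shape can be sketched in Python. This is a minimal sketch under stated assumptions: `run_agent` is a hypothetical hook standing in for "start a brand-new coding agent session with this prompt and return the notes it leaves," and no real agent CLI or flag is shown. The demo uses a fake agent so the loop can run on its own.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def ralph_loop(steps: List[str], run_agent: Callable[[str], str],
               notes_path: Path) -> None:
    """Run each plan step in a fresh agent session, handing notes forward."""
    notes_path.write_text("", encoding="utf-8")  # start with empty notes
    for number, step in enumerate(steps, start=1):
        notes = notes_path.read_text(encoding="utf-8")
        prompt = (
            f"Step {number} of {len(steps)}: {step}\n\n"
            f"Notes from earlier steps:\n{notes or '(none yet)'}\n\n"
            "Reply with the critical notes the next step will need."
        )
        # Hypothetical hook: a real version would launch a new agent session
        # here, so the only context carried over is the notes file.
        new_notes = run_agent(prompt)
        with notes_path.open("a", encoding="utf-8") as handle:
            handle.write(f"## Step {number}\n{new_notes}\n")


# Demo with a fake agent that only records the prompts it receives.
seen_prompts: List[str] = []


def fake_agent(prompt: str) -> str:
    seen_prompts.append(prompt)
    return f"done with prompt #{len(seen_prompts)}"


with tempfile.TemporaryDirectory() as tmp:
    notes_file = Path(tmp) / "NOTES.md"
    ralph_loop(["add schema", "write API", "build UI"], fake_agent, notes_file)
    final_notes = notes_file.read_text(encoding="utf-8")
```

The design point is that each iteration sees only its own step plus the accumulated notes, which keeps every session's context small regardless of how long the overall plan is.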
Julius, the guy behind T3 Code, did not love the model at first. I think he's kind of coming around to it. I don't want to put words in his mouth, but he had a lot of issues with it. One of the things that he likes doing is having these gigantic hell threads where he will literally just have billions of tokens go through one single Codex thread, because it doesn't matter. The model can just keep compacting every single time it hits the limit, and you just don't have to pay any attention. Ralph loops were dead because you could just write out that crazy plan, and even if that plan would take 10 hours to implement, you could just give it to Codex and trust it to compact nicely as it's going and not have any issues.
5.5 is not quite as good at that. And the reason is that it is super sensitive to everything that is in its context window. If you tell the model at any point within a thread, "Hey, could you commit this change once you're done making this?", then every single time it makes a change after that, even after you had sent that message and it had done the commit, it will do that again. That thread is now poisoned with that information. And you can tell it, "Hey, stop doing that." It's 50/50 on whether or not it's actually going to follow that instruction. It does not seem to do a very good job of discerning the hierarchy of user instructions, where you gave it an opinion early in a thread and then gave it a contradicting opinion later in the thread. It will not weight the later one higher the way it should. Sometimes it will, but I've noticed this is a recurring pattern and a recurring issue with the model. It struggles with longer context, and this is definitely a model where you need to use it differently. You need to use it on low reasoning. You need to use it in something like Pi or Codex or whatever.
I'm a huge Pi guy, and we'll talk about that in a second. But the biggest thing is you need to be constantly making new threads. I have almost never let a thread with 5.5 go beyond 100,000 tokens within the context window. I haven't needed to, because that's a lot of space for 5.5. Again, it doesn't reason all that much. It doesn't do that many tool calls cuz it's super [ __ ] efficient and smart. I've had 50-plus-message-long threads that only had 90,000 tokens in the context, within codebases that had 200 to 300 files in them. It's not unheard of for the model to not have that many issues.
But if you're doing a deep dive and a deep refactor of something and you need to explore an entire codebase, I would recommend doing that on high reasoning, especially if it needs to do research. You let it do the research, you let it do the plan, you let it do the exploration, you let it formulate all the notes it needs, and you maybe go back and forth a little bit on the plan. Then when you're done with that, if you were doing this in 5.4 times, you would just be like, "Okay, cool. Looks good. Implement it. Go away." You can't do that now. You need to persist the plan to a markdown file somewhere, then open up a new thread, set it to low reasoning, and tell it to implement the plan. And even then, sometimes it also suffers from not biasing towards action enough. Claude models will naturally just go and do things regardless of what you tell them to do. They'll just [ __ ] send it, which can make them feel really fun to work with, but can also result in them doing some really stupid [ __ ] sometimes, so I don't really trust them all that much.
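The plan-then-hand-off workflow above is mostly habit rather than code, but the mechanical parts can be sketched. This is a minimal illustration, assuming a plan already exists as a markdown string: every name here (`MAX_THREAD_TOKENS`, `should_start_new_thread`, `persist_plan`, `handoff_prompt`, `PLAN.md`) is hypothetical, and the 100,000 figure is simply the ceiling the speaker reports rarely exceeding, not a documented limit.

```python
import tempfile
from pathlib import Path

# Assumption: the speaker's rule-of-thumb ceiling, not an official limit.
MAX_THREAD_TOKENS = 100_000


def should_start_new_thread(context_tokens: int,
                            limit: int = MAX_THREAD_TOKENS) -> bool:
    """True once a thread's context has grown to the chosen ceiling."""
    return context_tokens >= limit


def persist_plan(plan_markdown: str, path: Path) -> Path:
    """Write the agreed plan to disk so a fresh thread can read it."""
    path.write_text(plan_markdown, encoding="utf-8")
    return path


def handoff_prompt(plan_path: Path) -> str:
    """Short, outcome-first prompt for the new low-reasoning thread."""
    return (f"Read {plan_path.name} and implement every step in order. "
            "Do not stop to ask for confirmation between steps.")


with tempfile.TemporaryDirectory() as tmp:
    plan_file = persist_plan("# Plan\n1. Explore\n2. Refactor\n",
                             Path(tmp) / "PLAN.md")
    prompt = handoff_prompt(plan_file)
    saved_plan = plan_file.read_text(encoding="utf-8")
```

The explicit "do not stop" line in the hand-off prompt is one possible response to the over-cautious pausing described in the transcript; whether it actually prevents the pauses is not something the source confirms.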
This model is not going to do that, and it's still going to be a little overly cautious at times, which can be kind of [ __ ] annoying. I've had 10-step plans I've given it, and it will get to like step four and a half and be like, "Okay, I finished step four and a half. What would you like to do now?" And I'm like, finish the [ __ ] plan, you dumbass. It's not without flaws. Would I call it AGI? No. But would I call it an extraordinarily useful model? Yes. I have been using the [ __ ] out of this thing, and for day-to-day work it is insanely useful. It's great for writing code. It's amazing at actually using computers and doing computer-use type stuff. It's really useful for weird GStack-style workflows where you have a bunch of markdown and internal tools stuff that you're having the model orchestrate and control. I've been really enjoying using it within OpenClaw and stuff like that.
It's even actually pretty competent at front end. Is this the best looking site ever made? No. But it's a very competent, decently well put together site. And basically all of the UI on this, the UI pass that I did recently, was done with the new model. I've also been working a ton on the new version of the BTCA web app. If you're familiar with the old BTCA CLI thing that I made, I've actually completely deprecated the CLI, because while it was a very cool idea, I've realized that there is a much better way to do this, and it is the way I've been doing it for the last couple of weeks. So I open up my Pi instance here. Before, you would have run BTCA and gotten your little TUI or whatever.
This is my Pi instance. I'm going to do /btca, which is a skill. So I have a global skill that's BTCA local. I run this. It's going to give the model a prompt that defines how it should run this program. It's going to tell me, hey, these are the things that we've previously searched. Then I say: okay, look into the AI SDK; could you give me a getting started guide on using the OpenAI API with GPT 5.5? So now it's doing the exact same thing that the old version of BTCA would have done within the CLI, just within your own coding agent. And you can stick this into any session. You can pull it up at any time.
It works a lot better and you don't have to remake the actual TUI or agent which both just kind of suck. But the web app isn't quite in that bucket. The web app is specifically there so that you can do this wherever. You can save the threads.
You can copy paste them into other things. I've mostly just been doing this as a research project for myself to try and figure out what it looks like to have a useful agent hosted in the cloud. What's the right architecture for this? I should definitely do a follow-up video on this once I release it in the next couple of days here. We'll do that another time. The point I was really just trying to get to here is the UI for this homepage, which again, is it the nicest thing ever? No. But this is 5.5. This is all generated by 5.5 and also slightly by GPT Image Gen 2. Not the point of this video. If you look at this little section right here, this is a direct quote from GPT Image Gen. So I fed the codebase into GPT 5.5. I told it to do a deep analysis of all the styles and branding and all that stuff, and then feed that into GPT Image Gen to come up with a design system and design palette. Once I had done that, I took that palette and gave it back to GPT 5.5. I was like, "Okay, do a deep audit of the entire codebase and make sure that we are following this new design palette."
And the palette was [ __ ] killer, cuz the level of design that the image gen model can do is pretty insane. So, I fed it back in there, went back and forth a little bit, and the end result was this.
And one of the things too with 5.5 is that right out of the box, the design stuff is still GPT-y. Is it better than 5.4? Yeah, it's a lot better, but it is not at the level of something like a Claude model out of the box. It will still bias towards the stupid card design things that we're all very sick of by now. But again, you can get around this.
You can give it references. You can give it skills that will lobotomize that specific piece of its behavior out of it. There are ways to solve this problem. And with how steerable this model is, I find those problems to be fairly easy to solve. I won't go too deep into it during this video, but just to give you a high-level overview, I did a bunch of very complicated architectural refactors on this project to really try and figure out the right way to have durable streams running in the background. That required a lot of research and a lot of weird over-the-network-boundary stuff, and it was able to handle all of it. This model is very, very smart. It's very, very capable. You just kind of have to really know what you're doing when you're using it.
I think that transitions well into what I want to talk about now, which is Pi, my favorite way to use this model. If you don't know anything about coding and you're just trying to vibe code out a project, this model is probably not going to be your friend. It really requires you to be paying attention and really know what the [ __ ] you're doing. I like Pi a ton for this because I have done a ton of work, and really it's not a ton of work, I've just done a lot of customizations to this harness to set it up to my liking and make sure it works correctly. The trifecta that I've been using for the vast majority of my dev work has been Pi for the harness and TUI, GPT 5.5 on low reasoning for the model, and cmux for the actual terminal. I was kind of iffy on cmux when it came out a couple of months ago. I really expected this to just be one of those projects that comes and goes pretty quickly. It wouldn't have much staying power and would kind of just get forgotten about, but that hasn't actually happened. It's in fact gotten better over time, like a lot better. And it's in a really good state right now.
So, here's my general workflow. If you were looking over my shoulder while I was working day-to-day, typically speaking, there would be two things open here. So, I'll do vp dev. I've been using Vite+ a ton. Really like it. So this is one of the things I've been working on.
I do a bunch of data analytics stuff on the YouTube channels that I work on and for. So, I have this little project that I've been doing with GPT 5.5. I have my dev server running over here. Then, I'll have a Pi instance running here. And typically speaking, I'll have threeish tabs here and maybe two to three tabs over here as well. And I'm just constantly cycling between new Pi sessions. "On the channels page, the UI is kind of cardy, specifically the top-level stats at the top of it. Can you make those not look like cards and just kind of be floating there naturally in a way that looks good and fits with the design system?" So, it'd just be in here.
I'd fire off a prompt like this, and then I would go over to some other part of the project and start working on it. One of the things that I used to do a ton of is working on like six different projects at once. I was trying to do that. These models are magical for letting you parallelize, because even as fast as GPT 5.5 is, you know, I've been talking for 30 seconds here. It's still going, and it's going to be going for probably another 2ish minutes. This is enough time to go check on another thread from another model and just see what's happening there. I guess in this case it is now done. So we go back here and, yeah, beautiful. It actually did the thing. Look at that.
So, if I wanted to go in and review this, I would just do /diff, and this gives me a list of files that I could open in Zed. So, I'll open this one up in Zed and take a look at what it did here. You get the general point of this. And then when I'm done with this, I would maybe do /yeet, which is a little extension command here to tell the model, hey, commit and push all the changes up to main, whatever. I really like parallelizing work within one project. I find it much easier. You can still parallelize with these models.
You should, but doing it across projects left me understanding what was actually happening a lot less, getting a lot more confused, and getting a lot more fatigued. I have a lot more endurance, and it feels way better, when I'm instead working on the front end and the database layer and the sync layer for one project at one time.
That is a better way of doing things.
Another thing that you might be a little confused at looking at this is if you download and install Pi for yourself, it is going to look nothing like this. I'm going to do a dedicated video on this very soon, but the gist of this is that this is actually a very customized Pi setup. It's very easy to customize Pi because all you have to do is just open up a new Pi instance and be like, "Hey, can you add in XYZ functionality?"
That's actually how I've been working on this. I will be doing some work in one of these sessions, and as I'm doing it, I'm like, "Man, I really wish I had an extension or a command for going to the files that were edited," like that diff command or that yeet command. Those are all little paper cuts where I found myself wishing something existed. And the way I solve that problem is I just jump over to a new tab. So I just tell it, "Hey, can you change the theme of this little Pi logo thing up here?" It knows where the actual Pi stuff is, because where Pi stuff lives is in Pi's system prompt. So it can go find that in here. It's going to do some searching, make some changes. Now, if I do /reload, actually, I need to do /new. There we go. It is now green. So yeah, it's super customizable. I fully open sourced my setup for Pi. I'll have this link down below. Again, I don't actually recommend you directly copy my setup.
You should make it your own. That's the point of Pi. The point of Pi is not to import someone else's opinions. There are a lot of other harnesses that have much better opinions than I am going to have. My opinion is that I like super minimal harnesses, and I'm just adding in little useful utilities for myself.
That's all I need. But a lot of people really like the way something like Cursor or OpenCode or Droid or Codex or Claude Code feels. I don't know who likes Claude Code, but someone probably does. You should probably just pick one of those if you don't care enough to go and customize it yourself. But if you do just want to see what's possible and screw with that, this is a good place to start. You can just clone the repo into the .pi/agent directory in your home directory. Super simple to set up. That's literally all you have to do and it'll just work. The extension system is magical. This is genuinely one of the best projects I have ever seen.
Last thing before I close. I probably should have mentioned this a lot earlier, but this model writes the best code of any model I've ever seen. It is the first model that truly shows restraint in its outputs. A lot of these models will just overengineer the [ __ ] out of the solution because it is easier. Models and humans are both lazy, but the way their laziness shows up is very different. And I actually find the laziness of models to be a lot more destructive in programming than the laziness of humans. Human laziness will result in simpler, better code, because humans just don't want to deal with a bunch of crap. Models don't care. It takes effort and intelligence and time to figure out a way to make something super simple and lazy and boring. It does not take a lot of effort to add in every single possible edge case and write 10,000 lines to solve a simple problem. That is the easy way to do it for a model, because models are not constrained by time or anything like that. They're constrained by intelligence, and this model is very intelligent. I have been using this site, models.dev. It's super useful, a very good open source project that is linked to the models.json directory thing.
It's what OpenCode and Pi and a lot of other harnesses and general model providers use as a giant open-source dataset for the price and other useful information for basically every single AI model that is hosted and matters. They have a site to actually view all of these. And it's fine. It's not great. The performance is not amazing on the scroll. Like, I'm scrolling as fast as I can. You can see the flickering kind of sucks here.
Do I do this all the time? No. But like it takes a while to load. It doesn't feel very good. I wanted a better version of this, especially since the data set is open source. So I made a better version of this and I had the model make the better version of this.
This is a SvelteKit site. It is super simple. It is in SPA mode. It fetches the data from their JSON API so it's always up to date, which can be a little bit slow. If I refresh the page, it'll take a second to load from models.dev, but once it shows up, the actual performance here is absurd. Go back to all and just start scrolling down here. It is beautiful how well this thing scrolls, cuz this is a really nice virtual scroll implementation in Svelte that the model wrote. And when I saw the code, it blew my mind, because it is the simplest code I have ever seen a model write. I've tested this with a couple of other models. They will do a bunch of weird [ __ ] to come up with a super custom virtual scroll implementation that does a bunch of random stuff, and it will end up being like 30ish lines long typically, but they won't correctly use the Svelte built-ins, like the bind directive and some of the weird attachment behaviors you can do to divs, that make this a very easy, trivial problem if you deeply understand Svelte. This model figured it out. It knew how to do it correctly. It wrote like a 10-line solution for the scroll box, which is just killer. It is so [ __ ] good. I am so happy with the code that this thing output. This was the first thing I tested with the model.
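The Svelte code the model wrote is not shown in the video, so it cannot be reproduced here. What can be sketched is the small piece of index arithmetic that any fixed-row-height virtual scroll relies on: given the scroll position, work out which rows need to exist in the DOM. The sketch below is in Python purely for illustration; `visible_range`, its parameters, and the fixed row height are all assumptions, not details from the source.

```python
import math
from typing import Tuple


def visible_range(scroll_top: int, viewport_height: int, row_height: int,
                  total_rows: int, overscan: int = 2) -> Tuple[int, int]:
    """Return (start, end) row indices to render; end is exclusive.

    Assumes every row has the same height. `overscan` renders a few extra
    rows above and below the viewport to hide pop-in while scrolling.
    """
    first_visible = scroll_top // row_height
    end_visible = math.ceil((scroll_top + viewport_height) / row_height)
    start = max(0, first_visible - overscan)
    end = min(total_rows, end_visible + overscan)
    return start, end


def top_offset(start: int, row_height: int) -> int:
    """Pixels of empty space to leave above the first rendered row."""
    return start * row_height


# A 600px viewport over 5,000 rows of 40px each, scrolled down 1,000px.
start, end = visible_range(1000, 600, 40, 5000)
offset = top_offset(start, 40)
```

With this arithmetic only a couple dozen rows are rendered at any moment regardless of list length, which is why a correct virtual scroll stays smooth on a dataset the size of the models.dev list.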
I was pretty blown away. I'm still pretty blown away. It's been my daily driver ever since. Currently, the only models I'm really using heavily day-to-day are GPT 5.5 on low reasoning, and high reasoning if I really need the planning. I'm occasionally using the Claude Opus models if I really need to break them out for design stuff, but that's pretty rare at this point, especially since there are really cool projects and solutions like this Impeccable project. This will be a dedicated video in the future, too. It's just a super powerful skill to make the models create actually good designs. It's pretty damn sick.
I've also been experimenting a little bit with some of the open source models and local models. Worth talking about, but probably not worth it in your day-to-day. I think, at least as of right now, until Google responds or xAI responds or anyone responds, this is the model. This is the one that I would use for everything. And if it feels weird to you, just give it a real shot. You'll figure it out. OpenAI did really well here. I'm very curious and excited to see where this goes, because if this is the beginning of GPT6, hopefully the real thing is substantially better. Although looking back on Opus 4.5 up to 4.7, nothing is really a guarantee these days. If you like this video, make sure you like and subscribe.