AI coding agents like Claude Code and Codex can significantly accelerate software development, but they work best when developers apply proven engineering practices: breaking tasks into small pieces, doing red/green test-driven development, and following YAGNI (You Ain't Gonna Need It) and DRY (Don't Repeat Yourself). The key to successful AI-assisted development is clear communication of intent and requirements, because agents excel at executing well-defined tasks but struggle with ambiguous or incomplete instructions. Planning up front and then letting the agent execute can feel like "waterfall, but really fast": a wrong design is cheap to throw away, so developers can prototype, review, and start over without the long cycles of manual development.
Deep Dive
424: Waterfall But Really Fast (obra)
Jesse Vincent, welcome back to the show.
Thanks so much for having me. It's been a minute. It's been a while.
Um the last time you were on my show was episode nine of this podcast.
>> about right.
>> [laughter] >> And it was about 13 years ago.
Jesse, I I knew it was over 10. I didn't know it was 13. Wow.
>> Right.
I didn't have the courage to listen to the show. I usually don't find it embarrassing, but if it's more than 10 years ago, it's unbearable to listen to myself speaking. It was a long time ago. I can't listen to myself on podcasts ever, even if they're current.
Yeah, so the last time we talked, it was even before you started doing the Kickstarter campaign for the first ever keyboard. Yeah. I can't remember, were we talking about Perl stuff or were we talking about keyboards when it was a hobby? I think it was the keyboard, when you were preparing for the first Kickstarter campaign. Yeah. Wow.
Yeah, no, we went through four keyboards.
Four keyboard Kickstarters. A couple of them were, you know, seven figures in US dollars.
The whole thing was crazy. There was a period when we discovered that our factory salesperson was a scammer who was scamming us and scamming the factory. Nobody in China believed it. I've got great lawyer recommendations in Shenzhen. I had a whole hardware career, and for the last year I've been sort of very nose-down doing AI stuff.
Right. That's a that's a great transformation, I think.
I was thinking about what to talk about, and I think it's fascinating that you and I share some attributes in common. One of those is that we were, at a certain point in time, known for being Perl developers. Mhm. And now we're known to different people for very different things.
Um >> Yeah.
Especially for the listenership of the show, I'm better known as a podcaster than as an engineer. That's wild. Right. And you come from being a Perl developer, a leader of the Perl programming language, and then the creator of a keyboard, and then I don't know what it is. What is it now? >> Now I think it's Superpowers. That seems to be the thing. It's very weird, because it's something I originally put together as really a demo of how I was doing AI dev, me packaging up the stuff I was already doing so that I didn't have to keep typing the same things over and over again.
>> Right. You are now kind of a thought leader in the AI industry, I think. Apparently, yeah. Microsoft claimed on stage at GitHub Universe that I was an AI coding pioneer.
>> A pioneer.
>> And I can't remember if that was before Superpowers or not, but it might have been. I think it was around the same time. Yeah.
>> I think it was last year, at Moscone, there was the GitHub Universe conference, and a picture of your face was up on the screen along with Simon Willison and other people. And Angie, who ran Goose, originally for Block and now for the Linux Foundation. Yeah.
Yeah.
>> What a time to be alive. Yeah. Um Yeah, so let's let's dive through that. Um I don't know I don't know where to start.
Well, one of the places that I sometimes start is talking about the first time that I did something that looks like agentic dev, and how hard it was for me to adapt. I would spend my day essentially helping somebody with debugging, helping somebody else with project planning, reviewing somebody else's code, and I'd come home at the end of the day and think, I have not done any real work, because at the time I was a working programmer. I was spending a lot of time typing into a terminal window, and these entities on the other side of the terminal window would be the ones who were actually doing the work.
>> Mhm. And it felt really weird because up till then my job had been lines of code.
And as it turns out, this was almost 20 years ago. Mhm.
>> These weren't agents. These were MIT interns.
I was managing them through IRC and they were really smart. Some of them thought they were better than they were at coding, but they were all quite good.
They were often very junior.
They didn't have great taste, mostly. Some of their judgment was a little bit suspect. They weren't sleeping very well, so they were having trouble forming memories. And yeah, so I developed all of these essentially management hacks to get good productivity out of them. And some of them have gone on to be really influential engineers, really capable people. But at the time they were kids. And it turns out that a lot of these management hacks that I figured out for helping junior engineers be productive on a small team work really well for AI.
And when I started learning how to use Claude Code I realized that I was pulling out the same tricks.
And, you know, that was helping figure out what needs to get done, then breaking the task into tiny little pieces, then doing one task, doing the code review, making sure that you're always doing red/green test-driven development, and teaching them about YAGNI, you ain't gonna need it, and DRY, don't repeat yourself. I mean, that's the core of what the Superpowers engineering loop is, and it works well for agents just like it works well for people.
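The red/green loop mentioned here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `slugify` and its tests are hypothetical examples, not anything taken from Superpowers. The tests are written first and fail (red), then the smallest implementation that passes is added (green), and nothing is built that the tests do not ask for (YAGNI).

```python
import re


def slugify(title: str) -> str:
    """Smallest implementation that satisfies the tests below (the green step)."""
    # Lowercase, then collapse every run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


# In the red step these tests exist before slugify does, so they fail.
def test_spaces_become_hyphens():
    assert slugify("Waterfall But Really Fast") == "waterfall-but-really-fast"


def test_punctuation_is_collapsed():
    assert slugify("You Ain't Gonna Need It!") == "you-ain-t-gonna-need-it"
```

A test runner such as pytest would collect the two `test_` functions; each new behavior starts with another failing test rather than with more implementation.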
When you talk about these junior engineers from 20 years ago, a lot of the description feels really similar to how you'd describe AI today, or maybe a year ago. I think this landscape is changing a lot, so... Every month. Every month it's different.
Yeah. Yeah, yeah. All of the tools are getting better, but you still need to break down problems, and it is still the case that pretty much every tool will do what you ask it to do. Mhm. And the problem is, for the most part, that humans are really bad at asking for what they need. They ask for what they think they want without really thinking about it.
And one of the things that seems to make some people much better at using AI is the management experience of being able to think about what you want and explain it clearly before you start. Right. It is the difference between: you open up Claude Code and you say, let's make a React to-do list, and what it's going to do is go make a React to-do list, but what you want it to do is say, hang on a second, why do you want to do that? What are you trying to do? What's the point? Right.
Did you figure that out almost immediately when you started using Claude Code?
Or is that something that you learned over time?
I would say that it was probably within a month that I got to the core of it, but that intent thing also comes from my consulting career, where I would walk into these big companies, and it was consulting for my open source product, which was RT, this ticketing system.
And the clients would ask me to do something to the product for their organization, for their enterprise.
And usually what they were asking me to do was essentially fix a symptom of a business problem.
And when we talked for a while, it turned out that the real problem was something completely different and there was a much better way to solve it. Mhm.
And so I learned a long time ago that when somebody tells you the solution they want, it is almost always just another way of them describing their problem. Right. But when I picked up Claude Code the first time, I had an idea for a thing I wanted to make, and I described it, and Claude went off and built this crazy monstrosity that had every possible feature you could possibly want.
They weren't particularly well implemented, because, I mean, this was the first day. Right.
>> Um this was the first day that Claude Code existed.
>> That was like April or May last year.
>> It was I think I think it was actually I think it was late February. Okay.
>> Um I think it was Feb 20th if I remember right. Anthropic had birthday parties for Claude Code recently.
>> Right. Yeah. Yeah. But it was, you know, what sometimes gets described in English as a Homer Simpson car, because there was this episode of The Simpsons where Homer got to design a car and it had everything.
It had, you know, six radios and a dome top and antennas and screens, and Claude is really good at all the features. In fact, most of the AIs are.
And it used to be that it was hard to get all the stuff that you might possibly want into the product. And now the hard thing is telling it, "Wait, wait, stop, stop, I don't need that." Right. That's interesting. Yeah, I think I started using Claude Code about April, or whenever the public beta was announced. Yeah. Actually, I started using it for non-coding stuff very early on. So, the first thing I did with Claude Code: I was using Heroku, I still do, and Heroku emailed me that the version of Postgres that I was using was going to be sunset. So, I needed to upgrade to a newer version of Postgres, and here's a document on how to do it. Good luck.
>> Yeah. That was the email.
>> [laughter] >> And then I thought this was maybe a good use of Claude Code, because the agent can run in a terminal.
It can run any commands that I needed to run. So, what I did was download the manual, the web page, as markdown, and paste it into the session.
And let it plan. I don't know if plan mode was there at the time.
Yeah. But basically, it wrote down a markdown file describing the plan, and I would review it and execute from there. So, I don't know, for some reason I was getting into this plan-based execution flow from the very early days.
>> That's good. It turns out that it's a really good way to keep the agents on track. And yeah, that sounds pretty early for realizing that Claude Code was good for other things you can do in a shell. Yeah, exactly.
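The plan-then-execute flow described above can be sketched as a markdown checklist that a human reviews and the agent works through one step at a time. Everything below is an assumption made for illustration: the checklist format, the step wording, and the `parse_plan` and `next_step` helpers are hypothetical, and the steps are not the provider's actual upgrade procedure.

```python
import re
from typing import Optional

# Hypothetical plan file contents; the agent would write this, the human reviews it.
PLAN = """\
# Upgrade Postgres
- [x] Read the provider's upgrade guide
- [ ] Take a backup of the current database
- [ ] Provision the new database version
- [ ] Cut over and verify the app
"""

STEP = re.compile(r"^- \[( |x)\] (.+)$")


def parse_plan(markdown: str) -> list[tuple[bool, str]]:
    """Return (done, description) for every checklist line in the plan."""
    steps = []
    for line in markdown.splitlines():
        match = STEP.match(line)
        if match:
            steps.append((match.group(1) == "x", match.group(2)))
    return steps


def next_step(markdown: str) -> Optional[str]:
    """First unchecked step: the one to review before the agent runs it."""
    for done, description in parse_plan(markdown):
        if not done:
            return description
    return None


print(next_step(PLAN))  # prints: Take a backup of the current database
```

Keeping the plan in a plain file means the review happens on text, before any command runs, which is what keeps the agent on track.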
Yeah. Yeah. Yeah, it's been really interesting to watch how different people engage with it and find interesting and different uses for it.
>> Right. Yeah, I will probably talk about this later, but since then, the way I use coding agents, especially Claude Code, has changed quite a lot. In the early days, I was very much micromanaging the way the coding agent writes code. And I think the way these skills work kind of, I don't know, enhances the micromanagement aspect of it without me having to do the micromanagement, if that makes sense.
>> Yeah, no, I mean, part of the design is trying to do as much planning as you can up front so that at the point where the agent is ready to go, you don't have to supervise it quite as tightly. Right. It almost sounds like we are doing waterfall development. That's one of the things that I've been thinking about a lot at work: what AI-native methodologies look like. Because yeah, we've gone from, you know, 50 to 70 years of software engineering experience where we got away from waterfall to agile to a variety of things, and now a lot of agentic dev feels like waterfall but really fast. Right. And that is kind of neat, because I no longer feel bad about trying something, realizing that I have designed it completely wrong or have told the agent to do the wrong thing, and just starting over.
But it's not the I don't I don't think it is necessarily the right way to build forever.
I've been playing with a new set of skills for iterative development instead of the Superpowers-style plan-everything-up-front.
My experience is that Superpowers plans can get to somewhere like 30 or 40K before I start to worry about whether they can be executed well.
And with the new tools, I've taken up to 600K of specs and been able to generate an app from those specs that doesn't drop requirements.
Okay.
And so, this is the far side of a set of tools I built to reverse engineer things.
So, it started off as something to do sort of adversarial reverse engineering, where you can have it look at any product where you can get source or obfuscated source and create behavioral specs. Specs that don't include anything about the code but just about what's supposed to happen, how the pieces might fit together, how it would get used, user journeys. And so, one of the first places that I used this: I use Obsidian for my personal notes.
>> Mhm. And I very much wanted my agents to have access to my notes.
But Obsidian's first-party sync engine at the time was only available as part of their Electron desktop app. And I really wanted to run my agent in the cloud in a container, and I really didn't want to have an X server there just to be able to run Obsidian.
So, I had the tools reverse engineer Obsidian's sync engine into specs.
And then I had agents on another host re-implement it from those specs in Rust. Right. And there was one point where the implementing agents were like, "There's a missing detail. I need you to go ask the agents that did the reverse engineering to update the specs with this one bit of detail." But all of the crypto got behavioral specs. The whole thing worked. The CEO of Obsidian is a friend of a friend, and I wrote to him and asked if they'd be okay with me releasing it. And they said that they would really rather I not. Uh-huh. Because they were very worried about, like, "What happens if there's a tiny bug and somebody's vault gets corrupted?" And they did say that they had their own version coming out in a little while. And that did finally ship.
>> Okay. But more recently, I finally found a candidate project to test this with. Matt Hartman, who's a venture capitalist and was a friend of a friend, put out an app called Ghost Pepper, which basically started off as Wispr Flow, but open source and running on your own computer.
And his first version was good enough that I could use it but not good enough that it felt really great. And he nerd-sniped me into adding, let's see. It started with Parakeet transcription. I added speaker diarization so that it could reject other speakers in the room. I added OCR of your screen to improve the quality of transcription as it feeds into the LLM that cleans up the transcript, and a whole bunch of other stuff.
And so then I ran these tools to reverse engineer specs from that. It generated 600 kilobytes of specs, I fed them into these new tools, and they managed to generate, three times out of three, a copy of the app that was almost the same but with completely different code inside.
And so, I'm kind of excited about the possibility of that for being able to take brownfield code bases and turn them into brand new code in another language not being tainted by weirdness in the implementation.
In your experience, does that produce better code than having it read the actual disassembled code, etc.? I think so. I think it depends a lot on how good the original code is.
>> Right.
>> And so, you know, I've been tuning the iterative development skills, because early on, they optimized for testability, which meant that there were many more API boundaries than there were in the original. And so, the code was a little bit more complicated because it had like 10 times the number of tests.
And so, it was designing to add more testability.
My guess is that for older code bases that were sort of built over a number of years, it might actually be a better way to generate good code.
One of the cool things that some friends of mine who have been doing a lot of this kind of agentic dev have discovered is that, doing a source-to-source port agentically, you can improve the quality of a code base by porting it through a language that has some property you want. Mhm.
So, if you've got a code base that's in JavaScript or Ruby or Python, you have the agent port the thing to Rust and then port it to something else. Whatever it ends up in will have better type safety by having been translated through Rust.
And the surprising thing: my expectation, if I was going to port a product from language A to language B to language C, is that it would get worse over time, like a game of telephone. But what my friends who have done a bunch of this have said is that it actually does the opposite, and the code quality improves every time it gets ported.
Because it's getting focused and rewritten and cleaned up, and it's absorbing the properties of those languages you run it through.
In the case of Rust, I assume more type safety and some conciseness because of the expressiveness of the Rust language, like type matching.
Yeah.
Um I have no idea what happens if you port through Java. I have no idea what happens if you port through shell.
Um but these are experiments that are actually really easy to run now. Right.
And that's actually one of the things that's really fun about these new tools: even if you're afraid of using them for production code, or you work in a regulated industry or with safety-critical systems, for running experiments and prototyping and trying things, you can just build stuff that was impossible before. Right.
Talking to Simon Willison, one of the things that he has noticed, he describes it as having 30 years of really, really finely honed intuition about what parts of software engineering are easy, what parts are hard, and what parts are impossible.
And what he's discovered is that everything he's learned about that is now wrong.
Everything? Like, the things that used to be impossible are easy, and things that are really easy when you're doing them by hand are nearly impossible. I've got one really good example of this. There's an Android game I used to love.
It was called Wordiest. You got 14 Scrabble tiles and you had to make two words just by dragging them around.
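The core rule of the game as described (two words built from one rack of 14 tiles) comes down to a multiset check. The sketch below is a guess at that rule only: `can_form` is a hypothetical helper, and it ignores dictionary lookup, scoring, and bonus tiles, none of which are specified in the conversation.

```python
from collections import Counter


def can_form(words: list[str], tiles: str) -> bool:
    """True if all the words together can be spelled without reusing a tile."""
    needed = Counter("".join(words).lower())
    available = Counter(tiles.lower())
    # Counter subtraction keeps only positive counts, so an empty result
    # means every needed letter is covered by the rack.
    return not (needed - available)


rack = "waterfallagent"  # a made-up rack of 14 tiles
print(can_form(["waterfall", "agent"], rack))  # True: uses each tile once
print(can_form(["fall", "fall"], rack))        # False: the rack has only one "f"
```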
And it was it was a free game on the Play Store. The company that made it went out of business 10 years ago. It got pulled from the Play Store cuz it hadn't been updated.
And I'd switched to iOS.
The first project I vibe coded, when Cursor and Windsurf were new, was actually a web-based version, and it was kind of garbage, because it was a bunch of React and I was doing it bit by bit, as opposed to the more modern agentic approach of just letting the agent cook. Right. But right around when GPT-5 came out, I decided I wanted to try Codex.
So I downloaded Codex. I downloaded an old APK of this game from a mirror site and I opened up Codex and said, we're going to reverse engineer this game.
We are going to build a brand new version for iOS and I'm going to put it on the App Store. What tools do you want? Just to see whether it'd be willing to reverse engineer and which tools it would want. Way back when, I used to make K-9 Mail for Android, so I had some experience with Java decompilation and peeling apart Android layout files, but it had been about 10 years. But it named the right tools. I installed them and said go. Go for it. Ask me questions if you've got them.
And it came back to me an hour and a half later and said, all right, I've got most of the plan together, but do we need to include an in-app purchase to remove the ads?
>> [laughter] >> Took me a second. Like, ads? It's like, yeah, the original used the Google ads SDK, so I've got the Apple ads SDK ready to go. Do you want an in-app purchase to remove the ads?
>> [laughter] >> All right we're going to skip the ads.
This is going to be free. And then I said, okay, just keep running, do the port, let me know when you need me, and it ran for about 12 hours. Meanwhile, I tracked down the original author from a single Reddit post that he had made, and emailed him and asked if it was going to be okay with him for me to do this. Like, I'll put ads in and we can share the revenue if you want. You can release it under your name. But I'd really like to make it public.
And I don't think in that initial email I told him I was using AI, but he came back and he was very friendly. He's like, here are some interesting details about how I built the original, and do whatever you want with it. I would love a one-line credit in the about screen, but if you want to charge money for it, you can keep the money.
I'm so glad you're bringing my game back. Wow. Meanwhile Codex built the game.
All the gameplay was right. All the weird layouts showing how you scored against everybody else were right. There was one missing animation when you were the highest scorer, and the original Scrabble tiles had been squares where one of the sides was curved, and that's where it would show, like, double word or double letter score.
And Codex had skipped that.
So I had a fully playable game but without the rounded corners on the tiles. I spent probably about 6 hours going back and forth with Codex trying to add the rounded corners on the tiles.
Other than that it one-shotted the game and Apple accepted it onto the App Store on the first try.
Right.
So yeah, stuff that would have been near impossible for me before happened while I slept, and something that should have been really easy was almost impossible.
Yeah, similar things happened to me as well. I currently have three vibe-coded iOS apps installed on my iPhone.
Awesome.
>> I use all of them day-to-day.
One of them I just recently vibe coded was an app to learn Korean.
Mhm. I wanted to use Anki, that's the spaced repetition flashcard app. But, you know, it's a paid app, and it's a generic memorization app. It's not tailored for any specific purpose.
And I just wanted it to be very specific to learning languages, especially Korean.
So what I built was exactly the same UI.
I just copied it from the screenshots in the iOS store and fed them to Codex. Yep. Initially I built a web version, and it was, you know, one-shotted pretty quickly. And I used the web version to build an iOS app out of it. I didn't do it that way because the web version is better or anything, but initially I thought doing the web version in HTML, JavaScript, and CSS would make it easier for me to understand what's going on, and then I could port it into SwiftUI, which I have very little knowledge of. So, yeah, that went really well.
Yeah, and now I've shipped a few macOS and iOS apps in the past year and I still don't know any Swift.
>> [laughter] >> Like, I can recognize Swift. If someone showed me a code sample, I'm like, oh yeah, that's Swift. But I think my favorite is, I shipped a thing called Multipass, which was the first mobile client for humans for Moltbook.
Which was the OpenClaw thing? The OpenClaw social network.
>> Yeah, yeah. It was like a standard social media client app, but designed for a social media site that no human is ever supposed to use.
>> Right. The most useful Mac app that I've shipped is called Clearance, which is a synonym for markdown. Oh.
>> So it's a markdown browser. It has an editor, but it's really a light editor; it's not intended as an editor. It's not like Obsidian, where you can get a tree of all the files in your project. It lets you read markdown files. It lets you click links between markdown files, and there's a tab that is the history of all the files you've opened, like a browser history.
Because all of us doing all of this AI dev are constantly reading markdown files, and anytime I clicked on a markdown file, what would open? Xcode. Antigravity. VS Code.
>> Obsidian. Yeah, I don't use IDEs anymore, but I still had a couple installed, and I just wanted to read a text file. I also have a bespoke blogging client designed for my static site, my 11ty blog, that uses the GitHub API but is a full desktop app and most of an iOS app.
Um it's like software for one.
I'm the only user. Yeah.
Yeah, that's the best part. I can create my own app and the UI can be tailored exactly as I want it, rather than picking from hundreds of to-do apps that don't do exactly what I want.
Yeah.
This is actually related to where I think a lot of stuff is going. You know, people talk about the SaaS apocalypse, the idea that software as a service >> Yeah. is in trouble, and I think it is. You know, it's going to take a while before the future is everywhere, but it is very clear to me that if you're good with the tools today, it is usually easier to build the part of a SaaS product you need than it is to get onboarded with a commercial service.
There are times when, you know, they have some other moat, like they've got, you know, they take liability or they have an interaction with the physical world for you. But it's easier and easier to build exactly the software you want. And right now a lot of that is personal software, but I think that we're starting to see the beginnings of very broad changes in the build versus buy decision for companies. Um it it is now often easier to build exactly the thing they want than to pay to use a commercial product. Right.
And so the long tail is going to get longer and longer. I work for a SaaS company, so I need to be a little careful >> [laughter] >> about how to phrase this, but yeah. I mean, you work for a company that is much more than SaaS. They have significant physical infrastructure. They provide deliverability. The software isn't their moat. Your employer's moat is that they provide network connectivity and services over the network, and they have points of presence everywhere. It is not like they're providing an online bookkeeping tool or customer service tools. Yeah. And so I think the kinds of things that Fastly does are still very valuable. It's the companies that are just a piece of software that I'm much more nervous about.
>> Yeah, especially the tools that you can really easily replicate with bespoke software. So for example, I'm not going to name names, but think of a Scrum bot, a Slack bot that asks each member of the team, "What did you do yesterday? What are you going to do today?"
And then it aggregates these messages across the team and shares them in a different channel. And obviously there are some aspects of this piece of software that are painful to build, like keeping a history, privacy settings across teams, being able to archive or delete the messages when needed, etc. But I think the majority, like 99% of the features, can be easily replicated, and they don't have great moats.
Yeah, that one in particular. One of the things that I put together for our corporate internal Slack was a little bot whose initial purpose was two things. When it learns facts, it updates a wiki that's a Git repo. And when somebody mentions something that's a problem, it opens a Linear ticket.
And then at some point I realized that we were doing daily stand-ups and I told it, "By the way, you need to pay attention to the daily stand-ups channel. Anytime somebody reports what they're doing today, you know, what their morning stand-up is, record that in a data directory and the next time they post a stand-up, reply with what they said last time and ask them how they did." Mhm.
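In the conversation this behavior was just a prompt given to the bot, with the model doing the work. For readers who want to see the underlying logic, here is the same record-then-recall idea as plain code. It is a sketch under assumptions: the one-JSON-file-per-user layout and the `record_standup` and `reply_for` names are hypothetical, and no Slack API is involved.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def record_standup(data_dir: Path, user: str, text: str) -> Optional[str]:
    """Store the latest stand-up for a user and return the previous one, if any."""
    path = data_dir / f"{user}.json"
    previous = None
    if path.exists():
        previous = json.loads(path.read_text(encoding="utf-8"))["text"]
    path.write_text(json.dumps({"text": text}), encoding="utf-8")
    return previous


def reply_for(previous: Optional[str]) -> str:
    """Build the follow-up message the bot would post in the channel."""
    if previous is None:
        return "Noted. I'll ask you about this next time."
    return f'Last time you said: "{previous}". How did it go?'


# Demo in a throwaway directory standing in for the bot's data directory.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    print(reply_for(record_standup(data_dir, "alice", "Port the parser")))
    print(reply_for(record_standup(data_dir, "alice", "Write parser tests")))
```

The second call replies with the first day's entry, which is the "reply with what they said last time and ask them how they did" behavior.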
And it was a prompt. Like, that was it. It's a thing that our Slack bot does. What's really funny is one of our investors is in a single private channel with me on Slack, and I invited the bot in because it's useful to have it keeping notes. And our investor has discovered that he can ask it questions about what the team's been doing, because on the back end it's just Claude, since that was the easiest way to build it. He can ask it to digest news items and ask how those impact what the bot sees as the team's current work.
Um it's really funny to watch him interact with it because it has all the information.
Right. Yeah. And a prompt is just a very powerful thing. Even before this Claude Code thing became mainstream last year, I think 2 years ago, I wrote a giant iOS shortcut that interacts with the ChatGPT API.
What it does is get the weather information from iOS Shortcuts based on my location, get the to-do list from Asana using the Asana HTTP API, and get stock information based on the stocks that I own. Then it aggregates this information and sends a text message to me every morning at 7:00 a.m. Yeah.
>> I was able to write this entirely in iOS Shortcuts. And the nice thing about it is if I want to change the layout or change the tone of the messages or add some features, I just need to update the prompt. I don't need to recompile anything. It's just a text field. And I actually changed it later to get the prompt from a text file in iCloud Drive. Nice.
>> So I don't even need to open the Shortcuts app to update the shortcut. I can just open a text file and change the prompt, and it will get the text file from iCloud Drive. I don't need to do anything other than that.
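The "prompt lives in a text file" design can be shown without any network calls. The sketch below is not the shortcut itself: `build_request`, the file name, and the sample weather and to-do strings are placeholders, and the actual fetching and the call to the language model are deliberately left out.

```python
import tempfile
from pathlib import Path


def build_request(prompt_file: Path, data: dict[str, str]) -> str:
    """Combine the editable prompt with the data fetched for today."""
    # Re-read the file on every run, so editing it changes the next message.
    instructions = prompt_file.read_text(encoding="utf-8").strip()
    sections = "\n".join(f"## {name}\n{body}" for name, body in data.items())
    return f"{instructions}\n\n{sections}"


# Demo with a temporary file standing in for the synced prompt file.
with tempfile.TemporaryDirectory() as tmp:
    prompt_file = Path(tmp) / "morning_prompt.txt"
    prompt_file.write_text("Summarize my morning in a friendly tone.\n", encoding="utf-8")
    request = build_request(
        prompt_file,
        {"Weather": "Sunny, 18 C", "To-do": "Record episode"},  # placeholder data
    )
    print(request)
```

Because the instructions are read at run time, changing the tone or layout of the message is an edit to a text file rather than a change to the program.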
>> Yeah.
What's wild is that the thing you just described, 15 years ago you would have written as a Perl script running on a server. I mean, the difference is that you're writing in a human language rather than a computer language. Out of curiosity, is that in Japanese or in English? It's in English. Okay. Yeah.
So yeah, if it was 15 years ago, I would have written a Perl script that fetches all the things using a shell script or an HTTP client. And I'd use Template Toolkit to generate the text and then send it as a text message. I could totally do that, but maintaining those things will require... It's annoying. It's annoying. Yeah.
It is absolutely the case that because you're a programmer, doing this in English with the ChatGPT integration, you are probably getting better results than the average person who is not a programmer, because you understand systems thinking. And that's one of those really important skills that people still need to learn even when they're vibe coding: how to think about problems and how to explain them. Right.
Yeah.
>> I think that's especially true for the SwiftUI app. I don't know SwiftUI at all. But if I read it, I can see what's going on generally. And when I built this Anki memorization Korean learning app, I wanted to build a few features that do not exist in the Anki app, one of which was to practice Korean pronunciation. Ooh.
>> Um the idea was, because the data set contains the Korean phrase and the Japanese phrase,
what I can do is implement a speech-to-text engine, and then I tap a button and read the Korean text aloud, and it recognizes that and compares it with the Korean text in the database. And then if it's roughly a 99% match, it will say it's a match. And I can repeat until it matches. Yeah. Um I was able to build that feature in like 5 minutes.
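The comparison step, checking recognized speech against the stored phrase with a roughly 99% threshold, could look something like the sketch below. It is written in Python rather than Swift purely for illustration, and the normalization rules and the use of `difflib` as the similarity measure are assumptions; the app itself may compare text quite differently.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Compose Hangul consistently and drop spaces and punctuation."""
    composed = unicodedata.normalize("NFC", text)
    return "".join(ch for ch in composed if ch.isalnum()).lower()


def similarity(recognized: str, expected: str) -> float:
    """Return a 0.0-1.0 similarity ratio between the two phrases."""
    return SequenceMatcher(None, normalize(recognized), normalize(expected)).ratio()


def is_match(recognized: str, expected: str, threshold: float = 0.99) -> bool:
    """True when the recognized speech is close enough to the target phrase."""
    if not normalize(expected):
        return False  # nothing to compare against
    return similarity(recognized, expected) >= threshold
```

Stripping punctuation before comparing matters because speech recognizers often add or omit a trailing period that the flashcard text does not have.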
Yeah. [laughter] Using the Superpowers brainstorming to figure out what needs to be done. You know, what API does it need? I didn't even need to use the coding agent for that. I just used, you know, a browser search engine to figure out which iOS SDK APIs it needs. It was pretty easy.
Why did you need to tell it? You shouldn't have even needed to tell it. You could have just told it to go figure it out. Yeah. Sometimes the coding agents, especially with SwiftUI and the iOS SDK, try to use an old version of the SDK, which is not current. Yeah.
>> So I was a little careful about that aspect. But in hindsight, I think I could have asked it to just do the research on its own and figure it out. I have been noticing that Claude and Codex have both gotten much better at not using old versions of Swift and SwiftUI. Swift has the problem that, because it's a language that's evolving so quickly, what was the best practice a year ago when the training data was collected is no longer right. Um there are a couple of good MCPs. One of the ones that's out there is called Sosumi, like s o s u m i, like the old Apple alert sound. Yeah. I think it's sosumi.ai and it's an MCP that is a proxy to Apple's technical documentation because [clears throat] Apple's technical documentation is not downloadable anymore. It's only on the web. And the web pages are all rendering from a back-end API.
They're using JavaScript to render.
And so if you point regular Claude's web fetch tool at it, it fails.
And so Sosumi is basically an MCP that knows how to browse the back end of Apple's technical docs.
>> Right.
The web fetch tool failing on a lot of domains is annoying.
Yeah. So I ended up um last fall building my own browser MCP because I got so annoyed at the Playwright MCP. Mhm. And so I built a Chrome browser MCP using the Chrome DevTools Protocol, the same protocol the dev tools use. It is one tool and about a thousand tokens, and by API design standards it's criminal. The tool has three parameters. One of them is called action, one of them is called selector, and one of them is called payload.
And the description for action is a list of the 20 commands that you can paste in there.
The selector parameter originally it said only CSS selectors no XPath.
And when I was doing early testing Claude kept getting confused and trying to include XPath.
And then I realized you know what?
I don't need to make it be CSS only and I don't even need a parser. You know, it's literally just put in whatever you want. It'll work.
I set it up so that after any action you take it automatically dumps a screenshot, a copy of the DOM, a markdown version of the page, and a copy of the browser console into a well-defined place on disk, so you don't need to do another tool call back to the browser to get it to do things.
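The shape of that design, a single tool with `action`, `selector`, and `payload` parameters that writes the page state to disk after every call, can be sketched as below. This is not the real Superpowers Chrome code: `FakePage`, `use_browser`, the three sample actions, and the single `state.json` snapshot are all hypothetical stand-ins so the example runs without a browser.

```python
import json
from pathlib import Path


class FakePage:
    """Stand-in for a real browser page; purely illustrative."""

    def __init__(self) -> None:
        self.url = "about:blank"
        self.fields: dict[str, str] = {}
        self.console: list[str] = []


def _navigate(page: FakePage, selector: str, payload: str) -> str:
    page.url = payload
    return f"navigated to {payload}"


def _type_text(page: FakePage, selector: str, payload: str) -> str:
    page.fields[selector] = payload
    return f"typed into {selector}"


def _click(page: FakePage, selector: str, payload: str) -> str:
    page.console.append(f"click {selector}")
    return f"clicked {selector}"


# One tool, many actions: the action name selects the handler.
ACTIONS = {"navigate": _navigate, "type": _type_text, "click": _click}


def use_browser(page: FakePage, out_dir: Path, action: str,
                selector: str = "", payload: str = "") -> dict:
    """Run one action, then dump the page state to a well-known file."""
    if action not in ACTIONS:
        valid = ", ".join(sorted(ACTIONS))
        return {"ok": False, "error": f"unknown action {action!r}; valid actions: {valid}"}
    result = ACTIONS[action](page, selector, payload)
    out_dir.mkdir(parents=True, exist_ok=True)
    snapshot = out_dir / "state.json"
    snapshot.write_text(json.dumps(
        {"url": page.url, "fields": page.fields, "console": page.console}))
    return {"ok": True, "result": result, "snapshot": str(snapshot)}
```

Returning the list of valid actions in the error message follows the same idea as the tool description: the caller should be able to recover without consulting any other documentation.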
And it's been super useful.
A couple of weeks ago, for kicks, I set up a NanoClaw as my company's new junior go-to-market person.
Just to see how it would go. And one of the first things, I bought it a genuine G Suite corporate account that I'm paying for.
We bought it a corporate GitHub account and I told it, "Okay, I need you to go set up Google Analytics for us."
And NanoClaw is Claude Code on the back end, and it started off with, "Well, you're going to need to go set up a GCP project and grant me this and grant me that and I'll need tokens."
And I tell it, "Look, you have a real browser." And it's like, "Agent Browser got stopped by Google because it fingerprinted me as not being a real browser." I'm like, "You have Superpowers Chrome.
Superpowers Chrome is a headed copy of Chrome. You can go do it."
And it it's like, "Oh, I do have this tool. Let me go try." And it says, "Hang on, I got a CAPTCHA. I can't fill out CAPTCHAs."
I say, "Oh yes, you can."
It says, "Right, I forgot I can fill out CAPTCHAs." And I wander away for about half an hour and I come back and I'm about to try to figure out how to do this myself and it's like, "I got logged in. I got the tokens.
I got Google Analytics all set up."
I had it set up Klaviyo and then discovered that we hadn't implemented drag and drop, because drag and drop had never come up. So now Superpowers Chrome has drag and drop. It has type like a human.
So there are definitely still agentic CAPTCHAs that will capture it, but any reasonable place where you want it, it seems to work.
Even last December I used an agent with Superpowers Chrome to do our incorporation paperwork on Stripe Atlas just to see if I could do it. It worked.
Wow.
Yeah.
So the lesson is you don't need an MCP that has a hundred functions. Instead you can provide just one function that takes a string to eval. Is that the lesson? So, I mean, I've actually got a blog post up about this, but if you think about an MCP as an API facade, you're doing it wrong, because the entity that's using the MCP is more like a person. If you think of it more like a Unix command, and how you'd build a Unix command, you'll do better. Um when I started setting up an agent to read my email, I use Fastmail, and so I found the best Fastmail MCP server, and it's JMAP, which is their JSON-based mail access protocol, and I watched it and it was struggling. It took it a little while to be able to download a single message and I stopped Claude and said, "So what's so hard about that?"
Claude says, "Well, the JMAP MCP is a strict facade over the JMAP wire protocol and it's okay. I can just go read the protocol specs anytime you need me to read an email."
>> [laughter] >> I'm like, "Okay, Claude, I need you to go read my blog post about why MCPs aren't like other APIs."
And Claude comes back and it's like, "Oh, now I understand. A good MCP should be designed so that the kid working in the NOC could operate it at 2:00 in the morning without opening the runbook."
I'm like, "Well, that's not how I would have phrased it, but that's not wrong."
>> [laughter] >> And so now whenever I'm designing an MCP I literally tell Claude, "Go read this blog post of mine so you understand the zen of how to write good MCPs."
And I still find MCPs to be really useful even just for tools, because the models have been trained so hard to look for tools in their tools array that they're way more likely to use those tools than if you give them some skills and the skills tell them where to find shell scripts.
Right.
Yeah.
And because it's more structured, as an array of tools, you will get more reliable and stable results than telling the agent to figure out how to use it every time with a giant skill.
Yeah, that's been my experience. The other thing is, this is a thing where Anthropic is better than any other company that I've seen at focusing on the descriptions of the tools in the tools array. So I remember when Codex first got open-sourced, their tool descriptions were very weak, and Anthropic's tool descriptions talked about how to use the tool and when to use the tool and why to use the tool, and it was really good prompting.
And so that's a lesson I learned very quickly: telling the agent not just what to do but when to do it, and why you want it done in a specific way or at all, and how to think about it, gets you much better results. I assume that's the same when you write skills: in the description field of the skill you need to be really expressive about when to use this skill and how to use it.
So this is a thing where Superpowers started off as a skills framework for Claude Code and I accidentally front-ran Anthropic by about two weeks on skills for Claude Code. I didn't know they were shipping them.
And so my skill system was a little different. In the headers of the skills it didn't have the two fields, name and description. It had three fields: name, description, and when to use. Because what I had discovered is if you told the model what a skill does, it sometimes will choose not to read it because it thinks it knows how to do the work. So if you say, like, this skill describes our node module release engineering process, it turns out there's a lot of documentation on the internet that agents have already read about how to do node module release engineering.
But if your skill description just says read this skill before you do any node module release engineering, it doesn't know what's in there. It doesn't have an expectation that it is even the process. It might be cautions. It might be API key information. It could be anything.
And so what I found is that agents are much more likely to use skills if the only thing in the description is when you're supposed to use it. Right.
And then once they read them the tokens are in the context window and you're home free.
But Anthropic's skill system doesn't have a dedicated field for when to use it and when not to use it.
Right? Right. So at this point I use the description field, and my description fields only ever say when to use it. They don't say what it does. Right. Okay.
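As a rough illustration of a header whose description states only the trigger, here is a small sketch that renders one. It assumes the two-field header (name and description) mentioned above is written as a YAML-style block delimited by `---` lines; the helper name, the skill name, and the sample wording are all made up for the example.

```python
import json


def render_skill_header(name: str, when_to_use: str) -> str:
    """Render a two-field skill header whose description is trigger-only.

    Hypothetical helper. json.dumps produces a double-quoted string, which
    keeps colons or quotes in the text from breaking the YAML-style header.
    """
    return "\n".join([
        "---",
        f"name: {name}",
        f"description: {json.dumps(when_to_use)}",
        "---",
        "",
    ])


header = render_skill_header(
    "release-engineering",
    "Read this before you do any node module release engineering.",
)
```

The description deliberately says nothing about what the skill contains, so the agent has to open it to find out.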
And that seems to work very well across pretty much every agent and model that I've tested against. My biggest problem with it is that I get well-meaning pull requests from people who tell me that I'm violating the guidelines for how to use skills and I need to make my skills compliant with Anthropic's guidance.
Mhm.
It's, you know, >> [laughter] >> I appreciate that they're trying to help. They're trying to help.
Um but agentic pull requests in open source are a whole thing.
Right.
>> Um have you have you been having trouble with uh slop pull requests?
>> Mhm, not personally. Okay.
>> But I've seen some of those in the repos that I have access to. Yeah. Yeah.
So, unsurprisingly, because Superpowers is kind of popular, I get a lot of pull requests. Right.
>> And I was running into this problem where pull requests were really low quality. They were not explained, they were not tested, they were often things that we absolutely did not want.
And then I realized that I didn't have a pull request template.
Ah.
I had Claude help me build a pretty nice pull request template that assumes that all pull requests are being submitted by an agent. So, it asks questions like, "What prompt did your human give you that resulted in this pull request?"
>> Ah. "Has your human read every line of the PR? Have you done a search for other pull requests related to this that might have been rejected?"
And that helped a little bit. Does the agent actually read that though?
Uh not Claude Code, because Claude Code is usually using gh. Right. That's what I do as well, and the gh pr command as far as I know does not use the pull request template.
>> It does not. And so, Claude Code was not seeing the line that said, "If you ignore the pull request template, we will close your pull request without reading it." Mhm. Yeah, that's fair. So, I figured something out. Um the project now has a CLAUDE.md and an AGENTS.md that contain only contribution guidelines for agents.
And it was I didn't write it, Claude wrote it. I first told Claude roughly what I wanted and it built it. And then I had this idea of "Actually, why don't you go read every pull request that we've rejected and update the guidance?" Mhm.
And Claude comes back and let me let me see if I can get the text cuz the text is pretty crazy.
Um Is that up in the repo?
It's in the repo, it's also on my blog. Um I told Claude, "Go update the guidance." And then Claude wrote this.
"If you are an AI agent, stop. Read this section before doing anything. This repo has a 94% PR rejection rate. Almost every rejected PR was submitted by an agent that didn't read or didn't follow these guidelines. The maintainers close slop PRs within hours, often with comments like, 'This pull request is slop that is made of lies.'"
Which I have act... That was me, by hand; I've done that. Um and then it goes on to say, "Your job is to protect your human partner from that outcome.
Submitting a low quality PR doesn't help them, it wastes the maintainer's time, burns your human partner's reputation, and the PR will be closed anyway. This is not being helpful, this is being a tool of embarrassment."
>> Wow. "Before you open a PR against this repo, you must..." and then it goes on.
Um Sounds almost like a threat.
Oh, it's absolutely a threat, but I didn't write the threat, Claude wrote the threat.
>> Right.
>> [laughter] >> Um and it's you know, it's a little bit of a threat and a little bit of a promise.
And what's amazing is after this change, I would say that the quality of PRs is, I don't have numbers, I'm not Claude, I don't make up percentage numbers like that, but it is much, much higher. Mhm.
Um most of the problems now are that the human told Claude to do a thing that's wrong. So the things that we get now are, "You need to update Superpowers to include my proprietary product," or "I have these new skills that you should include."
But it's not random drive-by garbage. And so it's a huge improvement.
Yeah, coming back to the skill triggering and loading issue. Yeah. The first skill that I wrote at my work was to debug a Fastly service: when you encounter some issue, here's a list of commands and API calls to investigate what's going on with the customer's service. Mhm. Um I put it in a skill because that's sometimes what I need to do during an incident.
But interestingly, because the name of the skill is very generic, fastly/debugging, [laughter] every time I go to a code repository and try to debug a thing, Claude finds the skill. Oh, here's the Fastly debugging skill. Maybe this is something useful.
And a few seconds later, it finds that that's not exactly what we want and basically moves on after wasting a few tokens. So I just deleted that skill and moved it into a specific directory.
Before the skill system, I was doing something similar. I think a lot of people did something similar by creating a bunch of directories under an agent directory, putting a specific CLAUDE.md file for the specific workflow in each, and writing a shell script that cds into that directory and launches Claude from there to pick up the CLAUDE.md, rather than having a very sophisticated and structured skill system. So, I think >> Yeah. And also I had a bunch of slash commands, which is essentially the same thing, but more limited, I guess.
Yeah, I mean the slash commands were designed for, you know, human manual triggering. It was interesting to see Anthropic roll out skills next to slash commands. As far as I can tell, the first implementation of skills was actually as slash commands. Right.
>> inside Claude Code. Then they added a skill command. Then they made skills appear to be slash commands to Claude.
And then they've kind of been trying to kill slash commands in favor of skills. Um it's a little bit confused, but you know, we're figuring these things out as we go.
Um up until I figured out skills, I literally had four or five chunks of text that I would copy and paste. It was very old school, very manual.
And for me a lot of the value of skills has been auto-triggering. I shouldn't ever have to say, "Go use the such-and-such skill."
It should just know. Mhm.
I did some research on on Twitter.
I know you are not very active there anymore, but I happen to be because that's where some people are hanging out. No, it's where a lot of AI stuff is and it's very weird to not be active there, but still constantly getting linked there.
Yeah.
Um I did some research about how popular Superpowers is, especially among Japanese Twitter users. Yeah. And yeah, it seems like a lot of people like it and recommend it to solo developers who want to have some consistency in the process.
That's pretty fascinating.
But one of the complaints that I saw was that sometimes the brainstorming task gets triggered even if all they want to do is just a quick task without any thinking. [snorts] Yeah, and so, I mean, it's difficult to tune these things because every model and every harness changes things.
When um sometime a couple of months ago when Anthropic made their plan mode trigger much more aggressively, Mhm.
Superpowers brainstorming was never triggering because plan mode was triggering. Mhm. Um and plan mode I mean plan mode I caught plan mode triggering on its own, making a plan, leaving plan mode, and then starting plan mode again.
Um because they had phrased it as, like, anytime you're going to do complex work, or something like that. Uh and we played some games to try to make it so that if you have brainstorming, you usually want that instead of plan mode. And so I did some work to try to get it to: if you're about to start plan mode, you really want to trigger brainstorming.
And I think a lot of the "brainstorming triggers too often" is because plan mode was about to trigger. Mhm.
Um I don't love it. I don't have a I don't have a great answer other than you can tell Claude, "Let's just get this done. We don't need to brainstorm this."
Right, right.
Just to be explicit about it. Yeah. And some of the best work I get out of agents is by being really clear with what I want. Or when something weird happens, stopping them and saying, "Can you explain why you did that? What were you thinking?
What could I have said that would have gotten you to do it the right way?"
Right. And then rewinding, basically unwinding the conversation and starting over and saying it the way that Claude said might be better. Yeah, personally I use Superpowers for brainstorming, and especially the visual companion when building a website or iOS app is really powerful.
Yeah. I find it really fascinating, especially the user experience of it.
Yeah, it took a little while to get it to work right just because Claude really doesn't want me to do the thing I'm doing.
>> [laughter] >> Um but this was a thing where, you know, I had sometimes had Claude make an HTML mock-up and open a browser.
And then I spent a little bit of time figuring out how to get Claude to spin up a web server and write files into the right place so that the browser content updates and clicks from the browser propagate back to Claude.
The very first version actually had it so you could type notes; you could basically have the conversation in the browser.
But Claude doesn't have a way, or at least at the time didn't have a way, to inject a real user message from outside a running session. Right.
>> So it would basically either hang waiting for the browser or, like, there was no good way to make it feel right. Um but I've been really happy with letting Claude write HTML mock-ups. And I even used it for some logo design for the new company. We went through a logo design exercise where it was writing SVGs and shoving them to me and asking, you know, do you like A or B better? Right. Um one of the things we've been playing with is going a little bit further than mock-ups into prototypes. And so having it write workable HTML and JavaScript prototypes with actual functionality to get a feel for whether the functionality is right.
And that's not quite ready to go yet, but it's a thing that feels really good. Right.
We had already talked about how this kind of looks like waterfall development all over again. But I guess the problem with waterfall was that you spend a lot of time making sure the requirements are right and the planning is done correctly.
And then you go to development, which takes a really long time, and once you realize that there's a problem with the requirements, you need to undo and redo the whole thing, which takes a lot of time.
But that aspect, that development takes a really long time, is not true anymore.
You can go from defining requirements, brainstorming, and writing plans to getting the code up and running in, I don't know, an hour. Yeah. Um and if there's a problem with the requirements after everything is done, surely some of the tokens are wasted, but you learn the thing and you can redo it, and whoever writes the code is not tired, so you can do that again.
Yeah. No, like the the sort of the most important thing that makes that work is that it is often easier to go and update the requirements and start over than it is to try to modify the thing that's not quite right.
That um I've been spending a lot of time trying to figure out how you do [clears throat] the iterative version of dev. Cuz right now there's sort of two modalities. There's this very big upfront one: you do all your planning, maybe with Superpowers brainstorming or something else, you generate an app, and then you want to make small changes. And the metaphor that I've been trying to use, that Americans don't quite understand, and I'm going to mispronounce it, is dorodango: the polishing of mud balls into these gorgeous works of art. Okay. And so you're taking this thing that starts off lumpy and brown and not all that interesting. Um conveniently there's also a software metaphor called a big ball of mud, where the insides are kind of garbage. And by spending time carefully polishing it, making tiny changes, cleaning up this and that, this sort of quick iterative stuff, you can end up with this gorgeous work of art from something that started off kind of lumpy, brown, and unexciting.
And I don't feel like we have good methodologies for doing that yet. It's a thing you can do sort of in Claude code with it open or any of these tools, but it doesn't get you updated specs at the same time. Right.
Um and that's a thing that we're I think as a as an industry we need to figure out. Hm.
One of the things that Claude did or used to do, maybe not anymore was to write down a plan document and without me asking for it, it puts some rough estimate of how much time it takes to implement.
>> [laughter] >> Um usually it says like uh estimation is about 2 weeks of coding.
And uh I read it 2 weeks? No, no, no.
2 hours.
And you're going to do it.
Yeah. Um things that I have found improve that. So one is, okay, that was a human estimate. How long would it be for a coding agent like you?
Um and the other is in my CLAUDE.md I now have a line that says, anytime you are giving me software estimates, you must provide them in lines of code rather than time. [laughter] And [clears throat] I've been noticing 4.7 is not as good at honoring that, but in general that seems to work. You know, I don't care what the estimate is really. Right. What I care about is, is this crazy and impossible and going to burn up all of my Claude Code credits, or is it easy? Right.
I find it fascinating when I wrote a design doc for a project and it estimated the migration time it needs to happen in production was I think 3 months or something.
Um and it was actually not wrong. Yeah, okay. That's actually true for a production system to migrate one system to the other. Maybe 3 months is actually pretty optimistic. Okay. Uh so that was maybe on the right spot.
Sure.
>> [laughter] >> Yeah. Okay. Yeah, production systems are different and you have customers, and this is one of the things that I constantly struggle with with the coding agents: how much they love backward compatibility. Oh, yeah.
>> And I have to constantly remind them that this is an unshipped V1. There are no users.
You do not need to migrate data. You do not need to write a compatibility code path to protect all the old users. Yes. Um And this happens to me all the time. I have that in a project CLAUDE.md.
Um this project does have existing users and it has an existing database.
But what Claude tries to do happens even for a function that's not public and not exported. So this is a function that's only used internally in this project.
But whenever we make a change to one of its signatures, it tries to create a new function with the new set of parameters while keeping the old one for backward compatibility with downstream users, which don't exist.
Right.
>> So I need to put in CLAUDE.md: this is a standalone application.
No other application uses this as a library. So whenever we update a function signature or anything, we don't need to care about backward compatibility.
Yeah. Yeah. Do you have that kind of thing in Superpowers?
It's not in Superpowers per se, it is in my CLAUDE.md and my AGENTS.md. And what's funny is I think it's written as, like, anytime you think you need to include something for backward compatibility, you need to ask. Right.
And Codex and the GPT models are so rules following that they will come to a dead stop and be like, I I think I need to add a function, you know, a function here to preserve backward compatibility, but your rules say I need to get your explicit permission. May I add that function, Jesse?
Um yeah, it's... Is that because of the model or is this the system prompt? That's cuz of the model. The GPT models have always been way more rules-following than the Anthropic models. Um when I first ported Superpowers to Codex and GPT, I started it up and one of the first things that brainstorming told Claude, when at the time it had been just for Claude, was you need to use your TodoWrite tool to put this list of tasks on your task list.
Codex freaks out. It says, "I don't have a TodoWrite tool. Let me see if there's an MCP. Nope. Let me see if there's a shell script in the current directory.
Nope. I'm going to search the entire disk for a tool called TodoWrite."
Um and I stopped it and I had to go and add a little translation table, like: some things in Superpowers were originally written for Claude Code. When you see something that is for Claude, you should use your own equivalent.
For example, the TodoWrite tool is called, you know, it's called task. This tool is called that. And once they do that, Codex has no problem.
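A translation table like the one described can be as simple as a mapping that gets rendered into a short preamble for the non-Claude agent. The sketch below is an assumption about the general shape only: the entries on the right-hand side are placeholder descriptions, not real Codex tool names, and the `Task` row is an invented extra example.

```python
# Hypothetical mapping from Claude Code tool names to generic descriptions of
# the other agent's equivalent. The right-hand values are placeholders.
CLAUDE_TO_LOCAL = {
    "TodoWrite": "your own task-list tool",
    "Task": "your own sub-agent or delegation mechanism",  # illustrative entry
}


def translation_preamble(table: dict[str, str]) -> str:
    """Render the table as instructions to load at the start of a session."""
    lines = [
        "Some instructions here were originally written for Claude Code.",
        "When they name a Claude-specific tool, use your own equivalent:",
    ]
    lines += [f"- {claude_name} -> {local}" for claude_name, local in table.items()]
    return "\n".join(lines)


preamble = translation_preamble(CLAUDE_TO_LOCAL)
```

Keeping the mapping as data means supporting another agent is a matter of adding rows rather than rewriting every skill.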
Um uh one of the things that make Superpowers work well on Claude Code is I have a bootstrap hook that when you start up a session, it loads the using Superpowers skill into the into the context buffer automatically.
And that's that's the reason that Superpowers skills trigger better than average Claude Code skills cuz I load extra text that explains to Claude how important it is to use skills.
For Codex, they don't have plugin hooks.
And so but what we discovered is that because the using Superpowers skills description just says, "You need to read this at the start of every session."
Codex just does it every time.
>> [snorts] >> Because it follows rules all the time.
It follows the rules, and when the user says something that doesn't have an exception, there are no exceptions. It just does it. Um I find that it has made Codex less fun to interact with. And I don't feel like Codex is as good an architect as Claude, but I feel like it's a more competent engineer and more reliable at putting together good quality code. Friends of mine who study this stuff basically don't trust Claude to write good code without it being reviewed by Codex.
Yeah, that's what I heard from some of my co-workers as well as friends on the internet.
Like I've just started playing with Wes McKinney's RoboRev, which does sort of automated code review through skills.
And so from Claude Code, it will use RoboRev to run Codex and do a code review on this and come back with complaints. Um and Codex likes to complain about code quality.
Right.
So at this point in time, do you recommend having Claude write the requirements, plan, and design documents and then letting Codex implement them? It varies a lot. Um somewhat on the day, somewhat on the project, somewhat on how my token subscriptions are today.
>> [laughter] >> I I have been really enjoying the Codex macOS desktop app. Like way more than I thought I would. Yes. Um it's they they did a really good job. It feels fast. It surfaces the stuff I care about.
It doesn't surface a lot of stuff I don't care about. Um I'm surprised how much I like it. And I I spent today trying to use the Claude Code desktop app and it feels like it is optimized for people who want to see every detail and read every line of code. And that's not who I am anymore.
I don't write the code and I don't read the code. Right. Yeah, I find it surprising as well. I tried Codex CLI after the Claude CLI and I didn't like it as much. And then I tried the Mac desktop app.
And I think it feels right. I initially thought I'm a CLI type of person, that everything TUI is better than GUI, but not in this case, especially the interaction with Codex in the macOS app.
Whenever I need to iterate on the result, it's much more natural and easier to do than doing that same thing with Claude, because when I have the agent execute the plan for me and I need to review it and I find a few things to correct, it's so much easier to deal with in Codex using the diff view. I can just put a comment inline inside the diff viewer, just like a code review on GitHub.
>> I I've not tried that.
>> Yeah, you can queue [snorts] these comments and submit them in a batch Yeah. rather than doing it every single time. And that's very difficult to do in the Claude CLI because I need to basically quote the exact line of code and paste it, or just put the line number, which is very error-prone. Yeah. I'm also usually running Claude Code in tmux on a remote server now.
Um because I wanted to keep working when I close my laptop. Uh and so it's even more annoying for anything around copying and pasting.
Uh in the Codex desktop app, the integrated browser is actually very reasonable.
I haven't spent a lot of time playing with it, but they have a mode that lets you draw on top of the browser and send that to Codex.
Uh which is, like, very clever. Um Also, they have shipped a bunch of improvements to computer use inside the macOS desktop app. So Codex is much more able to use other programs on your Mac. Right.
Um and so when I'm doing iOS dev or macOS dev, it does command and control of the app or the simulator and, yeah, I like it. I feel a little weird advertising for them.
>> [laughter] >> You know. Yeah, can you do that with Codex?
With the computer use? The Codex macOS desktop app has computer use, and the big update they shipped last week improved it dramatically. Okay.
Yeah, I think it's behind a flag that you have to turn on.
Right. Got it. Superpowers finally became a proper Codex plugin last week, and it got announced as part of the big "Codex for everything" app launch. They even had us in one of the promo videos. Mhm.
Yeah.
The breaking news is that Anthropic is not allowing Claude Code on the $20 Pro plan. So I saw that this afternoon, and then I saw clarifications on Twitter from somebody in comms that said this is a trial for less than 2% of new users, and they promised that they're actually going to tell us before they make that change.
Everybody I know who has tried to use Claude Code on the $20 plan in the last 3 months has complained that they ran out of tokens instantly. So I saw this and thought, "Okay, that's a reasonable choice because it burns tokens too fast."
But they said that in this trial, they're not going to let you use Claude Code, but they are going to let you use Claude Cowork.
>> Cowork, yeah. And Cowork is just Claude Code running in a VM on your Mac,
except that it doesn't have access to arbitrary file paths because of the sandbox restriction. So it is sandboxed from your Mac, but inside the Linux VM that it's running in on your Mac, it's, yeah. And so if you didn't want to pay much money and didn't want to use Claude Code, in theory you could just go into the VM and let it do your work. That I don't know.
It doesn't make any sense. None of this makes any sense. No, I actually use Claude Code with the $20 Pro plan.
That's me. Wow. Really? Yeah. Okay.
>> I do not use Claude Code as much for my personal things. So I do not use the $20 Pro plan for my work stuff, obviously. Work stuff is under a separate enterprise plan. Yeah. And I use both Claude Code and Codex and switch between them depending on the token usage and the type of work that I need to do.
Yeah. Are you using any of the Chinese models or other models for code work?
>> I have not tried. Is it like Kimi and Qwen and >> Like Kimi and Qwen and MiniMax and GLM. I have a bunch of friends who are increasingly excited about them and also seem very cranky about Anthropic over the last month.
And I've played a little. I haven't yet found one where I'm like, "This is good enough that I would switch." Because I feel like, for the most part, you always want to use the best possible model you can use right now. Yeah.
I do think there's a bunch of interesting stuff that's finally starting to happen around using local models for the cheap work, like tool calls, and smart models in the cloud. I've even built a prototype coding agent where everything is a sub-agent, even things like file read,
to make it easier to play with those kinds of architectures.
Right.
I heard good things about Composer 2 from Cursor. I think it's based on Kimi.
Yeah, 2.5. Yeah.
I mean, the other breaking news about Cursor is that they might have gotten acquired by, presumably, the xAI part of SpaceX. Right.
A few hours ago, before we are recording today. Yeah, but it was not a done deal. It just says they have the right to do it.
So it's like they're going to work together for a little while, and SpaceX has the right to buy them for $60 billion, but if they choose not to buy them, they will pay them $10 billion. Okay. It's a breakup fee. It's a pretty standard thing. But some people on the internet were speculating that, because xAI has all of these GPUs, maybe they can get Cursor to help them train a better coding model. Mhm.
To make Grok a better model for programming? I guess so. I don't know. I am not a huge fan of xAI and their political side of things, but I was using a cloaked model on OpenRouter and it was this amazing coding model, and then it turned out that it was actually Grok 4.
It was a much better coding model than I would have expected.
Interesting. Yeah.
You mentioned that you are not reading code anymore.
Yeah. Do you think that's the future we are heading toward, even for professionals?
So just speaking for myself, for my personal vibe-coded apps, like iOS apps and macOS apps, I do not read the code at all.
And for my personal projects, for example the website of this podcast, I still try to read the code even if I let the agent drive the actual implementation. I try to review it. I stopped being nitpicky about particular design choices. As long as it works, it's okay. And as long as I understand the intent, that's okay. Yeah. And for my professional use case, I use Claude Code for 95% of my output at work.
I try to review the whole code, and I do not submit a pull request if I do not understand part of the code, because at this point in time that sounds a little unprofessional and rude to my co-workers. It depends a lot on what it is and what the organization is. Yeah. I try to be very clear that in a safety-critical system or a regulated industry, we are not currently at the point where I would find it reasonable to not be reading the code. Right. In a business situation where you are using AI to contribute to a project where everyone else is hand-coding, it is the same as if you hired an intern, told the intern to write the code, and sent the pull request. Right. And ultimately you are responsible for the thing you're submitting.
And what that means in different organizations is different. And so it is going to be project by project. Take Superpowers: I don't just read the code.
I look at every character, because skills are English text and, the way Claude Code works, I think as of last Friday we have close to 500,000 installs inside Claude Code according to Anthropic, and most of those auto-update. I am very cautious about what we ship in Superpowers and absolutely read every word of it.
>> I mean, it's not just the source code.
That's the product that gets shipped to the end users. Right. Yeah.
But for code, what matters to me is outcomes. And you need to be able to prove to yourself and possibly others that the code does what it is supposed to and doesn't do things it's not supposed to.
And simply reading the source code is not always the solution for that anyway.
And so testing and verification are super important. And what that looks like going forward from here is a very long conversation on its own.
Right.
>> Because especially once there are agents involved, old-school tests are not going to be enough.
But, you know, what matters is outcomes.
And >> Yeah. Yeah.
Early in the days of coding agents, before having Superpowers, I was struggling to get the coding agent to do test-driven development. Even if I wrote out the definition of TDD and red-green-refactor, it didn't follow it and it didn't do what I meant. Mhm. Claude Code says something like, "Okay, following your rule in CLAUDE.md, I'm going to start with a failing test." I think it was a Go project, and it started with a test that literally says fail.
>> [laughter] >> Because yeah, that's a failing test.
Yep. That's not what I meant. You have to write an actual expectation, which will fail before the actual code exists.
But the agent didn't get it. I tried a few iterations to get it right.
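The distinction described here can be sketched in a few lines. This is only an illustration in Python (the project in the story was Go), and every name in it, including `slugify_unimplemented`, `slugify_minimal`, and the two checks, is hypothetical. The first check fails no matter what the code does, so it says nothing about behavior; the second states a real expectation that is red before the code exists and green once just enough code is written.

```python
from typing import Callable

Slugify = Callable[[str], str]


def slugify_unimplemented(title: str) -> str:
    """Hypothetical function under test before any code is written."""
    raise NotImplementedError


def slugify_minimal(title: str) -> str:
    """Just enough code to satisfy the expectation below."""
    return "-".join(title.lower().split())


def check_always_fails(slugify: Slugify) -> None:
    # The kind of test the agent wrote: it fails unconditionally.
    raise AssertionError("fail")


def check_real_expectation(slugify: Slugify) -> None:
    # The kind of test that was meant: a concrete claim about behavior.
    assert slugify("Hello World") == "hello-world"


def outcome(check: Callable[[Slugify], None], slugify: Slugify) -> str:
    """Run one check against one implementation and report red or green."""
    try:
        check(slugify)
    except (AssertionError, NotImplementedError):
        return "red"
    return "green"


if __name__ == "__main__":
    for check in (check_always_fails, check_real_expectation):
        before = outcome(check, slugify_unimplemented)
        after = outcome(check, slugify_minimal)
        print(f"{check.__name__}: {before} -> {after}")
```

Only the second check can ever turn green, which is what makes it useful as the "red" step of red-green-refactor.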
I had the same experience as you when I started using Claude Code, and I was quite frustrated that I didn't know in the beginning how to get Claude to do the right thing. I went looking for people who had done prompt engineering, like real engineering of prompts, and I couldn't find anything.
And so somewhere in my GitHub there is a repo, I think it's called something like "claude docs setup", which is not really very easy to understand, but if you look inside, you'll find my early experiments. What I did is I wrote a prompt, and the basic prompt was, "Let's make a React to-do list. Use local storage."
And I wrote a tiny little harness wrapped around Claude Code that would pipe that text in and would just let it run in dangerously-skip-permissions mode.
And I saved the transcript and the thing it built and the prompt.
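A harness of that shape might look roughly like the following. This is a minimal sketch, not the harness from the conversation: the function name `run_experiment`, the directory layout, and the timeout are all assumptions, and the `claude -p --dangerously-skip-permissions` invocation in the trailing comment should be checked against the installed CLI's help output before relying on it.

```python
import subprocess
from pathlib import Path


def run_experiment(prompt: str, command: list[str], run_dir: Path,
                   timeout_seconds: float = 3600.0) -> Path:
    """Pipe `prompt` to `command` on stdin; keep prompt, transcript, and workspace."""
    workspace = run_dir / "workspace"  # whatever the agent builds lands here
    workspace.mkdir(parents=True, exist_ok=True)
    (run_dir / "prompt.txt").write_text(prompt, encoding="utf-8")

    result = subprocess.run(
        command,
        input=prompt,          # the prompt is piped in on stdin
        capture_output=True,
        text=True,
        cwd=workspace,         # run the agent inside the per-run workspace
        timeout=timeout_seconds,
    )

    transcript = run_dir / "transcript.txt"
    transcript.write_text(result.stdout + result.stderr, encoding="utf-8")
    return transcript


# Assumed invocation for the real agent (verify the flags locally first):
# run_experiment(
#     "Let's make a React to-do list. Use local storage.",
#     ["claude", "-p", "--dangerously-skip-permissions"],
#     Path("runs/baseline"),
# )
```

Keeping one directory per run makes it possible to compare transcripts and outputs side by side after changing the instructions file.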
And then I edited my CLAUDE.md to start to figure out how to get it to do proper TDD. This was on API credits, before subscriptions existed. The first time I ran it, it cost like 25 cents. It took 2 minutes and the to-do list app looked really pretty, but if you reloaded the page, all the to-dos were gone.
And over the course of a couple of weeks, I got it to the point where, with the same prompt and a better CLAUDE.md, it was a five-phase project that cost $25 and took over 20 minutes.
And it did strict red-green TDD from the very beginning, to the point of going too far. So the first thing it would do is write a failing test that proved that there was no package.json.
Right. Um >> [laughter] >> Everything.
But that was sort of my introduction to how to teach Claude to do things right. That's where I learned about the idea that if you say something is a hard gate rather than a rule, it's a thing that has to become true or it can't continue. Mhm. That is now a pretty standard prompting technique, but it was very instructive to essentially do these sort of mini evals and see what a one-line change in the CLAUDE.md would do. One of the best single-word changes was switching the first line from saying "you're a senior engineer" to saying "you're a pragmatic senior engineer."
>> [laughter] >> Adding YAGNI, adding DRY, both of those materially changed how Claude behaved, because there's so much history baked into those terms that Claude knew what to do.
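One way to set up that kind of mini eval is to generate copies of the instructions file that differ in exactly one line, then run the same prompt against each copy and compare the results. The sketch below shows only the variant-generation step; the function name `one_line_variants` and the second line of `BASE` are made up for illustration, while the two first-line candidates are the ones quoted in the conversation.

```python
def one_line_variants(base: str, line_number: int,
                      candidates: list[str]) -> dict[str, str]:
    """Return {candidate: full text} where only `line_number` (1-based) differs."""
    lines = base.splitlines()
    if not 1 <= line_number <= len(lines):
        raise ValueError(f"line_number must be between 1 and {len(lines)}")

    variants: dict[str, str] = {}
    for candidate in candidates:
        edited = list(lines)               # copy so every variant starts from base
        edited[line_number - 1] = candidate
        variants[candidate] = "\n".join(edited) + "\n"
    return variants


# Hypothetical instructions file; only the first line comes from the conversation.
BASE = "You're a senior engineer.\nFollow test-driven development.\n"

VARIANTS = one_line_variants(
    BASE,
    1,
    ["You're a senior engineer.", "You're a pragmatic senior engineer."],
)
```

Because everything except the chosen line is held constant, any difference between runs can be attributed to that one line, within the limits of run-to-run variation.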
Yeah.
So it's going back to the waterfall model, but it still uses a lot of historically proven practices like TDD, YAGNI, and DRY.
I mean, it turns out these are all things that you actually want. Right.
>> You want a failing test to be written and then only enough code to make the test pass. You don't want to have code that does the same thing in two places.
>> Yeah. You don't want to do premature engineering.
And it is absolutely the case that Superpowers is sort of optimized for individual or small-team work, but I think it is also the case that agentic dev is best done as individuals or small teams, because one person can move so fast and do so much work that it's harder and harder to keep a large team on the same page and contributing together.
Yeah.
The cool thing is that that means that individuals and small teams now have an advantage that they never had before.
Right. I have joked about this: Amazon had this idea of a two-pizza team. If you can't feed your team with two pizzas, the team is too big. And that's the right size to get work done.
And I think that we're kind of approaching the era of the two-pizza carrot suit.
Because small teams of humans can stay aligned on a vision and a plan in a way that a large organization can't. Right.
>> And with a small team who have a lot of, essentially, agentic workers, they're able to do things that used to take 10 times as many people, 50 times as many people.
Yeah, I get a feeling that humans are becoming the bottleneck. Reviews are becoming the bottleneck.
And a one- or two-member team with the help of a coding agent's agentic loop can outperform other teams who require code reviews and approvals for everything.
Yeah. Yeah. I mean, it's always been true that big companies get slower. There are often reasons for needing those reviews, but the CEO of Tailscale, Avery, has this post about how every layer of review that you need to add doubles the amount of time things take. Mhm.
If every line of code gets read by a human, it is going to slow you down.
Sometimes you want to be slowed down.
Sometimes there are good reasons to slow down.
But also, I remember even back in the Perl days, if you wanted to get a patch into a big project and you sent a 10-line patch, somebody would reply with 10 problems in your 10-line patch.
If you sent a 10,000-line patch, somebody would reply, "Looks good to me. Merged." Right.
And human review of code is not the solution to a lot of our problems. It's a thing we do and have always done because we are unsure and we're trying to take care. Right.
>> But human review doesn't catch the bugs.
In the case of getting a patch into the Perl language, I think the best thing you can do is to demonstrate how Perl fails at one thing.
Yeah.
>> With a failing test, and then nerd-snipe someone into writing a patch for it. Nerd sniping is the best way. I remember many, many years ago... So, Perl historically has this debugger that's been around forever.
It is a Perl 5 script, or rather a Perl 5 module, but it is so old that it is a .pl file.
>> perl5db.pl. Yeah. It has my favorite comment ever in a piece of software.
That comment was, if I remember correctly, "increment i."
And the comment was a lie. That wasn't what the line of code was doing.
But I remember, many years ago, we were at a YAPC::Asia with our friend Léon Brocard and we were talking about the debugger. Léon had had a beer and I said something to him like, "I think it would be impossible to make a new Perl debugger. There's no way you could do it."
He made a new debugger for Perl that week. Yep, I remember that. Yeah. It's Devel::ebug. Devel::ebug. It even had time travel. It was actually a very nice debugger.
>> [laughter] >> Right.
It's been too long since I've written Perl. Yeah, it's been too long. The last time I wrote any lines of code was October. I wrote three lines of shell script in October.
Okay.
Yeah, it's been 6 months now. Yeah.
But I've had the most prolific 6 months of my career. You know, it's a weird week when I don't ship a new product.
>> [laughter] >> Right.
>> [clears throat] >> Wow.
What a time to be alive. Yeah, it's fun. I know venture capitalists who have taken leaves of absence from their VC firms so that they can write software again, because it's fun. Yeah. And all these people who were too senior to program anymore are super successful with these new tools, because the important skill isn't and wasn't "do you know the syntax?" It's "can you explain the idea?" Yeah.
That's always been the case, except in reality you had to type all the things anyway.
You either had to type all the things or you had to hire employees. Right. Which has a lot of layers.
Yeah. And it's a lot easier to have a bad idea at 2:00 in the morning and tell your employees Claude and Codex, "Hey, go try to do this thing for me." Mhm.
Your human employees don't like it as much when you call them at 2:00 in the morning and say, "Hey, I want you to try this thing for me."
>> Yeah.
Exactly.
Well, as you see, it's been a pretty interesting and fun conversation.
Yeah, it's been good to chat.
Yeah.
If someone wants to contribute to Superpowers, or even sponsor it, there's a GitHub sponsorship, right?
There is GitHub sponsorship. Yeah, I am very happy to have some folks who are sponsoring me, but it is absolutely not a requirement for anybody. For better or worse, it's a thing that I'm happy to share. But yeah, there's GitHub Sponsors. There are a bunch of open issues.
It's worth talking things through, and if there's stuff that people want to contribute, I'm very happy to have contributors.
That sounds good.
All right, Jesse. Thanks for your time.
Thanks so much for having me. Yeah.
All right.