A safe AI-driven software development workflow uses a three-layer system: the operator (user) defines the what and the why, the senior engineer (chat model) handles reconnaissance and planning, and the coder (implementation agent) executes the changes. Deterministic hooks and sub-agents act as circuit breakers that prevent irreversible damage, secret leaks, and unauthorized merges. This lets non-coders ship complex software by describing goals in plain English while the AI handles the implementation details.
Deep Dive
This Workflow Takes Vibe Coding From A Meme To A Real Software Builder That Can Make Millions
Okay, so I made some slides for everybody here that walk you through visually how everything works in terms of the workflow. I'll touch back on it real quick: I'll read the post for everybody and then we'll go through the slides. So there's two standalone kits in there, and they include all the safety rails, all the stuff we're about to go into. I call it a smarter way to run the AI engineer, because I've dealt with so much crap on the back end, and I actually blew up my own app in the process by accident just because I got caught up in the hype. So the short version is: you talk to your engineer in plain English the same exact way you do now. It writes the instructions, and then your AI coder does all the work.
So, we'll just go through it piece by piece, right? This is how a non-coder will ship real software by directing their AI agent. This applies to everybody, not just on the call but on YouTube, everybody that really has zero idea how this works.
It's pretty much going to give you a ground-level view. So skills: consider them recipes. Like I was mentioning earlier in the car, your sub-agents are like your consultants, and your hooks are circuit breakers. That's a good way to think about it for people that have no idea what those are. And then we go into this: what problem does this solve? Right? So, we all know the stories about people that let their [ __ ] AI agents loose on their code, and then something gets pushed, committed, and shipped, and you wake up with half your app broken. That happened to me. Or maybe even worse, you could leak some of your secret keys. Guys are pushing to their GitHub, their Git projects, and they have secret keys in there. They have a ton of credentials that are just wide open, and people could literally go in there, see everything, and take over all the sensitive parts of your project. And many people don't even know that this could happen if they're just getting started. It's shocking to learn that you can actually expose yourself.
And so for people that really don't know what they're doing, this is going to be a lifesaver, because it's automated. There's a machine layer in between you and the mistakes you can make. Okay? So it's like fencing off the irreversible damage. Your agent cannot ever delete your app. It cannot ever force a push, a merge, or a commit. It has to constantly check itself. There's so many little layers underneath. So this one is the whole thing in one breath, right? You describe what you want and how you'll know it works, then the agent owns the how, and the hard rails stop the irreversible. So you talk to your engineer in plain English, your chat side. It'll write the precise instructions for you. Then the agent builds on a side branch, never on main. Then it has to prove that it works, and then it stops right there. There's a machine gate that stands in between the agent and your main branch, so it can never just auto-load onto your actual project. A beginner can't break anything. And advanced folks can run big autonomous batches, like you saw in some of my other ones when I screen-capped the work time and it was like 12 hours long. So you can build super complex pages with multiple pipelines running through them, like OAuth, capturing payments, and a ton of other stuff like iframes that have to render live when you press a button.
Whatever it is that you're coding, whether it's something super advanced or your idea is very complex, the coder owns the how.
You just describe your idea, everything else is invoked automatically, and you walk away. And you walk away knowing that nothing can actually break the current progress you've made on your app, which is big, because some people want to move extremely fast, which means potentially breaking things. You move extremely fast by making huge changes to huge swaths of your code. So the problem we're solving is: how do you do that, but minimize the blast radius in case something goes wrong? Okay. Then we have the mental model that you need to learn, either for this workflow or even for the older ones that we have in the chat that still work for people today. You have three roles, and that split never changes. This is the best way I've found to do this. I don't know if there's any other way that's better than this. And there might be a fourth role we could add later, which is kind of an adversarial role that does checks in an adversarial nature, but that's something I'm working on for later. Right now, you're called the operator. That's you. You behave like the CEO, right? You own the what and the why: what you want out of your app, and why, and how [ __ ] needs to work. You set the risk. There's a tier system; we'll get into that later. You can say push or hold. And it runs only with git push and click testing. So what does that mean?
If you don't know, there's a terminal command, very simple, super basic, called git push. That's literally the only command you need to know in this entire workflow. Literally git push. The click testing is you literally opening your app and just clicking on things, and that's it. There is no other terminal command. There is no other piece of code you need to know. There's nothing. It runs all-encompassing by itself. All you type in the terminal, once in a while, is those words there: git and push. Enter. Done. That simple. Okay.
The second layer in this is your senior engineer. That's what we call it. That's your chat model, the chatbot that you're used to talking to normally on a day-to-day basis. So, he does the recon.
He does reconnaissance on your codebase. He scopes out the project itself, writes the instructions for the prompt, and then he gives the verdicts when the reports come back, right? And he plans and reviews with you as well.
He's never the one executing. You separate the one reviewing from the one coding, and that gives us a little validation layer. So chat side is always reviewing what coding side does, because coding side will self-validate and pass itself arbitrarily at times: hallucinations, or worse, perhaps just outright lying. The coder does that, and I've caught mine doing it. Fabricating results is actually a problem. When your code gets complex enough and your test harnesses get big enough or very complex, it will begin to fabricate and lie on purpose just to pass the test, or it writes its own test that it then passes itself on. They tend to do that. People have no idea, either. I didn't know until I caught mine doing the same thing. So the third layer is the coder.
It's the one that implements and proves the real build, right? It owns the how. So you say, I want my page to do this, and the how part is not your concern. You set it loose, and its job is to just figure out how to make the what and the why come to life. Okay? And it never makes any product calls. It never tells you, hey, we should do this, or how about we do it this way instead, or what do you think about this addition? Never. That is only discussed with the senior engineer. That's where you scope out exactly what you want. If something feels wrong, ask yourself which role failed. If you had a vague idea, you've got to rescope. If you have sloppy instructions, you just reauthor them. If the agent built a bad instruction faithfully, you just delete the branch and then you reauthor.
It does not affect your real code, ever.
Almost nothing is unrecoverable, and this is by design. So it's quite literally hard to [ __ ] this up, for lack of a better word. Okay? And you only ever need to run two things on a computer, like I said earlier. To ship your stuff, it's just a literal git push. That's it. And then click around your app. You guys, this is about as simple as it gets. Everything else, the files, the tests, the branches, the merging, all the diagnostics, that's what the agent has to do. If your engineer starts handing you commands to run, that's the smell that something's off.
So when you're in a long conversation, the context rot starts to get unbearable at times, and it may forget directions. It may forget project instructions. It won't reload its context properly, and then it starts telling you to do things that you already have rules against. So it might tell you to run this command, or to go into the command line and do this. If it starts forgetting things and it starts making you do things you know you're not supposed to be doing, then you already know you're dealing with some context rot and it needs to reload its memory. Either you start a completely new chat with a little handoff prompt, or you're going to have to ask it to reload all the context before authoring anything or answering you with even one answer, period. And I've found that fresh chats do better than a super long chat that refreshed its context. Okay, so the next part is the heartbeat. So what does this loop look like? When you're prompting, it starts like this. The very first thing it'll ask for is recon, reconnaissance.
It drafts a prompt where it asks your engine, Codex or Claude Code, to go crawl your codebase. Then it'll name the scope, right? So let's say I want to add OAuth for Stripe payments and have the whole pipeline firing end to end, where I can capture payments from a client. What it'll do next is name the exact scope completely once the recon comes in. It'll be targeted to exactly what you need. It'll invoke Context7 to grab the latest details on how everything should be wired up properly. And then it asks you to pick a risk tier, which is a tier one through three. Tier one is small surgical changes. You don't need all these gates and too many tests slowing you down if you're just touching a specific part of the code, making a very small change, or doing some kind of document update, something that doesn't require all of these gates and guards.
It just runs through everything completely. It merges the new branch, cleans up the trees left behind, and you're done. Tier two and three is where it auto-stops and asks you if everything looks good, and then it will do a merge. And by ask you, I mean it gives you a report. You just send the report to the chat engineer. So all you're doing is moving reports from the coder to the engineer. The engineer reads the report, comes back, and tells you if things are good or not. So the engineer will then write this initial prompt after you pick your risk tier.
Say tier three. You're dealing with very sensitive stuff: lots of secret credentials, variables, stuff like that.
You don't want them leaking. Then the agent takes that prompt, builds it, stops, and gives you the report. The report comes at the end of every prompt that you fire off into your engine. And it tells you right there, with a green, yellow, or red circle, what the verdict is. Green is good. We're good to go for a merge. It's pushed and it's ready. There's no regressions on the tests, nothing.
Yellow means some issue was detected, and it's a hold. And red means we have some kind of failure, some regression, and it's also an immediate stop. You look at that. If it's a green, you approve the merge. You send that over to your chat side engineer and he authors the merge prompt. You fire off the merge prompt onto coding side, and then you get a status refresh. You get a little status markdown file that's constantly appended with the latest updates. All the work you're doing is appended in this status refresh. And this is a file that lives in your repo folder, which your engine always has access to. So it always sees the exact status of your project without bloating the project files on your chat side, because we want to keep that as lean as we can. The final step in the loop is cleanup. Cleanup is just getting rid of the branches that were created and the work trees, cleaning up after itself, so you don't have a bunch of branches just constantly sitting there open. You want to open a branch off your main in a work tree, something isolated, depending on the code you're touching.
And you want that to be cleaned up and merged. You don't want those just sitting there open, so that you end up with a ton of branches off of your main that are not active. So that's how that works. Then this is the map for you guys. The whole system is four layers. You have your instruction, which is how work gets specified, and that's a skill.
Then you have memory, safety, and merge.
So essentially it's a four-part system. You give the instruction, and the system reads the repo, so it has good memory.
Then there's the hooks and the sub-agents that make sure safety is always checked off. And your final layer is the merge that I described, so your work reaches the main branch safely with no problems. The newest part of this is, of course, the hooks, the sub-agents, and the skills, and those live in the safety and instruction sections. That's our deep dive next.
That's the part that most people don't actually know how to use or have never even used. They don't exist in the previous workflows at all, because even I didn't use them in the previous workflows. So, two questions that tell them apart, right? Who summons it, and what can it do? Does it fire itself, or do you call it? And which one can block, which one advises, and which one standardizes? On the automatic side, the ones that fire themselves are hooks, right?
They can absolutely block a merge. The hooks are guards; I call them guards also. They're set up in a way to block, or to force reports to be truthful and detailed. I'm not sure if I have this example pulled up, but I asked my Claude for a report on what happened in a very nonchalant way, and it tried to answer in a few short sentences, and then the hook forced it to actually cite and print the full report itself. So hooks can be programmed to do a bunch of things. The way we have it set up here is more like a circuit breaker for destructive commands and unauthorized merges, things of that nature. We also have a hook, I guess you could say, that runs automatically and does a Gitleaks scan. Gitleaks is what it is.
So, Gitleaks, we have that running automatically. Once a week it parses your GitHub, all your PRs, your commits, and it also double-checks live while you're merging. So once a week it checks your codebase, your project on GitHub, and makes sure that you don't print any credentials, any secrets, any ENV stuff, anything at all that should not be in public. It works live while you're merging, as part of the merge gate, but it's also on repeat once a week.
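To make the idea of a secret check concrete, here is a minimal sketch in Python. It is not Gitleaks and does not use its rule set: the three patterns and the names `SECRET_RULES`, `find_secrets`, and `merge_blocked_by_secrets` are assumptions made up for illustration, and the real tool covers far more credential formats.

```python
import re

# Hypothetical, deliberately tiny rule set. Gitleaks ships many more rules;
# these three only illustrate the shape of a pattern-based scan.
SECRET_RULES = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key-header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "generic-assignment": re.compile(
        r"(?i)(?:api[_-]?key|secret|token|password)[A-Za-z0-9_]*"
        r"\s*[:=]\s*['\"]?[A-Za-z0-9_\-]{16,}"
    ),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line that matches a rule."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in SECRET_RULES.items():
            if pattern.search(line):
                hits.append((line_number, rule_name))
    return hits


def merge_blocked_by_secrets(diff_text: str) -> bool:
    """A merge gate would refuse to proceed if any rule fires."""
    return bool(find_secrets(diff_text))
```

The same function could be pointed at a diff during a merge and at the whole repository on a weekly schedule, which mirrors the two places the check is said to run.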
Okay. And then we have the sub-agents, and they can only advise, as explained earlier in the call. A sub-agent is basically something that is summoned by the main agent you're talking and coding with, and the main agent hands them a bunch of different tasks.
We have ours set to read-only, because this is actually how I blew up my project, which is why I haven't been streaming: it completely broke. I used Claude's new ultra code feature, which spawns sub-agents, and unfortunately I had a massive prompt that was dealing with a ton of code on the back end. And the sub-agents weren't just writing reports and doing reconnaissance. The sub-agents were actually writing code and grading themselves, and then combining all of that into one report. So what I got was some cobbled-together [ __ ] that completely broke my app. They were each making decisions independently of one another. And the validator side, I don't even know how it works on ultra code, but the way they checked themselves, and this could have been me authoring the prompt incorrectly, but I doubt it because I had the chat side do it, turned out to be a disaster. I lost like 4 days trying to fix it. And that's how we ended up getting into: okay, how do we make sure that never happens again? So skills: you can look at them like recipes. Like I said, it's a standardized task, right?
So imagine you do something like this: in Codex, it's the dollar sign, I believe.
So it's a dollar sign and then the skill name, whatever it is. In Claude Code, I forget what it is to invoke a skill.
You could just tell it to use whatever skill. You can also tell it to create a skill. So for example, here's how you create a skill in Claude. Let's say you're designing five pages in the menu system for your app, right? Settings page, legal page, about us page, [ __ ] like that. And you're trying to design the way it looks. And you came up with a process of using a reference image of what you want it to look like, and then a loop that checks against the reference image: you must create all the elements you see here exactly how you see them. Make sure the images on the page match. If you have something showing like we have here, a little bot emoji, a little notepad, whatever little SVGs you have there, make sure you reproduce them exactly. Now, you're going to have to do that little workflow five times for those five pages, over and over again. Or you could have that highly repetitive task turned into a skill. So with the initial workflow you had yourself, your little mini workflow to get your page up and running, you could just say, "Hey, Claude, turn this into a skill, because I'm going to use it for my next four pages." And so he makes it into a skill. He takes all the little iterating you did, all the little problems you guys ran into, all the little extra prompts on top for fixing whatever needed fixing, and then it knows exactly what to look for and what to do, and it does it exactly like that. That's why it's called a skill, or you can consider it a recipe, because it follows the exact same directions every single time. So you could invoke a skill to build the next four pages really quickly, with a simple dollar sign command, calling it something like polish pass, and that invokes your skill. Let's run a polish pass for this page, and it starts to automate what it's doing there. It's really cool when you start to get into this.
So for the skills, like I said, they're recipes. It's a saved, named procedure that the agent follows step by step. You say the one word and you get the same recipe. It's the same quality, guys, every single time.
For example, in this project, one of ours is the session recon. And all you have to do is invoke the session recon. Literally those words: session recon, or session. They have them for sure. And we have a bunch of them here. So it runs session recon. It runs targeted recon.
We have cross-branch recon. We have locked in variant audit, author mega prompt. So even your mega prompt is already turned into a skill: you just literally say author mega prompt, and it checks the skill and sees, okay, I need to make this exactly like this. There's five pieces to this prompt: there's the recon, there's the scope, there's the test, there's the gate, there's the block. And it just knows to do this every single time. You have your pre-handoff audit, closing discipline, and then your review commit, and that's it. And you can make a new one when you catch yourself giving the same routine twice or more, like a release build recipe or a dependency audit, or, like I was saying, polish passes or page building. You can have a skill for anything that is repeatable or routine that you're constantly asking your coding side to do. And then the sub-agents: these are what you would call your little consultants, right?
They're like the specialist robots. I told you, the main agent fires these off as needed. You have the security bot, which checks that there's no leaks of your credentials. You have the code mapper, which goes into your codebase double-checking everything. The state one checks: hey, are we current on the branch? Is everything pushed already? Where are we in the state within the codebase? You have the visual one, which validates: hey, does this actually look how it's supposed to? You have the merge gate, which makes sure things are merging properly. And then on the far right, you might not see it here because of my head, there's one called docs, and that's the one tasked with producing all the documents properly.
So, the main agent essentially fans out all the work. The sub-agents only advise. They never get to merge or change any files. The system's just not good enough for that yet. And it's just one layer deep, so sub-agents can't spawn sub-agents and so on. It'd be a waste of tokens. Okay. And then you have your hooks. The hooks are like the circuit breakers. These are, like I said earlier, deterministic scripts.
These are just pure code. They have nothing to do with your AI. They fire automatically. They run every single time. Every single time. And they cannot be talked out of it. So you can literally say to your chat side engineer, I am overriding everything, I want you to merge this, I don't care. Even if chat side authors the prompt, and it will, because you overrode it and it's made to listen to you, your hook will trip. The circuit breaker turns on and blocks the coding side from merging, because it sees that there's an issue with the merge. So there is no way around this where you can [ __ ] yourself over in that sense and nudge the prompt to break the rules. And that's for noobs. Incredible.
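A deterministic guard of this kind can be as small as a pattern check that runs before each shell command the agent tries. The sketch below is an assumption about the general shape, not the kit's actual script: the deny-list, the names `BLOCKED`, `PUSH_TO_MAIN`, and `check_command`, and the `main_push_approved` flag are all hypothetical.

```python
import re

# Hypothetical deny-list; a real hook would be tuned to the project.
BLOCKED = {
    "recursive force delete": re.compile(r"\brm\s+-(?=\w*[rR])(?=\w*f)\w+"),
    "force push": re.compile(r"\bgit\s+push\b.*(?:--force|\s-f\b)"),
    "force branch delete": re.compile(r"\bgit\s+branch\b.*\s-D\b"),
}
PUSH_TO_MAIN = re.compile(r"\bgit\s+push\b.*\b(?:main|master)\b")


def check_command(command: str, main_push_approved: bool = False) -> list[str]:
    """Return the reasons a command is blocked; an empty list means it may run."""
    reasons = [name for name, pattern in BLOCKED.items() if pattern.search(command)]
    # Pushing to the main branch needs an explicit operator approval.
    if PUSH_TO_MAIN.search(command) and not main_push_approved:
        reasons.append("push to main without approval")
    return reasons
```

Because the check is plain pattern matching, no prompt wording can change its answer. One limit of this sketch: a bare `git push` issued while already on the main branch names no branch, so a real hook would also need to inspect the current branch.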
So, you submit an instruction, right?
Like a nudge, and you ask for, say, a fix without doing recon first, or a merge that doesn't have an exact target. You get a warning, like, "Hey, maybe we shouldn't do this, right?"
Then, right before the command runs, you're going to get a stop on anything irreversible. So for example, the commands are running and it's printing your report, and as it's printing the report there's a secret in there: the full API key is visible, right, or a token or some other credential. Boom, it stops. Immediate hard block. There will be no merging, no pushing, nothing, nowhere. rm -rf style deletes, like I said: it can't nuke your codebase. Force deleting your work: can't do that. It cannot push to your main branch without permission. You have to specifically give it permission. That might piss some people off, because they just want everything automated in one go, but I'm telling you, this gives you a large sandbox to play in so that you can move much faster in a much safer fashion. Okay? And then turning off security checks: you can't do that. Even if you'd like to turn off a security check like Gitleaks, it stops your merge, and it will always stop your merge. That's super important, right? And then there's the other block, for when it claims it's done. This is what I was talking about earlier. No weak finishes, okay? It blocks any report that simply says done, or a vague variation of tests pass, with no proof lines or direct citations.
It forces the agent to keep going. Quite literally, because it likes to hallucinate and tell you it passed the test. Sometimes it will just print passed the test.
This hook will force it to go back and cite, line by line, exactly what passed and how. This little circuit breaker setup is what's going to make everybody's autonomy completely safe, or safe for the most part. There's always ways to [ __ ] this up, but for the most part, safe.
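A no-weak-finishes check could look something like the sketch below. The report format is an assumption: here every claim is expected to be backed by at least one line starting with `PROOF:`, and the names `PROOF_LINE`, `is_weak_finish`, and `review_report` are hypothetical, not taken from the kit.

```python
import re

# Hypothetical convention: evidence lines start with "PROOF:" followed by a citation.
PROOF_LINE = re.compile(r"^\s*PROOF:\s*\S+", re.MULTILINE)


def is_weak_finish(report: str, min_proof_lines: int = 1) -> bool:
    """A report is weak when it carries fewer proof lines than required."""
    return len(PROOF_LINE.findall(report)) < min_proof_lines


def review_report(report: str) -> str:
    """Block weak reports so the agent has to go back and cite evidence."""
    if is_weak_finish(report):
        return "BLOCK: cite what passed and how, line by line"
    return "OK"
```

A bare "Done." or "Tests passed." is rejected here, while a report that cites a specific test or log line goes through. The check only proves that evidence was cited, not that it is true, which is why the reports still go to the chat side for review.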
It's like a seat belt and airbag that don't depend on the driver. Okay, the agents never print a secret without it getting blocked. It's an incredible system. I've tested it on myself, on my app, and I have these both running now, and I can say I finally fixed all my [ __ ] and this is working. It's because of these things. I already ran a couple of prompts where the hooks caught a lot of these weak finishes, or some hard blocks came up; otherwise the agent was saying it was fine, we passed, let's merge, and it got blocked completely. So, very good. So again, to recap: skills standardize your work, sub-agents advise the main one, and the hooks enforce everything. Okay, and that's it. Those are the three parts added to this system that make you more efficient in the long term, no matter what project you're doing, right? So, for example, your sub-agent is a consultant that reviews the plans. You only want it to review plans. Never, ever get these coding. Some guys are more comfortable with that, and you can build your own sub-agents that actually help on the coding side if you want to give them that power. Like I said, this is a foundation that you can build on with things specific to your project. Okay. So, what do the instructions look like? What's the instruction layer? When you're making this prompt, what does the prompt look like? Okay.
Number one, it's recon. It has to prompt the coding side to learn about your codebase. Number two on the prompt is the goal. So it'll describe the goal, like: I want my Stripe hooks to work, all my payments captured and set in my analytics page, and something else, yada yada yada. It describes exactly the behavior that the finished thing must have. The Stripe page must work like this. Every button functions like this. The dropdowns do this. This must be connected to that. And then there's a section underneath called bounds: what files it may touch and what files it may not touch, right? What's off limits. That's scoping it properly, so you don't just give it free rein over your entire codebase, which it will know because of the recon.
So the recon grounds it in the truth. It knows exactly what it needs to touch and what it should not touch. And then you have a proof contract. That's essentially the harness, the test. How do we know the thing you're building works? Well, you have to have proof. And the proof is a two-layer system. Layer one is deterministic tests, like CDP or Playwright. Layer two is computer vision. Computer vision actually opens up your real app and, just like a person, goes and clicks on the thing, screenshots it, and sees its behavior. It's not enough that Playwright says it passed on the backend code if, when you go and click on something, nothing happens because the pipeline doesn't fire correctly end to end. So we build a system that can test and prove that it actually works. And then the process rules: that's the simple stuff, like writing your report.
Here's how you have to structure that report for you, the operator, the CEO.
And you're not allowed to merge unless you were given tier one overrides, for example, for something surgical. So the shift from the old way is that you describe the what and not the how. Its job is to figure out all that other, more complicated [ __ ]: how to actually structure the plumbing underneath the code. The agent has the live code in front of it, and it writes better code when you stop dictating all the steps, which is what our old workflow used to do. We used to tell it exactly what files to touch, what code to touch at what line, and what needed to change. And sometimes the chat side engineer has stale information, and it would contradict the coding side, and the coding side would have to make unilateral changes itself. In the very first workflow, I started noticing those unilateral changes more and more.
And this is how I evolved into this. I asked chat side, "Hey, your information is often stale. He's making unilateral changes that are actually the correct way to fix something, not your assumption of how it should be fixed. How do we fix this gap, so that you stop making such surgical assumptions that are stale, forcing the code side to make on-the-fly decisions on how to properly move forward?" And so this is how we're slowly evolving the workflow into what we have today.
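The prompt layout walked through above (recon, goal, bounds, proof contract, process rules) can be pictured as a small template. This is only a sketch of that structure: the class name `InstructionPrompt`, the field names, and the section headings are assumptions, and the real prompt is authored by the chat side in plain English rather than generated by code.

```python
from dataclasses import dataclass


@dataclass
class InstructionPrompt:
    recon: str                # what the coder must learn about the codebase first
    goal: str                 # the behavior the finished thing must have (what, not how)
    may_touch: list[str]      # files or areas in bounds
    off_limits: list[str]     # files or areas that must not change
    proof: list[str]          # how we will know it works
    process_rules: list[str]  # report structure, merge permissions

    def render(self) -> str:
        """Lay the sections out in a fixed order so every prompt has the same shape."""
        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items)

        sections = [
            ("RECON", self.recon),
            ("GOAL", self.goal),
            ("BOUNDS: MAY TOUCH", bullets(self.may_touch)),
            ("BOUNDS: OFF LIMITS", bullets(self.off_limits)),
            ("PROOF CONTRACT", bullets(self.proof)),
            ("PROCESS RULES", bullets(self.process_rules)),
        ]
        return "\n\n".join(f"## {title}\n{body}" for title, body in sections)
```

Notice there is no field for step-by-step implementation instructions. That omission is the point of the shift described here: the prompt fixes the goal, the fences, and the proof, and leaves the how to the agent that has the live code in front of it.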
Okay. And then the prove-it contract, right? Again, the hallucination [ __ ] is super bad, you guys. Tests passing is not tests passing. Okay? You have to show me that it works. So, a picture can't lie, right? The layout, the color, the copy: the screenshot is the truth. That's it. It has to click on the thing and see: does it look and behave like it's supposed to?
And at the same time, a picture can actually lie. I've had this exact scenario happen, where I have a little status pill that says connected for a platform, but the platform itself, underneath in the code, is not actually fully connected end to end. So the connection's actually dead. Your status pill will say connected. The picture would pass that as connected, but the CDP or the Playwright check underneath would have caught it. So the badge is lying.
Okay? So a connected badge, for example, proves nothing. Anything with hidden state, like logins, connections, or saved data, gets a real check against the real installed app, not just a picture, not just CDP or Playwright.
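The rule that hidden state needs its own live check can be written down as a small decision function. This is a sketch under assumptions: the `Evidence` fields and the name `proof_accepted` are hypothetical, and how each piece of evidence is actually gathered (screenshots, CDP or Playwright runs, checks against the installed app) is outside the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    screenshot_matches: bool                      # visual layer: looks right
    automated_tests_pass: bool                    # deterministic layer: tests are green
    live_app_check_passes: Optional[bool] = None  # None means the check was never run


def proof_accepted(evidence: Evidence, has_hidden_state: bool) -> bool:
    """Both layers must pass; hidden state also needs a check on the real app."""
    if not (evidence.screenshot_matches and evidence.automated_tests_pass):
        return False
    if has_hidden_state:
        # A missing live check counts as a failure, not as a pass.
        return evidence.live_app_check_passes is True
    return True
```

The lying-badge case maps to `Evidence(True, True, None)` with `has_hidden_state=True`: the screenshot shows "connected" and the tests are green, but without the live check the proof is rejected.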
So, it's always going to have to use your real app. Okay, everybody following so far? We're almost done. Then we have the merge layer. This is after all the prompting, after the building. You have these three tiers that I told you about. So you're setting the risk dial on your prompts: how risky is the code that you're touching, and how many gates do you want? If it's small and safe, like I said before, it'll run fully autonomously on its own.
And it only does that if it passes the safety check, which it almost always does on tier one. Tier one is very small stuff. Tier two, that's your default.
You're pretty much always in that state.
If tier one is not outright stated, it just defaults to tier two. The agent builds, and then it stops for your okay before merging.
Okay, tier three: that's when you're working on crazy stuff, and it forces the mandatory human review layer before anything important gets merged, ever. It will never push on a tier three, and not on a tier two either. But tier three has some extra checks underneath to make sure there's no regressions. And you pick the tier. The agent can never give itself more rope. So it cannot start at, say, tier two or tier three, and then for some reason hallucinate and give itself the power to move to tier one so it can just fully merge and delete the branch.
It can't do that. It can't upgrade itself. It can never change. If it's on tier two, it stays on tier two. Period.
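The tier rules can be sketched roughly like this. It is a minimal illustration, assuming the tier is stored as a plain value that only the operator is allowed to set; the names `Tier`, `resolve_tier`, and `apply_tier_request` are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    AUTO_MERGE = 1     # small and safe: may merge on its own after the safety check
    STOP_FOR_OK = 2    # default: build, then stop for the operator's okay
    HUMAN_REVIEW = 3   # risky work: mandatory human review plus extra checks


def resolve_tier(operator_tier: Optional[int]) -> Tier:
    """Anything the operator did not state explicitly falls back to tier two."""
    return Tier(operator_tier) if operator_tier is not None else Tier.STOP_FOR_OK


def apply_tier_request(current: Tier, requested: Tier, requested_by: str) -> Tier:
    """Only the operator may change the tier; a request from the agent is ignored."""
    if requested_by != "operator":
        return current
    return requested


def may_merge_without_human(tier: Tier, safety_check_passed: bool) -> bool:
    """Autonomous merging is only possible on tier one, and only after the safety check."""
    return tier is Tier.AUTO_MERGE and safety_check_passed
```

Because the agent's request never reaches the assignment, there is no prompt wording that moves a tier-two job to tier one.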
That's also a very important thing, because how do you stop them from granting themselves permissions? Okay. So, you build first. That's the first layer in the merge. And then shipping is a separate yes that you give as a human reviewer. Do-not-merge-by-default is probably the safest way you could go about all this, and it helps with 99% of the problems you're ever going to deal with. So that's why we keep building and merging as separate permissions within the system. The agent builds on a side branch and stops. Never on main. And then merging is its own checked step.
That's it. It's its own checked step.
Period. That requires the human intermediary. So it's done the safe, local way. You never touch a merge button on a [ __ ] web page either. You're doing this through the chat-side engineer, in the code. Period. Okay? And it'll tell you with a little um status pill what is built, what might be a stop, and when something is ready. So the merge sentinel right here, that's what it called itself, the merge sentinel, because its job is to watch the merge gate. So it looks for a clean state, the right code, and all the tests being green. If any of those is not correct, say your state is clean and the code is good but one of your tests fails, it's not going to push. It will only push when all three are satisfied.
That's why it's called the sentinel.
It's always watching. So the one dangerous step is putting the code on the main, right? So that only unlocks when everything lines up, not before.
There's no magic word where you can skip this. If one of your tests fails, if the code is not the right code, or if your state isn't clean, no matter what you do, you will not be allowed to merge, and you will be forced to fix whatever the issue is. So, in a way, your hand is being held. Why does this work? Well, think about it. It could be 2 a.m., like I said, and you're super tired. You're not paying attention. You're clearly not reading the reports. You're kind of in auto mode yourself. You have Claude running, Codex running, maybe Grok running, whatever else is running, doesn't matter. Maybe you have multiple Codexes running on multiple different parts of your code. Same with uh Claude. And once the reports are coming back, you're just pasting reports into the chat-side chats, and then you're just outputting the prompts over on the uh coding side. And you're doing this constantly, and you're going to lose track. I've lost track plenty of times.
Or you need to step away from the computer, and you have three or four different things that you're working on, and you're going to forget the order you're working on them in. You're going to forget what needs to merge first and what will be safe and not cause a collision or overwrite a part of the code the other branch is working on, or that you might have a dirty checkout tree or something like that. You can't keep track of all of this [ __ ] at night, or when you're tired, or when you're distracted or stepping away. So we put the machine layer in between, so that you could be working on 10 different things all at the same time, and the merge sentinel will make sure for you that all the tests are passing, there are no regressions going onto your main, the code is correct, and finally the state is clean, so you get clean merges throughout. And you cannot brute force that. You can't just say, you know, "I authorize you, I override you to do this." Nope. It's built in, because you might be upset one night. You might be dealing with an issue. You might be running two different things. You might have even forgotten one. And that is how you ship broken code that could screw your app up. So genuinely, this has been a huge help for me, because now I can offload that part of my thinking to this machine layer.
And then, why this beats the just chatting, you know, with an AI that we used to do. So on the very, very first workflow we had, when we were barely learning this stuff, it was super vibe mode. You just chat, knuckle drag it, you're a complete idiot. You just ask it dumb questions. You accept whatever it tells you and then you just hope everything kind of works. You had no guardrails in place. You were not watching out for your keys potentially being leaked on GitHub. Some of the work you did before was getting deleted and you had to redo it. You would have a broken main branch sometimes if you were merging improperly.
Uh things were getting pushed without your permission or when they shouldn't have gotten pushed. You forgot a conversation you started earlier and you know [ __ ] was happening. Everything might have looked done but when you launched your app it was broken and too late you already merged all that work into the real one.
You already pushed it to your head. So that was the old way, and we were completely relying on just the chat engineer to make sure it had proper context and that it could remember everything it needs to remember. Hey, make sure you don't leak stuff. Hey, make sure you're not breaking other parts of the code. Hey, make sure you're not touching other parts of this. Hey, make sure we're merging the branch properly. We don't have to deal with any of that [ __ ] anymore. The system has the three things the old way was missing. First, a disciplined way to specify work.
Second, finally, deterministic safety, which we were lacking before, that doesn't depend on the AI behaving or remembering, or on your chat-side engineer having proper context. It's all in the guards and in the sentinel. And third, a way to prove the work is actually done. Because how the [ __ ] how many times have you prompted it to do a change, it tells you the change is done, you go open your app, and shit's not working? Quite often, for people that are working the old way. And then you would have to go back and say, hey, no, wait. This button didn't work. You said you fixed this, but then this button's not working either. Why is that? No more dealing with that kind of [ __ ] It has to prove its work. So, what this buys you is more autonomy and less risk. Some people may find the sentinel and all the guards and the hooks annoying, because it does add a little bit of overhead versus the whole Wild West way: chat to the engineer, get your prompt, paste it, code's done, open your app, do your own human smoke test. Okay, that's great when you're working by yourself and you have nobody else that you might need to report to in the future. But say you're building an app and you want a breadcrumb trail, you want a way to hand off to an engineer why decisions were made for each commit, for each merge, what happened exactly, the diff, everything that was changed.
You're going to need a system like this so that one day, should you build an app that is successful, you can hire an engineer, hand it off to them, and they can go through file by file and understand how the code works, why each decision was made, and how they can maybe improve upon it. Does anybody have that system right now that's not using these workflows? I know a ton of people that are building their apps and they don't have report systems. So, if you're successful and you need to hire some guys to develop software for you and maintain your code, they're going to ask you questions. They're going to say, "Hey, um, on this part of the app, why did we do xyz?" What are you going to tell them, as the non-coder? Are you going to be able to pull up a report for them detailing exactly what was made, why it was changed, and how?
Dated, stamped, everything, with a report showing the tests that were passed, where a regression or a bug might have slipped in, anything like that. Do you have a file system ready? You have to think in terms of the long term. This is for people that are interested in building apps that are actually going to serve other people and will need to be managed by other people. That's what these systems are for. Okay. So, the power and the on-ramp: you're building your own machinery. That's what we're doing here, right? If you must guarantee something, that's where a hook is built. So for people that want to take this to the next level and build their own stuff on top of that, that's what you need to remember. The hooks are the deterministic layer that can't be bypassed. It's a rule that can never break, no matter what the AI tells you.
The sub agent, you can build them to do whatever. We have ours set up to do specialized reviews that only report back. Okay? And then if you have a repeatable part of your process, that is a skill, and you can build that routine into special skills that can be invoked in your system. Everybody's app is different and you have different routines you're constantly running.
That's where you need to make a skill out of them. That's going to save you a ton of tokens. So number one, you fire the recon prompt, read only. You get the ground truth of your code. Run the adoption checklist once, all the little rules we have, and then paste the bootstrap and you get a walkthrough setup.
That's it. So, it installs onto a project that you've already built. It's very simple. And the recon reads your setup before changing anything. So, you drop these files in there. It'll do a recon. Everything is already instructed in the file systems we have in school. And then it figures out what your app is, what it's doing. It'll ask you some more questions. And once it gets a good idea, it goes through the checklist of what you need, what guards you're missing, what sub agents you could obviously add to your project, all the skills that come already pre-installed here, the agents, the pre-installed hooks, and anything you might want to build on top. That's up to you. You just paste the bootstrap and it walks you through the setup just like all the other ones.
So, it installs onto a project you've already built, and then it recons your setup before changing anything. So, it's awesome. With the chat-side engineer, you get this done without even doing much thinking, over a coffee.
So again, what's the takeaway from all of this? Despite there being more overhead and it seemingly more complex than the old workflows, it is much simpler in other ways in terms of having more autonomy, far less risk. You can now go balls to the wall with crazy changes, huge scopes, like goals that take maybe six hours to complete. Who knows? Spawns multiple sub agents. All of this [ __ ] You can get as complex as you want to get now. And you know you have a sentinel, a guard in place that's just not going to let you [ __ ] anything up. Okay? As long as you're using this workflow.
You describe what you want and how you'll know it works. The agent owns the how, right? Skills are the recipes. Sub agents are the consultants and the hooks are the circuit breakers. Get that through your head, because that's the fastest way to understand how this stuff works. The irreversible stuff is always fenced off, guys. And the agent always has to prove the work, because it's a bot. It hallucinates when you're just talking to it normally in a chat. It will hallucinate results when you tell it to test the code. Okay? Any questions? That ends the um slideshow for our workflow upgrade. If you go here in the community, this one is in the private community. It's pinned at the top and it's the major upgrade to the workflow. You just click here, you get the link to uh either the Codex-only or the Claude-only kit. This will just describe to you essentially what we went through here, what's in each kit.
You get the start here document, and let me pull these up actually. See, so you have these documents here, right? You have the start here. Literally, you just read this, a couple of lines. You read it in this order. Start here, and it'll just tell you how a non-coder ships real software with Claude Code. This is the Claude Code one. Kept on the leash by deterministic safety rails. This is the successor to the original workflow kit that we had. And then you go through project memory. You go through the mega prompt anatomy first. Here's the scoping ritual. The five parts: the recon, the goal, how it looks, the outcome to reach, every visible thing the user should see, every interaction and result, every back-end side effect, every edge case, every integration surface where a status model exists, every state with its trigger and its meaning. A half-defined goal is half-done.
Typical, non-mandatory structure: user flow as a story; visual with named references; state model; interaction on every click, hover, focus, blur, input; backend, anything IPC fired, stores written, external calls; failure modes and recovery; named edge cases. Quite literally, it goes through everything, and we have it all here for you, right? The merge model and how that works, how the operator sets the tiers, all that stuff. The split workflow, how we authorize everything, you know, from implementation of the recon, the branch, the merge, the status refresh, and then the cleanup at the end, getting rid of all the stale trees, and everything else that we've gone through in all of the nine files. The guards and the sub agents, what they do, we have them all here.
The secret and credential printing one is right here. The .env files, all the other keys, all the tokens, it looks for everything.
Force deleting or destructive commands, right? TLS disable, all of this stuff is built in here. Secret scanning: we have a secret scanner wired in that checks everything for you, like I said, with gitleaks. And um some of these run on uh a schedule. So every night, and I added my own here, this is not in the workflow, but you can tailor yours to add this, I have my own um automated kind of hook that checks for stale trees constantly. And then what ends up happening is, if it verifies that there is no commit or code on there that is not already on main, so it's just sitting there unused, uh it will just clean all that up for me. I may forget at the end of the night to do cleanup. So every night at 3:00 a.m. I have it look over all my branches and just make sure, hey, is there anything left over here that we don't need anymore? Clean that up. And you can do little things like this for your file system, too.
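A nightly cleanup like that could look roughly like the sketch below. It is an assumption-heavy illustration and not the speaker's actual hook: it treats "nothing on the branch that is not already on main" as "listed by `git branch --merged main`", and the function names and the protected-branch set are hypothetical.

```python
import subprocess
from typing import Iterable, List

PROTECTED = ("main", "master")  # assumption: branches that must never be cleaned up


def stale_branches(merged_output: str, protected: Iterable[str] = PROTECTED) -> List[str]:
    """Parse `git branch --merged main` output and return branches that are safe to delete."""
    keep = set(protected)
    stale = []
    for line in merged_output.splitlines():
        if not line.strip():
            continue
        marker, name = line[:2].strip(), line[2:].strip()
        if marker in ("*", "+"):
            continue  # "*" is the current branch, "+" is checked out in another worktree
        if name in keep:
            continue
        stale.append(name)
    return stale


def cleanup(repo: str, dry_run: bool = True) -> List[str]:
    """List fully merged branches in `repo`, deleting them only when dry_run is False."""
    out = subprocess.run(
        ["git", "-C", repo, "branch", "--merged", "main"],
        capture_output=True, text=True, check=True,
    ).stdout
    branches = stale_branches(out)
    if not dry_run:
        for branch in branches:
            # -d (not -D) makes git itself refuse to delete anything unmerged
            subprocess.run(["git", "-C", repo, "branch", "-d", branch], check=True)
    return branches
```

Scheduling is left to whatever the machine already has; a cron entry such as `0 3 * * *` would match the 3:00 a.m. run described here. Defaulting to `dry_run=True` keeps the destructive step opt-in.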
And that should do it, boys. All right, that was a doozy. Thoughts?
>> I have a few.
>> Okay, three questions. First, let's just start with the last thing that you said because that was really interesting. The cleanup that you have. It's kind of ties into the next thing I'm going to ask you. Is that built into your workflow or is that something we should just we could easily prompt ourselves?
>> You could easily prompt. You could just tell it once you have this workflow built, just say, "Hey, I'd like to add an automated cleanup."
>> Okay.
>> On top, because you're already going to have a weekly scheduled um gitleaks, the secret printing stuff.
It's just going to check everything once a week too. It just checks your uh PRs on GitHub and then um you're ready to rock. You can have it build whatever you want, automate away whatever you want. Your sub agents, you can customize them however you want.
Literally, dude, you can have a sub agent that, for example, the next one I'm going to try to build is one that is like adversarial. I keep saying that word because >> if you I don't even I don't want a validator. We already have one. We have a validator that checks if the test passed. Actually, what that means, what we don't have yet, and I'm trying to figure out how to put in here, is someone that is adversarial in nature, as in it goes against everything you're doing, and it looks for problems. It's looking to go against whatever the stated goal is. So, if it says something like, "Hey, this needs to work end to end," its job is to simply go in there and see why this doesn't work end to end. How this is actually broke. does not work and I need to check it. It's beyond validation. It's actively like aggressive against the stated goals.
>> Yeah, that's hilarious. That's pretty much what I do for a living.
>> Yeah.
>> Uh
>> yeah, exactly. Yeah. Well, that's what I was going to tie into. Um, it sounds like for you there's not much, but I don't know, I could be wrong: how much manual testing is built into this? Because that's the part I can't figure out for my own app right now. Part of me inherently wants to get my hands dirty and actually see how it's working from my perspective, but at the same time that takes so long. I just don't want to do it if I don't have to. You know what I mean? So how is that integrated in this? Like, how much hands-on testing are you doing? And like
>> you're only left to do the bare minimum. So, like I said, it has the two-tier testing system. It uses deterministic tests. It looks on the back end, you know, if you press the button, does the code fire off properly in the back end? That's test type one, right? Is the code wired? Is the plumbing good? But that might not actually matter when you go into the actual app and you see there's a collision in the elements, or, you know, when you're minimizing or increasing the size of your page, there's a collision happening somewhere at a certain pixel size
>> or the dropdown is colliding into something else on another section. That can only be seen through pictures sometimes. So, we enable computer use on this setting if you're comfortable with it. I highly recommend it.
>> Uh, it takes control of your mouse and it literally opens up your app, like I said in the earlier slides, and it'll go and click the dropdown, and then it's going to see the collision, and then it'll say, well, the CDP test passed or the Playwright test asserted a pass, but clearly there's a collision error. And then it'll go back in the code, and you can watch it do this. It goes back in the code, it edits it, and then it goes back in and opens your app, and then it checks for the collision again, and the collision's not there. It repeats this with everything. Status pills, uh, connections. When you enter API keys and you press connect, does it actually connect? You know, does the button actually do anything? Because maybe pressing connect shows it fired on the code side, but, you know, the credential didn't get stored for some reason, or the status pill didn't change. So, it starts to find other little bugs that are tied to some other [ __ ] that you would obviously notice as a human clicking around.
Anyways, the gist of it is every prompt it forces it to do these tests, which is what adds some of the um overhead on top, like things might go slower for certain things you want to go fast on.
But what this does is when you come back and it fully finishes a test, you open your app and most of the time [ __ ] will just work. You're offloading a lot of the smoke test you have to do. A lot of your clicking, your checking, your connecting was already done in computer use backed by deterministic tests on the back end. So you might notice something.
It'll be very weird if you notice something. Like I said, very little things get past it nowadays for me. It's almost always hitting on absolutely everything.
>> Hell yeah. Okay. And so the next thing I was going to segue into, but I'm based on everything you're saying, it's I think I kind of know the answer. Seems like most of this is automated, like like walk away automated.
>> Yep.
>> Okay. Because
>> that's exactly how you want it. Yeah.
>> Yeah. And that's where I have not gotten to yet. So far, you know, I'm a little bit behind you. Uh, or maybe you were always doing this. I don't know. But right now, I'm kind of still in the babysitting, talking phase, where I'm actually there for hours and hours talking back and forth, watching
>> this end. Yeah, I can't do that anymore.
It just takes too much [ __ ] time.
>> Yeah, that's the old surgical approach where you have to give it like surgical directives, super detailed, verbose prompts. Touch this, don't touch that, do this, step one, step two, step three.
That's all gone. It's goal based now.
You just describe what you want. You go back and forth with it one time. It grounds itself in your repo every single time. You go back and forth and it'll just ask you once like 10, 20 questions.
Who knows? I had one ask me 54 questions because the page was so complex. And you just sit there and you answer every single [ __ ] question. And if you don't know, tell it you don't know what that even [ __ ] means. Ask me in plain English, whatever. Repeat yourself.
Whatever you need. Once it gets all your answers and it sees what your code looks like, it then shoots off the goal-oriented prompt with such specifics that the how part is now not your problem or the chat-side engineer's problem.
You're just describing what the page is and how it works. You figure out how to implement the thing, here's what it's supposed to do. And then it sends that off, bro.
And you go walk away, and you come back in an hour or two, and you should have a page that is functional, far more like 90% of the way there, as opposed to the old way, where you're building the first piece of the page, the skeleton, and then you've got to wire in half of the elements first, like the header, and then you have to wire in the middle section and make sure that's functioning properly. That's all done with. The full page is getting built, plus every element, plus everything else. It's going to be tested on the back end. Does all of that [ __ ] fire? What happens with the dropdown menus, all the buttons, all that [ __ ] That's why these prompts are now going to take a couple of hours, because it's running the tests for you that you normally were doing after a whole day of talking. Before, it's done with your prompt, it coded it up, and then you open the app and you go, "Fuck, man. This is broken.
You didn't do what you said or this doesn't even look right." You know, and like now you're going back and forth to fix the little thing.
>> This ends all of that [ __ ]
>> Yeah, it sounds great. And what about people like me who are in mid production? Like it's
>> fine. Yeah, it works for everybody.
>> It's fine to Okay. Okay, cool.
>> Because it doesn't destroy anything. It just builds the scaffold for the workflow. It uh looks at your repo. It then sees you have no hooks, no agents, or maybe a couple of skills. All it does is preload the hooks, the agents, and the skills, and the five-part philosophy of how to author the prompts, all that [ __ ] The skills for the recon session, the recon branch, all of that stuff. So, it all gets built and bolted onto your project, and then the next time you open a new chat, it'll tell you, you know, it directs you: drop these files in there, we're going to create this, blah blah blah. And then the next time you open up a chat, dude, and you say, "Hey, I'd like to change this part of the code," or "I'd like to implement this feature. You know, I had an idea and I want to implement this," it'll say, "Okay, cool. Let's do a recon first." And then boom, it invokes all the skills. All the hooks are active. Everything has gotten built on your thing. It starts a new tree. The prompt auto-fires everything it needs. And then you'll get a huge recon back with one little printed report. You grab the printed report, copy-paste it over to the chat engineer, and then he says, "Cool. I'm grounded in the truth now. I know what your repo looks like. Here's how we author the prompt." And then it goes into authoring that prompt exactly how it should. And then you fire that off to your [ __ ] coding side, dude. Fire and forget. Super automated. And you can sleep well at night. This [ __ ] can run all night because you know it'll never merge without your uh explicit say-so. And it can't even merge if it wanted to, because of the deterministic hooks you've got in there, the guards.
>> And even if you tell it to merge, it won't merge if something trips: a test that didn't pass, the state's not correct, you know, code got touched that shouldn't have been. It checks for everything already. So, by the time you wake up in the morning, you'll get a green dot that says, you know, merge clean, committed, pushed, you know, all tests passed, blah blah blah blah. All you do is just run the merge prompt after that.
You have a report at the bottom; send it to the engineer. The engineer is like, "Cool, we're good to merge." And then it prints out a merge prompt. And then you merge to your main, to your main head.
And then now your code is clean, merged, and you're ready to move on to the next thing, to the next thing, to the next thing. And you're always doing it in this fashion. And so even if you have something that completely blows up on you, like, hey, maybe you tried a really ambitious autonomous task where it was almost reconstituting all the code, you're doing refactoring and some other crazy [ __ ] on top, right?
and uh it might have broken some stuff.
Dude, you just delete the branch. It'll just literally say, like, hold, test failed, you know, uh blah blah blah blah. The report will tell you why it failed. And then your engineer will tell you, oh yeah, too much is broken here. We're going to have to start over. Here's what the problem was. It's better that we delete the branch, start over, and make sure that catastrophic failure in that sequence doesn't happen. And then it just rewrites the prompt and then you start over and then it gets it right.
So, you know, it's almost, I'm not going to say impossible, because nothing is impossible. Of course, there are ways you can bypass this. There's one terminal command that you can actually paste to get out of this. It's uh git push --no-verify, and you can ask it about this too if you ever run into issues.
But, uh that's kind of like your, and I didn't mention it because there's no need to mention it. Most people will not run into these issues, but sometimes the sentinel won't let you get past to merge because you have a lot of collisions going on. Something's going on with the code and uh it's like one-offs. They tend to happen. Well, you can force your tree, your trunk, forward, and then you can do git push --no-verify to basically bypass all your tests. But this can only be run by you in your own terminal. You can't tell it, hey, I'm going to prompt you to go run git push --no-verify. It won't. The merge gate stops it. So you're the only one who can, directly in the terminal, git push with no verification to move the state forward to get past one of the locks. So you technically can do that. That's just reserved for the people that know what the [ __ ] they're doing. So I didn't bring it up. I don't think most people run into it.
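One way a guard could refuse that command when it comes from the agent is sketched below. This is an illustration under stated assumptions: the hook is assumed to receive the proposed shell command as a string before it runs, and `blocked_reason` and its rule set are hypothetical, not the kit's actual guard.

```python
import shlex
from typing import List, Optional


def blocked_reason(command: str) -> Optional[str]:
    """Return why an agent-issued command is refused, or None if it may run."""
    try:
        tokens: List[str] = shlex.split(command)
    except ValueError:
        return "unparseable command"  # e.g. unbalanced quotes: refuse instead of guessing
    if not tokens:
        return None
    if tokens[0] == "git" and "push" in tokens[1:]:
        if "--no-verify" in tokens:
            return "git push --no-verify skips the pre-push hook"
        if "--force" in tokens or "-f" in tokens:
            return "force push rewrites remote history"
    if tokens[0] == "git" and tokens[1:3] == ["branch", "-D"]:
        return "force-deleting a branch"
    if tokens[0] == "rm" and any(
        t.startswith("-") and not t.startswith("--") and "r" in t and "f" in t
        for t in tokens[1:]
    ):
        return "recursive force delete"
    return None
```

This only inspects a single simple command; chained commands (for example with `&&`) and split flags such as `rm -r -f` would need extra handling in a real guard. The operator's own terminal never passes through this function, which is why the manual escape hatch still works for a human.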
>> Okay, gotcha. Okay, cool. Yeah, it sounds perfect. This is exactly what I was hoping for.
>> Nice. I hope you guys make use of it, dude. It's really lean. It runs nice. It doesn't blow context at all. The project files got reduced big time, which is what I was, you know, I've been solving little issues here and there. You know, um we had the five uh canonical files, like working rules, project instructions, the codebase snapshot, all that crazy [ __ ] Dude, they were like 2,000 lines, some of them. You would start a new chat and instantly it would eat all your context. You know, every chat was starting with 125-plus thousand tokens already spent and you were [ __ ] cooked. You're starting in a degraded state. So I'm like, okay, how do we start fresh every single time? Very few rules, just the instructions, the bare bones that we need. But then how does this chat engineer know what the hell the code looks like if I don't give it files? And uh okay, we figured out you do a repo recon. Every time, you've got to do a recon on the repo so it can ground itself in the truth. That stops like 99% of the issues with prompting that you're going to have. No more stale assumptions. That's the number one problem with prompting. Stale assumptions lead to bad prompts, which give bad instructions, which lead to bad code. If it's looking at your repo before authoring anything, every single time, there is nothing stale about your repo. So, it'll always author things against the current codebase, at the current state of your code. And so, we figured out a way to get rid of the context rot, the bad prompting, uh, having it fresh with the codebase context-wise, uh, so it's not stale or assuming anything. And we've leaned out the system and added a deterministic layer, a machine layer, so we don't make mistakes like I did at 2 in the [ __ ] morning [ __ ] around with ultra code and completely breaking all my connections on my app.
So, and I merged it too because I trusted it to work because you know what it told me?
Claude told me it passed all the tests. There were zero regressions, and I was like, "Cool, merge it." And uh I woke up the morning after authorizing the merge, launched my app to go stream, and now I can't connect to any platform whatsoever. Everything's completely [ __ ] broken. So uh yeah, one mistake you make and then you never make it again. And I think this workflow is going to help people. This is, like I said, the foundation. This is what you can take to build anything, period. And make it as sophisticated as you want, or strip out parts of it you don't like. But the skeleton of what this is allows a dude fresh off the street that's never used this [ __ ] to build anything.
>> Let's say we have spec files like 50 spec files. How how does it work with this? Would you tell it to would you point it to the spec files?
>> It creates, so you have your CLAUDE.md if you're using Claude, of course, and then you have your AGENTS.md if you're using Codex. That's your uh primary file that has authority over all of them.
It's the one that constantly gets refreshed in context. If you're using Codex, you notice it compacts quite often, because it doesn't have a million um token context like Opus yet. So, it compacts far more often. But the very first thing after a compaction is it re-grounds itself in the AGENTS.md file. This system has everything set up so that your chat-side engineer has its set of files that are linked to the files in the repo you're about to create. So you're not going to need all those files anymore. You're just going to have a set of them that basically tell it: here's the current state of the app and what we're building. Here's exactly how all the agents, the hooks, and the skills are invoked and used. Here's what the prompts look like when they're being authored. You know, any invariants you might have, all that stuff, you know, like hard gates you have, like never ever touch this part of the code, stuff like that. All of that is going to get auto-built in there with the workflow once you download it and just drop all the files in your chat engineer. You just drop them in there and go, "Hey, help me um adapt this to our project." And then it's literally just going to read all the files, and then okay, boom, because the bootstrap document is in there, it's going to start handholding you on how to create everything. First thing it should ask you is either the questions or repo first. Hey, okay, cool. I need to get uh all the data from your repo to see what your project even is, how it functions, all the code underneath. It's going to get a massive [ __ ] report. It's going to print you that first prompt. You just copy-paste that into your coder and that's it, dude. You wait for the report, paste that back, and then it's just going to start walking you through.
Okay, we got to build this. You're missing this. This is what we need to add here. Pop pop. And you build the entire system. And then you're ready to rock. It just fires automatically once everything's in place. It writes all the files, everything you need. And then you go into your repo root. Once the files are all written, you drag the, let me see what it is exactly. It's like three files.
Mhm. It's, um, I'll tell you right now.
>> Because I have, like, more than 50 spec files which describe...
>> Yeah, that's redundant big time. Yeah.
Yeah. You should really have, like, one spec file.
>> One spec file period. If you have 50 Yeah, you're completely confusing it.
You're using, uh, what are you using? Claude or Codex?
>> Claude. Um, I spent quite a while, like, building out every scene and then, uh, yeah.
>> Mhm. Yeah. So you want to have your project spec, just one file.
You want to have your CLAUDE.md, that's your master big-brain file. That's the one that gets invoked constantly when context is refreshed. That's the one that needs to have the most important parts of your entire workflow. What your app is and [ __ ] like that. And then the only other files I have on my project on the chat side is this. It is my CLAUDE.md, just a copy of it. I have Conduit, uh, that's my app.
It's the working rules, basically what it sounds like: how to structure the prompts, the tiers, the agents, all that stuff. We have the authoring playbook, is what it's called.
That's literally, from the name alone, how to author the prompts properly and the rules for authoring. Then you have the spec that you just talked about. I have one spec in there. And then, uh, for my Claude I have an extra file called, uh, Codex authoring reference, because I also use Claude to author for Codex, but Codex invokes skills and agents and hooks differently. Well, hooks are the same, but the skills are invoked differently and the sub-agents as well,
and so is computer use and all that stuff. They use the @ symbol. Claude uses a completely different way to invoke that. So it's just a language thing. All that file teaches it is like, hey, you need to speak in Codex language. That's all. So, very simple, super lean. And like I said, that's all you need. The reason you're using 50 specs is because every time you're talking to a chat, you're constantly having to refresh it with 50 [ __ ] specs so it knows what it's talking about when you ask it a question. The reason you're doing the repo reconnaissance is you can get rid of all [ __ ] spec files but one. And when you ask a question like, I want to add this feature, you just have it read your codebase. No more spec is needed.
Just read the codebase what's currently in there and then tell me how we move forward from here. And so there's no point in needing all those spec sheets anymore when you can just, hey, look at the code. Okay, I looked at the code.
Here's how it works. What do you want to do with it? Okay, you wanted this. This is what we need to bolt on. Boom. Done.
And then it gets it done. Much leaner way of working. Yeah.
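As a rough illustration of the lean chat-side file set described above, here is a minimal Python sketch that compares the Markdown files in a folder against a short expected list. The file names are hypothetical placeholders for the roles the speaker lists (master file, working rules, authoring playbook, one spec, and the optional Codex authoring reference); the actual kit may name them differently.

```python
from pathlib import Path

# Hypothetical file names standing in for the roles named in the call.
# The last one only applies if Claude is also authoring prompts for Codex.
EXPECTED = (
    "CLAUDE.md",
    "WORKING_RULES.md",
    "AUTHORING_PLAYBOOK.md",
    "SPEC.md",
    "CODEX_AUTHORING_REFERENCE.md",
)


def audit_names(present, expected=EXPECTED):
    """Return (missing, extra) file names relative to the expected lean set."""
    present = set(present)
    missing = sorted(set(expected) - present)
    extra = sorted(present - set(expected))
    return missing, extra


def audit_folder(folder):
    """Audit the top-level Markdown files found in a folder on disk."""
    return audit_names(p.name for p in Path(folder).glob("*.md"))


if __name__ == "__main__":
    missing, extra = audit_folder(".")
    print("missing:", missing)
    print("extra (candidates to remove or fold into the one spec):", extra)
```

Anything reported as extra, such as a pile of per-scene spec files, is a candidate for removal once the chat engineer reads the codebase directly.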
>> Okay. Yeah. So, I'm going to try to adapt it. Maybe I'll just try to make one master spec file to, uh, refer to each of the 50 and then somehow use your workflow here to use that master spec file. Yeah.
>> Yeah. That 50 sounds crazy, dude. Um, you want to go the other way. You want to lean out as much as possible. Look, past I think 100 lines, 150 lines, it starts to summarize and truncate whatever it reads. So it doesn't matter if you have 50 spec files, it's only going to read a little summary of what they are. It just won't hold all that context. So I know what you're probably doing is, when you're working on certain parts of your code, you're bringing out the five spec files that might be related to that part of the code and [ __ ] like that, just so it has fresh context. You can get rid of that entire process by simply doing the recon on your repo, because it'll know exactly how your [ __ ] is situated on your project, your app, directly. It won't guess anymore. It won't need the spec sheet.
There's no need for a spec sheet if it's looking at the code directly. And the repo recon is designed for it to get the answers it needs, like how things are wired and connected. Not just the lines of code, but how [ __ ] actually works. So it gets a fresh working understanding of your code every time. So in that scenario you can ask whatever question you want, build whatever feature you want, and then it'll just always look at your code directly.
So no more 50 spec sheets. It only gets the relevant information it needs so it stays really fresh sharp on context and it only will touch the part of the code it needs to touch to make the change you want it to make.
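To make the "lean it out" advice concrete, here is a small Python sketch that flags Markdown documents longer than a line limit. The 150-line threshold is only the speaker's rough figure from the call, not a documented limit of any tool, and the function names are illustrative.

```python
from pathlib import Path

# The speaker's rough figure, treated here as an assumption, not a tool limit.
LINE_LIMIT = 150


def flag_long_docs(docs, limit=LINE_LIMIT):
    """docs maps a name to its text. Return (name, line_count) pairs over the limit, longest first."""
    counts = [(name, len(text.splitlines())) for name, text in docs.items()]
    over = [(name, count) for name, count in counts if count > limit]
    return sorted(over, key=lambda item: (-item[1], item[0]))


def load_markdown(folder):
    """Read every Markdown file under a folder into a name-to-text mapping."""
    return {
        str(path): path.read_text(encoding="utf-8", errors="replace")
        for path in Path(folder).rglob("*.md")
    }


if __name__ == "__main__":
    for name, count in flag_long_docs(load_markdown(".")):
        print(f"{count:6d} lines  {name}")
```

Running it from the repo root lists the documents most likely to be summarized rather than read in full, under that assumption.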
>> Yeah. The workaround I have right now for this is basically a table of contents. There's a table of contents file where it'll read it and it'll point exactly to which lines in which file or which thing.
>> Yeah. And you can bypass that by just like, hey, give me a massive report on my codebase and here's the change I want to make. And now it understands your app. And then it understands the change you want to make. And then it just authors the prompt for you with all the rules already in place, the authoring guide, all that stuff. And so you're always getting consistent prompts tailored in the exact same way, just like the skills we're describing.
Five-part system, all with the merge gates in place, all with the testing protocols in place, all with the goals constantly in place. It's always in the same exact fashion.
The only thing that changes is the goal.
The goal and the test, how you verify the [ __ ] that you're changing, that's it.
Which is a nice way to work, dude, because then you start to understand the structure of everything and it becomes very familiar. So, you can start to spot problems.
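The fixed-structure idea can be sketched as a tiny Python template in which only the goal and the test vary. The five section names and the boilerplate wording below are assumptions made for illustration; the call does not spell out the actual five parts of the authoring guide.

```python
# Hypothetical fixed sections; the real authoring playbook may differ.
FIXED = {
    "CONTEXT": "Read the relevant code first. Do not guess how it is wired.",
    "CONSTRAINTS": "Respect every hard gate listed in the master file.",
    "MERGE GATE": "Do not merge. Stop and report back for operator approval.",
}


def build_prompt(goal, test):
    """Build a coder prompt whose structure never changes; only goal and test do."""
    if not goal.strip() or not test.strip():
        raise ValueError("both a goal and a test are required")
    sections = [
        ("CONTEXT", FIXED["CONTEXT"]),
        ("GOAL", goal.strip()),
        ("CONSTRAINTS", FIXED["CONSTRAINTS"]),
        ("TEST", test.strip()),
        ("MERGE GATE", FIXED["MERGE GATE"]),
    ]
    return "\n\n".join(f"## {name}\n{body}" for name, body in sections)


if __name__ == "__main__":
    print(build_prompt("Add CSV export to the reports page",
                       "Export a sample report and confirm the file opens"))
```

Because every prompt has the same shape, a missing test or merge gate stands out immediately, which is the familiarity the speaker is describing.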
It sounds nice thick.