Roman effectively applies the principle that verification is computationally simpler than generation to transform fragile AI prompts into a robust, deterministic engineering framework. This is a masterclass in replacing "vibes-based" automation with the rigorous logic required for true production-grade agents.
Deep Dive
The Claude Code Workflow That Runs My Business Without Me
There's a reason you can't build successful automations with Claude Code, and it's not your prompt. In this video, I'll take you through the most important step in automating with your coding agents. By the end, you'll have the tools to run automations that don't break the second you take yourself out of the loop. I'm Roman. I published a top 3% paper at NeurIPS, the largest AI conference in the world. Now, I'm focused on pushing agentic coding to its limits. Most people try to make agents reliable enough to run unsupervised by loading rules into the prompt or their CLAUDE.md: for example, no font under 22 pixels, only palette colors, or dozens of other constraints.
This quickly degrades the agent's performance, and if you stack too many rules, it eventually ignores them.
When the agent ignores your rules, you have to go back to babysitting the model, and the cycle continues. This is why I strongly suggest you do not put rules in your CLAUDE.md.
Here's the core insight that actually lets you step away, which was mathematically proven in a recent OpenAI paper, "Why Language Models Hallucinate." It turns out that generation is at least two times more difficult than classification. This is because the generator agent has to make thousands of branching decisions along its trajectory, while the verifier just has to make one.
But a separate verifier doesn't always have to be another LLM. Many people get so into AI usage that they forget they can write tests and scripts to check exactly the behavior they want without an LLM call. This means that you can stack many, many of these scripts and check a lot of different behaviors.
You can have AI build these scripts into your pipelines, and they run instantly and cheaply. So, once you start implementing this at scale and stacking these deterministic checks, you will be well on your way to taking yourself out of the loop. Here's how it works: when the generation agent finishes its task, we have what we call a verification gate, which can be triggered by a hook or some other mechanism.
In this case on the screen, it's just a bash script checking that the font size is at least 22 pixels. This is easy, deterministic, and binary.
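A check of this kind can be sketched in a few lines. The version below is written in Python rather than bash, and it is only an illustration: it assumes slide sources declare sizes in pixels as `font-size: 18px`, `fontSize: 18`, or `fontSize={18}`. The regex and the name `find_small_fonts` are hypothetical and do not come from the video.

```python
import re

MIN_FONT_PX = 22  # threshold mentioned in the video

# Hypothetical pattern: matches `font-size: 18px`, `fontSize: 18`, `fontSize={18}`.
# Assumes every size is given in pixels; other units would need their own handling.
FONT_RE = re.compile(
    r"font-?size\s*[:=]\s*\{?\s*['\"]?(\d+(?:\.\d+)?)(?:px)?",
    re.IGNORECASE,
)


def find_small_fonts(source: str, min_px: float = MIN_FONT_PX) -> list[tuple[int, float]]:
    """Return (line_number, size) for every declared font size below min_px."""
    violations = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in FONT_RE.finditer(line):
            size = float(match.group(1))
            if size < min_px:
                violations.append((lineno, size))
    return violations
```

The result is binary in the sense the transcript describes: an empty list means the gate passes, and anything else is a concrete list of lines to hand back to an agent.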
Then, step two is that if the verification gate fails, we launch another generation agent with the issue or we give it back to the same generation agent. The second agent can easily clean up the mess from the first one, and you continue with no human intervention.
True automation is just the process of adding as many of these quality verification gates and healing cycles as possible until you get a task to run end-to-end with no human intervention.
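The gate-and-heal cycle described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the workflow from the video: `generate` stands in for whatever launches the generation agent, each check is a plain function returning a list of violation messages, and the `max_rounds` cap is an added safeguard so the loop cannot run forever.

```python
from typing import Callable

# A check returns human-readable violations; an empty list means it passed.
Check = Callable[[str], list[str]]


def run_with_gates(
    generate: Callable[[str], str],  # hypothetical stand-in for the generation agent
    task: str,
    checks: list[Check],
    max_rounds: int = 3,  # assumption: cap the healing cycles
) -> tuple[str, bool]:
    """Generate, verify, and re-prompt with the violations until all checks pass."""
    prompt = task
    output = ""
    for _ in range(max_rounds):
        output = generate(prompt)
        issues = [msg for check in checks for msg in check(output)]
        if not issues:
            return output, True  # every gate passed
        # Healing cycle: hand the concrete violations back to the generator.
        bullet_list = "\n".join(f"- {issue}" for issue in issues)
        prompt = (
            f"{task}\n\nPrevious attempt:\n{output}\n\n"
            f"Fix these violations:\n{bullet_list}"
        )
    return output, False  # still failing after max_rounds
```

Returning a pass/fail flag alongside the output leaves room for a last-resort escalation, such as human review, when the healing cycles run out.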
Now, let me motivate this with an example of how internalizing this flow changed my work.
Some of you may know that all of my videos are created in software that I built, and I code all of my slides.
After I storyboard and come up with the design for the slides, I have some pipelines that build them. However, I noticed that the same issues would consistently pop up even if I put rules in. Specifically, overlapping components were a consistent pain point where I would have to manually rearrange them.
So, I implemented an LLM reviewer to verify and fix these issues, but continued to notice that the overlapping wasn't fixed.
Then, I came to the realization that this issue could be measured deterministically in a bash script, which I call a slide linter.
So, I created a 50-line bash script that checks for violations in my slides and then automatically spawns a reviewer agent if something is wrong.
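The video does not show how the slide linter measures overlap. One way to make it deterministic, assuming each component's bounding box is known, is a pairwise rectangle intersection test. The `Box` type and its field names below are hypothetical, and the sketch is in Python rather than bash.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of one slide component (hypothetical layout model)."""
    name: str
    x: float
    y: float
    width: float
    height: float


def overlaps(a: Box, b: Box) -> bool:
    """True when the two rectangles share interior area; touching edges do not count."""
    return (
        a.x < b.x + b.width
        and b.x < a.x + a.width
        and a.y < b.y + b.height
        and b.y < a.y + a.height
    )


def find_overlaps(boxes: list[Box]) -> list[tuple[str, str]]:
    """Return the names of every overlapping pair of components."""
    return [(a.name, b.name) for a, b in combinations(boxes, 2) if overlaps(a, b)]
```

A non-empty result names the exact pair of components to fix, which is a far more specific instruction for a reviewer agent than "check for overlap."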
This bash script simply triggers after my slide builder agent completes, and boom, I no longer had to play the game of going in and cleaning up after my agents.
So, in summary, after every step in any automated pipeline, we want to add a verification gate. Specifically, I follow this hierarchy: starting from the top, if I can do a verification higher up in the list, I will. For example, say you have an automated pipeline writing blog posts. You can't afford human review, you can't turn the check into a script since it's based on fuzzy writing principles, and you can't apply rules, but you can bring in an LLM as a judge. Better yet, if you have clear criteria, like patterns you explicitly don't want to see such as em dashes, you can give the LLM judge a clear grading rubric, and it can fix the issues.
One thing I'll note is that sometimes it can be good to give the model rules in its prompt.
This is because if it's an easier task, you might as well give those rules, and it'll listen to you. But, you use the LLM as a judge when your agent starts forgetting your rules and ignoring you.
Now, if I walk back through the hierarchy with my actual linter, the script handles checks like whether the font size is at least 22 pixels, whether a color is in the palette, or whether interpolation is clamped.
We can deterministically check this easily with a script. But I also have an LLM review step for my slides, where the LLM has to judge: Is the layout balanced? Is the visual hierarchy right?
Is the animation timing good? These are more subjective and harder to measure.
But it does clean up and do the job.
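On the scriptable side of that split, the palette rule is a good example of a check that needs no LLM. The sketch below assumes colors appear in the slide source as hex literals. The palette values are placeholders, since the real palette is not shown in the video, and `off_palette_colors` is a hypothetical name.

```python
import re

# Placeholder palette; the actual palette is not shown in the video.
PALETTE = {"#0f172a", "#f8fafc", "#38bdf8"}

# Matches 3- or 6-digit hex colors such as #fff or #38BDF8.
HEX_RE = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")


def normalize(hex_color: str) -> str:
    """Lowercase the color and expand the #abc shorthand to #aabbcc."""
    h = hex_color.lower()
    if len(h) == 4:
        h = "#" + "".join(ch * 2 for ch in h[1:])
    return h


def off_palette_colors(source: str, palette: set[str] = PALETTE) -> list[str]:
    """Return the sorted hex colors used in source that are not in the palette."""
    allowed = {normalize(c) for c in palette}
    found = {normalize(m.group(0)) for m in HEX_RE.finditer(source)}
    return sorted(found - allowed)
```

Questions like "is the layout balanced?" have no equivalent of this set difference, which is why they stay with the LLM reviewer.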
Now, let's think about this hierarchy in Claude Code specifically, and how to actually start using Claude Code properly.
First, you want to use hooks all the time, because hooks are deterministic gates that run while you're writing code, and they let you reliably trigger verification after something gets done.
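A hook command can be any executable, so the gate itself can be a short script. The sketch below is in Python and rests on two assumptions about Claude Code hooks that should be checked against the current documentation: that the hook receives a JSON payload on stdin containing a `cwd` field, and that exiting with code 2 marks a blocking failure whose stderr is passed back to the agent. The `lint` callable is a hypothetical stand-in for a real linter.

```python
import json
import sys
from typing import Callable

# Assumption: exit code 2 tells Claude Code the hook failed in a blocking way
# and that the stderr text should be shown to the agent.
BLOCK = 2


def gate(payload: dict, lint: Callable[[str], list[str]]) -> tuple[int, str]:
    """Run the linter on the project directory and choose an exit code."""
    cwd = payload.get("cwd", ".")  # assumption: the payload carries the working directory
    problems = lint(cwd)
    if problems:
        return BLOCK, "Lint failed:\n" + "\n".join(problems)
    return 0, ""


def main(lint: Callable[[str], list[str]]) -> None:
    """Entry point for the hook: read the payload from stdin, report, and exit."""
    raw = sys.stdin.read()
    payload = json.loads(raw) if raw.strip() else {}
    code, message = gate(payload, lint)
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)
```

To use it, a script would end with a call such as `main(my_lint)`, where `my_lint` returns a list of violation strings for a directory. Keeping `gate` separate from the stdin and exit handling makes the decision logic testable without running an agent.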
Then, down the hierarchy, we have path-scoped rules. Many people don't know this, but there is a .claude/rules folder, and a rule placed there is loaded dynamically only when the agent is working within the matching path. This is much better than CLAUDE.md.
Then, we have skills or sub-agents.
That means putting some rules in a specific skill and expecting the agent to trigger the skill and follow them. We want to be careful here because we're putting it in the hands of the agent. And then, all the way down at the bottom is CLAUDE.md.
Many people, most people actually, flip this and put most of their rules in CLAUDE.md. Do not do that.
Those rules are particularly likely to be ignored because the system prompt says that not everything in CLAUDE.md is relevant. So, if you end up putting rules in your CLAUDE.md, they'll just get ignored. At least move them out of it, preferably into hooks.
Now, I'm going to motivate all of this with a real-time demo, where I'm going to be building a slide using a linter and a verification gate, and show you in real time how I actually use this flow in my own day-to-day work.
All right, so we're going to jump into the live demo and start by just opening up Claude in the directory where I am currently making one of my videos.
So, once we're in, I'm going to paste in this prompt.
Basically, this prompt creates a sub-agent stop hook that runs slide-lint.sh, a bash script that checks for things such as em dashes and font sizes under 22 pixels. Those are two things that I personally do not want to see in my videos.
So, once it checks those, we also want to specify that the linter only runs on the slide that the sub-agent just fixed. The way I'm doing that now is with git diff to check which slides were edited, but there are also ways to parse the JSON logs from the agent itself.
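The git diff approach can be sketched as follows. This is an illustration in Python, not the contents of slide-lint.sh: the `slides` directory name and the `.tsx` suffix are hypothetical, and `git diff --name-only HEAD` only reports tracked files, so a brand-new untracked slide would need separate handling.

```python
import subprocess
from pathlib import PurePosixPath


def slides_from_diff(
    diff_output: str,
    slides_dir: str = "slides",  # hypothetical directory name
    suffix: str = ".tsx",        # hypothetical slide file type
) -> list[str]:
    """Keep only the changed paths that look like slide files."""
    paths = []
    for line in diff_output.splitlines():
        line = line.strip()
        if not line:
            continue
        path = PurePosixPath(line)
        if path.suffix == suffix and slides_dir in path.parts:
            paths.append(str(path))
    return paths


def changed_slides(repo: str = ".") -> list[str]:
    """List slide files that differ from HEAD in the working tree (tracked files only)."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "HEAD"],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return slides_from_diff(result.stdout)
```

Scoping the linter to this list keeps each gate fast and stops it from reporting violations in slides the sub-agent never touched.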
So, let's go and enter that in.
And then, we can see here that it finally loaded.
And let's see what happens. So, first, we're going to go ahead and clear context.
And then now, we're going to go ahead and create the prompt for the actual slide population so that I can show you the right side of the slide getting filled in and then edited in real time.
So, let's see what happens when that finishes.
So, now you can see that the slide builder has been launched, and now we just have to wait for it to finish. And boom, the slide is populated. But you can notice that there are em dashes, and there is a really small font size. So, these are two clear violations of the slide rules.
So, what are we going to do in this case? Well, we have the linter, which is currently triggering, and you can see in real time that it's updating those em dashes and turning them into hyphens. Obviously, instead of replacing them with hyphens, we should probably be replacing them with better alternatives such as colons, commas, or parentheses.
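Detecting em dashes is the deterministic part of this rule; choosing the right replacement is the judgment call. The sketch below, in Python, separates the two. The comma swap is a deliberately naive assumption that will not suit every sentence, which is why the actual fix in the video is left to a reviewer agent. Both function names are hypothetical.

```python
import re

EM_DASH = "\u2014"

# Match the dash plus any spaces or tabs around it, without crossing line breaks.
EM_DASH_RE = re.compile(r"[ \t]*" + EM_DASH + r"[ \t]*")


def find_em_dashes(text: str) -> list[int]:
    """Return the 1-based line numbers that contain an em dash."""
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        if EM_DASH in line
    ]


def replace_em_dashes(text: str) -> str:
    """Naive fallback fix: swap each em dash for a comma and a space."""
    return EM_DASH_RE.sub(", ", text)
```

In a pipeline, `find_em_dashes` would serve as the gate, and `replace_em_dashes` would be at most a fallback when no agent is available to pick a colon, comma, or parentheses in context.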
And then, we're also going to scale up the font size.
And boom, there it is. We have the thing fully finished, and the slides look super good, way better than when they first started. Now, here's the thing.
Every automation is only as good as its verification. So, you've got to fix the verification, and then you can take yourself out of the loop. In the case shown in this video, a 50-line bash script outperformed an entire LLM reviewer. Just remember that. And it's instant, meaning you can stack as many of these gates as you want, which prevents many, many errors. So, you want to push determinism up. There's a rule that I stand by: determinism over probabilism. Once you fix this verification issue, you are well on your way to being able to take yourself out of the loop in your Claude Code automations.
Linked below is the number one agentic coding community on Skool. I just released my free Claude Code course in there, so I'll see you in there. Go get it now. Thank you for watching.