This video correctly identifies sandboxing as the essential bridge between AI toys and production-ready autonomous agents. It offers a practical solution to the security bottleneck that currently prevents most developers from fully leveraging AI.
Deep Dive
The 1 AI Unlock Most Developers Aren’t Using
This is insane. Claude has committed over 50 billion lines of code and there is no sign of this stopping. Around December 2025, AI coding made a huge leap. But not everyone took advantage of it. For example, Spotify claims that their top developers have stopped coding for months. This is quite rare in reality because it means a team completely changed their way of working. Most of the job becomes writing specifications, architecting, delegating tasks, and reviewing huge amounts of slop. Sorry, I actually meant code. We'll fix that in post. Fix that in post. The crazy part is some even skipped the review stage.
Don't do this. Even so, for a small group of people, this felt like a revolution, whereas for everyone else, it felt incremental. I think I know exactly why. In this video, I will show you the biggest unlock in AI coding so far. So you can level up too, but do it in the comfort of your own codebase.
Quick thanks to Sentry for sponsoring this video. Part of why LLMs got so much better at coding is obvious. The models got smarter, harnesses got better, contexts got larger, and dumb zones moved further away to the edge of the context. But most importantly, AI for the first time ever was able to run for hours and hours unattended and still produce working code. That needs to be stated. Running unattended meant that an agent could install packages, edit half your code base, hit the network, run some arbitrary script, and then somewhere in the middle of the process, you'd think, should we even allow that?
Is this the best idea? There are obviously tons of security concerns here. AI is getting good, maybe even too good, because we have literally seen agentic AI trying to give itself more permissions. No wonder only a small percentage of developers give it full control. In fact, in this survey, we see that only about 18% of developers let the AI run loose on their machines. In enterprise, that's going to be closer to 0%. And at the same time, this survey shows that 73% of developers think that AI should run in isolation by default, probably PTSD from past hallucinations. This is why it stayed a niche unlock until today. What if I told you that you can have both: long-running agents with full permissions and proper, secure sandboxing? Not the placebo Codex or Claude Code is giving us. No, no, no. The real-deal sandboxing. Even if you've been using the full-permission YOLO mode, you're going to want to stick around. Check this.
>> It feels like the early days of web hacking where SQL injection was everywhere and you could get shell on almost any enterprise-based internet accessible website.
>> Look, I don't want to scare you. I mean, I do want to scare you just a little bit. Security is going to be a huge issue with AI and everyone should be using sandboxing. It's super easy. I promise. Let me show you how easy.
There's a lot of AI news these days and quite frankly a lot of it is BS. So I can't blame you if you've missed this one. Docker released one of the coolest dev tools in years. It's called Docker Sandboxes. It is a terminal UI, or TUI, completely separate from Docker Desktop, that lets you run coding agents in isolated VMs with their own file system and their own network, and with no chance of nuking your machine. So AI agents can add or remove whatever binaries they like and you won't be affected at all. The only thing you have to do is review the code at the end. The coolest part: the setup is minimal and the agentic tools work exactly the same as before. So in practice, you won't even see the difference.

Okay, getting started is stupid simple. Let me show you. First, install the sandbox CLI, called sbx. This works on macOS and Windows. Then go to a project's directory and start your first sandbox by running the command sbx run claude. On the first run, you will need to sign in to Docker. Afterwards, you will be asked about the default network policy. You have three options: open, balanced, and lockdown. Most people should choose balanced, but you can always change this later. After that, the agent CLI opens up as usual. I'm using Claude here, but this works just as well with Codex, Gemini, Copilot, OpenCode, and many others.

As you can see, I'm not logged in to Claude Code, so I'm going to go ahead and do exactly that by running /login. Keep in mind, you will need to log in inside each of your sandboxes. And done. I told you this is pretty simple. You're already up and running. If you want to create a sandbox for another project, go to its directory and run sbx run claude again. You can also see in the bottom-left corner that it says dangerously skip permissions mode. This is the default mode set by the sandbox. Now, let me show you how to get the most out of this.
But speaking about the way to get the most out of your work, let me tell you about the best possible way to get the most out of your code in production.
Sentry. Their new AI debugger, Seer, can keep an eye on issues, find root causes, and even create fixes for your production issues. Check it out at sentry.io. And huge thanks to Sentry for sponsoring this video. All right, so we are sandboxed in YOLO mode. This will already enable the agent to run for a lot longer because we don't need to babysit it anymore. That also means the work moves upstream. Planning, delegation, spec work, all of that needs to be tight. So, how do we do it? One of the best ways is to give the AI agent all of the possible information in our own words and then ask it to make a fool out of us. Everyone is a critic, right?
But this technique can be extremely effective. For example, we can use Matt Pocock's grill-me agent skill. This will painfully extract all of the information out of you until the ambiguity dies.
It's brutally simple, but it always uncovers stuff I didn't think about.
Always. Okay, so we have the spec, the big plan. What now? The second thing you need is a harness that can self-verify.
Ideally, that means tests, builds, type checks, linting, and a way to check the UI. If the agent cannot verify its own work, you will get slop. That's a fact.
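As a rough illustration of what a self-verifying harness could look like, here is a minimal Python sketch. It is not from the video: the names run_checks, all_passed, and EXAMPLE_CHECKS are hypothetical, and the example commands (pytest, mypy, ruff) are placeholders for whatever test, type-check, and lint tools your project actually uses. The idea is simply one entry point the agent can call that runs every check and reports pass or fail.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


def run_checks(checks: dict[str, list[str]]) -> list[CheckResult]:
    """Run each named command; a check passes only if it exits with code 0."""
    results = []
    for name, cmd in checks.items():
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True)
            results.append(
                CheckResult(name, proc.returncode == 0, proc.stdout + proc.stderr)
            )
        except FileNotFoundError as exc:
            # The tool is not installed: report it as a failed check, not a crash.
            results.append(CheckResult(name, False, str(exc)))
    return results


def all_passed(results: list[CheckResult]) -> bool:
    """True only when every check passed."""
    return all(r.passed for r in results)


# ASSUMPTION: placeholder commands, not taken from the video.
# Replace them with your project's real test, type-check, and lint commands.
EXAMPLE_CHECKS = {
    "tests": [sys.executable, "-m", "pytest", "-q"],
    "types": [sys.executable, "-m", "mypy", "."],
    "lint": [sys.executable, "-m", "ruff", "check", "."],
}
```

One way to use this: have the spec tell the agent to call run_checks(EXAMPLE_CHECKS) after every change and to keep working until all_passed returns True, reading the output field of any failed check for details.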
Also, for tasks involving front-end work, you need eyes on the UI. That can be done with tools such as Vercel's agent-browser or Playwright. The agent needs to be able to open a page, click buttons, input text, take screenshots, and verify entire flows. The more an agent can verify its own work, the longer it can run without wasting your time. I highly recommend that you add instructions to your spec for the agent to run these checks. It is crucial.

All right, now that we have that in place, let me show you some important Docker Sandboxes commands. The two most important commands are, number one, sbx. This opens the terminal UI and shows all of your sandboxes. Sandboxes on the left, network on the right, all fully interactive. You can see what sandboxes are running, start them, stop them, remove them, or check their network activity. For example, we can see that some requests are being blocked. You can actually click around and allow access in this interface. It's really simple and powerful. And command number two, sbx ports --publish.
By default, dev servers running in the sandbox are not reachable. This command exposes a port, so you can open a sandbox dev server directly in your browser. Remember, this is per sandbox, so you will need to do this for each of your projects. By the way, you can run sbx list to see a list of all of your sandbox names. One more cool thing, I'm so excited about this one. You can also run Docker sandboxes in headless mode.
Here's the tiniest possible example of this. The -p parameter tells Claude to run the prompt and return the output. No UI, no nothing. Okay, why was that cool?
Imagine you created three, five, 10 specs with the grill me skill I mentioned earlier. Then imagine a simple for loop passing each of the specs to Claude one by one using the -p parameter. Do you see it? With this, you've just expanded the maximum running time of your agent from minutes to hours and possibly even days. By the way, this is also how a Ralph Wiggum loop works. If you want to see a Ralph loop tutorial, I got you. Check this video. You'll be up and running in one command. I myself often have Ralph loops running for 8 plus hours. Okay, so what about when things don't work? No problem. You can drop into the sandbox directly using this command. Then cd into the same path the project has in your own system. And from there you can do whatever you like.
Run the dev server, rerun tests, install packages manually, you name it. If you want to reach any website, although not recommended, you can loosen up the network policy in a sandbox using this command. If the sandbox stops working, you can always remove it and recreate it. No harm done. That is the beauty of it. But really, there isn't much more to it than that. And I think that's why this is so huge. It is the tools you know and love, but a ton more secure.
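The for loop over specs described earlier can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the command shown in the video: the transcript does not spell out the exact headless invocation, so the command prefix is left as a required argument that you fill in with whatever you run in your setup, ending in the -p parameter. The function names and the convention of one Markdown file per spec are also hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def build_command(prefix: Sequence[str], spec_text: str) -> list[str]:
    """Append the spec text as the prompt argument after the headless prefix."""
    return [*prefix, spec_text]


def run_specs(
    spec_dir: str,
    prefix: Sequence[str],
    runner: Callable = subprocess.run,
) -> list[tuple[str, int]]:
    """Feed each spec file to the agent, one at a time, in filename order.

    ASSUMPTION: `prefix` is your own headless agent command ending in "-p".
    Its exact shape is not given in the transcript, so it is not hard-coded here.
    """
    results = []
    for spec_path in sorted(Path(spec_dir).glob("*.md")):
        cmd = build_command(prefix, spec_path.read_text(encoding="utf-8"))
        completed = runner(cmd)
        results.append((spec_path.name, completed.returncode))
        if completed.returncode != 0:
            # Stop at the first failure so later specs do not build on broken work.
            break
    return results
```

Stopping at the first non-zero exit code is a design choice in this sketch; a loop that logs the failure and carries on with the next spec would work just as well if your specs are independent of each other.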
The unlock is letting it run, and doing so responsibly, without accidentally nuking your machine. I would like to thank Sentry again for sponsoring our show. And I would like to thank you for watching. I hope you've learned something today. And if you did, please consider liking, commenting, and subscribing.
Really appreciate that. Until next time, happy sandboxing.