By bypassing high-level abstractions for direct CDP control, Browser Harness transforms the often-fickle AI agent into a precise, self-correcting automation engine. It is a sophisticated bridge between LLM reasoning and the messy reality of modern web architecture.
Deep Dive
They Bet A Mac Mini This Browser Agent Could Do Anything!
In this video, we're going to talk about Browser Harness. It's a new repo put out by the folks at Browser Use, and I'll tell you why it is easily the best browser agent I've ever used. I've been playing with it a little bit this afternoon and I'm already convinced, okay? So, we're going to talk about some practical use cases, some cool stuff it can do, and just really how to get started, okay? So, stick around. The very first thing that got me is I came across this post by Browser Use. It says, "First person to find a task that doesn't work gets a fresh Mac Mini." And I'm like, "Okay, wait. Hold on.
What are we talking about here?" The second post was about a repo that they just put out called Browser Harness. And the idea is to do just that. We've been spending a lot of time on the channel talking about harnesses, and this one essentially gives full browser control to an LLM. The way they did that was to build it directly on CDP, the Chrome DevTools Protocol. So, essentially, it's plugging directly into your Chrome browser, either locally or in the cloud, which sounds awesome. So, of course, let's check out more about what this can do, right? So, I was looking through the Twitter account of the founder. This guy has actually got a screenshot of his Browser Harness agent generating a video with Seedance 2, uploading and scheduling it in TikTok Studio, and then later on analyzing the best hooks based on views. He says it modifies its own harness, so no task is impossible, and it shares domain-specific skills between agents.
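To make the CDP part a little more concrete, here is a minimal sketch of what talking to Chrome at that level looks like. CDP commands are JSON messages with an id, a method, and params, sent over the browser's remote debugging WebSocket. Page.navigate and Input.dispatchMouseEvent are real CDP methods, but the cdp_command and click_at functions below are hypothetical names made up for illustration. This is not Browser Harness's own code, and it only builds the messages without opening a connection.

```python
import itertools
import json

# Every CDP command carries a unique id so replies can be matched to requests.
_ids = itertools.count(1)


def cdp_command(method, **params):
    """Serialize one CDP command as the JSON text sent over the WebSocket.

    Hypothetical helper for illustration, not part of any library.
    """
    return json.dumps({"id": next(_ids), "method": method, "params": params})


def click_at(x, y):
    """Build a left click: a mouse press followed by a release at the same point."""
    common = {"x": x, "y": y, "button": "left", "clickCount": 1}
    return [
        cdp_command("Input.dispatchMouseEvent", type="mousePressed", **common),
        cdp_command("Input.dispatchMouseEvent", type="mouseReleased", **common),
    ]


# Navigate to a page, then click at pixel coordinates (120, 240).
messages = [cdp_command("Page.navigate", url="https://example.com")] + click_at(120, 240)
for message in messages:
    print(message)
```

Working at this level is what lets an agent click on raw coordinates, the way a person with a mouse would, instead of depending on a higher-level selector API.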
So, I mean, honestly, even right now, I'm hooked, right? I want to learn more, but then you check it out even more and he's, like, playing games.
He's got a screenshot of Claude drawing in Excalidraw.
Like, it's drawing out a heart.
That's the browser agent. So, it really seems like they've given the LLM a mouse and keyboard, which is kind of fascinating, right? So, I'm like, "All right. All right. You win. Let me see exactly what's going on here." It's actually a public repo, so you can clone it or bring it in with the GitHub CLI, whatever you need, and it's as easy as running the setup prompt. When Chrome opens in your environment, you just allow remote debugging, and it will run for you locally, no problem, okay? I did that and everything worked out great. It also runs in the cloud with remote browsers. They offer free remote browsers, three concurrent, with no card required. That's where you set up your API key, but let's just focus on local for now instead of launching this stuff into the cloud before we know it works. So, the unique thing about this isn't just how they're approaching the browser agent, but also this helpers.py.
It's essentially how they do stuff, and it is self-annealing, or at least the agent can update it on its own, right? So, for example, it says the agent wants to upload a file, but helpers.py doesn't have the script or the capability to do that. The agent writes the code needed to execute that task into the helpers file, and then the agent is able to do that now and forever because it's baked into the harness.
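Here is a rough sketch of that self-extending helpers idea, just to show the shape of it. Everything in it is an assumption for illustration: the load_helpers and ensure_helper functions, the throwaway helpers.py in a temp folder, and the hard-coded string standing in for code an LLM would write. The real Browser Harness may handle this quite differently.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_helpers(path):
    """Import the helpers file fresh so newly appended functions are visible."""
    spec = importlib.util.spec_from_file_location("helpers", str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def ensure_helper(path, name, source_from_agent):
    """Return helper `name`; if it is missing, append the agent's code first."""
    helpers = load_helpers(path)
    if not hasattr(helpers, name):
        with open(path, "a", encoding="utf-8") as f:
            f.write("\n\n" + source_from_agent.strip() + "\n")
        helpers = load_helpers(path)  # reload to pick up the new function
    return getattr(helpers, name)


# Demo: a throwaway helpers.py that only knows how to navigate.
workdir = Path(tempfile.mkdtemp())
helpers_path = workdir / "helpers.py"
helpers_path.write_text("def goto(url):\n    return f'navigate:{url}'\n", encoding="utf-8")

# Stand-in for code the LLM would write when it finds a capability is missing.
agent_written = '''
def upload_file(selector, file_path):
    return f"upload:{selector}:{file_path}"
'''

upload = ensure_helper(helpers_path, "upload_file", agent_written)
print(upload("#resume", "cv.pdf"))
```

The second time upload_file is needed, ensure_helper finds it already in the file and nothing gets written, which is the "now and forever" part.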
It's done that task, and it already knows how to do that moving forward. The other super cool thing is they've got a bunch of these already built out, right?
So, the TikTok one, for example, it's got an MD file for TikTok. There's one for Facebook. For Sarah is interesting.
eBay, Etsy, Craigslist. There's quite a lot. Steam, Zillow for real estate listings. Like, they've got quite a few domain skills already mapped out from things that people have already done. Okay, but the other cool part that got me excited was looking at the interaction skills, and this explains exactly how the LLM is able to interact with Chrome.
Like, you look down this list of skills.
It's like tabs, uploads, downloads, drag and drop, iframes, everything. So, to this point, everything's got me pretty excited, and I'm like wanting to give this a shot, right? And the number one thing I use these browser agents for to date, essentially, is just web scraping, right? Getting information from a directory, finding leads from somewhere.
But we always run into a couple of different problems: either anti-bot stuff, Cloudflare, or whatever, or obfuscation, with emails being hidden behind a couple of separate button clicks, and it's always really hard to wire that into, like, a Playwright agent or something, for example. So, I figured let's just give it a shot. We'll go to a directory where we have just that.
I'm just going to hide some of this stuff. But you know how you click, right? Reveal phone, email. I just want to collect that information because it's all publicly available and totally safe, not black hat or anything like that, but I do have to be at the desk in order to click these buttons and get the emails from there. So, what I wanted to do was automate that like I was at the desk, but the agent is going to do it for me. That was mission number one, and did it deliver?
What was really exciting about this is that we had the AI running that to begin with, but since we had the SOP, essentially, and the AI had gone through and done it first by itself as a proof of concept, we were then able to write a deterministic Python script to do that automatically. So, I'm not having the AI babysit anything, but it's still being woken up if there are any errors. So, the actual crazy use case behind this is that I've turned the AI into more of a manager rather than a micro-manager doing all these different tasks. We've got the Python script running in the background getting these different emails, and then if there are any errors, right, or a page is blank or whatever, then it wakes up the AI. If there's any sort of lag or whatever, the AI can then address the issue and continue, or alternatively, it can escalate to me. It can send me an SMS, a Telegram message, or whatever, and then I can come to the desk and fix the problem as needed.
We've got the script running with the browser harness. It's going through each of those pages, clicking the button to reveal emails, and collecting those contacts. Then it's going to output into a CSV format, so we can use this for whatever moving forward. So good. And because it's information that's publicly available that you would be able to go and collect yourself, it's just making the agent go and do the button clicks instead of you. And it's still ethical.
You're not putting any strain on anybody's servers, and you're not, you know, getting information in any sort of nefarious way.
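The "AI as manager" setup described above can be sketched roughly like this: a plain Python loop does the scraping, the agent only gets woken up when a page fails, and a human only gets pinged when the agent can't recover. All the names here (run_pipeline, fake_scrape, fake_agent, the example URLs and contact details) are made up for illustration. The real script would call into the browser harness where fake_scrape is, and would send an SMS or Telegram message where the alerts list is.

```python
import csv
import io


def run_pipeline(urls, scrape, wake_agent, notify_human, out):
    """Scrape each URL deterministically; involve the agent or a human only on failure."""
    writer = csv.DictWriter(out, fieldnames=["url", "email", "phone"])
    writer.writeheader()
    escalated = []
    for url in urls:
        try:
            record = scrape(url)  # the normal, no-AI path
        except Exception as err:
            record = wake_agent(url, err)  # let the agent try to recover
            if record is None:
                notify_human(f"Could not process {url}: {err}")
                escalated.append(url)
                continue
        writer.writerow({"url": url, **record})
    return escalated


# --- Stand-ins so the sketch runs without a browser (all hypothetical) ---
def fake_scrape(url):
    if url.endswith("/blank"):
        raise ValueError("page rendered blank")
    if url.endswith("/locked"):
        raise ValueError("reveal button not found")
    return {"email": "owner@example.com", "phone": "555-0100"}


def fake_agent(url, err):
    # Pretend the agent can fix blank pages (say, by reloading) but nothing else.
    if url.endswith("/blank"):
        return {"email": "retry@example.com", "phone": "555-0101"}
    return None


alerts = []
buffer = io.StringIO()
urls = [
    "https://example.com/listing/1",
    "https://example.com/listing/blank",
    "https://example.com/listing/locked",
]
escalated = run_pipeline(urls, fake_scrape, fake_agent, alerts.append, buffer)
print(buffer.getvalue())
print("Escalated to human:", escalated)
```

The point of the design is that the expensive part, the LLM, sits idle on the happy path and only costs you anything when something actually breaks.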
Anyway, this is a little bit of a shorter one because I'm just getting started with this myself, but I really wanted to share it with you because this is kind of awesome, and I see a lot of really practical use cases for it. Not just lead generation, even though that's a really obvious first step for a lot of folks. But let's say social media, right? A lot of platforms are really hard to connect to, like Instagram, Facebook, LinkedIn, and you could get these automations completed from the browser side of things. Again, this runs for free locally just with your Claude Code instance or Py or whatever it is that you're using, and I'll leave a link to this GitHub repo in the description.
I'll also leave a link to my Skool community, where I will put the workflow that I just built out for today in the tools and agent section of the classroom. You can check it out. We're over a thousand members now. It's free to join, but there are a lot of perks with membership. Let me know in the comments exactly how you're using this. This opens up a lot of different possibilities that were only partially available with the other browser frameworks. I want to check out exactly how people are using this with things like the Hermes agent moving forward and whatnot, but that's a different video. Anyway, thanks for hanging out. We'll catch you in the next one.