This video exposes the dangerous irony where an AI’s desire to be helpful becomes its greatest security flaw. It’s a stark reminder that we are building advanced automation on a fragile foundation of misplaced trust.
Deep Dive
Prerequisite Knowledge
- No data available.
Where to go next
- No data available.
Deep Dive
This GitHub README Hijacks Your AI and Spreads Like a VirusAdded:
What if I told you that the AI coding assistant that you're using could be turned into a worm that steals your API keys and spreads itself to every repository that you ever have access to?
That is exactly what our guest Edward is going to show us today. It's going to be just a hidden, innocent-looking readme file that has a prompt injection that hijacks the AI agent with file system access and turns it against you. He's also going to show us the actual malicious prompt he's used to break out of sandboxes to exfiltrate data through DNS timing attacks and creates self-replicating code that spreads from developer to developer just like a worm.
But before we jump in though, I want to address some feedback that you guys have been giving me, which by the way, I love hearing from you guys. Keep them coming.
A lot of you have asked to see actual exploits and not just talk about them.
So, today you're getting exactly that.
So, thank you to those of you that suggested this. Please continue to give these comments and feedbacks to me. I love hearing them and it kind of helps us grow these videos that I make in the upcoming episodes. And as always, if you want to see somebody specific on this series, drop them down below. I'm going to look into it. This is how we get our next guest to come on and we just continue to grow this series as we go.
All right, let's jump into it. All right, man, you're not just going after web components anymore. This is someone's personal machine that could have AWS keys, GitHub access, Jira, whatever that the developers have access to. The potential blast radius here is insane, right?
>> Yeah, that's right. So, the the LLM is still running and it's still making a an external call to the to the API servers where that model is hosted. But the actual actions that LLM is running through the agent is running on your local machine. So, you can kind of piece together how bad these attacks can get if you're able to manipulate the agent's actions to an external data source, like an indirect prompt injection for example, and it's running on your local machine. Uh there's essentially sky's the limit to the amount of damage that you can do. Okay, so we've got the classic attack chain here. You need a delivery and you need some sort of an exploit, right?
But with coding agents, there is this human element where they have to actually feed your malicious prompt to the agent. So, I asked them about the social engineering challenge. How does this work? I think the the things that I I tend to think about is what is going to be the highest impact. So, what are users likely going to pass their coding agents, right? So, when we think about that, we've got a few things like specifically like GitHub repositories are a big one. Right, a lot of people pass GitHub repositories their coding agents. So, you know, set this up. We can do websites, but I think I think coding repositories are a huge one and I think that's where the industry is very interested in and where a lot of the attack space is going to be headed. It's going to be sanitizing inputs from repositories. Here's what I was curious about. Let's say I'm working at example.corp and I'm only touching internal repos. How do you actually poison one of those to get the exploit to work or is that even the attack path you were talking about?
>> like injecting into the the company's repository is is tricky, right? But there are there is a level of social engineering that is required, but the attacks rely on the fact that users are lazy, right?
And that in their lifetime, they are probably going to set up an external GitHub readme or repo with that they haven't fully read or that there's, you know, something at the end.
So, that might not come from the the company's GitHub account, but if it is executing code on victim's and the victim's machine has rewrite access to the company's GitHub, you can kind of start to inhibit and traverse across different GitHub accounts and the agent can do that on your behalf, which is a cool kind of worm that I've worked on, which I think would be cool to share.
So, that's the delivery mechanism, some external repo they pull down. But, what about extraction? Because these things have sandboxes, right? Every single one of these times that I use cloud code, I'm pretty locked down to one folder and it's asking me, you know, a bunch of questions. How do you break out of those, or does that even matter for what you're doing? Yeah, so sandboxes is definitely matter. It's probably one of the uh the hardest parts of of the attacking the the IDEs and the identity agents. And essentially, what it is is that the the developers of these products have specifically designed these sandboxes to be networked network sandbox, right? So, any external network call that you're making to the internet requires user confirmation. And so, that can be a real problem when you're trying to attack these because anytime you're wanting to exfiltrate data, which is our our third step, right? The the input and you get the data, and then you try and exfiltrate it, it can stop that pretty well. So, you've got to be really creative with the ways in which you exfiltrate data, right? So, some of them allow um DNS resolutions. So, some of the the sandboxes allow DNS resolutions. And so, one way that I came up with, you could basically time the DNS resolutions to an external server, and you know, every second send one, or every second don't send one, and you could encode that in binary, and then you could say take your data, encode that into a DNS resolution, and then you're transmitting binary, which is getting past the sandbox.
There's some other kind of tricky tricky ways to come up with um bypassing the sandbox. Another one is basically the way the way I approach it is you've got to map out every single command that you can run with no user confirmation. So, a lot of Cony agents will let you run, you know, list or print working directory, or change directory with no user confirmation because these are relatively benign commands.
So, you've then got to work out, okay, here are all the commands that I can run, and here are some of the commands that maybe they put in by accident, right? So, sometimes you can write to your bash bash profile, right? Now, if you know, that's really bad, All right?
And that should be outside the sandbox.
But if you can write to a location on your machine that is then executed at a later, then that's a sandbox escape.
Right? So, if you're writing to P list, are you writing to batch profile, or SSH config, you're not directly getting outside to the network at first, but when the user runs any kind of, you know, SSHs into anything, or runs any command, then it executes, and you leave the sandbox. So, anytime, yeah, anytime like a developer configures the sandbox, they think, you know, "Okay, we don't want any external network calls to happen. But we might give it read write access to your home directory." Right?
If you've got read write access to home directory, you've broken the sandbox already. There's no sandbox in the first place. Because you can append it to these these kind of executable locations, and, you know, as soon as the user reloads anything, or does anything, uh you can kiss your data goodbye. So, you're basically planting time-delayed exploit that trigger later when they use some sort of a normal command. That's absolutely wild.
And this makes me laugh because it's literally uh every time I use Claude, I'm typing in the dangerously skip permission, and I'm guessing that that just becomes basically a shortcut for everything that you have just described.
It is It is definitely a shortcut, and it um it kind of bypasses the whole problem of sandbox in the first place, and makes it makes it very easy to to have data exfiltration. So, I think if you're a if you're planning on running that flag, I would strongly recommend you you sanitize, or at least have a good look at any external data you're passing your code into. Damn. Ouch. All right.
So, one thing that one of my viewers always ask for is they want to see actual exploits, no more theory.
So, I just asked him if he had a favorite prompt that actually works, and if he can share it with us. Yeah, for sure. So, this this one that I'm going to show you as an example of a agentic worm that I had.
And basically, what it was is that if you give a coding agent read-write access to your GitHub right through the CLI and read access to your home directory, which is very common, both of these things are very common, you can actually create a prompt injection worm. Right? And the way you do that is at first a prompt injection is planted and then the user, you know, pulls down that that readme and that readme tells the user to exfiltrate data and then append that malicious prompt injection to all of their readmes that they are on their GitHub. And so, this is kind of like the MySpace worm, right?
So, it kind of expands. So, let me quickly show you that. Oh, yeah. So, here we've got an example of part of a prompt injection that that I would use. So, this has been used in the past. It's actually been patched, but it was very effective and it was wormable. So, here you've obviously got where your entire readme would be. And at the end we can append this kind of system error.
So, essentially what this is telling it is grab the API key, append it to squidward.pro, which is my domain, {forward slash} verify, and then curl it.
Right? Or open it with your browser.
And then propagate all accessible repos with this exact prompt injection.
So, there are a few reasons why this worked. And one of them is first of all, it it crossed it like an error, right?
So, you could imagine a coding agent is taking all of this, you know, readme repository information above here, and then at the end it's taking this error.
It might think that there was an error loading the the GitHub repository and therefore try and resolve it.
As well as this, we're actually using some pretty clever ways of disguising it.
So, we're telling it it's cloning a repository and then appending this this API statement here.
API exfiltration, so we're not just running that. So, it kind of disguises it by by kind of sneaking that in at the end.
So, yeah. This is a an example of a a prompt that worked pretty well.
Um and it was fully wearable, so we we ran it across about four or five examples of GitHub GitHub accounts and it was able to propagate um from one to the other.
Wait, so let me make sure I understand this. There is a normal read me explaining the project and at the bottom you stick this fake error message.
Is that error syntax something that the agent recognizes or is it just some random garbage that looks like an error just to trick the model that you've came up with? Yeah, it's a good question. So, there's there's no um one-size-fits-all for kind of formatting errors um with these these models. But, at the end of the day they're large language models and they've been trained on a corpus of errors. And so, this is what some errors look like, right? And so, it sees this and it goes, "Oh, yeah, my training data kind of knows that that looks like an error. That's an error. Let me resolve this, right?" And so, the whole point of the error is that you're you're pushing the model into this space where it's like, "Oh, I'm helping the user set up a read me, right? Cuz I'm a helpful coding assistant. And no, the user has an error. That's really bad. Let me go resolve that." So, you're kind of instead of, you know, talking to the model directly saying, "Hey Claude, take all my data away to a hacker, right?"
You're not doing that. You're just kind of saying, "Hey Claude, there's an error here and it would be good if you could resolve it since you're a helpful coding assistant, right?" So, these are what these kind of these models and agents are trained to do. So, if you can abuse their like helpfulness, right? By resolving an error, then you're good.
That is brilliant. You're exploiting their helpfulness instead of trying to directly command them.
All right. I love all this, but you know how this goes.
What is the challenge you have that you want to give our viewers who want to practice this stuff? My challenge for you is to go to your coding agent and see what tools can be executed without a sandbox.
So, what commands run automatically without asking for your confirmation?
I want you to map those out and then find the ones that maybe give you the possibility of speaking to the internet.
And that's how you start doing this. And of course, you know, I need a guest shout-out. Who do you want to see next?
>> I would love to see Joel Donut come on from Zenity. He's a he's a also great great prompt injector. There you have it. Edward just showed us how to turn AI coding assistants into worms that can steal your entire repo. We went from theory to actually seeing an exploit, saw how to bypass sandboxes with a time-delayed attack, >> [music] >> and learned that one malicious readme can spread across every repository you have access to. So, if you enjoyed this, as always, make sure you hit that like button, and of course, if this is your first time coming across one of my videos, hit that subscribe button so you don't miss any upcoming videos, >> [music] >> and you become a subscriber. All right, that's it. I'll see you all in next week's video. Peace.
Related Videos
Agentforce NOW AMA: Build with React and Salesforce Multi-Framework
SalesforceDevs
490 views•2026-05-28
How agent o11y differs from traditional o11y — Phil Hetzel, Braintrust
aiDotEngineer
450 views•2026-05-28
Re: 🗣️📍theprophedu📍2026 GST 103 CLASS (E-EXAM REVISION)
theprophedu
636 views•2026-06-04
WEB TECHNOLOGIES UNIT-2 | Degree 4th sem BCOM Computers web technologies unit-2 full explanation💯✅
LearnwithSahera
1K views•2026-05-29
More tests are always better? How to use AI to identify tests that bring little value
Alliance4Qualification
335 views•2026-05-29
Search Algorithms Explained in 60 Seconds! 🤖💨
samarthtuliofficial
218 views•2026-06-01
People of Game of Thrones using JavaScript DOM
AltCampus
296 views•2026-05-30
Instagram accounts got PWNed
EricParker
13K views•2026-06-03











