Data poisoning is a necessary act of digital sabotage that forces a long-overdue friction into the unethical harvesting of human creativity. It represents a desperate yet vital reclamation of agency in an era where data extraction has far outpaced legal protection.
Deep Dive
Poison Your Data. Fight Back Against AI.
Back in 1996, a man called John Perry Barlow published a document titled A Declaration of the Independence of Cyberspace. His argument was simple. The internet is not of the physical world, therefore it should not be governed by the same rules and regulations as the physical world. No borders, no paywalls, just ordinary people connecting and sharing freely.
What followed was an era of the internet that sadly most of us have forgotten.
There would be communities formed on these peer-to-peer file sharing networks where people would come and freely share art, music, knowledge, and information.
There were no corporations, no paywalls, no algorithms. It was bliss.
Soulseek was a site that shared music.
It was built for music lovers by music lovers. Rare underground vinyl rips, underground electronic music, lossless audio that you couldn't find anywhere else on Earth. Soulseek had it all. It launched in the year 2000, and today in 2026, it's surprisingly still active, probably more active than it has ever been, because something happened that John Perry Barlow couldn't predict. Billion and trillion-dollar companies have found these platforms and communities, but they're not exactly joining them.
They are harvesting them. You see, sites like Soulseek contain thousands and thousands of hours of high-quality lossless FLAC audio. And that's exactly what they need to train their AI models on. So, they are silently scraping these sites and downloading everything they can get their hands on.
These files were shared by real people and musicians out of love for free, but now they are being extracted for value so that the AI companies can create better models and make profit that the original musicians will never see a penny of.
That's not the open internet, that's extraction. But there's a way that we can fight back.
Look at this river.
Still, clean, untouched. You'd probably take a drink out of it if you were here yourself.
But, now imagine all of this water is raw music data sat on Soulseek. There's nothing stopping a big AI company coming here and sucking up every last drop of this. Nothing.
That is unless someone got here first and introduced some doubt by poisoning the water.
This is the story of data poisoning and how ordinary people are using it to fight back against AI companies to protect artists and musicians.
Let's find out more.
Meet Mr. Daniels. He's a 25-year-old from England who likes to tinker with AI and deepfakes. Recently, he made a bit of a name for himself when he took his entire music library of over 2,000 records, stripped out the original vocals, and replaced every single one of them with the voice of Homer Simpson.
Then, he uploaded all of them to Soulseek. He didn't change the metadata, the file names, the artist tags, the album information. They all stayed exactly the same. So, when you searched for a track and downloaded it, everything looked completely legit. Some songs even had long intros, so you might listen to them for 30 seconds before you noticed anything had changed. And then suddenly, there's Homer.
Now, why Homer Simpson?
Well, an AI script that downloads all the files doesn't exactly listen to them first. It reads the metadata, assumes that the file is legit, and files it away accordingly. So, somewhere deep in a training data set is the audio of Homer Simpson, which the AI will think sounds like Madonna, Rihanna, or maybe even Sean Paul. The model doesn't know the difference. It just ingests the data and treats that as the truth. And that is exactly what Mr. Daniels is hoping for: to introduce noise and chaos into a system which is built entirely on pure, clean, high-quality audio. Now, Mr. Daniels has admitted that he does feel some guilt that genuine users are being caught in the crossfire here. They will unknowingly download these files, listen to them, realize it's Homer, likely have a laugh, move on, and download the next file until they find the exact one that they want. But an AI scraper doesn't have that luxury.
And that's exactly who he's targeting.
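To make the mechanism concrete, here is a minimal sketch of a scraper that sorts files by their tags alone and never checks the audio. Everything in it is an assumption for illustration: the `AudioFile` record, the `actual_voice` field (standing in for what a human listener would hear), and the track names are hypothetical, and no real scraper is claimed to work exactly this way.

```python
from dataclasses import dataclass


@dataclass
class AudioFile:
    # Hypothetical record: the tags a scraper reads, plus what the audio really contains.
    artist_tag: str
    title_tag: str
    actual_voice: str  # ground truth a human listener would notice


def ingest_by_metadata(files):
    """Group files into training buckets keyed by the artist tag alone."""
    dataset = {}
    for f in files:
        dataset.setdefault(f.artist_tag, []).append(f)
    return dataset


def mislabeled_fraction(dataset):
    """Share of ingested files whose actual voice differs from the artist tag."""
    total = sum(len(files) for files in dataset.values())
    wrong = sum(
        1
        for artist, files in dataset.items()
        for f in files
        if f.actual_voice != artist
    )
    return wrong / total if total else 0.0


# Illustrative library: two of the four files have had their vocals swapped.
library = [
    AudioFile("Madonna", "Track A", "Madonna"),
    AudioFile("Madonna", "Track B", "Homer Simpson"),
    AudioFile("Rihanna", "Track C", "Homer Simpson"),
    AudioFile("Sean Paul", "Track D", "Sean Paul"),
]

dataset = ingest_by_metadata(library)
print(mislabeled_fraction(dataset))  # 0.5
```

A human would catch the swap within a few seconds of listening. The script has no step where that could happen, so half of this toy data set ends up filed under the wrong voice.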
And what Mr. Daniels is doing is just one form of data poisoning. You see, it comes in many different shapes and sizes. Back in World War II, Operation Mincemeat intentionally fed the Nazis with false information, which yielded a particular outcome. And in the early 2000s, major record labels were worried and scared about these peer-to-peer file-sharing sites. So, they flooded them with corrupted and terrible audio quality files, hoping that users would eventually download those enough times to get frustrated and leave, and just go buy the album anyways. The only difference now is that we're not targeting individuals, we're targeting corporations, the AI companies themselves. Oh, and if you're feeling sorry for these big AI companies, don't.
Because Spotify and some major labels have recently banded together to sue them for approximately $13 trillion, alleging that they have downloaded nearly all of the commercial sound recordings on the internet worldwide.
99.6% of them, to be precise.
I'm sure you can work out where these audio and sound recordings are likely to end up.
Now, the Homer Simpson approach is fun and it definitely grabs headlines, but it has its obvious limitations. It's time-consuming and very easy to detect by simply just listening to the song.
So, what if there was a way that you could encode data or poison the data that is completely undetectable to the human ear?
Well, that's exactly what companies like HarmonyCloak, Poison Pill, and SynthID are working on.
The idea is a form of audio watermarking. You take an original track and embed additional signals into patterns in the frequency spectrum that human listeners will never notice. Human hearing spans roughly 20 to 20,000 hertz, and most standard audio formats only represent frequencies up to around 22,000 hertz. In theory, you could encode data up to 300,000 hertz, but this would produce enormously large files, and the hidden data would be destroyed the moment the audio is compressed into something like an MP3. In practice, most audio watermarking works by embedding subtle signals within the audible range itself, exploiting psychoacoustic masking: essentially, hiding data behind louder sounds so that the human ear cannot hear it. Either way, the watermark isn't meant to be heard. It's designed to be read by a computer.
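The frequency limits above follow from the sampling rate: a digital recording can only represent frequencies up to half its sample rate (the Nyquist limit). The rough arithmetic below shows why CD-style audio tops out near 22,000 hertz and why reaching 300,000 hertz would balloon file sizes. It assumes uncompressed 16-bit stereo PCM, and the 600,000 Hz sample rate is simply the minimum the Nyquist limit would require, not a format anyone is known to use.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def pcm_bytes_per_minute(sample_rate_hz, bit_depth=16, channels=2):
    """Size of one minute of uncompressed PCM audio (assumed 16-bit stereo)."""
    return sample_rate_hz * (bit_depth // 8) * channels * 60


cd_rate = 44_100          # standard CD sample rate
ultrasonic_rate = 600_000  # assumed: minimum rate needed to carry 300,000 Hz

print(nyquist_hz(cd_rate))                    # 22050.0
print(nyquist_hz(ultrasonic_rate))            # 300000.0
print(pcm_bytes_per_minute(cd_rate))          # 10584000
print(pcm_bytes_per_minute(ultrasonic_rate))  # 144000000
```

Under these assumptions, one minute of audio grows from roughly 10.6 MB to 144 MB, about 13.6 times larger, before any of the hidden data has survived a single lossy re-encode.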
AI models don't listen to the music the way we do.
They process the entire mathematical representation of an audio file and its data. Every frequency, every waveform, every detail, including the parts we can't perceive.
So, when a model trains on watermarked audio, it swallows the poison without the developer even realizing. The result? Well, the AI's understanding of the fundamentals of music may get corrupted. Suppose the model thinks it has trained on rock and roll, but that music has been watermarked or poisoned to trick the AI into learning what is effectively classical piano. Later, when a user prompts the AI to create a rock and roll record, the output may sound like soft piano or Mozart.
It makes the AI unreliable, less valuable, and it's more likely that the user will not want to use that AI again in the future. Essentially, it's weaponized noise, and the best part is that nobody knows it's there until the damage has already been done.
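The rock-and-roll-to-piano effect can be illustrated with a deliberately tiny toy model. This is not how HarmonyCloak or any real poisoning tool works, and real music models are vastly more complex. The sketch assumes each track is reduced to a single made-up "energy" number (high for rock, low for classical piano), and that "training" just means averaging the values seen under each label.

```python
from statistics import mean


def train(samples):
    """'Train' by averaging the feature value seen under each label."""
    by_label = {}
    for label, energy in samples:
        by_label.setdefault(label, []).append(energy)
    return {label: mean(values) for label, values in by_label.items()}


def generate(model, prompt_label):
    """'Generate' by returning what the model believes the label sounds like."""
    return model[prompt_label]


# Hypothetical clean data: rock is high energy, classical is low energy.
clean = [
    ("rock", 0.9), ("rock", 0.8), ("rock", 1.0),
    ("classical", 0.1), ("classical", 0.2),
]
# Poisoned files: tagged as rock, but the model perceives them as piano-like.
poison = [("rock", 0.1), ("rock", 0.0), ("rock", 0.2), ("rock", 0.1)]

clean_model = train(clean)
poisoned_model = train(clean + poison)

print(generate(clean_model, "rock"))     # close to 0.9: sounds like rock
print(generate(poisoned_model, "rock"))  # well under 0.5: drifting toward piano
```

In this toy setup, the poisoned files never touch the "classical" label at all. They only drag the model's idea of "rock" toward the wrong sound, which is the kind of quiet drift the transcript describes.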
Now, the big question: is data poisoning actually working?
Well, the honest answer is that it's still early doors. These tools are young.
They're still being developed, and they don't run locally yet, which is limiting adoption.
But, what data poisoning does do is create friction, a speed bump. It makes data unreliable.
See, nobody's actually claiming that data poisoning is going to stop the AI wave and bankrupt companies like OpenAI.
But, it doesn't have to.
What it does is create doubt, and it forces these companies now to spend money, time, and resources checking and checking and checking that this data is clean and legit. Money, time, and resources that companies like OpenAI would rather spend on advertising, scaling, and fundraising for more money.
And that's what data poisoning is pretty much all about. It's friction.
And it is that friction that matters.
In 1996, John Perry Barlow imagined an internet free from the controls of powerful institutions. Unfortunately, to this day, that's just not true.
But, what if the tools to fight back came from the exact place where the internet pretty much started? Ordinary people connecting online and acting on principle and sharing information and occasionally replacing every vocal in their music library with Homer Simpson.
Now, for the musicians, the artists, and the creators who for years have watched their work get consumed without compensation or consent, data poisoning offers something that regulation and lawsuits still haven't delivered: protection and agency. Now, it's not the final solution and this really isn't a victory, but it is a speed bump and it's definitely a start.
Huge shout-out to Ben Jordan for his deep dive on data poisoning. He did a YouTube video about a year ago himself which goes into much more technical depth than I do here, and he actually sits down with some of the creators of HarmonyCloak and really picks apart how these tools work. I've linked his video in the description down below so you can go watch that.
Anyways, I hope you've enjoyed today's video and maybe learned a thing or two.
I'll see you in the next one.