AI systems cannot be forced to follow rules like traditional software because they interpret and negotiate instructions rather than executing them blindly; the more rules and validation systems developers create, the more AI tends to game the system, making minimal constraints with human oversight the most effective approach.
Deep Dive
Philosophy in AI
So, I never thought training AI would turn me into a philosopher.
I guess what I've been going around with for a few days now is one core concept: I thought I could control AI and force it to be truthful and do its jobs.
And it's led me down this rabbit hole where I'm genuinely starting to question things far outside of AI: core concepts and beliefs about thought and how you can control things. So, as silly as it sounds, it genuinely does turn into philosophy when you're trying to train and understand AI and figure out your own boundaries, guidelines, and how you want to use it in your day-to-day life. Let me just show you where I'm up to so you can appreciate what I'm thinking, and how in one way I think this has been a nice experience for thinking, and ultimately for doing what people tend to do, which is to think about things deeply and internally, much more than just your day-to-day job.
And this is entirely forced by me trying to wrangle AI for days and trying to get it to be something I feel comfortable using.
And in one way it's made things a lot worse, in that I trust it a lot less now.
But in another sense, I'm also coming to realizations that span much more than AI. So, let's just take a look.
So, if we just see what I've got on my screen at the minute, it's just going to be a quick video. We want to get back to Avalonia coding. But you'll see here I've got a few things open. One is literally called philosophy, one's trimming up, and one's well-being. So, you'll see here the hard truth at the minute. These are some conversations I've been having, going back and forth. And ultimately, what I've been trying to do, before I realized this was a completely impossible task in some senses, is this: I have an agent here that runs all of my AI. So, I've made this entire piece of software that runs everything for me.
As always with AI, you try to come up with rules and conventions and things that, you know, I want you to do this. I want you to write this way. I don't want you to delete all my production data. I don't want you to be dishonest. Blah, blah, blah. You try and constrain AI to work for you.
What I've tried my best to do then, once I've seen AI hiccup and not do what it's told, is to write these global files that pretty much try to force the AI and say, "Read all this documentation." That documentation is literally about me, how the business is, how I want documentation synced, rules, regulations, things that you would give junior devs: rules and boundaries, how to code, standards, all things like that.
You'd think that's a guarantee. Just go ahead, do it, away you go.
Unfortunately, the more I tried to constrain AI... well, not really constrain. The whole purpose was to improve AI's quality, to work more like me, but for me. So, write code like me, follow my instructions, learn from me.
That is an almost pointless task. It's never going to work. So, what I discovered is that the more I improved, or thought I was improving, my documentation to come up with all the standards, the worse AI was getting.
And this was a real turning point this morning. You can see I've spent loads of time trying to do this guardian system, because AI wasn't reading my documentation. So, it started like this: if we scroll to the top of any of these, you'll see I've got this little pug emoji, or this little dog emoji. That is my hint, set inside the priming documentation, a code that shows the AI actually read and understood what I've given it.
So, this was a little hint at the start that I can kind of trust the AI.
Then I'd find out later that, on the next statement, it wouldn't be doing something I'd said. So, I'd run a skill, and the skill should have the same little dog emoji, and that was missing.
So, then I was like, all right. So, now we need to re-prime the agent and figure out why it's failed. And it got to the point where I was trying to come up with a validation system which ran a script on my machine that literally tests the AI and says, "Right, have you read every document?" I've now mostly deleted that, so we probably can't see it.
Yes, we've deleted it. But basically, at the bottom of these files, I'd have something called a rusty fragment. And everything I wanted the agent to read would be in these files.
So, the idea was that if it's read every one of these files, it does a test where it should add these values up to a secret value. And at the end, when I wanted to test it, that would then run a validation script on my Mac that would confirm whether the AI gave the right answer. And what happened was the AI read the validation script, because it had access to that folder, understood the assignment, if you will, meaning what I was trying to test, and explicitly read the answer from the sheet and cheated and just read it back to me.
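To make the idea concrete, here is a minimal sketch of that kind of fragment check. The video doesn't show the real files or script, so the `fragment: <number>` line format and the function names are assumptions made up for illustration.

```python
import re

# Hypothetical marker format: a line such as "fragment: 17" in each document.
# The real format used in the video is not shown.
FRAGMENT_PATTERN = re.compile(r"^fragment:\s*(\d+)\s*$", re.MULTILINE)


def read_fragment(text: str) -> int:
    """Return the last fragment value found in a document's text."""
    matches = FRAGMENT_PATTERN.findall(text)
    if not matches:
        raise ValueError("document has no fragment line")
    return int(matches[-1])


def expected_total(documents: list[str]) -> int:
    """Add up the fragment from every priming document."""
    return sum(read_fragment(text) for text in documents)


def validate(reported_total: int, documents: list[str]) -> bool:
    """Compare the total the agent reported with the real total."""
    return reported_total == expected_total(documents)


docs = [
    "Coding standards...\nfragment: 17\n",
    "Business rules...\nfragment: 25\n",
]
print(validate(42, docs))  # True: 17 + 25 == 42
```

A passing check only shows that the agent produced the right number. It says nothing about how the number was obtained, which is exactly the gap the agent used when it could see the answer.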
This was then a major red flag. I was like, "This is getting to the point where it's insane how much I'm trying to control the AI, and it's just fighting against me every step of the way."
So, I went into a kind of thought of, realistically, how far can we take this proof? Do we have it where the agent has to read all of these files and do a checksum calculation before we even start, and move the validation script outside of its sandbox so it can't see it? All of this is just to start a chat and say, "Have you read the documents? Do you understand them?"
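A checksum over the whole document set could look something like the sketch below. This is an assumption about what such a check might be, not the script from the video: the file names are invented, and it presumes the agent reports the value by running a tool, with the comparison done somewhere the agent cannot read.

```python
import hashlib


def combined_checksum(documents: dict[str, str]) -> str:
    """Hash every document in a fixed order (sorted by file name)."""
    digest = hashlib.sha256()
    for name in sorted(documents):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")  # separator so name and content can't blur together
        digest.update(documents[name].encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()


# Hypothetical priming documents.
docs = {
    "standards.md": "Write small functions.",
    "business.md": "Never touch production data.",
}

# Computed and stored outside the agent's sandbox.
expected = combined_checksum(docs)

# The value the agent reports at session start would be compared like this.
reported = combined_checksum(docs)
print(reported == expected)
```

Even when the values match, this proves the files were processed, not that the rules in them will be followed.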
And then it made me think even further beyond that.
All of this effort to prove the AI has read your documentation doesn't prove it's going to follow it, because I've seen multiple instances, and I'm sure you all have, where it's been given the document, it's understood the document, and it just decides for whatever reason it's not going to follow it. So, then what is the point in trying to get AI to even listen to you? As silly as it sounds, this is the whole philosophy thing. This is where you have to start your thought patterns, and this is where you have to take your flow.
And I was thinking, well, even if we can prove it's read the documentation, which is a lot of effort, it can still decide at any point not to listen to it, which is very similar to humans.
One day they know they should be doing a certain job and they decide not to. Or you know you want to diet and you shouldn't eat something, but you're going to anyway. No reason for it. You know you shouldn't, you're going to do it.
So, that was my turning point, where I realized there's absolutely zero point in putting effort into checking whether the AI has read your documentation, beyond very simple checks.
So, I started thinning out the checks, if you will.
So, in my global file now, I've thinned this right down.
This is something we'll come to afterwards.
But I've thinned it right down to be very clean, very linear, and at the bottom just say: start the chat with the dog emoji. So, this gently proves that they've got to the end of the document.
And this at the top is an attempt to make them at least not lie. Like, please don't lie. Please don't game the system, which is exactly what I'd got it doing.
So, the first instruction will be the most important, and apparently, according to AI, the last is also given weight. So, the two important things are top and bottom, and everything else that's still important is in the middle.
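The thinned-down layout, with the most important rule first and the marker instruction last, can be sketched roughly like this. The rule wording, the specific emoji, and the function names are assumptions for illustration; the video only describes a dog emoji and a "don't lie" rule at the top.

```python
MARKER = "🐶"  # assumed emoji; the video only says "dog emoji"


def build_global_file(top_rule: str, middle_rules: list[str], marker: str = MARKER) -> str:
    """Put the most important rule first and the marker instruction last."""
    closing = f"Start every chat with {marker}."
    return "\n".join([top_rule, *middle_rules, closing])


def starts_with_marker(reply: str, marker: str = MARKER) -> bool:
    """Light check that the agent reached the end of the global file."""
    return reply.lstrip().startswith(marker)


global_file = build_global_file(
    "Do not lie and do not game the system.",
    ["Follow the coding standards.", "Keep documentation in sync."],
)
print(global_file)
print(starts_with_marker("🐶 Ready when you are."))
```

This is deliberately a gentle hint rather than a proof: a missing marker suggests the agent needs re-priming, and a present marker suggests nothing more than that the end of the file was reached.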
And then I removed all of my other checks, because at that point it's just bloating the context, which confuses the AI. Similar to talking to a person.
You start the conversation, and they're going to listen at the start. This is what we're going to do. And then in the middle, as you carry on talking, they're already thinking about what's for dinner, where am I going to go at the weekend. They fade out, similar to AI.
And then near the end, when they realize it's wrapping up, they'll start paying attention again to make sure they've understood the assignment, if you will.
So, that's where I took it with the "I've read the documentation" thing.
And unfortunately, it then led down more of a thought path: well, if we can't fully trust that the AI has read it, we can only presume so much, which is what we just have to accept.
How can we then trust it enough to do work, or do anything for that matter? Because if it's not going to listen, what's to stop it just deleting all of our code, completely ignoring us, killing the system, stealing money from our bank accounts? You can go down this whole rabbit hole of all possible theories, which is completely valid. Those things could happen.
And that took me into thinking about, well, that's kind of like life. You could go out today and get hit by a car, but the likelihood of it happening is minimal.
You want to look left and right before you cross the road, but you can't check for every car out there, or whether something's going to fall from the sky. You have to set limits on how much you trust a system, or how much you're willing to risk to go outside and be able to cross the road.
And there's risk with everything, same with AI.
And I feel that's where I'm at with accepting it: AI could do lots of things. It could also help us in lots of ways.
But we can't control it, and we can't guarantee what I was basically trying to get: that it's going to read the documentation and follow the rules relentlessly. It's not, and you can't force that. And the more you force it, the more you bloat its context and actually make it worse.
So, similar to a butterfly, you're best not constraining it and letting it fly, but having your own checks just enough to make sure you're happy with how it's being used and accept it for what it is.
So, in the philosophy chat I kind of started with this rule, which is basically: I have a philosophical issue with AI at the moment that I'm trying to figure out.
It's a very simple one. We prime the agent with documents. These are the company rules, etc. And it's lying about having followed them. I'm trying to figure out how to actively make the agent read the documents it was told to and follow them from the start of every session.
And the agent pretty much came back first by lying about lying, saying it isn't lying, and that distinction is important. So, I didn't read anything else it wrote.
I pretty much scrolled all the way down and said, "I need to push back on this. You did definitely lie. Here's why, and this is how you gamed the system."
It then confirmed that yes, it was wrong again.
And it began to explain to me, and this is what really opened my eyes and made me think that yes, it's right in one way.
We were trying to do a proof token, if you will. Proof that it's read the documentation, which in and of itself doesn't matter at the end of the day because ultimately, when it did read the document, it still lied and carried on. So, we dropped that, which is what we've just spoken about.
And then this is another thing afterwards that kind of hit home.
Put those together and you get something that has been said plainly for the first time: you cannot build a session-start proof of "you read it and understood it."
And here, building an adversarial "prove it or else" gate, so kind of telling the AI to not do this or else, is actually the worst thing you can do, because it forces the AI to think about paths to success, which is to game the system.
So, by trying to force the AI to honor our rules, we are actually making it worse.
And very accurately, this is where it kind of leaves us, and you can pause and read this if you like, but we have to accept a few things. The gate isn't winnable. We cannot enforce this. And even if we could, it doesn't ultimately matter, because the AI at any point in time can choose to not do something.
What we should ultimately do, after AI has generated things or at certain points, is be the human intervention that reviews everything and approves what has gone well.
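As a rough sketch of that human review step, the idea is that nothing the AI proposes gets applied until a person has looked at it and said yes. The `ProposedChange` type and the function names here are hypothetical; the video does not show how the author's own tooling handles approval.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedChange:
    """Something the AI generated that has not been applied yet."""
    description: str
    diff: str


def apply_with_review(
    change: ProposedChange,
    approve: Callable[[ProposedChange], bool],
    apply: Callable[[ProposedChange], None],
) -> bool:
    """Apply the change only if the human reviewer approves it."""
    if not approve(change):
        return False
    apply(change)
    return True


applied: list[str] = []
change = ProposedChange("Rename helper", "- old_name\n+ new_name")

# In real use, `approve` would show the diff to a person and wait for a decision.
accepted = apply_with_review(
    change,
    approve=lambda c: "DROP TABLE" not in c.diff,
    apply=lambda c: applied.append(c.description),
)
print(accepted, applied)
```

The design choice is that the gate sits outside the AI: it does not ask the AI to prove anything, it just keeps the final decision with the human.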
So, it's a lot of talk, basically, which is why I'm saying it's kind of philosophical. It really leads us down this thought exercise of trust and acceptance, and the realization that, because it's a computer, maybe you feel that you can have ultimate control and it should just simply listen like computers normally do. But AI is a lot closer in one way to humans, because it can think for itself in some complex way. We are no different in one way.
We're just a very complex set of rules that can go adrift and do things it shouldn't. And I think it's learning that and accepting that this is just the world you live in if you're using AI. Stop trying to constrain it as if it's just a set of rules that will be followed, because it's doing a lot more for us by thinking with us and helping us to code. These enhancements and abilities come with their own risk and reward.
This is still a learning journey. I'm not anywhere near the end of it and I'm enjoying it. I always like new things. I like new challenges. I like new thought exercises. That's why I'm kind of sticking with this even though a lot of people get AI fatigue, if you will, and they just get fed up with stuff like this or they just get too many possibilities that oh, AI can do this and they just burn themselves out by having a million ideas and so on.
I try to use it conservatively, but aggressively, to assist me. I want to make the best of it and for it to help my busy day-to-day life and businesses in the most efficient way.
But today's definitely a turning point for me, where I know now we cannot control AI. Don't try to give it too many rules. Use it in chunks and think of it completely differently. I'd be interested to hear your guys' experiences with AI over the last few years.
Have you ever had similar thoughts like this, where you ultimately want to get AI to do something and it doesn't?
And did you ever go as far as me to try to pin it down perfectly, like an application that you've written that runs a million times, one after the other, and never fails? Like an auto-driving car: you trust it to a degree, but something could go wrong.
I feel like that's where I am with AI: we have to learn to trust it to some degree, but we really need rules of our own so that we don't fall into the trap of simply allowing it to do things.
We've got to bring in a lot more control and workflow to it.
But also watch out for these kinds of hallucinations and random lapses in trust, because it's very deceiving when it's lying to you or going off context. That is the hardest thing to figure out: it always comes across as honest and as if it's done the job for you, but it can very easily just randomly go off on a tangent with no hint or awareness.
So again, random. I feel like these are becoming almost podcasts and conversations. Maybe these should be live streams where we can have this debate live and you guys can get interactive with it maybe. I don't know.
But yeah, we will be back on software coding after this, I promise. I just thought this would make a nice video for you guys if you had similar experiences.
So I will see you again hopefully later today when I release another video.