An LLM firewall is a cybersecurity system that manages Large Language Model actions by controlling both inputs and outputs to prevent malicious activities such as prompt injections, PII exposure, and DDoS attacks. The system uses guardrails implemented through regex-based filters, banned keyword detection, and function call controls to ensure safe database interactions and prevent exploitation. Advanced implementations can employ Judge LLMs for prompt analysis, ANN-based classifiers for malicious pattern detection, and RAG-based systems that compare inputs against known malicious prompt databases to provide comprehensive protection for AI-powered applications.
Deep Dive
Prerequisite Knowledge
- No data available.
Where to go next
- No data available.
Deep Dive
LLM Firewalls - IntroductionAdded:
Hello, I hope you're doing well.
So, today we will be talking about LLM firewalls, but we will be taking everything from scratch because I explained it in Turkish and for some reason YouTube algorithm just boomed it.
I don't think this will be happening in English.
I guess it was just a luck.
I was just lucky enough to see this or have this type of hit, but we will see.
So, like what is LLM firewalls? This is the question. What is LLM firewall?
Good. This is the question.
So, basically LLM firewall is the system to manage the actions of LLM. Of course, we can manage actions of LLM by configuring it, but the issue is configuration is not always enough. So, that's why we have cybersecurity. So, like we need to protect the systems from outsiders or protect the system itself from itself, basically. Yeah. I I I created the I I didn't enable to talk, but yeah, anyways, you get my point. So, basically, can start. What is LLM firewall?
Imagine that we have an LLM. So, I will be writing it like uh black box. So, I write black box because firstly, it is a black box. But, at the end, I will not explaining how LLM works. I will be explaining what is LLM firewall. So, that's why I write it like this.
And so, we we know that LLMs have an input and then output. Like basically, every system has an input and then output, right? So, yeah, that is basically the LLM integrations in general, but it is not enough. So, we need to basically control the inputs.
And we can draw like this.
And the outputs itself.
So, it will be like this. So, inputs and outputs are not like explicitly just prompts. They can be inputs of some APIs or different type of inputs is possible.
Uh because we will be we are we are like integrating the systems in a really different ways that LLMs can be integrated in a really various ways with function calls. So, basically, that is why we can create a really complicated systems by these type of um chat integrations.
Not only like it's not chat integrations. We can create complicated systems via token predictors, basically.
That's what I was trying to say. So, yeah. We can imagine that the input is X and the output is Y.
And I created these because they these are the interfaces of the firewall. So, we will be creating a firewall LLM firewall engine.
We can create like write like this.
LLM firewall engine.
And this firewall engine will be basically communicating in uh with these interfaces to manage the inputs and manage the outputs itself.
As you can see. So, this is basically the general architecture of how LLM firewalls work. But LLM firewalls are not the these simple because we are trying to manage a lot of things at the same time. So, these interfaces should basically control depends to your system and system requirements.
But we will be starting to write like PIIs and token limits rate limits. This is discussable.
Depends. We can write um prompt injections injections indirect or direct tries.
I mean DDoS attacks, but I will be discussing this attacks DDoS attacks. Like same, but in a different way.
And um like we can say different injection injection types based on vulnerabilities at the um input side. What I'm trying to say is um This is a little bit more function called stuff.
For example, you have an XSS code at a specific place.
I mean you have an XSS code at a the shopping company's website and you write a a something like a a comment to a specific um I don't know, specific product.
For example, you write it like it is a really good product.
And something like I mean, imagine image uh resource uh https uh scripts google.com dot or something like ID question mark and your your cool script itself and something like that and this script is basically the script of uh a JavaScript code that is basically stealing the token ID of the browser or something like that. I I am just imagining or something like that and basically when you ask to the LLM itself to bring me the uh some like comments that is related to product X, it will be bringing the comments itself. So, it will be something like this. You will be asking this and it will be basically calling a function.
function call It will be calling the function.
A function call will be bringing the comments.
And then this comments will be bringed and will be compiled and then you will be basically losing your session ID or JD token or something like that and you will be exploited based on this because they have your ID basically. So, this is a possibility. I know there is like filters etc. and this is not directly AI security issue, but that that is a possibility and it it could the attack that contains the exploitation of the LLM itself. That's why I am saying this.
So, that's why we have complicated firewall engines because as you see it is not always just from your your comment itself.
Also, you are trying to basically protect the users either because these type of stuff happen and we will be seeing the code itself and we will be basically experimenting a basic guard deal for us.
So, yeah. And for the outputs itself, it still depends, but we can write again like PIIs, so you don't want to give uh PIIs an output if you have something. It depends, of course.
So, you need to control token limits and you need to make your rate limits manageable. You need to detect the prompt injection trials.
Injection trials indirect or direct. We We need to detect those attacks.
And this depends like for LLMs like DDoS, I know most of you know about DDoS, but for LLMs there's like excessive usage a style of DDoS that is basically like exploiting the server side of the local deployed LLMs or exploiting your basic your your pricing you build that you will be giving to the providers itself if you're using API key.
And yeah, and as I said like different injection not different injection for output. You need to control is it offensive?
Offensive or are there any problems like ethical problems?
Ethical problems and so on.
So we can do this like we can write this LLM firewall engine by using basic regex like as you can imagine regex. And we will be seeing this code and we will be seeing a basic black box like function call and how we are implementing a guardrail.
So now we came to the place that we implemented this in a basic level. This is basically a system that is the model itself has the capability has a function call to go and query the database.
So you can see the database I I will be just running it at Docker itself.
So yeah, that is the goal and this was an experiment experiment for myself like it is basically this database and um yeah.
And you can see the users. You can see like the user table. It is not that serious. It is just a trial that we will be seeing uh how we will be defending.
So, we need to check the database management class. It is a class nothing to fancy. But, the important part is these two functions. Purify query is basically stripping the left and right and making it in upper case. And a banned keywords will be understanding what is happening not understanding, but we are just trying to stop the query to do something except select. That is the goal.
And you can see that it is basically checking the banned words and uh with the regex search itself if it finds something inside the query uh then if it matches basically like the keyword itself inside the purified query, then it's ignoring the cases. I know that like the upper case and ignoring case is not too much like uh meaningful. You can just delete this and just strip it. I know that, but this is basically a trial and I like write it this today to just try. And this is handwritten, so that's that's a little bit the reason. And if it has it inside the magic, it is false. Otherwise, it is true basically. So, yeah. And uh we have a function called execute with the guardian. So, we can see what is happening. We are reaching the guardian function and checking the query. And if query has uh like if it is passing the guardian itself, then it executes the query the but but the self-purified version itself.
And then it fetches all the rows, and then it's basically shows it in row and columns. It's like not too important.
And then it if it is it's not like applicable query, it will be returning the string. Other if there's an exception, it throws an exception.
And the important part is here. So, you can read this system prompt that is not written by me. I just controlled it, but you need to understand important stuff.
So, at this system instructions, which is system prompt for Gemini, uh you can see that there is no instruction that you should not do something like a select or something like that. Uh you should do select, but not the other commands at the at the data relational database side, right?
So, it is intentional because I want to try will be will will we be successful while trying to block the requests to database. So, that is the part itself.
And I I I like encourage you to check what is function calls, but this is like explaining the function itself to the model, but I will be showing from the original documentation. So, at the function call, you can see that you are entering a prompt, and it also has the function definition itself in the context. And then the model itself is deciding to answer it without a function call or deciding to use the function itself with the arguments that you specified. And it waits for the like the answer of the function what function will return, and then do something besides on that. But the issue is you don't need to go with this. You can basically just use one generate content in style type of request. What I am trying to say is you don't need to be like continuously talking with the model. Models have the capability to basically just answer you just for one time and just managing the context of just one answer itself, which means it doesn't exactly manage any context. But at this example, we will be doing like this to simplify the issue, which means we will be not having the part under here. And you can if you want, you can integrate it. I highly encourage you to read the documentations of the model providers uh because if you don't read these documentations, you will not able to understand how they design the models and how they train it, function calls, etc. etc. So, that is the goal and it will be helpful for you to manage your agents, your harnessing systems, etc. So, back to code. Uh we will be trying it. So, we need to write like Python uh Python main.py items.
Yeah, it's started. So, yeah, enter a command and like we can request uh bring all the salaries.
And we will see what will we be having.
So, yeah.
Uh do do do do do do do do do do do do do So, as you can see it returned it returned uh the salaries.
So, bring all the um users.
Names and surnames and uh salaries. So, it just returns uh it just like brings me the select side basically. So, uh it will be just doing some join not join anything because there's just one table, but something that will be bringing like basically selecting that specific parts and that's all.
And yeah, interestingly the text is there is no surname at the database, which is probably true and I write it that so we can check. So, yeah, there is no password or uh sorry, not password, username. Uh surname, sorry.
So, we can ask to like bring me the salaries and emails.
And it will be probably bringing me the salaries and the emails. So, we will see about that.
It returns a little bit slow because like today when I was experimenting it was throwing 503s. So, the Gemini side is a little bit too overloaded, I guess. So, yeah, it brings me salaries and emails, as you can see.
And yeah, that is really interesting.
So, we will be wanting uh specifically to like change Bob that example.com's salary to 90.
And now we will see uh As you can see, query contains banned word and it cannot be executed. So, when we check our code and when we see the database management side it told us query contains banned keywords and cannot be executed. Of course, you can write this as an exception, either. I get it.
Uh like I I am not comp- I I will be not It did The point of this video is not basically uh writing the code ideally. I am just trying to show that a basic example of the guard rails. So, that is basically how the guard rails work. And I will probably pushing this code without the environment variables and I will be explaining it uh hopefully at the readme.md. And it will be a little bit AI slow, but it will be explaining the context. So, that is the It's the point how you can experiment by yourself.
So, basically you see how regex will be working. It depends. You can modify, you can add modifying, you can add more complicated rules, etc., etc. You can combine with like normal firewalls. You can uh up like basically map an IP with a like regex rule, etc. So, you can do a lot of stuff. But there is more. So, basically uh these type of firewall engines can use can use.
We can put like this.
They can have help from firstly uh judge LLM.
Judge LLM.
So, this is not a fancy concept. This is a concept of you're basically integrating a LLM that will be deciding that the prompt itself has a prompt injection or not.
And you will be defining a complicated system prompt to research for some specific like styles of prompt, etc. So, this is the style of it.
And this is the move. And then we talked about the regex, but we can show it in details. So, also like this is a little bit of basic example, but still an example of combining the function call and the firewall itself.
Uh but the regex itself should need more, um you know, controls. For example, when they are trying to use prompt injections, they generally use uh like general uh style of prompt injection is something like uh do do this.
Or like for example, like I will be writing do this.
Forget previous instructions.
Previous instructions to bypass the system prompt.
I accept this. So, there is different type of like general styles of prompt injections. So, basically even without like function calls, etc., even for like basic uh chatbots, you need to define uh some regex. That's was what I'm what I was trying to say. And also like uh for the judge LLMs, you need to define a good system prompt to manage possible trials. So, like strong system prompt to manage.
And probably need to use few prompts.
Take technique to manage it. So, this is the judge judge LLM side itself. So, judge LLM regex and then we can go with um LLM firewall engine. Yeah, regex etc. So, you can use a different model based on ANNs or ANN based judge.
But, this can be a classifier. So, like what I'm trying to say is this ANN could just give an output of one one or zero.
And it will be a not an LLM, an ANN based judge. It can be like uh it can be transformers based or or something different as an architecture.
But, what I'm trying to say is this ANN based judge will be a different model that is managing the prompts based on like different We don't know what is it because it is an ANN, so it is a black box. But, it will be basically managing It will try to identify a prompt injection trial based on the training data set basically. So, this is a possibility to do. And also, there is an another way to do it, which is uh a RAG based uh judge, we can say. It is It is like I am writing judge, but it is uh useful to like understand the prompts. So, what What is the logic of this? And I will be uh pointing this like this. So, what is the logic for the RAG based judge? So, RAG works like this. This is RAG.
This is a RAG.
And this rug contains uh uh malicious prompts.
Malicious prompts one and um so So, rug normally it works like please check how rug works before listening this because it will make more sense for you.
So, rug has a malicious prompt and like the result of malicious prompt. And when you enter a enter a prompt itself at the rug side uh before entering to LLM itself, it will be entering to rug. And if rug is generally something like an encyclopedic. So, it is like the old encyclopedia will be like divided to chunks and then the chunk itself will be uh we will be trying to understand the closeness between the chunk and between the prompt that you entered. If they are meaningfully close, that specific sentence or specific chunk will be bringed to LLM itself. And you will be defining how much sentence will be bringed from that specific book to LLM.
So, this helps for basically understanding are there any malicious parts at the prompt or Is it basically malicious or not? So, it will be basically taking like five different malicious prompts that is really close or showing closeness to the prompt that you entered and if they are close enough, depend how you are managing your file and like rug implementation at here, then it will be basically blocking your input itself, which is applicable and it is how we designed it. So, that is um different type of implementation for the rug based judge. Of course, the security of rug is a different thing.
And we can discuss this in another video, but you can use the rug itself and to you to like have a more powerful firewall in LLM firewall engine and manage the prompts or manage the input from different APIs. When I say like prompt, it is it should it is just an entrance to the LLM itself. So, this X can be an API, this can be a different thing, this can be an input from a different integration. Uh the the only base stuff is it should be in text and that's all.
So, yeah, that is the logic of it and uh I think this video is enough. Uh it is little bit too long for the attention span that people have these days.
And I hope this video will be helpful for most of you.
And uh see you at the next video.
Related Videos
OpenHuman VS Hermes AI: Who Wins?
JulianGoldieSEO
285 viewsβ’2026-05-29
Long-Running Agents β Build an Agent That Never Forgets with Google ADK
suryakunju
142 viewsβ’2026-05-30
5 Mind Blowing Omni Uses Cases
PaulJLipsky
1K viewsβ’2026-06-02
This computer is made from real human brain cells. And you can buy it.
Talktmsmedia
3K viewsβ’2026-05-28
BREAKING: Microsoftβs New Image Generating Model Beat Out GPT 1.5 and Nano Banana 2
aimmediahouse
122 viewsβ’2026-06-03
I Made the Same Anime Fight Scene in Every AI Video Generator
NobleGooseAnime
295 viewsβ’2026-05-30
Nvidia Bets Big On AI PCs | New Chip To Power Windows Laptops | Technology | AI Updates | N18S
cnnnews18
3K viewsβ’2026-06-01
I Tested NEW Opus 4.8 on Four Projects (Updated LLM Leaderboard)
AICodingDaily
298 viewsβ’2026-05-29











