Black-box AI systems pose significant risks in high-stakes domains because they lack human-like contextual understanding and can be vulnerable to adversarial attacks, where imperceptible changes in input data cause incorrect predictions; effective explainability must be faithful to the model's actual reasoning, useful for decision-making, and help users detect risks and understand potential failures, with higher-risk applications requiring stronger explanations as mandated by regulations like the EU AI Act.
Deep Dive
IDL Lect 2B Why Black-Box AI Can Be Risky | Interpretability in Deep Learning
This is lecture 2B. It follows the previous lecture, where we saw why one definition of interpretability is not enough and how interpretability means different things to different people.
In this lecture we will see why the black-box nature of AI can be risky. Let me start with one simple example.
Say there are these two stop signs.
Any human will say they are stop signs. But this is an example case that was created where the AI model failed. It had been trained to detect stop signs, but it failed to detect this one and instead said "speed limit 45".
So, instead of stopping, the car will drive on at 45, and a disaster can happen.
So, why is this happening? Humans are very capable of ignoring this kind of thing written on the sign, which is what leads the AI model to make an incorrect prediction. A human knows what the context is.
A human clearly understands the context and can ignore small features and small defects in the image. The AI cannot, and on top of that the model is quite opaque.
It is very difficult to find out which small changes are causing the model to give a wrong result. This type of attack is known as an adversarial attack.
An adversarial example like this exposes a fundamental challenge for interpretability and trust in deep learning systems. Let me take you through how this happens.
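To make the idea of a small, targeted change concrete, here is a minimal sketch. The lecture does not say which attack produced the stop-sign example, so the sketch uses the fast gradient sign method as a stand-in. The four-feature logistic model, its weights, the input, and the value of `epsilon` are all hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability that input x belongs to the 'stop sign' class."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)

def fgsm_perturb(weights, bias, x, true_label, epsilon):
    """Move every feature by at most epsilon in the direction that raises the loss."""
    p = predict(weights, bias, x)
    # For the logistic loss, d(loss)/d(x_i) = (p - y) * w_i.
    grad = [(p - true_label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

# Hypothetical toy model and input (not taken from the lecture).
weights = [2.0, -3.0, 1.5, -1.0]
bias = 0.0
x = [0.3, 0.1, 0.2, 0.2]
epsilon = 0.1

x_adv = fgsm_perturb(weights, bias, x, true_label=1, epsilon=epsilon)
print("clean score:      ", round(predict(weights, bias, x), 3))
print("adversarial score:", round(predict(weights, bias, x_adv), 3))
```

No feature moves by more than `epsilon`, yet the score crosses the 0.5 decision boundary. Real image attacks work on thousands of pixels, so the per-pixel change can be far smaller than in this toy case.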
As humans, we focus on meaningful concepts. We recognize the overall shape.
This is some kind of traffic sign.
We know that we are driving, so we have context, and we have common sense, which current AI systems do not have.
We ignore the small, useless changes on the sign. We know that it is a stop sign.
To most people it will look like a stop sign, and any human can say it is a stop sign.
But how is a deep learning system different? It learns from patterns.
It is not memorizing that this is a stop sign. It is learning certain patterns from it, not remembering it as an image.
We humans remember something as an image. The model extracts features from it and remembers patterns.
So it learns statistical patterns, and it relies on texture. It relies on the red color and the pixels. A speed limit sign also has red color and pixels, so the model can be sensitive to small changes made in the image. And since it does not reason like a human, it can make this error.
One reason could be a problem in the data.
At training time, not enough different kinds of examples were used. There were not enough examples of different types of stop signs, so there was an unbalanced pattern in the data. Another reason could be that the model used only the background and the texture, not what is really there.
So it is simply learning the wrong correlations with the labels.
The model is often sensitive to tiny, imperceptible changes. Even a small change, say 10%, in the data can matter. I am in Norway, where it snows for six months of the year, with heavy snow for three or four of them. During that time, traffic signs in most places are going to be covered with snow. If an AI system meets that kind of data and was not trained for it, it may start failing.
There is also the black-box nature of the AI model: we do not know the internal reasoning of the AI system, why it is doing something. Today's deep learning models are very powerful, yet the way they do their task is different from the way a human does it. The AI learns one kind of pattern and humans learn another. We have different ways of working, and that brings other challenges.
Now let us look at another example, the autonomous car.
It is the same issue in a different setting.
Say the car is driving along a roadside where construction is going on, and it makes an unsafe decision: it does not detect a pedestrian.
The problem could be in the camera, lidar, or radar, or it could be insufficient training data. It could be that most of the training data was recorded on clear days, when everything was a perfect, ideal scenario. Only 10% of the data covered night and rain scenarios.
And only 5% of the data covered construction sites.
So the failure could have been caused by this data imbalance.
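One common way to make such an imbalance visible, and partly compensate for it during training, is to weight each scenario by the inverse of its frequency. The sketch below is an illustration, not the lecture's method. The frame counts are hypothetical: they reproduce the 10% and 5% shares mentioned above, and the 85% clear-day share is an assumption.

```python
# Hypothetical frame counts. The 10% and 5% shares come from the lecture;
# the remaining 85% clear-day share is assumed.
scenario_counts = {"clear_day": 8500, "night_rain": 1000, "construction": 500}

def inverse_frequency_weights(counts):
    """Weight each scenario by total / (number_of_scenarios * count)."""
    total = sum(counts.values())
    num_scenarios = len(counts)
    return {name: total / (num_scenarios * n) for name, n in counts.items()}

weights = inverse_frequency_weights(scenario_counts)
total_frames = sum(scenario_counts.values())
for name, weight in weights.items():
    share = scenario_counts[name] / total_frames
    print(f"{name}: share={share:.0%}, weight={weight:.2f}")
```

Rare scenarios get larger weights, so a mistake on a construction-site frame counts for more in a weighted loss. Reweighting does not replace collecting more data for the rare scenarios.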
Then there is the model's prediction. If a model is designed to do many tasks, it gives a prediction score for each of them. In this case the model predicted a construction sign with a score of, say, 72%.
But the decision path was not clear: what should the car do if it detects a construction sign? That scenario was not covered. There was no scenario coverage for heavy rain, construction zones, occlusion, or an unusual pedestrian in a construction site. In rare and unseen situations that the system was not trained for, it can produce an incorrect result.
So, in a failure like this, we may need to inspect whether it was a sensor failure, a model failure, a training data problem, or simply a driving scenario that was not covered well enough.
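The missing decision path can be pictured as a lookup table with a hole in it. The sketch below is a hypothetical illustration: the labels, the actions, the fallback, and the 0.8 confidence threshold are all assumptions, not part of any real driving stack. It shows one defensive pattern, falling back to a safe default whenever the label has no defined action or the score is too low.

```python
# Hypothetical action table. There is deliberately no entry for
# "construction_sign", mirroring the missing decision path in the lecture.
ACTIONS = {
    "stop_sign": "stop",
    "speed_limit_45": "limit_speed_to_45",
    "pedestrian": "brake",
}
SAFE_FALLBACK = "slow_down_and_hand_over_to_driver"

def decide(label, confidence, min_confidence=0.8):
    """Return the action for a detection, or a safe default when unsure."""
    if confidence < min_confidence:
        return SAFE_FALLBACK
    # An unknown label also maps to the safe default.
    return ACTIONS.get(label, SAFE_FALLBACK)

print(decide("construction_sign", 0.72))
print(decide("stop_sign", 0.95))
```

Logging every fallback, together with the label and score that triggered it, also gives a record of which scenarios the system was not prepared for.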
Let me take you through another example. It is an amazing one that came from a research paper.
It is a very simple example. This is a toy turtle.
The AI model was trained on turtle images, and this looks like a turtle.
But it is a toy turtle, and in only one case did the model detect it as a turtle.
In all the cases highlighted in red, it detected a rifle.
You can see.
A rifle. A turtle, and a rifle.
And in the cases highlighted in black, it detected something else. So, among so many images, there was only one correct detection, and all the rest failed.
What was done in this experiment is that they showed the model the object from different angles and in different ways, up to the point where the model started failing. Any human can tell that the object in this scene is a turtle.
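A viewpoint test like this can be summarized with a small report: how often the model is right across the views, and what it says when it is wrong. The sketch below is only an illustration. The list of predictions is made up and does not reproduce the numbers from the paper; in practice each entry would come from running the classifier on one rendered view.

```python
from collections import Counter

def viewpoint_report(true_label, predictions):
    """Return accuracy across viewpoints and a count of each wrong label."""
    counts = Counter(predictions)
    accuracy = counts[true_label] / len(predictions)
    failures = {label: n for label, n in counts.items() if label != true_label}
    return accuracy, failures

# Made-up predictions, one per viewpoint (not the paper's results).
predictions = ["turtle", "rifle", "rifle", "other", "rifle", "other"]
accuracy, failures = viewpoint_report("turtle", predictions)
print(f"accuracy across viewpoints: {accuracy:.0%}")
print("failure breakdown:", failures)
```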
But the AI failed. This brings us to the case where this kind of failure happens in a high-stakes domain, meaning one where the risk is high. If you are using ChatGPT, or you are recommended this movie or that movie, or YouTube recommends you some other content after mine, that is not a big risk for you. But what if you are using the AI for healthcare, diagnosis, finance, law, security, logistics, or the military? These are all high-stakes domains, and if the AI system starts failing there, its black-box nature becomes too risky for humans.
So we need to weigh risk against justification: if there is a risk, what justification is there? For example, what are the consequences for humans? Has the problem been completely formalized? Is it a high-stakes decision, such as credit scoring, healthcare, or criminal justice?
Is it a case where we cannot entirely test for safety, or where ethical aspects are involved? High-risk applications demand accountability, transparency, fairness, and trust. All of these aspects come into play when we weigh risk against justification.
The higher the risk, the stronger the explanation that is needed. That is why the EU AI Act classifies AI systems by risk category.
One category, for example, is unacceptable risk. If you want to know more about the EU AI Act, I explain how the risk categories are defined in my other lectures. So, if you are building an AI system that falls under the high-risk category, explainability is needed in the system by regulation as well.
I was talking about explanation. Now let us look at when an explanation is not enough. If I ask you what 2 + 2 is, you will say four.
I ask you how. One student says, "Because math says so."
Another student says, "It is addition: two added to two is four."
Another student says, "Because that is how it was taught in class." So there are three explanations, but which one is most suitable?
An explanation is there, but is it enough?
Say an AI model predicts that the risk is high for a 28-year-old with a smoking history. The explanation says that smoking history, with a score of 0.42, is why the risk is high. Exercise and the other factors are not counted.
So the top factor that influenced the prediction, according to the explanation, was smoking history.
Now, the first problem is when the explanation does not match what the model is actually using. If the model really is using smoking history to predict high risk, fine. But what if it is not relying entirely on smoking history and is using a combination of all the factors?
Then the explanation is not enough.
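One simple way to check whether an explanation matches what the model uses is an ablation test: replace one feature at a time with a baseline value and see how much the prediction actually moves. The sketch below is a hypothetical illustration. The risk model, its coefficients, the baseline values, and the tolerance are all invented; only the 0.42 score for smoking history and the age of 28 come from the lecture. It also assumes the explanation's scores are on the same scale as the model's output.

```python
def risk_model(f):
    """Hypothetical model: smoking matters more when the person does not exercise."""
    return (0.1 + 0.2 * f["smoking"]
            + 0.4 * f["smoking"] * (1 - f["exercise"])
            + 0.002 * f["age"])

def ablation_effects(model, features, baseline):
    """Change in output when each feature alone is reset to its baseline value."""
    full = model(features)
    effects = {}
    for name in features:
        ablated = dict(features, **{name: baseline[name]})
        effects[name] = full - model(ablated)
    return effects

patient = {"smoking": 1, "exercise": 0, "age": 28}
baseline = {"smoking": 0, "exercise": 1, "age": 0}       # assumed reference values
claimed = {"smoking": 0.42, "exercise": 0.0, "age": 0.0}  # the explanation given

effects = ablation_effects(risk_model, patient, baseline)
tolerance = 0.2  # assumed threshold for a meaningful mismatch
unfaithful = [n for n in claimed if abs(claimed[n] - effects[n]) > tolerance]
print("measured effects:", {n: round(v, 3) for n, v in effects.items()})
print("features where the explanation disagrees with the model:", unfaithful)
```

Here the explanation gives exercise no credit, but resetting exercise changes the model's output substantially, so the explanation is flagged as not matching the model for that feature.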
Next, the explanation can be too simple, in which case it gives a very rosy picture.
Many times when you talk to ChatGPT it says, "Oh, I understand you are going through this problem, and I know it is a hard time," and so on. It is giving you empathy and behaving like a human. Most of the time it does the wrong thing when it tries to do that, but it does it anyway.
A simple story often cannot give a good explanation, because the reality is more complex.
Then there is false confidence.
Sometimes the AI presents itself as very good at something, and people start believing it. An explanation can make people trust the model more than they should. The AI gives an explanation, I start believing all of it, and it makes me too confident.
That is also a problem. Another problem is an explanation that is not actionable. The AI has given an explanation, but the user still cannot make a safer or better decision. What is the point of an explanation that cannot help me make a decision or act safely?
So good interpretability is not just comforting. You do not give an explanation just for the sake of giving one. It should be faithful, useful, and decision-relevant. If it does not help with the decision, and it is not faithful and useful, then the explanation has practically no meaning.
The message to take away is that an explanation must help users detect risks, understand failures, and make better decisions. That is the better kind of explanation.
Here I would like you to pause and think: what would you trust? Would you trust a highly accurate AI system whose mistakes nobody could explain? That is, the system is highly accurate, but when it makes a mistake, nobody can explain it. With this thought, thank you for listening this far. In the next lecture, we will go through a simple taxonomy of explainable AI methods: the different categories and types of explanation methods, and the taxonomy around explainable and interpretable methods. I hope you understood.
See you in my next lecture.
Thank you.