Retrieval Augmented Generation (RAG) is an AI technique that combines a retrieval mechanism with a generative model to give accurate, domain-specific answers: it first searches external data sources (such as documents, databases, or PDFs) and then generates a response from the retrieved information, rather than relying solely on the model's pre-trained knowledge. This approach reduces hallucinations and keeps answers grounded in specific, up-to-date information from the provided sources. The RAG pipeline consists of three main steps: (1) retrieve relevant documents from external data sources, (2) generate a response using the retrieved information, and (3) respond to the user with the final answer. The retriever component searches for relevant documents, while the generator component reads those documents and creates the final answer. Vector databases are commonly used to store and search embeddings, which are numerical representations of text that capture semantic meaning, so the system can find similar content by meaning rather than by exact keyword match.
Deep Dive
AI Agents Mastery Program tutorials || Demo - 33 || by Mr. DURGA Sir On 26-04-2026 @7PM (IST)
Good evening to all.
So, welcome to AI with Durga sir today.
Today, exactly 5 minutes. I'm about to leave for Hyderabad. Okay, I'm taking this session just for the sake of consistency.
So, how many people are aware of the RAG model? Okay?
What does RAG stand for? Anyone, can you please spell it out?
Retrieval augmented generation. What will happen is, suppose I ask ChatGPT, or any LLM, a question.
Then the LLM is always going to answer our query from its own knowledge.
With the LLM's pre-trained knowledge, it is going to answer. Correct, right?
Respond. Yes, sir.
Yes, with its own knowledge, it's going to provide the answer.
Most of the time that knowledge is generic knowledge.
It is not specific to a particular domain, not specific to a particular organization. But sometimes, suppose I'm talking about a bank, ICICI Bank.
If I ask, "What is the interest rate in your bank?
On a fixed deposit, what is the interest rate?" then a normal LLM is always going to provide the answer in a generic way. But a generic answer is not acceptable here, because the answer has to be specific to that particular bank only.
For that, we are going to convey clearly, "Hey, don't use your existing knowledge.
I will provide documentation. So, refer to these documents and provide the answer."
For that purpose, we have the retrieval augmented generation (RAG) concept. Now, I'm going to explain it in simple words.
Suppose I am the trainer.
Okay? I learned from several websites, from several books, or from my own experience. So, somewhere I learned.
You're asking a question to me.
With my own knowledge, I'm trying to answer you.
Now suppose you tell me, "Hey, don't use your existing knowledge.
Each and every point you teach must come from these four books only. You are not allowed to use your existing knowledge."
So you provided four books to me.
Then, whatever question you ask, I have to refer to these books, and I must answer from these books only.
This is what we call the RAG model. "Don't use your existing knowledge. Please refer to my provided documents or my provided data sources; analyze them and provide the answer from there." If we have such a requirement, happily we can go for what?
The RAG model.
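The "answer from these books only" instruction can be sketched as a prompt template. This is a minimal illustration, not the implementation used in the course: the function name `build_grounded_prompt`, the wording of the instructions, the "say you do not know" fallback, and the "Example Bank" text are all assumptions made up for this example.

```python
def build_grounded_prompt(question: str, documents: list[str]) -> str:
    """Build a prompt that tells the model to answer only from the given documents."""
    # Number each document so the model (and a human reader) can refer to it.
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    return (
        "Do not use your existing knowledge.\n"
        "Answer only from the documents below. "
        "If the answer is not in them, say you do not know.\n\n"
        f"Documents:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical document text, made up for illustration only.
docs = ["Example Bank pays 6.5% interest on one-year fixed deposits."]
prompt = build_grounded_prompt("What is the fixed deposit interest rate?", docs)
print(prompt)
```

The resulting string is what would be sent to the LLM in place of the bare question, so the model sees the provided documents together with the instruction to stay within them.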
What is the biggest advantage of this RAG model? Anyone, can you please tell what advantage we get?
The biggest advantage is that hallucination is going to be reduced.
Correct, right?
Hallucination, meaning telling wrong answers, is going to be reduced. Guessing won't be there, so wrong answers are going to be much less likely.
The answer is always going to come from our provided material only. You are not giving the AI a chance to make something up.
That's why hallucination is going to be controlled.
Next, we can get more specific answers.
Next, we can expect more believable output. All these benefits are there with this RAG model.
Everyone got the basic idea about the RAG model? How we can implement it, I will discuss in the generative AI or agentic AI sessions; I will take care of that part. You are not required to worry.
Is it clear for you people? Anyone?
Everyone got the basic idea about RAG?
Yes, sir. Basic idea, theoretical idea.
Don't worry much about the practical implementation of these things; we will discuss it in detail. Okay? Now, there are a few questions related to this RAG model. We will discuss them so that you people can feel more comfortable. Okay? Here, see this one.
I just provided the prompt:
"Provide seven MCQs on RAG. After every question, wait for my answer. Validate the answer and provide a learning conclusion. Wait for my confirmation to generate the next question. Have you got it?" Yes. So, we will go step by step.
One question, your answer, validation plus learning, then it waits for your Y to generate the next question. Yes, perfect. "What is the main purpose of retrieval augmented generation in AI systems?" Okay? So, can you please answer this question? (A) To train large language models from scratch, (B) to improve answers by retrieving external data before generating a response, (C) to reduce model size and memory, (D) to convert text into images. Which one is the correct answer? B.
Well, respond a bit loudly.
B. I chose B: to improve answers by retrieving external data before generating a response.
Okay, yes.
Remember this clearly.
Okay? So, validation: yes, correct, you are right. Retrieval augmented generation works like this: first, it retrieves relevant information from external sources (PDFs, databases, documents). Then it generates a better and more accurate answer using that data. So, instead of guessing the answer from memory, the AI will search and then answer.
But where is the AI going to search, sir?
Please confirm. Where is the AI going to search?
In your provided documents, in your provided database, in your provided data source.
It's always going to search there, in whatever you provided, and only then is it going to answer. Yes, perfect. So, type Y for the next question.
Okay? So, in a typical retrieval augmented generation pipeline, what is the correct order of steps?
Okay?
(A) Generate, retrieve, store.
(B) Retrieve, generate, respond. (C) Train, retrieve, generate. (D) Store, train, generate. So, can you please confirm which one is the correct answer? B. Second. B, again. What is the reason? First, retrieve from our provided data source.
Then generate the response, and then provide that as the response to the user. So, B itself is the correct answer.
Okay? Perfect, you are understanding clearly. In retrieval augmented generation, the flow is very simple. Get the relevant data from the documents or databases. Next, the LLM uses that data to create the answer. The final output is given to the user. Retrieve, generate, respond.
Instead of answering directly, it searches first and then answers. Yes, perfect, friends.
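The retrieve, generate, respond order can be sketched as three small functions called in sequence. This is only a toy under stated assumptions: the word-overlap retriever and the `generate` stub stand in for a real retriever and a real LLM call, and the document text (including the 6.5% figure) is made up for illustration.

```python
def retrieve(question: str, documents: list[str]) -> list[str]:
    """Step 1: pick documents that share at least one word with the question.

    Word overlap is a stand-in for a real retriever, used only to keep this runnable.
    """
    q_words = set(question.lower().split())
    return [d for d in documents if q_words & set(d.lower().split())]


def generate(question: str, context: list[str]) -> str:
    """Step 2: stand-in for the LLM call; a real system would send question + context to a model."""
    if not context:
        return "I could not find this in the provided documents."
    return "Based on the provided documents: " + " ".join(context)


def respond(answer: str) -> str:
    """Step 3: hand the final answer back to the user."""
    return answer


def rag_pipeline(question: str, documents: list[str]) -> str:
    context = retrieve(question, documents)  # search first
    answer = generate(question, context)     # then answer
    return respond(answer)


documents = [
    "Fixed deposit interest is 6.5% for one year.",  # made-up figure
    "Branches are open Monday to Friday.",
]
result = rag_pipeline("What is the interest on a fixed deposit?", documents)
print(result)
```

Note that `generate` never runs before `retrieve`: the question reaches the generator only together with the retrieved context.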
Okay? Which component in a retrieval augmented generation system is responsible for finding relevant documents? Oh, so this one is at the implementation level.
Okay, we are not required to worry, but anyway, we will just touch on it.
Which component in a RAG system is responsible for finding relevant documents? (A) Generator, (B) retriever, (C) tokenizer, (D) decoder. Anyone, can you please tell? Because I do not know.
>> LLM.
It's the generator.
The people who are already familiar with the RAG model, can you please answer this question?
So far we covered only the basic idea. This is asking about an implementation detail.
Which component in a retrieval augmented generation system is responsible for finding the relevant documents? Is it the generator?
The LLM?
Or the retriever? Or the tokenizer? Or the decoder?
Tokenizer.
Yes, friend?
Generator.
Retriever or generator?
Okay? So, I hope it is not the generator.
Retriever.
The tokenizer also, no.
No.
Actually, what will happen is, first the request goes to our external source to retrieve the relevant data.
Then that retrieved data is passed to the generator.
The LLM is going to prepare the answer with your data. So obviously the LLM should come in the second step, not directly in the first step.
Naveen, what's your answer? Because I hope you have some idea. >> First, we'll have the data stored in our vector database, right? Then suppose I'm asking one question.
Okay. That question I need to convert into vector embeddings, like the stored data, so that we can find the relevant match using cosine similarity and so on.
Correct. So we will send our question and ask the LLM to use our vector database data and find the relevant documents.
Then obviously our question is not going to hit the LLM directly first.
Yeah, ours is a normal natural-language question, right, sir? Yes. And our stored data is in embeddings.
Mhm, correct. So we need to convert the question into embeddings.
If you don't convert it, there is no match.
Okay. So there are two steps. Yeah, yeah.
So I thought that if there are two steps, first it will contact our external vector database, or otherwise the PDF.
It's going to identify the matching chunks, and then those chunks are going to be handed over to the LLM. The LLM is responsible for preparing the answer and for sending the response. Correct, right?
Yes, yes. But before finding the documents, we need to convert our question into embeddings as well.
Because ours is a natural language question. Which component is responsible for finding relevant documents?
Anyone, can you please tell the answer?
Retriever.
Retriever?
Because B is the correct answer for most of our questions, that's why.
When the embeddings match, it retrieves based on the similarity match.
Based on the match percentage it retrieves, and after retrieval it goes for generation. The LLM is just a language model to generate your text, that's it. It will not match.
It will not retrieve, so that's why.
Correct. That's why obviously the LLM is not the answer. Mahesh. Yes.
Yes.
So yes, that's my point also: the LLM will come in the middle stage if the RAG model is there.
Okay. So we are going to get chunks, and then those chunks are going to be handed over to the LLM. Then the LLM is responsible for preparing the response from those chunks, not for coming up with the answer from its own knowledge, and then it hands the response over.
The LLM only composes and polishes the answer from the retrieved chunks; it does not search for them.
Correct, correct. Okay, then maybe I don't want to keep... >> [laughter] >> Okay, even if it's wrong, don't worry at all. Because we haven't implemented it yet, I'm not in a position to tell exactly. Okay, B. How many people feel that B is the answer?
Decoder, tokenizer, no. A or B only.
Okay. Yeah, B only.
Let me check. Yeah, perfect, man.
Excellent.
Okay, the retriever searches and brings the relevant documents.
The generator reads those documents and creates the final answer. The retriever is like a Google search; the generator is like a teacher explaining the result.
Wonderful, right?
Naveen.
Yes, sir.
The remaining people, just observe. So the retriever is the component responsible for searching and bringing the relevant documents. The generator reads those documents and creates the final answer.
Okay. First we need to find the documents. Ah, exactly, exactly. Then, once we find the matched chunks, the LLM comes into the picture.
Yeah, right.
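The retriever step discussed above (convert the question into the same kind of vector as the stored chunks, then rank chunks by cosine similarity) might look roughly like this. It is a sketch under assumptions: `embed` here is a toy word-count vector, not a learned embedding model, and the chunk text is made up for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would use a learned embedding model."""
    return Counter(text.lower().replace("?", "").replace(".", "").split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_chunks(question: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Embed the question the same way as the chunks, then rank chunks by similarity."""
    q_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(q_vec, embed(c)), reverse=True)
    return ranked[:top_k]


# Made-up chunks, for illustration only.
chunks = [
    "Employees get twenty days of paid leave per year.",
    "The office cafeteria opens at nine in the morning.",
]
matched = retrieve_chunks("How many days of paid leave do employees get?", chunks)
print(matched)
```

Only the chunks in `matched` would be handed over to the LLM; the retriever itself never generates text.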
Which type of database is commonly used in retrieval augmented systems to store and search embeddings? (A) Relational database, (B) vector database, (C) file system, (D) spreadsheet.
There is no doubt at all, the answer should be vector database.
>> I think, based on the question, it is B, sir. Yes. But some relational databases also provide an embedding column.
Okay, okay. For example, Postgres. Mhm. It has both relational and vector database capabilities. Maybe in MySQL it is not there.
That's why MySQL is specified in the brackets.
Vector database.
Correct answer: B.
Correct again, good consistency. In retrieval augmented generation, we convert the text into embeddings.
These embeddings are stored in the vector database. Then the system finds data with similar meaning using similarity search. Instead of keyword search, it does meaning-based search. Okay, semantic search, which we are always using, right? The same.
Okay.
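The difference between keyword search and meaning-based search can be seen in a tiny example. The two-dimensional vectors below are hand-assigned stand-ins for real embeddings (an assumption for illustration), and the stored sentences are made up.

```python
import math

# Hand-assigned 2-D vectors standing in for real embeddings (assumption).
# Axis 0 is roughly "about vehicles", axis 1 is roughly "about food".
store = {
    "Company vehicles must be parked in the basement.": [0.9, 0.1],
    "The canteen serves fruit every day.": [0.1, 0.9],
}


def keyword_search(query: str, texts) -> list[str]:
    """Exact keyword match: returns texts that literally contain the query word."""
    return [t for t in texts if query.lower() in t.lower()]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def semantic_search(query_vector: list[float], vector_store: dict) -> str:
    """Return the stored text whose vector is most similar to the query vector."""
    return max(vector_store, key=lambda text: cosine(query_vector, vector_store[text]))


car_vector = [0.95, 0.05]  # hypothetical embedding of the query "car"
keyword_hits = keyword_search("car", store)
semantic_hit = semantic_search(car_vector, store)
print(keyword_hits)
print(semantic_hit)
```

The keyword search finds nothing because no stored sentence contains the word "car", while the similarity search still returns the sentence about vehicles.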
What is the main benefit of using RAG instead of relying only on a pre-trained LLM? Yes. Anyone, can you please tell what the answer is? (A) It makes the model smaller. (B) It allows real-time access to updated and external information. (C) It removes the need for prompts. (D) It converts text into audio. Which one is the correct answer?
B, sir.
It allows real-time access to updated and external information. So the updated, latest information will be there by default.
Exact information. Okay? So with the retriever, models can use the latest or external PDFs. No need to retrain the model again and again. The answer becomes more accurate and up to date.
LLM alone: older knowledge. RAG: live plus updated knowledge.
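The "no need to retrain" point can be sketched with a tiny document store: updating a document changes what the retriever returns, while the model itself stays untouched. `DocumentStore`, its topic-based lookup, and the policy text are hypothetical, chosen only to keep the example short.

```python
class DocumentStore:
    """Tiny in-memory document store keyed by topic (a stand-in for a real vector database)."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def upsert(self, topic: str, text: str) -> None:
        # Adding or replacing a document is a data update, not a model update.
        self._docs[topic] = text

    def lookup(self, question: str) -> str:
        # Naive retrieval: return the first document whose topic appears in the question.
        for topic, text in self._docs.items():
            if topic in question.lower():
                return text
        return "Not found in the provided documents."


store = DocumentStore()
store.upsert("leave policy", "Leave policy v1: 18 days per year.")  # made-up text
before = store.lookup("What is the leave policy?")

store.upsert("leave policy", "Leave policy v2: 20 days per year.")  # document updated
after = store.lookup("What is the leave policy?")

print(before)
print(after)
```

The second lookup reflects the updated document immediately, with no training step in between.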
Which of the following best describes embeddings used in a retrieval augmented system?
(A) Images stored in the database.
(B) Numerical representation of text capturing meaning.
(C) Audio signals. (D) Encrypted passwords.
Which one is the correct answer?
B, sir. Numerical representation of text capturing meaning. Vector embeddings mean semantic meaning.
Vector embeddings. We discussed this concept in our prior sessions also. Yes, yes. Embeddings, sir. Yes.
Numbers representing the meaning of text.
Similar meaning, similar numbers, so this helps you find related content easily. Car and vehicle: close embeddings. Car and banana: far embeddings. So, semantic distance. Do you remember? We covered it somewhere in our early sessions.
Yes, Naveen.
The first starting from the engine.
Yes, friends.
In the transformer architecture, when you were explaining, you explained that. >> In the generative AI transformer architecture, I covered it, right?
Yes. Then I told you, how many people remember? After learning that semantic distance and vector embeddings concept, for two minutes I felt very happy: oh, this is the secret of how it generates content and how it understands our context. Do you remember? I told this point. Yes, sir. Yes, sir.
Attention.
Yes, the attention mechanism is there internally. That attention mechanism works on similarity between these vector representations.
So numbers will be there, embeddings will be there. >> If possible, can you take one more session for those who missed it? >> We will discuss those things in detail. Don't worry. I covered only the core foundations of generative AI, just the basic idea of what generative AI is.
Okay, we will discuss in detail.
Embeddings means insertion, sir?
Embeddings are what we call a numeric representation, something like that.
For our data, you know?
Converting natural text into numbers that the system understands. So if "car" is there, then for the car around a thousand numbers are going to be attached. Is it a car? Is it a vehicle? Does it have a steering wheel? Is it a fruit? And so on. So a numeric representation is associated with the word, with the token. Then similar things have nearby embeddings, nearby numbers. So if two sets of numbers are nearby, it means the two things are related.
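The "nearby numbers mean related things" idea can be tried with three hand-written vectors. The three feature scores per word are an assumption for illustration only: real embeddings have hundreds or thousands of learned dimensions, and those dimensions do not carry readable labels like these.

```python
import math

# Hand-written feature scores (assumption): [is a vehicle, has a steering wheel, is a fruit].
embeddings = {
    "car":     [1.0, 1.0, 0.0],
    "vehicle": [1.0, 0.8, 0.0],
    "banana":  [0.0, 0.0, 1.0],
}


def distance(word_a: str, word_b: str) -> float:
    """Euclidean distance between two word vectors; smaller means more related."""
    return math.dist(embeddings[word_a], embeddings[word_b])


car_vehicle = distance("car", "vehicle")
car_banana = distance("car", "banana")
print(round(car_vehicle, 2), round(car_banana, 2))
```

Here "car" sits much closer to "vehicle" than to "banana", which is the semantic distance described in the session.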
Getting it, right?
Yeah, got it, sir. >> That's something at the transformer architecture level. If it is required, I will discuss that part in detail soon, whenever we are implementing. Okay, don't worry. Okay, thank you, sir.
Yeah.
Here, what is a common real-world use case of retrieval augmented generation?
Okay, what is the common use case?
(A) Playing video games, (B) a chatbot answering questions from company documents, (C) creating hardware circuits, (D) editing images. Which one is the correct answer?
B.
Yeah, a chatbot answering questions from company documents only. I'm telling it, don't apply your own knowledge. I don't want your own training data. I want the answer from these documents only. Then obviously we need to go for this RAG model. Wonderful.
Okay, B is the correct answer.
Okay, perfect. All answers are correct.
Strong understanding. A very common use case of retrieval augmented generation is a chatbot answering from company data (PDFs, docs, a knowledge base), used in customer support, HR, and internal training systems. So, an employee asks, "What is the leave policy?" Here, have you observed? Does the leave policy vary from company to company or not?
Yes, sir. Respond, friends. Is the leave policy going to vary from company to company? Yes, sir. Definitely, it's going to vary. Then if I ask a normal LLM, "Hey, what is the leave policy?"
then what is it going to provide? It's going to provide a generic answer, which may not exactly match my company policy.
Then what is the RAG chatbot going to do?
Search the company docs and give the exact answer. So, it is clear that RAG means search plus answer. We covered retriever plus generator roles, vector DB plus embeddings, and a real-world use case.
Okay, like that.
That's it.
Okay, these are the seven questions we covered. I will share them in our chat window, don't worry.
Okay, I'm closing now. I'm about to leave for Hyderabad; I'm taking this session just for the sake of consistency.
From tomorrow onwards, I will be available.
So, like earlier, full sessions will be there. Okay? Thank you, friends.