Krishna provides a necessary reality check by exposing the limitations of vector-only retrieval in favor of structural reasoning. It is a pragmatic guide that moves beyond the "everything is a vector" hype to focus on how document hierarchy actually drives intelligence.
Deep Dive
Traditional RAG Vs Vectorless RAG: When To Use What?
Hello all, my name is Krishna, and welcome to my YouTube channel. In this video we are going to discuss a very important topic: the differences between traditional RAG and vectorless RAG. One month back I uploaded a detailed video explaining vectorless RAG with a library called PageIndex, where we did not do any chunking and did not use any vector database to store embedding vectors. We went straight into what vectorless RAG is and how to implement it.

After I uploaded that video, the comment section filled up with questions. When should vectorless RAG be used? For what kind of documents? Some people said that for very huge documents it will probably not be that useful. Others pointed out that vectorless RAG creates a JSON structure and asked how to save it, and asked for the differences between vectorless RAG and the traditional RAG we usually implement. I had purposely not discussed this because I wanted you all to write in the comments, and that started a lot of discussion: there are around 116 comments, and they are all about when we should use vectorless RAG. So in this video I will cover in depth when to use vectorless RAG compared to traditional RAG. First I will open the same material I used while teaching vectorless RAG, and we will go through the differences between traditional vector RAG and vectorless RAG.
In traditional vector RAG, let's say there is a very large PDF document. First we do the chunking, then we do the embedding. Embedding means we convert each chunk into a vector. Then we store the vectors in a vector database such as Pinecone or Chroma. Once they are stored, there is a second pipeline. Whenever a user gives a query about this PDF, the query is first converted into a vector, and then a similarity search (for example cosine similarity) runs against the vector database to fetch the context. That context is combined with the prompt and passed to the LLM, which generates the final output. So the algorithm at work here is essentially a similarity search: you find the nearest vectors and build the answer from them.
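The two pipelines just described (chunk, embed, store, then embed the query and run a similarity search) can be sketched in a few lines of Python. This is a toy under stated assumptions, not a production setup: `embed` is a bag-of-words counter standing in for a real embedding model, a plain list stands in for the vector database, and all function names and the sample document are made up for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=8):
    """Split text into fixed-size word chunks, ignoring sentence and section boundaries."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text):
    """Toy stand-in for an embedding model (assumption): a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(text, chunk_size=8):
    """Indexing pipeline: chunk, embed, store. A list stands in for the vector database."""
    return [(chunk, embed(chunk)) for chunk in chunk_text(text, chunk_size)]


def retrieve(index, query, top_k=1):
    """Query pipeline: embed the query once, then run one similarity search."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine_similarity(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


document = (
    "Revenue for fiscal 2024 was 10 million dollars. "
    "The warranty period for all hardware is two years. "
    "The board appointed a new chief executive in March."
)
question = "What is the warranty period?"
index = build_index(document)
context = retrieve(index, question)

# The retrieved context is combined with the prompt before calling the LLM.
prompt = f"Answer using this context:\n{context[0]}\n\nQuestion: {question}"
print(prompt)
```

With the eight-word chunk size used here, the best-matching chunk is "The warranty period for all hardware is two": the word "years" fell into the next chunk, so the nearest vector alone does not carry the full answer.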
Because the match is based on the nearest vector, you sometimes do not get the best result: since we chunk the document, one relevant chunk sits in one place and another relevant chunk sits somewhere else.

In vectorless RAG, we take the PDF document and pass it through what the slide calls an LLM tree builder, which produces a hierarchy of sections. The PDF should be a structured PDF with some kind of page index: on page 1 this content is present, in section 1.2 that content is present, and so on. When you have structured content like that, the LLM tree builder can do its job. I showed this in the earlier video, which I will link in the description, where I also covered the practical implementation. The result is a JSON tree index. The structure is node one, node two, and so on, and each end node holds a summarized version of its specific topic. The JSON tree looks like the tree you can see on the right-hand side of the slide.

The first question is where to save this tree, because many people asked that in the comments. The JSON tree index is simply a JSON structure, and a JSON structure can be saved anywhere: in the file system, in an S3 bucket, or in MongoDB or another database that stores key-value pairs. From there you can load it whenever you need it.

Many people also asked how big this JSON can get. I will go through the detailed scenarios for when to use vectorless RAG, and from those you will also be able to understand how big the JSON can be.
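To make the storage question concrete, here is a minimal sketch of a tree index written to the file system and loaded back with Python's standard `json` module. The field names (`title`, `start_page`, `end_page`, `summary`, `nodes`), the summaries, the "Risk Outlook" title, and the file name are assumptions for illustration and are not taken from the PageIndex library. Only the page ranges 22 to 28 and 28 to 31 come from the example discussed in the video.

```python
import json
import os
import tempfile

# Hypothetical node layout: title, page range, summary, child nodes.
tree_index = {
    "title": "Annual Report 2024",
    "start_page": 1,
    "end_page": 40,
    "summary": "Company performance and financial stability for 2024.",
    "nodes": [
        {
            "title": "Financial Stability",
            "start_page": 22,
            "end_page": 28,
            "summary": "Illustrative summary of the financial stability section.",
            "nodes": [],
        },
        {
            "title": "Risk Outlook",  # hypothetical section title
            "start_page": 28,
            "end_page": 31,
            "summary": "Illustrative summary of the following section.",
            "nodes": [],
        },
    ],
}


def save_tree(tree, path):
    """Write the tree index to disk as plain JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(tree, f, indent=2)


def load_tree(path):
    """Load a previously saved tree index."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_nodes(tree):
    """Count the node itself plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in tree["nodes"])


path = os.path.join(tempfile.gettempdir(), "annual_report_tree.json")
save_tree(tree_index, path)
restored = load_tree(path)
print(count_nodes(restored), "nodes,", os.path.getsize(path), "bytes on disk")
```

The same dictionary could be uploaded to an object store or inserted into a document database. The file size printed at the end gives a rough feel for how the JSON grows as nodes and summaries are added.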
For now, the main point is that the JSON structure can be stored anywhere that supports JSON: the file system, an S3 bucket, MongoDB.

In the next pipeline, whenever a user gives a query, an LLM tree search is done over the node fields: section name, title, pages, summary. All of that information is available in the nodes. The query iterates through the structure, retrieves the relevant section, and that section is combined with the LLM prompt to generate the final answer. In my practical video from one month back we went through the entire code and how to use the PageIndex library.

As a recap of that video, this is how the PDF is parsed. First there is table of contents detection, which scans all the pages. If the PDF has a TOC (table of contents), it parses all the chapters and does section-aware splitting, which respects logical boundaries, not token counts, and then it summarizes each section. If the PDF does not have a TOC, it goes directly to the splitting step shown on the slide. Finally it assembles a hierarchical tree of parent, child, and grandchild nodes. In the financial stability example you can see one node that is the summarized version of pages 22 to 28, and another node for pages 28 to 31. When we query, the search walks through these nodes and pulls up the content. Up to here I think it is clear from the previous video. If it is not, go and watch that video, because it covers everything in detail along with all the code.

Now I am going to talk about the differences between vectorless RAG and traditional RAG. In vectorless RAG, we first create the hierarchical index. Say there is an annual report for 2024 with all its sections and pages. The tree builder creates this kind of structure. Then the LLM reads the root summaries, descends the tree, reads the full section, answers, and cites the path. No chunking is required.

On the next slide: traditional RAG, the real picture. It is powerful, but it has known failure modes. Let's start with the strengths. First, scale: when you have millions and millions of documents, say the entire content of a company, and you want a quick lookup to get some context from them, you can use traditional RAG. Second, a mature ecosystem.
A mature ecosystem means there are databases built specifically for this kind of work: Chroma, FAISS, Pinecone, Qdrant, Weaviate. You can use any of these. Third, retrieval is cheap, and whenever you have huge data it is always a good idea to have cheap retrieval. The cost is one embedding plus one similarity search per query. When I make a query against the vector database, the query is first embedded, and then a vector DB search is done. That is all that happens in one call.

Fourth, it is great for factoids: short, lookup-style questions. Say I have a huge set of documents and I ask "What is the revenue of the company?" I will quickly get that answer. This is important to understand, because tomorrow, when you are working in a company and a problem statement comes in, you will need to decide whether to use traditional RAG or vectorless RAG.
All of these questions should come to your mind. Fifth, it is domain agnostic: it works on any text, such as blogs, tickets, or PDFs. Domain agnostic means you do not depend on any particular domain. If you have some assorted information and you quickly want to create a chatbot that acts like an assistant over it, you can use traditional RAG.

Now the weaknesses. One very important weakness of traditional RAG is that chunking destroys context. Why? When chunking is done, some information lands in chunk one, some in chunk two, and some more in chunk three. We chunk because the LLM has a limited context window: the data is very large, so we cannot combine everything and give it to the LLM at once. Chunking also lets us store the pieces in a vector database, and at query time we combine whichever chunks the similarity search returns with our prompt and give that to the LLM. But chunking means misses. Say information about an important concept is in chunk three and more of it is in chunk four, but chunk four does not match the query. That information is missed, and the LLM does not get the full context it needs.

The second weakness is that similarity is not equal to relevance: embeddings can match the wrong things confidently. Third, there is no cross-section reasoning: it cannot answer "compare risk versus mitigation". Fourth, it is hard to explain. Why was this chunk picked? A cosine score is not an answer. A cosine score is just a similarity measure. It does not capture contextual relevance: how one chunk relates to another, or in what order they should be picked up. Fifth, there is embedding drift: when the embedding model changes, you need to re-embed.
Say tomorrow you switch to a different model. That model may have been trained on different data, so you need to re-embed everything with the new embedding model before you can use that context again. Those are the major problems with traditional RAG.

Now let's talk about vectorless RAG. Here we let the LLM navigate the document the way a human would, the way we flip through a book using its chapters and pages. The major strength is that it preserves document context. Why? Because you have structured data with a table of contents, and based on that table of contents you create the JSON tree. Each node holds its section information along with a summary. No chunking happens in this flow. Each node contains only the information related to that node, so sections stay whole and there are no broken references. If an important piece of information lives in one section, it is not scattered across other nodes. It is available in that node, in summarized form.

Second, cross-section reasoning: the LLM can compare, contrast, and synthesize as it walks the tree, so you get a better answer. Third, explainable retrieval: it returns a navigation path, not a cosine score. The output of a vectorless RAG also tells you which path through the tree was chosen. Because of this, what is captured is relevance rather than cosine similarity, and the context is much better than with traditional vector RAG. Fourth, there is no embedding pipeline, which removes one of the major costs. We are not converting text to vectors, so we never have to re-embed anything. Fifth, it plays well with structure: reports, contracts, filings, textbooks.

From this point alone you should be able to understand when to use vectorless RAG and when to use traditional RAG. We saw that traditional RAG is domain agnostic, while vectorless RAG suits specific domains. If the data is not structured, whatever it is, use traditional vector RAG. If it has structure and belongs to a specific domain, I would suggest vectorless RAG.

Now the weaknesses. With a vector DB, each query costs one embedding call and one retrieval, and then the LLM is used. The major weakness of vectorless RAG is that you have to make multiple LLM calls to traverse the tree. Remember that the summary in every node is also created by the LLM. The result is higher latency: several hundred milliseconds to a few seconds per query. I have only shown you a small tree. In a real scenario the tree will be very large, depending on the content.
Whenever I make a query, the search has to traverse the tree to wherever the information is, and that is why latency rises to several hundred milliseconds or a few seconds per query compared with traditional vector RAG.

Second, it does not scale to millions. If you have millions of documents, the tree becomes enormous. It works for tens to thousands of documents, not internet scale. For any solution you build, you first have to look at inference performance, and traversing a tree that large is too slow. Third, you need structured documents. If your documents are not structured, there is no point in using vectorless RAG: for random blog posts, a tree adds little value. So the first condition is whether the documents are structured, the second is how many there are, and the third is whether the use case is domain specific. Fourth, the tooling is less mature: PageIndex and a few others, a much smaller ecosystem. It is still improving, but I feel that for domain-specific use cases it can be very handy.

That was vectorless RAG. The next slide is a side-by-side comparison, which should give you a better idea of when to use which.

Scale: when you have millions of documents, go with traditional RAG. If you have tens to thousands of documents, vectorless RAG. Latency per query: milliseconds for traditional RAG, hundreds of milliseconds for vectorless RAG. Cost per query: cheap for traditional RAG, higher for vectorless RAG because of the multiple LLM calls. Cross-section reasoning: weak for traditional RAG and strong for vectorless RAG, because with chunking you can miss the context that links one section to another, while vectorless RAG summarizes each whole section. Explainability: a cosine score versus a navigation path. Best for: fact Q&A over mixed corpora versus long structured documents. If I want to build RAG over a company's financial reports or its legal contracts, I will use vectorless RAG. Setup complexity: a little higher for traditional RAG, where you need an embedding pipeline plus a database, and lower for vectorless RAG, where you only need the tree builder. Ecosystem maturity: very mature for traditional RAG, still emerging for vectorless RAG.
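Two rows of that comparison, cost per query and explainability, are easier to see in code. The sketch below walks a small tree one level at a time, counts one "LLM call" per level, and returns the navigation path. It is only an illustration under assumptions: `ToyLLM` picks a child by word overlap instead of calling a real model, the node fields and most section titles are hypothetical, and this is not the PageIndex API.

```python
import re


def tokens(text):
    """Lowercased word set used by the toy scorer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ToyLLM:
    """Stand-in for a real LLM (assumption): picks the child whose title and
    summary share the most words with the query, and counts its own calls."""

    def __init__(self):
        self.calls = 0

    def choose_child(self, query, children):
        self.calls += 1
        q = tokens(query)
        return max(children, key=lambda c: len(q & tokens(c["title"] + " " + c["summary"])))


def tree_search(tree, query, llm):
    """Descend from the root to a leaf, recording the navigation path."""
    node, path = tree, [tree["title"]]
    while node["nodes"]:
        node = llm.choose_child(query, node["nodes"])
        path.append(node["title"])
    return node, path


# Hypothetical tree; only "Financial Stability" and the page ranges echo the video.
tree = {
    "title": "Annual Report 2024",
    "summary": "Full report.",
    "nodes": [
        {"title": "Business Overview", "summary": "Products, markets, and strategy.", "nodes": []},
        {
            "title": "Financial Stability",
            "summary": "Capital, liquidity, and risk indicators.",
            "nodes": [
                {"title": "Capital Position", "summary": "Capital ratios and buffers.",
                 "start_page": 22, "end_page": 28, "nodes": []},
                {"title": "Liquidity Risk", "summary": "Funding sources and liquidity risk measures.",
                 "start_page": 28, "end_page": 31, "nodes": []},
            ],
        },
    ],
}

llm = ToyLLM()
leaf, path = tree_search(tree, "How did liquidity risk change?", llm)
print(" > ".join(path), "| LLM calls:", llm.calls)
```

In this sketch the number of calls equals the depth of the walk, so a deeper tree means more calls and more latency per query, while the returned path is the explanation for why that section was read.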
On to the next slide: when to use traditional RAG. The statement is simple but important. Use it when you have a massive heterogeneous corpus: millions of mixed-format items such as blogs, tickets, transcripts, and knowledge base articles. Use it for latency-critical apps like chatbots and search, where you want responses quickly. Use it for short factoid queries: What is the warranty period? Who is the CEO? What is the revenue of a specific company? And use it when you are cost sensitive at scale: at thousands of queries per minute, embedding lookups cost pennies, and an LLM tree walk will not be suitable.

Next, when to use vectorless RAG. Use it for long structured documents: annual reports, 10-Ks, legal contracts. Use it when reasoning matters more than similarity, that is, when relevance matters more than similarity. Use it when explainability is required: in compliance, audit, legal, and financial advisory work, you have to show your work, not just the answer. And use it when chunking destroys meaning. If you feel that chunking is destroying the meaning of the data, I would suggest you do not use traditional RAG and use vectorless RAG instead.

Now the key takeaways. There is one important thing I want to talk about. At the end of the day the question is whether to use traditional RAG or vectorless RAG, but going forward people will start using hybrid RAG: they will take the strongest features of vectorless RAG and combine them with traditional vector RAG, so two types of search happen together. Traditional RAG gives you scale. Vectorless RAG gives you reasoning plus structure. They are not competitors. They are complementary.
Pure vector search and pure tree navigation are both extremes. The right pick depends on the documents, not on the hype. Long structured filings: vectorless. Mixed knowledge base: vector. If you have a big system that holds both kinds of data, it is better to go with the hybrid approach. Production systems are going hybrid, and many companies have started using both techniques together.

I hope you liked this video. This was all about vectorless RAG versus traditional RAG.
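The decision rules from the video can be condensed into a small helper. Treat it as a rough sketch rather than a rule book: the `Workload` fields, the `MAX_TREE_DOCS` cutoff of 10,000 documents, and the choice to return "hybrid" when structured, reasoning-heavy documents meet large scale or tight latency are assumptions made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    num_docs: int
    structured: bool        # documents have a table of contents or clear sections
    needs_reasoning: bool   # cross-section reasoning or auditable answers required
    latency_critical: bool  # for example a chatbot or a search box


# Assumed cutoff for "tens to thousands of documents"; tune for your own system.
MAX_TREE_DOCS = 10_000


def choose_rag(w):
    """Return 'vectorless', 'hybrid', or 'traditional' for a workload."""
    tree_friendly = w.structured and w.needs_reasoning
    if tree_friendly and w.num_docs <= MAX_TREE_DOCS and not w.latency_critical:
        return "vectorless"
    if tree_friendly:
        # Structure and reasoning matter, but scale or latency rules out a pure tree walk.
        return "hybrid"
    return "traditional"


print(choose_rag(Workload(200, True, True, False)))           # legal contracts
print(choose_rag(Workload(5_000_000, False, False, True)))    # mixed support tickets
print(choose_rag(Workload(5_000_000, True, True, False)))     # large filings archive
```

The order of the checks mirrors the video: first ask whether the documents are structured and need reasoning, then ask about volume and latency.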
That is it from my side. I will see you in the next video. Have a great day. Thank you all, take care. Bye-bye.