SIRA is essentially an expensive query expansion wrapper that trades massive computational overhead for marginal gains in retrieval accuracy. Despite the "Superintelligent" branding, its fragility and high costs make it more of an over-engineered experiment than a scalable breakthrough.
Deep Dive
META’s New SIRA: Superintelligence RAG
Hello community. So great that you are back. Today we talk about superintelligence AI, and in particular a brand-new RAG system. And I know you might say: hey, in your last video you showed us a better, cheaper RAG based on neurosymbolic multi-hop reasoning, where we interact between a continuous vector space and a discrete topological space. Yes, I know. But now we have something special. We have a new article, published May 8, 2026 by Meta, and here the new department, the Meta Superintelligence Labs, gives us a new agent. But not just any agent: a superintelligent retrieval agent. And they claim that this is the next frontier of information retrieval in artificial intelligence. So I thought, okay, we have to have a look, we have to analyze this. Let's see what Meta is cooking.
Of course you have your GitHub. As you see, more or less one day after it was published, the repo was archived by the owner; it is read-only. Okay, it's an MIT license.
So let's take a step back. We have multiple dimensions of RAG. What you are familiar with is the classical one: dense retrieval, an information retrieval with OpenAI embeddings. This is a vector space, beautiful. We know how to calculate a mean-field approximation of a text. So what we do: we map a highly complex discrete semantic structure, human language, a paragraph, onto or into a low-dimensional continuous isotropic manifold. This is pure mathematics, no? This is phenomenologically useful for measuring the overlap, and the overlap we define mathematically with a cosine similarity.
And you notice, of course, there is a problem, because this inevitably destroys a little tiny bit of the real fine structure: the exact rare constraints, the technical jargon, sometimes the topological structures that define the absolutely precise information. Because guess what? This is a vector representation, we have a cosine similarity, and this is a statistical methodology.
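As a small illustration of the overlap measure just mentioned, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors are made-up numbers chosen for illustration, not embeddings from any real model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)

# Toy "embeddings" (made-up numbers, not from any model).
query = [1.0, 0.0, 1.0]
doc_close = [2.0, 0.0, 2.0]  # same direction as the query, different length
doc_far = [0.0, 1.0, 0.0]    # orthogonal to the query

print(cosine_similarity(query, doc_close))
print(cosine_similarity(query, doc_far))
```

Note that the measure only sees direction: two texts mapped to the same direction are indistinguishable, which is one way the fine structure gets lost.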
Now, since we know that dense retrievers compress the meaning into fixed-dimensional embeddings, maybe only a 1,000-dimensional embedding, they can miss information and require some iterative exploration. And the second trend, for about two years now, is to have agentic LLMs. They iteratively probe this particular text corpus, or information corpus, or data lake, whatever you have: they generate a query, inspect the retrieved snippets of text, update their working understanding of the context, and reformulate the next iteration of the search process. So this process, if you want, resembles a classical stochastic probing through a latent context space, and you can run this in a loop. This is what we do classically, and of course it is sometimes time-consuming and maybe expensive.
Let me give you an example. You have the question: why did Roman concrete survive longer than any modern concrete architecture? The first query by the LLM might be "Roman concrete composition", the chemical composition of this. And you will get back some pozzolanic ash, some seawater reaction, and some lime clasts. So the LLM agent realizes: okay, maybe seawater interaction is important, and I need a second query, where I now ask "pozzolanic ash seawater mineral formation durability". And you see, it just goes on. With whatever it gets, it constructs a more complex picture, and the retrieval process becomes an adaptive search rather than a one-shot exercise.
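The loop in this example can be sketched in a few lines. This is a toy under loud assumptions: the retriever is a simple word-overlap count, the "LLM reformulation" is replaced by a hand-written rule, and the three passages are invented for illustration. It is not the method of any particular agentic system.

```python
def overlap(query_terms, passage):
    """Number of distinct query terms that occur in the passage."""
    return len(set(query_terms) & set(passage.split()))

def agentic_search(question_terms, corpus, max_rounds=3):
    """Iteratively retrieve one passage per round and rewrite the query."""
    context = []
    query = list(question_terms)
    for _ in range(max_rounds):
        candidates = [p for p in corpus if p not in context]
        if not candidates:
            break
        best = max(candidates, key=lambda p: overlap(query, p))
        if overlap(query, best) == 0:
            break  # nothing new matches: stop probing
        context.append(best)
        # Stand-in for the LLM reformulation step: the next query is made of
        # the words in the new snippet that the previous query did not contain.
        query = [w for w in best.split() if w not in query]
    return context

# Invented toy passages.
corpus = [
    "roman concrete contains pozzolanic ash and lime clasts",
    "pozzolanic ash reacts with seawater forming durable minerals",
    "modern concrete relies on portland cement",
]
found = agentic_search(["roman", "concrete", "composition"], corpus)
for step, passage in enumerate(found, start=1):
    print(step, passage)
```

The second passage is only reachable because the first round surfaced "pozzolanic" and "ash", which is exactly the adaptive behavior described above, and also why each extra round costs another retrieval and another model call.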
Now, in really complex scientific domains, in medicine or even in finance, this can go on and on, and maybe you still miss the particular detail that you are looking for. So now let's look at Meta. Meta claims: now we have SIRA. SIRA is the superintelligent retrieval agent, and this is now the new frontier. So let's have a look. They say: you know what, stop this random walking, where random means the stochastic retrieval process, and compute the exact boundary condition mathematically before we do any execution. So we go, if you want, back to a one-shot trial-and-error attempt. So how does it do this?
SIRA uses the vast parametric memory of an LLM to generate an Ansatz. You know, this is the German word. This is the philosophical Ansatz idea of a spectrum. So simply, in our case, in AI, this is a missing vocabulary, or some aliases, or some counterarguments, or whatever sits in a closed environment, if we think about a vector space representation of our query vector. And now it says: okay, I have memory, I have an understanding as an LLM, as the core of an agent. I know exactly from my parametric memory, from what I have been pre-trained and fine-tuned on, what I expect it to be, what I expect the user wants to see. So this is an interesting approach: superintelligence is projection, what the AI thinks that the human wants to see or expects to see. So we have a strong polarization toward facts known to the LLM. We are not freely searching for new information; we are limiting our search process to expectations that the LLM already knows about, and it is now giving the human some feedback.
Now you might ask: how complex will this be? Turns out, not at all. We go back in time, to I think 1994 or 1995, I don't know exactly, and we operate with BM25, Best Match 25. This is the grandfather, from before there was any neural dimension. BM25 comes from statistics; it is a well-known algorithm.
So rather than working in a compressed dense vector space, like we do with vector embeddings, we operate the grandfather way, in a nearly infinite-dimensional space of sparse lexical retrieval, which is quite challenging if you code this and calculate it.
You say: okay, wow, and then we restore the fine-structure resolution. Now, in theory this is correct. As I told you, if we go into vector spaces and we have approximation and statistics, we will lose the absolutely fine-tuned technical jargon.
But to go back into an infinite-dimensional mathematical space and have our lexical keyword retrieval there, this is now an interesting approach. So I was interested in how Meta would continue this. And it tells me it expands both the query and the document with missing latent features, filters out the high-entropy noise, okay, and executes a deterministic single-shot interaction. So here we are.
Okay. So if you are not familiar with BM25, Best Matching, this is version 25. It is simply: give me a query and a pool of documents, a corpus of documents, and tell me which documents, out of say 10,000 papers, are most likely relevant based on the keyword evidence.
So this is the core style of retrieval used in the good old-fashioned traditional search engine, Google in the 20th century, before dense retrieval became known or dominant at all. Okay.
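To make the keyword-evidence idea concrete, here is a minimal sketch of BM25 scoring in plain Python. The parameter values `k1=1.5` and `b=0.75` are common defaults picked here as an assumption, the IDF is one common smoothed variant, and the three toy documents are invented.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score every tokenized document in `docs` against `query_terms`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF (the "+1" inside the log keeps it non-negative).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores

# Invented toy corpus, whitespace-tokenized.
docs = [
    "roman concrete used volcanic ash and lime".split(),
    "modern concrete uses portland cement".split(),
    "seawater reacts with volcanic ash in roman harbors".split(),
]
scores = bm25_scores("roman volcanic ash".split(), docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

Both the first and third documents contain all three query terms; the first ranks higher only because it is shorter, and the second scores zero because no query term appears in it. A synonym that is not literally present contributes nothing, which is the gap the expansion steps discussed next try to close.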
So here we have all three: the dense retrieval, then the agentic loop, and now the new SIRA solution. And you look at this and say: this is interesting. Dense retrieval is clear. In the multi-step agent we have the looping from the LLM: what it expects to see, what it gets back, and then it updates and reformulates the next round of queries. We get additional information, for example that seawater might be an interesting parameter for the old Roman concrete. Beautiful.
And in SIRA we also have LLM reasoning, but now we have the expected response sketch. Then we have a DF (document frequency) validation and pruning. Then the constraints and the weighted keywords come in. And then we have a BM25 retrieval. Okay, and it is a one-shot element.
So you see: the agent goes out into the environment, gets a little bit more clever every time it asks something, learns a little bit of new knowledge every time it goes around this loop, so that it really can answer a complex, multi-hop, context-dependent query. Now Meta tells us: no, we don't do this anymore, we go for a one-shot BM25 retrieval. Okay, let's have a look.
BM25 has quite a spectrum of variants, so let's go with a very specific one. This is, if you want, from Robertson: the dominant lexical ranking function in modern search systems. It scores a document D against the query Q by summing per-term contributions. So in the good old BM25 we have our IDF, our inverse document frequency, and we have our term frequency. SIRA now really abandons the standard continuous inner-product space, our dot product, our vector space approximation, our cosine similarity, and reverts to sparse retrieval.
Okay, so let's follow Meta. They say: since this is already known, SIRA modifies the input to this function dynamically, and they define three new input modifications. A: first, an offline corpus-side matrix expansion. Let C be the corpus of documents. An LLM evaluates each document d and predicts a set V of out-of-document discriminative vocabulary: synonyms or alternative names.
Okay. So we understand immediately that it is about out-of-document vocabulary, terms that are not in the document itself.
So we want to predict something that extends, note, the thematic topic a little bit outside the known boundary.
To ensure this added dimension is actually informative, SIRA applies an upper-bound filter. This is a document frequency filter demanding that a proposed term not exceed a particular threshold, whatever the threshold is. This prunes the generic vocabulary. Beautiful.
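The corpus-side step can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the proposed terms are hand-written stand-ins for LLM output, the function names are hypothetical, and `max_df_ratio` is a made-up form of the threshold, whose real value and exact definition are not given in the talk.

```python
from collections import Counter

def document_frequency(docs):
    """Count in how many tokenized documents each term appears."""
    return Counter(term for d in docs for term in set(d))

def expand_document(doc, proposed_terms, df, n_docs, max_df_ratio=0.5):
    """Append proposed terms that are new to the document and not too common.

    `max_df_ratio` is a hypothetical stand-in for the upper-bound threshold.
    """
    present = set(doc)
    kept = []
    for term in proposed_terms:
        if term in present or term in kept:
            continue  # only out-of-document vocabulary, no duplicates
        if df.get(term, 0) / n_docs > max_df_ratio:
            continue  # too generic across the corpus: prune it
        kept.append(term)
    return doc + kept

# Invented toy corpus, whitespace-tokenized.
docs = [
    "roman concrete used volcanic ash and lime".split(),
    "the modern mix uses portland cement".split(),
    "the seawater reacts with volcanic ash in roman harbors".split(),
    "the bridge deck needs regular inspection".split(),
]
df = document_frequency(docs)

# Hand-written stand-in for what an LLM might propose for docs[0].
proposed = ["pozzolana", "the", "tobermorite", "lime", "pozzolana"]
expanded = expand_document(docs[0], proposed, df, len(docs))
print(expanded[len(docs[0]):])
```

Here "the" is pruned because it occurs in three of four documents, and "lime" is skipped because the document already contains it. Note that the surviving terms have a document frequency of zero before expansion, so nothing in this step checks them against the corpus.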
B: the second one is an online query-side expected response sketch. As I showed you, we have a real human initial query, Q_original, and then the LLM, reading this Q_original, proposes a kind of expansion. This is Q_expected, consisting of maybe more domain vocabulary (think about the concrete example) and entity attributes that are absent from my human query but that the LLM thinks the human expects in the target document. This is quite a lot of guessing, no? So they say: okay, there is also an intersection applied here, and the terms have to lie within a defined interval.
Beautiful. And we have a lower bound that proves that the term physically exists in the corpus index, because we are indexing everything, thereby terminating hallucinated vocabulary before the execution. This is great. But remember A, the corpus side, where this might not actually take place. So, okay, we have the online query side: entity attributes absent from the query but expected. So we rely again on the parametric knowledge of this LLM, and hopefully this LLM has been trained massively on exactly your domain knowledge, let's say theoretical physics or finance or medicine. Otherwise this LLM would have no idea at all what is expected when a human queries this particular agent.
And then of course C: we have to have a superposition, because now we have B, so how do we integrate B into this process? With a superposition operator. We have a linear superposition, a plus, surprise surprise, of the BM25 score of our original query and, scaled by a hyperparameter w, the BM25 score of the expected query. This w is a scalar parameter controlling the amplitude of the expansion dimension, and we will come back to this a little bit later.
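The query-side validation and the linear superposition can be sketched together. This is a toy under stated assumptions: the score is taken to be BM25(Q_original, d) + w * BM25(Q_expected, d) as described above, the function names and `max_df_ratio` are hypothetical, the proposed terms are hand-written stand-ins for an LLM sketch, and the BM25 variant and its `k1`, `b` values are common choices rather than anything confirmed by the source.

```python
import math
from collections import Counter

def bm25(query_terms, doc, docs, k1=1.5, b=0.75):
    """BM25 score of one tokenized `doc` against `query_terms`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    tf = Counter(doc)
    score = 0.0
    for term in query_terms:
        if term not in tf:
            continue
        df = sum(1 for d in docs if term in d)
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        norm = tf[term] + k1 * (1.0 - b + b * len(doc) / avg_len)
        score += idf * tf[term] * (k1 + 1.0) / norm
    return score

def validate_terms(proposed, docs, max_df_ratio=0.5):
    """Keep terms whose document frequency lies in [1, max_df_ratio * N]."""
    kept = []
    for term in proposed:
        df = sum(1 for d in docs if term in d)
        if 1 <= df <= max_df_ratio * len(docs):
            kept.append(term)
    return kept

def combined_score(q_orig, q_exp, doc, docs, w):
    """Linear superposition: BM25(q_orig, d) + w * BM25(q_exp, d)."""
    return bm25(q_orig, doc, docs) + w * bm25(q_exp, doc, docs)

# Invented toy corpus, whitespace-tokenized.
docs = [
    "roman concrete durability study".split(),
    "pozzolanic ash seawater reaction".split(),
    "modern portland cement mix".split(),
]
q_orig = ["roman", "concrete"]
# Hand-written stand-in for an LLM's expected-response sketch;
# "tobermorite" is absent from the corpus and is dropped by the lower bound.
q_exp = validate_terms(["pozzolanic", "seawater", "tobermorite"], docs)

for w in (0.0, 0.5, 2.0):
    scores = [combined_score(q_orig, q_exp, d, docs, w) for d in docs]
    best = max(range(len(docs)), key=lambda i: scores[i])
    print(w, best)
```

With w at 0.0 or 0.5 the document matching the original query stays on top; at 2.0 the document matching only the expansion terms overtakes it. In this toy, at least, the choice of w decides whether the human's wording or the model's expectation wins.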
Beautiful. So you're saying: okay, great, let's do this. So we now have a human search, and we rely on the parametric knowledge of the LLM. This is not a superintelligence; it is maybe just a classical 7-billion-parameter AI model. And then we know what the human maybe intends to learn, the expected outcome, as the LLM interprets the human intent.
Okay. So here again: the dense retriever, the multi-step retriever in the loop, and now SIRA. You see, a raw LLM generates an expected response sketch, and I would say this very first step already has a high variance, but okay. Then it validates the proposed terms against the statistics of the text corpus (hopefully you have theoretical physics massively represented in your text corpus), and then it compiles a controlled, good old grandfather BM25 query with some weighted keywords and constraints. Okay.
All without reading any retrieved passages from the real-world environment. We completely rely on the internal knowledge from pre-training. And just think about Gemini: Gemini has a knowledge cutoff date of January 2025. Full stop.
So you see why I am a little tiny bit skeptical that you can use this for science. But okay, there were a lot of other elements.
You remember GrepRAG, the study on the optimization of grep-like retrieval for code completion, where they ask how far a simple lexical retrieval can go in supporting code completion before a more complex retrieval mechanism becomes necessary. I think if you have a look at that study, you will understand that there are massive limitations. Yes, it is not a coincidence that I show you GrepRAG here. Anyway, this is our SIRA pipeline overview. We have the offline and the online component, with the predicted answer-document vocabulary, all based on our frozen LLM.
And now I think we should go to the definition, because now Meta defines what it means by superintelligence.
We all have an idea of what superintelligence is, and now comes the awakening. Meta defines superintelligence in retrieval as the ability to replace this multi-round process of agentic LLMs with a single expert-level retrieval action, and this takes four steps. First, form a domain-informed expectation, via the LLM, of what the relevant evidence looks like. So you ask the LLM: hey LLM, what do you think the answer might look like?
Um, Meta, honestly, do you think that this is information retrieval? Then, second, Meta goes on in the superintelligence definition: ground that expectation using the document frequencies of the corpus of documents that we have as a basis in our training data.
As I told you, Gemini has a January 2025 cutoff. Third, we compile the results.
And fourth, we execute everything efficiently and transparently. Beautiful.
Okay. So let us look at the results. Here you have the screenshot from the original study.
And you see, and now I have to smile a tiny little bit, recall@10 with BM25. I would say, let's put it this way, these are not the latest elements I would use in a benchmark, but okay. So we have chain of thought, then we have Search-R1, then we have GrepRAG, and compared to those, SIRA, as you see, almost always has a better result. But look closely: across all these beautiful benchmarks, in the average, the last column, chain of thought gives us 52%.
SIRA gives us, in the best case, 69%. So we might say: okay, this is a significant jump. But think about it, this was chain of thought. If we go with Search-R1, we jump to 61%.
So you see, we are in the same interval as SIRA, without all this additional indexing and whatever. So think about the real value jump, and we will talk about the cost in a minute.
So, okay, they found benchmarks where it performs almost always better than any of these older systems. And I mean, yes, Search-R1 is from August 2025, if you want to see it; this is the very latest version, version five. They had a completely different idea, with Google Cloud AI Research: they formulated a reinforcement learning objective function utilizing a search engine, and did an optimization on this expectation. So maybe a little bit more on the scientific side. Okay, but in general this is it. This is what Meta tells us. This is the new frontier in AI. You may or may not agree with me, but let's be scientific and ask: how can we build on this?
How can we improve on Meta? And what are the limitations of Meta's approach?
Now, first I would point out something about which Meta itself might say this is nonsense, but you have high compute cost, high offline compute cost, because SIRA requires feeding every single document in the corpus through an LLM to generate this V expansion offline. If you have 5 million documents, you can get an idea: even if you have only a 35-billion-parameter LLM, even if you go quantized, populating the inverted index has some compute cost. And the authors from Meta hand-wave this as something that will amortize over time. But if you are not working at Meta, if you are a normal user with a normal compute infrastructure, maybe you would say that this computational footprint is massive.
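To get a feeling for the order of magnitude, here is a back-of-the-envelope sketch. The document count and model size come from the example above; the tokens per document and the sustained throughput are pure assumptions picked for illustration, and the cost model is the usual rule of thumb of roughly 2 FLOPs per parameter per token for a dense forward pass.

```python
def offline_expansion_flops(n_docs, tokens_per_doc, n_params):
    """Rough forward-pass cost: about 2 FLOPs per parameter per token."""
    return 2.0 * n_params * n_docs * tokens_per_doc

n_docs = 5_000_000      # from the example in the talk
n_params = 35e9         # from the example in the talk (35B parameters)
tokens_per_doc = 500    # ASSUMPTION: prompt plus generated tokens per document
sustained_flops = 1e14  # ASSUMPTION: effective FLOP/s of one accelerator

total = offline_expansion_flops(n_docs, tokens_per_doc, n_params)
accelerator_hours = total / sustained_flops / 3600
print(f"{total:.2e} FLOPs, about {accelerator_hours:.0f} accelerator-hours")
```

Under these assumptions the one-time pass lands in the range of a few hundred accelerator-hours, and it scales linearly with corpus size, document length, and model size. Any re-indexing after the corpus or the model changes pays it again.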
Okay, then the hyperparameters. I told you we would come back to the scaling weight w, you know, from the superposition score equation. It and the frequency bound tau are asserted, well, just dropped in, without any rigorous analytical derivation. Of course: how sensitive is the phase space to these parameters? Because if tau is too small, we lose valid synonyms, and if it is too high, the original signal is just swamped. So how can we calculate this, maybe based on a dynamic entropy or a variance idea for the specific query? All of this is unanswered.
And then, I think, maybe one of the most important weaknesses: the entire architecture hangs on your particular LLM, let's say a Gemini with a cutoff date in April or in January 2025. It possesses some knowledge of the corpus domain, let's say physics or mathematics or finance. But if it is applied to highly proprietary data or to something novel (it is no coincidence that I use physics or medicine here, or new ideas in politics or in finance) that is absolutely underrepresented in the LLM's pre-training manifold as of January 2025, the expected response sketch by this LLM will output, allow me to use the word, garbage.
This means fundamentally breaking the retrieval symmetry that SIRA absolutely relies upon.
So careful, there are some heavy limitations if you want to go with SIRA. It is a new idea by Meta, okay.
Yes, I understand, we go back to an infinite-dimensional space here. Okay. But be aware of what the limitations are.
And finally, allow me to come back to the word superintelligence. I mean, if you look at it from a mathematical point of view, what this study is all about is, I would call it, a well-engineered (don't get me wrong, a well-engineered) bidirectional query and document expansion that is validated by a TF-IDF bound constraint, term frequency and inverse document frequency. I mean, this is it. This is the statistics that we used before neural networks were in vogue. So I would definitely say: hey, this is a nice idea.
But to call it superintelligent is something that Meta decided to go with.
Okay, but between you and me, how would I continue this idea? I think the idea to pick up is where the authors tell us the LLM produces an expected response sketch.
This is where Meta tries to look into the future: to integrate the expectation of a particular human being, understanding exactly the domain complexity and the domain knowledge of this human and of the query, and how this query is embedded in a higher, let's say, scientific or medical complexity, and then to have an interpretation or a prediction of future implementations. So you see what I mean? There are ideas like: how will a world model, where we have a physics engine implemented, maybe calculate some futuristic outcomes? But I think that an LLM producing, based purely on its parametric knowledge, a compact hypothesis of the concepts, the entities, and the discriminative items likely to appear in the relevant evidence is a little bit too soft a basis to claim scientifically that you have a superintelligent retrieval agent. If you disagree with me, please leave a comment. Absolutely, let's have a discussion. Anyway, it would be great to see you in my next video.