Elasticsearch achieves millisecond search across billions of documents through an inverted index, which organizes data around words rather than document IDs so that a query becomes a direct lookup instead of a linear scan. This is combined with immutable segments for fast reads, BM25 scoring for relevance ranking, and shards distributed across nodes for horizontal scaling. Together, these let queries skip scanning every record and read precomputed word-to-document mappings instead.
Deep Dive
Why Elasticsearch Is So Fast
If you run a substring search over a hundred million rows in a relational database, you will probably wait a while for the result.
The database has to scan every single row, character by character, just to check whether there is a match. This is an O(N) operation: it scales linearly with your data, so the more rows you have, the longer the scan takes. So how does Elasticsearch run the same query across a billion documents and return the exact match in 5 milliseconds? It doesn't scan your records at the moment you run the query. All the computational heavy lifting happens when you insert the data, not when you query it.
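As a rough illustration of why this is O(N), here is a minimal Python sketch of a naive substring scan, assuming the table is just a list of strings (a stand-in for a real database table; `linear_scan` is a hypothetical helper). Every row is examined no matter how few of them match:

```python
def linear_scan(rows: list[str], word: str) -> tuple[list[int], int]:
    """Naive full scan: return matching row IDs and the number of rows examined."""
    matches: list[int] = []
    examined = 0
    for row_id, body in enumerate(rows):
        examined += 1
        # Substring check against the whole text of the row.
        if word in body:
            matches.append(row_id)
    return matches, examined


rows = ["the quick brown fox", "brown foxes run fast", "the fox is quick"]
print(linear_scan(rows, "fox"))  # ([0, 1, 2], 3)
```

The `examined` count always equals the number of rows, so doubling the table doubles the work even if only one row matches.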
Under the hood, Elasticsearch uses Apache Lucene, which handles all of the indexing and searching logic. On top of that, Elasticsearch adds a distributed layer that spreads your data across many machines: it splits the data into pieces called shards, and each shard is a single Lucene index.
A typical relational database organizes records by primary key. Each row has an ID column and, for example, a body column that stores the text. The database itself does not know which words are stored in the body column.
Lucene handles data differently.
It organizes data around words, not IDs: each word points to a list of the documents that contain it.
Let's say we have three short documents.
Document one contains "the quick brown fox", document two contains "brown foxes run fast", and document three contains "the fox is quick". When you put these documents into Elasticsearch, it breaks each one into words and builds a table where every unique word is a row. The word "brown" appears in documents one and two; the word "fox" appears in documents one and three. For each word, Elasticsearch keeps a list of the documents it appears in. This is called a postings list, and it holds more than just document IDs. It stores the exact position of the word in each document, which is needed for phrase queries, where words must appear next to each other. It also stores the term frequency: how many times that word appears in that specific document.
When you search for "brown", Elasticsearch goes straight to the term dictionary, finds "brown", and retrieves its postings list. It doesn't scan the documents or compare text character by character.
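A minimal Python sketch of the structure described above, assuming a plain lowercase whitespace split instead of a real analyzer. Each term maps to a postings list of `{doc_id: [positions]}`, so the term frequency is just the number of positions, and a two-word phrase query checks for adjacent positions. `build_inverted_index` and `phrase_match` are hypothetical names, not Lucene APIs:

```python
from collections import defaultdict

# term -> postings list, where a postings list is {doc_id: [positions]}
Index = dict[str, dict[int, list[int]]]


def build_inverted_index(docs: dict[int, str]) -> Index:
    """Build term -> {doc_id: positions}; term frequency is len(positions)."""
    index: defaultdict[str, dict[int, list[int]]] = defaultdict(dict)
    for doc_id, text in docs.items():
        # Assumption: a plain lowercase whitespace split instead of a real analyzer.
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return dict(index)


def phrase_match(index: Index, first: str, second: str) -> list[int]:
    """Doc IDs where `second` appears immediately after `first`."""
    a, b = index.get(first, {}), index.get(second, {})
    hits = []
    for doc_id in a.keys() & b.keys():
        following = set(b[doc_id])
        if any(p + 1 in following for p in a[doc_id]):
            hits.append(doc_id)
    return sorted(hits)


docs = {1: "the quick brown fox", 2: "brown foxes run fast", 3: "the fox is quick"}
index = build_inverted_index(docs)
print(index["brown"])  # {1: [2], 2: [0]}
print(index["fox"])    # {1: [3], 3: [1]}
print(phrase_match(index, "quick", "brown"))  # [1]
```

Note that "foxes" in document two is stored as a separate term here, so a lookup for "fox" misses it.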
Another point is that the cost of this lookup barely grows with the total size of your index.
What matters is how many documents actually contain the word, which is the length of its postings list. So a rare word stays fast to look up even in a billion-document index, because only a handful of documents match it.
If you were watching carefully, you might have noticed a problem with the documents we just indexed. Document two says "foxes", which is plural, while document one says "fox", which is singular. Someone searching for "fox" expects to see both variants. We can solve this with an analyzer: before any text gets indexed, Elasticsearch runs it through one.
The analyzer pipeline has three stages. The first is character filtering, which cleans the raw string: for example, a character filter can strip out HTML and map characters like "&" to the word "and". Then the clean text goes into the tokenizer, which splits the string into separate tokens, usually on whitespace and punctuation. The last stage is token filters, which modify the tokens. By default, the only active token filter lowercases everything, but at this step we can also add filters that solve the fox/foxes problem.
Elasticsearch provides ready-made token filters for this: stemmers reduce words to their root form, turning "foxes" into "fox", and stop-word filters drop common words such as "and" and "is". One important detail in the whole process is that, by default, the same analyzer that processed your text at index time runs against your query at search time.
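To make the three analyzer stages concrete, here is a toy analyzer in Python. It is an illustrative sketch, not Elasticsearch's implementation: the regex-based HTML stripping, the ASCII-only tokenizer, the tiny `STOP_WORDS` set, and the suffix-chopping `naive_stem` are all simplifying assumptions (real stemmers such as Porter or Snowball handle far more cases). Running the same `analyze` function on both documents and queries is what makes "Foxes" match "fox":

```python
import html
import re

# Hypothetical stop-word list; real stop filters use much larger lists.
STOP_WORDS = {"a", "and", "is", "the"}


def char_filter(text: str) -> str:
    """Stage 1: clean the raw string (strip HTML tags, map '&' to 'and')."""
    text = re.sub(r"<[^>]+>", " ", text)  # crude tag stripping
    text = html.unescape(text)            # "&amp;" -> "&"
    return text.replace("&", " and ")


def tokenize(text: str) -> list[str]:
    """Stage 2: split on anything that is not an ASCII letter or digit."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def naive_stem(token: str) -> str:
    """Toy stemmer: strips a plural 'es' or 's' suffix only."""
    if len(token) > 3 and token.endswith("es"):
        return token[:-2]
    if len(token) > 3 and token.endswith("s"):
        return token[:-1]
    return token


def analyze(text: str) -> list[str]:
    """Run all three stages; stage 3 lowercases, drops stop words, and stems."""
    tokens = [t.lower() for t in tokenize(char_filter(text))]
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


# The same function runs at index time and at query time.
print(analyze("<p>Brown foxes &amp; the fox</p>"))  # ['brown', 'fox', 'fox']
print(analyze("Foxes"))  # ['fox']
```

Because `analyze` runs on both sides, a query for "Foxes" becomes the token `fox`, the same token stored for "brown foxes run fast".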
If you search for "Foxes" with a capital F, the query analyzer lowercases it and stems it to "fox". It then looks up "fox" in the inverted index, which matches the exact token stored when the document was indexed, even if that was months ago.
Finally, once the tokens are generated, they need to be written to disk. Instead of one big file, Elasticsearch writes your documents in chunks, and each chunk is called a segment. A segment is a small piece of the index that holds some of your documents, and your full index is just many segments stacked together. Each segment is immutable, which means it never changes once it's written, so you can't update or delete a document inside it in place. Because a segment never changes, its data can be heavily compressed.
The data can also be cached in memory (largely through the operating system's file system cache) without ever going stale, which is one reason reads are fast: the data comes from memory instead of disk. And since nothing ever modifies a segment, no locks are needed, so many searches can read the same segment at once.
Now you're probably wondering how deletion works. Elasticsearch just writes a tombstone: it records the ID of the deleted document, and at search time it retrieves the results and then filters out anything listed in the tombstone file. An update works the same way: the old version is marked as deleted and the new version is written to a new segment. The data is physically removed during a periodic background merge, which takes several segments and rewrites them into one larger, more optimized segment, dropping the documents marked as deleted along the way.
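A minimal Python sketch of the segment lifecycle described above: immutable segments, a tombstone set consulted at search time, and a merge that physically drops deleted documents. It rests on several simplifying assumptions: segments hold raw text and are searched by splitting it (a real segment carries its own inverted index), deletions are tracked in one global set rather than per segment as Lucene does, and `merge` rewrites every segment at once instead of letting a merge policy pick a subset. `Segment` and `SegmentedIndex` are hypothetical names:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    """An immutable chunk of the index: a sorted tuple of (doc_id, text)."""
    docs: tuple[tuple[int, str], ...]


@dataclass
class SegmentedIndex:
    segments: list[Segment] = field(default_factory=list)
    # Tombstones: IDs of deleted documents that still physically exist in a segment.
    deleted: set[int] = field(default_factory=set)

    def add_segment(self, docs: dict[int, str]) -> None:
        # New data always goes into a brand-new segment; old ones are untouched.
        self.segments.append(Segment(tuple(sorted(docs.items()))))

    def delete(self, doc_id: int) -> None:
        # No segment is modified: we only record a tombstone.
        self.deleted.add(doc_id)

    def search(self, word: str) -> list[int]:
        hits = [d for seg in self.segments for d, text in seg.docs if word in text.split()]
        # Filter out tombstoned documents after retrieving the hits.
        return [d for d in hits if d not in self.deleted]

    def merge(self) -> None:
        # Rewrite every segment into one new segment, dropping deleted docs.
        live = {d: t for seg in self.segments for d, t in seg.docs if d not in self.deleted}
        self.segments = [Segment(tuple(sorted(live.items())))]
        self.deleted.clear()


idx = SegmentedIndex()
idx.add_segment({1: "the quick brown fox", 2: "brown foxes run fast"})
idx.add_segment({3: "the fox is quick"})
idx.delete(1)
print(idx.search("fox"))  # [3]
idx.merge()
print(len(idx.segments), idx.search("fox"))  # 1 [3]
```

Until `merge` runs, document 1 still sits inside the first segment; only the tombstone hides it from results.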
Finding the documents is only half of the problem. If a query matches 1,000 documents, you want the best ones first.
So Elasticsearch ranks them with a scoring algorithm called BM25, which uses three factors to rank some documents higher than others. The first factor is term frequency: the more times a word shows up in a document, the more likely that document is about it. If the word "kernel" shows up once, the document might be relevant; if it shows up twice, it is probably more relevant. But this doesn't grow forever. Each extra appearance counts for less and less, a behavior called saturation: the 10th "kernel" adds almost nothing compared to the second.
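The saturation effect is visible in BM25's term-frequency factor. A minimal Python sketch, assuming a document of exactly average length (so the length-normalization term equals 1 and drops out) and Elasticsearch's default `k1 = 1.2`; `tf_factor` is a hypothetical helper name. The factor keeps growing with `tf` but never exceeds `k1 + 1`:

```python
def tf_factor(tf: int, k1: float = 1.2) -> float:
    """BM25 term-frequency factor for a document of average length.

    With length equal to the average, the normalization term is 1, leaving
    tf * (k1 + 1) / (tf + k1). It grows with tf but is capped at k1 + 1.
    """
    return tf * (k1 + 1) / (tf + k1)


for tf in (1, 2, 3, 10, 100):
    gain = tf_factor(tf) - tf_factor(tf - 1)
    print(f"tf={tf:>3}  factor={tf_factor(tf):.3f}  gain from this occurrence={gain:.3f}")
```

With these settings the second occurrence adds 0.375 to the factor, while the tenth adds only about 0.023.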
Another factor is length normalization.
If you're searching for "latency" and you find a match in a short tweet, that tweet is very likely about latency. However, if you find a single match in a 500-page database manual, the manual probably isn't about latency at all. In short, BM25 penalizes matches in long fields and rewards matches in short fields.
The last factor is how rare a word is across all your documents. Some words, like "data" or "server", show up in almost every document, so a match on them doesn't tell you much. A word like "kernel" appears in far fewer documents, so a document that contains it is more likely the one you're looking for. BM25 therefore checks how many documents a word appears in: the rarer the word, the more weight a match on it carries. Common words barely move the score, while rare words push it up a lot. This factor is called inverse document frequency.
Everything we've talked about so far happens inside a single shard, but when you have terabytes of data, one shard on one machine just can't handle it.
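Putting the three factors together gives the BM25 score. The sketch below is a straightforward Python version, assuming pre-tokenized documents and Elasticsearch's default parameters `k1 = 1.2` and `b = 0.75`. The IDF uses the Lucene-style form `ln(1 + (N - n + 0.5) / (n + 0.5))`, where `N` is the number of documents and `n` is the number containing the term. `bm25_score` is a hypothetical helper, and it recomputes corpus statistics on every call for clarity, which a real engine would not do:

```python
import math


def bm25_score(
    query_terms: list[str],
    doc_tokens: list[str],
    corpus: list[list[str]],
    k1: float = 1.2,
    b: float = 0.75,
) -> float:
    """Score one tokenized document against a query with BM25.

    `corpus` is the full list of tokenized documents; it supplies the document
    count, each term's document frequency, and the average document length.
    """
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    # Length normalization: > 1 for longer-than-average documents, < 1 for shorter.
    norm = 1 - b + b * len(doc_tokens) / avg_len
    score = 0.0
    for term in query_terms:
        tf = doc_tokens.count(term)
        if tf == 0:
            continue
        df = sum(1 for d in corpus if term in d)
        # Inverse document frequency: rare terms get a larger weight.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Saturating term frequency, damped by document length.
        score += idf * tf * (k1 + 1) / (tf + k1 * norm)
    return score


tweet = "latency spike again".split()
manual = ["database"] * 499 + ["latency"]  # a long document with one mention
other = "server data server".split()
corpus = [tweet, manual, other]
print(bm25_score(["latency"], tweet, corpus) > bm25_score(["latency"], manual, corpus))  # True
```

Both documents contain "latency" once, so the IDF is identical; the short tweet wins purely through length normalization.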
When you create an index, you choose how many primary shards to split it into.
When you add a document, Elasticsearch chooses which shard will store it. It hashes the document's ID to produce a number, then takes that number modulo the number of primary shards; the result is the shard the document goes to. Those shards don't all live on one machine: they are spread across the machines in your cluster, and each machine is called a node.
When you search, you usually have no idea which shard holds the match, so the node you send the request to becomes the coordinating node and performs a scatter-gather in two steps. The first step is the query phase. The coordinator sends your search to every shard at once, and each shard searches its own Lucene index and takes its best 10 results. It then sends those 10 results back, but each result contains only a document ID and a score, so the network isn't flooded with full documents.
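The routing rule from a moment ago (hash the ID, then take it modulo the shard count) can be sketched in a few lines of Python. This is an illustration under stated assumptions: `pick_shard` is a hypothetical helper, and `zlib.crc32` stands in for the Murmur3-based hash Elasticsearch actually uses (Elasticsearch hashes the `_routing` value, which defaults to the document ID), so the shard numbers will not match a real cluster's:

```python
import zlib
from collections import Counter


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Route a document to a primary shard: hash(ID) modulo the shard count.

    zlib.crc32 is a stand-in for Elasticsearch's Murmur3-based routing hash.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


# The same ID always lands on the same shard...
print(pick_shard("doc-42", 5) == pick_shard("doc-42", 5))  # True
# ...and many IDs spread across all shards.
counts = Counter(pick_shard(f"doc-{i}", 5) for i in range(10_000))
print(sorted(counts.items()))
```

One consequence of the modulo step is that the same ID lands on a different shard if the shard count changes, which is why Elasticsearch fixes the number of primary shards when the index is created; changing it later requires the split or shrink APIs or a reindex.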
The second step of a search is called the fetch phase.
Before it starts, the coordinator merges all those lists, sorts them, and picks the 10 best results overall. Now it knows exactly which 10 documents it needs, so in the fetch phase it retrieves the actual content of just those documents from the shards that hold them and sends it back to you. And that's basically how Elasticsearch works. Just remember: the analyzer and the inverted index do the heavy lifting at index time, and the distributed layer scales it out. That's how a billion documents collapse into a few milliseconds.
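The query-then-fetch flow can be sketched as a small in-process Python program. It assumes each shard is just a dictionary of precomputed scores and document bodies (a real shard scores the query against its own Lucene index at search time), and `Shard`, `query_phase`, and `search` are hypothetical names. The query phase returns only `(score, doc_id)` pairs; bodies are loaded only for the global winners:

```python
import heapq

# Hypothetical shard: doc_id -> (precomputed score, document body).
Shard = dict[str, tuple[float, str]]


def query_phase(shard: Shard, k: int) -> list[tuple[float, str]]:
    """Local top-k on one shard, returning only (score, doc_id), never bodies."""
    return heapq.nlargest(k, ((score, doc_id) for doc_id, (score, _) in shard.items()))


def search(shards: list[Shard], k: int = 10) -> list[tuple[str, str]]:
    # Scatter: every shard computes its own top k (remembering which shard it was).
    local = [(i, hit) for i, shard in enumerate(shards) for hit in query_phase(shard, k)]
    # Gather: merge the small lists and keep the global top k by score.
    best = heapq.nlargest(k, local, key=lambda item: item[1][0])
    # Fetch: load bodies only for the winners, from the shard that holds each one.
    return [(doc_id, shards[i][doc_id][1]) for i, (_, doc_id) in best]


shards = [
    {"a": (3.0, "doc a"), "b": (1.0, "doc b")},
    {"c": (2.5, "doc c"), "d": (0.5, "doc d")},
]
print(search(shards, k=2))  # [('a', 'doc a'), ('c', 'doc c')]
```

Asking each shard for only its local top k is safe: any document in the global top k must also be in the top k of its own shard, so the merged list always contains the true winners.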