A masterfully concise breakdown of architectural evolution that turns complex scaling theory into a clear, logical roadmap. It is an essential primer for understanding the structural backbone of modern high-traffic systems.
Deep Dive
From 5 Users to 1 Million: How Systems Scale (Step-by-Step)
Have you ever wondered why your website works perfectly with five users but suddenly crashes when a thousand or 10,000 users show up?
So, today we are going to understand how to build a system that scales from a few users to thousands to millions without breaking.
Let's understand this step by step.
Suppose this is you, and you have built a website.
This is your website.
On day one, you notice that only five users have visited your website.
Everything works perfectly.
Requests and responses are very fast, there are no errors, and users get a very smooth experience. Now on day two, let's say 10k users visit your website.
Now your website becomes slow.
Requests start failing.
And users start leaving because they are having a very bad experience on your website.
So, what went wrong?
Let's break it down.
Before we fix this problem, we need to understand the basics. The first one is the request.
Say this is the user and this is the server.
When the user asks the server for something, that is called a request.
For example, when you open Instagram, your app sends a request to the server, and the server sends the data back as a response.
The server is simply a machine that processes requests and sends responses.
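To make the request and response idea concrete, here is a minimal sketch of a server as a function that takes a request and returns a response. The `Request` and `Response` types and the `PAGES` data are made up for illustration; a real server would speak HTTP over the network.

```python
from dataclasses import dataclass


@dataclass
class Request:
    path: str  # what the user is asking for


@dataclass
class Response:
    status: int
    body: str


# Hypothetical data the server can send back.
PAGES = {"/feed": "latest posts"}


def handle(request: Request) -> Response:
    """Process one request and send back one response."""
    if request.path in PAGES:
        return Response(200, PAGES[request.path])
    return Response(404, "not found")
```

Calling `handle(Request("/feed"))` plays the role of the user opening the app and the server sending the data back.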
As for scaling: scaling means handling more users without slowing down or crashing.
That is the simple meaning of scaling.
Now let's see how a system evolves step by step. We will divide this into multiple stages based on the number of users. At the start, in stage one, we take only 100 users.
In the beginning, everything runs on a single machine.
So basically we have a monolithic structure.
We have only a single machine.
Your back end, whether it is Node.js, Django, Rails, or anything else, runs on that single machine.
The database and the server are all inside one system.
For example, let's say you have created a small blog website. What do you usually do? You keep everything on a single server,
because the number of users is very small.
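As a rough sketch of that single-machine setup, the class below keeps the app logic and the database in one process. The `BlogMonolith` name and the `posts` table are assumptions for illustration, and an in-memory SQLite database stands in for whatever database the blog would really use.

```python
import sqlite3
from typing import List


class BlogMonolith:
    """App logic and database living together in one process on one machine."""

    def __init__(self) -> None:
        # The database lives inside the same process as the app code.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")

    def create_post(self, title: str) -> int:
        cursor = self.db.execute("INSERT INTO posts (title) VALUES (?)", (title,))
        self.db.commit()
        return cursor.lastrowid

    def list_posts(self) -> List[str]:
        rows = self.db.execute("SELECT title FROM posts ORDER BY id").fetchall()
        return [title for (title,) in rows]
```

Every request, whether it runs logic or touches data, competes for the same CPU and memory, which is fine while traffic is small.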
Now, coming to the bottleneck.
As the number of users increases,
your system starts struggling. The CPU becomes busy, the memory fills up, and the database becomes slow.
So, how do we fix this?
We come to stage two.
In stage two, we are handling 100 to 1k users.
Now we separate the database from the main server.
So this is the app server,
and this is the database server.
The app server handles the logic, and the database server handles the data.
For example, let's say you are trying to log in. The app server checks the login logic, and the database stores the users' data.
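The login example can be sketched like this, with one class standing in for each machine. `DatabaseServer` and `AppServer` are hypothetical names, the "database" is just a dictionary, and in a real deployment the two would talk over the network rather than through a direct method call.

```python
import hashlib
import hmac
import os
from typing import Dict, Optional, Tuple


class DatabaseServer:
    """Stands in for the database machine: it only stores and returns data."""

    def __init__(self) -> None:
        self._users: Dict[str, Tuple[bytes, bytes]] = {}

    def save_user(self, username: str, salt: bytes, password_hash: bytes) -> None:
        self._users[username] = (salt, password_hash)

    def find_user(self, username: str) -> Optional[Tuple[bytes, bytes]]:
        return self._users.get(username)


class AppServer:
    """Stands in for the app machine: it holds the login logic."""

    def __init__(self, db: DatabaseServer) -> None:
        self.db = db

    @staticmethod
    def _hash(password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def register(self, username: str, password: str) -> None:
        salt = os.urandom(16)
        self.db.save_user(username, salt, self._hash(password, salt))

    def login(self, username: str, password: str) -> bool:
        record = self.db.find_user(username)  # data comes from the database
        if record is None:
            return False
        salt, stored_hash = record
        # The check itself is app logic and runs on the app server.
        return hmac.compare_digest(stored_hash, self._hash(password, salt))
```

Because the two roles are separated, each machine can be sized and tuned for its own kind of work.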
Now we increase the number of users from 1k to 5k.
You can see that we are increasing the number of users gradually: from 100 to 1,000, now 5,000, then 10,000, and then millions and billions.
So, at stage three, what can we do? We can add multiple servers here.
Let's say this is server one, this is server two, and this is server three.
Right?
But how do we distribute the traffic? Let's say this is the user, and the user wants a response. The question is, how do we decide which server a particular request goes to? This is where the load balancer comes into the picture. The load balancer takes each request and forwards it to one of the servers based on which servers are available.
There are L4 and L7 types of load balancers. We will discuss these in an upcoming lecture.
With the load balancer in place, traffic is distributed across multiple servers,
and the system becomes faster and more reliable.
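One simple way to spread requests is round robin, where each new request goes to the next server in turn. The talk does not say which strategy is used, so treat round robin here as an assumption; the `RoundRobinBalancer` class and the server names are made up for illustration.

```python
from itertools import cycle
from typing import List


class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def route(self) -> str:
        """Return the name of the server that should handle the next request."""
        return next(self._rotation)
```

With three servers, four requests in a row go to server one, two, three, and then back to server one, so no single machine takes all the traffic.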
So why are we using the load balancer?
We want high availability, which means eliminating the single point of failure. Let's say we have multiple servers behind the load balancer.
This is server one, server two, and server three, and server one crashes. The load balancer then sends requests to server two or server three, based on availability.
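The failover behavior can be sketched by letting the balancer skip servers that are marked as down. This is a simplified model: `FailoverBalancer`, `mark_down`, and `mark_up` are hypothetical names, and a real load balancer would detect crashes itself through health checks instead of being told.

```python
from typing import List


class FailoverBalancer:
    """Rotates through servers but skips any that are currently marked down."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        self.healthy = set(servers)
        self._next = 0

    def mark_down(self, server: str) -> None:
        self.healthy.discard(server)

    def mark_up(self, server: str) -> None:
        if server in self.servers:
            self.healthy.add(server)

    def route(self) -> str:
        # Try each server at most once per request, starting where we left off.
        for _ in range(len(self.servers)):
            candidate = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")
```

If server one is marked down, requests keep flowing to server two and server three, and users never see the crash.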
It also enables horizontal scaling. Let's say we have server one, server two, and server three. We can add more servers here, say server four and server five.
The load balancer is what makes this possible: because all traffic goes through it, we can keep adding servers behind it to scale horizontally, which in principle has no fixed upper limit.
Now we increase the number of users from 5k to 10k.
This is stage four.
We have come from stage one through stage two and stage three to stage four, the final stage, where the number of users has grown again. Now we can add caching here.
The database is still getting too many requests, so we introduce a cache. A cache stores frequently used data in fast memory.
For example, let's say you open Instagram. The first time, the user profile is fetched from the DB, where it is stored.
The second time, the user profile comes from the cache.
Instead of getting the data from the DB, the app gets it directly from the cache.
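The profile example follows what is usually called the cache-aside pattern: check the cache first, and only go to the database on a miss. In this sketch both the cache and the database are plain dictionaries, and `ProfileService` and the `db_reads` counter are made up for illustration.

```python
from typing import Dict


class ProfileService:
    """Reads user profiles, checking the cache before the database."""

    def __init__(self, database: Dict[str, dict]) -> None:
        self.database = database          # stands in for the database server
        self.cache: Dict[str, dict] = {}  # stands in for fast in-memory storage
        self.db_reads = 0                 # counts how often we hit the database

    def get_profile(self, user_id: str) -> dict:
        if user_id in self.cache:
            return self.cache[user_id]    # second and later reads: cache hit
        self.db_reads += 1                # first read: cache miss, go to the DB
        profile = self.database[user_id]
        self.cache[user_id] = profile     # remember it for next time
        return profile
```

Reading the same profile twice touches the database only once, which is exactly the load reduction the cache is there for. A real cache would also need expiry or invalidation so stale profiles do not live forever.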
This works because the cache is faster than the database.
Now we can also add a CDN here.
CDN stands for content delivery network. There are multiple CDN providers, for example Cloudflare or Akamai.
These companies provide CDN services. Let's say our website has images, videos, or other static content. To serve all of that, we can use a CDN.
A CDN delivers content from the server nearest to the user.
For example, take YouTube or Netflix: they load fast in large part because of CDNs.
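The "nearest server" idea can be sketched as picking the edge location with the lowest latency for a given user. The regions, edge names, and latency numbers below are invented for illustration, and real CDNs pick the edge through DNS or network routing rather than a lookup table like this.

```python
from typing import Dict

# Hypothetical round-trip latencies in milliseconds from each user region
# to each CDN edge location.
EDGE_LATENCY_MS: Dict[str, Dict[str, int]] = {
    "mumbai": {"edge-mumbai": 8, "edge-frankfurt": 120, "edge-virginia": 210},
    "berlin": {"edge-mumbai": 125, "edge-frankfurt": 6, "edge-virginia": 95},
    "new-york": {"edge-mumbai": 215, "edge-frankfurt": 90, "edge-virginia": 9},
}


def nearest_edge(user_region: str) -> str:
    """Return the edge location with the lowest latency for this region."""
    latencies = EDGE_LATENCY_MS[user_region]
    return min(latencies, key=latencies.get)
```

A user in Mumbai gets images and videos from the Mumbai edge instead of a server on another continent, which is where the faster loading comes from.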
Now, coming to the scaling comparison. We have vertical scaling, we have horizontal scaling, and we have stateless tiers. In vertical scaling, we increase the capacity of a single machine: its CPU, RAM, and storage. We still have one machine; we are just making it bigger.
In horizontal scaling, we add more machines.
As an analogy, I remember it like this: this is the edge,
and this one and this one get connected.
So in horizontal scaling, we are adding more nodes here.
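The difference between the two approaches can be put into rough numbers. The figures below are assumptions picked only to make the arithmetic concrete; they are not measurements from any real system.

```python
import math

# Hypothetical numbers, chosen only for illustration.
RPS_PER_SERVER = 1_500       # requests per second one ordinary server handles
BIGGEST_MACHINE_RPS = 6_000  # ceiling of the largest single machine available


def vertical_is_enough(target_rps: int) -> bool:
    """Vertical scaling works only while one bigger machine can carry the load."""
    return target_rps <= BIGGEST_MACHINE_RPS


def servers_needed(target_rps: int) -> int:
    """Horizontal scaling: how many ordinary servers must share the load."""
    return math.ceil(target_rps / RPS_PER_SERVER)
```

Under these assumptions, 5,000 requests per second still fits on one large machine, but 10,000 does not, while seven ordinary servers behind a load balancer would cover it.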
Now let's say everything is connected together.
The user's request first goes to DNS, which points it to the load balancer.
From the load balancer, the request goes to an app server, then to the cache, and finally to the database.
So this is how a real system actually scales from zero to millions of users.
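Putting the whole path together, the sketch below traces the hops one request takes. Every name in it (`example.com`, `lb-1`, `app-1`, `app-2`, the `u1` profile) is hypothetical, and each layer is reduced to a dictionary or a rotation so the flow stays visible.

```python
from itertools import cycle
from typing import Dict, List

DNS_RECORDS: Dict[str, str] = {"example.com": "lb-1"}  # domain -> load balancer
APP_SERVERS = cycle(["app-1", "app-2"])                # servers behind the balancer
CACHE: Dict[str, str] = {}                             # fast in-memory storage
DATABASE: Dict[str, str] = {"u1": "profile of u1"}     # source of truth


def handle_request(domain: str, user_id: str) -> List[str]:
    """Return the hops one request takes, in order."""
    hops = ["dns", DNS_RECORDS[domain]]  # DNS points the user at the balancer
    hops.append(next(APP_SERVERS))       # the balancer picks an app server
    if user_id in CACHE:
        hops.append("cache hit")         # data served from the cache
    else:
        hops.append("cache miss")
        CACHE[user_id] = DATABASE[user_id]
        hops.append("database")          # only a miss reaches the database
    return hops
```

The first request for a profile travels all the way to the database, while a repeat request stops at the cache, and the two requests land on different app servers.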
If you have any questions, please ask me in the comment section.
Thank you.