The video provides a rare moment of conceptual clarity by stripping away technical jargon to reveal the elegant logic of distributed event streaming. It is a perfect entry point for those who want to understand the "why" behind architectural decoupling before drowning in the "how."
Deep Dive
You will never forget Apache Kafka after watching this.
My name is Abishek and welcome back to my channel. If there is one topic that beginners find difficult, I would say it's Apache Kafka. So in today's video, I'll explain Kafka in very, very simple words. I'll take some real-world examples so that after watching the video, the concept of Kafka is very clear to you. And just like any other video on the channel, I have a detailed GitHub repo. I'll share the link in the description. After watching the video, go through the GitHub repository. It has the detailed notes and even the demonstration that I'm going to perform. So without wasting any time, let's get started.
So what is Kafka?
If I tell you Kafka is an event streaming platform and you are a beginner, that wouldn't make a lot of sense. So instead of understanding what Kafka is, let's try to understand why Kafka. Why do you need an event streaming platform? To understand this better, let's take the food delivery applications that we deal with on a day-to-day basis. It can be Uber Eats, Swiggy, Zomato, anything.
Now when you look at the life cycle of these applications, they deal with streams of events.
What does that mean? Imagine a user goes to the food delivery application and places an order. As soon as the user places the order, the first event is created: order created. And this event is followed by a stream of events. Once the order is created, the next event for the same order is that the order is placed with the restaurant. Then there is another event in the same stream: the restaurant has accepted the order.
Then the order is ready for preparation.
Then there is another event in the same stream: the order is prepared.
Then the order is ready to dispatch.
The order is ready for delivery. Right?
Similarly, the order is delivered and the order is complete. So what I'm trying to say is that on platforms like a food delivery application, every user action creates a stream of events. You can also see this stream of events in your mobile application: order is created, order is placed, order is accepted, all of the events that I talked about.
But Abishek, what is the significance of this? Okay, let's say there are streams of events. Why do I need an event streaming platform for that? Why can't I go with the traditional approach, just like how I deal with an e-commerce platform or a regular website?
Now just imagine if a developer deals with the same problem. Okay let's say a developer deals with a food delivery application and there is no event streaming platform. How would developer approach this problem? The obvious solution is APIs.
Now what happens when you're dealing with APIs?
There is a huge problem, because whenever a user places an order, at least 10 events are created, and these 10 events have to be communicated to multiple other microservices.
Taking the same example, the orders microservice has to communicate the information to the restaurant's microservice.
Every restaurant has an application or a microservice, and it also has to be updated when an order is created.
It has to know when the order is placed. It also has to know when the order is ready to be delivered and when the order is complete.
Similarly, the orders microservice has to pass this information to the delivery partner's application.
Every delivery partner would also have an application, and their system has to be notified as and when an event takes place. The same goes for customer support: if your order is delayed, you would call the customer center and ask for the status of your order. They need to have the status; only then can they tell you. On top of that, there can be an internal team, and even the internal team should be updated, because they might be tracking the order or counting the number of orders in a region per minute for telemetry.
Even they would need the event stream that is taking place.
Now consider a developer using APIs for this. If the orders microservice has to send this information to 10 different microservices, then we are talking about 10 events sent to 10 different microservices. That is 10 * 10 = 100 API requests, and that is only for one order. Now imagine a food delivery application during peak hours, when it might get a thousand orders per minute. Now we are talking about 100 * 1,000, so technically we are talking about 100k requests per minute.
Not only that, if a developer is dealing with APIs, there are a bunch of other challenges. The first one we already discussed: the number of API requests.
On top of that, the developer also has to deal with retries.
It is not guaranteed that every time you send an API request, the other microservice receives it. There is a possibility that the microservice has some issue. Maybe there is rate limiting or something else. So application developers also implement retries.
Let's say five retries for every request. Now, in the worst case, we are talking about 500k requests per minute.
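To make the arithmetic above concrete, here is a minimal sketch that reproduces the numbers from the example. The function name and its parameters are made up for illustration, and the 500k figure assumes the worst case where every request ends up being attempted five times.

```python
def api_requests_per_minute(events_per_order, downstream_services,
                            orders_per_minute, attempts_per_request=1):
    """Direct API calls the orders service makes when it has to notify
    every downstream service about every event itself."""
    return (events_per_order * downstream_services
            * orders_per_minute * attempts_per_request)


# One order: 10 events sent to 10 services.
single_order = api_requests_per_minute(10, 10, orders_per_minute=1)

# Peak hour: 1,000 orders per minute.
peak = api_requests_per_minute(10, 10, orders_per_minute=1_000)

# Worst case assumed in the example: each request attempted five times.
worst_case = api_requests_per_minute(10, 10, orders_per_minute=1_000,
                                     attempts_per_request=5)

print(single_order, peak, worst_case)  # 100 100000 500000
```

The count grows with every factor at once, which is why adding one more downstream service or one more retry gets expensive so quickly.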
On top of that, there is the API configuration.
It is possible that the restaurants microservice is written by a developer in such a way that it only accepts GraphQL.
Another one accepts REST API requests, and another accepts SOAP API requests. I'm just telling you the worst possible situation. So even the API configuration has to be taken care of. On top of that, what if this entire platform is on Kubernetes?
IP addresses can keep changing in Kubernetes, so the orders microservice has to keep the DNS information instead.
It has to keep the DNS information of each microservice.
And it is also possible that each microservice has SSL or TLS implemented. To keep this entire thing secure, you might have to implement Istio or mutual TLS, because there are multiple requests between the services and all of them have to be secure. The restaurants microservice can be somewhere, the orders microservice can be somewhere else, and the requests have to travel securely, so you need to implement mutual TLS throughout. So we are talking about a lot of complication. First, we are dealing with 500k API requests per minute, which itself is a huge challenge. On top of that, there are the other challenges with API requests. How do you address all of this? This is why APIs cannot be helpful when you're dealing with event streams.
Abishek, what are the other examples of event streams? Okay, one is the food delivery application that you know.
Another, even more challenging one is a stock brokering application.
Now imagine the complexity with stock brokering.
When you look at a food delivery application, there are at most 1,000 orders per minute. But during the peak hours of a stock brokering application, this would be 1 million requests.
And the same thing applies for stock brokering: order created, order placed, order accepted. Even a stock brokering platform has to send these events to multiple places. It has to update this information in different other services.
So here we might be dealing with 5 million API requests. Practically, it is not possible. Even if you try to do it, you would fail. Even the best of developers cannot deal with 5 million API requests per minute. The service would go down.
The same goes for platforms like Netflix.
Let's say in Netflix you are dealing with every single action: a user logs in or a user performs an activity. Again, these are series of events. Even Netflix cannot use APIs for this, because during peak hours, when there is a new movie or a new series, there would be thousands of user requests. In all of these cases, what you need is an event streaming platform, because APIs would fail.
But Abishek, how would an event streaming platform solve this problem? Okay, now the problem statement is clear. But how would it tackle this?
So take an event streaming platform like Kafka, because Kafka is the best event streaming platform.
What do you do? You would create a broker in Kafka, which basically means you create a Kafka instance. Don't be confused when I say Kafka broker. It's just like a Kubernetes node. When you create a Kubernetes cluster, you actually create the Kubernetes nodes.
Same as with Kafka.
So what a DevOps engineer would do or developer would do, they would create a Kafka broker and within the Kafka broker, they would create a topic.
Now it's all simple.
There are entities like the producer. Who is the producer? The producer is the one that produces the series of events. In our case, the orders microservice.
And then there can be one or multiple consumers. Who are the consumers? Consumers are the ones who read the events.
In our case, the restaurant's microservice, the delivery partner microservice, or the customer support microservice.
So the advantage with Kafka is that once the topic is created, you tell the orders microservice that this is the topic where it has to pass the information. A topic is just a virtual thing. You don't have to worry, it doesn't change. So once you give the developer of the orders microservice the name of the topic and the address of the Kafka instance, they would send the series of events to the topic, and the consumers will read the information from the topic.
There is no direct communication between producer and consumer, so the problem is eliminated. If there were direct communication between the producer and the consumers, the producer would have to make thousands or millions of API requests, and even then a consumer might not learn in time that an event has happened. Because the producer is making 1 million API requests, a consumer might get that information 30 minutes later. But in this case, each individual consumer reads the information from the topic.
Consumer one is reading independently.
Consumer two is reading independently.
Consumers 3, 4, 5, 6, it can be 100 as well.
All the producer does is this: if there are 10 events per order and, let's say, 1,000 orders, it takes these 10 events for each of the thousand user activities and posts them to the topic. Each individual consumer will go ahead and take that information.
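The decoupling described above can be sketched with a toy in-memory model. This is not Kafka code: the `Topic` class, the topic name, and the consumer names are all assumptions made for illustration. It only shows that the producer writes each event once, no matter how many consumers read it.

```python
class Topic:
    """A toy stand-in for a Kafka topic: an append-only list of events."""

    def __init__(self, name):
        self.name = name
        self.events = []

    def append(self, event):
        self.events.append(event)

    def read_from(self, position):
        """Return every event at or after the given position."""
        return self.events[position:]


topic = Topic("orders")  # hypothetical topic name

# The producer writes each event exactly once.
producer_writes = 0
for order_id in (1, 2, 3):
    for status in ("created", "accepted", "delivered"):
        topic.append({"order_id": order_id, "status": status})
        producer_writes += 1

# Each consumer reads the whole stream on its own; the producer is not involved.
consumers = ["restaurant", "delivery-partner", "customer-support"]
received = {name: topic.read_from(0) for name in consumers}

# With direct API calls, the producer would have sent every event to every consumer.
direct_api_calls = producer_writes * len(consumers)
print(producer_writes, direct_api_calls)  # 9 27
```

Adding a fourth consumer changes nothing on the producer side; only the number of independent readers grows.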
Now what if one of the consumers is not active? What if there is some issue with the restaurant microservice? That is not the responsibility of the producer.
That's the advantage of an event streaming platform. The producer says: my responsibility is to put the information in the topic, and I have done that. Other consumers were able to read the information. If you have a problem, I'm not going to retry and I'm not going to send the information directly to you.
The developer of the consumer microservice is going to fix the issue, and once the consumer service is back, it can go back to the topic and read the information. This is another advantage of Kafka. In Kafka you not only get real-time information. There is something called an offset. Right now you might find it confusing, but using the offset, a consumer that could not read the information at some point can go back in time and read it from the topic. It is as simple as that.
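Here is a minimal sketch of the offset idea, again as a toy in-memory model and not real Kafka client code. The `Consumer` class and the event strings are invented for illustration. The only point is that the consumer remembers the position of the next event it has not read yet, so an outage does not cause missed events.

```python
class Consumer:
    """Reads from a shared log and remembers how far it has got."""

    def __init__(self, log):
        self.log = log
        self.offset = 0  # position of the next event this consumer will read

    def poll(self):
        events = self.log[self.offset:]
        self.offset += len(events)
        return events


log = []  # toy stand-in for one topic
support = Consumer(log)

log.extend(["order created", "order accepted"])
before_outage = support.poll()

# The consumer is down here, but the producer keeps writing.
log.extend(["order prepared", "out for delivery", "order delivered"])

# Back online: reading resumes from the stored offset, so nothing is missed.
after_outage = support.poll()
print(after_outage)
# ['order prepared', 'out for delivery', 'order delivered']
print(support.offset)  # 5
```

Setting `offset` back to an earlier number and polling again would replay older events, which is the "go back in time" behavior mentioned above.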
But Abishek why do we call this as distributed event streaming platform?
Okay, I read on the internet that Kafka is a distributed event streaming platform. Whatever you explained so far is just an event streaming platform. Yeah. Again, the concept is simple. Just like with Kubernetes, in Kafka the DevOps engineers, or whoever is responsible for it, create a Kafka cluster instead of a single Kafka instance. Same thing, right?
You would not create a single-node Kubernetes cluster. Similarly, you would create a Kafka cluster with one, two, three, or 10 brokers, whatever is required for your organization.
Now a lot of things don't change even when you deal with multiple brokers. In this case a topic is spread across the brokers of your Kafka cluster. In the previous single-node Kafka cluster the topic existed on the one broker, but here it can span all of the brokers, because a topic is just a virtual thing.
Now for the producer, again, there is not a lot of change. The producer writes the information to the topic. The consumer reads the information from the topic. Then Abishek, what is the point of a distributed event streaming platform? If nothing has changed, what is the purpose of it? There is an advantage when you're dealing with scale, for example with a stock brokering application. As I told you, with a stock brokering application you might deal with 1 million users at a time. Then what you can do is create 10 Kafka brokers, and within a topic you can create partitions: partition one, partition two, partition three, partition four. And you can tell the producer to write the first 100k requests to partition one, the next 100k requests to partition two, and the next 100k requests to partition three. This way, within a topic, you have logically broken it into multiple partitions. Abishek, what is the advantage of this?
Now what consumers can do? They can create a consumer group.
Okay.
So just like how you create multiple pods of an application, take customer support in the stock brokering application as the best example. Customer support is receiving the information, so the developer of customer support can create 10 pods: customer support zero, customer support one, customer support two, and so on. Each pod will read the information from one partition. So partition 0 is read by customer support zero, partition one is read by customer support one. This is where the distributed Kafka system comes into the picture: when you're dealing with a huge number of events or user requests. Otherwise a single-node Kafka cluster is good enough for you. Go with a distributed event streaming platform, that is, a distributed Kafka cluster, when you're dealing with tons of user requests.
Now I'll repeat this topic, especially the distributed part. In case you are very new to Kafka, you might find it a little difficult to understand, so let me explain it again. Imagine there is a producer, and for easy understanding let's say there are three users: user one, user two, user three. The DevOps engineer has created a three-broker Kafka cluster. Ideally, a three-broker Kafka cluster is a highly available Kafka cluster. Just like a three-node Kubernetes cluster is considered a highly available Kubernetes cluster, the same is true with Kafka as well.
Three broker Kafka cluster is highly available Kafka broker cluster. Cool.
And then let's say there is a consumer group, that is customer support: the first pod or instance of customer support, the second instance of customer support, and the third instance of customer support.
Similarly, let's say there is internal system just like customer support. There is an internal application. Again, there are three instances of it.
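The way instances of a consumer group share partitions can be sketched as below. This is a simplified round-robin illustration with made-up instance names. In a real cluster Kafka itself decides the assignment, so treat this only as a picture of the outcome: within one group each partition has one reader, and every group gets its own full copy of the stream.

```python
def assign_partitions(partitions, members):
    """Spread partitions over the members of one consumer group, round-robin."""
    assignment = {member: [] for member in members}
    for index, partition in enumerate(partitions):
        member = members[index % len(members)]
        assignment[member].append(partition)
    return assignment


partitions = [0, 1, 2]

# Two independent consumer groups, each covering all partitions on its own.
support_group = assign_partitions(
    partitions, ["customer-support-0", "customer-support-1", "customer-support-2"])
internal_group = assign_partitions(
    partitions, ["internal-0", "internal-1", "internal-2"])

print(support_group)
# {'customer-support-0': [0], 'customer-support-1': [1], 'customer-support-2': [2]}

# With fewer instances than partitions, some instances read more than one partition.
print(assign_partitions(partitions, ["internal-0", "internal-1"]))
# {'internal-0': [0, 2], 'internal-1': [1]}
```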
Now let's say the first user writes 30 events. Every time an order is placed with the stock brokering company, let's say 30 events are created. So within this Kafka cluster, the DevOps engineer would create a topic, break this topic into three partitions, and assign each broker as a partition leader. So this is broker one, broker two, broker three.
Broker one is assigned as P0's partition leader. The same topic is broken into three parts: P0, P1, P2. The second broker becomes the partition leader for the second partition, and the third broker becomes the partition leader for the third partition. I'll tell you later what the advantage of a partition leader is.
Just understand this at this point. The second user's action would also have 30 events, and the third user's action would also have 30 events in a stream. So the first user placed an order, 30 events are created. The second user placed an order, 30 events are created. The third user placed an order, 30 events are created. Now the producer writes the first 30 events to P0, the next 30 events to P1, and the last 30 events to P2. That way every partition in the topic is kept active, instead of writing the whole thing to a single partition. That's all.
That is the advantage of partitions.
You can even implement the architecture without creating partitions. You can create just a topic with a single partition. But when you create partitions, the efficiency of Kafka will increase.
With a single partition, the producer writes all of this information to the topic directly. But when partitions are created, user one's requests are written to P0, user two's to P1, and user three's to P2. Even for consumers, it becomes very easy to read the information. So this is what we call a distributed event streaming platform.
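One common way to decide which partition an event goes to is to hash a key, such as the user ID, so that all events of one user stay together and in order. The sketch below assumes that approach and uses CRC32 purely as a stand-in hash; Kafka clients use their own hashing, and the key names here are invented. Note that, unlike the tidy one-user-per-partition picture above, a hash can place two users on the same partition.

```python
import zlib
from collections import defaultdict


def choose_partition(key, num_partitions):
    """Map a key to a partition; the same key always gets the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 3
partitions = defaultdict(list)

for user in ("user-1", "user-2", "user-3"):
    for event_number in range(1, 31):  # 30 events per user, as in the example
        partition = choose_partition(user, NUM_PARTITIONS)
        partitions[partition].append((user, event_number))

# All 30 events of a given user sit together in one partition, in order.
for partition, events in sorted(partitions.items()):
    users = sorted({user for user, _ in events})
    print(f"P{partition}: {len(events)} events from {users}")
```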
Abishek before we go to the demo are there any disadvantages with Kafka?
Obviously, whenever you are dealing with a new system, it comes with advantages and disadvantages. The disadvantage of Kafka is that if you don't handle it well, it becomes very, very expensive. Sometimes people use it even though it is not required. So whenever you're using Kafka, make sure you use the right number of topics, the right number of partitions, and the right number of brokers. This comes with experience, but this is the only drawback of Kafka.
Otherwise Kafka is very efficient. Kafka works very, very well. Only if you don't handle it well will you end up with cost-related issues. The best thing you can do when you are starting with Kafka is to go with managed Kafka solutions. I made videos on managed Kafka solutions, for example on Confluent. So you can go with these solutions instead of setting up a Kafka cluster on your own, just like how you don't set up a Kubernetes cluster on your own. You would go to GKE or EKS. In the same way you can go to these providers. They offer managed Kafka solutions, and when you go with them as a beginner, or even for your enterprise, you would find them cost effective. This is just a suggestion. If you want, you can set up Kafka on your own as well. Right. Finally, before the demo, what did we learn? We learned a lot of keywords. Let's try to put them in place. The first keyword that we learned is Kafka itself.
So what is Kafka? Now you can say Kafka is a distributed event streaming platform.
Then we learned about topics. What is a topic in Kafka? A topic is basically a virtual thing where the producer writes the information and the consumer reads the information.
Then we learned about partition.
What is a partition? Partitioning is basically splitting a topic into multiple parts. And why do you do that?
To use the topic in a very efficient way. If a producer is writing tons of information to the topic, consumers find it difficult to keep up. Instead you can break the topic into multiple parts and write some users' information to P0, some to P1, and some to P2. Even consumers will find it easier to read the information.
Then I told you about something called a partition leader. The partition leader is basically the broker that is given responsibility for the partition. Imagine there are three brokers.
If you give the first broker responsibility of partition one, then first broker becomes partition leader.
Abishek. What is the advantage of leader? I mean what is the responsibility?
It matters when you are dealing with an HA, highly available, setup, when you don't want to lose the information. At the end of the day, you are writing a lot of event-related information to Kafka. Let's say you don't want to lose that information.
The partition leader's, that is the broker's, responsibility is to store that information on storage.
It can be an S3 bucket, an EBS volume, or an EFS volume, whatever you configure with Kafka. So the partition leader, that is the broker, takes the responsibility of storing the information and duplicating it, that is, keeping a backup of the events.
So we learned about Kafka, topic, partition, partition leader, and broker as well. A broker is basically a node within the Kafka cluster, and you can create a multi-broker Kafka cluster. That's it. This information is good enough for you to get started with Kafka.
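The leader and backup idea from the summary can be pictured with a small planning function. This is only a sketch under stated assumptions: the broker names, the replication factor of 2, and the round-robin placement are all chosen for illustration, and a real cluster decides placement itself. It shows one broker leading each partition while other brokers hold the copies.

```python
def plan_partitions(num_partitions, brokers, replication_factor):
    """Give each partition one leader broker plus follower brokers for copies."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    plan = {}
    for partition in range(num_partitions):
        replicas = [brokers[(partition + i) % len(brokers)]
                    for i in range(replication_factor)]
        plan[partition] = {"leader": replicas[0], "followers": replicas[1:]}
    return plan


plan = plan_partitions(3, ["broker-1", "broker-2", "broker-3"], replication_factor=2)
for partition, roles in plan.items():
    print(f"P{partition}: leader={roles['leader']} followers={roles['followers']}")
# P0: leader=broker-1 followers=['broker-2']
# P1: leader=broker-2 followers=['broker-3']
# P2: leader=broker-3 followers=['broker-1']
```

With this layout, losing any single broker still leaves a copy of every partition on another broker.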
Now let's see a very simple demo. As I told you, you will find all this information in the GitHub repo, including the demo that I'm going to perform.
So please, after watching the video, go through the GitHub repo. It will also help you quickly. So, git clone https.
I hope the repository location is right. Github.com followed by Viraala, which is my username.
Kafka zero to hero. Okay. So this is the repository. You can find the link in the description.
Now once this is cloned let's head to the repo Kafka and this has all the information. In fact this has more information.
So if you go to the src folder, you will find application related information. If you are a devops engineer, you can skip it. But if you are an application developer, you can go through the src folder and see how the information is written. How I wrote the producer, how I wrote the consumer, etc. For the demo, let's start with creating a virtual environment. So what are we going to do? So we will create a producer. We will create Kafka instance.
We will create a consumer. And we will see how producer information is sent to the consumer. And how is this taking place? Live without making any API requests.
Perfect. python3 -m venv. Let's call the virtual environment venv.
Now once the virtual environment is created, let's activate it.
Don't do it without creating virtual environment. You might run into some conflicts.
And let's upgrade pip and also install the dependencies.
pip install --upgrade pip.
Let's install the dependencies. pip install -e . There you go. So now we have all the prerequisites done. Now let's start Kafka. If you see, I have a Compose file. This is a Docker Compose file that installs a single-node Kafka broker on my local machine.
So these are all the Kafka-related environment variables. What you have to focus on is the port. This is the internal port, but I'm exposing it on my localhost on port 8080. So once you have this container running, you will find the port mapped to 8080, and you can access Kafka on localhost 8080 within your browser.
And what is other information that is important for you? So we are using this image for Kafka user interface.
Perfect.
So we will run the Compose file: docker compose up -d.
Okay. Docker is not running on my machine. I'll quickly start it.
Cool. So now we have Docker running. I started it in the other tab, so you might not see it. But now when I run docker compose up -d, it starts in the background. We have the Kafka instance running.
Just wait for all the containers to be in the created state, and only then try to access Kafka from the browser.
There you go. Now let's head to the browser and quickly open localhost:8080.
Cool. So this is our single node Kafka cluster. In fact you can see it here. So it says if you go to the broker the broker count is only one. That is only one node Kafka instance.
And right now there are no topics. We haven't created any. If you look at consumers, there are no consumers either, and there are no producers. We just started the Kafka instance. Once the producer is active, we will go to the topics and see the event information here. You can also create a topic from here, but let's do it from the CLI.
Going back. So I'll go back to my terminal. Now let's create the topic through the CLI. So we will use this Python command.
Python examples live topic setup.l. As I told you, if you're not interested in this, just simply go back and create topic from the UI. But I'm trying to explain how things work in real time. So the name of the topic is order events live.
Okay, cool. Now the topic is also created. Let's start the producer.
Producer is the one that actually writes information to the topic.
Even producer code is in the examples.
So if you go to the examples, you have live producer.py file. If you want, you can read it. Again, only if you're a developer, you might understand it. If you're a DevOps engineer, I would highly recommend you to skip it because some of the logic you might find difficult to understand.
Cool. So how do we start the producer?
Just run the command python example live producer.py followed by write the information to this topic. Now this is the responsibility of the developer.
Whenever the infrastructure engineer creates Kafka instance and Kafka topic they have to share that information to the developer and developer would take the responsibility of writing the streams to the topic. Now you can see if we go back to the topic let me refresh. So we have order events live and if you go here go to the messages you will find bunch of messages written. Observe carefully.
So these are the event streams related to order one. In real time, you might not find them this sequential. This is the first user's order: look at the status here. Order is created, order is accepted, order is prepared, then order is ready to pick, finally out for delivery, and order is delivered. In real time you might not find it this orderly. You might find one event for one order and then one event for another order, because it is possible that all the orders are placed in parallel. But just for your understanding, I wrote a while loop in the Python code and placed all of the events in the loop. I did this intentionally.
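The looping producer described here might look roughly like the sketch below. This is not the code from the repository: the field names, the status spellings, and the use of a generator are assumptions for illustration, and the actual send to Kafka is left out so the example runs without a broker. It only shows how one order's events come out back to back, each with a running sequence number.

```python
import itertools
import json

# Statuses in the order they appear in the demo; spellings are assumed.
ORDER_STATUSES = ["created", "accepted", "prepared",
                  "ready_for_pickup", "out_for_delivery", "delivered"]


def order_events(first_order_id=1):
    """Endlessly yield the full life cycle of one order after another."""
    sequence = itertools.count(1)
    for order_id in itertools.count(first_order_id):
        for status in ORDER_STATUSES:
            yield {"sequence": next(sequence),
                   "order_id": order_id,
                   "status": status}


# Take the first 8 events: all 6 for order 1, then the first 2 for order 2.
events = list(itertools.islice(order_events(), 8))

# A producer would typically serialize each event to bytes before sending it.
payloads = [json.dumps(event).encode("utf-8") for event in events]

print(events[0])  # {'sequence': 1, 'order_id': 1, 'status': 'created'}
print(events[6])  # {'sequence': 7, 'order_id': 2, 'status': 'created'}
```

In a real system, events from many orders would be interleaved, which is the difference from the tidy sequence seen in the demo.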
Cool. Anyways, so this is how producer writes the information of all the event streams to Kafka broker. Right? Nothing is written to consumer. Now let's start the consumer. Okay? Now let's see once the consumer is started will consumer be able to read the information from the topic.
For that we will keep this session active. Okay, we will not disturb this.
What we will do? I will take a different tab. Okay, this tab is for producer.
This tab is for consumer.
Now I will start the consumer as well.
Save.
But I want to show you before starting the consumer within the consumer logic how I wrote to read the information from the topic. So this is a consumer logic. If you go I have provided the information of the topic and from this it is reading the information.
Perfect.
So now let's start the consumer, python examples.py, and provide the information of the topic. But Abishek, why are you providing it here? You already passed the information. That is the default here. You can override it if you want. So if you don't want this, you can skip it.
But if you want to override the topic name, you can do it from the command line arguments.
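A default topic that can be overridden from the command line is commonly done with `argparse`. The sketch below assumes a `--topic` flag and a default spelled `order-events-live`; both are guesses for illustration, since the transcript does not show the script's actual argument names.

```python
import argparse

DEFAULT_TOPIC = "order-events-live"  # assumed spelling of the demo topic


def parse_args(argv=None):
    """Parse consumer options; the topic falls back to the default if omitted."""
    parser = argparse.ArgumentParser(description="Toy consumer options")
    parser.add_argument("--topic", default=DEFAULT_TOPIC,
                        help="topic to read from (default: %(default)s)")
    return parser.parse_args(argv)


# No flag given: the default topic is used.
default_args = parse_args([])
print(default_args.topic)  # order-events-live

# Flag given: the command-line value overrides the default.
override_args = parse_args(["--topic", "another-topic"])
print(override_args.topic)  # another-topic
```

Calling `parse_args()` with no argument would read the real command line instead of the lists used here.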
Cool.
Oh, my bad. I did not uh activate the virtual environment. So, let me activate the virtual environment and rerun the command.
There you go. Now, consumer is running and you can see consumer starts reading the information. Right now, it is reading the information of 33. If you go, producer is writing the information of 33.
As soon as this goes to 34, go to the consumer, you will find information related to 34. So this is very very live. That's the advantage. Imagine if you are making API calls. This one single API call from producer to consumer might take 5 seconds. What does that mean? Consumer receives the information 5 seconds late. And what if it is making 500k API requests or 1 million API requests? Then once this API request is made, consumer will read the information 20 minutes later. But here with Kafka, see how live this is.
Sequence number is 237. Just switch the tab. Sequence number is 239 240. Right?
So it is very very real time. It is not even taking 1 second for the producer to send the information to consumer. All this is happening through Kafka broker.
One final time producer is writing information to the broker and consumer is reading information from the broker.
This is what we call as event streaming platform. And because in Kafka you can create multiple partitions and you can create multiple brokers. That's why we call Kafka as distributed event streaming platform.
I hope you found the video informative.
Please go through the GitHub documentation. I wrote it with a lot of care. Of course I took the help of AI to write the documents, but I have provided the information so that it is very beginner friendly. So go through the information in the GitHub repo and practice the demo so that you also get a feel for Kafka.
If you have any questions do let me know in the comment section. Thank you so much for watching the video. See you all in the next one.