Airflow is a standalone orchestration tool that manages data pipelines across any cloud service (AWS, Azure, GCP) by using DAGs (Directed Acyclic Graphs) as the core workflow structure, where each DAG defines tasks, their execution order, dependencies, and success/failure handling, while operators serve as the building blocks that specify what actions each task should perform, enabling complex data flows with backfill support and multi-cloud integration.
Deep Dive
Data Engineering Add-On Program: Spark • Kafka • Airflow
Okay, let's get started. I hope my screen is visible.
In yesterday's session, we were talking about Airflow.
Now, what is Airflow?
Every cloud service has its own individual tool.
In GCP there is a separate tool that helps you execute and monitor your pipelines, and Azure and AWS have theirs as well.
But Airflow is a standalone tool.
It works well with any of these services, and any of these services can be called from it.
With Airflow we can orchestrate the entire project: when something has to execute, at what time, and which one has to execute first. And why are we going with Airflow?
A cloud's native orchestration may not work well for every task. That's why we go with Airflow. In Airflow we can build complex data flows as well. It comes with certain controls, and if you compare: it works with multi-cloud, you can have complex dependencies, and you get backfill support, which means you can run the pipeline for older data.
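To make the backfill idea concrete, here is a minimal sketch in plain Python. It is not Airflow's own backfill mechanism; it only shows which daily runs a backfill would have to cover. The function name `backfill_dates` and the dates are hypothetical.

```python
from datetime import date, timedelta


def backfill_dates(start: date, end: date) -> list[date]:
    """Return every daily run date from start to end, inclusive."""
    days = (end - start).days
    return [start + timedelta(days=offset) for offset in range(days + 1)]


# Hypothetical case: the pipeline goes live on 2026-05-22,
# but we also want results for the five days before that.
missed_runs = backfill_dates(date(2026, 5, 17), date(2026, 5, 21))

for run_date in missed_runs:
    print(f"would re-run the pipeline for {run_date.isoformat()}")
```

Each date in `missed_runs` stands for one run over older data that a scheduler with backfill support would execute for you.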
Now, in Airflow we have something called a DAG.
I can say the DAG is the heart of Airflow.
DAG stands for directed acyclic graph.
Let us consider one DAG to be one data pipeline, or one workflow.
Now, what does a DAG do? It basically tells Airflow what tasks to run,
which one to run first, that is, the order of the tasks,
and what to do if a task succeeds or fails.
This is exactly what we need to mention in the DAG. You can think of a DAG as one complete pipeline or workflow.
It tells you which task to run first, what to do after it, the order of execution, and what the process should be in case of failure or success. Now, let's take a simple pipeline.
In this simple pipeline, imagine I need to read the data, clean the data, load the data, and send an email.
Let's assume this is one pipeline.
In Airflow, this complete pipeline is called one DAG.
I'll also give it a name: say the DAG name is daily sales report, or sales pipeline.
Each step in it is called a task.
Task number one, reading data.
Task number two, cleaning data.
Task number three, loading data.
Task number four, send email.
So each step is called a task.
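The four steps above can be sketched as four plain Python functions. This is only an illustration of "one step equals one task", not Airflow code. The function names, the sample rows, and the in-memory `warehouse` list are all hypothetical stand-ins.

```python
def read_data() -> list[dict]:
    # Task 1: stand-in for reading rows from a source system.
    return [{"item": "pen", "amount": " 10 "}, {"item": "book", "amount": "25"}]


def clean_data(rows: list[dict]) -> list[dict]:
    # Task 2: strip whitespace and convert amounts to integers.
    return [{"item": r["item"], "amount": int(r["amount"].strip())} for r in rows]


def load_data(rows: list[dict], target: list[dict]) -> int:
    # Task 3: "load" into an in-memory list standing in for a database table.
    target.extend(rows)
    return len(rows)


def send_email(loaded: int) -> str:
    # Task 4: build the notification text instead of sending a real email.
    return f"daily_sales_report finished, {loaded} rows loaded"


warehouse: list[dict] = []
message = send_email(load_data(clean_data(read_data()), warehouse))
print(message)
```

In a real DAG each function would become its own task, so the scheduler can retry, log, and monitor each step separately.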
Now, why is it called directed? Always remember, there is a flow here. From reading data, it is directed to clean data.
From clean data, it is directed to load data. And from load data, it goes to send email. That means the complete pipeline has a direction:
from which task to which task it should go.
It is also called acyclic. Acyclic means it cannot loop back on itself.
It always has to move in one direction. It cannot form a loop.
That means after going from one to two, two to three, and three to four, I cannot go back to task number one. This cannot happen.
It can never form a loop. That is why it is called acyclic. The correct direction is 1 to 2, 2 to 3, 3 to 4, and 4 to 1 can never happen. That is one thing you need to remember. Now, if you want to learn Airflow: most of the time, the logic and the implementation, like which one has to execute first, is written in Python.
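The "directed" and "acyclic" rules described above can be checked with a small sketch using Python's standard `graphlib` module (Python 3.9 or later). This is not how Airflow itself validates a DAG; it only illustrates the same idea. The task names are the hypothetical ones from the sales pipeline example.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a task; its value is the set of tasks that must finish first.
pipeline = {
    "read_data": set(),
    "clean_data": {"read_data"},
    "load_data": {"clean_data"},
    "send_email": {"load_data"},
}

# A valid directed graph without loops has a clear execution order.
order = list(TopologicalSorter(pipeline).static_order())
print(order)

# Adding the forbidden edge 4 to 1 (read_data now waits on send_email) creates a loop.
looped = dict(pipeline, read_data={"send_email"})
try:
    list(TopologicalSorter(looped).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print("cycle detected:", has_cycle)
```

The first graph yields an order from task one to task four. The second cannot be ordered at all, which is why a pipeline with a loop is not a valid DAG.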
It is written in Python, okay? When someone has to write the logic, Python gives good flexibility, and you can easily define the DAGs and the tasks. So let's write one simple skeleton of how this Airflow code will work. I'm not going to write complete end-to-end code today. Today we will see simple code and how it works, so that you get the confidence to write Airflow code.
Let me write it in Python itself.
I won't write it in a Jupyter notebook, but I'll show you how the code behaves.
Let me open this and write one simple piece of Airflow code.
I'll start with from airflow import DAG.
Now I'll define some functions. I can define task one and write some logic there.
Next, I'll define task two, and again I can write logic there.
Then task three.
I'm just writing some comments here for now.
Let's say I take five tasks: task one, task two, task three, task four, task five.
Once I have these tasks, we can define one DAG.
How do you define one DAG? With DAG, open bracket, close bracket, as dag.
Inside this we have to provide all the information. For example, I'll give a DAG ID, followed by a description,
then a start date,
then a schedule.
Now, whenever I'm initializing a DAG, what do we mean by DAG? A DAG is like one pipeline.
Reading the data, cleaning the data, loading the data, and sending an email come together as one full pipeline in the DAG. So here I'm writing a simple DAG. What is the DAG ID? I'll give it as simple demo pipeline.
For the description, I'll say a DAG with four tasks.
Then the start date: a datetime, and I'll have to import datetime for this.
The datetime is 2026, 05, 22, so I'm starting this from today, 22-5-2026. Now, the schedule.
Here we can give daily.
Like this we can define the DAG. Once you design a DAG, this complete flow is the DAG. Now, another important topic is something called operators.
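The skeleton described above could look roughly like the DAG definition file below. Treat it as a sketch under assumptions, not the exact code from the session: it assumes Airflow 2.4 or later (where the `schedule` argument is accepted), the import path for `PythonOperator` differs in newer releases that ship it in the standard provider package, and the function names and task IDs are hypothetical. It needs an Airflow installation to run.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def read_data():
    # Placeholder logic for task 1.
    print("reading data")


def clean_data():
    # Placeholder logic for task 2.
    print("cleaning data")


def load_data():
    # Placeholder logic for task 3.
    print("loading data")


def send_email():
    # Placeholder logic for task 4.
    print("sending email")


with DAG(
    dag_id="simple_demo_pipeline",
    description="A DAG with four tasks",
    start_date=datetime(2026, 5, 22),
    schedule="@daily",
) as dag:
    read = PythonOperator(task_id="read_data", python_callable=read_data)
    clean = PythonOperator(task_id="clean_data", python_callable=clean_data)
    load = PythonOperator(task_id="load_data", python_callable=load_data)
    notify = PythonOperator(task_id="send_email", python_callable=send_email)

    # Direction of the flow: 1 to 2, 2 to 3, 3 to 4.
    read >> clean >> load >> notify
```

The `>>` lines are where the order of execution is declared; everything above them only says what each task does.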
Okay, we have operators. Understanding Airflow is very easy.
Where did we start?
We said there is no need to depend on any one cloud's tool. You can go directly with Airflow. It is a standalone tool for orchestrating and running your pipelines, irrespective of which cloud you're using: AWS, Azure, or GCP. From there we learned that it can handle complex data flows as well, and we have seen why Airflow is important.
And who is the main hero of this movie?
The main hero is the DAG. One complete pipeline is what we call a DAG. What does the DAG tell us? When to run, what to do, and what the order of the tasks is.
This one has four tasks and runs on a daily basis. Like this I have to mention when it should run and what has to happen on success or failure. As we took a simple pipeline, reading the data, cleaning the data, loading the data, and sending the email, all of it together is called one DAG, and each step is called one task. This is one step, this is one step, this is one step.
Apart from that, we have something called operators.
If you are aware of these two concepts, you will be able to design most things. Now, what are operators?
Operators are the building blocks used to create tasks in a DAG.
In short, the operator tells us what action to perform.
This is the hierarchy.
The DAG is the full pipeline.
A task is one step in the pipeline.
And the operator is the type of work done by that particular task.
We have already defined a DAG here, and we have already defined the tasks.
We can also say when to execute each task. But in order to do that, we need operators. The operator tells you what you're going to do in that particular task: the type of work done by the task. There are any number of operators in Airflow, but I'll give you the most commonly used ones, particularly for data engineering pipelines. This tool is not only for data engineering. It can be used for anything: data science models, data analytics, everywhere. But we will focus only on our role. It is a tool. To give an example, take scissors. Scissors are a tool.
Scissors can be used for any number of use cases. A barber uses them to cut your hair. At home, you use them to cut open packets. So scissors have different usages. Similarly, a tool will be used by different people at different times for different use cases. What we are going to look at is how Airflow helps with our pipelines.
That is what is important. Some of the commonly used operators are the Python operator,
the email operator,
and the S3 list operator.
I don't want to write them all here. We will go and see them in the documentation.
Let's look at Airflow.
Airflow has documentation here.
This is the Apache Airflow documentation. It is a platform created by the community. Let me go to the documentation.
Here you can see the command line interface and all the other sections.
Now, before we come to the operators: I always get this question.
What is a good source for learning any new technology?
Honestly, I always believe in learning from documentation. If you go to YouTube, LinkedIn, or articles, they may not give you complete information.
And something I will say is that half knowledge is worse than no knowledge. You know why? With half knowledge, you end up making disastrous decisions. If I don't know something from scratch, fine, I'll accept that. But with half knowledge, you mess things up.
This usually happens. So if you want to learn something in detail, I strongly believe in learning from the official documentation, for any tool. A product is never released to the market without official documentation. Please remember this. Tomorrow, if any new technology comes, don't go directly to YouTube or online courses. Look for the official documentation. It will always help you get the complete details. On the Apache Airflow site you can see use cases, announcements, blogs, documentation, and meetups, and you can see which versions exist. For example, right now we have a stable version. And what is Airflow? You can see: an open source platform for developing, scheduling, and monitoring workflows. Honestly speaking, you can learn everything from here.
You need not even hear all this from me. But unfortunately, you will not be able to implement it single-handedly if you're using it for the first time. That is why we believe in learning from someone who has already done it.
So you can go through it here: what Airflow is and what we can do with it.
You can also see how to install it and what the UI looks like. For each and every run, I can see whether it succeeded or failed, and in case of failure, what the errors are. Each row tells us the DAG ID, the schedule, and the status. You can apply filters. For failed runs, we can filter on failed and check. We will be doing all of this anyway. This view also shows on which day a run was executed: red means failure, green means success. Now, why are we here? We were looking at the operators. So where are the operators?
We have the operators here.
If I click on operators, it says here is a list of operators and hooks that are available in this release. Note that commonly used operators and sensors, such as the Python operator and the external task sensor, are provided by the standard package. Airflow has many integrations available.
So, all of these are operators.
Assume that we are working with Azure.
In Azure, for Microsoft Azure Blob Storage, we can see airflow.providers.microsoft.azure.operators, and if I click this, you will get to know when to use it.
We have to provide a container name, a blob name, and all those details. If you give these, it will automatically go and call this function. Now, let's go to ADF, Azure Data Factory.
For Azure Data Factory, the operators are in airflow.providers.microsoft.azure.operators.data_factory. There are operators, hooks, and sensors, and each one has a different function. If I click this, it tells me what it can do: we can execute a Data Factory pipeline.
So there is the Azure Data Factory run pipeline operator. If I click it, it tells me what to give: the pipeline name,
the resource group name, the factory name, any other reference, when it should start, when it should end, and what to do in case of failure. If I provide all of this, it's more than enough for us.
Since we know Azure, we can see that all the services are there: Batch, Blob Storage, Compute, Container, Cosmos, Data Factory, Synapse, and even Microsoft Power BI. For Power BI we have operators, hooks, and guides. If I go to the operators, you can see one that refreshes a Power BI dataset, one that gets a list of all the workspaces, and one that gets a list of all the datasets. All of these are called operators. And I'm telling you, even I don't remember all the operators, and there is no need to.
There could be hundreds of operators. And of course we all make spelling mistakes and typos. So it is always good to refer to the documentation. The Airflow documentation is always on my screen when I want to use any operator, so that I know the syntax. Now, whenever I am using the Data Factory operators,
this is what it looks like.
See here: the Data Factory run pipeline operator.
What is the task ID? What is the pipeline name? If it has any parameters, you give them. Is there anything I have to wait for? Like this we can provide all the details. This is called one operator, and the operator tells us what needs to be done. For the Azure Data Factory run pipeline operator, we give all the pipeline details. See, very easy. You just need to copy this code,
build your own Python code,
and fill in your details: the task ID, the pipeline name, all of this. So this is called an operator.
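As a rough idea of what that copied snippet looks like once the details are filled in, here is a sketch. It assumes the `apache-airflow-providers-microsoft-azure` package is installed and a Data Factory connection is configured in Airflow. The pipeline, resource group, and factory names and the `parameters` values are hypothetical placeholders; check the provider documentation for the full argument list of your installed version.

```python
from airflow.providers.microsoft.azure.operators.data_factory import (
    AzureDataFactoryRunPipelineOperator,
)

# Meant to sit inside a `with DAG(...) as dag:` block.
run_adf_pipeline = AzureDataFactoryRunPipelineOperator(
    task_id="run_adf_pipeline",
    pipeline_name="my_pipeline",                  # hypothetical pipeline name
    resource_group_name="my-resource-group",      # hypothetical resource group
    factory_name="my-data-factory",               # hypothetical factory name
    azure_data_factory_conn_id="azure_data_factory_default",
    parameters={"source": "sales"},               # hypothetical pipeline parameters
    wait_for_termination=True,                    # block until the ADF run finishes
)
```

The task ID says which step this is in the DAG; the rest of the arguments say which Data Factory pipeline to trigger and how to wait for it.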
Like this you can use a Python operator, an email operator, an S3 operator, and the list keeps growing. That's why I keep saying: when you know Python, when you know the libraries, and when you know the documentation, that's it.
That is Airflow. I'm writing in a plain notebook, but Airflow also has a good UI.
Look at the UI overview.
The Airflow UI provides a powerful way of monitoring, managing, and troubleshooting. You can see a dark theme by default, but it seems you can also have a light theme. It's open source, so anyone can use it, but you need to configure it well. And as I said, it is standalone, it can be integrated with any tool, and you can monitor everything. Since it comes with a user interface, you can filter and see what has failed and why, with error logging and email notifications. If something in Airflow has failed, I immediately get a notification. For example, when a pipeline runs at 1:00 or 2:00 in the night and something goes wrong, I won't be in front of the screen. My machine will be shut down. So how will I get to know? If a pipeline fails at 2:00, I can get an email notification.
That email notification is sent to my inbox, and I'll probably see it on my mobile. In the worst case, if I need to react immediately, I can come online and fix it. So it can give you notifications, status, details, workloads, everything.
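One common way to ask for these failure emails is through a `default_args` dictionary shared by all tasks in a DAG. The sketch below is plain Python, so it runs without Airflow. The keys shown are standard task arguments in Airflow 2.x, though newer releases may prefer notifier callbacks, and emails only go out if SMTP is configured. The address and the owner name are hypothetical.

```python
from datetime import timedelta

# Hypothetical on-call address; replace with a real mailbox.
ON_CALL_EMAIL = "data-oncall@example.com"

default_args = {
    "owner": "data-engineering",
    "email": [ON_CALL_EMAIL],
    "email_on_failure": True,   # send a mail when a task finally fails
    "email_on_retry": False,    # stay quiet while retries are still running
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

# In a DAG file this dict would be passed as DAG(..., default_args=default_args).
print(default_args["email"], default_args["retries"])
```

With settings like these, a task that fails at 2:00 in the night is retried twice, five minutes apart, and only then triggers the email.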
If I do everything without this, say only in Azure, I need to monitor separately for Data Factory, separately for Databricks, and separately for pipelines, so mapping everything becomes a bit hard. Airflow becomes a one-stop shop for all of your flows. That's why I say this one tool can be much better than the other tools available in the market. Of course there are a lot of competing tools, but this one gives you a lot of flexibility: you can build your own DAG, your own tasks, and your own operators. It's all about writing the logic.
As I said: building a DAG, building a task, and building an operator. You need not remember every operator. Always check the documentation and figure out what you need for the pipeline you're building. Let me show you one simple project.
This is one project. The project name is Optum.
We built this project using Azure services. It is a healthcare project. You can see that some data comes from MySQL, some comes from blob storage, and it covers all the data required for Optum: disease groups, patient subgroups, everything. We ingest it using Azure Data Factory. Then it goes through the layers of the Medallion architecture: bronze, silver, and gold. Finally the data lands in an Azure SQL database. If I count, we are using Cosmos, Azure Data Factory, Blob, Azure SQL Database, Logic Apps, Key Vault, and Azure DevOps. All of these are services. So in the operators list, you can find the operator you are looking for.
What am I looking for?
Azure, correct?
See, for Amazon, it has a lot of service support.
And you can see this for Microsoft Azure.
Airflow has limited support for Azure, but even so, it's still possible to build entire pipelines here.
If you look at the Azure provider, you can see which services it supports.
Cosmos is there, Synapse, Data Lake Storage, Data Factory. All of this we have.
We can use any of these for our use case.
Let me go to the operators. Now you can see:
creates a new object from the passed data, deletes the files in a specific path, lists all the files in a specific path. Like this we have operators. For every functionality there is an operator,
and with these operators we should be able to achieve what we need.
So these three things, the DAG, the task, and the operator, are very important when building an Airflow pipeline.
They handle all the complexity: adding dependencies, what to do in case of failure, what to do in case of success, and notifications. With this, you should now have a complete idea of what Airflow and Kafka are. The two are not related to each other. Each is good at its own job. Kafka is very good at real-time streaming, whereas Airflow is good at orchestrating, scheduling, and monitoring pipelines.
So these two will definitely be a value add for data engineers. A data engineer with Azure, AWS, or GCP services can add them as standalone experience. Now, I actually have an example to share.
There was a resume that we wanted to pick, for a sales executive.
Sales executive is one of the roles we wanted to hire for. Nowadays, in most companies, resumes are filtered with AI tools.
The AI tool actually picks the resume.
It checks whether a particular resume matches the job description. For this resume, it came up with a 92% score.
What is this 92% score? The JD and resume matching score.
This actually happened in our organization. That's why I'm sharing it.
So the JD and resume matching score was 92%. With a score that high, we would probably pick this resume,
because it has the highest match on most of the features.
On technical tools, both resumes were the same.
That means whatever technical things are needed for a sales executive, like knowing Excel, knowing how to do data entry, knowing where to go and how to go, and having a well-planned structure, both had them.
All of this is a requirement for a sales executive.
Now, you will not believe it: in the second resume, the person had added one more component. Usually we don't do it, but this guy had done it.
Apart from the usual resume content, he had mentioned hobbies.
Usually we don't mention anything like hobbies, meaning what you do apart from your work. And this guy had written: I enjoy traveling.
We sent these resumes to our AI tool, which should pick one resume. Both resumes were getting a 92% score, so I could pick either one.
But my AI tool was actually recommending this one.
We were all wondering, how come?
Both resumes have the same sort of skills. So why is it giving resume two as the preferred one, priority one?
Then we observed it. See, we are brilliant,
and the AI also thinks it is brilliant.
You know what it is doing?
It assumes that a sales executive has to travel to many places for business.
He has to sell the product, sell our business.
Imagine we are developing our own app. He has to travel and deliver presentations to clients. One client will be in Chennai, one in Goa, one in Delhi, one in the Maldives.
So a sales executive has to travel.
The AI tool sees that he has mentioned enjoy traveling in his hobbies and that sales also involves traveling, so it links the two.
Of course, traveling for personal reasons, as a solo traveler, is not equivalent to traveling for an organization or a business. But it links them.
Why is it matching? Because it considers traveling as one additional skill.
We humans can tell the difference, but the AI tool miscalculates here.
Enjoying traveling is treated as enjoying traveling for work.
If you look at certain JDs, they would have mentioned ready to travel.
Probably I can show you this.
Let me open LinkedIn and open any job, say a sales executive.
Somewhere it would mention traveling, frequent travel, something like this. Okay, maybe here they have not given it.
See here: key qualifications, strong understanding of digital marketing, essential communication.
Preferred candidates from Bangalore only, or willing to relocate. In this one we don't have it, but a requirement could say: preferred candidates from Bangalore, willing to travel. There will be one such point.
That job description gets mapped to his hobbies.
That is roughly what happened in our real case.
See how it changed the complete context.
This person mentioned traveling as a hobby in the resume, the job description said ready to travel, and the two got linked.
That is what AI tools will do.
Now, why have I told you this example?
Take a data engineer with one complete cloud, let's say Azure.
Compare a data engineer with Azure and a data engineer with Azure plus Kafka plus Airflow. The second one has more weightage.
Because apart from your Azure services as a data engineer, you are showing additional tools that you know.
So tomorrow, if any company wants to use Kafka and Airflow, it will consider the second profile over the first. I'm not saying the first person will not get a job. Data engineering is always in demand.
I've been seeing this for the last 5 years.
Data engineers get frequent calls on a weekly basis. Even I keep uploading my resume and checking how many calls I get for data engineering. On average, we get about 15 calls a week for around 4 to 6 years of experience.
But when we look at the job descriptions, sometimes they say we need that tool, we need this tool. So they start filtering, and nowadays most of the screening happens with the AI tool itself.
So the AI tool itself is restricting your resume. Suppose two resumes are prepared, one as an Azure data engineer and another as a data engineer with Azure plus Kafka plus Airflow. For the first one, the JD and resume matching could be somewhere around 78%.
But the second could reach 83%.
So obviously, the resume with the 83% match will be picked. Your resume will not be picked if you have mentioned just Azure.
That's why we wanted you to understand this, because we are doing it ourselves. We are filtering resumes. As a manager, I keep getting them, and I use that AI tool. In fact, our own AI tool, Clear My Interviews, does this. We upload the job description, we upload the resume, and it gives the score.
If an organization does the same thing, they will pick the second resume over the first.
So knowing additional tools will always add value. And I'm telling you one more time, this is not the end.
This is just the beginning.
Tomorrow they may ask you to have AI knowledge, and you will need to have it.
Tomorrow they may ask you to learn XXX, then YYY, then ZZZ.
Whatever they ask, we should be adaptable and readily available.
Data engineering is the core, and adding things on top of it will always keep you adapted to the market. If you want visibility in the market, you need to have maximum skills. There were days when people used to ask what experience you have. Now things have changed. Now they ask what tools and skills you have.
So experience has now turned into skills. In any organization, value is given to skills more than experience.
We used to say he has 10 years, 15 years, 20 years of experience. Now they ask whether he knows two tools, three tools, five tools, 15 tools, 20 tools. Tools, technologies, and skills are valued more than experience. A person with 15 years of experience on one tool is not in the market at all when compared with a fresher with five tools.
So please learn as many tools as you can, because this will help you get through the filter and get your opportunity.
Are we all good?
So Kafka and Airflow will definitely be a very good add-on to your profile.
Okay? So with this we have completed the first five free sessions. We started with Kafka for the first three days. For the next two days I gave you an overview of Airflow. Now, these five days were an introduction class. When I say introduction class, we have set up the stage for you. The stage is set.
Now, all that is needed is to just go and dance.
From next week onwards, we will be getting into the implementation part. Okay? So, these five days of sessions were just to give you an overview. Over the next 25 days, we will go step by step and do two projects. One will focus completely on Kafka. The other will focus completely on Airflow.
Okay? I've already told you the road map. It comes to around 25 to 30 sessions, and you should know data engineering and Python, and you should have done at least one project in your warehouse. Because I cannot repeat Python. Okay? I cannot go through data engineering again. You should have already gone through it. If you're not aware of this, what you can do is first join the data engineering course, join the Python course, and then come here. Okay? So, this is a prerequisite, I would say. And it's going to be 1 hour daily, 6:30 to 7:30 at the latest.
It could be delayed by 5 or 10 minutes, but the start will be between 6:30 and 6:45.
And we will be taking about a 1-hour session. The cost of this course will be 10,000. If you've already been a part of KSR, you'll get some sort of discount. And you'll get all the benefits like resume preparation, one-on-one guidance, and how you can add Kafka and Airflow to your resume. And don't expect that you will be able to put five or six years of experience. If you had learned data engineering with me in KSR, I would have told you that the maximum experience you can put is 5 years. But don't expect the same thing for Kafka and Airflow, because these are tools that we are learning for just 1 month. So, with this 1 month of learning you can show 1 year of experience. More than enough.
More than enough. You can say, "I have 5 years of overall experience in data engineering, and for 1 year I worked on Airflow and Kafka." You can easily say this.
Okay?
Are we good?
Any question?
I see a couple of questions in the chat.
What is the pay package usually offered for a fresher DE with Azure, Kafka, and Airflow?
See, data engineering with Azure... For freshers, it doesn't matter, okay?
You know why?
As a fresher, they know that you've done only a use case or a POC project, right? So, data engineering with Azure or data engineering with Kafka and Airflow, it's not going to change much, okay? A fresher is a fresher, right? So, for a fresher nowadays we are probably starting at 4 to 5 lakhs.
So, the 4 to 5 lakhs stays the same irrespective of this. Kafka and Airflow may add value if you are experienced.
Because experience is what matters for companies. So, you may see a 2%, 3%, or even 5% increase for additional tools.
But for freshers, it's not going to change, okay?
Okay, let's take another question.
Is it possible to have a couple of sessions on DBT? Because most of the companies are also asking for DBT.
Yes, that's right. That's why I told you: today Azure, tomorrow Kafka, day after tomorrow Airflow, then they may ask for AI, and who knows, they may ask for DBT also.
Okay? So, we'll plan for frequent workshops. Probably after all of this, we'll have another workshop for DBT and AI.
In the recent batch, the batch that we have just started, the course was AI data engineer. Okay?
But when you guys were learning in the past, it was just data engineering.
Now we have included AI concepts. Okay?
Those AI concepts will also be part of a one-month course in the upcoming sessions. Now it's Kafka and Airflow.
The next month it's going to be DBT, and a month later it's going to be AI.
Like this, we are going to add all the toppings on top of data engineering.
Even when you buy a 200-rupee pizza, you put all the toppings on it, extra cheese, right? You add extra toppings. This is the one which is going to be your career. This is the one which is going to get you a job. Why can't we add the toppings on top of it? So, let's add all the possible tools on top of data engineering to be in demand, and we should be readily available in the market for whichever tool they are asking for.
Okay? So, we will be planning for DBT as well.
We will be planning for AI as well.
Whatever is needed, we'll be planning for the upcoming sessions.
What knowledge exactly do we need to get in AI, in addition to the Azure data engineering we learned last year?
See, particularly for AI, comparing data engineering and AI data engineering, right? You need to know how LLMs work, right? So, what is the role of a data engineer, you tell me?
Providing the required data to different teams as needed, right?
Data analysts ask for it in one form. Data scientists look for it in another.
So, each and every team is looking for data in a predefined form, and you're giving it. Why are you giving it?
Because you're a data engineer.
Similarly, for the guys who are building LLM models, right? GenAI developers, agentic AI developers, they need a specific format.
They need the data in a vector format, or they need the data in a properly structured format. So, you're going to give them that data. So, for an AI data engineer, knowing the basics of how RAG works, how an LLM works, what agentic AI is, what generative AI is, should be a good add-on.
We are not asking you to build a model there. We are not going to ask you to build an agentic model. We are not going to ask you to build a computer vision network. No.
It's just that you need to understand how it works so that as a data engineer, you can do what is needed.
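To make "data in a vector format" concrete, here is a minimal sketch of the kind of record a data engineer might hand over to an AI team. Everything in it is an assumption for illustration: `toy_embed` is a hypothetical stand-in that hashes words into a tiny fixed-size vector, whereas a real pipeline would call whatever embedding model the AI team chooses, and the record layout (`id`, `text`, `vector`) is just one plausible shape.

```python
import hashlib
import math

VECTOR_SIZE = 8  # hypothetical, tiny on purpose; real embeddings are much larger


def toy_embed(text: str, size: int = VECTOR_SIZE) -> list[float]:
    """Stand-in for a real embedding model: hash each word into a bucket."""
    vector = [0.0] * size
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vector[digest[0] % size] += 1.0
    # Normalize to unit length so vectors are comparable by cosine similarity.
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


def to_vector_records(docs: dict[str, str]) -> list[dict]:
    """Turn raw text documents into records carrying both text and vector."""
    return [
        {"id": doc_id, "text": text, "vector": toy_embed(text)}
        for doc_id, text in docs.items()
    ]


# Hypothetical sample documents.
records = to_vector_records({
    "doc-1": "Airflow schedules and monitors data pipelines",
    "doc-2": "Kafka streams events in real time",
})
```

The data engineer's part here is the plumbing: collecting the text, producing records in the agreed shape, and loading them wherever the AI developers need them. Choosing and tuning the actual model stays with them.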
So, I'll give you a very simple example, okay?
Let's say this afternoon we are going to prepare one special food.
One special food that I can always think of is, let's say, biryani, okay?
I may not know how to prepare biryani, but I may know all the ingredients that are required for preparing biryani.
That is important.
I know every ingredient.
Let's say, in my house, I go and tell my mom, "Today let's prepare biryani."
She'll say, "Okay, you buy all the things." But I don't expect her to tell me the things. I know them.
I know because I'm the person who's going to get her everything. Similarly, for an AI data engineer, you are the guys who are going to provide the data for AI developers.
You are the guys. So, if you know what they need, that's more than enough. We are not expecting you to know A to Z of AI. We are not saying you should know ChatGPT, you should know Gemini, you need to know everything A to Z.
No. We are expecting you to know what the requirement is, so that you can provide what they require. That's it.
If you know that, well and good.
Any other questions?
Okay. So, with that, if you don't have any questions, we'll stop here for today.
Please reach out to our team for additional information on this course.
And from Monday onwards, we'll start installing and we'll start practicing our Kafka and Airflow topics.
Hello.
This is Jagdish. Does Spark also come into the picture in this program?
No. We are covering only Kafka and Airflow. As I said, the prerequisite is that you already have knowledge of data engineering and Python. So, it is expected that you already know Python and PySpark before learning this. If not, then we strongly recommend going through the bigger program that we are conducting for 6 months. There we are teaching SQL, Python, big data, Hive, Spark, then Azure, Azure services, Fabric, CI/CD pipelines, everything we are covering for 6 months. It's a complete course. First you learn there, then later you can take up this add-on.
No, no, no. I'm already working as an Azure data engineer. I'm missing this real-time streaming and all. That's why I was interested.
>> Okay, then PySpark is not mandatory here. If you know data engineering, if you have built a pipeline, then go for it.
Okay. Okay.
But it should be processed, right?
I mean, at least to my knowledge, Kafka sends events to Spark Streaming, and Spark Streaming will process the data, right?
>> Nothing like that. See, Spark Streaming is one way of processing. Okay, without Spark also you can still process using Kafka.
Okay, so it's like a competitor for Kafka, right?
Spark Streaming? I wouldn't say competitor. What your Spark can do, your Kafka also can do.
Okay. Okay, got it.
Thank you.
Yeah.
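The point about processing without Spark can be pictured as a plain consume, transform, produce loop. The sketch below is only an illustration under stated assumptions: it uses in-memory lists in place of Kafka topics so it runs without a broker, and `enrich_order`, the field names, and the sample events are all hypothetical. With a real cluster, `messages` would come from a Kafka consumer client and `send` would wrap a producer call; neither is shown here.

```python
import json
from typing import Callable, Iterable, Optional


def enrich_order(event: dict) -> Optional[dict]:
    """Hypothetical per-event logic: drop invalid orders, add a total."""
    if event.get("quantity", 0) <= 0:
        return None  # filter out the event
    return {**event, "total": event["quantity"] * event["unit_price"]}


def process_stream(messages: Iterable[bytes], send: Callable[[bytes], None]) -> int:
    """Consume -> transform -> produce. Returns how many events were forwarded."""
    forwarded = 0
    for raw in messages:
        result = enrich_order(json.loads(raw))
        if result is not None:
            send(json.dumps(result).encode("utf-8"))
            forwarded += 1
    return forwarded


# In-memory stand-ins for an input topic and an output topic (assumption).
incoming = [
    json.dumps({"order_id": 1, "quantity": 2, "unit_price": 50}).encode("utf-8"),
    json.dumps({"order_id": 2, "quantity": 0, "unit_price": 10}).encode("utf-8"),
]
outgoing: list[bytes] = []
count = process_stream(incoming, outgoing.append)
```

Each event is handled as it arrives, with no Spark job involved. Whether this is enough, or whether a dedicated processing engine is the better fit, depends on the volume and the kind of logic the use case needs.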
And is it 5 days per week or...
It's 5 days. It's Monday to Friday.
Okay.
Thank you.
Okay, we'll stop here.
Please reach out to our team in case you have any questions.
Sorry, one last question. Kafka is for software engineers also, right? In what way is it different for data engineers and software engineers?
>> See, Kafka, as I said, is a tool which we can use for real-time streaming, and it can also be used for messaging services. So, we'll be using it for our use case. We are going to build a real-time streaming pipeline, right? That is what we'll be using it for. Software developers will be using it in a different case study. They will have their own problem statement.
They'll use it differently.
Okay.
Okay. Got it.
Okay. We'll meet on Monday. Thank you all.
Thank you.