OpenTelemetry is an open-source project by CNCF that provides a standardized way to collect logs, metrics, and traces for applications, solving the fragmentation problem where different teams used incompatible monitoring tools; it enables distributed tracing across microservices written in different programming languages, and when combined with AI-powered tools like OpManager Nexus, it allows DevOps engineers to detect anomalies, perform root cause analysis, and receive predictive alerts for system issues.
What is OpenTelemetry and real-world DevOps dashboards
Hey there everyone. In this video I will walk you through what a day in the life of a DevOps engineer looks like: what kind of dashboards he stares at all day and what kind of anomalies he has to detect. This video will also guide you through what OpenTelemetry is. This is kind of a final-boss video: you shouldn't have to watch anything outside of this to get full confidence about what OpenTelemetry is, truly understand it, and get a sense of why it exists, what it does, and how you can use it in your own application. This video is jam-packed with information, analogies, and case studies, and you will thoroughly love it.

Before moving ahead, just a humble request: if you enjoy these kinds of DevOps and system design videos, please let me know in the comment section what you would like to see next, and I will make that video in a similar style. A simple "hey, I want to learn this" is enough, and if you don't know what to say, just a thank-you would be really nice.

Let me take you onto the screen, because this is where we will study. I will show you the kind of dashboard that we see every day as a DevOps engineer, some of the anomaly detection, and where AI fits into what a DevOps engineer looks at every single day. I will introduce you to some of the tools as well. But before that we have to go through a few slides. Don't worry, there aren't too many. We usually draw in Excalidraw, but I have pre-prepared these, and there are only five or six. They are very interesting stories, and after going through them you will have a full idea of what observability is, what OpenTelemetry is, and all of this.

Imagine you own a coffee shop. On a normal day everything runs smoothly, but one morning a customer starts complaining: hey, your orders are taking too long. You want to figure out what went wrong. Was it the barista? Was it the coffee machine? Was it a supplier delay? Was your coffee not prepared?
Was your hot water not hot enough? What happened in that situation?

Simply blaming somebody ("hey, you are not working") doesn't work in any business. To answer the question you need to see what's happening inside the shop; the entire shop operation should be in front of you like a bird's-eye view. The idea of being able to see what's happening inside is known as observability, and it is the most important part of DevOps as well as system design. In the world of software, observability means your application gives you enough information that when something breaks down, you know exactly which part is not functioning the way you want and can take an actionable step. Now here's the thing.
Modern software is no coffee shop. It looks like a whole airport functioning. Imagine how a giant airport functions: it has hundreds of flights, the staff, the systems, the restaurants, the lounges. When anything goes wrong in the airport, you need a lot more than a bird's-eye view. You need cameras, you need logs, you need tracking systems at almost every point. And our software is much more complex than even this airport analogy.

The three pillars of observability are logs, metrics, and traces. A log is like a diary: every time something happens in your app, you write a note about it. For example, a user logged in at 3 p.m., or a payment happened at 3:10 p.m. Metrics are like the dashboard of a car. They show you a lot of numbers; some of them make sense to you and some of them don't. For example, how much memory and CPU is being used. And traces, the most important part, are like a GPS trail: a trace shows you the full journey of a single request as it moves through the entire system from start to finish. So this is the baseline of how observability works. We'll get to the next slide, the problem that OpenTelemetry solves, in a minute, but at least now you know what observability is.

So this is what a complex, or even a decently complex, software system looks like. Look at this business overview and the amount of detail in it. How many machines do you have? What is going on in the network or on the servers? What do our applications look like on the servers? Which ones are up? Which ones are not running?
How is the virtualization doing on Hyper-V or VMware? We usually don't see these, because in the early days of a piece of software we are just working with AWS or a single provider. But as the system gets really complex, or you move into a DevOps role, this is what you see in your day-to-day life. You can look at the storage overview: what's happening in your storage, which storage is RAID-configured and which is not, which has backups, and which backups are failing. You need a bird's-eye view, and this is where software like OpManager Nexus comes in. Previously this was known as Site24x7 observability; now they call it OpManager Nexus. So basically Site24x7 plus OpManager becomes OpManager Nexus, and they have a whole variety of offerings. But what I want to show you is this dashboard, because this is what daily life looks like. Look at the Zia insight here; I'll come back to it in a minute. Before that, we need to understand what problem OpenTelemetry solves for us and what things looked like before OpenTelemetry existed. So let's go back to our airport analogy.
Imagine every airline at the airport uses a completely different tracking system for your bags. Say there is Delta and there is United; for an Indian reference, there could be IndiGo and Air India. Every airline uses different software to track passengers as well as luggage. Now imagine a passenger's luggage gets lost after it has passed through three airlines, or probably more. Who will be able to trace it? I don't think anybody will, because it has traveled through so many airlines and each of them uses different software that cannot communicate with the others. This is exactly the problem software teams faced before OpenTelemetry existed.

Here's a scenario. A company builds an app. They use a tool called Datadog to collect metrics, another tool, probably Jaeger, to track traces, and yet another tool to collect logs. Each of these tools needs its own specialized agent to track things, and the special code that you use to collect the data is known as an instrumentation library. So every tool comes with a different instrumentation library. The developers have to configure the code for each one of them, learn how to interact with each one, and work out how to inject each one into the application. And if there are different teams in the company, some working on the front end and some on the back end (the back-end team being the ones who manage the logic), the front-end team may use a different collection and analysis mechanism from the back-end team. Imagine the communication between all these teams.
It is so hard to get these tools to communicate with each other. Think of it like this. Imagine every hospital in the city uses a different format for patient records, and the doctors are too busy doing their own jobs to translate between them. If a patient moves from one hospital to another, the doctor cannot read the old record easily. Everyone suffers because there is no standard for sharing patient records. Isn't that bad? Yes, it is absolutely bad, and it shouldn't be like that.

The software industry needs a single open standard for collecting observability data that works with any tool you're using, any programming language you're using, and any cloud provider you're using. This is exactly why OpenTelemetry was created, and this is exactly what it does. It gives you a standard for collecting and analyzing the data regardless of the company or software you use. Hopefully you can see that it makes communication easy. It doesn't punish you for using another company's tool. It just says: if you support OpenTelemetry, you can communicate with any tool and any engineer. That was the problem, and that is the problem OpenTelemetry has solved for you.

When things are in a centralized place and can communicate through OpenTelemetry, this is what happens. For example, if I go to OpManager and the Zia insight I was discussing, it collects all this analysis across the front end, the back end, your RAID system, your disks, storage, networking, and everything. And when we inject AI on top of that, the AI doesn't just create alarms; it gives you the possible impact as well as a recommendation for what you can do. Let me give you one example scenario.
For example, let's say disk utilization is currently at 85% and the AI detects that it is expected to reach 100% in the next 28 days. On its own this insight doesn't bother me much. It is okay if it is at 85%, and in 28 days maybe it will fill up and maybe it won't. But here is the possible impact: this might affect the services hosted on the server. And the recommendation is that you can delete a few of the large files or increase the disk capacity. And here's the best part: you can create a workflow based on that, an exact solution workflow driven by the AI, or you can assign a ticket to another engineer saying, hey, you should really look at the storage: either increase it, or remove some of the files, or do something about it. These are the things where AI really shines. It not only tells you what could go wrong, it also sees the possible impact, gives you a recommendation, and gives you the workflow. This is what I really love about this kind of AI when it gets injected into things like OpManager Nexus. A really nice product. Let's move on to the next part.
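The "85% now, full in 28 days" style of prediction can be illustrated with a very small sketch. This is only a naive straight-line extrapolation over made-up weekly readings; how Zia actually forecasts is not described in the video, and the function names and sample data here are hypothetical.

```python
def fit_slope(samples):
    """Least-squares slope of (day, percent_used) samples, in percent per day."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    den = sum((x - mean_x) ** 2 for x, _ in samples)
    return num / den


def days_until_full(samples, capacity_pct=100.0):
    """Extrapolate the fitted trend from the latest reading to capacity_pct.

    Returns None when usage is flat or shrinking, since no fill date exists.
    """
    slope = fit_slope(samples)
    if slope <= 0:
        return None
    _, latest_pct = samples[-1]
    return (capacity_pct - latest_pct) / slope


if __name__ == "__main__":
    # Hypothetical weekly disk readings: (day, percent used).
    readings = [(0, 70.0), (7, 73.75), (14, 77.5), (21, 81.25), (28, 85.0)]
    print(f"Estimated days until disk is full: {days_until_full(readings):.1f}")
```

A real system would also have to cope with noisy or seasonal usage, which a single straight line ignores.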
And now the most important part: what even is OpenTelemetry? We need to understand this. One big thing before we move further: it is a CNCF project, from the same foundation that hosts Kubernetes and tons of other open-source projects. OpenTelemetry, often shortened to OTel, is an open-source project that gives developers a single standardized way to collect logs, metrics, and traces for their applications. Open source means the code is freely available and it is maintained by a large community, not just one or two developers; CNCF is one of the biggest open-source communities out there.

Think of OpenTelemetry as a universal power adapter. When you travel internationally, different countries have different plug shapes, and I'm pretty sure you find that very annoying. The universal adapter you carry in your bag lets you plug into any socket anywhere. OpenTelemetry does exactly the same thing for observability data. You instrument your app once using the OpenTelemetry standard, and then you can send data to any monitoring tool you choose. It can be Datadog, it can be Grafana, it can be New Relic, or it can be the same one we have been looking at, OpManager Nexus, and I highly recommend this one. You should use that.
You should try that. It's much cheaper. Here's how it works. There are a lot of steps, but I'll go through just three: the first is instrumentation, the second is the collector, and the third is the back end itself.

Step one is always instrumentation. You add the OpenTelemetry libraries to your application code, wherever the application is running; it can be the back end or the front end. These libraries automatically start collecting data such as how long each function takes to run, what errors occurred, and how many requests came in. This is very crucial information.

Step two is the collector. The data flows into something called the OpenTelemetry Collector.
Think of this as a post office that sorts the mail by PIN code. The same kind of thing happens in the OpenTelemetry Collector.

Step three is the back end. The collector sends the data to your chosen monitoring tool, which we call the back end. In this case it can be OpManager Nexus, or it can be any other tool. This is where you visualize all the dashboards and data and set up alerts. If I take you to this part here, you can see this is not the only view. For example, we can go into capacity planning, or we can go into anomalies. These are really nice. You can see in the dashboard how many anomalies are actually occurring. This is not possible with regular visualization alone; it happens through OpenTelemetry and the regular monitoring on top of it.
And this is how you can see the trend of anomalies: how many times we are running out of space, or how many times a function is failing in a day or in a week. How many payments are failing? And not just payments; databases also fail a lot. This is everything you get to learn and understand with OpenTelemetry. Here is a real-world scenario on top of this: a broken food delivery app. Let me walk you through it to make this more concrete.
Imagine you work at a food delivery company, similar to DoorDash, Swiggy, Zomato, Blinkit, whichever you want to pick. A customer places an order and the app crashes halfway through the process. Your phone rings, your manager is upset, and you need to find the bug fast and resolve it, because it may keep happening to every other customer. This is when you receive those really unwelcome alarms on your phone saying there is an incident you need to handle. In modern apps there are many small services that work together. One service handles login, another the restaurant menu, another the payment, and a fourth sends the notifications. There can be hundreds of others as well.
We call these small independent services microservices.
When the customer places an order, the request travels through a variety of microservices. And here's the interesting part: the microservices are sometimes written in different languages. One in Python, one in Java, one in JavaScript.
They might be talking to different databases as well; each might have its own database. In a modern application it is not easy to debug where the error is occurring or which microservice has broken. This is where monitoring and observability become important, and this is where OpenTelemetry's tracing feature becomes incredibly powerful. When tracing is enabled, every step of the customer's request is recorded like a breadcrumb trail. You can open your monitoring dashboard and see something like this: the user service responded within 50 milliseconds, the menu service responded in 80 milliseconds, which is also fine, and the payment service is where it took 400 milliseconds. It shows up just like this dashboard. I made this dashboard myself and other dashboards will look different, but you will still get exactly the same data.

This kind of end-to-end visibility across the entire journey, from the request to the completed order, is known as distributed tracing. The tool you use might be different, but this is exactly what OpenTelemetry lets you see across different microservices. For any team building at scale this is the most important thing, because downtime costs money and nobody likes downtime.

On a final note before we move on to the AI part, here are the key components of OpenTelemetry that anybody looking to get into OpenTelemetry should at least have an idea of.
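Before getting to those components, the breadcrumb trail from the food delivery example can be modelled in a few lines. This is a toy data model, not the OpenTelemetry SDK: real spans also carry span IDs, parent IDs, timestamps, and attributes. The service names and the trace ID are hypothetical; only the 50 ms, 80 ms, and 400 ms figures come from the example above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    trace_id: str       # shared by every span that belongs to one request
    service: str        # which microservice did the work
    duration_ms: float  # how long that step took


def slowest_span(spans: List[Span]) -> Span:
    """Return the step that took the longest, the first place to look."""
    return max(spans, key=lambda span: span.duration_ms)


def total_duration_ms(spans: List[Span]) -> float:
    """Total time, assuming the services were called one after another."""
    return sum(span.duration_ms for span in spans)


# One customer order, recorded as three spans under the same trace ID.
order_trace = [
    Span("order-123", "user-service", 50),
    Span("order-123", "menu-service", 80),
    Span("order-123", "payment-service", 400),
]

if __name__ == "__main__":
    worst = slowest_span(order_trace)
    print(f"Slowest step: {worst.service} ({worst.duration_ms} ms)")
    print(f"Total: {total_duration_ms(order_trace)} ms")
```

Because every span shares the same trace ID, the steps can be stitched together even when the services behind them are written in different languages.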
And by the way, the APIs and everything else are available in every major programming language, whether it's Java, JavaScript, .NET, or whatever you use. The key components are the API, the SDK, auto-instrumentation, the collector, and OTLP.
Don't worry about any of it. I'll walk you through. First, the API, the application programming interface. This is the set of rules and commands that developers use to tell OpenTelemetry: hey, this is what you have to measure, and this is the data I'm giving you.
Then comes the SDK. This is the actual implementation. You just inject the SDK from OpenTelemetry or from the service provider you're using. For example, in this case you might be using OpManager; they also give you an SDK that you can use to implement this.

The next one is a little tricky: auto-instrumentation. This is the magical feature of OpenTelemetry that automatically starts collecting data once you have done the setup. It's like a small coffee machine that automatically grinds the beans, heats the water, and brews the coffee at the right time. It looks a little bit magical, but this is the main thing that does the work, and it is not something you write yourself; it is injected or done for you by the likes of OpManager Nexus or whatever tool you use.

Then comes the collector. As mentioned earlier, it's a standalone service. It runs on a separate server, not on your main server, because otherwise if the main service goes down the collector will go down with it. So you keep it on a separate server of its own. It receives the data from your application, and it can filter it, transform it, and export it to one or more back ends, which is where you get the nice graphs.

And finally OTLP, the OpenTelemetry Protocol. This is the standard, the language in which all the components communicate with each other. If any data has to come into or leave OpenTelemetry, this is the standard it goes through. It's not that hard; once you start using it, it feels like no big deal.
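The collector's role described above (receive, filter, transform, export to one or more back ends) can be sketched as a tiny in-process model. This is only an illustration: the real OpenTelemetry Collector is a separate service configured through YAML and fed over OTLP, and the class names, the `/health` route, and the `env` tag below are all hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Processor = Callable[[List[Record]], List[Record]]


class InMemoryBackend:
    """Stand-in for a monitoring back end; it only stores what it is sent."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.received: List[Record] = []

    def export(self, batch: List[Record]) -> None:
        self.received.extend(batch)


class ToyCollector:
    """Receives a batch, runs each processor in order, then fans out."""

    def __init__(self, processors: List[Processor],
                 backends: List[InMemoryBackend]) -> None:
        self.processors = processors
        self.backends = backends

    def receive(self, batch: List[Record]) -> None:
        for process in self.processors:
            batch = process(batch)
        for backend in self.backends:
            backend.export(list(batch))  # every back end gets its own copy


def drop_health_checks(batch: List[Record]) -> List[Record]:
    """Filter step: discard noisy health-check records."""
    return [record for record in batch if record.get("route") != "/health"]


def tag_environment(batch: List[Record]) -> List[Record]:
    """Transform step: attach an environment label to every record."""
    return [{**record, "env": "prod"} for record in batch]


if __name__ == "__main__":
    first, second = InMemoryBackend("first"), InMemoryBackend("second")
    collector = ToyCollector([drop_health_checks, tag_environment],
                             [first, second])
    collector.receive([
        {"route": "/order", "duration_ms": 400},
        {"route": "/health", "duration_ms": 1},
    ])
    print(first.received)
    print(len(second.received))
```

The point of the fan-out is the universal-adapter idea from earlier: the application emits data once, and which back ends receive it is decided in the collector rather than in application code.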
What impact does AI have on all this? This is where the biggest impact comes into the picture. Before AI it was very difficult to find things. For example, this is a Zia anomaly dashboard, with actual data. You can look at any data point and the AI can evaluate whether it was actually an anomaly. Previously your DevOps engineer had to keep digging to find the root cause. Now you can just click a button that says, hey, this could be a root cause, go find out about it. It gives you nice graphs of when the incident happened, and at exactly that point in time you can find all the details: what is happening with my disk, what the network looks like, what the outbound flow looks like, what the disk bandwidth is, what the I/O operations look like. I really like that you can click on any one of these Zia anomalies and say, hey, I want to know more about what was happening at this exact time.

This is AI in DevOps, especially for root cause analysis and finding anomalies. Even if something happens just once in 100,000 times, AI can easily detect that one anomaly. You can generate a report with the AI and pass it on to the developers so that they can fix that one incident, because it may look like one in 100,000, but there is a revenue loss in there, and you can actually act on it.

So, coming to the point: the impact of AI is my favorite thing. Almost every technology and every observability tool is getting rewritten with AI first in mind. With the combination of OpenTelemetry and AI, those late-night calls especially can be avoided, and they are being avoided quite a lot. Anomaly detection is one of my favorite things.
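To make "one anomaly among many normal readings" concrete, here is a minimal statistical sketch. It uses a plain z-score threshold as a stand-in; it is not how Zia or any particular product detects anomalies, and the latency values and the threshold of 3 are assumptions chosen for illustration.

```python
from statistics import mean, pstdev
from typing import List


def find_anomalies(values: List[float], threshold: float = 3.0) -> List[int]:
    """Return the indexes of values more than `threshold` standard
    deviations away from the mean of the series."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # a perfectly flat series has nothing unusual in it
    return [i for i, value in enumerate(values)
            if abs(value - mu) / sigma > threshold]


# Hypothetical request latencies in milliseconds: 1,000 ordinary readings
# between 100 and 104 ms, with a single spike injected at index 617.
latencies = [100 + (i % 5) for i in range(1000)]
latencies[617] = 900

if __name__ == "__main__":
    print("Anomalous readings at indexes:", find_anomalies(latencies))
```

A single global threshold like this breaks down when traffic has daily or weekly patterns, which is one reason production tools rely on more elaborate models.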
Root cause analysis is another one of my favorites. Then predictive alerts: these are like a weather forecast for your software. They warn you before an issue happens on a large scale, because with this kind of AI even a single anomaly doesn't go undetected, and you can get predictive alerts for those specific anomalies. And then there are natural language queries, which you can ask in English.
NLP is the best thing AI has given us. You don't have to be an expert in SQL or some other programming language. You can just ask: hey, what happened here? Why are the I/O operations spiking there? This is what I really love about it.

So I highly recommend all of you go ahead and check out OpManager. They have recently rebranded: Site24x7 and OpManager Plus is now OpManager Nexus. They have a lot of FAQs as well, in case you are worried about that. But what I really like is their pricing. They are dirt cheap, and they give you enough free tools that you can go ahead and try it.

There are so many things a large enterprise application has to manage. Look at the monitoring: you obviously want to monitor your mail records, your TLS records, your SPF records. If all these tools were not in one place, I can't even imagine it. I'm pretty sure there was a point in time when somebody had to deploy this many separate tools, and you would probably need more DevOps people than developers just to monitor each one of them.

And these are some of the really nice ones. If I go into the dashboard, this is how a dashboard usually looks, and this is what a DevOps engineer stares at all day in day-to-day life. Hey, what happened here? Okay, packets received violated a threshold here. Hey, what happened there? They can just click on it and debug, and all of this needs really professional, trained eyes. Not only that (this one is going to take some time to load), you can go ahead and look at the servers, storage, and applications. We have database servers as well. At any given point in time a DevOps engineer is spinning up more databases or scaling them down based on demand. I really like how this all works.
This is a little slow for me because there is a lot of data in this dashboard. But as you can see, this one immediately gets your notice and attention: hey, there is something going on here. And this one probably needs my attention too. Oh, there's a scheduled maintenance planned here. It monitors all of this easily. That's why I say it looks like a lot of fun from outside, okay, we have built the application, but once it reaches people like DevOps, this is how their day in and day out looks. This is how they monitor everything. And again, injecting these tools is not that difficult. They give you an SDK and APIs, you inject them in the code, and that's it; the software automatically does the magic of getting the data. This is where the observability layer comes into the picture.

I hope this video has shown you enough of the real-world impact. It was a little bit geared towards high-end engineers who are pros on the DevOps side, but it is also for a beginner who wants to see what happens in a day in the life of a DevOps engineer. This is the real-world scenario. Not those fancy cafes and coffees that you drink and play around with. No, you don't do that. This is the real world, and you need to understand the problem this way. At least I can say that by this point you have an idea of what OpenTelemetry is, why the standard was designed, and how it impacts the everyday life of every single engineer you see. Software is way more complex than you can even imagine.

So I hope this video has given you enough of a real-world idea, along with some examples of the tools and software that you can use or recommend in your company. If you are already working in a company, you can try these tools and see what they look like. By the way, there's a link in the description that gives you additional free months of OpManager Nexus, and I hope you will love that.
It's a Zoho product, by the way, homegrown in our own country. That's why I love them. That is it for this video, and I'll surely catch you in the next one.