AWS observability evolves through three levels: Level 1 (Log Hell) involves unstructured plain text logs without correlation IDs, making debugging difficult; Level 2 (Three Pillars) introduces structured logs with correlation IDs, metrics using EMF, and traces via X-Ray, enabling efficient log analysis through Logs Insights; Level 3 (AI-Powered) uses agents like Claude Code that query CloudWatch automatically, eliminating manual console interaction.
Deep Dive
How I Debug AWS Without Opening CloudWatch
This is how I debug AWS now. I don't open the CloudWatch console anymore. I don't open the AWS Management Console at all anymore. I do all of that straight from Claude Code. And honestly, I wasn't able to do this 3 years ago, not just because Claude Code wasn't there, but also because I didn't have the basics straight. So, I figured out there are three levels of AWS observability I went through, and I want to show you quickly how you can go through these levels so that you can do the same thing.
In 2022, I joined a startup. And this startup had quite some scale. We had over 100 million requests a month. We had over a million events on EventBridge a month.
So, I loved the scale. It is a dream gig for an AWS developer.
But one thing which was a real nightmare was debugging. We had more and more users, especially business-to-business users, reaching out to us and saying that they had a certain issue. And debugging those issues became quite a nightmare.
So, let me walk you through all three levels so you can skip straight to level three without missing the basics.
Level one is where everyone starts, and this is what I call the log hell. So, this is what most apps look like in CloudWatch. We have two Lambda functions. We have one producer, which sends something to a queue, for example, an order. And we have a consumer that picks up the message from the queue and saves it to a database. In that example, we put it into DynamoDB.
And each one of these Lambda functions writes into its own log group by default. So, the good thing is we already have logs. The bad thing is they're a mess. So, if you look at that, you can see that we have so many different logs in here, and we don't understand anything.
We see plain text console logs in here.
So, we see that some orders were produced. We see that some orders were picked up from the queue and sent to DynamoDB, which is nice. But if one of those customers has an issue, correlating these logs, so understanding which producer log belongs to which consumer log or vice versa, is quite hard. For that, we need correlation IDs. That means one ID which belongs to both of them.
Also, having log levels helps immensely here. So, what we need is some structure. We need structured logs in here. Typically, if you're in a situation like that, you start using string tools, right? So, we see that after "processing order" there's an ID. So, we can just use this ID in a string, build a regex, and find these logs. But, honestly, this is not a nice developer experience and it takes ages. And once you change the wording of this log, you can't correlate anything with each other anymore. So, we need some more structure. So, that is what I call the log hell. You can survive it. You can do it, but it is just a hard life. You can't debug anything fast anymore.
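To make the fragility of the regex approach concrete, here is a minimal sketch. The log lines, the `order-123` ID format, and the helper name `find_logs_for_order` are all made up for illustration; they are not the logs shown in the video.

```python
import re

# Hypothetical plain-text log lines from two separate log groups.
PRODUCER_LOGS = [
    "Order produced: order-123",
    "Order produced: order-456",
]
CONSUMER_LOGS = [
    "Processing order order-123",
    "Saved order order-123 to DynamoDB",
    "Processing order order-456",
]


def find_logs_for_order(lines, order_id):
    """Return lines mentioning order_id, matched against known message wordings."""
    # The pattern hard-codes the exact wording of every log message.
    pattern = re.compile(
        rf"(?:Order produced:|Processing order|Saved order) {re.escape(order_id)}\b"
    )
    return [line for line in lines if pattern.search(line)]


all_logs = PRODUCER_LOGS + CONSUMER_LOGS
matches = find_logs_for_order(all_logs, "order-123")
print(len(matches))

# As soon as someone rewords a message, the same search silently finds nothing.
reworded = find_logs_for_order(["Handling order order-123"], "order-123")
print(reworded)
```

The search only works as long as nobody touches the message text, which is exactly the problem structured logs are meant to remove.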
And to get out of that, we need level two. And level two introduces the three pillars of observability. And just a quick one before we jump into level two.
If you're wondering if you're still a level one AWS developer, check out my scorecard. I implemented it on awsfundamentals.com/scorecard.
We have 15 questions where you can check out what your AWS level is, how far you are, and you get a personalized growth plan on how to get better. Then, let's get back to level two.
Level two is what I call the three pillars of observability. And these three pillars are called structured logs, metrics, and traces.
And where I got these three pillars from is especially this book, Observability Engineering. The book is from Charity Majors, Liz Fong-Jones, and George Miranda. And they are the founders, or some of them at least are the founders of Honeycomb, an amazing competitor to CloudWatch.
And in the book, they also talk about the fundamentals of observability and mention exactly these kinds of things.
So, if you want to do a deep dive, check out this book.
So, in pillar one, we have structured logs. I use AWS PowerTools for that. So, I simply have a logger that I use to emit these logs. Every log is now in JSON.
Every log has a correlation ID, an order ID, and a customer ID.
PowerTools is super flexible in defining a log formatter, so you can add more key attributes to every log.
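As a rough illustration of what such a structured log line can look like, here is a sketch that uses only Python's standard `logging` module rather than PowerTools itself. The field names (`correlation_id`, `order_id`, `customer_id`), the service name, and the example values are assumptions, and the real PowerTools output contains more keys.

```python
import io
import json
import logging
from datetime import datetime, timezone

CONTEXT_KEYS = ("correlation_id", "order_id", "customer_id")


class JsonFormatter(logging.Formatter):
    """Render every log record as a single JSON object."""

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
        }
        # Values passed via `extra=` become attributes on the record.
        for key in CONTEXT_KEYS:
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        return json.dumps(entry)


def build_logger(name, stream):
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


# Demo: write one log line into an in-memory buffer and parse it back.
buffer = io.StringIO()
logger = build_logger("order-producer", buffer)
logger.info(
    "Order created",
    extra={"correlation_id": "req-abc-123", "order_id": "order-123", "customer_id": "customer-7"},
)
logged = json.loads(buffer.getvalue())
print(logged["correlation_id"])
```

Because every line is JSON with the same keys, a query can filter on `correlation_id` directly instead of parsing message text.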
The important part here is the correlation ID, because the correlation ID could be something like the AWS request ID. And this request ID is attached to the request and follows it from the producer to the SQS queue to the consumer to DynamoDB. So, you understand the whole flow.
The second pillar is metrics. And metrics are in the end just data points at a given time. We have many CloudWatch standard metrics already. If you use EC2, you know how much CPU was used. If you use Lambda, you know how many errors were emitted.
But, often we need some custom metrics.
And in that case, I love to emit these custom metrics using the embedded metric format, EMF for short.
And again, PowerTools is doing this by default. What it is doing is it just prints out a CloudWatch log, and CloudWatch automatically picks it up and saves it as a metric. It's super cheap, and it is very nice. In our case, we have some example metrics, for example, orders created, orders accepted, orders rejected, orders processed, or orders failed. All of them are just examples from our e-commerce scenario.
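To show what "just prints out a CloudWatch log" means, here is a hand-rolled sketch of an EMF log line without PowerTools. The `_aws` envelope follows the embedded metric format; the namespace `ECommerce`, the `service` dimension, and the metric name `OrdersCreated` are assumptions chosen for this example.

```python
import json
import time


def emf_log_line(namespace, metric_name, value, service, unit="Count"):
    """Build one EMF-formatted JSON log line for a single metric."""
    payload = {
        "_aws": {
            # EMF expects the timestamp in milliseconds since the epoch.
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": [["service"]],
                    "Metrics": [{"Name": metric_name, "Unit": unit}],
                }
            ],
        },
        # Dimension and metric values live at the top level of the document.
        "service": service,
        metric_name: value,
    }
    return json.dumps(payload)


# Inside a Lambda function, printing this line to stdout sends it to CloudWatch Logs.
line = emf_log_line("ECommerce", "OrdersCreated", 1, service="order-producer")
print(line)
```

There is no metrics API call in the function path: the metric travels as a log line and is extracted on the CloudWatch side.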
And pillar three is traces. Traces are very important since you saw that we have different services here, going from API Gateway to Lambda, DynamoDB, SQS, and so on. We want to visually see how the whole trace, the whole user journey, behaves. And this is where we use X-Ray.
The cool part about level two is that, now that we have correlation IDs and user IDs, we can easily analyze our logs by using Logs Insights. So, we can write some queries and do that. For example, here I have a query which simply shows you all errors, so all logs with the log level error.
Then, we have a query which gives you all logs based on a certain customer ID.
This is often helpful if a customer reaches out. You need to see what they've done in the back end, so you just look at the logs. And then there's my favorite query to get all logs by correlation ID. I told you what a correlation ID is, and this actually shows you now how I can get all logs with one correlation ID to see the whole user request. This is level two.
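The three queries described above could look roughly like the following, written here as Python helpers that build Logs Insights query strings. The field names `level`, `message`, `customer_id`, and `correlation_id` are assumptions that depend on your log format, and the example IDs are made up.

```python
def _quote(value):
    """Wrap a value in double quotes for use in a Logs Insights filter."""
    if '"' in value or "\\" in value:
        raise ValueError("value must not contain quotes or backslashes")
    return f'"{value}"'


def errors_query(limit=50):
    """All logs with log level ERROR, newest first."""
    return "\n".join([
        "fields @timestamp, @message",
        '| filter level = "ERROR"',
        "| sort @timestamp desc",
        f"| limit {limit}",
    ])


def logs_by_field_query(field, value, limit=200):
    """All logs where one structured field equals a value, oldest first."""
    return "\n".join([
        "fields @timestamp, level, message, correlation_id",
        f"| filter {field} = {_quote(value)}",
        "| sort @timestamp asc",
        f"| limit {limit}",
    ])


customer_query = logs_by_field_query("customer_id", "customer-7")
correlation_query = logs_by_field_query("correlation_id", "req-abc-123")
print(correlation_query)
```

Sorting the correlation ID query in ascending order reads like a timeline of one request, from the producer through to the consumer.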
And this is what the AWS docs suggest, and this is how you officially use CloudWatch. And it worked great for me the past years until AI came around. And once I used AI in my workflow, I never opened CloudWatch again. And this actually is level three. Let's go into that.
Just as a quick recap, we have level one, which is just getting data into CloudWatch. We have level two, which means we get structured data into CloudWatch. And in level three, we stop using the CloudWatch console entirely. And the whole magic lives in this file. This file is my agent within Claude Code. I called it CloudWatch log searcher, and I have one of these files for each of my repositories, for each of my projects.
And it is just one markdown file which explains to Claude Code how CloudWatch works. What are my most important CloudWatch log groups? What is the region I'm in? How can you use the AWS CLI for CloudWatch? And typically, I also give it some queries I want. So, now let's see an example. And what I'm doing is I say, "Search for logs from cost seven."
And then, Claude Code starts the agent.
It recognizes, "Okay, I need to search for logs at the same time." That means I should use my CloudWatch log searcher agent, and it starts going off and finding some results. It uses the AWS CLI for that, like I instructed it to do. And then, we just need to wait to see a summary. Then, another use case you will see a lot, especially if you work a lot with Lambda, is that you have timeouts. And I can say, "Okay, find any orders that timed out yesterday." So, let's execute this query. And the same thing will happen. It starts the agent, and it will look for timeouts and for orders that timed out.
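The kind of AWS CLI call the agent ends up running for the timeout request could be sketched as follows. This only builds the argument list and does not call AWS. The log group name and region are hypothetical, and the actual agent file may instruct different commands; `aws logs start-query` and the "Task timed out" message that Lambda writes on a timeout are real.

```python
from datetime import datetime, timedelta, timezone


def yesterday_window(now):
    """Return (start, end) epoch seconds covering the full day before `now`."""
    today_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start = today_start - timedelta(days=1)
    return int(start.timestamp()), int(today_start.timestamp())


def start_query_command(log_group, query, start, end, region):
    """Build the argv for `aws logs start-query` without executing it."""
    return [
        "aws", "logs", "start-query",
        "--log-group-name", log_group,
        "--start-time", str(start),
        "--end-time", str(end),
        "--query-string", query,
        "--region", region,
    ]


# Lambda logs a line containing "Task timed out" when a function hits its timeout.
TIMEOUT_QUERY = (
    "fields @timestamp, @message"
    " | filter @message like /Task timed out/"
    " | sort @timestamp desc"
)

start, end = yesterday_window(datetime.now(timezone.utc))
command = start_query_command(
    "/aws/lambda/order-consumer",  # hypothetical log group
    TIMEOUT_QUERY,
    start,
    end,
    "eu-central-1",  # hypothetical region
)
print(" ".join(command[:3]))
```

`start-query` returns a query ID, and the results are then fetched with `aws logs get-query-results --query-id <id>`, which is the two-step flow an agent has to follow.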
So, I told you about the agent file. I added the agent file to a GitHub gist so that you can access it as well. Feel free to comment on it. Feel free to update it to your needs, or give me some ideas in the comments on how I can improve it even further. So, I've added two things in the description. First of all, the link to the GitHub gist where the agent file lives. And then, a free CloudWatch infographic so that you still learn the basics, for example, the Logs Insights syntax and how it works, and so that you understand CloudWatch. Because honestly, this is the basis of everything. If this helps you build real AWS skills and not just some vibe-coded slop, feel free to subscribe. Feel free to comment and let me know what you think. Thank you so much.