A masterclass in algorithmic efficiency that proves why deep architectural understanding outweighs brute-force hardware solutions. It is a necessary reality check for developers who rely too heavily on abstraction layers instead of fundamental logic.
Deep Dive
Prerequisite Knowledge
- No data available.
Where to go next
- No data available.
Deep Dive
How I Made a 21-Second Query Run in 0.07 SecondsAdded:
I had a query that took 21 seconds to run. I made one change and the run time dropped to 0.07 seconds. In this video, I'm going to walk you through exactly what I found, why it was happening, and give you a process you can follow in your own queries. Let me give you some context on the system first because this matters for how we approach the problem.
There was a report generation area of the system.
When a user triggered a report, the whole thing took up to 60 seconds to complete.
That seemed quite slow, so I wanted to look into it and see what could be done.
So, I looked into the code and tried to work out what part was slow. I wasn't sure what was happening, and my feeling that it was slow was not backed by any evidence yet.
The first thing to do is not to guess at fixes.
I didn't know anything was wrong yet.
So, the first thing to do is to find out where those 60 seconds are actually going. There are many things I look for in a slow query, and you can find them in my SQL performance checklist using the link in the description. The data for this report was being retrieved from an SQL Server database using stored procedures.
A series of stored procedures was being run.
Here is some pseudo code to show in a simple way what was happening. So, I opened a new query editor and wrote the exact calls for each of these stored procedures with the parameters used for the report.
In between each of them, I put a print statement to print out the current date and time. This would allow me to see the time difference between each procedure call, so I knew which one would take the longest. I ran these procedures, which showed the results on the screen.
But, I was more interested in the execution time. So, I opened the messages tab in SSMS and saw the output for each of my print statements. When you profile the stored procedures running inside the report, one procedure stood out. It's responsible for roughly half the total elapsed time.
It was responsible for roughly half of the total elapsed time.
So, we're already narrowing things down.
We've gone from this report is slow to this procedure is slow.
So, one procedure took about 30 seconds to run.
We can see that here in the output. The procedure that was ran between lines three and four had 30 seconds of difference between those two calls.
Next, I looked into that procedure.
I opened the procedure and converted the create procedure code to a script that ran the same code but did not create or call the procedure itself.
It just ran the variable assignments and the logic and SQL queries in the procedure.
The main reason I did this was so I could put print statements within this code as well.
It was easier to do than recreate the procedure.
I ran this code and it showed the time taken for each part.
In this procedure, I saw that one query is responsible for a large chunk of the procedure's run time. It took about 21 seconds of the 30 seconds that the overall procedure took.
This narrowing process is important.
A lot of developers will look at a slow report and start tweaking things at random. Adding indexes here or rewriting a join there.
Sometimes it helps and sometimes it doesn't and you're never quite sure why.
The better approach is to measure first, find the biggest contributor and focus there.
Fix that one thing and you've moved the needle more than a dozen small guesses would have.
So, now we have one query to focus on.
Let's look at what it does.
Now, I can't show the actual query here because it ran on my employer's system, but I can show you some pseudo code to explain what it does.
The query retrieves transaction data and account data. It has some grouping across supply and a couple of joins.
Nothing unusual on the surface.
It's the kind of query you'd write if you were summarizing financial or transactional information per account.
My original thought was the cross apply section as that can often cause some performance issues if it's not written effectively.
The data volumes though are where things start to get interesting. The transactions table has around 38 million rows.
The accounts table had around 90,000.
And there's a third table involved with about 17,000 rows.
So, we're not dealing with a small data set.
This is a real production system with real data volumes.
And the query is ultimately trying to return about 500 rows, which are the results for a specific account.
Now, before we change anything, we need to understand what the database actually doing to get those 500 rows.
And for that, we look at the execution plan.
I ran the query with the actual execution plan enabled.
The runtime was just over 21 seconds.
Here is a mock-up of what the execution plan looked like.
The first thing I noticed was that the plan showed index seeks on the transactions table.
If you're not familiar with the execution plans, an index seek is generally a good sign.
It means the database is using an index to find specific rows rather than reading the entire table.
So, on the surface this looks fine.
However, an index seek tells you the database found the data efficiently.
It does not tell you how much data it found.
And that distinction matters a lot when you're working with 38 million rows.
The second thing I noticed was the sort step.
It was the most expensive step in the plan.
What made this interesting is that there was no order by in the query.
There was no explicit sort, so why was the database sorting? This can happen sometimes.
A group by in your query can require an internal sort.
Certain join types can introduce one as well.
The database sometimes needs to sort data to perform an operation even when you haven't asked for sorted output.
It's not wrong, but it is expensive and it's a signal worth paying attention to.
The third thing, and this is the one that explained the slowness, was the row counts flowing through the plan.
In SQL Server, you read the execution plan from right to left.
The data comes in from the right, gets processed through each step, and the final result comes out on the left.
When I looked at the diagram and the row counts, I saw millions of rows entering the plan at the early stages, flowing through the joins and the grouping and the cross apply, and then late in the plan a filter is applied and the result drops down to around 500 rows.
The database was processing millions of rows to produce 500.
That's the problem in one sentence.
The question is, why did this happen?
Let me show you what was happening in the query structure.
The query had a filter on the account number.
The logic was correct. If you ran the query, you got the right 500 rows back.
The issue was where that filter lived.
The filter on the account number was applied in the outermost part of the query where the account records were being handled.
The subquery that retrieved transaction data, which was the part that was going into the 38 million row table, had a filter on the account number, too.
But it was not filtering on the parameter of the account number.
This seems like it doesn't matter, but it did for this query.
So, what the database was doing was this: run the inner subquery, retrieve a massive set of transaction data, carry that through the joins and the grouping and the cross apply, and then at the very end, apply the account filter and throw away everything that wasn't needed. To anyone reading the query, it probably seemed okay.
There was a filter to ensure the data is only retrieved for the right account.
But it was filtered at the end of the process.
It was doing an enormous amount of work and then discarding most of it.
Now, you might be wondering why didn't the query optimizer fix this automatically?
The optimizer is smart and in a lot of cases it will push filters down into the subqueries on its own, but it doesn't always do it.
The complexity of the query, including the cross apply, the grouping, the structure of the subquery, meant the optimizer didn't push the filter down in this case.
It processed the data in the order the query described. This is an important thing to understand.
The optimizer works within the structure you give it. It has limits and when you write a filter in the wrong place, you are making a performance decision, even if you're only thinking about logic at the time.
The fix was straightforward once the cause was clear.
The fix for this query was to add a filter to match the transaction account number to the account number parameter inside the subquery that retrieves the transaction data. That's it. One condition was added in one place.
The subquery now only pulls transactions for the relevant account. Instead of loading millions of rows and filtering later, the database loads a small set of rows from the start and everything after that point is working with far less data.
I ran the query. It completed in 0.07 seconds.
That's a drop from 21 seconds to 0.07 seconds, around 300 times faster from one added filter.
It would vary slightly for each run, but it was roughly 0.07 seconds.
I want to look at the updated execution plan for a moment because it's worth seeing what actually changed.
The plan looks similar in structure. The same joins, the same cross apply, the same grouping, but the row counts at the early stages are dramatically lower.
The data retrieval step is now returning a small number of rows rather than millions.
And because fewer rows are entering the plan, every step after it, including the joins, the sort, and the grouping, all become cheaper.
The sort step, which was the most expensive step in the original plan, is still there, but its cost drops significantly because there's much less to sort.
The result is still exactly correct.
It has the same 500 rows.
The only thing that changed is how much work the database had to do to get there.
This pattern of filtering light and spending time loading every row you didn't need is not unusual.
It comes up regularly in real systems, and it's easy to miss because the query is often logically correct.
When developers write subqueries, they tend to think about them in terms of what they return, not in terms of how much data they touch to return it.
You write the inner query to get the transaction data. You write the outer query to filter it down to the account you care about, and the query works.
You test it on a small data set, and it's fast enough. You move on. The problem only surfaces at scale.
When the transactions table has 38 million rows, a subquery that reads the whole thing becomes expensive in a way it wasn't when the table had 50,000 rows.
This is one of the reasons that performance work on a development environment doesn't always translate to production. The data volumes are different, and queries that looked fine at a small scale behave completely differently when the table grows.
The execution plan is what makes this visible.
Without it, you're looking at a query that is logically correct and wondering why it's slow.
With it, you can see the row counts and see where the data is coming from, and find the point where the unnecessary work is happening.
Let me give you the steps I used here, because this is more useful than the fix itself.
The first step is to measure.
Before you change anything, run the query and record the baseline run time.
If you don't have a baseline, you can't tell whether your changes actually helped.
The second step is to narrow the problem.
If you're dealing with a slow report or process, don't start at the query level.
Profile the procedures first, find the biggest contributor, then look at the queries within it.
Focus on the thing that accounts for the most time.
The third step is to open the actual execution plan.
Not the estimated plan, the actual one.
Run the query and look at what the database actually did, including the actual row counts at each step.
The estimated execution plan can help, which is sometimes the only option, but the actual execution plan is more useful.
The fourth step is to read the row counts. Start from the right side of the plan where the data comes in and trace it through to the left, where the result comes out.
Find where large volumes are entering the plan and where they are filtered out.
If the filtering is happening late, that's your signal.
The fifth step is to ask a simple question. Is there a filter that could be applied earlier?
If the outer query is filtering on something that could also exist in the inner query or the subquery, try adding it there.
Check that the result is still correct.
Measure the run time again.
That's the whole process. It's not complicated, but it requires looking at the execution plan rather than guessing.
This approach works beyond subqueries as well.
The same pattern can appear in CTEs, in derived tables, and in multi-step procedures.
The specific structure is different, but the principle is the same.
Find where unnecessary rows are entering the plan and stop them as early as possible.
If you want a checklist of things to look at to improve the performance of your SQL queries, I've put together a free SQL performance checklist that covers this. The link is in the description.
If you found this useful, you'll want to watch this video next where I take a slow query and walk through the full analysis from scratch including a couple of things I tried that didn't work and what that tells you about the process.
Thanks for watching.
Related Videos
Agentforce NOW AMA: Build with React and Salesforce Multi-Framework
SalesforceDevs
490 views•2026-05-28
How agent o11y differs from traditional o11y — Phil Hetzel, Braintrust
aiDotEngineer
450 views•2026-05-28
WEB TECHNOLOGIES UNIT-2 | Degree 4th sem BCOM Computers web technologies unit-2 full explanation💯✅
LearnwithSahera
1K views•2026-05-29
More tests are always better? How to use AI to identify tests that bring little value
Alliance4Qualification
335 views•2026-05-29
Search Algorithms Explained in 60 Seconds! 🤖💨
samarthtuliofficial
218 views•2026-06-01
People of Game of Thrones using JavaScript DOM
AltCampus
296 views•2026-05-30
Introduction to Problem Solving Part - 1 | Lecture 1 | Intermediate DSA
ascensionix
107 views•2026-05-29
🚀 BCS613C Compiler Design | Module 1 to 5 Schema Evaluation 🔥 | VTU 6th Sem 💯 #VTU #bcs613c #exam
Pranavaa-y4y
104 views•2026-06-02











