Execution Plans
When you send a SQL query to the database, you are telling it what you want. The Query Optimizer then decides how to get it. The resulting “map” of how the database will execute your query is called the Execution Plan.
Definition: The Execution Plan
Section titled “Definition: The Execution Plan”An Execution Plan is a series of steps (or operators) that the database engine follows to retrieve the requested data. It shows which indexes will be used, how tables will be joined, and the estimated “cost” of each step.
Context: EXPLAIN
Section titled “Context: EXPLAIN”To see the plan, you prefix your query with the EXPLAIN keyword (or use a graphical tool like SQL Server Management Studio’s “Execution Plan” tab).
EXPLAIN SELECT * FROM Users WHERE ID = 50;Detailed Explanation: Key Operators to Watch
Section titled “Detailed Explanation: Key Operators to Watch”When reading a plan, you are looking for specific “Red Flags” that indicate slow performance.
1. Sequential Scan (Table Scan)
Section titled “1. Sequential Scan (Table Scan)”Red Flag! The database is reading every single row of the table from the disk. This is usually caused by a missing index or a non-SARGable query.
2. Index Seek / Index Scan
Section titled “2. Index Seek / Index Scan”Good Signal. The database is using an index. A “Seek” is generally better than a “Scan.”
3. Nested Loop vs. Hash Join
Section titled “3. Nested Loop vs. Hash Join”- Nested Loop: Good for small sets or when joining on an index.
- Hash Join: Good for large, unindexed sets. If the plan shows a Hash Join for a small query, it might mean an index is missing.
Example: Comparing Two Plans
Section titled “Example: Comparing Two Plans”Imagine we have a Products table with 1 million rows.
Query 1: Unindexed Search
Section titled “Query 1: Unindexed Search”EXPLAIN SELECT * FROM Products WHERE Price = 19.99;
- Result:
Seq Scan on Products (cost=0.00..18500.00 rows=12 width=45) - The “cost” is high because it must read the whole table.
Query 2: Indexed Search
Section titled “Query 2: Indexed Search”After running: CREATE INDEX idx_price ON Products(Price);
EXPLAIN SELECT * FROM Products WHERE Price = 19.99;
- Result:
Index Seek using idx_price on Products (cost=0.42..8.44 rows=12 width=45) - The cost dropped from 18,500 to 8.44!
Under the Hood: Statistics and the Cost-Based Optimizer
Section titled “Under the Hood: Statistics and the Cost-Based Optimizer”How does the optimizer know which index is best? It uses Statistics.
The database maintains a “Metadata” layer that stores:
- The total number of rows in each table.
- A “Histogram” of values (e.g., “30% of users are from the USA”).
- The “Density” of columns (how many unique values exist).
If your statistics are “Stale” (outdated), the optimizer might choose a terrible plan. Most databases have an ANALYZE or UPDATE STATISTICS command to refresh this metadata.
The “Tipping Point”
Section titled “The “Tipping Point””If your query asks for 90% of the rows in a table, the optimizer might choose a Sequential Scan even if an index exists. Why? Because jumping back and forth between the index and the table data (Random I/O) is slower than just reading the whole table in order (Sequential I/O).