Execution Plans

When you send a SQL query to the database, you are telling it what you want. The Query Optimizer then decides how to get it. The resulting “map” of how the database will execute your query is called the Execution Plan.

Definition: The Execution Plan

An Execution Plan is a series of steps (or operators) that the database engine follows to retrieve the requested data. It shows which indexes will be used, how tables will be joined, and the estimated “cost” of each step.

Context: EXPLAIN

To see the plan, you prefix your query with the EXPLAIN keyword (or use a graphical tool like SQL Server Management Studio’s “Execution Plan” tab).

EXPLAIN SELECT * FROM Users WHERE ID = 50;

Detailed Explanation: Key Operators to Watch

When reading a plan, you are looking for specific “Red Flags” that indicate slow performance.

1. Sequential Scan (Table Scan)

Red Flag! The database is reading every single row of the table from the disk. This is usually caused by a missing index or a non-SARGable query.

2. Index Seek / Index Scan

Good Signal. The database is using an index. A “Seek” is generally better than a “Scan.”

3. Nested Loop vs. Hash Join

Nested Loop: Good for small sets or when joining on an index.
Hash Join: Good for large, unindexed sets. If the plan shows a Hash Join for a small query, it might mean an index is missing.

Example: Comparing Two Plans

Imagine we have a Products table with 1 million rows.

Query 1: Unindexed Search

EXPLAIN SELECT * FROM Products WHERE Price = 19.99;

Result: Seq Scan on Products (cost=0.00..18500.00 rows=12 width=45)
The “cost” is high because it must read the whole table.

Query 2: Indexed Search

After running: CREATE INDEX idx_price ON Products(Price); EXPLAIN SELECT * FROM Products WHERE Price = 19.99;

Result: Index Seek using idx_price on Products (cost=0.42..8.44 rows=12 width=45)
The cost dropped from 18,500 to 8.44!

Under the Hood: Statistics and the Cost-Based Optimizer

How does the optimizer know which index is best? It uses Statistics.

The database maintains a “Metadata” layer that stores:

The total number of rows in each table.
A “Histogram” of values (e.g., “30% of users are from the USA”).
The “Density” of columns (how many unique values exist).

If your statistics are “Stale” (outdated), the optimizer might choose a terrible plan. Most databases have an ANALYZE or UPDATE STATISTICS command to refresh this metadata.

The “Tipping Point”

If your query asks for 90% of the rows in a table, the optimizer might choose a Sequential Scan even if an index exists. Why? Because jumping back and forth between the index and the table data (Random I/O) is slower than just reading the whole table in order (Sequential I/O).