Optimization Best Practices
Performance optimization is often about doing less work. By writing more precise SQL, you reduce the burden on the database’s CPU, Memory, and Disk I/O.
Definition: The “Less is More” Philosophy
Section titled “Definition: The “Less is More” Philosophy”Every optimization strategy boils down to one goal: Reduce the amount of data the database has to touch.
Context: Top 5 Performance Boosters
Section titled “Context: Top 5 Performance Boosters”1. Stop using SELECT *
Section titled “1. Stop using SELECT *”Always name your columns.
- Why:
SELECT *forces the database to fetch data for every column, even those you don’t need (like large BLOB or TEXT fields). It also prevents “Covering Index” optimizations.
2. Use UNION ALL instead of UNION
Section titled “2. Use UNION ALL instead of UNION”- Why:
UNIONperforms a hiddenDISTINCToperation, which requires an expensive sort and duplicate check. If you know your data sets are unique or you don’t care about duplicates,UNION ALLis nearly instant.
3. Filter Early (The “Narrowing” Principle)
Section titled “3. Filter Early (The “Narrowing” Principle)”Put your most restrictive filters in the WHERE clause rather than the HAVING clause or the final SELECT.
- Why: The earlier you remove rows, the less data the subsequent
JOIN,GROUP BY, andORDER BYoperations have to process.
4. Avoid Wildcards at the Start of a String
Section titled “4. Avoid Wildcards at the Start of a String”- Slow:
LIKE '%abc' - Fast:
LIKE 'abc%' - Why: Leading wildcards make a query non-SARGable (see Chapter 8.3).
5. Be Careful with OR
Section titled “5. Be Careful with OR”- Why: The optimizer often struggles with
OR. Sometimes, it’s faster to write two separate queries combined with aUNION ALL.
Detailed Explanation: The Covering Index
Section titled “Detailed Explanation: The Covering Index”A Covering Index is the ultimate performance optimization. It occurs when an index contains every column requested by the query.
-- QuerySELECT Email FROM Users WHERE Email = 'test@test.com';
-- IndexCREATE INDEX idx_email ON Users(Email);In this case, the database engine never touches the actual table. It finds the email in the index, sees that the query only asked for the email, and returns it directly from the index. This avoids a “Bookmark Lookup” (jumping from index to table), which is the most expensive part of a query.
Example: Before and After Optimization
Section titled “Example: Before and After Optimization”The Slow Way
Section titled “The Slow Way”SELECT *FROM OrdersWHERE UPPER(Status) = 'SHIPPED'UNIONSELECT *FROM ArchivesWHERE UPPER(Status) = 'SHIPPED';The Optimized Way
Section titled “The Optimized Way”SELECT OrderID, CustomerID, OrderDateFROM OrdersWHERE Status = 'Shipped'UNION ALLSELECT OrderID, CustomerID, OrderDateFROM ArchivesWHERE Status = 'Shipped';What changed?
- Column Selection: Removed
SELECT *. - SARGability: Removed
UPPER()from theWHEREclause. - Set Operator: Changed
UNIONtoUNION ALL.
Under the Hood: The Cost of Sorting
Section titled “Under the Hood: The Cost of Sorting”Many SQL operations trigger a Sort:
ORDER BYGROUP BYDISTINCTUNION(not UNION ALL)JOIN(specifically Merge Joins)
Sorting is a CPU-intensive operation. If the data to be sorted is larger than the available work_mem (in Postgres) or Sort Area (in SQL Server), the database must use External Merge Sort, which writes data to the disk. This is a common cause of “Disk I/O spikes.”