UNION (Distinct)
The database must perform a “Sort” or “Hash” operation to find and remove duplicate rows. This is expensive and slow on large data sets.
While JOINs combine tables horizontally (adding columns), Set Operators combine result sets vertically (adding rows).
Set operators allow you to combine the output of two separate SELECT statements into a single result set.
UNION: Combines rows and removes duplicates.UNION ALL: Combines rows and keeps duplicates.INTERSECT: Returns only rows that appear in both results.EXCEPT (or MINUS): Returns rows from the first result that do not exist in the second.To use set operators, both queries must follow these rules:
This is one of the most important performance distinctions in SQL.
UNION (Distinct)
The database must perform a “Sort” or “Hash” operation to find and remove duplicate rows. This is expensive and slow on large data sets.
UNION ALL
The database simply “appends” the second result to the first. No sorting or checking for duplicates occurs. This is significantly faster.
We have a Customers table and a Suppliers table. We want a single list of all unique cities where we have business partners.
SELECT City FROM CustomersUNIONSELECT City FROM SuppliersORDER BY City;SELECT City FROM CustomersEXCEPTSELECT City FROM Suppliers;| City |
|---|
| Berlin |
| London |
| New York |
| Tokyo |
Where does ORDER BY go?
You cannot put an ORDER BY in the first query of a UNION. The ORDER BY must come at the very end and applies to the entire consolidated result set.
Logically, the engine:
UNION).ORDER BY on the combined data.If you have two tables on different physical disks or servers, UNION ALL can sometimes be executed in parallel by the database engine, fetching rows from both tables simultaneously. UNION (distinct) usually prevents this parallelism because the engine needs to wait for all data to arrive before it can start the deduplication process.