Skip to content

Set Operators

While JOINs combine tables horizontally (adding columns), Set Operators combine result sets vertically (adding rows).

Set operators allow you to combine the output of two separate SELECT statements into a single result set.

  1. UNION: Combines rows and removes duplicates.
  2. UNION ALL: Combines rows and keeps duplicates.
  3. INTERSECT: Returns only rows that appear in both results.
  4. EXCEPT (or MINUS): Returns rows from the first result that do not exist in the second.

To use set operators, both queries must follow these rules:

  • The number of columns must be the same.
  • The data types of the columns must be compatible (in order).
  • The column names in the final result set are taken from the first query.

This is one of the most important performance distinctions in SQL.

UNION (Distinct)

The database must perform a “Sort” or “Hash” operation to find and remove duplicate rows. This is expensive and slow on large data sets.

UNION ALL

The database simply “appends” the second result to the first. No sorting or checking for duplicates occurs. This is significantly faster.

We have a Customers table and a Suppliers table. We want a single list of all unique cities where we have business partners.

Vertical Combination
SELECT City FROM Customers
UNION
SELECT City FROM Suppliers
ORDER BY City;

Finding “Customers who are NOT Suppliers”

Section titled “Finding “Customers who are NOT Suppliers””
The Set Difference
SELECT City FROM Customers
EXCEPT
SELECT City FROM Suppliers;
City
Berlin
London
New York
Tokyo

Where does ORDER BY go?

You cannot put an ORDER BY in the first query of a UNION. The ORDER BY must come at the very end and applies to the entire consolidated result set.

Logically, the engine:

  1. Executes Query A.
  2. Executes Query B.
  3. Combines them (and potentially sorts them to remove duplicates if using UNION).
  4. Performs the final ORDER BY on the combined data.

If you have two tables on different physical disks or servers, UNION ALL can sometimes be executed in parallel by the database engine, fetching rows from both tables simultaneously. UNION (distinct) usually prevents this parallelism because the engine needs to wait for all data to arrive before it can start the deduplication process.