Skip to content

Distinct Values

Data often contains repetitive information. To find unique entries or count how many unique categories exist, we use the DISTINCT keyword.

DISTINCT is used in the SELECT clause to remove duplicate rows from the result set. It ensures that every row returned is unique across all selected columns.

DISTINCT is used in two primary ways:

  1. Row-level uniqueness: SELECT DISTINCT Category FROM Products;
  2. Inside an aggregate: SELECT COUNT(DISTINCT Category) FROM Products;

When you use DISTINCT with multiple columns, the database looks at the combination of those columns. SELECT DISTINCT City, Country FROM Customers; This will return one row for “Paris, France” and one for “Paris, Texas”. It only removes a row if both City and Country are identical to another row.

This is a powerful tool for finding the number of unique items in a set. SELECT COUNT(DISTINCT CustomerID) FROM Orders; This tells you how many unique customers placed orders, regardless of how many total orders exist in the table.

We have an Orders table:

OrderIDCustomerIDOrderMonth
1501January
2501January
3502January
4501February

Task: How many unique customers placed orders in January?

Section titled “Task: How many unique customers placed orders in January?”
Unique Count
SELECT COUNT(DISTINCT CustomerID) AS UniqueJanuaryCustomers
FROM Orders
WHERE OrderMonth = 'January';
UniqueJanuaryCustomers
2

(Note: Total rows for January was 3, but Customer 501 was a repeat customer, so they only count once.)

How does the database find duplicates?

  1. Sort: It sorts the entire result set. Once sorted, duplicates will be adjacent to each other. The engine then simply skips any row that is identical to the one before it.
  2. Hash: It builds a hash table. As it processes rows, it checks if the “hash” of that row already exists in the table. If yes, it discards the row.

Both sorting and hashing are expensive operations that require CPU and memory. Using DISTINCT on a result set of 10 million rows can be very slow.

ScenarioTool
”What is the total…”SUM()
”How many unique…”COUNT(DISTINCT ...)
”For each category…”GROUP BY
”Only show groups where…”HAVING
”Remove duplicates from results…”DISTINCT