Filtering Aggregates
One of the most frequent questions in SQL is: “I’ve already used WHERE, why do I need HAVING?” The answer lies in the Logical Order of Execution.
Definition: WHERE vs. HAVING
Section titled “Definition: WHERE vs. HAVING”WHERE: Filters individual rows before any grouping or aggregation occurs.HAVING: Filters grouped rows after the aggregation has been calculated.
Context: The Execution Timeline
Section titled “Context: The Execution Timeline”Recall the logical order of execution once more:
FROMWHERE(Filters raw data)GROUP BY(Groups data)HAVING(Filters the calculated groups)SELECTORDER BY
If you try to filter an aggregate result in the WHERE clause, the query will fail because at that point in time, the sum or average hasn’t been calculated yet.
Detailed Explanation: When to use which?
Section titled “Detailed Explanation: When to use which?”Use WHERE for:
Section titled “Use WHERE for:”- Filtering based on specific column values (e.g.,
WHERE Status = 'Active'). - Removing data you don’t want to include in your sums/counts.
- Narrowing down a date range.
Use HAVING for:
Section titled “Use HAVING for:”- Filtering based on the result of an aggregate (e.g.,
HAVING COUNT(*) > 5). - Finding groups that meet a summary condition (e.g.,
HAVING AVG(Price) < 50).
Example: High-Value Departments
Section titled “Example: High-Value Departments”We want to find departments with more than 1 employee, whose total payroll is over $150,000, but we only care about “Full Time” employees.
SELECT Dept, COUNT(*) AS EmployeeCount, SUM(Salary) AS TotalPayrollFROM EmployeesWHERE JobType = 'Full Time' -- Filter raw rows firstGROUP BY DeptHAVING COUNT(*) > 1 -- Filter groups after calculation AND SUM(Salary) > 150000;Output
Section titled “Output”| Dept | EmployeeCount | TotalPayroll |
|---|---|---|
| Engineering | 5 | 550000 |
| Product | 3 | 280000 |
(Note: The ‘Marketing’ department was removed because it only had 1 employee, and ‘Sales’ was removed because its payroll was too low.)
Under the Hood: Efficiency Matters
Section titled “Under the Hood: Efficiency Matters”A common mistake is putting a filter in HAVING that could have gone in WHERE.
Inefficient:
GROUP BY Dept HAVING Dept = 'Sales'
Efficient:
WHERE Dept = 'Sales' GROUP BY Dept
Why?
The WHERE clause reduces the data before the expensive grouping process. If you use HAVING to filter a non-aggregated column, you are forcing the database to group all the data, only to throw most of it away at the very end.