The GROUP BY Clause
Aggregating an entire table into one row is useful, but usually, we want to see statistics broken down by category—for example, total sales per region or average salary per department. This is where GROUP BY comes in.
Definition: The GROUP BY Clause
Section titled “Definition: The GROUP BY Clause”The GROUP BY clause groups rows that have the same values in specified columns into summary rows.
The Golden Rule of Aggregation
Section titled “The Golden Rule of Aggregation”This is the most common source of errors for SQL beginners.
If you use an aggregate function (like SUM) in your SELECT list, every other column in that SELECT list MUST either be part of an aggregate function or appear in the GROUP BY clause.
Context: Logical Order Revisited
Section titled “Context: Logical Order Revisited”Remember the logical order of execution:
FROMWHEREGROUP BYSELECT
Because GROUP BY happens before SELECT, the database first organizes the data into buckets (groups). Only then does it calculate the SUM or COUNT for each bucket.
Detailed Explanation: Syntax
Section titled “Detailed Explanation: Syntax”SELECT column_name, AGGREGATE_FUNCTION(column_name)FROM table_nameWHERE conditionGROUP BY column_name;Grouping by Multiple Columns
Section titled “Grouping by Multiple Columns”You can group by multiple levels. For example, to see sales by Year AND Region:
GROUP BY Year, Region
Example: Departmental Stats
Section titled “Example: Departmental Stats”We have an Employees table:
| Name | Dept | Salary |
|---|---|---|
| Alice | IT | 90000 |
| Bob | Sales | 70000 |
| Charlie | IT | 85000 |
| Diana | Sales | 65000 |
Task: Find the total payroll and headcount for each department.
Section titled “Task: Find the total payroll and headcount for each department.”SELECT Dept, COUNT(*) AS Headcount, SUM(Salary) AS TotalPayrollFROM EmployeesGROUP BY Dept;Output
Section titled “Output”| Dept | Headcount | TotalPayroll |
|---|---|---|
| IT | 2 | 175000 |
| Sales | 2 | 135000 |
Under the Hood: The Internal Grouping Process
Section titled “Under the Hood: The Internal Grouping Process”When the engine encounters GROUP BY Dept:
- Sorting/Hashing: It either sorts the table by
Deptor builds a hash map where each uniqueDeptname is a key. - Bucketing: As it scans the rows, it puts each row into its respective bucket.
- Aggregating: It updates the running totals (
COUNT,SUM) for that specific bucket. - Finalization: Once the table scan is complete, it outputs one row for every bucket created.
Performance Tip: Filtering Early
Section titled “Performance Tip: Filtering Early”Always filter as much data as possible using WHERE before the GROUP BY.
WHEREreduces the number of rows that need to be sorted and bucketed.- The less data the
GROUP BYhas to process, the faster your query will run.