Skip to content

The GROUP BY Clause

Aggregating an entire table into one row is useful, but usually, we want to see statistics broken down by category—for example, total sales per region or average salary per department. This is where GROUP BY comes in.

The GROUP BY clause groups rows that have the same values in specified columns into summary rows.

This is the most common source of errors for SQL beginners.

If you use an aggregate function (like SUM) in your SELECT list, every other column in that SELECT list MUST either be part of an aggregate function or appear in the GROUP BY clause.

Remember the logical order of execution:

  1. FROM
  2. WHERE
  3. GROUP BY
  4. SELECT

Because GROUP BY happens before SELECT, the database first organizes the data into buckets (groups). Only then does it calculate the SUM or COUNT for each bucket.

GROUP BY Syntax
SELECT column_name, AGGREGATE_FUNCTION(column_name)
FROM table_name
WHERE condition
GROUP BY column_name;

You can group by multiple levels. For example, to see sales by Year AND Region: GROUP BY Year, Region

We have an Employees table:

NameDeptSalary
AliceIT90000
BobSales70000
CharlieIT85000
DianaSales65000

Task: Find the total payroll and headcount for each department.

Section titled “Task: Find the total payroll and headcount for each department.”
Grouping Example
SELECT
Dept,
COUNT(*) AS Headcount,
SUM(Salary) AS TotalPayroll
FROM Employees
GROUP BY Dept;
DeptHeadcountTotalPayroll
IT2175000
Sales2135000

Under the Hood: The Internal Grouping Process

Section titled “Under the Hood: The Internal Grouping Process”

When the engine encounters GROUP BY Dept:

  1. Sorting/Hashing: It either sorts the table by Dept or builds a hash map where each unique Dept name is a key.
  2. Bucketing: As it scans the rows, it puts each row into its respective bucket.
  3. Aggregating: It updates the running totals (COUNT, SUM) for that specific bucket.
  4. Finalization: Once the table scan is complete, it outputs one row for every bucket created.

Always filter as much data as possible using WHERE before the GROUP BY.

  • WHERE reduces the number of rows that need to be sorted and bucketed.
  • The less data the GROUP BY has to process, the faster your query will run.