Skip to content

Aggregate Functions

Aggregate functions perform a calculation on a set of values and return a single value. They are the backbone of reporting and data analysis in SQL.

There are five primary aggregate functions that are standard across all SQL dialects:

  1. COUNT(): Returns the number of rows.
  2. SUM(): Returns the total sum of a numeric column.
  3. AVG(): Returns the average value of a numeric column.
  4. MIN(): Returns the smallest value.
  5. MAX(): Returns the largest value.

A critical rule in SQL is that aggregate functions ignore NULL values, with one major exception: COUNT(*).

  • COUNT(column_name) counts the number of non-null values in that column.
  • COUNT(*) counts every row in the result set, including those that are entirely null.

Aggregate functions are used in the SELECT list.

Basic Aggregation
SELECT
COUNT(*) AS TotalRows,
SUM(Salary) AS TotalPayroll,
AVG(Salary) AS AverageSalary,
MIN(Salary) AS LowestSalary,
MAX(Salary) AS HighestSalary
FROM Employees;

We have a Sales table:

SaleIDAmountTax
1100.005.00
2200.0010.00
350.00NULL
4NULLNULL

Task: Calculate basic statistics for the sales amounts.

Section titled “Task: Calculate basic statistics for the sales amounts.”
Aggregates in Action
SELECT
COUNT(*) AS RawCount,
COUNT(Amount) AS ValidAmountCount,
SUM(Amount) AS TotalRevenue,
MAX(Amount) AS LargestSale
FROM Sales;
RawCountValidAmountCountTotalRevenueLargestSale
43350.00200.00

Under the Hood: Stream Aggregation vs. Hash Aggregation

Section titled “Under the Hood: Stream Aggregation vs. Hash Aggregation”

How does the database calculate a sum over 1 billion rows without running out of memory?

If the data is already sorted (via an index), the engine can “stream” the data. It keeps a single running total in memory and adds each row to it as it passes by. This uses almost zero memory regardless of the table size.

If the data is not sorted and you are grouping (see next chapter), the engine builds a “Hash Table” in memory. Each group becomes a key in the table, and the aggregate value (the running sum) is the value. If the hash table becomes too large for memory, the engine must spill the data to disk, which significantly slows down the query.

For SUM and AVG on floating-point numbers (FLOAT/REAL), you might encounter tiny rounding errors (e.g., 0.000000000001) due to how computers handle binary decimals. For financial data, always use DECIMAL or NUMERIC types to ensure exact precision.