Aggregate Functions
Aggregate functions perform a calculation on a set of values and return a single value. They are the backbone of reporting and data analysis in SQL.
Definition: The Big 5
Section titled “Definition: The Big 5”There are five primary aggregate functions that are standard across all SQL dialects:
COUNT(): Returns the number of rows.SUM(): Returns the total sum of a numeric column.AVG(): Returns the average value of a numeric column.MIN(): Returns the smallest value.MAX(): Returns the largest value.
Context: Aggregates and NULLs
Section titled “Context: Aggregates and NULLs”A critical rule in SQL is that aggregate functions ignore NULL values, with one major exception: COUNT(*).
COUNT(column_name)counts the number of non-null values in that column.COUNT(*)counts every row in the result set, including those that are entirely null.
Detailed Explanation: Syntax
Section titled “Detailed Explanation: Syntax”Aggregate functions are used in the SELECT list.
SELECT COUNT(*) AS TotalRows, SUM(Salary) AS TotalPayroll, AVG(Salary) AS AverageSalary, MIN(Salary) AS LowestSalary, MAX(Salary) AS HighestSalaryFROM Employees;Example: Sales Performance
Section titled “Example: Sales Performance”We have a Sales table:
| SaleID | Amount | Tax |
|---|---|---|
| 1 | 100.00 | 5.00 |
| 2 | 200.00 | 10.00 |
| 3 | 50.00 | NULL |
| 4 | NULL | NULL |
Task: Calculate basic statistics for the sales amounts.
Section titled “Task: Calculate basic statistics for the sales amounts.”SELECT COUNT(*) AS RawCount, COUNT(Amount) AS ValidAmountCount, SUM(Amount) AS TotalRevenue, MAX(Amount) AS LargestSaleFROM Sales;Output
Section titled “Output”| RawCount | ValidAmountCount | TotalRevenue | LargestSale |
|---|---|---|---|
| 4 | 3 | 350.00 | 200.00 |
Under the Hood: Stream Aggregation vs. Hash Aggregation
Section titled “Under the Hood: Stream Aggregation vs. Hash Aggregation”How does the database calculate a sum over 1 billion rows without running out of memory?
1. Stream Aggregation
Section titled “1. Stream Aggregation”If the data is already sorted (via an index), the engine can “stream” the data. It keeps a single running total in memory and adds each row to it as it passes by. This uses almost zero memory regardless of the table size.
2. Hash Aggregation
Section titled “2. Hash Aggregation”If the data is not sorted and you are grouping (see next chapter), the engine builds a “Hash Table” in memory. Each group becomes a key in the table, and the aggregate value (the running sum) is the value. If the hash table becomes too large for memory, the engine must spill the data to disk, which significantly slows down the query.
Accuracy Note
Section titled “Accuracy Note”For SUM and AVG on floating-point numbers (FLOAT/REAL), you might encounter tiny rounding errors (e.g., 0.000000000001) due to how computers handle binary decimals. For financial data, always use DECIMAL or NUMERIC types to ensure exact precision.