Common Table Expressions (CTEs)
Common Table Expressions (CTEs) are the modern standard for writing complex SQL. They allow you to define a “virtual table” (a named result set) that exists only for the duration of a single query.
Definition: The WITH Clause
Section titled “Definition: The WITH Clause”A CTE is defined using the WITH keyword at the very beginning of your query.
WITH MyVirtualTable AS ( SELECT ... (Your Subquery Here))SELECT * FROM MyVirtualTable;Context: Why CTEs are Better than Subqueries
Section titled “Context: Why CTEs are Better than Subqueries”- Readability: CTEs allow for “Top-Down” logic. You define your data components first and then use them in the final
SELECT. Subqueries force you to read “Inside-Out”. - Modularity: You can define multiple CTEs in a single query, separated by commas.
- Reusability: You can reference the same CTE multiple times in your main query (e.g., joining a CTE to itself).
- Recursion: Only CTEs support recursive logic (needed for tree structures).
Detailed Explanation: Modular Design
Section titled “Detailed Explanation: Modular Design”Think of CTEs as “variables” for tables.
WITHDepartmentPayroll AS ( SELECT Dept, SUM(Salary) AS TotalDeptSalary FROM Employees GROUP BY Dept),CompanyAverage AS ( SELECT AVG(TotalDeptSalary) AS AvgPayroll FROM DepartmentPayroll)SELECT Dept, TotalDeptSalaryFROM DepartmentPayrollWHERE TotalDeptSalary > (SELECT AvgPayroll FROM CompanyAverage);Example: Cleaning Before Joining
Section titled “Example: Cleaning Before Joining”Often, you want to clean or aggregate data from one table before joining it to another. Doing this in a CTE makes the join logic much clearer.
WITH CleanedSales AS ( SELECT ProductID, SUM(Quantity) AS TotalSold FROM Sales WHERE SaleDate >= '2023-01-01' GROUP BY ProductID)SELECT p.ProductName, s.TotalSoldFROM Products pJOIN CleanedSales s ON p.ID = s.ProductID;Under the Hood: Optimization and Inlining
Section titled “Under the Hood: Optimization and Inlining”How does a CTE affect performance compared to a subquery?
1. Inlining
Section titled “1. Inlining”In most modern versions of PostgreSQL and SQL Server, the optimizer treats a CTE exactly like a subquery. It “inlines” the logic into the main query to find the best execution plan.
2. Materialization (Postgres specific)
Section titled “2. Materialization (Postgres specific)”Historically, PostgreSQL always “materialized” CTEs—meaning it ran the CTE and saved it to a temp table before doing anything else. While this was great for preventing expensive subqueries from running multiple times, it prevented the optimizer from using indexes from the outer query.
- Note: Since Postgres 12, the engine now inlines CTEs automatically unless you explicitly tell it not to using the
MATERIALIZEDkeyword.
3. Scope
Section titled “3. Scope”A CTE only exists for the specific query it is attached to. You cannot reference a CTE in a different tab or a separate database connection.