Value Extraction
In standard SQL, rows are independent. You cannot easily compare “today’s price” with “yesterday’s price” without performing a complex self-join. LEAD and LAG solve this by allowing you to “reach” into other rows.
Definition: LEAD and LAG
Section titled “Definition: LEAD and LAG”LAG(column, offset, default): Returns the value from a row behind (above) the current row.LEAD(column, offset, default): Returns the value from a row ahead (below) the current row.
- offset: How many rows to reach (Default is 1).
- default: What to return if the reach goes out of bounds (Default is
NULL).
Context: Period-over-Period Analysis
Section titled “Context: Period-over-Period Analysis”LEAD and LAG are the primary tools for time-series analysis. They are used to calculate:
- Month-over-Month growth.
- The time elapsed between two events.
- Price changes between transactions.
Detailed Explanation: Syntax
Section titled “Detailed Explanation: Syntax”SELECT Date, Price, LAG(Price) OVER(ORDER BY Date) AS PrevPrice, Price - LAG(Price) OVER(ORDER BY Date) AS PriceChangeFROM StockPrices;Example: Customer Re-ordering
Section titled “Example: Customer Re-ordering”We have an Orders table:
| CustomerID | OrderDate |
|---|---|
| 1 | 2023-01-01 |
| 1 | 2023-01-15 |
| 2 | 2023-01-05 |
| 1 | 2023-02-01 |
Task: For each order, find how many days have passed since that customer’s previous order.
Section titled “Task: For each order, find how many days have passed since that customer’s previous order.”SELECT CustomerID, OrderDate, OrderDate - LAG(OrderDate) OVER(PARTITION BY CustomerID ORDER BY OrderDate) AS DaysSinceLastOrderFROM Orders;Output
Section titled “Output”| CustomerID | OrderDate | DaysSinceLastOrder |
|---|---|---|
| 1 | 2023-01-01 | NULL |
| 1 | 2023-01-15 | 14 |
| 1 | 2023-02-01 | 17 |
| 2 | 2023-01-05 | NULL |
(Note: The first order for each customer returns NULL because there is no “previous” row within their partition.)
Under the Hood: The Row Buffer
Section titled “Under the Hood: The Row Buffer”When executing LAG, the database engine keeps a small buffer of the previous rows in memory as it iterates through the partition. Because the window is already sorted (required by the ORDER BY in the OVER clause), it simply looks back at the $n$-th previous row in its memory buffer.
Performance Tip
Section titled “Performance Tip”LEAD and LAG are extremely efficient because they only require a single pass through the sorted data. A self-join achieving the same result would require a second scan of the table, making it twice as slow.