Skip to content

Value Extraction

In standard SQL, rows are independent. You cannot easily compare “today’s price” with “yesterday’s price” without performing a complex self-join. LEAD and LAG solve this by allowing you to “reach” into other rows.

  1. LAG(column, offset, default): Returns the value from a row behind (above) the current row.
  2. LEAD(column, offset, default): Returns the value from a row ahead (below) the current row.
  • offset: How many rows to reach (Default is 1).
  • default: What to return if the reach goes out of bounds (Default is NULL).

LEAD and LAG are the primary tools for time-series analysis. They are used to calculate:

  • Month-over-Month growth.
  • The time elapsed between two events.
  • Price changes between transactions.
Time Series Comparison
SELECT
Date,
Price,
LAG(Price) OVER(ORDER BY Date) AS PrevPrice,
Price - LAG(Price) OVER(ORDER BY Date) AS PriceChange
FROM StockPrices;

We have an Orders table:

CustomerIDOrderDate
12023-01-01
12023-01-15
22023-01-05
12023-02-01

Task: For each order, find how many days have passed since that customer’s previous order.

Section titled “Task: For each order, find how many days have passed since that customer’s previous order.”
Customer Loyalty Analysis
SELECT
CustomerID,
OrderDate,
OrderDate - LAG(OrderDate) OVER(PARTITION BY CustomerID ORDER BY OrderDate) AS DaysSinceLastOrder
FROM Orders;
CustomerIDOrderDateDaysSinceLastOrder
12023-01-01NULL
12023-01-1514
12023-02-0117
22023-01-05NULL

(Note: The first order for each customer returns NULL because there is no “previous” row within their partition.)

When executing LAG, the database engine keeps a small buffer of the previous rows in memory as it iterates through the partition. Because the window is already sorted (required by the ORDER BY in the OVER clause), it simply looks back at the $n$-th previous row in its memory buffer.

LEAD and LAG are extremely efficient because they only require a single pass through the sorted data. A self-join achieving the same result would require a second scan of the table, making it twice as slow.