Ranking Functions
Ranking functions allow you to assign a numerical rank to each row within a partition. This is the most common use case for window functions, especially for finding “Top-N” records (e.g., “The top 3 sales per region”).
Definition: The 4 Ranking Specialists
Section titled “Definition: The 4 Ranking Specialists”ROW_NUMBER(): A unique, sequential integer for every row. Even if values are identical, the row numbers will be different.RANK(): Assigns the same rank to identical values. However, it leaves gaps in the sequence. (e.g., 1, 2, 2, 4).DENSE_RANK(): LikeRANK(), it handles ties identically, but it never leaves gaps. (e.g., 1, 2, 2, 3).NTILE(n): Divides the rows intonroughly equal buckets and returns the bucket number. (e.g.,NTILE(4)for quartiles).
Context: Handling Ties
Section titled “Context: Handling Ties”The difference between RANK and DENSE_RANK is best explained with an example. Imagine a race where two people tie for 2nd place.
RANK: The next person is in 4th place. (1, 2, 2, 4).DENSE_RANK: The next person is in 3rd place. (1, 2, 2, 3).
Detailed Explanation: Syntax
Section titled “Detailed Explanation: Syntax”SELECT Name, Score, ROW_NUMBER() OVER(ORDER BY Score DESC) AS RowNum, RANK() OVER(ORDER BY Score DESC) AS Rnk, DENSE_RANK() OVER(ORDER BY Score DESC) AS DenseRnkFROM Students;Example: Top Salesperson per Region
Section titled “Example: Top Salesperson per Region”We have a SalesPerformance table:
| Name | Region | Sales |
|---|---|---|
| Alice | East | 1000 |
| Bob | West | 1200 |
| Charlie | East | 1000 |
| Diana | West | 1500 |
Task: Identify the top seller in each region.
Section titled “Task: Identify the top seller in each region.”WITH RankedSales AS ( SELECT Name, Region, Sales, DENSE_RANK() OVER(PARTITION BY Region ORDER BY Sales DESC) AS SalesRank FROM SalesPerformance)SELECT *FROM RankedSalesWHERE SalesRank = 1;Output
Section titled “Output”| Name | Region | Sales | SalesRank |
|---|---|---|---|
| Alice | East | 1000 | 1 |
| Charlie | East | 1000 | 1 |
| Diana | West | 1500 | 1 |
(Note: Both Alice and Charlie are returned for the East region because they tied for 1st place.)
Under the Hood: The Cost of Sorting
Section titled “Under the Hood: The Cost of Sorting”To calculate a rank, the database must sort the data (internally) by the ORDER BY clause of the window.
1. In-Memory Sort
Section titled “1. In-Memory Sort”If the partition fits in memory, the engine sorts it and then iterates through the rows, keeping track of the current rank and the previous value to determine if a tie has occurred.
2. Index Optimization
Section titled “2. Index Optimization”If an index exists on the PARTITION BY and ORDER BY columns, the database can skip the sort entirely. It simply walks the index and assigns ranks as it goes. This is the difference between a query taking 1 second vs. 1 minute on large datasets.