Learn by reading through in order

Window Functions ③ — Ranking, Analytic Functions, and Execution Order

Walk through how ROW_NUMBER / RANK / DENSE_RANK handle ties, how LAG / LEAD pull values from the previous / next row, how NTILE(4) splits rows into n groups, and why you have to use a subquery to filter on a rank — all with sales data.

Ranking — ROW_NUMBER / RANK / DENSE_RANK

The dedicated ranking functions are ROW_NUMBER() / RANK() / DENSE_RANK().

All three number rows in the order given by OVER (ORDER BY ...), but they handle ties (equal values) differently.

FunctionHow ties (equal values) are handled
ROW_NUMBER()Forces consecutive numbering even on ties (always 1, 2, 3, ...)
RANK()Ties share the same rank, then the next rank skips by however many were tied (2 tied at 5th → next is 7)
DENSE_RANK()Ties share the same rank, but the next rank doesn't skip (the row after a tie at 5th is 6)

The SQL below — and the diagram that visualizes its result — walks through a concrete case with tied totals row by row.

-- Compute each employee's total sales, then rank in descending order of total in 3 ways
SELECT emp_id,
  SUM(amount) AS total,
  ROW_NUMBER() OVER (ORDER BY SUM(amount) DESC) AS rn,
  RANK()       OVER (ORDER BY SUM(amount) DESC) AS rk,
  DENSE_RANK() OVER (ORDER BY SUM(amount) DESC) AS dr
FROM sales
GROUP BY emp_id
ORDER BY total DESC;
-- You can first aggregate with GROUP BY and then rank those aggregates
How ROW_NUMBER / RANK / DENSE_RANK differ
tied at 5th (same total)emp_idtotalROW_NUMBERRANKDENSE_RANK29147500044481250000555141250000655181245000776← tied at 5th (same total)consecutiveskips 6no skip
When 2 employees (emp_id 8 and 14) tie at 5th place with total 1,250,000, ROW_NUMBER assigns 5 and 6 (forced consecutive), RANK assigns 5 to both and jumps the next person to 7 (skipping 6), and DENSE_RANK assigns 5 to both and gives the next person 6 (no skip). Cells with the same rank value share a color.

Imagine you want a ranking of employees by total sales. There are ties, so you'd like to see all three side by side: consecutive numbering, skipping next ranks, and not skipping. (If your query runs correctly, an explanation will appear.)

GROUP BY the sales table by emp_id and select SUM(amount) aliased total.

② Over a window ordered by descending total, add ROW_NUMBER() aliased rn, RANK() aliased rk, and DENSE_RANK() aliased dr.

③ The grader checks row order, so end with ORDER BY total DESC, emp_id (descending by total, ascending by emp_id for ties).

SQL Editor

Run a query to see results

LAG / LEAD — pull values from the previous / next row

LAG(column) pulls the value from the previous row onto the current row, and LEAD(column) pulls from the next row (lag = to fall behind, lead = to go ahead).

OVER (PARTITION BY ... ORDER BY ...) decides what "previous" and "next" mean by setting the order.

Use them for period-over-period diffs (vs. previous sale), vs.-last-month deltas, or comparisons against the next scheduled event.

LAG(amount) returns NULL on the first row of each partition (there's no previous row); similarly LEAD is NULL on the last row.

The optional second argument controls offset ("how many rows back") and the third sets a default value when the target row is missing — e.g., LAG(amount, 1, 0).

Subtract with amount - LAG(amount) OVER (...) to get the diff vs. the previous row directly.

LAG = previous row / LEAD = next row
sale_dateamountLAG(amount)prev rowLEAD(amount)next row01-16400000NULL46000003-1146000040000044500004-1844500046000043000005-2543000044500041500006-09415000430000NULL↑ copies amount from above↓ copies amount from below
Ordering emp_id=2's sales by sale_date, LAG(amount) shows the previous row's amount and LEAD(amount) shows the next row's amount, attached to the current row. The very first LAG and the very last LEAD are NULL since there's no neighboring row.
-- Show previous sale (LAG) and the diff vs. the previous sale
SELECT emp_id, sale_date, amount,
  LAG(amount) OVER (PARTITION BY emp_id ORDER BY sale_date) AS prev_amount,
  amount - LAG(amount) OVER (PARTITION BY emp_id ORDER BY sale_date) AS diff_prev
FROM sales
ORDER BY emp_id, sale_date;

-- Use LEAD if you want to show the next sale alongside the current row
SELECT emp_id, sale_date, amount,
  LEAD(amount) OVER (PARTITION BY emp_id ORDER BY sale_date) AS next_amount
FROM sales
ORDER BY emp_id, sale_date;

Imagine you want a per-employee list of sales ordered by date, with the previous sale and the diff vs. that previous sale alongside each row.

① Select emp_id, sale_date, and amount from sales.

② Over a window partitioned per employee and ordered by sale_date, add the previous row's amount aliased prev_amount.

③ Also add amount minus the previous amount aliased diff_prev.

④ The grader checks row order, so end with ORDER BY emp_id, sale_date.

SQL Editor

Run a query to see results

NTILE — split ordered rows into n evenly sized groups

NTILE(n) splits the rows ordered by OVER (ORDER BY ...) into n as-evenly-sized-as-possible groups, labeling each row with a group number from 1 to n.

Use it for quartiles, quintiles, "top 25%", and similar bucketing.

When the row count isn't divisible by n, the earlier groups get one extra row each (e.g., splitting 5 rows into 4 groups gives 2 / 1 / 1 / 1).

Combine with PARTITION BY to do things like "split each employee's sales into upper / lower buckets within that employee."

The diagram below shows what happens when emp_id=11's 5 sales are sorted by amount desc and split with NTILE(4).

NTILE(4) — 5 rows into 4 groups (1 extra up front)
group 1 (top 25%)amount (desc)NTILE(4)which group4150001group 1 (2 rows)4000001group 1 (2 rows)3850002group 2 (1 row)3700003group 3 (1 row)3550004group 4 (1 row)5 ÷ 4 remainder 1 → +1 row to the first group
Ordering emp_id=11's 5 sales by amount desc and applying NTILE(4) puts 2 rows in group 1 and 1 row in each of the remaining 3 groups (the remainder 1 from 5 ÷ 4 goes to the first group).
-- Per employee, split that employee's sales into 4 groups by amount desc
SELECT emp_id, sale_date, amount,
  NTILE(4) OVER (PARTITION BY emp_id ORDER BY amount DESC) AS quartile
FROM sales
ORDER BY emp_id, amount DESC;

-- Split all sales into 4 buckets by amount (top 25% has quartile = 1)
SELECT sale_id, emp_id, amount,
  NTILE(4) OVER (ORDER BY amount DESC) AS quartile
FROM sales
ORDER BY amount DESC;

Imagine you want a list of sales split into 4 groups per employee (by amount desc), shown in quartile order from top group down.

① Select emp_id, sale_date, and amount from sales.

② Over a window partitioned per employee and ordered by amount desc, compute NTILE(4) and add it as a 4th column aliased quartile.

③ The grader checks row order, so end with `ORDER BY quartile, emp_id, sale_id` (ORDER BY can refer to the quartile alias computed in SELECT).

SQL Editor

Run a query to see results

Imagine you want to filter down to just the top 25% (quartile = 1). Try adding WHERE quartile = 1 to the Exercise 3 query.

① Use the same SELECT from Exercise 3 and add WHERE quartile = 1 after FROM sales.

② Running this returns an error (an error means you've got it right).

③ The next section, "SQL execution order", explains why. quartile worked in ORDER BY but doesn't work in WHERE — keep that puzzle in mind as we move on.

SQL Editor

Run a query to see results

SQL execution order — why you can't put a window function in WHERE

You might wonder, "can't I just filter on the rank I computed, like WHERE rk = 1?"

You'd get an error. The reason is SQL's logical execution order.

The order you write the clauses (SELECTFROMWHERE ...) is not the order they're evaluated.

Evaluation roughly goes: FROM / JOIN (assemble tables) → WHERE (filter rows) → GROUP BY (group) → HAVING (filter groups) → SELECT (compute columns — this is where window functions run) → ORDER BY (sort) → LIMIT (truncate).

Since WHERE runs before SELECT, it can't see the rank that SELECT is about to compute.

SQL execution order — windows run at SELECT
1. FROM / JOINassemble tables2. WHEREfilter rows(rank not computed yet)3. GROUP BYgroup4. HAVINGfilter groups5. SELECTcompute columnswindow functions run here6. ORDER BYsort7. LIMITtruncate row count
WHERE runs before SELECT, so you can't reference SELECT-computed values like a rank from WHERE. To filter on a rank, push the window into a subquery / CTE so it runs first.

To filter on a rank, push the query that computes the window function into a subquery (or CTE) and apply WHERE on the outside.

The inner SELECT finishes computing the window function, and the outer query can then treat that result as just another column.

-- This errors: WHERE runs before SELECT, so rk doesn't exist yet
-- SELECT emp_id, RANK() OVER (ORDER BY SUM(amount) DESC) AS rk
-- FROM sales GROUP BY emp_id WHERE rk <= 3;

-- Correct: compute the rank in an inner subquery, filter on the outside
SELECT * FROM (
  SELECT emp_id,
    SUM(amount) AS total,
    RANK() OVER (ORDER BY SUM(amount) DESC) AS rk
  FROM sales
  GROUP BY emp_id
) AS ranked
WHERE rk <= 3
ORDER BY rk, emp_id;

Imagine you want just the top 3 employees by total sales. Since you're filtering on a rank, you need a subquery to work around execution order.

① In the inner subquery, GROUP BY sales by emp_id and select SUM(amount) aliased total, plus RANK() OVER (ORDER BY SUM(amount) DESC) aliased rk. Alias the subquery ranked.

② Filter the outer query with WHERE rk <= 3.

③ The grader checks row order, so end with ORDER BY rk, emp_id (ascending by rk, then by emp_id for ties).

SQL Editor

Run a query to see results
QUIZ

Knowledge Check

Answer each question one by one.

Q1Two employees are tied at 5th place with the same total. What rank does the next person get under RANK() vs. DENSE_RANK()?

Q2What does LAG(amount) OVER (PARTITION BY emp_id ORDER BY sale_date) return on the first sale row of each employee?

Q3Why does SELECT emp_id, RANK() OVER (ORDER BY SUM(amount) DESC) AS rk FROM sales GROUP BY emp_id WHERE rk <= 3; not work?