Learn by reading through in order

Eliminating Repeated Work and Optimizing Subqueries — Worked Examples

Practice rewriting queries so the same aggregation isn't computed twice, correlated subqueries collapse into a single GROUP BY, and a derived table is materialized once via a CTE — comparing the query plans before and after.

The data we use in this article — perf_sales and employee

In the previous article you looked at rewriting IN / EXISTS and correlated subqueries.

This article builds on that with rewrites that avoid computing the same aggregation twice (folding repeated aggregations into a single intermediate result) and materializing derived tables (building a subquery in a FROM clause once and holding it temporarily) — practiced through concrete worked examples.

Before getting into the exercises, take a look at the column definitions and sample data of the two tables this article uses — perf_sales and employee.

① Use PRAGMA table_info(perf_sales); and PRAGMA table_info(employee); to check the column definitions for both tables.

② Use SELECT * FROM perf_sales LIMIT 5; and SELECT * FROM employee LIMIT 5; to preview the first 5 rows. perf_sales has 50,000 rows, so always attach LIMIT when previewing.

SQL Editor

Run a query to see results

Don't write the same aggregation twice — fold the duplication into one intermediate result

When you want to ask "is each region's sales total above the average total across all regions?", writing the total and the average as two separate subqueries scans perf_sales twice.

If you fold the same aggregation into one CTE and compute it once, then reuse the result, you only scan once.

The rewrite pattern is: spot the case where "the same GROUP BY aggregation appears both in the main query and in a subquery", lift that aggregation into a single CTE, and have the main query reference the CTE instead.

The result is identical, and the duplicate scan disappears from the plan.

-- Before: total and "average across regions" aggregated separately (perf_sales scanned twice)
SELECT region, SUM(amount) AS total
FROM perf_sales
GROUP BY region
HAVING SUM(amount) > (
  SELECT AVG(t) FROM (
    SELECT SUM(amount) AS t FROM perf_sales GROUP BY region
  )
);

-- After: aggregate by region once in a CTE
WITH region_total AS (
  SELECT region, SUM(amount) AS total
  FROM perf_sales
  GROUP BY region
)
SELECT region, total
FROM region_total
WHERE total > (SELECT AVG(total) FROM region_total)
ORDER BY region;
Fold repeated aggregations into one
Same aggregation writtentwice — once in the mainGROUP BY, once in theHAVING subqueryWITH r AS (...GROUP BY region)SELECT ... FROM rWHERE tot > (AVG(tot) FROM r)perf_sales scannedtwiceAggregate oncereuse the CTE
Aggregating the total and the "average across regions" separately scans perf_sales twice. Fold the aggregation into a single CTE and compute the average from that CTE — and you only scan once.

Run the question "which regions have a sales total above the average total across all regions?" in two forms — the duplicate-aggregation version and the CTE version — and compare the plans.

① Prefix the query with EXPLAIN QUERY PLAN and show the plan for the version that writes the GROUP BY region aggregation both in the main query and inside the HAVING subquery.

② Fold the same region aggregation into a single CTE, take the average from that CTE with AVG, and order regions above it with ORDER BY region. Show the plan for this version with EXPLAIN QUERY PLAN too.

③ Confirm that SCAN perf_sales appears multiple times in the before version and drops to once in the after version.

SQL Editor

Run a query to see results

Collapse correlated subqueries into a one-shot aggregation

When a condition like "is each region's sales total above the maximum per-employee total within that region?" is computed as a correlated subquery per outer region row, the inner query is re-evaluated once for every region.

This can also be folded into a one-time aggregation keyed by region, then joined back.

The rewrite pattern is: take a (SELECT aggregate ... WHERE child.key = parent.key) called per outer row and turn it into "build a GROUP BY key aggregate table once, then join back to the parent".

The per-region re-evaluations are replaced with a single aggregation plus a join.

-- Before: correlate per region to find the "per-employee max total"
SELECT region,
  (SELECT MAX(s) FROM (
     SELECT SUM(amount) AS s FROM perf_sales p2
     WHERE p2.region = p1.region GROUP BY emp_id
  )) AS max_emp_total
FROM perf_sales p1
GROUP BY region;

-- After: aggregate once by (region, emp_id), then take the MAX per region
WITH emp_region AS (
  SELECT region, emp_id, SUM(amount) AS s
  FROM perf_sales
  GROUP BY region, emp_id
)
SELECT region, MAX(s) AS max_emp_total
FROM emp_region
GROUP BY region
ORDER BY region;
Per-region correlation → single aggregation + JOIN
Per region:(SELECT MAX(..) WHERE inner.region = outer.region)WITH g AS ( GROUP BY region, emp_id ...)MAX per regionin one shotInner re-evaluatedonce per regionOne aggregation + joincorrelation gone from plan
Re-evaluating the inner query per region row can be folded into building a single aggregate keyed by region and joining back. The per-region re-evaluations disappear.

Run "the maximum per-employee sales total within each region" in two forms — the correlated version and the one-shot aggregation version — and compare the plans.

① Prefix the query with EXPLAIN QUERY PLAN and show the plan for the version that uses a correlated subquery to compute the "per-employee max total inside this region" per outer region row.

② Build a CTE that does GROUP BY on `(region, emp_id)` exactly once, then GROUP BY region over that CTE to take the max. Show the plan for this version with EXPLAIN QUERY PLAN too.

③ Confirm that the word CORRELATED in the correlated plan disappears in the rewritten plan.

SQL Editor

Run a query to see results

Materializing derived tables — build the FROM-clause subquery once

A derived table — a subquery written in a FROM clause — referenced in two places can sometimes be recomputed twice by the optimizer.

If you build it once and hold it temporarily (materialize it), the references don't rebuild it each time.

In SQLite, referencing a CTE multiple times makes EXPLAIN QUERY PLAN show a MATERIALIZE line — confirming the CTE is materialized exactly once.

In the final exercise of this article you'll take a heavy derived table that aggregates perf_sales by emp_id and is referenced twice, turn it into a CTE materialized once, and confirm that MATERIALIZE shows up in the plan and the scan happens only once.

-- Before: the same derived table referenced in two FROM clauses
SELECT a.emp_id, a.tot, b.avg_tot
FROM (SELECT emp_id, SUM(amount) AS tot FROM perf_sales GROUP BY emp_id) a
CROSS JOIN (
  SELECT AVG(tot) AS avg_tot FROM (
    SELECT emp_id, SUM(amount) AS tot FROM perf_sales GROUP BY emp_id
  )
) b
ORDER BY a.emp_id;

-- After: derived table as a CTE referenced twice (materialized once)
WITH emp_total AS (
  SELECT emp_id, SUM(amount) AS tot
  FROM perf_sales
  GROUP BY emp_id
)
SELECT e.emp_id, e.tot, (SELECT AVG(tot) FROM emp_total) AS avg_tot
FROM emp_total e
ORDER BY e.emp_id;
Materialize a derived table once
(SELECT ... GROUP BY emp_id)referenced in twoFROM clausesWITH agg AS ( GROUP BY emp_id)agg referenced twice→ MATERIALIZEDerived tablecomputed twiceMaterialized onceperf_sales scanned once
A heavy derived table referenced in two places may be computed twice. Move it into a CTE referenced multiple times and MATERIALIZE materializes it once — the scan only happens once.

Run "each employee's sales total alongside the overall average across employees" in two forms — the version that references the derived table twice and the CTE version — and compare the plans. This is the final exercise of the article.

① Prefix the query with EXPLAIN QUERY PLAN and show the plan for the version that references the emp_id-aggregated derived table in two places in the FROM clause.

② Fold the same aggregation into a single CTE, take the average from the same CTE via (SELECT AVG(tot) FROM same_CTE), and order by emp_id. Show the plan for this version with EXPLAIN QUERY PLAN too.

③ Confirm that MATERIALIZE shows up in the rewritten plan and SCAN perf_sales drops to once.

SQL Editor

Run a query to see results
QUIZ

Knowledge Check

Answer each question one by one.

Q1What's the best fix for a query that writes the same GROUP BY aggregation twice — once in the main query and once in a HAVING subquery?

Q2Which rewrite collapses a correlation that calls (SELECT aggregate ... WHERE child.key = parent.key) for every outer group row?

Q3What does MATERIALIZE in EXPLAIN QUERY PLAN indicate?