Q1What's the best fix for a query that writes the same GROUP BY aggregation twice — once in the main query and once in a HAVING subquery?
Eliminating Repeated Work and Optimizing Subqueries — Worked Examples
Practice rewriting queries so the same aggregation isn't computed twice, correlated subqueries collapse into a single GROUP BY, and a derived table is materialized once via a CTE — comparing the query plans before and after.
The data we use in this article — perf_sales and employee
In the previous article you looked at rewriting IN / EXISTS and correlated subqueries.
This article builds on that with rewrites that avoid computing the same aggregation twice (folding repeated aggregations into a single intermediate result) and materializing derived tables (building a subquery in a FROM clause once and holding it temporarily) — practiced through concrete worked examples.
Don't write the same aggregation twice — fold the duplication into one intermediate result
When you want to ask "is each region's sales total above the average total across all regions?", writing the total and the average as two separate subqueries scans perf_sales twice.
If you fold the same aggregation into one CTE and compute it once, then reuse the result, you only scan once.
The rewrite pattern is: spot the case where "the same GROUP BY aggregation appears both in the main query and in a subquery", lift that aggregation into a single CTE, and have the main query reference the CTE instead.
The result is identical, and the duplicate scan disappears from the plan.
-- Before: total and "average across regions" aggregated separately (perf_sales scanned twice)
SELECT region, SUM(amount) AS total
FROM perf_sales
GROUP BY region
HAVING SUM(amount) > (
SELECT AVG(t) FROM (
SELECT SUM(amount) AS t FROM perf_sales GROUP BY region
)
);
-- After: aggregate by region once in a CTE
WITH region_total AS (
SELECT region, SUM(amount) AS total
FROM perf_sales
GROUP BY region
)
SELECT region, total
FROM region_total
WHERE total > (SELECT AVG(total) FROM region_total)
ORDER BY region;
Collapse correlated subqueries into a one-shot aggregation
When a condition like "is each region's sales total above the maximum per-employee total within that region?" is computed as a correlated subquery per outer region row, the inner query is re-evaluated once for every region.
This can also be folded into a one-time aggregation keyed by region, then joined back.
The rewrite pattern is: take a (SELECT aggregate ... WHERE child.key = parent.key) called per outer row and turn it into "build a GROUP BY key aggregate table once, then join back to the parent".
The per-region re-evaluations are replaced with a single aggregation plus a join.
-- Before: correlate per region to find the "per-employee max total"
SELECT region,
(SELECT MAX(s) FROM (
SELECT SUM(amount) AS s FROM perf_sales p2
WHERE p2.region = p1.region GROUP BY emp_id
)) AS max_emp_total
FROM perf_sales p1
GROUP BY region;
-- After: aggregate once by (region, emp_id), then take the MAX per region
WITH emp_region AS (
SELECT region, emp_id, SUM(amount) AS s
FROM perf_sales
GROUP BY region, emp_id
)
SELECT region, MAX(s) AS max_emp_total
FROM emp_region
GROUP BY region
ORDER BY region;
Materializing derived tables — build the FROM-clause subquery once
A derived table — a subquery written in a FROM clause — referenced in two places can sometimes be recomputed twice by the optimizer.
If you build it once and hold it temporarily (materialize it), the references don't rebuild it each time.
In SQLite, referencing a CTE multiple times makes EXPLAIN QUERY PLAN show a MATERIALIZE line — confirming the CTE is materialized exactly once.
In the final exercise of this article you'll take a heavy derived table that aggregates perf_sales by emp_id and is referenced twice, turn it into a CTE materialized once, and confirm that MATERIALIZE shows up in the plan and the scan happens only once.
-- Before: the same derived table referenced in two FROM clauses
SELECT a.emp_id, a.tot, b.avg_tot
FROM (SELECT emp_id, SUM(amount) AS tot FROM perf_sales GROUP BY emp_id) a
CROSS JOIN (
SELECT AVG(tot) AS avg_tot FROM (
SELECT emp_id, SUM(amount) AS tot FROM perf_sales GROUP BY emp_id
)
) b
ORDER BY a.emp_id;
-- After: derived table as a CTE referenced twice (materialized once)
WITH emp_total AS (
SELECT emp_id, SUM(amount) AS tot
FROM perf_sales
GROUP BY emp_id
)
SELECT e.emp_id, e.tot, (SELECT AVG(tot) FROM emp_total) AS avg_tot
FROM emp_total e
ORDER BY e.emp_id;
Knowledge Check
Answer each question one by one.
Q2Which rewrite collapses a correlation that calls (SELECT aggregate ... WHERE child.key = parent.key) for every outer group row?
Q3What does MATERIALIZE in EXPLAIN QUERY PLAN indicate?