Learn by reading through in order

GROUP BY and HAVING — Grouped Aggregates

Learn SQL GROUP BY and HAVING: per-group aggregates, filtering groups with HAVING, and how it differs from WHERE — practiced on CSV score data, live in your browser.

The data used in this article — the score table

In the previous article you collapsed an entire table into a single value. This article's `GROUP BY` (grouping — collecting rows that share the same value into one group) runs aggregates per group and returns one aggregate row per group. On top of that, `HAVING` lets you filter the aggregate results, like "only groups whose average is 80 or higher."

The subject is the same score table as last time (30 rows = 10 people × 3 subjects). Grouping by name gives 10 groups, and grouping by subject gives 3 groups.

Before diving into the exercises, take a look at the column definitions and a sample of the data in the score table.

① Use PRAGMA table_info(score); to check the column names, types, and primary key.

② Use SELECT * FROM score LIMIT 5; to preview the first 5 rows of data.

SQL Editor

Run a query to see results

GROUP BY — aggregate per group

When you write GROUP BY column, rows that share the same value in that column are collected into one group, and aggregate functions are computed once per group. With SELECT name, AVG(score) FROM score GROUP BY name;, the 3 rows with the same name (one per subject) form one group, and you get one average row per name. The only "plain columns" you can put in SELECT are the columns named in `GROUP BY`; any other column has to be wrapped in an aggregate function (because its value isn't single within a group).

GROUP BY name
Original rows (30)GROUP BY nameGroup aggregateAlice Math 92Alice English 85Alice Science 78Alice groupAlice / AVG 85.0
The 3 rows with the same name (one per subject) collapse into one group, and AVG(score) is computed per group. With 10 people, the result is 10 rows.
-- Count / average / top score per subject (3 groups by subject)
SELECT
  subject,
  COUNT(*)             AS row_count,
  ROUND(AVG(score), 1) AS avg_score,
  MAX(score)           AS top_score
FROM score
GROUP BY subject;

Imagine the requirement "I want a list of each student's average score." (Run it correctly and the explanation will appear.)

① Group the score table by name.

② For each group, pull the name as name, the number of subjects taken as subjects, and the average score as avg_score, in that order.

③ Round the average to 2 decimal places. The order is not specified.

SQL Editor

Run a query to see results

HAVING — filter the aggregate results

After a group aggregate, you sometimes want to filter on the aggregate value, like "only show groups whose average is 80 or higher." You can't do this with WHERE. WHERE is a clause that judges individual rows before grouping, so a group aggregate like AVG(score) hasn't been computed yet. That's where `HAVING` comes in. HAVING is written after GROUP BY and judges its condition against the group aggregate results. Writing HAVING AVG(score) >= 80 keeps only the groups whose average is 80 or higher.

After GROUP BY, filter with HAVING
score 30 rowsGROUP BY name(10 groups)HAVINGAVG(score) >= 806 groups remainJudged after aggregation
GROUP BY aggregates into 10 groups, then HAVING AVG(score) >= 80 judges the already-aggregated groups and keeps only those that satisfy the condition.
-- Only names whose average is above 85 (HAVING judges the aggregate value)
SELECT
  name,
  ROUND(AVG(score), 2) AS avg_score
FROM score
GROUP BY name
HAVING AVG(score) > 85;

Imagine the requirement "I want to list only the top performers with an average score of 80 or higher."

① Group the score table by name.

② For each group, pull the name as name and the average score as avg_score (rounded to 2 decimal places).

③ Keep only the groups whose average score is 80 or higher. The order is not specified.

SQL Editor

Run a query to see results

WHERE vs. HAVING — filter rows, or filter groups

WHERE and HAVING are both filters, but they work at different times. WHERE judges individual rows and drops them before grouping. HAVING judges per group and drops them after grouping and aggregating. The execution order is FROMWHEREGROUP BYHAVINGSELECTORDER BY.

You can also use both at once. In that case the flow is "narrow the target rows with WHERE, then group, then filter the aggregate results with HAVING." For example, to get "people whose per-name average is 80 or higher, considering only Math and English," you narrow rows with WHERE subject IN ('Math','English') and narrow groups with HAVING AVG(score) >= 80.

WHERE filters rows, HAVING filters groups
Original rows (30)Rows that failthe condition drop hereWHEREjudges rowson plain columnsGROUP BY namegroups the rowsHAVINGjudges groupson aggregate valuesGroups that failthe condition drop hereRemaining groupsare the resultAll rows flow inDropped rowsOnly remaining rowsAfter aggregatingDropped groupsRemaining groups
WHERE judges rows on plain columns and drops them before grouping, while HAVING judges groups on aggregate values (AVG, etc.) and drops them after grouping. The key point is where things get dropped. Example: WHERE subject='Math' (filters rows) and HAVING AVG(score)>=80 (filters groups).
-- Use WHERE and HAVING together
-- Exclude Science, keep only names whose average is 80 or higher
SELECT
  name,
  ROUND(AVG(score), 2) AS avg_two
FROM score
WHERE subject <> 'Science'
GROUP BY name
HAVING AVG(score) >= 80;

Imagine the requirement "considering only the two subjects Math and English, I want people whose per-name average is 85 or higher."

① From the score table, keep only the rows where subject is Math or English (Science is out of scope).

② Group the remaining rows by name and pull the name as name and the two-subject average score as avg_two (rounded to 2 decimal places).

③ Then keep only the groups whose average is 85 or higher. The order is not specified.

SQL Editor

Run a query to see results

Imagine the requirement "I want to display the per-subject averages as a ranking, highest first."

① Group the score table by subject.

② For each group, pull the subject as subject, the number of students who took it as students, and the average score as avg_score (rounded to 2 decimal places), in that order.

③ Order them from the highest average down.

SQL Editor

Run a query to see results
QUIZ

Knowledge Check

Answer each question one by one.

Q1When you specify GROUP BY name, how many rows does the result have (score table / 10 distinct names)?

Q2When you want to extract "only people whose per-name average score is 80 or higher," which clause does the 80-or-higher condition go in?

Q3Which statement correctly describes the difference between WHERE and HAVING?