Learn by reading through in order

Aggregate Functions — COUNT / SUM / AVG / MIN / MAX

Learn the SQL aggregate functions COUNT / SUM / AVG / MIN / MAX hands-on — from whole-table aggregates to combining them with WHERE — using CSV score data, all live in your browser.

The data we'll use — the score table

In this article we'll work with aggregate functions (functions that fold many rows into a single value). How many rows are there, what's the total, the average, the minimum, the maximum — these all compress many rows down to a single number. The five basics are COUNT / SUM / AVG / MIN / MAX, and almost every summary figure in a report can be built from just these.

The dataset is the score table (30 rows = 10 people × the 3 subjects Math / English / Science). Targeting the points in the score column, you'll work through whole-table aggregates first, then aggregates after narrowing the rows with WHERE. Per-group aggregates (like the average per subject) are covered in the next article, so here we focus on folding the whole table — or the whole range narrowed by WHERE — into a single value.

Before diving into the exercises, take a look at the column definitions and a data sample of the score table.

① Run PRAGMA table_info(score); to check the column names, types, and primary key.

② Run SELECT * FROM score LIMIT 5; to preview the first 5 rows of data.

SQL Editor

Run a query to see results

Aggregate functions: "many rows → one value"

Aggregate functions go in the column position of a SELECT.

The thing to watch out for here is not mixing a plain column and an aggregate function in the same `SELECT`. A plain column like name has a different value per row, whereas an aggregate function like COUNT(*) folds the whole table (or the whole set narrowed by `WHERE`) into a single value. If you write both, as in SELECT name, COUNT(*) FROM score;, the aggregate side is a single value for the whole set while name has no defined answer for "which row's value should I show?" In most databases this is an error.

Aggregate functions compress many rows into one value
Input (30 rows)Aggregate fnResult (1 value)score column of scoreCOUNT(*)SUM(score)AVG(score)MIN(score)MAX(score)30233877.935495
Taking the 30 rows of score as input, COUNT returns the row count, SUM the total, AVG the average, and MIN / MAX the minimum and maximum — each as a single value.
-- Get the count, distinct count, total, and average together
SELECT
  COUNT(*)             AS row_count,
  COUNT(DISTINCT name) AS name_kinds,
  SUM(score)           AS total,
  AVG(score)           AS avg_raw
FROM score
WHERE subject = 'English';

The difference between COUNT(*) and COUNT(column)

COUNT(*) counts the row count itself. COUNT(column) counts only the rows where that column is not NULL. The score table has no NULLs, so both come out to the same 30, but for a column that contains NULLs you'll see a difference here (we'll confirm this later using a table with NULLs). On top of that, writing COUNT(DISTINCT column) counts the number of distinct kinds with duplicates removed (use it on subject and you get 3, the number of distinct subjects).

Tip — aggregate functions and NULL

SUM / AVG / MIN / MAX don't include NULL rows in the calculation (they ignore them). In particular, AVG is computed as the sum of the non-NULL values ÷ the count of non-NULL values. So for a column containing NULLs, SUM(column) / COUNT(*) (denominator = all rows) and AVG(column) (denominator = non-NULL count) come out different. When you want to average treating NULL as 0, spell it out explicitly with something like AVG(COALESCE(column, 0)).

Imagine a requirement: "I want to show the overall scale of the score data on a dashboard in a single row." (Run it correctly and the explanation will appear.)

① From the score table, use the five aggregate functions to pull the following 5 columns in a single row.

② Alias the count of all rows as row_count, the sum of the points as total, the average of the points as avg_score, the lowest point as min_score, and the highest point as max_score, in that order.

③ Since the average displays with a long decimal, round it to 2 decimal places and alias it as avg_score.

SQL Editor

Run a query to see results

Narrow the target with WHERE before aggregating

Adding WHERE makes only the rows matching the condition the input to the aggregate functions. The execution order is "FROM reads all rows → WHERE narrows the rows → the remaining rows are folded by the aggregate functions." In other words, the aggregate functions only see the rows that passed WHERE, and the sum or average is computed after the filtering. Add WHERE subject = 'Math' and only the 10 Math rows are the target, so the average becomes the average of Math alone.

The flow of narrowing with WHERE before aggregating
FROM score (30 rows)WHERE subject = 'Math'10 Math rowsAVG(score)76.8 (1 value)narrow
FROM reads 30 rows, WHERE subject = 'Math' narrows to 10 rows, and AVG(score) folds just those 10 rows. The aggregation happens after the filtering.
-- Count / average / top score for just the 10 Science rows
SELECT
  COUNT(*)             AS row_count,
  ROUND(AVG(score), 1) AS avg_science,
  MAX(score)           AS top_science
FROM score
WHERE subject = 'Science';

Imagine a requirement: "I want to slice out just the Math scores and produce the average, max, and min."

① From the score table, target only the rows where subject is Math.

② Pull, in a single row, the count of target rows aliased as math_count, the average score as avg_math, the highest point as max_math, and the lowest point as min_math, in that order.

③ Round the average to 2 decimal places.

SQL Editor

Run a query to see results

COUNT(column) and NULL, and rounding AVG

The score table so far had no NULLs. Let's confirm how aggregation changes depending on whether NULLs are present, using the customer table that has gaps (8 people, with NULLs in age / email / country). COUNT(*) is the row count, so it stays 8, but COUNT(email) counts only the rows where email is filled in, so it's 4. Running COUNT(column) on a column containing NULLs comes out smaller than COUNT(*) like this.

Also, since AVG is division, it produces decimals like 31.5. In a report it's easier to handle if you round with ROUND(expression, digits), as in ROUND(AVG(age), 2).

The difference between COUNT(*) / COUNT(column) / COUNT(DISTINCT column)
How to countValue returned (customer, 8 people)COUNT(*)8 (all rows)COUNT(email)4 (NULL excluded)COUNT(DISTINCT country)4 (distinct count)
For customer (8 people) which has NULLs, COUNT(*) stays 8 while COUNT(email) counts only the 4 rows where email is filled in. For a column containing NULLs, COUNT(column) comes out smaller than COUNT(*).
-- How COUNT and AVG change with NULLs present
SELECT
  COUNT(*)            AS rows_all,
  COUNT(country)      AS with_country,
  ROUND(AVG(age), 2)  AS avg_age
FROM customer;

Imagine a requirement: "The member data has unfilled (NULL) fields. I want to compare the total count against the count where a value is actually present."

① From the customer table, pull, in a single row, the total row count aliased as all_rows, the count where email is filled in as with_email, the count where age is filled in as with_age, and the number of distinct country values as country_kinds, in that order.

② Confirm that running COUNT(column) on a column with NULLs comes out smaller than COUNT(*).

SQL Editor

Run a query to see results

Imagine a requirement: "I want to pull only the scores that beat the overall average and see their count and average." The overall average is 77.93… (you confirmed it in the earlier exercise).

① From the score table, target only the rows where score is 78 or higher.

② Pull, in a single row, the count of target rows aliased as high_count, their average score as avg_high, and the highest point as max_high, in that order.

③ Round the average to 2 decimal places.

SQL Editor

Run a query to see results
QUIZ

Knowledge Check

Answer each question one by one.

Q1When you write just one of the aggregate functions COUNT / SUM / AVG / MIN / MAX in a SELECT and run it, how many rows does the result have, basically?

Q2Which is a correct explanation of the difference between COUNT(*) and COUNT(column)?

Q3When you add WHERE subject = 'Math' and run AVG(score), over what range is the average computed?