Q1A customer's phone number is duplicated across multiple order rows, so changing the phone number requires updating every row without missing any — otherwise the values disagree. What is this problem called?
Table Design and Normalization (3-Tier Schema)
Use orders_flat with item1/item2 repeating columns to feel the update, insert, and delete anomalies firsthand, then split the table step by step through 1NF / 2NF / 3NF into the customer / order_record / order_line 3-tier schema.
The dataset for this article — orders_flat (an unnormalized example)
Normalization (a design technique that splits tables to reduce duplicated and inconsistent data) is the idea of breaking up a single giant table that holds everything into multiple tables that are easier to keep consistent.
In this article you'll see the problems that occur in an unnormalized table with repeating columns, then build up an understanding of 1NF / 2NF / 3NF and the 3-tier schema through hands-on design exercises.
The dataset is orders_flat (2 rows), which crams order data into a single table.
It's a classic unnormalized table where customer name, phone number, product 1, and product 2 are all lined up in one row — and you'll normalize it into the three tables customer / order_record / order_line.
Three anomalies in an unnormalized table
A design like orders_flat that crams everything into a single table easily produces contradictions when you add, change, or delete data.
The classic examples are the three: update anomaly / insert anomaly / delete anomaly.
An update anomaly happens when the same fact is duplicated across many rows — for example, if you want to change a phone number, you have to update every row without missing any.
An insert anomaly is when you can't register a customer who hasn't placed an order yet, because there's no order row to attach the customer information to.
A delete anomaly is when deleting a single order also wipes out information you wanted to keep, like the customer's phone number.
These happen because the design doesn't hold "one fact in one place".
-- Delete anomaly example: deleting order 2 also wipes Bob's contact info
-- (observe the behavior on a throwaway table flat_demo so the real orders_flat is untouched)
DROP TABLE IF EXISTS flat_demo;
CREATE TABLE flat_demo(
order_id INTEGER, customer TEXT, customer_tel TEXT, product TEXT
);
INSERT INTO flat_demo VALUES
(1,'Alice','03-1111','Pen'),
(2,'Bob','03-2222','Clip');
-- Bob has only one order. Deleting it also loses Bob's phone number 03-2222
DELETE FROM flat_demo WHERE order_id = 2;
SELECT * FROM flat_demo;
-- -> Bob's contact info is now gone from the database (delete anomaly)
1NF through 3NF — split step by step
Tables are organized through normalization in stages.
1NF — get rid of repeating columns
Drop numbered repeating columns like item1 / item2.
Before — item1 / item2 repeating columns
| order_id | customer | tel | item1 | item1_qty | item2 | item2_qty |
|---|---|---|---|---|---|---|
| 1 | Alice | 03-1111 | Pen | 2 | Note | 1 |
| 2 | Bob | 03-2222 | Clip | 5 | (empty) | NULL |
After — one row per product, with the composite primary key `(order_id, product)`
| order_id (PK) | customer | tel | product (PK) | qty |
|---|---|---|---|---|
| 1 | Alice | 03-1111 | Pen | 2 |
| 1 | Alice | 03-1111 | Note | 1 |
| 2 | Bob | 03-2222 | Clip | 5 |
2NF — move columns that depend on only part of the primary key into another table
The table from the previous step has the composite primary key (order_id, product). But look closely: customer and tel are determined by `order_id` alone (the same order_id always has the same customer).
Moving these columns that depend on only part of the primary key (partial dependency) into a separate table is 2NF. You split it into an order header order_header and order details order_line.
Before — customer and tel duplicated on every row
| order_id (PK) | customer | tel | product (PK) | qty |
|---|---|---|---|---|
| 1 | Alice | 03-1111 | Pen | 2 |
| 1 | Alice | 03-1111 | Note | 1 |
| 2 | Bob | 03-2222 | Clip | 5 |
After — split into 2 tables
order_header (one row per order)
| order_id (PK) | customer | tel |
|---|---|---|
| 1 | Alice | 03-1111 |
| 2 | Bob | 03-2222 |
order_line (composite primary key (order_id, product), where order_id is a foreign key into order_header)
| order_id (PK,FK) | product (PK) | qty |
|---|---|---|
| 1 | Pen | 2 |
| 1 | Note | 1 |
| 2 | Clip | 5 |
3NF — move columns determined by non-key columns into yet another table
Looking at the order_header from the previous step, tel is not determined by order_id but by `customer` (the same Alice always has 03-1111).
Eliminating dependencies that go through another non-key column instead of straight from the primary key (transitive dependency: order_id -> customer -> tel) is what 3NF is about.
You extract a customer master customer and leave order_record holding only the foreign key to the customer.
Before — tel depends on customer
| order_id (PK) | customer | tel |
|---|---|---|
| 1 | Alice | 03-1111 |
| 2 | Bob | 03-2222 |
After — split into 3 tables
customer (customer master)
| customer_id (PK) | customer_name | customer_tel |
|---|---|---|
| 1 | Alice | 03-1111 |
| 2 | Bob | 03-2222 |
order_record (order header: references customer via foreign key)
| order_id (PK) | customer_id (FK) |
|---|---|
| 1 | 1 |
| 2 | 2 |
order_line (order details: same as the previous step)
| order_id (PK,FK) | product (PK) | qty |
|---|---|---|
| 1 | Pen | 2 |
| 1 | Note | 1 |
| 2 | Clip | 5 |
Once split this far, each fact (a customer's phone number, the link between order and customer, the quantity of a product in an order) lives in exactly one place. That eliminates the update, insert, and delete anomalies.
The 3-tier schema and the final form — customer / order_record / order_line
The normalized schema can be organized as three tiers with different roles.
- Master tier: entities that change infrequently and are referenced by others, like
customer - Transaction tier: tables that record business events, like
order_record - Detail tier: tables that hold the breakdown rows of one event, like
order_line
By pointing the referencing side at the master with a foreign key, each fact (such as a customer's phone number) ends up in one place.
The final form is these three tables:
customer(customer_id primary key, customer_name, customer_tel), order_record(order_id primary key, customer_id -> customer), order_line(order_id -> order_record, product, qty, composite primary key (order_id, product)).
The phone number lives in just one row of customer, so an update is a single row; a customer with no orders can still be added to customer (insert anomaly resolved); and deleting an order leaves the customer in customer (delete anomaly resolved).
-- Benefit of normalization (read-only): a customer's phone number is in just one row of customer
-- In the orders_flat shape you'd have to fix every Alice order row = update anomaly
UPDATE customer SET customer_tel = '03-9999'
WHERE customer_name = 'Alice';
-- Fix one spot, and the new number flows into every order on JOIN
SELECT o.order_id, c.customer_name, c.customer_tel
FROM order_record o
JOIN customer c ON o.customer_id = c.customer_id
ORDER BY o.order_id;
Tips — Always put a primary key on every table
Whether or not you've normalized, design every table with a primary key (`PRIMARY KEY`). A primary key gives you three things:
- Unique row identification: no two rows can have the same key, so UPDATE / DELETE reliably targets exactly one row via WHERE pk = ...
- Referenceable from other tables: a foreign key needs a primary key (or UNIQUE) on the other side. It's the foundation for linking the tables you split via normalization
- Automatic indexing: primary key columns get a B-Tree index automatically, speeding up equality lookups and JOINs
If no single column is unique on its own, use a composite primary key like PRIMARY KEY (order_id, product). Skipping it with "I'll add it later" makes duplicate rows indistinguishable and gives up all three benefits above.
Knowledge Check
Answer each question one by one.
Q2Which operation gets a table into 1NF?
Q3After normalization, orders_flat is split into the three tables customer / order_record / order_line. How do you get the original combined list back?