Database normalization is like organizing your closet. Instead of throwing everything in one messy pile, you organize items by type, remove duplicates, and create a system that makes things easy to find and maintain.
Database Normalization is the process of organizing data in a database to:
Think of it as breaking down a large, complex table into smaller, more manageable tables connected by relationships.
Let’s start with a BADLY DESIGNED table for a school database:
| StudentID | StudentName | StudentEmail | StudentPhone | StudentAddress | CourseID | CourseName | InstructorName | InstructorEmail | InstructorOffice | Grade | Semester | Books |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | John Smith | john@email.com | 123-4567 | 123 Main St, NY | 101, 102 | Math, Physics | Dr. Adams, Dr. Brown | adams@uni.edu, brown@uni.edu | Room 201, Room 305 | A, B+ | Spring 2024 | Calculus Vol 1, Advanced Calculus; Physics Fundamentals |
| 2 | Sarah Johnson | sarah@email.com | 234-5678 | 456 Oak Ave, CA | 101 | Math | Dr. Adams | adams@uni.edu | Room 201 | A- | Spring 2024 | Calculus Vol 1, Advanced Calculus |
| 3 | Mike Davis | mike@email.com | 345-6789 | 789 Pine Rd, TX | 103 | Chemistry | Dr. Carter | carter@uni.edu | Room 410 | B | Spring 2024 | Chemistry Basics |
| 1 | John Smith | john@email.com | 123-4567 | 123 Main St, NY | 103 | Chemistry | Dr. Carter | carter@uni.edu | Room 410 | B+ | Fall 2024 | Chemistry Basics |
Now let’s fix this table step by step through different normal forms!
STUDENT_ENROLLMENTS Table (1NF)
| StudentID | StudentName | StudentEmail | StudentPhone | StudentAddress | CourseID | CourseName | InstructorName | InstructorEmail | InstructorOffice | Grade | Semester | BookTitle |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | John Smith | john@email.com | 123-4567 | 123 Main St, NY | 101 | Math | Dr. Adams | adams@uni.edu | Room 201 | A | Spring 2024 | Calculus Vol 1 |
| 1 | John Smith | john@email.com | 123-4567 | 123 Main St, NY | 101 | Math | Dr. Adams | adams@uni.edu | Room 201 | A | Spring 2024 | Advanced Calculus |
| 1 | John Smith | john@email.com | 123-4567 | 123 Main St, NY | 102 | Physics | Dr. Brown | brown@uni.edu | Room 305 | B+ | Spring 2024 | Physics Fundamentals |
| 1 | John Smith | john@email.com | 123-4567 | 123 Main St, NY | 103 | Chemistry | Dr. Carter | carter@uni.edu | Room 410 | B+ | Fall 2024 | Chemistry Basics |
| 2 | Sarah Johnson | sarah@email.com | 234-5678 | 456 Oak Ave, CA | 101 | Math | Dr. Adams | adams@uni.edu | Room 201 | A- | Spring 2024 | Calculus Vol 1 |
| 2 | Sarah Johnson | sarah@email.com | 234-5678 | 456 Oak Ave, CA | 101 | Math | Dr. Adams | adams@uni.edu | Room 201 | A- | Spring 2024 | Advanced Calculus |
| 3 | Mike Davis | mike@email.com | 345-6789 | 789 Pine Rd, TX | 103 | Chemistry | Dr. Carter | carter@uni.edu | Room 410 | B | Spring 2024 | Chemistry Basics |
Key Points:
“ONE cell, ONE value” - If you can split it, it’s not 1NF!
A partial dependency occurs when a non-key column depends on only PART of a composite primary key, not the whole key.
Example of Partial Dependency:
STUDENTS Table | StudentID (PK) | StudentName | StudentEmail | StudentPhone | StudentAddress | |—————-|————-|————–|————–|—————-| | 1 | John Smith | john@email.com | 123-4567 | 123 Main St, NY | | 2 | Sarah Johnson | sarah@email.com | 234-5678 | 456 Oak Ave, CA | | 3 | Mike Davis | mike@email.com | 345-6789 | 789 Pine Rd, TX |
Explanation: Student details depend ONLY on StudentID
COURSES Table | CourseID (PK) | CourseName | InstructorName | InstructorEmail | InstructorOffice | |—————|————|—————-|—————–|——————| | 101 | Math | Dr. Adams | adams@uni.edu | Room 201 | | 102 | Physics | Dr. Brown | brown@uni.edu | Room 305 | | 103 | Chemistry | Dr. Carter | carter@uni.edu | Room 410 |
Explanation: Course details depend ONLY on CourseID
ENROLLMENTS Table | StudentID (PK, FK) | CourseID (PK, FK) | Semester (PK) | Grade | |——————–|——————-|—————|——-| | 1 | 101 | Spring 2024 | A | | 1 | 102 | Spring 2024 | B+ | | 1 | 103 | Fall 2024 | B+ | | 2 | 101 | Spring 2024 | A- | | 3 | 103 | Spring 2024 | B |
Explanation: Grade depends on the combination of (StudentID, CourseID, Semester)
COURSE_BOOKS Table | CourseID (PK, FK) | BookTitle (PK) | |——————-|—————-| | 101 | Calculus Vol 1 | | 101 | Advanced Calculus | | 102 | Physics Fundamentals | | 103 | Chemistry Basics |
Explanation: Books are associated with courses
“The WHOLE key” - Every non-key column must depend on the WHOLE primary key, not just part of it!
When a non-key column depends on another non-key column, which depends on the primary key.
Example:
STUDENTS Table (No Change) | StudentID (PK) | StudentName | StudentEmail | StudentPhone | StudentAddress | |—————-|————-|————–|————–|—————-| | 1 | John Smith | john@email.com | 123-4567 | 123 Main St, NY | | 2 | Sarah Johnson | sarah@email.com | 234-5678 | 456 Oak Ave, CA | | 3 | Mike Davis | mike@email.com | 345-6789 | 789 Pine Rd, TX |
INSTRUCTORS Table (NEW - Extracted from COURSES) | InstructorID (PK) | InstructorName | InstructorEmail | InstructorOffice | |——————-|—————-|—————–|——————| | 1 | Dr. Adams | adams@uni.edu | Room 201 | | 2 | Dr. Brown | brown@uni.edu | Room 305 | | 3 | Dr. Carter | carter@uni.edu | Room 410 |
Explanation: Instructor details depend only on InstructorID, not on Course
COURSES Table (Modified) | CourseID (PK) | CourseName | InstructorID (FK) | |—————|————|——————-| | 101 | Math | 1 | | 102 | Physics | 2 | | 103 | Chemistry | 3 |
Explanation: Now only stores course name and reference to instructor
ENROLLMENTS Table (No Change) | StudentID (PK, FK) | CourseID (PK, FK) | Semester (PK) | Grade | |——————–|——————-|—————|——-| | 1 | 101 | Spring 2024 | A | | 1 | 102 | Spring 2024 | B+ | | 1 | 103 | Fall 2024 | B+ | | 2 | 101 | Spring 2024 | A- | | 3 | 103 | Spring 2024 | B |
COURSE_BOOKS Table (No Change) | CourseID (PK, FK) | BookTitle (PK) | |——————-|—————-| | 101 | Calculus Vol 1 | | 101 | Advanced Calculus | | 102 | Physics Fundamentals | | 103 | Chemistry Basics |
“Nothing but the key” - Non-key columns should depend ONLY on the primary key, not on other non-key columns!
“The key, the WHOLE key, and NOTHING BUT the key, so help me Codd!” (Edgar F. Codd invented normalization)
A determinant is any column (or set of columns) that determines the value of another column.
If A → B (A determines B), then A is a determinant.
COURSE_SCHEDULES Table | CourseID | TimeslotID | Room | Instructor | |———-|————|——|————| | 101 | TS1 | A101 | Dr. Adams | | 102 | TS1 | A102 | Dr. Brown | | 101 | TS2 | A101 | Dr. Adams |
Problem:
INSTRUCTOR_ROOMS Table | Instructor (PK) | Room | |—————–|——| | Dr. Adams | A101 | | Dr. Brown | A102 |
COURSE_SCHEDULES Table | CourseID (PK) | TimeslotID (PK) | Instructor (FK) | |—————|—————–|—————–| | 101 | TS1 | Dr. Adams | | 102 | TS1 | Dr. Brown | | 101 | TS2 | Dr. Adams |
“Only keys determine” - If something determines another column, it must be a candidate key!
When one column determines multiple independent sets of values.
INSTRUCTOR_SKILLS Table | InstructorID | Subject | Language | |————–|———|———-| | 1 | Math | English | | 1 | Math | Spanish | | 1 | Physics | English | | 1 | Physics | Spanish |
Problem:
Result:
INSTRUCTOR_SUBJECTS Table | InstructorID (PK, FK) | Subject (PK) | |———————–|————–| | 1 | Math | | 1 | Physics |
INSTRUCTOR_LANGUAGES Table | InstructorID (PK, FK) | Language (PK) | |———————–|—————| | 1 | English | | 1 | Spanish |
“Separate the independent” - If two facts are independent, put them in separate tables!
When a table can be broken into smaller tables and reconstructed without losing information, it should be.
AGENT_COMPANY_PRODUCT Table | Agent | Company | Product | |——-|———|———| | Smith | Acme | Widget | | Smith | Acme | Gadget | | Smith | Beta | Widget | | Jones | Acme | Widget |
Complex Rule: An agent can sell a product for a company ONLY IF:
AGENT_COMPANY Table | Agent (PK) | Company (PK) | |————|————–| | Smith | Acme | | Smith | Beta | | Jones | Acme |
COMPANY_PRODUCT Table | Company (PK) | Product (PK) | |————–|————–| | Acme | Widget | | Acme | Gadget | | Beta | Widget |
AGENT_PRODUCT Table | Agent (PK) | Product (PK) | |————|————–| | Smith | Widget | | Smith | Gadget | | Jones | Widget |
“Break down to the most atomic relationships” - If you can split without losing info, do it!
Note: 5NF is rarely used in practice. Most databases stop at 3NF or BCNF.
| Normal Form | Key Question | Quick Check |
|---|---|---|
| 1NF | Any multi-valued cells? | Each cell = ONE value only |
| 2NF | Partial dependencies? | Non-key columns depend on WHOLE key |
| 3NF | Transitive dependencies? | Non-key columns depend ONLY on key |
| BCNF | Non-key determinants? | Only candidate keys determine values |
| 4NF | Multi-valued dependencies? | Separate independent multi-valued facts |
| 5NF | Join dependencies? | Decompose to atomic relationships |
“The key, the WHOLE key, and NOTHING BUT the key, so help me Codd!”
CUSTOMERS Table: ORDERS Table:
- CustomerID (PK) - OrderID (PK)
- CustomerName - CustomerID (FK)
- Email - OrderDate
- TotalAmount
STUDENTS Table: ENROLLMENTS Table: COURSES Table:
- StudentID (PK) - StudentID (PK, FK) - CourseID (PK)
- StudentName - CourseID (PK, FK) - CourseName
- Email - EnrollmentDate - Credits
- Grade
EMPLOYEES Table:
- EmployeeID (PK)
- EmployeeName
- ManagerID (FK) → References EmployeeID
❌ Repeating columns (Phone1, Phone2, Phone3) → Create related table ❌ Calculated/Derived columns (TotalPrice = Quantity × UnitPrice) → Calculate in queries ❌ Nullable columns everywhere → Might need separate optional tables ❌ Very wide tables (50+ columns) → Probably needs splitting ❌ Copy-paste column names → Look for hidden entities
Here’s our complete normalized database for the school system:
-- STUDENTS Table
CREATE TABLE Students (
StudentID INT PRIMARY KEY AUTO_INCREMENT,
StudentName VARCHAR(100) NOT NULL,
StudentEmail VARCHAR(100) UNIQUE NOT NULL,
StudentPhone VARCHAR(20),
StudentAddress VARCHAR(200)
);
-- INSTRUCTORS Table
CREATE TABLE Instructors (
InstructorID INT PRIMARY KEY AUTO_INCREMENT,
InstructorName VARCHAR(100) NOT NULL,
InstructorEmail VARCHAR(100) UNIQUE NOT NULL,
InstructorOffice VARCHAR(50)
);
-- COURSES Table
CREATE TABLE Courses (
CourseID INT PRIMARY KEY,
CourseName VARCHAR(100) NOT NULL,
Credits INT,
InstructorID INT,
FOREIGN KEY (InstructorID) REFERENCES Instructors(InstructorID)
);
-- ENROLLMENTS Table
CREATE TABLE Enrollments (
EnrollmentID INT PRIMARY KEY AUTO_INCREMENT,
StudentID INT NOT NULL,
CourseID INT NOT NULL,
Semester VARCHAR(20) NOT NULL,
Grade VARCHAR(5),
EnrollmentDate DATE,
FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
FOREIGN KEY (CourseID) REFERENCES Courses(CourseID),
UNIQUE (StudentID, CourseID, Semester)
);
-- COURSE_BOOKS Table
CREATE TABLE CourseBooks (
CourseID INT NOT NULL,
BookTitle VARCHAR(200) NOT NULL,
ISBN VARCHAR(20),
PRIMARY KEY (CourseID, BookTitle),
FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);
Remember:
Practice Exercise: Try normalizing this table yourself:
| OrderID | CustomerName | CustomerEmail | Items | Quantities | Prices | TotalAmount |
|---|---|---|---|---|---|---|
| 1 | Alice | alice@email.com | Laptop, Mouse | 1, 2 | 1000, 25 | 1050 |
Hint: You’ll need at least 4 tables: Customers, Orders, Products, OrderItems
Happy database designing! 🗄️