04 June 2024
Database Normalization
Database normalization is a systematic approach to organizing data in a database to reduce redundancy and improve data integrity. It involves dividing large tables into smaller, related tables and defining relationships between them. The goal is to ensure that the database is efficient and consistent. Here are the key normalization techniques, typically described as a series of "normal forms" (NFs):
First Normal Form (1NF)
- Objective: Eliminate repeating groups in individual tables.
- Rule: Each table cell should contain a single value, and each record needs to be unique.
- Example: Instead of having a table where a single row holds multiple values in a column (e.g., multiple phone numbers), create separate rows for each value.
Second Normal Form (2NF)
- Objective: Remove partial dependencies; ensure that non-key attributes are fully dependent on the primary key.
- Rule: The table must be in 1NF, and all non-key attributes should depend on the entire primary key, not just part of it.
- Example: In a table with a composite primary key, move any attribute that depends on only part of the composite key to a new table.
Third Normal Form (3NF)
- Objective: Remove transitive dependencies; ensure that non-key attributes are not dependent on other non-key attributes.
- Rule: The table must be in 2NF, and all the attributes should be directly dependent on the primary key.
- Example: If an attribute in a table is dependent on another non-key attribute, move it to a separate table.
Boyce-Codd Normal Form (BCNF)
- Objective: Handle certain types of anomalies not covered by 3NF.
- Rule: The table must be in 3NF, and for any dependency A → B, A should be a super key.
- Example: If there are multiple candidate keys, ensure that every determinant is a candidate key.
Fourth Normal Form (4NF)
- Objective: Remove multi-valued dependencies.
- Rule: The table must be in BCNF, and multi-valued dependencies should be eliminated.
- Example: If a table has attributes that independently repeat, they should be separated into different tables.
Fifth Normal Form (5NF)
- Objective: Handle join dependencies and ensure that data cannot be reconstructed from smaller pieces without loss of information.
- Rule: The table must be in 4NF, and any join dependency in the table should be a consequence of the candidate keys.
- Example: Decompose tables in such a way that they can be joined without introducing redundancy.
Domain-Key Normal Form (DKNF)
- Objective: Ensure that all constraints on the data are logical consequences of the definitions of keys and domains.
- Rule: The table must be free of all modification anomalies.
- Example: All constraints should be enforced by key and domain relationships.
Normalization often involves balancing the trade-offs between data redundancy and the complexity of database design and queries. In practice, databases may not be fully normalized (especially beyond 3NF) to improve performance and because certain denormalizations can simplify queries or improve speed in read-heavy applications.
Unnormalized Table
Here's an example of an unnormalized table:
CREATE TABLE StudentCourses (
StudentID INT,
StudentName VARCHAR(100),
CourseID INT,
CourseName VARCHAR(100),
InstructorID INT,
InstructorName VARCHAR(100),
PhoneNumbers VARCHAR(100) -- This could contain multiple phone numbers
);First Normal Form (1NF)
To bring this table to 1NF, we ensure each column contains atomic values:
CREATE TABLE StudentCourses (
StudentID INT,
StudentName VARCHAR(100),
CourseID INT,
CourseName VARCHAR(100),
InstructorID INT,
InstructorName VARCHAR(100),
PhoneNumber VARCHAR(15) -- Each row will have a single phone number
);Second Normal Form (2NF)
To achieve 2NF, we remove partial dependencies. Let's split the table into two tables:
CREATE TABLE Students (
StudentID INT PRIMARY KEY,
StudentName VARCHAR(100)
);
CREATE TABLE Courses (
CourseID INT PRIMARY KEY,
CourseName VARCHAR(100),
InstructorID INT,
InstructorName VARCHAR(100)
);
CREATE TABLE StudentCourses (
StudentID INT,
CourseID INT,
PRIMARY KEY (StudentID, CourseID),
FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);Third Normal Form (3NF)
To achieve 3NF, we need to ensure there are no transitive dependencies. We'll create a separate table for instructors:
CREATE TABLE Instructors (
InstructorID INT PRIMARY KEY,
InstructorName VARCHAR(100)
);
ALTER TABLE Courses
DROP COLUMN InstructorName;
ALTER TABLE Courses
ADD FOREIGN KEY (InstructorID) REFERENCES Instructors(InstructorID);Boyce-Codd Normal Form (BCNF)
BCNF is a stronger version of 3NF. Suppose we have a situation where a course can have multiple instructors:
CREATE TABLE CourseInstructors (
CourseID INT,
InstructorID INT,
PRIMARY KEY (CourseID, InstructorID),
FOREIGN KEY (CourseID) REFERENCES Courses(CourseID),
FOREIGN KEY (InstructorID) REFERENCES Instructors(InstructorID)
);Fourth Normal Form (4NF)
To achieve 4NF, we remove multi-valued dependencies. Assume students can have multiple phone numbers and addresses:
CREATE TABLE StudentPhoneNumbers (
StudentID INT,
PhoneNumber VARCHAR(15),
PRIMARY KEY (StudentID, PhoneNumber),
FOREIGN KEY (StudentID) REFERENCES Students(StudentID)
);
CREATE TABLE StudentAddresses (
StudentID INT,
Address VARCHAR(255),
PRIMARY KEY (StudentID, Address),
FOREIGN KEY (StudentID) REFERENCES Students(StudentID)
);Fifth Normal Form (5NF)
5NF deals with join dependencies. Assume we have a table that links students, courses, and projects:
CREATE TABLE Projects (
ProjectID INT PRIMARY KEY,
ProjectName VARCHAR(100)
);
CREATE TABLE StudentCoursesProjects (
StudentID INT,
CourseID INT,
ProjectID INT,
PRIMARY KEY (StudentID, CourseID, ProjectID),
FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
FOREIGN KEY (CourseID) REFERENCES Courses(CourseID),
FOREIGN KEY (ProjectID) REFERENCES Projects(ProjectID)
);Domain-Key Normal Form (DKNF)
DKNF requires that all constraints are enforced by domain and key definitions. Suppose we have constraints on phone number formats and course codes:
CREATE DOMAIN PhoneNumber AS VARCHAR(15)
CHECK (VALUE ~ '^[0-9]{10}$'); -- Example: Ensures phone numbers are 10 digits
CREATE DOMAIN CourseCode AS VARCHAR(10)
CHECK (VALUE ~ '^[A-Z]{4}[0-9]{4}$'); -- Example: Ensures course codes follow a specific pattern
CREATE TABLE Students (
StudentID INT PRIMARY KEY,
StudentName VARCHAR(100),
PhoneNumber PhoneNumber
);
CREATE TABLE Courses (
CourseID INT PRIMARY KEY,
CourseName VARCHAR(100),
CourseCode CourseCode,
InstructorID INT,
FOREIGN KEY (InstructorID) REFERENCES Instructors(InstructorID)
);These examples illustrate how to progressively normalize a database from an unnormalized state to DKNF, improving data integrity and reducing redundancy at each step.