Overview
Database normalization structures relational data to eliminate redundancy and prevent update anomalies. The process decomposes tables with anomalies into smaller, well-structured tables that preserve data integrity and reduce storage overhead. E.F. Codd introduced the concept in 1970 as part of the relational model, establishing a theoretical foundation for database design.
The normalization process evaluates functional dependencies between attributes and progressively refines table structures through a series of normal forms. Each normal form addresses specific types of redundancy and anomalies. Tables advance from unnormalized structures through First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), and beyond to more restrictive forms like Boyce-Codd Normal Form (BCNF) and Fourth Normal Form (4NF).
Normalization applies during database schema design and when refactoring existing databases. The process identifies functional dependencies, decomposes tables based on these dependencies, and establishes relationships between the resulting tables. While normalization improves data integrity and reduces redundancy, it introduces complexity through table joins and may impact query performance.
Consider a denormalized customer order table:
| OrderID | CustomerName | CustomerEmail | Product | Price | Quantity |
|---------|--------------|---------------|---------|-------|----------|
| 1 | Alice Smith | alice@ex.com | Laptop | 1200 | 1 |
| 2 | Alice Smith | alice@ex.com | Mouse | 25 | 2 |
| 3 | Bob Jones | bob@ex.com | Laptop | 1200 | 1 |
After normalization into three tables:
Customers: | CustomerID | Name | Email |
Orders: | OrderID | CustomerID | ProductID | Quantity |
Products: | ProductID | Name | Price |
This separation eliminates redundant customer and product data while maintaining all original information through foreign key relationships.
Key Principles
Normalization builds on functional dependencies, which define relationships where one attribute determines another. If attribute A determines attribute B (written A → B), then each value of A associates with exactly one value of B. In the customer example, CustomerID → CustomerName represents a functional dependency because each customer ID maps to exactly one name. Functional dependencies drive the decomposition process and determine which attributes belong together in tables.
Update anomalies occur in denormalized tables where redundant data creates inconsistencies during modifications. An insertion anomaly prevents adding certain data without unrelated information—storing a product requires an order. A deletion anomaly loses unintended data—removing the last order for a customer erases all customer information. An update anomaly requires changing data in multiple locations—updating a customer's email necessitates modifying every order row for that customer. Normalization eliminates these anomalies by removing redundancy.
Candidate keys uniquely identify rows in a table. A table may have multiple candidate keys; one becomes the primary key. An attribute belongs to a candidate key if it forms part of any candidate key in the table. Non-key attributes depend on candidate keys but don't participate in them. The distinction between key and non-key attributes determines how dependencies violate normal forms.
Partial dependencies exist when a non-key attribute depends on part of a composite key rather than the entire key. In a table with composite key (OrderID, ProductID), if CustomerName depends only on OrderID, a partial dependency exists. Second Normal Form eliminates partial dependencies.
Transitive dependencies occur when a non-key attribute depends on another non-key attribute, which depends on the key. If CustomerID → PostalCode and PostalCode → City, then CustomerID → City represents a transitive dependency. Third Normal Form removes transitive dependencies.
Normal forms create a hierarchy where each form satisfies the requirements of previous forms:
- First Normal Form (1NF): Eliminates repeating groups and ensures atomic values
- Second Normal Form (2NF): Satisfies 1NF and eliminates partial dependencies
- Third Normal Form (3NF): Satisfies 2NF and eliminates transitive dependencies
- Boyce-Codd Normal Form (BCNF): Satisfies 3NF with stricter dependency rules
- Fourth Normal Form (4NF): Satisfies BCNF and eliminates multi-valued dependencies
- Fifth Normal Form (5NF): Satisfies 4NF and eliminates join dependencies
Most database designs target 3NF as it balances normalization benefits with practical implementation concerns. BCNF and higher forms apply to specific scenarios with complex dependencies.
Implementation Approaches
First Normal Form (1NF) requires atomic values in all columns and eliminates repeating groups. A table violates 1NF if it contains arrays, lists, or comma-separated values in columns, or if it has repeating column groups like Phone1, Phone2, Phone3. Converting to 1NF splits multi-valued attributes into separate rows or tables.
A denormalized customer table with multiple phone numbers:
| CustomerID | Name | Phones |
|------------|-------------|----------------------|
| 1 | Alice Smith | 555-0100, 555-0101 |
| 2 | Bob Jones | 555-0200 |
Converting to 1NF creates separate rows for each phone:
| CustomerID | Name | Phone |
|------------|-------------|----------|
| 1 | Alice Smith | 555-0100 |
| 1 | Alice Smith | 555-0101 |
| 2 | Bob Jones | 555-0200 |
Alternatively, create a separate phone numbers table:
Customers: | CustomerID | Name |
PhoneNumbers: | CustomerID | Phone |
Second Normal Form (2NF) requires 1NF and eliminates partial dependencies. If the table has a composite primary key, every non-key attribute must depend on the entire key, not just part of it. Tables with single-attribute primary keys automatically satisfy 2NF.
An order details table with partial dependency:
| OrderID | ProductID | Quantity | ProductName | ProductPrice |
|---------|-----------|----------|-------------|--------------|
| 1 | 101 | 2 | Laptop | 1200 |
| 1 | 102 | 1 | Mouse | 25 |
Primary key: (OrderID, ProductID)
ProductName and ProductPrice depend only on ProductID, not the full key.
Converting to 2NF separates product information:
OrderDetails: | OrderID | ProductID | Quantity |
Products: | ProductID | ProductName | ProductPrice |
Third Normal Form (3NF) requires 2NF and eliminates transitive dependencies. Non-key attributes must depend directly on the primary key, not on other non-key attributes.
An orders table with transitive dependency:
| OrderID | CustomerID | CustomerName | CustomerCity | CustomerCountry |
|---------|------------|--------------|--------------|-----------------|
| 1 | 100 | Alice Smith | Seattle | USA |
| 2 | 100 | Alice Smith | Seattle | USA |
CustomerName, CustomerCity, and CustomerCountry depend on CustomerID, not directly on OrderID.
Converting to 3NF separates customer data:
Orders: | OrderID | CustomerID |
Customers: | CustomerID | CustomerName | CustomerCity | CustomerCountry |
Boyce-Codd Normal Form (BCNF) strengthens 3NF by requiring that every determinant is a candidate key. A determinant is an attribute or set of attributes on the left side of a functional dependency. In 3NF, non-key attributes cannot determine other non-key attributes. BCNF extends this—no attribute can determine another unless it's a candidate key.
A table violating BCNF but satisfying 3NF:
| Student | Course | Instructor |
|---------|--------|------------|
| Alice | DB101 | Dr. Smith |
| Bob | DB101 | Dr. Smith |
| Alice | OS201 | Dr. Jones |
Dependencies: (Student, Course) → Instructor and Instructor → Course
The second dependency violates BCNF because Instructor determines Course but isn't a candidate key.
Converting to BCNF:
StudentInstructor: | Student | Instructor |
InstructorCourse: | Instructor | Course |
Denormalization intentionally introduces redundancy to improve query performance. While normalized databases minimize redundancy, joins across many tables can degrade performance for read-heavy workloads. Denormalization strategies include duplicating frequently accessed data, maintaining computed aggregates, or combining tables that are always queried together.
Common denormalization patterns:
- Storing calculated values instead of computing on each query
- Duplicating frequently joined data into a single table
- Maintaining materialized views with pre-joined data
- Adding redundant foreign key references to skip intermediate tables
Ruby Implementation
Ruby applications interact with normalized databases primarily through Object-Relational Mapping (ORM) frameworks like ActiveRecord. These frameworks map normalized database tables to Ruby classes and provide abstractions for managing relationships between tables.
ActiveRecord associations define relationships between normalized tables. The framework provides methods to declare one-to-many, many-to-many, and one-to-one relationships.
class Customer < ApplicationRecord
has_many :orders
has_many :phone_numbers
end
class Order < ApplicationRecord
belongs_to :customer
has_many :order_items
has_many :products, through: :order_items
end
class OrderItem < ApplicationRecord
belongs_to :order
belongs_to :product
end
class Product < ApplicationRecord
has_many :order_items
has_many :orders, through: :order_items
end
class PhoneNumber < ApplicationRecord
belongs_to :customer
end
These associations enable navigation between related records:
customer = Customer.find(1)
customer.orders.each do |order|
order.order_items.each do |item|
puts "#{item.quantity}x #{item.product.name}"
end
end
Database migrations create and modify normalized table structures. Migrations define schema changes that the framework applies to the database.
Creating a normalized schema for customers and orders:
class CreateNormalizedSchema < ActiveRecord::Migration[7.0]
def change
create_table :customers do |t|
t.string :name, null: false
t.string :email, null: false
t.timestamps
t.index :email, unique: true
end
create_table :phone_numbers do |t|
t.references :customer, null: false, foreign_key: true
t.string :number, null: false
t.timestamps
end
create_table :products do |t|
t.string :name, null: false
t.decimal :price, precision: 10, scale: 2, null: false
t.timestamps
end
create_table :orders do |t|
t.references :customer, null: false, foreign_key: true
t.timestamps
end
create_table :order_items do |t|
t.references :order, null: false, foreign_key: true
t.references :product, null: false, foreign_key: true
t.integer :quantity, null: false
t.timestamps
end
end
end
Querying normalized data requires joining tables to retrieve related information. ActiveRecord provides methods to optimize these queries.
# Inefficient: N+1 queries
orders = Order.all
orders.each do |order|
puts order.customer.name # Separate query for each customer
end
# Efficient: Eager loading with includes
orders = Order.includes(:customer, order_items: :product)
orders.each do |order|
puts "Customer: #{order.customer.name}"
order.order_items.each do |item|
puts " #{item.product.name}: #{item.quantity}"
end
end
Handling denormalization in Ruby applications often involves caching calculated values or maintaining redundant data for performance.
class Order < ApplicationRecord
belongs_to :customer
has_many :order_items
# Denormalized total stored in orders table
# Updated when order items change
def recalculate_total!
self.total_amount = order_items.joins(:product)
.sum('order_items.quantity * products.price')
save!
end
end
# Callback to maintain denormalized data
class OrderItem < ApplicationRecord
belongs_to :order
belongs_to :product
after_save :update_order_total
after_destroy :update_order_total
private
def update_order_total
order.recalculate_total!
end
end
Database views provide denormalized read-only representations of normalized data. Some Ruby database adapters support creating and querying views.
# Create a denormalized view via migration
class CreateOrderSummaryView < ActiveRecord::Migration[7.0]
def up
execute <<-SQL
CREATE VIEW order_summaries AS
SELECT
orders.id,
orders.created_at,
customers.name as customer_name,
customers.email as customer_email,
COUNT(order_items.id) as item_count,
SUM(order_items.quantity * products.price) as total_amount
FROM orders
JOIN customers ON customers.id = orders.customer_id
JOIN order_items ON order_items.order_id = orders.id
JOIN products ON products.id = order_items.product_id
GROUP BY orders.id, customers.name, customers.email
SQL
end
def down
execute "DROP VIEW order_summaries"
end
end
# Query the view like a table
class OrderSummary < ApplicationRecord
self.primary_key = :id
def readonly?
true
end
end
summaries = OrderSummary.where('total_amount > 1000')
Practical Examples
Example 1: Normalizing a Denormalized Order System
Starting with a single denormalized table containing all order information:
| OrderID | OrderDate | CustomerName | CustomerEmail | CustomerPhone | Product | Price | Quantity | OrderTotal |
|---------|-----------|--------------|---------------|---------------|---------|-------|----------|------------|
| 1 | 2024-01-15| Alice Smith | alice@ex.com | 555-0100 | Laptop | 1200 | 1 | 1225 |
| 1 | 2024-01-15| Alice Smith | alice@ex.com | 555-0100 | Mouse | 25 | 1 | 1225 |
| 2 | 2024-01-16| Bob Jones | bob@ex.com | 555-0200 | Laptop | 1200 | 2 | 2400 |
Step 1: Convert to 1NF
The table already has atomic values but contains repeating groups (multiple items per order create multiple rows with duplicated order information). The structure satisfies 1NF requirements.
Step 2: Convert to 2NF
Composite key: (OrderID, Product)
Partial dependencies exist:
OrderID → OrderDate, CustomerName, CustomerEmail, CustomerPhone, OrderTotalProduct → Price
Decompose into tables eliminating partial dependencies:
Orders:
| OrderID | OrderDate | CustomerName | CustomerEmail | CustomerPhone | OrderTotal |
OrderItems:
| OrderID | Product | Quantity |
Products:
| Product | Price |
Step 3: Convert to 3NF
Transitive dependencies exist in the Orders table:
OrderID → CustomerName, CustomerEmail, CustomerPhoneCustomerEmail → CustomerName, CustomerPhone(customer attributes depend on each other)
Also, OrderTotal is a calculated value that depends on order items, creating a transitive dependency.
Final 3NF structure:
# Schema
Customers: | CustomerID | Name | Email | Phone |
Orders: | OrderID | CustomerID | OrderDate |
OrderItems: | OrderItemID | OrderID | ProductID | Quantity |
Products: | ProductID | Name | Price |
Example 2: Student Course Registration System
Initial denormalized structure:
| StudentID | StudentName | Major | Course | CourseName | Instructor | InstructorOffice | Grade |
|-----------|-------------|-------|--------|------------|------------|------------------|-------|
| 1 | Alice | CS | DB101 | Databases | Dr. Smith | Room 301 | A |
| 1 | Alice | CS | OS201 | OS | Dr. Jones | Room 302 | B |
| 2 | Bob | EE | DB101 | Databases | Dr. Smith | Room 301 | A |
Analyzing Dependencies:
StudentID → StudentName, MajorCourse → CourseNameInstructor → InstructorOfficeCourse → Instructor(each course has one instructor)(StudentID, Course) → Grade
Normalizing to 3NF:
Students: | StudentID | StudentName | Major |
Courses: | CourseID | CourseName | InstructorID |
Instructors: | InstructorID | InstructorName | Office |
Enrollments: | EnrollmentID | StudentID | CourseID | Grade |
Ruby implementation:
class Student < ApplicationRecord
has_many :enrollments
has_many :courses, through: :enrollments
end
class Course < ApplicationRecord
belongs_to :instructor
has_many :enrollments
has_many :students, through: :enrollments
end
class Instructor < ApplicationRecord
has_many :courses
end
class Enrollment < ApplicationRecord
belongs_to :student
belongs_to :course
end
# Query normalized data
student = Student.includes(enrollments: { course: :instructor }).find(1)
student.enrollments.each do |enrollment|
course = enrollment.course
puts "#{course.course_name}: #{enrollment.grade} (#{course.instructor.instructor_name})"
end
Example 3: E-commerce Product Catalog with Attributes
Initial structure with repeating attribute groups:
| ProductID | Name | Price | Color1 | Color2 | Size1 | Size2 | Size3 |
|-----------|--------|-------|--------|--------|-------|-------|-------|
| 1 | Shirt | 29.99 | Red | Blue | S | M | L |
| 2 | Pants | 49.99 | Black | NULL | 32 | 34 | 36 |
Converting to 1NF eliminates repeating columns:
Products: | ProductID | Name | Price |
ProductColors: | ProductID | Color |
ProductSizes: | ProductID | Size |
This structure reaches 3NF. The composite key (ProductID, Color) in ProductColors and (ProductID, Size) in ProductSizes determine all attributes in their respective tables.
Ruby implementation with inventory management:
class Product < ApplicationRecord
has_many :product_colors
has_many :product_sizes
has_many :inventory_items
def available_variants
inventory_items.includes(:product_color, :product_size)
.where('quantity > 0')
end
end
class ProductColor < ApplicationRecord
belongs_to :product
has_many :inventory_items
end
class ProductSize < ApplicationRecord
belongs_to :product
has_many :inventory_items
end
class InventoryItem < ApplicationRecord
belongs_to :product
belongs_to :product_color
belongs_to :product_size
validates :quantity, numericality: { greater_than_or_equal_to: 0 }
end
# Query specific variant
shirt = Product.find_by(name: 'Shirt')
red_medium = shirt.inventory_items
.joins(:product_color, :product_size)
.where(product_colors: { color: 'Red' })
.where(product_sizes: { size: 'M' })
.first
Design Considerations
Normalization Benefits and Costs
Normalization reduces data redundancy and improves data integrity by centralizing information. Updating a customer's email requires changing one row instead of dozens. Deleting a product doesn't lose customer information. These benefits come with costs: queries require more joins, and applications need more complex object graphs to represent related data.
The decision to normalize depends on access patterns, data volatility, and consistency requirements. Systems with frequent writes and strong consistency requirements benefit from normalization. The single source of truth prevents inconsistencies, and reduced redundancy minimizes storage costs. Systems with read-heavy workloads and less frequent updates may benefit from denormalization to avoid join overhead.
When to Apply Each Normal Form
1NF serves as the minimum requirement for relational databases. All production databases should satisfy 1NF. Storing arrays or comma-separated values in columns breaks relational algebra and prevents efficient querying. Convert multi-valued attributes to separate rows or tables.
2NF applies to tables with composite keys. Single-column primary keys automatically satisfy 2NF. Apply 2NF when composite keys exist and non-key attributes depend only on part of the key. The process typically creates separate tables for entities that were previously embedded.
3NF represents the practical target for most databases. Third normal form eliminates most redundancy while maintaining reasonable query complexity. Apply 3NF to all tables unless specific performance requirements justify denormalization. Most database design methodologies aim for 3NF as the standard.
BCNF applies when 3NF allows anomalies due to overlapping candidate keys or unusual dependency patterns. If functional dependencies exist where the determinant isn't a candidate key, consider BCNF. The form prevents subtle anomalies but may require decomposing tables in ways that complicate queries.
Denormalization Scenarios
Performance requirements often justify denormalization. Specific patterns include:
Aggregated data: Storing counts, sums, or averages that would require expensive calculations. An order total field duplicates information derivable from order items but eliminates complex joins and aggregations for common queries.
class Order < ApplicationRecord
has_many :order_items
# Denormalized field maintained by callbacks
# Faster than calculating on every access
def self.with_totals
select('orders.*,
(SELECT SUM(quantity * price)
FROM order_items
WHERE order_items.order_id = orders.id) as total')
end
end
Frequently joined data: Combining tables that are always accessed together. If every product query requires the category name, storing the category name in the products table avoids a join despite redundancy.
Historical snapshots: Copying current values into transaction records to preserve point-in-time state. An order stores the product price at purchase time rather than referencing the products table, which may change prices later.
class OrderItem < ApplicationRecord
belongs_to :order
belongs_to :product
before_create :capture_product_price
private
def capture_product_price
# Store current price to preserve historical accuracy
self.price_at_purchase = product.price
end
def line_total
quantity * price_at_purchase
end
end
Read replicas and caching: Maintaining denormalized copies of data optimized for specific read patterns. Write to normalized tables, then propagate changes to denormalized read structures.
Balancing Normalization and Performance
Start with a normalized design and denormalize only when measurements demonstrate performance problems. Premature denormalization adds complexity without confirmed benefits. Profile queries to identify bottlenecks. If joins create performance issues, consider:
- Adding indexes before denormalizing
- Optimizing query patterns and reducing N+1 queries
- Implementing caching layers
- Using database-specific optimizations (materialized views, indexed views)
- Denormalizing as a last resort
When denormalizing, maintain data consistency through application logic or database triggers. The application must update all copies of duplicated data. This complexity increases maintenance burden and creates opportunities for bugs.
# Maintaining consistency in denormalized data
class Product < ApplicationRecord
has_many :order_items
after_update :update_order_items, if: :saved_change_to_price?
private
def update_order_items
# Update denormalized price in open orders only
order_items.joins(:order)
.where('orders.status = ?', 'pending')
.update_all(current_price: price)
end
end
Performance Considerations
Join Performance Impact
Normalized databases require joins to retrieve related data. Each join operation processes rows from multiple tables, matching records based on key relationships. The cost increases with the number of tables, rows, and join complexity. A query requiring five joins across large tables can become significantly slower than reading from a single denormalized table.
Join algorithms affect performance:
- Nested loop joins iterate through one table and search the other for matches
- Hash joins build hash tables for faster lookup
- Merge joins sort both tables and merge results
Database optimizers select algorithms based on table sizes, available indexes, and statistics. Understanding these mechanisms helps predict performance characteristics of normalized designs.
Index Strategy for Normalized Tables
Foreign keys used in joins require indexes. Without indexes, the database performs full table scans to find matching rows. Index every foreign key column used in joins:
class AddIndexesToNormalizedSchema < ActiveRecord::Migration[7.0]
def change
# Index foreign keys for join performance
add_index :orders, :customer_id
add_index :order_items, :order_id
add_index :order_items, :product_id
add_index :phone_numbers, :customer_id
# Composite indexes for common query patterns
add_index :order_items, [:order_id, :product_id], unique: true
end
end
Composite indexes support queries filtering on multiple columns. The index column order matters—place the most selective column first, or match the query's WHERE clause order.
Query Optimization Patterns
Eager loading prevents N+1 query problems in normalized schemas. Loading related records in batches rather than individually reduces database round trips:
# N+1 problem: One query for orders, then one per customer
orders = Order.limit(100)
orders.each { |order| puts order.customer.name } # 101 queries
# Solution: Eager load customers
orders = Order.includes(:customer).limit(100)
orders.each { |order| puts order.customer.name } # 2 queries
# Complex eager loading with nested associations
orders = Order.includes(
:customer,
order_items: { product: :category }
).limit(100)
Select specific columns instead of loading entire records when accessing limited attributes:
# Load entire Customer objects
Order.includes(:customer).map { |o| o.customer.email }
# More efficient: load only needed columns
Order.joins(:customer).pluck('customers.email')
Measuring Normalization Performance Trade-offs
Benchmark queries to quantify normalization costs. Compare normalized and denormalized approaches with realistic data volumes:
require 'benchmark'
# Normalized query with joins
normalized_time = Benchmark.realtime do
Order.includes(:customer, order_items: :product)
.where('orders.created_at > ?', 30.days.ago)
.each do |order|
total = order.order_items.sum { |item| item.quantity * item.product.price }
end
end
# Denormalized query without joins
denormalized_time = Benchmark.realtime do
Order.where('orders.created_at > ?', 30.days.ago)
.pluck(:id, :total_amount)
end
puts "Normalized: #{normalized_time}s"
puts "Denormalized: #{denormalized_time}s"
Test with production-scale data volumes. Performance characteristics change dramatically between 100 rows and 10 million rows. Queries that perform well on small datasets may become unusable at scale.
Caching Strategies
Application-level caching reduces database load for normalized data. Cache frequently accessed records and expensive joins:
class Order < ApplicationRecord
belongs_to :customer
has_many :order_items
def total_with_cache
Rails.cache.fetch("order/#{id}/total", expires_in: 1.hour) do
order_items.joins(:product)
.sum('order_items.quantity * products.price')
end
end
# Invalidate cache when order changes
after_save :clear_total_cache
private
def clear_total_cache
Rails.cache.delete("order/#{id}/total")
end
end
Fragment caching stores rendered HTML for complex queries:
# View template with fragment caching
<% cache ['order-summary', order, order.order_items.maximum(:updated_at)] do %>
<div class="order-summary">
<h3><%= order.customer.name %></h3>
<ul>
<% order.order_items.each do |item| %>
<li><%= item.product.name %>: <%= item.quantity %></li>
<% end %>
</ul>
</div>
<% end %>
Database-Specific Optimizations
Materialized views maintain pre-computed joins and aggregations. Unlike regular views, materialized views store results physically and must be refreshed when underlying data changes:
-- PostgreSQL materialized view
CREATE MATERIALIZED VIEW order_totals AS
SELECT
orders.id,
orders.customer_id,
SUM(order_items.quantity * products.price) as total_amount
FROM orders
JOIN order_items ON order_items.order_id = orders.id
JOIN products ON products.id = order_items.product_id
GROUP BY orders.id, orders.customer_id;
CREATE INDEX idx_order_totals_customer ON order_totals(customer_id);
-- Refresh when data changes
REFRESH MATERIALIZED VIEW order_totals;
Partial indexes reduce index size and improve performance for queries with common WHERE clauses:
class AddPartialIndexes < ActiveRecord::Migration[7.0]
def change
# Index only active orders
add_index :orders, :customer_id,
where: "status = 'active'",
name: 'idx_orders_customer_active'
end
end
Reference
Normal Forms Comparison
| Normal Form | Requirement | Eliminates | Key Rule |
|---|---|---|---|
| First (1NF) | Atomic values, no repeating groups | Repeating columns and multi-valued attributes | Each cell contains single value |
| Second (2NF) | 1NF plus no partial dependencies | Partial key dependencies | Non-key attributes depend on entire key |
| Third (3NF) | 2NF plus no transitive dependencies | Transitive dependencies via non-key attributes | Non-key attributes depend only on key |
| Boyce-Codd (BCNF) | 3NF plus all determinants are candidate keys | Dependencies where determinant is not a key | Every determinant is a candidate key |
| Fourth (4NF) | BCNF plus no multi-valued dependencies | Independent multi-valued facts in same table | No multi-valued dependencies |
| Fifth (5NF) | 4NF plus no join dependencies | Redundancy from combining independent relationships | Cannot decompose without losing information |
Functional Dependency Notation
| Notation | Meaning | Example |
|---|---|---|
| A → B | A determines B | CustomerID → CustomerName |
| A → B, C | A determines both B and C | OrderID → OrderDate, CustomerID |
| A, B → C | Combination of A and B determines C | OrderID, ProductID → Quantity |
| A ↔ B | A and B determine each other (equivalent) | Email ↔ Username |
| A →→ B | A multi-determines B (4NF) | StudentID →→ Major |
Anomaly Types
| Anomaly | Description | Example | Solution |
|---|---|---|---|
| Insertion | Cannot add data without unrelated information | Cannot add product without an order | Separate products into own table |
| Deletion | Removing data loses unrelated information | Deleting last order removes customer | Separate customers into own table |
| Update | Changing data requires multiple row updates | Updating customer email in every order row | Reference customer table with foreign key |
| Modification | Same data stored multiple times creates inconsistency | Product price differs across orders | Store current price in products, historical in order items |
ActiveRecord Association Methods
| Association | Usage | Generates | Inverse |
|---|---|---|---|
| belongs_to | Declares foreign key ownership | Foreign key column, association method | has_many or has_one |
| has_many | Declares one-to-many relationship | Collection methods, count, build | belongs_to |
| has_one | Declares one-to-one relationship | Singular association method | belongs_to |
| has_many :through | Many-to-many through join table | Collection through intermediate model | has_many :through (reverse) |
| has_and_belongs_to_many | Direct many-to-many | Pure join table without model | has_and_belongs_to_many |
Common Denormalization Patterns
| Pattern | Description | Trade-off | Use Case |
|---|---|---|---|
| Calculated Fields | Store computed values | Consistency vs calculation cost | Order totals, aggregated statistics |
| Duplicate References | Store data from related table | Redundancy vs join elimination | Category name in products |
| Materialized Aggregates | Pre-compute and store summaries | Storage vs query performance | Dashboard metrics, reports |
| Historical Snapshots | Copy current values to preserve state | Storage vs temporal accuracy | Prices at time of order |
| Wide Tables | Combine frequently joined tables | Update complexity vs read performance | User profiles with preferences |
Migration Patterns for Normalization
| Operation | Purpose | Implementation |
|---|---|---|
| Extract Table | Separate repeating data into new table | Create table, move data, add foreign key |
| Add Join Table | Create many-to-many relationship | Create table with two foreign keys |
| Add Foreign Key | Establish relationship constraint | Add reference column with constraint |
| Split Columns | Separate multi-valued column | Create related table with individual rows |
| Merge Tables | Denormalize for performance | Combine columns, add redundant data |
Database View Types
| View Type | Storage | Update | Performance | Use Case |
|---|---|---|---|---|
| Regular View | Query definition only | Through view in some cases | Calculated on access | Simplify complex queries |
| Materialized View | Physical storage | Manual or scheduled refresh | Fast reads, slow refresh | Pre-computed reports |
| Indexed View | Physical storage with indexes | Automatic (SQL Server) | Very fast reads | Frequently accessed aggregates |
| Updateable View | Query definition | Direct updates possible | Calculated on access | Security, abstraction |