CrackedRuby - Database Normalization

Overview

Database normalization structures relational data to eliminate redundancy and prevent update anomalies. The process decomposes tables with anomalies into smaller, well-structured tables that preserve data integrity and reduce storage overhead. E.F. Codd introduced the concept in 1970 as part of the relational model, establishing a theoretical foundation for database design.

The normalization process evaluates functional dependencies between attributes and progressively refines table structures through a series of normal forms. Each normal form addresses specific types of redundancy and anomalies. Tables advance from unnormalized structures through First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), and beyond to more restrictive forms like Boyce-Codd Normal Form (BCNF) and Fourth Normal Form (4NF).

Normalization applies during database schema design and when refactoring existing databases. The process identifies functional dependencies, decomposes tables based on these dependencies, and establishes relationships between the resulting tables. While normalization improves data integrity and reduces redundancy, it introduces complexity through table joins and may impact query performance.

Consider a denormalized customer order table:

| OrderID | CustomerName | CustomerEmail | Product | Price | Quantity |
|---------|--------------|---------------|---------|-------|----------|
| 1       | Alice Smith  | alice@ex.com  | Laptop  | 1200  | 1        |
| 2       | Alice Smith  | alice@ex.com  | Mouse   | 25    | 2        |
| 3       | Bob Jones    | bob@ex.com    | Laptop  | 1200  | 1        |

After normalization into three tables:

Customers: | CustomerID | Name        | Email        |
Orders:    | OrderID | CustomerID | ProductID | Quantity |
Products:  | ProductID | Name   | Price |

This separation eliminates redundant customer and product data while maintaining all original information through foreign key relationships.

Key Principles

Normalization builds on functional dependencies, which define relationships where one attribute determines another. If attribute A determines attribute B (written A → B), then each value of A associates with exactly one value of B. In the customer example, CustomerID → CustomerName represents a functional dependency because each customer ID maps to exactly one name. Functional dependencies drive the decomposition process and determine which attributes belong together in tables.

Update anomalies occur in denormalized tables where redundant data creates inconsistencies during modifications. An insertion anomaly prevents adding certain data without unrelated information—storing a product requires an order. A deletion anomaly loses unintended data—removing the last order for a customer erases all customer information. An update anomaly requires changing data in multiple locations—updating a customer's email necessitates modifying every order row for that customer. Normalization eliminates these anomalies by removing redundancy.

Candidate keys uniquely identify rows in a table. A table may have multiple candidate keys; one becomes the primary key. An attribute belongs to a candidate key if it forms part of any candidate key in the table. Non-key attributes depend on candidate keys but don't participate in them. The distinction between key and non-key attributes determines how dependencies violate normal forms.

Partial dependencies exist when a non-key attribute depends on part of a composite key rather than the entire key. In a table with composite key (OrderID, ProductID), if CustomerName depends only on OrderID, a partial dependency exists. Second Normal Form eliminates partial dependencies.

Transitive dependencies occur when a non-key attribute depends on another non-key attribute, which depends on the key. If CustomerID → PostalCode and PostalCode → City, then CustomerID → City represents a transitive dependency. Third Normal Form removes transitive dependencies.

Normal forms create a hierarchy where each form satisfies the requirements of previous forms:

First Normal Form (1NF): Eliminates repeating groups and ensures atomic values
Second Normal Form (2NF): Satisfies 1NF and eliminates partial dependencies
Third Normal Form (3NF): Satisfies 2NF and eliminates transitive dependencies
Boyce-Codd Normal Form (BCNF): Satisfies 3NF with stricter dependency rules
Fourth Normal Form (4NF): Satisfies BCNF and eliminates multi-valued dependencies
Fifth Normal Form (5NF): Satisfies 4NF and eliminates join dependencies

Most database designs target 3NF as it balances normalization benefits with practical implementation concerns. BCNF and higher forms apply to specific scenarios with complex dependencies.

Implementation Approaches

First Normal Form (1NF) requires atomic values in all columns and eliminates repeating groups. A table violates 1NF if it contains arrays, lists, or comma-separated values in columns, or if it has repeating column groups like Phone1, Phone2, Phone3. Converting to 1NF splits multi-valued attributes into separate rows or tables.

A denormalized customer table with multiple phone numbers:

| CustomerID | Name        | Phones               |
|------------|-------------|----------------------|
| 1          | Alice Smith | 555-0100, 555-0101  |
| 2          | Bob Jones   | 555-0200            |

Converting to 1NF creates separate rows for each phone:

| CustomerID | Name        | Phone    |
|------------|-------------|----------|
| 1          | Alice Smith | 555-0100 |
| 1          | Alice Smith | 555-0101 |
| 2          | Bob Jones   | 555-0200 |

Alternatively, create a separate phone numbers table:

Customers:     | CustomerID | Name        |
PhoneNumbers:  | CustomerID | Phone    |

Second Normal Form (2NF) requires 1NF and eliminates partial dependencies. If the table has a composite primary key, every non-key attribute must depend on the entire key, not just part of it. Tables with single-attribute primary keys automatically satisfy 2NF.

An order details table with partial dependency:

| OrderID | ProductID | Quantity | ProductName | ProductPrice |
|---------|-----------|----------|-------------|--------------|
| 1       | 101       | 2        | Laptop      | 1200         |
| 1       | 102       | 1        | Mouse       | 25           |

Primary key: (OrderID, ProductID)
ProductName and ProductPrice depend only on ProductID, not the full key.

Converting to 2NF separates product information:

OrderDetails: | OrderID | ProductID | Quantity |
Products:     | ProductID | ProductName | ProductPrice |

Third Normal Form (3NF) requires 2NF and eliminates transitive dependencies. Non-key attributes must depend directly on the primary key, not on other non-key attributes.

An orders table with transitive dependency:

| OrderID | CustomerID | CustomerName | CustomerCity | CustomerCountry |
|---------|------------|--------------|--------------|-----------------|
| 1       | 100        | Alice Smith  | Seattle      | USA            |
| 2       | 100        | Alice Smith  | Seattle      | USA            |

CustomerName, CustomerCity, and CustomerCountry depend on CustomerID, not directly on OrderID.

Converting to 3NF separates customer data:

Orders:    | OrderID | CustomerID |
Customers: | CustomerID | CustomerName | CustomerCity | CustomerCountry |

Boyce-Codd Normal Form (BCNF) strengthens 3NF by requiring that every determinant is a candidate key. A determinant is an attribute or set of attributes on the left side of a functional dependency. In 3NF, non-key attributes cannot determine other non-key attributes. BCNF extends this—no attribute can determine another unless it's a candidate key.

A table violating BCNF but satisfying 3NF:

| Student | Course | Instructor |
|---------|--------|------------|
| Alice   | DB101  | Dr. Smith  |
| Bob     | DB101  | Dr. Smith  |
| Alice   | OS201  | Dr. Jones  |

Dependencies: (Student, Course) → Instructor and Instructor → Course

The second dependency violates BCNF because Instructor determines Course but isn't a candidate key.

Converting to BCNF:

StudentInstructor: | Student | Instructor |
InstructorCourse:  | Instructor | Course |

Denormalization intentionally introduces redundancy to improve query performance. While normalized databases minimize redundancy, joins across many tables can degrade performance for read-heavy workloads. Denormalization strategies include duplicating frequently accessed data, maintaining computed aggregates, or combining tables that are always queried together.

Common denormalization patterns:

Storing calculated values instead of computing on each query
Duplicating frequently joined data into a single table
Maintaining materialized views with pre-joined data
Adding redundant foreign key references to skip intermediate tables

Ruby Implementation

Ruby applications interact with normalized databases primarily through Object-Relational Mapping (ORM) frameworks like ActiveRecord. These frameworks map normalized database tables to Ruby classes and provide abstractions for managing relationships between tables.

ActiveRecord associations define relationships between normalized tables. The framework provides methods to declare one-to-many, many-to-many, and one-to-one relationships.

class Customer < ApplicationRecord
  has_many :orders
  has_many :phone_numbers
end

class Order < ApplicationRecord
  belongs_to :customer
  has_many :order_items
  has_many :products, through: :order_items
end

class OrderItem < ApplicationRecord
  belongs_to :order
  belongs_to :product
end

class Product < ApplicationRecord
  has_many :order_items
  has_many :orders, through: :order_items
end

class PhoneNumber < ApplicationRecord
  belongs_to :customer
end

These associations enable navigation between related records:

customer = Customer.find(1)
customer.orders.each do |order|
  order.order_items.each do |item|
    puts "#{item.quantity}x #{item.product.name}"
  end
end

Database migrations create and modify normalized table structures. Migrations define schema changes that the framework applies to the database.

Creating a normalized schema for customers and orders:

class CreateNormalizedSchema < ActiveRecord::Migration[7.0]
  def change
    create_table :customers do |t|
      t.string :name, null: false
      t.string :email, null: false
      t.timestamps
      t.index :email, unique: true
    end

    create_table :phone_numbers do |t|
      t.references :customer, null: false, foreign_key: true
      t.string :number, null: false
      t.timestamps
    end

    create_table :products do |t|
      t.string :name, null: false
      t.decimal :price, precision: 10, scale: 2, null: false
      t.timestamps
    end

    create_table :orders do |t|
      t.references :customer, null: false, foreign_key: true
      t.timestamps
    end

    create_table :order_items do |t|
      t.references :order, null: false, foreign_key: true
      t.references :product, null: false, foreign_key: true
      t.integer :quantity, null: false
      t.timestamps
    end
  end
end

Querying normalized data requires joining tables to retrieve related information. ActiveRecord provides methods to optimize these queries.

# Inefficient: N+1 queries
orders = Order.all
orders.each do |order|
  puts order.customer.name  # Separate query for each customer
end

# Efficient: Eager loading with includes
orders = Order.includes(:customer, order_items: :product)
orders.each do |order|
  puts "Customer: #{order.customer.name}"
  order.order_items.each do |item|
    puts "  #{item.product.name}: #{item.quantity}"
  end
end

Handling denormalization in Ruby applications often involves caching calculated values or maintaining redundant data for performance.

class Order < ApplicationRecord
  belongs_to :customer
  has_many :order_items

  # Denormalized total stored in orders table
  # Updated when order items change
  def recalculate_total!
    self.total_amount = order_items.joins(:product)
                                   .sum('order_items.quantity * products.price')
    save!
  end
end

# Callback to maintain denormalized data
class OrderItem < ApplicationRecord
  belongs_to :order
  belongs_to :product
  
  after_save :update_order_total
  after_destroy :update_order_total

  private

  def update_order_total
    order.recalculate_total!
  end
end

Database views provide denormalized read-only representations of normalized data. Some Ruby database adapters support creating and querying views.

# Create a denormalized view via migration
class CreateOrderSummaryView < ActiveRecord::Migration[7.0]
  def up
    execute <<-SQL
      CREATE VIEW order_summaries AS
      SELECT 
        orders.id,
        orders.created_at,
        customers.name as customer_name,
        customers.email as customer_email,
        COUNT(order_items.id) as item_count,
        SUM(order_items.quantity * products.price) as total_amount
      FROM orders
      JOIN customers ON customers.id = orders.customer_id
      JOIN order_items ON order_items.order_id = orders.id
      JOIN products ON products.id = order_items.product_id
      GROUP BY orders.id, customers.name, customers.email
    SQL
  end

  def down
    execute "DROP VIEW order_summaries"
  end
end

# Query the view like a table
class OrderSummary < ApplicationRecord
  self.primary_key = :id
  
  def readonly?
    true
  end
end

summaries = OrderSummary.where('total_amount > 1000')

Practical Examples

Example 1: Normalizing a Denormalized Order System

Starting with a single denormalized table containing all order information:

| OrderID | OrderDate | CustomerName | CustomerEmail | CustomerPhone | Product | Price | Quantity | OrderTotal |
|---------|-----------|--------------|---------------|---------------|---------|-------|----------|------------|
| 1       | 2024-01-15| Alice Smith  | alice@ex.com  | 555-0100      | Laptop  | 1200  | 1        | 1225       |
| 1       | 2024-01-15| Alice Smith  | alice@ex.com  | 555-0100      | Mouse   | 25    | 1        | 1225       |
| 2       | 2024-01-16| Bob Jones    | bob@ex.com    | 555-0200      | Laptop  | 1200  | 2        | 2400       |

Step 1: Convert to 1NF

The table already has atomic values but contains repeating groups (multiple items per order create multiple rows with duplicated order information). The structure satisfies 1NF requirements.

Step 2: Convert to 2NF

Composite key: (OrderID, Product)

Partial dependencies exist:

OrderID → OrderDate, CustomerName, CustomerEmail, CustomerPhone, OrderTotal
Product → Price

Decompose into tables eliminating partial dependencies:

Orders:
| OrderID | OrderDate  | CustomerName | CustomerEmail | CustomerPhone | OrderTotal |

OrderItems:
| OrderID | Product | Quantity |

Products:
| Product | Price |

Step 3: Convert to 3NF

Transitive dependencies exist in the Orders table:

OrderID → CustomerName, CustomerEmail, CustomerPhone
CustomerEmail → CustomerName, CustomerPhone (customer attributes depend on each other)

Also, OrderTotal is a calculated value that depends on order items, creating a transitive dependency.

Final 3NF structure:

# Schema
Customers:    | CustomerID | Name        | Email        | Phone    |
Orders:       | OrderID | CustomerID | OrderDate  |
OrderItems:   | OrderItemID | OrderID | ProductID | Quantity |
Products:     | ProductID | Name   | Price |

Example 2: Student Course Registration System

Initial denormalized structure:

| StudentID | StudentName | Major | Course | CourseName | Instructor | InstructorOffice | Grade |
|-----------|-------------|-------|--------|------------|------------|------------------|-------|
| 1         | Alice       | CS    | DB101  | Databases  | Dr. Smith  | Room 301        | A     |
| 1         | Alice       | CS    | OS201  | OS         | Dr. Jones  | Room 302        | B     |
| 2         | Bob         | EE    | DB101  | Databases  | Dr. Smith  | Room 301        | A     |

Analyzing Dependencies:

StudentID → StudentName, Major
Course → CourseName
Instructor → InstructorOffice
Course → Instructor (each course has one instructor)
(StudentID, Course) → Grade

Normalizing to 3NF:

Students:     | StudentID | StudentName | Major |
Courses:      | CourseID | CourseName | InstructorID |
Instructors:  | InstructorID | InstructorName | Office |
Enrollments:  | EnrollmentID | StudentID | CourseID | Grade |

Ruby implementation:

class Student < ApplicationRecord
  has_many :enrollments
  has_many :courses, through: :enrollments
end

class Course < ApplicationRecord
  belongs_to :instructor
  has_many :enrollments
  has_many :students, through: :enrollments
end

class Instructor < ApplicationRecord
  has_many :courses
end

class Enrollment < ApplicationRecord
  belongs_to :student
  belongs_to :course
end

# Query normalized data
student = Student.includes(enrollments: { course: :instructor }).find(1)
student.enrollments.each do |enrollment|
  course = enrollment.course
  puts "#{course.course_name}: #{enrollment.grade} (#{course.instructor.instructor_name})"
end

Example 3: E-commerce Product Catalog with Attributes

Initial structure with repeating attribute groups:

| ProductID | Name   | Price | Color1 | Color2 | Size1 | Size2 | Size3 |
|-----------|--------|-------|--------|--------|-------|-------|-------|
| 1         | Shirt  | 29.99 | Red    | Blue   | S     | M     | L     |
| 2         | Pants  | 49.99 | Black  | NULL   | 32    | 34    | 36    |

Converting to 1NF eliminates repeating columns:

Products:         | ProductID | Name   | Price |
ProductColors:    | ProductID | Color |
ProductSizes:     | ProductID | Size  |

This structure reaches 3NF. The composite key (ProductID, Color) in ProductColors and (ProductID, Size) in ProductSizes determine all attributes in their respective tables.

Ruby implementation with inventory management:

class Product < ApplicationRecord
  has_many :product_colors
  has_many :product_sizes
  has_many :inventory_items
  
  def available_variants
    inventory_items.includes(:product_color, :product_size)
                   .where('quantity > 0')
  end
end

class ProductColor < ApplicationRecord
  belongs_to :product
  has_many :inventory_items
end

class ProductSize < ApplicationRecord
  belongs_to :product
  has_many :inventory_items
end

class InventoryItem < ApplicationRecord
  belongs_to :product
  belongs_to :product_color
  belongs_to :product_size
  
  validates :quantity, numericality: { greater_than_or_equal_to: 0 }
end

# Query specific variant
shirt = Product.find_by(name: 'Shirt')
red_medium = shirt.inventory_items
                  .joins(:product_color, :product_size)
                  .where(product_colors: { color: 'Red' })
                  .where(product_sizes: { size: 'M' })
                  .first

Design Considerations

Normalization Benefits and Costs

Normalization reduces data redundancy and improves data integrity by centralizing information. Updating a customer's email requires changing one row instead of dozens. Deleting a product doesn't lose customer information. These benefits come with costs: queries require more joins, and applications need more complex object graphs to represent related data.

The decision to normalize depends on access patterns, data volatility, and consistency requirements. Systems with frequent writes and strong consistency requirements benefit from normalization. The single source of truth prevents inconsistencies, and reduced redundancy minimizes storage costs. Systems with read-heavy workloads and less frequent updates may benefit from denormalization to avoid join overhead.

When to Apply Each Normal Form

1NF serves as the minimum requirement for relational databases. All production databases should satisfy 1NF. Storing arrays or comma-separated values in columns breaks relational algebra and prevents efficient querying. Convert multi-valued attributes to separate rows or tables.

2NF applies to tables with composite keys. Single-column primary keys automatically satisfy 2NF. Apply 2NF when composite keys exist and non-key attributes depend only on part of the key. The process typically creates separate tables for entities that were previously embedded.

3NF represents the practical target for most databases. Third normal form eliminates most redundancy while maintaining reasonable query complexity. Apply 3NF to all tables unless specific performance requirements justify denormalization. Most database design methodologies aim for 3NF as the standard.

BCNF applies when 3NF allows anomalies due to overlapping candidate keys or unusual dependency patterns. If functional dependencies exist where the determinant isn't a candidate key, consider BCNF. The form prevents subtle anomalies but may require decomposing tables in ways that complicate queries.

Denormalization Scenarios

Performance requirements often justify denormalization. Specific patterns include:

Aggregated data: Storing counts, sums, or averages that would require expensive calculations. An order total field duplicates information derivable from order items but eliminates complex joins and aggregations for common queries.

class Order < ApplicationRecord
  has_many :order_items
  
  # Denormalized field maintained by callbacks
  # Faster than calculating on every access
  def self.with_totals
    select('orders.*, 
            (SELECT SUM(quantity * price) 
             FROM order_items 
             WHERE order_items.order_id = orders.id) as total')
  end
end

Frequently joined data: Combining tables that are always accessed together. If every product query requires the category name, storing the category name in the products table avoids a join despite redundancy.

Historical snapshots: Copying current values into transaction records to preserve point-in-time state. An order stores the product price at purchase time rather than referencing the products table, which may change prices later.

class OrderItem < ApplicationRecord
  belongs_to :order
  belongs_to :product
  
  before_create :capture_product_price
  
  private
  
  def capture_product_price
    # Store current price to preserve historical accuracy
    self.price_at_purchase = product.price
  end
  
  def line_total
    quantity * price_at_purchase
  end
end

Read replicas and caching: Maintaining denormalized copies of data optimized for specific read patterns. Write to normalized tables, then propagate changes to denormalized read structures.

Balancing Normalization and Performance

Start with a normalized design and denormalize only when measurements demonstrate performance problems. Premature denormalization adds complexity without confirmed benefits. Profile queries to identify bottlenecks. If joins create performance issues, consider:

Adding indexes before denormalizing
Optimizing query patterns and reducing N+1 queries
Implementing caching layers
Using database-specific optimizations (materialized views, indexed views)
Denormalizing as a last resort

When denormalizing, maintain data consistency through application logic or database triggers. The application must update all copies of duplicated data. This complexity increases maintenance burden and creates opportunities for bugs.

# Maintaining consistency in denormalized data
class Product < ApplicationRecord
  has_many :order_items
  
  after_update :update_order_items, if: :saved_change_to_price?
  
  private
  
  def update_order_items
    # Update denormalized price in open orders only
    order_items.joins(:order)
               .where('orders.status = ?', 'pending')
               .update_all(current_price: price)
  end
end

Performance Considerations

Join Performance Impact

Normalized databases require joins to retrieve related data. Each join operation processes rows from multiple tables, matching records based on key relationships. The cost increases with the number of tables, rows, and join complexity. A query requiring five joins across large tables can become significantly slower than reading from a single denormalized table.

Join algorithms affect performance:

Nested loop joins iterate through one table and search the other for matches
Hash joins build hash tables for faster lookup
Merge joins sort both tables and merge results

Database optimizers select algorithms based on table sizes, available indexes, and statistics. Understanding these mechanisms helps predict performance characteristics of normalized designs.

Index Strategy for Normalized Tables

Foreign keys used in joins require indexes. Without indexes, the database performs full table scans to find matching rows. Index every foreign key column used in joins:

class AddIndexesToNormalizedSchema < ActiveRecord::Migration[7.0]
  def change
    # Index foreign keys for join performance
    add_index :orders, :customer_id
    add_index :order_items, :order_id
    add_index :order_items, :product_id
    add_index :phone_numbers, :customer_id
    
    # Composite indexes for common query patterns
    add_index :order_items, [:order_id, :product_id], unique: true
  end
end

Composite indexes support queries filtering on multiple columns. The index column order matters—place the most selective column first, or match the query's WHERE clause order.

Query Optimization Patterns

Eager loading prevents N+1 query problems in normalized schemas. Loading related records in batches rather than individually reduces database round trips:

# N+1 problem: One query for orders, then one per customer
orders = Order.limit(100)
orders.each { |order| puts order.customer.name }  # 101 queries

# Solution: Eager load customers
orders = Order.includes(:customer).limit(100)
orders.each { |order| puts order.customer.name }  # 2 queries

# Complex eager loading with nested associations
orders = Order.includes(
  :customer,
  order_items: { product: :category }
).limit(100)

Select specific columns instead of loading entire records when accessing limited attributes:

# Load entire Customer objects
Order.includes(:customer).map { |o| o.customer.email }

# More efficient: load only needed columns
Order.joins(:customer).pluck('customers.email')

Measuring Normalization Performance Trade-offs

Benchmark queries to quantify normalization costs. Compare normalized and denormalized approaches with realistic data volumes:

require 'benchmark'

# Normalized query with joins
normalized_time = Benchmark.realtime do
  Order.includes(:customer, order_items: :product)
       .where('orders.created_at > ?', 30.days.ago)
       .each do |order|
         total = order.order_items.sum { |item| item.quantity * item.product.price }
       end
end

# Denormalized query without joins
denormalized_time = Benchmark.realtime do
  Order.where('orders.created_at > ?', 30.days.ago)
       .pluck(:id, :total_amount)
end

puts "Normalized: #{normalized_time}s"
puts "Denormalized: #{denormalized_time}s"

Test with production-scale data volumes. Performance characteristics change dramatically between 100 rows and 10 million rows. Queries that perform well on small datasets may become unusable at scale.

Caching Strategies

Application-level caching reduces database load for normalized data. Cache frequently accessed records and expensive joins:

class Order < ApplicationRecord
  belongs_to :customer
  has_many :order_items
  
  def total_with_cache
    Rails.cache.fetch("order/#{id}/total", expires_in: 1.hour) do
      order_items.joins(:product)
                 .sum('order_items.quantity * products.price')
    end
  end
  
  # Invalidate cache when order changes
  after_save :clear_total_cache
  
  private
  
  def clear_total_cache
    Rails.cache.delete("order/#{id}/total")
  end
end

Fragment caching stores rendered HTML for complex queries:

# View template with fragment caching
<% cache ['order-summary', order, order.order_items.maximum(:updated_at)] do %>
  <div class="order-summary">
    <h3><%= order.customer.name %></h3>
    <ul>
      <% order.order_items.each do |item| %>
        <li><%= item.product.name %>: <%= item.quantity %></li>
      <% end %>
    </ul>
  </div>
<% end %>

Database-Specific Optimizations

Materialized views maintain pre-computed joins and aggregations. Unlike regular views, materialized views store results physically and must be refreshed when underlying data changes:

-- PostgreSQL materialized view
CREATE MATERIALIZED VIEW order_totals AS
SELECT 
  orders.id,
  orders.customer_id,
  SUM(order_items.quantity * products.price) as total_amount
FROM orders
JOIN order_items ON order_items.order_id = orders.id
JOIN products ON products.id = order_items.product_id
GROUP BY orders.id, orders.customer_id;

CREATE INDEX idx_order_totals_customer ON order_totals(customer_id);

-- Refresh when data changes
REFRESH MATERIALIZED VIEW order_totals;

Partial indexes reduce index size and improve performance for queries with common WHERE clauses:

class AddPartialIndexes < ActiveRecord::Migration[7.0]
  def change
    # Index only active orders
    add_index :orders, :customer_id, 
              where: "status = 'active'",
              name: 'idx_orders_customer_active'
  end
end

Reference

Normal Forms Comparison

Normal Form	Requirement	Eliminates	Key Rule
First (1NF)	Atomic values, no repeating groups	Repeating columns and multi-valued attributes	Each cell contains single value
Second (2NF)	1NF plus no partial dependencies	Partial key dependencies	Non-key attributes depend on entire key
Third (3NF)	2NF plus no transitive dependencies	Transitive dependencies via non-key attributes	Non-key attributes depend only on key
Boyce-Codd (BCNF)	3NF plus all determinants are candidate keys	Dependencies where determinant is not a key	Every determinant is a candidate key
Fourth (4NF)	BCNF plus no multi-valued dependencies	Independent multi-valued facts in same table	No multi-valued dependencies
Fifth (5NF)	4NF plus no join dependencies	Redundancy from combining independent relationships	Cannot decompose without losing information

Functional Dependency Notation

Notation	Meaning	Example
A → B	A determines B	CustomerID → CustomerName
A → B, C	A determines both B and C	OrderID → OrderDate, CustomerID
A, B → C	Combination of A and B determines C	OrderID, ProductID → Quantity
A ↔ B	A and B determine each other (equivalent)	Email ↔ Username
A →→ B	A multi-determines B (4NF)	StudentID →→ Major

Anomaly Types

Anomaly	Description	Example	Solution
Insertion	Cannot add data without unrelated information	Cannot add product without an order	Separate products into own table
Deletion	Removing data loses unrelated information	Deleting last order removes customer	Separate customers into own table
Update	Changing data requires multiple row updates	Updating customer email in every order row	Reference customer table with foreign key
Modification	Same data stored multiple times creates inconsistency	Product price differs across orders	Store current price in products, historical in order items

ActiveRecord Association Methods

Association	Usage	Generates	Inverse
belongs_to	Declares foreign key ownership	Foreign key column, association method	has_many or has_one
has_many	Declares one-to-many relationship	Collection methods, count, build	belongs_to
has_one	Declares one-to-one relationship	Singular association method	belongs_to
has_many :through	Many-to-many through join table	Collection through intermediate model	has_many :through (reverse)
has_and_belongs_to_many	Direct many-to-many	Pure join table without model	has_and_belongs_to_many

Common Denormalization Patterns

Pattern	Description	Trade-off	Use Case
Calculated Fields	Store computed values	Consistency vs calculation cost	Order totals, aggregated statistics
Duplicate References	Store data from related table	Redundancy vs join elimination	Category name in products
Materialized Aggregates	Pre-compute and store summaries	Storage vs query performance	Dashboard metrics, reports
Historical Snapshots	Copy current values to preserve state	Storage vs temporal accuracy	Prices at time of order
Wide Tables	Combine frequently joined tables	Update complexity vs read performance	User profiles with preferences

Migration Patterns for Normalization

Operation	Purpose	Implementation
Extract Table	Separate repeating data into new table	Create table, move data, add foreign key
Add Join Table	Create many-to-many relationship	Create table with two foreign keys
Add Foreign Key	Establish relationship constraint	Add reference column with constraint
Split Columns	Separate multi-valued column	Create related table with individual rows
Merge Tables	Denormalize for performance	Combine columns, add redundant data

Database View Types

View Type	Storage	Update	Performance	Use Case
Regular View	Query definition only	Through view in some cases	Calculated on access	Simplify complex queries
Materialized View	Physical storage	Manual or scheduled refresh	Fast reads, slow refresh	Pre-computed reports
Indexed View	Physical storage with indexes	Automatic (SQL Server)	Very fast reads	Frequently accessed aggregates
Updateable View	Query definition	Direct updates possible	Calculated on access	Security, abstraction

Database Normalization