CrackedRuby CrackedRuby

Overview

Primary and foreign keys form the foundation of relational database design by establishing identity and relationships between tables. A primary key uniquely identifies each record in a table, while a foreign key references the primary key of another table, creating relationships that maintain data integrity across the database.

The concept originated with E.F. Codd's relational model in 1970, which established mathematical principles for organizing data in tables with explicit relationships. Primary keys enforce entity integrity by guaranteeing each record can be uniquely identified. Foreign keys enforce referential integrity by ensuring relationships reference valid records.

These constraints prevent orphaned records, maintain consistency during updates and deletes, and enable efficient queries across related data. Modern database systems implement keys at the storage level, creating indexes and constraint checking mechanisms that validate operations before committing changes.

# Database schema with primary and foreign keys
create_table :authors do |t|
  t.bigint :id, primary_key: true  # Primary key
  t.string :name
  t.timestamps
end

create_table :books do |t|
  t.bigint :id, primary_key: true      # Primary key
  t.bigint :author_id, null: false     # Foreign key
  t.string :title
  t.timestamps
  t.foreign_key :authors
end
# Books reference authors through author_id foreign key

Key Principles

Primary Key Characteristics

A primary key must contain unique values for every record in the table. No two rows can share the same primary key value. The key cannot contain NULL values, as NULL represents unknown or missing data and would violate the uniqueness constraint. Each table should have exactly one primary key, though that key may consist of multiple columns.

Primary keys serve as the canonical identifier for records. When other tables need to reference a specific record, they store the primary key value. The database uses the primary key for indexing, making record lookups efficient. Most database systems automatically create a unique index on primary key columns.

Foreign Key Characteristics

A foreign key in one table references the primary key of another table. The foreign key column must match the data type of the referenced primary key. Unlike primary keys, foreign keys can contain NULL values unless explicitly defined as NOT NULL, representing an optional relationship.

Multiple records in the child table can reference the same parent record, establishing one-to-many relationships. The database validates foreign key values during INSERT and UPDATE operations, rejecting any value that does not exist in the referenced table. This validation ensures referential integrity.

Referential Integrity

Referential integrity guarantees that relationships between tables remain valid. When a record in the parent table is updated or deleted, the database must handle dependent records in child tables. Four main strategies exist:

CASCADE propagates changes to dependent records. Deleting a parent record automatically deletes all child records referencing it. Updating a parent's primary key updates all corresponding foreign keys.

RESTRICT prevents operations that would orphan child records. Attempting to delete or update a parent record with existing children raises an error. The application must remove dependencies before modifying the parent.

SET NULL replaces foreign key values with NULL when the parent is deleted or updated. This maintains child records while breaking the relationship. The foreign key column must allow NULL values for this strategy.

SET DEFAULT assigns a default value to foreign keys when parents are removed. The default typically references a special "unknown" or "archived" parent record. This preserves children while redirecting the relationship.

Key Types

Simple keys consist of a single column. Most tables use auto-incrementing integers or UUIDs as simple primary keys. The column value directly identifies the record with no ambiguity.

Composite keys combine multiple columns to form the primary key. Order management systems often use composite keys like (order_id, line_number) where neither column alone is unique. Both columns together identify a specific order line item.

Natural keys derive from the entity's inherent properties. A person's social security number or a product's ISBN represent natural keys with meaning in the problem domain. Natural keys can change over time, creating maintenance challenges when values need updating across foreign key references.

Surrogate keys are artificial identifiers created solely for database purposes. Auto-incrementing integers and UUIDs serve as surrogate keys with no business meaning. Surrogate keys remain stable even when natural properties change, simplifying updates and improving performance.

Relationship Cardinality

One-to-one relationships connect exactly one record in each table. Implementation places a unique constraint on the foreign key column, preventing multiple children from referencing the same parent. User profiles often use one-to-one relationships, where each user has exactly one profile.

One-to-many relationships allow multiple child records per parent. Foreign keys in the child table reference the parent's primary key without uniqueness constraints. Authors and books demonstrate one-to-many relationships, where one author can write many books.

Many-to-many relationships require a junction table containing two foreign keys. Each foreign key references one of the connected tables, and the combination forms the junction table's primary key. Students and courses use many-to-many relationships, where students enroll in multiple courses and courses contain multiple students.

Ruby Implementation

Migration Definition

ActiveRecord migrations define primary and foreign keys using the table definition DSL. The framework automatically creates an auto-incrementing integer primary key named id unless specified otherwise. Explicit primary key definitions override this default behavior.

# Standard migration with auto-generated primary key
class CreateOrders < ActiveRecord::Migration[7.0]
  def change
    create_table :orders do |t|
      t.string :order_number
      t.decimal :total_amount
      t.timestamps
    end
    # Creates: id bigint PRIMARY KEY auto-increment
  end
end

Foreign key columns require explicit definition with the appropriate data type matching the referenced primary key. The foreign_key method establishes the constraint at the database level, configuring validation and cascade behavior.

# Migration with explicit foreign key constraint
class CreateOrderItems < ActiveRecord::Migration[7.0]
  def change
    create_table :order_items do |t|
      t.references :order, null: false, foreign_key: true
      t.references :product, null: false, foreign_key: true
      t.integer :quantity
      t.decimal :price
      t.timestamps
    end
  end
end
# Creates: order_id and product_id bigint columns with foreign key constraints
# Indexes automatically added to foreign key columns

The references method combines column creation with foreign key constraints and index generation. It creates a column with the _id suffix, adds an index, and optionally establishes the foreign key constraint. Setting foreign_key: true creates the constraint with default CASCADE behavior.

Composite Primary Keys

Tables requiring composite primary keys specify multiple columns in the primary_key option. ActiveRecord 7.1 and later provide native support for composite primary keys through array syntax.

# Composite primary key definition
class CreateOrderLineItems < ActiveRecord::Migration[7.1]
  def change
    create_table :order_line_items, primary_key: [:order_id, :line_number] do |t|
      t.bigint :order_id
      t.integer :line_number
      t.references :product, foreign_key: true
      t.decimal :price
      t.timestamps
    end
    
    add_foreign_key :order_line_items, :orders, column: :order_id
  end
end

UUID Primary Keys

UUID primary keys provide globally unique identifiers without coordination between database instances. They enable distributed systems to generate keys independently without collision risk.

# UUID primary key configuration
class CreateDocuments < ActiveRecord::Migration[7.0]
  def change
    enable_extension 'pgcrypto' unless extension_enabled?('pgcrypto')
    
    create_table :documents, id: :uuid do |t|
      t.string :title
      t.text :content
      t.timestamps
    end
    
    create_table :document_versions, id: :uuid do |t|
      t.uuid :document_id, null: false
      t.integer :version_number
      t.text :content
      t.timestamps
    end
    
    add_foreign_key :document_versions, :documents
    add_index :document_versions, :document_id
  end
end

Association Declarations

Model associations define relationships through primary and foreign key mappings. ActiveRecord infers foreign key column names from association names, following the pattern association_name_id.

# Model associations reflecting database relationships
class Author < ApplicationRecord
  has_many :books, dependent: :destroy
  has_one :author_profile, dependent: :destroy
end

class Book < ApplicationRecord
  belongs_to :author
  belongs_to :publisher, optional: true
  has_many :reviews, dependent: :destroy
end

class AuthorProfile < ApplicationRecord
  belongs_to :author
end

class Review < ApplicationRecord
  belongs_to :book
  belongs_to :reviewer, class_name: 'User'
end

The belongs_to association expects a foreign key column in the declaring model. The has_many association looks for a foreign key in the associated model's table. The dependent option controls cascade behavior when destroying parent records.

Polymorphic Associations

Polymorphic associations allow a model to reference multiple parent types through a single foreign key. The implementation stores both the foreign key ID and the parent's class name.

# Polymorphic association migration
class CreateComments < ActiveRecord::Migration[7.0]
  def change
    create_table :comments do |t|
      t.references :commentable, polymorphic: true, null: false
      t.text :body
      t.timestamps
    end
    
    add_index :comments, [:commentable_type, :commentable_id]
  end
end

# Models using polymorphic association
class Comment < ApplicationRecord
  belongs_to :commentable, polymorphic: true
end

class Article < ApplicationRecord
  has_many :comments, as: :commentable, dependent: :destroy
end

class Video < ApplicationRecord
  has_many :comments, as: :commentable, dependent: :destroy
end

Custom Foreign Keys

Non-standard foreign key names require explicit specification in both migrations and model associations. This supports legacy schemas or domain-specific naming conventions.

# Custom foreign key configuration
class CreateEmployees < ActiveRecord::Migration[7.0]
  def change
    create_table :employees do |t|
      t.string :name
      t.bigint :supervisor_id
      t.bigint :dept_id
      t.timestamps
    end
    
    add_foreign_key :employees, :employees, column: :supervisor_id
    add_foreign_key :employees, :departments, column: :dept_id
    add_index :employees, :supervisor_id
    add_index :employees, :dept_id
  end
end

class Employee < ApplicationRecord
  belongs_to :supervisor, class_name: 'Employee', foreign_key: 'supervisor_id', optional: true
  belongs_to :department, foreign_key: 'dept_id'
  has_many :subordinates, class_name: 'Employee', foreign_key: 'supervisor_id'
end

Design Considerations

Natural Versus Surrogate Keys

Natural keys use domain-specific attributes as primary keys. Email addresses, social security numbers, or product SKUs serve as natural keys when they guarantee uniqueness. Natural keys provide immediate meaning and eliminate the need for joins when querying by the key value.

Natural keys create maintenance challenges when values change. Email address changes require updating all foreign key references across the database. Some natural candidates like email addresses may need to be reused, violating the immutability principle. Privacy regulations sometimes mandate removing or anonymizing identifiable information, complicating natural key management.

Surrogate keys use generated identifiers without business meaning. Auto-incrementing integers and UUIDs serve as stable surrogate keys that never change regardless of entity attribute updates. Foreign key references remain valid even when business attributes like email or username change.

Surrogate keys add an extra column and require joins when filtering by natural attributes. Queries must join through the surrogate key to access related data, potentially impacting performance compared to natural key queries. The choice depends on update frequency, query patterns, and stability of natural candidates.

Integer Versus UUID Keys

Integer primary keys use sequential auto-incrementing values generated by the database. They consume minimal storage space (8 bytes for bigint), provide excellent index performance, and sort chronologically. The sequential nature reveals creation order and record count information.

Sequential integers create challenges in distributed systems. Multiple database instances cannot independently generate unique keys without coordination. Merging data from separate databases risks key collisions. The sequential pattern may expose business metrics or enable ID enumeration attacks.

UUID keys generate globally unique identifiers without coordination. Each instance can create UUIDs independently with negligible collision probability. They hide record count information and prevent ID enumeration. UUIDs support distributed architectures and data migration scenarios.

UUIDs consume more storage (16 bytes) and create larger indexes. Random UUIDs reduce index efficiency compared to sequential values, as inserts scatter across the index tree rather than appending to the end. Database systems optimized for sequential keys may experience performance degradation with UUIDs. UUID version 7 addresses some concerns by incorporating timestamp ordering.

Composite Key Trade-offs

Composite keys naturally represent entities requiring multiple attributes for identification. Order line items identified by (order_id, line_number) or seat assignments identified by (flight_id, seat_number) reflect inherent domain structure. Composite keys enforce uniqueness across the combined columns without additional constraints.

Composite keys complicate foreign key relationships. Child tables referencing composite keys must include all constituent columns in the foreign key definition. This increases storage requirements and index size. Queries joining through composite keys require matching multiple columns, adding complexity to WHERE clauses.

Many-to-many junction tables naturally use composite keys formed from both foreign keys. The combination (student_id, course_id) identifies enrollments uniquely. Some developers add a surrogate ID to junction tables for simplicity, though this duplicates the uniqueness constraint.

Nullable Foreign Keys

Nullable foreign keys represent optional relationships. A book's publisher_id may be NULL during draft stages before publisher assignment. The NULL value indicates the relationship does not yet exist rather than referencing an invalid record.

Required relationships use NOT NULL constraints on foreign keys. Order items must reference valid products, so product_id should prohibit NULL values. The database rejects any INSERT or UPDATE attempting to create orphaned records.

Queries on nullable foreign keys require careful NULL handling. LEFT JOINS include records with NULL foreign keys, while INNER JOINS exclude them. Application logic must distinguish between "relationship does not exist" and "related record not found" scenarios.

Some designs avoid nullable foreign keys by creating default "unknown" parent records. A placeholder publisher with id=0 allows all books to reference a valid publisher_id while semantically representing "no publisher assigned". This approach simplifies queries but creates special-case records requiring documentation.

Cascade Strategy Selection

CASCADE deletion removes child records when parents are deleted. This maintains referential integrity automatically for composition relationships where children have no independent existence. Deleting an order should cascade to order items, as line items cannot exist without the parent order.

RESTRICT deletion prevents removing parents with existing children. Blog posts should use RESTRICT for author relationships, as deleting an author should not remove all their posts. The application must explicitly handle the dependency, perhaps by reassigning posts or soft-deleting the author.

SET NULL disconnects children when parents are deleted. Product categories might use SET NULL, allowing products to persist without category assignment after category deletion. This requires nullable foreign keys and application logic to handle uncategorized items.

The cascade strategy affects data lifecycle management. Systems requiring audit trails typically use RESTRICT or soft deletes rather than CASCADE. Cascade operations can create performance issues when deleting parents with thousands of children, as each deletion triggers constraint checks.

Common Patterns

Standard Relationship Patterns

One-to-many relationships dominate database design. The parent table contains the primary key, and the child table stores the foreign key with an index for efficient lookups. Authors and books, customers and orders, and categories and products all follow this pattern.

# Standard one-to-many relationship
class CreateBlogPosts < ActiveRecord::Migration[7.0]
  def change
    create_table :users do |t|
      t.string :username
      t.timestamps
    end
    
    create_table :blog_posts do |t|
      t.references :user, null: false, foreign_key: true
      t.string :title
      t.text :body
      t.timestamps
    end
  end
end

One-to-one relationships extend parent records with additional attributes. User profiles, authentication credentials, and configuration settings often use one-to-one relationships. The child table's foreign key includes a unique constraint preventing multiple children per parent.

# One-to-one relationship with unique constraint
class CreateUserProfiles < ActiveRecord::Migration[7.0]
  def change
    create_table :user_profiles do |t|
      t.references :user, null: false, foreign_key: true, index: { unique: true }
      t.string :bio
      t.string :avatar_url
      t.timestamps
    end
  end
end

Junction Table Pattern

Many-to-many relationships require a junction table containing foreign keys to both related tables. The composite primary key formed from both foreign keys ensures each relationship pair appears only once.

# Many-to-many junction table
class CreateEnrollments < ActiveRecord::Migration[7.0]
  def change
    create_table :students do |t|
      t.string :name
      t.timestamps
    end
    
    create_table :courses do |t|
      t.string :title
      t.timestamps
    end
    
    create_table :enrollments, primary_key: [:student_id, :course_id] do |t|
      t.bigint :student_id, null: false
      t.bigint :course_id, null: false
      t.date :enrolled_at
      t.string :grade
    end
    
    add_foreign_key :enrollments, :students
    add_foreign_key :enrollments, :courses
    add_index :enrollments, :student_id
    add_index :enrollments, :course_id
  end
end

class Student < ApplicationRecord
  has_many :enrollments
  has_many :courses, through: :enrollments
end

class Course < ApplicationRecord
  has_many :enrollments
  has_many :students, through: :enrollments
end

class Enrollment < ApplicationRecord
  belongs_to :student
  belongs_to :course
end

Junction tables storing additional relationship attributes may use surrogate primary keys instead of composite keys. Enrollment date, grade, or status values justify modeling the junction as a first-class entity with its own ID.

Self-Referential Pattern

Self-referential relationships connect records within the same table. Organizational hierarchies, comment threads, and category trees use self-referential foreign keys. The foreign key references the same table's primary key, typically with a nullable constraint to represent root nodes.

# Self-referential relationship
class CreateCategories < ActiveRecord::Migration[7.0]
  def change
    create_table :categories do |t|
      t.string :name
      t.references :parent, foreign_key: { to_table: :categories }
      t.timestamps
    end
  end
end

class Category < ApplicationRecord
  belongs_to :parent, class_name: 'Category', optional: true
  has_many :subcategories, class_name: 'Category', foreign_key: 'parent_id'
end

Polymorphic Pattern

Polymorphic associations allow a single foreign key to reference multiple parent tables. Comments, attachments, and tags often use polymorphic associations to avoid duplicating similar structures across multiple parent types.

The implementation stores two columns: a foreign key ID and a type discriminator string. Queries filter by both columns when retrieving records. Database-level foreign key constraints cannot be applied to polymorphic associations, as the referenced table varies by record.

# Polymorphic tagging system
class CreateTags < ActiveRecord::Migration[7.0]
  def change
    create_table :tags do |t|
      t.string :name
      t.timestamps
    end
    
    create_table :taggings do |t|
      t.references :tag, null: false, foreign_key: true
      t.references :taggable, polymorphic: true, null: false
      t.timestamps
    end
    
    add_index :taggings, [:taggable_type, :taggable_id, :tag_id], unique: true
  end
end

class Tag < ApplicationRecord
  has_many :taggings
end

class Tagging < ApplicationRecord
  belongs_to :tag
  belongs_to :taggable, polymorphic: true
end

class Article < ApplicationRecord
  has_many :taggings, as: :taggable
  has_many :tags, through: :taggings
end

Soft Delete Pattern

Soft deletion marks records as deleted without removing them from the database. A deleted_at timestamp column indicates deletion status. Foreign key relationships remain intact, preserving historical data and referential integrity.

Queries must filter deleted records explicitly. Scopes and default scopes automate this filtering, though default scopes can create confusion when accessing deleted records is necessary.

# Soft delete implementation
class AddDeletedAtToRecords < ActiveRecord::Migration[7.0]
  def change
    add_column :users, :deleted_at, :datetime
    add_index :users, :deleted_at
    
    add_column :posts, :deleted_at, :datetime
    add_index :posts, :deleted_at
  end
end

class User < ApplicationRecord
  has_many :posts
  
  scope :active, -> { where(deleted_at: nil) }
  
  def soft_delete
    update(deleted_at: Time.current)
  end
end

Performance Considerations

Foreign Key Indexing

Foreign key columns require indexes for optimal query performance. Without indexes, queries joining tables or filtering by foreign key values perform full table scans. The database must examine every row to find matching foreign key values.

# Explicit index creation on foreign keys
class CreateOrderLineItems < ActiveRecord::Migration[7.0]
  def change
    create_table :orders do |t|
      t.timestamps
    end
    
    create_table :order_line_items do |t|
      t.bigint :order_id, null: false
      t.integer :quantity
      t.timestamps
    end
    
    add_foreign_key :order_line_items, :orders
    add_index :order_line_items, :order_id  # Critical for join performance
  end
end

The references method automatically creates indexes on foreign key columns. Manual foreign key definitions require explicit index creation. Composite foreign keys need indexes on the full column combination, though partial indexes on individual columns may also benefit specific query patterns.

Primary Key Size Impact

Primary key size affects index storage and join performance. Every foreign key reference duplicates the primary key value. A 16-byte UUID consumes twice the space of an 8-byte bigint. Tables with millions of foreign key references amplify this difference.

Index structures in child tables store foreign key values. Larger keys create larger indexes, reducing cache effectiveness and increasing I/O requirements. Each index node holds fewer entries, increasing tree depth and requiring more seeks.

# Comparing primary key storage impact
# Integer keys: 8 bytes per reference
create_table :products, id: :bigint

# UUID keys: 16 bytes per reference
create_table :products, id: :uuid

# In a table with 10 million order_line_items:
# Integer foreign keys: 80 MB
# UUID foreign keys: 160 MB
# Plus index overhead

Query Optimization Strategies

Queries joining multiple tables through foreign keys benefit from proper index selection. The database optimizer chooses indexes based on selectivity and access patterns. Covering indexes including foreign key columns and frequently accessed attributes eliminate additional lookups.

# Query requiring multiple joins
# SELECT orders.*, customers.name, products.title
# FROM orders
# JOIN customers ON customers.id = orders.customer_id
# JOIN order_items ON order_items.order_id = orders.id
# JOIN products ON products.id = order_items.product_id
# WHERE orders.status = 'pending'

# Optimal indexes:
add_index :orders, [:status, :customer_id]
add_index :order_items, :order_id
add_index :order_items, :product_id

Partial indexes on foreign keys combined with common predicates reduce index size while maintaining performance. An index on (user_id WHERE deleted_at IS NULL) supports queries on active records without indexing deleted data.

Cascade Operation Performance

CASCADE delete operations can create performance bottlenecks when parents have thousands of children. The database must identify all dependent records, verify constraints on their dependents, and delete each record. This work occurs within a single transaction, potentially locking tables for extended periods.

Applications handling cascade deletes in code gain more control over batch sizes and timing. Processing deletions in smaller batches with explicit commits prevents long-running transactions. Background jobs can handle deletions asynchronously, keeping user-facing operations responsive.

# Handling cascading deletes efficiently
class Order < ApplicationRecord
  has_many :order_items, dependent: false
  
  def destroy_with_items
    transaction do
      # Delete in batches to avoid long locks
      order_items.in_batches(of: 1000) do |batch|
        batch.delete_all
      end
      destroy
    end
  end
end

Referential Integrity Checking Cost

Foreign key constraints impose validation overhead on INSERT, UPDATE, and DELETE operations. Each modification triggers constraint verification queries. High-throughput systems sometimes disable foreign key constraints during bulk operations, re-enabling them afterward.

Deferrable constraints postpone validation until transaction commit rather than checking immediately. This improves performance when loading related data, as all records can be inserted before verification. PostgreSQL supports deferrable constraints through the DEFERRABLE INITIALLY DEFERRED option.

Reference

Primary Key Types

Type Storage Characteristics Use Cases
Bigint 8 bytes Sequential, auto-increment Standard tables, high insert rate
UUID 16 bytes Globally unique, random or time-ordered Distributed systems, merging databases
Composite Varies Multiple columns combine for uniqueness Junction tables, line items
Natural Varies Domain attribute, may change Legacy systems, external identifiers

Foreign Key Cascade Options

Option Delete Behavior Update Behavior Characteristics
CASCADE Delete dependent records Update foreign keys Composition relationships
RESTRICT Block delete if dependents exist Block update if dependents exist Aggregation relationships
SET NULL Set foreign keys to NULL Set foreign keys to NULL Optional relationships
SET DEFAULT Set foreign keys to default value Set foreign keys to default value Fallback parent exists
NO ACTION Block unless constraints satisfied Block unless constraints satisfied Deferred constraint checking

ActiveRecord Migration Reference

Method Purpose Options
create_table Define table with primary key id, primary_key, force
references Add foreign key column with index null, foreign_key, polymorphic
add_foreign_key Create foreign key constraint column, on_delete, on_update
add_index Create index on columns unique, where, name
add_reference Add foreign key to existing table foreign_key, polymorphic, index

ActiveRecord Association Reference

Association Foreign Key Location Cardinality Options
belongs_to Declaring model One optional, foreign_key, class_name
has_one Associated model One dependent, foreign_key, through
has_many Associated model Many dependent, foreign_key, through
has_and_belongs_to_many Junction table Many-to-many join_table, foreign_key
has_many :through Through model Many-to-many source, source_type

Index Strategy Reference

Scenario Index Configuration Rationale
Standard foreign key Single column index Join optimization
Composite foreign key Multi-column index Match full key
Polymorphic association [type, id] compound index Filter by type first
Optional relationship Partial index WHERE NOT NULL Exclude NULL values
Frequent paired queries Covering index with columns Eliminate lookups
High cardinality key B-tree index Standard access pattern
Self-referential Index on parent_id Hierarchy traversal

Constraint Definition Syntax

# Foreign key with cascade options
add_foreign_key :child_table, :parent_table,
  column: :parent_id,
  on_delete: :cascade,
  on_update: :cascade

# Composite foreign key
add_foreign_key :table, :parent,
  column: [:col1, :col2],
  primary_key: [:id1, :id2]

# Foreign key with custom name
add_foreign_key :orders, :customers,
  column: :buyer_id,
  name: :fk_orders_buyers

# Polymorphic (no database FK)
t.references :taggable, polymorphic: true, index: true

Performance Optimization Checklist

Optimization Implementation Impact
Index all foreign keys add_index on FK columns Join performance
Use appropriate key size bigint vs uuid decision Storage and cache efficiency
Limit cascade depth Restrict cascade levels Transaction duration
Batch large operations Process in chunks Lock contention reduction
Cover common queries Compound indexes Query optimization
Defer constraints Deferrable option Bulk load performance
Soft delete when appropriate deleted_at pattern Preserve referential integrity