Overview
Primary and foreign keys form the foundation of relational database design by establishing identity and relationships between tables. A primary key uniquely identifies each record in a table, while a foreign key references the primary key of another table, creating relationships that maintain data integrity across the database.
The concept originated with E.F. Codd's relational model in 1970, which established mathematical principles for organizing data in tables with explicit relationships. Primary keys enforce entity integrity by guaranteeing each record can be uniquely identified. Foreign keys enforce referential integrity by ensuring relationships reference valid records.
These constraints prevent orphaned records, maintain consistency during updates and deletes, and enable efficient queries across related data. Modern database systems implement keys at the storage level, creating indexes and constraint checking mechanisms that validate operations before committing changes.
# Database schema with primary and foreign keys
create_table :authors do |t|
t.bigint :id, primary_key: true # Primary key
t.string :name
t.timestamps
end
create_table :books do |t|
t.bigint :id, primary_key: true # Primary key
t.bigint :author_id, null: false # Foreign key
t.string :title
t.timestamps
t.foreign_key :authors
end
# Books reference authors through author_id foreign key
Key Principles
Primary Key Characteristics
A primary key must contain unique values for every record in the table. No two rows can share the same primary key value. The key cannot contain NULL values, as NULL represents unknown or missing data and would violate the uniqueness constraint. Each table should have exactly one primary key, though that key may consist of multiple columns.
Primary keys serve as the canonical identifier for records. When other tables need to reference a specific record, they store the primary key value. The database uses the primary key for indexing, making record lookups efficient. Most database systems automatically create a unique index on primary key columns.
Foreign Key Characteristics
A foreign key in one table references the primary key of another table. The foreign key column must match the data type of the referenced primary key. Unlike primary keys, foreign keys can contain NULL values unless explicitly defined as NOT NULL, representing an optional relationship.
Multiple records in the child table can reference the same parent record, establishing one-to-many relationships. The database validates foreign key values during INSERT and UPDATE operations, rejecting any value that does not exist in the referenced table. This validation ensures referential integrity.
Referential Integrity
Referential integrity guarantees that relationships between tables remain valid. When a record in the parent table is updated or deleted, the database must handle dependent records in child tables. Four main strategies exist:
CASCADE propagates changes to dependent records. Deleting a parent record automatically deletes all child records referencing it. Updating a parent's primary key updates all corresponding foreign keys.
RESTRICT prevents operations that would orphan child records. Attempting to delete or update a parent record with existing children raises an error. The application must remove dependencies before modifying the parent.
SET NULL replaces foreign key values with NULL when the parent is deleted or updated. This maintains child records while breaking the relationship. The foreign key column must allow NULL values for this strategy.
SET DEFAULT assigns a default value to foreign keys when parents are removed. The default typically references a special "unknown" or "archived" parent record. This preserves children while redirecting the relationship.
Key Types
Simple keys consist of a single column. Most tables use auto-incrementing integers or UUIDs as simple primary keys. The column value directly identifies the record with no ambiguity.
Composite keys combine multiple columns to form the primary key. Order management systems often use composite keys like (order_id, line_number) where neither column alone is unique. Both columns together identify a specific order line item.
Natural keys derive from the entity's inherent properties. A person's social security number or a product's ISBN represent natural keys with meaning in the problem domain. Natural keys can change over time, creating maintenance challenges when values need updating across foreign key references.
Surrogate keys are artificial identifiers created solely for database purposes. Auto-incrementing integers and UUIDs serve as surrogate keys with no business meaning. Surrogate keys remain stable even when natural properties change, simplifying updates and improving performance.
Relationship Cardinality
One-to-one relationships connect exactly one record in each table. Implementation places a unique constraint on the foreign key column, preventing multiple children from referencing the same parent. User profiles often use one-to-one relationships, where each user has exactly one profile.
One-to-many relationships allow multiple child records per parent. Foreign keys in the child table reference the parent's primary key without uniqueness constraints. Authors and books demonstrate one-to-many relationships, where one author can write many books.
Many-to-many relationships require a junction table containing two foreign keys. Each foreign key references one of the connected tables, and the combination forms the junction table's primary key. Students and courses use many-to-many relationships, where students enroll in multiple courses and courses contain multiple students.
Ruby Implementation
Migration Definition
ActiveRecord migrations define primary and foreign keys using the table definition DSL. The framework automatically creates an auto-incrementing integer primary key named id unless specified otherwise. Explicit primary key definitions override this default behavior.
# Standard migration with auto-generated primary key
class CreateOrders < ActiveRecord::Migration[7.0]
def change
create_table :orders do |t|
t.string :order_number
t.decimal :total_amount
t.timestamps
end
# Creates: id bigint PRIMARY KEY auto-increment
end
end
Foreign key columns require explicit definition with the appropriate data type matching the referenced primary key. The foreign_key method establishes the constraint at the database level, configuring validation and cascade behavior.
# Migration with explicit foreign key constraint
class CreateOrderItems < ActiveRecord::Migration[7.0]
def change
create_table :order_items do |t|
t.references :order, null: false, foreign_key: true
t.references :product, null: false, foreign_key: true
t.integer :quantity
t.decimal :price
t.timestamps
end
end
end
# Creates: order_id and product_id bigint columns with foreign key constraints
# Indexes automatically added to foreign key columns
The references method combines column creation with foreign key constraints and index generation. It creates a column with the _id suffix, adds an index, and optionally establishes the foreign key constraint. Setting foreign_key: true creates the constraint with default CASCADE behavior.
Composite Primary Keys
Tables requiring composite primary keys specify multiple columns in the primary_key option. ActiveRecord 7.1 and later provide native support for composite primary keys through array syntax.
# Composite primary key definition
class CreateOrderLineItems < ActiveRecord::Migration[7.1]
def change
create_table :order_line_items, primary_key: [:order_id, :line_number] do |t|
t.bigint :order_id
t.integer :line_number
t.references :product, foreign_key: true
t.decimal :price
t.timestamps
end
add_foreign_key :order_line_items, :orders, column: :order_id
end
end
UUID Primary Keys
UUID primary keys provide globally unique identifiers without coordination between database instances. They enable distributed systems to generate keys independently without collision risk.
# UUID primary key configuration
class CreateDocuments < ActiveRecord::Migration[7.0]
def change
enable_extension 'pgcrypto' unless extension_enabled?('pgcrypto')
create_table :documents, id: :uuid do |t|
t.string :title
t.text :content
t.timestamps
end
create_table :document_versions, id: :uuid do |t|
t.uuid :document_id, null: false
t.integer :version_number
t.text :content
t.timestamps
end
add_foreign_key :document_versions, :documents
add_index :document_versions, :document_id
end
end
Association Declarations
Model associations define relationships through primary and foreign key mappings. ActiveRecord infers foreign key column names from association names, following the pattern association_name_id.
# Model associations reflecting database relationships
class Author < ApplicationRecord
has_many :books, dependent: :destroy
has_one :author_profile, dependent: :destroy
end
class Book < ApplicationRecord
belongs_to :author
belongs_to :publisher, optional: true
has_many :reviews, dependent: :destroy
end
class AuthorProfile < ApplicationRecord
belongs_to :author
end
class Review < ApplicationRecord
belongs_to :book
belongs_to :reviewer, class_name: 'User'
end
The belongs_to association expects a foreign key column in the declaring model. The has_many association looks for a foreign key in the associated model's table. The dependent option controls cascade behavior when destroying parent records.
Polymorphic Associations
Polymorphic associations allow a model to reference multiple parent types through a single foreign key. The implementation stores both the foreign key ID and the parent's class name.
# Polymorphic association migration
class CreateComments < ActiveRecord::Migration[7.0]
def change
create_table :comments do |t|
t.references :commentable, polymorphic: true, null: false
t.text :body
t.timestamps
end
add_index :comments, [:commentable_type, :commentable_id]
end
end
# Models using polymorphic association
class Comment < ApplicationRecord
belongs_to :commentable, polymorphic: true
end
class Article < ApplicationRecord
has_many :comments, as: :commentable, dependent: :destroy
end
class Video < ApplicationRecord
has_many :comments, as: :commentable, dependent: :destroy
end
Custom Foreign Keys
Non-standard foreign key names require explicit specification in both migrations and model associations. This supports legacy schemas or domain-specific naming conventions.
# Custom foreign key configuration
class CreateEmployees < ActiveRecord::Migration[7.0]
def change
create_table :employees do |t|
t.string :name
t.bigint :supervisor_id
t.bigint :dept_id
t.timestamps
end
add_foreign_key :employees, :employees, column: :supervisor_id
add_foreign_key :employees, :departments, column: :dept_id
add_index :employees, :supervisor_id
add_index :employees, :dept_id
end
end
class Employee < ApplicationRecord
belongs_to :supervisor, class_name: 'Employee', foreign_key: 'supervisor_id', optional: true
belongs_to :department, foreign_key: 'dept_id'
has_many :subordinates, class_name: 'Employee', foreign_key: 'supervisor_id'
end
Design Considerations
Natural Versus Surrogate Keys
Natural keys use domain-specific attributes as primary keys. Email addresses, social security numbers, or product SKUs serve as natural keys when they guarantee uniqueness. Natural keys provide immediate meaning and eliminate the need for joins when querying by the key value.
Natural keys create maintenance challenges when values change. Email address changes require updating all foreign key references across the database. Some natural candidates like email addresses may need to be reused, violating the immutability principle. Privacy regulations sometimes mandate removing or anonymizing identifiable information, complicating natural key management.
Surrogate keys use generated identifiers without business meaning. Auto-incrementing integers and UUIDs serve as stable surrogate keys that never change regardless of entity attribute updates. Foreign key references remain valid even when business attributes like email or username change.
Surrogate keys add an extra column and require joins when filtering by natural attributes. Queries must join through the surrogate key to access related data, potentially impacting performance compared to natural key queries. The choice depends on update frequency, query patterns, and stability of natural candidates.
Integer Versus UUID Keys
Integer primary keys use sequential auto-incrementing values generated by the database. They consume minimal storage space (8 bytes for bigint), provide excellent index performance, and sort chronologically. The sequential nature reveals creation order and record count information.
Sequential integers create challenges in distributed systems. Multiple database instances cannot independently generate unique keys without coordination. Merging data from separate databases risks key collisions. The sequential pattern may expose business metrics or enable ID enumeration attacks.
UUID keys generate globally unique identifiers without coordination. Each instance can create UUIDs independently with negligible collision probability. They hide record count information and prevent ID enumeration. UUIDs support distributed architectures and data migration scenarios.
UUIDs consume more storage (16 bytes) and create larger indexes. Random UUIDs reduce index efficiency compared to sequential values, as inserts scatter across the index tree rather than appending to the end. Database systems optimized for sequential keys may experience performance degradation with UUIDs. UUID version 7 addresses some concerns by incorporating timestamp ordering.
Composite Key Trade-offs
Composite keys naturally represent entities requiring multiple attributes for identification. Order line items identified by (order_id, line_number) or seat assignments identified by (flight_id, seat_number) reflect inherent domain structure. Composite keys enforce uniqueness across the combined columns without additional constraints.
Composite keys complicate foreign key relationships. Child tables referencing composite keys must include all constituent columns in the foreign key definition. This increases storage requirements and index size. Queries joining through composite keys require matching multiple columns, adding complexity to WHERE clauses.
Many-to-many junction tables naturally use composite keys formed from both foreign keys. The combination (student_id, course_id) identifies enrollments uniquely. Some developers add a surrogate ID to junction tables for simplicity, though this duplicates the uniqueness constraint.
Nullable Foreign Keys
Nullable foreign keys represent optional relationships. A book's publisher_id may be NULL during draft stages before publisher assignment. The NULL value indicates the relationship does not yet exist rather than referencing an invalid record.
Required relationships use NOT NULL constraints on foreign keys. Order items must reference valid products, so product_id should prohibit NULL values. The database rejects any INSERT or UPDATE attempting to create orphaned records.
Queries on nullable foreign keys require careful NULL handling. LEFT JOINS include records with NULL foreign keys, while INNER JOINS exclude them. Application logic must distinguish between "relationship does not exist" and "related record not found" scenarios.
Some designs avoid nullable foreign keys by creating default "unknown" parent records. A placeholder publisher with id=0 allows all books to reference a valid publisher_id while semantically representing "no publisher assigned". This approach simplifies queries but creates special-case records requiring documentation.
Cascade Strategy Selection
CASCADE deletion removes child records when parents are deleted. This maintains referential integrity automatically for composition relationships where children have no independent existence. Deleting an order should cascade to order items, as line items cannot exist without the parent order.
RESTRICT deletion prevents removing parents with existing children. Blog posts should use RESTRICT for author relationships, as deleting an author should not remove all their posts. The application must explicitly handle the dependency, perhaps by reassigning posts or soft-deleting the author.
SET NULL disconnects children when parents are deleted. Product categories might use SET NULL, allowing products to persist without category assignment after category deletion. This requires nullable foreign keys and application logic to handle uncategorized items.
The cascade strategy affects data lifecycle management. Systems requiring audit trails typically use RESTRICT or soft deletes rather than CASCADE. Cascade operations can create performance issues when deleting parents with thousands of children, as each deletion triggers constraint checks.
Common Patterns
Standard Relationship Patterns
One-to-many relationships dominate database design. The parent table contains the primary key, and the child table stores the foreign key with an index for efficient lookups. Authors and books, customers and orders, and categories and products all follow this pattern.
# Standard one-to-many relationship
class CreateBlogPosts < ActiveRecord::Migration[7.0]
def change
create_table :users do |t|
t.string :username
t.timestamps
end
create_table :blog_posts do |t|
t.references :user, null: false, foreign_key: true
t.string :title
t.text :body
t.timestamps
end
end
end
One-to-one relationships extend parent records with additional attributes. User profiles, authentication credentials, and configuration settings often use one-to-one relationships. The child table's foreign key includes a unique constraint preventing multiple children per parent.
# One-to-one relationship with unique constraint
class CreateUserProfiles < ActiveRecord::Migration[7.0]
def change
create_table :user_profiles do |t|
t.references :user, null: false, foreign_key: true, index: { unique: true }
t.string :bio
t.string :avatar_url
t.timestamps
end
end
end
Junction Table Pattern
Many-to-many relationships require a junction table containing foreign keys to both related tables. The composite primary key formed from both foreign keys ensures each relationship pair appears only once.
# Many-to-many junction table
class CreateEnrollments < ActiveRecord::Migration[7.0]
def change
create_table :students do |t|
t.string :name
t.timestamps
end
create_table :courses do |t|
t.string :title
t.timestamps
end
create_table :enrollments, primary_key: [:student_id, :course_id] do |t|
t.bigint :student_id, null: false
t.bigint :course_id, null: false
t.date :enrolled_at
t.string :grade
end
add_foreign_key :enrollments, :students
add_foreign_key :enrollments, :courses
add_index :enrollments, :student_id
add_index :enrollments, :course_id
end
end
class Student < ApplicationRecord
has_many :enrollments
has_many :courses, through: :enrollments
end
class Course < ApplicationRecord
has_many :enrollments
has_many :students, through: :enrollments
end
class Enrollment < ApplicationRecord
belongs_to :student
belongs_to :course
end
Junction tables storing additional relationship attributes may use surrogate primary keys instead of composite keys. Enrollment date, grade, or status values justify modeling the junction as a first-class entity with its own ID.
Self-Referential Pattern
Self-referential relationships connect records within the same table. Organizational hierarchies, comment threads, and category trees use self-referential foreign keys. The foreign key references the same table's primary key, typically with a nullable constraint to represent root nodes.
# Self-referential relationship
class CreateCategories < ActiveRecord::Migration[7.0]
def change
create_table :categories do |t|
t.string :name
t.references :parent, foreign_key: { to_table: :categories }
t.timestamps
end
end
end
class Category < ApplicationRecord
belongs_to :parent, class_name: 'Category', optional: true
has_many :subcategories, class_name: 'Category', foreign_key: 'parent_id'
end
Polymorphic Pattern
Polymorphic associations allow a single foreign key to reference multiple parent tables. Comments, attachments, and tags often use polymorphic associations to avoid duplicating similar structures across multiple parent types.
The implementation stores two columns: a foreign key ID and a type discriminator string. Queries filter by both columns when retrieving records. Database-level foreign key constraints cannot be applied to polymorphic associations, as the referenced table varies by record.
# Polymorphic tagging system
class CreateTags < ActiveRecord::Migration[7.0]
def change
create_table :tags do |t|
t.string :name
t.timestamps
end
create_table :taggings do |t|
t.references :tag, null: false, foreign_key: true
t.references :taggable, polymorphic: true, null: false
t.timestamps
end
add_index :taggings, [:taggable_type, :taggable_id, :tag_id], unique: true
end
end
class Tag < ApplicationRecord
has_many :taggings
end
class Tagging < ApplicationRecord
belongs_to :tag
belongs_to :taggable, polymorphic: true
end
class Article < ApplicationRecord
has_many :taggings, as: :taggable
has_many :tags, through: :taggings
end
Soft Delete Pattern
Soft deletion marks records as deleted without removing them from the database. A deleted_at timestamp column indicates deletion status. Foreign key relationships remain intact, preserving historical data and referential integrity.
Queries must filter deleted records explicitly. Scopes and default scopes automate this filtering, though default scopes can create confusion when accessing deleted records is necessary.
# Soft delete implementation
class AddDeletedAtToRecords < ActiveRecord::Migration[7.0]
def change
add_column :users, :deleted_at, :datetime
add_index :users, :deleted_at
add_column :posts, :deleted_at, :datetime
add_index :posts, :deleted_at
end
end
class User < ApplicationRecord
has_many :posts
scope :active, -> { where(deleted_at: nil) }
def soft_delete
update(deleted_at: Time.current)
end
end
Performance Considerations
Foreign Key Indexing
Foreign key columns require indexes for optimal query performance. Without indexes, queries joining tables or filtering by foreign key values perform full table scans. The database must examine every row to find matching foreign key values.
# Explicit index creation on foreign keys
class CreateOrderLineItems < ActiveRecord::Migration[7.0]
def change
create_table :orders do |t|
t.timestamps
end
create_table :order_line_items do |t|
t.bigint :order_id, null: false
t.integer :quantity
t.timestamps
end
add_foreign_key :order_line_items, :orders
add_index :order_line_items, :order_id # Critical for join performance
end
end
The references method automatically creates indexes on foreign key columns. Manual foreign key definitions require explicit index creation. Composite foreign keys need indexes on the full column combination, though partial indexes on individual columns may also benefit specific query patterns.
Primary Key Size Impact
Primary key size affects index storage and join performance. Every foreign key reference duplicates the primary key value. A 16-byte UUID consumes twice the space of an 8-byte bigint. Tables with millions of foreign key references amplify this difference.
Index structures in child tables store foreign key values. Larger keys create larger indexes, reducing cache effectiveness and increasing I/O requirements. Each index node holds fewer entries, increasing tree depth and requiring more seeks.
# Comparing primary key storage impact
# Integer keys: 8 bytes per reference
create_table :products, id: :bigint
# UUID keys: 16 bytes per reference
create_table :products, id: :uuid
# In a table with 10 million order_line_items:
# Integer foreign keys: 80 MB
# UUID foreign keys: 160 MB
# Plus index overhead
Query Optimization Strategies
Queries joining multiple tables through foreign keys benefit from proper index selection. The database optimizer chooses indexes based on selectivity and access patterns. Covering indexes including foreign key columns and frequently accessed attributes eliminate additional lookups.
# Query requiring multiple joins
# SELECT orders.*, customers.name, products.title
# FROM orders
# JOIN customers ON customers.id = orders.customer_id
# JOIN order_items ON order_items.order_id = orders.id
# JOIN products ON products.id = order_items.product_id
# WHERE orders.status = 'pending'
# Optimal indexes:
add_index :orders, [:status, :customer_id]
add_index :order_items, :order_id
add_index :order_items, :product_id
Partial indexes on foreign keys combined with common predicates reduce index size while maintaining performance. An index on (user_id WHERE deleted_at IS NULL) supports queries on active records without indexing deleted data.
Cascade Operation Performance
CASCADE delete operations can create performance bottlenecks when parents have thousands of children. The database must identify all dependent records, verify constraints on their dependents, and delete each record. This work occurs within a single transaction, potentially locking tables for extended periods.
Applications handling cascade deletes in code gain more control over batch sizes and timing. Processing deletions in smaller batches with explicit commits prevents long-running transactions. Background jobs can handle deletions asynchronously, keeping user-facing operations responsive.
# Handling cascading deletes efficiently
class Order < ApplicationRecord
has_many :order_items, dependent: false
def destroy_with_items
transaction do
# Delete in batches to avoid long locks
order_items.in_batches(of: 1000) do |batch|
batch.delete_all
end
destroy
end
end
end
Referential Integrity Checking Cost
Foreign key constraints impose validation overhead on INSERT, UPDATE, and DELETE operations. Each modification triggers constraint verification queries. High-throughput systems sometimes disable foreign key constraints during bulk operations, re-enabling them afterward.
Deferrable constraints postpone validation until transaction commit rather than checking immediately. This improves performance when loading related data, as all records can be inserted before verification. PostgreSQL supports deferrable constraints through the DEFERRABLE INITIALLY DEFERRED option.
Reference
Primary Key Types
| Type | Storage | Characteristics | Use Cases |
|---|---|---|---|
| Bigint | 8 bytes | Sequential, auto-increment | Standard tables, high insert rate |
| UUID | 16 bytes | Globally unique, random or time-ordered | Distributed systems, merging databases |
| Composite | Varies | Multiple columns combine for uniqueness | Junction tables, line items |
| Natural | Varies | Domain attribute, may change | Legacy systems, external identifiers |
Foreign Key Cascade Options
| Option | Delete Behavior | Update Behavior | Characteristics |
|---|---|---|---|
| CASCADE | Delete dependent records | Update foreign keys | Composition relationships |
| RESTRICT | Block delete if dependents exist | Block update if dependents exist | Aggregation relationships |
| SET NULL | Set foreign keys to NULL | Set foreign keys to NULL | Optional relationships |
| SET DEFAULT | Set foreign keys to default value | Set foreign keys to default value | Fallback parent exists |
| NO ACTION | Block unless constraints satisfied | Block unless constraints satisfied | Deferred constraint checking |
ActiveRecord Migration Reference
| Method | Purpose | Options |
|---|---|---|
| create_table | Define table with primary key | id, primary_key, force |
| references | Add foreign key column with index | null, foreign_key, polymorphic |
| add_foreign_key | Create foreign key constraint | column, on_delete, on_update |
| add_index | Create index on columns | unique, where, name |
| add_reference | Add foreign key to existing table | foreign_key, polymorphic, index |
ActiveRecord Association Reference
| Association | Foreign Key Location | Cardinality | Options |
|---|---|---|---|
| belongs_to | Declaring model | One | optional, foreign_key, class_name |
| has_one | Associated model | One | dependent, foreign_key, through |
| has_many | Associated model | Many | dependent, foreign_key, through |
| has_and_belongs_to_many | Junction table | Many-to-many | join_table, foreign_key |
| has_many :through | Through model | Many-to-many | source, source_type |
Index Strategy Reference
| Scenario | Index Configuration | Rationale |
|---|---|---|
| Standard foreign key | Single column index | Join optimization |
| Composite foreign key | Multi-column index | Match full key |
| Polymorphic association | [type, id] compound index | Filter by type first |
| Optional relationship | Partial index WHERE NOT NULL | Exclude NULL values |
| Frequent paired queries | Covering index with columns | Eliminate lookups |
| High cardinality key | B-tree index | Standard access pattern |
| Self-referential | Index on parent_id | Hierarchy traversal |
Constraint Definition Syntax
# Foreign key with cascade options
add_foreign_key :child_table, :parent_table,
column: :parent_id,
on_delete: :cascade,
on_update: :cascade
# Composite foreign key
add_foreign_key :table, :parent,
column: [:col1, :col2],
primary_key: [:id1, :id2]
# Foreign key with custom name
add_foreign_key :orders, :customers,
column: :buyer_id,
name: :fk_orders_buyers
# Polymorphic (no database FK)
t.references :taggable, polymorphic: true, index: true
Performance Optimization Checklist
| Optimization | Implementation | Impact |
|---|---|---|
| Index all foreign keys | add_index on FK columns | Join performance |
| Use appropriate key size | bigint vs uuid decision | Storage and cache efficiency |
| Limit cascade depth | Restrict cascade levels | Transaction duration |
| Batch large operations | Process in chunks | Lock contention reduction |
| Cover common queries | Compound indexes | Query optimization |
| Defer constraints | Deferrable option | Bulk load performance |
| Soft delete when appropriate | deleted_at pattern | Preserve referential integrity |