Overview
Physical data models represent the concrete implementation of database schemas within specific database management systems. Where logical data models define what data exists and how entities relate conceptually, physical data models specify how data gets stored on disk, accessed by queries, and optimized for performance.
A physical data model includes storage-level details: table structures with exact data types, index definitions, partitioning schemes, constraint implementations, and storage parameters. These decisions directly affect query performance, storage costs, and system scalability. The same logical model can map to radically different physical implementations depending on workload characteristics, database engine capabilities, and performance requirements.
Physical modeling operates within the constraints of specific database systems. PostgreSQL, MySQL, Oracle, and other databases offer different storage engines, index types, and optimization features. Physical models must account for these system-specific capabilities while maintaining the integrity of the underlying logical design.
# Logical concept: Users have many posts
class User < ApplicationRecord
  has_many :posts
end

# Physical implementation in migration
create_table :users do |t|
  t.string :email, null: false, limit: 255
  t.string :encrypted_password, limit: 128
  t.timestamps precision: 6
end

add_index :users, :email, unique: true,
  using: :btree, algorithm: :concurrently
The transformation from logical to physical modeling requires balancing multiple concerns: query performance, storage efficiency, write throughput, data integrity enforcement, and operational maintainability. Each physical design choice creates trade-offs that affect system behavior under different load patterns.
Key Principles
Physical data models translate abstract logical structures into concrete storage implementations through several core mechanisms. Each database table corresponds to an actual file or set of files on disk, organized according to the storage engine's rules. The physical model specifies not just what columns exist, but their precise data types, storage sizes, null handling, and default values.
Data type mapping converts logical types into database-specific physical representations. An email address might logically be a string, but physically could be VARCHAR(255), TEXT, or CITEXT depending on validation requirements, index considerations, and search needs. Numeric types require decisions about precision, scale, and storage size that affect both accuracy and performance.
# Different physical representations of the same logical concept
create_table :products do |t|
  # Option 1: Fixed precision for currency
  t.decimal :price, precision: 10, scale: 2

  # Option 2: Integer cents for exact arithmetic
  t.integer :price_cents, limit: 8

  # Option 3: Float for scientific calculations (not recommended for currency)
  t.float :price
end
Index structures determine query performance characteristics. Physical models define which columns get indexed, what index types to use, and how indexes get maintained. B-tree indexes support range queries and sorting, hash indexes optimize exact-match lookups, GiST indexes handle geometric data, and GIN indexes accelerate full-text search. Index choices affect both read and write performance.
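As a rough sketch, these index types map onto the migration DSL as follows; the tables and columns here are hypothetical, and the hash and GIN variants assume PostgreSQL:

```ruby
# Hypothetical examples of index type selection (PostgreSQL)

# B-tree (the default): supports equality, range queries, and ORDER BY
add_index :events, :created_at

# Hash: exact-match lookups only, no range scans or sorting
add_index :sessions, :token, using: :hash

# GIN: containment queries on JSONB, arrays, and tsvector columns
add_index :documents, :metadata, using: :gin
```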
Partitioning schemes divide large tables into smaller physical segments based on column values or ranges. Range partitioning splits data by date ranges, list partitioning groups by discrete values, and hash partitioning distributes rows evenly. Partitioning decisions affect query planning, maintenance operations, and data lifecycle management.
# Declarative partitioning in PostgreSQL via Ruby
execute <<-SQL
  CREATE TABLE events (
    id BIGSERIAL,
    user_id BIGINT NOT NULL,
    event_type VARCHAR(50) NOT NULL,
    created_at TIMESTAMP NOT NULL,
    data JSONB
  ) PARTITION BY RANGE (created_at);
SQL

execute <<-SQL
  CREATE TABLE events_2024_q1 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
SQL

execute <<-SQL
  CREATE TABLE events_2024_q2 PARTITION OF events
    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
SQL
Storage parameters control how the database physically stores and manages data. These include fill factors that determine how densely data packs into pages, autovacuum settings that manage dead tuple cleanup, and tablespace assignments that control which physical disks hold data. Storage parameters significantly affect performance for write-heavy workloads.
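In PostgreSQL, several of these parameters can be set per table from a migration. A sketch with illustrative values (the events table is hypothetical, and the numbers are examples rather than recommendations):

```ruby
# Hypothetical per-table storage parameters (PostgreSQL)
execute <<-SQL
  ALTER TABLE events SET (
    fillfactor = 90,                        -- leave 10% of each page free for updates
    autovacuum_vacuum_scale_factor = 0.05,  -- vacuum once 5% of rows are dead
    autovacuum_vacuum_threshold = 1000      -- plus a fixed dead-row threshold
  );
SQL
```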
Constraint enforcement at the physical level ensures data integrity through database mechanisms rather than application logic. Primary keys guarantee uniqueness and not-null values (and in clustered storage engines such as InnoDB also determine physical row order), foreign keys enforce referential integrity, check constraints validate data ranges, and exclusion constraints prevent conflicting data. Physical constraint implementation affects both data quality and write performance.
Physical models must account for access patterns observed in production systems. Columns frequently used in WHERE clauses need indexes, columns in JOIN conditions require optimization, and columns in ORDER BY clauses benefit from specific index types. The physical design reflects how applications actually query data, not just theoretical data relationships.
Denormalization strategies trade storage space and update complexity for query performance. Materialized aggregates eliminate expensive calculations, redundant columns avoid joins, and precomputed values speed up common queries. Physical models often intentionally violate normalization rules when query patterns justify the trade-off.
Ruby Implementation
Ruby applications interact with physical data models primarily through ActiveRecord migrations and schema definitions. Migrations define incremental changes to database schemas, creating a version-controlled history of physical model evolution. Each migration translates logical requirements into specific database commands.
class CreateOrdersWithOptimizations < ActiveRecord::Migration[7.0]
  def change
    create_table :orders, id: :bigint do |t|
      t.references :user, null: false, foreign_key: true
      t.references :shipping_address, foreign_key: { to_table: :addresses }
      t.string :order_number, null: false, limit: 32
      t.integer :status, null: false, default: 0, limit: 2
      t.decimal :total_amount, precision: 12, scale: 2, null: false
      t.decimal :tax_amount, precision: 10, scale: 2
      t.datetime :placed_at, precision: 6
      t.datetime :fulfilled_at, precision: 6
      t.timestamps precision: 6

      t.index :order_number, unique: true
      t.index [:user_id, :created_at], order: { created_at: :desc }
      t.index :status, where: "status IN (0, 1)",
        name: 'index_orders_on_pending_statuses'
    end

    # Add constraint enforcement at the database level
    add_check_constraint :orders,
      "total_amount >= 0",
      name: 'orders_total_amount_positive'
  end
end
ActiveRecord's migration DSL abstracts database-specific syntax while exposing physical model features. The create_table method accepts options for storage engines, tablespaces, and table options. Column definitions specify exact types, sizes, and constraints. Index definitions control index types, sort orders, and partial index conditions.
Schema caching maintains a representation of the physical model in memory. ActiveRecord loads schema information at startup, caching column types, default values, and constraint information. This cache affects how ActiveRecord instantiates model objects and validates data before database operations.
# Schema information stored in db/schema.rb
ActiveRecord::Schema[7.0].define(version: 2024_03_15_120000) do
  enable_extension "pgcrypto"
  enable_extension "btree_gin"

  create_table "users", force: :cascade do |t|
    t.string "email", limit: 255, null: false
    t.string "encrypted_password", limit: 128
    t.integer "sign_in_count", default: 0, null: false
    t.datetime "current_sign_in_at", precision: 6
    t.datetime "created_at", precision: 6, null: false
    t.datetime "updated_at", precision: 6, null: false
    t.index ["email"], name: "index_users_on_email", unique: true
  end
end
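At runtime the cached schema is visible through ActiveRecord's introspection methods. A sketch, assuming a connected application with the users table above:

```ruby
# Reading the cached physical schema (requires a live database connection)
User.columns_hash["email"].type        # :string, per the schema above
User.columns_hash["email"].limit       # 255
User.column_defaults["sign_in_count"]  # 0

# After an out-of-band schema change, the cache must be refreshed
User.reset_column_information
```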
Database-specific adapters translate ActiveRecord operations into database-native commands. PostgreSQL adapters support array columns, JSON types, and advanced index types. MySQL adapters handle storage engine selection and charset specifications. Adapter capabilities directly affect what physical features migrations can express.
# PostgreSQL-specific physical features
create_table :documents do |t|
  t.string :title
  t.text :body
  t.tsvector :search_vector
  t.string :tags, array: true, default: []
  t.jsonb :metadata, default: {}
  t.timestamps

  # PostgreSQL GIN index for full-text search
  t.index :search_vector, using: :gin
  # PostgreSQL GIN index for JSONB queries
  t.index :metadata, using: :gin
  # PostgreSQL array index
  t.index :tags, using: :gin
end

# Trigger to maintain search vector
execute <<-SQL
  CREATE TRIGGER documents_search_vector_update
    BEFORE INSERT OR UPDATE ON documents
    FOR EACH ROW EXECUTE FUNCTION
      tsvector_update_trigger(search_vector, 'pg_catalog.english', title, body);
SQL
Connection pool configuration affects physical resource usage. Pool size determines maximum concurrent database connections, checkout timeouts control connection wait behavior, and reaping settings manage stale connections. These parameters directly affect how applications interact with physical database instances.
# config/database.yml physical connection settings
production:
  adapter: postgresql
  pool: <%= ENV.fetch("RAILS_MAX_THREADS", 5) %>
  timeout: 5000
  checkout_timeout: 5
  reaping_frequency: 10
  statement_limit: 1000
  variables:
    statement_timeout: 5000
    lock_timeout: 3000
Physical model changes require migration strategy considerations. Online migrations use techniques like algorithm concurrency to avoid blocking production writes. Adding indexes with algorithm: :concurrently allows queries to continue during index builds. Column additions with non-null defaults on large tables require careful approaches to avoid table locks.
class AddColumnsWithoutLocking < ActiveRecord::Migration[7.0]
  disable_ddl_transaction!

  def up
    # Add the column without a default first (avoids a table rewrite)
    add_column :users, :account_type, :integer

    # Backfill in batches to avoid long-held locks
    User.in_batches(of: 1000).update_all(account_type: 0)

    # Add the NOT NULL constraint separately
    change_column_null :users, :account_type, false

    # Build the index without blocking writes
    add_index :users, :account_type, algorithm: :concurrently
  end

  def down
    remove_column :users, :account_type
  end
end
Model classes interact with physical schemas through attribute methods. ActiveRecord generates methods based on physical column names, maps Ruby types to database types, and enforces constraints through validations that mirror physical constraints. Model behavior depends directly on underlying physical structure.
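For example, a model for the orders table defined earlier might mirror its physical constraints with validations; this pairing is a common convention, not something ActiveRecord generates automatically:

```ruby
# Hypothetical model mirroring the orders table's physical constraints
class Order < ApplicationRecord
  belongs_to :user # mirrors the NOT NULL foreign key column

  # Validations that mirror database constraints, for friendlier error messages
  validates :order_number, presence: true, uniqueness: true,
                           length: { maximum: 32 } # mirrors VARCHAR(32) + unique index
  validates :total_amount, presence: true,
                           numericality: { greater_than_or_equal_to: 0 } # mirrors CHECK
end
```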
Design Considerations
Physical data model design requires evaluating multiple dimensions of database behavior. The optimal physical design for a high-write, time-series workload differs fundamentally from an analytical reporting database or a transactional e-commerce system. Each design decision creates trade-offs between competing concerns.
Normalization versus denormalization forms the central design tension. Normalized physical models minimize data redundancy, simplify updates, and reduce storage space. Queries spanning multiple normalized tables require joins that cost CPU cycles and I/O operations. Denormalized physical models duplicate data strategically to eliminate joins, trading storage space and update complexity for query speed.
# Normalized physical model
create_table :order_items do |t|
  t.references :order, null: false, foreign_key: true
  t.references :product, null: false, foreign_key: true
  t.integer :quantity, null: false
  t.decimal :unit_price, precision: 10, scale: 2
  # Total must be calculated: quantity * unit_price
end

# Denormalized physical model with precomputed totals
create_table :order_items do |t|
  t.references :order, null: false, foreign_key: true
  t.references :product, null: false, foreign_key: true
  t.string :product_name, limit: 255 # Denormalized from products
  t.integer :quantity, null: false
  t.decimal :unit_price, precision: 10, scale: 2
  t.decimal :line_total, precision: 12, scale: 2 # Precomputed
end
Index selection balances query performance against write overhead and storage costs. Each index speeds up specific query patterns while slowing down inserts and updates. Composite indexes serve queries filtering on multiple columns but consume more disk space than single-column indexes. Covering indexes eliminate table lookups entirely but duplicate column data.
Partial indexes optimize specific query patterns by indexing only rows matching a WHERE condition. A partial index on active users reduces index size and maintenance overhead for applications that rarely query inactive users. Expression indexes support queries filtering on computed values or function results.
# Partial index for common query pattern
add_index :orders, :created_at,
  where: "status = 0", # index only pending orders
  name: 'index_orders_pending_by_date'

# Expression index for case-insensitive searches
add_index :users, "LOWER(email)",
  name: 'index_users_on_lower_email'

# Covering index to avoid table lookups (PostgreSQL INCLUDE, Rails 7.1+)
add_index :products, [:category_id, :price],
  include: [:name, :sku],
  name: 'index_products_for_catalog'
Data type precision affects both storage efficiency and application behavior. Choosing between VARCHAR(255) and TEXT impacts index size and sorting performance. Selecting DECIMAL versus INTEGER for currency determines accuracy guarantees. Date precision choices affect timestamp resolution and comparison behavior.
Integer size selection determines value ranges and storage space. A SMALLINT uses 2 bytes and supports values from -32768 to 32767. A BIGINT uses 8 bytes and handles values up to 9 quintillion. Undersized integer columns cause overflow errors, while oversized columns waste storage space.
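The practical difference between these choices shows up in plain Ruby arithmetic. The self-contained snippet below illustrates why integer cents avoid the rounding drift of binary floating point:

```ruby
# Floating-point currency arithmetic accumulates rounding error
float_total = 0.0
100.times { float_total += 0.01 }   # one hundred 1-cent additions
puts float_total == 1.0             # false: the sum drifts slightly away from 1.0

# Integer cents stay exact and fit comfortably within BIGINT range
cents_total = 0
100.times { cents_total += 1 }
puts cents_total == 100             # true
puts format("%.2f", cents_total / 100.0)  # "1.00"
```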
Constraint placement determines where the database enforces data integrity rules. Physical constraints in the database guarantee consistency across all application paths, but add overhead to write operations. Application-level validations offer flexibility and better error messages but cannot prevent constraint violations from other database clients.
# Database-enforced constraints
create_table :accounts do |t|
  t.decimal :balance, precision: 15, scale: 2, null: false
  t.datetime :created_at, null: false

  # Physical constraint prevents negative balances
  t.check_constraint "balance >= 0", name: 'accounts_positive_balance'
end

# Foreign key with cascade behavior
add_foreign_key :posts, :users, on_delete: :cascade

# Exclusion constraint (PostgreSQL; equality on room_id requires btree_gist)
execute <<-SQL
  ALTER TABLE bookings
    ADD CONSTRAINT bookings_no_overlap
    EXCLUDE USING GIST (room_id WITH =, tsrange(start_time, end_time) WITH &&);
SQL
Partitioning strategy selection depends on data lifecycle and query patterns. Time-based range partitioning suits event logs and time-series data with retention policies. List partitioning groups data by discrete categories like geographic regions. Hash partitioning distributes data evenly when no natural partitioning key exists.
Partition granularity affects query planning and maintenance operations. Daily partitions create thousands of table segments for historical data, while monthly or yearly partitions reduce management overhead at the cost of less precise data lifecycle control. Partition pruning eliminates irrelevant partitions from query execution but requires queries to filter on partition keys.
Storage engine selection determines transaction semantics, locking behavior, and performance characteristics. PostgreSQL offers a single MVCC-based engine with tunable parameters. MySQL provides InnoDB for transactions and MyISAM for read-heavy workloads without transaction overhead. Storage engine choices affect crash recovery, backup procedures, and replication behavior.
Implementation Approaches
Physical data model implementation requires coordinating schema creation, data migration, and application deployment. Different approaches balance risk, downtime requirements, and operational complexity.
Schema-first implementation creates the complete physical model before deploying application code. Database administrators review and optimize the schema, create indexes proactively, and validate constraints. This approach works well for new applications or major schema overhauls but requires accurate prediction of access patterns.
# Complete schema definition before deployment
class InitialSchema < ActiveRecord::Migration[7.0]
  def change
    create_table :users do |t|
      t.string :email, null: false, limit: 255
      t.string :username, limit: 50
      t.integer :role, null: false, default: 0
      t.timestamps precision: 6

      t.index :email, unique: true
      t.index :username, unique: true
      t.index [:role, :created_at]
    end

    create_table :posts do |t|
      t.references :user, null: false, foreign_key: true
      t.string :title, null: false, limit: 255
      t.text :body
      t.integer :status, null: false, default: 0
      t.timestamps precision: 6

      t.index [:user_id, :created_at]
      t.index :status
    end

    # Additional tables...
  end
end
Iterative implementation evolves physical models incrementally through small migrations. Each migration represents a single logical change: adding a column, creating an index, or modifying a constraint. This approach minimizes deployment risk and allows physical model evolution based on observed production patterns.
Iterative implementation requires coordination between schema changes and application code. Adding a required column needs a multi-step process: add column as nullable, backfill values in production, add not-null constraint, deploy code expecting the column. Removing columns follows the reverse pattern: deploy code ignoring the column, verify no usage, drop the column.
# Step 1: Add column as nullable
class AddUserTypeColumn < ActiveRecord::Migration[7.0]
  def change
    add_column :users, :user_type, :integer
  end
end

# Step 2: Backfill default values
class BackfillUserType < ActiveRecord::Migration[7.0]
  def up
    User.in_batches(of: 5000).update_all(user_type: 0)
  end
end

# Step 3: Add not null constraint
class RequireUserType < ActiveRecord::Migration[7.0]
  def change
    change_column_null :users, :user_type, false
  end
end
Zero-downtime migration strategies modify physical models without interrupting production traffic. Techniques include creating new columns before deprecating old ones, building indexes concurrently, and using database features like triggers to maintain dual-write compatibility during transitions.
Ghost table approaches create shadow copies of tables, apply migrations to shadows, and swap table names atomically. This technique works for schema changes that would otherwise lock tables for extended periods. Tools like pt-online-schema-change automate ghost table workflows.
# Dual-write pattern for column migration
class MigrateAddressToComponents < ActiveRecord::Migration[7.0]
  def change
    # Add new columns
    add_column :users, :street_address, :string
    add_column :users, :city, :string
    add_column :users, :postal_code, :string

    # Application code writes to both old and new columns.
    # After a validation period, drop the old column.
  end
end
Constraint-based implementation leverages database constraints to enforce business rules physically. Check constraints validate value ranges, foreign keys guarantee referential integrity, and unique constraints prevent duplicates. This approach centralizes validation logic in the database but requires careful constraint design to avoid overly restrictive rules.
Deferred constraints allow transactions to temporarily violate rules before commit. This technique supports complex multi-row updates that would otherwise violate constraints during intermediate states. Deferred constraint checking trades immediate validation for transactional consistency.
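PostgreSQL expresses this with DEFERRABLE constraint DDL. A hedged sketch using a hypothetical positions table, where rows are renumbered inside a single transaction and uniqueness is only checked at commit:

```ruby
# Hypothetical deferrable unique constraint (PostgreSQL)
execute <<-SQL
  ALTER TABLE positions
    ADD CONSTRAINT positions_list_position_unique
    UNIQUE (list_id, position)
    DEFERRABLE INITIALLY DEFERRED;
SQL
# Within a transaction, rows may temporarily share a position value;
# the constraint is validated once, at COMMIT.
```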
Staged rollout implementation deploys physical model changes progressively across database replicas or shards. Primary databases receive schema changes first, replicas follow after validation. This approach detects problems on secondary infrastructure before affecting critical primary databases.
Performance Considerations
Physical data model design directly determines database performance characteristics. Query execution plans, storage I/O patterns, and memory utilization all depend on physical schema decisions. Performance optimization requires understanding how database engines interact with physical structures.
Index performance depends on cardinality, selectivity, and index structure. High-cardinality columns (many distinct values) benefit more from indexes than low-cardinality columns. Gender columns with two values rarely justify dedicated indexes, while email columns with unique values per user almost always benefit from indexing.
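A back-of-the-envelope way to compare candidate columns is selectivity, the ratio of distinct values to total rows; the sample data below is made up:

```ruby
# Estimate index selectivity: distinct values / total rows
def selectivity(values)
  values.uniq.size.to_f / values.size
end

genders = ["m", "f"] * 5_000                     # 2 distinct values in 10,000 rows
emails  = (1..10_000).map { |i| "u#{i}@x.com" }  # every value distinct

puts selectivity(genders)  # 0.0002 - an index here rarely pays off
puts selectivity(emails)   # 1.0    - a (unique) index is highly effective
```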
Composite indexes serve queries filtering on multiple columns but only accelerate queries matching the index column order. An index on (user_id, created_at) speeds up queries filtering on user_id alone or both columns together, but not queries filtering only on created_at. Index column ordering significantly affects query optimizer decisions.
# Composite index with column order considerations
add_index :events, [:user_id, :event_type, :created_at],
  name: 'index_events_on_user_and_type_and_date'

# Queries that use this index efficiently:
#   WHERE user_id = 123
#   WHERE user_id = 123 AND event_type = 'login'
#   WHERE user_id = 123 AND event_type = 'login' AND created_at > '2024-01-01'

# Queries that cannot use this index:
#   WHERE event_type = 'login'        -- first column not specified
#   WHERE created_at > '2024-01-01'   -- first column not specified
Index maintenance overhead increases with table write volume. Each insert, update, or delete operation modifies all relevant indexes. Tables with ten indexes pay the cost of maintaining all ten on every write. Heavy write workloads benefit from selective indexing based on critical query patterns.
Partitioning impact varies by access pattern. Range partitioning accelerates queries filtering on partition keys by eliminating irrelevant partitions. Partition pruning reduces I/O operations and improves query planning speed. Queries not filtering on partition keys scan all partitions, potentially slower than querying unpartitioned tables.
Partition-wise joins enable parallel processing when joining partitioned tables on partition keys. Database engines can process each partition pair independently, utilizing multiple CPU cores. This technique dramatically accelerates large analytical joins.
Data type selection affects storage size and operation speed. Fixed-length types like INTEGER and TIMESTAMP enable faster comparisons and sorting than variable-length types like TEXT. Numeric types stored as integers perform arithmetic faster than floating-point types, critical for high-throughput financial calculations.
# Storage size and performance implications
create_table :transactions do |t|
  # Option 1: Decimal - exact precision, slower arithmetic
  t.decimal :amount, precision: 15, scale: 2

  # Option 2: Integer cents - fast arithmetic, no rounding errors
  t.integer :amount_cents, limit: 8

  # Option 3: Money type (PostgreSQL) - specialized storage
  t.money :amount
end
Normalized schemas require joins that cost CPU cycles and I/O operations. Each join combines row data from multiple table segments, requiring disk seeks and memory allocation. Queries joining five normalized tables typically execute slower than queries against a single denormalized table.
Denormalized schemas trade write performance for read speed. Maintaining redundant data across tables requires updating multiple rows per logical change. Applications must coordinate updates to prevent inconsistency. Denormalization works best for read-heavy workloads with infrequent updates.
Fill factor tuning affects storage density and update performance. Tables with 100% fill factor pack data tightly but require page splits during updates. Setting fill factor to 90% reserves space for updates, reducing page splits at the cost of 10% more storage. Write-heavy tables benefit from lower fill factors.
Autovacuum configuration manages dead tuple accumulation and statistics freshness. Aggressive autovacuum settings keep tables compact and statistics current at the cost of increased I/O overhead. Delayed autovacuum reduces maintenance overhead but allows dead tuples to bloat tables.
Connection pooling parameters affect concurrency and resource usage. Pool sizes too small create connection queueing delays. Pool sizes too large exhaust database backend processes and memory. Optimal pool sizing depends on query durations and concurrent user counts.
Tools & Ecosystem
Ruby applications manage physical data models through migration frameworks, schema management tools, and database analysis utilities. These tools coordinate schema changes, validate models, and optimize performance.
ActiveRecord migrations form the primary tool for schema management. The rails generate migration command creates timestamped migration files with boilerplate code. Migration files define change, up, and down methods that specify forward and backward schema modifications.
# Generated migration structure
class AddIndexToProducts < ActiveRecord::Migration[7.0]
  def change
    add_index :products, :sku, unique: true
  end
end

# Running migrations
bin/rails db:migrate

# Reverting migrations
bin/rails db:rollback STEP=1

# Checking migration status
bin/rails db:migrate:status
Schema dumpers extract physical model definitions from databases into version-controlled files. The db/schema.rb file represents the authoritative schema state, generated from active database connections. This file enables creating identical database structures in development and test environments.
Strong Migrations gem catches dangerous migration patterns before deployment. It flags migrations adding columns with defaults on large tables, creating indexes without concurrency, and removing columns without safety periods. This tool prevents common operational mistakes.
# Gemfile
gem 'strong_migrations'

# Configuration in config/initializers/strong_migrations.rb
StrongMigrations.start_after = 20240101000000
StrongMigrations.check_down = true

# Catching unsafe operations
class UnsafeMigration < ActiveRecord::Migration[7.0]
  def change
    add_column :users, :account_type, :integer, default: 0, null: false
    # StrongMigrations raises an error on databases where adding a
    # column with a default value rewrites the table
  end
end
Database analysis tools examine physical models and recommend optimizations. PgHero monitors PostgreSQL performance, identifies missing indexes, and highlights table bloat. It tracks query performance over time and suggests schema improvements.
# Gemfile
gem 'pghero'

# Routes
mount PgHero::Engine, at: "pghero"

# Viewing recommendations (requires the pg_stat_statements extension)
PgHero.suggested_indexes
# => [{ table: "events", columns: ["user_id", "created_at"], ... }]

PgHero.unused_indexes
# => Lists indexes that are rarely or never used
Schema validators ensure physical models meet organizational standards. Database Consistency gem validates foreign key presence, index definitions, and constraint completeness. These tools catch schema design issues during code review.
Annotate gem adds schema information as comments to model files. Comments document physical column types, limits, and indexes directly in Ruby code. This improves developer understanding of physical schema without requiring database queries.
# Gemfile
gem 'annotate'

# Generated annotations in app/models/user.rb
# == Schema Information
#
# Table name: users
#
#  id                  :bigint           not null, primary key
#  email               :string(255)      not null
#  encrypted_password  :string(128)
#  sign_in_count       :integer          default(0), not null
#  created_at          :datetime         not null
#  updated_at          :datetime         not null
#
# Indexes
#
#  index_users_on_email  (email) UNIQUE
#
class User < ApplicationRecord
end
Online schema change tools modify production tables without locking. gh-ost (GitHub's online schema change tool) uses triggers and shadow tables to apply migrations while serving production traffic. pt-online-schema-change provides similar functionality for MySQL databases.
Migration testing frameworks validate schema changes in production-like environments. These tools clone production schemas, apply migrations, and verify results before production deployment. Schema migration testing prevents deployment failures from incompatible schema changes.
Schema drift detection tools compare physical database schemas against version-controlled definitions. They identify manual schema changes, missing migrations, and inconsistencies across environments. Regular drift detection prevents schema divergence.
Reference
Physical Schema Elements
| Element | Purpose | Ruby DSL |
|---|---|---|
| Table | Primary data container | create_table |
| Column | Data field with type | t.string, t.integer, t.decimal |
| Index | Query acceleration structure | add_index, t.index |
| Constraint | Data integrity rule | add_check_constraint, add_foreign_key |
| Partition | Table subdivision | execute with partition DDL |
| Trigger | Automated action on data change | execute with trigger DDL |
| View | Stored query | create_view (scenic gem) |
Common Column Types
| Logical Type | PostgreSQL | MySQL | Storage Size |
|---|---|---|---|
| Small integer | smallint | SMALLINT | 2 bytes |
| Integer | integer | INT | 4 bytes |
| Large integer | bigint | BIGINT | 8 bytes |
| Decimal | numeric(p,s) | DECIMAL(p,s) | variable |
| Float | double precision | DOUBLE | 8 bytes |
| Short text | varchar(n) | VARCHAR(n) | length + 1-2 bytes |
| Long text | text | TEXT | variable |
| Boolean | boolean | TINYINT(1) | 1 byte |
| Date | date | DATE | 4 bytes |
| Timestamp | timestamp | DATETIME | 8 bytes |
| Binary | bytea | BLOB | variable |
| JSON | jsonb | JSON | variable |
Index Types
| Type | Use Case | PostgreSQL | MySQL |
|---|---|---|---|
| B-tree | General purpose, range queries | btree | BTREE |
| Hash | Exact match only | hash | HASH |
| GiST | Geometric, full-text | gist | N/A |
| GIN | Array, JSONB, full-text | gin | N/A |
| BRIN | Large sequential data | brin | N/A |
| Spatial | Geographic queries | N/A | SPATIAL |
Migration Methods
| Operation | Method | Options |
|---|---|---|
| Create table | create_table | id, primary_key, options, force |
| Drop table | drop_table | if_exists |
| Add column | add_column | null, default, limit, precision, scale |
| Remove column | remove_column | N/A |
| Change column | change_column | type, limit, precision, scale |
| Rename column | rename_column | N/A |
| Add index | add_index | unique, where, using, algorithm |
| Remove index | remove_index | column, name |
| Add foreign key | add_foreign_key | on_delete, on_update |
| Add check | add_check_constraint | name, validate |
Constraint Types
| Constraint | Enforces | Example |
|---|---|---|
| Primary Key | Uniqueness + not null + clustering | t.primary_key :id |
| Foreign Key | Referential integrity | add_foreign_key :posts, :users |
| Unique | No duplicate values | add_index :users, :email, unique: true |
| Check | Value validation | add_check_constraint :accounts, "balance >= 0" |
| Not Null | Required value | t.string :email, null: false |
| Exclusion | Prevent overlaps | EXCLUDE USING gist |
Partition Strategies
| Strategy | Key Type | Best For |
|---|---|---|
| Range | Continuous values | Time-series, dates, sequences |
| List | Discrete values | Categories, regions, types |
| Hash | Any hashable value | Even distribution without pattern |
Performance Optimization Checklist
| Optimization | Impact | When to Apply |
|---|---|---|
| Add B-tree index | Read +50-1000%, Write -5% | High-cardinality columns in WHERE |
| Composite index | Read +100-500% | Multi-column filters |
| Partial index | Storage -30-70% | Filtering on specific values |
| Covering index | Read +20-100% | SELECT includes indexed columns |
| Partition table | Read +50-500% | Time-series or categorical data |
| Denormalize | Read +100-500%, Write -20% | Join-heavy queries |
| Reduce index count | Write +10-50% | Write-heavy workloads |
| Reduce fill factor | Storage +10%, fewer page splits | Update-heavy tables |
Online Migration Patterns
| Pattern | Technique | Downtime |
|---|---|---|
| Concurrent index | algorithm: :concurrently | None |
| Dual write | Write to old and new | None |
| Ghost table | Create shadow, swap | Seconds |
| Staged deployment | Deploy to replicas first | None |
| Lock timeout | Set timeout, retry | Brief retries |