Overview
Physical data models represent the concrete implementation of database schemas within specific database management systems. Where logical data models define what data exists and how entities relate conceptually, physical data models specify how data gets stored on disk, accessed by queries, and optimized for performance.
A physical data model includes storage-level details: table structures with exact data types, index definitions, partitioning schemes, constraint implementations, and storage parameters. These decisions directly affect query performance, storage costs, and system scalability. The same logical model can map to radically different physical implementations depending on workload characteristics, database engine capabilities, and performance requirements.
Physical modeling operates within the constraints of specific database systems. PostgreSQL, MySQL, Oracle, and other databases offer different storage engines, index types, and optimization features. Physical models must account for these system-specific capabilities while maintaining the integrity of the underlying logical design.
# Logical concept: Users have many posts
class User < ApplicationRecord
  has_many :posts
end

# Physical implementation in migration
create_table :users do |t|
  t.string :email, null: false, limit: 255
  t.string :encrypted_password, limit: 128
  t.timestamps precision: 6
end

add_index :users, :email, unique: true,
  using: :btree, algorithm: :concurrently
The transformation from logical to physical modeling requires balancing multiple concerns: query performance, storage efficiency, write throughput, data integrity enforcement, and operational maintainability. Each physical design choice creates trade-offs that affect system behavior under different load patterns.
Key Principles
Physical data models translate abstract logical structures into concrete storage implementations through several core mechanisms. Each database table corresponds to an actual file or set of files on disk, organized according to the storage engine's rules. The physical model specifies not just what columns exist, but their precise data types, storage sizes, null handling, and default values.
Data type mapping converts logical types into database-specific physical representations. An email address might logically be a string, but physically could be VARCHAR(255), TEXT, or CITEXT depending on validation requirements, index considerations, and search needs. Numeric types require decisions about precision, scale, and storage size that affect both accuracy and performance.
# Different physical representations of the same logical concept
create_table :products do |t|
  # Option 1: Fixed precision for currency
  t.decimal :price, precision: 10, scale: 2

  # Option 2: Integer cents for exact arithmetic
  t.integer :price_cents, limit: 8

  # Option 3: Float for scientific calculations (not recommended for currency)
  t.float :price
end
Index structures determine query performance characteristics. Physical models define which columns get indexed, what index types to use, and how indexes get maintained. B-tree indexes support range queries and sorting, hash indexes optimize exact-match lookups, GiST indexes handle geometric data, and GIN indexes accelerate full-text search. Index choices affect both read and write performance.
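As a rough sketch, these index types map onto the migration DSL as follows; the tables and columns here are hypothetical, and the hash and GIN variants assume PostgreSQL:

```ruby
# Hypothetical examples of index type selection (PostgreSQL)

# B-tree (the default): supports equality, range queries, and ORDER BY
add_index :events, :created_at

# Hash: exact-match lookups only, no range scans or sorting
add_index :sessions, :token, using: :hash

# GIN: containment queries on JSONB, arrays, and tsvector columns
add_index :documents, :metadata, using: :gin
```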
Partitioning schemes divide large tables into smaller physical segments based on column values or ranges. Range partitioning splits data by date ranges, list partitioning groups by discrete values, and hash partitioning distributes rows evenly. Partitioning decisions affect query planning, maintenance operations, and data lifecycle management.
# Declarative partitioning in PostgreSQL via Ruby
execute <<-SQL
  CREATE TABLE events (
    id BIGSERIAL,
    user_id BIGINT NOT NULL,
    event_type VARCHAR(50) NOT NULL,
    created_at TIMESTAMP NOT NULL,
    data JSONB
  ) PARTITION BY RANGE (created_at);
SQL

execute <<-SQL
  CREATE TABLE events_2024_q1 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
SQL

execute <<-SQL
  CREATE TABLE events_2024_q2 PARTITION OF events
    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
SQL
Storage parameters control how the database physically stores and manages data. These include fill factors that determine how densely data packs into pages, autovacuum settings that manage dead tuple cleanup, and tablespace assignments that control which physical disks hold data. Storage parameters significantly affect performance for write-heavy workloads.
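In PostgreSQL, several of these parameters can be set per table from a migration. A sketch with illustrative values (the events table is hypothetical, and the numbers are examples rather than recommendations):

```ruby
# Hypothetical per-table storage parameters (PostgreSQL)
execute <<-SQL
  ALTER TABLE events SET (
    fillfactor = 90,                        -- leave 10% of each page free for updates
    autovacuum_vacuum_scale_factor = 0.05,  -- vacuum once 5% of rows are dead
    autovacuum_vacuum_threshold = 1000      -- plus a fixed dead-row threshold
  );
SQL
```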
Constraint enforcement at the physical level ensures data integrity through database mechanisms rather than application logic. Primary keys guarantee uniqueness and not-null values (and in clustered storage engines such as InnoDB also determine physical row order), foreign keys enforce referential integrity, check constraints validate data ranges, and exclusion constraints prevent conflicting data. Physical constraint implementation affects both data quality and write performance.
Physical models must account for access patterns observed in production systems. Columns frequently used in WHERE clauses need indexes, columns in JOIN conditions require optimization, and columns in ORDER BY clauses benefit from specific index types. The physical design reflects how applications actually query data, not just theoretical data relationships.
Denormalization strategies trade storage space and update complexity for query performance. Materialized aggregates eliminate expensive calculations, redundant columns avoid joins, and precomputed values speed up common queries. Physical models often intentionally violate normalization rules when query patterns justify the trade-off.
Ruby Implementation
Ruby applications interact with physical data models primarily through ActiveRecord migrations and schema definitions. Migrations define incremental changes to database schemas, creating a version-controlled history of physical model evolution. Each migration translates logical requirements into specific database commands.
class CreateOrdersWithOptimizations < ActiveRecord::Migration[7.0]
  def change
    create_table :orders, id: :bigint do |t|
      t.references :user, null: false, foreign_key: true
      t.references :shipping_address, foreign_key: { to_table: :addresses }
      t.string :order_number, null: false, limit: 32
      t.integer :status, null: false, default: 0, limit: 2
      t.decimal :total_amount, precision: 12, scale: 2, null: false
      t.decimal :tax_amount, precision: 10, scale: 2
      t.datetime :placed_at, precision: 6
      t.datetime :fulfilled_at, precision: 6
      t.timestamps precision: 6

      t.index :order_number, unique: true
      t.index [:user_id, :created_at], order: { created_at: :desc }
      t.index :status, where: "status IN (0, 1)",
        name: 'index_orders_on_pending_statuses'
    end

    # Add constraint enforcement at the database level
    add_check_constraint :orders,
      "total_amount >= 0",
      name: 'orders_total_amount_positive'
  end
end
ActiveRecord's migration DSL abstracts database-specific syntax while exposing physical model features. The create_table method accepts options for storage engines, tablespaces, and table options. Column definitions specify exact types, sizes, and constraints. Index definitions control index types, sort orders, and partial index conditions.
Schema caching maintains a representation of the physical model in memory. ActiveRecord loads schema information at startup, caching column types, default values, and constraint information. This cache affects how ActiveRecord instantiates model objects and validates data before database operations.
# Schema information stored in db/schema.rb
ActiveRecord::Schema[7.0].define(version: 2024_03_15_120000) do
  enable_extension "pgcrypto"
  enable_extension "btree_gin"

  create_table "users", force: :cascade do |t|
    t.string "email", limit: 255, null: false
    t.string "encrypted_password", limit: 128
    t.integer "sign_in_count", default: 0, null: false
    t.datetime "current_sign_in_at", precision: 6
    t.datetime "created_at", precision: 6, null: false
    t.datetime "updated_at", precision: 6, null: false
    t.index ["email"], name: "index_users_on_email", unique: true
  end
end
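At runtime the cached schema is visible through ActiveRecord's introspection methods. A sketch, assuming a connected application with the users table above:

```ruby
# Reading the cached physical schema (requires a live database connection)
User.columns_hash["email"].type        # :string, per the schema above
User.columns_hash["email"].limit       # 255
User.column_defaults["sign_in_count"]  # 0

# After an out-of-band schema change, the cache must be refreshed
User.reset_column_information
```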
Database-specific adapters translate ActiveRecord operations into database-native commands. PostgreSQL adapters support array columns, JSON types, and advanced index types. MySQL adapters handle storage engine selection and charset specifications. Adapter capabilities directly affect what physical features migrations can express.
# PostgreSQL-specific physical features
create_table :documents do |t|
  t.string :title
  t.text :body
  t.tsvector :search_vector
  t.string :tags, array: true, default: []
  t.jsonb :metadata, default: {}
  t.timestamps

  # PostgreSQL GIN index for full-text search
  t.index :search_vector, using: :gin
  # PostgreSQL GIN index for JSONB queries
  t.index :metadata, using: :gin
  # PostgreSQL array index
  t.index :tags, using: :gin
end

# Trigger to maintain search vector
execute <<-SQL
  CREATE TRIGGER documents_search_vector_update
    BEFORE INSERT OR UPDATE ON documents
    FOR EACH ROW EXECUTE FUNCTION
      tsvector_update_trigger(search_vector, 'pg_catalog.english', title, body);
SQL
Connection pool configuration affects physical resource usage. Pool size determines maximum concurrent database connections, checkout timeouts control connection wait behavior, and reaping settings manage stale connections. These parameters directly affect how applications interact with physical database instances.
# config/database.yml physical connection settings
production:
  adapter: postgresql
  pool: <%= ENV.fetch("RAILS_MAX_THREADS", 5) %>
  timeout: 5000
  checkout_timeout: 5
  reaping_frequency: 10
  statement_limit: 1000
  variables:
    statement_timeout: 5000
    lock_timeout: 3000
Physical model changes require migration strategy considerations. Online migrations use techniques like algorithm concurrency to avoid blocking production writes. Adding indexes with algorithm: :concurrently allows queries to continue during index builds. Column additions with non-null defaults on large tables require careful approaches to avoid table locks.
class AddColumnsWithoutLocking < ActiveRecord::Migration[7.0]
  disable_ddl_transaction!

  def up
    # Add the column without a default first (avoids a table rewrite)
    add_column :users, :account_type, :integer

    # Backfill in batches to avoid long-held locks
    User.in_batches(of: 1000).update_all(account_type: 0)

    # Add the NOT NULL constraint separately
    change_column_null :users, :account_type, false

    # Build the index without blocking writes
    add_index :users, :account_type, algorithm: :concurrently
  end

  def down
    remove_column :users, :account_type
  end
end
Model classes interact with physical schemas through attribute methods. ActiveRecord generates methods based on physical column names, maps Ruby types to database types, and enforces constraints through validations that mirror physical constraints. Model behavior depends directly on underlying physical structure.
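For example, a model for the orders table defined earlier might mirror its physical constraints with validations; this pairing is a common convention, not something ActiveRecord generates automatically:

```ruby
# Hypothetical model mirroring the orders table's physical constraints
class Order < ApplicationRecord
  belongs_to :user # mirrors the NOT NULL foreign key column

  # Validations that mirror database constraints, for friendlier error messages
  validates :order_number, presence: true, uniqueness: true,
                           length: { maximum: 32 } # mirrors VARCHAR(32) + unique index
  validates :total_amount, presence: true,
                           numericality: { greater_than_or_equal_to: 0 } # mirrors CHECK
end
```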
Design Considerations
Physical data model design requires evaluating multiple dimensions of database behavior. The optimal physical design for a high-write, time-series workload differs fundamentally from an analytical reporting database or a transactional e-commerce system. Each design decision creates trade-offs between competing concerns.
Normalization versus denormalization forms the central design tension. Normalized physical models minimize data redundancy, simplify updates, and reduce storage space. Queries spanning multiple normalized tables require joins that cost CPU cycles and I/O operations. Denormalized physical models duplicate data strategically to eliminate joins, trading storage space and update complexity for query speed.
# Normalized physical model
create_table :order_items do |t|
  t.references :order, null: false, foreign_key: true
  t.references :product, null: false, foreign_key: true
  t.integer :quantity, null: false
  t.decimal :unit_price, precision: 10, scale: 2
  # Total must be calculated: quantity * unit_price
end

# Denormalized physical model with precomputed totals
create_table :order_items do |t|
  t.references :order, null: false, foreign_key: true
  t.references :product, null: false, foreign_key: true
  t.string :product_name, limit: 255 # Denormalized from products
  t.integer :quantity, null: false
  t.decimal :unit_price, precision: 10, scale: 2
  t.decimal :line_total, precision: 12, scale: 2 # Precomputed
end
Index selection balances query performance against write overhead and storage costs. Each index speeds up specific query patterns while slowing down inserts and updates. Composite indexes serve queries filtering on multiple columns but consume more disk space than single-column indexes. Covering indexes eliminate table lookups entirely but duplicate column data.
Partial indexes optimize specific query patterns by indexing only rows matching a WHERE condition. A partial index on active users reduces index size and maintenance overhead for applications that rarely query inactive users. Expression indexes support queries filtering on computed values or function results.
# Partial index for common query pattern
add_index :orders, :created_at,
  where: "status = 0", # index only pending orders
  name: 'index_orders_pending_by_date'

# Expression index for case-insensitive searches
add_index :users, "LOWER(email)",
  name: 'index_users_on_lower_email'

# Covering index to avoid table lookups (PostgreSQL INCLUDE, Rails 7.1+)
add_index :products, [:category_id, :price],
  include: [:name, :sku],
  name: 'index_products_for_catalog'
Data type precision affects both storage efficiency and application behavior. Choosing between VARCHAR(255) and TEXT impacts index size and sorting performance. Selecting DECIMAL versus INTEGER for currency determines accuracy guarantees. Date precision choices affect timestamp resolution and comparison behavior.
Integer size selection determines value ranges and storage space. A SMALLINT uses 2 bytes and supports values from -32768 to 32767. A BIGINT uses 8 bytes and handles values up to 9 quintillion. Undersized integer columns cause overflow errors, while oversized columns waste storage space.
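The practical difference between these choices shows up in plain Ruby arithmetic. The self-contained snippet below illustrates why integer cents avoid the rounding drift of binary floating point:

```ruby
# Floating-point currency arithmetic accumulates rounding error
float_total = 0.0
100.times { float_total += 0.01 }   # one hundred 1-cent additions
puts float_total == 1.0             # false: the sum drifts slightly away from 1.0

# Integer cents stay exact and fit comfortably within BIGINT range
cents_total = 0
100.times { cents_total += 1 }
puts cents_total == 100             # true
puts format("%.2f", cents_total / 100.0)  # "1.00"
```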
Constraint placement determines where the database enforces data integrity rules. Physical constraints in the database guarantee consistency across all application paths, but add overhead to write operations. Application-level validations offer flexibility and better error messages but cannot prevent constraint violations from other database clients.
# Database-enforced constraints
create_table :accounts do |t|
  t.decimal :balance, precision: 15, scale: 2, null: false
  t.datetime :created_at, null: false

  # Physical constraint prevents negative balances
  t.check_constraint "balance >= 0", name: 'accounts_positive_balance'
end

# Foreign key with cascade behavior
add_foreign_key :posts, :users, on_delete: :cascade

# Exclusion constraint (PostgreSQL; equality on room_id requires btree_gist)
execute <<-SQL
  ALTER TABLE bookings
    ADD CONSTRAINT bookings_no_overlap
    EXCLUDE USING GIST (room_id WITH =, tsrange(start_time, end_time) WITH &&);
SQL
Partitioning strategy selection depends on data lifecycle and query patterns. Time-based range partitioning suits event logs and time-series data with retention policies. List partitioning groups data by discrete categories like geographic regions. Hash partitioning distributes data evenly when no natural partitioning key exists.
Partition granularity affects query planning and maintenance operations. Daily partitions create thousands of table segments for historical data, while monthly or yearly partitions reduce management overhead at the cost of less precise data lifecycle control. Partition pruning eliminates irrelevant partitions from query execution but requires queries to filter on partition keys.
Storage engine selection determines transaction semantics, locking behavior, and performance characteristics. PostgreSQL offers a single MVCC-based engine with tunable parameters. MySQL provides InnoDB for transactions and MyISAM for read-heavy workloads without transaction overhead. Storage engine choices affect crash recovery, backup procedures, and replication behavior.
Implementation Approaches
Physical data model implementation requires coordinating schema creation, data migration, and application deployment. Different approaches balance risk, downtime requirements, and operational complexity.
Schema-first implementation creates the complete physical model before deploying application code. Database administrators review and optimize the schema, create indexes proactively, and validate constraints. This approach works well for new applications or major schema overhauls but requires accurate prediction of access patterns.
# Complete schema definition before deployment
class InitialSchema < ActiveRecord::Migration[7.0]
  def change
    create_table :users do |t|
      t.string :email, null: false, limit: 255
      t.string :username, limit: 50
      t.integer :role, null: false, default: 0
      t.timestamps precision: 6

      t.index :email, unique: true
      t.index :username, unique: true
      t.index [:role, :created_at]
    end

    create_table :posts do |t|
      t.references :user, null: false, foreign_key: true
      t.string :title, null: false, limit: 255
      t.text :body
      t.integer :status, null: false, default: 0
      t.timestamps precision: 6

      t.index [:user_id, :created_at]
      t.index :status
    end

    # Additional tables...
  end
end
Iterative implementation evolves physical models incrementally through small migrations. Each migration represents a single logical change: adding a column, creating an index, or modifying a constraint. This approach minimizes deployment risk and allows physical model evolution based on observed production patterns.
Iterative implementation requires coordination between schema changes and application code. Adding a required column needs a multi-step process: add column as nullable, backfill values in production, add not-null constraint, deploy code expecting the column. Removing columns follows the reverse pattern: deploy code ignoring the column, verify no usage, drop the column.
# Step 1: Add column as nullable
class AddUserTypeColumn < ActiveRecord::Migration[7.0]
  def change
    add_column :users, :user_type, :integer
  end
end

# Step 2: Backfill default values
class BackfillUserType < ActiveRecord::Migration[7.0]
  def up
    User.in_batches(of: 5000).update_all(user_type: 0)
  end
end

# Step 3: Add not null constraint
class RequireUserType < ActiveRecord::Migration[7.0]
  def change
    change_column_null :users, :user_type, false
  end
end
Zero-downtime migration strategies modify physical models without interrupting production traffic. Techniques include creating new columns before deprecating old ones, building indexes concurrently, and using database features like triggers to maintain dual-write compatibility during transitions.
Ghost table approaches create shadow copies of tables, apply migrations to shadows, and swap table names atomically. This technique works for schema changes that would otherwise lock tables for extended periods. Tools like pt-online-schema-change automate ghost table workflows.
# Dual-write pattern for column migration
class MigrateAddressToComponents < ActiveRecord::Migration[7.0]
  def change
    # Add new columns
    add_column :users, :street_address, :string
    add_column :users, :city, :string
    add_column :users, :postal_code, :string

    # Application code writes to both old and new columns.
    # After a validation period, drop the old column.
  end
end
Constraint-based implementation leverages database constraints to enforce business rules physically. Check constraints validate value ranges, foreign keys guarantee referential integrity, and unique constraints prevent duplicates. This approach centralizes validation logic in the database but requires careful constraint design to avoid overly restrictive rules.
Deferred constraints allow transactions to temporarily violate rules before commit. This technique supports complex multi-row updates that would otherwise violate constraints during intermediate states. Deferred constraint checking trades immediate validation for transactional consistency.
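PostgreSQL expresses this with DEFERRABLE constraint DDL. A hedged sketch using a hypothetical positions table, where rows are renumbered inside a single transaction and uniqueness is only checked at commit:

```ruby
# Hypothetical deferrable unique constraint (PostgreSQL)
execute <<-SQL
  ALTER TABLE positions
    ADD CONSTRAINT positions_list_position_unique
    UNIQUE (list_id, position)
    DEFERRABLE INITIALLY DEFERRED;
SQL
# Within a transaction, rows may temporarily share a position value;
# the constraint is validated once, at COMMIT.
```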
Staged rollout implementation deploys physical model changes progressively across database replicas or shards. Primary databases receive schema changes first, replicas follow after validation. This approach detects problems on secondary infrastructure before affecting critical primary databases.
Performance Considerations
Physical data model design directly determines database performance characteristics. Query execution plans, storage I/O patterns, and memory utilization all depend on physical schema decisions. Performance optimization requires understanding how database engines interact with physical structures.
Index performance depends on cardinality, selectivity, and index structure. High-cardinality columns (many distinct values) benefit more from indexes than low-cardinality columns. Gender columns with two values rarely justify dedicated indexes, while email columns with unique values per user almost always benefit from indexing.
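A back-of-the-envelope way to compare candidate columns is selectivity, the ratio of distinct values to total rows; the sample data below is made up:

```ruby
# Estimate index selectivity: distinct values / total rows
def selectivity(values)
  values.uniq.size.to_f / values.size
end

genders = ["m", "f"] * 5_000                     # 2 distinct values in 10,000 rows
emails  = (1..10_000).map { |i| "u#{i}@x.com" }  # every value distinct

puts selectivity(genders)  # 0.0002 - an index here rarely pays off
puts selectivity(emails)   # 1.0    - a (unique) index is highly effective
```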
Composite indexes serve queries filtering on multiple columns but only accelerate queries matching the index column order. An index on (user_id, created_at) speeds up queries filtering on user_id alone or both columns together, but not queries filtering only on created_at. Index column ordering significantly affects query optimizer decisions.
# Composite index with column order considerations
add_index :events, [:user_id, :event_type, :created_at],
  name: 'index_events_on_user_and_type_and_date'

# Queries that use this index efficiently:
#   WHERE user_id = 123
#   WHERE user_id = 123 AND event_type = 'login'
#   WHERE user_id = 123 AND event_type = 'login' AND created_at > '2024-01-01'

# Queries that cannot use this index:
#   WHERE event_type = 'login'        -- first column not specified
#   WHERE created_at > '2024-01-01'   -- first column not specified
Index maintenance overhead increases with table write volume. Each insert, update, or delete operation modifies all relevant indexes. Tables with ten indexes pay the cost of maintaining all ten on every write. Heavy write workloads benefit from selective indexing based on critical query patterns.
Partitioning impact varies by access pattern. Range partitioning accelerates queries filtering on partition keys by eliminating irrelevant partitions. Partition pruning reduces I/O operations and improves query planning speed. Queries not filtering on partition keys scan all partitions, potentially slower than querying unpartitioned tables.
Partition-wise joins enable parallel processing when joining partitioned tables on partition keys. Database engines can process each partition pair independently, utilizing multiple CPU cores. This technique dramatically accelerates large analytical joins.
Data type selection affects storage size and operation speed. Fixed-length types like INTEGER and TIMESTAMP enable faster comparisons and sorting than variable-length types like TEXT. Numeric types stored as integers perform arithmetic faster than floating-point types, critical for high-throughput financial calculations.
# Storage size and performance implications
create_table :transactions do |t|
  # Option 1: Decimal - exact precision, slower arithmetic
  t.decimal :amount, precision: 15, scale: 2

  # Option 2: Integer cents - fast arithmetic, no rounding errors
  t.integer :amount_cents, limit: 8

  # Option 3: Money type (PostgreSQL) - specialized storage
  t.money :amount
end
Normalized schemas require joins that cost CPU cycles and I/O operations. Each join combines row data from multiple table segments, requiring disk seeks and memory allocation. Queries joining five normalized tables typically execute slower than queries against a single denormalized table.
Denormalized schemas trade write performance for read speed. Maintaining redundant data across tables requires updating multiple rows per logical change. Applications must coordinate updates to prevent inconsistency. Denormalization works best for read-heavy workloads with infrequent updates.
Fill factor tuning affects storage density and update performance. Tables with 100% fill factor pack data tightly but require page splits during updates. Setting fill factor to 90% reserves space for updates, reducing page splits at the cost of 10% more storage. Write-heavy tables benefit from lower fill factors.
Autovacuum configuration manages dead tuple accumulation and statistics freshness. Aggressive autovacuum settings keep tables compact and statistics current at the cost of increased I/O overhead. Delayed autovacuum reduces maintenance overhead but allows dead tuples to bloat tables.
Connection pooling parameters affect concurrency and resource usage. Pool sizes too small create connection queueing delays. Pool sizes too large exhaust database backend processes and memory. Optimal pool sizing depends on query durations and concurrent user counts.
Tools & Ecosystem
Ruby applications manage physical data models through migration frameworks, schema management tools, and database analysis utilities. These tools coordinate schema changes, validate models, and optimize performance.
ActiveRecord migrations form the primary tool for schema management. The rails generate migration command creates timestamped migration files with boilerplate code. Migration files define change, up, and down methods that specify forward and backward schema modifications.
# Generated migration structure
class AddIndexToProducts < ActiveRecord::Migration[7.0]
  def change
    add_index :products, :sku, unique: true
  end
end

# Running migrations
bin/rails db:migrate

# Reverting migrations
bin/rails db:rollback STEP=1

# Checking migration status
bin/rails db:migrate:status
Schema dumpers extract physical model definitions from databases into version-controlled files. The db/schema.rb file represents the authoritative schema state, generated from active database connections. This file enables creating identical database structures in development and test environments.
Strong Migrations gem catches dangerous migration patterns before deployment. It flags migrations adding columns with defaults on large tables, creating indexes without concurrency, and removing columns without safety periods. This tool prevents common operational mistakes.
# Gemfile
gem 'strong_migrations'

# Configuration in config/initializers/strong_migrations.rb
StrongMigrations.start_after = 20240101000000
StrongMigrations.check_down = true

# Catching unsafe operations
class UnsafeMigration < ActiveRecord::Migration[7.0]
  def change
    add_column :users, :account_type, :integer, default: 0, null: false
    # StrongMigrations raises an error on databases where adding a
    # column with a default value rewrites the table
  end
end
Database analysis tools examine physical models and recommend optimizations. PgHero monitors PostgreSQL performance, identifies missing indexes, and highlights table bloat. It tracks query performance over time and suggests schema improvements.
# Gemfile
gem 'pghero'

# Routes
mount PgHero::Engine, at: "pghero"

# Viewing recommendations (requires the pg_stat_statements extension)
PgHero.suggested_indexes
# => [{ table: "events", columns: ["user_id", "created_at"], ... }]

PgHero.unused_indexes
# => Lists indexes that are rarely or never used
Schema validators ensure physical models meet organizational standards. Database Consistency gem validates foreign key presence, index definitions, and constraint completeness. These tools catch schema design issues during code review.
Annotate gem adds schema information as comments to model files. Comments document physical column types, limits, and indexes directly in Ruby code. This improves developer understanding of physical schema without requiring database queries.
# Gemfile
gem 'annotate'

# Generated annotations in app/models/user.rb
# == Schema Information
#
# Table name: users
#
#  id                  :bigint           not null, primary key
#  email               :string(255)      not null
#  encrypted_password  :string(128)
#  sign_in_count       :integer          default(0), not null
#  created_at          :datetime         not null
#  updated_at          :datetime         not null
#
# Indexes
#
#  index_users_on_email  (email) UNIQUE
#
class User < ApplicationRecord
end
Online schema change tools modify production tables without locking. gh-ost (GitHub's online schema change tool) uses triggers and shadow tables to apply migrations while serving production traffic. pt-online-schema-change provides similar functionality for MySQL databases.
Migration testing frameworks validate schema changes in production-like environments. These tools clone production schemas, apply migrations, and verify results before production deployment. Schema migration testing prevents deployment failures from incompatible schema changes.
Schema drift detection tools compare physical database schemas against version-controlled definitions. They identify manual schema changes, missing migrations, and inconsistencies across environments. Regular drift detection prevents schema divergence.
Reference
Physical Schema Elements
| Element | Purpose | Ruby DSL |
|---|---|---|
| Table | Primary data container | create_table |
| Column | Data field with type | t.string, t.integer, t.decimal |
| Index | Query acceleration structure | add_index, t.index |
| Constraint | Data integrity rule | add_check_constraint, add_foreign_key |
| Partition | Table subdivision | execute with partition DDL |
| Trigger | Automated action on data change | execute with trigger DDL |
| View | Stored query | create_view (scenic gem) |
Common Column Types
| Logical Type | PostgreSQL | MySQL | Storage Size |
|---|---|---|---|
| Small integer | smallint | SMALLINT | 2 bytes |
| Integer | integer | INT | 4 bytes |
| Large integer | bigint | BIGINT | 8 bytes |
| Decimal | numeric(p,s) | DECIMAL(p,s) | variable |
| Float | double precision | DOUBLE | 8 bytes |
| Short text | varchar(n) | VARCHAR(n) | length + 1-2 bytes |
| Long text | text | TEXT | variable |
| Boolean | boolean | TINYINT(1) | 1 byte |
| Date | date | DATE | 4 bytes |
| Timestamp | timestamp | DATETIME | 8 bytes |
| Binary | bytea | BLOB | variable |
| JSON | jsonb | JSON | variable |
Index Types
| Type | Use Case | PostgreSQL | MySQL |
|---|---|---|---|
| B-tree | General purpose, range queries | btree | BTREE |
| Hash | Exact match only | hash | HASH |
| GiST | Geometric, full-text | gist | N/A |
| GIN | Array, JSONB, full-text | gin | N/A |
| BRIN | Large sequential data | brin | N/A |
| Spatial | Geographic queries | N/A | SPATIAL |
Migration Methods
| Operation | Method | Options |
|---|---|---|
| Create table | create_table | id, primary_key, options, force |
| Drop table | drop_table | if_exists |
| Add column | add_column | null, default, limit, precision, scale |
| Remove column | remove_column | N/A |
| Change column | change_column | type, limit, precision, scale |
| Rename column | rename_column | N/A |
| Add index | add_index | unique, where, using, algorithm |
| Remove index | remove_index | column, name |
| Add foreign key | add_foreign_key | on_delete, on_update |
| Add check | add_check_constraint | name, validate |
Constraint Types
| Constraint | Enforces | Example |
|---|---|---|
| Primary Key | Uniqueness + not null + clustering | t.primary_key :id |
| Foreign Key | Referential integrity | add_foreign_key :posts, :users |
| Unique | No duplicate values | add_index :users, :email, unique: true |
| Check | Value validation | add_check_constraint :accounts, "balance >= 0" |
| Not Null | Required value | t.string :email, null: false |
| Exclusion | Prevent overlaps | EXCLUDE USING gist |
Partition Strategies
| Strategy | Key Type | Best For |
|---|---|---|
| Range | Continuous values | Time-series, dates, sequences |
| List | Discrete values | Categories, regions, types |
| Hash | Any hashable value | Even distribution without pattern |
Performance Optimization Checklist
| Optimization | Impact | When to Apply |
|---|---|---|
| Add B-tree index | Read +50-1000%, Write -5% | High-cardinality columns in WHERE |
| Composite index | Read +100-500% | Multi-column filters |
| Partial index | Storage -30-70% | Filtering on specific values |
| Covering index | Read +20-100% | SELECT includes indexed columns |
| Partition table | Read +50-500% | Time-series or categorical data |
| Denormalize | Read +100-500%, Write -20% | Join-heavy queries |
| Reduce index count | Write +10-50% | Write-heavy workloads |
| Reduce fill factor | Storage +10%, fewer page splits | Update-heavy tables |
Online Migration Patterns
| Pattern | Technique | Downtime |
|---|---|---|
| Concurrent index | algorithm: :concurrently | None |
| Dual write | Write to old and new | None |
| Ghost table | Create shadow, swap | Seconds |
| Staged deployment | Deploy to replicas first | None |
| Lock timeout | Set timeout, retry | Brief retries |