CrackedRuby

Overview

A query execution plan represents the sequence of operations a database management system performs to execute a SQL query. The database query optimizer generates these plans by evaluating multiple strategies and selecting the one with the lowest estimated cost. Execution plans expose the internal decision-making process of the database, showing which indexes the system uses, how it joins tables, and the order of operations.

Database systems convert declarative SQL statements into procedural execution strategies. When a query arrives, the optimizer analyzes available access paths, estimates the cost of different approaches, and constructs a plan tree. This plan tree consists of nodes representing operations like table scans, index lookups, joins, sorts, and aggregations. Each node receives input from child nodes and produces output for parent nodes.

Execution plans serve multiple purposes in database performance analysis. Developers examine plans to understand why queries perform poorly, identify missing indexes, detect inefficient join strategies, and verify that the optimizer makes correct assumptions about data distribution. Plans reveal the gap between how developers expect queries to execute and how they actually execute.

The format and detail level of execution plans vary across database systems. PostgreSQL provides detailed text-based plans with cost estimates and row counts. MySQL offers both traditional and JSON-formatted plans with additional metrics. SQLite produces simpler plans focused on index usage and join order. Despite these differences, all execution plans share common elements: operation types, execution order, cost estimates, and cardinality predictions.

-- Simple query example
SELECT users.name, COUNT(orders.id)
FROM users
LEFT JOIN orders ON users.id = orders.user_id
WHERE users.active = true
GROUP BY users.id, users.name;

-- Execution plan shows:
-- 1. Filter users by active flag
-- 2. Join with orders table
-- 3. Group results
-- 4. Count aggregation

Database optimizers balance multiple competing factors: minimizing disk I/O, reducing memory usage, parallelizing operations when possible, and avoiding expensive sort operations. The optimizer assigns costs to each operation based on statistics about table sizes, data distribution, and available indexes. These statistics directly influence plan quality, making regular statistics updates critical for optimal performance.

Key Principles

Query execution plans follow a tree structure where leaf nodes access base data and intermediate nodes transform or combine data. The database executes plans bottom-up, with child nodes producing rows that parent nodes consume. This hierarchical organization allows the optimizer to compose complex operations from simple building blocks.

Cost-Based Optimization drives plan selection. The optimizer calculates estimated costs for different execution strategies, measured in abstract cost units that represent I/O operations, CPU cycles, and memory usage. The system chooses the plan with the lowest total cost. Cost estimation depends on table statistics, which track row counts, column value distributions, index selectivity, and data clustering. Outdated statistics lead to suboptimal plans because the optimizer makes decisions based on incorrect assumptions about data characteristics.
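
Cost-based decisions come down to arithmetic over statistics. A toy sketch of that arithmetic, using constants that mirror PostgreSQL's documented defaults (seq_page_cost, random_page_cost, cpu_tuple_cost); real planners model many more factors:

```ruby
# Toy cost model - illustrative only, not the planner's real formula
SEQ_PAGE_COST    = 1.0   # sequential page read
RANDOM_PAGE_COST = 4.0   # random page read
CPU_TUPLE_COST   = 0.01  # per-row CPU processing

def seq_scan_cost(pages, rows)
  pages * SEQ_PAGE_COST + rows * CPU_TUPLE_COST
end

def index_scan_cost(matching_rows, index_depth: 3)
  # B-tree descent plus one random heap fetch per matching row
  index_depth * RANDOM_PAGE_COST +
    matching_rows * (RANDOM_PAGE_COST + CPU_TUPLE_COST)
end

# 10,000-page table with 1,000,000 rows; the query matches 5 rows
puts seq_scan_cost(10_000, 1_000_000).round(2)  # => 20000.0
puts index_scan_cost(5).round(2)                # => 32.05
```

The gap between the two numbers is what drives the optimizer toward the index for selective predicates.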

Access Methods determine how the database retrieves rows from tables. Sequential scans read entire tables from disk, appropriate for small tables or queries that access most rows. Index scans use B-tree or hash indexes to locate specific rows quickly, ideal for selective queries. Bitmap index scans combine multiple indexes before accessing table data, reducing random I/O. Index-only scans retrieve all required data from the index itself, avoiding table access entirely.

# Sequential scan - reads all rows
User.where("created_at > ?", 1.year.ago)  # Without index on created_at

# Index scan - uses index to find specific rows
User.where(email: "user@example.com")  # With unique index on email

# Index-only scan - index alone supplies every selected column
User.select(:id, :email).where(email: "user@example.com")  # Needs an index covering both columns

Join Strategies combine data from multiple tables using different algorithms. Nested loop joins iterate over one table and probe the other table for each row, efficient when one table is small. Hash joins build an in-memory hash table from one input and probe it with rows from the other input, fast for medium-sized tables. Merge joins require sorted inputs and scan both tables in parallel, optimal for pre-sorted data or when sort cost is acceptable.

The database selects join strategies based on table sizes, available indexes, memory limits, and sort requirements. Small tables often use nested loops, medium tables benefit from hash joins, and large sorted tables work well with merge joins. The optimizer may choose different strategies for different joins in the same query.
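
The two most common strategies can be sketched in plain Ruby over arrays of hashes. Illustrative only; real implementations operate on disk pages and handle memory spilling:

```ruby
# Join algorithms sketched over arrays of hashes - illustrative only
def nested_loop_join(outer, inner, outer_key, inner_key)
  outer.flat_map do |o|
    inner.select { |i| i[inner_key] == o[outer_key] }
         .map { |i| o.merge(i) }
  end
end

def hash_join(build, probe, build_key, probe_key)
  table = build.group_by { |row| row[build_key] }  # build phase: one pass
  probe.flat_map do |row|                          # probe phase: O(1) lookups
    (table[row[probe_key]] || []).map { |b| b.merge(row) }
  end
end

users  = [{ id: 1, name: "Ada" }, { id: 2, name: "Grace" }]
orders = [{ user_id: 1, total: 50 }, { user_id: 1, total: 75 }]

p nested_loop_join(users, orders, :id, :user_id).length  # => 2
p hash_join(users, orders, :id, :user_id) ==
  nested_loop_join(users, orders, :id, :user_id)         # => true
```

The nested loop rescans the inner input once per outer row; the hash join pays one pass to build the table, then probes in constant time per row, which is why it wins as inputs grow.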

Join Order significantly impacts query performance. For N tables, N! possible join orders exist. The optimizer uses dynamic programming or heuristics to explore join orders and select the most efficient sequence. The system considers table sizes, join selectivity, and available indexes when ordering joins. Joining smaller result sets first reduces intermediate result sizes and overall query cost.
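
The factorial growth of the search space is easy to see directly:

```ruby
# N tables can be joined in N! orders - exhaustive search stops
# being practical somewhere around a dozen tables
def join_orders(n_tables)
  (1..n_tables).reduce(1, :*)
end

[2, 4, 8, 12].each do |n|
  puts "#{n} tables: #{join_orders(n)} possible join orders"
end
# 12 tables already yields 479001600 orders to consider
```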

Cardinality Estimation predicts how many rows each operation produces. The optimizer uses these estimates to choose appropriate algorithms and allocate memory. Inaccurate estimates cause poor plan choices, such as selecting nested loop joins when hash joins would perform better. Estimation errors compound through the plan tree, making accurate statistics essential for good plans.
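
A minimal sketch of the simplest such estimate, an equality predicate against per-column statistics; planners commonly assume selectivity of roughly 1 / n_distinct when no histogram or most-common-values list applies:

```ruby
# Equality-predicate estimate: selectivity ~ 1 / n_distinct
# (illustrative; real planners also consult histograms and MCV lists)
def estimate_equality_rows(table_rows, n_distinct)
  (table_rows.to_f / n_distinct).round
end

# 1,000,000 orders with 50,000 distinct user_id values:
puts estimate_equality_rows(1_000_000, 50_000)  # => 20
```

If the real data is skewed (one user owns 100,000 orders), this estimate is off by four orders of magnitude, which is exactly the kind of error that flips the planner into the wrong join strategy.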

Plan Caching stores compiled plans to avoid repeated optimization overhead. Prepared statements and parameterized queries enable plan reuse across multiple executions. However, cached plans may become suboptimal when data distribution changes significantly. Some databases use adaptive optimization to regenerate plans when execution metrics deviate from estimates.

Parallel Execution splits operations across multiple worker processes to utilize available CPU cores. The optimizer considers parallelization when processing large tables or performing expensive operations. Parallel query execution requires sufficient memory for worker coordination and may not benefit small queries due to coordination overhead.

Ruby Implementation

Ruby applications interact with query execution plans primarily through database adapters and ActiveRecord. Each database adapter provides methods to retrieve and display execution plans, with syntax varying by database system.

PostgreSQL with ActiveRecord uses the EXPLAIN command to generate execution plans. The explain method on ActiveRecord relations returns plan details as a string.

# Basic EXPLAIN
plan = User.where(email: "test@example.com").explain
puts plan

# EXPLAIN ANALYZE - actually executes query and shows real timing
plan = User.where(email: "test@example.com").explain(:analyze, :buffers)
puts plan

# Multiple conditions
plan = Order
  .joins(:user)
  .where(created_at: 1.week.ago..Time.now)
  .where(users: { active: true })
  .explain
# Shows join strategy, filter operations, index usage

The explain method accepts options (supported since Rails 7.1) that control plan detail level. The :analyze option executes the query and includes actual timing and row counts. The :buffers option shows buffer usage statistics. The :verbose option includes additional plan details.

# Detailed analysis with all options
User
  .joins(:orders)
  .group(:id)
  .having("COUNT(orders.id) > ?", 5)
  .explain(:analyze, :buffers, :verbose)

MySQL/MariaDB implementation differs slightly. ActiveRecord supports EXPLAIN for MySQL, but the format varies between MySQL versions.

# MySQL 5.6 and earlier - traditional format
Product.where("price > ?", 100).explain

# MySQL 5.7+ - JSON format provides more detail
# Access through raw SQL for JSON format
result = ActiveRecord::Base.connection.execute(
  "EXPLAIN FORMAT=JSON #{Product.where('price > 100').to_sql}"
)
json_plan = JSON.parse(result.first.values.first)  # first row, first column

SQLite provides simpler execution plans focused on query structure rather than detailed cost analysis.

# SQLite - ActiveRecord's explain issues EXPLAIN QUERY PLAN
plan = User.where(active: true).explain
# Shows index usage and scan order

# Raw EXPLAIN - low-level virtual machine bytecode
connection = ActiveRecord::Base.connection
query_sql = User.where(active: true).to_sql
plan = connection.execute("EXPLAIN #{query_sql}")

Custom Plan Analysis requires parsing plan output and extracting metrics. Building helper methods simplifies repetitive analysis tasks.

class QueryAnalyzer
  def self.analyze(relation)
    plan_text = relation.explain(:analyze)

    {
      execution_time: extract_execution_time(plan_text),
      sequential_scans: count_sequential_scans(plan_text),
      index_scans: count_index_scans(plan_text),
      sort_operations: count_sorts(plan_text)
    }
  end

  # A bare `private` does not apply to `def self.` methods, so the
  # helpers live on the singleton class instead
  class << self
    private

    def extract_execution_time(plan)
      match = plan.match(/Execution Time: ([\d.]+) ms/)
      match ? match[1].to_f : nil
    end

    def count_sequential_scans(plan)
      plan.scan(/Seq Scan/).count
    end

    def count_index_scans(plan)
      plan.scan(/Index Scan|Index Only Scan|Bitmap Index Scan/).count
    end

    def count_sorts(plan)
      # Match Sort nodes, not "Sort Key:" detail lines
      plan.scan(/Sort\s+\(cost/).count
    end
  end
end

# Usage
stats = QueryAnalyzer.analyze(
  User.joins(:orders).where("orders.total > ?", 1000)
)

Connection-Level Plan Analysis accesses plans for any SQL statement, not just ActiveRecord relations.

# Direct SQL with EXPLAIN
sql = <<-SQL
  SELECT users.*, COUNT(orders.id) as order_count
  FROM users
  LEFT JOIN orders ON orders.user_id = users.id
  GROUP BY users.id
  HAVING COUNT(orders.id) > 5
SQL

connection = ActiveRecord::Base.connection
plan = connection.execute("EXPLAIN ANALYZE #{sql}").to_a

Plan Comparison helps evaluate optimization efforts by running queries before and after index changes.

class PlanComparator
  def compare(relation)
    before_plan = capture_plan(relation)
    
    yield  # Make schema changes (add indexes, etc.)
    
    after_plan = capture_plan(relation)
    
    {
      before: before_plan,
      after: after_plan,
      improvement: calculate_improvement(before_plan, after_plan)
    }
  end
  
  private
  
  def capture_plan(relation)
    plan_text = relation.explain(:analyze)
    {
      text: plan_text,
      execution_time: extract_time(plan_text),
      operations: extract_operations(plan_text)
    }
  end
  
  def calculate_improvement(before_plan, after_plan)
    before_time = before_plan[:execution_time]
    after_time = after_plan[:execution_time]
    
    return nil unless before_time && after_time
    
    ((before_time - after_time) / before_time * 100).round(2)
  end
  
  def extract_time(plan)
    match = plan.match(/Execution Time: ([\d.]+) ms/)
    match ? match[1].to_f : nil
  end
  
  def extract_operations(plan)
    # Node lines carry a cost estimate; detail lines like "Sort Key:" do not
    plan.each_line.filter_map do |line|
      line[/(?:->\s*)?([A-Za-z][A-Za-z ]*?)\s+\(cost=/, 1]
    end
  end
end

Database-Specific Adapters expose plan functionality through adapter-specific methods. The pg gem for PostgreSQL provides direct access to plan information.

require 'pg'

conn = PG.connect(dbname: 'myapp_production')

# Get plan without execution
result = conn.exec("EXPLAIN SELECT * FROM users WHERE active = true")
puts result.values

# Get plan with execution
result = conn.exec("EXPLAIN ANALYZE SELECT * FROM users WHERE active = true")

# JSON format for programmatic processing
result = conn.exec(
  "EXPLAIN (ANALYZE, FORMAT JSON) SELECT * FROM users WHERE active = true"
)
plan_json = JSON.parse(result.getvalue(0, 0))
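
The JSON format nests each child node under its parent's "Plans" key, which makes programmatic traversal straightforward. A sketch of a recursive walker; the sample hash mimics the structure a parsed plan has:

```ruby
# Walks a parsed JSON plan; child nodes nest under "Plans"
def walk_plan(node, depth = 0, out = [])
  out << "#{'  ' * depth}#{node['Node Type']} (rows=#{node['Plan Rows']})"
  (node['Plans'] || []).each { |child| walk_plan(child, depth + 1, out) }
  out
end

# Mimics the shape of a parsed EXPLAIN (FORMAT JSON) node
sample = {
  'Node Type' => 'Hash Join', 'Plan Rows' => 500,
  'Plans' => [
    { 'Node Type' => 'Seq Scan', 'Plan Rows' => 10_000 },
    { 'Node Type' => 'Hash', 'Plan Rows' => 500,
      'Plans' => [{ 'Node Type' => 'Index Scan', 'Plan Rows' => 500 }] }
  ]
}

puts walk_plan(sample)
```

The same traversal works on real output once the "Plan" root is extracted from the parsed JSON array.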

Practical Examples

Examining real execution plans demonstrates how databases process queries and identifies optimization opportunities. These examples progress from simple lookups to complex multi-table joins.

Example 1: Index vs Sequential Scan

A common scenario involves finding users by email. The execution plan reveals whether the database uses an index.

# Without index on email column
plan = User.where(email: "user@example.com").explain

# PostgreSQL output (formatted):
# Seq Scan on users  (cost=0.00..2584.00 rows=1 width=524)
#   Filter: ((email)::text = 'user@example.com'::text)
# Planning Time: 0.123 ms
# Execution Time: 45.234 ms

# After adding index
ActiveRecord::Base.connection.add_index :users, :email

plan = User.where(email: "user@example.com").explain

# Index Scan using index_users_on_email on users
#   (cost=0.42..8.44 rows=1 width=524)
#   Index Cond: ((email)::text = 'user@example.com'::text)
# Planning Time: 0.089 ms
# Execution Time: 0.234 ms

The sequential scan cost of 2584.00 units reflects reading the entire table. The index scan cost of 8.44 units shows dramatic improvement. Execution time dropped from 45 ms to 0.2 ms, roughly a 200x speedup. The plan confirms the database uses the index through "Index Scan using index_users_on_email."

Example 2: Join Strategy Selection

Join strategy depends on table sizes and available indexes. This example shows how the optimizer chooses between nested loop and hash joins.

# Small users table (1000 rows), large orders table (1 million rows)
plan = User
  .joins(:orders)
  .where(users: { country: "US" })
  .explain(:analyze)

# Nested Loop  (cost=0.71..25234.45 rows=5000 width=1048)
#   -> Index Scan using idx_users_country on users
#        (cost=0.28..45.30 rows=100 width=524)
#        Index Cond: ((country)::text = 'US'::text)
#   -> Index Scan using idx_orders_user_id on orders
#        (cost=0.43..250.45 rows=50 width=524)
#        Index Cond: (user_id = users.id)

# Large users table (1 million rows), small orders table (10000 rows)
plan = User
  .joins(:orders)
  .where("orders.created_at > ?", 1.day.ago)
  .explain(:analyze)

# Hash Join  (cost=240.25..35678.45 rows=500 width=1048)
#   Hash Cond: (users.id = orders.user_id)
#   -> Seq Scan on users  (cost=0.00..23456.00 rows=1000000 width=524)
#   -> Hash  (cost=234.00..234.00 rows=500 width=524)
#        -> Seq Scan on orders  (cost=0.00..234.00 rows=500 width=524)
#             Filter: (created_at > '2024-10-06 00:00:00')

The nested loop appears when the outer table has few rows (100 users). The optimizer probes the orders table once per user, using the index for efficient lookups. The hash join appears when the date filter leaves a small orders set (500 rows). The database builds an in-memory hash table from the filtered orders, the smaller input, and probes it once per user row instead of repeating index lookups.

Example 3: Aggregation and Grouping

Aggregation queries show how databases handle grouping, sorting, and aggregation operations.

# Count orders per user
plan = Order
  .group(:user_id)
  .select("user_id, COUNT(*) as order_count")
  .having("COUNT(*) > 10")
  .explain(:analyze)

# HashAggregate  (cost=45234.52..45456.52 rows=222 width=16)
#   Group Key: user_id
#   Filter: (count(*) > 10)
#   -> Seq Scan on orders  (cost=0.00..34523.00 rows=1000000 width=8)
# Planning Time: 0.234 ms
# Execution Time: 234.567 ms

# With index on user_id for sorting
ActiveRecord::Base.connection.add_index :orders, :user_id

plan = Order
  .group(:user_id)
  .select("user_id, COUNT(*) as order_count")
  .having("COUNT(*) > 10")
  .explain(:analyze)

# GroupAggregate  (cost=0.42..78234.56 rows=222 width=16)
#   Group Key: user_id
#   Filter: (count(*) > 10)
#   -> Index Scan using index_orders_on_user_id on orders
#        (cost=0.42..56234.56 rows=1000000 width=8)

The hash aggregate builds an in-memory hash table for grouping. When the index exists, the optimizer switches to group aggregate, which processes pre-sorted input. Group aggregate uses less memory but may be slower if the sort operation is expensive.
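
The two strategies can be sketched in plain Ruby, over arrays rather than disk pages (illustrative, not the database's implementation):

```ruby
# HashAggregate: group with a hash table, input order irrelevant
def hash_aggregate(rows)
  rows.each_with_object(Hash.new(0)) { |r, h| h[r[:user_id]] += 1 }
end

# GroupAggregate: assumes sorted input, emits a group when the key changes
def group_aggregate(sorted_rows)
  sorted_rows.chunk_while { |a, b| a[:user_id] == b[:user_id] }
             .map { |group| [group.first[:user_id], group.size] }
end

rows = [{ user_id: 1 }, { user_id: 2 }, { user_id: 1 }]
p hash_aggregate(rows)                               # => {1=>2, 2=>1}
p group_aggregate(rows.sort_by { |r| r[:user_id] })  # => [[1, 2], [2, 1]]
```

The hash version holds every group key in memory at once; the sorted version holds only the current group, which is the memory trade-off described above.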

Example 4: Subquery vs Join

Subqueries can execute differently than equivalent joins. The plan reveals the actual execution strategy.

# Subquery approach - find users with recent orders
plan = User
  .where("id IN (SELECT user_id FROM orders WHERE created_at > ?)", 1.week.ago)
  .explain(:analyze)

# Hash Semi Join  (cost=2345.00..4567.89 rows=150 width=524)
#   Hash Cond: (users.id = orders.user_id)
#   -> Seq Scan on users  (cost=0.00..1234.00 rows=10000 width=524)
#   -> Hash  (cost=1023.45..1023.45 rows=150 width=8)
#        -> Bitmap Heap Scan on orders
#             (cost=45.67..1023.45 rows=150 width=8)
#             Recheck Cond: (created_at > '2024-09-30')
#             -> Bitmap Index Scan on idx_orders_created_at
#                  (cost=0.00..45.63 rows=150 width=0)

# Equivalent join approach
plan = User
  .joins(:orders)
  .where("orders.created_at > ?", 1.week.ago)
  .distinct
  .explain(:analyze)

# Similar plan - optimizer converts subquery to semi-join

Modern optimizers often transform subqueries into joins automatically. The plan shows a semi-join, which stops probing after the first match for each outer row. Both query forms produce essentially the same plan, demonstrating that the optimizer normalizes different SQL expressions into a common execution strategy.

Example 5: Complex Multi-Table Query

Real applications often join many tables. This example queries four tables (users, orders, order_items, and products) with multiple conditions; the plan excerpt abbreviates the products branch.

# Find active users with high-value orders and their associated products
plan = User
  .joins(orders: :order_items)
  .joins(orders: :products)
  .where(users: { active: true })
  .where("orders.total > ?", 1000)
  .where("products.category = ?", "electronics")
  .select("users.*, SUM(order_items.quantity) as total_items")
  .group("users.id")
  .explain(:analyze)

# GroupAggregate  (cost=45678.90..46789.12 rows=50 width=536)
#   Group Key: users.id
#   -> Sort  (cost=45678.90..45789.01 rows=440 width=528)
#         Sort Key: users.id
#         -> Hash Join  (cost=3456.78..45234.56 rows=440 width=528)
#               Hash Cond: (order_items.order_id = orders.id)
#               -> Seq Scan on order_items  (cost=0.00..23456.00 rows=500000)
#               -> Hash  (cost=3234.56..3234.56 rows=177 width=520)
#                     -> Hash Join  (cost=2345.67..3234.56 rows=177 width=520)
#                           Hash Cond: (orders.user_id = users.id)
#                           -> Index Scan using idx_orders_total on orders
#                                 (cost=0.43..678.90 rows=890 width=16)
#                                 Index Cond: (total > 1000)
#                           -> Hash  (cost=1234.00..1234.00 rows=5000)
#                                 -> Index Scan using idx_users_active on users
#                                       (cost=0.42..1234.00 rows=5000)
#                                       Index Cond: (active = true)

The plan reveals join order: users filtered by active status, joined with orders filtered by total amount, then joined with order_items, and finally grouped. The optimizer chose this order because filtering users and orders first produces small intermediate results. The final sort operation prepares data for group aggregation.

Performance Considerations

Query execution plans guide performance optimization by exposing inefficiencies. Analyzing plans systematically identifies bottlenecks and validates optimization attempts.

Sequential Scan Detection reveals missing indexes. Sequential scans read entire tables regardless of how many rows the query needs. For large tables, sequential scans cause significant I/O overhead.

# Detect sequential scans in plan
def has_sequential_scan?(plan_text)
  plan_text.include?("Seq Scan")
end

# Find problematic queries
queries = [
  User.where(email: "test@example.com"),
  Order.where("created_at > ?", 1.week.ago),
  Product.where(category: "electronics", available: true)
]

queries.each do |query|
  plan = query.explain
  if has_sequential_scan?(plan)
    puts "Sequential scan detected:"
    puts query.to_sql
    puts plan
  end
end

Add indexes for columns in WHERE clauses, JOIN conditions, and ORDER BY clauses. Verify index usage by checking the updated execution plan.

Join Order Optimization affects queries joining multiple tables. Poor join orders create large intermediate results that slow subsequent operations.

# Inefficient join order - large intermediate result
plan = Order
  .joins(:order_items)  # Joining 1M orders to 3M order_items builds a large intermediate set
  .joins(:user)
  .where(users: { country: "US" })  # Filters late
  .explain(:analyze)

# Improved query - filter first
plan = User
  .where(country: "US")  # Reduces to 100K users
  .joins(:orders)
  .joins(orders: :order_items)
  .explain(:analyze)

The optimizer usually chooses good join orders, but complex queries with many tables sometimes benefit from manual ordering through query restructuring. Forcing specific join orders requires database-specific hints or query rewriting.

Index Selection impacts query speed significantly. Composite indexes serve queries with multiple conditions better than separate single-column indexes.

# Query with multiple conditions
query = Order.where(user_id: 123, status: "completed")

# Single indexes on user_id and status
# Plan shows: Index Scan on idx_user_id, then filter on status
# Database must filter status after index lookup

# Composite index on (user_id, status)
ActiveRecord::Base.connection.add_index :orders, [:user_id, :status]

# Plan shows: Index Scan on idx_user_id_status
# Database uses index for both conditions

Index column order matters for composite indexes. Place high-selectivity columns first. An index on (user_id, status) works for queries filtering by user_id alone but not for queries filtering by status alone.
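
The prefix rule follows from how a composite index is physically sorted. A simulation with a sorted array of [user_id, status] entries:

```ruby
# A composite index on (user_id, status) is sorted by user_id first.
# Simulated here with a sorted array of [user_id, status] entries.
index = [[1, "new"], [2, "new"], [2, "paid"], [3, "paid"], [5, "new"]]

# Leading column: matches are contiguous, so the scan can seek and stop early
range = index.drop_while { |uid, _| uid < 2 }
             .take_while { |uid, _| uid == 2 }

# Trailing column alone: matches are scattered, every entry gets inspected
scattered = index.select { |_, status| status == "paid" }

p range      # => [[2, "new"], [2, "paid"]]
p scattered  # => [[2, "paid"], [3, "paid"]]
```

The contiguous range is why a query on user_id alone still benefits, while a query on status alone degrades to scanning the whole index.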

Sort Operation Elimination improves performance for queries with ORDER BY clauses. An index matching the sort order lets the database read rows already sorted, removing the explicit sort step.

# Query with sort
query = Order.order(created_at: :desc).limit(10)

# Without index - full table sort
# Plan shows:
# Limit
#   -> Sort (cost=high)
#        -> Seq Scan

# With index on created_at
ActiveRecord::Base.connection.add_index :orders, :created_at

# Plan shows:
# Limit
#   -> Index Scan Backward using idx_created_at
# No sort operation needed

Cardinality Estimate Accuracy determines plan quality. Inaccurate estimates cause the optimizer to choose suboptimal strategies.

# Update statistics to improve estimates
ActiveRecord::Base.connection.execute("ANALYZE users")
ActiveRecord::Base.connection.execute("ANALYZE orders")

# Check estimate accuracy
plan = User.joins(:orders).where(active: true).explain(:analyze)

# Compare estimated rows vs actual rows in plan output:
# estimated rows=1000 actual rows=50000  # Bad estimate
# estimated rows=48000 actual rows=50000  # Good estimate

Large discrepancies between estimated and actual row counts indicate stale statistics. Run ANALYZE after bulk data changes to update statistics.
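
A hedged helper for spotting such discrepancies by parsing EXPLAIN ANALYZE text; the regex assumes PostgreSQL's text format, and the sample below is canned output rather than a live query:

```ruby
# Flags nodes whose actual rows diverge from the estimate by more than
# a given factor (default 10x)
def estimate_errors(plan_text, factor: 10)
  plan_text.scan(/rows=(\d+).*?actual.*?rows=(\d+)/).filter_map do |est, act|
    est = est.to_f
    act = act.to_f
    next if est.zero? || act.zero?
    ratio = [est / act, act / est].max
    [est.to_i, act.to_i, ratio.round(1)] if ratio > factor
  end
end

sample = <<~PLAN
  Hash Join  (cost=2345.00..4567.89 rows=100 width=524) (actual time=1.2..80.5 rows=50000 loops=1)
    ->  Seq Scan on users  (cost=0.00..1234.00 rows=10000 width=524) (actual time=0.01..5.2 rows=10000 loops=1)
PLAN

p estimate_errors(sample)  # => [[100, 50000, 500.0]]
```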

Memory Usage affects hash joins and sort operations. Hash joins require enough memory to build hash tables. Insufficient memory forces disk-based operations.

# Large hash join
plan = User
  .joins(:orders)
  .where("orders.total > ?", 1000)
  .explain(:analyze, :buffers)

# Plan includes buffer statistics:
# Buffers: shared hit=1234 read=5678 written=234
# 
# High 'read' values indicate cache misses; high 'written' values on a
# read-only query usually mean operations spilled to disk
# Consider increasing work_mem for this query

PostgreSQL's work_mem setting controls per-operation memory. Increasing work_mem allows larger in-memory operations but risks out-of-memory errors when many queries run concurrently.

Partial Index Usage reduces index size and improves performance for queries with constant filters.

# Create partial index for active users only
ActiveRecord::Base.connection.execute(
  "CREATE INDEX idx_active_users_email ON users(email) WHERE active = true"
)

# Query matches partial index condition
query = User.where(active: true, email: "user@example.com")
plan = query.explain

# Plan shows: Index Scan using idx_active_users_email
# Smaller index means faster scans

# Query without active condition doesn't use partial index
query = User.where(email: "user@example.com")
plan = query.explain
# Plan shows: Seq Scan or different index

Covering Index Implementation eliminates table access by including all required columns in the index.

# Query needs id, email, and name
query = User.select(:id, :email, :name).where(email: "user@example.com")

# Regular index on email requires table lookup for name
plan = query.explain
# Index Scan using idx_email
# -> Heap Fetch to get name column

# Covering index holds every selected column (INCLUDE requires PostgreSQL 11+)
ActiveRecord::Base.connection.execute(
  "CREATE INDEX idx_users_email_covering ON users(email) INCLUDE (id, name)"
)

# Plan shows Index Only Scan
plan = query.explain
# Index Only Scan using idx_users_email_covering
# No table access needed

Query Parallelization splits work across multiple CPU cores for large table scans and aggregations.

# Large table scan with aggregation
# Note: .count executes immediately and returns an Integer, so keep a
# relation and explain the aggregate form instead
relation = Order.where("created_at > ?", 1.year.ago)
plan = relation.select("COUNT(*)").explain(:analyze)

# Single-threaded plan:
# Aggregate (cost=567890.12..567890.13 rows=1)
#   -> Seq Scan on orders

# Parallel plan (PostgreSQL with parallel_workers enabled):
# Finalize Aggregate (cost=234567.89..234567.90 rows=1)
#   -> Gather (workers=4)
#        -> Partial Aggregate
#             -> Parallel Seq Scan on orders

Parallel execution benefits large sequential scans but adds coordination overhead. Small queries run slower with parallelization due to worker startup costs.
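
The Gather / Partial Aggregate pattern can be sketched with Ruby threads: each worker aggregates its own slice, and a finalize step combines the partial results:

```ruby
# Parallel aggregation sketch - workers compute partial counts,
# the parent combines them (the Finalize Aggregate step)
rows = Array.new(100_000) { rand(100) }

partials = rows.each_slice(25_000).map do |slice|
  Thread.new { slice.count { |v| v > 90 } }  # one "worker" per slice
end.map(&:value)

total = partials.sum  # finalize: combine partial aggregates
puts total == rows.count { |v| v > 90 }  # => true
```

The thread spawn and result collection here stand in for the worker startup and Gather costs that make parallelism a net loss on small inputs.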

Tools & Ecosystem

Multiple tools visualize and analyze execution plans beyond basic command-line output. These tools parse plan structures, highlight performance issues, and compare plans.

PgHero is a Ruby gem that monitors PostgreSQL performance and analyzes query patterns. It identifies missing indexes, slow queries, and provides execution plan analysis.

# Gemfile
gem 'pghero'

# Configuration
# config/pghero.yml
databases:
  production:
    url: <%= ENV["DATABASE_URL"] %>

# Usage
PgHero.slow_queries  # Lists slowest queries
PgHero.missing_indexes  # Suggests beneficial indexes
PgHero.index_usage  # Shows which indexes are unused

# Analyze specific query
query = "SELECT * FROM users WHERE email = 'test@example.com'"
PgHero.explain(query)

Explain Visualizers transform text plans into graphical representations. Several web-based tools parse and visualize PostgreSQL plans:

  • explain.dalibo.com - Renders plans as tree diagrams with cost metrics
  • explain.depesz.com - Provides detailed plan analysis with performance insights
  • tatiyants.com/pev - JavaScript-based visualizer for plan exploration

# Generate JSON plan for visualizers
plan_json = ActiveRecord::Base.connection.execute(
  "EXPLAIN (ANALYZE, FORMAT JSON) #{query.to_sql}"
).first["QUERY PLAN"]

# Save to file for upload to visualizer
File.write("plan.json", plan_json)

Database-Specific Tools provide native plan analysis interfaces:

PostgreSQL pgAdmin includes a graphical query tool with visual EXPLAIN output. MySQL Workbench offers visual explain for MySQL queries. SQLite's command-line tool provides EXPLAIN QUERY PLAN output.

Bullet Gem detects N+1 queries in Ruby applications automatically during development and testing.

# Gemfile
gem 'bullet', group: :development

# Configuration
# config/environments/development.rb
config.after_initialize do
  Bullet.enable = true
  Bullet.alert = true
  Bullet.bullet_logger = true
  Bullet.console = true
  Bullet.rails_logger = true
end

# Bullet detects N+1 queries and suggests eager loading
# GET /users shows notification:
# USE eager loading detected
#   User => [:orders]
#   Add to query: .includes(:orders)

Query Performance Monitoring in production environments tracks slow queries and generates automated alerts.

# NewRelic integration
# Automatically captures slow query plans

# Scout APM integration
# config/scout_apm.yml
common: &defaults
  monitor: true
  capture_params: true

# Automatically tracks slow queries with execution plans

# Custom monitoring
class QueryMonitor
  def self.track(relation, threshold_ms: 100)
    start_time = Time.now
    result = relation.load
    duration = (Time.now - start_time) * 1000
    
    if duration > threshold_ms
      plan = relation.explain(:analyze)
      Rails.logger.warn "Slow query (#{duration.round(2)}ms): #{relation.to_sql}"
      Rails.logger.warn plan
      
      # Send to monitoring service
      # MetricsService.track_slow_query(...)
    end
    
    result
  end
end

# Usage
users = QueryMonitor.track(User.where(country: "US"))

Development Tools integrate plan analysis into development workflow:

# Add to ApplicationRecord for easy plan inspection
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true
  
  def self.explain_query
    # explain options such as :analyze require Rails 7.1+
    explain(:analyze, :buffers).tap { |plan| puts plan }
  end
end

# Usage in console
User.where(active: true).explain_query

# Create rake task for plan analysis
# lib/tasks/query_analysis.rake
namespace :db do
  desc "Analyze slow queries"
  task analyze_queries: :environment do
    queries = [
      User.joins(:orders).where("orders.total > ?", 1000),
      Product.where(available: true).order(price: :desc),
      Order.group(:user_id).having("COUNT(*) > 10")
    ]
    
    queries.each_with_index do |query, idx|
      puts "\n=== Query #{idx + 1} ==="
      puts query.to_sql
      puts "\n=== Plan ==="
      puts query.explain(:analyze)
    end
  end
end

Benchmark Tools measure query performance alongside plan analysis:

require 'benchmark'

def compare_queries(*queries)
  queries.each_with_index do |query, idx|
    puts "\n=== Query #{idx + 1} ==="
    puts query.to_sql
    
    time = Benchmark.realtime do
      query.load
    end
    
    puts "Execution time: #{(time * 1000).round(2)}ms"
    puts "\nPlan:"
    puts query.explain(:analyze)
    puts "=" * 80
  end
end

# Usage
compare_queries(
  User.joins(:orders).where("orders.total > ?", 1000),
  Order.where("total > ?", 1000).joins(:user)
)

Prosopite Gem detects N+1 queries in test suite and development:

# Gemfile
gem 'prosopite', group: :development

# Raise exception on N+1 detection in tests
# spec/rails_helper.rb
RSpec.configure do |config|
  config.before(:each) do
    Prosopite.scan
  end
  
  config.after(:each) do
    Prosopite.finish
  end
end

Common Pitfalls

Misinterpreting execution plans leads to incorrect optimization decisions. Understanding common mistakes prevents wasted effort and maintains query performance.

Misreading Cost Units causes confusion because cost values represent abstract units, not milliseconds or seconds. Comparing costs across queries provides relative performance indicators, but absolute cost values lack meaning without context.

# Two queries with different costs
query1 = User.where(active: true).explain
# Cost: 45.67..123.45

query2 = Order.where(status: "completed").explain
# Cost: 234.56..567.89

# WRONG: Query2 takes 5x longer than query1
# RIGHT: Query2 has higher estimated cost, but actual time depends on
#        data distribution, cache state, concurrent load, etc.

Cost estimates predict relative resource usage during planning. Actual execution time varies based on cache hits, I/O speed, CPU load, and memory availability. Always use EXPLAIN ANALYZE to measure real performance.

Ignoring Actual vs Estimated Rows overlooks critical plan quality indicators. Large discrepancies between estimated and actual row counts indicate statistics problems.

plan = User.joins(:orders).where("orders.total > ?", 1000).explain(:analyze)

# Plan output shows:
# Nested Loop (rows=100 actual rows=50000)
#   -> Index Scan on orders (rows=10 actual rows=5000)
#        Index Cond: (total > 1000)
#   -> Index Scan on users (rows=1 actual rows=1 loops=5000)

# Problem: Orders scan estimated 10 rows but found 5000
# Impact: Optimizer chose nested loop expecting few orders
# Solution: ANALYZE orders table to update statistics
ActiveRecord::Base.connection.execute("ANALYZE orders")

The optimizer chose a nested loop join expecting 10 orders, but 5000 orders exist. This mismatch causes poor performance. Regular statistics updates prevent estimate errors.

Assuming Index Usage Guarantees Speed ignores index overhead and table access costs. Indexes help selective queries but hurt queries accessing many rows.

# Query returning 90% of table rows
query = User.where("created_at > ?", 5.years.ago)

# Plan with index:
# Index Scan using idx_created_at (cost=0.42..45678.90)
#   (fetches 900000 of 1000000 rows via random heap access)

# Plan without index:
# Seq Scan (cost=0.00..12345.67)

# Sequential scan is faster - avoids random I/O for each row

Index scans add overhead: reading index blocks, following pointers to table blocks, and handling random I/O patterns. For queries returning most rows, sequential scans perform better by reading table blocks sequentially.
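The break-even point can be sketched with PostgreSQL's default page-cost constants (seq_page_cost = 1.0, random_page_cost = 4.0). This is a deliberately simplified model that ignores CPU costs and caching; the helper names and row counts are illustrative:

```ruby
SEQ_PAGE_COST    = 1.0  # PostgreSQL default seq_page_cost
RANDOM_PAGE_COST = 4.0  # PostgreSQL default random_page_cost

# Sequential scan: read every table page once, in order
def seq_scan_cost(table_pages)
  table_pages * SEQ_PAGE_COST
end

# Index scan, worst case: one random heap page fetch per matching row
def index_scan_cost(matching_rows)
  matching_rows * RANDOM_PAGE_COST
end

table_pages = 10_000  # ~1M rows at 100 rows per page

seq_cost       = seq_scan_cost(table_pages)  # => 10000.0
selective_cost = index_scan_cost(1_000)      # => 4000.0    (index wins)
broad_cost     = index_scan_cost(900_000)    # => 3600000.0 (seq scan wins)
```

With 900,000 matching rows the modeled index cost is 360x the sequential scan, which is why the planner ignores the index for broad predicates even when one exists.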

Neglecting Statistics Maintenance causes gradual plan quality degradation. Data distribution changes over time make existing statistics inaccurate.

# Application with growing orders table
# Initial statistics: 10K orders
# Current state: 10M orders

# Old plan (with outdated stats):
# Nested Loop
#   -> Seq Scan on users (rows=100)
#   -> Index Scan on orders (rows=100)

# After ANALYZE:
# Hash Join
#   -> Seq Scan on users (rows=100)
#   -> Hash
#        -> Seq Scan on orders (rows=1000000)

# Schedule regular statistics updates
# lib/tasks/maintenance.rake
namespace :db do
  desc "Update query optimizer statistics"
  task update_stats: :environment do
    tables = ActiveRecord::Base.connection.tables
    tables.each do |table|
      quoted = ActiveRecord::Base.connection.quote_table_name(table)
      ActiveRecord::Base.connection.execute("ANALYZE #{quoted}")
    end
  end
end

# Run nightly via cron or scheduler

Optimizing for Wrong Metrics focuses on plan costs instead of wall-clock time. Some queries show high costs but run quickly due to caching or parallel execution.

# Query with high cost but fast execution
plan = Product.where(available: true).explain(:analyze)

# Plan shows:
# Seq Scan on products (cost=0.00..5678.90 rows=100000)
# Planning Time: 1.234 ms
# Execution Time: 2.456 ms

# High cost, but fast execution due to cached data
# Optimization unnecessary despite high cost estimate

Execution time matters more than estimated cost. Measure actual performance under realistic conditions before optimizing.
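Ruby's standard Benchmark module gives that wall-clock measurement directly. A sketch; in a Rails app the block would wrap the actual query (e.g. `Product.where(available: true).load`), but a plain computation stands in here so the snippet runs anywhere:

```ruby
require "benchmark"

# Benchmark.realtime returns elapsed wall-clock seconds for the block
elapsed = Benchmark.realtime do
  (1..100_000).reduce(:+)  # stand-in for the query under test
end

puts format("%.2f ms", elapsed * 1000)
# Run it several times: the first (cold-cache) and later (warm-cache)
# timings often differ far more than any cost estimate suggests
```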

Forgetting Database-Specific Differences applies PostgreSQL optimization techniques to MySQL or SQLite. Different databases use different optimizers with varying capabilities.

# PostgreSQL CTE optimization
query = <<-SQL
  WITH recent_orders AS (
    SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '7 days'
  )
  SELECT * FROM recent_orders WHERE total > 1000
SQL

# PostgreSQL 12+ inlines the CTE and pushes the filter into it
# MySQL may materialize the CTE fully before filtering

# Check actual behavior for target database
ActiveRecord::Base.connection.execute("EXPLAIN #{query}")

Test query behavior on the target database. PostgreSQL, MySQL, and SQLite have different optimization capabilities, cost models, and plan formats.
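A small sketch of dispatching on the adapter before asking for a plan; `explain_sql_for` is a hypothetical helper, and the adapter strings match what `ActiveRecord::Base.connection.adapter_name` returns:

```ruby
# Build the EXPLAIN statement appropriate for the connected database
def explain_sql_for(adapter_name, sql)
  case adapter_name
  when /postgres/i then "EXPLAIN (ANALYZE, BUFFERS) #{sql}"
  when /mysql/i    then "EXPLAIN FORMAT=JSON #{sql}"
  when /sqlite/i   then "EXPLAIN QUERY PLAN #{sql}"
  else raise ArgumentError, "unsupported adapter: #{adapter_name}"
  end
end

pg_explain     = explain_sql_for("PostgreSQL", "SELECT 1")
sqlite_explain = explain_sql_for("SQLite", "SELECT 1")
# pg_explain     => "EXPLAIN (ANALYZE, BUFFERS) SELECT 1"
# sqlite_explain => "EXPLAIN QUERY PLAN SELECT 1"
```

Centralizing the dialect difference in one place keeps ad hoc plan inspection from silently running PostgreSQL syntax against another database.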

Misunderstanding Join Types treats LEFT JOIN queries as if they were INNER JOINs. LEFT JOINs cannot be optimized as aggressively because they must preserve every row from the left table.

# LEFT JOIN preserves all users
query = User.left_joins(:orders).where("orders.id IS NULL")
# Finds users without orders

plan = query.explain(:analyze)
# Hash Left Join with filter on NULL
# Cannot optimize to avoid scanning all users

# More efficient alternative using NOT EXISTS (planned as an anti-join)
query = User.where("NOT EXISTS (SELECT 1 FROM orders WHERE orders.user_id = users.id)")
# Optimizer can stop probing orders after the first match

# Rails 6.1+ shorthand that generates the LEFT JOIN form above
query = User.where.missing(:orders)

Overlooking Prepared Statement Caching causes repeated planning overhead. Prepared statements cache plans, but generic plans may perform poorly for skewed data distributions.

# Prepared statement with cached plan
# Prepared statement with cached plan (pg gem: prepare takes a name plus the SQL)
raw = ActiveRecord::Base.connection.raw_connection
raw.prepare("users_by_country", "SELECT * FROM users WHERE country = $1")

# First execution: country='US' (1% of data)
# Second execution: country='Other' (70% of data)

# Cached plan optimized for first execution performs poorly
# for second execution

# Solution: skip plan caching for highly variable queries.
# ActiveRecord prepares statements by default on PostgreSQL; setting
# prepared_statements: false in database.yml forces a fresh plan per execution
User.where(country: country).load

Ignoring Slow Queries in Transactions masks performance issues. Long-running transactions hold locks and block other queries.

# Slow query inside transaction
User.transaction do
  # Fast query
  user = User.find(id)
  
  # Slow query with locks held
  orders = Order.where(user_id: id).includes(:order_items).load
  
  # More operations while holding locks
  # ...
end

# Better: Minimize transaction scope
orders = Order.where(user_id: id).includes(:order_items).load

User.transaction do
  user = User.find(id)
  # Quick updates only
  user.update!(last_order_id: orders.first.id)
end

Reference

Common Plan Node Types

| Node Type | Description | When Used |
|-----------|-------------|-----------|
| Seq Scan | Sequential table scan reading all rows | Small tables or queries accessing most rows |
| Index Scan | B-tree index traversal with table lookups | Selective queries with indexed conditions |
| Index Only Scan | Index traversal without table access | Queries where index contains all needed columns |
| Bitmap Index Scan | Index scan creating row location bitmap | Multiple index conditions combined |
| Bitmap Heap Scan | Table access using row location bitmap | After bitmap index scan completion |
| Nested Loop | Iterate outer table, probe inner for each row | Small outer table with indexed inner table |
| Hash Join | Build hash table, probe with other input | Medium tables with equality conditions |
| Merge Join | Parallel scan of pre-sorted inputs | Large sorted tables or acceptable sort cost |
| Sort | Order rows by specified columns | ORDER BY, GROUP BY, or merge join preparation |
| HashAggregate | Group rows using in-memory hash table | GROUP BY with sufficient memory |
| GroupAggregate | Group pre-sorted rows | GROUP BY with sorted input |
| Limit | Return first N rows | LIMIT clause in query |
| Subquery Scan | Execute subquery and scan results | Subqueries in FROM clause |

Cost Components

| Component | Description | Impact |
|-----------|-------------|--------|
| Startup Cost | Work before first row output | Sorting, hash table building |
| Total Cost | Complete operation cost | All rows processed |
| Rows | Estimated result rows | Affects parent operation choices |
| Width | Average row size in bytes | Memory usage estimation |

Index Selection Criteria

| Condition | Index Used | Alternative |
|-----------|------------|-------------|
| Single column equality | Single-column index | Sequential scan for low selectivity |
| Multiple column equality | Composite index preferred | Multiple single-column indexes |
| Range condition | B-tree index | Sequential scan for wide ranges |
| Sorting requirement | Index matching ORDER BY | Explicit sort operation |
| Partial table access | Partial index if conditions match | Full index with filter |
| All columns in query | Covering index | Regular index with table access |

Join Strategy Selection

| Scenario | Strategy | Reason |
|----------|----------|--------|
| Small outer, indexed inner | Nested Loop | Efficient index probes |
| Medium tables, equality | Hash Join | Fast hash table lookup |
| Large sorted tables | Merge Join | Linear scan of both inputs |
| Small result set expected | Nested Loop | Early termination possible |
| No join indexes | Hash Join | Avoids full table scans per row |
| Memory constrained | Merge Join | Lower memory usage |

Plan Analysis Commands

| Database | Basic Plan | With Execution | JSON Format |
|----------|------------|----------------|-------------|
| PostgreSQL | EXPLAIN query | EXPLAIN ANALYZE query | EXPLAIN (FORMAT JSON) query |
| MySQL | EXPLAIN query | EXPLAIN ANALYZE query (8.0.18+) | EXPLAIN FORMAT=JSON query |
| SQLite | EXPLAIN QUERY PLAN query | N/A | N/A |
| Ruby/ActiveRecord | relation.explain | relation.explain(:analyze) (Rails 7.1+) | N/A - database specific |

Performance Indicators

| Indicator | Meaning | Action |
|-----------|---------|--------|
| Seq Scan on large table | Reading entire table | Add index on filter columns |
| High actual vs estimated rows | Statistics outdated | Run ANALYZE on table |
| Nested Loop with high cost | Join order issue | Consider query restructuring |
| Sort operation | Ordering data | Add index matching ORDER BY |
| Multiple Bitmap Heap Scans | Index combination | Consider composite index |
| Hash join with multiple batches | Memory pressure (spills to disk) | Increase work_mem setting |

Common Optimization Patterns

| Pattern | Implementation | Benefit |
|---------|----------------|---------|
| Covering Index | Index includes all query columns | Eliminates table access |
| Partial Index | Index with WHERE clause | Smaller index size |
| Composite Index | Multiple columns in one index | Serves multi-column queries |
| Index Column Order | High selectivity columns first | Better index filtering |
| Eager Loading | includes() or preload() | Prevents N+1 queries |
| Query Rewriting | Change query structure | Better plan selection |
| Statistics Update | Regular ANALYZE runs | Accurate cost estimates |
| Index-Only Scans | SELECT only indexed columns | Faster data access |