Overview
A query execution plan represents the sequence of operations a database management system performs to execute a SQL query. The database query optimizer generates these plans by evaluating multiple strategies and selecting the one with the lowest estimated cost. Execution plans expose the internal decision-making process of the database, showing which indexes the system uses, how it joins tables, and the order of operations.
Database systems convert declarative SQL statements into procedural execution strategies. When a query arrives, the optimizer analyzes available access paths, estimates the cost of different approaches, and constructs a plan tree. This plan tree consists of nodes representing operations like table scans, index lookups, joins, sorts, and aggregations. Each node receives input from child nodes and produces output for parent nodes.
Execution plans serve multiple purposes in database performance analysis. Developers examine plans to understand why queries perform poorly, identify missing indexes, detect inefficient join strategies, and verify that the optimizer makes correct assumptions about data distribution. Plans reveal the gap between how developers expect queries to execute and how they actually execute.
The format and detail level of execution plans vary across database systems. PostgreSQL provides detailed text-based plans with cost estimates and row counts. MySQL offers both traditional and JSON-formatted plans with additional metrics. SQLite produces simpler plans focused on index usage and join order. Despite these differences, all execution plans share common elements: operation types, execution order, cost estimates, and cardinality predictions.
-- Simple query example
SELECT users.name, COUNT(orders.id)
FROM users
LEFT JOIN orders ON users.id = orders.user_id
WHERE users.active = true
GROUP BY users.id, users.name;
-- Execution plan shows:
-- 1. Filter users by active flag
-- 2. Join with orders table
-- 3. Group results
-- 4. Count aggregation
Database optimizers balance multiple competing factors: minimizing disk I/O, reducing memory usage, parallelizing operations when possible, and avoiding expensive sort operations. The optimizer assigns costs to each operation based on statistics about table sizes, data distribution, and available indexes. These statistics directly influence plan quality, making regular statistics updates critical for optimal performance.
Key Principles
Query execution plans follow a tree structure where leaf nodes access base data and intermediate nodes transform or combine data. The database executes plans bottom-up, with child nodes producing rows that parent nodes consume. This hierarchical organization allows the optimizer to compose complex operations from simple building blocks.
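This pull-based, bottom-up flow can be sketched with a toy plan tree in Ruby. The node classes below are illustrative stand-ins for real executor operators, not any database's actual API:

```ruby
# Toy pull-based ("Volcano-style") plan tree: leaf nodes produce base
# rows; parent nodes consume child output, mirroring bottom-up execution.
class ScanNode
  def initialize(rows)
    @rows = rows
  end

  def each(&block)
    @rows.each(&block)
  end
end

class FilterNode
  def initialize(child, &predicate)
    @child = child
    @predicate = predicate
  end

  def each
    @child.each { |row| yield row if @predicate.call(row) }
  end
end

class ProjectNode
  def initialize(child, *columns)
    @child = child
    @columns = columns
  end

  def each
    @child.each { |row| yield row.slice(*@columns) }
  end
end

users = [
  { id: 1, name: "Ada", active: true },
  { id: 2, name: "Bob", active: false }
]

# Equivalent to: SELECT id, name FROM users WHERE active
plan = ProjectNode.new(
  FilterNode.new(ScanNode.new(users)) { |u| u[:active] },
  :id, :name
)

results = []
plan.each { |row| results << row }
results  # => [{ id: 1, name: "Ada" }]
```

Each node only knows how to pull rows from its child, which is exactly what lets the optimizer compose complex plans from simple building blocks.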
Cost-Based Optimization drives plan selection. The optimizer calculates estimated costs for different execution strategies, measured in abstract cost units that represent I/O operations, CPU cycles, and memory usage. The system chooses the plan with the lowest total cost. Cost estimation depends on table statistics, which track row counts, column value distributions, index selectivity, and data clustering. Outdated statistics lead to suboptimal plans because the optimizer makes decisions based on incorrect assumptions about data characteristics.
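The trade-off the cost model captures can be sketched with hypothetical arithmetic modeled on PostgreSQL's default planner constants (seq_page_cost = 1.0, random_page_cost = 4.0, cpu_tuple_cost = 0.01); real planners include many more terms:

```ruby
# Simplified cost arithmetic using PostgreSQL's default constants.
SEQ_PAGE_COST    = 1.0
RANDOM_PAGE_COST = 4.0
CPU_TUPLE_COST   = 0.01

def seq_scan_cost(pages:, rows:)
  pages * SEQ_PAGE_COST + rows * CPU_TUPLE_COST
end

def index_scan_cost(matching_rows:)
  # Pessimistic assumption: one random page fetch per matching row
  matching_rows * RANDOM_PAGE_COST + matching_rows * CPU_TUPLE_COST
end

seq_scan_cost(pages: 1_000, rows: 100_000).round(1)  # => 2000.0
index_scan_cost(matching_rows: 10).round(1)          # => 40.1 (selective: index wins)
index_scan_cost(matching_rows: 90_000).round(1)      # => 360900.0 (seq scan wins)
```

The same comparison explains why the planner picks a sequential scan for low-selectivity predicates even when an index exists.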
Access Methods determine how the database retrieves rows from tables. Sequential scans read entire tables from disk, appropriate for small tables or queries that access most rows. Index scans use B-tree or hash indexes to locate specific rows quickly, ideal for selective queries. Bitmap index scans combine multiple indexes before accessing table data, reducing random I/O. Index-only scans retrieve all required data from the index itself, avoiding table access entirely.
# Sequential scan - reads all rows
User.where("created_at > ?", 1.year.ago) # Without index on created_at
# Index scan - uses index to find specific rows
User.where(email: "user@example.com") # With unique index on email
# Index-only scan - all columns in index
User.select(:id, :email).where(email: "user@example.com") # Covered by index
Join Strategies combine data from multiple tables using different algorithms. Nested loop joins iterate over one table and probe the other table for each row, efficient when one table is small. Hash joins build an in-memory hash table from one input and probe it with rows from the other input, fast for medium-sized tables. Merge joins require sorted inputs and scan both tables in parallel, optimal for pre-sorted data or when sort cost is acceptable.
The database selects join strategies based on table sizes, available indexes, memory limits, and sort requirements. Small tables often use nested loops, medium tables benefit from hash joins, and large sorted tables work well with merge joins. The optimizer may choose different strategies for different joins in the same query.
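The two most common strategies can be sketched directly; these toy functions over arrays of hashes ignore memory limits, batching, and spilling:

```ruby
# Nested loop: for each outer row, probe the inner input.
def nested_loop_join(outer, inner, key)
  outer.flat_map do |o|
    inner.select { |i| i[key] == o[key] }.map { |i| o.merge(i) }
  end
end

# Hash join: hash the build input once, then one pass over the probe input.
def hash_join(build_input, probe_input, key)
  table = build_input.group_by { |row| row[key] }  # build phase
  probe_input.flat_map do |row|                    # probe phase
    (table[row[key]] || []).map { |b| b.merge(row) }
  end
end

users  = [{ user_id: 1, name: "Ada" }, { user_id: 2, name: "Bob" }]
orders = [{ user_id: 1, total: 50 }, { user_id: 1, total: 75 }]

nested = nested_loop_join(users, orders, :user_id)
hashed = hash_join(users, orders, :user_id)
nested == hashed  # => true - same rows, different work profile
```

Both produce the same rows; the difference is the work profile — the nested loop re-scans (or re-probes) the inner input per outer row, while the hash join touches each input exactly once.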
Join Order significantly impacts query performance. For N tables there are N! possible left-deep join orders, and more still when bushy plans are considered. The optimizer uses dynamic programming or heuristics to explore join orders and select the most efficient sequence. The system considers table sizes, join selectivity, and available indexes when ordering joins. Joining smaller result sets first reduces intermediate result sizes and overall query cost.
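A small illustration of the search space, using a crude made-up cost proxy (the sum of intermediate result sizes, pretending each join keeps 10% of its smaller input) rather than any real cost model:

```ruby
# Enumerate left-deep join orders for three tables and rank them.
sizes = { users: 1_000, orders: 1_000_000, items: 3_000_000 }

def order_cost(order, sizes)
  running = sizes[order.first]
  order.drop(1).sum do |table|
    # Hypothetical: each join keeps 10% of its smaller input
    running = [running, sizes[table]].min / 10.0
    running
  end
end

join_orders = sizes.keys.permutation.to_a
join_orders.length  # => 6 (3!); 10 tables would give 3,628,800

best = join_orders.min_by { |o| order_cost(o, sizes) }
best.first  # => :users - starting from the smallest table wins here
```

Even this toy search is exhaustive; real optimizers cap exhaustive enumeration at a table-count threshold and fall back to heuristics beyond it.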
Cardinality Estimation predicts how many rows each operation produces. The optimizer uses these estimates to choose appropriate algorithms and allocate memory. Inaccurate estimates cause poor plan choices, such as selecting nested loop joins when hash joins would perform better. Estimation errors compound through the plan tree, making accurate statistics essential for good plans.
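The arithmetic behind equality estimates can be sketched in a few lines, including the independence assumption that makes errors compound:

```ruby
# Equality selectivity is roughly 1 / n_distinct; the planner multiplies
# selectivities of ANDed predicates, assuming they are independent.
def estimate_rows(total_rows, n_distinct_per_predicate)
  selectivity = n_distinct_per_predicate.reduce(1.0) { |acc, nd| acc / nd }
  (total_rows * selectivity).round
end

# 1M orders, 10,000 distinct user_ids, 5 distinct statuses:
estimate_rows(1_000_000, [10_000])     # => 100
estimate_rows(1_000_000, [10_000, 5])  # => 20
# If status correlates with user_id, the true count can be far higher -
# exactly the kind of error that compounds up the plan tree.
```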
Plan Caching stores compiled plans to avoid repeated optimization overhead. Prepared statements and parameterized queries enable plan reuse across multiple executions. However, cached plans may become suboptimal when data distribution changes significantly. Some databases use adaptive optimization to regenerate plans when execution metrics deviate from estimates.
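The reuse pattern can be sketched with a toy cache keyed by SQL template; `compile` here is a stand-in for the expensive planning step, not a real database call:

```ruby
# Toy plan cache: parameterized templates let the compiled "plan" be reused.
class PlanCache
  attr_reader :compilations

  def initialize
    @cache = {}
    @compilations = 0
  end

  def execute(template, params)
    plan = (@cache[template] ||= compile(template))
    plan.call(params)
  end

  private

  def compile(template)
    @compilations += 1  # pretend this is costly optimization work
    ->(params) { format(template.gsub("?", "%s"), *params) }
  end
end

cache = PlanCache.new
cache.execute("SELECT * FROM users WHERE id = ?", [1])
cache.execute("SELECT * FROM users WHERE id = ?", [2])
cache.compilations  # => 1 - the second call reused the cached plan
```

The cache key is the template, not the bound values — which is also why a cached plan can go stale when the values start selecting a very different slice of the data.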
Parallel Execution splits operations across multiple worker processes to utilize available CPU cores. The optimizer considers parallelization when processing large tables or performing expensive operations. Parallel query execution requires sufficient memory for worker coordination and may not benefit small queries due to coordination overhead.
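The partial/finalize shape of parallel aggregation can be sketched with threads; on MRI the GVL serializes CPU work, so this models the plan structure (Gather over Partial Aggregates), not an actual speedup:

```ruby
# Workers compute partial counts over chunks, then a finalize step
# combines them - the shape of a parallel aggregate plan.
rows = (1..100_000).to_a
workers = 4
chunks = rows.each_slice(rows.size / workers).to_a

partials = chunks.map do |chunk|
  Thread.new { chunk.count(&:even?) }  # partial aggregate per worker
end.map(&:value)

total = partials.sum  # finalize aggregate
total  # => 50000
```

The coordination visible here (spawning workers, gathering partial results) is the overhead that makes parallel plans a net loss for small queries.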
Ruby Implementation
Ruby applications interact with query execution plans primarily through database adapters and ActiveRecord. Each database adapter provides methods to retrieve and display execution plans, with syntax varying by database system.
PostgreSQL with ActiveRecord uses the EXPLAIN command to generate execution plans. The explain method on ActiveRecord relations returns plan details as a string.
# Basic EXPLAIN
plan = User.where(email: "test@example.com").explain
puts plan
# EXPLAIN ANALYZE - actually executes query and shows real timing
plan = User.where(email: "test@example.com").explain(:analyze, :buffers)
puts plan
# Multiple conditions
plan = Order
.joins(:user)
.where(created_at: 1.week.ago..Time.now)
.where(users: { active: true })
.explain
# Shows join strategy, filter operations, index usage
The explain method accepts options that control plan detail level. The :analyze option executes the query and includes actual timing and row counts. The :buffers option shows buffer usage statistics. The :verbose option includes additional plan details.
# Detailed analysis with all options
User
.joins(:orders)
.group(:id)
.having("COUNT(orders.id) > ?", 5)
.explain(:analyze, :buffers, :verbose)
MySQL/MariaDB implementation differs slightly. ActiveRecord supports EXPLAIN for MySQL, but the format varies between MySQL versions.
# MySQL 5.6 and earlier - traditional format
Product.where("price > ?", 100).explain
# MySQL 5.7+ - JSON format provides more detail
# Access through raw SQL for JSON format
result = ActiveRecord::Base.connection.execute(
  "EXPLAIN FORMAT=JSON #{Product.where('price > 100').to_sql}"
)
row = result.first
# mysql2 returns the row as an array or a hash depending on adapter settings
json_plan = JSON.parse(row.is_a?(Hash) ? row.values.first : row.first)
SQLite provides simpler execution plans focused on query structure rather than detailed cost analysis.
# SQLite EXPLAIN
plan = User.where(active: true).explain
# Shows SQLite bytecode operations
# EXPLAIN QUERY PLAN - higher level view
connection = ActiveRecord::Base.connection
query_sql = User.where(active: true).to_sql
plan = connection.execute("EXPLAIN QUERY PLAN #{query_sql}")
Custom Plan Analysis requires parsing plan output and extracting metrics. Building helper methods simplifies repetitive analysis tasks.
class QueryAnalyzer
  def self.analyze(relation)
    plan_text = relation.explain(:analyze)
    {
      execution_time: extract_execution_time(plan_text),
      sequential_scans: count_sequential_scans(plan_text),
      index_scans: count_index_scans(plan_text),
      sort_operations: count_sorts(plan_text)
    }
  end

  def self.extract_execution_time(plan)
    match = plan.match(/Execution Time: ([\d.]+) ms/)
    match ? match[1].to_f : nil
  end

  def self.count_sequential_scans(plan)
    plan.scan(/Seq Scan/).count
  end

  def self.count_index_scans(plan)
    plan.scan(/Index Scan|Index Only Scan|Bitmap Index Scan/).count
  end

  def self.count_sorts(plan)
    plan.scan(/Sort/).count
  end

  # `private` does not apply to singleton methods; hide helpers explicitly
  private_class_method :extract_execution_time, :count_sequential_scans,
                       :count_index_scans, :count_sorts
end
# Usage
stats = QueryAnalyzer.analyze(
  User.joins(:orders).where("orders.total > ?", 1000)
)
Connection-Level Plan Analysis accesses plans for any SQL statement, not just ActiveRecord relations.
# Direct SQL with EXPLAIN
sql = <<-SQL
SELECT users.*, COUNT(orders.id) as order_count
FROM users
LEFT JOIN orders ON orders.user_id = users.id
GROUP BY users.id
HAVING COUNT(orders.id) > 5
SQL
connection = ActiveRecord::Base.connection
plan = connection.execute("EXPLAIN ANALYZE #{sql}").to_a
Plan Comparison helps evaluate optimization efforts by running queries before and after index changes.
class PlanComparator
  def compare(relation)
    before_plan = capture_plan(relation)
    yield # Make schema changes (add indexes, etc.)
    after_plan = capture_plan(relation)
    {
      before: before_plan,
      after: after_plan,
      improvement: calculate_improvement(before_plan, after_plan)
    }
  end

  private

  def capture_plan(relation)
    plan_text = relation.explain(:analyze)
    {
      text: plan_text,
      execution_time: extract_time(plan_text),
      operations: extract_operations(plan_text)
    }
  end

  def calculate_improvement(before_plan, after_plan)
    before_time = before_plan[:execution_time]
    after_time = after_plan[:execution_time]
    return nil unless before_time && after_time
    ((before_time - after_time) / before_time * 100).round(2)
  end

  def extract_time(plan)
    match = plan.match(/Execution Time: ([\d.]+) ms/)
    match ? match[1].to_f : nil
  end

  def extract_operations(plan)
    plan.scan(/^[^\s].*(?:Scan|Join|Sort|Hash|Aggregate)/).map(&:strip)
  end
end
Database-Specific Adapters expose plan functionality through adapter-specific methods. The pg gem for PostgreSQL provides direct access to plan information.
require 'pg'
conn = PG.connect(dbname: 'myapp_production')
# Get plan without execution
result = conn.exec("EXPLAIN SELECT * FROM users WHERE active = true")
puts result.values
# Get plan with execution
result = conn.exec("EXPLAIN ANALYZE SELECT * FROM users WHERE active = true")
# JSON format for programmatic processing
result = conn.exec(
"EXPLAIN (ANALYZE, FORMAT JSON) SELECT * FROM users WHERE active = true"
)
plan_json = JSON.parse(result.getvalue(0, 0))
Practical Examples
Examining real execution plans demonstrates how databases process queries and identifies optimization opportunities. These examples progress from simple lookups to complex multi-table joins.
Example 1: Index vs Sequential Scan
A common scenario involves finding users by email. The execution plan reveals whether the database uses an index.
# Without index on email column
plan = User.where(email: "user@example.com").explain
# PostgreSQL output (formatted):
# Seq Scan on users (cost=0.00..2584.00 rows=1 width=524)
# Filter: ((email)::text = 'user@example.com'::text)
# Planning Time: 0.123 ms
# Execution Time: 45.234 ms
# After adding index
ActiveRecord::Base.connection.add_index :users, :email
plan = User.where(email: "user@example.com").explain
# Index Scan using index_users_on_email on users
# (cost=0.42..8.44 rows=1 width=524)
# Index Cond: ((email)::text = 'user@example.com'::text)
# Planning Time: 0.089 ms
# Execution Time: 0.234 ms
The sequential scan cost of 2584.00 units reflects reading the entire table. The index scan cost of 8.44 units shows dramatic improvement. Execution time dropped from 45ms to 0.2ms, a 200x speedup. The plan confirms the database uses the index through "Index Scan using index_users_on_email."
Example 2: Join Strategy Selection
Join strategy depends on table sizes and available indexes. This example shows how the optimizer chooses between nested loop and hash joins.
# Small users table (1000 rows), large orders table (1 million rows)
plan = User
.joins(:orders)
.where(users: { country: "US" })
.explain(:analyze)
# Nested Loop (cost=0.71..25234.45 rows=5000 width=1048)
# -> Index Scan using idx_users_country on users
# (cost=0.28..45.30 rows=100 width=524)
# Index Cond: ((country)::text = 'US'::text)
# -> Index Scan using idx_orders_user_id on orders
# (cost=0.43..250.45 rows=50 width=524)
# Index Cond: (user_id = users.id)
# Large users table (1 million rows), small orders table (10000 rows)
plan = User
.joins(:orders)
.where("orders.created_at > ?", 1.day.ago)
.explain(:analyze)
# Hash Join (cost=452.00..35678.45 rows=500 width=1048)
# Hash Cond: (orders.user_id = users.id)
# -> Seq Scan on orders (cost=0.00..234.00 rows=500 width=524)
# Filter: (created_at > '2024-10-06 00:00:00')
# -> Hash (cost=234.00..234.00 rows=10000 width=524)
# -> Seq Scan on users (cost=0.00..234.00 rows=10000 width=524)
The nested loop appears when the outer table has few rows (100 users). The optimizer probes the orders table once per user, using the index for efficient lookups. The hash join appears when filtering produces a small orders set (500 rows). The database builds a hash table from the larger users table and probes it for each order.
Example 3: Aggregation and Grouping
Aggregation queries show how databases handle grouping, sorting, and aggregation operations.
# Count orders per user
plan = Order
.group(:user_id)
.select("user_id, COUNT(*) as order_count")
.having("COUNT(*) > 10")
.explain(:analyze)
# HashAggregate (cost=45234.52..45456.52 rows=222 width=16)
# Group Key: user_id
# Filter: (count(*) > 10)
# -> Seq Scan on orders (cost=0.00..34523.00 rows=1000000 width=8)
# Planning Time: 0.234 ms
# Execution Time: 234.567 ms
# With index on user_id for sorting
ActiveRecord::Base.connection.add_index :orders, :user_id
plan = Order
.group(:user_id)
.select("user_id, COUNT(*) as order_count")
.having("COUNT(*) > 10")
.explain(:analyze)
# GroupAggregate (cost=0.42..78234.56 rows=222 width=16)
# Group Key: user_id
# Filter: (count(*) > 10)
# -> Index Scan using index_orders_on_user_id on orders
# (cost=0.42..56234.56 rows=1000000 width=8)
The hash aggregate builds an in-memory hash table for grouping. When the index exists, the optimizer switches to group aggregate, which processes pre-sorted input. Group aggregate uses less memory but may be slower if the sort operation is expensive.
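The two strategies can be sketched over plain Ruby arrays to show the memory/sort trade-off; the row data here is invented for illustration:

```ruby
# HashAggregate vs GroupAggregate over an array of order rows.
order_rows = [{ user_id: 2 }, { user_id: 1 }, { user_id: 2 }, { user_id: 1 }]

# HashAggregate: one pass, memory proportional to distinct groups.
hash_counts = Hash.new(0)
order_rows.each { |row| hash_counts[row[:user_id]] += 1 }

# GroupAggregate: sort first (or read in index order), then count runs -
# little memory per group, but pays for the sort when input is unsorted.
group_counts = order_rows
  .sort_by { |row| row[:user_id] }
  .chunk_while { |a, b| a[:user_id] == b[:user_id] }
  .to_h { |run| [run.first[:user_id], run.size] }

hash_counts == group_counts  # => true
```

An index on the grouping column is what removes the sort step and makes the GroupAggregate path attractive.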
Example 4: Subquery vs Join
Subqueries can execute differently than equivalent joins. The plan reveals the actual execution strategy.
# Subquery approach - find users with recent orders
plan = User
.where("id IN (SELECT user_id FROM orders WHERE created_at > ?)", 1.week.ago)
.explain(:analyze)
# Hash Semi Join (cost=2345.00..4567.89 rows=150 width=524)
# Hash Cond: (users.id = orders.user_id)
# -> Seq Scan on users (cost=0.00..1234.00 rows=10000 width=524)
# -> Hash (cost=1023.45..1023.45 rows=150 width=8)
# -> Bitmap Heap Scan on orders
# (cost=45.67..1023.45 rows=150 width=8)
# Recheck Cond: (created_at > '2024-09-30')
# -> Bitmap Index Scan on idx_orders_created_at
# (cost=0.00..45.63 rows=150 width=0)
# Equivalent join approach
plan = User
.joins(:orders)
.where("orders.created_at > ?", 1.week.ago)
.distinct
.explain(:analyze)
# Similar plan - optimizer converts subquery to semi-join
Modern optimizers often transform subqueries into joins automatically. The plan shows a semi-join, which stops searching after finding the first match. Both query forms produce identical plans, demonstrating that the optimizer normalizes different SQL expressions to the same execution strategy.
Example 5: Complex Multi-Table Query
Real applications often join many tables. This example shows a query joining four tables with multiple conditions.
# Find active users with high-value orders and their associated products
plan = User
.joins(orders: [:order_items, :products])
.where(users: { active: true })
.where("orders.total > ?", 1000)
.where("products.category = ?", "electronics")
.select("users.*, SUM(order_items.quantity) as total_items")
.group("users.id")
.explain(:analyze)
# GroupAggregate (cost=45678.90..46789.12 rows=50 width=536)
# Group Key: users.id
# -> Sort (cost=45678.90..45789.01 rows=440 width=528)
# Sort Key: users.id
# -> Hash Join (cost=3456.78..45234.56 rows=440 width=528)
# Hash Cond: (order_items.order_id = orders.id)
# -> Seq Scan on order_items (cost=0.00..23456.00 rows=500000)
# -> Hash (cost=3234.56..3234.56 rows=177 width=520)
# -> Hash Join (cost=2345.67..3234.56 rows=177 width=520)
# Hash Cond: (orders.user_id = users.id)
# -> Index Scan using idx_orders_total on orders
# (cost=0.43..678.90 rows=890 width=16)
# Index Cond: (total > 1000)
# -> Hash (cost=1234.00..1234.00 rows=5000)
# -> Index Scan using idx_users_active on users
# (cost=0.42..1234.00 rows=5000)
# Index Cond: (active = true)
The plan reveals join order: users filtered by active status, joined with orders filtered by total amount, then joined with order_items, and finally grouped. The optimizer chose this order because filtering users and orders first produces small intermediate results. The final sort operation prepares data for group aggregation.
Performance Considerations
Query execution plans guide performance optimization by exposing inefficiencies. Analyzing plans systematically identifies bottlenecks and validates optimization attempts.
Sequential Scan Detection reveals missing indexes. Sequential scans read entire tables regardless of how many rows the query needs. For large tables, sequential scans cause significant I/O overhead.
# Detect sequential scans in plan
def has_sequential_scan?(plan_text)
  plan_text.include?("Seq Scan")
end
# Find problematic queries
queries = [
User.where(email: "test@example.com"),
Order.where("created_at > ?", 1.week.ago),
Product.where(category: "electronics", available: true)
]
queries.each do |query|
plan = query.explain
if has_sequential_scan?(plan)
puts "Sequential scan detected:"
puts query.to_sql
puts plan
end
end
Add indexes for columns in WHERE clauses, JOIN conditions, and ORDER BY clauses. Verify index usage by checking the updated execution plan.
Join Order Optimization affects queries joining multiple tables. Poor join orders create large intermediate results that slow subsequent operations.
# Inefficient join order - large intermediate result
plan = Order
.joins(:order_items) # Joins 1M orders to 3M order_items first - huge intermediate set
.joins(:user)
.where(users: { country: "US" }) # Filters late
.explain(:analyze)
# Improved query - filter first
plan = User
.where(country: "US") # Reduces to 100K users
.joins(:orders)
.joins(orders: :order_items)
.explain(:analyze)
The optimizer usually chooses good join orders, but complex queries with many tables sometimes benefit from manual ordering through query restructuring. Forcing specific join orders requires database-specific hints or query rewriting.
Index Selection impacts query speed significantly. Composite indexes serve queries with multiple conditions better than separate single-column indexes.
# Query with multiple conditions
query = Order.where(user_id: 123, status: "completed")
# Single indexes on user_id and status
# Plan shows: Index Scan on idx_user_id, then filter on status
# Database must filter status after index lookup
# Composite index on (user_id, status)
ActiveRecord::Base.connection.add_index :orders, [:user_id, :status]
# Plan shows: Index Scan on idx_user_id_status
# Database uses index for both conditions
Index column order matters for composite indexes. Place high-selectivity columns first. An index on (user_id, status) works for queries filtering by user_id alone but not for queries filtering by status alone.
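The leftmost-prefix rule can be illustrated with a sorted array standing in for the index key order (a sketch, not actual B-tree layout):

```ruby
# A composite index behaves like rows sorted by the full key tuple;
# seeks only work on a leftmost prefix of that tuple.
index = [
  [100, "completed"], [100, "pending"],
  [123, "completed"], [123, "pending"],
  [200, "completed"]
].sort

# user_id = 123: matches form one contiguous run, so a range seek works
user_positions = index.each_index.select { |i| index[i][0] == 123 }
user_positions  # => [2, 3] - contiguous

# status = "pending" alone: matches are scattered through the key order,
# so the whole index (or table) must be scanned
status_positions = index.each_index.select { |i| index[i][1] == "pending" }
status_positions  # => [1, 3] - not contiguous
```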
Sort Operation Elimination improves performance for queries with ORDER BY clauses. Indexes that match the sort order enable index-only scans without explicit sorting.
# Query with sort
query = Order.order(created_at: :desc).limit(10)
# Without index - full table sort
# Plan shows:
# Limit
# -> Sort (cost=high)
# -> Seq Scan
# With index on created_at
ActiveRecord::Base.connection.add_index :orders, :created_at
# Plan shows:
# Limit
# -> Index Scan Backward using idx_created_at
# No sort operation needed
Cardinality Estimate Accuracy determines plan quality. Inaccurate estimates cause the optimizer to choose suboptimal strategies.
# Update statistics to improve estimates
ActiveRecord::Base.connection.execute("ANALYZE users")
ActiveRecord::Base.connection.execute("ANALYZE orders")
# Check estimate accuracy
plan = User.joins(:orders).where(active: true).explain(:analyze)
# Compare estimated rows vs actual rows in plan output:
# estimated rows=1000 actual rows=50000 # Bad estimate
# estimated rows=48000 actual rows=50000 # Good estimate
Large discrepancies between estimated and actual row counts indicate stale statistics. Run ANALYZE after bulk data changes to update statistics.
Memory Usage affects hash joins and sort operations. Hash joins require enough memory to build hash tables. Insufficient memory forces disk-based operations.
# Large hash join
plan = User
.joins(:orders)
.where("orders.total > ?", 1000)
.explain(:analyze, :buffers)
# Plan includes buffer statistics:
# Buffers: shared hit=1234 read=5678 written=234
#
# High 'read' and 'written' values indicate memory pressure
# Consider increasing work_mem for this query
PostgreSQL's work_mem setting controls per-operation memory. Increasing work_mem allows larger in-memory operations but risks out-of-memory errors when many queries run concurrently.
Partial Index Usage reduces index size and improves performance for queries with constant filters.
# Create partial index for active users only
ActiveRecord::Base.connection.execute(
"CREATE INDEX idx_active_users_email ON users(email) WHERE active = true"
)
# Query matches partial index condition
query = User.where(active: true, email: "user@example.com")
plan = query.explain
# Plan shows: Index Scan using idx_active_users_email
# Smaller index means faster scans
# Query without active condition doesn't use partial index
query = User.where(email: "user@example.com")
plan = query.explain
# Plan shows: Seq Scan or different index
Covering Index Implementation eliminates table access by including all required columns in the index.
# Query needs id, email, and name
query = User.select(:id, :email, :name).where(email: "user@example.com")
# Regular index on email requires table lookup for name
plan = query.explain
# Index Scan using idx_email
# -> Heap Fetch to get name column
# Covering index includes all needed columns
ActiveRecord::Base.connection.execute(
"CREATE INDEX idx_users_email_covering ON users(email) INCLUDE (name)"
)
# Plan shows Index Only Scan
plan = query.explain
# Index Only Scan using idx_users_email_covering
# No table access needed
Query Parallelization splits work across multiple CPU cores for large table scans and aggregations.
# Large table scan with aggregation
relation = Order.where("created_at > ?", 1.year.ago)
plan = relation.select("COUNT(*)").explain(:analyze)
# Single-threaded plan:
# Aggregate (cost=567890.12..567890.13 rows=1)
# -> Seq Scan on orders
# Parallel plan (PostgreSQL with parallel_workers enabled):
# Finalize Aggregate (cost=234567.89..234567.90 rows=1)
# -> Gather (workers=4)
# -> Partial Aggregate
# -> Parallel Seq Scan on orders
Parallel execution benefits large sequential scans but adds coordination overhead. Small queries run slower with parallelization due to worker startup costs.
Tools & Ecosystem
Multiple tools visualize and analyze execution plans beyond basic command-line output. These tools parse plan structures, highlight performance issues, and compare plans.
PgHero is a Ruby gem that monitors PostgreSQL performance and analyzes query patterns. It identifies missing indexes, slow queries, and provides execution plan analysis.
# Gemfile
gem 'pghero'
# Configuration
# config/pghero.yml
databases:
  production:
    url: <%= ENV["DATABASE_URL"] %>
# Usage
PgHero.slow_queries # Lists slowest queries
PgHero.missing_indexes # Suggests beneficial indexes
PgHero.index_usage # Shows which indexes are unused
# Analyze specific query
query = "SELECT * FROM users WHERE email = 'test@example.com'"
PgHero.explain(query)
Explain Visualizers transform text plans into graphical representations. Several web-based tools parse and visualize PostgreSQL plans:
- explain.dalibo.com - Renders plans as tree diagrams with cost metrics
- explain.depesz.com - Provides detailed plan analysis with performance insights
- tatiyants.com/pev - JavaScript-based visualizer for plan exploration
# Generate JSON plan for visualizers
plan_json = ActiveRecord::Base.connection.execute(
"EXPLAIN (ANALYZE, FORMAT JSON) #{query.to_sql}"
).first["QUERY PLAN"]
# Save to file for upload to visualizer
File.write("plan.json", plan_json)
Database-Specific Tools provide native plan analysis interfaces:
PostgreSQL pgAdmin includes a graphical query tool with visual EXPLAIN output. MySQL Workbench offers visual explain for MySQL queries. SQLite's command-line tool provides EXPLAIN QUERY PLAN output.
Bullet Gem detects N+1 queries in Ruby applications automatically during development and testing.
# Gemfile
gem 'bullet', group: :development
# Configuration
# config/environments/development.rb
config.after_initialize do
  Bullet.enable = true
  Bullet.alert = true
  Bullet.bullet_logger = true
  Bullet.console = true
  Bullet.rails_logger = true
end
# Bullet detects N+1 queries and suggests eager loading
# GET /users shows notification:
# USE eager loading detected
# User => [:orders]
# Add to query: .includes(:orders)
Query Performance Monitoring in production environments tracks slow queries and generates automated alerts.
# NewRelic integration
# Automatically captures slow query plans
# Scout APM integration
# config/scout_apm.yml
common: &defaults
  monitor: true
  capture_params: true
# Automatically tracks slow queries with execution plans
# Custom monitoring
class QueryMonitor
  def self.track(relation, threshold_ms: 100)
    start_time = Time.now
    result = relation.load
    duration = (Time.now - start_time) * 1000
    if duration > threshold_ms
      plan = relation.explain(:analyze)
      Rails.logger.warn "Slow query (#{duration.round(2)}ms): #{relation.to_sql}"
      Rails.logger.warn plan
      # Send to monitoring service
      # MetricsService.track_slow_query(...)
    end
    result
  end
end
# Usage
users = QueryMonitor.track(User.where(country: "US"))
Development Tools integrate plan analysis into development workflow:
# Add to ApplicationRecord for easy plan inspection
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  def self.explain_query
    all.explain(:analyze, :buffers).tap { |plan| puts plan }
  end
end
# Usage in console
User.where(active: true).explain_query
# Create rake task for plan analysis
# lib/tasks/query_analysis.rake
namespace :db do
  desc "Analyze slow queries"
  task analyze_queries: :environment do
    queries = [
      User.joins(:orders).where("orders.total > ?", 1000),
      Product.where(available: true).order(price: :desc),
      Order.group(:user_id).having("COUNT(*) > 10")
    ]
    queries.each_with_index do |query, idx|
      puts "\n=== Query #{idx + 1} ==="
      puts query.to_sql
      puts "\n=== Plan ==="
      puts query.explain(:analyze)
    end
  end
end
Benchmark Tools measure query performance alongside plan analysis:
require 'benchmark'

def compare_queries(*queries)
  queries.each_with_index do |query, idx|
    puts "\n=== Query #{idx + 1} ==="
    puts query.to_sql
    time = Benchmark.realtime { query.load }
    puts "Execution time: #{(time * 1000).round(2)}ms"
    puts "\nPlan:"
    puts query.explain(:analyze)
    puts "=" * 80
  end
end
# Usage
compare_queries(
User.joins(:orders).where("orders.total > ?", 1000),
Order.where("total > ?", 1000).joins(:user)
)
Prosopite Gem detects N+1 queries in test suite and development:
# Gemfile
gem 'prosopite', group: [:development, :test]
# Raise exception on N+1 detection in tests
# spec/rails_helper.rb
RSpec.configure do |config|
  config.before(:each) do
    Prosopite.scan
  end
  config.after(:each) do
    Prosopite.finish
  end
end
Common Pitfalls
Misinterpreting execution plans leads to incorrect optimization decisions. Understanding common mistakes prevents wasted effort and maintains query performance.
Misreading Cost Units causes confusion because cost values represent abstract units, not milliseconds or seconds. Comparing costs across queries provides relative performance indicators, but absolute cost values lack meaning without context.
# Two queries with different costs
query1 = User.where(active: true).explain
# Cost: 45.67..123.45
query2 = Order.where(status: "completed").explain
# Cost: 234.56..567.89
# WRONG: Query2 takes 5x longer than query1
# RIGHT: Query2 has higher estimated cost, but actual time depends on
# data distribution, cache state, concurrent load, etc.
Cost estimates predict relative resource usage during planning. Actual execution time varies based on cache hits, I/O speed, CPU load, and memory availability. Always use EXPLAIN ANALYZE to measure real performance.
Ignoring Actual vs Estimated Rows overlooks critical plan quality indicators. Large discrepancies between estimated and actual row counts indicate statistics problems.
plan = User.joins(:orders).where("orders.total > ?", 1000).explain(:analyze)
# Plan output shows:
# Hash Join (rows=100 actual rows=50000)
# -> Seq Scan on users (rows=10000 actual rows=10000)
# -> Hash (rows=10 actual rows=5000)
# -> Index Scan on orders (rows=10 actual rows=5000)
# Problem: Orders scan estimated 10 rows but found 5000
# Impact: Optimizer built and sized the hash join for a handful of orders
# Solution: ANALYZE orders table to update statistics
ActiveRecord::Base.connection.execute("ANALYZE orders")
The optimizer sized the hash join expecting 10 orders, but 5000 exist. This mismatch causes poor performance. Regular statistics updates prevent estimate errors.
Assuming Index Usage Guarantees Speed ignores index overhead and table access costs. Indexes help selective queries but hurt queries accessing many rows.
# Query returning 90% of table rows
query = User.where("created_at > ?", 5.years.ago)
# Plan with index:
# Index Scan using idx_created_at (cost=0.42..45678.90)
# -> Heap Fetches: 900000 of 1000000
# Plan without index:
# Seq Scan (cost=0.00..12345.67)
# Sequential scan is faster - avoids random I/O for each row
Index scans add overhead: reading index blocks, following pointers to table blocks, and handling random I/O patterns. For queries returning most rows, sequential scans perform better by reading table blocks sequentially.
Neglecting Statistics Maintenance causes gradual plan quality degradation. Data distribution changes over time make existing statistics inaccurate.
# Application with growing orders table
# Initial statistics: 10K orders
# Current state: 10M orders
# Old plan (with outdated stats):
# Nested Loop
# -> Seq Scan on users (rows=100)
# -> Index Scan on orders (rows=100)
# After ANALYZE:
# Hash Join
# -> Seq Scan on users (rows=100)
# -> Hash
# -> Seq Scan on orders (rows=1000000)
# Schedule regular statistics updates
# lib/tasks/maintenance.rake
namespace :db do
  desc "Update query optimizer statistics"
  task update_stats: :environment do
    ActiveRecord::Base.connection.tables.each do |table|
      ActiveRecord::Base.connection.execute("ANALYZE #{table}")
    end
  end
end
# Run nightly via cron or scheduler
Optimizing for Wrong Metrics focuses on plan costs instead of wall-clock time. Some queries show high costs but run quickly due to caching or parallel execution.
# Query with high cost but fast execution
plan = Product.where(available: true).explain(:analyze)
# Plan shows:
# Seq Scan on products (cost=0.00..5678.90 rows=100000)
# Planning Time: 1.234 ms
# Execution Time: 2.456 ms
# High cost, but fast execution due to cached data
# Optimization unnecessary despite high cost estimate
Execution time matters more than estimated cost. Measure actual performance under realistic conditions before optimizing.
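A minimal helper (hypothetical, not part of Rails) makes that measurement repeatable: run the query block several times and report the median wall-clock time, so a single cold-cache outlier does not dominate the result:

```ruby
require "benchmark"

# Median wall-clock time of a query block, in milliseconds.
def median_runtime_ms(iterations: 5, &query)
  times = Array.new(iterations) { Benchmark.realtime(&query) }
  (times.sort[iterations / 2] * 1000).round(2)
end

# Usage against a real relation:
#   median_runtime_ms { Product.where(available: true).load }
```

Measuring several iterations also surfaces the warm-cache steady state, which is usually what production traffic experiences.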
Forgetting Database-Specific Differences applies PostgreSQL optimization techniques to MySQL or SQLite. Different databases use different optimizers with varying capabilities.
# PostgreSQL CTE optimization
query = <<-SQL
WITH recent_orders AS (
SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '7 days'
)
SELECT * FROM recent_orders WHERE total > 1000
SQL
# PostgreSQL 12+ inlines the CTE and pushes the filter down;
# versions before 12 always materialize it first
# MySQL 8.0 may merge or materialize the CTE depending on the query
# Check actual behavior for target database
ActiveRecord::Base.connection.execute("EXPLAIN #{query}")
Test query behavior on the target database. PostgreSQL, MySQL, and SQLite have different optimization capabilities, cost models, and plan formats.
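On PostgreSQL 12+ the materialization choice can even be stated explicitly in the query. The keywords below are PostgreSQL-specific (per the PostgreSQL documentation); MySQL and SQLite will reject them:

```ruby
# PostgreSQL 12+ CTE materialization hints (PostgreSQL-only syntax).
inline = <<~SQL
  WITH recent_orders AS NOT MATERIALIZED (
    SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '7 days'
  )
  SELECT * FROM recent_orders WHERE total > 1000
SQL

# Force the opposite behavior for comparison
materialized = inline.sub("NOT MATERIALIZED", "MATERIALIZED")

# Compare the two plans on the target server:
#   ActiveRecord::Base.connection.execute("EXPLAIN #{inline}")
#   ActiveRecord::Base.connection.execute("EXPLAIN #{materialized}")
```

Comparing both plans on the actual target database is the only reliable way to know which form the optimizer prefers for a given data distribution.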
Misunderstanding Join Types confuses INNER JOIN behavior with LEFT JOIN performance. LEFT JOINs cannot be optimized as aggressively because they must return all left table rows.
# LEFT JOIN preserves all users
query = User.left_joins(:orders).where("orders.id IS NULL")
# Finds users without orders
plan = query.explain(:analyze)
# Hash Left Join with filter on NULL
# Cannot optimize to avoid scanning all users
# More efficient alternative using NOT EXISTS
query = User.where("NOT EXISTS (SELECT 1 FROM orders WHERE user_id = users.id)")
# Optimizer can stop searching after first match
Overlooking Prepared Statement Caching causes repeated planning overhead. Prepared statements cache plans, but generic plans may perform poorly for skewed data distributions.
# Prepared statement with cached plan (PostgreSQL via the pg gem;
# PG::Connection#prepare takes a statement name and the SQL)
raw = ActiveRecord::Base.connection.raw_connection
raw.prepare("users_by_country", "SELECT * FROM users WHERE country = $1")
# Execute with: raw.exec_prepared("users_by_country", ["US"])
# First execution: country='US' (1% of data)
# Second execution: country='Other' (70% of data)
# Cached plan optimized for first execution performs poorly
# for second execution
# Solutions: disable statement caching for highly variable queries
# (set prepared_statements: false in database.yml), or on PostgreSQL 12+
# SET plan_cache_mode = force_custom_plan to re-plan with actual values
User.where(country: country).load
Ignoring Slow Queries in Transactions masks performance issues. Long-running transactions hold locks and block other queries.
# Slow query inside transaction
User.transaction do
# Fast query
user = User.find(id)
# Slow query with locks held
orders = Order.where(user_id: id).includes(:order_items).load
# More operations while holding locks
# ...
end
# Better: Minimize transaction scope
orders = Order.where(user_id: id).includes(:order_items).load
User.transaction do
user = User.find(id)
# Quick updates only
user.update!(last_order_id: orders.first.id)
end
Reference
Common Plan Node Types
| Node Type | Description | When Used |
|---|---|---|
| Seq Scan | Sequential table scan reading all rows | Small tables or queries accessing most rows |
| Index Scan | B-tree index traversal with table lookups | Selective queries with indexed conditions |
| Index Only Scan | Index traversal without table access | Queries where index contains all needed columns |
| Bitmap Index Scan | Index scan creating row location bitmap | Multiple index conditions combined |
| Bitmap Heap Scan | Table access using row location bitmap | After bitmap index scan completion |
| Nested Loop | Iterate outer table, probe inner for each row | Small outer table with indexed inner table |
| Hash Join | Build hash table, probe with other input | Medium tables with equality conditions |
| Merge Join | Parallel scan of pre-sorted inputs | Large sorted tables or acceptable sort cost |
| Sort | Order rows by specified columns | ORDER BY, GROUP BY, or merge join preparation |
| HashAggregate | Group rows using in-memory hash table | GROUP BY with sufficient memory |
| GroupAggregate | Group pre-sorted rows | GROUP BY with sorted input |
| Limit | Return first N rows | LIMIT clause in query |
| Subquery Scan | Execute subquery and scan results | Subqueries in FROM clause |
Cost Components
| Component | Description | Impact |
|---|---|---|
| Startup Cost | Work before first row output | Sorting, hash table building |
| Total Cost | Complete operation cost | All rows processed |
| Rows | Estimated result rows | Affects parent operation choices |
| Width | Average row size in bytes | Memory usage estimation |
Index Selection Criteria
| Condition | Index Used | Alternative |
|---|---|---|
| Single column equality | Single-column index | Sequential scan for low selectivity |
| Multiple column equality | Composite index preferred | Multiple single-column indexes |
| Range condition | B-tree index | Sequential scan for wide ranges |
| Sorting requirement | Index matching ORDER BY | Explicit sort operation |
| Partial table access | Partial index if conditions match | Full index with filter |
| All columns in query | Covering index | Regular index with table access |
Join Strategy Selection
| Scenario | Strategy | Reason |
|---|---|---|
| Small outer, indexed inner | Nested Loop | Efficient index probes |
| Medium tables, equality | Hash Join | Fast hash table lookup |
| Large sorted tables | Merge Join | Linear scan of both inputs |
| Small result set expected | Nested Loop | Early termination possible |
| No join indexes | Hash Join | Avoids full table scans per row |
| Memory constrained | Merge Join | Lower memory usage |
Plan Analysis Commands
| Database | Basic Plan | With Execution | JSON Format |
|---|---|---|---|
| PostgreSQL | EXPLAIN query | EXPLAIN ANALYZE query | EXPLAIN (FORMAT JSON) query |
| MySQL | EXPLAIN query | EXPLAIN ANALYZE query (8.0.18+) | EXPLAIN FORMAT=JSON query |
| SQLite | EXPLAIN QUERY PLAN query | N/A | N/A |
| Ruby/ActiveRecord | relation.explain | relation.explain(:analyze) (Rails 7.1+) | N/A - database specific |
Performance Indicators
| Indicator | Meaning | Action |
|---|---|---|
| Seq Scan on large table | Reading entire table | Add index on filter columns |
| High actual vs estimated rows | Statistics outdated | Run ANALYZE on table |
| Nested Loop with high cost | Join order issue | Consider query restructuring |
| Sort operation | Ordering data | Add index matching ORDER BY |
| Multiple Bitmap Heap Scans | Index combination | Consider composite index |
| Hash with high buckets | Memory pressure | Increase work_mem setting |
Common Optimization Patterns
| Pattern | Implementation | Benefit |
|---|---|---|
| Covering Index | Index includes all query columns | Eliminates table access |
| Partial Index | Index with WHERE clause | Smaller index size |
| Composite Index | Multiple columns in one index | Serves multi-column queries |
| Index Column Order | High selectivity columns first | Better index filtering |
| Eager Loading | includes() or preload() | Prevents N+1 queries |
| Query Rewriting | Change query structure | Better plan selection |
| Statistics Update | Regular ANALYZE runs | Accurate cost estimates |
| Index-Only Scans | SELECT only indexed columns | Faster data access |