## Overview
Performance optimization in Ruby involves identifying bottlenecks and applying targeted improvements to reduce execution time and memory consumption. Ruby provides multiple approaches for performance enhancement, including algorithmic improvements, method selection optimization, memory management, and leveraging built-in optimizations.
The Ruby interpreter includes several performance-focused features: method caching through inline caching, constant lookup optimization, and garbage collection tuning. Ruby 2.0 introduced copy-on-write friendly garbage collection, Ruby 2.6 added an experimental method-based JIT (MJIT), and Ruby 3.1 introduced YJIT, enabled with the `--yjit` flag.
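These features can be inspected at runtime. A minimal sketch; the `RubyVM::YJIT` constant exists only on Ruby builds (3.1+) compiled with YJIT support:

```ruby
# Inspect interpreter-level performance features at runtime
stats = GC.stat
puts "GC cycles so far: #{stats[:count]}"
puts "Objects allocated: #{stats[:total_allocated_objects]}"

# YJIT status (Ruby 3.1+ builds; enable with `ruby --yjit`)
if defined?(RubyVM::YJIT)
  puts "YJIT enabled: #{RubyVM::YJIT.enabled?}"
else
  puts "YJIT not available in this Ruby build"
end
```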
Core performance optimization strategies center on profiling and measurement. The `Benchmark` module provides timing capabilities, while `GC.stat` offers garbage collection metrics. Third-party tools like `ruby-prof` and `memory_profiler` deliver detailed analysis.
```ruby
require 'benchmark'

# Basic performance measurement
result = Benchmark.measure do
  1_000_000.times { "string".upcase }
end

puts result.real
# => 0.123456
```
Ruby's performance characteristics differ significantly from compiled languages. Method dispatch involves dynamic lookup, blocks create closures, and automatic memory management introduces garbage collection overhead. Understanding these fundamentals guides optimization decisions.
```ruby
# Performance-conscious string building
slow_way = ""
1000.times { |i| slow_way += "item#{i}" }

fast_way = []
1000.times { |i| fast_way << "item#{i}" }
result = fast_way.join
```
The interpreter optimizes certain patterns automatically. Constant lookup caching reduces repeated constant resolution overhead. Method caching stores method dispatch results. String literals with the same content share memory when frozen.
```ruby
# Automatic optimization through interning
CONSTANT_VALUE = "shared_string".freeze
1000.times { puts CONSTANT_VALUE } # Same object referenced each time
```
## Basic Usage
Performance optimization begins with identifying bottlenecks through profiling. The `Benchmark` module provides fundamental timing capabilities for measuring code execution duration. Compare different approaches to determine the fastest implementation.
```ruby
require 'benchmark'

# Comparing string concatenation methods
Benchmark.bm(15) do |x|
  x.report("String +=") { 1000.times { str = ""; 100.times { str += "x" } } }
  x.report("Array join") { 1000.times { arr = []; 100.times { arr << "x" }; arr.join } }
  x.report("String <<") { 1000.times { str = ""; 100.times { str << "x" } } }
end
```
Method selection impacts performance significantly. Built-in methods typically outperform equivalent Ruby implementations. Native C extensions provide substantial speed improvements for computationally intensive operations.
```ruby
# Prefer built-in methods

# Slow: manual implementation
def slow_sum(array)
  total = 0
  array.each { |n| total += n }
  total
end

# Fast: built-in method
def fast_sum(array)
  array.sum
end

numbers = (1..10_000).to_a
# fast_sum executes 3-5x faster than slow_sum
```
Object allocation represents a major performance factor. Creating unnecessary objects increases garbage collection pressure and memory usage. Reuse objects when possible, prefer in-place operations, and avoid temporary object creation in loops.
```ruby
# Object allocation optimization: reuse buffers across calls
class DataProcessor
  def initialize
    @buffer = ""
    @temp_array = []
  end

  def process_items(items)
    @buffer.clear
    @temp_array.clear

    items.each do |item|
      @buffer << item.to_s
      @temp_array << @buffer.dup
    end

    @temp_array
  end
end
```
Block usage affects performance through closure creation and variable capture overhead. Blocks that capture many local variables create larger closures. Symbol-to-proc conversions (`&:method`) often perform better than equivalent block implementations.
```ruby
# Block performance optimization
numbers = (1..1000).to_a

# Slower: block capturing a local variable
multiplier = 2
result1 = numbers.map { |n| n * multiplier }

# Symbol-to-proc when a single method call is all you need
result2 = numbers.map(&:to_s)

# Faster: block with no variable capture
result3 = numbers.map { |n| n * 2 }
```
Hash access patterns influence performance through key comparison overhead. Symbol keys are faster than string keys because symbols compare by identity while strings compare by content. Integer keys provide the fastest access for numeric indices.
```ruby
# Hash key performance comparison
string_hash  = { "key1" => 1, "key2" => 2, "key3" => 3 }
symbol_hash  = { key1: 1, key2: 2, key3: 3 }
integer_hash = { 1 => 1, 2 => 2, 3 => 3 }

# Symbol access: fast identity comparison
value = symbol_hash[:key1]

# Integer access: fastest for numeric keys
value = integer_hash[1]

# String access: slower due to content comparison
value = string_hash["key1"]
```
## Performance & Memory
Memory optimization directly impacts performance through reduced garbage collection overhead and improved cache locality. Ruby's garbage collection uses a mark-and-sweep algorithm with generational collection for short-lived objects.
Monitor garbage collection behavior using `GC.stat` to identify memory pressure points. High allocation rates increase collection frequency, reducing overall performance. Track object allocation counts and memory usage patterns.
```ruby
# Memory profiling with GC statistics
def analyze_memory_usage
  GC.start
  initial_stats = GC.stat

  yield

  GC.start
  final_stats = GC.stat

  {
    objects_allocated: final_stats[:total_allocated_objects] - initial_stats[:total_allocated_objects],
    gc_collections: final_stats[:count] - initial_stats[:count],
    heap_slots: final_stats[:heap_available_slots]
  }
end

stats = analyze_memory_usage do
  1000.times { "string" + "concatenation" }
end

puts "Allocated #{stats[:objects_allocated]} objects"
puts "Triggered #{stats[:gc_collections]} GC cycles"
```
String optimization provides significant performance improvements. Concatenation with `+=` creates a new object on every call, while `<<` modifies the existing string in place. For many concatenations, collecting pieces in an array and calling `join` minimizes intermediate object creation.
```ruby
# String performance optimization techniques
require 'benchmark'
require 'stringio'

def string_concat_performance
  Benchmark.bm(20) do |x|
    x.report("String += (slow)") do
      result = ""
      1000.times { result += "segment" }
    end

    x.report("String << (fast)") do
      result = ""
      1000.times { result << "segment" }
    end

    x.report("Array join (fastest)") do
      segments = []
      1000.times { segments << "segment" }
      result = segments.join
    end

    x.report("StringIO (alternative)") do
      buffer = StringIO.new
      1000.times { buffer << "segment" }
      result = buffer.string
    end
  end
end
```
Array operations demonstrate significant performance variations based on access patterns and modification types. Index-based access performs consistently, while search operations scale with array size. In-place modifications avoid object creation overhead.
```ruby
# Array performance optimization patterns
class ArrayOptimizer
  def self.efficient_filtering(array, threshold)
    # Filter in place to avoid creating an intermediate array
    array.select! { |item| item > threshold }
    array
  end

  def self.batch_operations(array)
    # Single-pass computation reduces iteration overhead
    sum = 0
    max = array.first

    array.each do |item|
      sum += item
      max = item if item > max
    end

    { sum: sum, max: max }
  end

  def self.memory_efficient_mapping(array)
    # Lazy enumeration avoids materializing the full mapped array
    array.lazy.map { |item| expensive_operation(item) }.first(100)
  end

  # Note: a bare `private` does not apply to `def self.` methods;
  # use private_class_method instead
  private_class_method def self.expensive_operation(item)
    item * item + Math.sqrt(item)
  end
end
```
Hash optimization focuses on key selection and collision minimization. Consistent hash distributions improve access performance. Symbol keys avoid string comparison overhead, while proper hash functions reduce collision rates.
```ruby
# Hash performance optimization
class PerformantCache
  def initialize
    @symbol_cache = {}
    @integer_cache = {}
    @access_counts = Hash.new(0)
  end

  def fetch_by_symbol(key)
    @symbol_cache[key] ||= expensive_computation(key)
  end

  def fetch_by_integer(key)
    @integer_cache[key] ||= expensive_computation(key)
  end

  def track_access(key)
    @access_counts[key] += 1
  end

  def most_accessed(limit = 10)
    @access_counts.sort_by { |_, count| -count }.first(limit)
  end

  private

  def expensive_computation(key)
    # Simulate an expensive operation
    sleep(0.001)
    "computed_#{key}"
  end
end
```
Enumerable optimization requires understanding method chaining overhead and lazy evaluation benefits. Chain operations efficiently by minimizing intermediate object creation. Use lazy enumerators for large datasets or infinite sequences.
```ruby
# Enumerable performance patterns
class ChainProcessor
  def self.optimized_chain(data)
    # Lazy evaluation processes one element at a time and stops as soon
    # as 1000 results are collected; first(n) is eager on a lazy chain,
    # so no .force call is needed
    data
      .lazy
      .select { |item| item.valid? }
      .map { |item| item.transform }
      .reject { |item| item.empty? }
      .first(1000)
  end

  def self.inefficient_chain(data)
    # Each step materializes a full intermediate array
    data
      .select { |item| item.valid? }
      .map { |item| item.transform }
      .reject { |item| item.empty? }
      .first(1000)
  end

  def self.custom_enumeration(data)
    # Manual iteration for maximum control
    results = []

    data.each do |item|
      next unless item.valid?

      transformed = item.transform
      next if transformed.empty?

      results << transformed
      break if results.size >= 1000
    end

    results
  end
end
```
## Production Patterns
Production performance optimization requires monitoring, profiling, and systematic measurement under realistic load conditions. Establish baseline performance metrics before implementing optimizations to quantify improvements accurately.
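A lightweight way to capture such a baseline using only the standard library; the method name and output shape here are illustrative, not a fixed convention:

```ruby
require 'benchmark'
require 'json'

# Record a named baseline timing so later optimizations can be compared
# against the same workload
def record_baseline(name, iterations: 10_000)
  elapsed = Benchmark.realtime { iterations.times { yield } }
  { name: name, iterations: iterations, seconds: elapsed.round(6) }
end

baseline = record_baseline("string_join") { %w[a b c].join("-") }
puts baseline.to_json
```

Persisting these snapshots (to a file or metrics store) makes before/after comparisons reproducible rather than anecdotal.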
Application-level performance monitoring identifies bottlenecks in real-world usage patterns. Ruby provides hooks for method instrumentation and execution tracking. Third-party monitoring tools offer detailed application performance insights.
```ruby
# Production performance monitoring via method wrapping
class PerformanceMonitor
  def self.monitor_method(klass, method_name)
    original_method = klass.instance_method(method_name)

    klass.define_method(method_name) do |*args, &block|
      start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      start_memory = GC.stat[:total_allocated_objects]

      result = original_method.bind(self).call(*args, &block)

      end_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      end_memory = GC.stat[:total_allocated_objects]

      duration = end_time - start_time
      allocated = end_memory - start_memory
      Rails.logger.info "#{klass}##{method_name}: #{duration.round(4)}s, #{allocated} objects"

      result
    end
  end
end

# Usage in a Rails application
PerformanceMonitor.monitor_method(UserController, :show)
PerformanceMonitor.monitor_method(OrderService, :process_payment)
```
Database query optimization is often the most impactful performance improvement for web applications. Minimize query counts through eager loading, optimize query structure, and implement appropriate caching strategies.
```ruby
# Database performance optimization patterns
class OptimizedUserService
  def self.load_users_with_orders(user_ids)
    # Eager loading avoids per-user association queries
    User.includes(:orders, :profile)
        .where(id: user_ids)
        .order(:created_at)
  end

  def self.cached_user_stats(user_id)
    Rails.cache.fetch("user_stats:#{user_id}", expires_in: 1.hour) do
      user = User.find(user_id)
      {
        order_count: user.orders.count,
        total_spent: user.orders.sum(:total),
        last_order_date: user.orders.maximum(:created_at)
      }
    end
  end

  def self.bulk_update_users(updates)
    # Batched upserts reduce database round trips
    User.transaction do
      updates.each_slice(1000) do |batch|
        User.upsert_all(batch)
      end
    end
  end
end
```
Caching strategies provide substantial performance improvements through computation avoidance and data locality. Implement multi-level caching with appropriate expiration policies. Consider cache warming for frequently accessed data.
```ruby
# Multi-level caching implementation
class CachingService
  MEMORY_CACHE = {}
  MEMORY_LIMIT = 10_000

  def self.fetch(key, expires_in: 1.hour, &block)
    # L1: process-local memory cache
    if MEMORY_CACHE.key?(key) && !expired?(MEMORY_CACHE[key])
      return MEMORY_CACHE[key][:value]
    end

    # L2: shared cache store (e.g., Redis via Rails.cache)
    cached_value = Rails.cache.read(key)
    if cached_value
      store_in_memory(key, cached_value, expires_in)
      return cached_value
    end

    # L3: computation
    computed_value = block.call
    Rails.cache.write(key, computed_value, expires_in: expires_in)
    store_in_memory(key, computed_value, expires_in)
    computed_value
  end

  # A bare `private` does not apply to `def self.` methods;
  # use private_class_method instead
  private_class_method def self.store_in_memory(key, value, expires_in)
    return if MEMORY_CACHE.size >= MEMORY_LIMIT

    MEMORY_CACHE[key] = {
      value: value,
      expires_at: Time.current + expires_in
    }
  end

  private_class_method def self.expired?(cache_entry)
    cache_entry[:expires_at] < Time.current
  end
end
```
Background job optimization improves application responsiveness by moving expensive operations outside the request-response cycle. Implement job batching, prioritization, and resource management for optimal throughput.
```ruby
# Background job performance optimization
class OptimizedDataProcessor
  include Sidekiq::Worker
  sidekiq_options queue: :high_priority, retry: 3

  def perform(batch_id, record_ids)
    # Process records in optimized batches
    Record.where(id: record_ids).find_in_batches(batch_size: 500) do |batch|
      process_batch(batch)
      # Yield control periodically so other jobs can run
      sleep(0.01) if batch.size == 500
    end

    mark_batch_complete(batch_id)
  end

  private

  def process_batch(records)
    # Bulk upsert instead of per-record writes
    updates = []

    records.each do |record|
      processed_data = expensive_processing(record)
      updates << { id: record.id, processed_data: processed_data }
    end

    Record.upsert_all(updates) if updates.any?
  end

  def expensive_processing(record)
    # Placeholder for the actual processing logic
    record.data.map(&:upcase).join("-")
  end

  def mark_batch_complete(batch_id)
    BatchStatus.where(id: batch_id).update_all(status: 'completed')
  end
end
```
Asset optimization reduces page load times through compression, minification, and efficient delivery. Implement CDN usage, asset fingerprinting, and appropriate caching headers for static resources.
```ruby
# Asset performance optimization configuration
# config/environments/production.rb
Rails.application.configure do
  # Enable asset compression and precompiled digests
  config.assets.compress = true
  config.assets.compile = false
  config.assets.digest = true

  # Set far-future expires headers
  config.public_file_server.headers = {
    'Cache-Control' => 'public, max-age=31536000',
    'Expires' => 1.year.from_now.to_formatted_s(:rfc822)
  }

  # Enable gzip compression
  config.middleware.use Rack::Deflater

  # Configure CDN for asset delivery
  config.asset_host = ENV['CDN_HOST']
end

# Custom asset optimization helper
module AssetOptimizationHelper
  def optimized_image_tag(source, options = {})
    # Generate responsive image sources
    sizes = options.delete(:sizes) || [480, 768, 1024, 1200]
    srcset = sizes.map { |size| "#{image_url(source, width: size)} #{size}w" }.join(', ')

    image_tag(source, options.merge(srcset: srcset, sizes: "(max-width: 768px) 100vw, 50vw"))
  end

  def critical_css_inline
    # Inline critical CSS for faster rendering
    Rails.cache.fetch('critical_css', expires_in: 1.day) do
      File.read(Rails.root.join('app', 'assets', 'stylesheets', 'critical.css'))
    end.html_safe
  end
end
```
## Common Pitfalls
String immutability assumptions cause performance problems when developers create unnecessary string objects. Each concatenation with `+=` generates a new string object, leading to quadratic time complexity for repeated concatenations.
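The quadratic growth is easy to observe by doubling the input size: the `+=` timing roughly quadruples per doubling, while `<<` roughly doubles. A quick, machine-dependent demonstration:

```ruby
require 'benchmark'

# Compare += (new object per concatenation) against << (in-place append)
# at increasing input sizes; absolute times vary by machine
[4_000, 8_000, 16_000].each do |n|
  plus   = Benchmark.realtime { s = ""; n.times { s += "x" } }
  shovel = Benchmark.realtime { s = +""; n.times { s << "x" } }
  puts format("n=%6d  +=: %.4fs  <<: %.4fs", n, plus, shovel)
end
```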
```ruby
# Common string concatenation pitfall
def slow_string_building(items)
  result = ""
  items.each { |item| result += item.to_s } # Creates a new string each time
  result
end

def fast_string_building(items)
  result = +"" # Unfrozen string literal
  items.each { |item| result << item.to_s } # Modifies the existing string
  result
end

# Alternative: collect and join
def fastest_string_building(items)
  items.map(&:to_s).join
end
```
Premature optimization leads to complex code without measurable benefits. Profile first, then optimize bottlenecks identified through measurement. Micro-optimizations often provide negligible improvements while sacrificing code readability.
```ruby
# Premature optimization example
class PrematurelyOptimized
  # Manual loop that hurts readability for negligible gain
  def calculate_average(numbers)
    return 0.0 if numbers.empty?

    sum = 0
    count = 0

    numbers.each do |n|
      sum += n
      count += 1
    end

    sum.to_f / count
  end
end

class ProperlyOptimized
  # Simple, readable implementation
  def calculate_average(numbers)
    return 0.0 if numbers.empty?

    numbers.sum.to_f / numbers.size
  end
end
```
Memory leak patterns emerge from unexpected object retention and circular references. Event listeners, global variables, and class variables can prevent garbage collection of referenced objects.
```ruby
# Memory leak through global state
class LeakyCache
  @@cache = {} # Class variable persists for the life of the process

  def store(key, value)
    @@cache[key] = value # Entries are never released
  end

  def fetch(key)
    @@cache[key]
  end
end

# Fixed version with a size limit and oldest-first eviction
class BoundedCache
  def initialize(max_size = 1000)
    @cache = {}
    @max_size = max_size
    @access_order = []
  end

  def store(key, value)
    evict_oldest if @cache.size >= @max_size && !@cache.key?(key)

    @cache[key] = value
    @access_order.delete(key)
    @access_order << key
  end

  private

  def evict_oldest
    oldest_key = @access_order.shift
    @cache.delete(oldest_key)
  end
end
```
N+1 query problems cause performance degradation that grows linearly with dataset size: each parent record triggers an additional query for its associated records, resulting in excessive database operations.
```ruby
# N+1 query problem
class SlowController
  def index
    @users = User.all
    # Triggers one orders query per user
    @users.each { |user| puts user.orders.count }
  end
end

# Fixed with eager loading
class FastController
  def index
    @users = User.includes(:orders).all
    # Two queries load all users and their orders;
    # size uses the loaded association, count would issue SQL again
    @users.each { |user| puts user.orders.size }
  end
end

# Alternative: counter cache
class User < ApplicationRecord
  has_many :orders
end

class Order < ApplicationRecord
  belongs_to :user, counter_cache: true
end

# Then read the cached count without touching the orders table
@users.each { |user| puts user.orders_count }
```
Block-related overhead accumulates in frequently called methods. Blocks that capture local variables create closure environments, and converting blocks to `Proc` objects adds allocation pressure. Symbol-to-proc conversion (`&:method`) avoids writing a block for simple single-method operations.
```ruby
# Block allocation pitfall
def slow_processing(items)
  multiplier = 2
  items.map { |item| item * multiplier } # Captures a local variable
end

# Optimized version
def fast_processing(items)
  items.map { |item| item * 2 } # No variable capture
end

# Symbol-to-proc when a single method call suffices
def fastest_processing(items)
  items.map(&:to_i)
end
```
Hash key performance problems occur when using mutable objects as keys. String keys require content comparison, while symbol keys use identity comparison. Frozen strings improve performance but still involve content comparison.
```ruby
# Slow: string hash keys
def slow_lookup(data, key_string)
  cache = {}
  data.each { |item| cache[item.name] = item } # String keys
  cache[key_string]
end

# Faster: symbol hash keys
def fast_lookup(data, key_symbol)
  cache = {}
  data.each { |item| cache[item.name.to_sym] = item } # Symbol keys
  cache[key_symbol]
end

# Alternative: integer keys for numeric indices
def integer_key_lookup(data, index)
  cache = {}
  data.each_with_index { |item, i| cache[i] = item }
  cache[index]
end
```
## Reference
### Performance Measurement Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| `Benchmark.measure { block }` | Block | `Benchmark::Tms` | Times block execution with detailed metrics |
| `Benchmark.realtime { block }` | Block | `Float` | Returns wall clock time in seconds |
| `Benchmark.bm(width) { \|x\| ... }` | Width (Integer), Block | `Array` | Formatted benchmark comparison |
| `Process.clock_gettime(Process::CLOCK_MONOTONIC)` | Clock type | `Float` | High-resolution monotonic time |
### Memory Analysis Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| `GC.stat` | None | `Hash` | Garbage collection statistics |
| `GC.start` | None | `nil` | Forces a garbage collection cycle |
| `ObjectSpace.count_objects` | Result hash (optional) | `Hash` | Object counts by type |
| `ObjectSpace.memsize_of(obj)` | Object | `Integer` | Memory size of object in bytes |
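`ObjectSpace.memsize_of` can be used to compare per-object footprints; the reported sizes are implementation-internal and vary across Ruby versions and platforms:

```ruby
require 'objspace'

small = "abc"
large = "x" * 10_000

# Heap-allocated string payloads scale with content length
puts ObjectSpace.memsize_of(small)
puts ObjectSpace.memsize_of(large)
```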
### String Performance Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| `String#<<(other)` | String/Integer | `String` | Appends to string in place |
| `String#+(other)` | String | `String` | Creates a new concatenated string |
| `String#freeze` | None | `String` | Makes the string immutable |
| `Array#join(separator)` | String (optional) | `String` | Joins array elements efficiently |
### Hash Optimization Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| `Hash#fetch(key, default)` | Key, Default (optional) | Value | Retrieves a value with a default |
| `Hash#store(key, value)` | Key, Value | Value | Stores a key-value pair |
| `Hash#key?(key)` | Key | `Boolean` | Tests key existence |
| `Hash#values_at(*keys)` | Keys | `Array` | Retrieves multiple values |
### Array Performance Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| `Array#<<(obj)` | Object | `Array` | Appends an object in place |
| `Array#push(*objects)` | Objects | `Array` | Appends multiple objects |
| `Array#select!(&block)` | Block | `Array` or `nil` | Filters the array in place |
| `Array#map!(&block)` | Block | `Array` | Transforms the array in place |
### Enumerable Optimization Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| `Enumerable#lazy` | None | `Enumerator::Lazy` | Creates a lazy enumerator |
| `Enumerable#each_slice(n)` | Integer | `Enumerator` | Iterates in chunks |
| `Enumerable#find_index(&block)` | Block | `Integer` or `nil` | Finds the first matching index |
| `Enumerable#any?(&block)` | Block | `Boolean` | Tests whether any element matches |
### GC Statistics Keys

| Key | Type | Description |
|---|---|---|
| `:count` | Integer | Total GC cycles performed |
| `:heap_allocated_pages` | Integer | Pages allocated to the heap |
| `:total_allocated_objects` | Integer | Objects allocated since process start |
| `:total_freed_objects` | Integer | Objects freed by GC |
| `:malloc_increase_bytes` | Integer | C malloc memory increase |
| `:oldmalloc_increase_bytes` | Integer | Old-generation malloc increase |
### Performance Optimization Patterns

| Pattern | Use Case | Implementation |
|---|---|---|
| Object pooling | Expensive object creation | Reuse pre-allocated objects |
| Lazy loading | Large data structures | Load data on first access |
| Memoization | Expensive computations | Cache computed results |
| Batch processing | Multiple similar operations | Group operations together |
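As a sketch of the memoization pattern above, a naive recursive Fibonacci made fast by caching results; the class name is illustrative:

```ruby
# Memoization: cache results of an expensive pure computation
class MemoizedFib
  def initialize
    @memo = {}
  end

  def calc(n)
    # ||= stores each result the first time it is computed
    @memo[n] ||= n < 2 ? n : calc(n - 1) + calc(n - 2)
  end
end

puts MemoizedFib.new.calc(40) # => 102334155, near-instant with the cache
```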
### Common Performance Anti-Patterns

| Anti-Pattern | Problem | Solution |
|---|---|---|
| String concatenation with `+=` | Quadratic time complexity | Use `<<` or `Array#join` |
| Repeated database queries | N+1 query problem | Eager loading with `includes` |
| Block allocation in loops | Excessive object creation | Use symbol-to-proc or avoid captures |
| Global state accumulation | Memory leaks | Implement size limits and cleanup |
### Profiling Tool Options

| Tool | Purpose | Installation |
|---|---|---|
| `ruby-prof` | Method-level profiling | `gem install ruby-prof` |
| `memory_profiler` | Memory allocation tracking | `gem install memory_profiler` |
| `benchmark-ips` | Iterations per second | `gem install benchmark-ips` |
| `rack-mini-profiler` | Web request profiling | `gem install rack-mini-profiler` |