
CrackedRuby

Performance Gains

Overview

Performance optimization in Ruby involves identifying bottlenecks and applying targeted improvements to reduce execution time and memory consumption. Ruby provides multiple approaches for performance enhancement, including algorithmic improvements, method selection optimization, memory management, and leveraging built-in optimizations.

The Ruby interpreter includes several performance-focused features: inline method caching, constant lookup caching, and tunable garbage collection. Ruby 2.0 introduced copy-on-write-friendly garbage collection, Ruby 2.6 added an experimental Just-In-Time compiler (MJIT), and Ruby 3.1 shipped the YJIT compiler.
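
As a quick, hedged check: the RubyVM::YJIT constant only exists on CRuby 3.1+ builds compiled with YJIT support, so feature detection is safe on any version.

```ruby
# Report whether this Ruby build includes the YJIT compiler.
# RubyVM::YJIT is absent on older or non-YJIT builds, so guard with defined?.
if defined?(RubyVM::YJIT)
  puts "YJIT present, enabled: #{RubyVM::YJIT.enabled?}"
else
  puts "This Ruby build has no YJIT"
end
```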

Core performance optimization strategies center around profiling and measurement. The Benchmark module provides timing capabilities, while GC.stat offers garbage collection metrics. Third-party tools like ruby-prof and memory_profiler deliver detailed analysis.

require 'benchmark'

# Basic performance measurement
result = Benchmark.measure do
  1_000_000.times { "string".upcase }
end
puts result.real
# => 0.123456

Ruby's performance characteristics differ significantly from compiled languages. Method dispatch involves dynamic lookup, blocks create closures, and automatic memory management introduces garbage collection overhead. Understanding these fundamentals guides optimization decisions.

# Performance-conscious string building
slow_way = ""
1000.times { |i| slow_way += "item#{i}" }

fast_way = []
1000.times { |i| fast_way << "item#{i}" }
result = fast_way.join

The interpreter optimizes certain patterns automatically. Constant lookup caching reduces repeated constant resolution overhead. Method caching stores method dispatch results. String literals with the same content share memory when frozen.

# Automatic optimization through interning
CONSTANT_VALUE = "shared_string".freeze
1000.times { puts CONSTANT_VALUE } # Same object referenced each time

Basic Usage

Performance optimization begins with identifying bottlenecks through profiling. The Benchmark module provides fundamental timing capabilities for measuring code execution duration. Compare different approaches to determine the fastest implementation.

require 'benchmark'

# Comparing string concatenation methods
Benchmark.bm(15) do |x|
  x.report("String +=")     { 1000.times { str = ""; 100.times { str += "x" } } }
  x.report("Array join")    { 1000.times { arr = []; 100.times { arr << "x" }; arr.join } }
  x.report("String <<")     { 1000.times { str = ""; 100.times { str << "x" } } }
end

Method selection impacts performance significantly. Built-in methods typically outperform equivalent Ruby implementations. Native C extensions provide substantial speed improvements for computationally intensive operations.

# Prefer built-in methods
# Slow: manual implementation
def slow_sum(array)
  total = 0
  array.each { |n| total += n }
  total
end

# Fast: built-in method
def fast_sum(array)
  array.sum
end

numbers = (1..10_000).to_a
# fast_sum typically runs several times faster than slow_sum (Array#sum is implemented in C)

Object allocation represents a major performance factor. Creating unnecessary objects increases garbage collection pressure and memory usage. Reuse objects when possible, prefer in-place operations, and avoid temporary object creation in loops.

# Object allocation optimization
class DataProcessor
  def initialize
    @buffer = ""
    @temp_array = []
  end

  def process_items(items)
    @buffer.clear
    @temp_array.clear

    items.each do |item|
      @buffer << item.to_s
      @temp_array << @buffer.dup
    end

    @temp_array
  end
end

Block usage affects performance through closure creation and variable-capture overhead. Blocks that capture many local variables create larger closures. For single method calls, symbol-to-proc conversion (&:method) is a concise alternative that performs comparably to, and sometimes better than, an explicit block.

# Block performance optimization
numbers = (1..1000).to_a

# Slower: block capturing a local variable
multiplier = 2
result1 = numbers.map { |n| n * multiplier }

# Symbol-to-proc for a single method call (a different operation, shown for the pattern)
result2 = numbers.map(&:to_s)

# Faster: block with no captured variables
result3 = numbers.map { |n| n * 2 }

Hash access patterns influence performance through key comparison overhead. Symbol keys perform faster than string keys due to identity comparison versus content comparison. Integer keys provide the fastest access for numeric indices.

# Hash key performance comparison
string_hash = { "key1" => 1, "key2" => 2, "key3" => 3 }
symbol_hash = { key1: 1, key2: 2, key3: 3 }
integer_hash = { 1 => 1, 2 => 2, 3 => 3 }

# Symbol access: fastest for named keys
value = symbol_hash[:key1]

# Integer access: fastest for numeric indices
value = integer_hash[1]

# String access: slower due to content hashing and comparison
value = string_hash["key1"]

Performance & Memory

Memory optimization directly impacts performance through reduced garbage collection overhead and improved cache locality. Ruby's garbage collection uses a mark-and-sweep algorithm with generational collection for short-lived objects.

Monitor garbage collection behavior using GC.stat to identify memory pressure points. High allocation rates increase collection frequency, reducing overall performance. Track object allocation counts and memory usage patterns.

# Memory profiling with GC statistics
require 'objspace'

def analyze_memory_usage
  GC.start
  initial_stats = GC.stat
  ObjectSpace.count_objects

  yield

  GC.start
  final_stats = GC.stat

  {
    objects_allocated: final_stats[:total_allocated_objects] - initial_stats[:total_allocated_objects],
    gc_collections: final_stats[:count] - initial_stats[:count],
    heap_slots: final_stats[:heap_available_slots]
  }
end

stats = analyze_memory_usage do
  1000.times { "string" + "concatenation" }
end
puts "Allocated #{stats[:objects_allocated]} objects"
puts "Triggered #{stats[:gc_collections]} GC cycles"

String optimization provides significant performance improvements. String concatenation using += creates new objects repeatedly, while << modifies existing strings in-place. For multiple concatenations, array collection with join minimizes intermediate object creation.

# String performance optimization techniques
require 'benchmark'

def string_concat_performance
  Benchmark.bm(20) do |x|
    x.report("String += (slow)") do
      result = ""
      1000.times { result += "segment" }
    end

    x.report("String << (fast)") do
      result = ""
      1000.times { result << "segment" }
    end

    x.report("Array join (fastest)") do
      segments = []
      1000.times { segments << "segment" }
      result = segments.join
    end

    x.report("StringIO (alternative)") do
      require 'stringio'
      buffer = StringIO.new
      1000.times { buffer << "segment" }
      result = buffer.string
    end
  end
end

Array operations demonstrate significant performance variations based on access patterns and modification types. Index-based access performs consistently, while search operations scale with array size. In-place modifications avoid object creation overhead.

# Array performance optimization patterns
class ArrayOptimizer
  def self.efficient_filtering(array, threshold)
    # Avoid creating intermediate arrays
    array.select! { |item| item > threshold }
    array
  end

  def self.batch_operations(array)
    # Single-pass operations reduce iteration overhead
    sum = 0
    max = array.first
    array.each do |item|
      sum += item
      max = item if item > max
    end
    { sum: sum, max: max }
  end

  def self.memory_efficient_mapping(array)
    # Use enumerators for large datasets
    array.lazy.map { |item| expensive_operation(item) }.first(100)
  end

  def self.expensive_operation(item)
    item * item + Math.sqrt(item)
  end
  # `private` does not apply to singleton methods; hide it explicitly
  private_class_method :expensive_operation
end

Hash optimization focuses on key selection and collision minimization. Consistent hash distributions improve access performance. Symbol keys avoid string comparison overhead, while proper hash functions reduce collision rates.

# Hash performance optimization
class PerformantCache
  def initialize
    @symbol_cache = {}
    @integer_cache = {}
    @access_counts = Hash.new(0)
  end

  def fetch_by_symbol(key)
    @symbol_cache[key] ||= expensive_computation(key)
  end

  def fetch_by_integer(key)
    @integer_cache[key] ||= expensive_computation(key)
  end

  def track_access(key)
    @access_counts[key] += 1
  end

  def most_accessed(limit = 10)
    @access_counts.sort_by { |_, count| -count }.first(limit)
  end

  private

  def expensive_computation(key)
    # Simulate expensive operation
    sleep(0.001)
    "computed_#{key}"
  end
end

Enumerable optimization requires understanding method chaining overhead and lazy evaluation benefits. Chain operations efficiently by minimizing intermediate object creation. Use lazy enumerators for large datasets or infinite sequences.

# Enumerable performance patterns
class DataProcessor
  def self.optimized_chain(data)
    # Single-pass processing avoids intermediate arrays
    data
      .lazy
      .select { |item| item.valid? }
      .map { |item| item.transform }
      .reject { |item| item.empty? }
      .first(1000) # first(n) is eager: it forces the lazy pipeline and returns an Array
  end

  def self.inefficient_chain(data)
    # Multiple passes create intermediate arrays
    data
      .select { |item| item.valid? }
      .map { |item| item.transform }
      .reject { |item| item.empty? }
      .first(1000)
  end

  def self.custom_enumeration(data)
    # Manual iteration for maximum control
    results = []
    data.each do |item|
      next unless item.valid?
      transformed = item.transform
      next if transformed.empty?
      results << transformed
      break if results.size >= 1000
    end
    results
  end
end

Production Patterns

Production performance optimization requires monitoring, profiling, and systematic measurement under realistic load conditions. Establish baseline performance metrics before implementing optimizations to quantify improvements accurately.
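
A minimal baseline harness might look like the sketch below (the helper name and output format are illustrative, not from any particular library): capture a labeled timing before optimizing, then re-run it after each change.

```ruby
require 'benchmark'

# Capture a baseline timing for a representative operation so later
# optimizations can be compared against a known number.
def baseline_for(label, iterations: 10_000)
  duration = Benchmark.realtime do
    iterations.times { yield }
  end
  { label: label, iterations: iterations, seconds: duration }
end

baseline = baseline_for("string build") { "item".upcase }
puts format("%s: %.4fs for %d iterations",
            baseline[:label], baseline[:seconds], baseline[:iterations])
```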

Application-level performance monitoring identifies bottlenecks in real-world usage patterns. Ruby provides hooks for method instrumentation and execution tracking. Third-party monitoring tools offer detailed application performance insights.

# Production performance monitoring
class PerformanceMonitor
  def self.monitor_method(klass, method_name)
    original_method = klass.instance_method(method_name)

    klass.define_method(method_name) do |*args, &block|
      start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      start_memory = GC.stat[:total_allocated_objects]

      result = original_method.bind(self).call(*args, &block)

      end_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      end_memory = GC.stat[:total_allocated_objects]

      duration = end_time - start_time
      allocated = end_memory - start_memory

      Rails.logger.info "#{klass}##{method_name}: #{duration.round(4)}s, #{allocated} objects"
      result
    end
  end
end

# Usage in Rails application
PerformanceMonitor.monitor_method(UserController, :show)
PerformanceMonitor.monitor_method(OrderService, :process_payment)

Database query optimization represents the most impactful performance improvement for web applications. Minimize query counts through eager loading, optimize query structure, and implement appropriate caching strategies.

# Database performance optimization patterns
class OptimizedUserService
  def self.load_users_with_orders(user_ids)
    # Single query with eager loading
    User.includes(:orders, :profile)
        .where(id: user_ids)
        .order(:created_at)
  end

  def self.cached_user_stats(user_id)
    Rails.cache.fetch("user_stats:#{user_id}", expires_in: 1.hour) do
      user = User.find(user_id)
      {
        order_count: user.orders.count,
        total_spent: user.orders.sum(:total),
        last_order_date: user.orders.maximum(:created_at)
      }
    end
  end

  def self.bulk_update_users(updates)
    # Batch operations reduce database round trips
    User.transaction do
      updates.each_slice(1000) do |batch|
        User.upsert_all(batch)
      end
    end
  end
end

Caching strategies provide substantial performance improvements through computation avoidance and data locality. Implement multi-level caching with appropriate expiration policies. Consider cache warming for frequently accessed data.

# Multi-level caching implementation
class CachingService
  MEMORY_CACHE = {}
  MEMORY_LIMIT = 10_000

  def self.fetch(key, expires_in: 1.hour, &block)
    # L1: Memory cache
    if MEMORY_CACHE.key?(key) && !expired?(MEMORY_CACHE[key])
      return MEMORY_CACHE[key][:value]
    end

    # L2: Redis cache
    cached_value = Rails.cache.read(key)
    if cached_value
      store_in_memory(key, cached_value, expires_in)
      return cached_value
    end

    # L3: Computation
    computed_value = block.call
    Rails.cache.write(key, computed_value, expires_in: expires_in)
    store_in_memory(key, computed_value, expires_in)
    computed_value
  end

  def self.store_in_memory(key, value, expires_in)
    return if MEMORY_CACHE.size >= MEMORY_LIMIT

    MEMORY_CACHE[key] = {
      value: value,
      expires_at: Time.current + expires_in
    }
  end

  def self.expired?(cache_entry)
    cache_entry[:expires_at] < Time.current
  end

  # `private` does not hide singleton methods; use private_class_method
  private_class_method :store_in_memory, :expired?
end

Background job optimization improves application responsiveness by moving expensive operations outside the request-response cycle. Implement job batching, prioritization, and resource management for optimal throughput.

# Background job performance optimization
class OptimizedDataProcessor
  include Sidekiq::Worker

  sidekiq_options queue: :high_priority, retry: 3

  def perform(batch_id, record_ids)
    # Process records in optimized batches
    Record.where(id: record_ids).find_in_batches(batch_size: 500) do |batch|
      process_batch(batch)

      # Yield control periodically for other jobs
      sleep(0.01) if batch.size == 500
    end

    mark_batch_complete(batch_id)
  end

  private

  def process_batch(records)
    # Bulk operations for efficiency
    updates = []
    records.each do |record|
      processed_data = expensive_processing(record)
      updates << { id: record.id, processed_data: processed_data }
    end

    Record.upsert_all(updates) if updates.any?
  end

  def expensive_processing(record)
    # Stand-in processing logic; assumes record.data is an array of strings
    record.data.map(&:upcase).join("-")
  end

  def mark_batch_complete(batch_id)
    BatchStatus.where(id: batch_id).update_all(status: 'completed')
  end
end

Asset optimization reduces page load times through compression, minification, and efficient delivery. Implement CDN usage, asset fingerprinting, and appropriate caching headers for static resources.

# Asset performance optimization configuration
# config/environments/production.rb
Rails.application.configure do
  # Enable asset compression and caching
  config.assets.compress = true
  config.assets.compile = false
  config.assets.digest = true

  # Set far-future expires headers
  config.public_file_server.headers = {
    'Cache-Control' => 'public, max-age=31536000',
    'Expires' => 1.year.from_now.to_formatted_s(:rfc822)
  }

  # Enable gzip compression
  config.middleware.use Rack::Deflater

  # Configure CDN for asset delivery
  config.asset_host = ENV['CDN_HOST']
end

# Custom asset optimization helper
module AssetOptimizationHelper
  def optimized_image_tag(source, options = {})
    # Generate responsive image sources
    # (assumes an image service or CDN that honors a width parameter)
    sizes = options.delete(:sizes) || [480, 768, 1024, 1200]
    srcset = sizes.map { |size| "#{image_url(source, width: size)} #{size}w" }.join(', ')

    image_tag(source, options.merge(srcset: srcset, sizes: "(max-width: 768px) 100vw, 50vw"))
  end

  def critical_css_inline
    # Inline critical CSS for faster rendering
    Rails.cache.fetch('critical_css', expires_in: 1.day) do
      File.read(Rails.root.join('app', 'assets', 'stylesheets', 'critical.css'))
    end.html_safe
  end
end

Common Pitfalls

String immutability assumptions cause performance problems when developers create unnecessary string objects. Each string concatenation with += generates a new string object, leading to quadratic time complexity for repeated concatenations.

# Common string concatenation pitfall
def slow_string_building(items)
  result = ""
  items.each { |item| result += item.to_s }  # Creates new string each time
  result
end

def fast_string_building(items)
  result = +""  # Unfreeze string literal
  items.each { |item| result << item.to_s }  # Modifies existing string
  result
end

# Alternative: use array collection
def fastest_string_building(items)
  items.map(&:to_s).join
end

Premature optimization leads to complex code without measurable benefits. Profile first, then optimize bottlenecks identified through measurement. Micro-optimizations often provide negligible improvements while sacrificing code readability.

# Premature optimization example
class PrematurelyOptimized
  # Overly complex optimization for minimal gain
  def calculate_average(numbers)
    return 0.0 if numbers.empty?

    # Micro-optimization that hurts readability
    sum = 0
    count = 0
    numbers.each do |n|
      sum += n
      count += 1
    end
    sum.to_f / count
  end
end

class ProperlyOptimized
  # Simple, readable implementation
  def calculate_average(numbers)
    return 0.0 if numbers.empty?
    numbers.sum.to_f / numbers.size
  end
end

Memory leak patterns emerge from unexpected object retention and circular references. Event listeners, global variables, and class variables can prevent garbage collection of referenced objects.

# Memory leak through global state
class LeakyCache
  @@cache = {}  # Class variable persists across instances

  def store(key, value)
    @@cache[key] = value  # Objects never released
  end

  def fetch(key)
    @@cache[key]
  end
end

# Fixed version with size limits
class BoundedCache
  def initialize(max_size = 1000)
    @cache = {}
    @max_size = max_size
    @access_order = []
  end

  def store(key, value)
    if @cache.size >= @max_size && !@cache.key?(key)
      evict_oldest
    end

    @cache[key] = value
    @access_order.delete(key)
    @access_order << key
  end

  private

  def evict_oldest
    oldest_key = @access_order.shift
    @cache.delete(oldest_key)
  end
end

N+1 query problems cause severe performance degradation as datasets grow: each parent record triggers an additional query for its associated records, so the query count grows linearly with the number of parents.

# N+1 query problem
class SlowController
  def index
    @users = User.all
    # This triggers one query per user for orders
    @users.each { |user| puts user.orders.count }
  end
end

# Fixed with eager loading
class FastController
  def index
    @users = User.includes(:orders)
    # Two queries load all users and their orders up front
    @users.each { |user| puts user.orders.size }  # size reads the preloaded records
  end
end

# Alternative: counter cache
class User < ApplicationRecord
  has_many :orders
end

class Order < ApplicationRecord
  belongs_to :user, counter_cache: true
end

# Then access cached count
@users.each { |user| puts user.orders_count }

Block-related allocation accumulates in frequently called methods. Blocks that capture local variables carry a closure environment, and converting a block to a Proc object adds allocation pressure. Symbol-to-proc conversion keeps simple, single-method operations concise and cheap.

# Variable capture pitfall
def slow_processing(items)
  multiplier = 2
  items.map { |item| item * multiplier }  # closure captures multiplier each call
end

# Optimized version
def fast_processing(items)
  items.map { |item| item * 2 }  # no captured variables
end

# Symbol-to-proc when the operation is a single method call
def fastest_processing(items)
  items.map(&:to_i)  # Symbol#to_proc; recent CRuby versions cache the resulting Proc
end

Hash key performance problems occur when using mutable objects as keys. String keys require content comparison, while symbol keys use identity comparison. Frozen strings improve performance but still involve content comparison.

# Slow hash key usage
def slow_lookup(data, key_string)
  cache = {}
  data.each { |item| cache[item.name] = item }  # String keys
  cache[key_string]
end

# Fast hash key usage
def fast_lookup(data, key_symbol)
  cache = {}
  data.each { |item| cache[item.name.to_sym] = item }  # Symbol keys
  cache[key_symbol]
end

# Alternative: integer keys for numeric indices
def integer_key_lookup(data, index)
  cache = {}
  data.each_with_index { |item, i| cache[i] = item }
  cache[index]
end

Reference

Performance Measurement Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| Benchmark.measure { block } | Block | Benchmark::Tms | Times block execution with detailed metrics |
| Benchmark.realtime { block } | Block | Float | Returns wall clock time in seconds |
| Benchmark.bm(width) { \|x\| ... } | Width (Integer), Block | Array of Benchmark::Tms | Formatted benchmark comparison |
| Process.clock_gettime(Process::CLOCK_MONOTONIC) | Clock type | Float | High-resolution monotonic time |
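
For instance, the monotonic clock from the table avoids wall-clock adjustments when timing a region by hand:

```ruby
# Time a region with the monotonic clock; immune to system clock changes.
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
10_000.times { "sample".upcase }
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
puts format("elapsed: %.6fs", elapsed)
```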

Memory Analysis Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| GC.stat | Key (optional) | Hash or Integer | Garbage collection statistics |
| GC.start | None | nil | Forces a garbage collection cycle |
| ObjectSpace.count_objects | Result hash (optional) | Hash | Object counts by type |
| ObjectSpace.memsize_of(obj) | Object | Integer | Memory size of object in bytes |

String Performance Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| String#<<(other) | String/Integer | String | Appends to string in place |
| String#+(other) | String | String | Creates new concatenated string |
| String#freeze | None | String | Makes string immutable |
| Array#join(separator) | String (optional) | String | Joins array elements efficiently |
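
The in-place versus copying distinction from the table can be observed via object identity:

```ruby
s = +"abc"                    # unary + ensures a mutable string
original_id = s.object_id
s << "def"                    # appends in place: same object
raise "expected same object" unless s.object_id == original_id

t = s + "ghi"                 # creates a new concatenated string
raise "expected new object" unless t.object_id != s.object_id
puts "in-place append kept object #{original_id}"
```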

Hash Optimization Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| Hash#fetch(key, default) | Key, Default (optional) | Value | Retrieves value, with default or KeyError on miss |
| Hash#store(key, value) | Key, Value | Value | Stores key-value pair |
| Hash#key?(key) | Key | Boolean | Tests key existence |
| Hash#values_at(*keys) | Keys | Array | Retrieves multiple values |

Array Performance Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| Array#<<(obj) | Object | Array | Appends object in place |
| Array#push(*objects) | Objects | Array | Appends multiple objects |
| Array#select!(&block) | Block | Array or nil | Filters array in place (nil if unchanged) |
| Array#map!(&block) | Block | Array | Transforms array in place |

Enumerable Optimization Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| Enumerable#lazy | None | Enumerator::Lazy | Creates lazy enumerator |
| Enumerable#each_slice(n) | Integer | Enumerator | Iterates in chunks |
| Enumerable#find_index(&block) | Block | Integer or nil | Finds first matching index |
| Enumerable#any?(&block) | Block | Boolean | Tests if any element matches |

GC Statistics Keys

| Key | Type | Description |
|---|---|---|
| :count | Integer | Total GC cycles performed |
| :heap_allocated_pages | Integer | Pages allocated to the heap |
| :total_allocated_objects | Integer | Objects allocated since start |
| :total_freed_objects | Integer | Objects freed by GC |
| :malloc_increase_bytes | Integer | C malloc memory increase |
| :oldmalloc_increase_bytes | Integer | Old-generation malloc increase |
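
These keys can be read directly from the hash returned by GC.stat:

```ruby
# Print a few of the commonly inspected GC.stat counters.
stats = GC.stat
%i[count total_allocated_objects total_freed_objects heap_allocated_pages].each do |key|
  puts "#{key}: #{stats[key]}"
end
```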

Performance Optimization Patterns

| Pattern | Use Case | Implementation |
|---|---|---|
| Object pooling | Expensive object creation | Reuse pre-allocated objects |
| Lazy loading | Large data structures | Load data on first access |
| Memoization | Expensive computations | Cache computed results |
| Batch processing | Multiple similar operations | Group operations together |
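
Memoization from the table above, sketched with hypothetical names (ReportBuilder and its methods are illustrative):

```ruby
# Memoization: cache the result of an expensive computation per argument.
class ReportBuilder
  def initialize
    @totals = {}   # memo store keyed by argument
  end

  def total_for(account_id)
    @totals[account_id] ||= compute_total(account_id)
  end

  private

  def compute_total(account_id)
    # stand-in for an expensive query or calculation
    account_id * 100
  end
end

builder = ReportBuilder.new
builder.total_for(3)  # computed on first call
builder.total_for(3)  # served from the memo store
```

Note that `||=` re-computes when the cached value is nil or false; use Hash#key? or Hash#fetch if those are legitimate results.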

Common Performance Anti-Patterns

| Anti-Pattern | Problem | Solution |
|---|---|---|
| String concatenation with += | Quadratic time complexity | Use << or Array#join |
| Repeated database queries | N+1 query problem | Eager loading with includes |
| Block allocation in loops | Excessive object creation | Use symbol-to-proc or avoid captures |
| Global state accumulation | Memory leaks | Implement size limits and cleanup |

Profiling Tool Options

| Tool | Purpose | Installation |
|---|---|---|
| ruby-prof | Method-level profiling | gem install ruby-prof |
| memory_profiler | Memory allocation tracking | gem install memory_profiler |
| benchmark-ips | Iterations per second | gem install benchmark-ips |
| rack-mini-profiler | Web request profiling | gem install rack-mini-profiler |