
CrackedRuby

Garbage Collection Tuning

Complete guide to optimizing Ruby's garbage collection performance through tuning parameters and monitoring techniques.


Overview

Ruby's garbage collection system automatically manages memory allocation and deallocation through a tri-color mark-and-sweep collector combined with generational collection. The garbage collector organizes objects into generations based on survival rates, with newer objects in younger generations and long-lived objects promoted to older generations.

The GC module provides the primary interface for garbage collection operations and configuration. Ruby exposes numerous tuning parameters through environment variables and runtime methods that control collection frequency, heap sizing, and allocation patterns.

# Basic GC information
GC.count          # => 127
GC.stat[:count]   # => 127
GC.stat[:heap_available_slots]  # => 163840
GC.stat[:heap_live_slots]       # => 154830

Ruby divides its heap into pages containing slots for objects. Each slot holds one Ruby object, and the garbage collector tracks object lifecycle through these slots. The collector uses a tri-color marking algorithm where objects transition between white (unmarked), gray (marked but not scanned), and black (marked and scanned) states.
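
These layout details can be inspected at runtime. A small sketch using GC::INTERNAL_CONSTANTS; note the exact keys vary by Ruby version (3.2+ reports :BASE_SLOT_SIZE where older versions report :RVALUE_SIZE), so the code probes for both.

```ruby
# Inspect heap layout constants; key names differ across Ruby versions,
# so check for both the new and old slot-size keys.
consts = GC::INTERNAL_CONSTANTS
slot_size = consts[:BASE_SLOT_SIZE] || consts[:RVALUE_SIZE]  # bytes per slot

puts "Bytes per slot: #{slot_size}"
puts "Bytes per heap page: #{consts[:HEAP_PAGE_SIZE]}" if consts[:HEAP_PAGE_SIZE]
```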

Generational collection assumes most objects die young. Ruby tracks object age within a single heap rather than maintaining physically separate generational spaces, collecting young objects frequently and old objects rarely. Objects surviving several minor collections are promoted to the old generation, reducing marking overhead for long-lived data.

# Check generational GC stats
stats = GC.stat
puts "Young gen collections: #{stats[:minor_gc_count]}"
puts "Old gen collections: #{stats[:major_gc_count]}"
puts "Objects promoted: #{stats[:old_objects]}"

Environment variables control initial GC configuration before Ruby starts. These settings establish baseline behavior that runtime methods can later modify. The collector responds to memory pressure and allocation patterns, automatically adjusting collection frequency and heap growth.

Basic Usage

Ruby's GC configuration relies on environment variables set before process startup and runtime methods for dynamic adjustments. The most common tuning parameters control heap size, collection frequency, and object allocation patterns.

The RUBY_GC_HEAP_INIT_SLOTS environment variable sets the initial number of object slots available at startup. Ruby allocates additional slots as needed, but starting with appropriate capacity reduces early allocation overhead.

# Set before Ruby starts
# RUBY_GC_HEAP_INIT_SLOTS=100000
# Check current slot allocation
puts GC.stat[:heap_available_slots]   # Initial slot count
puts GC.stat[:heap_live_slots]        # Currently used slots
puts GC.stat[:heap_free_slots]        # Available unused slots

The growth factor controls how aggressively Ruby expands the heap when more space is needed. RUBY_GC_HEAP_GROWTH_FACTOR accepts decimal values, with higher values creating more slots during expansion phases.

# Environment: RUBY_GC_HEAP_GROWTH_FACTOR=1.8
# Runtime heap growth monitoring
before_slots = GC.stat[:heap_available_slots]
# Trigger allocation pressure
array = Array.new(50000) { |i| "object_#{i}" }
after_slots = GC.stat[:heap_available_slots]
growth = after_slots - before_slots
puts "Heap expanded by #{growth} slots"

The garbage collector uses allocation limits to determine when to run collection cycles. RUBY_GC_MALLOC_LIMIT sets the threshold for malloc-allocated memory that triggers collection, while RUBY_GC_MALLOC_LIMIT_MAX caps the upper limit.

# Check malloc limits and usage
stats = GC.stat
puts "Malloc limit: #{stats[:malloc_increase_bytes_limit]}"
puts "Current malloc: #{stats[:malloc_increase_bytes]}"
puts "Will GC when malloc reaches limit: #{stats[:malloc_increase_bytes] >= stats[:malloc_increase_bytes_limit]}"

Object allocation counting provides another collection trigger. RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR controls when major collections run based on old object accumulation.

# Monitor object age distribution
stats = GC.stat
total_objects = stats[:heap_live_slots]
old_objects = stats[:old_objects]
old_ratio = old_objects.to_f / total_objects
puts "Old object ratio: #{(old_ratio * 100).round(2)}%"
puts "Old object limit factor affects major GC frequency"

Dynamic GC control methods allow runtime adjustments without restarting the process. GC.start forces immediate collection, while GC.stress= enables continuous collection for debugging.

# Force garbage collection
before_count = GC.count
GC.start(full_mark: true, immediate_sweep: true)
after_count = GC.count
puts "Forced #{after_count - before_count} collection cycles"

# Enable GC stress testing (not for production)
GC.stress = true  # Collect after every allocation
# ... test allocation-heavy code ...
GC.stress = false # Disable stress mode

Collection can be temporarily disabled during critical sections where GC pauses would cause problems. GC.disable prevents automatic collection until GC.enable restores normal operation.

# Disable GC during time-sensitive operations
GC.disable
start_time = Time.now
# Critical real-time processing here
critical_operation()
end_time = Time.now
GC.enable

puts "Critical section ran #{end_time - start_time}s without GC interruption"

Performance & Memory

Garbage collection tuning directly impacts application performance through collection frequency, pause times, and memory utilization. Proper configuration reduces GC overhead while maintaining memory efficiency and preventing memory leaks.

Collection frequency depends on allocation rate and heap configuration. Applications creating many short-lived objects benefit from larger initial heaps to reduce collection frequency during startup phases.

# Measure allocation rate impact
def measure_allocation_performance(iterations)
  GC.start # Clean slate
  start_count = GC.count
  start_time = Time.now
  
  iterations.times { |i| "string_#{i}" }
  
  end_time = Time.now
  end_count = GC.count
  
  {
    elapsed: end_time - start_time,
    collections: end_count - start_count,
    rate: iterations / (end_time - start_time)
  }
end

# Compare different allocation patterns
small_objects = measure_allocation_performance(100_000)
large_objects = measure_allocation_performance(10_000)

puts "Small objects: #{small_objects[:rate].round} ops/sec, #{small_objects[:collections]} GCs"
puts "Large objects: #{large_objects[:rate].round} ops/sec, #{large_objects[:collections]} GCs"

Major collection frequency affects long-running application performance more than minor collections. Applications with stable object sets benefit from tuning old object limits to reduce major collection overhead.

# Monitor major vs minor collection performance
def gc_performance_snapshot
  stats = GC.stat
  {
    minor_collections: stats[:minor_gc_count],
    major_collections: stats[:major_gc_count], 
    total_time: stats[:time],  # cumulative GC time in milliseconds (Ruby 3.1+)
    average_minor: stats[:time].to_f / [stats[:minor_gc_count], 1].max,
    average_major: stats[:time].to_f / [stats[:major_gc_count], 1].max
  }
end

before = gc_performance_snapshot
# Run workload
10.times { Array.new(10_000) { |i| { id: i, data: "x" * 100 } } }
after = gc_performance_snapshot

puts "Minor GC time: #{(after[:average_minor] - before[:average_minor]).round(3)}ms"
puts "Major GC time: #{(after[:average_major] - before[:average_major]).round(3)}ms"

Memory fragmentation occurs when object allocation patterns leave unusable gaps in the heap. Fragmentation reduces effective memory utilization and forces premature heap expansion.

# Analyze heap fragmentation
def heap_fragmentation_analysis
  stats = GC.stat
  total_slots = stats[:heap_available_slots]
  used_slots = stats[:heap_live_slots]
  free_slots = stats[:heap_free_slots]
  
  utilization = used_slots.to_f / total_slots
  fragmentation = free_slots.to_f / (used_slots + free_slots)  # free-slot ratio, a rough proxy
  
  {
    total_slots: total_slots,
    utilization: (utilization * 100).round(2),
    fragmentation: (fragmentation * 100).round(2),
    wasted_slots: total_slots - used_slots - free_slots
  }
end

# Create a fragmentation scenario
large_objects = Array.new(1000) { "x" * 1000 }
large_objects = nil # Drop the references
GC.start            # Free them, leaving holes among the surviving slots
small_objects = Array.new(5000) { "small" }

analysis = heap_fragmentation_analysis
puts "Heap utilization: #{analysis[:utilization]}%"
puts "Fragmentation: #{analysis[:fragmentation]}%"
puts "Wasted slots: #{analysis[:wasted_slots]}"

Object pool patterns reduce allocation pressure by reusing objects instead of creating new ones. Pools work best for frequently allocated objects with predictable usage patterns.

# Object pool implementation for GC optimization
class StringPool
  def initialize(initial_size = 100)
    @available = Array.new(initial_size) { String.new(capacity: 1000) }
    @in_use = []
  end
  
  def acquire
    if @available.empty?
      # Pool exhausted, create new object
      string = String.new(capacity: 1000)
    else
      string = @available.pop
      string.clear
    end
    @in_use << string
    string
  end
  
  def release(string)
    if @in_use.delete(string)
      string.clear
      @available << string if @available.size < 200 # Prevent unbounded growth
    end
  end
  
  def stats
    { available: @available.size, in_use: @in_use.size }
  end
end

# Usage comparison
pool = StringPool.new(50)

# With pool (reduced allocations)
before_count = GC.count
100.times do
  str = pool.acquire
  str << "some data processing"
  pool.release(str)
end
pooled_gcs = GC.count - before_count

# Without pool (many allocations)  
before_count = GC.count
100.times do
  str = String.new
  str << "some data processing"
end
unpooled_gcs = GC.count - before_count

puts "Pooled approach triggered #{pooled_gcs} GCs"
puts "Unpooled approach triggered #{unpooled_gcs} GCs"
puts "Pool stats: #{pool.stats}"

Production Patterns

Production environments require careful GC tuning to balance throughput, latency, and memory efficiency. Applications must monitor GC behavior and adjust parameters based on actual usage patterns and performance requirements.

Web applications experience variable load patterns that affect GC behavior. Request spikes create allocation bursts followed by quiet periods where collection can run with minimal impact.

# Production GC monitoring middleware
class GCMonitoringMiddleware
  def initialize(app, options = {})
    @app = app
    @gc_threshold = options[:gc_threshold] || 10
    @stats_interval = options[:stats_interval] || 100
    @request_count = 0
  end
  
  def call(env)
    before_stats = gc_snapshot
    
    response = @app.call(env)
    
    after_stats = gc_snapshot  
    request_gc_info = calculate_gc_impact(before_stats, after_stats)
    
    @request_count += 1
    
    if @request_count % @stats_interval == 0
      log_gc_summary(request_gc_info)
    end
    
    if request_gc_info[:collections] > @gc_threshold
      log_gc_warning(request_gc_info, env)
    end
    
    response
  end
  
  private
  
  def gc_snapshot
    stats = GC.stat
    {
      count: stats[:count],
      time: stats[:time],
      live_slots: stats[:heap_live_slots],
      free_slots: stats[:heap_free_slots]
    }
  end
  
  def calculate_gc_impact(before, after)
    {
      collections: after[:count] - before[:count],
      time_spent: after[:time] - before[:time],
      slot_change: after[:live_slots] - before[:live_slots],
      timestamp: Time.now
    }
  end
  
  def log_gc_summary(info)
    puts "[GC] Request #{@request_count}: #{info[:collections]} collections, #{info[:time_spent]}ms, #{info[:slot_change]} slot change"
  end
  
  def log_gc_warning(info, env)
    puts "[GC WARNING] High GC activity: #{info[:collections]} collections for #{env['REQUEST_PATH']}"
  end
end

Background job processing requires different GC strategies than web requests. Long-running jobs can tolerate larger GC pauses in exchange for better memory efficiency and reduced collection frequency.

# Background job GC optimization
class GCOptimizedJobProcessor
  def initialize
    @jobs_processed = 0
    @gc_interval = 50  # Force GC every N jobs
    @initial_gc_stats = GC.stat
  end
  
  def process_job(job)
    # Disable GC during critical job processing if needed.
    # GC.disable returns true if GC was already disabled.
    gc_already_disabled = job.priority == :critical ? GC.disable : true
    
    result = job.execute
    
    # Re-enable GC only if this method disabled it
    GC.enable unless gc_already_disabled
    
    @jobs_processed += 1
    
    # Periodic cleanup to prevent memory buildup
    if @jobs_processed % @gc_interval == 0
      perform_maintenance_gc
    end
    
    result
  rescue => e
    GC.enable # Ensure GC is re-enabled on exceptions
    raise
  end
  
  private
  
  def perform_maintenance_gc
    before = GC.stat
    
    # Force full collection
    GC.start(full_mark: true, immediate_sweep: true)
    
    after = GC.stat
    memory_freed = (before[:heap_live_slots] - after[:heap_live_slots]) * 40 # Rough bytes per slot
    
    puts "[GC Maintenance] Processed #{@jobs_processed} jobs, freed ~#{memory_freed / 1024}KB"
    
    # Reset interval based on memory pressure
    utilization = after[:heap_live_slots].to_f / after[:heap_available_slots]
    @gc_interval = utilization > 0.8 ? 25 : 50
  end
end

Memory leak detection requires tracking object growth patterns over time. Production applications should monitor heap growth and object accumulation to identify potential leaks.

# Production memory leak detection
class MemoryLeakDetector
  def initialize(threshold_mb = 100, sample_interval = 300)
    @threshold_bytes = threshold_mb * 1024 * 1024
    @sample_interval = sample_interval
    @samples = []
    @last_sample_time = Time.now
    
    start_monitoring
  end
  
  def sample_memory_usage
    stats = GC.stat
    sample = {
      timestamp: Time.now,
      live_slots: stats[:heap_live_slots],
      total_allocated: stats[:total_allocated_objects],
      heap_pages: stats[:heap_allocated_pages],
      process_rss: get_process_rss
    }
    
    @samples << sample
    @samples.shift if @samples.size > 100  # Keep recent history
    
    check_for_leaks if @samples.size > 10
    sample
  end
  
  private
  
  def start_monitoring
    Thread.new do
      loop do
        sleep(@sample_interval)
        sample_memory_usage
      end
    end
  end
  
  def get_process_rss
    # Platform-specific RSS retrieval
    if File.exist?('/proc/self/status')
      File.readlines('/proc/self/status').each do |line|
        if line.start_with?('VmRSS:')
          return line.scan(/\d+/).first.to_i * 1024  # Convert KB to bytes
        end
      end
    end
    0  # Fallback for non-Linux systems
  end
  
  def check_for_leaks
    recent_samples = @samples.last(10)
    oldest = recent_samples.first
    newest = recent_samples.last
    
    slot_growth = newest[:live_slots] - oldest[:live_slots]
    time_span = newest[:timestamp] - oldest[:timestamp]
    growth_rate = slot_growth / time_span  # Slots per second
    
    rss_growth = newest[:process_rss] - oldest[:process_rss]
    
    if rss_growth > @threshold_bytes || growth_rate > 100
      log_potential_leak(oldest, newest, growth_rate, rss_growth)
    end
  end
  
  def log_potential_leak(oldest, newest, growth_rate, rss_growth)
    puts "[MEMORY LEAK WARNING]"
    puts "  Slot growth rate: #{growth_rate.round(2)} slots/sec"
    puts "  RSS growth: #{rss_growth / 1024 / 1024}MB"
    puts "  Time span: #{(newest[:timestamp] - oldest[:timestamp]).round}s"
    puts "  Current live slots: #{newest[:live_slots]}"
    puts "  Current RSS: #{newest[:process_rss] / 1024 / 1024}MB"
  end
end

Container deployments require specific GC configurations to work within memory limits. Because heap-related environment variables are read at interpreter startup, configuration code like the class below is best run from a small launcher process that sets the variables and then execs the application.

# Container-aware GC configuration
class ContainerGCConfig
  def self.configure_for_container
    container_memory = detect_container_memory_limit
    cpu_count = detect_container_cpu_limit
    
    if container_memory > 0
      configure_memory_based_gc(container_memory)
    end
    
    if cpu_count > 0
      configure_cpu_based_gc(cpu_count)
    end
    
    log_gc_configuration
  end
  
  # `private` has no effect on methods defined with `def self.`;
  # use private_class_method to hide the helpers below if desired.
  
  def self.detect_container_memory_limit
    # Check cgroup memory limit
    memory_files = ['/sys/fs/cgroup/memory/memory.limit_in_bytes', 
                   '/sys/fs/cgroup/memory.max']
    
    memory_files.each do |file|
      next unless File.exist?(file)
      content = File.read(file).strip
      next if content == 'max' # cgroup v2 reports "max" when unlimited
      limit = content.to_i
      # cgroup v1 uses a very large number for unlimited
      return limit if limit > 0 && limit < (2**62)
    end
    
    0  # No limit detected
  end
  
  def self.detect_container_cpu_limit
    if File.exist?('/sys/fs/cgroup/cpu/cpu.cfs_quota_us')
      quota = File.read('/sys/fs/cgroup/cpu/cpu.cfs_quota_us').strip.to_i
      period = File.read('/sys/fs/cgroup/cpu/cpu.cfs_period_us').strip.to_i
      return (quota.to_f / period).ceil if quota > 0 && period > 0
    end
    
    0  # No limit detected
  end
  
  def self.configure_memory_based_gc(container_memory_bytes)
    # Allocate ~60% of container memory to Ruby heap
    heap_memory = (container_memory_bytes * 0.6).to_i
    estimated_slots = heap_memory / 40  # Rough bytes per slot
    
    ENV['RUBY_GC_HEAP_INIT_SLOTS'] = (estimated_slots * 0.1).to_i.to_s
    ENV['RUBY_GC_HEAP_FREE_SLOTS'] = (estimated_slots * 0.05).to_i.to_s
    ENV['RUBY_GC_HEAP_GROWTH_FACTOR'] = '1.2'  # Conservative growth
    
    # Set malloc limits based on available memory
    malloc_limit = [container_memory_bytes / 32, 16 * 1024 * 1024].max
    ENV['RUBY_GC_MALLOC_LIMIT'] = malloc_limit.to_s
    ENV['RUBY_GC_MALLOC_LIMIT_MAX'] = (malloc_limit * 4).to_s
  end
  
  def self.configure_cpu_based_gc(cpu_count)
    # Adjust GC aggressiveness based on CPU availability
    if cpu_count == 1
      # Single CPU: prefer throughput over low latency
      ENV['RUBY_GC_HEAP_GROWTH_FACTOR'] = '1.8'
      ENV['RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR'] = '2.0'
    else
      # Multi CPU: balance latency and throughput
      ENV['RUBY_GC_HEAP_GROWTH_FACTOR'] = '1.4'
      ENV['RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR'] = '1.2'
    end
  end
  
  def self.log_gc_configuration
    puts "[GC] Container-aware configuration applied:"
    puts "  RUBY_GC_HEAP_INIT_SLOTS=#{ENV['RUBY_GC_HEAP_INIT_SLOTS']}"
    puts "  RUBY_GC_HEAP_GROWTH_FACTOR=#{ENV['RUBY_GC_HEAP_GROWTH_FACTOR']}"
    puts "  RUBY_GC_MALLOC_LIMIT=#{ENV['RUBY_GC_MALLOC_LIMIT']}"
  end
end

Common Pitfalls

Garbage collection tuning involves numerous subtle behaviors that can lead to counterintuitive results or performance regressions. Understanding these pitfalls helps avoid common configuration mistakes and debugging difficulties.

Excessive heap size configuration can hurt performance by increasing collection times. Larger heaps mean more objects to scan during each collection cycle, leading to longer pause times despite less frequent collections.

# Demonstrate heap size vs pause time trade-off
def measure_gc_pause_times(heap_init_slots)
  # Configure heap size
  original_env = ENV['RUBY_GC_HEAP_INIT_SLOTS']
  ENV['RUBY_GC_HEAP_INIT_SLOTS'] = heap_init_slots.to_s
  
  # NOTE: heap environment variables only take effect at process startup,
  # so the assignment above does not change this process; the measurement
  # below shows how pause times grow with the number of live objects
  
  # Create significant object churn
  objects = []
  pause_times = []
  
  10.times do
    start_time = Time.now
    GC.start(full_mark: true, immediate_sweep: true)
    pause_time = Time.now - start_time
    pause_times << pause_time
    
    # Add objects to increase next collection time
    objects.concat(Array.new(10_000) { |i| "object_#{i}_#{rand(1000)}" })
  end
  
  ENV['RUBY_GC_HEAP_INIT_SLOTS'] = original_env
  
  {
    heap_size: heap_init_slots,
    average_pause: pause_times.sum / pause_times.size,
    max_pause: pause_times.max,
    live_objects: GC.stat[:heap_live_slots]
  }
end

# Compare different heap sizes
small_heap = measure_gc_pause_times(10_000)
large_heap = measure_gc_pause_times(500_000)

puts "Small heap - Avg pause: #{(small_heap[:average_pause] * 1000).round(2)}ms"
puts "Large heap - Avg pause: #{(large_heap[:average_pause] * 1000).round(2)}ms"
puts "Pause time increase: #{((large_heap[:average_pause] / small_heap[:average_pause] - 1) * 100).round}%"

Disabling GC for too long causes memory exhaustion and severe performance degradation when collection finally runs. Applications that disable GC must carefully limit the scope and monitor memory usage.

# Dangerous GC disable pattern
class ProblematicGCDisabling
  def process_batch(items)
    GC.disable  # Dangerous: unbounded disable scope
    
    results = []
    items.each do |item|
      # Each iteration allocates without cleanup
      processed = expensive_processing(item)  # Creates many temp objects
      results << processed
    end
    
    GC.enable  # Massive pause when GC finally runs
    results
  end
  
  def expensive_processing(item)
    # Simulates processing that creates temporary objects
    temp_data = Array.new(1000) { |i| "temp_#{item}_#{i}" }
    temp_data.map { |str| str.upcase }.join(",")
  end
end

# Better approach with bounded GC disable
class SafeGCManagement
  def process_batch(items, batch_size = 100)
    results = []
    
    items.each_slice(batch_size) do |batch|
      # Disable GC only for small batches
      GC.disable
      
      batch_results = batch.map { |item| expensive_processing(item) }
      results.concat(batch_results)
      
      GC.enable
      
      # Force cleanup between batches to prevent buildup
      GC.start if results.size % (batch_size * 5) == 0
    end
    
    results
  end
  
  def expensive_processing(item)
    temp_data = Array.new(1000) { |i| "temp_#{item}_#{i}" }
    temp_data.map { |str| str.upcase }.join(",")
  end
end

# Demonstrate the difference
items = (1..1000).to_a

# Measure problematic approach
start_memory = GC.stat[:heap_live_slots]
problematic = ProblematicGCDisabling.new
start_time = Time.now
problematic_results = problematic.process_batch(items)
problematic_time = Time.now - start_time
peak_memory = GC.stat[:heap_live_slots]

# Measure safe approach  
GC.start  # Clean slate
safe = SafeGCManagement.new
start_time = Time.now
safe_results = safe.process_batch(items)
safe_time = Time.now - start_time
final_memory = GC.stat[:heap_live_slots]

puts "Problematic approach: #{problematic_time.round(3)}s, peak memory: #{peak_memory} slots"
puts "Safe approach: #{safe_time.round(3)}s, final memory: #{final_memory} slots"

Object retention through unintended references prevents garbage collection and causes memory leaks. Ruby's mark-and-sweep collector keeps alive anything reachable from a live root, so a single lingering reference keeps an object, and everything it references, in memory regardless of intended usage.

# Subtle reference retention patterns
class SubtleMemoryLeak
  def initialize
    @event_handlers = {}
    @cached_data = {}
  end
  
  # Problematic: long-lived handler procs capture target objects
  def register_handler(event_type, target_object)
    handler = proc do |data|
      # This proc captures 'self' and 'target_object'
      process_event(data, target_object)
      
      # Problem: this proc lives as long as the manager instance and
      # captures target_object, keeping the target reachable
      @cached_data[target_object.object_id] = data
    end
    
    @event_handlers[event_type] = handler
    
    # Return handler so caller can store it (more references!)
    handler
  end
  
  def process_event(data, target)
    puts "Processing #{data} for #{target.class}"
  end
  
  # Cleanup method that doesn't actually clean up
  def cleanup
    @event_handlers.clear  # Clears handlers but not cached_data!
    # @cached_data still holds references to target objects
  end
end

# Demonstrate reference retention
class TestTarget
  attr_accessor :leak_manager
  
  def initialize(id)
    @id = id
  end
  
  def to_s
    "Target-#{@id}"
  end
end

# Create retention scenario
leak_manager = SubtleMemoryLeak.new
targets = Array.new(1000) { |i| TestTarget.new(i) }

# Create mutual references between manager and targets
targets.each_with_index do |target, i|
  target.leak_manager = leak_manager  # Target references manager
  leak_manager.register_handler("event_#{i}", target)  # Manager references target
end

# "Cleanup" doesn't release the retained targets
before_cleanup = GC.stat[:heap_live_slots]
targets = nil  # Release array reference
leak_manager.cleanup
GC.start
after_partial_cleanup = GC.stat[:heap_live_slots]

# Dropping the manager makes the whole object graph unreachable
leak_manager = nil
GC.start  
after_full_cleanup = GC.stat[:heap_live_slots]

puts "Before cleanup: #{before_cleanup} slots"
puts "After partial cleanup: #{after_partial_cleanup} slots"
puts "After full cleanup: #{after_full_cleanup} slots"
puts "Objects retained by partial cleanup: #{after_partial_cleanup - after_full_cleanup}"

Premature GC optimization can hurt performance more than help. Applications should measure actual GC impact before applying optimizations, as Ruby's default settings work well for many scenarios.

# Premature GC optimization example
class PrematureOptimization
  def initialize
    # "Optimizing" without measuring
    GC.disable  # Blanket disable - bad idea
    
    # Setting arbitrary parameters (these have no effect after boot anyway)
    ENV['RUBY_GC_HEAP_INIT_SLOTS'] = '1000000'  # Way too large
    ENV['RUBY_GC_HEAP_GROWTH_FACTOR'] = '3.0'   # Aggressive growth
    
    @object_pool = Array.new(10000) { String.new }  # Premature pooling
  end
  
  def process_data(data_set)
    # Forcing manual GC everywhere
    GC.start  # Unnecessary forced collection
    
    results = data_set.map do |item|
      # Using object pool for everything
      pooled_string = @object_pool.pop || String.new
      pooled_string.clear
      pooled_string << item.to_s
      
      result = pooled_string.upcase
      @object_pool.push(pooled_string)  # Return to pool
      
      GC.start if rand(100) == 0  # Random GC triggering
      
      result
    end
    
    GC.start  # More unnecessary GC
    results
  end
end

# Baseline: Ruby defaults
class DefaultBehavior  
  def process_data(data_set)
    # Let Ruby handle GC naturally
    data_set.map { |item| item.to_s.upcase }
  end
end

# Performance comparison
data_set = (1..10000).to_a

# Measure default behavior
default_processor = DefaultBehavior.new
start_time = Time.now
start_gc_count = GC.count
default_results = default_processor.process_data(data_set)
default_time = Time.now - start_time
default_gc_count = GC.count - start_gc_count

# Measure "optimized" behavior
optimized_processor = PrematureOptimization.new
start_time = Time.now  
start_gc_count = GC.count
optimized_results = optimized_processor.process_data(data_set)
optimized_time = Time.now - start_time
optimized_gc_count = GC.count - start_gc_count

puts "Default approach: #{default_time.round(3)}s, #{default_gc_count} GCs"
puts "Optimized approach: #{optimized_time.round(3)}s, #{optimized_gc_count} GCs" 
puts "Performance change: #{((optimized_time / default_time - 1) * 100).round}%"

Reference

Environment Variables

Variable Type Default Description
RUBY_GC_HEAP_INIT_SLOTS Integer 10000 Initial number of object slots in heap
RUBY_GC_HEAP_FREE_SLOTS Integer 4096 Minimum free slots maintained after GC
RUBY_GC_HEAP_GROWTH_FACTOR Float 1.8 Multiplier for heap expansion
RUBY_GC_HEAP_GROWTH_MAX_SLOTS Integer 0 Maximum slots to add during expansion (0 = unlimited)
RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR Float 2.0 Factor controlling major GC frequency
RUBY_GC_MALLOC_LIMIT Integer 16MB Malloc bytes threshold triggering GC
RUBY_GC_MALLOC_LIMIT_MAX Integer 32MB Maximum malloc limit
RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR Float 1.4 Growth factor for malloc limit
RUBY_GC_OLDMALLOC_LIMIT Integer 16MB Old malloc bytes triggering major GC
RUBY_GC_OLDMALLOC_LIMIT_MAX Integer 128MB Maximum old malloc limit

GC Module Methods

Method Parameters Returns Description
GC.count None Integer Total number of GC runs
GC.start(**opts) full_mark: Boolean, immediate_sweep: Boolean nil Force garbage collection
GC.enable None Boolean Enable GC; returns true if GC was previously disabled
GC.disable None Boolean Disable GC; returns true if GC was already disabled
GC.stress None Boolean Current GC stress mode state
GC.stress=(value) value: Boolean Boolean Enable/disable GC stress mode
GC.stat(key = nil) key: Symbol (optional) Hash or value GC statistics
GC.latest_gc_info(key = nil) key: Symbol (optional) Hash or value Info about most recent GC
GC.compact None Hash Compact heap and return move info
GC.verify_compaction_references None nil Debug compaction reference issues
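
GC.stat and GC.latest_gc_info also accept a single key, which avoids building the full hash:

```ruby
# Single-key access returns just that value
puts GC.stat(:count)            # total GC runs so far

# Details about the most recent collection
info = GC.latest_gc_info
puts info[:state]               # :none, :marking, or :sweeping
puts info[:major_by].inspect    # reason for the last major GC, or nil
```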

GC.stat Keys

Key Type Description
:count Integer Total GC runs since start
:time Integer Total GC time in milliseconds (Ruby 3.1+)
:minor_gc_count Integer Young generation collections
:major_gc_count Integer Old generation collections
:heap_allocated_pages Integer Total heap pages allocated
:heap_available_slots Integer Total object slots available
:heap_live_slots Integer Slots containing live objects
:heap_free_slots Integer Empty slots available for allocation
:heap_final_slots Integer Slots containing objects with finalizers
:heap_marked_slots Integer Slots marked during last collection
:heap_swept_slots Integer Slots swept during last collection
:malloc_increase_bytes Integer Current malloc increase since last GC
:malloc_increase_bytes_limit Integer Malloc increase limit triggering GC
:old_objects Integer Objects promoted to old generation
:old_objects_limit Integer Old object threshold for major GC
:oldmalloc_increase_bytes Integer Old malloc increase since major GC
:oldmalloc_increase_bytes_limit Integer Old malloc limit for major GC
:remembered_wb_unprotected_objects Integer Write barrier unprotected objects
:remembered_wb_unprotected_objects_limit Integer WB unprotected limit
:total_allocated_objects Integer Total objects allocated since start
:total_freed_objects Integer Total objects freed since start

GC Options for start()

Option Type Default Description
:full_mark Boolean true Perform full marking phase
:immediate_sweep Boolean true Sweep immediately after marking
:immediate_mark Boolean true Mark objects immediately rather than incrementally
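
For example, passing full_mark: false requests a minor (young-generation) collection only:

```ruby
before = GC.stat(:count)
GC.start(full_mark: false)   # minor collection: marks young objects only
after = GC.stat(:count)
puts "Collections run: #{after - before}"
```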

Tuning Guidelines by Application Type

Application Type Heap Init Growth Factor Malloc Limit Notes
Web Application 100,000 1.4 32MB Balance latency and throughput
Background Jobs 50,000 1.8 64MB Favor throughput over latency
Data Processing 200,000 2.0 128MB Large working sets
API Server 150,000 1.2 24MB Minimize response time variance
Development 25,000 1.8 16MB Default values work well
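
Because these variables are read at interpreter startup, a profile cannot be applied from within the running process; a small launcher can set the values and then exec the application. A sketch using the Web Application row above (the app.rb filename is a placeholder):

```ruby
# Hypothetical launcher: set a GC profile, then exec the real app.
# Values are the illustrative "Web Application" row from the table above.
profile = {
  'RUBY_GC_HEAP_INIT_SLOTS'    => '100000',
  'RUBY_GC_HEAP_GROWTH_FACTOR' => '1.4',
  'RUBY_GC_MALLOC_LIMIT'       => (32 * 1024 * 1024).to_s
}

# exec replaces this process; the env hash applies before Ruby boots.
# exec(profile, 'ruby', 'app.rb')
puts profile
```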

Memory Calculation Formulas

Calculation Formula Purpose
Heap Size (bytes) heap_available_slots * 40 Approximate heap memory usage
Utilization Ratio heap_live_slots / heap_available_slots Memory efficiency percentage
Allocations per GC total_allocated_objects / count Objects allocated per collection cycle
Average GC Time time / count Milliseconds per collection (Ruby 3.1+)
Major GC Ratio major_gc_count / count Percentage of major collections
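
The formulas above can be computed from a live snapshot; the 40-byte slot size is an approximation for 64-bit builds.

```ruby
s = GC.stat

heap_bytes    = s[:heap_available_slots] * 40                   # approximate heap memory
utilization   = s[:heap_live_slots].to_f / s[:heap_available_slots]
allocs_per_gc = s[:total_allocated_objects] / [s[:count], 1].max

puts "Approx. heap size: #{heap_bytes / 1024} KB"
puts "Utilization: #{(utilization * 100).round(1)}%"
puts "Objects allocated per GC cycle: #{allocs_per_gc}"
```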