CrackedRuby

Garbage Collection Control

Manual control and monitoring of Ruby's garbage collection system for memory optimization and performance tuning.


Overview

Ruby's garbage collection system runs automatically to reclaim memory from objects no longer referenced by the program. The GC module provides methods to manually control collection timing, gather statistics, and monitor memory usage patterns. Ruby uses a mark-and-sweep garbage collector with generational collection that divides objects into young and old generations.

The garbage collector identifies unreachable objects by marking all objects accessible from root references, then sweeping through memory to free unmarked objects. Ruby's GC operates in phases: marking, sweeping, and compacting (in newer versions). Each phase impacts application performance differently.
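On Ruby 2.7 and later, compaction can also be requested manually with GC.compact. A minimal sketch, guarded so it is skipped on older Rubies:

```ruby
# Manually compact the heap (Ruby 2.7+); returns a Hash of compaction statistics
if GC.respond_to?(:compact)
  stats = GC.compact
  puts stats.class  # => Hash
  # On newer Rubies, GC.latest_compact_info reports the most recent compaction
end
```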

# Basic GC information
GC.count          # => 42 (number of GC runs)
GC.stat[:count]   # => 42 (same information via stats)
GC.stat[:heap_allocated_pages]  # => 245

Ruby automatically triggers garbage collection when memory allocation reaches certain thresholds. The collector runs more frequently for young generation objects and less frequently for long-lived objects. This generational approach improves performance since most objects become garbage quickly.

# Check current GC status
GC.stat.select { |k, v| k.to_s.include?('count') }
# => {:count=>42, :major_gc_count=>12, :minor_gc_count=>30}

# Memory usage information
GC.stat[:heap_live_slots]     # => 125432
GC.stat[:heap_free_slots]     # => 8765
GC.stat[:total_allocated_objects]  # => 1456789

The GC module exposes methods for manual collection, enabling/disabling automatic collection, and retrieving detailed statistics about memory usage and collection frequency. These controls allow developers to optimize performance-critical sections and gather memory profiling data.
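A single statistic can be read without materializing the whole hash by passing its key to GC.stat:

```ruby
# Fetch one statistic directly -- avoids building the full stats hash
GC.stat(:count)            # => Integer, same value as GC.stat[:count]
GC.stat(:heap_live_slots)  # => Integer
```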

Basic Usage

Manual garbage collection control starts with GC.start, which immediately triggers a full collection cycle. This method blocks execution until collection completes, making it suitable for controlled environments but problematic in request-response cycles.

# Force garbage collection
objects_before = ObjectSpace.count_objects[:T_STRING]
1000.times { "temporary string #{rand}" }
objects_after = ObjectSpace.count_objects[:T_STRING]
puts "Created #{objects_after - objects_before} strings"

GC.start
objects_final = ObjectSpace.count_objects[:T_STRING]
puts "Collected #{objects_after - objects_final} strings"
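GC.start also accepts keyword arguments: full_mark: false restricts the cycle to a minor (young-generation) collection, and immediate_sweep: false lets the sweep phase run lazily as allocation demands:

```ruby
# Minor collection only: marks the young generation, skips old objects
GC.start(full_mark: false, immediate_sweep: true)

# Full mark, but defer sweeping until slots are actually needed
GC.start(full_mark: true, immediate_sweep: false)

puts "Minor GCs so far: #{GC.stat(:minor_gc_count)}"
```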

Disabling automatic garbage collection with GC.disable prevents Ruby from running collection cycles automatically. This creates memory pressure but eliminates GC pauses during critical operations. Memory allocation continues until manual collection or re-enabling automatic collection.

# Disable GC for performance-critical section
GC.disable
start_time = Time.now

# Memory-intensive operation without GC interruption
large_array = []
10_000.times do |i|
  large_array << { id: i, data: "item_#{i}" * 100 }
end

processing_time = Time.now - start_time
memory_used = GC.stat[:heap_live_slots]

GC.enable
GC.start  # Clean up accumulated garbage

puts "Processed without GC in #{processing_time}s"
puts "Memory slots used: #{memory_used}"

The GC.enable method reactivates automatic garbage collection after being disabled. Ruby immediately evaluates whether collection is needed based on current memory pressure and allocation patterns.

# GC.disable and GC.enable return the *previous* state:
# true means GC was already disabled, false means it was enabled
was_disabled = GC.disable
puts "GC was already disabled: #{was_disabled}"  # => false

# Process some data while GC is off
data = (1..5000).map { |n| n.to_s * 10 }

was_disabled = GC.enable
puts "GC was disabled before enable: #{was_disabled}"  # => true

Accessing garbage collection statistics provides insight into memory usage patterns and collection frequency. The GC.stat method returns a hash with detailed information about heap usage, object counts, and collection timing.

# Comprehensive GC statistics
stats = GC.stat
puts "Total collections: #{stats[:count]}"
puts "Major collections: #{stats[:major_gc_count]}"
puts "Minor collections: #{stats[:minor_gc_count]}"
puts "Live objects: #{stats[:heap_live_slots]}"
puts "Free slots: #{stats[:heap_free_slots]}"
puts "Allocated objects: #{stats[:total_allocated_objects]}"
puts "Freed objects: #{stats[:total_freed_objects]}"
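When statistics are sampled in a loop, GC.stat can fill a caller-supplied hash in place rather than allocating a fresh hash on every call:

```ruby
# Reuse one hash across samples to avoid allocating a new hash per call
sample = {}
3.times do
  GC.stat(sample)  # updates sample in place
  puts "Live slots: #{sample[:heap_live_slots]}"
  sleep 0.01
end
```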

Performance & Memory

Garbage collection timing significantly impacts application performance. Each collection cycle pauses execution while marking and sweeping memory, creating latency spikes in responsive applications. Understanding collection patterns helps optimize memory allocation strategies.

# Measure GC impact on performance
def benchmark_with_gc_stats
  start_stats = GC.stat
  start_time = Time.now
  
  # Memory-intensive operation
  arrays = []
  1000.times do
    arrays << Array.new(1000) { rand(1000) }
  end
  
  end_time = Time.now
  end_stats = GC.stat
  
  {
    duration: end_time - start_time,
    gc_runs: end_stats[:count] - start_stats[:count],
    objects_allocated: end_stats[:total_allocated_objects] - start_stats[:total_allocated_objects],
    objects_freed: end_stats[:total_freed_objects] - start_stats[:total_freed_objects]
  }
end

results = benchmark_with_gc_stats
puts "Duration: #{results[:duration]}s"
puts "GC runs: #{results[:gc_runs]}"
puts "Objects allocated: #{results[:objects_allocated]}"
puts "Objects freed: #{results[:objects_freed]}"

Memory allocation patterns affect garbage collection frequency. Creating many short-lived objects triggers frequent minor collections, while long-lived objects accumulate in older generations and require major collections.

# Compare allocation strategies
def create_many_small_objects
  GC.start  # Clean slate
  start_count = GC.count
  
  10_000.times { "string_#{rand(1000)}" }
  
  GC.count - start_count
end

def create_few_large_objects  
  GC.start  # Clean slate
  start_count = GC.count
  
  100.times { "x" * 100_000 }
  
  GC.count - start_count
end

small_gc_count = create_many_small_objects
large_gc_count = create_few_large_objects

puts "Small objects triggered #{small_gc_count} collections"
puts "Large objects triggered #{large_gc_count} collections"

Heap growth patterns indicate memory usage efficiency. Ruby allocates memory in pages, and excessive page allocation suggests memory pressure or inefficient object lifecycle management.

# Monitor heap growth during processing
def monitor_heap_growth(&block)
  initial_pages = GC.stat[:heap_allocated_pages]
  initial_slots = GC.stat[:heap_available_slots]
  
  yield
  
  final_pages = GC.stat[:heap_allocated_pages]
  final_slots = GC.stat[:heap_available_slots]
  
  {
    pages_added: final_pages - initial_pages,
    slots_added: final_slots - initial_slots,
    pages_total: final_pages,
    slots_total: final_slots
  }
end

# Test with different workloads
small_growth = monitor_heap_growth { 1000.times { |i| i.to_s } }
large_growth = monitor_heap_growth { 1000.times { |i| "data" * 1000 } }

puts "Small strings: #{small_growth[:pages_added]} pages, #{small_growth[:slots_added]} slots"
puts "Large strings: #{large_growth[:pages_added]} pages, #{large_growth[:slots_added]} slots"

Object lifecycle optimization reduces garbage collection pressure by reusing objects instead of creating new instances. Pooling strategies and in-place modifications minimize allocation overhead.

# Compare object creation vs reuse
class StringProcessor
  def initialize
    @buffer = String.new
  end
  
  def process_with_reuse(data)
    @buffer.clear
    data.each do |item|
      @buffer << item.to_s
      @buffer << "\n"
    end
    @buffer.dup
  end
  
  def process_with_creation(data)
    result = ""
    data.each do |item|
      result += item.to_s + "\n"
    end
    result
  end
end

processor = StringProcessor.new
test_data = (1..1000).to_a

# Benchmark both approaches
reuse_stats = GC.stat
processor.process_with_reuse(test_data)
reuse_allocated = GC.stat[:total_allocated_objects] - reuse_stats[:total_allocated_objects]

creation_stats = GC.stat
processor.process_with_creation(test_data)
creation_allocated = GC.stat[:total_allocated_objects] - creation_stats[:total_allocated_objects]

puts "Reuse approach allocated: #{reuse_allocated} objects"
puts "Creation approach allocated: #{creation_allocated} objects"
puts "Reduction: #{((creation_allocated - reuse_allocated) / creation_allocated.to_f * 100).round(2)}%"

Thread Safety & Concurrency

Ruby's garbage collector operates across all threads simultaneously, pausing execution in all threads during collection phases. This stop-the-world behavior affects multithreaded applications differently than single-threaded programs, requiring careful consideration of GC timing and thread coordination.

# Demonstrate GC impact on multiple threads

def create_worker_thread(name, work_size)
  Thread.new do
    puts "#{name} starting work"
    start_time = Time.now
    
    # Create objects that will become garbage
    work_size.times do |i|
      data = Array.new(100) { rand(1000) }
      # Process data briefly then discard
      data.sum if i % 100 == 0
    end
    
    duration = Time.now - start_time
    puts "#{name} completed in #{duration}s"
  end
end

# Start multiple threads with different workloads
threads = [
  create_worker_thread("Heavy", 10_000),
  create_worker_thread("Medium", 5_000), 
  create_worker_thread("Light", 1_000)
]

# Monitor GC while threads run
gc_monitor = Thread.new do
  initial_count = GC.count
  while threads.any?(&:alive?)
    current_count = GC.count
    if current_count > initial_count
      puts "GC occurred - all threads were paused during the cycle"
      initial_count = current_count
    end
    sleep 0.1
  end
end

threads.each(&:join)
gc_monitor.kill

Manual garbage collection in multithreaded environments affects all threads simultaneously. Calling GC.start from any thread pauses execution across the entire process, making it unsuitable for background collection in responsive applications.

# Show cross-thread GC impact

shared_data = []
mutex = Mutex.new
gc_thread_active = true

# Background thread adding data
producer = Thread.new do
  counter = 0
  while gc_thread_active
    mutex.synchronize do
      shared_data << { id: counter, timestamp: Time.now }
      counter += 1
    end
    sleep 0.01
  end
end

# Foreground thread triggering GC
gc_controller = Thread.new do
  sleep 1  # Let producer run
  puts "Triggering GC - will pause producer"
  
  start_size = nil
  mutex.synchronize { start_size = shared_data.size }
  
  GC.start  # This pauses ALL threads
  
  end_size = nil
  mutex.synchronize { end_size = shared_data.size }
  
  puts "Data added during GC: #{end_size - start_size} (typically 0, since GC pauses all threads)"
  gc_thread_active = false
end

[producer, gc_controller].each(&:join)

Disabling garbage collection affects memory pressure across all threads. When one thread disables GC, memory allocation continues in all threads until collection is manually triggered or re-enabled from any thread.

# Memory pressure across threads with disabled GC

memory_data = {}
threads_finished = false

# Thread 1: Allocates large objects
allocator = Thread.new do
  counter = 0
  while !threads_finished
    # Create large temporary objects
    large_data = Array.new(10_000) { "data_#{counter}_#{rand(1000)}" }
    counter += 1
    sleep 0.05
  end
  puts "Allocator created #{counter} large arrays"
end

# Thread 2: Monitors memory usage
monitor = Thread.new do
  GC.disable  # Disabling GC from this thread affects every thread in the process
  
  while !threads_finished
    stats = GC.stat
    memory_data[Time.now] = {
      live_slots: stats[:heap_live_slots],
      free_slots: stats[:heap_free_slots],
      pages: stats[:heap_allocated_pages]
    }
    sleep 0.1
  end
  
  GC.enable
  GC.start  # Clean up accumulated objects
  puts "Memory monitoring complete"
end

sleep 2
threads_finished = true
[allocator, monitor].each(&:join)

# Show memory growth
sorted_times = memory_data.keys.sort
first_sample = memory_data[sorted_times.first]
last_sample = memory_data[sorted_times.last]

puts "Live slots grew from #{first_sample[:live_slots]} to #{last_sample[:live_slots]}"
puts "Pages grew from #{first_sample[:pages]} to #{last_sample[:pages]}"

Production Patterns

Production applications require garbage collection monitoring to identify memory leaks, optimize performance, and prevent out-of-memory conditions. Establishing baseline GC metrics helps detect abnormal memory usage patterns before they impact system stability.

# Production GC monitoring system
class GCMonitor
  def initialize(alert_threshold: 100, sample_interval: 30)
    @alert_threshold = alert_threshold
    @sample_interval = sample_interval
    @baseline_stats = nil
    @alert_callbacks = []
  end
  
  def establish_baseline
    # Take multiple samples to establish normal GC patterns
    samples = []
    5.times do
      samples << GC.stat.dup
      sleep @sample_interval / 5
    end
    
    @baseline_stats = {
      avg_major_gc_count: samples.map { |s| s[:major_gc_count] }.sum / samples.size.to_f,
      avg_minor_gc_count: samples.map { |s| s[:minor_gc_count] }.sum / samples.size.to_f,
      avg_heap_pages: samples.map { |s| s[:heap_allocated_pages] }.sum / samples.size.to_f
    }
    
    puts "Baseline established: #{@baseline_stats}"
  end
  
  def monitor_continuously
    Thread.new do
      loop do
        current_stats = GC.stat
        check_for_anomalies(current_stats)
        sleep @sample_interval
      end
    end
  end
  
  def on_alert(&callback)
    @alert_callbacks << callback
  end
  
  private
  
  def check_for_anomalies(stats)
    return unless @baseline_stats
    
    alerts = []
    
    # Check for excessive major GC activity
    if stats[:major_gc_count] > @baseline_stats[:avg_major_gc_count] * 2
      alerts << "High major GC count: #{stats[:major_gc_count]}"
    end
    
    # Check for heap growth
    if stats[:heap_allocated_pages] > @baseline_stats[:avg_heap_pages] * 1.5
      alerts << "Heap growth: #{stats[:heap_allocated_pages]} pages"
    end
    
    # Check for low free slots (memory pressure)
    if stats[:heap_free_slots] < stats[:heap_live_slots] * 0.1
      alerts << "Low free slots: #{stats[:heap_free_slots]}"
    end
    
    alerts.each { |alert| trigger_alert(alert, stats) }
  end
  
  def trigger_alert(message, stats)
    @alert_callbacks.each { |callback| callback.call(message, stats) }
  end
end

# Set up production monitoring
monitor = GCMonitor.new(sample_interval: 60)

monitor.on_alert do |message, stats|
  puts "[ALERT] #{Time.now}: #{message}"
  puts "  Live objects: #{stats[:heap_live_slots]}"
  puts "  GC count: #{stats[:count]}"
  # In production: send to logging system, metrics service, etc.
end

monitor.establish_baseline
monitoring_thread = monitor.monitor_continuously

# Simulate production load
load_simulation = Thread.new do
  1000.times do |i|
    # Simulate request processing
    request_data = Array.new(rand(100..500)) { "request_#{i}_#{rand(1000)}" }
    
    # Occasionally create memory pressure
    if i % 100 == 0
      large_response = "x" * 100_000
    end
    
    sleep 0.1
  end
end

# Let monitoring run
load_simulation.join
sleep 10  # Allow final monitoring samples
monitoring_thread.kill

Web applications benefit from strategic garbage collection timing between requests to minimize response latency. Middleware can trigger collection during idle periods or after memory-intensive operations.

# Rack middleware for GC optimization
class GCOptimizationMiddleware
  def initialize(app, options = {})
    @app = app
    @gc_frequency = options[:gc_frequency] || 10
    @memory_threshold = options[:memory_threshold] || 50_000
    @request_count = 0
  end
  
  def call(env)
    pre_request_stats = GC.stat
    
    # Process the request
    status, headers, response = @app.call(env)
    
    post_request_stats = GC.stat
    objects_allocated = post_request_stats[:total_allocated_objects] - 
                       pre_request_stats[:total_allocated_objects]
    
    # Decide whether to trigger GC
    should_gc = should_trigger_gc?(objects_allocated)
    
    if should_gc
      GC.start
      log_gc_decision(env, objects_allocated, true)
    end
    
    @request_count += 1
    [status, headers, response]
  end
  
  private
  
  def should_trigger_gc?(objects_allocated)
    # Trigger GC based on multiple criteria
    return true if objects_allocated > @memory_threshold
    return true if @request_count % @gc_frequency == 0
    return true if GC.stat[:heap_free_slots] < GC.stat[:heap_live_slots] * 0.2
    
    false
  end
  
  def log_gc_decision(env, objects_allocated, triggered)
    path = env['PATH_INFO']
    method = env['REQUEST_METHOD']
    
    if triggered
      puts "[GC] Triggered after #{method} #{path} (#{objects_allocated} objects)"
    end
  end
end

# Usage in web application
# use GCOptimizationMiddleware, gc_frequency: 5, memory_threshold: 100_000

Database-heavy applications require careful GC management during bulk operations to prevent memory exhaustion while processing large result sets.

# Database batch processing with GC management
class BatchProcessor
  def initialize(batch_size: 1000, gc_interval: 10)
    @batch_size = batch_size
    @gc_interval = gc_interval
  end
  
  def process_large_dataset(query)
    batch_count = 0
    total_processed = 0
    gc_stats = { initial: GC.stat.dup }
    
    # Simulate database cursor/streaming
    simulate_database_results(query) do |batch|
      process_batch(batch)
      batch_count += 1
      total_processed += batch.size
      
      # Periodic GC to manage memory
      if batch_count % @gc_interval == 0
        before_gc = GC.stat[:heap_live_slots]
        GC.start
        after_gc = GC.stat[:heap_live_slots]
        
        puts "Batch #{batch_count}: Processed #{total_processed} records"
        puts "  GC freed #{before_gc - after_gc} slots"
        puts "  Memory pages: #{GC.stat[:heap_allocated_pages]}"
      end
    end
    
    gc_stats[:final] = GC.stat.dup
    report_processing_stats(total_processed, gc_stats)
  end
  
  private
  
  def simulate_database_results(query)
    # Simulate large result set processing
    total_records = 50_000
    (0...total_records).each_slice(@batch_size) do |batch_ids|
      # Simulate fetching batch from database
      batch = batch_ids.map do |id|
        {
          id: id,
          data: "record_data_#{id}" * rand(10..50),
          metadata: { processed_at: Time.now, batch: id / @batch_size }
        }
      end
      yield batch
    end
  end
  
  def process_batch(batch)
    # Simulate processing work that creates temporary objects
    batch.each do |record|
      # Transform data (creates intermediate objects)
      processed = record[:data].upcase.split('_').join('-')
      
      # Validate (creates temporary objects)
      validation_result = processed.length > 10 && processed.include?('-')
      
      # Store or transmit result (retain only necessary data)
      store_processed_record(record[:id], validation_result)
    end
  end
  
  def store_processed_record(id, valid)
    # Simulate storing minimal data
    @results ||= {}
    @results[id] = valid
  end
  
  def report_processing_stats(total_processed, gc_stats)
    initial = gc_stats[:initial]
    final = gc_stats[:final]
    
    puts "\nProcessing complete:"
    puts "  Records processed: #{total_processed}"
    puts "  GC runs: #{final[:count] - initial[:count]}"
    puts "  Objects allocated: #{final[:total_allocated_objects] - initial[:total_allocated_objects]}"
    puts "  Objects freed: #{final[:total_freed_objects] - initial[:total_freed_objects]}"
    puts "  Final heap pages: #{final[:heap_allocated_pages]}"
    puts "  Results stored: #{@results&.size || 0}"
  end
end

# Process large dataset with controlled GC
processor = BatchProcessor.new(batch_size: 500, gc_interval: 5)
processor.process_large_dataset("SELECT * FROM large_table WHERE active = true")

Reference

Core Methods

Method Parameters Returns Description
GC.start full_mark: true, immediate_sweep: true nil Triggers garbage collection immediately
GC.enable none true or false Enables automatic GC; returns true if it was previously disabled
GC.disable none true or false Disables automatic GC; returns true if it was already disabled
GC.count none Integer Returns number of GC runs since start
GC.stat hash or key (optional) Hash or Integer Returns all GC statistics, or a single value when given a Symbol key

GC Statistics Keys

Statistic Type Description
:count Integer Total garbage collections
:major_gc_count Integer Major (full) garbage collections
:minor_gc_count Integer Minor (generational) garbage collections
:heap_allocated_pages Integer Total memory pages allocated
:heap_available_slots Integer Total object slots available
:heap_live_slots Integer Object slots currently in use
:heap_free_slots Integer Object slots available for allocation
:heap_final_slots Integer Object slots with finalizers pending execution
:total_allocated_objects Integer Objects allocated since start
:total_freed_objects Integer Objects freed by GC since start

Memory Metrics Reference

Metric Calculation Interpretation
Memory Utilization heap_live_slots / heap_available_slots Percentage of allocated memory in use
Allocation Rate total_allocated_objects / uptime Objects allocated per second
GC Frequency count / uptime Collections per second
Collection Efficiency total_freed_objects / total_allocated_objects Percentage of objects eventually freed
Heap Growth Rate heap_allocated_pages over time Memory expansion pattern
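These metrics can be derived directly from GC.stat; the uptime value here is an assumption supplied by the caller, since Ruby does not track process uptime for you:

```ruby
# Compute the reference metrics; uptime_seconds must be tracked by the caller
def gc_metrics(uptime_seconds)
  s = GC.stat
  {
    memory_utilization:    s[:heap_live_slots].to_f / s[:heap_available_slots],
    allocation_rate:       s[:total_allocated_objects] / uptime_seconds,
    gc_frequency:          s[:count] / uptime_seconds,
    collection_efficiency: s[:total_freed_objects].to_f / s[:total_allocated_objects]
  }
end

puts gc_metrics(60.0)  # e.g. a process that has been running for one minute
```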

GC Tuning Environment Variables

Variable Values Effect
RUBY_GC_HEAP_INIT_SLOTS Integer Initial object slots allocated
RUBY_GC_HEAP_FREE_SLOTS Integer Minimum free slots maintained
RUBY_GC_HEAP_GROWTH_FACTOR Float Heap expansion multiplier
RUBY_GC_HEAP_GROWTH_MAX_SLOTS Integer Maximum slots added per expansion
RUBY_GC_MALLOC_LIMIT Integer Malloc bytes triggering GC
RUBY_GC_MALLOC_LIMIT_MAX Integer Maximum malloc limit
RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR Float Malloc limit growth rate
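These variables are read once at interpreter startup, so they must be set in the environment before Ruby boots; the values below are illustrative only, not tuning recommendations:

```ruby
# Set in the shell before launching, e.g.:
#   RUBY_GC_HEAP_INIT_SLOTS=600000 RUBY_GC_HEAP_GROWTH_FACTOR=1.25 ruby app.rb

# Inside the process, confirm what was picked up:
puts ENV['RUBY_GC_HEAP_INIT_SLOTS']  # nil when unset
puts GC.stat[:heap_available_slots]  # reflects the initial slot allocation
```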

ObjectSpace Integration

Method Parameters Returns Description
ObjectSpace.count_objects result_hash = nil Hash Object counts by type
ObjectSpace.count_objects_size result_hash = nil Hash Memory usage by object type (needs require 'objspace')
ObjectSpace.memsize_of(obj) Object Integer Memory size of a specific object (needs require 'objspace')
ObjectSpace.memsize_of_all klass = nil Integer Total memory of live objects, optionally limited to a class (needs require 'objspace')

Performance Patterns

Pattern Code Example Use Case
Batch GC GC.disable; process_batch; GC.enable; GC.start Eliminate GC pauses during critical work
Memory Monitoring before = GC.stat; work; after = GC.stat Track allocation patterns
Heap Preallocation Set RUBY_GC_HEAP_INIT_SLOTS Reduce early heap expansions
Object Reuse @buffer.clear; populate_buffer Minimize allocation overhead
Periodic Collection GC.start if counter % interval == 0 Control collection timing

Common GC Statistics Combinations

# Memory pressure indicators
def memory_pressure_score
  stats = GC.stat
  live_ratio = stats[:heap_live_slots].to_f / stats[:heap_available_slots]
  free_ratio = stats[:heap_free_slots].to_f / stats[:heap_available_slots]
  
  pressure = (live_ratio * 0.7) + ((1.0 - free_ratio) * 0.3)
  (pressure * 100).round(2)
end

# Allocation efficiency metrics  
def allocation_efficiency
  stats = GC.stat
  return 0 if stats[:total_allocated_objects] == 0
  
  freed_ratio = stats[:total_freed_objects].to_f / stats[:total_allocated_objects]
  (freed_ratio * 100).round(2)
end

# GC overhead estimation
def gc_overhead_estimate
  stats = GC.stat
  # Rough estimate: each major GC ~5ms, minor GC ~1ms
  major_time = stats[:major_gc_count] * 0.005
  minor_time = stats[:minor_gc_count] * 0.001
  
  {
    major_gc_time: major_time,
    minor_gc_time: minor_time, 
    total_gc_time: major_time + minor_time
  }
end
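On Ruby 3.1 and later, such estimates can be replaced with measured values: GC.total_time reports cumulative GC time in nanoseconds while GC.measure_total_time is enabled (the default). A guarded sketch:

```ruby
# Exact GC timing (Ruby 3.1+); guarded so older Rubies skip it
if GC.respond_to?(:total_time)
  GC.measure_total_time = true  # already true by default on 3.1+
  GC.start
  puts "Cumulative GC time: #{GC.total_time / 1_000_000.0} ms"
end
```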