CrackedRuby

Garbage Collection Concepts

Overview

Garbage collection automates memory management by identifying and reclaiming memory occupied by objects no longer accessible to a program. The garbage collector tracks object references, determines which objects remain reachable from program execution, and frees memory occupied by unreachable objects.

Manual memory management requires programmers to explicitly allocate and deallocate memory, leading to errors including memory leaks (failing to free unused memory), dangling pointers (accessing freed memory), and double-free bugs (freeing the same memory twice). Garbage collection eliminates these categories of errors by managing the entire lifecycle of memory allocation and deallocation.

The fundamental operation involves two phases: identifying live objects and reclaiming dead objects. A live object is reachable through some chain of references starting from root references, which include global variables, stack frames, and CPU registers. Dead objects have no path of references from any root.

# Objects become garbage when no references remain
def create_temporary
  temp = "This string will be garbage collected"
  # temp goes out of scope when method returns
end

create_temporary
# The string object has no remaining references
# GC will reclaim its memory

Programming languages implementing garbage collection include Ruby, Python, Java, JavaScript, Go, and C#. Languages requiring manual memory management include C and C++; Rust takes a third path, avoiding both GC and manual deallocation through compile-time ownership and lifetime rules.

Key Principles

Reachability Analysis

Garbage collectors determine object liveness through reachability from root references. Root references form the starting point for tracing object graphs. The collector marks all objects reachable from roots, then reclaims unmarked objects.

Roots include:

  • Global variables and constants
  • Local variables in active stack frames
  • CPU registers holding object references
  • Thread-local storage
  • JIT-compiled code references

An object becomes garbage when no chain of references connects it to any root. Circular references between objects do not prevent collection if the entire cycle is unreachable from roots.

class Node
  attr_accessor :next
  
  def initialize(value)
    @value = value
  end
end

# Create circular reference
node1 = Node.new(1)
node2 = Node.new(2)
node1.next = node2
node2.next = node1

# Break reference from root
node1 = nil
node2 = nil

# Both nodes are now garbage despite circular reference
# GC can collect them because no root references exist

Generational Hypothesis

Most objects die young. The generational hypothesis observes that recently allocated objects become garbage more frequently than long-lived objects. Generational collectors exploit this pattern by segregating objects by age and collecting young objects more frequently.

A typical generational scheme uses multiple generations:

  • Young generation: newly allocated objects
  • Old generation: objects surviving multiple collections
  • Permanent generation: class metadata and constants (in some implementations)

Promotion moves objects from younger to older generations after surviving collections. Minor collections scan only young generations, running frequently with low overhead. Major collections scan all generations, running less frequently but with higher overhead.
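
In Ruby, the minor/major split can be observed directly through GC.stat. The sketch below uses only stable CRuby statistics keys:

```ruby
# Observe the minor/major split via GC.stat (CRuby-specific keys).
before_minor = GC.stat(:minor_gc_count)
before_major = GC.stat(:major_gc_count)

GC.start(full_mark: false)  # request a minor collection
GC.start(full_mark: true)   # request a major collection

minor_delta = GC.stat(:minor_gc_count) - before_minor
major_delta = GC.stat(:major_gc_count) - before_major
puts "minor collections: +#{minor_delta}, major collections: +#{major_delta}"
```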

Write Barriers

Generational collection requires tracking references from old objects to young objects. Write barriers intercept pointer updates to maintain this information. When an old object receives a reference to a young object, the write barrier records this cross-generational reference.

Without write barriers, collecting only the young generation risks collecting live young objects referenced solely from old objects. The write barrier maintains a remembered set of old-to-young references, treated as additional roots during minor collections.
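
As an illustration only (not CRuby's actual implementation), a write barrier maintaining a remembered set might be sketched like this; the names `ToyHeap`, `promote`, and `write_barrier` are invented for the example:

```ruby
# Toy write barrier: records old objects that gain references to young
# objects, so a minor collection can treat them as extra roots.
class ToyHeap
  attr_reader :remembered_set

  def initialize
    @old_objects = {}     # object_id => true for promoted objects
    @remembered_set = []  # old objects holding young references
  end

  def promote(obj)
    @old_objects[obj.object_id] = true
  end

  def write_barrier(parent, child)
    if @old_objects[parent.object_id] && !@old_objects[child.object_id]
      @remembered_set << parent  # old -> young reference recorded
    end
  end
end
```

A minor collection in this model would scan its roots plus every object in `remembered_set`, never needing to walk the old generation.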

Stop-the-World vs Concurrent Collection

Stop-the-world collection pauses all application threads during garbage collection. This ensures objects remain stationary while the collector examines the heap, simplifying collector implementation but introducing latency spikes.

Concurrent collection runs simultaneously with application threads, reducing pause times at the cost of increased complexity. The collector must handle objects being modified during collection, requiring write barriers and additional bookkeeping.

Incremental collection divides collection work into smaller units, interleaving brief collection phases with application execution. This bounds maximum pause times but may increase total collection overhead.

Conservative vs Precise Collection

Conservative collectors treat any bit pattern that could represent a memory address as a potential reference. This allows collecting languages without explicit type information but prevents moving objects (the apparent reference might not be real) and can cause memory leaks (data values mistaken for references).

Precise collectors know exactly which values are references, enabled by type information from the runtime or compiler. Precise collection allows compacting collectors to move objects and guarantees collecting all garbage.

Ruby Implementation

Ruby uses a mark-and-sweep garbage collector with generational collection capabilities. The implementation combines multiple collection strategies to balance throughput and pause times.

Mark-and-Sweep Algorithm

Ruby's primary collection algorithm marks reachable objects, then sweeps through memory to reclaim unmarked objects. The marking phase traverses the object graph from roots, setting a mark bit on each reachable object. The sweep phase iterates through all allocated memory, freeing objects without mark bits.
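
The two phases can be sketched over a toy object graph (an illustrative model, not CRuby's internals; `heap`, `roots`, and `edges` are invented inputs):

```ruby
# Toy mark-and-sweep: `edges` maps each object to the objects it
# references, `heap` lists every allocated object, `roots` seed marking.
def mark_and_sweep(heap, roots, edges)
  marked = {}
  stack = roots.dup
  until stack.empty?                 # mark phase: trace from roots
    obj = stack.pop
    next if marked[obj]
    marked[obj] = true
    stack.concat(edges.fetch(obj, []))
  end
  heap.select { |obj| marked[obj] }  # sweep phase: keep marked objects
end
```

Note that an unreachable cycle (an object referencing itself, say) is swept just like any other unmarked object.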

# Objects are marked during collection
class Container
  def initialize
    @objects = Array.new(1000) { |i| "Object #{i}" }
  end
end

# Create objects
container = Container.new

# Force garbage collection
GC.start

# container and its objects are marked as reachable
# Other unreferenced objects are collected

Generational Collection

Ruby's generational collector (RGenGC) uses two generations: young and old. New objects are allocated young, and objects that survive a few collections are promoted to the old generation, which is scanned only during major collections. Long-lived data such as frozen string literals and symbols typically ends up in the old generation.

# Control generational collection behavior
GC.start(full_mark: false)  # Minor collection (young generation)
GC.start(full_mark: true)   # Major collection (all generations)

# Check object generation (allocation tracing requires objspace)
require 'objspace'

ObjectSpace.trace_object_allocations_start
new_obj = "new string"
generation = ObjectSpace.allocation_generation(new_obj)
ObjectSpace.trace_object_allocations_stop

Tri-color Marking

Ruby uses tri-color marking to implement incremental collection. Objects transition through three colors:

  • White: not yet examined
  • Gray: examined but references not yet scanned
  • Black: examined with all references scanned

Collection proceeds by processing gray objects, marking their references gray, and coloring the processed object black. When no gray objects remain, white objects are garbage.
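
The invariant can be modeled in a few lines (an illustrative sketch over symbolic objects, not CRuby's marker):

```ruby
# Toy tri-color marking: process gray objects until none remain;
# anything still white at the end is garbage.
def tricolor_mark(roots, edges)
  color = Hash.new(:white)
  gray = roots.dup
  gray.each { |r| color[r] = :gray }
  until gray.empty?
    obj = gray.pop
    edges.fetch(obj, []).each do |ref|
      if color[ref] == :white
        color[ref] = :gray   # discovered, references not yet scanned
        gray << ref
      end
    end
    color[obj] = :black      # fully scanned
  end
  color
end
```

Incremental collectors pause and resume this loop; the write barrier keeps the invariant that no black object points directly at a white one.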

# Observe marking behavior through stats
before = GC.stat(:total_allocated_objects)
10000.times { Object.new }
after = GC.stat(:total_allocated_objects)

puts "Allocated: #{after - before} objects"
puts "Heap pages: #{GC.stat(:heap_allocated_pages)}"
puts "Old objects: #{GC.stat(:old_objects)}"

Write Barriers

Ruby implements write barriers to track old-to-young references. When an old object receives a reference to a young object, the write barrier marks the old object, adding it to the remembered set for minor collections.

# Write barrier activates when old object references young object
old_array = []
GC.start  # Promote old_array to old generation

100.times { GC.start }  # Ensure old_array is in old generation

young_object = "newly created"
old_array << young_object  # Write barrier triggers here

GC Tuning Variables

Ruby exposes environment variables for tuning garbage collection:

# Inspect build-time GC constants
GC::INTERNAL_CONSTANTS
# => {:RVALUE_SIZE=>40, :HEAP_PAGE_OBJ_LIMIT=>408, ...}

# Read malloc counters (read-only; available only when Ruby is built
# with CALC_EXACT_MALLOC_SIZE)
GC.malloc_allocated_size
GC.malloc_allocations

# Tune heap growth via environment variables; these must be set before
# the Ruby process starts (assigning ENV at runtime has no effect):
#
#   RUBY_GC_HEAP_INIT_SLOTS=100000 \
#   RUBY_GC_HEAP_GROWTH_FACTOR=1.1 \
#   RUBY_GC_HEAP_GROWTH_MAX_SLOTS=100000 ruby app.rb

Compaction

Recent Ruby versions support heap compaction to reduce memory fragmentation. Compaction moves objects to consolidate free memory, updating all references to moved objects.

# Manual compaction
GC.compact

# Auto-compaction during major GC
GC.auto_compact = true

# Check compaction statistics
stats = GC.stat
puts "Compactions run: #{stats[:compact_count]}"

Performance Considerations

Collection Frequency

Frequent collections reduce memory usage but increase CPU overhead. Infrequent collections allow memory to grow but reduce collection overhead. The optimal frequency depends on allocation rate, heap size, and acceptable pause times.

# Monitor collection frequency
before_count = GC.count
sleep 1
after_count = GC.count

puts "Collections per second: #{after_count - before_count}"

# GC::OPTS lists compile-time GC build options (informational only)
GC::OPTS

Pause Time Impact

Stop-the-world collections pause application execution. Pause duration correlates with heap size and number of live objects. Large heaps with many live objects produce longer pauses.

Minor collections examine only young generation, producing shorter pauses. Major collections examine all generations, producing longer pauses proportional to total heap size.

require 'benchmark'

# Measure GC pause time
result = Benchmark.measure do
  GC.start
end

puts "GC pause: #{result.real} seconds"

# Minimize pause times by keeping heap smaller
# and promoting fewer objects to old generation

Allocation Pressure

High allocation rates increase collection frequency. Reducing allocation pressure decreases GC overhead. Object pooling, object reuse, and avoiding temporary objects reduce allocations.

# High allocation pressure
def inefficient
  1000.times do
    temp = "temporary string"  # Allocates new object each iteration
    process(temp)
  end
end

# Lower allocation pressure  
def efficient
  temp = "reused string"
  1000.times do
    temp.replace("new content")  # Reuses same object
    process(temp)
  end
end

Memory Fragmentation

Fragmentation occurs when free memory scatters across non-contiguous regions. Ruby's heap consists of fixed-size pages. Objects of different sizes can cause fragmentation within pages, reducing effective memory utilization.

Compaction addresses fragmentation by moving objects to consolidate free space. However, compaction itself incurs overhead during the compaction phase.
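
A rough utilization signal can be derived from GC.stat; low utilization right after a major collection suggests fragmentation that compaction could recover. This is a heuristic sketch, not an exact fragmentation measure:

```ruby
# Heuristic fragmentation signal: live slots vs total slot capacity.
GC.start
stats = GC.stat
utilization = stats[:heap_live_slots].to_f / stats[:heap_available_slots]
puts format("slot utilization after GC: %.1f%%", utilization * 100)
```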

Write Barrier Overhead

Generational collection imposes write barrier overhead on reference updates. Each pointer assignment checks whether it creates an old-to-young reference. This overhead is typically small but measurable in write-heavy workloads.

require 'benchmark'

# Write barrier overhead in hot path
array = Array.new(1000)
GC.start  # Promote to old generation
100.times { GC.start }

# Each append triggers write barrier
Benchmark.bmbm do |x|
  x.report("append") do
    100000.times { array << Object.new }
  end
end

GC-Aware Programming

Understanding GC behavior enables performance optimization. Patterns include:

  • Reusing objects instead of allocating new ones
  • Reducing object lifetime variance (objects die together)
  • Avoiding unexpected object retention
  • Breaking circular references explicitly when possible
  • Using frozen objects for constants

# GC-aware pattern: object pooling
class ObjectPool
  def initialize
    @pool = []
  end
  
  def acquire
    @pool.pop || create_new
  end
  
  def release(obj)
    obj.reset
    @pool.push(obj)
  end
  
  private
  
  def create_new
    PooledObject.new
  end
end

Common Patterns

Object Pooling

Object pooling reuses objects instead of allocating new ones. This reduces allocation rate and GC pressure. The pool maintains a collection of reusable objects, lending them to clients and reclaiming them after use.

class ConnectionPool
  def initialize(size)
    @max_size = size
    @pool = size.times.map { create_connection }
    @mutex = Mutex.new
  end
  
  def acquire
    @mutex.synchronize do
      @pool.pop || create_connection
    end
  end
  
  def release(connection)
    @mutex.synchronize do
      @pool.push(connection) if @pool.size < @max_size
    end
  end
  
  private
  
  def create_connection
    Connection.new
  end
end

Weak References

Weak references allow referencing objects without preventing collection. The weak reference becomes nil when the garbage collector reclaims the referenced object. This enables cache implementations that don't prevent collection.

require 'weakref'

class Cache
  def initialize
    @cache = {}
  end
  
  def get(key)
    weak_ref = @cache[key]
    return nil unless weak_ref
    
    begin
      weak_ref.__getobj__
    rescue WeakRef::RefError
      @cache.delete(key)
      nil
    end
  end
  
  def set(key, value)
    @cache[key] = WeakRef.new(value)
  end
end

# Usage
cache = Cache.new
obj = "expensive object"
cache.set(:key, obj)

# Object can be collected even though cache references it
obj = nil
GC.start

# Cache returns nil once the object has been collected
cache.get(:key)  # => nil (if GC has reclaimed the object)
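
ObjectSpace::WeakMap from the core runtime offers similar semantics without wrapping each value; note that in older CRuby versions the values are also weakly referenced, so it suits object-to-object associations better than durable caches:

```ruby
# ObjectSpace::WeakMap: entries do not keep their objects alive.
map = ObjectSpace::WeakMap.new
key = Object.new
value = Object.new
map[key] = value
map[key]  # returns value while it is still alive
```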

Finalizers

Finalizers execute code when an object is garbage collected. Ruby implements finalizers through ObjectSpace.define_finalizer. Finalizers must not reference the object being finalized (which would prevent collection).

class Resource
  def initialize(name)
    @name = name
    
    # Finalizer must not reference self
    ObjectSpace.define_finalizer(self, 
      self.class.finalizer(@name))
  end
  
  def self.finalizer(name)
    proc { puts "Resource #{name} finalized" }
  end
end

# Create and discard resource
Resource.new("test")
GC.start  # May run the finalizer (timing is not guaranteed)

Escape Analysis Optimization

While Ruby doesn't perform escape analysis automatically, understanding the concept helps minimize object lifetime. Objects that don't escape method scope could theoretically stack-allocate, but Ruby heap-allocates all objects. Minimizing object creation in hot paths reduces GC pressure.

# Object escapes method scope
def create_and_return
  value = expensive_computation
  value  # Escapes
end

result = create_and_return  # Object must survive GC

# Object doesn't escape
def create_and_use
  value = expensive_computation
  process_locally(value)
  # value doesn't escape, becomes garbage quickly
end

Reducing Object Churn

Object churn refers to high allocation and deallocation rates. Reducing churn improves performance by decreasing GC frequency. Techniques include memoization, caching computed values, and avoiding allocations in loops.

# High churn
def process_items(items)
  items.map do |item|
    result = transform(item)  # Allocates each iteration
    format(result)            # Allocates again
  end
end

# Lower churn
def process_items_efficient(items)
  items.map! do |item|  # Modifies in place when possible
    transform_format(item)  # Combined operation
  end
end
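
Memoization, mentioned above, trades a small amount of retained memory for fewer repeated allocations. A minimal sketch, where `expensive_transform` is a hypothetical stand-in for any allocation-heavy computation:

```ruby
# Memoization: compute once, reuse the same result object afterwards.
CACHE = {}

def expensive_transform(key)
  key.to_s * 10  # placeholder for a costly, allocation-heavy call
end

def cached_transform(key)
  CACHE[key] ||= expensive_transform(key)
end
```

Repeated calls with the same key return the identical object, so no new garbage is produced after the first call.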

Common Pitfalls

Memory Leaks in Garbage Collected Languages

Garbage collection prevents certain memory leaks but not all. Memory leaks occur when objects remain reachable unintentionally. Common sources include:

  • Global collections growing unbounded
  • Event listeners never removed
  • Cache without eviction policy
  • Thread-local storage never cleaned
  • Circular references through closures

# Memory leak: unbounded global collection
$global_cache = {}

def cache_result(key, value)
  $global_cache[key] = value  # Never removed
end

# Fix: implement cache eviction
require 'lru_redux'

$global_cache = LruRedux::Cache.new(1000)

Unintended Object Retention

Objects survive collection when references exist in unexpected places. Closures capture references to surrounding scope. Instance variables hold references for object lifetime. Class variables persist until class unloads.

# Unintended retention through closure
class Handler
  def setup
    large_data = load_large_dataset
    
    # Closure captures large_data
    @callback = proc { |x| process(x, large_data) }
  end
  
  def handle(input)
    @callback.call(input)
    # large_data retained as long as Handler instance exists
  end
end

# Fix: explicitly release data
class Handler
  def setup
    large_data = load_large_dataset
    processed = preprocess(large_data)
    large_data = nil  # Explicit release
    
    @callback = proc { |x| process(x, processed) }
  end
end

Finalizer Pitfalls

Finalizers introduce complexity and potential errors. Finalizer execution timing is unpredictable. Finalizers run in unpredictable order. Finalizers may never run if program exits before GC. Finalizers must not resurrect objects or reference the finalized object.

# Problematic finalizer
class BadFinalizer
  def initialize
    # Captures self reference, prevents collection
    ObjectSpace.define_finalizer(self, proc { cleanup(self) })
  end
end

# Better approach: explicit cleanup
class GoodResource
  def initialize
    @file = open_file
    # Backup finalizer: passes @file, never self
    ObjectSpace.define_finalizer(self, self.class.finalizer(@file))
  end
  
  def close
    @file.close
  end
  
  # Finalizer as backup only
  def self.finalizer(file)
    proc { file.close rescue nil }
  end
end

# Use with explicit cleanup
resource = GoodResource.new
begin
  use_resource(resource)
ensure
  resource.close
end

Assuming Deterministic Collection

Garbage collection timing is non-deterministic. Objects may survive multiple collection cycles. Finalizers may not run promptly. Programs cannot rely on specific collection timing.

# Wrong: assuming immediate collection
def process_file
  file = File.open("data.txt")
  read_data(file)
  file = nil  # Doesn't guarantee immediate finalization
  # File may remain open
end

# Correct: explicit resource management
def process_file
  file = File.open("data.txt")
  begin
    read_data(file)
  ensure
    file.close
  end
end

Stop-the-World Impact Underestimation

GC pauses affect application latency. Long pauses disrupt time-sensitive operations. Real-time systems may miss deadlines during collection. Network services may timeout during pauses.

require 'socket'

# Measure actual pause impact
server = TCPServer.new(3000)

Thread.new do
  loop do
    client = server.accept
    start = Time.now
    
    # Long GC pause affects response time
    handle_request(client)
    
    duration = Time.now - start
    puts "Request took #{duration}s (includes GC)"
    client.close
  end
end

Premature Optimization

Optimizing for GC before measuring impact wastes effort. Profile GC behavior before optimizing. Many perceived GC problems actually stem from other issues. Measure actual pause times and allocation rates before changing code.

# Profile before optimizing
require 'objspace'

GC.stat  # Baseline statistics

# Run workload
result = perform_operation

# Measure allocations
allocations = ObjectSpace.count_objects
puts "Objects: #{allocations[:TOTAL]}"
puts "GC runs: #{GC.count}"

Tools & Ecosystem

GC Statistics

Ruby provides detailed GC statistics through GC.stat. These statistics reveal collection frequency, pause times, heap size, and object counts.

# Comprehensive GC statistics
stats = GC.stat

puts "Collections: #{stats[:count]}"
puts "Major collections: #{stats[:major_gc_count]}"
puts "Minor collections: #{stats[:minor_gc_count]}"
puts "Heap pages: #{stats[:heap_allocated_pages]}"
puts "Live objects: #{stats[:heap_live_slots]}"
puts "Free slots: #{stats[:heap_free_slots]}"
puts "Old objects: #{stats[:old_objects]}"

Memory Profiler Gem

The memory_profiler gem tracks memory allocations and retained objects. It identifies allocation sources, object types, and retention patterns.

require 'memory_profiler'

report = MemoryProfiler.report do
  # Code to profile
  1000.times { |i| "string #{i}" }
end

report.pretty_print

# Output shows:
# - Total allocated memory
# - Total retained memory  
# - Allocated objects by gem/file/location
# - Retained objects by gem/file/location

Derailed Benchmarks

Derailed benchmarks measures memory usage in Rails applications. It identifies memory bloat, allocation sources, and memory leaks.

# Add to Gemfile
gem 'derailed_benchmarks', group: :development

# Measure memory usage
# $ bundle exec derailed bundle:mem

# Identify allocation-heavy requires
# $ bundle exec derailed bundle:objects

# Find memory leaks
# $ bundle exec derailed exec perf:mem_over_time

GC::Profiler

GC::Profiler provides detailed GC timing information. It records individual collection events with timestamps and durations.

GC::Profiler.enable

# Run code to profile
perform_operations

GC::Profiler.report

# Access raw per-collection timing data
GC::Profiler.raw_data.each do |entry|
  puts "GC pause: #{entry[:GC_TIME]}s"
end

GC::Profiler.disable

ObjectSpace

ObjectSpace module provides object introspection and allocation tracking. It counts objects by type, traces allocations, and walks the object graph.

require 'objspace'

ObjectSpace.trace_object_allocations_start

# Code to trace
objects = Array.new(100) { Object.new }

ObjectSpace.trace_object_allocations_stop

# Find allocation sources
objects.each do |obj|
  file = ObjectSpace.allocation_sourcefile(obj)
  line = ObjectSpace.allocation_sourceline(obj)
  puts "Allocated at #{file}:#{line}"
end

# Count objects by type
counts = ObjectSpace.count_objects
puts "Strings: #{counts[:T_STRING]}"
puts "Arrays: #{counts[:T_ARRAY]}"

Heap Dump Analysis

Ruby can dump the entire heap for offline analysis. Tools like heapy analyze heap dumps to identify memory leaks and bloat.

require 'objspace'

# Generate heap dump
File.open('heap.dump', 'w') do |file|
  ObjectSpace.dump_all(output: file)
end

# Analyze with heapy gem
# $ heapy read heap.dump

Rack Mini Profiler

Rack Mini Profiler adds memory profiling to web applications. It displays allocation information for each request, identifying expensive operations.

# Add to Gemfile
gem 'rack-mini-profiler'
gem 'memory_profiler'

# Automatically profiles requests
# Shows allocation counts and retained memory
# Access via query parameter: ?pp=profile-memory

Reference

Collection Algorithms

  • Mark-and-Sweep: marks reachable objects, then reclaims unmarked objects. Simple to implement; fragments memory; stop-the-world.
  • Reference Counting: tracks a reference count per object and frees it when the count reaches zero. Immediate reclamation; cannot handle cycles; overhead on every reference update.
  • Copying Collection: copies live objects to a new space and reclaims the old space. Eliminates fragmentation; requires double the memory; cost proportional to live objects.
  • Generational: segregates objects by age and collects the young generation more frequently. Reduces pause times; exploits the generational hypothesis; requires write barriers.
  • Incremental: divides collection into small steps interleaved with execution. Bounds pause times; higher total overhead; more complex implementation.
  • Concurrent: runs the collector simultaneously with application threads. Minimal pause times; highest complexity; requires synchronization.

Ruby GC Configuration

  • RUBY_GC_HEAP_INIT_SLOTS (default 10000): initial number of object slots
  • RUBY_GC_HEAP_GROWTH_FACTOR (default 1.8): heap size multiplier applied when growing
  • RUBY_GC_HEAP_GROWTH_MAX_SLOTS (default 0, unlimited): maximum slots added per growth
  • RUBY_GC_HEAP_FREE_SLOTS (default 4096): minimum free slots to maintain after collection
  • RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR (default 2.0): old-object growth factor that triggers a major collection
  • RUBY_GC_MALLOC_LIMIT (default 16MB): malloc bytes allowed before triggering a collection
  • RUBY_GC_OLDMALLOC_LIMIT (default 16MB): malloc bytes allowed before triggering a major collection

GC Statistics Keys

  • count: total number of GC runs (integer)
  • major_gc_count: number of major collections (integer)
  • minor_gc_count: number of minor collections (integer)
  • heap_allocated_pages: total heap pages allocated (integer)
  • heap_live_slots: slots containing live objects (integer)
  • heap_free_slots: empty slots available (integer)
  • heap_final_slots: slots holding objects awaiting finalization (integer)
  • total_allocated_objects: objects allocated since process start (integer)
  • total_freed_objects: objects freed since process start (integer)
  • malloc_increase_bytes: bytes allocated via malloc since the last collection (bytes)
  • oldmalloc_increase_bytes: malloc bytes counted toward the major collection trigger (bytes)
  • old_objects: objects promoted to the old generation (integer)

Common GC Methods

  • GC.start: trigger a garbage collection, e.g. GC.start(full_mark: true)
  • GC.count: return the total number of collections
  • GC.stat: return the statistics hash, e.g. GC.stat(:heap_live_slots)
  • GC.disable: disable automatic collection
  • GC.enable: enable automatic collection
  • GC.stress = true: enable stress-testing mode (collect at every opportunity)
  • GC.compact: compact the heap to reduce fragmentation
  • GC.verify_compaction_references: verify reference updates after compaction

ObjectSpace Methods

  • ObjectSpace.count_objects: count objects by internal type (returns Hash)
  • ObjectSpace.each_object: iterate over live objects (returns Enumerator)
  • ObjectSpace.garbage_collect: trigger a collection (returns nil)
  • ObjectSpace.allocation_sourcefile: file where an object was allocated (String)
  • ObjectSpace.allocation_sourceline: line where an object was allocated (Integer)
  • ObjectSpace.allocation_generation: GC generation at allocation time (Integer)
  • ObjectSpace.define_finalizer: register a finalizer (returns Array)
  • ObjectSpace.undefine_finalizer: remove an object's finalizers (returns the object)

Performance Tuning Guidelines

  • High allocation rate: increase heap size and reduce allocations; fewer collections mean lower overhead
  • Long pause times: enable incremental GC and reduce heap size for shorter individual pauses
  • Memory constrained: collect frequently and compact the heap to minimize the footprint
  • High throughput needed: use a larger heap with infrequent collections to maximize application execution time
  • Real-time requirements: tune for consistent pause times and predictable latency
  • Object pooling viable: reuse objects to cut new allocations and their overhead