Overview
Garbage collection automates memory management by identifying and reclaiming memory occupied by objects no longer accessible to a program. The garbage collector tracks object references, determines which objects remain reachable from program execution, and frees memory occupied by unreachable objects.
Manual memory management requires programmers to explicitly allocate and deallocate memory, leading to errors including memory leaks (failing to free unused memory), dangling pointers (accessing freed memory), and double-free bugs (freeing the same memory twice). Garbage collection eliminates these categories of errors by managing the entire lifecycle of memory allocation and deallocation.
The fundamental operation involves two phases: identifying live objects and reclaiming dead objects. A live object is reachable through some chain of references starting from root references, which include global variables, stack frames, and CPU registers. Dead objects have no path of references from any root.
# Objects become garbage when no references remain
def create_temporary
  temp = "This string will be garbage collected"
  # temp goes out of scope when method returns
end
create_temporary
# The string object has no remaining references
# GC will reclaim its memory
Programming languages implementing garbage collection include Ruby, Python, Java, JavaScript, Go, and C#. Languages requiring manual memory management include C, C++, and Rust (which uses ownership rules instead of GC).
Key Principles
Reachability Analysis
Garbage collectors determine object liveness through reachability from root references. Root references form the starting point for tracing object graphs. The collector marks all objects reachable from roots, then reclaims unmarked objects.
Roots include:
- Global variables and constants
- Local variables in active stack frames
- CPU registers holding object references
- Thread-local storage
- JIT-compiled code references
An object becomes garbage when no chain of references connects it to any root. Circular references between objects do not prevent collection if the entire cycle is unreachable from roots.
class Node
  attr_accessor :next

  def initialize(value)
    @value = value
  end
end
# Create circular reference
node1 = Node.new(1)
node2 = Node.new(2)
node1.next = node2
node2.next = node1
# Break reference from root
node1 = nil
node2 = nil
# Both nodes are now garbage despite circular reference
# GC can collect them because no root references exist
Generational Hypothesis
Most objects die young. The generational hypothesis observes that recently allocated objects become garbage more frequently than long-lived objects. Generational collectors exploit this pattern by segregating objects by age and collecting young objects more frequently.
A typical generational scheme uses multiple generations:
- Young generation: newly allocated objects
- Old generation: objects surviving multiple collections
- Permanent generation: class metadata and constants (in some implementations)
Promotion moves objects from younger to older generations after surviving collections. Minor collections scan only young generations, running frequently with low overhead. Major collections scan all generations, running less frequently but with higher overhead.
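In Ruby, the minor/major split is visible directly in GC.stat. The sketch below compares collection counts across a churn-heavy workload (exact counts vary with Ruby version and heap state):

```ruby
# Minor collections should dominate under a churn-heavy workload,
# because short-lived objects die in the young generation.
before_minor = GC.stat(:minor_gc_count)
before_major = GC.stat(:major_gc_count)

200_000.times { Object.new } # short-lived allocations

minors = GC.stat(:minor_gc_count) - before_minor
majors = GC.stat(:major_gc_count) - before_major
puts "minor: #{minors}, major: #{majors}"
```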
Write Barriers
Generational collection requires tracking references from old objects to young objects. Write barriers intercept pointer updates to maintain this information. When an old object receives a reference to a young object, the write barrier records this cross-generational reference.
Without write barriers, collecting only the young generation risks collecting live young objects referenced solely from old objects. The write barrier maintains a remembered set of old-to-young references, treated as additional roots during minor collections.
Stop-the-World vs Concurrent Collection
Stop-the-world collection pauses all application threads during garbage collection. This ensures objects remain stationary while the collector examines the heap, simplifying collector implementation but introducing latency spikes.
Concurrent collection runs simultaneously with application threads, reducing pause times at the cost of increased complexity. The collector must handle objects being modified during collection, requiring write barriers and additional bookkeeping.
Incremental collection divides collection work into smaller units, interleaving brief collection phases with application execution. This bounds maximum pause times but may increase total collection overhead.
Conservative vs Precise Collection
Conservative collectors treat any bit pattern that could represent a memory address as a potential reference. This allows collecting languages without explicit type information but prevents moving objects (the apparent reference might not be real) and can cause memory leaks (data values mistaken for references).
Precise collectors know exactly which values are references, enabled by type information from the runtime or compiler. Precise collection allows compacting collectors to move objects and guarantees collecting all garbage.
Ruby Implementation
Ruby uses a mark-and-sweep garbage collector with generational collection capabilities. The implementation combines multiple collection strategies to balance throughput and pause times.
Mark-and-Sweep Algorithm
Ruby's primary collection algorithm marks reachable objects, then sweeps through memory to reclaim unmarked objects. The marking phase traverses the object graph from roots, setting a mark bit on each reachable object. The sweep phase iterates through all allocated memory, freeing objects without mark bits.
# Objects are marked during collection
class Container
  def initialize
    @objects = Array.new(1000) { |i| "Object #{i}" }
  end
end
# Create objects
container = Container.new
# Force garbage collection
GC.start
# container and its objects are marked as reachable
# Other unreferenced objects are collected
Generational Collection
Ruby's generational collector (RGenGC, introduced in Ruby 2.1) uses two generations: young and old. New objects are allocated in the young generation. Objects surviving three minor collections are promoted to the old generation and examined only during major collections. Long-lived data such as frozen string literals and class structures typically settles in the old generation.
# Control generational collection behavior
GC.start(full_mark: false) # Minor collection (young generation)
GC.start(full_mark: true) # Major collection (all generations)
# Check an object's allocation generation (needs the objspace extension)
require 'objspace'
ObjectSpace.trace_object_allocations_start
new_obj = "new string"
generation = ObjectSpace.allocation_generation(new_obj)
ObjectSpace.trace_object_allocations_stop
Tri-color Marking
Ruby uses tri-color marking to implement incremental collection. Objects transition through three colors:
- White: not yet examined
- Gray: examined but references not yet scanned
- Black: examined with all references scanned
Collection proceeds by processing gray objects, marking their references gray, and coloring the processed object black. When no gray objects remain, white objects are garbage.
# Observe marking behavior through stats
before = GC.stat(:total_allocated_objects)
10000.times { Object.new }
after = GC.stat(:total_allocated_objects)
puts "Allocated: #{after - before} objects"
puts "Heap pages: #{GC.stat(:heap_allocated_pages)}"
puts "Old objects: #{GC.stat(:old_objects)}"
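The tri-color invariant can also be sketched as a toy worklist algorithm. This is an illustrative model only, not CRuby's actual implementation; HeapObject and tricolor_mark are hypothetical names:

```ruby
# Toy tri-color mark phase: process gray objects until none remain,
# then anything still white is garbage.
class HeapObject
  attr_accessor :refs
  def initialize
    @refs = []
  end
end

def tricolor_mark(roots)
  color = Hash.new(:white).compare_by_identity # every object starts white
  gray = roots.dup
  gray.each { |obj| color[obj] = :gray }       # roots enter the gray set
  until gray.empty?
    obj = gray.pop
    obj.refs.each do |ref|                     # scan outgoing references
      if color[ref] == :white
        color[ref] = :gray
        gray << ref
      end
    end
    color[obj] = :black                        # fully scanned
  end
  color
end

a = HeapObject.new
b = HeapObject.new
b.refs << a
orphan = HeapObject.new

colors = tricolor_mark([b])
puts colors[a]      # => black (reachable through b)
puts colors[orphan] # => white (garbage)
```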
Write Barriers
Ruby implements write barriers to track old-to-young references. When an old object receives a reference to a young object, the write barrier marks the old object, adding it to the remembered set for minor collections.
# Write barrier activates when old object references young object
old_array = []
3.times { GC.start } # surviving three minor collections promotes old_array to the old generation
young_object = "newly created"
old_array << young_object # Write barrier triggers here
GC Tuning Variables
Ruby exposes environment variables for tuning garbage collection:
# Inspect compile-time GC constants
GC::INTERNAL_CONSTANTS
# => {:RVALUE_SIZE=>40, :HEAP_PAGE_OBJ_LIMIT=>408, ...}
# Malloc accounting (defined only in builds with CALC_EXACT_MALLOC_SIZE)
GC.malloc_allocated_size
GC.malloc_allocations
# Heap-tuning variables are read once at interpreter startup, so set them
# in the shell before launching Ruby; assigning to ENV at runtime has no effect:
# $ RUBY_GC_HEAP_INIT_SLOTS=100000 RUBY_GC_HEAP_GROWTH_FACTOR=1.1 ruby app.rb
Compaction
Recent Ruby versions support heap compaction to reduce memory fragmentation. Compaction moves objects to consolidate free memory, updating all references to moved objects.
# Manual compaction
GC.compact
# Auto-compaction during major GC
GC.auto_compact = true
# Check compaction statistics
stats = GC.stat
puts "Compactions run: #{stats[:compact_count]}"
# Per-run detail (considered/moved object counts) via GC.latest_compact_info
Performance Considerations
Collection Frequency
Frequent collections reduce memory usage but increase CPU overhead. Infrequent collections allow memory to grow but reduce collection overhead. The optimal frequency depends on allocation rate, heap size, and acceptable pause times.
# Monitor collection frequency
before_count = GC.count
sleep 1
after_count = GC.count
puts "Collections per second: #{after_count - before_count}"
# Inspect compile-time GC options baked into this Ruby build
GC::OPTS
Pause Time Impact
Stop-the-world collections pause application execution. Pause duration correlates with heap size and number of live objects. Large heaps with many live objects produce longer pauses.
Minor collections examine only young generation, producing shorter pauses. Major collections examine all generations, producing longer pauses proportional to total heap size.
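The difference is measurable directly, since GC.start accepts full_mark: to force a minor or a major collection. A sketch (pause ratios depend on heap contents, so no particular numbers are guaranteed):

```ruby
require 'benchmark'

# Populate the heap so the pause difference is visible
retained = Array.new(500_000) { Object.new }

minor = Benchmark.realtime { GC.start(full_mark: false) }
major = Benchmark.realtime { GC.start(full_mark: true) }
puts format("minor: %.4fs, major: %.4fs", minor, major)
```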
require 'benchmark'
# Measure GC pause time
result = Benchmark.measure do
GC.start
end
puts "GC pause: #{result.real} seconds"
# Minimize pause times by keeping heap smaller
# and promoting fewer objects to old generation
Allocation Pressure
High allocation rates increase collection frequency. Reducing allocation pressure decreases GC overhead. Object pooling, object reuse, and avoiding temporary objects reduce allocations.
# High allocation pressure
def inefficient
  1000.times do
    temp = "temporary string" # Allocates new object each iteration
    process(temp)
  end
end

# Lower allocation pressure
def efficient
  temp = "reused string"
  1000.times do
    temp.replace("new content") # Reuses same object
    process(temp)
  end
end
Memory Fragmentation
Fragmentation occurs when free memory scatters across non-contiguous regions. Ruby's heap consists of fixed-size pages. Objects of different sizes can cause fragmentation within pages, reducing effective memory utilization.
Compaction addresses fragmentation by moving objects to consolidate free space. However, compaction itself incurs overhead during the compaction phase.
Write Barrier Overhead
Generational collection imposes write barrier overhead on reference updates. Each pointer assignment checks whether it creates an old-to-young reference. This overhead is typically small but measurable in write-heavy workloads.
# Write barrier overhead in hot path
require 'benchmark'
array = Array.new(1000)
3.times { GC.start } # survive enough collections to be promoted old
# Each append triggers the write barrier
Benchmark.bmbm do |x|
  x.report("append") do
    100_000.times { array << Object.new }
  end
end
GC-Aware Programming
Understanding GC behavior enables performance optimization. Patterns include:
- Reusing objects instead of allocating new ones
- Reducing object lifetime variance (objects die together)
- Avoiding unexpected object retention
- Breaking circular references explicitly when possible
- Using frozen objects for constants
# GC-aware pattern: object pooling
class ObjectPool
  def initialize
    @pool = []
  end

  def acquire
    @pool.pop || create_new
  end

  def release(obj)
    obj.reset
    @pool.push(obj)
  end

  private

  def create_new
    PooledObject.new
  end
end
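The last pattern in the list, frozen objects for constants, leans on Ruby's frozen-string deduplication: String#-@ returns an interned, frozen copy, so repeated uses share a single object. A small illustration:

```ruby
# Interned frozen strings are shared, costing no allocation per use.
a = -"content-type"
b = -"content-type"
puts a.frozen?    # => true
puts a.equal?(b)  # => true: one shared object

# dup forces a fresh, separate allocation each time
c = "content-type".dup
d = "content-type".dup
puts c.equal?(d)  # => false
```

The frozen_string_literal: true magic comment achieves the same effect for every literal in a file.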
Common Patterns
Object Pooling
Object pooling reuses objects instead of allocating new ones. This reduces allocation rate and GC pressure. The pool maintains a collection of reusable objects, lending them to clients and reclaiming them after use.
class ConnectionPool
  def initialize(size)
    @max_size = size
    @pool = size.times.map { create_connection }
    @mutex = Mutex.new
  end

  def acquire
    @mutex.synchronize do
      @pool.pop || create_connection
    end
  end

  def release(connection)
    @mutex.synchronize do
      @pool.push(connection) if @pool.size < @max_size
    end
  end

  private

  def create_connection
    Connection.new
  end
end
Weak References
Weak references allow referencing objects without preventing collection. The weak reference becomes nil when the garbage collector reclaims the referenced object. This enables cache implementations that don't prevent collection.
require 'weakref'
class Cache
  def initialize
    @cache = {}
  end

  def get(key)
    weak_ref = @cache[key]
    return nil unless weak_ref
    begin
      weak_ref.__getobj__
    rescue WeakRef::RefError
      @cache.delete(key)
      nil
    end
  end

  def set(key, value)
    @cache[key] = WeakRef.new(value)
  end
end
# Usage
cache = Cache.new
obj = "expensive object"
cache.set(:key, obj)
# Object can be collected even though cache references it
obj = nil
GC.start
# Once the string has been collected, the cache lookup returns nil
cache.get(:key) # => nil after collection (exact timing not guaranteed)
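Ruby's core library also ships ObjectSpace::WeakMap, which WeakRef is built on. It holds entries weakly (the precise key semantics have shifted slightly across Ruby versions), avoiding the per-lookup rescue:

```ruby
# ObjectSpace::WeakMap entries are held weakly: once the value object
# becomes unreachable elsewhere, the entry can vanish at the next GC.
map = ObjectSpace::WeakMap.new
key = Object.new
val = Object.new
map[key] = val

puts map[key].equal?(val) # => true while val is strongly referenced

val = nil
GC.start
# After collection the entry may be gone; lookup then simply returns nil.
# (Collection timing is non-deterministic, so don't rely on when.)
puts map[key].inspect
```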
Finalizers
Finalizers execute code when an object is garbage collected. Ruby implements finalizers through ObjectSpace.define_finalizer. Finalizers must not reference the object being finalized (which would prevent collection).
class Resource
  def initialize(name)
    @name = name
    # Finalizer must not reference self
    ObjectSpace.define_finalizer(self, self.class.finalizer(@name))
  end

  def self.finalizer(name)
    proc { puts "Resource #{name} finalized" }
  end
end
# Create and discard resource
Resource.new("test")
GC.start # May run the finalizer; exact timing is not guaranteed
Escape Analysis Optimization
While Ruby doesn't perform escape analysis automatically, understanding the concept helps minimize object lifetime. Objects that don't escape method scope could theoretically stack-allocate, but Ruby heap-allocates all objects. Minimizing object creation in hot paths reduces GC pressure.
# Object escapes method scope
def create_and_return
  value = expensive_computation
  value # Escapes
end

result = create_and_return # Object must survive GC

# Object doesn't escape
def create_and_use
  value = expensive_computation
  process_locally(value)
  # value doesn't escape, becomes garbage quickly
end
Reducing Object Churn
Object churn refers to high allocation and deallocation rates. Reducing churn improves performance by decreasing GC frequency. Techniques include memoization, caching computed values, and avoiding allocations in loops.
# High churn
def process_items(items)
  items.map do |item|
    result = transform(item) # Allocates each iteration
    format(result)           # Allocates again
  end
end

# Lower churn
def process_items_efficient(items)
  items.map! do |item|     # Replaces elements in the existing array
    transform_format(item) # Combined operation
  end
end
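Memoization, mentioned above, is the simplest churn reducer: compute a value once, then return the same cached object on every later call. The ||= idiom is the conventional form (Report and its header are illustrative names):

```ruby
class Report
  # The header string is built on the first call only; later calls
  # return the cached object with no new allocations.
  def header
    @header ||= ["id", "name", "total"].join(" | ")
  end
end

r = Report.new
first = r.header
second = r.header
puts first.equal?(second) # => true: the same object is reused
```

Note that ||= recomputes if the memoized value is nil or false; use defined?(@header) for values that can legitimately be falsy.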
Common Pitfalls
Memory Leaks in Garbage Collected Languages
Garbage collection prevents certain memory leaks but not all. Memory leaks occur when objects remain reachable unintentionally. Common sources include:
- Global collections growing unbounded
- Event listeners never removed
- Cache without eviction policy
- Thread-local storage never cleaned
- Circular references through closures
# Memory leak: unbounded global collection
$global_cache = {}

def cache_result(key, value)
  $global_cache[key] = value # Never removed
end

# Fix: implement cache eviction (lru_redux gem)
require 'lru_redux'
$global_cache = LruRedux::Cache.new(1000)
Unintended Object Retention
Objects survive collection when references exist in unexpected places. Closures capture references to surrounding scope. Instance variables hold references for object lifetime. Class variables persist until class unloads.
# Unintended retention through closure
class Handler
  def setup
    large_data = load_large_dataset
    # Closure captures large_data
    @callback = proc { |x| process(x, large_data) }
  end

  def handle(input)
    @callback.call(input)
    # large_data retained as long as Handler instance exists
  end
end

# Fix: explicitly release data
class Handler
  def setup
    large_data = load_large_dataset
    processed = preprocess(large_data)
    large_data = nil # Explicit release
    @callback = proc { |x| process(x, processed) }
  end
end
Finalizer Pitfalls
Finalizers introduce complexity and potential errors. Finalizer execution timing is unpredictable. Finalizers run in unpredictable order. Finalizers may never run if program exits before GC. Finalizers must not resurrect objects or reference the finalized object.
# Problematic finalizer
class BadFinalizer
  def initialize
    # Captures self reference, prevents collection
    ObjectSpace.define_finalizer(self, proc { cleanup(self) })
  end
end
# Better approach: explicit cleanup with a finalizer as backup
class GoodResource
  def initialize
    @file = open_file
    # Register the backup finalizer; it must not capture self
    ObjectSpace.define_finalizer(self, self.class.finalizer(@file))
  end

  def close
    @file.close
  end

  def self.finalizer(file)
    proc { file.close rescue nil }
  end
end
# Use with explicit cleanup
resource = GoodResource.new
begin
  use_resource(resource)
ensure
  resource.close
end
Assuming Deterministic Collection
Garbage collection timing is non-deterministic. Objects may survive multiple collection cycles. Finalizers may not run promptly. Programs cannot rely on specific collection timing.
# Wrong: assuming immediate collection
def process_file
  file = File.open("data.txt")
  read_data(file)
  file = nil # Doesn't guarantee immediate finalization
  # File may remain open
end

# Correct: explicit resource management
def process_file
  file = File.open("data.txt")
  begin
    read_data(file)
  ensure
    file.close
  end
end
Stop-the-World Impact Underestimation
GC pauses affect application latency. Long pauses disrupt time-sensitive operations. Real-time systems may miss deadlines during collection. Network services may timeout during pauses.
# Measure actual pause impact
server = TCPServer.new(3000)
Thread.new do
  loop do
    client = server.accept
    start = Time.now
    # Long GC pause affects response time
    handle_request(client)
    duration = Time.now - start
    puts "Request took #{duration}s (includes GC)"
    client.close
  end
end
Premature Optimization
Optimizing for GC before measuring impact wastes effort. Profile GC behavior before optimizing. Many perceived GC problems actually stem from other issues. Measure actual pause times and allocation rates before changing code.
# Profile before optimizing
require 'objspace'
GC.stat # Baseline statistics
# Run workload
result = perform_operation
# Measure allocations
allocations = ObjectSpace.count_objects
puts "Objects: #{allocations[:TOTAL]}"
puts "GC runs: #{GC.count}"
Tools & Ecosystem
GC Statistics
Ruby provides detailed GC statistics through GC.stat. These statistics reveal collection counts, heap size, and object counts; recent Ruby versions also expose cumulative GC time under the :time key.
# Comprehensive GC statistics
stats = GC.stat
puts "Collections: #{stats[:count]}"
puts "Major collections: #{stats[:major_gc_count]}"
puts "Minor collections: #{stats[:minor_gc_count]}"
puts "Heap pages: #{stats[:heap_allocated_pages]}"
puts "Live objects: #{stats[:heap_live_slots]}"
puts "Free slots: #{stats[:heap_free_slots]}"
puts "Old objects: #{stats[:old_objects]}"
Memory Profiler Gem
The memory_profiler gem tracks memory allocations and retained objects. It identifies allocation sources, object types, and retention patterns.
require 'memory_profiler'
report = MemoryProfiler.report do
  # Code to profile
  1000.times { |i| "string #{i}" }
end
report.pretty_print
# Output shows:
# - Total allocated memory
# - Total retained memory
# - Allocated objects by gem/file/location
# - Retained objects by gem/file/location
Derailed Benchmarks
The derailed_benchmarks gem measures memory usage in Rails applications. It identifies memory bloat, allocation sources, and memory leaks.
# Add to Gemfile
gem 'derailed_benchmarks', group: :development
# Measure memory usage at boot time
# $ bundle exec derailed bundle:mem
# Identify allocation sources at runtime
# $ bundle exec derailed exec perf:objects
# Find memory leaks
# $ bundle exec derailed exec perf:mem_over_time
GC::Profiler
GC::Profiler provides detailed GC timing information. It records individual collection events with timestamps and durations.
GC::Profiler.enable
# Run code to profile
perform_operations
GC::Profiler.report
# Per-collection data (GC::Profiler.result returns a formatted report
# string; raw_data returns an array of hashes)
GC::Profiler.raw_data.each do |entry|
  puts "GC: #{entry[:GC_TIME]}s"
end
GC::Profiler.disable
ObjectSpace
ObjectSpace module provides object introspection and allocation tracking. It counts objects by type, traces allocations, and walks the object graph.
require 'objspace' # allocation tracing lives in the objspace extension
ObjectSpace.trace_object_allocations_start
# Code to trace
objects = Array.new(100) { Object.new }
ObjectSpace.trace_object_allocations_stop
# Find allocation sources
objects.each do |obj|
  file = ObjectSpace.allocation_sourcefile(obj)
  line = ObjectSpace.allocation_sourceline(obj)
  puts "Allocated at #{file}:#{line}"
end
# Count objects by type
counts = ObjectSpace.count_objects
puts "Strings: #{counts[:T_STRING]}"
puts "Arrays: #{counts[:T_ARRAY]}"
Heap Dump Analysis
Ruby can dump the entire heap for offline analysis. Tools like heapy analyze heap dumps to identify memory leaks and bloat.
require 'objspace'
# Generate heap dump
File.open('heap.dump', 'w') do |file|
  ObjectSpace.dump_all(output: file)
end
# Analyze with heapy gem
# $ heapy read heap.dump
Rack Mini Profiler
Rack Mini Profiler adds memory profiling to web applications. It displays allocation information for each request, identifying expensive operations.
# Add to Gemfile
gem 'rack-mini-profiler'
gem 'memory_profiler'
# Automatically profiles requests
# Shows allocation counts and retained memory
# Access via query parameter: ?pp=profile-memory
Reference
Collection Algorithms
| Algorithm | Description | Characteristics |
|---|---|---|
| Mark-and-Sweep | Marks reachable objects then reclaims unmarked objects | Simple implementation, fragments memory, stop-the-world |
| Reference Counting | Tracks reference count per object, frees when count reaches zero | Immediate reclamation, cannot handle cycles, reference update overhead |
| Copying Collection | Copies live objects to new space, reclaims old space | Eliminates fragmentation, requires double memory, proportional to live objects |
| Generational | Segregates objects by age, collects young generation more frequently | Reduces pause times, exploits generational hypothesis, requires write barriers |
| Incremental | Divides collection into small steps interleaved with execution | Bounds pause times, increased overhead, more complex implementation |
| Concurrent | Runs collector simultaneously with application threads | Minimal pause times, highest complexity, requires synchronization |
Ruby GC Configuration
| Parameter | Environment Variable | Default | Purpose |
|---|---|---|---|
| Initial heap slots | RUBY_GC_HEAP_INIT_SLOTS | 10000 | Initial number of object slots |
| Heap growth factor | RUBY_GC_HEAP_GROWTH_FACTOR | 1.8 | Heap size multiplier after collection |
| Maximum growth slots | RUBY_GC_HEAP_GROWTH_MAX_SLOTS | 0 (unlimited) | Maximum slots to add per growth |
| Minimum free slots | RUBY_GC_HEAP_FREE_SLOTS | 4096 | Minimum number of free slots to maintain after GC |
| Old object factor | RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR | 2.0 | Old object threshold multiplier |
| Malloc threshold | RUBY_GC_MALLOC_LIMIT | 16MB | Malloc bytes before collection |
| Old malloc threshold | RUBY_GC_OLDMALLOC_LIMIT | 16MB | Old object malloc threshold |
GC Statistics Keys
| Statistic | Description | Unit |
|---|---|---|
| count | Total number of GC runs | integer |
| major_gc_count | Number of major collections | integer |
| minor_gc_count | Number of minor collections | integer |
| heap_allocated_pages | Total heap pages allocated | integer |
| heap_live_slots | Slots containing live objects | integer |
| heap_free_slots | Empty slots available | integer |
| heap_final_slots | Slots containing finalizable objects | integer |
| total_allocated_objects | Objects allocated since start | integer |
| total_freed_objects | Objects freed since start | integer |
| malloc_increase_bytes | Malloc bytes since last GC | bytes |
| oldmalloc_increase_bytes | Old object malloc bytes | bytes |
| old_objects | Objects promoted to old generation | integer |
Common GC Methods
| Method | Purpose | Example |
|---|---|---|
| GC.start | Trigger garbage collection | GC.start(full_mark: true) |
| GC.count | Return number of collections | GC.count |
| GC.stat | Return statistics hash | GC.stat(:heap_live_slots) |
| GC.disable | Disable automatic collection | GC.disable |
| GC.enable | Enable automatic collection | GC.enable |
| GC.stress | Enable stress testing mode | GC.stress = true |
| GC.compact | Compact heap to reduce fragmentation | GC.compact |
| GC.verify_compaction_references | Verify compaction correctness | GC.verify_compaction_references |
ObjectSpace Methods
| Method | Purpose | Return Type |
|---|---|---|
| ObjectSpace.count_objects | Count objects by type | Hash |
| ObjectSpace.each_object | Iterate over objects | Enumerator |
| ObjectSpace.garbage_collect | Trigger collection | nil |
| ObjectSpace.allocation_sourcefile | Get allocation file | String |
| ObjectSpace.allocation_sourceline | Get allocation line number | Integer |
| ObjectSpace.allocation_generation | Get object generation | Integer |
| ObjectSpace.define_finalizer | Register finalizer | Array |
| ObjectSpace.undefine_finalizer | Remove finalizer | Object |
Performance Tuning Guidelines
| Scenario | Recommendation | Rationale |
|---|---|---|
| High allocation rate | Increase heap size, reduce allocations | Fewer collections, lower overhead |
| Long pause times | Enable incremental GC, reduce heap size | Shorter individual pauses |
| Memory constrained | Frequent collections, compact heap | Minimize memory footprint |
| High throughput needed | Larger heap, infrequent collections | Maximize application execution time |
| Real-time requirements | Tune for consistent pause times | Predictable latency |
| Object pooling viable | Reuse objects, reduce new allocations | Eliminate allocation overhead |