Overview
Ruby 3.4 introduces Modular GC, a standardized interface that allows alternative garbage collector implementations to be loaded dynamically at runtime through shared libraries. This system separates Ruby's existing garbage collection code into distinct integration and implementation layers, enabling developers to override GC behavior without recompiling or relinking Ruby binaries.
The feature uses dlopen to load shared GC libraries at runtime, creating function pointer mappings to implementations provided by the shared library. When no shared library is configured, Ruby falls back to statically compiled default GC implementations. The system maintains strict security controls by only loading GC libraries from a predefined directory path specified at build time.
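The loading behavior described here can be pictured with a small model. The sketch below is not Ruby's actual loader, which is C code built on dlopen. It only illustrates a function table that is filled from a library when one is configured and from the built-in collector otherwise. `DEFAULT_GC_FUNCTIONS`, `resolve_gc_functions`, and the two function names are hypothetical, and the rule that an incomplete library is rejected is an assumption of the model.

```ruby
# Conceptual model only: the real loader is C code that uses dlopen.
# All names below are hypothetical and exist purely for illustration.
DEFAULT_GC_FUNCTIONS = {
  name:    -> { "default" },
  collect: -> { :default_collect }
}.freeze

# `library` stands in for the functions a shared GC library exports;
# nil means no library was configured.
def resolve_gc_functions(library)
  return DEFAULT_GC_FUNCTIONS if library.nil?

  # Build one table entry per expected function. This model assumes a
  # library that lacks a function is rejected rather than partially used.
  DEFAULT_GC_FUNCTIONS.keys.to_h do |function|
    implementation = library.fetch(function) do
      raise ArgumentError, "library does not provide #{function}"
    end
    [function, implementation]
  end
end

static_table = resolve_gc_functions(nil)
puts static_table[:name].call

custom_library = { name: -> { "custom" }, collect: -> { :custom_collect } }
loaded_table = resolve_gc_functions(custom_library)
puts loaded_table[:name].call
```

Callers only ever go through the table, which is why swapping the collector does not require relinking the rest of the interpreter.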
Ruby provides two GC implementations out of the box: the traditional default collector and an experimental MMTk-based implementation. The MMTk integration leverages the Memory Management Toolkit, a language-agnostic library providing sophisticated memory management building blocks that can be combined to produce complete GC strategies.
# Check current GC implementation
puts GC.config[:implementation]
# => "default"
# Verify modular GC support in Ruby description
puts RUBY_DESCRIPTION
# => "ruby 3.4.0dev (2024-12-06T12:47:35Z master 78614ee900) +PRISM +GC [arm64-darwin24]"
The core API revolves around GC.config, which provides implementation-agnostic access to garbage collector configuration parameters. Each GC implementation documents its own configuration keys, maintaining isolation between different collector strategies.
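Because the available keys depend on the loaded collector, code that inspects the configuration should not hard-code them. The following is a minimal sketch of one way to separate the implementation name from the implementation-specific settings. `gc_config_summary` is a hypothetical helper, and the fallback for Ruby versions without GC.config is an assumption.

```ruby
# gc_config_summary is a hypothetical helper, not part of Ruby's GC API.
def gc_config_summary
  # GC.config was added in Ruby 3.4; older versions only ship the built-in GC.
  unless GC.respond_to?(:config)
    return { implementation: "default", tunables: {} }
  end

  config = GC.config
  {
    implementation: config[:implementation],
    # Everything except the :implementation key is a setting documented
    # by the loaded implementation.
    tunables: config.reject { |key, _value| key == :implementation }
  }
end

summary = gc_config_summary
puts "Implementation: #{summary[:implementation]}"
summary[:tunables].each { |key, value| puts "  #{key} = #{value.inspect}" }
```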
Basic Usage
Enabling Modular GC requires configuring Ruby at build time with the --with-modular-gc flag, specifying a directory path where Ruby will load GC libraries. This directory restriction serves as a security measure to control filesystem locations for code loading.
# Configure Ruby with modular GC support
./configure --with-modular-gc=$HOME/ruby-mod-gc
make -j
make install
The default garbage collector can be built as a modular library using the dedicated make target. This allows testing configuration changes or fixes without rebuilding entire Ruby binaries.
# Build default GC as modular library
make modular-gc MODULAR_GC=default
# Build MMTk-based GC (requires Rust)
make modular-gc MODULAR_GC=mmtk
Loading GC libraries occurs through the RUBY_GC_LIBRARY environment variable. Libraries must follow the naming convention librubygc.{name}.{extension} where the extension matches the platform's shared object format (.so on Linux, .bundle on macOS).
# Load default GC as library
# RUBY_GC_LIBRARY=default ruby script.rb
# Verify loaded implementation
GC.config[:implementation]
# => "default"
# Access implementation-specific configuration
config = GC.config
puts config.inspect
# => {implementation: "default", rgengc_allow_full_mark: true}
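The naming convention can be expressed as a one-line helper. This is a sketch that assumes the library extension matches `RbConfig::CONFIG["DLEXT"]` for the running Ruby, which agrees with the .so and .bundle examples given here but is not confirmed by the source. `gc_library_filename` is a hypothetical name.

```ruby
require "rbconfig"

# gc_library_filename is a hypothetical helper, not part of Ruby.
# The default extension is an assumption: DLEXT is "so" on Linux
# and "bundle" on macOS.
def gc_library_filename(name, dlext: RbConfig::CONFIG["DLEXT"])
  "librubygc.#{name}.#{dlext}"
end

# File name expected for RUBY_GC_LIBRARY=default on this platform
puts gc_library_filename("default")

# Explicit extension, for example when preparing files for a Linux host
puts gc_library_filename("mmtk", dlext: "so")
```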
The GC.config method serves dual purposes: retrieving current configuration as a Hash with Symbol keys, or updating configuration by passing a Hash argument. Configuration keys missing from the passed Hash remain unmodified.
# Get current configuration
current_config = GC.config
puts current_config[:rgengc_allow_full_mark]
# => true
# Update specific configuration parameters
GC.config(rgengc_allow_full_mark: false)
# Verify configuration change
puts GC.config[:rgengc_allow_full_mark]
# => false
Invalid configuration keys don't raise errors but return nil values in the resulting Hash, maintaining system stability while providing feedback about unsupported parameters.
# Attempt to set invalid configuration
result = GC.config(nonexistent_key: "value")
puts result[:nonexistent_key]
# => nil
Advanced Usage
Ruby's default GC implementation supports the rgengc_allow_full_mark parameter, which controls whether the garbage collector can run full marking cycles covering both young and old objects. When set to false, only minor marking runs, affecting young objects exclusively.
# Disable major GC cycles for performance testing
GC.config(rgengc_allow_full_mark: false)
# Monitor GC behavior changes
before_stats = GC.stat
GC.start(full_mark: false) # Forces minor collection only
after_stats = GC.stat
puts "Minor GC count increased: #{after_stats[:minor_gc_count] - before_stats[:minor_gc_count]}"
puts "Major GC count unchanged: #{after_stats[:major_gc_count] - before_stats[:major_gc_count]}"
When rgengc_allow_full_mark is disabled, heap exhaustion triggers immediate page allocation rather than full marking cycles. A flag indicates when full marking becomes necessary, accessible through GC.latest_gc_info(:need_major_by).
class GCMonitor
def self.monitor_gc_pressure
# Disable major GC for controlled testing
GC.config(rgengc_allow_full_mark: false)
# Allocate objects to create memory pressure
1000.times { |i| Array.new(1000) { "object_#{i}" } }
gc_info = GC.latest_gc_info
if gc_info[:need_major_by]
puts "Major GC required due to: #{gc_info[:need_major_by]}"
# Manually trigger major collection
GC.start(full_mark: true)
puts "Major GC completed"
end
# Re-enable normal GC behavior
GC.config(rgengc_allow_full_mark: true)
end
end
GCMonitor.monitor_gc_pressure
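The GCMonitor example re-enables full marking only when every step before it succeeds. If the workload raises, the process is left with major collections disabled. A block helper with ensure avoids that. This is a minimal sketch: `with_minor_gc_only` is a hypothetical name, and skipping the toggle when the key is absent is an assumption about how other implementations behave.

```ruby
# with_minor_gc_only is a hypothetical helper, not part of Ruby's GC API.
def with_minor_gc_only
  # GC.config was added in Ruby 3.4; on older versions just run the block.
  return yield unless GC.respond_to?(:config)

  previous = GC.config[:rgengc_allow_full_mark]
  # Assumption: an implementation that does not define the key reports nil,
  # in which case there is nothing to toggle.
  return yield if previous.nil?

  GC.config(rgengc_allow_full_mark: false)
  begin
    yield
  ensure
    # Restore the earlier value even if the block raised.
    GC.config(rgengc_allow_full_mark: previous)
  end
end

result = with_minor_gc_only do
  Array.new(10_000) { |i| "item_#{i}" }.size
end
puts result
```

The helper returns the block's value, so it can wrap an existing computation without restructuring the caller.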
The MMTk implementation provides sophisticated configuration through environment variables, supporting different heap modes and GC algorithms. The Dynamic heap mode allows growth between fixed bounds, while Fixed mode pre-allocates unchangeable heap sizes.
# MMTk configuration demonstration
def configure_mmtk_gc
# Environment variables must be set before Ruby starts
# MMTK_HEAP_MODE=Dynamic
# MMTK_HEAP_MIN=1048576 # 1MB minimum
# MMTK_HEAP_MAX=30923764531 # ~30GB maximum
# MMTK_PLAN=MarkSweep
# MMTK_THREADS=4
# Verify MMTk configuration
if GC.config[:implementation] == "mmtk"
puts "Running MMTk GC with MarkSweep plan"
# Monitor MMTk performance
start_time = Time.now
GC.start
gc_time = Time.now - start_time
puts "MMTk GC cycle completed in #{gc_time * 1000}ms"
else
puts "MMTk GC not loaded"
end
end
configure_mmtk_gc
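Since these variables are read once at startup, a typo is easy to miss. The sketch below checks a set of values against the options listed in the Reference section before a process is launched. `mmtk_env_problems` is a hypothetical helper. It assumes heap sizes are plain integers in bytes and that the allowed modes and plans are exactly those listed in this document.

```ruby
# Allowed values as listed in the Reference section of this document.
VALID_HEAP_MODES = %w[Dynamic Fixed].freeze
VALID_PLANS = %w[MarkSweep NoGC].freeze

# mmtk_env_problems is a hypothetical helper. `env` can be ENV or any Hash
# of String keys and values. Returns a list of human-readable problems.
def mmtk_env_problems(env)
  problems = []

  mode = env["MMTK_HEAP_MODE"]
  if mode && !VALID_HEAP_MODES.include?(mode)
    problems << "MMTK_HEAP_MODE must be one of #{VALID_HEAP_MODES.join(', ')}"
  end

  plan = env["MMTK_PLAN"]
  if plan && !VALID_PLANS.include?(plan)
    problems << "MMTK_PLAN must be one of #{VALID_PLANS.join(', ')}"
  end

  # Parse both sizes, recording a problem for anything that is not an integer.
  sizes = %w[MMTK_HEAP_MIN MMTK_HEAP_MAX].to_h do |name|
    raw = env[name]
    size = raw && Integer(raw, exception: false)
    problems << "#{name} must be an integer number of bytes" if raw && size.nil?
    [name, size]
  end

  min, max = sizes.values_at("MMTK_HEAP_MIN", "MMTK_HEAP_MAX")
  problems << "MMTK_HEAP_MIN exceeds MMTK_HEAP_MAX" if min && max && min > max

  problems
end

settings = {
  "MMTK_HEAP_MODE" => "Dynamic",
  "MMTK_HEAP_MIN"  => "1048576",
  "MMTK_HEAP_MAX"  => "30923764531",
  "MMTK_PLAN"      => "MarkSweep"
}
p mmtk_env_problems(settings)
```

Unset variables are not reported, because the check only concerns values that were explicitly provided.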
Multi-threaded GC configuration requires careful consideration of how collection interacts with Ruby's Global VM Lock (GVL). MMTk runs multiple GC threads in parallel with each other, but they don't execute concurrently with Ruby VM threads: the system still requires "stop the world" behavior during collection phases.
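class AdvancedGCController
  # The GC implementation is fixed when the process starts, so one run can
  # only measure the collector that is currently loaded. Run this script once
  # per RUBY_GC_LIBRARY value and compare the results across runs.
  def self.benchmark_gc_implementations
    impl = current_implementation
    puts "Benchmarking #{impl} GC implementation..."
    # GC.stat keys may differ between implementations; treat missing keys as 0
    start_memory = GC.stat[:heap_allocated_pages].to_i
    start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    # Memory allocation workload
    data = Array.new(10000) { |i| Hash.new.tap { |h| h[i] = "data_#{i}" * 100 } }
    # Force garbage collection
    GC.start
    end_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    end_memory = GC.stat[:heap_allocated_pages].to_i
    data = nil # Release references
    {
      impl => {
        duration: end_time - start_time,
        memory_delta: end_memory - start_memory,
        gc_count: GC.stat[:count]
      }
    }
  end

  def self.current_implementation
    # GC.config was added in Ruby 3.4; older versions only have the built-in GC
    GC.respond_to?(:config) ? GC.config[:implementation] : "default"
  end
  # `private` has no effect on `def self.` methods, so hide the helper explicitly
  private_class_method :current_implementation
end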
Performance & Memory
Ruby 3.4's Modular GC system introduces runtime overhead through function pointer indirection and dynamic library loading. However, this architectural change enables experimentation with high-performance algorithms that weren't previously accessible.
Memory allocation patterns vary significantly between GC implementations. The default collector maintains Ruby's traditional mark-and-sweep behavior with generational collection. MMTk, as a framework, opens the door to modern algorithms such as Immix and LXR, although the MMTK_PLAN values listed in the Reference section are limited to MarkSweep and NoGC.
class GCPerformanceAnalyzer
def self.measure_allocation_performance
# Baseline measurement before allocations
initial_stats = GC.stat
start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
# Heavy allocation workload
objects = Array.new(50000) do |i|
{
id: i,
data: Array.new(100) { rand(1000) },
metadata: { created_at: Time.now, iteration: i }
}
end
end_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
final_stats = GC.stat
# Calculate performance metrics
{
allocation_time: end_time - start_time,
gc_cycles: final_stats[:count] - initial_stats[:count],
major_gc: final_stats[:major_gc_count] - initial_stats[:major_gc_count],
minor_gc: final_stats[:minor_gc_count] - initial_stats[:minor_gc_count],
heap_pages: final_stats[:heap_allocated_pages] - initial_stats[:heap_allocated_pages],
objects_allocated: final_stats[:total_allocated_objects] - initial_stats[:total_allocated_objects]
}
end
def self.memory_pressure_test
results = []
(1..10).each do |iteration|
# Create memory pressure
large_objects = Array.new(1000) { Array.new(1000) { "x" * 100 } }
# Measure GC response
gc_before = GC.stat
GC.start
gc_after = GC.stat
results << {
iteration: iteration,
heap_live_slots: gc_after[:heap_live_slots],
heap_free_slots: gc_after[:heap_free_slots],
gc_time: GC.measure_total_time ? gc_after[:time] - gc_before[:time] : "Not measured"
}
large_objects = nil # Force deallocation
end
results
end
end
# Performance comparison
puts "Allocation Performance:"
perf_data = GCPerformanceAnalyzer.measure_allocation_performance
perf_data.each { |key, value| puts "#{key}: #{value}" }
puts "\nMemory Pressure Test:"
memory_data = GCPerformanceAnalyzer.memory_pressure_test
memory_data.last(3).each { |result| puts result.inspect }
MMTk's MarkSweep implementation operates with dynamic heap sizing, growing between configured minimum and maximum bounds. This approach contrasts with Ruby's traditional fixed page allocation strategy.
# Memory usage optimization for different workloads
class MemoryOptimizer
def self.optimize_for_batch_processing
# Configuration for batch processing workloads
if GC.config[:implementation] == "default"
# Disable major GC during processing phase
GC.config(rgengc_allow_full_mark: false)
puts "Optimized for batch processing with minor GC only"
elsif GC.config[:implementation] == "mmtk"
# MMTk configuration handled via environment variables
puts "MMTk heap configuration from environment variables"
end
# Process large dataset
batch_data = process_large_dataset
# Re-enable major GC and clean up
GC.config(rgengc_allow_full_mark: true) if GC.config[:implementation] == "default"
GC.start(full_mark: true)
batch_data
end
def self.optimize_for_low_latency
# Configuration for low-latency applications
GC.disable # Temporarily disable automatic GC
begin
# Critical low-latency section
yield if block_given?
ensure
GC.enable # Re-enable automatic GC
# Schedule deferred cleanup
Thread.new { sleep(0.1); GC.start }
end
end
# `private` has no effect on `def self.` methods, so use private_class_method
private_class_method def self.process_large_dataset
# Simulate batch processing
data = []
10000.times do |i|
data << { id: i, processed: Time.now }
# Periodic minor cleanup without major GC overhead
GC.start(full_mark: false) if i % 1000 == 0
end
data
end
end
Production Patterns
Production environments require careful GC tuning to balance throughput, latency, and memory usage. Because the collector is selected per process through RUBY_GC_LIBRARY, Modular GC enables A/B testing different collector implementations from a single Ruby build rather than separate Ruby deployments.
class ProductionGCManager
class << self
def configure_for_web_application
# Web application optimization
case GC.config[:implementation]
when "default"
configure_default_for_web
when "mmtk"
configure_mmtk_for_web
else
Rails.logger.warn "Unknown GC implementation: #{GC.config[:implementation]}"
end
# Set up GC monitoring
setup_gc_monitoring
end
def configure_for_background_jobs
# Background job optimization prioritizes throughput over latency
GC.config(rgengc_allow_full_mark: true) # Allow major GC
# GC.stress forces a collection at every allocation opportunity and is
# meant for debugging only, so make sure it is off for production jobs
original_gc_stress = GC.stress
GC.stress = false
# Return a block that restores the previous stress setting
proc { GC.stress = original_gc_stress }
end
private
def configure_default_for_web
# Conservative configuration for web requests
GC.config(rgengc_allow_full_mark: true)
# Enable compaction for long-running processes
GC.auto_compact = true
Rails.logger.info "Configured default GC for web application"
end
def configure_mmtk_for_web
# MMTk configuration through environment variables
# MMTK_HEAP_MODE=Dynamic
# MMTK_PLAN=MarkSweep
# MMTK_THREADS=2 # Conservative for web servers
Rails.logger.info "MMTk GC configured via environment variables"
end
def setup_gc_monitoring
# Enable GC time measurement for monitoring
GC.measure_total_time = true
# Set up periodic GC statistics collection
Thread.new do
loop do
sleep(60) # Every minute
log_gc_statistics
end
end
end
def log_gc_statistics
stats = GC.stat
gc_info = GC.latest_gc_info
Rails.logger.info({
event: "gc_statistics",
implementation: GC.config[:implementation],
major_gc_count: stats[:major_gc_count],
minor_gc_count: stats[:minor_gc_count],
heap_allocated_pages: stats[:heap_allocated_pages],
heap_live_slots: stats[:heap_live_slots],
total_time: stats[:time],
last_gc_reason: gc_info[:gc_by]
}.to_json)
end
end
end
# Rails initializer example
class Application < Rails::Application
config.after_initialize do
ProductionGCManager.configure_for_web_application
end
end
Blue-green deployment strategies can test GC implementations by routing traffic between environments with different collectors:
class GCDeploymentStrategy
def self.deploy_with_gc_testing
# Environment A: Default GC
# Environment B: MMTk GC
deployment_config = {
environments: {
"blue" => {
gc_library: "default",
traffic_percentage: 90
},
"green" => {
gc_library: "mmtk",
traffic_percentage: 10
}
},
metrics_to_track: [
:response_time_p95,
:gc_frequency,
:memory_usage,
:throughput
]
}
monitor_deployment(deployment_config)
end
def self.monitor_deployment(config)
config[:environments].each do |env_name, env_config|
puts "Monitoring #{env_name} environment:"
puts " GC Implementation: #{env_config[:gc_library]}"
puts " Traffic: #{env_config[:traffic_percentage]}%"
# Collect metrics for comparison
metrics = collect_environment_metrics(env_name)
if metrics[:error_rate] > 0.01 # 1% error threshold
puts " WARNING: High error rate detected"
# Trigger rollback logic
end
end
end
# `private` has no effect on `def self.` methods, so use private_class_method
private_class_method def self.collect_environment_metrics(environment)
# Simulate metrics collection
{
response_time_p95: rand(50..200),
gc_frequency: rand(1..10),
memory_usage: rand(500..2000),
throughput: rand(100..1000),
error_rate: rand * 0.02
}
end
end
Container orchestration platforms like Kubernetes benefit from GC-aware resource management:
# kubernetes-deployment.yaml
# The selector and pod labels are required by apps/v1; the label value is illustrative
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ruby-app-mmtk
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ruby-app-mmtk
  template:
    metadata:
      labels:
        app: ruby-app-mmtk
    spec:
      containers:
        - name: app
          image: ruby-app:3.4
          env:
            - name: RUBY_GC_LIBRARY
              value: "mmtk"
            - name: MMTK_HEAP_MODE
              value: "Dynamic"
            - name: MMTK_HEAP_MAX
              value: "1073741824" # 1GB
          resources:
            requests:
              memory: "512Mi"
            limits:
              memory: "1Gi"
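One thing worth checking in a manifest like this is how the MMTK_HEAP_MAX value compares with the container memory limit, since a process generally uses some memory outside the GC heap. The sketch below computes the difference. Both helpers are hypothetical, and only the binary suffixes that appear in the manifest are parsed.

```ruby
# Only the binary suffixes used in the manifest are handled; Kubernetes
# accepts more quantity formats than this sketch covers.
BINARY_SUFFIXES = { "Ki" => 1024, "Mi" => 1024**2, "Gi" => 1024**3 }.freeze

# Both helpers are hypothetical and exist only for this illustration.
def quantity_to_bytes(quantity)
  match = quantity.match(/\A(\d+)(Ki|Mi|Gi)?\z/)
  raise ArgumentError, "unsupported quantity: #{quantity}" unless match

  # A quantity without a suffix is already expressed in bytes.
  match[1].to_i * BINARY_SUFFIXES.fetch(match[2], 1)
end

def heap_headroom_bytes(memory_limit, heap_max_bytes)
  quantity_to_bytes(memory_limit) - heap_max_bytes
end

# Values from the manifest: a 1Gi limit with a 1073741824-byte heap maximum
puts heap_headroom_bytes("1Gi", 1_073_741_824)

# A smaller heap maximum (768 MiB) leaves room for non-heap memory
puts heap_headroom_bytes("1Gi", 805_306_368)
```

With the manifest's values the headroom is zero, so a heap that grows to its maximum would already consume the whole container limit.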
Common Pitfalls
The most frequent issue involves attempting to configure GC implementations without proper build-time support. Ruby must be compiled with --with-modular-gc to access the Modular GC functionality. Without this flag, the existing default Ruby GC compiles statically into binaries with no modular code included.
# Common mistake: treating GC.config as proof of modular GC support.
# GC.config was added in Ruby 3.4 and is available with or without modular GC,
# so check RUBY_DESCRIPTION for the "+GC" marker instead.
if !GC.respond_to?(:config)
  puts "GC.config not available - requires Ruby 3.4 or later"
elsif RUBY_DESCRIPTION.include?("+GC")
  puts "Modular GC support compiled in"
else
  puts "Ruby compiled without modular GC support"
  puts "Rebuild Ruby with: ./configure --with-modular-gc=/path/to/gc/dir"
end
Environment variable configuration often fails due to timing issues. MMTk configuration must occur before Ruby process initialization:
# INCORRECT: Setting environment variables after Ruby starts
ENV['MMTK_HEAP_MODE'] = 'Fixed' # This has no effect
ENV['RUBY_GC_LIBRARY'] = 'mmtk' # This has no effect
# CORRECT: Set before starting Ruby process
# MMTK_HEAP_MODE=Fixed RUBY_GC_LIBRARY=mmtk ruby script.rb
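When the settings have to be chosen from Ruby code, the usual approach is to start a child process with the desired environment instead of mutating ENV in the current one. The sketch below uses `Open3.capture2` and `RbConfig.ruby` from the standard library. `run_ruby_with_env` is a hypothetical helper, and the example passes only MMTK_HEAP_MODE so that it runs on builds without an MMTk library installed.

```ruby
require "open3"
require "rbconfig"

# run_ruby_with_env is a hypothetical helper. It starts a new Ruby process
# with extra environment variables and returns what the child printed.
def run_ruby_with_env(env, script)
  stdout, status = Open3.capture2(env, RbConfig.ruby, "-e", script)
  raise "child Ruby exited with #{status.exitstatus}" unless status.success?

  stdout.strip
end

# The variable is present from the first instruction of the child process.
# Add "RUBY_GC_LIBRARY" => "mmtk" only on a build that has that library.
child_env = { "MMTK_HEAP_MODE" => "Fixed" }
puts run_ruby_with_env(child_env, 'print ENV["MMTK_HEAP_MODE"]')
```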
Security restrictions prevent loading GC libraries from arbitrary locations. The directory path specified during --with-modular-gc configuration is the only filesystem location where Ruby loads shared GC libraries. This serves as risk mitigation to tightly control code loading sources.
# This will fail if library is outside configured directory
# RUBY_GC_LIBRARY=../../../malicious_gc ruby script.rb
require "rbconfig"

def verify_gc_library_location
  config_dir = "/configured/gc/directory" # Set during build
  library_name = ENV['RUBY_GC_LIBRARY']
  return true unless library_name

  # DLEXT is "so" on Linux and "bundle" on macOS
  extension = RbConfig::CONFIG["DLEXT"]
  expected_path = File.join(config_dir, "librubygc.#{library_name}.#{extension}")
  return true if File.exist?(expected_path)

  puts "Error: GC library not found in configured directory"
  puts "Expected: #{expected_path}"
  false
end
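Mixing GC configuration approaches is a common source of confusion, because settings aimed at one collector do not carry over to the other. Ruby's default GC is configured through GC.config, while MMTk is configured through environment variables: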
class GCConfigurationValidator
def self.validate_configuration
impl = GC.config[:implementation]
issues = []
case impl
when "default"
# Validate default GC configuration
if ENV['MMTK_HEAP_MODE']
issues << "MMTk environment variables set but using default GC"
end
unless GC.config.key?(:rgengc_allow_full_mark)
issues << "Missing required default GC configuration parameters"
end
when "mmtk"
# Validate MMTk configuration. Unset variables fall back to MMTk's
# defaults, so report them as informational rather than as errors
expected_env_vars = %w[MMTK_HEAP_MODE MMTK_PLAN]
missing_vars = expected_env_vars.reject { |var| ENV[var] }
if missing_vars.any?
issues << "MMTk environment variables not set, defaults apply: #{missing_vars.join(', ')}"
end
# Check for conflicting default GC settings
if GC.config[:rgengc_allow_full_mark] == false
issues << "Conflicting GC configuration detected"
end
end
issues
end
def self.report_configuration_issues
issues = validate_configuration
if issues.any?
puts "GC Configuration Issues:"
issues.each { |issue| puts " - #{issue}" }
false
else
puts "GC configuration validated successfully"
true
end
end
end
# Run validation
GCConfigurationValidator.report_configuration_issues
Performance expectations often misalign with reality. The MMTk implementation currently lags significantly behind Ruby's existing GC in performance and has not been tested on production workloads. The feature remains experimental, and frequent changes are expected.
def realistic_performance_expectations
puts "Current GC Performance Status (Ruby 3.4):"
puts ""
puts "Default GC:"
puts " - Production-ready and optimized"
puts " - Years of real-world tuning"
puts " - Predictable performance characteristics"
puts ""
puts "MMTk GC:"
puts " - Experimental implementation"
puts " - Performance currently slower than default"
puts " - Subject to frequent changes"
puts " - Not recommended for production use"
puts ""
puts "Use MMTk for:"
puts " - Research and development"
puts " - Algorithm experimentation"
puts " - Future-proofing architecture"
puts ""
puts "Avoid MMTk for:"
puts " - Production applications"
puts " - Performance-critical systems"
puts " - Stable deployment environments"
end
Reference
Core Methods
Method | Parameters | Returns | Description |
---|---|---|---|
GC.config | hash = nil | Hash | Gets or sets GC configuration parameters. Returns current config as Hash with Symbol keys, or updates config when Hash argument provided |
GC.config[:implementation] | none | String | Returns name of currently loaded GC implementation. Read-only key present regardless of modular GC support |
GC.start | full_mark: true, immediate_mark: true, immediate_sweep: true | nil | Initiates garbage collection with keyword arguments controlling collection type and timing behavior |
Configuration Parameters
Parameter | Type | Default | Description |
---|---|---|---|
:implementation | String | "default" | Read-only identifier of current GC implementation |
:rgengc_allow_full_mark | Boolean | true | Controls whether GC can run full marking cycles. When false, only minor marking occurs |
Environment Variables
Variable | Values | Description |
---|---|---|
RUBY_GC_LIBRARY | "default", "mmtk" | Specifies GC library to load at runtime. Library must exist in configured modular GC directory |
MMTK_HEAP_MODE | "Dynamic", "Fixed" | MMTk heap allocation strategy. Dynamic allows growth between bounds, Fixed pre-allocates unchangeable size |
MMTK_HEAP_MIN | Integer (bytes) | Default: 1048576 |
MMTK_HEAP_MAX | Integer (bytes) | Default: 30923764531 |
MMTK_PLAN | "MarkSweep", "NoGC" | Default: "MarkSweep" |
MMTK_THREADS | Integer | Default: platform default |
Build Configuration
Configure Flag | Required Value | Description |
---|---|---|
--with-modular-gc | Directory path | Enables Modular GC feature and specifies exclusive directory for loading GC libraries |
Make Targets
Target | Parameters | Description |
---|---|---|
modular-gc | MODULAR_GC=default | Builds default GC as modular library. Requires Ruby configured with --with-modular-gc |
modular-gc | MODULAR_GC=mmtk | Builds MMTk-based GC library. Requires Rust toolchain for compilation |
Library Naming Convention
GC libraries follow strict naming patterns for runtime loading:
Pattern | Example | Platform |
---|---|---|
librubygc.{name}.so | librubygc.default.so | Linux (ELF) |
librubygc.{name}.bundle | librubygc.mmtk.bundle | macOS (Darwin) |
Runtime Verification
# Check modular GC support
modular_gc_enabled = RUBY_DESCRIPTION.include?("+GC")
puts "Modular GC: #{modular_gc_enabled ? 'Enabled' : 'Disabled'}"
# Identify loaded GC library
if RUBY_DESCRIPTION.include?("+GC[")
gc_name = RUBY_DESCRIPTION[/\+GC\[([^\]]+)\]/, 1]
puts "Loaded GC: #{gc_name}"
else
puts "Default static GC in use"
end
# Verify configuration access. GC.config was added in Ruby 3.4 and does
# not depend on modular GC support
if GC.respond_to?(:config)
puts "Configuration available: #{GC.config.keys.inspect}"
else
puts "GC.config not available - requires Ruby 3.4 or later"
end