CrackedRuby

Modular GC

Ruby's Modular Garbage Collection system that allows dynamic loading of alternative GC implementations at runtime.

Overview

Ruby 3.4 introduces Modular GC, a standardized interface that lets alternative garbage collector implementations be loaded at runtime from shared libraries. Ruby's existing garbage collection code is split into an integration layer inside the VM and an implementation layer behind the interface. Once Ruby has been built with modular GC support, a different collector can be swapped in without recompiling or relinking the Ruby binary.

Ruby loads the shared library with dlopen and maps each function of the GC interface to a function pointer provided by that library. When no library is requested, Ruby uses the default GC that is statically compiled into the binary. For security, GC libraries are only loaded from a single directory fixed at build time.

Ruby 3.4 ships two GC implementations: the traditional default collector and an experimental collector based on MMTk. MMTk (the Memory Management Toolkit) is a language-agnostic library, written in Rust, that provides memory management building blocks from which complete GC algorithms are assembled.

# Check current GC implementation
puts GC.config[:implementation]
# => "default"

# Verify modular GC support in Ruby description
puts RUBY_DESCRIPTION
# => "ruby 3.4.0dev (2024-12-06T12:47:35Z master 78614ee900) +PRISM +GC [arm64-darwin24]"

The core API revolves around GC.config, which provides implementation-agnostic access to garbage collector configuration parameters. Each GC implementation documents its own configuration keys, maintaining isolation between different collector strategies.
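GC.config only exists from Ruby 3.4 onward, so code that must also run on older Rubies can guard the call before reading the implementation name. A minimal sketch; `current_gc_implementation` and its "unknown" fallback are hypothetical, not part of Ruby:

```ruby
# Returns the name of the active GC implementation, e.g. "default" or "mmtk".
# Hypothetical helper: falls back to "unknown" on Rubies without GC.config
# (anything older than 3.4) or when the :implementation key is absent.
def current_gc_implementation
  return "unknown" unless GC.respond_to?(:config)

  GC.config.fetch(:implementation, "unknown").to_s
end

puts "GC implementation: #{current_gc_implementation}"
```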

Basic Usage

Enabling Modular GC requires configuring Ruby at build time with the --with-modular-gc flag, which takes the directory from which Ruby will load GC libraries. Limiting loading to this one directory is a security measure: it restricts where Ruby will load executable code from.

# Configure Ruby with modular GC support
./configure --with-modular-gc=$HOME/ruby-mod-gc
make -j
make install

The default garbage collector can be built as a modular library using the dedicated make target. This allows testing configuration changes or fixes without rebuilding entire Ruby binaries.

# Build default GC as modular library
make modular-gc MODULAR_GC=default

# Build MMTk-based GC (requires Rust)
make modular-gc MODULAR_GC=mmtk

GC libraries are selected with the RUBY_GC_LIBRARY environment variable, which takes a library name rather than a path. Library files must follow the naming convention librubygc.{name}.{extension}, where {name} is the value of RUBY_GC_LIBRARY and the extension matches the platform's shared object format (.so on Linux, .bundle on macOS).

# Load default GC as library
# RUBY_GC_LIBRARY=default ruby script.rb

# Verify loaded implementation
GC.config[:implementation]
# => "default"

# Access implementation-specific configuration
config = GC.config
p config
# => {implementation: "default", rgengc_allow_full_mark: true}
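The naming convention can be checked from Ruby before pointing RUBY_GC_LIBRARY at a library. This is a minimal sketch: `gc_library_path` and `available_gc_libraries` are hypothetical helpers, and the GC directory is passed in explicitly because the sketch does not assume any API for reading the --with-modular-gc path back from a running interpreter. `RbConfig::CONFIG["DLEXT"]` supplies the platform's shared library extension.

```ruby
require "rbconfig"
require "tmpdir"

# Builds the expected path of a GC library using the
# librubygc.{name}.{extension} convention. DLEXT is "so" on Linux and
# "bundle" on macOS.
def gc_library_path(dir, name)
  File.join(dir, "librubygc.#{name}.#{RbConfig::CONFIG['DLEXT']}")
end

# Lists the GC names whose library files exist in +dir+ (the directory
# given to --with-modular-gc, supplied by the caller here).
def available_gc_libraries(dir)
  ext = RbConfig::CONFIG["DLEXT"]
  Dir.glob(File.join(dir, "librubygc.*.#{ext}")).map do |path|
    File.basename(path).delete_prefix("librubygc.").delete_suffix(".#{ext}")
  end.sort
end

# Usage with a throwaway directory standing in for the real GC directory
Dir.mktmpdir do |dir|
  %w[default mmtk].each { |name| File.write(gc_library_path(dir, name), "") }
  p available_gc_libraries(dir) # prints ["default", "mmtk"]
end
```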

GC.config serves two purposes: called without arguments, it returns the current configuration as a Hash with Symbol keys; called with a Hash, it updates the given keys and leaves keys missing from that Hash unchanged. In both cases the return value is the full, current configuration.

# Get current configuration
current_config = GC.config
puts current_config[:rgengc_allow_full_mark]
# => true

# Update specific configuration parameters
GC.config(rgengc_allow_full_mark: false)

# Verify configuration change
puts GC.config[:rgengc_allow_full_mark]  
# => false
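When a setting should only apply to one phase of a program, saving and restoring the affected keys keeps the change scoped. A sketch, assuming Ruby 3.4+ for GC.config (older Rubies just run the block); `with_gc_config` is a hypothetical helper, not part of Ruby:

```ruby
# Temporarily applies GC.config overrides for the duration of a block and
# restores the previous values afterwards, even if the block raises.
# On Rubies without GC.config it simply runs the block.
def with_gc_config(**overrides)
  return yield unless GC.respond_to?(:config)

  # Remember only the keys this implementation actually defines
  previous = GC.config.slice(*overrides.keys)
  GC.config(overrides)
  yield
ensure
  GC.config(previous) if previous && !previous.empty?
end

with_gc_config(rgengc_allow_full_mark: false) do
  # Work that should only trigger minor collections goes here
  50_000.times { |i| "temp #{i}" }
end
```

Only keys that exist in the current configuration are saved, so keys the active implementation does not define are simply not restored.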

Unknown configuration keys don't raise errors. Instead, the key appears in the returned Hash with a nil value, which makes it easier to run the same code against different GC implementations.

# Attempt to set an unknown configuration key
result = GC.config(nonexistent_key: "value")
p result[:nonexistent_key]
# => nil
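Because GC.config returns every key the active implementation defines, portable code can check requested settings against that Hash before applying them and report the rest. A sketch; `apply_supported_gc_config` is a hypothetical helper, and it treats :implementation as read-only, as the reference table below describes:

```ruby
# Applies only the settings the active GC implementation defines and
# returns the keys it skipped. On Rubies without GC.config every key is
# reported as unsupported.
def apply_supported_gc_config(**settings)
  return settings.keys unless GC.respond_to?(:config)

  known = GC.config.keys - [:implementation] # :implementation is read-only
  supported, unsupported = settings.partition { |key, _| known.include?(key) }
  GC.config(supported.to_h) unless supported.empty?
  unsupported.map(&:first)
end

skipped = apply_supported_gc_config(rgengc_allow_full_mark: true, nonexistent_key: "value")
puts "Not supported by this GC: #{skipped.inspect}"
```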

Advanced Usage

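Ruby's default GC implementation supports the rgengc_allow_full_mark parameter, which controls whether the collector may run full marking cycles covering both young and old objects. When it is set to false, the GC never starts a full mark on its own: only minor marking of young objects runs, although user code can still request a major collection with GC.start(full_mark: true). Promotion of young objects to the old generation is also disabled while the flag is false, so Ruby's documentation recommends warming up the application with Process.warmup before turning it off.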

# Disable major GC cycles for performance testing
GC.config(rgengc_allow_full_mark: false)

# Monitor GC behavior changes
before_stats = GC.stat
GC.start(full_mark: false)  # Requests a minor collection only
after_stats = GC.stat

puts "Minor GC count increased by: #{after_stats[:minor_gc_count] - before_stats[:minor_gc_count]}"
puts "Major GC count change: #{after_stats[:major_gc_count] - before_stats[:major_gc_count]}"

# Restore the default setting
GC.config(rgengc_allow_full_mark: true)
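The warm-up recommendation can be folded into a helper that disables full marking only around a block and reports how many major collections happened meanwhile. A sketch targeting the default GC (it reads :major_gc_count from GC.stat, a key other implementations may not provide); `minor_only_section` is a hypothetical name:

```ruby
# Runs a block with full marking disabled and returns the number of major
# GCs that happened meanwhile. Warms up first (Process.warmup, Ruby 3.3+)
# because young-to-old promotion stops while rgengc_allow_full_mark is
# false, then restores the previous value even if the block raises.
def minor_only_section
  Process.warmup if Process.respond_to?(:warmup)

  supported = GC.respond_to?(:config) && GC.config.key?(:rgengc_allow_full_mark)
  previous = GC.config[:rgengc_allow_full_mark] if supported
  before = GC.stat(:major_gc_count)

  GC.config(rgengc_allow_full_mark: false) if supported
  begin
    yield
  ensure
    GC.config(rgengc_allow_full_mark: previous) if supported
  end

  GC.stat(:major_gc_count) - before
end

majors = minor_only_section { 200_000.times { |i| "tmp #{i}" } }
puts "Major GCs during the section: #{majors}"
```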

With rgengc_allow_full_mark disabled, running out of heap space makes Ruby allocate new pages immediately instead of running a full mark. A flag records that a full mark was wanted; its reason is available through GC.latest_gc_info(:need_major_by).

class GCMonitor
  def self.monitor_gc_pressure
    # Disable major GC for controlled testing
    GC.config(rgengc_allow_full_mark: false)

    # Allocate short-lived objects to create memory pressure
    1000.times { |i| Array.new(1000) { "object_#{i}" } }

    # Non-nil when the GC wanted a full mark but was not allowed to run one
    reason = GC.latest_gc_info(:need_major_by)
    if reason
      puts "Major GC required due to: #{reason}"

      # Explicitly requested major collections still run
      GC.start(full_mark: true)
      puts "Major GC completed"
    end
  ensure
    # Re-enable normal GC behavior even if an exception was raised
    GC.config(rgengc_allow_full_mark: true)
  end
end

GCMonitor.monitor_gc_pressure

The MMTk implementation is configured through environment variables that select the heap mode and the GC algorithm (plan). In Dynamic heap mode the heap can grow between a minimum and a maximum bound, while in Fixed mode it keeps a single, unchanging size.

# MMTk configuration demonstration
def configure_mmtk_gc
  # Environment variables must be set before Ruby starts, for example:
  # MMTK_HEAP_MODE=Dynamic
  # MMTK_HEAP_MIN=1048576        # 1 MiB minimum
  # MMTK_HEAP_MAX=30923764531    # ~28.8 GiB maximum (value from one example machine)
  # MMTK_PLAN=MarkSweep
  # MMTK_THREADS=4

  # Verify MMTk configuration
  if GC.config[:implementation] == "mmtk"
    puts "Running MMTk GC with plan #{ENV.fetch('MMTK_PLAN', '(default)')}"

    # Time one explicit collection with a monotonic clock
    start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    GC.start
    gc_time = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start_time

    puts "MMTk GC cycle completed in #{(gc_time * 1000).round(2)}ms"
  else
    puts "MMTk GC not loaded"
  end
end

configure_mmtk_gc
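Since MMTk reads these variables when Ruby boots, it can help to log which ones the process was actually started with. A small sketch; `mmtk_environment` is a hypothetical helper, and the result only reflects MMTk's view if ENV has not been modified since startup:

```ruby
# Reports the MMTK_* variables this process was started with. Variables
# that are not set fall back to MMTk's own defaults.
def mmtk_environment(env = ENV)
  env.to_h.select { |name, _| name.start_with?("MMTK_") }.sort.to_h
end

settings = mmtk_environment
if settings.empty?
  puts "No MMTK_* variables set"
else
  settings.each { |name, value| puts "#{name}=#{value}" }
end
```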

MMTk can run several GC worker threads in parallel (MMTK_THREADS sets how many), but they do not run concurrently with Ruby threads: collection is still stop-the-world, so every Ruby thread pauses until the workers finish. Extra GC threads can shorten those pauses but do not remove them. Comparing implementations, as in the benchmark below, also requires separate Ruby processes, because the GC library is chosen at boot.

class AdvancedGCController
  # Benchmarks only the GC implementation loaded in this process. Comparing
  # implementations requires separate runs with different RUBY_GC_LIBRARY
  # values, because the library is chosen when Ruby boots.
  def self.benchmark_current_gc
    impl = GC.config[:implementation]
    puts "Benchmarking #{impl} GC implementation..."

    before = GC.stat
    start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)

    # Memory allocation workload
    data = Array.new(10_000) { |i| { i => "data_#{i}" * 100 } }

    # Force garbage collection while the workload data is still live
    GC.start

    duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start_time
    after = GC.stat

    {
      impl => {
        duration: duration,
        objects: data.size,
        # GC.stat keys are implementation-specific; missing keys count as 0
        page_delta: after[:heap_allocated_pages].to_i - before[:heap_allocated_pages].to_i,
        gc_count: after[:count] - before[:count]
      }
    }
  end
end
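The stop-the-world behavior described above can be observed from Ruby: a ticker thread cannot run while a collection is in progress, so the pause shows up as a long gap between its wake-ups. A rough sketch, not a profiler; `longest_stall_during_gc` is a hypothetical helper, and the measured gap also includes ordinary thread-scheduling delay, so treat it as an upper bound on the GC pause:

```ruby
# Returns the longest gap (in milliseconds) a ticker thread observed
# between wake-ups while the main thread ran a full GC. Scheduling noise
# is included in the result.
def longest_stall_during_gc(tick: 0.001)
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }

  # Give the collector some live objects to mark
  live = Array.new(200_000) { |i| "obj#{i}" }

  longest = 0.0
  last = clock.call
  ticker = Thread.new do
    loop do
      sleep(tick)
      now = clock.call
      longest = [longest, now - last].max
      last = now
    end
  end

  sleep(0.05)               # let the ticker start running
  GC.start(full_mark: true) # all Ruby threads pause here
  sleep(0.05)               # let the ticker record the gap after resuming
  ticker.kill
  ticker.join
  live.clear

  (longest * 1000).round(2)
end

puts "Longest observed stall: #{longest_stall_during_gc} ms"
```

Running the same script under RUBY_GC_LIBRARY=mmtk with different MMTK_THREADS values is one way to compare pause behavior, with each run in its own process.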

Performance & Memory

When Ruby is built with Modular GC, calls into the collector go through function pointers instead of direct calls, which can add a small runtime overhead; loading the library itself is a one-time cost at boot. In exchange, the architecture makes it possible to experiment with GC algorithms that Ruby could not use before.

Memory allocation patterns vary significantly between GC implementations. The default collector is Ruby's traditional generational mark-and-sweep GC, with incremental marking and optional compaction. MMTk itself implements newer algorithms such as Immix and LXR, but the Ruby 3.4 binding lists only MarkSweep and NoGC as MMTK_PLAN values (see the Reference section).

class GCPerformanceAnalyzer
  def self.measure_allocation_performance
    # Baseline measurement before allocations
    initial_stats = GC.stat
    
    start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    
    # Heavy allocation workload
    objects = Array.new(50000) do |i|
      {
        id: i,
        data: Array.new(100) { rand(1000) },
        metadata: { created_at: Time.now, iteration: i }
      }
    end
    
    end_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    final_stats = GC.stat
    
    # Calculate performance metrics
    {
      allocation_time: end_time - start_time,
      gc_cycles: final_stats[:count] - initial_stats[:count],
      major_gc: final_stats[:major_gc_count] - initial_stats[:major_gc_count],
      minor_gc: final_stats[:minor_gc_count] - initial_stats[:minor_gc_count],
      heap_pages: final_stats[:heap_allocated_pages] - initial_stats[:heap_allocated_pages],
      objects_allocated: final_stats[:total_allocated_objects] - initial_stats[:total_allocated_objects]
    }
  end
  
  def self.memory_pressure_test
    results = []
    
    (1..10).each do |iteration|
      # Create memory pressure
      large_objects = Array.new(1000) { Array.new(1000) { "x" * 100 } }
      
      # Measure GC response
      gc_before = GC.stat
      GC.start
      gc_after = GC.stat
      
      results << {
        iteration: iteration,
        heap_live_slots: gc_after[:heap_live_slots],
        heap_free_slots: gc_after[:heap_free_slots],
        gc_time: GC.measure_total_time ? gc_after[:time] - gc_before[:time] : "Not measured"
      }
      
      large_objects = nil  # Drop the reference so a later GC can reclaim it
    end
    
    results
  end
end

# Performance comparison
puts "Allocation Performance:"
perf_data = GCPerformanceAnalyzer.measure_allocation_performance
perf_data.each { |key, value| puts "#{key}: #{value}" }

puts "\nMemory Pressure Test:"
memory_data = GCPerformanceAnalyzer.memory_pressure_test
memory_data.last(3).each { |result| puts result.inspect }
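The analyzer above reads keys such as :major_gc_count and :heap_live_slots, which belong to the default GC. Since GC.stat keys are implementation-specific, a comparison that must also run under MMTk can diff only the numeric counters present in both snapshots. A sketch; `gc_stat_delta` is a hypothetical helper:

```ruby
# Runs a block and returns the change in every numeric GC.stat counter
# present both before and after, without assuming which keys exist.
def gc_stat_delta
  before = GC.stat
  yield
  after = GC.stat
  after.each_with_object({}) do |(key, value), delta|
    previous = before[key]
    delta[key] = value - previous if value.is_a?(Numeric) && previous.is_a?(Numeric)
  end
end

delta = gc_stat_delta { 100_000.times { |i| "item #{i}" } }
puts "GC runs: #{delta[:count]}, objects allocated: #{delta[:total_allocated_objects]}"
```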

With MMTK_HEAP_MODE=Dynamic, MMTk sizes its heap between the configured minimum and maximum bounds. The default GC works differently: it grows the heap by allocating additional fixed-size pages, with growth tunable through RUBY_GC_HEAP_* environment variables such as RUBY_GC_HEAP_GROWTH_FACTOR.

# Memory usage optimization for different workloads
class MemoryOptimizer
  def self.optimize_for_batch_processing
    default_gc = GC.config[:implementation] == "default"

    if default_gc
      # Disable major GC during the processing phase
      GC.config(rgengc_allow_full_mark: false)
      puts "Optimized for batch processing with minor GC only"
    else
      # MMTk configuration is handled via environment variables
      puts "#{GC.config[:implementation]} GC configured from environment variables"
    end

    begin
      process_large_dataset
    ensure
      # Re-enable major GC, then reclaim garbage in one full collection
      GC.config(rgengc_allow_full_mark: true) if default_gc
      GC.start(full_mark: true)
    end
  end

  def self.optimize_for_low_latency
    # Automatic GC is off inside the block, so the heap can grow freely;
    # keep the critical section short
    GC.disable

    begin
      yield if block_given?
    ensure
      GC.enable # Re-enable automatic GC

      # Deferred collection; it still stops the world when it runs
      Thread.new { sleep(0.1); GC.start }
    end
  end

  def self.process_large_dataset
    # Simulate batch processing
    data = []
    10_000.times do |i|
      data << { id: i, processed: Time.now }

      # Periodic minor cleanup without major GC overhead
      GC.start(full_mark: false) if (i % 1000).zero?
    end
    data
  end
  private_class_method :process_large_dataset
end

Production Patterns

Production environments require GC tuning that balances throughput, latency, and memory usage. Because the collector is chosen at boot through RUBY_GC_LIBRARY, a single modular-GC build of Ruby can run different collector implementations side by side, for example in canary or staging environments, without separate Ruby builds.

class ProductionGCManager
  class << self
    def configure_for_web_application
      # Web application optimization
      case GC.config[:implementation]
      when "default"
        configure_default_for_web
      when "mmtk"
        configure_mmtk_for_web
      else
        Rails.logger.warn "Unknown GC implementation: #{GC.config[:implementation]}"
      end
      
      # Set up GC monitoring
      setup_gc_monitoring
    end
    
    def configure_for_background_jobs
      # Background jobs favor throughput over latency: keep major GCs
      # enabled so old-generation garbage is reclaimed (default GC only)
      GC.config(rgengc_allow_full_mark: true) if GC.config[:implementation] == "default"
    end
    
    private
    
    def configure_default_for_web
      # Conservative configuration for web requests
      GC.config(rgengc_allow_full_mark: true)
      
      # Enable compaction for long-running processes
      GC.auto_compact = true
      
      Rails.logger.info "Configured default GC for web application"
    end
    
    def configure_mmtk_for_web
      # MMTk configuration through environment variables
      # MMTK_HEAP_MODE=Dynamic
      # MMTK_PLAN=MarkSweep  
      # MMTK_THREADS=2  # Conservative for web servers
      
      Rails.logger.info "MMTk GC configured via environment variables"
    end
    
    def setup_gc_monitoring
      # GC time measurement is on by default since Ruby 3.1; set it explicitly for monitoring
      GC.measure_total_time = true
      
      # Set up periodic GC statistics collection
      Thread.new do
        loop do
          sleep(60)  # Every minute
          log_gc_statistics
        end
      end
    end
    
    def log_gc_statistics
      stats = GC.stat
      gc_info = GC.latest_gc_info
      
      Rails.logger.info({
        event: "gc_statistics",
        implementation: GC.config[:implementation],
        major_gc_count: stats[:major_gc_count],
        minor_gc_count: stats[:minor_gc_count],
        heap_allocated_pages: stats[:heap_allocated_pages],
        heap_live_slots: stats[:heap_live_slots],
        total_time: stats[:time],
        last_gc_reason: gc_info[:gc_by]
      }.to_json)
    end
  end
end

# Rails initializer example (config/initializers/gc.rb)
Rails.application.config.after_initialize do
  ProductionGCManager.configure_for_web_application
end
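Outside Rails, the same periodic log line can be produced with the standard library alone. A sketch; `gc_snapshot_json` is a hypothetical helper, and keys that the active implementation does not provide in GC.stat simply serialize as JSON null:

```ruby
require "json"

# Builds one JSON document from GC.stat and GC.latest_gc_info, mirroring
# the fields logged by ProductionGCManager but without Rails.
def gc_snapshot_json
  stats = GC.stat
  info = GC.latest_gc_info
  implementation = GC.respond_to?(:config) ? GC.config[:implementation] : nil

  {
    event: "gc_statistics",
    implementation: implementation,
    count: stats[:count],
    major_gc_count: stats[:major_gc_count],
    minor_gc_count: stats[:minor_gc_count],
    total_time_ms: stats[:time],
    last_gc_reason: info[:gc_by]
  }.to_json
end

puts gc_snapshot_json
```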

Blue-green or canary deployments can compare GC implementations by splitting traffic between environments that start Ruby with different RUBY_GC_LIBRARY values. The metrics in this example are simulated:

class GCDeploymentStrategy
  def self.deploy_with_gc_testing
    # Environment A: Default GC
    # Environment B: MMTk GC
    
    deployment_config = {
      environments: {
        "blue" => {
          gc_library: "default",
          traffic_percentage: 90
        },
        "green" => {
          gc_library: "mmtk", 
          traffic_percentage: 10
        }
      },
      metrics_to_track: [
        :response_time_p95,
        :gc_frequency,
        :memory_usage,
        :throughput
      ]
    }
    
    monitor_deployment(deployment_config)
  end
  
  def self.monitor_deployment(config)
    config[:environments].each do |env_name, env_config|
      puts "Monitoring #{env_name} environment:"
      puts "  GC Implementation: #{env_config[:gc_library]}"  
      puts "  Traffic: #{env_config[:traffic_percentage]}%"
      
      # Collect metrics for comparison
      metrics = collect_environment_metrics(env_name)
      
      if metrics[:error_rate] > 0.01  # 1% error threshold
        puts "  WARNING: High error rate detected"
        # Trigger rollback logic
      end
    end
  end
  
  # Simulated metrics collection (random values for illustration)
  def self.collect_environment_metrics(environment)
    {
      response_time_p95: rand(50..200),
      gc_frequency: rand(1..10),
      memory_usage: rand(500..2000),
      throughput: rand(100..1000),
      error_rate: rand * 0.02
    }
  end
  private_class_method :collect_environment_metrics
end
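Once real metrics are available for both environments, the rollout decision reduces to comparing the candidate (for example the environment running RUBY_GC_LIBRARY=mmtk) against the baseline. A sketch; `gc_regressions`, the metric names, and the 10% tolerance are illustrative assumptions, and every metric is treated as lower-is-better except those listed in HIGHER_IS_BETTER:

```ruby
# Metrics where a larger value is an improvement
HIGHER_IS_BETTER = %i[throughput].freeze

# Returns { metric => relative_worsening } for every metric where the
# candidate is worse than the baseline by more than +tolerance+.
def gc_regressions(baseline, candidate, tolerance: 0.10)
  baseline.each_with_object({}) do |(metric, base), worse|
    next if base.zero?

    change = (candidate.fetch(metric) - base).fdiv(base)
    change = -change if HIGHER_IS_BETTER.include?(metric)
    worse[metric] = change.round(3) if change > tolerance
  end
end

baseline  = { response_time_p95: 120, gc_frequency: 4, memory_usage: 900, throughput: 500 }
candidate = { response_time_p95: 150, gc_frequency: 4, memory_usage: 880, throughput: 420 }
p gc_regressions(baseline, candidate)
```

Here the candidate's p95 latency is 25% higher and its throughput 16% lower, so both are reported; the small memory improvement is not.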

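On container platforms such as Kubernetes, the GC library and MMTk settings are passed as container environment variables. The image must contain a Ruby built with --with-modular-gc plus the MMTk library, and MMTK_HEAP_MAX should stay below the container memory limit, because the process also uses memory outside the GC heap (malloc'd buffers, native extensions, thread stacks):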

# kubernetes-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ruby-app-mmtk
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ruby-app-mmtk
  template:
    metadata:
      labels:
        app: ruby-app-mmtk
    spec:
      containers:
      - name: app
        image: ruby-app:3.4
        env:
        - name: RUBY_GC_LIBRARY
          value: "mmtk"
        - name: MMTK_HEAP_MODE
          value: "Dynamic"
        - name: MMTK_HEAP_MAX
          value: "805306368"  # 768 MiB, leaving headroom under the 1Gi limit
        resources:
          requests:
            memory: "512Mi"
          limits:
            memory: "1Gi"
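The heap limit can be derived from the container's memory limit instead of hard-coded. A sketch; `k8s_memory_bytes` and `mmtk_heap_max_for` are hypothetical helpers, only plain byte counts and the binary suffixes Ki/Mi/Gi/Ti are parsed, and the 75% fraction is an assumption to tune per workload:

```ruby
BINARY_SUFFIXES = { "Ki" => 1024, "Mi" => 1024**2, "Gi" => 1024**3, "Ti" => 1024**4 }.freeze

# Converts a Kubernetes memory quantity such as "1Gi" or "2048" to bytes.
def k8s_memory_bytes(quantity)
  match = /\A(\d+)(Ki|Mi|Gi|Ti)?\z/.match(quantity.to_s.strip)
  raise ArgumentError, "unsupported quantity: #{quantity.inspect}" unless match

  # to_i (not Integer()) so a leading zero is not read as octal
  match[1].to_i * BINARY_SUFFIXES.fetch(match[2], 1)
end

# Leaves (1 - fraction) of the limit for memory outside the GC heap.
def mmtk_heap_max_for(limit, fraction: 0.75)
  (k8s_memory_bytes(limit) * fraction).floor
end

puts "MMTK_HEAP_MAX=#{mmtk_heap_max_for('1Gi')}"
```

For a 1Gi limit this yields 805306368 bytes (768 MiB).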

Common Pitfalls

A frequent issue is trying to switch GC implementations on a Ruby that was built without modular GC support. Ruby must be compiled with --with-modular-gc to load GC libraries; without this flag, the default GC is compiled statically into the binary with no modular code included, and RUBY_GC_LIBRARY has no effect. GC.config itself exists in every Ruby 3.4 build, so its presence does not indicate modular GC support.

# Common mistake: treating GC.config as proof of modular GC support.
# GC.config exists in every Ruby 3.4+ build; check RUBY_DESCRIPTION instead.
if RUBY_DESCRIPTION.include?("+GC")
  puts "Modular GC build: RUBY_GC_LIBRARY is honored"
else
  puts "Static GC build: RUBY_GC_LIBRARY has no effect"
  puts "Rebuild Ruby with: ./configure --with-modular-gc=/path/to/gc/dir"
end

Environment variable configuration also fails when the variables are set too late. RUBY_GC_LIBRARY and the MMTK_* variables are read when the Ruby process boots, so they must be in the environment before Ruby starts:

# INCORRECT: Setting environment variables after Ruby starts
ENV['MMTK_HEAP_MODE'] = 'Fixed'  # No effect on this process (only inherited by child processes)
ENV['RUBY_GC_LIBRARY'] = 'mmtk'  # No effect on this process (only inherited by child processes)

# CORRECT: Set before starting Ruby process
# MMTK_HEAP_MODE=Fixed RUBY_GC_LIBRARY=mmtk ruby script.rb
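Because these variables are read at boot, the dependable way to apply them from Ruby code is to start a new process with them set. A minimal sketch, assuming the child can be the same interpreter (RbConfig.ruby); `child_gc_implementation` is a hypothetical helper, and passing nil removes RUBY_GC_LIBRARY from the child's environment:

```ruby
require "rbconfig"

# Starts the current Ruby with the given RUBY_GC_LIBRARY and returns the
# GC implementation the child reports, or nil if the child failed (for
# example, a modular build that cannot load the requested library).
# On a static build RUBY_GC_LIBRARY is ignored, so the child reports the
# built-in GC.
def child_gc_implementation(library)
  script = 'puts(GC.respond_to?(:config) ? GC.config[:implementation] : "pre-3.4")'
  output = IO.popen({ "RUBY_GC_LIBRARY" => library }, [RbConfig.ruby, "-e", script],
                    err: File::NULL, &:read)
  $?.success? ? output.strip : nil
end

p child_gc_implementation(nil)    # built-in GC
p child_gc_implementation("mmtk") # depends on how Ruby was built
```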

Security restrictions prevent loading GC libraries from arbitrary locations. The directory given to --with-modular-gc is the only filesystem location from which Ruby loads shared GC libraries, which limits where executable code can be loaded from.

# RUBY_GC_LIBRARY takes a library name, not a path, so traversal attempts like this fail:
# RUBY_GC_LIBRARY=../../../malicious_gc ruby script.rb

require "rbconfig"

def verify_gc_library_location
  # Placeholder: the directory passed to --with-modular-gc at build time
  config_dir = "/configured/gc/directory"

  library_name = ENV['RUBY_GC_LIBRARY']
  if library_name
    # DLEXT is "so" on Linux and "bundle" on macOS
    expected_path = File.join(config_dir, "librubygc.#{library_name}.#{RbConfig::CONFIG['DLEXT']}")

    unless File.exist?(expected_path)
      puts "Error: GC library not found in configured directory"
      puts "Expected: #{expected_path}"
      return false
    end
  end

  true
end

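Each implementation has its own configuration surface: the default GC is tuned through GC.config (plus RUBY_GC_* environment variables), while MMTk reads MMTK_* environment variables. Settings aimed at the wrong implementation are silently ignored, which makes misconfiguration easy to miss: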

class GCConfigurationValidator
  def self.validate_configuration
    config = GC.config
    impl = config[:implementation]
    issues = []

    case impl
    when "default"
      # MMTK_* variables are only read by the MMTk implementation
      mmtk_vars = ENV.keys.grep(/\AMMTK_/)
      if mmtk_vars.any?
        issues << "MMTk variables set but default GC is loaded: #{mmtk_vars.join(', ')}"
      end

      unless config.key?(:rgengc_allow_full_mark)
        issues << "Expected default GC key :rgengc_allow_full_mark is missing"
      end

    when "mmtk"
      # Default-GC tuning variables other than RUBY_GC_LIBRARY itself;
      # assumed to be ignored by the MMTk implementation
      default_gc_vars = ENV.keys.grep(/\ARUBY_GC_/) - ["RUBY_GC_LIBRARY"]
      if default_gc_vars.any?
        issues << "Default GC tuning variables set but MMTk is loaded: #{default_gc_vars.join(', ')}"
      end
    end

    issues
  end
  
  def self.report_configuration_issues
    issues = validate_configuration
    
    if issues.any?
      puts "GC Configuration Issues:"
      issues.each { |issue| puts "  - #{issue}" }
      false
    else
      puts "GC configuration validated successfully"
      true
    end
  end
end

# Run validation
GCConfigurationValidator.report_configuration_issues

Performance expectations need calibrating. In Ruby 3.4 the MMTk implementation lags significantly behind Ruby's existing GC in performance and hasn't been tested on production workloads; it remains experimental and is expected to change frequently.

def realistic_performance_expectations
  puts "Current GC Performance Status (Ruby 3.4):"
  puts ""
  puts "Default GC:"
  puts "  - Production-ready and optimized"
  puts "  - Years of real-world tuning"
  puts "  - Predictable performance characteristics"
  puts ""
  puts "MMTk GC:"
  puts "  - Experimental implementation"
  puts "  - Performance currently slower than default"
  puts "  - Subject to frequent changes"
  puts "  - Not recommended for production use"
  puts ""
  puts "Use MMTk for:"
  puts "  - Research and development"
  puts "  - Algorithm experimentation"
  puts "  - Future-proofing architecture"
  puts ""
  puts "Avoid MMTk for:"
  puts "  - Production applications"
  puts "  - Performance-critical systems"
  puts "  - Stable deployment environments"
end

Reference

Core Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| GC.config | hash = nil | Hash | Gets or sets GC configuration. Without an argument, returns the current config as a Hash with Symbol keys; with a Hash, updates the given keys and returns the full config |
| GC.config[:implementation] | none | String | Name of the currently loaded GC implementation. Read-only key, present regardless of modular GC support |
| GC.start | full_mark: true, immediate_mark: true, immediate_sweep: true | nil | Starts garbage collection; the keyword arguments control the collection type and whether marking and sweeping run immediately |

Configuration Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| :implementation | String | "default" | Read-only identifier of the current GC implementation |
| :rgengc_allow_full_mark | Boolean | true | Controls whether the GC can run full marking cycles. When false, only minor marking occurs unless a major GC is requested explicitly |

Environment Variables

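| Variable | Values | Default | Description |
|---|---|---|---|
| RUBY_GC_LIBRARY | "default", "mmtk" | unset (built-in GC) | GC library to load at startup. The library must exist in the configured modular GC directory |
| MMTK_HEAP_MODE | "Dynamic", "Fixed" | "Dynamic" | MMTk heap strategy. Dynamic grows between MMTK_HEAP_MIN and MMTK_HEAP_MAX; Fixed keeps a single size |
| MMTK_HEAP_MIN | Integer (bytes) | 1048576 (1 MiB) | Minimum heap size |
| MMTK_HEAP_MAX | Integer (bytes) | Depends on system memory (30923764531 in the example above) | Maximum heap size |
| MMTK_PLAN | "MarkSweep", "NoGC" | "MarkSweep" | GC algorithm (plan) used by MMTk |
| MMTK_THREADS | Integer | Platform default | Number of parallel GC worker threads |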

Build Configuration

| Configure Flag | Required Value | Description |
|---|---|---|
| --with-modular-gc | Directory path | Enables the Modular GC feature and sets the only directory from which GC libraries are loaded |

Make Targets

| Target | Parameters | Description |
|---|---|---|
| modular-gc | MODULAR_GC=default | Builds the default GC as a modular library. Requires Ruby configured with --with-modular-gc |
| modular-gc | MODULAR_GC=mmtk | Builds the MMTk-based GC library. Requires the Rust toolchain |

Library Naming Convention

GC libraries follow strict naming patterns for runtime loading:

| Pattern | Example | Platform |
|---|---|---|
| librubygc.{name}.so | librubygc.default.so | Linux (ELF) |
| librubygc.{name}.bundle | librubygc.mmtk.bundle | macOS (Darwin) |

Runtime Verification

# Check modular GC support
modular_gc_enabled = RUBY_DESCRIPTION.include?("+GC")
puts "Modular GC: #{modular_gc_enabled ? 'Enabled' : 'Disabled'}"

# Identify loaded GC library
if RUBY_DESCRIPTION.include?("+GC[")
  gc_name = RUBY_DESCRIPTION[/\+GC\[([^\]]+)\]/, 1]
  puts "Loaded GC library: #{gc_name}"
else
  puts "No GC library loaded; using the built-in default GC"
end

# Verify configuration access (GC.config was added in Ruby 3.4)
begin
  config = GC.config
  puts "Configuration available: #{config.keys.inspect}"
rescue NoMethodError
  puts "GC.config not available - Ruby older than 3.4"
end