
CrackedRuby

Memory Management

Ruby memory management through garbage collection, object lifecycle, and memory optimization techniques.


Overview

Ruby manages memory automatically through a mark-and-sweep garbage collector that reclaims unused objects. The garbage collector runs in generational phases, tracking object references and freeing memory when objects become unreachable. Ruby's memory management operates transparently but provides interfaces for monitoring and tuning performance-critical applications.

The garbage collector divides objects into generations based on survival patterns. Since Ruby 2.1 (RGenGC), new objects start in the young generation; objects that survive a few minor collections are promoted to the old generation and rescanned only during major collections. This generational approach optimizes collection frequency since most objects die young. CRuby does not copy objects between spaces: promotion is a matter of bookkeeping, and objects keep their addresses unless compaction (GC.compact, Ruby 2.7+) explicitly moves them.
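Minor and major collections are counted separately in GC.stat on CRuby 2.1+, which makes the generational behavior observable directly; exact counts vary by version and workload, so treat this as a sketch:

```ruby
before_minor = GC.stat(:minor_gc_count)
before_major = GC.stat(:major_gc_count)

# Churn through short-lived objects; most die in minor collections
100_000.times { |i| "temp_#{i}" }

GC.start(full_mark: false)  # request a minor (young-generation) pass
GC.start(full_mark: true)   # request a major (full) pass

puts "minor GCs: #{GC.stat(:minor_gc_count) - before_minor}"
puts "major GCs: #{GC.stat(:major_gc_count) - before_major}"
```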

# Object creation triggers memory allocation
user = User.new(name: "Alice")
data = Array.new(1000) { |i| "item_#{i}" }

# Objects become eligible for collection when unreferenced
user = nil
data = nil

ObjectSpace provides the primary interface for memory introspection. The module exposes object counting, memory statistics, and garbage collection controls. These tools help diagnose memory usage patterns and optimize allocation-intensive code.

# Count objects by class
ObjectSpace.count_objects
# => {:TOTAL=>47891, :FREE=>3421, :T_OBJECT=>1205, ...}

# Force garbage collection
GC.start

# Examine memory statistics  
GC.stat
# => {:count=>23, :heap_allocated_pages=>117, :heap_sorted_length=>117, ...}

Ruby's memory model separates references from objects. Local variables and method parameters are references held in VM stack frames, while the objects they point to live on the garbage-collected heap. Immediate values (small Integers, Symbols, nil, true, false, and many Floats on 64-bit builds) are encoded directly in the reference and never touch the heap, which makes them effectively free to allocate.
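The immediate-versus-heap distinction can be checked with ObjectSpace.memsize_of from the objspace standard library, which reports 0 for values encoded in the reference itself:

```ruby
require 'objspace'

# Immediates are encoded in the reference word; no heap slot exists
puts ObjectSpace.memsize_of(42)       # small Integer: immediate
puts ObjectSpace.memsize_of(nil)      # immediate
puts ObjectSpace.memsize_of("heap")   # a real heap-allocated object
```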

Basic Usage

Garbage collection runs automatically but applications can trigger collection manually using GC.start. Manual collection helps control timing in performance-sensitive code sections. The collector provides statistics through GC.stat and object counting through ObjectSpace.count_objects.

# Manual garbage collection
GC.start

# Disable automatic collection temporarily
GC.disable
# ... performance-critical code
GC.enable

# There is no GC.enabled? predicate; GC.disable returns the previous
# disabled state, so it doubles as a query
was_disabled = GC.disable
GC.enable

ObjectSpace.each_object iterates through all live objects, optionally restricted to instances of a given class or module. This method helps identify memory leaks and analyze object distribution. The result is a best-effort snapshot: allocation and collection continue around the call, so repeated counts can differ slightly.

# Count string objects
string_count = 0
ObjectSpace.each_object(String) { |str| string_count += 1 }
puts "Live strings: #{string_count}"

# Find large arrays
large_arrays = []
ObjectSpace.each_object(Array) do |arr|
  large_arrays << arr if arr.size > 1000
end

Finalizers registered through ObjectSpace.define_finalizer attach cleanup code to objects. Finalizers execute after the object is collected and therefore cannot access the original object; a finalizer that captures the object keeps it alive forever. This mechanism can release external resources like file handles or network connections, although explicit close methods are more predictable. (Finalizers are not weak references; for those, use the weakref library or ObjectSpace::WeakMap.)

class FileManager
  def initialize(filename)
    @file = File.open(filename)
    
    # Register finalizer for cleanup
    ObjectSpace.define_finalizer(self, self.class.finalizer(@file))
  end
  
  def self.finalizer(file)
    proc { file.close unless file.closed? }
  end
end
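For a reference that does not keep its target alive, the weakref standard library provides WeakRef; a minimal sketch:

```ruby
require 'weakref'

target = Object.new
ref = WeakRef.new(target)

# True while the target is still strongly referenced elsewhere
puts ref.weakref_alive?

# Delegates to the target; raises WeakRef::RefError once it is collected
puts ref.__getobj__.inspect
```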

Memory profiling uses allocation tracing from the objspace standard library to track allocation patterns. ObjectSpace.trace_object_allocations_start records the source file and line of every subsequent allocation for debugging memory growth. Tracing adds overhead but provides detailed allocation information.

require 'objspace'

# Enable allocation tracing
ObjectSpace.trace_object_allocations_start

# Run code under analysis
items = Array.new(3) { "string_#{rand(100)}" }

# Examine where a traced object was allocated
ObjectSpace.allocation_sourcefile(items)  # => this file's path
ObjectSpace.allocation_sourceline(items)  # => the line of the Array.new call

ObjectSpace.trace_object_allocations_stop

Performance & Memory

Memory allocation performance varies significantly between object types and sizes. String and array allocation dominates most application profiles. Small objects allocate faster than large objects due to memory pool management. Understanding allocation patterns helps optimize critical code paths.

Ruby allocates objects in fixed-size heap slots (historically 40 bytes each, with multiple slot-size pools since Ruby 3.2) and uses malloc for payloads that do not fit in a slot, such as long strings and large arrays. Arrays and hashes grow their capacity geometrically to amortize reallocation, and strings can share buffers copy-on-write for some substring and duplication operations.

# Benchmark different allocation patterns
require 'benchmark'

Benchmark.bm(20) do |x|
  x.report("small objects:") do
    100_000.times { Object.new }
  end
  
  x.report("large arrays:") do
    1000.times { Array.new(10_000, 0) }
  end
  
  x.report("string allocation:") do
    100_000.times { |i| "string_#{i}" }
  end
end
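The slot-versus-malloc split described above can be observed with ObjectSpace.memsize_of from the objspace library; exact byte counts vary by Ruby version and slot-size pool, so treat the numbers as illustrative:

```ruby
require 'objspace'

short = "abc"          # payload fits inside the heap slot (embedded)
long  = "x" * 10_000   # payload spills into a separate malloc'd buffer

puts ObjectSpace.memsize_of(short)
puts ObjectSpace.memsize_of(long)   # slot size plus the external buffer
```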

Garbage collection frequency depends on heap growth and allocation rate. The collector triggers when heap occupancy reaches threshold percentages. Tuning RUBY_GC_HEAP_GROWTH_FACTOR and RUBY_GC_HEAP_GROWTH_MAX_SLOTS controls collection timing. Lower factors reduce memory usage but increase collection frequency.
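The live values of the limits these variables control can be read back from GC.stat; the key names below are CRuby-specific and can differ between versions:

```ruby
# Current malloc threshold before a minor GC triggers
p GC.stat(:malloc_increase_bytes_limit)

# Current malloc threshold before a major GC triggers
p GC.stat(:oldmalloc_increase_bytes_limit)

# Free slots currently maintained on the heap
p GC.stat(:heap_free_slots)
```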

Memory-intensive applications benefit from profiling collection behavior with GC::Profiler. The profiler tracks collection timing and frequency across program execution. Applications can use this data to optimize allocation patterns and reduce collection overhead.

# Profile garbage collection
GC::Profiler.enable
GC::Profiler.clear

# Run memory-intensive code
data = Array.new(100_000) { |i| { id: i, value: "item_#{i}" } }
processed = data.map { |item| item[:value].upcase }

# Analyze collection profile
report = GC::Profiler.result
puts report

# Examine specific metrics
GC::Profiler.total_time  # => 0.0123 seconds

Object pooling reduces allocation overhead for frequently created objects. Pools maintain pre-allocated objects for reuse instead of creating new instances. This technique works best for objects with expensive initialization or high allocation rates.

require 'socket'

class ConnectionPool
  def initialize(size = 10, max_idle: size * 2)
    @pool = []
    @mutex = Mutex.new
    @max_idle = max_idle

    size.times { @pool << create_connection }
  end

  def checkout
    @mutex.synchronize do
      @pool.pop || create_connection
    end
  end

  def checkin(connection)
    @mutex.synchronize do
      # Cap idle connections so overflow objects remain collectible
      @pool.push(connection) if @pool.size < @max_idle
    end
  end

  private

  def create_connection
    # Expensive connection creation
    { socket: TCPSocket.new("localhost", 8080), created_at: Time.now }
  end
end

Thread Safety & Concurrency

Ruby's Global VM Lock (GVL, often called the GIL) allows only one thread to execute Ruby code at a time. Garbage collection in CRuby is stop-the-world: when a collection runs, the collector must bring every thread to a safe point before proceeding. This coordination creates brief synchronization pauses across the entire application.

Since Ruby 2.2 the collector uses incremental tri-color marking. Objects are white (unmarked), gray (marked but not yet scanned), or black (marked and scanned), and marking work is interleaved with normal program execution in small steps. This reduces individual stop-the-world pause times but requires write barriers to keep the marking invariant intact.
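GC.latest_gc_info reports what kind of pass the most recent collection was and what triggered it, which helps when reasoning about pause behavior; key availability varies slightly across versions:

```ruby
GC.start(full_mark: true, immediate_sweep: true)

info = GC.latest_gc_info
p info[:gc_by]     # what triggered the last pass, e.g. :method for GC.start
p info[:major_by]  # why it was promoted to a major pass (nil for a minor one)
p info[:state]     # current phase; :none between collections
```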

# Demonstrate GC coordination across threads
threads = []
mutex = Mutex.new
gc_count = 0

5.times do |i|
  threads << Thread.new do
    1000.times do
      # Allocate objects in each thread
      Array.new(100) { "thread_#{i}_data" }
      
      # Track GC occurrences
      current_count = GC.count
      mutex.synchronize do
        if current_count > gc_count
          gc_count = current_count
          puts "GC ##{gc_count} occurred during thread #{i}"
        end
      end
    end
  end
end

threads.each(&:join)

Because the GVL allows only one thread to execute Ruby code at a time, object allocation in CRuby does not contend across threads. Ractors (Ruby 3.0+) do run in parallel and use per-Ractor object caches to limit synchronization against the shared heap, which the collector must still coordinate during collection.

Finalizers run on whichever thread happens to trigger a collection, or at interpreter shutdown, so they must handle concurrent access correctly; CRuby has no dedicated garbage collector thread. Finalizers should avoid shared mutable state or guard it with appropriate synchronization primitives.

require 'securerandom'

class ThreadSafeResource
  @@cleanup_mutex = Mutex.new
  @@resources = []
  
  def initialize
    @resource_id = SecureRandom.uuid
    
    @@cleanup_mutex.synchronize do
      @@resources << @resource_id
    end
    
    ObjectSpace.define_finalizer(self, self.class.finalizer(@resource_id))
  end
  
  def self.finalizer(resource_id)
    proc do
      @@cleanup_mutex.synchronize do
        @@resources.delete(resource_id)
        # Cleanup external resources safely
      end
    end
  end
  
  def self.active_resources
    @@cleanup_mutex.synchronize { @@resources.dup }
  end
end

Common Pitfalls

Memory leaks occur when objects remain reachable through unintended references. Global variables, class variables, constants, and long-lived caches prevent garbage collection of everything they reference. Circular references alone do not leak, since the mark-and-sweep collector handles cycles, but a cycle anchored by even one global reference stays alive in its entirety.

# Problematic global reference
$cached_data = []

def process_request(data)
  # Accidentally accumulates data forever  
  $cached_data << data
  # ... process data
end

# Better approach with size limits
class RequestCache
  def initialize(max_size = 1000)
    @cache = []
    @max_size = max_size
  end
  
  def add(data)
    @cache << data
    @cache.shift if @cache.size > @max_size
  end
end

String interpolation creates a new string object on each evaluation, and repeated += in loops allocates a progressively larger copy on every iteration. Appending in place with << or building an array and joining it reduces allocations.

# Inefficient string building
result = ""
1000.times { |i| result += "item_#{i}," }

# More efficient approaches
result = []
1000.times { |i| result << "item_#{i}" }
final_string = result.join(",")

# Or using string buffer
result = String.new
1000.times { |i| result << "item_#{i}," }

Hash and array growth triggers internal reallocation and copying. Pre-sizing with Array.new(size) or String.new(capacity: n) (Ruby 2.4+) reduces reallocations. Ruby exposes no public capacity reader for Array or Hash, but ObjectSpace.memsize_of from the objspace standard library reveals internal buffer sizes for debugging growth patterns.
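Pre-sizing can be sketched with String.new's capacity: keyword (Ruby 2.4+) and Array.new(size); the 64_000 capacity below is an arbitrary illustrative value:

```ruby
# capacity: sizes the internal buffer up front, avoiding repeated
# reallocation while the string grows
buf = String.new(capacity: 64_000)
1_000.times { |i| buf << "item_#{i}," }

# Array.new(size) allocates all slots at once instead of growing
slots = Array.new(1_000, 0)

puts buf.bytesize
puts slots.size
```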

Finalizers create subtle ordering dependencies and resource management issues. A finalizer must not capture the object it is attached to (doing so keeps the object alive forever), and it runs on whichever thread triggers collection. External resource cleanup should use explicit close methods rather than relying on finalization.

# Problematic finalizer usage
class BadFileHandler
  def initialize(filename)
    @filename = filename
    @file = File.open(filename)
    
    # The block captures self through its binding, so this object can
    # never become garbage and the finalizer will never run
    ObjectSpace.define_finalizer(self) do
      @file.close  # never executes
    end
  end
end

# Better explicit resource management
class GoodFileHandler
  def initialize(filename)
    @file = File.open(filename)
  end
  
  def close
    @file.close unless @file.closed?
    @file = nil
  end
  
  def self.open(filename)
    handler = new(filename)
    if block_given?
      begin
        yield handler
      ensure
        handler.close
      end
    else
      handler
    end
  end
end

ObjectSpace iteration holds references to enumerated objects, preventing collection during iteration. Long-running iterations can cause memory buildup. Breaking large iterations into smaller chunks allows intermediate collection.
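One way to keep a full ObjectSpace scan from extending object lifetimes is to record only derived data (sizes, counts) rather than the objects themselves; a minimal sketch:

```ruby
# Record only bytesizes, never the strings, so the scan does not
# retain references to any scanned object
buckets = Hash.new(0)
scanned = 0

ObjectSpace.each_object(String) do |s|
  buckets[s.bytesize / 64] += 1   # bucket by size, drop the reference
  scanned += 1
end

puts "scanned #{scanned} strings across #{buckets.size} size buckets"
```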

Closure captures retain references to enclosing scopes, potentially preventing garbage collection of large objects. Proc and lambda objects capture the entire binding, not just referenced variables. Explicit niling of unused variables helps release references.

# The proc's binding includes large_dataset even though only
# small_value is used, keeping a million floats alive
large_dataset = Array.new(1_000_000) { rand }
small_value = large_dataset.first

processor = proc { puts "Processing: #{small_value}" }

# The binding holds the variable slot, not a snapshot of its value,
# so clearing the variable releases the array
large_dataset = nil

Reference

Garbage Collection Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| GC.start | full_mark: true, immediate_sweep: true | nil | Initiates a garbage collection cycle |
| GC.enable | None | true/false | Enables automatic GC; returns the previous disabled state |
| GC.disable | None | true/false | Disables automatic GC; returns the previous disabled state |
| GC.count | None | Integer | Returns total number of GC runs |
| GC.stat | key = nil | Hash/Integer | Returns GC statistics hash or a specific value |

ObjectSpace Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| ObjectSpace.count_objects | result_hash = nil | Hash | Counts objects by internal type |
| ObjectSpace.each_object | mod = nil, &block | Integer | Iterates through live objects |
| ObjectSpace.define_finalizer | obj, callable | Array | Registers an object finalizer |
| ObjectSpace.undefine_finalizer | obj | obj | Removes an object's finalizers |
| ObjectSpace.garbage_collect | full_mark: true, immediate_sweep: true | nil | Triggers garbage collection |

Allocation Tracing Methods

These methods require the objspace standard library (require 'objspace').

| Method | Parameters | Returns | Description |
|---|---|---|---|
| ObjectSpace.trace_object_allocations_start | None | nil | Enables allocation tracing |
| ObjectSpace.trace_object_allocations_stop | None | nil | Disables allocation tracing |
| ObjectSpace.allocation_sourcefile | object | String/nil | Returns allocation source file |
| ObjectSpace.allocation_sourceline | object | Integer/nil | Returns allocation source line |
| ObjectSpace.allocation_class_path | object | String/nil | Returns allocation class path |

GC Statistics Keys

| Key | Type | Description |
|---|---|---|
| :count | Integer | Total garbage collections performed |
| :heap_allocated_pages | Integer | Number of heap pages allocated |
| :heap_sorted_length | Integer | Length of the sorted heap page list |
| :heap_allocatable_pages | Integer | Number of heap pages available for allocation |
| :heap_available_slots | Integer | Number of heap slots available for objects |
| :heap_live_slots | Integer | Number of heap slots containing live objects |
| :heap_free_slots | Integer | Number of heap slots available for allocation |
| :heap_final_slots | Integer | Number of heap slots pending finalization |
| :old_objects | Integer | Number of old-generation objects |
| :old_objects_limit | Integer | Old-object count that triggers a major GC |
| :oldmalloc_increase_bytes | Integer | Bytes malloc'd since the last major GC |
| :oldmalloc_increase_bytes_limit | Integer | Malloc'd byte count that triggers a major GC |

Object Count Types

| Type | Description |
|---|---|
| :TOTAL | Total number of allocated object slots |
| :FREE | Number of free object slots |
| :T_OBJECT | Basic Ruby objects |
| :T_CLASS | Class objects |
| :T_MODULE | Module objects |
| :T_FLOAT | Float objects |
| :T_STRING | String objects |
| :T_REGEXP | Regular expression objects |
| :T_ARRAY | Array objects |
| :T_HASH | Hash objects |
| :T_FILE | File objects |
| :T_DATA | Extension data objects |
| :T_MATCH | MatchData objects |
| :T_COMPLEX | Complex number objects |
| :T_RATIONAL | Rational number objects |

Environment Variables

| Variable | Default | Description |
|---|---|---|
| RUBY_GC_HEAP_INIT_SLOTS | 10000 | Initial heap slot count |
| RUBY_GC_HEAP_FREE_SLOTS | 4096 | Minimum free slots maintained after GC |
| RUBY_GC_HEAP_GROWTH_FACTOR | 1.8 | Heap growth multiplier |
| RUBY_GC_HEAP_GROWTH_MAX_SLOTS | 0 | Maximum slots added per growth (0 = unlimited) |
| RUBY_GC_MALLOC_LIMIT | 16MB | Malloc'd bytes before a minor GC triggers |
| RUBY_GC_MALLOC_LIMIT_MAX | 32MB | Upper bound on the malloc limit |
| RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR | 1.4 | Malloc limit growth factor |
| RUBY_GC_OLDMALLOC_LIMIT | 16MB | Malloc'd bytes before a major GC triggers |
| RUBY_GC_OLDMALLOC_LIMIT_MAX | 128MB | Upper bound on the oldmalloc limit |