Overview
Memory management controls how programs allocate, use, and release memory during execution. Every running program requires memory to store variables, data structures, execution context, and machine instructions. The method used to manage this memory directly impacts application performance, reliability, and resource consumption.
Memory management operates at multiple levels. At the hardware level, physical RAM stores program data. The operating system manages virtual memory, mapping physical addresses to virtual address spaces for each process. Programming languages implement memory management through various strategies ranging from manual allocation to automatic garbage collection. Each approach presents different trade-offs between performance, developer complexity, and safety.
The memory lifecycle follows a consistent pattern: allocation, usage, and deallocation. During allocation, the system reserves memory for program data. The program then reads and writes to this memory during execution. Finally, deallocation releases the memory back to the system for reuse. Failures in this cycle lead to memory leaks, dangling pointers, and crashes.
Two primary memory regions exist in most programs: the stack and the heap. The stack stores local variables, function parameters, and return addresses in a last-in-first-out structure. Stack allocation and deallocation occur automatically when functions are called and return. The heap provides dynamic memory for data structures with unpredictable lifetimes. Programs explicitly request heap memory during execution, and different languages handle heap deallocation through manual or automatic mechanisms.
# Stack allocation - automatic and fast
def calculate_sum(a, b)
  result = a + b # 'result' allocated on stack
  result         # memory freed when function returns
end

# Heap allocation - dynamic and persistent
class DataProcessor
  def initialize
    @large_dataset = Array.new(1_000_000) # Allocated on heap
    @cache = {}                           # Hash also on heap
  end
end
Memory management complexity increases with program scale. Small programs may function adequately with inefficient memory handling, but production systems processing millions of requests require careful memory optimization. Memory issues often manifest as gradual performance degradation, making them difficult to diagnose without proper monitoring.
Key Principles
Memory management builds on fundamental principles that apply across programming languages and platforms. Understanding these principles provides the foundation for writing efficient, reliable code regardless of the specific memory management approach used.
Memory Allocation Strategies
Programs allocate memory through two primary mechanisms: static allocation and dynamic allocation. Static allocation occurs at compile time for variables with known size and lifetime. The compiler determines exact memory requirements and reserves space in the program binary. Dynamic allocation happens at runtime when memory requirements are not known in advance. The program requests memory from the system during execution, and the allocation size can vary based on runtime conditions.
Stack allocation provides the fastest memory access pattern. The stack pointer moves up and down as functions call and return, making allocation and deallocation single-instruction operations. Stack memory is inherently limited, typically measured in megabytes, making it suitable only for small, short-lived data. Stack overflow occurs when nested function calls or large local variables exceed the available stack space.
Heap allocation offers flexibility at the cost of performance. Programs request variable amounts of heap memory during execution, and the allocator finds suitable memory blocks from the available heap space. Heap memory persists until explicitly freed or garbage collected, making it suitable for long-lived data structures. The heap can grow to fill available system memory, but fragmentation and allocation overhead impact performance.
Memory Layout and Addressing
Programs organize memory into distinct segments, each serving specific purposes. The text segment contains executable machine code and remains read-only during execution. The data segment stores global and static variables initialized at program start. The BSS segment holds uninitialized global variables, allocated but not stored in the executable file. The heap grows upward in memory as allocations occur, while the stack grows downward from high addresses.
Virtual memory abstracts physical memory addresses, providing each process with an isolated address space. The operating system maps virtual addresses to physical RAM locations, enabling memory protection and overcommitment. Page tables store these mappings, and the memory management unit (MMU) performs address translation during memory access. Virtual memory allows processes to use more memory than physically available through swapping to disk.
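The split between page number and offset is simple arithmetic; a toy sketch assuming 4 KiB pages, where the `translate` method and `page_table` hash are illustrative stand-ins for what the MMU and page tables do in hardware:

```ruby
PAGE_SIZE = 4096 # illustrative 4 KiB pages

# Map a virtual address to a physical address via a toy page table
def translate(virtual_address, page_table)
  page_number = virtual_address / PAGE_SIZE
  offset      = virtual_address % PAGE_SIZE
  frame = page_table.fetch(page_number) { raise "page fault: page #{page_number}" }
  frame * PAGE_SIZE + offset
end

# Hypothetical mapping: virtual page 0 -> frame 7, page 1 -> frame 3
page_table = { 0 => 7, 1 => 3 }
physical = translate(4100, page_table) # page 1, offset 4 -> 3 * 4096 + 4
```

A real MMU performs this lookup in hardware on every access, with translation caches (TLBs) avoiding the page-table walk in the common case.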
Ownership and Lifetime Management
Memory ownership determines which code component is responsible for deallocating memory. In manual memory management, the allocating code typically owns and must free the memory. Ownership transfer occurs when functions return allocated memory to callers. Shared ownership arises when multiple components reference the same memory, requiring coordination for safe deallocation.
Object lifetime spans from allocation to deallocation. Local variables have automatic lifetime tied to function scope. Dynamic allocations persist until explicitly freed or garbage collected. Static variables live for the entire program execution. Mismatched lifetimes cause memory leaks when allocations outlive their usefulness, and use-after-free bugs when code accesses deallocated memory.
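These lifetimes are visible directly in Ruby; a small sketch where the `CONFIG` constant and `make_buffer` method are illustrative names:

```ruby
# Static lifetime: retained through a constant for the whole program
CONFIG = { retries: 3 }

def make_buffer
  temp = Array.new(3, 0) # automatic lifetime, scoped to this call
  temp.size
  # After returning, nothing references the array: it is collectible
end

size = make_buffer
```

The array allocated inside `make_buffer` becomes unreachable the moment the method returns, while the hash held by `CONFIG` stays reachable until the process exits.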
Garbage Collection Fundamentals
Garbage collection automates memory deallocation by identifying unreachable objects. The garbage collector traces references from root objects (globals, stack variables) to find all reachable memory. Objects not reached during tracing are considered garbage and can be deallocated. This automatic approach eliminates manual deallocation bugs but introduces runtime overhead.
Reference counting tracks how many references point to each object. When the count reaches zero, the object becomes garbage. Reference counting provides deterministic deallocation timing but cannot handle reference cycles where objects reference each other circularly. Tracing collectors handle cycles but run periodically, causing unpredictable pauses.
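A toy counter makes the cycle problem concrete; this `RefCounted` class is purely illustrative (CRuby itself does not use reference counting):

```ruby
# Toy reference-count tracker, not a real memory manager
class RefCounted
  attr_reader :count

  def initialize
    @count = 0
    @freed = false
  end

  def retain
    @count += 1
  end

  def release
    @count -= 1
    @freed = true if @count.zero?
  end

  def freed?
    @freed
  end
end

a = RefCounted.new
b = RefCounted.new
a.retain; b.retain # one external reference to each
a.retain           # cycle edge: b -> a
b.retain           # cycle edge: a -> b
a.release          # external reference to a dropped
b.release          # external reference to b dropped
# Both counts are stuck at 1: the cycle keeps each object "alive" forever
```

This is exactly the leak a tracing collector avoids: tracing from roots would find neither object reachable and reclaim both.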
Memory Safety and Correctness
Memory safety ensures programs cannot access invalid memory addresses or interpret memory as the wrong type. Buffer overflows occur when code writes beyond allocated memory boundaries, corrupting adjacent data. Dangling pointers reference deallocated memory, leading to crashes or security vulnerabilities. Double-free errors attempt to deallocate the same memory twice, corrupting allocator metadata.
Type safety prevents interpreting memory as incorrect types. Type systems enforce that operations match data types, preventing undefined behavior. Memory-safe languages eliminate entire classes of vulnerabilities by preventing unsafe memory access at compile time or runtime.
# Ruby prevents manual memory access
array = [1, 2, 3]
# No pointer arithmetic or manual addressing
# Bounds checking prevents overflows
value = array[10] # Returns nil, doesn't crash

# References are managed automatically
obj1 = Object.new
obj2 = obj1 # Both variables reference the same object
obj1 = nil  # One reference dropped
# obj2 still valid - the object survives until Ruby's tracing
# garbage collector finds no remaining references to it
Memory Pooling and Allocation Patterns
Memory pools pre-allocate blocks of memory for efficient repeated allocations of same-sized objects. Instead of requesting memory from the system for each allocation, programs draw from the pool. This reduces allocation overhead and fragmentation for frequently created and destroyed objects. Object pools reuse allocated objects rather than repeatedly allocating and freeing memory.
Allocation patterns significantly impact performance. Many small allocations cause overhead from repeated system calls. Allocating large contiguous blocks and subdividing them improves efficiency. Arena allocators batch allocate memory that gets freed together, simplifying memory management for request-scoped data. Stack-like allocation patterns allocate and free in reverse order, enabling efficient bump-pointer allocation.
Implementation Approaches
Memory management strategies span a spectrum from fully manual to fully automatic, each presenting distinct characteristics for reliability, performance, and developer experience.
Manual Memory Management
Manual memory management places allocation and deallocation responsibility entirely on the programmer. Languages like C and C++ provide explicit functions for memory operations. Programs call malloc or new to allocate memory, receiving a pointer to the allocated space. When finished, programs must call free or delete to return memory to the system.
This approach offers maximum control over memory usage and timing. Programmers decide exactly when to allocate and free memory, enabling fine-tuned optimization. Memory footprint remains predictable since allocations are explicit. No runtime overhead from garbage collection exists, making manual management suitable for real-time systems requiring deterministic performance.
Manual management demands careful attention to ownership semantics. Each allocation must have a corresponding deallocation, and programs must track which code owns each memory region. Documentation and conventions establish ownership rules, but the compiler cannot enforce them. Complex data structures with shared references require sophisticated coordination to avoid double-free errors or leaks.
# Ruby simulates manual patterns through explicit cleanup
class ResourceManager
  def initialize
    @resources = []
  end

  def allocate_resource
    resource = ExpensiveResource.new
    @resources << resource
    resource
  end

  def free_resource(resource)
    @resources.delete(resource)
    resource.cleanup # Explicit cleanup method
  end

  def free_all
    @resources.each(&:cleanup)
    @resources.clear
  end
end
Automatic Reference Counting
Automatic reference counting (ARC) tracks how many references point to each object. The system maintains a reference count metadata field for each allocation. When code creates a reference, the count increments. When a reference goes out of scope or gets reassigned, the count decrements. When the count reaches zero, the object is immediately deallocated.
Reference counting provides deterministic deallocation timing. Objects are freed as soon as the last reference disappears, making resource release predictable. This characteristic benefits objects managing external resources like file handles or network connections. Destructors run immediately when references drop to zero, ensuring timely resource cleanup.
Reference counting introduces overhead for every reference operation. Each assignment requires incrementing one count and decrementing another. Atomic operations are necessary in multithreaded programs to prevent race conditions in count updates. Cyclic references create memory leaks when objects reference each other, preventing counts from ever reaching zero.
# CRuby does not use reference counting; its tracing collector
# reclaims cycles that pure reference counting would leak
class Node
  attr_accessor :next, :data

  def initialize(data)
    @data = data
    @next = nil
  end
end

# Circular reference - both nodes reference each other
node1 = Node.new("A")
node2 = Node.new("B")
node1.next = node2
node2.next = node1
# Ruby's GC handles this cycle through tracing
Tracing Garbage Collection
Tracing garbage collectors periodically scan memory to identify reachable objects. Starting from root references (stack variables, globals, registers), the collector traces through all reachable objects, marking them as live. After tracing completes, any unmarked objects are garbage and can be reclaimed.
Mark-and-sweep collection traverses the object graph in two phases. The mark phase traces references and sets mark bits on reachable objects. The sweep phase scans all allocated memory, freeing unmarked objects. This approach handles reference cycles correctly but requires stopping program execution during collection.
Generational collection optimizes tracing based on the observation that most objects die young. The heap is divided into generations, with newly allocated objects in the young generation. Minor collections frequently scan only the young generation, quickly reclaiming short-lived objects. Objects surviving multiple collections promote to older generations scanned less frequently. This reduces collection overhead since most allocations are reclaimed in fast minor collections.
Copying collection moves live objects to compact them and eliminate fragmentation. The heap is divided into two semi-spaces. Allocation occurs in the from-space until full. Collection copies live objects to the to-space, leaving garbage behind. The roles of spaces swap, and allocation continues. Copying automatically compacts memory but requires twice the heap space.
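The mark and sweep phases can be sketched over an explicit object graph; this `ToyHeap` illustrates the algorithm only and bears no resemblance to how a real collector manages raw memory:

```ruby
# Toy mark-and-sweep over a graph of object ids
class ToyHeap
  def initialize
    @objects = {} # id => array of referenced ids
    @next_id = 0
  end

  def alloc(refs = [])
    id = (@next_id += 1)
    @objects[id] = refs
    id
  end

  def collect(roots)
    # Mark phase: trace everything reachable from the roots
    marked = {}
    stack = roots.dup
    until stack.empty?
      id = stack.pop
      next if marked[id]
      marked[id] = true
      stack.concat(@objects.fetch(id, []))
    end
    # Sweep phase: discard every unmarked object
    @objects.select! { |id, _| marked[id] }
    @objects.size
  end
end

heap = ToyHeap.new
a = heap.alloc
b = heap.alloc([a]) # b references a
c = heap.alloc      # unreachable: garbage
live = heap.collect([b])
```

Because reachability is computed from roots rather than per-object counts, a cycle among unreachable objects is swept like any other garbage.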
Hybrid Approaches
Modern systems often combine multiple strategies to balance trade-offs. Languages may use reference counting for immediate cleanup supplemented by cycle-detecting tracing collection. This provides predictable destructor timing while handling cyclic structures.
Region-based memory management groups allocations into regions freed together. Programs allocate memory in the current region, then free the entire region at once. This provides efficient bulk deallocation for request-scoped or phase-based allocation patterns. Arena allocators implement this pattern for temporary data structures.
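A region can be sketched in a few lines of Ruby; this `Arena` class is illustrative only — real arena allocators hand out raw memory from a contiguous block, whereas here resetting the region merely drops the references so the pooled objects become collectible:

```ruby
# Toy region: allocations are grouped and released together
class Arena
  def initialize
    @allocations = []
  end

  def alloc(obj)
    @allocations << obj
    obj
  end

  def size
    @allocations.size
  end

  def reset
    # Dropping the only references releases the whole region at once
    @allocations.clear
  end
end

arena = Arena.new
3.times { |i| arena.alloc("request-scoped-#{i}") }
count = arena.size
arena.reset # entire region released in one step
```

The appeal of the pattern is that per-object deallocation bookkeeping disappears: everything allocated during a request or phase is freed by a single `reset`.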
Escape analysis determines whether objects remain local to a function or escape to the heap. When the compiler proves an object does not escape, it can allocate it on the stack instead of the heap. This eliminates garbage collection overhead for objects with local lifetime, combining manual allocation efficiency with automatic safety.
Ruby Implementation
Ruby implements automatic memory management through a sophisticated garbage collector that has evolved significantly over time. Understanding Ruby's memory model enables developers to write efficient code and diagnose performance issues.
Ruby Memory Model
Ruby allocates most objects on the heap and manages them through the garbage collector. Each object requires memory for its data and metadata. The Ruby VM stores objects in allocated slots. Immediate values such as small integers, symbols, true, false, and nil are encoded directly in the reference and never occupy heap slots, while larger objects like arrays and hashes allocate additional memory beyond their slot.
Ruby uses object references throughout. Variables store references to objects rather than the objects themselves. When assigning one variable to another, Ruby copies the reference, not the object data. This reference model enables efficient passing of large objects without copying but requires understanding reference semantics.
# Variables hold references, not values
array1 = [1, 2, 3]
array2 = array1 # Copy reference, not data
array2 << 4
puts array1.inspect # => [1, 2, 3, 4] - same object
# Primitive values are immediate, not references
x = 5
y = x
y += 1
puts x # => 5 - integers are immutable values
Ruby symbols are interned strings stored in a global symbol table. The first time Ruby encounters a symbol, it allocates memory and adds it to the table. Subsequent uses of that symbol reference the existing allocation. Symbols persist for the program lifetime and are never garbage collected in older Ruby versions, though modern Ruby can collect unreferenced symbols.
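Interning is easy to observe with object identity: every occurrence of a symbol is the same object, while equal string contents normally are not.

```ruby
# The same symbol always refers to one interned entry
a = :status
b = "status".to_sym
same_symbol = a.equal?(b) # identical object, not just equal value

# Equal string contents, by contrast, are normally distinct objects
s1 = String.new("status")
s2 = String.new("status")
same_string = s1.equal?(s2)
```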
Garbage Collection Implementation
Ruby's garbage collector uses a mark-and-sweep algorithm with generational collection. The collector identifies live objects by tracing references from root objects: global variables, constants, local variables on the stack, and objects referenced by C extensions.
The marking phase traverses the object graph recursively, setting mark bits on reachable objects. Ruby uses a tri-color marking scheme: white objects are candidates for collection, gray objects are reached but not yet scanned, and black objects are fully processed. This scheme enables incremental marking where collection work spreads across multiple phases.
The sweeping phase scans allocated object slots, freeing unmarked objects. Ruby maintains free lists of available slots organized by size. When sweeping frees an object, it returns the slot to the appropriate free list. Future allocations draw from free lists before requesting new memory from the system.
# Examining GC statistics
GC.stat.each do |key, value|
  puts "#{key}: #{value}"
end

# Typical output includes:
# count: number of GC runs
# heap_allocated_pages: total allocated pages
# heap_live_slots: slots with live objects
# heap_free_slots: available slots
# total_allocated_objects: cumulative allocations
Generational Garbage Collection
Ruby divides the heap into generations to optimize collection performance. Newly allocated objects start in the young generation. Objects surviving multiple collections promote to the old generation. Minor collections scan only the young generation, while major collections scan all generations.
The write barrier tracks references from old objects to young objects. When code assigns a young object to a field in an old object, the write barrier records this reference. Minor collections must scan these recorded references to avoid missing reachable young objects. This mechanism allows minor collections to avoid scanning the old generation.
Modern Ruby uses two generations: young and old. The young generation is collected frequently and quickly. Objects surviving several minor collections promote to the old generation. Remembered sets, populated by the write barriers, record cross-generational references so that minor collections remain correct without scanning the old generation.
# Forcing garbage collection
GC.start # Triggers full collection
# Disabling/enabling GC
GC.disable
# ... allocate many objects without collection
GC.enable
GC.start
# Checking GC statistics before and after
before = GC.stat
# ... code to profile
after = GC.stat
puts "Collections: #{after[:count] - before[:count]}"
puts "Allocated: #{after[:total_allocated_objects] - before[:total_allocated_objects]}"
Memory Allocation Patterns
Ruby allocates memory in pages, with each page divided into slots for objects. The page size and slot size vary by Ruby version and configuration. When the free list empties, Ruby allocates new pages from the operating system. Empty pages can eventually be returned to the operating system, but pages often remain allocated even after every object within them has been freed.
Ruby optimizes allocation for common object types. Small strings, arrays, and hashes get slots of specific sizes. Large objects that exceed slot sizes are allocated separately. Ruby may store small arrays inline within the array object rather than allocating separate memory for elements.
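Per-object sizes can be inspected with ObjectSpace.memsize_of from the standard objspace extension. The reported numbers vary by Ruby version and platform, so treat them as indicative rather than exact:

```ruby
require 'objspace'

# A small array may be stored inline in its slot; a large array
# needs a separate allocation for its elements
small = [1, 2, 3]
large = Array.new(10_000, 0)

small_size = ObjectSpace.memsize_of(small)
large_size = ObjectSpace.memsize_of(large)
# large_size is far larger than small_size, reflecting the
# out-of-line element storage
```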
String optimization includes interning and copy-on-write semantics. Frozen strings are interned, reusing existing allocations. Substring operations may share memory with the original string through shared buffers. Modifications trigger copy-on-write, allocating new memory only when necessary.
# String interning reduces memory
str1 = -"constant_string" # Frozen and interned
str2 = -"constant_string" # Reuses same memory
puts str1.object_id == str2.object_id # => true
# Hash optimization
hash = {}
1000.times { |i| hash[i] = i * 2 }
# Ruby may optimize hash storage as it grows
# Array optimization for small arrays
small = [1, 2, 3] # Elements stored inline
large = Array.new(1000) # Separate allocation for elements
Memory Profiling and Debugging
Ruby provides several tools for memory profiling and debugging. The ObjectSpace module enables enumerating all live objects, counting objects by type, and tracking allocations. The GC module exposes statistics and controls for the garbage collector.
The memory_profiler gem provides detailed allocation tracking, showing where objects are allocated and how long they live. This gem helps identify memory leaks and excessive allocations. The allocation tracer shows allocation patterns over time, helping optimize hot paths.
require 'objspace'

# Count objects by type
ObjectSpace.count_objects.each do |type, count|
  puts "#{type}: #{count}" if count > 1000
end

# Find where objects are allocated
ObjectSpace.trace_object_allocations_start
array = Array.new(1000)
file = ObjectSpace.allocation_sourcefile(array)
line = ObjectSpace.allocation_sourceline(array)
puts "Allocated at #{file}:#{line}"
ObjectSpace.trace_object_allocations_stop

# Memory usage by class
counts = Hash.new(0)
ObjectSpace.each_object do |obj|
  counts[obj.class.name] += 1
end
counts.sort_by { |k, v| -v }.first(10).each do |klass, count|
  puts "#{klass}: #{count}"
end
Tuning Ruby Garbage Collection
Ruby exposes environment variables and runtime settings to tune garbage collection behavior. These settings adjust heap growth, collection frequency, and generation thresholds. Tuning requires understanding application allocation patterns and performance requirements.
The RUBY_GC_* environment variables control various collector parameters. RUBY_GC_HEAP_INIT_SLOTS sets the initial number of object slots. RUBY_GC_HEAP_GROWTH_FACTOR controls how aggressively the heap grows. RUBY_GC_HEAP_GROWTH_MAX_SLOTS caps the number of slots added in a single growth step. These settings balance memory usage against collection frequency.
# Programmatic GC tuning
GC.start(full_mark: true, immediate_sweep: true)

# Adjusting GC parameters at runtime
GC::Profiler.enable
# ... run code
GC::Profiler.report
GC::Profiler.disable

# Application-specific tuning example
if ENV['RAILS_ENV'] == 'production'
  # Inspect heap size to decide whether tuning is warranted
  pages = GC.stat[:heap_allocated_pages]
  warn "Large heap: #{pages} pages" if pages > 10_000
end
Practical Examples
Memory management concepts become clearer through concrete examples demonstrating allocation patterns, lifecycle management, and optimization techniques.
Managing Object Lifecycles
Applications often create temporary objects during request processing that should be released after the request completes. Proper lifecycle management prevents memory leaks in long-running processes.
class RequestProcessor
  def process_request(data)
    # Temporary objects created during processing
    parser = DataParser.new(data)
    result = parser.parse

    # Transform result
    transformed = transform(result)

    # Heavy objects no longer needed
    parser = nil
    result = nil

    # Return transformed data
    # parser and result eligible for collection
    transformed
  end

  def transform(data)
    # Create temporary structures
    temp_buffer = []
    data.each do |item|
      processed = expensive_operation(item)
      temp_buffer << processed
    end
    # Compact result
    temp_buffer.compact
    # temp_buffer eligible for collection after return
  end

  def expensive_operation(item)
    # Simulate expensive computation
    result = item.transform
    result.validate? ? result : nil
  end
end
Optimizing Memory Allocation in Loops
Allocating objects inside loops can cause excessive garbage collection. Moving allocations outside loops or reusing objects reduces memory pressure.
# Inefficient: allocates string on each iteration
def process_inefficient(items)
  items.each do |item|
    formatted = "Item: #{item}" # New string each iteration
    log(formatted)
  end
end

# Efficient: reuse string buffer
def process_efficient(items)
  buffer = String.new
  items.each do |item|
    buffer.clear
    buffer << "Item: " << item.to_s
    log(buffer)
  end
end

# Alternative: use frozen string optimization
def process_frozen(items)
  prefix = "Item: ".freeze # Single allocation, shared
  items.each do |item|
    log(prefix + item.to_s) # Only the item string and the concatenation allocate
  end
end
Managing Large Collections
Large collections consume significant memory and require careful handling to prevent out-of-memory errors. Streaming and batching techniques process data without loading everything into memory.
require 'ostruct'

class LargeDataProcessor
  def process_file_streaming(filepath)
    # Stream file line by line instead of reading all at once
    File.foreach(filepath) do |line|
      process_line(line)
      # Previous lines eligible for collection
    end
  end

  def process_database_batched(records)
    # Process records in batches
    records.find_each(batch_size: 1000) do |record|
      process_record(record)
      # Batches collected after processing
    end
  end

  def build_large_result
    # Use enumerator for lazy evaluation
    (1..1_000_000).lazy
      .map { |n| expensive_computation(n) }
      .select { |result| result.valid? }
      .first(100) # Stops once 100 valid results are found
  end

  def expensive_computation(n)
    # Simulate expensive operation
    OpenStruct.new(value: n * 2, valid?: n.even?)
  end

  def process_line(line)
    # Process individual line
    line.strip.split(',').each { |field| validate(field) }
  end

  def process_record(record)
    # Process individual record
    record.update(processed: true)
  end

  def validate(field)
    !field.empty?
  end
end
Object Pool Implementation
Object pooling reuses expensive objects instead of repeatedly allocating and deallocating them. This pattern reduces garbage collection overhead for frequently created objects.
class ConnectionPool
  def initialize(size: 5)
    @size = size
    @pool = []
    @available = []
    @mutex = Mutex.new

    # Pre-allocate connections
    size.times do
      conn = create_connection
      @pool << conn
      @available << conn
    end
  end

  def acquire
    @mutex.synchronize do
      raise "No connections available" if @available.empty?
      conn = @available.pop
      conn.reset # Reset state for reuse
      conn
    end
  end

  def release(connection)
    @mutex.synchronize do
      connection.cleanup # Cleanup but don't close
      @available << connection
    end
  end

  def with_connection
    conn = acquire
    begin
      yield conn
    ensure
      release(conn)
    end
  end

  private

  def create_connection
    # Simulate expensive connection creation
    ExpensiveConnection.new
  end
end

class ExpensiveConnection
  def reset
    @state = nil
  end

  def cleanup
    @buffer = nil
  end

  def query(sql)
    # Simulate query
    [{ id: 1, name: "test" }]
  end
end

# Usage
pool = ConnectionPool.new(size: 10)
pool.with_connection do |conn|
  results = conn.query("SELECT * FROM users")
  process_results(results)
end # Connection automatically returned to pool
Memory-Efficient Data Structures
Choosing appropriate data structures impacts memory consumption. Different structures have different memory characteristics and access patterns.
require 'set'

class MemoryEfficientProcessor
  # Use array for ordered, indexed access
  def process_sequential(count)
    data = Array.new(count) { |i| compute_value(i) }
    data.each { |value| process(value) }
  end

  # Use hash for key-value lookups
  def process_keyed(items)
    lookup = items.each_with_object({}) do |item, hash|
      hash[item.key] = item.value
    end
    lookup.each { |key, value| process_pair(key, value) }
  end

  # Use set for membership testing
  def process_unique(items)
    seen = Set.new
    items.each do |item|
      next if seen.include?(item.id)
      seen.add(item.id)
      process(item)
    end
  end

  # Avoid intermediate arrays
  def process_streaming(large_array)
    # Inefficient: creates intermediate arrays
    # large_array.map { |x| x * 2 }.select { |x| x > 10 }.first(5)

    # Efficient: lazy evaluation
    large_array.lazy
      .map { |x| x * 2 }
      .select { |x| x > 10 }
      .first(5)
  end

  private

  def compute_value(i)
    i * 2
  end

  def process(value)
    # Process value
  end

  def process_pair(key, value)
    # Process key-value pair
  end
end
Performance Considerations
Memory management performance affects application throughput, latency, and resource utilization. Understanding performance characteristics enables optimizing for specific requirements.
Allocation Performance
Memory allocation speed varies by size and allocator strategy. Small allocations from pre-allocated pools complete in nanoseconds. Large allocations require system calls taking microseconds. Allocation patterns significantly impact performance.
Ruby's object allocation involves finding a free slot, initializing object metadata, and running initialization code. Allocations served from existing free lists complete quickly. Allocations that trigger new page allocation or garbage collection cause significant latency spikes.
require 'benchmark'

# Measure allocation patterns
Benchmark.bm do |x|
  x.report("small objects") do
    100_000.times { Object.new }
  end
  x.report("medium arrays") do
    10_000.times { Array.new(1000) }
  end
  x.report("large arrays") do
    1_000.times { Array.new(100_000) }
  end
  x.report("string interpolation") do
    100_000.times { |i| "iteration #{i}" }
  end
  x.report("frozen strings") do
    frozen = "iteration ".freeze
    100_000.times { |i| frozen + i.to_s }
  end
end
Garbage Collection Overhead
Garbage collection pauses application execution during collection cycles. The pause duration depends on the heap size, object count, and collection type. Minor collections complete in milliseconds, while major collections may take hundreds of milliseconds.
Allocation rate determines collection frequency. Applications allocating many short-lived objects trigger frequent minor collections. High allocation rates increase time spent in collection, reducing throughput. Reducing allocations or increasing heap size decreases collection frequency.
# Measure GC impact
GC::Profiler.enable
before_gc = GC.stat
before_time = Time.now
# Allocate many objects
1_000_000.times { Object.new }
after_time = Time.now
after_gc = GC.stat
puts "Time elapsed: #{after_time - before_time}s"
puts "GC count: #{after_gc[:count] - before_gc[:count]}"
puts "Total allocated: #{after_gc[:total_allocated_objects] - before_gc[:total_allocated_objects]}"
# Show GC profile
GC::Profiler.report
GC::Profiler.disable
Memory Fragmentation
Memory fragmentation occurs when free memory exists in non-contiguous blocks. External fragmentation leaves gaps between allocated objects. Internal fragmentation wastes space within allocated blocks. Fragmentation reduces effective memory capacity and slows allocation.
Ruby addresses fragmentation through optional compaction. GC.compact, available since CRuby 2.7, moves live objects together to eliminate gaps and produce contiguous free space, and Ruby 3.0 added an auto-compaction mode that compacts during major collections. Compaction requires moving objects and updating references, increasing collection time.
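Compaction can also be triggered manually with GC.compact (CRuby 2.7 and later); the exact statistics returned vary by Ruby version:

```ruby
# Churn the heap with short-lived objects, then compact it
10_000.times { Object.new }

result = GC.compact # runs a full collection, then moves live objects
# result is a Hash of compaction statistics (e.g. objects considered
# and moved, broken down by type)
```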
Cache Performance
Memory access patterns affect CPU cache efficiency. Sequential access to contiguous memory achieves high cache hit rates. Random access to scattered objects causes cache misses, slowing execution by an order of magnitude. Data structure layout impacts cache performance.
Locality of reference describes accessing nearby memory locations in time or space. Temporal locality reuses recently accessed data. Spatial locality accesses nearby addresses. High locality improves cache performance through prefetching and reduced cache misses.
# Cache-friendly: sequential array access
def sum_array(array)
  total = 0
  array.each { |value| total += value } # Sequential access
  total
end

# Cache-unfriendly: scattered hash access
def sum_hash_values(hash, keys)
  total = 0
  keys.each { |key| total += hash[key] } # Random access
  total
end
Memory Bandwidth
Memory bandwidth limits the rate at which data transfers between RAM and CPU. Applications processing large datasets become memory-bound when computation waits for memory access. Reducing memory traffic through smaller data structures and fewer allocations improves bandwidth utilization.
Copying large objects consumes memory bandwidth. Ruby copies references rather than objects, minimizing bandwidth usage. However, operations like array slicing or string duplication copy data, consuming bandwidth proportional to data size.
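The difference between copying a reference and copying data is easy to demonstrate:

```ruby
# Assignment copies only the reference; dup and slicing copy element data
original = Array.new(5) { |i| i }

alias_ref = original       # same underlying array, no data copied
copy      = original.dup   # shallow copy: new storage for the elements
slice     = original[1, 3] # new array built from copied element references

original[0] = 99
ref_sees_change  = alias_ref[0] == 99 # same object, change visible
copy_sees_change = copy[0] == 99      # independent storage, unchanged
```

The reference copy costs a constant amount of bandwidth regardless of array size; `dup` and slicing cost bandwidth proportional to the number of elements copied.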
Optimization Strategies
Reducing allocations provides the most significant performance improvement. Reusing objects, using object pools, and allocating outside loops minimize allocation overhead. Frozen strings and symbols reduce duplicate allocations for repeated values.
Sizing collections appropriately avoids repeated resizing. Creating arrays or hashes with estimated final size prevents growth-related allocations. Pre-allocation completes in one operation rather than many small allocations.
# Inefficient: repeated resizing
def build_array_inefficient(count)
  array = []
  count.times { |i| array << i } # Grows and reallocates
  array
end

# Efficient: pre-sized
def build_array_efficient(count)
  array = Array.new(count)
  count.times { |i| array[i] = i } # No resizing
  array
end
# Efficient: single-pass construction
# (Note: Hash.new(n) sets a default value, not a capacity)
def build_hash_efficient(items)
  items.to_h { |item| [item.key, item.value] }
end
Reducing object retention decreases live heap size and collection time. Setting references to large objects to nil makes them eligible for collection once nothing else points to them. Avoiding global variables and class variables prevents unintended retention. Weak references allow an object to be referenced without preventing its collection.
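Ruby's standard library provides `WeakRef` for the weak-reference case: the wrapper delegates to the object while it lives but does not keep it alive on its own. A brief sketch:

```ruby
require 'weakref'

cache_entry = "expensive result"
weak = WeakRef.new(cache_entry)

# While a strong reference (cache_entry) exists, the weak ref resolves
puts weak.weakref_alive?   # true
puts weak.length           # delegates to the underlying string: 16

# Dropping the strong reference makes the object collectable;
# after a GC run, weakref_alive? may become false (timing is up to the GC)
cache_entry = nil
GC.start
```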
Profiling and Measurement
Memory profiling identifies allocation hotspots and excessive retention. Ruby's allocation tracking shows where objects are created and which code paths allocate the most. The memory_profiler gem provides detailed reports of allocations, retentions, and memory usage by location.
require 'memory_profiler'

report = MemoryProfiler.report do
  # Code to profile
  1000.times do
    data = Array.new(100)
    process(data)
  end
end

# Show top allocation locations
report.pretty_print(scale_bytes: true)
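The built-in allocation tracking mentioned above can locate allocation sites without extra gems, via `ObjectSpace`. A minimal sketch:

```ruby
require 'objspace'

ObjectSpace.trace_object_allocations_start

obj = Object.new # allocation recorded with its file and line

file = ObjectSpace.allocation_sourcefile(obj)
line = ObjectSpace.allocation_sourceline(obj)
puts "#{file}:#{line}" # where the object was allocated

ObjectSpace.trace_object_allocations_stop
```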
Common Pitfalls
Memory management complexity creates opportunities for subtle bugs that impact reliability and performance. Recognizing common pitfalls enables avoiding them during development.
Unintended Object Retention
Objects remain in memory longer than necessary when references persist after the object's purpose is fulfilled. Closures capture variables from enclosing scopes, preventing collection even when the captured data is no longer needed. Global variables and class variables retain objects for the application lifetime.
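The closure case is easy to reproduce: a lambda keeps its enclosing binding alive, including large locals it never uses. A small sketch:

```ruby
def make_formatter
  big = Array.new(1_000_000, 0) # large local in the enclosing scope
  label = "item"
  # The returned lambda only needs 'label', but it captures the whole
  # binding, so 'big' stays reachable as long as the lambda does
  ->(n) { "#{label}-#{n}" }
end

formatter = make_formatter
puts formatter.call(3) # "item-3"
# 'big' cannot be collected while 'formatter' is alive
```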
class Processor
  # Pitfall: class variable retains all instances
  @@instances = []

  def initialize(data)
    @data = data
    @@instances << self # Retains forever
  end
end

# Better: use instance variable with cleanup
class BetterProcessor
  @instances = []

  class << self
    attr_accessor :instances
  end

  def initialize(data)
    @data = data
    self.class.instances << self
  end

  def cleanup
    self.class.instances.delete(self)
  end
end
Callbacks and event handlers commonly cause retention issues. Registering an object as a listener creates a reference preventing collection. Applications must unregister listeners when objects should be collected.
class EventPublisher
  def initialize
    @listeners = []
  end

  def subscribe(listener)
    @listeners << listener
  end

  def unsubscribe(listener)
    @listeners.delete(listener) # Important: allow GC
  end

  def notify
    @listeners.each { |listener| listener.handle_event }
  end
end

# Usage must unsubscribe
publisher = EventPublisher.new
listener = EventListener.new
publisher.subscribe(listener)
# ... use listener
publisher.unsubscribe(listener) # Required for GC
Excessive Temporary Allocations
Creating many short-lived objects in hot code paths increases garbage collection overhead. String concatenation in loops allocates intermediate strings. Mapping and filtering collections creates intermediate arrays.
# Pitfall: allocates many intermediate strings
def build_message_inefficient(items)
  message = ""
  items.each do |item|
    message += "#{item}," # New string each iteration
  end
  message
end

# Better: use a string buffer mutated in place
def build_message_efficient(items)
  message = String.new
  items.each do |item|
    message << item.to_s << ","
  end
  message
end
# Pitfall: multiple intermediate arrays
def process_inefficient(items)
  items.map { |x| x * 2 }
       .select { |x| x > 10 }
       .take(5)
  # Creates two intermediate arrays plus the final one
end

# Better: lazy evaluation
def process_efficient(items)
  items.lazy
       .map { |x| x * 2 }
       .select { |x| x > 10 }
       .take(5)
       .to_a
  # Single final array
end
Memory Leaks Through Caches
Unbounded caches grow indefinitely, consuming increasing memory over time. Applications must limit cache size through eviction policies. Least-recently-used (LRU) caches evict old entries when reaching size limits.
# Pitfall: unbounded cache grows forever
class UnboundedCache
  def initialize
    @cache = {}
  end

  def get(key)
    @cache[key] ||= compute(key)
  end

  def compute(key)
    # Expensive computation
    key.to_s.upcase
  end
end

# Better: bounded LRU cache
class BoundedCache
  def initialize(max_size: 1000)
    @cache = {}
    @max_size = max_size
    @access_order = []
  end

  def get(key)
    if @cache.key?(key)
      @access_order.delete(key)
      @access_order << key
      return @cache[key]
    end
    value = compute(key)
    set(key, value)
    value
  end

  private

  def set(key, value)
    if @cache.size >= @max_size
      oldest = @access_order.shift
      @cache.delete(oldest)
    end
    @cache[key] = value
    @access_order << key
  end

  def compute(key)
    key.to_s.upcase
  end
end
Circular References in Data Structures
Circular references occur when objects reference each other directly or indirectly. While Ruby's garbage collector handles cycles, they increase collection time and complicate debugging.
# Circular reference example
class Node
  attr_accessor :value, :next, :prev

  def initialize(value)
    @value = value
    @next = nil
    @prev = nil
  end
end

# Creating a circular list
node1 = Node.new(1)
node2 = Node.new(2)
node1.next = node2
node2.prev = node1
node2.next = node1 # Circular
node1.prev = node2 # Circular

# Breaking cycles for cleanup
def break_cycle(node)
  node.next = nil
  node.prev = nil
end
Forgetting to Close Resources
Objects managing external resources like files or connections must explicitly close them. Relying on garbage collection for resource cleanup causes resource exhaustion before collection occurs.
# Pitfall: relying on GC to close files
def process_file_unsafe(filename)
  file = File.open(filename)
  data = file.read
  process_data(data)
  # File remains open until GC runs
end

# Better: explicit close
def process_file_safe(filename)
  file = File.open(filename)
  begin
    data = file.read
    process_data(data)
  ensure
    file.close
  end
end

# Best: block form auto-closes
def process_file_best(filename)
  File.open(filename) do |file|
    data = file.read
    process_data(data)
  end # Automatically closed
end

def process_data(data)
  # Process the data
  data.length
end
Performance Degradation From Large Live Sets
Applications maintaining large numbers of live objects experience slow garbage collection. Collection time increases with heap size since the collector must trace all reachable objects. Reducing live set size through aggressive cleanup improves performance.
# Pitfall: retaining large intermediate results
class BatchProcessor
  def initialize
    @results = [] # Grows without bound
  end

  def process_batches(batches)
    batches.each do |batch|
      result = process_batch(batch)
      @results << result # Retains everything
    end
    @results
  end

  def process_batch(batch)
    batch.map { |item| transform(item) }
  end

  def transform(item)
    item * 2
  end
end
# Better: stream results, don't accumulate
class StreamingProcessor
  def process_batches(batches)
    batches.each do |batch|
      yield process_batch(batch) # Stream each result; nothing retained
    end
  end

  def process_batch(batch)
    batch.map { |item| transform(item) }
  end

  def transform(item)
    item * 2
  end
end
Reference
Memory Management Terminology
| Term | Definition |
|---|---|
| Stack | Memory region for local variables and call frames, automatically managed with LIFO allocation |
| Heap | Memory region for dynamic allocations with explicit or automatic lifetime management |
| Allocation | Reserving memory for program use from available system memory |
| Deallocation | Returning allocated memory to the system for reuse |
| Garbage Collection | Automatic memory management that reclaims unreachable objects |
| Reference Counting | Tracking number of references to each object for automatic deallocation |
| Mark-and-Sweep | Garbage collection algorithm that traces reachable objects and frees unreachable ones |
| Generational GC | Collection strategy dividing objects by age, collecting young objects frequently |
| Memory Leak | Failure to release unused memory, causing gradual memory exhaustion |
| Dangling Pointer | Reference to deallocated memory causing undefined behavior when accessed |
| Fragmentation | Scattered free memory reducing effective capacity and allocation efficiency |
| Virtual Memory | Abstraction mapping virtual addresses to physical memory locations |
| Page | Fixed-size memory block used as unit of memory management |
| Object Pool | Pre-allocated set of objects reused to reduce allocation overhead |
| Arena Allocator | Batch allocator for memory freed together as a group |
| Write Barrier | Mechanism tracking references between objects in generational collection |
Ruby Memory Management API
| Method | Purpose |
|---|---|
| GC.start | Triggers garbage collection cycle |
| GC.disable | Disables automatic garbage collection |
| GC.enable | Re-enables automatic garbage collection |
| GC.stat | Returns hash of garbage collector statistics |
| GC.count | Returns number of GC runs since program start |
| GC::Profiler.enable | Enables GC profiling |
| GC::Profiler.report | Prints GC profiling report |
| ObjectSpace.count_objects | Returns count of objects by type |
| ObjectSpace.each_object | Iterates over all live objects |
| ObjectSpace.trace_object_allocations_start | Begins tracking object allocation locations |
| ObjectSpace.allocation_sourcefile | Returns file where object was allocated |
| ObjectSpace.allocation_sourceline | Returns line number where object was allocated |
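A few of the calls above in combination, as a quick sketch:

```ruby
before = GC.count
GC.start                          # force a full collection
after = GC.count
puts after > before               # true - at least one more run

stats = GC.stat
puts stats[:heap_live_slots] > 0  # live objects are always present

counts = ObjectSpace.count_objects
puts counts[:TOTAL] > 0           # total slot count across all types
```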
GC Statistics Keys
| Statistic | Description |
|---|---|
| count | Total number of garbage collection runs |
| heap_allocated_pages | Total pages allocated from operating system |
| heap_sorted_length | Size of heap page array |
| heap_allocatable_pages | Pages available for allocation without growing heap |
| heap_available_slots | Total slots available across all pages |
| heap_live_slots | Slots containing live objects |
| heap_free_slots | Empty slots available for allocation |
| heap_final_slots | Slots for objects with finalizers |
| heap_marked_slots | Slots marked during most recent collection |
| total_allocated_objects | Cumulative count of allocated objects |
| total_freed_objects | Cumulative count of freed objects |
| malloc_increase_bytes | Bytes allocated through malloc since last GC |
| oldmalloc_increase_bytes | Malloc bytes for old generation objects |
| minor_gc_count | Count of minor garbage collections |
| major_gc_count | Count of major garbage collections |
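The cumulative keys above can be sampled before and after a piece of work to estimate its allocation pressure; a small sketch:

```ruby
start_allocated = GC.stat[:total_allocated_objects]

10_000.times { Object.new }  # allocate some throwaway objects

allocated = GC.stat[:total_allocated_objects] - start_allocated
puts allocated >= 10_000     # true - at least the explicit allocations
```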
GC Environment Variables
| Variable | Effect |
|---|---|
| RUBY_GC_HEAP_INIT_SLOTS | Initial number of heap slots |
| RUBY_GC_HEAP_FREE_SLOTS | Minimum free slots maintained after collection |
| RUBY_GC_HEAP_GROWTH_FACTOR | Heap growth multiplier when allocating pages |
| RUBY_GC_HEAP_GROWTH_MAX_SLOTS | Maximum slots to add when growing heap |
| RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR | Factor for old generation growth |
| RUBY_GC_MALLOC_LIMIT | Malloc bytes triggering collection |
| RUBY_GC_OLDMALLOC_LIMIT | Old generation malloc bytes triggering major GC |
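These variables are read once at interpreter startup, so they are set in the environment rather than from Ruby code. A configuration sketch (`app.rb` is a placeholder script name):

```shell
# Start with a larger initial heap and a gentler growth curve
RUBY_GC_HEAP_INIT_SLOTS=600000 \
RUBY_GC_HEAP_GROWTH_FACTOR=1.25 \
ruby app.rb
```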
Memory Optimization Techniques
| Technique | When to Use |
|---|---|
| Object pooling | Frequently created/destroyed expensive objects |
| String freezing | Repeated use of same string values |
| Lazy evaluation | Processing large collections with early termination |
| Streaming | Processing data too large for memory |
| Batching | Handling large datasets in manageable chunks |
| Pre-allocation | Known collection sizes before population |
| Reference clearing | Releasing large objects after use |
| Weak references | Caches that should not prevent collection |
| Arena allocation | Request-scoped or phase-based allocations |
| Copy-on-write | Sharing read-only data between processes |
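The object pooling technique in the table can be sketched as a small class that hands out reusable instances instead of allocating fresh ones (the names here are illustrative, not a standard API):

```ruby
class ObjectPool
  def initialize(size, &factory)
    @available = Array.new(size) { factory.call } # pre-allocate up front
  end

  # Check an object out, yield it, and return it to the pool afterwards
  def with_object
    obj = @available.pop || raise("pool exhausted")
    yield obj
  ensure
    @available.push(obj) if obj
  end
end

# Reuses the same buffers instead of allocating one per request
pool = ObjectPool.new(2) { String.new(capacity: 4096) }
pool.with_object { |buf| buf << "request data" }
```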
Common Memory Issues
| Issue | Symptoms | Detection |
|---|---|---|
| Memory leak | Gradual memory growth, eventual OOM | Monitor heap size over time |
| Excessive GC | High CPU usage, throughput degradation | GC profiling, allocation tracking |
| Fragmentation | Available memory but allocation failures | Heap statistics, page utilization |
| Retention | Objects not collected as expected | ObjectSpace enumeration, object tracking |
| Cache growth | Unbounded memory increase | Monitor cache sizes |
| Resource leak | File descriptor or socket exhaustion | System resource monitoring |
| Large live set | Slow GC pauses | Heap object counts, GC pause times |
| Allocation storm | Rapid memory allocation/deallocation | Allocation profiling, GC statistics |