Overview
Memory leaks occur when a program allocates memory but fails to release it after use, causing memory consumption to grow continuously until system resources become exhausted. In Ruby applications, memory leaks manifest despite the presence of automatic garbage collection because objects remain reachable through unintended references, preventing the garbage collector from reclaiming their memory.
Ruby's garbage collection system automatically manages memory by tracking object references and deallocating objects that become unreachable. However, garbage collection cannot free memory occupied by objects that remain accessible through any reference chain from root objects. This creates scenarios where developers inadvertently maintain references to objects that serve no functional purpose, causing memory to accumulate over time.
Memory leaks in Ruby applications typically emerge from several sources: persistent collections that grow without bounds, large object graphs pinned by a single stray reference, class variables that accumulate data across requests, thread-local storage that accumulates in long-lived threads, and closures that capture references to large objects. Production Ruby applications often show memory growth patterns where resident set size increases steadily over hours or days, eventually triggering out-of-memory errors or severe performance degradation.
The impact of memory leaks extends beyond memory exhaustion. As heap size grows, garbage collection cycles become longer and more frequent, consuming increasing CPU resources. A Ruby process that starts with 100MB resident memory might grow to several gigabytes over days of operation, with GC pauses extending from milliseconds to seconds, causing request timeouts and system instability.
```ruby
# Memory leak through persistent collection growth
class RequestLogger
  @requests = []

  def self.log(request)
    # Memory leak: array grows indefinitely
    @requests << {
      timestamp: Time.now,
      path: request.path,
      params: request.params.dup
    }
  end
end

# After processing 1 million requests, @requests contains
# 1 million hash objects consuming hundreds of megabytes
```
Detection requires systematic approaches combining monitoring, profiling, and analysis. Memory leaks rarely appear during development with small datasets but emerge in production under sustained load. The detection process involves establishing baseline memory usage, monitoring memory growth patterns, identifying suspect code paths through profiling, and isolating root causes through careful analysis of object retention graphs.
Key Principles
Memory leaks in garbage-collected languages differ fundamentally from manual memory management leaks. In C or C++, leaks occur when allocated memory addresses are lost before deallocation. In Ruby, leaks occur when references prevent garbage collection despite objects serving no purpose. Understanding this distinction shapes detection and prevention strategies.
Ruby's garbage collection uses mark-and-sweep algorithms with generational optimization. The collector starts from root objects—global variables, constants, thread locals, stack frames—and marks all reachable objects by traversing reference chains. Objects not marked during traversal become candidates for collection. A memory leak exists when objects remain reachable through unintended reference chains, keeping them marked and preventing collection.
Reference retention forms the core mechanism behind Ruby memory leaks. Every instance variable, class variable, global variable, constant, and closure binding creates a reference. Collections like arrays and hashes create references to their elements. Objects referenced directly or indirectly from roots remain alive regardless of whether application logic still needs them.
Heap growth patterns distinguish normal memory usage from leaks. Normal applications show sawtooth patterns: memory increases during processing, decreases during GC, and stabilizes around a consistent baseline. Leaked memory creates monotonic growth where baseline memory increases steadily across GC cycles. Production monitoring tracks resident set size (RSS) and heap size over time, with sustained upward trends indicating leaks.
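This distinction can be encoded as a simple heuristic over post-GC memory samples; a minimal sketch, where the thresholds (5% net growth, 80% rising steps) are illustrative assumptions rather than standard values:

```ruby
# Classify a series of post-GC memory samples (MB) as stable or leaking.
# A leak shows sustained, mostly-monotonic growth; a healthy sawtooth
# keeps returning to its baseline.
def leaking?(samples, growth_tolerance: 0.05)
  baseline = samples.first.to_f
  relative_growth = (samples.last - baseline) / baseline
  rising_steps = samples.each_cons(2).count { |a, b| b >= a }
  # Leak signature: net growth above tolerance AND mostly-rising samples
  relative_growth > growth_tolerance &&
    rising_steps >= (samples.size - 1) * 0.8
end

puts leaking?([100, 102, 105, 109, 114]) # steadily rising baseline => true
puts leaking?([100, 130, 101, 128, 100]) # sawtooth, stable baseline => false
```

In practice the samples would come from RSS or heap-size gauges captured right after major GC cycles.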
Object retention graphs map reference chains from roots to leaked objects. Analyzing these graphs reveals why objects remain reachable. A leaked model instance might be retained through a chain: global variable → cache hash → stale entry → model instance. Breaking any link in this chain allows collection.
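Such a chain can be recovered mechanically with `ObjectSpace`; a sketch, where `retention_path` is a hypothetical helper (not a standard API) and the cache layout is a made-up example:

```ruby
require 'objspace'

# Breadth-first search over reference edges to show one chain that
# keeps `target` reachable from `root` (returns nil if no chain exists).
def retention_path(root, target)
  queue = [[root, [root]]]
  seen = {}
  until queue.empty?
    obj, path = queue.shift
    return path if obj.equal?(target)
    next if seen[obj.object_id]
    seen[obj.object_id] = true
    children = ObjectSpace.reachable_objects_from(obj) || []
    children.each do |child|
      next if child.is_a?(ObjectSpace::InternalObjectWrapper)
      queue << [child, path + [child]]
    end
  end
  nil
end

leaked = +"stale entry"
cache = { "req-1" => { data: leaked } }
path = retention_path(cache, leaked)
puts path.map(&:class).join(" -> ") # e.g. Hash -> Hash -> String
```

Breaking any link on the returned path (deleting the stale cache entry, say) makes the target collectable.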
Generational hypothesis in Ruby's GC assumes most objects die young. Young objects occupy a nursery heap with frequent, fast GC. Surviving objects promote to older generations with less frequent collection. Memory leaks violate this hypothesis by creating long-lived objects that should be short-lived, filling older generations and triggering expensive major GC cycles.
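The promotion mechanism is observable through `GC.stat`'s `:old_objects` counter; a sketch (exact counts and promotion age vary by Ruby version):

```ruby
# Objects that survive several collections are promoted to the old
# generation; GC.stat(:old_objects) counts them.
GC.start
before_old = GC.stat(:old_objects)

retained = Array.new(10_000) { |i| "kept_#{i}" } # long-lived objects
10_000.times { |i| "temp_#{i}" }                 # short-lived garbage

4.times { GC.start } # survivors age across collections and get promoted
after_old = GC.stat(:old_objects)

puts "Promoted to old generation: #{after_old - before_old}"
retained.size # keep the array referenced past the GC calls
```

The short-lived strings never reach the old generation; the retained ones do, which is exactly how a leak inflates major-GC workloads.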
```ruby
# Circular references alone do not leak: Ruby's mark-and-sweep GC
# collects unreachable cycles. They leak only when some member of
# the cycle is still reachable from a root.
class Node
  attr_accessor :value, :parent, :children

  def initialize(value)
    @value = value
    @children = []
  end

  def add_child(child)
    @children << child
    child.parent = self # back-reference creates a cycle
  end
end

root = Node.new("root")
child = Node.new("child")
root.add_child(child)

# If every external reference to root and child disappears, the GC
# collects the whole island despite the mutual references.
# The graph leaks only when any node stays reachable, e.g.:
$last_node = child
# Now $last_node -> child -> parent -> root keeps the entire tree alive.
```
Memory allocation patterns affect leak visibility. Small leaks accumulating a few kilobytes per request become critical in high-throughput applications. A 10KB leak per request results in 1GB memory growth after 100,000 requests—achievable in minutes for busy applications. Detection requires per-request memory tracking rather than aggregate metrics.
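Per-request tracking can be sketched as a Rack-style middleware; `MemoryGrowthLogger`, the slot-size constant, and the warning threshold below are illustrative assumptions, not a standard component:

```ruby
# Rack-style middleware sketch: log requests whose retained-memory
# growth exceeds a threshold, using GC live-slot deltas as a cheap proxy.
class MemoryGrowthLogger
  SLOT_BYTES = 40 # approximate RVALUE size; varies by Ruby build

  def initialize(app, threshold_kb: 100)
    @app = app
    @threshold_kb = threshold_kb
  end

  def call(env)
    before = GC.stat(:heap_live_slots)
    response = @app.call(env)
    delta_kb = (GC.stat(:heap_live_slots) - before) * SLOT_BYTES / 1024.0
    warn "#{env['PATH_INFO']} retained ~#{delta_kb.round(1)} KB" if delta_kb > @threshold_kb
    response
  end
end

# Usage with a toy app that leaks into a global array
$leak = []
leaky_app = ->(_env) { $leak << ("x" * 100_000); [200, {}, ["ok"]] }
logger = MemoryGrowthLogger.new(leaky_app, threshold_kb: 10)
response = logger.call({ 'PATH_INFO' => '/users' })
```

Aggregating these per-path deltas over thousands of requests surfaces the endpoints responsible for growth far faster than process-level RSS graphs.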
Native memory versus heap memory introduces complexity. Ruby objects live on the Ruby heap, managed by GC. Native extensions allocate from system memory outside Ruby's control. Leaks in native code require different detection tools. Total process memory includes both heap memory and native allocations, complicating diagnosis when both leak sources exist.
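A quick way to see the split is to compare a Ruby-heap estimate against process RSS; a Linux-only sketch (the slot size is approximated from GC internals, and the `/proc` read is an assumption about the platform):

```ruby
# A large, growing gap between RSS and the Ruby-heap estimate points
# at native-extension or malloc leaks rather than Ruby objects.
def ruby_heap_mb
  slot = GC::INTERNAL_CONSTANTS[:RVALUE_SIZE] ||
         GC::INTERNAL_CONSTANTS[:BASE_SLOT_SIZE] || 40
  GC.stat(:heap_live_slots) * slot / 1024.0 / 1024.0
end

def rss_mb
  # Linux-only: read resident set size (kB) from /proc
  File.read("/proc/self/status")[/VmRSS:\s+(\d+)/, 1].to_i / 1024.0
end

if File.exist?("/proc/self/status")
  puts format("Ruby heap: %.1f MB, RSS: %.1f MB", ruby_heap_mb, rss_mb)
end
```

If RSS climbs while the Ruby-heap estimate stays flat, the leak is outside the object heap and Ruby-level profilers will not find it.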
Copy-on-write semantics in forked processes create apparent leaks. Parent process memory shared with children through copy-on-write becomes unshared when modified. Ruby's GC can break sharing by touching pages during collection. Applications using Unicorn or Puma in fork mode show memory growth as shared pages become private, which differs from true leaks but produces similar symptoms.
Ruby Implementation
Ruby provides multiple APIs for memory inspection and profiling. The ObjectSpace module offers low-level access to all live objects, enabling comprehensive memory analysis. The GC module controls garbage collection behavior and exposes GC statistics. Process memory information comes from system-level APIs exposed through gems.
ObjectSpace module enumerates all live objects in the Ruby VM:
```ruby
require 'objspace'

# Count live objects by class
def object_count_by_class
  counts = Hash.new(0)
  ObjectSpace.each_object { |obj| counts[obj.class] += 1 }
  counts.sort_by { |_, count| -count }.first(10)
end

before_counts = object_count_by_class

# Run suspect code
1000.times { |i| User.create(name: "User #{i}") }

after_counts = object_count_by_class

# Compare differences
before_hash = before_counts.to_h
after_counts.each do |klass, count|
  diff = count - before_hash.fetch(klass, 0)
  puts "#{klass}: +#{diff}" if diff > 0
end
```
ObjectSpace.memsize_of returns the memory size of an object in bytes, including internal structures but not referenced objects:
```ruby
require 'objspace'

string = "x" * 1_000_000
array = Array.new(10_000) { |i| i }
hash = Array.new(10_000) { |i| [i, i * 2] }.to_h

puts "String: #{ObjectSpace.memsize_of(string)} bytes"
puts "Array: #{ObjectSpace.memsize_of(array)} bytes"
puts "Hash: #{ObjectSpace.memsize_of(hash)} bytes"

# Shallow sizes only (approximate; varies by Ruby version):
# String: ~1000040 bytes (string data + overhead)
# Array: ~80040 bytes (array structure, not the elements)
# Hash: ~294952 bytes (hash structure, not the entry objects)
```
ObjectSpace.reachable_objects_from traces references from an object, revealing what an object keeps alive:
```ruby
require 'objspace'

class Container
  def initialize
    @data = Array.new(1000) { |i| "string_#{i}" }
    @metadata = { created: Time.now, count: 1000 }
  end
end

container = Container.new
reachable = ObjectSpace.reachable_objects_from(container)
puts "Objects reachable from container: #{reachable.size}"
puts "Classes: #{reachable.map(&:class).uniq.join(', ')}"
# Returns direct references only: the Array, the Hash, and the class.
# Walk the result recursively to see the Strings and Time instances
# this object transitively keeps in memory.
```
GC module controls and monitors garbage collection:
```ruby
# Get current GC statistics
stats = GC.stat
puts "Total collections: #{stats[:count]}"
puts "Total allocated objects: #{stats[:total_allocated_objects]}"
puts "Total freed objects: #{stats[:total_freed_objects]}"
puts "Live heap slots: #{stats[:heap_live_slots]}"

# Force GC and measure impact. A slot is ~40 bytes; RVALUE_SIZE was
# replaced by BASE_SLOT_SIZE in newer Ruby versions.
slot_size = GC::INTERNAL_CONSTANTS[:RVALUE_SIZE] ||
            GC::INTERNAL_CONSTANTS[:BASE_SLOT_SIZE]
before_mb = GC.stat(:heap_live_slots) * slot_size / 1024.0 / 1024.0
GC.start(full_mark: true, immediate_sweep: true)
after_mb = GC.stat(:heap_live_slots) * slot_size / 1024.0 / 1024.0
puts "Freed: #{(before_mb - after_mb).round(2)} MB"
```
GC.stat provides detailed metrics about garbage collection behavior. Comparing these statistics before and after suspect operations reveals memory retention:
```ruby
def measure_gc_impact
  GC.start # settle the heap so the baseline is post-GC
  before = GC.stat
  yield # execute code block
  GC.start # collect garbage so only retained objects remain counted
  after = GC.stat
  {
    allocated: after[:total_allocated_objects] - before[:total_allocated_objects],
    retained: after[:heap_live_slots] - before[:heap_live_slots]
  }
end

impact = measure_gc_impact do
  10_000.times { |i| "string_#{i}" }
end
puts "Allocated: #{impact[:allocated]} objects"
puts "Retained: #{impact[:retained]} objects"
```
Allocation tracking records where objects are allocated:
```ruby
require 'objspace'

ObjectSpace.trace_object_allocations_start

# Run the code under suspicion (placeholder)
suspicious_operation

# Find allocation sources for large strings
# (allocation info exists only for objects created while tracing)
allocations = ObjectSpace.each_object(String).select { |obj|
  obj.size > 1000
}.map { |obj|
  {
    size: obj.size,
    file: ObjectSpace.allocation_sourcefile(obj),
    line: ObjectSpace.allocation_sourceline(obj)
  }
}.group_by { |info| [info[:file], info[:line]] }

allocations.sort_by { |_, objs| -objs.size }.first(5).each do |(file, line), objs|
  puts "#{file}:#{line} - #{objs.size} large strings"
end

ObjectSpace.trace_object_allocations_stop
```
Ruby 2.1+ includes allocation generation tracking through ObjectSpace.allocation_generation, which identifies objects allocated during specific application phases. This helps isolate request-specific leaks versus startup memory.
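A minimal sketch of phase bucketing with `allocation_generation` (tracing must be active while the objects are allocated; the "startup"/"request" naming is illustrative):

```ruby
require 'objspace'

# allocation_generation returns the GC count at the moment an object
# was allocated, letting you bucket objects into application phases.
ObjectSpace.trace_object_allocations_start

startup_object = Object.new
GC.start # phase boundary: GC.count advances here
request_object = Object.new

ObjectSpace.trace_object_allocations_stop

startup_gen = ObjectSpace.allocation_generation(startup_object)
request_gen = ObjectSpace.allocation_generation(request_object)
puts "startup: gen #{startup_gen}, request: gen #{request_gen}"
# request_gen is higher because a collection ran between the allocations
```

Filtering `ObjectSpace.each_object` by generation then separates objects retained from boot (usually intentional) from objects retained per request (usually the leak).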
Tools & Ecosystem
Ruby's ecosystem provides several specialized tools for memory leak detection and analysis. These tools range from lightweight profilers for development to comprehensive production monitoring solutions.
memory_profiler gem generates detailed allocation and retention reports:
```ruby
require 'memory_profiler'

report = MemoryProfiler.report do
  # Code to profile
  users = Array.new(1000) { |i|
    User.new(
      name: "User #{i}",
      email: "user#{i}@example.com",
      metadata: { created: Time.now }
    )
  }
end

# Print detailed report
report.pretty_print

# Key sections:
# - Total allocated: objects and memory
# - Total retained: objects kept after GC
# - Allocated by gem: shows third-party library impact
# - Allocated by file: pinpoints leak sources
# - Allocated by location: specific line numbers
```
The memory_profiler report distinguishes allocated versus retained objects. Allocated objects include all temporary objects created during execution. Retained objects remain alive after garbage collection, indicating actual memory growth. High retention rates signal potential leaks.
derailed_benchmarks gem provides memory profiling tools for Rails applications:
```ruby
# Gemfile
gem 'derailed_benchmarks', group: :development
```

```shell
# Measure the memory each gem requires at boot
$ bundle exec derailed bundle:mem
$ bundle exec derailed bundle:objects

# Profile a specific controller action
$ PATH_TO_HIT=/users/1 bundle exec derailed exec perf:mem

# Output shows memory allocated per request
# and identifies high-memory endpoints
```
derailed_benchmarks integrates with Rails routing to profile individual controller actions. The perf:mem task measures total memory allocated per request, while perf:objects counts object allocations. Comparing results across endpoints identifies memory-intensive operations.
allocation_tracer gem tracks allocation call stacks:
```ruby
require 'allocation_tracer'

ObjectSpace::AllocationTracer.setup(%i[path line type])
ObjectSpace::AllocationTracer.trace

# Run the code to trace (placeholder)
process_large_dataset

result = ObjectSpace::AllocationTracer.stop

# Analyze allocations by type and location; each value is an array
# whose first element is the allocation count
result.sort_by { |_, counts| -counts.first }.first(10).each do |key, counts|
  puts "#{key.inspect}: #{counts.first} objects"
end
```
allocation_tracer provides lower overhead than ObjectSpace.trace_object_allocations by using Ruby VM hooks. This makes it suitable for profiling longer operations or production traffic sampling.
rbtrace gem enables production debugging without restart:
```ruby
# In application initialization
require 'rbtrace'
```

```shell
# From another terminal while the app runs
$ rbtrace -p <pid> -e 'ObjectSpace.count_objects'
$ rbtrace -p <pid> -e 'Thread.list.map(&:backtrace)'
$ rbtrace -p <pid> --gc

# Inject memory profiling code
$ rbtrace -p <pid> -e '
  require "memory_profiler"
  report = MemoryProfiler.report { GC.start }
  report.pretty_print
'
```
rbtrace attaches to running Ruby processes using Unix domain sockets and msgpack serialization. This allows memory inspection without application restart or preloaded profiling code.
get_process_mem gem tracks total process memory consumption:
```ruby
require 'get_process_mem'

mem = GetProcessMem.new

# Monitor memory over time
initial = mem.mb
5.times do
  perform_operation
  current = mem.mb
  delta = current - initial
  puts "Memory: #{current.round(2)} MB (+#{delta.round(2)} MB)"
  sleep 1
end
```
get_process_mem reads memory from /proc filesystem on Linux or uses platform-specific APIs on other systems. This provides resident set size (RSS), which includes all memory the process occupies, not just Ruby heap.
Production monitoring requires continuous tracking rather than one-time profiling. Integrating memory metrics into application performance monitoring systems enables trend analysis:
```ruby
# Using StatsD/Datadog
def track_memory_metrics
  mem = GetProcessMem.new
  gc_stats = GC.stat
  $statsd.gauge('memory.rss_mb', mem.mb)
  $statsd.gauge('memory.heap_live_slots', gc_stats[:heap_live_slots])
  $statsd.gauge('memory.heap_free_slots', gc_stats[:heap_free_slots])
  $statsd.gauge('gc.count', gc_stats[:count])
end

# Call periodically via background thread or middleware
Thread.new do
  loop do
    track_memory_metrics
    sleep 60
  end
end
```
Practical Examples
Memory leaks manifest differently based on their root cause. These examples demonstrate common leak patterns and detection approaches.
Circular references do not defeat Ruby's tracing GC by themselves, but densely connected object graphs are retained as a unit: a single stray reference into the graph keeps every object in it alive:
```ruby
class Document
  attr_accessor :name, :versions

  def initialize(name)
    @name = name
    @versions = []
  end
end

class Version
  attr_accessor :document, :content, :number

  def initialize(document, content, number)
    @document = document # back-reference to the parent
    @content = content
    @number = number
  end
end

# Creating versions
doc = Document.new("Report")
10_000.times do |i|
  doc.versions << Version.new(doc, "Content #{i}", i)
end

# Problem: the graph is retained as a unit. The cycle itself is
# collectable once unreachable, but one root-reachable reference to
# any single Version keeps its @document alive -- and through
# doc.versions, all 10,000 versions with it.

# Detection
require 'objspace'

doc_id = doc.object_id
doc = nil
GC.start
found = ObjectSpace.each_object(Document).find { |d| d.object_id == doc_id }
puts "Document leaked!" if found
# (Run detection inside a separate method: conservative stack scanning
# can otherwise keep the object alive in a top-level script.)

# Solution: use weak references, or break the link explicitly when a
# single reference would otherwise pin a large graph
class Version
  def document_name
    @document.name # read parent properties while it is alive
  end

  def cleanup
    @document = nil # break the back-reference manually
  end
end
```
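The weak-reference option can be sketched with the stdlib `WeakRef`; this variant of `Version` is illustrative, not a drop-in replacement:

```ruby
require 'weakref'

Document = Struct.new(:name)

class Version
  def initialize(document, content)
    # Weak back-reference: this Version alone will not keep the
    # Document alive once nothing else references it
    @document = WeakRef.new(document)
    @content = content
  end

  def document_name
    @document.name
  rescue WeakRef::RefError
    nil # parent has already been garbage collected
  end
end

doc = Document.new("Report")
version = Version.new(doc, "draft 1")
puts version.document_name # => "Report" while doc is strongly referenced
```

The trade-off is that every access through a `WeakRef` must handle `WeakRef::RefError`, since the parent can disappear at any GC cycle.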
Global variable accumulation creates persistent leaks by maintaining references indefinitely:
```ruby
# Anti-pattern: global request cache
$request_cache = {}

class RequestHandler
  def process(request_id)
    # Memory leak: cache grows without bounds and is never cleared
    $request_cache[request_id] = {
      timestamp: Time.now,
      data: fetch_large_data(request_id),
      user_context: current_user.to_h
    }
  end
end

# After 100,000 requests, $request_cache holds 100,000 entries,
# each with timestamp, data, and user context

# Detection over request lifetime
before_count = $request_cache.size
before_mem = GetProcessMem.new.mb

1000.times { |i| RequestHandler.new.process(i) }

after_count = $request_cache.size
after_mem = GetProcessMem.new.mb
puts "Cache entries: #{before_count} -> #{after_count}"
puts "Memory: #{before_mem.round(2)} -> #{after_mem.round(2)} MB"
puts "Leaked: #{after_count - before_count} entries, #{(after_mem - before_mem).round(2)} MB"

# Solution: bounded cache with expiration
class BoundedCache
  MAX_SIZE = 1000
  MAX_AGE = 3600 # seconds

  def initialize
    @cache = {}
    @mutex = Mutex.new
  end

  def set(key, value)
    @mutex.synchronize do
      prune if @cache.size >= MAX_SIZE
      @cache[key] = { value: value, timestamp: Time.now }
    end
  end

  def get(key)
    @mutex.synchronize do
      entry = @cache[key]
      return nil unless entry
      if Time.now - entry[:timestamp] > MAX_AGE
        @cache.delete(key)
        nil
      else
        entry[:value]
      end
    end
  end

  private

  def prune
    cutoff = Time.now - MAX_AGE
    @cache.delete_if { |_, entry| entry[:timestamp] < cutoff }
    # Still too many? Remove the oldest entries down to half capacity
    if @cache.size >= MAX_SIZE
      sorted = @cache.sort_by { |_, entry| entry[:timestamp] }
      sorted.first(@cache.size - MAX_SIZE / 2).each { |key, _| @cache.delete(key) }
    end
  end
end
```
Class variable retention creates leaks because class variables persist across all instances:
```ruby
class RequestProcessor
  # Class variable shared across all instances
  @@processed_requests = []

  def process(request)
    result = expensive_operation(request)
    # Leak: class variable accumulates every result
    @@processed_requests << {
      request_id: request.id,
      result: result,
      timestamp: Time.now
    }
    result
  end

  def self.stats
    @@processed_requests.size
  end
end

# After many requests, @@processed_requests contains
# every result ever processed

# Detection
before = RequestProcessor.stats
mem_before = GetProcessMem.new.mb

10_000.times { |i| RequestProcessor.new.process(Request.new(i)) }

after = RequestProcessor.stats
mem_after = GetProcessMem.new.mb
puts "Requests tracked: #{after - before}"
puts "Memory growth: #{(mem_after - mem_before).round(2)} MB"
# Output shows linear growth with request count

# Solution: use instance variables with a controlled lifecycle
class RequestProcessor
  attr_reader :processed_count

  def initialize
    @processed_count = 0
  end

  def process(request)
    result = expensive_operation(request)
    @processed_count += 1
    # Don't retain results, only the count
    result
  end
end
```
Thread-local variable leaks occur when threads store data without cleanup:
```ruby
# Thread-local storage in a connection pool
class DatabaseConnection
  def initialize
    @connection = create_connection
  end

  def execute(query)
    # Store result in thread-local storage for later access
    Thread.current[:last_query_result] = @connection.query(query)
  end
end

# Create thread pool
threads = 10.times.map do
  Thread.new do
    conn = DatabaseConnection.new
    loop do
      query = $query_queue.pop
      conn.execute(query)
      # Memory leak: each long-lived worker thread retains its last
      # (possibly huge) result set; :last_query_result is never removed
    end
  end
end

# Detection: inspect thread-local storage
require 'objspace'

def analyze_thread_memory
  Thread.list.each_with_index do |thread, i|
    locals = thread.keys
    sizes = locals.map { |key| [key, ObjectSpace.memsize_of(thread[key])] }
    puts "Thread #{i}: #{locals.size} locals"
    sizes.sort_by { |_, size| -size }.first(5).each do |key, size|
      puts "  #{key}: #{size} bytes"
    end
  end
end

# Solution: don't park results in thread-locals, or clear them
# explicitly when the work is done (at_exit runs at process exit,
# not per thread, so it is not a cleanup mechanism here)
class DatabaseConnection
  def execute(query)
    result = @connection.query(query)
    # Option 1: just return the result; don't store it
    result
  ensure
    # Option 2: if a thread-local is unavoidable, clear it after use
    Thread.current[:last_query_result] = nil
  end
end
```
Rails applications commonly suffer memory bloat, often mistaken for a leak, from association preloading:
```ruby
# Controller that loads far more than it renders
class UsersController < ApplicationController
  def index
    # Bloat: realizes every user plus all associated records.
    # For 1,000 users with 100 posts each, memory holds 1,000 user
    # objects plus 100,000 posts (and comments, and likes) -- yet
    # only id and name are rendered.
    @users = User.includes(:posts, :comments, :likes).all
    render json: @users.map { |u| { id: u.id, name: u.name } }
  end
end

# Detection: profile the action's memory
require 'memory_profiler'

report = MemoryProfiler.report do
  # Simulate request
  controller = UsersController.new
  controller.index
end
report.pretty_print
# Shows high retention of ActiveRecord objects

# Solution: select only the needed fields
class UsersController < ApplicationController
  def index
    @users = User.select(:id, :name)
    render json: @users
  end

  # Or paginate large datasets (page/per from Kaminari)
  def index_paginated
    page = params[:page] || 1
    @users = User.select(:id, :name).page(page).per(50)
    render json: @users
  end
end
```
Performance Considerations
Memory leaks degrade application performance through multiple mechanisms. The most immediate impact appears in garbage collection overhead, as larger heaps require longer mark-and-sweep cycles. A Ruby application with 100MB heap might complete GC in 10-20 milliseconds. The same application with 2GB heap from memory leaks requires 200-400 milliseconds per major GC cycle.
GC pause time scales with heap size. Ruby's GC must traverse all live objects during marking phase, with traversal time proportional to object count. Applications starting with 500,000 live objects might grow to 10 million objects with leaks. GC pause times increase correspondingly, blocking all threads during major collection in Ruby MRI.
```ruby
# Measure GC pause impact at different heap sizes
def benchmark_gc_performance(object_count)
  objects = Array.new(object_count) { |i| { id: i, data: "x" * 100 } }

  times = 10.times.map do
    start = Time.now
    GC.start(full_mark: true, immediate_sweep: true)
    Time.now - start
  end

  avg = times.sum / times.size
  puts "#{object_count} objects: #{(avg * 1000).round(2)} ms GC pause"
  objects = nil
  GC.start
end

[100_000, 500_000, 1_000_000, 5_000_000].each do |count|
  benchmark_gc_performance(count)
end

# Illustrative output demonstrating superlinear growth:
# 100,000 objects: 5.2 ms
# 500,000 objects: 28.7 ms
# 1,000,000 objects: 61.3 ms
# 5,000,000 objects: 342.8 ms
```
Memory page faults increase when working set exceeds physical RAM. Operating systems page inactive memory to swap, causing disk I/O during access. Ruby's GC exacerbates this by touching all objects during collection, forcing page-ins from swap. Applications consuming more memory than available RAM experience severe performance degradation.
CPU overhead from frequent GC cycles compounds memory problems. Ruby 2.1+ includes generational GC to reduce pause times, but leaked objects promote to old generation, requiring expensive major collections. Applications with memory leaks trigger major GC increasingly often as old generation fills.
```ruby
# Track GC frequency over time
class GCMonitor
  SLOT_BYTES = 40 # approximate RVALUE size

  def initialize
    @initial_stats = GC.stat.dup
    @samples = []
  end

  def sample
    current = GC.stat
    @samples << {
      timestamp: Time.now,
      minor_gc: current[:minor_gc_count] - @initial_stats[:minor_gc_count],
      major_gc: current[:major_gc_count] - @initial_stats[:major_gc_count],
      total_time: current.fetch(:time, 0), # cumulative GC ms (Ruby 3.1+)
      heap_mb: current[:heap_live_slots] * SLOT_BYTES / 1024.0 / 1024.0
    }
  end

  def report
    return if @samples.empty?
    duration = @samples.last[:timestamp] - @samples.first[:timestamp]
    minor_per_sec = @samples.last[:minor_gc] / duration
    major_per_sec = @samples.last[:major_gc] / duration
    puts "Duration: #{duration.round(2)} seconds"
    puts "Minor GC: #{@samples.last[:minor_gc]} total (#{minor_per_sec.round(2)}/sec)"
    puts "Major GC: #{@samples.last[:major_gc]} total (#{major_per_sec.round(2)}/sec)"
    puts "Heap growth: #{@samples.first[:heap_mb].round(2)} -> #{@samples.last[:heap_mb].round(2)} MB"
  end
end

monitor = GCMonitor.new
# Run application code
10.times do
  process_requests
  monitor.sample
  sleep 10
end
monitor.report
# Increasing major GC frequency indicates memory pressure
```
Heap fragmentation reduces memory efficiency in long-running processes. Ruby allocates objects in heap slots. Freed objects leave empty slots, but Ruby cannot always reuse these slots efficiently if remaining objects have different size requirements. Memory leaks worsen fragmentation by keeping old objects alive, preventing heap compaction.
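A rough fragmentation signal can be derived from `GC.stat`; this free-slot ratio is a heuristic of my own construction, not an official metric:

```ruby
# Fragmentation proxy: fraction of allocated heap slots currently free.
# A persistently high ratio right after major GC suggests a heap the
# process cannot shrink; GC.compact (Ruby 2.7+) can reduce it.
def heap_free_ratio
  stat = GC.stat
  free = stat[:heap_free_slots].to_f
  free / (stat[:heap_live_slots] + free)
end

GC.start
puts format("free-slot ratio: %.1f%%", heap_free_ratio * 100)
```

The 30% alert threshold in the reference tables below would be applied to exactly this kind of ratio, sampled after major collections.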
Swap thrashing represents the worst-case performance scenario. When memory usage exceeds RAM, the operating system pages memory to disk. If the working set exceeds RAM, the system constantly swaps pages between RAM and disk, degrading performance by 100-1000x. Ruby applications show this as request timeouts and complete unresponsiveness.
Production monitoring tracks several metrics to detect memory-related performance degradation:
```ruby
# Comprehensive memory monitoring
class MemoryHealthCheck
  THRESHOLDS = {
    heap_growth_rate_mb_per_hour: 10.0,
    major_gc_per_minute: 2.0,
    rss_mb: 1024.0
  }
  SLOT_BYTES = 40 # approximate RVALUE size

  def initialize
    @baseline = capture_metrics
    @baseline_time = Time.now
  end

  def check
    current = capture_metrics
    duration_hours = (Time.now - @baseline_time) / 3600.0
    heap_growth = (current[:heap_mb] - @baseline[:heap_mb]) / duration_hours
    major_gc_rate = (current[:major_gc] - @baseline[:major_gc]) / (duration_hours * 60)
    {
      healthy: heap_growth < THRESHOLDS[:heap_growth_rate_mb_per_hour] &&
               major_gc_rate < THRESHOLDS[:major_gc_per_minute] &&
               current[:rss_mb] < THRESHOLDS[:rss_mb],
      metrics: {
        heap_growth_mb_per_hour: heap_growth.round(2),
        major_gc_per_minute: major_gc_rate.round(2),
        rss_mb: current[:rss_mb].round(2),
        gc_time_ms: current[:gc_time_ms].round(2) # cumulative, Ruby 3.1+
      }
    }
  end

  private

  def capture_metrics
    gc_stats = GC.stat
    {
      heap_mb: gc_stats[:heap_live_slots] * SLOT_BYTES / 1024.0 / 1024.0,
      major_gc: gc_stats[:major_gc_count],
      rss_mb: GetProcessMem.new.mb,
      gc_time_ms: gc_stats.fetch(:time, 0) # already in milliseconds
    }
  end
end
```
Common Pitfalls
Memory leaks in Ruby applications often stem from patterns that appear safe but create unintended reference retention. Recognizing these patterns prevents leaks before they reach production.
Closure variable capture creates subtle leaks by retaining references to variables in the enclosing scope:
```ruby
class EventEmitter
  def initialize
    @listeners = []
  end

  def on(event, &block)
    @listeners << block
  end

  def emit(event, data)
    @listeners.each { |block| block.call(data) }
  end
end

emitter = EventEmitter.new

# Memory leak: each block captures its defining scope
users = User.all.to_a # load all users (large array)
users.each do |user|
  emitter.on(:user_event) do |data|
    # The block only uses user.id, but its binding captures the
    # entire enclosing scope -- including the `users` array -- so
    # every closure keeps every user alive
    puts "Event for user #{user.id}: #{data}"
  end
end

# Solution: create the closure inside a method whose binding
# contains only what the block actually needs
def register_listener(emitter, user_id)
  emitter.on(:user_event) do |data|
    puts "Event for user #{user_id}: #{data}"
  end
end

users.each { |user| register_listener(emitter, user.id) }
```
Class-level caching without expiration accumulates data indefinitely:
```ruby
class Product
  @cache = {}

  def self.cached_find(id)
    # Memory leak: cache grows without bounds
    @cache[id] ||= find(id)
  end

  def self.cache_size
    @cache.size
  end
end

# After accessing many products
10_000.times { |i| Product.cached_find(i) }
puts "Cache size: #{Product.cache_size}" # => 10000
# All products remain in memory indefinitely

# Solution: LRU cache with a size limit
require 'lru_redux'

class Product
  @cache = LruRedux::Cache.new(1000) # max 1000 entries

  def self.cached_find(id)
    @cache.getset(id) { find(id) }
  end
end
```
Symbol creation from dynamic strings bloats the symbol table when abused (and before Ruby 2.2, symbols were never garbage collected at all):
```ruby
# Anti-pattern: interning dynamic strings
STATS = Hash.new(0)

def log_event(event_type, user_id)
  # Each unique string becomes a symbol; before Ruby 2.2 symbols were
  # never collected, and here they stay pinned as STATS keys anyway
  event_key = "#{event_type}_#{user_id}".to_sym
  STATS[event_key] += 1
end

# After many events with different user_ids,
# the symbol table contains millions of entries

# Detection
before = Symbol.all_symbols.size
10_000.times { |i| log_event("click", i) }
after = Symbol.all_symbols.size
puts "Symbols created: #{after - before}" # => 10000

# Solution: use strings for dynamic keys
def log_event(event_type, user_id)
  event_key = "#{event_type}_#{user_id}" # String, not Symbol
  STATS[event_key] += 1
end
```
Singleton accumulation creates leaks through per-class instance variables:
```ruby
require 'singleton'

class Service
  include Singleton

  def initialize
    @cache = {}
  end

  def fetch(key)
    # The singleton lives for the process lifetime,
    # so this cache grows without bounds
    @cache[key] ||= expensive_operation(key)
  end
end

# The singleton instance is never deallocated;
# the cache accumulates every fetched key
service = Service.instance
100_000.times { |i| service.fetch(i) }

# Solution: implement cache management
class Service
  include Singleton

  MAX_CACHE_SIZE = 10_000

  def initialize
    @cache = {}
    @access_times = {}
  end

  def fetch(key)
    prune_cache if @cache.size > MAX_CACHE_SIZE
    @access_times[key] = Time.now
    @cache[key] ||= expensive_operation(key)
  end

  private

  def prune_cache
    # Remove the least recently accessed entries down to half capacity
    sorted = @access_times.sort_by { |_, time| time }
    sorted.first(@cache.size - MAX_CACHE_SIZE / 2).each do |key, _|
      @cache.delete(key)
      @access_times.delete(key)
    end
  end
end
```
ActiveRecord association preloading loads unnecessary data:
```ruby
# Memory bloat through over-eager loading
class Report
  def user_activity_summary
    # Loads every post, comment, like, and follow into memory,
    # though only counts are used
    users = User.includes(:posts, :comments, :likes, :followers, :following).all
    users.map do |user|
      {
        name: user.name,
        posts: user.posts.size,
        comments: user.comments.size
      }
    end
  end
end

# Solution: use counter caches or database aggregation
class Report
  def user_activity_summary
    # Aggregate in SQL; left_joins keeps users with no posts/comments
    User.select('users.*, COUNT(DISTINCT posts.id) AS posts_count,
                 COUNT(DISTINCT comments.id) AS comments_count')
        .left_joins(:posts, :comments)
        .group('users.id')
        .map do |user|
          {
            name: user.name,
            posts: user.posts_count,
            comments: user.comments_count
          }
        end
  end
end
```
Background job code can retain references to processed records longer than necessary:
```ruby
# Sidekiq job that parks data in instance state
class ProcessUserJob
  include Sidekiq::Worker

  def perform(user_id)
    user = User.find(user_id)
    @user = user # retained as long as the worker object lives
    process_user(user)
  end
end

# Sidekiq creates a fresh worker instance per execution, so this is
# mostly harmless here -- but the same habit applied to long-lived
# objects (middleware, class-level caches, connection pools) retains
# every record ever processed

# Solution: keep job state local to perform
class ProcessUserJob
  include Sidekiq::Worker

  def perform(user_id)
    user = User.find(user_id) # reload for each attempt
    process_user(user)
    # user becomes collectable after the method returns
  end
end
```
Memoization with arguments creates unbounded caches:
```ruby
class DataProcessor
  def fetch_data(id)
    # Memory leak: memoizes every ID ever requested
    @data_cache ||= {}
    @data_cache[id] ||= expensive_fetch(id)
  end
end

# Solution: memoize only the current ID
class DataProcessor
  def fetch_data(id)
    return @cached_data if defined?(@cached_data) && @current_id == id
    @current_id = id
    @cached_data = expensive_fetch(id)
  end
end
```
Reference
Memory Profiling Commands
| Command | Purpose | Overhead |
|---|---|---|
| GC.stat | Get GC statistics | Minimal |
| ObjectSpace.each_object | Enumerate live objects | High |
| ObjectSpace.count_objects | Count objects by type | Low |
| MemoryProfiler.report | Detailed allocation report | High |
| derailed bundle:mem | Measure gem memory usage | Medium |
| allocation_tracer | Track allocations with stacks | Medium |
| rbtrace | Production process inspection | Low |
Detection Strategy Checklist
| Phase | Action | Tool |
|---|---|---|
| Baseline | Establish normal memory usage | monitoring graphs |
| Monitor | Track RSS and heap growth over time | APM, GetProcessMem |
| Detect | Identify unusual growth patterns | trend analysis |
| Profile | Measure specific code paths | memory_profiler |
| Analyze | Find retention sources | ObjectSpace |
| Verify | Confirm leak resolution | before/after comparison |
| Prevent | Add memory tests | CI integration |
Common Leak Patterns
| Pattern | Symptom | Detection Method |
|---|---|---|
| Unbounded cache | Linear memory growth with usage | Cache size tracking |
| Circular references | Objects not collected | reachable_objects_from |
| Closure captures | Unexpected scope retention | Block source analysis |
| Class variables | Persistent accumulation | Class instance inspection |
| Thread locals | Per-thread growth | Thread.current analysis |
| Global variables | Indefinite retention | Global variable audit |
| Symbol creation | Symbol table growth | Symbol.all_symbols.size |
GC Configuration Options
| Environment Variable | Default | Purpose |
|---|---|---|
| RUBY_GC_HEAP_GROWTH_FACTOR | 1.8 | Heap growth multiplier |
| RUBY_GC_HEAP_GROWTH_MAX_SLOTS | 0 | Maximum slots per growth |
| RUBY_GC_HEAP_INIT_SLOTS | 10000 | Initial heap slots |
| RUBY_GC_HEAP_FREE_SLOTS | 4096 | Free slots to maintain |
| RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR | 2.0 | Old generation growth limit |
| RUBY_GC_MALLOC_LIMIT | 16MB | Malloc trigger threshold |
Object Size Analysis
| Method | Returns | Use Case |
|---|---|---|
| ObjectSpace.memsize_of(obj) | Bytes | Shallow object size |
| ObjectSpace.reachable_objects_from(obj) | Array | Direct references |
| ObjectSpace.count_objects | Hash | Counts by internal type (T_STRING, T_ARRAY, ...) |
| GC.stat(:heap_live_slots) | Integer | Total live objects |
| GC.stat(:heap_allocated_pages) | Integer | Memory pages allocated |
Memory Monitoring Metrics
| Metric | Significance | Alert Threshold |
|---|---|---|
| RSS growth rate | Overall memory leak | >5MB/hour sustained |
| Heap size growth | Ruby object retention | >10MB/hour sustained |
| Major GC frequency | Memory pressure | >5 per minute |
| GC pause time | Performance impact | >100ms average |
| Heap slots live | Object count | Steady increase |
| Heap fragmentation | Memory efficiency | >30% free slots |
Tool Comparison
| Tool | Production Safe | Overhead | Detail Level | Use Case |
|---|---|---|---|---|
| memory_profiler | No | High | Detailed | Development profiling |
| derailed_benchmarks | No | High | Per-endpoint | Development analysis |
| allocation_tracer | Limited | Medium | Stack traces | Staging investigation |
| rbtrace | Yes | Low | Real-time inspection | Production debugging |
| GetProcessMem | Yes | Minimal | Process memory | Continuous monitoring |
| ObjectSpace | No | Very high | Complete object graph | Deep analysis |
Memory Leak Indicators
| Indicator | Normal Pattern | Leak Pattern |
|---|---|---|
| RSS over time | Sawtooth with stable baseline | Monotonic increase |
| Heap size | Grows then stabilizes | Continuous growth |
| GC frequency | Consistent rate | Increasing frequency |
| Object count | Fluctuates around baseline | Steady accumulation |
| GC pause time | Consistent duration | Increasing duration |
| Free heap slots | Maintains buffer | Decreasing ratio |