Overview
Ruby's automatic memory management through garbage collection provides convenience but requires understanding for optimal performance. Memory optimization in Ruby involves controlling object allocation, managing references, and working with the garbage collector rather than against it.
Ruby (MRI) uses a generational mark-and-sweep garbage collector (RGenGC). Objects start in the young generation and are promoted to the old generation after surviving several minor collection cycles. The garbage collector runs automatically as memory pressure increases, but developers can influence when and how collection occurs.
# Basic memory monitoring
GC.stat[:total_allocated_objects] # Total objects created
GC.stat[:heap_live_slots] # Currently live objects
GC.stat[:heap_free_slots] # Available object slots
Memory optimization strategies focus on reducing allocations, reusing objects, and structuring data efficiently. Ruby provides several mechanisms for memory control including object pooling, string optimization, and collection tuning.
# String allocation comparison

# Creates a new String object on every +=
def wasteful_concatenation(arr)
  result = ""
  arr.each { |item| result += item.to_s }
  result
end

# Reuses the same string buffer via <<
def efficient_concatenation(arr)
  result = String.new
  arr.each { |item| result << item.to_s }
  result
end
The Ruby VM tracks object allocation patterns and adjusts collection frequency based on application behavior. Understanding these patterns helps developers write memory-efficient code that works with Ruby's memory management system.
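Those allocation patterns can be observed directly from the VM's counters. A minimal sketch using only GC.stat:

```ruby
# Snapshot the allocation counter, allocate in a burst, then diff.
before = GC.stat(:total_allocated_objects)

10_000.times { "a fresh string" } # each unfrozen literal allocates a String

after = GC.stat(:total_allocated_objects)
puts "Objects allocated during the burst: #{after - before}" # at least 10,000
```

The same two-snapshot pattern works for any GC.stat key, which makes it a cheap way to compare the allocation cost of two implementations.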
Basic Usage
Memory optimization begins with understanding object lifecycle and allocation patterns. Ruby creates objects frequently, and small optimizations compound across large applications.
String operations represent a primary source of memory allocations. Ruby creates new string objects for most operations unless specifically using mutating methods.
# High allocation approach
def format_names(names)
  names.map { |name| name.capitalize + " Smith" }
end

# Lower allocation approach
def format_names(names)
  names.map do |name|
    result = name.dup
    result.capitalize!
    result << " Smith"
  end
end
Array and hash operations also impact memory usage. Growing collections dynamically causes Ruby to reallocate and copy data structures.
# Inefficient dynamic growth
data = []
1000.times { |i| data << "item_#{i}" }
# Pre-allocate known size
data = Array.new(1000)
1000.times { |i| data[i] = "item_#{i}" }
# Or build directly
data = (1..1000).map { |i| "item_#{i}" }
Symbol usage affects memory because, before Ruby 2.2, symbols were never garbage collected. Use symbols for fixed identifiers and strings for data.
# Memory leak in Ruby versions before 2.2, which never collected symbols
user_input.each_key { |key| key.to_sym } # Dangerous with untrusted keys

# Safe approach: compare as strings so untrusted input is never interned
VALID_KEYS = %w[name email address].freeze
data = user_input.select { |k, v| VALID_KEYS.include?(k.to_s) }
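Interning can be watched through the symbol table size. A quick sketch (the key names below are arbitrary and assumed not to be interned already):

```ruby
before = Symbol.all_symbols.size

# Interning 1000 previously unseen strings adds 1000 symbol table entries
syms = Array.new(1000) { |i| "runtime_key_#{i}".to_sym }

after = Symbol.all_symbols.size
puts "Symbol table grew by #{after - before} entries"
```

Since Ruby 2.2, symbols created at runtime with `to_sym` are eligible for collection once unreferenced, but unbounded interning still churns the symbol table.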
Method calls and block creation also allocate objects. Reducing intermediate objects and reusing blocks improves memory efficiency.
# Creates an intermediate array for each chained call
def process_data(items)
  items.select(&:valid?).map(&:process).compact
end

# Single pass with each_with_object
def process_data(items)
  items.each_with_object([]) do |item, result|
    if item.valid?
      processed = item.process
      result << processed if processed
    end
  end
end
Performance & Memory
Memory optimization directly impacts application performance through reduced garbage collection overhead and improved cache locality. Measuring allocation patterns provides data for optimization decisions.
Ruby's allocation tracking reveals hotspots where optimization provides the greatest benefit. The allocation tracker identifies methods creating the most objects.
require 'objspace'

# Track allocations for a code block
result = ObjectSpace.trace_object_allocations do
  # Code to analyze
  large_dataset.map { |item| item.transform.to_s.upcase }
end

# Analyze allocation sources
ObjectSpace.allocation_sourcefile(result.first) # File
ObjectSpace.allocation_sourceline(result.first) # Line
ObjectSpace.allocation_class_path(result.first) # Class
Garbage collection frequency affects application responsiveness. Applications with high allocation rates trigger collection more frequently, causing pause times.
# Monitor GC activity
before_stats = GC.stat
perform_work
after_stats = GC.stat
major_collections = after_stats[:major_gc_count] - before_stats[:major_gc_count]
minor_collections = after_stats[:minor_gc_count] - before_stats[:minor_gc_count]
puts "Major GCs: #{major_collections}, Minor GCs: #{minor_collections}"
Object pooling reduces allocation pressure for frequently created objects. Pools work best for objects with predictable lifecycles and expensive initialization.
class StringPool
  def initialize(size = 100)
    @pool = Array.new(size) { String.new }
    @available = @pool.dup
  end

  def checkout
    return String.new if @available.empty?
    @available.pop.clear
  end

  def checkin(str)
    return if @available.size >= @pool.size
    @available.push(str)
  end
end

# Usage pattern
pool = StringPool.new
buffer = pool.checkout
buffer << "temporary data"
process(buffer)
pool.checkin(buffer)
Memory profiling tools provide detailed allocation analysis. The memory_profiler gem tracks object creation and retention across code execution.
require 'memory_profiler'

report = MemoryProfiler.report do
  # Code to profile
  process_large_dataset(data)
end

report.pretty_print(to_file: 'memory_report.txt')
Data structure choice significantly impacts memory usage. Different collection types have varying memory overhead and access patterns.
# Memory comparison for different structures
require 'benchmark/ips'
data = (1..1000).to_a
# Array lookup O(n)
array_lookup = -> { data.include?(500) }
# Hash lookup O(1) but higher memory overhead
hash_data = data.each_with_object({}) { |v, h| h[v] = true }
hash_lookup = -> { hash_data.key?(500) }
# Set provides O(1) lookup with less overhead than Hash
require 'set'
set_data = data.to_set
set_lookup = -> { set_data.include?(500) }
Benchmark.ips do |x|
  x.report('Array') { array_lookup.call }
  x.report('Hash') { hash_lookup.call }
  x.report('Set') { set_lookup.call }
  x.compare!
end
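Benchmark.ips measures speed; the memory side of the claim can be checked with ObjectSpace.memsize_of. A sketch (exact byte counts vary by Ruby version and platform, and memsize_of reports only the container's shallow size):

```ruby
require 'objspace'

data = (1..1000).to_a
hash_data = data.each_with_object({}) { |v, h| h[v] = true }

# Shallow size of the container itself, excluding referenced elements
array_bytes = ObjectSpace.memsize_of(data)
hash_bytes  = ObjectSpace.memsize_of(hash_data)

puts "Array: #{array_bytes} bytes"
puts "Hash:  #{hash_bytes} bytes (higher per-entry overhead)"
```

Note that a Set delegates storage to an internal Hash, so its shallow memsize understates its true footprint.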
Advanced Usage
Advanced memory optimization involves understanding Ruby's internal object representation and garbage collection behavior. Techniques include object layout optimization, collection tuning, and memory mapping for large datasets.
Ruby stores objects in fixed-size heap slots (historically 40 bytes each; Ruby 3.2 adds multiple slot sizes). An object whose instance variables no longer fit in its slot spills into separately allocated memory, reducing memory efficiency. Designing objects to stay within a single slot improves memory density.
# Object size analysis
require 'objspace'

class CompactUser
  def initialize(id, name)
    @id = id
    @name = name
  end
end

class BloatedUser
  def initialize(id, name, metadata = {})
    @id = id
    @name = name
    @created_at = Time.now
    @updated_at = Time.now
    @metadata = metadata
    @cache = {}
    @observers = []
  end
end

compact = CompactUser.new(1, "Alice")
bloated = BloatedUser.new(1, "Alice")

puts "Compact: #{ObjectSpace.memsize_of(compact)} bytes"
puts "Bloated: #{ObjectSpace.memsize_of(bloated)} bytes"
Copy-on-write optimization reduces memory usage when forking processes. Ruby shares object pages between processes until modifications occur.
require 'json'

# Pre-load shared data before forking
SHARED_CONFIG = JSON.parse(File.read('config.json')).freeze
SHARED_LOOKUP = build_lookup_table.freeze

# Fork after loading shared data
if fork.nil?
  # Child process inherits shared pages;
  # modifications trigger copy-on-write
  process_requests
else
  # Parent continues
  Process.wait
end
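On Ruby 2.7 and later, compacting the heap in the parent before forking packs live objects onto fewer pages, so more pages stay shared under copy-on-write. GC.compact is MRI-specific, hence the guard:

```ruby
# Full collection, then defragment the heap before forking workers
GC.start
GC.compact if GC.respond_to?(:compact)
```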
Weak references prevent objects from being retained solely for caching or observation purposes. Ruby's WeakRef allows references that don't prevent garbage collection.
require 'weakref'

class ObjectCache
  def initialize
    @cache = {}
  end

  def store(key, obj)
    @cache[key] = WeakRef.new(obj)
  end

  def fetch(key)
    weak_ref = @cache[key]
    return nil unless weak_ref

    begin
      weak_ref.__getobj__
    rescue WeakRef::RefError
      @cache.delete(key)
      nil
    end
  end
end

cache = ObjectCache.new
obj = ExpensiveObject.new
cache.store(:key, obj)

obj = nil  # Remove strong reference
GC.start   # Object becomes eligible for collection

cached = cache.fetch(:key) # May return nil after GC
Memory mapping provides access to large files without loading entire contents into memory. The mmap gem works well for read-heavy workloads with large datasets.
require 'mmap' # third-party mmap gem

# Map large file into memory
mapped_file = Mmap.new('large_dataset.txt', 'r')

# Access file contents without a full load
def search_mapped_file(mapped, pattern)
  offset = 0
  while offset < mapped.size
    chunk = mapped[offset, 4096] # Read 4KB windows
    # Note: a match spanning a chunk boundary would be missed here;
    # overlap chunks by the pattern length to handle that case
    if chunk.include?(pattern)
      return find_line_containing(chunk, pattern, offset)
    end
    offset += 4096
  end
  nil
end

result = search_mapped_file(mapped_file, "search_term")
mapped_file.unmap
Custom garbage collection tuning adjusts collection frequency and aggressiveness based on application characteristics.
# Trigger a major collection with lazy sweeping
GC.start(full_mark: true, immediate_sweep: false)

# Stock MRI has no GC.tune method; collection thresholds are set through
# environment variables read when the process boots, for example:
#   RUBY_GC_HEAP_GROWTH_FACTOR=1.8
#   RUBY_GC_OLDMALLOC_LIMIT_MIN=16777216
#   RUBY_GC_OLDMALLOC_LIMIT_MAX=33554432

# Monitor collection impact
def with_gc_monitoring
  before = GC.stat
  start_time = Time.now

  yield

  after = GC.stat
  duration = Time.now - start_time

  puts "Duration: #{duration}s"
  puts "Minor GCs: #{after[:minor_gc_count] - before[:minor_gc_count]}"
  puts "Major GCs: #{after[:major_gc_count] - before[:major_gc_count]}"
end
Common Pitfalls
Memory optimization introduces subtle bugs and performance anti-patterns. Understanding common pitfalls prevents memory leaks and performance degradation.
String concatenation using += creates an intermediate object for each operation, so the pattern scales quadratically with input size.
# Quadratic memory allocation - dangerous for large inputs
def build_report(items)
  report = ""
  items.each do |item|
    report += "Item: #{item.name}\n" # Creates a new string each iteration
    report += "Value: #{item.value}\n"
    report += "Status: #{item.status}\n\n"
  end
  report
end

# Linear memory allocation
def build_report(items)
  parts = []
  items.each do |item|
    parts << "Item: #{item.name}\n"
    parts << "Value: #{item.value}\n"
    parts << "Status: #{item.status}\n\n"
  end
  parts.join
end
Closure capture retains references to entire scope chains, preventing garbage collection of local variables no longer needed.
def create_processors(data)
  large_cache = build_expensive_cache(data)        # Large object
  small_lookup = extract_lookup_table(large_cache) # Small subset

  # Bad: each lambda captures the whole scope, including large_cache
  data.map do |item|
    ->(x) { small_lookup[item.key] + x } # large_cache still referenced
  end
end

# Good: limit scope capture
def create_processors(data)
  large_cache = build_expensive_cache(data)
  small_lookup = extract_lookup_table(large_cache)
  large_cache = nil # Explicit release

  data.map do |item|
    key = item.key # Capture only needed values
    ->(x) { small_lookup[key] + x }
  end
end
Global variables and class variables prevent garbage collection of referenced objects. These references persist for the application lifetime.
class DataProcessor
  @@cache = {} # Class variable - never garbage collected

  def self.process(data)
    key = data.hash
    @@cache[key] ||= expensive_computation(data) # Accumulates forever
  end
end

# Better: bounded cache with eviction
class DataProcessor
  @cache = {}
  @max_size = 1000

  def self.process(data)
    key = data.hash
    @cache.shift if @cache.size >= @max_size # Remove oldest entry
    @cache[key] ||= expensive_computation(data)
  end
end
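Hash#shift evicts the oldest insertion (FIFO). When access recency matters more, Ruby's insertion-ordered Hash supports a compact LRU — a sketch with an illustrative class name:

```ruby
class LruCache
  def initialize(max_size)
    @max = max_size
    @h = {}
  end

  def fetch(key)
    return unless @h.key?(key)
    @h[key] = @h.delete(key) # re-insert to mark as most recently used
  end

  def store(key, value)
    @h.delete(key)             # reposition if already present
    @h[key] = value
    @h.shift if @h.size > @max # evict the least recently used entry
    value
  end
end
```

Because fetch refreshes recency, a frequently read entry survives eviction even when it is older by insertion.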
MRI's mark-and-sweep collector handles circular references: a cycle that nothing else references is collected normally. Cycles become leaks when any member remains reachable from a long-lived root (a global, class variable, or cache), or when C extensions hold references the collector cannot trace.
class Parent
  attr_accessor :children

  def initialize
    @children = []
  end
end

class Child
  attr_accessor :parent

  def initialize(parent)
    @parent = parent
    parent.children << self # Circular reference created
  end
end

parent = Parent.new
child = Child.new(parent)
parent = nil
child = nil
# MRI collects the cycle once no outside reference remains, but if either
# object is also held by a long-lived structure (a registry, a cache),
# the entire cycle stays retained

# Break cycles explicitly when detaching from long-lived structures
class Child
  def detach
    @parent.children.delete(self) if @parent
    @parent = nil
  end
end
Regular expressions built at runtime are compiled on every call, consuming memory without benefit. Regexp literals without interpolation are compiled once at parse time, so the fix is to avoid Regexp.new in hot paths or hoist the pattern into a constant.

# Inefficient: builds a new Regexp object on every call
def extract_emails(text)
  text.scan(Regexp.new('[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'))
end

# Efficient: compiled once
EMAIL_REGEX = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/
def extract_emails(text)
  text.scan(EMAIL_REGEX)
end
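The distinction is observable: a Regexp literal without interpolation is the same object on every call, while Regexp.new builds a fresh one each time. A small sketch:

```ruby
def literal_pattern
  /[a-z]+/ # compiled once at parse time; reused on every call
end

def dynamic_pattern
  Regexp.new('[a-z]+') # a new Regexp object on every call
end

puts literal_pattern.equal?(literal_pattern)  # true  - same object
puts dynamic_pattern.equal?(dynamic_pattern)  # false - distinct objects
```

Interpolated literals like /#{prefix}\d+/ also allocate per call unless given the /o flag, which freezes the first expansion.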
Production Patterns
Production memory optimization requires monitoring, alerting, and gradual optimization based on real usage patterns. Successful optimization balances memory efficiency with code maintainability.
Memory monitoring in production environments tracks allocation trends and identifies memory pressure before it impacts performance.
# Production memory monitoring
class MemoryMonitor
  def initialize(interval = 30)
    @interval = interval
    @stats_history = []
  end

  def start_monitoring
    Thread.new do
      loop do
        record_stats
        sleep @interval
      end
    end
  end

  private

  def record_stats
    stats = {
      timestamp: Time.now,
      rss_mb: process_rss_mb,
      heap_live: GC.stat[:heap_live_slots],
      heap_free: GC.stat[:heap_free_slots],
      major_gc_count: GC.stat[:major_gc_count],
      minor_gc_count: GC.stat[:minor_gc_count]
    }

    @stats_history << stats
    @stats_history.shift if @stats_history.size > 1000

    check_memory_thresholds(stats)
  end

  def process_rss_mb
    `ps -o rss= -p #{Process.pid}`.to_i / 1024.0
  end

  def recent_gc_frequency
    return 0 if @stats_history.size < 2

    first, last = @stats_history.last(2)
    gcs = (last[:minor_gc_count] - first[:minor_gc_count]) +
          (last[:major_gc_count] - first[:major_gc_count])
    gcs * (60.0 / @interval) # Collections per minute
  end

  def check_memory_thresholds(stats)
    if stats[:rss_mb] > 500 # 500MB threshold
      Rails.logger.warn "High memory usage: #{stats[:rss_mb]}MB"
    end

    if recent_gc_frequency > 10 # More than 10 GCs per minute
      Rails.logger.warn "High GC frequency detected"
    end
  end
end

# Start monitoring in production
MemoryMonitor.new.start_monitoring if Rails.env.production?
Request-scoped object pools prevent allocation spikes during traffic bursts. Pools sized for typical request patterns reduce garbage collection overhead.
class RequestScopedPool
  def initialize(app)
    @app = app
  end

  def call(env)
    Thread.current[:object_pool] = ObjectPool.new
    begin
      @app.call(env)
    ensure
      Thread.current[:object_pool] = nil
    end
  end
end

class ObjectPool
  def initialize
    @string_pool = []
    @array_pool = []
  end

  def checkout_string
    @string_pool.pop || String.new
  end

  def checkin_string(str)
    return if @string_pool.size > 50
    @string_pool.push(str.clear)
  end

  def self.current
    Thread.current[:object_pool]
  end
end

# Use in controllers
class ApiController < ApplicationController
  def process_data
    pool = ObjectPool.current
    buffer = pool.checkout_string
    # Use buffer for processing
    pool.checkin_string(buffer)
  end
end
Database connection pooling and query result streaming reduce memory usage for large datasets.
class LargeDataProcessor
  def process_in_batches(batch_size = 1000)
    total_processed = 0

    User.find_in_batches(batch_size: batch_size) do |batch|
      batch.each do |user|
        process_user(user)
        total_processed += 1

        # Force GC periodically for long-running processes
        GC.start if total_processed % 10_000 == 0
      end

      # Log progress
      Rails.logger.info "Processed #{total_processed} users"
    end
  end

  def stream_large_query
    # Caveat: select_all loads the entire result set into memory before
    # iterating; true row-by-row streaming requires driver support (for
    # example the mysql2 adapter's stream: true option) or batched reads
    sql = "SELECT * FROM large_table WHERE condition = ?"
    ActiveRecord::Base.connection.select_all(sql).each do |row|
      yield row # Process one row at a time
    end
  end
end
Application server tuning optimizes memory usage across worker processes. Configuration balances memory efficiency with performance requirements.
# config/puma.rb - Production memory optimization
workers ENV.fetch("WEB_CONCURRENCY") { 2 }
threads_count = ENV.fetch("RAILS_MAX_THREADS") { 5 }
threads threads_count, threads_count

# Load the app once in the master so workers share pages via copy-on-write
preload_app!

before_fork do
  # Disconnect so each worker establishes its own database connections
  ActiveRecord::Base.connection_pool.disconnect!
end

on_worker_boot do |index|
  ActiveRecord::Base.establish_connection

  # Monitor worker memory and exit gracefully over the threshold;
  # the master spawns a replacement worker
  Thread.new do
    loop do
      rss_mb = `ps -o rss= -p #{Process.pid}`.to_i / 1024.0
      Process.kill('SIGTERM', Process.pid) if rss_mb > 400 # 400MB per worker
      sleep 60
    end
  end
end

# Runs when Puma is restarted (for example via pumactl restart)
on_restart do
  puts "Puma restarting..."
end
Reference
Memory Management Methods
| Method | Parameters | Returns | Description |
|---|---|---|---|
| `GC.start` | `full_mark: true, immediate_sweep: true` (defaults) | `nil` | Initiates a garbage collection cycle |
| `GC.disable` | None | `true`/`false` | Disables automatic garbage collection |
| `GC.enable` | None | `true`/`false` | Enables automatic garbage collection |
| `GC.stat` | `key = nil` | `Hash`/`Integer` | Returns garbage collection statistics |
| `GC.count` | None | `Integer` | Returns the total number of GC runs |
| `ObjectSpace.each_object` | `klass = nil, &block` | `Integer` | Iterates over all objects of the given class |
| `ObjectSpace.memsize_of` | `obj` | `Integer` | Returns the memory size of an object in bytes |
| `ObjectSpace.allocation_*` | `obj` | `String`/`Integer` | Returns allocation information for an object |
GC Statistics Keys
| Key | Type | Description |
|---|---|---|
| `:count` | Integer | Total number of GC runs |
| `:major_gc_count` | Integer | Number of major (full) collections |
| `:minor_gc_count` | Integer | Number of minor (partial) collections |
| `:heap_allocated_pages` | Integer | Total allocated heap pages |
| `:heap_sorted_length` | Integer | Length of the sorted page list |
| `:heap_allocatable_pages` | Integer | Pages available for allocation |
| `:heap_available_slots` | Integer | Total object slots across pages |
| `:heap_live_slots` | Integer | Currently occupied slots |
| `:heap_free_slots` | Integer | Free slots in the heap |
| `:heap_final_slots` | Integer | Slots holding objects pending finalization |
| `:total_allocated_objects` | Integer | Total objects created since start |
| `:total_freed_objects` | Integer | Total objects garbage collected |
Memory Profiling Tools
| Tool | Purpose | Installation | Usage |
|---|---|---|---|
| `memory_profiler` | Track object allocation and retention | `gem install memory_profiler` | `MemoryProfiler.report { code }.pretty_print` |
| `allocation_tracer` | Detailed allocation source tracking | `gem install allocation_tracer` | `ObjectSpace::AllocationTracer.trace { code }` |
| `get_process_mem` | Cross-platform process memory usage | `gem install get_process_mem` | `GetProcessMem.new.mb` |
| `derailed_benchmarks` | Rails memory benchmarking | `gem install derailed_benchmarks` | `bundle exec derailed exec perf:mem` |
String Optimization Patterns
| Pattern | Memory Impact | Use Case |
|---|---|---|
| `String#<<` vs `String#+` | Lower allocation | Building strings incrementally |
| `String.new(capacity: n)` | Pre-allocated buffer | Known final string size |
| `String#freeze` | Prevents duplication | Immutable strings |
| `Array#join` vs concatenation | Single allocation | Joining many strings |
| Interpolation vs concatenation | Fewer intermediate objects | Formatted strings |
Collection Optimization Guidelines
| Structure | Memory Overhead | Access Pattern | Best Use Case |
|---|---|---|---|
| Array | Lowest | Sequential O(1), search O(n) | Ordered data, frequent iteration |
| Hash | Medium | Key lookup O(1) | Key-value mapping |
| Set | Low-Medium | Membership O(1) | Unique value testing |
| OpenStruct | Highest | Method-call overhead | Small, infrequent objects |
Garbage Collection Tuning Variables
| Variable | Default | Purpose | Tuning Strategy |
|---|---|---|---|
| `RUBY_GC_HEAP_INIT_SLOTS` | 10000 | Initial heap slots | Increase for high-allocation apps |
| `RUBY_GC_HEAP_FREE_SLOTS` | 4096 | Minimum free slots after GC | Adjust based on allocation patterns |
| `RUBY_GC_HEAP_GROWTH_FACTOR` | 1.8 | Heap growth multiplier | Lower for memory-constrained environments |
| `RUBY_GC_HEAP_GROWTH_MAX_SLOTS` | 0 (no cap) | Maximum slots added per growth | Set a limit in memory-constrained environments |
| `RUBY_GC_MALLOC_LIMIT` | 16MB | Malloc memory threshold triggering GC | Adjust based on C extension usage |
Memory Leak Detection Checklist
| Check | Command/Code | Expected Result |
|---|---|---|
| Growing object count | `GC.stat[:heap_live_slots]` | Should stabilize |
| Increasing process RSS | `ps -o rss= -p #{Process.pid}` | Should not grow indefinitely |
| GC frequency | `GC.stat[:count]` trend | Should scale with work |
| Object retention | `ObjectSpace.each_object.count` | Should release objects |
| Symbol table growth | `Symbol.all_symbols.size` | Should stabilize (modern Ruby) |