CrackedRuby

Overview

Performance profiling measures how programs execute, tracking metrics like execution time, memory consumption, CPU usage, and method call frequency. Profiling provides empirical data about program behavior during runtime, revealing where computational resources are spent and identifying optimization opportunities.

Profiling differs from benchmarking. Benchmarking measures overall execution time under controlled conditions to compare implementations. Profiling examines internal program behavior during execution, breaking down resource consumption by function, line, or operation. A benchmark might show one algorithm runs faster than another; profiling reveals why by exposing which operations consume the most time.

The profiling process involves three stages: instrumentation, data collection, and analysis. Instrumentation adds measurement code to track execution. Data collection captures metrics during program execution. Analysis interprets collected data to identify bottlenecks and optimization targets.

Profilers operate through sampling or instrumentation. Sampling profilers periodically check program state, recording which code is executing at each sample point. Instrumentation profilers inject measurement code into the program, tracking every relevant event. Sampling introduces minimal overhead but provides statistical approximations. Instrumentation captures precise data but significantly impacts execution speed.

# Sampling profiler example - checks program state periodically
# Shows statistical view of where time is spent
require 'stackprof'

StackProf.run(mode: :cpu, out: 'profile.dump') do
  10_000.times { expensive_operation }
end

# Instrumentation profiler example - tracks every method call
# Shows exact call counts and timing
require 'ruby-prof'

result = RubyProf.profile do
  10_000.times { expensive_operation }
end

Profiling serves multiple objectives. Performance optimization identifies slow code paths for improvement. Capacity planning determines resource requirements for production deployment. Debugging reveals unexpected behavior manifesting as performance issues. Regression detection catches performance losses introduced between versions.

Key Principles

Performance profiling rests on measurement accuracy, statistical validity, and representative workloads. Measurements must reflect actual program behavior without significant distortion from profiling overhead. Statistical validity requires sufficient sample sizes and repeated measurements to account for variance. Representative workloads ensure profiling results transfer to production scenarios.

The observer effect describes how measurement changes the measured system. Profiling adds computational overhead, altering execution characteristics. Instrumentation profilers may increase execution time by 10-100x. Sampling profilers typically add 1-5% overhead. Heavier instrumentation provides more detailed data at the cost of less representative behavior.
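
The overhead figures above can be checked empirically. A minimal sketch, using only the standard library, that times the same work with and without a TracePoint instrumentation hook:

```ruby
require 'benchmark'

work = -> { 50_000.times { |i| Math.sqrt(i) } }

# Baseline: no instrumentation
plain = Benchmark.realtime { work.call }

# Instrumented: even an empty :line handler pays a cost per event
tp = TracePoint.new(:line) { }
traced = tp.enable { Benchmark.realtime { work.call } }

puts format('plain: %.4fs  traced: %.4fs  slowdown: %.1fx',
            plain, traced, traced / plain)
```

The slowdown factor varies by Ruby version and workload, but the instrumented run is always measurably slower, which is exactly the observer effect.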

Profiling granularity determines measurement detail. Function-level profiling tracks execution time per method. Line-level profiling measures individual statement execution. Block-level profiling examines code regions. Higher granularity increases data volume and overhead while providing finer optimization targets.
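
Line-level granularity can be approximated with TracePoint line events; the `busy_sum` method below is a hypothetical workload:

```ruby
line_counts = Hash.new(0)
tp = TracePoint.new(:line) do |t|
  line_counts[[t.path, t.lineno]] += 1
end

def busy_sum(n)
  total = 0
  n.times { |i| total += i }  # this line executes n times
  total
end

tp.enable { busy_sum(1_000) }

# The hottest lines by execution count
line_counts.sort_by { |_, count| -count }.first(3).each do |(path, line), count|
  puts "#{path}:#{line} -> #{count} executions"
end
```

This illustrates why finer granularity costs more: every executed line generates an event to record.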

Hot spots represent code sections consuming disproportionate resources. The 90/10 rule suggests 90% of execution time concentrates in 10% of code. Profiling identifies these hot spots for targeted optimization. A function called once but running slowly may matter less than a fast function called millions of times.

# Hot spot example - seemingly fast operation becomes bottleneck
def process_records(records)
  records.map do |record|
    # Fast operation: 0.0001 seconds
    validate(record)  
  end
end

# Called with 1,000,000 records
# Total time: 0.0001 * 1,000,000 = 100 seconds
# This becomes the hot spot despite individual speed

Call graphs represent program execution flow, showing which functions call which and how often. Call depth reveals function nesting levels. Exclusive time measures time spent within a function excluding called functions. Inclusive time includes time spent in the function and all functions it calls. These metrics distinguish between functions doing work directly versus coordinating other functions.
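
The inclusive/exclusive distinction can be reproduced with manual timing. The method names below are hypothetical; a real profiler computes these values automatically:

```ruby
def now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

$leaf_time = 0.0  # accumulated time spent inside the called method

def leaf_work
  start = now
  20_000.times { Math.sqrt(rand) }
  $leaf_time += now - start
end

def coordinator
  3.times { leaf_work }  # coordinator mostly delegates
end

start = now
coordinator
inclusive = now - start             # coordinator plus everything it calls
exclusive = inclusive - $leaf_time  # coordinator's own loop overhead

puts format('inclusive: %.4fs  exclusive: %.4fs', inclusive, exclusive)
```

Here `coordinator` has high inclusive time but near-zero exclusive time, so the optimization target is `leaf_work`.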

Memory profiling tracks allocation patterns, object lifetimes, and garbage collection behavior. Allocation profiling counts object creation by type and location. Retention profiling identifies objects remaining in memory when expected to be garbage collected. Generation analysis examines object survival across garbage collection cycles.
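
Allocation counts are observable without gems through ObjectSpace; a rough sketch that disables GC during measurement so the delta stays stable:

```ruby
GC.start
GC.disable  # keep counts stable while measuring
before = ObjectSpace.count_objects[:T_STRING]

records = 1_000.times.map { |i| "record-#{i}" }

after = ObjectSpace.count_objects[:T_STRING]
GC.enable

puts "String objects allocated: #{after - before}"
```

Dedicated tools like memory_profiler add the per-location and retention breakdowns that raw counts lack.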

Baseline establishment compares performance measurements against reference points. Baselines may represent previous application versions, competing implementations, or theoretical limits. Regression testing detects performance decreases between versions. Performance budgets set acceptable thresholds for operations.
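
A performance budget reduces to a threshold check against measured time; the budget value below is illustrative:

```ruby
require 'benchmark'

BUDGET_SECONDS = 0.5  # hypothetical acceptable threshold for this operation

elapsed = Benchmark.realtime do
  100_000.times { Math.sqrt(rand) }
end

within_budget = elapsed <= BUDGET_SECONDS
puts format('elapsed: %.4fs  budget: %.1fs  ok: %s',
            elapsed, BUDGET_SECONDS, within_budget)
```

Run in CI, a check like this turns a budget from a guideline into a regression gate.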

Statistical significance addresses measurement variance. Multiple profiling runs produce different results due to system state variations. Outliers may represent rare edge cases or measurement errors. Confidence intervals quantify measurement uncertainty. Averaging multiple runs reduces noise but may obscure important variance.
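
Run-to-run variance can be quantified with a few lines of standard library code:

```ruby
require 'benchmark'

# Repeat the same measurement and summarize its spread
samples = 10.times.map do
  Benchmark.realtime { 50_000.times { Math.sqrt(rand) } }
end

mean = samples.sum / samples.size
variance = samples.sum { |s| (s - mean)**2 } / (samples.size - 1)
std_dev = Math.sqrt(variance)

puts format('mean: %.5fs  std dev: %.5fs  (n=%d)', mean, std_dev, samples.size)
```

A standard deviation that is large relative to the mean signals that single-run comparisons are unreliable.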

Ruby Implementation

Ruby provides multiple profiling approaches through standard library modules and third-party gems. The ruby-prof gem offers comprehensive instrumentation-based profiling with multiple output formats. StackProf provides sampling-based profiling integrated with Ruby's internal structures. The Benchmark module enables controlled timing comparisons.

Ruby-prof operates through explicit profiling blocks. Starting the profiler enables instrumentation. The profiled code executes with measurement overhead. Stopping returns results containing timing and call data.

require 'ruby-prof'

# Start profiling
RubyProf.start

# Code to profile
def calculate_fibonacci(n)
  return n if n <= 1
  calculate_fibonacci(n - 1) + calculate_fibonacci(n - 2)
end

result = calculate_fibonacci(25)

# Stop profiling and get results
profile_result = RubyProf.stop

# Print flat profile sorted by self time
printer = RubyProf::FlatPrinter.new(profile_result)
printer.print(STDOUT, min_percent: 2)

StackProf samples the call stack at regular intervals, recording which methods are executing. The mode parameter determines the sampling basis: :cpu samples by CPU time, :wall by wall-clock time, and :object by object allocations.

require 'stackprof'

# Profile CPU usage
StackProf.run(mode: :cpu, out: 'cpu_profile.dump') do
  array = (1..1_000_000).to_a
  array.each { |n| Math.sqrt(n) }
end

# Profile object allocations  
StackProf.run(mode: :object, out: 'object_profile.dump') do
  1_000.times { Array.new(1000) { rand } }
end

# Load and analyze results written to the dump file
report = StackProf::Report.from_file('object_profile.dump')
report.print_text

The memory_profiler gem tracks object allocations and retention. It reports allocated objects by class, location, and retention status.

require 'memory_profiler'

report = MemoryProfiler.report do
  array = []
  10_000.times do |i|
    array << { id: i, data: "Record #{i}" * 10 }
  end
end

# Show allocation breakdown
report.pretty_print(scale_bytes: true)

TracePoint provides low-level event hooks into Ruby execution. It captures method calls, returns, line execution, and object allocation events.

trace = TracePoint.new(:call, :return) do |tp|
  puts "#{tp.event} - #{tp.method_id} in #{tp.path}:#{tp.lineno}"
end

trace.enable do
  some_method
end

The Benchmark module measures execution time for code blocks. It compares multiple implementations under identical conditions.

require 'benchmark'

n = 1_000_000

Benchmark.bm(20) do |x|
  x.report("Array#each:") do
    arr = (1..n).to_a
    arr.each { |i| i * 2 }
  end
  
  x.report("Array#map:") do
    arr = (1..n).to_a
    arr.map { |i| i * 2 }
  end
  
  x.report("for loop:") do
    arr = (1..n).to_a
    for i in arr
      i * 2
    end
  end
end

Production profiling requires minimal overhead and safe failure modes. Rack Mini Profiler integrates into Rails applications, providing per-request profiling accessible through browser interfaces.

# In Gemfile
gem 'rack-mini-profiler'

# In config/environments/development.rb
config.middleware.use(Rack::MiniProfiler)

# Access profiling data by appending ?pp=enable to URLs
# Displays SQL queries, rendering time, allocations per request

Datadog, New Relic, and Skylight offer production profiling services. They sample production traffic, aggregate metrics, and provide visualization interfaces. These services balance overhead with data collection, typically sampling 1-5% of requests.

Tools & Ecosystem

Ruby's profiling ecosystem includes sampling profilers, instrumentation profilers, memory analyzers, and production monitoring tools. Each tool addresses specific profiling scenarios with different trade-offs.

RubyProf provides the most comprehensive instrumentation profiling. It supports multiple output formats: flat profiles showing per-method metrics, graph profiles displaying call relationships, and call stack profiles revealing execution paths. Graph HTML output generates interactive visualizations showing call hierarchies with timing data.

require 'ruby-prof'

result = RubyProf.profile do
  complex_operation
end

# Flat text output - methods sorted by self time
flat_printer = RubyProf::FlatPrinter.new(result)
flat_printer.print(File.open('flat.txt', 'w'))

# Graph HTML output - interactive call graph
graph_printer = RubyProf::GraphHtmlPrinter.new(result)
graph_printer.print(File.open('graph.html', 'w'))

# Call stack output - execution paths
stack_printer = RubyProf::CallStackPrinter.new(result)
stack_printer.print(File.open('stack.html', 'w'))

StackProf excels at production profiling due to minimal overhead. It samples at configurable intervals, recording stack traces. The report format shows methods consuming the most time, sorted by total samples.

require 'stackprof'

# Sample every 1,000 microseconds of CPU time
StackProf.run(mode: :cpu, interval: 1000, raw: true, out: 'profile.dump') do
  application_code
end

# Generate flamegraph data (profiling must use raw: true)
system('stackprof --flamegraph profile.dump > flamegraph.txt')
system('stackprof --flamegraph-viewer=flamegraph.txt')

Flamegraphs visualize profiling data as hierarchical rectangles. Width represents time consumed, height shows call depth. Colors distinguish different code paths. Clicking sections zooms into specific call paths.

Memory Profiler tracks allocation locations and retained objects. It identifies memory leaks by finding objects that should be garbage collected but remain referenced.

require 'memory_profiler'

report = MemoryProfiler.report(top: 50) do
  leak_code
end

# Find objects allocated but not freed
puts "Retained objects: #{report.total_retained}"
puts "Retained memory: #{report.total_retained_memsize} bytes"

# Show allocation locations
report.pretty_print(to_file: 'memory_report.txt', 
                    scale_bytes: true,
                    normalize_paths: true)

Derailed Benchmarks profiles Rails applications, measuring boot time, memory usage per request, and gem loading time. It identifies expensive gems and request handlers.

# In Gemfile
gem 'derailed_benchmarks', group: :development

# Measure memory each gem adds at require time
$ bundle exec derailed bundle:mem

# Count objects allocated when each gem loads
$ bundle exec derailed bundle:objects

# Measure application memory across repeated requests
$ bundle exec derailed exec perf:mem

Rack Mini Profiler operates as middleware, profiling each request and displaying results in the browser. It shows SQL queries, template rendering, and method call timing without external services.

Skylight, Scout APM, and New Relic provide production profiling as services. They use agent-based sampling, aggregating data across many requests. They track endpoint performance, database query patterns, and external service calls. These tools identify slow endpoints and correlate performance with deployment changes.

Benchmark IPS (iterations per second) measures operation throughput rather than elapsed time. It determines how many times an operation completes per second, accounting for variance through statistical analysis.

require 'benchmark/ips'

Benchmark.ips do |x|
  x.report("string concatenation") do
    str = ""
    100.times { str += "a" }
  end
  
  x.report("string interpolation") do
    str = ""
    100.times { str = "#{str}a" }
  end
  
  x.report("array join") do
    arr = []
    100.times { arr << "a" }
    arr.join
  end
  
  x.compare!
end

Allocation Tracer tracks object allocations per source location. It identifies allocation hot spots causing excessive garbage collection.

require 'allocation_tracer'

ObjectSpace::AllocationTracer.setup(%i[path line type])
result = ObjectSpace::AllocationTracer.trace do
  allocation_heavy_code
end

# trace returns a hash keyed by [path, line, type]; the first value is the count
result.sort_by { |_, counts| -counts.first }.first(10).each do |location, counts|
  puts "#{location.join(':')}: #{counts.first} allocations"
end

Practical Examples

Profiling a slow Rails endpoint begins with request-level timing. Rack Mini Profiler shows database queries, view rendering, and method calls contributing to response time.

# Install rack-mini-profiler
# Gemfile
gem 'rack-mini-profiler'

# Enable for development
# config/environments/development.rb
config.middleware.use Rack::MiniProfiler

# Access endpoint with profiling
# GET /users?pp=enable
# Shows breakdown:
# - SQL: 450ms (30 queries)
# - Rendering: 200ms
# - Controller: 50ms

Identifying the bottleneck reveals 30 database queries due to N+1 patterns. Profiling isolates the problem method.

# Slow controller action
def index
  @users = User.all
  # Template iterates users, triggering queries
  # <%= user.posts.count %> causes one query per user
end

# Profile specific method
require 'ruby-prof'

RubyProf.start
@users = User.includes(:posts).all
result = RubyProf.stop

printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT)

Memory profiling detects leaks in background jobs. A job processing uploaded images holds references to processed data, preventing garbage collection.

require 'memory_profiler'

# Profile job execution
report = MemoryProfiler.report do
  ImageProcessingJob.perform_now(image_id)
end

# Results show retained objects
# Retained String: 50,000 (5MB)
# Allocated at app/jobs/image_processing_job.rb:15

Examining line 15 reveals an unnecessary instance variable holding processed data.

class ImageProcessingJob < ApplicationJob
  def perform(image_id)
    image = Image.find(image_id)
    @processed_data = process_image(image) # Retained after job completes
    upload_to_s3(@processed_data)
    # Should be: upload_to_s3(process_image(image))
  end
end

Profiling algorithm performance compares implementations. A data processing pipeline filters and transforms large datasets. Profiling reveals which operations dominate execution time.

require 'benchmark'

data = (1..1_000_000).map { { value: rand(1000), category: rand(10) } }

Benchmark.bm(30) do |x|
  x.report("select then map:") do
    data.select { |h| h[:value] > 500 }
        .map { |h| h[:value] * 2 }
  end
  
  x.report("each_with_object:") do
    data.each_with_object([]) do |h, acc|
      acc << h[:value] * 2 if h[:value] > 500
    end
  end
  
  x.report("reduce:") do
    data.reduce([]) do |acc, h|
      h[:value] > 500 ? acc << h[:value] * 2 : acc
    end
  end
end

Detailed profiling with StackProf identifies allocation hot spots.

require 'stackprof'

StackProf.run(mode: :object, out: 'objects.dump') do
  result = data.select { |h| h[:value] > 500 }
              .map { |h| h[:value] * 2 }
end

# Analyze allocation patterns
report = StackProf::Report.from_file('objects.dump')
report.print_text(limit: 20)

# Shows:
# Array#select: 500,000 object allocations
# Array#map: 500,000 object allocations  
# Total: 1,000,000 intermediate objects

Profiling production performance requires sampling to minimize overhead. StackProf runs continuously, dumping profiles periodically for analysis.

# In production initializer
if ENV['ENABLE_PROFILING'] == 'true'
  require 'stackprof'
  
  # Start profiling on server boot
  StackProf.run(mode: :cpu, 
                interval: 5000,  # Sample less frequently
                out: 'tmp/stackprof.dump',
                raw: true) do
    # Application runs indefinitely
    # Profile captures samples during execution
  end
end

# Analyze collected data
# bundle exec stackprof tmp/stackprof.dump --text --limit 50

Profiling database query performance combines application profiling with database explain plans. Slow query logs identify problematic queries, while profiling shows which application code triggers them.

# Enable query logging
ActiveRecord::Base.logger = Logger.new(STDOUT)

# Profile request with database queries
require 'stackprof'

StackProf.run(mode: :wall, out: 'queries.dump') do
  User.includes(:posts).where(active: true).each do |user|
    user.posts.where(published: true).count
  end
end

# Analyze to find query-heavy methods

Common Patterns

Performance profiling workflows follow established patterns for different optimization scenarios. The profile-bottleneck-optimize-verify cycle forms the core pattern.

The diagnostic profiling pattern identifies performance problems in existing code. Start with high-level profiling showing overall time distribution. Drill into hot spots with detailed profiling. Verify bottleneck identification through multiple profiling runs. This pattern prevents premature optimization by confirming which code sections actually consume resources.

# Step 1: High-level profiling
require 'ruby-prof'

result = RubyProf.profile(measure_mode: RubyProf::WALL_TIME) do
  full_application_workflow
end

flat = RubyProf::FlatPrinter.new(result)
flat.print(STDOUT, min_percent: 5) # Show methods consuming >5% time

# Step 2: Detailed profiling of identified bottleneck
result = RubyProf.profile(measure_mode: RubyProf::PROCESS_TIME) do
  identified_slow_method
end

graph = RubyProf::GraphPrinter.new(result)
graph.print(STDOUT)

The comparison profiling pattern evaluates multiple implementations. Profile each approach under identical conditions. Compare results statistically to account for variance. This pattern validates optimization attempts, confirming improvements before committing changes.

require 'benchmark/ips'

# Compare implementations statistically
Benchmark.ips do |x|
  x.config(time: 10, warmup: 2) # Run longer for accuracy
  
  x.report("original") { original_implementation }
  x.report("optimized") { optimized_implementation }
  
  x.compare! # Shows statistical significance
end

The memory leak detection pattern tracks allocation growth over time. Profile allocation locations, identifying objects remaining in memory. Compare object counts before and after operations that should free memory.

require 'memory_profiler'

# Baseline measurement
GC.start # Start clean
before = GC.stat(:heap_live_slots)

# Run operation multiple times
report = MemoryProfiler.report do
  10.times { potentially_leaking_operation }
end

# Check for growth
GC.start
after = GC.stat(:heap_live_slots)
growth = after - before

puts "Live slots grew by: #{growth}"
report.pretty_print(retained_strings: 50)

The continuous profiling pattern monitors production performance over time. Sample a percentage of production traffic continuously. Aggregate profiling data to track performance trends. Alert on performance regression when metrics exceed thresholds.

# Middleware for selective profiling
class ProductionProfiler
  def initialize(app, sample_rate: 0.01)
    @app = app
    @sample_rate = sample_rate
  end
  
  def call(env)
    if rand < @sample_rate
      require 'stackprof'
      
      # StackProf.run returns the profile, not the block value,
      # so capture the response inside the block
      response = nil
      profile = StackProf.run(mode: :cpu, raw: true) do
        response = @app.call(env)
      end
      
      # Store profile data for analysis
      store_profile(profile[:raw], env)
      response
    else
      @app.call(env)
    end
  end
  
  def store_profile(data, env)
    # Send to aggregation service
    ProfileStorage.save(
      data: data,
      endpoint: env['PATH_INFO'],
      timestamp: Time.now
    )
  end
end

The allocation reduction pattern minimizes garbage collection pressure. Profile object allocations per operation. Identify allocation hot spots creating unnecessary objects. Modify code to reduce allocations, then verify improvement.

# Profile allocations
require 'allocation_tracer'

ObjectSpace::AllocationTracer.setup(%i[path line type])
table = ObjectSpace::AllocationTracer.trace do
  allocation_heavy_code
end
# table maps [path, line, type] keys to per-location allocation data

# Before optimization: 100,000 String allocations
# app/services/processor.rb:25

# Optimization: reuse strings, avoid intermediate arrays
def process(items)
  result = +""  # Mutable string
  items.each do |item|
    result << transform(item)
  end
  result
end

# After: 1 String allocation

The progressive profiling pattern narrows focus iteratively. Start with coarse-grained profiling showing function-level timing. Profile identified hot spots at finer granularity. Continue drilling down until the specific slow operation appears.

# Level 1: Function-level profiling
result = RubyProf.profile { slow_controller_action }
# Identifies slow_service_method consuming 80% time

# Level 2: Detailed method profiling  
result = RubyProf.profile { slow_service_method }
# Identifies database_query consuming 70% of method time

# Level 3: Query-level profiling
ActiveRecord::Base.logger = Logger.new(STDOUT)
slow_service_method
# Identifies missing index on users.email column

Common Pitfalls

Profiling without representative data produces misleading results. Development datasets often contain orders of magnitude fewer records than production. Profile performance with production-sized datasets or the bottleneck may not appear during profiling.

# Misleading profiling with small dataset
users = User.limit(10)  # Development has 10 users
profile { users.each { |u| u.posts.count } }
# Shows no performance problem

# Production has 100,000 users
# Same code exhibits severe N+1 problem

Profiling overhead distorts measurements, especially with heavy instrumentation. RubyProf may increase execution time 10-100x, changing relative timing between operations. Fast operations appear slower, altering which methods seem like bottlenecks.

# Without profiling: method_a takes 0.1s, method_b takes 0.01s  
# With profiling: method_a takes 2s, method_b takes 1.5s
# method_b now appears significant due to profiler overhead

Optimizing the wrong metric wastes effort. CPU time, wall clock time, and memory usage represent different resources. CPU-bound operations benefit from algorithmic improvements. I/O-bound operations need concurrency changes. Profiling CPU time for I/O-heavy code identifies problems unrelated to actual bottlenecks.

# Profiling CPU time for I/O-bound code
StackProf.run(mode: :cpu) do
  users.each { |u| u.posts.count }  # Database I/O
end
# Shows little CPU usage because time spent waiting on database
# Should profile wall time instead

Ignoring garbage collection impact creates incomplete performance pictures. Allocation-heavy code may appear fast in profiling but cause performance issues through garbage collection pauses. Profile both execution time and allocations.

# Fast in profiling
def process(items)
  items.map { |i| transform(i) }  # Creates intermediate array
       .select { |i| i > threshold }  # Creates another array
       .map { |i| finalize(i) }  # And another
end
# 3 intermediate arrays → heavy GC pressure in production

Profiling in the wrong environment yields non-transferable results. Development mode in Rails loads different middleware, eager loading behaves differently, and caching may be disabled. Profile in production mode or results won't reflect production performance.

Single-run profiling provides insufficient data for optimization decisions. Performance varies between runs due to system state, garbage collection timing, and JIT compilation. Run profiling multiple times, analyzing variance and median values.

# Insufficient: single run
Benchmark.measure { slow_method }

# Better: multiple runs with statistics
require 'benchmark/ips'

Benchmark.ips do |x|
  x.config(time: 10, warmup: 2)
  x.report("method") { slow_method }
end
# Provides iterations/sec with statistical confidence

Premature optimization without profiling wastes time on unimportant code. Developers often guess wrong about bottlenecks. Profile first to identify actual hot spots before optimizing.

Profiling modified code without baseline comparison prevents determining whether optimizations succeeded. Establish baseline metrics before changes, then profile again after optimization.

# Establish baseline
baseline = Benchmark.measure { original_implementation }

# Make optimization changes
# ...

# Compare after optimization  
optimized = Benchmark.measure { optimized_implementation }

speedup = baseline.real / optimized.real
puts "Speedup: #{speedup}x"

Misinterpreting inclusive vs exclusive time leads to incorrect optimization targets. Inclusive time includes called methods, making coordinator functions appear slow when they're fast but call slow functions. Focus on exclusive time to find methods doing actual slow work.

# Coordinator method shows high inclusive time
def process_all(items)
  items.each { |item| slow_processor(item) }
end
# Inclusive time: 100s (includes slow_processor)
# Exclusive time: 0.1s (just the loop overhead)
# Optimization target: slow_processor, not process_all

Profiling without accounting for concurrency produces incorrect results in multi-threaded applications. CPU time may exceed wall clock time when multiple threads execute. Per-thread profiling shows actual work distribution.
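
The divergence between CPU time and wall time under concurrency can be shown with stdlib clocks alone:

```ruby
wall_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
cpu_start  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)

# Threads waiting on I/O-like sleeps consume wall time, not CPU time
threads = 4.times.map { Thread.new { sleep 0.1 } }
threads.each(&:join)

wall = Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall_start
cpu  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID) - cpu_start

puts format('wall: %.3fs  cpu: %.3fs', wall, cpu)
# A CPU-mode profile of this code would attribute almost no time anywhere
```

The same divergence runs the other way for CPU-bound native threads, where process CPU time can exceed wall time.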

Production profiling without safety mechanisms risks impacting user experience. Continuous profiling should sample minimally, fail gracefully, and disable automatically if overhead becomes excessive.
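
A fail-safe wrapper can sketch these requirements. StackProf.start, StackProf.stop, and StackProf.running? are real StackProf methods; the wrapper name and error-handling policy are illustrative:

```ruby
def with_safe_profiling
  begin
    require 'stackprof'
    StackProf.start(mode: :wall)
  rescue LoadError, StandardError
    # Profiling must never break the request path; proceed unprofiled
  end
  yield
ensure
  begin
    StackProf.stop if defined?(StackProf) && StackProf.running?
  rescue StandardError
    nil
  end
end
```

Usage: `with_safe_profiling { handle_request }` returns the block's value whether or not profiling engaged.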

Reference

Profiling Modes Comparison

Mode               | Measures                  | Overhead       | Use Case
-------------------|---------------------------|----------------|------------------------
CPU time           | Processor cycles consumed | Low (1-3%)     | CPU-bound optimization
Wall time          | Elapsed real time         | Low (1-3%)     | I/O-bound optimization
Object allocations | Created objects           | Medium (5-10%) | Memory optimization
Process time       | Includes GC time          | Low (1-3%)     | Overall execution
Instrumentation    | Every method call         | High (10-100x) | Detailed call analysis

Ruby Profiling Tools

Tool               | Type                | Best For             | Overhead
-------------------|---------------------|----------------------|---------
RubyProf           | Instrumentation     | Detailed analysis    | High
StackProf          | Sampling            | Production profiling | Low
MemoryProfiler     | Allocation tracking | Memory leaks         | Medium
Benchmark          | Comparison          | Algorithm comparison | None
TracePoint         | Event hooks         | Custom profiling     | Variable
Rack Mini Profiler | Request profiling   | Rails debugging      | Medium

StackProf Configuration

Mode   | Samples             | Typical Use
-------|---------------------|----------------------
cpu    | CPU execution       | Find CPU bottlenecks
wall   | Elapsed time        | Find I/O bottlenecks
object | Object allocations  | Reduce GC pressure
custom | User-defined events | Specialized profiling

RubyProf Measurement Modes

Mode         | Constant     | Measures
-------------|--------------|-------------------
Wall time    | WALL_TIME    | Real elapsed time
Process time | PROCESS_TIME | CPU time + GC time
Allocations  | ALLOCATIONS  | Objects created
Memory       | MEMORY       | Bytes allocated

RubyProf Printers

Printer          | Output           | Use Case
-----------------|------------------|----------------
FlatPrinter      | Text table       | Quick overview
GraphPrinter     | Call graph       | Execution flow
GraphHtmlPrinter | Interactive HTML | Visual analysis
CallStackPrinter | Stack traces     | Path analysis
DotPrinter       | Graphviz format  | Visualization

Profiling Workflow Checklist

Step                      | Action                         | Validation
--------------------------|--------------------------------|-------------------------------
Establish baseline        | Profile current implementation | Multiple runs for variance
Identify bottlenecks      | Analyze hot spots              | Confirm with focused profiling
Hypothesize cause         | Determine bottleneck reason    | Check assumptions with data
Implement optimization    | Modify code                    | Keep original for comparison
Profile optimized version | Measure improvement            | Compare against baseline
Verify in production      | Monitor live metrics           | Confirm improvement transfers

Common Profiling Flags

Flag              | Purpose              | Example
------------------|----------------------|-------------------------------
min_percent       | Filter small methods | min_percent: 2
measure_mode      | Timing basis         | RubyProf::WALL_TIME
eliminate_methods | Exclude patterns     | eliminate_methods: [/^Kernel/]
raw               | Preserve raw data    | raw: true
interval          | Sampling frequency   | interval: 1000

Memory Profiling Metrics

Metric                  | Description                   | Significance
------------------------|-------------------------------|------------------------
Allocated objects       | Total objects created         | Allocation pressure
Retained objects        | Objects not garbage collected | Memory leak indicator
Allocated memory        | Total bytes allocated         | Memory throughput
Retained memory         | Bytes remaining in heap       | Memory leak size
Allocations by location | Objects per source line       | Hot spot identification

Flamegraph Interpretation

Visual Element  | Meaning              | Optimization Signal
----------------|----------------------|--------------------
Width           | Time consumed        | Wider = more time
Height          | Call stack depth     | Deep = complex flow
Flat top        | Direct work          | Optimization target
Plateau         | Called methods       | Check callees
Color intensity | Sample concentration | Darker = hot spot

Benchmark Output Fields

Field      | Meaning         | Units
-----------|-----------------|--------
user       | User CPU time   | Seconds
system     | System CPU time | Seconds
total      | User + system   | Seconds
real       | Wall clock time | Seconds
iterations | Times executed  | Count

Profiling Command Examples

# Basic RubyProf profiling
result = RubyProf.profile { target_code }
RubyProf::FlatPrinter.new(result).print(STDOUT)

# StackProf with flamegraph generation
StackProf.run(mode: :cpu, out: 'profile.dump') { target_code }
# Generate: stackprof --flamegraph profile.dump > flame.txt

# Memory profiling with filtering  
report = MemoryProfiler.report(top: 50) { target_code }
report.pretty_print(scale_bytes: true)

# Benchmark comparison
Benchmark.bm(20) do |x|
  x.report("implementation_a") { code_a }
  x.report("implementation_b") { code_b }  
end

# Production sampling
StackProf.run(mode: :cpu, interval: 10000, raw: true) do
  # Runs continuously with minimal overhead
end

GC Metrics for Profiling

Metric            | Method                    | Indicates
------------------|---------------------------|--------------------------
Heap slots        | GC.stat(:heap_live_slots) | Live objects
GC count          | GC.count                  | Collection frequency
GC time           | GC::Profiler.total_time   | Collection overhead
Minor collections | GC.stat(:minor_gc_count)  | Young generation pressure
Major collections | GC.stat(:major_gc_count)  | Old generation pressure

Profiling Environment Variables

Variable                   | Effect               | Example
---------------------------|----------------------|-------------
RUBY_GC_HEAP_GROWTH_FACTOR | Heap growth rate     | 1.8
RUBY_GC_HEAP_INIT_SLOTS    | Initial heap size    | 600000
RUBY_GC_MALLOC_LIMIT       | GC trigger threshold | 16000000
RUBY_PROF_MEASURE_MODE     | Default measurement  | process_time

Statistical Significance Thresholds

Confidence Level | Minimum Runs | Use Case
-----------------|--------------|----------------------
90%              | 10 runs      | Quick validation
95%              | 30 runs      | Standard comparison
99%              | 50 runs      | Critical optimization
Exploratory      | 5 runs       | Initial investigation