Overview
Performance profiling measures how programs execute, tracking metrics like execution time, memory consumption, CPU usage, and method call frequency. Profiling provides empirical data about program behavior during runtime, revealing where computational resources are spent and identifying optimization opportunities.
Profiling differs from benchmarking. Benchmarking measures overall execution time under controlled conditions to compare implementations. Profiling examines internal program behavior during execution, breaking down resource consumption by function, line, or operation. A benchmark might show one algorithm runs faster than another; profiling reveals why by exposing which operations consume the most time.
The profiling process involves three stages: instrumentation, data collection, and analysis. Instrumentation adds measurement code to track execution. Data collection captures metrics during program execution. Analysis interprets collected data to identify bottlenecks and optimization targets.
Profilers operate through sampling or instrumentation. Sampling profilers periodically check program state, recording which code is executing at each sample point. Instrumentation profilers inject measurement code into the program, tracking every relevant event. Sampling introduces minimal overhead but provides statistical approximations. Instrumentation captures precise data but significantly impacts execution speed.
# Sampling profiler example - checks program state periodically
# Shows statistical view of where time is spent
require 'stackprof'
StackProf.run(mode: :cpu, out: 'profile.dump') do
  10_000.times { expensive_operation }
end
# Instrumentation profiler example - tracks every method call
# Shows exact call counts and timing
require 'ruby-prof'
result = RubyProf.profile do
  10_000.times { expensive_operation }
end
Profiling serves multiple objectives. Performance optimization identifies slow code paths for improvement. Capacity planning determines resource requirements for production deployment. Debugging reveals unexpected behavior manifesting as performance issues. Performance regression detection catches optimization losses between versions.
Key Principles
Performance profiling rests on measurement accuracy, statistical validity, and representative workloads. Measurements must reflect actual program behavior without significant distortion from profiling overhead. Statistical validity requires sufficient sample sizes and repeated measurements to account for variance. Representative workloads ensure profiling results transfer to production scenarios.
The observer effect describes how measurement changes the measured system. Profiling adds computational overhead, altering execution characteristics. Instrumentation profilers may increase execution time by 10-100x. Sampling profilers typically add 1-5% overhead. Heavier instrumentation provides more detailed data at the cost of less representative behavior.
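The observer effect is easy to reproduce with the standard library alone: enabling a per-call TracePoint hook slows the same workload measurably. A minimal sketch (the `step` and `workload` methods are illustrative stand-ins, and the exact slowdown varies by machine):

```ruby
require 'benchmark'

# Illustrative workload: a plain Ruby method called many times
def step(n)
  n * 2
end

def workload
  20_000.times { |n| step(n) }
end

# Baseline: no instrumentation
plain = Benchmark.realtime { workload }

# Instrumented: a TracePoint hook fires on every Ruby method call
trace = TracePoint.new(:call) { |tp| tp.method_id }
traced = Benchmark.realtime { trace.enable { workload } }

puts format('plain: %.4fs, traced: %.4fs (%.1fx slower)',
            plain, traced, traced / plain)
```

The same principle applies to full profilers: the heavier the hook, the less the measured timings resemble unprofiled execution.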
Profiling granularity determines measurement detail. Function-level profiling tracks execution time per method. Line-level profiling measures individual statement execution. Block-level profiling examines code regions. Higher granularity increases data volume and overhead while providing finer optimization targets.
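As a sketch of what line-level granularity means, Ruby's built-in TracePoint can count executions per source line. This is a toy line profiler, not a production tool, but it shows the kind of data line-level instrumentation collects:

```ruby
# Count how many times each line executes using the :line event
line_counts = Hash.new(0)
tracer = TracePoint.new(:line) do |tp|
  line_counts["#{File.basename(tp.path)}:#{tp.lineno}"] += 1
end

tracer.enable do
  total = 0
  10.times do |i|
    total += i
  end
end

# The line inside the times block dominates the counts
line_counts.sort_by { |_line, count| -count }.first(3).each do |line, count|
  puts "#{line}: #{count}"
end
```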
Hot spots represent code sections consuming disproportionate resources. The 90/10 rule suggests 90% of execution time concentrates in 10% of code. Profiling identifies these hot spots for targeted optimization. A function called once but running slowly may matter less than a fast function called millions of times.
# Hot spot example - seemingly fast operation becomes bottleneck
def process_records(records)
  records.map do |record|
    # Fast operation: 0.0001 seconds
    validate(record)
  end
end
# Called with 1,000,000 records
# Total time: 0.0001 * 1,000,000 = 100 seconds
# This becomes the hot spot despite individual speed
Call graphs represent program execution flow, showing which functions call which and how often. Call depth reveals function nesting levels. Exclusive time measures time spent within a function excluding called functions. Inclusive time includes time spent in the function and all functions it calls. These metrics distinguish between functions doing work directly versus coordinating other functions.
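A hand-rolled illustration of the inclusive/exclusive distinction, using sleep as a stand-in for slow callees (a real profiler computes this subtraction automatically per call-graph node):

```ruby
require 'benchmark'

def slow_worker
  sleep 0.02 # stand-in for real work
end

def coordinator
  3.times { slow_worker }
end

# Inclusive time: everything that happens inside the call
inclusive = Benchmark.realtime { coordinator }

# Exclusive time: inclusive minus time spent in callees
callee_time = Benchmark.realtime { 3.times { slow_worker } }
exclusive = inclusive - callee_time

puts format('coordinator inclusive: %.3fs, exclusive: ~%.3fs', inclusive, exclusive)
# coordinator is a pure coordinator: its exclusive time is near zero,
# so slow_worker, not coordinator, is the optimization target
```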
Memory profiling tracks allocation patterns, object lifetimes, and garbage collection behavior. Allocation profiling counts object creation by type and location. Retention profiling identifies objects remaining in memory when expected to be garbage collected. Generation analysis examines object survival across garbage collection cycles.
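These ideas can be observed without any gems: ObjectSpace tallies heap objects by internal type, and comparing counts around an operation approximates allocation profiling. Counts are approximate because garbage collection timing varies between runs:

```ruby
# Approximate allocation counting with ObjectSpace
GC.start # settle the heap first
before = ObjectSpace.count_objects[:T_STRING]

records = Array.new(10_000) { |i| "record-#{i}" } # each iteration builds a String

after = ObjectSpace.count_objects[:T_STRING]
puts "strings allocated (approx): #{after - before}"
puts "objects retained: #{records.size}"
```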
Baseline establishment compares performance measurements against reference points. Baselines may represent previous application versions, competing implementations, or theoretical limits. Regression testing detects performance decreases between versions. Performance budgets set acceptable thresholds for operations.
Statistical significance addresses measurement variance. Multiple profiling runs produce different results due to system state variations. Outliers may represent rare edge cases or measurement errors. Confidence intervals quantify measurement uncertainty. Averaging multiple runs reduces noise but may obscure important variance.
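A minimal sketch of quantifying run-to-run variance with the standard library, summarizing repeated timings with mean and standard deviation (the workload is arbitrary):

```ruby
require 'benchmark'

# Time the same workload several times and summarize the spread
times = Array.new(5) do
  Benchmark.realtime { 100_000.times { |n| Math.sqrt(n) } }
end

mean = times.sum / times.size
stddev = Math.sqrt(times.sum { |t| (t - mean)**2 } / times.size)

puts format('runs: %d  mean: %.5fs  stddev: %.5fs', times.size, mean, stddev)
```

A large standard deviation relative to the mean signals that more runs, or a quieter machine, are needed before trusting the numbers.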
Ruby Implementation
Ruby provides multiple profiling approaches through standard library modules and third-party gems. The ruby-prof gem offers comprehensive instrumentation-based profiling with multiple output formats. StackProf provides sampling-based profiling integrated with Ruby's internal structures. The Benchmark module enables controlled timing comparisons.
Ruby-prof operates through explicit profiling blocks. Starting the profiler enables instrumentation. The profiled code executes with measurement overhead. Stopping returns results containing timing and call data.
require 'ruby-prof'
# Start profiling
RubyProf.start
# Code to profile
def calculate_fibonacci(n)
  return n if n <= 1
  calculate_fibonacci(n - 1) + calculate_fibonacci(n - 2)
end
result = calculate_fibonacci(25)
# Stop profiling and get results
profile_result = RubyProf.stop
# Print flat profile sorted by self time
printer = RubyProf::FlatPrinter.new(profile_result)
printer.print(STDOUT, min_percent: 2)
StackProf samples the call stack at regular intervals, recording which methods are executing. The mode parameter determines the sampling basis: :cpu samples by CPU time, :wall by wall-clock time, and :object by object allocations.
require 'stackprof'
# Profile CPU usage
StackProf.run(mode: :cpu, out: 'cpu_profile.dump') do
  array = (1..1_000_000).to_a
  array.each { |n| Math.sqrt(n) }
end
# Profile object allocations
StackProf.run(mode: :object, out: 'object_profile.dump') do
  1_000.times { Array.new(1000) { rand } }
end
# Load the dumped results and analyze them
StackProf::Report.new(Marshal.load(IO.binread('object_profile.dump'))).print_text
The memory_profiler gem tracks object allocations and retention. It reports allocated objects by class, location, and retention status.
require 'memory_profiler'
report = MemoryProfiler.report do
  array = []
  10_000.times do |i|
    array << { id: i, data: "Record #{i}" * 10 }
  end
end
# Show allocation breakdown
report.pretty_print(scale_bytes: true)
TracePoint provides low-level event hooks into Ruby execution. It captures method calls, returns, line execution, and object allocation events.
trace = TracePoint.new(:call, :return) do |tp|
  puts "#{tp.event} - #{tp.method_id} in #{tp.path}:#{tp.lineno}"
end
trace.enable do
  some_method
end
The Benchmark module measures execution time for code blocks. It compares multiple implementations under identical conditions.
require 'benchmark'
n = 1_000_000
Benchmark.bm(20) do |x|
  x.report("Array#each:") do
    arr = (1..n).to_a
    arr.each { |i| i * 2 }
  end
  x.report("Array#map:") do
    arr = (1..n).to_a
    arr.map { |i| i * 2 }
  end
  x.report("for loop:") do
    arr = (1..n).to_a
    for i in arr
      i * 2
    end
  end
end
Production profiling requires minimal overhead and safe failure modes. Rack Mini Profiler integrates into Rails applications, providing per-request profiling accessible through browser interfaces.
# In Gemfile
gem 'rack-mini-profiler'
# In config/environments/development.rb
config.middleware.use(Rack::MiniProfiler)
# Access profiling data by appending ?pp=enable to URLs
# Displays SQL queries, rendering time, allocations per request
Datadog, New Relic, and Skylight offer production profiling services. They sample production traffic, aggregate metrics, and provide visualization interfaces. These services balance overhead with data collection, typically sampling 1-5% of requests.
Tools & Ecosystem
Ruby's profiling ecosystem includes sampling profilers, instrumentation profilers, memory analyzers, and production monitoring tools. Each tool addresses specific profiling scenarios with different trade-offs.
RubyProf provides the most comprehensive instrumentation profiling. It supports multiple output formats: flat profiles showing per-method metrics, graph profiles displaying call relationships, and call stack profiles revealing execution paths. Graph HTML output generates interactive visualizations showing call hierarchies with timing data.
require 'ruby-prof'
result = RubyProf.profile do
  complex_operation
end
# Flat text output - methods sorted by self time
File.open('flat.txt', 'w') do |file|
  RubyProf::FlatPrinter.new(result).print(file)
end
# Graph HTML output - interactive call graph
File.open('graph.html', 'w') do |file|
  RubyProf::GraphHtmlPrinter.new(result).print(file)
end
# Call stack output - execution paths
File.open('stack.html', 'w') do |file|
  RubyProf::CallStackPrinter.new(result).print(file)
end
StackProf excels at production profiling due to minimal overhead. It samples at configurable intervals, recording stack traces. The report format shows methods consuming the most time, sorted by total samples.
require 'stackprof'
# Sample every 1000 microseconds (1 ms) of CPU time; raw data is
# required for flamegraph generation
StackProf.run(mode: :cpu, interval: 1000, raw: true, out: 'profile.dump') do
  application_code
end
# Generate flamegraph visualization
system('stackprof --stackcollapse profile.dump > collapsed.txt')
system('flamegraph.pl collapsed.txt > flamegraph.svg')
Flamegraphs visualize profiling data as hierarchical rectangles. Width represents time consumed, height shows call depth. Colors distinguish different code paths. Clicking sections zooms into specific call paths.
Memory Profiler tracks allocation locations and retained objects. It identifies memory leaks by finding objects that should be garbage collected but remain referenced.
require 'memory_profiler'
report = MemoryProfiler.report(top: 50) do
  leak_code
end
# Find objects allocated but not freed
puts "Retained objects: #{report.total_retained}"
puts "Retained memory: #{report.total_retained_memsize} bytes"
# Show allocation locations
report.pretty_print(to_file: 'memory_report.txt',
                    scale_bytes: true,
                    normalize_paths: true)
Derailed Benchmarks profiles Rails applications, measuring boot time, memory usage per request, and gem loading time. It identifies expensive gems and request handlers.
# In Gemfile
gem 'derailed_benchmarks', group: :development
# Measure memory used by each gem at require time
$ bundle exec derailed bundle:mem
# Count objects allocated while requiring gems
$ bundle exec derailed bundle:objects
# Measure live memory across repeated requests
$ bundle exec derailed exec perf:mem
Rack Mini Profiler operates as middleware, profiling each request and displaying results in the browser. It shows SQL queries, template rendering, and method call timing without external services.
Skylight, Scout APM, and New Relic provide production profiling as services. They use agent-based sampling, aggregating data across many requests. They track endpoint performance, database query patterns, and external service calls. These tools identify slow endpoints and correlate performance with deployment changes.
Benchmark IPS (iterations per second) measures operation throughput rather than elapsed time. It determines how many times an operation completes per second, accounting for variance through statistical analysis.
require 'benchmark/ips'
Benchmark.ips do |x|
  x.report("string concatenation") do
    str = ""
    100.times { str += "a" }
  end
  x.report("string interpolation") do
    str = ""
    100.times { str = "#{str}a" }
  end
  x.report("array join") do
    arr = []
    100.times { arr << "a" }
    arr.join
  end
  x.compare!
end
Allocation Tracer tracks object allocations per source location. It identifies allocation hot spots causing excessive garbage collection.
require 'allocation_tracer'
ObjectSpace::AllocationTracer.setup(%i[path line type])
result = ObjectSpace::AllocationTracer.trace do
  allocation_heavy_code
end
# trace returns { [path, line, type] => [count, old_count, ...] }
result.sort_by { |_key, info| -info.first }.first(10).each do |(path, line, type), info|
  puts "#{path}:#{line} (#{type}): #{info.first} allocations"
end
Practical Examples
Profiling a slow Rails endpoint begins with request-level timing. Rack Mini Profiler shows database queries, view rendering, and method calls contributing to response time.
# Install rack-mini-profiler
# Gemfile
gem 'rack-mini-profiler'
# Enable for development
# config/environments/development.rb
config.middleware.use Rack::MiniProfiler
# Access endpoint with profiling
# GET /users?pp=enable
# Shows breakdown:
# - SQL: 450ms (30 queries)
# - Rendering: 200ms
# - Controller: 50ms
Identifying the bottleneck reveals 30 database queries due to N+1 patterns. Profiling isolates the problem method.
# Slow controller action
def index
  @users = User.all
  # Template iterates users, triggering queries
  # <%= user.posts.count %> causes one query per user
end
# Profile specific method
require 'ruby-prof'
RubyProf.start
@users = User.includes(:posts).all
result = RubyProf.stop
printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT)
Memory profiling detects leaks in background jobs. A job processing uploaded images holds references to processed data, preventing garbage collection.
require 'memory_profiler'
# Profile job execution
report = MemoryProfiler.report do
  ImageProcessingJob.perform_now(image_id)
end
# Results show retained objects
# Retained String: 50,000 (5MB)
# Allocated at app/jobs/image_processing_job.rb:15
Examining line 15 reveals an unnecessary instance variable holding processed data.
class ImageProcessingJob < ApplicationJob
  def perform(image_id)
    image = Image.find(image_id)
    @processed_data = process_image(image) # Retained after job completes
    upload_to_s3(@processed_data)
    # Should be: upload_to_s3(process_image(image))
  end
end
Profiling algorithm performance compares implementations. A data processing pipeline filters and transforms large datasets. Profiling reveals which operations dominate execution time.
require 'benchmark'
data = (1..1_000_000).map { { value: rand(1000), category: rand(10) } }
Benchmark.bm(30) do |x|
  x.report("select then map:") do
    data.select { |h| h[:value] > 500 }
        .map { |h| h[:value] * 2 }
  end
  x.report("each_with_object:") do
    data.each_with_object([]) do |h, acc|
      acc << h[:value] * 2 if h[:value] > 500
    end
  end
  x.report("reduce:") do
    data.reduce([]) do |acc, h|
      h[:value] > 500 ? acc << h[:value] * 2 : acc
    end
  end
end
Detailed profiling with StackProf identifies allocation hot spots.
require 'stackprof'
StackProf.run(mode: :object, out: 'objects.dump') do
  result = data.select { |h| h[:value] > 500 }
               .map { |h| h[:value] * 2 }
end
# Analyze allocation patterns from the dump file
report = StackProf::Report.new(Marshal.load(IO.binread('objects.dump')))
report.print_text(false, 20) # sort by self samples, show top 20
# Shows:
# Array#select: 500,000 object allocations
# Array#map: 500,000 object allocations
# Total: 1,000,000 intermediate objects
Profiling production performance requires sampling to minimize overhead. StackProf runs continuously, dumping profiles periodically for analysis.
# In production initializer
if ENV['ENABLE_PROFILING'] == 'true'
  require 'stackprof'
  # StackProf.run blocks until its block returns, so a long-running
  # server starts the profiler explicitly and dumps results on shutdown
  StackProf.start(mode: :cpu,
                  interval: 5000, # Sample less frequently
                  raw: true)
  at_exit do
    StackProf.stop
    StackProf.results('tmp/stackprof.dump')
  end
end
# Analyze collected data
# bundle exec stackprof tmp/stackprof.dump --text --limit 50
Profiling database query performance combines application profiling with database explain plans. Slow query logs identify problematic queries, while profiling shows which application code triggers them.
# Enable query logging
ActiveRecord::Base.logger = Logger.new(STDOUT)
# Profile request with database queries
require 'stackprof'
StackProf.run(mode: :wall, out: 'queries.dump') do
  User.includes(:posts).where(active: true).each do |user|
    user.posts.where(published: true).count
  end
end
# Analyze to find query-heavy methods
Common Patterns
Performance profiling workflows follow established patterns for different optimization scenarios. The profile-bottleneck-optimize-verify cycle forms the core pattern.
The diagnostic profiling pattern identifies performance problems in existing code. Start with high-level profiling showing overall time distribution. Drill into hot spots with detailed profiling. Verify bottleneck identification through multiple profiling runs. This pattern prevents premature optimization by confirming which code sections actually consume resources.
# Step 1: High-level profiling
require 'ruby-prof'
result = RubyProf.profile(measure_mode: RubyProf::WALL_TIME) do
  full_application_workflow
end
flat = RubyProf::FlatPrinter.new(result)
flat.print(STDOUT, min_percent: 5) # Show methods consuming >5% time
# Step 2: Detailed profiling of identified bottleneck
result = RubyProf.profile(measure_mode: RubyProf::PROCESS_TIME) do
  identified_slow_method
end
graph = RubyProf::GraphPrinter.new(result)
graph.print(STDOUT)
The comparison profiling pattern evaluates multiple implementations. Profile each approach under identical conditions. Compare results statistically to account for variance. This pattern validates optimization attempts, confirming improvements before committing changes.
require 'benchmark/ips'
# Compare implementations statistically
Benchmark.ips do |x|
  x.config(time: 10, warmup: 2) # Run longer for accuracy
  x.report("original") { original_implementation }
  x.report("optimized") { optimized_implementation }
  x.compare! # Shows statistical significance
end
The memory leak detection pattern tracks allocation growth over time. Profile allocation locations, identifying objects remaining in memory. Compare object counts before and after operations that should free memory.
require 'memory_profiler'
# Baseline measurement
GC.start # Start clean
before = GC.stat(:heap_live_slots)
# Run operation multiple times
report = MemoryProfiler.report do
  10.times { potentially_leaking_operation }
end
# Check for growth
GC.start
after = GC.stat(:heap_live_slots)
growth = after - before
puts "Live slots grew by: #{growth}"
report.pretty_print(retained_strings: 50)
The continuous profiling pattern monitors production performance over time. Sample a percentage of production traffic continuously. Aggregate profiling data to track performance trends. Alert on performance regression when metrics exceed thresholds.
# Middleware for selective profiling
class ProductionProfiler
  def initialize(app, sample_rate: 0.01)
    @app = app
    @sample_rate = sample_rate
  end

  def call(env)
    return @app.call(env) unless rand < @sample_rate

    require 'stackprof'
    response = nil
    # StackProf.run returns the profile data, not the block's value,
    # so capture the Rack response separately
    profile = StackProf.run(mode: :cpu, raw: true) do
      response = @app.call(env)
    end
    # Store profile data for analysis
    store_profile(profile[:raw], env)
    response
  end

  def store_profile(data, env)
    # Send to aggregation service
    ProfileStorage.save(
      data: data,
      endpoint: env['PATH_INFO'],
      timestamp: Time.now
    )
  end
end
The allocation reduction pattern minimizes garbage collection pressure. Profile object allocations per operation. Identify allocation hot spots creating unnecessary objects. Modify code to reduce allocations, then verify improvement.
# Profile allocations
require 'allocation_tracer'
ObjectSpace::AllocationTracer.setup(%i[path line type])
table = ObjectSpace::AllocationTracer.trace do
  allocation_heavy_code
end
# table keys are [path, line, type]; values begin with the count
# Before optimization: 100,000 String allocations
# app/services/processor.rb:25
# Optimization: reuse strings, avoid intermediate arrays
def process(items)
  result = +"" # Mutable string
  items.each do |item|
    result << transform(item)
  end
  result
end
# After: 1 String allocation
The progressive profiling pattern narrows focus iteratively. Start with coarse-grained profiling showing function-level timing. Profile identified hot spots at finer granularity. Continue drilling down until the specific slow operation appears.
# Level 1: Function-level profiling
result = RubyProf.profile { slow_controller_action }
# Identifies slow_service_method consuming 80% time
# Level 2: Detailed method profiling
result = RubyProf.profile { slow_service_method }
# Identifies database_query consuming 70% of method time
# Level 3: Query-level profiling
ActiveRecord::Base.logger = Logger.new(STDOUT)
slow_service_method
# Identifies missing index on users.email column
Common Pitfalls
Profiling without representative data produces misleading results. Development datasets often contain orders of magnitude fewer records than production. Profile performance with production-sized datasets or the bottleneck may not appear during profiling.
# Misleading profiling with small dataset
users = User.limit(10) # Development has 10 users
profile { users.each { |u| u.posts.count } }
# Shows no performance problem
# Production has 100,000 users
# Same code exhibits severe N+1 problem
Profiling overhead distorts measurements, especially with heavy instrumentation. RubyProf may increase execution time 10-100x, changing relative timing between operations. Fast operations appear slower, altering which methods seem like bottlenecks.
# Without profiling: method_a takes 0.1s, method_b takes 0.01s
# With profiling: method_a takes 2s, method_b takes 1.5s
# method_b now appears significant due to profiler overhead
Optimizing the wrong metric wastes effort. CPU time, wall clock time, and memory usage represent different resources. CPU-bound operations benefit from algorithmic improvements. I/O-bound operations need concurrency changes. Profiling CPU time for I/O-heavy code identifies problems unrelated to actual bottlenecks.
# Profiling CPU time for I/O-bound code
StackProf.run(mode: :cpu) do
  users.each { |u| u.posts.count } # Database I/O
end
# Shows little CPU usage because time spent waiting on database
# Should profile wall time instead
Ignoring garbage collection impact creates incomplete performance pictures. Allocation-heavy code may appear fast in profiling but cause performance issues through garbage collection pauses. Profile both execution time and allocations.
# Fast in profiling
def process(items)
  items.map { |i| transform(i) }     # Creates intermediate array
       .select { |i| i > threshold } # Creates another array
       .map { |i| finalize(i) }      # And another
end
# 3 intermediate arrays → heavy GC pressure in production
Profiling in the wrong environment yields non-transferable results. Development mode in Rails loads different middleware, eager loading behaves differently, and caching may be disabled. Profile in production mode or results won't reflect production performance.
Single-run profiling provides insufficient data for optimization decisions. Performance varies between runs due to system state, garbage collection timing, and JIT compilation. Run profiling multiple times, analyzing variance and median values.
# Insufficient: single run
Benchmark.measure { slow_method }
# Better: multiple runs with statistics
require 'benchmark/ips'
Benchmark.ips do |x|
  x.config(time: 10, warmup: 2)
  x.report("method") { slow_method }
end
# Provides iterations/sec with statistical confidence
Premature optimization without profiling wastes time on unimportant code. Developers often guess wrong about bottlenecks. Profile first to identify actual hot spots before optimizing.
Profiling modified code without baseline comparison prevents determining whether optimizations succeeded. Establish baseline metrics before changes, then profile again after optimization.
# Establish baseline
baseline = Benchmark.measure { original_implementation }
# Make optimization changes
# ...
# Compare after optimization
optimized = Benchmark.measure { optimized_implementation }
speedup = baseline.real / optimized.real
puts "Speedup: #{speedup}x"
Misinterpreting inclusive vs exclusive time leads to incorrect optimization targets. Inclusive time includes called methods, making coordinator functions appear slow when they're fast but call slow functions. Focus on exclusive time to find methods doing actual slow work.
# Coordinator method shows high inclusive time
def process_all(items)
  items.each { |item| slow_processor(item) }
end
# Inclusive time: 100s (includes slow_processor)
# Exclusive time: 0.1s (just the loop overhead)
# Optimization target: slow_processor, not process_all
Profiling without accounting for concurrency produces incorrect results in multi-threaded applications. CPU time may exceed wall clock time when multiple threads execute. Per-thread profiling shows actual work distribution.
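The divergence between CPU time and wall-clock time under concurrency can be seen directly with Process.clock_gettime. This sketch uses sleeping threads to mimic I/O waits, where the effect is unambiguous (in CRuby, CPU-bound threads are serialized by the GVL, so that case behaves differently):

```ruby
# Compare process CPU time with wall-clock time for I/O-like waiting
cpu0  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)
wall0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)

threads = Array.new(4) { Thread.new { sleep 0.1 } } # stand-in for I/O waits
threads.each(&:join)

cpu  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID) - cpu0
wall = Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall0

puts format('cpu: %.3fs, wall: %.3fs', cpu, wall)
# Wall time is ~0.1s (the threads wait concurrently) while CPU time is
# near zero - profiling CPU time here would show almost nothing
```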
Production profiling without safety mechanisms risks impacting user experience. Continuous profiling should sample minimally, fail gracefully, and disable automatically if overhead becomes excessive.
Reference
Profiling Modes Comparison
| Mode | Measures | Overhead | Use Case |
|---|---|---|---|
| CPU time | Processor cycles consumed | Low (1-3%) | CPU-bound optimization |
| Wall time | Elapsed real time | Low (1-3%) | I/O-bound optimization |
| Object allocations | Created objects | Medium (5-10%) | Memory optimization |
| Process time | Includes GC time | Low (1-3%) | Overall execution |
| Instrumentation | Every method call | High (10-100x) | Detailed call analysis |
Ruby Profiling Tools
| Tool | Type | Best For | Overhead |
|---|---|---|---|
| RubyProf | Instrumentation | Detailed analysis | High |
| StackProf | Sampling | Production profiling | Low |
| MemoryProfiler | Allocation tracking | Memory leaks | Medium |
| Benchmark | Comparison | Algorithm comparison | None |
| TracePoint | Event hooks | Custom profiling | Variable |
| Rack Mini Profiler | Request profiling | Rails debugging | Medium |
StackProf Configuration
| Mode | Samples | Typical Use |
|---|---|---|
| cpu | CPU execution | Find CPU bottlenecks |
| wall | Elapsed time | Find I/O bottlenecks |
| object | Object allocations | Reduce GC pressure |
| custom | User-defined events | Specialized profiling |
RubyProf Measurement Modes
| Mode | Constant | Measures |
|---|---|---|
| Wall time | WALL_TIME | Real elapsed time |
| Process time | PROCESS_TIME | CPU time + GC time |
| Allocations | ALLOCATIONS | Objects created |
| Memory | MEMORY | Bytes allocated |
RubyProf Printers
| Printer | Output | Use Case |
|---|---|---|
| FlatPrinter | Text table | Quick overview |
| GraphPrinter | Call graph | Execution flow |
| GraphHtmlPrinter | Interactive HTML | Visual analysis |
| CallStackPrinter | Stack traces | Path analysis |
| DotPrinter | Graphviz format | Visualization |
Profiling Workflow Checklist
| Step | Action | Validation |
|---|---|---|
| Establish baseline | Profile current implementation | Multiple runs for variance |
| Identify bottlenecks | Analyze hot spots | Confirm with focused profiling |
| Hypothesize cause | Determine bottleneck reason | Check assumptions with data |
| Implement optimization | Modify code | Keep original for comparison |
| Profile optimized version | Measure improvement | Compare against baseline |
| Verify in production | Monitor live metrics | Confirm improvement transfers |
Common Profiling Flags
| Flag | Purpose | Example |
|---|---|---|
| min_percent | Filter small methods | min_percent: 2 |
| measure_mode | Timing basis | RubyProf::WALL_TIME |
| eliminate_methods | Exclude patterns | eliminate_methods: [/^Kernel/] |
| raw | Preserve raw data | raw: true |
| interval | Sampling frequency | interval: 1000 |
Memory Profiling Metrics
| Metric | Description | Significance |
|---|---|---|
| Allocated objects | Total objects created | Allocation pressure |
| Retained objects | Objects not garbage collected | Memory leak indicator |
| Allocated memory | Total bytes allocated | Memory throughput |
| Retained memory | Bytes remaining in heap | Memory leak size |
| Allocations by location | Objects per source line | Hot spot identification |
Flamegraph Interpretation
| Visual Element | Meaning | Optimization Signal |
|---|---|---|
| Width | Time consumed | Wider = more time |
| Height | Call stack depth | Deep = complex flow |
| Flat top | Direct work | Optimization target |
| Plateau | Called methods | Check callees |
| Color intensity | Sample concentration | Darker = hot spot |
Benchmark Output Fields
| Field | Meaning | Units |
|---|---|---|
| user | User CPU time | Seconds |
| system | System CPU time | Seconds |
| total | User + system | Seconds |
| real | Wall clock time | Seconds |
| iterations | Times executed | Count |
Profiling Command Examples
# Basic RubyProf profiling
result = RubyProf.profile { target_code }
RubyProf::FlatPrinter.new(result).print(STDOUT)
# StackProf with flamegraph generation
StackProf.run(mode: :cpu, out: 'profile.dump') { target_code }
# Generate: stackprof --flamegraph profile.dump > flame.txt
# Memory profiling with filtering
report = MemoryProfiler.report(top: 50) { target_code }
report.pretty_print(scale_bytes: true)
# Benchmark comparison
Benchmark.bm(20) do |x|
x.report("implementation_a") { code_a }
x.report("implementation_b") { code_b }
end
# Production sampling
StackProf.run(mode: :cpu, interval: 10000, raw: true) do
  # Runs continuously with minimal overhead
end
GC Metrics for Profiling
| Metric | Method | Indicates |
|---|---|---|
| Heap slots | GC.stat(:heap_live_slots) | Live objects |
| GC count | GC.count | Collection frequency |
| GC time | GC::Profiler.total_time | Collection overhead |
| Minor collections | GC.stat(:minor_gc_count) | Young generation pressure |
| Major collections | GC.stat(:major_gc_count) | Old generation pressure |
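The GC metrics above can be exercised directly; note that GC::Profiler must be enabled before it accumulates collection timings (the churn workload here is arbitrary):

```ruby
# Collect the GC metrics listed above around an allocation-heavy workload
GC::Profiler.enable
count_before = GC.count

50.times { Array.new(10_000) { +"x" } } # churn to provoke collections
GC.start # force at least one collection before reading stats

puts "collections: #{GC.count - count_before}"
puts "live slots: #{GC.stat(:heap_live_slots)}"
puts format('GC time: %.4fs', GC::Profiler.total_time)
GC::Profiler.disable
```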
Profiling Environment Variables
| Variable | Effect | Example |
|---|---|---|
| RUBY_GC_HEAP_GROWTH_FACTOR | Heap growth rate | 1.8 |
| RUBY_GC_HEAP_INIT_SLOTS | Initial heap size | 600000 |
| RUBY_GC_MALLOC_LIMIT | GC trigger threshold | 16000000 |
| RUBY_PROF_MEASURE_MODE | Default measurement | process_time |
Statistical Significance Thresholds
| Confidence Level | Minimum Runs | Use Case |
|---|---|---|
| 90% | 10 runs | Quick validation |
| 95% | 30 runs | Standard comparison |
| 99% | 50 runs | Critical optimization |
| Exploratory | 5 runs | Initial investigation |