Overview
Performance profiling measures how programs execute, tracking metrics like execution time, memory consumption, CPU usage, and method call frequency. Profiling provides empirical data about program behavior during runtime, revealing where computational resources are spent and identifying optimization opportunities.
Profiling differs from benchmarking. Benchmarking measures overall execution time under controlled conditions to compare implementations. Profiling examines internal program behavior during execution, breaking down resource consumption by function, line, or operation. A benchmark might show one algorithm runs faster than another; profiling reveals why by exposing which operations consume the most time.
The profiling process involves three stages: instrumentation, data collection, and analysis. Instrumentation adds measurement code to track execution. Data collection captures metrics during program execution. Analysis interprets collected data to identify bottlenecks and optimization targets.
Profilers operate through sampling or instrumentation. Sampling profilers periodically check program state, recording which code is executing at each sample point. Instrumentation profilers inject measurement code into the program, tracking every relevant event. Sampling introduces minimal overhead but provides statistical approximations. Instrumentation captures precise data but significantly impacts execution speed.
# Sampling profiler example - checks program state periodically
# Shows statistical view of where time is spent
require 'stackprof'
StackProf.run(mode: :cpu, out: 'profile.dump') do
  10_000.times { expensive_operation }
end
# Instrumentation profiler example - tracks every method call
# Shows exact call counts and timing
require 'ruby-prof'
result = RubyProf.profile do
  10_000.times { expensive_operation }
end
Profiling serves multiple objectives. Performance optimization identifies slow code paths for improvement. Capacity planning determines resource requirements for production deployment. Debugging reveals unexpected behavior manifesting as performance issues. Performance regression detection catches optimization losses between versions.
Key Principles
Performance profiling rests on measurement accuracy, statistical validity, and representative workloads. Measurements must reflect actual program behavior without significant distortion from profiling overhead. Statistical validity requires sufficient sample sizes and repeated measurements to account for variance. Representative workloads ensure profiling results transfer to production scenarios.
The observer effect describes how measurement changes the measured system. Profiling adds computational overhead, altering execution characteristics. Instrumentation profilers may increase execution time by 10-100x. Sampling profilers typically add 1-5% overhead. Heavier instrumentation provides more detailed data at the cost of less representative behavior.
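The observer effect is easy to reproduce with the standard library alone: enabling a per-call TracePoint hook slows the same workload measurably. A minimal sketch (the `step` and `workload` methods are illustrative stand-ins, and the exact slowdown varies by machine):

```ruby
require 'benchmark'

# Illustrative workload: a plain Ruby method called many times
def step(n)
  n * 2
end

def workload
  20_000.times { |n| step(n) }
end

# Baseline: no instrumentation
plain = Benchmark.realtime { workload }

# Instrumented: a TracePoint hook fires on every Ruby method call
trace = TracePoint.new(:call) { |tp| tp.method_id }
traced = Benchmark.realtime { trace.enable { workload } }

puts format('plain: %.4fs, traced: %.4fs (%.1fx slower)',
            plain, traced, traced / plain)
```

The same principle applies to full profilers: the heavier the hook, the less the measured timings resemble unprofiled execution.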
Profiling granularity determines measurement detail. Function-level profiling tracks execution time per method. Line-level profiling measures individual statement execution. Block-level profiling examines code regions. Higher granularity increases data volume and overhead while providing finer optimization targets.
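As a sketch of what line-level granularity means, Ruby's built-in TracePoint can count executions per source line. This is a toy line profiler, not a production tool, but it shows the kind of data line-level instrumentation collects:

```ruby
# Count how many times each line executes using the :line event
line_counts = Hash.new(0)
tracer = TracePoint.new(:line) do |tp|
  line_counts["#{File.basename(tp.path)}:#{tp.lineno}"] += 1
end

tracer.enable do
  total = 0
  10.times do |i|
    total += i
  end
end

# The line inside the times block dominates the counts
line_counts.sort_by { |_line, count| -count }.first(3).each do |line, count|
  puts "#{line}: #{count}"
end
```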
Hot spots represent code sections consuming disproportionate resources. The 90/10 rule suggests 90% of execution time concentrates in 10% of code. Profiling identifies these hot spots for targeted optimization. A function called once but running slowly may matter less than a fast function called millions of times.
# Hot spot example - seemingly fast operation becomes bottleneck
def process_records(records)
  records.map do |record|
    # Fast operation: 0.0001 seconds
    validate(record)
  end
end
# Called with 1,000,000 records
# Total time: 0.0001 * 1,000,000 = 100 seconds
# This becomes the hot spot despite individual speed
Call graphs represent program execution flow, showing which functions call which and how often. Call depth reveals function nesting levels. Exclusive time measures time spent within a function excluding called functions. Inclusive time includes time spent in the function and all functions it calls. These metrics distinguish between functions doing work directly versus coordinating other functions.
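A hand-rolled illustration of the inclusive/exclusive distinction, using sleep as a stand-in for slow callees (a real profiler computes this subtraction automatically per call-graph node):

```ruby
require 'benchmark'

def slow_worker
  sleep 0.02 # stand-in for real work
end

def coordinator
  3.times { slow_worker }
end

# Inclusive time: everything that happens inside the call
inclusive = Benchmark.realtime { coordinator }

# Exclusive time: inclusive minus time spent in callees
callee_time = Benchmark.realtime { 3.times { slow_worker } }
exclusive = inclusive - callee_time

puts format('coordinator inclusive: %.3fs, exclusive: ~%.3fs', inclusive, exclusive)
# coordinator is a pure coordinator: its exclusive time is near zero,
# so slow_worker, not coordinator, is the optimization target
```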
Memory profiling tracks allocation patterns, object lifetimes, and garbage collection behavior. Allocation profiling counts object creation by type and location. Retention profiling identifies objects remaining in memory when expected to be garbage collected. Generation analysis examines object survival across garbage collection cycles.
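These ideas can be observed without any gems: ObjectSpace tallies heap objects by internal type, and comparing counts around an operation approximates allocation profiling. Counts are approximate because garbage collection timing varies between runs:

```ruby
# Approximate allocation counting with ObjectSpace
GC.start # settle the heap first
before = ObjectSpace.count_objects[:T_STRING]

records = Array.new(10_000) { |i| "record-#{i}" } # each iteration builds a String

after = ObjectSpace.count_objects[:T_STRING]
puts "strings allocated (approx): #{after - before}"
puts "objects retained: #{records.size}"
```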
Baseline establishment compares performance measurements against reference points. Baselines may represent previous application versions, competing implementations, or theoretical limits. Regression testing detects performance decreases between versions. Performance budgets set acceptable thresholds for operations.
Statistical significance addresses measurement variance. Multiple profiling runs produce different results due to system state variations. Outliers may represent rare edge cases or measurement errors. Confidence intervals quantify measurement uncertainty. Averaging multiple runs reduces noise but may obscure important variance.
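A minimal sketch of quantifying run-to-run variance with the standard library, summarizing repeated timings with mean and standard deviation (the workload is arbitrary):

```ruby
require 'benchmark'

# Time the same workload several times and summarize the spread
times = Array.new(5) do
  Benchmark.realtime { 100_000.times { |n| Math.sqrt(n) } }
end

mean = times.sum / times.size
stddev = Math.sqrt(times.sum { |t| (t - mean)**2 } / times.size)

puts format('runs: %d  mean: %.5fs  stddev: %.5fs', times.size, mean, stddev)
```

A large standard deviation relative to the mean signals that more runs, or a quieter machine, are needed before trusting the numbers.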
Ruby Implementation
Ruby provides multiple profiling approaches through standard library modules and third-party gems. The ruby-prof gem offers comprehensive instrumentation-based profiling with multiple output formats. StackProf provides sampling-based profiling integrated with Ruby's internal structures. The Benchmark module enables controlled timing comparisons.
Ruby-prof operates through explicit profiling blocks. Starting the profiler enables instrumentation. The profiled code executes with measurement overhead. Stopping returns results containing timing and call data.
require 'ruby-prof'
# Start profiling
RubyProf.start
# Code to profile
def calculate_fibonacci(n)
  return n if n <= 1
  calculate_fibonacci(n - 1) + calculate_fibonacci(n - 2)
end
result = calculate_fibonacci(25)
# Stop profiling and get results
profile_result = RubyProf.stop
# Print flat profile sorted by self time
printer = RubyProf::FlatPrinter.new(profile_result)
printer.print(STDOUT, min_percent: 2)
StackProf samples the call stack at regular intervals, recording which methods are executing. The mode parameter determines the sampling basis: :cpu samples by CPU time, :wall by wall-clock time, and :object by object allocations.
require 'stackprof'
# Profile CPU usage
StackProf.run(mode: :cpu, out: 'cpu_profile.dump') do
  array = (1..1_000_000).to_a
  array.each { |n| Math.sqrt(n) }
end
# Profile object allocations
StackProf.run(mode: :object, out: 'object_profile.dump') do
  1_000.times { Array.new(1000) { rand } }
end
# Load the dumped results and analyze them
StackProf::Report.new(Marshal.load(IO.binread('object_profile.dump'))).print_text
The memory_profiler gem tracks object allocations and retention. It reports allocated objects by class, location, and retention status.
require 'memory_profiler'
report = MemoryProfiler.report do
  array = []
  10_000.times do |i|
    array << { id: i, data: "Record #{i}" * 10 }
  end
end
# Show allocation breakdown
report.pretty_print(scale_bytes: true)
TracePoint provides low-level event hooks into Ruby execution. It captures method calls, returns, line execution, and object allocation events.
trace = TracePoint.new(:call, :return) do |tp|
  puts "#{tp.event} - #{tp.method_id} in #{tp.path}:#{tp.lineno}"
end
trace.enable do
  some_method
end
The Benchmark module measures execution time for code blocks. It compares multiple implementations under identical conditions.
require 'benchmark'
n = 1_000_000
Benchmark.bm(20) do |x|
  x.report("Array#each:") do
    arr = (1..n).to_a
    arr.each { |i| i * 2 }
  end
  x.report("Array#map:") do
    arr = (1..n).to_a
    arr.map { |i| i * 2 }
  end
  x.report("for loop:") do
    arr = (1..n).to_a
    for i in arr
      i * 2
    end
  end
end
Production profiling requires minimal overhead and safe failure modes. Rack Mini Profiler integrates into Rails applications, providing per-request profiling accessible through browser interfaces.
# In Gemfile
gem 'rack-mini-profiler'
# In config/environments/development.rb
config.middleware.use(Rack::MiniProfiler)
# Access profiling data by appending ?pp=enable to URLs
# Displays SQL queries, rendering time, allocations per request
Datadog, New Relic, and Skylight offer production profiling services. They sample production traffic, aggregate metrics, and provide visualization interfaces. These services balance overhead with data collection, typically sampling 1-5% of requests.
Tools & Ecosystem
Ruby's profiling ecosystem includes sampling profilers, instrumentation profilers, memory analyzers, and production monitoring tools. Each tool addresses specific profiling scenarios with different trade-offs.
RubyProf provides the most comprehensive instrumentation profiling. It supports multiple output formats: flat profiles showing per-method metrics, graph profiles displaying call relationships, and call stack profiles revealing execution paths. Graph HTML output generates interactive visualizations showing call hierarchies with timing data.
require 'ruby-prof'
result = RubyProf.profile do
  complex_operation
end
# Flat text output - methods sorted by self time
File.open('flat.txt', 'w') do |file|
  RubyProf::FlatPrinter.new(result).print(file)
end
# Graph HTML output - interactive call graph
File.open('graph.html', 'w') do |file|
  RubyProf::GraphHtmlPrinter.new(result).print(file)
end
# Call stack output - execution paths
File.open('stack.html', 'w') do |file|
  RubyProf::CallStackPrinter.new(result).print(file)
end
StackProf excels at production profiling due to minimal overhead. It samples at configurable intervals, recording stack traces. The report format shows methods consuming the most time, sorted by total samples.
require 'stackprof'
# Sample every 1000 microseconds (1 ms) of CPU time; raw data is
# required for flamegraph generation
StackProf.run(mode: :cpu, interval: 1000, raw: true, out: 'profile.dump') do
  application_code
end
# Generate flamegraph visualization
system('stackprof --stackcollapse profile.dump > collapsed.txt')
system('flamegraph.pl collapsed.txt > flamegraph.svg')
Flamegraphs visualize profiling data as hierarchical rectangles. Width represents time consumed, height shows call depth. Colors distinguish different code paths. Clicking sections zooms into specific call paths.
Memory Profiler tracks allocation locations and retained objects. It identifies memory leaks by finding objects that should be garbage collected but remain referenced.
require 'memory_profiler'
report = MemoryProfiler.report(top: 50) do
  leak_code
end
# Find objects allocated but not freed
puts "Retained objects: #{report.total_retained}"
puts "Retained memory: #{report.total_retained_memsize} bytes"
# Show allocation locations
report.pretty_print(to_file: 'memory_report.txt',
                    scale_bytes: true,
                    normalize_paths: true)
Derailed Benchmarks profiles Rails applications, measuring boot time, memory usage per request, and gem loading time. It identifies expensive gems and request handlers.
# In Gemfile
gem 'derailed_benchmarks', group: :development
# Measure memory used by each gem at require time
$ bundle exec derailed bundle:mem
# Count objects allocated while requiring gems
$ bundle exec derailed bundle:objects
# Measure live memory across repeated requests
$ bundle exec derailed exec perf:mem
Rack Mini Profiler operates as middleware, profiling each request and displaying results in the browser. It shows SQL queries, template rendering, and method call timing without external services.
Skylight, Scout APM, and New Relic provide production profiling as services. They use agent-based sampling, aggregating data across many requests. They track endpoint performance, database query patterns, and external service calls. These tools identify slow endpoints and correlate performance with deployment changes.
Benchmark IPS (iterations per second) measures operation throughput rather than elapsed time. It determines how many times an operation completes per second, accounting for variance through statistical analysis.
require 'benchmark/ips'
Benchmark.ips do |x|
  x.report("string concatenation") do
    str = ""
    100.times { str += "a" }
  end
  x.report("string interpolation") do
    str = ""
    100.times { str = "#{str}a" }
  end
  x.report("array join") do
    arr = []
    100.times { arr << "a" }
    arr.join
  end
  x.compare!
end
Allocation Tracer tracks object allocations per source location. It identifies allocation hot spots causing excessive garbage collection.
require 'allocation_tracer'
ObjectSpace::AllocationTracer.setup(%i[path line type])
result = ObjectSpace::AllocationTracer.trace do
  allocation_heavy_code
end
# trace returns { [path, line, type] => [count, old_count, ...] }
result.sort_by { |_key, info| -info.first }.first(10).each do |(path, line, type), info|
  puts "#{path}:#{line} (#{type}): #{info.first} allocations"
end
Practical Examples
Profiling a slow Rails endpoint begins with request-level timing. Rack Mini Profiler shows database queries, view rendering, and method calls contributing to response time.
# Install rack-mini-profiler
# Gemfile
gem 'rack-mini-profiler'
# Enable for development
# config/environments/development.rb
config.middleware.use Rack::MiniProfiler
# Access endpoint with profiling
# GET /users?pp=enable
# Shows breakdown:
# - SQL: 450ms (30 queries)
# - Rendering: 200ms
# - Controller: 50ms
Identifying the bottleneck reveals 30 database queries due to N+1 patterns. Profiling isolates the problem method.
# Slow controller action
def index
  @users = User.all
  # Template iterates users, triggering queries
  # <%= user.posts.count %> causes one query per user
end
# Profile specific method
require 'ruby-prof'
RubyProf.start
@users = User.includes(:posts).all
result = RubyProf.stop
printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT)
Memory profiling detects leaks in background jobs. A job processing uploaded images holds references to processed data, preventing garbage collection.
require 'memory_profiler'
# Profile job execution
report = MemoryProfiler.report do
  ImageProcessingJob.perform_now(image_id)
end
# Results show retained objects
# Retained String: 50,000 (5MB)
# Allocated at app/jobs/image_processing_job.rb:15
Examining line 15 reveals an unnecessary instance variable holding processed data.
class ImageProcessingJob < ApplicationJob
  def perform(image_id)
    image = Image.find(image_id)
    @processed_data = process_image(image) # Retained after job completes
    upload_to_s3(@processed_data)
    # Should be: upload_to_s3(process_image(image))
  end
end
Profiling algorithm performance compares implementations. A data processing pipeline filters and transforms large datasets. Profiling reveals which operations dominate execution time.
require 'benchmark'
data = (1..1_000_000).map { { value: rand(1000), category: rand(10) } }
Benchmark.bm(30) do |x|
  x.report("select then map:") do
    data.select { |h| h[:value] > 500 }
        .map { |h| h[:value] * 2 }
  end
  x.report("each_with_object:") do
    data.each_with_object([]) do |h, acc|
      acc << h[:value] * 2 if h[:value] > 500
    end
  end
  x.report("reduce:") do
    data.reduce([]) do |acc, h|
      h[:value] > 500 ? acc << h[:value] * 2 : acc
    end
  end
end
Detailed profiling with StackProf identifies allocation hot spots.
require 'stackprof'
StackProf.run(mode: :object, out: 'objects.dump') do
  result = data.select { |h| h[:value] > 500 }
               .map { |h| h[:value] * 2 }
end
# Analyze allocation patterns from the dump file
report = StackProf::Report.new(Marshal.load(IO.binread('objects.dump')))
report.print_text(false, 20) # sort by self samples, show top 20
# Shows:
# Array#select: 500,000 object allocations
# Array#map: 500,000 object allocations
# Total: 1,000,000 intermediate objects
Profiling production performance requires sampling to minimize overhead. StackProf runs continuously, dumping profiles periodically for analysis.
# In production initializer
if ENV['ENABLE_PROFILING'] == 'true'
  require 'stackprof'
  # StackProf.run blocks until its block returns, so a long-running
  # server starts the profiler explicitly and dumps results on shutdown
  StackProf.start(mode: :cpu,
                  interval: 5000, # Sample less frequently
                  raw: true)
  at_exit do
    StackProf.stop
    StackProf.results('tmp/stackprof.dump')
  end
end
# Analyze collected data
# bundle exec stackprof tmp/stackprof.dump --text --limit 50
Profiling database query performance combines application profiling with database explain plans. Slow query logs identify problematic queries, while profiling shows which application code triggers them.
# Enable query logging
ActiveRecord::Base.logger = Logger.new(STDOUT)
# Profile request with database queries
require 'stackprof'
StackProf.run(mode: :wall, out: 'queries.dump') do
  User.includes(:posts).where(active: true).each do |user|
    user.posts.where(published: true).count
  end
end
# Analyze to find query-heavy methods
Common Patterns
Performance profiling workflows follow established patterns for different optimization scenarios. The profile-bottleneck-optimize-verify cycle forms the core pattern.
The diagnostic profiling pattern identifies performance problems in existing code. Start with high-level profiling showing overall time distribution. Drill into hot spots with detailed profiling. Verify bottleneck identification through multiple profiling runs. This pattern prevents premature optimization by confirming which code sections actually consume resources.
# Step 1: High-level profiling
require 'ruby-prof'
result = RubyProf.profile(measure_mode: RubyProf::WALL_TIME) do
  full_application_workflow
end
flat = RubyProf::FlatPrinter.new(result)
flat.print(STDOUT, min_percent: 5) # Show methods consuming >5% time
# Step 2: Detailed profiling of identified bottleneck
result = RubyProf.profile(measure_mode: RubyProf::PROCESS_TIME) do
  identified_slow_method
end
graph = RubyProf::GraphPrinter.new(result)
graph.print(STDOUT)
The comparison profiling pattern evaluates multiple implementations. Profile each approach under identical conditions. Compare results statistically to account for variance. This pattern validates optimization attempts, confirming improvements before committing changes.
require 'benchmark/ips'
# Compare implementations statistically
Benchmark.ips do |x|
  x.config(time: 10, warmup: 2) # Run longer for accuracy
  x.report("original") { original_implementation }
  x.report("optimized") { optimized_implementation }
  x.compare! # Shows statistical significance
end
The memory leak detection pattern tracks allocation growth over time. Profile allocation locations, identifying objects remaining in memory. Compare object counts before and after operations that should free memory.
require 'memory_profiler'
# Baseline measurement
GC.start # Start clean
before = GC.stat(:heap_live_slots)
# Run operation multiple times
report = MemoryProfiler.report do
  10.times { potentially_leaking_operation }
end
# Check for growth
GC.start
after = GC.stat(:heap_live_slots)
growth = after - before
puts "Live slots grew by: #{growth}"
report.pretty_print(retained_strings: 50)
The continuous profiling pattern monitors production performance over time. Sample a percentage of production traffic continuously. Aggregate profiling data to track performance trends. Alert on performance regression when metrics exceed thresholds.
# Middleware for selective profiling
class ProductionProfiler
  def initialize(app, sample_rate: 0.01)
    @app = app
    @sample_rate = sample_rate
  end

  def call(env)
    return @app.call(env) unless rand < @sample_rate

    require 'stackprof'
    response = nil
    # StackProf.run returns the profile data, not the block's value,
    # so capture the Rack response separately
    profile = StackProf.run(mode: :cpu, raw: true) do
      response = @app.call(env)
    end
    # Store profile data for analysis
    store_profile(profile[:raw], env)
    response
  end

  def store_profile(data, env)
    # Send to aggregation service
    ProfileStorage.save(
      data: data,
      endpoint: env['PATH_INFO'],
      timestamp: Time.now
    )
  end
end
The allocation reduction pattern minimizes garbage collection pressure. Profile object allocations per operation. Identify allocation hot spots creating unnecessary objects. Modify code to reduce allocations, then verify improvement.
# Profile allocations
require 'allocation_tracer'
ObjectSpace::AllocationTracer.setup(%i[path line type])
table = ObjectSpace::AllocationTracer.trace do
  allocation_heavy_code
end
# table keys are [path, line, type]; values begin with the count
# Before optimization: 100,000 String allocations
# app/services/processor.rb:25
# Optimization: reuse strings, avoid intermediate arrays
def process(items)
  result = +"" # Mutable string
  items.each do |item|
    result << transform(item)
  end
  result
end
# After: 1 String allocation
The progressive profiling pattern narrows focus iteratively. Start with coarse-grained profiling showing function-level timing. Profile identified hot spots at finer granularity. Continue drilling down until the specific slow operation appears.
# Level 1: Function-level profiling
result = RubyProf.profile { slow_controller_action }
# Identifies slow_service_method consuming 80% time
# Level 2: Detailed method profiling
result = RubyProf.profile { slow_service_method }
# Identifies database_query consuming 70% of method time
# Level 3: Query-level profiling
ActiveRecord::Base.logger = Logger.new(STDOUT)
slow_service_method
# Identifies missing index on users.email column
Common Pitfalls
Profiling without representative data produces misleading results. Development datasets often contain orders of magnitude fewer records than production. Profile performance with production-sized datasets or the bottleneck may not appear during profiling.
# Misleading profiling with small dataset
users = User.limit(10) # Development has 10 users
profile { users.each { |u| u.posts.count } }
# Shows no performance problem
# Production has 100,000 users
# Same code exhibits severe N+1 problem
Profiling overhead distorts measurements, especially with heavy instrumentation. RubyProf may increase execution time 10-100x, changing relative timing between operations. Fast operations appear slower, altering which methods seem like bottlenecks.
# Without profiling: method_a takes 0.1s, method_b takes 0.01s
# With profiling: method_a takes 2s, method_b takes 1.5s
# method_b now appears significant due to profiler overhead
Optimizing the wrong metric wastes effort. CPU time, wall clock time, and memory usage represent different resources. CPU-bound operations benefit from algorithmic improvements. I/O-bound operations need concurrency changes. Profiling CPU time for I/O-heavy code identifies problems unrelated to actual bottlenecks.
# Profiling CPU time for I/O-bound code
StackProf.run(mode: :cpu) do
  users.each { |u| u.posts.count } # Database I/O
end
# Shows little CPU usage because time spent waiting on database
# Should profile wall time instead
Ignoring garbage collection impact creates incomplete performance pictures. Allocation-heavy code may appear fast in profiling but cause performance issues through garbage collection pauses. Profile both execution time and allocations.
# Fast in profiling
def process(items)
  items.map { |i| transform(i) }     # Creates intermediate array
       .select { |i| i > threshold } # Creates another array
       .map { |i| finalize(i) }      # And another
end
# 3 intermediate arrays → heavy GC pressure in production
Profiling in the wrong environment yields non-transferable results. Development mode in Rails loads different middleware, eager loading behaves differently, and caching may be disabled. Profile in production mode or results won't reflect production performance.
Single-run profiling provides insufficient data for optimization decisions. Performance varies between runs due to system state, garbage collection timing, and JIT compilation. Run profiling multiple times, analyzing variance and median values.
# Insufficient: single run
Benchmark.measure { slow_method }
# Better: multiple runs with statistics
require 'benchmark/ips'
Benchmark.ips do |x|
  x.config(time: 10, warmup: 2)
  x.report("method") { slow_method }
end
# Provides iterations/sec with statistical confidence
Premature optimization without profiling wastes time on unimportant code. Developers often guess wrong about bottlenecks. Profile first to identify actual hot spots before optimizing.
Profiling modified code without baseline comparison prevents determining whether optimizations succeeded. Establish baseline metrics before changes, then profile again after optimization.
# Establish baseline
baseline = Benchmark.measure { original_implementation }
# Make optimization changes
# ...
# Compare after optimization
optimized = Benchmark.measure { optimized_implementation }
speedup = baseline.real / optimized.real
puts "Speedup: #{speedup}x"
Misinterpreting inclusive vs exclusive time leads to incorrect optimization targets. Inclusive time includes called methods, making coordinator functions appear slow when they're fast but call slow functions. Focus on exclusive time to find methods doing actual slow work.
# Coordinator method shows high inclusive time
def process_all(items)
  items.each { |item| slow_processor(item) }
end
# Inclusive time: 100s (includes slow_processor)
# Exclusive time: 0.1s (just the loop overhead)
# Optimization target: slow_processor, not process_all
Profiling without accounting for concurrency produces incorrect results in multi-threaded applications. CPU time may exceed wall clock time when multiple threads execute. Per-thread profiling shows actual work distribution.
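The divergence between CPU time and wall-clock time under concurrency can be seen directly with Process.clock_gettime. This sketch uses sleeping threads to mimic I/O waits, where the effect is unambiguous (in CRuby, CPU-bound threads are serialized by the GVL, so that case behaves differently):

```ruby
# Compare process CPU time with wall-clock time for I/O-like waiting
cpu0  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)
wall0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)

threads = Array.new(4) { Thread.new { sleep 0.1 } } # stand-in for I/O waits
threads.each(&:join)

cpu  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID) - cpu0
wall = Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall0

puts format('cpu: %.3fs, wall: %.3fs', cpu, wall)
# Wall time is ~0.1s (the threads wait concurrently) while CPU time is
# near zero - profiling CPU time here would show almost nothing
```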
Production profiling without safety mechanisms risks impacting user experience. Continuous profiling should sample minimally, fail gracefully, and disable automatically if overhead becomes excessive.
Reference
Profiling Modes Comparison
| Mode | Measures | Overhead | Use Case |
|---|---|---|---|
| CPU time | Processor cycles consumed | Low (1-3%) | CPU-bound optimization |
| Wall time | Elapsed real time | Low (1-3%) | I/O-bound optimization |
| Object allocations | Created objects | Medium (5-10%) | Memory optimization |
| Process time | Includes GC time | Low (1-3%) | Overall execution |
| Instrumentation | Every method call | High (10-100x) | Detailed call analysis |
Ruby Profiling Tools
| Tool | Type | Best For | Overhead |
|---|---|---|---|
| RubyProf | Instrumentation | Detailed analysis | High |
| StackProf | Sampling | Production profiling | Low |
| MemoryProfiler | Allocation tracking | Memory leaks | Medium |
| Benchmark | Comparison | Algorithm comparison | None |
| TracePoint | Event hooks | Custom profiling | Variable |
| Rack Mini Profiler | Request profiling | Rails debugging | Medium |
StackProf Configuration
| Mode | Samples | Typical Use |
|---|---|---|
| cpu | CPU execution | Find CPU bottlenecks |
| wall | Elapsed time | Find I/O bottlenecks |
| object | Object allocations | Reduce GC pressure |
| custom | User-defined events | Specialized profiling |
RubyProf Measurement Modes
| Mode | Constant | Measures |
|---|---|---|
| Wall time | WALL_TIME | Real elapsed time |
| Process time | PROCESS_TIME | CPU time + GC time |
| Allocations | ALLOCATIONS | Objects created |
| Memory | MEMORY | Bytes allocated |
RubyProf Printers
| Printer | Output | Use Case |
|---|---|---|
| FlatPrinter | Text table | Quick overview |
| GraphPrinter | Call graph | Execution flow |
| GraphHtmlPrinter | Interactive HTML | Visual analysis |
| CallStackPrinter | Stack traces | Path analysis |
| DotPrinter | Graphviz format | Visualization |
Profiling Workflow Checklist
| Step | Action | Validation |
|---|---|---|
| Establish baseline | Profile current implementation | Multiple runs for variance |
| Identify bottlenecks | Analyze hot spots | Confirm with focused profiling |
| Hypothesize cause | Determine bottleneck reason | Check assumptions with data |
| Implement optimization | Modify code | Keep original for comparison |
| Profile optimized version | Measure improvement | Compare against baseline |
| Verify in production | Monitor live metrics | Confirm improvement transfers |
Common Profiling Flags
| Flag | Purpose | Example |
|---|---|---|
| min_percent | Filter small methods | min_percent: 2 |
| measure_mode | Timing basis | RubyProf::WALL_TIME |
| eliminate_methods | Exclude patterns | eliminate_methods: [/^Kernel/] |
| raw | Preserve raw data | raw: true |
| interval | Sampling frequency | interval: 1000 |
Memory Profiling Metrics
| Metric | Description | Significance |
|---|---|---|
| Allocated objects | Total objects created | Allocation pressure |
| Retained objects | Objects not garbage collected | Memory leak indicator |
| Allocated memory | Total bytes allocated | Memory throughput |
| Retained memory | Bytes remaining in heap | Memory leak size |
| Allocations by location | Objects per source line | Hot spot identification |
Flamegraph Interpretation
| Visual Element | Meaning | Optimization Signal |
|---|---|---|
| Width | Time consumed | Wider = more time |
| Height | Call stack depth | Deep = complex flow |
| Flat top | Direct work | Optimization target |
| Plateau | Called methods | Check callees |
| Color intensity | Sample concentration | Darker = hot spot |
Benchmark Output Fields
| Field | Meaning | Units |
|---|---|---|
| user | User CPU time | Seconds |
| system | System CPU time | Seconds |
| total | User + system | Seconds |
| real | Wall clock time | Seconds |
| iterations | Times executed | Count |
Profiling Command Examples
# Basic RubyProf profiling
result = RubyProf.profile { target_code }
RubyProf::FlatPrinter.new(result).print(STDOUT)
# StackProf with flamegraph generation
StackProf.run(mode: :cpu, out: 'profile.dump') { target_code }
# Generate: stackprof --flamegraph profile.dump > flame.txt
# Memory profiling with filtering
report = MemoryProfiler.report(top: 50) { target_code }
report.pretty_print(scale_bytes: true)
# Benchmark comparison
Benchmark.bm(20) do |x|
x.report("implementation_a") { code_a }
x.report("implementation_b") { code_b }
end
# Production sampling
StackProf.run(mode: :cpu, interval: 10000, raw: true) do
  # Runs continuously with minimal overhead
end
GC Metrics for Profiling
| Metric | Method | Indicates |
|---|---|---|
| Heap slots | GC.stat(:heap_live_slots) | Live objects |
| GC count | GC.count | Collection frequency |
| GC time | GC::Profiler.total_time | Collection overhead |
| Minor collections | GC.stat(:minor_gc_count) | Young generation pressure |
| Major collections | GC.stat(:major_gc_count) | Old generation pressure |
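The GC metrics above can be exercised directly; note that GC::Profiler must be enabled before it accumulates collection timings (the churn workload here is arbitrary):

```ruby
# Collect the GC metrics listed above around an allocation-heavy workload
GC::Profiler.enable
count_before = GC.count

50.times { Array.new(10_000) { +"x" } } # churn to provoke collections
GC.start # force at least one collection before reading stats

puts "collections: #{GC.count - count_before}"
puts "live slots: #{GC.stat(:heap_live_slots)}"
puts format('GC time: %.4fs', GC::Profiler.total_time)
GC::Profiler.disable
```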
Profiling Environment Variables
| Variable | Effect | Example |
|---|---|---|
| RUBY_GC_HEAP_GROWTH_FACTOR | Heap growth rate | 1.8 |
| RUBY_GC_HEAP_INIT_SLOTS | Initial heap size | 600000 |
| RUBY_GC_MALLOC_LIMIT | GC trigger threshold | 16000000 |
| RUBY_PROF_MEASURE_MODE | Default measurement | process_time |
Statistical Significance Thresholds
| Confidence Level | Minimum Runs | Use Case |
|---|---|---|
| 90% | 10 runs | Quick validation |
| 95% | 30 runs | Standard comparison |
| 99% | 50 runs | Critical optimization |
| Exploratory | 5 runs | Initial investigation |