Overview
Ruby provides multiple profiling approaches for measuring application performance and identifying execution bottlenecks. The built-in profile module offers deterministic profiling through method call tracing, ruby-prof provides faster deterministic profiling as a C extension, and sampling profilers like stackprof provide statistical performance analysis. Each profiler serves different measurement needs and deployment scenarios.
Ruby-prof operates as a C extension with measurement modes for wall time, process time, object allocations, and memory usage. StackProf implements sampling-based profiling using timer signals and Ruby's frame-profiling C API, collecting call stacks at regular intervals without significant runtime overhead. The built-in profiler tracks every method call, producing detailed but computationally expensive reports.
# Built-in profile module
require 'profile'

def calculate_fibonacci(n)
  return n if n <= 1

  calculate_fibonacci(n - 1) + calculate_fibonacci(n - 2)
end

calculate_fibonacci(30)
# Automatically generates profiling report on exit
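The method-call-tracing approach can be illustrated with the standard library's TracePoint, which fires a hook on every Ruby method call, the same hook family deterministic profilers build on. This is a toy sketch for illustration, not the actual implementation of profile or ruby-prof:

```ruby
# Toy deterministic profiler: count every call to each method using
# TracePoint's :call event, the per-call hook a tracing profiler relies on.
call_counts = Hash.new(0)

tracer = TracePoint.new(:call) do |tp|
  call_counts["#{tp.defined_class}##{tp.method_id}"] += 1
end

def fib(n)
  return n if n <= 1

  fib(n - 1) + fib(n - 2)
end

# Hooks fire only while the tracer is enabled
tracer.enable { fib(10) }

puts call_counts["Object#fib"]
# => 177 (the full Fibonacci call tree for n = 10)
```

Counting every call is exactly what makes this style precise and expensive: the hook body runs once per invocation, so overhead grows linearly with call volume.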
Ruby profilers measure different execution aspects: CPU time tracks processor usage, wall time measures real-world elapsed time, and allocation profilers monitor object creation and memory consumption. Thread-aware profilers handle concurrent code execution, while sampling profilers balance measurement accuracy with performance impact.
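The wall-versus-CPU distinction is easy to demonstrate with the standard library's Process.clock_gettime: a sleeping process accumulates wall time but almost no CPU time. A small illustrative sketch:

```ruby
# Compare wall-clock time against CPU time for a mostly-sleeping workload.
wall_before = Process.clock_gettime(Process::CLOCK_MONOTONIC)
cpu_before  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)

sleep(0.2) # blocked: wall time advances, CPU time barely moves

wall_elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall_before
cpu_elapsed  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID) - cpu_before

puts format("wall: %.3fs, cpu: %.3fs", wall_elapsed, cpu_elapsed)
# A :wall profiler attributes the sleep to this code; a :cpu profiler mostly ignores it
```

This is why I/O-bound code should be profiled in wall mode: a CPU-mode profile of code that spends its time waiting shows almost nothing.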
# StackProf sampling profiler
require 'stackprof'

profile = StackProf.run(mode: :cpu, interval: 1000) do
  1000.times { |i| String.new("iteration #{i}") }
end

StackProf::Report.new(profile).print_text
Ruby profiling integrates with development workflows through command-line interfaces, programmatic APIs, and web application middleware. External profilers like rbspy attach to running processes without code modification, while embedded profilers require application integration.
Basic Usage
The standard library profile module requires only require 'profile' to activate automatic profiling for the entire program execution. Output appears when the program terminates, showing method call counts, execution times, and performance percentages.
require 'profile'

class DataProcessor
  def process_records(records)
    records.map { |record| transform_record(record) }
  end

  private

  def transform_record(record)
    record.upcase.strip.gsub(/\s+/, '_')
  end
end

processor = DataProcessor.new
data = [" hello world ", " ruby programming "] * 1000
processor.process_records(data)

# Outputs detailed profiling report:
#   % time   seconds   calls   ms/call   name
#    45.32     0.124    1000      0.12   String#gsub
#    23.18     0.063    1000      0.06   String#upcase
#    15.42     0.042    1000      0.04   String#strip
Ruby-prof provides programmatic profiling control through start/stop methods and block-based profiling. Multiple measurement modes capture different performance aspects, with configurable output formats for analysis.
require 'ruby-prof'

# Explicit start/stop profiling
RubyProf.start
expensive_operation
result = RubyProf.stop

# Generate flat report
printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT)

# Block-based profiling with options
result = RubyProf.profile(measure_mode: RubyProf::WALL_TIME) do
  10_000.times { Math.sqrt(rand(1000)) }
end

# Graph report showing call relationships
printer = RubyProf::GraphPrinter.new(result)
printer.print(STDOUT)
StackProf sampling profiler reduces performance impact by collecting call stacks at specified intervals rather than tracing every method call. Different sampling modes target CPU usage, wall clock time, or object allocations.
require 'stackprof'

# CPU profiling with 1ms sampling interval (interval is in microseconds)
StackProf.run(mode: :cpu, interval: 1000, out: 'cpu_profile.dump') do
  complex_calculation
end

# Wall time profiling for I/O operations; raw: true retains the
# full sample stream needed for flamegraph output
StackProf.run(mode: :wall, interval: 1000, raw: true, out: 'wall_profile.dump') do
  File.read('large_file.txt').split("\n").each(&:strip)
end

# Object allocation profiling (interval: 1 records every allocation)
StackProf.run(mode: :object, interval: 1, out: 'alloc_profile.dump') do
  1000.times { Hash.new.merge!(key: 'value') }
end

# Analyze results from command line
# $ stackprof cpu_profile.dump --text
# $ stackprof wall_profile.dump --flamegraph
Rails applications integrate profiling through middleware and development tools. Rack Mini Profiler provides browser-based profiling interface with database query analysis and memory tracking.
# Gemfile
group :development do
  gem 'rack-mini-profiler'
  gem 'stackprof'
  gem 'memory_profiler'
end

# config/environments/development.rb
if Rails.env.development?
  require 'rack-mini-profiler'
  Rack::MiniProfilerRails.initialize!(Rails.application)
end

# Profile specific controller actions
class UsersController < ApplicationController
  def index
    StackProf.run(mode: :cpu, out: 'tmp/users_index.dump') do
      # Force the lazy relation to load inside the block; otherwise the
      # query executes later, outside the profiled region
      @users = User.includes(:posts).limit(100).load
    end
  end
end
Performance & Memory
Profiling introduces measurement overhead that varies significantly between deterministic and sampling approaches. Ruby-prof can slow execution by 2-20x depending on measurement mode, while sampling profilers like StackProf maintain 1-5% overhead.
Measurement mode selection affects both overhead and information quality. Wall time captures real execution duration including I/O waits, process time excludes system overhead, and allocation tracking measures object creation patterns.
require 'benchmark'
require 'ruby-prof'
require 'stackprof'

def cpu_intensive_task
  (1..100_000).map { |i| Math.sqrt(i) }.sum
end

# Measure profiling overhead
puts Benchmark.measure { cpu_intensive_task }
# => 0.045000 0.000000 0.045000 ( 0.045123)

# Ruby-prof overhead (deterministic); braces keep the block bound to
# Benchmark.measure rather than to puts
puts(Benchmark.measure { RubyProf.profile { cpu_intensive_task } })
# => 0.890000 0.010000 0.900000 ( 0.901234)

# StackProf overhead (sampling)
puts(Benchmark.measure { StackProf.run(mode: :cpu) { cpu_intensive_task } })
# => 0.048000 0.000000 0.048000 ( 0.048567)
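The deterministic-overhead effect can be reproduced without any profiler gem: enabling a per-call TracePoint hook (the same per-event mechanism tracing profilers depend on) measurably slows a call-heavy workload. A rough stdlib sketch; exact timings vary by machine:

```ruby
require 'benchmark'

def busy_work
  total = 0
  50_000.times { |i| total += Math.sqrt(i) }
  total
end

baseline = Benchmark.realtime { busy_work }

# Per-event hook comparable to what a deterministic profiler installs
events = 0
tracer = TracePoint.new(:c_call, :call) { events += 1 }

traced = nil
tracer.enable { traced = Benchmark.realtime { busy_work } }

puts format("baseline: %.4fs, traced: %.4fs (%.1fx slower)",
            baseline, traced, traced / baseline)
```

The hook body runs once per event (here at least once per Math.sqrt call), which is exactly where the deterministic profilers' multiplicative slowdown comes from.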
Memory profiling reveals allocation patterns and garbage collection pressure. Ruby-prof memory mode tracks byte allocation per method, while object mode counts created instances. Memory profilers help identify allocation hotspots and optimize garbage collection performance.
require 'ruby-prof'
require 'memory_profiler'

# Compare allocation patterns
def string_concatenation(count)
  result = ""
  count.times { |i| result += "item #{i} " }
  result
end

def string_interpolation(count)
  items = (0...count).map { |i| "item #{i}" }
  items.join(" ")
end

# Memory allocation profiling
report1 = MemoryProfiler.report do
  string_concatenation(1000)
end

report2 = MemoryProfiler.report do
  string_interpolation(1000)
end

puts "Concatenation allocated: #{report1.total_allocated_memsize} bytes"
puts "Interpolation allocated: #{report2.total_allocated_memsize} bytes"
# Shows significant allocation difference

# Ruby-prof memory tracking
result = RubyProf.profile(measure_mode: RubyProf::MEMORY) do
  10.times { string_concatenation(100) }
end

printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT)
# Reveals method-level memory consumption
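When the memory_profiler gem is unavailable, GC.stat's monotonically increasing total_allocated_objects counter gives a rough stdlib approximation of the same comparison (object counts rather than bytes):

```ruby
# Rough allocation counting with GC.stat: total_allocated_objects only
# ever increases, so a delta around a block approximates its allocations.
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

def string_concatenation(count)
  result = ""
  count.times { |i| result += "item #{i} " }
  result
end

def string_interpolation(count)
  (0...count).map { |i| "item #{i}" }.join(" ")
end

concat_allocs = allocations_during { string_concatenation(1000) }
join_allocs   = allocations_during { string_interpolation(1000) }

puts "+= allocated #{concat_allocs} objects, join allocated #{join_allocs}"
# += builds a brand-new string every iteration, so it allocates more
```

The counter survives garbage collection, which makes it suitable for measuring allocation pressure even when the allocated objects are collected immediately.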
StackProf object allocation mode provides detailed allocation tracking with source location information. This mode enables identification of specific code lines responsible for memory pressure.
# Detailed allocation profiling
StackProf.run(mode: :object, raw: true, out: 'allocations.dump') do
  users = []
  1000.times do |i|
    users << {
      id: i,
      name: "User #{i}",
      email: "user#{i}@example.com",
      metadata: { created_at: Time.now, active: true }
    }
  end
end

# Analyze allocation hotspots (the output file is a Marshal dump, not JSON)
# $ stackprof allocations.dump --text
# Reveals hash and string allocation patterns
Garbage collection analysis requires understanding allocation patterns and object lifecycle. Profilers can exclude or highlight garbage collection overhead, showing time spent in mark and sweep phases.
# Monitor GC impact during profiling
GC.stat # => {:count=>15, :heap_allocated_pages=>45, ...}

profile = StackProf.run(mode: :wall, ignore_gc: false) do
  large_dataset = Array.new(100_000) { |i| "data_#{i}" * 50 }
  large_dataset.map(&:upcase).select { |s| s.include?('5') }
end
# With ignore_gc: false (the default), GC frames appear in the profile,
# showing the share of samples spent in garbage collection
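Independently of any profiler, GC.stat's :count field shows how many collections a workload triggers. A small sketch comparing GC runs before and after an allocation-heavy block:

```ruby
# Count garbage collection runs triggered by an allocation-heavy workload.
gc_runs_before = GC.stat(:count)

# Churn through a few hundred thousand short-lived strings
100_000.times.map { |i| "data_#{i}" * 5 }

GC.start # force a final collection so the delta is always visible

gc_runs = GC.stat(:count) - gc_runs_before
puts "GC ran #{gc_runs} time(s) during the workload"
```

A rising delta under steady traffic is a signal that allocation hotspots (the kind the :object profiles above locate) are creating collection pressure.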
Production Patterns
Production profiling requires minimal performance impact and safe data collection. External profilers like rbspy attach to running processes without code modification, making them suitable for production investigation. Sampling profilers balance measurement accuracy with application stability.
# Safe production profiling approach
class ApplicationProfiler
  def self.profile_action(controller, action)
    return yield unless should_profile?

    ran = false
    result = nil
    begin
      profile_data = StackProf.run(
        mode: :wall,
        interval: 10_000, # 10ms between samples keeps production overhead low
        raw: true
      ) do
        ran = true
        result = yield # capture the block's value; StackProf.run returns the profile
      end
      store_profile_data(controller, action, profile_data)
    rescue => e
      Rails.logger.error "Profiling error: #{e.message}"
      result = yield unless ran # continue on failure, but never re-run the action
    end
    result
  end

  def self.should_profile?
    rand < Rails.application.config.profiling_sample_rate
  end

  def self.store_profile_data(controller, action, data)
    # Async storage to avoid blocking the request
    ProfileStorageJob.perform_later(
      controller: controller,
      action: action,
      profile: data,
      timestamp: Time.current
    )
  end

  # A bare `private` does not apply to class methods
  private_class_method :should_profile?, :store_profile_data
end

# Controller integration
class ApplicationController < ActionController::Base
  around_action :profile_requests

  private

  def profile_requests
    ApplicationProfiler.profile_action(controller_name, action_name) { yield }
  end
end
Application Performance Monitoring (APM) tools provide continuous profiling with minimal configuration. These tools automatically instrument applications and collect performance metrics without manual profiling code.
# APM tool configuration
# Gemfile
group :production do
  gem 'skylight' # or 'newrelic_rpm', 'appsignal'
end

# config/application.rb
config.skylight.environments = ['production']
config.skylight.probes += ['redis', 'mongo']

# Automatic instrumentation tracks:
# - HTTP request performance
# - Database query analysis
# - Background job monitoring
# - Custom instrumentation points
Conditional profiling enables targeted performance investigation without affecting all requests. Profile collection based on request parameters, user segments, or performance thresholds minimizes overhead.
# Conditional profiling strategies
class ConditionalProfiler
  # A slow request can only be detected after it finishes, so this records
  # it (via the app-defined profile_slow_request hook) rather than
  # profiling retroactively
  def self.profile_if_slow(threshold_ms = 1000)
    start_time = Time.current
    result = yield
    duration_ms = (Time.current - start_time) * 1000

    profile_slow_request(duration_ms) if duration_ms > threshold_ms
    result
  end

  def self.profile_user_segment(user)
    return yield unless user.beta_tester?

    result = nil
    StackProf.run(
      mode: :cpu,
      out: "tmp/beta_user_#{user.id}_#{Time.current.to_i}.dump"
    ) { result = yield }
    result # StackProf.run returns the profile, so capture the block's value
  end

  def self.profile_sample_percentage(percentage = 0.1)
    return yield unless rand < percentage

    result = nil
    StackProf.run(mode: :wall) { result = yield }
    result
  end
end
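Since a slow request cannot be profiled retroactively, one workable pattern is two-phase: flag an action as slow when it exceeds the threshold, then profile its next occurrence. A gem-free sketch of the bookkeeping (SlowActionTracker is a hypothetical helper; in practice the :profiled branch would wrap the block in StackProf.run):

```ruby
# Two-phase conditional profiler: mark an action slow when it exceeds the
# threshold, and profile only its next occurrence.
class SlowActionTracker
  def initialize(threshold_ms: 1000)
    @threshold_ms = threshold_ms
    @pending = {} # action name => true when the next call should be profiled
  end

  def call(action)
    profile_this_call = @pending.delete(action)

    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield
    duration_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000

    # Flag the action so its NEXT occurrence gets profiled
    @pending[action] = true if duration_ms > @threshold_ms
    [result, profile_this_call ? :profiled : :skipped]
  end
end

tracker = SlowActionTracker.new(threshold_ms: 50)
_, first  = tracker.call('users#index') { sleep 0.06 } # slow: flags the action
_, second = tracker.call('users#index') { :fast }      # this call gets profiled
_, third  = tracker.call('users#index') { :fast }      # fast again: back to skipping

puts [first, second, third].inspect
# => [:skipped, :profiled, :skipped]
```

The assumption is that slowness recurs per action; one-off stalls slip through, which is the usual trade-off of threshold-triggered profiling.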
Production profiling data aggregation requires careful handling of sensitive information and storage constraints. Profile data contains method names, file paths, and execution patterns that may reveal application internals.
# Secure profile data handling
class ProfileDataManager
  def self.sanitize_profile(profile_data)
    # deep_transform_values recurses into nested hashes itself and yields
    # only leaf values, so drop sensitive keys before transforming
    profile_data.except(:sensitive_data).deep_transform_values do |value|
      value.is_a?(String) ? sanitize_path(value) : value
    end
  end

  def self.aggregate_profiles(time_window = 1.hour)
    profiles = ProfileData.where(created_at: time_window.ago..Time.current)

    aggregated = profiles.group_by(&:controller_action).map do |action, data|
      {
        action: action,
        avg_samples: data.sum(&:samples) / data.size,
        total_requests: data.size,
        top_methods: extract_hot_methods(data)
      }
    end

    ProfileReport.create!(
      time_window: time_window,
      data: aggregated
    )
  end

  def self.sanitize_path(path)
    path.gsub(Rails.root.to_s, '[ROOT]')
        .gsub(%r{/gems/[^/]+}, '/[GEM]')
  end

  # A bare `private` does not apply to class methods
  private_class_method :sanitize_path
end
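The path-scrubbing step works outside Rails as well; a standalone sketch, where the '[ROOT]' and '[GEM]' placeholders are arbitrary choices:

```ruby
# Scrub absolute paths from profile frames so stored data does not leak
# filesystem layout or installed gem versions.
def sanitize_path(path, app_root)
  path.gsub(app_root, '[ROOT]')
      .gsub(%r{/gems/[^/]+}, '/[GEM]')
end

app_frame = '/srv/app/releases/42/app/models/user.rb:10'
puts sanitize_path(app_frame, '/srv/app/releases/42')
# => [ROOT]/app/models/user.rb:10

gem_frame = '/home/deploy/.gem/gems/rack-3.0.8/lib/rack.rb:5'
puts sanitize_path(gem_frame, '/srv/app/releases/42')
# => /home/deploy/.gem/[GEM]/lib/rack.rb:5
```

Scrubbing the gem segment also removes version numbers, which would otherwise advertise exactly which dependency versions production runs.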
Common Pitfalls
Ruby-prof significantly slows program execution, making wall-time measurements unreliable for performance optimization decisions. Deterministic profilers change execution behavior, potentially masking real performance issues or creating artificial bottlenecks.
# Incorrect: Using ruby-prof for wall-time analysis
RubyProf.profile(measure_mode: RubyProf::WALL_TIME) do
  api_call_with_network_timeout # Results not representative
end

# Correct: Use sampling profiler for realistic timing
StackProf.run(mode: :wall) do
  api_call_with_network_timeout # Minimal impact on execution
end
Sampling profilers require sufficient sample collection for statistical accuracy. Short-running code may not generate enough samples for meaningful analysis, while incorrect sampling intervals can miss important execution patterns.
# Insufficient sampling - unreliable results
StackProf.run(mode: :cpu, interval: 10_000) do # 10ms between samples - too infrequent
  quick_operation # May not be sampled at all
end

# Better sampling configuration
StackProf.run(mode: :cpu, interval: 1000) do # 1ms intervals
  1000.times { quick_operation } # Ensure adequate samples
end

# Long-running profiling for statistical accuracy
def profile_with_warmup
  # Warmup phase - exclude from profiling
  100.times { target_method }

  # Actual profiling with sufficient duration
  StackProf.run(mode: :cpu) do
    1000.times { target_method }
  end
end
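A quick sanity check before profiling is to estimate how many samples a run will yield from its expected duration and the sampling interval (microseconds for :cpu and :wall), then grow the workload until the estimate is statistically useful. A small illustrative helper:

```ruby
# Estimate sample count: samples ≈ duration / interval.
# interval_us is in microseconds, matching StackProf's :cpu/:wall modes.
def expected_samples(duration_seconds, interval_us)
  (duration_seconds * 1_000_000 / interval_us).floor
end

# A 5ms operation at the default 1ms interval yields only ~5 samples...
puts expected_samples(0.005, 1000)
# => 5

# ...far too few to trust; repeating it 1000 times gives a usable ~5000
puts expected_samples(5.0, 1000)
# => 5000
```

As a rule of thumb, treat profiles with only a handful of samples as noise and either repeat the workload or shrink the interval.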
Thread profiling introduces complexity with concurrent execution analysis. Multiple threads create interleaved execution patterns that complicate bottleneck identification. Thread exclusion and filtering help isolate specific execution paths.
# Thread profiling challenges
require 'ruby-prof'

# Problematic: All threads profiled simultaneously
RubyProf.start
threads = 5.times.map { |i| Thread.new { worker_method(i) } }
threads.each(&:join)
result = RubyProf.stop
# Output contains mixed thread execution - difficult to analyze

# Better: Profile specific threads
main_thread = Thread.current
RubyProf.profile(include_threads: [main_thread]) do
  single_threaded_work
end

# Or exclude framework threads
excluded_threads = Thread.list.select { |t| t[:name]&.include?('server') }
RubyProf.profile(exclude_threads: excluded_threads) do
  application_logic
end
Memory profiling interpretation requires understanding Ruby's object allocation and garbage collection behavior. Object allocation profilers count creation events, not concurrent object existence.
# Memory profiling misconceptions
def analyze_memory_usage
  # Incorrect assumption: High allocation means high memory usage
  profile = StackProf.run(mode: :object) do
    1000.times do
      temp_array = [1, 2, 3] # Allocated but quickly collected
      temp_array.sum
    end
  end
  # This shows 1000 array allocations, not 1000 arrays in memory
end

# Better memory analysis combines allocation and retention
def comprehensive_memory_analysis
  # Track allocations
  allocation_profile = StackProf.run(mode: :object) do
    complex_operation
  end

  # Track memory retention
  memory_before = GC.stat[:heap_live_slots]
  complex_operation
  GC.start
  memory_after = GC.stat[:heap_live_slots]

  puts "Allocated objects: #{allocation_profile[:samples]}"
  puts "Retained objects: #{memory_after - memory_before}"
end
Profile interpretation errors include focusing on absolute rather than relative performance metrics. Method call counts and execution times depend on input size, system load, and measurement overhead.
# Misleading absolute metrics
def profile_string_operations
  small_data = ["short"] * 10
  large_data = ["longer string data"] * 10_000

  # Different input sizes produce incomparable results
  small_profile = StackProf.run(mode: :cpu) { small_data.join(" ") }
  large_profile = StackProf.run(mode: :cpu) { large_data.join(" ") }
  # Cannot compare absolute sample counts between profiles
end

# Correct approach: Normalize and compare patterns
def comparative_profiling
  [100, 1000, 10_000].each do |size|
    data = ["item"] * size
    profile = StackProf.run(mode: :cpu) { data.join(" ") }

    # Analyze samples per input unit for scaling behavior
    samples_per_item = profile[:samples].to_f / size
    puts "Size #{size}: #{samples_per_item} samples per item"
  end
end
Reference
Core Classes
Class/Module | Purpose | Key Methods |
---|---|---|
`Profiler__` | Built-in deterministic profiler | `start_profile`, `stop_profile`, `print_profile` |
`RubyProf::Profile` | Advanced profiling with multiple modes | `start`, `stop`, `profile`, `pause`, `resume` |
`StackProf` | Sampling-based call stack profiler | `run`, `start`, `stop` |
`MemoryProfiler` | Memory allocation tracking | `report` |
`Benchmark` | Simple timing measurements | `measure`, `realtime`, `benchmark` |
Measurement Modes
Mode | Ruby-Prof Constant | StackProf Mode | Measures | Overhead |
---|---|---|---|---|
Wall Time | `RubyProf::WALL_TIME` | `:wall` | Real elapsed time including I/O | Low-Medium |
Process Time | `RubyProf::PROCESS_TIME` | `:cpu` | CPU time (user + system), excluding I/O waits | Low |
Allocations | `RubyProf::ALLOCATIONS` | `:object` | Object creation count | Medium-High |
Memory Usage | `RubyProf::MEMORY` | N/A | Byte allocation tracking | High |
Command Line Tools
Tool | Usage | Output Formats |
---|---|---|
`ruby -rprofile script.rb` | Automatic built-in profiling | Text report |
`ruby-prof script.rb` | Command-line ruby-prof execution | Text, HTML, GraphViz, JSON |
`stackprof dump.dump --text` | StackProf report generation | Text, GraphViz, FlameGraph, JSON |
`rbspy record --pid PID` | External process profiling | FlameGraph, raw samples |
Configuration Options
StackProf Options
StackProf.run(
  mode: :cpu,          # :cpu, :wall, :object
  interval: 1000,      # Sampling interval (microseconds for :cpu/:wall, allocations for :object)
  out: 'profile.dump', # Output file path
  raw: false,          # Include raw sample data (required for flamegraphs)
  ignore_gc: false     # Set true to exclude garbage collection frames
)
RubyProf Options
RubyProf.profile(
  measure_mode: RubyProf::WALL_TIME, # Measurement type
  track_allocations: false,          # Track object allocation details
  include_threads: [Thread.current], # Threads to include
  exclude_threads: [],               # Threads to exclude
  merge_fibers: false                # Combine fiber execution
)
Output Interpreters
Report Type | Description | Use Case |
---|---|---|
Flat Report | Method-level time/allocation summary | Quick bottleneck identification |
Graph Report | Call hierarchy with parent/child relationships | Understanding call flow |
Call Tree | Hierarchical execution structure | Detailed execution analysis |
FlameGraph | Interactive flame graph visualization | Visual performance analysis |
Integration Patterns
Rails Middleware
# config/application.rb
config.middleware.insert_before(
  Rack::Runtime,
  Rack::MiniProfiler
)
Background Job Profiling
class ProfiledJob < ApplicationJob
  around_perform :profile_execution

  private

  def profile_execution
    StackProf.run(
      mode: :wall,
      out: "tmp/job_#{job_id}_profile.dump" # ActiveJob exposes job_id (jid is Sidekiq-specific)
    ) { yield }
  end
end
Error Handling
Error Type | Common Cause | Solution |
---|---|---|
`NoMethodError` on profile results | Incorrect profiler usage | Verify profiler is started/stopped correctly |
Empty profile data | Insufficient execution time | Increase workload or reduce sampling interval |
High memory usage during profiling | Allocation tracking enabled | Use sampling modes or disable allocation tracking |
Thread synchronization errors | Multi-threaded profiling | Use thread inclusion/exclusion filters |
Performance Baselines
Operation | Ruby-Prof Overhead | StackProf Overhead | Recommended Profiler |
---|---|---|---|
CPU-intensive computation | 5-20x slower | < 5% | StackProf :cpu |
I/O operations | 2-10x slower | < 2% | StackProf :wall |
Memory allocation analysis | 10-50x slower | 10-30% | StackProf :object |
Production profiling | Not recommended | < 1% | StackProf or rbspy |