Overview
Stackprof is a sampling (statistical) profiler for Ruby that collects call stack snapshots during program execution. It is implemented as a C extension that hooks into the interpreter to collect timing and memory allocation data.
The profiler operates by interrupting program execution at regular intervals and recording the current call stack. This sampling approach provides performance insights with minimal overhead compared to tracing profilers that instrument every method call.
Stackprof supports three primary sampling modes: CPU time sampling measures actual processor time spent in code, wall time sampling captures elapsed real time including I/O waits, and object allocation sampling tracks memory allocation patterns. Each mode serves different profiling scenarios and performance analysis needs.
require 'stackprof'

# Basic CPU profiling
StackProf.run(mode: :cpu, out: 'profile.dump') do
  # Code to profile
  1000.times { expensive_calculation }
end
The profiler generates binary dump files containing call stack samples and execution statistics. These dumps can be analyzed using built-in report generators or external tools like speedscope for interactive visualization.
# Generate a text report from the dump file (dumps are Marshal-serialized)
profile = Marshal.load(File.binread('profile.dump'))
StackProf::Report.new(profile).print_text
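For interactive visualization, collecting raw samples lets external viewers reconstruct complete call stacks. A minimal sketch, assuming a placeholder `handle_request` workload; the resulting dump can be opened directly in speedscope (https://www.speedscope.app):

# Collect raw samples so external viewers can rebuild full call stacks
StackProf.run(mode: :wall, raw: true, out: 'wall_raw.dump') do
  handle_request # hypothetical workload
end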
Ruby applications typically integrate stackprof for production performance monitoring, development debugging, and optimization analysis. Because it is a C extension built against interpreter internals, it targets CRuby (MRI), where it provides consistent profiling data for performance regression detection and hot spot identification.
Basic Usage
Stackprof profiling begins with the `StackProf.run` method, which wraps a block of code with sample collection. The profiler accepts configuration options that control sampling behavior and output format.
require 'stackprof'

# Profile CPU usage, sampling every 1000 microseconds (1 ms) of CPU time
result = StackProf.run(mode: :cpu, interval: 1000) do
  data = []
  10_000.times do |i|
    data << i.to_s.reverse
  end
  data.sort
end

# Access raw profiling data
puts result[:samples]        # Total samples collected
puts result[:missed_samples] # Samples that could not be collected
The `mode` parameter determines what stackprof measures during execution: CPU mode samples based on processor time, wall mode based on elapsed time, and object mode based on memory allocations.
# Wall time profiling includes I/O waits
require 'net/http'

StackProf.run(mode: :wall, out: 'wall_profile.dump') do
  Net::HTTP.get('example.com', '/')
  File.read('large_file.txt')
  sleep(0.1)
end
# Object allocation profiling
StackProf.run(mode: :object, out: 'object_profile.dump') do
  1000.times { Array.new(100) { rand } }
end
Report generation transforms raw profiling data into readable analysis. The `print_text` method produces call stack summaries with sample counts and percentages.
# Load and analyze an existing profile (dumps are Marshal-serialized)
data = Marshal.load(File.binread('profile.dump'))
report = StackProf::Report.new(data)

# Generate text report
report.print_text

# Generate method-focused report
report.print_method(/expensive_method/)
Filtering narrows analysis to specific code areas: `print_method` takes a method name or regexp, and recent versions of `print_text` accept a frame limit plus `select_names:`/`reject_names:` regexp filters.

# Profile and build a report for filtering
result = StackProf.run(mode: :cpu, interval: 1000) do
  process_user_data
  generate_reports
  cleanup_resources
end
report = StackProf::Report.new(result)

# Analyze only methods matching a pattern, limited to the top 10 frames
report.print_text(false, 10, select_names: [/process_/])
report.print_method(/generate_reports/)
Advanced Usage
Stackprof provides extensive configuration options for specialized profiling scenarios and detailed performance analysis. Custom sampling intervals adjust profiling granularity based on application characteristics and analysis requirements.
# Frequent sampling (every 100 microseconds) for short-running operations
StackProf.run(mode: :cpu, interval: 100, raw: true) do
  critical_algorithm
end

# Coarser sampling (every 10 ms) for long-running processes
StackProf.run(mode: :wall, interval: 10_000, out: 'background.dump') do
  background_job_processor
end
The `raw` option preserves the sequence of individual samples instead of only aggregated counts. Raw samples enable detailed timing analysis, custom report generation, and flamegraph viewers such as speedscope.
result = StackProf.run(mode: :cpu, raw: true) do
  complex_operation
end

# result[:raw] is a flat array: stack depth, <depth> frame ids, then a repeat count
raw, frame_table, i = result[:raw], result[:frames], 0
while i < raw.length
  depth = raw[i]
  stack = raw[i + 1, depth]
  count = raw[i + depth + 1]
  puts "#{count} sample(s), #{depth} frames deep"
  stack.each { |frame_id| puts "  #{frame_table[frame_id][:name]}" }
  i += depth + 2
end
Profile merging combines multiple profiling sessions for comprehensive analysis of distributed operations or repeated executions.
profiles = []

# Collect multiple profiles
5.times do |i|
  profile = StackProf.run(mode: :cpu, interval: 1000) do
    process_batch(i)
  end
  profiles << profile
end

# Merge the reports for aggregate analysis (StackProf::Report#+ combines reports of the same mode)
merged_report = profiles.map { |data| StackProf::Report.new(data) }.reduce(:+)
merged_report.print_text
Custom report formatting extracts specific performance metrics and integrates with monitoring systems. The report data structure provides programmatic access to profiling statistics.
data = Marshal.load(File.binread('production.dump'))
report = StackProf::Report.new(data)

# Extract top methods by self-sample count (report.frames is keyed by frame address)
top_methods = report.frames
                    .sort_by { |_, frame| -frame[:samples] }
                    .first(10)
                    .map { |_addr, frame| [frame[:name], frame[:samples]] }

# Generate custom metrics
total_samples = data[:samples]
top_methods.each do |method, samples|
  percentage = (samples.to_f / total_samples * 100).round(2)
  puts "#{method}: #{samples} samples (#{percentage}%)"
end
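The same loop can push metrics into a monitoring system. This sketch assumes a hypothetical `metrics_client` object that responds to `gauge`; substitute your StatsD or equivalent client:

# Hypothetical monitoring hook: emit one gauge per hot method
top_methods.each do |method, samples|
  metrics_client.gauge("profiling.samples.#{method.gsub(/\W+/, '_')}", samples)
end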
Thread-specific profiling runs the profiler from within an individual thread to focus analysis on that thread's work in multi-threaded applications and to surface thread-specific bottlenecks.
# Profile work performed in a dedicated thread
thread = Thread.new do
  StackProf.run(mode: :cpu, out: 'thread_profile.dump') do
    thread_specific_work
  end
end
thread.join

# Analyze the thread's profile
data = Marshal.load(File.binread('thread_profile.dump'))
StackProf::Report.new(data).print_text
Performance & Memory
Stackprof introduces minimal performance overhead during profiling, typically consuming 1-5% of CPU time depending on sampling frequency and application characteristics. The overhead primarily comes from stack traversal and sample recording operations.
Sampling interval configuration balances profiling detail against performance impact: shorter intervals (more frequent samples) capture more detail but add overhead, while longer intervals reduce the impact but may miss short-lived hotspots.
# Measure profiling overhead
require 'benchmark'

# Baseline execution time
baseline = Benchmark.measure do
  10_000.times { expensive_operation }
end

# Profiled execution time
profiled = Benchmark.measure do
  StackProf.run(mode: :cpu, interval: 1000) do
    10_000.times { expensive_operation }
  end
end

overhead = ((profiled.real - baseline.real) / baseline.real * 100).round(2)
puts "Profiling overhead: #{overhead}%"
Memory usage scales with profile duration and call stack depth. Stackprof stores frame information and sample data in memory before writing to dump files, so applications with deep call stacks or long profiling sessions may need to bound profile size, for example by profiling in segments (sketched below).
# Monitor object counts around a profiled block
def measure_memory_usage
  GC.start
  stats = ObjectSpace.count_objects
  stats[:TOTAL] - stats[:FREE] # live object slots
end

before_memory = measure_memory_usage
profile_data = StackProf.run(mode: :object, interval: 1) do # interval 1 samples every allocation
  large_data_processing
end
after_memory = measure_memory_usage

memory_increase = after_memory - before_memory
puts "Object count increase during the profiled block: #{memory_increase}"
puts "Profile samples collected: #{profile_data[:samples]}"
Object allocation profiling tracks memory allocation patterns and identifies memory-intensive operations. This mode helps locate memory leaks and optimize allocation-heavy code paths.
# Analyze allocation patterns
allocation_profile = StackProf.run(mode: :object) do
  data = {}
  1000.times do |i|
    key = "key_#{i}"
    data[key] = Array.new(100) { i * rand }
  end
  data
end

report = StackProf::Report.new(allocation_profile)
report.print_text

# Focus on allocation-heavy methods
report.print_method(/new/)
Wall clock profiling captures I/O wait times and system call overhead that CPU profiling misses. This mode reveals performance issues related to file operations, network requests, and external service dependencies.
# Compare CPU vs wall time profiling of the same operation
cpu_result = StackProf.run(mode: :cpu) do
  perform_io_heavy_operation
end

wall_result = StackProf.run(mode: :wall) do
  perform_io_heavy_operation
end

puts "CPU samples: #{cpu_result[:samples]}"
puts "Wall samples: #{wall_result[:samples]}"
puts "I/O wait ratio: #{(wall_result[:samples] - cpu_result[:samples]).to_f / wall_result[:samples]}"
Production Patterns
Production environments require careful stackprof integration to avoid performance degradation while maintaining comprehensive performance monitoring. Conditional profiling activates sampling based on request characteristics or system conditions.
class ApplicationController < ActionController::Base
  around_action :conditional_profiling

  private

  def conditional_profiling
    if should_profile?
      profile_name = "#{controller_name}_#{action_name}_#{Time.current.to_i}"
      StackProf.run(mode: :wall, out: "profiles/#{profile_name}.dump") do
        yield
      end
    else
      yield
    end
  end

  def should_profile?
    # Profile 1% of requests or specific slow operations
    rand < 0.01 || params[:action] == 'expensive_report'
  end
end
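As an alternative to a hand-rolled `around_action`, the gem ships a Rack middleware (`StackProf::Middleware`). A minimal sketch of wiring it into a Rails app, with option values chosen purely for illustration:

# config/initializers/stackprof.rb
require 'stackprof/middleware'

Rails.application.config.middleware.use(
  StackProf::Middleware,
  enabled: ->(env) { rand < 0.01 }, # profile ~1% of requests
  mode: :wall,
  interval: 1000,
  save_every: 5,                    # write a dump after every 5 profiled requests
  path: 'tmp/stackprof'
)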
Automated profile collection and analysis integrates with monitoring systems to detect performance regressions and track optimization improvements over time.
class ProfileCollector
  def initialize(storage_path:, retention_days: 7)
    @storage_path = storage_path
    @retention_days = retention_days
  end

  def collect_profile(name, mode: :cpu, interval: 1000)
    timestamp = Time.current.to_i
    filename = "#{@storage_path}/#{name}_#{timestamp}.dump"
    result = StackProf.run(mode: mode, interval: interval, out: filename) do
      yield
    end
    analyze_and_alert(filename, result)
    cleanup_old_profiles
    result
  end

  private

  def analyze_and_alert(filename, result)
    if result[:samples] > 10000 # High sample count indicates slow operation
      send_performance_alert(filename)
    end
  end

  def cleanup_old_profiles
    cutoff_time = Time.current - (@retention_days * 24 * 60 * 60)
    Dir.glob("#{@storage_path}/*.dump").each do |file|
      File.delete(file) if File.mtime(file) < cutoff_time
    end
  end
end
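A usage sketch for the collector above; the job name and `import_nightly_data` method are placeholders:

collector = ProfileCollector.new(storage_path: 'tmp/profiles', retention_days: 3)
collector.collect_profile('nightly_import', mode: :wall) do
  import_nightly_data # hypothetical application method
end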
Background job profiling captures performance characteristics of asynchronous operations without impacting user-facing request performance.
class ProfiledJob < ApplicationJob
  def perform(*args)
    job_name = self.class.name.underscore
    StackProf.run(mode: :cpu, out: "job_profiles/#{job_name}_#{job_id}.dump") do
      original_perform(*args)
    end
  end

  def original_perform(*args)
    # Actual job implementation goes in subclasses
    raise NotImplementedError
  end
end

class DataProcessingJob < ProfiledJob
  def original_perform(dataset_id)
    dataset = Dataset.find(dataset_id)
    dataset.process_analytics
    dataset.generate_reports
  end
end
Profile aggregation and comparison enable performance trend analysis and regression detection across application deployments.
class ProfileAnalyzer
  def compare_profiles(baseline_file, current_file)
    baseline = Marshal.load(File.binread(baseline_file))
    current = Marshal.load(File.binread(current_file))

    baseline_report = StackProf::Report.new(baseline)
    current_report = StackProf::Report.new(current)

    # Extract top methods from each profile
    baseline_methods = extract_top_methods(baseline_report, 20)
    current_methods = extract_top_methods(current_report, 20)

    # Compare per-method sample counts
    performance_changes = {}
    baseline_methods.each do |method, baseline_samples|
      current_samples = current_methods[method] || 0
      performance_changes[method] = current_samples.to_f / baseline_samples
    end

    # Identify significant regressions and improvements
    regressions = performance_changes.select { |_, ratio| ratio > 1.5 }
    improvements = performance_changes.select { |_, ratio| ratio < 0.5 }

    { regressions: regressions, improvements: improvements }
  end

  private

  def extract_top_methods(report, limit)
    # Key by method name (report.frames is keyed by frame address, which differs between processes)
    report.frames.sort_by { |_, frame| -frame[:samples] }
          .first(limit)
          .map { |_addr, frame| [frame[:name], frame[:samples]] }
          .to_h
  end
end
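For example, a deploy pipeline might compare the previous release's profile against a fresh one (file paths are illustrative):

analyzer = ProfileAnalyzer.new
changes = analyzer.compare_profiles('profiles/baseline.dump', 'profiles/current.dump')
changes[:regressions].each do |method, ratio|
  puts "#{method} now accounts for ~#{ratio.round(1)}x the samples of the baseline"
end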
Error Handling & Debugging
Stackprof profiling can encounter various error conditions that require proper handling to maintain application stability and ensure reliable performance data collection.
Signal handling conflicts occur when stackprof's timer signals (CPU mode uses SIGPROF) interfere with application signal handlers. This commonly affects applications that use timeout mechanisms or custom signal processing.
begin
  # CPU-mode sampling relies on the CPU interval timer signal;
  # avoid installing competing handlers for it while profiling
  result = StackProf.run(mode: :cpu, interval: 1000) do
    operation_that_may_timeout
  end
rescue SignalException => e
  puts "Signal handling conflict: #{e.message}"
  # Fall back to wall-time profiling, which does not use the CPU interval timer
  result = StackProf.run(mode: :wall, interval: 1000) do
    operation_that_may_timeout
  end
end
Memory constraints can cause profiling failures when sampling generates more data than available memory. This typically occurs with high-frequency sampling or very long profiling sessions.
def safe_profile(code_block, fallback_interval: 10_000)
  # Start with frequent sampling and back off to coarser intervals on failure
  intervals = [100, 500, 1000, 5000, fallback_interval].uniq.sort
  intervals.each do |interval|
    begin
      return StackProf.run(mode: :cpu, interval: interval, &code_block)
    rescue SystemStackError, NoMemoryError => e
      puts "Profiling failed with interval #{interval}: #{e.message}"
      next if interval < fallback_interval
      raise
    end
  end
end

# Usage with automatic fallback
result = safe_profile(lambda { expensive_recursive_operation })
Profile corruption can occur when profiling is interrupted or when concurrent access attempts modify dump files. Validation ensures profile integrity before analysis.
def load_and_validate_profile(filename)
  return nil unless File.exist?(filename)

  begin
    data = Marshal.load(File.binread(filename))

    # Validate profile structure
    required_keys = [:version, :mode, :interval, :samples, :frames]
    missing_keys = required_keys - data.keys
    if missing_keys.any?
      puts "Invalid profile #{filename}: missing keys #{missing_keys}"
      return nil
    end

    # Validate sample count consistency
    if data[:samples] == 0 && data[:frames].any?
      puts "Inconsistent profile #{filename}: no samples but frames present"
      return nil
    end

    data
  rescue => e
    puts "Failed to load profile #{filename}: #{e.message}"
    nil
  end
end
# Safe profile analysis with validation
profile_files = Dir.glob('profiles/*.dump')
valid_profiles = profile_files.filter_map { |file| load_and_validate_profile(file) }

if valid_profiles.empty?
  puts "No valid profiles found for analysis"
else
  # Report#+ merges reports; all dumps must share the same mode
  merged_report = valid_profiles.map { |data| StackProf::Report.new(data) }.reduce(:+)
  merged_report.print_text
end
Thread safety issues arise because only one profiling session can be active in a process at a time, and profiling multi-threaded code mixes samples from whichever thread is executing. Ruby's global VM lock mitigates some concurrency issues but does not eliminate race conditions around starting and stopping the profiler.
class ThreadSafeProfiler
  def initialize
    @profiling = false
    @mutex = Mutex.new
  end

  def profile(name, mode: :cpu)
    @mutex.synchronize do
      if @profiling
        puts "Profiling already in progress, skipping #{name}"
        return yield
      end
      @profiling = true
    end

    begin
      StackProf.run(mode: mode, out: "#{name}.dump") do
        yield
      end
    ensure
      @mutex.synchronize { @profiling = false }
    end
  end
end

# Safe concurrent profiling
profiler = ThreadSafeProfiler.new

threads = 3.times.map do |i|
  Thread.new do
    profiler.profile("thread_#{i}") { cpu_intensive_work(i) }
  end
end
threads.each(&:join)
Reference
Core Methods
Method | Parameters | Returns | Description
---|---|---|---
`StackProf.run(**options, &block)` | `mode` (Symbol), `interval` (Integer), `out` (String), `raw` (Boolean) | Hash | Profiles the block and returns the results
`StackProf.start(**options)` | same sampling options as `run` | Boolean | Starts profiling without a block
`StackProf.stop` | none | Boolean | Stops the current profiling session
`StackProf.results` | none | Hash | Returns the results of the most recent session
`StackProf.running?` | none | Boolean | Reports whether profiling is active
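A brief sketch of the non-block API from the table above, wrapped around a placeholder `do_some_work` call:

# Start and stop profiling around code that cannot be wrapped in a block
StackProf.start(mode: :wall, interval: 1000)
do_some_work
StackProf.stop
profile = StackProf.results # returns the collected data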
Profiling Modes
Mode | Samples based on | Overhead | Use Case
---|---|---|---
`:cpu` | CPU time | Low | CPU-bound operations
`:wall` | Elapsed (wall-clock) time | Medium | I/O-bound operations
`:object` | Object allocations | High | Memory optimization
Configuration Options
Option | Type | Default | Description
---|---|---|---
`mode` | Symbol | `:cpu` | Sampling mode (`:cpu`, `:wall`, `:object`)
`interval` | Integer | 1000 | Sampling interval (microseconds for `:cpu`/`:wall`, allocations for `:object`)
`out` | String | nil | Output dump file path
`raw` | Boolean | false | Preserve the individual sample sequence
Report Methods
Method | Parameters | Returns | Description
---|---|---|---
`Report.new(data)` | `data` (Hash) | Report | Creates a report from profile data
`#print_text(sort_by_total = false, limit = nil, ...)` | sort flag, frame limit, `select_names:`/`reject_names:` (Arrays of Regexp) | nil | Prints a text summary
`#print_method(name)` | `name` (String or Regexp) | nil | Prints per-line detail for matching methods
`#frames` | none | Hash | Returns frame data keyed by frame address
`#+(other)` | `other` (Report) | Report | Merges two reports of the same mode
Data Structure
# Profile data structure
{
  :version => 1.2,
  :mode => :cpu,
  :interval => 1000,
  :samples => 12045,
  :gc_samples => 302,
  :missed_samples => 15,
  :frames => {
    0x12345 => {
      :name => "MyClass#method_name",
      :file => "/path/to/file.rb",
      :line => 42,
      :samples => 150,
      :total_samples => 250
    }
  }
}
Error Classes
Error | Cause | Resolution
---|---|---
`SignalException` | Timer signal conflicts | Use wall mode or avoid competing signal handlers
`SystemStackError` | Stack overflow during sampling | Increase the sampling interval (sample less often)
`NoMemoryError` | Insufficient memory for samples | Increase the interval or shorten the profiling session
`Errno::EACCES` | Permission denied writing the dump file | Check directory and file permissions
Environment Variables
Stackprof itself does not read environment variables; these are common conventions honored by application wrapper code:

Variable | Effect | Example
---|---|---
`STACKPROF_ENABLED` | Enable/disable profiling | `STACKPROF_ENABLED=0`
`STACKPROF_MODE` | Default profiling mode | `STACKPROF_MODE=wall`
`STACKPROF_INTERVAL` | Default sampling interval | `STACKPROF_INTERVAL=2000`