stackprof

A comprehensive guide to profiling Ruby applications using stackprof for performance analysis and optimization.

Overview

Stackprof is a statistical profiling library for Ruby that samples call stack information during program execution. It is implemented as a C extension that hooks into the interpreter to collect timing and memory allocation data.

The profiler operates by interrupting program execution at regular intervals and recording the current call stack. This sampling approach provides performance insights with minimal overhead compared to tracing profilers that instrument every method call.
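
The profiler can also be driven without a block when the start and stop points don't line up with a single method. A minimal sketch using the explicit session API (perform_request_handling is a placeholder for the code being profiled):

require 'stackprof'

# Explicitly managed profiling session
StackProf.start(mode: :wall, interval: 1000)
perform_request_handling
StackProf.stop

# Collect the data recorded for the session
results = StackProf.results
puts results[:samples]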

Stackprof supports three primary sampling modes: CPU time sampling measures actual processor time spent in code, wall time sampling captures elapsed real time including I/O waits, and object allocation sampling tracks memory allocation patterns. Each mode serves different profiling scenarios and performance analysis needs.

require 'stackprof'

# Basic CPU profiling
StackProf.run(mode: :cpu, out: 'profile.dump') do
  # Code to profile
  1000.times { expensive_calculation }
end

The profiler generates binary dump files containing call stack samples and execution statistics. These dumps can be analyzed using built-in report generators or external tools like speedscope for interactive visualization.

# Generate a text report from the dump written above
# (dump files are Marshal-serialized result hashes)
data = Marshal.load(File.binread('profile.dump'))
StackProf::Report.new(data).print_text

Ruby applications typically integrate stackprof for production performance monitoring, development debugging, and optimization analysis. The library requires CRuby (MRI), since its sampler is a C extension that hooks into the interpreter, and provides consistent profiling data for performance regression detection and hot spot identification.

Basic Usage

Stackprof profiling begins with the StackProf.run method that wraps code blocks with sampling collection. The profiler accepts configuration options that control sampling behavior and output format.

require 'stackprof'

# Profile CPU usage with 1000 samples per second
result = StackProf.run(mode: :cpu, interval: 1000) do
  data = []
  10000.times do |i|
    data << i.to_s.reverse
  end
  data.sort
end

# Access raw profiling data
puts result[:samples]    # Total samples collected
puts result[:missed_samples]    # Samples that could not be collected

The mode parameter determines what stackprof measures during execution. CPU mode samples based on processor time, wall mode samples based on elapsed time, and object mode samples based on memory allocations.

require 'net/http'

# Wall time profiling includes I/O waits
StackProf.run(mode: :wall, out: 'wall_profile.dump') do
  Net::HTTP.get('example.com', '/')
  File.read('large_file.txt')
  sleep(0.1)
end

# Object allocation profiling
StackProf.run(mode: :object, out: 'object_profile.dump') do
  1000.times { Array.new(100) { rand } }
end

Report generation transforms raw profiling data into readable analysis. The print_text method produces call stack summaries with execution time percentages and sample counts.

# Load an existing dump (dump files are Marshal-serialized result hashes)
data = Marshal.load(File.binread('profile.dump'))
report = StackProf::Report.new(data)

# Generate text report
report.print_text

# Generate method-focused report
report.print_method(/expensive_method/)

Filtering focuses analysis on specific code areas: print_method accepts a method name or regular expression that selects frames, and print_text accepts a limit argument that caps how many frames appear in the summary.

# Profile several operations, then narrow the analysis
result = StackProf.run(mode: :cpu, interval: 1000) do
  process_user_data
  generate_reports
  cleanup_resources
end

report = StackProf::Report.new(result)

# Analyze only methods matching a pattern
report.print_method(/process_/)

# Limit the text summary to the top 10 frames
report.print_text(false, 10)

Advanced Usage

Stackprof provides extensive configuration options for specialized profiling scenarios and detailed performance analysis. Custom sampling intervals adjust profiling granularity based on application characteristics and analysis requirements.

# High-frequency sampling (every 100 microseconds) for short-running operations
StackProf.run(mode: :cpu, interval: 100, raw: true) do
  critical_algorithm
end

# Coarse sampling (every 10 milliseconds) keeps overhead low for long-running processes
StackProf.run(mode: :wall, interval: 10_000, out: 'background.dump') do
  background_job_processor
end

The raw option preserves the individual sampled stacks instead of only aggregated per-frame counts. This enables flame graph generation, detailed timing analysis, and custom report formats.

result = StackProf.run(mode: :cpu, raw: true) do
  complex_operation
end

# The :raw array is a flat encoding: each record is a stack depth,
# that many frame ids (root first), then a count of consecutive identical samples
raw = result[:raw]
i = 0
while i < raw.length
  depth     = raw[i]
  frame_ids = raw[i + 1, depth]
  count     = raw[i + 1 + depth]
  leaf      = result[:frames][frame_ids.last][:name]
  puts "#{count} sample(s), #{depth} frames deep, leaf frame: #{leaf}"
  i += depth + 2
end

Profile merging combines multiple profiling sessions for comprehensive analysis of distributed operations or repeated executions.

profiles = []

# Collect multiple profiles
5.times do |i|
  profile = StackProf.run(mode: :cpu, interval: 1000) do
    process_batch(i)
  end
  profiles << profile
end

# Merge for aggregate analysis: StackProf::Report instances can be added together
merged_report = profiles.map { |data| StackProf::Report.new(data) }.reduce(:+)
merged_report.print_text

Custom report formatting extracts specific performance metrics and integrates with monitoring systems. The report data structure provides programmatic access to profiling statistics.

data = Marshal.load(File.binread('production.dump'))
report = StackProf::Report.new(data)

# Extract top methods by self-sample count
top_methods = report.frames
                    .sort_by { |_, frame| -frame[:samples] }
                    .first(10)
                    .map { |_, frame| [frame[:name], frame[:samples]] }

# Generate custom metrics
total_samples = data[:samples]
top_methods.each do |method, samples|
  percentage = (samples.to_f / total_samples * 100).round(2)
  puts "#{method}: #{samples} samples (#{percentage}%)"
end

Profiling can be scoped to the work performed by a particular thread by starting and stopping the profiler around that thread's workload, which helps identify thread-specific bottlenecks and synchronization issues. Stackprof keeps a single global collector, so only one profiling session can be active at a time.

# Profile specific thread
thread = Thread.new do
  StackProf.run(mode: :cpu, out: 'thread_profile.dump') do
    thread_specific_work
  end
end

thread.join

# Analyze thread performance
data = Marshal.load(File.binread('thread_profile.dump'))
StackProf::Report.new(data).print_text

Performance & Memory

Stackprof introduces minimal performance overhead during profiling, typically consuming 1-5% of CPU time depending on sampling frequency and application characteristics. The overhead primarily comes from stack traversal and sample recording operations.

Sampling interval configuration balances profiling accuracy with performance impact. Shorter intervals (more frequent samples) provide more detailed data but increase overhead, while longer intervals reduce the impact but may miss short-lived hotspots.
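
A quick back-of-the-envelope check helps choose an interval: in the cpu and wall modes the expected sample count is roughly the profiled time divided by the interval. The figures below are illustrative:

# cpu/wall modes: interval is the number of microseconds between samples
interval_us = 1_000                 # one sample every millisecond
duration_s  = 5.0                   # time spent inside the profiled block

expected_samples = (duration_s * 1_000_000 / interval_us).to_i
puts expected_samples               # => 5000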

# Measure profiling overhead
require 'benchmark'

# Baseline execution time
baseline = Benchmark.measure do
  10000.times { expensive_operation }
end

# Profiled execution time
profiled = Benchmark.measure do
  StackProf.run(mode: :cpu, interval: 1000) do
    10000.times { expensive_operation }
  end
end

overhead = ((profiled.real - baseline.real) / baseline.real * 100).round(2)
puts "Profiling overhead: #{overhead}%"

Memory usage scales with profile duration and call stack depth. Stackprof stores frame information and sample data in memory before writing to dump files. Applications with deep call stacks or long profiling sessions may require memory management considerations.

# Monitor live-object growth during profiling
def measure_memory_usage
  GC.start
  counts = ObjectSpace.count_objects
  counts[:TOTAL] - counts[:FREE]
end

before_memory = measure_memory_usage

profile_data = StackProf.run(mode: :object, interval: 1) do
  large_data_processing
end

after_memory = measure_memory_usage
memory_increase = after_memory - before_memory

puts "Memory used by profiling: #{memory_increase} objects"
puts "Profile samples collected: #{profile_data[:samples]}"

Object allocation profiling tracks memory allocation patterns and identifies memory-intensive operations. This mode helps locate memory leaks and optimize allocation-heavy code paths.

# Analyze allocation patterns
allocation_profile = StackProf.run(mode: :object) do
  data = {}
  1000.times do |i|
    key = "key_#{i}"
    data[key] = Array.new(100) { i * rand }
  end
  data
end

report = StackProf::Report.new(allocation_profile)
report.print_text

# Focus on allocation-heavy methods
report.print_method(/new/)

Wall clock profiling captures I/O wait times and system call overhead that CPU profiling misses. This mode reveals performance issues related to file operations, network requests, and external service dependencies.

# Compare CPU vs wall time profiling
cpu_result = StackProf.run(mode: :cpu) do
  perform_io_heavy_operation
end

wall_result = StackProf.run(mode: :wall) do
  perform_io_heavy_operation
end

puts "CPU samples: #{cpu_result[:samples]}"
puts "Wall samples: #{wall_result[:samples]}"
puts "I/O wait ratio: #{(wall_result[:samples] - cpu_result[:samples]).to_f / wall_result[:samples]}"

Production Patterns

Production environments require careful stackprof integration to avoid performance degradation while maintaining comprehensive performance monitoring. Conditional profiling activates sampling based on request characteristics or system conditions.

class ApplicationController < ActionController::Base
  around_action :conditional_profiling

  private

  def conditional_profiling
    if should_profile?
      profile_name = "#{controller_name}_#{action_name}_#{Time.current.to_i}"
      
      StackProf.run(mode: :wall, out: "profiles/#{profile_name}.dump") do
        yield
      end
    else
      yield
    end
  end

  def should_profile?
    # Profile 1% of requests or specific slow operations
    rand < 0.01 || params[:action] == 'expensive_report'
  end
end
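
The stackprof gem also ships a Rack middleware, StackProf::Middleware, which applies the same conditional-sampling idea at the Rack layer. The sketch below shows one way to mount it in a Rails application; the option names shown (enabled, mode, interval, save_every, path) are taken from the middleware's options and should be verified against the installed stackprof version.

# config/application.rb, inside the Application class definition
require 'stackprof'

config.middleware.use(
  StackProf::Middleware,
  enabled: ->(env) { rand < 0.01 },   # profile roughly 1% of requests
  mode: :wall,
  interval: 1000,                     # microseconds between samples
  save_every: 5,                      # write a dump after every 5 profiled requests
  path: 'tmp/stackprof'               # directory that receives the dump files
)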

Automated profile collection and analysis integrates with monitoring systems to detect performance regressions and track optimization improvements over time.

class ProfileCollector
  def initialize(storage_path:, retention_days: 7)
    @storage_path = storage_path
    @retention_days = retention_days
  end

  def collect_profile(name, mode: :cpu, interval: 1000)
    timestamp = Time.current.to_i
    filename = "#{@storage_path}/#{name}_#{timestamp}.dump"

    result = StackProf.run(mode: mode, interval: interval) do
      yield
    end

    # Persist the results hash explicitly; dump files are Marshal-serialized
    File.binwrite(filename, Marshal.dump(result))

    analyze_and_alert(filename, result)
    cleanup_old_profiles

    result
  end

  private

  def analyze_and_alert(filename, result)
    if result[:samples] > 10000  # High sample count indicates slow operation
      send_performance_alert(filename)
    end
  end

  def cleanup_old_profiles
    cutoff_time = Time.current - (@retention_days * 24 * 60 * 60)
    Dir.glob("#{@storage_path}/*.dump").each do |file|
      File.delete(file) if File.mtime(file) < cutoff_time
    end
  end
end

Background job profiling captures performance characteristics of asynchronous operations without impacting user-facing request performance.

class ProfiledJob < ApplicationJob
  def perform(*args)
    job_name = self.class.name.underscore
    
    StackProf.run(mode: :cpu, out: "job_profiles/#{job_name}_#{job_id}.dump") do
      original_perform(*args)
    end
  end

  def original_perform(*args)
    # Actual job implementation
    raise NotImplementedError
  end
end

class DataProcessingJob < ProfiledJob
  def original_perform(dataset_id)
    dataset = Dataset.find(dataset_id)
    dataset.process_analytics
    dataset.generate_reports
  end
end

Profile aggregation and comparison enables performance trend analysis and regression detection across application deployments.

class ProfileAnalyzer
  def compare_profiles(baseline_file, current_file)
    baseline = Marshal.load(File.binread(baseline_file))
    current  = Marshal.load(File.binread(current_file))

    baseline_report = StackProf::Report.new(baseline)
    current_report = StackProf::Report.new(current)

    # Extract top methods from each profile
    baseline_methods = extract_top_methods(baseline_report, 20)
    current_methods = extract_top_methods(current_report, 20)

    # Compare per-method sample counts (assumes the two runs cover comparable workloads)
    performance_changes = {}
    baseline_methods.each do |method, baseline_samples|
      current_samples = current_methods[method] || 0
      change_ratio = current_samples.to_f / baseline_samples
      performance_changes[method] = change_ratio
    end

    # Identify significant regressions
    regressions = performance_changes.select { |_, ratio| ratio > 1.5 }
    improvements = performance_changes.select { |_, ratio| ratio < 0.5 }

    { regressions: regressions, improvements: improvements }
  end

  private

  def extract_top_methods(report, limit)
    # Key by frame name rather than frame id so methods can be matched across profiles
    report.frames
          .sort_by { |_, frame| -frame[:samples] }
          .first(limit)
          .map { |_, frame| [frame[:name], frame[:samples]] }
          .to_h
  end
end

Error Handling & Debugging

Stackprof profiling can encounter various error conditions that require proper handling to maintain application stability and ensure reliable performance data collection.

Signal handling conflicts occur because CPU-mode profiling drives its sampling with a SIGPROF interval timer, which can interfere with application signal handlers. This commonly affects applications using timeout mechanisms or custom signal processing.

begin
  # CPU-mode profiling installs its own SIGPROF handler; save any existing handler first
  old_handler = Signal.trap('PROF', 'DEFAULT') if Signal.list.key?('PROF')

  result = StackProf.run(mode: :cpu, interval: 1000) do
    operation_that_may_timeout
  end
rescue SignalException => e
  puts "Signal handling conflict: #{e.message}"
  # Fall back to wall-time profiling, which does not depend on SIGPROF
  result = StackProf.run(mode: :wall, interval: 1000) do
    operation_that_may_timeout
  end
ensure
  # Restore the application's original handler
  Signal.trap('PROF', old_handler) if old_handler
end

Memory constraints can cause profiling failures when sampling generates more data than available memory. This typically occurs with high-frequency sampling or very long profiling sessions.

def safe_profile(code_block, fallback_interval: 10_000)
  # Start with fine-grained sampling and back off to coarser intervals on failure
  intervals = [100, 500, 1000, 5000, fallback_interval]

  intervals.each do |interval|
    begin
      return StackProf.run(mode: :cpu, interval: interval, &code_block)
    rescue SystemStackError, NoMemoryError => e
      puts "Profiling failed with interval #{interval}: #{e.message}"
      next if interval < fallback_interval
      raise
    end
  end
end

# Usage with automatic fallback to coarser sampling
result = safe_profile(lambda { expensive_recursive_operation })

Profile corruption can occur when profiling is interrupted or when concurrent access attempts modify dump files. Validation ensures profile integrity before analysis.

def load_and_validate_profile(filename)
  return nil unless File.exist?(filename)
  
  begin
    # Dump files are Marshal-serialized result hashes
    data = Marshal.load(File.binread(filename))
    
    # Validate profile structure
    required_keys = [:version, :mode, :interval, :samples, :frames]
    missing_keys = required_keys - data.keys
    
    if missing_keys.any?
      puts "Invalid profile #{filename}: missing keys #{missing_keys}"
      return nil
    end
    
    # Validate sample count consistency
    if data[:samples] == 0 && data[:frames].any?
      puts "Inconsistent profile #{filename}: no samples but frames present"
      return nil
    end
    
    data
  rescue => e
    puts "Failed to load profile #{filename}: #{e.message}"
    nil
  end
end

# Safe profile analysis with validation
profile_files = Dir.glob('profiles/*.dump')
valid_profiles = profile_files.filter_map { |file| load_and_validate_profile(file) }

if valid_profiles.empty?
  puts "No valid profiles found for analysis"
else
  merged_report = valid_profiles.map { |data| StackProf::Report.new(data) }.reduce(:+)
  merged_report.print_text
end

Thread safety issues arise when multiple threads attempt to profile simultaneously or when profiling multi-threaded code. Because stackprof maintains a single process-wide profiling session, concurrent StackProf.run calls from multiple threads conflict; Ruby's global VM lock mitigates some concurrency issues but doesn't eliminate all race conditions.

class ThreadSafeProfiler
  def initialize
    @profiling = false
    @mutex = Mutex.new
  end

  def profile(name, mode: :cpu)
    acquired = @mutex.synchronize do
      if @profiling
        false
      else
        @profiling = true
      end
    end

    # Run the work without holding the mutex; skip profiling if another session is active
    unless acquired
      puts "Profiling already in progress, skipping #{name}"
      return yield
    end

    begin
      StackProf.run(mode: mode, out: "#{name}.dump") do
        yield
      end
    ensure
      @mutex.synchronize { @profiling = false }
    end
  end
end

# Safe concurrent profiling
profiler = ThreadSafeProfiler.new

threads = 3.times.map do |i|
  Thread.new do
    profiler.profile("thread_#{i}") { cpu_intensive_work(i) }
  end
end

threads.each(&:join)

Reference

Core Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| StackProf.run(**options, &block) | mode (Symbol), interval (Integer), out (String), raw (Boolean) | Hash | Profiles the block; returns the results, or writes a dump file when out: is given |
| StackProf.start(**options) | same options as run | Boolean | Starts a profiling session without a block |
| StackProf.stop | none | Boolean | Stops the current profiling session |
| StackProf.results(filename = nil) | filename (String, optional) | Hash or nil | Returns the collected results, or writes them to filename when given |
| StackProf.running? | none | Boolean | Reports whether a profiling session is active |

Profiling Modes

| Mode | Samples | Overhead | Use Case |
|---|---|---|---|
| :cpu | CPU time | Low | CPU-bound operations |
| :wall | Elapsed time | Medium | I/O-bound operations |
| :object | Allocations | High | Memory optimization |

Configuration Options

| Option | Type | Default | Description |
|---|---|---|---|
| mode | Symbol | :cpu | Sampling mode (:cpu, :wall, :object) |
| interval | Integer | 1000 | Sampling interval (microseconds between samples for :cpu and :wall; every Nth allocation for :object) |
| out | String | nil | Path of the dump file to write |
| raw | Boolean | false | Preserve individual sampled stacks (needed for flame graph output) |

Report Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| StackProf::Report.new(data) | data (Hash) | Report | Builds a report from a results hash |
| StackProf::Report.from_file(path) | path (String) | Report | Loads a report from a dump file |
| #print_text(sort_by_total = false, limit = nil) | sort flag (Boolean), limit (Integer) | nil | Prints a text summary of the top frames |
| #print_method(matcher) | matcher (String or Regexp) | nil | Prints details for methods matching the name |
| #frames | none | Hash | Returns frame data keyed by frame id |
| #+(other) | other (Report) | Report | Merges two reports of the same mode |

Data Structure

# Profile data structure (abridged)
{
  :version => 1.2,
  :mode => :cpu,
  :interval => 1000,
  :samples => 12045,
  :gc_samples => 32,
  :missed_samples => 15,
  :frames => {
    0x12345 => {                      # frame id
      :name => "MyClass#method_name",
      :file => "/path/to/file.rb",
      :line => 42,
      :samples => 150,                # samples with this frame at the top of the stack
      :total_samples => 250           # samples with this frame anywhere on the stack
      # frames may also carry :edges and :lines breakdowns
    }
  }
}

Error Classes

| Error | Cause | Resolution |
|---|---|---|
| SignalException | Timer signal conflicts with application handlers | Use wall mode or save and restore handlers |
| SystemStackError | Stack overflow while recording very deep stacks | Increase the sampling interval (sample less often) |
| NoMemoryError | Insufficient memory to hold collected samples | Increase the interval or shorten the profiling session |
| Errno::EACCES | Permission denied writing the dump file | Check file and directory permissions |

Environment Variables

Variable Effect Example
STACKPROF_ENABLED Enable/disable profiling STACKPROF_ENABLED=0
STACKPROF_MODE Default profiling mode STACKPROF_MODE=wall
STACKPROF_INTERVAL Default sampling interval STACKPROF_INTERVAL=2000