Overview
Ruby provides multiple profiling approaches for measuring application performance and identifying execution bottlenecks. The built-in profile module offers deterministic profiling through method call tracing, ruby-prof provides faster deterministic profiling as a C extension, and sampling profilers like stackprof provide statistical performance analysis. Each profiler serves different measurement needs and deployment scenarios.
Ruby-prof operates as a C extension with measurement modes for wall time, process time, object allocations, and memory usage. StackProf implements sampling-based profiling using timer signals and Ruby's frame-profiling C API, collecting call stacks at regular intervals without significant runtime overhead. The built-in profiler tracks every method call, producing detailed but computationally expensive reports.
# Built-in profile module
require 'profile'

def calculate_fibonacci(n)
  return n if n <= 1

  calculate_fibonacci(n - 1) + calculate_fibonacci(n - 2)
end

calculate_fibonacci(30)
# Automatically generates profiling report on exit
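The method-call-tracing approach can be illustrated with the standard library's TracePoint, which fires a hook on every Ruby method call, the same hook family deterministic profilers build on. This is a toy sketch for illustration, not the actual implementation of profile or ruby-prof:

```ruby
# Toy deterministic profiler: count every call to each method using
# TracePoint's :call event, the per-call hook a tracing profiler relies on.
call_counts = Hash.new(0)

tracer = TracePoint.new(:call) do |tp|
  call_counts["#{tp.defined_class}##{tp.method_id}"] += 1
end

def fib(n)
  return n if n <= 1

  fib(n - 1) + fib(n - 2)
end

# Hooks fire only while the tracer is enabled
tracer.enable { fib(10) }

puts call_counts["Object#fib"]
# => 177 (the full Fibonacci call tree for n = 10)
```

Counting every call is exactly what makes this style precise and expensive: the hook body runs once per invocation, so overhead grows linearly with call volume.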
Ruby profilers measure different execution aspects: CPU time tracks processor usage, wall time measures real-world elapsed time, and allocation profilers monitor object creation and memory consumption. Thread-aware profilers handle concurrent code execution, while sampling profilers balance measurement accuracy with performance impact.
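The wall-versus-CPU distinction is easy to demonstrate with the standard library's Process.clock_gettime: a sleeping process accumulates wall time but almost no CPU time. A small illustrative sketch:

```ruby
# Compare wall-clock time against CPU time for a mostly-sleeping workload.
wall_before = Process.clock_gettime(Process::CLOCK_MONOTONIC)
cpu_before  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)

sleep(0.2) # blocked: wall time advances, CPU time barely moves

wall_elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall_before
cpu_elapsed  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID) - cpu_before

puts format("wall: %.3fs, cpu: %.3fs", wall_elapsed, cpu_elapsed)
# A :wall profiler attributes the sleep to this code; a :cpu profiler mostly ignores it
```

This is why I/O-bound code should be profiled in wall mode: a CPU-mode profile of code that spends its time waiting shows almost nothing.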
# StackProf sampling profiler
require 'stackprof'

profile = StackProf.run(mode: :cpu, interval: 1000) do
  1000.times { |i| String.new("iteration #{i}") }
end

StackProf::Report.new(profile).print_text
Ruby profiling integrates with development workflows through command-line interfaces, programmatic APIs, and web application middleware. External profilers like rbspy attach to running processes without code modification, while embedded profilers require application integration.
Basic Usage
The standard library profile module requires only require 'profile' to activate automatic profiling for the entire program execution. Output appears when the program terminates, showing method call counts, execution times, and performance percentages.
require 'profile'

class DataProcessor
  def process_records(records)
    records.map { |record| transform_record(record) }
  end

  private

  def transform_record(record)
    record.upcase.strip.gsub(/\s+/, '_')
  end
end

processor = DataProcessor.new
data = [" hello world ", " ruby programming "] * 1000
processor.process_records(data)

# Outputs detailed profiling report:
#   % time   seconds   calls   ms/call   name
#    45.32     0.124    1000      0.12   String#gsub
#    23.18     0.063    1000      0.06   String#upcase
#    15.42     0.042    1000      0.04   String#strip
Ruby-prof provides programmatic profiling control through start/stop methods and block-based profiling. Multiple measurement modes capture different performance aspects, with configurable output formats for analysis.
require 'ruby-prof'

# Explicit start/stop profiling
RubyProf.start
expensive_operation
result = RubyProf.stop

# Generate flat report
printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT)

# Block-based profiling with options
result = RubyProf.profile(measure_mode: RubyProf::WALL_TIME) do
  10_000.times { Math.sqrt(rand(1000)) }
end

# Graph report showing call relationships
printer = RubyProf::GraphPrinter.new(result)
printer.print(STDOUT)
StackProf sampling profiler reduces performance impact by collecting call stacks at specified intervals rather than tracing every method call. Different sampling modes target CPU usage, wall clock time, or object allocations.
require 'stackprof'

# CPU profiling with 1ms sampling interval (interval is in microseconds)
StackProf.run(mode: :cpu, interval: 1000, out: 'cpu_profile.dump') do
  complex_calculation
end

# Wall time profiling for I/O operations; raw: true retains the
# full sample stream needed for flamegraph output
StackProf.run(mode: :wall, interval: 1000, raw: true, out: 'wall_profile.dump') do
  File.read('large_file.txt').split("\n").each(&:strip)
end

# Object allocation profiling (interval: 1 records every allocation)
StackProf.run(mode: :object, interval: 1, out: 'alloc_profile.dump') do
  1000.times { Hash.new.merge!(key: 'value') }
end

# Analyze results from command line
# $ stackprof cpu_profile.dump --text
# $ stackprof wall_profile.dump --flamegraph
Rails applications integrate profiling through middleware and development tools. Rack Mini Profiler provides browser-based profiling interface with database query analysis and memory tracking.
# Gemfile
group :development do
  gem 'rack-mini-profiler'
  gem 'stackprof'
  gem 'memory_profiler'
end

# config/environments/development.rb
if Rails.env.development?
  require 'rack-mini-profiler'
  Rack::MiniProfilerRails.initialize!(Rails.application)
end

# Profile specific controller actions
class UsersController < ApplicationController
  def index
    StackProf.run(mode: :cpu, out: 'tmp/users_index.dump') do
      # Force the lazy relation to load inside the block; otherwise the
      # query executes later, outside the profiled region
      @users = User.includes(:posts).limit(100).load
    end
  end
end
Performance & Memory
Profiling introduces measurement overhead that varies significantly between deterministic and sampling approaches. Ruby-prof can slow execution by 2-20x depending on measurement mode, while sampling profilers like StackProf maintain 1-5% overhead.
Measurement mode selection affects both overhead and information quality. Wall time captures real execution duration including I/O waits, process time excludes system overhead, and allocation tracking measures object creation patterns.
require 'benchmark'
require 'ruby-prof'
require 'stackprof'

def cpu_intensive_task
  (1..100_000).map { |i| Math.sqrt(i) }.sum
end

# Measure profiling overhead
puts Benchmark.measure { cpu_intensive_task }
# => 0.045000 0.000000 0.045000 ( 0.045123)

# Ruby-prof overhead (deterministic); braces keep the block bound to
# Benchmark.measure rather than to puts
puts(Benchmark.measure { RubyProf.profile { cpu_intensive_task } })
# => 0.890000 0.010000 0.900000 ( 0.901234)

# StackProf overhead (sampling)
puts(Benchmark.measure { StackProf.run(mode: :cpu) { cpu_intensive_task } })
# => 0.048000 0.000000 0.048000 ( 0.048567)
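The deterministic-overhead effect can be reproduced without any profiler gem: enabling a per-call TracePoint hook (the same per-event mechanism tracing profilers depend on) measurably slows a call-heavy workload. A rough stdlib sketch; exact timings vary by machine:

```ruby
require 'benchmark'

def busy_work
  total = 0
  50_000.times { |i| total += Math.sqrt(i) }
  total
end

baseline = Benchmark.realtime { busy_work }

# Per-event hook comparable to what a deterministic profiler installs
events = 0
tracer = TracePoint.new(:c_call, :call) { events += 1 }

traced = nil
tracer.enable { traced = Benchmark.realtime { busy_work } }

puts format("baseline: %.4fs, traced: %.4fs (%.1fx slower)",
            baseline, traced, traced / baseline)
```

The hook body runs once per event (here at least once per Math.sqrt call), which is exactly where the deterministic profilers' multiplicative slowdown comes from.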
Memory profiling reveals allocation patterns and garbage collection pressure. Ruby-prof memory mode tracks byte allocation per method, while object mode counts created instances. Memory profilers help identify allocation hotspots and optimize garbage collection performance.
require 'ruby-prof'
require 'memory_profiler'

# Compare allocation patterns
def string_concatenation(count)
  result = ""
  count.times { |i| result += "item #{i} " }
  result
end

def string_interpolation(count)
  items = (0...count).map { |i| "item #{i}" }
  items.join(" ")
end

# Memory allocation profiling
report1 = MemoryProfiler.report do
  string_concatenation(1000)
end

report2 = MemoryProfiler.report do
  string_interpolation(1000)
end

puts "Concatenation allocated: #{report1.total_allocated_memsize} bytes"
puts "Interpolation allocated: #{report2.total_allocated_memsize} bytes"
# Shows significant allocation difference

# Ruby-prof memory tracking
result = RubyProf.profile(measure_mode: RubyProf::MEMORY) do
  10.times { string_concatenation(100) }
end

printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT)
# Reveals method-level memory consumption
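When the memory_profiler gem is unavailable, GC.stat's monotonically increasing total_allocated_objects counter gives a rough stdlib approximation of the same comparison (object counts rather than bytes):

```ruby
# Rough allocation counting with GC.stat: total_allocated_objects only
# ever increases, so a delta around a block approximates its allocations.
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

def string_concatenation(count)
  result = ""
  count.times { |i| result += "item #{i} " }
  result
end

def string_interpolation(count)
  (0...count).map { |i| "item #{i}" }.join(" ")
end

concat_allocs = allocations_during { string_concatenation(1000) }
join_allocs   = allocations_during { string_interpolation(1000) }

puts "+= allocated #{concat_allocs} objects, join allocated #{join_allocs}"
# += builds a brand-new string every iteration, so it allocates more
```

The counter survives garbage collection, which makes it suitable for measuring allocation pressure even when the allocated objects are collected immediately.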
StackProf object allocation mode provides detailed allocation tracking with source location information. This mode enables identification of specific code lines responsible for memory pressure.
# Detailed allocation profiling
StackProf.run(mode: :object, raw: true, out: 'allocations.dump') do
  users = []
  1000.times do |i|
    users << {
      id: i,
      name: "User #{i}",
      email: "user#{i}@example.com",
      metadata: { created_at: Time.now, active: true }
    }
  end
end

# Analyze allocation hotspots (the output file is a Marshal dump, not JSON)
# $ stackprof allocations.dump --text
# Reveals hash and string allocation patterns
Garbage collection analysis requires understanding allocation patterns and object lifecycle. Profilers can exclude or highlight garbage collection overhead, showing time spent in mark and sweep phases.
# Monitor GC impact during profiling
GC.stat # => {:count=>15, :heap_allocated_pages=>45, ...}

profile = StackProf.run(mode: :wall, ignore_gc: false) do
  large_dataset = Array.new(100_000) { |i| "data_#{i}" * 50 }
  large_dataset.map(&:upcase).select { |s| s.include?('5') }
end
# With ignore_gc: false (the default), GC frames appear in the profile,
# showing the share of samples spent in garbage collection
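Independently of any profiler, GC.stat's :count field shows how many collections a workload triggers. A small sketch comparing GC runs before and after an allocation-heavy block:

```ruby
# Count garbage collection runs triggered by an allocation-heavy workload.
gc_runs_before = GC.stat(:count)

# Churn through a few hundred thousand short-lived strings
100_000.times.map { |i| "data_#{i}" * 5 }

GC.start # force a final collection so the delta is always visible

gc_runs = GC.stat(:count) - gc_runs_before
puts "GC ran #{gc_runs} time(s) during the workload"
```

A rising delta under steady traffic is a signal that allocation hotspots (the kind the :object profiles above locate) are creating collection pressure.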
Production Patterns
Production profiling requires minimal performance impact and safe data collection. External profilers like rbspy attach to running processes without code modification, making them suitable for production investigation. Sampling profilers balance measurement accuracy with application stability.
# Safe production profiling approach
class ApplicationProfiler
  def self.profile_action(controller, action)
    return yield unless should_profile?

    ran = false
    result = nil
    begin
      profile_data = StackProf.run(
        mode: :wall,
        interval: 10_000, # 10ms between samples keeps production overhead low
        raw: true
      ) do
        ran = true
        result = yield # capture the block's value; StackProf.run returns the profile
      end
      store_profile_data(controller, action, profile_data)
    rescue => e
      Rails.logger.error "Profiling error: #{e.message}"
      result = yield unless ran # continue on failure, but never re-run the action
    end
    result
  end

  def self.should_profile?
    rand < Rails.application.config.profiling_sample_rate
  end

  def self.store_profile_data(controller, action, data)
    # Async storage to avoid blocking the request
    ProfileStorageJob.perform_later(
      controller: controller,
      action: action,
      profile: data,
      timestamp: Time.current
    )
  end

  # A bare `private` does not apply to class methods
  private_class_method :should_profile?, :store_profile_data
end

# Controller integration
class ApplicationController < ActionController::Base
  around_action :profile_requests

  private

  def profile_requests
    ApplicationProfiler.profile_action(controller_name, action_name) { yield }
  end
end
Application Performance Monitoring (APM) tools provide continuous profiling with minimal configuration. These tools automatically instrument applications and collect performance metrics without manual profiling code.
# APM tool configuration
# Gemfile
group :production do
  gem 'skylight' # or 'newrelic_rpm', 'appsignal'
end

# config/application.rb
config.skylight.environments = ['production']
config.skylight.probes += ['redis', 'mongo']

# Automatic instrumentation tracks:
# - HTTP request performance
# - Database query analysis
# - Background job monitoring
# - Custom instrumentation points
Conditional profiling enables targeted performance investigation without affecting all requests. Profile collection based on request parameters, user segments, or performance thresholds minimizes overhead.
# Conditional profiling strategies
class ConditionalProfiler
  # A slow request can only be detected after it finishes, so this records
  # it (via the app-defined profile_slow_request hook) rather than
  # profiling retroactively
  def self.profile_if_slow(threshold_ms = 1000)
    start_time = Time.current
    result = yield
    duration_ms = (Time.current - start_time) * 1000

    profile_slow_request(duration_ms) if duration_ms > threshold_ms
    result
  end

  def self.profile_user_segment(user)
    return yield unless user.beta_tester?

    result = nil
    StackProf.run(
      mode: :cpu,
      out: "tmp/beta_user_#{user.id}_#{Time.current.to_i}.dump"
    ) { result = yield }
    result # StackProf.run returns the profile, so capture the block's value
  end

  def self.profile_sample_percentage(percentage = 0.1)
    return yield unless rand < percentage

    result = nil
    StackProf.run(mode: :wall) { result = yield }
    result
  end
end
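Since a slow request cannot be profiled retroactively, one workable pattern is two-phase: flag an action as slow when it exceeds the threshold, then profile its next occurrence. A gem-free sketch of the bookkeeping (SlowActionTracker is a hypothetical helper; in practice the :profiled branch would wrap the block in StackProf.run):

```ruby
# Two-phase conditional profiler: mark an action slow when it exceeds the
# threshold, and profile only its next occurrence.
class SlowActionTracker
  def initialize(threshold_ms: 1000)
    @threshold_ms = threshold_ms
    @pending = {} # action name => true when the next call should be profiled
  end

  def call(action)
    profile_this_call = @pending.delete(action)

    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield
    duration_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000

    # Flag the action so its NEXT occurrence gets profiled
    @pending[action] = true if duration_ms > @threshold_ms
    [result, profile_this_call ? :profiled : :skipped]
  end
end

tracker = SlowActionTracker.new(threshold_ms: 50)
_, first  = tracker.call('users#index') { sleep 0.06 } # slow: flags the action
_, second = tracker.call('users#index') { :fast }      # this call gets profiled
_, third  = tracker.call('users#index') { :fast }      # fast again: back to skipping

puts [first, second, third].inspect
# => [:skipped, :profiled, :skipped]
```

The assumption is that slowness recurs per action; one-off stalls slip through, which is the usual trade-off of threshold-triggered profiling.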
Production profiling data aggregation requires careful handling of sensitive information and storage constraints. Profile data contains method names, file paths, and execution patterns that may reveal application internals.
# Secure profile data handling
class ProfileDataManager
  def self.sanitize_profile(profile_data)
    # deep_transform_values recurses into nested hashes itself and yields
    # only leaf values, so drop sensitive keys before transforming
    profile_data.except(:sensitive_data).deep_transform_values do |value|
      value.is_a?(String) ? sanitize_path(value) : value
    end
  end

  def self.aggregate_profiles(time_window = 1.hour)
    profiles = ProfileData.where(created_at: time_window.ago..Time.current)

    aggregated = profiles.group_by(&:controller_action).map do |action, data|
      {
        action: action,
        avg_samples: data.sum(&:samples) / data.size,
        total_requests: data.size,
        top_methods: extract_hot_methods(data)
      }
    end

    ProfileReport.create!(
      time_window: time_window,
      data: aggregated
    )
  end

  def self.sanitize_path(path)
    path.gsub(Rails.root.to_s, '[ROOT]')
        .gsub(%r{/gems/[^/]+}, '/[GEM]')
  end

  # A bare `private` does not apply to class methods
  private_class_method :sanitize_path
end
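The path-scrubbing step works outside Rails as well; a standalone sketch, where the '[ROOT]' and '[GEM]' placeholders are arbitrary choices:

```ruby
# Scrub absolute paths from profile frames so stored data does not leak
# filesystem layout or installed gem versions.
def sanitize_path(path, app_root)
  path.gsub(app_root, '[ROOT]')
      .gsub(%r{/gems/[^/]+}, '/[GEM]')
end

app_frame = '/srv/app/releases/42/app/models/user.rb:10'
puts sanitize_path(app_frame, '/srv/app/releases/42')
# => [ROOT]/app/models/user.rb:10

gem_frame = '/home/deploy/.gem/gems/rack-3.0.8/lib/rack.rb:5'
puts sanitize_path(gem_frame, '/srv/app/releases/42')
# => /home/deploy/.gem/[GEM]/lib/rack.rb:5
```

Scrubbing the gem segment also removes version numbers, which would otherwise advertise exactly which dependency versions production runs.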
Common Pitfalls
Ruby-prof significantly slows program execution, making wall-time measurements unreliable for performance optimization decisions. Deterministic profilers change execution behavior, potentially masking real performance issues or creating artificial bottlenecks.
# Incorrect: Using ruby-prof for wall-time analysis
RubyProf.profile(measure_mode: RubyProf::WALL_TIME) do
  api_call_with_network_timeout # Results not representative
end

# Correct: Use sampling profiler for realistic timing
StackProf.run(mode: :wall) do
  api_call_with_network_timeout # Minimal impact on execution
end
Sampling profilers require sufficient sample collection for statistical accuracy. Short-running code may not generate enough samples for meaningful analysis, while incorrect sampling intervals can miss important execution patterns.
# Insufficient sampling - unreliable results
StackProf.run(mode: :cpu, interval: 10_000) do # 10ms between samples - too infrequent
  quick_operation # May not be sampled at all
end

# Better sampling configuration
StackProf.run(mode: :cpu, interval: 1000) do # 1ms intervals
  1000.times { quick_operation } # Ensure adequate samples
end

# Long-running profiling for statistical accuracy
def profile_with_warmup
  # Warmup phase - exclude from profiling
  100.times { target_method }

  # Actual profiling with sufficient duration
  StackProf.run(mode: :cpu) do
    1000.times { target_method }
  end
end
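A quick sanity check before profiling is to estimate how many samples a run will yield from its expected duration and the sampling interval (microseconds for :cpu and :wall), then grow the workload until the estimate is statistically useful. A small illustrative helper:

```ruby
# Estimate sample count: samples ≈ duration / interval.
# interval_us is in microseconds, matching StackProf's :cpu/:wall modes.
def expected_samples(duration_seconds, interval_us)
  (duration_seconds * 1_000_000 / interval_us).floor
end

# A 5ms operation at the default 1ms interval yields only ~5 samples...
puts expected_samples(0.005, 1000)
# => 5

# ...far too few to trust; repeating it 1000 times gives a usable ~5000
puts expected_samples(5.0, 1000)
# => 5000
```

As a rule of thumb, treat profiles with only a handful of samples as noise and either repeat the workload or shrink the interval.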
Thread profiling introduces complexity with concurrent execution analysis. Multiple threads create interleaved execution patterns that complicate bottleneck identification. Thread exclusion and filtering help isolate specific execution paths.
# Thread profiling challenges
require 'ruby-prof'

# Problematic: All threads profiled simultaneously
RubyProf.start
threads = 5.times.map { |i| Thread.new { worker_method(i) } }
threads.each(&:join)
result = RubyProf.stop
# Output contains mixed thread execution - difficult to analyze

# Better: Profile specific threads
main_thread = Thread.current
RubyProf.profile(include_threads: [main_thread]) do
  single_threaded_work
end

# Or exclude framework threads
excluded_threads = Thread.list.select { |t| t[:name]&.include?('server') }
RubyProf.profile(exclude_threads: excluded_threads) do
  application_logic
end
Memory profiling interpretation requires understanding Ruby's object allocation and garbage collection behavior. Object allocation profilers count creation events, not concurrent object existence.
# Memory profiling misconceptions
def analyze_memory_usage
  # Incorrect assumption: High allocation means high memory usage
  profile = StackProf.run(mode: :object) do
    1000.times do
      temp_array = [1, 2, 3] # Allocated but quickly collected
      temp_array.sum
    end
  end
  # This shows 1000 array allocations, not 1000 arrays in memory
end

# Better memory analysis combines allocation and retention
def comprehensive_memory_analysis
  # Track allocations
  allocation_profile = StackProf.run(mode: :object) do
    complex_operation
  end

  # Track memory retention
  memory_before = GC.stat[:heap_live_slots]
  complex_operation
  GC.start
  memory_after = GC.stat[:heap_live_slots]

  puts "Allocated objects: #{allocation_profile[:samples]}"
  puts "Retained objects: #{memory_after - memory_before}"
end
Profile interpretation errors include focusing on absolute rather than relative performance metrics. Method call counts and execution times depend on input size, system load, and measurement overhead.
# Misleading absolute metrics
def profile_string_operations
  small_data = ["short"] * 10
  large_data = ["longer string data"] * 10_000

  # Different input sizes produce incomparable results
  small_profile = StackProf.run(mode: :cpu) { small_data.join(" ") }
  large_profile = StackProf.run(mode: :cpu) { large_data.join(" ") }
  # Cannot compare absolute sample counts between profiles
end

# Correct approach: Normalize and compare patterns
def comparative_profiling
  [100, 1000, 10_000].each do |size|
    data = ["item"] * size
    profile = StackProf.run(mode: :cpu) { data.join(" ") }

    # Analyze samples per input unit for scaling behavior
    samples_per_item = profile[:samples].to_f / size
    puts "Size #{size}: #{samples_per_item} samples per item"
  end
end
Reference
Core Classes
Class/Module | Purpose | Key Methods |
---|---|---|
`Profiler__` | Built-in deterministic profiler | `start_profile`, `stop_profile`, `print_profile` |
`RubyProf::Profile` | Advanced profiling with multiple modes | `start`, `stop`, `profile`, `pause`, `resume` |
`StackProf` | Sampling-based call stack profiler | `run`, `start`, `stop` |
`MemoryProfiler` | Memory allocation tracking | `report` |
`Benchmark` | Simple timing measurements | `measure`, `realtime`, `benchmark` |
Measurement Modes
Mode | Ruby-Prof Constant | StackProf Mode | Measures | Overhead |
---|---|---|---|---|
Wall Time | `RubyProf::WALL_TIME` | `:wall` | Real elapsed time including I/O | Low-Medium |
Process Time | `RubyProf::PROCESS_TIME` | `:cpu` | CPU time (user + system), excluding I/O waits | Low |
Allocations | `RubyProf::ALLOCATIONS` | `:object` | Object creation count | Medium-High |
Memory Usage | `RubyProf::MEMORY` | N/A | Byte allocation tracking | High |
Command Line Tools
Tool | Usage | Output Formats |
---|---|---|
`ruby -rprofile script.rb` | Automatic built-in profiling | Text report |
`ruby-prof script.rb` | Command-line ruby-prof execution | Text, HTML, GraphViz, JSON |
`stackprof dump.dump --text` | StackProf report generation | Text, GraphViz, FlameGraph, JSON |
`rbspy record --pid PID` | External process profiling | FlameGraph, raw samples |
Configuration Options
StackProf Options
StackProf.run(
  mode: :cpu,          # :cpu, :wall, :object
  interval: 1000,      # Sampling interval (microseconds for :cpu/:wall, allocations for :object)
  out: 'profile.dump', # Output file path
  raw: false,          # Include raw sample data (required for flamegraphs)
  ignore_gc: false     # Set true to exclude garbage collection frames
)
RubyProf Options
RubyProf.profile(
  measure_mode: RubyProf::WALL_TIME, # Measurement type
  track_allocations: false,          # Track object allocation details
  include_threads: [Thread.current], # Threads to include
  exclude_threads: [],               # Threads to exclude
  merge_fibers: false                # Combine fiber execution
)
Output Interpreters
Report Type | Description | Use Case |
---|---|---|
Flat Report | Method-level time/allocation summary | Quick bottleneck identification |
Graph Report | Call hierarchy with parent/child relationships | Understanding call flow |
Call Tree | Hierarchical execution structure | Detailed execution analysis |
FlameGraph | Interactive flame graph visualization | Visual performance analysis |
Integration Patterns
Rails Middleware
# config/application.rb
config.middleware.insert_before(
  Rack::Runtime,
  Rack::MiniProfiler
)
Background Job Profiling
class ProfiledJob < ApplicationJob
  around_perform :profile_execution

  private

  def profile_execution
    StackProf.run(
      mode: :wall,
      out: "tmp/job_#{job_id}_profile.dump" # ActiveJob exposes job_id (jid is Sidekiq-specific)
    ) { yield }
  end
end
Error Handling
Error Type | Common Cause | Solution |
---|---|---|
`NoMethodError` on profile results | Incorrect profiler usage | Verify profiler is started/stopped correctly |
Empty profile data | Insufficient execution time | Increase workload or reduce sampling interval |
High memory usage during profiling | Allocation tracking enabled | Use sampling modes or disable allocation tracking |
Thread synchronization errors | Multi-threaded profiling | Use thread inclusion/exclusion filters |
Performance Baselines
Operation | Ruby-Prof Overhead | StackProf Overhead | Recommended Profiler |
---|---|---|---|
CPU-intensive computation | 5-20x slower | < 5% | StackProf :cpu |
I/O operations | 2-10x slower | < 2% | StackProf :wall |
Memory allocation analysis | 10-50x slower | 10-30% | StackProf :object |
Production profiling | Not recommended | < 1% | StackProf or rbspy |