CrackedRuby

Overview

CPU profiling measures how much processor time different parts of a program consume during execution. The profiler tracks function calls, execution time, and call frequencies to identify where a program spends most of its computational resources. This information reveals bottlenecks that slow down applications and guides optimization efforts toward the areas with the highest performance impact.

Profilers operate through two primary mechanisms: sampling and instrumentation. Sampling profilers periodically interrupt program execution to record the current call stack, building a statistical picture of where time is spent. Instrumentation profilers insert tracking code around function calls to measure exact execution times. Each approach offers different trade-offs between accuracy and overhead.

The output from CPU profiling typically includes flat profiles showing total time per function, call graphs illustrating caller-callee relationships, and flame graphs visualizing the call stack hierarchy. These representations help developers understand not just which functions are slow, but why they are slow and how they relate to the broader execution flow.

# A function consuming CPU time
def process_data(items)
  items.map do |item|
    expensive_calculation(item)
  end
end

# Profiling reveals which operations dominate execution time
# Without profiling: assumption that map is the bottleneck
# With profiling: discovery that expensive_calculation is the actual problem

CPU profiling differs from memory profiling, which tracks allocation patterns, and from wall-clock profiling, which measures elapsed time including I/O waits and system delays. CPU profiling focuses exclusively on computational work, making it the correct tool for optimization when the program is CPU-bound rather than waiting on external resources.

Key Principles

Sampling vs Instrumentation: Sampling profilers interrupt execution at regular intervals and record the active call stack. This approach introduces minimal overhead but produces statistical approximations rather than exact measurements. The accuracy improves with longer profiling runs and higher sampling rates. Instrumentation profilers inject tracking code that executes on every function entry and exit, providing precise measurements but with higher overhead that can distort the actual performance characteristics.
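The sampling mechanism can be sketched in a few lines of plain Ruby: a background thread periodically records the target thread's innermost stack frame and tallies frequencies. This is an illustrative toy, not a production profiler, and `busy_work` is an invented workload:

```ruby
# Toy sampling profiler: a background thread periodically records the
# target thread's innermost stack frame and tallies how often each appears.
def sample_profile(interval: 0.001)
  target = Thread.current
  counts = Hash.new(0)
  sampler = Thread.new do
    loop do
      frames = target.backtrace          # stack of the profiled thread
      counts[frames.first] += 1 if frames
      sleep interval
    end
  end
  yield                                  # run the workload being profiled
  sampler.kill
  counts
end

def busy_work
  100_000.times { Math.sqrt(rand) }
end

counts = sample_profile { 100.times { busy_work } }
# The hottest frames accumulate the most samples; exact counts vary run to run.
counts.sort_by { |_, c| -c }.first(3).each { |frame, c| puts "#{c}  #{frame}" }
```

Real sampling profilers work the same way in principle but capture whole stacks at signal level, which keeps overhead low while still attributing time statistically.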

Inclusive vs Exclusive Time: Inclusive time represents the total duration spent in a function including all functions it calls. Exclusive time measures only the time spent in the function's own code, excluding called functions. A function might show high inclusive time because it calls expensive operations, but low exclusive time because its own logic is fast. This distinction identifies whether to optimize the function itself or the operations it invokes.

Call Graph Analysis: The call graph captures the relationship between callers and callees, showing how many times each function was called and from where. A function called once but consuming significant time differs fundamentally from a function called millions of times with low individual cost. The call graph reveals whether optimization should focus on reducing per-call cost or reducing call frequency.

Profiling Overhead: Every profiling technique adds overhead to program execution. Sampling profilers typically add 1-5% overhead, while instrumentation can add 10-100% depending on the granularity. This overhead can change program behavior, particularly in timing-sensitive code or applications with many short-lived function calls. Production profiling requires low-overhead approaches, while development profiling can accept higher overhead for increased accuracy.

Statistical Significance: Sampling profilers provide statistical estimates that vary between runs. A function appearing at the top of one profile might appear lower in another due to sampling variance. Longer profiling sessions and multiple runs improve statistical reliability. Functions consuming less than 1-2% of total time often fall within noise thresholds where the measurements lack significance.
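That noise threshold can be quantified: a function observed in fraction p of n samples has a standard error of sqrt(p(1 - p) / n). A back-of-the-envelope calculation with illustrative numbers:

```ruby
# Standard error of a sampled time fraction: sqrt(p * (1 - p) / n)
def sampling_error(fraction, samples)
  Math.sqrt(fraction * (1 - fraction) / samples)
end

# A function at 2% of 1,000 samples:
err = sampling_error(0.02, 1_000)
puts format('2%% over 1000 samples: +/- %.2f%% (95%% CI ~ +/- %.2f%%)',
            err * 100, 1.96 * err * 100)
# The 95% interval spans roughly 1.1%-2.9%, so the "2%" reading is mostly noise.

# Ten times more samples narrows the interval by sqrt(10):
err10 = sampling_error(0.02, 10_000)
puts format('2%% over 10000 samples: +/- %.2f%%', err10 * 100)
```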

Hot Path Identification: Programs typically spend most execution time in a small percentage of code paths. The Pareto principle often applies: 80% of time spent in 20% of code. Profiling identifies these hot paths where optimization efforts yield the highest returns. Code executed rarely contributes negligibly to overall performance regardless of individual function cost.

# Inclusive vs Exclusive time demonstration
def outer_function(data)
  prepare_data(data)      # Some exclusive time here
  process_items(data)     # Large inclusive time due to this call
  finalize_results(data)  # Some exclusive time here
end

def process_items(data)
  data.each do |item|
    expensive_operation(item)  # This dominates inclusive time of process_items
  end
end

# Profiler output interpretation:
# outer_function: high inclusive, low exclusive -> bottleneck in called functions
# process_items: high inclusive, low exclusive -> bottleneck in expensive_operation
# expensive_operation: high exclusive time -> optimize this function's code

Self Time Metrics: Self time (equivalent to exclusive time) shows the time spent executing a function's own instructions. This metric isolates the function's performance characteristics from its dependencies. High self time indicates inefficient algorithms or operations within the function body. Low self time with high inclusive time suggests the function coordinates expensive operations but performs minimal work itself.

Flame Graph Interpretation: Flame graphs display the call stack as stacked rectangles where width represents time consumed and height represents call depth. The bottom represents program entry points, with called functions stacked above their callers. Wide rectangles indicate hot paths, while tall stacks show deep call chains. Flat, wide patterns suggest tight loops or repeated operations, while narrow spikes indicate occasional expensive operations.
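Most flame graph generators consume "folded" stacks: one line per unique stack with frames joined by semicolons, followed by a sample count. Collapsing raw samples into that format is a small aggregation step; the sample data here is invented for illustration:

```ruby
# Raw samples: each is one recorded call stack, outermost frame first.
samples = [
  %w[main process_data expensive_calculation],
  %w[main process_data expensive_calculation],
  %w[main process_data normalize],
  %w[main finalize],
]

# Fold identical stacks into "frame;frame;frame count" lines,
# the input format expected by flame graph generators.
folded = samples.tally.map { |stack, count| "#{stack.join(';')} #{count}" }

puts folded
# main;process_data;expensive_calculation 2
# main;process_data;normalize 1
# main;finalize 1
```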

Ruby Implementation

Ruby's standard library supplies the Benchmark module for measuring execution time of code blocks, while comprehensive profiling comes from third-party gems such as ruby-prof, which offers multiple measurement modes and output formats.

require 'benchmark'

# Basic timing measurement
time = Benchmark.measure do
  1_000_000.times { Math.sqrt(rand) }
end

puts time
# => 0.234000   0.001000   0.235000 (  0.237426)
#    user CPU   system CPU  total     elapsed real time

The ruby-prof gem provides multiple profiling modes. Flat profiling shows total time per method, graph profiling displays call relationships, and call stack profiling captures complete execution hierarchies.

require 'ruby-prof'

# Start profiling
RubyProf.start

# Code to profile
computation_result = expensive_computation(large_dataset)

# Stop profiling and retrieve the profile data
# (use a separate variable so the computation result is not overwritten)
profile = RubyProf.stop

# Print flat profile
printer = RubyProf::FlatPrinter.new(profile)
printer.print(STDOUT)

Ruby's profiling distinguishes between wall time, process time, and CPU time. Wall time includes all elapsed time, including I/O waits. Process time measures time the Ruby process spent executing on the processor, excluding waits. Note that recent ruby-prof releases expose CPU measurement as RubyProf::PROCESS_TIME; the RubyProf::CPU_TIME mode shown below comes from older versions.

# Configure profiler for CPU time measurement
RubyProf.measure_mode = RubyProf::CPU_TIME

RubyProf.start
perform_cpu_intensive_work
result = RubyProf.stop

# Generate call graph
printer = RubyProf::GraphPrinter.new(result)
printer.print(STDOUT, min_percent: 2)

Call graph output shows parent-child relationships with time attribution. Each entry displays the function name, self time, cumulative time, number of calls, and calling relationships.

# Call graph output interpretation example
Thread ID: 70123456789000
Fiber ID: 70123456789020
Total: 2.34
Sort by: self_time

 %self      total      self      wait     child     calls  name
 45.32      1.061     1.061     0.000     0.000   500000   Array#each
 23.45      0.549     0.549     0.000     0.000   500000   Object#expensive_operation
 12.34      1.610     0.289     0.000     1.321        1   Object#process_data

Ruby-prof supports multiple output formats for different analysis needs. Flat printer shows sorted method times. Graph printer displays call relationships. Call stack printer outputs full execution traces. HTML and graphviz formats provide visual representations.

# Generate multiple output formats
result = RubyProf.stop

# Flat profile to text file
File.open('profile_flat.txt', 'w') do |file|
  RubyProf::FlatPrinter.new(result).print(file)
end

# Call graph visualization
File.open('profile_graph.html', 'w') do |file|
  RubyProf::GraphHtmlPrinter.new(result).print(file)
end

# Call stack profile
File.open('profile_stack.html', 'w') do |file|
  RubyProf::CallStackPrinter.new(result).print(file)
end

Stack profiling with ruby-prof captures complete execution contexts. The stack profiler records every method invocation with timing information, creating a detailed execution timeline.

RubyProf.measure_mode = RubyProf::WALL_TIME
result = RubyProf.profile do
  complex_application_flow
end

# Export in callgrind format for KCachegrind/QCachegrind visualization
printer = RubyProf::CallTreePrinter.new(result)
printer.print(path: 'tmp', profile: 'app')

The stackprof gem provides sampling-based profiling with lower overhead than instrumentation. It's suitable for production environments where overhead must remain minimal.

require 'stackprof'

# Start sampling profiler
StackProf.run(mode: :cpu, out: 'stackprof-cpu.dump') do
  run_application_code
end

# Analyze results separately
# stackprof stackprof-cpu.dump --text --limit 20

Block-based profiling wraps specific code sections for targeted analysis rather than profiling entire application execution.

RubyProf.start

# Only this code block is profiled
result = suspect_method(large_input)

profile_result = RubyProf.stop

# Analysis focuses on suspect_method and its callees
printer = RubyProf::GraphPrinter.new(profile_result)
printer.print(STDOUT)

Tools & Ecosystem

ruby-prof: Comprehensive instrumentation-based profiler with multiple output formats and measurement modes. Supports wall time, CPU time, allocations, and memory profiling. Generates flat profiles, call graphs, call stacks, and flame graphs. Higher overhead than sampling profilers but provides exact measurements.

# Install ruby-prof
gem install ruby-prof

# Command-line profiling
ruby-prof --printer=graph --file=profile.txt script.rb

# Programmatic use with customization
RubyProf.measure_mode = RubyProf::CPU_TIME
result = RubyProf.profile { run_code }
printer = RubyProf::GraphPrinter.new(result)
printer.print(STDOUT, min_percent: 1)

stackprof: Sampling profiler designed for low overhead in production environments. Collects statistical samples rather than instrumenting every call. Outputs raw dump files for offline analysis. Integrates with flamegraph generators.

# Install stackprof
gem install stackprof

# Profile with specific mode
StackProf.run(mode: :cpu, interval: 1000, out: 'tmp/stackprof.dump') do
  application_code
end

# Command-line analysis
# stackprof tmp/stackprof.dump --text
# stackprof tmp/stackprof.dump --method 'MyClass#slow_method'

rack-mini-profiler: Rack middleware for profiling web applications. Displays profiling results in browser UI overlays. Shows SQL queries, rendering times, and Ruby execution breakdown. Supports both ruby-prof and stackprof backends.

# Add to Gemfile
gem 'rack-mini-profiler'

# Enable in development
# config/environments/development.rb
config.middleware.use Rack::MiniProfiler

# Access profiling UI
# Visit any page, profiler badge appears in corner
# Click badge to expand detailed timing breakdown

perftools.rb: Ruby bindings to Google Performance Tools. Provides extremely low-overhead sampling profiling and generates profiles compatible with pprof visualization tools. Requires separate installation of the gperftools system library. The project has been unmaintained for years and does not support modern Ruby versions; prefer stackprof or rbspy on current Rubies.

require 'perftools'

# Profile code block
PerfTools::CpuProfiler.start('/tmp/profile.dump') do
  cpu_intensive_operations
end

# Generate PDF callgraph
# pprof.rb --pdf /tmp/profile.dump > profile.pdf

benchmark-ips: Measures iterations per second for performance comparison. Not a profiler but complementary for micro-benchmarking alternative implementations. Shows statistical confidence intervals and comparative performance.

require 'benchmark/ips'

array = Array.new(1_000) { rand(100) }

Benchmark.ips do |x|
  x.report('map')  { array.map { |n| n * 2 } }
  x.report('each') { result = []; array.each { |n| result << n * 2 }; result }
  x.compare!
end

# Comparison output shows relative performance, e.g.:
#   map:  123456.7 i/s
#   each:  98765.4 i/s - 1.25x slower

memory_profiler: Tracks object allocations rather than CPU time. Complementary to CPU profiling for comprehensive performance analysis. Identifies allocation hot spots that contribute to garbage collection overhead.

require 'memory_profiler'

report = MemoryProfiler.report do
  create_many_objects
end

report.pretty_print

flamegraph: Flame graph visualizations display call stack hierarchies as interactive graphs where width represents time spent and colors differentiate stack frames. stackprof ships its own flame graph exporters, and standalone generators such as Brendan Gregg's flamegraph.pl consume folded-stack data.

# Generate flame graph data from a stackprof dump
stackprof --flamegraph stackprof.dump > flamegraph.data
stackprof --flamegraph-viewer flamegraph.data

# Or emit a self-contained interactive HTML flame graph
stackprof --d3-flamegraph stackprof.dump > flame.html

# Open flame.html in a browser for interactive exploration

vernier: Modern sampling profiler with both wall time and CPU time support. Generates multiple output formats including flame graphs. Lower overhead than ruby-prof with good accuracy.

require 'vernier'

Vernier.trace(out: 'profile.json') do
  application_code
end

# Load profile.json in the browser-based viewer at https://vernier.prof

rbspy: System-level sampling profiler that attaches to running Ruby processes without code modification. Profiles production applications without requiring gem installation or code changes. Useful for debugging live performance issues.

# Install rbspy (not a Ruby gem, system tool)
# cargo install rbspy  # or platform-specific package manager

# Attach to running Ruby process
rbspy record --pid 12345 --file profile.svg

# No code changes required in target application

Practical Examples

Example 1: Identifying Algorithmic Bottleneck

An application processes user data through multiple transformation steps. Performance degrades with larger datasets but the bottleneck location remains unclear.

require 'ruby-prof'

class DataProcessor
  def process(users)
    users.map { |user| normalize(user) }
         .select { |user| valid?(user) }
         .map { |user| enrich(user) }
         .sort_by { |user| user[:score] }
  end

  def normalize(user)
    # Simulate data normalization
    user.transform_keys(&:to_sym)
  end

  def valid?(user)
    # Simulate validation with database check
    sleep(0.001)  # Simulates I/O
    user[:email]&.include?('@')
  end

  def enrich(user)
    # Simulate data enrichment
    1000.times { Math.sqrt(rand) }  # CPU-intensive work
    user.merge(enriched: true)
  end
end

# Generate test data
users = 1000.times.map do |i|
  { 'email' => "user#{i}@example.com", 'score' => rand(100) }
end

processor = DataProcessor.new

# Profile the processing
RubyProf.measure_mode = RubyProf::CPU_TIME
result = RubyProf.profile do
  processor.process(users)
end

# Output shows enrich method dominates CPU time
printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT)

The profile reveals that enrich consumes 95% of CPU time despite valid? appearing expensive. The sleep in valid? doesn't consume CPU time, only wall time. Optimization efforts should focus on the enrichment calculation rather than validation logic.

Example 2: Web Request Profiling

A Rails controller action experiences slow response times. Profiling identifies which component causes the delay.

class ArticlesController < ApplicationController
  def index
    result = RubyProf.profile do
      @articles = Article.includes(:author, :comments)
                        .where(published: true)
                        .order(created_at: :desc)
                        .limit(20)
      
      @articles.each do |article|
        @featured_image = article.featured_image_url
        @summary = article.generate_summary
      end
    end
    
    # Save profile to file for analysis
    File.open('tmp/controller_profile.html', 'w') do |file|
      RubyProf::GraphHtmlPrinter.new(result).print(file, min_percent: 2)
    end
  end
end

# Profile reveals generate_summary called on each article
# consuming 70% of request time
# Optimization: move summary generation to background job
# or cache summaries after article creation

Example 3: Background Job Optimization

A background job processes images but occasionally times out on large batches.

require 'stackprof'

class ImageProcessor
  def process_batch(image_ids)
    StackProf.run(mode: :cpu, out: 'tmp/image_processing.dump', interval: 1000) do
      image_ids.each do |id|
        image = Image.find(id)
        process_image(image)
      end
    end
  end

  def process_image(image)
    original = load_image(image.path)
    thumbnail = resize(original, 150, 150)
    medium = resize(original, 500, 500)
    large = resize(original, 1200, 1200)
    
    save_processed(image, thumbnail, medium, large)
  end

  def resize(image, width, height)
    # Image manipulation logic
    # Simulated with computation
    5000.times { Math.sqrt(rand) }
  end

  def load_image(path)
    # Load image from storage
  end

  def save_processed(image, *versions)
    # Save processed versions
  end
end

# Analysis command:
# stackprof tmp/image_processing.dump --text --limit 10

StackProf analysis shows resize method dominates execution time with multiple calls per image. The profile distinguishes between resize calls for different dimensions, revealing that large image resizing consumes disproportionate time. Optimization strategies: implement incremental resizing from large to small, or parallelize resizing operations.

Example 4: Algorithm Comparison

Comparing different sorting implementations to identify the most efficient approach for specific data characteristics.

require 'ruby-prof'

def quicksort(array)
  return array if array.length <= 1
  pivot = array[array.length / 2]
  left = array.select { |x| x < pivot }
  middle = array.select { |x| x == pivot }
  right = array.select { |x| x > pivot }
  quicksort(left) + middle + quicksort(right)
end

def mergesort(array)
  return array if array.length <= 1
  mid = array.length / 2
  left = mergesort(array[0...mid])
  right = mergesort(array[mid..-1])
  merge(left, right)
end

def merge(left, right)
  result = []
  until left.empty? || right.empty?
    result << (left.first <= right.first ? left.shift : right.shift)
  end
  result + left + right
end

data = Array.new(10_000) { rand(1_000_000) }

# Profile each algorithm
algorithms = {
  'quicksort' => -> { quicksort(data.dup) },
  'mergesort' => -> { mergesort(data.dup) },
  'native_sort' => -> { data.dup.sort }
}

algorithms.each do |name, algo|
  result = RubyProf.profile { algo.call }
  puts "\n#{name}:"
  printer = RubyProf::FlatPrinter.new(result)
  printer.print(STDOUT)
end

# Profile comparison reveals:
# - native_sort: fastest, implemented in C
# - mergesort: predictable performance, higher memory usage
# - quicksort: slower on average, more array operations

Example 5: Database Query Analysis

Profiling reveals whether application slowness comes from Ruby code or database queries.

require 'ruby-prof'

class ReportGenerator
  def generate_sales_report(start_date, end_date)
    RubyProf.measure_mode = RubyProf::WALL_TIME
    result = RubyProf.profile do
      # Fetch all orders in date range
      orders = Order.where(created_at: start_date..end_date).includes(:line_items, :customer)
      
      # Calculate metrics
      total_revenue = orders.sum { |order| order.total }
      average_order = total_revenue / orders.count
      
      # Process each order
      order_details = orders.map do |order|
        {
          id: order.id,
          customer: order.customer.name,
          items: order.line_items.count,
          total: order.total,
          profit_margin: calculate_margin(order)
        }
      end
      
      { total_revenue: total_revenue, average_order: average_order, orders: order_details }
    end
    
    # Analyze profile
    printer = RubyProf::GraphPrinter.new(result)
    printer.print(STDOUT)
  end

  def calculate_margin(order)
    # Complex calculation
    order.line_items.sum do |item|
      (item.price - item.cost) / item.price.to_f
    end / order.line_items.count
  end
end

# Profile shows time split between:
# - ActiveRecord query time (database I/O)
# - calculate_margin CPU time (Ruby computation)
# - order.total calls triggering additional queries (N+1)

Performance Considerations

Profiler overhead directly affects application behavior. Instrumentation profilers like ruby-prof insert measurement code around every method call, adding overhead that ranges from tens of percent to several times the original runtime depending on call frequency. This overhead distorts measurements, particularly for functions called millions of times with microsecond execution times, where the measurement overhead becomes larger than the actual work performed.

Sampling profilers reduce overhead to 1-5% by periodically interrupting execution rather than tracking every call. The trade-off is statistical accuracy versus precise measurement. Sampling intervals determine both overhead and accuracy. Lower intervals (500 microseconds) provide detailed sampling but higher overhead. Higher intervals (5000 microseconds) reduce overhead but might miss short-lived hot paths.

# Demonstrate overhead impact
require 'benchmark'

def fast_function
  x = 1 + 1
end

# Without profiling
time_normal = Benchmark.measure do
  10_000_000.times { fast_function }
end

# With instrumentation profiling
require 'ruby-prof'
time_profiled = Benchmark.measure do
  RubyProf.profile do
    10_000_000.times { fast_function }
  end
end

puts "Normal: #{time_normal.real}s"
puts "Profiled: #{time_profiled.real}s"
puts "Overhead: #{((time_profiled.real / time_normal.real - 1) * 100).round(1)}%"

# Typical output:
# Normal: 0.89s
# Profiled: 9.34s
# Overhead: 949.4%

Profile interpretation requires distinguishing between self time and total time. Functions with high total time but low self time delegate work to other functions. Optimizing the coordinator function yields minimal improvement. Functions with high self time perform substantial work internally and represent optimization targets.

Recursive functions present profiling challenges. Each recursive call appears as a separate frame in the call stack, making aggregate time calculations complex. Profilers may show confusing call counts where a single logical invocation results in thousands of profile entries. Understanding the recursive structure is essential for correct interpretation.

# Recursive function profiling
def fibonacci(n)
  return n if n <= 1
  fibonacci(n - 1) + fibonacci(n - 2)
end

# Profile shows:
# fibonacci called 177 times for fibonacci(10)
# Each call appears separately in profile
# Aggregate time across all calls needed for optimization

Call frequency matters as much as individual call cost. A function executing in 1 millisecond called once per request differs dramatically from the same function called 1000 times per request. The latter dominates execution time despite identical per-call cost. Profiling reveals both dimensions, enabling optimization decisions based on reducing call frequency or call cost.
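A call counter makes the frequency dimension concrete. In this sketch (the counter is illustrative instrumentation, not profiler output), memoization leaves per-call cost unchanged but collapses the call count:

```ruby
# Count invocations to show how memoization collapses call frequency.
$calls = 0

def fib_naive(n)
  $calls += 1
  return n if n <= 1
  fib_naive(n - 1) + fib_naive(n - 2)
end

def fib_memo(n, cache = {})
  $calls += 1
  return n if n <= 1
  cache[n] ||= fib_memo(n - 1, cache) + fib_memo(n - 2, cache)
end

$calls = 0
fib_naive(20)
puts "naive:    #{$calls} calls"   # 21891 calls for n = 20

$calls = 0
fib_memo(20)
puts "memoized: #{$calls} calls"   # 39 calls for n = 20
```

A profiler would surface the same story: identical per-call self time, but a call count three orders of magnitude lower after the optimization.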

Garbage collection time appears in profiles as GC-related methods. Heavy object allocation triggers frequent collections, showing as time in garbage collector functions. This appears as Ruby execution time but stems from allocation patterns rather than algorithmic efficiency. Memory profiling complements CPU profiling to identify allocation hot spots.
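The standard library's GC::Profiler can quantify how much of a block's runtime goes to collection, separating allocation pressure from algorithmic cost. A minimal sketch with an invented allocation-heavy workload:

```ruby
require 'benchmark'

# Measure GC time incurred by an allocation-heavy block.
GC::Profiler.enable
GC::Profiler.clear

elapsed = Benchmark.realtime do
  100_000.times { { a: 'x' * 50, b: 'y' * 50 } }  # allocation-heavy work
end

gc_time = GC::Profiler.total_time   # seconds spent in GC during the block
GC::Profiler.disable

puts format('total: %.3fs, GC: %.3fs (%.1f%%)',
            elapsed, gc_time, gc_time / elapsed * 100)
```

If the GC share is large, reducing allocations (or tuning the heap) may help more than optimizing the algorithm itself.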

Background I/O operations don't appear in CPU profiles. A function that makes network requests or queries databases shows minimal CPU time while consuming significant wall time. CPU profiling identifies computational bottlenecks but misses I/O-bound performance issues. Wall time profiling captures both CPU and I/O delays.

# CPU time vs wall time demonstration
require 'ruby-prof'

def cpu_intensive
  1_000_000.times { Math.sqrt(rand) }
end

def io_intensive
  sleep(0.5)  # Simulates I/O wait
end

# CPU time measurement
RubyProf.measure_mode = RubyProf::CPU_TIME
cpu_result = RubyProf.profile do
  cpu_intensive
  io_intensive
end

# cpu_intensive shows high CPU time
# io_intensive shows near-zero CPU time

# Wall time measurement
RubyProf.measure_mode = RubyProf::WALL_TIME
wall_result = RubyProf.profile do
  cpu_intensive
  io_intensive
end

# Both functions show significant wall time

Profile granularity affects result usefulness. Too granular profiling shows individual small functions with minimal time each, obscuring high-level bottlenecks. Too coarse profiling aggregates time into large modules without identifying specific problem areas. Adjusting minimum time thresholds in profile output focuses on significant contributors.

Production profiling requires low-overhead approaches. Sampling profilers like stackprof or rbspy enable production profiling without significant performance degradation. Continuous profiling can run in production to identify performance regressions or anomalies under real traffic patterns.

Common Pitfalls

Profiling debug or development builds: Development environments often run with different optimizations than production. A Rails application running with RAILS_ENV=development may include additional checks, code reloading, or logging absent in production. Profiling development builds produces measurements that don't reflect production performance characteristics. Profile in production-like environments with production configuration.

Insufficient profiling duration: Short profiling runs produce unreliable statistics, particularly with sampling profilers. Statistical noise dominates when profiling for seconds rather than minutes. Functions may appear hot due to sampling coincidence rather than actual performance impact. Run profiles for sufficient duration to collect statistically significant samples, typically 30-60 seconds minimum.

Profiling with unrealistic data: Testing with small datasets or simplified scenarios misses scaling problems. A function performing well with 100 records might become a bottleneck with 100,000 records. Profile using production-scale data and realistic workloads to identify genuine performance issues rather than artificial test case artifacts.

# Unrealistic test data
small_dataset = Array.new(10) { rand(1000) }

# This profiles well
result = RubyProf.profile do
  small_dataset.each do |item|
    expensive_operation(item)
  end
end

# Production reality
production_dataset = Array.new(1_000_000) { rand(1000) }

# Different bottlenecks emerge at scale
production_result = RubyProf.profile do
  production_dataset.each do |item|
    expensive_operation(item)
  end
end

Ignoring call context: A function might be slow when called from one context but fast from another due to different input characteristics. Profilers show aggregate time across all call sites. Examining call graphs reveals which callers contribute most to total time, enabling targeted optimization of specific use cases rather than premature generalization.

Optimizing insignificant functions: Focusing on functions consuming 2-3% of total time yields minimal overall improvement. The 80/20 rule applies to performance optimization. Functions below 5% threshold rarely warrant optimization effort. Profile data guides effort toward high-impact optimizations rather than premature micro-optimization.
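Amdahl's law puts numbers on this: the overall speedup from optimizing one part is capped by that part's share of total runtime. A quick calculation with illustrative fractions:

```ruby
# Overall speedup when a fraction of runtime is sped up by a factor:
# speedup = 1 / ((1 - fraction) + fraction / factor)
def overall_speedup(fraction, factor)
  1.0 / ((1 - fraction) + fraction / factor)
end

# Doubling the speed of a function that takes 3% of runtime:
puts overall_speedup(0.03, 2.0)   # ~1.015 -- a 1.5% overall gain
# Doubling the speed of a function that takes 60% of runtime:
puts overall_speedup(0.60, 2.0)   # ~1.43 -- a 43% overall gain
```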

Confusing wall time and CPU time: Wall time includes I/O waits, thread synchronization, and system interrupts. CPU time measures only processor execution. A function with high wall time but low CPU time is I/O-bound, not compute-bound. Optimizing CPU-bound functions when the bottleneck is I/O wastes effort. Match profiling mode to the actual performance characteristic being investigated.

Profiler observer effect: Heavy instrumentation changes program behavior. Tight loops or frequently called functions exhibit disproportionate slowdown under profiling. The measured performance no longer represents actual performance. Sampling profilers reduce but don't eliminate this effect. Compare profiled and unprofiled execution times to assess measurement reliability.

# Observer effect demonstration
def tight_loop
  sum = 0
  1_000_000.times { sum += 1 }
  sum
end

# The profiler overhead exceeds the actual work
# Profiled version shows artificially inflated time
# Real performance: 50ms
# Profiled performance: 600ms
# Measurements become unreliable

Single-run optimization decisions: Performance varies between runs due to system load, garbage collection timing, and other factors. One profile showing a particular bottleneck might not represent typical behavior. Multiple profiling runs reveal consistent patterns versus anomalies. Base optimization decisions on consistent patterns across multiple runs rather than single measurements.

Ignoring standard library performance: Ruby's standard library methods have varying performance characteristics. Some methods create intermediate arrays, others operate in-place. Profiling reveals when standard library choices become bottlenecks. For example, map creates a new array while each doesn't; this difference matters when processing large collections.

Profiling with concurrent activity: Profiling while other applications consume CPU resources produces distorted measurements. System load affects both execution time and profile accuracy. Profile on idle systems or dedicated profiling environments to eliminate external interference. Container environments may show different performance characteristics than bare metal due to resource sharing.

Misinterpreting recursive profiles: Recursive algorithms generate confusing profile output with many identical function entries. The profiler shows each recursive call separately, making the call graph complex. Total time gets divided across recursion depth rather than showing aggregate function cost. Understanding the recursion pattern is essential for extracting meaningful insights.

Reference

Ruby Profiling Modes

Mode         Measurement            Use Case                      Overhead
CPU_TIME     Processor time only    Compute-bound optimization    Low to Medium
WALL_TIME    Total elapsed time     Overall performance analysis  Low to Medium
ALLOCATIONS  Object creation count  Memory optimization           Medium
MEMORY       Memory consumption     Memory leak detection         Medium to High

Profiler Output Types

Output        Information                  Best For
Flat Profile  Sorted function times        Quick bottleneck identification
Call Graph    Caller-callee relationships  Understanding execution flow
Call Stack    Complete execution traces    Detailed path analysis
Flame Graph   Visual stack hierarchy       Pattern recognition

Ruby-prof Printers

Printer           Format              Primary Use
FlatPrinter       Text table          Terminal output, quick analysis
GraphPrinter      Text call graph     Understanding call relationships
GraphHtmlPrinter  HTML visualization  Detailed browser-based analysis
CallStackPrinter  HTML timeline       Execution flow visualization
DotPrinter        Graphviz dot file   Visual call graph rendering
CallTreePrinter   KCachegrind format  External tool analysis

Key Metrics Interpretation

Metric        Definition                       Optimization Signal
Self Time     Time in function's own code      High value indicates direct optimization target
Total Time    Time including called functions  High value suggests examining callees
Call Count    Number of invocations            High value with low time suggests overhead reduction
Average Time  Total time divided by calls      High value indicates expensive individual calls

Profiling Command Examples

Task                      Command
Basic ruby-prof           ruby-prof script.rb
Specify printer           ruby-prof --printer=graph script.rb
Output to file            ruby-prof --file=profile.txt script.rb
Filter results            ruby-prof --min-percent=2 script.rb
Stackprof text output     stackprof profile.dump --text
Stackprof method details  stackprof profile.dump --method MethodName
Generate flame graph      stackprof profile.dump --flamegraph > profile.txt

Profile Analysis Checklist

Step  Action                                  Goal
1     Run profile with realistic data         Ensure measurements reflect reality
2     Identify functions above 5% total time  Focus on significant contributors
3     Examine self time vs total time         Distinguish direct cost from delegated work
4     Check call counts                       Identify frequency-driven bottlenecks
5     Analyze call graph                      Understand execution flow
6     Repeat profile multiple times           Verify consistency
7     Profile after optimization              Measure improvement

Common Profiling Patterns

Pattern            Code Structure             Profiling Approach
Hot Loop           Repeated iteration         Focus on per-iteration cost
Deep Recursion     Self-calling functions     Aggregate recursive call costs
N+1 Queries        Database access in loops   Wall time with query logging
Heavy Computation  Math-intensive operations  CPU time measurement
Object Creation    Many allocations           Allocation mode profiling

Production Profiling Tools

Tool                Type               Overhead  Installation
stackprof           Sampling           1-2%      Ruby gem
rbspy               System sampling    1-3%      System binary
rack-mini-profiler  Request profiling  2-5%      Rails middleware
vernier             Sampling           1-2%      Ruby gem

Profile Filtering Options

Filter         Effect                          Use Case
min-percent    Hide functions below threshold  Focus on significant contributors
method-filter  Show specific methods only      Analyze particular components
call-depth     Limit stack depth               Simplify deep call graphs
thread-filter  Isolate specific threads        Concurrent program analysis